CN112328839B - Enterprise risk identification method and system based on enterprise marketing relationship graph - Google Patents
- Publication number
- CN112328839B (application CN202011224147.0A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
Abstract
The invention discloses an enterprise risk identification method and system based on an enterprise purchase-and-sales relationship graph. The method comprises: constructing a data set; constructing a relationship graph, in which each enterprise and each purchase-item and sales-item goods in the data set is added to a graph database as a node, and the correspondence between each enterprise and its purchase-item and sales-item goods is added to the graph database as an edge; querying the constructed relationship graph for the amount share that the purchase-item and sales-item goods shared by any two enterprises take up in their total purchase-item and sales-item goods, and calculating the purchase-and-sales goods similarity of the enterprises; and screening out enterprises with similar purchase and sales items according to the similarity result, and identifying enterprise risk through the enterprises' industry attributes. By calculating purchase-and-sales similarity and comparing the industry attributes of enterprises whose purchase and sales items are similar, the method and system help tax supervision departments manage and analyze enterprises more effectively.
Description
Technical Field
The invention relates to an enterprise risk identification method and system based on an enterprise purchase-and-sales relationship graph.
Background
With economic development, taxation plays an increasingly important role. Tax supervision departments need to track industry changes and enterprise development continuously, and to analyze and manage enterprises with similar purchase and sales items in a unified way. How to find enterprises with similar purchase and sales items has therefore become a valuable problem.
Existing methods for finding enterprises with similar purchase and sales items are limited to web searches: the degree of similarity is judged by comparing text descriptions of the purchase-item and sales-item goods of enterprises in the same industry.
Disclosure of Invention
The invention aims to provide an enterprise risk identification method and system based on an enterprise purchase-and-sales relationship graph, which at least solve the technical problems that existing methods for finding enterprises with similar purchase and sales items rely on inaccurate data sources, and that text descriptions can hardly summarize an enterprise's purchase and sales accurately.
To achieve the above object, in one aspect, the invention provides an enterprise risk identification method based on an enterprise purchase-and-sales relationship graph, comprising:
constructing a data set: acquiring enterprise information from enterprise tax data, performing word-segmentation matching on the acquired enterprise information, and establishing the correspondence between each enterprise and its matched purchase-item and sales-item goods;
constructing a relationship graph: adding each enterprise and each purchase-item and sales-item goods in the data set to a graph database as a node, and adding the correspondence between each enterprise and its purchase-item and sales-item goods to the graph database as an edge;
calculating purchase-and-sales similarity: querying the constructed relationship graph for the amount share that the purchase-item and sales-item goods shared by any two enterprises take up in their total purchase-item and sales-item goods, and calculating the purchase-and-sales goods similarity of the enterprises;
and screening out enterprises with similar purchase and sales items according to the similarity result, and identifying enterprise risk through the enterprises' industry attributes.
Optionally, the data set construction comprises:
acquiring enterprise information from invoice data, wherein the enterprise information mainly comprises the seller taxpayer identification number, seller taxpayer name, supplier taxpayer identification number, supplier taxpayer name, transaction amount, goods sold, and transaction time;
storing the seller taxpayer names as a word-segmentation dictionary for matching the correspondences;
and matching the purchase-item and sales-item goods in the acquired enterprise information with a word-segmentation algorithm, and storing the enterprise names together with the names of the matched goods, thereby establishing the correspondence between each enterprise and its purchase-item and sales-item goods.
Optionally, the enterprise relationship graph is built on the distributed graph database JanusGraph.
Optionally, calculating the purchase-and-sales goods similarity of the enterprises comprises:
calculating an enterprise purchase-item similarity and an enterprise sales-item similarity, wherein the purchase-item similarity is calculated by the following formula:
[formula image not reproduced in the source text]
where N is the number of purchase-item goods shared by enterprises A and B, $ipa_i$ is the amount share of the i-th shared purchase-item goods in enterprise A, $ipb_i$ is the amount share of the i-th shared purchase-item goods in enterprise B, and similarity_in is the purchase-item similarity;
the sales-item similarity is calculated by the following formula:
[formula image not reproduced in the source text]
where M is the number of sales-item goods shared by enterprises A and B, $opa_i$ is the amount share of the i-th shared sales-item goods in enterprise A, $opb_i$ is the amount share of the i-th shared sales-item goods in enterprise B, and similarity_out is the sales-item similarity;
the purchase-and-sales similarity is calculated by the following formula:
similarity = (similarity_in + similarity_out) / 2
where similarity is the purchase-and-sales similarity.
Optionally, screening out enterprises with similar purchase and sales items and identifying enterprise risk through the enterprises' industry attributes comprises:
selecting the enterprises whose purchase-item similarity, sales-item similarity, or purchase-and-sales similarity with the target enterprise exceeds a set threshold as the target enterprise's purchase-item, sales-item, or purchase-and-sales similar enterprises respectively, comparing the industry attributes of the similar enterprises, and identifying enterprise risk.
Optionally, when the correspondence between each enterprise and its purchase-item and sales-item goods is added to the graph database as an edge, information such as transaction time and transaction amount is added to the edge's properties.
Optionally, the correspondence queries used in calculating the purchase-and-sales similarity comprise:
querying the purchase-and-sales correspondences of the enterprises using the Gremlin graph query language.
In another aspect, the invention also provides an enterprise risk identification system based on an enterprise purchase-and-sales relationship graph, comprising:
a data set construction module, which acquires enterprise information from enterprise tax data, performs word-segmentation matching on the acquired enterprise information, and establishes the correspondence between each enterprise and its matched purchase-item and sales-item goods;
a relationship graph construction module, which adds each enterprise and each purchase-item and sales-item goods in the data set to a graph database as a node, and adds the correspondence between each enterprise and its purchase-item and sales-item goods to the graph database as an edge;
a purchase-and-sales similarity calculation module, which queries the constructed relationship graph for the amount share that the purchase-item and sales-item goods shared by any two enterprises take up in their total purchase-item and sales-item goods, and calculates the purchase-and-sales goods similarity of the enterprises;
and a risk identification module, which screens out enterprises with similar purchase and sales items according to the similarity result and identifies enterprise risk through the enterprises' industry attributes.
Optionally, the purchase-and-sales similarity calculation module calculates:
an enterprise purchase-item similarity and an enterprise sales-item similarity, wherein the purchase-item similarity is calculated by the following formula:
[formula image not reproduced in the source text]
where N is the number of purchase-item goods shared by enterprises A and B, $ipa_i$ is the amount share of the i-th shared purchase-item goods in enterprise A, $ipb_i$ is the amount share of the i-th shared purchase-item goods in enterprise B, and similarity_in is the purchase-item similarity;
the sales-item similarity is calculated by the following formula:
[formula image not reproduced in the source text]
where M is the number of sales-item goods shared by enterprises A and B, $opa_i$ is the amount share of the i-th shared sales-item goods in enterprise A, $opb_i$ is the amount share of the i-th shared sales-item goods in enterprise B, and similarity_out is the sales-item similarity;
the purchase-and-sales similarity is calculated by the following formula:
similarity = (similarity_in + similarity_out) / 2
where similarity is the purchase-and-sales similarity.
Further, the purchase-and-sales correspondences of the enterprises are queried using the Gremlin graph query language.
The beneficial effects of the invention are as follows:
For the tax field, the invention calculates the purchase-and-sales similarity of enterprises and compares the industry attributes of enterprises whose purchase and sales items are similar in order to identify enterprise risk, thereby helping tax supervision departments manage and analyze risky enterprises more effectively.
Furthermore, an enterprise relationship graph covering massive data is built on the distributed graph database JanusGraph, and graph computation is carried out through the Gremlin query language. This overcomes the shortcomings of traditional databases and of the stand-alone Neo4j graph database in graph storage and graph mining, and addresses problems such as large data volume and high development cost.
Furthermore, when the purchase-and-sales similarity is calculated, the similarity formula is refined by incorporating transaction amount information as a key parameter.
Additional features and advantages of the invention will be set forth in the detailed description which follows.
Drawings
The above and other objects, features and advantages of the invention will become more apparent from the following detailed description of exemplary embodiments with reference to the attached drawings.
FIG. 1 shows a flow chart of the enterprise risk identification method based on an enterprise purchase-and-sales relationship graph according to the invention;
FIG. 2 is a schematic diagram of the amount share of each goods among all goods of enterprise A according to an embodiment of the invention.
Detailed Description
Preferred embodiments of the present invention will be described in more detail below. While the preferred embodiments of the present invention are described below, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
In one aspect, the invention provides an enterprise risk identification method based on an enterprise purchase-and-sales relationship graph, comprising the following steps.
Constructing a data set: enterprise information is acquired from enterprise tax data, word-segmentation matching is performed on the acquired enterprise information, and the correspondence between each enterprise and its matched purchase-item and sales-item goods is established.
Specifically, enterprise information is acquired from invoice data, wherein the enterprise information mainly comprises the seller taxpayer identification number, seller taxpayer name, supplier taxpayer identification number, supplier taxpayer name, transaction amount, goods sold, and transaction time.
The seller taxpayer names are stored as a word-segmentation dictionary for matching the correspondences. For example, if the acquired enterprise names are company A and company B, then company A and company B are stored in the dictionary file. In addition, enterprise abbreviations can be mapped to full enterprise names through manual screening, which improves the ability to match abbreviations.
The purchase-item and sales-item goods in the acquired enterprise information are matched with a word-segmentation algorithm, and the enterprise names and the names of the matched goods are stored, thereby establishing the correspondence between each enterprise and its purchase-item and sales-item goods. For example, if the information to be matched for company A includes "goods 1, goods 2 and goods 3", the word-segmentation algorithm, used together with the enterprise name dictionary, matches goods 1, goods 2 and goods 3, so that the correspondence between company A and goods 1, goods 2 and goods 3 is established. The word-segmentation algorithm is a conventional algorithm known in the art and is not described further here.
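The dictionary-based matching described above can be sketched as follows. The text does not name a specific word-segmentation algorithm, so this sketch assumes a simple forward maximum matching scan; the function name `forward_max_match`, the two dictionaries, and the record text are hypothetical and chosen only for illustration.

```python
def forward_max_match(text, dictionary):
    """Return the dictionary terms found in text, scanning left to right
    and preferring the longest term at each position."""
    max_len = max((len(term) for term in dictionary), default=0)
    found = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then progressively shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                match = candidate
                break
        if match:
            found.append(match)
            i += len(match)
        else:
            i += 1
    return found


# Hypothetical dictionaries and record text (not taken from the patent).
enterprise_names = {"company A", "company B"}
goods_names = {"goods 1", "goods 2", "goods 3"}
record_text = "company A: goods 1, goods 2 and goods 3"

enterprises = forward_max_match(record_text, enterprise_names)
goods = forward_max_match(record_text, goods_names)

# Correspondence between each matched enterprise and the matched goods.
correspondence = {name: goods for name in enterprises}
```

A production system would more likely rely on an established segmentation library loaded with the enterprise-name dictionary; the scan above only shows the shape of the matching step.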
Constructing a relationship graph: each enterprise and each purchase-item and sales-item goods in the data set is added to a graph database as a node, and the correspondence between each enterprise and its purchase-item and sales-item goods is added to the graph database as an edge.
Specifically, the enterprise relationship graph is built on the distributed graph database JanusGraph.
It should be noted that the data structure of the enterprise relationship graph is a directed graph. Because the data set consists mainly of information from enterprises' invoice data, it can be converted into a directed graph structure and imported into the JanusGraph database. Building the directed graph structure comprises constructing nodes and constructing edges.
Constructing nodes comprises:
each enterprise in the data set is added to the graph database as a node with the label nsr, whose properties include the enterprise name, the enterprise tax number, and a unique identification id; each purchase-item and sales-item goods in the data set is added to the graph database as a node with the label computability.
Constructing edges comprises:
the correspondence between each enterprise and its purchase-item and sales-item goods is added to the graph database as an edge, where edges to purchase-item goods are labeled input and edges to sales-item goods are labeled output. Information such as transaction time and transaction amount is added to the edge properties, which makes it easy to filter by time and to calculate goods amount shares at query time. For example, to calculate the amount share of one type of goods among all goods of an enterprise: if the sales items of enterprise A comprise three goods, namely goods 1 (amount a), goods 2 (amount b), and goods 3 (amount c), then the amount share of goods 3 is c/(a+b+c).
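The amount-share calculation c/(a+b+c) can be illustrated with a small in-memory stand-in for the graph edges. This is a sketch only: the tuple layout, the function name `amount_shares`, and all amounts and dates are hypothetical, and a real deployment would read the edge properties from JanusGraph instead.

```python
from collections import defaultdict

# Each edge: (enterprise, edge label, goods, transaction amount, transaction time).
# All values are hypothetical.
edges = [
    ("enterprise A", "output", "goods 1", 200.0, "2020-01"),
    ("enterprise A", "output", "goods 2", 300.0, "2020-02"),
    ("enterprise A", "output", "goods 3", 500.0, "2020-03"),
    ("enterprise A", "input", "goods 4", 100.0, "2020-01"),
]


def amount_shares(edges, enterprise, label):
    """Return {goods: share of the total amount} for one enterprise and
    one edge label ('input' or 'output')."""
    totals = defaultdict(float)
    for ent, lab, goods, amount, _time in edges:
        if ent == enterprise and lab == label:
            totals[goods] += amount
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {goods: amount / grand_total for goods, amount in totals.items()}


# Share of each sales-item goods of enterprise A; goods 3 is c / (a + b + c).
shares = amount_shares(edges, "enterprise A", "output")
```

With the amounts above, goods 3 accounts for 500 of a total of 1000, so its share is one half. Keeping the transaction time on each edge also allows the same function to be applied to a time-filtered subset of edges.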
Calculating purchase-and-sales similarity: the constructed relationship graph is queried for the amount share that the purchase-item and sales-item goods shared by any two enterprises take up in their total purchase-item and sales-item goods, and the purchase-and-sales goods similarity of the enterprises is calculated.
Specifically, the correspondence queries comprise:
Enterprise node query:
The target enterprise is queried by the 'nsr' label, and the node information is obtained using the Gremlin query language. For example, to query the enterprise node whose enterprise name is "Aerospace Information", the following statement may be used:
g.V().hasLabel('nsr').has('NSRMC', 'Aerospace Information').toList();
Query of nodes and edges with similar purchase and sales items:
After the enterprise information is obtained through the enterprise node query, the enterprise's 'id' is used to quickly find all nodes and edges with similar purchase and sales items within a single level. Taking "Aerospace Information" as an example, the following statement is used:
g.V('id').outE().otherV().inE().otherV().outE().simplePath().toList()
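To make the shape of the second traversal easier to follow, the sketch below walks the same hops over a plain Python edge list: from the target enterprise out to its goods, back in to the other enterprises attached to those goods, and then out to those enterprises' own edges. It is an illustrative stand-in rather than a Gremlin client; the function name `neighbours_sharing_goods` and the sample edges are hypothetical.

```python
# Directed edges from enterprise to goods: (enterprise, edge label, goods).
# Hypothetical sample data.
edges = [
    ("A", "output", "g1"), ("A", "input", "g2"),
    ("B", "output", "g1"), ("B", "input", "g3"),
    ("C", "input", "g2"),
    ("D", "output", "g4"),
]


def neighbours_sharing_goods(edges, start):
    """Return {enterprise: [(label, goods), ...]} for every other enterprise
    that shares at least one goods node with the start enterprise."""
    # Hop 1 (outE().otherV()): goods attached to the start enterprise.
    goods_of_start = {goods for ent, _label, goods in edges if ent == start}
    # Hop 2 (inE().otherV()): other enterprises attached to those goods.
    # Excluding the start enterprise plays the role of simplePath() here.
    others = {ent for ent, _label, goods in edges
              if goods in goods_of_start and ent != start}
    # Hop 3 (outE()): all outgoing edges of those enterprises.
    return {ent: [(label, goods) for e, label, goods in edges if e == ent]
            for ent in sorted(others)}


result = neighbours_sharing_goods(edges, "A")
```

Here enterprises B and C are returned because they share g1 and g2 with A, while D is not reached. The returned edges are the raw material for the amount-share comparison in the similarity step.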
Specifically, calculating the purchase-and-sales goods similarity of the enterprises comprises the following.
The enterprise purchase-item similarity and the enterprise sales-item similarity are calculated, wherein the purchase-item similarity is calculated by the following formula:
[formula image not reproduced in the source text]
where N is the number of purchase-item goods shared by enterprises A and B, $ipa_i$ is the amount share of the i-th shared purchase-item goods in enterprise A, $ipb_i$ is the amount share of the i-th shared purchase-item goods in enterprise B, and similarity_in is the purchase-item similarity.
The sales-item similarity is calculated by the following formula:
[formula image not reproduced in the source text]
where M is the number of sales-item goods shared by enterprises A and B, $opa_i$ is the amount share of the i-th shared sales-item goods in enterprise A, $opb_i$ is the amount share of the i-th shared sales-item goods in enterprise B, and similarity_out is the sales-item similarity.
The purchase-and-sales similarity is calculated by the following formula:
similarity = (similarity_in + similarity_out) / 2
where similarity is the purchase-and-sales similarity.
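The sketch below shows how the three similarity values could be combined in code. The per-side formulas are not reproduced in this text, so the function `side_similarity` rests on an assumption made purely for illustration: it averages the two enterprises' summed amount shares over their shared goods. Only the final averaging step, similarity = (similarity_in + similarity_out) / 2, is taken directly from the text. All names and share values are hypothetical.

```python
def side_similarity(shares_a, shares_b):
    """ASSUMED per-side formula (not confirmed by the source): the mean of
    the two enterprises' total amount shares over their shared goods."""
    shared = shares_a.keys() & shares_b.keys()
    if not shared:
        return 0.0
    sum_a = sum(shares_a[g] for g in shared)  # sum of ipa_i (or opa_i)
    sum_b = sum(shares_b[g] for g in shared)  # sum of ipb_i (or opb_i)
    return (sum_a + sum_b) / 2


def overall_similarity(in_a, in_b, out_a, out_b):
    """Combine both sides as stated in the text:
    similarity = (similarity_in + similarity_out) / 2."""
    similarity_in = side_similarity(in_a, in_b)
    similarity_out = side_similarity(out_a, out_b)
    return (similarity_in + similarity_out) / 2


# Hypothetical amount shares per goods for enterprises A and B.
in_a = {"g1": 0.5, "g2": 0.5}
in_b = {"g1": 0.25, "g3": 0.75}
out_a = {"g4": 1.0}
out_b = {"g4": 0.5, "g5": 0.5}

similarity = overall_similarity(in_a, in_b, out_a, out_b)
```

Under the assumed per-side formula, the purchase side gives 0.375, the sales side gives 0.75, and the combined value is 0.5625. If the actual per-side formula differs, only `side_similarity` needs to change.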
Screening out enterprises with similar purchase and sales items according to the similarity result, and identifying enterprise risk through the enterprises' industry attributes.
Specifically, the enterprises whose purchase-item similarity, sales-item similarity, or purchase-and-sales similarity with the target enterprise exceeds a set threshold are selected as the target enterprise's purchase-item, sales-item, or purchase-and-sales similar enterprises respectively. After the similar enterprises are selected, their industry attributes are compared: if two enterprises with similar purchase-item and sales-item goods belong to clearly different industries, a risk of abnormal transaction information is identified.
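The screening and comparison step can be sketched as a threshold filter followed by an industry check. This is a minimal illustration, assuming that similarity scores against the target have already been computed and that each enterprise has a single industry attribute; the function name `flag_risks`, the industry labels, and the scores are hypothetical, and the default threshold of 0.5 mirrors the 50% value used in the embodiment.

```python
def flag_risks(target, similarities, industry, threshold=0.5):
    """Return the enterprises that are similar to the target (similarity
    above the threshold) but belong to a different industry."""
    similar = [ent for ent, score in similarities.items() if score > threshold]
    return [ent for ent in sorted(similar) if industry[ent] != industry[target]]


# Hypothetical similarity scores of other enterprises against target "A".
similarities = {"B": 0.8, "C": 0.6, "D": 0.3}
# Hypothetical industry attributes.
industry = {"A": "software", "B": "software", "C": "catering", "D": "catering"}

flagged = flag_risks("A", similarities, industry)
```

In this example B and C both pass the threshold, but only C is flagged, because its purchase and sales items resemble those of A while its registered industry differs.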
In another aspect, the invention also provides an enterprise risk identification system based on an enterprise purchase-and-sales relationship graph, comprising:
a data set construction module, which acquires enterprise information, performs word-segmentation matching on the acquired enterprise information, and establishes the correspondence between each enterprise and its matched purchase-item and sales-item goods;
a relationship graph construction module, which adds each enterprise and each purchase-item and sales-item goods in the data set to a graph database as a node, and adds the correspondence between each enterprise and its purchase-item and sales-item goods to the graph database as an edge;
a purchase-and-sales similarity calculation module, which queries the constructed relationship graph for the amount share that the purchase-item and sales-item goods shared by any two enterprises take up in their total purchase-item and sales-item goods, and calculates the purchase-and-sales goods similarity of the enterprises;
and a risk identification module, which screens out enterprises with similar purchase and sales items according to the similarity result and identifies enterprise risk through the enterprises' industry attributes.
For the tax field, the invention calculates the purchase-and-sales similarity of enterprises and compares the industry attributes of enterprises whose purchase and sales items are similar in order to identify enterprise risk, thereby helping tax supervision departments manage and analyze risky enterprises more effectively.
Specifically, the purchase-and-sales similarity calculation module calculates:
an enterprise purchase-item similarity and an enterprise sales-item similarity, wherein the purchase-item similarity is calculated by the following formula:
[formula image not reproduced in the source text]
where N is the number of purchase-item goods shared by enterprises A and B, $ipa_i$ is the amount share of the i-th shared purchase-item goods in enterprise A, $ipb_i$ is the amount share of the i-th shared purchase-item goods in enterprise B, and similarity_in is the purchase-item similarity;
the sales-item similarity is calculated by the following formula:
[formula image not reproduced in the source text]
where M is the number of sales-item goods shared by enterprises A and B, $opa_i$ is the amount share of the i-th shared sales-item goods in enterprise A, $opb_i$ is the amount share of the i-th shared sales-item goods in enterprise B, and similarity_out is the sales-item similarity;
the purchase-and-sales similarity is calculated by the following formula:
similarity = (similarity_in + similarity_out) / 2
where similarity is the purchase-and-sales similarity.
Specifically, the correspondence queries used in calculating the purchase-and-sales similarity comprise:
querying the purchase-and-sales correspondences of the enterprises using the Gremlin graph query language.
Example 1
Referring to FIG. 1, the invention provides an enterprise risk identification method based on an enterprise purchase-and-sales relationship graph, comprising the following steps:
S1, acquiring enterprise tax data and constructing a data set: enterprise information is acquired from the enterprise tax data, word-segmentation matching is performed on the acquired enterprise information, and the correspondence between each enterprise and its matched purchase-item and sales-item goods is established.
Specifically, enterprise information is acquired from the tax data on invoices, wherein the enterprise information mainly comprises the seller taxpayer identification number, seller taxpayer name, supplier taxpayer identification number, supplier taxpayer name, transaction amount, goods sold, and transaction time, as shown in Table a.
Table a [table not reproduced in the source text]
The seller taxpayer names are stored as a word-segmentation dictionary for matching the correspondences. For example, if the acquired enterprise names are company A and company B, then company A and company B are stored in the dictionary file. In addition, enterprise abbreviations can be mapped to full enterprise names through manual screening, which improves the ability to match abbreviations.
The purchase-item and sales-item goods in the acquired enterprise information are matched with a word-segmentation algorithm, and the enterprise names and the names of the matched goods are stored, thereby establishing the correspondence between each enterprise and its purchase-item and sales-item goods. For example, if the information to be matched for company A includes "goods 1, goods 2 and goods 3", the word-segmentation algorithm, used together with the enterprise name dictionary, matches goods 1, goods 2 and goods 3, so that the correspondence between company A and goods 1, goods 2 and goods 3 is established.
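The abbreviation handling mentioned in step S1 can be pictured as a manually curated lookup applied before dictionary matching. The sketch below is only one possible arrangement; the alias table, the names in it, and the function `normalize_name` are hypothetical.

```python
# Manually curated mapping from abbreviations to full enterprise names.
# Entries are hypothetical.
ALIASES = {
    "Co. A": "company A",
    "Co. B": "company B",
}


def normalize_name(name, aliases):
    """Map a known abbreviation to its full enterprise name; return the
    trimmed name unchanged when no alias is registered."""
    cleaned = name.strip()
    return aliases.get(cleaned, cleaned)


raw_names = ["Co. A", " company B ", "company C"]
full_names = [normalize_name(name, ALIASES) for name in raw_names]
```

Normalizing names first means the word-segmentation dictionary only needs to hold full enterprise names, while abbreviations seen in invoice text still resolve to the same graph node.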
S2, building the relationship graph in the graph database JanusGraph: each enterprise and each purchase-item and sales-item goods in the data set is added to the distributed graph database JanusGraph as a node, and the correspondence between each enterprise and its purchase-item and sales-item goods is added to the distributed graph database JanusGraph as an edge.
It should be noted that the data structure of the enterprise relationship graph is a directed graph. Because the data set consists mainly of information from enterprises' invoice data, it can be converted into a directed graph structure and imported into the JanusGraph database. Building the directed graph structure comprises constructing nodes and constructing edges.
Constructing nodes comprises:
each enterprise in the data set is added to the graph database as a node with the label nsr, whose properties include the enterprise name, the enterprise tax number, and a unique identification id; each purchase-item and sales-item goods in the data set is added to the graph database as a node with the label computability.
Constructing edges comprises:
the correspondence between each enterprise and its purchase-item and sales-item goods is added to the graph database as an edge, where edges to purchase-item goods are labeled input and edges to sales-item goods are labeled output. Information such as transaction time and transaction amount is added to the edge properties, which makes it easy to filter by time and to calculate goods amount shares at query time. For example, to calculate the amount share of one type of goods among all goods of enterprise A: the sales items of enterprise A comprise three goods, namely goods 1 (amount a), goods 2 (amount b), and goods 3 (amount c), so the amount share of goods 3 is c/(a+b+c), as shown in FIG. 2.
S3, querying the correspondences using the Gremlin query language.
This specifically comprises the following:
Enterprise node query:
The target enterprise is queried by the 'nsr' label, and the node information is obtained using the Gremlin query language. For example, to query the enterprise node whose enterprise name is "Aerospace Information", the following statement may be used:
g.V().hasLabel('nsr').has('NSRMC', 'Aerospace Information').toList()
Query of nodes and edges with similar purchase and sales items:
After the enterprise information is obtained through the enterprise node query, the enterprise's 'id' is used to quickly find all nodes and edges with similar purchase and sales items within a single level. Taking "Aerospace Information" as an example, the following statement is used: g.V('id').outE().otherV().inE().otherV().outE().simplePath().toList()
S4, calculating the similarity of enterprise entry and sales items: according to the constructed relationship graph, the amount proportion that the entry and sales goods shared between any two enterprises occupy in their total entry and sales goods is queried, and the similarity of the enterprises' entry and sales goods is calculated.
Specifically, the calculation of the similarity of enterprise entry and sales goods comprises the following steps:
calculating the entry similarity and the sales similarity of the enterprises, wherein the specific process of calculating the entry similarity is as follows:
the entry goods of enterprise A and enterprise B are A: {ia_1, ia_2, …, ia_m} and B: {ib_1, ib_2, …, ib_x} respectively, and their amount proportions are A: {ipa_1, ipa_2, …, ipa_m} and B: {ipb_1, ipb_2, …, ipb_x} respectively. Enterprise A and enterprise B have N identical entry goods (N ≥ 0; N ≤ m; N ≤ x).
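The quantities defined here can be prepared as in the following Python sketch, which finds the entry goods shared by two enterprises and the amount proportion of each shared goods within each enterprise (the ipa and ipb values). The sketch stops at these inputs and does not attempt the similarity formula itself. The dictionary layout and the function name `shared_goods_proportions` are assumptions for illustration.

```python
def shared_goods_proportions(amounts_a, amounts_b):
    """Return [(goods, proportion in A, proportion in B)] for goods present in both.

    amounts_a / amounts_b map goods name -> total entry amount for that enterprise.
    """
    total_a = sum(amounts_a.values())
    total_b = sum(amounts_b.values())
    if total_a == 0 or total_b == 0:
        return []  # proportions are undefined without any amount
    shared = sorted(set(amounts_a) & set(amounts_b))
    return [(g, amounts_a[g] / total_a, amounts_b[g] / total_b) for g in shared]

# Hypothetical entry amounts for enterprises A and B.
entry_a = {"steel": 60.0, "copper": 40.0}
entry_b = {"steel": 25.0, "paper": 75.0}
rows = shared_goods_proportions(entry_a, entry_b)
n_shared = len(rows)  # N, the number of identical entry goods
print(n_shared, rows)
```

The same helper applies unchanged to sales goods, giving the opa and opb proportions.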
The formula of the entry similarity calculation is as follows:
the specific process for calculating the sales similarity is as follows:
the sales goods of enterprise A and enterprise B are A: {oa_1, oa_2, …, oa_n} and B: {ob_1, ob_2, …, ob_y} respectively, and their amount proportions are A: {opa_1, opa_2, …, opa_n} and B: {opb_1, opb_2, …, opb_y} respectively. Enterprise A and enterprise B have M identical sales goods (M ≥ 0; M ≤ n; M ≤ y).
The formula of the sales similarity calculation is as follows:
the calculation formula of the similarity of the entry and the sales items is as follows:
similar=(similar_in+similar_out)/2
where similar is defined as the similarity of the entry and sales items, similar_in is the entry similarity, and similar_out is the sales similarity.
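As a small worked illustration of the combined formula, the Python sketch below averages an entry similarity and a sales similarity. The function name `combined_similarity` and the sample values are assumptions; the range check reflects the assumption that both inputs are proportions between 0 and 1.

```python
def combined_similarity(similar_in, similar_out):
    """similar = (similar_in + similar_out) / 2"""
    for value in (similar_in, similar_out):
        if not 0.0 <= value <= 1.0:
            raise ValueError("similarities are assumed to lie in [0, 1]")
    return (similar_in + similar_out) / 2

# Hypothetical values: entry similarity 0.5, sales similarity 0.75.
similar = combined_similarity(0.5, 0.75)
print(similar)  # 0.625
```

Because the two parts are weighted equally, an enterprise pair that matches closely on only one side cannot score above 50% plus half of the other side's similarity.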
S5, screening enterprises with similar entry and sales items
Enterprises whose entry similarity, sales similarity, or entry and sales similarity with the target enterprise is larger than the set quantity are screened out and taken as the target enterprise's entry-similar, sales-similar, and entry-and-sales-similar enterprises respectively.
In this embodiment, the set quantity is 50%. An enterprise relationship graph covering massive data is constructed on the distributed graph database JanusGraph, and graph calculation is realized through the Gremlin query language. This overcomes the shortcomings of traditional databases and the single-machine Neo4j graph database in graph storage and graph mining, and addresses problems such as large data volume and high development cost.
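The screening step can be illustrated with the short Python sketch below, which keeps only the enterprises whose similarity to the target is strictly larger than the set quantity of 50%. The function name `screen_similar` and the sample scores are assumptions for illustration.

```python
def screen_similar(similarities, set_quantity=0.5):
    """Return the names of enterprises whose similarity exceeds the set quantity."""
    return sorted(name for name, score in similarities.items() if score > set_quantity)

# Hypothetical similarities between the target enterprise and three others.
scores = {"B": 0.72, "C": 0.5, "D": 0.31}
similar_enterprises = screen_similar(scores)
print(similar_enterprises)  # ['B']
```

Enterprise C sits exactly at 50% and is excluded, since the text requires the similarity to be larger than the set quantity.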
S6, identifying enterprise risk through enterprise industry attributes
The industry attributes of the enterprises with similar entry and sales items are compared; if two enterprises with similar entry and sales items have inconsistent industry attributes, the enterprise is judged to carry a risk of abnormal entry and sales information.
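A minimal Python sketch of this comparison follows. The text does not spell out the exact comparison rule, so the sketch assumes one plausible reading: an enterprise that is similar to the target in entry and sales items but registered under a different industry is flagged for review. The function name `flag_industry_mismatches`, the industry labels, and the choice to skip enterprises with no recorded industry are all assumptions.

```python
def flag_industry_mismatches(target, similar_enterprises, industry_of):
    """Return similar enterprises whose recorded industry differs from the target's.

    Assumed rule (not confirmed by the text): a differing industry is the risk signal.
    """
    target_industry = industry_of[target]
    return [
        name
        for name in similar_enterprises
        # Enterprises without a recorded industry are skipped, not flagged.
        if name in industry_of and industry_of[name] != target_industry
    ]

# Hypothetical industry attributes.
industry_of = {"A": "software", "B": "software", "E": "catering"}
flagged = flag_industry_mismatches("A", ["B", "E", "F"], industry_of)
print(flagged)  # ['E']
```

In practice the flagged list would be a starting point for review by the tax supervision department rather than a final judgment.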
Example 2
The invention also provides an enterprise risk identification system based on the enterprise marketing relationship graph, which comprises:
a data set construction module, which acquires enterprise information from enterprise tax data, performs word segmentation matching on the acquired enterprise information, and establishes the correspondence between each enterprise and the matched entry and sales goods;
a relationship graph construction module, which adds each enterprise and the entry and sales goods in the data set to a graph database as nodes, and adds the correspondence between each enterprise and the entry and sales goods to the graph database as edges;
an enterprise entry and sales item similarity calculation module, which, according to the constructed relationship graph, queries the amount proportion that the entry and sales goods shared between any two enterprises occupy in their total entry and sales goods, and calculates the similarity of the enterprises' entry and sales goods;
and a risk identification module, which screens enterprises with similar entry and sales items according to the similarity calculation result and identifies enterprise risk through enterprise industry attributes.
Aiming at the tax field, the method and the system take the weights of an enterprise's main and auxiliary goods into account when calculating the similarity of entry and sales items, and identify enterprise risk by comparing the industry attributes of enterprises with similar entry and sales items, thereby helping the tax supervision department manage and analyze risk enterprises more effectively.
Specifically, the enterprise entry and sales item similarity calculation module covers:
the enterprise entry similarity and the enterprise sales similarity, wherein the calculation formula of the entry similarity is as follows:
wherein N is the number of entry goods shared between enterprises A and B, ipa_i is defined as the amount proportion of the i-th shared entry goods in enterprise A, ipb_i is defined as the amount proportion of the i-th shared entry goods in enterprise B, and similar_in is defined as the entry similarity;
the calculation formula of the sales similarity is as follows:
wherein M is the number of sales goods shared between enterprises A and B, opa_i is defined as the amount proportion of the i-th shared sales goods in enterprise A, opb_i is defined as the amount proportion of the i-th shared sales goods in enterprise B, and similar_out is defined as the sales similarity;
the calculation formula of the similarity of the entry and the sales items is as follows:
similar=(similar_in+similar_out)/2
where similar is defined as the similarity of the entry and sales items.
Specifically, in the enterprise entry and sales item similarity calculation, the query of the correspondence comprises:
querying the correspondence between enterprises and entry and sales items with the Gremlin graph query language.
Aiming at the tax field, the method and the system take the weights of an enterprise's main and auxiliary goods into account when calculating the similarity of entry and sales items, and identify enterprise risk by comparing the industry attributes of enterprises with similar entry and sales items, so that the tax supervision department can manage and analyze risk enterprises more effectively.
The foregoing description of embodiments of the invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described.
Claims (8)
1. An enterprise risk identification method based on an enterprise marketing relationship graph, characterized by comprising the following steps:
constructing a data set: acquiring enterprise information from enterprise tax data, performing word segmentation matching on the acquired enterprise information, and establishing the correspondence between each enterprise and the matched entry and sales goods;
constructing a relationship graph: adding each enterprise and the entry and sales goods in the data set to a graph database as nodes, and adding the correspondence between each enterprise and the entry and sales goods to the graph database as edges;
calculating the similarity of enterprise entry and sales items: according to the constructed relationship graph, querying the amount proportion that the entry and sales goods shared between any two enterprises occupy in their total entry and sales goods, and calculating the similarity of the enterprises' entry and sales goods;
screening enterprises with similar entry and sales items according to the similarity calculation result, and identifying enterprise risk through enterprise industry attributes;
wherein the calculation of the similarity of enterprise entry and sales goods comprises:
an enterprise entry similarity and an enterprise sales similarity, wherein the calculation formula of the entry similarity is as follows:
,
wherein N is the number of entry goods shared between enterprises A and B, ipa_i is defined as the amount proportion of the i-th shared entry goods in enterprise A, ipb_i is defined as the amount proportion of the i-th shared entry goods in enterprise B, and similar_in is defined as the entry similarity;
the calculation formula of the sales similarity is as follows:
,
wherein M is the number of sales goods shared between enterprises A and B, opa_i is defined as the amount proportion of the i-th shared sales goods in enterprise A, opb_i is defined as the amount proportion of the i-th shared sales goods in enterprise B, and similar_out is defined as the sales similarity;
the calculation formula of the similarity of the entry and sales items is as follows:
similar=(similar_in+similar_out)/2,
where similar is defined as the similarity of the entry and sales items.
2. The enterprise risk identification method based on the enterprise marketing relationship graph according to claim 1, wherein the data set construction comprises:
acquiring enterprise information from invoice data, wherein the enterprise information comprises a seller taxpayer identification number, a seller taxpayer name, a supplier taxpayer identification number, a supplier taxpayer name, a transaction amount, the entry and sales goods, and a transaction time;
storing the names of the seller taxpayers as a word dictionary for matching the correspondence;
and, for the acquired enterprise information, matching the entry and sales goods with a word segmentation algorithm and storing the enterprise name and the names of the entry and sales goods, so that the correspondence between the enterprise and the entry and sales goods is established.
3. The enterprise risk identification method based on the enterprise marketing relationship graph according to claim 1, wherein the enterprise marketing relationship graph is constructed using the distributed graph database JanusGraph.
4. The enterprise risk identification method based on the enterprise marketing relationship graph according to claim 1, wherein screening enterprises with similar entry and sales items according to the similarity calculation result and identifying enterprise risk through enterprise industry attributes comprises:
screening enterprises whose entry similarity, sales similarity, or entry and sales similarity with the target enterprise is larger than a set quantity, taking them as the target enterprise's entry-similar, sales-similar, and entry-and-sales-similar enterprises, comparing the industry attributes of the enterprises with similar entry and sales items, and identifying enterprise risk.
5. The enterprise risk identification method based on the enterprise marketing relationship graph according to claim 1, wherein adding the correspondence between each enterprise and the entry and sales goods to the graph database as edges comprises adding transaction time and transaction amount information to the attributes of the edges.
6. The enterprise risk identification method based on the enterprise marketing relationship graph according to claim 1, wherein, in calculating the similarity of enterprise entry and sales items, the query of the correspondence comprises:
querying the correspondence between enterprises and entry and sales items with the Gremlin graph query language.
7. An enterprise risk identification system based on an enterprise marketing relationship graph, comprising:
a data set construction module, which acquires enterprise information from enterprise tax data, performs word segmentation matching on the acquired enterprise information, and establishes the correspondence between each enterprise and the matched entry and sales goods;
a relationship graph construction module, which adds each enterprise and the entry and sales goods in the data set to a graph database as nodes, and adds the correspondence between each enterprise and the entry and sales goods to the graph database as edges;
an enterprise entry and sales item similarity calculation module, which, according to the constructed relationship graph, queries the amount proportion that the entry and sales goods shared between any two enterprises occupy in their total entry and sales goods, and calculates the similarity of the enterprises' entry and sales goods;
a risk identification module, which screens enterprises with similar entry and sales items according to the similarity calculation result and identifies enterprise risk through enterprise industry attributes;
wherein the enterprise entry and sales item similarity calculation module covers:
an enterprise entry similarity and an enterprise sales similarity, wherein the calculation formula of the entry similarity is as follows:
,
wherein N is the number of entry goods shared between enterprises A and B, ipa_i is defined as the amount proportion of the i-th shared entry goods in enterprise A, ipb_i is defined as the amount proportion of the i-th shared entry goods in enterprise B, and similar_in is defined as the entry similarity;
the calculation formula of the sales similarity is as follows:
,
wherein M is the number of sales goods shared between enterprises A and B, opa_i is defined as the amount proportion of the i-th shared sales goods in enterprise A, opb_i is defined as the amount proportion of the i-th shared sales goods in enterprise B, and similar_out is defined as the sales similarity;
the calculation formula of the similarity of the entry and sales items is as follows:
similar=(similar_in+similar_out)/2,
where similar is defined as the similarity of the entry and sales items.
8. The enterprise risk identification system based on the enterprise marketing relationship graph according to claim 7, wherein, in the enterprise entry and sales item similarity calculation, the query of the correspondence comprises:
querying the correspondence between enterprises and entry and sales items with the Gremlin graph query language.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011224147.0A CN112328839B (en) | 2020-11-05 | 2020-11-05 | Enterprise risk identification method and system based on enterprise marketing relationship graph |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112328839A CN112328839A (en) | 2021-02-05 |
CN112328839B CN112328839B (en) | 2024-02-27 |
Family
ID=74315843
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011224147.0A Active CN112328839B (en) | 2020-11-05 | 2020-11-05 | Enterprise risk identification method and system based on enterprise marketing relationship graph |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112328839B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104636971A (en) * | 2013-11-06 | 2015-05-20 | 航天信息股份有限公司 | Method of detecting one number for multiple names of value added tax invoice and system thereof |
CN105183767A (en) * | 2015-07-31 | 2015-12-23 | 山东大学 | Enterprise network-based enterprise business similarity calculation method and system |
KR101623322B1 (en) * | 2015-12-09 | 2016-05-20 | 최승출 | Syatem for collecting electronic tax invoice and making a erp connection and method thereof |
EP3255586A1 (en) * | 2016-06-06 | 2017-12-13 | Fujitsu Limited | Method, program, and apparatus for comparing data graphs |
CN108242020A (en) * | 2016-12-26 | 2018-07-03 | 航天信息股份有限公司 | A kind of method and system for calculating income and selling diversity factor between item item lists |
CN109615153A (en) * | 2017-09-26 | 2019-04-12 | 阿里巴巴集团控股有限公司 | Businessman's methods of risk assessment, device, equipment and storage medium |
CN111695979A (en) * | 2020-06-18 | 2020-09-22 | 税友软件集团股份有限公司 | Method, device and equipment for analyzing relation between raw material and finished product |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9053437B2 (en) * | 2008-11-06 | 2015-06-09 | International Business Machines Corporation | Extracting enterprise information through analysis of provenance data |
US20120323618A1 (en) * | 2011-06-17 | 2012-12-20 | Sap Ag | Case-based retrieval of integration cases using similarity measures based on a business deomain ontology |
US8983936B2 (en) * | 2012-04-04 | 2015-03-17 | Microsoft Corporation | Incremental visualization for structured data in an enterprise-level data store |
Non-Patent Citations (2)
Title |
---|
Approximation capability of fuzzy input-output systems based on similarity measures; Li Yong-ming et al.; Proceedings of the 3rd World Congress on Intelligent Control and Automation; pp. 1790-1794 * |
Comparative analysis of payment management in the JCT and FIDIC design-and-build conditions of contract; Wang Minghao; Petrochemical Design (石油化工设计); pp. 30-34 * |
Also Published As
Publication number | Publication date |
---|---|
CN112328839A (en) | 2021-02-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8180779B2 (en) | System and method for using external references to validate a data object's classification / consolidation | |
JP6301516B2 (en) | Fuzzy data manipulation | |
CN108985912B (en) | Data reconciliation | |
SG186687A1 (en) | Methods and systems for implementing approximate string matching within a database | |
KR20220092040A (en) | Customized competitive reinforcement consulting system by enterprise | |
CN112131348B (en) | Method for preventing repeated declaration of project based on similarity of text and image | |
CN110825817B (en) | Enterprise suspected association judgment method and system | |
Niu et al. | Reliability assessment of a multi-state distribution network under cost and spoilage considerations | |
US20150081728A1 (en) | Automatic format conversion | |
CN112328839B (en) | Enterprise risk identification method and system based on enterprise marketing relationship graph | |
CN111951081A (en) | System for enabling each material to be attached with information attribute and constructing scene by using data | |
AU2017201787B2 (en) | Fuzzy data operations | |
US20070226085A1 (en) | System and method for automated mapping of data in a multi-valued data structure | |
CN112950301A (en) | One-stop enterprise information service platform | |
Achmad Kuncoro | Factors that affect competitive advantage in freight forwarding industry on Jakarta-Indonesia | |
US20070214139A1 (en) | System and method for mapping data in a multi-valued data structure | |
JP7519795B2 (en) | Natural language processing device and program | |
US11694276B1 (en) | Process for automatically matching datasets | |
JP7519793B2 (en) | Natural language processing device and program | |
Abdalla et al. | Leverage data quality improvement for big data analytics | |
Conradie et al. | The necessity of information quality for effective business intelligence | |
CN117151764A (en) | Updating method, query tool and ERP system of expense information base | |
CN117634996A (en) | Lightweight product modeling method | |
Atmojo et al. | Warehouse Location Optimization with Clustering Analysis to Minimize Shipping Costs in Indonesia’s E-Commerce Case |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||