CN117609362A - Data processing method, device, computer equipment and storage medium - Google Patents
Data processing method, device, computer equipment and storage medium Download PDFInfo
- Publication number
- CN117609362A CN117609362A CN202311368971.7A CN202311368971A CN117609362A CN 117609362 A CN117609362 A CN 117609362A CN 202311368971 A CN202311368971 A CN 202311368971A CN 117609362 A CN117609362 A CN 117609362A
- Authority
- CN
- China
- Prior art keywords
- data
- target
- aggregation
- source
- aggregate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000003672 processing method Methods 0.000 title abstract description 10
- 230000002776 aggregation Effects 0.000 claims abstract description 92
- 238000004220 aggregation Methods 0.000 claims abstract description 92
- 238000005192 partition Methods 0.000 claims abstract description 61
- 238000000034 method Methods 0.000 claims abstract description 37
- 238000012545 processing Methods 0.000 claims abstract description 25
- 230000015654 memory Effects 0.000 claims description 24
- 230000004931 aggregating effect Effects 0.000 claims description 4
- 238000004891 communication Methods 0.000 claims description 4
- 238000013507 mapping Methods 0.000 claims description 4
- 230000003993 interaction Effects 0.000 abstract description 3
- 230000008569 process Effects 0.000 description 5
- 238000004458 analytical method Methods 0.000 description 4
- 230000006870 function Effects 0.000 description 4
- 238000004364 calculation method Methods 0.000 description 3
- 230000008878 coupling Effects 0.000 description 3
- 238000010168 coupling process Methods 0.000 description 3
- 238000005859 coupling reaction Methods 0.000 description 3
- 238000010586 diagram Methods 0.000 description 3
- 230000008439 repair process Effects 0.000 description 3
- 239000000306 component Substances 0.000 description 2
- 238000013523 data management Methods 0.000 description 2
- 238000013500 data storage Methods 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000012544 monitoring process Methods 0.000 description 2
- 230000000717 retained effect Effects 0.000 description 2
- 239000007787 solid Substances 0.000 description 2
- 208000025174 PANDAS Diseases 0.000 description 1
- 208000021155 Paediatric autoimmune neuropsychiatric disorders associated with streptococcal infection Diseases 0.000 description 1
- 240000000220 Panda oleosa Species 0.000 description 1
- 235000016496 Panda oleosa Nutrition 0.000 description 1
- 238000013459 approach Methods 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000008901 benefit Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 239000008358 core component Substances 0.000 description 1
- 230000002354 daily effect Effects 0.000 description 1
- 230000003111 delayed effect Effects 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000003203 everyday effect Effects 0.000 description 1
- 230000002349 favourable effect Effects 0.000 description 1
- 235000013305 food Nutrition 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 238000010295 mobile communication Methods 0.000 description 1
- 230000007935 neutral effect Effects 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000006116 polymerization reaction Methods 0.000 description 1
- 238000001303 quality assessment method Methods 0.000 description 1
- 238000007619 statistical method Methods 0.000 description 1
- 230000001360 synchronised effect Effects 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
- 239000002699 waste material Substances 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2365—Ensuring data consistency and integrity
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Computer Security & Cryptography (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the technical field of computers, and discloses a data processing method, a device, electronic equipment and a storage medium, wherein the method comprises the following steps: acquiring a data table generated in a current time period, wherein the data table comprises unprocessed source data in the current time period; performing aggregation operation on source data in a data table to obtain target aggregation data; traversing a data summary table to determine a target partition of target aggregate data in the data summary table, and writing the target aggregate data into the target partition, wherein the data summary table is used for storing the aggregate data in each time period; and exporting the target aggregation data in the target partition to a target data table. In the embodiment of the application, a big data asynchronous derivative mode is adopted to decouple a source database; the flexibility of data processing is improved, multiple multi-dimensional processing aggregation can be performed on the data, and data interaction between different database types can be supported by exporting the aggregated data.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data processing method, a data processing device, an electronic device, and a storage medium.
Background
Now, a more common method for synchronizing data in an original database to a target database is as follows: business systems based on Oracle databases. To facilitate synchronizing data for different database instances, data is queried from other database instances in a timed manner and synchronized into the materialized view. This approach can only be applied to early or smaller-scale traffic. However, as the services develop, each service starts to split the service and reform the micro service. In the reconstruction process, the cost of the database is increased, the resource consumption is high, the efficiency is greatly reduced, the original database is greatly influenced, and further, some applications need to remove the Oracle database with higher resource consumption, and the Mysql database with lower resource consumption is adopted. At this time, the data synchronization mode of Oracle materialized view synchronization with high coupling degree can not meet the requirement of service development.
Disclosure of Invention
In view of the above, the embodiments of the present invention provide a data processing method, apparatus, electronic device, and storage medium, so as to solve the problems of data synchronization resource waste and higher data coupling degree in large-scale service splitting and micro-service transformation.
In a first aspect, an embodiment of the present invention provides a data processing method, where the method includes:
acquiring a data table generated in a current time period, wherein the data table comprises unprocessed source data in the current time period;
performing aggregation operation on the source data in the data table to obtain target aggregation data;
traversing a data summary table to determine a target partition of the target aggregate data in the data summary table, and writing the target aggregate data into the target partition, wherein the data summary table is used for storing aggregate data in each time period;
and exporting the target aggregation data in the target partition to a target data table.
Optionally, the acquiring the data table generated in the current time period includes:
inquiring unprocessed source data in at least one source database based on the current time period, wherein the source data is stored in a source data table in the source database;
storing the source data to a distributed file system;
and mapping the source data in the distributed file system into the data table by using a preset query statement, wherein the table structure of the data table is consistent with the table structure of the source data table corresponding to the source data.
Optionally, the aggregating operation is performed on the source data in the data table to obtain target aggregate data, including:
acquiring data attributes corresponding to the source data;
determining an aggregation mode corresponding to the data attribute;
and performing aggregation operation on the source data according to the aggregation mode to obtain the target aggregation data.
Optionally, after performing an aggregation operation on the source data in the data table to obtain target aggregate data, the method further includes:
detecting the data quantity corresponding to the source data in the data table;
acquiring adjacent partitions of the target partition from the aggregate table under the condition that the data volume is greater than or equal to a preset threshold;
comparing the aggregation data of the adjacent partitions with the target aggregation data to obtain incremental data;
and storing the increment data into an increment data table.
Optionally, after exporting the aggregate data to a target data table based on the target partition, the method further comprises:
detecting whether the data volume of the target data table meets a preset requirement or not;
and generating alarm information under the condition that the data quantity does not meet the preset requirement.
Optionally, the method further comprises:
under the condition that the data volume meets the preset requirement, comparing the target aggregation data in the target data table with the service data in the service database to obtain a comparison result;
and executing corresponding processing operation according to the comparison result.
Optionally, the executing a corresponding processing operation according to the comparison result includes:
inserting the target aggregation data into the service database under the condition that the comparison result is that the target aggregation data does not exist in the service database;
and under the condition that the service data is inconsistent with the target aggregation data as a result of the comparison, updating the service database by utilizing the target aggregation data.
In a second aspect, an embodiment of the present invention provides a data processing apparatus, including:
the acquisition module is used for acquiring a data table generated in the current time period, wherein the data table comprises unprocessed source data in the current time period;
the aggregation module is used for carrying out aggregation operation on the source data in the data table to obtain target aggregation data;
the storage module is used for traversing a data summary table to determine a target partition of the aggregated data in the data summary table and writing the aggregated data into the target partition, wherein the data summary table is used for storing the aggregated data in each time period;
and the export module is used for exporting the aggregate data to a target data table based on the target partition.
In a third aspect, an embodiment of the present invention provides a computer apparatus, including: the memory and the processor are in communication connection, the memory stores computer instructions, and the processor executes the computer instructions to perform the method of the first aspect or any implementation manner corresponding to the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of the first aspect or any of its corresponding embodiments.
The method provided by the embodiment of the application has the following beneficial effects:
in the embodiment of the application, a big data asynchronous derivative mode is adopted to decouple a source database; the flexibility of data processing is improved, multiple multi-dimensional processing aggregation can be performed on the data, and data interaction between different database types can be supported by exporting the aggregated data.
Specifically, the embodiment of the application realizes the timing generation of the data table by acquiring the data table generated in the current time period, so that the automatic processing of aggregation and export of the source data in the current time period is performed, and the workload of manual operation and the possibility of errors are reduced. And the latest data can be processed in time, so that the aggregated and exported data is ensured to be based on the latest source data, and the latest aggregated data is stored in a data table. The target partition of the target aggregate data in the data summary table is determined by traversing the data summary table, and the target aggregate data is written into the target partition, so that the target partition can be flexibly selected for data storage and management according to the requirement.
In addition, by storing aggregated data over various time periods in a data table, historical data may be retained and historical queries supported. Therefore, the aggregation result in each time period can be conveniently traced and analyzed, and basis is provided for decision and analysis.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow diagram of a data processing method according to some embodiments of the invention;
FIG. 2 is a block diagram of a data processing apparatus according to an embodiment of the present invention;
fig. 3 is a schematic hardware structure of a computer device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
According to embodiments of the present invention, there is provided a data processing method, apparatus, electronic device, and storage medium, it should be noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from that herein.
In this embodiment, a data processing method is provided, which may be used in the above mobile terminal, such as a mobile phone, a tablet computer, etc., fig. 1 is a flowchart of a data processing method according to an embodiment of the present invention, and as shown in fig. 1, the flowchart includes the following steps:
step S11, a data table generated in the current time period is obtained, wherein the data table comprises unprocessed source data in the current time period.
The method provided by the embodiment of the application is applied to a big data platform, wherein the big data platform can be AirFlow, airFlow an open source platform for managing, dispatching and monitoring data flow. It provides a unified interface across multiple tasks and data sources that enables users to easily write, schedule, and monitor complex data flows. The air flow is mainly composed of three core components: scheduler (Scheduler): is responsible for scheduling and executing tasks according to a predetermined schedule. Actuator (Executor): is responsible for performing tasks in a distributed environment. Metadata base (Metadata Database): for storing metadata related to task scheduling, execution and monitoring.
In the embodiment of the application, the data table generated in the current time period is obtained, and the method comprises the following steps of A1-A3:
and step A1, inquiring unprocessed source data in at least one source database based on the current time period, wherein the source data is stored in a source data table in the source database.
In the embodiment of the present application, the current time period may be 1 day, 1 week or N hours, and the specific time period may be configured on the scheduler in advance.
In this embodiment of the present application, the source database is a database deployed outside the big data platform, where the source database may be an Oracle database or a Mysql database, and the source database is used to store application data of related applications, where the related applications may be house renting applications, and the application data may be browsing volume, collection volume, and volume of deals of a house source. The related application may also be furniture repair, the application data may be furniture repair type, repair number, etc.
And step A2, storing the source data to a distributed file system.
In the embodiment of the application, the distributed file system HDFS (Hadoop Distributed File System) is a Hadoop distributed file system, which is used for storing full-volume table data imported from a structural database (Oracle/MySql) and aggregated intermediate data.
And step A3, mapping the source data in the distributed file system into a data table by using a preset query statement, wherein the table structure of the data table is consistent with the table structure of the source data table corresponding to the source data.
As an example, assuming an Oracle database table, to create a table in Hive that is the same as the table structure, a table containing the same fields as the source data table may be defined using the Hive table. Because the structure of the Hive table is the same as that of the source data table, the data can be conveniently converted, processed and interacted by using the SQL statement of Hive to query and process the data, such as executing aggregation query, connecting other tables and the like.
And step S12, performing aggregation operation on the source data in the data table to obtain target aggregation data.
In the embodiment of the application, the aggregation operation is performed on the source data in the data table to obtain the target aggregated data, which comprises the following steps of:
step B1, obtaining data attributes corresponding to source data;
in the embodiment of the present application, the acquired source data may include the following corresponding attributes:
nominal properties: the values of the nominal attributes are names of symbols or entities, each representing a certain class, code or state, so the nominal attributes are in turn regarded as classified attributes. These values do not have to be in a meaningful order and are not quantitative. For example, the types of products can be distinguished into a variety of: daily necessities, foods, etc.
Binary attribute: a nominal attribute, there are only two categories or states: 0 or 1, where 0 often indicates no occurrence and 1 indicates occurrence. If 0 and 1 are to be corresponding to false and true, the binary attribute is the Boolean attribute. For example, a product is in a sold or unsold state, with only attributes in both the sold and unsold states.
Ordinal attribute: ordinal attributes may be used to record subjective quality assessment that cannot be objectively measured. Ordinal attributes are therefore commonly used for rating surveys. If the customer service quality of a sales department is evaluated, 0 indicates that the customer is very unsatisfied, 1 indicates that the customer is not very satisfied, 2 indicates that the customer is neutral, 3 indicates that the customer is satisfied, and 4 indicates that the customer is very satisfied.
In the embodiment of the present application, the data attribute may also be transaction class data, for example: sales, volume of deals, etc., and may also be service class data, such as: user rating data, after-market data, user behavior data, and the like.
The method for acquiring the data attribute corresponding to the source data is favorable for selecting a more proper mode for aggregation processing, and excessive data coupling is avoided.
Step B2, determining an aggregation mode corresponding to the data attribute;
in embodiments of the present application, the polymerization process may include the following:
combining: the records in the plurality of source data are combined into a whole, typically by matching, associating or aggregating the records according to some common identifier or key. For example, sales data for the same product is consolidated from data sources from different channels into an overall sales data set.
And (3) calculating: and carrying out mathematical calculation or statistical analysis on the plurality of source data to obtain a calculation result. For example, calculation operations such as summation, average value, maximum value, minimum value, etc. are performed on sales data of a plurality of regions, and an index such as the total sales, average sales, or maximum sales is obtained.
Summarizing: and summarizing certain specific dimensions of the plurality of source data to obtain a comprehensive summary result. For example, sales data of a plurality of products are collected by month to obtain a total sales amount or sales amount per month.
Such an aggregation operation may help us extract useful information from the raw, scattered data, forming a higher level view of the data or analysis results.
And step B3, performing aggregation operation on the source data according to an aggregation mode to obtain target aggregation data.
As an example, assume that there is a summary table named "sales_data" for storing all sales data. And data aggregation operation is carried out every day, and the sales data of the same day are summarized according to different dimensions to generate an aggregation result. This aggregate result may be stored in the summary table as a partition of the summary table and may be differentiated by date.
The method provided by the embodiment of the application can conveniently query and analyze the historical data. Each aggregation operation creates a new partition, and specific aggregate results can be queried according to different time ranges or other dimensions, without the need to calculate and aggregate the entire summary table each time. By taking the result of each aggregation operation as one partition of the total table, the aggregation data can be effectively managed and organized, and the efficiency of data query and analysis is improved.
In the embodiment of the present application, after performing an aggregation operation on source data in a data table to obtain target aggregate data, the method further includes: detecting the data quantity corresponding to the source data in the data table; acquiring adjacent partitions of the target partition from the aggregate table under the condition that the data volume is greater than or equal to a preset threshold value; comparing the aggregation data of the adjacent partitions with the target aggregation data to obtain incremental data; the delta data is stored to a delta data table.
As one example, source_data represents a source data table, containing the following fields:
date, category, product category, value, data value
aggregated_data represents an aggregate table containing the following fields:
date, category, product category, total_value, aggregate data value
The increment_data represents an increment data table to be used for storing increment data, and contains the following fields:
date, category, product category, increment value
First, the data amount of the source data table is detected, and if the data amount is greater than or equal to a preset threshold, a subsequent operation is performed. Then, the adjacent partition dates of the target partition are obtained from the aggregate table. In particular, SQL query statements may be used to obtain the dates of the day before and day after the target partition date. And respectively acquiring target aggregation data and aggregation data of the adjacent partitions from an aggregation table according to dates of the adjacent partitions. And comparing the target aggregate data with the aggregate data of the adjacent partitions to obtain incremental data. Finally, the delta data is stored in the delta data table, and specifically, the to_sql function of pandas can be used to store the data in the delta data table.
It should be noted that, for a table with a large data size, it may be relatively time-consuming and resource-consuming to perform a complete aggregation operation each time. To increase efficiency, only the data of the last two partitions may be processed, i.e., the latest partition is compared with the last partition.
By comparing the data of the two latest partitions, the embodiment of the application can find out the data which is newly added, updated or deleted in the new partition. These three types of operations refer to data that appears in a new partition that may be newly added, existing, but updated or deleted. Meanwhile, the incremental data extracted according to the types of the new adding, updating and deleting operations are stored in another aggregated data table, and the table is specially used for storing the incremental data. Corresponding fields, record operation types or other attribute information may be added as needed.
Through the process, data aggregation and incremental data processing can be separated, and processing efficiency and flexibility are improved. The aggregation data table stores the complete aggregation result, and the increment data table specifically stores the change data after each aggregation operation, so that the increment update or other processing after the aggregation operation is convenient.
Step S13, traversing a data summary table to determine a target partition of target aggregate data in the data summary table, and writing the target aggregate data into the target partition, wherein the data summary table is used for storing the aggregate data in each time period.
In the embodiment of the application, the data summary table is partitioned based on a time period, the target aggregate data is processed by aggregation of source data, the target aggregate data has a time attribute, a target partition corresponding to the data summary table can be found according to the time of the target aggregate data, and the target aggregate data is written into the data summary table partition. For example: if the target aggregate data belongs to the time period of the previous day, the data summary table is traversed first, the target partition of the previous day is determined, and the target aggregate data is written into the target partition of the previous day.
Step S14, exporting the target aggregation data in the target partition to a target data table.
In the embodiment of the present application, the purpose of exporting the incremental data table as a single data table is: the data may be made more readily available and accessible. Thus, the user can obtain the required data by directly accessing a single data table without carrying out data aggregation operation each time. In addition, after a single data table is exported, the data can be conveniently shared with other systems. Different organizations or systems may be interested in different data tables, so deriving a single data table may better meet the needs of data sharing.
In the embodiment of the application, a big data asynchronous derivative mode is adopted to decouple a source database; the flexibility of data processing is improved, multiple multi-dimensional processing aggregation can be performed on the data, and data interaction between different database types can be supported by exporting the aggregated data.
Specifically, the embodiment of the application realizes the timing generation of the data table by acquiring the data table generated in the current time period, so that the automatic processing of aggregation and export of the source data in the current time period is performed, and the workload of manual operation and the possibility of errors are reduced. And the latest data can be processed in time, so that the aggregated and exported data is ensured to be based on the latest source data, and the latest aggregated data is stored in a data table. The target partition of the target aggregate data in the data summary table is determined by traversing the data summary table, and the target aggregate data is written into the target partition, so that the target partition can be flexibly selected for data storage and management according to the requirement.
In addition, by storing aggregated data over various time periods in a data table, historical data may be retained and historical queries supported. Therefore, the aggregation result in each time period can be conveniently traced and analyzed, and basis is provided for decision and analysis.
In an embodiment of the present application, after exporting the aggregate data to the target data table based on the target partition, the method further includes: and detecting whether the data volume of the target data table meets the preset requirement. And generating alarm information under the condition that the data quantity does not meet the preset requirement.
And (3) by a timing scheduling tool, regularly inquiring whether a table of the current aggregated data exists in the target database and whether the data quantity in the table meets the requirement. If no table exists or the data does not meet the requirement, the data volume is empty or the data volume does not exceed the specified threshold, the alarm is given through mail and enterprise WeChat, and the execution is carried out again in a delayed mode.
In an embodiment of the present application, the method further includes: under the condition that the data volume meets the preset requirement, comparing the target aggregation data in the target data table with the service data in the service database to obtain a comparison result;
in the embodiment of the present application, corresponding processing operations are performed according to the comparison result, including:
under the condition that the comparison result is that the target aggregation data does not exist in the service database, the target aggregation data is inserted into the service database;
as an example, if there is no consumer preference of the customer in the user information in the business database, such as a historical browsing amount of the customer for the related product, but there is corresponding data in the target aggregate data, then the data in the target database about the consumer preference of the customer is inserted into the business database.
And under the condition that the comparison result is that the service data is inconsistent with the target aggregation data, updating the service database by utilizing the target aggregation data.
As an example, if the sales of a certain product in the service database is inconsistent with the sales of the same product in the target aggregate data, the corresponding data of the service database is updated by using the data in the target database.
In this embodiment, a data processing device is further provided, and the device is used to implement the foregoing embodiments and preferred embodiments, and will not be described in detail. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
The present embodiment provides a data processing apparatus, as shown in fig. 2, including:
an obtaining module 21, configured to obtain a data table generated in a current time period, where the data table includes unprocessed source data in the current time period;
the aggregation module 22 is configured to perform an aggregation operation on the source data in the data table to obtain target aggregated data;
a storage module 23, configured to traverse a data summary table to determine a target partition of the aggregate data in the data summary table, and write the aggregate data into the target partition, where the data summary table is used to store the aggregate data in each time period;
export module 24 is configured to export the aggregate data to the target data table based on the target partition.
In the embodiment of the application, the acquisition module is used for inquiring unprocessed source data in at least one source database based on the current time period, wherein the source data is stored in a source data table in the source database; storing the source data to a distributed file system; mapping source data in the distributed file system into a data table by using a preset query statement, wherein the table structure of the data table is consistent with the table structure of the source data table corresponding to the source data.
In this embodiment of the present application, an aggregation module is configured to obtain a data attribute corresponding to source data; determining an aggregation mode corresponding to the data attribute; and carrying out aggregation operation on the source data according to an aggregation mode to obtain target aggregation data.
In an embodiment of the present application, the apparatus further includes: the comparison module is used for detecting the data quantity corresponding to the source data in the data table; acquiring adjacent partitions of the target partition from the aggregate table under the condition that the data volume is greater than or equal to a preset threshold value; comparing the aggregation data of the adjacent partitions with the target aggregation data to obtain incremental data; the delta data is stored to a delta data table.
In an embodiment of the present application, the apparatus further includes: the detection module is used for detecting whether the data volume of the target data table meets the preset requirement; and generating alarm information under the condition that the data quantity does not meet the preset requirement.
In an embodiment of the present application, the apparatus further includes: the execution module is used for comparing the target aggregation data in the target data table with the service data in the service database under the condition that the data volume meets the preset requirement to obtain a comparison result; and executing corresponding processing operation according to the comparison result.
In this embodiment of the present application, the execution module is specifically configured to insert, when the comparison result is that the target aggregate data does not exist in the service database, the target aggregate data into the service database; and under the condition that the comparison result is that the service data is inconsistent with the target aggregation data, updating the service database by utilizing the target aggregation data.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a computer device according to an alternative embodiment of the present invention, and as shown in fig. 3, the computer device includes: one or more processors 10, memory 20, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executing within the computer device, including instructions stored in or on memory to display graphical information of the GUI on an external input/output device, such as a display device coupled to the interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories and multiple memories. Also, multiple computer devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system).
The processor 10 may be a central processor, a network processor, or a combination thereof. The processor 10 may further include a hardware chip, among others. The hardware chip may be an application specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field programmable gate array, a general-purpose array logic, or any combination thereof.
Wherein the memory 20 stores instructions executable by the at least one processor 10 to cause the at least one processor 10 to perform the methods shown in implementing the above embodiments.
The memory 20 may include a storage program area that may store an operating system, at least one application program required for functions, and a storage data area; the storage data area may store data created from the use of the computer device of the presentation of a sort of applet landing page, and the like. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, memory 20 may optionally include memory located remotely from processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.
The input device 30 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the computer apparatus, such as a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointer stick, one or more mouse buttons, a trackball, a joystick, and the like. The output means 40 may include a display device, auxiliary lighting means (e.g., LEDs), tactile feedback means (e.g., vibration motors), and the like. Such display devices include, but are not limited to, liquid crystal displays, light emitting diodes, displays and plasma displays. In some alternative implementations, the display device may be a touch screen.
The computer device also includes a communication interface 30 for the computer device to communicate with other devices or communication networks.
The embodiments of the present invention also provide a computer readable storage medium, and the method according to the embodiments of the present invention described above may be implemented in hardware, firmware, or as a computer code which may be recorded on a storage medium, or as original stored in a remote storage medium or a non-transitory machine readable storage medium downloaded through a network and to be stored in a local storage medium, so that the method described herein may be stored on such software process on a storage medium using a general purpose computer, a special purpose processor, or programmable or special purpose hardware. The storage medium can be a magnetic disk, an optical disk, a read-only memory, a random access memory, a flash memory, a hard disk, a solid state disk or the like; further, the storage medium may also comprise a combination of memories of the kind described above. It will be appreciated that a computer, processor, microprocessor controller or programmable hardware includes a storage element that can store or receive software or computer code that, when accessed and executed by the computer, processor or hardware, implements the methods illustrated by the above embodiments.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.
Claims (10)
1. A method of data processing, the method comprising:
acquiring a data table generated in a current time period, wherein the data table comprises unprocessed source data in the current time period;
performing aggregation operation on the source data in the data table to obtain target aggregation data;
traversing a data summary table to determine a target partition of the target aggregate data in the data summary table, and writing the target aggregate data into the target partition, wherein the data summary table is used for storing aggregate data in each time period;
and exporting the target aggregation data in the target partition to a target data table.
2. The method of claim 1, wherein the obtaining the data table generated during the current time period comprises:
inquiring unprocessed source data in at least one source database based on the current time period, wherein the source data is stored in a source data table in the source database;
storing the source data to a distributed file system;
and mapping the source data in the distributed file system into the data table by using a preset query statement, wherein the table structure of the data table is consistent with the table structure of the source data table corresponding to the source data.
3. The method of claim 1, wherein the aggregating the source data in the data table to obtain the target aggregate data comprises:
acquiring data attributes corresponding to the source data;
determining an aggregation mode corresponding to the data attribute;
and performing aggregation operation on the source data according to the aggregation mode to obtain the target aggregation data.
4. A method according to claim 3, wherein after aggregating the source data in the data table to obtain the target aggregate data, the method further comprises:
detecting the data quantity corresponding to the source data in the data table;
acquiring adjacent partitions of the target partition from the aggregate table under the condition that the data volume is greater than or equal to a preset threshold;
comparing the aggregation data of the adjacent partitions with the target aggregation data to obtain incremental data;
and storing the increment data into an increment data table.
5. The method of claim 1, wherein after exporting the aggregate data to a target data table based on the target partition, the method further comprises:
detecting whether the data volume of the target data table meets a preset requirement or not;
and generating alarm information under the condition that the data quantity does not meet the preset requirement.
6. The method of claim 5, wherein the method further comprises:
under the condition that the data volume meets the preset requirement, comparing the target aggregation data in the target data table with the service data in the service database to obtain a comparison result;
and executing corresponding processing operation according to the comparison result.
7. The method of claim 6, wherein the performing a corresponding processing operation based on the comparison result comprises:
inserting the target aggregation data into the service database under the condition that the comparison result is that the target aggregation data does not exist in the service database;
and under the condition that the service data is inconsistent with the target aggregation data as a result of the comparison, updating the service database by utilizing the target aggregation data.
8. A data processing apparatus, the apparatus comprising:
the acquisition module is used for acquiring a data table generated in the current time period, wherein the data table comprises unprocessed source data in the current time period;
the aggregation module is used for carrying out aggregation operation on the source data in the data table to obtain target aggregation data;
the storage module is used for traversing a data summary table to determine a target partition of the aggregated data in the data summary table and writing the aggregated data into the target partition, wherein the data summary table is used for storing the aggregated data in each time period;
and the export module is used for exporting the aggregate data to a target data table based on the target partition.
9. A computer device, comprising:
a memory and a processor in communication with each other, the memory having stored therein computer instructions which, upon execution, cause the processor to perform the method of any of claims 1 to 7.
10. A computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311368971.7A CN117609362A (en) | 2023-10-20 | 2023-10-20 | Data processing method, device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311368971.7A CN117609362A (en) | 2023-10-20 | 2023-10-20 | Data processing method, device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117609362A true CN117609362A (en) | 2024-02-27 |
Family
ID=89954997
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311368971.7A Pending CN117609362A (en) | 2023-10-20 | 2023-10-20 | Data processing method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117609362A (en) |
-
2023
- 2023-10-20 CN CN202311368971.7A patent/CN117609362A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP2577507B1 (en) | Data mart automation | |
WO2018051096A1 (en) | System for importing data into a data repository | |
US11615076B2 (en) | Monolith database to distributed database transformation | |
CN112199433A (en) | Data management system for city-level data middling station | |
US10133779B2 (en) | Query hint management for a database management system | |
CN112527783A (en) | Data quality probing system based on Hadoop | |
CN112148718A (en) | Big data support management system for city-level data middling station | |
CN114595294B (en) | Data warehouse modeling and extracting method and system | |
CN111309712A (en) | Optimized task scheduling method, device, equipment and medium based on data warehouse | |
CN115408381A (en) | Data processing method and related equipment | |
CN115552392A (en) | Performing time dynamic range partition transformations | |
CN115640300A (en) | Big data management method, system, electronic equipment and storage medium | |
US9727666B2 (en) | Data store query | |
CN114428813A (en) | Data statistics method, device, equipment and storage medium based on report platform | |
CN111125045B (en) | Lightweight ETL processing platform | |
US20140074792A1 (en) | Automated database optimization | |
CN116362212A (en) | Report generation method, device, equipment and storage medium | |
CN116257594A (en) | Data reconstruction method and system | |
CN115718690A (en) | Data accuracy monitoring system and method | |
CN115510289A (en) | Data cube configuration method and device, electronic equipment and storage medium | |
CN117609362A (en) | Data processing method, device, computer equipment and storage medium | |
CN109033196A (en) | A kind of distributed data scheduling system and method | |
CN114860759A (en) | Data processing method, device and equipment and readable storage medium | |
Aydin et al. | Data modelling for large-scale social media analytics: design challenges and lessons learned | |
US8423575B1 (en) | Presenting information from heterogeneous and distributed data sources with real time updates |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |