US20150186434A1 - Efficiently Query Compressed Time-Series Data in a Database - Google Patents
Efficiently Query Compressed Time-Series Data in a Database Download PDFInfo
- Publication number
- US20150186434A1 US20150186434A1 US14/146,589 US201414146589A US2015186434A1 US 20150186434 A1 US20150186434 A1 US 20150186434A1 US 201414146589 A US201414146589 A US 201414146589A US 2015186434 A1 US2015186434 A1 US 2015186434A1
- Authority
- US
- United States
- Prior art keywords
- segments
- data
- value
- specified
- segment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 claims description 27
- 238000007906 compression Methods 0.000 claims description 25
- 230000006835 compression Effects 0.000 claims description 25
- 230000006870 function Effects 0.000 claims description 23
- 238000004590 computer program Methods 0.000 claims description 14
- 230000006837 decompression Effects 0.000 claims description 13
- 238000007726 management method Methods 0.000 description 13
- 238000004422 calculation algorithm Methods 0.000 description 9
- 238000012886 linear function Methods 0.000 description 9
- 238000012885 constant function Methods 0.000 description 6
- WVCHIGAIXREVNS-UHFFFAOYSA-N 2-hydroxy-1,4-naphthoquinone Chemical compound C1=CC=C2C(O)=CC(=O)C(=O)C2=C1 WVCHIGAIXREVNS-UHFFFAOYSA-N 0.000 description 4
- 238000004891 communication Methods 0.000 description 4
- 238000010586 diagram Methods 0.000 description 4
- 230000008520 organization Effects 0.000 description 4
- 238000013500 data storage Methods 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 230000003287 optical effect Effects 0.000 description 3
- 238000013459 approach Methods 0.000 description 2
- 230000005611 electricity Effects 0.000 description 2
- 230000003993 interaction Effects 0.000 description 2
- 230000008569 process Effects 0.000 description 2
- 238000012545 processing Methods 0.000 description 2
- 230000001052 transient effect Effects 0.000 description 2
- 238000007792 addition Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 238000013144 data compression Methods 0.000 description 1
- 230000003247 decreasing effect Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 230000002349 favourable effect Effects 0.000 description 1
- 230000010354 integration Effects 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
- 230000000135 prohibitive effect Effects 0.000 description 1
- 230000001953 sensory effect Effects 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1744—Redundancy elimination performed by the file system using compression, e.g. sparse files
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
-
- G06F17/30321—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2477—Temporal data queries
-
- G06F17/30153—
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/3059—Digital compression and data reduction techniques where the original information is represented by a subset or similar information, e.g. lossy compression
- H03M7/3064—Segmenting
Definitions
- the subject matter described herein relates to techniques for querying compressed time-series data in a database.
- Billions of devices will soon be interconnected while monitoring and controlling the physical infrastructure as well as other technical and natural objects.
- Such devices generate large amounts of information, often as time-series data.
- These devices include smart electricity meters and all kinds of sensors capturing data from the physical world, including industrial processes.
- the data needs to be stored in databases.
- data compression can be used to reduce the size of the data by, say, several hundred times.
- a query of time series data stored in a database specifies at least one value.
- the database includes (i) an index table specifying groups of segments of compressed time series data with corresponding ranges each having a lowest value and a highest value, and (ii) a segments table specifying individual segments of compressed time series data. Thereafter, using the index table, at least one group for which the specified at least one value falls within the corresponding range is identified.
- the segments table is then queried for the segments corresponding to the identified at least one group to generate a new segments table specifying at least one segment.
- the at least one segment specified by the new segments table is decompressed. Data responsive to the query within the decompressed at least one segment is then identified using the specified at least one value. Data characterizing the identified data responsive to the query is then provided.
- At least one of the receiving, identifying, querying, decompressing, identifying, and providing can be implemented by at least one data processor of at least one computing system.
- Providing data can include at least one of: displaying the data, loading the data into memory, transmitting the data to a remote computing system, and storing the data in a data store.
- the index table can be also generated, and it can be generated using a segments parameter that specifies a number of segments to be specified by a single group of segments and/or a time span to be covered by a single group of segments.
- the data can be compressed using piecewise regression-based compression.
- the segments table can specify a decompression function and decompression parameters for decompressing each segment.
- the database can include an in-memory database management system storing data in volatile memory database.
- the in-memory database management system can be a column-oriented database management system storing data as sections of columns.
- the current subject matter is also applicable to row oriented databases as well as databases that do not operate in-memory.
- the specified at least one value can include a predicate or ordering, wherein identifying data responsive to the query within the decompressed at least one segment using the specified at least one value further comprises applying the predicate or ordering to the decompressed at least one segment.
- segments of the segments table can be decompressed until data responsive to the query is identified.
- the parameter of the function can represent the minimum and the maximum of the function (which allows the pre-selection to be based on a corresponding param_ 1 alone).
- the compression algorithm employs only linear functions (and possibly additionally constant functions, as they can be converted to linear ones)
- the starting and ending points of a linear function in an interval can represent the minimum and the maximum value of the interval.
- the extrema for the pre-selection of the segments can be computed on-the-fly when answering a query instead of storing the maximum and minimum explicitly.
- Non-transitory computer program products i.e., physically embodied computer program products
- store instructions which when executed one or more data processors of one or more computing systems, causes at least one data processor to perform operations herein.
- computer systems are also described that may include one or more data processors and memory coupled to the one or more data processors. The memory may temporarily or permanently store instructions that cause at least one processor to perform one or more of the operations described herein.
- methods can be implemented by one or more data processors either within a single computing system or distributed among two or more computing systems.
- Such computing systems can be connected and can exchange data and/or commands or other instructions or the like via one or more connections, including but not limited to a connection over a network (e.g. the Internet, a wireless wide area network, a local area network, a wide area network, a wired network, or the like), via a direct connection between one or more of the multiple computing systems, etc.
- a network e.g. the Internet, a wireless wide area network, a local area
- the current subject matter enables database systems to not only store time-series data in a space-efficient way, but also to deal with queries of time-series data in an efficient manner while only marginally decreasing compression ratios.
- Such advances allow for applications to be generated that rely on fine-grained time-series data.
- FIG. 1 is a system diagram illustrating an architecture for querying time-series data stored in a database
- FIG. 2 is a process flow diagram illustrating a method for querying time-series data stored in a database.
- piecewise-regression-based compression can be used (such as described in U.S. patent application Ser. No. 13/746,802, entitled “Compressing a Time Series of Data”, the contents of which are hereby fully incorporated by reference).
- Such techniques can be integrated seamlessly in database-management systems including traditional relational database management systems. These techniques are also well suited for integration into in-memory database technology and column-stores such as the SAP HANA platform.
- piecewise-regression-based compression can lead to high compression ratios and can compress/store and decompress/retrieve (parts of) time series in a very fast way, it can be difficult to answer so-called value-based queries in an efficient way while keeping high compression ratios.
- An example of such a query is “select all time-series exceeding threshold x”. As such queries are very important in many domains, for example, in surveillance of sensor data, the usage of promising piecewise-regression-based compression techniques is currently limited.
- the current subject matter provides efficient processing of value-based queries in time-series data in database systems compressed by various compression techniques including, but not limited to, piecewise-regression-based compression.
- Table 1 illustrates how piecewise-regression-based compression stores compressed time series using a particular example representation. It will be appreciated that different algorithms and implementations can utilize different database schemata.
- Table 1 is an example table called ‘segments’ storing compressed time series.
- the table can store multiple time series identified by their ts_id.
- Each row of the table can stand for one segment (or interval) of a time series that is compressed by a regression function.
- the segment can be defined by its start and end points t_start and t_end.
- function_id can be the id of a function used to compress the segment (an arbitrary regression function may be chosen by the compression algorithm) and function_params can be the parameters of the function mentioned.
- the example data in Table 1 describes a time series having the ts_id 1 .
- the first segment starts at 2013-06-12 10:02:33 and is described by a parabolic regression function (function_id 2 ).
- This function type requires three function params.
- the second segment starts one second after the end of the first segment and is described by a linear function requiring two parameters.
- the segments table (Table 1) can be used internally and be hidden from the user.
- User in this device can refer to a person, a software application, and/or a service. From the user point of view, querying a time-series table having a design as in Table 2 can be preferred.
- Table 2 is an example for the first two decompressed values from the time series in Table 1.
- a table like Table 2 was the original source for obtaining the compressed segments table Table 1 using a compression algorithm.
- timeseries ts_id time value 1 2013-06-12 10:02:33 12.33 1 2013-06-12 10:02:34 12.41
- the user can send a query and a retrieval component within the database management system can internally translate a compressed table to an uncompressed table.
- the user query can be as follows:
- the retrieval component can decompress the segments table. It does so by calculating the function values from the regression function employed for compressing the respective segments for every point in the requested time frame (in this example for every second).
- SAP HANA is an in-memory database management system storing data in a columnar fashion.
- the user can call a decompression procedure to decompress a compressed time series and to provide it in a time-series database table.
- Other database languages such as PL/SQL can be used in other database management systems, and the functionality can also be natively integrated into the database. In the latter case, there would be no need for explicit calls of decompression procedures, and the user can access the time-series table transparently without knowing that internally some decompression takes place.
- the calculations described above can be executed extremely fast inside the database.
- value-based queries In contrast with regular database queries such as the query referenced above (Q1) which might have predicates on the time or the time-series id, value-based queries have (additional) predicates or ordering instructions such as “ORDER BY” (including corresponding TOP or RANK clauses) on the values of the time series.
- ORDER BY including corresponding TOP or RANK clauses
- the user queries all time series within the past three months where the value exceeds a threshold of 90.
- a threshold of 90 is a surveillance application of industrial devices for which exceeding a temperature of 90° C. for a longer period is dangerous.
- value >90 is only one example for value-based queries.
- Such queries may include all types of predicates or ordering instructions based on the value attribute of a time series.
- FIG. 1 is a diagram 100 that visualizes the architecture of a deployment within a relational database-management system 110 (sometimes referred to simply as a database).
- An external data source 120 (such as a sensor or a smart electricity meter) can provide time-series data to be compressed and loaded into the database 110 .
- a compression component 130 that can form part of the database 110 , can compress data received from the external data source 120 using, for example, techniques references above to result in compressed data 140 .
- the compressed data 140 can then be stored in data storage 150 of the database 110 .
- the data storage 150 can store data according to a particular internal data organization 155 such as the segments tables described above.
- a retrieval component 160 can have access to the data storage 150 that can decompress data stored therein as described above.
- external applications can access the retrieval component 160 via a data access interface 170 .
- the data access interface 170 can, for example, provide direct access to the database-management system using general purpose (e.g., SQL) or it can comprise other customized interfaces.
- the external applications can execute on a remote client computer 190 that accesses the database 110 over a network 180 via the data access interface 170 .
- the internal data organization 155 of the database 110 can comprise an index table ‘indextab’ that is in addition to segments table described above.
- the index table can store meta information for several segments (e.g., groups of segments, etc.) in aggregated values.
- indextab with example data. indextab ts_id t_start t_end minimum maximum 1 2013-06-12 2013-06-12 12.33 43.94 09:12:53 14:22:28
- Table 3 is an example of an ‘indextab’ table. This example includes only one entry for a time period of slightly more than five hours. The minimum and maximum values indicate the lowest and highest value occurring in the original time series (consisting of several segments in the segments table) in the time frame specified. When processing value-based queries, a query can be first executed on the index table. The result, in most cases, can help dramatically limit the number of segments that require decompression.
- indextab is stored in a database management system
- traditional index structures such as B+ trees on the minimum and maximum columns can be used to further speed-up access times. This needs to be chosen depending on the actual database technology (e.g., in in-memory column-store databases such as SAP HANA, additional indices may not be necessary as the column-store technology already acts as an index), and existing (auto-)tuning mechanisms may be used.
- the entries in an ‘indextab’ table can stand for a couple of segments (groups, etc.) in the ‘segments’ table. Determining the ratio of such entries to segments is an important trade-off: Too many entries reduce the compression ratio and too few entries lead to longer execution times. Controlling this trade-off is done by a user-defined parameter #segments which controls the number of segments to be covered by one entry in ‘indextab’. Alternatively, a parameter #entries per million can be used, which controls the number of entries which should be used to cover one million steps of a time series (e.g., one million seconds if the time series describes measurements every second).
- the ‘indextab’ tables can be generated by the compression component 130 while compressing a time series.
- the ‘indextab’ can use the original values of the time series to determine the maximum and minimum value for each entry in ‘indextab’.
- the following provides examples of how the retrieval component 160 operates.
- the example query Q2 would be first translated to a query on the index table:
- the result of this query is an empty set. This indicates to the retrieval component 160 to likewise return an empty set.
- tables can be populated with many more values. Assume that the result from the last query (Q3) is not empty, but as given in Table 4.
- this query (Q4) is an example how the result given in Table 4 is translated to a new query.
- the result from this query (Q4) is a new segments table (structured like Table 1) with a comparably very low number of entries.
- This new segments table can then be passed to the decompressing functionality that decompresses compressed time series (as described previously) and the predicate value >90 can be applied to the result in order to obtain the final result of the query which is passed via the data access interface 170 to the user (note that different predicates may be used in other value-based queries).
- FIG. 2 is a process flow diagram 200 in which, at 210 , a query of time series data stored in a database is received that specifies at least one value.
- the database includes an index table specifying groups of segments of compressed time series data with corresponding ranges each having a lowest value and a highest value, and a segments table specifying individual segments of compressed time series data.
- at 220 at least one group for which the specified at least one value falls within the corresponding range is identified using the index table.
- the segment table is queried for the segments corresponding to the identified at least one group to generate a new segments table specifying at least one segment.
- the at least one segment specified by the new segments table is then, at 240 , decompressed.
- Data responsive to the query is then identified, at 250 , using the value so that, at 260 , data characterizing the identified data responsive to the query can be provided (e.g., displayed, loaded, stored, transmitted to a remote computing system, etc.).
- the query can specify a predicate to further filter the results among the decompressed data.
- index-table approach In case that the compression algorithm (as described above) employs functions fulfilling certain criteria, implementations alternative to the index-table approach can be used. Such alternatives can be beneficial as they do not require metadata and, thus, require less storage space. Three alternatives are described below. In all three alternatives, it is not necessary to store the minimum and the maximum values explicitly (the corresponding columns can be removed from the tables in the internal data organization) nor are index tables required.
- a first alternative is segments that are compressed with constant functions. If a segment is approximated with a constant function, the parameter of the function represents already the minimum and the maximum of the function. In case the compression algorithm employs only constant functions, it is therefore possible to do the pre-selection based on param_ 1 alone.
- a second alternative can be used with segments that are compressed with linear functions. In case the compression algorithm employs only linear functions (and possibly additionally constant functions, as they can be converted to linear ones), the following variation can provide a favorable alternative: Instead of the two params of the linear function, the value of the first (val_start) and the last approximated value (val_end) of a segment are stored.
- the values val_start and val_end represent two points. These two points likewise define the linear function.
- the nature of the starting and the ending point of a linear function in an interval is that they represent the minimum and the maximum value of the interval. However, it is not clear which of the values is the maximum and which is the minimum. Using a distinction of cases, it is possible to pre-select the segments based on the values val_min and val_max.
- a third technique can be used with segments that are compressed with functions with the possibility of efficient computation of the extrema.
- the compression algorithm employs only functions for which the extrema can be computed efficiently
- the extrema for the pre-selection of the segments can be computed on-the-fly when answering a query instead of storing the maximum and minimum explicitly. This additionally requires very fast computation and a moderate number of segments.
- it may be faster to compute the extrema at runtime of the query (and possibly cache them for future use in temporary tables) than to decompress more segments, caused by using the index table.
- One or more aspects or features of the subject matter described herein may be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof.
- ASICs application specific integrated circuits
- These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device (e.g., mouse, touch screen, etc.), and at least one output device.
- machine-readable signal refers to any signal used to provide machine instructions and/or data to a programmable data processor.
- the machine-readable medium can store such machine instructions non-transitorily, such as for example as would a non-transient solid state memory or a magnetic hard drive or any equivalent storage medium.
- the machine-readable medium can alternatively or additionally store such machine instructions in a transient manner, such as for example as would a processor cache or other random access memory associated with one or more physical processor cores.
- the subject matter described herein can be implemented on a computer having a display device, such as for example a cathode ray tube (CRT) or a liquid crystal display (LCD) monitor for displaying information to the user and a keyboard and a pointing device, such as for example a mouse or a trackball, by which the user may provide input to the computer.
- a display device such as for example a cathode ray tube (CRT) or a liquid crystal display (LCD) monitor for displaying information to the user and a keyboard and a pointing device, such as for example a mouse or a trackball, by which the user may provide input to the computer.
- CTR cathode ray tube
- LCD liquid crystal display
- a keyboard and a pointing device such as for example a mouse or a trackball
- Other kinds of devices can be used to provide for interaction with a user as well.
- feedback provided to the user can be any form of sensory feedback, such as for example visual feedback, auditory feedback, or tactile feedback
- touch screens or other touch-sensitive devices such as single or multi-point resistive or capacitive trackpads, voice recognition hardware and software, optical scanners, optical pointers, digital image capture devices and associated interpretation software, and the like.
- the subject matter described herein may be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a client computer having a graphical user interface or a Web browser through which a user may interact with an implementation of the subject matter described herein), or any combination of such back-end, middleware, or front-end components.
- the components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (“LAN”), a wide area network (“WAN”), and the Internet.
- LAN local area network
- WAN wide area network
- the Internet the global information network
- the computing system may include clients and servers.
- a client and server are generally remote from each other and typically interact through a communication network.
- the relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Fuzzy Systems (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
- The subject matter described herein relates to techniques for querying compressed time-series data in a database.
- Billions of devices will soon be interconnected while monitoring and controlling the physical infrastructure as well as other technical and natural objects. Such devices generate large amounts of information, often as time-series data. These devices include smart electricity meters and all kinds of sensors capturing data from the physical world, including industrial processes. To make use of such data, for example, in analytical applications, the data needs to be stored in databases. As storing huge amounts of information is extremely expensive—and in certain situations not feasible because of its size—data compression can be used to reduce the size of the data by, say, several hundred times.
- In one aspect, a query of time series data stored in a database is received that specifies at least one value. The database includes (i) an index table specifying groups of segments of compressed time series data with corresponding ranges each having a lowest value and a highest value, and (ii) a segments table specifying individual segments of compressed time series data. Thereafter, using the index table, at least one group for which the specified at least one value falls within the corresponding range is identified. The segments table is then queried for the segments corresponding to the identified at least one group to generate a new segments table specifying at least one segment. Next, the at least one segment specified by the new segments table is decompressed. Data responsive to the query within the decompressed at least one segment is then identified using the specified at least one value. Data characterizing the identified data responsive to the query is then provided.
- At least one of the receiving, identifying, querying, decompressing, identifying, and providing can be implemented by at least one data processor of at least one computing system. Providing data can include at least one of: displaying the data, loading the data into memory, transmitting the data to a remote computing system, and storing the data in a data store.
- The index table can be also generated, and it can be generated using a segments parameter that specifies a number of segments to be specified by a single group of segments and/or a time span to be covered by a single group of segments.
- The data can be compressed using piecewise regression-based compression. The segments table can specify a decompression function and decompression parameters for decompressing each segment.
- The database can include an in-memory database management system storing data in volatile memory database. The in-memory database management system can be a column-oriented database management system storing data as sections of columns. The current subject matter is also applicable to row oriented databases as well as databases that do not operate in-memory.
- The specified at least one value can include a predicate or ordering, wherein identifying data responsive to the query within the decompressed at least one segment using the specified at least one value further comprises applying the predicate or ordering to the decompressed at least one segment.
- In some variations, if it determined that there are no groups for which the specified at least one value falls within the corresponding range, segments of the segments table can be decompressed until data responsive to the query is identified.
- Other variations can be employed in which it is not necessary to store the minimum and the maximum values explicitly (the corresponding columns can be removed from the tables in the internal data organization) nor are index tables required. For example, if segments are compressed with constant functions, the parameter of the function can represent the minimum and the maximum of the function (which allows the pre-selection to be based on a corresponding param_1 alone). In case the compression algorithm employs only linear functions (and possibly additionally constant functions, as they can be converted to linear ones), the starting and ending points of a linear function in an interval can represent the minimum and the maximum value of the interval. Using a distinction of cases, it is possible to pre-select the segments based on the values val_min and val_max. In addition, if the compression algorithm employs only functions for which the extrema can be computed efficiently, the extrema for the pre-selection of the segments can be computed on-the-fly when answering a query instead of storing the maximum and minimum explicitly.
- Non-transitory computer program products (i.e., physically embodied computer program products) are also described that store instructions, which when executed one or more data processors of one or more computing systems, causes at least one data processor to perform operations herein. Similarly, computer systems are also described that may include one or more data processors and memory coupled to the one or more data processors. The memory may temporarily or permanently store instructions that cause at least one processor to perform one or more of the operations described herein. In addition, methods can be implemented by one or more data processors either within a single computing system or distributed among two or more computing systems. Such computing systems can be connected and can exchange data and/or commands or other instructions or the like via one or more connections, including but not limited to a connection over a network (e.g. the Internet, a wireless wide area network, a local area network, a wide area network, a wired network, or the like), via a direct connection between one or more of the multiple computing systems, etc.
- The subject matter described herein provides many advantages. For example, the current subject matter enables database systems to not only store time-series data in a space-efficient way, but also to deal with queries of time-series data in an efficient manner while only marginally decreasing compression ratios. Such advances allow for applications to be generated that rely on fine-grained time-series data.
- The details of one or more variations of the subject matter described herein are set forth in the accompanying drawings and the description below. Other features and advantages of the subject matter described herein will be apparent from the description and drawings, and from the claims.
-
FIG. 1 is a system diagram illustrating an architecture for querying time-series data stored in a database; and -
FIG. 2 is a process flow diagram illustrating a method for querying time-series data stored in a database. - To facilitate the compressed storage of time-series data, piecewise-regression-based compression can be used (such as described in U.S. patent application Ser. No. 13/746,802, entitled “Compressing a Time Series of Data”, the contents of which are hereby fully incorporated by reference). Such techniques can be integrated seamlessly in database-management systems including traditional relational database management systems. These techniques are also well suited for integration into in-memory database technology and column-stores such as the SAP HANA platform. While piecewise-regression-based compression can lead to high compression ratios and can compress/store and decompress/retrieve (parts of) time series in a very fast way, it can be difficult to answer so-called value-based queries in an efficient way while keeping high compression ratios. An example of such a query is “select all time-series exceeding threshold x”. As such queries are very important in many domains, for example, in surveillance of sensor data, the usage of promising piecewise-regression-based compression techniques is currently limited.
- The current subject matter provides efficient processing of value-based queries in time-series data in database systems compressed by various compression techniques including, but not limited to, piecewise-regression-based compression.
- Compressed Time-Series Data in Database Tables.
- Table 1 illustrates how piecewise-regression-based compression stores compressed time series using a particular example representation. It will be appreciated that different algorithms and implementations can utilize different database schemata.
-
TABLE 1 Database table ‘segments’ with example data, where function id 2 stands for a parabolic function and function id 1 stands for a linear function. segments ts_id t_start t_end function_id function_params 1 2013-06-12 10:02:33 2013-06-12 10:07:41 2 12.33, 21.84, 2.73 1 2013-06-12 10:07:42 2013-06-12 10:33:14 1 24.33, 7.74 - Table 1 is an example table called ‘segments’ storing compressed time series. Concretely, the table can store multiple time series identified by their ts_id. Each row of the table can stand for one segment (or interval) of a time series that is compressed by a regression function. The segment can be defined by its start and end points t_start and t_end. function_id can be the id of a function used to compress the segment (an arbitrary regression function may be chosen by the compression algorithm) and function_params can be the parameters of the function mentioned.
- The example data in Table 1 describes a time series having the ts_id 1. The first segment starts at 2013-06-12 10:02:33 and is described by a parabolic regression function (function_id 2). This function type requires three function params. The second segment starts one second after the end of the first segment and is described by a linear function requiring two parameters.
- Decompressing Time-Series Data in Databases.
- When piecewise-regression-based compression is implemented in a database, the segments table (Table 1) can be used internally and be hidden from the user. User in this device can refer to a person, a software application, and/or a service. From the user point of view, querying a time-series table having a design as in Table 2 can be preferred. Table 2 is an example for the first two decompressed values from the time series in Table 1. Likewise, a table like Table 2 was the original source for obtaining the compressed segments table Table 1 using a compression algorithm.
-
TABLE 2 Database table ‘timeseries’ with example data. timeseries ts_id time value 1 2013-06-12 10:02:33 12.33 1 2013-06-12 10:02:34 12.41 - If a user wants to display a time series, the user can send a query and a retrieval component within the database management system can internally translate a compressed table to an uncompressed table. Concretely, the user query can be as follows:
-
- Q1 SELECT*FROM timeseries WHERE ts_id=1
- AND time >=2013-06-12 00:00:00 AND time <=2013-06-12 23:59:59;
- Q1 SELECT*FROM timeseries WHERE ts_id=1
- The user can then expect a result table such as Table 2 (but with one row for each second of 2013-06-12). To obtain this result, the retrieval component can decompress the segments table. It does so by calculating the function values from the regression function employed for compressing the respective segments for every point in the requested time frame (in this example for every second).
- In an example implementation in SAP HANA, SQLScript language (the procedural database language in SAP HANA) can be used to perform the decompression. SAP HANA is an in-memory database management system storing data in a columnar fashion. In this scenario, the user can call a decompression procedure to decompress a compressed time series and to provide it in a time-series database table. Other database languages such as PL/SQL can be used in other database management systems, and the functionality can also be natively integrated into the database. In the latter case, there would be no need for explicit calls of decompression procedures, and the user can access the time-series table transparently without knowing that internally some decompression takes place. Regardless of the database management system, the calculations described above can be executed extremely fast inside the database.
- Value-Based Queries on Compressed Time-Series Data.
- In contrast with regular database queries such as the query referenced above (Q1) which might have predicates on the time or the time-series id, value-based queries have (additional) predicates or ordering instructions such as “ORDER BY” (including corresponding TOP or RANK clauses) on the values of the time series. The following is an example of such a query:
-
- Q2 SELECT*FROM timeseries WHERE value >90
- AND time >=2013-03-12 00:00:00 AND time <=2013-06-12 23:59:59;
- Q2 SELECT*FROM timeseries WHERE value >90
- In this example, the user queries all time series within the past three months where the value exceeds a threshold of 90. One example is a surveillance application of industrial devices for which exceeding a temperature of 90° C. for a longer period is dangerous. Note that value >90 is only one example for value-based queries. Such queries may include all types of predicates or ordering instructions based on the value attribute of a time series.
- For answering the example query Q2, it is possible to apply the naïve approach to first decompress all time series data in the respective time frame and to succinctly query all entries with a value >90. However, as a database may contain thousands of time series, this would be very expensive in terms of resource consumption and possibly prohibitive. As will be described below an index table can be provided that allows for pre-selection of segments which need to be decompressed in order to answer the query efficiently.
- Architecture of a Deployment in a Database Management System.
-
FIG. 1 is a diagram 100 that visualizes the architecture of a deployment within a relational database-management system 110 (sometimes referred to simply as a database). An external data source 120 (such as a sensor or a smart electricity meter) can provide time-series data to be compressed and loaded into thedatabase 110. Acompression component 130, that can form part of thedatabase 110, can compress data received from theexternal data source 120 using, for example, techniques references above to result incompressed data 140. Thecompressed data 140 can then be stored indata storage 150 of thedatabase 110. Thedata storage 150 can store data according to a particularinternal data organization 155 such as the segments tables described above. Aretrieval component 160 can have access to thedata storage 150 that can decompress data stored therein as described above. Finally, external applications can access theretrieval component 160 via a data access interface 170. The data access interface 170 can, for example, provide direct access to the database-management system using general purpose (e.g., SQL) or it can comprise other customized interfaces. The external applications can execute on aremote client computer 190 that accesses thedatabase 110 over anetwork 180 via the data access interface 170. - The
internal data organization 155 of thedatabase 110 can comprise an index table ‘indextab’ that is in addition to segments table described above. The index table can store meta information for several segments (e.g., groups of segments, etc.) in aggregated values. -
TABLE 3 Database table ‘indextab’ with example data. indextab ts_id t_start t_end minimum maximum 1 2013-06-12 2013-06-12 12.33 43.94 09:12:53 14:22:28 - Table 3 is an example of an ‘indextab’ table. This example includes only one entry for a time period of slightly more than five hours. The minimum and maximum values indicate the lowest and highest value occurring in the original time series (consisting of several segments in the segments table) in the time frame specified. When processing value-based queries, a query can be first executed on the index table. The result, in most cases, can help dramatically limit the number of segments that require decompression.
- Usage of Further Index Structures.
- As the table ‘indextab’ is stored in a database management system, traditional index structures such as B+ trees on the minimum and maximum columns can be used to further speed-up access times. This needs to be chosen depending on the actual database technology (e.g., in in-memory column-store databases such as SAP HANA, additional indices may not be necessary as the column-store technology already acts as an index), and existing (auto-)tuning mechanisms may be used.
- Control of the Number of Entries in ‘indextab’ Tables.
- With the examples above, the entries in an ‘indextab’ table can stand for a couple of segments (groups, etc.) in the ‘segments’ table. Determining the ratio of such entries to segments is an important trade-off: Too many entries reduce the compression ratio and too few entries lead to longer execution times. Controlling this trade-off is done by a user-defined parameter #segments which controls the number of segments to be covered by one entry in ‘indextab’. Alternatively, a parameter #entries per million can be used, which controls the number of entries which should be used to cover one million steps of a time series (e.g., one million seconds if the time series describes measurements every second).
- Creation of the ‘indextab’ Tables in the Compression Component.
- The ‘indextab’ tables can be generated by the
compression component 130 while compressing a time series. The ‘indextab’ can use the original values of the time series to determine the maximum and minimum value for each entry in ‘indextab’. - Retrieval Component.
- The following provides examples of how the
retrieval component 160 operates. To continue the example with temperature values exceeding 90° C. (Q2), looking at Table 3 indicates that there are no values bigger than 43.94, and the query can be terminated by returning an empty set—without doing any decompression at all. Concretely, the example query Q2 would be first translated to a query on the index table: -
- Q3 SELECT ts_id, t_start, t_end FROM indextab WHERE maximum >90
- AND time >=2013-03-12 00:00:00 AND time <=2013-06-12 23:59:59;
- Q3 SELECT ts_id, t_start, t_end FROM indextab WHERE maximum >90
- As mentioned above, in this example, the result of this query is an empty set. This indicates to the
retrieval component 160 to likewise return an empty set. - In a further example, tables can be populated with many more values. Assume that the result from the last query (Q3) is not empty, but as given in Table 4.
-
TABLE 4 Example result table. ts_id time Value 12 2013-03-15 11:43:25 2013-03-17 08:47:56 4 2013-03-17 04:21:42 2013-03-22 14:04:16 234 2013-04-28 01:00:28 2013-05-02 19:22:47 - The result from Table 4 can then be used to automatically assemble a query on the segments table (assume the existence of a table structured like Table 1, but containing millions of entries):
-
Q4 SELECT * FROM segments WHERE (1 = 2 OR (ts_idts_id = 12 AND time >= 2013-03-15 11:43:25 AND time <= 2013-03-17 08:47:56) OR (ts_id = 4 AND time >= 2013-03-17 04:21:42 AND time <= 2013-03-22 14:04:16) OR (ts_id = 234 AND time >= 2013-04-28 01:00:28 AND time <= 2013-05-02 19:22:47) ); - Note that this query (Q4) is an example how the result given in Table 4 is translated to a new query. The result from this query (Q4) is a new segments table (structured like Table 1) with a comparably very low number of entries. This new segments table can then be passed to the decompressing functionality that decompresses compressed time series (as described previously) and the predicate value >90 can be applied to the result in order to obtain the final result of the query which is passed via the data access interface 170 to the user (note that different predicates may be used in other value-based queries).
- Similarly, it is possible to select segments which are relevant for answering Top-K Queries very efficiently on the ‘indextab’ to reduce the amount of data for decompression.
-
FIG. 2 is a process flow diagram 200 in which, at 210, a query of time series data stored in a database is received that specifies at least one value. The database includes an index table specifying groups of segments of compressed time series data with corresponding ranges each having a lowest value and a highest value, and a segments table specifying individual segments of compressed time series data. Thereafter, at 220, at least one group for which the specified at least one value falls within the corresponding range is identified using the index table. Next, at 230, the segment table is queried for the segments corresponding to the identified at least one group to generate a new segments table specifying at least one segment. The at least one segment specified by the new segments table is then, at 240, decompressed. Data responsive to the query is then identified, at 250, using the value so that, at 260, data characterizing the identified data responsive to the query can be provided (e.g., displayed, loaded, stored, transmitted to a remote computing system, etc.). In some cases, the query can specify a predicate to further filter the results among the decompressed data. - In case that the compression algorithm (as described above) employs functions fulfilling certain criteria, implementations alternative to the index-table approach can be used. Such alternatives can be beneficial as they do not require metadata and, thus, require less storage space. Three alternatives are described below. In all three alternatives, it is not necessary to store the minimum and the maximum values explicitly (the corresponding columns can be removed from the tables in the internal data organization) nor are index tables required.
- A first alternative is segments that are compressed with constant functions. If a segment is approximated with a constant function, the parameter of the function represents already the minimum and the maximum of the function. In case the compression algorithm employs only constant functions, it is therefore possible to do the pre-selection based on param_1 alone. A second alternative can be used with segments that are compressed with linear functions. In case the compression algorithm employs only linear functions (and possibly additionally constant functions, as they can be converted to linear ones), the following variation can provide a favorable alternative: Instead of the two params of the linear function, the value of the first (val_start) and the last approximated value (val_end) of a segment are stored. Together with the values t_start and t_end, the values val_start and val_end represent two points. These two points likewise define the linear function. The nature of the starting and the ending point of a linear function in an interval is that they represent the minimum and the maximum value of the interval. However, it is not clear which of the values is the maximum and which is the minimum. Using a distinction of cases, it is possible to pre-select the segments based on the values val_min and val_max.
- A third technique can be used with segments that are compressed with functions with the possibility of efficient computation of the extrema. In case the compression algorithm employs only functions for which the extrema can be computed efficiently, the extrema for the pre-selection of the segments can be computed on-the-fly when answering a query instead of storing the maximum and minimum explicitly. This additionally requires very fast computation and a moderate number of segments. In this alternative, it may be faster to compute the extrema at runtime of the query (and possibly cache them for future use in temporary tables) than to decompress more segments, caused by using the index table.
- One or more aspects or features of the subject matter described herein may be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device (e.g., mouse, touch screen, etc.), and at least one output device.
- These computer programs, which can also be referred to as programs, software, software applications, applications, components, or code, include machine instructions for a programmable processor, and can be implemented in a high-level procedural language, an object-oriented programming language, a functional programming language, a logical programming language, and/or in assembly/machine language. As used herein, the term “machine-readable medium” (sometimes referred to as a computer program product) refers to physically embodied apparatus and/or device, such as for example magnetic discs, optical disks, memory, and Programmable Logic Devices (PLDs), used to provide machine instructions and/or data to a programmable data processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable data processor. The machine-readable medium can store such machine instructions non-transitorily, such as for example as would a non-transient solid state memory or a magnetic hard drive or any equivalent storage medium. The machine-readable medium can alternatively or additionally store such machine instructions in a transient manner, such as for example as would a processor cache or other random access memory associated with one or more physical processor cores.
- To provide for interaction with a user, the subject matter described herein can be implemented on a computer having a display device, such as for example a cathode ray tube (CRT) or a liquid crystal display (LCD) monitor for displaying information to the user and a keyboard and a pointing device, such as for example a mouse or a trackball, by which the user may provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback, such as for example visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including, but not limited to, acoustic, speech, or tactile input. Other possible input devices include, but are not limited to, touch screens or other touch-sensitive devices such as single or multi-point resistive or capacitive trackpads, voice recognition hardware and software, optical scanners, optical pointers, digital image capture devices and associated interpretation software, and the like.
- The subject matter described herein may be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a client computer having a graphical user interface or a Web browser through which a user may interact with an implementation of the subject matter described herein), or any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (“LAN”), a wide area network (“WAN”), and the Internet.
- The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
- The subject matter described herein can be embodied in systems, apparatus, methods, and/or articles depending on the desired configuration. The implementations set forth in the foregoing description do not represent all implementations consistent with the subject matter described herein. Instead, they are merely some examples consistent with aspects related to the described subject matter. Although a few variations have been described in detail above, other modifications or additions are possible. In particular, further features and/or variations can be provided in addition to those set forth herein. For example, the implementations described above can be directed to various combinations and subcombinations of the disclosed features and/or combinations and subcombinations of several further features disclosed above. In addition, the logic flow(s) depicted in the accompanying figures and/or described herein do not necessarily require the particular order shown, or sequential order, to achieve desirable results. Other implementations may be within the scope of the following claims.
Claims (20)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/146,589 US9450602B2 (en) | 2014-01-02 | 2014-01-02 | Efficiently query compressed time-series data in a database |
US15/244,399 US10776319B2 (en) | 2014-01-02 | 2016-08-23 | Efficiently query compressed time-series data in a database |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/146,589 US9450602B2 (en) | 2014-01-02 | 2014-01-02 | Efficiently query compressed time-series data in a database |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/244,399 Continuation US10776319B2 (en) | 2014-01-02 | 2016-08-23 | Efficiently query compressed time-series data in a database |
Publications (2)
Publication Number | Publication Date |
---|---|
US20150186434A1 true US20150186434A1 (en) | 2015-07-02 |
US9450602B2 US9450602B2 (en) | 2016-09-20 |
Family
ID=53481997
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/146,589 Active 2034-07-07 US9450602B2 (en) | 2014-01-02 | 2014-01-02 | Efficiently query compressed time-series data in a database |
US15/244,399 Active 2035-11-10 US10776319B2 (en) | 2014-01-02 | 2016-08-23 | Efficiently query compressed time-series data in a database |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/244,399 Active 2035-11-10 US10776319B2 (en) | 2014-01-02 | 2016-08-23 | Efficiently query compressed time-series data in a database |
Country Status (1)
Country | Link |
---|---|
US (2) | US9450602B2 (en) |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150032775A1 (en) * | 2013-07-26 | 2015-01-29 | Metamarkets Group Inc. | Segment data visibility and management in a distributed database of time stamped records |
US20150363167A1 (en) * | 2014-06-16 | 2015-12-17 | International Business Machines Corporation | Flash optimized columnar data layout and data access algorithms for big data query engines |
US20160329928A1 (en) * | 2015-05-07 | 2016-11-10 | Elster Solutions, Llc | System and method for efficient data compression in a communication system |
US9753935B1 (en) * | 2016-08-02 | 2017-09-05 | Palantir Technologies Inc. | Time-series data storage and processing database system |
US20180060374A1 (en) * | 2016-08-23 | 2018-03-01 | Sap Se | Optimizing column based database table compression |
US9952808B2 (en) | 2015-03-26 | 2018-04-24 | International Business Machines Corporation | File system block-level tiering and co-allocation |
CN108153483A (en) * | 2016-12-06 | 2018-06-12 | 南京南瑞继保电气有限公司 | A kind of time series data compression method based on attribute grouping |
CN109165217A (en) * | 2018-08-03 | 2019-01-08 | 北京涛思数据科技有限公司 | A kind of high-efficiency storage method of time series data |
US10216695B1 (en) | 2017-09-21 | 2019-02-26 | Palantir Technologies Inc. | Database system for time series data storage, processing, and analysis |
US10417224B2 (en) | 2017-08-14 | 2019-09-17 | Palantir Technologies Inc. | Time series database processing system |
US10417258B2 (en) | 2013-12-19 | 2019-09-17 | Exposit Labs, Inc. | Interactive multi-dimensional nested table supporting scalable real-time querying of large data volumes |
US10585907B2 (en) | 2015-06-05 | 2020-03-10 | Palantir Technologies Inc. | Time-series data storage and processing database system |
US10740309B2 (en) | 2015-12-18 | 2020-08-11 | Cisco Technology, Inc. | Fast circular database |
US10812332B2 (en) | 2018-02-28 | 2020-10-20 | Vmware Inc. | Impartial buffering in stream processing |
US10824623B2 (en) * | 2018-02-28 | 2020-11-03 | Vmware, Inc. | Efficient time-range queries on databases in distributed computing systems |
US10860576B2 (en) | 2018-01-26 | 2020-12-08 | Vmware, Inc. | Splitting a query into native query operations and post-processing operations |
CN112380268A (en) * | 2020-10-27 | 2021-02-19 | 国网宁夏电力有限公司经济技术研究院 | Method, device, equipment and storage medium for compressing equally spaced time series |
US11016972B2 (en) | 2018-01-26 | 2021-05-25 | Vmware, Inc. | Splitting a time-range query into multiple sub-queries for serial execution |
US11016971B2 (en) | 2018-01-26 | 2021-05-25 | Vmware, Inc. | Splitting a time-range query into multiple sub-queries for parallel execution |
US11016986B2 (en) | 2017-12-04 | 2021-05-25 | Palantir Technologies Inc. | Query-based time-series data display and processing system |
US11023469B2 (en) * | 2017-11-29 | 2021-06-01 | Teradata Us, Inc. | Value list compression (VLC) aware qualification |
US20210182416A1 (en) * | 2019-12-13 | 2021-06-17 | Vmware, Inc. | Method and system for secure access to metrics of time series data |
US11178213B2 (en) | 2018-02-28 | 2021-11-16 | Vmware, Inc. | Automated configuration based deployment of stream processing pipeline |
CN113742335A (en) * | 2021-01-28 | 2021-12-03 | 北京沃东天骏信息技术有限公司 | Data compression management method and device |
US11281726B2 (en) | 2017-12-01 | 2022-03-22 | Palantir Technologies Inc. | System and methods for faster processor comparisons of visual graph features |
US11314738B2 (en) | 2014-12-23 | 2022-04-26 | Palantir Technologies Inc. | Searching charts |
US11379453B2 (en) | 2017-06-02 | 2022-07-05 | Palantir Technologies Inc. | Systems and methods for retrieving and processing data |
US11741124B2 (en) | 2018-01-26 | 2023-08-29 | Vmware, Inc. | Data ingestion by distributed-computing systems |
US12118041B2 (en) * | 2019-10-13 | 2024-10-15 | Thoughtspot, Inc. | Query execution on compressed in-memory data |
US12124467B2 (en) | 2023-06-07 | 2024-10-22 | Palantir Technologies Inc. | Query-based time-series data display and processing system |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9450602B2 (en) * | 2014-01-02 | 2016-09-20 | Sap Se | Efficiently query compressed time-series data in a database |
CN107506490B (en) * | 2017-09-22 | 2020-08-11 | 深圳大学 | Priority query algorithm and system based on position top-k keyword query under sliding window |
CN109302187A (en) * | 2018-08-31 | 2019-02-01 | 中国海洋大学 | A kind of data compression method for subsurface buoy thermohaline depth data |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5093782A (en) * | 1987-12-14 | 1992-03-03 | Texas Instruments Incorporated | Real time event driven database management system |
US20040015478A1 (en) * | 2000-11-30 | 2004-01-22 | Pauly Duncan Gunther | Database |
US20070208792A1 (en) * | 2004-09-13 | 2007-09-06 | Expway | Method for compressing and decompressing a sequence of numbers |
US20130103657A1 (en) * | 2010-05-14 | 2013-04-25 | Hitachi, Ltd. | Time-series data management device, system, method, and program |
US20140122022A1 (en) * | 2012-10-31 | 2014-05-01 | International Business Machines Corporation | Processing time series data from multiple sensors |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7827218B1 (en) * | 2006-11-18 | 2010-11-02 | X-Engines, Inc. | Deterministic lookup using hashed key in a multi-stride compressed trie structure |
US9524317B2 (en) * | 2007-08-24 | 2016-12-20 | International Business Machines Corporation | Optimization of aggregate queries in database management systems using an early out join when processing min and max functions |
US20110218978A1 (en) * | 2010-02-22 | 2011-09-08 | Vertica Systems, Inc. | Operating on time sequences of data |
KR101480374B1 (en) * | 2010-05-24 | 2015-01-09 | 에어 프로덕츠 앤드 케미칼스, 인코오포레이티드 | Bulk distribution method |
US8742959B1 (en) | 2013-01-22 | 2014-06-03 | Sap Ag | Compressing a time series of data |
US9450602B2 (en) * | 2014-01-02 | 2016-09-20 | Sap Se | Efficiently query compressed time-series data in a database |
-
2014
- 2014-01-02 US US14/146,589 patent/US9450602B2/en active Active
-
2016
- 2016-08-23 US US15/244,399 patent/US10776319B2/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5093782A (en) * | 1987-12-14 | 1992-03-03 | Texas Instruments Incorporated | Real time event driven database management system |
US20040015478A1 (en) * | 2000-11-30 | 2004-01-22 | Pauly Duncan Gunther | Database |
US20070208792A1 (en) * | 2004-09-13 | 2007-09-06 | Expway | Method for compressing and decompressing a sequence of numbers |
US20130103657A1 (en) * | 2010-05-14 | 2013-04-25 | Hitachi, Ltd. | Time-series data management device, system, method, and program |
US20140122022A1 (en) * | 2012-10-31 | 2014-05-01 | International Business Machines Corporation | Processing time series data from multiple sensors |
Non-Patent Citations (1)
Title |
---|
Zhang et al., A Self-Adaptive Regression-Based Multivariate Data Compression Scheme with Error Bound in Wireless Sensor Networks, 2013, Hindawi, Volume 2013, pp. 1-12 * |
Cited By (48)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150032775A1 (en) * | 2013-07-26 | 2015-01-29 | Metamarkets Group Inc. | Segment data visibility and management in a distributed database of time stamped records |
US10324942B2 (en) * | 2013-07-26 | 2019-06-18 | Snap Inc. | Segment data visibility and management in a distributed database of time stamped records |
US10417258B2 (en) | 2013-12-19 | 2019-09-17 | Exposit Labs, Inc. | Interactive multi-dimensional nested table supporting scalable real-time querying of large data volumes |
US10048937B2 (en) | 2014-06-16 | 2018-08-14 | International Business Machines Corporation | Flash optimized columnar data layout and data access algorithms for big data query engines |
US20150363167A1 (en) * | 2014-06-16 | 2015-12-17 | International Business Machines Corporation | Flash optimized columnar data layout and data access algorithms for big data query engines |
US9846567B2 (en) * | 2014-06-16 | 2017-12-19 | International Business Machines Corporation | Flash optimized columnar data layout and data access algorithms for big data query engines |
US10162598B2 (en) | 2014-06-16 | 2018-12-25 | International Business Machines Corporation | Flash optimized columnar data layout and data access algorithms for big data query engines |
US11314738B2 (en) | 2014-12-23 | 2022-04-26 | Palantir Technologies Inc. | Searching charts |
US10558399B2 (en) | 2015-03-26 | 2020-02-11 | International Business Machines Corporation | File system block-level tiering and co-allocation |
US9952808B2 (en) | 2015-03-26 | 2018-04-24 | International Business Machines Corporation | File system block-level tiering and co-allocation |
US11593037B2 (en) | 2015-03-26 | 2023-02-28 | International Business Machines Corporation | File system block-level tiering and co-allocation |
US20160329928A1 (en) * | 2015-05-07 | 2016-11-10 | Elster Solutions, Llc | System and method for efficient data compression in a communication system |
US10585907B2 (en) | 2015-06-05 | 2020-03-10 | Palantir Technologies Inc. | Time-series data storage and processing database system |
US11681678B2 (en) | 2015-12-18 | 2023-06-20 | Cisco Technology, Inc. | Fast circular database |
US11232087B2 (en) | 2015-12-18 | 2022-01-25 | Cisco Technology, Inc. | Fast circular database |
US10740309B2 (en) | 2015-12-18 | 2020-08-11 | Cisco Technology, Inc. | Fast circular database |
US10664444B2 (en) | 2016-08-02 | 2020-05-26 | Palantir Technologies Inc. | Time-series data storage and processing database system |
US9753935B1 (en) * | 2016-08-02 | 2017-09-05 | Palantir Technologies Inc. | Time-series data storage and processing database system |
US10235100B2 (en) * | 2016-08-23 | 2019-03-19 | Sap Se | Optimizing column based database table compression |
US20180060374A1 (en) * | 2016-08-23 | 2018-03-01 | Sap Se | Optimizing column based database table compression |
CN108153483A (en) * | 2016-12-06 | 2018-06-12 | 南京南瑞继保电气有限公司 | A kind of time series data compression method based on attribute grouping |
US11379453B2 (en) | 2017-06-02 | 2022-07-05 | Palantir Technologies Inc. | Systems and methods for retrieving and processing data |
US11397730B2 (en) | 2017-08-14 | 2022-07-26 | Palantir Technologies Inc. | Time series database processing system |
US10417224B2 (en) | 2017-08-14 | 2019-09-17 | Palantir Technologies Inc. | Time series database processing system |
US11573970B2 (en) | 2017-09-21 | 2023-02-07 | Palantir Technologies Inc. | Database system for time series data storage, processing, and analysis |
US10216695B1 (en) | 2017-09-21 | 2019-02-26 | Palantir Technologies Inc. | Database system for time series data storage, processing, and analysis |
US11914605B2 (en) | 2017-09-21 | 2024-02-27 | Palantir Technologies Inc. | Database system for time series data storage, processing, and analysis |
US11023469B2 (en) * | 2017-11-29 | 2021-06-01 | Teradata Us, Inc. | Value list compression (VLC) aware qualification |
US11281726B2 (en) | 2017-12-01 | 2022-03-22 | Palantir Technologies Inc. | System and methods for faster processor comparisons of visual graph features |
US12099570B2 (en) | 2017-12-01 | 2024-09-24 | Palantir Technologies Inc. | System and methods for faster processor comparisons of visual graph features |
US11016986B2 (en) | 2017-12-04 | 2021-05-25 | Palantir Technologies Inc. | Query-based time-series data display and processing system |
US11593365B2 (en) | 2018-01-26 | 2023-02-28 | Vmware, Inc. | Splitting a time-range query into multiple sub-queries for serial execution |
US10860576B2 (en) | 2018-01-26 | 2020-12-08 | Vmware, Inc. | Splitting a query into native query operations and post-processing operations |
US11741124B2 (en) | 2018-01-26 | 2023-08-29 | Vmware, Inc. | Data ingestion by distributed-computing systems |
US11514032B2 (en) | 2018-01-26 | 2022-11-29 | Vmware, Inc. | Splitting a query into native query operations and post-processing operations |
US11016971B2 (en) | 2018-01-26 | 2021-05-25 | Vmware, Inc. | Splitting a time-range query into multiple sub-queries for parallel execution |
US11016972B2 (en) | 2018-01-26 | 2021-05-25 | Vmware, Inc. | Splitting a time-range query into multiple sub-queries for serial execution |
US11178213B2 (en) | 2018-02-28 | 2021-11-16 | Vmware, Inc. | Automated configuration based deployment of stream processing pipeline |
US10812332B2 (en) | 2018-02-28 | 2020-10-20 | Vmware Inc. | Impartial buffering in stream processing |
US11586623B2 (en) | 2018-02-28 | 2023-02-21 | Vmware, Inc. | Efficient time-range queries on databases in distributed computing systems |
US10824623B2 (en) * | 2018-02-28 | 2020-11-03 | Vmware, Inc. | Efficient time-range queries on databases in distributed computing systems |
US11190401B2 (en) | 2018-02-28 | 2021-11-30 | Vmware Inc. | Impartial buffering in stream processing |
CN109165217A (en) * | 2018-08-03 | 2019-01-08 | 北京涛思数据科技有限公司 | A kind of high-efficiency storage method of time series data |
US12118041B2 (en) * | 2019-10-13 | 2024-10-15 | Thoughtspot, Inc. | Query execution on compressed in-memory data |
US20210182416A1 (en) * | 2019-12-13 | 2021-06-17 | Vmware, Inc. | Method and system for secure access to metrics of time series data |
CN112380268A (en) * | 2020-10-27 | 2021-02-19 | 国网宁夏电力有限公司经济技术研究院 | Method, device, equipment and storage medium for compressing equally spaced time series |
CN113742335A (en) * | 2021-01-28 | 2021-12-03 | 北京沃东天骏信息技术有限公司 | Data compression management method and device |
US12124467B2 (en) | 2023-06-07 | 2024-10-22 | Palantir Technologies Inc. | Query-based time-series data display and processing system |
Also Published As
Publication number | Publication date |
---|---|
US10776319B2 (en) | 2020-09-15 |
US20160357777A1 (en) | 2016-12-08 |
US9450602B2 (en) | 2016-09-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10776319B2 (en) | Efficiently query compressed time-series data in a database | |
US8868594B2 (en) | Split processing paths for a database calculation engine | |
US10474648B2 (en) | Migration of unified table metadata graph nodes | |
US20200004517A1 (en) | Cache efficient reading of result values in a column store database | |
US11892999B2 (en) | Faster access for compressed time series data: the block index | |
US20110313969A1 (en) | Updating historic data and real-time data in reports | |
JP2014120153A (en) | Column smart mechanism for column based database | |
US9959312B2 (en) | High performance index creation on sorted data using parallel query plans | |
US9177037B2 (en) | In-memory runtime for multidimensional analytical views | |
US11741117B2 (en) | Enterprise search using database views | |
CN112905600A (en) | Data query method and device, storage medium and electronic equipment | |
JP2015197909A (en) | Online analytical processing method using 2 level query by sql parsing and result cashing for processing large capacity data | |
US10558661B2 (en) | Query plan generation based on table adapter | |
US11016973B2 (en) | Query plan execution engine | |
US10877959B2 (en) | Integrated database table access | |
US8812523B2 (en) | Predicate result cache | |
US20160378814A1 (en) | Formula-Encoded Time Stamps for Time Series Data | |
US11789741B2 (en) | Determining an optimum quantity of interleaved instruction streams of defined coroutines | |
US9589038B1 (en) | Attribute tracking, profiling, and recognition | |
US20150120642A1 (en) | Realtime snapshot indices | |
US10860571B2 (en) | Storage and pruning for faster access of a document store | |
US11003634B2 (en) | Dynamic linked multi-layered business object configurations | |
US10789249B2 (en) | Optimal offset pushdown for multipart sorting | |
US10769214B2 (en) | Encoding and decoding files for a document store | |
US20170139994A1 (en) | Faster Main Memory Scans in Unsorted Dictionary-Encoded Vectors |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: SAP AG, GERMANY Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:EICHINGER, FRANK;KURFISS, DENNIS;REEL/FRAME:031883/0967 Effective date: 20131219 |
|
AS | Assignment |
Owner name: SAP SE, GERMANY Free format text: CHANGE OF NAME;ASSIGNOR:SAP AG;REEL/FRAME:033625/0223 Effective date: 20140707 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
FEPP | Fee payment procedure |
Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 8 |