
US8862566B2 - Systems and methods for intelligent parallel searching - Google Patents



Publication number
US8862566B2
Authority
US
United States
Prior art keywords
data
index
indices
data sources
inquiry
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US13/661,485
Other versions
US20140122455A1 (en)
Inventor
Stephen Leitner
Keith W. Manthey
Mark Burgess
Samuel Canfield
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Equifax Inc
Original Assignee
Equifax Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Equifax Inc
Priority to US13/661,485 (US8862566B2)
Assigned to EQUIFAX INC. Assignment of assignors interest (see document for details). Assignors: BURGESS, MARK; CANFIELD, SAMUEL; LEITNER, STEPHEN; MANTHEY, KEITH W.
Priority to CA2888846A (CA2888846C)
Priority to ES13849495T (ES2752058T3)
Priority to PCT/US2013/066911 (WO2014066816A1)
Priority to EP13849495.0A (EP2912578B1)
Priority to PT138494950T (PT2912578T)
Publication of US20140122455A1
Publication of US8862566B2
Application granted
Legal status: Active
Adjusted expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F17/30864
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Definitions

  • This disclosure relates generally to computer hardware and methods implemented on such computer hardware, and more particularly to conducting intelligent parallel searches of multiple data sources.
  • Search applications and systems can provide search capabilities to locate and retrieve information in an online environment.
  • search applications and systems can be required to search or otherwise access large amounts of data, such as terabytes of data, and return a result in less than a second.
  • Previous solutions for providing sub-second search capabilities of data sources can require that data be stored in a common format. Previous solutions do not provide intelligent searches of data sources including data in different formats in a manner that can provide a response in less than a second. Accordingly, such solutions can require data to be converted to a common or proprietary format in order to search or otherwise access the data.
  • the search engine receives a request to access target data that is stored in at least one of multiple data sources. Each data source has a candidate index.
  • the search engine extracts inquiry parameters from the request. Each inquiry parameter corresponds to a sub-index of a respective general index. Each general index includes an index of relationships between data from at least two of the data sources. Each sub-index includes a subset of the respective general index.
  • the search engine performs parallel searches of the general indices common to the data sources. Each parallel search includes searching sub-indices for the general indices based on corresponding inquiry parameters for the sub-indices.
  • the search engine performs additional parallel searches of the candidate indices based on results of parallel searches.
  • the search engine extracts an output based on results returned from the additional parallel searches.
  • FIG. 1 is a network diagram illustrating a computing system having a search engine in communication with data sources via a network according to one feature.
  • FIG. 2 is a block diagram illustrating data sources having indices and sub-indices according to one feature.
  • FIG. 3 is a block diagram illustrating data sources associated with candidate indices and general indices according to one feature.
  • FIG. 4 is a block diagram illustrating a flow of communications between a search engine and data sources according to one feature.
  • FIG. 5 is a block diagram depicting an example of computing systems for implementing certain features.
  • FIG. 6 is a flow chart illustrating an example method for conducting intelligent parallel searching of the data sources according to one feature.
  • FIG. 7 is a flow chart illustrating an example method for formatting inquiry parameters for use with data sources according to one feature.
  • FIG. 8 is a block diagram illustrating an example output of intelligent parallel searching performed by a search engine.
  • Intelligent parallel searching can include utilizing relationships between data in different data sources to partition a search process into multiple search processes to be executed in parallel.
  • a search engine executed on a computing system or other processing device can receive a search inquiry.
  • a search inquiry can include a request to search or otherwise access data stored in at least one of multiple data sources.
  • the search engine can extract inquiry parameters, such as index inquiry information and candidate inquiry information, from the search inquiry.
  • Index inquiry information can include data corresponding to an index or sub-index for a data source. For example, if a first data source includes an index based on names and a second data source includes an index based on social security numbers, the search engine can extract index inquiry information such as a surname and a social security number from a search inquiry.
  • Candidate inquiry information can include several data items corresponding to a specific individual or entity.
  • For example, if a search inquiry includes a name, an address, and an income level, the search engine can extract candidate inquiry information usable for identifying a particular individual or entity, such as the name and address.
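  • The split between index inquiry information and candidate inquiry information can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `extract_inquiry_parameters` and the two field sets are hypothetical, and a search inquiry is assumed to arrive as a flat dictionary.

```python
def extract_inquiry_parameters(inquiry, indexed_fields, identifying_fields):
    """Split a search inquiry into index and candidate inquiry information.

    indexed_fields: fields for which some data source has an index (assumption).
    identifying_fields: fields usable to identify an individual (assumption).
    """
    index_info = {k: v for k, v in inquiry.items() if k in indexed_fields}
    candidate_info = {k: v for k, v in inquiry.items() if k in identifying_fields}
    return index_info, candidate_info


inquiry = {
    "name": "Pat Smith",
    "address": "12 Oak St",
    "income": "50000",
    "ssn": "123-45-6789",
}
index_info, candidate_info = extract_inquiry_parameters(
    inquiry,
    indexed_fields={"name", "ssn"},
    identifying_fields={"name", "address"},
)
```

  • In this sketch the income level is carried by the inquiry but is used neither for index lookups nor for identifying the individual.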
  • the search engine can generate index search elements from the index inquiry information and candidate search elements from the candidate inquiry information.
  • Search elements can include search terms formatted for use with a specific type of data source.
  • the search engine can provide the index search elements to parallelized processes for searching data source indices. Each inquiry parameter can be intelligently mapped to a corresponding sub-index for a data source.
  • the results returned by the parallelized searches of the data source indices can be merged such that results duplicating candidate search elements are removed.
  • the search engine can provide the candidate search elements to parallelized processes for searching candidate indices.
  • the parallelized searches of candidate indices can provide the search engine with pointers for retrieving candidate data from data sources in a medium-agnostic and data type-agnostic manner.
  • the extracted candidate data, which can include target data corresponding to the search inquiry and relationships between target data, can be returned.
  • the search engine can thus provide parallelized searching of data sources in a medium-agnostic manner such that target data can be returned milliseconds after receiving the request to access the target data.
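  • The two-phase flow described above (parallel index searches, a merge that removes duplicates, then parallel candidate-index searches that yield pointers) can be sketched roughly as follows. Everything here is an assumption made for illustration: the toy index contents, the function names, and the use of a thread pool are not taken from the disclosure.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy data. General indices map an inquiry value to candidate keys;
# candidate indices map a candidate key to record pointers in one data source.
GENERAL_INDICES = {
    "surname": {"Smith": {"cand_1", "cand_2"}},
    "ssn": {"123-45-6789": {"cand_1"}},
}
CANDIDATE_INDICES = {
    "source_a": {"cand_1": ["a_4"], "cand_2": ["a_7"]},
    "source_b": {"cand_1": ["b_2"]},
}


def search_general(item):
    """Look up one inquiry parameter in the index for its field."""
    field, value = item
    return GENERAL_INDICES.get(field, {}).get(value, set())


def search_candidates(source, candidates):
    """Resolve candidate keys to record pointers for one data source."""
    index = CANDIDATE_INDICES[source]
    return source, sorted(p for c in candidates for p in index.get(c, []))


def intelligent_search(inquiry):
    with ThreadPoolExecutor() as pool:
        # Phase 1: one concurrent search per inquiry parameter.
        hits = list(pool.map(search_general, inquiry.items()))
        # Merge the results; a set union drops duplicate candidates.
        candidates = set().union(*hits)
        # Phase 2: one concurrent search per candidate index.
        results = pool.map(
            lambda source: search_candidates(source, candidates),
            CANDIDATE_INDICES,
        )
        return dict(results)


pointers = intelligent_search({"surname": "Smith", "ssn": "123-45-6789"})
```

  • The returned dictionary holds pointers only; fetching the underlying records is left to whatever layer understands each data source's storage.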
  • search engine can refer to one or more software modules configured to search for information in one or more data sources.
  • a search engine can return search results, such as (but not limited to) target data.
  • Target data can include any data stored in a data source. Examples of target data can include (but are not limited to) web pages, images, entity identification, etc.
  • data source can refer to any combination of software modules and tangible computer-readable media configured to store data.
  • a data source can be a database that has a collection of data organized in a structured format.
  • a database can include one or more tables. Each table can have rows corresponding to data records and can have columns corresponding to properties of data records.
  • Other aspects can include a data source that is a repository that has one or more files organized in one or more directories.
  • Some data sources can include structured data.
  • Structured data can include data stored in fixed fields within a record or file. Examples of structured data can include (but are not limited to) relational databases and spreadsheets.
  • Other data sources can include unstructured data. Unstructured data can include data that is not stored using fixed fields or locations. Unstructured data can include free-form text, such as (but not limited to) word processing documents, portable document format (“PDF”) files, e-mail messages, blogs, web pages, etc.
  • Other data sources can include semi-structured data.
  • Semi-structured data can include data that is not organized using data models such as relational databases or other forms of data tables and that includes tags or other markers.
  • Tags or other markers can delineate elements of records in a data source including semi-structured data. Tags or other markers can also identify hierarchical relationships between records in a data source including semi-structured data.
  • data source index can refer to a file or other data identifying a location for each record in one or more data sources.
  • a data source index can identify a location for each record using a data pointer.
  • a data pointer can identify a location in a physical computer-readable medium and/or a location in a logical data structure.
  • For a relational database, an index can include a copy of one or more columns of a table and a pointer mapping unique values for each row in a column to one or more records in the relational database.
  • A non-limiting example of a data source index is a flat file.
  • Another non-limiting example of a data source index is a hierarchical index.
  • sub-index can refer to a portion of a data source index identifying locations for a subset of the data in a data source.
  • a data source can include multiple sub-indices collectively including all information included in the data source index.
  • a data source can include data describing which sub-index includes a respective portion of the index for the data source.
  • parallel can refer to dividing a series of processes to be executed sequentially by one or more processors into multiple subsets of processes. Each subset of processes can be executed concurrently with each other subset of processes. Executing the subsets of processes concurrently can reduce the amount of processing time associated with executing the entire series of processes as compared to executing the entire series of processes sequentially.
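  • As a rough illustration of this definition, the sketch below divides a sequential series of tasks into subsets and runs the subsets concurrently. The helper names and the choice of a thread pool are assumptions; in CPython, threads mainly help when tasks wait on I/O such as index lookups, not when they are CPU-bound.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subset(tasks):
    """Execute one subset of tasks sequentially."""
    return [task() for task in tasks]


def run_parallel(tasks, subsets=3):
    """Split tasks into subsets and execute the subsets concurrently."""
    chunks = [tasks[i::subsets] for i in range(subsets)]
    with ThreadPoolExecutor(max_workers=subsets) as pool:
        chunk_results = list(pool.map(run_subset, chunks))
    # Re-interleave so results line up with the original task order.
    results = [None] * len(tasks)
    for i, chunk in enumerate(chunk_results):
        results[i::subsets] = chunk
    return results


tasks = [lambda n=n: n * n for n in range(7)]
squares = run_parallel(tasks)
```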
  • the term “candidate” can refer to a subset of data from a data source matching at least one inquiry parameter.
  • the candidate can include a set of data to either be returned or excluded by a search engine based on completing the parallel searches.
  • candidate index can refer to an index identifying records or other data associated with candidates from a given data source.
  • general index can refer to an index identifying one or more relationships between data included in at least two data sources.
  • Additional or alternative features can include the search engine executing the parallel searches via a data service layer.
  • the data service layer can include one or more software modules in a network protocol providing an abstraction layer between the functions executed by a processor to access data and the logical data structures and physical storage media used for storing the data. Executing the parallel searches via a data service layer can allow the search engine to be executed in a medium-agnostic manner.
  • the term “medium-agnostic” can refer to executing a common set of operations to search or otherwise access data regardless of the type of storage media used to store data in the data sources.
  • a medium-agnostic operation can be used to search or otherwise access data stored on a first type of storage medium in the same manner as data stored on a second type of storage medium different from the first type.
  • different storage media can include, but are not limited to, a dynamic random access memory (“DRAM”) device, a non-volatile random-access memory (“NVRAM”) device, a solid-state disk (“SSD”), etc.
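  • One way to picture a medium-agnostic operation is a common interface with one implementation per storage medium. The sketch below is an assumption-laden illustration: the class names `RecordStore`, `MemoryStore`, and `FileStore` are hypothetical, and an in-memory list and a JSON-lines file merely stand in for different media.

```python
import json
from abc import ABC, abstractmethod


class RecordStore(ABC):
    """Common operations the search code relies on, whatever the medium."""

    @abstractmethod
    def fetch(self, pointer):
        """Return the record stored at the given pointer (a row number here)."""


class MemoryStore(RecordStore):
    def __init__(self, records):
        self._records = list(records)

    def fetch(self, pointer):
        return self._records[pointer]


class FileStore(RecordStore):
    """Reads records from a file holding one JSON object per line."""

    def __init__(self, path):
        self._path = path

    def fetch(self, pointer):
        with open(self._path, encoding="utf-8") as handle:
            for line_number, line in enumerate(handle):
                if line_number == pointer:
                    return json.loads(line)
        raise IndexError(pointer)


def fetch_all(store, pointers):
    """The same call works for any RecordStore implementation."""
    return [store.fetch(pointer) for pointer in pointers]
```

  • Code that calls `fetch_all` never needs to know which kind of store it was handed.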
  • Additional or alternative features can include the search engine performing searches in a data type-agnostic manner.
  • data type-agnostic can refer to executing a common set of operations to search or otherwise access data regardless of the logical data structure used to store the data.
  • the search engine can perform searches in a data type-agnostic manner by, for example, consuming data formats via plug-in software modules or other applications providing data layouts and data matching extensions.
  • Additional or alternative features can include the search engine providing an output that is usable for identity resolution.
  • identity resolution can include one or more processes executed to determine that an entity or individual identified in a first data source is the same as or associated with an entity or individual identified in a second data source.
  • Examples of an output that is usable for identity resolution can include target data from two or more data sources and data describing the relationships between the target data from different data sources.
  • a computing device can include any suitable arrangement of components that provide a result conditioned on one or more inputs.
  • Suitable computing devices include multipurpose microprocessor-based computer systems accessing stored software that programs or configures the computing system from a general-purpose computing apparatus to a specialized computing apparatus implementing one or more aspects of the present subject matter. Any suitable programming, scripting, or other type of language or combinations of languages may be used to implement the teachings contained herein in software to be used in programming or configuring a computing device.
  • FIG. 1 is a network diagram illustrating a computing system 102 in communication with data sources 104 a - c via a network 108 .
  • the computing system 102 can be any suitable computing system for hosting the search engine 110 . Some aspects can include the computing system 102 being a single computing system, such as a server system. Other aspects can include the computing system 102 being a virtual server implemented using a number of computing systems connected in a grid or cloud computing topology.
  • the search engine 110 executed at the computing system 102 can include one or more software modules for searching or otherwise accessing the data 106 a - c respectively stored in the data sources 104 a - c.
  • the data sources 104 a - c can include one or more software modules and associated hardware for storing data.
  • the data sources 104 a - c can store data in any format.
  • the data source 104 a can store data 106 a that is structured data.
  • the data source 104 b can store data 106 b that is unstructured data.
  • the data source 104 c can store data 106 c that is semi-structured data. While three data sources are depicted in FIG. 1 , the search engine 110 can search or otherwise access data stored in any number of data sources, including one.
  • FIG. 2 is a block diagram illustrating the data sources 104 a - c having indices and sub-indices.
  • Each of the data sources 104 a - c can respectively include indices 202 , 206 , 210 .
  • Each of the indices 202 , 206 , 210 can be generated by extracting a portion of the data from the respective data sources 104 a - c and associating each of the extracted data with one or more pointers identifying locations in a physical memory and/or a logical data structure in which records or other data including the extracted data can be found.
  • an index 202 can be generated by extracting each unique surname included in the data 106 a of the data source 104 a and associating each unique surname with one or more pointers to records or other data 106 a in the data source 104 a including the surname.
  • the data 106 a can include a table having records represented as rows with identification numbers corresponding to each record.
  • the index 202 can include a list of unique surnames associated with pointers to the respective rows including the surname.
  • In another example, for a data source 104 b having records including a field for a geographical address associated with an entity or individual, an index 206 can be generated by extracting each unique geographical address included in the data 106 b of the data source 104 b and associating each unique geographical address with one or more pointers to records or other data 106 b in the data source 104 b including the geographical address.
  • In another example, for a data source 104 c having records including a field for a social security number associated with an entity or individual, an index 210 can be generated by extracting each unique social security number included in the data 106 c of the data source 104 c and associating each unique social security number with one or more pointers to records or other data 106 c in the data source 104 c including the social security number.
  • a data source can include any number of indices.
  • a data source can include records having both surnames and geographical addresses.
  • the data source can include a first index based on surnames and a second index based on geographical addresses.
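  • Index generation as described for FIG. 2 can be sketched as follows: extract each unique value of a field and associate it with pointers to the records that contain it. The sample records and the function name `build_index` are made up for illustration, and row numbers stand in for pointers.

```python
from collections import defaultdict


def build_index(records, field):
    """Map each unique value of `field` to the row numbers containing it."""
    index = defaultdict(list)
    for row_number, record in enumerate(records):
        index[record[field]].append(row_number)
    return dict(index)


# Hypothetical records for a single data source.
records = [
    {"surname": "Adams", "address": "12 Oak St"},
    {"surname": "Baker", "address": "12 Oak St"},
    {"surname": "Adams", "address": "99 Elm Ave"},
]

# One data source can carry several indices, one per field.
surname_index = build_index(records, "surname")
address_index = build_index(records, "address")
```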
  • Each of the indices 202 , 206 , 210 can include two or more sub-indices.
  • Each sub-index can include a subset of the extracted data and associated pointers of the respective index with which the sub-index is associated.
  • the index 202 can be associated with a sub-index 204 a including surnames beginning with the letter A, a sub-index 204 b including surnames beginning with the letter B, and a sub-index 204 c including surnames beginning with the letter C.
  • the sub-indices can include any range of values.
  • an index 202 including surnames can include a sub-index 204 a of surnames beginning with the letters A-G, a sub-index 204 b of surnames beginning with the letters H-P, and a sub-index 204 c of surnames beginning with the letters Q-Z.
  • An index 206 including geographical addresses can include a sub-index 208 a of geographical addresses beginning with street numbers 000 to 599 and a sub-index 208 b of geographical addresses beginning with street numbers 600 to 999.
  • An index 210 including social security numbers can include a sub-index 212 a of social security numbers beginning with the digits 000 to 299, a sub-index 212 b of social security numbers beginning with the digits 300 to 699, and a sub-index 212 c of social security numbers beginning with the digits 700 to 999.
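  • Routing a lookup to the right sub-index amounts to a range search over the sub-index boundaries. The sketch below uses the three-digit prefix ranges given for the sub-indices 212 a - c; the function name and the use of `bisect` are illustrative assumptions.

```python
from bisect import bisect_right

# Lowest three-digit prefix covered by each sub-index, in ascending order.
SUB_INDEX_LOWER_BOUNDS = [0, 300, 700]
SUB_INDEX_NAMES = ["212a", "212b", "212c"]


def sub_index_for(ssn):
    """Pick the sub-index whose range contains the number's first three digits."""
    prefix = int(ssn[:3])
    position = bisect_right(SUB_INDEX_LOWER_BOUNDS, prefix) - 1
    return SUB_INDEX_NAMES[position]
```

  • Only the selected sub-index then needs to be searched, which is what allows searches for different parameters to proceed on different sub-indices at the same time.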
  • FIG. 3 is a block diagram illustrating relationships among the data sources 104 a - c , candidate indices 302 a - c , and general indices 304 a - c .
  • the candidate indices 302 a - c are associated with the general indices 304 a - c.
  • Each of the data sources 104 a - c can be associated with a respective candidate index 302 a - c .
  • Each of the candidate indices 302 a - c can include an index of records of a respective data source associated with a candidate.
  • a candidate can include two or more data items corresponding to a specific individual or entity.
  • each of the candidate indices 302 a , 302 b can be used to resolve individuals or entities having a given name and address to specific locations in the respective data sources 104 a , 104 b .
  • a search of candidate index 302 a for an individual or entity having the surname “C_Name” and the address “Addr_4” can be resolved to the fourth and fifth records of the data source 104 a via pointers having values 104 a _ 4 , 104 a _ 5 .
  • Each of the candidate indices 302 a - c can include or be associated with two or more sub-indices similar to the sub-indices described above with respect to FIG. 2 .
  • Each sub-index of a respective candidate index can include a subset of the extracted data and associated pointers of the respective index with which the sub-index is associated.
  • Each of the candidate indices 302 a - c can be associated with one or more of the general indices 304 a - c .
  • Each general index can include an index of relationships between data from two or more of the data sources 104 a - c . The relationships between data can be described in a general index by reference to a candidate index for a respective data source.
  • a general index 304 a associated with the candidate indices 302 a , 302 b can include an entry for a surname associated with a geographical address. The entry including the surname associated with a geographical address can in turn be associated with one or more pointers to records in the respective candidate indices 302 a , 302 b .
  • the general indices 304 a - c can be shared among the data sources 104 a - c . Sharing the general indices 304 a - c among the data sources 104 a - c can identify relationships between data in different data sources. As depicted in FIG. 3 , the general index provides a list of pointers identifying a candidate index and row number of a respective candidate index in which each unique combination of surnames and geographical addresses can be found.
  • a general index 304 b can include an entry for a social security number associated with a geographical address.
  • the entry including the social security number associated with a geographical address can in turn be associated with one or more pointers to records or other data 106 b , 106 c in the respective data sources 104 b , 104 c.
  • Although FIG. 3 depicts three general indices, any number of general indices describing relationships between data included in multiple data sources can be used.
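  • A general index of the kind described for FIG. 3 can be pictured as a mapping from each unique combination of values to pointers naming a candidate index and a row within it. In the sketch below, the candidate index contents and the function name `build_general_index` are hypothetical.

```python
from collections import defaultdict

# Hypothetical candidate indices: each row is a (surname, address) pair.
candidate_indices = {
    "302a": [("C_Name", "Addr_4"), ("D_Name", "Addr_1")],
    "302b": [("C_Name", "Addr_4")],
}


def build_general_index(candidate_indices):
    """Map each unique (surname, address) pair to (candidate index, row) pointers."""
    general = defaultdict(list)
    for index_id, rows in candidate_indices.items():
        for row_number, key in enumerate(rows):
            general[key].append((index_id, row_number))
    return dict(general)


general_index = build_general_index(candidate_indices)
```

  • An entry with pointers into more than one candidate index records that the same combination of values appears in more than one data source.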
  • FIG. 4 is a block diagram illustrating an example flow of communications between the search engine 110 and the data sources 104 a - c.
  • the search engine 110 can receive a request 402 to search or otherwise access data stored in one or more of the data sources 104 a - c .
  • the request 402 can include inquiry parameters 404 a - c .
  • a request 402 to search for an individual can include an inquiry parameter 404 a that is a surname, an inquiry parameter 404 b that is an address, and an inquiry parameter 404 c that is a social security number.
  • the search engine 110 can extract the inquiry parameters 404 a - c from the request 402 .
  • the search engine 110 can provide the inquiry parameters 404 a - c to the data sources 104 a - c .
  • the inquiry parameters 404 a - c can be provided to the data sources 104 a - c to perform parallel searches of the data sources 104 a - c .
  • Some aspects can include the inquiry parameters 404 a - c being provided to the data sources 104 a - c as index search elements.
  • Index search elements may be constructed from the inquiry parameters 404 a - c via hash key indexing.
  • the index search elements can be used for relationship processing.
  • the index search elements can be shared among the data sources 104 a - c to generate inter-source relationships.
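  • The disclosure does not spell out how hash key indexing is performed, so the following is only one plausible reading: normalize an inquiry parameter and hash it together with its field name, so that the same value yields the same key for every data source. The function name, the normalization rules, and the choice of SHA-256 are all assumptions.

```python
import hashlib


def index_search_element(field, value):
    """Build a hash key from a field name and a normalized inquiry value."""
    # Normalize case and whitespace so equivalent inputs share one key.
    normalized = " ".join(value.strip().lower().split())
    digest = hashlib.sha256(f"{field}:{normalized}".encode("utf-8")).hexdigest()
    # A shortened key is used here for readability only.
    return digest[:16]
```

  • Because the key depends only on the field and the normalized value, two data sources that index the same surname would produce the same element, which is what makes sharing elements across sources possible in this sketch.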
  • An inter-source relationship can include a relationship between records or other data in different data sources generated based on relationships between data within a data source. Inter-source relationships can be stored using one or more general indices.
  • a data source 104 a can include a relationship between a table including addresses and a table including surnames.
  • a data source 104 b can include a relationship between a table including account numbers and a table including surnames. Elements of the indices 202 , 206 can be shared such that records of the data source 104 a including surnames can be associated with records of the data source 104 b including surnames.
  • a resulting inter-source relationship can describe addresses in the data source 104 a being related to account numbers in the data source 104 b via the surnames included in the data sources 104 a , 104 b.
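  • The inter-source relationship in this example behaves like a join on the shared surname field. A minimal sketch, using made-up records and a hypothetical function name `relate`:

```python
from collections import defaultdict

# Hypothetical records standing in for data sources 104 a and 104 b.
source_a = [
    {"surname": "Adams", "address": "12 Oak St"},
    {"surname": "Baker", "address": "5 Pine Rd"},
]
source_b = [
    {"surname": "Adams", "account": "ACC-1"},
    {"surname": "Adams", "account": "ACC-2"},
]


def relate(source_a, source_b):
    """Relate addresses in source_a to account numbers in source_b via surnames."""
    accounts_by_surname = defaultdict(list)
    for record in source_b:
        accounts_by_surname[record["surname"]].append(record["account"])
    return {
        record["address"]: accounts_by_surname[record["surname"]]
        for record in source_a
        if record["surname"] in accounts_by_surname
    }


relationships = relate(source_a, source_b)
```

  • A surname alone is a weak join key; the sketch only shows the mechanism, not a matching rule suitable for identity resolution.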
  • Some aspects can include the search engine 110 having a plug-in software module or other application that is executable to format the inquiry parameters 404 a - c for use with the respective data sources 104 a - c .
  • the inquiry parameter 404 a provided to a data source 104 a including structured data, such as a relational database, may be formatted as a database query.
  • the inquiry parameter 404 c provided to a data source 104 c including semi-structured data, such as documents organized in a hierarchy via tags, may be formatted to retrieve data from a hierarchical data structure. Formatting the inquiry parameters 404 a - c for use with the respective data sources 104 a - c can allow a search engine 110 to be used with multiple data sources having data in native formats. Doing so can obviate a requirement for the data from the multiple data sources to be converted to a common format for use with the search engine 110 .
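  • Per-source formatting can be sketched as a small registry of formatter functions, one per kind of data source. The registry, the table name `records`, the tag name `record`, and the allow-list below are all hypothetical; the path expression uses the limited XPath subset supported by Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical allow-list: field names are interpolated into queries,
# so they must never come straight from user input.
ALLOWED_FIELDS = {"surname", "address", "ssn"}


def format_for_relational(field, value):
    """Format a parameter as a parameterized SQL query (table name is made up)."""
    return f"SELECT * FROM records WHERE {field} = ?", (value,)


def format_for_hierarchical(field, value):
    """Format a parameter as a path expression over tagged records.

    Values containing a single quote would need escaping; not handled here.
    """
    return f".//record[{field}='{value}']"


FORMATTERS = {
    "structured": format_for_relational,
    "semi-structured": format_for_hierarchical,
}


def format_inquiry(source_type, field, value):
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"unsupported field: {field}")
    return FORMATTERS[source_type](field, value)


document = ET.fromstring(
    "<records>"
    "<record><ssn>123</ssn></record>"
    "<record><ssn>456</ssn></record>"
    "</records>"
)
matches = document.findall(format_inquiry("semi-structured", "ssn", "123"))
```

  • Supporting a new kind of data source then means registering one more formatter rather than converting that source's data.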
  • the search engine 110 can retrieve candidate data 406 a - c based on the parallel searches of the respective candidate indices 302 a - c of the data sources 104 a - c .
  • the parallel searches can be executed using the candidate indices 302 a - c or sub-indices of the candidate indices 302 a - c .
  • the candidate data 406 a - c can include any of the data from the data sources 104 a - c matching or otherwise corresponding to an inquiry parameter provided to a respective data source. For example, a search of the data source 104 a using an inquiry parameter 404 a that is a surname can retrieve candidate data 406 a that includes all records including the surname.
  • a search of the data source 104 b using an inquiry parameter 404 b that is an address can retrieve candidate data 406 b that includes all records including the address or a part of the address, such as a street name or zip code.
  • a search of the data source 104 c using an inquiry parameter 404 c that is a social security number can retrieve candidate data 406 c that includes all records including the social security number.
  • the candidate data 406 a - c can additionally or alternatively include relationships between data from at least two of the data sources 104 a - c matching or otherwise corresponding to an inquiry parameter provided to a respective data source.
  • the search engine 110 can search the general indices 304 a , 304 b using de-duplicated candidate data 408 a , 408 b . For example, duplicate records in candidate data 406 a , 406 b can be removed such that the candidate data 408 a , 408 b includes a set of unique records or other data.
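  • De-duplication of candidate data before the general indices are searched can be sketched as below. Treating two records as duplicates when all of their fields match is an assumption; the disclosure does not define the comparison.

```python
def deduplicate(*candidate_sets):
    """Merge candidate record lists, keeping the first copy of each record."""
    seen = set()
    unique = []
    for records in candidate_sets:
        for record in records:
            # A record's sorted field/value pairs serve as its identity.
            key = tuple(sorted(record.items()))
            if key not in seen:
                seen.add(key)
                unique.append(record)
    return unique


candidates_a = [{"surname": "Adams", "address": "12 Oak St"}]
candidates_b = [
    {"address": "12 Oak St", "surname": "Adams"},
    {"surname": "Baker", "address": "12 Oak St"},
]
unique_candidates = deduplicate(candidates_a, candidates_b)
```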
  • the search engine 110 can retrieve one or more pointers 410 a , 410 b from the general indices 304 a , 304 b based on the search of the general indices 304 a , 304 b.
  • the search engine 110 can retrieve data subsets 412 a - c from the data 106 a - c using the one or more pointers 410 a , 410 b .
  • the data subsets 412 a - c can include one or more records or other data from one or more of the data sources 104 a - c .
  • the data subsets 412 a - c can also include relationships among the data retrieved from one or more of the data sources 104 a - c.
  • the search engine 110 can provide the output 414 that includes, or is generated from, the data subsets 412 a - c .
  • the output 414 can include data and relationships between data.
  • the output 414 can be usable for identity resolution. Some aspects can include applying a matching plug-in module or other application to the output 414 .
  • the matching plug-in module or other application can analyze the relationships between data included in the output 414 to determine that the output 414 includes or does not include the target data of the request 402 , such as the identity of an individual.
  • FIG. 5 is a block diagram depicting examples of computing systems for implementing certain features.
  • the examples of computing systems include the computing system 102 and a data source 104 communicating via the network 108 .
  • the computing system 102 includes a processor 502 communicatively coupled to a memory 504 . The processor 502 can execute computer-executable program instructions and/or access information stored in the memory 504 .
  • a processor 502 may include a microprocessor, an ASIC, a state machine, or other processor, and can be any of a number of computer processors.
  • Such a processor can include, or may be in communication with, a computer-readable medium which stores instructions that, when executed by the processor, cause the processor to perform the steps described herein.
  • the data source 104 includes a computer-readable medium such as a memory 510 . Data 106 , the index 202 , and the sub-indices 204 a , 204 b can be stored in the memory 510 .
  • a computer-readable medium may include, but is not limited to, an electronic, optical, magnetic, or other storage device capable of providing a processor with computer-readable instructions.
  • Other examples can include, but are not limited to, a floppy disk, CD-ROM, DVD, magnetic disk, memory chip, ROM, RAM, an ASIC, a configured processor, optical storage, magnetic tape or other magnetic storage, or any other medium from which a computer processor can read instructions.
  • the instructions may include processor-specific instructions generated by a compiler and/or an interpreter from code written in any suitable computer-programming language, including, for example, C, C++, C#, Visual Basic, Java, Python, Perl, JavaScript, and ActionScript.
  • the computing system 102 may also include a number of external or internal devices such as a mouse, a CD-ROM, DVD, a keyboard, a display, audio speakers, one or more microphones, or any other input or output devices.
  • the computing system 102 can receive input from and provide output to external devices via an input/output (“I/O”) interface 508 .
  • a bus 506 can communicatively couple the components of the computing system 102 .
  • FIG. 5 also illustrates the search engine 110 and candidate indices 302 a - c and general indices 304 a - c included in the memory 504 of the computing system 102 .
  • the search engine 110 can include one or more software modules configuring the processor 502 for searching or otherwise accessing the data 106 of the data source 104 .
  • the search engine 110 may be resident in any suitable computer-readable medium and execute on any suitable processor. Some aspects can include the search engine 110 and the candidate indices 302 a - c and general indices 304 a - c residing in memory at the computing system 102 . Other aspects can include one or more of the search engine 110 and the candidate indices 302 a - c and general indices 304 a - c being accessed by the computing system 102 from a remote location via the network 108 .
  • FIG. 6 is a flow chart illustrating an example method 600 for conducting intelligent parallel searching of the data sources 104 a - c .
  • the method 600 is described with reference to the system implementations depicted in FIGS. 1-4 . Other implementations, however, are possible.
  • the method 600 involves the search engine 110 receiving a request 402 to access target data, as shown in block 610 .
  • the target data can be stored in at least one of the data sources 104 a - c .
  • Some aspects can include the request 402 being received as or generated from input received via the I/O interface 508 .
  • Other aspects can include the request 402 being received as or generated from a message from an application in communication with the search engine 110 via the computing system 102 , such as a calling application.
  • the method 600 further involves the search engine 110 extracting the inquiry parameters 404 a - c from the request 402 , as shown in block 620 .
  • Extracting the inquiry parameters 404 a - c can include identifying one or more inquiry parameters included in the request 402 that can be used to search or otherwise access the data from each data source.
  • Each inquiry parameter can correspond to an index for a respective data source or a candidate index for a respective data source.
  • the search engine 110 can extract a surname, a geographical address, and a social security number from a request 402 and provide the surname to a data source 104 a having an index 202 including surnames, provide the geographical address to a data source 104 b having an index 206 including geographical addresses, and provide the social security number to a data source 104 c having an index 210 including social security numbers.
  • Extracting the inquiry parameters can additionally or alternatively include formatting the inquiry parameters 404 a - c for use with the respective data sources 104 a - c , as discussed in detail with respect to FIG. 7 .
  • the method 600 further involves the search engine 110 performing parallel searches of the general indices 304 a - c common to the data sources 104 a - c , as shown in block 630 .
  • Each parallel search can include searching a respective sub-index of a respective general index based on a corresponding inquiry parameter. For example, an inquiry parameter that is a surname “Doe” can be used to search a sub-index of surnames beginning with the letters A-F.
  • Performing the parallel searches can include searching multiple sub-indices of the general indices.
  • Performing the parallel searches can include searching multiple sub-indices associated with different general indices and/or data sources, searching multiple sub-indices associated within each general index and/or data source, or a combination of both.
  • Some aspects can include the search engine 110 executing the parallel searches via a data service layer.
  • the method 600 further involves the search engine 110 performing one or more additional parallel searches of the candidate indices 302 a - c based on results of the parallel searches of general indices 304 a , 304 b unioned with the inquiry parameters 404 a - c from the request 402 , as shown in block 640 .
  • Performing the union of the general indices 304 a - c with the inquiry information from the request 402 can involve excluding duplicate candidate data returned from the parallel searches, as described above with respect to FIG. 3 .
  • the method 600 further involves the search engine 110 extracting an output 414 based on results returned from the one or more additional parallel searches of the candidate indices 302 a - c , as shown in block 650 .
  • the output 414 can be extracted from candidate data 406 a - c returned from the additional parallel searches.
  • the output 414 can include the target data from at least two of the data sources and a relationship between the target data from the at least two data sources.
  • the target data and the relationship between the target data can be usable for identity resolution.
  • Some aspects can include a plug-in output formatting service or other application formatting the output 414 such that the output 414 can be provided to the application providing the request 402 .
  • FIG. 7 is a flow chart illustrating an example method for formatting the inquiry parameters 404 a - c for use with the respective data sources 104 a - c.
  • the search engine 110 selects one of the data sources 104 a - c for which inquiry parameters have not been formatted, as shown in block 710 .
  • the search engine 110 determines a format for a data source, as shown in block 720 . Some aspects can include the search engine 110 determining a format for a data source based on metadata included in the data source and describing the format for the data source. Other aspects can include the search engine 110 retrieving sample data from the data source and analyzing the data to determine the format for a data source.
  • the search engine 110 formats one or more inquiry parameters for accessing structured data, as shown in block 730 . Formatting inquiry parameters for accessing structured data can include generating queries for accessing data in relational databases based on the inquiry parameters.
  • the search engine 110 formats one or more inquiry parameters for accessing semi-structured data, as shown in block 740 . Formatting inquiry parameters for accessing semi-structured data can include generating queries for accessing data in hierarchical data structure based on the inquiry parameters.
  • the search engine 110 formats a first inquiry parameter for accessing unstructured data, as shown in block 750 .
  • the search engine 110 can determine if inquiry parameters have been formatted for each of the data sources 104 a - c , as shown in block 760 . If inquiry parameters have not been formatted for each of the data sources 104 a - c , the method can return to block 710 . If inquiry parameters have been formatted for each of the data sources 104 a - c , the method can terminate and proceed to block 630 of method 600 , as shown in block 770 .
  • FIG. 8 is a block diagram illustrating an example output 414 of intelligent parallel searching performed by a search engine 110 .
  • the output 414 can include the records returned as the result of a search of the candidate indices 302 a - c for the individual “Todd LastName” and the relationships between those records.
  • a search of general index 304 a can yield an entry 902 for an individual “Todd LastName” having an address “123 Street St.”
  • the entry 902 can provide a pointer to a record 906 a in data source 104 b having a name field with the value “Todd LastName” and an address field with the value “123 Street St.”
  • the relationships between records based on the address field within the data source 104 b can also be used to select the related records 906 b , 906 c having an address field with the value “123 Street St.” relating the records 906 b , 906 c to record 906 a.
  • a search of general index 304 b can yield entries 904 a , 904 b .
  • the entry 904 a can describe an individual “Todd LastName” having an address “456 Street St.” and a social security number “xxx-xx-1234.”
  • the entry 904 b can describe an individual “Todd LastName” having an address “889 Street St.” and a social security number “xxx-xx-4568.”
  • the entry 904 a can provide a pointer to a record 908 a in data source 104 c having a name field with the value “Todd LastName,” an address field with the value “456 Street St.”, and a social security number field with the value “xxx-xx-1234.”
  • the relationships between records based on the social security number field within the data source 104 c can also be used to select the related record 908 c having a social security number field with the value “xxx-xx-1234.”
  • the entry 904 b can provide a pointer to a
  • a computing device can include any suitable arrangement of components that provide a result conditioned on one or more inputs.
  • Suitable computing devices include multipurpose microprocessor-based computer systems accessing stored software that programs or configures the computing system from a general-purpose computing apparatus to a specialized computing apparatus implementing one or more features of the present subject matter. Any suitable programming, scripting, or other type of language or combinations of languages may be used to implement the teachings contained herein in software to be used in programming or configuring a computing device.
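As a non-limiting illustration of the formatting described for FIG. 7 (blocks 730 through 750), the following Python sketch maps one inquiry parameter to a query form suited to each kind of data source. The function name `format_parameter`, the table name `records`, the `record` element name, and the keyword tokenization for unstructured data are all assumptions made for this sketch; the description does not specify them.

```python
def format_parameter(source_kind, field, value):
    """Format one inquiry parameter for a given kind of data source.

    Hypothetical helper: `field` is assumed to be a trusted field name
    and `value` is assumed to contain no quote characters.
    """
    if source_kind == "structured":
        # Parameterized SQL text plus bound values for a relational database.
        return (f"SELECT * FROM records WHERE {field} = ?", (value,))
    if source_kind == "semi-structured":
        # Path expression for a hierarchical (tagged) data structure.
        return f".//record[{field}='{value}']"
    if source_kind == "unstructured":
        # Lower-cased keyword list for free-form text.
        return value.lower().split()
    raise ValueError(f"unknown data source kind: {source_kind}")


structured_query = format_parameter("structured", "surname", "Doe")
hierarchical_query = format_parameter("semi-structured", "surname", "Doe")
keywords = format_parameter("unstructured", "name", "Todd LastName")
```

A loop over the data sources, as in blocks 710 and 760, would call such a helper once per data source until every source has formatted parameters.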

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Systems and methods are provided for conducting intelligent parallel searches of data sources. A search engine receives a request to access target data that is stored in at least one of multiple data sources. Each data source has a candidate index. The search engine extracts inquiry parameters from the request. The inquiry parameters correspond to sub-indices of respective general indices. Each general index includes an index of relationships between data from at least two of the data sources. Each sub-index includes a subset of the respective general index. The search engine performs parallel searches of the general indices common to the data sources. Each parallel search includes searching the sub-indices based on corresponding inquiry parameters for the sub-indices. The search engine performs additional parallel searches of the candidate indices based on results of the parallel searches. The search engine extracts an output from results of the additional parallel searches.

Description

TECHNICAL FIELD
This disclosure relates generally to computer hardware and methods implemented on such computer hardware, and more particularly to conducting intelligent parallel searches of multiple data sources.
BACKGROUND
Search applications and systems can provide search capabilities to locate and retrieve information in an online environment. Within financial services or other credit-related industries, search applications and systems can be required to search or otherwise access large amounts of data, such as terabytes of data, and return a result in less than a second.
Previous solutions for providing sub-second search capabilities of data sources can require that data be stored in a common format. Previous solutions do not provide intelligent searches of data sources including data in different formats in a manner that can provide a response in less than a second. Accordingly, such solutions can require data to be converted to a common or proprietary format in order to search or otherwise access the data.
Systems and methods are therefore desirable that can conduct intelligent parallel searches of multiple data sources.
SUMMARY
One example involves a search engine executed by a processor. The search engine receives a request to access target data that is stored in at least one of multiple data sources. Each data source has a candidate index. The search engine extracts inquiry parameters from the request. Each inquiry parameter corresponds to a sub-index of a respective general index. Each general index includes an index of relationships between data from at least two of the data sources. Each sub-index includes a subset of the respective general index. The search engine performs parallel searches of the general indices common to the data sources. Each parallel search includes searching sub-indices for the general indices based on corresponding inquiry parameters for the sub-indices. The search engine performs additional parallel searches of the candidate indices based on results of parallel searches. The search engine extracts an output based on results returned from the additional parallel searches.
This illustrative example is mentioned not to limit or define the invention, but to aid understanding thereof. Other aspects, advantages, and features of the present invention will become apparent after review of the entire description and Figures, including the following sections: Brief Description of the Figures, Detailed Description, and Claims.
BRIEF DESCRIPTION OF THE FIGURES
These and other features, aspects, and advantages of the present disclosure are better understood when the following Detailed Description is read with reference to the accompanying drawings, wherein:
FIG. 1 is a network diagram illustrating a computing system having a search engine in communication with data sources via a network according to one feature;
FIG. 2 is a block diagram illustrating data sources having indices and sub-indices according to one feature;
FIG. 3 is a block diagram illustrating data sources associated with candidate indices and general indices according to one feature;
FIG. 4 is a block diagram illustrating a flow of communications between a search engine and data sources according to one feature;
FIG. 5 is a block diagram depicting an example of computing systems for implementing certain features;
FIG. 6 is a flow chart illustrating an example method for conducting intelligent parallel searching of the data sources according to one feature;
FIG. 7 is a flow chart illustrating an example method for formatting inquiry parameters for use with data sources according to one feature; and
FIG. 8 is a block diagram illustrating an example output of intelligent parallel searching performed by a search engine.
DETAILED DESCRIPTION
Computer-implemented systems and methods are disclosed for conducting intelligent parallel searches of data sources. Intelligent parallel searching can include utilizing relationships between data in different data sources to partition a search process into multiple search processes to be executed in parallel.
For example, a search engine executed on a computing system or other processing device can receive a search inquiry. Such a search inquiry can include a request to search or otherwise access data stored in at least one of multiple data sources. The search engine can extract inquiry parameters, such as index inquiry information and candidate inquiry information, from the search inquiry. Index inquiry information can include data corresponding to an index or sub-index for a data source. For example, if a first data source includes an index based on names and a second data source includes an index based on social security numbers, the search engine can extract index inquiry information such as a surname and a social security number from a search inquiry. Candidate inquiry information can include several data items corresponding to a specific individual or entity. For example, if a search inquiry includes a name, an address, and an income level, the search engine can extract candidate inquiry information usable for identifying a particular individual or entity, such as the name and address. The search engine can generate index search elements from the index inquiry information and candidate search elements from the candidate inquiry information. Search elements can include search terms formatted for use with a specific type of data source. The search engine can provide the index search elements to parallelized processes for searching data source indices. Each inquiry parameter can be intelligently mapped to a corresponding sub-index for a data source. The results returned by the parallelized searches of the data source indices can be merged such that results duplicating candidate search elements are removed. The search engine can provide the candidate search elements to parallelized processes for searching candidate indices. 
The parallelized searches of candidate indices can provide the search engine with pointers for retrieving candidate data from data sources in a medium-agnostic and data type-agnostic manner. The extracted candidate data, which can include target data corresponding to the search inquiry and relationships between target data, can be returned. The search engine can thus provide parallelized searching of data sources in a medium-agnostic manner such that target data can be returned milliseconds after receiving the request to access the target data.
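As a non-limiting illustration, the two-phase flow described above can be sketched in Python. The sketch assumes that each index is an in-memory dictionary, that a thread pool stands in for the parallelized processes, and that duplicate removal keeps the first occurrence of each candidate key. The names `lookup` and `two_phase_search` and the sample values are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def lookup(index, key):
    """Return the entries an index holds for a key, or an empty list."""
    return index.get(key, [])


def two_phase_search(general_indices, candidate_indices, params):
    """Search general indices in parallel, then candidate indices in parallel."""
    names = list(general_indices)
    with ThreadPoolExecutor() as pool:
        # Phase 1: one concurrent lookup per general index.
        hits = list(pool.map(
            lambda name: lookup(general_indices[name], params.get(name)),
            names))
    # Merge phase-1 results, dropping duplicates but keeping first-seen order.
    candidate_keys = list(dict.fromkeys(key for hit in hits for key in hit))

    def search_candidates(key):
        return [(source, pointer)
                for source, index in candidate_indices.items()
                for pointer in lookup(index, key)]

    with ThreadPoolExecutor() as pool:
        # Phase 2: one concurrent lookup per surviving candidate key.
        found = list(pool.map(search_candidates, candidate_keys))
    return [item for result in found for item in result]


# Hypothetical sample data for illustration only.
general = {"surname": {"Doe": ["cand-1", "cand-2"]}, "ssn": {"1234": ["cand-2"]}}
candidates = {"104a": {"cand-1": ["104a_1"]}, "104b": {"cand-2": ["104b_7"]}}
result = two_phase_search(general, candidates, {"surname": "Doe", "ssn": "1234"})
```

In this sketch the candidate key `cand-2` is returned by both general-index searches but is searched only once in the second phase.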
As used herein, the term “search engine” can refer to one or more software modules configured to search for information in one or more data sources. A search engine can return search results, such as (but not limited to) target data. Target data can include any data stored in a data source. Examples of target data can include (but are not limited to) web pages, images, entity identification, etc.
As used herein, the term “data source” can refer to any combination of software modules and tangible computer-readable media configured to store data. Some aspects can include a data source that is a database that has a collection of data organized in a structured format. For example, a database can include one or more tables. Each table can have rows corresponding to data records and can have columns corresponding to properties of data records. Other aspects can include a data source that is a repository that has one or more files organized in one or more directories.
Some data sources can include structured data. Structured data can include data stored in fixed fields within a record or file. Examples of structured data can include (but are not limited to) relational databases and spreadsheets. Other data sources can include unstructured data. Unstructured data can include data that is not stored using fixed fields or locations. Unstructured data can include free-form text, such as (but not limited to) word processing documents, portable document format (“PDF”) files, e-mail messages, blogs, web pages, etc. Other data sources can include semi-structured data. Semi-structured data can include data that is not organized using data models such as relational databases or other forms of data tables and that includes tags or other markers. Tags or other markers can delineate elements of records in a data source including semi-structured data. Tags or other markers can also identify hierarchical relationships between records in a data source including semi-structured data.
As used herein, the term “data source index” can refer to a file or other data identifying a location for each record in one or more data sources. A data source index can identify a location for each record using a data pointer. A data pointer can identify a location in a physical computer-readable medium and/or a location in a logical data structure. For example, in a relational database, an index can include a copy of one or more columns of a table and a pointer mapping unique values for each row in a column to one or more records in the relational database. One non-limiting example of a data source index is a flat file. Another non-limiting example of a data source index is a hierarchical index.
As used herein, the term “sub-index” can refer to a portion of a data source index identifying locations for a subset of the data in a data source. A data source can include multiple sub-indices collectively including all information included in the data source index. A data source can include data describing which sub-index includes a respective portion of the index for the data source.
As used herein, the term “parallel” can refer to dividing a series of processes to be executed sequentially by one or more processors into multiple subsets of processes. Each subset of processes can be executed concurrently with each other subset of processes. Executing the subsets of processes concurrently can reduce the amount of processing time associated with executing the entire series of processes as compared to executing the entire series of processes sequentially.
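As a non-limiting illustration of this definition, the following Python sketch divides a sequential series of tasks into subsets and executes the subsets concurrently. The use of a thread pool and the interleaved division of tasks are assumptions of the sketch, and the names `run_subset` and `run_in_parallel` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subset(subset):
    """Run the zero-argument callables of one subset in order."""
    return [task() for task in subset]


def run_in_parallel(tasks, n_subsets):
    """Split a sequential series of tasks into subsets and run them concurrently."""
    # Interleaved split: subset i receives tasks i, i + n_subsets, and so on.
    subsets = [tasks[i::n_subsets] for i in range(n_subsets)]
    with ThreadPoolExecutor(max_workers=n_subsets) as pool:
        # map() returns results in the same order as the subsets.
        return list(pool.map(run_subset, subsets))


tasks = [lambda i=i: i * i for i in range(6)]
results = run_in_parallel(tasks, 3)
```

Any reduction in processing time depends on the workload; the sketch shows only the division and concurrent execution.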
As used herein, the term “candidate” can refer to a subset of data from a data source matching at least one inquiry parameter. The candidate can include a set of data to either be returned or excluded by a search engine based on completing the parallel searches.
As used herein, the term “candidate index” can refer to an index identifying records or other data associated with candidates from a given data source.
As used herein, the term “general index” can refer to an index identifying one or more relationships between data included in at least two data sources.
Additional or alternative features can include the search engine executing the parallel searches via a data service layer. The data service layer can include one or more software modules in a network protocol providing an abstraction layer between the functions executed by a processor to access data and the logical data structures and physical storage media used for storing the data. Executing the parallel searches via a data service layer can allow the search engine to be executed in a medium-agnostic manner.
As used herein, the term “medium-agnostic” can refer to executing a common set of operations to search or otherwise access data regardless of the type of storage media used to store data in the data sources. For example, a medium-agnostic operation can be used to search or otherwise access data stored on a first type of storage medium in the same manner as data stored on a second type of storage medium different from the first type. Examples of different storage media can include, but are not limited to, a dynamic random access memory (“DRAM”) device, a non-volatile random-access memory (“NVRAM”) device, a solid-state disk (“SSD”), etc.
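As a non-limiting illustration of a medium-agnostic operation, the following Python sketch places a common interface between the caller and two storage back ends, one held in memory and one held in a file. The class names `RecordStore`, `InMemoryStore`, and `JsonLinesStore`, the function `fetch_all`, and the choice of a JSON-lines file are assumptions of the sketch.

```python
import json
from abc import ABC, abstractmethod


class RecordStore(ABC):
    """Common interface hiding where and how records are stored."""

    @abstractmethod
    def fetch(self, pointer):
        """Return the record found at the given pointer."""


class InMemoryStore(RecordStore):
    def __init__(self, records):
        self._records = records

    def fetch(self, pointer):
        return self._records[pointer]


class JsonLinesStore(RecordStore):
    """Records stored one JSON object per line; the pointer is a line number."""

    def __init__(self, path):
        self._path = path

    def fetch(self, pointer):
        with open(self._path, encoding="utf-8") as handle:
            for line_number, line in enumerate(handle):
                if line_number == pointer:
                    return json.loads(line)
        raise KeyError(pointer)


def fetch_all(store, pointers):
    """Medium-agnostic access: the same call works for any RecordStore."""
    return [store.fetch(pointer) for pointer in pointers]
```

The caller of `fetch_all` issues the same operations regardless of which back end holds the records.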
Additional or alternative features can include the search engine performing searches in a data type-agnostic manner. As used herein, the term “data type-agnostic” can refer to executing a common set of operations to search or otherwise access data regardless of logical data structure used to store the data. The search engine can perform searches in a data type-agnostic manner by, for example, consuming data formats via plug-in software modules or other applications providing data layouts and data matching extensions.
Additional or alternative features can include the search engine providing an output that is usable for identity resolution. As used herein, the term “identity resolution” can include one or more processes executed to determine that an entity or individual identified in a first data source is the same as or associated with an entity or individual identified in a second data source. Examples of an output that is usable for identity resolution can include target data from two or more data sources and data describing the relationships between the target data from different data sources.
The features discussed herein are not limited to any particular hardware architecture or configuration. A computing device can include any suitable arrangement of components that provide a result conditioned on one or more inputs. Suitable computing devices include multipurpose microprocessor-based computer systems accessing stored software that programs or configures the computing system from a general-purpose computing apparatus to a specialized computing apparatus implementing one or more aspects of the present subject matter. Any suitable programming, scripting, or other type of language or combinations of languages may be used to implement the teachings contained herein in software to be used in programming or configuring a computing device.
Referring now to the drawings, FIG. 1 is a network diagram illustrating a computing system 102 in communication with data sources 104 a-c via a network 108.
The computing system 102 can be any suitable computing system for hosting the search engine 110. Some aspects can include the computing system 102 being a single computing system, such as a server system. Other aspects can include the computing system 102 being a virtual server implemented using a number of computing systems connected in a grid or cloud computing topology. The search engine 110 executed at the computing system 102 can include one or more software modules for searching or otherwise accessing the data 106 a-c respectively stored in the data sources 104 a-c.
The data sources 104 a-c can include one or more software modules and associated hardware for storing data. The data sources 104 a-c can store data in any format. For example, the data source 104 a can store data 106 a that is structured data. The data source 104 b can store data 106 b that is unstructured data. The data source 104 c can store data 106 c that is semi-structured data. While three data sources are depicted in FIG. 1, the search engine 110 can search or otherwise access data stored in any number of data sources, including one.
FIG. 2 is a block diagram illustrating the data sources 104 a-c having indices and sub-indices.
Each of the data sources 104 a-c can respectively include indices 202, 206, 210. Each of the indices 202, 206, 210 can be generated by extracting a portion of the data from the respective data sources 104 a-c and associating each of the extracted data with one or more pointers identifying locations in a physical memory and/or a logical data structure in which records or other data including the extracted data can be found.
For example, for a data source 104 a having records including a field for a surname of an individual, an index 202 can be generated by extracting each unique surname included in the data 106 a of the data source 104 a and associating each unique surname with one or more pointers to records or other data 106 a in the data source 104 a including the surname. As depicted in FIG. 2, the data 106 a can include a table having records represented as rows with identification numbers corresponding to each record. The index 202 can include a list of unique surnames associated with pointers to the respective rows including the surname.
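As a non-limiting illustration, generating such an index can be sketched in Python. The sketch assumes the records are held in a dictionary keyed by identification number and that a pointer is simply that identification number; the function name `build_index` and the sample rows are hypothetical.

```python
from collections import defaultdict


def build_index(records, field):
    """Map each unique value of `field` to the row ids that contain it."""
    index = defaultdict(list)
    for row_id, record in records.items():
        index[record[field]].append(row_id)
    return dict(index)


# Hypothetical rows standing in for the data 106 a.
data_106a = {
    1: {"surname": "A_Name", "address": "Addr 1"},
    2: {"surname": "B_Name", "address": "Addr 2"},
    3: {"surname": "A_Name", "address": "Addr 3"},
}
index_202 = build_index(data_106a, "surname")
```

Calling `build_index(data_106a, "address")` on the same rows would yield a second index over geographical addresses.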
As another example, for a data source 104 b having records including a field for a geographical address associated with an entity or individual, an index 206 can be generated by extracting each unique geographical address included in the data 106 b of the data source 104 b and associating each unique geographical address with one or more pointers to records or other data 106 b in the data source 104 b including the geographical address. Similarly, for a data source 104 c having records including a field for a social security number associated with an entity or individual, an index 210 can be generated by extracting each unique social security number included in the data 106 c of the data source 104 c and associating each unique social security number with one or more pointers to records or other data 106 c in the data source 104 c including the social security number.
Although each of the data sources 104 a-c is depicted as having only a single index, a data source can include any number of indices. For example, a data source can include records having both surnames and geographical addresses. The data source can include a first index based on surnames and a second index based on geographical addresses.
Each of the indices 202, 206, 210 can include two or more sub-indices. Each sub-index can include a subset of the extracted data and associated pointers of the respective index with which the sub-index is associated. As depicted in FIG. 2, the index 202 can be associated with a sub-index 204 a including surnames beginning with the letter A, a sub-index 204 b including surnames beginning with the letter B, and a sub-index 204 c including surnames beginning with the letter C.
The sub-indices can include any range of values. For example, an index 202 including surnames can include a sub-index 204 a of surnames beginning with the letters A-G, a sub-index 204 b of surnames beginning with the letters H-P, and a sub-index 204 c of surnames beginning with the letters Q-Z. An index 206 including geographical addresses can include a sub-index 208 a of geographical addresses beginning with street numbers 000 to 599 and a sub-index 208 b of geographical addresses beginning with street numbers 600 to 999. An index 210 including social security numbers can include a sub-index 212 a of social security numbers beginning with the digits 000 to 299, a sub-index 212 b of social security numbers beginning with the digits 300 to 699, and a sub-index 212 c of social security numbers beginning with the digits 700 to 999.
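As a non-limiting illustration, dividing a surname index into sub-indices by range and routing a lookup to the matching sub-index can be sketched in Python. The sketch assumes non-overlapping letter ranges A-G, H-P, and Q-Z; the function names `select_range` and `partition_index` and the sample entries are hypothetical.

```python
def select_range(key, ranges):
    """Return the (low, high) letter range that covers the key's first letter."""
    first = key[0].upper()
    for low, high in ranges:
        if low <= first <= high:
            return (low, high)
    raise KeyError(f"no sub-index covers {key!r}")


def partition_index(index, ranges):
    """Split an index into one sub-index per range."""
    sub_indices = {letter_range: {} for letter_range in ranges}
    for key, pointers in index.items():
        sub_indices[select_range(key, ranges)][key] = pointers
    return sub_indices


ranges = [("A", "G"), ("H", "P"), ("Q", "Z")]
index = {"Doe": [1], "Smith": [2], "Park": [3]}
sub_indices = partition_index(index, ranges)
# A lookup for "Doe" only has to search the A-G sub-index.
doe_pointers = sub_indices[select_range("Doe", ranges)]["Doe"]
```

Because each lookup touches a single sub-index, lookups for different inquiry parameters can proceed against different sub-indices at the same time.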
FIG. 3 is a block diagram illustrating relationships among the data sources 104 a-c, candidate indices 302 a-c, and general indices 304 a-c. The candidate indices 302 a-c are associated with the general indices 304 a-c.
Each of the data sources 104 a-c can be associated with a respective candidate index 302 a-c. Each of the candidate indices 302 a-c can include an index of records of a respective source associated with a candidate. A candidate can include two or more data items corresponding to a specific individual or entity. For example, as depicted in FIG. 3, each of the candidate indices 302 a, 302 b can be used to resolve individuals or entities having a given name and address to specific locations in the respective data sources 104 a, 104 b. A search of candidate index 302 a for an individual or entity having the surname “C_Name” and the address “Addr 4” can be resolved to the fourth and fifth records of the data source 104 a via pointers having values 104 a_5, 104 a_4. Each of the candidate indices 302 a-c can include or be associated with two or more sub-indices similar to the sub-indices described above with respect to FIG. 2. Each sub-index of a respective candidate index can include a subset of the extracted data and associated pointers of the respective index with which the sub-index is associated.
Each of the candidate indices 302 a-c can be associated with one or more of the general indices 304 a-c. Each general index can include an index of relationships between data from one or more of the data sources 104 a-c. The relationships between data can be described in a general index by reference to a candidate index for a respective data source. For example, a general index 304 a associated with the candidate indices 302 a, 302 b can include an entry for a surname associated with a geographical address. The entry including the surname associated with a geographical address can in turn be associated with one or more pointers to records in the respective candidate indices 302 a, 302 b. The general indices 304 a-c can be shared among the data sources 104 a-c. Sharing the general indices 304 a-c among the data sources 104 a-c can identify relationships between data in different data sources. As depicted in FIG. 3, the general index provides a list of pointers identifying a candidate index and row number of a respective candidate index in which each unique combination of surnames and geographical addresses can be found.
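As a non-limiting illustration, the two levels of indirection described above (general index entry, then candidate index row, then data source pointers) can be sketched in Python. The dictionary layout, the function name `resolve`, and all sample values other than the pointers 104 a_5 and 104 a_4 mentioned above are assumptions of the sketch and are not taken from FIG. 3.

```python
# Candidate indices: each row holds a candidate and its data source pointers.
candidate_indices = {
    "302a": [
        {"surname": "C_Name", "address": "Addr 4",
         "pointers": ["104a_5", "104a_4"]},
    ],
    "302b": [
        {"surname": "C_Name", "address": "Addr 4",
         "pointers": ["104b_2"]},  # hypothetical pointer value
    ],
}

# General index: each unique (surname, address) combination points to a
# candidate index and a row number within that candidate index.
general_index_304a = {
    ("C_Name", "Addr 4"): [("302a", 0), ("302b", 0)],
}


def resolve(general_index, candidate_indices, key):
    """Follow general index pointers into candidate indices and collect
    the data source pointers found there."""
    pointers = []
    for index_id, row in general_index.get(key, []):
        pointers.extend(candidate_indices[index_id][row]["pointers"])
    return pointers


locations = resolve(general_index_304a, candidate_indices, ("C_Name", "Addr 4"))
```

A single general index entry thus relates records held in two different data sources.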
In another example, a general index 304 b can include an entry for a social security number associated with a geographical address. The entry including the social security number associated with a geographical address can in turn be associated with one or more pointers to records or other data 106 b, 106 c in the respective data sources 104 b, 104 c.
Although FIG. 3 depicts three general indices, any number of general indices describing relationships between data included in multiple data sources can be used.
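For illustration only, a general index that points into several candidate indices can be sketched as follows. The sketch assumes each candidate index is available as an ordered list of (surname, address) rows; the identifiers "302a" and "302b", the sample rows, and the function name are hypothetical.

```python
from collections import defaultdict

def build_general_index(candidate_indices):
    """Build {(surname, address): [(candidate_index_id, row_number), ...]}.

    candidate_indices maps a candidate index identifier to its rows, where
    each row is a (surname, address) pair.
    """
    general = defaultdict(list)
    for index_id, rows in candidate_indices.items():
        for row_number, key in enumerate(rows, start=1):
            general[key].append((index_id, row_number))
    return dict(general)

# Hypothetical rows of two candidate indices.
candidate_indices = {
    "302a": [("A_Name", "Addr 1"), ("C_Name", "Addr 4")],
    "302b": [("C_Name", "Addr 4"), ("D_Name", "Addr 7")],
}
general_index_304a = build_general_index(candidate_indices)
shared = general_index_304a[("C_Name", "Addr 4")]
```

Because the combination ("C_Name", "Addr 4") appears in both hypothetical candidate indices, its entry carries one pointer into each, which is how such an index can expose a relationship between data in different data sources.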
FIG. 4 is a block diagram illustrating an example flow of communications between the search engine 110 and the data sources 104 a-c.
The search engine 110 can receive a request 402 to search or otherwise access data stored in one or more of the data sources 104 a-c. The request 402 can include inquiry parameters 404 a-c. For example, a request 402 to search for an individual can include an inquiry parameter 404 a that is a surname, an inquiry parameter 404 b that is an address, and an inquiry parameter 404 c that is a social security number. The search engine 110 can extract the inquiry parameters 404 a-c from the request 402.
The search engine 110 can provide the inquiry parameters 404 a-c to the data sources 104 a-c. The inquiry parameters 404 a-c can be provided to the data sources 104 a-c to perform parallel searches of the data sources 104 a-c. Some aspects can include the inquiry parameters 404 a-c being provided to the data sources 104 a-c as index search elements. Index search elements may be constructed from the inquiry parameters 404 a-c via hash key indexing. The index search elements can be used for relationship processing. The index search elements can be shared among the data sources 104 a-c to generate inter-source relationships. An inter-source relationship can include a relationship between records or other data in different data sources generated based on relationships between data within a data source. Inter-source relationships can be stored using one or more general indices.
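One possible way to construct index search elements via hash key indexing is sketched below. The disclosure does not specify a hash function or a normalization step, so the use of SHA-256, the upper-casing and whitespace collapsing, and all function names are assumptions of this sketch.

```python
import hashlib

def normalize(value):
    """Collapse whitespace and ignore case so equivalent inputs hash alike."""
    return " ".join(value.upper().split())

def index_search_element(field, value):
    """Derive a hash key from a field name and its normalized value."""
    payload = f"{field}:{normalize(value)}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def build_search_elements(inquiry_parameters):
    """Turn {field: value} inquiry parameters into {field: hash key}."""
    return {
        field: index_search_element(field, value)
        for field, value in inquiry_parameters.items()
    }

elements = build_search_elements(
    {"surname": "Doe", "address": "123 Street St."}
)
```

Including the field name in the hashed payload keeps a surname and an address with the same text from colliding, which may matter when the same elements are shared among several data sources.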
For example, a data source 104 a can include a relationship between a table including addresses and a table including surnames. A data source 104 b can include a relationship between a table including account numbers and a table including surnames. Elements of the indices 202, 206 can be shared such that records of the data source 104 a including surnames can be associated with records of the data source 104 b including surnames. A resulting inter-source relationship can describe addresses in the data source 104 a being related to account numbers in the data source 104 b via the surnames included in the data sources 104 a, 104 b.
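The inter-source relationship in this example can be sketched as a join on the shared surname. The sample surnames, addresses, and account numbers below are hypothetical, as is the assumption that each data source exposes its intra-source relationship as a surname-keyed mapping.

```python
def inter_source_relationships(addresses_by_surname, accounts_by_surname):
    """Relate addresses in one source to account numbers in another source
    through the surnames that both sources have in common."""
    relationships = []
    shared_surnames = addresses_by_surname.keys() & accounts_by_surname.keys()
    for surname in sorted(shared_surnames):
        for address in addresses_by_surname[surname]:
            for account in accounts_by_surname[surname]:
                relationships.append(
                    {"surname": surname, "address": address, "account": account}
                )
    return relationships

# Hypothetical intra-source relationships.
source_104a = {"Doe": ["123 Street St."], "Roe": ["9 Elm Ave."]}
source_104b = {"Doe": ["ACCT-1", "ACCT-2"], "Poe": ["ACCT-9"]}

related = inter_source_relationships(source_104a, source_104b)
```

Only "Doe" appears in both hypothetical sources, so the result relates the single "Doe" address to each of the two "Doe" account numbers.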
Some aspects can include the search engine 110 having a plug-in software module or other application that is executable to format the inquiry parameters 404 a-c for use with the respective data sources 104 a-c. For example, the inquiry parameter 404 a provided to a data source 104 a including structured data, such as a relational database, may be formatted as a database query. The inquiry parameter 404 c provided to a data source 104 c including semi-structured data, such as documents organized in a hierarchy via tags, may be formatted to retrieve data from a hierarchical data structure. Formatting the inquiry parameters 404 a-c for use with the respective data sources 104 a-c can allow a search engine 110 to be used with multiple data sources having data in native formats. Doing so can obviate a requirement that the data from the multiple data sources be converted to a common format for use with the search engine 110.
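A minimal sketch of formatting inquiry parameters for native formats follows, using an in-memory SQLite table as a stand-in for a relational data source and a small XML document as a stand-in for a tag-based hierarchy. The table name, column names, element names, and sample values are assumptions of the sketch, not features of the data sources 104 a-c.

```python
import sqlite3
import xml.etree.ElementTree as ET

def format_for_structured(surname):
    """Format a surname as a parameterized SQL query and its arguments."""
    return "SELECT id, surname FROM people WHERE surname = ?", (surname,)

def format_for_semi_structured(ssn):
    """Format a social security number as an ElementTree path expression.

    The value is interpolated directly, so this sketch assumes it contains
    no quote characters.
    """
    return f".//person[ssn='{ssn}']"

# Stand-in for a structured data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, surname TEXT)")
conn.executemany("INSERT INTO people (surname) VALUES (?)", [("Doe",), ("Roe",)])
query, args = format_for_structured("Doe")
structured_hits = conn.execute(query, args).fetchall()

# Stand-in for a semi-structured data source.
document = ET.fromstring(
    "<people>"
    "<person><name>Doe</name><ssn>xxx-xx-1234</ssn></person>"
    "<person><name>Roe</name><ssn>xxx-xx-4568</ssn></person>"
    "</people>"
)
path = format_for_semi_structured("xxx-xx-1234")
semi_structured_hits = [p.findtext("name") for p in document.findall(path)]
```

Each source is queried in its own format, and neither stand-in source is converted to a common representation before the search.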
The search engine 110 can retrieve candidate data 406 a-c based on the parallel searches of the respective candidate indices 302 a-c of the data sources 104 a-c. The parallel searches can be executed using the candidate indices 302 a-c or sub-indices of the candidate indices 302 a-c. The candidate data 406 a-c can include any of the data from the data sources 104 a-c matching or otherwise corresponding to an inquiry parameter provided to a respective data source. For example, a search using an inquiry parameter 404 a that is a surname can retrieve candidate data 406 a that includes all records including the surname. A search of the data source 104 a using an inquiry parameter 404 b that is an address can retrieve candidate data 406 b that includes all records including the address or a part of the address, such as a street name or zip code. A search of the data source 104 b using an inquiry parameter 404 b that is an address can retrieve candidate data 406 b that includes all records including the address or a part of the address, such as a street name or zip code. A search of the data source 104 c using an inquiry parameter 404 c that is a social security number can retrieve candidate data 406 c that includes all records including the social security number. The candidate data 406 a-c can additionally or alternatively include relationships between data from at least two of the data sources 104 a-c matching or otherwise corresponding to an inquiry parameter provided to a respective data source.
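The parallel searches of the candidate indices can be sketched with a thread pool, as shown below. The use of Python threads, the dictionary-based candidate indices, and the sample pointers are assumptions made for illustration; the disclosure does not prescribe a concurrency mechanism.

```python
from concurrent.futures import ThreadPoolExecutor

def search_candidate_index(candidate_index, parameter):
    """Return every record pointer filed under the given inquiry parameter."""
    return candidate_index.get(parameter, [])

def parallel_search(searches):
    """Run one search per data source concurrently.

    searches maps a source label to a (candidate_index, parameter) pair.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(searches))) as pool:
        futures = {
            label: pool.submit(search_candidate_index, index, parameter)
            for label, (index, parameter) in searches.items()
        }
        # Collect each result under the label of the source it came from.
        return {label: future.result() for label, future in futures.items()}

# Hypothetical candidate indices keyed by surname, address, and SSN.
searches = {
    "104a": ({"Doe": ["104a_1", "104a_3"]}, "Doe"),
    "104b": ({"123 Street St.": ["104b_2"]}, "123 Street St."),
    "104c": ({"xxx-xx-1234": ["104c_7"]}, "xxx-xx-1234"),
}
candidate_data = parallel_search(searches)
```

Threads are most likely to help when each search is an I/O-bound call to a remote data source; for purely in-memory lookups such as these, the sketch only demonstrates the control flow.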
The search engine 110 can search the general indices 304 a, 304 b using de-duplicated candidate data 408 a, 408 b. For example, duplicate records in candidate data 406 a, 406 b can be removed such that the candidate data 408 a, 408 b includes a set of unique records or other data. The search engine 110 can retrieve one or more pointers 410 a, 410 b from the general indices 304 a, 304 b based on the search of the general indices 304 a, 304 b.
The search engine 110 can retrieve data subsets 412 a-c from the data 106 a-c using the one or more pointers 410 a, 410 b. The data subsets 412 a-c can include one or more records or other data from one or more of the data sources 104 a-c. The data subsets 412 a-c can also include relationships among the data retrieved from one or more of the data sources 104 a-c.
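The de-duplication, pointer retrieval, and data subset retrieval described above can be sketched as three small steps. The tuple-based candidate records, the (source, row) pointer format, and the sample data are assumptions of this sketch.

```python
def deduplicate(candidate_records):
    """Drop repeated candidate records while preserving their first-seen order."""
    seen = set()
    unique = []
    for record in candidate_records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique

def lookup_pointers(general_index, unique_candidates):
    """Collect the distinct pointers that a general index holds for the candidates."""
    pointers = []
    for candidate in unique_candidates:
        for pointer in general_index.get(candidate, []):
            if pointer not in pointers:
                pointers.append(pointer)
    return pointers

def fetch_subset(data_sources, pointers):
    """Dereference (source_id, row) pointers into the underlying records."""
    return [data_sources[source_id][row] for source_id, row in pointers]

# Hypothetical candidate data, general index, and data source contents.
candidate_data_406 = [
    ("Doe", "123 Street St."),
    ("Doe", "456 Street St."),
    ("Doe", "123 Street St."),
]
general_index_304a = {
    ("Doe", "123 Street St."): [("104b", 0)],
    ("Doe", "456 Street St."): [("104b", 2), ("104b", 0)],
}
data_sources = {"104b": [{"id": "r0"}, {"id": "r1"}, {"id": "r2"}]}

candidate_data_408 = deduplicate(candidate_data_406)
pointers_410 = lookup_pointers(general_index_304a, candidate_data_408)
data_subset_412 = fetch_subset(data_sources, pointers_410)
```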
The search engine 110 can provide the output 414 that includes, or is generated from, the data subsets 412 a-c. The output 414 can include data and relationships between data. The output 414 can be usable for identity resolution. Some aspects can include applying a matching plug-in module or other application to the output 414. The matching plug-in module or other application can analyze the relationships between data included in the output 414 to determine that the output 414 includes or does not include the target data of the request 402, such as the identity of an individual.
Any suitable computing system 102 can be used to implement the features described in FIGS. 2-3. FIG. 5 is a block diagram depicting examples of computing systems for implementing certain features. The examples of computing systems include the computing system 102 and a data source 104 communicating via the network 108.
The computing system 102 includes a processor 502 communicatively coupled to a computer-readable medium such as a memory 504. The processor 502 can execute computer-executable program instructions and/or access information stored in the memory 504. The processor 502 may include a microprocessor, an ASIC, a state machine, or other processor, and can be any of a number of computer processors. Such a processor can include, or may be in communication with, a computer-readable medium which stores instructions that, when executed by the processor, cause the processor to perform the steps described herein. The data source 104 includes a computer-readable medium such as a memory 510. Data 106, the index 202, and the sub-indices 204 a, 204 b can be stored in the memory 510.
A computer-readable medium may include, but is not limited to, an electronic, optical, magnetic, or other storage device capable of providing a processor with computer-readable instructions. Other examples can include, but are not limited to, a floppy disk, CD-ROM, DVD, magnetic disk, memory chip, ROM, RAM, an ASIC, a configured processor, optical storage, magnetic tape or other magnetic storage, or any other medium from which a computer processor can read instructions. The instructions may include processor-specific instructions generated by a compiler and/or an interpreter from code written in any suitable computer-programming language, including, for example, C, C++, C#, Visual Basic, Java, Python, Perl, JavaScript, and ActionScript.
The computing system 102 may also include a number of external or internal devices such as a mouse, a CD-ROM, DVD, a keyboard, a display, audio speakers, one or more microphones, or any other input or output devices. The computing system 102 can receive input from and provide output to external devices via an input/output (“I/O”) interface 508. A bus 506 can communicatively couple the components of the computing system 102.
FIG. 5 also illustrates the search engine 110 and candidate indices 302 a-c and general indices 304 a-c included in the memory 504 of the computing system 102. The search engine 110 can include one or more software modules configuring the processor 502 for searching or otherwise accessing the data 106 of the data source 104. As is known to one of skill in the art, the search engine 110 may be resident in any suitable computer-readable medium and execute on any suitable processor. Some aspects can include the search engine 110 and the candidate indices 302 a-c and general indices 304 a-c residing in memory at the computing system 102. Other aspects can include one or more of the search engine 110 and the candidate indices 302 a-c and general indices 304 a-c being accessed by the computing system 102 from a remote location via the network 108.
FIG. 6 is a flow chart illustrating an example method 600 for conducting intelligent parallel searching of the data sources 104 a-c. For illustrative purposes, the method 600 is described with reference to the system implementations depicted in FIGS. 1-4. Other implementations, however, are possible.
The method 600 involves the search engine 110 receiving a request 402 to access target data, as shown in block 610. The target data can be stored in at least one of the data sources 104 a-c. Some aspects can include the request 402 being received as or generated from input received via the I/O interface 508. Other aspects can include the request 402 being received as or generated from a message from an application in communication with the search engine 110 via the computing system 102, such as a calling application.
The method 600 further involves the search engine 110 extracting the inquiry parameters 404 a-c from the request 402, as shown in block 620. Extracting the inquiry parameters 404 a-c can include identifying one or more inquiry parameters included in the request 402 that can be used to search or otherwise access the data from each data source. Each inquiry parameter can correspond to an index for a respective data source or a candidate index for a respective data source. For example, the search engine 110 can extract a surname, a geographical address, and a social security number from a request 402 and provide the surname to a data source 104 a having an index 202 including surnames, provide the geographical address to a data source 104 b having an index 206 including geographical addresses, and provide the social security number to a data source 104 c having an index 210 including social security numbers. Extracting the inquiry parameters can additionally or alternatively include formatting the inquiry parameters 404 a-c for use with the respective data sources 104 a-c, as discussed in detail with respect to FIG. 7.
The method 600 further involves the search engine 110 performing parallel searches of the general indices 304 a-c common to the data sources 104 a-c, as shown in block 630. Each parallel search can include searching a respective sub-index of a respective general index based on a corresponding inquiry parameter. For example, an inquiry parameter that is a surname “Doe” can be used to search a sub-index of surnames beginning with the letters A-F. Performing the parallel searches can include searching multiple sub-indices of the general indices. Performing the parallel searches can include searching multiple sub-indices associated with different general indices and/or data sources, searching multiple sub-indices associated within each general index and/or data source, or a combination of both. Some aspects can include the search engine 110 executing the parallel searches via a data service layer.
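The routing of an inquiry parameter to a sub-index can be sketched as follows. Only the A-F range comes from the example above; the remaining letter ranges, the sub-index contents, and the function names are assumptions of the sketch.

```python
# Assumed partitioning of a surname index; only "A-F" appears in the example.
SUB_INDEX_RANGES = [("A", "F"), ("G", "M"), ("N", "S"), ("T", "Z")]

def select_sub_index(surname):
    """Return the label of the sub-index whose letter range covers the surname."""
    first_letter = surname[:1].upper()
    for low, high in SUB_INDEX_RANGES:
        if low <= first_letter <= high:
            return f"{low}-{high}"
    return None  # No sub-index covers an empty or non-alphabetic surname.

def search_sub_index(sub_indices, surname):
    """Search only the sub-index that can contain the surname."""
    label = select_sub_index(surname)
    return sub_indices.get(label, {}).get(surname, [])

# Hypothetical sub-indices of a general index.
sub_indices = {
    "A-F": {"Doe": [("302a", 2)]},
    "N-S": {"Roe": [("302b", 5)]},
}
doe_label = select_sub_index("Doe")
doe_hits = search_sub_index(sub_indices, "Doe")
```

Restricting each search to one sub-index keeps the searched portion of the index small, and independent sub-index searches of this kind are candidates for parallel execution.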
The method 600 further involves the search engine 110 performing one or more additional parallel searches of the candidate indices 302 a-c based on results of the parallel searches of general indices 304 a, 304 b unioned with the inquiry parameters 404 a-c from the request 402, as shown in block 640. Performing the union of the results from the general indices 304 a-c with the inquiry information from the request 402 can involve excluding duplicate candidate data returned from the parallel searches, as described above with respect to FIG. 4.
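The union performed at block 640 can be sketched as an order-preserving merge that excludes duplicates. Representing each inquiry parameter as a (field, value) pair, and the sample values, are assumptions of this sketch.

```python
def union_parameters(inquiry_parameters, general_index_results):
    """Merge the original inquiry parameters with parameters obtained from the
    general index searches, excluding duplicates and preserving order."""
    combined = list(dict.fromkeys(inquiry_parameters))
    seen = set(combined)
    for parameter in general_index_results:
        if parameter not in seen:
            seen.add(parameter)
            combined.append(parameter)
    return combined

# Hypothetical parameters extracted from a request.
inquiry_parameters = [("surname", "Doe"), ("address", "123 Street St.")]
# Hypothetical parameters discovered through the general indices.
general_index_results = [
    ("address", "123 Street St."),  # duplicate of an inquiry parameter
    ("address", "456 Street St."),  # additional, non-duplicative parameter
]
search_parameters = union_parameters(inquiry_parameters, general_index_results)
```

The additional address is the kind of parameter that lets the second round of searches reach candidate records that the original request alone would not have matched.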
The method 600 further involves the search engine 110 extracting an output 414 based on results returned from the one or more additional parallel searches of the candidate indices 302 a-c, as shown in block 650. The output 414 can be extracted from candidate data 406 a-c returned from the additional parallel searches. The output 414 can include the target data from at least two of the data sources and a relationship between the target data from the at least two data sources. The target data and the relationship between the target data can be usable for identity resolution. Some aspects can include a plug-in output formatting service or other application formatting the output 414 such that the output 414 can be provided to the application providing the request 402.
FIG. 7 is a flow chart illustrating an example method for formatting the inquiry parameters 404 a-c for use with the respective data sources 104 a-c.
At block 710, the search engine 110 selects one of the data sources 104 a-c for which inquiry parameters have not been formatted.
At block 720, the search engine 110 determines a format for the selected data source. Some aspects can include the search engine 110 determining a format for a data source based on metadata included in the data source and describing the format for the data source. Other aspects can include the search engine 110 retrieving sample data from the data source and analyzing the data to determine the format for a data source.
If a data source includes structured data, the search engine 110 formats one or more inquiry parameters for accessing structured data, as shown in block 730. Formatting inquiry parameters for accessing structured data can include generating queries for accessing data in relational databases based on the inquiry parameters.
If a data source includes semi-structured data, the search engine 110 formats one or more inquiry parameters for accessing semi-structured data, as shown in block 740. Formatting inquiry parameters for accessing semi-structured data can include generating queries for accessing data in a hierarchical data structure based on the inquiry parameters.
If a data source includes unstructured data, the search engine 110 formats a first inquiry parameter for accessing unstructured data, as shown in block 750.
The search engine 110 can determine if inquiry parameters have been formatted for each of the data sources 104 a-c, as shown in block 760. If inquiry parameters have not been formatted for each of the data sources 104 a-c, the method can return to block 710. If inquiry parameters have been formatted for each of the data sources 104 a-c, the method can terminate and proceed to block 630 of method 600, as shown in block 770.
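The control flow of FIG. 7 can be sketched as a loop over the data sources that have not yet been formatted. The sketch assumes that each data source carries a metadata entry naming its format; the "format" key, the formatter outputs, and the sample sources are hypothetical placeholders rather than actual query formats.

```python
def determine_format(source):
    """Block 720: read the format from metadata (an assumed 'format' key)."""
    return source["metadata"]["format"]

# Blocks 730, 740, and 750: one placeholder formatter per kind of data.
FORMATTERS = {
    "structured": lambda p: {"kind": "relational query", "where": dict(p)},
    "semi-structured": lambda p: {"kind": "hierarchical query", "match": dict(p)},
    "unstructured": lambda p: {"kind": "text search", "terms": list(p.values())},
}

def format_inquiry_parameters(sources, parameters):
    """Format the inquiry parameters once for every data source."""
    formatted = {}
    pending = list(sources)  # data sources not yet formatted
    while pending:  # block 760: repeat until every source is handled
        source_id = pending.pop(0)  # block 710: select an unformatted source
        source_format = determine_format(sources[source_id])
        formatted[source_id] = FORMATTERS[source_format](parameters)
    return formatted  # block 770: proceed to the parallel searches

# Hypothetical data sources and inquiry parameters.
sources = {
    "104a": {"metadata": {"format": "structured"}},
    "104b": {"metadata": {"format": "unstructured"}},
    "104c": {"metadata": {"format": "semi-structured"}},
}
formatted = format_inquiry_parameters(sources, {"surname": "Doe"})
```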
FIG. 8 is a block diagram illustrating an example output 414 of intelligent parallel searching performed by a search engine 110. The output 414 can include the records returned as the result of a search of the candidate indices 302 a-c for the individual “Todd LastName” and the relationships between those records.
A search of general index 304 a can yield an entry 902 for an individual “Todd LastName” having an address “123 Street St.” The entry 902 can provide a pointer to a record 906 a in data source 104 b having a name field with the value “Todd LastName” and an address field with the value “123 Street St.” The relationships between records based on the address field within the data source 104 b can also be used to select the related records 906 b, 906 c having an address field with the value “123 Street St.” relating the records 906 b, 906 c to record 906 a.
A search of general index 304 b can yield entries 904 a, 904 b. The entry 904 a can describe an individual “Todd LastName” having an address “456 Street St.” and a social security number “xxx-xx-1234.” The entry 904 b can describe an individual “Todd LastName” having an address “789 Street St.” and a social security number “xxx-xx-4568.” The entry 904 a can provide a pointer to a record 908 a in data source 104 c having a name field with the value “Todd LastName,” an address field with the value “456 Street St.”, and a social security number field with the value “xxx-xx-1234.” The relationships between records based on the social security number field within the data source 104 c can also be used to select the related record 908 c having a social security number field with the value “xxx-xx-1234.” The entry 904 b can provide a pointer to a record 908 b in data source 104 c having a name field with the value “Todd LastName,” an address field with the value “789 Street St.”, and a social security number field with the value “xxx-xx-4568.” The relationships between records based on the address field within the data source 104 c can also be used to select the related record 908 d having an address field with the value “789 Street St.” The relationships between data sources 104 a, 104 b based on the address field can be used to select the related record 910 having an address field with the value “789 Street St.”
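The selection of related records in FIG. 8 can be sketched as following a pointer to a record and then collecting the other records that share a field value with it. The sketch uses the record 906 a example in data source 104 b; the names assigned to records 906 b and 906 c, the extra unrelated record, and the function name are hypothetical.

```python
def follow_entry(records, pointer, field):
    """Return the pointed-to record together with the records related to it
    through a shared value of the given field."""
    anchor = records[pointer]
    related = [
        record_id
        for record_id, record in records.items()
        if record_id != pointer and record.get(field) == anchor.get(field)
    ]
    return {"match": pointer, "related": related, "relationship": field}

# Hypothetical contents of data source 104 b.
records_104b = {
    "906a": {"name": "Todd LastName", "address": "123 Street St."},
    "906b": {"name": "Resident B", "address": "123 Street St."},
    "906c": {"name": "Resident C", "address": "123 Street St."},
    "906x": {"name": "Resident X", "address": "1 Other Rd."},
}
output_fragment = follow_entry(records_104b, "906a", "address")
```

The returned structure carries both the matching record and the relationship that links the other records to it, mirroring an output that includes data and relationships between data.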
Numerous specific details are set forth herein to provide a thorough understanding of the claimed subject matter. However, those skilled in the art will understand that the claimed subject matter may be practiced without these specific details. In other instances, methods, apparatuses, or systems that would be known by one of ordinary skill have not been described in detail so as not to obscure claimed subject matter.
Unless specifically stated otherwise, it is appreciated that throughout this specification discussions utilizing terms such as “processing,” “computing,” “calculating,” “determining,” and “identifying” or the like refer to actions or processes of a computing device, such as one or more computers or a similar electronic computing device or devices, that manipulate or transform data represented as physical electronic or magnetic quantities within memories, registers, or other information storage devices, transmission devices, or display devices of the computing platform.
The system or systems discussed herein are not limited to any particular hardware architecture or configuration. A computing device can include any suitable arrangement of components that provide a result conditioned on one or more inputs. Suitable computing devices include multipurpose microprocessor-based computer systems accessing stored software that programs or configures the computing system from a general-purpose computing apparatus to a specialized computing apparatus implementing one or more features of the present subject matter. Any suitable programming, scripting, or other type of language or combinations of languages may be used to implement the teachings contained herein in software to be used in programming or configuring a computing device.
Features of the methods disclosed herein may be performed in the operation of such computing devices. The order of the blocks presented in the examples above can be varied—for example, blocks can be re-ordered, combined, and/or broken into sub-blocks. Certain blocks or processes can be performed in parallel.
The use of “adapted to” or “configured to” herein is meant as open and inclusive language that does not foreclose devices adapted to or configured to perform additional tasks or steps. Additionally, the use of “based on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Headings, lists, and numbering included herein are for ease of explanation only and are not meant to be limiting.
While the present subject matter has been described in detail with respect to specific aspects and features thereof, it will be appreciated that those skilled in the art, upon attaining an understanding of the foregoing, may readily produce alterations to, variations of, and equivalents to such aspects and features. Accordingly, it should be understood that the present disclosure has been presented for purposes of example rather than limitation, and does not preclude inclusion of such modifications, variations, and/or additions to the present subject matter as would be readily apparent to one of ordinary skill in the art.

Claims (20)

The invention claimed is:
1. A method comprising:
receiving, by a search engine executed by a processor device, a request to access target data, wherein the target data is stored in at least one of a plurality of data sources, each of the plurality of data sources having a respective candidate index;
accessing, by the search engine, a plurality of general indices, wherein each of the plurality of general indices comprises a respective index of relationships between data from at least two of the plurality of data sources;
extracting, by the search engine, a plurality of inquiry parameters from the request, wherein each of the extracted plurality of inquiry parameters corresponds to a respective sub-index from the plurality of general indices, wherein each sub-index from the plurality of general indices comprises a respective subset of at least one respective general index from the plurality of general indices;
performing, by the search engine, parallel searches of the plurality of general indices, wherein each parallel search comprises searching a respective sub-index from the plurality of general indices based on a respective inquiry parameter from the extracted plurality of inquiry parameters that corresponds to the respective sub-index;
performing, by the search engine, additional parallel searches of the candidate indices for the plurality of data sources based on the extracted plurality of inquiry parameters and at least one additional inquiry parameter obtained from the parallel searches of the plurality of general indices; and
extracting, by the search engine, an output based on results returned from the additional parallel searches of the candidate indices.
2. The method of claim 1, wherein a first data source of the plurality of data sources comprises structured data and wherein a second data source of the plurality of data sources comprises unstructured data, wherein performing the additional parallel searches comprises:
formatting a first one of the additional parallel searches for accessing the structured data using a first one of the extracted plurality of inquiry parameters or the at least one additional inquiry parameter; and
formatting a second one of the additional parallel searches for accessing the structured data using a second one of the extracted plurality of inquiry parameters or the at least one additional inquiry parameter.
3. The method of claim 2, wherein the search engine executes the parallel searches via a data service layer.
4. The method of claim 1, wherein returning the output comprises returning the target data from at least two data sources of the plurality of data sources and a relationship between the target data from the at least two data sources.
5. The method of claim 4, wherein the target data and the relationship between the target data are usable for identity resolution, wherein the identity resolution comprises determining that a first entity or individual identified in a first one of the plurality of data sources is the same as or associated with a second entity or individual identified in a second one of the plurality of data sources.
6. The method of claim 1, wherein executing the parallel searches of the plurality of data sources comprises executing a first search of a first data source having a first type of storage medium and executing a second search of a second data source having a second type of storage medium different from the first type of storage medium.
7. The method of claim 1, wherein executing the additional parallel searches of the candidate indices further comprises excluding duplicate candidate data returned from the parallel searches.
8. The method of claim 1, wherein the at least one additional inquiry parameter is non-duplicative of the extracted plurality of inquiry parameters and wherein at least one of the additional parallel searches comprises searching a respective one of the candidate indices to which the at least one additional inquiry parameter corresponds and to which none of the extracted plurality of inquiry parameters corresponds.
9. The method of claim 8, further comprising obtaining the at least one additional inquiry parameter from the parallel searches by performing operations comprising:
determining from at least one of the plurality of general indices that first data from a first one of the plurality of data sources is related to second data from a second one of the plurality of data sources; and
obtaining the at least one additional inquiry parameter from the second data based on (i) the second data being related to the first data and (ii) the first data corresponding to at least some of the extracted plurality of inquiry parameters.
10. The method of claim 9, wherein obtaining the at least one additional inquiry parameter from the second data comprises selecting the second data as the at least one additional inquiry parameter.
11. The method of claim 1, wherein a first set of results returned by searching the plurality of data sources with the extracted plurality of inquiry parameters is smaller than a second set of results returned by searching the plurality of data sources with the extracted plurality of inquiry parameters unioned with the at least one additional inquiry parameter.
12. A non-transitory computer-readable medium embodying program code executable by a computer system, the non-transitory computer-readable medium comprising:
program code for receiving a request to access target data, wherein the target data is stored in at least one of a plurality of data sources, each of the plurality of data sources having a respective candidate index;
program code for accessing a plurality of general indices, wherein each of the plurality of general indices comprises a respective index of relationships between data from at least two of the plurality of data sources;
program code for extracting a plurality of inquiry parameters from the request, wherein each of the extracted plurality of inquiry parameters corresponds to a respective sub-index from the plurality of general indices, wherein each sub-index from the plurality of general indices comprises a respective subset of at least one respective general index from the plurality of general indices;
program code for performing parallel searches of the plurality of general indices, wherein each parallel search comprises searching a respective sub-index from the plurality of general indices based on a respective inquiry parameter from the extracted plurality of inquiry parameters that corresponds to the respective sub-index;
program code for performing additional parallel searches of the candidate indices for the plurality of data sources based on the extracted plurality of inquiry parameters and at least one additional inquiry parameter obtained from the parallel searches of the plurality of general indices; and
program code for extracting an output based on results returned from the additional parallel searches of the candidate indices.
13. The non-transitory computer-readable medium of claim 12, wherein a first data source of the plurality of data sources comprises structured data and wherein a second data source of the plurality of data sources comprises unstructured data.
14. The non-transitory computer-readable medium of claim 12, wherein the program code for returning the output comprises program code for returning the target data from at least two data sources of the plurality of data sources and a relationship between the target data from the at least two data sources.
15. The non-transitory computer-readable medium of claim 12, wherein the program code for executing the parallel searches of the plurality of data sources comprises program code for executing a first search of a first data source having a first type of storage medium and executing a second search of a second data source having a second type of storage medium different from the first type of storage medium.
16. A system comprising:
a non-transitory computer-readable medium configured to store instructions providing a search engine;
a processor configured to execute the instructions stored in the non-transitory computer-readable medium to execute the search engine by performing operations comprising:
receiving a request to access target data, wherein the target data is stored in at least one of a plurality of data sources, each of the plurality of data sources having a respective candidate index;
accessing, by the search engine, a plurality of general indices, wherein each of the plurality of general indices comprises a respective index of relationships between data from at least two of the plurality of data sources;
extracting a plurality of inquiry parameters from the request, wherein each of the extracted plurality of inquiry parameters corresponds to a respective sub-index from the plurality of general indices, wherein each sub-index from the plurality of general indices comprises a respective subset of at least one respective general index from the plurality of general indices;
performing parallel searches of the plurality of general indices, wherein each parallel search comprises searching a respective sub-index from the plurality of general indices based on a respective inquiry parameter from the extracted plurality of inquiry parameters that corresponds to the respective sub-index;
performing additional parallel searches of the candidate indices for the plurality of data sources based on the extracted plurality of inquiry parameters and at least one additional inquiry parameter obtained from the parallel searches of the plurality of general indices; and
extracting an output based on results returned from the additional parallel searches of the candidate indices.
17. The system of claim 16, wherein a first data source of the plurality of data sources comprises structured data and wherein a second data source of the plurality of data sources comprises unstructured data.
18. The system of claim 16, wherein returning the output comprises returning the target data from at least two data sources of the plurality of data sources and a relationship between the target data from the at least two data sources.
19. The system of claim 18, wherein the target data and the relationship between the target data are usable for identity resolution.
20. The system of claim 16, wherein executing the parallel searches of the plurality of data sources comprises executing a first search of a first data source having a first type of storage medium and executing a second search of a second data source having a second type of storage medium different from the first type of storage medium.
US13/661,485 2012-10-26 2012-10-26 Systems and methods for intelligent parallel searching Active 2032-10-29 US8862566B2 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
US13/661,485 US8862566B2 (en) 2012-10-26 2012-10-26 Systems and methods for intelligent parallel searching
EP13849495.0A EP2912578B1 (en) 2012-10-26 2013-10-25 Systems and methods for intelligent parallel searching
ES13849495T ES2752058T3 (en) 2012-10-26 2013-10-25 Systems and methods for intelligent parallel search
PCT/US2013/066911 WO2014066816A1 (en) 2012-10-26 2013-10-25 Systems and methods for intelligent parallel searching
CA2888846A CA2888846C (en) 2012-10-26 2013-10-25 Systems and methods for intelligent parallel searching
PT138494950T PT2912578T (en) 2012-10-26 2013-10-25 Systems and methods for intelligent parallel searching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US13/661,485 US8862566B2 (en) 2012-10-26 2012-10-26 Systems and methods for intelligent parallel searching

Publications (2)

Publication Number Publication Date
US20140122455A1 (en) 2014-05-01
US8862566B2 (en) 2014-10-14

Family

Family ID: 50545341

Family Applications (1)

Application Number Publication Number Priority Date Filing Date Status Title
US13/661,485 US8862566B2 (en) 2012-10-26 2012-10-26 Active, expires 2032-10-29 Systems and methods for intelligent parallel searching

Country Status (6)

Country Link
US (1) US8862566B2 (en)
EP (1) EP2912578B1 (en)
CA (1) CA2888846C (en)
ES (1) ES2752058T3 (en)
PT (1) PT2912578T (en)
WO (1) WO2014066816A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9501585B1 (en) 2013-06-13 2016-11-22 DataRPM Corporation Methods and system for providing real-time business intelligence using search-based analytics engine
US10083398B2 (en) 2014-12-13 2018-09-25 International Business Machines Corporation Framework for annotated-text search using indexed parallel fields
US20180004892A1 (en) * 2014-12-23 2018-01-04 Koninklijke Philips N.V. Systems, methods, and apparatuses for sequence alignment
US9971809B1 (en) * 2015-09-28 2018-05-15 Symantec Corporation Systems and methods for searching unstructured documents for structured data
WO2017189674A1 (en) * 2016-04-26 2017-11-02 Equifax, Inc. Global matching system
CN106776945A (en) * 2016-11-30 2017-05-31 努比亚技术有限公司 Mobile terminal and garbage files searching method
US11216432B2 (en) * 2018-07-06 2022-01-04 Cfph, Llc Index data structures and graphical user interface
US20220405263A1 (en) * 2021-06-21 2022-12-22 International Business Machines Corporation Increasing Index Availability in Databases
CN114138785B (en) * 2021-11-30 2024-07-30 中国平安财产保险股份有限公司 Data retrieval method, device, equipment and storage medium suitable for large data volume

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5960194A (en) * 1995-09-11 1999-09-28 International Business Machines Corporation Method for generating a multi-tiered index for partitioned data
US20080270382A1 (en) * 2007-04-24 2008-10-30 Interse A/S System and Method of Personalizing Information Object Searches
US8442982B2 (en) * 2010-11-05 2013-05-14 Apple Inc. Extended database search

Patent Citations (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020133476A1 (en) 1996-07-08 2002-09-19 Gert J. Reinhardt Database system
US6560634B1 (en) * 1997-08-15 2003-05-06 Verisign, Inc. Method of determining unavailability of an internet domain name
US6618727B1 (en) 1999-09-22 2003-09-09 Infoglide Corporation System and method for performing similarity searching
US6665677B1 (en) 1999-10-01 2003-12-16 Infoglide Corporation System and method for transforming a relational database to a hierarchical database
US6985898B1 (en) 1999-10-01 2006-01-10 Infoglide Corporation System and method for visually representing a hierarchical database objects and their similarity relationships to other objects in the database
US6751575B2 (en) 2000-02-14 2004-06-15 Infoglide Corporation System and method for monitoring and control of processes and machines
US7412417B1 (en) 2000-03-03 2008-08-12 Infoglide Software Corporation Loan compliance auditing system and method
US7007174B2 (en) 2000-04-26 2006-02-28 Infoglide Corporation System and method for determining user identity fraud using similarity searching
US6742001B2 (en) 2000-06-29 2004-05-25 Infoglide Corporation System and method for sharing data between hierarchical databases
US6853997B2 (en) 2000-06-29 2005-02-08 Infoglide Corporation System and method for sharing, mapping, transforming data between relational and hierarchical databases
US6738759B1 (en) 2000-07-07 2004-05-18 Infoglide Corporation, Inc. System and method for performing similarity searching using pointer optimization
US6795819B2 (en) 2000-08-04 2004-09-21 Infoglide Corporation System and method for building and maintaining a database
US6839714B2 (en) 2000-08-04 2005-01-04 Infoglide Corporation System and method for comparing heterogeneous data sources
US7010539B1 (en) 2000-09-08 2006-03-07 International Business Machines Corporation System and method for schema method
US6728706B2 (en) 2001-03-23 2004-04-27 International Business Machines Corporation Searching products catalogs
US7020651B2 (en) 2002-02-14 2006-03-28 Infoglide Software Corporation Similarity search engine for use with relational databases
US6829606B2 (en) 2002-02-14 2004-12-07 Infoglide Software Corporation Similarity search engine for use with relational databases
US7188107B2 (en) 2002-03-06 2007-03-06 Infoglide Software Corporation System and method for classification of documents
US7386554B2 (en) 2002-09-03 2008-06-10 Infoglide Software Corporation Remote scoring and aggregating similarity search engine for use with relational databases
US7283998B2 (en) 2002-09-03 2007-10-16 Infoglide Software Corporation System and method for classification of documents
US7562814B1 (en) 2003-05-12 2009-07-21 Id Analytics, Inc. System and method for identity-based fraud detection through graph anomaly detection
US7458508B1 (en) 2003-05-12 2008-12-02 Id Analytics, Inc. System and method for identity-based fraud detection
US7686214B1 (en) 2003-05-12 2010-03-30 Id Analytics, Inc. System and method for identity-based fraud detection using a plurality of historical identity records
US7793835B1 (en) 2003-05-12 2010-09-14 Id Analytics, Inc. System and method for identity-based fraud detection for transactions using a plurality of historical identity records
US20050102270A1 (en) 2003-11-10 2005-05-12 Risvik Knut M. Search engine with hierarchically stored indices
US7953723B1 (en) 2004-10-06 2011-05-31 Shopzilla, Inc. Federation for parallel searching
US20080059421A1 (en) * 2006-08-29 2008-03-06 Randall Paul Baartman Method and Apparatus for Resolution of Abbreviated Text in an Electronic Communications System
US7647635B2 (en) 2006-11-02 2010-01-12 A10 Networks, Inc. System and method to resolve an identity interactively
US20090182724A1 (en) 2008-01-11 2009-07-16 Paul Reuben Day Database Query Optimization Using Index Carryover to Subset an Index
US20100005054A1 (en) 2008-06-17 2010-01-07 Tim Smith Querying joined data within a search engine index
US20120254148A1 (en) 2011-03-28 2012-10-04 Microsoft Corporation Serving multiple search indexes
US20120317082A1 (en) * 2011-06-13 2012-12-13 Microsoft Corporation Query-based information hold

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
International Patent Application No. PCT/US2013/066911, "International Search Report and Written Opinion", mailed Feb. 7, 2014, 9 pages.

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11308170B2 (en) 2007-03-30 2022-04-19 Consumerinfo.Com, Inc. Systems and methods for data verification
US11769112B2 (en) 2008-06-26 2023-09-26 Experian Marketing Solutions, Llc Systems and methods for providing an integrated identifier
US11157872B2 (en) 2008-06-26 2021-10-26 Experian Marketing Solutions, Llc Systems and methods for providing an integrated identifier
US10580025B2 (en) 2013-11-15 2020-03-03 Experian Information Solutions, Inc. Micro-geographic aggregation system
US11107158B1 (en) 2014-02-14 2021-08-31 Experian Information Solutions, Inc. Automatic generation of code for attributes
US11847693B1 (en) 2014-02-14 2023-12-19 Experian Information Solutions, Inc. Automatic generation of code for attributes
US9747338B2 (en) 2015-12-16 2017-08-29 International Business Machines Corporation Runtime optimization for multi-index access
US9898506B2 (en) 2015-12-16 2018-02-20 International Business Machines Corporation Runtime optimization for multi-index access
US9720968B2 (en) 2015-12-16 2017-08-01 International Business Machines Corporation Runtime optimization for multi-index access
US11227001B2 (en) 2017-01-31 2022-01-18 Experian Information Solutions, Inc. Massive scale heterogeneous data ingestion and user resolution
US11681733B2 (en) 2017-01-31 2023-06-20 Experian Information Solutions, Inc. Massive scale heterogeneous data ingestion and user resolution
US11734234B1 (en) 2018-09-07 2023-08-22 Experian Information Solutions, Inc. Data architecture for supporting multiple search models
US12066990B1 (en) 2018-09-07 2024-08-20 Experian Information Solutions, Inc. Data architecture for supporting multiple search models
US11941065B1 (en) 2019-09-13 2024-03-26 Experian Information Solutions, Inc. Single identifier platform for storing entity data
US11880377B1 (en) 2021-03-26 2024-01-23 Experian Information Solutions, Inc. Systems and methods for entity resolution

Also Published As

Publication number Publication date
EP2912578B1 (en) 2019-09-18
EP2912578A1 (en) 2015-09-02
WO2014066816A1 (en) 2014-05-01
ES2752058T3 (en) 2020-04-02
CA2888846A1 (en) 2014-05-01
CA2888846C (en) 2020-10-13
PT2912578T (en) 2019-10-24
EP2912578A4 (en) 2016-07-13
US20140122455A1 (en) 2014-05-01

Legal Events

Date Code Title Description
AS Assignment

Owner name: EQUIFAX INC., GEORGIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LEITNER, STEPHEN;MANTHEY, KEITH W.;BURGESS, MARK;AND OTHERS;SIGNING DATES FROM 20121031 TO 20121106;REEL/FRAME:029336/0147

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE

CC Certificate of correction

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551)

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8