WO2002095676A2 - Real-time adaptive data mining system and method - Google Patents
- Publication number
- WO2002095676A2 (PCT/US2002/016069)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- conclusion
- rules
- attribute
- attribute value
- rule
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
- G06N5/022—Knowledge engineering; Knowledge acquisition
- G06N5/025—Extracting rules from data
Definitions
- the present invention relates generally to a system and method of data mining. More specifically, the present invention is related to a system and method for deriving adaptive knowledge based on pattern information obtained from data collected in real time.
- Data mining takes advantage of the potential intelligence contained in the vast amounts of data that businesses collect when interacting with customers.
- the data generally contains patterns that can indicate, for example, when it is most appropriate to contact a particular customer for a specific purpose.
- a business may timely offer a customer a product that has been purchased in the past, or draw attention to additional products that the customer may be interested in purchasing.
- Data mining has the potential to improve the quality of interaction between businesses and customers.
- data mining can assist in detection of fraud while providing other advantages to business operations, such as increased efficiency. It is the object of data mining to extract fact patterns from a data set, to associate the fact patterns with potential conclusions and to produce an intelligent result based on the patterns embedded in the data.
- neural networks and case based reasoning algorithms may also be used in data mining processes. Known as machine learning algorithms, neural nets and case based reasoning algorithms are exposed to a number of patterns to "teach" the proper conclusion given a particular data pattern. [0006] However, neural networks have the disadvantage of obscuring the patterns that are discovered in the data. A neural network simply provides conclusions about which of the neural network patterns most closely matches patterns in newly presented data.
- the goal of data mining is to obtain a certain level of intelligence regarding customer activity based on previous activity patterns present in a data set related to a particular activity or event.
- Intelligence can be defined as the association of a pattern of facts with a conclusion.
- the data to be mined is usually organized as records containing fields for each of the fact items and an associated conclusion.
- Fact value patterns define situations or contexts within which fact values are interpreted. Some fact values in a given pattern may provide the context in which the remaining fact values in the pattern are interpreted. Therefore, fact values given an interpretation in one context may receive a different interpretation in another context.
- Each field in a record can represent a fact with a number of possible values.
- the number of permutations that can be formed from the possible associations between n fact items is N1 * N2 * N3 * ... * Ni * ... * Nn, where each Ni represents the number of values that the i-th fact item can assume.
- N1 * N2 * N3 * ... * Ni * ... * Nn, the number of possible associations between the fact items, or patterns, is therefore very large. Most often, however, all possible combinations of fact item values are not represented in the data.
- the number of conclusions or actions associated with the fact item patterns is normally a small number. A large number of data records are normally required to ensure that the data correctly represents true causality or associative quality between all the fact items and the conclusions.
- Statistical methods have been used to determine which fact item (usually referred to as an attribute) has the most influence on a particular conclusion.
- a typical statistical method divides the data into two groups according to a value for a particular fact item, so that each group is associated with a different conclusion, or action, based on the values related to that conclusion or action in the data for that group. Each subgroup is again divided according to the value of a particular fact item. The process continues until no further division is statistically significant, or until some arbitrary number of divisions is reached. In dividing the data at each step, the evidence for certain patterns can be split between the two groups, reducing the chance that a pattern will show statistical significance, and hence be discovered.
- the above-described approaches to data mining operate on sets of data that have been amassed over time and that are generally static in nature.
- the statistical methods operate on the data on a whole to produce statistical conclusions for specific patterns in the data.
- the approaches that adopt a machine learning algorithm, such as the neural network and case-based reasoning techniques, require exposure to a large number of data examples to produce useful results.
- Each of these systems described above is typically unsuitable for use in a real time framework to discover patterns within data being received in response to presently occurring real world situations.
- the above-described approaches are ill suited to handle dynamic information that is characteristic of real time data mining.
- a further object of the present invention is to surpass the performance of statistically based data mining methods by detecting patterns that have small statistical support.
- a further object of the present invention is to provide a minimal set of patterns that represent the intelligence or knowledge represented by the data.
- a further object of the present invention is to indicate missing patterns and pattern overlap due to incomplete data for defining the domain of knowledge.
- the present invention uses logic to directly determine the factors or attributes that are relevant or significant to the associated conclusions or actions represented in a set of data.
- a system and method according to the present invention reveals all significant patterns in the data.
- the system and method permit the determination of a minimal set of patterns for the knowledge domain represented by the data.
- the system and method also identify irrelevant attributes in the patterns representing the data.
- the system and method allow the determination of all the possible patterns within the constraints imposed by the data. Patterns that completely cover all relevant outcomes are detected or identified and recorded.
- the present invention directly determines the factors or attributes in the data that are relevant to a representation of the data.
- Knowledge contained in data acquired in real time is revealed as the significant data patterns are discovered, beginning immediately with initial real time data. Because the system and method of the present invention use logic rather than statistical methods, relevant patterns representative of the knowledge contained in the data are determinable starting with the very first data example provided.
- attributes irrelevant to the outcomes of the data patterns are removed from the set of attributes in the data pattern.
- the attributes that are removed from the various data patterns do not contribute to their respective conclusions and are therefore irrelevant.
- the present invention can determine a minimal set of patterns for the knowledge domain represented by the data.
- the present invention provides a system and method for detecting and reporting attribute patterns needed to completely represent all possible patterns representative of the data. By weighting the more recently received data more heavily than prior data, the present invention emphasizes the effect of the more pertinent, more recently received information. The non-linear processing accorded to more recently received data acts as a weighting technique that emphasizes the more recently received data.
- the data provided according to the present invention represents situations and concepts through a set of attribute values associated with an appropriate action or conclusion.
- a first example of a set of attribute values associated with a conclusion is accepted as a first rule in which the conclusion or action associated with the attribute values is inferred every time that any of those attribute values are encountered in the data.
- This overly broad rule is normally modified as new examples are processed. As new examples are provided and examined, a comparison is made between the new example and the established rules derived from previous examples.
- a new rule is only generated when the example under examination does not match the attribute values of a rule that has already been established.
- a count for each action/conclusion of each rule is retained. If the attribute values of the example under examination match an existing rule, the count for the action or conclusion associated with the example is incremented in the rule.
- the present invention provides a predetermined maximum action or conclusion tally that the count increment may not exceed. If the action or conclusion count is already at a maximum, and an incrementation is indicated by examination of the present example, then the counts for all other actions or conclusions associated with that rule are decremented, with a minimum value for each count being zero.
- inconsistencies in the data can be represented by having several different conclusions or actions associated with a single set of attribute values for a given rule.
- the action or conclusion for each rule that has the highest count is designated as the predominant action or conclusion for that rule.
- the system and method according to the present invention will be more responsive in emphasizing recent trend changes in the data. Due to the weighting of the actions or conclusions associated with the attribute values of a particular rule, prior action or conclusion data is retained, but can be emphasized or de-emphasized depending on more recently received data.
- the action or conclusion that has the highest count in a group of actions or conclusions associated with a given set of attribute values for a rule is designated as the predominant action or conclusion for that rule. Since the count values can change for each of the actions or conclusions in a given rule, it is possible to have several actions or conclusions with the same highest count number.
- the former designated predominant action is preferably retained as the predominant action or conclusion for the specific rule to provide hysteresis for noise suppression.
- new rules can be formed that are representative of previously undiscovered patterns in the data.
- a further operation to identify irrelevant attributes and to identify groups of relevant attributes is performed on the rules. Identification of irrelevant attributes and groups of relevant attributes is obtained by comparing the new rule to all the other rules having a different predominant action or conclusion in the set of existing rules. This comparison process may affect the relevance of attributes within existing rules, requiring an update to the existing rules.
- An update to the existing rules may also be required if there is a shift in the predominant action or conclusion for a given rule brought about by incrementing and decrementing the associated counts for the rule action or conclusion.
- Another rule set can be formed that has all redundancy for each predominant action or conclusion removed.
- This non-redundant set of rules is determined by expanding each set of relevant attribute values for each rule into a canonical form, which permits redundancy among the rules to be more easily observed.
- the non-redundant rules contain only relevant attributes, and cover a large portion, if not all, of the possible attribute combinations. Accordingly, these non-redundant rules will typically be small in number, usually much smaller than the possible number of rules that could be generated given the set of all possible attribute values.
- the present invention thus simplifies the data mining process to provide a concise and highly useful result, without suffering from "the curse of exponential explosion" often mentioned in artificial intelligence literature.
- Various subset domains of knowledge can be defined to represent the overall domain of knowledge contained within the data.
- Each of the subset domains are related to each other in a hierarchy that provides a representation of the overall knowledge domain. By breaking down the overall domain of knowledge into smaller pieces for representation of the data, each of the subset domains can become fully defined as soon as the data related to a given subset domain is received and processed.
- the subset domains can be generalized in the same way that the rules describing the data are generalized.
- the subset domains can be mutually exclusive while representing the knowledge related to the overall domain with a minimized set of rules.
- the results contained within the subset domains can be aggregated or condensed in upper levels of the hierarchy that serves to organize all the subset domains with respect to each other.
- the subset domains all typically use the same attributes, even if a number of attributes in the various subset domains are declared irrelevant.
- the complete set of non-redundant, mutually exclusive and minimized rules represents all the relevant knowledge contained in the data received to that point. If there is insufficient data to completely define all the rules representative of the data, the rules may exhibit some overlap or gaps. Overlap is observed through rules with different conclusions, yet with the same set of attribute values. Gaps in the data are observed through portions of the domain not covered by any data example. Initially, the method produces a gap with the first data example. The gap can be filled in if desired by adding extrapolated rules determined by the first data example. The second received data example eliminates the gap.
- the system and process generalizes the data presented as representative of a domain of knowledge by calculating and saving intermediate results. Accordingly, an entire set of amassed data can be processed to achieve an intermediate result, that is further adapted upon application of new data examples.
- the system and method of the present invention can also handle multivalued or analog type parameters in a set of attribute patterns representative of a domain of knowledge.
- the continuous type parameters can be segmented into discrete value ranges, so that multiple attributes represent a single continuous parameter.
- a multi-valued attribute will be assumed to contain a single value. If more than one value is contained in the multi-valued attribute of the example, it will be considered as a separate example for each value. Thus, a new rule will be generated for each of the different values encountered in data examples. If multi-valued attributes are compared between two rules, and the attribute values match, then that specific value of the multi-valued attribute can be declared irrelevant or redundant, rather than the entire multi-valued attribute. Also, if two or more values are in rules with the same conclusion and the rules only differ by those values, the values may be grouped (effectively reducing the dimensionality of the attribute) and the rules combined into one.
- Fig. 1 is a diagram illustrating the steps of the data mining method
- FIG. 2 is a diagram illustrating the step of selecting a data example
- Fig. 3 is a diagram illustrating selection of a domain
- Fig. 4 is a diagram illustrating an overall procedure for processing real-time data examples
- Fig. 5 is a diagram illustrating an update to conclusion counts
- Fig. 6 is a diagram illustrating the removal of redundant rules
- Fig. 7 is a diagram illustrating expansion of the rules into canonical form to facilitate the elimination of redundant rules
- Fig. 8 is an illustration of a data example containing an attribute list and associated action or conclusion; and [0048] Fig. 9 is an example of a canonical expansion of a relevant attribute rule for redundancy checks.
- a flow diagram illustrating an overview of the system and method according to the present invention is shown.
- a data example relating to a situation is gathered and formatted for use according to the present invention.
- the data can be accumulated over a period of time to provide an amassed set of information, or can be processed in individual records as they are generated or received.
- unique patterns in the data are identified and resolved into rules. Generating the rules in this manner maintains the uniqueness of the patterns represented in the rules.
- the generation or update of the rules accommodates a single data example at a time when sequentially processing an entire set of amassed data examples or upon receipt of new data when processing in real time.
- relevant attributes for each of the rules are determined in a step 300.
- Relevant attributes are preferably attributes with values that contribute in some way to the conclusion associated with a given rule. As new data examples are received and processed, shifts may occur in the relevancy of attribute values as conclusions for a rule are updated.
- Step 300 permits attributes to be identified as relevant or irrelevant to the particular conclusion with which they are associated.
- the rules can be expanded into a canonical form to more easily identify redundancies in an optional step 400.
- a step 500 removes redundant rules in the set of rules determined from steps 100-400. Once redundancies are removed from the rules, an optional step 600 permits review of the result to determine if any overlap of information exists between the rules (rules with different conclusions, yet with the same set of attribute values). Overlap between the rules can be resolved with input from an operator, or by obtaining further data examples that can resolve the discrepancies in subsequent process loops.
- the final result is a set of rules that completely describe the domain of knowledge with no conflicting conditions.
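- As a rough illustration of the loop in Fig. 1, the minimal Python sketch below matches each incoming example against existing rules and reports a predominant conclusion per rule. All names are hypothetical; the count ceiling, relevancy marking, and canonical reduction of steps 300-500 are only noted in comments here and are sketched in more detail further on.

```python
from collections import Counter

# Hypothetical sketch of the Fig. 1 loop; names are illustrative, not the patent's.
def mine(examples):
    rules = {}                                      # attribute tuple -> conclusion counts
    for attrs, conclusion in examples:              # step 100: next formatted data example
        key = tuple(attrs)
        counts = rules.setdefault(key, Counter())   # step 200: match an existing rule or create one
        counts[conclusion] += 1                     # (count ceiling is handled in a later sketch)
        # step 300: relevancy of attributes would be reassessed here
        # steps 400-600: canonical expansion, redundancy removal and overlap review would follow
    # simplified final result: the predominant conclusion per rule
    return {k: c.most_common(1)[0][0] for k, c in rules.items()}

print(mine([(("weekday", "morning"), "call"),
            (("weekday", "morning"), "call"),
            (("weekend", "evening"), "email")]))
```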
- Rules of intelligence can be defined as representing knowledge contained within the data if each rule contains 1) attributes describing a situation or concept, and 2) an appropriate action or conclusion to be taken based on those specific attribute values. It is also assumed that the majority of these data records contain correct actions or conclusions associated with each set of attribute values. That is to say, the conclusion for each associated set of attribute values in a data example is inferred as a correct conclusion in the general case.
- the data examples may contain errors in the attribute values or the associated conclusions in practice.
- Each data example with a set of attribute values and an associated conclusion can be a data record reflecting information related to a situation in everyday life.
- the system and method of the present invention builds a knowledge base representative of the system or concept for which information is collected.
- the data records are preferably discretely valued, containing a number of discrete attribute values associated with a discrete conclusion.
- the invention accommodates continuously valued parameters by separating them into discrete ranges of continuous values, for example. If it is known that certain ranges of the continuous value have similar effect, then those ranges may be defined as discrete attribute values.
- the granularity of the continuous value parameters represented by discrete ranges can be improved by increasing the number of discrete attribute values representing the continuous parameter.
- the invention permits multi-valued attributes that can assume a number of discrete values in a range. For example, instead of having an attribute that is binary in nature, a multi-valued attribute can be tertiary or quaternary valued. It should be apparent that any type of attribute configuration can be accommodated in the invention, with the attributes preferably being discretely quantized.
- Data records can be analyzed for attribute patterns beginning with the first data received, or in the case of amassed data records, the first data record. In the case of amassed data records, if it is not desired to imply greater importance to the last records analyzed, then that part of the processing can be eliminated or set to have a very high maximum value for conclusion counts.
- An initial data record is selected for processing, whether it be the first data received in real time, or the first data record taken from a collected set of data records. The information contained in the data record is then compared with subsequent data records to determine whether new information can be obtained through the comparison.
- a number of data records can be processed in this way, resulting in a set of mutually exclusive rules that each contain a set of attributes and a group of conclusions associated with the specific list of attributes.
- the group of conclusions associated with a specific set of attribute values in a given rule generally includes a correct conclusion and several conclusions that reflect alternate conclusions or possible errors in the data (attribute values and/or conclusion). Data errors can generally be manifested in a number of conflicting actions or conclusions for the same set of attribute values.
- a given rule may represent an attribute pattern that has differing actions or conclusions for the same set of attribute values.
- the present invention permits the selection of a predominant action by assigning counts to each of the conclusions that occur for a specific set of attribute values in the data records.
- the conclusion or action associated with a particular set of attribute values that has the highest count value is preferably designated as the predominant action for that set of attribute values.
- the predominant conclusion or action is chosen from a group of conclusions or actions based on the count value associated with that conclusion, there is a statistical impact on the data associated with the knowledge domain. For example, there may be a statistically small occurrence of a particular conclusion that is associated with a set of attribute values that may be of particular interest to the domain of knowledge. If the practical error rate for the data under examination approaches the frequency of occurrence for the infrequently occurring conclusions of interest, these conclusions of interest may be missed altogether. In a situation such as this, the statistical selection of the predominant conclusion based on counts may result in a set of rules that does not contain all the knowledge of interest in representing a domain of knowledge relevant to a given situation.
- An example of a data pattern that can typically result in a statistically small, but interesting set of conclusions, is when there is fraud in a transaction.
- the number of transactions that do not contain fraud may be much larger than the number of occurrences of fraudulent transactions.
- the number of occurrences of fraudulent transactions appearing in the data may be comparable to the occurrences generated by a practical error rate for the non-fraud data. If it is the fraudulent transactions that are of interest in the particular domain of knowledge, the overwhelming number of non-fraudulent transactions, which may include errors that mimic fraudulent transactions, will diminish the significance of the fraudulent transactions. This misinformation will cause fraud rules to be missed or identified as erroneous.
- the overall probability of fraud is n/N.
- non-fraud examples and fraud examples must be more balanced.
- the problem can be overcome by reducing the number of non-fraud examples, and/or increasing the number of fraud examples, n. With the number of instances of each conclusion or action occurring in roughly comparable numbers, the examples of interest will occur significantly more often than the erroneous examples. Modifying the selection of data to include more examples of interest and/or to decrease the instances of other conclusions does not change the intelligence content of the data. While a particular portion of the data is given more focus, the underlying and attendant information remains unchanged.
- a portion of the erroneous examples may be discarded to avoid introducing misinformation.
- in FIG. 2, an illustration of a flow process for obtaining data that is properly balanced is shown. Information about data error rates and infrequently occurring conclusions is gathered a priori. A next data example is selected in step 110. A decision step 120 determines if the number of data example errors based on the expected error rate exceeds a predetermined fraction of the pertinent examples of interest. If there is no difficulty with an infrequently occurring conclusion being overwhelmed, decision step 120 branches to the "NO" path, and the process ends in a step 140.
- decision step 120 branches to the "YES" path.
- a step 130 causes frequently occurring data examples to be discarded to balance the data. This process can also be viewed as sampling the data. Once the data examples are deleted in step 130, the process returns to step 110 to accept the next example.
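- A minimal sketch of the balancing idea behind Fig. 2, assuming the data error rate is known a priori; the helper name and the ratio formula are illustrative stand-ins for the sampling decision of steps 120-130.

```python
from collections import Counter

# Hypothetical balancing/sampling check: cap the ratio of the most frequent conclusion to
# the least frequent one so that errors in the frequent class cannot swamp the rare class.
def keep_example(conclusion, tallies, error_rate=0.1):
    max_ratio = 1.0 / error_rate                  # e.g. 10% error rate -> at most 10:1
    others = [n for c, n in tallies.items() if c != conclusion and n > 0]
    if others and (tallies[conclusion] + 1) / min(others) > max_ratio:
        return False                              # discard (sample out) this example
    tallies[conclusion] += 1
    return True

tallies = Counter()
stream = ["fraud"] * 2 + ["non_fraud"] * 100
kept = [c for c in stream if keep_example(c, tallies)]
print(Counter(kept))                              # non-fraud examples are capped at 10x the fraud count
```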
- the process in Fig. 2 can be revised if more information about the data becomes available.
- the data contains non-fraud related examples that have two or more differing conclusions that occur in comparable quantities with respect to each other.
- the non-fraud examples should be discarded or sampled to maintain the relative statistical relationship between the non-fraud examples having differing conclusions.
- none of the fraud related examples should be discarded.
- fraud or non-fraud will normally not be known reliably until a later time. At that time, correction for an original erroneous conclusion should be made by correcting the conclusion counts (non-fraud and fraud) for the rule that represents the situation.
- this sophistication can be programmed according to the present invention by monitoring the number of examples received for each conclusion or action.
- the present invention then preferably prevents a ratio of the examples from exceeding a value for which a practical error rate would introduce an erroneous conclusion. That is, the greatest number of examples having a specific conclusion does not exceed some multiple of the smallest number of examples having another conclusion, where exceeding that multiple would lead to the introduction of an erroneous conclusion, given the practical error rate for the data.
- the number of correct examples and the number of erroneous examples is used to determine the practical error rate.
- the practical error rate is used to determine the number of expected erroneous examples in a generalized process, in which it can be assumed, if not otherwise known, that there is an even distribution of data errors.
- Multiple domains of knowledge can be represented by separate sets of rules, each separate set of rules being developed using the same methodology. Selection of the appropriate set of rules for a given situation or concept represented by the data can be determined according to a set of selection rules. These selection rules can be developed using the same methodology for determining relevant rules according to the present invention. The resulting hierarchical structure with multiple knowledge domains permits all of the separate sets of rules to be developed concurrently as the data examples are acquired. The selection rules coupled with the separate sets of rules can be placed in a hierarchical construction that can be expanded to as many levels as necessary to represent all the domains of knowledge desired.
- a set of rules representing a broad range of knowledge can be formed using a number of limited domains, each of which can become fully defined as soon as a sufficient number of examples for each domain is acquired. If it is not possible to define the limited domains in advance, a selection procedure can automatically define the domains as appropriate examples are encountered.
- FIG. 3 a simple illustration of selection of one or more appropriate domains is shown with an entry step 202.
- the data example obtained in a step 210 is equivalent to that obtained in step 100 shown in Fig. 1.
- a decision step 220 determines whether a set of rules for assigning attributes and conclusions to appropriate domains exists. If multiple domains exist, and the domain selection rules are formed, the domain(s) appropriate for the data example can be selected, and the data example is then applied to the appropriate domains, as illustrated in a step 230. If multiple domains are not defined, decision step 220 branches to the negative result, and the data example is simply applied to the existing set of rules.
- the data examples are records of details, in the form of attribute values, describing events or observations relating to situations occurring in everyday life. From these records, a machine can be configured to execute a programmed method according to the present invention to discover patterns within the data representing those situations and build a knowledge base.
- the present invention preferably uses the first data example as a first rule. It should be apparent that any data example can be selected as a rule for executing the method according to the present invention.
- the selected data example forms the first rule of each designated domain.
- the first rule in each of the domains in this case is preferably formed with only the attributes and conclusions designated for the particular domain according to the domain selection rule set. All other attributes can be marked as irrelevant to the domain, if not discarded.
- the domain selection rule set is also preferably formed with only the attributes and conclusions needed to select the appropriate domains. If the domain selection rule set is part of a hierarchy having more than two levels, then each domain of all the domain selection rule sets is preferably formed using only the attributes and conclusions necessary to select the appropriate lower level domains. This hierarchical level structure can be repeated for any number of domain levels.
- the same attribute may be used in more than one domain, and may be used on more than one domain level given a number of hierarchy levels for domains. For example, environmental conditions such as the temperature may influence more than one domain, and may be pertinent to more than one domain level in a domain hierarchy. If the first data example does not contain attributes or a conclusion related to a particular domain, the domain preferably remains in a state associated with waiting for a first data example.
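- The domain routing of Fig. 3 might be sketched as below, assuming each domain is declared with the attributes and conclusions it uses; the simple conclusion-membership test stands in for a learned domain-selection rule set, and all names are illustrative.

```python
# Hypothetical domain routing in the spirit of Fig. 3.  Each domain declares the attributes
# and conclusions it uses; any other attribute is dropped (treated as irrelevant to it).
DOMAINS = {
    "weather":  {"attributes": ["temperature", "humidity"], "conclusions": {"umbrella", "sunscreen"}},
    "shopping": {"attributes": ["last_purchase", "season"],  "conclusions": {"offer", "no_offer"}},
}

def route(example):
    attrs, conclusion = example
    selected = []
    for name, spec in DOMAINS.items():
        if conclusion in spec["conclusions"]:                  # stand-in domain-selection rule
            projected = {a: attrs[a] for a in spec["attributes"] if a in attrs}
            selected.append((name, projected, conclusion))     # example applied to this domain
    return selected

example = ({"temperature": "hot", "humidity": "low", "season": "summer"}, "sunscreen")
print(route(example))
```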
- a flow diagram illustrating the processing of data examples is shown. Entry to the process is found at a step 302, which is directed to a step 306 in which a data example is obtained.
- the data example obtained in step 306 can be a real time data example related to instantaneous or very recent events. Alternatively, the data example can be obtained from a sequential list of examples that have been accumulated over a period of time and stored for processing. As new data examples are acquired in step 306, they can be applied to all previously defined domains, as discussed above.
- the application of a data example to a domain rule set is illustrated in a step 310, in which comparisons between the data example and the appropriate domain rule set takes place.
- Domains encountering a data example with assigned attributes or conclusions for the first time in step 310 treat the data example as a first rule in the domain. If no domains are defined, the first data example obtained from step 306 is treated as the first rule in the rule set in step 310.
- When a domain already has at least one rule, new data examples assigned to that domain are compared to the existing rule(s) in step 310.
- a decision step 314 determines if the attribute values contained in the new data example match an existing rule for the domain. If an attribute value match between the data example and a rule is obtained, decision step 314 branches to the "YES" path, and the conclusion counts for the matched rule are updated in a step 320 in accordance with the conclusion found in the data example.
- decision step 314 branches to the "NO" path, where a new rule for that domain is made from the data example in a step 324.
- a new rule generated from a data example that does not match any existing rule in step 324 has a conclusion count of one (1) for the conclusion associated with the data example and that conclusion is designated as the rule's predominant conclusion. All other counts related to conclusions for the newly formed rule are set to zero.
- When the conclusion counts are updated in step 320, due to encountering a data example with attribute values that match those of the rule, the rule conclusion counter related to the conclusion found in the data example is typically incremented as shown in Fig. 5, which begins with an entry step 400.
- a decision step 404 checks if the matching rule's count for the conclusion that matches the conclusion in the data example is at a maximum. If so, the process branches to a decision step 406 that checks for other conclusion counts greater than zero. If other conclusion counts are greater than zero, decision step 406 branches to a step 407, in which those other conclusion counts greater than zero are decremented.
- When decision step 404 determines that the matching rule's conclusion count is not at a maximum, the process branches to a step 405, in which that rule conclusion count is incremented.
- the various branches of the process complete at a step 408.
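- The count update of Fig. 5 can be sketched as follows, assuming each rule holds a dictionary of conclusion counts; MAX_COUNT and the function name are illustrative.

```python
MAX_COUNT = 5   # illustrative ceiling; the choice of a maximum is discussed below

def update_counts(counts, conclusion):
    """Fig. 5 sketch: increment the matching conclusion count, or, if it is already at the
    ceiling, decrement every other non-zero count instead (with a floor of zero)."""
    if counts.get(conclusion, 0) >= MAX_COUNT:              # step 404: already at maximum?
        for other in counts:                                # steps 406-407: decrement the others
            if other != conclusion and counts[other] > 0:
                counts[other] -= 1
    else:
        counts[conclusion] = counts.get(conclusion, 0) + 1  # step 405: simple increment

counts = {"renew": 5, "cancel": 3}
update_counts(counts, "renew")       # "renew" is at the ceiling, so "cancel" is decremented
print(counts)                        # {'renew': 5, 'cancel': 2}
```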
- the maximum value for a conclusion count is chosen based on how quickly a change in data example conclusions is preferably recognized in the set of rules. One maximum value may be selected for the entire system, or optimized values may be used for each rule.
- This technique of incrementing and decrementing conclusion counts emphasizes the knowledge contained in more recent data examples over that contained in older data examples. Attribute patterns and conclusions that occur with greater frequency in more recently acquired data examples can quickly overcome rule conclusions that are supported by hundreds of older data examples. For example, setting the maximum conclusion count for a rule to a small number such as, for example, five, enables six new data examples in a row (fewer if the count is non-zero when the string of examples begins) to change the predominant conclusion for the rule. The predominant conclusion is changed if, for example, six data examples containing the same set of attribute values relevant to the rule, having the same previously unencountered conclusion, are assimilated into the rule.
- the first five of these new conclusions will increment the associated conclusion count to the maximum of five, while with the sixth occurrence of the new conclusion, the previously predominant conclusion count is decremented to a value of at most four.
- the new conclusion is then designated the predominant conclusion.
- the designation of predominant conclusion is preferably changed only when a count for a non-predominant conclusion exceeds the count for the designated predominant conclusion to reduce frequent changes. To increase the suppression of frequent changes, the decision to change the designation can be delayed until the largest count exceeds all others by more than one.
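- The hysteresis described above might be realized as in the following sketch, assuming the rule remembers its currently designated predominant conclusion; the margin parameter is illustrative (a margin of 2 corresponds to requiring the challenger to exceed all other counts by more than one).

```python
def choose_predominant(counts, current, margin=1):
    """Keep the current predominant conclusion unless another conclusion's count exceeds it
    and leads every other count by at least `margin` - a simple hysteresis against noise."""
    challenger = max(counts, key=counts.get)
    if challenger == current:
        return current
    if counts[challenger] > counts.get(current, 0) and \
       all(counts[challenger] >= counts[c] + margin for c in counts if c != challenger):
        return challenger
    return current

print(choose_predominant({"renew": 4, "cancel": 4}, current="renew"))   # tie -> stays "renew"
print(choose_predominant({"renew": 3, "cancel": 5}, current="renew"))   # clear lead -> "cancel"
```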
- the condition supporting a change in predominant conclusion depends upon the new data examples having attribute values matching the rule, and having an associated conclusion different than the predominant conclusion.
- a shift in the predominant conclusion for a rule indicates that the rule is now associated with the new conclusion.
- a decision step 328 determines if a shift in the predominant conclusion has occurred. If a shift in the predominant conclusion for a rule has occurred, and there is more than one rule in the domain or rule set, decision step 328 branches to the "YES" path to initiate a sequence to reprocess the existing rules to determine any changes to the relevancy of the rule attributes.
- the existing rules are also preferably reprocessed if, for example, a new rule is created in step 324, and the new rule has a conclusion that is different than the predominant conclusions of other rules in the same domain.
- a decision step 332 checks the conclusion of the newly created rule from step 324, and branches to the "YES" path for reprocessing if the conclusion differs from those of other rules in the domain or rule set.
- the addition of a new rule with a new conclusion may affect the relevancy of attribute values in other rules in the domain. If the addition of a new rule in step 324 does not result in a conclusion that differs from those of other rules in the domain, the attributes of the rule are all considered relevant to the rule conclusion.
- decision step 332 branches to the "NO" path to return to the beginning of the process to obtain a new data example. This occurs only when starting a domain and continues until the first example containing a different conclusion is encountered.
- a step 336 begins the rule reprocessing by identifying the relevant attributes in a rule through comparisons of the attribute values with other rules having different predominant conclusions. The values of attributes that correspond between the rules under comparison are compared with each other, and if any of the attribute values match, meaning that they do not contribute to differentiating the two differing conclusions, then they are marked irrelevant in both the new rule and in the rule to which it is compared.
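- A simplified sketch of the comparison in step 336, assuming a rule is kept as an attribute-value tuple, a parallel list of relevancy flags, and a predominant conclusion; the helper name and sample values are illustrative.

```python
# Hypothetical sketch of step 336: attribute positions whose values coincide in two rules
# with *different* predominant conclusions do not help tell those conclusions apart, so
# they are marked irrelevant (False) in both rules.
def mark_irrelevant(rule_a, rule_b):
    values_a, relevant_a, concl_a = rule_a
    values_b, relevant_b, concl_b = rule_b
    if concl_a == concl_b:
        return                                    # only rules with differing conclusions are compared
    for i, (va, vb) in enumerate(zip(values_a, values_b)):
        if va == vb:
            relevant_a[i] = False
            relevant_b[i] = False

new_rule = (("gold", "urban", "weekday"), [True, True, True], "offer")
old_rule = (("gold", "rural", "weekday"), [True, True, True], "no_offer")
mark_irrelevant(new_rule, old_rule)
print(new_rule[1], old_rule[1])    # [False, True, False] in both: only the region differs
```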
- the rules may be expanded into canonical form in optional step 340.
- the canonical expansion is used to simplify the identification of redundant rules. For example, two rules that are mutually exclusive because they have differing attribute value patterns may still be redundant in their conclusion or action. If the two rules have the same conclusion, and a common subset of identical attribute values, the rules are redundant.
- the canonical expansion in step 340 sets up the attribute values in an easily comparable form to identify any existing redundancies.
- the rule is rewritten in canonical form.
- the canonical form is an expansion of the rule resulting in a generalized form that contains relevant attributes and a predominant conclusion.
- each group of rules with the same conclusion is reviewed in a step 344 to eliminate any redundancy that may exist. Once redundancies are eliminated in step 344, the resulting set of rules provides a conclusion for every possible combination of the attributes for its knowledge domain if at least two rules were generated for the domain. If sufficient examples have been provided, the information about the domain represented by the rules will not contain overlap, e.g., the rules will be consistent with each other, and mutually exclusive.
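- The redundancy removal of step 344 might look like the sketch below, where a canonical rule is assumed to keep only its relevant attribute values and an omitted attribute matches anything (playing the role of the "x" placemarker used later in the text); the names and the subsumption test are illustrative.

```python
# Hypothetical redundancy check over canonical rules, each taken as (constraints, conclusion),
# where constraints maps only the relevant attributes to their values.
def covers(general, specific):
    """True if every constraint of `general` also appears in `specific`."""
    return all(specific.get(attr) == value for attr, value in general.items())

def remove_redundant(rules):
    kept = []
    for i, (cons_i, concl_i) in enumerate(rules):
        redundant = False
        for j, (cons_j, concl_j) in enumerate(rules):
            if j == i or concl_j != concl_i or not covers(cons_j, cons_i):
                continue
            # rule j (same conclusion) covers rule i; drop i unless the two rules are
            # identical and i comes first, so one copy of duplicate rules survives
            if not (covers(cons_i, cons_j) and i < j):
                redundant = True
                break
        if not redundant:
            kept.append((cons_i, concl_i))
    return kept

rules = [({"b": "b1"}, "approve"),               # covers every example where b == b1
         ({"a": "a1", "b": "b1"}, "approve"),    # redundant: subsumed by the rule above
         ({"a": "a2"}, "decline")]
print(remove_redundant(rules))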
- the procedure preferably accepts a new data example for processing, as illustrated by step 306 in Fig. 4.
- the procedure can continue for as long as data examples are supplied, or can be discontinued and restarted at any point.
- the procedure can also be applied to an amassed set of data examples to produce a set of rules for that knowledge domain.
- the method according to the present invention is preferably suitable for developing personalization rules based on user interaction with a real life system.
- the rules resulting from application of the method are developed in the following steps: [0081] (1) Format the Data
- the data is arranged to focus on a domain of knowledge.
- the domain of knowledge to be represented by the rules is preferably decided upon by an operator or system developer.
- the operator preferably selects the conclusions or actions that are of interest for the domain (and any subset domains), and the attributes that are used to describe the situations for which the conclusions of interest apply in the domain (and subset domains).
- the data is organized into a regular format, or if the data is arriving in real time, it is formatted as it is received. Referring to Fig. 8 momentarily, the data is preferably organized into an ordered set of attribute values, followed by a conclusion associated with the attribute values. Counters for the conclusions are reserved in relation to the rules that are constructed from the data examples.
- the examples can be sampled, as discussed above with regard to Fig. 2, if there is a concern that data examples with infrequently occurring information may be masked by erroneous data related to frequently occurring conclusions.
- the sampling is conducted to prevent the ratio of the most frequently occurring conclusion to the most infrequently occurring conclusion from exceeding a value based on an assumed or practical error rate. If the number of examples of some of the conclusions is small relative to the number of examples of other conclusions, a fraction of the data examples with more frequently occurring conclusions may be discarded. Discarded data examples can still contribute new information simply through the fact of occurrence and the time of occurrence, which may be recorded for use by the system.
- the attribute values of the data example are used as the attribute values of a first rule in a first rule set.
- the counter for the respective conclusion found in the data example is set to one, and that conclusion is designated as the predominant conclusion or action for the rule. All other conclusion counters are set to zero.
- the first data example becomes the first rule in each subset domain in which the associated conclusion is to be represented.
- Each of the first rules may have attributes omitted or specifically marked as irrelevant according to the subset domain definition, as illustrated in Fig. 3. Some attributes in the example may be known a priori to be relevant in the highest hierarchical level and thus be implicit in the lower level, making their explicit presence unnecessary. [0085] (3) Mark Initial Relevant Attributes
- each attribute of the rule not specifically marked irrelevant by the subset domain definitions is marked as being relevant to the predominant rule conclusion.
- all the attributes are marked as belonging to a relevant attribute list referred to here as List 1.
- List 1 comprises (a, b, c). Marking the attributes as relevant can take the form of a list indicator. Since other rules that may be added to the first rule set can have their associated attribute values included in a number of lists (i.e. List 1, List 2, etc.), relevancy can be shown by inclusion in a list.
- a list of relevancy indicia marks can take the form of (1, 1, 1), meaning that the attributes a, b and c all belong to List 1 and are relevant.
- a "0" may be used to indicate that an attribute is irrelevant, for example. If subset domains are defined as discussed above, some attributes may initially be specifically marked as irrelevant to simplify processing, with a -1, for example. If not so marked, those attributes would be discovered to be irrelevant by the method if their values are invariant or enough data examples are used. [0087] (4) Generate Initial Final Rule
- when a second rule is generated, it will complete the rule set, making the inserted rule redundant. It should be apparent that there may be more than two potential values for each relevant attribute, i.e., a' represents any other value that the attribute "a" can accommodate.
- the second rule set made of canonical rules is preferably copied into a third rule set, also referred to as a final rule set.
- while the second rule set could serve as the final rule set, it will be seen that it would require additional processing to rebuild modified rules.
- the rule's conclusion counter related to the conclusion found in the data example is updated. If incrementing the counter would exceed a predetermined maximum count, the counter is not incremented, and the counts of the other conclusions or actions for that rule that are greater than zero are decremented. When the conclusion count is at a maximum and all other conclusion counts are decremented in response to the data example with the matching attribute values, the maximum count conclusion is designated as the predominant conclusion for the rule, if a larger difference is not required (as previously discussed). Since the rules are built to be mutually exclusive with regard to attribute value patterns, once a match for the attribute pattern has been found, the comparison terminates.
- a review of the rules is preferably done to identify relevant and irrelevant attributes as shown in steps 328 and 332 in Fig. 4. If a new rule with a predominant conclusion or action different from those of the other rules in the domain is formed through the above process, a review of the rules is preferably conducted. In addition, when the predominant conclusion or action of a rule switches from one conclusion to another through updated conclusion counts, and there is more than one rule in the domain or rule set, a review of the rules is preferably conducted. The changes in the conclusions for the rules in the rules set can indicate that the relevancy of some attributes with respect to their associated conclusions has changed. The review or reprocessing of the rules is conducted to properly identify irrelevant or newly relevant attributes in the rules.
- the rule processing calls for all the attributes, except those specifically marked irrelevant, of any new rules generated in (5b) to be marked as relevant by belonging to a relevant attribute List 1 of that rule. For example, if the rule has attributes (a, b, c, d), indicia marks are provided with respect to the relevancy of the attributes: (1, 1, 1, 1). The attribute values of the new rule are compared to the attribute values of every rule in the first rule set that has a predominant conclusion different from that of the new rule. A copy of the indicia marks for the compared rule is made prior to the rule comparison.
- the indicia for the matching attribute values is changed to irrelevant to record the comparison result.
- a typical relevant indicia list might be (0, 1, 2, 2).
- 0 indicates that attribute 'a' is irrelevant
- 1 indicates that attribute 'b' belongs to attribute List 1
- the 2's show that attributes 'c' and 'd' are in attribute List 2.
- the procedure for developing the various Lists is discussed more fully below. It is possible to have a number of Lists for each rule, and the combination of the irrelevant attributes and the various Lists represents all of the attributes in the rule.
- Each attribute in a rule belongs to one of the Lists of relevant attributes or is marked irrelevant (e.g. 0 or -1).
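- One illustrative encoding of these indicia marks is a list of integers kept parallel to the attribute list, as in the hypothetical helper below.

```python
# Illustrative encoding of the relevancy indicia marks for a rule with attributes (a, b, c, d):
#   -1 -> specifically marked irrelevant by a subset-domain definition
#    0 -> discovered to be irrelevant through rule comparison
#    n -> belongs to relevant attribute List n (n >= 1)
attributes = ("a", "b", "c", "d")

def lists(attributes, indicia):
    """Group the attributes by the relevant-attribute List (or irrelevance mark) they carry."""
    grouping = {}
    for name, mark in zip(attributes, indicia):
        grouping.setdefault(mark, []).append(name)
    return grouping

indicia = [0, 1, 2, 2]             # the example given in the text
print(lists(attributes, indicia))  # {0: ['a'], 1: ['b'], 2: ['c', 'd']}
```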
- the comparison can result in at least one attribute in the rule remaining marked relevant.
- the relevancy marks contained in the lowest numbered List in which at least one relevant attribute is found are retained.
- the other relevant attribute value marks are restored from their copies made prior to the comparison and subsequent relevancy mark changes.
- attributes (a, b, c, d) and respective relevancy marks (0, 1, 2, 2), suppose that the relevancy marks for attributes 'b' and 'd' are both changed to 0, i.e., marked irrelevant as a result of the comparison.
- the new relevancy indicia list for that rule becomes (0, 1, 2, 0).
- attribute 'a' remains irrelevant.
- Attribute 'b' is a member of List 1, since List 1 has no remaining relevant attributes, and is restored from the List 1 copy. Attribute 'c' is the only remaining relevant attribute in List 2, with 'd' being declared irrelevant. Accordingly, List 2 (attribute 'c') is retained in its simplified form as indicated by the indicia mark '2' in the location indicative of attribute 'c'. If there were a List 3 containing none, one, or more remaining relevant attributes, it would also be restored from its copy because it has a List number that exceeds that of List 2, and List 2 had a relevant attribute left after the comparison concluded.
- the relevancy indicia marks for that rule are restored from the copies made prior to the comparison.
- the values of the irrelevant attributes that do not match the values of the corresponding attributes to which they are compared are then marked as belonging to a new relevant attribute List.
- the new relevant attribute List is numbered as the next higher number in the order of relevant attribute Lists, i.e. 2, 3, 4, etc.
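- Operation (6a) as a whole might be sketched as follows, assuming a rule is a small dictionary of attribute values, indicia marks, and a predominant conclusion; the copy/restore policy below follows the text above only loosely, and all names are illustrative.

```python
# Hypothetical sketch of operation (6a).  Indicia marks: 0 = irrelevant, n >= 1 = member of
# relevant attribute List n.  Only rules with differing predominant conclusions are compared.
def compare_rules(new_rule, other_rule):
    if new_rule["conclusion"] == other_rule["conclusion"]:
        return
    matches = [v1 == v2 for v1, v2 in zip(new_rule["values"], other_rule["values"])]
    for rule in (new_rule, other_rule):
        backup = list(rule["indicia"])             # copy made before the comparison
        for i, same in enumerate(matches):
            if same and rule["indicia"][i] > 0:
                rule["indicia"][i] = 0             # matching values do not discriminate
        surviving = {m for m in rule["indicia"] if m > 0}
        if surviving:
            lowest = min(surviving)                # lowest-numbered List keeping a relevant attribute
            rule["indicia"] = [mark if backup[i] == lowest else backup[i]
                               for i, mark in enumerate(rule["indicia"])]
        else:
            rule["indicia"] = backup               # nothing stayed relevant: restore everything ...
            new_list = max(backup) + 1             # ... and open the next-numbered List
            for i, same in enumerate(matches):
                if rule["indicia"][i] == 0 and not same:
                    rule["indicia"][i] = new_list

rule_a = {"values": ("a", "b", "c", "d"), "indicia": [0, 1, 2, 2], "conclusion": "yes"}
rule_b = {"values": ("a", "b", "e", "d"), "indicia": [1, 1, 1, 1], "conclusion": "no"}
compare_rules(rule_a, rule_b)
print(rule_a["indicia"], rule_b["indicia"])        # rule_a becomes [0, 1, 2, 0], as in the text
```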
- the rule that has the changed predominant conclusion is compared against all other rules in the domain having predominant conclusions that differ from the new predominant conclusion of the changed rule.
- the changed rule is treated as a new rule and processed as provided in (6a). The difference is that some rules preferably do not have their relevancy indicia marks modified.
- the changed rule and the rules that have a predominant conclusion matching that previously held by the changed rule are preferably allowed to have their indicia marks modified, while the relevancy indicia marks of all other rules preferably do not change with the comparison.
- the relevancy indicia marks of rules that have a predominant conclusion that matches neither the new nor the previous predominant conclusion of the changed rule preferably remain the same. Accordingly, it is not necessary to make copies of the relevancy indicia marks for these compared rules prior to a comparison. If the changed rule is compared against a rule that has a predominant conclusion that matches that previously held by the changed rule, then the relevancy indicia marks of both rules can be modified. When a situation is encountered where the relevancy indicia marks for a rule can be modified, a copy of the marks is made prior to the comparison, in case the marks need to be restored by changes due to irrelevancy, as indicated in (6a). [00113] (6b2) Compare Other Rules With Same Conclusion
- the rules in the domain that have a predominant conclusion that is the same as that of the new predominant conclusion for the changed rule are reviewed to check for relevancy as well. These reviewed rules are compared to all other rules in the domain that have differing predominant conclusions. Each of these reviewed rules is treated as a new rule and processed as provided in (6a).
- the relevancy indicia marks of the rules to which the reviewed rules are compared preferably are not modified as a result of the comparison. Accordingly, copies of the relevancy indicia marks for the rules to which the reviewed rules are compared are not required. However, copies of the relevancy indicia marks for each of the reviewed rules under comparison are preferably made prior to the comparison.
- Changing a predominant conclusion can institute a number of rule comparisons according to this process (as many as N(n-1) for n rules, of which N belong to the set of rules having the new predominant conclusion of the changed rule). Accordingly, it may be preferable to delay recognition of the predominant conclusion change until the associated conclusion count exceeds the other conclusion counts by more than one count, to avoid unnecessary computation. If it is known that the data is noisy, with a given variance, for example, then a change of twice that variance might be used as the delay threshold before recognizing the new predominant conclusion. The delay threshold preferably does not require the prior predominant conclusion count to be decremented below zero due to the recognition delay. [00116] (7) Generate final rules
- the new and modified rules of the first rule set are preferably expanded into canonical rules in the second rule set.
- the new and modified canonical expansion rules preferably replace any previous versions for those rules in the second rule set.
- the changes in the canonical rules are then preferably incorporated into a third rule set that contains the final rules describing the domain without redundancies among the rules.
- the new rule is preferably expanded into canonical form (one or more rules that represent the rule) and placed in the second and third rule sets.
- rules that have the same conclusion as the new rule are examined and redundant rules are preferably removed.
- the non-redundant rules in the third rule set are examined to determine if rules can be combined in a more generalized form that permits the elimination of an attribute.
- a domain's third rule set with information represented by three attributes, (a, b, c) could have a rule [1] with an attribute set of (x, b, x).
- the set (x, b, x) can be broken down into the subsets (a, b, x) and (a', b, x), where x is an attribute placemarker that represents any value of the attribute and a' represents "not a". If there is a rule [2] with the same conclusion and with an attribute set of (a, x, x), it can be broken down into the subsets (a, b, x) and (a, b', x). Accordingly, the subset (a, b, x) can be deleted from rule [1], since it is redundant to that in rule [2].
- Rules [1] and [2] are therefore completely represented by the sets (a, x, x) and (a', b, x). Alternately, this resultant rule pair (a, x, x), (a', b, x) can be rewritten as rules (a, b', x), (x, b, x) if needed, to combine the attribute sets with that of another rule. [00121] (7b) Change in Relevancy Indicia Marks
- the rule(s) with the changes are preferably expanded into canonical form rules in the second rule set, replacing any prior canonical form version of those rules.
- All the rules in the third rule set with the same action as the changed rule(s) are preferably deleted and recreated from the second rule set.
- the third rule set will then be consistent with the modified rule(s) in the second rule set.
- the third rule set is then examined to remove redundant rules and combine rules that enable elimination of an attribute as discussed above.
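- The redundancy removal and combination "as discussed above" can be illustrated with the earlier (x, b, x)/(a, x, x) example by expanding each rule into the concrete attribute-value combinations it covers; the expansion helper below assumes three binary attributes and is purely illustrative.

```python
from itertools import product

# Hypothetical illustration of the (x, b, x) example with three binary attributes.  The
# placemarker 'x' matches either value; expanding a rule enumerates the concrete combinations
# it covers, which makes the overlap between rules [1] and [2] explicit.
VALUES = {"a": ("a", "a'"), "b": ("b", "b'"), "c": ("c", "c'")}

def expand(pattern):
    choices = [VALUES[name] if v == "x" else (v,) for name, v in zip("abc", pattern)]
    return set(product(*choices))

rule1 = ("x", "b", "x")                       # rule [1]
rule2 = ("a", "x", "x")                       # rule [2], same conclusion
overlap = expand(rule1) & expand(rule2)
print(sorted(overlap))                        # the shared subset: (a, b, x) expanded
print(sorted(expand(rule1) - overlap))        # what remains of rule [1]: (a', b, x) expanded
```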
- the combination of rules can occur through grouping of multivalued attribute values.
- an attribute can represent several different values of a multi-valued attribute in combination with other rules. If two rules with the same action in the third rule set match exactly except for having different values for one multi-valued attribute, the two rules can be combined into one rule.
- the attribute values of that attribute of both rules are preferably grouped together, excluding duplicate attribute values. The result is a single rule containing all the relevant attributes of the previous two rules, with the differing values of the multi-valued attribute being grouped to act as a single attribute.
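- Combining two rules that differ only in one multi-valued attribute might be sketched as follows, with attribute values held in sets so that the grouped values behave as a single value; the helper and sample rules are hypothetical.

```python
# Hypothetical sketch of combining two rules that differ only in one multi-valued attribute.
def try_combine(rule_a, rule_b):
    (values_a, concl_a), (values_b, concl_b) = rule_a, rule_b
    if concl_a != concl_b:
        return None
    differing = [i for i, (x, y) in enumerate(zip(values_a, values_b)) if x != y]
    if len(differing) != 1:
        return None                              # must match exactly except for one attribute
    i = differing[0]
    merged = list(values_a)
    merged[i] = values_a[i] | values_b[i]        # group the values; duplicates collapse
    return (tuple(merged), concl_a)

rule_a = ((frozenset({"red"}),  frozenset({"sedan"})), "insure")
rule_b = ((frozenset({"blue"}), frozenset({"sedan"})), "insure")
print(try_combine(rule_a, rule_b))   # the colour attribute is grouped to {red, blue}
```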
- Fig. 6 illustrates a process for consolidating rules according to the present invention.
- This section provides an optional action that can be taken as a result of observing the rule outcomes. Referring to Fig. 7, if there are any gaps in the domain information not covered by the rules, an operator can be notified in an optional step 520. Some steps an operator might take upon notice of gaps in the rules can be to acquire more specific data examples related to filling in the information. In addition, if there is any overlap between rules having different conclusions (conflicting rules), further sampling to provide particular data examples can dispel the overlap. In addition to taking specific data examples or new samples, an expert on the domain of knowledge can decide on how any conflict between the rules should be resolved. All information regarding the domain of knowledge can be recorded for later use in resolving conflicts. An operator may be able to better distinguish redundant rules after all relevant attributes of all the rules are expanded into canonical form, as illustrated in step 510.
- the system and method of the present invention provides a complete and consistent rule set for the domain of knowledge under observation as long as enough data is provided.
- the first operation in the process organizes the incoming data, either real-time or stored, for the succeeding operations.
- the second operation initiates processing by reviewing the first data example and creating the first rule of the first rule set.
- each attribute of the first rule is preferably marked relevant in the third operation.
- the relevancy indicia marks indicate that the attributes all belong to one (the first) relevant attribute List. Further operations can introduce new sequentially numbered relevant attribute Lists, each having relevant attributes related to a subset of data examples.
- the fourth operation completes the initial processing of the first rule in the first rule set.
- the rule is preferably expanded into canonical form and placed in the second rule set. Since the rule is already in reduced form (there are no other rules), it is also placed into the third rule set. Subsequent operations preferably use the second rule set as an intermediate location for canonical rules.
- the third rule set becomes the final product of the system and method of the present invention.
- Each of the rules in the third rule set represents the intelligence or knowledge evidenced by the information contained in the data examples.
- Each new data example is preferably processed through all of operations 5-7 prior to accepting the next data example. If the rate of accepting data is very high in comparison to the speed at which the current data example is processed, input data might have to be queued. It is possible to sample the data examples received to avoid queuing information, or when the system is to provide real-time results without a large lag time. When a large number of certain data examples threatens to overwhelm the importance of less frequently occurring data examples, given the magnitude of the data error rate, a fraction of the more frequently occurring data examples may be discarded, thus reducing processing and suppressing erroneous conclusions.
- Another strategy to handle high input data rates is to delay the processing of operation 6, particularly 6b, during periods of high input data rates.
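One possible realization of the discard strategy is sketched below; the frequency threshold, keep fraction, and pattern key are illustrative assumptions rather than values taken from the description.

```python
import random
from collections import Counter

pattern_counts = Counter()

def admit(example, keep_fraction=0.1, frequency_threshold=1000):
    """Keep every example of a rarely seen attribute pattern, but keep only a
    fraction of examples whose pattern has already been seen very often."""
    key = tuple(sorted(example.items()))
    pattern_counts[key] += 1
    if pattern_counts[key] > frequency_threshold:
        return random.random() < keep_fraction
    return True

# Only admitted examples would go on to operations 5-7, e.g.:
# if admit({"region": "east", "tier": "gold"}): process(example)
```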
- Operation (5) preferably assures that new rules are added to the first rule set only if the new rule is mutually exclusive of all the other rules in the first rule set. If the new data example matches any rule, the conclusion count for that rule is modified such that the predominant conclusion count never exceeds a predetermined maximum.
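A rough sketch of this bookkeeping, under the assumption that a rule in the first rule set stores its attribute values and a per-conclusion count, might look as follows; the cap value and field names are invented for the example.

```python
MAX_COUNT = 10_000  # stand-in for the predetermined maximum count

def apply_example(first_rule_set, example_attrs, example_conclusion):
    """Update the matching rule's conclusion counts, or add a new rule that is
    mutually exclusive of the existing ones (no other rule has these attributes)."""
    for rule in first_rule_set:
        if rule["attrs"] == example_attrs:
            counts = rule["conclusion_counts"]
            counts[example_conclusion] = min(
                counts.get(example_conclusion, 0) + 1, MAX_COUNT)
            return rule
    new_rule = {"attrs": example_attrs,
                "conclusion_counts": {example_conclusion: 1}}
    first_rule_set.append(new_rule)
    return new_rule
```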
- Operation (6a) considers newly created rules in the first rule set, separating the attributes of the rule into relevant attribute Lists. Attributes that are irrelevant to the predominant conclusion of the rule are so marked, while relevant attributes are placed in relevant attribute Lists that distinguish that rule from all the other rules having a different predominant conclusion.
- the relevant attribute Lists for the rule are preferably organized into relevancy indicia marks for that rule that show the relevancy of attributes and the relevant attribute List to which the attribute belongs, if any.
- the convention of using relevant attribute lists permits modification of the rules in a structured format, as needed upon comparison to another rule with a different predominant conclusion.
- Each relevant attribute List differentiates the rule from a subset of the other rules. Attributes that are not required in making these distinctions are recognized as irrelevant and can be excluded from the attribute lists in the final rules formed in operation (7).
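One way such relevant attribute Lists could be derived is sketched below; treating every differing attribute as relevant is a simplifying assumption made for illustration, not the exact marking procedure of the method.

```python
def relevant_attribute_lists(rule, other_rules):
    """For each rule with a different predominant conclusion, record the set of
    attributes whose values distinguish this rule from it (one List per rule);
    attributes appearing in no List are treated as irrelevant."""
    lists = []
    for other in other_rules:
        if other["conclusion"] == rule["conclusion"]:
            continue
        differing = {name for name, value in rule["attrs"].items()
                     if other["attrs"].get(name) != value}
        if differing:
            lists.append(differing)
    relevant = set().union(*lists) if lists else set()
    irrelevant = set(rule["attrs"]) - relevant
    return lists, irrelevant
```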
- Operation (6b) considers changes made to the relevancy indicia marks for rules in the first rule set when the predominant conclusion of a rule is modified through exposure to the information in a new data example.
- the relevancy indicia marks for rules having the prior and changed predominant conclusion are preferably modified, while the marks for other rules remain unchanged.
- the seventh operation completes the rule generation process for each new data example encountered.
- the procedure loops back to Step 5 to continue processing new data examples.
- the rule is expanded into a canonical form and preferably replaces prior versions of the rule in the second rule set.
- the rules in the second rule set are copied into the third rule set and examined to reduce or consolidate the rules if possible.
- the canonical form of each new rule is also preferably placed in the third rule set, and examined with other rules having the same predominant action to reduce or consolidate the rules if possible.
- the rules in the third rule set are examined for inconsistencies or redundancies, and made consistent if possible.
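For illustration only, the sketch below treats canonical form as the full enumeration of the concrete attribute-value combinations a rule covers, which is an assumption about the terminology rather than a definition taken from the claims; a containment test then flags redundancies of the kind discussed above.

```python
from itertools import product

def canonical_patterns(rule_attrs, domains):
    """Expand a rule into the set of concrete attribute-value combinations it
    covers, with None standing for any value of that attribute."""
    names = sorted(domains)
    choices = [domains[name] if rule_attrs.get(name) is None else [rule_attrs[name]]
               for name in names]
    return {tuple(zip(names, combo)) for combo in product(*choices)}

def is_redundant(rule, other, domains):
    """A rule is redundant when another rule with the same conclusion already
    covers every pattern the rule covers."""
    return (rule["conclusion"] == other["conclusion"] and
            canonical_patterns(rule["attrs"], domains)
            <= canonical_patterns(other["attrs"], domains))
```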
- the third rule set is the final output of the system, accurately representing the intelligence contained in the data examples presented to the system.
- These rules can be used in an expert system to supply the appropriate response for situations covered by the domains of knowledge from which the data examples were derived.
- the action or conclusion for the rule that matches the new data can be inferred by the expert system to be the most appropriate action or conclusion to draw.
- One basis for this result is that the data examples used to develop the rules consistently represent the best course of action, given their particular attribute value pattern.
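As a hypothetical illustration of how an expert system might consult the third rule set, the lookup below matches a new data example against stored rules; the attribute names, values, and conclusion are invented for the example.

```python
def infer(third_rule_set, new_data):
    """Return the conclusion of the first rule whose attribute constraints are
    all satisfied by the new data (None means the attribute is irrelevant)."""
    for rule in third_rule_set:
        if all(accepted is None or new_data.get(name) in accepted
               for name, accepted in rule["attrs"].items()):
            return rule["conclusion"]
    return None  # a gap in the rules: no conclusion can be inferred

rules = [{"attrs": {"region": {"east", "west"}, "tier": {"gold"}},
          "conclusion": "offer_upgrade"}]
print(infer(rules, {"region": "east", "tier": "gold"}))  # offer_upgrade
```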
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2002339808A AU2002339808A1 (en) | 2001-05-23 | 2002-05-23 | Real-time adaptive data mining system and method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US29323401P | 2001-05-23 | 2001-05-23 | |
US60/293,234 | 2001-05-23 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2002095676A2 true WO2002095676A2 (en) | 2002-11-28 |
WO2002095676A3 WO2002095676A3 (en) | 2003-01-23 |
Family
ID=23128259
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2002/016069 WO2002095676A2 (en) | 2001-05-23 | 2002-05-23 | Real-time adaptive data mining system and method |
Country Status (2)
Country | Link |
---|---|
AU (1) | AU2002339808A1 (en) |
WO (1) | WO2002095676A2 (en) |
2002
- 2002-05-23: WO application PCT/US2002/016069 (publication WO2002095676A2), not active: Application Discontinuation
- 2002-05-23: AU application AU2002339808A (publication AU2002339808A1), not active: Abandoned
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5787274A (en) * | 1995-11-29 | 1998-07-28 | International Business Machines Corporation | Data mining method and system for generating a decision tree classifier for data records based on a minimum description length (MDL) and presorting of records |
US5787425A (en) * | 1996-10-01 | 1998-07-28 | International Business Machines Corporation | Object-oriented data mining framework mechanism |
US5899992A (en) * | 1997-02-14 | 1999-05-04 | International Business Machines Corporation | Scalable set oriented classifier |
US6055539A (en) * | 1997-06-27 | 2000-04-25 | International Business Machines Corporation | Method to reduce I/O for hierarchical data partitioning methods |
US6199068B1 (en) * | 1997-09-11 | 2001-03-06 | Abb Power T&D Company Inc. | Mapping interface for a distributed server to translate between dissimilar file formats |
US6230151B1 (en) * | 1998-04-16 | 2001-05-08 | International Business Machines Corporation | Parallel classification for data mining in a shared-memory multiprocessor system |
US6115709A (en) * | 1998-09-18 | 2000-09-05 | Tacit Knowledge Systems, Inc. | Method and system for constructing a knowledge profile of a user having unrestricted and restricted access portions according to respective levels of confidence of content of the portions |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7788278B2 (en) * | 2004-04-21 | 2010-08-31 | Kong Eng Cheng | Querying target databases using reference database records |
US8346794B2 (en) | 2004-04-21 | 2013-01-01 | Tti Inventions C Llc | Method and apparatus for querying target databases using reference database records by applying a set of reference-based mapping rules for matching input data queries from one of the plurality of sources |
WO2006085293A1 (en) * | 2005-02-10 | 2006-08-17 | Norkom Alchemist Limited | A transaction data processing system |
US7925607B2 (en) | 2005-02-10 | 2011-04-12 | Norkom Alchemist Limited | Transaction data processing system |
EP1953659A1 (en) * | 2007-01-30 | 2008-08-06 | Daintel ApS | A method for effecting computer implemented decision-support in prescribing a drug therapy |
WO2008092452A1 (en) * | 2007-01-30 | 2008-08-07 | Daintel Aps | A method for effecting computer implemented decision-support in prescribing a drug therapy |
US20140089219A1 (en) * | 2012-09-25 | 2014-03-27 | Lone Star College | A system and method that provides personal, educational and career navigation, validation, and analysis to users |
Also Published As
Publication number | Publication date |
---|---|
AU2002339808A1 (en) | 2002-12-03 |
WO2002095676A3 (en) | 2003-01-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20030023593A1 (en) | Real-time adaptive data mining system and method | |
US6647379B2 (en) | Method and apparatus for interpreting information | |
Hamilton et al. | RIAC: a rule induction algorithm based on approximate classification | |
US8364618B1 (en) | Large scale machine learning systems and methods | |
US7856616B2 (en) | Action-based in-process software defect prediction software defect prediction techniques based on software development activities | |
US7430717B1 (en) | Method for adapting a K-means text clustering to emerging data | |
US11315196B1 (en) | Synthesized invalid insurance claims for training an artificial intelligence / machine learning model | |
US11580560B2 (en) | Identity resolution for fraud ring detection | |
WO2005055073A1 (en) | Automated anomaly detection | |
US11562262B2 (en) | Model variable candidate generation device and method | |
US20020049720A1 (en) | System and method of data mining | |
Ayetiran et al. | A data mining-based response model for target selection in direct marketing | |
JP2000339351A (en) | System for identifying selectively related database record | |
CN113626241A (en) | Application program exception handling method, device, equipment and storage medium | |
CN115859191A (en) | Fault diagnosis method and device, computer readable storage medium and computer equipment | |
JP7470235B2 (en) | Vocabulary extraction support system and vocabulary extraction support method | |
Bruha | From machine learning to knowledge discovery: Survey of preprocessing and postprocessing | |
WO2002095676A2 (en) | Real-time adaptive data mining system and method | |
JP4172388B2 (en) | Link diagnostic device, link diagnostic method, and link diagnostic program. | |
JP7532300B2 (en) | Information processing method, program, and information processing device | |
CN115439079A (en) | Item classification method and device | |
CN114066173A (en) | Capital flow behavior analysis method and storage medium | |
CN107025615B (en) | Learning condition statistical method based on learning tracking model | |
WO2020101478A1 (en) | System and method for managing duplicate entities based on a relationship cardinality in production knowledge base repository | |
WO2024147212A1 (en) | Learning data management system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
AK | Designated states |
Kind code of ref document: A3 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A3 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
122 | Ep: pct application non-entry in european phase | ||
NENP | Non-entry into the national phase in: |
Ref country code: JP |
|
WWW | Wipo information: withdrawn in national office |
Country of ref document: JP |