US20050080613A1 - System and method for processing text utilizing a suite of disambiguation techniques - Google Patents
- Publication number
- US20050080613A1 (application US10/921,954)
- Authority
- US
- United States
- Prior art keywords
- sense
- text
- word
- selection
- components
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99934—Query formulation, input preparation, or translation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99935—Query augmenting and refining, e.g. inexact access
Definitions
- Word sense disambiguation is the process of determining the meaning of words in text.
- the word “bank” can mean a financial institution, an embankment, or an aerial manoeuvre (or several other meanings).
- When humans listen to or read naturally expressed language, they automatically select the correct meaning of each word based on the context in which it is expressed.
- a word sense disambiguator is a computer-based system for accomplishing this task, and is a critical component of technology for making naturally expressed language understandable to computers.
- a word sense disambiguator is used in applications which require or which can be improved by making use of the meaning of the words in the text.
- Such applications include but are not limited to: Internet search and other information retrieval applications; document classification; machine translation; and speech recognition.
- a method of processing natural language text utilizing disambiguation components to identify a disambiguated sense or senses for the text comprises applying a selection of the components to the text to identify a local disambiguated sense for the text.
- Each component provides a local disambiguated sense of the text with a confidence score and a probability score.
- the disambiguated sense is determined utilizing a selection of local disambiguated senses.
- the components are sequentially activated and controlled by a central module.
- the method may further comprise identifying a second selection of components; and applying the second selection to the text to refine the disambiguated sense (or senses).
- Each component in the second selection provides a second local disambiguated sense (or senses) of the text with a second confidence score and a second probability score.
- the disambiguated sense (or senses) is determined utilizing a selection of the second local disambiguated senses.
- the further step of eliminating a sense from the disambiguated sense having a confidence score below a threshold may be executed.
- the selection and the second selection of components may be identical.
- the confidence score of each component may be generated by a confidence function utilizing a trait of each component.
- the method may generate a probability distribution for its disambiguated sense (or senses). Further the method may merge all probability distributions for the selection.
- the selection of components may disambiguate the text using a context of the text, which may be identified from one of the following contexts: domain; user history; and specified context.
- the method may refine a knowledge base of each component in the selection utilizing the disambiguated sense (or senses).
- At least one of the selection of components provides results only for coarse senses.
- results of the selection of components may be combined into one result utilizing a merging algorithm.
- the process may utilize a first stage comprising merging of coarse senses, and a second stage comprising merging of fine senses within each coarse sense grouping.
- the merging process may utilize a weighted sum of probability distributions, and the weights may be the confidence score associated with the distribution. Further, the merging process may comprise a weighted average of confidence scores, and the weights are again the confidence scores associated with the distribution.
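- The confidence-weighted merge described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `merge_results`, the representation of each component result as a `(distribution, confidence)` pair, and the final normalization step are all assumptions made for the example.

```python
from collections import defaultdict


def merge_results(results):
    """Merge component outputs into one distribution and one confidence.

    `results` is a list of (distribution, confidence) pairs, where
    `distribution` maps a sense label to a probability. (Hypothetical
    representation chosen for this sketch.)
    """
    total_weight = sum(confidence for _, confidence in results)
    if total_weight == 0:
        return {}, 0.0

    # Weighted sum of probability distributions; the weights are the
    # confidence scores attached to each distribution.
    merged = defaultdict(float)
    for distribution, confidence in results:
        for sense, probability in distribution.items():
            merged[sense] += confidence * probability

    # Assumption: renormalize so the merged probabilities sum to 1.
    norm = sum(merged.values())
    merged = {s: v / norm for s, v in merged.items()} if norm else {}

    # Weighted average of the confidence scores, again weighted by
    # the confidence scores themselves.
    merged_confidence = sum(c * c for _, c in results) / total_weight
    return merged, merged_confidence
```

A confident component therefore dominates both the merged distribution and the merged confidence, while a component with confidence 0 contributes nothing.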
- a method of processing natural language text utilizing disambiguation components to identify a disambiguated sense for the text comprises steps of: defining an accuracy target for disambiguation; and applying a selection of components from the plurality of disambiguation components to meet the accuracy target.
- a method of processing natural language text utilizing disambiguation components to identify a disambiguated sense for the text comprises steps of: identifying a set of senses for the text; and identifying and removing an unwanted sense from the set.
- a method of processing natural language text utilizing disambiguation components to identify a disambiguated sense for the text comprises steps of: identifying a set of senses for the text; and identifying and removing an amount of ambiguity from the set of senses.
- a method of generating sense-tagged text comprises steps of: disambiguating a quantity of documents utilizing a disambiguation component; generating a confidence score and a probability score for a sense identified for a word provided by the component; if the confidence score for the sense for the word is below a set threshold, the sense is ignored; and if the confidence score for the sense for the word is above the set threshold, the sense is added to the sense-tagged text.
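- The confidence filter used when generating sense-tagged text can be sketched as below. The tuple layout and the name `build_sense_tagged_text` are hypothetical; the source does not say how a score exactly equal to the threshold is treated, so this sketch assumes it is ignored.

```python
def build_sense_tagged_text(predictions, threshold):
    """Keep only the senses whose confidence exceeds `threshold`.

    `predictions` is an iterable of (word, sense, probability, confidence)
    tuples produced by a disambiguation component (assumed layout).
    """
    tagged = []
    for word, sense, probability, confidence in predictions:
        # Senses at or below the threshold are ignored (assumption for
        # the equal case); senses above it are added to the tagged text.
        if confidence > threshold:
            tagged.append((word, sense))
    return tagged
```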
- FIG. 1 is a schematic representation of words and word senses associated with an embodiment of a text processing system
- FIG. 2 is a schematic representation of a representative semantic relationship of words for use with the system of FIG. 1 ;
- FIG. 3 is a schematic representation of an embodiment of a text processing system providing word sense disambiguation
- FIG. 4 is a block diagram of a word sense disambiguator module, control file optimizer, and database elements of the text processing system of FIG. 3 .
- FIG. 5 is a diagram of data structures used to represent the semantic relationships of FIG. 2 for the system of FIG. 3 ;
- FIG. 6 is a flow diagram of a text processing process performed by the embodiment of FIG. 3 ;
- FIG. 7 is a flow diagram of a process for a disambiguating step of the text processing process of FIG. 6 ;
- FIG. 8 is a data flow diagram for the control file optimizer of FIG. 4 ;
- FIG. 9 is a flow diagram of a bootstrapping process associated with the text processing system of FIG. 3 .
- Computer readable storage medium: hardware for storing instructions or data for a computer.
- the medium may take the form of a portable item such as a small disk, floppy diskette, cassette, or it may take the form of a relatively large or immobile item such as hard disk drive, solid state memory card, or RAM.
- documents, web pages, emails, image descriptions, transcripts, stored text etc. that contain searchable content of interest to users, for example, contents related to news articles, news group messages, web logs, etc.
- Module: a software or hardware component that performs certain steps and/or processes; may be implemented in software running on a general-purpose processor.
- Natural language: a formulation of words intended to be understood by a person rather than a machine or computer.
- Network: an interconnected system of devices configured to communicate over a communication channel using particular protocols. This could be a local area network, a wide area network, the Internet, or the like operating over communication lines or through wireless transmissions.
- Query: a list of keywords indicative of desired search results; may utilize Boolean operators (e.g. “AND”, “OR”); may be expressed in natural language.
- Text: textual information represented in its usual form within a computer or associated storage device. Unless otherwise specified, it is assumed to be expressed in natural language.
- Search engine: a hardware or software component to provide search results regarding information of interest to a user in response to text from the user.
- the search results may be ranked and/or sorted by relevance.
- Sense-tagged text: text in which some or all of the words have been marked with a word sense or senses signifying the meaning of the word in the text.
- Sense-tagged corpus: a collection of sense-tagged text for which the senses and possibly linguistic information such as part of speech tags of some or all words have been marked.
- the accuracy of the specification of the senses and other linguistic information must be similar to that which would be achieved by a human lexicographer.
- If sense-tagged text is generated by a machine, then the accuracy of the word senses marked by the machine must be similar to that of a human lexicographer performing word sense disambiguation.
- the embodiment relates to natural language processing, and in particular to processing natural language text as a step in an application which requires or can be improved by making use of the meaning of the words in the text. This process is known generally as word sense disambiguation.
- Applications include but are not limited to:
- Document classification in allowing documents to be clustered based upon precise criteria of meaning as opposed to their textual content. For example, consider an application which automatically sorted email messages into folders each pertaining to a topic specified by a user. One such folder might be entitled “programming tools”, and contain any emails that mentioned any form of “programming tool”. The use of word sense disambiguation in this application would allow emails that contained related information, but did not contain words matching the title of the folder to be accurately classified as belonging in the folder or not.
- an email containing the words “Java object” could be placed in the folder because it contains a sense of “Java” meaning a programming language
- an email containing the terms “Java coffee” or “tools to use in designing a conference program” could be rejected because, in the first case, the word “Java” is disambiguated to mean a type of coffee, and, in the second case, the word “program” refers to an event, which is a meaning not associated with computer programming.
- Such an effect could be optionally achieved by giving the senses present in a disambiguated email to a machine learning algorithm, rather than just providing the words as is currently done by state-of-the-art applications. The accuracy of the classification would increase as a result, and the application would appear more intelligent and be more useful to the user.
- Machine translation in knowing the precise meanings of words before they are translated, so that the correct translation can be provided for words with multiple possible translations.
- the word “bank” in English may translate into the French “banque” if it means “financial institution”, but “rive” if it means “river bank”.
- it is necessary to select a meaning. It will be recognised by those skilled in the art that a large percentage of the errors in prior art machine translation systems are made due to the selection of the wrong senses of words being translated.
- the addition of word sense disambiguation to such a system would improve accuracy by reducing or eliminating the errors of this type that are made by today's state-of-the-art systems.
- Speech recognition in allowing utterances with words or combinations of words that sound the same but are written differently to be correctly interpreted.
- Most speech recognition systems include a recognition component that analyses the phonetics of a phrase and outputs several possible sequences of words that could have been pronounced. For example, “I asked to people” and “I asked two people” are pronounced the same, and would both be output as possible sequences of words by such a recognition component.
- Most speech recognition systems then include a module which selects which of the possible word sequences is the most probable, and outputs this sequence as the result. This module typically operates by selecting the word sequence that matches most closely with word sequences that are known to be uttered. Word sense disambiguation could improve the operation of such a module by selecting the word sequence that leads to the most consistent interpretation.
- Text to speech in allowing words with multiple pronunciations to be pronounced correctly. For example, “I saw her sow the seeds” and “The old sow was slaughtered for bacon” both contain the word “sow”, which is pronounced differently in each sentence.
- a text to speech application needs to know which interpretation applies to each word in order to correctly utter each sentence.
- a word sense disambiguation module could determine that the sense of “sow” in the first sentence was the verb “to sow” and in the second sentence was “a female hog”. The application would then have the information necessary to pronounce each sentence correctly.
- relationship between words and word senses is shown generally by the reference 100 .
- the word “bank” may represent: (i) a noun referring to a financial institution; (ii) a noun referring to a river bank; or (iii) a verb referring to an action to save money.
- the word “interest” has multiple meanings including: (i) a noun representing an amount of money payable relating to an outstanding investment or loan; (ii) a noun representing special attention given to something; or (iii) a noun representing a legal right in something.
- the embodiment assigns senses to words.
- the embodiment defines two senses of words: coarse and fine.
- a fine sense defines a precise meaning and usage of a word. Each fine sense applies within a particular part of speech category (noun, verb, adjective or adverb).
- a coarse sense defines a broad concept associated with a word, and may be associated with more than one part of speech category. Each coarse sense contains one or more fine senses, and each fine sense belongs to one coarse sense.
- a word can have more than one fine and more than one coarse sense.
- a fine sense is classified under the coarse sense because the fine sense of the word matches the generic concept associated with the coarse sense definition. Table 1 illustrates the relationship between a word, its coarse senses and its fine senses.
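- One way to picture the coarse and fine sense inventory is the small data structure below. It is only a sketch: the class names, the sense labels, and the particular grouping of the senses of “bank” are hypothetical illustrations and are not taken from the patent's Table 1.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FineSense:
    label: str           # e.g. "bank/n1" (hypothetical label)
    part_of_speech: str  # noun, verb, adjective or adverb
    gloss: str


@dataclass
class CoarseSense:
    label: str
    fine_senses: list = field(default_factory=list)


# Illustrative grouping only: a coarse sense may span parts of speech,
# while each fine sense belongs to exactly one coarse sense.
BANK_COARSE_SENSES = [
    CoarseSense("bank:finance", [
        FineSense("bank/n1", "noun", "a financial institution"),
        FineSense("bank/v1", "verb", "to save money"),
    ]),
    CoarseSense("bank:terrain", [
        FineSense("bank/n2", "noun", "a river bank"),
    ]),
]


def coarse_of(fine_label, coarse_senses):
    """Return the label of the coarse sense that contains `fine_label`."""
    for coarse in coarse_senses:
        if any(f.label == fine_label for f in coarse.fine_senses):
            return coarse.label
    return None
```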
- example semantic relationships between word senses are shown. These semantic relationships are precisely defined types of associations between two words based on meaning.
- the relationships are between word senses, which are specific meanings of words.
- a bank in the sense of a river bank
- a bluff in the sense of a noun meaning a land formation
- a bank in the sense of river bank
- a bank in the sense of river bank is a type of incline (in the sense of grade of the land).
- a bank in the sense of a financial institution is synonymous with a “banking company” or a “banking concern.”
- a bank is also a type of financial institution, which is in turn a type of business.
- a bank in the sense of financial institution
- a bank is related to interest (in the sense of money paid on investments) and is also related to a loan (in the sense of borrowed money) by the generally understood fact that banks pay interest on deposits and charge interest on loans.
- Words which are in synonymy are words which are synonyms to each other.
- a hypernym is a relationship where one word represents a whole class of specific instances. For example “transportation” is a hypernym for a class of words including “train”, “chariot”, “dogsled” and “car”, as these words provide specific instances of the class.
- a hyponym is a relationship where one word is a member of a class of instances. From the previous list, “train” is a hyponym of the class “transportation”.
- a meronym is a relationship where one word is a constituent part of, the substance of, or a member of something. For example, for the relationship between “leg” and “knee”, “knee” is a meronym to “leg”, as a knee is a constituent part of a leg. Meanwhile, a holonym is a relationship where one word is the whole of which a meronym names a part. From the previous example, “leg” is a holonym to “knee”. Any semantic relationships that fall into these categories may be used. In addition, any known semantic relationships that indicate specific semantic and syntactic relationships between word senses may be used.
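- The relations defined above can be sketched as a small labelled graph over word senses. The class `SenseGraph`, its method names, and the plain-string sense labels are assumptions made for illustration; the patent's own data structures are those of FIG. 5.

```python
from collections import defaultdict

# Each relation is stored together with its inverse.
INVERSE = {
    "hypernym": "hyponym",
    "hyponym": "hypernym",
    "meronym": "holonym",
    "holonym": "meronym",
    "synonym": "synonym",
}


class SenseGraph:
    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, source, relation, target):
        """Record that `target` is a <relation> of `source`, and the inverse."""
        self._edges[(source, relation)].add(target)
        self._edges[(target, INVERSE[relation])].add(source)

    def related(self, sense, relation):
        return set(self._edges.get((sense, relation), set()))

    def ancestors(self, sense):
        """Follow hypernym links transitively (bank -> institution -> business)."""
        seen, stack = set(), [sense]
        while stack:
            for parent in self.related(stack.pop(), "hypernym"):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


graph = SenseGraph()
graph.add("train", "hypernym", "transportation")
graph.add("leg", "meronym", "knee")
graph.add("bank (financial)", "hypernym", "financial institution")
graph.add("financial institution", "hypernym", "business")
```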
- the embodiment addresses this issue. It has been recognized that deriving precise synonyms and sub-concepts for each key term in a naturally expressed text increases the volume of relevant retrievals. If this were performed using a thesaurus without word sense disambiguation, the result could be worsened. For example, semantically expanding the word “Java” without first establishing its precise meaning would yield a massive and unwieldy result set with results potentially selected based on word senses as diverse as “Indonesia” and “computer programming”. The embodiment provides systems and methods for interpreting the meaning of each word, so that the words can be semantically expanded to produce a result set that is both more comprehensive and more precise.
- the system includes text processing engine 20 .
- the text processing engine 20 may be implemented as dedicated hardware, or as software operating on a general purpose processor.
- the text processing engine may also operate on a network.
- WSD module 32 identifies which specific meaning of the word is the intended meaning using a wide range of interlinked linguistic techniques to analyze the syntax (e.g. part of speech, grammatical relations) and semantics (e.g. logical relations) in context. It may use a knowledge base of word senses which expresses explicit semantic relationships between word senses to assist in performing the disambiguation.
- knowledge base 400 of word senses capturing relationships of words as described above for FIG. 2 .
- Knowledge base 400 is associated with database 30 and is accessed to assist WSD module 32 in performing word sense disambiguation as well as provide the inventory of possible senses of words in a text. While prior art dictionaries, and lexical databases such as WordNet (trademark), have been used in systems, knowledge base 400 provides an enhanced inventory of words, word senses, and semantic relations. For example, while prior art dictionaries contain only definitions of words for each of their word senses, knowledge base 400 also contains information on relations between word senses.
- Knowledge base 400 also contains additional semantic relations not contained in other prior art lexical databases: (i) additional relations between word senses, such as the grouping of fine senses into coarse senses, “instance of” relations, classification relations, and inflectional and derivational morphological relations; (ii) corrections of errors in data obtained from published sources; and (iii) additional words, word senses, and relations that are not present in other prior art knowledge bases.
- In addition to containing an inventory of words and word senses (fine and coarse) for each word and concepts, as well as over 40 specific types of semantic links between them, database 30 also provides a repository for component resources 402 used by linguistic components 502 and WSD components 504 .
- Some component resources are shared by several components while other resources are specific to a given component.
- the component resources include: general models, domain specific models, user models and session models.
- General models contain general domain information, such as a probability distribution of senses for each word for any text of unknown domain. They are trained using data from several domains.
- WSD components 504 and linguistic components 502 utilize these resources as necessary. For example, a component may use these resources on all requests or may use it only when the request cannot be completed using more specific models.
- Database 30 also contains sense-tagged corpus 404 .
- Sense-tagged corpus 404 may optionally be split up into sub-units used for training components, training confidence functions for components and training the control file optimizer, as described further below.
- a corresponding definition in type field 408 B identifies the label as a “fine sense” word relationship.
- a corresponding entry in annotation field 410 B identifies the label as “Noun. A financial institution”.
- a “bank” can now be linked to this word sense definition.
- an entry for the word “brokerage” may also be linked to this word sense definition.
- Alternate embodiments may use a common word with a suffix attached to it, in order to facilitate recognition of the word sense definition.
- an alternative label could be “bank/n1”, where the “/n1” suffix identifies the label as a noun (n) and the first meaning for that noun. It will be appreciated that other label variations may be used.
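- A label of the “bank/n1” form can be split back into its parts with a short parser such as the one below. Only the “n” (noun) code and the numeric index come from the text; the function name, the regular expression, and the acceptance of any other single-letter part-of-speech code are assumptions of this sketch.

```python
import re

# word, "/", one-letter part-of-speech code, then the sense number.
LABEL_PATTERN = re.compile(r"^(?P<word>.+)/(?P<pos>[a-z])(?P<index>\d+)$")


def parse_label(label):
    """Split a label such as "bank/n1" into (word, pos_code, sense_index)."""
    match = LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a suffixed sense label: {label!r}")
    return match["word"], match["pos"], int(match["index"])
```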
- results generated by a particular component are preferably rated using a probability distribution and a confidence score.
- the probability distribution allows a component to return a probability figure indicating the likelihood that any possible answer is correct.
- possible answers comprise possible senses of words in the text.
- possible answers depend on the task being performed by the linguistic component; for example, possible answers for part-of-speech tagger 502 F are the set of possible part of speech tags for each word.
- the confidence score provides an indication of a level of confidence of the algorithm in the probability distribution.
- an answer having a high probability and a high confidence score indicates that the algorithm has identified a single answer as most probable and it is highly likely that the identified answer is accurate. If an answer has a high probability score and a low confidence, then although the algorithm has identified a single answer as most probable, its confidence score indicates that it may not be correct. In the case of WSD components 504 , a low confidence score may indicate that the component is lacking information that it needed to disambiguate this particular word. It is important that each component have a good confidence function. A component with a low overall accuracy but a good confidence function is able to contribute to the system accuracy despite its low overall accuracy, as the confidence function will identify correctly the subset of words for which the answers supplied by the component can be trusted.
- the components employ statistical techniques based on machine learning concepts or other statistical techniques which will be familiar to those skilled in the art. It will be appreciated by those skilled in the art that such components require the use of training data in order to construct their statistical models.
- the priors component 504 A utilizes many sense-tagged examples of each word in order to determine what is the statistically most likely sense for that particular word.
- the training data is provided by sense-tagged corpus 404 , which is known by those skilled in the art as a “training corpus”.
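- As a rough sketch of how a priors-style component might be trained, the function below counts sense-tagged examples and turns the counts into a per-word probability distribution. The corpus representation as `(word, sense)` pairs and the name `train_priors` are hypothetical, and the sketch omits the confidence function that the embodiment attaches to each component.

```python
from collections import Counter, defaultdict


def train_priors(tagged_corpus):
    """Estimate P(sense | word) from an iterable of (word, sense) pairs."""
    counts = defaultdict(Counter)
    for word, sense in tagged_corpus:
        counts[word][sense] += 1
    # Relative frequency of each sense among the examples of its word.
    return {
        word: {sense: n / sum(senses.values()) for sense, n in senses.items()}
        for word, senses in counts.items()
    }
```

The statistically most likely sense of a word is then the highest-probability entry in its distribution, for example `max(priors["bank"], key=priors["bank"].get)`.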
- Each WSD component 504 attempts to associate the correct senses to words in text using a particular word sense disambiguation algorithm.
- Each WSD component 504 may run more than one time during the course of a disambiguation.
- the system provides semantic word data or other forms of data in database 30 that each of the algorithms needs in order to perform disambiguation.
- each WSD component 504 has an algorithm that executes a particular type of disambiguation and generates a probability score and a confidence score with its results.
- the WSD components include but are not limited to: priors component 504 A; example memory component 504 B; n-gram component 504 C; concept overlapping component 504 E; heuristic word sense component 504 F; frequent words component 504 G; and dependency component 504 H.
- Each component has a specialized knowledge base associated with its particular operation.
- Each component produces a confidence function as detailed above. Details of each component are described below.
- Each technique is generally known in the art, unless specific aspects are provided herein. It will also be appreciated that not all of the WSD components described in the embodiment may be necessary to accomplish accurate word sense disambiguation, but that some combination of different techniques is required.
- the example memory algorithm identifies whether parts of the text match previously identified recurring sequences of words which have been retained in the list of word sequences. If there is a match, the module assigns the word senses of the sequence to the matching words in the text.
- This list is derived from word pairs from sense-tagged corpus 404 that occurred multiple times, where the senses of the words were identical in every occurrence of the pair. When the sense of at least one word differs between occurrences, the word pair senses are rejected and are not retained in the list.
- the algorithm matches word pairs from the text being processed with the word pairs present in the list maintained by the algorithm. A match is identified when a word pair is found and the sense of one of the two words is already present in the text being processed. When a match is identified, the second word is assigned the sense recorded for it in the matching word pair.
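- The word-pair list and the matching step can be sketched as follows. The function names, the `min_count` parameter (the text says only that a pair occurred “multiple times”), and the dictionary representations are assumptions made for this example.

```python
from collections import defaultdict


def build_pair_memory(tagged_pairs, min_count=2):
    """Keep word pairs seen at least `min_count` times with identical senses.

    `tagged_pairs` is an iterable of ((word1, sense1), (word2, sense2)).
    """
    seen = defaultdict(list)
    for (w1, s1), (w2, s2) in tagged_pairs:
        seen[(w1, w2)].append((s1, s2))
    return {
        pair: senses[0]
        for pair, senses in seen.items()
        # Reject pairs whose senses differ between occurrences.
        if len(senses) >= min_count and len(set(senses)) == 1
    }


def apply_pair_memory(memory, w1, w2, known):
    """Return a new sense assignment if one word's sense is already known.

    `known` maps words in the text to senses assigned so far.
    """
    entry = memory.get((w1, w2))
    if entry is None:
        return {}
    s1, s2 = entry
    if known.get(w1) == s1 and w2 not in known:
        return {w2: s2}
    if known.get(w2) == s2 and w1 not in known:
        return {w1: s1}
    return {}
```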
- the component resource associated with the n-grams algorithm is trained over sense-tagged corpus 404 , and is part of component resources 402 .
- the n-grams component resource includes a statistical model which identifies when an n-gram has been seen sufficiently frequently to become a valid sense predictor.
- predictors from the knowledge base may be triggered by a pattern of words. These predictors may reinforce a common sense or may actually generate multiple possible senses with a given probability distribution.
- frequent words component 504 G has a frequent words algorithm which identifies the senses of the most frequently occurring words.
- the 500 most frequently occurring words account for almost a third of the words encountered in normal text.
- a large number of training examples are available in sense-tagged corpus 404 . Accordingly, it is possible to train specific sense predictors for each word using supervised machine learning methods.
- the machine learning method used to train the component is boosting, and the features used include the words and parts of speech of the words in immediate proximity to the target word to be disambiguated. Other features and machine learning techniques may be used to accomplish the same goal, as will be familiar to those skilled in the art.
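- The kind of features described above (the words and parts of speech immediately around the target word) could be extracted as in the sketch below. The feature names, the `window` parameter, and the lower-casing of words are assumptions; training the boosted classifier itself is omitted.

```python
def context_features(tokens, tags, index, window=2):
    """Collect neighbouring words and part-of-speech tags around tokens[index].

    `tokens` and `tags` are parallel lists; the result is a dict of
    named features suitable as input to a supervised learner.
    """
    features = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the target word itself
        j = index + offset
        if 0 <= j < len(tokens):
            features[f"word[{offset:+d}]"] = tokens[j].lower()
            features[f"pos[{offset:+d}]"] = tags[j]
    return features
```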
- Tokenizer 502 A which splits input text into individual words and symbols. Tokenizer 502 A processes the input text as a sequence of characters and breaks the input text into a series of tokens, where a token is the smallest sequence of characters that can form a word.
- Morpher 502 C which identifies a lemma, i.e. a base form, of a word.
- the lemma defines the fine sense and coarse sense inventories of the word. For example, for the inflected word “jumping” the morpher identifies its base form “jump”.
- Parser 502 D which identifies relationships between the words in the input text. Parser 502 D identifies grammatical structures and phrases in the input text. The result of this operation is a parse tree, which is a concept very well known in the field. Some relationships include “subject of the verb” and “object of the verb”. From the phrases, a list of syntactic and semantic dependencies can later be extracted. Parser 502 D also produces part of speech tags that are used to update the part of speech distribution. Parser information is also used to select possible compounds.
- Dependency extractor 502 J uses the parse tree to generate a list of syntactic and semantic dependencies, which will be familiar to those skilled in the art.
- the semantic dependencies are used by a number of other components to enhance their models.
- Dependencies are extracted in the following manner:
- Parser 502 D is used to generate a syntactic parse tree, including syntactic heads for each phrase.
- Named-entity recogniser 502 E identifies known proper nouns such as “Albert Einstein” or “International Business Machines Incorporated” and other multi-word proper nouns.
- Named-entity tagger 502 E collects tokens that form a named entity into groups and classifies the group into categories. Such categories include: a person, location, artefact, as will be familiar to those skilled in the art.
- Named-entity categories are determined by a Hidden Markov Model (HMM) that is trained on parts of the sense-tagged corpus 404 in which the named entities have been marked. For example in the text fragment “Today Coca-Cola announced . . . ”, the HMM will categorize “Coca-Cola” as a company (instead of an artefact) because of analysis of the surrounding words.
- Part-of-speech tagger 502 F assigns functional roles such as “noun” and “verb” to the words in the input text.
- Part of speech tagger 502 F identifies a part of speech, which can be mapped to the broad parts of speech (noun, verb, adverb, adjective) relevant to disambiguating between word senses.
- Part-of-speech tagger 502 F utilizes a trigram-based Hidden Markov Model (HMM) trained on a portion of sense-tagged corpus 404 which has been annotated with part of speech information.
- the merger can optionally be run twice, once on the coarse senses and a second time over the group of fine senses associated with each coarse sense.
- ICS 500 then performs ambiguity reduction using ambiguity eliminator 500 C.
- the embodiment performs this process based upon the merged distribution and confidence output by merging module 500 B.
- If a sense in the merged distribution has a very high probability and high confidence, it is deemed to be the correct sense and all other senses can be removed. For example, if a merged result indicated that the disambiguation for “java” was “coffee” with 98% probability and its confidence score was 90%, then all other senses would be excluded as being possible, and “coffee” would be the sole remaining sense.
- Control file 516 sets probability and confidence score thresholds for this decision point.
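- The elimination rule in the “java” example can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function name, the threshold values (0.95 and 0.85) and the choice to renormalise the surviving sense to probability 1.0 are all assumptions made for the example.

```python
from typing import Dict


def eliminate_ambiguity(
    distribution: Dict[str, float],
    confidence: float,
    prob_threshold: float,
    conf_threshold: float,
) -> Dict[str, float]:
    """Keep only the top sense when it clears both thresholds; otherwise keep all senses."""
    if not distribution:
        return {}
    best_sense = max(distribution, key=distribution.get)
    if distribution[best_sense] >= prob_threshold and confidence >= conf_threshold:
        # Assumption: the sole remaining sense is renormalised to probability 1.0.
        return {best_sense: 1.0}
    # Thresholds not met: leave the distribution untouched for a later iteration.
    return dict(distribution)


# Merged result for "java": 98% probability for "coffee", 90% confidence.
merged = {"coffee": 0.98, "island": 0.01, "programming language": 0.01}
remaining = eliminate_ambiguity(
    merged, confidence=0.90, prob_threshold=0.95, conf_threshold=0.85
)
```

With lower confidence (for example 0.50) the same call would return the full distribution unchanged, leaving the word ambiguous.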
- One or more iterations of steps 4, 5 and 6 may optionally be performed. It will be appreciated that the results of each subsequent iteration will likely differ from those of previous iteration(s), as WSD components 504 themselves do not predict senses which were eliminated after previous iterations. WSD components 504 make use of the reduced ambiguity as compared to the previous iteration to produce a result with a more accurate distribution and/or higher confidence score.
- Control file 516 identifies which set of WSD components 504 is applied on each iteration. It will be appreciated that several iterations may be performed until a sufficient number of words have been disambiguated or until the number of iterations specified in the control file 516 has been completed.
- Consolidated merged results are then searched to identify probability and confidence thresholds of merged results that optimize a number of correct answers with an accuracy equal to or above the target accuracy for the iteration. This is preferably performed using the method of step 2.
- control file optimizer 514 can be provided with a maximum number of iterations.
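- The threshold search described above can be illustrated with a small sketch. It assumes each merged result is summarised as a record of (top-sense probability, confidence, whether the top sense was correct), and it uses an exhaustive search over the values seen in the data. Both the record layout and the exhaustive search are assumptions for illustration; the text does not specify the search procedure.

```python
from typing import List, Optional, Tuple

# (top-sense probability, confidence score, was the top sense correct?)
Record = Tuple[float, float, bool]


def find_thresholds(
    records: List[Record], target_accuracy: float
) -> Optional[Tuple[float, float, int]]:
    """Return (prob_threshold, conf_threshold, correct_count) that maximises the
    number of correct answers while accuracy stays at or above the target."""
    prob_candidates = sorted({p for p, _, _ in records})
    conf_candidates = sorted({c for _, c, _ in records})
    best: Optional[Tuple[float, float, int]] = None
    for pt in prob_candidates:
        for ct in conf_candidates:
            # Results that would be accepted under this pair of thresholds.
            accepted = [ok for p, c, ok in records if p >= pt and c >= ct]
            if not accepted:
                continue
            correct = sum(accepted)
            accuracy = correct / len(accepted)
            if accuracy >= target_accuracy and (best is None or correct > best[2]):
                best = (pt, ct, correct)
    return best


# Hypothetical merged results from a held-out portion of a sense-tagged corpus.
records = [
    (0.98, 0.90, True),
    (0.95, 0.80, True),
    (0.70, 0.85, False),
    (0.60, 0.40, True),
    (0.55, 0.30, False),
]
thresholds = find_thresholds(records, target_accuracy=0.9)
```

The function returns None when no threshold pair reaches the target accuracy, which a caller could treat as a signal to skip elimination for that iteration.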
- the embodiment also provides a system and method for automatically providing a sense-tagged corpus 404 or for automatically increasing the size of sense-tagged corpus 404 for the training of WSD components 504 .
- the first is the component training process 960 .
- This process uses sense tagged text 404 or untagged text 900 as an input to the WSD component training module 906 in order to generate improved component resources for the WSD components 504 .
- the second process is the corpus generation process 950 . This process processes untagged text 900 or partially tagged text 902 through the WSD module 32 .
- a key to this process is the use of a probability distribution and confidence score.
- a confidence score is not available and inaccurate results cannot be discarded.
- the WSD components 504 are less accurate after retraining on the enlarged sense tagged corpus 404 than they were before, and such a process is not practically useful.
- the embodiment eliminates this deficiency in the prior art system and allows the training data to be enlarged with high quality tagged text. It will be appreciated that this process can run multiple times, and may create a self-reinforcing loop that increases both the size of the sense tagged corpus 404 and the accuracy of the WSD system 32 .
- the quality of the training data extracted (due to the use of a probability distribution and a confidence score) and the potentially self-reinforcing nature of the bootstrapping process are features of the embodiment.
- a number of documents are disambiguated by a highly accurate method, such as manually by a skilled human. Use of these documents provides “seeding resources” to the system, which are added to the sense tagged corpus 404 .
- a large quantity of documents from the domain are automatically disambiguated and added to the sense tagged corpus 404 using the corpus tagging process 950 .
- the system allows a component to have multiple passes over the text being disambiguated, which allows it to use high-accuracy disambiguations (or reductions in ambiguity) provided by any of the other components, to improve its accuracy in disambiguating the remaining words. For example, when faced with the words “cup” and “green” in one sentence, a particular WSD component 504 may not be able to distinguish between the “golf cup” sense of “cup” and the more mundane “drinking vessel” sense. If another WSD component 504 is able to disambiguate the word “green” into its “golf green” sense, then the first WSD component 504 may now be able to correctly disambiguate “cup” into “golf cup”. In this way, WSD components 504 interact with each other to arrive at more likely senses.
- WSD module 32 includes a method for merging an optimal “recipe” of components and parameter values. This merged set is optimal in the sense that it provides the parameters which utilise multiple iterations of multiple components to obtain the maximum possible accuracy.
- WSD module 32 can provide sense distributions to the components which favour those terms in the legal domain.
- the embodiment uses metadata.
- the title of the document can be used to aid in the disambiguation of the document's text, by allowing the words in the title to carry disproportionate weight towards the disambiguation.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a system and method for processing natural language text utilizing disambiguation components to identify a disambiguated sense for the text. For the method, it comprises applying a selection of the components to the text to identify a local disambiguated sense for the text. Each component provides a local disambiguated sense of the text with a confidence score and a probability score. The disambiguated sense is determined utilizing a selection of local disambiguated senses. The invention also relates to a system and method for generating sense-tagged text. For the method, it comprises steps of: disambiguating a quantity of documents utilizing a disambiguation component; generating a confidence score and a probability score for a sense identified for a word provided by the component; if the confidence score for the sense for the word is below a set threshold, the sense is ignored; and if the confidence score for the sense for the word is above the set threshold, the sense is added to the sense-tagged text.
Description
- This application claims the benefit of U.S. Provisional Application No. 60/496,681 filed on Aug. 21, 2003.
- The present invention relates to disambiguating natural language text, such as queries to an Internet search engine, web pages and other electronic documents, and disambiguating textual output of a speech to text system.
- Word sense disambiguation is the process of determining the meaning of words in text. For example, the word “bank” can mean a financial institution, an embankment, or an aerial manoeuvre (or several other meanings). When humans listen to or read naturally expressed language, they automatically select the correct meaning of each word based on the context in which it is expressed. A word sense disambiguator is a computer-based system for accomplishing this task, and is a critical component of technology for making naturally expressed language understandable to computers.
- A word sense disambiguator is used in applications which require or which can be improved by making use of the meaning of the words in the text. Such applications include but are not limited to: Internet search and other information retrieval applications; document classification; machine translation; and speech recognition.
- It is accepted by those skilled in the art that, although humans perform word sense disambiguation effortlessly, and this is a critical step in understanding naturally expressed language, no system has yet been developed to accomplish word sense disambiguation of general texts to an accuracy sufficient to permit deployment in such applications. Even current advanced word sense disambiguation systems may have an accuracy of only approximately 33%, thereby making their results too inaccurate for many applications.
- There is a need for a word sense disambiguation system and method which address deficiencies in the prior art.
- In a first aspect, a method of processing natural language text utilizing disambiguation components to identify a disambiguated sense or senses for the text is provided. The method comprises applying a selection of the components to the text to identify a local disambiguated sense for the text. Each component provides a local disambiguated sense of the text with a confidence score and a probability score. The disambiguated sense is determined utilizing a selection of local disambiguated senses.
- In the method, the components are sequentially activated and controlled by a central module.
- The method may further comprise identifying a second selection of components; and applying the second selection to the text to refine the disambiguated sense (or senses). Each component in the second selection provides a second local disambiguated sense (or senses) of the text with a second confidence score and a second probability score. The disambiguated sense (or senses) is determined utilizing a selection of the second local disambiguated senses.
- In the method, after applying the selection to the text and prior to applying the second selection to refine the disambiguated sense (or senses), the further step of eliminating a sense from the disambiguated sense having a confidence score below a threshold may be executed.
- In the method, when a particular component is present in the selection and the second selection, its confidence and probability scores may be adjusted when applying the second selection to the text.
- In the method, the selection and the second selection of components may be identical.
- In the method, the confidence score of each component may be generated by a confidence function utilizing a trait of each component.
- After applying the selection of components to the text to identify a local disambiguated sense (or senses) for the text, for each component of the selection, the method may generate a probability distribution for its disambiguated sense (or senses). Further the method may merge all probability distributions for the selection.
- In the method, the selection of components may disambiguate the text using a context of the text, which may be identified from one of the following contexts: domain; user history; and specified context.
- After applying the selection to the text, the method may refine a knowledge base of each component in the selection utilizing the disambiguated sense (or senses).
- In the method at least one of the selection of components provides results only for coarse senses.
- In the method, results of the selection of components may be combined into one result utilizing a merging algorithm.
- In the method, the process may utilize a first stage comprising merging of coarse senses, and a second stage comprising merging of fine senses within each coarse sense grouping.
- In the method, the merging process may utilize a weighted sum of probability distributions, and the weights may be the confidence score associated with the distribution. Further, the merging process may comprise a weighted average of confidence scores, and the weights are again the confidence scores associated with the distribution.
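- The merging arithmetic just described can be written out as a short sketch. Each component result is assumed to be a pair of (probability distribution over senses, confidence score); the function name, the final renormalisation of the merged distribution and the sample numbers are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

ComponentResult = Tuple[Dict[str, float], float]  # (sense -> probability, confidence)


def merge_results(results: List[ComponentResult]) -> Tuple[Dict[str, float], float]:
    """Merge component outputs using their confidence scores as weights."""
    total_conf = sum(conf for _, conf in results)
    if total_conf == 0:
        return {}, 0.0

    # Weighted sum of probability distributions; weights are the confidences.
    merged: Dict[str, float] = {}
    for dist, conf in results:
        for sense, prob in dist.items():
            merged[sense] = merged.get(sense, 0.0) + conf * prob

    # Assumption: renormalise so the merged distribution sums to one.
    norm = sum(merged.values())
    if norm > 0:
        merged = {sense: value / norm for sense, value in merged.items()}

    # Weighted average of confidence scores, weighted by the same confidences.
    merged_conf = sum(conf * conf for _, conf in results) / total_conf
    return merged, merged_conf


# Two hypothetical components voting on the sense of "java".
component_a = ({"coffee": 0.9, "island": 0.1}, 0.8)
component_b = ({"coffee": 0.5, "island": 0.5}, 0.2)
merged_dist, merged_conf = merge_results([component_a, component_b])
```

Because the confidences act as weights, the confident component A dominates: the merged distribution stays close to A's, and the merged confidence is pulled toward A's score rather than the plain mean of the two.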
- In another aspect, a method of processing natural language text utilizing disambiguation components to identify a disambiguated sense for the text is provided. The method comprises steps of: defining an accuracy target for disambiguation; and applying a selection of components from the plurality of disambiguation components to meet the accuracy target.
- In another aspect, a method of processing natural language text utilizing disambiguation components to identify a disambiguated sense for the text is provided. The method comprises steps of: identifying a set of senses for the text; and identifying and removing an unwanted sense from the set.
- In another aspect a method of processing natural language text utilizing disambiguation components to identify a disambiguated sense for the text is provided. The method comprises steps of: identifying a set of senses for the text; and identifying and removing an amount of ambiguity from the set of senses.
- In another aspect, a method of generating sense-tagged text is provided. The method comprises steps of: disambiguating a quantity of documents utilizing a disambiguation component; generating a confidence score and a probability score for a sense identified for a word provided by the component; if the confidence score for the sense for the word is below a set threshold, the sense is ignored; and if the confidence score for the sense for the word is above the set threshold, the sense is added to the sense-tagged text.
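- The confidence filter in this aspect can be sketched as below. The tagger interface (a callable returning a sense, a probability and a confidence for the word at a given position), the toy lookup table and the 0.8 threshold are hypothetical. The text does not say what happens when the confidence equals the threshold; this sketch treats that case as ignored.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical tagger interface: (words, index) -> (sense, probability, confidence)
Tagger = Callable[[List[str], int], Tuple[Optional[str], float, float]]


def build_sense_tagged_text(
    words: List[str], tag_word: Tagger, conf_threshold: float
) -> List[Tuple[str, Optional[str]]]:
    """Tag each word, keeping a sense only when its confidence exceeds the threshold."""
    tagged: List[Tuple[str, Optional[str]]] = []
    for index, word in enumerate(words):
        sense, _probability, confidence = tag_word(words, index)
        if confidence > conf_threshold:
            tagged.append((word, sense))  # sense is added to the sense-tagged text
        else:
            tagged.append((word, None))   # low-confidence sense is ignored
    return tagged


def toy_tagger(words: List[str], index: int) -> Tuple[Optional[str], float, float]:
    """Stand-in for a real disambiguation component, for illustration only."""
    table = {
        "java": ("coffee", 0.98, 0.90),
        "bank": ("financial institution", 0.55, 0.30),
    }
    return table.get(words[index], (None, 0.0, 0.0))


tagged_text = build_sense_tagged_text(["java", "bank"], toy_tagger, conf_threshold=0.8)
```

Words whose senses are ignored remain in the output untagged, so the result is partially tagged text that could be passed through the process again.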
- In other aspects various combinations of sets and subsets of the above aspects are provided.
- The foregoing and other aspects of the invention will become more apparent from the following description of specific embodiments thereof and the accompanying drawings which illustrate, by way of example only, the principles of the invention. In the drawings, where like elements feature like reference numerals (and wherein individual elements bear unique alphabetical suffixes):
- FIG. 1 is a schematic representation of words and word senses associated with an embodiment of a text processing system;
- FIG. 2 is a schematic representation of representative semantic relationships of words for use with the system of FIG. 1;
- FIG. 3 is a schematic representation of an embodiment of a text processing system providing word sense disambiguation;
- FIG. 4 is a block diagram of a word sense disambiguator module, control file optimizer, and database elements of the text processing system of FIG. 3;
- FIG. 5 is a diagram of data structures used to represent the semantic relationships of FIG. 2 for the system of FIG. 3;
- FIG. 6 is a flow diagram of a text processing process performed by the embodiment of FIG. 3;
- FIG. 7 is a flow diagram of a process for a disambiguating step of the text processing process of FIG. 6;
- FIG. 8 is a data flow diagram for the control file optimizer of FIG. 4; and
- FIG. 9 is a flow diagram of a bootstrapping process associated with the text processing system of FIG. 3.
- The description which follows, and the embodiments described therein, are provided by way of illustration of an example, or examples, of particular embodiments of the principles of the present invention. These examples are provided for the purposes of explanation, and not limitation, of those principles and of the invention. In the description which follows, like parts are marked throughout the specification and the drawings with the same respective reference numerals.
- The following terms will be used in the following description, and have the meanings shown below:
- Computer readable storage medium: hardware for storing instructions or data for a computer. For example, magnetic disks, magnetic tape, optically readable medium such as CD ROMs, and semi-conductor memory such as PCMCIA cards. In each case, the medium may take the form of a portable item such as a small disk, floppy diskette, cassette, or it may take the form of a relatively large or immobile item such as hard disk drive, solid state memory card, or RAM.
- Information: documents, web pages, emails, image descriptions, transcripts, stored text etc. that contain searchable content of interest to users, for example, contents related to news articles, news group messages, web logs, etc.
- Module: a software or hardware component that performs certain steps and/or processes; may be implemented in software running on a general-purpose processor.
- Natural language: a formulation of words intended to be understood by a person rather than a machine or computer.
- Network: an interconnected system of devices configured to communicate over a communication channel using particular protocols. This could be a local area network, a wide area network, the Internet, or the like operating over communication lines or through wireless transmissions.
- Query: a list of keywords indicative of desired search results; may utilize Boolean operators (e.g. “AND”, “OR”); may be expressed in natural language.
- Text: textual information represented in its usual form within a computer or associated storage device. Unless otherwise specified, it is assumed to be expressed in natural language.
- Search engine: a hardware or software component to provide search results regarding information of interest to a user in response to text from the user. The search results may be ranked and/or sorted by relevance.
- Sense-tagged text: text in which some or all of the words have been marked with a word sense or senses signifying the meaning of the word in the text.
- Sense-tagged corpus: a collection of sense-tagged text for which the senses and possibly linguistic information such as part of speech tags of some or all words have been marked. The accuracy of the specification of the senses and other linguistic information must be similar to that which would be achieved by a human lexicographer. Thus, if sense-tagged text is generated by a machine, then the accuracy of word senses that are marked by the machine must be similar to that of a human lexicographer performing word sense disambiguation.
- The embodiment relates to natural language processing, and in particular to processing natural language text as a step in an application which requires or can be improved by making use of the meaning of the words in the text. This process is known generally as word sense disambiguation. Applications include but are not limited to:
- 1. Internet search and other information retrieval applications; both in disambiguating queries to better specify the user's request, and in disambiguating documents to select more relevant results. When working with large sets of data, such as a database of documents or web pages on the Internet, the volume of available data can make it difficult to find information of relevance. Various methods of searching are used in an attempt to find relevant information in such stores of information. Some of the best known systems are Internet search engines, such as Yahoo (trademark) and Google (trademark) which allow users to perform keyword-based searches. These searches typically involve matching keywords entered by the user with keywords in an index of web pages. One reason for some difficulties encountered in performing such searches is the ambiguity of words used in natural language. Specifically, difficulties are often encountered because one word can have several meanings, and each meaning can have multiple synonyms or paraphrases. For example, “Java bean” is matched by a search engine to documents which simply contain these two words. By disambiguating “Java bean” to mean “coffee bean” instead of the “Java Bean” computer technology by Sun Microsystems, a disambiguator would allow documents about this computer technology to be excluded from the results, and would similarly allow documents concerning coffee beans to be included in the results.
- 2. Document classification; in allowing documents to be clustered based upon precise criteria of meaning as opposed to their textual content. For example, consider an application which automatically sorts email messages into folders, each pertaining to a topic specified by a user. One such folder might be entitled “programming tools”, and contain any emails that mention any form of “programming tool”. The use of word sense disambiguation in this application would allow emails that contain related information, but do not contain words matching the title of the folder, to be accurately classified as belonging in the folder or not. For example, an email containing the words “Java object” could be placed in the folder because it contains a sense of “Java” meaning a programming language, whereas an email containing the terms “Java coffee” or “tools to use in designing a conference program” could be rejected because, in the first case, the word “Java” is disambiguated to mean a type of coffee, and, in the second case, the word “program” refers to an event, which is a meaning not associated with computer programming. Such an effect could optionally be achieved by giving the senses present in a disambiguated email to a machine learning algorithm, rather than just providing the words as is currently done by state-of-the-art applications. The accuracy of the classification would increase as a result, and the application would appear more intelligent and be more useful to the user.
- 3. Machine translation; in knowing the precise meanings of words before they are translated, so that the correct translation can be provided for words with multiple possible translations. For example, the word “bank” in English may translate into the French “banque” if it means “financial institution”, but “rive” if it means “river bank”. In order to perform an accurate translation of such a word, it is necessary to select a meaning. It will be recognised by those skilled in the art that a large percentage of the errors in prior art machine translation systems are made due to the selection of the wrong senses of words being translated. The addition of word sense disambiguation to such a system would improve accuracy by reducing or eliminating the errors of this type that are made by today's state-of-the-art systems.
- 4. Speech recognition; in allowing utterances with words or combinations of words that sound the same but are written differently to be correctly interpreted. Most speech recognition systems include a recognition component that analyses the phonetics of a phrase and outputs several possible sequences of words that could have been pronounced. For example, “I asked to people” and “I asked two people” are pronounced the same, and would both be output as possible sequences of words by such a recognition component. Most speech recognition systems then include a module which selects which of the possible word sequences is the most probable, and outputs this sequence as the result. This module typically operates by selecting the word sequence that matches most closely with word sequences that are known to be uttered. Word sense disambiguation could improve the operation of such a module by selecting the word sequence that leads to the most consistent interpretation. For example, consider a speech recognition system which generated two alternative interpretations for an utterance: “I scream in flat endings” or “Ice cream is fattening”. A word sense disambiguator would select between these two interpretations which sound the same, in exactly the same manner as it would disambiguate between two possible interpretations in text which are spelled the same.
- 5. Text to speech (speech synthesis), in allowing words with multiple pronunciations to be pronounced correctly. For example, “I saw her sow the seeds” and “The old sow was slaughtered for bacon” both contain the word “sow”, which is pronounced differently in each sentence. A text to speech application needs to know which interpretation applies to each word in order to correctly utter each sentence. A word sense disambiguation module could determine that the sense of “sow” in the first sentence was the verb “to sow” and in the second sentence was “a female hog”. The application would then have the information necessary to pronounce each sentence correctly.
- Before describing specific aspects of the embodiment, some background on relationships between words and their word senses is provided. Referring to FIG. 1, the relationship between words and word senses is shown generally by the reference 100. As seen in this example, certain words have multiple senses. Among many other possibilities, the word “bank” may represent: (i) a noun referring to a financial institution; (ii) a noun referring to a river bank; or (iii) a verb referring to an action to save money. Similarly, the word “interest” has multiple meanings including: (i) a noun representing an amount of money payable relating to an outstanding investment or loan; (ii) a noun representing special attention given to something; or (iii) a noun representing a legal right in something.
- The embodiment assigns senses to words. In particular, the embodiment defines two senses of words: coarse and fine. A fine sense defines a precise meaning and usage of a word. Each fine sense applies within a particular part of speech category (noun, verb, adjective or adverb). A coarse sense defines a broad concept associated with a word, and may be associated with more than one part of speech category. Each coarse sense contains one or more fine senses, and each fine sense belongs to one coarse sense. A word can have more than one fine and more than one coarse sense. A fine sense is classified under the coarse sense because the fine sense of the word matches the generic concept associated with the coarse sense definition. Table 1 illustrates the relationship between a word, its coarse senses and its fine senses. As an example to illustrate the distinction between fine and coarse senses, the fine senses for the word “bank” respect the distinction between the verb “to bank” as in “to bank a plane” and the noun “a bank” as in “the pilot performed a bank”, whereas these two senses are grouped together under the more general coarse sense “Manoeuvre”.
TABLE 1
Word | Coarse Sense | Fine Senses
Bank | Financial Institutions | Financial institution (Noun); Building where banking is done (Noun); Perform Business with a Bank (Verb)
Bank | Ground formations | Land beside water (Noun); Ridge of earth (Noun); Slope in road (Noun)
Bank | Manoeuvre | Flight manoeuvre (Noun); Tip laterally (Verb)
Bank | Gambling | Funds held by a gambling house (Noun); Act as a banker in gambling (Verb)
- Referring to FIG. 2, example semantic relationships between word senses are shown. These semantic relationships are precisely defined types of associations between two words based on meaning. The relationships are between word senses, which are specific meanings of words. For example, a bank (in the sense of a river bank) is a type of terrain and a bluff (in the sense of a noun meaning a land formation) is also a type of terrain. A bank (in the sense of river bank) is a type of incline (in the sense of grade of the land). A bank in the sense of a financial institution is synonymous with a “banking company” or a “banking concern.” A bank is also a type of financial institution, which is in turn a type of business. A bank (in the sense of financial institution) is related to interest (in the sense of money paid on investments) and is also related to a loan (in the sense of borrowed money) by the generally understood fact that banks pay interest on deposits and charge interest on loans.
- It will be understood that there are many other types of semantic relationships that may be used. Although known in the art, following are some examples of semantic relationships between words: Words which are in synonymy are words which are synonyms to each other. A hypernym is a relationship where one word represents a whole class of specific instances. For example “transportation” is a hypernym for a class of words including “train”, “chariot”, “dogsled” and “car”, as these words provide specific instances of the class. Meanwhile, a hyponym is a relationship where one word is a member of a class of instances. From the previous list, “train” is a hyponym of the class “transportation”. A meronym is a relationship where one word is a constituent part of, the substance of, or a member of something. For example, for the relationship between “leg” and “knee”, “knee” is a meronym to “leg”, as a knee is a constituent part of a leg. Meanwhile, a holonym is a relationship where one word is the whole of which a meronym names a part. From the previous example, “leg” is a holonym to “knee”. Any semantic relationships that fall into these categories may be used. In addition, any known semantic relationships that indicate specific semantic and syntactic relationships between word senses may be used.
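- As an illustration only, the coarse and fine sense inventory of Table 1 and the relationship types described above could be held in structures like the following. The class names, field names and the triple-based relation store are hypothetical; they are not the data structures of FIG. 5.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class FineSense:
    gloss: str
    pos: str  # "noun", "verb", "adjective" or "adverb"


@dataclass
class CoarseSense:
    name: str
    fine_senses: List[FineSense] = field(default_factory=list)


# Inventory for "bank", following Table 1: each fine sense sits under one coarse sense.
inventory: Dict[str, List[CoarseSense]] = {
    "bank": [
        CoarseSense("Financial institutions", [
            FineSense("financial institution", "noun"),
            FineSense("building where banking is done", "noun"),
            FineSense("perform business with a bank", "verb"),
        ]),
        CoarseSense("Ground formations", [
            FineSense("land beside water", "noun"),
            FineSense("ridge of earth", "noun"),
            FineSense("slope in road", "noun"),
        ]),
        CoarseSense("Manoeuvre", [
            FineSense("flight manoeuvre", "noun"),
            FineSense("tip laterally", "verb"),
        ]),
        CoarseSense("Gambling", [
            FineSense("funds held by a gambling house", "noun"),
            FineSense("act as a banker in gambling", "verb"),
        ]),
    ]
}

# Each relationship type paired with its inverse.
INVERSE = {
    "hypernym": "hyponym", "hyponym": "hypernym",
    "meronym": "holonym", "holonym": "meronym",
    "synonym": "synonym",
}
relations: Set[Tuple[str, str, str]] = set()


def add_relation(a: str, relation: str, b: str) -> None:
    """Record that a is a <relation> of b, together with the inverse link."""
    relations.add((a, relation, b))
    relations.add((b, INVERSE[relation], a))


def parts_of_speech(coarse: CoarseSense) -> Set[str]:
    """A coarse sense may span more than one part of speech category."""
    return {fine.pos for fine in coarse.fine_senses}


add_relation("transportation", "hypernym", "train")
add_relation("knee", "meronym", "leg")
manoeuvre_pos = parts_of_speech(inventory["bank"][2])
```

Storing the inverse link at insertion time keeps lookups symmetric: asking for the holonyms of “knee” or the meronyms of “leg” reads the same set.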
- It will be recognized that use of word sense disambiguation in a search engine addresses the problem of retrieval relevance. Furthermore, users often express text as they would express language. However, since the same meaning can be described in many different ways, users encounter difficulties when they do not express text in the same specific manner in which the relevant information was initially classified.
- For example, if the user is seeking information about “Java” the island, and is interested in “holidays” on Java (island), the user would not retrieve useful documents that had been categorized using the keywords “Java” and “vacation”. The embodiment addresses this issue. It has been recognized that deriving precise synonyms and sub-concepts for each key term in a naturally expressed text increases the volume of relevant retrievals. If this were performed using a thesaurus without word sense disambiguation, the result could be worsened. For example, semantically expanding the word “Java” without first establishing its precise meaning would yield a massive and unwieldy result set with results potentially selected based on word senses as diverse as “Indonesia” and “computer programming”. The embodiment provides systems and methods of interpreting the meaning of each word, which is then semantically expanded to produce a comprehensive and simultaneously more precise result set.
- Referring to
FIG. 3 , text processing system associated with an embodiment is shown generally atreference 10. The system takes as input a text file 12. The text file 12 contains natural language text, such as a query, a document, the output of a speech to text system, or any source of natural language text in electronic form. - The system includes
text processing engine 20. Thetext processing engine 20 may be implemented as dedicated hardware, or as software operating on a general purpose processor. The text processing engine may also operate on a network. - The
text processing engine 20 generally includes aprocessor 22. The engine may also be connected, either directly thereto, or indirectly over a network or other such communication means, to adisplay 24, aninterface 26, and a computerreadable storage medium 28. Theprocessor 22 is coupled to thedisplay 24 and to theinterface 26, which may comprise user input devices such as a keyboard, mouse, or other suitable devices. If thedisplay 24 is touch sensitive, then thedisplay 24 itself can be employed as theinterface 26. The computerreadable storage medium 28 is coupled to theprocessor 22 for providing instructions to theprocessor 22 to instruct and/or configureprocessor 22 to perform steps or algorithms related to the operation oftext processing engine 20, as further explained below. Portions or all of the computerreadable storage medium 28 may be physically located outside of thetext processing engine 20 to accommodate, for example, very large amounts of storage. Persons skilled in the art will appreciate that various forms of text processing engines can be used with the present invention. - Optionally, and for greater computational speed, the
text processing engine 20 may include multiple processors operating in parallel or any other multi-processing arrangement. Such use of multiple processors may enable the text processing engine 20 to divide tasks among various processors. Furthermore, the multiple processors need not be physically located in the same place, but rather may be geographically separated and interconnected over a network as will be understood by those skilled in the art. -
Text processing engine 20 includes a database 30 for storing a knowledge base and component linguistic resources used by the text processing engine 20. The database 30 stores the information in a structured format to allow computationally efficient storage and retrieval as will be understood by those skilled in the art. The database 30 may be updated by adding additional keyword senses or by referencing existing keyword senses to additional documents. The database 30 may be divided and stored in multiple locations for greater efficiency. - A central component of
text processing engine 20 is word sense disambiguation (WSD) module 32, which processes words from an input document or text into word senses. A word sense is a given interpretation ascribed to a word, in view of the context of its usage and its neighbouring words. For example, the word “book” in the sentence “Book me a flight to New York” is ambiguous, because “book” can be a noun or a verb, each with multiple potential meanings. The result of processing of the words by the WSD module 32 is a disambiguated document or disambiguated text comprising word senses rather than ambiguous or uninterpreted words. WSD module 32 distinguishes between word senses for each word in the document or text. WSD module 32 identifies which specific meaning of the word is the intended meaning using a wide range of interlinked linguistic techniques to analyze the syntax (e.g. part of speech, grammatical relations) and semantics (e.g. logical relations) in context. It may use a knowledge base of word senses which expresses explicit semantic relationships between word senses to assist in performing the disambiguation. - Referring to
FIG. 4, further detail on database 30 is provided. - To assist in disambiguating words into word senses, the embodiment utilizes
knowledge base 400 of word senses capturing relationships of words as described above for FIG. 2. Knowledge base 400 is associated with database 30 and is accessed to assist WSD module 32 in performing word sense disambiguation as well as to provide the inventory of possible senses of words in a text. While prior art dictionaries, and lexical databases such as WordNet (trademark), have been used in systems, knowledge base 400 provides an enhanced inventory of words, word senses, and semantic relations. For example, while prior art dictionaries contain only definitions of words for each of their word senses, knowledge base 400 also contains information on relations between word senses. These relations include the definition of the sense and the associated part of speech (noun, verb, etc.), fine sense synonyms, antonyms, hyponyms, meronyms, pertainyms, similar adjectives relations and other relationships known in the art. Knowledge base 400 also contains additional semantic relations not contained in other prior art lexical databases: (i) additional relations between word senses, such as the grouping of fine senses into coarse senses, “instance of” relations, classification relations, and inflectional and derivational morphological relations; (ii) corrections of errors in data obtained from published sources; and (iii) additional words, word senses, and relations that are not present in other prior art knowledge bases. - In addition to containing an inventory of words and word senses (fine and coarse) for each word and concepts, as well as over 40 specific types of semantic links between them,
database 30 also provides a repository for component resources 402 used by linguistic components 502 and WSD components 504. Some component resources are shared by several components while other resources are specific to a given component. In the embodiment, the component resources include: general models, domain specific models, user models and session models. General models contain general domain information, such as a probability distribution of senses for each word for any text of unknown domain. They are trained using data from several domains. WSD components 504 and linguistic components 502 utilize these resources as necessary. For example, a component may use these resources on all requests or may use them only when the request cannot be completed using more specific models. Domain-specific models are trained from domain specific information. They are useful for modelling usage of specialized meanings of words in various domains. For example, the word “Java” has different meanings for travel agents and computer programmers. These resources allow the building of statistical models for each group. User models are trained for a specific user. The models may be given and may be learnt over time. The user models can be constructed by the application or automatically by the word sense disambiguation system. Session models provide information regarding multiple requests grouped within a session. For example, several word sense disambiguation requests may be related to the same topic during an information retrieval session using a search engine. The session models can be constructed by the application or automatically by WSD module 32. -
Database 30 also contains sense-tagged corpus 404. Sense-tagged corpus 404 may optionally be split up into sub-units used for training components, training confidence functions for components and training the control file optimizer, as described further below. - Referring to
FIG. 5, further detail on knowledge base 400 is provided. In the embodiment, knowledge base 400 is a generalized graph data structure and is implemented as a table of nodes 402 and a table of edge relations 404 associating two nodes together. Each is described in turn. Annotations of arbitrary data types may be attached to each node or edge. In other embodiments, other data structures, such as linked lists, may be used to implement knowledge base 400. - In table 402, each node is an element in a row of table 402. In the embodiment, a record for each node has as many as the following fields: an
ID field 406, a type field 408 and an annotation field 410. There are two types of entries in table 402: a word and a word sense definition. For example, the word “bank” in ID field 406A is identified as a word by the “word” entry in type field 408A. Also, exemplary table 402 provides several definitions of words. To catalog the definitions and to distinguish definition entries in table 402 from word entries, labels are used to identify definition entries. For example, the entry in ID field 406B is labeled “LABEL001”. A corresponding definition in type field 408B identifies the label as a “fine sense” word relationship. A corresponding entry in annotation field 410B identifies the label as “Noun. A financial institution”. As such, a “bank” can now be linked to this word sense definition. Furthermore, an entry for the word “brokerage” may also be linked to this word sense definition. Alternate embodiments may use a common word with a suffix attached to it, in order to facilitate recognition of the word sense definition. For example, an alternative label could be “bank/n1”, where the “/n1” suffix identifies the label as a noun (n) and the first meaning for that noun. It will be appreciated that other label variations may be used. Other identifiers to identify adjectives, adverbs and others may be used. The entry in type field 408 identifies the type associated with the word. There are several types available for a word, including: word, fine sense and coarse sense. Other types may also be provided. In the embodiment, when an instance of a word has a fine sense, that instance also has an entry in annotation field 410 to provide further particulars on that instance of the word. - Edge/Relations table 404 contains records indicating relationships between two entries in nodes table 402. Table 404 has the following entries: From
node ID column 412, to node ID column 414, type column 416 and annotation column 418. Column 416 identifies the type of relation that links the two entries. A record has the ID of the origin and the destination node, the type of the relation, and may have annotations based on the type. Types of relations include “root word to word”, “word to fine sense”, “word to coarse sense”, “coarse to fine sense”, “derivation”, “hyponym”, “category”, “pertainym”, “similar”, “has part”. Other relations may also be tracked therein. Entries in annotation column 418 provide a (numeric) key to uniquely identify an edge type going from a word node to either a coarse node or fine node for a given part-of-speech. - Referring to
FIG. 4, further detail on WSD module 32 is provided. WSD module 32 comprises control file optimizer 514, iterative component sequencer (ICS) 500, linguistic components 502, and WSD components 504. - Turning first to
WSD components 504 and linguistic components 502, common characteristics and features of WSD components 504 and linguistic components 502 (“components”) are now described. Results generated by a particular component are preferably rated using a probability distribution and a confidence score. The probability distribution allows a component to return a probability figure indicating the likelihood that any possible answer is correct. In the case of WSD components 504, possible answers comprise possible senses of words in the text. In the case of linguistic components 502, possible answers depend on the task being performed by the linguistic component; for example, possible answers for part-of-speech tagger 502F are the set of possible part of speech tags for each word. The confidence score provides an indication of a level of confidence of the algorithm in the probability distribution. As such, an answer having a high probability and a high confidence score indicates that the algorithm has identified a single answer as most probable and it is highly likely that the identified answer is accurate. If an answer has a high probability score and a low confidence, then although the algorithm has identified a single answer as most probable, its confidence score indicates that it may not be correct. In the case of WSD components 504, a low confidence score may indicate that the component is lacking information that it needed to disambiguate this particular word. It is important that each component have a good confidence function. A component with a low overall accuracy but a good confidence function is able to contribute to the system accuracy despite its low overall accuracy, as the confidence function will identify correctly the subset of words for which the answers supplied by the component can be trusted. - The confidence function considers internal operating features of the component and its algorithm and evaluates potential weaknesses of accuracy of the algorithm.
For example, if an algorithm relies on statistical probabilities, it would tend to produce incorrect results when probabilities were calculated from very few examples. Accordingly, for that algorithm, the confidence score will use a variable containing the number of examples used by the algorithm. A confidence function may contain several variables, even hundreds of variables. The function is usually created by using the variables as input into a classification or regression algorithm (statistical, such as a generalized linear model, or based upon machine learning, such as a neural network) familiar to those skilled in the art. The data used to train the classification or regression algorithm is preferably obtained by running the WSD algorithm over a portion of sense-tagged
corpus 404 that has been set aside for this purpose. - Many of the components employ statistical techniques based on machine learning concepts or other statistical techniques which will be familiar to those skilled in the art. It will be appreciated by those skilled in the art that such components require training data in order to construct their statistical models. For example, the
priors component 504A utilizes many sense-tagged examples of each word in order to determine the statistically most likely sense for that particular word. In the embodiment, the training data is provided by sense-tagged corpus 404, which is known by those skilled in the art as a “training corpus”. - Further detail is now provided on features of
WSD components 504. Each WSD component 504 attempts to associate the correct senses to words in text using a particular word sense disambiguation algorithm. Each WSD component 504 may run more than one time during the course of a disambiguation. The system provides semantic word data or other forms of data in database 30 that each of the algorithms needs in order to perform disambiguation. As noted earlier, each WSD component 504 has an algorithm that executes a particular type of disambiguation and generates a probability score and a confidence score with its results. The WSD components include but are not limited to: priors component 504A; example memory component 504B; n-gram component 504C; concept overlapping component 504E; heuristic word sense component 504F; frequent words component 504G; and dependency component 504H. Each component has a specialized knowledge base associated with its particular operation. Each component produces a confidence function as detailed above. Details of each component are described below. Each technique is generally known in the art, unless specific aspects are provided herein. It will also be appreciated that not all of the WSD components described in the embodiment may be necessary to accomplish accurate word sense disambiguation, but that some combination of different techniques is required. - For
priors component 504A, it utilizes a priors algorithm to predict word senses by utilizing statistical data on frequency of appearances of various word senses. Specifically, the algorithm assigns a probability to each word sense based on the frequency of the word sense in sense-tagged corpus 404. These frequencies are preferably stored in the component resources 402. - For
example memory component 504B, it utilizes an example memory algorithm to predict word senses for phrases (or word sequences). Preferably it attempts to predict word senses of all the words in a sequence. Phrases typically are defined as a series of consecutive words. A phrase can be two words long up to a full sentence. The algorithm accesses a list of phrases (word sequences) which provide a deemed correct sense for each word in that phrase. Preferably, the list comprises sentence fragments from sense-tagged corpus 404 that occurred multiple times where the senses for each occurrence of the fragment were identical. Preferably, when an analyzed phrase contains a word which has a sense which differs from a sense previously attributed to that word in that phrase, senses in the analyzed phrase are rejected and are not retained in the list of word sequences. - When disambiguating text, the example memory algorithm identifies whether parts of the text match the previously identified recurring sequences of words which have been retained in the list of word sequences. If there is a match, the module assigns the word senses of the sequence to the matching words in the text.
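The example memory technique described above can be illustrated with a minimal Python sketch. It is not the embodiment's implementation: the function names, the `min_count` and `max_len` parameters, the longest-match-first policy and the "word/tag" sense labels are all assumptions made for illustration.

```python
from collections import defaultdict


def build_example_memory(tagged_sentences, min_count=2, max_len=4):
    """Collect word sequences whose sense assignment never varies in the corpus.

    tagged_sentences: list of sentences, each a list of (word, sense) pairs.
    min_count and max_len are illustrative assumptions, not values from the text.
    """
    # word sequence -> sense sequence -> number of occurrences
    seen = defaultdict(lambda: defaultdict(int))
    for sentence in tagged_sentences:
        for n in range(2, max_len + 1):
            for i in range(len(sentence) - n + 1):
                fragment = sentence[i:i + n]
                words = tuple(w for w, _ in fragment)
                senses = tuple(s for _, s in fragment)
                seen[words][senses] += 1

    memory = {}
    for words, by_senses in seen.items():
        if len(by_senses) != 1:
            continue  # the same words carried different senses: reject
        senses, count = next(iter(by_senses.items()))
        if count >= min_count:  # keep only recurring sequences
            memory[words] = senses
    return memory


def apply_example_memory(words, memory, max_len=4):
    """Assign remembered senses to matching word sequences, longest match first."""
    senses = [None] * len(words)
    for n in range(max_len, 1, -1):
        for i in range(len(words) - n + 1):
            remembered = memory.get(tuple(words[i:i + n]))
            if remembered is None:
                continue
            for j, sense in enumerate(remembered):
                if senses[i + j] is None:
                    senses[i + j] = sense
    return senses


# Hypothetical sense-tagged corpus.
corpus = [
    [("the", "the/d1"), ("river", "river/n1"), ("bank", "bank/n2")],
    [("a", "a/d1"), ("river", "river/n1"), ("bank", "bank/n2")],
    [("the", "the/d1"), ("bank", "bank/n1")],
    [("the", "the/d1"), ("bank", "bank/n1")],
    [("the", "the/d1"), ("bank", "bank/n2")],
]
memory = build_example_memory(corpus)
tagged = apply_example_memory(["near", "the", "river", "bank"], memory)
```

In this sketch "river bank" is retained because it recurs with identical senses, while "the bank" is rejected because its occurrences disagree on the sense of "bank".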
- For n-
gram component 504C, it utilizes an n-gram algorithm which operates over a fixed range of words and only attempts to predict the sense of one word at a time, in contrast to the example memory algorithm. The n-grams algorithm predicts word senses for a head word by matching features immediately surrounding the word in a very narrow window. Such features include: lemma, part of speech, coarse or fine word sense, and a named entity type. While the algorithm may examine n words before or following a target word, typically, n is set at two words. With n being set at 2, the algorithm utilizes a list of word pairs with a correct sense associated with each word. This list is derived from word pairs from sense-tagged corpus 404 that occurred multiple times, where the senses for each occurrence of the word pair were identical. However, when a sense of at least one word differs, such word pair senses are rejected and are not retained in the list. When disambiguating text, the algorithm matches word pairs from the text being processed with word pairs present in the list maintained by the algorithm. A match is identified when a word pair is found and the sense of one of the two words is already present in the text being processed. When a match is identified, the second word in the word pair being processed is assigned the sense recorded for it in the list. - The component resource associated with the n-grams algorithm is trained over sense-tagged
corpus 404, and is part of component resources 402. The n-grams component resource includes a statistical model which identifies when an n-gram has been seen sufficiently frequently to become a valid sense predictor. Several predictors from the knowledge base may be triggered by a pattern of words. These predictors may reinforce a common sense or may actually generate multiple possible senses with a given probability distribution. - For
concept overlapping component 504E, it has a concept overlapping algorithm which predicts a sense for words by choosing the senses which match most closely the general topic of the text segment. In the embodiment, the topic of the text segment is defined as the set of all non-removed senses for all words in the text segment, and topical similarity is assessed by comparing the topic of the text segment which is being disambiguated with the topics extracted from the sense-tagged corpus 404 for each word sense, and choosing the sense of each word with the highest such similarity. One such method of comparison is the dot-product or cosine metric. There are many other techniques for making use of topic similarity to disambiguate text, as will be familiar to those skilled in the art. - For heuristic
word sense component 504F, it has a heuristic word sense algorithm which predicts a sense of words using human-generated rules which may use intrinsic language properties and semantic links in the knowledge base. For example, the senses “language” in terms of “a spoken human language” and “Indonesian” are related in the knowledge base by the relation “Indonesian is a language”. A sentence containing both “language” and “Indonesian” would have the word “language” disambiguated by this component. Typically, such a relation has been manually verified, thereby providing a high confidence in accuracy. - For
frequent words component 504G, it has a frequent words algorithm which identifies the senses of the most frequently occurring words. In English, the 500 most frequently occurring words account for almost a third of the words encountered in normal text. For each of these words, a large number of training examples are available in sense-tagged corpus 404. Accordingly, it is possible to train specific sense predictors for each word using supervised machine learning methods. In the embodiment, the machine learning method used to train the component is boosting, and the features used include the words and parts of speech of the words in immediate proximity to the target word to be disambiguated. Other features and machine learning techniques may be used to accomplish the same goal, as will be familiar to those skilled in the art. - For
dependency component 504H, it has a dependency algorithm which utilizes a sense prediction model based on the semantic dependencies in a sentence. By determining that a word is a head word in a dependency, and optionally the sense of the head word, it predicts the sense of its dependent words. Similarly, having determined that a word is a dependent and optionally the sense of the dependent word, it can predict the sense of the head word. For example, in the text fragment “drive the car”, the head word is “drive” and the dependent is “car”. Knowledge of the sense of “car” will be sufficient to predict the sense of “drive” as “drive a vehicle”. - It will be appreciated that other techniques for word sense disambiguation become available from time to time as the scientific research in the field progresses, and that such other techniques could equally be included as new WSD components within the system. It will be appreciated that a single WSD component may not be sufficient to disambiguate text with high accuracy. To address this issue, the embodiment utilizes multiple techniques to disambiguate text. The techniques described above specify an exemplary combination which is capable of performing high accuracy word sense disambiguation. Other techniques may also be used.
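As an illustration of how one of the WSD components described above can return both a probability distribution and a confidence score, the priors technique may be sketched in Python as follows. This is a sketch under stated assumptions rather than the embodiment's implementation: the function names are hypothetical, and the confidence formula `total / (total + k)` is a stand-in chosen only to show confidence growing with the number of training examples, whereas the text describes a confidence function trained by classification or regression.

```python
from collections import Counter


def train_priors(tagged_tokens):
    """Count how often each sense of each word occurs in a sense-tagged corpus.

    tagged_tokens: iterable of (word, sense) pairs.
    """
    counts = {}
    for word, sense in tagged_tokens:
        counts.setdefault(word, Counter())[sense] += 1
    return counts


def predict_prior(word, counts, candidate_senses, k=5.0):
    """Return (distribution, confidence) over the senses still under consideration.

    candidate_senses: senses of the word that have not been removed.
    k: hypothetical smoothing constant for the stand-in confidence formula.
    """
    word_counts = counts.get(word, Counter())
    total = sum(word_counts[s] for s in candidate_senses)
    if total == 0:
        # No training evidence: uniform distribution, zero confidence.
        uniform = 1.0 / len(candidate_senses)
        return {s: uniform for s in candidate_senses}, 0.0
    distribution = {s: word_counts[s] / total for s in candidate_senses}
    confidence = total / (total + k)  # assumption: more examples, more trust
    return distribution, confidence


# Hypothetical training data: "java" tagged three times as coffee, once as island.
tokens = [("java", "java/coffee")] * 3 + [("java", "java/island")]
counts = train_priors(tokens)
dist, conf = predict_prior("java", counts, ["java/coffee", "java/island"])
unseen_dist, unseen_conf = predict_prior("bank", counts, ["bank/n1", "bank/n2"])
```

Restricting the distribution to `candidate_senses` reflects the requirement that a component only predicts among senses that have not already been eliminated.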
- Turning now to
linguistic components 502, each component 502 provides a text processing function which can be applied to text to determine a certain type of linguistic information. This information is then provided to the WSD components 504 for disambiguation. The operation of each of the linguistic components 502 will be familiar to one skilled in the art. The linguistic components 502 include: -
Tokenizer 502A which splits input text into individual words and symbols. Tokenizer 502A processes the input text as a sequence of characters and breaks the input text into a series of tokens, where a token is the smallest sequence of characters that can form a word. - Sentence boundary detector 502B which identifies sentence boundaries in the input text. It uses rules and data (e.g., list of abbreviations) to identify the possible sentence breaks in the input text.
-
Morpher 502C which identifies a lemma, i.e. a base form, of a word. In the embodiment, the lemma defines the fine sense and coarse sense inventories of the word. For example, for the inflected word “jumping” the morpher identifies its base form “jump”. -
Parser 502D which identifies relationships between the words in the input text. Parser 502D identifies grammatical structures and phrases in the input text. The result of this operation is a parse tree, which is a concept very well known in the field. Some relationships include “subject of the verb” and “object of the verb”. From the phrases, a list of syntactic and semantic dependencies can later be extracted. Parser 502D also produces part of speech tags that are used to update the part of speech distribution. Parser information is also used to select possible compounds. - Dependency extractor 502J uses the parse tree to generate a list of syntactic and semantic dependencies, which will be familiar to those skilled in the art. The semantic dependencies are used by a number of other components to enhance their models. Dependencies are extracted in the following manner:
- 1.
Parser 502D is used to generate a syntactic parse tree, including syntactic heads for each phrase. - 2. Using a set of heuristics, as will be familiar to those skilled in the art, semantic heads are generated for each phrase. Semantic heads differ from syntactic heads as the semantic rules give preference to semantically important elements (like nouns and verbs) while syntactic heads give preference to syntactically important elements like prepositions.
- 3. Once a semantic head (word or phrase) is identified, sister words and phrases are considered to form dependencies with the head.
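The three extraction steps above can be sketched in Python as follows. The sketch assumes a parse tree is already available (step 1); the `Node` class, the tag names and the `SEMANTIC_PRIORITY` ordering are hypothetical stand-ins for the heuristics mentioned in step 2, not the embodiment's actual rules.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    label: str                      # part-of-speech tag or phrase label
    word: str = ""                  # set on leaf (word) nodes only
    children: List["Node"] = field(default_factory=list)


# Hypothetical heuristic: semantically important categories come first,
# so a noun outranks a preposition when choosing a semantic head.
SEMANTIC_PRIORITY = ["VERB", "VP", "NOUN", "NP", "ADJ", "ADV", "PP", "PREP", "DET"]


def rank(label: str) -> int:
    if label in SEMANTIC_PRIORITY:
        return SEMANTIC_PRIORITY.index(label)
    return len(SEMANTIC_PRIORITY)


def head_child(node: Node) -> Node:
    """Child of a phrase that carries its semantic head."""
    return min(node.children, key=lambda child: rank(child.label))


def semantic_head(node: Node) -> Node:
    """Leaf word that acts as the semantic head of a word or phrase."""
    while node.children:
        node = head_child(node)
    return node


def extract_dependencies(node: Node,
                         deps: Optional[List[Tuple[str, str]]] = None):
    """Pair each phrase's semantic head with the heads of its sister constituents."""
    if deps is None:
        deps = []
    if not node.children:
        return deps
    chosen = head_child(node)
    head = semantic_head(chosen)
    for sister in node.children:
        if sister is not chosen:
            deps.append((head.word, semantic_head(sister).word))
        extract_dependencies(sister, deps)
    return deps


# "drive the car" as a hand-built parse tree.
vp = Node("VP", children=[
    Node("VERB", "drive"),
    Node("NP", children=[Node("DET", "the"), Node("NOUN", "car")]),
])
deps = extract_dependencies(vp)

# "in the house": a syntactic head would be the preposition "in",
# while the semantic head under this ordering is the noun "house".
pp = Node("PP", children=[
    Node("PREP", "in"),
    Node("NP", children=[Node("DET", "the"), Node("NOUN", "house")]),
])
pp_head = semantic_head(pp).word
```

Each dependency is returned as a (head, dependent) pair, so the "drive the car" fragment yields a dependency between "drive" and "car".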
- Named-
entity recogniser 502E identifies known proper nouns such as “Albert Einstein” or “International Business Machines Incorporated” and other multi-word proper nouns. Named-entity recogniser 502E collects tokens that form a named entity into groups and classifies each group into a category. Such categories include: a person, location, artefact, as will be familiar to those skilled in the art. Named-entity categories are determined by a Hidden Markov Model (HMM) that is trained on parts of the sense-tagged corpus 404 in which the named entities have been marked. For example, in the text fragment “Today Coca-Cola announced . . . ”, the HMM will categorize “Coca-Cola” as a company (instead of an artefact) because of analysis of the surrounding words. Many techniques exist for named entity recognition as will be familiar to those skilled in the art. - Part-of-
speech tagger 502F assigns functional roles such as “noun” and “verb” to the words in the input text. Part-of-speech tagger 502F identifies a part of speech, which can be mapped to the broad parts of speech (noun, verb, adverb, adjective) relevant to disambiguating between word senses. Part-of-speech tagger 502F utilizes a trigram-based Hidden Markov Model (HMM) trained on a portion of sense-tagged corpus 404 which has been annotated with part of speech information. Many techniques exist for part of speech tagging, as will be familiar to those skilled in the art. -
Compound finder 502H finds possible compounds in the input text. An example of a compound is “coffee table” or “fire truck”, which although sometimes written as two words need to be treated as a single word for the purposes of word sense disambiguation. Knowledge base 400 contains a list of compounds, which can be identified in the text. Each identified compound is given a probability which marks the likelihood that the compound was correctly formed. The probability is calculated from the sense-tagged corpus 404. - Turning now to
ICS 500, ICS 500 controls the sequence in which linguistic components 502 and WSD components 504 are operated on text, to continually reduce the amount of ambiguity in a text being processed. It has several specific functions: - 1. It coordinates extraction of required elements from text utilizing selected
linguistic components 502 and provides such elements to WSD components 504 through a common interface. - 2. It seeds an initial set of possible senses for each
word using seeder 500A, which associates an initial set of possible senses from the knowledge base 400 to each word in the text to identify to the WSD components 504 which senses they must disambiguate between, thus providing an initial maximum level of ambiguity. - 3. It invokes
WSD components 504 according to an algorithm mix identified by control file 516. Activations of the selected WSD components 504 then attempt to disambiguate the text, providing probabilities and confidence scores associated with possible senses of the words in the text. Preferably, WSD components are invoked in multiple iterations. - 4. It merges and integrates output from multiple components using
merging module 500B and ambiguity eliminator 500C. Merging module 500B combines the outputs of all of the WSD components 504 into a single merged probability distribution and confidence score. Ambiguity eliminator 500C determines which sense ambiguity can be removed from the text based upon the output of merging module 500B. - More detailed description of the function and design of
ICS 500 is provided in subsequent sections describing the operation of the process of word sense disambiguation. - The
control file optimizer 514 optionally performs a training procedure which outputs a “recipe” in the form of control file 516, which contains the optimal sequence and parameters for the WSD components 504 in each iteration, and is used by ICS 500 during word sense disambiguation. More detailed description of the function and design of control file optimizer 514 is provided in a subsequent section describing the generation of an optimized control file. - Further detail is now provided on steps performed by the embodiment to process text. Referring to
FIG. 6, a process to perform disambiguation of text is shown generally by reference 600. The process may be divided into four steps. The first step is to generate an optimized control file 602. This step creates a control file which is used in the step disambiguate text 606. The second step, read text 604, comprises reading in the text to be disambiguated from a file. The third step, disambiguate text 606, consists of disambiguating the text, and is the main step in the process. The fourth step, output disambiguated text 608, consists of writing the sense-tagged text to a file. - Referring to
FIG. 7, further detail is now provided on the main processing step, disambiguate text 606. - Upon receiving a text to disambiguate,
ICS 500 processes the text in the following manner: - 1.
ICS 500 passes the text through tokenizer 502A to identify the boundaries of the words and separate these from punctuation symbols that may be present in the text. - 2.
ICS 500 causes the syntactic features in the text to be identified by passing the text through linguistic components 502. Such features include: lemma (including compounds), part of speech, named entities and semantic dependencies. Each feature is generated with a confidence score and with a probability distribution. - 3. Processed text is then provided to
seeder 500A which uses lemma and part of speech generated by linguistic components 502 to identify a list of possible senses in the knowledge base 400 for each word in the text. - 4.
ICS 500 then applies a set of WSD components 504 independently to the input text, where specific WSD components 504 and a sequence of their execution are specified in control file 516. Each WSD component 504 disambiguates some or all of the words in the text. For senses that are disambiguated, a probability distribution and a confidence score are generated by each WSD component 504. - 5.
ICS 500 then performs a merging operation using merging module 500B. This module merges the results of all components for all words to generate a single probability distribution of senses and associated confidence score for each word. Prior to merging, if specified in the control file 516, ICS 500 may discard results with insufficiently high confidence, or for which the probability of the top result is insufficiently high. The merged probability distribution is the weighted sum of each remaining probability distribution, with the weight being provided by the confidence score. The merged confidence score is a weighted average of confidence values, with weights provided by the confidence score. For example, if a WSD component “A” had given “hot beverage” at 100% probability for the sense of the word “Java”, WSD component “B” had given “programming language” at 100% probability for the same word, and both components had reported the same confidence score, then the merged distribution would contain both “hot beverage” and “programming language” at 50% probability each. In order to merge the results of WSD components 504 that produce only coarse senses, the merger can optionally be run twice, once on the coarse senses and a second time over the group of fine senses associated with each coarse sense. - 6.
ICS 500 then performs ambiguity reduction using ambiguity eliminator 500C. The embodiment performs this process based upon the merged distribution and confidence output by merging module 500B. When a sense in the merged distribution has what is deemed a very high probability and a high confidence, it is deemed to be the correct sense and all other senses can be removed. For example, if a merged result indicated that the disambiguation for “java” was “coffee” with 98% probability and its confidence score was 90%, then all other senses would be excluded as being possible, and “coffee” would be the sole remaining sense. Control file 516 sets probability and confidence score thresholds for this decision point. Conversely, when one or more senses have a very low probability and high confidence score, such senses may be deemed to be improbable and are removed from the set of senses. Again, control file 516 sets probability and confidence thresholds for this decision point. This process reduces ambiguity from the input text by utilizing information provided by WSD components 504, and accordingly influences which senses are provided to WSD components 504 during subsequent iterations of disambiguation. - 7. One or more further iterations of
steps 4, 5 and 6 may optionally be performed. It will be appreciated that results of each subsequent iteration will likely be different than those of previous iteration(s), asWSD components 504 themselves do not predict senses which were eliminated after previous iterations.WSD components 504 make use of the reduced ambiguity as compared to the previous iteration to produce a result with a more accurate distribution and/or higher confidence score.Control file 516 identifies which set ofWSD components 504 is applied on each iteration. It will be appreciated that several iterations may be performed until a sufficient number of words have been disambiguated or until the number of iterations specified in thecontrol file 516 have been completed. - In the embodiment, the word sense disambiguation process may involve multiple iterations. Typically, in each iteration, only a portion of ambiguity can be removed without introducing a large number of disambiguation errors. Preferably, for each word that any selected
WSD component 504 attempts to disambiguate, the selected WSD component 504 returns a full probability distribution over those senses which had not previously been removed. Generally, a WSD component 504 is not allowed to increase ambiguity of a text by re-submitting a sense for a word which has previously been discarded for that word. Also, each WSD component in an iteration operates independently from the others, and interactions between WSD components 504 occur under the control of ICS 500 or via ambiguity removed in a previous iteration. In other embodiments, different degrees of interaction and knowledge of results between WSD components during an iteration and between iterations may be provided. It will be appreciated that due to the highly complex and unpredictable nature of such interactions, systems that include a high degree of interaction between WSD components 504 explicitly programmed into the WSD components 504 tend to be too complex to build practically. As such, the controlled interaction between WSD components 504 provided by the structure of the ICS and the independence of the WSD components 504 is a key advantage of the embodiment and invention. - The combined action of
merger module 500B and ambiguity eliminator 500C is to post-process the results of several WSD algorithms 504 to reduce ambiguity in the text. The combined action of these modules is referred to as the post processing module 512. It will be appreciated that the use of a merging module 500B and an ambiguity reducer 500C as described in the embodiment is an exemplary technique in this particular embodiment only and that alternative techniques could be devised. For example, post processing module 512 may utilize a machine learning technique, such as a neural network, to merge and prune results. In this algorithm, the probability distributions and confidence scores of each algorithm are fed into a learning system, which generates a combined probability and confidence score for each sense. - In relation to the
merger module 500B, other algorithms, such as voting algorithms and merging of rankings algorithms may be used. - Referring to
FIG. 8, further details are now provided on control file optimizer process 514 used to generate an optimized control file 516 providing maximum disambiguation accuracy. The process begins with a sense tagged corpus 802. In the embodiment, this sense tagged corpus is a portion of the sense tagged corpus 404 that has been set aside for the purpose of performing control file optimizer process 514. Control file optimizer 514 uses the WSD module 606 to generate a control file 516 that optimizes accuracy of the WSD module over the sense tagged corpus. -
Control file optimizer 514 requires that optimization criteria be specified. Thresholds are specified separately for either the percentage of ambiguity to be removed or the percentage accuracy of disambiguation; the control file optimizer then optimizes the control file to maximize the performance of the word sense disambiguator on one measure given the threshold for the other. It is also possible to specify a maximum number of iterations. The number of correct results or the amount of ambiguity removed is then maximized for each iteration. After the optimal combination of algorithms and thresholds for a given accuracy has been determined, the training proceeds to the next iteration. The target accuracy is lowered at each iteration, which allows the standard of results to drop gradually as the number of iterations increases. Multiple sequences of target accuracy are tested and the sequence producing the best results over the sense tagged corpus 802 is selected. Preferentially, accuracy or remaining ambiguity is progressively reduced on each subsequent iteration. Example iteration accuracy sequences that are tested are:
- 1. 95%−>90%−>85%−>80%
- 2. 90%−>80%
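By way of illustration only, optimizing one measure subject to a threshold on the other can be sketched in Python as a search over a fixed-step grid of probability and confidence thresholds, keeping the pair that retains the most correct results while meeting a target accuracy. The function name `search_thresholds`, the `(probability, confidence, is_correct)` tuple layout, the 5% step and the sample data are assumptions made for this sketch and are not taken from the embodiment.

```python
from itertools import product


def search_thresholds(results, target_accuracy, step=5):
    """Grid-search probability/confidence cut-offs (hypothetical helper).

    results: list of (probability, confidence, is_correct) tuples, one per
             word, describing the top sense returned over a tagged corpus.
    Returns (prob_threshold, conf_threshold, n_correct) for the pair that
    keeps the most correct results while the kept results reach
    target_accuracy, or None if no pair reaches the target.
    """
    grid = [g / 100 for g in range(0, 101, step)]  # 0%, 5%, ..., 100%
    best = None
    for p_min, c_min in product(grid, grid):
        # Keep only results that pass both thresholds.
        kept = [ok for p, c, ok in results if p >= p_min and c >= c_min]
        if not kept:
            continue
        n_correct = sum(kept)
        accuracy = n_correct / len(kept)
        if accuracy >= target_accuracy and (best is None or n_correct > best[2]):
            best = (p_min, c_min, n_correct)
    return best


# Made-up sample: three correct and two incorrect top senses.
sample = [
    (0.90, 0.90, True),
    (0.90, 0.80, True),
    (0.95, 0.95, True),
    (0.60, 0.90, False),
    (0.50, 0.30, False),
]

# Try a descending sequence of target accuracies, as in the sequences listed.
for target in (0.95, 0.90, 0.85, 0.80):
    print(target, search_thresholds(sample, target))
```

In this sketch ties are resolved in favour of the first pair encountered; an implementation could equally prefer the loosest or the strictest thresholds among equally good pairs.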
- For a given iteration and target disambiguation accuracy, the optimal list of algorithms to invoke and the associated probability and confidence thresholds of results to keep is identified by executing the following steps:
- 1. Invoke each WSD component 504 individually on sense-tagged corpus 802 to obtain a set of results for each component.
- 2. For the set of results of a WSD component 504, search the space of probability and confidence thresholds to identify thresholds which maximize performance against the optimization criteria. This is done through a search of all combinations of probability and confidence thresholds in the range of 0% to 100% in fixed step increments, such as 5%.
- 3. Once optimal thresholds for each WSD component 504 are identified, results of all WSD components 504 are pruned according to those thresholds and are merged using the merging module 500B as described earlier.
- 4. Consolidated merged results are then searched to identify probability and confidence thresholds of merged results that maximize the number of correct answers with an accuracy equal to or above the target accuracy for the iteration. This is preferably performed using the method of step 2.
- 5. Step 4 is repeated for each WSD component 504 that was merged, but with the results of the WSD component 504 of interest excluded. The probability and confidence thresholds that maximize the number of correct results of this result set are then identified. The difference between the maximum number of correct results of this set and the number obtained in step 4 indicates the contribution of correct unique answers of the algorithm of interest. If the contribution of a WSD component 504 is negative, this WSD component 504 is identified as having a detrimental impact on the results. If the contribution is zero, then the WSD component 504 is not contributing new correct results in the iteration. In either case, the WSD component 504 having the lowest negative contribution is removed from the list of WSD components 504 to be invoked in subsequent iterations.
- 6. Step 5 is repeated until a set number of WSD components 504 that have a negative or zero contribution are identified and removed. The number may be all WSD components 504.
- 7. Steps 2 through 6 are repeated but with the target accuracy of step 2 modified by a small increment, e.g. 2.5%, both above and then below the target accuracy of the iteration.
- 8. The combination of WSD components 504 and the associated probability and confidence thresholds that resulted in the largest number of correct answers is retained as the solution to a given iteration. The thresholds for probability and confidence for each WSD algorithm 504 and the ambiguity reducer 500C are written to the control file, and the training proceeds to the next iteration and target disambiguation accuracy.
- The
control file optimizer 514 can be set to optimize accuracy given that each word is assigned one and only one sense, as the above description implies. It will be recognized that for certain applications or in certain specific instances, it may not make sense to attempt to assign only one sense to each word, or to disambiguate all the words.
- The amount of ambiguity present in text prior to any disambiguation may be considered to be the maximum ambiguity. The amount of ambiguity present in fully sense-tagged text, for which each word has been assigned one and only one word sense, can be considered to be the minimum ambiguity. It will be recognized that for some applications or in certain cases it will be useful to remove only part of the ambiguity present in the text. This can be accomplished by allowing a word to have more than one possible sense, or by not disambiguating certain words, or both of these. In the embodiment, the percentage of ambiguity removed is defined as the (number of senses discarded), divided by the (total number of possible senses minus one). It will further be recognized that, in general, removing a smaller percentage of ambiguity permits
word sense disambiguator 32 to return more accurate results, given that word sense disambiguator 32 can specify more than one possible sense for a word, and where a word is considered correctly disambiguated if senses specified for the word include the correct sense of the word. - Optionally, the
control file optimizer 514 can be provided with separate optimization criteria and thresholds for the percentage of ambiguity to be removed by the word sense disambiguator 32 and the accuracy of the disambiguation results of word sense disambiguator 32. The control file optimizer 514 can be asked to either a) maximize the amount of ambiguity removed subject to a minimum threshold of accuracy (for example, remove as much ambiguity as possible, ensuring that the remaining possible senses for the words are 95% likely to contain the correct sense), or b) maximize disambiguation accuracy subject to a minimum percentage of ambiguity to remove (for example, maximize accuracy subject to removing at least 70% of additional senses for each word). This capability is useful a) because it allows word sense disambiguator 32 to better fit the real world of natural language texts, in which words may be truly ambiguous (i.e. ambiguous to a human) as expressed in a text, and therefore not possible to fully disambiguate, and b) because it allows applications making use of word sense disambiguator 32 to opt for more or less conservative implementations of word sense disambiguator 32, wherein the precision of the disambiguation is lower, but fewer correct senses are discarded. This is particularly valuable, for example, in information retrieval applications for which it is critical that correct information is never discarded (e.g. due to incorrect disambiguation), even at the expense of including extraneous information (e.g. due to additional incorrect senses being present in the disambiguated text). - Optionally, the
control file optimizer 514 can be provided with a maximum number of iterations. - It will be appreciated that creating accurate confidence functions is important. A component with a poor confidence function, even a component with high accuracy, will not contribute or will contribute less than optimally to the system accuracy. This occurs in one of two ways:
- 1. If the confidence function tends to frequently give a low confidence value to a correct result, then
merging module 500B will effectively ignore this result, due to the arithmetic of the merger whereby results are weighted by the confidence score, with the net effect being as if the component had not given a result at all for that word. Thus, these correct results will be excluded from contributing to the system due to the poor confidence function.
- 2. On the other hand, if the confidence function gives a high confidence value to incorrect results, then the automatic training procedure will recognize that the algorithm contributes many incorrect results, and exclude it from being run.
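The arithmetic of the merger referred to above can be sketched in Python as follows. This is a minimal illustration only: the function name `merge_results` and the sample confidence values are hypothetical, and the sketch assumes that the confidence-weighted sum is normalized by the total confidence, which is consistent with the 50%/50% example given for the word “Java” but is not stated explicitly in the description.

```python
def merge_results(results):
    """Confidence-weighted merge of per-component results (hypothetical helper).

    results: list of (distribution, confidence) pairs, where distribution
             maps each sense to its probability for one word.
    Returns (merged_distribution, merged_confidence).
    """
    total = sum(conf for _, conf in results)
    if total == 0:
        return {}, 0.0
    merged = {}
    for dist, conf in results:
        for sense, prob in dist.items():
            # Weighted sum of distributions; dividing by the total confidence
            # (an assumption of this sketch) keeps the result summing to 1.
            merged[sense] = merged.get(sense, 0.0) + conf * prob / total
    # Weighted average of confidences, weighted by the confidences themselves.
    merged_conf = sum(conf * conf for _, conf in results) / total
    return merged, merged_conf


# Two components that disagree with equal confidence: 50% each.
equal, equal_conf = merge_results([
    ({"hot beverage": 1.0}, 0.8),
    ({"programming language": 1.0}, 0.8),
])

# A correct result given with low confidence is all but ignored.
skewed, skewed_conf = merge_results([
    ({"hot beverage": 1.0}, 0.05),          # correct, but poorly scored
    ({"programming language": 1.0}, 0.95),  # incorrect, but confident
])
print(equal, equal_conf)
print(skewed, skewed_conf)
```

In the second call the correct sense ends up with only a 5% share of the merged distribution, which illustrates why a component whose confidence function undervalues its correct results contributes little to the merged output.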
- It will be appreciated that adding an algorithm with a poor confidence function to the system (for example, one which is overly optimistic and often produces incorrect results with 100% confidence) does not severely detrimentally affect the accuracy of the system, as the control
file optimization procedure 514 described above will discount such results and will not execute that algorithm in further iterations of disambiguation. This provides a level of robustness to the system against the inclusion of poor WSD components. - It will be apparent to those skilled in the art that the accuracy of most WSD systems increases with the size of the training corpus but decreases with an inaccurately tagged training corpus. The addition of accurately sense-tagged text to the training corpus will usually increase the effectiveness of WSD components. In addition,
most WSD components 504 require a portion of the sense-tagged corpus 404 to be set aside for the training of their confidence function. It will be appreciated that the effectiveness of the confidence function increases as the amount of sense-tagged text in the portion of the sense-tagged corpus 404 set aside for confidence function training increases. - Sense-tagged
corpus 404 can be created manually by human lexicographers. It will be appreciated that this is a time consuming and expensive process, and that finding a way to generate or augment sense-tagged corpus 404 automatically would be of substantial value. - Referring to
FIG. 9, the embodiment also provides a system and method for automatically providing a sense-tagged corpus 404 or for automatically increasing the size of sense-tagged corpus 404 for the training of WSD components 504. There are two processes illustrated in FIG. 9. The first is the component training process 960. This process uses sense tagged text 404 or untagged text 900 as an input to the WSD component training module 906 in order to generate improved component resources for the WSD components 504. The second process is the corpus generation process 950. This process processes untagged text 900 or partially tagged text 902 through the WSD module 32. Using the confidence function and probability distributions output by the WSD process 32, senses which are likely to be incorrectly tagged are then filtered out by the filter module 904. This partially sense tagged text can then be added to the partially tagged text 902 or the sense tagged corpus 404. When these two processes, component training process 960 and corpus generation process 950, are run alternately, the effect is to improve the accuracy of the WSD module 32 and to increase the size of the sense-tagged corpus 404. - As described above, it will be recognized that most
conceivable WSD components 504 require a training process to be performed over a sense tagged corpus 404 before they can be used to disambiguate text. For example, priors component 504A requires that the frequencies of senses be recorded from a sense tagged corpus 404. These frequencies are stored in the WSD component resources 402. As described above, the more sense tagged text 404 is available to the training process, the more accurate each WSD algorithm 504 will be. The training processes of all WSD components 504 are collectively referred to in FIG. 9 as the WSD component training process 960. - As described above, results of
several WSD components 504 are combined to disambiguate previously unseen text. This is a process known as “bootstrapping”. - With the embodiment, only results with sufficiently high confidence are added to the training data, utilizing the following algorithm:
- 1. Train each word sense disambiguation component using the component training process 960 with the available training data from the sense tagged corpus 404.
- 2. Disambiguate a large quantity of untagged documents 900 using the WSD module 32; preferably, a very large quantity of documents from various domains is used.
- 3. In the filter module 904, discard all results where the result is ambiguous or where the confidence is below a threshold, which may be adjusted.
- 4. Add the non-discarded senses to the sense tagged data 404.
- 5. Re-train the set of word sense disambiguation components using the component training process 960.
- 6. Restart the training over the same documents which are now in the sense tagged corpus 404 or over a new body of untagged text 900.
- A key to this process is the use of a probability distribution and confidence score. In prior art systems, a confidence score is not available and inaccurate results cannot be discarded. As a result, the
WSD components 504 are less accurate after retraining on the enlarged sense tagged corpus 404 than they were before, and such a process is not practically useful. By setting a high confidence threshold that rejects most incorrect senses from being added to the sense tagged corpus 404, the embodiment eliminates this deficiency in the prior art system and allows the training data to be enlarged with high quality tagged text. It will be appreciated that this process can run multiple times, and may create a self-reinforcing loop that increases both the size of the sense tagged corpus 404 and the accuracy of the WSD system 32. The quality of the training data extracted (due to the use of a probability distribution and a confidence score) and the potentially self-reinforcing nature of the bootstrapping process are features of the embodiment. - The embodiment also provides a variant of the above bootstrapping process to train the system for a specific domain (e.g., law, health, etc.), utilizing the following variation on the algorithm:
- 1. A number of documents are disambiguated by a highly accurate method, such as manually by a skilled human. Use of these documents provides “seeding resources” to the system, which are added to the sense tagged
corpus 404. - 2. The word sense disambiguation components are trained using the WSD component training process 960.
- 3. A large quantity of documents from the domain are automatically disambiguated and added to the sense tagged
corpus 404 using the corpus generation process 950. - It will be apparent that the embodiment has several advantages over the prior art. Some include:
- 1. Multiple independent algorithms. The embodiment allows more components to be incorporated utilizing a simplified interface through
ICS 500. As such, several disambiguation techniques (for example between 10 and 20) can be combined without the system becoming too complex to manipulate. - 2. Confidence functions. In prior art systems, a confidence score is not available. The confidence score provides several critical advantages over prior art systems:
- a) Merging together of results of multiple components. The confidence function allows results from different probabilistic algorithms to be combined with different weights reflecting the expected accuracy of the algorithm in a particular situation. Using the confidence function described above, the system can merge together decisions of many components to obtain a more likely sense.
- b) Discarding poor results or word senses for truly ambiguous words. The confidence function allows potentially inaccurate results to be discarded, so the embodiment can opt not to provide senses for words for which it has little confidence in its answer. This better reflects the real world of natural language expression, wherein some expressions remain ambiguous even when analyzed by a human.
- c) Bootstrapping. The confidence function provides a likelihood that each answer is correct. This allows only highly accurate results to be kept and reused as training text for components and the overall system. Additional training text in turn further improves the accuracy of the components and the overall system. This is a highly accurate form of bootstrapping, and offers a comparable gain in performance to sense-tagging additional training text using human lexicographers, at a tiny fraction of the cost. The amount of sense-tagged text that can be generated from untagged text (for example, the Internet) with this technique is limited only by available computer capacity. Prior art systems have performed bootstrapping without a confidence score, but the sense tags in the text fed to the system are far less accurate than those provided by a human lexicographer or a confidence-score enabled system, and the overall performance of the system quickly stagnates or degrades.
- 3. Iterative disambiguation. The system allows a component to have multiple passes over the text being disambiguated, which allows it to use high-accuracy disambiguations (or reductions in ambiguity) provided by any of the other components, to improve its accuracy in disambiguating the remaining words. For example, when faced with the words “cup” and “green” in one sentence, a
particular WSD component 504 may not be able to distinguish between the “golf cup” sense of “cup” and the more mundane “drinking vessel”. If another WSD component 504 is able to disambiguate the word “green” into its “golf green” sense, then the first WSD component 504 may now be able to correctly disambiguate “cup” into “golf cup”. In this sense, WSD components 504 interact with each other to arrive at more likely senses. - 4. Method for automatically tuning
WSD module 32. WSD module 32 includes a method for merging an optimal “recipe” of components and parameter values. This merged set is optimal in the sense that it provides the parameters which utilise multiple iterations of multiple components to obtain the maximum possible accuracy. - 5. Multiple levels of ambiguity. By operating simultaneously on coarse and fine senses, the embodiment can integrate different components effectively. For example, several classes of linguistic components operate by attempting to discern a topical content of text. These types of components tend to have poor accuracy over fine senses, since these often respect grammatical rather than semantic distinctions, but do very well over coarse senses. The
WSD module 32 is capable of merging results between components that give fine and coarse senses, allowing each component to operate over the sense granularity most appropriate for that component. Furthermore, an application that requires only coarse senses can obtain these from WSD module 32. Due to their coarseness, these coarse senses will have higher accuracy than the fine senses. - 6. Use of domain-specific data. If information about the problem domain is known, the embodiment can be biased to favour senses which match the problem domain. For example, if it is known that a particular document falls within the domain of Law, then
WSD module 32 can provide sense distributions to the components which favour those terms in the legal domain. - 7. Gradual reduction in ambiguity. It will be appreciated that prior art systems perform disambiguation by attempting to choose one single sense for each word in a single iteration, which amounts to removing all ambiguity at once. This decreases the accuracy of the disambiguation. The embodiment instead performs this process gradually, removing some of the ambiguity at each iteration.
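The gradual reduction in ambiguity can be illustrated with a short Python sketch of a single pruning pass over a merged distribution, together with the percentage-of-ambiguity-removed measure defined earlier as the number of senses discarded divided by the total number of possible senses minus one. The function names and the threshold values (95%, 5% and 90%) are assumptions chosen for this sketch; in the embodiment such thresholds are set by the control file 516.

```python
def reduce_ambiguity(merged, confidence,
                     keep_prob=0.95, drop_prob=0.05, min_conf=0.90):
    """One pruning pass over a merged sense distribution (hypothetical helper).

    merged:     dict mapping each remaining sense to its merged probability.
    confidence: merged confidence score for the word.
    Thresholds are illustrative stand-ins for control-file values.
    Returns the set of senses that survive this pass.
    """
    if not merged or confidence < min_conf:
        return set(merged)            # not confident enough to prune anything
    top = max(merged, key=merged.get)
    if merged[top] >= keep_prob:
        return {top}                  # very probable sense: keep it alone
    # Otherwise drop only the senses that are very improbable.
    survivors = {s for s, p in merged.items() if p > drop_prob}
    return survivors or {top}


def ambiguity_removed(n_discarded, n_possible):
    """Fraction of ambiguity removed: discarded / (possible senses - 1)."""
    if n_possible <= 1:
        return 0.0
    return n_discarded / (n_possible - 1)


# The "java" example: 98% probability with 90% confidence leaves one sense.
java = reduce_ambiguity(
    {"coffee": 0.98, "island": 0.01, "programming language": 0.01}, 0.90)

# A less clear-cut word loses only its least likely sense on this pass.
partial = reduce_ambiguity({"a": 0.60, "b": 0.37, "c": 0.03}, 0.95)

print(java, ambiguity_removed(3 - len(java), 3))
print(partial, ambiguity_removed(3 - len(partial), 3))
```

A word that keeps two of three senses after a pass has had half of its ambiguity removed under this measure, and the surviving senses would be the only ones offered to the components on the next iteration.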
- Optionally, the embodiment uses metadata. For example, the title of the document can be used to aid in the disambiguation of the document's text, by allowing the words in the title to carry disproportionate weight towards the disambiguation.
- Although the invention has been described with reference to certain specific embodiments, various modifications thereof will be apparent to those skilled in the art without departing from the scope of the invention as outlined in the claims appended hereto. A person skilled in the art would have sufficient knowledge of at least one or more of the following disciplines: computer programming, machine learning and computational linguistics.
Claims (18)
1. A method of processing natural language text utilizing a plurality of disambiguation components to identify a disambiguated sense for said text, said method comprising steps of:
applying a selection of components from said plurality of disambiguation components to said text to identify a local disambiguated sense for said text,
wherein
each component of said selection provides a local disambiguated sense of said text with a confidence score and a probability score; and
said disambiguated sense is determined utilizing a selection of local disambiguated senses from said selection.
2. The method of processing natural language text as claimed in claim 1, wherein said selection of components are sequentially activated and controlled by a central module.
3. The method of processing natural language text as claimed in claim 2, further comprising
identifying a second selection of components from said plurality of components;
applying said second selection to said text to refine said disambiguated sense,
wherein
each component of said second selection provide a second local disambiguated sense of said text with a second confidence score and a second probability score; and
said disambiguated sense is determined utilizing a selection of second local disambiguated senses from said second selection.
4. The method of processing natural language text as claimed in claim 3, further comprising
after applying said selection to said text and prior to applying said second selection to refine said disambiguated sense, eliminating a sense from said disambiguated sense having a confidence score below a threshold.
5. The method of processing natural language text as claimed in claim 4, wherein when a particular component of said plurality of components is present in said selection and said second selection, at least one of its confidence and probability scores is adjusted when applying said second selection to said text.
6. The method of processing natural language text as claimed in claim 4, wherein said selection and said second selection of components are identical.
7. The method of processing natural language text as claimed in claim 4, wherein said confidence score of said each component is generated by a confidence function utilizing a trait of each component.
8. The method of processing natural language text as claimed in claim 4, wherein after applying said selection of components to said text to identify a local disambiguated sense for said text, said method further comprising
for each said component of said selection, generating a probability distribution for its disambiguated sense; and
merging all probability distributions for said selection.
9. The method of processing natural language text as claimed in claim 8, wherein said selection of component disambiguates said text using context of said text identified from one of domain; user history; and specified contexts.
10. The method of processing natural language text as claimed in claim 8, further comprising after applying said selection to said text, refining a knowledge base of each component in said selection utilizing said disambiguated sense.
11. The method of processing natural language text as claimed in claim 4, wherein at least one of said selection of components provides results only for coarse senses.
12. The method of processing natural language text as claimed in claim 4, wherein results of said selection of components are combined into one result utilizing a merging algorithm.
13. The method of processing natural language text as claimed in claim 12, wherein said process utilizes a first stage comprising merging of coarse senses, and a second stage comprising merging of fine senses within each coarse sense grouping.
14. The method of processing natural language text as claimed in claim 13, wherein said merging process utilizes a weighted sum of probability distributions, and said weights are the confidence score associated with said distribution, and wherein said merging process comprises a weighted average of confidence scores, and said weights are again the confidence scores associated with said distribution.
15. A method of generating sense-tagged text, said method comprising steps of:
disambiguating a quantity of documents utilizing a disambiguation component;
generating a confidence score and a probability score for a sense identified for a word provided by said component;
if said confidence score for said sense for said word is below a set threshold, said sense is ignored; and
if said confidence score for said sense for said word is above said set threshold, said sense is added to said sense-tagged text.
16. A method of processing natural language text utilizing a plurality of disambiguation components to identify a disambiguated sense or senses for said text, said method comprising steps of:
defining an accuracy target for disambiguation; and
applying a selection of components from said plurality of disambiguation components to meet said accuracy target.
17. A method of processing natural language text utilizing a plurality of disambiguation components to identify a disambiguated sense for said text, said method comprising steps of:
identifying a set of senses for said text; and
identifying and removing an unwanted sense from said set.
18. A method of processing natural language text utilizing a plurality of disambiguation components to identify a disambiguated sense for said text, said method comprising steps of:
identifying a set of senses for said text; and
identifying and removing a specified amount of ambiguity from said set of senses.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/921,954 US20050080613A1 (en) | 2003-08-21 | 2004-08-20 | System and method for processing text utilizing a suite of disambiguation techniques |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US49668103P | 2003-08-21 | 2003-08-21 | |
US10/921,954 US20050080613A1 (en) | 2003-08-21 | 2004-08-20 | System and method for processing text utilizing a suite of disambiguation techniques |
Publications (1)
Publication Number | Publication Date |
---|---|
US20050080613A1 true US20050080613A1 (en) | 2005-04-14 |
Family
ID=34216034
Family Applications (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/921,875 Expired - Fee Related US7509313B2 (en) | 2003-08-21 | 2004-08-20 | System and method for processing a query |
US10/921,820 Expired - Fee Related US7895221B2 (en) | 2003-08-21 | 2004-08-20 | Internet searching using semantic disambiguation and expansion |
US10/921,954 Abandoned US20050080613A1 (en) | 2003-08-21 | 2004-08-20 | System and method for processing text utilizing a suite of disambiguation techniques |
US13/031,600 Abandoned US20110202563A1 (en) | 2003-08-21 | 2011-02-21 | Internet searching using semantic disambiguation and expansion |
Family Applications Before (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/921,875 Expired - Fee Related US7509313B2 (en) | 2003-08-21 | 2004-08-20 | System and method for processing a query |
US10/921,820 Expired - Fee Related US7895221B2 (en) | 2003-08-21 | 2004-08-20 | Internet searching using semantic disambiguation and expansion |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/031,600 Abandoned US20110202563A1 (en) | 2003-08-21 | 2011-02-21 | Internet searching using semantic disambiguation and expansion |
Country Status (5)
Country | Link |
---|---|
US (4) | US7509313B2 (en) |
EP (3) | EP1665092A4 (en) |
CN (3) | CN100580666C (en) |
CA (3) | CA2536265C (en) |
WO (3) | WO2005020091A1 (en) |
US20090157384A1 (en) * | 2007-12-12 | 2009-06-18 | Microsoft Corporation | Semi-supervised part-of-speech tagging |
US20090182549A1 (en) * | 2006-10-10 | 2009-07-16 | Konstantin Anisimovich | Deep Model Statistics Method for Machine Translation |
US20090234638A1 (en) * | 2008-03-14 | 2009-09-17 | Microsoft Corporation | Use of a Speech Grammar to Recognize Instant Message Input |
US20090307003A1 (en) * | 2008-05-16 | 2009-12-10 | Daniel Benyamin | Social advertisement network |
WO2008100849A3 (en) * | 2007-02-15 | 2009-12-30 | Cycorp, Inc. | Semantics-based method and system for document analysis |
US20090326922A1 (en) * | 2008-06-30 | 2009-12-31 | International Business Machines Corporation | Client side reconciliation of typographical errors in messages from input-limited devices |
US20100042401A1 (en) * | 2007-05-20 | 2010-02-18 | Ascoli Giorgio A | Semantic Cognitive Map |
US20100082657A1 (en) * | 2008-09-23 | 2010-04-01 | Microsoft Corporation | Generating synonyms based on query log data |
US20100161577A1 (en) * | 2008-12-19 | 2010-06-24 | Bmc Software, Inc. | Method of Reconciling Resources in the Metadata Hierarchy |
US20100250250A1 (en) * | 2009-03-30 | 2010-09-30 | Jonathan Wiggs | Systems and methods for generating a hybrid text string from two or more text strings generated by multiple automated speech recognition systems |
US20100293170A1 (en) * | 2009-05-15 | 2010-11-18 | Citizennet Inc. | Social network message categorization systems and methods |
US20100293179A1 (en) * | 2009-05-14 | 2010-11-18 | Microsoft Corporation | Identifying synonyms of entities using web search |
US20100313258A1 (en) * | 2009-06-04 | 2010-12-09 | Microsoft Corporation | Identifying synonyms of entities using a document collection |
US20110040553A1 (en) * | 2006-11-13 | 2011-02-17 | Sellon Sasivarman | Natural language processing |
US20110047149A1 (en) * | 2009-08-21 | 2011-02-24 | Vaeaenaenen Mikko | Method and means for data searching and language translation |
US20110093455A1 (en) * | 2009-10-21 | 2011-04-21 | Citizennet Inc. | Search and retrieval methods and systems of short messages utilizing messaging context and keyword frequency |
US20110153595A1 (en) * | 2009-12-23 | 2011-06-23 | Palo Alto Research Center Incorporated | System And Method For Identifying Topics For Short Text Communications |
US7970766B1 (en) | 2007-07-23 | 2011-06-28 | Google Inc. | Entity type assignment |
US20110231183A1 (en) * | 2008-11-28 | 2011-09-22 | Nec Corporation | Language model creation device |
US20110238637A1 (en) * | 2010-03-26 | 2011-09-29 | Bmc Software, Inc. | Statistical Identification of Instances During Reconciliation Process |
US20110246462A1 (en) * | 2010-03-30 | 2011-10-06 | International Business Machines Corporation | Method and System for Prompting Changes of Electronic Document Content |
US20110289025A1 (en) * | 2010-05-19 | 2011-11-24 | Microsoft Corporation | Learning user intent from rule-based training data |
US20110307254A1 (en) * | 2008-12-11 | 2011-12-15 | Melvyn Hunt | Speech recognition involving a mobile device |
US8122026B1 (en) * | 2006-10-20 | 2012-02-21 | Google Inc. | Finding and disambiguating references to entities on web pages |
US8131546B1 (en) * | 2007-01-03 | 2012-03-06 | Stored Iq, Inc. | System and method for adaptive sentence boundary disambiguation |
US20120166414A1 (en) * | 2008-08-11 | 2012-06-28 | Ultra Unilimited Corporation (dba Publish) | Systems and methods for relevance scoring |
US20120209609A1 (en) * | 2011-02-14 | 2012-08-16 | General Motors Llc | User-specific confidence thresholds for speech recognition |
US8260785B2 (en) | 2006-02-17 | 2012-09-04 | Google Inc. | Automatic object reference identification and linking in a browseable fact repository |
CN102682042A (en) * | 2011-03-18 | 2012-09-19 | 日电(中国)有限公司 | Concept identifying device and method |
US20120239381A1 (en) * | 2011-03-17 | 2012-09-20 | Sap Ag | Semantic phrase suggestion engine |
US8347202B1 (en) | 2007-03-14 | 2013-01-01 | Google Inc. | Determining geographic locations for place names in a fact repository |
US8521517B2 (en) * | 2010-12-13 | 2013-08-27 | Google Inc. | Providing definitions that are sensitive to the context of a text |
US8554854B2 (en) | 2009-12-11 | 2013-10-08 | Citizennet Inc. | Systems and methods for identifying terms relevant to web pages using social network messages |
TWI412277B (en) * | 2009-08-10 | 2013-10-11 | Univ Nat Cheng Kung | Video summarization method based on mining the story-structure and semantic relations among concept entities |
US20130297290A1 (en) * | 2012-05-03 | 2013-11-07 | International Business Machines Corporation | Automatic accuracy estimation for audio transcriptions |
US8612293B2 (en) | 2010-10-19 | 2013-12-17 | Citizennet Inc. | Generation of advertising targeting information based upon affinity information obtained from an online social network |
US8615434B2 (en) | 2010-10-19 | 2013-12-24 | Citizennet Inc. | Systems and methods for automatically generating campaigns using advertising targeting information based upon affinity information obtained from an online social network |
US8650175B2 (en) | 2005-03-31 | 2014-02-11 | Google Inc. | User interface for facts query engine with snippets from information sources that include query terms and answer terms |
US8682913B1 (en) | 2005-03-31 | 2014-03-25 | Google Inc. | Corroborating facts extracted from multiple sources |
US8700404B1 (en) * | 2005-08-27 | 2014-04-15 | At&T Intellectual Property Ii, L.P. | System and method for using semantic and syntactic graphs for utterance classification |
US20140114649A1 (en) * | 2006-10-10 | 2014-04-24 | Abbyy Infopoisk Llc | Method and system for semantic searching |
US8719006B2 (en) | 2010-08-27 | 2014-05-06 | Apple Inc. | Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis |
WO2014074317A1 (en) * | 2012-11-08 | 2014-05-15 | Evernote Corporation | Extraction and clarification of ambiguities for addresses in documents |
US8745019B2 (en) | 2012-03-05 | 2014-06-03 | Microsoft Corporation | Robust discovery of entity synonyms using query logs |
US20140156703A1 (en) * | 2012-11-30 | 2014-06-05 | Altera Corporation | Method and apparatus for translating graphical symbols into query keywords |
WO2014104943A1 (en) * | 2012-12-27 | 2014-07-03 | Abbyy Development Llc | Finding an appropriate meaning of an entry in a text |
US8812435B1 (en) | 2007-11-16 | 2014-08-19 | Google Inc. | Learning objects and facts from documents |
US8892446B2 (en) | 2010-01-18 | 2014-11-18 | Apple Inc. | Service orchestration for intelligent automated assistant |
US20140342320A1 (en) * | 2013-02-15 | 2014-11-20 | Voxy, Inc. | Language learning systems and methods |
US20140343922A1 (en) * | 2011-05-10 | 2014-11-20 | Nec Corporation | Device, method and program for assessing synonymous expressions |
US20150006155A1 (en) * | 2012-03-07 | 2015-01-01 | Mitsubishi Electric Corporation | Device, method, and program for word sense estimation |
US8935230B2 (en) | 2011-08-25 | 2015-01-13 | Sap Se | Self-learning semantic search engine |
US20150019204A1 (en) * | 2013-07-12 | 2015-01-15 | Microsoft Corporation | Feature completion in computer-human interactive learning |
US8959011B2 (en) | 2007-03-22 | 2015-02-17 | Abbyy Infopoisk Llc | Indicating and correcting errors in machine translation systems |
US8971630B2 (en) | 2012-04-27 | 2015-03-03 | Abbyy Development Llc | Fast CJK character recognition |
US8989485B2 (en) | 2012-04-27 | 2015-03-24 | Abbyy Development Llc | Detecting a junction in a text line of CJK characters |
US8996470B1 (en) | 2005-05-31 | 2015-03-31 | Google Inc. | System for ensuring the internal consistency of a fact repository |
US9002892B2 (en) | 2011-08-07 | 2015-04-07 | CitizenNet, Inc. | Systems and methods for trend detection using frequency analysis |
US9047275B2 (en) | 2006-10-10 | 2015-06-02 | Abbyy Infopoisk Llc | Methods and systems for alignment of parallel text corpora |
US9053497B2 (en) | 2012-04-27 | 2015-06-09 | CitizenNet, Inc. | Systems and methods for targeting advertising to groups with strong ties within an online social network |
US9063927B2 (en) | 2011-04-06 | 2015-06-23 | Citizennet Inc. | Short message age classification |
US9093073B1 (en) * | 2007-02-12 | 2015-07-28 | West Corporation | Automatic speech recognition tagging |
US9158799B2 (en) | 2013-03-14 | 2015-10-13 | Bmc Software, Inc. | Storing and retrieving context sensitive data in a management system |
US20150379090A1 (en) * | 2014-06-26 | 2015-12-31 | International Business Machines Corporation | Mining product aspects from opinion text |
US9229924B2 (en) | 2012-08-24 | 2016-01-05 | Microsoft Technology Licensing, Llc | Word detection and domain dictionary recommendation |
US9235573B2 (en) | 2006-10-10 | 2016-01-12 | Abbyy Infopoisk Llc | Universal difference measure |
US20160012020A1 (en) * | 2014-07-14 | 2016-01-14 | Samsung Electronics Co., Ltd. | Method and system for robust tagging of named entities in the presence of source or translation errors |
US9239826B2 (en) | 2007-06-27 | 2016-01-19 | Abbyy Infopoisk Llc | Method and system for generating new entries in natural language dictionary |
US20160019287A1 (en) * | 2010-05-14 | 2016-01-21 | Salesforce.Com, Inc. | Querying a database using relationship metadata |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US9262409B2 (en) | 2008-08-06 | 2016-02-16 | Abbyy Infopoisk Llc | Translation of a selected text fragment of a screen |
US9269353B1 (en) * | 2011-12-07 | 2016-02-23 | Manu Rehani | Methods and systems for measuring semantics in communications |
US9300784B2 (en) | 2013-06-13 | 2016-03-29 | Apple Inc. | System and method for emergency calls initiated by voice command |
US9305103B2 (en) * | 2012-07-03 | 2016-04-05 | Yahoo! Inc. | Method or system for semantic categorization |
US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US20160147737A1 (en) * | 2014-11-20 | 2016-05-26 | Electronics And Telecommunications Research Institute | Question answering system and method for structured knowledgebase using deep natual language question analysis |
US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
US20160217501A1 (en) * | 2015-01-23 | 2016-07-28 | Conversica, Llc | Systems and methods for processing message exchanges using artificial intelligence |
EP2115630A4 (en) * | 2007-01-04 | 2016-08-17 | Thinking Solutions Pty Ltd | Linguistic analysis |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
US20160302196A1 (en) * | 2015-04-09 | 2016-10-13 | Hong Kong Applied Science And Technology Research Institute Co., Ltd. | Systems and methods for using high probability area and availability probability determinations for white space channel identification |
US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
US9502031B2 (en) | 2014-05-27 | 2016-11-22 | Apple Inc. | Method for supporting dynamic grammars in WFST-based ASR |
US20160357853A1 (en) * | 2015-06-05 | 2016-12-08 | Apple Inc. | Systems and methods for providing improved search functionality on a client device |
US9535906B2 (en) | 2008-07-31 | 2017-01-03 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
US9594831B2 (en) | 2012-06-22 | 2017-03-14 | Microsoft Technology Licensing, Llc | Targeted disambiguation of named entities |
US9600566B2 (en) | 2010-05-14 | 2017-03-21 | Microsoft Technology Licensing, Llc | Identifying entity synonyms |
US9606986B2 (en) | 2014-09-29 | 2017-03-28 | Apple Inc. | Integrated word N-gram and class M-gram language models |
US9620105B2 (en) | 2014-05-15 | 2017-04-11 | Apple Inc. | Analyzing audio input for efficient speech and music recognition |
US9620104B2 (en) | 2013-06-07 | 2017-04-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9626353B2 (en) | 2014-01-15 | 2017-04-18 | Abbyy Infopoisk Llc | Arc filtering in a syntactic graph |
US9626358B2 (en) | 2014-11-26 | 2017-04-18 | Abbyy Infopoisk Llc | Creating ontologies by analyzing natural language texts |
US9626955B2 (en) | 2008-04-05 | 2017-04-18 | Apple Inc. | Intelligent text-to-speech conversion |
US9633660B2 (en) | 2010-02-25 | 2017-04-25 | Apple Inc. | User profiling for voice input processing |
US9633005B2 (en) | 2006-10-10 | 2017-04-25 | Abbyy Infopoisk Llc | Exhaustive automatic processing of textual information |
US9633674B2 (en) | 2013-06-07 | 2017-04-25 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
US9646614B2 (en) | 2000-03-16 | 2017-05-09 | Apple Inc. | Fast, language-independent method for user authentication by voice |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
CN106709011A (en) * | 2016-12-26 | 2017-05-24 | 武汉大学 | Positional concept hierarchy disambiguation calculation method based on spatial locating cluster |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US9697822B1 (en) | 2013-03-15 | 2017-07-04 | Apple Inc. | System and method for updating an adaptive speech recognition model |
US9711141B2 (en) | 2014-12-09 | 2017-07-18 | Apple Inc. | Disambiguating heteronyms in speech synthesis |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US9734193B2 (en) | 2014-05-30 | 2017-08-15 | Apple Inc. | Determining domain salience ranking from ambiguous words in natural speech |
US9740682B2 (en) | 2013-12-19 | 2017-08-22 | Abbyy Infopoisk Llc | Semantic disambiguation using a statistical analysis |
US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
US9760627B1 (en) * | 2016-05-13 | 2017-09-12 | International Business Machines Corporation | Private-public context analysis for natural language content disambiguation |
US9772995B2 (en) | 2012-12-27 | 2017-09-26 | Abbyy Development Llc | Finding an appropriate meaning of an entry in a text |
US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
US9798393B2 (en) | 2011-08-29 | 2017-10-24 | Apple Inc. | Text correction processing |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US9824084B2 (en) | 2015-03-19 | 2017-11-21 | Yandex Europe Ag | Method for word sense disambiguation for homonym words based on part of speech (POS) tag of a non-homonym word |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US9858506B2 (en) | 2014-09-02 | 2018-01-02 | Abbyy Development Llc | Methods and systems for processing of images of mathematical expressions |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
US9922642B2 (en) | 2013-03-15 | 2018-03-20 | Apple Inc. | Training an at least partial voice command system |
US9934313B2 (en) | 2007-03-14 | 2018-04-03 | Fiver Llc | Query templates and labeled search tip system, methods and techniques |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9953088B2 (en) | 2012-05-14 | 2018-04-24 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US9966065B2 (en) | 2014-05-30 | 2018-05-08 | Apple Inc. | Multi-command single utterance input method |
US9966068B2 (en) | 2013-06-08 | 2018-05-08 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US9971774B2 (en) | 2012-09-19 | 2018-05-15 | Apple Inc. | Voice-based media searching |
US9984071B2 (en) | 2006-10-10 | 2018-05-29 | Abbyy Production Llc | Language ambiguity detection of text |
WO2018118302A1 (en) * | 2016-12-21 | 2018-06-28 | Intel Corporation | Methods and apparatus to identify a count of n-grams appearing in a corpus |
US10032131B2 (en) | 2012-06-20 | 2018-07-24 | Microsoft Technology Licensing, Llc | Data services for enterprises leveraging search system data assets |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10049150B2 (en) | 2010-11-01 | 2018-08-14 | Fiver Llc | Category-based content recommendation |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
US10055410B1 (en) * | 2017-05-03 | 2018-08-21 | International Business Machines Corporation | Corpus-scoped annotation and analysis |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US20180239751A1 (en) * | 2017-02-22 | 2018-08-23 | Google Inc. | Optimized graph traversal |
US10068022B2 (en) | 2011-06-03 | 2018-09-04 | Google Llc | Identifying topical entities |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US10079014B2 (en) | 2012-06-08 | 2018-09-18 | Apple Inc. | Name recognition system |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10089072B2 (en) | 2016-06-11 | 2018-10-02 | Apple Inc. | Intelligent device arbitration and control |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US10127296B2 (en) | 2011-04-07 | 2018-11-13 | Bmc Software, Inc. | Cooperative naming for configuration items in a distributed configuration management database environment |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US10152538B2 (en) | 2013-05-06 | 2018-12-11 | Dropbox, Inc. | Suggested search based on a content item |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
CN109214007A (en) * | 2018-09-19 | 2019-01-15 | 哈尔滨理工大学 | Chinese sentence word sense disambiguation method based on convolutional neural networks |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US10185542B2 (en) | 2013-06-09 | 2019-01-22 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US10191899B2 (en) | 2016-06-06 | 2019-01-29 | Comigo Ltd. | System and method for understanding text using a translation of the text |
US10199051B2 (en) | 2013-02-07 | 2019-02-05 | Apple Inc. | Voice trigger for a digital assistant |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US10269345B2 (en) | 2016-06-11 | 2019-04-23 | Apple Inc. | Intelligent task discovery |
US10268965B2 (en) * | 2015-10-27 | 2019-04-23 | Yardi Systems, Inc. | Dictionary enhancement technique for business name categorization |
US10274983B2 (en) * | 2015-10-27 | 2019-04-30 | Yardi Systems, Inc. | Extended business name categorization apparatus and method |
US10275708B2 (en) * | 2015-10-27 | 2019-04-30 | Yardi Systems, Inc. | Criteria enhancement technique for business name categorization |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US10283110B2 (en) | 2009-07-02 | 2019-05-07 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US10289433B2 (en) | 2014-05-30 | 2019-05-14 | Apple Inc. | Domain specific language for encoding assistant dialog |
US10297253B2 (en) | 2016-06-11 | 2019-05-21 | Apple Inc. | Application integration with a digital assistant |
US10318871B2 (en) | 2005-09-08 | 2019-06-11 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US10331783B2 (en) | 2010-03-30 | 2019-06-25 | Fiver Llc | NLP-based systems and methods for providing quotations |
US10356243B2 (en) | 2015-06-05 | 2019-07-16 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US10354011B2 (en) | 2016-06-09 | 2019-07-16 | Apple Inc. | Intelligent automated assistant in a home environment |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US10372824B2 (en) * | 2017-05-15 | 2019-08-06 | International Business Machines Corporation | Disambiguating concepts in natural language |
US10410637B2 (en) | 2017-05-12 | 2019-09-10 | Apple Inc. | User-specific acoustic models |
US20190295531A1 (en) * | 2016-10-20 | 2019-09-26 | Google Llc | Determining phonetic relationships |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US10460229B1 (en) * | 2016-03-18 | 2019-10-29 | Google Llc | Determining word senses using neural networks |
US10482874B2 (en) | 2017-05-15 | 2019-11-19 | Apple Inc. | Hierarchical belief states for digital assistants |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
US10521466B2 (en) | 2016-06-11 | 2019-12-31 | Apple Inc. | Data driven natural language event detection and classification |
RU2710966C2 (en) * | 2015-01-23 | 2020-01-14 | МАЙКРОСОФТ ТЕКНОЛОДЖИ ЛАЙСЕНСИНГ, ЭлЭлСи | Methods for understanding incomplete natural language query |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US10568032B2 (en) | 2007-04-03 | 2020-02-18 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
US10592095B2 (en) | 2014-05-23 | 2020-03-17 | Apple Inc. | Instantaneous speaking of content on touch devices |
US20200104379A1 (en) * | 2018-09-28 | 2020-04-02 | Io-Tahoe LLC. | System and method for tagging database properties |
US10652592B2 (en) | 2017-07-02 | 2020-05-12 | Comigo Ltd. | Named entity disambiguation for providing TV content enrichment |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10706373B2 (en) | 2011-06-03 | 2020-07-07 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US10726061B2 (en) | 2017-11-17 | 2020-07-28 | International Business Machines Corporation | Identifying text for labeling utilizing topic modeling-based text clustering |
US10733993B2 (en) | 2016-06-10 | 2020-08-04 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US10755703B2 (en) | 2017-05-11 | 2020-08-25 | Apple Inc. | Offline personal assistant |
US10762293B2 (en) | 2010-12-22 | 2020-09-01 | Apple Inc. | Using parts-of-speech tagging and named entity recognition for spelling correction |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US10791176B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10791216B2 (en) | 2013-08-06 | 2020-09-29 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
US10810274B2 (en) | 2017-05-15 | 2020-10-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
US10832680B2 (en) | 2018-11-27 | 2020-11-10 | International Business Machines Corporation | Speech-to-text engine customization |
US20200394257A1 (en) * | 2019-06-17 | 2020-12-17 | The Boeing Company | Predictive query processing for complex system lifecycle management |
US10872080B2 (en) * | 2017-04-24 | 2020-12-22 | Oath Inc. | Reducing query ambiguity using graph matching |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US11010555B2 (en) | 2015-01-23 | 2021-05-18 | Conversica, Inc. | Systems and methods for automated question response |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US20210201932A1 (en) * | 2013-05-07 | 2021-07-01 | Veveo, Inc. | Method of and system for real time feedback in an incremental speech input interface |
US11100285B2 (en) | 2015-01-23 | 2021-08-24 | Conversica, Inc. | Systems and methods for configurable messaging with feature extraction |
US11106871B2 (en) | 2015-01-23 | 2021-08-31 | Conversica, Inc. | Systems and methods for configurable messaging response-action engine |
CN113361283A (en) * | 2021-06-28 | 2021-09-07 | 东南大学 | Web table-oriented paired entity joint disambiguation method |
US11170770B2 (en) * | 2018-08-03 | 2021-11-09 | International Business Machines Corporation | Dynamic adjustment of response thresholds in a dialogue system |
US11216742B2 (en) | 2019-03-04 | 2022-01-04 | Iocurrents, Inc. | Data compression and communication using machine learning |
US11217255B2 (en) | 2017-05-16 | 2022-01-04 | Apple Inc. | Far-field extension for digital assistant services |
US11216718B2 (en) * | 2015-10-27 | 2022-01-04 | Yardi Systems, Inc. | Energy management system |
US11222057B2 (en) * | 2019-08-07 | 2022-01-11 | International Business Machines Corporation | Methods and systems for generating descriptions utilizing extracted entity descriptors |
US11237713B2 (en) * | 2019-01-21 | 2022-02-01 | International Business Machines Corporation | Graphical user interface based feature extraction application for machine learning and cognitive models |
US11308128B2 (en) * | 2017-12-11 | 2022-04-19 | International Business Machines Corporation | Refining classification results based on glossary relationships |
JP2022071194A (en) * | 2012-07-31 | 2022-05-13 | ベベオ, インコーポレイテッド | Cancellation of ambiguity in user's intention in conversation type interaction |
US11361416B2 (en) | 2018-03-20 | 2022-06-14 | Netflix, Inc. | Quantifying encoding comparison metric uncertainty via bootstrapping |
US11423023B2 (en) | 2015-06-05 | 2022-08-23 | Apple Inc. | Systems and methods for providing improved search functionality on a client device |
US11494557B1 (en) | 2021-05-17 | 2022-11-08 | Verantos, Inc. | System and method for term disambiguation |
US11551188B2 (en) | 2015-01-23 | 2023-01-10 | Conversica, Inc. | Systems and methods for improved automated conversations with attendant actions |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US20230132090A1 (en) * | 2021-10-22 | 2023-04-27 | Tencent America LLC | Bridging semantics between words and definitions via aligning word sense inventories |
US11663409B2 (en) | 2015-01-23 | 2023-05-30 | Conversica, Inc. | Systems and methods for training machine learning models using active learning |
US11710574B2 (en) | 2021-01-27 | 2023-07-25 | Verantos, Inc. | High validity real-world evidence study with deep phenotyping |
US11811889B2 (en) | 2015-01-30 | 2023-11-07 | Rovi Guides, Inc. | Systems and methods for resolving ambiguous terms based on media asset schedule |
US12032643B2 (en) | 2012-07-20 | 2024-07-09 | Veveo, Inc. | Method of and system for inferring user intent in search input in a conversational interaction system |
Families Citing this family (240)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060116865A1 (en) | 1999-09-17 | 2006-06-01 | Www.Uniscape.Com | E-services translation utilizing machine translation and translation memory |
US6804662B1 (en) * | 2000-10-27 | 2004-10-12 | Plumtree Software, Inc. | Method and apparatus for query and analysis |
US7904595B2 (en) | 2001-01-18 | 2011-03-08 | Sdl International America Incorporated | Globalization management system and method therefor |
US20070136251A1 (en) * | 2003-08-21 | 2007-06-14 | Idilia Inc. | System and Method for Processing a Query |
US7548910B1 (en) * | 2004-01-30 | 2009-06-16 | The Regents Of The University Of California | System and method for retrieving scenario-specific documents |
US7983896B2 (en) | 2004-03-05 | 2011-07-19 | SDL Language Technology | In-context exact (ICE) matching |
US7409402B1 (en) * | 2005-09-20 | 2008-08-05 | Yahoo! Inc. | Systems and methods for presenting advertising content based on publisher-selected labels |
US8972856B2 (en) * | 2004-07-29 | 2015-03-03 | Yahoo! Inc. | Document modification by a client-side application |
US7421441B1 (en) * | 2005-09-20 | 2008-09-02 | Yahoo! Inc. | Systems and methods for presenting information based on publisher-selected labels |
US7603349B1 (en) | 2004-07-29 | 2009-10-13 | Yahoo! Inc. | User interfaces for search systems using in-line contextual queries |
US7856441B1 (en) * | 2005-01-10 | 2010-12-21 | Yahoo! Inc. | Search systems and methods using enhanced contextual queries |
US7958115B2 (en) * | 2004-07-29 | 2011-06-07 | Yahoo! Inc. | Search systems and methods using in-line contextual queries |
US7895218B2 (en) * | 2004-11-09 | 2011-02-22 | Veveo, Inc. | Method and system for performing searches for television content using reduced text input |
US20060101504A1 (en) * | 2004-11-09 | 2006-05-11 | Veveo.Tv, Inc. | Method and system for performing searches for television content and channels using a non-intrusive television interface and with reduced text input |
US20070266406A1 (en) * | 2004-11-09 | 2007-11-15 | Murali Aravamudan | Method and system for performing actions using a non-intrusive television with reduced text input |
WO2006086179A2 (en) * | 2005-01-31 | 2006-08-17 | Textdigger, Inc. | Method and system for semantic search and retrieval of electronic documents |
US20060188864A1 (en) * | 2005-01-31 | 2006-08-24 | Pankaj Shah | Automated transfer of data from PC clients |
WO2006096260A2 (en) * | 2005-01-31 | 2006-09-14 | Musgrove Technology Enterprises, Llc | System and method for generating an interlinked taxonomy structure |
US8150846B2 (en) * | 2005-02-17 | 2012-04-03 | Microsoft Corporation | Content searching and configuration of search results |
CN1841372A (en) * | 2005-03-29 | 2006-10-04 | 国际商业机器公司 | Method and apparatus for helping user to forming structured diagram according to non-structured information source |
US9104779B2 (en) | 2005-03-30 | 2015-08-11 | Primal Fusion Inc. | Systems and methods for analyzing and synthesizing complex knowledge representations |
US9177248B2 (en) | 2005-03-30 | 2015-11-03 | Primal Fusion Inc. | Knowledge representation systems and methods incorporating customization |
US10002325B2 (en) | 2005-03-30 | 2018-06-19 | Primal Fusion Inc. | Knowledge representation systems and methods incorporating inference rules |
US7849090B2 (en) | 2005-03-30 | 2010-12-07 | Primal Fusion Inc. | System, method and computer program for faceted classification synthesis |
US8849860B2 (en) | 2005-03-30 | 2014-09-30 | Primal Fusion Inc. | Systems and methods for applying statistical inference techniques to knowledge representations |
US9378203B2 (en) | 2008-05-01 | 2016-06-28 | Primal Fusion Inc. | Methods and apparatus for providing information of interest to one or more users |
JP2008537225A (en) * | 2005-04-11 | 2008-09-11 | テキストディガー,インコーポレイテッド | Search system and method for queries |
WO2006113597A2 (en) * | 2005-04-14 | 2006-10-26 | The Regents Of The University Of California | Method for information retrieval |
US7962504B1 (en) | 2005-05-26 | 2011-06-14 | Aol Inc. | Sourcing terms into a search engine |
US7702665B2 (en) * | 2005-06-14 | 2010-04-20 | Colloquis, Inc. | Methods and apparatus for evaluating semantic proximity |
KR100544514B1 (en) * | 2005-06-27 | 2006-01-24 | 엔에이치엔(주) | Method and system for determining relation between search terms in the internet search system |
US10198521B2 (en) * | 2005-06-27 | 2019-02-05 | Google Llc | Processing ambiguous search requests in a geographic information system |
US7788266B2 (en) | 2005-08-26 | 2010-08-31 | Veveo, Inc. | Method and system for processing ambiguous, multi-term search queries |
US8321198B2 (en) * | 2005-09-06 | 2012-11-27 | Kabushiki Kaisha Square Enix | Data extraction system, terminal, server, programs, and media for extracting data via a morphological analysis |
US7711737B2 (en) * | 2005-09-12 | 2010-05-04 | Microsoft Corporation | Multi-document keyphrase extraction using partial mutual information |
US7620607B1 (en) * | 2005-09-26 | 2009-11-17 | Quintura Inc. | System and method for using a bidirectional neural network to identify sentences for use as document annotations |
US7475072B1 (en) | 2005-09-26 | 2009-01-06 | Quintura, Inc. | Context-based search visualization and context management using neural networks |
KR100724122B1 (en) * | 2005-09-28 | 2007-06-04 | 최진근 | System and its method for managing database of bundle data storing related structure of data |
US7958124B2 (en) * | 2005-09-28 | 2011-06-07 | Choi Jin-Keun | System and method for managing bundle data database storing data association structure |
US10319252B2 (en) | 2005-11-09 | 2019-06-11 | Sdl Inc. | Language capability assessment and training apparatus and techniques |
US7644054B2 (en) * | 2005-11-23 | 2010-01-05 | Veveo, Inc. | System and method for finding desired results by incremental search using an ambiguous keypad with the input containing orthographic and typographic errors |
US20080228738A1 (en) * | 2005-12-13 | 2008-09-18 | Wisteme, Llc | Web based open knowledge system with user-editable attributes |
US7660786B2 (en) * | 2005-12-14 | 2010-02-09 | Microsoft Corporation | Data independent relevance evaluation utilizing cognitive concept relationship |
US8694530B2 (en) * | 2006-01-03 | 2014-04-08 | Textdigger, Inc. | Search system with query refinement and search method |
US20070185860A1 (en) * | 2006-01-24 | 2007-08-09 | Michael Lissack | System for searching |
US7739225B2 (en) | 2006-02-09 | 2010-06-15 | Ebay Inc. | Method and system to analyze aspect rules based on domain coverage of an aspect-value pair |
US7739226B2 (en) * | 2006-02-09 | 2010-06-15 | Ebay Inc. | Method and system to analyze aspect rules based on domain coverage of the aspect rules |
US8380698B2 (en) * | 2006-02-09 | 2013-02-19 | Ebay Inc. | Methods and systems to generate rules to identify data items |
US7640234B2 (en) * | 2006-02-09 | 2009-12-29 | Ebay Inc. | Methods and systems to communicate information |
US7849047B2 (en) | 2006-02-09 | 2010-12-07 | Ebay Inc. | Method and system to analyze domain rules based on domain coverage of the domain rules |
US9443333B2 (en) * | 2006-02-09 | 2016-09-13 | Ebay Inc. | Methods and systems to communicate information |
US7725417B2 (en) * | 2006-02-09 | 2010-05-25 | Ebay Inc. | Method and system to analyze rules based on popular query coverage |
US8195683B2 (en) * | 2006-02-28 | 2012-06-05 | Ebay Inc. | Expansion of database search queries |
US7657526B2 (en) | 2006-03-06 | 2010-02-02 | Veveo, Inc. | Methods and systems for selecting and presenting content based on activity level spikes associated with the content |
US7624130B2 (en) * | 2006-03-30 | 2009-11-24 | Microsoft Corporation | System and method for exploring a semantic file network |
US20070255693A1 (en) * | 2006-03-30 | 2007-11-01 | Veveo, Inc. | User interface method and system for incrementally searching and selecting content items and for presenting advertising in response to search activities |
US8073860B2 (en) * | 2006-03-30 | 2011-12-06 | Veveo, Inc. | Method and system for incrementally selecting and providing relevant search engines in response to a user query |
US7634471B2 (en) * | 2006-03-30 | 2009-12-15 | Microsoft Corporation | Adaptive grouping in a file network |
US9135238B2 (en) * | 2006-03-31 | 2015-09-15 | Google Inc. | Disambiguation of named entities |
US8862573B2 (en) * | 2006-04-04 | 2014-10-14 | Textdigger, Inc. | Search system and method with text function tagging |
US7461061B2 (en) | 2006-04-20 | 2008-12-02 | Veveo, Inc. | User interface methods and systems for selecting and presenting content based on user navigation and selection actions associated with the content |
US8150827B2 (en) * | 2006-06-07 | 2012-04-03 | Renew Data Corp. | Methods for enhancing efficiency and cost effectiveness of first pass review of documents |
US20080004920A1 (en) * | 2006-06-30 | 2008-01-03 | Unisys Corporation | Airline management system generating routings in real-time |
US7792967B2 (en) * | 2006-07-14 | 2010-09-07 | Chacha Search, Inc. | Method and system for sharing and accessing resources |
US7698328B2 (en) * | 2006-08-11 | 2010-04-13 | Apple Inc. | User-directed search refinement |
US8589869B2 (en) * | 2006-09-07 | 2013-11-19 | Wolfram Alpha Llc | Methods and systems for determining a formula |
US20080071744A1 (en) * | 2006-09-18 | 2008-03-20 | Elad Yom-Tov | Method and System for Interactively Navigating Search Results |
WO2008045690A2 (en) | 2006-10-06 | 2008-04-17 | Veveo, Inc. | Linear character selection display interface for ambiguous text input |
US9098489B2 (en) | 2006-10-10 | 2015-08-04 | Abbyy Infopoisk Llc | Method and system for semantic searching |
US9892111B2 (en) | 2006-10-10 | 2018-02-13 | Abbyy Production Llc | Method and device to estimate similarity between documents having multiple segments |
US9189482B2 (en) | 2012-10-10 | 2015-11-17 | Abbyy Infopoisk Llc | Similar document search |
US9069750B2 (en) | 2006-10-10 | 2015-06-30 | Abbyy Infopoisk Llc | Method and system for semantic searching of natural language texts |
US9075864B2 (en) | 2006-10-10 | 2015-07-07 | Abbyy Infopoisk Llc | Method and system for semantic searching using syntactic and semantic analysis |
RU2618375C2 (en) * | 2015-07-02 | 2017-05-03 | Общество с ограниченной ответственностью "Аби ИнфоПоиск" | Expanding of information search possibility |
US9495358B2 (en) | 2006-10-10 | 2016-11-15 | Abbyy Infopoisk Llc | Cross-language text clustering |
US8359190B2 (en) * | 2006-10-27 | 2013-01-22 | Hewlett-Packard Development Company, L.P. | Identifying semantic positions of portions of a text |
CN100507915C (en) * | 2006-11-09 | 2009-07-01 | 华为技术有限公司 | Network search method, network search device, and user terminals |
US8078884B2 (en) | 2006-11-13 | 2011-12-13 | Veveo, Inc. | Method of and system for selecting and presenting content based on user identification |
US8635203B2 (en) * | 2006-11-16 | 2014-01-21 | Yahoo! Inc. | Systems and methods using query patterns to disambiguate query intent |
US7437370B1 (en) * | 2007-02-19 | 2008-10-14 | Quintura, Inc. | Search engine graphical interface using maps and images |
WO2008118884A1 (en) * | 2007-03-23 | 2008-10-02 | Ruttenberg Steven E | Method of prediciting affinity between entities |
US7809714B1 (en) | 2007-04-30 | 2010-10-05 | Lawrence Richard Smith | Process for enhancing queries for information retrieval |
US8549424B2 (en) * | 2007-05-25 | 2013-10-01 | Veveo, Inc. | System and method for text disambiguation and context designation in incremental search |
US20080313574A1 (en) * | 2007-05-25 | 2008-12-18 | Veveo, Inc. | System and method for search with reduced physical interaction requirements |
US9002869B2 (en) * | 2007-06-22 | 2015-04-07 | Google Inc. | Machine translation for query expansion |
US8280721B2 (en) * | 2007-08-31 | 2012-10-02 | Microsoft Corporation | Efficiently representing word sense probabilities |
US8145660B2 (en) * | 2007-10-05 | 2012-03-27 | Fujitsu Limited | Implementing an expanded search and providing expanded search results |
US20090094211A1 (en) * | 2007-10-05 | 2009-04-09 | Fujitsu Limited | Implementing an expanded search and providing expanded search results |
US8108405B2 (en) | 2007-10-05 | 2012-01-31 | Fujitsu Limited | Refining a search space in response to user input |
US20090094210A1 (en) * | 2007-10-05 | 2009-04-09 | Fujitsu Limited | Intelligently sorted search results |
US8543380B2 (en) | 2007-10-05 | 2013-09-24 | Fujitsu Limited | Determining a document specificity |
US20090254540A1 (en) * | 2007-11-01 | 2009-10-08 | Textdigger, Inc. | Method and apparatus for automated tag generation for digital content |
US8943539B2 (en) | 2007-11-21 | 2015-01-27 | Rovi Guides, Inc. | Enabling a friend to remotely modify user data |
US8019772B2 (en) * | 2007-12-05 | 2011-09-13 | International Business Machines Corporation | Computer method and apparatus for tag pre-search in social software |
US9501467B2 (en) | 2007-12-21 | 2016-11-22 | Thomson Reuters Global Resources | Systems, methods, software and interfaces for entity extraction and resolution and tagging |
WO2009094633A1 (en) | 2008-01-25 | 2009-07-30 | Chacha Search, Inc. | Method and system for access to restricted resource(s) |
WO2009097558A2 (en) | 2008-01-30 | 2009-08-06 | Thomson Reuters Global Resources | Financial event and relationship extraction |
US8392436B2 (en) * | 2008-02-07 | 2013-03-05 | Nec Laboratories America, Inc. | Semantic search via role labeling |
US10269024B2 (en) * | 2008-02-08 | 2019-04-23 | Outbrain Inc. | Systems and methods for identifying and measuring trends in consumer content demand within vertically associated websites and related content |
US8180754B1 (en) * | 2008-04-01 | 2012-05-15 | Dranias Development Llc | Semantic neural network for aggregating query searches |
US8112431B2 (en) * | 2008-04-03 | 2012-02-07 | Ebay Inc. | Method and system for processing search requests |
US9361365B2 (en) | 2008-05-01 | 2016-06-07 | Primal Fusion Inc. | Methods and apparatus for searching of content using semantic synthesis |
CN106845645B (en) | 2008-05-01 | 2020-08-04 | 启创互联公司 | Method and system for generating semantic network and for media composition |
US8676732B2 (en) | 2008-05-01 | 2014-03-18 | Primal Fusion Inc. | Methods and apparatus for providing information of interest to one or more users |
CN106250371A (en) | 2008-08-29 | 2016-12-21 | 启创互联公司 | For utilizing the definition of existing territory to carry out the system and method that semantic concept definition and semantic concept relation is comprehensive |
GB2463669A (en) * | 2008-09-19 | 2010-03-24 | Motorola Inc | Using a semantic graph to expand characterising terms of a content item and achieve targeted selection of associated content items |
US20100131513A1 (en) | 2008-10-23 | 2010-05-27 | Lundberg Steven W | Patent mapping |
US8260605B2 (en) | 2008-12-09 | 2012-09-04 | University Of Houston System | Word sense disambiguation |
US8108393B2 (en) | 2009-01-09 | 2012-01-31 | Hulu Llc | Method and apparatus for searching media program databases |
US8463806B2 (en) * | 2009-01-30 | 2013-06-11 | Lexisnexis | Methods and systems for creating and using an adaptive thesaurus |
US20100217768A1 (en) * | 2009-02-20 | 2010-08-26 | Hong Yu | Query System for Biomedical Literature Using Keyword Weighted Queries |
US20110301941A1 (en) * | 2009-03-20 | 2011-12-08 | Syl Research Limited | Natural language processing method and system |
CN101840397A (en) * | 2009-03-20 | 2010-09-22 | 日电(中国)有限公司 | Word sense disambiguation method and system |
GB201016385D0 (en) * | 2010-09-29 | 2010-11-10 | Touchtype Ltd | System and method for inputting text into electronic devices |
US20100281025A1 (en) * | 2009-05-04 | 2010-11-04 | Motorola, Inc. | Method and system for recommendation of content items |
US8601015B1 (en) * | 2009-05-15 | 2013-12-03 | Wolfram Alpha Llc | Dynamic example generation for queries |
US8788524B1 (en) | 2009-05-15 | 2014-07-22 | Wolfram Alpha Llc | Method and system for responding to queries in an imprecise syntax |
CN101901210A (en) * | 2009-05-25 | 2010-12-01 | 日电(中国)有限公司 | Word meaning disambiguating system and method |
US8370275B2 (en) | 2009-06-30 | 2013-02-05 | International Business Machines Corporation | Detecting factual inconsistencies between a document and a fact-base |
US9396485B2 (en) * | 2009-12-24 | 2016-07-19 | Outbrain Inc. | Systems and methods for presenting content |
US20110040604A1 (en) * | 2009-08-13 | 2011-02-17 | Vertical Acuity, Inc. | Systems and Methods for Providing Targeted Content |
US9292855B2 (en) | 2009-09-08 | 2016-03-22 | Primal Fusion Inc. | Synthesizing messaging using context provided by consumers |
US11023675B1 (en) | 2009-11-03 | 2021-06-01 | Alphasense OY | User interface for use with a search engine for searching financial related documents |
US9262520B2 (en) | 2009-11-10 | 2016-02-16 | Primal Fusion Inc. | System, method and computer program for creating and manipulating data structures using an interactive graphical interface |
US20110119047A1 (en) * | 2009-11-19 | 2011-05-19 | Tatu Ylonen Oy Ltd | Joint disambiguation of the meaning of a natural language expression |
US8504355B2 (en) * | 2009-11-20 | 2013-08-06 | Clausal Computing Oy | Joint disambiguation of syntactic and semantic ambiguity |
US9208259B2 (en) * | 2009-12-02 | 2015-12-08 | International Business Machines Corporation | Using symbols to search local and remote data stores |
US10713666B2 (en) | 2009-12-24 | 2020-07-14 | Outbrain Inc. | Systems and methods for curating content |
US10607235B2 (en) * | 2009-12-24 | 2020-03-31 | Outbrain Inc. | Systems and methods for curating content |
US20110197137A1 (en) * | 2009-12-24 | 2011-08-11 | Vertical Acuity, Inc. | Systems and Methods for Rating Content |
US20110161091A1 (en) * | 2009-12-24 | 2011-06-30 | Vertical Acuity, Inc. | Systems and Methods for Connecting Entities Through Content |
US20110191330A1 (en) * | 2010-02-04 | 2011-08-04 | Veveo, Inc. | Method of and System for Enhanced Content Discovery Based on Network and Device Access Behavior |
US9684683B2 (en) * | 2010-02-09 | 2017-06-20 | Siemens Aktiengesellschaft | Semantic search tool for document tagging, indexing and search |
US10417646B2 (en) | 2010-03-09 | 2019-09-17 | Sdl Inc. | Predicting the cost associated with translating textual content |
US8341099B2 (en) * | 2010-03-12 | 2012-12-25 | Microsoft Corporation | Semantics update and adaptive interfaces in connection with information as a service |
US8484015B1 (en) | 2010-05-14 | 2013-07-09 | Wolfram Alpha Llc | Entity pages |
US10474647B2 (en) | 2010-06-22 | 2019-11-12 | Primal Fusion Inc. | Methods and devices for customizing knowledge representation systems |
US9235806B2 (en) | 2010-06-22 | 2016-01-12 | Primal Fusion Inc. | Methods and devices for customizing knowledge representation systems |
US8812298B1 (en) | 2010-07-28 | 2014-08-19 | Wolfram Alpha Llc | Macro replacement of natural language input |
US9703871B1 (en) | 2010-07-30 | 2017-07-11 | Google Inc. | Generating query refinements using query components |
GB201200643D0 (en) | 2012-01-16 | 2012-02-29 | Touchtype Ltd | System and method for inputting text |
US9779168B2 (en) | 2010-10-04 | 2017-10-03 | Excalibur Ip, Llc | Contextual quick-picks |
US9418155B2 (en) | 2010-10-14 | 2016-08-16 | Microsoft Technology Licensing, Llc | Disambiguation of entities |
US20120124028A1 (en) * | 2010-11-12 | 2012-05-17 | Microsoft Corporation | Unified Application Discovery across Application Stores |
US11294977B2 (en) | 2011-06-20 | 2022-04-05 | Primal Fusion Inc. | Techniques for presenting content to a user based on the user's preferences |
US10657540B2 (en) | 2011-01-29 | 2020-05-19 | Sdl Netherlands B.V. | Systems, methods, and media for web content management |
US9547626B2 (en) | 2011-01-29 | 2017-01-17 | Sdl Plc | Systems, methods, and media for managing ambient adaptability of web applications and web services |
US10580015B2 (en) | 2011-02-25 | 2020-03-03 | Sdl Netherlands B.V. | Systems, methods, and media for executing and optimizing online marketing initiatives |
US10140320B2 (en) | 2011-02-28 | 2018-11-27 | Sdl Inc. | Systems, methods, and media for generating analytical data |
US9904726B2 (en) | 2011-05-04 | 2018-02-27 | Black Hills IP Holdings, LLC. | Apparatus and method for automated and assisted patent claim mapping and expense planning |
US20120324367A1 (en) | 2011-06-20 | 2012-12-20 | Primal Fusion Inc. | System and method for obtaining preferences with a user interface |
US9069814B2 (en) | 2011-07-27 | 2015-06-30 | Wolfram Alpha Llc | Method and system for using natural language to generate widgets |
US9984054B2 (en) | 2011-08-24 | 2018-05-29 | Sdl Inc. | Web interface including the review and manipulation of a web document and utilizing permission based control |
US9734252B2 (en) | 2011-09-08 | 2017-08-15 | Wolfram Alpha Llc | Method and system for analyzing data using a query answering system |
US20130085946A1 (en) | 2011-10-03 | 2013-04-04 | Steven W. Lundberg | Systems, methods and user interfaces in a patent management system |
CN102937966A (en) * | 2011-10-11 | 2013-02-20 | 微软公司 | Finding and consuming related data |
US8996549B2 (en) * | 2011-10-11 | 2015-03-31 | Microsoft Technology Licensing, Llc | Recommending data based on user and data attributes |
CN102999553B (en) * | 2011-10-11 | 2016-02-24 | 微软技术许可有限责任公司 | Based on user and data attribute recommending data |
US20130091163A1 (en) * | 2011-10-11 | 2013-04-11 | Microsoft Corporation | Discovering and consuming related data |
CN103049474A (en) * | 2011-10-25 | 2013-04-17 | 微软公司 | Search query and document-related data translation |
US9501759B2 (en) * | 2011-10-25 | 2016-11-22 | Microsoft Technology Licensing, Llc | Search query and document-related data translation |
US9569439B2 (en) | 2011-10-31 | 2017-02-14 | Elwha Llc | Context-sensitive query enrichment |
US9851950B2 (en) | 2011-11-15 | 2017-12-26 | Wolfram Alpha Llc | Programming in a precise syntax using natural language |
US8793199B2 (en) | 2012-02-29 | 2014-07-29 | International Business Machines Corporation | Extraction of information from clinical reports |
CN103294661A (en) * | 2012-03-01 | 2013-09-11 | 富泰华工业(深圳)有限公司 | Language ambiguity eliminating system and method |
US9773270B2 (en) | 2012-05-11 | 2017-09-26 | Fredhopper B.V. | Method and system for recommending products based on a ranking cocktail |
US10261994B2 (en) | 2012-05-25 | 2019-04-16 | Sdl Inc. | Method and system for automatic management of reputation of translators |
EP2701087A4 (en) * | 2012-06-27 | 2014-07-09 | Rakuten Inc | Information processing device, information processing method, and information processing program |
US9405424B2 (en) | 2012-08-29 | 2016-08-02 | Wolfram Alpha, Llc | Method and system for distributing and displaying graphical items |
US10452740B2 (en) | 2012-09-14 | 2019-10-22 | Sdl Netherlands B.V. | External content libraries |
US11386186B2 (en) | 2012-09-14 | 2022-07-12 | Sdl Netherlands B.V. | External content library connector systems and methods |
US11308528B2 (en) | 2012-09-14 | 2022-04-19 | Sdl Netherlands B.V. | Blueprinting of multimedia assets |
US9916306B2 (en) | 2012-10-19 | 2018-03-13 | Sdl Inc. | Statistical linguistic analysis of source content |
US9009197B2 (en) | 2012-11-05 | 2015-04-14 | Unified Compliance Framework (Network Frontiers) | Methods and systems for a compliance framework database schema |
US9575954B2 (en) | 2012-11-05 | 2017-02-21 | Unified Compliance Framework (Network Frontiers) | Structured dictionary |
US8892597B1 (en) | 2012-12-11 | 2014-11-18 | Google Inc. | Selecting data collections to search based on the query |
CN103914476B (en) * | 2013-01-05 | 2017-02-01 | 北京百度网讯科技有限公司 | Search guiding method and search engine |
WO2014127301A2 (en) | 2013-02-14 | 2014-08-21 | 24/7 Customer, Inc. | Categorization of user interactions into predefined hierarchical categories |
US9305102B2 (en) | 2013-02-27 | 2016-04-05 | Google Inc. | Systems and methods for providing personalized search results based on prior user interactions |
US9972030B2 (en) | 2013-03-11 | 2018-05-15 | Criteo S.A. | Systems and methods for the semantic modeling of advertising creatives in targeted search advertising campaigns |
US9761225B2 (en) * | 2013-03-11 | 2017-09-12 | Nuance Communications, Inc. | Semantic re-ranking of NLU results in conversational dialogue applications |
US20140280314A1 (en) * | 2013-03-14 | 2014-09-18 | Advanced Search Laboratories, Inc. | Dimensional Articulation and Cognium Organization for Information Retrieval Systems |
US10652394B2 (en) | 2013-03-14 | 2020-05-12 | Apple Inc. | System and method for processing voicemail |
US20140379324A1 (en) * | 2013-06-20 | 2014-12-25 | Microsoft Corporation | Providing web-based alternate text options |
US10275485B2 (en) * | 2014-06-10 | 2019-04-30 | Google Llc | Retrieving context from previous sessions |
US10262060B1 (en) * | 2014-07-07 | 2019-04-16 | Clarifai, Inc. | Systems and methods for facilitating searching, labeling, and/or filtering of digital media items |
US9519635B2 (en) * | 2014-09-11 | 2016-12-13 | Automated Insights, Inc. | System and method for integrated development environments for dynamically generating narrative content |
US10460239B2 (en) * | 2014-09-16 | 2019-10-29 | International Business Machines Corporation | Generation of inferred questions for a question answering system |
KR102348084B1 (en) * | 2014-09-16 | 2022-01-10 | 삼성전자주식회사 | Image Displaying Device, Driving Method of Image Displaying Device, and Computer Readable Recording Medium |
CN105868193A (en) * | 2015-01-19 | 2016-08-17 | 富士通株式会社 | Device and method used to detect product relevant information in electronic text |
WO2016171927A1 (en) * | 2015-04-20 | 2016-10-27 | Unified Compliance Framework (Network Frontiers) | Structured dictionary |
CN104978878A (en) * | 2015-06-26 | 2015-10-14 | 苏州点通教育科技有限公司 | Microlecture teaching system and method |
US10069940B2 (en) | 2015-09-10 | 2018-09-04 | Microsoft Technology Licensing, Llc | Deployment meta-data based applicability targetting |
US9965604B2 (en) | 2015-09-10 | 2018-05-08 | Microsoft Technology Licensing, Llc | De-duplication of per-user registration data |
US10614167B2 (en) | 2015-10-30 | 2020-04-07 | Sdl Plc | Translation review workflow systems and methods |
US10229687B2 (en) * | 2016-03-10 | 2019-03-12 | Microsoft Technology Licensing, Llc | Scalable endpoint-dependent natural language understanding |
US10878191B2 (en) * | 2016-05-10 | 2020-12-29 | Nuance Communications, Inc. | Iterative ontology discovery |
US20180349354A1 (en) * | 2016-06-29 | 2018-12-06 | Intel Corporation | Natural language indexer for virtual assistants |
US10503832B2 (en) * | 2016-07-29 | 2019-12-10 | Rovi Guides, Inc. | Systems and methods for disambiguating a term based on static and temporal knowledge graphs |
CN106294645A (en) * | 2016-08-03 | 2017-01-04 | 王晓光 | Different part of speech realization method and systems in big data search |
WO2018023484A1 (en) * | 2016-08-03 | 2018-02-08 | 王晓光 | Method and system of implementing search of different parts of speech in big data |
US20180068031A1 (en) * | 2016-08-16 | 2018-03-08 | Ebay Inc. | Enhancing user queries using implicit indicators |
US10102200B2 (en) | 2016-08-25 | 2018-10-16 | International Business Machines Corporation | Predicate parses using semantic knowledge |
CN106407180B (en) * | 2016-08-30 | 2021-01-01 | 北京奇艺世纪科技有限公司 | Entity disambiguation method and device |
US10268734B2 (en) * | 2016-09-30 | 2019-04-23 | International Business Machines Corporation | Providing search results based on natural language classification confidence information |
CN108509449B (en) * | 2017-02-24 | 2022-07-08 | 腾讯科技(深圳)有限公司 | Information processing method and server |
US10546026B2 (en) | 2017-03-31 | 2020-01-28 | International Business Machines Corporation | Advanced search-term disambiguation |
CN107180087B (en) * | 2017-05-09 | 2019-11-15 | 北京奇艺世纪科技有限公司 | A kind of searching method and device |
CN107193810B (en) * | 2017-05-19 | 2020-06-23 | 北京蓦然认知科技有限公司 | Method, equipment and system for disambiguating natural language content title |
CN109271621B (en) * | 2017-07-18 | 2023-04-18 | 腾讯科技(北京)有限公司 | Semantic disambiguation processing method, device and equipment |
US10635863B2 (en) | 2017-10-30 | 2020-04-28 | Sdl Inc. | Fragment recall and adaptive automated translation |
US11941033B2 (en) * | 2017-11-27 | 2024-03-26 | Affirm, Inc. | Method and system for syntactic searching |
US10387576B2 (en) | 2017-11-30 | 2019-08-20 | International Business Machines Corporation | Document preparation with argumentation support from a deep question answering system |
US10817676B2 (en) | 2017-12-27 | 2020-10-27 | Sdl Inc. | Intelligent routing services and systems |
US10915577B2 (en) * | 2018-03-22 | 2021-02-09 | Adobe Inc. | Constructing enterprise-specific knowledge graphs |
US11799664B2 (en) * | 2018-03-26 | 2023-10-24 | Entigenlogic Llc | Verifying authenticity of content to produce knowledge |
US10838951B2 (en) | 2018-04-02 | 2020-11-17 | International Business Machines Corporation | Query interpretation disambiguation |
CN108647705B (en) * | 2018-04-23 | 2019-04-05 | 北京交通大学 | Image, semantic disambiguation method and device based on image and text semantic similarity |
CN108920497B (en) * | 2018-05-23 | 2021-10-15 | 北京奇艺世纪科技有限公司 | Man-machine interaction method and device |
US11256867B2 (en) | 2018-10-09 | 2022-02-22 | Sdl Inc. | Systems and methods of machine learning for digital assets and message creation |
US20220318213A1 (en) * | 2019-01-28 | 2022-10-06 | Entigenlogic Llc | Curing impaired content utilizing a knowledge database of entigens |
US11386130B2 (en) * | 2019-01-28 | 2022-07-12 | Entigenlogic Llc | Converting content from a first to a second aptitude level |
US11966389B2 (en) * | 2019-02-13 | 2024-04-23 | International Business Machines Corporation | Natural language to structured query generation via paraphrasing |
US10607598B1 (en) * | 2019-04-05 | 2020-03-31 | Capital One Services, Llc | Determining input data for speech processing |
CN109977418B (en) * | 2019-04-09 | 2023-03-31 | 南瑞集团有限公司 | Short text similarity measurement method based on semantic vector |
US10769379B1 (en) | 2019-07-01 | 2020-09-08 | Unified Compliance Framework (Network Frontiers) | Automatic compliance tools |
US10824817B1 (en) * | 2019-07-01 | 2020-11-03 | Unified Compliance Framework (Network Frontiers) | Automatic compliance tools for substituting authority document synonyms |
US11120227B1 (en) | 2019-07-01 | 2021-09-14 | Unified Compliance Framework (Network Frontiers) | Automatic compliance tools |
US11501065B2 (en) * | 2019-09-11 | 2022-11-15 | Oracle International Corporation | Semantic parser including a coarse semantic parser and a fine semantic parser |
US20210141929A1 (en) * | 2019-11-12 | 2021-05-13 | Pilot Travel Centers Llc | Performing actions on personal data stored in multiple databases |
CN113051898A (en) * | 2019-12-27 | 2021-06-29 | 北京阿博茨科技有限公司 | Word meaning accumulation and word segmentation method, tool and system for structured data searched by natural language |
CN111159409B (en) * | 2019-12-31 | 2023-06-02 | 腾讯科技(深圳)有限公司 | Text classification method, device, equipment and medium based on artificial intelligence |
US11651156B2 (en) * | 2020-05-07 | 2023-05-16 | Optum Technology, Inc. | Contextual document summarization with semantic intelligence |
CN111611810B (en) * | 2020-05-29 | 2023-08-04 | 河北数云堂智能科技有限公司 | Multi-tone word pronunciation disambiguation device and method |
US11941138B2 (en) * | 2020-06-04 | 2024-03-26 | Pilot Travel Centers, LLC | Data deletion and obfuscation system |
EP4205018A1 (en) | 2020-08-27 | 2023-07-05 | Unified Compliance Framework (Network Frontiers) | Automatically identifying multi-word expressions |
US11860943B2 (en) * | 2020-11-25 | 2024-01-02 | EMC IP Holding Company LLC | Method of “outcome driven data exploration” for datasets, business questions, and pipelines based on similarity mapping of business needs and asset use overlap |
US20230031040A1 (en) | 2021-07-20 | 2023-02-02 | Unified Compliance Framework (Network Frontiers) | Retrieval interface for content, such as compliance-related content |
US20230185786A1 (en) * | 2021-12-13 | 2023-06-15 | International Business Machines Corporation | Detect data standardization gaps |
US11922126B1 (en) * | 2023-07-28 | 2024-03-05 | Intuit Inc. | Use of semantic confidence metrics for uncertainty estimation in large language models |
Citations (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5083571A (en) * | 1988-04-18 | 1992-01-28 | New York University | Use of brain electrophysiological quantitative data to classify and subtype an individual into diagnostic categories by discriminant and cluster analysis |
US5251131A (en) * | 1991-07-31 | 1993-10-05 | Thinking Machines Corporation | Classification of data records by comparison of records to a training database using probability weights |
US5418717A (en) * | 1990-08-27 | 1995-05-23 | Su; Keh-Yih | Multiple score language processing system |
US5477451A (en) * | 1991-07-25 | 1995-12-19 | International Business Machines Corp. | Method and system for natural language translation |
US5510981A (en) * | 1993-10-28 | 1996-04-23 | International Business Machines Corporation | Language translation apparatus and method using context-based translation models |
US5541836A (en) * | 1991-12-30 | 1996-07-30 | At&T Corp. | Word disambiguation apparatus and methods |
US5638425A (en) * | 1992-12-17 | 1997-06-10 | Bell Atlantic Network Services, Inc. | Automated directory assistance system using word recognition and phoneme processing method |
US5761665A (en) * | 1995-10-31 | 1998-06-02 | Pitney Bowes Inc. | Method of automatic database field identification for postal coding |
US5963940A (en) * | 1995-08-16 | 1999-10-05 | Syracuse University | Natural language information retrieval system and method |
US6003027A (en) * | 1997-11-21 | 1999-12-14 | International Business Machines Corporation | System and method for determining confidence levels for the results of a categorization system |
US6006221A (en) * | 1995-08-16 | 1999-12-21 | Syracuse University | Multilingual document retrieval system and method using semantic vector matching |
US6026388A (en) * | 1995-08-16 | 2000-02-15 | Textwise, Llc | User interface and other enhancements for natural language information retrieval system and method |
US20020120437A1 (en) * | 2000-04-03 | 2002-08-29 | Xerox Corporation | Method and apparatus for reducing the intermediate alphabet occurring between cascaded finite state transducers |
US20030176931A1 (en) * | 2002-03-11 | 2003-09-18 | International Business Machines Corporation | Method for constructing segmentation-based predictive models |
US20030187587A1 (en) * | 2000-03-14 | 2003-10-02 | Mark Swindells | Database |
US20030217052A1 (en) * | 2000-08-24 | 2003-11-20 | Celebros Ltd. | Search engine method and apparatus |
US20040076139A1 (en) * | 2000-07-03 | 2004-04-22 | Kenneth Kang-Yeh | Wireless name service registry and flexible call routing and scheduling |
US20040236725A1 (en) * | 2003-05-19 | 2004-11-25 | Einat Amitay | Disambiguation of term occurrences |
US20050071333A1 (en) * | 2001-02-28 | 2005-03-31 | Mayfield James C | Method for determining synthetic term senses using reference text |
US7043492B1 (en) * | 2001-07-05 | 2006-05-09 | Requisite Technology, Inc. | Automated classification of items using classification mappings |
US7143091B2 (en) * | 2002-02-04 | 2006-11-28 | Cataphorn, Inc. | Method and apparatus for sociological data mining |
US7209875B2 (en) * | 2002-12-04 | 2007-04-24 | Microsoft Corporation | System and method for machine learning a confidence metric for machine translation |
US7249012B2 (en) * | 2002-11-20 | 2007-07-24 | Microsoft Corporation | Statistical method and apparatus for learning translation relationships among phrases |
US7403942B1 (en) * | 2003-02-04 | 2008-07-22 | Seisint, Inc. | Method and system for processing data records |
Family Cites Families (54)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5317507A (en) | 1990-11-07 | 1994-05-31 | Gallant Stephen I | Method for document retrieval and for word sense disambiguation using neural networks |
US5325298A (en) | 1990-11-07 | 1994-06-28 | Hnc, Inc. | Methods for generating or revising context vectors for a plurality of word stems |
EP0494573A1 (en) | 1991-01-08 | 1992-07-15 | International Business Machines Corporation | Method for automatically disambiguating the synonymic links in a dictionary for a natural language processing system |
IL107482A (en) | 1992-11-04 | 1998-10-30 | Conquest Software Inc | Method for resolution of natural-language queries against full-text databases |
US5873056A (en) * | 1993-10-12 | 1999-02-16 | The Syracuse University | Natural language processing system for semantic vector representation which accounts for lexical ambiguity |
US5675819A (en) | 1994-06-16 | 1997-10-07 | Xerox Corporation | Document information retrieval using global word co-occurrence patterns |
US5519786A (en) * | 1994-08-09 | 1996-05-21 | Trw Inc. | Method and apparatus for implementing a weighted voting scheme for multiple optical character recognition systems |
US5642502A (en) | 1994-12-06 | 1997-06-24 | University Of Central Florida | Method and system for searching for relevant documents from a text database collection, using statistical ranking, relevancy feedback and small pieces of text |
US5794050A (en) | 1995-01-04 | 1998-08-11 | Intelligent Text Processing, Inc. | Natural language understanding system |
US6076088A (en) | 1996-02-09 | 2000-06-13 | Paik; Woojin | Information extraction system and method using concept relation concept (CRC) triples |
US5907839A (en) | 1996-07-03 | 1999-05-25 | Yeda Reseach And Development, Co., Ltd. | Algorithm for context sensitive spelling correction |
US5953541A (en) | 1997-01-24 | 1999-09-14 | Tegic Communications, Inc. | Disambiguating system for disambiguating ambiguous input sequences by displaying objects associated with the generated input sequences in the order of decreasing frequency of use |
US6098065A (en) | 1997-02-13 | 2000-08-01 | Nortel Networks Corporation | Associative search engine |
US5996011A (en) | 1997-03-25 | 1999-11-30 | Unified Research Laboratories, Inc. | System and method for filtering data received by a computer system |
US6038560A (en) | 1997-05-21 | 2000-03-14 | Oracle Corporation | Concept knowledge base search and retrieval system |
US6138085A (en) | 1997-07-31 | 2000-10-24 | Microsoft Corporation | Inferring semantic relations |
US6078878A (en) | 1997-07-31 | 2000-06-20 | Microsoft Corporation | Bootstrapping sense characterizations of occurrences of polysemous words |
US6098033A (en) | 1997-07-31 | 2000-08-01 | Microsoft Corporation | Determining similarity between words |
US6070134A (en) | 1997-07-31 | 2000-05-30 | Microsoft Corporation | Identifying salient semantic relation paths between two words |
US6105023A (en) | 1997-08-18 | 2000-08-15 | Dataware Technologies, Inc. | System and method for filtering a document stream |
US6260008B1 (en) | 1998-01-08 | 2001-07-10 | Sharp Kabushiki Kaisha | Method of and system for disambiguating syntactic word multiples |
US6421675B1 (en) * | 1998-03-16 | 2002-07-16 | S. L. I. Systems, Inc. | Search engine |
US6092034A (en) * | 1998-07-27 | 2000-07-18 | International Business Machines Corporation | Statistical translation system and method for fast sense disambiguation and translation of large corpora using fertility models and sense models |
US6487552B1 (en) * | 1998-10-05 | 2002-11-26 | Oracle Corporation | Database fine-grained access control |
US6480843B2 (en) * | 1998-11-03 | 2002-11-12 | Nec Usa, Inc. | Supporting web-query expansion efficiently using multi-granularity indexing and query processing |
US6256629B1 (en) | 1998-11-25 | 2001-07-03 | Lucent Technologies Inc. | Method and apparatus for measuring the degree of polysemy in polysemous words |
US6189002B1 (en) | 1998-12-14 | 2001-02-13 | Dolphin Search | Process and system for retrieval of documents using context-relevant semantic profiles |
US6751606B1 (en) | 1998-12-23 | 2004-06-15 | Microsoft Corporation | System for enhancing a query interface |
US7089194B1 (en) | 1999-06-17 | 2006-08-08 | International Business Machines Corporation | Method and apparatus for providing reduced cost online service and adaptive targeting of advertisements |
US7089236B1 (en) * | 1999-06-24 | 2006-08-08 | Search 123.Com, Inc. | Search engine interface |
KR20010004404A (en) | 1999-06-28 | 2001-01-15 | 정선종 | Keyfact-based text retrieval system, keyfact-based text index method, and retrieval method using this system |
US6665665B1 (en) * | 1999-07-30 | 2003-12-16 | Verizon Laboratories Inc. | Compressed document surrogates |
US6453315B1 (en) * | 1999-09-22 | 2002-09-17 | Applied Semantics, Inc. | Meaning-based information organization and retrieval |
US6816857B1 (en) * | 1999-11-01 | 2004-11-09 | Applied Semantics, Inc. | Meaning-based advertising and document relevance determination |
US6405162B1 (en) | 1999-09-23 | 2002-06-11 | Xerox Corporation | Type-based selection of rules for semantically disambiguating words |
EP1221110A2 (en) * | 1999-09-24 | 2002-07-10 | Wordmap Limited | Apparatus for and method of searching |
US6636848B1 (en) * | 2000-05-31 | 2003-10-21 | International Business Machines Corporation | Information search using knowledge agents |
EP1170677B1 (en) | 2000-07-04 | 2009-03-18 | International Business Machines Corporation | Method and system of weighted context feedback for result improvement in information retrieval |
GB0018645D0 (en) | 2000-07-28 | 2000-09-13 | Tenara Limited | Dynamic personalization via semantic networks |
AU2001286689A1 (en) | 2000-08-24 | 2002-03-04 | Science Applications International Corporation | Word sense disambiguation |
US6766320B1 (en) | 2000-08-24 | 2004-07-20 | Microsoft Corporation | Search engine with natural language-based robust parsing for user query and relevance feedback learning |
US7174341B2 (en) * | 2001-05-31 | 2007-02-06 | Synopsys, Inc. | Dynamic database management system and method |
US7184948B2 (en) | 2001-06-15 | 2007-02-27 | Sakhr Software Company | Method and system for theme-based word sense ambiguity reduction |
US20030101182A1 (en) * | 2001-07-18 | 2003-05-29 | Omri Govrin | Method and system for smart search engine and other applications |
US7007074B2 (en) * | 2001-09-10 | 2006-02-28 | Yahoo! Inc. | Targeted advertisements using time-dependent key search terms |
US7403938B2 (en) * | 2001-09-24 | 2008-07-22 | Iac Search & Media, Inc. | Natural language query processing |
US20030078928A1 (en) * | 2001-10-23 | 2003-04-24 | Dorosario Alden | Network wide ad targeting |
US20050021397A1 (en) * | 2003-07-22 | 2005-01-27 | Cui Yingwei Claire | Content-targeted advertising using collected user behavior data |
US20030220913A1 (en) * | 2002-05-24 | 2003-11-27 | International Business Machines Corporation | Techniques for personalized and adaptive search services |
US20040117173A1 (en) * | 2002-12-18 | 2004-06-17 | Ford Daniel Alexander | Graphical feedback for semantic interpretation of text and images |
US20050033771A1 (en) * | 2003-04-30 | 2005-02-10 | Schmitter Thomas A. | Contextual advertising system |
US8856163B2 (en) * | 2003-07-28 | 2014-10-07 | Google Inc. | System and method for providing a user interface with search query broadening |
US20070073678A1 (en) * | 2005-09-23 | 2007-03-29 | Applied Linguistics, Llc | Semantic document profiling |
JP2008537225A (en) * | 2005-04-11 | 2008-09-11 | テキストディガー,インコーポレイテッド | Search system and method for queries |
Date | Country | Application number | Publication | Status |
---|---|---|---|---|
2004-08-20 | CA | CA2536265A | CA2536265C | Not active (Expired - Fee Related) |
2004-08-20 | CN | CN200480023961A | CN100580666C | Not active (Expired - Fee Related) |
2004-08-20 | US | US10/921,875 | US7509313B2 | Not active (Expired - Fee Related) |
2004-08-20 | WO | PCT/CA2004/001531 | WO2005020091A1 | Active (Application Filing) |
2004-08-20 | CA | CA002536262A | CA2536262A1 | Not active (Abandoned) |
2004-08-20 | US | US10/921,820 | US7895221B2 | Not active (Expired - Fee Related) |
2004-08-20 | US | US10/921,954 | US20050080613A1 | Not active (Abandoned) |
2004-08-20 | CN | CN2004800312332A | CN1871597B | Not active (Expired - Fee Related) |
2004-08-20 | CA | CA002536270A | CA2536270A1 | Not active (Abandoned) |
2004-08-20 | WO | PCT/CA2004/001530 | WO2005020093A1 | Active (Application Filing) |
2004-08-20 | CN | CN200480031158XA | CN1871603B | Not active (Expired - Fee Related) |
2004-08-20 | EP | EP04761694A | EP1665092A4 | Not active (Withdrawn) |
2004-08-20 | EP | EP04761695A | EP1661031A4 | Not active (Withdrawn) |
2004-08-20 | EP | EP04761693A | EP1665091A4 | Not active (Withdrawn) |
2004-08-20 | WO | PCT/CA2004/001529 | WO2005020092A1 | Active (Application Filing) |
2011-02-21 | US | US13/031,600 | US20110202563A1 | Not active (Abandoned) |
Patent Citations (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5083571A (en) * | 1988-04-18 | 1992-01-28 | New York University | Use of brain electrophysiological quantitative data to classify and subtype an individual into diagnostic categories by discriminant and cluster analysis |
US5418717A (en) * | 1990-08-27 | 1995-05-23 | Su; Keh-Yih | Multiple score language processing system |
US5805832A (en) * | 1991-07-25 | 1998-09-08 | International Business Machines Corporation | System for parametric text to text language translation |
US5477451A (en) * | 1991-07-25 | 1995-12-19 | International Business Machines Corp. | Method and system for natural language translation |
US5251131A (en) * | 1991-07-31 | 1993-10-05 | Thinking Machines Corporation | Classification of data records by comparison of records to a training database using probability weights |
US5541836A (en) * | 1991-12-30 | 1996-07-30 | At&T Corp. | Word disambiguation apparatus and methods |
US5638425A (en) * | 1992-12-17 | 1997-06-10 | Bell Atlantic Network Services, Inc. | Automated directory assistance system using word recognition and phoneme processing method |
US5510981A (en) * | 1993-10-28 | 1996-04-23 | International Business Machines Corporation | Language translation apparatus and method using context-based translation models |
US6026388A (en) * | 1995-08-16 | 2000-02-15 | Textwise, Llc | User interface and other enhancements for natural language information retrieval system and method |
US6006221A (en) * | 1995-08-16 | 1999-12-21 | Syracuse University | Multilingual document retrieval system and method using semantic vector matching |
US5963940A (en) * | 1995-08-16 | 1999-10-05 | Syracuse University | Natural language information retrieval system and method |
US5761665A (en) * | 1995-10-31 | 1998-06-02 | Pitney Bowes Inc. | Method of automatic database field identification for postal coding |
US6003027A (en) * | 1997-11-21 | 1999-12-14 | International Business Machines Corporation | System and method for determining confidence levels for the results of a categorization system |
US20030187587A1 (en) * | 2000-03-14 | 2003-10-02 | Mark Swindells | Database |
US20020120437A1 (en) * | 2000-04-03 | 2002-08-29 | Xerox Corporation | Method and apparatus for reducing the intermediate alphabet occurring between cascaded finite state transducers |
US20040076139A1 (en) * | 2000-07-03 | 2004-04-22 | Kenneth Kang-Yeh | Wireless name service registry and flexible call routing and scheduling |
US20030217052A1 (en) * | 2000-08-24 | 2003-11-20 | Celebros Ltd. | Search engine method and apparatus |
US20050071333A1 (en) * | 2001-02-28 | 2005-03-31 | Mayfield James C | Method for determining synthetic term senses using reference text |
US7043492B1 (en) * | 2001-07-05 | 2006-05-09 | Requisite Technology, Inc. | Automated classification of items using classification mappings |
US7143091B2 (en) * | 2002-02-04 | 2006-11-28 | Cataphorn, Inc. | Method and apparatus for sociological data mining |
US20030176931A1 (en) * | 2002-03-11 | 2003-09-18 | International Business Machines Corporation | Method for constructing segmentation-based predictive models |
US7249012B2 (en) * | 2002-11-20 | 2007-07-24 | Microsoft Corporation | Statistical method and apparatus for learning translation relationships among phrases |
US7209875B2 (en) * | 2002-12-04 | 2007-04-24 | Microsoft Corporation | System and method for machine learning a confidence metric for machine translation |
US7403942B1 (en) * | 2003-02-04 | 2008-07-22 | Seisint, Inc. | Method and system for processing data records |
US20040236725A1 (en) * | 2003-05-19 | 2004-11-25 | Einat Amitay | Disambiguation of term occurrences |
Cited By (430)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9646614B2 (en) | 2000-03-16 | 2017-05-09 | Apple Inc. | Fast, language-independent method for user authentication by voice |
US20040039988A1 (en) * | 2002-08-20 | 2004-02-26 | Kyu-Woong Lee | Methods and systems for implementing auto-complete in a web page |
US7185271B2 (en) * | 2002-08-20 | 2007-02-27 | Hewlett-Packard Development Company, L.P. | Methods and systems for implementing auto-complete in a web page |
US20060123104A1 (en) * | 2004-12-06 | 2006-06-08 | Bmc Software, Inc. | Generic discovery for computer networks |
US20060136585A1 (en) * | 2004-12-06 | 2006-06-22 | Bmc Software, Inc. | Resource reconciliation |
US10534577B2 (en) | 2004-12-06 | 2020-01-14 | Bmc Software, Inc. | System and method for resource reconciliation in an enterprise management system |
US10523543B2 (en) | 2004-12-06 | 2019-12-31 | Bmc Software, Inc. | Generic discovery for computer networks |
US8683032B2 (en) | 2004-12-06 | 2014-03-25 | Bmc Software, Inc. | Generic discovery for computer networks |
US9967162B2 (en) | 2004-12-06 | 2018-05-08 | Bmc Software, Inc. | Generic discovery for computer networks |
US9137115B2 (en) * | 2004-12-06 | 2015-09-15 | Bmc Software, Inc. | System and method for resource reconciliation in an enterprise management system |
US10795643B2 (en) | 2004-12-06 | 2020-10-06 | Bmc Software, Inc. | System and method for resource reconciliation in an enterprise management system |
US8650175B2 (en) | 2005-03-31 | 2014-02-11 | Google Inc. | User interface for facts query engine with snippets from information sources that include query terms and answer terms |
US9208229B2 (en) | 2005-03-31 | 2015-12-08 | Google Inc. | Anchor text summarization for corroboration |
US20070143282A1 (en) * | 2005-03-31 | 2007-06-21 | Betz Jonathan T | Anchor text summarization for corroboration |
US8682913B1 (en) | 2005-03-31 | 2014-03-25 | Google Inc. | Corroborating facts extracted from multiple sources |
US8825471B2 (en) | 2005-05-31 | 2014-09-02 | Google Inc. | Unsupervised extraction of facts |
US8996470B1 (en) | 2005-05-31 | 2015-03-31 | Google Inc. | System for ensuring the internal consistency of a fact repository |
US9558186B2 (en) | 2005-05-31 | 2017-01-31 | Google Inc. | Unsupervised extraction of facts |
US20070150800A1 (en) * | 2005-05-31 | 2007-06-28 | Betz Jonathan T | Unsupervised extraction of facts |
US20070005206A1 (en) * | 2005-07-01 | 2007-01-04 | You Zhang | Automobile interface |
US7826945B2 (en) * | 2005-07-01 | 2010-11-02 | You Zhang | Automobile speech-recognition interface |
US9905223B2 (en) | 2005-08-27 | 2018-02-27 | Nuance Communications, Inc. | System and method for using semantic and syntactic graphs for utterance classification |
US8700404B1 (en) * | 2005-08-27 | 2014-04-15 | At&T Intellectual Property Ii, L.P. | System and method for using semantic and syntactic graphs for utterance classification |
US9218810B2 (en) | 2005-08-27 | 2015-12-22 | At&T Intellectual Property Ii, L.P. | System and method for using semantic and syntactic graphs for utterance classification |
US10318871B2 (en) | 2005-09-08 | 2019-06-11 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
JP2009510639A (en) * | 2005-10-04 | 2009-03-12 | トムソン グローバル リソーシーズ | System, method and software for determining ambiguity of medical terms |
JP2011233162A (en) * | 2005-10-04 | 2011-11-17 | Thomson Reuters Global Resources | System, method, and software for assessing ambiguity of medical terms |
US7681147B2 (en) * | 2005-12-13 | 2010-03-16 | Yahoo! Inc. | System for determining probable meanings of inputted words |
US20070136689A1 (en) * | 2005-12-13 | 2007-06-14 | David Richardson-Bunbury | System for determining probable meanings of inputted words |
US9092495B2 (en) | 2006-01-27 | 2015-07-28 | Google Inc. | Automatic object reference identification and linking in a browseable fact repository |
US8682891B2 (en) | 2006-02-17 | 2014-03-25 | Google Inc. | Automatic object reference identification and linking in a browseable fact repository |
US8260785B2 (en) | 2006-02-17 | 2012-09-04 | Google Inc. | Automatic object reference identification and linking in a browseable fact repository |
US20080016040A1 (en) * | 2006-07-14 | 2008-01-17 | Chacha Search Inc. | Method and system for qualifying keywords in query strings |
US8255383B2 (en) | 2006-07-14 | 2012-08-28 | Chacha Search, Inc | Method and system for qualifying keywords in query strings |
US20080056575A1 (en) * | 2006-08-30 | 2008-03-06 | Bradley Jeffery Behm | Method and system for automatically classifying page images |
US9594833B2 (en) | 2006-08-30 | 2017-03-14 | Amazon Technologies, Inc. | Automatically classifying page images |
US8306326B2 (en) * | 2006-08-30 | 2012-11-06 | Amazon Technologies, Inc. | Method and system for automatically classifying page images |
US9117447B2 (en) | 2006-09-08 | 2015-08-25 | Apple Inc. | Using event alert text as input to an automated assistant |
US8930191B2 (en) | 2006-09-08 | 2015-01-06 | Apple Inc. | Paraphrasing of user requests and results by automated digital assistant |
US8942986B2 (en) | 2006-09-08 | 2015-01-27 | Apple Inc. | Determining user intent based on ontologies of domains |
US20080071533A1 (en) * | 2006-09-14 | 2008-03-20 | Intervoice Limited Partnership | Automatic generation of statistical language models for interactive voice response applications |
US8214199B2 (en) | 2006-10-10 | 2012-07-03 | Abbyy Software, Ltd. | Systems for translating sentences between languages using language-independent semantic structures and ratings of syntactic constructions |
US9235573B2 (en) | 2006-10-10 | 2016-01-12 | Abbyy Infopoisk Llc | Universal difference measure |
US9047275B2 (en) | 2006-10-10 | 2015-06-02 | Abbyy Infopoisk Llc | Methods and systems for alignment of parallel text corpora |
US9645993B2 (en) * | 2006-10-10 | 2017-05-09 | Abbyy Infopoisk Llc | Method and system for semantic searching |
US20090182549A1 (en) * | 2006-10-10 | 2009-07-16 | Konstantin Anisimovich | Deep Model Statistics Method for Machine Translation |
US20080086298A1 (en) * | 2006-10-10 | 2008-04-10 | Anisimovich Konstantin | Method and system for translating sentences between langauges |
US9984071B2 (en) | 2006-10-10 | 2018-05-29 | Abbyy Production Llc | Language ambiguity detection of text |
US20140114649A1 (en) * | 2006-10-10 | 2014-04-24 | Abbyy Infopoisk Llc | Method and system for semantic searching |
US8145473B2 (en) | 2006-10-10 | 2012-03-27 | Abbyy Software Ltd. | Deep model statistics method for machine translation |
US20080086299A1 (en) * | 2006-10-10 | 2008-04-10 | Anisimovich Konstantin | Method and system for translating sentences between languages |
US9817818B2 (en) | 2006-10-10 | 2017-11-14 | Abbyy Production Llc | Method and system for translating sentence between languages based on semantic structure of the sentence |
US8195447B2 (en) | 2006-10-10 | 2012-06-05 | Abbyy Software Ltd. | Translating sentences between languages using language-independent semantic structures and ratings of syntactic constructions |
US20080086300A1 (en) * | 2006-10-10 | 2008-04-10 | Anisimovich Konstantin | Method and system for translating sentences between languages |
US20090070099A1 (en) * | 2006-10-10 | 2009-03-12 | Konstantin Anisimovich | Method for translating documents from one language into another using a database of translations, a terminology dictionary, a translation dictionary, and a machine translation system |
US8548795B2 (en) | 2006-10-10 | 2013-10-01 | Abbyy Software Ltd. | Method for translating documents from one language into another using a database of translations, a terminology dictionary, a translation dictionary, and a machine translation system |
US9633005B2 (en) | 2006-10-10 | 2017-04-25 | Abbyy Infopoisk Llc | Exhaustive automatic processing of textual information |
US8442810B2 (en) | 2006-10-10 | 2013-05-14 | Abbyy Software Ltd. | Deep model statistics method for machine translation |
US9323747B2 (en) | 2006-10-10 | 2016-04-26 | Abbyy Infopoisk Llc | Deep model statistics method for machine translation |
US8412513B2 (en) | 2006-10-10 | 2013-04-02 | Abbyy Software Ltd. | Deep model statistics method for machine translation |
US8805676B2 (en) | 2006-10-10 | 2014-08-12 | Abbyy Infopoisk Llc | Deep model statistics method for machine translation |
US8918309B2 (en) | 2006-10-10 | 2014-12-23 | Abbyy Infopoisk Llc | Deep model statistics method for machine translation |
US8892418B2 (en) | 2006-10-10 | 2014-11-18 | Abbyy Infopoisk Llc | Translating sentences between languages |
US9760570B2 (en) | 2006-10-20 | 2017-09-12 | Google Inc. | Finding and disambiguating references to entities on web pages |
US8751498B2 (en) | 2006-10-20 | 2014-06-10 | Google Inc. | Finding and disambiguating references to entities on web pages |
US8122026B1 (en) * | 2006-10-20 | 2012-02-21 | Google Inc. | Finding and disambiguating references to entities on web pages |
US20110040553A1 (en) * | 2006-11-13 | 2011-02-17 | Sellon Sasivarman | Natural language processing |
US8131546B1 (en) * | 2007-01-03 | 2012-03-06 | Stored Iq, Inc. | System and method for adaptive sentence boundary disambiguation |
EP2115630A4 (en) * | 2007-01-04 | 2016-08-17 | Thinking Solutions Pty Ltd | Linguistic analysis |
US9093073B1 (en) * | 2007-02-12 | 2015-07-28 | West Corporation | Automatic speech recognition tagging |
WO2008100849A3 (en) * | 2007-02-15 | 2009-12-30 | Cycorp, Inc. | Semantics-based method and system for document analysis |
US9772992B2 (en) * | 2007-02-26 | 2017-09-26 | Microsoft Technology Licensing, Llc | Automatic disambiguation based on a reference resource |
US20120102045A1 (en) * | 2007-02-26 | 2012-04-26 | Microsoft Corporation | Automatic disambiguation based on a reference resource |
US8112402B2 (en) | 2007-02-26 | 2012-02-07 | Microsoft Corporation | Automatic disambiguation based on a reference resource |
US20080208864A1 (en) * | 2007-02-26 | 2008-08-28 | Microsoft Corporation | Automatic disambiguation based on a reference resource |
US8347202B1 (en) | 2007-03-14 | 2013-01-01 | Google Inc. | Determining geographic locations for place names in a fact repository |
US9892132B2 (en) | 2007-03-14 | 2018-02-13 | Google Llc | Determining geographic locations for place names in a fact repository |
US9934313B2 (en) | 2007-03-14 | 2018-04-03 | Fiver Llc | Query templates and labeled search tip system, methods and techniques |
US8959011B2 (en) | 2007-03-22 | 2015-02-17 | Abbyy Infopoisk Llc | Indicating and correcting errors in machine translation systems |
US9772998B2 (en) | 2007-03-22 | 2017-09-26 | Abbyy Production Llc | Indicating and correcting errors in machine translation systems |
US10568032B2 (en) | 2007-04-03 | 2020-02-18 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US20100042401A1 (en) * | 2007-05-20 | 2010-02-18 | Ascoli Giorgio A | Semantic Cognitive Map |
US8190422B2 (en) * | 2007-05-20 | 2012-05-29 | George Mason Intellectual Properties, Inc. | Semantic cognitive map |
US9239826B2 (en) | 2007-06-27 | 2016-01-19 | Abbyy Infopoisk Llc | Method and system for generating new entries in natural language dictionary |
US7970766B1 (en) | 2007-07-23 | 2011-06-28 | Google Inc. | Entity type assignment |
US20090089047A1 (en) * | 2007-08-31 | 2009-04-02 | Powerset, Inc. | Natural Language Hypernym Weighting For Word Sense Disambiguation |
US8463593B2 (en) * | 2007-08-31 | 2013-06-11 | Microsoft Corporation | Natural language hypernym weighting for word sense disambiguation |
EP2206057A4 (en) * | 2007-10-17 | 2017-04-05 | VCVC lll LLC | Nlp-based entity recognition and disambiguation |
EP2206057A1 (en) * | 2007-10-17 | 2010-07-14 | Evri Inc. | Nlp-based entity recognition and disambiguation |
US10282389B2 (en) | 2007-10-17 | 2019-05-07 | Fiver Llc | NLP-based entity recognition and disambiguation |
WO2009052277A1 (en) | 2007-10-17 | 2009-04-23 | Evri, Inc. | Nlp-based entity recognition and disambiguation |
US8812435B1 (en) | 2007-11-16 | 2014-08-19 | Google Inc. | Learning objects and facts from documents |
US20090157384A1 (en) * | 2007-12-12 | 2009-06-18 | Microsoft Corporation | Semi-supervised part-of-speech tagging |
US8275607B2 (en) | 2007-12-12 | 2012-09-25 | Microsoft Corporation | Semi-supervised part-of-speech tagging |
US10381016B2 (en) | 2008-01-03 | 2019-08-13 | Apple Inc. | Methods and apparatus for altering audio output signals |
US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US20090234638A1 (en) * | 2008-03-14 | 2009-09-17 | Microsoft Corporation | Use of a Speech Grammar to Recognize Instant Message Input |
US9626955B2 (en) | 2008-04-05 | 2017-04-18 | Apple Inc. | Intelligent text-to-speech conversion |
US9865248B2 (en) | 2008-04-05 | 2018-01-09 | Apple Inc. | Intelligent text-to-speech conversion |
US20090307003A1 (en) * | 2008-05-16 | 2009-12-10 | Daniel Benyamin | Social advertisement network |
US20090326922A1 (en) * | 2008-06-30 | 2009-12-31 | International Business Machines Corporation | Client side reconciliation of typographical errors in messages from input-limited devices |
US9535906B2 (en) | 2008-07-31 | 2017-01-03 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US10108612B2 (en) | 2008-07-31 | 2018-10-23 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US9262409B2 (en) | 2008-08-06 | 2016-02-16 | Abbyy Infopoisk Llc | Translation of a selected text fragment of a screen |
US20120166414A1 (en) * | 2008-08-11 | 2012-06-28 | Ultra Unilimited Corporation (dba Publish) | Systems and methods for relevance scoring |
US20100082657A1 (en) * | 2008-09-23 | 2010-04-01 | Microsoft Corporation | Generating synonyms based on query log data |
US9092517B2 (en) | 2008-09-23 | 2015-07-28 | Microsoft Technology Licensing, Llc | Generating synonyms based on query log data |
US20110231183A1 (en) * | 2008-11-28 | 2011-09-22 | Nec Corporation | Language model creation device |
US9043209B2 (en) * | 2008-11-28 | 2015-05-26 | Nec Corporation | Language model creation device |
US9959870B2 (en) * | 2008-12-11 | 2018-05-01 | Apple Inc. | Speech recognition involving a mobile device |
US20110307254A1 (en) * | 2008-12-11 | 2011-12-15 | Melvyn Hunt | Speech recognition involving a mobile device |
US20180218735A1 (en) * | 2008-12-11 | 2018-08-02 | Apple Inc. | Speech recognition involving a mobile device |
US20100161577A1 (en) * | 2008-12-19 | 2010-06-24 | Bmc Software, Inc. | Method of Reconciling Resources in the Metadata Hierarchy |
US10831724B2 (en) | 2008-12-19 | 2020-11-10 | Bmc Software, Inc. | Method of reconciling resources in the metadata hierarchy |
US20100250250A1 (en) * | 2009-03-30 | 2010-09-30 | Jonathan Wiggs | Systems and methods for generating a hybrid text string from two or more text strings generated by multiple automated speech recognition systems |
US8712774B2 (en) * | 2009-03-30 | 2014-04-29 | Nuance Communications, Inc. | Systems and methods for generating a hybrid text string from two or more text strings generated by multiple automated speech recognition systems |
US20100293179A1 (en) * | 2009-05-14 | 2010-11-18 | Microsoft Corporation | Identifying synonyms of entities using web search |
US20100293170A1 (en) * | 2009-05-15 | 2010-11-18 | Citizennet Inc. | Social network message categorization systems and methods |
US8504550B2 (en) * | 2009-05-15 | 2013-08-06 | Citizennet Inc. | Social network message categorization systems and methods |
US20100313258A1 (en) * | 2009-06-04 | 2010-12-09 | Microsoft Corporation | Identifying synonyms of entities using a document collection |
US8533203B2 (en) * | 2009-06-04 | 2013-09-10 | Microsoft Corporation | Identifying synonyms of entities using a document collection |
US10475446B2 (en) | 2009-06-05 | 2019-11-12 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US10795541B2 (en) | 2009-06-05 | 2020-10-06 | Apple Inc. | Intelligent organization of tasks items |
US11080012B2 (en) | 2009-06-05 | 2021-08-03 | Apple Inc. | Interface for a virtual digital assistant |
US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US10283110B2 (en) | 2009-07-02 | 2019-05-07 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
TWI412277B (en) * | 2009-08-10 | 2013-10-11 | Univ Nat Cheng Kung | Video summarization method based on mining the story-structure and semantic relations among concept entities |
US20110047149A1 (en) * | 2009-08-21 | 2011-02-24 | Vaeaenaenen Mikko | Method and means for data searching and language translation |
US9953092B2 (en) | 2009-08-21 | 2018-04-24 | Mikko Vaananen | Method and means for data searching and language translation |
US20110093455A1 (en) * | 2009-10-21 | 2011-04-21 | Citizennet Inc. | Search and retrieval methods and systems of short messages utilizing messaging context and keyword frequency |
US8380697B2 (en) | 2009-10-21 | 2013-02-19 | Citizennet Inc. | Search and retrieval methods and systems of short messages utilizing messaging context and keyword frequency |
US8554854B2 (en) | 2009-12-11 | 2013-10-08 | Citizennet Inc. | Systems and methods for identifying terms relevant to web pages using social network messages |
US20110153595A1 (en) * | 2009-12-23 | 2011-06-23 | Palo Alto Research Center Incorporated | System And Method For Identifying Topics For Short Text Communications |
US8725717B2 (en) * | 2009-12-23 | 2014-05-13 | Palo Alto Research Center Incorporated | System and method for identifying topics for short text communications |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US8892446B2 (en) | 2010-01-18 | 2014-11-18 | Apple Inc. | Service orchestration for intelligent automated assistant |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
US9548050B2 (en) | 2010-01-18 | 2017-01-17 | Apple Inc. | Intelligent automated assistant |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
US8903716B2 (en) | 2010-01-18 | 2014-12-02 | Apple Inc. | Personalized vocabulary for digital assistant |
US12087308B2 (en) | 2010-01-18 | 2024-09-10 | Apple Inc. | Intelligent automated assistant |
US10706841B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Task flow identification based on user intent |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10049675B2 (en) | 2010-02-25 | 2018-08-14 | Apple Inc. | User profiling for voice input processing |
US9633660B2 (en) | 2010-02-25 | 2017-04-25 | Apple Inc. | User profiling for voice input processing |
US8712979B2 (en) | 2010-03-26 | 2014-04-29 | Bmc Software, Inc. | Statistical identification of instances during reconciliation process |
US10877974B2 (en) | 2010-03-26 | 2020-12-29 | Bmc Software, Inc. | Statistical identification of instances during reconciliation process |
US10198476B2 (en) | 2010-03-26 | 2019-02-05 | Bmc Software, Inc. | Statistical identification of instances during reconciliation process |
US9323801B2 (en) | 2010-03-26 | 2016-04-26 | Bmc Software, Inc. | Statistical identification of instances during reconciliation process |
US20110238637A1 (en) * | 2010-03-26 | 2011-09-29 | Bmc Software, Inc. | Statistical Identification of Instances During Reconciliation Process |
US20110246462A1 (en) * | 2010-03-30 | 2011-10-06 | International Business Machines Corporation | Method and System for Prompting Changes of Electronic Document Content |
US10331783B2 (en) | 2010-03-30 | 2019-06-25 | Fiver Llc | NLP-based systems and methods for providing quotations |
US10482106B2 (en) * | 2010-05-14 | 2019-11-19 | Salesforce.Com, Inc. | Querying a database using relationship metadata |
US9600566B2 (en) | 2010-05-14 | 2017-03-21 | Microsoft Technology Licensing, Llc | Identifying entity synonyms |
US20160019287A1 (en) * | 2010-05-14 | 2016-01-21 | Salesforce.Com, Inc. | Querying a database using relationship metadata |
US20110289025A1 (en) * | 2010-05-19 | 2011-11-24 | Microsoft Corporation | Learning user intent from rule-based training data |
US8719006B2 (en) | 2010-08-27 | 2014-05-06 | Apple Inc. | Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis |
US9135666B2 (en) | 2010-10-19 | 2015-09-15 | CitizenNet, Inc. | Generation of advertising targeting information based upon affinity information obtained from an online social network |
US8615434B2 (en) | 2010-10-19 | 2013-12-24 | Citizennet Inc. | Systems and methods for automatically generating campaigns using advertising targeting information based upon affinity information obtained from an online social network |
US8612293B2 (en) | 2010-10-19 | 2013-12-17 | Citizennet Inc. | Generation of advertising targeting information based upon affinity information obtained from an online social network |
US10049150B2 (en) | 2010-11-01 | 2018-08-14 | Fiver Llc | Category-based content recommendation |
US8521517B2 (en) * | 2010-12-13 | 2013-08-27 | Google Inc. | Providing definitions that are sensitive to the context of a text |
US8645364B2 (en) | 2010-12-13 | 2014-02-04 | Google Inc. | Providing definitions that are sensitive to the context of a text |
US10762293B2 (en) | 2010-12-22 | 2020-09-01 | Apple Inc. | Using parts-of-speech tagging and named entity recognition for spelling correction |
US20120209609A1 (en) * | 2011-02-14 | 2012-08-16 | General Motors Llc | User-specific confidence thresholds for speech recognition |
US8639508B2 (en) * | 2011-02-14 | 2014-01-28 | General Motors Llc | User-specific confidence thresholds for speech recognition |
US20120239381A1 (en) * | 2011-03-17 | 2012-09-20 | Sap Ag | Semantic phrase suggestion engine |
US9311296B2 (en) | 2011-03-17 | 2016-04-12 | Sap Se | Semantic phrase suggestion engine |
CN102682042A (en) * | 2011-03-18 | 2012-09-19 | 日电(中国)有限公司 | Concept identifying device and method |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US10102359B2 (en) | 2011-03-21 | 2018-10-16 | Apple Inc. | Device access using voice authentication |
US9063927B2 (en) | 2011-04-06 | 2015-06-23 | Citizennet Inc. | Short message age classification |
US10127296B2 (en) | 2011-04-07 | 2018-11-13 | Bmc Software, Inc. | Cooperative naming for configuration items in a distributed configuration management database environment |
US10740352B2 (en) | 2011-04-07 | 2020-08-11 | Bmc Software, Inc. | Cooperative naming for configuration items in a distributed configuration management database environment |
US11514076B2 (en) | 2011-04-07 | 2022-11-29 | Bmc Software, Inc. | Cooperative naming for configuration items in a distributed configuration management database environment |
US9262402B2 (en) * | 2011-05-10 | 2016-02-16 | Nec Corporation | Device, method and program for assessing synonymous expressions |
US20140343922A1 (en) * | 2011-05-10 | 2014-11-20 | Nec Corporation | Device, method and program for assessing synonymous expressions |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US10068022B2 (en) | 2011-06-03 | 2018-09-04 | Google Llc | Identifying topical entities |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US10706373B2 (en) | 2011-06-03 | 2020-07-07 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US9002892B2 (en) | 2011-08-07 | 2015-04-07 | CitizenNet, Inc. | Systems and methods for trend detection using frequency analysis |
US9223777B2 (en) | 2011-08-25 | 2015-12-29 | Sap Se | Self-learning semantic search engine |
US8935230B2 (en) | 2011-08-25 | 2015-01-13 | Sap Se | Self-learning semantic search engine |
US9798393B2 (en) | 2011-08-29 | 2017-10-24 | Apple Inc. | Text correction processing |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US9269353B1 (en) * | 2011-12-07 | 2016-02-23 | Manu Rehani | Methods and systems for measuring semantics in communications |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US8745019B2 (en) | 2012-03-05 | 2014-06-03 | Microsoft Corporation | Robust discovery of entity synonyms using query logs |
US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
US20150006155A1 (en) * | 2012-03-07 | 2015-01-01 | Mitsubishi Electric Corporation | Device, method, and program for word sense estimation |
US9053497B2 (en) | 2012-04-27 | 2015-06-09 | CitizenNet, Inc. | Systems and methods for targeting advertising to groups with strong ties within an online social network |
US8989485B2 (en) | 2012-04-27 | 2015-03-24 | Abbyy Development Llc | Detecting a junction in a text line of CJK characters |
US8971630B2 (en) | 2012-04-27 | 2015-03-03 | Abbyy Development Llc | Fast CJK character recognition |
US9390707B2 (en) | 2012-05-03 | 2016-07-12 | International Business Machines Corporation | Automatic accuracy estimation for audio transcriptions |
US9275636B2 (en) * | 2012-05-03 | 2016-03-01 | International Business Machines Corporation | Automatic accuracy estimation for audio transcriptions |
US20170116979A1 (en) * | 2012-05-03 | 2017-04-27 | International Business Machines Corporation | Automatic accuracy estimation for audio transcriptions |
US10002606B2 (en) * | 2012-05-03 | 2018-06-19 | International Business Machines Corporation | Automatic accuracy estimation for audio transcriptions |
US9570068B2 (en) * | 2012-05-03 | 2017-02-14 | International Business Machines Corporation | Automatic accuracy estimation for audio transcriptions |
US20130297290A1 (en) * | 2012-05-03 | 2013-11-07 | International Business Machines Corporation | Automatic accuracy estimation for audio transcriptions |
US20160284342A1 (en) * | 2012-05-03 | 2016-09-29 | International Business Machines Corporation | Automatic accuracy estimation for audio transcriptions |
US10170102B2 (en) * | 2012-05-03 | 2019-01-01 | International Business Machines Corporation | Automatic accuracy estimation for audio transcriptions |
US9892725B2 (en) * | 2012-05-03 | 2018-02-13 | International Business Machines Corporation | Automatic accuracy estimation for audio transcriptions |
US9953088B2 (en) | 2012-05-14 | 2018-04-24 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US10079014B2 (en) | 2012-06-08 | 2018-09-18 | Apple Inc. | Name recognition system |
US10032131B2 (en) | 2012-06-20 | 2018-07-24 | Microsoft Technology Licensing, Llc | Data services for enterprises leveraging search system data assets |
US9594831B2 (en) | 2012-06-22 | 2017-03-14 | Microsoft Technology Licensing, Llc | Targeted disambiguation of named entities |
US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
US9305103B2 (en) * | 2012-07-03 | 2016-04-05 | Yahoo! Inc. | Method or system for semantic categorization |
US12032643B2 (en) | 2012-07-20 | 2024-07-09 | Veveo, Inc. | Method of and system for inferring user intent in search input in a conversational interaction system |
JP2022071194A (en) * | 2012-07-31 | 2022-05-13 | ベベオ, インコーポレイテッド | Cancellation of ambiguity in user's intention in conversation type interaction |
JP7371155B2 (en) | 2012-07-31 | 2023-10-30 | ベベオ, インコーポレイテッド | Disambiguating user intent in conversational interactions |
US11847151B2 (en) | 2012-07-31 | 2023-12-19 | Veveo, Inc. | Disambiguating user intent in conversational interaction system for large corpus information retrieval |
US9229924B2 (en) | 2012-08-24 | 2016-01-05 | Microsoft Technology Licensing, Llc | Word detection and domain dictionary recommendation |
US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US9971774B2 (en) | 2012-09-19 | 2018-05-15 | Apple Inc. | Voice-based media searching |
WO2014074317A1 (en) * | 2012-11-08 | 2014-05-15 | Evernote Corporation | Extraction and clarification of ambiguities for addresses in documents |
US20140156703A1 (en) * | 2012-11-30 | 2014-06-05 | Altera Corporation | Method and apparatus for translating graphical symbols into query keywords |
US9772995B2 (en) | 2012-12-27 | 2017-09-26 | Abbyy Development Llc | Finding an appropriate meaning of an entry in a text |
WO2014104943A1 (en) * | 2012-12-27 | 2014-07-03 | Abbyy Development Llc | Finding an appropriate meaning of an entry in a text |
US10199051B2 (en) | 2013-02-07 | 2019-02-05 | Apple Inc. | Voice trigger for a digital assistant |
US10978090B2 (en) | 2013-02-07 | 2021-04-13 | Apple Inc. | Voice trigger for a digital assistant |
US9852655B2 (en) | 2013-02-15 | 2017-12-26 | Voxy, Inc. | Systems and methods for extracting keywords in language learning |
US10438509B2 (en) | 2013-02-15 | 2019-10-08 | Voxy, Inc. | Language learning systems and methods |
US9875669B2 (en) | 2013-02-15 | 2018-01-23 | Voxy, Inc. | Systems and methods for generating distractors in language learning |
US9666098B2 (en) * | 2013-02-15 | 2017-05-30 | Voxy, Inc. | Language learning systems and methods |
US10325517B2 (en) | 2013-02-15 | 2019-06-18 | Voxy, Inc. | Systems and methods for extracting keywords in language learning |
US20140342320A1 (en) * | 2013-02-15 | 2014-11-20 | Voxy, Inc. | Language learning systems and methods |
US10410539B2 (en) | 2013-02-15 | 2019-09-10 | Voxy, Inc. | Systems and methods for calculating text difficulty |
US10720078B2 (en) | 2013-02-15 | 2020-07-21 | Voxy, Inc. | Systems and methods for extracting keywords in language learning |
US9711064B2 (en) | 2013-02-15 | 2017-07-18 | Voxy, Inc. | Systems and methods for calculating text difficulty |
US10147336B2 (en) | 2013-02-15 | 2018-12-04 | Voxy, Inc. | Systems and methods for generating distractors in language learning |
US9852165B2 (en) | 2013-03-14 | 2017-12-26 | Bmc Software, Inc. | Storing and retrieving context sensitive data in a management system |
US9158799B2 (en) | 2013-03-14 | 2015-10-13 | Bmc Software, Inc. | Storing and retrieving context sensitive data in a management system |
US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
US9697822B1 (en) | 2013-03-15 | 2017-07-04 | Apple Inc. | System and method for updating an adaptive speech recognition model |
US9922642B2 (en) | 2013-03-15 | 2018-03-20 | Apple Inc. | Training an at least partial voice command system |
US10152538B2 (en) | 2013-05-06 | 2018-12-11 | Dropbox, Inc. | Suggested search based on a content item |
US20210201932A1 (en) * | 2013-05-07 | 2021-07-01 | Veveo, Inc. | Method of and system for real time feedback in an incremental speech input interface |
US9966060B2 (en) | 2013-06-07 | 2018-05-08 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9633674B2 (en) | 2013-06-07 | 2017-04-25 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
US9620104B2 (en) | 2013-06-07 | 2017-04-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9966068B2 (en) | 2013-06-08 | 2018-05-08 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US10657961B2 (en) | 2013-06-08 | 2020-05-19 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US10185542B2 (en) | 2013-06-09 | 2019-01-22 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US9300784B2 (en) | 2013-06-13 | 2016-03-29 | Apple Inc. | System and method for emergency calls initiated by voice command |
US9582490B2 (en) | 2013-07-12 | 2017-02-28 | Microsoft Technology Licensing, LLC | Active labeling for computer-human interactive learning |
US9489373B2 (en) | 2013-07-12 | 2016-11-08 | Microsoft Technology Licensing, Llc | Interactive segment extraction in computer-human interactive learning |
US9355088B2 (en) * | 2013-07-12 | 2016-05-31 | Microsoft Technology Licensing, Llc | Feature completion in computer-human interactive learning |
US9430460B2 (en) | 2013-07-12 | 2016-08-30 | Microsoft Technology Licensing, Llc | Active featuring in computer-human interactive learning |
CN105393263A (en) * | 2013-07-12 | 2016-03-09 | 微软技术许可有限责任公司 | Feature completion in computer-human interactive learning |
US9779081B2 (en) | 2013-07-12 | 2017-10-03 | Microsoft Technology Licensing, Llc | Feature completion in computer-human interactive learning |
US20150019204A1 (en) * | 2013-07-12 | 2015-01-15 | Microsoft Corporation | Feature completion in computer-human interactive learning |
US11023677B2 (en) | 2013-07-12 | 2021-06-01 | Microsoft Technology Licensing, Llc | Interactive feature selection for training a machine learning system and displaying discrepancies within the context of the document |
US10372815B2 (en) | 2013-07-12 | 2019-08-06 | Microsoft Technology Licensing, Llc | Interactive concept editing in computer-human interactive learning |
US10791216B2 (en) | 2013-08-06 | 2020-09-29 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
US9740682B2 (en) | 2013-12-19 | 2017-08-22 | Abbyy Infopoisk Llc | Semantic disambiguation using a statistical analysis |
US9626353B2 (en) | 2014-01-15 | 2017-04-18 | Abbyy Infopoisk Llc | Arc filtering in a syntactic graph |
US9620105B2 (en) | 2014-05-15 | 2017-04-11 | Apple Inc. | Analyzing audio input for efficient speech and music recognition |
US10592095B2 (en) | 2014-05-23 | 2020-03-17 | Apple Inc. | Instantaneous speaking of content on touch devices |
US9502031B2 (en) | 2014-05-27 | 2016-11-22 | Apple Inc. | Method for supporting dynamic grammars in WFST-based ASR |
US10289433B2 (en) | 2014-05-30 | 2019-05-14 | Apple Inc. | Domain specific language for encoding assistant dialog |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US9966065B2 (en) | 2014-05-30 | 2018-05-08 | Apple Inc. | Multi-command single utterance input method |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
US10083690B2 (en) | 2014-05-30 | 2018-09-25 | Apple Inc. | Better resolution when referencing to concepts |
US10169329B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Exemplar-based natural language processing |
US11257504B2 (en) | 2014-05-30 | 2022-02-22 | Apple Inc. | Intelligent assistant for home automation |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US9734193B2 (en) | 2014-05-30 | 2017-08-15 | Apple Inc. | Determining domain salience ranking from ambiguous words in natural speech |
US10497365B2 (en) | 2014-05-30 | 2019-12-03 | Apple Inc. | Multi-command single utterance input method |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US20160232232A1 (en) * | 2014-06-26 | 2016-08-11 | International Business Machines Corporation | Mining product aspects from opinion text |
US10282467B2 (en) * | 2014-06-26 | 2019-05-07 | International Business Machines Corporation | Mining product aspects from opinion text |
US20150379090A1 (en) * | 2014-06-26 | 2015-12-31 | International Business Machines Corporation | Mining product aspects from opinion text |
US10904611B2 (en) | 2014-06-30 | 2021-01-26 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US9668024B2 (en) | 2014-06-30 | 2017-05-30 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US20160012020A1 (en) * | 2014-07-14 | 2016-01-14 | Samsung Electronics Co., Ltd. | Method and system for robust tagging of named entities in the presence of source or translation errors |
US10073673B2 (en) * | 2014-07-14 | 2018-09-11 | Samsung Electronics Co., Ltd. | Method and system for robust tagging of named entities in the presence of source or translation errors |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
US9858506B2 (en) | 2014-09-02 | 2018-01-02 | Abbyy Development Llc | Methods and systems for processing of images of mathematical expressions |
US10431204B2 (en) | 2014-09-11 | 2019-10-01 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US9606986B2 (en) | 2014-09-29 | 2017-03-28 | Apple Inc. | Integrated word N-gram and class M-gram language models |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US9986419B2 (en) | 2014-09-30 | 2018-05-29 | Apple Inc. | Social reminders |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US20160147737A1 (en) * | 2014-11-20 | 2016-05-26 | Electronics And Telecommunications Research Institute | Question answering system and method for structured knowledgebase using deep natural language question analysis |
US9633006B2 (en) * | 2014-11-20 | 2017-04-25 | Electronics And Telecommunications Research Institute | Question answering system and method for structured knowledgebase using deep natural language question analysis |
US9626358B2 (en) | 2014-11-26 | 2017-04-18 | Abbyy Infopoisk Llc | Creating ontologies by analyzing natural language texts |
US11556230B2 (en) | 2014-12-02 | 2023-01-17 | Apple Inc. | Data detection |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
US9711141B2 (en) | 2014-12-09 | 2017-07-18 | Apple Inc. | Disambiguating heteronyms in speech synthesis |
US11106871B2 (en) | 2015-01-23 | 2021-08-31 | Conversica, Inc. | Systems and methods for configurable messaging response-action engine |
US11301632B2 (en) | 2015-01-23 | 2022-04-12 | Conversica, Inc. | Systems and methods for natural language processing and classification |
US11100285B2 (en) | 2015-01-23 | 2021-08-24 | Conversica, Inc. | Systems and methods for configurable messaging with feature extraction |
US11663409B2 (en) | 2015-01-23 | 2023-05-30 | Conversica, Inc. | Systems and methods for training machine learning models using active learning |
US11551188B2 (en) | 2015-01-23 | 2023-01-10 | Conversica, Inc. | Systems and methods for improved automated conversations with attendant actions |
RU2710966C2 (en) * | 2015-01-23 | 2020-01-14 | МАЙКРОСОФТ ТЕКНОЛОДЖИ ЛАЙСЕНСИНГ, ЭлЭлСи | Methods for understanding incomplete natural language query |
US11042910B2 (en) * | 2015-01-23 | 2021-06-22 | Conversica, Inc. | Systems and methods for processing message exchanges using artificial intelligence |
US11010555B2 (en) | 2015-01-23 | 2021-05-18 | Conversica, Inc. | Systems and methods for automated question response |
US20160217501A1 (en) * | 2015-01-23 | 2016-07-28 | Conversica, Llc | Systems and methods for processing message exchanges using artificial intelligence |
US11991257B2 (en) | 2015-01-30 | 2024-05-21 | Rovi Guides, Inc. | Systems and methods for resolving ambiguous terms based on media asset chronology |
US11843676B2 (en) | 2015-01-30 | 2023-12-12 | Rovi Guides, Inc. | Systems and methods for resolving ambiguous terms based on user input |
US11811889B2 (en) | 2015-01-30 | 2023-11-07 | Rovi Guides, Inc. | Systems and methods for resolving ambiguous terms based on media asset schedule |
US11997176B2 (en) | 2015-01-30 | 2024-05-28 | Rovi Guides, Inc. | Systems and methods for resolving ambiguous terms in social chatter based on a user profile |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US11087759B2 (en) | 2015-03-08 | 2021-08-10 | Apple Inc. | Virtual assistant activation |
US10311871B2 (en) | 2015-03-08 | 2019-06-04 | Apple Inc. | Competing devices responding to voice triggers |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
US9824084B2 (en) | 2015-03-19 | 2017-11-21 | Yandex Europe Ag | Method for word sense disambiguation for homonym words based on part of speech (POS) tag of a non-homonym word |
US10045237B2 (en) * | 2015-04-09 | 2018-08-07 | Hong Kong Applied Science And Technology Research Institute Co., Ltd. | Systems and methods for using high probability area and availability probability determinations for white space channel identification |
US20160302196A1 (en) * | 2015-04-09 | 2016-10-13 | Hong Kong Applied Science And Technology Research Institute Co., Ltd. | Systems and methods for using high probability area and availability probability determinations for white space channel identification |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US10769184B2 (en) * | 2015-06-05 | 2020-09-08 | Apple Inc. | Systems and methods for providing improved search functionality on a client device |
US20160357853A1 (en) * | 2015-06-05 | 2016-12-08 | Apple Inc. | Systems and methods for providing improved search functionality on a client device |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US10356243B2 (en) | 2015-06-05 | 2019-07-16 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US11423023B2 (en) | 2015-06-05 | 2022-08-23 | Apple Inc. | Systems and methods for providing improved search functionality on a client device |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US10275708B2 (en) * | 2015-10-27 | 2019-04-30 | Yardi Systems, Inc. | Criteria enhancement technique for business name categorization |
US10274983B2 (en) * | 2015-10-27 | 2019-04-30 | Yardi Systems, Inc. | Extended business name categorization apparatus and method |
US11216718B2 (en) * | 2015-10-27 | 2022-01-04 | Yardi Systems, Inc. | Energy management system |
US10268965B2 (en) * | 2015-10-27 | 2019-04-23 | Yardi Systems, Inc. | Dictionary enhancement technique for business name categorization |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US10460229B1 (en) * | 2016-03-18 | 2019-10-29 | Google Llc | Determining word senses using neural networks |
US9760627B1 (en) * | 2016-05-13 | 2017-09-12 | International Business Machines Corporation | Private-public context analysis for natural language content disambiguation |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US10191899B2 (en) | 2016-06-06 | 2019-01-29 | Comigo Ltd. | System and method for understanding text using a translation of the text |
US11069347B2 (en) | 2016-06-08 | 2021-07-20 | Apple Inc. | Intelligent automated assistant for media exploration |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
US10354011B2 (en) | 2016-06-09 | 2019-07-16 | Apple Inc. | Intelligent automated assistant in a home environment |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
US11037565B2 (en) | 2016-06-10 | 2021-06-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US10733993B2 (en) | 2016-06-10 | 2020-08-04 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US10269345B2 (en) | 2016-06-11 | 2019-04-23 | Apple Inc. | Intelligent task discovery |
US10297253B2 (en) | 2016-06-11 | 2019-05-21 | Apple Inc. | Application integration with a digital assistant |
US10089072B2 (en) | 2016-06-11 | 2018-10-02 | Apple Inc. | Intelligent device arbitration and control |
US10521466B2 (en) | 2016-06-11 | 2019-12-31 | Apple Inc. | Data driven natural language event detection and classification |
US11152002B2 (en) | 2016-06-11 | 2021-10-19 | Apple Inc. | Application integration with a digital assistant |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
US10553215B2 (en) | 2016-09-23 | 2020-02-04 | Apple Inc. | Intelligent automated assistant |
US10650810B2 (en) * | 2016-10-20 | 2020-05-12 | Google Llc | Determining phonetic relationships |
US20190295531A1 (en) * | 2016-10-20 | 2019-09-26 | Google Llc | Determining phonetic relationships |
US11450313B2 (en) * | 2016-10-20 | 2022-09-20 | Google Llc | Determining phonetic relationships |
WO2018118302A1 (en) * | 2016-12-21 | 2018-06-28 | Intel Corporation | Methods and apparatus to identify a count of n-grams appearing in a corpus |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
CN106709011A (en) * | 2016-12-26 | 2017-05-24 | 武汉大学 | Positional concept hierarchy disambiguation calculation method based on spatial locating cluster |
US10140286B2 (en) * | 2017-02-22 | 2018-11-27 | Google Llc | Optimized graph traversal |
US20180239751A1 (en) * | 2017-02-22 | 2018-08-23 | Google Inc. | Optimized graph traversal |
US12001799B1 (en) | 2017-02-22 | 2024-06-04 | Google Llc | Optimized graph traversal |
US11551003B2 (en) | 2017-02-22 | 2023-01-10 | Google Llc | Optimized graph traversal |
US10789428B2 (en) | 2017-02-22 | 2020-09-29 | Google Llc | Optimized graph traversal |
US10872080B2 (en) * | 2017-04-24 | 2020-12-22 | Oath Inc. | Reducing query ambiguity using graph matching |
US10055410B1 (en) * | 2017-05-03 | 2018-08-21 | International Business Machines Corporation | Corpus-scoped annotation and analysis |
US10268688B2 (en) * | 2017-05-03 | 2019-04-23 | International Business Machines Corporation | Corpus-scoped annotation and analysis |
US10755703B2 (en) | 2017-05-11 | 2020-08-25 | Apple Inc. | Offline personal assistant |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10791176B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10410637B2 (en) | 2017-05-12 | 2019-09-10 | Apple Inc. | User-specific acoustic models |
US20190251173A1 (en) * | 2017-05-15 | 2019-08-15 | International Business Machines Corporation | Disambiguating concepts in natural language |
US10565314B2 (en) * | 2017-05-15 | 2020-02-18 | International Business Machines Corporation | Disambiguating concepts in natural language |
US10372824B2 (en) * | 2017-05-15 | 2019-08-06 | International Business Machines Corporation | Disambiguating concepts in natural language |
US10482874B2 (en) | 2017-05-15 | 2019-11-19 | Apple Inc. | Hierarchical belief states for digital assistants |
US10810274B2 (en) | 2017-05-15 | 2020-10-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
US11217255B2 (en) | 2017-05-16 | 2022-01-04 | Apple Inc. | Far-field extension for digital assistant services |
US10652592B2 (en) | 2017-07-02 | 2020-05-12 | Comigo Ltd. | Named entity disambiguation for providing TV content enrichment |
US10726061B2 (en) | 2017-11-17 | 2020-07-28 | International Business Machines Corporation | Identifying text for labeling utilizing topic modeling-based text clustering |
US11308128B2 (en) * | 2017-12-11 | 2022-04-19 | International Business Machines Corporation | Refining classification results based on glossary relationships |
US12075104B2 (en) * | 2018-03-20 | 2024-08-27 | Netflix, Inc. | Quantifying perceptual quality model uncertainty via bootstrapping |
US11361416B2 (en) | 2018-03-20 | 2022-06-14 | Netflix, Inc. | Quantifying encoding comparison metric uncertainty via bootstrapping |
US11170770B2 (en) * | 2018-08-03 | 2021-11-09 | International Business Machines Corporation | Dynamic adjustment of response thresholds in a dialogue system |
CN109214007A (en) * | 2018-09-19 | 2019-01-15 | 哈尔滨理工大学 | Chinese sentence word sense disambiguation method based on convolutional neural networks |
US20200104379A1 (en) * | 2018-09-28 | 2020-04-02 | Io-Tahoe LLC. | System and method for tagging database properties |
US11226970B2 (en) * | 2018-09-28 | 2022-01-18 | Hitachi Vantara Llc | System and method for tagging database properties |
US10832680B2 (en) | 2018-11-27 | 2020-11-10 | International Business Machines Corporation | Speech-to-text engine customization |
US11237713B2 (en) * | 2019-01-21 | 2022-02-01 | International Business Machines Corporation | Graphical user interface based feature extraction application for machine learning and cognitive models |
US11216742B2 (en) | 2019-03-04 | 2022-01-04 | Iocurrents, Inc. | Data compression and communication using machine learning |
US11468355B2 (en) | 2019-03-04 | 2022-10-11 | Iocurrents, Inc. | Data compression and communication using machine learning |
US11966686B2 (en) * | 2019-06-17 | 2024-04-23 | The Boeing Company | Synthetic intelligent extraction of relevant solutions for lifecycle management of complex systems |
US20200394257A1 (en) * | 2019-06-17 | 2020-12-17 | The Boeing Company | Predictive query processing for complex system lifecycle management |
US11222057B2 (en) * | 2019-08-07 | 2022-01-11 | International Business Machines Corporation | Methods and systems for generating descriptions utilizing extracted entity descriptors |
US11710574B2 (en) | 2021-01-27 | 2023-07-25 | Verantos, Inc. | High validity real-world evidence study with deep phenotyping |
WO2022245405A1 (en) * | 2021-05-17 | 2022-11-24 | Verantos, Inc. | System and method for term disambiguation |
GB2622167A (en) * | 2021-05-17 | 2024-03-06 | Verantos Inc | System and method for term disambiguation |
US11727208B2 (en) | 2021-05-17 | 2023-08-15 | Verantos, Inc. | System and method for term disambiguation |
US11494557B1 (en) | 2021-05-17 | 2022-11-08 | Verantos, Inc. | System and method for term disambiguation |
US11989511B2 (en) | 2021-05-17 | 2024-05-21 | Verantos, Inc. | System and method for term disambiguation |
CN113361283A (en) * | 2021-06-28 | 2021-09-07 | 东南大学 | Web table-oriented paired entity joint disambiguation method |
US20230132090A1 (en) * | 2021-10-22 | 2023-04-27 | Tencent America LLC | Bridging semantics between words and definitions via aligning word sense inventories |
Also Published As
Publication number | Publication date |
---|---|
CA2536262A1 (en) | 2005-03-03 |
US20050080780A1 (en) | 2005-04-14 |
WO2005020092A1 (en) | 2005-03-03 |
WO2005020093A1 (en) | 2005-03-03 |
CA2536270A1 (en) | 2005-03-03 |
CN1871603B (en) | 2010-04-28 |
EP1665092A4 (en) | 2006-11-22 |
CN1871597A (en) | 2006-11-29 |
CN100580666C (en) | 2010-01-13 |
EP1661031A4 (en) | 2006-12-13 |
CN1839386A (en) | 2006-09-27 |
CN1871597B (en) | 2010-04-14 |
EP1665091A1 (en) | 2006-06-07 |
CA2536265A1 (en) | 2005-03-03 |
CA2536265C (en) | 2012-11-13 |
US7895221B2 (en) | 2011-02-22 |
EP1665091A4 (en) | 2006-11-15 |
EP1661031A1 (en) | 2006-05-31 |
US20050080776A1 (en) | 2005-04-14 |
EP1665092A1 (en) | 2006-06-07 |
CN1871603A (en) | 2006-11-29 |
US20110202563A1 (en) | 2011-08-18 |
US7509313B2 (en) | 2009-03-24 |
WO2005020091A1 (en) | 2005-03-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20050080613A1 (en) | | System and method for processing text utilizing a suite of disambiguation techniques |
Kanapala et al. | | Text summarization from legal documents: a survey |
US9971974B2 (en) | | Methods and systems for knowledge discovery |
Li et al. | | Learning question classifiers: the role of semantic information |
US20070136251A1 (en) | | System and Method for Processing a Query |
JP4726528B2 (en) | | Suggested related terms for multisense queries |
US8543565B2 (en) | | System and method using a discriminative learning approach for question answering |
US8346534B2 (en) | | Method, system and apparatus for automatic keyword extraction |
US8005858B1 (en) | | Method and apparatus to link to a related document |
US7376634B2 (en) | | Method and apparatus for implementing Q&A function and computer-aided authoring |
Varma et al. | | IIIT Hyderabad at TAC 2009. |
US20070073745A1 (en) | | Similarity metric for semantic profiling |
KR20230077589A (en) | | Method of classifying intention of various question and searching answers of financial domain using external database and system implementing thereof |
Alami et al. | | Hybrid method for text summarization based on statistical and semantic treatment |
Devi et al. | | A hybrid document features extraction with clustering based classification framework on large document sets |
Alami et al. | | Arabic text summarization based on graph theory |
Galvez et al. | | Term conflation methods in information retrieval: Non‐linguistic and linguistic approaches |
Selvaretnam et al. | | A linguistically driven framework for query expansion via grammatical constituent highlighting and role-based concept weighting |
Strzalkowski | | Natural language processing in large-scale text retrieval tasks |
Al-Lahham | | Index term selection heuristics for Arabic text retrieval |
Zavrel et al. | | Feature-Rich Memory-Based Classification for Shallow NLP and Information Extraction. |
Radu et al. | | A focused crawler for Romanian words discovery |
Yan et al. | | A novel word-graph-based query rewriting method for question answering |
Sabbah | | Automatic term extraction using statistical techniques a comparative in-depth study & application |
Moschitti | | Natural language processing and automated text categorization. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: IDILIA INC., CANADA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: COLLEDGE, MR. MATTHEW; BELZILE, MR. PIERRE; BARNES, MR. JEREMY; REEL/FRAME: 017604/0949. Effective date: 20041209 |
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |