USRE44248E1 - System for transferring personalize matter from one computer to another - Google Patents
- Publication number
- USRE44248E1 (application US13/436,519)
- Authority
- US
- United States
- Prior art keywords
- computer
- user
- speech recognition
- voice model
- recited
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime, expires
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
Definitions
- This invention has been created without the sponsorship or funding of any federally sponsored research or development program.
- This invention relates to computer voice recognition enhancements. It explains methodologies for measuring reliability, accuracy, and performance as system responsiveness using a standardized method of measurement.
- The invention introduces a method of machine-independent user mobility between different voice recognition systems. It addresses a method for enabling speaker-independent voice recognition for masses of people without the need for training or enrollment. It describes how to apply the technology to a new style of interactive, real-time voice-to-text handheld transcriber, including visual feedback, to replace previous handheld transcribers that are only recording devices. It describes using these techniques in a system that translates voice mail audio into readable text messages.
- Voice recognition dictation products presently on the market follow the typical clone PC market strategy.
- The state of the art is to buy a personal computer designed as a general-purpose computing device, install voice recognition software (e.g., IBM ViaVoice, L&H Voice Express, Philips Speech Pro from Philips, or Dragon Naturally Speaking from Dragon Systems), and use that configuration as a Large Vocabulary Voice Recognition dictation system.
- LVVR Large Vocabulary Voice Recognition
- Voice recognition dictation systems require training sessions to enable the system to identify the words a person is speaking.
- The process of training a voice recognition system creates speaker voice files, or a “Voice Model”.
- A “Voice Model” is defined here as a signal, information, or an electronic data file containing information and/or parameters that represent a person's voice or a noise.
- A Voice Model contains attributes that characterize specific speaking items for a given user, such as formants, phonemes, speaking rate, pause length, acoustic models, unique vocabularies, etc.
- One use for a voice model that contains data and parameters of a specific user is that it allows the user to take advantage of Large Vocabulary Voice Recognition (LVVR) dictation applications. All approaches to LVVR (e.g.
- This invention targets the specific problems of measuring standard performance and standard accuracy, machine dependency, speaker dependency, mobility, and methods of estimating accurate costs for users and manufacturers.
- This invention includes several components that provide enhancements and ease of use features to voice recognition systems and applications.
- It is possible to reliably measure the accuracy and responsiveness of a voice recognition system used for dictation. With the ability to measure these key metrics, other enhancements can be added to voice recognition systems, with a quick and easy determination of whether the system improves or degrades.
- One such enhancement is the ability to move speaker voice models between systems (Voice Model Mobility), with the ability to quickly determine the success of a quick user enrollment versus a full training session of a voice recognition system.
- The measurements can also be applied to a new type of handheld transcriber with internal voice dictation software, eliminating the two-step process of recording the dictation notes and then transferring them to the voice recognition software for speech-to-text translation. Further advantages can be achieved by applying the RAP Rate measurement techniques to engineering and manufacturing processes, resulting in products that share a common reference and relationships, providing a known value to industry, users, and customers before they purchase the voice dictation product. Applying the RAP Rate measurement techniques together with other techniques for determining voice recognition user speech patterns (described in detail later) enables the creation of a new type of speaker voice model, a Super Voice Model, that can be applied to many people without the prerequisite of training or voice recognition system enrollment.
- This invention includes components that measure voice recognition metrics (RAP Meter), provide ease of use for the movement of speaker voice models (Voice Model Mobility), a handheld transcriber that includes voice recognition software for dictation internal to the transcriber (Powerful Handheld Device), a process for the manufacturing and verification of systems used for voice dictation (RAP Rate Manufacturing Process), a methodology for creating speaker-independent voice models (Super Voice Model), and the application of RAP Meters, Voice Model Mobility, Super Voice Model, and Powerful Handheld Devices to an Audio Voice Mail to Text Translation system.
- RAP meter voice recognition metrics
- FIG. 1 is the opening screen of a software application called Voice Model Mobility. It displays the voice recognition application being used (current speech engine) and its related software version. It also has four buttons for controlling the application including moving speaker voice models from the voice recognition application (Move voice model from speech engine), moving voice models to the speech recognition application (Move voice model from media), and help and exit buttons.
- FIG. 2 illustrates the user control screen of the Voice Model Mobility software used when copying a voice model from the speech engine to media (disk drive, tape drive, writable CD, network, or other transfer medium). It has four user controls: a voice model selection button, a destination button, and OK and Cancel buttons.
- FIG. 3 displays the user control screen of the Voice Model Mobility software used when moving a voice model from media into a speech recognition application. This screen allows the location of the voice model to be selected and provides OK and Cancel buttons.
- FIG. 4 displays the user control screen of the Voice Model Mobility software used when moving a voice model from media into a speech recognition application. There is a button to select the voice model and OK and Cancel buttons.
- FIG. 5 shows an example Voice Model Mobility error-handling dialog box informing a user who attempted to move a voice model that the voice model was not successfully moved.
- FIG. 6 illustrates the RAP Meter software opening screen. It contains six buttons and a display area that provides visual feedback to the user.
- The “Verify RAP Rate” button launches a user screen to perform the RAP Rate test.
- The “Select Mic” button allows the test user to select a specific microphone on the system being used.
- The “Log File” button provides a file that the user can review to see specific details of the RAP Rate test.
- The “Certificate” button displays a certificate that can be shared with others, indicating what RAP Rate level the system under test achieved.
- The “Help” button displays the RAP Meter online help documentation, and the “Exit” button exits the RAP Meter program.
- FIG. 7 illustrates the first opening screen of a RAP Rate test session. This screen enables the user to input specific information about the test to be performed including test name, voice recognition software used, and specific hardware options like the microphone or sound input port to be used for the test.
- FIG. 8 is the RAP Rate user interface where the testing is implemented. It contains three display areas and two “New Test” buttons that start either the accuracy or the performance test.
- The “Delay” display shows the response time, that is, how long the speech recognition software takes to translate a spoken word into text displayed on a computer screen.
- The “Text” display area provides the text to be read during RAP Rate testing.
- The “Performance” display area provides the text to be read for the performance test.
- The “Log File” displays a log file of the current test. The “OK” button returns to the RAP Meter main screen, and the “Cancel” button also returns to the main screen.
- FIG. 9 is a display of the RAP Rate Certificate.
- The RAP Rate certificate is provided by the RAP Meter after the RAP Rate test has been completed and is used for sharing and displaying the RAP Rate achieved by a specific system.
- FIG. 10 is an example of a RAP Rate log file from a typical RAP Rate test run. It shows the detailed results of the performance and accuracy tests and also includes the system-specific configuration.
- FIG. 11 shows a microphone and a handheld transcriber connected to a voice recognition system using a Y-connector cable, enabling simultaneous training of the voice recognition system for the microphone and transcriber input devices.
- FIG. 12 is a prototype handheld computer with voice recognition software included for the purposes of voice dictation.
- FIG. 13 illustrates computer hardware and the relationship of the components. The components that are shown fit into the form factor of a handheld transcriber.
- FIGS. 14 and 15 illustrate a flow chart of a manufacturing process that uses the RAP Rate metrics to produce voice recognition systems with an observed and predictable level of accuracy and responsiveness.
- FIG. 16 illustrates a process sheet to support the RAP Rate manufacturing process.
- FIG. 17 illustrates the major components and overview process for a Super Voice Model (SVM).
- SVM Super Voice Model
- VMM Voice Model Mobility
- Voice Model Mobility was originally conceived due to the problem of having to train multiple voice recognition dictation machines for a single person's voice. This was discovered when experimenting with voice recognition dictation applications. It was determined that a better way to use multiple machines was to separate the files and parameters that characterize the user, package the files and parameters as a voice model and move them to a medium for transfer and installation into another separate system. Voice models and a means to package, move them, and install them can and should be independent of the voice recognition applications allowing the owner of a voice model the ability to plug into and use any voice recognition machine.
- Voice models and training are assumed to be necessary and can be time-consuming; therefore, voice recognition applications provided backup mechanisms to restore voice files to their original locations. They did not, however, provide a means to transfer voice models between systems. Prior to VMM, there was no easy way to move these user-specific parameters and data between voice recognition systems. Several experiments were done in an effort to understand why the voice recognition applications did not support such features. From these experiments, it was discovered that the lack of ability to create and move a voice model was not a technical limitation. The first experiment was to use the backup and save feature provided with the Dragon Professional voice recognition application. One problem encountered was that the restored user had a different filename from the one under which the user was saved. Another problem was the limitation on where the backup could be saved. In other words, the voice model was not mobile.
- The second experiment was to copy the voice model files directly to another location and then copy them back to use them. In some cases this approach appeared to work, although it took some trial and error to discover exactly which files needed to be copied. Crashes and hangs occurred often. Problems encountered before files were copied successfully included contamination of user voice files, the system hanging when trying to open a specific user, and the Dragon application no longer finding the user to open. Although this approach sometimes succeeded, it was discovered that the user had to be created first, and then the files could be copied. This was because registry entries were not set up as part of the copy process. A Visual Basic prototype was coded using this method for user interface experimentation. The third effort investigated the system registry to determine whether Dragon was setting any parameters there. This was found to be true, and this finding solved the final problems. The current version of VMM is coded in the C programming language.
- FIG. 1 illustrates the start up screen from voice model mobility software.
- The opening screen contains two buttons to control the movement of voice models between media and voice recognition systems.
- FIG. 2 illustrates another screen of the user interface. This screen is displayed when de-installing or moving a voice model from the speech recognition program to a storage medium. The user selects the voice model to be used and the destination where the voice model will be stored.
- A software execution process then begins copying the user's parameters and the other files that make up the voice model.
- The process creates a catalog of the voice model's system environment (parameters) and packages the files into a central location with a log file, information file, and initialization file to support future movement of the specific voice model.
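As a rough illustration of the packaging step just described, together with the reverse installation step, the following Python sketch copies a user's voice model files into a package directory alongside an information file, an initialization file, and a log file, and then installs them elsewhere. Everything concrete here is an assumption for illustration: the file names `vmm_info.json`, `vmm_init.json`, and `vmm.log`, the JSON encoding, the directory layout, and the plain dictionary that stands in for the operating system registry. It is not the VMM implementation, which the text describes as written in C for the Dragon application.

```python
import json
import shutil
from pathlib import Path


def export_voice_model(user, model_files, settings, destination):
    """Package a user's voice model files into destination/<user> (hypothetical layout).

    Writes an information file listing the packaged files, an initialization file
    holding the user's settings (stand-ins for registry parameters), and a log file.
    """
    package = Path(destination) / user
    package.mkdir(parents=True, exist_ok=True)
    log = []
    for src in map(Path, model_files):
        shutil.copy2(src, package / src.name)
        log.append(f"copied {src.name}")
    names = [Path(f).name for f in model_files]
    (package / "vmm_info.json").write_text(json.dumps({"user": user, "files": names}, indent=2))
    (package / "vmm_init.json").write_text(json.dumps(settings, indent=2))
    (package / "vmm.log").write_text("\n".join(log) + "\n")
    return package


def install_voice_model(package, engine_dir, registry):
    """Reverse of export: copy the files back and apply the saved settings.

    `registry` is a plain dict standing in for the OS registry (an assumption);
    the files alone are not enough if the engine also expects registry entries.
    """
    package = Path(package)
    info = json.loads((package / "vmm_info.json").read_text())
    settings = json.loads((package / "vmm_init.json").read_text())
    user = info["user"]
    target = Path(engine_dir) / user
    target.mkdir(parents=True, exist_ok=True)
    for name in info["files"]:
        shutil.copy2(package / name, target / name)
    for key, value in settings.items():
        registry[f"Users/{user}/{key}"] = value
    return target
```

Writing the registry-style settings on install mirrors the experiments described above, where copying files without the matching registry entries left the engine unable to find or open the user.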
- The sequence of events includes:
- FIG. 3 is the screen displayed when installing a voice model into a voice recognition system. From this screen and the follow-up screen shown in FIG. 4, the user selects a specific voice model and the location of the voice model to be used. When the user selects the OK button on FIG. 4, a software execution process starts reading and executing the VMM initialization files and deposits the voice recognition parameters and files into the operating system's registry and parameter configuration files. The sequence of events occurs as follows:
- Voice model mobility could enhance many situations where training a voice recognition system is necessary. For example, it could enhance U.S. Pat. No. 6,477,491 to Chandler et al. (System and Method for Providing Speaker Specific Records of Statements of Speakers) and U.S. Pat. No. 6,327,343 to Epstein et al. (System and Method for Automatic Call and Data Transfer Processing), both of which mention the need to obtain, or assume the existence of, a speaker voice model.
- Voice Model Mobility version 1.0 (VMM V1.0) is the first step toward the concept of modular, pluggable voice models. This concept enables new features to be incorporated into voice models to provide enhancements on a wide variety of applications (e.g.
- Voice Model Mobility could be applied with any voice recognition software, application, or hardware.
- The VMM software application developed, used, and described here is for example purposes and uses the Dragon Systems Voice Recognition Professional application.
- The transfer medium could have been a floppy disk, network, credit card strip, or other means of storing data.
- The VMM prototype has been tested, debugged, and upgraded and is presently being used by many people.
- The current version works with high-capacity floppy disks, network drives, CD media, Internet network drives, and other removable storage media.
- The goal would be to put voice models on credit-card-type magnetic strips requiring personalized identification to enable the models, similar to today's credit cards and bank ATM cards.
- One embodiment of the invention could be described as a method for training a second speech recognition computer, so that the second speech recognition computer is more effective at speech recognition than it was prior to the training, comprising the steps of training a first speech recognition computer by causing the said first speech recognition computer to form first voice model files that contain the training result, and thereby to cause the first speech recognition computer to be more effective at speech recognition.
- a speech recognition computer system comprising a first computer including a memory, said memory containing a first installation of a speech recognition program, and said memory also containing first voice model files that are adapted to be used by the said speech recognition program to increase the accuracy of the said speech recognition program when the said first computer is used by a user to convert the speech of the user into text, said speech recognition program also containing a training module adapted to convert generic voice model files to the first voice model files by monitoring the conversion of the speech of the user into text, a second computer including a memory, said memory containing a second installation of the speech recognition program, and a transfer device adapted to copy the first voice model files from the first computer to the second computer in such a way that the first voice model files can be utilized by the second installation of the speech recognition program to increase the accuracy of the second installation of the speech recognition program when the said second installation of the speech recognition program is used by the user.
- The mechanics of translating voice models between LVVR applications include: 1) An information file is created identifying which parameters are needed for each type of LVVR system. 2) The parameters are read from one LVVR system and translated to an LVVR common file format (.lvr). 3) The parameters are then formatted from the .lvr file to the target voice recognition application format. 4) The file is then translated to the desired voice model format to create the final voice model. 5) The voice model is plugged in to the destination LVVR system using the VMM techniques.
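Steps 1 through 3 above amount to a hub-and-spoke translation through a common format. The Python sketch below shows that idea under stated assumptions: the engine names, the native parameter names, and the use of JSON as the `.lvr` encoding are all hypothetical, and steps 4 and 5 (building the final model file and installing it with VMM) are not modeled.

```python
import json

# Step 1 (hypothetical): an information file per LVVR system mapping its native
# parameter names to keys in the common .lvr format. All names here are invented.
INFO_FILES = {
    "engine_a": {"SpeakRate": "speaking_rate", "PauseMs": "pause_length_ms"},
    "engine_b": {"rate_wpm": "speaking_rate", "pause": "pause_length_ms"},
}


def to_lvr(source, native_params):
    """Step 2: read source parameters and serialize them in the common format."""
    mapping = INFO_FILES[source]
    common = {mapping[k]: v for k, v in native_params.items() if k in mapping}
    return json.dumps(common, sort_keys=True)  # assumed JSON encoding for .lvr


def from_lvr(target, lvr_text):
    """Step 3: rename common parameters to the target system's native names."""
    reverse = {common: native for native, common in INFO_FILES[target].items()}
    common = json.loads(lvr_text)
    return {reverse[k]: v for k, v in common.items() if k in reverse}


def translate(source, target, native_params):
    """Steps 2 and 3 chained; parameters with no mapping are dropped."""
    return from_lvr(target, to_lvr(source, native_params))
```

Routing every translation through one common format means each LVVR system needs only one mapping to and from `.lvr`, rather than a separate converter for every pair of systems.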
- An alternate view is the ability to train a voice recognition system to recognize many users with a single voice model, making the voice recognition system speaker independent.
- Voice model mobility was a method to remove machine dependency. If many people want to use a specific voice recognition system for LVVR dictation with current technology, each person would have to train each machine separately, which is usually not feasible for the masses of people and the potential number of systems.
- A new type of Voice Model, called a Super Voice Model or SVM, has the ability to achieve speaker-independent voice recognition.
- The technology that enables this ability is VMM for the movement of voice models, combined with RAP Meter technology (described later in this document) to verify success and adjust parameters based on real-time feedback.
- The key difference between voice models presently used and a Super Voice Model is that current voice models attempt to adapt to many different speakers, while a Super Voice Model attempts to adapt the speaker to fit a standard voice model (the Super Voice Model).
- Because Voice Models will be available for transfer using VMM, they can be collected into a Voice Model Library ( FIG. 17 # 104 ).
- a new type of synthetic voice model can be created ( FIG. 17 # 107 ), derived from the parameters available in the collection ( FIG. 17 # 105 ).
- The Super Voice Model involves a quick sample of voice input from a new user ( FIG. 17 # 106 ), a comparison against the available parameters in the parameter lookup table ( FIG. 17 # 105 ), and a selection of specific parameters from the voice recognition voice model files ( FIG. 17 # 101 , 102 , 103 ), with the final output being a synthetic Voice Model ( FIG.
- The new Voice Model can optionally be calculated in real time or in advance for a given person from a quick recording.
- The SVM is based upon having information about Voice Models readily available, organized, and ready for statistical calculation to create a synthesized Voice Model that can be used for any given speaker at the time.
- The overview process flow is as follows: VMM creates and archives voice models into a Voice Model Library ( FIG. 17 # 104 ); the voice models are collected and categorized by common speaker attributes; and analysis is performed, with results deposited into a table ( FIG. 17 # 105 ) indicating the availability of parameters for a potential synthetic generic voice model. As more voice models are added, new generic voice models can potentially be created.
- A Super Voice Model includes the library of voice models ( FIG. 17 # 104 ), the lookup table of voice models indexed by parameters ( FIG. 17 # 105 ), the logic to select a specific voice model from the table ( FIG. 17 # 108 ), the ability to install the synthetic voice model into a voice recognition dictation system using VMM technology, and a measurement of the success of the synthetic voice model using RAP Meter technology.
- A possible enhancement to the Super Voice Model database selection rules could include algorithms similar to or as described in U.S. Pat. No. 6,029,195 to Herz (System for Customized Electronic Identification of Desirable Objects).
- The Super Voice Model eliminates the need for specific user training or enrollment in a voice recognition system.
- The Super Voice Model is a classification and cataloging of many user voice models, based on a library of known unique user files (Speaker Voice Models) classified by characteristics of each unique speaker, including gender, rate of speech, and phoneme characteristics (e.g. spectral characteristics, pitch, frequency, and amplitude), and the use or combination of these files to statistically create a speaker voice model that previously did not exist, or possibly to use a speaker voice model that is very similar if one does exist.
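The text does not specify which statistics select or combine library models, so the Python sketch below swaps in one simple, well-known technique for illustration: k-nearest-neighbor selection over speaker attributes, followed by averaging the numeric parameters the chosen models share. The attribute set (gender, speaking rate, mean pitch), the distance scales, and the parameter names are hypothetical, not the SVM's actual classification scheme.

```python
from dataclasses import dataclass, field
from math import hypot


@dataclass(frozen=True)
class SpeakerProfile:
    gender: str
    speaking_rate_wpm: float  # rate of speech
    mean_pitch_hz: float      # one example phoneme-level characteristic


@dataclass(frozen=True)
class LibraryEntry:
    model_id: str
    profile: SpeakerProfile
    params: dict = field(default_factory=dict)  # numeric voice model parameters


def profile_distance(a, b, rate_scale=50.0, pitch_scale=50.0):
    """Scaled Euclidean distance; the scales are arbitrary illustrative choices."""
    return hypot((a.speaking_rate_wpm - b.speaking_rate_wpm) / rate_scale,
                 (a.mean_pitch_hz - b.mean_pitch_hz) / pitch_scale)


def synthesize_voice_model(sample, library, k=3):
    """Average the shared parameters of the k library models closest to a new speaker.

    Candidates are first restricted to the same gender when any exist.
    """
    if not library:
        raise ValueError("voice model library is empty")
    same_gender = [e for e in library if e.profile.gender == sample.gender]
    candidates = same_gender or list(library)
    nearest = sorted(candidates, key=lambda e: profile_distance(sample, e.profile))[:k]
    shared = set.intersection(*(set(e.params) for e in nearest))
    return {key: sum(e.params[key] for e in nearest) / len(nearest) for key in sorted(shared)}
```

With `k=1` this degenerates to reusing the most similar existing model, matching the "very similar if one does exist" case; larger `k` produces a blended model that previously did not exist. Per the text, the result would then be installed with VMM and checked with the RAP Meter.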
- Using multiple systems and transferring voice models can result in a degradation of system performance and text output accuracy.
- Indicators are needed to accurately measure accuracy as a count of correctly input words and performance as system responsiveness.
- These two key measurements, performing reliably, are what people expect from a quality voice recognition system.
- These metrics are defined here as the Reliable Accuracy Performance Rate, or “RAP Rate”.
- RAP Rate Reliable Accuracy Performance Rate
- Reliable Accuracy Performance Rate is defined in this invention as spoken audio correctly translated to text, delivered with a measured delay time from the word being spoken to the word being visible as text in an electronic document.
- A user “u” is defined as the person speaking to a voice recognition system.
- The system “s” is defined as a system trained to recognize a person's voice for the purpose of identifying audible spoken words.
- Quality of components “q” is defined as the hardware and software component functionality that is appropriate for voice recognition dictation applications, and Integration “I” is defined as how the components are combined, including the merging of hardware, software, and parameters, with a focus on providing optimal voice recognition. For example, if a system has a reliable accuracy of 96% and a reliable performance of 1 second, then the RAP Rate would equal 96% at 1 second, or a RAP Rate of 96/1. Presently, a large vocabulary voice recognition dictation system with quality components and good integration can deliver a RAP Rate of approximately 96% at 4 seconds (96/4).
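The "accuracy/delay" notation in the example above can be captured in a small value type. This Python sketch is illustrative only: the class name `RapRate` and the `at_least_as_good_as` comparison rule (no less accurate and no slower) are assumptions, since the text defines the notation but not how two RAP Rates should be compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RapRate:
    accuracy_percent: float  # reliable accuracy: % of words translated correctly
    delay_seconds: float     # reliable performance: delay from word spoken to word displayed

    def __str__(self):
        # Render in the "accuracy/delay" notation used in the text, e.g. 96/4.
        return f"{self.accuracy_percent:g}/{self.delay_seconds:g}"

    def at_least_as_good_as(self, other):
        """True if no less accurate and no slower (an assumed comparison rule)."""
        return (self.accuracy_percent >= other.accuracy_percent
                and self.delay_seconds <= other.delay_seconds)
```

For example, `str(RapRate(96, 1))` gives `96/1`, and a 96/1 system is at least as good as a 96/4 system under this rule, but not the other way around.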
- The RAP Rate equation components can be further defined:
- This equation indicates aspects of hardware that can be changed to achieve an improved RAP Rate, focusing on the Performance metric, which indicates system responsiveness in the voice recognition process.
- The performance result is measured in time (seconds, for current technology).
- The “delay” in the performance definition will never be zero. It may not be immediately perceivable to a user, but it can be measured and, over time, will be perceived by the user.
- System parameters include hardware (microphones, sound ports, A/D conversion devices, DSP methods, etc.) and the related software (driver modules, firmware, BIOS, operating systems, applications/utilities) and their specific parameters.
- Computer parameters designed to accomplish general computing i.e. word processing, multimedia, games, etc.
- Setting up software parameters to ensure the capabilities for LVVR are enabled at all levels can improve RAP Rate.
- “Integration” ties directly to RAP Rate.
- To measure the specific metrics of accuracy and performance, a RAP Meter software application is created. A person skilled in the art of computer hardware and software development will realize that other ways of creating a device to measure accuracy and performance for voice recognition are possible, including a separate device using separate hardware and software.
- The RAP Meter is one example used here to further describe the concept. Referencing FIGS. 6 through 10, the RAP Meter software works as follows:
- FIG. 6 represents the opening screen of the RAP Meter software.
- This screen contains six user control buttons and a display area with quick instructions on how to use the RAP Meter.
- The voice recognition software is launched and running in the background.
- The six control buttons are: 1) Verify RAP, 2) Log File, 3) Help, 4) Select Mic, 5) Certificate, and 6) Exit.
- The user selects the Verify RAP button.
- A new session screen is displayed, as seen in FIG. 7.
- The user inputs specific information about the test that is about to occur, including a name for the test session, the kind of microphone being used, and the voice recognition application software.
- The user also selects a sound port from the available sound ports displayed.
- After the user enters this information and clicks the “OK” button, another screen that implements the actual test is displayed, as shown in FIG. 8. Referring to FIG. 8, there are two separate areas of the screen that contain controls.
- The top area of the display operates the performance test, and the lower area of the display operates the accuracy test.
- To run the performance test, the user clicks the top New Test button. Words are then displayed automatically in the performance display area, and the user speaks each word as it appears. The delay is measured from the time the word is spoken until the time the word is displayed on the computer screen, and this delay is shown for each word in the Delay window. When the performance test is completed, the word delays are averaged and written into a log file or RAP certificate. For performance, the RAP Meter records the time that sound was input ($T_{\text{start}}$) and subtracts it from the time that the text is displayed on the screen for editing in a text application ($T_{\text{end}}$). Thus, $\text{Performance} = T_{\text{end}} - T_{\text{start}}$.
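The per-word delay and its average can be sketched as follows. How $T_{\text{start}}$ and $T_{\text{end}}$ are actually captured depends on the audio driver and the speech engine and is not shown; this Python sketch simply assumes they arrive as `(word, t_start, t_end)` tuples in seconds, and the log line format is hypothetical.

```python
from statistics import mean


def word_delays(events):
    """Per-word delay T_end - T_start from (word, t_start, t_end) tuples, in seconds.

    t_start: when the spoken word's audio was input; t_end: when its text appeared.
    """
    delays = []
    for word, t_start, t_end in events:
        if t_end < t_start:
            raise ValueError(f"text for {word!r} appeared before it was spoken")
        delays.append((word, t_end - t_start))
    return delays


def performance_seconds(events):
    """Average word delay, the single performance figure reported for the test."""
    return mean(delay for _, delay in word_delays(events))


def log_lines(events):
    """One line per word with its delay, as a log file might record it."""
    return [f"{word}\t{delay:.3f} s" for word, delay in word_delays(events)]
```

For two words with delays of 1.5 s and 0.5 s, `performance_seconds` returns 1.0, which would be reported as the delay half of the RAP Rate.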
- In the text display area, a paragraph is displayed for the user to read aloud.
- The RAP Meter counts the words that are translated correctly and the words that are translated incorrectly, calculates the ratio of these two values, and displays the result as a percentage of correct words in a log file and a RAP Rate certificate.
- As the user reads, the words are translated into text.
- The RAP Meter compares the original text with the text translated by the LVVR system and reports back to the user an accuracy measurement as the percentage of correctly translated words (e.g., 96% correct).
- Accuracy (%) = words correct / (words correct + words incorrect) × 100. Incorrect words can be highlighted for display to the user.
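A minimal Python sketch of the accuracy computation follows. The text does not say how recognized words are matched against the displayed paragraph; this sketch assumes a word-level alignment using the standard library's `difflib.SequenceMatcher`, ignoring case and surrounding punctuation. The function names are hypothetical.

```python
import difflib


def _words(text):
    """Lower-case words with surrounding punctuation stripped."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w]


def accuracy(reference, recognized):
    """Return (percent_correct, incorrect_words).

    A reference word counts as correct if it is part of the word-level
    alignment with the recognized text, so correct + incorrect equals the
    number of words in the reference. incorrect_words lists the reference
    words that were missed or mistranslated, e.g. for highlighting.
    """
    ref = _words(reference)
    hyp = _words(recognized)
    if not ref:
        raise ValueError("reference paragraph is empty")
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    matched = set()
    for block in matcher.get_matching_blocks():
        matched.update(range(block.a, block.a + block.size))
    incorrect = [w for i, w in enumerate(ref) if i not in matched]
    return 100.0 * len(matched) / len(ref), incorrect


if __name__ == "__main__":
    pct, wrong = accuracy(
        "The quick brown fox jumps over the lazy dog.",
        "the quick brown fox jumped over the lazy dog",
    )
    print(f"{pct:.1f}% correct; incorrect: {wrong}")
```

Because the percentage is taken over the reference words, extra words inserted by the recognizer are not penalized in this sketch; a stricter meter might count them as errors too.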
- A log file is then displayed, as shown in FIG. 10, indicating the results of the performance and accuracy tests and combining the two metrics into a RAP Rate: accuracy percentage versus performance measured as system response time.
- The log file includes system details, including the voice recognition application and the computer hardware (processor type, memory capacity, CPU speed, etc.).
- The log also includes the words spoken during the performance test, an indication of whether each word was correctly translated, and the delay for each spoken word, measured in seconds or fractions thereof.
- The log file also includes the output of the accuracy test, including the paragraph as displayed on the screen and the paragraph as spoken by the user during the test (input and output text).
- The accuracy metric is also included in the log file as a percentage of correct words versus incorrect words.
- The log file can be printed, sent by e-mail, or transferred by other means.
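The log contents listed above can be sketched as a small Python record that renders a plain-text report. The class and field names, the report layout, and the way the RAP Rate line pairs the accuracy percentage with the average response time are assumptions for illustration; hardware details are supplied by the caller rather than read from the machine.

```python
from dataclasses import dataclass, field


@dataclass
class RapLog:
    """Hypothetical record of one RAP Meter session."""
    session: str
    application: str                                  # voice recognition application
    hardware: dict = field(default_factory=dict)      # e.g. processor, memory
    word_results: list = field(default_factory=list)  # (word, correct, delay_s)
    reference_text: str = ""   # paragraph displayed for the accuracy test
    recognized_text: str = ""  # text produced from the user's speech
    accuracy_pct: float = 0.0

    def average_delay(self):
        delays = [delay for _, _, delay in self.word_results]
        return sum(delays) / len(delays) if delays else 0.0

    def render(self):
        lines = [f"Session: {self.session}", f"Application: {self.application}"]
        lines += [f"{key}: {value}" for key, value in self.hardware.items()]
        lines += ["", "Performance test:"]
        for word, correct, delay in self.word_results:
            status = "correct" if correct else "incorrect"
            lines.append(f"  {word:<12} {status:<9} {delay:.3f} s")
        lines += [
            "",
            "Accuracy test:",
            f"  Input text:  {self.reference_text}",
            f"  Output text: {self.recognized_text}",
            f"  Accuracy: {self.accuracy_pct:.1f}% correct",
            "",
            f"RAP Rate: {self.accuracy_pct:.1f}% accuracy at "
            f"{self.average_delay():.3f} s average response time",
        ]
        return "\n".join(lines)


if __name__ == "__main__":
    log = RapLog(
        session="demo",
        application="example LVVR application",
        hardware={"Processor": "400 MHz", "Memory": "256 MB"},
        word_results=[("alpha", True, 0.2), ("bravo", False, 0.4)],
        reference_text="alpha bravo",
        recognized_text="alpha brave",
        accuracy_pct=50.0,
    )
    print(log.render())
```

Rendering to a plain string keeps the report easy to print, attach to an e-mail, or save with `Path.write_text`.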
- The RAP Rate Meter concept is an extended feature for verifying the Voice Model Mobility concept, providing an indication that a voice model was moved successfully.
- The RAP Meter can also be provided as a separate application for certifying LVVR applications.
- Companies could display RAP certification in their advertising and could be charged a per-system usage fee or license fee, creating revenue from RAP licensing.
- The components (application programming interfaces) used for real-time capture and performance measurement include: software application handles, to indicate which loaded applications are used for LVVR; the audio input device IRQ and I/O address range and software driver I/O function calls, to indicate when the A/D translation has started; speech recognition function calls (e.g., RealTimeGet and TimeOutGet), to indicate when the voice recognition engine has started and completed the translation; and the video board IRQ and I/O address range and software driver I/O function calls, to determine when the text is being displayed to the editor on the screen. As words are spoken into a microphone, trigger points are set to indicate when each stage of the text translation and display process has completed its task. The following steps indicate, as an example, one way a RAP Meter can function:
- The RAP Meter can also be applied to engineering and manufacturing processes. It is an object of this invention to describe a methodology for a process that measures the specific hardware and software features necessary to support optimal Large Vocabulary Voice Recognition (LVVR), using the Reliable Accuracy Performance (RAP) Rate as the measurement reference for the processes. During engineering, the reference measurement values are inserted into process sheets, allowing controlled steps to be followed. Using this technique, processes can be developed for a production line for LVVR system manufacture. For development, the methods include a hardware component selection process based on investigation of the functions needed, a test process to measure component adequacy, and documentation of functionality and parameters.
- LVVR Large Vocabulary Voice Recognition
- RAP Reliable Accuracy Performance
- FIG. 14, #501 starts the process with an investigation of the optimal components to be used in voice recognition dictation systems.
- Supplier components are investigated for the specific hardware functionality needed (FIG. 14, #502).
- Specifications and documentation distributed by component suppliers are examined for a specific fit to the RAP list of requirements (FIG. 14, #506).
- the process can include having the suppliers of hardware components produce the list of hardware that meets the requirements of the RAP list.
- the following is a list as an example of optimal features for voice recognition dictation systems:
- FIG. 14, #503 illustrates the RAP Meter test and verification being applied to the engineering development process. If an acceptable RAP Rate is achieved, the documented process sheets are delivered to the manufacturing process, as illustrated in FIG. 14, #508; otherwise, adjustments to components and/or parameters are made (FIG. 14, #505) and the system goes back to test (FIG. 14, #503).
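The FIG. 14 test-and-adjust loop can be sketched in Python. The acceptance thresholds, the measurement function, and the adjustment rule are hypothetical placeholders; the text only says that testing repeats until an acceptable RAP Rate is achieved.

```python
from typing import Callable, Dict, Tuple

Config = Dict[str, float]


def engineer(
    config: Config,
    measure_rap: Callable[[Config], Tuple[float, float]],
    adjust: Callable[[Config, float, float], Config],
    min_accuracy_pct: float,
    max_delay_s: float,
    max_rounds: int = 10,
) -> Dict[str, object]:
    """Repeat RAP testing (#503) and adjustment (#505) until acceptable.

    Returns the accepted configuration and its measurements, i.e. the data
    that would go into the process sheets delivered to manufacturing (#508).
    """
    for round_no in range(1, max_rounds + 1):
        accuracy, delay = measure_rap(config)            # #503: RAP Meter test
        if accuracy >= min_accuracy_pct and delay <= max_delay_s:
            return {"config": config, "accuracy": accuracy,
                    "delay": delay, "rounds": round_no}  # #508: process sheet
        config = adjust(config, accuracy, delay)         # #505: adjust, retest
    raise RuntimeError("no acceptable RAP Rate within max_rounds")


if __name__ == "__main__":
    # Toy model (hypothetical): response delay shrinks as main memory grows.
    def fake_measure(cfg):
        return 95.0, 100.0 / cfg["memory_mb"]

    def add_memory(cfg, accuracy, delay):
        return {**cfg, "memory_mb": cfg["memory_mb"] * 2}

    sheet = engineer({"memory_mb": 128}, fake_measure, add_memory,
                     min_accuracy_pct=90.0, max_delay_s=0.25)
    print(sheet)
```

Capping the number of rounds keeps a configuration that can never pass from looping forever; in practice that case would send the engineer back to component selection.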
- An example engineering process sheet for the system BIOS is shown in FIG. 16.
- Manufacturing process sheets (FIG. 15, #607) are provided by the engineering process in FIG. 14.
- The components are ordered (FIG. 15, #601) as described by the engineering process sheets.
- The components are integrated into a system package (FIG. 15, #602) and then tested using the RAP Rate technology (FIG. 15, #603). If the RAP Meter test indicates successful results, the system is packaged for shipping (FIG. 15, #606); otherwise, the process is updated (FIG. 15, #605) to avoid unsuccessful results in the future.
- RAP Rate Reliable Accuracy and Performance Rate
- Microprocessor usage: When microprocessor usage is measured while LVVR applications are running, results show usage at 100%. To determine this, a combination of a manual process and an automated process is used.
- One method of measuring CPU usage is to use the performance monitor tools available with an operating system such as Microsoft Windows 98. The goal is to leave a margin of microprocessor resources available while dictation is in progress. Ideally, the desired metric for voice recognition is no noticeable delay between the time the words are spoken and the time the text is displayed in a text editor. If other applications are to run simultaneously, an additional performance margin must be added to avoid affecting the RAP rate.
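The margin check can be expressed as a small Python function over CPU-utilization samples, for example values exported from a performance monitor during dictation. The function name and the choice to judge headroom by the peak sample rather than the average are assumptions for illustration.

```python
def cpu_headroom(samples_pct, required_margin_pct):
    """Summarize CPU-utilization samples (0-100) taken during dictation.

    Returns (average, peak, ok), where ok is True when the headroom left at
    the busiest sample is at least required_margin_pct.
    """
    if not samples_pct:
        raise ValueError("no samples")
    average = sum(samples_pct) / len(samples_pct)
    peak = max(samples_pct)
    return average, peak, (100.0 - peak) >= required_margin_pct


if __name__ == "__main__":
    print(cpu_headroom([60, 70, 80], required_margin_pct=15))    # 20% left at peak
    print(cpu_headroom([95, 100, 100], required_margin_pct=10))  # saturated
```

A system that reaches 100% usage leaves no margin at all; raising `required_margin_pct` models the extra margin needed when other applications run at the same time.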
- Floating point: floating-point processing may be embedded in the main microprocessor or handled separately, either by the main CPU instruction set or in software. Microprocessors that support floating point in different ways can directly affect the RAP rate. The ideal microprocessor combines hardware registers and a floating-point instruction set with features that allow multiple calculations in minimal clock cycles, while supporting access to fast cache memory. Floating-point performance can be measured using industry-standard tools, or taken from results published in trade magazines or by the manufacturers.
- Cache memory is the storage medium closest to the microprocessors doing the work, and typically the memory closest to the main CPU provides the fastest data access. The capacity of the cache memory, the percentage of cache hits, and whether the cache is embedded in the CPU chip or off chip all make a difference. “KB Cache/cache hit rate” acts as a performance enhancement in the equation and can be measured using the performance tools built into Microsoft Windows.
- High-capacity/fast main memory: A large-capacity main memory is desired and will affect performance. Enough capacity to allow the LVVR and related applications to execute directly out of memory yields the best performance. Going out to disk takes an order of magnitude longer and should be avoided whenever possible. Testing and measurement results indicate that an LVVR system can easily require 256 megabytes to avoid disk access. This can be measured using operating system tools such as the Microsoft Windows 98 performance monitor, along with other tools available in the computer industry. As memory is reduced, delays occur, resulting in a lower RAP rate. Therefore, the equation includes a metric, “% of application in memory”, that adds to or subtracts from performance. These values will change with time and technology, but the goal for LVVR remains the same: to execute without disk access.
- Sound input device: performance should be focused on the range of the human voice. Most sound components for PCs focus on output, with input a secondary consideration; therefore, sound input components can cause performance problems.
- The physical system interface/bus can also add to or subtract from performance. A/D conversion time plus bus throughput latency subtracts from performance and can never be removed from the system: while this delay can be lowered to an imperceptible level, it will never be reduced to zero. Oscilloscopes are one method of measuring this delay. This delay is also included in the performance measurement of the RAP Rate, which can be measured through a software tool such as a RAP Meter.
- The RAP rate can be affected by software (firmware, operating systems, applications/utilities, and parameters). Parameters can raise or lower the RAP rate of a large vocabulary voice recognition application. For example, a word processing application with a parameter set to auto-correct grammar during dictation may cause a severe RAP rate reduction, because resources are shared between real-time grammar correction and LVVR. Starting at the lowest level (BIOS) and working up through the OS toward the LVVR application is one method of tuning software for a good RAP rate. Another method is to reverse the order, starting at the LVVR application and working back. A software utility can then be created that applies the parameter settings automatically based on the known information.
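The automatic parameter utility described above could start from a table of known-good settings per software layer and report what differs on a given system. In this Python sketch the setting names are hypothetical, drawn loosely from examples in this description (background schedulers, virus checking, grammar auto-correct); actually applying the changes would be platform-specific and is not shown.

```python
# Known-good settings, ordered from the lowest layer (BIOS) upward.
# All setting names are hypothetical placeholders.
TUNING_PROFILE = [
    ("BIOS", {}),  # BIOS values would come from the engineering process sheet
    ("OS", {"background_scheduler": "off", "virus_scan_on_access": "off"}),
    ("Application", {"grammar_autocorrect_while_dictating": "off"}),
]


def plan_changes(current, profile=TUNING_PROFILE, top_down=False):
    """List (layer, setting, current, wanted) tuples that need changing.

    Layers are visited bottom-up (BIOS -> OS -> application) by default,
    or top-down when top_down is True, mirroring the two tuning orders.
    """
    layers = reversed(profile) if top_down else profile
    changes = []
    for layer, wanted_settings in layers:
        actual = current.get(layer, {})
        for name, wanted in wanted_settings.items():
            if actual.get(name) != wanted:
                changes.append((layer, name, actual.get(name), wanted))
    return changes


if __name__ == "__main__":
    system = {
        "OS": {"background_scheduler": "on", "virus_scan_on_access": "off"},
        "Application": {"grammar_autocorrect_while_dictating": "on"},
    }
    for change in plan_changes(system):
        print(change)
```

Separating the plan from its application lets the same profile be reviewed by an engineer before a platform-specific tool writes the settings.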
- BIOS (lowest level)
- One object of this invention is to describe a method that can reduce voice recognition system training time by using a cable that allows a microphone to be connected to both a handheld transcriber and a desktop PC, and by implementing a synchronization process for training large vocabulary voice recognition on both devices simultaneously.
- A Y-cable configuration connects to the handheld transcriber microphone input and, at the same time, to the computer's voice recognition software input.
- This configuration creates a single microphone input for both devices (the computer and the handheld transcriber).
- This method enables a single training session for both devices. It proved successful, allowing a user to train a handheld transcriber at the same time as the desktop system.
- A better method of accomplishing large vocabulary voice recognition on handheld transcribers would be to package the desktop system hardware into a handheld form factor.
- The prototype is a fully functioning handheld transcriber focused on proving the concepts of the form factor, the use of VMM via a network drive, the ability to provide direct speech-to-text feedback while dictating in a handheld environment, and the ability to use a voice recognition interface combined with a touch screen for user control.
- The prototype supports a vocabulary of over 30,000 words. Test results from this prototype indicate that production models could support large vocabularies, including libraries for medical and legal services.
- This prototype includes battery operation, a network connection, USB, a keyboard and mouse if desired, a connection for 120-volt AC power, and a microphone input jack.
- FIG. 13 is a block diagram of the handheld transcriber components. They include a 400 MHz CPU (FIG. 12, #200), 256 KB cache (FIG. 12, #201), 256 MB of memory (FIG. 12, #203), a Neo Magic graphics chip (FIG. 12, #204), a PCI-to-ISA bridge (FIG. 12, #205), and a USB port (FIG. 12, #209). These components are integrated into a Plug n Run motherboard/daughter board configuration purchased from Cell Computing, located in California.
- The A/D converter (FIG. 12, #208) was purchased from Telex, and the microphone (FIG. 12, #207) was purchased from Radio Shack.
- The color LCD (FIG. 12, #206) was purchased from Sharp.
- The Microsoft Windows 98 operating system was loaded onto the IDE disk drive, and the voice recognition software (Dragon Naturally Speaking) was installed on the handheld transcriber.
- After power is applied, the device can be controlled using voice recognition commands and the touch screen.
- When the device becomes ready, it automatically enters a mode in which a user can be selected and dictation can start. For dictating to the machine, the device supports a microphone input jack with a microphone on/off switch that can be used momentarily or left in either position.
- The user speaks into a microphone, and the speech is translated into text in a text editor on the handheld screen. What makes this handheld device unique is the number of words (a large vocabulary of more than 30,000 words) that can be translated in real time during dictation. The Save File function saves the dictated text files for later editing, importing and exporting, archival, or transfer.
- the device supports a network connection for moving the files and voice models to and from the handheld device.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Electrically Operated Instructional Devices (AREA)
- Machine Translation (AREA)
Abstract
Description
-
- 1. User clicks “Move voice model to Media” button.
- 2. A folder labeled “Users” is created on the destination media.
- 3. A “users.ini” file is created in the Users folder. This file maps each username to the user file name that Dragon will open.
- 4. VMM then creates and writes the user specific registry information into a file called VMMinfo.txt
- 5. A “user” specific folder will be created in the Users folder. There are several files in the user specific folder.
- 6. The files listed below are created by the user's Dragon training process. These files are copied to the CD writer using the standard Dragon directory structure. Files included are:
  - audioin.dat
  - Current folder, containing:
    - topics (configuration file)
    - options (configuration file)
    - global.DVC
  - Voice folder:
    - DD10User.sig
    - DD10User.usr
  - GeneralE folder:
    - dd10voc1.voc
    - dd10voc2.voc
    - dd10voc3.voc
    - dd10voc4.voc
    - General.voc
  - Shared folder:
    - archive.voc
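The save steps can be sketched in Python using only the standard library (Python 3.8 or later, for `dirs_exist_ok`). The users.ini layout (a `[Users]` section mapping each username to a folder name), the key=value format of VMMinfo.txt, and its placement inside the user-specific folder are assumptions; the text does not specify them. Reading the actual Dragon registry entries is Windows-specific, so the sketch takes those settings as a plain dictionary.

```python
import configparser
import shutil
import tempfile
from pathlib import Path


def save_voice_model(username, user_dir, registry, media_root):
    """Copy a trained voice model and its settings to destination media.

    username   -- name shown to the user when the model is loaded
    user_dir   -- local folder holding the user's Dragon files
    registry   -- user-specific registry settings as {name: value}
    media_root -- root of the destination media (e.g. CD or network drive)
    Returns the user-specific folder created on the media.
    """
    users = Path(media_root) / "Users"               # step 2: Users folder
    users.mkdir(parents=True, exist_ok=True)
    folder = Path(user_dir).name

    ini = configparser.ConfigParser(interpolation=None)  # step 3: users.ini
    ini.optionxform = str                            # keep username case
    ini_path = users / "users.ini"
    if ini_path.exists():
        ini.read(ini_path)
    if not ini.has_section("Users"):
        ini.add_section("Users")
    ini.set("Users", username, folder)
    with ini_path.open("w") as fh:
        ini.write(fh)

    target = users / folder                          # steps 5-6: copy files
    shutil.copytree(user_dir, target, dirs_exist_ok=True)

    # Step 4: registry settings, written after the copy so the folder exists.
    info = "\n".join(f"{key}={value}" for key, value in registry.items())
    (target / "VMMinfo.txt").write_text(info + "\n")
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        user_dir = Path(tmp) / "alice"
        (user_dir / "current").mkdir(parents=True)
        (user_dir / "current" / "topics").write_text("example\n")
        saved = save_voice_model("Alice", user_dir, {"Example": "1"}, Path(tmp) / "media")
        print(sorted(p.name for p in saved.iterdir()))
```

Merging into an existing users.ini lets several voice models share the same media.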
-
- 1. VMM pops up a user selection window asking for drive containing the voice model.
- 2. After the drive is selected, VMM looks on the selected drive for the Users folder containing the voice models, specifically, the users.ini files.
- 3. If VMM does not find any users, a window pops up saying that no users were found, with an OK button that returns the user to the previous screen, as illustrated in FIG. 5.
- 4. If users exist, VMM then asks the user to select one of the voice models it found on the selected drive.
- 5. VMM then reads the VMMInfo.txt file to determine the appropriate registry settings.
- 6. If the user already exists, VMM prompts the user to ask whether the existing user files should be overwritten.
- 7. If the user responds by clicking the OK button, steps 8 onward are executed; otherwise, VMM goes back to the main VMM screen.
- 8. If there is no user specific folder, then VMM creates the user specific folder in the standard Dragon hierarchical directory structure otherwise it uses the existing folder.
- 9. VMM then copies all the specific user files listed previously to the specific User folder.
- 10. VMM then configures the registry parameters for the selected user.
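The load steps can be sketched in Python as well. This sketch assumes users.ini holds a `[Users]` section mapping each username to a folder name, and that the registry settings are stored as key=value lines in a VMMinfo.txt file inside that folder (the text spells the name both VMMinfo.txt and VMMInfo.txt; Windows file names are case-insensitive, and the sketch uses one spelling). The overwrite prompt is passed in as a callback, and the settings are returned rather than written, because configuring the Windows registry (step 10) is platform-specific. Function names are hypothetical.

```python
import configparser
import shutil
import tempfile
from pathlib import Path


def find_voice_models(drive_root):
    """Steps 2-3: return {username: folder} from Users/users.ini, or {}."""
    ini_path = Path(drive_root) / "Users" / "users.ini"
    if not ini_path.is_file():
        return {}
    ini = configparser.ConfigParser(interpolation=None)
    ini.optionxform = str  # keep username case
    ini.read(ini_path)
    return dict(ini.items("Users")) if ini.has_section("Users") else {}


def load_voice_model(drive_root, username, local_users_dir, confirm_overwrite):
    """Steps 4-10: copy the selected model locally and return its settings.

    Returns None if the user declines to overwrite an existing model.
    """
    models = find_voice_models(drive_root)
    if username not in models:
        raise LookupError(f"no voice model for {username!r}")
    source = Path(drive_root) / "Users" / models[username]

    settings = {}                                     # step 5: read settings
    info = source / "VMMinfo.txt"
    if info.is_file():
        for line in info.read_text().splitlines():
            key, sep, value = line.partition("=")
            if sep:
                settings[key.strip()] = value.strip()

    target = Path(local_users_dir) / models[username]
    if target.exists() and not confirm_overwrite(username):  # steps 6-7
        return None
    target.mkdir(parents=True, exist_ok=True)         # step 8: user folder
    shutil.copytree(source, target, dirs_exist_ok=True)  # step 9: copy files
    return settings                                   # step 10: apply these


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        model = Path(tmp) / "media" / "Users" / "alice"
        model.mkdir(parents=True)
        (model / "VMMinfo.txt").write_text("Example=1\n")
        (model.parent / "users.ini").write_text("[Users]\nAlice = alice\n")
        print(load_voice_model(Path(tmp) / "media", "Alice",
                               Path(tmp) / "local", lambda name: True))
```

An empty result from `find_voice_models` corresponds to the "no users were found" window in step 3.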
-
- 1. (Setup) Application used for LVVR is identified
- 2. (Tstart) A/D time is measured by logging the time the driver gets sound input. This can be accomplished through a peek message or, on Microsoft Windows, a call such as InmChannelAudio::IsIncoming (HRESULT IsIncoming(void)), or by another method.
- 3. (Pstart) Determine and log the time when the speech processing engine has received the sound, using a function call (e.g., RealTimeGet).
- 4. (Pend) Determine and log the time when the speech engine has completed the translation, using a function call (e.g., TimeOutGet).
- 5. (Tend) Determine when the graphics driver has displayed the text, using a peek message or, on Microsoft Windows, a function call (e.g., UI Text Event; TEXT_VALUE_CHANGED).
- 6. (Report) Calculate the times. For general performance, Tend−Tstart supplies the performance delay. For finer resolution, to determine areas of throughput resistance, steps 2 and 3 can be used.
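Given the four timestamps logged in steps 2 through 5, the report in step 6 reduces to subtraction. This Python sketch assumes all timestamps come from one clock and are in seconds; capturing them through the driver, engine, and display hooks named above is not shown, and the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WordTiming:
    t_start: float  # step 2: driver received the sound input
    p_start: float  # step 3: speech engine received the sound
    p_end: float    # step 4: speech engine finished the translation
    t_end: float    # step 5: text was displayed

    def report(self):
        """Step 6: total delay plus a per-stage breakdown, in seconds."""
        stamps = [self.t_start, self.p_start, self.p_end, self.t_end]
        if stamps != sorted(stamps):
            raise ValueError("timestamps are out of order")
        return {
            "input_to_engine": self.p_start - self.t_start,
            "recognition": self.p_end - self.p_start,
            "engine_to_display": self.t_end - self.p_end,
            "total": self.t_end - self.t_start,  # performance = Tend - Tstart
        }


if __name__ == "__main__":
    timing = WordTiming(t_start=10.00, p_start=10.05, p_end=10.30, t_end=10.40)
    for stage, seconds in timing.report().items():
        print(f"{stage:<18} {seconds:.3f} s")
```

The three stage delays add up to the total, so the largest stage points to where throughput resistance sits.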
- Optimal features to enhance RAP Rate
- High-speed microprocessors
- Robust floating point features
- Large on-chip and off-chip cache memory (512 KB or more)
- High-capacity/fast main memory (optimal 512 megabytes)
- Sound input device with performance focused on input in the range of the human voice
- An operating system specifically configured (tuned) for the application of voice recognition including:
- Removing any throughput resistance including processes that require main CPU clock cycles but don't provide advantage to LVVR.
- Removing operating system resources that use main memory or run in the background like schedulers, virus checking, or utilities that execute polling at specific time intervals or triggers.
- Removing applications that use main CPU floating point and moving that work to other microprocessors.
- Ensuring that the operating system and any applications in use return allocated memory so that it is available again, rather than leaving it locked out by the LVVR application.
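Where the list gives numbers, checking a candidate system against it (the fit to the RAP list of requirements, FIG. 14, #506) is a simple comparison. This Python sketch checks only the two quantified items, cache memory of 512 KB or more and 512 MB of main memory; the spec keys and function name are hypothetical, and the unquantified items (processor speed, floating point, sound input, OS tuning) would need their own criteria.

```python
# Quantified items from the list of optimal features.
RAP_OPTIMAL = {"cache_kb": 512, "memory_mb": 512}


def rap_shortfalls(spec, targets=RAP_OPTIMAL):
    """Return {item: (actual, target)} for every item below its target.

    Items missing from spec are reported with actual=None.
    """
    shortfalls = {}
    for item, target in targets.items():
        actual = spec.get(item)
        if actual is None or actual < target:
            shortfalls[item] = (actual, target)
    return shortfalls


if __name__ == "__main__":
    # Cache and memory sizes of the handheld prototype (256 KB, 256 MB).
    print(rap_shortfalls({"cache_kb": 256, "memory_mb": 256}))
```

These values describe an optimal system rather than a hard minimum, so a shortfall flags an item for closer RAP Meter testing rather than rejecting the system outright.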
Claims (74)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/436,519 USRE44248E1 (en) | 1999-09-29 | 2012-03-30 | System for transferring personalize matter from one computer to another |
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15663899P | 1999-09-29 | 1999-09-29 | |
US21450400P | 2000-06-28 | 2000-06-28 | |
US67632800A | 2000-09-29 | 2000-09-29 | |
US10/763,966 US7689416B1 (en) | 1999-09-29 | 2004-01-23 | System for transferring personalize matter from one computer to another |
US13/436,519 USRE44248E1 (en) | 1999-09-29 | 2012-03-30 | System for transferring personalize matter from one computer to another |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/763,966 Reissue US7689416B1 (en) | 1999-09-29 | 2004-01-23 | System for transferring personalize matter from one computer to another |
Publications (1)
Publication Number | Publication Date |
---|---|
USRE44248E1 true USRE44248E1 (en) | 2013-05-28 |
Family
ID=42044656
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/763,966 Expired - Fee Related US7689416B1 (en) | 1999-09-29 | 2004-01-23 | System for transferring personalize matter from one computer to another |
US13/436,519 Expired - Lifetime USRE44248E1 (en) | 1999-09-29 | 2012-03-30 | System for transferring personalize matter from one computer to another |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/763,966 Expired - Fee Related US7689416B1 (en) | 1999-09-29 | 2004-01-23 | System for transferring personalize matter from one computer to another |
Country Status (1)
Country | Link |
---|---|
US (2) | US7689416B1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130211568A1 (en) * | 2012-02-12 | 2013-08-15 | Skymedi Corporation | Automataed mass prodcution method and system thereof |
US9763024B2 (en) | 2015-04-09 | 2017-09-12 | Yahoo Holdings, Inc. | Mobile ghosting |
US20190378496A1 (en) * | 2018-06-07 | 2019-12-12 | Kabushiki Kaisha Toshiba | Recognition device, method and storage medium |
Families Citing this family (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2902542B1 (en) * | 2006-06-16 | 2012-12-21 | Gilles Vessiere Consultants | SEMANTIC, SYNTAXIC AND / OR LEXICAL CORRECTION DEVICE, CORRECTION METHOD, RECORDING MEDIUM, AND COMPUTER PROGRAM FOR IMPLEMENTING SAID METHOD |
US8838457B2 (en) * | 2007-03-07 | 2014-09-16 | Vlingo Corporation | Using results of unstructured language model based speech recognition to control a system-level function of a mobile communications facility |
US8949130B2 (en) | 2007-03-07 | 2015-02-03 | Vlingo Corporation | Internal and external speech recognition use with a mobile communication facility |
US20090030687A1 (en) * | 2007-03-07 | 2009-01-29 | Cerra Joseph P | Adapting an unstructured language model speech recognition system based on usage |
US10056077B2 (en) | 2007-03-07 | 2018-08-21 | Nuance Communications, Inc. | Using speech recognition results based on an unstructured language model with a music system |
US8886545B2 (en) * | 2007-03-07 | 2014-11-11 | Vlingo Corporation | Dealing with switch latency in speech recognition |
US20090030691A1 (en) * | 2007-03-07 | 2009-01-29 | Cerra Joseph P | Using an unstructured language model associated with an application of a mobile communication facility |
US8886540B2 (en) * | 2007-03-07 | 2014-11-11 | Vlingo Corporation | Using speech recognition results based on an unstructured language model in a mobile communication facility application |
US8949266B2 (en) | 2007-03-07 | 2015-02-03 | Vlingo Corporation | Multiple web-based content category searching in mobile search application |
US8880405B2 (en) | 2007-03-07 | 2014-11-04 | Vlingo Corporation | Application text entry in a mobile environment using a speech processing facility |
US20080312934A1 (en) * | 2007-03-07 | 2008-12-18 | Cerra Joseph P | Using results of unstructured language model based speech recognition to perform an action on a mobile communications facility |
US20080221899A1 (en) * | 2007-03-07 | 2008-09-11 | Cerra Joseph P | Mobile messaging environment speech processing facility |
US8635243B2 (en) * | 2007-03-07 | 2014-01-21 | Research In Motion Limited | Sending a communications header with voice recording to send metadata for use in speech recognition, formatting, and search mobile search application |
US20080256613A1 (en) * | 2007-03-13 | 2008-10-16 | Grover Noel J | Voice print identification portal |
US9128981B1 (en) | 2008-07-29 | 2015-09-08 | James L. Geer | Phone assisted ‘photographic memory’ |
US20100131280A1 (en) * | 2008-11-25 | 2010-05-27 | General Electric Company | Voice recognition system for medical devices |
US8185373B1 (en) * | 2009-05-05 | 2012-05-22 | The United States Of America As Represented By The Director, National Security Agency, The | Method of assessing language translation and interpretation |
US8352244B2 (en) * | 2009-07-21 | 2013-01-08 | International Business Machines Corporation | Active learning systems and methods for rapid porting of machine translation systems to new language pairs or new domains |
TWI413106B (en) * | 2010-08-04 | 2013-10-21 | Hon Hai Prec Ind Co Ltd | Electronic recording apparatus and method thereof |
US9263032B2 (en) | 2013-10-24 | 2016-02-16 | Honeywell International Inc. | Voice-responsive building management system |
US10395640B1 (en) * | 2014-07-23 | 2019-08-27 | Nvoq Incorporated | Systems and methods evaluating user audio profiles for continuous speech recognition |
US20160379630A1 (en) * | 2015-06-25 | 2016-12-29 | Intel Corporation | Speech recognition services |
CN113241064B (en) * | 2021-06-28 | 2024-02-13 | 科大讯飞股份有限公司 | Speech recognition, model training method and device, electronic equipment and storage medium |
Citations (59)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4922538A (en) | 1987-02-10 | 1990-05-01 | British Telecommunications Public Limited Company | Multi-user speech recognition system |
US5027406A (en) | 1988-12-06 | 1991-06-25 | Dragon Systems, Inc. | Method for interactive speech recognition and training |
US5425128A (en) | 1992-05-29 | 1995-06-13 | Sunquest Information Systems, Inc. | Automatic management system for speech recognition processes |
US5600781A (en) | 1994-09-30 | 1997-02-04 | Intel Corporation | Method and apparatus for creating a portable personalized operating environment |
US5717820A (en) | 1994-03-10 | 1998-02-10 | Fujitsu Limited | Speech recognition method and apparatus with automatic parameter selection based on hardware running environment |
US5724410A (en) | 1995-12-18 | 1998-03-03 | Sony Corporation | Two-way voice messaging terminal having a speech to text converter |
US5774841A (en) | 1995-09-20 | 1998-06-30 | The United States Of America As Represented By The Adminstrator Of The National Aeronautics And Space Administration | Real-time reconfigurable adaptive speech recognition command and control apparatus and method |
US5818800A (en) | 1992-04-06 | 1998-10-06 | Barker; Bruce J. | Voice recording device having portable and local modes of operation |
US5822727A (en) | 1995-03-30 | 1998-10-13 | At&T Corp | Method for automatic speech recognition in telephony |
US5825921A (en) | 1993-03-19 | 1998-10-20 | Intel Corporation | Memory transfer apparatus and method useful within a pattern recognition system |
US5850627A (en) | 1992-11-13 | 1998-12-15 | Dragon Systems, Inc. | Apparatuses and methods for training and operating speech recognition systems |
US5956683A (en) * | 1993-12-22 | 1999-09-21 | Qualcomm Incorporated | Distributed voice recognition system |
US5960063A (en) | 1996-08-23 | 1999-09-28 | Kokusai Denshin Denwa Kabushiki Kaisha | Telephone speech recognition system |
US5995936A (en) | 1997-02-04 | 1999-11-30 | Brais; Louis | Report generation system and method for capturing prose, audio, and video by voice command and automatically linking sound and image to formatted text locations |
EP0965979A1 (en) | 1998-06-15 | 1999-12-22 | Dragon Systems Inc. | Position manipulation in speech recognition |
US6014624A (en) | 1997-04-18 | 2000-01-11 | Nynex Science And Technology, Inc. | Method and apparatus for transitioning from one voice recognition system to another |
US6029195A (en) | 1994-11-29 | 2000-02-22 | Herz; Frederick S. M. | System for customized electronic identification of desirable objects |
US6067516A (en) | 1997-05-09 | 2000-05-23 | Siemens Information | Speech and text messaging system with distributed speech recognition and speaker database transfers |
US6067517A (en) * | 1996-02-02 | 2000-05-23 | International Business Machines Corporation | Transcription of speech data with segments from acoustically dissimilar environments |
US6073103A (en) | 1996-04-25 | 2000-06-06 | International Business Machines Corporation | Display accessory for a record playback system |
US6092043A (en) | 1992-11-13 | 2000-07-18 | Dragon Systems, Inc. | Apparatuses and method for training and operating speech recognition systems |
US6108628A (en) | 1996-09-20 | 2000-08-22 | Canon Kabushiki Kaisha | Speech recognition method and apparatus using coarse and fine output probabilities utilizing an unspecified speaker model |
WO2000049601A1 (en) | 1999-02-19 | 2000-08-24 | Custom Speech Usa, Inc. | Automated transcription system and method using two speech converting instances and computer-assisted correction |
US6122614A (en) | 1998-11-20 | 2000-09-19 | Custom Speech Usa, Inc. | System and method for automating transcription services |
US6128482A (en) | 1998-12-22 | 2000-10-03 | General Motors Corporation | Providing mobile application services with download of speaker independent voice model |
US6163768A (en) | 1998-06-15 | 2000-12-19 | Dragon Systems, Inc. | Non-interactive enrollment in speech recognition |
US6175822B1 (en) | 1998-06-05 | 2001-01-16 | Sprint Communications Company, L.P. | Method and system for providing network based transcription services |
US6212498B1 (en) | 1997-03-28 | 2001-04-03 | Dragon Systems, Inc. | Enrollment in speech recognition |
US6260013B1 (en) | 1997-03-14 | 2001-07-10 | Lernout & Hauspie Speech Products N.V. | Speech recognition system employing discriminatively trained models |
US6275805B1 (en) | 1999-02-25 | 2001-08-14 | International Business Machines Corp. | Maintaining input device identity |
US6308158B1 (en) * | 1999-06-30 | 2001-10-23 | Dictaphone Corporation | Distributed speech recognition system with multi-user input stations |
US6317484B1 (en) | 1998-04-08 | 2001-11-13 | Mcallister Alexander I. | Personal telephone service with transportable script control of services |
US6327343B1 (en) | 1998-01-16 | 2001-12-04 | International Business Machines Corporation | System and methods for automatic call and data transfer processing |
US6342903B1 (en) | 1999-02-25 | 2002-01-29 | International Business Machines Corp. | User selectable input devices for speech applications |
US6366882B1 (en) | 1997-03-27 | 2002-04-02 | Speech Machines, Plc | Apparatus for converting speech to text |
US6374221B1 (en) * | 1999-06-22 | 2002-04-16 | Lucent Technologies Inc. | Automatic retraining of a speech recognizer while using reliable transcripts |
US6401066B1 (en) | 1999-11-09 | 2002-06-04 | West Teleservices Holding Company | Automated third party verification system |
US6415258B1 (en) | 1999-10-06 | 2002-07-02 | Microsoft Corporation | Background audio recovery system |
US20020095290A1 (en) * | 1999-02-05 | 2002-07-18 | Jonathan Kahn | Speech recognition program mapping tool to align an audio file to verbatim text |
US6430551B1 (en) | 1997-10-08 | 2002-08-06 | Koninklijke Philips Electroncis N.V. | Vocabulary and/or language model training |
US6477491B1 (en) | 1999-05-27 | 2002-11-05 | Mark Chandler | System and method for providing speaker-specific records of statements of speakers |
US6477493B1 (en) | 1999-07-15 | 2002-11-05 | International Business Machines Corporation | Off site voice enrollment on a transcription device for speech recognition |
US6556971B1 (en) | 2000-09-01 | 2003-04-29 | Snap-On Technologies, Inc. | Computer-implemented speech recognition system training |
US6594628B1 (en) * | 1995-09-21 | 2003-07-15 | Qualcomm, Incorporated | Distributed voice recognition system |
US6636961B1 (en) | 1999-07-09 | 2003-10-21 | International Business Machines Corporation | System and method for configuring personal systems |
US6654955B1 (en) | 1996-12-19 | 2003-11-25 | International Business Machines Corporation | Adding speech recognition libraries to an existing program at runtime |
US6674451B1 (en) | 1999-02-25 | 2004-01-06 | International Business Machines Corporation | Preventing audio feedback |
US6725194B1 (en) | 1999-07-08 | 2004-04-20 | Koninklijke Philips Electronics N.V. | Speech recognition device with text comparing means |
US6751590B1 (en) | 2000-06-13 | 2004-06-15 | International Business Machines Corporation | Method and apparatus for performing pattern-specific maximum likelihood transformations for speaker recognition |
US6775651B1 (en) | 2000-05-26 | 2004-08-10 | International Business Machines Corporation | Method of transcribing text from computer voice mail |
US6785647B2 (en) * | 2001-04-20 | 2004-08-31 | William R. Hutchison | Speech recognition system with network accessible speech processing resources |
US6785847B1 (en) * | 2000-08-03 | 2004-08-31 | International Business Machines Corporation | Soft error detection in high speed microprocessors |
US6868379B1 (en) | 1999-07-08 | 2005-03-15 | Koninklijke Philips Electronics N.V. | Speech recognition device with transfer means |
US6885736B2 (en) | 1996-11-14 | 2005-04-26 | Nuance Communications | System and method for providing and using universally accessible voice and speech data files |
US6952675B1 (en) | 1999-09-10 | 2005-10-04 | International Business Machines Corporation | Methods and apparatus for voice information registration and recognized sentence specification in accordance with speech recognition |
US6961699B1 (en) | 1999-02-19 | 2005-11-01 | Custom Speech Usa, Inc. | Automated transcription system and method using two speech converting instances and computer-assisted correction |
US6983248B1 (en) | 1999-09-10 | 2006-01-03 | International Business Machines Corporation | Methods and apparatus for recognized word registration in accordance with speech recognition |
US7006967B1 (en) | 1999-02-05 | 2006-02-28 | Custom Speech Usa, Inc. | System and method for automating transcription services |
US7212969B1 (en) | 2000-09-29 | 2007-05-01 | Intel Corporation | Dynamic generation of voice interface structure and voice content based upon either or both user-specific contextual information and environmental information |
- 2004-01-23: US application US10/763,966, patent US7689416B1; not active (Expired - Fee Related)
- 2012-03-30: US application US13/436,519, patent USRE44248E1; not active (Expired - Lifetime)
Patent Citations (63)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4922538A (en) | 1987-02-10 | 1990-05-01 | British Telecommunications Public Limited Company | Multi-user speech recognition system |
US5027406A (en) | 1988-12-06 | 1991-06-25 | Dragon Systems, Inc. | Method for interactive speech recognition and training |
US5818800A (en) | 1992-04-06 | 1998-10-06 | Barker; Bruce J. | Voice recording device having portable and local modes of operation |
US5425128A (en) | 1992-05-29 | 1995-06-13 | Sunquest Information Systems, Inc. | Automatic management system for speech recognition processes |
US6073097A (en) | 1992-11-13 | 2000-06-06 | Dragon Systems, Inc. | Speech recognition system which selects one of a plurality of vocabulary models |
US6092043A (en) | 1992-11-13 | 2000-07-18 | Dragon Systems, Inc. | Apparatuses and method for training and operating speech recognition systems |
US6101468A (en) | 1992-11-13 | 2000-08-08 | Dragon Systems, Inc. | Apparatuses and methods for training and operating speech recognition systems |
US5850627A (en) | 1992-11-13 | 1998-12-15 | Dragon Systems, Inc. | Apparatuses and methods for training and operating speech recognition systems |
US5881312A (en) | 1993-03-19 | 1999-03-09 | Intel Corporation | Memory transfer apparatus and method useful within a pattern recognition system |
US5825921A (en) | 1993-03-19 | 1998-10-20 | Intel Corporation | Memory transfer apparatus and method useful within a pattern recognition system |
US5956683A (en) * | 1993-12-22 | 1999-09-21 | Qualcomm Incorporated | Distributed voice recognition system |
US5717820A (en) | 1994-03-10 | 1998-02-10 | Fujitsu Limited | Speech recognition method and apparatus with automatic parameter selection based on hardware running environment |
US5600781A (en) | 1994-09-30 | 1997-02-04 | Intel Corporation | Method and apparatus for creating a portable personalized operating environment |
US6029195A (en) | 1994-11-29 | 2000-02-22 | Herz; Frederick S. M. | System for customized electronic identification of desirable objects |
US5822727A (en) | 1995-03-30 | 1998-10-13 | At&T Corp | Method for automatic speech recognition in telephony |
US5774841A (en) | 1995-09-20 | 1998-06-30 | The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration | Real-time reconfigurable adaptive speech recognition command and control apparatus and method |
US6594628B1 (en) * | 1995-09-21 | 2003-07-15 | Qualcomm, Incorporated | Distributed voice recognition system |
US5724410A (en) | 1995-12-18 | 1998-03-03 | Sony Corporation | Two-way voice messaging terminal having a speech to text converter |
US6067517A (en) * | 1996-02-02 | 2000-05-23 | International Business Machines Corporation | Transcription of speech data with segments from acoustically dissimilar environments |
US6073103A (en) | 1996-04-25 | 2000-06-06 | International Business Machines Corporation | Display accessory for a record playback system |
US5960063A (en) | 1996-08-23 | 1999-09-28 | Kokusai Denshin Denwa Kabushiki Kaisha | Telephone speech recognition system |
US6108628A (en) | 1996-09-20 | 2000-08-22 | Canon Kabushiki Kaisha | Speech recognition method and apparatus using coarse and fine output probabilities utilizing an unspecified speaker model |
US6885736B2 (en) | 1996-11-14 | 2005-04-26 | Nuance Communications | System and method for providing and using universally accessible voice and speech data files |
US6654955B1 (en) | 1996-12-19 | 2003-11-25 | International Business Machines Corporation | Adding speech recognition libraries to an existing program at runtime |
US5995936A (en) | 1997-02-04 | 1999-11-30 | Brais; Louis | Report generation system and method for capturing prose, audio, and video by voice command and automatically linking sound and image to formatted text locations |
US6260013B1 (en) | 1997-03-14 | 2001-07-10 | Lernout & Hauspie Speech Products N.V. | Speech recognition system employing discriminatively trained models |
US6366882B1 (en) | 1997-03-27 | 2002-04-02 | Speech Machines, Plc | Apparatus for converting speech to text |
US6212498B1 (en) | 1997-03-28 | 2001-04-03 | Dragon Systems, Inc. | Enrollment in speech recognition |
US6014624A (en) | 1997-04-18 | 2000-01-11 | Nynex Science And Technology, Inc. | Method and apparatus for transitioning from one voice recognition system to another |
US6067516A (en) | 1997-05-09 | 2000-05-23 | Siemens Information | Speech and text messaging system with distributed speech recognition and speaker database transfers |
US6430551B1 (en) | 1997-10-08 | 2002-08-06 | Koninklijke Philips Electronics N.V. | Vocabulary and/or language model training |
US6327343B1 (en) | 1998-01-16 | 2001-12-04 | International Business Machines Corporation | System and methods for automatic call and data transfer processing |
US6317484B1 (en) | 1998-04-08 | 2001-11-13 | Mcallister Alexander I. | Personal telephone service with transportable script control of services |
US6175822B1 (en) | 1998-06-05 | 2001-01-16 | Sprint Communications Company, L.P. | Method and system for providing network based transcription services |
US6424943B1 (en) | 1998-06-15 | 2002-07-23 | Scansoft, Inc. | Non-interactive enrollment in speech recognition |
US6163768A (en) | 1998-06-15 | 2000-12-19 | Dragon Systems, Inc. | Non-interactive enrollment in speech recognition |
EP0965979A1 (en) | 1998-06-15 | 1999-12-22 | Dragon Systems Inc. | Position manipulation in speech recognition |
US6122614A (en) | 1998-11-20 | 2000-09-19 | Custom Speech Usa, Inc. | System and method for automating transcription services |
US6128482A (en) | 1998-12-22 | 2000-10-03 | General Motors Corporation | Providing mobile application services with download of speaker independent voice model |
US7006967B1 (en) | 1999-02-05 | 2006-02-28 | Custom Speech Usa, Inc. | System and method for automating transcription services |
US20020095290A1 (en) * | 1999-02-05 | 2002-07-18 | Jonathan Kahn | Speech recognition program mapping tool to align an audio file to verbatim text |
US6961699B1 (en) | 1999-02-19 | 2005-11-01 | Custom Speech Usa, Inc. | Automated transcription system and method using two speech converting instances and computer-assisted correction |
WO2000049601A1 (en) | 1999-02-19 | 2000-08-24 | Custom Speech Usa, Inc. | Automated transcription system and method using two speech converting instances and computer-assisted correction |
US6342903B1 (en) | 1999-02-25 | 2002-01-29 | International Business Machines Corp. | User selectable input devices for speech applications |
US6275805B1 (en) | 1999-02-25 | 2001-08-14 | International Business Machines Corp. | Maintaining input device identity |
US6674451B1 (en) | 1999-02-25 | 2004-01-06 | International Business Machines Corporation | Preventing audio feedback |
US6477491B1 (en) | 1999-05-27 | 2002-11-05 | Mark Chandler | System and method for providing speaker-specific records of statements of speakers |
US6374221B1 (en) * | 1999-06-22 | 2002-04-16 | Lucent Technologies Inc. | Automatic retraining of a speech recognizer while using reliable transcripts |
US6308158B1 (en) * | 1999-06-30 | 2001-10-23 | Dictaphone Corporation | Distributed speech recognition system with multi-user input stations |
US6868379B1 (en) | 1999-07-08 | 2005-03-15 | Koninklijke Philips Electronics N.V. | Speech recognition device with transfer means |
US6725194B1 (en) | 1999-07-08 | 2004-04-20 | Koninklijke Philips Electronics N.V. | Speech recognition device with text comparing means |
US6636961B1 (en) | 1999-07-09 | 2003-10-21 | International Business Machines Corporation | System and method for configuring personal systems |
US6477493B1 (en) | 1999-07-15 | 2002-11-05 | International Business Machines Corporation | Off site voice enrollment on a transcription device for speech recognition |
US6952675B1 (en) | 1999-09-10 | 2005-10-04 | International Business Machines Corporation | Methods and apparatus for voice information registration and recognized sentence specification in accordance with speech recognition |
US6983248B1 (en) | 1999-09-10 | 2006-01-03 | International Business Machines Corporation | Methods and apparatus for recognized word registration in accordance with speech recognition |
US6415258B1 (en) | 1999-10-06 | 2002-07-02 | Microsoft Corporation | Background audio recovery system |
US6401066B1 (en) | 1999-11-09 | 2002-06-04 | West Teleservices Holding Company | Automated third party verification system |
US6775651B1 (en) | 2000-05-26 | 2004-08-10 | International Business Machines Corporation | Method of transcribing text from computer voice mail |
US6751590B1 (en) | 2000-06-13 | 2004-06-15 | International Business Machines Corporation | Method and apparatus for performing pattern-specific maximum likelihood transformations for speaker recognition |
US6785847B1 (en) * | 2000-08-03 | 2004-08-31 | International Business Machines Corporation | Soft error detection in high speed microprocessors |
US6556971B1 (en) | 2000-09-01 | 2003-04-29 | Snap-On Technologies, Inc. | Computer-implemented speech recognition system training |
US7212969B1 (en) | 2000-09-29 | 2007-05-01 | Intel Corporation | Dynamic generation of voice interface structure and voice content based upon either or both user-specific contextual information and environmental information |
US6785647B2 (en) * | 2001-04-20 | 2004-08-31 | William R. Hutchison | Speech recognition system with network accessible speech processing resources |
Non-Patent Citations (8)
Title |
---|
Article 1: An e-mail exchange in which Michael Bliss asked, on Jun. 26, 1998, for instructions on transferring user voice files to another computer, and Marty Tibor of Synapse Adaptive replied on Jun. 29, 1998, with a link to Article 2 below. Printed from http://www.voicerecognition.com/voice-users/archive/1998/3598.html on Feb. 24, 2011. |
Article 2: Gould, Joel, "Moving Speech files Around", web address http://www.synapseadaptive.com/joel/moving-speech-files.htm, last revised Feb. 23, 1998, currently available, original publication unknown. |
IBM ViaVoice™ 98 Home Edition User Guide, First Edition (Jul. 1998), International Business Machines Corporation, 1998. |
Brandhagen, Randy, "Recognizing Users," e-mail to the Sez! Speech Recognition Forum, Mar. 19, 1998, 20:14:52; answer by Phantom, Mar. 21, 1998, 07:42:26. |
S0165168400000281-1: Gilloire et al., "Innovative Speech Processing for Mobile Terminals," Signal Processing 80 (2000) 1149-1166, published by Elsevier Science B.V. |
S0167639399000801-1: Doddington et al., "The NIST Speaker Recognition Evaluation," Speech Communication 31 (2000) 225-254, published by Elsevier Science B.V. |
This reference is a collection of messages from an online discussion on the medspeech@list.sirius.com mailing list between May 12 and May 13, 1998. It was found by searching for "moving voice files". |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130211568A1 (en) * | 2012-02-12 | 2013-08-15 | Skymedi Corporation | Automated mass production method and system thereof |
US8855799B2 (en) * | 2012-02-12 | 2014-10-07 | Skymedi Corporation | Automated mass production method and system thereof |
US9763024B2 (en) | 2015-04-09 | 2017-09-12 | Yahoo Holdings, Inc. | Mobile ghosting |
US10555148B2 (en) | 2015-04-09 | 2020-02-04 | Oath Inc. | Mobile ghosting |
US20190378496A1 (en) * | 2018-06-07 | 2019-12-12 | Kabushiki Kaisha Toshiba | Recognition device, method and storage medium |
US11600262B2 (en) * | 2018-06-07 | 2023-03-07 | Kabushiki Kaisha Toshiba | Recognition device, method and storage medium |
Also Published As
Publication number | Publication date |
---|---|
US7689416B1 (en) | 2010-03-30 |
Similar Documents
Publication | Title |
---|---|
USRE44248E1 (en) | System for transferring personalize matter from one computer to another |
Warden | Speech commands: A dataset for limited-vocabulary speech recognition |
US7962331B2 (en) | System and method for tuning and testing in a speech recognition system |
US6728680B1 (en) | Method and apparatus for providing visual feedback of speed production |
AU2003279037B2 (en) | Software for statistical analysis of speech |
US8260617B2 (en) | Automating input when testing voice-enabled applications |
US10535352B2 (en) | Automated cognitive recording and organization of speech as structured text |
US20200143151A1 (en) | Interactive test method, device and system |
US10997965B2 (en) | Automated voice processing testing system and method |
EA004352B1 (en) | Automated transcription system and method using two speech converting instances and computer-assisted correction |
US20080280269A1 (en) | A Homework Assignment and Assessment System for Spoken Language Education and Testing |
US11741303B2 (en) | Tokenization of text data to facilitate automated discovery of speech disfluencies |
US12062373B2 (en) | Automated generation of transcripts through independent transcription |
US7983921B2 (en) | Information processing apparatus for speech recognition with user guidance, method and program |
JP2002132287A (en) | Speech recording method and speech recorder as well as memory medium |
US20150149183A1 (en) | Process and Associated System for Separating a Specified Component and an Audio Background Component from an Audio Mixture Signal |
Rodd et al. | A tool for efficient and accurate segmentation of speech data: announcing POnSS |
US20210280167A1 (en) | Text to speech prompt tuning by example |
Nusbaum et al. | Automatic measurement of speech recognition performance: A comparison of six speaker-dependent recognition devices |
JP7166370B2 (en) | Methods, systems, and computer readable recording media for improving speech recognition rates for audio recordings |
Ingram et al. | Commentary on "Evaluating Articulation and Phonological Disorders When the Clock Is Running" |
US20230230610A1 (en) | Approaches to generating studio-quality recordings through manipulation of noisy audio |
JP2017207546A (en) | Reverberant environment determination device, reverberant environment determination method, and program |
CN115910029A (en) | Generating synthesized speech input |
US20060074638A1 (en) | Speech file generating system and method |
Legal Events
Code | Title | Description |
---|---|---|
REMI | Maintenance fee reminder mailed | |
FPAY | Fee payment | Year of fee payment: 4 |
SULP | Surcharge for late payment | |
FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO MICRO (ORIGINAL EVENT CODE: MICR) |
FEPP | Fee payment procedure | Free format text: SURCHARGE FOR LATE PAYMENT, MICRO ENTITY (ORIGINAL EVENT CODE: M3555) |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, MICRO ENTITY (ORIGINAL EVENT CODE: M3552) Year of fee payment: 8 |
AS | Assignment | Owner name: GENERAL VOICE INC., MASSACHUSETTS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:POIRIER, DARRELL;REEL/FRAME:044701/0468 Effective date: 20171002. Owner name: BLACKBIRD TECH LLC, MASSACHUSETTS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:GENERAL VOICE INC.;REEL/FRAME:044701/0288 Effective date: 20171016 |
AS | Assignment | Owner name: GENERAL VOICE, INC., MASSACHUSETTS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BLACKBIRD TECH LLC DBA BLACKBIRD TECHNOLOGIES;REEL/FRAME:050008/0023 Effective date: 20190613 |
AS | Assignment | Owner name: GENERAL VOICE, LLC, MASSACHUSETTS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:GENERAL VOICE, INC.,;REEL/FRAME:051395/0774 Effective date: 20191115 |
AS | Assignment | Owner name: SYNKLOUD TECHNOLOGIES, LLC, DELAWARE Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:GENERAL VOICE, LLC;REEL/FRAME:051455/0161 Effective date: 20200108 |
FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 12 |