
WO1990008439A2 - A speech processing apparatus and method therefor - Google Patents


Info

Publication number
WO1990008439A2
Authority
WO
WIPO (PCT)
Prior art keywords
telephone
speech
signal
stored
telephone number
Application number
PCT/US1990/000096
Other languages
French (fr)
Other versions
WO1990008439A3 (en)
Inventor
Peter J. Schmuckal
Mitchell S. Phillips
Francis P. Keiper, III
James C. Sprout
Ronald H. Fried
Original Assignee
Origin Technology, Inc.
Priority claimed from US07/294,168 (US5007081A)
Application filed by Origin Technology, Inc.
Priority to KR1019900701938A (KR910700582A)
Publication of WO1990008439A2
Publication of WO1990008439A3


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/08: Speech classification or search
    • G10L15/12: Speech classification or search using dynamic programming techniques, e.g. dynamic time warping [DTW]
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M13/00: Party-line systems
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M1/00: Substation equipment, e.g. for use by subscribers
    • H04M1/26: Devices for calling a subscriber
    • H04M1/27: Devices whereby a plurality of signals may be stored simultaneously
    • H04M1/271: Devices whereby a plurality of signals may be stored simultaneously controlled by voice recognition

Definitions

  • The present invention relates to a speech processing apparatus having particular application in a speech activated telephone.
  • Speech recognition apparatuses are well known in the art.
  • A speech recognition apparatus can be used to activate a number of tasks, including operating a telephone.
  • The algorithms used in speech recognition are complex and have required the use of a dedicated signal processor.
  • The use of a dedicated signal processor has increased the cost of the apparatus.
  • DTW: basic dynamic time warping.
  • Warping Algorithm for Isolated Word Recognition by K. Paliwal, A. Agarwal and S.S. Sinha, IEEE Int'l Conf. on Acoust., Speech and Sig. Proc., Vol. ICASSP-2, pp. 1259-61, May 1982.
  • Although this algorithm attempts to solve the problem of window skewing, it is also subject to error.
  • Speech activated phones are also well known in the art. However, they have not provided a mechanism for resolving questionable matches between a particular speech pattern and a stored speech pattern. Thus, they are prone to errors that cannot be resolved by user input. These and other considerations have prevented speech activated telephones from using an inexpensive general purpose processor, on which other novel features could also be implemented without a great deal of cost.
  • A single-line telephone is connected to a single pair of physical wires labeled tip and ring. Since communication must take place in both directions over the tip and ring lines, a balance transformer has been used to isolate the tip and ring lines from the transmit and receive lines within the telephone. Such a balance transformer is expensive and bulky.
  • Prior art telephones have provided for monitoring of the telephone line to which they are attached; however, the typical monitoring functions have been limited to ringing, hold, and busy. Such telephones have not been able to display whether the telephone is connected to the line at all without going off hook to check.
  • In the present invention, a speech recognition apparatus and method therefor use a modified clipped autocorrelation function, in which the first difference of the speech signal is obtained before the clipped autocorrelation function is applied; the resulting speech patterns are stored.
  • The apparatus also uses a constant width band dynamic time warping algorithm and an adaptive linear prune line algorithm to match the input speech pattern to the stored speech patterns.
  • The present invention also relates to a speech activated telephone that uses the modified clipped autocorrelation function to process speech signals into speech patterns and to store those patterns.
  • The telephone also uses the constant width band dynamic time warping algorithm and the adaptive linear prune line algorithm to match the input speech pattern with the stored speech patterns. Further, the telephone asks for user input on questionable matches.
  • The telephone has a user interface menu for entering text in conjunction with numeric data, and it can record and display previously made calls.
  • The telephone also has an answering capability that can screen incoming calls, forward a particular call, and display the associated name of the caller, if any, from its directory.
  • A novel speed dialing feature is also disclosed.
  • The telephone also has a novel line status monitoring circuit and a novel phone network interface circuit.
  • Figure 1 is a perspective view of a novel telephone device.
  • Figure 2 is a top view of the keyboard layout of the portion of the telephone shown in Figure 1.
  • Figure 3 is a schematic circuit block diagram of the telephone shown in Figure 1.
  • Figure 4 is a detailed block level schematic diagram of a portion of the circuit shown in Figure 3.
  • Figure 5 is a detailed circuit level diagram of the microprocessor and its associated circuitry used in the telephone shown in Figure 3.
  • Figure 6 is a detailed schematic circuit diagram of the keyboard and display assemblies shown in Figure 3.
  • Figure 7 is a detailed schematic circuit diagram of the telephone network interface circuit portion of the telephone.
  • Figure 8 is a detailed circuit level diagram of the interface circuit portion of the telephone, which interfaces with various audio input/output signals.
  • Figure 9 is a graph showing the dynamic time warping algorithm with a constant width band.
  • Figure 10 is a graph showing adaptive linear pruning.

Detailed Description of the Drawings

  • The telephone 10 comprises a handset 12, which has a microphone and a speaker.
  • The telephone 10 also has a numeric keypad 22, which receives the numeric inputs (0-9) as well as the control signals "*" and "#".
  • Such a keyboard is well known in the art.
  • The telephone 10 comprises a plurality of well known conventional control keys: redial 20, hold 18, flash 16, and speaker 14, which activates the speaker phone.
  • The telephone 10 also comprises a key labeled directory 24, a key labeled voice 26, and three reprogrammable option buttons 28 (A-C).
  • The telephone 10 comprises an LCD display 30, which can display two lines of alphanumeric characters, sixteen characters per line. Figures 3 and 4 show the telephone 10 in block level diagram form.
  • The telephone 10 is connected to a telephone line consisting of tip and ring lines 32.
  • The telephone line 32 is connected to a line protection circuit 34, which is in turn connected to a polarity guard circuit 36.
  • The signals (to and from the line) are separated by a hybrid circuit 40 (discussed in greater detail hereinafter), from which the transmit and receive signals are supplied to the audio connect circuit 46 and the receive attenuator circuit 44, respectively.
  • Other analog circuits that connect the handset receiver 50, the handset microphone 52, and the speaker phone microphone 54 are also shown in Figure 4 and are well known in the art.
  • The signals in the phone circuit shown in Figure 4 are supplied to and from a microcomputer 60, shown in Figure 3.
  • The microcomputer 60 is a 50943 microcomputer made by Mitsubishi.
  • The microcomputer 60 is based on the 6502 processor.
  • The microcomputer 60 contains internal memory in the form of RAM and ROM. In addition, it has timers.
  • The microcomputer 60 also provides bidirectional digital I/O ports.
  • The microcomputer 60 has a built-in A/D converter with multiple multiplexed inputs.
  • The microcomputer 60 has a pulse width modulator capable of generating analog signals when the proper low
  • The timing of the microcomputer 60 is controlled by a crystal oscillator circuit 62.
  • The crystal oscillator circuit 62 consists of an 8 MHz quartz crystal and its support components. The crystal is driven into oscillation by the microcomputer 60, which divides the resulting signal to obtain 20/OUT and 0/OUT.
  • The signal 20/OUT is a 4 MHz digital clock signal that is used to drive the time control circuit 64.
  • The 0/OUT signal, a 2 MHz digital clock signal, is used by the microcomputer 60 to set the basic processor cycle time. It is also supplied to a memory control circuit 66, which uses it to control access to external memory.
  • External memory, in the form of a 32K x 8 SRAM 68, a 32K x 8 EPROM memory 70, and an output latch 72, is accessed by the microcomputer 60 over a 16-bit address bus and an 8-bit data bus. All are under the control of the memory control circuit 66.
  • The SRAM 68 is accessed whenever addresses 0000 through 7FFF (hex) are read from or written to.
  • The EPROM memory 70 is accessed whenever addresses 8000 through FFFF are read from.
  • The digital output latch 72 is accessed whenever addresses 8000 through FFFF are written to. A system reset initializes the digital output latch 72, setting all of its outputs to a low logic level.
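The external address decoding just described fits in a few lines. The following is a minimal Python sketch of the stated address ranges, not the actual logic of memory control circuit 66; the function name `decode_external_access` and its return labels are hypothetical, and the microcomputer's internal RAM and ROM are ignored.

```python
def decode_external_access(address: int, is_write: bool) -> str:
    """Return which external device responds to a bus access (hypothetical helper)."""
    if not 0x0000 <= address <= 0xFFFF:
        raise ValueError("address must fit the 16-bit address bus")
    if address <= 0x7FFF:
        return "SRAM 68"           # 0000-7FFF: both reads and writes
    if is_write:
        return "output latch 72"   # 8000-FFFF written: output latch
    return "EPROM 70"              # 8000-FFFF read: EPROM


for addr, write in [(0x1234, False), (0x1234, True), (0x9000, False), (0x9000, True)]:
    kind = "write" if write else "read"
    print(f"{addr:04X} {kind} -> {decode_external_access(addr, write)}")
```

Sharing 8000-FFFF between a read-only EPROM and a write-only latch lets one address range serve two devices, with the read/write line alone selecting between them.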
  • The output latch 72 is driven by the microcomputer 60 and holds the values written to it by the microcomputer 60.
  • The output latch 72 drives the following signals:
  • DTMF enable signal 74: This signal is supplied to the DTMF decoder 76. A logic level high on this line enables data output from the DTMF decoder circuit 76.
  • LCD enable signal 78: This signal is supplied to the LCD module 30.
  • A logic level high on the LCD enable signal 78 enables the LCD controller in the LCD module 30 to read from and write to the LCD display 30.
  • SYNTH signal 82: This signal is supplied to the synthesis control circuit 84. A logic level high enables the output of the pulse width modulator of the microcomputer 60 to be sent to a low pass filter 86 for synthesis of analog audio signals.
  • RING signal 88: A logic level high on this line switches the output of the PWM-driven low pass filter 86 to the ring drive path, supplying an audio signal to the amplifier 90 and the speaker 92 of the speaker phone. This signal is used to synthesize the call ringer.
  • A low logic level on the RING line 88 routes the output of the low pass filter 86 to the speaker phone IC 48. From the speaker phone IC 48, the synthesized audio signal is sent through the audio connect circuit 46 and the tip and ring lines to the telephone network, and on to the other party's phone; this path is used by the call answering feature.
  • LINE signal 94: This signal is supplied to the hook switch 42 and controls the line status of the telephone 10. A logic level high takes the phone off hook; a logic level low places it on hook.
  • SPEAKER signal 96: This signal controls whether the phone is in normal or speaker phone mode. A logic level high turns the speaker phone on; a logic level low returns operation to the handset.
  • MUTE signal 98: A logic level high on this line attenuates the receive signal coming from the telephone network on tip and ring 32. It is also used by the speaker phone IC 48 to attenuate the microphone amplifier 90. A logic level low allows normal signal levels.
  • A secondary function of the MUTE signal 98 is to select the source signal for analog to digital conversion. When the MUTE signal 98 is at a high logic level, signals coming from the telephone network 32 are sent to the recognition source selector circuit 102 and then through a low pass filter 104 and a sample and hold circuit 106 to the analog to digital converter within the microcomputer 60. When the MUTE signal 98 is at a low logic level, signals from the microphone (handset 52 or speaker phone 54) are sent to the analog to digital converter in the microcomputer 60.
  • AUDIO signal 100: This signal controls the connection of the audio connect circuit 46 to the tip and ring lines 32.
  • A logic level low allows normal operation.
  • A logic level high prevents audio signals from being transmitted to or received from the telephone network 32. This implements the hold feature and is also used during the speech recognition process.
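The eight latch outputs fit exactly in one byte on the 8-bit data bus. As a hedged illustration, the sketch below models them as bit flags. The bit assignments in the hypothetical `Latch` class are not given in the text, and `describe` only restates the logic levels listed above, including the all-low state left by a system reset.

```python
from enum import IntFlag


class Latch(IntFlag):
    """Output latch 72 signals. Bit positions are hypothetical."""
    DTMF_ENABLE = 1 << 0
    LCD_ENABLE = 1 << 1
    SYNTH = 1 << 2
    RING = 1 << 3
    LINE = 1 << 4
    SPEAKER = 1 << 5
    MUTE = 1 << 6
    AUDIO = 1 << 7


RESET_VALUE = Latch(0)  # a system reset drives every latch output low


def describe(latch: Latch) -> dict:
    """Translate a latch byte into the states described in the text."""
    if Latch.SYNTH in latch:
        pwm_route = "ringer" if Latch.RING in latch else "speaker phone IC"
    else:
        pwm_route = "disabled"
    return {
        "hook": "off hook" if Latch.LINE in latch else "on hook",
        "mode": "speaker phone" if Latch.SPEAKER in latch else "handset",
        "a_d_source": "telephone line" if Latch.MUTE in latch else "microphone",
        "network_audio": "blocked" if Latch.AUDIO in latch else "normal",
        "pwm_route": pwm_route,
    }


print(describe(RESET_VALUE))
print(describe(Latch.LINE | Latch.AUDIO))  # off hook with network audio blocked (hold)
```

Because reset clears every bit, the phone always powers up on hook, in handset mode, with the microphone selected for A/D conversion.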
  • The microcomputer 60 is also connected to the time control circuit 64.
  • The time control circuit 64 has three functions: system reset, watchdog timer reset, and time reference interrupt. During power-up, a reset pulse is generated; it is stretched to give the microcomputer 60 time to become stable. A manual reset is also stretched.
  • The 4 MHz signal 20/OUT is divided down to a 61 Hz signal (16.384 msec period). The resulting signal drives the INT1 interrupt input of the microcomputer 60.
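The 61 Hz and 16.384 msec figures are consistent with dividing the 4 MHz clock by 2^16. That divide ratio is an inference from the two numbers, not something the text states; the arithmetic is:

```python
CLOCK_HZ = 4_000_000  # 20/OUT
DIVIDER = 2 ** 16     # assumed divide ratio, inferred from 61 Hz / 16.384 ms

interrupt_hz = CLOCK_HZ / DIVIDER       # 61.03515625 Hz, quoted as "61 Hz"
period_ms = 1000 * DIVIDER / CLOCK_HZ   # 16.384 ms
print(f"{interrupt_hz:.3f} Hz, {period_ms:.3f} ms")
```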
  • The watchdog signal 110 must be pulsed to a logic level low and then brought back high. This keeps the watchdog timer from causing an automatic reset of the microcomputer 60. If the watchdog signal is left at a low logic level, the watchdog timer is disabled.
  • The microcomputer 60 also directly outputs or reads the following signals:
  • Watchdog signal 110: This signal is supplied to the time control circuit 64.
  • A logic level high signifies normal operation. Pulsing the signal from high to low and back to high is required once every 61 Hz interrupt period to prevent a watchdog timer reset. Holding the watchdog signal 110 low disables the watchdog timer. The watchdog timer is used to ensure that the microcomputer 60 is operating.
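The watchdog rules above can be expressed as a small behavioral model. This is a sketch under an assumption the text does not spell out: the timer is taken to expire whenever one full interrupt period passes without a high-low-high pulse. The class and method names are hypothetical.

```python
class Watchdog:
    """Behavioral model of the watchdog in time control circuit 64 (hypothetical)."""

    def __init__(self):
        self.level = 1         # high = normal operation
        self.saw_low = False
        self.pulsed = False
        self.resets = 0        # count of resets that would hit microcomputer 60

    def write(self, level: int) -> None:
        """Drive watchdog signal 110 to a new logic level."""
        if level == 0:
            self.saw_low = True
        elif self.saw_low:     # low followed by high completes one pulse
            self.pulsed = True
            self.saw_low = False
        self.level = level

    def pulse(self) -> None:
        self.write(0)
        self.write(1)

    def tick(self) -> None:
        """Called once per 61 Hz interrupt period."""
        if self.level == 0:    # held low: watchdog disabled
            return
        if not self.pulsed:
            self.resets += 1   # assumed: a missed pulse forces a reset
        self.pulsed = False


wd = Watchdog()
wd.pulse()
wd.tick()      # serviced in time: no reset
wd.tick()      # no pulse this period: reset
print(wd.resets)
```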
  • Battery signal 112: This is a bidirectional digital signal, normally used as an input to sense power supply status.
  • A logic level low read on this line by the microcomputer 60 indicates that power is being supplied to the telephone 10 by an AC transformer.
  • A logic level high read on this line indicates that batteries are powering the telephone 10.
  • Sense hook switch signal 116: This is a digital input signal to the microcomputer 60, used to detect the status of the hook switch 42. A logic level high indicates that the telephone 10 is on hook; a logic level low indicates that it is off hook.
  • Serial in and serial out signals 118A and 118B: These digital signals form an asynchronous serial communication port, which is used during testing of the telephone 10.
  • S/H signal 120: This signal is supplied from the microcomputer 60 through the synthesis control circuit 84. It drives the sample and hold switch circuit 106, which supplies the input signal to the A/D converter portion of the microcomputer 60. A logic level high allows sampling of the signal from the sample and hold circuit 106. A logic level low prevents signals from the sample and hold circuit 106 from being received by the microcomputer 60. When gated by a logic level high on the SYNTH signal 82, this signal is also used to drive the low pass filter 86 that generates audio signals.
  • Slow bus (SB0-SB5) 122: This is a bidirectional bus for digital signals. It is a medium speed data and control bus for operating the keyboard 22, the option buttons 28 (A-C) of the LCD module 30, and the DTMF decoder 76. SB2 through SB5 are data lines when communicating with the DTMF decoder 76 and the switches 28 (A-C) of the LCD module 30. SB0 and SB1 are control lines when communicating with the switches 28 (A-C) of the LCD module 30. SB0 through SB5 are used as digital outputs to drive the keyboard 22.
  • ROWBUS (ROW0-ROW3) 124: These digital input signals are supplied from the keyboard 22 and are used to decode keyboard key closures.
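Using SB0-SB5 as outputs and ROW0-ROW3 as inputs suggests a conventional scanned key matrix of up to 6 x 4 keys. A hedged sketch follows. The key placement in `KEY_MATRIX`, the active-high logic, and the `read_rows` callback standing in for the port hardware are all assumptions, not details from the text.

```python
# Hypothetical key placement: the text does not say which key sits where.
KEY_MATRIX = [
    # SB0  SB1  SB2  SB3        SB4          SB5
    ["1", "2", "3", "redial", "directory", None],  # ROW0
    ["4", "5", "6", "hold", "voice", None],        # ROW1
    ["7", "8", "9", "flash", None, None],          # ROW2
    ["*", "0", "#", "speaker", None, None],        # ROW3
]


def scan(read_rows):
    """Drive one SB column at a time and read ROW0-ROW3.

    read_rows(col) stands in for the hardware: it returns a 4-bit value
    with bit r set when the key at (row r, column col) is closed.
    """
    pressed = []
    for col in range(6):              # SB0..SB5 as outputs
        rows = read_rows(col)
        for row in range(4):          # ROW0..ROW3 as inputs
            key = KEY_MATRIX[row][col]
            if rows & (1 << row) and key is not None:
                pressed.append(key)
    return pressed


def fake_hardware(closed):
    """Simulate key closures given as a set of (row, col) pairs."""
    def read_rows(col):
        return sum(1 << r for r, c in closed if c == col)
    return read_rows


print(scan(fake_hardware({(1, 1)})))
```

Scanning column by column lets ten I/O lines cover up to 24 keys, which matters on a microcomputer whose port pins are also shared with the DTMF decoder and option buttons.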
  • INT signal 126: This is a digital input signal received from the time control circuit 64. It is the 61 Hz interrupt signal that the microcomputer 60 uses to keep track of real time.
  • INT interrupt signal 128: This signal is supplied from the DTMF decoder circuit 76 and indicates the presence of a valid DTMF tone.
  • Battery level signal 114: This is an analog input signal from the power supply 130, used to determine the battery charge level.
  • Line status signal 132: This is an analog signal received by the microcomputer 60. It is generated by the line status monitor circuit 38 and is used to detect incoming ring signals from the telephone network 32. In addition, when the telephone 10 is on hold, this line is monitored to detect another telephone on the same line going off hook; if this occurs, the hold state is ended.
  • Voice signal 134: This analog input signal is supplied from the sample and hold circuit 106 to the microcomputer 60. Signals from the low pass filter 104 pass through the sample and hold circuit 106 into the A/D converter portion of the microcomputer 60. They are used for the speech recognition process and for software DTMF detection.
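The text does not say how software DTMF detection is performed. One common technique is the Goertzel algorithm, which measures signal power at each of the seven frequencies of a 12-key pad; the sketch below swaps it in as an assumption, reusing the 7200 Hz sample rate and 144-sample frame that the document gives for speech. A practical detector would also check absolute level, twist, and tone duration.

```python
import math

FS = 7200                       # sample rate from the speech path (assumed for DTMF too)
ROWS = [697, 770, 852, 941]     # DTMF row frequencies, Hz
COLS = [1209, 1336, 1477]       # DTMF column frequencies, Hz
KEYS = ["123", "456", "789", "*0#"]


def goertzel_power(samples, freq, fs=FS):
    """Signal power near `freq` using the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / fs)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_dtmf(samples):
    """Pick the strongest row and column tone and return the key."""
    row = max(range(4), key=lambda i: goertzel_power(samples, ROWS[i]))
    col = max(range(3), key=lambda j: goertzel_power(samples, COLS[j]))
    return KEYS[row][col]


def dtmf_tone(key, n=144, fs=FS):
    """Synthesize one frame of the two-tone signal for `key`."""
    r = next(i for i, group in enumerate(KEYS) if key in group)
    c = KEYS[r].index(key)
    return [math.sin(2 * math.pi * ROWS[r] * t / fs)
            + math.sin(2 * math.pi * COLS[c] * t / fs) for t in range(n)]


print(detect_dtmf(dtmf_tone("5")))
```

Goertzel suits a small processor because each frequency needs only one multiply and two adds per sample, and only seven frequencies are evaluated instead of a full spectrum.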
  • The hybrid circuit 40, which interfaces the network telephone lines (tip and ring 32) with the transmit and receive lines within the telephone, is implemented as a single transistor 40.
  • The transistor is a bipolar PNP transistor, Q5 (MPSW63).
  • The PNP transistor has a collector 41, a base 39, and an emitter 37.
  • Transmit audio coming from the CMOS switch (4053) 46 passes through the RC network C9, R10 to the base 39 of the transistor 40.
  • The audio signal into the base 39 modulates the collector current of transistor 40.
  • This collector current is the telephone loop current and carries the transmit audio signal out to the telephone line 32.
  • The audio signal from the output of the CMOS switch 46 also passes through another RC network, C8 and R24, to the receive attenuator switch 44.
  • The transmit audio signal at the collector of transistor 40 is of equal amplitude but one hundred eighty (180) degrees out of phase with the signal from the switch 46.
  • A third RC network, C11 and R11, passes this signal and sums it with the signal from the output of the switch 46, cancelling the transmit audio at the input of the receive attenuator 44. Receive audio coming from the telephone line 32, which passes from the collector 41 of transistor 40 through this same RC network to the receive attenuator 44, is not cancelled.
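The cancellation can be checked with an idealized numeric model: the collector carries the inverted transmit signal plus the receive signal, and the summing network adds back the uninverted transmit signal. This sketch ignores the RC networks C8/R24, C9/R10 and C11/R11 entirely; `gain_mismatch` is a hypothetical parameter showing how imperfect balance leaves residual transmit audio.

```python
import math


def hybrid_receive_path(tx, rx, gain_mismatch=0.0):
    """Idealized signal at the receive attenuator input.

    collector    = -(1 + gain_mismatch) * tx + rx   (inverted transmit plus line receive)
    summing node = tx + collector
    With no mismatch only rx remains.
    """
    collector = [-(1.0 + gain_mismatch) * t + r for t, r in zip(tx, rx)]
    return [t + c for t, c in zip(tx, collector)]


n = 144
tx = [math.sin(2 * math.pi * 400 * k / 7200) for k in range(n)]         # local talker
rx = [0.1 * math.sin(2 * math.pi * 1000 * k / 7200) for k in range(n)]  # far end
out = hybrid_receive_path(tx, rx)
print(max(abs(o - r) for o, r in zip(out, rx)))
```

With perfect balance the printed residual is only floating-point noise; a 10% mismatch leaves roughly 10% of the transmit signal in the receive path, which is why the text stresses equal amplitude.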
  • The integrated circuit designated U1 (4053) is a three-pole double-throw CMOS switch (44 and 46). It is used to connect the receive audio path (C section) and the transmit audio path (B section) to the speaker phone IC.
  • The A section is a receive attenuator switch. It is used to mute the level of the DTMF signals and pulse clicks during dialing.
  • The line status monitor circuit 38 is a differential amplifier with a very high input impedance (greater than 10 megohms).
  • Its inputs are connected to the voltage out of the polarity guard 36.
  • When the line is idle, this voltage is about 48 volts.
  • The op amp 38 (U3D) converts this voltage to a signal in the range of three volts and passes it to the line status input of the microcomputer 60.
  • During ringing, the output of the polarity guard 36 is greater than 100 volts.
  • The output of the op amp 38 is then saturated high (greater than 4 volts).
  • When a telephone on the line goes off hook, the voltage at the output of the polarity guard 36 drops to a much lower value, in the region of 10 to 15 volts, which translates to less than 1 volt at the line status input.
  • These voltages are interpreted by software in the microcomputer 60 to determine line status.
  • The voltage change that takes place when another phone comes onto the telephone line 32 is used by the software to terminate the hold function and drop the line when the second phone picks up.
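The software interpretation of these voltages might look like the sketch below. The cut points of 1 V and 4 V come from the figures above, but treating them as hard thresholds is an assumption. The hold rule, which ends hold when the voltage falls a fraction `drop` below the level measured when hold began, is entirely hypothetical, since the text only says a voltage change is detected.

```python
def classify_line(voltage: float) -> str:
    """Map the line status input voltage (0-5 V) to a line state.

    Assumed cut points: above 4 V the op amp is saturated (ringing, >100 V
    at the polarity guard); below 1 V a phone is off hook (10-15 V);
    otherwise the line is idle (about 48 V, roughly 3 V at the input).
    """
    if voltage > 4.0:
        return "ringing"
    if voltage < 1.0:
        return "off hook"
    return "idle"


def hold_should_end(hold_baseline: float, voltage: float, drop: float = 0.3) -> bool:
    """While on hold, report another phone picking up (hypothetical rule)."""
    return voltage < hold_baseline * (1.0 - drop)


print(classify_line(4.5), classify_line(3.0), classify_line(0.5))
print(hold_should_end(0.8, 0.4))
```

A real ring signal is AC, so firmware would more likely count saturated samples over a window rather than trust a single reading.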
  • The software that controls the operation of the telephone 10 is stored in the ROM portion of the microcomputer 60 as well as in the EPROM memory 70.
  • The software stored in the ROM section of the microcomputer 60 performs the functions of 1) CACF (clipped autocorrelation function) signal processing; 2) low level hardware support routines; 3) LCD display text messages; and 4) copyright and code protection.
  • The software stored in the EPROM memory 70 performs the functions of speech recognition and user interface.
  • The RAM memory 68 is used as a scratch pad and for storage of voice templates during the operation of the telephone 10.
  • The user can use the keypad 22 in the normal prior art manner to dial a particular party's number.
  • The numbers are displayed on the display 30.
  • The redial key 20, hold key 18, flash key 16, and speaker key 14 function in the normal prior art manner.
  • As used herein, a telephone number means a plurality of digits.
  • The operation of the telephone 10 proceeds as follows.
  • The date and time are displayed on the display screen 30.
  • The date and time can be changed by pressing option button C 28 (C) twice and following the prompts on the display 30.
  • Since the telephone 10 responds to particular speech commands, it must first be trained to store the speech pattern of each utterance to which it will respond. The user enters the training mode by lifting the handset and activating the voice key 26. A message prompting the user to speak is displayed on the display 30. The user then speaks a word or a command. That speech, converted into an analog signal by the handset microphone 52, passes through the recognition source selector 102, the low pass filter 104, and the sample and hold circuit 106 into the microcomputer 60. The microcomputer 60 then performs a number of functions based upon the software set forth in Exhibit B.
  • The analog speech or command is sampled at a rate of 7200 Hz and digitized to yield X(t). The difference between each pair of successive samples is taken, so the signal after the first difference is $S(t) = X(t) - X(t-1)$.
  • The first difference, which yields S(t), eliminates the DC component of the signal.
  • The difference operation applies a 6 dB per octave preemphasis to the speech and thus acts like a high pass filter.
  • The sampled signals S(t) are supplied to a frame buffer of 144 storage locations.
  • 144 samples form one frame. Therefore, the frame period is 20 msec (144 samples / 7200 Hz).
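The first difference and framing steps can be sketched directly from the definitions above. Dropping the first sample (which has no predecessor) and discarding a partial final frame are assumptions; `diff_gain_db` evaluates the first difference's magnitude response, $|H(f)| = 2\sin(\pi f / f_s)$, to show the roughly 6 dB per octave rise.

```python
import math

FS = 7200      # samples per second
FRAME = 144    # samples per frame


def first_difference(x):
    """S(t) = X(t) - X(t-1); the first sample has no predecessor and is dropped."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]


def frames(s, size=FRAME):
    """Split S(t) into complete, non-overlapping frames (partial tail dropped)."""
    return [s[i:i + size] for i in range(0, len(s) - size + 1, size)]


def diff_gain_db(f, fs=FS):
    """Gain of the first difference at frequency f, in dB."""
    return 20 * math.log10(2 * math.sin(math.pi * f / fs))


print(FRAME * 1000 / FS)                      # frame period in ms
print(first_difference([5, 5, 5, 5]))         # DC input differences to zero
print(diff_gain_db(400) - diff_gain_db(200))  # close to 6 dB per octave
```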
  • A well known processing technique, the clipped autocorrelation function, is performed on each frame.
  • The clipped autocorrelation function performs the operation as follows:
  • Each of the coefficients A(m) represents a value of a portion of the speech pattern in time.
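The clipping rule and normalization used in the telephone are not reproduced here, so the following is only a sketch of one common variant: three-level center clipping at 30% of the frame peak, autocorrelation of the clipped sequence, and normalization by the zero-lag value so that $A_N(0) = 1$. The clip fraction, the number of lags `max_lag`, and the normalization are all assumptions.

```python
import math


def center_clip(frame, fraction=0.3):
    """Three-level clipping: +1 above +CL, -1 below -CL, else 0 (assumed rule)."""
    peak = max((abs(v) for v in frame), default=0.0)
    cl = fraction * peak
    return [1 if v > cl else -1 if v < -cl else 0 for v in frame]


def clipped_autocorrelation(frame, max_lag=8, fraction=0.3):
    """Return normalized coefficients A_N(m) = A(m) / A(0) for m = 0..max_lag."""
    c = center_clip(frame, fraction)
    n = len(c)
    a = [sum(c[t] * c[t + m] for t in range(n - m)) for m in range(max_lag + 1)]
    return [v / a[0] for v in a] if a[0] else [0.0] * (max_lag + 1)


# A 300 Hz tone at 7200 Hz repeats every 24 samples.
frame = [math.sin(2 * math.pi * t / 24) for t in range(144)]
coeffs = clipped_autocorrelation(frame, max_lag=24)
print(round(coeffs[12], 3), round(coeffs[24], 3))
```

Clipping reduces each product to -1, 0 or +1, so the correlation needs only additions and comparisons; that is the kind of saving that lets a general purpose 6502-class microcomputer do the work.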
  • A standard end point determination technique is also applied to determine the beginning and end of the speech.
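The standard end point technique is not specified. As an assumption, a simple energy-threshold detector is shown below. It marks the first and last frames belonging to runs of sustained high energy; `threshold_ratio` and `min_frames` are hypothetical parameters.

```python
def find_endpoints(frames, threshold_ratio=0.1, min_frames=3):
    """Return (first, last) frame indices of speech, or None.

    Hypothetical energy rule: a frame is active if its energy exceeds
    threshold_ratio times the loudest frame's energy, and only runs of at
    least min_frames consecutive active frames count as speech.
    """
    energies = [sum(v * v for v in f) for f in frames]
    if not energies or max(energies) == 0:
        return None
    thresh = threshold_ratio * max(energies)
    first = last = None
    run = 0
    for i, e in enumerate(energies):
        run = run + 1 if e > thresh else 0
        if run >= min_frames:
            if first is None:
                first = i - min_frames + 1
            last = i
    return (first, last) if first is not None else None


quiet, loud = [0.01] * 144, [1.0] * 144
print(find_endpoints([quiet] * 5 + [loud] * 10 + [quiet] * 5))
```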
  • The software set forth in Exhibit B then checks the signal frame by frame and compresses it, also based upon a well known prior art technique.
  • The user is prompted to speak at least twice, or until two utterances are spoken that are consistent with each other. An average of the two utterances is then taken, based upon a standard, well known technique.
  • The normalized coefficients $A_N(m)$ calculated for each frame by the clipped autocorrelation function are then stored as the speech pattern of the input speech.
  • The user is then prompted to enter the telephone number associated with the spoken name.
  • The user then enters the telephone number associated with the input speech.
  • The user presses option button 28 (C), which is associated with the displayed text "done".
  • The telephone 10 then prompts the user to enter the alphabetical text name corresponding to the name that was spoken into the telephone 10.
  • To enter a letter, the user simply presses the numeric key that bears it.
  • The three letters on that key are then displayed on the display 30.
  • Each letter is displayed next to one of the three option buttons 28 (A-C).
  • The option buttons 28 (A-C) are then reprogrammed so that activating one of them inputs the letter displayed beside it on the display 30.
  • In this way, alphabetical text can be entered using the numeric keypad 22 in conjunction with the option buttons 28 (A-C). For example, if the "5" key on the numeric keypad 22 is pressed and then option button 28A is pressed, the letter J is entered into the telephone 10 and displayed on the display 30.
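The two-keystroke text entry can be sketched as a lookup. The letter groups below are an assumption: three letters per key, following the older keypad layout without Q and Z. The document confirms only that key 5 followed by option button A gives J. The function names are hypothetical.

```python
# Assumed letter groups (three per key, older layout without Q and Z).
LETTERS = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PRS", "8": "TUV", "9": "WXY",
}
BUTTONS = "ABC"  # option buttons 28(A), 28(B), 28(C)


def choices(key: str) -> dict:
    """Map each option button to the letter displayed beside it."""
    return dict(zip(BUTTONS, LETTERS[key]))


def enter_letter(key: str, button: str) -> str:
    return choices(key)[button]


def enter_text(presses):
    """presses: sequence of (numeric key, option button) pairs."""
    return "".join(enter_letter(k, b) for k, b in presses)


print(enter_text([("5", "A"), ("6", "C"), ("4", "B"), ("6", "B")]))
```

Every letter takes exactly two presses, unlike multi-tap entry, and the option buttons need no fixed labels because the display shows their current meaning.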
  • The telephone 10 can store up to 50 spoken names, each associated with a telephone number and an alphabetical text name. With additional memory, more names could be stored.
  • The option buttons 28 (A-C) can be reprogrammed by the telephone 10 to serve other purposes.
  • The software to perform that function is contained in the listing set forth in Exhibit C.
  • For example, the option buttons 28 (A-C) can be changed from the date and time setting function to the alphabetical text entry function.
  • The telephone 10 can dial numbers in a conventional manner.
  • The telephone 10 can be placed in a mode in which the keypad 22 is locked, preventing all outgoing calls.
  • In this mode, each of the three option buttons 28 (A-C) remains functional, and they can be reprogrammed to dial emergency numbers such as police, fire, and ambulance.
  • The telephone 10 can be placed in a protected mode in which the stored names, with each name's associated telephone number and alphabetical text, are protected from searches (discussed further below) and from deletion through re-entry.
  • The telephone 10 can also respond to spoken dialing commands.
  • The user picks up the handset 12 and simply speaks the name of the party to be called; that name must have been previously trained and stored in the telephone 10.
  • The speech is converted into an analog signal and is again received by the microcomputer 60 through the sample and hold circuit 106.
  • The microcomputer 60 once again computes the first difference of the samples, which are taken at 7200 Hz, and gathers every 144 samples into a frame.
  • A clipped autocorrelation function of each frame is calculated and normalized.
  • The resulting clipped autocorrelation coefficients form the speech pattern of the input speech.
  • The coefficients of the input speech pattern are then compared with the coefficients of each stored speech pattern.
  • The comparison is based upon a modified dynamic time warping (DTW) algorithm.
  • Figure 9 shows a typical band Dynamic Programming graph (see "Dynamic
  • The band region is defined as:
  • The processed spoken word, i.e., the clipped autocorrelation coefficients of the spoken word, is compared to each of the stored words using the DTW algorithm, and the stored word whose coefficients produce the smallest DTW result is the closest match to the input word.
  • If the closest match, i.e., the smallest DTW result, is still greater than some threshold level, then no match is found.
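Because the exact band definition is not reproduced above, the sketch below uses the familiar constant-width (Sakoe-Chiba style) band $|i - j| \le r$ as an assumption, widened to $|N - M|$ when needed so the end point is reachable. The Euclidean frame distance, the normalization by $N + M$, and the names `band_dtw`, `recognize` and `abs_mthresh` are also assumptions chosen for illustration.

```python
import math


def frame_dist(a, b):
    """Euclidean distance between two frames of coefficients (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def band_dtw(x, y, width=3):
    """DTW restricted to a constant-width band |i - j| <= r around the diagonal.

    Returns the accumulated distance normalized by len(x) + len(y).
    """
    n, m = len(x), len(y)
    r = max(width, abs(n - m))
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(m, i + r) + 1):
            cost = frame_dist(x[i - 1], y[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m] / (n + m)


def recognize(spoken, templates, abs_mthresh):
    """Return the closest stored name, or None if even the closest
    DTW result is not under the absolute threshold."""
    best_name, best = None, float("inf")
    for name, template in templates.items():
        score = band_dtw(spoken, template)
        if score < best:
            best_name, best = name, score
    return best_name if best < abs_mthresh else None


templates = {"home": [[1, 0], [0, 1], [1, 1]], "office": [[5, 5], [5, 5], [5, 5]]}
print(recognize([[1, 0], [1, 0], [0, 1], [1, 1]], templates, abs_mthresh=1.0))
```

The band limits each row to at most 2r + 1 cells, so the work grows linearly with utterance length instead of with the product of the two lengths.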
  • Without pruning, the speech pattern of the spoken word must be compared with the speech patterns of all the stored words before the correct match can be found.
  • To reduce this computation, an adaptive linear pruning method is employed in the search or matching process.
  • The DTW operation is first applied to the first word.
  • The DTW operation computes not only the final result but also every partial sum corresponding to each point in time (see Figure 10).
  • (Thus, $C_1$ through $C_N$ are computed.)
  • If the line 200 represents the best result of a DTW operation on a word, then not only is the DTW value for the total ($C_N$) computed, but also the linear progressive coefficients (C-
  • A second word comparison is then made in which, at each point in time, the coefficient resulting from the DTW operation of the spoken word against the second word is compared with the corresponding coefficient for the best word.
  • If line 210 represents the operation of the DTW algorithm on the spoken word compared to the second word, the operation is terminated as soon as line 210 exceeds the prune line derived from line 200, without waiting for the DTW operation to complete over all the coefficients corresponding to the second word.
  • Because the summation of differences is progressive, line 210 can be predicted only to get worse, so the operation need not be run to completion. This saves computation time and speeds up the search.
  • An offset "O" can be added to the best score.
  • In the DTW operations on subsequent words, the partial result, as each coefficient in time is presented, must exceed the best score plus the offset before the word is abandoned. The offset ensures that the prediction that the completed DTW would exceed the best score is, in all likelihood, correct.
  • abs_mthresh: the absolute match threshold, which a match score must be under to be considered a valid match.
  • rel_mthresh: the relative match threshold, i.e., the margin by which the best match must beat the second best match for it not to be considered "questionable."
  • A match is considered "pruned" when $D_n(x) > PL(x)$, where $D_n(x)$ is the normalized accumulating distance function of the DTW and $PL(x)$ is the prune line.
  • The prune threshold is initially set to a maximum absolute cutoff threshold. If the result of the DTW operation on the first word is below this initial maximum threshold, then all of the coefficients of the first word will have been processed by the DTW operation. Thereafter, the probability decreases that the DTW operation will be allowed to process all of the coefficients of subsequent words.
  • One possible method, used by the telephone 10, is to present first the words corresponding to the names, and therefore the telephone numbers, that have been dialed most frequently. This raises the probability that the spoken word matches one of the first stored words presented, so that a low best score, and hence a tight prune line, is established early.
  • Accordingly, the telephone 10 pre-sorts the stored words before presenting them for the DTW adaptive pruning analysis.
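The pre-sorted, pruned search can be sketched as follows. This is a hedged reconstruction, not the method of Exhibit B. The prune line is assumed to be the straight line $PL(i) = (\text{best} + O)\,i/N$ from zero to the best total plus the offset, and it is compared against the best partial DTW cost in each row. Raw (unnormalized) accumulated distances are used, and `dial_counts`, `width`, `offset` and `max_cutoff` are hypothetical parameters. Like any pruning heuristic, this can in principle discard a word that would have won.

```python
import math

INF = float("inf")


def frame_dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pruned_search(spoken, templates, dial_counts, width=3, offset=0.5, max_cutoff=INF):
    """Adaptive linear pruning over stored templates (sketch).

    Templates are tried most frequently dialed first. A template is
    abandoned as soon as its best partial cost in row i rises above the
    assumed prune line (best + offset) * i / n.
    """
    order = sorted(templates, key=lambda name: -dial_counts.get(name, 0))
    n = len(spoken)
    best_name, best = None, max_cutoff   # initial prune threshold
    pruned = []
    for name in order:
        tmpl = templates[name]
        m = len(tmpl)
        r = max(width, abs(n - m))
        prev = [0.0] + [INF] * m         # row 0 of the DTW grid
        for i in range(1, n + 1):
            cur = [INF] * (m + 1)
            for j in range(max(1, i - r), min(m, i + r) + 1):
                cost = frame_dist(spoken[i - 1], tmpl[j - 1])
                cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
            if min(cur) > (best + offset) * i / n:   # crossed the prune line
                pruned.append(name)
                break
            prev = cur
        else:                            # ran to completion without pruning
            if prev[m] < best:
                best_name, best = name, prev[m]
    return best_name, best, pruned


templates = {"office": [[5, 5]] * 3, "home": [[1, 0], [0, 1], [1, 1]]}
spoken = [[1, 0], [1, 0], [0, 1], [1, 1]]
print(pruned_search(spoken, templates, {"home": 10, "office": 2}))
```

With "home" tried first, its perfect score sets a tight prune line and "office" is abandoned after a single row. With no dialing history, "office" is evaluated in full before "home" wins, which illustrates why the pre-sort pays off.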
  • As a result, a general purpose microcomputer 60 can be used in the telephone 10, which saves cost. Because it is always possible that the telephone 10 cannot match the speech pattern of the spoken word with the speech pattern of any stored word, the stored words most likely to match, e.g., those with the lowest DTW results that nonetheless exceed a threshold amount, are presented on the display 30. A match j that falls into the "questionable" region must satisfy the following criterion:
  • The telephone 10 presents its best questionable match for the user to confirm whether it is indeed the word that was spoken.
  • The user can then press option button A 28 (A) ("yes") if it is correct, or option button B 28 (B) ("no") to have the telephone 10 display the next best questionable match. This process continues until either "yes" (28 (A)) is pressed or only one questionable match remains, in which case it is dialed. Thus, user input is sought on questionable matches.
  • Another way to use the telephone 10 is to press the directory key 24.
  • The user is then prompted to press one of the keys of the numeric keypad 22.
  • Each numeric key has three letters of the alphabet associated with it. All the names in the selected three-letter group are presented in alphabetical order, and the user scrolls forward by pressing the same numeric key repeatedly. When the desired name and number appear on the display 30, the user presses the redial key 20 to dial that number.
  • The software to accomplish this is set forth in Exhibit C.
  • The telephone 10 can also be used to retrieve a telephone number without dialing it, by either of two methods. In the first method, the user presses the directory key 24, then presses the voice button 26, and speaks the name. The telephone 10 processes the speech signal as previously discussed and displays the chosen name and its associated telephone number.
  • In the second method, the telephone 10 is searched manually to retrieve a telephone number.
  • The user presses the directory key 24 and then a key on the keypad 22.
  • The user does not have to press an option button 28 (A-C) to narrow a three (3) letter group down to one letter.
  • Instead, the user presses the same key repeatedly to move forward.
  • The option buttons 28 (A-C) are not used at all, except to display the one-button speed dial names and numbers.
  • The software to accomplish this is set forth in Exhibit C.
  • The telephone apparatus 10 can also log the last one hundred calls made.
  • For each call, the telephone 10 stores the time, the date, the phone number called, and the length of the call.
  • The user can review this log at any time to audit phone bills, to scan for or redial frequently called numbers, or for any other purpose.
  • This feature is accessed with option button B 28 (B). The software to accomplish this is set forth in Exhibit C.
  • The telephone 10 also has an answering facility, which is activated by pressing option button A 28 (A). When it is activated, in the unattended mode, the telephone 10 answers all calls with a prerecorded message and then prompts the caller to key in the caller's own telephone number. The caller must therefore be at a telephone that generates DTMF signals. The DTMF signals are received by the telephone 10 and converted into signals that represent the caller's telephone number. The telephone 10 then records that number and the time of the call.
  • The telephone 10 can use the telephone number received from the caller to search its directory for the corresponding name. When the user returns, the telephone 10 displays the number of messages it has recorded, together with the time and phone number of each. Finally, if a phone number is in the directory of the telephone apparatus 10, the associated name is displayed as well.
  • In another aspect of phone answering, the user can program the telephone 10 so that, once a call has been answered and the caller has left a message consisting of the DTMF signals for the caller's telephone number, the telephone apparatus automatically dials a preset number (the telephone number of a paging service) and regenerates the DTMF signals left by the caller.
  • In this way, the telephone apparatus 10 can relay the number at which a caller can be reached, after the caller has left it as a message.
  • The telephone 10 can be placed in call screening mode.
  • This mode is activated by pressing option button A 28 (A).
  • In this mode, the telephone apparatus 10 does not ring when an incoming call is received.
  • If the calling party knows the preassigned code, the calling party can dial in the code. This overrides the call screening and turns the ringer on. Without the code, the calling party receives the phone answering message and the telephone 10 records the caller's telephone number.
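The threshold and confirmation rules in the list above can be sketched as follows. This is a hypothetical Python illustration, not the code of Exhibit C. `Candidate`, `classify`, and `confirm` are invented names, and scores are assumed to be normalized DTW distances, where lower means closer. The patent's criterion for the "questionable" region is not reproduced in this text, so the sketch assumes that every valid candidate whose score is within `rel_mthresh` of the best score is questionable.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Candidate:
    name: str
    number: str
    score: float  # assumed: normalized DTW distance, lower is a closer match


def classify(cands: List[Candidate], abs_mthresh: float,
             rel_mthresh: float) -> Tuple[str, List[Candidate]]:
    """Return ("no_match" | "match" | "questionable", candidates best-first)."""
    # Only scores under the absolute threshold count as valid matches.
    valid = sorted((c for c in cands if c.score < abs_mthresh),
                   key=lambda c: c.score)
    if not valid:
        return "no_match", []
    best = valid[0]
    # The best match is unambiguous if it beats the runner-up by more than rel_mthresh.
    if len(valid) == 1 or valid[1].score - best.score > rel_mthresh:
        return "match", [best]
    # Hypothetical rule: candidates within rel_mthresh of the best are questionable.
    close = [c for c in valid if c.score - best.score <= rel_mthresh]
    return "questionable", close


def confirm(questionable: List[Candidate],
            ask: Callable[[Candidate], bool]) -> Optional[Candidate]:
    """Offer matches best-first; ask() models option button A ("yes") / B ("no")."""
    for i, cand in enumerate(questionable):
        if i == len(questionable) - 1:
            return cand  # only one questionable match remains: it is dialed
        if ask(cand):
            return cand  # user pressed "yes"
    return None  # empty list: nothing to dial


if __name__ == "__main__":
    cands = [Candidate("Alice", "555-0100", 10.0),
             Candidate("Bob", "555-0101", 12.0),
             Candidate("Carol", "555-0102", 50.0)]
    status, matches = classify(cands, abs_mthresh=40.0, rel_mthresh=5.0)
    print(status, [c.name for c in matches])  # questionable ['Alice', 'Bob']
```

In `confirm`, the last remaining candidate is returned without prompting. This mirrors the statement that the process ends when only one questionable match is left, and that match is dialed.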

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)
  • Telephone Function (AREA)

Abstract

In the present invention, a speech processing apparatus is disclosed. In one embodiment, the apparatus is used to activate a telephone. The speech activated phone stores speech patterns in accordance with a modified clipped autocorrelation function algorithm. The speech pattern of the spoken word is compared with the speech pattern of each stored word to obtain a match in accordance with a modified dynamic time warping algorithm, in which a constant width window is maintained. Further, an adaptive pruning method is employed to speed up the DTW algorithm. A plurality of spoken words, together with the telephone number and the alphanumeric word associated with each spoken word, are stored in the telephone. The telephone automatically dials the telephone number when an input spoken word matches a stored spoken word. In addition, the telephone number and alphanumeric text for the matched spoken word are displayed.

Description

A SPEECH PROCESSING APPARATUS AND METHOD THEREFOR
TECHNICAL FIELD
The present invention relates to a speech processing apparatus having particular application in a speech activated telephone.
BACKGROUND OF THE INVENTION
Speech recognition apparatuses are well known in the art. A speech recognition apparatus can be used to activate a number of devices, including a telephone. However, the algorithms used in speech recognition are complex and have required a dedicated signal processor, which has increased the cost of the apparatus. Although a first difference technique has been used in speaker recognition analysis (see "Telephone-Line Speaker Recognition Using Clipped Autocorrelation Analysis", by H. Ney, Proc. ICASSP 81 (Atlanta, 1981), pp. 188-192), such an analysis has not heretofore been used in speech recognition. The basic dynamic time warping (DTW) algorithm used in speech pattern matching is well known in the art. It is disclosed in the article "Dynamic Programming Algorithm Optimization for Spoken Word Recognition" by Hiroaki Sakoe and Seibi Chiba, IEEE Trans. Acoust., Speech, and Signal Processing, Vol. ASSP-26, pp. 43-49, February 1978. However, that algorithm does not provide a satisfactory solution to the window skewing problem. A modified DTW algorithm is disclosed in "A Modification Over Sakoe and Chiba's Dynamic Time Warping Algorithm for Isolated Word Recognition", by K. Paliwal, A. Agarwal and S.S. Sinha, IEEE Int'l Conf. on Acoust. Speech and Sig. Proc., Vol. ICASSP-2, pp. 1259-61, May 1982. Although this algorithm attempts to solve the window skewing problem, it is also subject to error.
It is also known in the prior art to "prune" a DTW operation. In a pruning operation, if a search of k words finds that the ith word has the lowest score X, then during the DTW operation on each subsequent word, the operation is terminated as soon as the accumulated difference of the coefficients exceeds that best score. See "Performance Trade-Offs and Search Techniques for Isolated Word Search Recognition", by R. Bisiani and A. Waibel, IEEE Int'l Conf. on Acoust. Speech and Sig. Proc., Vol. ICASSP-1, pp. 570-73, May 1982. However, this technique still requires a considerable amount of computation. Speech activated phones are also well known in the art. However, they have provided no mechanism for resolving questionable matches between the speech pattern of a spoken word and a stored speech pattern. Thus, they are prone to errors, and these errors cannot be resolved by user input. These and other considerations have prevented speech activated telephones from using an inexpensive general purpose processor on which other novel features could be implemented without a great deal of cost.
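The prior-art pruning rule described above can be illustrated with a plain DTW recurrence. It abandons a reference word once every partial path is already worse than the best score found so far. This is a hedged Python sketch for illustration only. The Euclidean frame distance, the unconstrained warping path, and the names `dtw_with_cutoff` and `best_match` are assumptions. Neither the invention's constant width band nor its adaptive prune line is modeled here.

```python
import math
from typing import List, Optional, Sequence, Tuple

Frame = Sequence[float]  # one frame's coefficient vector


def frame_dist(a: Frame, b: Frame) -> float:
    """Assumed local distance: Euclidean distance between coefficient vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_with_cutoff(test: Sequence[Frame], ref: Sequence[Frame],
                    cutoff: float = math.inf) -> float:
    """Accumulated DTW distance, or math.inf if pruned against cutoff."""
    if not test or not ref:
        return math.inf
    m = len(ref)
    prev = [0.0] + [math.inf] * m  # row 0: only the origin is reachable
    for t_frame in test:
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            d = frame_dist(t_frame, ref[j - 1])
            cur[j] = d + min(prev[j], prev[j - 1], cur[j - 1])
        # Every warping path crosses this row and costs never decrease, so if
        # the cheapest partial path already exceeds the cutoff, give up early.
        if min(cur[1:]) > cutoff:
            return math.inf
        prev = cur
    return prev[m]


def best_match(test: Sequence[Frame],
               templates: List[Tuple[str, Sequence[Frame]]]) -> Tuple[Optional[str], float]:
    """Search all templates, pruning each one against the best score so far."""
    best_name, best_score = None, math.inf
    for name, ref in templates:
        score = dtw_with_cutoff(test, ref, best_score)
        if score < best_score:
            best_name, best_score = name, score
    return best_name, best_score
```

The early exit is safe because local distances are non-negative. Once the cheapest cell in a row exceeds the best score, no completed path through that row can beat it. The saving depends on finding a good match early, which is one reason ordering the templates by likelihood helps.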
In the prior art, a single line telephone is connected to a single pair of physical wires labeled tip and ring. Since communication must take place in both directions over the tip and ring lines, a balance transformer has been used to isolate the tip and ring lines from the transmit and receive lines within the telephone. Such a balance transformer is expensive and bulky. Prior art telephones have provided for monitoring of the telephone line to which they are attached; however, the typical monitoring functions have been limited to ringing, hold, and busy. Such telephones have not been able to display whether the telephone is connected to the line at all without going off hook to check.
Summary of the Invention
In the present invention, a speech recognition apparatus and method therefor are disclosed. The apparatus uses a modified clipped autocorrelation function, in which a first difference of the speech signal is obtained before the clipped autocorrelation function is applied to produce speech patterns, which are stored. The apparatus also uses a constant width band dynamic time warping algorithm and an adaptive linear prune line algorithm to match the input speech pattern to the stored speech pattern. The present invention also relates to a speech activated telephone that uses a modified clipped autocorrelation function to process speech signals into speech patterns and to store those speech patterns. The telephone also uses a constant width band dynamic time warping algorithm and an adaptive linear prune line algorithm to match the input speech pattern with the stored speech pattern. Further, the telephone provides for user input on questionable matches.
In addition, in the present invention the telephone has a user interface menu for entering text in conjunction with numeric data and can record and display previously made calls. The telephone also has an answering capability which can screen incoming calls and can forward a particular call and display the associated name, if any, of the caller from its directory. A novel speed dialing feature is also disclosed.
Finally, the telephone also has a novel line status monitoring circuit with a novel phone network interface circuit.
Brief Description of the Drawings
Figure 1 is a perspective view of a novel telephone device.
Figure 2 is a top view of the keyboard layout of the portion of the telephone shown in Figure 1.
Figure 3 is a schematic circuit block diagram of the telephone shown in Figure 1.
Figure 4 is a detailed block level schematic diagram of a portion of the circuit shown in Figure 3.
Figure 5 is a detailed circuit level diagram of the microprocessor and its associated circuitry used in the telephone shown in Figure 3.
Figure 6 is a detailed schematic circuit diagram of the keyboard and display assemblies shown in Figure 3.
Figure 7 is a detailed schematic circuit diagram of the telephone network interface circuit portion of the telephone.
Figure 8 is a detailed circuit level diagram of the interface circuit portion of the telephone which interfaces with various audio input/output signals.
Figure 9 is a graph showing dynamic time warping algorithm with a constant width band.
Figure 10 is a graph showing adaptive linear pruning.
Detailed Description of the Drawings
Referring to Figure 1, there is shown a perspective view of a telephone device 10. The telephone 10 comprises a handset 12, which has a microphone and a speaker. The telephone 10 also has a numeric keypad 22, which receives the numeric inputs (0-9) as well as the control signals "*" and "#". Such a keypad is well known in the art. Finally, the telephone 10 comprises a plurality of well known conventional control keys: redial 20, hold 18, flash 16, and speaker 14, which activates the speaker phone.
In addition to the foregoing keys, the telephone 10 also comprises a key labeled directory 24, a key labeled voice 26, and three reprogrammable option buttons 28 (A-C). The telephone 10 further comprises an LCD display 30 that can display two lines of alphanumeric characters, each line sixteen characters long.
Referring to Figures 3 and 4, there is shown a block level diagram of the telephone 10. The telephone 10 is connected to a telephone line comprising tip and ring lines 32. The telephone line 32 is connected to a line protection circuit 34, which is in turn connected to a polarity guard circuit 36. After the polarity guard circuit 36, the signals to and from the line are separated by a hybrid circuit 40 (discussed in greater detail hereinafter), from which the transmit and receive signals are supplied to the audio connect circuit 46 and the receive attenuator circuit 44, respectively. Other analog circuits, which connect the handset receiver 50, the handset microphone 52 and the speaker phone microphone 54, are also shown in Figure 4 and are well known in the art. The signals in the phone circuit shown in Figure 4 are supplied to and from a microcomputer 60, shown in Figure 3. The microcomputer 60 is a 50943 microcomputer made by Mitsubishi and is based on the 6502 processor. The microcomputer 60 contains internal memory in the form of RAM and ROM, timers, bidirectional digital I/O ports, a built-in A/D converter with multiple multiplexed inputs, and a pulse width modulator capable of generating analog signals when a suitable low pass filter is added.
The timing of the microcomputer 60 is controlled by a crystal oscillator circuit 62, which consists of an 8 MHz quartz crystal and its support components. The crystal is driven to oscillation by the microcomputer 60, which divides the resulting signal to obtain 20/OUT and 0/OUT. The signal 20/OUT is a 4 MHz digital clock signal that drives the time control circuit 64. The 0/OUT signal, a 2 MHz digital clock signal, is used by the microcomputer 60 to set the basic processor cycle time. It is also supplied to the memory control circuit 66, which uses it to control access to external memory. External memory, in the form of a 32K x 8 SRAM 68 and a 32K x 8 EPROM 70, and the output latch 72 are all accessed by the microcomputer 60 over a 16-bit address bus and an 8-bit data bus, under the control of the memory control circuit 66.
The SRAM 68 is accessed whenever addresses 0000 through 7FFF (hexadecimal) are read from or written to. The EPROM 70 is accessed whenever addresses 8000 through FFFF are read from, and the digital output latch 72 is accessed whenever addresses 8000 through FFFF are written to. A system reset initializes the digital output latch 72, setting all of its outputs to a low logic level.
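The address decoding described above is a simple rule on the upper half of the 16-bit address space and the direction of the bus cycle. The following Python sketch models that rule in software for illustration only. `decode` and the device labels are hypothetical, and in the telephone the selection is performed in hardware by the memory control circuit 66.

```python
SRAM_END = 0x7FFF  # 0x0000-0x7FFF: 0x8000 bytes = 32K of SRAM 68


def decode(addr: int, write: bool) -> str:
    """Return which device a 16-bit bus cycle selects, per the memory map above."""
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    if addr <= SRAM_END:
        return "SRAM"  # reads and writes both go to SRAM 68
    # Upper 32K: reads select EPROM 70, writes select output latch 72.
    return "LATCH" if write else "EPROM"
```

Because reads and writes in the upper half go to different devices, the 32K EPROM and the output latch can share the same address range.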
The output latch 72 is driven by the microcomputer 60 and stores the values the microcomputer 60 writes to it. The output latch 72 drives the following signals:
1. DTMF enable 74. This signal is supplied to the DTMF decoder 76. A logic level high on this line enables data output on the DTMF decoder circuit 76.
2. LCD enable 78. The LCD enable signal 78 is supplied to the LCD module 30. A logic level high on the LCD enable 78 enables the LCD controller in the LCD module 30 to read and to write to the LCD display 30.
3. SYNTH 82. This signal is supplied to the synthesis control circuit 84. A logic level high enables the output of the pulse width modulator of the microcomputer 60 to be sent to a low pass filter 86 for synthesis of analog audio signals.
4. RING signal 88. A logic level high on this line switches the output of the PWM-driven low pass filter 86 to the ring drive path, supplying the audio signal to the speaker phone amplifier 90 and the speaker phone speaker 92. This path is used to synthesize the call ringer. A low logic level on the RING line 88 routes the output of the low pass filter 86 to the speaker phone IC 48. From there, the synthesized audio signal is sent through the audio connect circuit 46 and the tip and ring lines to another phone on the telephone network. This path is used with the call answering feature.
5. LINE signal 94. This signal is supplied to the hook switch 42 and controls the line status of the telephone 10. A logic level high takes the phone off hook; a logic level low puts the phone on hook.
6. SPEAKER signal 96. This signal selects whether the phone is in normal (handset) mode or speaker phone mode. A logic level high turns the speaker phone on; a logic level low returns operation to the handset.
7. MUTE signal 98. A logic level high on this line attenuates the receive signal coming from the telephone network on tip and ring 32. The signal is also used by the speaker phone IC 48 to attenuate the microphone amplifier 90. A logic level low allows normal signal levels. A secondary function of the mute signal 98 is to select the source signal for analog to digital conversion. When the mute signal 98 is at a high logic level, signals from the telephone network 32 are sent to the recognition source selector circuit 102. From there they pass through a low pass filter 104 and a sample and hold circuit 106 to the analog to digital converter within the microcomputer 60. When the mute signal 98 is at a low logic level, signals from the microphone (handset microphone 52 or speaker phone microphone 54) are sent to the analog to digital converter in the microcomputer 60.
8. AUDIO signal 100. This signal controls the connection of the audio connect circuit 46 to the tip and ring 32. A logic level low allows normal operation. A logic level high prevents audio signals from being transmitted to or received from the telephone network 32. This implements the hold feature and is also used during the speech recognition process.
The microcomputer 60 is also connected to the time control circuit 64, which has three functions: system reset, watchdog timer reset, and time reference interrupt. During power up, a reset pulse is generated and stretched to allow the microcomputer 60 to become stable; a manual reset is stretched in the same way. The 4 MHz signal 20/OUT is divided down to a 61 Hz signal (16.384 msec period). This signal drives the INT1 interrupt input of the microcomputer 60 and is used to keep track of real time. On each cycle of the 61 Hz signal, the watchdog signal 110 must be pulsed to a logic level low and then brought back high; this keeps the watchdog timer from automatically resetting the microcomputer 60. If the watchdog signal is left at a low logic level, the watchdog timer is disabled.
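The 61 Hz figure follows from the clock numbers given above. A 16.384 msec period at 4 MHz is exactly 65,536 = 2^16 clock cycles, which suggests a 16-bit binary divider. That is an inference; the text does not say so. The short Python sketch below checks the arithmetic with exact fractions and shows how a hypothetical tick counter could convert interrupts into elapsed time.

```python
from fractions import Fraction

CLOCK_HZ = 4_000_000  # 20/OUT clock, from the text
DIVIDER = 2 ** 16     # inferred: 16.384 ms x 4 MHz = 65,536 cycles

TICK_HZ = Fraction(CLOCK_HZ, DIVIDER)        # exactly 15625/256 Hz
TICK_PERIOD_S = Fraction(DIVIDER, CLOCK_HZ)  # exactly 16.384 ms


def elapsed_seconds(ticks: int) -> Fraction:
    """Exact elapsed time for a hypothetical count of 61 Hz interrupts."""
    return ticks * TICK_PERIOD_S


if __name__ == "__main__":
    print(float(TICK_HZ))                # 61.03515625
    print(float(TICK_PERIOD_S) * 1000)   # about 16.384 (ms)
    print(elapsed_seconds(15625))        # 256
```

The tick rate is not a whole number of hertz, so counting 61 ticks as one second would make a software clock run slightly fast. A count of 15,625 ticks, however, corresponds to exactly 256 seconds.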
The microcomputer 60 also directly outputs or reads the following signals:
1. Watchdog signal 110. As previously discussed, the watchdog signal 110 is supplied to the time control circuit 64. A logic level high signifies normal operation. Pulsing the signal from high to low and back to high is required once every 61 Hz interrupt to prevent a watchdog timer reset. Holding the watchdog signal 110 low disables the watchdog timer. The watchdog timer is used to ensure that the microcomputer 60 is operating.
2. Battery signal 112. This is a bidirectional digital signal, normally used as an input to sense power supply status. A logic level low read on this line by the microcomputer 60 indicates that power is being supplied to the telephone 10 by an AC transformer. A logic level high read on this line indicates that batteries are powering the telephone 10. When the microcomputer 60 drives this line high, the batteries are forced to supply power to the telephone 10; this allows battery capacity to be tested.
3. Sense hook switch signal 116. This digital input signal to the microcomputer 60 is used to detect the status of the hook switch 42. A logic level high indicates that the telephone 10 is on hook. A logic level low indicates that the telephone 10 is off hook.
4. Serial in and serial out signals 118A and 118B. These digital signals form an asynchronous serial communication port, which is used during testing of the telephone 10.
5. S/*H signal 120. This signal is supplied from the microcomputer 60 through the synthesis control circuit 84. It drives the sample and hold switch circuit 106, which supplies the input signal to the A/D converter portion of the microcomputer 60. A logic level high allows sampling of the signal by the sample and hold circuit 106. A logic level low prevents signals from the sample and hold circuit 106 from being received by the microcomputer 60. When the signal is gated by a logic level high on the SYNTH signal 82, it is also used to drive the low pass filter 86 that generates audio signals.
6. Slow bus (SB0-SB5) 122. This is a bidirectional bus for digital signals. It is a medium speed data and control bus for operating the keyboard 22, the option buttons 28 (A-C) of the LCD module 30, and the DTMF decoder 76. SB2 through SB5 are data lines when communicating with the DTMF decoder 76 and the switches 28 (A-C) of the LCD module 30. SB0 and SB1 are control lines when communicating with the switches 28 (A-C) of the LCD module 30. SB0 through SB5 are used as digital outputs to drive the keyboard 22.
7. ROWBUS (ROW0-ROW3) 124. These digital input signals are supplied from the keyboard 22 and are used to decode key closures on the keyboard 22.
8. INT1 126. This is a digital input signal received from the time control circuit 64. It is a 61 Hz interrupt signal that the microcomputer 60 uses to keep track of real time.
9. INT interrupt signal 128. This is a signal supplied from the DTMF decoder circuit 76 that indicates the presence of a valid DTMF tone.
10. Battery level signal 114. This is an analog input signal from the power supply 130. It is used to determine battery charge level.
11. Line status signal 132. This is an analog signal received by the microcomputer 60. It is used to detect incoming ring signals from the telephone network 32 and is generated by the line status monitor circuit 38. In addition, when the telephone 10 is on hold, this line is monitored to detect another telephone on the same line going off hook. If this occurs, the hold state is ended.
12. Voice signal 134. This analog input signal is supplied from the sample and hold circuit 106 to the microcomputer 60. Signals from the low pass filter 104 and the sample and hold circuit 106 enter the A/D converter portion of the microcomputer 60, where they are used for the speech recognition process and for software DTMF detection.
Referring to Figures 5-8, some of the blocks shown in Figures 3 and 4 are shown in greater detail. One particular aspect of the telephone 10 is shown in Figure 7. The hybrid circuit 40, which interfaces the network tip and ring lines 32 with the transmit and receive lines within the telephone, is implemented as a single transistor 40. This transistor is a bipolar PNP transistor Q5 (MPSW63) having a collector 41, a base 39 and an emitter 37.
In operation, transmit audio from the CMOS switch (4053) 46 passes through the RC network C9, R10 to the base 39 of the transistor 40. The audio signal at the base 39 modulates the collector current of the transistor 40. This collector current is the telephone loop current, so it carries the transmit audio signal out to the telephone line 32. The audio signal from the output of the CMOS switch 46 also passes through another RC network, C8 and R24, to the receive attenuator switch 44. The transmit audio signal at the collector of the transistor 40 has the same amplitude as the signal from the switch 46 but is one hundred eighty (180) degrees out of phase with it. A third RC network, C11 and R11, passes this collector signal and sums it with the signal from the output of the switch 46, cancelling the transmit audio at the input of the receive attenuator 44. Receive audio from the telephone line 32 is not cancelled; it passes from the collector 41 of the transistor 40 through this same RC network to the receive attenuator 44. The integrated circuit designated U1 (4053) is a three pole double throw CMOS switch (44 and 46). Its C section connects the receive audio path, and its B section connects the transmit audio path, to the speaker phone IC. The A section is a receive attenuator switch, used to mute the level of the DTMF signals and pulse clicks during dialing.
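The transmit cancellation in the single-transistor hybrid can be seen with an idealized numerical model. The Python sketch below assumes perfectly equal amplitudes, a pure 180 degree inversion, and no RC phase shift or loading. None of these hold exactly in the real circuit, and the function names are hypothetical. The model shows the transmit component cancelling at the receive attenuator 44 while receive audio from the line passes through.

```python
import math
from typing import List


def sine(freq_hz: float, n: int, amp: float = 1.0, fs_hz: float = 8000.0) -> List[float]:
    """n samples of a test tone; fs_hz is an arbitrary simulation rate."""
    return [amp * math.sin(2 * math.pi * freq_hz * k / fs_hz) for k in range(n)]


def receive_attenuator_input(tx: List[float], rx: List[float]) -> List[float]:
    """Idealized summing node at the input of receive attenuator 44."""
    # Collector of transistor 40: inverted transmit audio plus receive audio from the line.
    collector = [-t + r for t, r in zip(tx, rx)]
    # The collector signal is summed with the transmit audio taken from switch 46.
    return [c + t for c, t in zip(collector, tx)]
```

With `tx` a 1 kHz tone and `rx` a quieter 300 Hz tone, the output equals `rx` to within floating-point rounding. This is the property that lets a single transistor replace the balance transformer in this design.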
The line status monitor circuit 38 is a differential amplifier with a very high input impedance (greater than 10 megohms). Its inputs are connected to the voltage output of the polarity guard 36. When the hook switch 42 is open and the telephone 10 is connected to the telephone line 32, this voltage is about 48 volts. The op amp 38 (U3D) scales this voltage to a signal of about three volts and passes it to the line status input of the microcomputer 60. When a ring signal is present on the telephone line 32, the output of the polarity guard 36 exceeds 100 volts and the output of the op amp 38 saturates high (greater than 4 volts). When the phone 10 is off hook, the voltage at the output of the polarity guard 36 drops to a much lower value, in the region of 10 to 15 volts, which translates to less than 1 volt at the line status input. These voltages are interpreted by software in the microcomputer 60 to determine line status. When another phone comes onto the telephone line 32, the resulting voltage change is used by the software to terminate the hold function and drop the line.
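The voltage ranges above suggest a simple classification of the line status input. The Python sketch below is a hypothetical illustration, not the software of the microcomputer 60. The threshold constants are assumptions placed between the stated levels: about 3 V idle, above 4 V when ringing, and below 1 V off hook. The "disconnected" case assumes a near-zero reading while the hook switch is on hook. A real ring detector would also examine the ring cadence over time rather than a single reading.

```python
RING_MIN_V = 4.0      # assumed: op amp saturates above ~4 V during ringing
IDLE_MIN_V = 2.0      # assumed: idle on-hook line reads about 3 V
NO_LINE_MAX_V = 0.2   # assumed: near-zero reading means no line voltage


def line_status(v_status: float, on_hook: bool) -> str:
    """Classify one line status reading (volts) given the sense hook switch state."""
    if not on_hook:
        return "off hook"          # this phone holds the line (reads < 1 V)
    if v_status > RING_MIN_V:
        return "ringing"
    if v_status >= IDLE_MIN_V:
        return "idle"
    if v_status <= NO_LINE_MAX_V:
        return "disconnected"
    return "in use elsewhere"      # another phone on the line is off hook
```

Distinguishing "idle" from "disconnected" while on hook is what allows connection status to be shown without taking the phone off hook.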
OPERATION
As previously discussed, the software to control the operation of the telephone 10 is stored in the ROM portion of the microcomputer 60 as well as the
EPROM memory 70. The software that is stored in the ROM section of the microcomputer 60 performs the functions of 1) CACF signal processing; 2) Low level hardware support routines; 3) LCD display text messages; and 4) Copyright and code protection code. The software that is stored in the EPROM memory 70 performs the functions of speech recognition and user interface.
The RAM memory 68 is used as a scratch pad and for storage of voice templates during the operation of the telephone 10.
In the operation of the telephone 10, the user can use the keypad 22 in the normal prior art manner to dial telephone numbers, which are displayed on the display 30. The redial key 20, hold key 18, flash key 16, and speaker key 14 also function in the normal prior art manner.
As previously stated, one of the novel aspects of the telephone 10 is its ability to dial a telephone number based upon a speech command. As used herein, a telephone number shall mean a plurality of digits. In this connection, the operation of the telephone 10 would proceed as follows.
Once power has been supplied to the telephone 10, through either an electrical transformer or through batteries, the date and time would be displayed on the display screen 30. The date and time can be changed by pressing the option button C 28 (C) twice and following the prompt on the display device 30 to change the date and time.
TRAINING MODE
Since the telephone 10 responds to a particular speech command, it must first be trained to store the speech pattern of each utterance to which it will respond. The training mode is entered by the user lifting the handset and pressing the voice key 26. A message prompting the user to speak is displayed on the display 30. The user then speaks a word or a command. The handset microphone 52 converts that speech into an analog signal, which passes through the recognition source selector 102, the low pass filter 104, and the sample and hold circuit 106 into the microcomputer 60. The microcomputer 60 then performs a number of functions based upon the software set forth in Exhibit B.
First, the analog speech or command is sampled at a rate of 7200 Hz and digitized to yield X(t). The difference between successive samples is then taken, so that the signal after the first difference is

$$S(t) = X(t) - X(t-1)$$
Taking the first difference eliminates the DC component of the signal. In addition, the difference operation applies a 6 dB/octave preemphasis to the speech and thus acts like a high pass filter. Although a first difference technique has been used in speaker recognition analysis (see "Telephone-Line Speaker Recognition Using Clipped Autocorrelation Analysis", by H. Ney, Proc. ICASSP 81 (Atlanta, 1981), pp. 188-192), such an analysis has not heretofore been used in speech recognition.
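The first difference and its preemphasis effect can be checked numerically. The Python sketch below uses hypothetical function names. It computes S(t) = X(t) - X(t-1) and evaluates the magnitude response of that operation, |1 - e^{-j2πf/fs}| = 2|sin(πf/fs)|, at the 7200 Hz sample rate given above. A constant (DC) input produces all zeros. Doubling a low frequency roughly doubles the gain, which is about 6 dB per octave; the approximation loosens as f approaches fs/2.

```python
import math
from typing import List, Sequence

FS_HZ = 7200  # sample rate from the text


def first_difference(x: Sequence[float]) -> List[float]:
    """S(t) = X(t) - X(t-1) for t = 1 .. len(x) - 1."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]


def diff_gain(f_hz: float, fs_hz: float = FS_HZ) -> float:
    """Magnitude response of the first difference at frequency f_hz."""
    return 2.0 * abs(math.sin(math.pi * f_hz / fs_hz))


if __name__ == "__main__":
    print(first_difference([5, 5, 5, 5]))  # [0, 0, 0]: DC removed
    octave_db = 20 * math.log10(diff_gain(200) / diff_gain(100))
    print(round(octave_db, 2))             # 6.01
```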
Once the first difference of the sampled signal is determined, the samples S(t) are supplied to a frame buffer of 144 storage locations, so that 144 samples form one frame. At 7200 samples per second, each frame therefore spans 20 msec. A well known clipped autocorrelation function is computed on each frame, as follows:
[Equation image not reproduced: clipped autocorrelation function A(m) of each frame]
Thereafter, the coefficients A(m) from each clipped autocorrelation function are normalized to form AN(m) in accordance with the following formula, which is also well known in the art:

$$AN(m) = \frac{A(m)}{A(0)}$$

Each of the coefficients AN(m) represents a value of a portion of the speech pattern in time.
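The framing and normalization steps above can be sketched as follows. The patent's clipped autocorrelation formula is not reproduced in this text, so this Python sketch assumes one common clipped form, A(m) = Σ_t S(t)·sgn(S(t+m)), in which one factor is replaced by its sign. The number of lags (`NUM_LAGS`) and all function names are also hypothetical. Frames are consecutive, non-overlapping blocks of 144 samples (20 msec at 7200 Hz), and each frame's coefficients are normalized by A(0).

```python
from typing import List, Sequence

FRAME_LEN = 144  # samples per frame: 144 / 7200 Hz = 20 msec
NUM_LAGS = 8     # hypothetical: coefficients A(0) .. A(7) per frame


def split_frames(s: Sequence[float], frame_len: int = FRAME_LEN) -> List[Sequence[float]]:
    """Consecutive non-overlapping frames; a trailing partial frame is dropped."""
    return [s[i:i + frame_len] for i in range(0, len(s) - frame_len + 1, frame_len)]


def sgn(v: float) -> int:
    return (v > 0) - (v < 0)


def clipped_acf(frame: Sequence[float], num_lags: int = NUM_LAGS) -> List[float]:
    """Assumed clipped form: A(m) = sum over t of S(t) * sgn(S(t + m))."""
    n = len(frame)
    return [float(sum(frame[t] * sgn(frame[t + m]) for t in range(n - m)))
            for m in range(num_lags)]


def normalize(a: Sequence[float]) -> List[float]:
    """AN(m) = A(m) / A(0); a silent frame (A(0) == 0) yields all zeros."""
    if a[0] == 0:
        return [0.0] * len(a)
    return [v / a[0] for v in a]
```

Under this assumed form, A(0) is the sum of |S(t)| over the frame, so AN(0) is always 1. For example, a frame alternating +1, -1 gives AN(1) = -143/144.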
A standard end point determination technique is also applied to determine the beginning and end of the speech. The software set forth in Exhibit B then checks the signal frame by frame to compress it, also using a well known prior art technique. During the training mode, the user is prompted to speak the word at least twice, repeating until two consistent utterances are obtained. The two utterances are then averaged using a standard, well known technique.
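The text does not specify the end point technique beyond calling it standard. One common approach, shown here only as a hedged Python sketch, thresholds a per-frame level measure and ignores bursts shorter than a minimum run of frames, so that isolated clicks are not mistaken for speech. The level measure (for example, the sum of |S(t)| over a frame), `threshold`, `min_run`, and the name `find_endpoints` are all assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def find_endpoints(energies: Sequence[float], threshold: float,
                   min_run: int = 3) -> Optional[Tuple[int, int]]:
    """Return (first, last) frame index of speech, or None if none is found.

    A run of frames counts as speech only if at least min_run consecutive
    frames exceed the threshold; shorter bursts are ignored.
    """
    runs: List[Tuple[int, int]] = []
    start = None
    # The appended False closes any run that reaches the last frame.
    for i, above in enumerate([e > threshold for e in energies] + [False]):
        if above and start is None:
            start = i
        elif not above and start is not None:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    if not runs:
        return None
    return runs[0][0], runs[-1][1]
```

For example, frame levels `[0, 0, 5, 0, 0, 6, 7, 8, 9, 0, 4, 5, 6, 0]` with a threshold of 1 give `(5, 12)`. The single loud frame at index 2 is treated as a click, and the short pause at index 9 stays inside the word.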
The normalized coefficients AN(m) calculated for each frame from the clipped autocorrelation function are then stored as the speech pattern of the input speech. The user is then prompted to enter the telephone number associated with the spoken name, and enters it. At the end of the telephone number, the user presses the option button 28 (C), which is associated with the text display "done".
Thereafter, the telephone 10 prompts the user to enter the alphabetical text name corresponding to the spoken name. The user simply presses the numeric key that carries the desired letter. Since three letters are associated with each numeric key 22, the three choices are then shown on the display 30, each next to one of the three option buttons 28 (A-C). The option buttons 28 (A-C) are reprogrammed so that pressing one of them enters the associated displayed letter, which also appears on the display 30. In this manner, alphabetical text can be entered on the numeric keypad 22 in conjunction with the option buttons 28 (A-C). For example, if the key "5" on the numeric keypad 22 is pressed, followed by option button 28 (A), the letter J is entered into the telephone 10 and shown on the display 30.
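The key-plus-option-button text entry described above reduces to a table lookup. In this Python sketch, the letter table is an assumption based on the classic three-letters-per-key telephone layout, which has no Q or Z. This layout is consistent with the statement that each key carries three letters and with the "5" then 28 (A) gives J example. `enter_letter` and `enter_text` are hypothetical names.

```python
from typing import Iterable, Tuple

# Assumed classic keypad lettering: three letters per key, no Q or Z.
KEY_LETTERS = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PRS", "8": "TUV", "9": "WXY",
}
OPTION_BUTTONS = "ABC"  # option buttons 28 (A), 28 (B), 28 (C)


def enter_letter(key: str, button: str) -> str:
    """Letter chosen by pressing a numeric key, then the option button beside it."""
    letters = KEY_LETTERS[key]  # the three choices shown on display 30
    return letters[OPTION_BUTTONS.index(button)]


def enter_text(presses: Iterable[Tuple[str, str]]) -> str:
    """Build a name from a sequence of (numeric key, option button) presses."""
    return "".join(enter_letter(key, button) for key, button in presses)


if __name__ == "__main__":
    print(enter_letter("5", "A"))                            # J
    print(enter_text([("5", "A"), ("6", "C"), ("3", "B")]))  # JOE
```

Each letter costs exactly two presses, with no timeout or repeated tapping. This is the advantage of pairing the keypad with reprogrammable option buttons.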
After the user has entered the alphabetical name associated with the spoken name and with the telephone number entered on the keypad 22, the option button 28(C) associated with the text display "done" is activated again. In one embodiment, the telephone 10 can store up to 50 speech names, each with an associated telephone number and alphabetical text name. With additional memory, more names can be stored in the telephone 10.
As previously stated, the option buttons 28(A-C) can be reprogrammed by the telephone 10 to serve other purposes. The software that performs this function is contained in the listing set forth in Exhibit C. Thus, in the embodiment described above, the function of the option buttons 28(A-C) can be changed from setting the date and time to entering alphabetical text.
DIALING MODE
As previously stated, when the user wants to dial a series of numbers, the user can simply pick up the handset 12 or activate the speaker phone 14 and press the appropriate keys on the keypad 22. Thus, the telephone 10 can dial numbers in a conventional manner. In addition, since the operation of the telephone 10 is governed by the software contained in the microcomputer 60, the telephone 10 can be placed in a mode in which the keypad 22 is locked, preventing all outgoing calls. Each of the three option buttons 28(A-C) nevertheless remains functional and can be reprogrammed to dial emergency numbers such as police, fire and ambulance. Further, the telephone 10 can be placed in a protected mode in which the speech names, together with each name's associated telephone number and alphabetical text, are protected from searches (discussed further below) and from deletion by re-entry. The telephone 10 can also respond to speech command dialing: the user picks up the handset 12 and simply speaks the name of the party to be called, the name having been previously trained and stored in the telephone 10.
The speech is converted into an analog signal, which is again received by the microcomputer 60 through the sample and hold circuit 106. The microcomputer 60 once again takes the first difference of the samples, which are sampled at a rate of 7200 Hz, and groups every 144 samples (20 ms) into a frame.
A clipped autocorrelation function is calculated and normalized for each frame, producing the coefficients of the inputted speech.
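The front end described above (first difference, 144-sample frames at 7200 Hz, clipped autocorrelation, normalization) might look roughly like the following Python/NumPy sketch. The text does not specify the clipping rule, the number of lags or the exact normalization (the normalization formula of Claim 3 is not reproduced in this text), so the three-level clipper at 30% of the frame peak, the 8 lags and division by the lag-0 value are all assumptions, as are the function names.

```python
import numpy as np

FS = 7200          # sampling rate from the text (Hz)
FRAME_LEN = 144    # samples per frame from the text (20 ms at 7200 Hz)
N_LAGS = 8         # hypothetical number of autocorrelation lags


def first_difference(x: np.ndarray) -> np.ndarray:
    """s(t) = x(t) - x(t-1), as in Claim 2."""
    return np.diff(x.astype(float))


def frames(s: np.ndarray, frame_len: int = FRAME_LEN) -> np.ndarray:
    """Non-overlapping frames; a trailing partial frame is dropped."""
    n = len(s) // frame_len
    return s[: n * frame_len].reshape(n, frame_len)


def clip(frame: np.ndarray, ratio: float = 0.3) -> np.ndarray:
    """Assumed three-level clipper: +1 / -1 outside +/- ratio * peak, else 0."""
    c = ratio * np.max(np.abs(frame))
    return np.where(frame > c, 1.0, np.where(frame < -c, -1.0, 0.0))


def clipped_autocorrelation(frame: np.ndarray, n_lags: int = N_LAGS) -> np.ndarray:
    """Autocorrelation of the clipped frame, normalized by its lag-0 value."""
    y = clip(frame)
    a = np.array([np.dot(y[: len(y) - m], y[m:]) for m in range(n_lags + 1)])
    return a / a[0] if a[0] > 0 else np.zeros_like(a)   # silent frame -> zeros


def features(x: np.ndarray) -> np.ndarray:
    """One normalized coefficient vector per 144-sample frame."""
    return np.array([clipped_autocorrelation(f) for f in frames(first_difference(x))])


t = np.arange(FS) / FS                  # one second of samples
f = features(np.sin(2 * np.pi * 300 * t))
print(f.shape)                          # (49, 9)
```

Clipping keeps the autocorrelation cheap (products of -1, 0 and +1), which fits the text's point that a general purpose microcomputer suffices.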
The plurality of coefficients of the inputted speech pattern is then compared to the plurality of coefficients associated with a stored speech pattern.
The comparison is based upon a modified dynamic time warping (DTW) algorithm.
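As in the prior art, speech can be expressed as a time sequence of clipped autocorrelation function (CACF) feature vectors:
$$A = a_1, a_2, \ldots, a_i, \ldots, a_I \quad \text{(test pattern)}$$
$$B = b_1, b_2, \ldots, b_j, \ldots, b_J \quad \text{(reference pattern)}$$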
Timing differences between the two patterns are usually eliminated with a DTW algorithm. Figure 9 shows a typical band dynamic programming graph (see H. Sakoe and S. Chiba, "Dynamic Programming Algorithm Optimization for Spoken Word Recognition", IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-26, pp. 43-49, February 1978). In the Sakoe and Chiba reference, the band region is defined as:
$$|i - j| < r$$
where r is a constant representing the vertical window width within which the legal warp path must lie. However, if the lengths I and J of the test and reference patterns are very different, the DTW is subject to error. As a modification of this DTW algorithm, Paliwal, Agarwal and Sinha (see K. Paliwal, A. Agarwal and S. S. Sinha, "A Modification Over Sakoe and Chiba's Dynamic Time Warping Algorithm for Isolated Word Recognition") suggested defining the band region as:
$$|i - j/S| < r$$
where S is the slope of the line joining (0,0) and (I,J), equal to J/I. This definition is still subject to error because the true window width is the length L_T of the band line perpendicular to that diagonal, and with a constant r this length varies with S.
In the telephone 10, this variation is eliminated by replacing the constant r with the expression:
$$r = R\sqrt{1 + S^2}$$
where R is a constant equal to half the length L_T of the perpendicular band line. Thus, the new band region is defined as:
$$\left| i - \frac{j}{S} \right| < \frac{L_T}{2}\sqrt{1 + S^2}, \quad \text{where } S = J/I \; (I \neq 0)$$
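A DTW that enforces the band as written above can be sketched as follows. Only the window test comes from the text. The frame distance (sum of absolute coefficient differences, the form mentioned in Claim 5), the three local steps and the division by I + J are assumptions, and `band_dtw` and `in_band` are hypothetical names.

```python
import math

import numpy as np


def in_band(i: int, j: int, I: int, J: int, L_T: float) -> bool:
    """Window test as written in the text: |i - j/S| < (L_T/2) * sqrt(1 + S^2), S = J/I."""
    S = J / I  # requires I != 0
    return abs(i - j / S) < (L_T / 2) * math.sqrt(1 + S * S)


def band_dtw(test: np.ndarray, ref: np.ndarray, L_T: float) -> float:
    """DTW distance between test (I x d) and reference (J x d) feature sequences.

    Assumed: frame distance = sum of absolute coefficient differences,
    local steps (i-1, j), (i, j-1), (i-1, j-1), total divided by I + J.
    Returns inf if no warp path fits inside the band.
    """
    I, J = len(test), len(ref)
    D = np.full((I + 1, J + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            if not in_band(i, j, I, J, L_T):
                continue  # cell outside the legal warp region
            cost = float(np.sum(np.abs(test[i - 1] - ref[j - 1])))
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[I, J] / (I + J))


a = np.array([[0.0], [1.0], [2.0], [3.0]])   # 4 test frames, 1 coefficient each
b = np.repeat(a, 2, axis=0)                  # same word spoken twice as slowly
print(band_dtw(a, b, L_T=4.0))               # 0.0
```

Because the band is centered on the line from (0,0) to (I,J), a reference twice as long as the test still admits the stretched path, which a fixed |i - j| < r band of the same width might exclude.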
In the operation of the DTW algorithm, the processed spoken word, i.e., the clipped autocorrelation coefficients of the spoken word, is compared with each of the stored words, and the stored word whose coefficients produce the smallest DTW result is the closest match to the input word. Provision must also be made for rejecting the input when even the closest match is not one of the possible words: if the result of the DTW operation for the closest match is still greater than some threshold level, no match is found.
In the prior art, the speech pattern of the spoken word must be compared with the speech patterns of all the stored words before the correct match is found. It is also known in the prior art to "prune" the search. In a prior art pruning operation, if a search of k words finds that the ith word has the lowest value X, then during the DTW operation on each subsequent word the summation is terminated as soon as the accumulated difference of the coefficients at any point in time exceeds X, without completing the summation for the remaining coefficients. This is because once the accumulated value exceeds the best value obtained so far, continuing the DTW over the remaining coefficients can only make it worse (see R. Bisiani and A. Waibel, "Performance Trade-Offs and Search Techniques for Isolated Word Search Recognition").
In the telephone 10, to further increase performance and speed, an adaptive linear pruning method is employed in the search or matching process. In this method, the DTW operation is first performed on the first word. The DTW operation computes not only the final result but also every partial summation corresponding to each point in time (see Figure 10); thus C_1 through C_N are computed. If the line 200 represents the best result of a DTW operation on a word, then not only the DTW value for the total (C_N) but also the progressive partial sums (C_1 ... C_{N-1}) are computed.
For a second word, a comparison is made at each point in time between the accumulated result of the DTW operation of the spoken word against the second word and the corresponding partial sum for the best word. Thus, if line 210 represents the DTW operation on the spoken word compared to the second word, and line 210 rises above line 200, the operation is terminated without waiting for the DTW operation to be completed over all the coefficients of the second word. In short, it is assumed, from the predictable progression of the summation of differences along line 210, that the result will only get worse and that the operation need not run to completion. This saves computation time and speeds the search.
To further ensure that the DTW adaptive linear pruning method does not inadvertently prune the comparison that would produce the best score, an offset "O" can be added to the best score. Thus, in subsequent DTW operations on other words, the accumulated value at each point in time must exceed the best score plus the offset before the comparison is terminated, so that the completed DTW would in all likelihood have exceeded the best score.
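The comparison against the best word's partial sums, including the offset O, reduces to a running-total check. In this hedged sketch each word's DTW result is abstracted as a list of per-step costs along its warp path, and "the same point in time" is approximated by the same step index, with the best word's final total C_N used once its steps run out. These simplifications and the function names are assumptions.

```python
from itertools import accumulate
from typing import Optional


def partial_sums(step_costs: list[float]) -> list[float]:
    """C_1 ... C_N: running totals of the best word's DTW step costs."""
    return list(accumulate(step_costs))


def compare_to_best(candidate_steps: list[float], best_partials: list[float],
                    offset: float = 0.0) -> tuple[float, Optional[int]]:
    """Accumulate the candidate's step costs, stopping once the running total
    exceeds the best word's partial sum at the same step plus the offset O.

    Returns (running total, 1-based step at which it was pruned, or None).
    """
    total = 0.0
    last = len(best_partials) - 1
    for n, cost in enumerate(candidate_steps):
        total += cost
        # Beyond the best word's last step, compare against its final total C_N.
        if total > best_partials[min(n, last)] + offset:
            return total, n + 1
    return total, None


best = partial_sums([1.0, 1.0, 1.0, 1.0])                       # 1, 2, 3, 4
print(compare_to_best([1.0, 1.0, 5.0, 1.0], best))              # (7.0, 3)
print(compare_to_best([1.0, 1.0, 5.0, 1.0], best, offset=5.0))  # (8.0, None)
```

The second call shows the role of O: the same candidate survives to completion when the margin is large enough, trading some speed for a lower risk of discarding the true best word.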
Mathematically, this is expressed as follows.
Definitions:
1. abs_mthresh: the absolute match threshold; a match must fall below it to be considered valid.
2. rel_mthresh: the relative match threshold; the second best match must exceed the best match by at least this amount for the best match not to be considered "questionable."
3. fb: the prune line initial constant divisor (= 1/3 in the preferred embodiment).
4. Vfz: the prune line variability region constant (= 750 in the preferred embodiment).
5. BEST_DIST: a variable equal to the best total distance scored so far. Initially, BEST_DIST = abs_mthresh. At the end of each computed match, if DISTANCE < BEST_DIST, then BEST_DIST = DISTANCE.
The prune line is defined as follows:
kpl = (BEST_DIST + Vfz + rel_mthresh) * (1 - fb)
cpl = (BEST_DIST + Vfz + rel_mthresh) * fb
PL(x) = kpl * x + cpl, where 0 < x < 1
A match is considered "pruned" when Dn(x) > PL(x), where Dn(x) is the normalized accumulating distance function of the DTW.
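The prune line and the BEST_DIST update translate directly into code. In this sketch only fb = 1/3 and Vfz = 750 come from the text; the abs_mthresh and rel_mthresh values in the example are hypothetical, as are the function names.

```python
from typing import Callable

FB = 1 / 3     # prune line initial constant divisor (from the text)
VFZ = 750.0    # prune line variability region constant (from the text)


def prune_line(best_dist: float, rel_mthresh: float,
               fb: float = FB, vfz: float = VFZ) -> Callable[[float], float]:
    """Return PL(x) = kpl * x + cpl, defined for 0 < x < 1."""
    total = best_dist + vfz + rel_mthresh
    kpl = total * (1 - fb)
    cpl = total * fb
    return lambda x: kpl * x + cpl


def is_pruned(dn_x: float, x: float, pl: Callable[[float], float]) -> bool:
    """A match is pruned when D_n(x) > PL(x)."""
    return dn_x > pl(x)


def update_best(best_dist: float, distance: float) -> float:
    """End of a completed match: if DISTANCE < BEST_DIST, BEST_DIST = DISTANCE."""
    return distance if distance < best_dist else best_dist


abs_mthresh, rel_mthresh = 3000.0, 250.0   # hypothetical threshold values
best_dist = abs_mthresh                    # BEST_DIST starts at abs_mthresh
pl = prune_line(best_dist, rel_mthresh)
print(round(pl(0.5), 2))                   # 2666.67
print(is_pruned(2700.0, 0.5, pl))          # True
best_dist = update_best(best_dist, 1200.0)
```

Note that PL(x) rises from fb times the total at the start of the word to the full total at its end, so early in a match only a large accumulated distance triggers pruning.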
With the linear adaptive pruning method, the prune threshold is initially set to a maximum absolute cutoff threshold. If the result of the DTW operation on the first word is below this initial maximum threshold, all of the coefficients of the first word will have been processed by the DTW operation. Thereafter, it becomes less likely that the DTW operation will be allowed to run over all of the coefficients of subsequent words. To further speed up matching the coefficients of the spoken word against those of the stored words, it is advantageous to present the stored words in order of their probability of producing the best score, so that the potential best score is presented first. If the best score is indeed presented first, the linear adaptive pruning method greatly reduces the computation required. One possible method, as used by the telephone 10, is to present first the words corresponding to the names, and therefore the telephone numbers, that have been dialed most frequently. This raises the probability that the spoken word matches one of the first stored words presented. Thus, the telephone 10 pre-sorts the stored words before presenting them for the DTW adaptive pruning analysis.
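The pre-sort and the pruned search can be combined in one sketch. It follows the prune line defined in the text but makes several assumptions not found there: a plain (unbanded) DTW for brevity, an L1 frame distance, x = i/I, and D_n(x) taken as the smallest accumulated cost in row i divided by I + J. Dial counts stand in for "most frequently dialed", and all names are hypothetical.

```python
import numpy as np

FB, VFZ = 1 / 3, 750.0   # fb and Vfz from the text


def make_prune_line(best_dist, rel_mthresh):
    """PL(x) = kpl * x + cpl for the current BEST_DIST."""
    total = best_dist + VFZ + rel_mthresh
    return lambda x: total * (1 - FB) * x + total * FB


def dtw_with_pruning(test, ref, pl):
    """Unbanded DTW, row by row over test frames; returns None if pruned.

    Assumed: L1 frame distance, steps (i-1, j), (i, j-1), (i-1, j-1),
    x = i / I, and D_n(x) = smallest accumulated cost in row i / (I + J).
    """
    I, J = len(test), len(ref)
    prev = np.full(J + 1, np.inf)
    prev[0] = 0.0                      # D(0, 0) = 0
    for i in range(1, I + 1):
        row = np.full(J + 1, np.inf)
        for j in range(1, J + 1):
            cost = float(np.sum(np.abs(test[i - 1] - ref[j - 1])))
            row[j] = cost + min(prev[j], row[j - 1], prev[j - 1])
        x = i / I
        if x < 1 and row.min() / (I + J) > pl(x):
            return None                # D_n(x) > PL(x): abandon this word
        prev = row
    return float(prev[J] / (I + J))


def search(test, templates, dial_counts, abs_mthresh, rel_mthresh):
    """Pre-sort stored words by dial count, then match with adaptive pruning.

    Returns (best name or None, BEST_DIST, names that were pruned).
    """
    order = sorted(templates, key=lambda n: dial_counts.get(n, 0), reverse=True)
    best_name, best_dist, pruned = None, abs_mthresh, []
    for name in order:
        d = dtw_with_pruning(test, templates[name],
                             make_prune_line(best_dist, rel_mthresh))
        if d is None:
            pruned.append(name)
        elif d < best_dist:            # end of a computed match
            best_name, best_dist = name, d
    return best_name, best_dist, pruned


spoken = np.zeros((4, 1))
templates = {"alice": np.zeros((4, 1)), "bob": np.full((4, 1), 5000.0)}
print(search(spoken, templates, {"alice": 10, "bob": 2}, 3000.0, 100.0))
# ('alice', 0.0, ['bob'])
```

When "alice" is presented first, "bob" is abandoned after its first test frame; if the dial counts favored "bob" instead, both words would be matched to completion, which is the saving the pre-sort is meant to capture.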
As a result of the foregoing, a general purpose microcomputer 60 can be used in the telephone 10, which reduces its cost. Because the telephone 10 may be unable to match the speech pattern of the spoken word uniquely to the speech pattern of one stored word, the stored words most likely to be the intended match, e.g., those with the lowest DTW values that nevertheless fall short of an unambiguous match, are presented on the display 30. A match j falls into the "questionable" region if it satisfies both of the following criteria:
A. Score_j < absolute match threshold (abs_mthresh)
B. Score_j - Score_best < relative match threshold (rel_mthresh)
If more than one match falls in the questionable region, the user is prompted to make a choice.
The telephone 10 presents its best questionable match for the user to confirm whether it is indeed the word that was spoken. The user can then press option button 28(A) ("yes") if it is correct, or option button 28(B) ("no") to have the telephone 10 display the next best questionable match. This process continues until either "yes" (28(A)) is pressed or only one questionable match remains, in which case that match is dialed. Thus, for questionable matches, user input is sought.
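A minimal sketch of the questionable-match test (criteria A and B) and the yes/no confirmation loop follows. Scores are assumed to be final DTW distances (lower is better); the function names and the representation of button presses as an iterator of booleans are hypothetical.

```python
from typing import Iterator, Optional


def questionable_matches(scores: dict[str, float], abs_mthresh: float,
                         rel_mthresh: float) -> list[str]:
    """Names meeting criteria A and B, best (lowest score) first."""
    valid = {name: s for name, s in scores.items() if s < abs_mthresh}      # A
    if not valid:
        return []
    best = min(valid.values())
    close = [name for name, s in valid.items() if s - best < rel_mthresh]   # B
    return sorted(close, key=valid.__getitem__)


def confirm(candidates: list[str], answers: Iterator[bool]) -> Optional[str]:
    """Offer candidates in order; True = "yes" (28(A)), False = "no" (28(B)).

    The last remaining candidate is dialed without asking.
    """
    for k, name in enumerate(candidates):
        if k == len(candidates) - 1 or next(answers):
            return name
    return None


scores = {"Ann": 900.0, "Andy": 950.0, "Bob": 1500.0, "Carl": 4000.0}
cands = questionable_matches(scores, abs_mthresh=3000.0, rel_mthresh=200.0)
print(cands)                           # ['Ann', 'Andy']
print(confirm(cands, iter([False])))   # Andy
```

"Bob" is a valid match under criterion A but is far enough behind "Ann" that it is never offered, which keeps the confirmation dialog short.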
SPEED DIALING
Another way to use the telephone 10 is to press the directory key 24. The user is then prompted to press one of the keys on the numeric keypad 22. The selected key has three alphabetical letters associated with it, and all the names beginning with those three letters are presented in alphabetical order. To scroll forward, the user presses the same numeric key repeatedly. When the desired name and number are presented on the display 30, the user activates the redial key 20 to dial that number. The software to accomplish this is set forth in Exhibit C.
DIRECTORY ASSISTANCE
The telephone 10 can also be used to retrieve a telephone number - without dialing. There are two methods to accomplish the foregoing. In one method, the user simply presses the directory key 24 and then presses the voice button 26. The user then speaks the name. The telephone 10 will process the speech signal as previously discussed and display the chosen name and the telephone number associated with that name.
The telephone 10 can also be searched manually to retrieve a telephone number. The user presses the directory key 24 and then a key on the keypad 22. The user does not have to press an option button 28(A-C) to narrow the three-letter group down to one letter; the names in that letter group are simply accessed in alphabetical order (e.g., "2" = "ABC" = all names starting with A, B or C). To scroll forward through the letter group, the user presses the same key repeatedly. The option buttons 28(A-C) are not used at all except to display the one-button speed dial names and numbers. The software to accomplish this is set forth in Exhibit C.
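Browsing a letter group can be sketched as filtering the directory by first letter, sorting alphabetically and indexing by the number of key presses. The keypad letter layout, the wrap-around after the last name and the function names are assumptions not stated in the text.

```python
from typing import Optional

# Assumed three-letters-per-key layout (no Q or Z).
KEY_LETTERS = {"2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
               "6": "MNO", "7": "PRS", "8": "TUV", "9": "WXY"}


def letter_group(directory: dict[str, str], key: str) -> list[tuple[str, str]]:
    """(name, number) entries whose name starts with one of the key's letters."""
    letters = KEY_LETTERS[key]
    group = [(name, num) for name, num in directory.items()
             if name and name[0].upper() in letters]
    return sorted(group, key=lambda entry: entry[0].upper())


def shown_after(directory: dict[str, str], key: str,
                presses: int) -> Optional[tuple[str, str]]:
    """Entry on display 30 after pressing `key` `presses` times (1 = first name).

    Wrapping back to the first name after the last one is an assumption.
    """
    group = letter_group(directory, key)
    if not group or presses < 1:
        return None
    return group[(presses - 1) % len(group)]


directory = {"Carol": "555-0101", "Alice": "555-0100",
             "Bob": "555-0102", "Dave": "555-0103"}
print(shown_after(directory, "2", 2))   # ('Bob', '555-0102')
```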
CALL LOGGING
The telephone apparatus 10 can also log the last one hundred calls made, storing the date, the time, the number called and the length of each call. The user can review this log at any time to audit phone bills, to scan for or redial frequently called numbers, or for any other purpose. To review the log, the user presses option button 28(B). The software to accomplish this is set forth in Exhibit C.
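A fixed-size call log of this kind maps naturally onto a bounded queue. This Python sketch assumes a simple record of number, start time and duration; the class and method names, including the frequency helper, are hypothetical.

```python
from collections import Counter, deque
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CallRecord:
    number: str
    started: datetime      # date and time the call was placed
    duration: timedelta    # length of the call


class CallLog:
    """Keeps only the most recent `capacity` calls (one hundred in the text)."""

    def __init__(self, capacity: int = 100) -> None:
        self._calls = deque(maxlen=capacity)   # oldest entry drops off automatically

    def record(self, call: CallRecord) -> None:
        self._calls.append(call)

    def newest_first(self) -> list[CallRecord]:
        return list(reversed(self._calls))

    def most_frequent(self, n: int = 3) -> list[tuple[str, int]]:
        """Most often called numbers in the log (hypothetical helper)."""
        return Counter(c.number for c in self._calls).most_common(n)


log = CallLog()
start = datetime(1989, 1, 5, 9, 0)
for k in range(105):                       # five more calls than the log holds
    log.record(CallRecord(f"555-01{k % 3:02d}", start + timedelta(minutes=k),
                          timedelta(seconds=30)))
print(len(log.newest_first()))             # 100
print(log.most_frequent(1))                # [('555-0102', 34)]
```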
PHONE ANSWERING/CALL SCREENING
The telephone 10 also has an answering facility, which is activated by pressing option button 28(A). When it is activated in the unattended mode, the telephone 10 answers all calls with a prerecorded message. The apparatus 10 then prompts the caller to dial, on the caller's telephone, the caller's own telephone number. The caller must therefore be at a telephone that generates DTMF signals. The DTMF signals are received by the telephone 10 and converted into signals representing the caller's telephone number. The telephone 10 then records that number and the time of the call.
In addition, the telephone 10 can use the telephone number received from the caller to search its directory for the corresponding name. When the user returns, the telephone 10 can display the number of messages it has recorded, together with the time and phone number of each. Finally, if a name is associated with a phone number, i.e., the number is in the directory of the telephone apparatus 10, the name is displayed as well.
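The text does not say how the DTMF digits are detected. One common technique, swapped in here as an assumption, is the Goertzel algorithm: measure the power at the four row and four column DTMF frequencies and pick the strongest of each. The sketch below decodes one digit per block of samples (8000 Hz sampling and 205-sample blocks are assumed) and then performs the reverse directory lookup described above; all names are hypothetical.

```python
import math
from typing import Optional

ROWS = (697, 770, 852, 941)        # DTMF row frequencies (Hz)
COLS = (1209, 1336, 1477, 1633)    # DTMF column frequencies (Hz)
KEYS = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples: list[float], freq: float, fs: float) -> float:
    """Power of `samples` at `freq` via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / fs)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def decode_digit(samples: list[float], fs: float = 8000.0) -> str:
    """Strongest row tone plus strongest column tone identifies the key."""
    row = max(range(4), key=lambda r: goertzel_power(samples, ROWS[r], fs))
    col = max(range(4), key=lambda c: goertzel_power(samples, COLS[c], fs))
    return KEYS[row][col]


def tone(key: str, fs: float = 8000.0, n: int = 205) -> list[float]:
    """Synthesize a clean DTMF tone block for testing."""
    r = next(i for i, keys in enumerate(KEYS) if key in keys)
    c = KEYS[r].index(key)
    return [math.sin(2 * math.pi * ROWS[r] * t / fs) +
            math.sin(2 * math.pi * COLS[c] * t / fs) for t in range(n)]


def lookup_name(number: str, directory: dict[str, str]) -> Optional[str]:
    """Reverse directory lookup; None if the caller's number is not stored."""
    return next((name for name, stored in directory.items() if stored == number), None)


caller = "".join(decode_digit(tone(d)) for d in "5550100")
print(caller, lookup_name(caller, {"Alice": "5550100"}))   # 5550100 Alice
```

A real detector would also check tone duration, twist and a minimum power level before accepting a digit; those checks are omitted here.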
Another aspect of phone answering is that the user can program the telephone 10 so that, once a call has been answered and the caller has left a message consisting of DTMF signals indicating the caller's telephone number, the telephone apparatus automatically dials a preset number (the telephone number of a paging service) and regenerates the DTMF signals left by the caller. Thus, the telephone apparatus 10 can relay the number at which the caller can be reached.
Finally, the telephone 10 can be placed in call screening mode. When this mode is activated by pressing option button 28(A), the telephone apparatus 10 does not ring when an incoming call is received. However, a calling party who knows the preassigned code can dial it in; this overrides the call screening and turns the ringer on. Without the code, the calling party receives the phone answering message, and the telephone 10 records the caller's telephone number.

Claims

WHAT IS CLAIMED IS:
1. In a speech recognition apparatus having signal processing means for receiving a digital signal representative of said speech and for processing said digital signal in accordance with a clipped autocorrelation function wherein the improvement comprising: means for taking the first difference of said digital signal prior to processing said digital signal in accordance with said clipped autocorrelation function.
2. The apparatus of Claim 1 wherein said signal processing means receives said digital signal of x(t) and processes said signal in accordance with:
$$S(t) = x(t) - x(t-1)$$
3. The apparatus of Claim 2 wherein said signal processing means further comprising means for normalizing the coefficients A(m) generated by the clipped autocorrelation function in accordance with
[Normalization equation for A(m) shown as an image in the original publication; not reproduced here.]
4. In a speech recognition apparatus having signal processing means for receiving a first digital signal having a plurality of first coefficients (a_i) representative of timing portions of a spoken word and for comparing said first digital signal to a second digital signal having a plurality of second coefficients (b_j) representative of timing portions of a stored word, wherein the improvement comprising: means for processing said first and said second signals in accordance with a dynamic time warp function having a window of
$$\left| i - \frac{j}{S} \right| < \frac{L_T}{2}\sqrt{1 + S^2}, \quad \text{where } S = J/I \; (I \neq 0)$$
where L_T is the length of the perpendicular band line.
5. In a speech recognition apparatus having signal processing means for receiving a first digital signal having a plurality of first coefficients representative of timing portions of a spoken word and for comparing said first digital signal to a second digital signal having a plurality of second coefficients representative of a stored word, wherein the improvement comprising: means for terminating said comparison at a point in time in the event the sum of the absolute differences of the first coefficients and the corresponding second coefficients, prior to said point in time, exceeds the sum of the differences of a prior comparison at said point in time.
6. In a speech recognition apparatus having signal processing means for receiving a first digital signal representative of a spoken word, and for comparing said first digital signal to one of a plurality of second digital signals, in sequence, wherein each of said plurality of second digital signals is representative of a stored word, wherein the improvement comprising: means for sorting said plurality of second digital signals based upon an a priori determination prior to said comparison.
7. The apparatus of Claim 6 wherein said a priori determination is based upon usage.
8. A method of processing a speech comprising the steps of:
(a) converting said speech into an analog signal; (b) digitizing said analog signal to produce a digital signal having a sampling rate;
(c) performing a first difference of said digital signal to produce a modified digital signal;
(d) calculating the coefficients of said speech in accordance with a clipped autocorrelation function on said modified digital signal; and (e) storing said coefficients.
9. The method of Claim 8 further comprising the steps of:
(f) repeating the steps of (a)-(d) for a new speech; (g) comparing said coefficients of said new speech to the stored coefficients in accordance with a dynamic time warping (DTW) algorithm having a constant width window in accordance with the following:
$$\left| i - \frac{j}{S} \right| < \frac{L_T}{2}\sqrt{1 + S^2}, \quad \text{where } S = J/I \; (I \neq 0)$$
where L_T is the length of the perpendicular band line.
10. The method of Claim 9 wherein said coefficients of said new speech are compared in sequence to the coefficients of a plurality of stored speeches and further comprising the step of: (h) terminating said comparing step (step (g)) in the event the result of said DTW algorithm on a stored speech for the ith coefficient is greater than the DTW algorithm for the same coefficient in time for a prior obtained lowest score on a different stored speech.
11. The method of Claim 10 further comprising the step of:
(i) pre-sorting said plurality of stored speeches to present the most probable lowest score first for DTW operation.
12. A speech activated telephone having means for dialing a pre-stored telephone number, wherein the improvement comprising: means for inputting a plurality of acoustic commands into said telephone in a training mode; means for processing said plurality of acoustic commands into a plurality of processed signals; means for storing said plurality of processed signals; means for inputting a telephone number and an alphanumeric name associated with each acoustic command, in said training mode; means for storing said telephone number and said alphanumeric name associated with each acoustic command; means for receiving a dialing acoustic command in a dialing mode; means for processing said dialing acoustic command into an address signal; means for comparing said address signal to said stored processed signals; means for displaying a plurality of alphanumeric names in response to said comparing means unable to uniquely distinguish said address signal with only one of said stored processed signals; user activatable switch means for selecting one of said displayed alphanumeric names; and means for generating the dialing signal representative of the telephone number associated with said selected displayed alphanumeric name in response to said user activatable switch means.
13. A telephone having a numeric keypad for entering a telephone number, and having a plurality of alphabetical characters associated with each numeric key, wherein the improvement comprising means for displaying the alphabetical characters associated with each numeric key, when a numeric key is activated; a plurality of button means equal to or greater than the number of alphabetical characters associated with each numeric key; each of said plurality of button means positioned substantially juxtaposed to the displayed alphabetic character associated therewith; and means for responding to the activation of one of said button means for entering alphabetical data into said telephone.
14. A telephone having a numeric keypad and means for dialing the numbers entered from said keypad wherein the improvement comprising a plurality of button means; and means for changing the function associated with each control key.
15. A telephone having a numeric keypad and means for dialing the numbers entered from said keypad, wherein the improvement comprising: means for storing a telephone number associated with each call made; and means for displaying said stored telephone number.
16. The telephone of Claim 15 further comprising: means for storing the date, time and duration of each call made; and means for displaying said stored date, time and duration.
17. A telephone having automatic means for answering an incoming call placed by a caller; wherein said improvement comprising: means for receiving a plurality of tones generated by said caller; said tones representative of the telephone number of said caller; means for converting said tones into signals representative of the numbers associated with said tones; means for storing said signals; and means for displaying the numbers associated with said tones.
18. The telephone of Claim 17, wherein said telephone further comprising: means for storing a plurality of names and a telephone number associated with each name; means for comparing said telephone number of said caller to said stored telephone numbers; and means for displaying the name associated with said stored telephone number in the event said telephone number of said caller matches one of said stored telephone numbers.
19. The telephone of Claim 17 further comprising means for calling a pre-set telephone number in the event said plurality of tones is received; and means for generating said received tones.
20. The telephone of Claim 17 further comprising ringing means for producing an audible response to the detection of an incoming call; means for screening said incoming call, wherein said screening means deactivates said ringing means; and password means activated by said caller to deactivate said screening means.
21. A telephone comprising: means for storing a plurality of names, a telephone number associated with each name, and a speech pattern of each name; speech input means for inputting a speech pattern of a name to be searched; means for comparing said inputted speech pattern to said stored speech patterns; means for retrieving one or more names from said storing means, based upon said comparison in the event said comparing means is unable to uniquely distinguish only one of said stored speech patterns to said inputted speech pattern; means for displaying said one or more names; means for selecting one of said one or more names displayed; and means for dialing the telephone number associated with the one name selected by the selecting means.
22. A telephone, having a display means for displaying the telephone line statuses of ringing, hold and busy, wherein the improvement comprising: means for displaying the telephone line status of disconnect or no line, without the telephone being off hook; and means for displaying the date and time.
23. A telephone for connecting to a telephone line having a tip and a ring and having means for transmitting audio signals to said line and means for receiving audio signals from said line, wherein said improvement comprising: a single transistor means for isolating said telephone line from said transmitting means and receiving means; said transistor means having a collector, an emitter and a base; said collector connected to said telephone line; a first RC network means connecting said transmitting means to said base of said transistor means for supplying transmitting audio signal from said transmitting means to said base, wherein said transmitting audio signal generates a base current of said transistor means, causing the generation of a collector current to be transmitted over said telephone line; a second RC network means connecting the collector of said transistor means to said receiving means; and a third RC network means connecting the transmitting means to said receiving means.
24. The telephone of Claim 21 wherein said display means displays said one or more names in an order.
25. The telephone of Claim 24 wherein said order is the presentation first of the name having the highest probability of match followed by names having decreasing probability of match.
PCT/US1990/000096 1989-01-05 1990-01-04 A speech processing apparatus and method therefor WO1990008439A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1019900701938A KR910700582A (en) 1989-01-05 1990-01-04 Speech processing device and method

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US29387689A 1989-01-05 1989-01-05
US07/294,168 US5007081A (en) 1989-01-05 1989-01-05 Speech activated telephone
US294,168 1989-01-05
US293,876 1989-01-05

Publications (2)

Publication Number Publication Date
WO1990008439A2 WO1990008439A2 (en) 1990-07-26
WO1990008439A3 WO1990008439A3 (en) 1990-09-07

Family

ID=26968205

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1990/000096 WO1990008439A2 (en) 1989-01-05 1990-01-04 A speech processing apparatus and method therefor

Country Status (4)

Country Link
EP (1) EP0453511A4 (en)
JP (1) JPH04504178A (en)
KR (1) KR910700582A (en)
WO (1) WO1990008439A2 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
NL1000284C2 (en) * 1995-05-02 1996-11-05 Huigh Cornelis Van Der Mandele Voice operated number selection unit for telephone
GB2307137A (en) * 1995-11-04 1997-05-14 Motorola Ltd Communications addressing network
EP0778689A1 (en) * 1995-12-06 1997-06-11 WILHELM, Siegfried E Telecommunication end user device
GB2317781A (en) * 1996-09-30 1998-04-01 Matsushita Electric Ind Co Ltd Multimodal voice dialling digital key telephone with dialog manager
GB2347253B (en) * 1999-02-23 2001-03-07 Motorola Inc Method of selectively assigning a penalty to a probability associated with a voice recognition system
US7062435B2 (en) 1996-02-09 2006-06-13 Canon Kabushiki Kaisha Apparatus, method and computer readable memory medium for speech recognition using dynamic programming

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
GB0624085D0 (en) * 2006-12-01 2007-01-10 Oxford Biosignals Ltd Biomedical signal analysis method

Citations (18)

Publication number Priority date Publication date Assignee Title
US3746793A (en) * 1972-08-09 1973-07-17 Phonics Corp Telephone communication system for the hearing impaired
US4074069A (en) * 1975-06-18 1978-02-14 Nippon Telegraph & Telephone Public Corporation Method and apparatus for judging voiced and unvoiced conditions of speech signal
US4400589A (en) * 1979-05-21 1983-08-23 United Networks, Inc. Subscriber station network
US4425627A (en) * 1981-02-23 1984-01-10 Sperry Corporation Intelligent prompting terminal apparatus
US4509133A (en) * 1981-05-15 1985-04-02 Asulab S.A. Apparatus for introducing control words by speech
US4516215A (en) * 1981-09-11 1985-05-07 Sharp Kabushiki Kaisha Recognition of speech or speech-like sounds
US4536886A (en) * 1982-05-03 1985-08-20 Texas Instruments Incorporated LPC pole encoding using reduced spectral shaping polynomial
US4543452A (en) * 1982-02-23 1985-09-24 Confon Ag Apparatus for the storing and recovery of information, subscriber's numbers, subscriber's addresses, etc.
US4608457A (en) * 1984-04-11 1986-08-26 Fowler Stephen L Telecommunications device for the hearing impared
US4644107A (en) * 1984-10-26 1987-02-17 Ttc Voice-controlled telephone using visual display
US4709387A (en) * 1984-05-10 1987-11-24 Sharp Kabushiki Kaisha Multifunctional telephone
US4741031A (en) * 1986-06-27 1988-04-26 Gai-Tronics Intrinsically safe telephone
US4751728A (en) * 1987-03-27 1988-06-14 Treat John M Telephone call monitoring, metering and selection device
US4782522A (en) * 1986-01-25 1988-11-01 Telenorma Telefonbau Und Normalzeit Gmbh Terminal for a telecommunications switching system having an additional keyboard and/or visual display
US4783803A (en) * 1985-11-12 1988-11-08 Dragon Systems, Inc. Speech recognition apparatus and method
US4827500A (en) * 1987-01-30 1989-05-02 American Telephone And Telegraph Company, At&T Bell Laboratories Automatic speech recognition to select among call destinations
US4864622A (en) * 1986-10-31 1989-09-05 Sanyo Electric Co., Ltd. Voice recognizing telephone
US4866778A (en) * 1986-08-11 1989-09-12 Dragon Systems, Inc. Interactive speech recognition apparatus

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
JPS60501180A (en) * 1983-03-28 1985-07-25 Exxon Research and Engineering Company Speech recognition method and device

Non-Patent Citations (6)

Title
"NEW GENERATION PHONE Dials at the Sound of your Voice", COMB Product Catalog, September 1986, page 7, bottom right. *
DIALESS TM I The Ultimate Family Phone, The Scope Catalog, January 1986. *
Electrical Communication (ITT), Volume 59, Number 3, M. IMMENDORFER, "Voice Dialer", 06 May 1985, pages 281-285. *
IEEE Trans. on Acoustics, Speech and Signal Processing, Volume ASSP-26, Number 1, February 1978, SAKOE et al., "Dynamic Programming Algorithm Optimization for Spoken Word Recognition", pages 43-49. *
See also references of EP0453511A1 *
Telecommunications, HANK STROBEL, "Intelligent Telephone Sets", December 1980, figures 1, 2 and page 60, bottom center. *

Cited By (9)

Publication number Priority date Publication date Assignee Title
NL1000284C2 (en) * 1995-05-02 1996-11-05 Huigh Cornelis Van Der Mandele Voice operated number selection unit for telephone
GB2307137A (en) * 1995-11-04 1997-05-14 Motorola Ltd Communications addressing network
GB2307137B (en) * 1995-11-04 2000-03-22 Motorola Ltd A communications addressing network and terminal therefor
EP0778689A1 (en) * 1995-12-06 1997-06-11 WILHELM, Siegfried E Telecommunication end user device
US7062435B2 (en) 1996-02-09 2006-06-13 Canon Kabushiki Kaisha Apparatus, method and computer readable memory medium for speech recognition using dynamic programming
GB2317781A (en) * 1996-09-30 1998-04-01 Matsushita Electric Ind Co Ltd Multimodal voice dialling digital key telephone with dialog manager
GB2317781B (en) * 1996-09-30 2000-12-06 Matsushita Electric Ind Co Ltd Multimodal voice dialling digital key telephone with dialog manager
GB2347253B (en) * 1999-02-23 2001-03-07 Motorola Inc Method of selectively assigning a penalty to a probability associated with a voice recognition system
US6233557B1 (en) 1999-02-23 2001-05-15 Motorola, Inc. Method of selectively assigning a penalty to a probability associated with a voice recognition system

Also Published As

Publication number Publication date
EP0453511A1 (en) 1991-10-30
JPH04504178A (en) 1992-07-23
KR910700582A (en) 1991-03-15
WO1990008439A3 (en) 1990-09-07
EP0453511A4 (en) 1993-05-26

Similar Documents

Publication Publication Date Title
US5007081A (en) Speech activated telephone
EP0311414B2 (en) Voice controlled dialer having memories for full-digit dialing for any users and abbreviated dialing for authorized users
EP0307193B1 (en) Telephone apparatus
US4945557A (en) Voice activated dialing apparatus
US6940951B2 (en) Telephone application programming interface-based, speech enabled automatic telephone dialer using names
US5903628A (en) Caller information (CLID) controlled automatic answer feature for telephone
GB2317781A (en) Multimodal voice dialling digital key telephone with dialog manager
US5752230A (en) Method and apparatus for identifying names with a speech recognition program
EP0893901B1 (en) Method for controlling a telecommunication service and a terminal
US6563911B2 (en) Speech enabled, automatic telephone dialer using names, including seamless interface with computer-based address book programs
CA2221913A1 (en) Statistical database correction of alphanumeric account numbers for speech recognition and touch-tone recognition
USH1646H (en) Speech recognition adapter for telephone system
US6671354B2 (en) Speech enabled, automatic telephone dialer using names, including seamless interface with computer-based address book programs, for telephones without private branch exchanges
WO1990008439A2 (en) A speech processing apparatus and method therefor
US6223161B1 (en) Method for setting terminal specific parameters of a communication terminal
CA2058644C (en) Voice activated telephone set
US5638437A (en) Telecommunication system and method enabling a user to get access to an automated call processing from a central station operating on pulse dialling mode
GB2258936A (en) Voice recognition apparatus
KR970055729A (en) Method and apparatus for transmitting telephone number by voice recognition in mobile terminal
KR950009425B1 (en) The phonetic dialing phone
JPH06311220A (en) Image recognizing dialer
EP0642118B1 (en) Automatic system for guided acquistion of telephone line speech signals
US20040181415A1 (en) Speech activated telephone device for connection to existing landline phone
JPH098894A (en) Voice recognizing cordless telephone set
JPH03145248A (en) Caller identification telephone set

Legal Events

Date Code Title Description
AK Designated states. Kind code of ref document: A1. Designated state(s): CA JP KR
AL Designated countries for regional patents. Kind code of ref document: A1. Designated state(s): AT BE CH DE DK ES FR GB IT LU NL SE
AK Designated states. Kind code of ref document: A3. Designated state(s): CA JP KR
AL Designated countries for regional patents. Kind code of ref document: A3. Designated state(s): AT BE CH DE DK ES FR GB IT LU NL SE
WWE Wipo information: entry into national phase. Ref document number: 1990902869. Country of ref document: EP
WWP Wipo information: published in national office. Ref document number: 1990902869. Country of ref document: EP
NENP Non-entry into the national phase. Ref country code: CA
WWW Wipo information: withdrawn in national office. Ref document number: 1990902869. Country of ref document: EP