US20010016813A1 - Distributed recognition system having multiple prompt-specific and response-specific speech recognizers - Google Patents

Distributed recognition system having multiple prompt-specific and response-specific speech recognizers Download PDF

Info

Publication number
US20010016813A1
US20010016813A1 (application US09/221,582)
Authority
US
United States
Prior art keywords
speech
spoken utterance
utterance
user
prompt
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US09/221,582
Other versions
US6377922B2 (en)
Inventor
Deborah W. Brown
Randy G. Goldberg
Richard R. Rosinski
William R. Wetzel
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nuance Communications Inc
AT&T Properties LLC
Original Assignee
AT&T Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AT&T Corp filed Critical AT&T Corp
Priority to US09/221,582 (granted as US6377922B2)
Assigned to AT&T CORP. reassignment AT&T CORP. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: BROWN, DEBORAH W., GOLDBERG, RANDY G., ROSINSKI, RICHARD R., WETZEL, WILLIAM R.
Publication of US20010016813A1
Application granted
Publication of US6377922B2
Assigned to AT&T PROPERTIES, LLC reassignment AT&T PROPERTIES, LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: AT&T CORP.
Assigned to AT&T INTELLECTUAL PROPERTY II, L.P. reassignment AT&T INTELLECTUAL PROPERTY II, L.P. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: AT&T PROPERTIES, LLC
Assigned to NUANCE COMMUNICATIONS, INC. reassignment NUANCE COMMUNICATIONS, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: AT&T INTELLECTUAL PROPERTY II, L.P.
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • The present invention is directed to a speech recognition system. More particularly, the present invention is directed to a speech recognition system that uses multiple speech recognizers to increase its accuracy.
  • Speech recognition systems are increasingly being used to translate human spoken words or utterances into their written equivalent and meaning. Speech recognition systems can avoid the need for spoken utterances to be manually entered into a computer, or to be recognized by a human. Therefore, speech recognition systems are desirable for many businesses because these systems can minimize the number of human operators needed to handle calls from customers.
  • Various speech recognizers have their own strengths and weaknesses with respect to accurately identifying spoken utterances. For example, one speech recognizer may perform better at recognizing a sequence of alpha-numeric characters while other speech recognizers perform better at recognizing proper nouns such as, for example, names of places, people and things. Also, some speech recognizers can execute certain tasks faster or require less processing time than other speech recognizers.
  • One embodiment of the present invention is a speech recognition system for recognizing spoken utterances received as a speech signal from a user.
  • A prompt for requesting a spoken utterance from the user is assigned a response identifier which indicates at least one of a plurality of speech recognizers to best recognize a particular type of spoken utterance.
  • The system includes a processor for receiving the speech signal from the user in response to the prompt.
  • The processor also directs the speech signal to the at least one speech recognizer indicated by the response identifier.
  • The speech recognizer generates a plurality of spoken utterance choices from the speech signal and a probability associated with each of the plurality of choices. At least one of the spoken utterance choices is selected based on the associated probabilities.
  • FIG. 1 is a block diagram illustrating a speech recognition system in accordance with one embodiment of the present invention.
  • FIG. 2 is a flowchart illustrating the steps performed by a speech recognition system in accordance with one embodiment of the present invention.
  • FIG. 1 is a block diagram illustrating a speech recognition system 100 in accordance with one embodiment of the present invention.
  • Speech recognition system 100 includes an input/output (I/O) interface 101 .
  • I/O interface 101 interfaces system 100 to a user.
  • I/O interface 101 can be a remote interface, such as a telephone line connection as shown in FIG. 1.
  • I/O interface can also be a local interface, for example, a microphone and a speaker.
  • I/O interface 101 is coupled through a network 200 to a telephone 300 .
  • Telephone 300 enables a user of speech recognition system 100 to access the system by participating in a telephone call between telephone 300 and system 100 .
  • The user can transmit a speech signal to system 100 through telephone 300 as well as receive signals from system 100.
  • Network 200 can be any network that enables the user at telephone 300 to dial a telephone number associated with system 100 .
  • For example, network 200 can be the Public Switched Telephone Network (“PSTN”), a local area network, the Internet, or an intranet.
  • Speech recognition system 100 also includes an analog-to-digital (“A/D”) converter 103 coupled to I/O interface 101 and a processor 110 .
  • A/D converter 103 converts analog speech signals from spoken utterances received from I/O interface 101 into digital signals. Alternatively, a digital signal may be sent through a digital network, as, for example, where A/D converter 103 is located locally. These digital signals are then received by processor 110.
  • Processor 110 includes a memory 111 and a central processing unit (CPU) 11 that executes a supervisory operating system to coordinate the interactions between some of the different system elements. CPU 11 also executes application software in response to information supplied by the user.
  • Memory 111 may be a combination of random access memory (RAM), read only memory (ROM) and/or erasable programmable read only memory (EPROM) components.
  • One or more system components may directly access memory 111 , without the use of processor 110 .
  • In addition, one or more of the system components may incorporate its own local memory.
  • Speech recognition system 100 further includes a recognizer switch 104 , a data storage device 108 and speech recognizers A, B and C, labeled 105 , 106 and 107 , respectively.
  • Each speech recognizer includes speech processing software and hardware that receives a speech signal generated from a human utterance and generates one or more choices of words that represent the utterance. For each choice of words, a probability that the choice of words is the correct choice may also be generated by the speech recognizers. As illustrated, three speech recognizers are shown, but many such recognizers can be included in system 100.
  • Database 108 stores prerecorded prompts used to communicate with the user by soliciting responses from the user. For example, database 108 may store prompts which request a user's account number or request a method of payment for a particular transaction.
  • In one embodiment, the speech recognizers utilize a Hidden Markov Model to generate a list of “N-best” choices for the solicited response.
  • An example of this method of speech recognition is disclosed in U.S. Pat. No. 5,241,619, herein incorporated by reference.
  • In another embodiment, each speech recognizer executes post-processing routines to generate multiple choices and associated probabilities for each choice. Further, other known methods can be used by the speech recognizers to generate multiple choices and probabilities assigned to these choices.
  • Speech recognizers 105-107 preferably have different capabilities or effectiveness in handling specific recognition situations. That is, the recognizers may provide varying degrees of reliability under different circumstances. For example, one speech recognizer may provide the most reliable recognition of numbers or digits, another may provide high reliability for recognition of letters of the alphabet, and still another may provide high reliability in a specific limited vocabulary. These capabilities may be determined by testing each of the recognizers before implementation into recognition system 100. This testing process may include supplying each speech recognizer with an identical spoken utterance. A plurality of these identical spoken utterances may be supplied to each speech recognizer. The plurality of identical spoken utterances includes many types of spoken utterances such as digits, letters of the alphabet, limited vocabulary, etc.
  • Each recognizer then processes the spoken utterances and returns an output signal.
  • The output signals of the speech recognizers are compared with correct signals which represent the spoken utterances to determine which speech recognizer or speech recognizers correctly recognized the spoken utterances.
  • The recognizer that provides the highest reliability for recognizing a particular type of spoken utterance is assigned to recognize that particular type of spoken utterance. Additionally, other factors may be taken into consideration when assigning a speech recognizer to recognize a particular type of spoken utterance, such as the cost associated with a speech recognizer recognizing a particular type of spoken utterance and the speed at which the speech recognizer can recognize it.
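The offline testing and assignment process described above can be sketched as follows. This is a minimal illustration only: the recognizer names match the patent's A/B/C example, but the utterance types, accuracy figures, and function names are assumptions, not values from the patent.

```python
# Hypothetical sketch: pick, per utterance type, the recognizer that scored
# highest on a batch of identical labeled test utterances.

def assign_recognizers(test_results):
    """test_results maps (recognizer, utterance_type) -> fraction of test
    utterances recognized correctly. Returns the most reliable recognizer
    for each utterance type."""
    best = {}
    for (recognizer, utt_type), accuracy in test_results.items():
        current = best.get(utt_type)
        if current is None or accuracy > current[1]:
            best[utt_type] = (recognizer, accuracy)
    return {t: r for t, (r, _) in best.items()}

# Illustrative measured accuracies (assumed numbers):
results = {
    ("A", "digits"): 0.97, ("B", "digits"): 0.91, ("C", "digits"): 0.85,
    ("A", "letters"): 0.88, ("B", "letters"): 0.95, ("C", "letters"): 0.80,
    ("A", "limited_vocab"): 0.90, ("B", "limited_vocab"): 0.89, ("C", "limited_vocab"): 0.98,
}
print(assign_recognizers(results))
# {'digits': 'A', 'letters': 'B', 'limited_vocab': 'C'}
```

A real assignment step could also weight the cost and speed factors mentioned above instead of accuracy alone.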
  • For example, recognizer A may be assigned to recognize digits, recognizer B may be assigned to recognize letters of the alphabet, and recognizer C may be assigned to recognize a limited vocabulary.
  • Processor 110 is programmed to select from a plurality of prompts stored in database 108.
  • A prompt can be in the form of recorded or computer generated voice or in the form of a textual prompt if the customer has a local display that is connected to speech recognition system 100.
  • The selected prompt is presented to the user to obtain a spoken utterance.
  • Associated with each prompt is a response identifier which indicates the speech recognizer to be used for recognizing the spoken utterance. If a spoken utterance is supplied by the user, processor 110 along with switch 104 directs the response to the speech recognizer indicated by the response identifier.
  • A response identifier can indicate more than one speech recognizer if it is anticipated that the spoken utterance will contain multiple types of input utterances.
  • Each spoken utterance from the user initiates the next prompt in a flexible schedule of prompts to retrieve user information.
  • The flexible schedule of prompts for recognition system 100 is implemented in an airline reservation and information system in the manner now described.
  • A user dialing a predetermined number associated with the airline reservation and information system is connected to system 100 via network 200.
  • Processor 110 instructs the user with a stored prompt from database 108 requesting the user to speak his account number.
  • For example, the prompt could be “What is your account number?”
  • The account number can consist of all numbers or a combination of alpha-numeric characters.
  • Associated with the stored prompt is the response identifier.
  • The response identifier assigns at least one speech recognizer to a stored prompt in anticipation of the spoken utterance. Therefore, if system 100 anticipates receiving alpha-numeric characters as the spoken utterance for the user's account number, the response identifier will assign speech recognizer A to recognize digits and speech recognizer B to recognize letters of the alphabet.
  • Once the user supplies a spoken utterance in response to the request for his account number, a speech signal is generated from the user's utterance.
  • Processor 110 processes the speech signal and forwards the user's response to the speech recognizer associated with the response identifier.
  • The assigned speech recognizer decodes the received speech signal.
  • In one embodiment, the assigned speech recognizer utilizes a Hidden Markov Model to generate one word choice or to generate a list of “N-best” choices. A probability that each word choice is the correct word choice can also be generated.
  • In another embodiment, the assigned speech recognizer generates one word choice, and then optionally executes post-processing routines to generate multiple choices and associated probabilities for each choice. Further, other known methods can be used by the assigned speech recognizer to generate one word choice or multiple word choices from the received speech signal.
  • In the case where multiple speech recognizers have been assigned, each speech recognizer provides a word choice and an assigned probability. These probabilities can be compared to determine which is higher, thus indicating the recognizer that best recognized the user's utterance.
  • In another embodiment, an output signal from each recognizer is supplied to a comparator. The comparator compares the output signals to determine if a match between the output signals occurs. If a match occurs, the matched output signal is used to generate a word choice. Alternatively, if a match does not occur, each output signal is used to generate a different word choice with an assigned probability.
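The comparator behavior just described can be sketched in a few lines. The `(word, probability)` pair representation and the function name are assumptions for illustration; the patent does not specify the comparator's data format.

```python
# Hedged sketch of the comparator: if all assigned recognizers produce the
# same output, use the matched choice; otherwise keep each distinct choice
# with its assigned probability.

def compare_outputs(outputs):
    """outputs: one (word_choice, probability) pair per assigned recognizer."""
    distinct = {word for word, _ in outputs}
    if len(distinct) == 1:                         # a match occurred
        word = next(iter(distinct))
        return [(word, max(p for _, p in outputs))]
    return list(outputs)                           # no match: keep every choice

print(compare_outputs([("CBA123", 0.9), ("CBA123", 0.8)]))  # [('CBA123', 0.9)]
print(compare_outputs([("CBA123", 0.9), ("ZBA123", 0.4)]))  # both choices kept
```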
  • After the word choices and the probabilities have been assigned, the word choice with the highest probability is selected using known methods. Once a word choice is selected, known methods can also be used to confirm that the selected word choice is the correct word choice. For example, using another stored prompt from database 108, system 100 can ask the user “Is this your correct account number? Say Yes or No.” The response from the user will confirm whether the selected word choice was correct.
  • Alternatively, a predetermined probability threshold could be used to filter out word choices. In this case, each word choice having an assigned probability below the threshold would be discarded, and only the remaining word choices would be presented to the user for verification.
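The threshold filtering step can be sketched as follows, using the account-number choices and probabilities from Table 2 below; the threshold value of .005 is an assumed example, not a figure from the patent.

```python
# Sketch of probability-threshold filtering: discard any word choice whose
# assigned probability falls below the predetermined threshold.

def filter_by_threshold(choices, threshold):
    return [(word, p) for word, p in choices if p >= threshold]

nbest = [("CBA123", .01), ("ZBA123", .003), ("BBA023", .006), ("GCK123", .005)]
print(filter_by_threshold(nbest, 0.005))
# [('CBA123', 0.01), ('BBA023', 0.006), ('GCK123', 0.005)]  -- ZBA123 discarded
```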
  • Speech recognition system 100 could further request, “Which airport do you wish to depart from?” Associated with this prompt is a response identifier assigning speech recognizer C, which most accurately recognizes words in a limited vocabulary.
  • Table 1 below is an example of stored prompts and their associated response identifiers according to the present invention.

    TABLE 1
    Stored Prompts                                 Response Identifier
    What is your account number?                   Recognizer A & Recognizer B
    Which airport do you wish to depart from?      Recognizer C
    What is your telephone number?                 Recognizer A
    Is this information correct? Say Yes or No.    Recognizer C
    Please spell your last name.                   Recognizer B
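The prompt-to-response-identifier association of Table 1 can be sketched as a simple lookup that drives the recognizer switch. The dictionary keys follow Table 1; the recognizer callables and the routing function are illustrative assumptions.

```python
# Sketch: each prompt's response identifier names the recognizer(s) the
# speech signal should be directed to.

RESPONSE_IDENTIFIERS = {
    "What is your account number?": ("A", "B"),
    "Which airport do you wish to depart from?": ("C",),
    "What is your telephone number?": ("A",),
    "Is this information correct? Say Yes or No.": ("C",),
    "Please spell your last name.": ("B",),
}

def route_response(prompt, speech_signal, recognizers):
    """Direct the speech signal to every recognizer named by the prompt's
    response identifier; collect each recognizer's (choice, probability)."""
    return [recognizers[name](speech_signal) for name in RESPONSE_IDENTIFIERS[prompt]]

# Usage with stand-in recognizers (real ones would decode the speech signal):
stubs = {"A": lambda s: ("123", 0.9), "B": lambda s: ("CBA", 0.8), "C": lambda s: ("yes", 0.99)}
print(route_response("What is your account number?", b"...", stubs))
# [('123', 0.9), ('CBA', 0.8)]
```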
  • FIG. 2 is a flowchart illustrating some of the steps performed by one embodiment of speech recognition system 100 when a user dials the telephone number associated with speech recognition system 100 from telephone 300 .
  • The call is connected to network 200 or to other call processing hardware in the manner previously described.
  • Processor 110 selects a prompt stored in database 108 to present to the user. Associated with the stored prompt is a response identifier.
  • The response identifier assigns at least one speech recognizer which performs best at recognizing a particular type of human utterance.
  • In this example, the prompt is a request for the user's account number.
  • Processor 110 receives a speech signal generated by the user's utterance in response to the request for the user's account number. For example, the account number “CBA123” will be spoken by the user if this is the account number assigned to the user.
  • Speech recognizer A and speech recognizer B receive the speech signal representing “CBA123” from processor 110.
  • Each speech recognizer generates a plurality of choices of possible user account numbers based on the received speech signal. These choices are generated using the speech recognition hardware and software previously discussed. Associated with each choice is a probability as to whether that choice is the correct account number. Table 2 below is an example of some of the choices and associated probabilities that might be generated in response to the user's utterance of “CBA123”. The list of choices in Table 2 can include the choices for both recognizer A and recognizer B, or a separate list of choices and assigned probabilities can be created for each recognizer.

    TABLE 2
    Account Number    Probability
    CBA123            .01
    ZBA123            .003
    BBA023            .006
    GCK123            .005
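Selecting from the choices of Table 2 reduces to picking the entry with the highest associated probability; a minimal sketch, using the table's values:

```python
# Sketch: pick the account-number choice with the highest probability.

def select_best(choices):
    return max(choices, key=lambda c: c[1])[0]

table2 = [("CBA123", .01), ("ZBA123", .003), ("BBA023", .006), ("GCK123", .005)]
print(select_best(table2))  # prints CBA123
```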
  • The user account number with the highest probability is presented to the user (e.g., “CBA123” in the example of Table 2).
  • The user may be asked whether the presented user account number is the correct account number.
  • For example, a prompt with a response identifier assigning Recognizer C can request the user to verify that “CBA123” is the correct account number by asking the user, “Is your account number CBA123? Say Yes or No.”
  • In step 305, processor 110 determines whether the presented user account number is the correct account number. If it is, then speech recognition system 100 has successfully recognized the correct user account number.
  • If it is not, then in step 306 the account number with the next highest probability (i.e., “ZBA123” in the example of Table 2) may be presented to the user. Steps 305 and 306 are repeated until the correct account number is successfully recognized.
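The repeated present-and-verify loop of steps 305 and 306 can be sketched as iterating through the candidates in descending probability order until the user confirms one. The `user_confirms` callback is a hypothetical stand-in for the yes/no verification prompt.

```python
# Sketch of steps 305/306: present each candidate in descending probability
# order; stop when the user confirms one, or give up when none is accepted.

def confirm_account_number(choices, user_confirms):
    for choice, _ in sorted(choices, key=lambda c: c[1], reverse=True):
        if user_confirms(choice):
            return choice
    return None  # every candidate was rejected

table2 = [("CBA123", .01), ("ZBA123", .003), ("BBA023", .006), ("GCK123", .005)]
print(confirm_account_number(table2, lambda c: c == "BBA023"))  # prints BBA023
```

In a deployed system the callback would route the "Say Yes or No" response through Recognizer C, as Table 1 suggests.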
  • The present invention utilizes multiple speech recognizers to increase the accuracy of recognizing spoken utterances from a user.
  • A prompt, used to solicit a response from a user in the form of a spoken utterance, is assigned to a speech recognizer designed to best recognize a particular type of spoken utterance.
  • This allows speech recognition to proceed more quickly and accurately and with less disruption to the user.

Abstract

A speech recognition system recognizes spoken utterances received as a speech signal from a user. A prompt for requesting a spoken utterance from the user is assigned a response identifier which indicates at least one of a plurality of speech recognizers to best recognize a particular type of spoken utterance. The system includes a processor for receiving the speech signal from the user in response to the prompt. The processor also directs the speech signal to the at least one speech recognizer indicated by the response identifier. The speech recognizer generates a plurality of spoken utterance choices from the speech signal and a probability associated with each of the plurality of choices. At least one of the spoken utterance choices is selected based on the associated probabilities.

Description

    FIELD OF THE INVENTION
  • The present invention is directed to a speech recognition system. More particularly, the present invention is directed to a speech recognition system that uses multiple speech recognizers to increase its accuracy. [0001]
  • BACKGROUND OF THE INVENTION
  • Speech recognition systems are increasingly being used to translate human spoken words or utterances into their written equivalent and meaning. Speech recognition systems can avoid the need for spoken utterances to be manually entered into a computer, or to be recognized by a human. Therefore, speech recognition systems are desirable for many businesses because these systems can minimize the number of human operators needed to handle calls from customers. [0002]
  • One drawback to speech recognition systems, however, is that they can provide inaccurate results. An exact correspondence between the spoken utterance and an output recognized by a speech recognizer is difficult to attain due to, for example, the deterioration of speech signals that routinely occurs over conventional telephone lines and algorithmic limitations. Such deterioration present in the speech signals may cause a speech recognizer to produce a recognized output that does not correspond to the spoken utterance. Because of limitations introduced into the speech signal by the telephone lines, the speech recognizer may confuse similar sounding letters and numbers. Thus, a speech recognizer may confuse the letter “B” with the number “3” or the letter “C”. For example, given that a user utters the numbers “123” into a telephone, the speech recognizer may produce “12B” as the output. [0003]
  • Additionally, various speech recognizers have their own strengths and weaknesses with respect to accurately identifying spoken utterances. For example, one speech recognizer may perform better at recognizing a sequence of alpha-numeric characters while other speech recognizers perform better at recognizing proper nouns such as for examples, names of places, people and things. Also, some speech recognizers can execute certain tasks faster or require less processing time than other speech recognizers. [0004]
  • If such speech recognition systems are utilized, it is important that the speaker communicate accurate information to the system with maximum machine assistance and minimum user intervention. For example, it is desirable that the user be prompted as few times as possible to repeat questionable information or to supply additional information for the speech recognition system to reach the correct result. [0005]
  • Based on the foregoing there is a need for a speech recognition system that has an increased recognition accuracy without the necessity of relying on human operator intervention or requiring additional input from the user. [0006]
  • SUMMARY OF THE INVENTION
  • One embodiment of the present invention is a speech recognition system for recognizing spoken utterances received as a speech signal from a user. A prompt for requesting a spoken utterance from the user is assigned a response identifier which indicates at least one of a plurality of speech recognizers to best recognize a particular type of spoken utterance. The system includes a processor for receiving the speech signal from the user in response to the prompt. The processor also directs the speech signal to the at least one speech recognizer indicated by the response identifier. The speech recognizer generates a plurality of spoken utterance choices from the speech signal and a probability associated with each of the plurality of choices. At least one of the spoken utterance choices is selected based on the associated probabilities. [0007]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram illustrating a speech recognition system in accordance with one embodiment of the present invention. [0008]
  • FIG. 2 is a flowchart illustrating the steps performed by a speech recognition system in accordance with one embodiment of the present invention. [0009]
  • DETAILED DESCRIPTION
  • One embodiment of the present invention is a speech recognition system that uses multiple speech recognizers to enhance its recognition accuracy. FIG. 1 is a block diagram illustrating a speech recognition system 100 in accordance with one embodiment of the present invention. Speech recognition system 100 includes an input/output (I/O) interface 101. I/O interface 101 interfaces system 100 to a user. I/O interface 101 can be a remote interface, such as a telephone line connection as shown in FIG. 1. I/O interface 101 can also be a local interface, for example, a microphone and a speaker. [0010]
  • In the embodiment shown in FIG. 1, I/O interface 101 is coupled through a network 200 to a telephone 300. Telephone 300 enables a user of speech recognition system 100 to access the system by participating in a telephone call between telephone 300 and system 100. The user can transmit a speech signal to system 100 through telephone 300 as well as receive signals from system 100. Network 200 can be any network that enables the user at telephone 300 to dial a telephone number associated with system 100. For example, network 200 can be the Public Switched Telephone Network (“PSTN”), a local area network, the Internet, or an intranet. [0011]
  • Speech recognition system 100 also includes an analog-to-digital (“A/D”) converter 103 coupled to I/O interface 101 and a processor 110. A/D converter 103 converts analog speech signals from spoken utterances received from I/O interface 101 into digital signals. Alternatively, a digital signal may be sent through a digital network, as, for example, where A/D converter 103 is located locally. These digital signals are then received by processor 110. Processor 110 includes a memory 111 and a central processing unit (CPU) 11 that executes a supervisory operating system to coordinate the interactions between some of the different system elements. CPU 11 also executes application software in response to information supplied by the user. [0012]
  • Memory 111 may be a combination of random access memory (RAM), read only memory (ROM) and/or erasable programmable read only memory (EPROM) components. One or more system components may directly access memory 111, without the use of processor 110. In addition, one or more of the system components may incorporate its own local memory. [0013]
  • Speech recognition system 100 further includes a recognizer switch 104, a data storage device 108 and speech recognizers A, B and C, labeled 105, 106 and 107, respectively. Each speech recognizer includes speech processing software and hardware that receives a speech signal generated from a human utterance and generates one or more choices of words that represent the utterance. For each choice of words, a probability that the choice of words is the correct choice may also be generated by the speech recognizers. As illustrated, three speech recognizers are shown, but many such recognizers can be included in system 100. Database 108 stores prerecorded prompts used to communicate with the user by soliciting responses from the user. For example, database 108 may store prompts which request a user's account number or request a method of payment for a particular transaction. [0014]
  • In one embodiment of the present invention, the speech recognizers utilize a Hidden Markov Model to generate a list of “N-best” choices for the solicited response. An example of this method of speech recognition is disclosed in U.S. Pat. No. 5,241,619, herein incorporated by reference. In another embodiment, each speech recognizer executes post-processing routines to generate multiple choices and associated probabilities for each choice. Further, other known methods can be used by the speech recognizers to generate multiple choices and probabilities assigned to these choices. [0015]
  • Speech recognizers 105-107 preferably have different capabilities or effectiveness in handling specific recognition situations. That is, the recognizers may provide varying degrees of reliability under different circumstances. For example, one speech recognizer may provide the most reliable recognition of numbers or digits, another may provide high reliability for recognition of letters of the alphabet, and still another may provide high reliability in a specific limited vocabulary. These capabilities may be determined by testing each of the recognizers before implementation into recognition system 100. This testing process may include supplying each speech recognizer with an identical spoken utterance. A plurality of these identical spoken utterances may be supplied to each speech recognizer. The plurality of identical spoken utterances includes many types of spoken utterances such as digits, letters of the alphabet, limited vocabulary, etc. Each recognizer then processes the spoken utterances and returns an output signal. The output signals of the speech recognizers are compared with correct signals which represent the spoken utterances to determine which speech recognizer or speech recognizers correctly recognized the spoken utterances. After employing multiple types of spoken utterances such as digits, letters of the alphabet, limited vocabulary, etc., the recognizer that provides the highest reliability for recognizing a particular type of spoken utterance is assigned to recognize that particular type of spoken utterance. Additionally, there are other factors taken into consideration when assigning a speech recognizer to recognize a particular type of spoken utterance. These factors may include the cost associated with a speech recognizer recognizing a particular type of spoken utterance and the speed at which the speech recognizer can recognize the particular type of spoken utterance. [0016]
According to an embodiment of the present invention, recognizer A may be assigned to recognize digits, recognizer B may be assigned to recognize letters of the alphabet and recognizer C may be assigned to recognize limited vocabulary.
  • Continuing with the description of an embodiment of the present invention, processor 110 is programmed to select from a plurality of prompts stored in database 108. A prompt can be in the form of recorded or computer generated voice or in the form of a textual prompt if the customer has a local display that is connected to speech recognition system 100. The selected prompt is presented to the user to obtain a spoken utterance. Associated with each prompt is a response identifier which indicates the speech recognizer to be used for recognizing the spoken utterance. If a spoken utterance is supplied by the user, processor 110 along with switch 104 directs the response to the speech recognizer indicated by the response identifier. According to the present invention, a response identifier can indicate more than one speech recognizer if it is anticipated that the spoken utterance will contain multiple types of input utterances. [0017]
  • Each spoken utterance from the user initiates the next prompt in a flexible schedule of prompts to retrieve user information. For the purpose of this application, the flexible schedule of prompts for recognition system 100 is implemented in an airline reservation and information system in the manner now described. [0018]
  • A user dialing a predetermined number associated with the airline reservation and information system is connected to system 100 via network 200. Processor 110 instructs the user with a stored prompt from database 108 requesting the user to speak his account number. For example, in the airline reservation embodiment, the prompt could be “What is your account number?” The account number can consist of all numbers or a combination of alpha-numeric characters. Associated with the stored prompt is the response identifier. The response identifier assigns at least one speech recognizer to a stored prompt in anticipation of the spoken utterance. Therefore, if system 100 anticipates receiving alpha-numeric characters as the spoken utterance for the user's account number, the response identifier will assign speech recognizer A to recognize digits and speech recognizer B to recognize letters of the alphabet. [0019]
  • Once the user supplies recognition system 100 with a spoken utterance in response to the request for his account number, a speech signal is generated from the user's utterance. Processor 110 processes the speech signal and forwards it to the speech recognizer associated with the response identifier. The assigned speech recognizer decodes the received speech signal. In one embodiment, the assigned speech recognizer utilizes a hidden Markov Model to generate one word choice or a list of “N-best” choices. A probability that each word choice is the correct word choice can also be generated. In another embodiment, the assigned speech recognizer generates one word choice, and then optionally executes post-processing routines to generate multiple choices and an associated probability for each choice. Further, other known methods can be used by the assigned speech recognizer to generate one or more word choices from the received speech signal. [0020]
  • In the case where multiple speech recognizers have been assigned, each speech recognizer provides a word choice and an assigned probability. These probabilities can be compared to determine which is higher, thus indicating the recognizer that best recognized the user's utterance. In another embodiment, an output signal from each recognizer is supplied to a comparator. The comparator compares the output signals to determine if a match between the output signals occurs. If a match occurs, the matched output signal is used to generate a word choice. Alternatively, if a match does not occur, each output signal is used to generate a different word choice with an assigned probability. [0021]
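The comparator behavior above can be illustrated with a short sketch. The function name and the decision to keep the higher of the two probabilities on a match are assumptions for the example; the patent only specifies that a match yields a single word choice and a mismatch yields one choice per recognizer.

```python
def compare_outputs(output_a, output_b):
    """Comparator over two recognizers' (word, probability) outputs:
    if the word choices match, keep the matched output as the single
    word choice (higher probability retained here, by assumption);
    otherwise keep both choices with their assigned probabilities."""
    word_a, prob_a = output_a
    word_b, prob_b = output_b
    if word_a == word_b:
        return [(word_a, max(prob_a, prob_b))]
    return [output_a, output_b]
```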
  • After the word choices and probabilities have been generated, one of the word choices is selected using known methods of selecting the word choice with the highest probability. Once a word choice is selected, known methods can be used to confirm that it is the correct word choice. For example, using another stored prompt from database 108, system 100 can ask the user, “Is this your correct account number? Say Yes or No.” The response from the user confirms whether the selected word choice was correct. [0022]
  • Alternatively, a predetermined probability threshold could be used to filter out word choices falling below a predetermined probability value. In this case, each word choice having an assigned probability below the predetermined probability value would be discarded and only word choices above the predetermined probability threshold would be presented to the user for verification. [0023]
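The threshold filtering alternative can be sketched as follows; the function name and the use of a greater-than-or-equal comparison at the boundary are assumptions for the example.

```python
def filter_choices(choices, threshold):
    """Discard word choices whose assigned probability falls below the
    predetermined probability threshold; only the surviving choices
    would be presented to the user for verification."""
    return [(word, prob) for word, prob in choices if prob >= threshold]
```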
  • In response to a verified spoken utterance, speech recognition system 100 could further request, “Which airport do you wish to depart from?” Associated with this prompt is a response identifier assigning speech recognizer C, which most accurately recognizes words in a limited vocabulary. For the purposes of the airline reservation and information system example of the present invention, a list of the names of all the airports would be stored by speech recognizer C. Table 1 below is an example of stored prompts and associated response identifiers according to the present invention. [0024]
    TABLE 1
    Stored Prompts                                 Response Identifier
    What is your account number?                   Recognizer A & Recognizer B
    Which airport do you wish to depart from?      Recognizer C
    What is your telephone number?                 Recognizer A
    Is this information correct? Say Yes or No.    Recognizer C
    Please spell your last name.                   Recognizer B
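Table 1 amounts to a lookup from each stored prompt to its response identifier. A minimal sketch, assuming a plain dictionary representation (the data structure is not specified by the patent):

```python
# Table 1 as a lookup: stored prompt -> assigned recognizer(s)
# (the response identifier). The dict form is illustrative only.
RESPONSE_IDENTIFIERS = {
    "What is your account number?": ["Recognizer A", "Recognizer B"],
    "Which airport do you wish to depart from?": ["Recognizer C"],
    "What is your telephone number?": ["Recognizer A"],
    "Is this information correct? Say Yes or No.": ["Recognizer C"],
    "Please spell your last name.": ["Recognizer B"],
}
```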
  • FIG. 2 is a flowchart illustrating some of the steps performed by one embodiment of speech recognition system 100 when a user dials the telephone number associated with speech recognition system 100 from telephone 300. The call is connected to network 200 or to other call processing hardware in the manner previously described. [0025]
  • At step 300, processor 110 selects a prompt stored in database 108 to present to the user. Associated with the stored prompt is a response identifier. The response identifier assigns at least one speech recognizer which performs best at recognizing a particular type of human utterance. In step 300, the prompt is a request for the user's account number. [0026]
  • At step 301, processor 110 receives a speech signal generated by the user's utterance in response to the request for the user's account number. For example, the account number “CBA123” will be spoken by the user if this is the account number assigned to the user. [0027]
  • At step 302, speech recognizer A and speech recognizer B receive the speech signal representing “CBA123” from processor 110. At step 303, each speech recognizer generates a plurality of choices of possible user account numbers based on the received speech signal. These choices are generated using the speech recognition hardware and software previously discussed. Associated with each choice is a probability that the choice is the correct account number. Table 2 below is an example of some of the choices and associated probabilities that might be generated in response to the user's utterance of “CBA123”. The list of choices in Table 2 can include the choices for both recognizer A and recognizer B, or a separate list of choices and assigned probabilities can be created for each recognizer. [0028]
    TABLE 2
    Account Number    Probability
    CBA123            .01
    ZBA123            .003
    BBA023            .006
    GCK123            .005
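Step 304's selection from a Table-2-style N-best list can be sketched directly; the list literal and the `highest_probability` helper are illustrative names, not from the patent.

```python
# Table 2 as an N-best list of account-number hypotheses with
# assigned probabilities (values taken from the example above).
N_BEST = [("CBA123", 0.01), ("ZBA123", 0.003), ("BBA023", 0.006), ("GCK123", 0.005)]

def highest_probability(choices):
    """Pick the word choice with the highest assigned probability,
    as done at step 304 before presenting it for verification."""
    return max(choices, key=lambda choice: choice[1])
```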
  • At step 304, the user account number with the highest probability is presented to the user (e.g., “CBA123” in the example of Table 2). In addition, the user may be asked whether the presented user account number is the correct account number. For example, a prompt with a response identifier assigning Recognizer C can request the user to verify that “CBA123” is the correct account number by asking, “Is your account number CBA123? Say Yes or No.” [0029]
  • At step 305, based on the response from the user at step 304, processor 110 determines whether the presented user account number is the correct account number. If it is, then speech recognition system 100 has successfully recognized the correct user account number. [0030]
  • However, if it is not the correct account number, at step 306 the account number with the next highest probability (i.e., “BBA023” in the example of Table 2) may be presented to the user. Steps 305 and 306 are repeated until the correct account number is successfully recognized. [0031]
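The verification loop of steps 304 through 306 can be sketched as follows. The `verify_in_turn` name is hypothetical, and the `is_correct` callback stands in for the yes/no confirmation prompt handled by Recognizer C.

```python
def verify_in_turn(choices, is_correct):
    """Present account-number choices to the user in order of
    decreasing probability until one is confirmed (steps 304-306).
    is_correct models the user's yes/no answer to the verification prompt."""
    for word, _prob in sorted(choices, key=lambda choice: choice[1], reverse=True):
        if is_correct(word):
            return word
    return None  # no choice confirmed; the system would re-prompt
```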
  • As disclosed, the present invention utilizes multiple speech recognizers to increase the accuracy of recognizing spoken utterances from a user. A prompt, used to solicit a response from a user in the form of a spoken utterance, is assigned to a speech recognizer designed to best recognize a particular type of spoken utterance. This allows speech recognition to proceed more quickly, more accurately, and with less disruption to the user. Several embodiments of the present invention are specifically illustrated and/or described herein. However, it will be appreciated that modifications and variations of the present invention are covered by the above teachings and are within the purview of the appended claims without departing from the spirit and intended scope of the invention. [0032]

Claims (23)

What is claimed is:
1. A method for recognizing a spoken utterance of a user comprising the steps of:
assigning at least one of a plurality of speech recognizers to a prompt for requesting a spoken utterance;
presenting a user with said assigned prompt;
receiving said spoken utterance from said user in response to said prompt and generating a speech signal from said received spoken utterance; and
directing said speech signal to the at least one assigned speech recognizer.
2. The method of claim 1, further comprising the steps of:
generating a plurality of spoken utterance choices from the speech signal and a probability associated with each of said plurality of choices; and
selecting the spoken utterance based on the associated probabilities.
3. The method of claim 2, wherein the step of selecting the spoken utterance comprises the steps of:
presenting a highest probability spoken utterance choice to the user; and
determining whether the presented highest probability spoken utterance choice is correct.
4. The method of claim 3, wherein the step of selecting the spoken utterance further comprises the steps of:
presenting a next highest probability spoken utterance choice to the user;
determining whether the presented next highest probability spoken utterance choice is correct; and
repeating the presenting and determining steps until a presented spoken utterance choice is determined to be correct.
5. The method of claim 2, wherein the step of generating a plurality of spoken utterance choices comprises using a hidden Markov Model to generate N-best choices.
6. The method of claim 2, wherein the step of generating a plurality of spoken utterance choices comprises the steps of:
recognizing a first spoken utterance; and
post-processing the first spoken utterance to generate the plurality of spoken utterance choices.
7. The method of claim 1, wherein the step of assigning comprises assigning at least one speech recognizer that provides a highest reliability for recognizing a particular type of spoken utterance.
8. The method of claim 7, wherein said particular type of spoken utterance includes numbers.
9. The method of claim 7, wherein said particular type of spoken utterance includes letters of the alphabet.
10. The method of claim 7, wherein said particular type of spoken utterance includes a limited vocabulary.
11. A speech recognition system for recognizing a spoken utterance comprising:
a processor for presenting a user with a prompt; and
a plurality of speech recognizers coupled to said processor, wherein each speech recognizer generates a plurality of spoken utterance choices from a speech signal and a probability for each spoken utterance choice;
wherein said prompt is assigned at least one speech recognizer that provides a highest reliability for recognizing a particular type of spoken utterance.
12. The speech recognition system of claim 11, further comprising a database coupled to said processor, said database having stored therein a plurality of prompts assigned to at least one speech recognizer that provides a highest reliability for recognizing a particular type of spoken utterance.
13. The speech recognition system of claim 11, further comprising an input/output interface to receive a speech signal from a user.
14. The speech recognition system of claim 13, wherein said input/output interface is coupled through a network and a telephone.
15. The speech recognition system of claim 13, wherein said input/output interface comprises a microphone.
16. The speech recognition system of claim 14, wherein said network is the Internet.
17. The speech recognition system of claim 14, wherein said network is a public switched telephone network.
18. The speech recognition system of claim 14, wherein each speech recognizer generates a plurality of spoken utterance choices using a hidden Markov Model to generate N-best choices.
19. The speech recognition system of claim 11, wherein each prompt is assigned at least one speech recognizer that provides a highest reliability for recognizing a particular type of spoken utterance.
20. The speech recognition system of claim 19, wherein said particular type of spoken utterance includes numbers.
21. The speech recognition system of claim 19, wherein said particular type of spoken utterance includes letters of the alphabet.
22. The speech recognition system of claim 19, wherein said particular type of spoken utterance includes a limited vocabulary.
23. A method for recognizing a plurality of utterances spoken by a user, the method comprising the steps of:
generating a first prompt requesting a first utterance;
receiving said first utterance;
generating a first speech signal based on said received first utterance;
directing said first speech signal to a first speech recognizer assigned to recognize speech received in response to said first prompt;
generating a second prompt requesting a second utterance;
receiving said second utterance;
generating a second speech signal based on said received second utterance; and
directing said second speech signal to a second speech recognizer assigned to recognize speech received in response to said second prompt.
US09/221,582 1998-12-29 1998-12-29 Distributed recognition system having multiple prompt-specific and response-specific speech recognizers Expired - Lifetime US6377922B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US09/221,582 US6377922B2 (en) 1998-12-29 1998-12-29 Distributed recognition system having multiple prompt-specific and response-specific speech recognizers


Publications (2)

Publication Number Publication Date
US20010016813A1 true US20010016813A1 (en) 2001-08-23
US6377922B2 US6377922B2 (en) 2002-04-23

Family

ID=22828400

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/221,582 Expired - Lifetime US6377922B2 (en) 1998-12-29 1998-12-29 Distributed recognition system having multiple prompt-specific and response-specific speech recognizers

Country Status (1)

Country Link
US (1) US6377922B2 (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030083883A1 (en) * 2001-10-31 2003-05-01 James Cyr Distributed speech recognition system
US20030154077A1 (en) * 2002-02-13 2003-08-14 International Business Machines Corporation Voice command processing system and computer therefor, and voice command processing method
US6615172B1 (en) 1999-11-12 2003-09-02 Phoenix Solutions, Inc. Intelligent query engine for processing voice based queries
US6633846B1 (en) 1999-11-12 2003-10-14 Phoenix Solutions, Inc. Distributed realtime speech recognition system
US20040034527A1 (en) * 2002-02-23 2004-02-19 Marcus Hennecke Speech recognition system
US20040054569A1 (en) * 2002-07-31 2004-03-18 Alvaro Pombo Contextual computing system
US20040249635A1 (en) * 1999-11-12 2004-12-09 Bennett Ian M. Method for processing speech signal features for streaming transport
US20050119897A1 (en) * 1999-11-12 2005-06-02 Bennett Ian M. Multi-language speech recognition system
US20060080397A1 (en) * 2004-10-08 2006-04-13 Marc Chene Content management across shared, mobile file systems
US7050977B1 (en) 1999-11-12 2006-05-23 Phoenix Solutions, Inc. Speech-enabled server for internet website and method
US20060161646A1 (en) * 2005-01-19 2006-07-20 Marc Chene Policy-driven mobile forms applications
US20060271568A1 (en) * 2005-05-25 2006-11-30 Experian Marketing Solutions, Inc. Distributed and interactive database architecture for parallel and asynchronous data processing of complex data and for real-time query processing
US20080065390A1 (en) * 2006-09-12 2008-03-13 Soonthorn Ativanichayaphong Dynamically Generating a Vocal Help Prompt in a Multimodal Application
US7725321B2 (en) 1999-11-12 2010-05-25 Phoenix Solutions, Inc. Speech based query system using semantic decoding
US20130289995A1 (en) * 2010-04-27 2013-10-31 Zte Corporation Method and Device for Voice Controlling
WO2015111850A1 (en) * 2014-01-22 2015-07-30 Samsung Electronics Co., Ltd. Interactive system, display apparatus, and controlling method thereof
US10147428B1 (en) * 2018-05-30 2018-12-04 Green Key Technologies Llc Computer systems exhibiting improved computer speed and transcription accuracy of automatic speech transcription (AST) based on a multiple speech-to-text engines and methods of use thereof

Families Citing this family (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1168737B8 (en) * 2000-06-30 2010-03-10 Alcatel Lucent Telecommunication system, and switch, and server, and method
EP1304682A1 (en) * 2000-07-05 2003-04-23 Alcatel Distributed speech recognition system
JP2002116796A (en) * 2000-10-11 2002-04-19 Canon Inc Voice processor and method for voice processing and storage medium
US7203651B2 (en) * 2000-12-07 2007-04-10 Art-Advanced Recognition Technologies, Ltd. Voice control system with multiple voice recognition engines
GB0030079D0 (en) * 2000-12-09 2001-01-24 Hewlett Packard Co Voice exchanges with voice service systems
US6836758B2 (en) * 2001-01-09 2004-12-28 Qualcomm Incorporated System and method for hybrid voice recognition
US6701293B2 (en) * 2001-06-13 2004-03-02 Intel Corporation Combining N-best lists from multiple speech recognizers
US6898567B2 (en) * 2001-12-29 2005-05-24 Motorola, Inc. Method and apparatus for multi-level distributed speech recognition
WO2003084196A1 (en) * 2002-03-28 2003-10-09 Martin Dunsmuir Closed-loop command and response system for automatic communications between interacting computer systems over an audio communications channel
US8239197B2 (en) * 2002-03-28 2012-08-07 Intellisist, Inc. Efficient conversion of voice messages into text
US6834265B2 (en) 2002-12-13 2004-12-21 Motorola, Inc. Method and apparatus for selective speech recognition
US7197331B2 (en) * 2002-12-30 2007-03-27 Motorola, Inc. Method and apparatus for selective distributed speech recognition
US7076428B2 (en) * 2002-12-30 2006-07-11 Motorola, Inc. Method and apparatus for selective distributed speech recognition
US7542907B2 (en) * 2003-12-19 2009-06-02 International Business Machines Corporation Biasing a speech recognizer based on prompt context
KR100695127B1 (en) 2004-10-08 2007-03-14 삼성전자주식회사 Multi-Layered speech recognition apparatus and method
US7865364B2 (en) * 2005-05-05 2011-01-04 Nuance Communications, Inc. Avoiding repeated misunderstandings in spoken dialog system
US7640158B2 (en) 2005-11-08 2009-12-29 Multimodal Technologies, Inc. Automatic detection and application of editing patterns in draft documents
US8364481B2 (en) 2008-07-02 2013-01-29 Google Inc. Speech recognition with parallel recognition tasks
US7933777B2 (en) * 2008-08-29 2011-04-26 Multimodal Technologies, Inc. Hybrid speech recognition
US20100125450A1 (en) * 2008-10-27 2010-05-20 Spheris Inc. Synchronized transcription rules handling
US20110184736A1 (en) * 2010-01-26 2011-07-28 Benjamin Slotznick Automated method of recognizing inputted information items and selecting information items
US8635066B2 (en) * 2010-04-14 2014-01-21 T-Mobile Usa, Inc. Camera-assisted noise cancellation and speech recognition
CA2839266C (en) * 2011-06-19 2022-05-03 Mmodal Ip Llc Document extension in dictation-based document generation workflow
US9679077B2 (en) 2012-06-29 2017-06-13 Mmodal Ip Llc Automated clinical evidence sheet workflow
US10156956B2 (en) 2012-08-13 2018-12-18 Mmodal Ip Llc Maintaining a discrete data representation that corresponds to information contained in free-form text
US10950329B2 (en) 2015-03-13 2021-03-16 Mmodal Ip Llc Hybrid human and computer-assisted coding workflow
US10217464B2 (en) * 2016-05-13 2019-02-26 Koninklijke Philips N.V. Vocabulary generation system
WO2018136417A1 (en) 2017-01-17 2018-07-26 Mmodal Ip Llc Methods and systems for manifestation and transmission of follow-up notifications
US11282596B2 (en) 2017-11-22 2022-03-22 3M Innovative Properties Company Automated code feedback system

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4896346A (en) * 1988-11-21 1990-01-23 American Telephone And Telegraph Company, At&T Bell Laboratories Password controlled switching system
US5241619A (en) * 1991-06-25 1993-08-31 Bolt Beranek And Newman Inc. Word dependent N-best search method
US5586171A (en) * 1994-07-07 1996-12-17 Bell Atlantic Network Services, Inc. Selection of a voice recognition data base responsive to video data
US5787455A (en) * 1995-12-28 1998-07-28 Motorola, Inc. Method and apparatus for storing corrected words with previous user-corrected recognition results to improve recognition
US5799065A (en) * 1996-05-06 1998-08-25 Matsushita Electric Industrial Co., Ltd. Call routing device employing continuous speech
US5915001A (en) * 1996-11-14 1999-06-22 Vois Corporation System and method for providing and using universally accessible voice and speech data files
US6061654A (en) * 1996-12-16 2000-05-09 At&T Corp. System and method of recognizing letters and numbers by either speech or touch tone recognition utilizing constrained confusion matrices
US5970446A (en) * 1997-11-25 1999-10-19 At&T Corp Selective noise/channel/coding models and recognizers for automatic speech recognition

Cited By (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7698131B2 (en) 1999-11-12 2010-04-13 Phoenix Solutions, Inc. Speech recognition system for client devices having differing computing capabilities
US7725307B2 (en) 1999-11-12 2010-05-25 Phoenix Solutions, Inc. Query engine for processing voice based queries including semantic decoding
US6615172B1 (en) 1999-11-12 2003-09-02 Phoenix Solutions, Inc. Intelligent query engine for processing voice based queries
US8229734B2 (en) 1999-11-12 2012-07-24 Phoenix Solutions, Inc. Semantic decoding of user queries
US8762152B2 (en) 1999-11-12 2014-06-24 Nuance Communications, Inc. Speech recognition system interactive agent
US7912702B2 (en) 1999-11-12 2011-03-22 Phoenix Solutions, Inc. Statistical language model trained with semantic variants
US20040249635A1 (en) * 1999-11-12 2004-12-09 Bennett Ian M. Method for processing speech signal features for streaming transport
US20050119897A1 (en) * 1999-11-12 2005-06-02 Bennett Ian M. Multi-language speech recognition system
US20050144004A1 (en) * 1999-11-12 2005-06-30 Bennett Ian M. Speech recognition system interactive agent
US20050144001A1 (en) * 1999-11-12 2005-06-30 Bennett Ian M. Speech recognition system trained with regional speech characteristics
US7873519B2 (en) 1999-11-12 2011-01-18 Phoenix Solutions, Inc. Natural language speech lattice containing semantic variants
US7050977B1 (en) 1999-11-12 2006-05-23 Phoenix Solutions, Inc. Speech-enabled server for internet website and method
US7831426B2 (en) 1999-11-12 2010-11-09 Phoenix Solutions, Inc. Network based interactive speech recognition system
US20060200353A1 (en) * 1999-11-12 2006-09-07 Bennett Ian M Distributed Internet Based Speech Recognition System With Natural Language Support
US7702508B2 (en) 1999-11-12 2010-04-20 Phoenix Solutions, Inc. System and method for natural language processing of query answers
US7729904B2 (en) 1999-11-12 2010-06-01 Phoenix Solutions, Inc. Partial speech processing device and method for use in distributed systems
US7725320B2 (en) 1999-11-12 2010-05-25 Phoenix Solutions, Inc. Internet based speech recognition system with dynamic grammars
US8352277B2 (en) 1999-11-12 2013-01-08 Phoenix Solutions, Inc. Method of interacting through speech with a web-connected server
US7725321B2 (en) 1999-11-12 2010-05-25 Phoenix Solutions, Inc. Speech based query system using semantic decoding
US7647225B2 (en) 1999-11-12 2010-01-12 Phoenix Solutions, Inc. Adjustable resource based speech recognition system
US7657424B2 (en) 1999-11-12 2010-02-02 Phoenix Solutions, Inc. System and method for processing sentence based queries
US7672841B2 (en) 1999-11-12 2010-03-02 Phoenix Solutions, Inc. Method for processing speech data for a distributed recognition system
US6633846B1 (en) 1999-11-12 2003-10-14 Phoenix Solutions, Inc. Distributed realtime speech recognition system
US9190063B2 (en) 1999-11-12 2015-11-17 Nuance Communications, Inc. Multi-language speech recognition system
US9076448B2 (en) 1999-11-12 2015-07-07 Nuance Communications, Inc. Distributed real time speech recognition system
US7146321B2 (en) * 2001-10-31 2006-12-05 Dictaphone Corporation Distributed speech recognition system
US20030083883A1 (en) * 2001-10-31 2003-05-01 James Cyr Distributed speech recognition system
US7299187B2 (en) * 2002-02-13 2007-11-20 International Business Machines Corporation Voice command processing system and computer therefor, and voice command processing method
US20030154077A1 (en) * 2002-02-13 2003-08-14 International Business Machines Corporation Voice command processing system and computer therefor, and voice command processing method
US7392189B2 (en) * 2002-02-23 2008-06-24 Harman Becker Automotive Systems Gmbh System for speech recognition with multi-part recognition
US20040034527A1 (en) * 2002-02-23 2004-02-19 Marcus Hennecke Speech recognition system
US20040054569A1 (en) * 2002-07-31 2004-03-18 Alvaro Pombo Contextual computing system
US20110153465A1 (en) * 2002-07-31 2011-06-23 Truecontext Corporation Contextual computing system
US20060080397A1 (en) * 2004-10-08 2006-04-13 Marc Chene Content management across shared, mobile file systems
US20060161646A1 (en) * 2005-01-19 2006-07-20 Marc Chene Policy-driven mobile forms applications
US8510329B2 (en) * 2005-05-25 2013-08-13 Experian Marketing Solutions, Inc. Distributed and interactive database architecture for parallel and asynchronous data processing of complex data and for real-time query processing
US20060271568A1 (en) * 2005-05-25 2006-11-30 Experian Marketing Solutions, Inc. Distributed and interactive database architecture for parallel and asynchronous data processing of complex data and for real-time query processing
US8086463B2 (en) * 2006-09-12 2011-12-27 Nuance Communications, Inc. Dynamically generating a vocal help prompt in a multimodal application
US20080065390A1 (en) * 2006-09-12 2008-03-13 Soonthorn Ativanichayaphong Dynamically Generating a Vocal Help Prompt in a Multimodal Application
US20130289995A1 (en) * 2010-04-27 2013-10-31 Zte Corporation Method and Device for Voice Controlling
US9236048B2 (en) * 2010-04-27 2016-01-12 Zte Corporation Method and device for voice controlling
WO2015111850A1 (en) * 2014-01-22 2015-07-30 Samsung Electronics Co., Ltd. Interactive system, display apparatus, and controlling method thereof
US9886952B2 (en) 2014-01-22 2018-02-06 Samsung Electronics Co., Ltd. Interactive system, display apparatus, and controlling method thereof
US10147428B1 (en) * 2018-05-30 2018-12-04 Green Key Technologies Llc Computer systems exhibiting improved computer speed and transcription accuracy of automatic speech transcription (AST) based on a multiple speech-to-text engines and methods of use thereof
US11545152B2 (en) 2018-05-30 2023-01-03 Green Key Technologies, Inc. Computer systems exhibiting improved computer speed and transcription accuracy of automatic speech transcription (AST) based on a multiple speech-to-text engines and methods of use thereof

Also Published As

Publication number Publication date
US6377922B2 (en) 2002-04-23

Similar Documents

Publication Publication Date Title
US6377922B2 (en) Distributed recognition system having multiple prompt-specific and response-specific speech recognizers
EP0890249B1 (en) Apparatus and method for reducing speech recognition vocabulary perplexity and dynamically selecting acoustic models
US5566272A (en) Automatic speech recognition (ASR) processing using confidence measures
US5488652A (en) Method and apparatus for training speech recognition algorithms for directory assistance applications
US6751591B1 (en) Method and system for predicting understanding errors in a task classification system
USRE38101E1 (en) Methods and apparatus for performing speaker independent recognition of commands in parallel with speaker dependent recognition of names, words or phrases
US6487530B1 (en) Method for recognizing non-standard and standard speech by speaker independent and speaker dependent word models
US6766295B1 (en) Adaptation of a speech recognition system across multiple remote sessions with a speaker
US6922669B2 (en) Knowledge-based strategies applied to N-best lists in automatic speech recognition systems
CN107077848B (en) Method, computer system and program product for performing speaker recognition
US6983244B2 (en) Method and apparatus for improved speech recognition with supplementary information
US7058573B1 (en) Speech recognition system to selectively utilize different speech recognition techniques over multiple speech recognition passes
US5638425A (en) Automated directory assistance system using word recognition and phoneme processing method
US6269335B1 (en) Apparatus and methods for identifying homophones among words in a speech recognition system
US6687673B2 (en) Speech recognition system
US9286887B2 (en) Concise dynamic grammars using N-best selection
US8160986B2 (en) Method and system for identifying information related to a good utilizing conditional probabilities of correct recognition
CA2221913C (en) Statistical database correction of alphanumeric account numbers for speech recognition and touch-tone recognition
US6061654A (en) System and method of recognizing letters and numbers by either speech or touch tone recognition utilizing constrained confusion matrices
US20050288922A1 (en) Method and system for speech recognition
JP3703991B2 (en) Method and apparatus for dynamic speech recognition using free speech scoring method
US20010056345A1 (en) Method and system for speech recognition of the alphabet
Cole et al. Experiments with a spoken dialogue system for taking the US census
US6952674B2 (en) Selecting an acoustic model in a speech recognition system
US7970610B2 (en) Speech recognition

Legal Events

Date Code Title Description
AS Assignment

Owner name: AT&T CORP., NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BROWN, DEBORAH W.;GOLDBERG, RANDY G.;ROSINSKI, RICHARD R.;AND OTHERS;REEL/FRAME:009896/0444;SIGNING DATES FROM 19990310 TO 19990312

STCF Information on status: patent grant

Free format text: PATENTED CASE

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

FPAY Fee payment

Year of fee payment: 12

AS Assignment

Owner name: AT&T INTELLECTUAL PROPERTY II, L.P., GEORGIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:AT&T PROPERTIES, LLC;REEL/FRAME:036737/0686

Effective date: 20150821

Owner name: AT&T PROPERTIES, LLC, NEVADA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:AT&T CORP.;REEL/FRAME:036737/0479

Effective date: 20150821

AS Assignment

Owner name: NUANCE COMMUNICATIONS, INC., MASSACHUSETTS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:AT&T INTELLECTUAL PROPERTY II, L.P.;REEL/FRAME:041498/0316

Effective date: 20161214