US3828132A - Speech synthesis by concatenation of formant encoded words - Google Patents
- Publication number
- US3828132A (application US00085660A)
- Authority
- US
- United States
- Prior art keywords
- word
- words
- message
- descriptions
- speech
- Prior art date
- Legal status
- Expired - Lifetime
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/06—Elementary speech units used in speech synthesisers; Concatenation rules
- G10L13/07—Concatenation rules
Definitions
- ABSTRACT Audio response units that select speech sounds, stored in analog or coded digital form, as the excitation for a speech synthesizer are widely used, for example in telephone audio announcement terminals. The speech produced by most units is noticeably artificial and mechanical sounding.
- human speech is analyzed in terms of formant structure and coded for storage in the unit.
- a stored program assembles them into a complete utterance, taking into account the durations of the words in the context of the complete utterance, pitch variations common to the language, and transitions between voiced portions of the speech. The result is a more natural sounding synthetic utterance.
- [FIG. 3 table: word duration in msec by digit position in the sequence, for words of 3, 4, and 5 phonemes; 3 phonemes: 450, 500, 560, 610; 4 phonemes: 260, 300, 340, 380; 5 phonemes: 340, 370, 410, 440]
- This invention relates to the synthesis of limited context messages from stored data, and more particularly to processing techniques for assembling stored information into an appropriate specification for energizing a speech synthesizer.
- the response to the question is eventually provided in the form of complete spoken utterances.
- Speech generated by the system would be as intelligible as natural speech. Indeed, the possibility exists that it might be made more intelligible than natural speech. It need not, however, sound like any particular human and may even be permitted to have a machine accent.
- individual speech sounds may be stored in the form of phoneme specifications. Such specifications can be called out of storage in accordance with word and message assembly rules and used to energize a speech synthesizer.
- speech at the acoustic level is not particularly discrete. Articulations of adjacent phonemes interact, and transient movements of the vocal tract in the production of any phoneme last much longer than the average duration of the phoneme. That is, the articulatory gestures overlap and are superimposed on one another. Hence, transient motions of the vocal tract are perceptually important. Moreover, much information about the identity of a consonant is carried, not by the spectral shape at the steady-state time of the consonant, but by its dynamic interactions with adjacent phonemes.
- Speech synthesis, therefore, is strongly concerned with dynamics.
- a synthesizer must reproduce not only the characteristics of sounds when they most nearly represent the ideal of each phoneme, but also the dynamics of vocal-tract motion as it progresses from one phoneme to another. This fact highlights a difference between speech synthesis from word or phrase storage and synthesis from more elementary speech units. If the library of speech elements is a small number of short units, such as phonemes, the linking procedures approach the complexity of the vocal tract itself. Conversely, if the library of speech elements is a much larger number of longer segments of speech, such as words or phrases, the elements can be linked together at points in the message where information in transients is minimal.
- formant frequencies, i.e., the time course of the relevant parameters
- a steady-state sound can be lengthened or shortened, and even the entire utterance can be speeded up or slowed down, with little or no loss in intelligibility.
- Formants can be locally distorted, and the entire formant contour can be uniformly raised, or lowered, to alter voice quality.
- word length formant data are accessed and concatenated to form complete formant functions for the desired utterance.
- the formant functions are interpolated in accordance with spectral derivatives to establish contours which define smooth transitions between words. Speech contour and word duration data are calculated according to stored rules.
- concatenated formant functions are used to synthesize a waveform which approximates a naturally spoken message.
- economy in storage is achieved because formant and excitation parameters change relatively slowly and can be specified by fewer binary numbers per second (bits) than can, for example, the speech waveform.
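The storage-economy claim can be made concrete with a back-of-envelope comparison. The sample rates, frame rates, and bit allocations below are illustrative assumptions, not figures from the patent:

```python
# Illustrative comparison of storage rates: a directly stored PCM speech
# waveform versus a slowly varying formant/excitation parameter set.
# All numeric values are assumptions chosen for the sketch.

def pcm_bits_per_second(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate of a directly stored speech waveform."""
    return sample_rate_hz * bits_per_sample

def formant_bits_per_second(frame_rate_hz: int, n_params: int, bits_per_param: int) -> int:
    """Bit rate of a parametric (formant) description, updated once per frame."""
    return frame_rate_hz * n_params * bits_per_param

pcm = pcm_bits_per_second(8000, 12)           # 8 kHz, 12-bit PCM
formant = formant_bits_per_second(100, 8, 8)  # 100 frames/s, 8 params, 8 bits each

print(pcm, formant, pcm // formant)  # 96000 6400 15
```

Even with generous parameter quantization, the parametric store is more than an order of magnitude smaller per second of speech, which is the economy the text refers to.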
- FIG. 1 illustrates schematically a suitable arrangement in accordance with the invention for synthesizing message-length utterances upon command
- FIG. 2 illustrates the manner of overlapping individual word formants, in accordance with the invention, for four different combinations of words
- FIG. 3 illustrates timing data which may be used for processing formant data
- FIG. 4 illustrates the processing of voiced formant data for individual words to produce a concatenated formant structure useful for actuating a speech synthesizer
- FIG. 5 illustrates the processing of both voiced and fricative formant data for individual words to produce a concatenated formant structure useful for actuating a speech synthesizer
- FIGS. 6A, 6B and 6C illustrate by way of a flow chart the operations employed, in accordance with the invention, for processing parametric data and for concatenating these data to produce a complete set of control signals for energizing a formant speech synthesizer.
- FIG. I A system for synthesizing speech by the concatenation of formant encoded words, in accordance with the invention, is illustrated schematically in FIG. I.
- Isolated words spoken by a human being are analyzed to estimate the parameters required for synthesis.
- system 10 which may include either studio generated or recorded words
- the individual words, in whatever format, are supplied to speech analyzer 12, wherein individual formants, amplitudes, pitch period designations, and fricative pole and zero identifications are developed at the Nyquist rate.
- A suitable speech analyzer (12) is described in detail in a copending application of Rabiner-Schafer, Ser. No.
- analyzer 12 includes individual channels: analyzer 13 for identifying formant (voiced) frequencies F1, F2, F3; analyzer 14 for developing a pitch period signal P; analyzer 15 for developing buzz (AV) and hiss (AN) level control signals; and analyzer 16 for developing fricative (unvoiced) pole and zero signals FP and FZ
- control parameter values are delivered to parametric description storage unit 17, which may take any desired form. Both analog and digital stores, which may be accessed upon command, are known in the art.
- storage unit 17 constitutes a word catalog which may be referenced by the word concatenation portion of the system. The parameter values maintained in catalog 17 may be revised from time to time by the addition or deletion of new words.
- INPUT COMMAND An input command from word sequence input 18 initiates the necessary operations to synthesize a message composed of words from catalog 17.
- the exact form of input 18 depends upon the particular application of the word synthesis system. Typically, an inquiry of some form is made to the system embodied by unit 18, the necessary data for a response is formulated, and the appropriate word designations for the response, for example, in the English language, are assembled in code language and delivered to the synthesis system as the output signal of unit 18.
- Such response units are known to those skilled in the art and are described in various patents and publications.
- the output developed by such a responsive unit may thus be in the form of machine code language, phoneme or other linguistic symbols, or the like. Whatever the form of the output signal, it is delivered, in accordance with this invention, to word processing System 20, wherein required word data is assembled, processed, and delivered to speech synthesizer 26.
- Processor 20 employs separate strategies for handling the segmental features of the message, such as formant frequencies, unvoiced pole and zero frequencies and amplitudes, and the prosodic features, such as timing and pitch.
- Program strategy for treating the segmental features is self-stored in the processor.
- the prosodic feature information needed for processing is derived in or is supplied to processor 20. It is this flexibility in manipulating formant-coded speech that permits the breaking of the synthesis problem into two parts.
- Timing information may be derived in one of several ways.
- the timing rules need be nothing more complicated than a table specifying word duration as a function of position in an input string of data and as a function of the number of phonemes per word.
- Timing data for a typical seven number digit string is illustrated in the table of FIG. 3 and is normally stored in timing unit 22.
- word duration is determined from rules which take into account the syntax of the specific message to be produced, i.e., rules based on models of the English language. Such data also is stored in timing store 22. It is also possible to specify the duration of each word in the input string to be synthesized from external timing data supplied from unit 23.
- word duration is chosen according to some external criterion, for example, or measured from a naturally spoken version of the message to be synthesized, and is not necessarily a typical duration for that word, independent of context.
- external timing data may be supplied from stored data or from real time adjustments made during synthesis.
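The simplest timing rule described above, a table of word durations indexed by position in the input string and by phonemes per word, can be sketched as a lookup. The values below are read from the partly garbled FIG. 3 table and should be treated as illustrative, not authoritative:

```python
# Sketch of the table-lookup timing rule for a digit string.
# DURATION_MS values are read from the (partly garbled) FIG. 3 table
# in this copy and are illustrative assumptions.

DURATION_MS = {
    # phonemes_per_word: durations (ms) indexed by digit position
    3: [450, 500, 560, 610],
    4: [260, 300, 340, 380],
    5: [340, 370, 410, 440],
}

def word_duration_ms(phonemes: int, position: int) -> int:
    """Look up the target duration of a word in the input string."""
    row = DURATION_MS[phonemes]
    # clamp to the last tabulated column for long strings
    return row[min(position, len(row) - 1)]

print(word_duration_ms(4, 2))  # 340
```

A real system would hold such a table in timing unit 22 and consult it once per word before any lengthening or shortening is applied.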
- PITCH DATA Synthesis also requires the determination of the appropriate pitch contour, i.e., pitch period as a function of time, for the message being synthesized.
- Pitch information can be obtained in several ways. For example, the pitch character of the original sequence of spoken words may be measured. Alternatively, a monotone or an arbitrarily shaped contour may be used. However,
- pitch data stored in unit 24 are supplied to concatenating processor 21 wherein the contour is locally lengthened or shortened as required by the specific utterance timing as specified by the timing data.
- pitch variation data may be supplied from external source 25, either in the form of auxiliary stored data, or as real time input data.
- a pitch contour extracted from a naturally spoken version of the message may be used. Such data would normally be used when word durations have been obtained in a similar manner, i.e., from external timing unit 23.
- Pitch and timing information obtained externally in this manner provide the most natural sounding synthesized speech. It is also possible to calculate pitch contour information by rule. Thus, there are many ways in which the prosodic information for a message can be obtained, and the choice depends strongly on the desired quality of the synthetic speech and the specific application for which it is to be used.
- WORD DURATION ADJUSTMENT Once the timing pattern for the message is established, isolated words in word catalog 17 can be withdrawn and altered to match the specified timing.
- formant data for a word in the catalog may be either lengthened or shortened.
- the formant contours for successive voiced words are smoothly connected together to form continuous transitions and continuous formant contours for the message.
- the choice of place in a word to alter duration is based on the dynamics of the formant contours. For each subinterval of a voiced sound, typically 10 msec in duration, a measure of the rate of change of the formant contours is computed in processor 21. This measure is called the spectral derivative.
- an appropriate number of 10 msec intervals are deleted in the region of the smallest spectral derivative.
- the region of the lowest spectral derivative is lengthened by adding an appropriate number of 10 msec intervals. Unvoiced regions of words are never modified.
- the measure of spectral derivative, SD_i = Σ_j |F_j(i+1) − F_j(i)|, is calculated, where i = 1, 2, … indexes the 10 msec intervals and F_j(i) is the value of the j-th formant in the i-th time interval.
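A minimal sketch of this measure, assuming (since the printed equation is garbled in this copy) an unweighted sum of absolute frame-to-frame formant differences over the three formants:

```python
# Spectral derivative SD_i for 10 ms interval i: the summed magnitude of
# the frame-to-frame change of the three formants. The unweighted sum is
# an assumption; the patent's printed equation is illegible in this copy.

def spectral_derivative(formants: list[tuple[float, float, float]], i: int) -> float:
    """SD_i for interval i, given per-interval (F1, F2, F3) tracks in Hz."""
    return sum(abs(formants[i + 1][j] - formants[i][j]) for j in range(3))

tracks = [(300.0, 2200.0, 3000.0), (320.0, 2100.0, 3000.0), (500.0, 1500.0, 2500.0)]
print(spectral_derivative(tracks, 0))  # 120.0  (a fairly steady region)
print(spectral_derivative(tracks, 1))  # 1280.0 (a rapid transition)
```

Low values mark steady-state regions, which is exactly where the duration-adjustment rule chooses to cut or insert intervals.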
- this 100 msec region is shared by the two words; hence 50 msec (5 intervals) are allotted to each word separately in terms of the overall timing.
- the technique by which the W additional 10 msec intervals are inserted, or removed, is based entirely on the spectral derivative measurement. As noted above, for each 10 msec voiced interval of the isolated word, the spectral derivative is calculated. To shorten a word, the W intervals having smallest spectral derivatives are removed. To lengthen a word, the region of the word having smallest spectral derivative is located and W intervals are inserted at the middle of this region. Each of the W intervals is given the control parameters of the center of the interval i.e., a steady-state region of W intervals is added.
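The shorten/lengthen rule just described can be sketched as follows. The frame representation (one tuple of control parameters per 10 msec interval) is an assumption made for illustration:

```python
def adjust_duration(frames, target_len):
    """Lengthen or shorten a voiced word's frame sequence (one frame = 10 ms
    of control parameters) to target_len frames, per the spectral-derivative
    rule: remove the frames with the smallest spectral change, or insert
    steady-state copies at the flattest point."""
    frames = list(frames)

    def sd(i):  # spectral change between frames i and i+1
        return sum(abs(a - b) for a, b in zip(frames[i], frames[i + 1]))

    # shorten: repeatedly delete the frame with the smallest spectral change
    while len(frames) > target_len:
        i = min(range(len(frames) - 1), key=sd)
        del frames[i]
    # lengthen: insert steady-state copies at the flattest region
    if len(frames) < target_len:
        i = min(range(len(frames) - 1), key=sd)
        frames[i:i] = [frames[i]] * (target_len - len(frames))
    return frames

word = [(100.0,), (110.0,), (111.0,), (200.0,)]
shortened = adjust_duration(word, 3)  # drops the flattest frame
```

Note that, per the text, this would only ever be applied to the voiced portions of a word; unvoiced regions are left untouched.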
- FIG. 2 illustrates the type of interpolation performed for four simple cases in accordance with these considerations. Although all three formants of a sound are interpolated, only one formant is illustrated for each word to simplify the presentation.
- word 1 (the top spectrum)
- word 2 (the middle spectrum)
- the interpolated curve shown at the bottom of the first column although beginning at the formants of word 1, rapidly makes a transition and follows the formants of word 2.
- Column 2 shows the reverse situation; word 2 exhibits little spectrum change whereas word 1 has a large spectrum change.
- the interpolated curve therefore, follows the formants of word 1 for most of the merging or overlap region and makes the transition to the formants of word 2 at the end of the region.
- Columns 3 and 4 show examples in which spectrum changes in both words are relatively the same. When they are small, as in column 3, the interpolated curve is essentially linear. When they are large, as in column 4, the interpolated curve tends to follow the formants of the first word for half of the overlap region, and the formants of the second word for the other half.
- the interpolated curve thus always begins at the formants of word 1 (the current word) and terminates with the formants of word 2 (the following word).
- the rate at which the interpolated curve makes a transition from the formants of the first word to those of the second is determined by the average spectral derivatives SD1 and SD2.
- the spectral derivative of the second word is much greater than that of the first so the transition occurs rapidly at the beginning of the overlap region.
- the spectral derivative of the first word is the greater so that the transition occurs rapidly at the end of the overlap region.
- the spectral derivatives for both words in the examples of columns 3 and 4 are much the same so that no rapid transitions take place in the overlap region.
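The qualitative behavior of FIG. 2 can be sketched with a weighting function whose crossover is steered by the two average spectral derivatives. The particular weighting formula below is an assumption (the patent's equation (3) is not legible in this copy), chosen only to reproduce the behavior of columns 1 through 4:

```python
# Hedged sketch of the overlap interpolation: over the merging region the
# output moves from word 1's formant track to word 2's, with the crossover
# timed by the average spectral derivatives sd1 and sd2. The weighting
# function is an illustrative assumption, not the patent's equation.

def interpolate_overlap(f1, f2, sd1, sd2):
    """f1, f2: equal-length formant tracks (Hz) over the overlap region."""
    n = len(f1)
    out = []
    for i in range(n):
        t = i / (n - 1) if n > 1 else 1.0
        # sd2 >> sd1 pushes w toward 1 early (transition at start of region);
        # sd1 >> sd2 holds w near 0 until the end; sd1 == sd2 gives w = t.
        w = (sd2 * t) / (sd1 * (1 - t) + sd2 * t) if (sd1 or sd2) else t
        out.append((1 - w) * f1[i] + w * f2[i])
    return out

blend = interpolate_overlap([100.0, 100.0, 100.0], [200.0, 200.0, 200.0], 1.0, 1.0)
```

By construction the curve always begins on word 1's formants and ends on word 2's, matching the invariant stated in the text.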
- FIGS. 4 and 5 illustrate the manner in which these rules and considerations are turned to account in the practice of the invention.
- FIG. 4 illustrates the manner in which three voiced words, We, Were, and Away, are linked together to form the sentence We were away. As spoken, the words have durations W1, W2, W3, as indicated, and through analysis have been determined to have formants F1, F2, and F3. These formant data are stored in storage unit 17 (FIG. 1) for the individual words, as discussed above. Upon an input command from word sequence unit 18 to assemble the three words into the sentence We were away, the formant data is drawn from storage unit 17 and delivered to word concatenating processor 21.
- Timing data from storage 22 (or alternatively from external unit 23) and pitch variation data from store 24 (or alternatively from external source 25) are supplied to the processor. It is initially determined that the words We and Were are normally linked together in speech by a smooth transition and uttered as one continuous phrase, Wewere. Hence, the two voiced words are adjusted in duration to values D1, D2 in accordance with the context of the utterance, and the formants of the words are overlapped and interpolated to provide the smooth transition. Similarly, the words were and away are normally spoken as wereaway with time emphasis on away. Hence, the duration of away is lengthened to D3 and the formants for the two words are overlapped and interpolated.
- the resulting smoothly interpolated formant specification is further modified by superimposing the pitch period contour illustrated in the figure.
- the resultant is a contiguous formant specification of the entire utterance.
- FIG. 5 illustrates the concatenation of the words I, Saw, This, and Man, to form the phrase "I saw this man".
- the words I and Saw are not overlapped because of the intervening fricative at the beginning of Saw.
- the words Saw and This are generally spoken with a smooth transition.
- these words are overlapped and the formants are interpolated. Since the word This ends in a fricative, the words This and Man are not overlapped.
- the individual word lengths W are each modified to the new values D.
- a stored pitch period contour is superimposed according to a stored rule. The resultant specification of the phrase I saw this man is thus delivered, together with voiced-unvoiced character data AV and AN, and fricative pole-zero data FP and FZ, to the speech synthesizer.
- the unvoiced intensity parameter, AN, is obtained directly from the stored controls in word catalog 17 when the interval to be synthesized is unvoiced.
- the voiced intensity parameter, AV, is similarly obtained directly from word catalog 17, except during a merging region of two voiced intervals, in which case it is obtained by interpolation of the individual voiced intensities of the two words in a fashion similar to that described for the interpolation of formants.
- CONCATENATION PROCESSOR IMPLEMENTATION Although the operations described above for processing word formant data to form word sequence information may be carried out using any desired apparatus and techniques, one suitable arrangement used in practice relies upon the high-speed processing ability of a digital computer. In practice, general purpose digital computers, namely the Honeywell DDP-516 and the GE-635, have been found to be satisfactory. The two machines and their software systems are equally adaptable for receiving a program prepared to convert them from a general purpose machine to a special purpose processor for use in the practice of the invention.
- FIGS. 6A, 6B, and 6C A flow chart of the programming steps employed to convert such a machine into special purpose processing apparatus which turns to account the features of the invention is shown in FIGS. 6A, 6B, and 6C, taken together as one complete description. Each step illustrated in the flow chart is itself well known and can be reduced to a suitable program by anyone skilled in the programming art.
- the unique subroutines employed in the word length modification operation and in the overlapping operation are set forth in Fortran IV language in Appendices A and B attached hereto.
- the DDP-516 includes 16 k of core memory, hardware multiply and divide, direct multiplex control with 16 data channels (0.25 MHz each), and a direct memory access channel (1.0 MHz). Input is by way of a teletypewriter.
- a Fortran IV compiler, DAP-16 machine-language assembler, math libraries, and various utility software are standard items supplied by the manufacturer and delivered with the machine.
- a number of peripheral units may be interfaced with the computer for convenience. These may include auxiliary word stores, card readers, display scopes, printers, tape readers, registers, and the like. Such units are well known to those skilled in the art and are generally available on the open market. They may be interconnected with the basic computer as required by the specific application to which the processor of this invention is to be put.
- PROCESSOR OPERATIONS In the portion of the flow chart shown at the top of FIG. 6A there is indicated schematically the parametric description storage unit 17 of FIG. 1, which contains a catalog of formant, pitch, amplitude, and fricative specifications for each of the words in the catalog. Upon command from word sequence input 18, these data are transferred to word concatenating processor system 20, which is illustrated by the remainder of the flow chart.
- Pitch data is then superimposed on the formant and gain structure of each word in the utterance in the fashion described in detail above. These data are available in pitch variation data store 75 (store 24 of FIG. 1). It is next determined by the steps indicated in block 76 whether external pitch data is to be used. If it is, such data from unit 77 (unit 25 in FIG. 1) is supplied by way of data store 75 to the operations of unit 74.
- control parameter contours of the commanded utterance may, if desired, be smoothed and band-limited to about 16 Hz. They are then used to control a formant synthesizer which produces a continuous speech output. Numerous systems, both analog and digital, have been described for synthesizing speech from formant data.
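The smoothing step mentioned above can be sketched as a short moving-average filter applied to a control parameter track. The filter length here is an illustrative assumption, not a calibrated 16 Hz cutoff:

```python
# Crude sketch of smoothing a control parameter contour (e.g. a formant
# track sampled at 100 frames/s) before it drives the synthesizer. A real
# implementation would band-limit to about 16 Hz; the 3-point moving
# average here is only an illustrative stand-in.

def smooth(track, width=3):
    """Centered moving average with edge shrinking."""
    half = width // 2
    out = []
    for i in range(len(track)):
        lo, hi = max(0, i - half), min(len(track), i + half + 1)
        window = track[lo:hi]
        out.append(sum(window) / len(window))
    return out

smoothed = smooth([0, 0, 100, 0, 0])  # the isolated spike is spread out
```

The effect is to remove frame-rate jitter from the concatenated contours so the synthesizer's resonators glide rather than jump.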
- One suitable synthesizer is described in J. L. Flanagan Pat. No. 3,330,910, another in David-Flanagan, Pat. No. 3,190,963, FIG. 5, and another is described in Gerstman-Kelly Pat. No. 3,158,685.
- a formant synthesizer includes a system for producing excitation as a train of impulses with a spacing proportional to the fundamental pitch of the desired signal. The intensity of the pulse excitation is controlled and the signal is applied to a cascade of variable resonators.
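The structure just described, a pitch-spaced impulse train driving a cascade of variable resonators, can be sketched as follows. The sample rate, bandwidths, and gain normalization are illustrative assumptions, not values from the cited patents:

```python
import math

# Minimal sketch of a cascade formant synthesizer: an impulse train with
# pitch-period spacing excites a cascade of two-pole digital resonators,
# one per formant. All numeric choices below are illustrative assumptions.

FS = 8000  # samples/s (assumed)

def resonator(signal, freq_hz, bw_hz):
    """Second-order (two-pole) digital resonator."""
    r = math.exp(-math.pi * bw_hz / FS)
    b = 2 * r * math.cos(2 * math.pi * freq_hz / FS)
    c = -r * r
    a = 1 - b - c  # simple gain normalization (unity at DC)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def synthesize(formants, pitch_period, n_samples):
    """Impulse train through a cascade of resonators (one per formant)."""
    excitation = [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    signal = excitation
    for f in formants:
        signal = resonator(signal, f, 60.0)  # 60 Hz bandwidth assumed
    return signal

wave = synthesize([500, 1500, 2500], pitch_period=80, n_samples=400)
```

A full synthesizer of the kind cited would additionally mix in hiss excitation for unvoiced intervals and apply the fricative pole-zero network, per the FP and FZ controls mentioned earlier.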
- speech synthesizer 26 generates a waveform which approximates that required for the desired utterance.
- This signal is utilized in any desired fashion, for example, to energize output unit 27 which may be in the form of a loudspeaker, recording device, or the like.
- a system for composing speech messages from sequences of prerecorded words which comprises:
- means for utilizing said continuous description to control a speech synthesizer.
- said parametric description of each word in said vocabulary comprises:
- said representations are in a coded digital format.
- Apparatus for processing parametric descriptions of selected prerecorded spoken words to form a continuous description of a prescribed message suitable for actuating a speech synthesizer which comprises:
- said stored timing information comprises a schedule of word durations as a function of position in an input string of words, and of the number of phonemes per word.
- 6. Apparatus for processing parametric descriptions as defined in claim 4, wherein said stored timing information comprises: a schedule of word durations derived from rules based on common language usage.
- 7. Apparatus for processing parametric descriptions as defined in claim 4, wherein said stored timing information comprises:
- Apparatus for developing control signals for a speech synthesizer which comprises:
- equation (3), which gives the interpolated formant in the overlap region as a combination of the two words' formants weighted by the average spectral derivatives SD1 and SD2, is misprinted; the corrected form is not legible in this copy.
- the bar should be only over SD1 and SD2, and the numeral (3) should be separated from the equation by spaces, as this numeral is not part of the equation but is only intended to identify it.
- Col. line 13, "continguous" should read --contiguous--;
Abstract
Audio response units that select speech sounds, stored in analog or coded digital form, as the excitation for a speech synthesizer are widely used, for example in telephone audio announcement terminals. The speech produced by most units is noticeably artificial and mechanical sounding. According to this invention, human speech is analyzed in terms of formant structure and coded for storage in the unit. As the individual words are called for, a stored program assembles them into a complete utterance, taking into account the durations of the words in the context of the complete utterance, pitch variations common to the language, and transitions between voiced portions of the speech. The result is a more natural sounding synthetic utterance.
Description
United States Patent [19] Flanagan et al.
[54] SPEECH SYNTHESIS BY CONCATENATION OF FORMANT ENCODED WORDS
[75] Inventors: James Loton Flanagan, Warren; Lawrence Richard Rabiner, Berkeley Heights; Ronald William Schafer, New Providence, all of N.J.
[73] Assignee: Bell Telephone Laboratories, Incorporated, Murray Hill, N.J.
[22] Filed: Oct. 30, 1970
[21] Appl. No.: 85,660
[45] Aug. 6, 1974
References Cited
UNITED STATES PATENTS
11/1958 David 179/15.55 R
11/1964 Gerstman 179/1 SA
5/1967 DeClerk 179/1 SA
2/1968 French 179/1 SA
10/1970 Nakata 179/1 SA
6/1971 Martin 179/1 SA
OTHER PUBLICATIONS
Rabiner, A Model for Synthesizing Speech by Rule, IEEE Transactions AU-17, 3/69, pp. 7-13.
J. L. Flanagan et al., Synthetic Voices for Computers, IEEE Spectrum, pp. 22-45, October 1970.
Primary Examiner-Kathleen H. Claffy
Assistant Examiner-Jon Bradford Leaheey
Attorney, Agent, or Firm-A. E. Hirsch; G. E. Murphy
13 Claims, 8 Drawing Figures
[Drawing sheets 1-6: FIG. 1, block diagram of the system (spoken word input, speech analyzer with formant, pitch, amplitude, and fricative pole/zero analyzer channels, parametric description storage 17, word sequence command input 18, word concatenating processor 20, digital speech synthesizer 26, stored and external timing and pitch variation data, spoken message output); FIG. 2, word 1 and word 2 control functions with interpolated curves over four overlap regions; FIG. 3, table of word duration versus digit position and phonemes per digit; FIGS. 6A-6C, flow chart of the processor operations (word duration lookup, word sequencing loop with CRDELL lengthen/shorten and INTPL overlap-merge subroutines, pitch superposition, final synthesis).]
SPEECH SYNTHESIS BY CONCATENATION OF FORMANT ENCODED WORDS
This invention relates to the synthesis of limited context messages from stored data, and more particularly to processing techniques for assembling stored information into an appropriate specification for energizing a speech synthesizer.
BACKGROUND OF THE INVENTION
… information with which to energize a speech synthesizer. The response to the question is eventually provided in the form of complete spoken utterances.
For such a service, it is evident that the system must have a large and flexible vocabulary. The system, therefore, must store sizable quantities of speech information and it must have the information in a form amenable to the production of a great variety of messages. Speech generated by the system would be as intelligible as natural speech. Indeed, the possibility exists that it might be made more intelligible than natural speech. It need not, however, sound like any particular human and may even be permitted to have a machine accent.
DESCRIPTION OF THE PRIOR ART One technique for the synthesis of messages is to store individually spoken words and to select the words in accordance with the desired message output. Words pieced together in this fashion yield intelligible but highly unnatural sounding messages. One difficulty is that word waveforms cannot easily be adjusted in duration. Also, it is difficult to make smooth transitions from one word to the next. Nonetheless, such systems are relatively simple to implement and afford a relatively large vocabulary with simple storage apparatus.
To avoid some of the difficulties of word storage and to reduce the size of the store needed for a reasonable variety of message responses, individual speech sounds may be stored in the form of phoneme specifications. Such specifications can be called out of storage in accordance with word and message assembly rules and used to energize a speech synthesizer. However, speech at the acoustic level is not particularly discrete. Articulations of adjacent phonemes interact, and transient movements of the vocal tract in the production of any phoneme last much longer than the average duration of the phoneme. That is, the articulatory gestures overlap and are superimposed on one another. Hence, transient motions of the vocal tract are perceptually important. Moreover, much information about the identity of a consonant is carried, not by the spectral shape at the steady-state time of the consonant, but by its dynamic interactions with adjacent phonemes.
Speech synthesis, therefore, is strongly concerned with dynamics. A synthesizer must reproduce not only the characteristics of sounds when they most nearly represent the ideal of each phoneme, but also the dynamics of vocal-tract motion as it progresses from one phoneme to another. This fact highlights a difference between speech synthesis from word or phrase storage and synthesis from more elementary speech units. If the library of speech elements is a small number of short units, such as phonemes, the linking procedures approach the complexity of the vocal tract itself. Conversely, if the library of speech elements is a much larger number of longer segments of speech, such as words or phrases, the elements can be linked together at points in the message where information in transients is minimal.
Thus, although phoneme synthesis techniques are attractive and sometimes adequate, the intermediate steps of assembling elementary speech specifications into words and words into messages according to prescribed rules require complicated equipment and, at best, yield mechanical sounding speech.
SUMMARY OF THE INVENTION These shortcomings are overcome in accordance with the present invention by storing representations of spoken words or phrases in terms of individual formant and other speech defining characteristics. Formants are the natural resonances of the vocal tract, and they take on different frequency values as the vocal tract changes its shape during talking. Typically, three such resonances occur in the frequency range most important to intelligibility, namely, 0 to 3 kHz. Representation of the speech wave as a set of slowly varying excitation parameters and vocal tract resonances is attractive for at least two reasons. First, it is more efficient for data storage than, for example, a pulse code modulation (PCM) representation of the speech waveform. Secondly, a formant representation permits flexibility in manipulation of the speech signal for the concatenation of words or phrases.
Thus, in accordance with the invention, individual, naturally spoken, isolated words are analyzed to produce a word library which is stored in terms of formant frequencies. In the formant representation of an utterance, formant frequencies, voice pitch, amplitude and timing can all be manipulated independently. Thus, in synthesizing an utterance, an artificial pitch contour, i.e., the time course of the relevant parameters, can be substituted for the natural contour. A steady-state sound can be lengthened or shortened, and even the entire utterance can be speeded up or slowed down with little or no loss in intelligibility. Formants can be locally distorted, and the entire formant contour can be uniformly raised or lowered to alter voice quality.
Upon program demand, word length formant data are accessed and concatenated to form complete formant functions for the desired utterance. The formant functions are interpolated in accordance with spectral derivatives to establish contours which define smooth transitions between words. Speech contour and word duration data are calculated according to stored rules. Following the necessary processing and interpolation, concatenated formant functions are used to synthesize a waveform which approximates a naturally spoken message. As an added advantage, economy in storage is achieved because formant and excitation parameters change relatively slowly and can be specified by fewer binary numbers per second (bits) than can, for example, the speech waveform.
BRIEF DESCRIPTION OF THE DRAWINGS The invention will be fully apprehended from the following detailed description of illustrative embodiments thereof taken in connection with the appended drawings in which:
FIG. 1 illustrates schematically a suitable arrangement in accordance with the invention for synthesizing message-length utterances upon command;
FIG. 2 illustrates the manner of overlapping individual word formants, in accordance with the invention, for four different combinations of words;
FIG. 3 illustrates timing data which may be used for processing formant data;
FIG. 4 illustrates the processing of voiced formant data for individual words to produce a concatenated formant structure useful for actuating a speech synthesizer;
FIG. 5 illustrates the processing of both voiced and fricative formant data for individual words to produce a concatenated formant structure useful for actuating a speech synthesizer; and
FIGS. 6A, 6B and 6C illustrate by way of a flow chart the operations employed in accordance with the invention, for processing parametric data and for concatenating these data to produce a complete set of control signals for energizing a formant speech synthesizer.
DETAILED DESCRIPTION OF THE INVENTION A system for synthesizing speech by the concatenation of formant encoded words, in accordance with the invention, is illustrated schematically in FIG. 1. Isolated words spoken by a human being are analyzed to estimate the parameters required for synthesis. Thus, naturally spoken, isolated words originating, for example, in system 10, which may include either studio generated or recorded words, are converted, if desired, to digital form in converter 11. The individual words, in whatever format, are supplied to speech analyzer 12, wherein individual formants, amplitudes, pitch period designations, and fricative pole and zero identifications are developed at the Nyquist rate. A suitable speech analyzer is described in detail in a copending application of Rabiner-Schafer, Ser. No. 872,050, filed Oct. 29, 1969, now U.S. Pat. No. 3,649,765, granted Mar. 14, 1972. In essence, analyzer 12 includes individual channels: analyzer 13 for identifying formant (voiced) frequencies F1, F2, F3; analyzer 14 for developing a pitch period signal P; analyzer 15 for developing buzz (AV) and hiss (AH) level control signals; and analyzer 16 for developing fricative (unvoiced) pole and zero signals, FP and FZ. These control parameter values are delivered to parametric description storage unit 17, which may take any desired form. Both analog and digital stores, which may be accessed upon command, are known in the art. When completed, storage unit 17 constitutes a word catalog which may be referenced by the word concatenation portion of the system. The parameter values maintained in catalog 17 may be revised from time to time by the addition or deletion of new words.
INPUT COMMAND An input command from word sequence input 18 initiates the necessary operations to synthesize a message composed of words from catalog 17. The exact form of input 18 depends upon the particular application of the word synthesis system. Typically, an inquiry of some form is made to the system embodied by unit 18, the necessary data for a response is formulated, and the appropriate word designations for the response, for example, in the English language, are assembled in code language and delivered to the synthesis system as the output signal of unit 18. Such response units are known to those skilled in the art and are described in various patents and publications. The output developed by such a response unit may thus be in the form of machine code language, phoneme or other linguistic symbols, or the like. Whatever the form of the output signal, it is delivered, in accordance with this invention, to word processing system 20, wherein required word data is assembled, processed, and delivered to speech synthesizer 26.
To synthesize a message composed of words from storage unit 17 requires the generation of timing contours, a pitch contour, and formant and amplitude contours. Processor 20, in accordance with the invention, employs separate strategies for handling the segmental features of the message, such as formant frequencies, unvoiced pole and zero frequencies and amplitudes, and the prosodic features, such as timing and pitch. Program strategy for treating the segmental features is self-stored in the processor. The prosodic feature information needed for processing is derived in or is supplied to processor 20. It is this flexibility in manipulating formant-coded speech that permits the breaking of the synthesis problem into two parts.
TIMING DATA Timing information may be derived in one of several ways. For limited vocabulary applications, such as automatic intercept services, the timing rules need be nothing more complicated than a table specifying word duration as a function of position in an input string of data and as a function of the number of phonemes per word. Timing data for a typical seven-digit number string is illustrated in the table of FIG. 3 and is normally stored in timing unit 22. For more sophisticated applications, word duration is determined from rules which take into account the syntax of the specific message to be produced, i.e., rules based on models of the English language. Such data also are stored in timing store 22. It is also possible to specify the duration of each word in the input string to be synthesized from external timing data supplied from unit 23. In this case, word duration is chosen according to some external criterion, for example, measured from a naturally spoken version of the message to be synthesized, and is not necessarily a typical duration for that word, independent of context. Thus, external timing data may be supplied from stored data or from real time adjustments made during synthesis.
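A table-driven timing rule of this sort reduces to a simple two-way lookup. The sketch below is illustrative only: the durations are placeholder values of our own, not the entries of FIG. 3, and the function name is ours.

```python
# Hypothetical sketch of a table-driven timing rule: word duration (in msec)
# indexed by the number of phonemes in the word and by the word's position
# in a seven-word (e.g., seven-digit) input string. Values are placeholders.
DURATION_TABLE = {
    2: [300, 280, 280, 320, 280, 280, 360],
    3: [360, 330, 330, 380, 330, 330, 430],
    4: [420, 390, 390, 450, 390, 390, 500],
}

def word_duration(phoneme_count: int, position: int) -> int:
    """Look up the context-dependent duration (msec) for a word."""
    return DURATION_TABLE[phoneme_count][position]

print(word_duration(3, 0))  # duration assigned to the first word of the string
```

The point of the table is that a word's duration depends on context (string position) even though the stored word itself was spoken in isolation.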
PITCH DATA Synthesis also requires the determination of the appropriate pitch contour, i.e., pitch period as a function of time, for the message being synthesized. Pitch information can be obtained in several ways. For example, the pitch character of the original sequence of spoken words may be measured. Alternatively, a monotone or an arbitrarily shaped contour may be used. However,
in practice both of these have been found to give unacceptable, unnatural results. Accordingly, it is in accordance with this invention to use a time-normalized pitch contour, stored in unit 24, and to modify it to match the word durations as determined from the timing rules. Thus, pitch data stored in unit 24 are supplied to concatenating processor 21, wherein the contour is locally lengthened or shortened as required by the specific utterance timing as specified by the timing data. If desired, pitch variation data may be supplied from external source 25, either in the form of auxiliary stored data, or as real time input data. For example, a pitch contour extracted from a naturally spoken version of the message may be used. Such data would normally be used when word durations have been obtained in a similar manner, i.e., from external timing unit 23.
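One way to realize this local lengthening and shortening is to treat the stored contour as time-normalized and resample it to each word's scheduled number of 10 msec intervals. The linear-interpolation resampler below is our assumption; the patent does not specify the resampling method.

```python
def resample_contour(contour, n_out):
    """Linearly resample a stored, time-normalized pitch contour (a list of
    pitch-period values) to n_out intervals, so the contour can be locally
    lengthened or shortened to match the timing assigned to a word."""
    n_in = len(contour)
    if n_out == 1:
        return [contour[0]]
    out = []
    for k in range(n_out):
        x = k * (n_in - 1) / (n_out - 1)  # position on the input time axis
        i = int(x)
        frac = x - i
        if i + 1 < n_in:
            out.append(contour[i] * (1 - frac) + contour[i + 1] * frac)
        else:
            out.append(contour[-1])
    return out

# lengthen a 4-interval stored contour to fill an 8-interval word slot
stretched = resample_contour([8.0, 7.5, 7.0, 6.5], 8)
```

The endpoints of the stored contour are preserved, so the overall shape of the pitch variation survives the timing change.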
Pitch and timing information obtained externally in this manner provide the most natural sounding synthesized speech. It is also possible to calculate pitch contour information by rule. Thus, there are many ways in which the prosodic information for a message can be obtained, and the choice depends strongly on the desired quality of the synthetic speech and the specific application for which it is to be used.
WORD DURATION ADJUSTMENT Once the timing pattern for the message is established, isolated words in word catalog 17 can be withdrawn and altered to match the specified timing. Thus, formant data for a word in the catalog may be either lengthened or shortened. The formant contours for successive voiced words are smoothly connected together to form continuous transitions and continuous formant contours for the message. The choice of place in a word to alter duration is based on the dynamics of the formant contours. For each subinterval of a voiced sound, typically 10 msec in duration, a measure of the rate of change of formant contours is computed in processor 21. This measure is called the spectral derivative. Regions of the word where the spectral derivative is small are regions where the word can be shortened or lengthened with the least effect on word intelligibility. Thus, to shorten a word by a given amount, an appropriate number of 10 msec intervals are deleted in the region of the smallest spectral derivative. To lengthen a word, the region of the lowest spectral derivative is lengthened by adding an appropriate number of 10 msec intervals. Unvoiced regions of words are never modified.
In practice, the measure of spectral derivative, SDi, is calculated as

SDi = |F1(i) - F1(i-1)| + |F2(i) - F2(i-1)| + |F3(i) - F3(i-1)|,

where i = 1, 2, . . . indexes the 10 msec intervals and Fj(i) is the value of the j-th formant in the i-th time interval. To determine how many 10 msec intervals must be added to (or subtracted from) the isolated word controls, an equation is used based on desired word duration, isolated word duration, and some simple contextual information concerning how the current word is concatenated with its preceding and following neighbors. By defining the symbols:
IPM = 1 if the end of the preceding word is voiced and the beginning of the current word is also voiced; 0 otherwise

IFM = 1 if the end of the current word is voiced and the beginning of the following word is also voiced; 0 otherwise

WI = duration of the current word spoken in isolation

WC = duration of the current word spoken in context (as determined from the timing rules)

WA = number of 10 msec intervals to be added (if WA > 0) or subtracted (if WA < 0)

then WA = WC - WI + 5 × (IPM + IFM). The reason for the last term in the above equation is that whenever either IPM = 1 or IFM = 1, the two words must be smoothly merged together and will overlap each other by 100 msec. However, this 100 msec region is shared by the two words; hence 50 msec (5 intervals) are allotted to each word separately in terms of the overall timing. The technique by which the WA additional 10 msec intervals are inserted, or removed, is based entirely on the spectral derivative measurement. As noted above, for each 10 msec voiced interval of the isolated word, the spectral derivative is calculated. To shorten a word, the WA intervals having the smallest spectral derivatives are removed. To lengthen a word, the region of the word having the smallest spectral derivative is located and WA intervals are inserted at the middle of this region. Each of the WA intervals is given the control parameters of the center of the region, i.e., a steady-state region of WA intervals is added.
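The duration-adjustment bookkeeping just described can be sketched as follows. The variable names and the simplified insertion rule (repeat the minimum-spectral-derivative interval rather than its region center) are ours, and the sign convention reflects our reading of the rule above.

```python
def intervals_to_add(w_context, w_isolated, ipm, ifm):
    """Number of 10-msec intervals to add (positive) or remove (negative):
    WA = WC - WI + 5*(IPM + IFM), since each merged boundary shares a
    100-msec overlap, returning 5 intervals (50 msec) to the word's budget."""
    return w_context - w_isolated + 5 * (ipm + ifm)

def adjust_duration(frames, sd, wa):
    """frames: per-interval control vectors for one voiced word;
    sd: spectral derivative of each interval.
    Shorten by dropping the |wa| intervals with smallest spectral derivative;
    lengthen by repeating the interval at the minimum-derivative region."""
    if wa < 0:
        doomed = set(sorted(range(len(sd)), key=lambda i: sd[i])[:-wa])
        return [f for i, f in enumerate(frames) if i not in doomed]
    if wa > 0:
        j = min(range(len(sd)), key=lambda i: sd[i])
        return frames[: j + 1] + [frames[j]] * wa + frames[j + 1 :]
    return frames
```

Because the dropped or repeated intervals sit where the formants change least, the alteration is least audible, as the text explains.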
OVERLAP OF WORD DESCRIPTIONS Except for the case when the end of the current word, as well as the beginning of the following word, are both voiced, the control data from word to word are simply abutted. Whenever the end of one word is voiced and the beginning of the next word is also voiced, a smooth transition is thus made from the formants at the end of one word to those at the beginning of the next word. This transition is made, for example, over the last 100 msec of the first word and the first 100 msec of the second. The transition rate depends on the relative rates of spectrum change of the two words over the merging region.
To perform this transition task, an interpolation function is used whose parameters depend strongly on the average spectral derivatives of the two words during the merging region. If the spectral derivative symbols are defined as:
SD1 = the sum of SD1i for i = n1, . . ., n1+9, and SD2 = the sum of SD2i for i = n2, . . ., n2+9,

where n1 is the starting interval of the merging region for the current word, n2 is the starting interval of the merging region for the following word, and Fj(l) is the value of formant j of the message contour at time l during the merging region, l = 0, 1, . . ., 9;

then the interpolation function used is

Fj(l) = [Fj1(n1+l) × (9-l) × SD1 + Fj2(n2+l) × l × SD2] / [(9-l) × SD1 + l × SD2],

where Fjk(i) is the value of the j-th formant at interval i for word k (k = 1 is the current word, k = 2 is the following word).
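A sketch of this merging rule in Python, assuming a 10-interval (100 msec) overlap with weights (9 - l)·SD1 and l·SD2; all function names are ours.

```python
def region_sd(sd, n):
    """Total spectral derivative over the 10-interval merging region
    starting at interval n (the SD1 / SD2 of the text)."""
    return sum(sd[n : n + 10])

def merge_formants(f1, f2, n1, n2, sd1, sd2):
    """Interpolate one formant track over the merging region:
    F(l) = (F1(n1+l)*(9-l)*sd1 + F2(n2+l)*l*sd2) / ((9-l)*sd1 + l*sd2).
    The contour starts on word 1 (l = 0), ends on word 2 (l = 9), and moves
    faster toward whichever word has the larger spectral derivative.
    Assumes sd1 and sd2 are not both zero."""
    merged = []
    for l in range(10):
        w1 = (9 - l) * sd1
        w2 = l * sd2
        merged.append((f1[n1 + l] * w1 + f2[n2 + l] * w2) / (w1 + w2))
    return merged
```

With equal spectral derivatives the weights reduce to a linear crossfade, which matches the nearly linear interpolated curve of column 3 in FIG. 2.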
FORMANT INTERPOLATION FIG. 2 illustrates the type of interpolation performed for four simple cases in accordance with these considerations. Although all three formants of a sound are interpolated, only one formant is illustrated for each word to simplify the presentation. For the words in column 1, word 1 (the top spectrum) exhibits a very small change over its last 100 msec of voicing, whereas word 2 (middle spectrum) exhibits a large change. The interpolated curve shown at the bottom of the first column, although beginning at the formants of word 1, rapidly makes a transition and follows the formants of word 2. Column 2 shows the reverse situation; word 2 exhibits little spectrum change whereas word 1 has a large spectrum change. The interpolated curve, therefore, follows the formants of word 1 for most of the merging or overlap region and makes the transition to the formants of word 2 at the end of the region. Columns 3 and 4 show examples in which spectrum changes in both words are relatively the same. When they are small, as in column 3, the interpolated curve is essentially linear. When they are large, as in column 4, the interpolated curve tends to follow the formants of the first word for half of the overlap region, and the formants of the second word for the other half.
The interpolated curve thus always begins at the formants of word 1 (the current word) and terminates with the formants of word 2 (the following word). The rate at which the interpolated curve makes a transition from the formants of the first word to those of the second is determined by the average spectral derivatives SD1 and SD2. In the example of column 1, the spectral derivative of the second word is much greater than that of the first, so the transition occurs rapidly at the beginning of the overlap region. In the example of the second column, the spectral derivative of the first word is the greater, so that the transition occurs rapidly at the end of the overlap region. As indicated above, the spectral derivatives for both words in the examples of columns 3 and 4 are much the same, so that no rapid transitions take place in the overlap region.
EXAMPLES OF CONCATENATION FIGS. 4 and 5 illustrate the manner in which these rules and considerations are turned to account in the practice of the invention. FIG. 4 illustrates the manner in which three voiced words, We, Were, and Away, are linked together to form the sentence We were away. As spoken, the words have durations W1, W2, W3, as indicated, and through analysis have been determined to have formants F1, F2, and F3. These formant data are stored in storage unit 17 (FIG. 1) for the individual words, as discussed above. Upon an input command from word sequence unit 18 to assemble the three words into the sentence We were away, the formant data are drawn from storage unit 17 and delivered to word concatenating processor 21. Timing data from storage 22 (or alternatively from external unit 23) and pitch variation data from store 24 (or alternatively from external source 25) are supplied to the processor. It is initially determined that the words We and Were are normally linked together in speech by a smooth transition and uttered as one continuous phrase, Wewere. Hence, the two voiced words are adjusted in duration to values D1, D2 in accordance with the context of the utterance, and the formants of the words are overlapped and interpolated to provide the smooth transition. Similarly, the words Were and Away are normally spoken as wereaway with time emphasis on away. Hence, the duration of Away is lengthened to D3 and the formants for the two words are overlapped and interpolated.
The resulting smoothly interpolated formant specification is further modified by superimposing the pitch period contour illustrated in the figure. The resultant is a contiguous formant specification of the entire utterance. These formant data as modified, together with the pitch period contour and voiced-unvoiced character data AV and AH, are delivered to speech synthesizer 26 (FIG. 1).
FIG. 5 illustrates the concatenation of the words I, Saw, This, and Man, to form the phrase I saw this man. In this case the words I and Saw are not overlapped because of the intervening fricative at the beginning of Saw. However, the words Saw and This are generally spoken with a smooth transition. Hence, these words are overlapped and the formants are interpolated. Since the word This ends in a fricative, the words This and Man are not overlapped. In accordance with the context of the expression, the individual word lengths W are each modified to the new values D. Finally, a stored pitch period contour is superimposed according to a stored rule. The resultant specification of the phrase I saw this man is thus delivered, together with voiced-unvoiced character data, AV and AH, and fricative pole-zero data, FP and FZ, to the speech synthesizer.
INTENSITY DATA The unvoiced intensity parameter, AH, is obtained directly from the stored controls in word catalog 17 when the interval to be synthesized is unvoiced. The voiced intensity parameter, AV, is similarly obtained directly from word catalog 17, except during a merging region of two voiced intervals, in which case it is obtained by interpolation of the individual voiced intensities of the two words in a fashion similar to that described for the interpolation of formants.
CONCATENATION PROCESSOR IMPLEMENTATION Although the operations described above for processing word formant data to form word sequence information may be carried out using any desired apparatus and techniques, one suitable arrangement used in practice relies upon the high-speed processing ability of a digital computer. In practice, general purpose digital computers, namely the Honeywell DDP-516 and the GE-635, have been found to be satisfactory. The two machines and their software systems are equally adaptable for receiving a program prepared to convert them from general purpose machines to a special purpose processor for use in the practice of the invention.
A flow chart of the programming steps employed to convert such a machine into special purpose processing apparatus which turns to account the features of the invention, is shown in FIGS. 6A, 6B, and 6C, taken together as one complete description. Each step illustrated in the flow chart is itself well known and can be reduced to a suitable program by any one skilled in the programming art. The unique subroutines employed in the word length modification operation and in the overlapping operation are set forth in Fortran IV language in Appendices A and B attached hereto.
Although any general purpose digital computer may be adapted to perform the operations required by the flow chart of FIG. 6, a unit with characteristics similar to those of the DDP-516 is preferred. The DDP-516 includes 16k of core memory, hardware multiply and divide, direct multiplex control with 16 data channels (0.25 MHz each), and a direct memory access channel (1.0 MHz). Input is by way of a teletypewriter. A Fortran IV compiler, DAP-16 machine-language assembler, math libraries, and various utility software are standard items supplied by the manufacturer and delivered with the machine. If desired, a number of peripheral units may be interfaced with the computer for convenience. These may include auxiliary word stores, card readers, display scopes, printers, tape readers, registers, and the like. Such units are well known to those skilled in the art and are generally available on the open market. They may be interconnected with the basic computer as required by the specific application to which the processor of this invention is to be put.
PROCESSOR OPERATIONS In the portion of the flow chart shown at the top of FIG. 6A there is indicated schematically the parametric description storage unit 17 of FIG. 1, which contains a catalog of formant, pitch, amplitude, and fricative specifications for each of the words in the catalog. Upon command from word sequence input 18, these data are transferred to word concatenating processor system 20, which is illustrated by the remainder of the flow chart.
Initially, the duration of each word in the connected sequence is determined, as indicated in block 61, for example, by examining a stored table of timing data 62, of the sort illustrated in FIG. 3 and by unit 22 in FIG. 1. If a timing change is necessary, the program statements of unit 63 determine whether data in store 62 are sufficient or whether external timing data from unit 64 (block 23 of FIG. 1) should be used. In either event, the duration of each commanded word is established and a word sequence counter, in unit 65, is initialized by setting I = 1.
It is then necessary to modify the parametric description of the first word in accordance with timing data and other stored rules. Accordingly, it is determined whether the I-th word was merged with the (I-1)-th word. This determination is represented by block 66. If it was not, information for the I-th word is withdrawn from word catalog 17 and the first 50 msec of the I-th word is synthesized by unit 67. If the I-th word was so merged, the I-th word is lengthened or shortened to make its timing agree with durational data supplied as above. This operation takes place in unit 68 in conjunction with subroutine CRDELL, a listing for which appears in Appendix A.
It is then ascertained whether the I-th word is to be merged with the (I+1)-th word via the steps of block 69. If there is to be a merger, the operations of block 70 are carried out to overlap the end of the I-th word with the beginning of the (I+1)-th word. This operation is carried out in conjunction with subroutine INTPL, a listing for which appears as Appendix B. If it is determined in block 69 that there is to be no merging, the operations of block 71 synthesize the last 50 msec of the I-th word using data for that word supplied from store 17.
It is then necessary in unit 72 to update the word sequencing index I and, in operation 73, to determine if the word sequencing index is greater than the index of the last word in the input sequence. If it is not, control is returned to block 66, and the next word is composed in the fashion just described. The operations are thus iterated until the index is equal to the index of the last word in the input sequence, at which time the data from block 73 are transferred to block 74.
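The loop of blocks 66 through 73 has roughly the following shape. This is a control-flow sketch only: the tuples stand in for the synthesis operations of the flow chart, and all names are placeholders of ours.

```python
def concatenate(words, merged_with_next):
    """words: per-word control data; merged_with_next[i] is True when word i
    and word i+1 are both voiced at their shared boundary. Mirrors blocks
    66-73 of FIG. 6: abut unmerged boundaries, overlap merged ones."""
    message = []
    for i, _ in enumerate(words):
        if not (i > 0 and merged_with_next[i - 1]):
            # block 67: synthesize the first 50 msec of word i; when the
            # previous boundary was merged, this span was already produced
            message.append(("head", i))
        message.append(("body", i))       # block 68: duration-adjusted interior
        if i + 1 < len(words) and merged_with_next[i]:
            message.append(("merge", i))  # block 70: overlap with word i+1
        else:
            message.append(("tail", i))   # block 71: last 50 msec of word i
    return message

print(concatenate(["we", "were", "away"], [True, True]))
```

Each merged boundary thus appears once in the output, shared by its two words, while unmerged boundaries simply abut.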
Pitch data is then superimposed on the formant and gain structure of each word in the utterance in the fashion described in detail above. These data are available in pitch variation data store 75 (store 24 of FIG. 1). It is next determined by the steps indicated in block 76 whether external pitch data is to be used. If it is, such data from unit 77 (unit 25 in FIG. 1) is supplied by way of data store 75 to the operations of unit 74.
When the pitch contour operation has been completed, all of the data in the word concatenating processor 20 as modified by the program of FIG. 6, is transferred, for example, to speech synthesizer 26 of FIG. 1.
FORMANT SYNTHESIS When all of the control parameter contours of the commanded utterance have been generated, they may, if desired, be smoothed and band-limited to about 16 Hz. They are then used to control a formant synthesizer which produces a continuous speech output. Numerous systems, both analog and digital, have been described for synthesizing speech from formant data. One suitable synthesizer is described in J. L. Flanagan Pat. No. 3,330,910, another in David-Flanagan Pat. No. 3,190,963, FIG. 5, and another is described in Gerstman-Kelly Pat. No. 3,158,685. The above-cited Rabiner-Schafer application illustrates a typical formant synthesizer and relates the exact parameters described hereinabove to the input of the synthesizer described in the Flanagan patent. Very simply, a formant synthesizer includes a system for producing excitation as a train of impulses with a spacing proportional to the fundamental pitch of the desired signal. The intensity of the pulse excitation is controlled and the signal is applied to a cascade of variable resonators.
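A variable resonator of this kind is conventionally realized digitally as a two-pole filter per formant. The sketch below shows an impulse train driving a cascade of three such resonators; the sampling rate, formant frequencies, and bandwidths are our assumptions, not values from the patent.

```python
import math

def resonator_coeffs(f_hz, bw_hz, fs=10000.0):
    """Coefficients of a two-pole digital resonator tuned to formant
    frequency f_hz with bandwidth bw_hz at sampling rate fs:
    y[n] = a*x[n] + b*y[n-1] + c*y[n-2], normalized for unity gain at DC."""
    c = -math.exp(-2.0 * math.pi * bw_hz / fs)
    b = 2.0 * math.exp(-math.pi * bw_hz / fs) * math.cos(2.0 * math.pi * f_hz / fs)
    a = 1.0 - b - c
    return a, b, c

def resonate(x, f_hz, bw_hz, fs=10000.0):
    """Filter sequence x through one formant resonator."""
    a, b, c = resonator_coeffs(f_hz, bw_hz, fs)
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = a * s + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out

# Excitation: impulse train spaced at the pitch period (125 Hz at 10 kHz),
# passed through a cascade of three formant resonators.
signal = [1.0 if n % 80 == 0 else 0.0 for n in range(400)]
for f, bw in [(500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0)]:
    signal = resonate(signal, f, bw)
```

In a full synthesizer the resonator frequencies would track the concatenated formant contours F1, F2, F3 interval by interval rather than staying fixed.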
Suffice it to say, speech synthesizer 26 generates a waveform which approximates that required for the desired utterance. This signal is utilized in any desired fashion, for example, to energize output unit 27 which may be in the form of a loudspeaker, recording device, or the like.
APPENDIX A
SUBROUTINECRDELL(IST,ISP,NEL) COMMONJPTCH(300),JADR(8),LSTA,LNO,LTYPE, LCALL,1LLENG COMMONJAR(720,8),IA(25),IAD(25),JSEQ(25), lIWTIM(25),J2,J3,JRD2,JRD4,JRD8,JAV COMMONIFCT,JF1(500),JF2(500),JF3(500),JF4(500) IC=I1-IST IF(IC.NE.0)CALLCRDSYN(IST,IC) CALLCRDSYN(I1,1)
IC=ISP-IST+1-IC RETURN CONTINUE
FIND SPECTRAL DERIVATIVE LEVEL SUCH THAT NEL INTERVALS HAVE THIS LEVEL OR SMALLER
IDIF=500 ITHR=0 JC=0 DO40I=IST,ISP IF(JAR(I,1).EQ.0)GOTO40 IF(JAR(I,8).LE.ITHR)JC=JC+1 CONTINUE IDIF=IDIF/2 IF(IDIF.EQ.0)GOTO55 IF(JC.GT.(NEL+1))GOTO45 IF(JC.LT.NEL)GOTO50 GOTO55 ITHR=ITHR-IDIF GOTO37 ITHR=ITHR+IDIF GOTO37 ICNT=0
ELIMINATE THE INTERVALS WITH SPECTRAL DERIVATIVES LESS THAN THIS LEVEL
DOI=IST,ISP IF(JAR(I,1).EQ.0)GOTO56 IF(JAR(I,8).LE.ITHR)GOTO57 CONTINUE CALLCRDSYN(I,1) GOTO60 ICNT=ICNT+1 IF(ICNT.GE.NEL)ITHR=-1 CONTINUE RETURN END
APPENDIX B
SUBROUTINE TO MERGE WORDS AND INTERPOLATE THEIR CONTROL SIGNALS SUBROUTINEINTPL(IST,JST,IW,LST,IL1,IL2,ITRANS,
INUM) COMMONJPTCH(300),JADR(8),LSTA,LNO,LTYPE, LCALL,1LLENG COMMONJAR(720,8),IA(25),IAD(25),JSEQ(25),
lIWTIM(25),J2,J3,JRD2,JRD4,JRD8,JAV COMMONIFCT,JF1(500),JF2(500),JF3(500),JF4(500) COMMONJF5(500),JF6(500),JF7(600) DIMENSIONJFIN(7) DIMENSIONJSAT(7)
CALCULATE AVERAGE SPECTRAL DERIVATIVES OF BOTH WORDS OVER THE MERGING REGION WHICH CONSISTS OF IW INTERVALS
CONTINUE JS1=0 JS2=0 DO5I=1,IW I1=IST+I I2=JST+I JS1=JS1+JAR(I1,8) JS2=JS2+JAR(I2,8) IND=I15 DO30I=1,IW
**** GET STARTING ADDRESSES OF DATA FOR BOTH WORDS
IL=IST+II JL=JST+II KL=LST+II LL=IND+I1 LM=IW+1-LL NORM=JS1*LM+JS2*LL
MERGE AND INTERPOLATE CONTROL SIGNALS OVER THESE IW INTERVALS
DO20J=1,7 JL1=JAR(IL,J)*LM JL2=JAR(JL,J)*LL JAR(KL,J)=JKOL(JL1,JS1,NORM)+JKOL(JL2,JS2,NORM) CONTINUE CONTINUE CALLCRDSYN(LST,IW)
RETURN END What is claimed is: 1. A system for composing speech messages from sequences of prerecorded words, which comprises:
means for analyzing each word of a vocabulary of spoken words to produce a separate parametric description of each;
means for storing said parametric descriptions;
means under control of an applied command signal for sequentially withdrawing from storage those descriptions required to assemble a desired spoken message;
means for individually altering the duration of the description of each word of said message in accordance with prescribed timing rules;
means for merging consecutive word descriptions together on the basis of the respective voiced-unvoiced character of the merged word descriptions;
means for altering the pitch characteristic of said continuous message description in accordance with a prescribed contour; and
means for utilizing said continuous description to control a speech synthesizer.
2. A system for composing speech messages as defined in claim 1, wherein,
said parametric description of each word in said vocabulary comprises:
a representation of the formants, voiced and unvoiced amplitudes, and fricative pole-zero characteristics of said spoken word.
3. A system for composing speech messages as defined in claim 2, wherein,
said representations are in a coded digital format.
4. Apparatus for processing parametric descriptions of selected prerecorded spoken words to form a continuous description of a prescribed message suitable for actuating a speech synthesizer, which comprises:
means for deriving a spectral derivative function for each word description of said message;
means for individually altering the durations of selected word descriptions in accordance with stored timing information;
means operative in response to said spectral derivative functions for developing parametric descriptions of transitions between voiced word regions scheduled to be merged to form said message;
means for concatenating said altered word descriptions with said transition descriptions in accordance with said prescribed message to form a continuous parametric message description; and

means for altering the pitch characteristic of said message description in accordance with prescribed rules.

5. Apparatus for processing parametric descriptions as defined in claim 4, wherein:
said stored timing information comprises a schedule of word durations as a function of position in an input string of words, and of the number of phonemes per word.

6. Apparatus for processing parametric descriptions as defined in claim 4, wherein, said stored timing information comprises: a schedule of word durations derived from rules based on common language usage.

7. Apparatus for processing parametric descriptions as defined in claim 4, wherein, said stored timing information comprises:
a schedule of word durations assembled from measurements of a naturally spoken version of said prescribed message.
8. Apparatus for processing parametric descriptions of selected prerecorded words, as defined in claim 4, wherein,
said parametric descriptions of transitions are developed for the last 100 msec of the first of two words to be merged and the first 100 msec of the second of said two words to be merged.
9. Apparatus as defined in claim 8, wherein,
the rate of transition between said two words is proportional to the average of said spectral derivatives for said two words.
10. Apparatus for processing parametric descriptions of selected words as defined in claim 4, wherein said means for altering the pitch characteristic of said message description comprises:
a stored, time-normalized pitch contour for a selected number of different messages; and
means for modifying said contour in accordance with said altered word description durations.
11. Apparatus for developing control signals for a speech synthesizer, which comprises:
means supplied with word length segmental and prosodic functions of each individual word of a desired message for deriving the spectral derivatives of each of said functions;
means responsive to said spectral derivatives for interpolating said segmental functions to establish contours which define smooth transitions between the words of said message;
means for concatenating said segmental functions in accordance with said transition contours; and
means for utilizing said prosodic functions to alter said concatenated segmental functions to develop control waveform signals which approximate the waveform of said desired message.
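As a non-authoritative illustration of the timing scheme in claims 5 through 7, the sketch below looks up a target word duration from a table keyed by position in the utterance and phoneme count, then time-warps the word's stored parameter frames to match. The table values, frame size, and clamping behavior are assumptions for illustration only; they are not the patent's measured data.

```python
# Hypothetical sketch of claims 5-7: word durations are scheduled by
# (position in the message, number of phonemes), and each word's stored
# parameter frames are linearly resampled to the scheduled duration.
# SCHEDULE values below are illustrative, not from the patent.

def schedule_duration(position, n_phonemes, table):
    """Return the target duration (ms) for a word at a given position."""
    counts = sorted(table[position])
    # Clamp to the range of phoneme counts listed for this position.
    key = max(counts[0], min(n_phonemes, counts[-1]))
    return table[position][key]

def time_warp(frames, target_ms, frame_ms=10):
    """Linearly resample a word's parameter frames to the target duration."""
    n_out = max(1, round(target_ms / frame_ms))
    n_in = len(frames)
    return [frames[min(n_in - 1, int(i * n_in / n_out))] for i in range(n_out)]

# Illustrative schedule: duration[position][n_phonemes] in milliseconds.
SCHEDULE = {
    0: {2: 300, 3: 360, 4: 410},
    1: {2: 260, 3: 300, 4: 340},
}

frames = [[500.0, 1500.0, 2500.0]] * 30        # 300 ms of formant frames
target = schedule_duration(0, 3, SCHEDULE)     # scheduled length in ms
stretched = time_warp(frames, target)          # warped to the schedule
```

Any of the three timing sources in claims 5-7 (positional schedule, language-usage rules, or measurements of a natural utterance) would simply populate a table like `SCHEDULE` differently.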
UNITED STATES PATENT OFFICE CERTIFICATE OF CORRECTION
Patent No. 3,828,132    Dated August 6, 1974
Inventor(s): James L. Flanagan, Lawrence R. Rabiner, Ronald W. Schafer
It is certified that error appears in the above-identified patent and that said Letters Patent are hereby corrected as shown below:
Col. 1, line 31, "should" di read -should;
Col. 10, line 67, "IF(TS7)" should read --IF(TST)--. Col. 11, line 1, "IF(N C.EQ.O|)GOTO22" should read --IF(NC.EQ.Ol)GOTO22--; line 7, "Il =T+J-l" should read --Il=I+J-l--; line 18, "I1 TLOC+NC/2" should read --Il=ILOC+NC/2--; line 29, "IF(NEL.E0.0)GOTO30" should read --IF(NEL.EQ.O)GOTO30--; line 22, "CALLCRDSYN(T1,1)" should read --CALLCRDSYN(Il,l)--; Following line 23, add:
--IF(IC.NE.O)CALLCRDSYN(I1,IC)--; line 10, "FLIMINATE" should read --ELIMINATE--; line 55, "SUBROUTINE NTPL" should read --SUBROUTINE INTPL--;
(SEAL) Attest:
RUTH C. MASON, Attesting Officer
MARSHALL DANN, Commissioner of Patents and Trademarks
Claims (13)
1. A system for composing speech messages from sequences of prerecorded words, which comprises: means for analyzing each word of a vocabulary of spoken words to produce a separate parametric description of each; means for storing said parametric descriptions; means under control of an applied command signal for sequentially withdrawing from storage those descriptions required to assemble a desired spoken message; means for individually altering the duration of the description of each word of said message in accordance with prescribed timing rules; means for merging consecutive word descriptions together on the basis of the respective voiced-unvoiced character of the merged word descriptions; means for altering the pitch characteristic of said continuous message description in accordance with a prescribed contour; and means for utilizing said continuous description to control a speech synthesizer.
2. A system for composing speech messages as defined in claim 1, wherein, said parametric description of each word in said vocabulary comprises: a representation of the formants, voiced and unvoiced amplitudes, and fricative pole-zero characteristics of said spoken word.
3. A system for composing speech messages as defined in claim 2, wherein, said representations are in a coded digital format.
4. Apparatus for processing parametric descriptions of selected prerecorded spoken words to form a continuous description of a prescribed message suitable for actuating a speech synthesizer, which comprises: means for deriving a spectral derivative function for each word description of said message; means for individually altering the durations of selected word descriptions in accordance with stored timing information; means operative in response to said spectral derivative functions for developing parametric descriptions of transitions between voiced word regions scheduled to be merged to form said message; means for concatenating said altered word descriptions with said transition descriptions in accordance with said prescribed message to form a continuous parametric message description; and means for altering the pitch characteristic of said message description in accordance with prescribed rules.
5. Apparatus for processing parametric descriptions as defined in claim 4, wherein: said stored timing information comprises a schedule of word durations as a function of position in an input string of words, and of the number of phonemes per word.
6. Apparatus for processing parametric descriptions as defined in claim 4, wherein, said stored timing information comprises: a schedule of word durations derived from rules based on common language usage.
7. Apparatus for processing parametric descriptions as defined in claim 4, wherein, said stored timing information comprises: a schedule of word durations assembled from measurements of a naturally spoken version of said prescribed message.
8. Apparatus for processing parametric descriptions of selected prerecorded words, as defined in claim 4, wherein, said parametric descriptions of transitions are developed for the last 100 msec of the first of two words to be merged and the first 100 msec of the second of said two words to be merged.
9. Apparatus as defined in claim 8, wherein, the rate of transition between said two words is proportional to the average of said spectral derivatives for said two words.
10. Apparatus for processing parametric descriptions of selected words as defined in claim 4, wherein said means for altering the pitch characteristic of said message description comprises: a stored, time-normalized pitch contour for a selected number of different messages; and means for modifying said contour in accordance with said altered word description durations.
11. Apparatus for developing control signals for a speech synthesizer, which comprises: means supplied with word length segmental and prosodic functions of each individual word of a desired message for deriving the spectral derivatives of each of said functions; means responsive to said spectral derivatives for interpolating said segmental functions to establish contours which define smooth transitions between the words of said message; means for concatenating said segmental functions in accordance with said transition contours; and means for utilizing said prosodic functions to alter said concatenated segmental functions to develop control waveform signals which approximate the waveform of said desired message.
12. Apparatus as defined in claim 11, wherein, said segmental functions include the formant frequencies, unvoiced pole and zero frequencies and amplitudes of each of said words.
13. Apparatus as defined in claim 11, wherein, said prosodic functions include timing and pitch variations for said words as a function of message syntax.
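Claims 8, 9, and 11 describe blending the voiced regions of adjacent words over the last 100 msec of the first word and the first 100 msec of the second, at a rate proportional to the averaged spectral derivative. The sketch below is a hedged illustration of that idea for a single formant track; the clipped-ramp weighting function is an assumption, since the claims specify only that the transition rate is proportional to the average derivative.

```python
# Hedged sketch of the merging in claims 8-9: formant tracks are blended
# over 100 ms on each side of the word boundary, with the blend steepness
# tied to the average spectral derivative of the two regions. The ramp
# shape is an assumption for illustration, not the patent's exact rule.

def spectral_derivative(track, frame_ms=10):
    """Mean absolute frame-to-frame change of a formant track (Hz/ms)."""
    if len(track) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(track, track[1:])]
    return sum(diffs) / (len(diffs) * frame_ms)

def merge_tracks(track_a, track_b, frame_ms=10, region_ms=100):
    """Blend the tail of track_a into the head of track_b."""
    n = region_ms // frame_ms                    # frames per 100 ms region
    tail, head = track_a[-n:], track_b[:n]
    rate = 0.5 * (spectral_derivative(tail) + spectral_derivative(head))
    steep = 1.0 + rate                           # faster spectra: sharper ramp
    total = len(tail) + len(head)
    blended = []
    for i in range(total):
        x = (i / (total - 1)) * 2.0 - 1.0        # -1 .. 1 across the overlap
        w = 0.5 * (1.0 + max(-1.0, min(1.0, steep * x)))
        a = tail[min(i, len(tail) - 1)]
        b = head[max(0, i - len(head))]
        blended.append((1.0 - w) * a + w * b)
    return track_a[:-n] + blended + track_b[n:]

f1_a = [650.0] * 30                              # word A's first formant (Hz)
f1_b = [400.0] * 30                              # word B's first formant (Hz)
merged = merge_tracks(f1_a, f1_b)
```

In the full apparatus this interpolation would run per parameter track (each formant, amplitude, and pole-zero function), with the unvoiced cases handled by direct abutment rather than blending.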
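Claim 10's pitch handling stores a time-normalized contour per message and reshapes it after the word durations have been altered. A minimal sketch, assuming linear interpolation and a 10 ms frame rate (neither stated in the claim), resamples the normalized contour onto the actual message length so the contour shape is preserved; the declining contour values below are illustrative, not data from the patent.

```python
# Minimal sketch of claim 10: a pitch contour stored on a normalized 0..1
# time axis is resampled to the message's actual frame count after the
# per-word timing changes. Contour values are illustrative only.

def resample_contour(contour, n_frames):
    """Linearly interpolate a normalized contour onto n_frames points."""
    out = []
    for i in range(n_frames):
        t = i / (n_frames - 1) if n_frames > 1 else 0.0
        pos = t * (len(contour) - 1)             # position on stored axis
        lo = int(pos)
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        out.append(contour[lo] * (1.0 - frac) + contour[hi] * frac)
    return out

# Time-normalized contour: a typical declination from ~120 Hz to ~80 Hz.
stored = [120.0, 115.0, 108.0, 100.0, 90.0, 80.0]
word_durations_ms = [360, 300, 410]              # after timing alteration
n_frames = sum(word_durations_ms) // 10          # 10 ms synthesizer frames
pitch = resample_contour(stored, n_frames)
```

Because the stored contour is normalized, the same shape serves the message whether the duration-altering means stretched or compressed its words.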
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US00085660A US3828132A (en) | 1970-10-30 | 1970-10-30 | Speech synthesis by concatenation of formant encoded words |
CA107,266A CA941968A (en) | 1970-10-30 | 1971-03-09 | Speech synthesis by concatenation of formant encoded words |
DE2115258A DE2115258C3 (en) | 1970-10-30 | 1971-03-30 | Method and arrangement for speech synthesis from representations of individually spoken words |
JP1928771A JPS539041B1 (en) | 1970-10-30 | 1971-04-01 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US00085660A US3828132A (en) | 1970-10-30 | 1970-10-30 | Speech synthesis by concatenation of formant encoded words |
Publications (1)
Publication Number | Publication Date |
---|---|
US3828132A true US3828132A (en) | 1974-08-06 |
Family
ID=22193116
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US00085660A Expired - Lifetime US3828132A (en) | 1970-10-30 | 1970-10-30 | Speech synthesis by concatenation of formant encoded words |
Country Status (4)
Country | Link |
---|---|
US (1) | US3828132A (en) |
JP (1) | JPS539041B1 (en) |
CA (1) | CA941968A (en) |
DE (1) | DE2115258C3 (en) |
Cited By (180)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3982070A (en) * | 1974-06-05 | 1976-09-21 | Bell Telephone Laboratories, Incorporated | Phase vocoder speech synthesis system |
US3995116A (en) * | 1974-11-18 | 1976-11-30 | Bell Telephone Laboratories, Incorporated | Emphasis controlled speech synthesizer |
US4060848A (en) * | 1970-12-28 | 1977-11-29 | Gilbert Peter Hyatt | Electronic calculator system having audio messages for operator interaction |
US4075424A (en) * | 1975-12-19 | 1978-02-21 | International Computers Limited | Speech synthesizing apparatus |
US4144582A (en) * | 1970-12-28 | 1979-03-13 | Hyatt Gilbert P | Voice signal processing system |
DE2854601A1 (en) * | 1977-12-16 | 1979-06-21 | Sanyo Electric Co | CLAY SYNTHESIZER AND METHOD FOR CLAY PROCESSING |
US4163120A (en) * | 1978-04-06 | 1979-07-31 | Bell Telephone Laboratories, Incorporated | Voice synthesizer |
DE3019823A1 (en) * | 1979-05-29 | 1980-12-11 | Texas Instruments Inc | DATA CONVERTER AND LANGUAGE SYNTHESIS ARRANGEMENT THEREFORE |
US4384170A (en) * | 1977-01-21 | 1983-05-17 | Forrest S. Mozer | Method and apparatus for speech synthesizing |
US4455551A (en) * | 1980-01-08 | 1984-06-19 | Lemelson Jerome H | Synthetic speech communicating system and method |
US4559602A (en) * | 1983-01-27 | 1985-12-17 | Bates Jr John K | Signal processing and synthesizing method and apparatus |
US5146502A (en) * | 1990-02-26 | 1992-09-08 | Davis, Van Nortwick & Company | Speech pattern correction device for deaf and voice-impaired |
WO2000070799A1 (en) * | 1999-05-19 | 2000-11-23 | New Horizons Telecasting, Inc. | Streaming media automation and distribution system for multi-window television programming |
US6366884B1 (en) * | 1997-12-18 | 2002-04-02 | Apple Computer, Inc. | Method and apparatus for improved duration modeling of phonemes |
EP1193615A2 (en) * | 2000-09-28 | 2002-04-03 | Global Language Communication Systems e.K. | Electronic text translation apparatus |
US6405169B1 (en) * | 1998-06-05 | 2002-06-11 | Nec Corporation | Speech synthesis apparatus |
US20020123130A1 (en) * | 2001-03-01 | 2002-09-05 | Cheung Ling Y. | Methods and compositions for degrading polymeric compounds |
US20020133349A1 (en) * | 2001-03-16 | 2002-09-19 | Barile Steven E. | Matching a synthetic disc jockey's voice characteristics to the sound characteristics of audio programs |
US20030097266A1 (en) * | 1999-09-03 | 2003-05-22 | Alejandro Acero | Method and apparatus for using formant models in speech systems |
US6591240B1 (en) * | 1995-09-26 | 2003-07-08 | Nippon Telegraph And Telephone Corporation | Speech signal modification and concatenation method by gradually changing speech parameters |
US7409347B1 (en) | 2003-10-23 | 2008-08-05 | Apple Inc. | Data-driven global boundary optimization |
US20080270137A1 (en) * | 2007-04-27 | 2008-10-30 | Dickson Craig B | Text to speech interactive voice response system |
US20090048844A1 (en) * | 2007-08-17 | 2009-02-19 | Kabushiki Kaisha Toshiba | Speech synthesis method and apparatus |
US20090070115A1 (en) * | 2007-09-07 | 2009-03-12 | International Business Machines Corporation | Speech synthesis system, speech synthesis program product, and speech synthesis method |
US7643990B1 (en) | 2003-10-23 | 2010-01-05 | Apple Inc. | Global boundary-centric feature extraction and associated discontinuity metrics |
US20100285778A1 (en) * | 2009-05-11 | 2010-11-11 | Max Bluvband | Method, circuit, system and application for providing messaging services |
US8229086B2 (en) | 2003-04-01 | 2012-07-24 | Silent Communication Ltd | Apparatus, system and method for providing silently selectable audible communication |
US20120309363A1 (en) * | 2011-06-03 | 2012-12-06 | Apple Inc. | Triggering notifications associated with tasks items that represent tasks to perform |
US20130080176A1 (en) * | 1999-04-30 | 2013-03-28 | At&T Intellectual Property Ii, L.P. | Methods and Apparatus for Rapid Acoustic Unit Selection From a Large Speech Corpus |
US8583418B2 (en) | 2008-09-29 | 2013-11-12 | Apple Inc. | Systems and methods of detecting language and natural language strings for text to speech synthesis |
US8600743B2 (en) | 2010-01-06 | 2013-12-03 | Apple Inc. | Noise profile determination for voice-related feature |
US8614431B2 (en) | 2005-09-30 | 2013-12-24 | Apple Inc. | Automated response to and sensing of user activity in portable devices |
US8620662B2 (en) | 2007-11-20 | 2013-12-31 | Apple Inc. | Context-aware unit selection |
US8645137B2 (en) | 2000-03-16 | 2014-02-04 | Apple Inc. | Fast, language-independent method for user authentication by voice |
US8660849B2 (en) | 2010-01-18 | 2014-02-25 | Apple Inc. | Prioritizing selection criteria by automated assistant |
US8670985B2 (en) | 2010-01-13 | 2014-03-11 | Apple Inc. | Devices and methods for identifying a prompt corresponding to a voice input in a sequence of prompts |
US8676904B2 (en) | 2008-10-02 | 2014-03-18 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US8677377B2 (en) | 2005-09-08 | 2014-03-18 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US8682667B2 (en) | 2010-02-25 | 2014-03-25 | Apple Inc. | User profiling for selecting user specific voice input processing information |
US8682649B2 (en) | 2009-11-12 | 2014-03-25 | Apple Inc. | Sentiment prediction from textual data |
US8688446B2 (en) | 2008-02-22 | 2014-04-01 | Apple Inc. | Providing text input using speech data and non-speech data |
US8706472B2 (en) | 2011-08-11 | 2014-04-22 | Apple Inc. | Method for disambiguating multiple readings in language conversion |
US8712776B2 (en) | 2008-09-29 | 2014-04-29 | Apple Inc. | Systems and methods for selective text to speech synthesis |
US8713021B2 (en) | 2010-07-07 | 2014-04-29 | Apple Inc. | Unsupervised document clustering using latent semantic density analysis |
US8719014B2 (en) | 2010-09-27 | 2014-05-06 | Apple Inc. | Electronic device with text error correction based on voice recognition data |
US8718047B2 (en) | 2001-10-22 | 2014-05-06 | Apple Inc. | Text to speech conversion of text messages from mobile communication devices |
US8719006B2 (en) | 2010-08-27 | 2014-05-06 | Apple Inc. | Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis |
US8751238B2 (en) | 2009-03-09 | 2014-06-10 | Apple Inc. | Systems and methods for determining the language to use for speech generated by a text to speech engine |
US8762156B2 (en) | 2011-09-28 | 2014-06-24 | Apple Inc. | Speech recognition repair using contextual information |
US8768702B2 (en) | 2008-09-05 | 2014-07-01 | Apple Inc. | Multi-tiered voice feedback in an electronic device |
US8775442B2 (en) | 2012-05-15 | 2014-07-08 | Apple Inc. | Semantic search using a single-source semantic model |
US8781836B2 (en) | 2011-02-22 | 2014-07-15 | Apple Inc. | Hearing assistance system for providing consistent human speech |
US8812294B2 (en) | 2011-06-21 | 2014-08-19 | Apple Inc. | Translating phrases from one language into another using an order-based set of declarative rules |
US8862252B2 (en) | 2009-01-30 | 2014-10-14 | Apple Inc. | Audio user interface for displayless electronic device |
US8898568B2 (en) | 2008-09-09 | 2014-11-25 | Apple Inc. | Audio user interface |
US8935167B2 (en) | 2012-09-25 | 2015-01-13 | Apple Inc. | Exemplar-based latent perceptual modeling for automatic speech recognition |
US8977255B2 (en) | 2007-04-03 | 2015-03-10 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US8977584B2 (en) | 2010-01-25 | 2015-03-10 | Newvaluexchange Global Ai Llp | Apparatuses, methods and systems for a digital conversation management platform |
US8996376B2 (en) | 2008-04-05 | 2015-03-31 | Apple Inc. | Intelligent text-to-speech conversion |
US9053089B2 (en) | 2007-10-02 | 2015-06-09 | Apple Inc. | Part-of-speech tagging using latent analogy |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US9280610B2 (en) | 2012-05-14 | 2016-03-08 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US9300784B2 (en) | 2013-06-13 | 2016-03-29 | Apple Inc. | System and method for emergency calls initiated by voice command |
US9311043B2 (en) | 2010-01-13 | 2016-04-12 | Apple Inc. | Adaptive audio feedback system and method |
US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
US9431006B2 (en) | 2009-07-02 | 2016-08-30 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
US9502031B2 (en) | 2014-05-27 | 2016-11-22 | Apple Inc. | Method for supporting dynamic grammars in WFST-based ASR |
US9535906B2 (en) | 2008-07-31 | 2017-01-03 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US9547647B2 (en) | 2012-09-19 | 2017-01-17 | Apple Inc. | Voice-based media searching |
US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
US9620105B2 (en) | 2014-05-15 | 2017-04-11 | Apple Inc. | Analyzing audio input for efficient speech and music recognition |
US9620104B2 (en) | 2013-06-07 | 2017-04-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
US9633674B2 (en) | 2013-06-07 | 2017-04-25 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US9697822B1 (en) | 2013-03-15 | 2017-07-04 | Apple Inc. | System and method for updating an adaptive speech recognition model |
US9706030B2 (en) | 2007-02-22 | 2017-07-11 | Mobile Synergy Solutions, Llc | System and method for telephone communication |
US9711141B2 (en) | 2014-12-09 | 2017-07-18 | Apple Inc. | Disambiguating heteronyms in speech synthesis |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US9721563B2 (en) | 2012-06-08 | 2017-08-01 | Apple Inc. | Name recognition system |
US9733821B2 (en) | 2013-03-14 | 2017-08-15 | Apple Inc. | Voice control to diagnose inadvertent activation of accessibility features |
US9734193B2 (en) | 2014-05-30 | 2017-08-15 | Apple Inc. | Determining domain salience ranking from ambiguous words in natural speech |
US9756185B1 (en) * | 2014-11-10 | 2017-09-05 | Teton1, Llc | System for automated call analysis using context specific lexicon |
US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
US9798393B2 (en) | 2011-08-29 | 2017-10-24 | Apple Inc. | Text correction processing |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
US9922642B2 (en) | 2013-03-15 | 2018-03-20 | Apple Inc. | Training an at least partial voice command system |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9946706B2 (en) | 2008-06-07 | 2018-04-17 | Apple Inc. | Automatic language identification for dynamic text processing |
US9959870B2 (en) | 2008-12-11 | 2018-05-01 | Apple Inc. | Speech recognition involving a mobile device |
US9966068B2 (en) | 2013-06-08 | 2018-05-08 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US9966065B2 (en) | 2014-05-30 | 2018-05-08 | Apple Inc. | Multi-command single utterance input method |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US9977779B2 (en) | 2013-03-14 | 2018-05-22 | Apple Inc. | Automatic supplementation of word correction dictionaries |
US10002189B2 (en) | 2007-12-20 | 2018-06-19 | Apple Inc. | Method and apparatus for searching using an active ontology |
US10019994B2 (en) | 2012-06-08 | 2018-07-10 | Apple Inc. | Systems and methods for recognizing textual identifiers within a plurality of words |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US10078487B2 (en) | 2013-03-15 | 2018-09-18 | Apple Inc. | Context-sensitive handling of interruptions |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10089072B2 (en) | 2016-06-11 | 2018-10-02 | Apple Inc. | Intelligent device arbitration and control |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US10185542B2 (en) | 2013-06-09 | 2019-01-22 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US10199051B2 (en) | 2013-02-07 | 2019-02-05 | Apple Inc. | Voice trigger for a digital assistant |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US10269345B2 (en) | 2016-06-11 | 2019-04-23 | Apple Inc. | Intelligent task discovery |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US10289433B2 (en) | 2014-05-30 | 2019-05-14 | Apple Inc. | Domain specific language for encoding assistant dialog |
US10297253B2 (en) | 2016-06-11 | 2019-05-21 | Apple Inc. | Application integration with a digital assistant |
US10296160B2 (en) | 2013-12-06 | 2019-05-21 | Apple Inc. | Method for extracting salient dialog usage from live data |
US10354011B2 (en) | 2016-06-09 | 2019-07-16 | Apple Inc. | Intelligent automated assistant in a home environment |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US10417037B2 (en) | 2012-05-15 | 2019-09-17 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
US10515147B2 (en) | 2010-12-22 | 2019-12-24 | Apple Inc. | Using statistical language models for contextual lookup |
US10521466B2 (en) | 2016-06-11 | 2019-12-31 | Apple Inc. | Data driven natural language event detection and classification |
US10540976B2 (en) | 2009-06-05 | 2020-01-21 | Apple Inc. | Contextual voice commands |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US10572476B2 (en) | 2013-03-14 | 2020-02-25 | Apple Inc. | Refining a search based on schedule items |
US10592095B2 (en) | 2014-05-23 | 2020-03-17 | Apple Inc. | Instantaneous speaking of content on touch devices |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
US10642574B2 (en) | 2013-03-14 | 2020-05-05 | Apple Inc. | Device, method, and graphical user interface for outputting captions |
US10652394B2 (en) | 2013-03-14 | 2020-05-12 | Apple Inc. | System and method for processing voicemail |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US10672399B2 (en) | 2011-06-03 | 2020-06-02 | Apple Inc. | Switching between text data and audio data based on a mapping |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10733993B2 (en) | 2016-06-10 | 2020-08-04 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US10748529B1 (en) | 2013-03-15 | 2020-08-18 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US10762293B2 (en) | 2010-12-22 | 2020-09-01 | Apple Inc. | Using parts-of-speech tagging and named entity recognition for spelling correction |
US10791176B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10791216B2 (en) | 2013-08-06 | 2020-09-29 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US10810274B2 (en) | 2017-05-15 | 2020-10-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
US10915227B1 (en) | 2019-08-07 | 2021-02-09 | Bank Of America Corporation | System for adjustment of resource allocation based on multi-channel inputs |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US11151899B2 (en) | 2013-03-15 | 2021-10-19 | Apple Inc. | User training by intelligent digital assistant |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2020077B (en) * | 1978-04-28 | 1983-01-12 | Texas Instruments Inc | Learning aid or game having miniature electronic speech synthesizer chip |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US2860187A (en) * | 1955-12-08 | 1958-11-11 | Bell Telephone Labor Inc | Artificial reconstruction of speech |
US3158685A (en) * | 1961-05-04 | 1964-11-24 | Bell Telephone Labor Inc | Synthesis of speech from code signals |
US3319002A (en) * | 1963-05-24 | 1967-05-09 | Clerk Joseph L De | Electronic formant speech synthesizer |
US3369077A (en) * | 1964-06-09 | 1968-02-13 | Ibm | Pitch modification of audio waveforms |
US3532821A (en) * | 1967-11-29 | 1970-10-06 | Hitachi Ltd | Speech synthesizer |
US3588353A (en) * | 1968-02-26 | 1971-06-28 | Rca Corp | Speech synthesizer utilizing timewise truncation of adjacent phonemes to provide smooth formant transition |
1970
- 1970-10-30 US US00085660A patent/US3828132A/en not_active Expired - Lifetime
1971
- 1971-03-09 CA CA107,266A patent/CA941968A/en not_active Expired
- 1971-03-30 DE DE2115258A patent/DE2115258C3/en not_active Expired
- 1971-04-01 JP JP1928771A patent/JPS539041B1/ja active Pending
Non-Patent Citations (2)
Title |
---|
J. L. Flanagan et al., "Synthetic Voices for Computers," IEEE Spectrum, pp. 22-45, October 1970. *
L. Rabiner, "A Model for Synthesizing Speech by Rule," IEEE Transactions on Audio and Electroacoustics, vol. AU-17, March 1969, pp. 7-13. *
Cited By (276)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4060848A (en) * | 1970-12-28 | 1977-11-29 | Gilbert Peter Hyatt | Electronic calculator system having audio messages for operator interaction |
US4144582A (en) * | 1970-12-28 | 1979-03-13 | Hyatt Gilbert P | Voice signal processing system |
US3982070A (en) * | 1974-06-05 | 1976-09-21 | Bell Telephone Laboratories, Incorporated | Phase vocoder speech synthesis system |
US3995116A (en) * | 1974-11-18 | 1976-11-30 | Bell Telephone Laboratories, Incorporated | Emphasis controlled speech synthesizer |
US4075424A (en) * | 1975-12-19 | 1978-02-21 | International Computers Limited | Speech synthesizing apparatus |
US4092495A (en) * | 1975-12-19 | 1978-05-30 | International Computers Limited | Speech synthesizing apparatus |
US4384170A (en) * | 1977-01-21 | 1983-05-17 | Forrest S. Mozer | Method and apparatus for speech synthesizing |
DE2854601A1 (en) * | 1977-12-16 | 1979-06-21 | Sanyo Electric Co | SOUND SYNTHESIZER AND METHOD FOR SOUND PROCESSING |
US4163120A (en) * | 1978-04-06 | 1979-07-31 | Bell Telephone Laboratories, Incorporated | Voice synthesizer |
WO1979000892A1 (en) * | 1978-04-06 | 1979-11-15 | Western Electric Co | Voice synthesizer |
DE3019823A1 (en) * | 1979-05-29 | 1980-12-11 | Texas Instruments Inc | DATA CONVERTER AND SPEECH SYNTHESIS ARRANGEMENT THEREFOR |
US4455551A (en) * | 1980-01-08 | 1984-06-19 | Lemelson Jerome H | Synthetic speech communicating system and method |
US4559602A (en) * | 1983-01-27 | 1985-12-17 | Bates Jr John K | Signal processing and synthesizing method and apparatus |
US5146502A (en) * | 1990-02-26 | 1992-09-08 | Davis, Van Nortwick & Company | Speech pattern correction device for deaf and voice-impaired |
US6591240B1 (en) * | 1995-09-26 | 2003-07-08 | Nippon Telegraph And Telephone Corporation | Speech signal modification and concatenation method by gradually changing speech parameters |
US6366884B1 (en) * | 1997-12-18 | 2002-04-02 | Apple Computer, Inc. | Method and apparatus for improved duration modeling of phonemes |
US6553344B2 (en) | 1997-12-18 | 2003-04-22 | Apple Computer, Inc. | Method and apparatus for improved duration modeling of phonemes |
US6785652B2 (en) * | 1997-12-18 | 2004-08-31 | Apple Computer, Inc. | Method and apparatus for improved duration modeling of phonemes |
US6405169B1 (en) * | 1998-06-05 | 2002-06-11 | Nec Corporation | Speech synthesis apparatus |
US20130080176A1 (en) * | 1999-04-30 | 2013-03-28 | At&T Intellectual Property Ii, L.P. | Methods and Apparatus for Rapid Acoustic Unit Selection From a Large Speech Corpus |
US9236044B2 (en) | 1999-04-30 | 2016-01-12 | At&T Intellectual Property Ii, L.P. | Recording concatenation costs of most common acoustic unit sequential pairs to a concatenation cost database for speech synthesis |
US8788268B2 (en) * | 1999-04-30 | 2014-07-22 | At&T Intellectual Property Ii, L.P. | Speech synthesis from acoustic units with default values of concatenation cost |
US9691376B2 (en) | 1999-04-30 | 2017-06-27 | Nuance Communications, Inc. | Concatenation cost in speech synthesis for acoustic unit sequential pair using hash table and default concatenation cost |
WO2000070799A1 (en) * | 1999-05-19 | 2000-11-23 | New Horizons Telecasting, Inc. | Streaming media automation and distribution system for multi-window television programming |
US20050060759A1 (en) * | 1999-05-19 | 2005-03-17 | New Horizons Telecasting, Inc. | Encapsulated, streaming media automation and distribution system |
US8621508B2 (en) | 1999-05-19 | 2013-12-31 | Xialan Chi Ltd., Llc | Encapsulated, streaming media automation and distribution system |
US6708154B2 (en) * | 1999-09-03 | 2004-03-16 | Microsoft Corporation | Method and apparatus for using formant models in resonance control for speech systems |
US20030097266A1 (en) * | 1999-09-03 | 2003-05-22 | Alejandro Acero | Method and apparatus for using formant models in speech systems |
US8645137B2 (en) | 2000-03-16 | 2014-02-04 | Apple Inc. | Fast, language-independent method for user authentication by voice |
US9646614B2 (en) | 2000-03-16 | 2017-05-09 | Apple Inc. | Fast, language-independent method for user authentication by voice |
EP1193615A3 (en) * | 2000-09-28 | 2005-07-13 | Global Language Communication Systems e.K. | Electronic text translation apparatus |
EP1193615A2 (en) * | 2000-09-28 | 2002-04-03 | Global Language Communication Systems e.K. | Electronic text translation apparatus |
US20020123130A1 (en) * | 2001-03-01 | 2002-09-05 | Cheung Ling Y. | Methods and compositions for degrading polymeric compounds |
US6915261B2 (en) | 2001-03-16 | 2005-07-05 | Intel Corporation | Matching a synthetic disc jockey's voice characteristics to the sound characteristics of audio programs |
US20020133349A1 (en) * | 2001-03-16 | 2002-09-19 | Barile Steven E. | Matching a synthetic disc jockey's voice characteristics to the sound characteristics of audio programs |
US8718047B2 (en) | 2001-10-22 | 2014-05-06 | Apple Inc. | Text to speech conversion of text messages from mobile communication devices |
US8229086B2 (en) | 2003-04-01 | 2012-07-24 | Silent Communication Ltd | Apparatus, system and method for providing silently selectable audible communication |
US8015012B2 (en) | 2003-10-23 | 2011-09-06 | Apple Inc. | Data-driven global boundary optimization |
US7409347B1 (en) | 2003-10-23 | 2008-08-05 | Apple Inc. | Data-driven global boundary optimization |
US7643990B1 (en) | 2003-10-23 | 2010-01-05 | Apple Inc. | Global boundary-centric feature extraction and associated discontinuity metrics |
US7930172B2 (en) | 2003-10-23 | 2011-04-19 | Apple Inc. | Global boundary-centric feature extraction and associated discontinuity metrics |
US20100145691A1 (en) * | 2003-10-23 | 2010-06-10 | Bellegarda Jerome R | Global boundary-centric feature extraction and associated discontinuity metrics |
US20090048836A1 (en) * | 2003-10-23 | 2009-02-19 | Bellegarda Jerome R | Data-driven global boundary optimization |
US8677377B2 (en) | 2005-09-08 | 2014-03-18 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US10318871B2 (en) | 2005-09-08 | 2019-06-11 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US9501741B2 (en) | 2005-09-08 | 2016-11-22 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US9389729B2 (en) | 2005-09-30 | 2016-07-12 | Apple Inc. | Automated response to and sensing of user activity in portable devices |
US9619079B2 (en) | 2005-09-30 | 2017-04-11 | Apple Inc. | Automated response to and sensing of user activity in portable devices |
US8614431B2 (en) | 2005-09-30 | 2013-12-24 | Apple Inc. | Automated response to and sensing of user activity in portable devices |
US9958987B2 (en) | 2005-09-30 | 2018-05-01 | Apple Inc. | Automated response to and sensing of user activity in portable devices |
US9117447B2 (en) | 2006-09-08 | 2015-08-25 | Apple Inc. | Using event alert text as input to an automated assistant |
US8930191B2 (en) | 2006-09-08 | 2015-01-06 | Apple Inc. | Paraphrasing of user requests and results by automated digital assistant |
US8942986B2 (en) | 2006-09-08 | 2015-01-27 | Apple Inc. | Determining user intent based on ontologies of domains |
US9706030B2 (en) | 2007-02-22 | 2017-07-11 | Mobile Synergy Solutions, Llc | System and method for telephone communication |
US10568032B2 (en) | 2007-04-03 | 2020-02-18 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US8977255B2 (en) | 2007-04-03 | 2015-03-10 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US7895041B2 (en) * | 2007-04-27 | 2011-02-22 | Dickson Craig B | Text to speech interactive voice response system |
US20080270137A1 (en) * | 2007-04-27 | 2008-10-30 | Dickson Craig B | Text to speech interactive voice response system |
US20090048844A1 (en) * | 2007-08-17 | 2009-02-19 | Kabushiki Kaisha Toshiba | Speech synthesis method and apparatus |
US8175881B2 (en) * | 2007-08-17 | 2012-05-08 | Kabushiki Kaisha Toshiba | Method and apparatus using fused formant parameters to generate synthesized speech |
US8370149B2 (en) * | 2007-09-07 | 2013-02-05 | Nuance Communications, Inc. | Speech synthesis system, speech synthesis program product, and speech synthesis method |
US20130268275A1 (en) * | 2007-09-07 | 2013-10-10 | Nuance Communications, Inc. | Speech synthesis system, speech synthesis program product, and speech synthesis method |
US20090070115A1 (en) * | 2007-09-07 | 2009-03-12 | International Business Machines Corporation | Speech synthesis system, speech synthesis program product, and speech synthesis method |
US9275631B2 (en) * | 2007-09-07 | 2016-03-01 | Nuance Communications, Inc. | Speech synthesis system, speech synthesis program product, and speech synthesis method |
US9053089B2 (en) | 2007-10-02 | 2015-06-09 | Apple Inc. | Part-of-speech tagging using latent analogy |
US8620662B2 (en) | 2007-11-20 | 2013-12-31 | Apple Inc. | Context-aware unit selection |
US10002189B2 (en) | 2007-12-20 | 2018-06-19 | Apple Inc. | Method and apparatus for searching using an active ontology |
US11023513B2 (en) | 2007-12-20 | 2021-06-01 | Apple Inc. | Method and apparatus for searching using an active ontology |
US10381016B2 (en) | 2008-01-03 | 2019-08-13 | Apple Inc. | Methods and apparatus for altering audio output signals |
US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US9361886B2 (en) | 2008-02-22 | 2016-06-07 | Apple Inc. | Providing text input using speech data and non-speech data |
US8688446B2 (en) | 2008-02-22 | 2014-04-01 | Apple Inc. | Providing text input using speech data and non-speech data |
US8996376B2 (en) | 2008-04-05 | 2015-03-31 | Apple Inc. | Intelligent text-to-speech conversion |
US9626955B2 (en) | 2008-04-05 | 2017-04-18 | Apple Inc. | Intelligent text-to-speech conversion |
US9865248B2 (en) | 2008-04-05 | 2018-01-09 | Apple Inc. | Intelligent text-to-speech conversion |
US9946706B2 (en) | 2008-06-07 | 2018-04-17 | Apple Inc. | Automatic language identification for dynamic text processing |
US9535906B2 (en) | 2008-07-31 | 2017-01-03 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US10108612B2 (en) | 2008-07-31 | 2018-10-23 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US9691383B2 (en) | 2008-09-05 | 2017-06-27 | Apple Inc. | Multi-tiered voice feedback in an electronic device |
US8768702B2 (en) | 2008-09-05 | 2014-07-01 | Apple Inc. | Multi-tiered voice feedback in an electronic device |
US8898568B2 (en) | 2008-09-09 | 2014-11-25 | Apple Inc. | Audio user interface |
US8583418B2 (en) | 2008-09-29 | 2013-11-12 | Apple Inc. | Systems and methods of detecting language and natural language strings for text to speech synthesis |
US8712776B2 (en) | 2008-09-29 | 2014-04-29 | Apple Inc. | Systems and methods for selective text to speech synthesis |
US8762469B2 (en) | 2008-10-02 | 2014-06-24 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US8713119B2 (en) | 2008-10-02 | 2014-04-29 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11348582B2 (en) | 2008-10-02 | 2022-05-31 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US8676904B2 (en) | 2008-10-02 | 2014-03-18 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US10643611B2 (en) | 2008-10-02 | 2020-05-05 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US9412392B2 (en) | 2008-10-02 | 2016-08-09 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US9959870B2 (en) | 2008-12-11 | 2018-05-01 | Apple Inc. | Speech recognition involving a mobile device |
US8862252B2 (en) | 2009-01-30 | 2014-10-14 | Apple Inc. | Audio user interface for displayless electronic device |
US8751238B2 (en) | 2009-03-09 | 2014-06-10 | Apple Inc. | Systems and methods for determining the language to use for speech generated by a text to speech engine |
US8494490B2 (en) | 2009-05-11 | 2013-07-23 | Silent Communication Ltd. | Method, circuit, system and application for providing messaging services |
US8792874B2 (en) | 2009-05-11 | 2014-07-29 | Silent Communication Ltd. | Systems, methods, circuits and associated software for augmenting contact details stored on a communication device with data relating to the contact contained on social networking sites |
US9565551B2 (en) | 2009-05-11 | 2017-02-07 | Mobile Synergy Solutions, Llc | Systems, methods, circuits and associated software for augmenting contact details stored on a communication device with data relating to the contact contained on social networking sites |
US20100285778A1 (en) * | 2009-05-11 | 2010-11-11 | Max Bluvband | Method, circuit, system and application for providing messaging services |
US10540976B2 (en) | 2009-06-05 | 2020-01-21 | Apple Inc. | Contextual voice commands |
US10795541B2 (en) | 2009-06-05 | 2020-10-06 | Apple Inc. | Intelligent organization of tasks items |
US10475446B2 (en) | 2009-06-05 | 2019-11-12 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US11080012B2 (en) | 2009-06-05 | 2021-08-03 | Apple Inc. | Interface for a virtual digital assistant |
US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US10283110B2 (en) | 2009-07-02 | 2019-05-07 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US9431006B2 (en) | 2009-07-02 | 2016-08-30 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US8682649B2 (en) | 2009-11-12 | 2014-03-25 | Apple Inc. | Sentiment prediction from textual data |
US8600743B2 (en) | 2010-01-06 | 2013-12-03 | Apple Inc. | Noise profile determination for voice-related feature |
US8670985B2 (en) | 2010-01-13 | 2014-03-11 | Apple Inc. | Devices and methods for identifying a prompt corresponding to a voice input in a sequence of prompts |
US9311043B2 (en) | 2010-01-13 | 2016-04-12 | Apple Inc. | Adaptive audio feedback system and method |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US8892446B2 (en) | 2010-01-18 | 2014-11-18 | Apple Inc. | Service orchestration for intelligent automated assistant |
US8903716B2 (en) | 2010-01-18 | 2014-12-02 | Apple Inc. | Personalized vocabulary for digital assistant |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
US8706503B2 (en) | 2010-01-18 | 2014-04-22 | Apple Inc. | Intent deduction based on previous user interactions with voice assistant |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US8660849B2 (en) | 2010-01-18 | 2014-02-25 | Apple Inc. | Prioritizing selection criteria by automated assistant |
US8731942B2 (en) | 2010-01-18 | 2014-05-20 | Apple Inc. | Maintaining context information between user interactions with a voice assistant |
US9548050B2 (en) | 2010-01-18 | 2017-01-17 | Apple Inc. | Intelligent automated assistant |
US8670979B2 (en) | 2010-01-18 | 2014-03-11 | Apple Inc. | Active input elicitation by intelligent automated assistant |
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
US10706841B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Task flow identification based on user intent |
US8799000B2 (en) | 2010-01-18 | 2014-08-05 | Apple Inc. | Disambiguation based on active input elicitation by intelligent automated assistant |
US9424861B2 (en) | 2010-01-25 | 2016-08-23 | Newvaluexchange Ltd | Apparatuses, methods and systems for a digital conversation management platform |
US9424862B2 (en) | 2010-01-25 | 2016-08-23 | Newvaluexchange Ltd | Apparatuses, methods and systems for a digital conversation management platform |
US9431028B2 (en) | 2010-01-25 | 2016-08-30 | Newvaluexchange Ltd | Apparatuses, methods and systems for a digital conversation management platform |
US8977584B2 (en) | 2010-01-25 | 2015-03-10 | Newvaluexchange Global Ai Llp | Apparatuses, methods and systems for a digital conversation management platform |
US9190062B2 (en) | 2010-02-25 | 2015-11-17 | Apple Inc. | User profiling for voice input processing |
US9633660B2 (en) | 2010-02-25 | 2017-04-25 | Apple Inc. | User profiling for voice input processing |
US10049675B2 (en) | 2010-02-25 | 2018-08-14 | Apple Inc. | User profiling for voice input processing |
US8682667B2 (en) | 2010-02-25 | 2014-03-25 | Apple Inc. | User profiling for selecting user specific voice input processing information |
US8713021B2 (en) | 2010-07-07 | 2014-04-29 | Apple Inc. | Unsupervised document clustering using latent semantic density analysis |
US8719006B2 (en) | 2010-08-27 | 2014-05-06 | Apple Inc. | Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis |
US8719014B2 (en) | 2010-09-27 | 2014-05-06 | Apple Inc. | Electronic device with text error correction based on voice recognition data |
US9075783B2 (en) | 2010-09-27 | 2015-07-07 | Apple Inc. | Electronic device with text error correction based on voice recognition data |
US10515147B2 (en) | 2010-12-22 | 2019-12-24 | Apple Inc. | Using statistical language models for contextual lookup |
US10762293B2 (en) | 2010-12-22 | 2020-09-01 | Apple Inc. | Using parts-of-speech tagging and named entity recognition for spelling correction |
US8781836B2 (en) | 2011-02-22 | 2014-07-15 | Apple Inc. | Hearing assistance system for providing consistent human speech |
US10102359B2 (en) | 2011-03-21 | 2018-10-16 | Apple Inc. | Device access using voice authentication |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US10672399B2 (en) | 2011-06-03 | 2020-06-02 | Apple Inc. | Switching between text data and audio data based on a mapping |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US10255566B2 (en) | 2011-06-03 | 2019-04-09 | Apple Inc. | Generating and processing task items that represent tasks to perform |
US10706373B2 (en) | 2011-06-03 | 2020-07-07 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US20120309363A1 (en) * | 2011-06-03 | 2012-12-06 | Apple Inc. | Triggering notifications associated with tasks items that represent tasks to perform |
US8812294B2 (en) | 2011-06-21 | 2014-08-19 | Apple Inc. | Translating phrases from one language into another using an order-based set of declarative rules |
US8706472B2 (en) | 2011-08-11 | 2014-04-22 | Apple Inc. | Method for disambiguating multiple readings in language conversion |
US9798393B2 (en) | 2011-08-29 | 2017-10-24 | Apple Inc. | Text correction processing |
US8762156B2 (en) | 2011-09-28 | 2014-06-24 | Apple Inc. | Speech recognition repair using contextual information |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
US9280610B2 (en) | 2012-05-14 | 2016-03-08 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US9953088B2 (en) | 2012-05-14 | 2018-04-24 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US8775442B2 (en) | 2012-05-15 | 2014-07-08 | Apple Inc. | Semantic search using a single-source semantic model |
US10417037B2 (en) | 2012-05-15 | 2019-09-17 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US9721563B2 (en) | 2012-06-08 | 2017-08-01 | Apple Inc. | Name recognition system |
US10079014B2 (en) | 2012-06-08 | 2018-09-18 | Apple Inc. | Name recognition system |
US10019994B2 (en) | 2012-06-08 | 2018-07-10 | Apple Inc. | Systems and methods for recognizing textual identifiers within a plurality of words |
US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US9971774B2 (en) | 2012-09-19 | 2018-05-15 | Apple Inc. | Voice-based media searching |
US9547647B2 (en) | 2012-09-19 | 2017-01-17 | Apple Inc. | Voice-based media searching |
US8935167B2 (en) | 2012-09-25 | 2015-01-13 | Apple Inc. | Exemplar-based latent perceptual modeling for automatic speech recognition |
US10978090B2 (en) | 2013-02-07 | 2021-04-13 | Apple Inc. | Voice trigger for a digital assistant |
US10199051B2 (en) | 2013-02-07 | 2019-02-05 | Apple Inc. | Voice trigger for a digital assistant |
US10572476B2 (en) | 2013-03-14 | 2020-02-25 | Apple Inc. | Refining a search based on schedule items |
US9977779B2 (en) | 2013-03-14 | 2018-05-22 | Apple Inc. | Automatic supplementation of word correction dictionaries |
US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
US10642574B2 (en) | 2013-03-14 | 2020-05-05 | Apple Inc. | Device, method, and graphical user interface for outputting captions |
US11388291B2 (en) | 2013-03-14 | 2022-07-12 | Apple Inc. | System and method for processing voicemail |
US10652394B2 (en) | 2013-03-14 | 2020-05-12 | Apple Inc. | System and method for processing voicemail |
US9733821B2 (en) | 2013-03-14 | 2017-08-15 | Apple Inc. | Voice control to diagnose inadvertent activation of accessibility features |
US10748529B1 (en) | 2013-03-15 | 2020-08-18 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US10078487B2 (en) | 2013-03-15 | 2018-09-18 | Apple Inc. | Context-sensitive handling of interruptions |
US9922642B2 (en) | 2013-03-15 | 2018-03-20 | Apple Inc. | Training an at least partial voice command system |
US11151899B2 (en) | 2013-03-15 | 2021-10-19 | Apple Inc. | User training by intelligent digital assistant |
US9697822B1 (en) | 2013-03-15 | 2017-07-04 | Apple Inc. | System and method for updating an adaptive speech recognition model |
US9620104B2 (en) | 2013-06-07 | 2017-04-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9966060B2 (en) | 2013-06-07 | 2018-05-08 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9633674B2 (en) | 2013-06-07 | 2017-04-25 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
US10657961B2 (en) | 2013-06-08 | 2020-05-19 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US9966068B2 (en) | 2013-06-08 | 2018-05-08 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US10185542B2 (en) | 2013-06-09 | 2019-01-22 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US9300784B2 (en) | 2013-06-13 | 2016-03-29 | Apple Inc. | System and method for emergency calls initiated by voice command |
US10791216B2 (en) | 2013-08-06 | 2020-09-29 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
US10296160B2 (en) | 2013-12-06 | 2019-05-21 | Apple Inc. | Method for extracting salient dialog usage from live data |
US9620105B2 (en) | 2014-05-15 | 2017-04-11 | Apple Inc. | Analyzing audio input for efficient speech and music recognition |
US10592095B2 (en) | 2014-05-23 | 2020-03-17 | Apple Inc. | Instantaneous speaking of content on touch devices |
US9502031B2 (en) | 2014-05-27 | 2016-11-22 | Apple Inc. | Method for supporting dynamic grammars in WFST-based ASR |
US9734193B2 (en) | 2014-05-30 | 2017-08-15 | Apple Inc. | Determining domain salience ranking from ambiguous words in natural speech |
US11257504B2 (en) | 2014-05-30 | 2022-02-22 | Apple Inc. | Intelligent assistant for home automation |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
US10169329B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Exemplar-based natural language processing |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US9966065B2 (en) | 2014-05-30 | 2018-05-08 | Apple Inc. | Multi-command single utterance input method |
US10497365B2 (en) | 2014-05-30 | 2019-12-03 | Apple Inc. | Multi-command single utterance input method |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
US10289433B2 (en) | 2014-05-30 | 2019-05-14 | Apple Inc. | Domain specific language for encoding assistant dialog |
US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
US10083690B2 (en) | 2014-05-30 | 2018-09-25 | Apple Inc. | Better resolution when referencing to concepts |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US10904611B2 (en) | 2014-06-30 | 2021-01-26 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US9668024B2 (en) | 2014-06-30 | 2017-05-30 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
US10431204B2 (en) | 2014-09-11 | 2019-10-01 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US9986419B2 (en) | 2014-09-30 | 2018-05-29 | Apple Inc. | Social reminders |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US9756185B1 (en) * | 2014-11-10 | 2017-09-05 | Teton1, Llc | System for automated call analysis using context specific lexicon |
US11556230B2 (en) | 2014-12-02 | 2023-01-17 | Apple Inc. | Data detection |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
US9711141B2 (en) | 2014-12-09 | 2017-07-18 | Apple Inc. | Disambiguating heteronyms in speech synthesis |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US11087759B2 (en) | 2015-03-08 | 2021-08-10 | Apple Inc. | Virtual assistant activation |
US10311871B2 (en) | 2015-03-08 | 2019-06-04 | Apple Inc. | Competing devices responding to voice triggers |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
US11069347B2 (en) | 2016-06-08 | 2021-07-20 | Apple Inc. | Intelligent automated assistant for media exploration |
US10354011B2 (en) | 2016-06-09 | 2019-07-16 | Apple Inc. | Intelligent automated assistant in a home environment |
US10733993B2 (en) | 2016-06-10 | 2020-08-04 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US11037565B2 (en) | 2016-06-10 | 2021-06-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US11152002B2 (en) | 2016-06-11 | 2021-10-19 | Apple Inc. | Application integration with a digital assistant |
US10269345B2 (en) | 2016-06-11 | 2019-04-23 | Apple Inc. | Intelligent task discovery |
US10297253B2 (en) | 2016-06-11 | 2019-05-21 | Apple Inc. | Application integration with a digital assistant |
US10089072B2 (en) | 2016-06-11 | 2018-10-02 | Apple Inc. | Intelligent device arbitration and control |
US10521466B2 (en) | 2016-06-11 | 2019-12-31 | Apple Inc. | Data driven natural language event detection and classification |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
US10791176B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10810274B2 (en) | 2017-05-15 | 2020-10-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
US10915227B1 (en) | 2019-08-07 | 2021-02-09 | Bank Of America Corporation | System for adjustment of resource allocation based on multi-channel inputs |
Also Published As
Publication number | Publication date |
---|---|
CA941968A (en) | 1974-02-12 |
JPS539041B1 (en) | 1978-04-03 |
DE2115258C3 (en) | 1974-01-24 |
DE2115258B2 (en) | 1973-06-07 |
DE2115258A1 (en) | 1972-05-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US3828132A (en) | Speech synthesis by concatenation of formant encoded words | |
US4912768A (en) | Speech encoding process combining written and spoken message codes | |
JP3408477B2 (en) | Semisyllable-coupled formant-based speech synthesizer with independent crossfading in filter parameters and source domain | |
EP0831460B1 (en) | Speech synthesis method utilizing auxiliary information | |
US20040073427A1 (en) | Speech synthesis apparatus and method | |
US5400434A (en) | Voice source for synthetic speech system | |
US5682502A (en) | Syllable-beat-point synchronized rule-based speech synthesis from coded utterance-speed-independent phoneme combination parameters | |
JPH0833744B2 (en) | Speech synthesizer | |
Rabiner et al. | Computer synthesis of speech by concatenation of formant-coded words | |
US5659664A (en) | Speech synthesis with weighted parameters at phoneme boundaries | |
US5163110A (en) | Pitch control in artificial speech | |
JP5175422B2 (en) | Method for controlling time width in speech synthesis | |
US7130799B1 (en) | Speech synthesis method | |
GB2284328A (en) | Speech synthesis | |
EP1543500A1 (en) | Speech synthesis using concatenation of speech waveforms | |
JPH0642158B2 (en) | Speech synthesizer | |
JPH11249676A (en) | Voice synthesizer | |
KR970003092B1 (en) | Method for constituting speech synthesis unit and sentence speech synthesis method | |
JP2573586B2 (en) | Rule-based speech synthesizer | |
JPH0863187A (en) | Speech synthesizer | |
JPH0358100A (en) | Rule type voice synthesizer | |
JPH08160991A (en) | Method for generating speech element piece, and method and device for speech synthesis | |
Eady et al. | Pitch assignment rules for speech synthesis by word concatenation | |
JPH05257494A (en) | Voice rule synthesizing system | |
JP2002244693A (en) | Device and method for voice synthesis |