US20040204935A1 - Adaptive voice playout in VOP

Publication number: US 2004/0204935 A1
Application number: US 10/081,355
Inventors: Krishnasamy Anandakumar, Alan V. McCree, Erdal Paksoy
Assignee: Texas Instruments Incorporated
Legal status: Abandoned
Related application: US 12/136,662, issued as US 7,577,565 B2

Classifications

    • G10L 19/005: Correction of errors induced by the transmission channel, if related to the coding algorithm
    • G10L 21/04: Time compression or expansion
    • G10L 25/93: Discriminating between voiced and unvoiced parts of speech signals

Definitions

  • Preferred embodiment systems incorporate preferred embodiment playout methods in receivers/decoders and may include an air interface as part or all of the communications channel.
  • the first preferred embodiment playout method schedules a playout time (decoding time) for a CELP frame in a packet as the later of (i) the packet's send time (time stamp) plus a delay threshold or (ii) the packet's arrival time.
  • the delay threshold is set so that a high percentage (e.g., 98%) of recently arrived packets (e.g., the last 200 packets) likely have a delay less than or equal to the delay threshold.
  • the delay threshold has a long-term adaptation.
  • Let the variable “playout_delay” denote the playout delay (playout time minus send time) of the current packet in the received sequence of packets; let the variable “delay” denote the delay (arrival time minus send time) of the current packet; and let the variable “estimate” be the estimated delay threshold, chosen so that the fraction of packets with delay less than or equal to “estimate” is about DELAY_PERCENTILE, a constant such as 0.98.
  • PARAM 1 is a parameter of roughly 0.5 and PARAM 2 is a parameter of roughly 0.9.
  • This choice of PARAM 1 and PARAM 2 allows “estimate” to rapidly increase but only slowly decrease.
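The exact recursion for “estimate” is not given above; the following sketch is one assumption consistent with the stated behavior (rapid increase, slow decrease): an asymmetric first-order filter in which PARAM 1 applies when the new delay exceeds the estimate and PARAM 2 applies otherwise.

```python
def update_delay_estimate(estimate, delay, param1=0.5, param2=0.9):
    """Hypothetical asymmetric smoothing of the delay-threshold estimate.

    With param1 ~ 0.5 the estimate jumps roughly halfway toward a larger
    delay; with param2 ~ 0.9 it moves only 10% toward a smaller one, so
    it rises rapidly but decays slowly, as described in the text."""
    if delay > estimate:
        return param1 * estimate + (1 - param1) * delay  # rapid increase
    return param2 * estimate + (1 - param2) * delay      # slow decrease
```

With these parameters, `update_delay_estimate(100.0, 200.0)` jumps to 150.0, while `update_delay_estimate(100.0, 0.0)` only drops to 90.0.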
  • the delay threshold derives from a histogram of the delays of the NUM_OF_PACKETS (e.g., 200) previously-arrived packets. Indeed, let “delays[]” be an array of size NUM_OF_PACKETS with entries being the delays of recently arrived packets; treat the array as a circular FIFO with read/write position indicator for the current packet as the variable “position”.
  • DELAY_STEP_SIZE is the delay quantization step size, e.g., 1 ms.
  • number_of_packets_played is the number of packets within the NUM_OF_PACKETS recently-arrived packets which would have delays that are less than or equal to the current delay threshold.
  • the spike causes a short-term (following “delay”) jump of “playout_delay” to 180 which then long-term (following “estimate”) persists for 200 packets (the size of the histogram) and then slowly decays back to 130.
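The circular-FIFO histogram method above can be sketched as follows. This illustrative version recounts the stored delays directly rather than maintaining an incremental histogram and a running `number_of_packets_played` count as the text describes; the function packaging is an assumption.

```python
def histogram_estimate(delays, position, new_delay, percentile=0.98, step=1):
    """Overwrite the oldest stored delay (quantized to `step`, mimicking
    DELAY_STEP_SIZE), advance the circular position, and return the
    smallest threshold covering at least `percentile` of the stored
    delays (mimicking DELAY_PERCENTILE over NUM_OF_PACKETS packets)."""
    delays[position] = (new_delay // step) * step
    position = (position + 1) % len(delays)
    target = percentile * len(delays)
    estimate = 0
    while sum(1 for d in delays if d <= estimate) < target:
        estimate += step
    return delays, position, estimate
```

For example, with 199 stored delays of 100 ms and one slot overwritten by a 180 ms spike, the 98th-percentile estimate stays at 100; the estimate only rises once enough large delays accumulate in the FIFO.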
  • the playout delay for the current frame is an increase over the playout delay for the prior frame, so the preferred embodiments expand the prior frame to fill the gap. This expansion applies whether the prior frame is active speech or silence. Conversely, when the playout delay is to decrease, such as when “delay” drops below “estimate”, then if the packet has a frame of silence, the current frame is compressed; otherwise, if the packet has a frame of active speech, the decrease is deferred until a frame of silence occurs.
  • variable “modification” sets the decoding to expand, compress, or not change the decoded frame length from the encoded frame length of 10 ms.
  • For EXPANSION, invoke a frame expansion method as described in the following section; for SILENCE_COMPRESSION, truncate the (silence) frame (e.g., truncate the excitation) by the amount “playout_delay” − “new_playout_delay”. If this truncation exceeds the frame length, then extend it into subsequent silence frames. Further, for an active speech frame with NO_MODIFICATION, the compression is pushed to the next packet by increasing “playout_delay” for the next packet to equal “playout_delay” for the current packet.
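The scheduling decision described above can be sketched as follows; the variable names follow the text, while the function packaging and return convention are illustrative.

```python
def schedule_frame(send_time, arrival_time, estimate,
                   prev_playout_delay, is_silence):
    """Playout time is the later of send_time + estimate and the arrival
    time. The playout delay may grow at any time (prior frame expanded);
    a decrease is applied only to a silence frame and is otherwise
    deferred by carrying the previous playout delay forward."""
    playout_delay = max(estimate, arrival_time - send_time)
    if playout_delay > prev_playout_delay:
        return playout_delay, "EXPANSION"
    if playout_delay < prev_playout_delay:
        if is_silence:
            return playout_delay, "SILENCE_COMPRESSION"
        return prev_playout_delay, "NO_MODIFICATION"  # defer the decrease
    return playout_delay, "NO_MODIFICATION"
```

For instance, a late arrival (arrival delay above the estimate) triggers EXPANSION of the prior frame, while a drop in the estimate during active speech returns NO_MODIFICATION and keeps the old playout delay until silence.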
  • Some frame expansion methods include gain attenuation and bandwidth expansion, and in this case the gap at the onset of a large delay spike is filled with a sequence of fading versions of the last timely-arrived frame prior to the spike.
  • Preferred embodiment frame expansion methods first perform a voicing classification of the frame and then expand accordingly. The classification may compare a correlation measure against a threshold (e.g., 0.7 as in the G.729 postfilter) and may also use a peakiness measure (the ratio of the L2 norm to the L1 norm).
  • the first method treats a transition frame as a voiced frame and follows the foregoing description for a voiced frame expansion.
  • the second method expands only the initial unvoiced portion of the frame and follows the foregoing description of unvoiced frame expansion.
  • the frame to be expanded is not fully played out; rather, the voiced latter portion is delayed, and the expansion repeats use the first subframe LP parameters.
  • the second method requires some look-ahead to see that an expansion will be needed and then to prevent the final voiced portion from being played out until needed.
  • Alternate preferred embodiments for (1)-(3) attenuate the adaptive and fixed codebook gains by 1 dB for each 10 ms of expansion and apply bandwidth expansion to the LP coefficients. This gradually mutes the frame expansion for long expansions. Indeed, many detail variations may be used, including dropping the fixed-codebook contribution to the excitation for a periodic frame, dropping the adaptive-codebook contribution and using a random fixed-codebook vector for a nonperiodic frame, separate attenuation rates of adaptive and fixed codebook gains, incrementing the pitch delay during expansion, and so forth.
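The expansion rule and the 1 dB per 10 ms attenuation of the alternate embodiment can be sketched together; the function below is illustrative and works in milliseconds.

```python
import math

def expansion_plan(gap_ms, voiced, pitch_delay_ms):
    """Choose an expansion length covering gap_ms: voiced frames expand
    in whole multiples of the pitch delay (preserving phase alignment),
    unvoiced frames expand by exactly the gap. Returns the length and a
    linear gain scale for 1 dB of attenuation per 10 ms of expansion."""
    if voiced:
        length = pitch_delay_ms * math.ceil(gap_ms / pitch_delay_ms)
    else:
        length = gap_ms
    gain_scale = 10.0 ** (-(length / 10.0) / 20.0)  # 1 dB per 10 ms
    return length, gain_scale
```

As in FIG. 1, a voiced frame with a 10 ms pitch delay covering a 25 ms gap expands by 30 ms (three pitch periods), whereas an unvoiced frame expands by exactly 25 ms.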
  • the frame expansion preferred embodiments may be used with playout methods other than the preferred embodiment described in the foregoing.
  • Methods to synchronize voice with other media or adapt voice packet-rate when speech truncation is needed may use preferred embodiment truncation methods which are analogous to the foregoing speech expansion methods. (1) If the speech is voiced, it is truncated only in integer multiples of the pitch period; and (2) if the speech is unvoiced (including silences), no constraint on truncation is applied.
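The truncation analog of rules (1)-(2) above can be sketched in the same style; the function is an illustrative assumption, again in milliseconds.

```python
import math

def truncation_length(excess_ms, voiced, pitch_period_ms):
    """Voiced speech is truncated only in whole pitch periods (round the
    requested amount down to a multiple of the pitch period); unvoiced
    speech, including silence, may be truncated by any amount."""
    if voiced:
        return pitch_period_ms * math.floor(excess_ms / pitch_period_ms)
    return excess_ms
```

Note the asymmetry with expansion: truncation rounds down to a pitch multiple (never removing more than requested), while expansion rounds up (never leaving a gap).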
  • FIGS. 5-6 show in functional block form preferred embodiment systems which use a preferred embodiment playout method, for both speech and other signals that can be effectively CELP coded.
  • communications systems users (transmitters and/or receivers) could include one or more digital signal processors (DSPs) and/or other programmable devices such as RISC processors with stored programs, or specialized circuitry (ASICs), for performance of the signal processing of a preferred embodiment method.
  • Users may also contain analog and/or mixed-signal integrated circuits for amplification or filtering of inputs to or outputs from a communications channel and for conversion between analog and digital. Such analog and digital circuits may be integrated on a single die.
  • the stored programs may, for example, be in ROM or flash EEPROM or FeRAM which is integrated with the processor or external to the processor.
  • Antennas may be parts of receivers with multiple finger RAKE detectors for air interface to networks such as the Internet.
  • Exemplary DSP cores could be in the TMS320C6xxx or TMS320C5xxx families from Texas Instruments.
  • the preferred embodiments may be modified in various ways while retaining one or more of the features of playout delay increase during a talkspurt but a decrease only during silence, and voiced frame expansion by multiples of the pitch delay.
  • the frame voicing classification may have more classes, with two or more classes leading to frame expansions in multiples of the pitch delay but with differing excitations; interval (frame and subframe) size and sampling rate could differ; various gain attenuation rates and bandwidth expansion factors could be used; the CELP encoding may be layered (successively more bits for higher layers) and the playout frame expansion may use only the lower layers; and so forth.

Abstract

Packetized CELP-encoded speech playout with frame truncation only during silence, and a frame expansion method dependent upon voicing classification, with voiced-frame expansion maintaining phase alignment.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application claims priority from provisional application Serial No. 60/270,264, filed Feb. 21, 2001.[0001]
  • BACKGROUND OF THE INVENTION
  • The invention relates to electronic devices, and more particularly to speech coding, transmission, and decoding/synthesis methods and circuitry. [0002]
  • The performance of digital speech systems using low bit rates has become increasingly important with current and foreseeable digital communications. Both dedicated channel and packetized-over-network (e.g., Voice over IP or Voice over Packet) transmissions benefit from compression of speech signals. The widely-used linear prediction (LP) digital speech coding compression method models the vocal tract as a time-varying filter and a time-varying excitation of the filter to mimic human speech. Linear prediction analysis determines LP coefficients a_i, i=1, 2, . . . , M, for an input frame of digital speech samples {s(n)} by setting [0003]
  • r(n) = s(n) + Σ_{M≥i≥1} a_i s(n−i)  (1)
  • and minimizing the energy Σ r(n)² of the residual r(n) in the frame. Typically, M, the order of the linear prediction filter, is taken to be about 10-12; the sampling rate to form the samples s(n) is typically taken to be 8 kHz (the same as the public switched telephone network sampling for digital transmission); and the number of samples {s(n)} in a frame is typically 80 or 160 (10 or 20 ms frames). A frame of samples may be generated by various windowing operations applied to the input speech samples. The name “linear prediction” arises from the interpretation of r(n) = s(n) + Σ_{M≥i≥1} a_i s(n−i) as the error in predicting s(n) by the linear combination of preceding speech samples −Σ_{M≥i≥1} a_i s(n−i). Thus minimizing Σ r(n)² yields the {a_i} which furnish the best linear prediction for the frame. The coefficients {a_i} may be converted to line spectral frequencies (LSFs) for quantization and transmission or storage and converted to line spectral pairs (LSPs) for interpolation between subframes. [0004]
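The LP analysis above can be sketched numerically. This autocorrelation-method example is illustrative (not from the patent text) and follows the sign convention of equation (1), so the normal equations take the form Σ_j a_j R(|i−j|) = −R(i).

```python
import numpy as np

def lp_coefficients(frame, M=10):
    """Estimate the LP coefficients {a_i} minimizing the residual energy
    sum r(n)^2 with r(n) = s(n) + sum_i a_i s(n-i), via the
    autocorrelation method (Toeplitz normal equations)."""
    n = len(frame)
    # autocorrelation lags R[0..M]
    R = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(M + 1)])
    # normal equations: sum_j a_j R[|i-j|] = -R[i], i = 1..M
    T = np.array([[R[abs(i - j)] for j in range(M)] for i in range(M)])
    return np.linalg.solve(T, -R[1:])

def lp_residual(frame, a):
    """r(n) = s(n) + sum_{i=1..M} a_i s(n-i), assuming zero history
    before the frame (no windowing, for simplicity)."""
    M = len(a)
    padded = np.concatenate([np.zeros(M), frame])
    return np.array([padded[M + n] +
                     sum(a[i] * padded[M + n - 1 - i] for i in range(M))
                     for n in range(len(frame))])
```

Applied to a synthetic autoregressive signal, the estimated {a_i} recover the generating coefficients (up to sign convention) and the residual energy falls well below the signal energy.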
  • The {r(n)} is the LP residual for the frame, and ideally the LP residual would be the excitation for the synthesis filter 1/A(z), where A(z) is the transfer function of equation (1). Of course, the LP residual is not available at the decoder; thus the task of the encoder is to represent the LP residual so that the decoder can generate an excitation which emulates the LP residual from the encoded parameters. Physiologically, for voiced frames the excitation roughly has the form of a series of pulses at the pitch frequency, and for unvoiced frames the excitation roughly has the form of white noise. [0005]
  • The LP compression approach basically only transmits/stores updates for the (quantized) filter coefficients, the (quantized) residual (waveform or parameters such as pitch), and (quantized) gain(s). A receiver decodes the transmitted/stored items and regenerates the input speech with the same perceptual characteristics. FIGS. 5-6 illustrate high level blocks of an LP system. Periodic updating of the quantized items requires fewer bits than direct representation of the speech signal, so a reasonable LP coder can operate at bit rates as low as 2-3 kb/s (kilobits per second). [0006]
  • The ITU standard G.729 uses frames of 10 ms length (80 samples) divided into two 5-ms 40-sample subframes for better tracking of pitch and gain parameters plus reduced codebook search complexity. Each subframe has an excitation represented by an adaptive-codebook contribution and a fixed (algebraic) codebook contribution. The adaptive-codebook contribution provides periodicity in the excitation and is the product of v(n), the prior frame's excitation translated by the current frame's pitch delay in time and interpolated, multiplied by a gain, g_P. The fixed codebook contribution approximates the difference between the actual residual and the adaptive codebook contribution with a four-pulse vector, c(n), multiplied by a gain, g_C. Thus the excitation is u(n) = g_P v(n) + g_C c(n), where v(n) comes from the prior (decoded) frame and g_P, g_C, and c(n) come from the transmitted parameters for the current frame. FIGS. 3-4 illustrate the encoding and decoding in block format; the postfilter essentially emphasizes any periodicity (e.g., vowels). [0007]
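The excitation construction u(n) = g_P v(n) + g_C c(n) can be sketched as follows. This illustrative version handles only an integer pitch delay (G.729 itself also interpolates fractional delays) and omits the codebook searches.

```python
import numpy as np

def subframe_excitation(prev_exc, pitch_delay, g_p, fixed_vec, g_c):
    """Build one subframe of excitation: the adaptive-codebook term v(n)
    is copied from the running excitation pitch_delay samples back
    (so delays shorter than the subframe re-read just-written samples),
    and the fixed-codebook vector c(n) is added with gain g_c."""
    L = len(fixed_vec)                       # subframe length (40 in G.729)
    exc = np.concatenate([prev_exc, np.zeros(L)])
    start = len(prev_exc)
    for n in range(L):
        exc[start + n] = g_p * exc[start + n - pitch_delay] + g_c * fixed_vec[n]
    return exc[start:]
```

With g_C = 0 and g_P = 1 the output is a pure pitch-period repetition of the prior excitation, which is exactly the behavior exploited by the erasure concealment described below.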
  • However, high error rates in wireless transmission and large packet losses/delays for network transmissions demand that an LP decoder handle frames which arrive too late for playout or in which so many bits are corrupted that the frame is ignored (erased). To maintain speech quality and intelligibility for wireless or voice-over-packet applications in the case of lost or erased frames, the decoder typically has methods to conceal such frame erasures. In particular, G.729 handles frame erasures and lost frames by reconstruction based on previously received information; that is, repetition-based concealment. Namely, replace the missing excitation signal with one of similar characteristics, while gradually decaying its energy by using a voicing classifier based on the long-term prediction gain (which is computed as part of the long-term postfilter analysis). The long-term postfilter finds the long-term predictor for which the prediction gain is more than 3 dB by using a normalized correlation greater than 0.5 in the optimal delay determination. For the error concealment process, a 10 ms frame is declared periodic if at least one 5 ms subframe has a long-term prediction gain of more than 3 dB. Otherwise the frame is declared nonperiodic. An erased frame inherits its class from the preceding (reconstructed) speech frame. Note that the voicing classification is continuously updated based on this reconstructed speech signal. The specific steps taken for an erased frame are as follows: [0008]
  • 1) Repetition of the synthesis filter parameters. The LP parameters of the last good frame are used. [0009]
  • 2) Attenuation of adaptive and fixed-codebook gains. The adaptive-codebook gain is based on an attenuated version of the previous adaptive-codebook gain: if the (m+1)st frame is erased, use g_P^(m+1) = 0.9 g_P^(m). Similarly, the fixed-codebook gain is based on an attenuated version of the previous fixed-codebook gain: g_C^(m+1) = 0.98 g_C^(m). [0010]
  • 3) Attenuation of the memory of the gain predictor. The gain predictor for the fixed-codebook gain uses the energy of the previously selected fixed-codebook vectors c(n), so to avoid transitional effects once good frames are received, the memory of the gain predictor is updated with an attenuated version of the average codebook energy over four prior frames. [0011]
  • 4) Generation of the replacement excitation. The excitation used depends upon the periodicity classification. If the last reconstructed frame was classified as periodic, the current frame is considered to be periodic as well. In that case only the adaptive-codebook contribution is used, and the fixed-codebook contribution is set to zero. The pitch delay is based on the integer part of the pitch delay in the previous frame, and is repeated for each successive frame. To avoid excessive periodicity the pitch delay value is increased by one for each next subframe but bounded by 143 (pitch frequency of 70 Hz). In contrast, if the last reconstructed frame was classified as nonperiodic, the current frame is considered to be nonperiodic as well, and the adaptive codebook contribution is set to zero. The fixed-codebook contribution is generated by randomly selecting a codebook index and sign index. The frame classification allows the use of different decay factors for different types of excitation (e.g., 0.9 for periodic and 0.98 for nonperiodic gains). FIG. 2 illustrates the decoder with concealment parameters. [0012]
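Steps 2) and 4) above can be sketched together. The gain factors (0.9, 0.98) and the pitch-delay cap of 143 come from the text; the function packaging itself is illustrative.

```python
def conceal_erased_frame(g_p, g_c, pitch_delay, num_subframes=2):
    """One erased frame of G.729-style concealment: attenuate the
    adaptive- and fixed-codebook gains, and repeat the integer pitch
    delay, incrementing it by one for each subframe but bounding it
    at 143 (a pitch frequency of 70 Hz at 8 kHz sampling)."""
    g_p, g_c = 0.9 * g_p, 0.98 * g_c
    delays = []
    for _ in range(num_subframes):
        delays.append(pitch_delay)
        pitch_delay = min(pitch_delay + 1, 143)
    return g_p, g_c, delays
```

Calling this once per erased frame reproduces the geometric gain decay (0.9^k and 0.98^k after k erased frames) while the pitch delay drifts upward toward the 143-sample bound.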
  • A major challenge in voice over packet (VOP) receivers is provision for continuous playout of voice packets in the presence of variable network delays (jitter). Continuous playout is often achieved by buffering the received voice packets for sufficient time so that most of the packets are likely received before their scheduled playout times. The playout delay due to buffering can be a fixed delay throughout the duration of a voice call or may adaptively vary during the call. Playout delay trades off packet losses (packet arrival after scheduled playout time) against overall delay. A very large playout delay ensures minimal packet losses but long gaps between messages and replies in two-way calls; conversely, small playout delays lead to large packet losses but truly real-time conversation. [0013]
  • R. Ramjee et al, Adaptive Playout Mechanisms for Packetized Audio Applications in Wide-Area Networks, Proc. INFOCOM'94, 5d.3 (1994) and S. Moon et al, Packet Audio Playout Delay Adjustment: Performance Bounds and Algorithms, 5 ACM/Springer Multimedia Systems 17 (1998) schedule the playout times for all speech packets in a talkspurt (interval of essentially continuous active speech) at the beginning of the talkspurt. In particular, the playout time for the first packet in a talkspurt is set during the silence preceding the talkspurt, and the playout times of subsequent packets are just increments of the 20 ms frame length. The playout time of the first packet is taken as the then-current playout delay which can be determined by various algorithms: one approach has a normal mode and a spike mode. The normal mode would determine a playout delay as the sum of the filtered arrival delay of previously-arrived packets and a multiple of the filtered variation of the arrival delays; this provides a long-term adaptation. Spike mode would determine a playout delay from the arrival delay of the last-arrived packet during a delay spike (which is detected by a large gradient in packet delays); this provides a short-term adaptation. Ramjee et al and Moon et al use packets consisting of 160 PCM samples of speech sampled at the usual 8 kHz; that is, packetize 20 ms frames. And packets arriving after their scheduled playout times are discarded. Thus a large delay spike occurring within a talkspurt implies a sequence of lost packets because the playout delay cannot adjust until the next talkspurt. [0014]
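The normal-mode estimate described above can be sketched as an exponentially weighted filter of the arrival delays plus a multiple of the filtered deviation; the constants alpha and k below are illustrative assumptions, not quoted from the cited papers.

```python
def normal_mode_playout(d_hat, v_hat, delay, alpha=0.998002, k=4):
    """Normal-mode long-term adaptation: update the filtered mean
    arrival delay d_hat and the filtered mean deviation v_hat with a
    new packet's delay, and return the playout delay d_hat + k*v_hat."""
    d_hat = alpha * d_hat + (1 - alpha) * delay
    v_hat = alpha * v_hat + (1 - alpha) * abs(delay - d_hat)
    return d_hat, v_hat, d_hat + k * v_hat
```

The large alpha makes the estimate react very slowly, which is exactly why a separate spike mode (tracking the last arrival delay directly) is needed for short-term adaptation.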
  • Leung et al, Speech Coding over Frame Relay Networks, Proc. IEEE Workshop on Speech Coding for Telecommunications 75 (1993) describes an adaptive playout time for a CELP decoder which adjusts the playout delay by a fraction of the error of the current frame (packet) arrival delay from a target arrival delay. The playout delay is adjusted during a talkspurt by either extending or truncating the CELP excitation of a frame to one of the fixed lengths; that is, a 20 ms frame can be truncated to 10 or 15 ms only or extended to 25 or 40 ms only. Larger playout delay adjustments can be made only during silence frames. Also, lost or discarded packets (arriving after their playout times) can be reconstructed by using the CELP parameters of the last good frame together with bandwidth expansion of the LP coefficients and attenuation of the gains at 4 dB for successive reconstructed frames. Thus with a large delay spike (a sequence of late arriving and thus discarded frames), the initial lost frames could be reconstructions of the last good frame but would attenuate to essentially silence within a few frames. For isolated lost frames, both the prior and subsequent good frames can be used for two-sided prediction (e.g., interpolation) of CELP parameters for reconstruction. [0015]
  • However, these playout delay methods have poor results. [0016]
  • SUMMARY OF THE INVENTION
  • The present invention provides packetized CELP playout delay with short-term plus long-term adaptations together with adjustments during a talkspurt limited to frame expansions. Also, frame classification leads to alternative frame expansion methods. [0017]
  • This has advantages including improved performance for adaptive playout delay methods. [0018]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 illustrates a preferred embodiment. [0019]
  • FIG. 2 shows known CELP decoder concealment. [0020]
  • FIGS. 3-4 are block diagrams of known CELP encoder and decoder. [0021]
  • FIGS. 5-6 illustrate systems. [0022]
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • 1. Overview [0023]
  • Preferred embodiment decoders and methods for playout buffer timing in CELP-encoded speech received as packets or frames over a network have one or more of the features: (1) playout delay determined by short-term plus long-term adaptations where the adaptation during a talkspurt is limited to frame expansion, (2) frame expansions for voiced frames in multiples of the pitch delay but unconstrained for unvoiced frames, (3) frame expansions for a transition frame either as a voiced frame or as only the unvoiced portion. The frame expansions use CELP parameters and, optionally, add bandwidth expansion and gain attenuation. The methods minimize the playout delay for better interactive performance while insuring all received packets get played out. [0024]
  • FIG. 1 illustrates a preferred embodiment expansion for a voiced frame: the packet with frame m+1 is delayed in the network, so the voiced frame m expands by three multiples of its pitch delay, T(m), to allow frame m+1 playout without a gap and with phase alignment due to expansion in multiples of the pitch delay. [0025]
  • Applications of the preferred embodiment playout methods extend to include hybrid packet-rate adaptation for congestion control, and voice-synchronization with other media. In particular, for hybrid packet-rate adaptation, a decrease in packet-rate (number of packets sent per second) occurs during both silence periods and active speech, but an increase in packet-rate occurs only during silence periods. The step of decreasing packet-rate during active speech uses the speech frame expansion at the receiver for handling playout delay change, and the preferred embodiment methods (frame voicing classification determining expansion approach) apply. Similarly, to synchronize packetized speech and video or other streaming media, the speech playout may be adjusted by the preferred embodiment methods using the video as the time base. [0026]
  • Preferred embodiment systems (e.g., Voice over IP or Voice over Packet) incorporate preferred embodiment playout methods in receivers/decoders and may include an air interface as part or all of the communications channel. [0027]
  • 2. First Preferred Embodiment Playout Timing [0028]
  • To illustrate the first preferred embodiment playout buffer method, presume a received sequence of packets with each packet containing a CELP-encoded 10 ms frame of speech (or silence between talkspurts) and a send time stamp so the position of a packet in the sequence can be determined at the receiver regardless of the order of receipt or delay of individual packets. Thus a 10-minute conversation using such VOP corresponds to a sequence of 60000 packets received (and also 60000 sent in the other direction, one every 10 ms) with typically more than half of the packets containing a frame of silence (background noise during pauses or while the other conversant is talking). [0029]
  • The first preferred embodiment playout method schedules a playout time (decoding time) for a CELP frame in a packet as the later of (i) the packet's send time (time stamp) plus a delay threshold or (ii) the packet's arrival time. The delay threshold is set so that a high percentage (e.g., 98%) of recently arrived packets (e.g., the last 200 packets) likely have a delay less than or equal to the delay threshold. The delay threshold has a long-term adaptation. In more detail, let the variable “playout_delay” denote the playout delay (playout time minus send time) of the current packet in the received sequence of packets, the variable “delay” denote the delay (arrival time minus send time) of the current packet, and the variable “estimate” be the estimated delay threshold which has the percentage of packets with delay less than or equal to “estimate” at about DELAY_PERCENTILE, a constant such as 0.98. Then the playout delay derives from: [0030]
    if (delay > playout_delay)
      playout_delay = delay;    /* immediate delay increase (short term) */
    else
      playout_delay = estimate; /* long term */
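  • As a concrete sketch of the scheduling rule above (times in milliseconds; the helper name and calling convention are illustrative, not from the specification):

```c
/* Sketch of the playout scheduling rule above; times are in ms.
   Updates the playout delay in place and returns the scheduled
   playout time (send time plus playout delay). */
long playout_time(long send, long arrival, long estimate, long *playout_delay)
{
    long delay = arrival - send;      /* packet's network delay */
    if (delay > *playout_delay)
        *playout_delay = delay;       /* immediate delay increase (short term) */
    else
        *playout_delay = estimate;    /* track the delay percentile (long term) */
    return send + *playout_delay;
}
```

  • Note that when the delay exceeds the current playout delay, send + delay equals the arrival time, so a late packet is played out immediately on arrival rather than discarded.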
  • And the filtering to generate “estimate” from “delay_threshold” is: [0031]
    if (delay_threshold > estimate)
      estimate = PARAM1*estimate + (1-PARAM1)*delay_threshold;
    else
      estimate = PARAM2*estimate + (1-PARAM2)*delay_threshold;
  • where PARAM1 is a parameter of roughly 0.5 and PARAM2 is a parameter of roughly 0.9. This choice of PARAM1 and PARAM2 allows "estimate" to increase rapidly but decrease only slowly. The delay threshold derives from a histogram of the delays of the NUM_OF_PACKETS (e.g., 200) previously-arrived packets. Indeed, let "delays[]" be an array of size NUM_OF_PACKETS with entries being the delays of recently arrived packets; treat the array as a circular FIFO with the read/write position for the current packet held in the variable "position". The histogram of delays is the array "distribution_fcn[]" with each delay quantized to the nearest 1 ms; the array has size 1000, so the histogram delay saturates at 1000 ms. [0032]
     /* Remove the oldest delay value */
     if (delays[position] <= delay_threshold)
      num_of_packets_played -= 1;
     distribution_fcn[delays[position]] -= 1;
     /* Record the current packet delay */
     delays[position] = delay;
     /* Update the delay distribution with the current packet delay */
     if (delays[position] <= delay_threshold)
      num_of_packets_played += 1;
     distribution_fcn[delays[position]] += 1;
     /* Update the delay threshold */
     while (num_of_packets_played >
       NUM_OF_PACKETS*DELAY_PERCENTILE) {
      num_of_packets_played -= distribution_fcn[delay_threshold];
      delay_threshold -= DELAY_STEP_SIZE;
     }
     while (num_of_packets_played <
       NUM_OF_PACKETS*DELAY_PERCENTILE) {
      delay_threshold += DELAY_STEP_SIZE;
      num_of_packets_played += distribution_fcn[delay_threshold];
     }
     /* Update position */
     position = (position+1)%NUM_OF_PACKETS;
  • where DELAY_STEP_SIZE is the delay quantization step size, e.g., 1 ms. The variable "num_of_packets_played" is the number of packets among the NUM_OF_PACKETS recently-arrived packets whose delays are less than or equal to the current delay threshold. [0033]
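  • A self-contained C sketch of this histogram update (the struct wrapper, the initialization, and the saturation clamp are assumptions added to make the fragment runnable; the field names follow the pseudocode):

```c
#include <string.h>

#define NUM_OF_PACKETS   200
#define HIST_SIZE        1000   /* delays quantized to 1 ms, saturating at 1000 ms */
#define DELAY_STEP_SIZE  1
#define DELAY_PERCENTILE 0.98

struct delay_tracker {
    int delays[NUM_OF_PACKETS];      /* circular FIFO of recent packet delays */
    int distribution_fcn[HIST_SIZE]; /* histogram of those delays */
    int position;
    int num_of_packets_played;       /* count of delays <= delay_threshold */
    int delay_threshold;
};

void tracker_init(struct delay_tracker *t)
{
    memset(t, 0, sizeof *t);
    /* all recorded delays start at 0 ms */
    t->distribution_fcn[0] = NUM_OF_PACKETS;
    t->num_of_packets_played = NUM_OF_PACKETS;
}

void tracker_update(struct delay_tracker *t, int delay)
{
    if (delay >= HIST_SIZE)
        delay = HIST_SIZE - 1;       /* saturate the histogram */
    /* remove the oldest delay value */
    if (t->delays[t->position] <= t->delay_threshold)
        t->num_of_packets_played -= 1;
    t->distribution_fcn[t->delays[t->position]] -= 1;
    /* record the current packet delay */
    t->delays[t->position] = delay;
    if (delay <= t->delay_threshold)
        t->num_of_packets_played += 1;
    t->distribution_fcn[delay] += 1;
    /* re-seek the DELAY_PERCENTILE point of the histogram */
    while (t->num_of_packets_played > NUM_OF_PACKETS * DELAY_PERCENTILE) {
        t->num_of_packets_played -= t->distribution_fcn[t->delay_threshold];
        t->delay_threshold -= DELAY_STEP_SIZE;
    }
    while (t->num_of_packets_played < NUM_OF_PACKETS * DELAY_PERCENTILE) {
        t->delay_threshold += DELAY_STEP_SIZE;
        t->num_of_packets_played += t->distribution_fcn[t->delay_threshold];
    }
    t->position = (t->position + 1) % NUM_OF_PACKETS;
}
```

  • The two while loops re-seek the DELAY_PERCENTILE point of the histogram: the first walks the threshold down while too many recent delays fall at or below it, and the second walks it back up until the percentile is covered.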
  • As an example of the foregoing, presume an initial normal distribution of packet delays with mean 110 ms and standard deviation 10 ms, so 98% of the delays would be less than or equal to about 130 ms. If 200 packet delays are used in the histogram, then 4 of the 200 delays would be expected to exceed 130 ms. Indeed, presume the six largest delays are 129, 130, 132, 135, 140, and 147 ms; then the value of "delay_threshold" would equal 130. If such a distribution of delays had persisted for a time, then "estimate" would also equal 130, and the scheduled playout time for a timely packet would be the packet's send time plus 130 ms. Now presume a delay spike in which the packet delay jumps to 180 ms for 10 consecutive packets and then drops back to the distribution about 110 ms. For this spike "playout_delay" would immediately jump to 180 (short-term adaptation), and thus the playout times for these 10 packets would be scheduled as the arrival time, which equals the send time plus 180 ms. Simultaneously, "delay_threshold" would increase from 130 to 132 to 135 to 140 to 147 to 180 and stay there for 200 packets before dropping back at roughly the same rate to about 130. And as "delay_threshold" increases, "estimate" also increases, but more slowly due to the filtering; in particular, "estimate" increases (rounding to integers) from 130 to 131 (=0.5*130+0.5*132) to 133 (=0.5*131+0.5*135) to 137 (≈0.5*133+0.5*140) to 142 (=0.5*137+0.5*147) to 161 (≈0.5*142+0.5*180) and so forth as "estimate" asymptotically approaches 180. Then 200 packets after the delay spike the histogram begins losing the 10 delays of 180 and "estimate" begins to slowly drop. Indeed, if "delay_threshold" drops from 180 to 147 to 140 to 135 to 132 to a sequence of 130s, then "estimate" decreases from 180 to 176 (≈0.9*180+0.1*147) to 172 (≈0.9*176+0.1*140) to 168 (≈0.9*172+0.1*135) to 164 (≈0.9*168+0.1*132) to 160 (≈0.9*164+0.1*130) and so forth to slowly asymptotically approach 130. 
In summary, the spike causes a short-term (following “delay”) jump of “playout_delay” to 180 which then long-term (following “estimate”) persists for 200 packets (the size of the histogram) and then slowly decays back to 130. [0034]
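  • The numeric chain above can be reproduced with a one-step C sketch of the "estimate" filter (PARAM1 = 0.5 and PARAM2 = 0.9 as in the text):

```c
/* One step of the asymmetric "estimate" filter: fast attack when the
   delay threshold rises above the estimate, slow decay when it falls
   below it (PARAM1 = 0.5, PARAM2 = 0.9 as in the text). */
#define PARAM1 0.5
#define PARAM2 0.9

double update_estimate(double estimate, double delay_threshold)
{
    if (delay_threshold > estimate)
        return PARAM1*estimate + (1 - PARAM1)*delay_threshold;
    return PARAM2*estimate + (1 - PARAM2)*delay_threshold;
}
```

  • Starting from 130 and feeding thresholds 132, 135, 140, ..., the unrounded values are 131, 133, 136.5, ..., matching the rounded sequence in the example, while a single step down from 180 toward 147 only reaches about 176.7.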
  • When the playout delay for the current frame is an increase over the playout delay for the prior frame, there will be a time gap between the end of the prior frame and the beginning of the current frame, so the preferred embodiments expand the prior frame to fill the gap. This expansion applies whether the prior frame is active speech or silence. Contrarily, when the playout delay is to decrease, such as when “delay” drops below “estimate”, then if the packet has a frame of silence, the current frame is compressed; otherwise if the packet has a frame of active speech, the decrease is put off until a frame of silence occurs. In particular, with the variable “new_playout_delay” for the current frame equal to “playout_delay” for the next frame, modification of the current frame decoding follows as: [0035]
     if (new_playout_delay > playout_delay) { /* playout delay increase */
      modification = EXPANSION;
     }
     else if (new_playout_delay < playout_delay) { /* playout delay decrease */
      if (type == ACTIVE_SPEECH)
       modification = NO_MODIFICATION;
      else
       modification = SILENCE_COMPRESSION;
     }
  • In the foregoing the variable “modification” sets the decoding to expand, compress, or not change the decoded frame length from the encoded frame length of 10 ms. Indeed, for EXPANSION invoke a frame expansion method as described in the following section, and for SILENCE_COMPRESSION truncate the (silence) frame (e.g., truncate the excitation) by the amount “playout_delay”−“new_playout_delay”. If this truncation exceeds the frame length, then extend to subsequent silence frames. Further, for an active speech frame with NO_MODIFICATION, the compression is pushed to the next packet by increasing “playout_delay” for the next packet to equal “playout_delay” for the current packet. [0036]
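  • The decision above, including the deferral of a delay decrease past an active-speech frame, can be sketched as follows (the enum and function names are illustrative):

```c
/* Sketch of the per-frame modification decision above.  On a delay
   decrease during active speech, the decrease is pushed to the next
   packet by carrying the current playout delay forward. */
enum frame_type   { ACTIVE_SPEECH, SILENCE };
enum modification { NO_MODIFICATION, EXPANSION, SILENCE_COMPRESSION };

enum modification decide_modification(long playout_delay,
                                      long *new_playout_delay,
                                      enum frame_type type)
{
    if (*new_playout_delay > playout_delay)
        return EXPANSION;                       /* fill the gap before the next frame */
    if (*new_playout_delay < playout_delay) {
        if (type == ACTIVE_SPEECH) {
            *new_playout_delay = playout_delay; /* defer decrease to next packet */
            return NO_MODIFICATION;
        }
        return SILENCE_COMPRESSION;             /* truncate the silence frame */
    }
    return NO_MODIFICATION;
}
```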
  • Alternative preferred embodiment methods for frame expansion may be used as described in the next section. Some frame expansion methods include gain attenuation and bandwidth expansion, and in this case the gap at the onset of a large delay spike is filled with a sequence of fading versions of the last timely-arrived frame prior to the spike. [0037]
  • 3. Preferred Embodiment Frame Expansions [0038]
  • Preferred embodiment frame expansion methods first perform a voicing classification of the frame and then expand accordingly. Thus classify a frame as (1) voiced if the normalized correlation is larger than a threshold (e.g., 0.7 as in the G.729 postfilter) or if the peakiness measure (ratio of L2 norm to L1 norm) is larger than a threshold (=1.3) and the zero-crossing rate is smaller than another threshold (=0.3); otherwise the frame is classified as (2) unvoiced, or as (3) an unvoiced-to-voiced transition if the first subframe is unvoiced and the second subframe is voiced. [0039]
  • (1) Expand a voiced frame by integer multiples of the pitch delay (pitch period) of the frame, so the expanded frame ends at roughly the same phase as the original frame. That is, for “new_playout_delay” greater than “playout_delay”, first form an excitation by N repeats of the last pitch-delay length portion of the excitation of the current frame where N is the smallest integer at least as large as (“new_playout_delay”−“playout_delay”)/(pitch delay). Then apply this excitation to the LP synthesis filter of the current frame to generate the expansion of the current frame. Lastly, increase “playout_delay” for the frame of the next packet (“new_playout_delay” for the current packet) to equal the current frame “playout_delay”+N*(pitch delay); this aligns the start of the next frame with the end of the current frame expansion. Note that this alignment may make “playout_delay” exceed “delay” for the next packet; see FIG. 1. [0040]
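  • A sketch of this pitch-multiple expansion, with times in samples (e.g., 8 samples per ms at 8 kHz sampling; the function names are illustrative):

```c
#include <stddef.h>

/* Smallest integer N such that N*pitch covers the playout-delay
   increase, i.e. N = ceil(gap / pitch); gap and pitch are in samples. */
size_t expansion_periods(size_t gap, size_t pitch)
{
    return (gap + pitch - 1) / pitch;
}

/* Build the expanded excitation by repeating the last pitch-period of
   the current frame's excitation; out must hold n_periods*pitch samples. */
void expand_excitation(const double *last_period, size_t pitch,
                       size_t n_periods, double *out)
{
    for (size_t i = 0; i < n_periods * pitch; i++)
        out[i] = last_period[i % pitch];
}
```

  • For example, a 25 ms gap (200 samples) with a 57-sample pitch delay needs N = 4 repeats, and the next frame's playout delay then advances by 4*57 samples so the frames stay phase aligned.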
  • (2) Expand an unvoiced frame by synthesizing with repeats of the LP synthesis filter coefficients, pitch delay, and adaptive and fixed codebook gains, plus an excitation with a random fixed-codebook vector and an adaptive-codebook contribution. [0041]
  • (3) Expand an unvoiced-to-voiced transition frame by one of two preferred embodiment methods: the first method treats a transition frame as a voiced frame and follows the foregoing description of voiced frame expansion. The second method expands only the initial unvoiced portion of the frame and follows the foregoing description of unvoiced frame expansion. In this second method the frame to be expanded is not fully played out; rather, the voiced latter portion is delayed, and the expansion repeats would use the first-subframe LP parameters. Thus the second method requires some look-ahead to see that an expansion will be needed and then to prevent the final voiced portion from being played out until needed. [0042]
  • Alternate preferred embodiments for (1)-(3) attenuate the adaptive and fixed codebook gains by 1 dB for each 10 ms of expansion and apply bandwidth expansion to the LP coefficients. This gradually mutes the frame expansion for long expansions. Indeed, many detail variations may be used, including dropping the fixed-codebook contribution to the excitation for a periodic frame, dropping the adaptive-codebook contribution and using a random fixed-codebook vector for a nonperiodic frame, separate attenuation rates of adaptive and fixed codebook gains, incrementing the pitch delay during expansion, and so forth. [0043]
  • The frame expansion preferred embodiments may be used with playout methods other than the preferred embodiment described in the foregoing. [0044]
  • 4. Preferred Embodiment Truncations [0045]
  • Methods to synchronize voice with other media or adapt voice packet-rate when speech truncation is needed may use preferred embodiment truncation methods which are analogous to the foregoing speech expansion methods. (1) If the speech is voiced, it is truncated only in integer multiples of the pitch period; and (2) if the speech is unvoiced (including silences), no constraint on truncation is applied. [0046]
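  • A sketch of this truncation rule (rounding the requested voiced truncation down to a whole number of pitch periods is an assumption; the text only requires an integer multiple):

```c
#include <stddef.h>

/* Truncation amount in samples: voiced speech is truncated only in
   whole pitch periods (here the largest multiple not exceeding the
   request); unvoiced speech and silence have no constraint. */
size_t truncation_amount(size_t requested, size_t pitch, int voiced)
{
    if (!voiced)
        return requested;
    return (requested / pitch) * pitch;
}
```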
  • 5. Preferred Embodiments For Lost Packets [0047]
  • When a packet appears lost or erased due to uncorrectable errors (for example, when several packets containing frames later in the sent sequence than the frame of the lost/erased packet have already been received), interpolate to reconstruct the missing frame without changing "playout_delay". Alternatively, wait up to a threshold time (e.g., 300 ms), expanding the prior frame, before deciding that the packet is lost. For an isolated lost/erased packet, use the G.729 concealment described in the background or another concealment method. [0048]
  • 6. System Preferred Embodiments [0049]
  • FIGS. 5-6 show in functional block form preferred embodiment systems which use a preferred embodiment playout method, for both speech and other signals that can be effectively CELP coded. In preferred embodiment communications systems, users (transmitters and/or receivers) could include one or more digital signal processors (DSPs) and/or other programmable devices, such as RISC processors, with stored programs for performance of the signal processing of a preferred embodiment method. Alternatively, specialized circuitry (ASICs) could be used with (partially) hardwired preferred embodiment methods. Users may also contain analog and/or mixed-signal integrated circuits for amplification or filtering of inputs to or outputs from a communications channel and for conversion between analog and digital. Such analog and digital circuits may be integrated on a single die. The stored programs, including codebooks, may, for example, be in ROM or flash EEPROM or FeRAM which is integrated with the processor or external to the processor. Antennas may be parts of receivers with multiple-finger RAKE detectors for air interface to networks such as the Internet. Exemplary DSP cores could be in the TMS320C6xxx or TMS320C5xxx families from Texas Instruments. [0050]
  • 7. Modifications [0051]
  • The preferred embodiments may be modified in various ways while retaining one or more of the features of playout delay increase during a talkspurt but a decrease only during silence and voiced frame expansion by multiples of the pitch delay. [0052]
  • For example, the frame voicing classification may have more classes, with two or more classes leading to frame expansions in multiples of the pitch delay but with differing excitations; the interval (frame and subframe) size and sampling rate could differ; various gain attenuation rates and bandwidth expansion factors could be used; the CELP encoding may be layered (successively more bits in higher layers) with the playout frame expansion using only the lower layers; and so forth. [0053]

Claims (8)

What is claimed is:
1. A method for playout of packetized speech, comprising:
(a) deferring truncation of an active frame; and
(b) truncating a silence frame.
2. The method of claim 1, wherein:
(a) said packetized speech includes CELP-encoded frames; and
(b) said truncating a silence frame includes truncating an excitation for said silence frame.
3. The method of claim 1, further comprising:
(a) expanding an active frame according to a voicing classification for said active frame.
4. A method of frame playout expansion, comprising:
(a) classifying a frame as voiced or not; and
(b) expanding a voiced frame by a multiple of the pitch of said voiced frame.
5. The method of claim 4, wherein:
(a) said frames are CELP-encoded frames; and
(b) said expanding a voiced frame includes expanding an excitation for said voiced frame by a multiple of the pitch of said voiced frame.
6. The method of claim 4, wherein:
(a) said classifying a frame of step (a) classifies an active frame as one of (i) voiced, (ii) unvoiced, or (iii) transition; and
(b) expanding an unvoiced frame includes expanding an excitation for said unvoiced frame with a random fixed-codebook vector.
7. A receiver, comprising:
(a) an input for receiving CELP-encoded frames;
(b) a decoder coupled to said input; and
(c) a playout scheduler coupled to said input;
(d) said decoder operable to provide expansion of a voiced frame in response to said playout scheduler, wherein said expansion is a multiple of the pitch for said voiced frame.
8. The receiver of claim 7, wherein:
(a) said decoder operable to provide truncation of a frame in response to said playout scheduler only when said frame is a silence frame.
US10/081,355 2001-02-21 2002-02-21 Adaptive voice playout in VOP Abandoned US20040204935A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US10/081,355 US20040204935A1 (en) 2001-02-21 2002-02-21 Adaptive voice playout in VOP
US12/136,662 US7577565B2 (en) 2001-02-21 2008-06-10 Adaptive voice playout in VOP

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US27026401P 2001-02-21 2001-02-21
US10/081,355 US20040204935A1 (en) 2001-02-21 2002-02-21 Adaptive voice playout in VOP

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US12/136,662 Continuation US7577565B2 (en) 2001-02-21 2008-06-10 Adaptive voice playout in VOP

Publications (1)

Publication Number Publication Date
US20040204935A1 (en) 2004-10-14

Family

ID=33134512

Family Applications (2)

Application Number Title Priority Date Filing Date
US10/081,355 Abandoned US20040204935A1 (en) 2001-02-21 2002-02-21 Adaptive voice playout in VOP
US12/136,662 Expired - Fee Related US7577565B2 (en) 2001-02-21 2008-06-10 Adaptive voice playout in VOP

Family Applications After (1)

Application Number Title Priority Date Filing Date
US12/136,662 Expired - Fee Related US7577565B2 (en) 2001-02-21 2008-06-10 Adaptive voice playout in VOP

Country Status (1)

Country Link
US (2) US20040204935A1 (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040076190A1 (en) * 2002-10-21 2004-04-22 Nagendra Goel Method and apparatus for improved play-out packet control algorithm
US20050065782A1 (en) * 2000-09-22 2005-03-24 Jacek Stachurski Hybrid speech coding and system
US20060045139A1 (en) * 2004-08-30 2006-03-02 Black Peter J Method and apparatus for processing packetized data in a wireless communication system
US20060077994A1 (en) * 2004-10-13 2006-04-13 Spindola Serafin D Media (voice) playback (de-jitter) buffer adjustments base on air interface
WO2006048733A1 (en) * 2004-11-03 2006-05-11 Nokia Corporation Method and device for low bit rate speech coding
US20060206318A1 (en) * 2005-03-11 2006-09-14 Rohit Kapoor Method and apparatus for phase matching frames in vocoders
US20060206334A1 (en) * 2005-03-11 2006-09-14 Rohit Kapoor Time warping frames inside the vocoder by modifying the residual
US20060277039A1 (en) * 2005-04-22 2006-12-07 Vos Koen B Systems, methods, and apparatus for gain factor smoothing
US20060277038A1 (en) * 2005-04-01 2006-12-07 Qualcomm Incorporated Systems, methods, and apparatus for highband excitation generation
US20080154587A1 (en) * 2006-12-26 2008-06-26 Gh Innovation, In Gain Quantization System for Speech Coding to Improve Packet Loss Concealment
US20080288094A1 (en) * 2004-07-23 2008-11-20 Mitsugi Fukushima Auto Signal Output Device
US20090059806A1 (en) * 2007-08-27 2009-03-05 Texas Instruments Incorporated Method, system and apparatus for providing signal based packet loss concealment for memoryless codecs
US20090268724A1 (en) * 1999-12-14 2009-10-29 Texas Instruments Incorporated Systems, processes and integrated circuits for rate and/or diversity adaptation for packet communications
US20100010810A1 (en) * 2006-12-13 2010-01-14 Panasonic Corporation Post filter and filtering method
US7668968B1 (en) * 2002-12-03 2010-02-23 Global Ip Solutions, Inc. Closed-loop voice-over-internet-protocol (VOIP) with sender-controlled bandwidth adjustments prior to onset of packet losses
US20130163588A1 (en) * 2007-03-20 2013-06-27 Microsoft Corporation Method of transmitting data in a communication system
US9246644B2 (en) 2011-10-25 2016-01-26 Microsoft Technology Licensing, Llc Jitter buffer
US20160104489A1 (en) * 2013-06-21 2016-04-14 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method realizing improved concepts for tcx ltp
US9336790B2 (en) 2006-12-26 2016-05-10 Huawei Technologies Co., Ltd Packet loss concealment for speech coding
CN108074586A (en) * 2016-11-15 2018-05-25 电信科学技术研究院 A kind of localization method and device of phonetic problem

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8483243B2 (en) * 2006-09-15 2013-07-09 Microsoft Corporation Network jitter smoothing with reduced delay
JP5632486B2 (en) * 2009-12-24 2014-11-26 テレコム・イタリア・エッセ・ピー・アー Method for scheduling transmissions in a communication network, corresponding communication node and computer program product
US9123328B2 (en) * 2012-09-26 2015-09-01 Google Technology Holdings LLC Apparatus and method for audio frame loss recovery
KR20140067512A (en) * 2012-11-26 2014-06-05 삼성전자주식회사 Signal processing apparatus and signal processing method thereof
US9437203B2 (en) * 2013-03-07 2016-09-06 QoSound, Inc. Error concealment for speech decoder
US9876711B2 (en) 2013-11-05 2018-01-23 Cisco Technology, Inc. Source address translation in overlay networks
US9437211B1 (en) * 2013-11-18 2016-09-06 QoSound, Inc. Adaptive delay for enhanced speech processing
TWI602172B (en) * 2014-08-27 2017-10-11 弗勞恩霍夫爾協會 Encoder, decoder and method for encoding and decoding audio content using parameters for enhancing a concealment
US10116493B2 (en) 2014-11-21 2018-10-30 Cisco Technology, Inc. Recovering from virtual port channel peer failure
RU2591640C1 (en) * 2015-05-27 2016-07-20 Александр Юрьевич Бредихин Method of modifying voice and device therefor (versions)
US10142163B2 (en) 2016-03-07 2018-11-27 Cisco Technology, Inc BFD over VxLAN on vPC uplinks
US10333828B2 (en) 2016-05-31 2019-06-25 Cisco Technology, Inc. Bidirectional multicasting over virtual port channel
US11509501B2 (en) 2016-07-20 2022-11-22 Cisco Technology, Inc. Automatic port verification and policy application for rogue devices
US10193750B2 (en) 2016-09-07 2019-01-29 Cisco Technology, Inc. Managing virtual port channel switch peers from software-defined network controller
US10547509B2 (en) 2017-06-19 2020-01-28 Cisco Technology, Inc. Validation of a virtual port channel (VPC) endpoint in the network fabric

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5414796A (en) * 1991-06-11 1995-05-09 Qualcomm Incorporated Variable rate vocoder
US5828944A (en) * 1996-01-11 1998-10-27 Illinois Superconductor Corporation Diversity reception signal processing system
US5828995A (en) * 1995-02-28 1998-10-27 Motorola, Inc. Method and apparatus for intelligible fast forward and reverse playback of time-scale compressed voice messages
US5839110A (en) * 1994-08-22 1998-11-17 Sony Corporation Transmitting and receiving apparatus
US5977960A (en) * 1996-09-10 1999-11-02 S3 Incorporated Apparatus, systems and methods for controlling data overlay in multimedia data processing and display systems using mask techniques
US6173257B1 (en) * 1998-08-24 2001-01-09 Conexant Systems, Inc Completed fixed codebook for speech encoder
US6175871B1 (en) * 1997-10-01 2001-01-16 3Com Corporation Method and apparatus for real time communication over packet networks
US6311154B1 (en) * 1998-12-30 2001-10-30 Nokia Mobile Phones Limited Adaptive windows for analysis-by-synthesis CELP-type speech coding
US20020009149A1 (en) * 1999-12-14 2002-01-24 Rodriguez Arturo A. System and method for adaptive video processing with coordinated resource allocation
US6351730B2 (en) * 1998-03-30 2002-02-26 Lucent Technologies Inc. Low-complexity, low-delay, scalable and embedded speech and audio coding with adaptive frame loss concealment
US6369722B1 (en) * 2000-03-17 2002-04-09 Matra Nortel Communications Coding, decoding and transcoding methods
US6393394B1 (en) * 1999-07-19 2002-05-21 Qualcomm Incorporated Method and apparatus for interleaving line spectral information quantization methods in a speech coder
US6484137B1 (en) * 1997-10-31 2002-11-19 Matsushita Electric Industrial Co., Ltd. Audio reproducing apparatus
US6611536B1 (en) * 1999-08-11 2003-08-26 International Business Machines Corporation System and method for integrating voice and data on a single RF channel

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1995017745A1 (en) * 1993-12-16 1995-06-29 Voice Compression Technologies Inc. System and method for performing voice compression
EP0932141B1 (en) * 1998-01-22 2005-08-24 Deutsche Telekom AG Method for signal controlled switching between different audio coding schemes
US6604070B1 (en) * 1999-09-22 2003-08-05 Conexant Systems, Inc. System of encoding and decoding speech signals
US6826527B1 (en) * 1999-11-23 2004-11-30 Texas Instruments Incorporated Concealment of frame erasures and method
KR100367700B1 (en) * 2000-11-22 2003-01-10 엘지전자 주식회사 estimation method of voiced/unvoiced information for vocoder
US20030135374A1 (en) * 2002-01-16 2003-07-17 Hardwick John C. Speech synthesizer

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5414796A (en) * 1991-06-11 1995-05-09 Qualcomm Incorporated Variable rate vocoder
US5839110A (en) * 1994-08-22 1998-11-17 Sony Corporation Transmitting and receiving apparatus
US5828995A (en) * 1995-02-28 1998-10-27 Motorola, Inc. Method and apparatus for intelligible fast forward and reverse playback of time-scale compressed voice messages
US5828944A (en) * 1996-01-11 1998-10-27 Illinois Superconductor Corporation Diversity reception signal processing system
US5977960A (en) * 1996-09-10 1999-11-02 S3 Incorporated Apparatus, systems and methods for controlling data overlay in multimedia data processing and display systems using mask techniques
US6175871B1 (en) * 1997-10-01 2001-01-16 3Com Corporation Method and apparatus for real time communication over packet networks
US6484137B1 (en) * 1997-10-31 2002-11-19 Matsushita Electric Industrial Co., Ltd. Audio reproducing apparatus
US6351730B2 (en) * 1998-03-30 2002-02-26 Lucent Technologies Inc. Low-complexity, low-delay, scalable and embedded speech and audio coding with adaptive frame loss concealment
US6173257B1 (en) * 1998-08-24 2001-01-09 Conexant Systems, Inc Completed fixed codebook for speech encoder
US6311154B1 (en) * 1998-12-30 2001-10-30 Nokia Mobile Phones Limited Adaptive windows for analysis-by-synthesis CELP-type speech coding
US6393394B1 (en) * 1999-07-19 2002-05-21 Qualcomm Incorporated Method and apparatus for interleaving line spectral information quantization methods in a speech coder
US6611536B1 (en) * 1999-08-11 2003-08-26 International Business Machines Corporation System and method for integrating voice and data on a single RF channel
US20020009149A1 (en) * 1999-12-14 2002-01-24 Rodriguez Arturo A. System and method for adaptive video processing with coordinated resource allocation
US6369722B1 (en) * 2000-03-17 2002-04-09 Matra Nortel Communications Coding, decoding and transcoding methods

Cited By (73)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8024182B2 (en) * 1999-12-14 2011-09-20 Texas Instruments Incorporated Rate/diversity adaptation sending speech in first and second packets
US20090268724A1 (en) * 1999-12-14 2009-10-29 Texas Instruments Incorporated Systems, processes and integrated circuits for rate and/or diversity adaptation for packet communications
US20050065782A1 (en) * 2000-09-22 2005-03-24 Jacek Stachurski Hybrid speech coding and system
US7386444B2 (en) * 2000-09-22 2008-06-10 Texas Instruments Incorporated Hybrid speech coding and system
US7630409B2 (en) * 2002-10-21 2009-12-08 Lsi Corporation Method and apparatus for improved play-out packet control algorithm
US20040076190A1 (en) * 2002-10-21 2004-04-22 Nagendra Goel Method and apparatus for improved play-out packet control algorithm
US7668968B1 (en) * 2002-12-03 2010-02-23 Global Ip Solutions, Inc. Closed-loop voice-over-internet-protocol (VOIP) with sender-controlled bandwidth adjustments prior to onset of packet losses
US20080288094A1 (en) * 2004-07-23 2008-11-20 Mitsugi Fukushima Auto Signal Output Device
US8160887B2 (en) * 2004-07-23 2012-04-17 D&M Holdings, Inc. Adaptive interpolation in upsampled audio signal based on frequency of polarity reversals
US20060050743A1 (en) * 2004-08-30 2006-03-09 Black Peter J Method and apparatus for flexible packet selection in a wireless communication system
US7830900B2 (en) 2004-08-30 2010-11-09 Qualcomm Incorporated Method and apparatus for an adaptive de-jitter buffer
US7826441B2 (en) 2004-08-30 2010-11-02 Qualcomm Incorporated Method and apparatus for an adaptive de-jitter buffer in a wireless communication system
US8331385B2 (en) 2004-08-30 2012-12-11 Qualcomm Incorporated Method and apparatus for flexible packet selection in a wireless communication system
US7817677B2 (en) 2004-08-30 2010-10-19 Qualcomm Incorporated Method and apparatus for processing packetized data in a wireless communication system
US20060045138A1 (en) * 2004-08-30 2006-03-02 Black Peter J Method and apparatus for an adaptive de-jitter buffer
US20060045139A1 (en) * 2004-08-30 2006-03-02 Black Peter J Method and apparatus for processing packetized data in a wireless communication system
US20110222423A1 (en) * 2004-10-13 2011-09-15 Qualcomm Incorporated Media (voice) playback (de-jitter) buffer adjustments based on air interface
US8085678B2 (en) 2004-10-13 2011-12-27 Qualcomm Incorporated Media (voice) playback (de-jitter) buffer adjustments based on air interface
US20060077994A1 (en) * 2004-10-13 2006-04-13 Spindola Serafin D Media (voice) playback (de-jitter) buffer adjustments based on air interface
US20060106600A1 (en) * 2004-11-03 2006-05-18 Nokia Corporation Method and device for low bit rate speech coding
US7752039B2 (en) 2004-11-03 2010-07-06 Nokia Corporation Method and device for low bit rate speech coding
WO2006048733A1 (en) * 2004-11-03 2006-05-11 Nokia Corporation Method and device for low bit rate speech coding
JP2008533530A (en) * 2005-03-11 2008-08-21 クゥアルコム・インコーポレイテッド Method and apparatus for phase matching of frames in a vocoder
US8155965B2 (en) 2005-03-11 2012-04-10 Qualcomm Incorporated Time warping frames inside the vocoder by modifying the residual
US8355907B2 (en) 2005-03-11 2013-01-15 Qualcomm Incorporated Method and apparatus for phase matching frames in vocoders
WO2006099534A1 (en) * 2005-03-11 2006-09-21 Qualcomm Incorporated Method and apparatus for phase matching frames in vocoders
US20060206334A1 (en) * 2005-03-11 2006-09-14 Rohit Kapoor Time warping frames inside the vocoder by modifying the residual
US20060206318A1 (en) * 2005-03-11 2006-09-14 Rohit Kapoor Method and apparatus for phase matching frames in vocoders
KR100956526B1 (en) * 2005-03-11 2010-05-07 퀄컴 인코포레이티드 Method and apparatus for phase matching frames in vocoders
US8069040B2 (en) 2005-04-01 2011-11-29 Qualcomm Incorporated Systems, methods, and apparatus for quantization of spectral envelope representation
US8244526B2 (en) 2005-04-01 2012-08-14 Qualcomm Incorporated Systems, methods, and apparatus for highband burst suppression
US8484036B2 (en) 2005-04-01 2013-07-09 Qualcomm Incorporated Systems, methods, and apparatus for wideband speech coding
US8364494B2 (en) 2005-04-01 2013-01-29 Qualcomm Incorporated Systems, methods, and apparatus for split-band filtering and encoding of a wideband signal
US20060277038A1 (en) * 2005-04-01 2006-12-07 Qualcomm Incorporated Systems, methods, and apparatus for highband excitation generation
US8332228B2 (en) 2005-04-01 2012-12-11 Qualcomm Incorporated Systems, methods, and apparatus for anti-sparseness filtering
US20080126086A1 (en) * 2005-04-01 2008-05-29 Qualcomm Incorporated Systems, methods, and apparatus for gain coding
US8078474B2 (en) 2005-04-01 2011-12-13 Qualcomm Incorporated Systems, methods, and apparatus for highband time warping
US20070088541A1 (en) * 2005-04-01 2007-04-19 Vos Koen B Systems, methods, and apparatus for highband burst suppression
US8140324B2 (en) 2005-04-01 2012-03-20 Qualcomm Incorporated Systems, methods, and apparatus for gain coding
US20070088542A1 (en) * 2005-04-01 2007-04-19 Vos Koen B Systems, methods, and apparatus for wideband speech coding
US8260611B2 (en) 2005-04-01 2012-09-04 Qualcomm Incorporated Systems, methods, and apparatus for highband excitation generation
US9043214B2 (en) 2005-04-22 2015-05-26 Qualcomm Incorporated Systems, methods, and apparatus for gain factor attenuation
US8892448B2 (en) 2005-04-22 2014-11-18 Qualcomm Incorporated Systems, methods, and apparatus for gain factor smoothing
US20060282262A1 (en) * 2005-04-22 2006-12-14 Vos Koen B Systems, methods, and apparatus for gain factor attenuation
US20060277039A1 (en) * 2005-04-22 2006-12-07 Vos Koen B Systems, methods, and apparatus for gain factor smoothing
US20100010810A1 (en) * 2006-12-13 2010-01-14 Panasonic Corporation Post filter and filtering method
US10083698B2 (en) 2006-12-26 2018-09-25 Huawei Technologies Co., Ltd. Packet loss concealment for speech coding
US8000961B2 (en) * 2006-12-26 2011-08-16 Yang Gao Gain quantization system for speech coding to improve packet loss concealment
US20080154587A1 (en) * 2006-12-26 2008-06-26 Gh Innovation, In Gain Quantization System for Speech Coding to Improve Packet Loss Concealment
US9336790B2 (en) 2006-12-26 2016-05-10 Huawei Technologies Co., Ltd Packet loss concealment for speech coding
US9767810B2 (en) 2006-12-26 2017-09-19 Huawei Technologies Co., Ltd. Packet loss concealment for speech coding
US20130163588A1 (en) * 2007-03-20 2013-06-27 Microsoft Corporation Method of transmitting data in a communication system
US9437216B2 (en) * 2007-03-20 2016-09-06 Skype Method of transmitting data in a communication system
US7929520B2 (en) * 2007-08-27 2011-04-19 Texas Instruments Incorporated Method, system and apparatus for providing signal based packet loss concealment for memoryless codecs
US20090059806A1 (en) * 2007-08-27 2009-03-05 Texas Instruments Incorporated Method, system and apparatus for providing signal based packet loss concealment for memoryless codecs
US9246644B2 (en) 2011-10-25 2016-01-26 Microsoft Technology Licensing, Llc Jitter buffer
US9978376B2 (en) 2013-06-21 2018-05-22 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method realizing a fading of an MDCT spectrum to white noise prior to FDNS application
US10672404B2 (en) 2013-06-21 2020-06-02 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating an adaptive spectral shape of comfort noise
US9978377B2 (en) 2013-06-21 2018-05-22 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating an adaptive spectral shape of comfort noise
US9916833B2 (en) 2013-06-21 2018-03-13 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for improved signal fade out for switched audio coding systems during error concealment
US11869514B2 (en) 2013-06-21 2024-01-09 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for improved signal fade out for switched audio coding systems during error concealment
US9997163B2 (en) * 2013-06-21 2018-06-12 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method realizing improved concepts for TCX LTP
US20180268825A1 (en) * 2013-06-21 2018-09-20 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method realizing improved concepts for tcx ltp
US20160104489A1 (en) * 2013-06-21 2016-04-14 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method realizing improved concepts for tcx ltp
US10607614B2 (en) 2013-06-21 2020-03-31 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method realizing a fading of an MDCT spectrum to white noise prior to FDNS application
US9978378B2 (en) 2013-06-21 2018-05-22 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for improved signal fade out in different domains during error concealment
US10679632B2 (en) 2013-06-21 2020-06-09 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for improved signal fade out for switched audio coding systems during error concealment
US10854208B2 (en) * 2013-06-21 2020-12-01 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method realizing improved concepts for TCX LTP
US10867613B2 (en) 2013-06-21 2020-12-15 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for improved signal fade out in different domains during error concealment
US11462221B2 (en) 2013-06-21 2022-10-04 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating an adaptive spectral shape of comfort noise
US11501783B2 (en) 2013-06-21 2022-11-15 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method realizing a fading of an MDCT spectrum to white noise prior to FDNS application
US11776551B2 (en) 2013-06-21 2023-10-03 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for improved signal fade out in different domains during error concealment
CN108074586A (en) * 2016-11-15 2018-05-25 电信科学技术研究院 A kind of localization method and device of phonetic problem

Also Published As

Publication number Publication date
US20080243495A1 (en) 2008-10-02
US7577565B2 (en) 2009-08-18

Similar Documents

Publication Publication Date Title
US7577565B2 (en) Adaptive voice playout in VOP
US7590531B2 (en) Robust decoder
US7319703B2 (en) Method and apparatus for reducing synchronization delay in packet-based voice terminals by resynchronizing during talk spurts
US6389006B1 (en) Systems and methods for encoding and decoding speech for lossy transmission networks
US6775649B1 (en) Concealment of frame erasures for speech transmission and storage system and method
JP4931318B2 (en) Forward error correction in speech coding.
RU2419891C2 (en) Method and device for efficient masking of deletion of frames in speech codecs
EP1849158B1 (en) Method for discontinuous transmission and accurate reproduction of background noise information
US20040156397A1 (en) Method and apparatus for reducing synchronization delay in packet switched voice terminals using speech decoder modification
EP2535893B1 (en) Device and method for lost frame concealment
US6873954B1 (en) Method and apparatus in a telecommunications system
US6826527B1 (en) Concealment of frame erasures and method
US7302385B2 (en) Speech restoration system and method for concealing packet losses
EP2084873B1 (en) Network jitter smoothing with reduced delay
US20080103765A1 (en) Encoder Delay Adjustment
JP3722366B2 (en) Packet configuration method and apparatus, packet configuration program, packet decomposition method and apparatus, and packet decomposition program
EP1103953A2 (en) Method for concealing erased speech frames
KR100542435B1 (en) Method and apparatus for frame loss concealment for packet network
Gournay et al. Performance analysis of a decoder-based time scaling algorithm for variable jitter buffering of speech over packet networks
Anandakumar et al. An adaptive voice playout method for VOP applications
US20040138878A1 (en) Method for estimating a codec parameter
Bhute et al. Error concealment schemes for speech packet transmission over IP network
Wu et al. Adaptive playout scheduling for multi-stream voice over IP networks

Legal Events

Date Code Title Description
AS Assignment

Owner name: TEXAS INSTRUMENTS INCORPORATED, TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ANANDAKUMAR, KRISHNASAMY;MCCREE, ALAN V.;PAKSOY, ERDAL;REEL/FRAME:014524/0047;SIGNING DATES FROM 20020610 TO 20020612

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION