Joint Optimization of Linear Predictors in Speech Coders


PETER KABAL, MEMBER, IEEE, AND RAVI P. RAMACHANDRAN

Abstract - Low bit rate speech coders often employ both formant and pitch predictors to remove near-sample and distant-sample redundancies in the speech signal. The coefficients of these predictors are usually determined for one prediction filter and then for the other (a sequential solution). This paper deals with formant and pitch predictors which are jointly optimized. The first configuration considered is a combination prediction error filter (in either a transversal or a lattice form) that performs the functions of both a formant and a pitch filter. Although a transversal combination filter outperforms the conventional F-P (formant followed by pitch) sequential solution, the combination filter exhibits a high incidence of nonminimum phase filters. For an F-P cascade connection, combined solutions and iterated sequential solutions are found. They yield higher prediction gains than the conventional F-P sequential solution. Furthermore, a practical implementation of the iterated sequential solution is developed such that both the formant and pitch filters are minimum phase. This implementation leads to decoded speech of higher perceptual quality than the conventional sequential solution.

Manuscript received September 4, 1987; revised September 8. This work was supported by the Natural Sciences and Engineering Research Council of Canada. P. Kabal is with the Department of Electrical Engineering, McGill University, Montreal, P.Q., Canada H3A 2A7, and INRS-Télécommunications, Université du Québec, Verdun, P.Q., Canada H3E 1H6. R. P. Ramachandran is with the Department of Electrical Engineering, McGill University, Montreal, P.Q., Canada H3A 2A7.

I. INTRODUCTION

In low bit rate predictive coding of speech, two nonrecursive prediction error filters are often used to process the input signal before coding. The prediction operations are motivated by the fact that the input speech exhibits a high degree of intersample correlation. These correlations occur between adjacent samples (near-sample redundancy) and, for voiced speech, between samples separated by the pitch period (far-sample redundancy). Near-sample redundancies can be attributed to the filtering action of the vocal tract. The resonances of the vocal tract correspond to the formant frequencies in speech. Far-sample redundancies can be attributed to the pitch excitation of voiced speech. Two filters, the formant and pitch predictors, are used to remove the near-sample and far-sample redundancies, respectively. The resulting prediction residual signal is of smaller amplitude and can be coded more efficiently than the original speech waveform. The predictor coefficients are adapted by updating them at fixed intervals to follow the time-varying correlation of the speech signal. An example of a system which uses the two predictor arrangement is an Adaptive Predictive Coder (APC).
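
To make the two-predictor arrangement concrete, the following toy sketch applies a formant prediction error filter 1 - F(z) and then a one-tap pitch prediction error filter 1 - beta z^-M to a synthetic "voiced" signal. The coefficient values and the synthetic signal are illustrative assumptions of this sketch; in a coder the coefficients are recomputed every frame as described later in the paper.

    import numpy as np

    def formant_error(x, a):
        # prediction error of 1 - F(z), with F(z) = a[0] z^-1 + ... + a[Nf-1] z^-Nf
        d = np.copy(x)
        for k, ak in enumerate(a, start=1):
            d[k:] -= ak * x[:-k]
        return d

    def pitch_error(d, beta, M):
        # prediction error of a one-tap pitch filter 1 - beta z^-M
        e = np.copy(d)
        e[M:] -= beta * d[:-M]
        return e

    # synthetic "voiced" speech: a pitch pulse train (period 50) driving an AR(2) vocal tract model
    N = 400
    excitation = np.zeros(N)
    excitation[::50] = 1.0
    x = np.zeros(N)
    for n in range(N):
        x[n] = excitation[n] + (0.9 * x[n - 1] if n >= 1 else 0.0) - (0.5 * x[n - 2] if n >= 2 else 0.0)

    d = formant_error(x, np.array([0.9, -0.5]))   # removes the near-sample (formant) redundancy
    e = pitch_error(d, 1.0, 50)                   # removes the far-sample (pitch) redundancy
    # for this toy model the final residual e is essentially zero after the first pitch period
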
In conventional APC, the predictors are placed in a feedback loop around the quantizer. The quantization occurs sample-by-sample. With this configuration, it can be shown that the quantization noise is not only the difference between the residual and its quantized value but also the difference between the original speech signal and its reconstructed value. The perceptual distortion of the output speech can be reduced by adding a noise shaping filter which redistributes the quantization noise spectrum [1], [2]. The noise shaping filter increases the noise energy in the formant regions but decreases the noise power at frequencies in which the energy level is low. Its system function is often chosen to be a bandwidth expanded version of the transfer function of the formant predictor. An alternate APC configuration places the predictors in an open-loop format and includes a noise shaping filter [3] as depicted in Fig. 1. The quantization is again accomplished sample-by-sample. Code-Excited Linear Prediction (CELP) [4] combines an open-loop arrangement for the predictors with vector quantization. Vector quantization is implemented by searching a given repertoire of waveforms for a candidate waveform that best represents the residual in a weighted mean-square sense. The weighting is employed to accomplish noise shaping. The synthesis phase is similar in APC and CELP. In both cases, an excitation signal (the coded residual or the selected codeword after scaling) is passed through a pitch synthesis and a formant synthesis filter to produce the decoded speech. The synthesis operation can be viewed in the frequency domain as first inserting the periodic structure due to pitch and then inserting the spectral envelope (formant structure).

A previous paper has considered the cascade connection of a formant and pitch predictor [5]. In that paper, the coefficients were determined using a conventional sequential approach. For a sequential solution, the coefficients of the first predictor are determined from the input speech, and the coefficients of the second predictor are determined from the intermediate residual formed by the filtering action of the first predictor. The objective of this paper is to consider combination configurations and joint solutions for the formant and pitch filter coefficients. We present new algorithms for the joint optimization which give improved performance over standard techniques. The minimum phase property of the filters is also considered. A minimum phase prediction error filter at the analysis phase guarantees a stable synthesis filter. This is a significant issue since, if the synthesis filters are unstable, the quantization noise is accentuated and causes undesirable perceptual distortion in the output speech [6]. The final configuration considered constrains the solutions to be minimum phase. In addition, that system uses a simplified method to choose an appropriate pitch lag. This practical approach retains the gains due to joint optimization and gives real improvements in speech quality in a coding environment.

Fig. 1. Block diagram of an APC coder with noise feedback. (a) Analysis phase. (b) Synthesis phase.
Fig. 2. Analysis model for linear predictors.

Fig. 2 shows a general analysis model for a linear predictor with arbitrary delays Mk. Windows are applied to both the input and error signals. The aim of the analysis is to minimize the squared error sum E² = Σ_{n=-∞}^{∞} e_w²(n). This leads to a linear system of equations [5] which can be written in matrix form as

Φc = a,   (1)

where the correlation entries are given by

φ(i, j) = Σ_{n=-∞}^{∞} w_e(n) x_w(n - i) x_w(n - j).   (2)

Applying a window only to the input signal results in the autocorrelation method. The covariance formulation results when only the error signal e(n) is windowed. Note that the autocorrelation method is not suitable for predictors with delays which are an appreciable fraction of the frame size [5]. Indeed, the autocorrelation method does not even guarantee minimum phase filters for general delays. These considerations rule out the use of the autocorrelation method for pitch analysis.

The general linear predictor described above subsumes both formant and pitch predictors. A formant predictor F(z) has continuous support in that prediction is based on the Nf previous samples (usually 8 ≤ Nf ≤ 16),

F(z) = Σ_{k=1}^{Nf} a_k z^{-k}.

For this case, the autocorrelation method guarantees that 1 - F(z) is minimum phase [7]. Although the covariance method does not necessarily give a minimum phase solution, an alternative approach called the modified covariance method is based on residual energy ratios [1], [8] and assures minimum phase.

The pitch predictor has a small number of taps, Np. The delays associated with these taps are bunched around a value which corresponds to the estimated pitch period in samples. Its system function is

P(z) = β1 z^{-M}                                              (1 tap)
     = β1 z^{-M} + β2 z^{-(M+1)}                              (2 taps)
     = β1 z^{-(M-1)} + β2 z^{-M} + β3 z^{-(M+1)}              (3 taps).

The pitch lag M is usually updated along with the coefficients. Methods to choose the pitch lag are discussed in [5]. The formant and pitch synthesis filters have transfer functions HF(z) = 1/(1 - F(z)) and HP(z) = 1/(1 - P(z)), respectively.

Conventionally, the formant and pitch predictors are connected in cascade. Furthermore, the determination of the predictor coefficients proceeds in two steps. The coefficients of the first predictor are chosen to minimize the energy of the intermediate residual signal. The filter coefficients so determined are then used to form the intermediate residual signal. The coefficients of the second predictor are computed with the aim of minimizing the energy of the final residual signal given the intermediate residual as the input signal. The predictors may be cascaded in either order. The sequential solution gives different sets of formant and pitch coefficients for the two orderings. In addition, the fact that the filters are time varying affects the initial conditions at frame boundaries. Hence, the different orderings of the predictors lead to different output residuals.
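
As an illustration of the covariance formulation, the sketch below sets up and solves Φc = a for a predictor with an arbitrary set of delays, using rectangular windows over one analysis frame. The frame indexing and the use of a generic linear solver (rather than, say, a Cholesky factorization) are simplifications of this sketch, not prescriptions from the paper.

    import numpy as np

    def covariance_predictor(x, delays, n0, N):
        # Covariance-method normal equations Phi c = a (eqs. (1)-(2)) with a rectangular error
        # window over the frame [n0, n0 + N); n0 must exceed the largest delay so that the
        # required signal history exists.
        n = np.arange(n0, n0 + N)
        phi = lambda i, j: np.dot(x[n - i], x[n - j])
        Phi = np.array([[phi(i, j) for j in delays] for i in delays])
        a_vec = np.array([phi(0, j) for j in delays])
        return np.linalg.solve(Phi, a_vec)

    # formant predictor: delays 1..Nf on the speech; pitch predictor: delays M..M+Np-1 on the
    # intermediate residual (the sequential solution). For example (x, d, M assumed given):
    #   a    = covariance_predictor(x, list(range(1, 11)), n0, N)
    #   beta = covariance_predictor(d, list(range(M, M + 3)), n0, N)

A combination transversal filter corresponds to a single call of this kind with the union of the two delay groups applied to the speech itself; that joint solve is sketched below in the discussion of the combination implementation.
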
Experiments show that having the formant filter precede the pitch filter (F-P cascade) renders a higher prediction gain than the reverse ordering (P-F arrangement) [5]. The upcoming experimental results are based on prediction gain as the performance measure.

The prediction gain is the average energy of the input signal to the predictor divided by the average energy of the prediction error. The prediction gain is an appropriate measure since it assesses the extent to which redundancies are removed by the predictor.

Joint optimization of the formant and pitch predictors is accomplished in two different ways. First, the two predictors are combined into a single combination filter. Second, the predictors are connected as an F-P cascade with jointly optimized coefficients.

A. Combination Implementation

One method of achieving joint optimization is in a combination implementation in which the formant and pitch predictors are connected in parallel. The filter has two sets of grouped delays corresponding to its formant and pitch portions. The first group of delays ranges from 1 to Nf, while the second group ranges from M to M + Np - 1. This combined filter can be viewed as a single prediction error filter 1 - F(z) - P(z) that processes the speech signal. A combination implementation results in a prediction error filter that has a different impulse response than the overall prediction error filter arising from a cascade connection. The overall impulse response of the combination filter has fewer nonzero coefficients (Nf + Np + 1 for the transversal implementation) than the cascade connection (2Nf + Np + 1), although both have only Nf + Np degrees of freedom. A lattice implementation for a combination filter is also considered.

In a combination implementation, the coefficients of the transversal filter 1 - F(z) - P(z) are determined by placing a rectangular window on the error signal and using the system of equations given in (1) (covariance method). The coefficient vector c is given by c^T = [a_1, ..., a_Nf, β1, ..., βNp]. An inherent advantage of this configuration is that the formant and pitch predictor coefficients are jointly optimized. However, the combination formulation involves solving a larger system of equations (Nf + Np by Nf + Np) than if the conventional sequential optimization is used. Given a covariance analysis, the main disadvantage is that even the formant part of the synthesis filter can be unstable. Although methods are available to obtain stable formant and pitch synthesis filters individually, the corresponding combination synthesis filter is not easily stabilizable.
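
A sketch of the transversal combination solve, reusing the covariance formulation above (the frame bookkeeping is again an assumption of the sketch):

    import numpy as np

    def combination_filter(x, Nf, Np, M, n0, N):
        # Jointly optimized transversal combination filter 1 - F(z) - P(z):
        # one covariance solve over the formant delays 1..Nf and the pitch delays M..M+Np-1.
        delays = list(range(1, Nf + 1)) + list(range(M, M + Np))
        n = np.arange(n0, n0 + N)
        phi = lambda i, j: np.dot(x[n - i], x[n - j])
        Phi = np.array([[phi(i, j) for j in delays] for i in delays])
        a_vec = np.array([phi(0, j) for j in delays])
        c = np.linalg.solve(Phi, a_vec)
        return c[:Nf], c[Nf:]      # formant part a_1..a_Nf, pitch part beta_1..beta_Np

Both parts come out of a single (Nf + Np) by (Nf + Np) system, which is the joint optimization described above; as noted in the text, nothing in this solve constrains 1 - F(z) - P(z) to be minimum phase.
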
Fig. 3. All-zero lattice filter.

A general all-zero lattice filter is shown in Fig. 3. A lattice structured combination filter can be obtained by using the Burg method to compute the reflection coefficients [9]. The Burg method guarantees a minimum phase solution. In the Burg method, the error criterion to be minimized is the sum of the squares of the forward and backward residuals [f_i(n) and b_i(n)] at a particular lattice stage. This error term is minimized stage by stage. For a combination filter, the first Nf stages have nonzero reflection coefficients. These are followed by stages with zero-valued reflection coefficients until the Mth stage. The next Np stages correspond to the pitch part of the filter. Note that as in the cascade connection, the determination of the reflection coefficients in the lattice form proceeds sequentially: the derivation of the coefficients of the formant stages does not take into account the later pitch stages. However, the pitch coefficients do take the preceding formant stages into account.

Care must be exercised in a lattice implementation to avoid problems when the pitch lag changes. As the pitch lag changes, the pitch stages change location. For some of the stages, the reflection coefficients change suddenly from a nonzero value to a zero value and vice versa. If the pitch lag increases from one frame to another, the backward residual in the lattice (the signals b_i(n) in Fig. 3) entering the pitch stages will have been filtered by the old pitch coefficients. This can have a detrimental effect on the performance for filters with pitch taps. (For formant filters, the same stages are involved in the filtering for each analysis frame. Also, the changes in the reflection coefficients will tend to be less abrupt than those for the pitch stages.) The remedy is to reset, at each frame boundary, the portion of the backward residual after the formant part of the combination lattice filter to delayed versions of the formant predicted backward residual. This is similar to the remedy used for a lattice pitch filter in [5].

B. Jointly Optimized Cascade Connection

Previous implementations using pitch filters have employed a sequential optimization of the formant and pitch predictors in a cascade connection. Specifically, the formant filter coefficients have been chosen to minimize the intermediate residual signal. This intermediate residual will still have pitch pulses present. The mean-square criterion penalizes the solution for the presence of these relatively high amplitude pulses in the residual. A jointly optimized solution minimizes the mean-square error in the final residual which results after both formant and pitch filtering. This formulation allows the formant filter to ignore pitch pulses in the intermediate residual which will be subsequently removed by the pitch filter.

1) One-Shot Combined Optimization: Under some circumstances, a one-shot combined optimization approach to developing an optimal F-P cascade is possible. The one-shot approach involves solving a linear system of equations and requires no iterations. Consider a frame of samples from n = 0 to n = N - 1. The inputs are the original speech samples s(n-1), ..., s(n-Nf) and samples of the formant predicted residual d(n-M), ..., d(n-M-Np+1). The analysis essentially decouples the pitch filter from the output of the formant filter as depicted in Fig. 4.

Fig. 4. F-P cascade configuration.

To ensure that samples of the formant predicted residual are available beforehand for the analysis, they must depend only on coefficients from previous frames. This imposes the constraint that the pitch lag M be at least as large as the frame length N. The need to have the value of M at least as large as the frame length is a major limitation to the usefulness of this technique. In addition, the requirement that the intermediate residual signal be available beforehand means that the one-shot combined optimization is not possible for a P-F cascade.

For the one-shot combined optimization, the windowed mean-square error is minimized, resulting in a system of equations Φc = a, where the coefficient vector is c^T = [a_1, ..., a_Nf, β1, ..., βNp] and the data vector for sample n contains s(n-1), ..., s(n-Nf), d(n-M), ..., d(n-M-Np+1). The system of equations can be written in partitioned form (7): the upper left-hand corner of Φ (Nf by Nf) has the terms corresponding to a formant filter acting on the input signal; the lower right-hand corner (Np by Np) has the terms corresponding to a pitch filter acting on the formant predicted residual; the other two corners of the matrix contain interaction terms which allow for the combined optimization. The cross-correlation terms are given by

φ_xy(i, j) = Σ_{n=0}^{N-1} x(n - i) y(n - j).

2) Iterated Combined Optimization: The limitation that M ≥ N can be circumvented by using an iterative approach to calculate the combined solution. The iterated combined optimization method uses the intermediate residual signal d(n) from a previous iteration to solve the system of equations. For the first iteration, a conventional formant filter formulation is used (i.e., the pitch coefficients are set to zero). The formant filter is used to filter the input signal in order to produce the intermediate residual signal. The next iteration uses a combined optimization, but based on the intermediate residual from the previous iteration. The updated formant filter is used to generate an updated intermediate residual signal. The iterations continue using the combined solution in this manner until the error no longer decreases significantly. Note that this algorithm does not guarantee that the overall error decreases monotonically. If the error increases during a particular iteration, the process is stopped and the previous solution is kept. Hence, this approach guarantees a mean-square error that is never worse than for the conventional sequential solution.

3) Iterated Sequential Optimization: An alternate iterative scheme can be built around a sequential optimization of an F-P cascade. First assume that the formant predictor appears alone or, equivalently, that the pitch predictor coefficients are initially zero. The formant predictor coefficients are chosen to minimize the intermediate residual energy. The pitch predictor coefficients are then found using this intermediate residual signal. At this point, the filter coefficients are those that would be used in a conventional F-P cascade. In the iterated sequential optimization scheme, the formant filter is reoptimized given the previously determined pitch predictor coefficients.
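
Returning to the one-shot solution of 1) above, the partitioned system can be set up directly as a least-squares problem; a minimal sketch (the frame indexing and variable names are assumptions of the sketch):

    import numpy as np

    def one_shot_combined(s, d, Nf, Np, M, n0, N):
        # One-shot combined optimization of an F-P cascade. The final residual
        #   e(n) = s(n) - sum_k a_k s(n-k) - sum_j beta_j d(n-M-j+1)
        # is minimized jointly over a_k and beta_j. Requires M >= N so that the formant
        # predicted residual d at lags >= M comes entirely from previous frames.
        assert M >= N, "one-shot solution needs the pitch lag to reach back past the frame"
        n = np.arange(n0, n0 + N)
        cols = [s[n - k] for k in range(1, Nf + 1)] + [d[n - M - j] for j in range(Np)]
        X = np.stack(cols, axis=1)
        Phi = X.T @ X              # the partitioned correlation matrix of (7)
        a_vec = X.T @ s[n]
        c = np.linalg.solve(Phi, a_vec)
        return c[:Nf], c[Nf:]      # formant coefficients, pitch coefficients

When M < N, the same solve is embedded in the iterated combined scheme of 2): d is regenerated with the latest formant filter and the joint system is re-solved until the error stops decreasing.
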

Fig. 5. Analysis model for a prediction filter with error weighting.

The equations to reoptimize the formant filter taking into account the pitch filter can be viewed as minimizing a weighted mean-square error criterion. The analysis model is shown in Fig. 5. For the case at hand, the weighting filter H(z) is the pitch prediction error filter. The analysis model shows an additional input signal e_0(n) which captures the initial conditions and allows for the use of zero initial conditions for all filters. In our case, e_0(n) is the output of the pitch filter due to inputs from previous analysis frames. There is an additional window applied to the intermediate residual signal. For a covariance analysis, the residual window w_r(n) is normally the same as the error window w_e(n). Minimization of the weighted error energy leads to a set of linear equations Φc = a, where in this case c is the vector of formant predictor coefficients. The elements of Φ are given by

φ(i, j) = Σ_n w_e(n) s^(i)(n) s^(j)(n),   for 1 ≤ i, j ≤ Nf,   (9)

where

s^(i)(n) = Σ_m w_r(m) h(n - m) s_w(m - i).   (10)

The elements of a are the corresponding cross-correlation terms. One can note that the effects of the initial conditions due to e_0(n) on the vector a are such that even an autocorrelation solution does not guarantee a stable formant synthesis filter. In the case being studied, the e_0(n) term is due to the relatively long delayed outputs of the pitch filter. It is not appropriate to neglect these for normal analysis frame sizes. Consider a covariance analysis with rectangular residual and error windows (nonzero for n ranging from 0 to N - 1) and a causal weighting filter. The form of the signal s^(i)(n) can then be simplified,

s^(i)(n) = Σ_{m=0}^{n} h(n - m) s(m - i),   0 ≤ n < N.   (12)

With the weighted error formulation, the formant filter can take into account the effect of the pitch filter. The iterated sequential optimization method first finds the formant filter which minimizes the error taking into account the pitch filter. Next, the formant predicted residual is formed. The pitch filter is optimized based on this residual. These two steps are then iterated. At each step, the overall residual energy decreases and eventually approaches a local minimum.

Some points on the application of the iterated sequential optimization procedure need elaboration. First, the ordering of the predictors is important. In general, different solutions will be reached for the F-P and P-F cascades. (In the experiments, only the F-P cascade is used due to its superiority over its P-F counterpart in a conventional sequential optimization.) Second, the initial conditions (starting pitch filter coefficients) affect the solution. We start by solving for a formant filter given a pitch filter with zero valued coefficients. This guarantees results that are at least as good as a conventional sequential solution for an F-P cascade. Third, the procedure is only guaranteed to converge to a local minimum if the optimality criterion is the same for determining both the formant and pitch predictor coefficients. (This precludes the use of, say, an autocorrelation analysis for the formant filter and a covariance analysis for the pitch filter.) Fourth, the value of M used for the pitch filter is assumed to be constant during the iteration process. If this parameter is changed during the iteration, the convergence properties may be compromised.
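
A sketch of one formant reoptimization step under this weighted-error criterion, for rectangular windows and a causal weighting filter h(n) (the impulse response of 1 - P(z) for the fixed pitch filter). The frame-history handling and the construction of the right-hand side are assumptions of the sketch, not equations taken from the paper.

    import numpy as np

    def reoptimize_formant(s, e0, h, Nf, n0, N):
        # One step of the iterated sequential scheme: re-solve for the formant coefficients
        # with the pitch prediction error filter acting as an error weighting.
        #   s^(i)(n) = sum_{m=0..n} h(n-m) s(n0+m-i), 0 <= n < N    (cf. eq. (12))
        # e0 holds the pitch filter output due to previous frames (the initial condition term).
        def filtered(i):
            seg = s[n0 - i : n0 - i + N]          # frame-local delayed speech, zero initial state
            return np.convolve(h, seg)[:N]
        S = np.stack([filtered(i) for i in range(1, Nf + 1)], axis=1)   # regressors s^(i)(n)
        target = filtered(0) + e0                                        # weighted frame plus e0(n)
        Phi = S.T @ S                                                    # the elements of (9)
        rhs = S.T @ target
        return np.linalg.solve(Phi, rhs)

    # For a 3-tap pitch filter P(z) = b1 z^-(M-1) + b2 z^-M + b3 z^-(M+1), the weighting
    # impulse response would be (beta is a length-3 NumPy array, assumed given):
    #   h = np.zeros(M + 2); h[0] = 1.0; h[[M - 1, M, M + 1]] -= beta
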
IV. FILTER PERFORMANCE

This section presents the experimental results obtained for the combination filter and for F-P cascade connections in which the coefficients are computed by the various joint solutions. The open-loop predictor configuration as depicted in Fig. 1(a) is used to implement the different algorithms. For the purposes of comparison, covariance analyses are used for both the formant and pitch predictors in the conventional sequential approach. The value of the pitch lag M is chosen such that the prediction gain is maximized. This is accomplished by an exhaustive search over the allowable range and provides an indication of the relative performances of the different approaches by revealing their maximum attainable prediction gains. Later, a practical method of calculating M is combined with the minimum phase constraint for the iterated sequential formulation. The formant predictor has 10 coefficients. One, two, and three tap pitch predictors are used. A total of six sentences were processed, of which three were spoken by a male and three by a female. The sampling frequency is 8 kHz. The average prediction gains in decibels are computed for each sentence in order to compare the different methods. Since the comparisons are essentially consistent for each sentence, the tables which follow present average values of the prediction gains in decibels taken over the six sentences. The tables refer to the formant gain, meaning the ratio of the energy of the input to the formant filter to the energy of the output of the formant filter. A similar definition holds for the pitch gain.
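
A sketch of the exhaustive lag search used for these comparisons, here for a one-tap pitch filter operating on the intermediate residual d over a frame starting at n0 (names are assumptions of the sketch; n0 must exceed M_max so that the lagged samples exist):

    import numpy as np

    def best_pitch_lag(d, n0, N, M_min, M_max):
        # Exhaustive search: choose the lag whose optimal one-tap pitch predictor leaves
        # the least residual energy, i.e., the lag that maximizes the pitch prediction gain.
        n = np.arange(n0, n0 + N)
        frame = d[n]
        best = (M_min, 0.0, np.inf)
        for M in range(M_min, M_max + 1):
            lagged = d[n - M]
            denom = np.dot(lagged, lagged)
            if denom == 0.0:
                continue
            beta = np.dot(frame, lagged) / denom        # covariance solution for one tap
            resid = frame - beta * lagged
            energy = np.dot(resid, resid)
            if energy < best[2]:
                best = (M, beta, energy)
        return best                                      # (lag, coefficient, residual energy)

For one tap, the residual energy equals φ(0,0) - φ²(0,M)/φ(M,M), so this search is equivalent to maximizing φ²(0,M)/φ(M,M); the practical method of Appendix A uses that quantity (with a diagonal approximation for multitap filters) instead of a full exhaustive optimization.
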

A. Short Frames

Consider first the one-shot combined optimization solution given by (7). In this case, the value of the pitch lag M must be at least as large as the frame length. A short frame length of 40 samples is used. The minimum and maximum values of M are set to 40 and 140. This combination of parameters will allow us to evaluate the degree of suboptimality of the iterated approaches. For the iterative sequential scheme, a prediction gain threshold of 0.02 dB or a maximum of 10 iterations was used to terminate the optimization cycle. The resultant F-P cascade formed from the one-shot combined and iterated sequential solutions must outperform the conventional sequentially optimized form for the same analysis conditions.

The results are shown in Table I.

Table I. Prediction gains for sequentially and jointly optimized predictors (40 sample frames). Three numbers in an entry refer to 1, 2, and 3 tap pitch filters. A covariance formulation is used for all cases.

The combined and iteratively optimized configurations have an improved overall prediction gain. Note that the formant gain drops. However, this drop in formant gain is more than compensated by the increase in pitch gain. The iterated sequential scheme approaches the (optimal) one-shot combined solution in overall prediction gain. However, one can note that this solution tends to have a higher formant gain and a lower pitch gain than the one-shot combined solution. The iterated sequential method seems to find solutions which have essentially the same overall prediction gain, but a different apportioning of the formant and pitch gains than the one-shot combined solution. The iterative sequential approach outperforms the noniterative sequential solution, again showing a tradeoff toward a higher pitch gain and a lower formant gain. We note that in all three methods considered in the table, there is a stability problem for the formant filter as well as for the pitch filter due to the fact that a covariance solution has been used. Comments on the relative stability of the corresponding synthesis filters are given in a later section.

B. Longer Frame Sizes

Now, consider a larger frame size of 80 samples. Restricting the pitch lag to be above 80 is not appropriate since the pitch period for voiced speech is frequently below 80. This is especially true for female speakers, whose pitch period is usually smaller than for male speakers. For these experiments, the pitch lag assumes values between 20 and 120. The combined solution uses the one-shot scheme for pitch lags larger than the frame size and uses the iterated approach for smaller pitch lags. Experiments show that an average of three iterations (0.02 dB prediction gain threshold) are needed to resolve the coefficients.

Table II shows the results for 80 sample frames.

Table II. Prediction gains for the sequential and jointly optimized predictors (80 sample frames). Three numbers in an entry refer to 1, 2, and 3 tap pitch filters. A covariance formulation is used for all cases.

The iterated sequential method gives larger prediction gains than the iterated combined optimization method at the expense of a slightly larger average number of iterations (5 per frame). For the iterated combined optimization scheme, the iterations are often terminated by a decrease in overall prediction gain (nonmonotonicity of decrease in error). Combination filters designated as F+P in the table are implemented as both transversal and lattice structures.
The transversal form, which takes into account the interaction between the formant part and the pitch part, performs better than the conventional F-P sequential solution but is inferior to the iterated combined and iterated sequential algorithms. The use of the transversal combination filter is deprecated since the filter frequently becomes nonminimum phase (more than 70 percent of the frames). The table does not show figures for the formant and pitch gains separately for the combination filters. In the transversal form, the prediction gains of the separated components 1 - F(z) and 1 - P(z) tend to be relatively small, and when added fall far short of the overall prediction gain. The lattice form of a combination filter performs more poorly than the transversal form. This can be attributed to the essentially sequential determination of the reflection coefficients and to the error criterion (sum of forward and backward errors) used in the Burg formulation. It can be said in favor of the lattice implementation that the lattice filter does offer an improvement due to pitch prediction and that no stability problems are encountered. However, the lattice combination filter does not realize the full potential for removing the far-sample redundancies.

The results show that a joint solution can achieve an overall prediction gain that is about 1.5 dB better than the conventional sequential approach. The joint F-P cascade approaches give significantly different solutions than the conventional sequential method, as manifested in the lowered formant gain and significantly increased pitch gain. We postpone the investigation of the effects of these differences to a later section in which the stability problems are rectified.

The synthesis filter used for a combination implementation in transversal form is H_{F+P}(z) = 1/(1 - F(z) - P(z)). The denominator polynomial is of high order and has many nonzero coefficients. Experiments show that the percentage of frames with unstable filters is 70, 72, and 75 percent for filters with 1, 2, or 3 pitch coefficients, respectively. The stability was checked by comparing the magnitudes of the equivalent set of reflection coefficients with unity.
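
A sketch of that reflection coefficient test, using the step-down (backward Levinson) recursion to convert the direct-form coefficients of a prediction error filter into reflection coefficients:

    import numpy as np

    def is_minimum_phase(c, eps=1e-9):
        # Check whether the prediction error filter 1 - c[0] z^-1 - ... - c[N-1] z^-N has all
        # of its zeros inside the unit circle: every reflection coefficient obtained from the
        # step-down (backward Levinson) recursion must have magnitude less than one.
        a = -np.asarray(c, dtype=float)        # coefficients of A(z) = 1 + a[0] z^-1 + ...
        while a.size:
            k = a[-1]                          # reflection coefficient at the current order
            if abs(k) >= 1.0 - eps:
                return False
            if a.size == 1:
                break
            a = (a[:-1] - k * a[-2::-1]) / (1.0 - k * k)   # step down one order
        return True

    # e.g. is_minimum_phase([0.9]) is True, while is_minimum_phase([1.2]) is False

The same test applies to the formant filter 1 - F(z) and, with the zero-valued coefficients between the formant and pitch delay groups included, to the combination filter 1 - F(z) - P(z).
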

Given the high incidence of instability resulting from the covariance solution and the fact that practical methods to stabilize such a filter are not known, the combination filter is not suitable for applications requiring resynthesis of the speech signal. Iterated combined and iterated sequential optimization of an F-P cascade lead to more cases of instability than the conventional sequential solution. Table III gives figures for the percentage of frames with unstable synthesis filters for sequential optimization, iterated sequential optimization, and iterated combined optimization (80 sample frames).

Table III. Stability of sequential and jointly optimized filters for 80 sample frames, covariance analysis. Three numbers in an entry refer to 1, 2, and 3 tap pitch filters.

The stability of the formant filter was determined by converting the predictor coefficients obtained to the equivalent reflection coefficients. The stability of the pitch filter was determined by the tight sufficient test given in [6]. The incidence of instability of the formant filters is much lower than that for the pitch filters. The question that is addressed in the next section is whether the increased prediction gain of the joint solutions comes only at the expense of unstable filters.

VI. MINIMUM PHASE JOINT SOLUTION

In this section, an optimization algorithm for an F-P cascade with both a minimum phase formant filter and a minimum phase pitch filter will be described. This algorithm is based on the iterated sequential technique. The formant filter is determined by taking the pitch filter into account as long as its minimum phase property is not sacrificed. In addition, a practical method to determine the pitch lag is used.

The formant and pitch filters are determined as follows. For the first iteration, a covariance approach is used to determine a formant filter. If this filter is not minimum phase, as determined from the magnitudes of the corresponding reflection coefficients, new coefficients are found using a modified covariance approach which guarantees a minimum phase solution. The input signal is filtered to produce the intermediate residual signal. The pitch lag is determined using the method described in Appendix A. This method gives the optimal pitch lag if the near-sample redundancies have been removed by the formant filter. Practical results in a sequential solution for an F-P cascade show a negligible degradation of the prediction gain when compared to an exhaustive search [5]. Moreover, the immense computation associated with performing all the iterations for every pitch lag and selecting the pitch lag that maximizes the prediction gain (exhaustive search) is avoided. The pitch filter coefficients are found using a covariance analysis. If the pitch filter is not minimum phase, it is stabilized by the algorithm developed in [6]. After the first iteration, the pitch lag is fixed at its initially determined value. Subsequent iterations allow for the formant filter to take into account the pitch filter using the analysis derived for the iterated sequential method. Again, a covariance analysis is used. If the formant filter is unstable, one reverts to a modified covariance analysis for the formant filter. Note that the modified covariance analysis does not take into account the effect of the pitch filter. An intermediate residual is generated. The pitch filter is again determined based on the updated intermediate residual and stabilized if necessary. These iterations continue until the overall gain increases by less than 0.02 dB. The overall control flow is sketched below.
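
The following sketch captures only that control flow; the three callables are hypothetical placeholders for the analyses described above (minimum phase formant solve with modified covariance fallback, pitch solve with lag selection and stabilization, and residual generation), not routines defined in the paper.

    import numpy as np

    def min_phase_iterated_sequential(s, frame, solve_formant, solve_pitch, residuals,
                                      max_iter=10, threshold_db=0.02):
        # Hypothetical placeholder interfaces:
        #   solve_formant(s, frame, pitch)      -> minimum phase formant coefficients
        #   solve_pitch(d, frame, lag)          -> (lag, stabilized pitch coefficients);
        #                                          lag=None means "choose the lag" (Appendix A)
        #   residuals(s, frame, formant, pitch) -> (intermediate residual d, final residual e)
        def gain_db(e):
            x = s[frame]
            return 10.0 * np.log10(np.dot(x, x) / np.dot(e, e))

        formant = solve_formant(s, frame, pitch=None)      # first pass: formant filter alone
        d, _ = residuals(s, frame, formant, pitch=None)
        pitch = solve_pitch(d, frame, lag=None)            # lag chosen once, then held fixed
        best_gain = gain_db(residuals(s, frame, formant, pitch)[1])
        best = (formant, pitch)

        for _ in range(max_iter):
            formant = solve_formant(s, frame, pitch)       # weighted-error reoptimization
            d, _ = residuals(s, frame, formant, pitch)
            pitch = solve_pitch(d, frame, lag=pitch[0])    # refit the coefficients at the fixed lag
            gain = gain_db(residuals(s, frame, formant, pitch)[1])
            if gain - best_gain < threshold_db:
                break
            best_gain, best = gain, (formant, pitch)
        return best
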
The formulation given above is based on a common analysis frame for the formant and pitch filter optimization. However, this formulation can be generalized to allow for different formant and pitch frame sizes. Consider a formant analysis frame which is divided into subframes for pitch analysis. In each subframe, the pitch lag and pitch coefficients are optimized. The modification to the iterative procedure involves replacing h(n) in (12) by a time-varying filter. Having different frame sizes for the two parts of the analysis is appropriate in a speech coding context. Good prediction gain is maintained when the formant filter is updated at a slower rate than the pitch filter. Reducing the update rate for the formant filter decreases the side information rate needed to send the predictor coefficients to the decoder.

A. Results for a Minimum Phase Joint Optimization

The resulting prediction gains for a minimum phase iterated sequential solution are shown in Table IV for 80 sample frames. Also included in the table are the prediction gains if the process is stopped at the first iteration (a pure sequential approach with minimum phase filters). The results show that an increase in overall prediction gain is achieved by the iterative approach even when minimum phase filters are mandated. These results can also be compared to Table II to reveal a drop in overall prediction gain due to the use of stabilized filters and a practical method to find the pitch lag. However, the imposition of the above two constraints does not substantially diminish the prediction gain. The iterated sequential approach continues to outperform the conventional sequential method. Moreover, the gain achieved by the constrained iterative approach still remains above that achieved by the unconstrained F-P sequential algorithm, which can have both unstable formant and unstable pitch filters (see Table II).

Fig. 6 shows plots of the prediction gains (3 tap pitch filter) for an utterance ("Thieves who rob friends deserve jail") spoken by a female. The prediction gains are calculated for 80 sample frames. The plots show that the largest increases in prediction gain attained by the iterated approach tend to be in the energetic voiced regions. Fig. 7 shows the formant and formant/pitch predicted residuals for a voiced section of the same utterance.

Table IV. Prediction gains for minimum phase filters in an F-P cascade (80 sample frames). Three numbers in an entry refer to 1, 2, and 3 tap pitch filters.
Fig. 6. Prediction gains (3 tap pitch filter), frame by frame. In each of the two lower plots, the upper trace is the overall prediction gain, the middle light trace is the formant prediction gain, and the lower dashed trace is the pitch prediction gain.
Fig. 7. Prediction residuals (3 tap pitch filter) for a section of voiced speech. Magnification factors are noted beside each plot.
Fig. 8. Calculation of the weighted error in a CELP coder.

The iterated sequential method gives a formant residual with larger, but more regular, pitch pulses than the sequential method. The efficacy of the pitch filter is enhanced by these regular pitch pulses. The pitch filter forms a final residual having a smaller magnitude and containing fewer spurious bursts than for the conventional sequential approach.

The conventional sequential and iterated sequential approaches (both using stabilized filters and the practical method to choose the pitch lag) were implemented as part of a rudimentary version of a CELP coder. The analysis frame size remains at 80 samples. A 10th-order formant predictor and a 3 tap pitch predictor were used. Forty sample blocks of the residual were compared to a dictionary of 1024 waveforms consisting of Gaussian random numbers with unit variance. This comparison was performed by the system shown in Fig. 8. The codeword that represents the residual is the one that achieves the least weighted error. The error is weighted by a filter W(z) = (1 - F(z))/(1 - F(z/α)). This filter deemphasizes the frequencies which contribute less to perceptual error and emphasizes the frequencies which contribute more to perceptual error [4]. The noise weighting factor α equals 0.8.

Listening tests reveal that the iterated sequential algorithm leads to resynthesized speech that is more natural sounding than the decoded speech resulting from the sequential approach. Also, this speech sounds less warbled than its conventional counterpart. Although the perceived improvement varies from sentence to sentence, this phenomenon is clearly evident in the sentence "Thieves who rob friends deserve jail" and can be attributed to the fact that the residual as depicted in Fig. 7 has fewer spurious peaks when the iterated sequential method is used. The Gaussian codewords provide a better match to the residual when it has less energetic pitch pulses. Consequently, the weighted error is smaller and the decoded speech sounds more like the original. Measurements of the signal-to-noise ratio (SNR) that arises when the original speech is compared to the decoded speech were made. For the sentence "Thieves who rob friends deserve jail," the SNR improves by 2.5 dB when the iterated sequential method is used.
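
A sketch of this weighted codebook search (SciPy is used for the pole-zero weighting filter; filter memories between blocks are ignored, so this is a simplification of the Fig. 8 arrangement rather than a faithful reimplementation):

    import numpy as np
    from scipy.signal import lfilter

    def celp_search(residual_block, a, codebook, alpha=0.8):
        # Select the codeword index and gain that minimize the error weighted by
        # W(z) = (1 - F(z)) / (1 - F(z/alpha)), where a holds the formant predictor
        # coefficients a_1..a_Nf as a NumPy array.
        b = np.concatenate(([1.0], -a))                                        # 1 - F(z)
        a_w = np.concatenate(([1.0], -a * alpha ** np.arange(1, len(a) + 1)))  # 1 - F(z/alpha)
        weight = lambda x: lfilter(b, a_w, x)
        target = weight(residual_block)
        best = (0, 0.0, np.inf)
        for idx, cw in enumerate(codebook):
            cw_w = weight(cw)
            gain = np.dot(target, cw_w) / np.dot(cw_w, cw_w)   # optimal scaling for this codeword
            err = target - gain * cw_w
            energy = np.dot(err, err)
            if energy < best[2]:
                best = (idx, gain, energy)
        return best                                             # (codeword index, gain, weighted error)

    # usage sketch: 1024 unit-variance Gaussian codewords of 40 samples each
    #   codebook = np.random.default_rng(0).standard_normal((1024, 40))
    #   idx, gain, err = celp_search(residual_block, a, codebook)
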
VII. SUMMARY AND CONCLUSIONS

This paper has introduced formulations which allow for the joint optimization of the formant and pitch predictors. The final residual can be directly minimized either by a combination implementation or by a cascade of jointly optimized filters. A combination implementation is viewed as a single prediction error filter that removes both near-sample and distant-sample correlations. Either a transversal or lattice structure can be used. The lattice structure is inferior to the transversal form in terms of prediction gain. Although the combination transversal filter outperforms the conventional F-P sequential solution, it suffers from serious stability problems attributable to the covariance method of solution. For a cascade configuration, a significant advantage in terms of prediction gain exists when the final residual is

directly minimized. The resulting increase in prediction gain comes about by a lowered formant gain that is compensated by a significantly higher pitch gain. However, the limitations of this formulation allow for a one-shot combined solution only for short frame lengths. Iterated schemes can be used to get around the frame length constraints. However, since this approach uses a covariance analysis, it admits both unstable formant and unstable pitch synthesis filters. An iterated sequential solution achieves slightly more prediction gain than the iterated combined solution. Furthermore, the minimum phase constraint can be satisfied by backing off to the modified covariance method for the formant filter in cases of nonminimum phase filters and by stabilizing the pitch filter. The advantages of the iterated sequential method over the one-shot F-P sequential method lie not only in achieving a higher prediction gain but also in producing an overall residual that has diminished pitch spikes. Moreover, experiments with a simple CELP coder show that the resynthesized speech is of better perceptual quality.

APPENDIX A

A practical method to determine the pitch lag has been developed in [5] and is briefly described here. The covariance method applied to a pitch filter leads to a linear system of equations (Φc = a), where φ(i, j) is the correlation for the input to the pitch filter,

φ(i, j) = Σ_{n=0}^{N-1} d(n - i) d(n - j).   (A.2)

Then c = Φ^{-1} a, and the resulting mean-squared error is E² = φ(0, 0) - c^T a. The value of the pitch lag M should be chosen so as to maximize c^T a. For a one tap filter, this means choosing M to maximize φ²(0, M)/φ(M, M). The situation for multitap pitch filters is more complex but is simplified on the assumption that the near-sample redundancies are essentially removed when formant prediction is performed before pitch prediction (F-P cascade). Then, the off-diagonal terms in the matrix Φ are small and can be neglected. This leads to a diagonal matrix approximation of the system, with c^T a becoming

c^T a = Σ_{m=M}^{M+Np-1} φ²(0, m)/φ(m, m).   (A.4)

The value of M that maximizes this quantity is chosen as the pitch lag.

REFERENCES

[1] B. S. Atal and M. R. Schroeder, "Predictive coding of speech signals and subjective error criteria," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-27, June 1979.
[2] J. Makhoul and M. Berouti, "Adaptive noise spectral shaping and entropy coding of speech," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-27, Feb. 1979.
[3] N. S. Jayant and P. Noll, Digital Coding of Waveforms. Englewood Cliffs, NJ: Prentice-Hall, 1984.
[4] M. R. Schroeder and B. S. Atal, "Code-excited linear prediction (CELP): High-quality speech at very low bit rates," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, Tampa, FL, Mar. 1985.
[5] R. P. Ramachandran and P. Kabal, "Pitch prediction filters in speech coding," IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, Apr. 1989.
[6] R. P. Ramachandran and P. Kabal, "Stability and performance analysis of pitch filters in speech coders," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, July 1987.
[7] S. W. Lang and J. H. McClellan, "A simple proof of stability for all-pole linear prediction models," Proc. IEEE, vol. 67, May 1979.
[8] B. W. Dickinson, "Autoregressive estimation using residual energy ratios," IEEE Trans. Inform. Theory, vol. IT-24, July 1978.
[9] J. Makhoul, "Stable and efficient lattice methods for linear prediction," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-25, Oct. 1977.

Ravi P. Ramachandran, for a photograph and biography, see p. 478 of the April 1989 issue of this TRANSACTIONS.


More information

Department of Electrical and Computer Engineering Digital Speech Processing Homework No. 7 Solutions

Department of Electrical and Computer Engineering Digital Speech Processing Homework No. 7 Solutions Problem 1 Department of Electrical and Computer Engineering Digital Speech Processing Homework No. 7 Solutions Linear prediction analysis is used to obtain an eleventh-order all-pole model for a segment

More information

David Weenink. First semester 2007

David Weenink. First semester 2007 Institute of Phonetic Sciences University of Amsterdam First semester 2007 Digital s What is a digital filter? An algorithm that calculates with sample values Formant /machine H 1 (z) that: Given input

More information

Closed-Form Design of Maximally Flat IIR Half-Band Filters

Closed-Form Design of Maximally Flat IIR Half-Band Filters IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 49, NO. 6, JUNE 2002 409 Closed-Form Design of Maximally Flat IIR Half-B Filters Xi Zhang, Senior Member, IEEE,

More information

EFFECTS OF ILL-CONDITIONED DATA ON LEAST SQUARES ADAPTIVE FILTERS. Gary A. Ybarra and S.T. Alexander

EFFECTS OF ILL-CONDITIONED DATA ON LEAST SQUARES ADAPTIVE FILTERS. Gary A. Ybarra and S.T. Alexander EFFECTS OF ILL-CONDITIONED DATA ON LEAST SQUARES ADAPTIVE FILTERS Gary A. Ybarra and S.T. Alexander Center for Communications and Signal Processing Electrical and Computer Engineering Department North

More information

{-30 - j12~, 0.44~ 5 w 5 T,

{-30 - j12~, 0.44~ 5 w 5 T, 364 B E TRANSACTONS ON CRCUTS AND SYSTEMS-t ANALOG AND DGTAL SGNAL PROCESSNG, VOL. 41, NO. 5, MAY 1994 Example 2: Consider a non-symmetric low-pass complex FR log filter specified by - j12~, -T w 5-0.24~,

More information

A Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise

A Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise 334 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL 11, NO 4, JULY 2003 A Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise Yi Hu, Student Member, IEEE, and Philipos C

More information

3GPP TS V6.1.1 ( )

3GPP TS V6.1.1 ( ) Technical Specification 3rd Generation Partnership Project; Technical Specification Group Services and System Aspects; Speech codec speech processing functions; Adaptive Multi-Rate - Wideband (AMR-WB)

More information

EIGENFILTERS FOR SIGNAL CANCELLATION. Sunil Bharitkar and Chris Kyriakakis

EIGENFILTERS FOR SIGNAL CANCELLATION. Sunil Bharitkar and Chris Kyriakakis EIGENFILTERS FOR SIGNAL CANCELLATION Sunil Bharitkar and Chris Kyriakakis Immersive Audio Laboratory University of Southern California Los Angeles. CA 9. USA Phone:+1-13-7- Fax:+1-13-7-51, Email:ckyriak@imsc.edu.edu,bharitka@sipi.usc.edu

More information

Efficient Use Of Sparse Adaptive Filters

Efficient Use Of Sparse Adaptive Filters Efficient Use Of Sparse Adaptive Filters Andy W.H. Khong and Patrick A. Naylor Department of Electrical and Electronic Engineering, Imperial College ondon Email: {andy.khong, p.naylor}@imperial.ac.uk Abstract

More information

MITIGATING UNCORRELATED PERIODIC DISTURBANCE IN NARROWBAND ACTIVE NOISE CONTROL SYSTEMS

MITIGATING UNCORRELATED PERIODIC DISTURBANCE IN NARROWBAND ACTIVE NOISE CONTROL SYSTEMS 17th European Signal Processing Conference (EUSIPCO 29) Glasgow, Scotland, August 24-28, 29 MITIGATING UNCORRELATED PERIODIC DISTURBANCE IN NARROWBAND ACTIVE NOISE CONTROL SYSTEMS Muhammad Tahir AKHTAR

More information

BASIC COMPRESSION TECHNIQUES

BASIC COMPRESSION TECHNIQUES BASIC COMPRESSION TECHNIQUES N. C. State University CSC557 Multimedia Computing and Networking Fall 2001 Lectures # 05 Questions / Problems / Announcements? 2 Matlab demo of DFT Low-pass windowed-sinc

More information

ADAPTIVE signal processing algorithms (ASPA s) are

ADAPTIVE signal processing algorithms (ASPA s) are IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 46, NO. 12, DECEMBER 1998 3315 Locally Optimum Adaptive Signal Processing Algorithms George V. Moustakides Abstract We propose a new analytic method for comparing

More information

ETSI TS V ( )

ETSI TS V ( ) TS 146 060 V14.0.0 (2017-04) TECHNICAL SPECIFICATION Digital cellular telecommunications system (Phase 2+) (GSM); Enhanced Full Rate (EFR) speech transcoding (3GPP TS 46.060 version 14.0.0 Release 14)

More information

AN IMPROVED ADPCM DECODER BY ADAPTIVELY CONTROLLED QUANTIZATION INTERVAL CENTROIDS. Sai Han and Tim Fingscheidt

AN IMPROVED ADPCM DECODER BY ADAPTIVELY CONTROLLED QUANTIZATION INTERVAL CENTROIDS. Sai Han and Tim Fingscheidt AN IMPROVED ADPCM DECODER BY ADAPTIVELY CONTROLLED QUANTIZATION INTERVAL CENTROIDS Sai Han and Tim Fingscheidt Institute for Communications Technology, Technische Universität Braunschweig Schleinitzstr.

More information

ETSI TS V5.0.0 ( )

ETSI TS V5.0.0 ( ) Technical Specification Universal Mobile Telecommunications System (UMTS); AMR speech Codec; Transcoding Functions () 1 Reference RTS/TSGS-046090v500 Keywords UMTS 650 Route des Lucioles F-0691 Sophia

More information

CEPSTRAL ANALYSIS SYNTHESIS ON THE MEL FREQUENCY SCALE, AND AN ADAPTATIVE ALGORITHM FOR IT.

CEPSTRAL ANALYSIS SYNTHESIS ON THE MEL FREQUENCY SCALE, AND AN ADAPTATIVE ALGORITHM FOR IT. CEPSTRAL ANALYSIS SYNTHESIS ON THE EL FREQUENCY SCALE, AND AN ADAPTATIVE ALGORITH FOR IT. Summarized overview of the IEEE-publicated papers Cepstral analysis synthesis on the mel frequency scale by Satochi

More information

Feature extraction 2

Feature extraction 2 Centre for Vision Speech & Signal Processing University of Surrey, Guildford GU2 7XH. Feature extraction 2 Dr Philip Jackson Linear prediction Perceptual linear prediction Comparison of feature methods

More information

Lab 9a. Linear Predictive Coding for Speech Processing

Lab 9a. Linear Predictive Coding for Speech Processing EE275Lab October 27, 2007 Lab 9a. Linear Predictive Coding for Speech Processing Pitch Period Impulse Train Generator Voiced/Unvoiced Speech Switch Vocal Tract Parameters Time-Varying Digital Filter H(z)

More information

PARAMETRIC coding has proven to be very effective

PARAMETRIC coding has proven to be very effective 966 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 High-Resolution Spherical Quantization of Sinusoidal Parameters Pim Korten, Jesper Jensen, and Richard Heusdens

More information

A Modified Baum Welch Algorithm for Hidden Markov Models with Multiple Observation Spaces

A Modified Baum Welch Algorithm for Hidden Markov Models with Multiple Observation Spaces IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 9, NO. 4, MAY 2001 411 A Modified Baum Welch Algorithm for Hidden Markov Models with Multiple Observation Spaces Paul M. Baggenstoss, Member, IEEE

More information

Basic Principles of Video Coding

Basic Principles of Video Coding Basic Principles of Video Coding Introduction Categories of Video Coding Schemes Information Theory Overview of Video Coding Techniques Predictive coding Transform coding Quantization Entropy coding Motion

More information

A Family of Nyquist Filters Based on Generalized Raised-Cosine Spectra

A Family of Nyquist Filters Based on Generalized Raised-Cosine Spectra Proc. Biennial Symp. Commun. (Kingston, Ont.), pp. 3-35, June 99 A Family of Nyquist Filters Based on Generalized Raised-Cosine Spectra Nader Sheikholeslami Peter Kabal Department of Electrical Engineering

More information

Feature extraction 1

Feature extraction 1 Centre for Vision Speech & Signal Processing University of Surrey, Guildford GU2 7XH. Feature extraction 1 Dr Philip Jackson Cepstral analysis - Real & complex cepstra - Homomorphic decomposition Filter

More information

Adaptive MMSE Equalizer with Optimum Tap-length and Decision Delay

Adaptive MMSE Equalizer with Optimum Tap-length and Decision Delay Adaptive MMSE Equalizer with Optimum Tap-length and Decision Delay Yu Gong, Xia Hong and Khalid F. Abu-Salim School of Systems Engineering The University of Reading, Reading RG6 6AY, UK E-mail: {y.gong,x.hong,k.f.abusalem}@reading.ac.uk

More information

ETSI EN V7.1.1 ( )

ETSI EN V7.1.1 ( ) European Standard (Telecommunications series) Digital cellular telecommunications system (Phase +); Adaptive Multi-Rate (AMR) speech transcoding GLOBAL SYSTEM FOR MOBILE COMMUNICATIONS R Reference DEN/SMG-110690Q7

More information

HARMONIC VECTOR QUANTIZATION

HARMONIC VECTOR QUANTIZATION HARMONIC VECTOR QUANTIZATION Volodya Grancharov, Sigurdur Sverrisson, Erik Norvell, Tomas Toftgård, Jonas Svedberg, and Harald Pobloth SMN, Ericsson Research, Ericsson AB 64 8, Stockholm, Sweden ABSTRACT

More information

Vector Quantization Encoder Decoder Original Form image Minimize distortion Table Channel Image Vectors Look-up (X, X i ) X may be a block of l

Vector Quantization Encoder Decoder Original Form image Minimize distortion Table Channel Image Vectors Look-up (X, X i ) X may be a block of l Vector Quantization Encoder Decoder Original Image Form image Vectors X Minimize distortion k k Table X^ k Channel d(x, X^ Look-up i ) X may be a block of l m image or X=( r, g, b ), or a block of DCT

More information

New Statistical Model for the Enhancement of Noisy Speech

New Statistical Model for the Enhancement of Noisy Speech New Statistical Model for the Enhancement of Noisy Speech Electrical Engineering Department Technion - Israel Institute of Technology February 22, 27 Outline Problem Formulation and Motivation 1 Problem

More information

Chapter 10 Applications in Communications

Chapter 10 Applications in Communications Chapter 10 Applications in Communications School of Information Science and Engineering, SDU. 1/ 47 Introduction Some methods for digitizing analog waveforms: Pulse-code modulation (PCM) Differential PCM

More information

Tracking the Instability in the FTF Algorithm

Tracking the Instability in the FTF Algorithm Tracking the Instability in the FTF Algorithm Richard C. LeBorne Tech. Universität Hamburg-Harburg Arbeitsbereich Mathematik Schwarzenbergstraße 95 2173 Hamburg, Germany leborne@tu-harburg.de Ian K. Proudler

More information

Bayesian Estimation of Time-Frequency Coefficients for Audio Signal Enhancement

Bayesian Estimation of Time-Frequency Coefficients for Audio Signal Enhancement Bayesian Estimation of Time-Frequency Coefficients for Audio Signal Enhancement Patrick J. Wolfe Department of Engineering University of Cambridge Cambridge CB2 1PZ, UK pjw47@eng.cam.ac.uk Simon J. Godsill

More information

Half-Pel Accurate Motion-Compensated Orthogonal Video Transforms

Half-Pel Accurate Motion-Compensated Orthogonal Video Transforms Flierl and Girod: Half-Pel Accurate Motion-Compensated Orthogonal Video Transforms, IEEE DCC, Mar. 007. Half-Pel Accurate Motion-Compensated Orthogonal Video Transforms Markus Flierl and Bernd Girod Max

More information

TinySR. Peter Schmidt-Nielsen. August 27, 2014

TinySR. Peter Schmidt-Nielsen. August 27, 2014 TinySR Peter Schmidt-Nielsen August 27, 2014 Abstract TinySR is a light weight real-time small vocabulary speech recognizer written entirely in portable C. The library fits in a single file (plus header),

More information

Lloyd-Max Quantization of Correlated Processes: How to Obtain Gains by Receiver-Sided Time-Variant Codebooks

Lloyd-Max Quantization of Correlated Processes: How to Obtain Gains by Receiver-Sided Time-Variant Codebooks Lloyd-Max Quantization of Correlated Processes: How to Obtain Gains by Receiver-Sided Time-Variant Codebooks Sai Han and Tim Fingscheidt Institute for Communications Technology, Technische Universität

More information

Filter Banks for Image Coding. Ilangko Balasingham and Tor A. Ramstad

Filter Banks for Image Coding. Ilangko Balasingham and Tor A. Ramstad Nonuniform Nonunitary Perfect Reconstruction Filter Banks for Image Coding Ilangko Balasingham Tor A Ramstad Department of Telecommunications, Norwegian Institute of Technology, 7 Trondheim, Norway Email:

More information

Zeros of z-transform(zzt) representation and chirp group delay processing for analysis of source and filter characteristics of speech signals

Zeros of z-transform(zzt) representation and chirp group delay processing for analysis of source and filter characteristics of speech signals Zeros of z-transformzzt representation and chirp group delay processing for analysis of source and filter characteristics of speech signals Baris Bozkurt 1 Collaboration with LIMSI-CNRS, France 07/03/2017

More information

Efficient Algorithms for Pulse Parameter Estimation, Pulse Peak Localization And Pileup Reduction in Gamma Ray Spectroscopy M.W.Raad 1, L.

Efficient Algorithms for Pulse Parameter Estimation, Pulse Peak Localization And Pileup Reduction in Gamma Ray Spectroscopy M.W.Raad 1, L. Efficient Algorithms for Pulse Parameter Estimation, Pulse Peak Localization And Pileup Reduction in Gamma Ray Spectroscopy M.W.Raad 1, L. Cheded 2 1 Computer Engineering Department, 2 Systems Engineering

More information

Bobby Hunt, Mariappan S. Nadar, Paul Keller, Eric VonColln, and Anupam Goyal III. ASSOCIATIVE RECALL BY A POLYNOMIAL MAPPING

Bobby Hunt, Mariappan S. Nadar, Paul Keller, Eric VonColln, and Anupam Goyal III. ASSOCIATIVE RECALL BY A POLYNOMIAL MAPPING Synthesis of a Nonrecurrent Associative Memory Model Based on a Nonlinear Transformation in the Spectral Domain p. 1 Bobby Hunt, Mariappan S. Nadar, Paul Keller, Eric VonColln, Anupam Goyal Abstract -

More information

798 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 44, NO. 10, OCTOBER 1997

798 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 44, NO. 10, OCTOBER 1997 798 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL 44, NO 10, OCTOBER 1997 Stochastic Analysis of the Modulator Differential Pulse Code Modulator Rajesh Sharma,

More information

Predictive Coding. Prediction Prediction in Images

Predictive Coding. Prediction Prediction in Images Prediction Prediction in Images Predictive Coding Principle of Differential Pulse Code Modulation (DPCM) DPCM and entropy-constrained scalar quantization DPCM and transmission errors Adaptive intra-interframe

More information

Predictive Coding. Prediction

Predictive Coding. Prediction Predictive Coding Prediction Prediction in Images Principle of Differential Pulse Code Modulation (DPCM) DPCM and entropy-constrained scalar quantization DPCM and transmission errors Adaptive intra-interframe

More information

Signal representations: Cepstrum

Signal representations: Cepstrum Signal representations: Cepstrum Source-filter separation for sound production For speech, source corresponds to excitation by a pulse train for voiced phonemes and to turbulence (noise) for unvoiced phonemes,

More information

Transformation Techniques for Real Time High Speed Implementation of Nonlinear Algorithms

Transformation Techniques for Real Time High Speed Implementation of Nonlinear Algorithms International Journal of Electronics and Communication Engineering. ISSN 0974-66 Volume 4, Number (0), pp.83-94 International Research Publication House http://www.irphouse.com Transformation Techniques

More information

FIXED-INTERVAL smoothers (see [1] [10]) can outperform

FIXED-INTERVAL smoothers (see [1] [10]) can outperform IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL 54, NO 3, MARCH 2006 1069 Optimal Robust Noncausal Filter Formulations Garry A Einicke, Senior Member, IEEE Abstract The paper describes an optimal minimum-variance

More information

Application of the Bispectrum to Glottal Pulse Analysis

Application of the Bispectrum to Glottal Pulse Analysis ISCA Archive http://www.isca-speech.org/archive ITRW on Non-Linear Speech Processing (NOLISP 3) Le Croisic, France May 2-23, 23 Application of the Bispectrum to Glottal Pulse Analysis Dr Jacqueline Walker

More information

SPARSE signal representations have gained popularity in recent

SPARSE signal representations have gained popularity in recent 6958 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 10, OCTOBER 2011 Blind Compressed Sensing Sivan Gleichman and Yonina C. Eldar, Senior Member, IEEE Abstract The fundamental principle underlying

More information

Selective Use Of Multiple Entropy Models In Audio Coding

Selective Use Of Multiple Entropy Models In Audio Coding Selective Use Of Multiple Entropy Models In Audio Coding Sanjeev Mehrotra, Wei-ge Chen Microsoft Corporation One Microsoft Way, Redmond, WA 98052 {sanjeevm,wchen}@microsoft.com Abstract The use of multiple

More information

Soft-Output Trellis Waveform Coding

Soft-Output Trellis Waveform Coding Soft-Output Trellis Waveform Coding Tariq Haddad and Abbas Yongaçoḡlu School of Information Technology and Engineering, University of Ottawa Ottawa, Ontario, K1N 6N5, Canada Fax: +1 (613) 562 5175 thaddad@site.uottawa.ca

More information

Quantization 2.1 QUANTIZATION AND THE SOURCE ENCODER

Quantization 2.1 QUANTIZATION AND THE SOURCE ENCODER 2 Quantization After the introduction to image and video compression presented in Chapter 1, we now address several fundamental aspects of image and video compression in the remaining chapters of Section

More information

Automatic Speech Recognition (CS753)

Automatic Speech Recognition (CS753) Automatic Speech Recognition (CS753) Lecture 12: Acoustic Feature Extraction for ASR Instructor: Preethi Jyothi Feb 13, 2017 Speech Signal Analysis Generate discrete samples A frame Need to focus on short

More information

Iterative Encoder-Controller Design for Feedback Control Over Noisy Channels

Iterative Encoder-Controller Design for Feedback Control Over Noisy Channels IEEE TRANSACTIONS ON AUTOMATIC CONTROL 1 Iterative Encoder-Controller Design for Feedback Control Over Noisy Channels Lei Bao, Member, IEEE, Mikael Skoglund, Senior Member, IEEE, and Karl Henrik Johansson,

More information