EE275 Lab, October 27, 2007
Lab 9a. Linear Predictive Coding for Speech Processing

Figure 1: Block diagram of the simplified model of speech production. An impulse train generator (driven by the pitch period) models voiced speech and a random noise generator models unvoiced speech; a voiced/unvoiced switch selects the excitation, which drives a time-varying digital filter H(z) controlled by the vocal tract parameters.

Sections 0.4 and 0.5 contain the Lab Experiment and Lab Report needed.

0.1 Basic Principles of Linear Predictive Analysis

The basic discrete-time model of speech production is shown above. The composite spectral effects of radiation, vocal tract, and glottal excitation are represented by a time-varying digital filter. For short periods during which the parameters can be considered stationary, we have a time-invariant system. The steady-state transfer function H(z) of the filter part of the model is

    H(z) = S(z)/U(z) = G / (1 - a_1 z^-1 - a_2 z^-2 - a_3 z^-3 - ... - a_p z^-p)    (1)

The vocal tract system is excited by a signal u[n], which is an impulse train for voiced speech or random noise for unvoiced speech. Thus, the parameters of this speech model are: the voiced/unvoiced classification, the pitch period for voiced speech, the gain parameter G, and the coefficients {a_k} of the filter. These are the parameters that are transmitted in coded speech. There are many methods for estimating the pitch period and the voiced/unvoiced classification. They are not discussed here and, in fact, are not implemented in this Demo. What is implemented is a method for determining the filter coefficients (lattice filter coefficients, referred to as reflection coefficients). It is these filter coefficients that are transmitted, along with a residual signal, instead of the parameters referred to above. We consider the simplified all-pole model of Figure 1, equation (1), as the natural representation of nonnasal voiced sounds. (For nasals and fricatives, the acoustic theory calls for both poles and zeros in the
vocal tract transfer function H(z).) Actually, if the filter order p is high enough, the all-pole model provides a fairly good representation of almost all speech sounds. The major advantage of the all-pole model is that the gain parameter G and the filter coefficients a_k can be estimated in a straightforward and computationally efficient way using the method of linear predictive analysis.

0.2 Linear Prediction Analysis & Synthesis Filters

We assume that speech is modeled as shown in Figure 1. The speech s[n] is related to the excitation u[n] by

    s[n] = Σ_{k=1}^{p} a_k s[n-k] + G u[n]    (2)

To obtain the model coefficients, we proceed as follows. Assume that you are trying to predict the signal s[n] at time n from its previous values at times n-1, n-2, .... A linear predictor with prediction coefficients α_k is defined as a system whose output is

    ŝ[n] = Σ_{k=1}^{p} α_k s[n-k]    (3)

The transfer function of the p-th order linear predictor of equation (3) is the polynomial

    P(z) = Σ_{k=1}^{p} α_k z^-k

The prediction error e[n] is defined as

    e[n] = s[n] - ŝ[n] = s[n] - Σ_{k=1}^{p} α_k s[n-k]    (4)

Equivalently, E(z) = A(z) S(z), where

    A(z) = 1 - Σ_{k=1}^{p} α_k z^-k

Comparing equations (2) and (4), it is seen that when the speech signal obeys the model of (2) exactly, then α_k = a_k exactly. In that case e[n] = G u[n] and E(z) = G U(z). Thus the prediction error filter A(z) is the inverse filter of the system H(z) of (1). That is,

    E(z) = G U(z) = A(z) S(z)

Hence,

    H(z) = S(z)/U(z) = G/A(z)

So we have A(z), the analysis filter, and H(z), the synthesis filter. The basic problem of linear prediction analysis is to determine the set of predictor coefficients α_k directly from the speech signal. Because of the non-stationary nature of speech, the coefficients are determined for short segments over which the signal can be considered approximately stationary. They are found through a minimization of the mean-square prediction error.
The resulting parameters are then taken to be the parameters of the system function H(z), which is then used for the synthesis of that speech segment. The method of determining these coefficients is outlined below.
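To make the analysis/synthesis filter relationship of Section 0.2 concrete, here is a minimal Python/NumPy sketch (not part of the original lab; the second-order coefficients and gain are illustrative assumptions). A noise excitation is passed through the synthesis filter H(z) = G/A(z); the analysis filter A(z) then recovers the scaled excitation as the residual, and filtering the residual by 1/A(z) reconstructs the signal:

```python
import numpy as np
from scipy.signal import lfilter

# Assumed 2nd-order all-pole model: H(z) = G / (1 - a1 z^-1 - a2 z^-2)
a1, a2, G = 1.3789, -0.9506, 0.1        # illustrative, stable resonator
rng = np.random.default_rng(0)
u = rng.standard_normal(400)            # unvoiced-style excitation (white noise)

# Synthesis filter H(z) = G / A(z): IIR with denominator [1, -a1, -a2]
s = lfilter([G], [1.0, -a1, -a2], u)

# Analysis filter A(z) = 1 - a1 z^-1 - a2 z^-2: all-zero (FIR) inverse filter
e = lfilter([1.0, -a1, -a2], [1.0], s)  # residual; E(z) = A(z) S(z) = G U(z)

# Passing the residual back through 1/A(z) reconstructs the signal exactly
s_rec = lfilter([1.0], [1.0, -a1, -a2], e)
```

Because both filters start from rest, the residual e equals G·u[n] and s_rec matches s to machine precision; this exact invertibility of the analysis/synthesis pair is what the residual-plus-coefficients transmission scheme relies on.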
0.3 Minimum Mean-Square Error and the Orthogonality Principle

We consider the linear prediction problem of equation (3) as predicting a random variable from a set of other random variables. Given RVs x_1, x_2, ..., x_n, we wish to find n constants a_1, a_2, ..., a_n that form a linear estimate of a random variable s as the sum

    ŝ = a_1 x_1 + a_2 x_2 + ... + a_n x_n    (5)

This is typically done by requiring that the mean-square value of the resulting error ε = s - ŝ = s - (a_1 x_1 + a_2 x_2 + ... + a_n x_n),

    P = E{[s - (a_1 x_1 + a_2 x_2 + ... + a_n x_n)]^2},

be minimum. We do this by setting

    ∂P/∂a_i = E{2[s - (a_1 x_1 + a_2 x_2 + ... + a_n x_n)](-x_i)} = 0    (6)

Equation (6) states that the minimizing error ε is orthogonal to each of the data x_i; this is the orthogonality principle. Setting i = 1, 2, ..., n in equation (6), we get the so-called Yule-Walker equations:

    R_11 a_1 + R_12 a_2 + ... + R_1n a_n = R_01
    R_21 a_1 + R_22 a_2 + ... + R_2n a_n = R_02
    R_31 a_1 + R_32 a_2 + ... + R_3n a_n = R_03
    ...
    R_n1 a_1 + R_n2 a_2 + ... + R_nn a_n = R_0n    (7)

where

    R_ji = E{x_i x_j},    R_0j = E{s x_j}

If the data x_i are linearly independent, then the determinant of the coefficient matrix [R_ij] is positive. Equation (7) is solved for the unknown coefficients a_k, k = 1, 2, ..., n (the α_k of the previous section) using the so-called Levinson-Durbin algorithm. Accordingly, the problem essentially consists of determining, for a short segment of speech, the matrix of correlation coefficients R_ij and then solving the resulting system to obtain the prediction coefficients, which are then transmitted. All this often has to be done in real time.

0.4 MATLAB LPC DEMO

Run the Demo as per the instructions in Lab 9.

Demo Description

The demo consists of two parts: analysis and synthesis. The analysis portion is found in the transmitter section of the system.

Analysis Section: In this simulation, the speech signal is divided into frames of size 20 ms (160 samples at an 8 kHz sampling rate), with an overlap of 10 ms (80 samples). Each frame is windowed using a Hamming window. The original speech signal is passed
through an analysis filter, which is an all-zero filter. It is a so-called lattice filter whose coefficients, referred to as reflection coefficients, are obtained in the previous step. The output of the filter is called the residual signal. This is what is transmitted here, along with the filter coefficients. Here, the analysis section output is simply connected directly to the synthesis portion.

Synthesis Section: The residual signal is passed through a synthesis filter, which is the inverse of the analysis filter. The output of the synthesis filter reconstructs the original signal.

0.5 LAB REPORT

Give a brief description of what exactly is happening in the analysis and synthesis portions of the MATLAB LPC speech analysis and synthesis Demo. Observe the residual signal and the filter coefficients generated in the Analysis section, which are then transmitted to the synthesis section.

Figure 2: Ref: MATLAB Help, Linear Predicting & Coding of Speech.
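The Levinson-Durbin recursion that solves the Yule-Walker equations of Section 0.3 can be sketched in a few lines of Python/NumPy. This is an illustrative sketch, not the demo's implementation; the AR(2) coefficients are assumed values, and the autocorrelation sequence is constructed analytically from them so the recursion recovers the model exactly:

```python
import numpy as np

def levinson_durbin(r, p):
    """Solve the p-th order Yule-Walker normal equations from the
    autocorrelation values r[0..p], without explicit matrix inversion.
    Returns (alpha, k, E): the predictor coefficients alpha_1..alpha_p of
    A(z) = 1 - sum alpha_j z^-j, the reflection (lattice) coefficients k,
    and the final prediction-error power E."""
    a = np.zeros(p + 1)                 # error-filter taps [1, a_1, ..., a_p]
    a[0] = 1.0
    k = np.zeros(p)
    E = r[0]
    for i in range(1, p + 1):
        # Correlation of the current error filter with the signal's "future"
        acc = r[i] + a[1:i] @ r[i-1:0:-1]
        k[i-1] = -acc / E               # i-th reflection coefficient
        a[1:i+1] = a[1:i+1] + k[i-1] * a[i-1::-1]   # order-update recursion
        E *= 1.0 - k[i-1] ** 2          # error power shrinks at each order
    return -a[1:], k, E                 # alpha_j = -a_j in the A(z) convention

# Assumed AR(2) model: its theoretical (normalized) autocorrelation obeys
# r[m] = a1 r[m-1] + a2 r[m-2], so r[0..2] can be written down in closed form.
a1, a2 = 1.3789, -0.9506
r = np.zeros(3)
r[0] = 1.0
r[1] = a1 / (1.0 - a2)
r[2] = a1 * r[1] + a2 * r[0]

alpha, k, E = levinson_durbin(r, 2)     # alpha recovers [a1, a2]
```

The reflection coefficients k returned here are the lattice coefficients of the kind the demo transmits; |k_i| < 1 at every stage implies a stable synthesis filter, which is one reason the lattice form is attractive for quantization and transmission.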
Class notes: mirchand/ee276-2003