Department of Electrical and Computer Engineering Digital Speech Processing Homework No. 7 Solutions

Similar documents
Chapter 9. Linear Predictive Analysis of Speech Signals 语音信号的线性预测分析

SPEECH ANALYSIS AND SYNTHESIS

Department of Electrical and Computer Engineering Digital Speech Processing Homework No. 6 Solutions

Lab 9a. Linear Predictive Coding for Speech Processing

ADSP ADSP ADSP ADSP. Advanced Digital Signal Processing (18-792) Spring Fall Semester, Department of Electrical and Computer Engineering

L7: Linear prediction of speech

M. Hasegawa-Johnson. DRAFT COPY.

Lesson 1. Optimal signalbehandling LTH. September Statistical Digital Signal Processing and Modeling, Hayes, M:

Linear Prediction Coding. Nimrod Peleg Update: Aug. 2007

Signal representations: Cepstrum

Lecture 19 IIR Filters

Feature extraction 2

Voiced Speech. Unvoiced Speech

Damped Oscillators (revisited)

GATE EE Topic wise Questions SIGNALS & SYSTEMS

representation of speech

Statistical and Adaptive Signal Processing

Formant Analysis using LPC

Linear Prediction 1 / 41

Applications of Linear Prediction

Source/Filter Model. Markus Flohberger. Acoustic Tube Models Linear Prediction Formant Synthesizer.

CHAPTER 2 RANDOM PROCESSES IN DISCRETE TIME

ECE 438 Exam 2 Solutions, 11/08/2006.

Time-domain representations

Thursday, October 29, LPC Analysis

Speech Signal Representations

Chapter 2 Speech Production Model

A Spectral-Flatness Measure for Studying the Autocorrelation Method of Linear Prediction of Speech Analysis

The Z-Transform. For a phasor: X(k) = e jωk. We have previously derived: Y = H(z)X

David Weenink. First semester 2007

E : Lecture 1 Introduction

DISCRETE-TIME SIGNAL PROCESSING

CS578- Speech Signal Processing

( ) John A. Quinn Lecture. ESE 531: Digital Signal Processing. Lecture Outline. Frequency Response of LTI System. Example: Zero on Real Axis

Multimedia Communications. Differential Coding

Chapter 3 HW Solution

Analysis of Finite Wordlength Effects

Recursive, Infinite Impulse Response (IIR) Digital Filters:

4.2 Acoustics of Speech Production

Let H(z) = P(z)/Q(z) be the system function of a rational form. Let us represent both P(z) and Q(z) as polynomials of z (not z -1 )

THE PROBLEMS OF ROBUST LPC PARAMETRIZATION FOR. Petr Pollak & Pavel Sovka. Czech Technical University of Prague

Chapter 10 Applications in Communications

Pitch Prediction Filters in Speech Coding

EE482: Digital Signal Processing Applications

Z Transform (Part - II)

Discrete Time Systems

LPC methods are the most widely used in. recognition, speaker recognition and verification

LAB 6: FIR Filter Design Summer 2011

Chapter 5 THE APPLICATION OF THE Z TRANSFORM. 5.3 Stability

Adaptive Systems Homework Assignment 1

LPC and Vector Quantization

VALLIAMMAI ENGINEERING COLLEGE. SRM Nagar, Kattankulathur DEPARTMENT OF INFORMATION TECHNOLOGY. Academic Year

(i) Represent discrete-time signals using transform. (ii) Understand the relationship between transform and discrete-time Fourier transform

5 Kalman filters. 5.1 Scalar Kalman filter. Unit delay Signal model. System model

ECE4270 Fundamentals of DSP Lecture 20. Fixed-Point Arithmetic in FIR and IIR Filters (part I) Overview of Lecture. Overflow. FIR Digital Filter

On reducing the coding-delay and computational complexity in an innovations-assisted linear predictive speech coder

Class of waveform coders can be represented in this manner

ENTROPY RATE-BASED STATIONARY / NON-STATIONARY SEGMENTATION OF SPEECH

Parametric Method Based PSD Estimation using Gaussian Window

IT DIGITAL SIGNAL PROCESSING (2013 regulation) UNIT-1 SIGNALS AND SYSTEMS PART-A

Ill-Conditioning and Bandwidth Expansion in Linear Prediction of Speech

Z - Transform. It offers the techniques for digital filter design and frequency analysis of digital signals.

Sinusoidal Modeling. Yannis Stylianou SPCC University of Crete, Computer Science Dept., Greece,

EEL3135: Homework #4

2. Typical Discrete-Time Systems All-Pass Systems (5.5) 2.2. Minimum-Phase Systems (5.6) 2.3. Generalized Linear-Phase Systems (5.

ELEG 5173L Digital Signal Processing Ch. 5 Digital Filters

Wiener Filtering. EE264: Lecture 12

Signals and Systems. Problem Set: The z-transform and DT Fourier Transform

Keywords: Vocal Tract; Lattice model; Reflection coefficients; Linear Prediction; Levinson algorithm.

DIGITAL SIGNAL PROCESSING. Chapter 3 z-transform

Optimum Ordering and Pole-Zero Pairing of the Cascade Form IIR. Digital Filter

Poles and Zeros in z-plane

Digital Signal Processing

Need for transformation?

EE123 Digital Signal Processing. M. Lustig, EECS UC Berkeley

Lecture 2 OKAN UNIVERSITY FACULTY OF ENGINEERING AND ARCHITECTURE

CMPT 889: Lecture 5 Filters

Digital Filters. Linearity and Time Invariance. Linear Time-Invariant (LTI) Filters: CMPT 889: Lecture 5 Filters

We have shown earlier that the 1st-order lowpass transfer function

Practical Spectral Estimation

6. Methods for Rational Spectra It is assumed that signals have rational spectra m k= m

Part III Spectrum Estimation

ISOLATED WORD RECOGNITION FOR ENGLISH LANGUAGE USING LPC,VQ AND HMM

How to manipulate Frequencies in Discrete-time Domain? Two Main Approaches

EE 313 Linear Signals & Systems (Fall 2018) Solution Set for Homework #7 on Infinite Impulse Response (IIR) Filters CORRECTED

Chapter Intended Learning Outcomes: (i) Understanding the relationship between transform and the Fourier transform for discrete-time signals

NEW STEIGLITZ-McBRIDE ADAPTIVE LATTICE NOTCH FILTERS

Ch. 7: Z-transform Reading

3GPP TS V6.1.1 ( )

LECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES

Solutions: Homework Set # 5

CCNY. BME I5100: Biomedical Signal Processing. Stochastic Processes. Lucas C. Parra Biomedical Engineering Department City College of New York

ELEG 305: Digital Signal Processing

Design of a CELP coder and analysis of various quantization techniques

ELEG 305: Digital Signal Processing

Digital Signal Processing:

Discrete-time signals and systems

Signals and Systems. Spring Room 324, Geology Palace, ,

ELEN E4810: Digital Signal Processing Topic 2: Time domain

Lab 4: Quantization, Oversampling, and Noise Shaping

Transcription:

Problem 1 Department of Electrical and Computer Engineering Digital Speech Processing Homework No. 7 Solutions Linear prediction analysis is used to obtain an eleventh-order all-pole model for a segment of voiced speech that was sampled at a rate of F S = 8000 samples/second. The system function of the model is: H(z) = G A(z) = G G = 11 11 1 α k z k (1 z i z 1 ) k=1 Table 1 shows five of the roots of the eleventh-order prediction error filter, A(z). i z i z i (rad) 1 0.2567 2.0677 2 0.9681 1.4402 3 0.9850 0.2750 4 0.8647 2.0036 5 0.9590 2.4162 Table 1: Root locations of eleventh-order prediction error filter in the z-plane. (a) Determine where the other six zeros of A(z) are located in the z-plane. If you cannot precisely determine the pole locations, explain where the pole might occur in the z-plane. (b) Estimate the first three formant frequencies (in Hz) for this segment of speech. (c) Which of the first three formant resonances has the smallest bandwidth? How is this determined? (d) Plot and label the frequency response due to just the first three formants of the all-pole model for the (analog) frequency range 0 f F S /2. Solution (a) Since the five roots of Table 1 are all complex, 5 of the remaining roots occur at the complex conjugate locations. The remaining root must be real and occurs on the z-axis and is generally inside the unit circle. (b) The locations of the formants are determined by the complex roots that are closest to the unit circle, namely with magnitudes close to 1.0. If we order these roots we can estimate the formant frequencies by multiplying the root angle (in radians) by the sampling rate and then dividing by the factor 2π to convert from radians to analog frequency. Using this method we obtain the following formant estimates: i=1 1

1. F 1 = (z 3 ) F S /(2π) = 0.275 8000/(2π) = 350 Hz 2. F 2 = (z 2 ) F S /(2π) = 1.4402 8000/(2π) = 1834 Hz 3. F 3 = (z 5 ) F S /(2π) = 2.4162 8000/(2π) = 3076 Hz There is a choice of the fourth or fifth root as the best estimate of the third formant. However since the fifth root is much closer to the unit circle ( z 5 = 0.9590) than the fourth root ( z 4 = 0.8647), the fifth root is a better candidate for the third formant. (c) The formant with the smallest bandwidth is the pole which is closest to the unit circle, namely the first formant which has a magnitude of 0.9850. (d) Figure 1 shows a plot of the log magnitude spectrum of the cascade of the first three resonances at the complex pole locations chosen for the three formants, with a transfer function: H(z) = 3 1 2r i cos(θ i ) + ri 2 1 2r i cos(θ i )z 1 + ri 2z 2 i=1 where r i and θ i are the formant magnitudes and angles (in radians) for the first three formants. Figure 1: Plot of log magnitude response of system consisting of 3 formant resonances. Problem 2 An unvoiced speech signal segment can be modeled as a segment of a stationary random process of the form: x[n] = w[n] + βw[n 1] where w[n] is a zero mean, unit variance, stationary white noise process and β < 1. (a) What are the mean and variance of x[n]? (b) What system can be used to recover w[n] from x[n]? 2

(c) What is the normalized autocorrelation of x[n] at a delay of 1 sample, i.e., what is r x [1] = R x[1] R x [0]? Solution (a) The mean and variance are calculated as: x[n] = w[n] + bw[n 1] = 0 (since w[n] is a zero-mean process) x 2 [n] = (w[n] + βw[n 1]) 2 = w 2 [n] + 2βw[n] w[n 1] + β 2 w 2 [n 1] = σ 2 w + β 2 σ 2 w = (1 + β 2 )σ 2 w = (1 + β 2 ) (b) The transfer function from w[n] to x[n] is: H 1 (z) = X(z) W (z) = 1 + βz 1 To recover w[n] we need to s x[n] through the inverse system, of the form: (c) The correlation of x[n] is: R x [k] = E{x[n]x[n + k]} H 2 (z) = W (z) X(z) = 1 1 + βz 1 = E{(w[n] + βw[n 1]) (w[n + k] + βw[n + k 1])} = E{w[n]w[n + k]} + βe{w[n 1]w[n + k] + w[n]w[n + k 1]} + β 2 E{w[n 1]w[n + k 1]} = δ[k] + βδ[k 1] + βδ[k + 1] + β 2 δ[k] = (1 + β 2 )δ[k] + βδ[k 1] + βδ[k + 1] giving the final result: r x [1] = R x[1] R x [0] = β 1 + β 2 Problem 3 A causal LTI system has system function: H(z) = 1 4z 1 1 0.25z 1 0.75z 2 0.875z 3 (a) Use the Levinson recursion to determine whether or not the system is stable. (b) Is the system minimum phase? Solution 3

To determine stability, we need to find out if the denominator polynomial has all of its roots inside the unit circle. Consider the denominator polynomial: A(z) = 1 0.25z 1 0.75z 2 0.875z 3 = 1 α (3) 1 z 1 α (3) 2 z 2 α (3) 3 z 3 to be a prediction error filter; therefore we can convert from a i to k i using the backward iteration: k i = a (i) i i = p, p 1,..., 1 α (i 1) j = [α (i) j + α (i) i α (i) i j ]/(1 k2 i ) 1 j i 1 (a) Thus we compute the values of k i using the backward iteration: k 3 = α (3) 3 = 0.875 α (2) 1 = [α (3) 1 + α (3) 3 α(3) 2 ]/(1 k2 3) = 0.25 + (0.875)(0.75) 1 (0.875) 2 = 3.84 α (2) 2 = [α (3) 2 + α (3) 3 α(3) 1 ]/(1 0.75 + (0.875)(0.25) k2 3) = 1 (0.875) 2 = 4.13 k 2 = α (2) 2 = 4.13 α (1) 1 = [α (2) 1 + α (2) 2 α(2) 1 ]/(1 k2 2) = k 1 = α (1) 1 = 1.23 3.84 + (4.13)(3.84) 1 (4.13) 2 = 1.23 Since k i > 1 for k 2 and k 1, the system is unstable. (b) The system is not minimum phase since there is a zero at z = 4, which is outside the unit circle, as well as poles outside the unit circle. Problem 4 A speech signal frame (windowed using a Hamming window) has energy: E (0) n = m s 2 n[m] = 2000 Using the autocorrelation method of analysis on this speech frame, the first 3 PARCOR coefficients are computed and their values are: k 1 = 0.5 k 2 = 0.5 k 3 = 0.2 Find the energy of the linear prediction residual, En 3 = e 2 n[m] that would be m obtained by inverse filtering s n [m] by the optimal third order predictor inverse filter, A 3 (z). 4

Solution The energy of the linear prediction residual using a third order predictor is: E n (3) = E n (0) (1 k1)(1 2 k2)(1 2 k3) 2 = (2000)(1 0.25)(1 0.25)(1 0.04) = 1080 Problem 5 Write a MATLAB program to convert from a frame of speech to a set of linear prediction coefficients, using all 3 methods discussed in class, i.e., the Autocorrelation Method, the Covariance Method, and the Lattice Filter Method. Choose a section of a steady state vowel, and a section of unvoiced speech, and plot LPC spectra from the 3 methods along with the normal spectrum from the Hamming window weighted frame. Use N = 300, p = 12, with Hamming Window weighting for the autocorrelation method. Use the same parameters for the Covariance and Lattice Methods. Use the files ah.wav to get a vowel steady state sound beginning at sample 3000, and the file test 16k.wav to get a fricative beginning at sample 3000. (Dont forget that for the covariance and lattice methods, you also need to preserve p samples before the starting sample at n = 3000 for computing correlations, and error signals). Solution The following MATLAB program reads in a file of speech and computes the original spectrum (of the signal weighted by a Hamming window), and plots on top of this the LPC spectrum from the autocorrelation, covariance, and lattice methods. There is a main program and 3 functions (durbin for the autocorrelation method, cholesky for the covariance methodalthough simple matrix inversion was used, rather than the Cholesky decomposition, and lattice for the traditional lattice method). read in speech file, choose section of speech and solve for set of lpc coefficients using the autocorrelation method, the covariance method, and the lattice method plot the resulting spectra from all three methods read in waveform for a speech file [xin,fs,mode,format]=loadwav( ah_truncated.wav );xin) filename=input( enter speech filename:, s ); [xin,fs,mode,format]=loadwav(filename); 5

normalize input to [-1,1] range and play out sound file xinn=xin/max(xin); sound(xinn,fs); [nrows ncol]=size(xin); m=input( starting sample for speech frame: ); N=input( frame duration: ); p=input( lpc order: ); wtype=input(window type(1=hamming, 0=Rectangular):); stitle=sprintf( file: s, ss: d N: d p: d,filename,m,n,p); print out number of samples in file, read in plotting parameters fprintf( number of samples in file: 7.0f \n, nrows); autocorrelation method--choose section of speech, window using Hamming window, compute autocorrelation for i=0,1,...,p get original spectrum xino=[(xin(m:m+n-1).*hamming(n)) zeros(1,512-n)]; h0=fft(xino,512); f=0:fs/512:fs-fs/512; plot(f(1:257),20*log10(abs(h0(1:257)))),title(stitle),ylabel( log magnitude (db) ),... xlabel( frequency in Hz ); hold on; autocorrelation method xf=xin(m:m+n-1); [R,E,k,alpha,G]=durbin(xf,N,p,wtype); alphap=alpha(1:p,p); num=[1 -alphap ]; [ha,f]=freqz(g,num,512,fs); plot(f,20*log10(abs(ha)), k- ); fvtool(g,num); covariance method--choose section of speech, compute correlation matrix and covariance vector xc=xin(m-p:m+n-1); [phim,phiv,ec,alphac,gc]=cholesky(xc,n,p); numc=[1 -alphac ]; [hc,f]=freqz(gc,numc,512,fs); plot(f,20*log10(abs(hc)), g-- ); fvtool(gc,numc); lattice method--choose section of speech, compute forward and backward errors xl=xin(m-p:m+n-1); [EL,alphal,GL,k]=lattice(xl,N,p); alphalat=alphal(:,p); numl=[1 -alphalat ]; 6

[hl,f]=freqz(gl,numl,512,fs); plot(f,20*log10(abs(hl)), r-. ); leg( windowed speech, autocorrelation method(-), covariance method(--), lattice me fvtool(gl,numl); function [R,E,k,alpha,G]=durbin(xf,N,p,wtype) function [R,E,k,alpha,G]=durbin(xf,N,p,wtype) compute window based on wtype; wtype=1 for Hamming window, wtype=0 for Rectangular window compute R(0:p) from windowed xf solve Durbin recursion fo E,k,alpha in 8 easy steps step 1--E(0)=R(0) step 2--k(1)=R(1)/E(0) step 3--alpha(1,1)=k(1) step 4--E(1)=(1-k(1).^2)E(0) steps 5-8--for i=2,3,...,p; step 5--k(i)=[R(i)-sum from j=1 to i-1 alpha(j,i-1).*r(i-j)]/e(i-1) step 6--alpha(i,i)=k(i) step 7--for j=1,2,...,i-1 alpha(j,i)=alpha(j,i-1)-k(i)alpha(i-j,i-1) step 8--E(i)=(1-k(i).^2)E(i-1) if wtype==1 win=hamming(n); else win=boxcar(n); window frame for autocorrelation method xf=xf.*win; compute autocorrelation for k=0:p R(k+1)=sum(xf(1:N-k).*xf(k+1:N)); solve for lpc coefficients using Durbin s method E=zeros(1,p); k=zeros(1,p); alpha=zeros(p,p); E(1)=R(1); ind=1; k(ind)=r(ind+1)/e(ind); 7

alpha(ind,ind)=k(ind); E(ind+1)=(1-k(ind).^2)*E(ind); for ind=2:p k(ind)=(r(ind+1)-sum(alpha(1:ind-1,ind-1).*r(ind:-1:2)))/e(ind); alpha(ind,ind)=k(ind); for jnd=1:ind-1 alpha(jnd,ind)=alpha(jnd,ind-1)-k(ind)*alpha(ind-jnd,ind-1); E(ind+1)=(1-k(ind).^2)*E(ind); G=sqrt(E(p+1)); function [phim,phiv,ec,alphac,gc]=cholesky(xc,n,p) cholesky decomposition first compute phim(1,1)...phim(p,p) next compute phiv(1,0)...phiv(p,0) for i=1:p for k=1:p phim(i,k)=sum(xc(p+1-i:p+n-i).*xc(p+1-k:p+n-k)); for i=1:p phiv(i)=sum(xc(p+1-i:p+n-i).*xc(p+1:p+n)); phiz(i)=sum(xc(p+1:p+n).*xc(p+1-i:p+n-i)); phi0=sum(xc(p+1:p+n).^2); use simple matrix inverse to solve equations--come back to Cholesky decomposition later phiv=phiv ; solve using matrix inverse, phim*alphac=phiv alphac=inv(phim)*phiv alphac=inv(phim)*phiv; EC=phi0-sum(alphac.*phiz); GC=sqrt(EC); function [EL,alphal,GL,k]=lattice(xc,N,p) lattice solution to lpc equations follow 10 step solution step 1--set e(0)(m)=b(0)(m)=s(m) step 2--compute k1=alpha(1,1) from basic lattice reflection coefficient equation 8.89 step 3--determine forward and backward errors e(1)(m) and b(1)(m) from 8

eqns 8.84 and 8.87 step 4--set i=2 step 5--determine ki=alpha(i,i) from eqn 8.89 step 6--determine alpha(j,i) for j-1,2,...,i-1 from eqn 8.70 step 7--determine e(i)(m) and b(i)(m) from eqns. 8.84 and 8.87 step 8--set i=i+1 step 9--if i<=p, go to step 5 step 10--finished e(:,1)=xc; b(:,1)=xc; k(1)=sum(e(p+1:p+n,1).*b(p:p+n-1,1))/sqrt((sum(e(p+1:p+n,1).^2)*sum(b(p:p+n-1,1).^2))); alphal(1,1)=k(1); btemp=[0 b(:,1) ] ; e(1:n+p,2)=e(1:n+p,1)-k(1)*btemp(1:n+p); b(1:n+p,2)=btemp(1:n+p)-k(1)*e(1:n+p,1); for i=2:p k(i)=sum(e(p+1:p+n,i).*b(p:p+n-1,i))/sqrt((sum(e(p+1:p+n,i).^2)*sum(b(p:p+n-1,i).^2) alphal(i,i)=k(i); for j=1:i-1 alphal(j,i)=alphal(j,i-1)-k(i)*alphal(i-j,i-1); btemp=[0 b(:,i) ] ; e(1:n+p,i+1)=e(1:n+p,i)-k(i)*btemp(1:n+p); b(1:n+p,i+1)=btemp(1:n+p)-k(i)*e(1:n+p,i); EL=sum(xc(p+1:p+N).^2); for i=1:p EL=EL*(1-k(i).^2); GL=sqrt(EL); The following plots are for a vowel from the file ah.wav, and a fricative from the file test 16k.wav. 9

10