M. Hasegawa-Johnson. DRAFT COPY.


Lecture Notes in Speech Production, Speech Coding, and Speech Recognition

Mark Hasegawa-Johnson
University of Illinois at Urbana-Champaign
February 7, 2000


Chapter: Linear Predictive Coding

.1 All-Pole Model of the Speech Transfer Function

The speech spectrum S(z) is produced by passing an excitation spectrum, U(z), through an all-pole transfer function H(z) = G/A(z):

\[ S(z) = H(z) U(z) = \frac{G}{A(z)} U(z), \qquad A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k} \]

If we write the corresponding difference equation, we get

\[ s(n) = \sum_{k=1}^{p} a_k s(n-k) + G u(n) \]

where the input signal u(n) is

\[ u(n) = \begin{cases} \sum_r \delta(n - r N_0) & \text{voiced speech} \\ \text{uncorrelated Gaussian random noise} & \text{unvoiced speech} \end{cases} \]

A(z) can be written in terms of either the direct-form coefficients a_k or the roots r_i:

\[ A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k} = \prod_{i=1}^{p} (1 - r_i z^{-1}) \]

.2 Normal Equations

Suppose we want to pick an A(z) which "explains" as much as possible of S(z). If we force A(z) to "explain" as much as possible, then the remaining excitation signal, e(n) = G u(n), will be as "simple" as possible. We can do this by choosing A(z) in order to minimize the energy of e(n):

\[ E = \sum_{m=-\infty}^{\infty} e^2(m) = G^2 \sum_{m=-\infty}^{\infty} u^2(m) \]

Plugging in the difference equation, we get

\[ E = \sum_{m=-\infty}^{\infty} \left( s(m) - \sum_{k=1}^{p} a_k s(m-k) \right)^2 \]
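The difference equation above can be run directly as a recursion. The following is a minimal NumPy sketch (Python rather than the MATLAB used later in the exercises); the function name `synthesize` and the example coefficients are chosen here for illustration only:

```python
import numpy as np

def synthesize(a, G, u):
    """Run the LPC difference equation s(n) = sum_k a_k s(n-k) + G u(n)."""
    p = len(a)
    s = np.zeros(len(u))
    for n in range(len(u)):
        for k in range(1, p + 1):
            if n - k >= 0:
                s[n] += a[k - 1] * s[n - k]
        s[n] += G * u[n]
    return s

# Voiced-style excitation: impulse train with pitch period N0 = 40 samples.
u = np.zeros(200)
u[::40] = 1.0
# A stable two-pole resonator: A(z) = 1 - 1.2 z^-1 + 0.8 z^-2.
s = synthesize([1.2, -0.8], 1.0, u)
```

Driving the same recursion with white noise instead of an impulse train gives the unvoiced branch of the excitation model.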

Solving $\partial E / \partial a_i = 0$ yields the generalized normal equations:

\[ \sum_{m=-\infty}^{\infty} s(m) s(m-i) = \sum_{k=1}^{p} a_k \sum_{m=-\infty}^{\infty} s(m-k) s(m-i), \qquad i = 1, \ldots, p \]

The generalized normal equations contain infinite sums, which makes them uncomputable. In order to make them computable, we need to truncate them somehow.

.2.1 Autocorrelation Method

Solving the Normal Equations

In the autocorrelation method, we window the signal:

\[ s_n(m) = w(m)\, s(n+m) \]

then minimize the following criterion:

\[ E_n = \sum_{m=-\infty}^{\infty} \left( s_n(m) - \sum_{k=1}^{p} a_k s_n(m-k) \right)^2 \]

Solving $\partial E_n / \partial a_k = 0$ yields the autocorrelation normal equations:

\[ R_n(i) = \sum_{k=1}^{p} a_k R_n(|i-k|) \]

where

\[ R_n(i) = \sum_{m=-\infty}^{\infty} s_n(m)\, s_n(m-|i|) \]

This can also be written in matrix form:

\[ \bar r_p = \bar R \bar a \]

\[ \bar r_p = \begin{bmatrix} R(1) \\ R(2) \\ \vdots \\ R(p) \end{bmatrix}, \quad
\bar R = \begin{bmatrix} R(0) & R(1) & \cdots & R(p-1) \\ R(1) & R(0) & \cdots & R(p-2) \\ \vdots & & \ddots & \vdots \\ R(p-1) & R(p-2) & \cdots & R(0) \end{bmatrix}, \quad
\bar a = \begin{bmatrix} a_1 \\ \vdots \\ a_p \end{bmatrix} \]

Since the matrix $\bar R$ is "Toeplitz" (all elements on a given diagonal are the same), it can be inverted efficiently, in $O(p^2)$ operations, using the Levinson-Durbin recursion.

- Advantage of the autocorrelation method: the normal equations can be solved using a computationally efficient algorithm.
- Disadvantage of the autocorrelation method: if x(n) is actually a steady-state signal, autocorrelation will not find the true spectrum.

Levinson-Durbin Recursion

The normal equations for the autocorrelation method can be solved by inverting the Toeplitz matrix:

\[ \bar r_p = \bar R \bar a \;\Rightarrow\; \bar a = \bar R^{-1} \bar r_p \]

This inversion can be done efficiently using the following recursive algorithm, called the "Levinson-Durbin recursion":

\[ E^{(0)} = R(0) \]
\[ k_i = \frac{R(i) - \sum_{j=1}^{i-1} a_j^{(i-1)} R(i-j)}{E^{(i-1)}}, \qquad 1 \le i \le p \]
\[ a_i^{(i)} = k_i \]
\[ a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)}, \qquad 1 \le j \le i-1 \]
\[ E^{(i)} = (1 - k_i^2)\, E^{(i-1)} \]

Finding the Minimum Energy

The minimum energy is

\[ E_{\min} = \sum_m \left( s_n(m) - \sum_{k=1}^{p} a_k s_n(m-k) \right)^2 = R_n(0) - \sum_{k=1}^{p} a_k R_n(k) \]

which is also the final prediction error of the recursion, $E_{\min} = E^{(p)}$.

Combined Equation for Energy and Coefficients

The equations for the LPC coefficients and the minimum error can be written together in matrix form:

\[ \gamma = R_p a, \qquad
\gamma = \begin{bmatrix} E_{\min} \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad
R_p = \begin{bmatrix} R(0) & R(1) & \cdots & R(p) \\ R(1) & R(0) & \cdots & R(p-1) \\ \vdots & & \ddots & \vdots \\ R(p) & R(p-1) & \cdots & R(0) \end{bmatrix}, \quad
a = \begin{bmatrix} 1 \\ -a_1 \\ \vdots \\ -a_p \end{bmatrix} \]

.2.2 Covariance Method

Solving the Normal Equations

Instead of windowing the signal, we use the original signal s(m), but window the error:

\[ E_n = \sum_{m=n}^{n+N-1} \left( s(m) - \sum_{k=1}^{p} a_k s(m-k) \right)^2 \]

Solving $\partial E_n / \partial a_k = 0$ yields the covariance normal equations:

\[ \Phi_n(i, 0) = \sum_{k=1}^{p} a_k \Phi_n(i, k) \]

where

\[ \Phi_n(i, k) = \sum_{m=n}^{n+N-1} s(m-i)\, s(m-k) \]

This can also be written as

\[ \bar\gamma = \bar\Phi \bar a, \qquad
\bar\gamma = \begin{bmatrix} \Phi(1,0) \\ \Phi(2,0) \\ \vdots \\ \Phi(p,0) \end{bmatrix}, \quad
\bar\Phi = \begin{bmatrix} \Phi(1,1) & \Phi(1,2) & \cdots & \Phi(1,p) \\ \Phi(2,1) & \Phi(2,2) & \cdots & \Phi(2,p) \\ \vdots & & \ddots & \vdots \\ \Phi(p,1) & \Phi(p,2) & \cdots & \Phi(p,p) \end{bmatrix}, \quad
\bar a = \begin{bmatrix} a_1 \\ \vdots \\ a_p \end{bmatrix} \]
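The Levinson-Durbin recursion above can be sketched as follows in NumPy (a sketch, not code from the notes; the function name `levinson_durbin` is chosen here):

```python
import numpy as np

def levinson_durbin(R, p):
    """Solve the autocorrelation normal equations R(i) = sum_k a_k R(|i-k|)
    by the Levinson-Durbin recursion. Returns (a, k, E_min)."""
    a = np.zeros(p)          # direct-form coefficients a_k
    k = np.zeros(p)          # reflection (PARCOR) coefficients k_i
    E = R[0]                 # E^(0) = R(0)
    for i in range(1, p + 1):
        # k_i = (R(i) - sum_j a_j^(i-1) R(i-j)) / E^(i-1)
        ki = (R[i] - np.dot(a[:i - 1], R[i - 1:0:-1])) / E
        k[i - 1] = ki
        a_new = a.copy()
        a_new[i - 1] = ki                       # a_i^(i) = k_i
        for j in range(i - 1):                  # a_j^(i) = a_j^(i-1) - k_i a_{i-j}^(i-1)
            a_new[j] = a[j] - ki * a[i - 2 - j]
        a = a_new
        E = (1.0 - ki * ki) * E                 # E^(i) = (1 - k_i^2) E^(i-1)
    return a, k, E

R = np.array([2.0, 1.0, 0.5])
a, k, E = levinson_durbin(R, 2)   # a = [0.5, 0.0], k = [0.5, 0.0], E = 1.5
```

Because each $|k_i| < 1$ whenever the autocorrelation matrix is positive definite, the resulting synthesis filter is guaranteed stable.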

The matrix $\bar\Phi$ is symmetric, but not Toeplitz, so inverting it requires $O(p^3)$ operations (using an algorithm called the Cholesky decomposition).

- Advantage: if x(n) is steady-state, the covariance method can find the true spectrum.
- Disadvantage: solving the normal equations requires an $O(p^3)$ matrix inversion operation.

Finding the Minimum Energy

The minimum error energy is:

\[ E_{\min} = \Phi_n(0,0) - \sum_{k=1}^{p} a_k \Phi_n(0,k) \]

Combined Equation for Energy and Coefficients

The equations for the LPC coefficients and the minimum error can be written together in matrix form:

\[ \gamma = \Phi_p a, \qquad
\Phi_p = \begin{bmatrix} \Phi(0,0) & \Phi(0,1) & \cdots & \Phi(0,p) \\ \Phi(1,0) & \Phi(1,1) & \cdots & \Phi(1,p) \\ \vdots & & \ddots & \vdots \\ \Phi(p,0) & \Phi(p,1) & \cdots & \Phi(p,p) \end{bmatrix} \]

.2.3 Choosing the LPC Order

The LPC order needs to be large enough to represent each formant using a complex-conjugate pole pair. There also need to be an extra 2 to 4 poles to represent spectral tilt. Putting everything together, we have:

\[ p \approx 2 \times (\text{Number of Formants}) + (2 \text{ to } 4) \]

The number of formants in the spectrum is the Nyquist frequency ($F_s/2$) divided by the average spacing between neighboring formants:

\[ \text{Number of Formants} = \frac{F_s / 2}{\operatorname{average}(F_{n+1} - F_n)} \]

The spacing between neighboring formant frequencies is approximately

\[ \operatorname{average}(F_{n+1} - F_n) \approx \frac{c}{2l} \]

where c = 35400 cm/s is the speed of sound, and l is the length of the vocal tract. The length of a male vocal tract is close to 17.7 cm, so there is approximately one formant per 1000 Hz:

\[ p \approx \frac{F_s}{1000\ \text{Hz}} + (2 \text{ to } 4) \]

The length of a female vocal tract is close to 14.75 cm, so there is approximately one formant per 1200 Hz:

\[ p \approx \frac{F_s}{1200\ \text{Hz}} + (2 \text{ to } 4) \]
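The covariance method described above, building the matrix $\Phi$ and solving the normal equations by Cholesky decomposition, can be sketched in NumPy as follows (a sketch, not code from the notes; the function name `covariance_lpc` is chosen here):

```python
import numpy as np

def covariance_lpc(s, n, N, p):
    """Covariance-method LPC: minimize the residual energy over m = n .. n+N-1.
    Builds Phi(i,k) = sum_m s(m-i) s(m-k) and solves Phi a = gamma by Cholesky."""
    Phi = np.empty((p + 1, p + 1))
    m = np.arange(n, n + N)
    for i in range(p + 1):
        for k in range(p + 1):
            Phi[i, k] = np.dot(s[m - i], s[m - k])
    # Phi[1:,1:] is symmetric positive definite: factor it as L L^T.
    L = np.linalg.cholesky(Phi[1:, 1:])
    gamma = Phi[1:, 0]
    y = np.linalg.solve(L, gamma)       # forward substitution
    a = np.linalg.solve(L.T, y)         # back substitution
    E_min = Phi[0, 0] - np.dot(a, Phi[0, 1:])
    return a, E_min
```

Unlike the autocorrelation method, nothing here constrains the resulting $1/A(z)$ to be stable.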

.2.4 Choosing the LPC Gain

The LPC excitation is defined by the formula:

\[ s(n) = \sum_{k=1}^{p} a_k s(n-k) + G u(n) \]

The LPC error is defined by the formula:

\[ e(n) = s(n) - \sum_{k=1}^{p} a_k s(n-k) = G u(n) \]

If we define the excitation to have unit energy over the frame,

\[ \sum_{n=0}^{N-1} u^2(n) = 1 \]

then

\[ G^2 = \frac{\sum_n e^2(n)}{\sum_n u^2(n)} = E_{\min} \]

.3 Frequency-Domain Interpretation of LPC

The LPC error is

\[ e(n) = s(n) - \sum_{k=1}^{p} a_k s(n-k) \]

so the error spectrum is

\[ E(z) = S(z) \left( 1 - \sum_{k=1}^{p} a_k z^{-k} \right) = S(z) A(z) \]

Using Parseval's theorem, we get that

\[ E_n = \sum_m e^2(m) = \frac{1}{2\pi} \int_{-\pi}^{\pi} |E_n(e^{j\omega})|^2 \, d\omega \]

Substituting in the form of E(z), we get

\[ E_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} |X_n(e^{j\omega})|^2 \, |A(e^{j\omega})|^2 \, d\omega = \frac{G^2}{2\pi} \int_{-\pi}^{\pi} \frac{|X_n(e^{j\omega})|^2}{|H(e^{j\omega})|^2} \, d\omega \]

Since X is in the numerator inside the integral, any algorithm which minimizes $E_n$ will automatically try to produce an $H(e^{j\omega})$ which does a good job of modeling X at frequencies where X is large. In other words, LPC models spectral peaks better than spectral valleys.

.4 Lattice Filtering

Lattice filtering analyzes or synthesizes speech using the i-th-order forward prediction error $e^{(i)}(m)$ and backward prediction error $b^{(i)}(m)$:

\[ e^{(i)}(m) = s(m) - \sum_{k=1}^{i} a_k s(m-k) = e^{(i-1)}(m) - k_i\, b^{(i-1)}(m-1) \]
\[ b^{(i)}(m) = s(m-i) - \sum_{k=1}^{i} a_k s(m+k-i) = b^{(i-1)}(m-1) - k_i\, e^{(i-1)}(m) \]

In LPC synthesis, s(n) and $b^{(p)}(m)$ are calculated recursively from $e^{(p)}(m)$, as shown in figure .1.
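The analysis direction of the lattice recursions above can be sketched in NumPy (a sketch under zero initial conditions; the function name `lattice_analysis` is chosen here). Each stage removes one more order of redundancy from the signal:

```python
import numpy as np

def lattice_analysis(s, k):
    """Forward/backward prediction errors by the lattice recursions
    e_i(m) = e_{i-1}(m) - k_i b_{i-1}(m-1),
    b_i(m) = b_{i-1}(m-1) - k_i e_{i-1}(m),
    with e_0(m) = b_0(m) = s(m) and zero initial conditions."""
    e = np.asarray(s, dtype=float).copy()
    b = np.asarray(s, dtype=float).copy()
    for ki in k:
        b_prev = np.concatenate(([0.0], b[:-1]))   # b_{i-1}(m-1)
        e, b = e - ki * b_prev, b_prev - ki * e
    return e, b
```

For a p-th order lattice, the final forward error $e^{(p)}(m)$ equals the direct-form residual $s(m) - \sum_k a_k s(m-k)$, with the $a_k$ obtained from the $k_i$ by the Levinson-Durbin step-up recursion.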

[Figure .1: LPC synthesis using a lattice filter structure.]

.4.1 How to Calculate Reflection Coefficients

1. Reflection coefficients can be estimated directly from the speech signal by minimizing an error metric of the form

\[ \tilde E^{(i)} = \text{some function of } e^{(i)}(m),\; b^{(i)}(m) \]

Direct calculation of reflection coefficients is computationally expensive, but sometimes leads to better estimates of the transfer function (e.g. Burg's method).

2. Any LPC synthesis filter can be implemented as a lattice filter. The relationship between $a_i$ and $k_i$ is given by the Levinson-Durbin recursion.

.4.2 Why Use a Lattice Filter Instead of Direct-Form LPC?

Tests for stability of 1/A(z):

- Roots of A(z): $|r_i| < 1$.
- Reflection coefficients: $|k_i| < 1$.
- Direct-form coefficients $a_k$: no simple test.

.4.3 Equivalence of Lattice and Concatenated-Tube Models

Reflection coefficients in the lattice filter are equivalent to reflection coefficients in a lossless tube model of the vocal tract. There are many ways to make the two structures correspond. One convenient formulation numbers the tube areas $A_i$ backward from the lips:

\[ k_i = \frac{A_i - A_{i+1}}{A_i + A_{i+1}}, \qquad A_i = \text{area of the } i\text{-th tube section} \]
\[ A_0 = \infty \quad \text{(area of the space beyond the lips: lossless termination)} \]
\[ A_{p+1} = A_p \quad \text{(area of the glottis: lossy termination)} \]

The length l of each tube section is determined by the sampling period T and the speed of sound c:

\[ T = \frac{2l}{c} \]

.5 Stability of the LPC Filter

.5.1 Stability of the Unquantized Filter H(z)

\[ H(z) = \frac{G}{1 - \sum_{i=1}^{p} \alpha_i z^{-i}} = \frac{G}{\prod_{i=1}^{p} (1 - r_i z^{-1})} \]

- H(z) is stable iff $|r_i| < 1$. This is equivalent to saying that the reflection coefficients satisfy $|k_i| < 1$.
- The autocorrelation and lattice methods guarantee a stable filter, because they choose values of $|k_i| < 1$.
- The covariance method does not guarantee a stable filter.

.5.2 Stability of the Quantized Filter $\hat H(z)$

\[ \hat H(z) = \frac{G}{1 - \sum_{i=1}^{p} \hat\alpha_i z^{-i}} = \frac{G}{\prod_{i=1}^{p} (1 - \hat r_i z^{-1})} \]

- The filter is stable iff all $|\hat r_i| < 1$, equivalently all $|\hat k_i| < 1$.
- If the synthesis filter $\hat H(z)$ is unstable, the energy of the decoded speech will rapidly MAX OUT.
- Since filter state is carried forward from frame to frame, this erroneously high energy can last for a long time after the offending frame.
- This means that a single error can wipe out a whole second or more of speech, and maybe the D/A and the speakers as well.

.5.3 Quantizing Direct-Form Coefficients Leads to Unstable Filters

A small quantization error in one of the direct-form coefficients $\hat\alpha_i$ can easily make $\hat H(z)$ unstable. For example, a fourth-order filter may be stable with $\alpha_4 = 0.9$, yet become unstable when $\alpha_4$ is quantized to 0.5; if one of the other coefficients is different, however, the very same change in $\alpha_4$ leaves a stable filter. Whether a perturbation of one direct-form coefficient preserves stability depends on all of the other coefficients, so there is no simple per-coefficient test.

.5.4 The Solution

Quantize either $k_i$ or $r_i$, and design the quantizer levels so that $|\hat k_i|$ or $|\hat r_i|$ is always less than 1.0.

.6 Log Area Ratios

Goal: quantize $k_i$ such that $|\hat k_i| < 1$.

.6.1 The Problem

Changes in $k_i$ have a much larger effect on the synthesized speech spectrum when $|k_i| \approx 1$ than when $|k_i| \ll 1$, as shown in figure .2.

.6.2 The Solution: Companded Quantization

\[ k_i \rightarrow \text{Expand} \rightarrow g_i \rightarrow \text{Linear PCM} \rightarrow \hat g_i \rightarrow \text{Compress} \rightarrow \hat k_i \]

\[ g_i = \log \frac{1 - k_i}{1 + k_i} \]
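The companding chain above can be sketched in NumPy (a sketch; the function names and the step size are chosen here for illustration). Uniform quantization in the LAR domain automatically keeps the decoded reflection coefficients inside the unit interval:

```python
import numpy as np

def k_to_lar(k):
    """Expand reflection coefficients to log area ratios g = log((1-k)/(1+k))."""
    k = np.asarray(k, dtype=float)
    return np.log((1.0 - k) / (1.0 + k))

def lar_to_k(g):
    """Compress log area ratios back to reflection coefficients."""
    g = np.asarray(g, dtype=float)
    return (1.0 - np.exp(g)) / (1.0 + np.exp(g))

def quantize_lar(k, step=0.25):
    """Uniform (linear PCM) quantization in the LAR domain.
    The decoded coefficients always satisfy |k_hat| < 1."""
    g_hat = step * np.round(k_to_lar(k) / step)
    return lar_to_k(g_hat)
```

Because the expansion stretches the region near $|k_i| = 1$, a fixed LAR step size yields very fine resolution in k exactly where the spectrum is most sensitive.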

[Figure .2: Spectral sensitivity to changes in the reflection coefficients: sensitivity of $k_1$ (solid, for $k_2 = 0$) and $k_2$ (dashed, for $k_1 = 1$) in second-order LPC; dependence of the resonant frequency on $k_1$, $d\omega/dk_1$.]

[Figure .3: LAR companding: log area ratio g(i) as a function of reflection coefficient k(i).]

[Figure .4: Acoustic resonator (glottal excitation, loss signal, areas $A_0, A_1, \ldots$, radiated speech; unyielding walls, discrete area function) and lattice model with a matched-impedance termination at the glottis.]

.6.3 Interpretation: Log Area Ratios

Remember that the reflection coefficients $k_i$ can be used to define a stylized vocal tract model, with reflection coefficients of $r_L = -1$ at the lips, $r_g = 0$ at the glottis, and $r_i = k_{p-i}$ ($i = 0, \ldots, p-1$), as shown in figure .4. If the cross-sectional areas are $A_i$, $A_{i+1}$, then

\[ r_i = k_{p-i} = \frac{A_{i+1} - A_i}{A_{i+1} + A_i} \]

and

\[ \frac{1 - k_{p-i}}{1 + k_{p-i}} = \frac{A_i}{A_{i+1}} \]

So the "expanded" reflection coefficient is really a "log area ratio":

\[ g_i = \log \frac{1 - k_i}{1 + k_i} = \log \frac{A_{p-i}}{A_{p-i+1}} \]

.7 Line Spectral Frequencies

Goal: quantize A(z) in such a way that the quantized roots still satisfy $|\hat r_i| < 1$.

.7.1 The Impedance Interpretation of the LSFs

The Problem: There is no regular relationship between the real and imaginary parts of $r_i$ (the frequencies and bandwidths of the roots), so they are difficult to quantize.

Solution: Instead, we find two adjunct polynomials Q(z) and P(z) whose roots $e^{jq_n}$ and $e^{jp_n}$ are on the unit circle if and only if the roots of A(z) are inside the unit circle ($|r_i| < 1$). If the real numbers $p_n$ and $q_n$ have nice properties, then they might be easier to quantize than the complex numbers $r_i$.

Concise Definition: The reflection coefficients $k_i$ can be used to define a stylized vocal tract model, with reflection coefficients of $r_L = -1$ at the lips, $r_g = 0$ at the glottis, and $r_i = k_{p-i}$. The LSFs are the zero and pole frequencies of the impedance of this vocal tract model as seen from the glottis, which is:

\[ Z_T(z) = \frac{Q(z)}{P(z)} =
\frac{(1 - e^{j0} z^{-1}) \prod_{n=1}^{p/2} (1 - e^{jq_n} z^{-1})(1 - e^{-jq_n} z^{-1})}
     {(1 - e^{j\pi} z^{-1}) \prod_{n=1}^{p/2} (1 - e^{jp_n} z^{-1})(1 - e^{-jp_n} z^{-1})} \]

Since the termination at the lips is lossless, and every tube section is lossless, the impedance $Z_T(z)$ must also be lossless. This means that the poles and zeros $e^{jp_n}$ and $e^{jq_n}$ are on the unit circle.

Nice Characteristics of LSFs

1. The pole frequencies $p_n$ (where $Z_T(e^{jp_n}) = \infty$) approximately equal the formant frequencies.
2. The zeros and poles alternate:

\[ 0 < p_1 < q_1 < p_2 < q_2 < \cdots \]

so the range of each $q_n$, $p_n$ is limited, which makes $q_n$, $p_n$ easy to quantize.
3. $q_n$, $p_n$ are correlated with each other, so intra-frame prediction and block quantization are possible.
4. $q_n$, $p_n$ change slowly from frame to frame, so inter-frame prediction is also possible.

.7.2 Derivation of the LSF Polynomials

Interpretation: The lattice filter is like a vocal tract resonator with no reflections at the glottis, as shown in figure .4. In the figure, the "glottal excitation signal" is the forward LPC prediction error $E^{(p)}(z)$:

\[ E^{(p)}(z) = \left( 1 - \sum_{i=1}^{p} \alpha_i z^{-i} \right) X(z) \]

and the "glottal loss signal" is the backward LPC prediction error $z^{-(p+1)} B^{(p)}(z)$, which can be obtained by time-reversing the LPC analysis filter (as proven in R&S ch. 8):

\[ z^{-(p+1)} B^{(p)}(z) = z^{-(p+1)} \left( 1 - \sum_{i=1}^{p} \alpha_i z^{i} \right) X(z) \]

Notice that, since $r_L = -1$, the only loss in this vocal tract model is at the glottis. The impedance of the rest of the vocal tract, as seen from the glottis, is pure imaginary:

\[ \Re\{ Z_T(e^{j\omega}) \} = 0 \]

That means the zeros and poles of $Z_T(e^{j\omega})$ are on the unit circle:

\[ Z_T(e^{jq_n}) = 0, \qquad \frac{1}{Z_T(e^{jp_n})} = 0 \]

where $q_n$ and $p_n$ are real numbers.

[Figure .5: Acoustic resonator (closed glottis, areas $A_0, A_1, \ldots$, radiated speech; closed-glottis vocal tract model, unyielding walls) and lattice filter model with a zero-admittance termination at the glottis: $D(z) = E(z) + B(z)$.]

Finding the Poles

\[ Y_T(e^{jp_n}) = 0 \;\Rightarrow\; Y_g(e^{jp_n}) + Y_T(e^{jp_n}) = 0 \]

where $Y_g(e^{j\omega})$ is the admittance of a rigid wall,

\[ Y_g(e^{j\omega}) = 0 \quad \forall \omega \]

This is the same as saying that the $p_n$ are the resonant frequencies of the system shown in figure .5. Find the resonances of the augmented lattice filter:

\[ D(z) = E^{(p)}(z) + z^{-(p+1)} B^{(p)}(z) = \left( 1 - \sum_{i=1}^{p} \alpha_i z^{-i} \right) X(z) + z^{-(p+1)} \left( 1 - \sum_{i=1}^{p} \alpha_i z^{i} \right) X(z) \]

\[ X(z) = \frac{D(z)}{A(z) + z^{-(p+1)} A(z^{-1})} \]

So the resonances $e^{jp_n}$ are the roots of the polynomial

\[ P(z) = A(z) + z^{-(p+1)} A(z^{-1}) \]

Finding the Zeros

\[ Z_T(e^{jq_n}) = 0 \;\Rightarrow\; Z_g(e^{jq_n}) + Z_T(e^{jq_n}) = 0 \]

where $Z_g(e^{j\omega})$ is the impedance of open space,

\[ Z_g(e^{j\omega}) = 0 \quad \forall \omega \]

This is the same as saying that the $q_n$ are the resonant frequencies of the system shown in figure .6. Find the resonances of the augmented lattice filter:

\[ D(z) = E^{(p)}(z) - z^{-(p+1)} B^{(p)}(z) = \left( 1 - \sum_{i=1}^{p} \alpha_i z^{-i} \right) X(z) - z^{-(p+1)} \left( 1 - \sum_{i=1}^{p} \alpha_i z^{i} \right) X(z) \]

[Figure .6: Acoustic resonator (open glottis, areas $A_0, A_1, \ldots$, radiated speech; open-glottis vocal tract model, unyielding walls) and lattice filter model with a zero-impedance termination at the glottis: $D(z) = E(z) - B(z)$.]

\[ X(z) = \frac{D(z)}{A(z) - z^{-(p+1)} A(z^{-1})} \]

So the resonances $e^{jq_n}$ are the roots of the polynomial

\[ Q(z) = A(z) - z^{-(p+1)} A(z^{-1}) \]

.7.3 The Augmented-Tube Interpretation of Line Spectral Frequencies

Setting $A_{p+1} = 0$ in the concatenated tube model yields:

\[ \frac{S(z)}{U(z)} = \frac{S(z)}{E^{(p)}(z) + z^{-(p+1)} B^{(p)}(z)} = \frac{1}{P(z)}, \qquad P(z) \equiv A(z) + z^{-(p+1)} A(z^{-1}) \]

Setting $A_{p+1} = \infty$ yields:

\[ \frac{S(z)}{U(z)} = \frac{S(z)}{E^{(p)}(z) - z^{-(p+1)} B^{(p)}(z)} = \frac{1}{Q(z)}, \qquad Q(z) \equiv A(z) - z^{-(p+1)} A(z^{-1}) \]

Because of the symmetry, the roots of both P(z) and Q(z) are on the unit circle. If p is even:

\[ P(z) = (1 + z^{-1}) \prod_{n=1}^{p/2} (1 - e^{jp_n} z^{-1})(1 - e^{-jp_n} z^{-1}), \qquad p_n \text{ real} \]
\[ Q(z) = (1 - z^{-1}) \prod_{n=1}^{p/2} (1 - e^{jq_n} z^{-1})(1 - e^{-jq_n} z^{-1}), \qquad q_n \text{ real} \]

The frequencies $p_n$ and $q_n$, for $1 \le n \le p/2$, are called the line spectral frequencies (LSFs). The LSFs have the following useful characteristics:

- LSFs are real, so they are easier to quantize than the LPC roots $r_i$, which are complex.
- If and only if 1/A(z) is stable, the LSFs satisfy:

\[ 0 < p_1 < q_1 < p_2 < q_2 < \cdots < \pi \]

- The LSFs tend to track the LPC root frequencies $\arg(r_i)$, but...
- The LSFs vary more slowly and smoothly than the LPC roots $r_i$.
- Efficient algorithms exist for calculating the LSFs.
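A direct (non-optimized) way to obtain the LSFs is to build P(z) and Q(z) from the coefficients of A(z) and take the angles of their roots. The NumPy sketch below does exactly that; it is illustrative rather than the efficient algorithms the notes allude to, and the function name `lsf` is chosen here:

```python
import numpy as np

def lsf(a):
    """Line spectral frequencies of A(z) = 1 - sum_k a_k z^-k.
    P(z) = A(z) + z^-(p+1) A(1/z),  Q(z) = A(z) - z^-(p+1) A(1/z)."""
    A = np.concatenate(([1.0], -np.asarray(a, dtype=float)))  # coeffs of A(z)
    # z^-(p+1) A(1/z) has the coefficients of A(z) reversed, shifted by one.
    P = np.concatenate((A, [0.0])) + np.concatenate(([0.0], A[::-1]))
    Q = np.concatenate((A, [0.0])) - np.concatenate(([0.0], A[::-1]))
    # Keep root angles strictly inside (0, pi): drop the fixed roots at z = 1, -1.
    p_n = np.sort([w for w in np.angle(np.roots(P)) if 1e-9 < w < np.pi - 1e-9])
    q_n = np.sort([w for w in np.angle(np.roots(Q)) if 1e-9 < w < np.pi - 1e-9])
    return p_n, q_n
```

For a stable A(z) the returned frequencies interlace as $0 < p_1 < q_1 < \cdots < \pi$, which is exactly the stability check one can apply after quantization.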

.8 LPC Distance Measures

Suppose we want to calculate the distance between two all-pole spectra,

\[ S_1(\omega) = |X_1(\omega)|^2 \approx \frac{G_1^2}{|A_1(\omega)|^2}, \qquad S_2(\omega) = |X_2(\omega)|^2 \approx \frac{G_2^2}{|A_2(\omega)|^2} \]

We can:

- Calculate the spectral L2 norm by integrating $\left( \log |G_1/A_1(\omega)|^2 - \log |G_2/A_2(\omega)|^2 \right)^2$ over $\omega$.
- Find the LPC cepstrum, and calculate a weighted cepstral distance.
- Calculate an LPC likelihood distortion.

.8.1 Itakura-Saito Distortion

Suppose that $S_1(\omega)$ is a random spectrum, produced by filtering a unit-energy noise process $U(\omega)$ through an unknown all-pole filter $G_1/A_1(\omega)$:

\[ S_1(\omega) = \frac{G_1^2}{|A_1(\omega)|^2} |U(\omega)|^2, \qquad E[S_1(\omega)] = \frac{G_1^2}{|A_1(\omega)|^2} \]

Suppose we don't know $A_1$, but we have a spectrum $A_2$ which might or might not be a good approximation to $A_1$. One question worth asking is: what is the probability that the signal $x_1(n)$ was generated using filter $G_2/A_2$? This probability is related to a distance called the Itakura-Saito measure of the distortion between spectra $G_1^2/|A_1|^2$ and $G_2^2/|A_2|^2$, named after the engineers who first derived it:

\[ d_{IS}\!\left( \frac{G_1^2}{|A_1|^2}, \frac{G_2^2}{|A_2|^2} \right) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{|X_1(\omega)|^2 |A_2(\omega)|^2}{G_2^2} \, d\omega - \log \frac{G_1^2}{G_2^2} - 1 \]

\[ d_{IS} \sim -\log p_{x_1}\!\left( [x_1(0) \cdots x_1(L-1)] \,\middle|\, G_2, A_2(\omega) \right) \]

The first term in the I-S distortion is the residual energy of the random signal $x_1(n)$, filtered through the inverse filter $A_2(z)/G_2$:

\[ x_1(n) \rightarrow A_2(z)/G_2 \rightarrow e_{12}(n) \]

\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} |E_{12}(\omega)|^2 \, d\omega
 = \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{|X_1(\omega)|^2 |A_2(\omega)|^2}{G_2^2} \, d\omega \]
\[ = \frac{1}{G_2^2} \sum_n \left( x_1(n) - \sum_{k=1}^{p} a_{k,2}\, x_1(n-k) \right)^2 \]
\[ = \frac{1}{G_2^2} \left( R_1(0) - 2 \sum_{k=1}^{p} a_{k,2} R_1(k) + \sum_{i=1}^{p} \sum_{k=1}^{p} a_{i,2}\, a_{k,2}\, R_1(|i-k|) \right) \]
\[ = \frac{a_2' R_{p,1}\, a_2}{G_2^2} \]

where $a_2$ is the LPC coefficient vector representing the polynomial $A_2(z)$, and $R_{p,1}$ is the autocorrelation matrix built out of samples of signal $x_1(n)$. Using this notation, the I-S distortion can be written in the easy-to-compute form:

\[ d_{IS}\!\left( \frac{G_1^2}{|A_1|^2}, \frac{G_2^2}{|A_2|^2} \right) = \frac{a_2' R_{p,1}\, a_2}{G_2^2} - \log \frac{G_1^2}{G_2^2} - 1 \]

Characteristics:

- Asymmetric:

\[ d_{IS}\!\left( \frac{G_1^2}{|A_1|^2}, \frac{G_2^2}{|A_2|^2} \right) \ne d_{IS}\!\left( \frac{G_2^2}{|A_2|^2}, \frac{G_1^2}{|A_1|^2} \right) \]

- The minimum value of zero is obtained when $A_1 = A_2$, $G_1 = G_2$ (recall $G_1^2 = E_{\min}$):

\[ d_{IS}\!\left( \frac{G_1^2}{|A_1|^2}, \frac{G_1^2}{|A_1|^2} \right) = \frac{E_{\min}}{G_1^2} - \log 1 - 1 = 0 \]

.8.2 Likelihood-Ratio Distortion, Itakura Distortion

Many times, we don't care about differences in the spectral energy of $S_1$ and $S_2$. The likelihood-ratio distortion measure is obtained by normalizing both $S_1$ and $S_2$ to unit energy, and then calculating an Itakura-Saito distortion:

\[ d_{LR}\!\left( \frac{1}{|A_1|^2}, \frac{1}{|A_2|^2} \right) = \frac{1}{2\pi} \int_{-\pi}^{\pi} |A_2(\omega)|^2 \, \frac{S_1(\omega)}{G_1^2} \, d\omega - 1 = \frac{a_2' R_{p,1}\, a_2}{G_1^2} - 1 \]

A similar distortion measure, called the Itakura distortion, is often used instead of the likelihood-ratio distortion:

\[ d_{I}\!\left( \frac{1}{|A_1|^2}, \frac{1}{|A_2|^2} \right) = \log \left( \frac{1}{2\pi} \int_{-\pi}^{\pi} |A_2(\omega)|^2 \, \frac{S_1(\omega)}{G_1^2} \, d\omega \right) = \log \frac{a_2' R_{p,1}\, a_2}{G_1^2} \]
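The I-S distortion can also be evaluated directly from the spectral ratio, $d_{IS} = \frac{1}{2\pi}\int (S_1/S_2 - \log(S_1/S_2) - 1)\, d\omega$, which agrees with the closed form above for minimum-phase models. A NumPy sketch of this numerical evaluation (the function name `d_is` and the grid size are chosen here):

```python
import numpy as np

def d_is(a1, G1, a2, G2, n_fft=4096):
    """Itakura-Saito distortion between G1^2/|A1|^2 and G2^2/|A2|^2,
    evaluated on a dense frequency grid."""
    w = np.linspace(-np.pi, np.pi, n_fft, endpoint=False)

    def A(a, w):
        # A(e^jw) = 1 - sum_k a_k e^{-jkw}
        ks = np.arange(1, len(a) + 1)
        return 1.0 - np.asarray(a) @ np.exp(-1j * np.outer(ks, w))

    S1 = G1**2 / np.abs(A(a1, w))**2
    S2 = G2**2 / np.abs(A(a2, w))**2
    ratio = S1 / S2
    # mean over the uniform grid approximates (1/2pi) * integral
    return np.mean(ratio - np.log(ratio) - 1.0)
```

Since $x - \log x - 1 \ge 0$ with equality only at $x = 1$, the distortion is nonnegative and vanishes only when the two spectra coincide; swapping the arguments generally changes the value, exhibiting the asymmetry noted above.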

LPC Parameter Sets Containing Equivalent Information

Direct-Form Coefficients ($a_k$):

\[ R_n(i) = \sum_{k=1}^{p} a_k R_n(|i-k|), \qquad A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k} \]

LPC Roots ($r_i$):

\[ A(z) = \prod_{i=1}^{p} (1 - r_i z^{-1}), \qquad r_i = \text{roots}(A(z)) \]

PARCOR Coefficients ($k_i$):

From $a_k$ to $k_i$ (step-down):

\[ k_i = a_i^{(i)}, \qquad a_j^{(i-1)} = \frac{a_j^{(i)} + k_i\, a_{i-j}^{(i)}}{1 - k_i^2}, \quad 1 \le j \le i-1 \]

From $k_i$ to $a_k$ (step-up):

\[ a_j^{(i)} = \begin{cases} k_i & j = i \\ a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)} & 1 \le j \le i-1 \end{cases} \]

Log Area Ratios ($g_i$):

\[ g_i = \log \frac{1 - k_i}{1 + k_i}, \qquad k_i = \frac{1 - e^{g_i}}{1 + e^{g_i}} \]

Line Spectral Frequencies ($p_n, q_n$):

\[ P(z) = A(z) + z^{-(p+1)} A(z^{-1}), \qquad Q(z) = A(z) - z^{-(p+1)} A(z^{-1}) \]
\[ p_n = \arg(\text{roots}(P(z))) \text{ s.t. } 0 < p_n < \pi, \qquad q_n = \arg(\text{roots}(Q(z))) \text{ s.t. } 0 < q_n < \pi \]
\[ P(z) = (1 + z^{-1}) \prod_{n=1}^{p/2} (1 - e^{jp_n} z^{-1})(1 - e^{-jp_n} z^{-1}) \]
\[ Q(z) = (1 - z^{-1}) \prod_{n=1}^{p/2} (1 - e^{jq_n} z^{-1})(1 - e^{-jq_n} z^{-1}) \]

LPC Cepstrum ($c_m$):

\[ c_0 = \log G^2 \]
\[ c_m = a_m + \sum_{k=1}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \qquad 1 \le m \le p \]
\[ c_m = \sum_{k=m-p}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \qquad m > p \]

Inverse:

\[ G = e^{c_0/2}, \qquad a_m = c_m - \sum_{k=1}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \quad 1 \le m \le p \]
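The step-up and step-down recursions in the table can be sketched compactly in NumPy (a sketch; the function names `k_to_a` and `a_to_k` are chosen here). Converting to PARCOR coefficients and checking $|k_i| < 1$ is the practical stability test for direct-form coefficients:

```python
import numpy as np

def k_to_a(k):
    """Step-up recursion: PARCOR coefficients k_i -> direct-form a_k."""
    a = np.array([])
    for ki in k:
        # a_j^(i) = a_j^(i-1) - k_i a_{i-j}^(i-1), then append a_i^(i) = k_i
        a = np.concatenate((a - ki * a[::-1], [ki]))
    return a

def a_to_k(a):
    """Step-down recursion: direct-form a_k -> PARCOR coefficients k_i."""
    a = np.asarray(a, dtype=float)
    k = []
    for i in range(len(a), 0, -1):
        ki = a[i - 1]                 # k_i = a_i^(i)
        k.append(ki)
        if i > 1:
            # a_j^(i-1) = (a_j^(i) + k_i a_{i-j}^(i)) / (1 - k_i^2)
            a = (a[:i - 1] + ki * a[i - 2::-1]) / (1.0 - ki * ki)
    return np.array(k[::-1])
```

The two recursions are exact inverses of each other, so either representation carries the same information, as the table asserts.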

.9 Exercises

1. Likelihood Ratio Distance

Suppose that the autocorrelation coefficients of signal $x_1(n)$ are $R_1(0) = 500$ and $R_1(1) = 00$.

(a) Find the parameters $G_1$ and $A_1(z)$ of a first-order autocorrelation LPC model of $x_1(n)$.

(b) Suppose that the spectrum of another signal, $x_2(n)$, can be modeled using the following transfer function:

\[ \frac{G_2}{A_2(z)} = \frac{G_2}{1 - 0.9 z^{-1}} \]

What is the likelihood ratio distance $d_{LR}(1/|A_1|^2, 1/|A_2|^2)$?

2. Vocal Tract Area Function

Remember that the PARCOR coefficients $k_i$ can be viewed as reflection coefficients in a concatenated-tube model of the vocal tract, with tube sections numbered backward from the lips. Calculate the area $A_i$ and length l of each tube section; you may assume that the area of the lip opening is $A_1 = 5\ \text{cm}^2$. Plot the area of the vocal tract A(x) as a function of position x, where x is measured in centimeters. Turn in code and/or equations showing how you calculated l and A(x).

3. LPC Analysis and Synthesis

Write a function which reads in a frame of speech, calculates LPC coefficients, filters the input by A(z) to find an LPC residual, and then filters the residual by H(z) = 1/A(z) to synthesize a speech waveform. Verify that the output is identical to the non-overlapping part of the input.

(a) It is important to carry the state of all digital filters forward from frame to frame. The state of digital filters in matlab can be accessed using the Zf and Zi arguments to "filter." What happens if the state of the LPC synthesis filter is carried forward from frame to frame, but not the state of the analysis filter? What happens if the state of the analysis filter is carried forward, but not the state of the synthesis filter? What happens if neither filter state is carried forward? (Hint: compare the LPC residual with and without filter state carry-over.)

(b) Quantize the LPC coefficients using LAR quantization. Aim for an average of 4-5 bits per LPC coefficient.
Verify that when you use the same quantized filter $\hat A(z)$ for both analysis and synthesis, LPC quantization does not introduce any errors into the reconstructed signal. How many bits per second are you using to quantize the LPC coefficients?

4. Formant Tracking

Write a matlab function of the form A = lpcana(X, P, N) which performs LPC analysis of order P on the waveform X using frames of length N samples. The matrix A should contain one row for each frame; each row should contain the LPC filter coefficients for one frame.

Write a matlab function [F, BW] = formants(A, FS) which finds the roots of the LPC polynomials stored in A, and calculates up to p/2 analog formant bandwidths $BW_i$ and frequencies $F_i$ per frame, such that

\[ r_i = e^{(-\pi BW_i + j 2\pi F_i)/F_s} \]

where $r_i$ is one of the roots of A(z), calculated using the matlab roots function.

Plot the formant frequencies as a function of time, and compare your plot to a spectrogram of the utterance. Which formants are tracked during voiced segments? What happens when there are fewer than p/2 trackable formants? What happens to the LPC-based formant estimates during unvoiced speech segments?
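The root-to-formant relationship in the equation above inverts directly: $F_i = F_s \arg(r_i)/2\pi$ and $BW_i = -F_s \ln|r_i|/\pi$. A minimal NumPy sketch of just this conversion (not a solution to the exercise; the function name `root_to_formant` is chosen here):

```python
import numpy as np

def root_to_formant(r, fs):
    """Invert r = exp((-pi*BW + j*2*pi*F)/fs) for one complex LPC root."""
    F = fs * np.angle(r) / (2.0 * np.pi)    # formant frequency in Hz
    BW = -fs * np.log(np.abs(r)) / np.pi    # formant bandwidth in Hz
    return F, BW
```

Roots close to the unit circle ($|r_i| \approx 1$) give narrow bandwidths, which is one way to separate trackable formants from the broad poles that model spectral tilt.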

5. Line Spectral Frequencies

Write a matlab function [PF, QF] = tflsf(A) which computes ordered line spectral frequencies corresponding to the LPC coefficients in A. The LSF matrices PF and QF should each contain as many rows as A, and p/2 columns.

(a) Plot the analog P-frequencies, $p_n F_s / 2\pi$, and compare to a spectrogram of the utterance. Do the P-frequencies track the formants during voiced segments? What happens if there are 5 P-frequencies, but fewer trackable formants? What do the P-frequencies do during unvoiced segments? Do you think this behavior is likely to make the P-frequencies more or less quantizable than the LPC-based formant frequencies? Why?

Plot the Q-frequencies $q_n F_s / 2\pi$. How do the Q-frequencies relate to the P-frequencies? Is there ever a time when a formant frequency is tracked more closely by $q_n$ than by $p_n$?

How rapidly do the line spectral frequencies change as a function of time? What is the range of each individual line spectral frequency? What is the range of the difference between neighboring line spectral frequencies? Can you think of an efficient way of quantizing the line spectral frequencies?

(b) Quantize the LPC coefficients using LSF quantization, then convert the quantized LSFs back into direct-form LPC coefficients, and synthesize speech. How does the speech sound? How can you guarantee that the quantized LPC synthesis filter will be stable?

6. Pitch Tracking

Write a matlab function [N0, B] = pitch(X, N, N0MIN, N0MAX). For each frame, set B and N0 according to the following formulas:

\[ B = \max_{N_{0,\min} \le m \le N_{0,\max}} r_x(m) \]
\[ N_0 = \arg\max_{N_{0,\min} \le m \le N_{0,\max}} r_x(m) \]

(a) Try different values of $N_{0,\min}$ and $N_{0,\max}$. For each value you test, plot B and N0 as a function of time, and compare them to the spectrogram. What values give the best pitch tracking? Is B always larger for voiced segments than unvoiced segments?
What is the threshold value of B which best divides voiced and unvoiced segments?

(b) Try pitch tracking using the autocorrelation of the LPC residual, rather than the signal autocorrelation. Do you find any improvement? Why or why not?

7. Lattice Filter and Concatenated-Tube Model

In this problem, you will explore the relationship between the PARCOR lattice filter structure and the lossless tube models of R&S. Recall that the lattice filter iteratively removes the redundancy from the speech signal S(z) in order to calculate the forward prediction error, $E^{(i)}(z)$, and the backward prediction error, $B^{(i)}(z)$:

\[ E^{(i)}(z) = E^{(i-1)}(z) - k_i z^{-1} B^{(i-1)}(z) \]
\[ B^{(i)}(z) = z^{-1} B^{(i-1)}(z) - k_i E^{(i-1)}(z) \]
\[ E^{(0)}(z) = B^{(0)}(z) = S(z) \]

(a) Suppose you are given as inputs the forward error of order i and the backward error of order i-1, $E^{(i)}(z)$ and $B^{(i-1)}(z)$, and you are asked to synthesize $E^{(i-1)}(z)$ and $B^{(i)}(z)$. By re-arranging the equations above, devise a filter structure to accomplish this. Draw the filter structure.

(b) Suppose you are given only the forward error of order p, $E^{(p)}(z)$, and asked to synthesize the speech signal S(z) and the backward error $B^{(p)}(z)$. Devise a lattice filter structure to accomplish this.

(c) The figure below shows a section of a digital concatenated tube model, similar to the lossless tube model in R&S, but with a delay on the backward arc rather than the forward arc. Suppose that the left-hand nodes are the i-th order LPC prediction errors, $A(z) = E^{(i)}(z)$ and $B(z) = B^{(i)}(z)$. Show that the right-hand nodes are $C(z) = \gamma E^{(i-1)}(z)$ and $D(z) = \gamma B^{(i-1)}(z)$, for some constant $\gamma$. What is the value of $\gamma$?

[Figure: Section of a digital concatenated tube model, with gains $\pm k_i$ on the cross arcs and a delay $z^{-1}$ on the backward arc.]

(d) Find values of the reflection coefficients $r_G$, $r_1$, $r_2$, and $r_L$ such that the transfer function of the concatenated tube model in your text is

\[ \frac{U_l(z)}{U_g(z)} = z^{-3/2} \, \frac{G\, S(z)}{E^{(2)}(z)} \]

where S(z) and $E^{(2)}(z)$ are as defined in part (b) of this problem, and G is a constant.

(e) In the concatenated tube model of part (d), what is the acoustic impedance at the lips? At the glottis? (You may express one or both impedances in terms of the cross-sectional tube areas, if necessary.)

(f) The concatenated tube elements in your model of part (d) are lossless elements, and yet the impulse response of the system decays gradually over time, implying that at least one element in the system must be lossy. Where are the losses occurring? In a real vocal tract, where else would losses occur?


More information

Keywords: Vocal Tract; Lattice model; Reflection coefficients; Linear Prediction; Levinson algorithm.

Keywords: Vocal Tract; Lattice model; Reflection coefficients; Linear Prediction; Levinson algorithm. Volume 3, Issue 6, June 213 ISSN: 2277 128X International Journal of Advanced Research in Comuter Science and Software Engineering Research Paer Available online at: www.ijarcsse.com Lattice Filter Model

More information

Automatic Speech Recognition (CS753)

Automatic Speech Recognition (CS753) Automatic Speech Recognition (CS753) Lecture 12: Acoustic Feature Extraction for ASR Instructor: Preethi Jyothi Feb 13, 2017 Speech Signal Analysis Generate discrete samples A frame Need to focus on short

More information

Zeros of z-transform(zzt) representation and chirp group delay processing for analysis of source and filter characteristics of speech signals

Zeros of z-transform(zzt) representation and chirp group delay processing for analysis of source and filter characteristics of speech signals Zeros of z-transformzzt representation and chirp group delay processing for analysis of source and filter characteristics of speech signals Baris Bozkurt 1 Collaboration with LIMSI-CNRS, France 07/03/2017

More information

2 M. Hasegawa-Johnson. DRAFT COPY.

2 M. Hasegawa-Johnson. DRAFT COPY. Lecture Notes in Speech Production Speech Coding and Speech Recognition Mark Hasegawa-Johnson University of Illinois at Urbana-Champaign February 7 2000 2 M. Hasegawa-Johnson. DRAFT COPY. Chapter Basics

More information

LPC and Vector Quantization

LPC and Vector Quantization LPC and Vector Quantization JanČernocký,ValentinaHubeikaFITBUTBrno When modeling speech production based on LPC, we assume that the excitation is passed through the linear filter: H(z) = A(z) G,where A(z)isaP-thorderpolynome:

More information

Sinusoidal Modeling. Yannis Stylianou SPCC University of Crete, Computer Science Dept., Greece,

Sinusoidal Modeling. Yannis Stylianou SPCC University of Crete, Computer Science Dept., Greece, Sinusoidal Modeling Yannis Stylianou University of Crete, Computer Science Dept., Greece, yannis@csd.uoc.gr SPCC 2016 1 Speech Production 2 Modulators 3 Sinusoidal Modeling Sinusoidal Models Voiced Speech

More information

THE PROBLEMS OF ROBUST LPC PARAMETRIZATION FOR. Petr Pollak & Pavel Sovka. Czech Technical University of Prague

THE PROBLEMS OF ROBUST LPC PARAMETRIZATION FOR. Petr Pollak & Pavel Sovka. Czech Technical University of Prague THE PROBLEMS OF ROBUST LPC PARAMETRIZATION FOR SPEECH CODING Petr Polla & Pavel Sova Czech Technical University of Prague CVUT FEL K, 66 7 Praha 6, Czech Republic E-mail: polla@noel.feld.cvut.cz Abstract

More information

Speech Coding. Speech Processing. Tom Bäckström. October Aalto University

Speech Coding. Speech Processing. Tom Bäckström. October Aalto University Speech Coding Speech Processing Tom Bäckström Aalto University October 2015 Introduction Speech coding refers to the digital compression of speech signals for telecommunication (and storage) applications.

More information

Mel-Generalized Cepstral Representation of Speech A Unified Approach to Speech Spectral Estimation. Keiichi Tokuda

Mel-Generalized Cepstral Representation of Speech A Unified Approach to Speech Spectral Estimation. Keiichi Tokuda Mel-Generalized Cepstral Representation of Speech A Unified Approach to Speech Spectral Estimation Keiichi Tokuda Nagoya Institute of Technology Carnegie Mellon University Tamkang University March 13,

More information

Thursday, October 29, LPC Analysis

Thursday, October 29, LPC Analysis LPC Analysis Prediction & Regression We hypothesize that there is some systematic relation between the values of two variables, X and Y. If this hypothesis is true, we can (partially) predict the observed

More information

Feature extraction 2

Feature extraction 2 Centre for Vision Speech & Signal Processing University of Surrey, Guildford GU2 7XH. Feature extraction 2 Dr Philip Jackson Linear prediction Perceptual linear prediction Comparison of feature methods

More information

Chapter 10 Applications in Communications

Chapter 10 Applications in Communications Chapter 10 Applications in Communications School of Information Science and Engineering, SDU. 1/ 47 Introduction Some methods for digitizing analog waveforms: Pulse-code modulation (PCM) Differential PCM

More information

Ill-Conditioning and Bandwidth Expansion in Linear Prediction of Speech

Ill-Conditioning and Bandwidth Expansion in Linear Prediction of Speech Ill-Conditioning and Bandwidth Expansion in Linear Prediction of Speech Peter Kabal Department of Electrical & Computer Engineering McGill University Montreal, Canada February 2003 c 2003 Peter Kabal 2003/02/25

More information

Reformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features

Reformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features Reformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features Heiga ZEN (Byung Ha CHUN) Nagoya Inst. of Tech., Japan Overview. Research backgrounds 2.

More information

1. Probability density function for speech samples. Gamma. Laplacian. 2. Coding paradigms. =(2X max /2 B ) for a B-bit quantizer Δ Δ Δ Δ Δ

1. Probability density function for speech samples. Gamma. Laplacian. 2. Coding paradigms. =(2X max /2 B ) for a B-bit quantizer Δ Δ Δ Δ Δ Digital Speech Processing Lecture 16 Speech Coding Methods Based on Speech Waveform Representations and Speech Models Adaptive and Differential Coding 1 Speech Waveform Coding-Summary of Part 1 1. Probability

More information

Multimedia Communications. Differential Coding

Multimedia Communications. Differential Coding Multimedia Communications Differential Coding Differential Coding In many sources, the source output does not change a great deal from one sample to the next. This means that both the dynamic range and

More information

A Spectral-Flatness Measure for Studying the Autocorrelation Method of Linear Prediction of Speech Analysis

A Spectral-Flatness Measure for Studying the Autocorrelation Method of Linear Prediction of Speech Analysis A Spectral-Flatness Measure for Studying the Autocorrelation Method of Linear Prediction of Speech Analysis Authors: Augustine H. Gray and John D. Markel By Kaviraj, Komaljit, Vaibhav Spectral Flatness

More information

CEPSTRAL ANALYSIS SYNTHESIS ON THE MEL FREQUENCY SCALE, AND AN ADAPTATIVE ALGORITHM FOR IT.

CEPSTRAL ANALYSIS SYNTHESIS ON THE MEL FREQUENCY SCALE, AND AN ADAPTATIVE ALGORITHM FOR IT. CEPSTRAL ANALYSIS SYNTHESIS ON THE EL FREQUENCY SCALE, AND AN ADAPTATIVE ALGORITH FOR IT. Summarized overview of the IEEE-publicated papers Cepstral analysis synthesis on the mel frequency scale by Satochi

More information

E : Lecture 1 Introduction

E : Lecture 1 Introduction E85.2607: Lecture 1 Introduction 1 Administrivia 2 DSP review 3 Fun with Matlab E85.2607: Lecture 1 Introduction 2010-01-21 1 / 24 Course overview Advanced Digital Signal Theory Design, analysis, and implementation

More information

Proc. of NCC 2010, Chennai, India

Proc. of NCC 2010, Chennai, India Proc. of NCC 2010, Chennai, India Trajectory and surface modeling of LSF for low rate speech coding M. Deepak and Preeti Rao Department of Electrical Engineering Indian Institute of Technology, Bombay

More information

LPC methods are the most widely used in. recognition, speaker recognition and verification

LPC methods are the most widely used in. recognition, speaker recognition and verification Digital Seech Processing Lecture 3 Linear Predictive Coding (LPC)- Introduction LPC Methods LPC methods are the most widely used in seech coding, seech synthesis, seech recognition, seaker recognition

More information

Design of a CELP coder and analysis of various quantization techniques

Design of a CELP coder and analysis of various quantization techniques EECS 65 Project Report Design of a CELP coder and analysis of various quantization techniques Prof. David L. Neuhoff By: Awais M. Kamboh Krispian C. Lawrence Aditya M. Thomas Philip I. Tsai Winter 005

More information

Pitch Prediction Filters in Speech Coding

Pitch Prediction Filters in Speech Coding IEEE TRANSACTIONS ON ACOUSTICS. SPEECH, AND SIGNAL PROCESSING. VOL. 37, NO. 4, APRIL 1989 Pitch Prediction Filters in Speech Coding RAVI P. RAMACHANDRAN AND PETER KABAL Abstract-Prediction error filters

More information

Allpass Modeling of LP Residual for Speaker Recognition

Allpass Modeling of LP Residual for Speaker Recognition Allpass Modeling of LP Residual for Speaker Recognition K. Sri Rama Murty, Vivek Boominathan and Karthika Vijayan Department of Electrical Engineering, Indian Institute of Technology Hyderabad, India email:

More information

Lecture 19 IIR Filters

Lecture 19 IIR Filters Lecture 19 IIR Filters Fundamentals of Digital Signal Processing Spring, 2012 Wei-Ta Chu 2012/5/10 1 General IIR Difference Equation IIR system: infinite-impulse response system The most general class

More information

Lecture 04: Discrete Frequency Domain Analysis (z-transform)

Lecture 04: Discrete Frequency Domain Analysis (z-transform) Lecture 04: Discrete Frequency Domain Analysis (z-transform) John Chiverton School of Information Technology Mae Fah Luang University 1st Semester 2009/ 2552 Outline Overview Lecture Contents Introduction

More information

Chirp Decomposition of Speech Signals for Glottal Source Estimation

Chirp Decomposition of Speech Signals for Glottal Source Estimation Chirp Decomposition of Speech Signals for Glottal Source Estimation Thomas Drugman 1, Baris Bozkurt 2, Thierry Dutoit 1 1 TCTS Lab, Faculté Polytechnique de Mons, Belgium 2 Department of Electrical & Electronics

More information

Lecture 7: Feature Extraction

Lecture 7: Feature Extraction Lecture 7: Feature Extraction Kai Yu SpeechLab Department of Computer Science & Engineering Shanghai Jiao Tong University Autumn 2014 Kai Yu Lecture 7: Feature Extraction SJTU Speech Lab 1 / 28 Table of

More information

The Equivalence of ADPCM and CELP Coding

The Equivalence of ADPCM and CELP Coding The Equivalence of ADPCM and CELP Coding Peter Kabal Department of Electrical & Computer Engineering McGill University Montreal, Canada Version.2 March 20 c 20 Peter Kabal 20/03/ You are free: to Share

More information

c 2014 Jacob Daniel Bryan

c 2014 Jacob Daniel Bryan c 2014 Jacob Daniel Bryan AUTOREGRESSIVE HIDDEN MARKOV MODELS AND THE SPEECH SIGNAL BY JACOB DANIEL BRYAN THESIS Submitted in partial fulfillment of the requirements for the degree of Master of Science

More information

Statistical and Adaptive Signal Processing

Statistical and Adaptive Signal Processing r Statistical and Adaptive Signal Processing Spectral Estimation, Signal Modeling, Adaptive Filtering and Array Processing Dimitris G. Manolakis Massachusetts Institute of Technology Lincoln Laboratory

More information

Sound 2: frequency analysis

Sound 2: frequency analysis COMP 546 Lecture 19 Sound 2: frequency analysis Tues. March 27, 2018 1 Speed of Sound Sound travels at about 340 m/s, or 34 cm/ ms. (This depends on temperature and other factors) 2 Wave equation Pressure

More information

ISOLATED WORD RECOGNITION FOR ENGLISH LANGUAGE USING LPC,VQ AND HMM

ISOLATED WORD RECOGNITION FOR ENGLISH LANGUAGE USING LPC,VQ AND HMM ISOLATED WORD RECOGNITION FOR ENGLISH LANGUAGE USING LPC,VQ AND HMM Mayukh Bhaowal and Kunal Chawla (Students)Indian Institute of Information Technology, Allahabad, India Abstract: Key words: Speech recognition

More information

LECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES

LECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES LECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES Abstract March, 3 Mads Græsbøll Christensen Audio Analysis Lab, AD:MT Aalborg University This document contains a brief introduction to pitch

More information

Formant Analysis using LPC

Formant Analysis using LPC Linguistics 582 Basics of Digital Signal Processing Formant Analysis using LPC LPC (linear predictive coefficients) analysis is a technique for estimating the vocal tract transfer function, from which

More information

ADSP ADSP ADSP ADSP. Advanced Digital Signal Processing (18-792) Spring Fall Semester, Department of Electrical and Computer Engineering

ADSP ADSP ADSP ADSP. Advanced Digital Signal Processing (18-792) Spring Fall Semester, Department of Electrical and Computer Engineering Advanced Digital Signal rocessing (18-792) Spring Fall Semester, 201 2012 Department of Electrical and Computer Engineering ROBLEM SET 8 Issued: 10/26/18 Due: 11/2/18 Note: This problem set is due Friday,

More information

Computer Exercise 0 Simulation of ARMA-processes

Computer Exercise 0 Simulation of ARMA-processes Lund University Time Series Analysis Mathematical Statistics Fall 2018 Centre for Mathematical Sciences Computer Exercise 0 Simulation of ARMA-processes The purpose of this computer exercise is to illustrate

More information

Multimedia Networking ECE 599

Multimedia Networking ECE 599 Multimedia Networking ECE 599 Prof. Thinh Nguyen School of Electrical Engineering and Computer Science Based on lectures from B. Lee, B. Girod, and A. Mukherjee 1 Outline Digital Signal Representation

More information

Glottal Source Estimation using an Automatic Chirp Decomposition

Glottal Source Estimation using an Automatic Chirp Decomposition Glottal Source Estimation using an Automatic Chirp Decomposition Thomas Drugman 1, Baris Bozkurt 2, Thierry Dutoit 1 1 TCTS Lab, Faculté Polytechnique de Mons, Belgium 2 Department of Electrical & Electronics

More information

DISCRETE-TIME SIGNAL PROCESSING

DISCRETE-TIME SIGNAL PROCESSING THIRD EDITION DISCRETE-TIME SIGNAL PROCESSING ALAN V. OPPENHEIM MASSACHUSETTS INSTITUTE OF TECHNOLOGY RONALD W. SCHÄFER HEWLETT-PACKARD LABORATORIES Upper Saddle River Boston Columbus San Francisco New

More information

Feature extraction 1

Feature extraction 1 Centre for Vision Speech & Signal Processing University of Surrey, Guildford GU2 7XH. Feature extraction 1 Dr Philip Jackson Cepstral analysis - Real & complex cepstra - Homomorphic decomposition Filter

More information

TinySR. Peter Schmidt-Nielsen. August 27, 2014

TinySR. Peter Schmidt-Nielsen. August 27, 2014 TinySR Peter Schmidt-Nielsen August 27, 2014 Abstract TinySR is a light weight real-time small vocabulary speech recognizer written entirely in portable C. The library fits in a single file (plus header),

More information

Use: Analysis of systems, simple convolution, shorthand for e jw, stability. Motivation easier to write. Or X(z) = Z {x(n)}

Use: Analysis of systems, simple convolution, shorthand for e jw, stability. Motivation easier to write. Or X(z) = Z {x(n)} 1 VI. Z Transform Ch 24 Use: Analysis of systems, simple convolution, shorthand for e jw, stability. A. Definition: X(z) = x(n) z z - transforms Motivation easier to write Or Note if X(z) = Z {x(n)} z

More information

AN INVERTIBLE DISCRETE AUDITORY TRANSFORM

AN INVERTIBLE DISCRETE AUDITORY TRANSFORM COMM. MATH. SCI. Vol. 3, No. 1, pp. 47 56 c 25 International Press AN INVERTIBLE DISCRETE AUDITORY TRANSFORM JACK XIN AND YINGYONG QI Abstract. A discrete auditory transform (DAT) from sound signal to

More information

Some notes about signals, orthogonal polynomials and linear algebra

Some notes about signals, orthogonal polynomials and linear algebra Some notes about signals, orthogonal polynomials and linear algebra Adhemar Bultheel Report TW 180, November 1992 Revised February 1993 n Katholieke Universiteit Leuven Department of Computer Science Celestijnenlaan

More information

Source modeling (block processing)

Source modeling (block processing) Digital Speech Processing Lecture 17 Speech Coding Methods Based on Speech Models 1 Waveform Coding versus Block Waveform coding Processing sample-by-sample matching of waveforms coding gquality measured

More information

BASIC COMPRESSION TECHNIQUES

BASIC COMPRESSION TECHNIQUES BASIC COMPRESSION TECHNIQUES N. C. State University CSC557 Multimedia Computing and Networking Fall 2001 Lectures # 05 Questions / Problems / Announcements? 2 Matlab demo of DFT Low-pass windowed-sinc

More information

Original application to speech was for data compression. Factor of about 10 But data compression is of less interest for speech these days

Original application to speech was for data compression. Factor of about 10 But data compression is of less interest for speech these days Original application to speech was for data compression Factor of about 10 But data compression is of less interest for speech these days Lives on because the compression method works by approximating

More information

) Rotate L by 120 clockwise to obtain in!! anywhere between load and generator: rotation by 2d in clockwise direction. d=distance from the load to the

) Rotate L by 120 clockwise to obtain in!! anywhere between load and generator: rotation by 2d in clockwise direction. d=distance from the load to the 3.1 Smith Chart Construction: Start with polar representation of. L ; in on lossless lines related by simple phase change ) Idea: polar plot going from L to in involves simple rotation. in jj 1 ) circle

More information

21.4. Engineering Applications of z-transforms. Introduction. Prerequisites. Learning Outcomes

21.4. Engineering Applications of z-transforms. Introduction. Prerequisites. Learning Outcomes Engineering Applications of z-transforms 21.4 Introduction In this Section we shall apply the basic theory of z-transforms to help us to obtain the response or output sequence for a discrete system. This

More information

L used in various speech coding applications for representing

L used in various speech coding applications for representing IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 1, NO. 1. JANUARY 1993 3 Efficient Vector Quantization of LPC Parameters at 24 BitsFrame Kuldip K. Paliwal, Member, IEEE, and Bishnu S. Atal, Fellow,

More information

Hidden Markov Models and Gaussian Mixture Models

Hidden Markov Models and Gaussian Mixture Models Hidden Markov Models and Gaussian Mixture Models Hiroshi Shimodaira and Steve Renals Automatic Speech Recognition ASR Lectures 4&5 23&27 January 2014 ASR Lectures 4&5 Hidden Markov Models and Gaussian

More information

Lab 4: Quantization, Oversampling, and Noise Shaping

Lab 4: Quantization, Oversampling, and Noise Shaping Lab 4: Quantization, Oversampling, and Noise Shaping Due Friday 04/21/17 Overview: This assignment should be completed with your assigned lab partner(s). Each group must turn in a report composed using

More information

Lecture 2: Acoustics. Acoustics & sound

Lecture 2: Acoustics. Acoustics & sound EE E680: Speech & Audio Processing & Recognition Lecture : Acoustics 1 3 4 The wave equation Acoustic tubes: reflections & resonance Oscillations & musical acoustics Spherical waves & room acoustics Dan

More information

Noise Robust Isolated Words Recognition Problem Solving Based on Simultaneous Perturbation Stochastic Approximation Algorithm

Noise Robust Isolated Words Recognition Problem Solving Based on Simultaneous Perturbation Stochastic Approximation Algorithm EngOpt 2008 - International Conference on Engineering Optimization Rio de Janeiro, Brazil, 0-05 June 2008. Noise Robust Isolated Words Recognition Problem Solving Based on Simultaneous Perturbation Stochastic

More information

Getting Started with Communications Engineering

Getting Started with Communications Engineering 1 Linear algebra is the algebra of linear equations: the term linear being used in the same sense as in linear functions, such as: which is the equation of a straight line. y ax c (0.1) Of course, if we

More information

A comparative study of LPC parameter representations and quantisation schemes for wideband speech coding

A comparative study of LPC parameter representations and quantisation schemes for wideband speech coding Digital Signal Processing 17 (2007) 114 137 www.elsevier.com/locate/dsp A comparative study of LPC parameter representations and quantisation schemes for wideband speech coding Stephen So a,, Kuldip K.

More information

Magnitude F y. F x. Magnitude

Magnitude F y. F x. Magnitude Design of Optimum Multi-Dimensional Energy Compaction Filters N. Damera-Venkata J. Tuqan B. L. Evans Imaging Technology Dept. Dept. of ECE Dept. of ECE Hewlett-Packard Labs Univ. of California Univ. of

More information

10701/15781 Machine Learning, Spring 2007: Homework 2

10701/15781 Machine Learning, Spring 2007: Homework 2 070/578 Machine Learning, Spring 2007: Homework 2 Due: Wednesday, February 2, beginning of the class Instructions There are 4 questions on this assignment The second question involves coding Do not attach

More information

Signals and Systems. Problem Set: The z-transform and DT Fourier Transform

Signals and Systems. Problem Set: The z-transform and DT Fourier Transform Signals and Systems Problem Set: The z-transform and DT Fourier Transform Updated: October 9, 7 Problem Set Problem - Transfer functions in MATLAB A discrete-time, causal LTI system is described by the

More information

Chapter 5 THE APPLICATION OF THE Z TRANSFORM. 5.3 Stability

Chapter 5 THE APPLICATION OF THE Z TRANSFORM. 5.3 Stability Chapter 5 THE APPLICATION OF THE Z TRANSFORM 5.3 Stability Copyright c 2005- Andreas Antoniou Victoria, BC, Canada Email: aantoniou@ieee.org February 13, 2008 Frame # 1 Slide # 1 A. Antoniou Digital Signal

More information

SOLVING LINEAR SYSTEMS

SOLVING LINEAR SYSTEMS SOLVING LINEAR SYSTEMS We want to solve the linear system a, x + + a,n x n = b a n, x + + a n,n x n = b n This will be done by the method used in beginning algebra, by successively eliminating unknowns

More information

Robust Speaker Identification

Robust Speaker Identification Robust Speaker Identification by Smarajit Bose Interdisciplinary Statistical Research Unit Indian Statistical Institute, Kolkata Joint work with Amita Pal and Ayanendranath Basu Overview } } } } } } }

More information

UNIVERSITY OF OSLO. Please make sure that your copy of the problem set is complete before you attempt to answer anything.

UNIVERSITY OF OSLO. Please make sure that your copy of the problem set is complete before you attempt to answer anything. UNIVERSITY OF OSLO Faculty of mathematics and natural sciences Examination in INF3470/4470 Digital signal processing Day of examination: December 9th, 011 Examination hours: 14.30 18.30 This problem set

More information

Module 6: Deadbeat Response Design Lecture Note 1

Module 6: Deadbeat Response Design Lecture Note 1 Module 6: Deadbeat Response Design Lecture Note 1 1 Design of digital control systems with dead beat response So far we have discussed the design methods which are extensions of continuous time design

More information

Methods for Synthesizing Very High Q Parametrically Well Behaved Two Pole Filters

Methods for Synthesizing Very High Q Parametrically Well Behaved Two Pole Filters Methods for Synthesizing Very High Q Parametrically Well Behaved Two Pole Filters Max Mathews Julius O. Smith III Center for Computer Research in Music and Acoustics (CCRMA) Department of Music, Stanford

More information

Basic Principles of Video Coding

Basic Principles of Video Coding Basic Principles of Video Coding Introduction Categories of Video Coding Schemes Information Theory Overview of Video Coding Techniques Predictive coding Transform coding Quantization Entropy coding Motion

More information

SECTION FOR DIGITAL SIGNAL PROCESSING DEPARTMENT OF MATHEMATICAL MODELLING TECHNICAL UNIVERSITY OF DENMARK Course 04362 Digital Signal Processing: Solutions to Problems in Proakis and Manolakis, Digital

More information

The Z-Transform. For a phasor: X(k) = e jωk. We have previously derived: Y = H(z)X

The Z-Transform. For a phasor: X(k) = e jωk. We have previously derived: Y = H(z)X The Z-Transform For a phasor: X(k) = e jωk We have previously derived: Y = H(z)X That is, the output of the filter (Y(k)) is derived by multiplying the input signal (X(k)) by the transfer function (H(z)).

More information

ETSI TS V5.0.0 ( )

ETSI TS V5.0.0 ( ) Technical Specification Universal Mobile Telecommunications System (UMTS); AMR speech Codec; Transcoding Functions () 1 Reference RTS/TSGS-046090v500 Keywords UMTS 650 Route des Lucioles F-0691 Sophia

More information

MATH 3511 Lecture 1. Solving Linear Systems 1

MATH 3511 Lecture 1. Solving Linear Systems 1 MATH 3511 Lecture 1 Solving Linear Systems 1 Dmitriy Leykekhman Spring 2012 Goals Review of basic linear algebra Solution of simple linear systems Gaussian elimination D Leykekhman - MATH 3511 Introduction

More information

ECE503: Digital Signal Processing Lecture 6

ECE503: Digital Signal Processing Lecture 6 ECE503: Digital Signal Processing Lecture 6 D. Richard Brown III WPI 20-February-2012 WPI D. Richard Brown III 20-February-2012 1 / 28 Lecture 6 Topics 1. Filter structures overview 2. FIR filter structures

More information

Centre for Mathematical Sciences HT 2017 Mathematical Statistics. Study chapters 6.1, 6.2 and in the course book.

Centre for Mathematical Sciences HT 2017 Mathematical Statistics. Study chapters 6.1, 6.2 and in the course book. Lund University Stationary stochastic processes Centre for Mathematical Sciences HT 2017 Mathematical Statistics Computer exercise 2 in Stationary stochastic processes, HT 17. The purpose with this computer

More information

Speaker Identification Based On Discriminative Vector Quantization And Data Fusion

Speaker Identification Based On Discriminative Vector Quantization And Data Fusion University of Central Florida Electronic Theses and Dissertations Doctoral Dissertation (Open Access) Speaker Identification Based On Discriminative Vector Quantization And Data Fusion 2005 Guangyu Zhou

More information

3GPP TS V6.1.1 ( )

3GPP TS V6.1.1 ( ) Technical Specification 3rd Generation Partnership Project; Technical Specification Group Services and System Aspects; Speech codec speech processing functions; Adaptive Multi-Rate - Wideband (AMR-WB)

More information

Analysis of Finite Wordlength Effects

Analysis of Finite Wordlength Effects Analysis of Finite Wordlength Effects Ideally, the system parameters along with the signal variables have infinite precision taing any value between and In practice, they can tae only discrete values within

More information

Elementary Linear Algebra

Elementary Linear Algebra Matrices J MUSCAT Elementary Linear Algebra Matrices Definition Dr J Muscat 2002 A matrix is a rectangular array of numbers, arranged in rows and columns a a 2 a 3 a n a 2 a 22 a 23 a 2n A = a m a mn We

More information

Signal Processing First Lab 11: PeZ - The z, n, and ˆω Domains

Signal Processing First Lab 11: PeZ - The z, n, and ˆω Domains Signal Processing First Lab : PeZ - The z, n, and ˆω Domains The lab report/verification will be done by filling in the last page of this handout which addresses a list of observations to be made when

More information