Chapter 9. Linear Predictive Analysis of Speech Signals


LPC Methods
LPC methods are the most widely used methods in speech coding, speech synthesis, speech recognition, speaker recognition and verification, and speech storage. LPC methods provide extremely accurate estimates of speech parameters, and do so extremely efficiently. The basic idea of linear prediction is that the current speech sample can be closely approximated as a linear combination of past samples, i.e.,
s(n) ≈ Σ_{k=1}^{p} α_k s(n-k)

LPC Methods
For periodic signals with period N_p, it is obvious that s(n) = s(n - N_p), but that is not what LP is doing; it is estimating s(n) from the p (p << N_p) most recent values of s(n) by linearly predicting its value. For LP, the predictor coefficients (the α_k's) are determined by minimizing the sum of squared differences (over a finite interval) between the actual speech samples and the linearly predicted ones.

Speech Production Model
The time-varying digital filter represents the combined effects of the glottal pulse shape, the vocal tract impulse response, and radiation at the lips:
H(z) = S(z)/U(z) = G / (1 - Σ_{k=1}^{p} a_k z^{-k})
The system is excited by an impulse train for voiced speech, or a random noise sequence for unvoiced speech. This all-pole model is a natural representation for non-nasal voiced speech, but it also works reasonably well for nasals and unvoiced sounds.

Linear Prediction Model
A p-th order linear predictor is a system of the form
s~(n) = Σ_{k=1}^{p} α_k s(n-k)
The prediction error, e(n), is of the form
e(n) = s(n) - s~(n) = s(n) - Σ_{k=1}^{p} α_k s(n-k)
The prediction error is thus the output of a system with transfer function
A(z) = E(z)/S(z) = 1 - Σ_{k=1}^{p} α_k z^{-k}
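As a quick illustration of these definitions, the sketch below forms s~(n) and e(n) for a synthetic signal; the signal and coefficient values are my own illustrative assumptions, not taken from the slides.

```python
import numpy as np

# Illustrative sketch: a 2nd-order predictor applied to a noisy cosine.
n = np.arange(400)
rng = np.random.default_rng(0)
s = np.cos(0.3 * n) + 0.01 * rng.standard_normal(n.size)

p = 2
alpha = np.array([2 * np.cos(0.3), -1.0])   # ideal predictor for a cosine at w = 0.3

# s~(n) = sum_{k=1}^{p} alpha_k s(n-k)
s_tilde = np.zeros_like(s)
for m in range(p, s.size):
    s_tilde[m] = np.dot(alpha, s[m - 1::-1][:p])

# prediction error e(n) = s(n) - s~(n); its z-transform is E(z) = A(z) S(z)
e = s[p:] - s_tilde[p:]
print(np.mean(e ** 2) < 0.01 * np.mean(s[p:] ** 2))
```

With a good predictor, the error energy is a small fraction of the signal energy; only the additive noise survives the inverse filter.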

LP Estimation Issues
We need to determine the {α_k} directly from speech such that they give good estimates of the time-varying spectrum, and we need to estimate the {α_k} from short segments of speech, so we minimize the mean-squared prediction error over short segments. If the speech signal obeys the production model exactly, then α_k = a_k, e(n) = G u(n), and A(z) is an inverse filter for H(z), i.e., H(z) = G/A(z).

Solution for {α_k}
The short-time average prediction squared error is defined as
E_n = Σ_m e_n²(m) = Σ_m ( s_n(m) - Σ_{k=1}^{p} α_k s_n(m-k) )²
where s_n(m) is a segment of speech selected in the vicinity of sample n. The key issue to resolve is the range of m for the summation (to be discussed later).

Solution for {α_k}
We can find the values of α_k that minimize E_n by setting ∂E_n/∂α_i = 0, i = 1, 2, ..., p, giving the set of equations
Σ_m s_n(m-i) s_n(m) = Σ_{k=1}^{p} α_k Σ_m s_n(m-i) s_n(m-k),  1 ≤ i ≤ p
where the α_k on the right are the values that minimize E_n (from now on we just use α_k, rather than a separate symbol, for the optimum values). Equivalently, the prediction error is orthogonal to the signal for delays (i) of 1 to p.

Solution for {α_k}
Defining
φ_n(i,k) = Σ_m s_n(m-i) s_n(m-k)
we get
Σ_{k=1}^{p} α_k φ_n(i,k) = φ_n(i,0),  i = 1, 2, ..., p
leading to a set of p equations in p unknowns that can be solved in an efficient manner for the {α_k}.

Solution for {α_k}
The minimum mean-squared prediction error has the form
E_n = Σ_m s_n²(m) - Σ_{k=1}^{p} α_k Σ_m s_n(m) s_n(m-k)
which can be written in the form
E_n = φ_n(0,0) - Σ_{k=1}^{p} α_k φ_n(0,k)
Process: compute φ_n(i,k) for 1 ≤ i ≤ p, 0 ≤ k ≤ p, then solve the matrix equation for the α_k. We still need to specify the range of m used to compute φ_n(i,k), and hence to specify s_n(m).

Autocorrelation Method
Assume s_n(m) exists for 0 ≤ m ≤ L-1 and is exactly zero everywhere else (i.e., a window of length L samples) -- Assumption #1:
s_n(m) = s(n+m) w(m),  0 ≤ m ≤ L-1
where w(m) is a finite-length window of length L samples.

Autocorrelation Method
If s_n(m) is non-zero only for 0 ≤ m ≤ L-1, then e_n(m) is non-zero only over the interval 0 ≤ m ≤ L-1+p. At values of m near 0 (i.e., m = 0, 1, ..., p-1) we are predicting the signal from zero-valued samples outside the window range, so the error will be (relatively) large. At values of m near L (i.e., m = L, L+1, ..., L+p-1) we are predicting zero-valued samples (outside the window range) from non-zero values, so the error will again be (relatively) large. For these reasons, we normally use windows that taper the segment to zero (e.g., a Hamming window).


Autocorrelation Method
For calculation of φ_n(i,k): since s_n(m) = 0 outside the range 0 ≤ m ≤ L-1,
φ_n(i,k) = Σ_{m=0}^{L-1+p} s_n(m-i) s_n(m-k),  1 ≤ i ≤ p, 0 ≤ k ≤ p
which is equivalent to the form
φ_n(i,k) = Σ_{m=0}^{L-1-(i-k)} s_n(m) s_n(m+i-k)
One can easily show that φ_n(i,k) = R_n(i-k), where R_n(i-k) is the short-time autocorrelation of s_n(m) evaluated at i-k, with
R_n(k) = Σ_{m=0}^{L-1-k} s_n(m) s_n(m+k)

Autocorrelation Method
Since R_n(k) is even, R_n(-k) = R_n(k), and thus φ_n(i,k) = R_n(|i-k|). The basic equation becomes
Σ_{k=1}^{p} α_k R_n(|i-k|) = R_n(i),  1 ≤ i ≤ p
with the minimum mean-squared prediction error of the form
E_n = R_n(0) - Σ_{k=1}^{p} α_k R_n(k)

Autocorrelation Method
Expressed in matrix form, the equations are R_n α = r_n, with solution α = R_n^{-1} r_n. R_n is a p×p Toeplitz matrix, i.e., symmetric with all elements along each diagonal equal, so there exist more efficient algorithms to solve for the {α_k} than simple matrix inversion.
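A compact sketch of the whole autocorrelation-method procedure, assuming a Hamming window and, for now, a generic linear solver for the Toeplitz system (the function name `lpc_autocorrelation` is my own; the slides give no code):

```python
import numpy as np

def lpc_autocorrelation(segment, p):
    """Autocorrelation-method LPC: window, correlate, solve the Toeplitz system."""
    s = segment * np.hamming(len(segment))      # taper the segment to zero
    # short-time autocorrelation R(k), k = 0..p
    R = np.array([np.dot(s[:len(s) - k], s[k:]) for k in range(p + 1)])
    # normal equations: sum_k alpha_k R(|i-k|) = R(i), i = 1..p
    T = np.array([[R[abs(i - k)] for k in range(p)] for i in range(p)])
    alpha = np.linalg.solve(T, R[1:])
    E = R[0] - np.dot(alpha, R[1:])             # minimum mean-squared error
    return alpha, E
```

For a noisy sinusoid and p = 2, the recovered coefficients sit close to the ideal values [2 cos ω, -1].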

Covariance Method
There is a second basic approach to defining the speech segment and the limits on the sums: fix the interval over which the mean-squared error is computed (Assumption #2), giving
E_n = Σ_{m=0}^{L-1} e_n²(m),  φ_n(i,k) = Σ_{m=0}^{L-1} s_n(m-i) s_n(m-k)

Covariance Method
Changing the summation index gives
φ_n(i,k) = Σ_{m=-i}^{L-1-i} s_n(m) s_n(m+i-k)
The key difference from the autocorrelation method is that the limits of summation include terms before m = 0, so the window effectively extends p samples backwards, from s_n(-p) to s_n(L-1). Since we are extending the window backwards rather than truncating it, we don't need to taper it with a Hamming window: there is no transition at the window edges.


Covariance Method
We cannot use the autocorrelation formulation here; φ_n(i,k) is a true cross-correlation. We need to solve a set of equations of the form
Σ_{k=1}^{p} α_k φ_n(i,k) = φ_n(i,0),  1 ≤ i ≤ p

Covariance Method
In matrix form we have Φ α = ψ, where Φ is a symmetric but not Toeplitz matrix. All entries have a fixed number of terms contributing to the computed values (L terms). Φ has the properties of a covariance matrix, so the specialized solution for the {α_k} is called the covariance method.
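The covariance computation can be sketched as follows (the name `lpc_covariance` and the array layout are my own assumptions: the input holds the p samples before the analysis interval followed by the L samples of the interval, and no tapering window is applied):

```python
import numpy as np

def lpc_covariance(s, p, L):
    """Covariance-method LPC: s must hold at least L + p samples; s[p] is m = 0."""
    phi = np.zeros((p + 1, p + 1))
    for i in range(p + 1):
        for k in range(p + 1):
            # phi(i, k) = sum_{m=0}^{L-1} s(m-i) s(m-k)  -- a true cross-correlation
            phi[i, k] = np.dot(s[p - i:p - i + L], s[p - k:p - k + L])
    # symmetric (but not Toeplitz) system: sum_k alpha_k phi(i, k) = phi(i, 0)
    alpha = np.linalg.solve(phi[1:, 1:], phi[1:, 0])
    E = phi[0, 0] - np.dot(alpha, phi[1:, 0])
    return alpha, E
```

Because no window is needed, a pure sinusoid is predicted essentially exactly: the recovered coefficients match [2 cos ω, -1] and the error E is numerically zero.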

LPC Summary
1. Speech Production Model
2. Linear Prediction Model

LPC Summary
3. LPC Minimization

LPC Summary
4. Autocorrelation Method: the resulting matrix equation is solved using the Levinson-Durbin method

LPC Summary
5. Covariance Method: fix the interval for the error signal; the signal is needed for m from -p to L-1 (L+p samples); expressed as a matrix equation

Frequency Domain Interpretations of Linear Predictive Analysis

The Resulting LPC Model
The final LPC model consists of the LPC parameters {α_k}, k = 1, 2, ..., p, and the gain, G, which together define the system function
H(z) = G / A(z) = G / (1 - Σ_{k=1}^{p} α_k z^{-k})
with frequency response H(e^{jω}) = G / A(e^{jω}), and with the gain determined by matching the energy of the model to the short-time energy of the speech signal, i.e., G² = E_n.
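Putting the pieces together, the model spectrum G/|A(e^{jω})| can be evaluated on a DFT grid; in this sketch (assumed helper name, illustrative test signal) the gain comes from energy matching, G = √E_n:

```python
import numpy as np

def lpc_model_spectrum(segment, p, n_fft=1024):
    """Fit LPC by the autocorrelation method, return |H| = G / |A(e^{jw})|."""
    s = segment * np.hamming(len(segment))
    R = np.array([np.dot(s[:len(s) - k], s[k:]) for k in range(p + 1)])
    T = np.array([[R[abs(i - k)] for k in range(p)] for i in range(p)])
    alpha = np.linalg.solve(T, R[1:])
    G = np.sqrt(R[0] - np.dot(alpha, R[1:]))    # energy-matching gain
    a = np.concatenate(([1.0], -alpha))         # A(z) = 1 - sum_k alpha_k z^{-k}
    return G / np.abs(np.fft.rfft(a, n_fft))
```

For a sinusoid at ω = 0.4 rad/sample, the model spectrum peaks near DFT bin 0.4 · n_fft / (2π) ≈ 65.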

LPC Spectrum
LP analysis is seen to be a method of short-time spectrum estimation with removal of the excitation fine structure (a form of wideband spectrum analysis).


Effects of Model Order
The plots show the Fourier transform of a segment and LP spectra for various orders. As p increases, more details of the spectrum are preserved. We need to choose a value of p that represents the spectral effects of the glottal pulse, vocal tract, and radiation, and nothing else.

Linear Prediction Spectrogram
The speech spectrogram was previously defined as an image plot of |X_{rR}(e^{j2πk/N})| for a set of times t_r = rRT and a set of frequencies f_k = k F_S / N, where R is the time shift (in samples) between adjacent STFTs, T is the sampling period, F_S = 1/T is the sampling frequency, and N is the size of the discrete Fourier transform used to compute each STFT estimate. Similarly, we can define the LP spectrogram as an image plot of
| G_{rR} / A_{rR}(e^{j2πk/N}) |
where G_{rR} and A_{rR}(z) are the gain and prediction error polynomial at analysis time rR.

Linear Prediction Spectrogram
Wideband Fourier spectrogram (L=81, R=3, N=1000, 40 dB dynamic range) versus linear predictive spectrogram (p=12).

Comparison to Other Spectrum Analysis Methods
Spectra of a synthetic vowel /IY/:
(a) narrowband spectrum using a 40 msec window
(b) wideband spectrum using a 10 msec window
(c) cepstrally smoothed spectrum
(d) LPC spectrum from a 40 msec section using a p=12 order LPC analysis

Comparison to Other Spectrum Analysis Methods
Natural speech spectral estimates using cepstral smoothing (solid line) and linear prediction analysis (dashed line). Note the fewer (spurious) peaks in the LP spectrum, since LP used p=12, which restricted the spectral match to a maximum of 6 resonance peaks. Note also the narrower bandwidths of the LP resonances versus the cepstrally smoothed resonances.

Solutions of LPC Equations: Autocorrelation Method (Levinson-Durbin Algorithm)

Levinson-Durbin Algorithm 1
Autocorrelation equations (at each frame):
Σ_{k=1}^{p} α_k R(|i-k|) = R(i),  1 ≤ i ≤ p
R is a positive definite symmetric Toeplitz matrix. The set of optimum predictor coefficients satisfies these equations with minimum mean-squared prediction error
E = R(0) - Σ_{k=1}^{p} α_k R(k)

Levinson-Durbin Algorithm 2
By combining the last two equations we get a larger matrix equation of size (p+1)×(p+1). The expanded matrix is still Toeplitz, and the system can be solved iteratively by incorporating a new correlation value at each iteration and solving for the higher-order predictor in terms of the new correlation value and the previous predictor.

Levinson-Durbin Algorithm 3
We show how the i-th order solution can be derived from the (i-1)-st order solution; i.e., given the solution to the (i-1)-st order equations, we derive the solution to the i-th order equations. The (i-1)-st solution can be expressed as a matrix equation whose right-hand side is the vector [E_{i-1}, 0, ..., 0]^T.

Levinson-Durbin Algorithm 4
Appending a 0 to the coefficient vector and multiplying by the enlarged matrix gives a new set of (i+1) equations, in which a new quantity γ_{i-1} (formed from the (i-1)-st order coefficients and the new correlation value R[i]) appears in the last position of the right-hand side.

Levinson-Durbin Algorithm 5
The key step is that, since the Toeplitz matrix has a special symmetry, we can reverse the order of the equations (first equation last, last equation first), giving the same system with the coefficient vector and the right-hand-side vector reversed.

Levinson-Durbin Algorithm 6
To get the equation into the desired form (a single non-zero component in the right-hand-side vector), we combine the two sets of equations with a multiplicative factor k_i, choosing k_i = -γ_{i-1}/E_{i-1} (where γ_{i-1} is the extra correlation term generated by appending the zero) so that the vector on the right has only a single non-zero entry.

Levinson-Durbin Algorithm 7
The first element of the right-hand-side vector is now E_i = E_{i-1}(1 - k_i²). The k_i parameters are called PARCOR (partial correlation) coefficients. With this choice of k_i, the vector of i-th order predictor coefficients follows from the (i-1)-st order coefficients, yielding the updating procedure
α_j^{(i)} = α_j^{(i-1)} - k_i α_{i-j}^{(i-1)},  1 ≤ j ≤ i-1;  α_i^{(i)} = k_i

Levinson-Durbin Algorithm 8
The final solution for order p is α_j = α_j^{(p)}, 1 ≤ j ≤ p, with prediction error
E_p = R(0) Π_{i=1}^{p} (1 - k_i²)
If we use normalized autocorrelation coefficients r(k) = R(k)/R(0), we get normalized errors of the form
V_p = E_p / R(0) = Π_{i=1}^{p} (1 - k_i²)

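The recursion above can be sketched directly in code (a minimal version under the convention A(z) = 1 - Σ α_k z^{-k}; the function name `levinson_durbin` is my own):

```python
import numpy as np

def levinson_durbin(R, p):
    """Levinson-Durbin recursion: autocorrelations R(0..p) -> (alpha, PARCORs, E)."""
    alpha = np.zeros(0)
    ks = np.zeros(p)
    E = R[0]
    for i in range(1, p + 1):
        # k_i = [ R(i) - sum_{j=1}^{i-1} alpha_j R(i-j) ] / E_{i-1}
        ki = (R[i] - np.dot(alpha, R[i - 1:0:-1])) / E
        ks[i - 1] = ki
        # order update: alpha_j <- alpha_j - k_i * alpha_{i-j}, and alpha_i = k_i
        alpha = np.concatenate((alpha - ki * alpha[::-1], [ki]))
        E *= 1.0 - ki * ki                     # E_i = E_{i-1} (1 - k_i^2)
    return alpha, ks, E
```

For R = [1, 0.5, 0.1] this gives α = [0.6, -0.2] and E = 0.72, matching a direct solve of the 2×2 Toeplitz system.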

Autocorrelation Example
Consider a simple p = 2 solution of the form
[ R(0) R(1) ; R(1) R(0) ] [ α₁ ; α₂ ] = [ R(1) ; R(2) ]
with solution
α₁ = R(1)(R(0) - R(2)) / (R(0)² - R(1)²)
α₂ = (R(0)R(2) - R(1)²) / (R(0)² - R(1)²)
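A quick numerical check of the p = 2 closed form against a direct matrix solve; the values R(0) = 1, R(1) = 0.5, R(2) = 0.1 are made up for the check, not taken from the slides.

```python
import numpy as np

R0, R1, R2 = 1.0, 0.5, 0.1          # illustrative autocorrelation values

# closed-form p = 2 solution
den = R0 ** 2 - R1 ** 2
alpha1 = R1 * (R0 - R2) / den
alpha2 = (R0 * R2 - R1 ** 2) / den

# direct solve of the 2x2 Toeplitz system for comparison
alpha = np.linalg.solve([[R0, R1], [R1, R0]], [R1, R2])
print(np.allclose(alpha, [alpha1, alpha2]))   # True
```

Both routes give α₁ = 0.6, α₂ = -0.2 for these values.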

Prediction Error as a Function of p

Autocorrelation Method Properties
- mean-squared prediction error: always non-zero; decreases monotonically with increasing model order
- autocorrelation matching property: model and data autocorrelations match up to order p
- spectrum matching property: favors peaks of the short-time Fourier transform
- minimum-phase property: zeros of A(z) are inside the unit circle
- Levinson-Durbin recursion: efficient algorithm for finding the prediction coefficients; PARCOR coefficients and MSE are by-products

The Prediction Error Signal

Prediction Error Signal Behavior

LP Speech Analysis (file: s5, ss: 11000, frame size L = 320, LPC order p = 14, covariance method). Top panel: speech signal; second panel: error signal; third panel: log magnitude spectra of signal and LP model; fourth panel: log magnitude spectrum of error signal.

LP Speech Analysis (file: s5, ss: 11000, frame size L = 320, LPC order p = 14, autocorrelation method). Panels as above.

LP Speech Analysis (file: s3, ss: 14000, frame size L = 160, LPC order p = 16, covariance method). Panels as above.

LP Speech Analysis (file: s3, ss: 14000, frame size L = 160, LPC order p = 16, autocorrelation method). Panels as above.

Properties of the LPC Polynomial

Minimum-Phase Property of A(z)
Proof: assume that z₀, with |z₀| > 1, is a zero (root) of A(z). The minimum mean-squared error is
E_n = (1/2π) ∫ |S_n(e^{jω})|² |A(e^{jω})|² dω
Replacing the root z₀ by its conjugate reciprocal 1/z₀* scales |A(e^{jω})| down by the constant factor 1/|z₀| < 1 at every frequency, and therefore decreases the error. Thus A(z) could not be the optimum filter, because we could replace z₀ by 1/z₀* and decrease the error.

PARCORs and Stability
Proof: it is easily shown that k_i is the coefficient of z^{-i} in A^{(i)}(z), i.e., up to sign, the product of the roots of A^{(i)}(z). Therefore |k_i| equals the product of the magnitudes of the roots. If |k_i| ≥ 1, then either all the roots must be on the unit circle or at least one of them must be outside the unit circle. |k_i| < 1 for all i is a necessary and sufficient condition for A(z) to be a minimum-phase system and for 1/A(z) to be a stable system.

Root Locations for Optimum LP Model

Pole-Zero Plot for Model

Pole Locations

Pole Locations (F_S = 10,000 Hz)

Estimating Formant Frequencies
1. Compute A(z) and factor it.
2. Find the roots that are close to the unit circle.
3. Compute equivalent analog frequencies from the angles of the roots.
4. Plot the formant frequencies as a function of time.
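These steps can be sketched as follows (the function name is my own; F_S = 10,000 Hz matches the pole-location figures, and the 0.9-radius threshold is an arbitrary illustrative choice):

```python
import numpy as np

def formants_from_lpc(alpha, fs=10000.0, min_radius=0.9):
    """Factor A(z) and map roots near the unit circle to formant frequencies."""
    a = np.concatenate(([1.0], -np.asarray(alpha, float)))  # A(z) coefficients
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]           # one root of each conjugate pair
    roots = roots[np.abs(roots) > min_radius]   # keep roots close to |z| = 1
    return np.sort(np.angle(roots) * fs / (2 * np.pi))
```

A single resonance built from the pole pair 0.97 e^{±j0.3π} comes back as one formant at 0.15 F_S = 1500 Hz.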

Spectrogram with LPC Roots

Alternative Representations of the LP Parameters

LP Parameter Sets

PARCOR: PARCORs to Prediction Coefficients
Assume that the k_i, i = 1, 2, ..., p, are given. Then we can skip the computation of k_i in the Levinson recursion and apply only the coefficient-update steps.

PARCOR: Prediction Coefficients to PARCORs
Assume that the α_j, j = 1, 2, ..., p, are given. Then we can work backwards through the Levinson recursion to recover the k_i.
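Both directions can be sketched compactly (my own function names, same convention A(z) = 1 - Σ α_k z^{-k} as above): the forward pass is the Levinson coefficient update with the k_i given, and the backward pass inverts it one order at a time.

```python
import numpy as np

def parcor_to_alpha(k):
    """Forward Levinson update with the k_i given (no autocorrelations needed)."""
    alpha = np.zeros(0)
    for ki in k:
        # alpha_j <- alpha_j - k_i * alpha_{i-j}, then append alpha_i = k_i
        alpha = np.concatenate((alpha - ki * alpha[::-1], [ki]))
    return alpha

def alpha_to_parcor(alpha):
    """Backward Levinson recursion: k_i = alpha_i^{(i)} at each order."""
    a = np.asarray(alpha, float)
    ks = np.zeros(len(a))
    for i in range(len(a), 0, -1):
        ki = a[i - 1]
        ks[i - 1] = ki
        if i > 1:
            # invert the order update: alpha_j^{(i-1)} = (alpha_j + k_i alpha_{i-j}) / (1 - k_i^2)
            a = (a[:i - 1] + ki * a[i - 2::-1]) / (1.0 - ki * ki)
    return ks
```

The two functions are exact inverses of each other whenever all |k_i| < 1.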

Log Area Ratio
The log area ratio coefficients are obtained from the PARCOR coefficients as
g_i = log( (1 - k_i) / (1 + k_i) )
with inverse relation
k_i = (1 - e^{g_i}) / (1 + e^{g_i})

Roots of Predictor Polynomial
The roots of the predictor polynomial A(z), where each root can be expressed in the z-plane as z_k = r_k e^{jθ_k}, are important for formant estimation: the root angle θ_k maps to an analog frequency θ_k F_S / (2π), and the radius r_k determines the bandwidth.

Impulse Response of H(z)
The impulse response of the all-pole system H(z) = G/A(z) satisfies the recursion
h(n) = Σ_{k=1}^{p} α_k h(n-k) + G δ(n)

LP Cepstrum
The cepstrum of the impulse response of the overall LP system can be computed from the predictor coefficients by the recursion
ĥ(n) = α_n + Σ_{k=1}^{n-1} (k/n) ĥ(k) α_{n-k},  1 ≤ n ≤ p
ĥ(n) = Σ_{k=n-p}^{n-1} (k/n) ĥ(k) α_{n-k},  n > p
where ĥ(0) = log G, and the predictor coefficients can be recovered from the cepstrum by inverting this recursion.
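A sketch of the forward recursion (G taken as 1 so that ĥ(0) = 0; `lpc_to_cepstrum` is an assumed name):

```python
import numpy as np

def lpc_to_cepstrum(alpha, n_ceps):
    """Cepstrum c(1..n_ceps) of the impulse response of 1/A(z), from the alphas."""
    p = len(alpha)
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = alpha[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * alpha[n - k - 1]
        c[n] = acc
    return c[1:]
```

As a sanity check, for the one-pole system 1/(1 - α z^{-1}) the cepstrum is known in closed form: c(n) = αⁿ/n.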

Autocorrelation of IR
The autocorrelation of the impulse response h(n) of H(z).

Autocorrelation of Predictor Polynomial
The autocorrelation of the predictor polynomial, i.e., the autocorrelation of the impulse response of the inverse filter A(z):
R_a(i) = Σ_{k=0}^{p-i} a_k a_{k+i}, where a_0 = 1 and a_k = -α_k

Line Spectral Pairs: Quantization of LP Parameters
Consider the magnitude-squared of the model frequency response, P, as a function of a parameter set {g_i}. A spectral sensitivity ∂P/∂g_i can be defined, which measures the sensitivity of the spectrum to errors in the g_i parameters.

Line Spectral Pairs
For the PARCOR parameters k_i, spectral sensitivity is low around |k_i| = 0 and high around |k_i| = 1. For the log area ratio parameters g_i, low sensitivity is seen over virtually the entire range.

Line Spectral Pairs
Consider the following construction. Form the symmetric polynomial P(z) as
P(z) = A(z) + z^{-(p+1)} A(z^{-1})
and form the anti-symmetric polynomial Q(z) as
Q(z) = A(z) - z^{-(p+1)} A(z^{-1})

LSP Example

Line Spectral Pairs
Properties of LSP parameters:
1. All the roots of P(z) and Q(z) are on the unit circle.
2. A necessary and sufficient condition for |k_i| < 1, i = 1, 2, ..., p, is that the roots of P(z) and Q(z) alternate on the unit circle.
3. The LSP frequencies get close together when roots of A(z) are close to the unit circle.
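Property 1 can be checked numerically with a short sketch (the P/Q construction is the standard one, P(z) = A(z) + z^{-(p+1)}A(z^{-1}) and Q(z) = A(z) - z^{-(p+1)}A(z^{-1}); the test A(z) below is an arbitrary stable second-order polynomial of my choosing):

```python
import numpy as np

def lsp_polynomials(alpha):
    """Return coefficient arrays of P(z) and Q(z) (in powers of z^-1)."""
    a = np.concatenate(([1.0], -np.asarray(alpha, float)))   # A(z)
    az = np.concatenate((a, [0.0]))          # pad A(z) to degree p+1
    rev = az[::-1]                           # coefficients of z^{-(p+1)} A(z^{-1})
    return az + rev, az - rev

# stable A(z) with a pole pair at 0.95 e^{+/- j0.4}
alpha = np.array([2 * 0.95 * np.cos(0.4), -0.95 ** 2])
P, Q = lsp_polynomials(alpha)
rootsP, rootsQ = np.roots(P), np.roots(Q)
print(np.allclose(np.abs(rootsP), 1.0), np.allclose(np.abs(rootsQ), 1.0))
```

For any stable A(z), both root sets land on the unit circle, which is what makes the LSP frequencies attractive for quantization.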

Applications

Speech Synthesis

Speech Coding
1. Extract the α_k parameters properly.
2. Quantize the α_k parameters properly so that there is little quantization error; a small number of bits go into coding the α_k coefficients.
3. Represent e(n) via: pitch pulses and noise (LPC coding); multiple pulses per 10 msec interval (MPLPC coding); or codebook vectors (CELP). Almost all of the coding bits go into coding of e(n).

LPC Vocoder