Chapter 9. Linear Predictive Analysis of Speech Signals


2 LPC Methods
- LPC methods are the most widely used methods in speech coding, speech synthesis, speech recognition, speaker recognition and verification, and speech storage.
- LPC methods provide extremely accurate estimates of speech parameters, and do so extremely efficiently.
- Basic idea of linear prediction: the current speech sample can be closely approximated as a linear combination of past samples, i.e.,
  $s(n) \approx \sum_{k=1}^{p} \alpha_k \, s(n-k)$ for some value of $p$ and coefficients $\alpha_k$.

3 LPC Methods
- For periodic signals with period $N_p$, it is obvious that $s(n) \approx s(n - N_p)$, but that is not what LP is doing; it is estimating $s(n)$ from the $p$ ($p \ll N_p$) most recent values of $s(n)$ by linearly predicting its value.
- For LP, the predictor coefficients (the $\alpha_k$'s) are determined (computed) by minimizing the sum of squared differences (over a finite interval) between the actual speech samples and the linearly predicted ones.

4 Speech Production Model
- The time-varying digital filter represents the effects of the glottal pulse shape, the vocal tract impulse response (IR), and radiation at the lips.
- The system is excited by an impulse train for voiced speech, or a random noise sequence for unvoiced speech.
- This all-pole model is a natural representation for non-nasal voiced speech, but it also works reasonably well for nasals and unvoiced sounds.
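
The production model above can be illustrated with a minimal sketch. It assumes the usual all-pole difference equation $s(n) = \sum_{k=1}^{p} a_k s(n-k) + G u(n)$, which the transcription does not spell out; the function name `synthesize`, the pitch period, and the filter coefficients are hypothetical values chosen only for illustration.

```python
def synthesize(a, gain, excitation):
    """All-pole synthesis: s(n) = sum_k a_k s(n-k) + G u(n), with s(n) = 0 for n < 0."""
    s = []
    for n, u in enumerate(excitation):
        # Weighted sum of the p most recent output samples (skipping n - k < 0).
        past = sum(a_k * s[n - k] for k, a_k in enumerate(a, start=1) if n - k >= 0)
        s.append(past + gain * u)
    return s

# Voiced excitation: unit impulses every 50 samples (hypothetical pitch period).
pitch_period = 50
impulse_train = [1.0 if n % pitch_period == 0 else 0.0 for n in range(200)]

# Hypothetical stable second-order filter (complex pole pair of radius 0.9).
voiced = synthesize([1.2, -0.81], 1.0, impulse_train)
```

For unvoiced speech the same function could be driven by a random noise sequence instead of the impulse train.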

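5 Linear Prediction Model
- A p-th order linear predictor is a system of the form
  $\tilde{s}(n) = \sum_{k=1}^{p} \alpha_k \, s(n-k)$, with system function $P(z) = \sum_{k=1}^{p} \alpha_k z^{-k}$.
- The prediction error, $e(n)$, is of the form
  $e(n) = s(n) - \tilde{s}(n) = s(n) - \sum_{k=1}^{p} \alpha_k \, s(n-k)$.
- The prediction error is the output of a system with transfer function
  $A(z) = \frac{E(z)}{S(z)} = 1 - P(z) = 1 - \sum_{k=1}^{p} \alpha_k z^{-k}$.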
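
As a rough illustration of the prediction error filter, the sketch below computes $e(n) = s(n) - \sum_k \alpha_k s(n-k)$ directly from its definition. The function name `prediction_error` and the test signal are assumptions made for this example; samples before $n = 0$ are taken to be zero.

```python
def prediction_error(s, alpha):
    """Return e(n) = s(n) - sum_k alpha_k s(n-k), treating s(n) as 0 for n < 0."""
    p = len(alpha)
    e = []
    for n in range(len(s)):
        predicted = sum(alpha[k - 1] * s[n - k] for k in range(1, p + 1) if n - k >= 0)
        e.append(s[n] - predicted)
    return e

# A signal that satisfies s(n) = 0.9 s(n-1) for n >= 1, so a first-order
# predictor with alpha_1 = 0.9 leaves an error only at n = 0.
s = [0.9 ** n for n in range(6)]
e = prediction_error(s, [0.9])
```

Here the error is concentrated at the first sample, which is the sample the predictor cannot explain from the past.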

6 LP Estimation Issues
- Need to determine $\{\alpha_k\}$ directly from speech such that they give good estimates of the time-varying spectrum.
- Need to estimate $\{\alpha_k\}$ from short segments of speech.
- Minimize the mean-squared prediction error over short segments of speech.
- If the speech signal obeys the production model exactly, and $\alpha_k = a_k$, then $e(n) = G u(n)$ and $A(z)$ is an inverse filter for $H(z)$, i.e., $H(z) = G / A(z)$.

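7 Solution for {α_k}
- The short-time average squared prediction error is defined as
  $E_{\hat{n}} = \sum_{m} e_{\hat{n}}^2(m) = \sum_{m} \left[ s_{\hat{n}}(m) - \sum_{k=1}^{p} \alpha_k \, s_{\hat{n}}(m-k) \right]^2$.
- Select a segment of speech in the vicinity of sample $\hat{n}$: $s_{\hat{n}}(m) = s(m + \hat{n})$.
- The key issue to resolve is the range of $m$ for the summation (to be discussed later).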

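8 Solution for {α_k}
- Can find the values of $\alpha_k$ that minimize $E_{\hat{n}}$ by setting $\partial E_{\hat{n}} / \partial \alpha_i = 0$, $i = 1, 2, \ldots, p$, giving the set of equations
  $\sum_{m} s_{\hat{n}}(m-i) \left[ s_{\hat{n}}(m) - \sum_{k=1}^{p} \hat{\alpha}_k \, s_{\hat{n}}(m-k) \right] = 0, \quad 1 \le i \le p$,
  where $\hat{\alpha}_k$ are the values of $\alpha_k$ that minimize $E_{\hat{n}}$ (from now on just use $\alpha_k$ rather than $\hat{\alpha}_k$ for the optimum values).
- Equivalently, $\sum_{m} s_{\hat{n}}(m-i)\, e_{\hat{n}}(m) = 0$: the prediction error is orthogonal to the signal for delays ($i$) of 1 to $p$.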

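9 Solution for {α_k}
- Defining
  $\phi_{\hat{n}}(i,k) = \sum_{m} s_{\hat{n}}(m-i)\, s_{\hat{n}}(m-k)$
  we get
  $\sum_{k=1}^{p} \alpha_k \, \phi_{\hat{n}}(i,k) = \phi_{\hat{n}}(i,0), \quad i = 1, 2, \ldots, p$,
  leading to a set of $p$ equations in $p$ unknowns that can be solved in an efficient manner for the $\{\alpha_k\}$.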

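10 Solution for {α_k}
- The minimum mean-squared prediction error has the form
  $E_{\hat{n}} = \sum_{m} s_{\hat{n}}^2(m) - \sum_{k=1}^{p} \alpha_k \sum_{m} s_{\hat{n}}(m)\, s_{\hat{n}}(m-k)$,
  which can be written in the form
  $E_{\hat{n}} = \phi_{\hat{n}}(0,0) - \sum_{k=1}^{p} \alpha_k \, \phi_{\hat{n}}(0,k)$.
- Process:
  1. Compute $\phi_{\hat{n}}(i,k)$ for $1 \le i \le p$, $0 \le k \le p$.
  2. Solve the matrix equation for $\alpha_k$.
- Need to specify the range of $m$ used to compute $\phi_{\hat{n}}(i,k)$.
- Need to specify $s_{\hat{n}}(m)$.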

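11 Autocorrelation Method
- Assume $s_{\hat{n}}(m)$ exists for $0 \le m \le L-1$ and is exactly zero everywhere else (i.e., window of length $L$ samples) (Assumption #1):
  $s_{\hat{n}}(m) = s(m + \hat{n})\, w(m), \quad 0 \le m \le L-1$,
  where $w(m)$ is a finite-length window of length $L$ samples.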

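12 Autocorrelation Method
- If $s_{\hat{n}}(m)$ is non-zero only for $0 \le m \le L-1$, then $e_{\hat{n}}(m) = s_{\hat{n}}(m) - \sum_{k=1}^{p} \alpha_k s_{\hat{n}}(m-k)$ is non-zero only over the interval $0 \le m \le L-1+p$, giving
  $E_{\hat{n}} = \sum_{m=-\infty}^{\infty} e_{\hat{n}}^2(m) = \sum_{m=0}^{L-1+p} e_{\hat{n}}^2(m)$.
- At values of $m$ near 0 (i.e., $m = 0, 1, \ldots, p-1$) we are predicting the signal from zero-valued samples outside the window range => $e_{\hat{n}}(m)$ will be (relatively) large.
- At values near $m = L$ (i.e., $m = L, L+1, \ldots, L+p-1$) we are predicting zero-valued samples (outside the window range) from non-zero values => $e_{\hat{n}}(m)$ will be (relatively) large.
- For these reasons, normally use windows that taper the segment to zero (e.g., Hamming window).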

13 Autocorrelation Method

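14 Autocorrelation Method
- For the calculation of $\phi_{\hat{n}}(i,k)$, since $s_{\hat{n}}(m) = 0$ outside the range $0 \le m \le L-1$, then
  $\phi_{\hat{n}}(i,k) = \sum_{m=0}^{L-1+p} s_{\hat{n}}(m-i)\, s_{\hat{n}}(m-k), \quad 1 \le i \le p, \; 0 \le k \le p$,
  which is equivalent to the form
  $\phi_{\hat{n}}(i,k) = \sum_{m=0}^{L-1-(i-k)} s_{\hat{n}}(m)\, s_{\hat{n}}(m+i-k)$.
- Can easily show that $\phi_{\hat{n}}(i,k) = R_{\hat{n}}(i-k)$, where $R_{\hat{n}}(i-k)$ is the short-time autocorrelation of $s_{\hat{n}}(m)$ evaluated at $i-k$, where
  $R_{\hat{n}}(k) = \sum_{m=0}^{L-1-k} s_{\hat{n}}(m)\, s_{\hat{n}}(m+k)$.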

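15 Autocorrelation Method
- Since $R_{\hat{n}}(k)$ is even, i.e., $R_{\hat{n}}(-k) = R_{\hat{n}}(k)$, then
  $\phi_{\hat{n}}(i,k) = R_{\hat{n}}(|i-k|), \quad 1 \le i \le p, \; 0 \le k \le p$.
- Thus the basic equation becomes
  $\sum_{k=1}^{p} \alpha_k \, R_{\hat{n}}(|i-k|) = R_{\hat{n}}(i), \quad 1 \le i \le p$,
  with the minimum mean-squared prediction error of the form
  $E_{\hat{n}} = R_{\hat{n}}(0) - \sum_{k=1}^{p} \alpha_k \, R_{\hat{n}}(k)$.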

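16 Autocorrelation Method
- As expressed in matrix form:
  $$\begin{bmatrix} R_{\hat{n}}(0) & R_{\hat{n}}(1) & \cdots & R_{\hat{n}}(p-1) \\ R_{\hat{n}}(1) & R_{\hat{n}}(0) & \cdots & R_{\hat{n}}(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ R_{\hat{n}}(p-1) & R_{\hat{n}}(p-2) & \cdots & R_{\hat{n}}(0) \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_p \end{bmatrix} = \begin{bmatrix} R_{\hat{n}}(1) \\ R_{\hat{n}}(2) \\ \vdots \\ R_{\hat{n}}(p) \end{bmatrix}$$
  i.e., $\mathbf{R}\boldsymbol{\alpha} = \mathbf{r}$, with solution $\boldsymbol{\alpha} = \mathbf{R}^{-1}\mathbf{r}$.
- $\mathbf{R}$ is a $p \times p$ Toeplitz matrix => symmetric with all diagonal elements equal => there exist more efficient algorithms to solve for $\{\alpha_k\}$ than simple matrix inversion.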
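
The autocorrelation method setup can be sketched as follows: window a frame, compute the short-time autocorrelation $R(k) = \sum_{m=0}^{L-1-k} s(m)\,s(m+k)$, and assemble the Toeplitz system. The function names, the frame length, and the two-sinusoid stand-in for a speech frame are assumptions made for this example, not part of the original slides.

```python
import math

def hamming(L):
    """Hamming window of length L."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * m / (L - 1)) for m in range(L)]

def short_time_autocorr(frame, p):
    """R(k) = sum_{m=0}^{L-1-k} s(m) s(m+k) for k = 0..p."""
    L = len(frame)
    return [sum(frame[m] * frame[m + k] for m in range(L - k)) for k in range(p + 1)]

def toeplitz_system(R, p):
    """Matrix [R(|i-k|)] and right-hand side [R(i)] for i, k = 1..p."""
    matrix = [[R[abs(i - k)] for k in range(1, p + 1)] for i in range(1, p + 1)]
    rhs = [R[i] for i in range(1, p + 1)]
    return matrix, rhs

L, p = 64, 4
# Stand-in for a speech frame (hypothetical test signal).
speech = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) for n in range(L)]
frame = [x * w for x, w in zip(speech, hamming(L))]
R = short_time_autocorr(frame, p)
matrix, rhs = toeplitz_system(R, p)
```

Every diagonal of `matrix` holds a single value of `R`, which is the structure that fast solvers such as the Levinson-Durbin recursion exploit.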

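17 Covariance Method
- There is a second basic approach to defining the speech segment $s_{\hat{n}}(m)$ and the limits on the sums, namely fix the interval over which the mean-squared error is computed, giving (Assumption #2):
  $E_{\hat{n}} = \sum_{m=0}^{L-1} e_{\hat{n}}^2(m) = \sum_{m=0}^{L-1} \left[ s_{\hat{n}}(m) - \sum_{k=1}^{p} \alpha_k \, s_{\hat{n}}(m-k) \right]^2$,
  $\phi_{\hat{n}}(i,k) = \sum_{m=0}^{L-1} s_{\hat{n}}(m-i)\, s_{\hat{n}}(m-k), \quad 1 \le i \le p, \; 0 \le k \le p$.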

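18 Covariance Method
- Changing the summation index gives
  $\phi_{\hat{n}}(i,k) = \sum_{m=-i}^{L-1-i} s_{\hat{n}}(m)\, s_{\hat{n}}(m+i-k), \quad 1 \le i \le p, \; 0 \le k \le p$.
- The key difference from the Autocorrelation Method is that the limits of summation include terms before $m = 0$ => the window extends $p$ samples backwards, from $s(\hat{n}-p)$ to $s(\hat{n}+L-1)$.
- Since we are extending the window backwards, we don't need to taper it using a Hamming window, since there is no transition at the window edges.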

19 Covariance Method

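20 Covariance Method
- Cannot use the autocorrelation formulation => this is a true cross-correlation.
- Need to solve a set of equations of the form
  $\sum_{k=1}^{p} \alpha_k \, \phi_{\hat{n}}(i,k) = \phi_{\hat{n}}(i,0), \quad i = 1, 2, \ldots, p$,
  $E_{\hat{n}} = \phi_{\hat{n}}(0,0) - \sum_{k=1}^{p} \alpha_k \, \phi_{\hat{n}}(0,k)$.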

21 Covariance Method
- We have $\phi_{\hat{n}}(i,k) = \phi_{\hat{n}}(k,i)$ => symmetric but not Toeplitz matrix.
- All terms $\phi_{\hat{n}}(i,k)$ have a fixed number of terms contributing to the computed values ($L$ terms).
- $[\phi_{\hat{n}}(i,k)]$ is a covariance matrix => specialized solution for $\{\alpha_k\}$ called the Covariance Method.
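
A minimal sketch of the covariance method follows. It computes $\phi(i,k)$ over a fixed error interval of $L$ samples using $p$ samples of history, then solves the symmetric system. Cholesky factorization is used here as one common choice for symmetric positive-definite systems; that choice, the function name `covariance_lpc`, and the test signal are assumptions for this example rather than something the slides specify.

```python
import math

def covariance_lpc(s, start, L, p):
    """Covariance-method predictor for the error interval s[start : start + L].

    Uses p samples of history before `start`, so start >= p is assumed.
    """
    def phi(i, k):
        return sum(s[start + m - i] * s[start + m - k] for m in range(L))

    Phi = [[phi(i, k) for k in range(1, p + 1)] for i in range(1, p + 1)]
    psi = [phi(i, 0) for i in range(1, p + 1)]

    # Cholesky factorization Phi = C C^T (Phi assumed positive definite).
    C = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            acc = Phi[i][j] - sum(C[i][t] * C[j][t] for t in range(j))
            C[i][j] = math.sqrt(acc) if i == j else acc / C[j][j]

    # Forward substitution: C y = psi.
    y = [0.0] * p
    for i in range(p):
        y[i] = (psi[i] - sum(C[i][t] * y[t] for t in range(i))) / C[i][i]

    # Back substitution: C^T alpha = y.
    alpha = [0.0] * p
    for i in reversed(range(p)):
        alpha[i] = (y[i] - sum(C[t][i] * alpha[t] for t in range(i + 1, p))) / C[i][i]
    return alpha

# Test signal obeying s(n) = 1.2 s(n-1) - 0.5 s(n-2) for n >= 2 (hypothetical).
s = [1.0, 1.2]
for n in range(2, 40):
    s.append(1.2 * s[n - 1] - 0.5 * s[n - 2])
alpha = covariance_lpc(s, start=2, L=30, p=2)
```

Because the error interval contains no excitation, the recovered coefficients match the generating recursion up to rounding, without any tapering window.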

22 LPC Summary
1. Speech Production Model
2. Linear Prediction Model

23 LPC Summary
3. LPC Minimization

24 LPC Summary
4. Autocorrelation Method

25 LPC Summary
4. Autocorrelation Method (continued)
- resulting matrix equation
- matrix equation solved using the Levinson-Durbin method

26 LPC Summary
5. Covariance Method
- fix the interval for the error signal
- need the signal $s_{\hat{n}}(m)$ for $m$ from $-p$ to $L-1$ => $L+p$ samples
- expressed as a matrix equation

27 Frequency Domain Interpretations of Linear Predictive Analysis

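28 The Resulting LPC Model
- The final LPC model consists of the LPC parameters, $\{\alpha_k\}$, $k = 1, 2, \ldots, p$, and the gain, $G$, which together define the system function
  $H(z) = \frac{G}{A(z)} = \frac{G}{1 - \sum_{k=1}^{p} \alpha_k z^{-k}}$
  with frequency response
  $H(e^{j\omega}) = \frac{G}{A(e^{j\omega})} = \frac{G}{1 - \sum_{k=1}^{p} \alpha_k e^{-j\omega k}}$,
  with the gain determined by matching the energy of the model to the short-time energy of the speech signal, i.e.,
  $G^2 = E_{\hat{n}} = \sum_{m} e_{\hat{n}}^2(m) = R_{\hat{n}}(0) - \sum_{k=1}^{p} \alpha_k \, R_{\hat{n}}(k)$.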
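
To make the frequency response concrete, the sketch below evaluates $20\log_{10}|H(e^{j\omega})| = 20\log_{10}\big(G / |A(e^{j\omega})|\big)$ on a uniform grid from $0$ to $\pi$. The function name `lpc_log_spectrum`, the grid size, and the first-order example coefficients are hypothetical.

```python
import cmath
import math

def lpc_log_spectrum(alpha, gain, n_freqs=256):
    """Log magnitude (dB) of H = G / A at n_freqs frequencies from 0 to pi."""
    spectrum = []
    for i in range(n_freqs):
        w = math.pi * i / (n_freqs - 1)
        # A(e^{jw}) = 1 - sum_k alpha_k e^{-jwk}
        A = 1 - sum(a * cmath.exp(-1j * w * k) for k, a in enumerate(alpha, start=1))
        spectrum.append(20 * math.log10(gain / abs(A)))
    return spectrum

# First-order example: a single real pole at z = 0.9 gives a low-pass shape.
spectrum = lpc_log_spectrum([0.9], 1.0, n_freqs=5)
```

In practice the gain would be taken from the minimum prediction error, $G = \sqrt{E_{\hat{n}}}$, rather than set to 1 as in this toy case.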

29 LPC Spectrum
- LP analysis is seen to be a method of short-time spectrum estimation with removal of the excitation fine structure (a form of wideband spectrum analysis).

30 Effects of Model Order

31 Effects of Model Order
- Plots show the Fourier transform of the segment and LP spectra for various orders.
- As $p$ increases, more details of the spectrum are preserved.
- Need to choose a value of $p$ that represents the spectral effects of the glottal pulse, vocal tract, and radiation, and nothing else.

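32 Linear Prediction Spectrogram
- The speech spectrogram was previously defined as
  $20 \log_{10} |S_r[k]| = 20 \log_{10} \left| \sum_{m=0}^{L-1} s[rR+m]\, w[m]\, e^{-j(2\pi/N)km} \right|$
  for the set of times $t_r = rRT$ and the set of frequencies $F_k = kF_S/N$, where $R$ is the time shift (in samples) between adjacent STFTs, $T$ is the sampling period, $F_S = 1/T$ is the sampling frequency, and $N$ is the size of the discrete Fourier transform used to compute each STFT estimate.
- Similarly we can define the LP spectrogram as an image plot of
  $20 \log_{10} |H_r[k]| = 20 \log_{10} \left| \frac{G_r}{A_r(e^{j(2\pi/N)k})} \right|$,
  where $G_r$ and $A_r(e^{j\omega})$ are the gain and prediction error polynomial at analysis time $rR$.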

33 Linear Prediction Spectrogram
- Wideband Fourier spectrogram (L=81, R=3, N=1000, 40 dB dynamic range)
- Linear predictive spectrogram (p=12)

34 Comparison to Other Spectrum Analysis Methods
Spectra of synthetic vowel /IY/:
(a) Narrowband spectrum using a 40 msec window
(b) Wideband spectrum using a 10 msec window
(c) Cepstrally smoothed spectrum
(d) LPC spectrum from a 40 msec section using a p=12 order LPC analysis

35 Comparison to Other Spectrum Analysis Methods
- Natural speech spectral estimates using cepstral smoothing (solid line) and linear prediction analysis (dashed line).
- Note the fewer (spurious) peaks in the LP analysis spectrum, since LP used p=12, which restricted the spectral match to a maximum of 6 resonance peaks.
- Note the narrow bandwidths of the LP resonances versus the cepstrally smoothed resonances.

36 Solutions of LPC Equations: Autocorrelation Method (Levinson-Durbin Algorithm)

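37 Levinson-Durbin Algorithm 1
- Autocorrelation equations (at each frame $\hat{n}$, with the frame index dropped so that $R[k]$ denotes $R_{\hat{n}}(k)$):
  $\sum_{k=1}^{p} \alpha_k \, R[|i-k|] = R[i], \quad 1 \le i \le p$, i.e., $\mathbf{R}\boldsymbol{\alpha} = \mathbf{r}$.
- $\mathbf{R}$ is a positive definite symmetric Toeplitz matrix.
- The set of optimum predictor coefficients satisfy
  $R[i] - \sum_{k=1}^{p} \alpha_k \, R[|i-k|] = 0, \quad 1 \le i \le p$,
  with minimum mean-squared prediction error of
  $E^{(p)} = R[0] - \sum_{k=1}^{p} \alpha_k \, R[k]$.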

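38 Levinson-Durbin Algorithm 2
- By combining the last two equations we get a larger matrix equation of the form:
  $$\begin{bmatrix} R[0] & R[1] & \cdots & R[p] \\ R[1] & R[0] & \cdots & R[p-1] \\ \vdots & \vdots & \ddots & \vdots \\ R[p] & R[p-1] & \cdots & R[0] \end{bmatrix} \begin{bmatrix} 1 \\ -\alpha_1^{(p)} \\ \vdots \\ -\alpha_p^{(p)} \end{bmatrix} = \begin{bmatrix} E^{(p)} \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
- The expanded $(p+1) \times (p+1)$ matrix is still Toeplitz and can be solved iteratively by incorporating a new correlation value at each iteration and solving for the higher-order predictor in terms of the new correlation value and the previous predictor.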

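39 Levinson-Durbin Algorithm 3
- Show how the $i$-th order solution can be derived from the $(i-1)$-st order solution; i.e., given $\boldsymbol{\alpha}^{(i-1)}$, the solution to $\mathbf{R}^{(i-1)} \boldsymbol{\alpha}^{(i-1)} = \mathbf{e}^{(i-1)}$, we derive the solution to $\mathbf{R}^{(i)} \boldsymbol{\alpha}^{(i)} = \mathbf{e}^{(i)}$.
- The $(i-1)$-st solution can be expressed as
  $$\begin{bmatrix} R[0] & R[1] & \cdots & R[i-1] \\ R[1] & R[0] & \cdots & R[i-2] \\ \vdots & \vdots & \ddots & \vdots \\ R[i-1] & R[i-2] & \cdots & R[0] \end{bmatrix} \begin{bmatrix} 1 \\ -\alpha_1^{(i-1)} \\ \vdots \\ -\alpha_{i-1}^{(i-1)} \end{bmatrix} = \begin{bmatrix} E^{(i-1)} \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$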

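40 Levinson-Durbin Algorithm 4
- Appending a 0 to the vector $\boldsymbol{\alpha}^{(i-1)}$ and multiplying by the matrix $\mathbf{R}^{(i)}$ gives a new set of $(i+1)$ equations of the form:
  $$\mathbf{R}^{(i)} \begin{bmatrix} 1 \\ -\alpha_1^{(i-1)} \\ \vdots \\ -\alpha_{i-1}^{(i-1)} \\ 0 \end{bmatrix} = \begin{bmatrix} E^{(i-1)} \\ 0 \\ \vdots \\ 0 \\ \gamma^{(i-1)} \end{bmatrix}$$
  where $\gamma^{(i-1)} = R[i] - \sum_{j=1}^{i-1} \alpha_j^{(i-1)} R[i-j]$ and $R[i]$ are introduced.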

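41 Levinson-Durbin Algorithm 5
- The key step is that, since the Toeplitz matrix has special symmetry, we can reverse the order of the equations (first equation last, last equation first), giving:
  $$\mathbf{R}^{(i)} \begin{bmatrix} 0 \\ -\alpha_{i-1}^{(i-1)} \\ \vdots \\ -\alpha_1^{(i-1)} \\ 1 \end{bmatrix} = \begin{bmatrix} \gamma^{(i-1)} \\ 0 \\ \vdots \\ 0 \\ E^{(i-1)} \end{bmatrix}$$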

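42 Levinson-Durbin Algorithm 6
- To get the equation into the desired form (a single non-zero component in the right-hand vector) we combine the two sets of equations (with a multiplicative factor $k_i$), giving:
  $$\mathbf{R}^{(i)} \left( \begin{bmatrix} 1 \\ -\alpha_1^{(i-1)} \\ \vdots \\ -\alpha_{i-1}^{(i-1)} \\ 0 \end{bmatrix} - k_i \begin{bmatrix} 0 \\ -\alpha_{i-1}^{(i-1)} \\ \vdots \\ -\alpha_1^{(i-1)} \\ 1 \end{bmatrix} \right) = \begin{bmatrix} E^{(i-1)} \\ 0 \\ \vdots \\ 0 \\ \gamma^{(i-1)} \end{bmatrix} - k_i \begin{bmatrix} \gamma^{(i-1)} \\ 0 \\ \vdots \\ 0 \\ E^{(i-1)} \end{bmatrix}$$
- Choose $k_i$ so that the vector on the right has only a single non-zero entry, i.e., $\gamma^{(i-1)} - k_i E^{(i-1)} = 0$, or
  $k_i = \frac{\gamma^{(i-1)}}{E^{(i-1)}} = \frac{R[i] - \sum_{j=1}^{i-1} \alpha_j^{(i-1)} R[i-j]}{E^{(i-1)}}$.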

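43 Levinson-Durbin Algorithm 7
- The first element of the right-hand side vector is now:
  $E^{(i)} = E^{(i-1)} - k_i \gamma^{(i-1)} = E^{(i-1)} (1 - k_i^2)$.
- The $k_i$ parameters are called PARCOR (partial correlation) coefficients.
- With this choice of $k_i$, the vector of $i$-th order predictor coefficients is:
  $$\begin{bmatrix} 1 \\ -\alpha_1^{(i)} \\ \vdots \\ -\alpha_{i-1}^{(i)} \\ -\alpha_i^{(i)} \end{bmatrix} = \begin{bmatrix} 1 \\ -\alpha_1^{(i-1)} \\ \vdots \\ -\alpha_{i-1}^{(i-1)} \\ 0 \end{bmatrix} - k_i \begin{bmatrix} 0 \\ -\alpha_{i-1}^{(i-1)} \\ \vdots \\ -\alpha_1^{(i-1)} \\ 1 \end{bmatrix}$$
  yielding the updating procedure
  $\alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i \, \alpha_{i-j}^{(i-1)}, \quad j = 1, 2, \ldots, i-1$, and $\alpha_i^{(i)} = k_i$.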

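44 Levinson-Durbin Algorithm 8
- The final solution for order $p$ is:
  $\alpha_j = \alpha_j^{(p)}, \quad 1 \le j \le p$,
  with prediction error
  $E^{(p)} = E^{(0)} \prod_{m=1}^{p} (1 - k_m^2) = R[0] \prod_{m=1}^{p} (1 - k_m^2)$.
- If we use normalized autocorrelation coefficients:
  $r[k] = R[k] / R[0]$
  we get normalized errors of the form:
  $V^{(i)} = \frac{E^{(i)}}{R[0]} = 1 - \sum_{k=1}^{i} \alpha_k^{(i)} r[k] = \prod_{m=1}^{i} (1 - k_m^2)$,
  where $0 < V^{(i)} \le 1$ and $-1 < k_i < 1$.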

45 Levinson-Durbin Algorithm
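
The recursion can be sketched in a few lines. This is a minimal illustration, assuming the standard update equations ($k_i = \gamma^{(i-1)}/E^{(i-1)}$, $\alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i\alpha_{i-j}^{(i-1)}$, $\alpha_i^{(i)} = k_i$, $E^{(i)} = (1-k_i^2)E^{(i-1)}$); the function name `levinson_durbin` and the example autocorrelation values are hypothetical.

```python
def levinson_durbin(R, p):
    """Solve the autocorrelation normal equations for orders 1..p.

    R holds R[0..p]. Returns (alpha, ks, E): the order-p predictor
    coefficients, the PARCOR coefficients k_1..k_p, and the final error E^(p).
    """
    alpha = []   # alpha[j-1] holds alpha_j of the current order
    ks = []
    E = R[0]     # E^(0)
    for i in range(1, p + 1):
        gamma = R[i] - sum(alpha[j - 1] * R[i - j] for j in range(1, i))
        k = gamma / E
        # alpha_j^(i) = alpha_j^(i-1) - k_i alpha_{i-j}^(i-1), then alpha_i^(i) = k_i
        alpha = [alpha[j - 1] - k * alpha[i - j - 1] for j in range(1, i)] + [k]
        E *= 1 - k * k
        ks.append(k)
    return alpha, ks, E

# Autocorrelation of a first-order process with coefficient 0.5: R[k] = 0.5**k.
alpha, ks, E = levinson_durbin([1.0, 0.5, 0.25], p=2)
```

For this input the second PARCOR coefficient comes out as zero, reflecting that a first-order predictor already captures the correlation structure.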

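46 Autocorrelation Example
- Consider a simple $p = 2$ solution of the form
  $$\begin{bmatrix} R[0] & R[1] \\ R[1] & R[0] \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} = \begin{bmatrix} R[1] \\ R[2] \end{bmatrix}$$
  with solution
  $E^{(0)} = R[0]$, $k_1 = R[1]/R[0]$, $\alpha_1^{(1)} = R[1]/R[0]$, $E^{(1)} = \frac{R^2[0] - R^2[1]}{R[0]}$.

47 Autocorrelation Example
- $k_2 = \frac{R[2]R[0] - R^2[1]}{R^2[0] - R^2[1]}$, $\alpha_2^{(2)} = k_2$, $\alpha_1^{(2)} = \frac{R[1]R[0] - R[1]R[2]}{R^2[0] - R^2[1]}$,
  with final coefficients $\alpha_1 = \alpha_1^{(2)}$, $\alpha_2 = \alpha_2^{(2)}$.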

48 Prediction Error as a Function of p

49 Autocorrelation Method Properties
- mean-squared prediction error: always non-zero; decreases monotonically with increasing model order
- autocorrelation matching property: model and data match up to order p
- spectrum matching property: favors peaks of the short-time FT
- minimum-phase property: zeros of A(z) are inside the unit circle
- Levinson-Durbin recursion: efficient algorithm for finding prediction coefficients; PARCOR coefficients and MSE are by-products

50 The Prediction Error Signal

51 Prediction Error Signal Behavior

52 LP Speech Analysis (file: s5, ss: 11000, frame size (L): 320, LPC order (p): 14, covariance method)
- Top panel: speech signal
- Second panel: error signal
- Third panel: log magnitude spectra of signal and LP model
- Fourth panel: log magnitude spectrum of error signal

53 LP Speech Analysis (file: s5, ss: 11000, frame size (L): 320, LPC order (p): 14, autocorrelation method)
- Same four panels as above.

54 LP Speech Analysis (file: s3, ss: 14000, frame size (L): 160, LPC order (p): 16, covariance method)
- Same four panels as above.

55 LP Speech Analysis (file: s3, ss: 14000, frame size (L): 160, LPC order (p): 16, autocorrelation method)
- Same four panels as above.

56 Properties of the LPC Polynomial

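57 Minimum-Phase Property of A(z)
- Proof: Assume that $z_0$ (with $|z_0| > 1$) is a zero (root) of $A(z)$, so that $A(z) = (1 - z_0 z^{-1}) A'(z)$.
- The minimum mean-squared error is
  $E_{\hat{n}} = \frac{1}{2\pi} \int_{-\pi}^{\pi} |A(e^{j\omega})|^2 \, |S_{\hat{n}}(e^{j\omega})|^2 \, d\omega = \frac{1}{2\pi} \int_{-\pi}^{\pi} |1 - z_0 e^{-j\omega}|^2 \, |A'(e^{j\omega})|^2 \, |S_{\hat{n}}(e^{j\omega})|^2 \, d\omega$,
  and $|1 - z_0 e^{-j\omega}|^2 = |z_0|^2 \, |1 - (1/z_0^*) e^{-j\omega}|^2$.
- Thus, $A(z)$ could not be the optimum filter because we could replace $z_0$ by $1/z_0^*$ and decrease the error (by the factor $|z_0|^2 > 1$).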

58 PARCORs and Stability
- Proof: It is easily shown that $-k_i$ is the coefficient of $z^{-i}$ in $A^{(i)}(z)$, i.e., $\alpha_i^{(i)} = k_i$.
- Therefore $|k_i| = \prod_{j=1}^{i} |z_j^{(i)}|$, where the $z_j^{(i)}$ are the roots of $A^{(i)}(z)$.
- If $|k_i| \ge 1$, then either all the roots must be on the unit circle or at least one of them must be outside the unit circle.
- $|k_i| < 1$, $i = 1, 2, \ldots, p$, is a necessary and sufficient condition for $A(z)$ to be a minimum-phase system and $1/A(z)$ to be a stable system.

59 Root Locations for Optimum LP Model

60 Pole-Zero Plot for Model

61 Pole Locations

62 Pole Locations (F_S = 10,000 Hz)

63 Estimating Formant Frequencies
- compute A(z) and factor it
- find roots that are close to the unit circle
- compute equivalent analog frequencies from the angles of the roots
- plot formant frequencies as a function of time
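
The root-to-frequency step can be sketched as below, assuming the roots of A(z) have already been found by some polynomial root finder. The function name `formant_candidates`, the radius threshold of 0.9, and the example roots are hypothetical; the conversion used is $F = \theta F_S / (2\pi)$ for a root at angle $\theta$.

```python
import cmath
import math

def formant_candidates(roots, fs, min_radius=0.9):
    """Frequencies (Hz) of roots near the unit circle, one per conjugate pair."""
    freqs = []
    for z in roots:
        # Keep only the upper-half-plane member of each conjugate pair.
        if z.imag > 0 and abs(z) >= min_radius:
            freqs.append(cmath.phase(z) * fs / (2 * math.pi))
    return sorted(freqs)

fs = 10000.0
theta = 2 * math.pi * 500.0 / fs
roots = [
    cmath.rect(0.98, theta), cmath.rect(0.98, -theta),   # sharp resonance near 500 Hz
    cmath.rect(0.50, 1.0), cmath.rect(0.50, -1.0),       # broad root, rejected
]
freqs = formant_candidates(roots, fs)
```

Repeating this for every analysis frame gives the formant tracks that can be plotted as a function of time.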

64 Spectrogram with LPC Roots

65 Spectrogram with LPC Roots

66 Alternative Representations of the LP Parameters

67 LP Parameter Sets

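68 PARCORs to Prediction Coefficients
- Assume that $k_i$, $i = 1, 2, \ldots, p$ are given. Then we can skip the computation of $k_i$ in the Levinson recursion:
  for $i = 1, 2, \ldots, p$: $\alpha_i^{(i)} = k_i$ and, if $i > 1$, $\alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i \, \alpha_{i-j}^{(i-1)}$ for $j = 1, 2, \ldots, i-1$;
  finally $\alpha_j = \alpha_j^{(p)}$, $j = 1, 2, \ldots, p$.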

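69 Prediction Coefficients to PARCORs
- Assume that $\alpha_j$, $j = 1, 2, \ldots, p$ are given. Then we can work backwards through the Levinson recursion:
  set $\alpha_j^{(p)} = \alpha_j$, $j = 1, 2, \ldots, p$, and $k_p = \alpha_p^{(p)}$;
  for $i = p, p-1, \ldots, 2$: $\alpha_j^{(i-1)} = \frac{\alpha_j^{(i)} + k_i \, \alpha_{i-j}^{(i)}}{1 - k_i^2}$ for $j = 1, 2, \ldots, i-1$, and $k_{i-1} = \alpha_{i-1}^{(i-1)}$.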
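
Both conversions can be sketched as short loops. This assumes the Levinson update $\alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i\alpha_{i-j}^{(i-1)}$ with $\alpha_i^{(i)} = k_i$ and its inverse; the function names and example values are hypothetical, and the backward direction assumes $|k_i| \ne 1$.

```python
def parcor_to_predictor(ks):
    """Step-up recursion: PARCOR coefficients k_1..k_p to alpha_1..alpha_p."""
    alpha = []
    for i, k in enumerate(ks, start=1):
        alpha = [alpha[j - 1] - k * alpha[i - j - 1] for j in range(1, i)] + [k]
    return alpha

def predictor_to_parcor(alpha):
    """Step-down recursion: alpha_1..alpha_p back to k_1..k_p."""
    a = list(alpha)
    ks = []
    for i in range(len(a), 0, -1):
        k = a[i - 1]                       # k_i = alpha_i^(i)
        ks.append(k)
        # alpha_j^(i-1) = (alpha_j^(i) + k_i alpha_{i-j}^(i)) / (1 - k_i^2)
        a = [(a[j - 1] + k * a[i - j - 1]) / (1 - k * k) for j in range(1, i)]
    return ks[::-1]

ks = [0.8, -0.4, 0.2]
alpha = parcor_to_predictor(ks)
recovered = predictor_to_parcor(alpha)
```

Running one conversion after the other returns the starting values up to rounding, which is a convenient sanity check on either implementation.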

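70 Log Area Ratio
- Log area ratio coefficients from PARCOR coefficients:
  $g_i = \log\left[\frac{A_{i+1}}{A_i}\right] = \log\left[\frac{1 - k_i}{1 + k_i}\right], \quad i = 1, 2, \ldots, p$,
  with inverse relation
  $k_i = \frac{1 - e^{g_i}}{1 + e^{g_i}}, \quad i = 1, 2, \ldots, p$.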
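
A small sketch of the two mappings, assuming the sign convention $g_i = \log[(1-k_i)/(1+k_i)]$ (some texts use the reciprocal ratio). The function names and sample values are hypothetical, and $|k_i| < 1$ is required for the logarithm to be defined.

```python
import math

def parcor_to_lar(ks):
    """g_i = log((1 - k_i) / (1 + k_i)); requires |k_i| < 1."""
    return [math.log((1 - k) / (1 + k)) for k in ks]

def lar_to_parcor(gs):
    """Inverse relation k_i = (1 - exp(g_i)) / (1 + exp(g_i))."""
    return [(1 - math.exp(g)) / (1 + math.exp(g)) for g in gs]

ks = [0.95, -0.6, 0.1]
gs = parcor_to_lar(ks)
back = lar_to_parcor(gs)
```

The mapping stretches the region near $|k_i| = 1$, which is where quantization errors in $k_i$ affect the spectrum most.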

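71 Roots of Predictor Polynomial
- Roots of the predictor polynomial:
  $A(z) = 1 - \sum_{k=1}^{p} \alpha_k z^{-k} = \prod_{k=1}^{p} (1 - z_k z^{-1})$,
  where each root can be expressed as a z-plane or s-plane root, i.e., $z_k = z_{kr} + j z_{ki}$ and $s_k = \sigma_k + j\Omega_k$ with $z_k = e^{s_k T}$, giving
  $\Omega_k = \frac{1}{T} \tan^{-1}\left[\frac{z_{ki}}{z_{kr}}\right]$, $\sigma_k = \frac{1}{2T} \log\left(z_{kr}^2 + z_{ki}^2\right)$.
- Important for formant estimation.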

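72 Impulse Response of H(z)
- IR of the all-pole system:
  $h(n) = \sum_{k=1}^{p} \alpha_k \, h(n-k) + G\,\delta(n), \quad n \ge 0$, with $h(n) = 0$ for $n < 0$.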

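73 LP Cepstrum
- Cepstrum of the IR of the overall LP system from the predictor coefficients:
  $\hat{h}(n) = \alpha_n + \sum_{k=1}^{n-1} \left(\frac{k}{n}\right) \hat{h}(k)\, \alpha_{n-k}, \quad n \ge 1$.
- Predictor coefficients from the cepstrum of the IR:
  $\alpha_n = \hat{h}(n) - \sum_{k=1}^{n-1} \left(\frac{k}{n}\right) \hat{h}(k)\, \alpha_{n-k}, \quad 1 \le n \le p$,
  where $\alpha_n = 0$ for $n > p$.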
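
The forward recursion can be sketched as follows, assuming the standard form $\hat{h}(n) = \alpha_n + \sum_{k=1}^{n-1} (k/n)\,\hat{h}(k)\,\alpha_{n-k}$ with $\alpha_n = 0$ for $n > p$. The function name `lpc_to_cepstrum` is hypothetical, and the zeroth coefficient (which depends on the gain) is left out.

```python
def lpc_to_cepstrum(alpha, n_ceps):
    """Cepstrum h_hat(1..n_ceps) of the all-pole model from alpha_1..alpha_p."""
    p = len(alpha)
    c = [0.0] * (n_ceps + 1)          # c[0] is unused in this sketch
    for n in range(1, n_ceps + 1):
        acc = alpha[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:            # alpha_{n-k} is zero beyond order p
                acc += (k / n) * c[k] * alpha[n - k - 1]
        c[n] = acc
    return c[1:]

# For a single pole at z = a the cepstrum is a**n / n, which gives a quick check.
ceps = lpc_to_cepstrum([0.5], 4)
```

The recursion yields as many cepstral coefficients as requested, even beyond the predictor order, because the all-pole model has an infinitely long cepstrum.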

74 Autocorrelation of IR

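75 Autocorrelation of Predictor Polynomial
- Autocorrelation of the predictor polynomial
  $A(z) = 1 - \sum_{k=1}^{p} \alpha_k z^{-k}$
  with IR of the inverse filter
  $a(n) = \delta(n) - \sum_{k=1}^{p} \alpha_k \, \delta(n-k)$
  with autocorrelation
  $R_a(i) = \sum_{k=0}^{p-i} a(k)\, a(k+i), \quad 0 \le i \le p$.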

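76 Line Spectral Pairs: Quantization of LP Parameters
- Consider the magnitude-squared of the model frequency response
  $P(\omega, g_i) = |H(e^{j\omega})|^2 = \frac{G^2}{|A(e^{j\omega})|^2}$,
  where $g_i$ is a parameter that affects $P$.
- Spectral sensitivity can be defined as
  $\frac{\partial S}{\partial g_i} = \lim_{\Delta g_i \to 0} \left| \frac{1}{\Delta g_i} \left[ \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| \log \frac{P(\omega, g_i)}{P(\omega, g_i + \Delta g_i)} \right| d\omega \right] \right|$,
  which measures sensitivity to errors in the $g_i$ parameters.

77 Line Spectral Pairs
- Spectral sensitivity for $k_i$ parameters: low sensitivity around 0; high sensitivity around $|k_i| = 1$.
- Spectral sensitivity for log area ratio parameters, $g_i$: low sensitivity is seen for virtually the entire range.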

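78 Line Spectral Pairs
- Consider the following:
  $A(z) = 1 - \sum_{k=1}^{p} \alpha_k z^{-k}$ and its reciprocal polynomial $\tilde{A}(z) = z^{-(p+1)} A(z^{-1})$.
- Form the symmetric polynomial $P(z)$ as
  $P(z) = A(z) + \tilde{A}(z) = A(z) + z^{-(p+1)} A(z^{-1})$.
- Form the anti-symmetric polynomial $Q(z)$ as
  $Q(z) = A(z) - \tilde{A}(z) = A(z) - z^{-(p+1)} A(z^{-1})$.

79 LSP Example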

80 Line Spectral Pairs
Properties of LSP parameters:
1. All the roots of $P(z)$ and $Q(z)$ are on the unit circle.
2. A necessary and sufficient condition for $|k_i| < 1$, $i = 1, 2, \ldots, p$, is that the roots of $P(z)$ and $Q(z)$ alternate on the unit circle.
3. The LSP frequencies get close together when roots of $A(z)$ are close to the unit circle.
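
As a rough sketch, the LSP polynomials can be formed directly from the predictor coefficients, assuming the definitions $P(z) = A(z) + z^{-(p+1)}A(z^{-1})$ and $Q(z) = A(z) - z^{-(p+1)}A(z^{-1})$. The function name `lsp_polynomials` and the second-order example are hypothetical; finding the roots on the unit circle is left to a separate root finder.

```python
def lsp_polynomials(alpha):
    """Coefficients of P(z) and Q(z) in powers z^0, z^-1, ..., z^-(p+1)."""
    a = [1.0] + [-x for x in alpha] + [0.0]   # A(z), padded to degree p + 1
    a_rev = a[::-1]                           # z^-(p+1) A(1/z)
    P = [x + y for x, y in zip(a, a_rev)]     # symmetric
    Q = [x - y for x, y in zip(a, a_rev)]     # anti-symmetric
    return P, Q

alpha = [0.5, -0.3]
P, Q = lsp_polynomials(alpha)
```

Averaging the two polynomials recovers A(z), so the LSP representation loses no information relative to the predictor coefficients.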

81 Applications

82 Speech Synthesis

83 Speech Coding
1. Extract $\alpha_k$ parameters properly
2. Quantize $\alpha_k$ parameters properly so that there is little quantization error
   - Small number of bits go into coding the $\alpha_k$ coefficients
3. Represent $e(n)$ via:
   - Pitch pulses and noise: LPC coding
   - Multiple pulses per 10 msec interval: MPLPC coding
   - Codebook vectors: CELP
   - Almost all of the coding bits go into coding of $e(n)$

84 LPC Vocoder


More information

Formant Analysis using LPC

Formant Analysis using LPC Linguistics 582 Basics of Digital Signal Processing Formant Analysis using LPC LPC (linear predictive coefficients) analysis is a technique for estimating the vocal tract transfer function, from which

More information

Department of Electrical and Computer Engineering Digital Speech Processing Homework No. 6 Solutions

Department of Electrical and Computer Engineering Digital Speech Processing Homework No. 6 Solutions Problem 1 Department of Electrical and Computer Engineering Digital Speech Processing Homework No. 6 Solutions The complex cepstrum, ˆx[n], of a sequence x[n] is the inverse Fourier transform of the complex

More information

1. Probability density function for speech samples. Gamma. Laplacian. 2. Coding paradigms. =(2X max /2 B ) for a B-bit quantizer Δ Δ Δ Δ Δ

1. Probability density function for speech samples. Gamma. Laplacian. 2. Coding paradigms. =(2X max /2 B ) for a B-bit quantizer Δ Δ Δ Δ Δ Digital Speech Processing Lecture 16 Speech Coding Methods Based on Speech Waveform Representations and Speech Models Adaptive and Differential Coding 1 Speech Waveform Coding-Summary of Part 1 1. Probability

More information

L6: Short-time Fourier analysis and synthesis

L6: Short-time Fourier analysis and synthesis L6: Short-time Fourier analysis and synthesis Overview Analysis: Fourier-transform view Analysis: filtering view Synthesis: filter bank summation (FBS) method Synthesis: overlap-add (OLA) method STFT magnitude

More information

Sinusoidal Modeling. Yannis Stylianou SPCC University of Crete, Computer Science Dept., Greece,

Sinusoidal Modeling. Yannis Stylianou SPCC University of Crete, Computer Science Dept., Greece, Sinusoidal Modeling Yannis Stylianou University of Crete, Computer Science Dept., Greece, yannis@csd.uoc.gr SPCC 2016 1 Speech Production 2 Modulators 3 Sinusoidal Modeling Sinusoidal Models Voiced Speech

More information

Multimedia Communications. Differential Coding

Multimedia Communications. Differential Coding Multimedia Communications Differential Coding Differential Coding In many sources, the source output does not change a great deal from one sample to the next. This means that both the dynamic range and

More information

NEAR EAST UNIVERSITY

NEAR EAST UNIVERSITY NEAR EAST UNIVERSITY GRADUATE SCHOOL OF APPLIED ANO SOCIAL SCIENCES LINEAR PREDICTIVE CODING \ Burak Alacam Master Thesis Department of Electrical and Electronic Engineering Nicosia - 2002 Burak Alacam:

More information

E : Lecture 1 Introduction

E : Lecture 1 Introduction E85.2607: Lecture 1 Introduction 1 Administrivia 2 DSP review 3 Fun with Matlab E85.2607: Lecture 1 Introduction 2010-01-21 1 / 24 Course overview Advanced Digital Signal Theory Design, analysis, and implementation

More information

Noise Robust Isolated Words Recognition Problem Solving Based on Simultaneous Perturbation Stochastic Approximation Algorithm

Noise Robust Isolated Words Recognition Problem Solving Based on Simultaneous Perturbation Stochastic Approximation Algorithm EngOpt 2008 - International Conference on Engineering Optimization Rio de Janeiro, Brazil, 0-05 June 2008. Noise Robust Isolated Words Recognition Problem Solving Based on Simultaneous Perturbation Stochastic

More information

A Spectral-Flatness Measure for Studying the Autocorrelation Method of Linear Prediction of Speech Analysis

A Spectral-Flatness Measure for Studying the Autocorrelation Method of Linear Prediction of Speech Analysis A Spectral-Flatness Measure for Studying the Autocorrelation Method of Linear Prediction of Speech Analysis Authors: Augustine H. Gray and John D. Markel By Kaviraj, Komaljit, Vaibhav Spectral Flatness

More information

THE PROBLEMS OF ROBUST LPC PARAMETRIZATION FOR. Petr Pollak & Pavel Sovka. Czech Technical University of Prague

THE PROBLEMS OF ROBUST LPC PARAMETRIZATION FOR. Petr Pollak & Pavel Sovka. Czech Technical University of Prague THE PROBLEMS OF ROBUST LPC PARAMETRIZATION FOR SPEECH CODING Petr Polla & Pavel Sova Czech Technical University of Prague CVUT FEL K, 66 7 Praha 6, Czech Republic E-mail: polla@noel.feld.cvut.cz Abstract

More information

Glottal Source Estimation using an Automatic Chirp Decomposition

Glottal Source Estimation using an Automatic Chirp Decomposition Glottal Source Estimation using an Automatic Chirp Decomposition Thomas Drugman 1, Baris Bozkurt 2, Thierry Dutoit 1 1 TCTS Lab, Faculté Polytechnique de Mons, Belgium 2 Department of Electrical & Electronics

More information

INTERNATIONAL TELECOMMUNICATION UNION. Wideband coding of speech at around 16 kbit/s using Adaptive Multi-Rate Wideband (AMR-WB)

INTERNATIONAL TELECOMMUNICATION UNION. Wideband coding of speech at around 16 kbit/s using Adaptive Multi-Rate Wideband (AMR-WB) INTERNATIONAL TELECOMMUNICATION UNION ITU-T TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU G.722.2 (07/2003) SERIES G: TRANSMISSION SYSTEMS AND MEDIA, DIGITAL SYSTEMS AND NETWORKS Digital terminal equipments

More information

L used in various speech coding applications for representing

L used in various speech coding applications for representing IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 1, NO. 1. JANUARY 1993 3 Efficient Vector Quantization of LPC Parameters at 24 BitsFrame Kuldip K. Paliwal, Member, IEEE, and Bishnu S. Atal, Fellow,

More information

The Z-Transform. For a phasor: X(k) = e jωk. We have previously derived: Y = H(z)X

The Z-Transform. For a phasor: X(k) = e jωk. We have previously derived: Y = H(z)X The Z-Transform For a phasor: X(k) = e jωk We have previously derived: Y = H(z)X That is, the output of the filter (Y(k)) is derived by multiplying the input signal (X(k)) by the transfer function (H(z)).

More information

Vocoding approaches for statistical parametric speech synthesis

Vocoding approaches for statistical parametric speech synthesis Vocoding approaches for statistical parametric speech synthesis Ranniery Maia Toshiba Research Europe Limited Cambridge Research Laboratory Speech Synthesis Seminar Series CUED, University of Cambridge,

More information

Joint Optimization of Linear Predictors in Speech Coders

Joint Optimization of Linear Predictors in Speech Coders 642 IEEE TRANSACTIONS ON ACOUSTICS. SPEECH. AND SIGNAL PROCESSING. VOL. 37. NO. 5. MAY 1989 Joint Optimization of Linear Predictors in Speech Coders PETER KABAL, MEMBER, IEEE, AND RAVI P. RAMACHANDRAN

More information

Reformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features

Reformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features Reformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features Heiga ZEN (Byung Ha CHUN) Nagoya Inst. of Tech., Japan Overview. Research backgrounds 2.

More information

Thursday, October 29, LPC Analysis

Thursday, October 29, LPC Analysis LPC Analysis Prediction & Regression We hypothesize that there is some systematic relation between the values of two variables, X and Y. If this hypothesis is true, we can (partially) predict the observed

More information

Log Likelihood Spectral Distance, Entropy Rate Power, and Mutual Information with Applications to Speech Coding

Log Likelihood Spectral Distance, Entropy Rate Power, and Mutual Information with Applications to Speech Coding entropy Article Log Likelihood Spectral Distance, Entropy Rate Power, and Mutual Information with Applications to Speech Coding Jerry D. Gibson * and Preethi Mahadevan Department of Electrical and Computer

More information

Resonances and mode shapes of the human vocal tract during vowel production

Resonances and mode shapes of the human vocal tract during vowel production Resonances and mode shapes of the human vocal tract during vowel production Atle Kivelä, Juha Kuortti, Jarmo Malinen Aalto University, School of Science, Department of Mathematics and Systems Analysis

More information

AN INVERTIBLE DISCRETE AUDITORY TRANSFORM

AN INVERTIBLE DISCRETE AUDITORY TRANSFORM COMM. MATH. SCI. Vol. 3, No. 1, pp. 47 56 c 25 International Press AN INVERTIBLE DISCRETE AUDITORY TRANSFORM JACK XIN AND YINGYONG QI Abstract. A discrete auditory transform (DAT) from sound signal to

More information

Linear Prediction: The Problem, its Solution and Application to Speech

Linear Prediction: The Problem, its Solution and Application to Speech Dublin Institute of Technology ARROW@DIT Conference papers Audio Research Group 2008-01-01 Linear Prediction: The Problem, its Solution and Application to Speech Alan O'Cinneide Dublin Institute of Technology,

More information

Estimation of Cepstral Coefficients for Robust Speech Recognition

Estimation of Cepstral Coefficients for Robust Speech Recognition Estimation of Cepstral Coefficients for Robust Speech Recognition by Kevin M. Indrebo, B.S., M.S. A Dissertation submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment

More information

Parametric Signal Modeling and Linear Prediction Theory 4. The Levinson-Durbin Recursion

Parametric Signal Modeling and Linear Prediction Theory 4. The Levinson-Durbin Recursion Parametric Signal Modeling and Linear Prediction Theory 4. The Levinson-Durbin Recursion Electrical & Computer Engineering North Carolina State University Acknowledgment: ECE792-41 slides were adapted

More information

Lecture 5: GMM Acoustic Modeling and Feature Extraction

Lecture 5: GMM Acoustic Modeling and Feature Extraction CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 5: GMM Acoustic Modeling and Feature Extraction Original slides by Dan Jurafsky Outline for Today Acoustic

More information

1 1.27z z 2. 1 z H 2

1 1.27z z 2. 1 z H 2 E481 Digital Signal Processing Exam Date: Thursday -1-1 16:15 18:45 Final Exam - Solutions Dan Ellis 1. (a) In this direct-form II second-order-section filter, the first stage has

More information

Source-Controlled Variable-Rate Multimode Wideband Speech Codec (VMR-WB) Service Option 62 for Spread Spectrum Systems

Source-Controlled Variable-Rate Multimode Wideband Speech Codec (VMR-WB) Service Option 62 for Spread Spectrum Systems GPP C.S00-0 Version.0 Date: June, 00 Source-Controlled Variable-Rate Multimode Wideband Speech Codec (VMR-WB) Service Option for Spread Spectrum Systems COPYRIGHT GPP and its Organizational Partners claim

More information

L29: Fourier analysis

L29: Fourier analysis L29: Fourier analysis Introduction The discrete Fourier Transform (DFT) The DFT matrix The Fast Fourier Transform (FFT) The Short-time Fourier Transform (STFT) Fourier Descriptors CSCE 666 Pattern Analysis

More information

Efficient Block Quantisation for Image and Speech Coding

Efficient Block Quantisation for Image and Speech Coding Efficient Block Quantisation for Image and Speech Coding Stephen So, BEng (Hons) School of Microelectronic Engineering Faculty of Engineering and Information Technology Griffith University, Brisbane, Australia

More information

CCNY. BME I5100: Biomedical Signal Processing. Stochastic Processes. Lucas C. Parra Biomedical Engineering Department City College of New York

CCNY. BME I5100: Biomedical Signal Processing. Stochastic Processes. Lucas C. Parra Biomedical Engineering Department City College of New York BME I5100: Biomedical Signal Processing Stochastic Processes Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal - the stuff biology is not

More information

Abstract Most low bit rate speech coders employ linear predictive coding (LPC) which models the short-term spectral information within each speech fra

Abstract Most low bit rate speech coders employ linear predictive coding (LPC) which models the short-term spectral information within each speech fra Intraframe and Interframe Coding of Speech Spectral Parameters James H. Y. Loo B. A. Sc. Department of Electrical Engineering McGill University Montreal, Canada September 1996 A thesis submitted to the

More information

4.2 Acoustics of Speech Production

4.2 Acoustics of Speech Production 4.2 Acoustics of Speech Production Acoustic phonetics is a field that studies the acoustic properties of speech and how these are related to the human speech production system. The topic is vast, exceeding

More information

Using the Sound Recognition Techniques to Reduce the Electricity Consumption in Highways

Using the Sound Recognition Techniques to Reduce the Electricity Consumption in Highways Marsland Press Journal of American Science 2009:5(2) 1-12 Using the Sound Recognition Techniques to Reduce the Electricity Consumption in Highways 1 Khalid T. Al-Sarayreh, 2 Rafa E. Al-Qutaish, 3 Basil

More information

Pitch Prediction Filters in Speech Coding

Pitch Prediction Filters in Speech Coding IEEE TRANSACTIONS ON ACOUSTICS. SPEECH, AND SIGNAL PROCESSING. VOL. 37, NO. 4, APRIL 1989 Pitch Prediction Filters in Speech Coding RAVI P. RAMACHANDRAN AND PETER KABAL Abstract-Prediction error filters

More information

Signal Modeling Techniques In Speech Recognition

Signal Modeling Techniques In Speech Recognition Picone: Signal Modeling... 1 Signal Modeling Techniques In Speech Recognition by, Joseph Picone Texas Instruments Systems and Information Sciences Laboratory Tsukuba Research and Development Center Tsukuba,

More information

Lecture 9: Speech Recognition. Recognizing Speech

Lecture 9: Speech Recognition. Recognizing Speech EE E68: Speech & Audio Processing & Recognition Lecture 9: Speech Recognition 3 4 Recognizing Speech Feature Calculation Sequence Recognition Hidden Markov Models Dan Ellis http://www.ee.columbia.edu/~dpwe/e68/

More information

EE482: Digital Signal Processing Applications

EE482: Digital Signal Processing Applications Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 11 Adaptive Filtering 14/03/04 http://www.ee.unlv.edu/~b1morris/ee482/

More information

Lecture 9: Speech Recognition

Lecture 9: Speech Recognition EE E682: Speech & Audio Processing & Recognition Lecture 9: Speech Recognition 1 2 3 4 Recognizing Speech Feature Calculation Sequence Recognition Hidden Markov Models Dan Ellis

More information

TinySR. Peter Schmidt-Nielsen. August 27, 2014

TinySR. Peter Schmidt-Nielsen. August 27, 2014 TinySR Peter Schmidt-Nielsen August 27, 2014 Abstract TinySR is a light weight real-time small vocabulary speech recognizer written entirely in portable C. The library fits in a single file (plus header),

More information

LPC and Vector Quantization

LPC and Vector Quantization LPC and Vector Quantization JanČernocký,ValentinaHubeikaFITBUTBrno When modeling speech production based on LPC, we assume that the excitation is passed through the linear filter: H(z) = A(z) G,where A(z)isaP-thorderpolynome:

More information

ETSI EN V7.0.1 ( )

ETSI EN V7.0.1 ( ) EN 3 969 V7.. (-) European Standard (Telecommunications series) Digital cellular telecommunications system (Phase +); Half rate speech; Half rate speech transcoding (GSM 6. version 7.. Release 998) GLOBAL

More information