Mel-Generalized Cepstral Representation of Speech A Unified Approach to Speech Spectral Estimation. Keiichi Tokuda
|
|
- Noah Norris
- 6 years ago
- Views:
Transcription
1 Mel-Generalized Cepstral Representation of Speech A Unified Approach to Speech Spectral Estimation Keiichi Tokuda Nagoya Institute of Technology Carnegie Mellon University Tamkang University March 13,
2 Conventional Speech Spectral Estimation Linear prediction (LPC) Cepstral analysis Subband filter bank Autoregressive (AR) model Exponential (EX) model Nonparametric Variations Model Pole-zero (ARMA) model Analysis window Adaptive analysis (sample by sample basis) Auditory characteristics Warped LPC, PLP, etc. Auditory frequency scales (mel, Bark) Loudness scales (log, sone) 2
3 Structure of This Talk 1. Conventional cepstral analysis 2. Introduction of generalized logarithmic function Generalized cepstral analysis 3. Introduction of auditory frequency scale Mel-generalized cepstral analysis 4. Applications to speech recognition and coding 3
4 History of Cepstral Analysis B.P. Bogert, M.J.R. Healy, J.W. Tukey (1963) Analysis of seismic signals decomposition into direct wave and echo Cepstrum, Quefrency, Lifter A.M. Noll (1964, 1967) Pitch extraction based on cepstrum A.V. Oppenheim (1966, 1968) Homomorphic deconvolution decomposition into source and vocal tract function Complex cepstrum 4
5 Definition of Cepstrum Fourier transform of signal s(n) Cepstrum S(e jω )=F [ s(n) ] C(m) =F 1 [ log S(e jω ) 2 ] (Bogert et al., Noll) C(m) =F 1 [ log S(e jω ) ] (Oppenheim) 5
6 Complex Cepstrum z-transform of signal s(n) Complex cepstrum S(z) =Z [ s(n) ] c(m) = Z 1 [ log S(z) ] = F 1 [ log S(e jω ) ] = F 1 [ log S(e jω ) + j arg S(e jω ) ] 6
7 Cepstrum and Complex Cepstrum log S(e jω ) = F [ C(m) ] =Re[ F [ c(m) ]] When it is minimum phase (all poles and zeros are located in the unit circle) c(m) = 0, m < 0 C(m), m =0 2C(m), m > 0 7
8 Homomorphic Deconvolution Pulse train White noise e(n) Linear time-invariant system h(n) s(n) =e(n) h(n) Speech s(n) =h(n) e(n) F S(e jω )=H(e jω )E(e jω ) log log S(e jω ) =log H(e jω ) +log E(e jω ) F 1 C(m) =C h (m)+c e (m) 8
9 Amplitude (db) C(m) s(n) Time (100μs) (a) Quefrency (100μs) (b) Frequency (khz) (d) Amplitude (db) Amplitude (db) Frequency (khz) (c) Frequency (khz) (e) 9
10 Spectrum of Periodic Signal e(n) w(n) log E w (e jω ) W(e jω ) 2π/N p N p 0 π Frequency e(n) h(n) s(n) log S w (e jω ) H(e jω ) 0 π Frequency 10
11 Problems of Homomorphic Processing (Cepstral Analysis) Linear smoothing of log spectrum affected by fine structure of FFT spectrum results in a large bias and variance Voiced speech (periodic) Envelope of peaks of spectral fine structure Improved cepstral analysis, PSE: Biased 11
12 Cost Function P (ω): I N (ω): Estimate of Power Spectrum Periodogram E = 1 2π π π { IN (ω) P (ω) log I } N(ω) P (ω) 1 dω min x: Gaussian Process Maximizing p(x c) Unbiased estimation of log spectrum equivalent to one used in LPC Minimization of energy of inverse filter output 12
13 Analysis of Natural Speech Log magnitude (db) Log magnitude (db) Frequency (khz) (a) Unbiased cepstral analysis Frequency (khz) (b) Linear prediction 13
14 Generalized Cepstrum Complex Cepstrum c(m) = Z 1 [ log S(z) ] log S(z) = Z [ c(m) ] Generalized Cepstrum c γ (m) = Z 1 [ s γ (S(z))] s γ (S(z)) = Z [ c γ (m) ] 14
15 Generalized logarithmic function s γ (w) = (w γ 1)/γ, 0 < γ 1 log w, γ =0 s ( x ) γ γ = < γ < 1 γ = 0 (log x ) 1 < γ < 0 γ = 1 x 15
16 Spectral Model Generalized Cepstrum: c γ (m) H(z) = s 1 γ = M m=0 1+γ exp c γ (m) z m M m=0 M m=0 c γ (m) z m 1/γ, 0 < γ 1 c γ (m) z m, γ =0 Inverse function of Generalized logarithm s 1 γ (w) = (1 + γw) 1/γ, 0 < γ 1 exp w, γ =0 16
17 Cost Function E = 1 2π π π { IN (ω) P (ω) log I } N(ω) P (ω) 1 dω min Estimate of Power Spectrum P (ω) = H(e jω ) 2 = σ 2 D(e jω ) 2 Interpretation in time-domain ε = E [ e 2 (n) ] min x(n) 1/D(z) e(n) 17
18 Advantage 1 γ 0: Convex function Global solutioncan easily be obtained The obtained system H(z) is minimum phase, e.g., stable γ = 1 Linear Prediction H(z) = 1 γ =0 Cepstrum H(z) = exp M m=0 M m=0 1 c γ (m)z m c γ (m)z m 18
19 Prediction Gain D(z) is minimum phase Gain of D(z) is one Predictor: Q(z) = k=1 a(k)z k Cost Function: ε = E [ e 2 (n) ] Prediction Gain: G = E [ x 2 (n) ] E [ e 2 (n) ] x(n) x(n) 1/D(z) Q(z) + e(n) e(n) 19
20 Analysis of synthetic signal (Generalized Cepstral Analysis) Log magnitude 10dB True γ =-1-3/4-1/2-1/ Frequency(Hz) Prediction gain(db) 8 7 (a) Example 1 M= (LPC) (UELS) γ 20
21 Analysis of synthetic signal (Generalized Cepstral Analysis) Log magnitude 10dB True γ =-1-3/4-1/2-1/ Frequency(Hz) Prediction gain(db) 5 4 (c) Example 3 M= (LPC) (UELS) γ 21
22 Analysis of synthetic signal (Generalized Cepstral Analysis) Log magnitude 10dB True γ =-1-3/4-1/2-1/ Frequency(Hz) Prediction gain(db) 7 6 (b) Example 2 M= (LPC) (UELS) γ 22
23 Analysis of natural speech (Generalized Cepstral Analysis) /e/ 80 Magnitude(dB) γ = -1 γ = -1/2 γ = -1/3 γ = Frequency(kHz) Frequency(kHz) Frequency(kHz) Frequency(kHz) Prediction gain(db) M= (LPC) (UELS) γ (a) male /e/ 23
24 Analysis of natural speech (Generalized Cepstral Analysis) /N/ 80 Magnitude(dB) γ = -1 γ = -1/2 γ = -1/3 γ = Frequency(kHz) Frequency(kHz) Frequency(kHz) Frequency(kHz) Prediction gain(db) M= (LPC) (UELS) γ (b) male /N/ 24
25 Structure of synthesis filter H(z) (γ = 1/n) 1st 3nd... n-th input 1 1 σ 1 C(z) C(z) C(z) output H(z) =σd(z) =σ { 1 C(z) } n C(z) = 1+γ M c γ (m) z m m=0 25
26 Structure of synthesis filter H(z) (γ =0) LMA filter Input F (z) F (z) F (z) F (z) A L,1 A L,2 A L,3 A L,4 Output + D(z) = exp F (z) R L (F (z)) = L l=1 L l=1 A L,l {F (z)} l A L,l { F (z)} l F (z) = M m=1 c γ (m) z m 26
27 Introduction of auditory frequency scale First-order all-pass function: z 1 α =Ψ(z) = z 1 α 1 αz 1 Phase Characteristics can be used for Frequency Transformation: ω = tan 1 (1 α 2 )sinω (1 + α 2 ) cos ω 2α where Ψ(e jω )=e j ω Warped frequency ω (rad) π π /2 mel scale 10kHz sampling α = π/2 π Frequency ω (rad) 27
28 Mel-Generalized Cepstral Analysis Mel-generalized cepstrum: c α,γ (m) H(z) = s 1 γ = M m=0 1+γ exp c α,γ (m) z m M m=0 M m=0 α c α,γ (m) z m α 1/γ, 0 < γ 1 c α,γ (m) z m α, γ =0 z 1 α = z 1 α 1 αz 1 28
29 (α, γ) =(0, 0) Cepstral model: H(z) = exp (α, γ) =(0, 1) AR model: M m=0 c α,γ (m)z m H(z) = 1 M m=0 1 c α,γ (m)z m (α, γ) =(0.35, 0) Mel-cepstral model: H(z) = exp M m=0 c α,γ (m) z m α (α, γ) =(0.47, 1) Warped AR model: H(z) = 1 M m=0 1 c α,γ (m) z m α 29
30 A Unified Approach to Speech Spectral Estimation α < 1, 1 γ 0 α =0 Generalized cepstral analysis Mel-generalized cepstral analysis γ = 1 Linear prediction Warped Linear Prediction γ =0 Unbiased Cepstral analysis Mel-cepstral analysis 30
31 Mel-generalized analysis of natural speech /N/ M =12 Magnitude(dB) Magnitude(dB) Magnitude(dB) α = 0 α = 0.35 α = Frequency(kHz) Frequency(kHz) Frequency(kHz) γ = 0 γ = 0.5 γ = 1 31
32 Example α =0 γ = 1 γ = 1/3 γ =0 Frequency(kHz) Frequency(kHz) Frequency(kHz) n a N b u d e w a (ms) (a) Waveform 40(dB) 40(dB) (α, γ, M) = (0, -1, 12) LP (α, γ, M) = (0, -1/3, 12) GCEP 40(dB) (α, γ, M) = (0, 0, 12) UELS (b) Spectral estimates ( α = 0) 32
33 Example α =0.35 γ = 1 γ = 1/3 γ =0 Frequency(kHz) Frequency(kHz) Frequency(kHz) n a N b u d e w a (ms) (a) Waveform 40(dB) 40(dB) (α, γ, M) = (0.35, -1, 12) WLP (α, γ, M) = (0.35, -1/3, 12) MGCEP 40(dB) (α, γ, M) = (0.35, 0, 12) MCEP (b) Spectral estimates ( α = 0.35) 33
34 Structure of synthesis filter H(z) (γ = 1/n) 1st 2nd... n-th input 1 1 σ 1 C(z) C(z) C(z) output Structure of H(z) H(z) =σd(z) =σ C(z) = ( 1+γ M m=0 { 1 C(z) } n c α,γ(m) z m α Output ) Input + α z 1 + z 1 + z 1 + z 1 γ(1 α 2 ) + b(1) + α + b(2) Structure of C(z) (M =3) + α b(3) 34
35 Structure of synthesis filter H(z) (γ =0) MLSA filter D(z) = exp F (z) R L (F (z)) F (z) = M m=0 c α,γ(m) z m α sufficient accuracy: maximum spectral error 0.24dB O(8M) multiply-add operations a sample guaranteed stability M multiply-add operations for filter coefficients calculation 35
36 Structure of MLSA filter Input F (z) F (z) F (z) F (z) A L,1 A L,2 A L,3 A L,4 Output + R L (F (z)) exp F (z) =D(z), L =4 Input 1 α 2 α - + α α z 1 + z 1 + z 1 + z b(1) b(2) b(3) + Basic filter F (z), M =3 + Output 36
37 The choice of α, γ for speech analysis/synthesis Analysis/synthesis system with fixed α and γ speech quality change with γ γ 1 Clear γ 0 Smooth When γ = 0, speech quality with (α, M) =(0.35, 15) is almost equivalent to that with (α,m) =(0, 30). When the analysis order is high enough, the difference becomes small. 37
38 Feature of Unified Approach Linear prediction analysis, Cepstral analysis are the special cases. Mathematically well-defined Physical interpretation Minimization of energy of inverse filter output x is Gaussian Minimization of p(x c) (ML estimation) Global solution, stability of the system function Synthesis filter for direct synthesis from the estimated coefficients LMA/MLSA/GMSLA filter Extension to adaptive analysis (sample by sample basis) Parameter transformation for speech recognition 38
39 Word Recognition based on HMM Spectral Analysis: 1. (α 1,γ 1,M 1 )=(0, 1, 12) Linear Prediction 2. (α 1,γ 1,M 1 )=(0.35, 1/3, 12) Mel-generalized cepstral analysis 3. (α 1,γ 1,M 1 )=(0.35, 0, 12) Mel-cepstral analysis Output vector of HMM: (α 2,γ 2,M 2 )=(0.35, 0, 12) Mel-cepstral coefficients and Δ (dynamic coefficients) H(z) =s 1 γ 1 M 1 m=0 c α1,γ 1 (m) z m α 1 = s 1 γ 2 m=0 c α2,γ 2 (m) z m α 2 39
40 Recognition Accuracy (Continuous HMM, 33 phoneme models, 2618 words) Linear Prediction (α 1, γ 1, M 1)= (0, -1, 12) Mel-Generalized Cepstral Analysis (α 1, γ 1, M 1)= (0.35, -1/3, 12) Mel-Cepstral Analysis (α 1, γ 1, M 1)= (0.35, 0, 12) 77.5% 80.7% 79.6% Recognition accuracy (%) 40
41 Application to 16kb/s wideband CELP coder Input Quantization of MGC Coef. MGC Analysis encoder CELP Excitation Generator Minimize MSE Synthesis Filter MGC coef. MGC coef. Perceptual Weighting Filter MGC coef. MGC coef. decoder CELP Excitation Generator Synthesis Filter Postfilter Output 41
42 Speech quality as a function of α (γ = 1/2) D M O S 3.5 average female male α 42
43 Subjective Evaluation MNRU(dB) G kb/s G kb/s G kb/s Conv.CELP 16kb/s MGC-CELP 16kb/s D M O S average female male 43
44 Summary A unified approach to speech spectral estimation A unified approach to Linear predicton and Cepstral analysis Introduction of auditory frequency scale Efficint representation of speech spectrum with an appropriate choice of α and γ Application to speech anaylysis/synthesis, speech coding, speech recognition Future work: Optimal α and γ (Phoneme/Speaker dependent?) Speech Signal Processing Toolkit: 44
Feature extraction 2
Centre for Vision Speech & Signal Processing University of Surrey, Guildford GU2 7XH. Feature extraction 2 Dr Philip Jackson Linear prediction Perceptual linear prediction Comparison of feature methods
More informationSignal representations: Cepstrum
Signal representations: Cepstrum Source-filter separation for sound production For speech, source corresponds to excitation by a pulse train for voiced phonemes and to turbulence (noise) for unvoiced phonemes,
More informationCEPSTRAL ANALYSIS SYNTHESIS ON THE MEL FREQUENCY SCALE, AND AN ADAPTATIVE ALGORITHM FOR IT.
CEPSTRAL ANALYSIS SYNTHESIS ON THE EL FREQUENCY SCALE, AND AN ADAPTATIVE ALGORITH FOR IT. Summarized overview of the IEEE-publicated papers Cepstral analysis synthesis on the mel frequency scale by Satochi
More informationLinear Prediction 1 / 41
Linear Prediction 1 / 41 A map of speech signal processing Natural signals Models Artificial signals Inference Speech synthesis Hidden Markov Inference Homomorphic processing Dereverberation, Deconvolution
More informationChapter 9. Linear Predictive Analysis of Speech Signals 语音信号的线性预测分析
Chapter 9 Linear Predictive Analysis of Speech Signals 语音信号的线性预测分析 1 LPC Methods LPC methods are the most widely used in speech coding, speech synthesis, speech recognition, speaker recognition and verification
More informationFeature extraction 1
Centre for Vision Speech & Signal Processing University of Surrey, Guildford GU2 7XH. Feature extraction 1 Dr Philip Jackson Cepstral analysis - Real & complex cepstra - Homomorphic decomposition Filter
More informationVoiced Speech. Unvoiced Speech
Digital Speech Processing Lecture 2 Homomorphic Speech Processing General Discrete-Time Model of Speech Production p [ n] = p[ n] h [ n] Voiced Speech L h [ n] = A g[ n] v[ n] r[ n] V V V p [ n ] = u [
More informationSpeech Signal Representations
Speech Signal Representations Berlin Chen 2003 References: 1. X. Huang et. al., Spoken Language Processing, Chapters 5, 6 2. J. R. Deller et. al., Discrete-Time Processing of Speech Signals, Chapters 4-6
More informationSignal Modeling Techniques in Speech Recognition. Hassan A. Kingravi
Signal Modeling Techniques in Speech Recognition Hassan A. Kingravi Outline Introduction Spectral Shaping Spectral Analysis Parameter Transforms Statistical Modeling Discussion Conclusions 1: Introduction
More informationL7: Linear prediction of speech
L7: Linear prediction of speech Introduction Linear prediction Finding the linear prediction coefficients Alternative representations This lecture is based on [Dutoit and Marques, 2009, ch1; Taylor, 2009,
More informationAutomatic Speech Recognition (CS753)
Automatic Speech Recognition (CS753) Lecture 12: Acoustic Feature Extraction for ASR Instructor: Preethi Jyothi Feb 13, 2017 Speech Signal Analysis Generate discrete samples A frame Need to focus on short
More informationFrequency Domain Speech Analysis
Frequency Domain Speech Analysis Short Time Fourier Analysis Cepstral Analysis Windowed (short time) Fourier Transform Spectrogram of speech signals Filter bank implementation* (Real) cepstrum and complex
More informationSPEECH ANALYSIS AND SYNTHESIS
16 Chapter 2 SPEECH ANALYSIS AND SYNTHESIS 2.1 INTRODUCTION: Speech signal analysis is used to characterize the spectral information of an input speech signal. Speech signal analysis [52-53] techniques
More informationTopic 6. Timbre Representations
Topic 6 Timbre Representations We often say that singer s voice is magnetic the violin sounds bright this French horn sounds solid that drum sounds dull What aspect(s) of sound are these words describing?
More informationDepartment of Electrical and Computer Engineering Digital Speech Processing Homework No. 6 Solutions
Problem 1 Department of Electrical and Computer Engineering Digital Speech Processing Homework No. 6 Solutions The complex cepstrum, ˆx[n], of a sequence x[n] is the inverse Fourier transform of the complex
More informationCS578- Speech Signal Processing
CS578- Speech Signal Processing Lecture 7: Speech Coding Yannis Stylianou University of Crete, Computer Science Dept., Multimedia Informatics Lab yannis@csd.uoc.gr Univ. of Crete Outline 1 Introduction
More informationChapter 10 Applications in Communications
Chapter 10 Applications in Communications School of Information Science and Engineering, SDU. 1/ 47 Introduction Some methods for digitizing analog waveforms: Pulse-code modulation (PCM) Differential PCM
More informationVocoding approaches for statistical parametric speech synthesis
Vocoding approaches for statistical parametric speech synthesis Ranniery Maia Toshiba Research Europe Limited Cambridge Research Laboratory Speech Synthesis Seminar Series CUED, University of Cambridge,
More informationLinear Prediction Coding. Nimrod Peleg Update: Aug. 2007
Linear Prediction Coding Nimrod Peleg Update: Aug. 2007 1 Linear Prediction and Speech Coding The earliest papers on applying LPC to speech: Atal 1968, 1970, 1971 Markel 1971, 1972 Makhoul 1975 This is
More informationSinusoidal Modeling. Yannis Stylianou SPCC University of Crete, Computer Science Dept., Greece,
Sinusoidal Modeling Yannis Stylianou University of Crete, Computer Science Dept., Greece, yannis@csd.uoc.gr SPCC 2016 1 Speech Production 2 Modulators 3 Sinusoidal Modeling Sinusoidal Models Voiced Speech
More informationLECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES
LECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES Abstract March, 3 Mads Græsbøll Christensen Audio Analysis Lab, AD:MT Aalborg University This document contains a brief introduction to pitch
More informationLab 9a. Linear Predictive Coding for Speech Processing
EE275Lab October 27, 2007 Lab 9a. Linear Predictive Coding for Speech Processing Pitch Period Impulse Train Generator Voiced/Unvoiced Speech Switch Vocal Tract Parameters Time-Varying Digital Filter H(z)
More informationClass of waveform coders can be represented in this manner
Digital Speech Processing Lecture 15 Speech Coding Methods Based on Speech Waveform Representations ti and Speech Models Uniform and Non- Uniform Coding Methods 1 Analog-to-Digital Conversion (Sampling
More informationSpeaker Identification Based On Discriminative Vector Quantization And Data Fusion
University of Central Florida Electronic Theses and Dissertations Doctoral Dissertation (Open Access) Speaker Identification Based On Discriminative Vector Quantization And Data Fusion 2005 Guangyu Zhou
More informationZeros of z-transform(zzt) representation and chirp group delay processing for analysis of source and filter characteristics of speech signals
Zeros of z-transformzzt representation and chirp group delay processing for analysis of source and filter characteristics of speech signals Baris Bozkurt 1 Collaboration with LIMSI-CNRS, France 07/03/2017
More informationSPEECH RECOGNITION USING TIME DOMAIN FEATURES FROM PHASE SPACE RECONSTRUCTIONS
SPEECH RECOGNITION USING TIME DOMAIN FEATURES FROM PHASE SPACE RECONSTRUCTIONS by Jinjin Ye, B.S. A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL IN PARTIAL FULFILLMENT OF THE REQUIREMENTS for
More informationSource modeling (block processing)
Digital Speech Processing Lecture 17 Speech Coding Methods Based on Speech Models 1 Waveform Coding versus Block Waveform coding Processing sample-by-sample matching of waveforms coding gquality measured
More informationL8: Source estimation
L8: Source estimation Glottal and lip radiation models Closed-phase residual analysis Voicing/unvoicing detection Pitch detection Epoch detection This lecture is based on [Taylor, 2009, ch. 11-12] Introduction
More informationLecture 9: Speech Recognition. Recognizing Speech
EE E68: Speech & Audio Processing & Recognition Lecture 9: Speech Recognition 3 4 Recognizing Speech Feature Calculation Sequence Recognition Hidden Markov Models Dan Ellis http://www.ee.columbia.edu/~dpwe/e68/
More informationThe Equivalence of ADPCM and CELP Coding
The Equivalence of ADPCM and CELP Coding Peter Kabal Department of Electrical & Computer Engineering McGill University Montreal, Canada Version.2 March 20 c 20 Peter Kabal 20/03/ You are free: to Share
More informationLecture 9: Speech Recognition
EE E682: Speech & Audio Processing & Recognition Lecture 9: Speech Recognition 1 2 3 4 Recognizing Speech Feature Calculation Sequence Recognition Hidden Markov Models Dan Ellis
More informationModeling Prosody for Speaker Recognition: Why Estimating Pitch May Be a Red Herring
Modeling Prosody for Speaker Recognition: Why Estimating Pitch May Be a Red Herring Kornel Laskowski & Qin Jin Carnegie Mellon University Pittsburgh PA, USA 28 June, 2010 Laskowski & Jin ODYSSEY 2010,
More informationText-to-speech synthesizer based on combination of composite wavelet and hidden Markov models
8th ISCA Speech Synthesis Workshop August 31 September 2, 2013 Barcelona, Spain Text-to-speech synthesizer based on combination of composite wavelet and hidden Markov models Nobukatsu Hojo 1, Kota Yoshizato
More informationNonparametric and Parametric Defined This text distinguishes between systems and the sequences (processes) that result when a WN input is applied
Linear Signal Models Overview Introduction Linear nonparametric vs. parametric models Equivalent representations Spectral flatness measure PZ vs. ARMA models Wold decomposition Introduction Many researchers
More informationEvaluation of the modified group delay feature for isolated word recognition
Evaluation of the modified group delay feature for isolated word recognition Author Alsteris, Leigh, Paliwal, Kuldip Published 25 Conference Title The 8th International Symposium on Signal Processing and
More informationLecture 7: Feature Extraction
Lecture 7: Feature Extraction Kai Yu SpeechLab Department of Computer Science & Engineering Shanghai Jiao Tong University Autumn 2014 Kai Yu Lecture 7: Feature Extraction SJTU Speech Lab 1 / 28 Table of
More informationAn Autoregressive Recurrent Mixture Density Network for Parametric Speech Synthesis
ICASSP 07 New Orleans, USA An Autoregressive Recurrent Mixture Density Network for Parametric Speech Synthesis Xin WANG, Shinji TAKAKI, Junichi YAMAGISHI National Institute of Informatics, Japan 07-03-07
More informationDesign of a CELP coder and analysis of various quantization techniques
EECS 65 Project Report Design of a CELP coder and analysis of various quantization techniques Prof. David L. Neuhoff By: Awais M. Kamboh Krispian C. Lawrence Aditya M. Thomas Philip I. Tsai Winter 005
More informationQuadrature-Mirror Filter Bank
Quadrature-Mirror Filter Bank In many applications, a discrete-time signal x[n] is split into a number of subband signals { v k [ n]} by means of an analysis filter bank The subband signals are then processed
More informationAllpass Modeling of LP Residual for Speaker Recognition
Allpass Modeling of LP Residual for Speaker Recognition K. Sri Rama Murty, Vivek Boominathan and Karthika Vijayan Department of Electrical Engineering, Indian Institute of Technology Hyderabad, India email:
More informationDesign Criteria for the Quadratically Interpolated FFT Method (I): Bias due to Interpolation
CENTER FOR COMPUTER RESEARCH IN MUSIC AND ACOUSTICS DEPARTMENT OF MUSIC, STANFORD UNIVERSITY REPORT NO. STAN-M-4 Design Criteria for the Quadratically Interpolated FFT Method (I): Bias due to Interpolation
More informationTime-domain representations
Time-domain representations Speech Processing Tom Bäckström Aalto University Fall 2016 Basics of Signal Processing in the Time-domain Time-domain signals Before we can describe speech signals or modelling
More informationCEPSTRAL analysis has been widely used in signal processing
162 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 7, NO. 2, MARCH 1999 On Second-Order Statistics and Linear Estimation of Cepstral Coefficients Yariv Ephraim, Fellow, IEEE, and Mazin Rahim, Senior
More informationReformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features
Reformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features Heiga ZEN (Byung Ha CHUN) Nagoya Inst. of Tech., Japan Overview. Research backgrounds 2.
More informationPulse-Code Modulation (PCM) :
PCM & DPCM & DM 1 Pulse-Code Modulation (PCM) : In PCM each sample of the signal is quantized to one of the amplitude levels, where B is the number of bits used to represent each sample. The rate from
More informationOptimal Design of Real and Complex Minimum Phase Digital FIR Filters
Optimal Design of Real and Complex Minimum Phase Digital FIR Filters Niranjan Damera-Venkata and Brian L. Evans Embedded Signal Processing Laboratory Dept. of Electrical and Computer Engineering The University
More informationAudio Coding. Fundamentals Quantization Waveform Coding Subband Coding P NCTU/CSIE DSPLAB C.M..LIU
Audio Coding P.1 Fundamentals Quantization Waveform Coding Subband Coding 1. Fundamentals P.2 Introduction Data Redundancy Coding Redundancy Spatial/Temporal Redundancy Perceptual Redundancy Compression
More informationECE 8440 Unit Applica,ons (of Homomorphic Deconvolu,on) to Speech Processing
ECE 8440 Unit 24 1 13.2 Applica,ons (of Homomorphic Deconvolu,on) to Speech Processing Speech produc,on can be modeled as the convolu,on of an excita,on signal with the unit sample response of a linear
More information6.003 (Fall 2011) Quiz #3 November 16, 2011
6.003 (Fall 2011) Quiz #3 November 16, 2011 Name: Kerberos Username: Please circle your section number: Section Time 2 11 am 3 1 pm 4 2 pm Grades will be determined by the correctness of your answers (explanations
More informationPractical Spectral Estimation
Digital Signal Processing/F.G. Meyer Lecture 4 Copyright 2015 François G. Meyer. All Rights Reserved. Practical Spectral Estimation 1 Introduction The goal of spectral estimation is to estimate how the
More informationReview of Fundamentals of Digital Signal Processing
Solution Manual for Theory and Applications of Digital Speech Processing by Lawrence Rabiner and Ronald Schafer Click here to Purchase full Solution Manual at http://solutionmanuals.info Link download
More informationSinger Identification using MFCC and LPC and its comparison for ANN and Naïve Bayes Classifiers
Singer Identification using MFCC and LPC and its comparison for ANN and Naïve Bayes Classifiers Kumari Rambha Ranjan, Kartik Mahto, Dipti Kumari,S.S.Solanki Dept. of Electronics and Communication Birla
More informationParametric Signal Modeling and Linear Prediction Theory 1. Discrete-time Stochastic Processes (cont d)
Parametric Signal Modeling and Linear Prediction Theory 1. Discrete-time Stochastic Processes (cont d) Electrical & Computer Engineering North Carolina State University Acknowledgment: ECE792-41 slides
More informationChirp Decomposition of Speech Signals for Glottal Source Estimation
Chirp Decomposition of Speech Signals for Glottal Source Estimation Thomas Drugman 1, Baris Bozkurt 2, Thierry Dutoit 1 1 TCTS Lab, Faculté Polytechnique de Mons, Belgium 2 Department of Electrical & Electronics
More informationDiscrete-Time Signals and Systems. Frequency Domain Analysis of LTI Systems. The Frequency Response Function. The Frequency Response Function
Discrete-Time Signals and s Frequency Domain Analysis of LTI s Dr. Deepa Kundur University of Toronto Reference: Sections 5., 5.2-5.5 of John G. Proakis and Dimitris G. Manolakis, Digital Signal Processing:
More informationDFT & Fast Fourier Transform PART-A. 7. Calculate the number of multiplications needed in the calculation of DFT and FFT with 64 point sequence.
SHRI ANGALAMMAN COLLEGE OF ENGINEERING & TECHNOLOGY (An ISO 9001:2008 Certified Institution) SIRUGANOOR,TRICHY-621105. DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING UNIT I DFT & Fast Fourier
More informationRandom signals II. ÚPGM FIT VUT Brno,
Random signals II. Jan Černocký ÚPGM FIT VUT Brno, cernocky@fit.vutbr.cz 1 Temporal estimate of autocorrelation coefficients for ergodic discrete-time random process. ˆR[k] = 1 N N 1 n=0 x[n]x[n + k],
More informationGlottal Source Estimation using an Automatic Chirp Decomposition
Glottal Source Estimation using an Automatic Chirp Decomposition Thomas Drugman 1, Baris Bozkurt 2, Thierry Dutoit 1 1 TCTS Lab, Faculté Polytechnique de Mons, Belgium 2 Department of Electrical & Electronics
More information1. Probability density function for speech samples. Gamma. Laplacian. 2. Coding paradigms. =(2X max /2 B ) for a B-bit quantizer Δ Δ Δ Δ Δ
Digital Speech Processing Lecture 16 Speech Coding Methods Based on Speech Waveform Representations and Speech Models Adaptive and Differential Coding 1 Speech Waveform Coding-Summary of Part 1 1. Probability
More informationNoise Robust Isolated Words Recognition Problem Solving Based on Simultaneous Perturbation Stochastic Approximation Algorithm
EngOpt 2008 - International Conference on Engineering Optimization Rio de Janeiro, Brazil, 0-05 June 2008. Noise Robust Isolated Words Recognition Problem Solving Based on Simultaneous Perturbation Stochastic
More informationProc. of NCC 2010, Chennai, India
Proc. of NCC 2010, Chennai, India Trajectory and surface modeling of LSF for low rate speech coding M. Deepak and Preeti Rao Department of Electrical Engineering Indian Institute of Technology, Bombay
More informationAn Excitation Model for HMM-Based Speech Synthesis Based on Residual Modeling
An Model for HMM-Based Speech Synthesis Based on Residual Modeling Ranniery Maia,, Tomoki Toda,, Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda, National Inst. of Inform. and Comm. Technology (NiCT), Japan
More informationEfficient Block Quantisation for Image and Speech Coding
Efficient Block Quantisation for Image and Speech Coding Stephen So, BEng (Hons) School of Microelectronic Engineering Faculty of Engineering and Information Technology Griffith University, Brisbane, Australia
More informationFinite Word Length Effects and Quantisation Noise. Professors A G Constantinides & L R Arnaut
Finite Word Length Effects and Quantisation Noise 1 Finite Word Length Effects Finite register lengths and A/D converters cause errors at different levels: (i) input: Input quantisation (ii) system: Coefficient
More informationGaussian Processes for Audio Feature Extraction
Gaussian Processes for Audio Feature Extraction Dr. Richard E. Turner (ret26@cam.ac.uk) Computational and Biological Learning Lab Department of Engineering University of Cambridge Machine hearing pipeline
More informationCHAPTER 2 RANDOM PROCESSES IN DISCRETE TIME
CHAPTER 2 RANDOM PROCESSES IN DISCRETE TIME Shri Mata Vaishno Devi University, (SMVDU), 2013 Page 13 CHAPTER 2 RANDOM PROCESSES IN DISCRETE TIME When characterizing or modeling a random variable, estimates
More informationEstimation of Cepstral Coefficients for Robust Speech Recognition
Estimation of Cepstral Coefficients for Robust Speech Recognition by Kevin M. Indrebo, B.S., M.S. A Dissertation submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment
More informationExemplar-based voice conversion using non-negative spectrogram deconvolution
Exemplar-based voice conversion using non-negative spectrogram deconvolution Zhizheng Wu 1, Tuomas Virtanen 2, Tomi Kinnunen 3, Eng Siong Chng 1, Haizhou Li 1,4 1 Nanyang Technological University, Singapore
More informationAUTOREGRESSIVE (AR) modeling identifies and exploits
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 55, NO. 11, NOVEMBER 2007 5237 Autoregressive Modeling of Temporal Envelopes Marios Athineos, Student Member, IEEE, and Daniel P. W. Ellis, Senior Member, IEEE
More informationMultidimensional digital signal processing
PSfrag replacements Two-dimensional discrete signals N 1 A 2-D discrete signal (also N called a sequence or array) is a function 2 defined over thex(n set 1 of, n 2 ordered ) pairs of integers: y(nx 1,
More informationSignal Modeling Techniques In Speech Recognition
Picone: Signal Modeling... 1 Signal Modeling Techniques In Speech Recognition by, Joseph Picone Texas Instruments Systems and Information Sciences Laboratory Tsukuba Research and Development Center Tsukuba,
More informationNonparametric Deconvolution of Seismic Depth Phases
IMA Workshop Time Series Analysis and Applications to Geophysical Systems November 1-16, 1 Minneapolis, Minnesota Nonparametric Deconvolution of Seismic Depth Phases Robert H. Shumway 1 Jessie L. Bonner
More information[Omer* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116
IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY TAJWEED UTOMATION SYSTEM USING HIDDEN MARKOUV MODEL AND NURAL NETWORK Safaa Omer Mohammed Nssr*, Hoida Ali Abdelgader SUDAN UNIVERSITY
More informationDiscrete-Time David Johns and Ken Martin University of Toronto
Discrete-Time David Johns and Ken Martin University of Toronto (johns@eecg.toronto.edu) (martin@eecg.toronto.edu) University of Toronto 1 of 40 Overview of Some Signal Spectra x c () t st () x s () t xn
More informationBlind Deconvolution of Ultrasonic Signals Using High-Order Spectral Analysis and Wavelets
Blind Deconvolution of Ultrasonic Signals Using High-Order Spectral Analysis and Wavelets Roberto H. Herrera, Eduardo Moreno, Héctor Calas, and Rubén Orozco 3 University of Cienfuegos, Cuatro Caminos,
More informationLecture 5: GMM Acoustic Modeling and Feature Extraction
CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 5: GMM Acoustic Modeling and Feature Extraction Original slides by Dan Jurafsky Outline for Today Acoustic
More informationBich Ngoc Do. Neural Networks for Automatic Speaker, Language and Sex Identification
Charles University in Prague Faculty of Mathematics and Physics University of Groningen Faculty of Arts MASTER THESIS Bich Ngoc Do Neural Networks for Automatic Speaker, Language and Sex Identification
More information/ (2π) X(e jω ) dω. 4. An 8 point sequence is given by x(n) = {2,2,2,2,1,1,1,1}. Compute 8 point DFT of x(n) by
Code No: RR320402 Set No. 1 III B.Tech II Semester Regular Examinations, Apr/May 2006 DIGITAL SIGNAL PROCESSING ( Common to Electronics & Communication Engineering, Electronics & Instrumentation Engineering,
More informationSpeech Coding. Speech Processing. Tom Bäckström. October Aalto University
Speech Coding Speech Processing Tom Bäckström Aalto University October 2015 Introduction Speech coding refers to the digital compression of speech signals for telecommunication (and storage) applications.
More informationCS 188: Artificial Intelligence Fall 2011
CS 188: Artificial Intelligence Fall 2011 Lecture 20: HMMs / Speech / ML 11/8/2011 Dan Klein UC Berkeley Today HMMs Demo bonanza! Most likely explanation queries Speech recognition A massive HMM! Details
More informationEE538 Final Exam Fall :20 pm -5:20 pm PHYS 223 Dec. 17, Cover Sheet
EE538 Final Exam Fall 005 3:0 pm -5:0 pm PHYS 3 Dec. 17, 005 Cover Sheet Test Duration: 10 minutes. Open Book but Closed Notes. Calculators ARE allowed!! This test contains five problems. Each of the five
More informationINTRODUCTION Noise is present in many situations of daily life for ex: Microphones will record noise and speech. Goal: Reconstruct original signal Wie
WIENER FILTERING Presented by N.Srikanth(Y8104060), M.Manikanta PhaniKumar(Y8104031). INDIAN INSTITUTE OF TECHNOLOGY KANPUR Electrical Engineering dept. INTRODUCTION Noise is present in many situations
More informationLecture 7 Discrete Systems
Lecture 7 Discrete Systems EE 52: Instrumentation and Measurements Lecture Notes Update on November, 29 Aly El-Osery, Electrical Engineering Dept., New Mexico Tech 7. Contents The z-transform 2 Linear
More informationPart III Spectrum Estimation
ECE79-4 Part III Part III Spectrum Estimation 3. Parametric Methods for Spectral Estimation Electrical & Computer Engineering North Carolina State University Acnowledgment: ECE79-4 slides were adapted
More informationrepresentation of speech
Digital Speech Processing Lectures 7-8 Time Domain Methods in Speech Processing 1 General Synthesis Model voiced sound amplitude Log Areas, Reflection Coefficients, Formants, Vocal Tract Polynomial, l
More informationOn Variable-Scale Piecewise Stationary Spectral Analysis of Speech Signals for ASR
On Variable-Scale Piecewise Stationary Spectral Analysis of Speech Signals for ASR Vivek Tyagi a,c, Hervé Bourlard b,c Christian Wellekens a,c a Institute Eurecom, P.O Box: 193, Sophia-Antipolis, France.
More informationDigital Signal Processing. Midterm 1 Solution
EE 123 University of California, Berkeley Anant Sahai February 15, 27 Digital Signal Processing Instructions Midterm 1 Solution Total time allowed for the exam is 8 minutes Some useful formulas: Discrete
More informationOn Moving Average Parameter Estimation
On Moving Average Parameter Estimation Niclas Sandgren and Petre Stoica Contact information: niclas.sandgren@it.uu.se, tel: +46 8 473392 Abstract Estimation of the autoregressive moving average (ARMA)
More informationADSP ADSP ADSP ADSP. Advanced Digital Signal Processing (18-792) Spring Fall Semester, Department of Electrical and Computer Engineering
Advanced Digital Signal rocessing (18-792) Spring Fall Semester, 201 2012 Department of Electrical and Computer Engineering ROBLEM SET 8 Issued: 10/26/18 Due: 11/2/18 Note: This problem set is due Friday,
More informationVoice Activity Detection Using Pitch Feature
Voice Activity Detection Using Pitch Feature Presented by: Shay Perera 1 CONTENTS Introduction Related work Proposed Improvement References Questions 2 PROBLEM speech Non speech Speech Region Non Speech
More informationCorrespondence. Pulse Doppler Radar Target Recognition using a Two-Stage SVM Procedure
Correspondence Pulse Doppler Radar Target Recognition using a Two-Stage SVM Procedure It is possible to detect and classify moving and stationary targets using ground surveillance pulse-doppler radars
More informationETSI EN V7.1.1 ( )
European Standard (Telecommunications series) Digital cellular telecommunications system (Phase +); Adaptive Multi-Rate (AMR) speech transcoding GLOBAL SYSTEM FOR MOBILE COMMUNICATIONS R Reference DEN/SMG-110690Q7
More informationStatistical and Adaptive Signal Processing
r Statistical and Adaptive Signal Processing Spectral Estimation, Signal Modeling, Adaptive Filtering and Array Processing Dimitris G. Manolakis Massachusetts Institute of Technology Lincoln Laboratory
More informationMassachusetts Institute of Technology Department of Electrical Engineering and Computer Science : Discrete-Time Signal Processing
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.34: Discrete-Time Signal Processing OpenCourseWare 006 ecture 8 Periodogram Reading: Sections 0.6 and 0.7
More informationEE 602 TERM PAPER PRESENTATION Richa Tripathi Mounika Boppudi FOURIER SERIES BASED MODEL FOR STATISTICAL SIGNAL PROCESSING - CHONG YUNG CHI
EE 602 TERM PAPER PRESENTATION Richa Tripathi Mounika Boppudi FOURIER SERIES BASED MODEL FOR STATISTICAL SIGNAL PROCESSING - CHONG YUNG CHI ABSTRACT The goal of the paper is to present a parametric Fourier
More informationCS 343: Artificial Intelligence
CS 343: Artificial Intelligence Particle Filters and Applications of HMMs Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro
More information-Digital Signal Processing- FIR Filter Design. Lecture May-16
-Digital Signal Processing- FIR Filter Design Lecture-17 24-May-16 FIR Filter Design! FIR filters can also be designed from a frequency response specification.! The equivalent sampled impulse response
More informationEE 5345 Biomedical Instrumentation Lecture 12: slides
EE 5345 Biomedical Instrumentation Lecture 1: slides 4-6 Carlos E. Davila, Electrical Engineering Dept. Southern Methodist University slides can be viewed at: http:// www.seas.smu.edu/~cd/ee5345.html EE
More informationStress detection through emotional speech analysis
Stress detection through emotional speech analysis INMA MOHINO inmaculada.mohino@uah.edu.es ROBERTO GIL-PITA roberto.gil@uah.es LORENA ÁLVAREZ PÉREZ loreduna88@hotmail Abstract: Stress is a reaction or
More informationLinear models. Chapter Overview. Linear process: A process {X n } is a linear process if it has the representation.
Chapter 2 Linear models 2.1 Overview Linear process: A process {X n } is a linear process if it has the representation X n = b j ɛ n j j=0 for all n, where ɛ n N(0, σ 2 ) (Gaussian distributed with zero
More information