Zeros of Z-Transform (ZZT) Representation and Chirp Group Delay Processing for Analysis of Source and Filter Characteristics of Speech Signals


1 Zeros of Z-Transform (ZZT) Representation and Chirp Group Delay Processing for Analysis of Source and Filter Characteristics of Speech Signals. Baris Bozkurt. In collaboration with LIMSI-CNRS, France. 07/03/2017

2 What is new in this thesis?
- We present new spectral representations (ZZT and three group delay based representations) and new algorithms demonstrating applications of these representations in various speech analysis problems: source-tract separation, glottal flow parameter estimation, formant tracking, and feature extraction for Automatic Speech Recognition.
- We study the phase estimation problem in detail and propose solutions to existing problems.
- We discuss the group delay characteristics of a mixed-phase speech model.

3 Contents
- Motivations
- Spectral analysis of signals
- Mixed-phase speech model and group delay characteristics
- Difficulties in group delay processing
- ZZT representation and chirp group delay processing
- Applications: source-tract separation, formant tracking, glottal flow parameter estimation, feature estimation for Automatic Speech Recognition
- Conclusions and Future Work

4 Motivations
Primary motivation: voice quality analysis for TTS. Starting point (after a literature review): spectral methods. Two main problems: source-tract separation, and Fourier transform phase processing.
Potential impact areas for this study:
- Source-tract separation: voice quality analysis, speech synthesis, emotion studies, speech therapy, speaker recognition.
- Phase processing: speech perception, speech recognition, speech coding.
- Group delay characteristics of the mixed-phase speech model: speech processing theory.
Target application -> basic research -> larger impact

5 Contents
- Motivations
- Spectral analysis of signals
- Mixed-phase speech model and group delay characteristics
- Difficulties in group delay processing
- ZZT representation and chirp group delay processing
- Applications: source-tract separation, formant tracking, glottal flow parameter estimation, feature estimation for Automatic Speech Recognition
- Conclusions and Future Work

6 Spectral analysis of signals
z-transform: $X(z) = \sum_{n=0}^{N-1} x(n)\, z^{-n}$
Fourier transform: $X(\omega) = X(z)\big|_{z = e^{j\omega}} = a(\omega) + j\, b(\omega)$
Magnitude: $|X(\omega)| = \sqrt{a(\omega)^2 + b(\omega)^2}$
Phase: $\theta(\omega) = \arctan\big(b(\omega)/a(\omega)\big)$
Group delay: $\tau(\omega) = -\dfrac{d\theta(\omega)}{d\omega}$
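To make these definitions concrete, here is a minimal numerical sketch (assuming NumPy; the function name `spectra` and the FFT size are illustrative, not from the thesis):

```python
import numpy as np

def spectra(x, nfft=1024):
    """Magnitude, unwrapped phase and group delay of a frame, following the
    definitions above; group delay by numerical differentiation of the phase."""
    X = np.fft.rfft(x, nfft)                    # X(w) sampled on the unit circle
    mag = np.abs(X)                             # |X(w)|
    theta = np.unwrap(np.angle(X))              # theta(w), unwrapped
    tau = -np.diff(theta) / (2 * np.pi / nfft)  # tau(w) = -d(theta)/dw
    return mag, theta, tau
```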

7 All-pole filter response and causality
For causality detection, phase processing is essential.

8 Why study group delay processing? For the poles of an all-pole filter it offers:
- Higher resolution
- Tilt-free spectra
- Mixed-phase information
(video: advantagegrpd.avi)

9 Contents
- Motivations
- Spectral analysis of signals
- Mixed-phase speech model and group delay characteristics
- Difficulties in group delay processing
- ZZT representation and chirp group delay processing
- Applications: source-tract separation, formant tracking, glottal flow parameter estimation, feature estimation for Automatic Speech Recognition
- Conclusions and Future Work

10 A mixed-phase model of speech
Maximum-phase glottal flow excitation* convolved with a minimum-phase vocal tract filter (plus the glottal flow return phase) = mixed-phase speech signal.
Important note: the mixed-phase characteristic can only be observed in the phase/group delay spectrum.
*: after Gardner, 1994 and Doval & d'Alessandro

11 Preliminary trials with chirp group delay processing
Not robust, and we don't know the reason (Bozkurt & Dutoit, VOQUAL 2003). A hint came from Prof. Kawahara at VOQUAL'03: windowing may play a role.

12 Contents
- Motivations
- Spectral analysis of signals
- Mixed-phase speech model and group delay characteristics
- Difficulties in group delay processing
- ZZT representation and chirp group delay processing
- Applications: source-tract separation, formant tracking, glottal flow parameter estimation, feature estimation for Automatic Speech Recognition
- Conclusions and Future Work

13 Problems in group delay analysis of speech
Problem: group delay functions are most often very noisy. Reason: roots of the z-transform polynomial close to the unit circle (Yegnanarayana and Murthy, 1989). Conclusion: a systematic study of the roots of the z-transform of speech signals is needed. Thanks to today's technology! (video: difficultywithzeros.avi)

14 Contents
- Motivations
- Spectral analysis of signals
- Mixed-phase speech model and group delay characteristics
- Difficulties in group delay processing
- ZZT representation and chirp group delay processing
- Applications: source-tract separation, formant tracking, glottal flow parameter estimation, feature estimation for Automatic Speech Recognition
- Conclusions and Future Work

15 Zeros of Z-Transform (ZZT) representation
The ZZT representation is the set of zeros of the z-transform polynomial:
$X(z) = \sum_{n=0}^{N-1} x(n)\, z^{-n} = x(0)\, z^{-(N-1)} \prod_{m=1}^{N-1} (z - Z_m)$
Almost impossible to study analytically for most functions, therefore numerical methods are used (the roots function of Matlab).
Basic elementary signal, the power series $x(n) = a^n$, $n = 0, 1, \ldots, N-1$:
$X(z) = \sum_{n=0}^{N-1} a^n z^{-n} = \dfrac{z^N - a^N}{z^{N-1}\,(z - a)}$
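A minimal sketch of the numerical computation, using NumPy's roots in place of Matlab's (the function name `zzt` is illustrative):

```python
import numpy as np

def zzt(frame):
    """Zeros of the Z-Transform of a frame x[0..N-1]: the roots of the
    polynomial x[0] z^(N-1) + x[1] z^(N-2) + ... + x[N-1]."""
    x = np.trim_zeros(np.asarray(frame, dtype=float))  # end zeros give degenerate roots
    return np.roots(x)  # numerical root finding, as with Matlab's roots()
```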

16 ZZT of elementary signals
ZZT of a damped sinusoid; ZZT of a causal all-pole filter response. (videos: expCoeffInDampedSinusoid.avi, causalresponsezeros1.avi)

17 Zero patterns for the LF model* of the glottal flow derivative
First phase: $g(t) = E_0\, e^{\alpha t} \sin(\omega_g t)$, for $0 \le t \le t_e$.
Return phase: $g(t) = -\dfrac{E_e}{\varepsilon T_a}\left(e^{-\varepsilon (t - t_e)} - e^{-\varepsilon (t_c - t_e)}\right)$, for $t_e \le t \le t_c$.
*: Fant et al., 1985
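A sketch of the model as code (parameter values are illustrative, not those used in the thesis; the constraint on $\varepsilon T_a$ follows from requiring $g(t_e) = -E_e$):

```python
import numpy as np

def lf_glottal_flow_derivative(t, E0=1.0, alpha=600.0, wg=2 * np.pi * 150,
                               Ee=0.5, eps=8000.0, te=0.006, tc=0.008):
    """Glottal flow derivative under the LF model (sketch).
    First phase:  g(t) =  E0 * exp(alpha*t) * sin(wg*t),        0 <= t <= te
    Return phase: g(t) = -(Ee/(eps*Ta)) * (exp(-eps*(t - te))
                          - exp(-eps*(tc - te))),               te < t <= tc
    with eps*Ta = 1 - exp(-eps*(tc - te)), so that g(te) = -Ee.
    """
    t = np.asarray(t, dtype=float)
    Ta = (1.0 - np.exp(-eps * (tc - te))) / eps
    first = E0 * np.exp(alpha * t) * np.sin(wg * t)
    ret = -(Ee / (eps * Ta)) * (np.exp(-eps * (t - te)) - np.exp(-eps * (tc - te)))
    return np.where(t <= te, first, np.where(t <= tc, ret, 0.0))
```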

18 ZZT representation of speech
Synthetic mixed-phase speech: periodicity results in many zeros on the unit circle; the first phase of the glottal flow adds zeros outside the unit circle; the vocal tract response zeros lie inside the unit circle. AND THE EFFECT OF WINDOWING ON THE ZZT IS DRASTIC!

19 All-zero representation of windowed speech
Non-GCI-synchronous windowing vs. GCI-synchronous windowing; rectangular window in both cases.

20 All-zero representation of speech: window location effect on ZZT plots (2T0 case)
Windowing of synthetic speech; windowing of real speech. (videos: synth2t0blackman.avi, real2t0blackman.avi)

21 Window function effect on ZZT and group delay
The best choices are Blackman, Gaussian and Hanning-Poisson. Smoothness can be adjusted by varying the coefficients of the Gaussian and Hanning-Poisson windows.

22 Window size effect on ZZT
What do other people do? Pitch-asynchronous analysis, 3T0 window size, Hamming window -> all bad choices for phase processing.

23 Recently proposed group delay representations in the literature
Group delay: $\tau_p(\omega) = \dfrac{X_R(\omega) Y_R(\omega) + X_I(\omega) Y_I(\omega)}{|X(\omega)|^2}$, where $X_R(\omega) = \mathrm{real}\{\mathrm{FT}\{x[n]\}\}$, $Y_I(\omega) = \mathrm{imag}\{\mathrm{FT}\{n\, x[n]\}\}$, and analogously for $X_I$, $Y_R$. The $|X(\omega)|^2$ denominator is responsible for the spikes.
Modified group delay function, MODGDF (Hegde et al., ICSLP 2004): $\tau_p(\omega) = \dfrac{X_R(\omega) Y_R(\omega) + X_I(\omega) Y_I(\omega)}{S(\omega)^{2\gamma}}$; the denominator is replaced by a cepstrally smoothed version $S(\omega)$.
Product spectrum, PS (Zhu and Paliwal, ICASSP 2004): $Q_p(\omega) = \tau_p(\omega)\, |X(\omega)|^2 = X_R(\omega) Y_R(\omega) + X_I(\omega) Y_I(\omega)$; the denominator is completely removed.
Originality in our approach: studying zero patterns, trying to find means of avoiding/removing zeros close to the unit circle.
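The standard FFT-based group delay identity above can be sketched as follows (NumPy; the small floor on the denominator is exactly where the spike problem lives):

```python
import numpy as np

def group_delay(x, nfft=1024):
    """Group delay via tau(w) = (X_R*Y_R + X_I*Y_I) / |X(w)|^2,
    with X = FT{x[n]} and Y = FT{n*x[n]}."""
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    X = np.fft.rfft(x, nfft)
    Y = np.fft.rfft(n * x, nfft)
    denom = np.abs(X) ** 2  # near-zero when a zero sits close to the unit circle
    return (X.real * Y.real + X.imag * Y.imag) / np.maximum(denom, 1e-12)
```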

24 ZZT and group delay of GCI-synchronously windowed speech (GDGCI)
Panels: ZZT, amplitude spectrum, and GDGCI (the group delay of GCI-synchronously windowed speech).

25 Group delay spectrogram using GDGCI (Hanning-Poisson window, 2T0)
The formant frequencies of a given speech signal can be estimated from the phase spectrum once windowing is properly performed.

26 Chirp group delay of GCI-synchronously windowed speech (CGDGCI)
Basic ideas:
- remove unwanted zeros
- compute the chirp group delay on a circle away from the remaining zeros: CGD outside the unit circle directly from the signal after zero removal, and likewise CGD inside the unit circle.
Disadvantages: computationally heavy, GCI-synchronous.

27 Chirp group delay of the zero-phase version of the signal (CGDZP)
GCI-synchronous processing is not practical for ASR. Pipeline: speech -> windowing -> FFT -> abs -> IFFT (zero phasing) -> CGD at R = 1.12 -> CGDZP. Constant frame size/shift; R optimized by testing performance for incrementing values. (videos: whyzerophase.avi, CGDcompared2AmpSpec.avi)
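A sketch of the CGDZP pipeline under these choices (NumPy; details such as the handling of the symmetric zero-phase buffer are simplified here, so this is an assumption-laden sketch, not the thesis implementation):

```python
import numpy as np

def cgdzp(frame, radius=1.12, nfft=1024):
    """Chirp group delay of the zero-phase version of a frame (sketch).
    1. Zero-phasing: keep |FFT|, discard phase, return to the time domain.
    2. Chirp evaluation on |z| = radius: X(R e^{jw}) = FT{x[n] R^(-n)},
       then the usual group delay identity on that contour."""
    x = np.asarray(frame, dtype=float)
    zp = np.real(np.fft.ifft(np.abs(np.fft.fft(x, nfft))))  # zero-phase version
    n = np.arange(nfft)
    xr = zp * radius ** (-n)       # exponential weighting = chirp contour
    X = np.fft.rfft(xr, nfft)
    Y = np.fft.rfft(n * xr, nfft)
    return (X.real * Y.real + X.imag * Y.imag) / np.maximum(np.abs(X) ** 2, 1e-12)
```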

28 Contents
- Motivations
- Spectral analysis of signals
- Mixed-phase speech model and group delay characteristics
- Difficulties in group delay processing
- ZZT representation and chirp group delay processing
- Applications: source-tract separation, formant tracking, glottal flow parameter estimation, feature estimation for Automatic Speech Recognition
- Conclusions and Future Work

29 Zero-decomposition for source-tract separation
GCI detection after Kawahara et al. (video: GCIdetection.avi). Bozkurt et al., ICSLP 2004-a.
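A sketch of the decomposition step (NumPy; function names are illustrative): the zeros are sorted by radius according to the mixed-phase model, and each set can be turned back into an amplitude spectrum:

```python
import numpy as np

def zzt_decompose(frame):
    """Split the ZZT of a GCI-synchronously windowed frame by radius:
    zeros outside the unit circle -> anticausal (glottal source) part,
    zeros inside -> causal (vocal tract) part, per the mixed-phase model."""
    x = np.trim_zeros(np.asarray(frame, dtype=float))
    z = np.roots(x)
    return z[np.abs(z) > 1.0], z[np.abs(z) <= 1.0], x[0]

def amp_spectrum_from_zeros(zeros, gain=1.0, nbins=512):
    """Amplitude spectrum of one component rebuilt from its zero set:
    |X(e^{jw})| = |gain| * prod_m |e^{jw} - Z_m| (log domain for stability)."""
    w = np.linspace(0.0, np.pi, nbins)
    e = np.exp(1j * w)
    logmag = np.log(np.abs(e[:, None] - zeros[None, :])).sum(axis=1)
    return np.abs(gain) * np.exp(logmag)
```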

30 Zero-decomposition for source-tract separation: synthetic speech
Synthetic glottal excitation and synthetic speech; original windowed speech; original amplitude spectrum; ZZT; zero-decomposition. Original vs. reconstructed glottal excitation, glottal amplitude spectrum, vocal tract response and tract transfer function.

31 Zero-decomposition for source-tract separation: real speech
Original windowed speech; original amplitude spectrum; ZZT; zero-decomposition. Reconstructed glottal excitation, glottal amplitude spectrum, vocal tract response and tract transfer function. Copy-synthesis with a noise-excited tract.

32 Comparative example: ZZT-decomposition and PSIAIF
Glottal flow (GF) and differential glottal flow (DGF): original, ZZT-decomposition estimate, PSIAIF estimate. Amplitude spectrum and group delay of the DGF.

33 Robustness of ZZT-decomposition
To GCI estimation errors; to F1 variation; to noise; to return phase variations.

34 Contents
- Motivations
- Spectral analysis of signals
- Mixed-phase speech model and group delay characteristics
- Difficulties in group delay processing
- ZZT representation and chirp group delay processing
- Applications: source-tract separation, glottal flow parameter estimation, formant tracking, feature estimation for Automatic Speech Recognition
- Conclusions and Future Work

35 Application 2: ZZT + group delay processing for glottal formant frequency (Fg) estimation
Synthetic vowels (a, u, i) and real speech at f0 = 100 Hz and f0 = 200 Hz; open quotient (OQ) obtained from EGG. NB: Fg = f(F0, 1/OpenQuotient, Asym.) (Doval et al., VOQUAL 2003). *Acknowledgement: open quotient estimate provided by Nathalie Henrich (Henrich, N., et al.).

36 Contents
- Motivations
- Spectral analysis of signals
- Mixed-phase speech model and group delay characteristics
- Difficulties in group delay processing
- ZZT representation and chirp group delay processing
- Applications: source-tract separation, glottal flow parameter estimation, formant tracking, feature estimation for Automatic Speech Recognition
- Conclusions and Future Work

37 Application 3: ZZT + group delay processing for formant tracking with CGDGCI (DPPT)
Candidates: DPPT, WinSnoori, Praat (the latter two publicly available tools). Conclusion: results, combined with real speech tests, show that Praat and DPPT are comparable in quality and superior to WinSnoori. The disadvantage of DPPT is its low speed and its dependency on GCI estimation.
Evaluation: average percentage error and formant miss rate for F1-F4, for DPPT, WinSnoori and Praat.

38 Application 3: ZZT + group delay processing for formant tracking with CGDZP (Fast-DPPT)
Inputs: speech data; frame size, frame shift, number of formants to track. Pipeline: fixed frame size and frame shift, Blackman windowing -> computation of the zero-phase version of the signal -> computation of the chirp group delay outside the unit circle (CGDZP) -> peak picking -> is the number of peaks equal to the number of formants to track? If NO, decrement/increment the radius of the analysis circle and repeat; if YES, output the formant frequencies.
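The radius-adjustment loop can be sketched as follows (assuming the cgdzp() sketch given earlier and SciPy's find_peaks; the function name, step size and defaults are illustrative):

```python
import numpy as np
from scipy.signal import find_peaks

def fast_dppt_frame(frame, n_formants=4, radius=1.12, step=0.01,
                    max_iter=20, fs=16000, nfft=1024):
    """One Fast-DPPT frame (sketch): adjust the analysis-circle radius until
    the CGDZP curve shows exactly n_formants peaks, then read them off as
    formant frequencies. cgdzp() is the sketch given earlier."""
    for _ in range(max_iter):
        cgd = cgdzp(frame, radius=radius, nfft=nfft)
        peaks, _ = find_peaks(cgd)
        if len(peaks) == n_formants:
            return peaks * fs / nfft                # FFT bin -> Hz
        # too many peaks: move the circle farther from the zeros; too few: closer
        radius += step if len(peaks) > n_formants else -step
    return None                                     # no suitable radius found
```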

39 Formant tracking tests on real speech
Stimuli: 10 real speech examples (5 female, 5 male) with large formant variations. Candidates: Fast-DPPT, WaveSurfer, Praat (the latter two publicly available tools). Conclusion: they have similar quality; Fast-DPPT lacks a post-processing module for guaranteeing continuity of the tracks.

40 Contents
- Motivations
- Spectral analysis of signals
- Mixed-phase speech model and group delay characteristics
- Difficulties in group delay processing
- ZZT representation and chirp group delay processing
- Applications: source-tract separation, glottal flow parameter estimation, formant tracking, feature estimation for Automatic Speech Recognition
- Conclusions and Future Work

41 Application of the mixed-phase model: MixLP* for glottal flow parameter estimation
Most existing LP methods can only estimate resonances for the minimum-phase version of the signal; poles outside the unit circle are avoided. In MixLP, we explicitly look for poles outside the unit circle. Conclusion: works well for synthetic speech, but not robust for analyzing real speech. (Bozkurt, Severin, Dutoit, 2004; *: implemented and tested by Francois Severin)

42 Contents
- Motivations
- Spectral analysis of signals
- Mixed-phase speech model and group delay characteristics
- Difficulties in group delay processing
- ZZT representation and chirp group delay processing
- Applications: source-tract separation, glottal flow parameter estimation, formant tracking, feature estimation for Automatic Speech Recognition
- Conclusions and Future Work

43 Application 5: ZZT + group delay processing for Automatic Speech Recognition (ASR)

44 Computation of ASR features
ASR system: speech signal -> front-end -> acoustic model (MLP, HMM topology) -> word decoder (lexicon, grammar) -> word sequence.
MFCC as the baseline method. Alternative method: replace the power spectrum function in MFCC by group delay functions (MODGDF, PS, GDGCI, CGDGCI, CGDZP).
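The replacement amounts to feeding a group delay curve, instead of the power spectrum, into the usual mel filterbank + log + DCT chain. A sketch of that idea (the mel_fb filterbank matrix is an assumed precomputed input; names are illustrative):

```python
import numpy as np
from scipy.fftpack import dct

def gd_cepstral_features(gd_curve, mel_fb, n_ceps=13):
    """MFCC-style features with the power spectrum replaced by a group
    delay function (e.g. CGDZP). mel_fb: (n_mels x n_bins) filterbank."""
    mel = mel_fb @ np.abs(gd_curve)           # mel-band integration of the GD curve
    logmel = np.log(np.maximum(mel, 1e-10))   # log compression, as in MFCC
    return dct(logmel, norm='ortho')[:n_ceps] # cepstral decorrelation
```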

45 Combining acoustic models
Speech signal -> MFCC front-end -> acoustic model 1; speech signal -> alternative front-end (MODGDF, PS, GDGCI, CGDGCI or CGDZP) -> acoustic model 2; both feed a combination acoustic model -> word sequence.
HMM state probabilities (MLP outputs) are combined at frame level as a weighted geometric average,
$P_{12}(s_i \mid v_t) = P_1(s_i \mid v_t)^{\lambda}\, P_2(s_i \mid v_t)^{1-\lambda}$,
with $\lambda$ optimized between 0 and 1.
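A minimal sketch of this frame-level combination rule (names illustrative):

```python
import numpy as np

def combine_posteriors(p1, p2, lam=0.5):
    """Weighted geometric average of two models' HMM-state posteriors,
    P12 = P1^lam * P2^(1-lam), renormalized over states."""
    p = (p1 ** lam) * (p2 ** (1.0 - lam))
    return p / p.sum()
```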

46 Application 4: ZZT + group delay processing for Automatic Speech Recognition (ASR)*
The proposed group delay representations are compared, in an ASR experiment, with representations proposed in recent studies [Hegde 2004, Alsteris 2004]. Our proposed techniques provide better results and have the potential to improve ASR performance.
Performance of the ASR system (word error rate, WER, in percent) for various feature extraction methods (MFCC, MODGDF, PS, GDGCI, CGDGCI, CGDZP) on the AURORA-2 task, with the lexicon reduced to English digits and no grammar applied; for MFCC, WER across the SNR (dB) conditions: 1.9, 6.7, 18.6, 45.2, 75.1, 88.8, 91.5.
Training: 8440 noise-free utterances spoken by 110 speakers. Evaluation: 4004 different noise-free utterances spoken by 104 other speakers.
*: ASR tests handled by Laurent Couvreur / TCTS Lab

47 Deficiencies, future work, and work not included in the thesis
Some of the algorithms are not completely tested due to time constraints and therefore stand only as demonstrations of potential: the ZZT-decomposition is not thoroughly tested, and the ASR tests are limited. It is an experimental study; the analytical part is weak, since studying zero locations analytically is difficult if not impossible.
Future work: voice quality labeling/classification; source-tract decomposition using the complex cepstrum; restudying phase-related problems in speech processing.
Studies on TTS during the thesis period are not included in the thesis.
