Zeros of z-transform (ZZT) representation and chirp group delay processing for analysis of source and filter characteristics of speech signals
1 Zeros of z-transform (ZZT) representation and chirp group delay processing for analysis of source and filter characteristics of speech signals. Baris Bozkurt. Collaboration with LIMSI-CNRS, France. 07/03/2017
2 What is new in this thesis?
- We present new spectral representations, ZZT and three group delay based representations, and new algorithms demonstrating applications of these representations in various speech analysis problems: source-tract separation, glottal flow parameter estimation, formant tracking, and feature extraction for Automatic Speech Recognition.
- We study in detail the phase estimation problem and propose solutions to existing problems.
- We discuss group delay characteristics of a mixed-phase speech model.
3 Contents
- Motivations
- Spectral analysis of signals
- Mixed-phase speech model and group delay characteristics
- Difficulties in group delay processing
- ZZT representation and chirp group delay processing
- Applications: source-tract separation, formant tracking, glottal flow parameter estimation, feature extraction for Automatic Speech Recognition
- Conclusions and future work
4 Motivations
Primary motivation: voice quality analysis for TTS.
Starting point (after a literature review): spectral methods.
Two main problems: source-tract separation, and Fourier transform phase processing.
Potential impact areas for this study:
- Source-tract separation: voice quality analysis, speech synthesis, emotion studies, speech therapy, speaker recognition.
- Phase processing: speech perception, speech recognition, speech coding.
- Group delay characteristics of the mixed-phase speech model: speech processing theory.
target application -> basic research -> larger impact
6 Spectral analysis of signals
z-transform: $X(z) = \sum_{n=0}^{N-1} x(n)\, z^{-n}$
Fourier transform: $X(\omega) = X(z)|_{z = e^{j\omega}} = a(\omega) + j\, b(\omega)$
Magnitude: $|X(\omega)| = \sqrt{a(\omega)^2 + b(\omega)^2}$
Phase: $\theta(\omega) = \arctan\!\left(\frac{b(\omega)}{a(\omega)}\right)$
Group delay: $\tau(\omega) = -\frac{d\theta(\omega)}{d\omega}$
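These definitions translate directly into a few lines of NumPy. The following is an illustrative sketch (not code from the thesis), using plain numerical differentiation of the unwrapped phase for the group delay:

```python
import numpy as np

def spectral_quantities(x, n_fft=512):
    """Magnitude, phase and group delay of a signal, following the slide:
    |X(w)| = sqrt(a(w)^2 + b(w)^2), theta(w) = arctan(b/a),
    tau(w) = -d theta(w)/dw (here by numerical differentiation)."""
    X = np.fft.rfft(x, n_fft)
    magnitude = np.abs(X)
    phase = np.unwrap(np.angle(X))        # continuous phase theta(w)
    dw = 2.0 * np.pi / n_fft              # spacing between frequency bins
    group_delay = -np.diff(phase) / dw    # tau(w) = -d theta / dw
    return magnitude, phase, group_delay

# Sanity check: a pure delay x[n] = delta[n - 5] has flat magnitude and a
# group delay of exactly 5 samples at every frequency.
x = np.zeros(64)
x[5] = 1.0
mag, ph, gd = spectral_quantities(x)
```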
7 All-pole filter response and causality
For causality detection, phase processing is essential.
8 Why study group delay processing? Poles of an all-pole filter:
- Higher resolution
- Tilt free
- Mixed-phase information
(video: advantagegrpd.avi)
10 A mixed-phase model of speech
Maximum-phase glottal flow excitation* convolved with the minimum-phase vocal tract filter (plus the glottal flow return phase) gives the mixed-phase speech signal.
Important note: the mixed-phase characteristic can only be observed in the phase/group delay spectrum.
*: after Gardner 1994 and Doval & d'Alessandro
11 Preliminary trials with chirp group delay processing
Not robust, and we don't know the reason (Bozkurt & Dutoit, VOQUAL 2003).
Hint comes from Prof. Kawahara at VOQUAL 2003: windowing may play a role.
13 Problems in group delay analysis of speech
Problem: group delay functions are most often very noisy.
Reason: roots of the z-transform polynomial close to the unit circle (Yegnanarayana and Murthy, 1989).
Conclusion: a systematic study of roots of the z-transform for speech signals is needed. Thanks to today's technology!
(video: difficultywithzeros.avi)
15 Zeros of Z-Transform (ZZT) representation
ZZT representation: the set of zeros of the z-transform polynomial,
$X(z) = \sum_{n=0}^{N-1} x(n)\, z^{-n} = x(0)\, z^{-N+1} \prod_{m=1}^{N-1} (z - Z_m)$
Almost impossible to study analytically for most functions, therefore numerical methods are used (the roots function of Matlab).
Basic elementary signal, the power series $x(n) = a^n$, $n = 0, 1, \ldots, N-1$:
$X(z) = \sum_{n=0}^{N-1} a^n z^{-n} = \frac{z^N - a^N}{z^{N-1}(z - a)}$
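As a sketch of how the ZZT is obtained numerically, using NumPy's `roots` (the equivalent of the Matlab `roots` function mentioned above) on the power series example from the slide:

```python
import numpy as np

def zzt(x):
    """Zeros of the z-transform of a finite signal x[0..N-1]: the roots of
    the polynomial x[0] z^{N-1} + x[1] z^{N-2} + ... + x[N-1]."""
    x = np.trim_zeros(np.asarray(x, dtype=float), 'f')  # drop leading zeros
    return np.roots(x)

# For x[n] = a^n, n = 0..N-1, X(z) = (z^N - a^N) / (z^{N-1}(z - a)), so the
# N-1 zeros are the N-th roots of a^N except z = a itself: all of them lie
# on a circle of radius |a|.
a, N = 0.9, 16
z = zzt(a ** np.arange(N))
```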
16 ZZT of elementary signals
ZZT of a damped sinusoid (video: expCoeffInDampedSinusoid.avi)
ZZT of a causal all-pole filter response (video: causalresponsezeros1.avi)
17 Zero-patterns for the LF model* of the glottal flow derivative
First phase: $g(t) = E_0\, e^{\alpha t} \sin(\omega_g t)$, $0 \le t \le t_e$
Return phase: $g(t) = -\frac{E_e}{\varepsilon T_a}\left[e^{-\varepsilon (t - t_e)} - e^{-\varepsilon (t_c - t_e)}\right]$, $t_e \le t \le t_c$
*: Fant et al., 1985
18 ZZT representation of speech
Synthetic mixed-phase speech:
- periodicity results in many zeros on the unit circle
- the first phase of the glottal flow adds zeros outside the unit circle
- vocal tract response zeros lie inside the unit circle
AND THE WINDOWING EFFECT ON ZZT IS DRASTIC!
19 All-zero representation of windowed speech
Non-GCI-synchronous windowing vs. GCI-synchronous windowing (rectangular window in both cases).
20 All-zero representation of speech: window location effect on ZZT plots (2T0 case)
Windowing of synthetic speech (video: synth2t0blackman.avi)
Windowing of real speech (video: real2t0blackman.avi)
21 Window function effect on ZZT and group delay
Best choices are Blackman, Gaussian and Hanning-Poisson.
Smoothness can be adjusted by varying coefficients in the Gaussian and Hanning-Poisson windows.
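The Hanning-Poisson window named above can be sketched as a Hann window multiplied by a two-sided (Poisson) exponential; the exact parameterization below (decay normalized by the window half-length) is an assumption of this illustration, with `alpha` playing the role of the adjustable smoothness coefficient:

```python
import numpy as np

def hanning_poisson(N, alpha=2.0):
    """Hanning-Poisson window: a Hann window times a two-sided exponential.
    alpha controls the exponential decay, hence the spectral smoothness."""
    n = np.arange(N) - (N - 1) / 2.0    # index centered on the window middle
    hann = 0.5 * (1.0 + np.cos(2.0 * np.pi * n / (N - 1)))
    poisson = np.exp(-2.0 * alpha * np.abs(n) / (N - 1))
    return hann * poisson

w = hanning_poisson(101, alpha=2.0)   # peaks at 1 in the middle, 0 at edges
```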
22 Window size effect on ZZT
What do other people do? Pitch-asynchronous analysis, 3T0 window size, Hamming window -> all are bad choices for phase processing.
23 Recently proposed group delay representations in the literature
Group delay: $\tau_p(\omega) = \frac{X_R(\omega) Y_R(\omega) + X_I(\omega) Y_I(\omega)}{|X(\omega)|^2}$
where $X_R(\omega) = \mathrm{real}\{FT\{x[n]\}\}$, $Y_I(\omega) = \mathrm{imag}\{FT\{n\, x[n]\}\}$, etc. The denominator $|X(\omega)|^2$ is responsible for the spikes.
Modified group delay function, MODGDF (Hegde et al., ICSLP 2004): the denominator is replaced by a cepstrally smoothed version, $\tau_p(\omega) = \frac{X_R(\omega) Y_R(\omega) + X_I(\omega) Y_I(\omega)}{S(\omega)^{2\gamma}}$
Product spectrum, PS (Zhu and Paliwal, ICASSP 2004): the denominator is completely removed, $Q(\omega) = |X(\omega)|^2\, \tau_p(\omega) = X_R(\omega) Y_R(\omega) + X_I(\omega) Y_I(\omega)$
Originality in our approach: studying zero patterns, trying to find means of avoiding/removing zeros close to the unit circle.
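Both representations share the numerator $X_R Y_R + X_I Y_I$. A minimal NumPy sketch (the cepstral smoothing step of MODGDF is omitted here):

```python
import numpy as np

def group_delay_ft(x, n_fft=512):
    """Group delay via the FT identity on the slide:
    tau_p(w) = (X_R Y_R + X_I Y_I) / |X|^2, with X = FT{x[n]}, Y = FT{n x[n]}.
    Also returns the product spectrum, which drops the denominator."""
    n = np.arange(len(x))
    X = np.fft.rfft(x, n_fft)
    Y = np.fft.rfft(n * x, n_fft)
    num = X.real * Y.real + X.imag * Y.imag   # shared numerator
    tau_p = num / (np.abs(X) ** 2)            # spiky when |X| is near zero
    product_spectrum = num                    # PS: denominator removed
    return tau_p, product_spectrum

# For a pure 5-sample delay both quantities are flat and equal to 5.
x = np.zeros(64)
x[5] = 1.0
tau, ps = group_delay_ft(x)
```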
24 ZZT and group delay of GCI-synchronously windowed speech (GDGCI)
Figure: ZZT, amplitude spectrum and GDGCI of GCI-synchronously windowed speech.
25 Group delay spectrogram using GDGCI (Hanning-Poisson window, 2T0)
The formant frequencies of a given speech signal can be estimated from the phase spectrum once windowing is properly performed.
26 Chirp group delay of GCI-synchronously windowed speech (CGDGCI)
Basic ideas: remove unwanted zeros, and compute the chirp group delay away from the remaining zeros.
- CGD outside the unit circle, directly from the signal after zero removal
- CGD inside the unit circle, directly from the signal after zero removal
Disadvantages: computationally heavy, GCI-synchronous.
27 Chirp group delay of the zero-phase version of the signal (CGDZP)
GCI-synchronous processing is not practical for ASR.
Pipeline: speech -> windowing -> fft -> abs -> ifft (zero-phasing) -> CGD at R = 1.12.
Constant frame size/shift, optimized by testing performance for incrementing values.
(videos: whyzerophase.avi, CGDcompared2AmpSpec.avi)
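A rough sketch of this pipeline. Two assumptions of the sketch, not stated on the slide: the chirp group delay on $|z| = R$ is obtained by damping the signal by $R^{-n}$ before the FFT (a standard way of evaluating the z-transform on a circle of radius R), and a small epsilon guards the division:

```python
import numpy as np

def cgdzp(x, n_fft=512, R=1.12):
    """Sketch of CGDZP: chirp group delay of the zero-phase version of the
    signal, computed on the circle |z| = R."""
    # Zero-phasing: inverse FFT of the magnitude spectrum.
    zp = np.fft.ifft(np.abs(np.fft.fft(x, n_fft))).real
    n = np.arange(n_fft)
    damped = zp * R ** (-n.astype(float))   # evaluate on the circle |z| = R
    X = np.fft.rfft(damped)
    Y = np.fft.rfft(n * damped)
    # Guard epsilon added in this sketch to keep the division finite.
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + 1e-12)

# A damped sinusoid stands in for a windowed speech frame.
frame = np.sin(0.3 * np.arange(256)) * 0.98 ** np.arange(256)
cgd = cgdzp(frame)
```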
29 Zero-decomposition for source-tract separation
GCI detection after Kawahara et al. (video: GCIdetection.avi)
Bozkurt et al., ICSLP 2004-a
30 Zero-decomposition for source-tract separation (synthetic speech)
Figure: ZZT and zero-decomposition of the original windowed speech and its amplitude spectrum; original vs. reconstructed glottal excitation and glottal amplitude spectrum; original vs. reconstructed vocal tract response and tract transfer function.
31 Zero-decomposition for source-tract separation (real speech)
Figure: ZZT and zero-decomposition of the original windowed speech and its amplitude spectrum; reconstructed glottal excitation and glottal amplitude spectrum; reconstructed vocal tract response and tract transfer function. Copy-synth with a noise-excited tract.
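The decomposition on these slides can be sketched as: compute the ZZT, split the zeros at the unit circle (outside = maximum-phase glottal-source part, inside = minimum-phase vocal-tract part), and evaluate each factor's spectrum. A minimal NumPy illustration (log-magnitude only; real use is GCI-synchronous with careful windowing, as discussed earlier):

```python
import numpy as np

def zzt_decompose(x, n_fft=64):
    """Split the zeros of the z-transform at the unit circle and return the
    log-magnitude spectrum of the outside and inside factors."""
    x = np.trim_zeros(np.asarray(x, dtype=float), 'f')
    zeros = np.roots(x)
    w = np.exp(2j * np.pi * np.arange(n_fft // 2 + 1) / n_fft)

    def log_mag(zs):
        # log |prod_m (e^{jw} - Z_m)|, as a sum of logs to avoid overflow
        return np.sum(np.log(np.abs(w[:, None] - zs[None, :])), axis=1)

    outside = log_mag(zeros[np.abs(zeros) > 1.0])
    inside = log_mag(zeros[np.abs(zeros) <= 1.0])
    return outside, inside

# A decaying exponential has all its zeros inside the unit circle, so the
# "outside" part is empty and the "inside" part carries the whole spectrum.
x = 0.9 ** np.arange(16)
lo_out, lo_in = zzt_decompose(x)
```

Because x[0] = 1, the two parts add up exactly to the log-magnitude spectrum of the signal, which is the property the decomposition relies on.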
32 Comparative example: ZZT-decomposition and PSIAIF
Glottal flow (GF) and differential glottal flow (DGF): original, ZZT-decomposition estimate, PSIAIF estimate; amplitude spectrum of the DGF, group delay of the DGF.
33 Robustness of ZZT-decomposition
To GCI estimation errors; to F1 variation; to noise; to return phase variations.
35 Application 2: ZZT + group delay processing for glottal formant frequency (Fg) estimation
Synthetic vowels a, u, i; real speech at f0 = 100 Hz and f0 = 200 Hz. OQ? From EGG.
NB: Fg = f(F0, 1/OpenQuotient, Asymmetry) (Doval et al., VOQUAL 2003).
Acknowledgement: open quotient estimate provided by Nathalie Henrich (Henrich, N., et al.).
37 Application 3: ZZT + group delay processing. Formant tracking with CGDGCI: DPPT
Candidates: DPPT, WinSnoori, Praat (two publicly available tools).
Conclusion: results combined with real speech tests show that Praat and DPPT are comparable in quality and superior to WinSnoori. The disadvantages of DPPT are its low speed and its dependency on GCI estimation.
Table: average percentage error and formant miss rate for F1-F4, for DPPT, WinSnoori and Praat.
38 Application 3: ZZT + group delay processing. Formant tracking with CGDZP: Fast-DPPT
1. Speech data: fixed frame size and frame shift, Blackman windowing (parameters: frame size, frame shift, number of formants to track).
2. Compute the zero-phase version of the signal.
3. Compute the chirp group delay outside the unit circle (CGDZP).
4. Peak picking: if the number of peaks equals the number of formants to track, output the formant frequencies; otherwise decrement/increment the radius of the analysis circle and repeat.
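The feedback loop of these steps can be sketched as follows. `chirp_gd` and the toy spectrum are hypothetical stand-ins, and the direction chosen here (a larger analysis radius smooths the spectrum, removing peaks) is an assumption of the sketch:

```python
import numpy as np

def count_peaks(spectrum):
    """Indices of strict local maxima: a stand-in for the peak-picking box."""
    s = np.asarray(spectrum)
    return np.flatnonzero((s[1:-1] > s[:-2]) & (s[1:-1] > s[2:])) + 1

def pick_formants(chirp_gd, n_formants, radius=1.12, step=0.01, max_iter=50):
    """Fast-DPPT loop sketch: chirp_gd(R) is assumed to return a chirp group
    delay spectrum computed on |z| = R; the radius is nudged until the
    number of peaks matches the number of formants to track."""
    for _ in range(max_iter):
        peaks = count_peaks(chirp_gd(radius))
        if len(peaks) == n_formants:
            return peaks, radius
        radius += step if len(peaks) > n_formants else -step
    return peaks, radius

# A hypothetical stand-in spectrum whose ripple count falls as R grows,
# just to exercise the loop (not a model of real speech):
def toy_cgd(R):
    n = max(1, int(round((1.5 - R) * 10)))
    w = np.linspace(0.0, 1.0, 101)
    return np.sin(2.0 * np.pi * n * w)

peaks, final_radius = pick_formants(toy_cgd, n_formants=1)
```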
39 Formant tracking tests on real speech
Stimuli: 10 real speech examples (5 female, 5 male) with large formant variations.
Candidates: Fast-DPPT, WaveSurfer, Praat (two publicly available tools).
Conclusion: they have similar quality; Fast-DPPT lacks a post-processing module for guaranteeing continuity of tracks.
41 Application of the mixed-phase model (MixLP*) for glottal flow parameter estimation
Most of the existing LP methods can only estimate resonances for the minimum-phase version of the signal: poles outside the unit circle are avoided. In MixLP, we look for poles outside the unit circle.
Conclusion: works well for synthetic speech, not robust for analyzing real speech.
Bozkurt, Severin, Dutoit, 2004. *: implemented and tested by Francois Severin.
43 Application 5: ZZT + group delay processing for Automatic Speech Recognition (ASR)
44 Computation of ASR features
ASR system: speech signal -> front-end -> acoustic model (MLP, HMM topology) -> word decoder (lexicon, grammar) -> word sequence.
MFCC as the baseline method. Alternative method: replace the power spectrum function in MFCC by group delay functions (MODGDF, PS, GDGCI, CGDGCI, CGDZP).
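The alternative front-end can be sketched as the standard MFCC pipeline with the power spectrum swapped for any spectral vector (e.g. one CGDZP frame). Since group delay values can be negative, the log step is dropped here, which is an assumption of this sketch rather than the thesis recipe:

```python
import numpy as np
from scipy.fftpack import dct

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel filterbank (standard construction, sketched here)."""
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    return fb

def gd_features(spectrum, n_filters=26, n_ceps=13, sr=8000):
    """MFCC-style features for an arbitrary spectral representation:
    mel filterbank, then DCT (no log, since group delay can be negative)."""
    n_fft = 2 * (len(spectrum) - 1)
    fb = mel_filterbank(n_filters, n_fft, sr)
    return dct(fb @ spectrum, type=2, norm='ortho')[:n_ceps]

spec = np.cos(np.linspace(0.0, 3.0, 257))   # any spectral vector stands in
feats = gd_features(spec)
```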
45 Combining acoustic models
Speech signal -> MFCC front-end and alternative front-end (MODGDF, PS, GDGCI, CGDGCI or CGDZP) -> acoustic models 1 and 2 -> combined acoustic model -> word sequence.
Combine HMM state probabilities (MLP outputs) at frame level as a weighted geometric average, $P_{12}(s_i \mid v_t) = P_1(s_i \mid v_t)^{\lambda}\, P_2(s_i \mid v_t)^{1-\lambda}$, with $\lambda$ optimized between 0 and 1.
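The combination rule translates directly; the per-frame renormalization below is an addition of this sketch (not stated on the slide) so the combined scores remain a distribution over states:

```python
import numpy as np

def combine_state_probs(p1, p2, lam):
    """Frame-level combination of two acoustic models' HMM-state posteriors
    as a weighted geometric average:
    P12(s_i|v_t) = P1(s_i|v_t)^lam * P2(s_i|v_t)^(1-lam)."""
    p = (p1 ** lam) * (p2 ** (1.0 - lam))
    return p / p.sum(axis=-1, keepdims=True)   # renormalize (added step)

# Two MLPs' posteriors over 3 states for one frame (illustrative numbers):
p_mfcc = np.array([0.7, 0.2, 0.1])
p_cgdzp = np.array([0.5, 0.4, 0.1])
p = combine_state_probs(p_mfcc, p_cgdzp, lam=0.5)
```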
46 Application 5: ZZT + group delay processing for Automatic Speech Recognition (ASR)*
The proposed group delay representations are compared with representations proposed in recent studies [Hegde 2004, Alsteris 2004] in an ASR experiment. Our proposed techniques provide better results and have the potential to improve ASR performance.
Performance of the ASR system, word error rate (WER) in percent, for various feature extraction methods (MFCC, MODGDF, PS, GDGCI, CGDGCI, CGDZP) on the AURORA-2 task (lexicon reduced to English digits, no grammar applied). MFCC baseline WER across decreasing SNR (dB): 1.9, 6.7, 18.6, 45.2, 75.1, 88.8, 91.5.
Training: 8440 noise-free utterances spoken by 110 speakers. Evaluation: 4004 different noise-free utterances spoken by 104 other speakers.
*: ASR tests handled by Laurent Couvreur / TCTS Lab
47 Deficiencies, future work, work not included in the thesis
Some of the algorithms are not completely tested due to time constraints and therefore stand just as demonstrations of potential: ZZT-decomposition is not thoroughly tested, and the ASR tests are limited.
It is an experimental study; the analytical part is weak, since studying zero locations analytically is difficult if not impossible.
Future work: voice quality labeling/classification; source-tract decomposition using complex cepstrum; restudying phase-related problems in speech processing.
Studies on TTS during this period are not included in the thesis.
More informationThe effect of speaking rate and vowel context on the perception of consonants. in babble noise
The effect of speaking rate and vowel context on the perception of consonants in babble noise Anirudh Raju Department of Electrical Engineering, University of California, Los Angeles, California, USA anirudh90@ucla.edu
More informationRAMCESS 2.X framework expressive voice analysis for realtime and accurate synthesis of singing
J Multimodal User Interfaces (2008) 2: 133 144 DOI 10.1007/s12193-008-0010-4 ORIGINAL PAPER RAMCESS 2.X framework expressive voice analysis for realtime and accurate synthesis of singing Nicolas d Alessandro
More informationGaussian Processes for Audio Feature Extraction
Gaussian Processes for Audio Feature Extraction Dr. Richard E. Turner (ret26@cam.ac.uk) Computational and Biological Learning Lab Department of Engineering University of Cambridge Machine hearing pipeline
More informationQUASI CLOSED PHASE ANALYSIS OF SPEECH SIGNALS USING TIME VARYING WEIGHTED LINEAR PREDICTION FOR ACCURATE FORMANT TRACKING
QUASI CLOSED PHASE ANALYSIS OF SPEECH SIGNALS USING TIME VARYING WEIGHTED LINEAR PREDICTION FOR ACCURATE FORMANT TRACKING Dhananjaya Gowda, Manu Airaksinen, Paavo Alku Dept. of Signal Processing and Acoustics,
More informationDeep Learning for Speech Recognition. Hung-yi Lee
Deep Learning for Speech Recognition Hung-yi Lee Outline Conventional Speech Recognition How to use Deep Learning in acoustic modeling? Why Deep Learning? Speaker Adaptation Multi-task Deep Learning New
More informationTimbral, Scale, Pitch modifications
Introduction Timbral, Scale, Pitch modifications M2 Mathématiques / Vision / Apprentissage Audio signal analysis, indexing and transformation Page 1 / 40 Page 2 / 40 Modification of playback speed Modifications
More informationProbabilistic Modeling of Speech and Language
Probabilistic Modeling of Speech and Language Zhijian Ou Speech Processing and Machine Intelligence (SPMI) Lab, Department of Electronic Engineering, Tsinghua University, Beijing, China. Now Visiting Scholar
More informationDepartment of Electrical and Computer Engineering Digital Speech Processing Homework No. 6 Solutions
Problem 1 Department of Electrical and Computer Engineering Digital Speech Processing Homework No. 6 Solutions The complex cepstrum, ˆx[n], of a sequence x[n] is the inverse Fourier transform of the complex
More informationPresented By: Omer Shmueli and Sivan Niv
Deep Speaker: an End-to-End Neural Speaker Embedding System Chao Li, Xiaokong Ma, Bing Jiang, Xiangang Li, Xuewei Zhang, Xiao Liu, Ying Cao, Ajay Kannan, Zhenyao Zhu Presented By: Omer Shmueli and Sivan
More informationA latent variable modelling approach to the acoustic-to-articulatory mapping problem
A latent variable modelling approach to the acoustic-to-articulatory mapping problem Miguel Á. Carreira-Perpiñán and Steve Renals Dept. of Computer Science, University of Sheffield {miguel,sjr}@dcs.shef.ac.uk
More informationLAB 6: FIR Filter Design Summer 2011
University of Illinois at Urbana-Champaign Department of Electrical and Computer Engineering ECE 311: Digital Signal Processing Lab Chandra Radhakrishnan Peter Kairouz LAB 6: FIR Filter Design Summer 011
More informationCS 188: Artificial Intelligence Fall 2011
CS 188: Artificial Intelligence Fall 2011 Lecture 20: HMMs / Speech / ML 11/8/2011 Dan Klein UC Berkeley Today HMMs Demo bonanza! Most likely explanation queries Speech recognition A massive HMM! Details
More informationExemplar-based voice conversion using non-negative spectrogram deconvolution
Exemplar-based voice conversion using non-negative spectrogram deconvolution Zhizheng Wu 1, Tuomas Virtanen 2, Tomi Kinnunen 3, Eng Siong Chng 1, Haizhou Li 1,4 1 Nanyang Technological University, Singapore
More informationCS578- Speech Signal Processing
CS578- Speech Signal Processing Lecture 7: Speech Coding Yannis Stylianou University of Crete, Computer Science Dept., Multimedia Informatics Lab yannis@csd.uoc.gr Univ. of Crete Outline 1 Introduction
More informationTinySR. Peter Schmidt-Nielsen. August 27, 2014
TinySR Peter Schmidt-Nielsen August 27, 2014 Abstract TinySR is a light weight real-time small vocabulary speech recognizer written entirely in portable C. The library fits in a single file (plus header),
More informationModel-based unsupervised segmentation of birdcalls from field recordings
Model-based unsupervised segmentation of birdcalls from field recordings Anshul Thakur School of Computing and Electrical Engineering Indian Institute of Technology Mandi Himachal Pradesh, India Email:
More informationAcoustic holography. LMS Test.Lab. Rev 12A
Acoustic holography LMS Test.Lab Rev 12A Copyright LMS International 2012 Table of Contents Chapter 1 Introduction... 5 Chapter 2... 7 Section 2.1 Temporal and spatial frequency... 7 Section 2.2 Time
More informationL6: Short-time Fourier analysis and synthesis
L6: Short-time Fourier analysis and synthesis Overview Analysis: Fourier-transform view Analysis: filtering view Synthesis: filter bank summation (FBS) method Synthesis: overlap-add (OLA) method STFT magnitude
More informationText-to-speech synthesizer based on combination of composite wavelet and hidden Markov models
8th ISCA Speech Synthesis Workshop August 31 September 2, 2013 Barcelona, Spain Text-to-speech synthesizer based on combination of composite wavelet and hidden Markov models Nobukatsu Hojo 1, Kota Yoshizato
More informationCOMP 546, Winter 2018 lecture 19 - sound 2
Sound waves Last lecture we considered sound to be a pressure function I(X, Y, Z, t). However, sound is not just any function of those four variables. Rather, sound obeys the wave equation: 2 I(X, Y, Z,
More informationMachine Recognition of Sounds in Mixtures
Machine Recognition of Sounds in Mixtures Outline 1 2 3 4 Computational Auditory Scene Analysis Speech Recognition as Source Formation Sound Fragment Decoding Results & Conclusions Dan Ellis
More informationLecture 9: Speech Recognition. Recognizing Speech
EE E68: Speech & Audio Processing & Recognition Lecture 9: Speech Recognition 3 4 Recognizing Speech Feature Calculation Sequence Recognition Hidden Markov Models Dan Ellis http://www.ee.columbia.edu/~dpwe/e68/
More informationLecture 9: Speech Recognition
EE E682: Speech & Audio Processing & Recognition Lecture 9: Speech Recognition 1 2 3 4 Recognizing Speech Feature Calculation Sequence Recognition Hidden Markov Models Dan Ellis
More informationChapter 9 Automatic Speech Recognition DRAFT
P R E L I M I N A R Y P R O O F S. Unpublished Work c 2008 by Pearson Education, Inc. To be published by Pearson Prentice Hall, Pearson Education, Inc., Upper Saddle River, New Jersey. All rights reserved.
More informationLecture 7: Feature Extraction
Lecture 7: Feature Extraction Kai Yu SpeechLab Department of Computer Science & Engineering Shanghai Jiao Tong University Autumn 2014 Kai Yu Lecture 7: Feature Extraction SJTU Speech Lab 1 / 28 Table of
More informationIntroduction to Biomedical Engineering
Introduction to Biomedical Engineering Biosignal processing Kung-Bin Sung 6/11/2007 1 Outline Chapter 10: Biosignal processing Characteristics of biosignals Frequency domain representation and analysis
More informationSource/Filter Model. Markus Flohberger. Acoustic Tube Models Linear Prediction Formant Synthesizer.
Source/Filter Model Acoustic Tube Models Linear Prediction Formant Synthesizer Markus Flohberger maxiko@sbox.tugraz.at Graz, 19.11.2003 2 ACOUSTIC TUBE MODELS 1 Introduction Speech synthesis methods that
More information-Digital Signal Processing- FIR Filter Design. Lecture May-16
-Digital Signal Processing- FIR Filter Design Lecture-17 24-May-16 FIR Filter Design! FIR filters can also be designed from a frequency response specification.! The equivalent sampled impulse response
More informationThe Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 10: Acoustic Models
Statistical NLP Spring 2009 The Noisy Channel Model Lecture 10: Acoustic Models Dan Klein UC Berkeley Search through space of all possible sentences. Pick the one that is most probable given the waveform.
More informationStatistical NLP Spring The Noisy Channel Model
Statistical NLP Spring 2009 Lecture 10: Acoustic Models Dan Klein UC Berkeley The Noisy Channel Model Search through space of all possible sentences. Pick the one that is most probable given the waveform.
More informationIntroduction Basic Audio Feature Extraction
Introduction Basic Audio Feature Extraction Vincent Koops (with slides by Meinhard Müller) Sound and Music Technology, December 6th, 2016 1 28 November 2017 Today g Main modules A. Sound and music for
More informationEnhancement of Noisy Speech. State-of-the-Art and Perspectives
Enhancement of Noisy Speech State-of-the-Art and Perspectives Rainer Martin Institute of Communications Technology (IFN) Technical University of Braunschweig July, 2003 Applications of Noise Reduction
More informationENTROPY RATE-BASED STATIONARY / NON-STATIONARY SEGMENTATION OF SPEECH
ENTROPY RATE-BASED STATIONARY / NON-STATIONARY SEGMENTATION OF SPEECH Wolfgang Wokurek Institute of Natural Language Processing, University of Stuttgart, Germany wokurek@ims.uni-stuttgart.de, http://www.ims-stuttgart.de/~wokurek
More informationHidden Markov Model and Speech Recognition
1 Dec,2006 Outline Introduction 1 Introduction 2 3 4 5 Introduction What is Speech Recognition? Understanding what is being said Mapping speech data to textual information Speech Recognition is indeed
More information