Reformulating the HMM as a trajectory model by imposing an explicit relationship between static and dynamic features
1 Reformulating the HMM as a trajectory model by imposing an explicit relationship between static and dynamic features
Heiga ZEN (Byung Ha CHUN), Nagoya Inst. of Tech., Japan
2 Overview 1. Research backgrounds 2. Reformulating the HMM as a trajectory model 3. Deriving its training algorithm 4. Discussing relationships to other techniques (EMLLT, PoE, & HMM-based speech synthesis) 5. Evaluations in both speech recognition & synthesis 6. Conclusions & future plans
4 Typical ASR framework
Feature vector: MFCC (MF-PLP) with their delta and delta-delta coefficients
Acoustic model: context-dependent HMMs
Language model: word N-gram
Limitations of the HMM:
(1) Piece-wise constant statistics within an HMM state
(2) Conditional independence assumption
(3) Weak duration modeling
5 Alternative acoustic models
(1) Piece-wise constant statistics within an HMM state: polynomial regression HMM, hidden dynamical model, vocal tract resonance model, etc.
(2) Conditional independence assumption: partly hidden Markov model, stochastic segment model, switching linear dynamical system, conditional HMM, dynamic Bayesian network, frame-correlated HMM, etc.
(3) Weak duration modeling: hidden semi-Markov model
6 Dynamic features [Furui; 1986]
Augmenting the dimensionality of observation vectors by adding their time derivatives
Recognition accuracy improves considerably
A simple method to capture time dependencies
Ad hoc, rather than an essential solution
It allows inconsistent statistics between static and dynamic features when used as a generative model (an HMM with static & delta features ignores their relationship)
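The delta computation described above can be sketched numerically. The window coefficients below ([-0.5, 0, 0.5] for the first derivative, [1, -2, 1] for the second) are the common textbook choices, assumed here for illustration rather than taken from the talk:

```python
import numpy as np

def dynamic_features(c, win):
    """Apply a regression window to a static feature sequence.

    c   : (T, M) static features
    win : 1-D window, e.g. [-0.5, 0, 0.5] for the first derivative
    """
    T, M = c.shape
    L = len(win) // 2
    # pad by repeating edge frames so every frame has full context
    padded = np.vstack([c[:1]] * L + [c] + [c[-1:]] * L)
    out = np.zeros_like(c, dtype=float)
    for tau, w in enumerate(win, start=-L):
        out += w * padded[L + tau : L + tau + T]
    return out

# toy 1-D static sequence: 0, 1, 2, 3, 4
c = np.arange(5, dtype=float).reshape(-1, 1)
delta  = dynamic_features(c, np.array([-0.5, 0.0, 0.5]))
delta2 = dynamic_features(c, np.array([1.0, -2.0, 1.0]))
o = np.hstack([c, delta, delta2])   # (T, 3M) augmented observation vectors
```

On this linearly increasing toy sequence the interior delta frames come out as the slope (1.0) and the interior delta-delta frames as 0, which is exactly the "augmented" observation the HMM then models.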
7 Overview 1. Research backgrounds 2. Reformulating the HMM as a trajectory model 3. Deriving its training algorithm 4. Discussing relationships to other techniques (EMLLT, PoE, & HMM-based speech synthesis) 5. Evaluations in both speech recognition & synthesis 6. Conclusions & future plans
8 Reformulating the HMM as a trajectory model (1)
Output probability of o from a standard HMM Λ:
  P(o | Λ) = Σ_q P(o | q, Λ) P(q | Λ)
where
  o : observation vector sequence (KT × 1)
  o_t : observation vector at time t (K × 1)
  q : Gaussian component sequence
  q_t : Gaussian component at time t
  K : dimensionality of the observation vector
9 Reformulating the HMM as a trajectory model (2)
Output probability of o from Λ for a given q is a Gaussian distribution:
  P(o | q, Λ) = N(o; μ_q, Σ_q)
where
  μ_q : mean vector of q (KT × 1)
  Σ_q : covariance matrix of q (KT × KT, block-diagonal)
  μ_{q_t}, Σ_{q_t} : mean vector and covariance matrix of Gaussian component q_t
10 Observation vector: static & dynamic features
  o_t = [c_t, Δc_t, Δ²c_t]
  c_t : static feature (M × 1)
  Δc_t : 1st-order time derivative
  Δ²c_t : 2nd-order time derivative
  (K = 3M)
Dynamic features are calculated from the static features over a window of neighbouring frames (c_{t-2}, …, c_{t+2}), e.g.
  Δc_t = (c_{t+1} − c_{t−1}) / 2,  Δ²c_t = c_{t−1} − 2c_t + c_{t+1}
11 Relationship between o and c in matrix form
  o = W c
  W : window matrix projecting c into the augmented space of o (3MT × MT)
  c : static feature vector sequence (MT × 1)
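The window matrix relationship o = Wc can be built explicitly. This is a minimal construction for M = 1 with the same illustrative ±1-frame windows as above (edge frames clamped to the sequence boundary); the exact windows in the talk may differ:

```python
import numpy as np

# Build W (3T x T, static dimension M = 1) so that o = W c stacks
# [c_t, delta c_t, delta-delta c_t] for every frame t.
T = 5
windows = [np.array([0.0, 1.0, 0.0]),     # static (identity)
           np.array([-0.5, 0.0, 0.5]),    # 1st derivative
           np.array([1.0, -2.0, 1.0])]    # 2nd derivative

W = np.zeros((3 * T, T))
for t in range(T):
    for d, win in enumerate(windows):
        for tau, w in enumerate(win, start=-1):
            s = min(max(t + tau, 0), T - 1)   # clamp at the edges
            W[3 * t + d, s] += w

c = np.arange(T, dtype=float)   # toy static trajectory 0..4
o = W @ c                       # augmented observation sequence (3T,)
```

For frame t = 2 the three stacked entries are the static value 2, the slope 1, and the curvature 0, matching the per-frame delta computation.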
12 Reformulating the HMM as a trajectory model (3)
The current framework is improper in the sense of statistical modeling:
- It allows inconsistent static and dynamic feature vectors when used as a generative model
- The statistical model should be defined as a function of c: the original observation is c, not the augmented variable o
13 Reformulating the HMM as a trajectory model (4)
Imposing o = Wc, the density should be normalized over c to yield a valid PDF:
  P(c | q, Λ) = N(Wc; μ_q, Σ_q) / Z_q
where Z_q is the normalization constant, Z_q = ∫ N(Wc; μ_q, Σ_q) dc.
14 Reformulating the HMM as a trajectory model (5)
The normalized distribution is again Gaussian, but over c rather than o:
  P(c | q, Λ) = N(c; c̄_q, P_q)
with
  P_q⁻¹ = Wᵀ Σ_q⁻¹ W,  c̄_q = P_q Wᵀ Σ_q⁻¹ μ_q
  c̄_q : mean vector (MT × 1)
  P_q : covariance matrix (MT × MT), full
15 Reformulating the HMM as a trajectory model (6)
We may define a new statistical model, referred to as the "trajectory-HMM":
- The mean vector c̄_q is given as a smooth trajectory → variable statistics within a state
- The covariance matrix P_q is full → inter-frame dependency of state output probabilities
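The trajectory-HMM statistics can be sketched on a toy 1-D example. The two "states" (static means 0 and 2), unit variances, and ±1-frame windows below are all invented for illustration:

```python
import numpy as np

def window_matrix(T):
    # W maps the static sequence c (M = 1) to o = Wc of static,
    # delta and delta-delta features, edges clamped
    wins = [np.array([0.0, 1.0, 0.0]),
            np.array([-0.5, 0.0, 0.5]),
            np.array([1.0, -2.0, 1.0])]
    W = np.zeros((3 * T, T))
    for t in range(T):
        for d, win in enumerate(wins):
            for tau, w in enumerate(win, start=-1):
                W[3 * t + d, min(max(t + tau, 0), T - 1)] += w
    return W

T = 6
W = window_matrix(T)
mu = np.zeros(3 * T)          # per-frame Gaussian means over o
mu[3 * 3::3] = 2.0            # second "state": static mean 2 on frames 3..5
var = np.ones(3 * T)          # unit variances, for simplicity
Sinv = np.diag(1.0 / var)

P_inv = W.T @ Sinv @ W              # precision W' Sigma^-1 W: banded
r = W.T @ Sinv @ mu
c_bar = np.linalg.solve(P_inv, r)   # smooth mean trajectory
P = np.linalg.inv(P_inv)            # inter-frame covariance: full
```

Even though the per-frame means jump from 0 to 2, the trajectory-HMM mean `c_bar` rises smoothly across the state boundary; the precision matrix is banded while its inverse `P` has non-zero entries everywhere, i.e. every pair of frames is correlated.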
16 [Figure: 1st Mel-cepstral coefficient over time for /sil a i d a sil/, comparing natural speech with the mean trajectory and its variance, alongside the inter-frame covariance matrix.]
The mean sequence varies within a state; inter-frame correlation is captured by the covariance matrix.
17 [Figure: as above, 1st Mel-cepstral coefficient for /sil a i d a sil/ with the inter-frame covariance matrix.]
Both the mean and the covariance vary according to durations and neighbouring models, making it possible to capture coarticulation effects.
18 Overview 1. Research backgrounds 2. Reformulating the HMM as a trajectory model 3. Deriving its training algorithm 4. Discussing relationships to other techniques (EMLLT, PoE, & HMM-based speech synthesis) 5. Evaluations in both speech recognition & synthesis 6. Conclusions & future plans
19 Estimating trajectory-HMM parameters
Auxiliary function of the EM algorithm (hidden variable: q). Computing the expectation over all q is prohibitive: exact inference is intractable.
Viterbi approximation: alternate between searching for the best q and optimizing Λ.
20 Optimizing model parameters (1)
Log likelihood of the trajectory-HMM:
  log P(c | q, Λ) = log N(c; c̄_q, P_q)
where the precision matrix P_q⁻¹ = Wᵀ Σ_q⁻¹ W is banded and Σ_q⁻¹ is diagonal.
21 Optimizing trajectory-HMM parameters (2)
Introduce a Gaussian component sequence matrix (3MT × 3MN, where N is the number of Gaussians in the model set) that maps the model-set parameters to the per-frame statistics of the sequence q.
22 Optimizing mean vectors
Setting the derivative of the log likelihood with respect to the mean parameters m to zero gives a set of linear equations whose coefficient matrix is symmetric and positive definite; its solution m maximizes the model likelihood.
23 Optimizing covariance matrices
The covariance matrices can be optimized using gradient methods (e.g., steepest ascent, quasi-Newton).
24 Searching the best Gaussian component sequence
Computing the likelihood for all q is intractable (the inter- and intra-frame covariance matrix is full), so the standard Viterbi algorithm cannot be applied to find the best q.
However, the likelihood can be computed time-recursively, and a Viterbi algorithm with delayed decision makes it possible to search for a sub-optimal q.
25 Time-recursive computation (1)
Σ_q⁻¹ is diagonal, so Wᵀ Σ_q⁻¹ W and Wᵀ Σ_q⁻¹ μ_q can be computed directly. P_q and c̄_q are full, so computing them directly is difficult; however, they can be calculated time-recursively.
26 Time-recursive computation (2)
P_q⁻¹ = Wᵀ Σ_q⁻¹ W is a banded, symmetric, positive definite matrix, so it can be factorized by Cholesky decomposition:
  P_q⁻¹ = Uᵀ U,  U : Cholesky factor (upper triangular, banded)
The t-th diagonal element of U depends only on the Gaussian components up to t+L (L = window length), so U can be computed time-recursively.
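The band structure is what makes the recursion cheap: the Cholesky factor of a banded matrix is itself banded, so solving the linear system reduces to a forward and a backward substitution with per-frame cost. A small sketch, using the lower-triangular factor (equivalent to the upper-triangular form on the slide) and an invented toy banded SPD matrix standing in for Wᵀ Σ⁻¹ W:

```python
import numpy as np

T = 8
# toy banded SPD matrix (bandwidth 2) standing in for W' Sigma^-1 W
R = 4 * np.eye(T) - np.eye(T, k=1) - np.eye(T, k=-1) \
    - 0.25 * np.eye(T, k=2) - 0.25 * np.eye(T, k=-2)
r = np.ones(T)

L = np.linalg.cholesky(R)            # lower-triangular factor, R = L L'
# band structure is preserved: nothing below the second subdiagonal
assert np.allclose(np.tril(L, k=-3), 0.0)

# forward substitution: solve L g = r (only the last 2 entries matter)
g = np.zeros(T)
for t in range(T):
    g[t] = (r[t] - L[t, max(0, t - 2):t] @ g[max(0, t - 2):t]) / L[t, t]

# backward substitution: solve L' c_bar = g
c_bar = np.zeros(T)
for t in reversed(range(T)):
    c_bar[t] = (g[t] - L[t + 1:t + 3, t] @ c_bar[t + 1:t + 3]) / L[t, t]

assert np.allclose(R @ c_bar, r)     # same answer as a dense solve
```

Because each substitution step only touches a window-length neighbourhood, both passes run in O(T) time, which is what makes the delayed-decision Viterbi search on the following slides feasible.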
27 Time-recursive computation (3)
The mean trajectory is obtained from the Cholesky factor by forward substitution followed by backward substitution, both of which can also be computed time-recursively.
28 Time-recursive computation (4)
As a result, the likelihood can be computed in a time-recursive manner.
29 Viterbi algorithm with delayed decision (1)
Ex.) Approximate Viterbi algorithm with 2-frame delayed decision: Gaussian components for the preceding 2 frames and 1 succeeding frame. The state sequence up to t−3 has already been determined; to compute the likelihood, the statistics at t+1 are required (1-frame look-ahead).
30 Viterbi algorithm with delayed decision (2)
Compute the likelihoods of all possible Gaussian sequences staying in state s at time t.
31 Viterbi algorithm with delayed decision (3)
With a J-frame delayed decision, the state at t−J is determined at time t by selecting the more likely path. This incorporates the effect of state decisions on neighbouring frames (coarticulation effects span roughly 100–200 ms). J = 10–20 is sufficient for a 10-ms frame shift.
32 Overview 1. Research backgrounds 2. Reformulating the HMM as a trajectory model 3. Deriving its training algorithm 4. Discussing relationships to other techniques (EMLLT, PoE, & HMM-based speech synthesis) 5. Evaluations in both speech recognition & synthesis 6. Conclusions & future plans
33 Relationship to EMLLT (1)
Covariance matrix modeling in ASR: diagonal covariances cannot capture intra-frame correlation; full covariances greatly increase the number of model parameters. Structured inverse covariance (precision) matrix modeling addresses this: Semi-Tied Covariance matrices (STC), Extended Maximum Likelihood Linear Transformation (EMLLT), and the Subspace for Precision And Mean (SPAM) model.

  Model       | STC                     | EMLLT                    | SPAM
  Basis type  | rank-1 symmetric        | rank-1 symmetric         | full-rank symmetric
  Basis order | equal to dimensionality | more than dimensionality | more than dimensionality
34 Relationship to EMLLT (2)
The inverse covariance (precision) matrix of the trajectory-HMM, Wᵀ Σ_q⁻¹ W, is a sum of rank-1 symmetric matrices with more basis matrices (3MT) than the dimensionality (MT): it can be viewed as an EMLLT model that captures inter-frame correlation.
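This identity is easy to check numerically. With a diagonal Σ⁻¹ = diag(d), the precision Wᵀ Σ⁻¹ W equals a weighted sum of rank-1 outer products, one per row of W; the matrix sizes below are arbitrary stand-ins:

```python
import numpy as np

# EMLLT view of the trajectory-HMM precision: more rank-1 basis
# matrices (3MT rows) than the dimensionality (MT columns).
rng = np.random.default_rng(0)
W = rng.standard_normal((15, 5))     # stands in for a 3MT x MT window matrix
d = rng.uniform(0.5, 2.0, size=15)   # inverse variances (diagonal of Sigma^-1)

R_direct = W.T @ np.diag(d) @ W
R_rank1 = sum(d[k] * np.outer(W[k], W[k]) for k in range(15))

assert np.allclose(R_direct, R_rank1)
```

Each term d_k w_k w_kᵀ is a rank-1 symmetric matrix, and there are 15 of them for a 5-dimensional space, mirroring the "#basis more than dimensionality" property of EMLLT.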
35 Relationship to Product of Experts (PoE)
EMLLT as a PoE [Sim & Gales; '04]; HMM + static & delta features as a PoE [Williams; '05]. The trajectory-HMM can also be viewed as a PoE: the augmented observations are modeled by Gaussian experts, and the product of Gaussians is normalized to yield a valid PDF.
36 Relationship to HMM-based speech synthesis
Current speech synthesis paradigms:
- Unit selection and concatenation: high quality, but sometimes discontinuous; obtaining various voice qualities requires a large amount of speech data
- Speech synthesis from HMMs themselves (HTS): buzzy, but smooth & stable; voice quality can be changed (e.g., adaptation, interpolation, eigenvoice)
37 Speech parameter generation from HMM (1)
Synthesize speech by maximizing its output probability: for a given HMM Λ and Gaussian component sequence q, determine the observation vector sequence that maximizes its output probability from q. Without further constraints, it simply becomes the sequence of mean vectors.
38 Speech parameter generation from HMM (2)
Relationship between o and c in matrix form: o = Wc, where W is the window matrix projecting c into the augmented space (3MT × MT) and c is the static feature vector sequence (MT × 1).
39 Speech parameter generation from HMM (3)
For a given HMM Λ and Gaussian component sequence q, determine the observation vector sequence that maximizes its output probability from q under the constraint o = Wc.
40 Speech parameter generation from HMM (4)
Setting the derivative with respect to c to zero, we obtain
  Wᵀ Σ_q⁻¹ W c = Wᵀ Σ_q⁻¹ μ_q
so the speech parameter vector sequence can be determined from the statistics of both the static and dynamic features.
[Figure: generated trajectory for /sil/ /a/ /i/ /sil/ with static and delta means and variances.]
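A minimal parameter-generation sketch of this linear system, with invented statistics: two "states" with static means 0 and 2, where the second state is given a much smaller static variance so the generated trajectory is pulled harder toward its mean. The ±1-frame windows are again illustrative assumptions:

```python
import numpy as np

def window_matrix(T):
    wins = [np.array([0.0, 1.0, 0.0]),     # static
            np.array([-0.5, 0.0, 0.5]),    # delta
            np.array([1.0, -2.0, 1.0])]    # delta-delta
    W = np.zeros((3 * T, T))
    for t in range(T):
        for d, win in enumerate(wins):
            for tau, w in enumerate(win, start=-1):
                W[3 * t + d, min(max(t + tau, 0), T - 1)] += w
    return W

T = 6
W = window_matrix(T)
mu = np.zeros(3 * T)
var = np.ones(3 * T)
for t in range(T // 2, T):     # second "state" occupies frames 3..5
    mu[3 * t] = 2.0            # static mean 2
    var[3 * t] = 0.01          # small static variance: confident target
Sinv = np.diag(1.0 / var)

# solve (W' Sigma^-1 W) c = W' Sigma^-1 mu
c = np.linalg.solve(W.T @ Sinv @ W, W.T @ Sinv @ mu)
```

The resulting `c` rises smoothly and hugs the low-variance target of 2 in the second half: exactly the variance-weighted interpolation between static and dynamic statistics that the slide describes, and the same linear system that defines the trajectory-HMM mean.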
41 Relationship between HTS and trajectory-HMM (1)
The mean vector of the trajectory-HMM and the speech parameter vector sequence that maximizes its output probability from q are exactly the same.
42 Relationship between HTS and trajectory-HMM (2)
When does the likelihood take its maximum value? Estimating the trajectory-HMM under the ML criterion amounts to minimizing the mean square error between the training data c and the generated trajectory. Estimating parameters that maximize this likelihood may therefore improve HMM-based speech synthesis.
43 Overview 1. Research backgrounds 2. Reformulating the HMM as a trajectory model 3. Deriving its training algorithm 4. Discussing relationships to other techniques (EMLLT, PoE, & HMM-based speech synthesis) 5. Evaluations in both speech recognition & synthesis 6. Conclusions & future plans
44 Speech recognition experiment
  Database           : ATR Japanese continuous speech database, b-set, speaker MHT
  Training data      : 450 utterances
  Test data          : 53 utterances
  Feature vector     : 0–8th-order Mel-cepstral coefficients and their delta and delta-delta
  Model structure    : 3-state left-to-right monophone models with single-Gaussian state output distributions
  Training procedure : after training standard HMMs (baseline), trajectory-HMMs are re-estimated using the standard HMMs as their initial models
45 [Figure: average log likelihood per frame vs. number of iterations of Viterbi training, one curve per delay J = 2–7.]
The approximate Viterbi algorithm with larger J found a more likely q, and iterative training improved the model likelihood.
46–48 Examples of trajectories (data & mean)
[Figure: 1st Mel-cepstral coefficient c(1) over time for the utterance /sil j i b u N n o j i ts u ry o k u w a pau/, overlaying one training utterance, the HMM mean sequence, the mean trajectory derived from the HMM, and the mean trajectory from the trajectory-HMM.]
49 Phoneme recognition experiment
100-best lists were generated using the baseline HMMs; each hypothesis was re-aligned by the trajectory-HMMs.
Without the reference hypothesis (baseline: 19.7% phoneme error rate):
[Figure: phoneme error rate vs. number of iterations of Viterbi training, for J = 2–7.]
Best result: 18.0% (9% error reduction).
50 Phoneme recognition experiment
With the reference hypothesis included (baseline: 15.9%):
[Figure: phoneme error rate vs. number of iterations of Viterbi training, for J = 2–7.]
Best result: 9.0% (43% error reduction).
51 Speech synthesis experiment
  Spectrum stream   : single Gaussian
  F0 stream         : multi-space probability distribution
  Training data     : CMU ARCTIC database, speaker AWB, first 1096 utterances
  Test data         : remaining 42 utterances
  Sampling rate     : 16 kHz
  Window            : 25-ms Blackman window
  Frame rate        : 5 ms
  Spectral analysis : 24th-order Mel-cepstral analysis
  Dynamic features  : calculated from neighbouring frames
  Feature vector    : c(0)–c(24), log F0, and their delta and delta-delta
  Topology          : 5-state left-to-right HMM with no skip
52 Subjective listening test
  Test type      : paired comparison test
  Subjects       : 8 graduate students
  Test sentences : 20 sentences chosen at random
Preference scores: Baum-Welch 42.5%, Viterbi 38.4%, trajectory-HMM 69.1% (error bars: 95% confidence intervals).
53 Overview 1. Research backgrounds 2. Reformulating the HMM as a trajectory model 3. Deriving its training algorithm 4. Discussing relationships to other techniques (EMLLT, PoE, & HMM-based speech synthesis) 5. Evaluations in both speech recognition & synthesis 6. Conclusions & future plans
54 Conclusions
- Reformulated the HMM as a trajectory model
- Derived a Viterbi-type training algorithm
- Evaluations in both speech recognition and synthesis: significant improvements over the HMM were achieved
Future plans
- Designing & implementing a Viterbi decoder
- Large-scale evaluation (speaker-independent, LVCSR)
- EM-type training (variational or Monte Carlo EM)
L7: Linear prediction of speech Introduction Linear prediction Finding the linear prediction coefficients Alternative representations This lecture is based on [Dutoit and Marques, 2009, ch1; Taylor, 2009,
More informationRobust Speech Recognition in the Presence of Additive Noise. Svein Gunnar Storebakken Pettersen
Robust Speech Recognition in the Presence of Additive Noise Svein Gunnar Storebakken Pettersen A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of PHILOSOPHIAE DOCTOR
More informationCS229 Project: Musical Alignment Discovery
S A A V S N N R R S CS229 Project: Musical Alignment iscovery Woodley Packard ecember 16, 2005 Introduction Logical representations of musical data are widely available in varying forms (for instance,
More informationFactorial Hidden Markov Models for Speech Recognition: Preliminary Experiments
TM Factorial Hidden Markov Models for Speech Recognition: Preliminary Experiments Beth Logan Pedro J. Moreno CRL 97/7 September 1997 Cambridge Research Laboratory The Cambridge Research Laboratory was
More informationCISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II)
CISC 889 Bioinformatics (Spring 24) Hidden Markov Models (II) a. Likelihood: forward algorithm b. Decoding: Viterbi algorithm c. Model building: Baum-Welch algorithm Viterbi training Hidden Markov models
More informationAn Introduction to Bioinformatics Algorithms Hidden Markov Models
Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training
More informationHidden Markov Models in Language Processing
Hidden Markov Models in Language Processing Dustin Hillard Lecture notes courtesy of Prof. Mari Ostendorf Outline Review of Markov models What is an HMM? Examples General idea of hidden variables: implications
More informationHeeyoul (Henry) Choi. Dept. of Computer Science Texas A&M University
Heeyoul (Henry) Choi Dept. of Computer Science Texas A&M University hchoi@cs.tamu.edu Introduction Speaker Adaptation Eigenvoice Comparison with others MAP, MLLR, EMAP, RMP, CAT, RSW Experiments Future
More informationAutoregressive Neural Models for Statistical Parametric Speech Synthesis
Autoregressive Neural Models for Statistical Parametric Speech Synthesis シンワン Xin WANG 2018-01-11 contact: wangxin@nii.ac.jp we welcome critical comments, suggestions, and discussion 1 https://www.slideshare.net/kotarotanahashi/deep-learning-library-coyotecnn
More informationPHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS
PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS Jinjin Ye jinjin.ye@mu.edu Michael T. Johnson mike.johnson@mu.edu Richard J. Povinelli richard.povinelli@mu.edu
More informationCAMBRIDGE UNIVERSITY
CAMBRIDGE UNIVERSITY ENGINEERING DEPARTMENT SWITCHING LINEAR DYNAMICAL SYSTEMS FOR SPEECH RECOGNITION A-V.I. Rosti & M.J.F. Gales CUED/F-INFENG/TR.461 December 12, 2003 Cambridge University Engineering
More informationp(d θ ) l(θ ) 1.2 x x x
p(d θ ).2 x 0-7 0.8 x 0-7 0.4 x 0-7 l(θ ) -20-40 -60-80 -00 2 3 4 5 6 7 θ ˆ 2 3 4 5 6 7 θ ˆ 2 3 4 5 6 7 θ θ x FIGURE 3.. The top graph shows several training points in one dimension, known or assumed to
More informationMarkov processes on curves for automatic speech recognition
Markov processes on curves for automatic speech recognition Lawrence Saul and Mazin Rahim AT&T Labs - Research Shannon Laboratory 180 Park Ave E-171 Florham Park, NJ 07932 {lsaul,rnazin}gresearch.att.com
More informationSpeech and Language Processing. Chapter 9 of SLP Automatic Speech Recognition (II)
Speech and Language Processing Chapter 9 of SLP Automatic Speech Recognition (II) Outline for ASR ASR Architecture The Noisy Channel Model Five easy pieces of an ASR system 1) Language Model 2) Lexicon/Pronunciation
More informationarxiv: v1 [cs.sd] 25 Oct 2014
Choice of Mel Filter Bank in Computing MFCC of a Resampled Speech arxiv:1410.6903v1 [cs.sd] 25 Oct 2014 Laxmi Narayana M, Sunil Kumar Kopparapu TCS Innovation Lab - Mumbai, Tata Consultancy Services, Yantra
More informationTinySR. Peter Schmidt-Nielsen. August 27, 2014
TinySR Peter Schmidt-Nielsen August 27, 2014 Abstract TinySR is a light weight real-time small vocabulary speech recognizer written entirely in portable C. The library fits in a single file (plus header),
More informationMixtures of Gaussians with Sparse Structure
Mixtures of Gaussians with Sparse Structure Costas Boulis 1 Abstract When fitting a mixture of Gaussians to training data there are usually two choices for the type of Gaussians used. Either diagonal or
More informationFEATURE PRUNING IN LIKELIHOOD EVALUATION OF HMM-BASED SPEECH RECOGNITION. Xiao Li and Jeff Bilmes
FEATURE PRUNING IN LIKELIHOOD EVALUATION OF HMM-BASED SPEECH RECOGNITION Xiao Li and Jeff Bilmes Department of Electrical Engineering University. of Washington, Seattle {lixiao, bilmes}@ee.washington.edu
More informationAn Excitation Model for HMM-Based Speech Synthesis Based on Residual Modeling
An Model for HMM-Based Speech Synthesis Based on Residual Modeling Ranniery Maia,, Tomoki Toda,, Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda, National Inst. of Inform. and Comm. Technology (NiCT), Japan
More informationFeature extraction 2
Centre for Vision Speech & Signal Processing University of Surrey, Guildford GU2 7XH. Feature extraction 2 Dr Philip Jackson Linear prediction Perceptual linear prediction Comparison of feature methods
More informationNote Set 5: Hidden Markov Models
Note Set 5: Hidden Markov Models Probabilistic Learning: Theory and Algorithms, CS 274A, Winter 2016 1 Hidden Markov Models (HMMs) 1.1 Introduction Consider observed data vectors x t that are d-dimensional
More informationParametric Models Part III: Hidden Markov Models
Parametric Models Part III: Hidden Markov Models Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2014 CS 551, Spring 2014 c 2014, Selim Aksoy (Bilkent
More informationA TWO-LAYER NON-NEGATIVE MATRIX FACTORIZATION MODEL FOR VOCABULARY DISCOVERY. MengSun,HugoVanhamme
A TWO-LAYER NON-NEGATIVE MATRIX FACTORIZATION MODEL FOR VOCABULARY DISCOVERY MengSun,HugoVanhamme Department of Electrical Engineering-ESAT, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, Bus
More informationHow does a dictation machine recognize speech?
Chapter 4 How does a dictation machine recognize speech? This Chapter is not about how to wreck a nice beach 45 T. Dutoit ( ), L. Couvreur ( ), H. Bourlard (*) ( ) Faculté Polytechnique de Mons, Belgium
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationFast speaker diarization based on binary keys. Xavier Anguera and Jean François Bonastre
Fast speaker diarization based on binary keys Xavier Anguera and Jean François Bonastre Outline Introduction Speaker diarization Binary speaker modeling Binary speaker diarization system Experiments Conclusions
More informationL23: hidden Markov models
L23: hidden Markov models Discrete Markov processes Hidden Markov models Forward and Backward procedures The Viterbi algorithm This lecture is based on [Rabiner and Juang, 1993] Introduction to Speech
More informationAugmented Statistical Models for Speech Recognition
Augmented Statistical Models for Speech Recognition Mark Gales & Martin Layton 31 August 2005 Trajectory Models For Speech Processing Workshop Overview Dependency Modelling in Speech Recognition: latent
More informationA Direct Criterion Minimization based fmllr via Gradient Descend
A Direct Criterion Minimization based fmllr via Gradient Descend Jan Vaněk and Zbyněk Zajíc University of West Bohemia in Pilsen, Univerzitní 22, 306 14 Pilsen Faculty of Applied Sciences, Department of
More informationDiscriminant Feature Space Transformations for Automatic Speech Recognition
Discriminant Feature Space Transformations for Automatic Speech Recognition Vikrant Tomar McGill ID: 260394445 Department of Electrical & Computer Engineering McGill University Montreal, Canada February
More information