idden Markov Models, iscriminative Training, and odern Speech Recognition Li Deng Microsoft Research UW-EE516; Lecture May 21, 2009
|
|
- Dortha Rosamund Lawson
- 5 years ago
- Views:
Transcription
1 idden Markov Models, iscriminative Training, and odern Speech Recognition Li Deng Microsoft Research UW-EE516; Lecture May 21, 2009
2 Architecture of a speech recognition system Voice Signal Processing Application Application Decoder Acoustic Models Language Models Adaptation
3 Fundamental Equation of Speech Recognition Wˆ = arg max P( W x) = arg max P( x W) P( W) W W Roles of acoustic modeling and Hidden Markov Model (HMM)
4 utline Introduction to HMMs; Basic ML Training Techniques HMM Parameter Learning EM (maximum likelihood) Discriminative learning (intro) Engineering Intricacy in Using HMMs in Speech Recognition Left-to-right structure Use of lexicon and pronunciation modeling Use of context dependency (variance reduction) Complexity control: decision tree & automatic parameter tying Beyond HMM for Speech Modeling/Recognition Better generative models for speech dynamics Discriminative models: e.g., CRF and feature engineering Discriminative Training in Speech Recognition --- A unifying framework
5 The remaining mode are hidden because the underlying state sequence cannot be directly inferred from the output sequence MM Introduction: Description The 1-coin model is observable because the output sequence can be mapped to a specific sequence o state transitions
6 Description Specification of an HMM A - the state transition probability matrix aij = P(q t+1 = j q t = i) B- observation probability distribution b j (k) = P(o t = k q t = j) i k M π - the initial state distribution
7 Central problems for HMM Problem 1: Evaluation Probability of occurrence of a particular observation sequence, O = {o 1,,o k }, given the model: P(O λ) Not straightforward hidden states Useful in sequence classification (e.g., isolatedword recognition):
8 Problem 1: Evaluation Naïve computation: P ( O λ) = P( O q, λ) P( q λ) q where P( q λ) = π q1 aq 1q2aq2q3... aqt 1qT -The above sum is over all state paths -There are N T states paths, each costing O(T) calculations, leading to O(TN T ) time complexity.
9 Problem 1 Need efficient Solution Define auxiliary forward variable α: α ( i) = P( o,..., o q = i, λ),..., t 1 t t,..., t α ( i) = P( o1,..., o q i, λ) t t t = α ( i) = P( o o q = i, λ) t 1 t α ( i) = P( o o q = i, λ) t 1 t t α t (i) is the probability of observing a partial sequence of observables o 1, o t such that at time t, state q t =i Then recursion on α provides the most efficient solution to problem 1 (forward
10 Central problems for HMM Problem 2: Decoding Optimal state sequence to produce given observations, O = {o 1,,o k }, given model Optimality criterion Useful in recognition problems (e.g., continuous speech recognition
11 Problem 2: Decoding Recursion: δ t ( j ) = max ( δ t 1 1 i N ( i ) a ij ) b j ( o t ) ψ Termination: Backtracking: t ( j ) = arg max ( δ t 1 P q = = T maxδ 1 i N T ( i) arg max 1 i N 1 i δ T N ( i) ( i ) a ij ) 2 t T, 1 P* gives the state-optimised probability j N Q* is the optimal state sequence (Q* = {q1*,q2*,,qt*})
12 Central problems Central problems for HMM Problem 3: Learning of HMM parameters: Determine optimum model, given a training set of observations Find λ, such that P(O λ) is maximal (ML), or class discrimination is maximal (with greater separation margins)
13 utline Introduction to HMMs; Basic ML Techniques HMM Parameter Learning EM (maximum likelihood) Discriminative learning Engineering Intricacy in Using HMMs (Speech) Left-to-right structure Use of lexicon and pronunciation modeling Use of context dependency (variance reduction) Complexity control: decision tree & automatic parameter tying Beyond HMM for Speech Modeling/Recognition Better generative models for speech dynamics Discriminative models: e.g., CRF and feature engineering Discriminative Training in Speech Recognition --- A unifying framework
14 Problem 3: Learning Maximum Likelihood: Baum-Welch algorithm uses the forward and backward algorithms to calculate the auxiliary variables α,β B-W algorithm is a special case of the EM algorithm: E-step: calculation of posterior probs from α, β πˆ â ˆ ( k) ij b j M-step: HMM parameter updates Practical issues: Can get stuck in local maxima Numerical problems log and scaling
15 iscriminative Learning (intro) Maximum Mutual Information (MMI) Minimum Classification Error (MCE) Minimum Phone/Word Errors (MPE, MWE) Large-Margin Discriminative Training More Coverage later
16 utline Introduction to HMMs; Basic ML Techniques HMM Parameter Learning (Advanced ML Techniques) EM (maximum likelihood) Discriminative learning Engineering Intricacy in Using HMMs (Speech) Left-to-right structure Use of lexicon and pronunciation modeling Use of context dependency (variance reduction) Complexity control: decision tree & automatic parameter tying Beyond HMM for Speech Modeling/Recognition Better generative models for speech dynamics Discriminative models: e.g., CRF and feature engineering Discriminative Training in Speech Recognition --- A unifying framework
17 eft-to-right Topology in HMM for Speech ower triangle parts of transition prob matrices forced to zero! hen aij s play very minor roles viz output probabilities
18 ord/phone as HMM States, Lexicon, Context Dependency, & Decision Tree
19 utline Introduction to HMMs; Basic ML Techniques HMM Parameter Learning (Advanced ML Techniques) EM (maximum likelihood) Discriminative learning Large-margin discriminative learning Engineering Intricacy in Using HMMs (Speech) Left-to-right structure Use of lexicon and pronunciation modeling Use of context dependency (variance reduction) Complexity control: decision tree & automatic parameter tying Beyond HMM for Speech Modeling/Recognition Better generative models for speech dynamics Discriminative models: e.g., CRF and feature engineering Discriminative Training in Speech Recognition --- A
20 HMM is a Poor Model for Speech (not just IID)
21 Major Research Directions in the Field Direction 1: Beyond HMM Generative Modeling build more accurate statistical speech models than HMM Incorporate key insights from human speech processing
22 SPEAKER LISTENER brain message brain Internal model decod messa motor/articulators ear speech acoustics auditory reception
23 o Hidden Dynamic Model --- Graphical Model Representation (1) S 1 (1) (1) S 2 S 3 (1) S 4... (1) S T (2) S 1 ( 2) S 2 (2) ( 2) S 3 S... (2 ) 4 S T S ( L) 1 S ( L) 2 ( L) S ( L) 3 S4... (L) S T t1 t2 t3 t 4 tk z z z z 4 zk
24 Result Comparisons (TIMIT phone recog. accuracy %) ASR Systems Generative Discriminative arly (Discrete) HMM (Lee & Hon) 66.08? arly HMM (Cambridge U) (MMI) arly HMM (ATR-Japan) (MCE) onditional Random Field (Ohio State U., 2005) NA ecurrent Neural Nets (Cambridge U.) NA arge-margin HMM (U. Penn, 2006) (NIPS 06) odern HMM (MSR) (MCE) idden Dynamic Model (MSR) 75.18?
25 Major Research Directions in the Field Direction 2: Discriminative Modeling/Learning Acknowledge accurate speech modeling is too hard View HMM not as a generative model but as discriminant function Large-margin discriminative parameter learning Direct models for speech-class discrimination: e.g., MEMM, CRF; feature engineering; deep learning and structure Discriminative training for generative models
26 utline Introduction to HMMs; Basic ML Training Techniques HMM Parameter Learning EM (maximum likelihood) Discriminative learning (intro) Engineering Intricacy in Using HMMs in Speech Recognition Left-to-right structure Use of lexicon and pronunciation modeling Use of context dependency (variance reduction) Complexity control: decision tree & automatic parameter tying Beyond HMM for Speech Modeling/Recognition Better generative models for speech dynamics Discriminative models: e.g., CRF and feature engineering Discriminative Training in Speech Recognition --- A unifying framework (He, Deng, Wu, 2008, IEEE SPM)
27
28 MMIE Criterion Maximizing the mutual information is equivalent to choosing the parameter set λ to maximize: F MMIE ( λ) = t ( ) O M P( w ) R Pλ t w t log t Σ ( ) ( P O M P w) = 1 λ t w w Maximization implies increasing the numerator term (maximum likelihood estimation MLE) and/or decreasing the denominator term The latter is accomplished by reducing the probabilities of incorrect, or competing, hypotheses Clever optimization methods
29 MPE/MWE Criteria O (Λ)= MWE R r=1 p(x s,λ) P(s ) A (s,s ) sr r r r 1 r r p(x s,λ)p(s ) sr r r r Raw phone or word accuracy A(s, S) (Povey et. al. 2002) Requiring use of lattices to determine A(s,S) Very difficult to optimize wrt HMM parameters Engineering tricks Unifying MPE, MCE, MMI (He & Deng, 2007) in both criterion and optimization method
30 MCE Criterion M i X g i, 1 ),, ( = Λ iscriminative functions: ), ( arg max ), ( ), ( 1,, Λ + Λ = Λ = X g X g X d j M j i j i i [ ] ) ( min arg ) ( )), ( ( )), ( ( ) ( ) 1( ), ( ), ( 1 Λ = Λ Λ = Λ = Λ Λ = Λ Λ = L X X P X d l X d l E L C X X d X d opt X X M i i i moothed classification rror count: isclassification measure: ptimization (gradient): 0, 1 )), ( ( > = Λ γ X d l i ere error smoothing ction is sigmoidal :
31 rge-margin Estimates MMI, MPE/MWE, & common MCE do not embrace margins Margin parameter m>0 in MCE s error smoothing function 1 l( di ( X, Λ )) =, γ > exp( γ ( d ( X, Λ ) + m( I))) i Careful exploitation of the margin parameter in LM-MCE training of HMM parameters leads to significant ASR performance improvement (Yu, Deng, He, Acero, 2006, 2007)
32 rge-margin Estimates Paper: Sha & Saul (NIPS 06): Large Margin Training of Continuous-density Hidden Markov Models Re-formulate Gaussians in HMM -- turning log(variances) into independent parameters to optimize SVM-type formulation of objective function --- use of slack variables Make margin a function of the training token according to Hamming distance Convex optimization (vs. LM-MCE with local optima) Good relative performance improvement on TIMIT: phone accuracy from 67.3% (EM) to 71.8% (LM) Several other groups explore different ways of incorporating margins in HMM parameter training
33 MCE in more detail Define misclassification measure: d ( X, Λ ) = log p ( X s ) log p ( X S ) r r Λ r, r,1 Λ r, r he case of using correct and top one incorrect competing tokens) Observation. seq.: X r x 1, x 2, x 3, x 4,, x t,, x T correct label: S r OH THREE EIGHT competitor: s r,1 OH SIX EIGHT s r,1 : the top one incorrect (not equal to S r ) competing string 33
34 34 CE: Loss function s = arg max log p ( X, s ) * Classification: { } r Λ r r s Classifi. error: d r (X r,λ) > 0 1 classification error Loss function: smoothed error count function: 1 l d ( X, Λ ) = + e Λ ( ) (, ) r r r d X r d r (X r,λ) < 0 0 classification error 1 r r
35 35 MCE: Objective function MCE objective function: R 1 L ( Λ ) = l d ( X, Λ) ( ) MCE r r r R r = 1 L MCE (Λ) is the smoothed recognition error rate on the string (token) level. Model (acoustic model) is trained to minimize L MCE (Λ), i.e., Λ * = argmin Λ {L MCE (Λ)}
36 36 CE: Optimization Traditional Stochastic GD Gradient descent based online optimization New Growth Transform. Extend Baum-Welch based batch-mode method Convergence is unstable Training process is difficult to be parallelized Stable convergence Ready for parallelized processing
37 Introduction to EBW or GT (Part I) I) Applied to rational function as the objective fcnt: P( Λ ) = G( Λ) H ( Λ) Equivalent auxiliary objective fctn (often easier t optimize): F( ΛΛ ; ) = G( Λ) P( Λ ) H( Λ ) + D here D is a Λ-independent constant. (see proof in SPM paper)
38 Introduction to EBW or GT (Part II) Applied to the objective fcnt of the form: Equivalent auxiliary objective fctn (easier to optimize): (see proof in SPM paper & tech report)
39 CE: Optimization o Growth Transformation based MCE: If Λ=T(Λ') ensures P(Λ)>P(Λ'), i.e., P(Λ) grows, then T( ) is called a growth transformation of Λ for P(Λ). Minimizing L MCE (Λ) = l d( ) Maximizing P(Λ) = G(Λ)/H(Λ) Maximizing F(Λ;Λ ) = G-P H+D GT formula U( )/ Λ = 0 Λ =T(Λ ) Maximizing U(Λ;Λ ) = f ( )log f( ) Maximizing F(Λ;Λ ) = f ( ) 39
40 Rational Function Form for EC Objective Function d ( X, Λ ) = log p ( X s ) log p ( X S ) r r Λ r, r,1 Λ r, r 1 l d ( X, Λ ) = + e Λ ( ) (, ) r r r d X 1 r r R 1 L ( Λ ) = l d ( X, Λ) ( ) MCE r r r R r = 1 Not an obvious rational function (ratio) due to the summation
41 41 ational Function Form for MEC Objectiv nction Re-write MCE loss function to l d ( X, Λ ) = ( ) r r r px (, s Λ) r r,1 px (, s Λ ) + px (, S Λ) r r,1 r r Then, min. L MCE (Λ) max. Q(Λ), where ( ) Q( Λ ) = R 1 L ( Λ) MCE p( Xr, sr Λ) δ ( sr, Sr) R R px ( r, Sr Λ) s r { s r,1, S r} = = px (, s Λ ) + px (, S Λ) px (, s Λ) r= 1 r r,1 r r r= 1 r r s { s, S } r r,1 r
42
43 Unified Objective Function Can do the same trick for MPE/MWE MMI objective function is naturally a rational function (due to logarithm in the MMI definition) Unified form:
44
45 45 CE: Optimization (cont d) Coming back to MCE: its objective function can be reformulated as a single fractional function P(Λ) G( Λ) P( Λ ) = H ( Λ) where R G( Λ ) = p ( X1,..., XR, s1,..., sr) Λ δ ( sr, Sr) s s r= 1 1 H( Λ ) = p ( X,..., X, s,..., s ) s 1 R s R Λ 1 R 1 R
46 MCE: Optimization (cont d) Increasing P(Λ) can be achieved by maximizing F( ΛΛ ; ) = G( Λ) P( Λ ) H( Λ ) + D as long as D is a Λ-independent constant. i.e., P( Λ) P( Λ ) = F( Λ; Λ ) F( Λ ; Λ ) 1 H ( Λ) [ ] (Λ is the parameter set obtained from last iteration) Substitute G() and H() into F(), F( ΛΛ ; ) = pxqs (,, Λ)[ Cs ( ) P( Λ )] + D q s 46
47 Reformulate F(Λ;Λ') to F( ΛΛ ; ) = f( χ, q, s, ΛΛ ; ) dχ where s q χ [ ] f ( χ, qs,, Λ; Λ ) = Γ( Λ ) + ds ( ) p( χ, q s, Λ) ΓΛ ( ) = δ( χ, X) p( q, s)[ C( s) P( Λ )] R Cs () = δ ( sr, Sr) r= 1 s F(Λ;Λ') is ready for EM-style optimization after applying GT theory (Part II) Note: Γ(Λ ) is a constant, and log p(χ, q s, Λ) is easy to 47
48 Increasing F(Λ;Λ') can be achieved by maximizing U( Λ ) = f( χ, q, s, Λ ; Λ )log f( χ, q, s, Λ; Λ ) dχ s q χ Use EBM/GT (Part II) for E step. log f(χ,q,s,λ;λ') is decomposable w.r.t Λ, so M step is easy to compute. So the growth transformation of Λ for CDHMM is: U ( Λ) Λ = 0 Λ = T ( Λ ) 48
49 CE: HMM estimation ormulas For Gaussian mixture CDHMM, μ m T 1 ( μ, ) exp ( μ) ( μ) px Σ Σ x Σ x 2 GT of mean and covariance of Gaussian m is m Σ = = r j t r t t Δ γ () tx + D mr, rt, m m Δ γ () t + D mr, m μ T Δ γmr, ( t)( xrt, - μm)( xrt, - μm) + DmΣ m+ Dm( μm- μ m)( μm- μ m) r t Δ γ () t + D mr, m T where ( ) Δ γ () t = p( S X, Λ ) p( s X, Λ ) γ () t γ () t mr, r r r,1 r mrs,, mrs,, r 49 r,1
50 50 CE: HMM estimation ormulas Setting of D m Theoretically, set D m so that f(χ,q,s,λ;λ') > 0 Empirically, R [ D = E p( S X, Λ ) p( S X, Λ ) γ ( t) m r r r r m, r, S r= 1 t + p( s X, Λ ) γ ( t) r,1 r m, r, s t r r,1 ]
51 51 MCE: Workflow Training utterances Recognition Last iteration Model Λ Competing strings Training transcripts GT-MCE New model Λ next iteration
52 52 Experiment: TI-DIGITS Vocabulary: 1 to 9, plus oh and zero Training set: 8623 utterances / words Test set: 8700 utterances / words 33-dimentional spectrum feature: energy +10 MFCCs, plus and features. Model: Continuous Density HMMs Total number of Gaussian components: 3284
53 Experiment: TI-DIGITS GT-MCE vs. ML (maximum likelihood) baseline 0.4 MCE training - TIdigits 1100 MCE training - TIdigits WER (%) E=1.0 E=2.0 E=2.5 Loss func. (sigmoid error count) E=1.0 E=2.0 E= MCE iteration MCE iteration Obtain the lowest error rate on this task Reduce recognition Word Error Rate (WER) by 23% Fast and stable convergence 53
54 HMM is an underlying model for modern speech recognizers Fundamental properties of HMM Strengths and weaknesses of HMM for speech modeling and recognition Discriminative training for HMM as the generative model Reasonably good theory and practice for HMM discriminative training
Discriminative Learning in Speech Recognition
Discriminative Learning in Speech Recognition Yueng-Tien, Lo g96470198@csie.ntnu.edu.tw Speech Lab, CSIE Reference Xiaodong He and Li Deng. "Discriminative Learning in Speech Recognition, Technical Report
More information] Automatic Speech Recognition (CS753)
] Automatic Speech Recognition (CS753) Lecture 17: Discriminative Training for HMMs Instructor: Preethi Jyothi Sep 28, 2017 Discriminative Training Recall: MLE for HMMs Maximum likelihood estimation (MLE)
More informationHidden Markov Models Hamid R. Rabiee
Hidden Markov Models Hamid R. Rabiee 1 Hidden Markov Models (HMMs) In the previous slides, we have seen that in many cases the underlying behavior of nature could be modeled as a Markov process. However
More informationHidden Markov Modelling
Hidden Markov Modelling Introduction Problem formulation Forward-Backward algorithm Viterbi search Baum-Welch parameter estimation Other considerations Multiple observation sequences Phone-based models
More informationHidden Markov Models and Gaussian Mixture Models
Hidden Markov Models and Gaussian Mixture Models Hiroshi Shimodaira and Steve Renals Automatic Speech Recognition ASR Lectures 4&5 23&27 January 2014 ASR Lectures 4&5 Hidden Markov Models and Gaussian
More informationUniversity of Cambridge. MPhil in Computer Speech Text & Internet Technology. Module: Speech Processing II. Lecture 2: Hidden Markov Models I
University of Cambridge MPhil in Computer Speech Text & Internet Technology Module: Speech Processing II Lecture 2: Hidden Markov Models I o o o o o 1 2 3 4 T 1 b 2 () a 12 2 a 3 a 4 5 34 a 23 b () b ()
More informationHidden Markov Models and Gaussian Mixture Models
Hidden Markov Models and Gaussian Mixture Models Hiroshi Shimodaira and Steve Renals Automatic Speech Recognition ASR Lectures 4&5 25&29 January 2018 ASR Lectures 4&5 Hidden Markov Models and Gaussian
More informationCS 136a Lecture 7 Speech Recognition Architecture: Training models with the Forward backward algorithm
+ September13, 2016 Professor Meteer CS 136a Lecture 7 Speech Recognition Architecture: Training models with the Forward backward algorithm Thanks to Dan Jurafsky for these slides + ASR components n Feature
More informationOutline of Today s Lecture
University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 Jeff A. Bilmes Lecture 12 Slides Feb 23 rd, 2005 Outline of Today s
More informationSequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them
HMM, MEMM and CRF 40-957 Special opics in Artificial Intelligence: Probabilistic Graphical Models Sharif University of echnology Soleymani Spring 2014 Sequence labeling aking collective a set of interrelated
More informationSTA 414/2104: Machine Learning
STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far
More informationSOFT MARGIN ESTIMATION FOR AUTOMATIC SPEECH RECOGNITION
SOFT MARGIN ESTIMATION FOR AUTOMATIC SPEECH RECOGNITION A Dissertation Presented to The Academic Faculty By Jinyu Li In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in Electrical
More informationLecture 3: ASR: HMMs, Forward, Viterbi
Original slides by Dan Jurafsky CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 3: ASR: HMMs, Forward, Viterbi Fun informative read on phonetics The
More informationBrief Introduction of Machine Learning Techniques for Content Analysis
1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data Statistical Machine Learning from Data Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique Fédérale de Lausanne (EPFL),
More informationAutomatic Speech Recognition (CS753)
Automatic Speech Recognition (CS753) Lecture 6: Hidden Markov Models (Part II) Instructor: Preethi Jyothi Aug 10, 2017 Recall: Computing Likelihood Problem 1 (Likelihood): Given an HMM l =(A, B) and an
More informationConditional Random Field
Introduction Linear-Chain General Specific Implementations Conclusions Corso di Elaborazione del Linguaggio Naturale Pisa, May, 2011 Introduction Linear-Chain General Specific Implementations Conclusions
More informationIn machine learning, there are two distinct groups of learning methods for
[ Hui Jiang and Xinwei Li ] [ An advanced method of discriminative training for speech and language processing] In machine learning, there are two distinct groups of learning methods for building pattern
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationThe Noisy Channel Model. CS 294-5: Statistical Natural Language Processing. Speech Recognition Architecture. Digitizing Speech
CS 294-5: Statistical Natural Language Processing The Noisy Channel Model Speech Recognition II Lecture 21: 11/29/05 Search through space of all possible sentences. Pick the one that is most probable given
More informationHidden Markov Model and Speech Recognition
1 Dec,2006 Outline Introduction 1 Introduction 2 3 4 5 Introduction What is Speech Recognition? Understanding what is being said Mapping speech data to textual information Speech Recognition is indeed
More informationHIDDEN MARKOV MODELS IN SPEECH RECOGNITION
HIDDEN MARKOV MODELS IN SPEECH RECOGNITION Wayne Ward Carnegie Mellon University Pittsburgh, PA 1 Acknowledgements Much of this talk is derived from the paper "An Introduction to Hidden Markov Models",
More informationDiscriminative models for speech recognition
Discriminative models for speech recognition Anton Ragni Peterhouse University of Cambridge A thesis submitted for the degree of Doctor of Philosophy 2013 Declaration This dissertation is the result of
More informationSpeech and Language Processing. Chapter 9 of SLP Automatic Speech Recognition (II)
Speech and Language Processing Chapter 9 of SLP Automatic Speech Recognition (II) Outline for ASR ASR Architecture The Noisy Channel Model Five easy pieces of an ASR system 1) Language Model 2) Lexicon/Pronunciation
More informationSpeaker Representation and Verification Part II. by Vasileios Vasilakakis
Speaker Representation and Verification Part II by Vasileios Vasilakakis Outline -Approaches of Neural Networks in Speaker/Speech Recognition -Feed-Forward Neural Networks -Training with Back-propagation
More informationReformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features
Reformulating the HMM as a trajectory model by imposing explicit relationship between static and dynamic features Heiga ZEN (Byung Ha CHUN) Nagoya Inst. of Tech., Japan Overview. Research backgrounds 2.
More informationMultiscale Systems Engineering Research Group
Hidden Markov Model Prof. Yan Wang Woodruff School of Mechanical Engineering Georgia Institute of echnology Atlanta, GA 30332, U.S.A. yan.wang@me.gatech.edu Learning Objectives o familiarize the hidden
More informationAugmented Statistical Models for Speech Recognition
Augmented Statistical Models for Speech Recognition Mark Gales & Martin Layton 31 August 2005 Trajectory Models For Speech Processing Workshop Overview Dependency Modelling in Speech Recognition: latent
More informationHidden Markov Models The three basic HMM problems (note: change in notation) Mitch Marcus CSE 391
Hidden Markov Models The three basic HMM problems (note: change in notation) Mitch Marcus CSE 391 Parameters of an HMM States: A set of states S=s 1, s n Transition probabilities: A= a 1,1, a 1,2,, a n,n
More informationHidden Markov Models
CS769 Spring 2010 Advanced Natural Language Processing Hidden Markov Models Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu 1 Part-of-Speech Tagging The goal of Part-of-Speech (POS) tagging is to label each
More informationLecture 3: Machine learning, classification, and generative models
EE E6820: Speech & Audio Processing & Recognition Lecture 3: Machine learning, classification, and generative models 1 Classification 2 Generative models 3 Gaussian models Michael Mandel
More informationHidden Markov Models
Hidden Markov Models CI/CI(CS) UE, SS 2015 Christian Knoll Signal Processing and Speech Communication Laboratory Graz University of Technology June 23, 2015 CI/CI(CS) SS 2015 June 23, 2015 Slide 1/26 Content
More informationSegmental Recurrent Neural Networks for End-to-end Speech Recognition
Segmental Recurrent Neural Networks for End-to-end Speech Recognition Liang Lu, Lingpeng Kong, Chris Dyer, Noah Smith and Steve Renals TTI-Chicago, UoE, CMU and UW 9 September 2016 Background A new wave
More informationThe Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 10: Acoustic Models
Statistical NLP Spring 2009 The Noisy Channel Model Lecture 10: Acoustic Models Dan Klein UC Berkeley Search through space of all possible sentences. Pick the one that is most probable given the waveform.
More informationStatistical NLP Spring The Noisy Channel Model
Statistical NLP Spring 2009 Lecture 10: Acoustic Models Dan Klein UC Berkeley The Noisy Channel Model Search through space of all possible sentences. Pick the one that is most probable given the waveform.
More informationStatistical Processing of Natural Language
Statistical Processing of Natural Language and DMKM - Universitat Politècnica de Catalunya and 1 2 and 3 1. Observation Probability 2. Best State Sequence 3. Parameter Estimation 4 Graphical and Generative
More informationHidden Markov Model. Ying Wu. Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208
Hidden Markov Model Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1/19 Outline Example: Hidden Coin Tossing Hidden
More informationSparse Models for Speech Recognition
Sparse Models for Speech Recognition Weibin Zhang and Pascale Fung Human Language Technology Center Hong Kong University of Science and Technology Outline Introduction to speech recognition Motivations
More informationAugmented Statistical Models for Classifying Sequence Data
Augmented Statistical Models for Classifying Sequence Data Martin Layton Corpus Christi College University of Cambridge September 2006 Dissertation submitted to the University of Cambridge for the degree
More informationECE521 Lecture 19 HMM cont. Inference in HMM
ECE521 Lecture 19 HMM cont. Inference in HMM Outline Hidden Markov models Model definitions and notations Inference in HMMs Learning in HMMs 2 Formally, a hidden Markov model defines a generative process
More informationLecture 4: Hidden Markov Models: An Introduction to Dynamic Decision Making. November 11, 2010
Hidden Lecture 4: Hidden : An Introduction to Dynamic Decision Making November 11, 2010 Special Meeting 1/26 Markov Model Hidden When a dynamical system is probabilistic it may be determined by the transition
More informationHidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010
Hidden Markov Models Aarti Singh Slides courtesy: Eric Xing Machine Learning 10-701/15-781 Nov 8, 2010 i.i.d to sequential data So far we assumed independent, identically distributed data Sequential data
More informationIntroduction to Neural Networks
Introduction to Neural Networks Steve Renals Automatic Speech Recognition ASR Lecture 10 24 February 2014 ASR Lecture 10 Introduction to Neural Networks 1 Neural networks for speech recognition Introduction
More informationStatistical Sequence Recognition and Training: An Introduction to HMMs
Statistical Sequence Recognition and Training: An Introduction to HMMs EECS 225D Nikki Mirghafori nikki@icsi.berkeley.edu March 7, 2005 Credit: many of the HMM slides have been borrowed and adapted, with
More informationLecture 3: Pattern Classification
EE E6820: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 1 2 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mixtures
More informationDiscriminative training of GMM-HMM acoustic model by RPCL type Bayesian Ying-Yang harmony learning
Discriminative training of GMM-HMM acoustic model by RPCL type Bayesian Ying-Yang harmony learning Zaihu Pang 1, Xihong Wu 1, and Lei Xu 1,2 1 Speech and Hearing Research Center, Key Laboratory of Machine
More informationParametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012
Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood
More informationMark Gales October y (x) x 1. x 2 y (x) Inputs. Outputs. x d. y (x) Second Output layer layer. layer.
University of Cambridge Engineering Part IIB & EIST Part II Paper I0: Advanced Pattern Processing Handouts 4 & 5: Multi-Layer Perceptron: Introduction and Training x y (x) Inputs x 2 y (x) 2 Outputs x
More informationDiagonal Priors for Full Covariance Speech Recognition
Diagonal Priors for Full Covariance Speech Recognition Peter Bell 1, Simon King 2 Centre for Speech Technology Research, University of Edinburgh Informatics Forum, 10 Crichton St, Edinburgh, EH8 9AB, UK
More informationGraphical Models for Automatic Speech Recognition
Graphical Models for Automatic Speech Recognition Advanced Signal Processing SE 2, SS05 Stefan Petrik Signal Processing and Speech Communication Laboratory Graz University of Technology GMs for Automatic
More informationClustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014.
Clustering K-means Machine Learning CSE546 Carlos Guestrin University of Washington November 4, 2014 1 Clustering images Set of Images [Goldberger et al.] 2 1 K-means Randomly initialize k centers µ (0)
More informationASR using Hidden Markov Model : A tutorial
ASR using Hidden Markov Model : A tutorial Samudravijaya K Workshop on ASR @BAMU; 14-OCT-11 samudravijaya@gmail.com Tata Institute of Fundamental Research Samudravijaya K Workshop on ASR @BAMU; 14-OCT-11
More information10. Hidden Markov Models (HMM) for Speech Processing. (some slides taken from Glass and Zue course)
10. Hidden Markov Models (HMM) for Speech Processing (some slides taken from Glass and Zue course) Definition of an HMM The HMM are powerful statistical methods to characterize the observed samples of
More informationLecture 10. Discriminative Training, ROVER, and Consensus. Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen
Lecture 10 Discriminative Training, ROVER, and Consensus Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen IBM T.J. Watson Research Center Yorktown Heights, New York, USA {picheny,bhuvana,stanchen}@us.ibm.com
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationStatistical Methods for NLP
Statistical Methods for NLP Information Extraction, Hidden Markov Models Sameer Maskey Week 5, Oct 3, 2012 *many slides provided by Bhuvana Ramabhadran, Stanley Chen, Michael Picheny Speech Recognition
More informationGraphical Models for Collaborative Filtering
Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,
More informationExpectation Maximization (EM)
Expectation Maximization (EM) The EM algorithm is used to train models involving latent variables using training data in which the latent variables are not observed (unlabeled data). This is to be contrasted
More informationorder is number of previous outputs
Markov Models Lecture : Markov and Hidden Markov Models PSfrag Use past replacements as state. Next output depends on previous output(s): y t = f[y t, y t,...] order is number of previous outputs y t y
More informationModel-Based Margin Estimation for Hidden Markov Model Learning and Generalization
1 2 3 4 5 6 7 8 Model-Based Margin Estimation for Hidden Markov Model Learning and Generalization Sabato Marco Siniscalchi a,, Jinyu Li b, Chin-Hui Lee c a Faculty of Engineering and Architecture, Kore
More informationGraphical models for part of speech tagging
Indian Institute of Technology, Bombay and Research Division, India Research Lab Graphical models for part of speech tagging Different Models for POS tagging HMM Maximum Entropy Markov Models Conditional
More informationLecture 15. Probabilistic Models on Graph
Lecture 15. Probabilistic Models on Graph Prof. Alan Yuille Spring 2014 1 Introduction We discuss how to define probabilistic models that use richly structured probability distributions and describe how
More informationStatistical Methods for NLP
Statistical Methods for NLP Sequence Models Joakim Nivre Uppsala University Department of Linguistics and Philology joakim.nivre@lingfil.uu.se Statistical Methods for NLP 1(21) Introduction Structured
More informationLogistic Regression. Machine Learning Fall 2018
Logistic Regression Machine Learning Fall 2018 1 Where are e? We have seen the folloing ideas Linear models Learning as loss minimization Bayesian learning criteria (MAP and MLE estimation) The Naïve Bayes
More informationDynamic Approaches: The Hidden Markov Model
Dynamic Approaches: The Hidden Markov Model Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Inference as Message
More informationCOMP90051 Statistical Machine Learning
COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 24. Hidden Markov Models & message passing Looking back Representation of joint distributions Conditional/marginal independence
More informationPHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS
PHONEME CLASSIFICATION OVER THE RECONSTRUCTED PHASE SPACE USING PRINCIPAL COMPONENT ANALYSIS Jinjin Ye jinjin.ye@mu.edu Michael T. Johnson mike.johnson@mu.edu Richard J. Povinelli richard.povinelli@mu.edu
More informationLecture 11: Hidden Markov Models
Lecture 11: Hidden Markov Models Cognitive Systems - Machine Learning Cognitive Systems, Applied Computer Science, Bamberg University slides by Dr. Philip Jackson Centre for Vision, Speech & Signal Processing
More informationHidden Markov Models
Hidden Markov Models Dr Philip Jackson Centre for Vision, Speech & Signal Processing University of Surrey, UK 1 3 2 http://www.ee.surrey.ac.uk/personal/p.jackson/isspr/ Outline 1. Recognizing patterns
More informationSequence Modelling with Features: Linear-Chain Conditional Random Fields. COMP-599 Oct 6, 2015
Sequence Modelling with Features: Linear-Chain Conditional Random Fields COMP-599 Oct 6, 2015 Announcement A2 is out. Due Oct 20 at 1pm. 2 Outline Hidden Markov models: shortcomings Generative vs. discriminative
More informationNote Set 5: Hidden Markov Models
Note Set 5: Hidden Markov Models Probabilistic Learning: Theory and Algorithms, CS 274A, Winter 2016 1 Hidden Markov Models (HMMs) 1.1 Introduction Consider observed data vectors x t that are d-dimensional
More informationHidden Markov Models
Hidden Markov Models Lecture Notes Speech Communication 2, SS 2004 Erhard Rank/Franz Pernkopf Signal Processing and Speech Communication Laboratory Graz University of Technology Inffeldgasse 16c, A-8010
More informationStatistical NLP Spring Digitizing Speech
Statistical NLP Spring 2008 Lecture 10: Acoustic Models Dan Klein UC Berkeley Digitizing Speech 1 Frame Extraction A frame (25 ms wide) extracted every 10 ms 25 ms 10ms... a 1 a 2 a 3 Figure from Simon
More informationDigitizing Speech. Statistical NLP Spring Frame Extraction. Gaussian Emissions. Vector Quantization. HMMs for Continuous Observations? ...
Statistical NLP Spring 2008 Digitizing Speech Lecture 10: Acoustic Models Dan Klein UC Berkeley Frame Extraction A frame (25 ms wide extracted every 10 ms 25 ms 10ms... a 1 a 2 a 3 Figure from Simon Arnfield
More informationMixtures of Gaussians with Sparse Regression Matrices. Constantinos Boulis, Jeffrey Bilmes
Mixtures of Gaussians with Sparse Regression Matrices Constantinos Boulis, Jeffrey Bilmes {boulis,bilmes}@ee.washington.edu Dept of EE, University of Washington Seattle WA, 98195-2500 UW Electrical Engineering
More information> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel
Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation
More informationIntroduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf
1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Lior Wolf 2014-15 We know that X ~ B(n,p), but we do not know p. We get a random sample from X, a
More informationMAP adaptation with SphinxTrain
MAP adaptation with SphinxTrain David Huggins-Daines dhuggins@cs.cmu.edu Language Technologies Institute Carnegie Mellon University MAP adaptation with SphinxTrain p.1/12 Theory of MAP adaptation Standard
More informationLecture 5: Logistic Regression. Neural Networks
Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationHidden Markov Models
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Hidden Markov Models Matt Gormley Lecture 22 April 2, 2018 1 Reminders Homework
More informationCS534 Machine Learning - Spring Final Exam
CS534 Machine Learning - Spring 2013 Final Exam Name: You have 110 minutes. There are 6 questions (8 pages including cover page). If you get stuck on one question, move on to others and come back to the
More informationBoundary Contraction Training for Acoustic Models based on Discrete Deep Neural Networks
INTERSPEECH 2014 Boundary Contraction Training for Acoustic Models based on Discrete Deep Neural Networks Ryu Takeda, Naoyuki Kanda, and Nobuo Nukaga Central Research Laboratory, Hitachi Ltd., 1-280, Kokubunji-shi,
More informationHidden Markov Models. By Parisa Abedi. Slides courtesy: Eric Xing
Hidden Markov Models By Parisa Abedi Slides courtesy: Eric Xing i.i.d to sequential data So far we assumed independent, identically distributed data Sequential (non i.i.d.) data Time-series data E.g. Speech
More informationHidden Markov models
Hidden Markov models Charles Elkan November 26, 2012 Important: These lecture notes are based on notes written by Lawrence Saul. Also, these typeset notes lack illustrations. See the classroom lectures
More informationHidden Markov Models
Hidden Markov Models Slides mostly from Mitch Marcus and Eric Fosler (with lots of modifications). Have you seen HMMs? Have you seen Kalman filters? Have you seen dynamic programming? HMMs are dynamic
More informationExperiments with a Gaussian Merging-Splitting Algorithm for HMM Training for Speech Recognition
Experiments with a Gaussian Merging-Splitting Algorithm for HMM Training for Speech Recognition ABSTRACT It is well known that the expectation-maximization (EM) algorithm, commonly used to estimate hidden
More informationMore on HMMs and other sequence models. Intro to NLP - ETHZ - 18/03/2013
More on HMMs and other sequence models Intro to NLP - ETHZ - 18/03/2013 Summary Parts of speech tagging HMMs: Unsupervised parameter estimation Forward Backward algorithm Bayesian variants Discriminative
More informationMatrix Updates for Perceptron Training of Continuous Density Hidden Markov Models
Matrix Updates for Perceptron Training of Continuous Density Hidden Markov Models Chih-Chieh Cheng Department of Computer Science and Engineering, University of California, San Diego Fei Sha Department
More informationParametric Models Part III: Hidden Markov Models
Parametric Models Part III: Hidden Markov Models Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2014 CS 551, Spring 2014 c 2014, Selim Aksoy (Bilkent
More informationLecture 9: PGM Learning
13 Oct 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I Learning parameters in MRFs 1 Learning parameters in MRFs Inference and Learning Given parameters (of potentials) and
More informationStatistical NLP for the Web Log Linear Models, MEMM, Conditional Random Fields
Statistical NLP for the Web Log Linear Models, MEMM, Conditional Random Fields Sameer Maskey Week 13, Nov 28, 2012 1 Announcements Next lecture is the last lecture Wrap up of the semester 2 Final Project
More informationPattern Classification
Pattern Classification Introduction Parametric classifiers Semi-parametric classifiers Dimensionality reduction Significance testing 6345 Automatic Speech Recognition Semi-Parametric Classifiers 1 Semi-Parametric
More informationDeep Neural Networks
Deep Neural Networks DT2118 Speech and Speaker Recognition Giampiero Salvi KTH/CSC/TMH giampi@kth.se VT 2015 1 / 45 Outline State-to-Output Probability Model Artificial Neural Networks Perceptron Multi
More informationExpectation Maximization (EM)
Expectation Maximization (EM) The Expectation Maximization (EM) algorithm is one approach to unsupervised, semi-supervised, or lightly supervised learning. In this kind of learning either no labels are
More informationClustering K-means. Machine Learning CSE546. Sham Kakade University of Washington. November 15, Review: PCA Start: unsupervised learning
Clustering K-means Machine Learning CSE546 Sham Kakade University of Washington November 15, 2016 1 Announcements: Project Milestones due date passed. HW3 due on Monday It ll be collaborative HW2 grades
More informationClassification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Topics Discriminant functions Logistic regression Perceptron Generative models Generative vs. discriminative
More informationHidden Markov Models in Language Processing
Hidden Markov Models in Language Processing Dustin Hillard Lecture notes courtesy of Prof. Mari Ostendorf Outline Review of Markov models What is an HMM? Examples General idea of hidden variables: implications
More informationEngineering Part IIB: Module 4F11 Speech and Language Processing Lectures 4/5 : Speech Recognition Basics
Engineering Part IIB: Module 4F11 Speech and Language Processing Lectures 4/5 : Speech Recognition Basics Phil Woodland: pcw@eng.cam.ac.uk Lent 2013 Engineering Part IIB: Module 4F11 What is Speech Recognition?
More informationThe Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 9: Acoustic Models
Statistical NLP Spring 2010 The Noisy Channel Model Lecture 9: Acoustic Models Dan Klein UC Berkeley Acoustic model: HMMs over word positions with mixtures of Gaussians as emissions Language model: Distributions
More information