Hidden Markov Models, Discriminative Training, and Modern Speech Recognition Li Deng Microsoft Research UW-EE516; Lecture May 21, 2009

1 Hidden Markov Models, Discriminative Training, and Modern Speech Recognition Li Deng Microsoft Research UW-EE516; Lecture May 21, 2009

2 Architecture of a speech recognition system [block diagram]: voice → signal processing → decoder → application; the decoder draws on acoustic models, language models, and adaptation.

3 Fundamental Equation of Speech Recognition: $\hat{W} = \arg\max_W P(W \mid x) = \arg\max_W P(x \mid W) P(W)$. Roles of acoustic modeling and the Hidden Markov Model (HMM).
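
A minimal Python sketch of this decision rule, assuming per-hypothesis acoustic and language-model log-scores are already available (the hypotheses and scores below are invented toy numbers):

```python
# Hypothetical log-scores: log P(x|W) from the acoustic model and
# log P(W) from the language model (toy numbers, invented).
hypotheses = {
    "oh three eight": (-120.4, -8.1),
    "oh six eight":   (-121.0, -7.9),
    "no three eight": (-125.2, -9.5),
}

# Fundamental equation: W* = argmax_W [ log P(x|W) + log P(W) ]
best = max(hypotheses, key=lambda w: sum(hypotheses[w]))
print("Recognized:", best)
```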

4 Outline Introduction to HMMs; Basic ML Training Techniques HMM Parameter Learning EM (maximum likelihood) Discriminative learning (intro) Engineering Intricacy in Using HMMs in Speech Recognition Left-to-right structure Use of lexicon and pronunciation modeling Use of context dependency (variance reduction) Complexity control: decision tree & automatic parameter tying Beyond HMM for Speech Modeling/Recognition Better generative models for speech dynamics Discriminative models: e.g., CRF and feature engineering Discriminative Training in Speech Recognition --- A unifying framework

5 HMM Introduction: Description The 1-coin model is observable because the output sequence can be mapped to a specific sequence of state transitions. The remaining models are hidden because the underlying state sequence cannot be directly inferred from the output sequence.

6 Description Specification of an HMM: $A$, the state transition probability matrix, $a_{ij} = P(q_{t+1} = j \mid q_t = i)$; $B$, the observation probability distribution, $b_j(k) = P(o_t = k \mid q_t = j)$, $1 \le k \le M$; $\pi$, the initial state distribution.
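
A minimal sketch of the $(A, B, \pi)$ specification as NumPy arrays, assuming a toy discrete-output HMM with $N = 2$ states and $M = 3$ output symbols (all numbers invented for illustration):

```python
import numpy as np

# Toy discrete-output HMM: N = 2 hidden states, M = 3 observation symbols.
A = np.array([[0.7, 0.3],       # a_ij = P(q_{t+1}=j | q_t=i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],  # b_j(k) = P(o_t=k | q_t=j)
              [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])       # initial state distribution

# Sanity checks: every distribution must sum to one.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)
```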

7 Central problems for HMM Problem 1: Evaluation. Probability of occurrence of a particular observation sequence, $O = \{o_1, \ldots, o_T\}$, given the model: $P(O \mid \lambda)$. Not straightforward because of the hidden states. Useful in sequence classification (e.g., isolated-word recognition).

8 Problem 1: Evaluation Naïve computation: $P(O \mid \lambda) = \sum_q P(O \mid q, \lambda) P(q \mid \lambda)$, where $P(q \mid \lambda) = \pi_{q_1} a_{q_1 q_2} a_{q_2 q_3} \cdots a_{q_{T-1} q_T}$. The above sum is over all state paths; there are $N^T$ state paths, each costing $O(T)$ calculations, leading to $O(T N^T)$ time complexity.

9 Problem 1 Need efficient solution. Define the auxiliary forward variable $\alpha$: $\alpha_t(i) = P(o_1, \ldots, o_t, q_t = i \mid \lambda)$. $\alpha_t(i)$ is the probability of observing the partial sequence of observables $o_1, \ldots, o_t$ such that at time $t$ the state is $q_t = i$. Then recursion on $\alpha$ provides the most efficient solution to Problem 1 (the forward algorithm).
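
A hedged sketch of the forward algorithm, which computes $P(O \mid \lambda)$ in $O(TN^2)$ time via the initialization $\alpha_1(i) = \pi_i b_i(o_1)$ and the recursion $\alpha_t(j) = [\sum_i \alpha_{t-1}(i) a_{ij}] \, b_j(o_t)$; it reuses the toy model from the earlier sketch:

```python
import numpy as np

def forward(obs, A, B, pi):
    """Forward algorithm: returns P(O | lambda) in O(T * N^2) time."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                # alpha_1(i) = pi_i b_i(o_1)
    for t in range(1, T):
        # alpha_t(j) = [sum_i alpha_{t-1}(i) a_ij] * b_j(o_t)
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha[-1].sum()                      # P(O|lambda) = sum_i alpha_T(i)

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
print(forward([0, 2, 1], A, B, pi))
```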

10 Central problems for HMM Problem 2: Decoding. Find the optimal state sequence to produce the given observations, $O = \{o_1, \ldots, o_T\}$, given the model and an optimality criterion. Useful in recognition problems (e.g., continuous speech recognition).

11 Problem 2: Decoding Recursion: $\delta_t(j) = \max_{1 \le i \le N} [\delta_{t-1}(i) \, a_{ij}] \, b_j(o_t)$ and $\psi_t(j) = \arg\max_{1 \le i \le N} [\delta_{t-1}(i) \, a_{ij}]$, for $2 \le t \le T$, $1 \le j \le N$. Termination: $P^* = \max_{1 \le i \le N} \delta_T(i)$, $q_T^* = \arg\max_{1 \le i \le N} \delta_T(i)$. Backtracking: $q_t^* = \psi_{t+1}(q_{t+1}^*)$, $t = T-1, \ldots, 1$. $P^*$ gives the state-optimised probability; $Q^* = \{q_1^*, q_2^*, \ldots, q_T^*\}$ is the optimal state sequence.
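
A corresponding sketch of Viterbi decoding for the same toy model, returning $P^*$ and $Q^*$ (0-indexed states; log-domain/scaling safeguards omitted):

```python
import numpy as np

def viterbi(obs, A, B, pi):
    """Viterbi decoding: returns (P*, Q*) for the most likely state path."""
    obs = np.asarray(obs)
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A         # scores[i, j] = delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)             # best predecessor of state j
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]               # termination: q_T* = argmax delta_T
    for t in range(T - 1, 0, -1):                  # backtracking
        path.append(int(psi[t][path[-1]]))
    return float(delta[-1].max()), path[::-1]

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
print(viterbi([0, 2, 1], A, B, pi))
```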

12 Central problems for HMM Problem 3: Learning of HMM parameters. Determine the optimum model given a training set of observations: find $\lambda$ such that $P(O \mid \lambda)$ is maximal (ML), or such that class discrimination is maximal (with greater separation margins).

13 Outline Introduction to HMMs; Basic ML Techniques HMM Parameter Learning EM (maximum likelihood) Discriminative learning Engineering Intricacy in Using HMMs (Speech) Left-to-right structure Use of lexicon and pronunciation modeling Use of context dependency (variance reduction) Complexity control: decision tree & automatic parameter tying Beyond HMM for Speech Modeling/Recognition Better generative models for speech dynamics Discriminative models: e.g., CRF and feature engineering Discriminative Training in Speech Recognition --- A unifying framework

14 Problem 3: Learning Maximum likelihood: the Baum-Welch algorithm uses the forward and backward algorithms to calculate the auxiliary variables $\alpha, \beta$. The B-W algorithm is a special case of the EM algorithm. E-step: calculation of posterior probabilities from $\alpha, \beta$; M-step: HMM parameter updates $\hat{\pi}$, $\hat{a}_{ij}$, $\hat{b}_j(k)$. Practical issues: can get stuck in local maxima; numerical problems (use logs and scaling).
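
A compact sketch of one Baum-Welch (EM) re-estimation step for a single discrete observation sequence, written without the log/scaling safeguards the slide warns about (toy model reused from the earlier sketches):

```python
import numpy as np

def forward_backward(obs, A, B, pi):
    """Compute alpha and beta for one discrete observation sequence."""
    T, N = len(obs), len(pi)
    alpha, beta = np.zeros((T, N)), np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return alpha, beta

def baum_welch_step(obs, A, B, pi):
    """One EM (Baum-Welch) re-estimation step; no numerical safeguards."""
    obs = np.asarray(obs)
    alpha, beta = forward_backward(obs, A, B, pi)
    likelihood = alpha[-1].sum()
    # E-step: state posteriors gamma[t,i] and transition posteriors xi[t,i,j].
    gamma = alpha * beta / likelihood
    xi = (alpha[:-1, :, None] * A[None] *
          (B[:, obs[1:]].T * beta[1:])[:, None, :]) / likelihood
    # M-step: closed-form updates of pi, A, and B.
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.stack([gamma[obs == k].sum(axis=0) for k in range(B.shape[1])],
                     axis=1) / gamma.sum(axis=0)[:, None]
    return A_new, B_new, pi_new

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
print(baum_welch_step([0, 2, 1, 1, 0], A, B, pi))
```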

15 Discriminative Learning (intro) Maximum Mutual Information (MMI); Minimum Classification Error (MCE); Minimum Phone/Word Errors (MPE, MWE); Large-Margin Discriminative Training. More coverage later.

16 Outline Introduction to HMMs; Basic ML Techniques HMM Parameter Learning (Advanced ML Techniques) EM (maximum likelihood) Discriminative learning Engineering Intricacy in Using HMMs (Speech) Left-to-right structure Use of lexicon and pronunciation modeling Use of context dependency (variance reduction) Complexity control: decision tree & automatic parameter tying Beyond HMM for Speech Modeling/Recognition Better generative models for speech dynamics Discriminative models: e.g., CRF and feature engineering Discriminative Training in Speech Recognition --- A unifying framework

17 Left-to-right Topology in HMM for Speech Lower-triangle parts of the transition probability matrices are forced to zero! Then the $a_{ij}$'s play very minor roles relative to the output probabilities.

18 Word/phone as HMM States, Lexicon, Context Dependency, & Decision Tree

19 Outline Introduction to HMMs; Basic ML Techniques HMM Parameter Learning (Advanced ML Techniques) EM (maximum likelihood) Discriminative learning Large-margin discriminative learning Engineering Intricacy in Using HMMs (Speech) Left-to-right structure Use of lexicon and pronunciation modeling Use of context dependency (variance reduction) Complexity control: decision tree & automatic parameter tying Beyond HMM for Speech Modeling/Recognition Better generative models for speech dynamics Discriminative models: e.g., CRF and feature engineering Discriminative Training in Speech Recognition --- A unifying framework

20 HMM is a Poor Model for Speech (not just IID)

21 Major Research Directions in the Field Direction 1: Beyond-HMM generative modeling: build more accurate statistical speech models than the HMM; incorporate key insights from human speech processing.

22 [Speech-chain figure] SPEAKER: brain (message) → motor/articulators → speech acoustics. LISTENER: ear (auditory reception) → brain (internal model, decoded message).

23 Hidden Dynamic Model --- Graphical Model Representation [figure: $L$ layers of hidden state sequences $S_1^{(l)}, S_2^{(l)}, \ldots, S_T^{(l)}$, $l = 1, \ldots, L$, generating observations $z_1, \ldots, z_K$ at times $t_1, \ldots, t_K$].

24 Result Comparisons (TIMIT phone recognition accuracy, %)

ASR System                                       Generative   Discriminative
Early (Discrete) HMM (Lee & Hon)                 66.08        ?
Early HMM (Cambridge U.)                         ?            ? (MMI)
Early HMM (ATR-Japan)                            ?            ? (MCE)
Conditional Random Field (Ohio State U., 2005)   NA           ?
Recurrent Neural Nets (Cambridge U.)             NA           ?
Large-margin HMM (U. Penn, 2006) (NIPS 06)       67.3         71.8
Modern HMM (MSR)                                 ?            ? (MCE)
Hidden Dynamic Model (MSR)                       75.18        ?

25 Major Research Directions in the Field Direction 2: Discriminative modeling/learning: acknowledge that accurate generative speech modeling is too hard; view the HMM not as a generative model but as a discriminant function; large-margin discriminative parameter learning; direct models for speech-class discrimination, e.g., MEMM, CRF, feature engineering, deep learning and structure; discriminative training for generative models.

26 Outline Introduction to HMMs; Basic ML Training Techniques HMM Parameter Learning EM (maximum likelihood) Discriminative learning (intro) Engineering Intricacy in Using HMMs in Speech Recognition Left-to-right structure Use of lexicon and pronunciation modeling Use of context dependency (variance reduction) Complexity control: decision tree & automatic parameter tying Beyond HMM for Speech Modeling/Recognition Better generative models for speech dynamics Discriminative models: e.g., CRF and feature engineering Discriminative Training in Speech Recognition --- A unifying framework (He, Deng, Wu, 2008, IEEE SPM)

27

28 MMIE Criterion Maximizing the mutual information is equivalent to choosing the parameter set $\lambda$ to maximize $F_{MMIE}(\lambda) = \sum_{r=1}^{R} \log \frac{P_\lambda(O_r \mid M_{w_r}) \, P(w_r)}{\sum_{w} P_\lambda(O_r \mid M_w) \, P(w)}$. Maximization implies increasing the numerator term (maximum likelihood estimation, MLE) and/or decreasing the denominator term; the latter is accomplished by reducing the probabilities of incorrect, or competing, hypotheses. Clever optimization methods are needed.
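
A small numerical sketch of the MMI criterion, assuming each utterance comes with joint log-scores $\log P_\lambda(O_r \mid M_w) + \log P(w)$ for the reference and its competitors (all toy numbers):

```python
import math

def mmi_objective(utterances):
    """utterances: list of (ref_index, joint_log_scores), where
    joint_log_scores[w] = log P(O_r|M_w) + log P(w) over all hypotheses w."""
    total = 0.0
    for ref, scores in utterances:
        log_den = math.log(sum(math.exp(s) for s in scores))  # log denominator
        total += scores[ref] - log_den                        # log num - log den
    return total

# Toy example: two utterances, three hypotheses each; first element marks the reference.
print(mmi_objective([(0, [-10.0, -12.0, -13.5]),
                     (1, [-11.2, -10.8, -14.0])]))
```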

29 MPE/MWE Criteria $O_{MWE}(\Lambda) = \sum_{r=1}^{R} \frac{\sum_{s_r} p(X_r \mid s_r, \Lambda) \, P(s_r) \, A(s_r, S_r)}{\sum_{s_r} p(X_r \mid s_r, \Lambda) \, P(s_r)}$, where $A(s, S)$ is the raw phone or word accuracy (Povey et al. 2002). Requires use of lattices to determine $A(s, S)$; very difficult to optimize w.r.t. HMM parameters (engineering tricks). Unifying MPE, MCE, and MMI (He & Deng, 2007) in both criterion and optimization method.

30 MCE Criterion Discriminant functions: $g_i(X, \Lambda)$, $i = 1, \ldots, M$. Misclassification measure: $d_i(X, \Lambda) = -g_i(X, \Lambda) + \max_{j \ne i} g_j(X, \Lambda)$. Smoothed classification error count: $l(d_i(X, \Lambda)) = \frac{1}{1 + \exp(-\gamma \, d_i(X, \Lambda))}$, $\gamma > 0$, where the error-smoothing function is sigmoidal. Optimization (gradient): $\Lambda_{opt} = \arg\min_\Lambda L(\Lambda)$, with $L(\Lambda) = E_X[\, l(d(X, \Lambda)) \,]$.

31 Large-margin Estimates MMI, MPE/MWE, and common MCE do not embrace margins. Introduce a margin parameter $m > 0$ in MCE's error-smoothing function: $l(d_i(X, \Lambda)) = \frac{1}{1 + \exp(-\gamma (d_i(X, \Lambda) + m(I)))}$, $\gamma > 0$. Careful exploitation of the margin parameter in LM-MCE training of HMM parameters leads to significant ASR performance improvement (Yu, Deng, He, Acero, 2006, 2007).

32 Large-margin Estimates Paper: Sha & Saul (NIPS 06): Large Margin Training of Continuous-density Hidden Markov Models. Re-formulate Gaussians in HMM -- turning log(variances) into independent parameters to optimize. SVM-type formulation of objective function --- use of slack variables. Make margin a function of the training token according to Hamming distance. Convex optimization (vs. LM-MCE with local optima). Good relative performance improvement on TIMIT: phone accuracy from 67.3% (EM) to 71.8% (LM). Several other groups explore different ways of incorporating margins in HMM parameter training.

33 MCE in more detail Define the misclassification measure: $d_r(X_r, \Lambda) = \log p_\Lambda(X_r, s_{r,1}) - \log p_\Lambda(X_r, S_r)$ (the case of using the correct token and the top one incorrect competing token). Observation seq.: $X_r = x_1, x_2, x_3, x_4, \ldots, x_t, \ldots, x_T$; correct label: $S_r$ = OH THREE EIGHT; competitor: $s_{r,1}$ = OH SIX EIGHT. $s_{r,1}$ is the top one incorrect (not equal to $S_r$) competing string.

34 MCE: Loss function Classification: $s_r^* = \arg\max_s \{\log p_\Lambda(X_r, s)\}$. Classification error: $d_r(X_r, \Lambda) > 0$ means 1 classification error; $d_r(X_r, \Lambda) < 0$ means 0 classification errors. Loss function: the smoothed error-count function $l(d_r(X_r, \Lambda)) = \frac{1}{1 + e^{-d_r(X_r, \Lambda)}}$.

35 MCE: Objective function MCE objective function: $L_{MCE}(\Lambda) = \frac{1}{R} \sum_{r=1}^{R} l(d_r(X_r, \Lambda))$. $L_{MCE}(\Lambda)$ is the smoothed recognition error rate at the string (token) level. The acoustic model is trained to minimize $L_{MCE}(\Lambda)$, i.e., $\Lambda^* = \arg\min_\Lambda \{L_{MCE}(\Lambda)\}$.
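
These string-level MCE quantities in a hedged Python sketch: the misclassification measure from correct and top-competitor log-scores, the sigmoid smoothing (with $\gamma$ and an optional margin $m$ as in the large-margin variant above), and the averaged objective; all scores are toy numbers:

```python
import math

def mce_loss(log_p_correct, log_p_competitor, gamma=1.0, margin=0.0):
    """Smoothed 0/1 error for one token: l(d_r) = sigmoid(gamma * (d_r + m))."""
    d = log_p_competitor - log_p_correct      # d_r > 0 <=> misclassified
    return 1.0 / (1.0 + math.exp(-gamma * (d + margin)))

def L_mce(tokens, **kw):
    """Averaged smoothed error over R training tokens."""
    return sum(mce_loss(c, e, **kw) for c, e in tokens) / len(tokens)

# Toy example: (log p(X_r, S_r), log p(X_r, s_r1)) for three tokens.
tokens = [(-100.0, -103.0), (-95.0, -94.2), (-88.0, -88.5)]
print(L_mce(tokens))                 # plain MCE objective
print(L_mce(tokens, margin=2.0))     # large-margin variant
```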

36 MCE: Optimization Traditional: stochastic GD, a gradient-descent based online optimization; convergence is unstable and the training process is difficult to parallelize. New: growth transformation, extending the Baum-Welch based batch-mode method; stable convergence and ready for parallelized processing.

37 Introduction to EBW or GT (Part I) Applied to a rational function as the objective function: $P(\Lambda) = \frac{G(\Lambda)}{H(\Lambda)}$. Equivalent auxiliary objective function (often easier to optimize): $F(\Lambda; \Lambda') = G(\Lambda) - P(\Lambda') H(\Lambda) + D$, where $D$ is a $\Lambda$-independent constant. (See proof in the SPM paper.)
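
A toy numerical check (not from the slides) of why the auxiliary function works: for a rational $P(x) = G(x)/H(x)$ with $H > 0$, each maximization of $F(x; x') = G(x) - P(x')H(x)$ yields a point whose $P$ value is no lower, since $F(x; x') \ge F(x'; x') = 0$ implies $P(x) \ge P(x')$; the quadratic $G$ and $H$ below are invented:

```python
import numpy as np

# Toy rational objective P(x) = G(x)/H(x), with H > 0 everywhere on the grid.
G = lambda x: -(x - 2.0) ** 2 + 5.0
H = lambda x: x ** 2 + 1.0
P = lambda x: G(x) / H(x)

xs = np.linspace(-3.0, 3.0, 10001)
x_prev = 0.0
for it in range(5):
    # Auxiliary function F(x; x') = G(x) - P(x') H(x); the constant D is
    # omitted because it does not change the argmax.
    F = G(xs) - P(x_prev) * H(xs)
    x_new = float(xs[F.argmax()])
    assert P(x_new) >= P(x_prev) - 1e-12   # P never decreases across iterations
    x_prev = x_new
    print(it, round(x_prev, 4), round(P(x_prev), 6))
```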

38 Introduction to EBW or GT (Part II) Applied to the objective function of the form $F(\Lambda; \Lambda') = \sum_s \sum_q \int_\chi f(\chi, q, s, \Lambda; \Lambda') \, d\chi$. Equivalent auxiliary objective function (easier to optimize): $U(\Lambda; \Lambda') = \sum_s \sum_q \int_\chi f(\chi, q, s, \Lambda'; \Lambda') \log f(\chi, q, s, \Lambda; \Lambda') \, d\chi$. (See proof in the SPM paper & tech report.)

39 MCE: Optimization Growth-transformation based MCE: if $\Lambda = T(\Lambda')$ ensures $P(\Lambda) > P(\Lambda')$, i.e., $P(\Lambda)$ grows, then $T(\cdot)$ is called a growth transformation of $\Lambda$ for $P(\Lambda)$. The chain: minimizing $L_{MCE}(\Lambda)$ → maximizing $P(\Lambda) = G(\Lambda)/H(\Lambda)$ → maximizing $F(\Lambda; \Lambda') = G(\Lambda) - P(\Lambda') H(\Lambda) + D$ → rewriting $F(\Lambda; \Lambda') = \sum \int f$ → maximizing $U(\Lambda; \Lambda') = \sum \int f(\Lambda') \log f(\Lambda)$ → GT formula: $\partial U / \partial \Lambda = 0 \Rightarrow \Lambda = T(\Lambda')$.

40 Rational Function Form for MCE Objective Function $d_r(X_r, \Lambda) = \log p_\Lambda(X_r, s_{r,1}) - \log p_\Lambda(X_r, S_r)$; $l(d_r(X_r, \Lambda)) = \frac{1}{1 + e^{-d_r(X_r, \Lambda)}}$; $L_{MCE}(\Lambda) = \frac{1}{R} \sum_{r=1}^{R} l(d_r(X_r, \Lambda))$. Not an obvious rational function (ratio) due to the summation.

41 Rational Function Form for MCE Objective Function Re-write the MCE loss function as $l(d_r(X_r, \Lambda)) = \frac{p(X_r, s_{r,1} \mid \Lambda)}{p(X_r, s_{r,1} \mid \Lambda) + p(X_r, S_r \mid \Lambda)}$. Then min. $L_{MCE}(\Lambda)$ ⇔ max. $Q(\Lambda)$, where $Q(\Lambda) = R(1 - L_{MCE}(\Lambda)) = \sum_{r=1}^{R} \frac{p(X_r, S_r \mid \Lambda)}{p(X_r, s_{r,1} \mid \Lambda) + p(X_r, S_r \mid \Lambda)} = \sum_{r=1}^{R} \frac{\sum_{s_r \in \{s_{r,1}, S_r\}} p(X_r, s_r \mid \Lambda) \, \delta(s_r, S_r)}{\sum_{s_r \in \{s_{r,1}, S_r\}} p(X_r, s_r \mid \Lambda)}$.
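
A quick numerical sketch confirming that the sigmoid form and this ratio form of the loss agree (toy log-probabilities):

```python
import math

log_p_ref, log_p_comp = -100.0, -101.5    # toy log p(X,S_r|L), log p(X,s_r1|L)
d = log_p_comp - log_p_ref                # misclassification measure d_r
sigmoid_form = 1.0 / (1.0 + math.exp(-d))
p_ref, p_comp = math.exp(log_p_ref), math.exp(log_p_comp)
ratio_form = p_comp / (p_comp + p_ref)    # l(d_r) written as a rational function
assert math.isclose(sigmoid_form, ratio_form)
print(sigmoid_form, ratio_form)
```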

42

43 Unified Objective Function Can do the same trick for MPE/MWE. The MMI objective function is naturally a rational function (due to the logarithm in the MMI definition). Unified form:

44

45 MCE: Optimization (cont'd) Coming back to MCE: its objective function can be reformulated as a single fractional function $P(\Lambda) = \frac{G(\Lambda)}{H(\Lambda)}$, where $G(\Lambda) = \sum_{s_1} \cdots \sum_{s_R} p_\Lambda(X_1, \ldots, X_R, s_1, \ldots, s_R) \sum_{r=1}^{R} \delta(s_r, S_r)$ and $H(\Lambda) = \sum_{s_1} \cdots \sum_{s_R} p_\Lambda(X_1, \ldots, X_R, s_1, \ldots, s_R)$.

46 MCE: Optimization (cont'd) Increasing $P(\Lambda)$ can be achieved by maximizing $F(\Lambda; \Lambda') = G(\Lambda) - P(\Lambda') H(\Lambda) + D$ as long as $D$ is a $\Lambda$-independent constant, since $P(\Lambda) - P(\Lambda') = \frac{1}{H(\Lambda)} \left[ F(\Lambda; \Lambda') - F(\Lambda'; \Lambda') \right]$ ($\Lambda'$ is the parameter set obtained from the last iteration). Substituting $G(\cdot)$ and $H(\cdot)$ into $F(\cdot)$: $F(\Lambda; \Lambda') = \sum_q \sum_s p(X, q, s \mid \Lambda) [C(s) - P(\Lambda')] + D$.

47 Reformulate $F(\Lambda; \Lambda')$ to $F(\Lambda; \Lambda') = \sum_s \sum_q \int_\chi f(\chi, q, s, \Lambda; \Lambda') \, d\chi$, where $f(\chi, q, s, \Lambda; \Lambda') = [\Gamma(\Lambda') + d(s)] \, p(\chi, q \mid s, \Lambda)$, $\Gamma(\Lambda') = \delta(\chi, X) \, p(q, s) [C(s) - P(\Lambda')]$, and $C(s) = \sum_{r=1}^{R} \delta(s_r, S_r)$. $F(\Lambda; \Lambda')$ is ready for EM-style optimization after applying the GT theory (Part II). Note: $\Gamma(\Lambda')$ is a constant, and $\log p(\chi, q \mid s, \Lambda)$ is easy to optimize.

48 Increasing $F(\Lambda; \Lambda')$ can be achieved by maximizing $U(\Lambda; \Lambda') = \sum_s \sum_q \int_\chi f(\chi, q, s, \Lambda'; \Lambda') \log f(\chi, q, s, \Lambda; \Lambda') \, d\chi$. Use EBW/GT (Part II) for the E-step; $\log f(\chi, q, s, \Lambda; \Lambda')$ is decomposable w.r.t. $\Lambda$, so the M-step is easy to compute. The growth transformation of $\Lambda$ for the CDHMM is then: $\partial U(\Lambda) / \partial \Lambda = 0 \Rightarrow \Lambda = T(\Lambda')$.

49 MCE: HMM estimation formulas For a Gaussian-mixture CDHMM, $p(x \mid \mu_m, \Sigma_m) \propto |\Sigma_m|^{-1/2} \exp\left( -\tfrac{1}{2} (x - \mu_m)^T \Sigma_m^{-1} (x - \mu_m) \right)$, the GT of the mean and covariance of Gaussian $m$ is $\mu_m = \frac{\sum_r \sum_t \Delta\gamma_{m,r}(t) \, x_{r,t} + D_m \mu_m'}{\sum_r \sum_t \Delta\gamma_{m,r}(t) + D_m}$ and $\Sigma_m = \frac{\sum_r \sum_t \Delta\gamma_{m,r}(t) (x_{r,t} - \mu_m)(x_{r,t} - \mu_m)^T + D_m \Sigma_m' + D_m (\mu_m - \mu_m')(\mu_m - \mu_m')^T}{\sum_r \sum_t \Delta\gamma_{m,r}(t) + D_m}$, where $\Delta\gamma_{m,r}(t) = p(S_r \mid X_r, \Lambda') \, \gamma_{m,r,S_r}(t) - p(s_{r,1} \mid X_r, \Lambda') \, \gamma_{m,r,s_{r,1}}(t)$.
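
A hedged sketch of the mean update only, assuming the per-frame weights $\Delta\gamma_{m,r}(t)$ (positive contributions from the reference, negative from the competitor) and the constant $D_m$ have already been computed; `gt_mean_update` is a hypothetical helper name:

```python
import numpy as np

def gt_mean_update(frames, delta_gamma, mu_prev, D_m):
    """GT/EBW update for one Gaussian mean:
    mu = (sum_t dgamma_t x_t + D_m mu') / (sum_t dgamma_t + D_m)."""
    frames = np.asarray(frames)          # shape (T, dim)
    dg = np.asarray(delta_gamma)         # shape (T,), entries may be negative
    num = (dg[:, None] * frames).sum(axis=0) + D_m * mu_prev
    den = dg.sum() + D_m                 # D_m must be large enough to keep this positive
    return num / den

# Toy example: 4 frames of 2-dim features, with invented occupancy differences.
frames = [[1.0, 0.5], [0.8, 0.7], [1.2, 0.4], [0.9, 0.6]]
dgamma = [0.6, -0.2, 0.5, -0.1]          # reference-minus-competitor occupancies
print(gt_mean_update(frames, dgamma, np.array([1.0, 0.5]), D_m=2.0))
```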

50 MCE: HMM estimation formulas Setting of $D_m$: theoretically, set $D_m$ so that $f(\chi, q, s, \Lambda; \Lambda') > 0$; empirically, $D_m = E \sum_{r=1}^{R} \left[ p(S_r \mid X_r, \Lambda') \sum_t \gamma_{m,r,S_r}(t) + p(s_{r,1} \mid X_r, \Lambda') \sum_t \gamma_{m,r,s_{r,1}}(t) \right]$.

51 MCE: Workflow [flowchart]: training utterances → recognition with the last iteration's model $\Lambda'$ → competing strings; competing strings plus training transcripts feed GT-MCE, which outputs the new model $\Lambda$ for the next iteration.

52 Experiment: TI-DIGITS Vocabulary: 1 to 9, plus oh and zero. Training set: 8623 utterances / words. Test set: 8700 utterances / words. 33-dimensional spectral feature: energy + 10 MFCCs, plus Δ and ΔΔ features. Model: continuous-density HMMs. Total number of Gaussian components: 3284.

53 Experiment: TI-DIGITS GT-MCE vs. ML (maximum likelihood) baseline [two plots vs. MCE iteration: WER (%) and loss function (sigmoid error count), each for E = 1.0, 2.0, 2.5]. Obtains the lowest error rate on this task; reduces recognition word error rate (WER) by 23%; fast and stable convergence.

54 HMM is the underlying model for modern speech recognizers. Fundamental properties of the HMM; strengths and weaknesses of the HMM for speech modeling and recognition; discriminative training for the HMM as the generative model; reasonably good theory and practice for HMM discriminative training.
