Hidden Markov Models and Gaussian Mixture Models


Overview: Hidden Markov Models and Gaussian Mixture Models
Steve Renals and Peter Bell
Automatic Speech Recognition, ASR Lectures 4&5, January 2013

HMMs and GMMs
Key models and algorithms for HMM acoustic models:
- Gaussians
- GMMs: Gaussian mixture models
- HMMs: hidden Markov models
- HMM algorithms: likelihood computation (forward algorithm), most probable state sequence (Viterbi algorithm), estimating the parameters (EM algorithm)

Fundamental Equation of Statistical Speech Recognition
If $X$ is the sequence of acoustic feature vectors (observations) and $W$ denotes a word sequence, the most likely word sequence $W^*$ is given by
$$W^* = \arg\max_W P(W \mid X)$$
Applying Bayes' theorem:
$$P(W \mid X) = \frac{p(X \mid W)\, P(W)}{p(X)} \propto p(X \mid W)\, P(W)$$
$$W^* = \arg\max_W \underbrace{p(X \mid W)}_{\text{acoustic model}} \; \underbrace{P(W)}_{\text{language model}}$$

Acoustic Modelling
[Figure: system diagram. Recorded speech passes through signal analysis; training data is used to train the hidden Markov model acoustic model; the acoustic model, lexicon and language model together define the search space used to produce the decoded text (transcription).]
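The decoder implied by this equation searches over word sequences for the best combined acoustic and language model score. A minimal Python sketch of that argmax over a toy list of candidate sequences, assuming hypothetical scoring functions `acoustic_log_likelihood` and `language_log_prob` (real systems search a lattice rather than enumerating candidates, and work in the log domain, as later slides discuss):

```python
import math

def decode(X, candidate_word_sequences, acoustic_log_likelihood, language_log_prob):
    """Pick W* = argmax_W p(X|W) P(W), working in log space.

    X: sequence of acoustic feature vectors.
    candidate_word_sequences: iterable of word sequences (toy stand-in for a search space).
    acoustic_log_likelihood(X, W): hypothetical acoustic model score, log p(X|W).
    language_log_prob(W): hypothetical language model score, log P(W).
    """
    best_W, best_score = None, -math.inf
    for W in candidate_word_sequences:
        score = acoustic_log_likelihood(X, W) + language_log_prob(W)
        if score > best_score:
            best_W, best_score = W, score
    return best_W
```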

Hierarchical modelling of speech
[Figure: generative hierarchy for the utterance "No right": Utterance -> Words (NO, RIGHT) -> Subwords (n, oh, r, ai, t) -> HMM states -> Acoustics.]

Acoustic Model: Continuous Density HMM
[Figure: probabilistic finite state automaton with entry state $s_I$, emitting states $s_1, s_2, s_3$ and exit state $s_E$; self-loops $P(s_1 \mid s_1)$, $P(s_2 \mid s_2)$, $P(s_3 \mid s_3)$; forward transitions $P(s_1 \mid s_I)$, $P(s_2 \mid s_1)$, $P(s_3 \mid s_2)$, $P(s_E \mid s_3)$; output densities $p(x \mid s_1)$, $p(x \mid s_2)$, $p(x \mid s_3)$. A second view unrolls the same model against the observation sequence $x_1, \dots, x_6$.]
Parameters $\lambda$:
- Transition probabilities: $a_{ij} = P(s_j \mid s_i)$
- Output probability density function: $b_j(x) = p(x \mid s_j)$

HMM Assumptions
- Observation independence: an acoustic observation $x$ is conditionally independent of all other observations given the state that generated it.
- Markov process: a state is conditionally independent of all other states given the previous state.

HMM Assumptions
[Figure: graphical model of an HMM, with hidden states $s(t-1)$, $s(t)$, $s(t+1)$ forming a Markov chain and each state emitting the corresponding observation $x(t-1)$, $x(t)$, $x(t+1)$.]
- Observation independence: an acoustic observation $x$ is conditionally independent of all other observations given the state that generated it.
- Markov process: a state is conditionally independent of all other states given the previous state.

HMM Output Distribution
Single multivariate Gaussian with mean $\mu_j$ and covariance matrix $\Sigma_j$:
$$b_j(x) = p(x \mid s_j) = \mathcal{N}(x; \mu_j, \Sigma_j)$$
$M$-component Gaussian mixture model:
$$b_j(x) = p(x \mid s_j) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(x; \mu_{jm}, \Sigma_{jm})$$

Background: cdf
Consider a real-valued random variable $X$. The cumulative distribution function (cdf) $F(x)$ for $X$ is:
$$F(x) = P(X \leq x)$$
To obtain the probability of falling in an interval we can do the following:
$$P(a < X \leq b) = P(X \leq b) - P(X \leq a) = F(b) - F(a)$$

Background: pdf
The rate of change of the cdf gives us the probability density function (pdf) $p(x)$:
$$p(x) = \frac{d}{dx} F(x) = F'(x)$$
$$F(x) = \int_{-\infty}^{x} p(x)\, dx$$
$p(x)$ is not the probability that $X$ has value $x$, but the pdf is proportional to the probability that $X$ lies in a small interval centred on $x$.
Notation: $p$ for pdf, $P$ for probability.

The Gaussian distribution (univariate)
The Gaussian (or normal) distribution is the most common (and easily analysed) continuous distribution. It is also a reasonable model in many situations (the famous bell curve). If a (scalar) variable has a Gaussian distribution, then it has a probability density function of this form:
$$p(x \mid \mu, \sigma^2) = \mathcal{N}(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
The Gaussian is described by two parameters: the mean $\mu$ (location) and the variance $\sigma^2$ (dispersion).

Plot of Gaussian distribution
[Figure: pdf of a one-dimensional Gaussian with zero mean and unit variance ($\mu = 0$, $\sigma^2 = 1$); axes $p(x \mid m, s)$ against $x$.]

Properties of the Gaussian distribution
Gaussians have the same shape, with the location controlled by the mean and the spread controlled by the variance.
[Figure: pdfs of several Gaussian distributions with different means and variances, plotted on common axes.]
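A minimal Python sketch of the univariate Gaussian pdf, written straight from the formula above (NumPy assumed):

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """Univariate Gaussian density N(x; mu, var), directly from the formula."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

# Example: standard Gaussian evaluated at a few points.
xs = np.array([-1.0, 0.0, 1.0])
print(gaussian_pdf(xs, mu=0.0, var=1.0))  # peak value 1/sqrt(2*pi) ~ 0.3989 at x = 0
```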

Parameter estimation
Estimate the mean and variance parameters of a Gaussian from data $x_1, x_2, \dots, x_n$, using the sample mean and sample variance estimates:
$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i \quad \text{(sample mean)}$$
$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2 \quad \text{(sample variance)}$$

Exercise
Consider the log likelihood of a set of $N$ data points $\{x_1, \dots, x_N\}$ being generated by a Gaussian with mean $\mu$ and variance $\sigma^2$:
$$L = \ln p(\{x_1, \dots, x_N\} \mid \mu, \sigma^2) = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} (x_n - \mu)^2 - \frac{N}{2} \ln \sigma^2 - \frac{N}{2} \ln(2\pi)$$
By maximising the log likelihood function with respect to $\mu$, show that the maximum likelihood estimate for the mean is indeed the sample mean:
$$\mu_{ML} = \frac{1}{N} \sum_{n=1}^{N} x_n$$

The multidimensional Gaussian distribution
The $d$-dimensional vector $x$ is multivariate Gaussian if it has a probability density function of the following form:
$$p(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$
The pdf is parameterized by the mean vector $\mu$ and the covariance matrix $\Sigma$. The 1-dimensional Gaussian is a special case of this pdf. The argument to the exponential, $-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)$, is referred to as a quadratic form.

Covariance matrix
The mean vector $\mu$ is the expectation of $x$:
$$\mu = E[x]$$
The covariance matrix $\Sigma$ is the expectation of the deviation of $x$ from the mean:
$$\Sigma = E[(x-\mu)(x-\mu)^T]$$
$\Sigma$ is a $d \times d$ symmetric matrix:
$$\Sigma_{ij} = E[(x_i-\mu_i)(x_j-\mu_j)] = E[(x_j-\mu_j)(x_i-\mu_i)] = \Sigma_{ji}$$
The sign of the covariance helps to determine the relationship between two components: if $x_j$ is large when $x_i$ is large, then $(x_i-\mu_i)(x_j-\mu_j)$ will tend to be positive; if $x_j$ is small when $x_i$ is large, then $(x_i-\mu_i)(x_j-\mu_j)$ will tend to be negative.
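One way to work the exercise above is to differentiate $L$ with respect to $\mu$ and set the derivative to zero; a sketch of the steps in LaTeX:

```latex
% Only the quadratic term of L depends on mu:
\frac{\partial L}{\partial \mu} = \frac{1}{\sigma^2} \sum_{n=1}^{N} (x_n - \mu)
% Setting the derivative to zero at the maximum:
0 = \sum_{n=1}^{N} (x_n - \mu_{ML}) = \sum_{n=1}^{N} x_n - N \mu_{ML}
\quad\Longrightarrow\quad
\mu_{ML} = \frac{1}{N} \sum_{n=1}^{N} x_n
```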

Spherical Gaussian
[Figure: surface and contour plots of $p(x_1, x_2)$ for a spherical Gaussian, e.g. $\mu = [0\ 0]^T$, $\Sigma = I$, correlation $\rho = 0$.]

Diagonal covariance Gaussian
[Figure: surface and contour plots of $p(x_1, x_2)$ for a diagonal-covariance Gaussian with unequal variances on the diagonal; $\rho = 0$.]

Full covariance Gaussian
[Figure: surface and contour plots of $p(x_1, x_2)$ for a full-covariance Gaussian; here $\rho = 0.5$.]

Parameter estimation
It is possible to show that the mean vector $\hat{\mu}$ and covariance matrix $\hat{\Sigma}$ that maximize the likelihood of the training data are given by:
$$\hat{\mu} = \frac{1}{N} \sum_{n=1}^{N} x_n$$
$$\hat{\Sigma} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \hat{\mu})(x_n - \hat{\mu})^T$$
The mean of the distribution is estimated by the sample mean and the covariance by the sample covariance.
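These maximum likelihood estimates are a couple of lines in NumPy; a minimal sketch using the $1/N$ normalisation from the slide (not the unbiased $1/(N-1)$ variant):

```python
import numpy as np

def fit_gaussian(X):
    """ML estimates for a multivariate Gaussian from data X of shape (N, d)."""
    mu = X.mean(axis=0)                 # sample mean, shape (d,)
    diff = X - mu                       # deviations from the mean
    sigma = diff.T @ diff / X.shape[0]  # sample covariance (1/N), shape (d, d)
    return mu, sigma
```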

Example data
[Figure: scatter plot of two-dimensional data, $X_2$ against $X_1$.]

Maximum likelihood fit to a Gaussian
[Figure: the same data with the contour of a single maximum-likelihood Gaussian fit overlaid.]

Data in clusters (example)
[Figure: example data generated from two spherical Gaussians with means $\mu_1$ and $\mu_2$ and equal spherical covariances $\Sigma_1 = \Sigma_2$.]

Example fit by a Gaussian
[Figure: a single Gaussian fitted by maximum likelihood to the two-cluster data.]

K-means clustering
K-means is an automatic procedure for clustering unlabelled data. It requires a prespecified number of clusters, and chooses a set of clusters with the minimum within-cluster variance. It is guaranteed to converge (eventually), but the clustering solution is dependent on the initialisation. (A minimal implementation is sketched after the worked example below.)

K-means example: data set
[Figure: a small set of two-dimensional points to be clustered, including (7,8), (6,6) and (7,6).]

K-means example: initialization
[Figure: three initial cluster centres chosen among the data points.]

K-means example: iteration 1 (assign points to clusters)
[Figure: each point assigned to its nearest centre.]

K-means example: iteration 1 (recompute centres); iteration 2 (assign points to clusters)
[Figures: the centres are recomputed as the means of their assigned points, e.g. (3.57, 3) and (8.75, 3.75), and the points are then reassigned to the nearest centre.]

K-means example: iteration 2 (recompute centres); iteration 3 (assign points to clusters)
[Figures: the centres are recomputed again; the next assignment pass produces no changes, so the algorithm has converged.]
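A minimal NumPy sketch of the loop just illustrated: alternate assigning points to the nearest centre with recomputing centres as cluster means, and stop when an assignment pass changes nothing, as in the example. Initialising the centres at $k$ randomly chosen data points is an assumption of this sketch:

```python
import numpy as np

def kmeans(X, k, rng=np.random.default_rng(0)):
    """Basic k-means on data X of shape (N, d); returns (centres, assignments)."""
    centres = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    assign = None
    while True:
        # Assignment step: each point goes to its nearest centre (squared Euclidean).
        dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        new_assign = dists.argmin(axis=1)
        if assign is not None and np.array_equal(new_assign, assign):
            return centres, assign  # no changes, so converged
        assign = new_assign
        # Update step: recompute each centre as the mean of its assigned points.
        for j in range(k):
            if np.any(assign == j):
                centres[j] = X[assign == j].mean(axis=0)
```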

Mixture model
A more flexible form of density estimation is made up of a linear combination of component densities:
$$p(x) = \sum_{j=1}^{M} p(x \mid j)\, P(j)$$
This is called a mixture model or a mixture density.
- $p(x \mid j)$: component densities
- $P(j)$: mixing parameters
Generative model: choose a mixture component based on $P(j)$, then generate a data point $x$ from the chosen component using $p(x \mid j)$.

Component occupation probability
We can apply Bayes' theorem:
$$P(j \mid x) = \frac{p(x \mid j)\, P(j)}{p(x)} = \frac{p(x \mid j)\, P(j)}{\sum_{j'=1}^{M} p(x \mid j')\, P(j')}$$
The posterior probabilities $P(j \mid x)$ give the probability that component $j$ was responsible for generating data point $x$. The $P(j \mid x)$ are called the component occupation probabilities (or sometimes the responsibilities). Since they are posterior probabilities:
$$\sum_{j=1}^{M} P(j \mid x) = 1$$

Parameter estimation
If we knew which mixture component was responsible for a data point, we would be able to assign each point unambiguously to a mixture component; we could then estimate the mean for each component Gaussian as the sample mean (just like k-means clustering), and the covariance as the sample covariance. But we don't know which mixture component a data point comes from... Maybe we could use the component occupation probabilities $P(j \mid x)$?

Gaussian mixture model
The most important mixture model is the Gaussian mixture model (GMM), where the component densities are Gaussians. Consider a GMM where each component Gaussian $\mathcal{N}(x; \mu_j, \sigma_j^2)$ has mean $\mu_j$ and a spherical covariance $\Sigma = \sigma^2 I$:
$$p(x) = \sum_{j=1}^{M} P(j)\, p(x \mid j) = \sum_{j=1}^{M} P(j)\, \mathcal{N}(x; \mu_j, \sigma_j^2)$$
[Figure: the GMM drawn as a network, with inputs $x_1, x_2, \dots, x_d$ feeding the component densities $p(x \mid 1), \dots, p(x \mid M)$, combined with weights $P(1), \dots, P(M)$.]

GMM parameter estimation when we know which component generated the data
Define the indicator variable $z_{jn} = 1$ if component $j$ generated data point $x_n$ (and $0$ otherwise). If $z_{jn}$ weren't hidden then we could count the number of observed data points generated by $j$:
$$N_j = \sum_{n=1}^{N} z_{jn}$$
and estimate the mean, variance and mixing parameters as:
$$\hat{\mu}_j = \frac{\sum_n z_{jn}\, x_n}{N_j} \qquad
\hat{\sigma}_j^2 = \frac{\sum_n z_{jn}\, \|x_n - \hat{\mu}_j\|^2}{N_j} \qquad
\hat{P}(j) = \frac{1}{N} \sum_n z_{jn} = \frac{N_j}{N}$$

Problem! Recall that:
$$P(j \mid x) = \frac{p(x \mid j)\, P(j)}{p(x)}$$
We need to know $p(x \mid j)$ and $P(j)$ to estimate $P(j \mid x)$, and we need $P(j \mid x)$ to estimate the parameters...

EM algorithm
Solution: an iterative algorithm where each iteration has two parts:
- Compute the component occupation probabilities $P(j \mid x)$ using the current estimates of the GMM parameters (means, variances, mixing parameters) (E-step).
- Compute the GMM parameters using the current estimates of the component occupation probabilities (M-step).
Starting from some initialization (e.g. using k-means for the means), these steps are alternated until convergence. This is called the EM algorithm, and can be shown to maximize the likelihood.

Soft assignment
Estimate "soft counts" based on the component occupation probabilities $P(j \mid x_n)$:
$$N_j^* = \sum_{n=1}^{N} P(j \mid x_n)$$
We can imagine assigning data points to component $j$ weighted by the component occupation probability $P(j \mid x_n)$, so we could imagine estimating the mean, variance and prior probabilities as:
$$\hat{\mu}_j = \frac{\sum_n P(j \mid x_n)\, x_n}{\sum_n P(j \mid x_n)} = \frac{\sum_n P(j \mid x_n)\, x_n}{N_j^*}$$
$$\hat{\sigma}_j^2 = \frac{\sum_n P(j \mid x_n)\, \|x_n - \hat{\mu}_j\|^2}{\sum_n P(j \mid x_n)} = \frac{\sum_n P(j \mid x_n)\, \|x_n - \hat{\mu}_j\|^2}{N_j^*}$$
$$\hat{P}(j) = \frac{N_j^*}{N} = \frac{1}{N} \sum_n P(j \mid x_n)$$

Maximum likelihood parameter estimation
The likelihood of a data set $X = \{x_1, x_2, \dots, x_N\}$ is given by:
$$L = \prod_{n=1}^{N} p(x_n) = \prod_{n=1}^{N} \sum_{j=1}^{M} p(x_n \mid j)\, P(j)$$
We can regard the negative log likelihood as an error function:
$$E = -\ln L = -\sum_{n=1}^{N} \ln p(x_n) = -\sum_{n=1}^{N} \ln \sum_{j=1}^{M} p(x_n \mid j)\, P(j)$$
Considering the derivatives of $E$ with respect to the parameters gives expressions like those on the previous slide.
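A compact NumPy sketch of this EM loop for a spherical-covariance GMM, assuming the means are initialised externally (e.g. by k-means, as the slide suggests). The E-step computes the responsibilities $P(j \mid x_n)$; the M-step applies the soft-count updates above, with the squared distances averaged over the $d$ dimensions for the spherical variance:

```python
import numpy as np

def em_gmm(X, mus, n_iter=50):
    """EM for a GMM with spherical covariances on X of shape (N, d).

    mus: initial component means, shape (M, d), e.g. from k-means.
    Returns (means, spherical variances, mixing parameters).
    """
    N, d = X.shape
    M = len(mus)
    var = np.ones(M)             # spherical variances sigma_j^2
    prior = np.full(M, 1.0 / M)  # mixing parameters P(j)
    for _ in range(n_iter):
        # E-step: component occupation probabilities P(j | x_n).
        sq = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)      # (N, M)
        dens = np.exp(-0.5 * sq / var) / (2 * np.pi * var) ** (d / 2)  # N(x; mu_j, var_j I)
        resp = dens * prior
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: soft counts N_j*, then weighted mean/variance/prior updates.
        Nj = resp.sum(axis=0)
        mus = (resp.T @ X) / Nj[:, None]
        sq = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)
        var = (resp * sq).sum(axis=0) / (d * Nj)  # per-dimension spherical variance
        prior = Nj / N
    return mus, var, prior
```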

Example fit using a GMM
[Figure: the two-cluster example data fitted with a two-component GMM trained using EM; each component centres on one cluster.]

Peakily distributed data (example)
[Figures: data drawn from two components with equal means ($\mu_1 = \mu_2$) and very different spherical covariances, one narrow and one broad; a single maximum-likelihood Gaussian fit; and a fit with a two-component GMM trained using EM, which captures the peaky structure.]

Example 2: two-component Gaussians
[Figure: contours of the two fitted Gaussian components overlaid on the example data.]

Comments on GMMs
- GMMs trained using the EM algorithm are able to self-organize to fit a data set.
- Individual components take responsibility for parts of the data set (probabilistically).
- Soft assignment to components, not hard assignment: "soft clustering".
- GMMs scale very well; e.g. large speech recognition systems can have 30,000 GMMs, each with 32 components: sometimes around a million Gaussian components, with all the parameters estimated from (a lot of) data by EM.

Back to HMMs...
[Figure: the three-emitting-state HMM again, with transition probabilities $a_{ij}$ and output densities $p(x \mid s_j)$.]
Output distribution: single multivariate Gaussian with mean $\mu_j$ and covariance matrix $\Sigma_j$:
$$b_j(x) = p(x \mid s_j) = \mathcal{N}(x; \mu_j, \Sigma_j)$$
or an $M$-component Gaussian mixture model:
$$b_j(x) = p(x \mid s_j) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(x; \mu_{jm}, \Sigma_{jm})$$

The three problems of HMMs
Working with HMMs requires the solution of three problems:
1. Likelihood: determine the overall likelihood of an observation sequence $X = (x_1, \dots, x_t, \dots, x_T)$ being generated by an HMM.
2. Decoding: given an observation sequence and an HMM, determine the most probable hidden state sequence.
3. Training: given an observation sequence and an HMM, learn the best HMM parameters $\lambda = \{\{a_{jk}\}, \{b_j(\cdot)\}\}$.

1. Likelihood: the Forward algorithm
Goal: determine $p(X \mid \lambda)$. Sum over all possible state sequences $s_1 s_2 \dots s_T$ that could result in the observation sequence $X$. Rather than enumerating each sequence, compute the probabilities recursively (exploiting the Markov assumption).

Recursive algorithms on HMMs
Visualize the problem as a state-time trellis.
[Figure: state-time trellis with states $i$, $j$, $k$ at times $t-1$, $t$, $t+1$, with arcs between states at consecutive times.]

Forward probability $\alpha_t(s_j)$: the probability of observing the observation sequence $x_1 \dots x_t$ and being in state $s_j$ at time $t$:
$$\alpha_t(s_j) = p(x_1, \dots, x_t, S(t) = s_j \mid \lambda)$$

1. Likelihood: the Forward recursion
Initialization:
$$\alpha_0(s_I) = 1; \qquad \alpha_0(s_j) = 0 \ \text{ if } s_j \neq s_I$$
Recursion:
$$\alpha_t(s_j) = \sum_{i=1}^{N} \alpha_{t-1}(s_i)\, a_{ij}\, b_j(x_t)$$
Termination:
$$p(X \mid \lambda) = \alpha_T(s_E) = \sum_{i=1}^{N} \alpha_T(s_i)\, a_{iE}$$
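A minimal log-domain sketch of this recursion (anticipating the "Doing the computation" slide below). The layout of the model, a log transition matrix plus separate entry and exit probability vectors standing in for $s_I$ and $s_E$, is an assumption of the sketch, not a fixed API:

```python
import numpy as np
from scipy.special import logsumexp

def forward_likelihood(obs_loglik, log_a, log_a_in, log_a_out):
    """Forward algorithm in the log domain.

    obs_loglik: (T, N) array, obs_loglik[t, j] = log b_j(x_t).
    log_a:      (N, N) log transition matrix, log_a[i, j] = log P(s_j | s_i).
    log_a_in:   (N,) log entry probabilities from s_I.
    log_a_out:  (N,) log exit probabilities to s_E.
    Returns log p(X | lambda).
    """
    T, N = obs_loglik.shape
    log_alpha = log_a_in + obs_loglik[0]       # initialisation: alpha_1(s_j) = a_Ij b_j(x_1)
    for t in range(1, T):                      # recursion: sum over predecessor states
        log_alpha = logsumexp(log_alpha[:, None] + log_a, axis=0) + obs_loglik[t]
    return logsumexp(log_alpha + log_a_out)    # termination via the exit state s_E
```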

1. Likelihood: Forward recursion (illustration)
[Figure: trellis fragment showing $\alpha_{t-1}(s_i)$ for each predecessor state $s_i$ combined with $a_{ij}$ and $b_j(x_t)$, summing to give $\alpha_t(s_j)$.]

Viterbi approximation
Instead of summing over all possible state sequences, just consider the most likely one. Achieve this by changing the summation to a maximisation in the recursion:
$$V_t(s_j) = \max_i V_{t-1}(s_i)\, a_{ij}\, b_j(x_t)$$
Changing the recursion in this way gives the likelihood of the most probable path. We need to keep track of the states that make up this path by keeping a sequence of backpointers to enable a Viterbi backtrace: the backpointer for each state at each time indicates the previous state on the most probable path.

Viterbi recursion
Likelihood of the most probable path:
[Figure: trellis fragment where $V_t(s_j)$ is obtained as $\max_i V_{t-1}(s_i)\, a_{ij}\, b_j(x_t)$.]
Backpointers to the previous state on the most probable path:
[Figure: the same trellis fragment with the backpointer $bt_t(s_j) = s_i$ recorded for the maximising predecessor.]

2. Decoding: the Viterbi algorithm
Initialization:
$$V_0(s_I) = 1; \qquad V_0(s_j) = 0 \ \text{ if } s_j \neq s_I; \qquad bt_0(s_j) = 0$$
Recursion:
$$V_t(s_j) = \max_{i=1}^{N} V_{t-1}(s_i)\, a_{ij}\, b_j(x_t)$$
$$bt_t(s_j) = \arg\max_{i=1}^{N} V_{t-1}(s_i)\, a_{ij}\, b_j(x_t)$$
Termination:
$$P^* = V_T(s_E) = \max_{i=1}^{N} V_T(s_i)\, a_{iE}$$
$$s_T^* = bt_T(q_E) = \arg\max_{i=1}^{N} V_T(s_i)\, a_{iE}$$

Viterbi backtrace
Backtrace to find the state sequence of the most probable path.
[Figure: trellis fragment with backpointers $bt_t(s_j) = s_i$ and $bt_{t+1}(s_k) = s_j$ traced backwards from the final state.]

3. Training: the Forward-Backward algorithm
Goal: efficiently estimate the parameters of an HMM $\lambda$ from an observation sequence. Assume a single Gaussian output probability distribution:
$$b_j(x) = p(x \mid s_j) = \mathcal{N}(x; \mu_j, \Sigma_j)$$
Parameters $\lambda$:
- Transition probabilities $a_{ij}$, with $\sum_j a_{ij} = 1$
- Gaussian parameters for state $s_j$: mean vector $\mu_j$; covariance matrix $\Sigma_j$

Viterbi training
If we knew the state-time alignment, then each observation feature vector could be assigned to a specific state. A state-time alignment can be obtained using the most probable path found by Viterbi decoding. The maximum likelihood estimate of $a_{ij}$, if $C(s_i \to s_j)$ is the count of transitions from $s_i$ to $s_j$, is:
$$\hat{a}_{ij} = \frac{C(s_i \to s_j)}{\sum_k C(s_i \to s_k)}$$
Likewise, if $Z_j$ is the set of observed acoustic feature vectors assigned to state $j$, we can use the standard maximum likelihood estimates for the mean and the covariance:
$$\hat{\mu}_j = \frac{\sum_{x \in Z_j} x}{|Z_j|} \qquad
\hat{\Sigma}_j = \frac{\sum_{x \in Z_j} (x - \hat{\mu}_j)(x - \hat{\mu}_j)^T}{|Z_j|}$$
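A matching log-domain sketch of the Viterbi recursion, backpointers and backtrace, using the same assumed model layout as the forward sketch above:

```python
import numpy as np

def viterbi(obs_loglik, log_a, log_a_in, log_a_out):
    """Most probable state path; same model layout as forward_likelihood above.

    Returns (log likelihood of the best path, state sequence of length T).
    """
    T, N = obs_loglik.shape
    V = log_a_in + obs_loglik[0]                 # initialisation
    bt = np.zeros((T, N), dtype=int)             # backpointers bt_t(s_j)
    for t in range(1, T):
        scores = V[:, None] + log_a              # scores[i, j]: arrive in j from i
        bt[t] = scores.argmax(axis=0)            # best predecessor for each state
        V = scores.max(axis=0) + obs_loglik[t]   # max replaces the forward sum
    last = int(np.argmax(V + log_a_out))         # termination via exit probabilities
    path = [last]
    for t in range(T - 1, 0, -1):                # Viterbi backtrace
        path.append(int(bt[t][path[-1]]))
    return float((V + log_a_out)[last]), path[::-1]
```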

EM algorithm
Viterbi training is an approximation: we would like to consider all possible paths. In this case, rather than having a hard state-time alignment, we estimate a probability. The state occupation probability $\gamma_t(s_j)$ is the probability of occupying state $s_j$ at time $t$ given the sequence of observations (compare with the component occupation probability in a GMM). We can use this for an iterative algorithm for HMM training, the EM algorithm. Each iteration has two steps:
- E-step: estimate the state occupation probabilities (Expectation).
- M-step: re-estimate the HMM parameters based on the estimated state occupation probabilities (Maximisation).

Backward probabilities
To estimate the state occupation probabilities it is useful to define (recursively) another set of probabilities, the backward probabilities:
$$\beta_t(s_j) = p(x_{t+1}, x_{t+2}, \dots, x_T \mid S(t) = s_j, \lambda)$$
the probability of the future observations given that the HMM is in state $s_j$ at time $t$. These can be recursively computed, going backwards in time.
Initialisation:
$$\beta_T(s_i) = a_{iE}$$
Recursion:
$$\beta_t(s_i) = \sum_{j=1}^{N} a_{ij}\, b_j(x_{t+1})\, \beta_{t+1}(s_j)$$
Termination:
$$p(X \mid \lambda) = \beta_0(s_I) = \sum_{j=1}^{N} a_{Ij}\, b_j(x_1)\, \beta_1(s_j) = \alpha_T(s_E)$$
[Figure: trellis fragment illustrating the backward recursion, with $\beta_t(s_i)$ computed from $a_{ij}$, $b_j(x_{t+1})$ and $\beta_{t+1}(s_j)$.]

State occupation probability
The state occupation probability $\gamma_t(s_j)$ is the probability of occupying state $s_j$ at time $t$ given the sequence of observations. Express it in terms of the forward and backward probabilities:
$$\gamma_t(s_j) = P(S(t) = s_j \mid X, \lambda) = \frac{\alpha_t(s_j)\, \beta_t(s_j)}{\alpha_T(s_E)}$$
recalling that $p(X \mid \lambda) = \alpha_T(s_E)$. This holds since
$$\alpha_t(s_j)\, \beta_t(s_j) = p(x_1, \dots, x_t, S(t) = s_j \mid \lambda)\; p(x_{t+1}, \dots, x_T \mid S(t) = s_j, \lambda) = p(X, S(t) = s_j \mid \lambda)$$
and
$$P(S(t) = s_j \mid X, \lambda) = \frac{p(X, S(t) = s_j \mid \lambda)}{p(X \mid \lambda)}$$
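Continuing the same sketch: a log-domain backward pass combined with the forward pass to give the state occupation probabilities $\gamma_t(s_j)$, again under the assumed model layout used earlier:

```python
import numpy as np
from scipy.special import logsumexp

def state_occupation(obs_loglik, log_a, log_a_in, log_a_out):
    """Compute gamma[t, j] = P(S(t) = s_j | X, lambda) via forward-backward."""
    T, N = obs_loglik.shape
    log_alpha = np.zeros((T, N))
    log_alpha[0] = log_a_in + obs_loglik[0]
    for t in range(1, T):   # forward pass
        log_alpha[t] = logsumexp(log_alpha[t - 1][:, None] + log_a, axis=0) + obs_loglik[t]
    log_beta = np.zeros((T, N))
    log_beta[-1] = log_a_out  # initialisation: beta_T(s_i) = a_iE
    for t in range(T - 2, -1, -1):  # backward pass
        log_beta[t] = logsumexp(log_a + obs_loglik[t + 1] + log_beta[t + 1], axis=1)
    log_px = logsumexp(log_alpha[-1] + log_a_out)  # log p(X | lambda)
    return np.exp(log_alpha + log_beta - log_px)   # gamma, shape (T, N)
```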

Re-estimation of Gaussian parameters
The sum of state occupation probabilities through time for a state may be regarded as a "soft" count. We can use this soft alignment to re-estimate the HMM parameters:
$$\hat{\mu}_j = \frac{\sum_{t=1}^{T} \gamma_t(s_j)\, x_t}{\sum_{t=1}^{T} \gamma_t(s_j)}$$
$$\hat{\Sigma}_j = \frac{\sum_{t=1}^{T} \gamma_t(s_j)\, (x_t - \hat{\mu}_j)(x_t - \hat{\mu}_j)^T}{\sum_{t=1}^{T} \gamma_t(s_j)}$$

Re-estimation of transition probabilities
Similarly to the state occupation probability, we can estimate $\xi_t(s_i, s_j)$, the probability of being in $s_i$ at time $t$ and $s_j$ at $t+1$, given the observations:
$$\xi_t(s_i, s_j) = P(S(t) = s_i, S(t+1) = s_j \mid X, \lambda) = \frac{P(S(t) = s_i, S(t+1) = s_j, X \mid \lambda)}{p(X \mid \lambda)} = \frac{\alpha_t(s_i)\, a_{ij}\, b_j(x_{t+1})\, \beta_{t+1}(s_j)}{\alpha_T(s_E)}$$
We can use this to re-estimate the transition probabilities:
$$\hat{a}_{ij} = \frac{\sum_{t=1}^{T} \xi_t(s_i, s_j)}{\sum_{k=1}^{N} \sum_{t=1}^{T} \xi_t(s_i, s_k)}$$

Pulling it all together
Iterative estimation of HMM parameters using the EM algorithm. At each iteration:
- E-step: for all time-state pairs, (1) recursively compute the forward probabilities $\alpha_t(s_j)$ and backward probabilities $\beta_t(s_j)$; (2) compute the state occupation probabilities $\gamma_t(s_j)$ and $\xi_t(s_i, s_j)$.
- M-step: based on the estimated state occupation probabilities, re-estimate the HMM parameters: mean vectors $\mu_j$, covariance matrices $\Sigma_j$ and transition probabilities $a_{ij}$.
The application of the EM algorithm to HMM training is sometimes called the Forward-Backward algorithm.

Extension to a corpus of utterances
We usually train from a large corpus of $R$ utterances. If $x_t^r$ is the $t$th frame of the $r$th utterance $X^r$, then we can compute the probabilities $\alpha_t^r(j)$, $\beta_t^r(j)$, $\gamma_t^r(s_j)$ and $\xi_t^r(s_i, s_j)$ as before. The re-estimates are as before, except we must sum over the $R$ utterances, e.g.:
$$\hat{\mu}_j = \frac{\sum_{r=1}^{R} \sum_{t=1}^{T} \gamma_t^r(s_j)\, x_t^r}{\sum_{r=1}^{R} \sum_{t=1}^{T} \gamma_t^r(s_j)}$$

Extension to Gaussian mixture model (GMM) output distributions
The assumption of a Gaussian distribution at each state is very strong; in practice the acoustic feature vectors associated with a state may be strongly non-Gaussian. In this case an $M$-component Gaussian mixture model is an appropriate density function:
$$b_j(x) = p(x \mid s_j) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(x; \mu_{jm}, \Sigma_{jm})$$
Given enough components, this family of functions can model any distribution. Train using the EM algorithm, in which the component occupation probabilities are estimated in the E-step.

EM training of HMM/GMM
Rather than estimating the state-time alignment, we estimate the component/state-time alignment, and the component-state occupation probabilities $\gamma_t(s_j, m)$: the probability of occupying mixture component $m$ of state $s_j$ at time $t$. We can thus re-estimate the mean of mixture component $m$ of state $s_j$ as follows:
$$\hat{\mu}_{jm} = \frac{\sum_{t=1}^{T} \gamma_t(s_j, m)\, x_t}{\sum_{t=1}^{T} \gamma_t(s_j, m)}$$
and likewise for the covariance matrices (mixture models often use diagonal covariance matrices). The mixture coefficients are re-estimated in a similar way to transition probabilities:
$$\hat{c}_{jm} = \frac{\sum_{t=1}^{T} \gamma_t(s_j, m)}{\sum_{l=1}^{M} \sum_{t=1}^{T} \gamma_t(s_j, l)}$$

Doing the computation
The forward, backward and Viterbi recursions result in a long sequence of probabilities being multiplied, which can cause floating point underflow problems. In practice, computations are performed in the log domain (in which multiplies become adds). Working in the log domain also avoids needing to perform the exponentiation when computing Gaussians.

Summary: HMMs
HMMs provide a generative model for statistical speech recognition. Three key problems:
1. Computing the overall likelihood: the Forward algorithm.
2. Decoding the most likely state sequence: the Viterbi algorithm.
3. Estimating the most likely parameters: the EM (Forward-Backward) algorithm.
Solutions to these problems are tractable due to the two key HMM assumptions: (1) conditional independence of observations given the current state; (2) the Markov assumption on the states.
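The standard device for those log-domain sums is log-sum-exp: shift by the maximum before exponentiating so that nothing underflows. A minimal sketch, equivalent in spirit to the scipy.special.logsumexp used in the earlier sketches:

```python
import numpy as np

def log_add(log_ps):
    """Compute log(sum(exp(log_ps))) without underflow.

    Subtracting the max before exponentiating keeps every exp() argument <= 0,
    so even very small probabilities stay representable.
    """
    m = np.max(log_ps)
    if np.isneginf(m):  # all probabilities are zero
        return -np.inf
    return m + np.log(np.sum(np.exp(log_ps - m)))

# Example: adding probabilities around 1e-300 without underflowing to zero.
print(log_add(np.log([1e-300, 1e-300])))  # ~ log(2e-300)
```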

References: HMMs
Gales and Young (2007). "The Application of Hidden Markov Models in Speech Recognition", Foundations and Trends in Signal Processing, 1 (3), 195-304.
Jurafsky and Martin (2008). Speech and Language Processing (2nd ed.). (Errata at SLP-PIEV-Errata.html)
Rabiner and Juang (1986). "An introduction to hidden Markov models", IEEE ASSP Magazine, 3 (1), 4-16.
Renals and Hain (2010). "Speech Recognition", in Computational Linguistics and Natural Language Processing Handbook, Clark, Fox and Lappin (eds.), Blackwells.
