Hidden Markov Models
Hauptseminar Machine Learning, 18.11.2003
Speaker: Nikolas Dörfler
Overview
- Markov models
- Hidden Markov models
- Types of hidden Markov models
- Applications using HMMs
- Three central problems:
  - Evaluation: forward algorithm, backward algorithm
  - Decoding: Viterbi algorithm
  - Learning: forward-backward algorithm
- Speech recognition with HMMs: isolated word recognizer, measured performance
- Conclusion
Markov Models
- A system of n states: at time t the system is in state ω(t)
- Changes of state are nondeterministic but depend on the previous states; in a first-order, ..., n-th-order model, changes depend on the previous 1, ..., n states
- Transition probabilities a_ij, often shown as a state transition matrix A
- Vector of initial probabilities π
Example: weather system
Three states: sunny, cloudy, rainy
Transition matrix A:

A = ( 0.5    0.375  0.125
      0.25   0.125  0.625
      0.25   0.375  0.375 )

State vector π = (1, 0, 0)^T
Hidden Markov Models
- n invisible (hidden) states
- every state emits at time t a visible symbol/state v(t)
- the system generates a sequence of symbols V^T = {v(1), v(2), v(3), ..., v(T)}
- Transition probability: P(ω(t+1) = j | ω(t) = i) = a_ij → transition matrix A
- Probability of emitting symbol v_k in state ω(t) = j: P(v_k(t) | ω(t) = j) = b_j(k) → confusion matrix B
- Vector of initial probabilities π
- Normalization conditions: Σ_j a_ij = 1 for all i; Σ_k b_j(k) = 1 for all j
Example of a hidden Markov model: indirect observation of the weather using a piece of seaweed
Transition matrix A (rows: weather today; columns: weather tomorrow):

         sun    clouds  rain
sun      0.5    0.375   0.125
clouds   0.25   0.125   0.625
rain     0.25   0.375   0.375

Confusion matrix B (rows: weather; columns: seaweed state):

         dry    dryish  damp   soggy
sun      0.6    0.2     0.15   0.05
clouds   0.25   0.25    0.25   0.25
rain     0.05   0.1     0.35   0.5

State vector π = (1, 0, 0)^T
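The weather model above can be written down directly in code. The following is a minimal sketch in Python; the state ordering (sun, clouds, rain) and observation ordering (dry, dryish, damp, soggy) are taken from the tables above, and the normalization check corresponds to the conditions on the previous slide.

```python
# Weather HMM from the example above.
# States: 0=sun, 1=clouds, 2=rain; observations: 0=dry, 1=dryish, 2=damp, 3=soggy.
A = [  # transition matrix: A[i][j] = P(state j tomorrow | state i today)
    [0.5,  0.375, 0.125],
    [0.25, 0.125, 0.625],
    [0.25, 0.375, 0.375],
]
B = [  # confusion matrix: B[j][k] = P(symbol k | state j)
    [0.6,  0.2,  0.15, 0.05],
    [0.25, 0.25, 0.25, 0.25],
    [0.05, 0.1,  0.35, 0.5],
]
pi = [1.0, 0.0, 0.0]  # initial state vector: the first day is sunny

# Normalization conditions: every row of A and B must sum to 1
for row in A + B:
    assert abs(sum(row) - 1.0) < 1e-9
```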
Types of Hidden Markov Models
Ergodic: all states can be reached within one step from everywhere; the entries of the transition matrix A are never zero:

A = ( a11 a12 a13 a14
      a21 a22 a23 a24
      a31 a32 a33 a34
      a41 a42 a43 a44 )
Left-right models
No backward transitions: a_ij = 0 for j < i
Additional constraint: a_ij = 0 for j > i + Δ (e.g. Δ = 2 ⇒ no jumps of more than 2 states):

A = ( a11 a12 a13 0
      0   a22 a23 a24
      0   0   a33 a34
      0   0   0   a44 )
Applications using HMMs
- Speech recognition
- Language modelling
- Protein sequence analysis
- Handwriting recognition
- Financial/economic models
Three central problems
1. Evaluation: given an HMM (A, B, π), find the probability that a sequence of visible states V^T was generated by that model
2. Decoding: given an HMM (A, B, π) and a sequence of observations, find the most probable sequence of hidden states that led to those observations
3. Learning: given the number of visible and hidden states and some sequences of training observations, find the parameters a_ij and b_j(k)
Evaluation: probability that the model M produces a sequence V^T:

P(V^T) = Σ_{r=1}^{r_max} P(V^T | ω_r^T) P(ω_r^T)

over all possible sequences ω_r^T = {ω(1), ω(2), ..., ω(T)}. That means: take all sequences of hidden states, calculate the probability that each of them generated V^T, and add them up:

P(V^T) = Σ_{r=1}^{r_max} Π_{t=1}^{T} P(v(t) | ω(t)) P(ω(t) | ω(t-1))

N is the number of states, T is the number of visible symbols / steps to go.
Problem: the complexity is O(N^T · T)
The forward algorithm: calculate the problem recursively:

α_j(t) = π_j b_j(v(0))                               if t = 0
α_j(t) = [ Σ_{i=1}^{N} α_i(t-1) a_ij ] b_j(v(t))     otherwise

b_j(v(t)) is the probability of emitting the symbol selected by v(t); α_j(t) is the probability that the model is in state j and has produced the first t elements of V^T
Forward algorithm:

initialize α_j(0) = π_j b_j(v(0)), t = 0, given a_ij, b_j(k) and the visible sequence V^T
for t ← t + 1:
    α_j(t) = [ Σ_{i=1}^{N} α_i(t-1) a_ij ] b_j(v(t))    for all j ≤ N
until t = T
return P(V^T) = α_final(T)
end

Complexity of this algorithm: O(N² T)
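The pseudocode above can be sketched in Python as follows. One caveat: the slide returns α_final(T), which assumes a designated final state; the version below sums over all end states, which is the general form of P(V^T). The weather model values are taken from the earlier example.

```python
def forward(A, B, pi, obs):
    """Forward algorithm: returns (all alphas, P(V^T)).
    A[i][j]: transition prob., B[j][k]: emission prob., pi[j]: initial prob.,
    obs: list of observed symbol indices v(0)..v(T)."""
    N = len(A)
    # initialization: alpha_j(0) = pi_j * b_j(v(0))
    alpha = [pi[j] * B[j][obs[0]] for j in range(N)]
    alphas = [alpha]
    for t in range(1, len(obs)):
        # recursion: alpha_j(t) = (sum_i alpha_i(t-1) * a_ij) * b_j(v(t))
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                 for j in range(N)]
        alphas.append(alpha)
    return alphas, sum(alpha)  # summing over all end states gives P(V^T)

# Weather model from the earlier slides; observations: dry(0), damp(2), soggy(3)
A = [[0.5, 0.375, 0.125], [0.25, 0.125, 0.625], [0.25, 0.375, 0.375]]
B = [[0.6, 0.2, 0.15, 0.05], [0.25, 0.25, 0.25, 0.25], [0.05, 0.1, 0.35, 0.5]]
pi = [1.0, 0.0, 0.0]
alphas, p = forward(A, B, pi, [0, 2, 3])
```

This runs in O(N²T), whereas summing over all N^T hidden-state paths explicitly would be exponential in T.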
Example: forward algorithm (trellis diagram over states 1-3 for t = 1, 2, 3)
Link to Java applet example
Backward algorithm:

initialize β_i(T) = 1, t = T, given a_ij, b_j(k) and the visible sequence V^T
for t ← t - 1:
    β_i(t) = Σ_{j=1}^{N} a_ij b_j(v(t+1)) β_j(t+1)    for all i ≤ N
until t = 0
return P(V^T) = β_i(0) for the initial state i
end

β_i(t) is the probability that the model is in state i and will produce the last T - t elements of V^T
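A matching sketch of the backward pass, again using the weather model. The slide's return value β_i(0) assumes a known initial state; in general, P(V^T) is obtained by weighting β_i(0) with the initial probability and the first emission, which is what the last line does.

```python
def backward(A, B, obs):
    """Backward algorithm: beta_i(t) is the probability of producing the
    remaining observations v(t+1)..v(T) given state i at time t."""
    N, T = len(A), len(obs)
    betas = [[1.0] * N]                      # initialization: beta_i(T) = 1
    for t in range(T - 2, -1, -1):
        nxt = betas[0]
        # recursion: beta_i(t) = sum_j a_ij * b_j(v(t+1)) * beta_j(t+1)
        betas.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * nxt[j]
                             for j in range(N)) for i in range(N)])
    return betas

# Weather model from the earlier slides; observations: dry(0), damp(2), soggy(3)
A = [[0.5, 0.375, 0.125], [0.25, 0.125, 0.625], [0.25, 0.375, 0.375]]
B = [[0.6, 0.2, 0.15, 0.05], [0.25, 0.25, 0.25, 0.25], [0.05, 0.1, 0.35, 0.5]]
pi = [1.0, 0.0, 0.0]
obs = [0, 2, 3]
betas = backward(A, B, obs)
# P(V^T): weight beta_i(0) by the initial and first-emission probabilities
p = sum(pi[i] * B[i][obs[0]] * betas[0][i] for i in range(3))
```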
Decoding (Viterbi algorithm)
Finds the sequence of hidden states in an N-state model M that most probably generated the sequence V^T = {v(0), v(1), ..., v(T)} of visible states. It can be calculated recursively with the Viterbi algorithm:

δ_i(t) = max P(ω(1), ..., ω(t) = i, v(1), ..., v(t) | M) over all paths ω

recursively: δ_j(t) = max_i [ δ_i(t-1) a_ij ] b_j(v(t)), with δ_i(0) = π(i) b_i(v(0))

δ_i(t) is the maximal probability along a path ω(1), ..., ω(t) = i to generate the sequence v(1), ..., v(t) (partial best path). To keep track of the path maximizing δ_j(t), the array ψ_j(t) is used
Sequence calculation:

initialize δ_i(0) = π(i) b_i(v(0)), ψ_i(0) = 0, t = 0 for all i
for t ← t + 1:
    for all states j:
        δ_j(t) = max_{1≤i≤N} [ δ_i(t-1) a_ij ] b_j(v(t))
        ψ_j(t) = arg max_{1≤i≤N} [ δ_i(t-1) a_ij ]
until t = T

Sequence termination: ω(T) = arg max_{1≤i≤N} δ_i(T)
Sequence backtracking: for t = T-1, t ← t - 1: ω(t) = ψ_{ω(t+1)}(t+1), until t = 0
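The recursion, termination, and backtracking steps above can be sketched as one Python function; variable names are mine, and the weather model values come from the earlier slides.

```python
def viterbi(A, B, pi, obs):
    """Viterbi algorithm: most probable hidden state sequence for obs,
    plus the probability of that best path."""
    N, T = len(A), len(obs)
    # initialization: delta_i(0) = pi(i) * b_i(v(0)), psi_i(0) = 0
    delta = [pi[i] * B[i][obs[0]] for i in range(N)]
    psi = [[0] * N]
    for t in range(1, T):
        new_delta, back = [], []
        for j in range(N):
            # delta_j(t) = max_i [delta_i(t-1) * a_ij] * b_j(v(t))
            best_i = max(range(N), key=lambda i: delta[i] * A[i][j])
            back.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][obs[t]])
        delta = new_delta
        psi.append(back)
    # termination: best final state, then backtrack through psi
    path = [max(range(N), key=lambda i: delta[i])]
    for t in range(T - 1, 0, -1):
        path.insert(0, psi[t][path[0]])
    return path, max(delta)

# Weather model; observations: dry(0), damp(2), soggy(3)
A = [[0.5, 0.375, 0.125], [0.25, 0.125, 0.625], [0.25, 0.375, 0.375]]
B = [[0.6, 0.2, 0.15, 0.05], [0.25, 0.25, 0.25, 0.25], [0.05, 0.1, 0.35, 0.5]]
pi = [1.0, 0.0, 0.0]
path, prob = viterbi(A, B, pi, [0, 2, 3])
# path is [0, 1, 2]: sun -> clouds -> rain, the most probable weather sequence
```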
Example (Viterbi trellis figure)
Link to Java applet example
Positive aspects of the Viterbi algorithm
- Reduction of computational complexity by recursion
- Takes the entire context into account to find the optimal solution, and therefore achieves lower error rates on noisy data
Learning (forward-backward algorithm)
Determine the N-state model parameters a_ij and b_j(k) from a training sequence by iteratively calculating better values.

Definition: ξ_ij(t) = P(ω_i(t), ω_j(t+1) | V^T, M) is the probability of a transition from ω_i at time t to ω_j at time t+1, given that the model M generated V^T by any path:

ξ_ij(t) = α_i(t) a_ij b_j(v(t+1)) β_j(t+1) / P(V^T | M)
        = α_i(t) a_ij b_j(v(t+1)) β_j(t+1) / Σ_{k=1}^{N} Σ_{l=1}^{N} α_k(t) a_kl b_l(v(t+1)) β_l(t+1)

γ_i(t) = Σ_{j=1}^{N} ξ_ij(t) is the probability of being in state ω_i at time t
Σ_{t=1}^{T-1} γ_i(t): expected number of transitions from ω_i to any other state
Σ_{t=1}^{T} γ_i(t): expected number of times in ω_i
Σ_{t=1}^{T-1} ξ_ij(t): expected number of transitions from ω_i to ω_j

which gives us better values for a_ij:

a'_ij = Σ_{t=1}^{T-1} ξ_ij(t) / Σ_{t=1}^{T-1} γ_i(t)
      = Σ_{t=1}^{T-1} α_i(t) a_ij b_j(v(t+1)) β_j(t+1) / Σ_{t=1}^{T-1} Σ_{k=1}^{N} α_i(t) a_ik b_k(v(t+1)) β_k(t+1)

Calculation of the b_j(k):

b'_j(k) = Σ_{t: v(t)=v_k} γ_j(t) / Σ_{t=1}^{T} γ_j(t)

(expected number of times in ω_j emitting v_k, divided by the expected number of times in ω_j)

π'(i) = γ_i(0): probability of being in state i at time 0
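One reestimation step of the update formulas above can be sketched as follows. This is a simplified illustration, not a production implementation: it handles a single training sequence, uses no scaling (see the later slide), and the variable names are mine.

```python
def baum_welch_step(A, B, pi, obs):
    """One forward-backward (Baum-Welch) reestimation step: compute
    gamma and xi from alpha/beta, then the updated A, B, pi."""
    N, T, K = len(A), len(obs), len(B[0])
    # forward pass: alpha_j(t)
    alpha = [[0.0] * N for _ in range(T)]
    for j in range(N):
        alpha[0][j] = pi[j] * B[j][obs[0]]
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j]
                              for i in range(N)) * B[j][obs[t]]
    # backward pass: beta_i(t)
    beta = [[0.0] * N for _ in range(T)]
    beta[T - 1] = [1.0] * N
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(N))
    pv = sum(alpha[T - 1])  # P(V^T | M)
    # gamma_i(t) = P(state i at t | V, M); xi_ij(t) = P(i at t, j at t+1 | V, M)
    gamma = [[alpha[t][i] * beta[t][i] / pv for i in range(N)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / pv
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # reestimation: a'_ij, b'_j(k), pi'(i) as in the formulas above
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T))
              for k in range(K)] for j in range(N)]
    new_pi = gamma[0]
    return new_A, new_B, new_pi

# Weather model; observations: dry(0), damp(2), soggy(3)
A = [[0.5, 0.375, 0.125], [0.25, 0.125, 0.625], [0.25, 0.375, 0.375]]
B = [[0.6, 0.2, 0.15, 0.05], [0.25, 0.25, 0.25, 0.25], [0.05, 0.1, 0.35, 0.5]]
pi = [1.0, 0.0, 0.0]
new_A, new_B, new_pi = baum_welch_step(A, B, pi, [0, 2, 3])
```

Because Σ_j ξ_ij(t) = γ_i(t), the reestimated rows automatically satisfy the normalization conditions, which is a useful sanity check on an implementation.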
Positive aspect: arbitrary precision of the estimation
Problems: how to choose the initial parameters a_ij and b_j?
- For π and a_ij, either random or uniform values work
- For the b_j, better initial estimates are very useful for fast convergence; estimate these parameters by manual segmentation or maximum-likelihood segmentation
What is the appropriate model?
This must be decided based on the kind of signal being modeled. In speech recognition a left-right model is often used to model the advancing of time, e.g. every syllable gets a state, plus a final silent state
Scaling
Problem: the α_i and β_i are always smaller than 1, so the calculations converge towards zero; this exceeds the precision range even in double precision.
Solution: multiply the α_i and β_i by a scaling coefficient c_t that is independent of i but depends on t, for example:

c_t = 1 / Σ_{i=1}^{N} α_i(t)

These factors cancel out in the calculation of the a_ij and b_j
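The scaling idea can be sketched as a scaled forward pass: after each step the α values are renormalized by c_t, and accumulating log(1/c_t) yields log P(V^T) without underflow. This is my illustration of the technique, not code from the slides.

```python
import math

def forward_scaled(A, B, pi, obs):
    """Forward pass with per-step scaling c_t = 1 / sum_i alpha_i(t).
    Returns log P(V^T), which stays representable even for long sequences."""
    N = len(A)
    alpha = [pi[j] * B[j][obs[0]] for j in range(N)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                     for j in range(N)]
        s = sum(alpha)                   # s = 1 / c_t
        log_p += math.log(s)
        alpha = [a / s for a in alpha]   # rescaled alphas now sum to 1
    return log_p

# Weather model; short sequence to compare against the unscaled result
A = [[0.5, 0.375, 0.125], [0.25, 0.125, 0.625], [0.25, 0.375, 0.375]]
B = [[0.6, 0.2, 0.15, 0.05], [0.25, 0.25, 0.25, 0.25], [0.05, 0.1, 0.35, 0.5]]
pi = [1.0, 0.0, 0.0]
p_short = math.exp(forward_scaled(A, B, pi, [0, 2, 3]))
# A 1000-step sequence would underflow unscaled, but is fine in log space:
log_p_long = forward_scaled(A, B, pi, [0, 1, 2, 3] * 250)
```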
Speech recognizers using HMMs
Isolated word recognizer: build an HMM for each word in the vocabulary and calculate its (A, B, π) parameters (train the model).
For each word to be recognized:
- Feature analysis (vector quantization): generates observation vectors from the signal
- Run Viterbi on all models to find the most probable model for that observation sequence
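The recognition step above amounts to scoring the observation sequence under every word's HMM and picking the best one. A minimal sketch, assuming hypothetical toy word models ("yes"/"no") that stand in for real trained models; the Viterbi scoring matches the slide, but real systems would score in log space with scaling.

```python
def viterbi_score(model, obs):
    """Probability of the single best hidden-state path for obs."""
    A, B, pi = model
    N = len(A)
    delta = [pi[j] * B[j][obs[0]] for j in range(N)]
    for t in range(1, len(obs)):
        delta = [max(delta[i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                 for j in range(N)]
    return max(delta)

def recognize(word_models, obs):
    """Isolated-word recognition: run Viterbi on all word models and
    return the word whose model scores obs highest."""
    return max(word_models, key=lambda w: viterbi_score(word_models[w], obs))

# Hypothetical 2-state left-right word models over a 2-symbol codebook:
# "yes" prefers symbol 0 then symbol 1, "no" prefers the opposite order.
models = {
    "yes": ([[0.7, 0.3], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]], [1.0, 0.0]),
    "no":  ([[0.7, 0.3], [0.0, 1.0]], [[0.1, 0.9], [0.8, 0.2]], [1.0, 0.0]),
}
word = recognize(models, [0, 0, 1, 1])  # -> "yes"
```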
Block diagram of an isolated word recognizer
a) log energy and b) state assignment for the word "six"
Measured performance of an isolated word recognizer
100 digits spoken by 100 talkers (50 female / 50 male):
- Original training: the original training set was used
- TS2: the original speakers as in the training
- TS3: a completely new set of speakers
- TS4: another new set of speakers
Conclusion
There are various processes where the real activity is invisible and only a generated pattern can be observed; these can be modeled by HMMs.
Limitations of HMMs: they need enough training data, and the Markov assumption that each state depends only on the previous state is not always true.
Advantages: acceptable computational complexity and low error rates.
HMMs are the predominant method in current automatic speech recognition and play a great role in other recognition systems
Bibliography
- Rabiner, L.R.: A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, Vol. 77, Iss. 2, Feb 1989, pages 257-286
- Richard O. Duda, Peter E. Hart, David G. Stork: Pattern Classification, chapter 3.10
- Eric Keller (editor): Fundamentals of Speech Synthesis and Speech Recognition, chapter 8 by Kari Torkkola

Used websites:
- http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/main.html — Introduction to Hidden Markov Models, University of Leeds
- http://www.cnel.ufl.edu/~yadu/report1.html — Speech Recognition Using Hidden Markov Models, Yadunandana Nagara Rao, University of Florida