Hidden Markov Models. Following a lecture by Andrew W. Moore, Carnegie Mellon University.

Hidden Markov Models. Following a lecture by Andrew W. Moore, Carnegie Mellon University. www.cs.cmu.edu/~awm/tutorials

A Markov System
Has N states, called s_1, s_2, .., s_N.
There are discrete timesteps, t = 0, t = 1, ...
On the t'th timestep the system is in exactly one of the available states. Call it q_t. Note: q_t ∈ {s_1, s_2, .., s_N}.
Between each timestep, the next state is chosen randomly.
The current state determines the probability distribution for the next state. Often notated with arcs between states. For the three-state example in the diagram (N = 3):
P(q_{t+1} = s_1 | q_t = s_1) = 0,   P(q_{t+1} = s_2 | q_t = s_1) = 0,   P(q_{t+1} = s_3 | q_t = s_1) = 1
P(q_{t+1} = s_1 | q_t = s_2) = 1/2, P(q_{t+1} = s_2 | q_t = s_2) = 1/2, P(q_{t+1} = s_3 | q_t = s_2) = 0
P(q_{t+1} = s_1 | q_t = s_3) = 1/3, P(q_{t+1} = s_2 | q_t = s_3) = 2/3, P(q_{t+1} = s_3 | q_t = s_3) = 0
[Diagram: three states s_1, s_2, s_3 with the current state highlighted and arcs labelled by these probabilities.]
Hidden Markov Models: Slides 2-6

Markov Property
q_{t+1} is conditionally independent of {q_{t-1}, q_{t-2}, ..., q_1, q_0} given q_t. In other words:
P(q_{t+1} = s_j | q_t = s_i) = P(q_{t+1} = s_j | q_t = s_i, any earlier history)
Each of these N probability tables is identical.
Notation: a_ij = P(q_{t+1} = s_j | q_t = s_i)
Question: what would be the best Bayes Net structure to represent the Joint Distribution of (q_0, q_1, q_2, q_3, q_4)?
Hidden Markov Models: Slides 7-9
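To make the transition table above concrete, here is a minimal Python sketch (not part of the original slides) that encodes it as a matrix and samples a state sequence. The start state, step count, and random seed are arbitrary illustrative choices.

```python
import numpy as np

# Transition matrix A[i, j] = P(q_{t+1} = s_{j+1} | q_t = s_{i+1}), rows sum to 1.
A = np.array([
    [0.0, 0.0, 1.0],   # from s1
    [0.5, 0.5, 0.0],   # from s2
    [1/3, 2/3, 0.0],   # from s3
])

rng = np.random.default_rng(0)

def simulate(A, q0, steps):
    """Sample a state sequence q_0, q_1, ..., q_steps from the Markov system."""
    path = [q0]
    for _ in range(steps):
        path.append(rng.choice(len(A), p=A[path[-1]]))
    return path

# Example run, starting (arbitrarily) in s1 (index 0):
print([f"s{i + 1}" for i in simulate(A, q0=0, steps=10)])
```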

A Blind Robot
A human and a robot wander around randomly on a grid.
STATE q = (Location of Robot, Location of Human)
Note: N (num. states) = 18 * 18 = 324
Hidden Markov Models: Slide 10

Dynamics of System
q_0 = [diagram: the grid with the initial positions of H (human) and R (robot)]
Each timestep the human moves randomly to an adjacent cell. And the Robot also moves randomly to an adjacent cell.
Typical Questions:
What is the expected time until the human is crushed like a bug?
What is the probability that the robot will hit the left wall before it hits the human?
What is the probability the Robot crushes the human on the next time step?
Hidden Markov Models: Slide 11

Example Question
It is currently time t, and the human remains uncrushed. What is the probability of crushing occurring at time t + 1?
If the robot is blind: we can compute this in advance. (We'll do this first.)
If the robot is omnipotent (i.e. if the robot knows the state at time t): it can compute it directly. (Too easy. We won't do this.)
If the robot has some sensors, but incomplete state information: Hidden Markov Models are applicable! (Main body of lecture.)
Hidden Markov Models: Slide 12

What is P(q_t = s)? Slow, stupid answer
Step 1: Work out how to compute P(Q) for any path Q = q_1 q_2 q_3 .. q_t.
Given we know the start state q_1 (i.e. P(q_1) = 1):
P(q_1 q_2 .. q_t) = P(q_1 q_2 .. q_{t-1}) P(q_t | q_1 q_2 .. q_{t-1})
                  = P(q_1 q_2 .. q_{t-1}) P(q_t | q_{t-1})     (WHY?)
                  = P(q_2 | q_1) P(q_3 | q_2) ... P(q_t | q_{t-1})
Step 2: Use this knowledge to get P(q_t = s):
P(q_t = s) = sum of P(Q) over all paths Q of length t that end in s.
Computation is exponential in t.
Hidden Markov Models: Slide 13
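A small sketch (not in the original slides) of exactly this "slow, stupid" computation, assuming the chain starts deterministically in a given state and using the three-state transition matrix from the earlier slides; function and variable names are mine.

```python
import itertools
import numpy as np

A = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.5, 0.0],
              [1/3, 2/3, 0.0]])

def p_state_bruteforce(A, start, s, t):
    """P(q_t = s) by summing P(Q) over every path q_0..q_t that begins in
    `start` and ends in `s` -- the O(N^t) enumeration described above."""
    N = len(A)
    total = 0.0
    for middle in itertools.product(range(N), repeat=t - 1):
        path = (start,) + middle + (s,)
        prob = 1.0
        for a, b in zip(path, path[1:]):
            prob *= A[a, b]          # product of one-step transition probabilities
        total += prob
    return total

print(p_state_bruteforce(A, start=0, s=1, t=3))   # P(q_3 = s2 | start in s1)
```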

What is P(q_t = s)? Clever answer
For each state s_i, define p_t(i) = Prob. the state is s_i at time t = P(q_t = s_i).
Easy to do inductive definition:
p_0(i) = 1 if s_i is the start state, 0 otherwise.
For all j:
p_{t+1}(j) = P(q_{t+1} = s_j)
           = Σ_{i=1}^{N} P(q_{t+1} = s_j ∧ q_t = s_i)
           = Σ_{i=1}^{N} P(q_{t+1} = s_j | q_t = s_i) P(q_t = s_i)
           = Σ_{i=1}^{N} a_ij p_t(i)          (remember, a_ij = P(q_{t+1} = s_j | q_t = s_i))
Computation is simple. Just fill in this table in this order: rows t = 0, 1, .., t_final and columns p_t(1), p_t(2), .., p_t(N), each row computed from the previous one.
Cost of computing p_t(i) for all states s_i is now O(t N^2). The stupid way was O(N^t).
This was a simple example. It was meant to warm you up to this trick, called Dynamic Programming, because HMMs do many tricks like this.
Hidden Markov Models: Slides 14-19
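The same quantity computed the "clever" way, as a minimal sketch (not from the slides): the table p_t(i) is filled in row by row with the recursion p_{t+1}(j) = Σ_i a_ij p_t(i), which in matrix form is just a vector-matrix product.

```python
import numpy as np

A = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.5, 0.0],
              [1/3, 2/3, 0.0]])

def p_state_dp(A, start, t):
    """Dynamic-programming version: p_0 is a point mass on the start state,
    and p_{t+1} = p_t A. Cost O(t N^2) instead of O(N^t)."""
    p = np.zeros(len(A))
    p[start] = 1.0                 # p_0(i): 1 if s_i is the start state, else 0
    for _ in range(t):
        p = p @ A                  # p_{t+1}(j) = sum_i a_ij p_t(i)
    return p

print(p_state_dp(A, start=0, t=3))   # vector of P(q_3 = s_i), matches the brute force
```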

Hidden State
It is currently time t, and the human remains uncrushed. What is the probability of crushing occurring at time t + 1?
If the robot is blind: we can compute this in advance. (We just did this.)
If the robot is omnipotent (i.e. if the robot knows the state at time t): it can compute it directly. (Too easy. We won't do this.)
If the robot has some sensors, but incomplete state information: Hidden Markov Models are applicable! (Main body of lecture.)
Hidden Markov Models: Slide 20

Hidden State
The previous example tried to estimate P(q_t = s_i) unconditionally (using no observed evidence).
Suppose we can observe something that is affected by the true state. Example: proximity sensors (tell us the contents of the 8 adjacent squares).
[Diagram: the true state q_t (grid with H and R) and what the robot sees: observation O_t, a reading of the 8 adjacent squares, where W denotes WALL.]
Hidden Markov Models: Slide 21

Noisy Hidden State
Example: noisy proximity sensors (unreliably tell us the contents of the 8 adjacent squares).
[Diagram: the true state q_t, the uncorrupted observation, and what the robot sees: the corrupted observation O_t. W denotes WALL.]
Hidden Markov Models: Slide 22

Noisy Hidden State
Example: noisy proximity sensors (unreliably tell us the contents of the 8 adjacent squares).
O_t is noisily determined depending on the current state. Assume that O_t is conditionally independent of {q_{t-1}, q_{t-2}, ..., q_1, q_0, O_{t-1}, O_{t-2}, ..., O_1, O_0} given q_t. In other words:
P(O_t = X | q_t = s_i) = P(O_t = X | q_t = s_i, any earlier history)
[Diagram: the true state q_t, the uncorrupted observation, and the corrupted observation O_t the robot actually sees. W denotes WALL.]
Hidden Markov Models: Slides 23-24

Hidden Markov Models
Our robot with noisy sensors is a good example of an HMM.
Question 1: State Estimation. What is P(q_T = S_i | O_1 O_2 ... O_T)? It will turn out that a new cute D.P. trick will get this for us.
Question 2: Most Probable Path. Given O_1 O_2 ... O_T, what is the most probable path that I took? And what is that probability? Yet another famous D.P. trick, the VITERBI algorithm, gets this.
Question 3: Learning HMMs. Given O_1 O_2 ... O_T, what is the maximum likelihood HMM that could have produced this string of observations? Very very useful. Uses the E.M. Algorithm.
Hidden Markov Models: Slide 25

Are H.M.M.s Useful?
You bet!!
Robot planning + sensing when there's uncertainty.
Speech Recognition/Understanding: Phones → Words, Signal → phones.
Human Genome Project: complicated stuff your lecturer knows nothing about.
Consumer decision modeling.
Economics & Finance.
Plus at least 5 other things I haven't thought of.
Hidden Markov Models: Slide 26

HMM Notation (from Rabiner's survey*)
*L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. of the IEEE, Vol. 77, No. 2, pp. 257-286, 1989.
The states are labeled S_1, S_2, .., S_N.
For a particular trial, let T be the number of observations. T is also the number of states passed through.
O = O_1 O_2 .. O_T is the sequence of observations.
Q = q_1 q_2 .. q_T is the notation for a path of states.
λ = (N, M, {π_i}, {a_ij}, {b_i(j)}) is the specification of an HMM.
Hidden Markov Models: Slide 27

HMM Formal Definition
An HMM, λ, is a 5-tuple consisting of:
N, the number of states;
M, the number of possible observations;
{π_1, π_2, .., π_N}, the starting state probabilities: P(q_0 = S_i) = π_i. (This is new. In our previous example, the start state was deterministic.)
{a_ij}, the N x N matrix of state transition probabilities: P(q_{t+1} = S_j | q_t = S_i) = a_ij;
{b_i(k)}, the N x M matrix of observation probabilities: P(O_t = k | q_t = S_i) = b_i(k).
Hidden Markov Models: Slide 28

Here is an HMM
N = 3, M = 3
π_1 = 1/2, π_2 = 1/2, π_3 = 0
a_11 = 0,   a_12 = 1/3, a_13 = 2/3
a_21 = 1/3, a_22 = 0,   a_23 = 2/3
a_31 = 1/3, a_32 = 1/3, a_33 = 1/3
b_1(X) = 1/2, b_1(Y) = 1/2, b_1(Z) = 0
b_2(X) = 0,   b_2(Y) = 1/2, b_2(Z) = 1/2
b_3(X) = 1/2, b_3(Y) = 0,   b_3(Z) = 1/2
[Diagram: state S_1 emits X or Y, state S_2 emits Y or Z, state S_3 emits Z or X, with arcs labelled by the transition probabilities above.]
Start randomly in state 1 or 2. Choose one of the output symbols in each state at random.
Hidden Markov Models: Slide 29

Here is an HMM (continued)
(The HMM specification of the previous slide is repeated on each of these slides.)
Start randomly in state 1 or 2. Choose one of the output symbols in each state at random. Let's generate a sequence of observations:
Slide 30: 50-50 choice between S_1 and S_2 (q_0, q_1, q_2 and O_0, O_1, O_2 still blank).
Slide 31: q_0 = S_1. 50-50 choice between X and Y.
Slide 32: O_0 = X. Go to S_3 with probability 2/3 or S_2 with prob. 1/3.
Slide 33: q_1 = S_3. 50-50 choice between Z and X.
Slide 34: O_1 = X. Each of the three next states is equally likely.
Slide 35: q_2 = S_3. 50-50 choice between Z and X.
Slide 36: O_2 = Z. The generated trace is q_0 q_1 q_2 = S_1 S_3 S_3 and O_0 O_1 O_2 = X X Z.
Hidden Markov Models: Slides 30-36

State Estimation
(Same HMM specification as Slide 29.)
Start randomly in state 1 or 2. Choose one of the output symbols in each state at random. We generated a sequence of observations.
This is what the observer has to work with: the observations O_0 O_1 O_2 = X X Z, while the states q_0 q_1 q_2 = ? ? ? remain hidden.
Hidden Markov Models: Slide 37
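A minimal Python sketch (not from the slides) of the generative procedure just described, with the example HMM written out as arrays. The random seed is arbitrary, so the sampled trace will generally differ from the S_1 S_3 S_3 / X X Z run shown on the slides.

```python
import numpy as np

# The example HMM of Slides 29-36: states S1, S2, S3; output symbols X, Y, Z.
pi = np.array([0.5, 0.5, 0.0])                    # pi_i = P(start in S_i)
A  = np.array([[0.0, 1/3, 2/3],                   # a_ij = P(q_{t+1} = S_j | q_t = S_i)
               [1/3, 0.0, 2/3],
               [1/3, 1/3, 1/3]])
B  = np.array([[0.5, 0.5, 0.0],                   # b_i(k) = P(O_t = k | q_t = S_i)
               [0.0, 0.5, 0.5],                   # columns: X, Y, Z
               [0.5, 0.0, 0.5]])
symbols = "XYZ"

rng = np.random.default_rng(1)

def generate(pi, A, B, T):
    """Sample a hidden state path and an observation sequence of length T."""
    states, obs = [], []
    q = rng.choice(len(pi), p=pi)                 # start state drawn from pi
    for _ in range(T):
        states.append(q)
        obs.append(rng.choice(B.shape[1], p=B[q]))  # emit a symbol from state q
        q = rng.choice(len(A), p=A[q])              # move to the next state
    return states, obs

states, obs = generate(pi, A, B, T=3)
print("states:      ", [f"S{i + 1}" for i in states])
print("observations:", [symbols[k] for k in obs])
```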

Prob. of a series of observations
What is P(O) = P(O_1 O_2 O_3) = P(O_1 = X ∧ O_2 = X ∧ O_3 = Z)?
Slow, stupid way:
P(O) = Σ_{Q ∈ paths of length 3} P(O ∧ Q) = Σ_{Q ∈ paths of length 3} P(O | Q) P(Q)
How do we compute P(Q) for an arbitrary path Q?
P(Q) = P(q_1, q_2, q_3) = P(q_1) P(q_2, q_3 | q_1)          (chain rule)
     = P(q_1) P(q_2 | q_1) P(q_3 | q_2, q_1)                (chain)
     = P(q_1) P(q_2 | q_1) P(q_3 | q_2)                     (why?)
Example in the case Q = S_1 S_3 S_3: 1/2 * 2/3 * 1/3 = 1/9.
How do we compute P(O | Q) for an arbitrary path Q?
P(O | Q) = P(O_1 O_2 O_3 | q_1 q_2 q_3) = P(O_1 | q_1) P(O_2 | q_2) P(O_3 | q_3)   (why?)
Example in the case Q = S_1 S_3 S_3: P(X | S_1) P(X | S_3) P(Z | S_3) = 1/2 * 1/2 * 1/2 = 1/8.
P(O) would need 27 P(Q) computations and 27 P(O | Q) computations. A sequence of 20 observations would need 3^20 ≈ 3.5 billion P(Q) computations and 3.5 billion P(O | Q) computations. So let's be smarter.
Hidden Markov Models: Slides 38-41
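For concreteness, a sketch (mine, not the slides') of the slow way: enumerate all 3^T = 27 state paths and sum P(O | Q) P(Q). With the example HMM this also gives the value that the forward algorithm will reproduce cheaply below.

```python
import itertools
import numpy as np

pi = np.array([0.5, 0.5, 0.0])
A  = np.array([[0.0, 1/3, 2/3], [1/3, 0.0, 2/3], [1/3, 1/3, 1/3]])
B  = np.array([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]])
X, Y, Z = 0, 1, 2

def prob_obs_bruteforce(pi, A, B, obs):
    """P(O) = sum over all state paths Q of P(O | Q) P(Q): 3^T terms."""
    N, total = len(pi), 0.0
    for Q in itertools.product(range(N), repeat=len(obs)):
        p_Q = pi[Q[0]] * np.prod([A[i, j] for i, j in zip(Q, Q[1:])])
        p_O_given_Q = np.prod([B[q, o] for q, o in zip(Q, obs)])
        total += p_Q * p_O_given_Q
    return total

print(prob_obs_bruteforce(pi, A, B, [X, X, Z]))   # P(O_1=X, O_2=X, O_3=Z) = 1/36
```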

The Prob. of a given series of observations, non-exponential-cost-style
Given observations O_1 O_2 ... O_T, define
α_t(i) = P(O_1 O_2 ... O_t ∧ q_t = S_i | λ)     where 1 ≤ t ≤ T
α_t(i) = Probability that, in a random trial, we'd have seen the first t observations and we'd have ended up in S_i as the t'th state visited.
In our example, what is α_2(3)?
Hidden Markov Models: Slide 42

α_t(i): easy to define recursively
α_t(i) = P(O_1 O_2 ... O_t ∧ q_t = S_i | λ)
(α_t(i) can be defined stupidly by considering all paths of length t. How?)
α_1(i) = P(O_1 ∧ q_1 = S_i) = P(q_1 = S_i) P(O_1 | q_1 = S_i) = π_i b_i(O_1)
α_{t+1}(j) = P(O_1 O_2 ... O_{t+1} ∧ q_{t+1} = S_j)
           = Σ_{i=1}^{N} P(O_1 ... O_t ∧ q_t = S_i ∧ O_{t+1} ∧ q_{t+1} = S_j)
           = Σ_{i=1}^{N} P(O_{t+1}, q_{t+1} = S_j | O_1 ... O_t ∧ q_t = S_i) P(O_1 ... O_t ∧ q_t = S_i)
           = Σ_{i=1}^{N} P(O_{t+1}, q_{t+1} = S_j | q_t = S_i) α_t(i)
           = Σ_{i=1}^{N} P(O_{t+1} | q_{t+1} = S_j) P(q_{t+1} = S_j | q_t = S_i) α_t(i)
           = Σ_{i=1}^{N} b_j(O_{t+1}) a_ij α_t(i)
Hidden Markov Models: Slides 43-44

In our example
α_t(i) = P(O_1 O_2 ... O_t ∧ q_t = S_i | λ),   α_1(i) = b_i(O_1) π_i,   α_{t+1}(j) = b_j(O_{t+1}) Σ_i a_ij α_t(i)
WE SAW O_1 O_2 O_3 = X X Z, so:
α_1(1) = 1/4,  α_1(2) = 0,     α_1(3) = 0
α_2(1) = 0,    α_2(2) = 0,     α_2(3) = 1/12
α_3(1) = 0,    α_3(2) = 1/72,  α_3(3) = 1/72
Hidden Markov Models: Slide 45
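A short sketch (mine) of the forward recursion just derived, checked against the α values above; the last two lines also answer the "Easy Question" that follows, giving P(O_1 O_2 O_3) and the filtered posterior over q_3.

```python
import numpy as np

pi = np.array([0.5, 0.5, 0.0])
A  = np.array([[0.0, 1/3, 2/3], [1/3, 0.0, 2/3], [1/3, 1/3, 1/3]])
B  = np.array([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]])
X, Y, Z = 0, 1, 2

def forward(pi, A, B, obs):
    """alpha[t, i] = P(O_1..O_{t+1} and q_{t+1} = S_i): the recursion of Slides 43-44."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                        # alpha_1(i) = pi_i b_i(O_1)
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]    # alpha_{t+1}(j) = b_j(O_{t+1}) sum_i a_ij alpha_t(i)
    return alpha

alpha = forward(pi, A, B, [X, X, Z])
print(alpha)                        # rows: [1/4, 0, 0], [0, 0, 1/12], [0, 1/72, 1/72]
print(alpha[-1].sum())              # P(O_1 O_2 O_3) = 1/36
print(alpha[-1] / alpha[-1].sum())  # P(q_3 = S_i | O_1 O_2 O_3)
```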

Easy Question
We can cheaply compute α_t(i) = P(O_1 O_2 ... O_t ∧ q_t = S_i).
(How) can we cheaply compute P(O_1 O_2 ... O_t)?   Answer: Σ_{i=1}^{N} α_t(i).
(How) can we cheaply compute P(q_t = S_i | O_1 O_2 ... O_t)?   Answer: α_t(i) / Σ_{j=1}^{N} α_t(j).
Hidden Markov Models: Slides 46-47

Most probable path given observations
What's the most probable path given O_1 O_2 ... O_T, i.e. what is argmax_Q P(Q | O_1 O_2 ... O_T)?
Slow, stupid answer:
argmax_Q P(Q | O_1 O_2 ... O_T)
  = argmax_Q P(O_1 O_2 ... O_T | Q) P(Q) / P(O_1 O_2 ... O_T)
  = argmax_Q P(O_1 O_2 ... O_T | Q) P(Q)
Hidden Markov Models: Slide 48

Efficient MPP computation
We're going to compute the following variables:
δ_t(i) = max over q_1 q_2 .. q_{t-1} of P(q_1 q_2 .. q_{t-1} ∧ q_t = S_i ∧ O_1 .. O_t)
       = the probability of the path of length t-1 with the maximum chance of doing all these things: OCCURRING, and ENDING UP IN STATE S_i, and PRODUCING OUTPUT O_1 .. O_t.
DEFINE: mpp_t(i) = that path. So: δ_t(i) = Prob(mpp_t(i)).
Hidden Markov Models: Slide 49

" mpp " The Verb Algorhm ( ) P( q q... q # q S # O O.. O ) ( ) P( q q... q # q S # O O.. O ) ( ) one choce P( q S # O ) q q P q q ( q S ) P( O q S ) 2 arg max 2 max max... q... q ( O ) $ $! b Now, suppose we have all he δ () s and mpp () s for all. 2 2 $ $ HOW TO GET δ + (j) and mpp + (j)? 2 2 mpp () Probδ () mpp (2) : mpp (N) S? S 2 : Probδ (2) S N Probδ (N) S j q q + Hdden Markov Models: Slde 50

The Viterbi Algorithm (continued)
[Diagram: states S_1 .. S_N at time t, with S_j at time t+1.]
The most prob path with last two states S_i S_j is the most prob path to S_i, followed by the transition S_i → S_j.
What is the prob of that path?
δ_t(i) x P(S_i → S_j ∧ O_{t+1} | λ) = δ_t(i) a_ij b_j(O_{t+1})
SO the most probable path to S_j has S_i* as its penultimate state, where i* = argmax_i δ_t(i) a_ij b_j(O_{t+1}).
Summary (with i* defined as above):
δ_{t+1}(j) = δ_t(i*) a_i*j b_j(O_{t+1})
mpp_{t+1}(j) = mpp_t(i*) S_i*
Hidden Markov Models: Slides 51-53
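A compact sketch (mine, not the slides') of the Viterbi recursion just summarized, run on the X X Z example; ties between equally probable paths are broken arbitrarily by argmax.

```python
import numpy as np

pi = np.array([0.5, 0.5, 0.0])
A  = np.array([[0.0, 1/3, 2/3], [1/3, 0.0, 2/3], [1/3, 1/3, 1/3]])
B  = np.array([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]])
X, Y, Z = 0, 1, 2

def viterbi(pi, A, B, obs):
    """delta[t, j] = max prob of any state path that ends in S_j at step t+1 and
    produces O_1..O_{t+1}; back-pointers recover the most probable path."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    back = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]                     # delta_1(i) = pi_i b_i(O_1)
    for t in range(1, T):
        cand = delta[t - 1, :, None] * A             # cand[i, j] = delta_t(i) a_ij
        back[t] = cand.argmax(axis=0)                # i* for each destination j
        delta[t] = cand.max(axis=0) * B[:, obs[t]]   # delta_{t+1}(j) = delta_t(i*) a_{i*j} b_j(O_{t+1})
    path = [int(delta[-1].argmax())]                 # best final state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))          # follow back-pointers
    path.reverse()
    return path, delta[-1].max()

path, prob = viterbi(pi, A, B, [X, X, Z])
print([f"S{i + 1}" for i in path], prob)             # a most probable path and its joint probability
```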

What is Viterbi used for?
Classic example, speech recognition: the HMM observable is the signal; the hidden state is part of the word formation. What is the most probable word given this signal?
UTTERLY GROSS SIMPLIFICATION. In practice: many levels of inference; not one big jump.
Hidden Markov Models: Slide 54

HMMs are used and useful. But how do you design an HMM?
Occasionally (e.g. in our robot example) it is reasonable to deduce the HMM from first principles.
But usually, especially in Speech or Genetics, it is better to infer it from large amounts of data: O_1 O_2 .. O_T with a big T.
Observations previously in lecture: O_1 O_2 .. O_T. Observations in the next bit: O_1 O_2 .. O_T.
Hidden Markov Models: Slide 55

Inferring an HMM
Remember, we've been doing things like P(O_1 O_2 .. O_T | λ). That λ is the notation for our HMM parameters.
Now we have some observations and we want to estimate λ from them.
AS USUAL, we could use:
(i) MAX LIKELIHOOD: λ = argmax_λ P(O_1 .. O_T | λ)
(ii) BAYES: work out P(λ | O_1 .. O_T) and then take E[λ] or max_λ P(λ | O_1 .. O_T)
Hidden Markov Models: Slide 56

Max likelihood HMM estimation
Define:
γ_t(i) = P(q_t = S_i | O_1 O_2 ... O_T, λ)
ε_t(i, j) = P(q_t = S_i ∧ q_{t+1} = S_j | O_1 O_2 ... O_T, λ)
γ_t(i) and ε_t(i, j) can be computed efficiently for all i, j, t (details in the Rabiner paper).
Σ_{t=1}^{T-1} γ_t(i) = expected number of transitions out of state i during the path
Σ_{t=1}^{T-1} ε_t(i, j) = expected number of transitions from state i to state j during the path
Notice these are expected counts, so we can re-estimate the transition probabilities:
a_ij = (expected frequency of "Next state S_j given This state S_i") = Σ_t ε_t(i, j) / Σ_t γ_t(i)
We can also re-estimate the observation probabilities b_i(O_k) as the expected frequency of being in state i and observing O_k, divided by the expected frequency of being in state i (see Rabiner).
Hidden Markov Models: Slides 57-58

EM for HMMs
If we knew λ we could estimate EXPECTATIONS of quantities such as
- expected number of times in state i
- expected number of transitions i → j
If we knew the quantities such as
- expected number of times in state i
- expected number of transitions i → j
we could compute the MAX LIKELIHOOD estimate of λ = ({a_ij}, {b_i(j)}, {π_i}).
Roll on the EM Algorithm.
Hidden Markov Models: Slide 59

EM 4 HMMs
1. Get your observations O_1 ... O_T.
2. Guess your first λ estimate λ(0); k = 0.
3. k := k + 1.
4. Given O_1 ... O_T and λ(k), compute γ_t(i), ε_t(i, j) for all 1 ≤ t ≤ T, 1 ≤ i ≤ N, 1 ≤ j ≤ N.
5. Compute the expected frequency of state i, and the expected frequency of transitions i → j.
6. Compute new estimates of a_ij, b_j(k), π_i accordingly. Call them λ(k+1).
7. Go to 3, unless converged.
Also known (for the HMM case) as the BAUM-WELCH algorithm.
Hidden Markov Models: Slide 60
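A compact sketch (mine) of the Baum-Welch loop just outlined, under simplifying assumptions: a single observation sequence, no probability scaling (so it underflows on long sequences), and random Dirichlet initialization; real implementations use the scaling and multiple-sequence extensions described in Rabiner's tutorial.

```python
import numpy as np

def forward(pi, A, B, obs):
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

def backward(A, B, obs):
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def baum_welch(obs, N, M, iters=50, seed=0):
    """E-step: compute gamma_t(i) and epsilon_t(i,j) under the current lambda.
    M-step: re-estimate (pi, A, B) from those expected counts. Local optimum only."""
    rng = np.random.default_rng(seed)
    pi = rng.dirichlet(np.ones(N))
    A = rng.dirichlet(np.ones(N), size=N)
    B = rng.dirichlet(np.ones(M), size=N)
    obs = np.asarray(obs)
    for _ in range(iters):
        alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
        p_obs = alpha[-1].sum()                                     # P(O | lambda)
        gamma = alpha * beta / p_obs                                # gamma[t, i] = P(q_t = S_i | O, lambda)
        xi = (alpha[:-1, :, None] * A[None, :, :]
              * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / p_obs   # xi[t, i, j] = P(q_t=S_i, q_{t+1}=S_j | O, lambda)
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]        # expected i->j transitions / transitions out of i
        B_new = np.zeros_like(B)
        for k in range(M):
            B_new[:, k] = gamma[obs == k].sum(axis=0)               # expected times in state i observing symbol k
        B = B_new / gamma.sum(axis=0)[:, None]
    return pi, A, B

# Toy run on a short synthetic observation string over 3 symbols (illustrative only):
pi, A, B = baum_welch(obs=[0, 0, 2, 1, 2, 0, 1, 1, 2, 0], N=3, M=3, iters=100)
print(np.round(A, 3))
print(np.round(B, 3))
```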

Bad News
There are lots of local minima.
Trade-off between too few states (inadequately modeling the structure in the data) and too many (fitting the noise). Thus #states is a regularization parameter. Blah blah blah bias-variance tradeoff, blah blah cross-validation, blah blah AIC, BIC, blah blah (same ol' same ol').
Good News
The local minima are usually adequate models of the data.
Notice
EM does not estimate the number of states. That must be given.
Often, HMMs are forced to have some links with zero probability. This is done by setting a_ij = 0 in the initial estimate λ(0).
Easy extension of everything seen today: HMMs with real-valued outputs.
Hidden Markov Models: Slides 61-62

What You Should Know
What is an HMM?
Computing (and defining) α_t(i).
The Viterbi algorithm.
Outline of the EM algorithm.
To be very happy with the kind of maths and analysis needed for HMMs.
Fairly thorough reading of Rabiner* up to page 266 [up to but not including "IV. Types of HMMs"]. DON'T PANIC: it starts on p. 257.
*L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. of the IEEE, Vol. 77, No. 2, pp. 257-286, 1989.
Hidden Markov Models: Slide 63