Hidden Markov Models

1 Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received.
Hidden Markov Models
Andrew W. Moore
Professor
School of Computer Science
Carnegie Mellon University
awm@cs.cmu.edu
Copyright © 2001, Andrew W. Moore. Nov 29th, 2001

2 A Markov System
Has N states, called s_1, s_2 .. s_N.
There are discrete timesteps, t=0, t=1, ...
[State diagram: three states s_1, s_2, s_3; N = 3, t = 0]

3 A Markov System
Has N states, called s_1, s_2 .. s_N.
There are discrete timesteps, t=0, t=1, ...
On the t'th timestep the system is in exactly one of the available states. Call it q_t. Note: q_t ∈ {s_1, s_2 .. s_N}.
[Diagram: N = 3, t = 0, current state q_t = q_0 = s_3]

4 A Markov System
Has N states, called s_1, s_2 .. s_N.
There are discrete timesteps, t=0, t=1, ...
On the t'th timestep the system is in exactly one of the available states. Call it q_t. Note: q_t ∈ {s_1, s_2 .. s_N}.
Between each timestep, the next state is chosen randomly.
[Diagram: N = 3, t = 1, current state q_t = q_1 = s_2]

5 A Markov System
Has N states, called s_1, s_2 .. s_N.
There are discrete timesteps, t=0, t=1, ...
On the t'th timestep the system is in exactly one of the available states. Call it q_t. Note: q_t ∈ {s_1, s_2 .. s_N}.
Between each timestep, the next state is chosen randomly.
The current state determines the probability distribution for the next state.
P(q_t+1 = s_1 | q_t = s_1) = 0    P(q_t+1 = s_2 | q_t = s_1) = 0    P(q_t+1 = s_3 | q_t = s_1) = 1
P(q_t+1 = s_1 | q_t = s_2) = 1/2  P(q_t+1 = s_2 | q_t = s_2) = 1/2  P(q_t+1 = s_3 | q_t = s_2) = 0
P(q_t+1 = s_1 | q_t = s_3) = 1/3  P(q_t+1 = s_2 | q_t = s_3) = 2/3  P(q_t+1 = s_3 | q_t = s_3) = 0
[Diagram: N = 3, t = 1, current state q_t = q_1 = s_2]

6 A Markov System
(Definition bullets and transition probabilities as on the previous slide.)
Often notated with arcs between states:
[State diagram: arcs s_2 → s_1 and s_2 → s_2 each labeled 1/2, s_1 → s_3 labeled 1, s_3 → s_1 labeled 1/3, s_3 → s_2 labeled 2/3; N = 3, t = 1, q_t = q_1 = s_2]

7 Markov Property
P(q_t+1 = s_1 | q_t = s_1) = 0    P(q_t+1 = s_2 | q_t = s_1) = 0    P(q_t+1 = s_3 | q_t = s_1) = 1
P(q_t+1 = s_1 | q_t = s_2) = 1/2  P(q_t+1 = s_2 | q_t = s_2) = 1/2  P(q_t+1 = s_3 | q_t = s_2) = 0
P(q_t+1 = s_1 | q_t = s_3) = 1/3  P(q_t+1 = s_2 | q_t = s_3) = 2/3  P(q_t+1 = s_3 | q_t = s_3) = 0
[State diagram as before; N = 3, t = 1, q_t = q_1 = s_2]
Markov Property: q_t+1 is conditionally independent of {q_t-1, q_t-2, ..., q_1, q_0} given q_t.
In other words: P(q_t+1 = s_j | q_t = s_i) = P(q_t+1 = s_j | q_t = s_i, any earlier history).
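As a quick illustration (not part of the original slides), here is a minimal Python sketch of simulating this three-state Markov system with the transition probabilities above, starting in s_3 as on slide 3. The function name simulate and the use of numpy are my own choices.

```python
import numpy as np

# Transition matrix of the 3-state example: row i is the distribution of q_{t+1} given q_t = s_{i+1}.
A = np.array([[0.0, 0.0, 1.0],    # from s_1: always move to s_3
              [0.5, 0.5, 0.0],    # from s_2: s_1 or s_2, each with prob 1/2
              [1/3, 2/3, 0.0]])   # from s_3: s_1 with prob 1/3, s_2 with prob 2/3

rng = np.random.default_rng(0)

def simulate(A, q0, T):
    """Sample a state path q_0 .. q_T (states are 0-based indices)."""
    path = [q0]
    for _ in range(T):
        path.append(rng.choice(len(A), p=A[path[-1]]))
    return path

print(simulate(A, q0=2, T=10))    # start in s_3, as on slide 3
```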

8 Markov Property: Representation
[Graphical model: a chain q_0 → q_1 → q_2 → q_3 → q_4]

9 A Blind Robot
A human and a robot wander around randomly on a grid.
[Grid diagram with H (human) and R (robot)]
STATE q = (Location of Robot, Location of Human). Note: N (num. states) = 18 * 18 = 324.

10 Dynamics of System
[Grid diagram showing the start state q_0]
Each timestep the human moves randomly to an adjacent cell. And the Robot also moves randomly to an adjacent cell.
Typical Questions:
"What's the expected time until the human is crushed like a bug?"
"What's the probability that the robot will hit the left wall before it hits the human?"
"What's the probability the Robot crushes the human on the next time step?"

11 Example Question
It is currently time t, and the human remains uncrushed. What is the probability of crushing occurring at time t + 1?
If the robot is blind: we can compute this in advance. (We'll do this first.)
If the robot is omnipotent (i.e. if the robot knows the state at time t): it can compute the answer directly. (Too easy. We won't do this.)
If the robot has some sensors, but incomplete state information: Hidden Markov Models are applicable! (Main body of lecture.)

12 What is P(q_t = s)? Too-slow answer
Step 1: Work out how to compute P(Q) for any path Q = q_0 q_1 q_2 q_3 .. q_t, given we know the start state q_0:
P(q_0 q_1 .. q_t) = P(q_0 q_1 .. q_t-1) P(q_t | q_0 q_1 .. q_t-1)
                  = P(q_0 q_1 .. q_t-1) P(q_t | q_t-1)        (WHY?)
                  = P(q_1 | q_0) P(q_2 | q_1) ... P(q_t | q_t-1)
Step 2: Use this knowledge to get P(q_t = s):
P(q_t = s) = Σ_{Q ∈ paths of length t that end in s} P(Q)
Computation is exponential in t.

13 What is P(q_t = s)? Clever answer
For each state s_i, define p_t(i) = Prob. the state is s_i at time t = P(q_t = s_i).
Easy to do inductive definition:
p_0(i) = ?
p_t+1(j) = P(q_t+1 = s_j) = ?

14 What is P(q_t = s)? Clever answer
For each state s_i, define p_t(i) = Prob. the state is s_i at time t = P(q_t = s_i).
Easy to do inductive definition:
p_0(i) = 1 if s_i is the start state, 0 otherwise
p_t+1(j) = P(q_t+1 = s_j) = ?

15 What is P(q_t = s)? Clever answer
For each state s_i, define p_t(i) = Prob. the state is s_i at time t = P(q_t = s_i).
Easy to do inductive definition:
p_0(i) = 1 if s_i is the start state, 0 otherwise
p_t+1(j) = P(q_t+1 = s_j) = Σ_{i=1}^{N} P(q_t+1 = s_j ∧ q_t = s_i) = ?

16 What is P(q_t = s)? Clever answer
For each state s_i, define p_t(i) = Prob. the state is s_i at time t = P(q_t = s_i).
Easy to do inductive definition:
p_0(i) = 1 if s_i is the start state, 0 otherwise
p_t+1(j) = P(q_t+1 = s_j)
         = Σ_{i=1}^{N} P(q_t+1 = s_j ∧ q_t = s_i)
         = Σ_{i=1}^{N} P(q_t+1 = s_j | q_t = s_i) P(q_t = s_i)
         = Σ_{i=1}^{N} a_ij p_t(i)
(Remember, a_ij = P(q_t+1 = s_j | q_t = s_i).)

17 What is P(q_t = s)? Clever answer
For each state s_i, define p_t(i) = P(q_t = s_i), with
p_0(i) = 1 if s_i is the start state, 0 otherwise
p_t+1(j) = Σ_{i=1}^{N} a_ij p_t(i)
Computation is simple. Just fill in this table, row by row from t = 0 down to t = t_final:
t       | p_t(1) | p_t(2) | ... | p_t(N)
0       |        |        |     |
1       |        |        |     |
:       |        |        |     |
t_final |        |        |     |

18 What is P(q_t = s)? Clever answer
For each state s_i, define p_t(i) = P(q_t = s_i), with
p_0(i) = 1 if s_i is the start state, 0 otherwise
p_t+1(j) = Σ_{i=1}^{N} a_ij p_t(i)
Cost of computing p_t(i) for all states s_i is now O(t N^2). The stupid way was O(N^t).
This was a simple example. It was meant to warm you up to this trick, called Dynamic Programming, because HMMs do many tricks like this.
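The table-filling trick just described fits in a few lines of code. Below is a small Python sketch of the O(t N^2) recursion (my own illustration; the helper name state_marginals is made up):

```python
import numpy as np

def state_marginals(A, start_state, T):
    """p[t, i] = P(q_t = s_{i+1}) for t = 0..T, via p_{t+1}(j) = sum_i a_ij p_t(i)."""
    N = A.shape[0]
    p = np.zeros((T + 1, N))
    p[0, start_state] = 1.0       # p_0(i) = 1 if s_i is the start state, 0 otherwise
    for t in range(T):
        p[t + 1] = p[t] @ A       # the O(N^2) step, done t times: O(t N^2) overall
    return p

# The 3-state chain from the earlier slides, started in s_3.
A = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.5, 0.0],
              [1/3, 2/3, 0.0]])
print(state_marginals(A, start_state=2, T=5))
```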

19 Hidden State
It is currently time t, and the human remains uncrushed. What is the probability of crushing occurring at time t + 1?
If the robot is blind: we can compute this in advance. (We'll do this first.)
If the robot is omnipotent (i.e. if the robot knows the state at time t): it can compute the answer directly. (Too easy. We won't do this.)
If the robot has some sensors, but incomplete state information: Hidden Markov Models are applicable! (Main body of lecture.)

20 Hidden State
The previous example tried to estimate P(q_t = s_i) unconditionally (using no observed evidence).
Suppose we can observe something that is affected by the true state. Example: proximity sensors (they tell us the contents of the 8 adjacent squares).
[Diagram: left, the true state q_t (grid with H and R); right, what the robot sees: observation O_t, the 8 squares around the robot, containing W (W denotes WALL), H (human), or nothing]

21 Noisy Hidden State
Example: noisy proximity sensors (they unreliably tell us the contents of the 8 adjacent squares).
[Diagram: the true state q_t; the uncorrupted observation (W denotes WALL); and what the robot sees: the corrupted observation O_t]

22 Noisy Hidden State
Example: noisy proximity sensors (they unreliably tell us the contents of the 8 adjacent squares).
O_t is noisily determined depending on the current state.
Assume that O_t is conditionally independent of {q_t-1, q_t-2, ..., q_0, O_t-1, O_t-2, ..., O_0} given q_t.
In other words: P(O_t = X | q_t = s_i) = P(O_t = X | q_t = s_i, any earlier history).
[Diagram as on the previous slide: true state q_t, uncorrupted observation, and what the robot sees: observation O_t]

23 Noisy Hidden State: Representation
[Graphical model: a chain q_0 → q_1 → q_2 → q_3 → q_4, with each q_t emitting an observation: O_0, O_1, O_2, O_3, O_4]

24 Hidden Markov Models
Our robot with noisy sensors is a good example of an HMM.
Question 1: State Estimation. What is P(q_T = S_i | O_1 O_2 ... O_T)? It will turn out that a new cute D.P. trick will get this for us.
Question 2: Most Probable Path. Given O_1 O_2 ... O_T, what is the most probable path that I took? And what is that probability? Yet another famous D.P. trick, the VITERBI algorithm, gets this.
Question 3: Learning HMMs. Given O_1 O_2 ... O_T, what is the maximum likelihood HMM that could have produced this string of observations? Very very useful. Uses the E.M. Algorithm.

25 Are H.M.M.s Useful?
You bet!!
Robot planning + sensing when there is uncertainty.
Speech Recognition/Understanding: Phones → Words, Signal → phones.
Gesture Recognition.
Economics & Finance.
Many others...

26 HMM Notation (from Rabiner's Survey*)
The states are labeled S_1 S_2 .. S_N.
For a particular trial, let T be the number of observations. T is also the number of states passed through.
O = O_1 O_2 .. O_T is the sequence of observations.
Q = q_1 q_2 .. q_T is the notation for a path of states.
λ = (N, M, {π_i}, {a_ij}, {b_i(j)}) is the specification of an HMM.
*L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. of the IEEE, Vol. 77, No. 2, pp. 257-286, 1989.

27 HMM Formal Definition
An HMM, λ, is a 5-tuple consisting of:
N, the number of states;
M, the number of possible observations;
{π_1, π_2, .., π_N}, the starting state probabilities: P(q_0 = S_i) = π_i. (This is new. In our previous example, the start state was deterministic.)
The N x N matrix of entries a_11 .. a_1N through a_N1 .. a_NN, the state transition probabilities: P(q_t+1 = S_j | q_t = S_i) = a_ij.
The N x M matrix of entries b_1(1) .. b_1(M) through b_N(1) .. b_N(M), the observation probabilities: P(O_t = k | q_t = S_i) = b_i(k).
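One convenient way to hold the 5-tuple λ = (N, M, π, A, B) in code is a small container like the following. This is purely illustrative; the class name HMM and the array layout are my own choices, not anything prescribed by the lecture.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class HMM:
    pi: np.ndarray   # shape (N,):   pi[i]   = P(q_0 = S_{i+1})
    A:  np.ndarray   # shape (N, N): A[i, j] = P(q_{t+1} = S_{j+1} | q_t = S_{i+1})
    B:  np.ndarray   # shape (N, M): B[i, k] = P(O_t = k | q_t = S_{i+1})

    @property
    def N(self):
        return self.A.shape[0]   # number of states

    @property
    def M(self):
        return self.B.shape[1]   # number of possible observations
```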

28 Here is an HMM (N = 3, M = 3)
[State diagram: S_1 emits X or Y, S_2 emits Y or Z, S_3 emits Z or X, with transition arcs labeled 1/3 and 2/3]
Start randomly in state 1 or 2. Choose one of the output symbols in each state at random.
π_1 = 1/2    π_2 = 1/2    π_3 = 0
a_11 = 0     a_12 = 1/3   a_13 = 2/3
a_21 = 1/3   a_22 = 0     a_23 = 2/3
a_31 = 1/3   a_32 = 1/3   a_33 = 1/3
b_1(X) = 1/2   b_1(Y) = 1/2   b_1(Z) = 0
b_2(X) = 0     b_2(Y) = 1/2   b_2(Z) = 1/2
b_3(X) = 1/2   b_3(Y) = 0     b_3(Z) = 1/2

29 Here is an HMM (N = 3, M = 3; parameters as on the previous slide). Start randomly in state 1 or 2. Choose one of the output symbols in each state at random. Let's generate a sequence of observations:
Choice between S_1 and S_2:  q_0 = ?  O_0 = ?  q_1 = ?  O_1 = ?  q_2 = ?  O_2 = ?

30 Choice between X and Y:  q_0 = S_1  O_0 = ?  q_1 = ?  O_1 = ?  q_2 = ?  O_2 = ?

31 Go to S_3 with probability 2/3, or S_2 with prob. 1/3:  q_0 = S_1  O_0 = X  q_1 = ?  O_1 = ?  q_2 = ?  O_2 = ?

32 Choice between Z and X:  q_0 = S_1  O_0 = X  q_1 = S_3  O_1 = ?  q_2 = ?  O_2 = ?

33 Each of the three next states is equally likely:  q_0 = S_1  O_0 = X  q_1 = S_3  O_1 = X  q_2 = ?  O_2 = ?

34 Choice between Z and X:  q_0 = S_1  O_0 = X  q_1 = S_3  O_1 = X  q_2 = S_3  O_2 = ?

35 The generated sequence:  q_0 = S_1  O_0 = X  q_1 = S_3  O_1 = X  q_2 = S_3  O_2 = Z
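The generation procedure walked through on slides 29-35 can be written compactly. The sketch below (my own code, not from the slides) samples a state path and an observation string from the example HMM; with a different random seed it may or may not reproduce the particular run S_1 S_3 S_3 emitting X X Z shown above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Parameters of the example HMM from slide 28.
pi = np.array([0.5, 0.5, 0.0])
A  = np.array([[0.0, 1/3, 2/3],
               [1/3, 0.0, 2/3],
               [1/3, 1/3, 1/3]])
B  = np.array([[0.5, 0.5, 0.0],    # S_1 emits X or Y
               [0.0, 0.5, 0.5],    # S_2 emits Y or Z
               [0.5, 0.0, 0.5]])   # S_3 emits Z or X
symbols = "XYZ"

def generate(pi, A, B, T):
    """Sample a state path q_0..q_{T-1} and observations O_0..O_{T-1}."""
    q = rng.choice(3, p=pi)
    states, obs = [], []
    for _ in range(T):
        states.append(int(q))
        obs.append(rng.choice(3, p=B[q]))
        q = rng.choice(3, p=A[q])
    return states, "".join(symbols[o] for o in obs)

print(generate(pi, A, B, T=3))    # e.g. ([0, 2, 2], 'XXZ') would match the run on the slides
```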

36 State Estimation (N = 3, M = 3; same HMM as above)
Start randomly in state 1 or 2. Choose one of the output symbols in each state at random. We generated a sequence of observations. This is what the observer has to work with:
q_0 = ?  O_0 = X  q_1 = ?  O_1 = X  q_2 = ?  O_2 = Z

37 Prob. of a series of observations
What is P(O) = P(O_1 O_2 O_3) = P(O_1 = X ∧ O_2 = X ∧ O_3 = Z)?
Slow, stupid way:
P(O) = Σ_{Q ∈ paths of length 3} P(O ∧ Q) = Σ_{Q ∈ paths of length 3} P(O | Q) P(Q)
How do we compute P(Q) for an arbitrary path Q?
How do we compute P(O | Q) for an arbitrary path Q?
[HMM state diagram as before]

38 Prob. of a series of observations
How do we compute P(Q) for an arbitrary path Q?
P(Q) = P(q_1, q_2, q_3)
     = P(q_1) P(q_2, q_3 | q_1)                 (chain rule)
     = P(q_1) P(q_2 | q_1) P(q_3 | q_2, q_1)    (chain)
     = P(q_1) P(q_2 | q_1) P(q_3 | q_2)         (why?)
Example in the case Q = S_1 S_3 S_3:
     = 1/2 * 2/3 * 1/3 = 1/9

39 Prob. of a series of observations
How do we compute P(O | Q) for an arbitrary path Q?
P(O | Q) = P(O_1 O_2 O_3 | q_1 q_2 q_3)
         = P(O_1 | q_1) P(O_2 | q_2) P(O_3 | q_3)    (why?)
Example in the case Q = S_1 S_3 S_3:
         = P(X | S_1) P(X | S_3) P(Z | S_3) = 1/2 * 1/2 * 1/2 = 1/8

40 Prob. of a series of observations
P(O) would need 27 P(Q) computations and 27 P(O | Q) computations.
A sequence of 20 observations would need 3^20 = 3.5 billion P(Q) computations and 3.5 billion P(O | Q) computations.
So let's be smarter...
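For completeness, the "slow, stupid way" is easy to spell out: enumerate all 3^3 = 27 paths and sum P(O | Q) P(Q). A small sketch of that enumeration (my own code), which should print 1/36 ≈ 0.0278 for the X X Z example:

```python
from itertools import product
import numpy as np

# Example HMM parameters (slide 28) and the observed sequence X X Z.
pi = np.array([0.5, 0.5, 0.0])
A  = np.array([[0.0, 1/3, 2/3], [1/3, 0.0, 2/3], [1/3, 1/3, 1/3]])
B  = np.array([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]])
obs = [0, 0, 2]                                  # X = 0, Y = 1, Z = 2

total = 0.0
for path in product(range(3), repeat=len(obs)):  # all 3^3 = 27 paths Q
    p_q   = pi[path[0]] * np.prod([A[path[t], path[t + 1]] for t in range(len(obs) - 1)])
    p_o_q = np.prod([B[path[t], obs[t]] for t in range(len(obs))])
    total += p_q * p_o_q                         # P(O | Q) P(Q)
print(total)                                     # P(O_1 O_2 O_3 = X X Z)
```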

41 The Prob. of a given series of observations, non-exponential-cost-style
Given observations O_1 O_2 ... O_T, define
α_t(i) = P(O_1 O_2 ... O_t ∧ q_t = S_i | λ), where 1 ≤ t ≤ T.
α_t(i) = probability that, in a random trial, we'd have seen the first t observations, and we'd have ended up in S_i as the t'th state visited.
In our example, what is α_2(3)?

42 α_t(i): easy to define recursively
α_t(i) = P(O_1 O_2 ... O_t ∧ q_t = S_i | λ)
α_1(i) = P(O_1 ∧ q_1 = S_i) = P(q_1 = S_i) P(O_1 | q_1 = S_i) = π_i b_i(O_1)
α_t+1(j) = P(O_1 O_2 ... O_t O_t+1 ∧ q_t+1 = S_j) = ?

43 α_t(i): easy to define recursively
α_t(i) = P(O_1 O_2 ... O_t ∧ q_t = S_i | λ)
α_1(i) = P(O_1 ∧ q_1 = S_i) = P(q_1 = S_i) P(O_1 | q_1 = S_i) = π_i b_i(O_1)
α_t+1(j) = P(O_1 O_2 ... O_t O_t+1 ∧ q_t+1 = S_j)
         = Σ_{i=1}^{N} P(O_1 O_2 ... O_t ∧ q_t = S_i ∧ O_t+1 ∧ q_t+1 = S_j)
         = Σ_{i=1}^{N} P(O_t+1, q_t+1 = S_j | O_1 O_2 ... O_t ∧ q_t = S_i) P(O_1 O_2 ... O_t ∧ q_t = S_i)
         = Σ_{i=1}^{N} P(O_t+1, q_t+1 = S_j | q_t = S_i) α_t(i)
         = Σ_{i=1}^{N} P(q_t+1 = S_j | q_t = S_i) P(O_t+1 | q_t+1 = S_j) α_t(i)
         = Σ_{i=1}^{N} a_ij b_j(O_t+1) α_t(i)

44 α_t(i): easy to define recursively (the same derivation, repeated with the conditional-independence steps highlighted)

45 α_t(i): easy to define recursively (the same derivation, repeated again)

46 In our example
α_t(i) = P(O_1 O_2 .. O_t ∧ q_t = S_i | λ)
α_1(i) = π_i b_i(O_1)
α_t+1(j) = Σ_i a_ij b_j(O_t+1) α_t(i)
[HMM state diagram as before]
WE SAW O_1 O_2 O_3 = X X Z
α_1(1) = 1/4   α_1(2) = 0      α_1(3) = 0
α_2(1) = 0     α_2(2) = 0      α_2(3) = 1/12
α_3(1) = 0     α_3(2) = 1/72   α_3(3) = 1/72
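The same numbers fall out of a few lines of code. This is a sketch of the forward recursion (my own implementation, not the lecture's): each row of alpha is one value of t, the final row sums to P(O_1 O_2 O_3), and normalizing that row gives P(q_3 = S_i | O_1 O_2 O_3), which answers the "Easy Question" on the next slide.

```python
import numpy as np

# Example HMM parameters (slide 28).
pi = np.array([0.5, 0.5, 0.0])
A  = np.array([[0.0, 1/3, 2/3], [1/3, 0.0, 2/3], [1/3, 1/3, 1/3]])
B  = np.array([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]])

def forward(pi, A, B, obs):
    """alpha[t-1, i] = P(O_1 .. O_t and q_t = S_{i+1}), for t = 1..T."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # alpha_1(i) = pi_i b_i(O_1)
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]  # sum_i a_ij alpha_t(i), then emit
    return alpha

alpha = forward(pi, A, B, obs=[0, 0, 2])              # X X Z
print(alpha)                          # rows: [1/4, 0, 0], [0, 0, 1/12], [0, 1/72, 1/72]
print(alpha[-1].sum())                # P(O_1 O_2 O_3) = 1/36
print(alpha[-1] / alpha[-1].sum())    # P(q_3 = S_i | O_1 O_2 O_3) = [0, 1/2, 1/2]
```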

47 Easy Question
We can cheaply compute α_t(i) = P(O_1 O_2 ... O_t ∧ q_t = S_i).
(How) can we cheaply compute P(O_1 O_2 ... O_t)?
(How) can we cheaply compute P(q_t = S_i | O_1 O_2 ... O_t)?

48 Easy Question
We can cheaply compute α_t(i) = P(O_1 O_2 ... O_t ∧ q_t = S_i).
(How) can we cheaply compute P(O_1 O_2 ... O_t)?  Answer: Σ_{i=1}^{N} α_t(i).
(How) can we cheaply compute P(q_t = S_i | O_1 O_2 ... O_t)?  Answer: α_t(i) / Σ_{j=1}^{N} α_t(j).

49 Most probable path given observations
What is the most probable path given O_1 O_2 ... O_T, i.e. what is argmax_Q P(Q | O_1 O_2 ... O_T)?
Slow, stupid answer:
argmax_Q P(Q | O_1 O_2 ... O_T)
  = argmax_Q P(O_1 O_2 ... O_T | Q) P(Q) / P(O_1 O_2 ... O_T)
  = argmax_Q P(O_1 O_2 ... O_T | Q) P(Q)

50 Efficient MPP computation
We're going to compute the following variables:
δ_t(i) = max_{q_1 q_2 .. q_t-1} P(q_1 q_2 .. q_t-1 ∧ q_t = S_i ∧ O_1 .. O_t)
       = the probability of the path of length t-1 with the maximum chance of doing all these things: OCCURRING, and ENDING UP IN STATE S_i, and PRODUCING OUTPUT O_1 ... O_t.
DEFINE: mpp_t(i) = that path.
So: δ_t(i) = Prob(mpp_t(i)).

51 The Viterbi Algorithm
δ_t(i) = max_{q_1 q_2 .. q_t-1} P(q_1 q_2 .. q_t-1 ∧ q_t = S_i ∧ O_1 .. O_t)
mpp_t(i) = argmax_{q_1 q_2 .. q_t-1} P(q_1 q_2 .. q_t-1 ∧ q_t = S_i ∧ O_1 .. O_t)
δ_1(i) = P(q_1 = S_i ∧ O_1) = P(q_1 = S_i) P(O_1 | q_1 = S_i) = π_i b_i(O_1)
Now, suppose we have all the δ_t(i)'s and mpp_t(i)'s for all i. HOW TO GET δ_t+1(j) and mpp_t+1(j)?
[Diagram: paths mpp_t(1) with prob δ_t(1), mpp_t(2) with prob δ_t(2), ..., mpp_t(N) with prob δ_t(N), ending in states S_1 .. S_N at time t; which one extends to S_j at time t+1?]

52 The Viterbi Algorithm
[Diagram: states S_1 .. S_i .. S_N at time t, and S_j at time t+1]
The most prob path with last two states S_i S_j is the most prob path to S_i, followed by the transition S_i → S_j.

53 The Viterbi Algorithm
The most prob path with last two states S_i S_j is the most prob path to S_i, followed by the transition S_i → S_j.
What is the prob of that path?
δ_t(i) x P(S_i → S_j ∧ O_t+1 | λ) = δ_t(i) a_ij b_j(O_t+1)
SO the most probable path to S_j has S_i* as its penultimate state, where i* = argmax_i δ_t(i) a_ij b_j(O_t+1).

54 The Viterbi Algorithm
The most prob path with last two states S_i S_j is the most prob path to S_i, followed by the transition S_i → S_j.
What is the prob of that path? δ_t(i) x P(S_i → S_j ∧ O_t+1 | λ) = δ_t(i) a_ij b_j(O_t+1)
SO the most probable path to S_j has S_i* as its penultimate state, where i* = argmax_i δ_t(i) a_ij b_j(O_t+1).
Summary:
δ_t+1(j) = δ_t(i*) a_i*j b_j(O_t+1)
mpp_t+1(j) = mpp_t(i*) S_i*
with i* defined as above.
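Put together, the recursion and the backtrace look roughly like this in Python (a sketch of the algorithm as summarized above, with my own variable names). Note that for the X X Z example two paths tie at probability 1/72 (S_1 S_3 S_2 and S_1 S_3 S_3), so the argmax tie-break determines which one is returned.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most probable state path given observations, plus its probability."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))             # delta[t, j]: prob of best path ending in S_{j+1}
    back  = np.zeros((T, N), dtype=int)  # back[t, j]: best penultimate state i*
    delta[0] = pi * B[:, obs[0]]         # delta_1(i) = pi_i b_i(O_1)
    for t in range(1, T):
        scores   = delta[t - 1][:, None] * A          # scores[i, j] = delta_t(i) a_ij
        back[t]  = scores.argmax(axis=0)              # i* for each j
        delta[t] = scores.max(axis=0) * B[:, obs[t]]  # times b_j(O_{t+1})
    path = [int(delta[-1].argmax())]                  # best final state
    for t in range(T - 1, 0, -1):                     # trace back through the i* pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1], delta[-1].max()

pi = np.array([0.5, 0.5, 0.0])
A  = np.array([[0.0, 1/3, 2/3], [1/3, 0.0, 2/3], [1/3, 1/3, 1/3]])
B  = np.array([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]])
print(viterbi(pi, A, B, obs=[0, 0, 2]))   # X X Z: one of the tied MPPs, with prob 1/72
```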

55 What is Viterbi used for?
Classic Example: Speech recognition. Signal → words.
HMM: the observable is the signal; the hidden state is part of word formation.
What is the most probable word given this signal?
UTTERLY GROSS SIMPLIFICATION. In practice: many levels of inference; not one big jump.

56 HMMs are used and useful
But how do you design an HMM?
Occasionally (e.g. in our robot example) it is reasonable to deduce the HMM from first principles.
But usually, especially in Speech or Genetics, it is better to infer it from large amounts of data: O_1 O_2 .. O_T, with a big T.
Observations previously in the lecture: O_1 O_2 .. O_T. Observations in the next bit: O_1 O_2 .. O_T.

57 Inferring an HMM
Remember, we've been doing things like P(O_1 O_2 .. O_T | λ). That λ is the notation for our HMM parameters.
Now we have some observations and we want to estimate λ from them.
AS USUAL, we could use
(i) MAX LIKELIHOOD: λ = argmax_λ P(O_1 .. O_T | λ)
(ii) BAYES: work out P(λ | O_1 .. O_T) and then take E[λ] or max_λ P(λ | O_1 .. O_T)

58 Max likelihood HMM estimation
Define
γ_t(i) = P(q_t = S_i | O_1 O_2 ... O_T, λ)
ε_t(i,j) = P(q_t = S_i ∧ q_t+1 = S_j | O_1 O_2 ... O_T, λ)
γ_t(i) and ε_t(i,j) can be computed efficiently for all i, j, t. (Details in the Rabiner paper.)
Σ_{t=1}^{T-1} γ_t(i) = expected number of transitions out of state i during the path
Σ_{t=1}^{T-1} ε_t(i,j) = expected number of transitions from state i to state j during the path

59 HMM estimation
γ_t(i) = P(q_t = S_i | O_1 O_2 .. O_T, λ)
ε_t(i,j) = P(q_t = S_i ∧ q_t+1 = S_j | O_1 O_2 .. O_T, λ)
Σ_t γ_t(i) = expected number of transitions out of state i during the path
Σ_t ε_t(i,j) = expected number of transitions out of state i and into state j during the path
Notice that
Σ_{t=1}^{T-1} ε_t(i,j) / Σ_{t=1}^{T-1} γ_t(i) = (expected frequency of i → j) / (expected frequency out of i) = estimate of Prob(Next state S_j | This state S_i).
We can re-estimate a_ij = Σ_t ε_t(i,j) / Σ_t γ_t(i).
We can also re-estimate b_i(O_k)... (See Rabiner.)

60 EM for HMMs
If we knew λ we could estimate EXPECTATIONS of quantities such as
expected number of times in state i
expected number of transitions i → j
If we knew the quantities such as
expected number of times in state i
expected number of transitions i → j
we could compute the MAX LIKELIHOOD estimate of λ = ({a_ij}, {b_i(j)}, {π_i}).
Roll on the EM Algorithm...

61 EM 4 HMMs
1. Get your observations O_1 ... O_T.
2. Guess your first λ estimate λ(0), k = 0.
3. k = k + 1.
4. Given O_1 ... O_T and λ(k), compute γ_t(i), ε_t(i,j) for all 1 ≤ t ≤ T, 1 ≤ i ≤ N, 1 ≤ j ≤ N.
5. Compute the expected freq. of state i, and the expected freq. i → j.
6. Compute new estimates of a_ij, b_j(k), π_i accordingly. Call them λ(k+1).
7. Goto 3, unless converged.
Also known (for the HMM case) as the BAUM-WELCH algorithm.
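A minimal sketch of steps 1-7 for a single observation sequence follows. The backward variable beta and the pairwise posterior (the slides' ε_t(i,j), Rabiner's ξ) are computed by the standard forward-backward recursions from the Rabiner tutorial referenced earlier rather than derived on these slides, and the sketch omits the scaling needed for long sequences, smoothing, and a convergence test; all names here are my own.

```python
import numpy as np

def forward_backward(pi, A, B, obs):
    """Forward (alpha) and backward (beta) passes for one observation sequence."""
    T, N = len(obs), len(pi)
    alpha, beta = np.zeros((T, N)), np.ones((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return alpha, beta

def baum_welch(obs, N, M, iters=20, seed=0):
    """Steps 1-7 of 'EM 4 HMMs' for a single sequence; no scaling or convergence test."""
    rng = np.random.default_rng(seed)
    pi = rng.dirichlet(np.ones(N))                 # step 2: guess lambda(0)
    A  = rng.dirichlet(np.ones(N), size=N)
    B  = rng.dirichlet(np.ones(M), size=N)
    for _ in range(iters):                         # steps 3-7
        alpha, beta = forward_backward(pi, A, B, obs)   # step 4
        p_obs = alpha[-1].sum()
        gamma = alpha * beta / p_obs               # gamma[t, i] = gamma_t(i) from slide 58
        eps = (alpha[:-1, :, None] * A[None, :, :] *
               (B[:, obs[1:]].T * beta[1:])[:, None, :]) / p_obs   # eps[t, i, j] = eps_t(i, j)
        pi = gamma[0]                              # steps 5-6: new estimates lambda(k+1)
        A  = eps.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for k in range(M):
            B[:, k] = gamma[obs == k].sum(axis=0) / gamma.sum(axis=0)
    return pi, A, B

obs = np.array([0, 1, 2, 0, 2, 1, 0, 0, 2, 1])     # a toy symbol sequence over {X, Y, Z}
print(baum_welch(obs, N=3, M=3))
```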

62 Bad News: there are lots of local minima.
Good News: the local minima are usually adequate models of the data.
Notice: EM does not estimate the number of states. That must be given.
Often, HMMs are forced to have some links with zero probability. This is done by setting a_ij = 0 in the initial estimate λ(0).
Easy extension of everything seen today: HMMs with real-valued outputs.
