Lecture Slides for INTRODUCTION TO Machine Learning, ETHEM ALPAYDIN, The MIT Press


Lecture Slides for INTRODUCTION TO Machine Learning
ETHEM ALPAYDIN, The MIT Press, 2004
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml

CHAPTER 13: Hidden Markov Models

Introduction
Modeling dependencies in input; no longer iid
Sequences:
Temporal: In speech; phonemes in a word (dictionary), words in a sentence (syntax, semantics of the language). In handwriting, pen movements
Spatial: In a DNA sequence; base pairs

Discrete Markov Process
N states: $S_1, S_2, \ldots, S_N$
State at time $t$: $q_t = S_i$
First-order Markov: $P(q_{t+1} = S_j \mid q_t = S_i, q_{t-1} = S_k, \ldots) = P(q_{t+1} = S_j \mid q_t = S_i)$
Transition probabilities: $a_{ij} \equiv P(q_{t+1} = S_j \mid q_t = S_i)$ with $a_{ij} \ge 0$ and $\sum_{j=1}^{N} a_{ij} = 1$
Initial probabilities: $\pi_i \equiv P(q_1 = S_i)$ with $\sum_{i=1}^{N} \pi_i = 1$

Stochastic Automaton
$P(O = Q \mid A, \Pi) = P(q_1) \prod_{t=2}^{T} P(q_t \mid q_{t-1}) = \pi_{q_1} a_{q_1 q_2} \cdots a_{q_{T-1} q_T}$

Example: Balls and Urns
Three urns, each full of balls of one color: $S_1$: red, $S_2$: blue, $S_3$: green
$\Pi = [0.5, 0.2, 0.3]^T$, $A = \begin{bmatrix} 0.4 & 0.3 & 0.3 \\ 0.2 & 0.6 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{bmatrix}$
$O = \{S_1, S_1, S_3, S_3\}$
$P(O \mid A, \Pi) = P(S_1) \cdot P(S_1 \mid S_1) \cdot P(S_3 \mid S_1) \cdot P(S_3 \mid S_3) = \pi_1 \cdot a_{11} \cdot a_{13} \cdot a_{33} = 0.5 \cdot 0.4 \cdot 0.3 \cdot 0.8 = 0.048$
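A quick numeric check of this computation (a minimal Python sketch, not from the slides; the names `Pi`, `A`, `O` and the 0-indexed state encoding are my own):

```python
import numpy as np

Pi = np.array([0.5, 0.2, 0.3])            # initial probabilities pi_i
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])            # transition probabilities a_ij

O = [0, 0, 2, 2]                           # the sequence S1, S1, S3, S3

# P(O | A, Pi) = pi_{q1} * prod_t a_{q_{t-1} q_t}
p = Pi[O[0]]
for prev, cur in zip(O, O[1:]):
    p *= A[prev, cur]
print(p)                                   # 0.048
```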

Balls and Urns: Learning
Given $K$ example sequences of length $T$:
$\hat{\pi}_i = \dfrac{\#\{\text{sequences starting with } S_i\}}{\#\{\text{sequences}\}}$
$\hat{a}_{ij} = \dfrac{\#\{\text{transitions from } S_i \text{ to } S_j\}}{\#\{\text{transitions from } S_i\}} = \dfrac{\sum_k \sum_{t=1}^{T-1} 1(q_t^k = S_i \text{ and } q_{t+1}^k = S_j)}{\sum_k \sum_{t=1}^{T-1} 1(q_t^k = S_i)}$
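These counting estimators are easy to implement; a sketch assuming sequences are given as lists of 0-indexed state labels (the function name `estimate_markov` is mine):

```python
import numpy as np

def estimate_markov(sequences, N):
    # ML estimates of Pi and A from fully observed state sequences;
    # assumes every state occurs at least once as a transition source.
    pi_counts = np.zeros(N)
    trans_counts = np.zeros((N, N))
    for seq in sequences:
        pi_counts[seq[0]] += 1              # sequence starts with S_{seq[0]}
        for i, j in zip(seq, seq[1:]):
            trans_counts[i, j] += 1         # one transition S_i -> S_j
    Pi = pi_counts / len(sequences)
    A = trans_counts / trans_counts.sum(axis=1, keepdims=True)
    return Pi, A
```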

Hidden Markov Models
States are not observable
Discrete observations $\{v_1, v_2, \ldots, v_M\}$ are recorded; a probabilistic function of the state
Emission probabilities: $b_j(m) \equiv P(O_t = v_m \mid q_t = S_j)$
Example: In each urn, there are balls of different colors, but with different probabilities.
For each observation sequence, there are multiple state sequences

HMM Unfolded in Time (figure)

Elements of an HMM
N: number of states
M: number of observation symbols
$A = [a_{ij}]$: $N \times N$ state transition probability matrix
$B = [b_j(m)]$: $N \times M$ observation probability matrix
$\Pi = [\pi_i]$: $N \times 1$ initial state probability vector
$\lambda = (A, B, \Pi)$: the parameter set of the HMM
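To make the later sketches concrete, one possible in-memory representation of $\lambda = (A, B, \Pi)$ (a hypothetical container of my own, not from the slides):

```python
from typing import NamedTuple
import numpy as np

class HMM(NamedTuple):
    A: np.ndarray    # (N, N) state transition probabilities a_ij
    B: np.ndarray    # (N, M) observation probabilities b_j(m)
    Pi: np.ndarray   # (N,)  initial state probabilities pi_i
```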

Three Basic Problems of HMMs
1. Evaluation: Given $\lambda$ and $O$, calculate $P(O \mid \lambda)$
2. State sequence: Given $\lambda$ and $O$, find $Q^*$ such that $P(Q^* \mid O, \lambda) = \max_Q P(Q \mid O, \lambda)$
3. Learning: Given $X = \{O^k\}_k$, find $\lambda^*$ such that $P(X \mid \lambda^*) = \max_\lambda P(X \mid \lambda)$
(Rabiner, 1989)

Evaluation
Forward variable: $\alpha_t(i) \equiv P(O_1 \cdots O_t, q_t = S_i \mid \lambda)$
Initialization: $\alpha_1(i) = \pi_i b_i(O_1)$
Recursion: $\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i) a_{ij} \right] b_j(O_{t+1})$
$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$
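A direct vectorized transcription of the forward recursion (a sketch; the observation sequence `O` is assumed to hold 0-indexed symbol indices):

```python
import numpy as np

def forward(Pi, A, B, O):
    # alpha[t, i] = P(O_1..O_t, q_t = S_i | lambda)
    T, N = len(O), len(Pi)
    alpha = np.zeros((T, N))
    alpha[0] = Pi * B[:, O[0]]                    # initialization
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]  # recursion
    return alpha                                   # P(O | lambda) = alpha[-1].sum()
```

On long sequences these probabilities underflow; practical implementations scale each $\alpha_t$ or work in log space.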

Backward variable: $\beta_t(i) \equiv P(O_{t+1} \cdots O_T \mid q_t = S_i, \lambda)$
Initialization: $\beta_T(i) = 1$
Recursion: $\beta_t(i) = \sum_{j=1}^{N} a_{ij} b_j(O_{t+1}) \beta_{t+1}(j)$
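And the matching backward pass (continuing the sketch above, so numpy is already imported):

```python
def backward(A, B, O):
    # beta[t, i] = P(O_{t+1}..O_T | q_t = S_i, lambda)
    T, N = len(O), A.shape[0]
    beta = np.ones((T, N))                         # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t+1]] * beta[t+1])   # recursion
    return beta
```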

Finding the State Sequence
$\gamma_t(i) \equiv P(q_t = S_i \mid O, \lambda) = \dfrac{\alpha_t(i) \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j) \beta_t(j)}$
Choose the state that has the highest probability, for each time step: $q_t^* = \arg\max_i \gamma_t(i)$
No! Picking the individually most likely state at each step can yield an infeasible sequence, e.g., one joined by a transition with $a_{ij} = 0$.
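Given the `forward` and `backward` sketches above, the posteriors $\gamma_t(i)$ are one normalization away (sketch):

```python
def posteriors(alpha, beta):
    # gamma[t, i] = P(q_t = S_i | O, lambda)
    g = alpha * beta
    return g / g.sum(axis=1, keepdims=True)
```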

Viterbi's Algorithm
$\delta_t(i) \equiv \max_{q_1 q_2 \cdots q_{t-1}} p(q_1 q_2 \cdots q_{t-1}, q_t = S_i, O_1 \cdots O_t \mid \lambda)$
Initialization: $\delta_1(i) = \pi_i b_i(O_1)$, $\psi_1(i) = 0$
Recursion: $\delta_t(j) = \max_i \delta_{t-1}(i) a_{ij} b_j(O_t)$, $\psi_t(j) = \arg\max_i \delta_{t-1}(i) a_{ij}$
Termination: $p^* = \max_i \delta_T(i)$, $q_T^* = \arg\max_i \delta_T(i)$
Path backtracking: $q_t^* = \psi_{t+1}(q_{t+1}^*)$, $t = T-1, T-2, \ldots, 1$
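A sketch of the algorithm in the same setup as the forward pass above (probability space for readability; a log-space version is the usual choice in practice to avoid underflow):

```python
import numpy as np

def viterbi(Pi, A, B, O):
    # Most likely state sequence Q* = argmax_Q P(Q | O, lambda)
    T, N = len(O), len(Pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = Pi * B[:, O[0]]                      # initialization
    for t in range(1, T):
        scores = delta[t-1][:, None] * A            # scores[i, j] = delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, O[t]]  # recursion
    q = np.zeros(T, dtype=int)
    q[-1] = delta[-1].argmax()                      # termination
    for t in range(T - 2, -1, -1):                  # path backtracking
        q[t] = psi[t + 1][q[t + 1]]
    return q, delta[-1].max()
```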

Learning
$\xi_t(i, j) \equiv P(q_t = S_i, q_{t+1} = S_j \mid O, \lambda)$
$\xi_t(i, j) = \dfrac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{\sum_k \sum_l \alpha_t(k)\, a_{kl}\, b_l(O_{t+1})\, \beta_{t+1}(l)}$
Baum-Welch algorithm (EM):
$z_t^i = 1$ if $q_t = S_i$, 0 otherwise
$z_t^{ij} = 1$ if $q_t = S_i$ and $q_{t+1} = S_j$, 0 otherwise

Baum-Welch (EM)
E-step: $E[z_t^i] = \gamma_t(i)$, $E[z_t^{ij}] = \xi_t(i, j)$
M-step:
$\hat{\pi}_i = \dfrac{\sum_{k=1}^{K} \gamma_1^k(i)}{K}$
$\hat{a}_{ij} = \dfrac{\sum_k \sum_{t=1}^{T-1} \xi_t^k(i, j)}{\sum_k \sum_{t=1}^{T-1} \gamma_t^k(i)}$
$\hat{b}_j(m) = \dfrac{\sum_k \sum_{t=1}^{T} \gamma_t^k(j)\, 1(O_t^k = v_m)}{\sum_k \sum_{t=1}^{T} \gamma_t^k(j)}$
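Putting the E- and M-steps together for a single sequence (a sketch reusing the `forward` and `backward` functions above; with K sequences the numerators and denominators are summed over k before dividing):

```python
def baum_welch_step(Pi, A, B, O):
    O = np.asarray(O)
    alpha, beta = forward(Pi, A, B, O), backward(A, B, O)
    # E-step: gamma[t, i] and xi[t, i, j]
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = (alpha[:-1, :, None] * A[None, :, :] *
          (B[:, O[1:]].T * beta[1:])[:, None, :])   # alpha_t(i) a_ij b_j(O_{t+1}) beta_{t+1}(j)
    xi /= xi.sum(axis=(1, 2), keepdims=True)
    # M-step
    Pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.stack([gamma[O == m].sum(axis=0) for m in range(B.shape[1])],
                     axis=1) / gamma.sum(axis=0)[:, None]
    return Pi_new, A_new, B_new
```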

Continuous Observations
Discrete: $P(O_t \mid q_t = S_j, \lambda) = \prod_{m=1}^{M} b_j(m)^{r_t^m}$ where $r_t^m = 1$ if $O_t = v_m$ and 0 otherwise
Gaussian mixture (discretize using k-means): $P(O_t \mid q_t = S_j, \lambda) = \sum_{l=1}^{L} P(G_l \mid q_t = S_j)\, p(O_t \mid G_l, q_t = S_j, \lambda)$ with components $\sim \mathcal{N}(\mu_l, \Sigma_l)$
Continuous: $P(O_t \mid q_t = S_j, \lambda) \sim \mathcal{N}(\mu_j, \sigma_j^2)$
Use EM to learn the parameters, e.g., $\hat{\mu}_j = \dfrac{\sum_t \gamma_t(j) O_t}{\sum_t \gamma_t(j)}$
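For the scalar-Gaussian case, the emission term and the EM mean update look like this (a sketch under my own naming, continuing from the blocks above; the mixture and k-means variants are omitted):

```python
def gaussian_emission(O, mu, sigma):
    # b_j(O_t) = N(O_t; mu_j, sigma_j^2), returned as a (T, N) matrix
    O = np.asarray(O, dtype=float)[:, None]
    return np.exp(-0.5 * ((O - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def update_means(gamma, O):
    # mu_hat_j = sum_t gamma_t(j) O_t / sum_t gamma_t(j)
    O = np.asarray(O, dtype=float)
    return (gamma * O[:, None]).sum(axis=0) / gamma.sum(axis=0)
```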

HMM with Input
Input-dependent observations: $P(O_t \mid q_t = S_j, x_t, \lambda) \sim \mathcal{N}(g_j(x_t \mid \theta_j), \sigma_j^2)$
Input-dependent transitions (Meila and Jordan, 1996; Bengio and Frasconi, 1996): $P(q_{t+1} = S_j \mid q_t = S_i, x_t)$
Time-delay input: $x_t = f(O_{t-\tau}, \ldots, O_{t-1})$

Model Selection in HMM
Left-to-right HMMs:
$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & 0 \\ 0 & a_{22} & a_{23} & a_{24} \\ 0 & 0 & a_{33} & a_{34} \\ 0 & 0 & 0 & a_{44} \end{bmatrix}$
In classification, for each class $C_i$, estimate $P(O \mid \lambda_i)$ with a separate HMM and use Bayes' rule:
$P(\lambda_i \mid O) = \dfrac{P(O \mid \lambda_i)\, P(\lambda_i)}{\sum_j P(O \mid \lambda_j)\, P(\lambda_j)}$
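A sketch of this classification rule, reusing the `forward` function above (the `(Pi, A, B)` triples and the `priors` argument are my own framing):

```python
def classify(O, models, priors):
    # models: one (Pi, A, B) triple per class; priors: P(lambda_i)
    likelihoods = np.array([forward(Pi, A, B, O)[-1].sum()   # P(O | lambda_i)
                            for Pi, A, B in models])
    post = likelihoods * np.asarray(priors)
    return post.argmax(), post / post.sum()                  # class index, P(lambda_i | O)
```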