Hidden Markov Model Cheat Sheet (GIT ID: dc2f391536d67ed5847290d5250d4baae103487e)

This document is a cheat sheet on Hidden Markov Models (HMMs). It resembles lecture notes, except that it cuts to the chase a little faster by defining terms and divulging the useful formulas as quickly as possible, in the place of gentle explanations and intuitions.

1 Notation

HMM: states are not observable; observations are a probabilistic function of the state; state transitions are probabilistic.

N: number of hidden states, numbered $1, \ldots, N$
M: number of output symbols, numbered $1, \ldots, M$
T: number of time steps in the sequence of states and the sequence of output symbols
$\vec{q}$: sequence of states traversed, $\vec{q} = (q_1, \ldots, q_t, \ldots, q_T)$ where each $q_t \in \{1, \ldots, N\}$
$\vec{o}$: observed output symbol sequence, $\vec{o} = (o_1, \ldots, o_t, \ldots, o_T)$ where each $o_t \in \{1, \ldots, M\}$
A: state transition matrix, $a_{ij} = P(q_{t+1} = j \mid q_t = i)$
B: per-state observation distributions, $b_i(k) = P(o_t = k \mid q_t = i)$
π: initial state distribution, $\pi_i = P(q_1 = i)$
λ: all numeric parameters defining the HMM considered together, $\lambda = (A, B, \pi)$
indices: i, j index states; k indexes output symbols; t indexes time

We proceed to review the solutions to the three big HMM problems: finding $P(\vec{o} \mid \lambda)$, finding $\vec{q}^* = \operatorname{argmax}_{\vec{q}} P(\vec{q} \mid \vec{o}, \lambda)$, and finding $\lambda^* = \operatorname{argmax}_{\lambda} P(\vec{o} \mid \lambda)$.

2 Probability of sequence of observations

We wish to calculate $P(\vec{o} \mid \lambda)$.
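As a reference point for the notation, here is a minimal sketch that stores λ as NumPy arrays and computes the observation probability by brute force, summing the joint probability of the observations and the states over every possible state sequence. Several things here are assumptions for illustration and not part of the notes: the function name `brute_force_likelihood`, the toy 2-state model, and the use of zero-based state and symbol indices. The cost grows as N^T, so this is only usable as a correctness check on tiny cases.

```python
import itertools
import numpy as np

def brute_force_likelihood(A, B, pi, obs):
    """Sum P(o, q | lambda) over all N**T state sequences q (tiny cases only)."""
    N = len(pi)
    total = 0.0
    for q in itertools.product(range(N), repeat=len(obs)):
        # Joint probability of this state path and the observations.
        p = pi[q[0]] * B[q[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[q[t - 1], q[t]] * B[q[t], obs[t]]
        total += p
    return total

# Hypothetical 2-state, 2-symbol model (zero-based indices), for illustration.
A = np.array([[0.7, 0.3], [0.4, 0.6]])   # A[i, j] = a_ij
B = np.array([[0.9, 0.1], [0.2, 0.8]])   # B[i, k] = b_i(k)
pi = np.array([0.5, 0.5])                # pi[i]   = pi_i
print(brute_force_likelihood(A, B, pi, [0, 1]))
```

Because the probabilities of all possible observation sequences of a fixed length must sum to 1, summing this function over every length-T symbol sequence gives a quick sanity check on A, B, and π.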
Definition: $\alpha_t(i) = P(o_1, \ldots, o_t, q_t = i \mid \lambda)$. (In words: the probability of observing the head of length t of the observations and being in state i after that.)

Initialization: $\alpha_1(i) = \pi_i \, b_i(o_1)$.

Loop: $$\alpha_{t+1}(j) = \left( \sum_{i=1}^{N} \alpha_t(i) \, a_{ij} \right) b_j(o_{t+1})$$

At termination, $$P(\vec{o} \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i).$$

Note: complexity is $O(N^2 T)$ time, $O(NT)$ space.

Note: calculating the α values is called the forward algorithm.

3 Optimal state sequence from observations

Find $\vec{q}^* = \operatorname{argmax}_{\vec{q}} P(\vec{q} \mid \vec{o}, \lambda)$, the most likely sequence of hidden states given the observations.

Note: calculating the most likely sequence of states is called a Viterbi alignment.

Definition: $\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_t = i, \lambda)$. (In words: the probability of generating the remaining tail of the observations, starting in state i at time t.)

Initialization: $\beta_T(i) = 1$.

Loop: $$\beta_t(i) = \sum_{j=1}^{N} a_{ij} \, b_j(o_{t+1}) \, \beta_{t+1}(j).$$ Calculated backwards: $t = T-1, T-2, \ldots, 1$.

Note: calculating the β values is called the backward algorithm.

Define: $$\delta_t(i) = \max_{q_1, \ldots, q_{t-1}} P(q_1, \ldots, q_{t-1}, q_t = i, o_1, \ldots, o_t \mid \lambda).$$ (In words: the probability of generating the head of length t of observables and having gone through the most likely states for the first t−1 steps and ending up in state i.)

Initialization: $\delta_1(i) = \pi_i \, b_i(o_1)$

Loop: $\delta_t(j) = \left( \max_i \delta_{t-1}(i) \, a_{ij} \right) b_j(o_t)$

Initialization: $\psi_1(i) = 0$

Loop: $\psi_t(j) = \operatorname{argmax}_i \delta_{t-1}(i) \, a_{ij}$

Termination: $P^* = \max_i \delta_T(i)$, the probability of generating the entire sequence of observables via the most probable sequence of states.
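Termination: $q_T^* = \operatorname{argmax}_i \delta_T(i)$, the most probable final state.

Loop to find state sequence ("backtracking"): $q_t^* = \psi_{t+1}(q_{t+1}^*)$, for $t = T-1, T-2, \ldots, 1$.

Note: ψ is written "psi" in English, and pronounced "sai".

3.1 Useful property of α and β

Note that
$$\alpha_t(i) \, \beta_t(i) = P(o_1, \ldots, o_t, q_t = i \mid \lambda) \, P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_t = i, \lambda) = P(o_1, \ldots, o_t, o_{t+1}, o_{t+2}, \ldots, o_T, q_t = i \mid \lambda) = P(\vec{o}, q_t = i \mid \lambda)$$
and therefore
$$\sum_{i=1}^{N} \alpha_t(i) \, \beta_t(i) = P(\vec{o} \mid \lambda).$$

This logic holds for any t, so the given sum should be the same for any t. (The earlier formula for $P(\vec{o} \mid \lambda)$ was for the special case $t = T$, since $\beta_T(i) = 1$.) This formula thus provides a useful debugging test for HMM programs.

4 Estimate model parameters

Given $\vec{o}$, find $\lambda^* = \operatorname{argmax}_{\lambda} P(\vec{o} \mid \lambda)$.

There is no analytic solution. Instead, we start with a guess of λ, typically random, then iterate λ to a local maximum, using an EM algorithm. At each step we reestimate a new λ, called $\hat{\lambda}$, which has an increased probability of generating $\vec{o}$. (Or, if already at a (possibly local) optimum, the same probability.)

Note: this process is called Baum-Welch Re-Estimation.

Typical stopping rule for this re-estimation loop is: stop when $\log P(\vec{o} \mid \hat{\lambda}) - \log P(\vec{o} \mid \lambda) < \epsilon$ for some small ε.

Note: debugging hint, $P(\vec{o} \mid \hat{\lambda}) \ge P(\vec{o} \mid \lambda)$ should always be true.

Definition: $\gamma_t(i) = P(q_t = i \mid \vec{o}, \lambda)$. (In words: the probability of having been in state i at time t.)

$$\gamma_t(i) = \frac{\alpha_t(i) \, \beta_t(i)}{P(\vec{o} \mid \lambda)}$$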
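The forward, backward, and Viterbi recursions, the α-β debugging test, and γ can be sketched in NumPy as follows. This is a minimal sketch, not a reference implementation. The function names and the toy model are hypothetical. States, symbols, and time are zero-based here, whereas the notes number them from 1. Probabilities are left unscaled, so long sequences would underflow; rescaling or working in the log domain is a common remedy that the notes do not cover.

```python
import numpy as np

def forward(A, B, pi, obs):
    """alpha[t, i] = P(o_1..o_t, q_t = i | lambda), with zero-based t and i."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(T - 1):
        alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]
    return alpha

def backward(A, B, obs):
    """beta[t, i] = P(o_{t+1}..o_T | q_t = i, lambda), filled from the end."""
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def viterbi(A, B, pi, obs):
    """Return (most probable state path, its joint probability P*)."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A   # scores[i, j] = delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]         # most probable final state
    for t in range(T - 1, 0, -1):            # backtracking through psi
        path.append(int(psi[t][path[-1]]))
    return path[::-1], float(delta[-1].max())

# Hypothetical 2-state, 2-symbol model, for illustration only.
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
obs = [0, 1, 1, 0]

alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
likelihood = alpha[-1].sum()
# Debugging test: sum_i alpha_t(i) beta_t(i) equals P(o | lambda) for every t.
assert np.allclose((alpha * beta).sum(axis=1), likelihood)
gamma = alpha * beta / likelihood            # gamma[t, i] = P(q_t = i | o, lambda)
path, p_star = viterbi(A, B, pi, obs)
print(likelihood, path, p_star)
```

The assertion is the debugging test from Section 3.1: if it fails at some t, the forward and backward passes disagree about the total probability of the observations.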
Definition: $\xi_t(i, j) = P(q_t = i, q_{t+1} = j \mid \vec{o}, \lambda)$. (In words: the probability of having transitioned from state i to j at time t.)

$$\xi_t(i, j) = \frac{\alpha_t(i) \, a_{ij} \, b_j(o_{t+1}) \, \beta_{t+1}(j)}{P(\vec{o} \mid \lambda)}$$

Note: $\sum_i \gamma_t(i) = 1$ and $\sum_i \sum_j \xi_t(i, j) = 1$.

Note: ξ is written "xi" in English, and pronounced "k sai".

We write # to abbreviate the phrase "expected number of times".

# state i visited: $\sum_{t=1}^{T} \gamma_t(i)$

# transitions from state i to state j: $\sum_{t=1}^{T-1} \xi_t(i, j)$

$$\hat{\pi}_i = \frac{\gamma_1(i)}{\sum_j \gamma_1(j)} = \gamma_1(i)$$

$$\hat{a}_{ij} = \frac{\#\ \text{transitions state } i \text{ to state } j}{\#\ \text{transitions from state } i} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$

$$\hat{b}_j(k) = \frac{\#\ \text{in state } j \text{ and output symbol } k}{\#\ \text{in state } j} = \frac{\sum_{t=1}^{T} [o_t = k] \, \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$$

where we use Knuth notation, [boolean condition] = 1 or 0 depending on whether the boolean condition is true or false.

4.1 Training on multiple sequences

The above is for one output observable sequence $\vec{o}$. If there are multiple such observable output sequences, i.e. a training set of them, then the basic variables defined above (α, β, etc.) are computed for each of them. The re-estimation formulas are the exception: they need to sum over the sequences, as an outer sum around the sums shown. We use a superscript (p) to indicate values computed for observable sequence $\vec{o}^{(p)}$. Note that λ and N and M are independent of p, but T is not, since each string in the training set might be a different length, $T^{(p)} = \dim \vec{o}^{(p)}$.
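The update formulas become:

$$\hat{\pi}_i = \frac{\sum_p \gamma_1^{(p)}(i)}{\sum_p 1}$$

$$\hat{a}_{ij} = \frac{\#\ \text{transitions state } i \text{ to state } j}{\#\ \text{transitions from state } i} = \frac{\sum_p \sum_{t=1}^{T^{(p)}-1} \xi_t^{(p)}(i, j)}{\sum_p \sum_{t=1}^{T^{(p)}-1} \gamma_t^{(p)}(i)}$$

$$\hat{b}_j(k) = \frac{\#\ \text{in state } j \text{ and output symbol } k}{\#\ \text{in state } j} = \frac{\sum_p \sum_{t=1}^{T^{(p)}} [o_t^{(p)} = k] \, \gamma_t^{(p)}(j)}{\sum_p \sum_{t=1}^{T^{(p)}} \gamma_t^{(p)}(j)}$$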