Graphical Models Seminar
Forward-Backward and Viterbi Algorithm for HMMs
Bishop, PRML, Chapters 13.2.2, 13.2.3, 13.2.5

Dinu Kaufmann
Departement Mathematik und Informatik, Universität Basel
April 8, 2013
Forward-Backward Algorithm for HMMs
Sum-Product Algorithm for HMMs
Viterbi Algorithm
Introduction

Sequential data:
- rainfall measurements
- currency exchange rates
- acoustics / speech recognition
- nucleotide base pair sequences (DNA)
- sequence of characters in an English sentence
State-Space Model

Figure: Bayesian network for a state-space model (z_{i+1} ⊥ z_{i-1} | z_i)

p(z_0, ..., z_N, x_1, ..., x_N) = p(z_0) \prod_{i=1}^{N} p(z_i | z_{i-1}) p(x_i | z_i)   (1)

Hidden Markov model: discrete state-space model
Linear dynamical system: Gaussian state-space model
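The factorization (1) can be sampled ancestrally: draw z_0 from the prior, then alternate transition and emission draws. A minimal sketch; the 2-state, 2-symbol parameters and all names (`pi`, `A`, `B`, `sample`) are illustrative assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

pi = np.array([0.6, 0.4])             # p(z_0)
A = np.array([[0.7, 0.3],             # A[i, j] = p(z_n = j | z_{n-1} = i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],             # B[j, k] = p(x_n = k | z_n = j)
              [0.2, 0.8]])

def sample(N):
    """Ancestral sampling of (z_0..z_N, x_1..x_N) from factorization (1)."""
    z = [rng.choice(2, p=pi)]                  # z_0 ~ p(z_0)
    x = []
    for _ in range(N):
        z.append(rng.choice(2, p=A[z[-1]]))    # z_i ~ p(z_i | z_{i-1})
        x.append(rng.choice(2, p=B[z[-1]]))    # x_i ~ p(x_i | z_i)
    return z, x

z, x = sample(10)
```

Note that z_0 has no associated emission, matching the Bayesian network above.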
Forward-Backward Algorithm

Find the posterior marginals of the hidden state variables z_n given the observations X = {x_1, ..., x_N} in a SSM.
- Tree structure: computation of exact marginals
- Focus on discrete random variables (HMM)
Derivation of Forward-Backward Algorithm

Use the chain rule and conditional independence:

p(z_n | X) = p(z_n, X) / p(X)                                          (2)
           = p(x_1, ..., x_n, z_n) p(x_{n+1}, ..., x_N | z_n) / p(X)   (3)
           = α(z_n) β(z_n) / p(X)                                      (4)
Forward Recursion

α(z_n) = p(x_1, ..., x_n, z_n)                                                                    (5)
       = p(x_1, ..., x_n | z_n) p(z_n)                                                            (6)
       = p(x_1, ..., x_{n-1} | z_n) p(x_n | z_n) p(z_n)                                           (7)
       = p(x_n | z_n) p(x_1, ..., x_{n-1}, z_n)                                                   (8)
       = p(x_n | z_n) \sum_{z_{n-1}} p(x_1, ..., x_{n-1}, z_{n-1}, z_n)                           (9)
       = p(x_n | z_n) \sum_{z_{n-1}} p(x_1, ..., x_{n-1}, z_n | z_{n-1}) p(z_{n-1})               (10)
       = p(x_n | z_n) \sum_{z_{n-1}} p(x_1, ..., x_{n-1} | z_{n-1}) p(z_n | z_{n-1}) p(z_{n-1})   (11)
       = p(x_n | z_n) \sum_{z_{n-1}} p(x_1, ..., x_{n-1}, z_{n-1}) p(z_n | z_{n-1})               (12)
       = p(x_n | z_n) \sum_{z_{n-1}} α(z_{n-1}) p(z_n | z_{n-1})                                  (13)
Backward Recursion

β(z_n) = p(x_{n+1}, ..., x_N | z_n)                                                          (14)
       = \sum_{z_{n+1}} p(x_{n+1}, ..., x_N, z_{n+1} | z_n)                                  (15)
       = \sum_{z_{n+1}} p(x_{n+1}, ..., x_N | z_n, z_{n+1}) p(z_{n+1} | z_n)                 (16)
       = \sum_{z_{n+1}} p(x_{n+2}, ..., x_N | z_{n+1}) p(x_{n+1} | z_{n+1}) p(z_{n+1} | z_n) (17)
       = \sum_{z_{n+1}} β(z_{n+1}) p(x_{n+1} | z_{n+1}) p(z_{n+1} | z_n)                     (18)
Forward-Backward Algorithm

Forward recursion:
α(z_n) = p(x_n | z_n) \sum_{z_{n-1}} α(z_{n-1}) p(z_n | z_{n-1})   (19)

Backward recursion:
β(z_n) = \sum_{z_{n+1}} β(z_{n+1}) p(x_{n+1} | z_{n+1}) p(z_{n+1} | z_n)   (20)

Starting conditions: α(z_0) = p(z_0), β(z_N) = 1.

p(z_n | X) = α(z_n) β(z_n) / p(X)   (21)
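The two recursions and the marginal p(z_n | X) translate directly into code. A minimal Python/NumPy sketch, assuming tabular parameters `pi`, `A`, `B` (the function and parameter names are mine, not from the slides); the unnormalized α/β are fine for short sequences, while long sequences would need rescaling or log-space arithmetic.

```python
import numpy as np

def forward_backward(pi, A, B, x):
    """Posterior marginals p(z_n | x_1..x_N), n = 0..N.

    pi[k]  = p(z_0 = k)
    A[i,j] = p(z_n = j | z_{n-1} = i)
    B[j,k] = p(x_n = k | z_n = j)
    x      = list of observation indices x_1..x_N
    """
    N, K = len(x), len(pi)
    alpha = np.zeros((N + 1, K))
    alpha[0] = pi                                   # alpha(z_0) = p(z_0)
    for n in range(1, N + 1):                       # forward recursion
        alpha[n] = B[:, x[n - 1]] * (alpha[n - 1] @ A)
    beta = np.zeros((N + 1, K))
    beta[N] = 1.0                                   # beta(z_N) = 1
    for n in range(N - 1, -1, -1):                  # backward recursion
        beta[n] = A @ (B[:, x[n]] * beta[n + 1])
    px = alpha[N].sum()                             # p(X), since beta(z_N) = 1
    return alpha * beta / px                        # p(z_n | X)

gamma = forward_backward(np.array([0.6, 0.4]),
                         np.array([[0.7, 0.3], [0.4, 0.6]]),
                         np.array([[0.9, 0.1], [0.2, 0.8]]),
                         [0, 0, 1])
```

Each row of the returned array is a distribution over the K states and sums to one.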
Summary

Efficient recursive forward-backward algorithm to compute the posterior marginals of the hidden state variables given the observations.
Computational cost for K states, N steps: O(K^2 N)
Brute-force cost: O(K^N)
Sum-Product Algorithm for HMMs

Figure: Bayesian network for a state-space model (z_{i+1} ⊥ z_{i-1} | z_i)
Sum-Product Algorithm for HMMs

Figure: Factor graph for a state-space model (z_{i+1} ⊥ z_{i-1} | z_i)

- Absorb z_0 into ψ_0
- Replace directed edges by undirected edges
- Insert factor nodes
Sum-Product Algorithm for HMMs

Figure: Factor graph for an HMM

Absorb the emission probabilities into the transition probability factors:

f_n(z_{n-1}, z_n) = p(z_n | z_{n-1}) p(x_n | z_n)   (22)
f_1(z_1) = p(z_0) p(z_1 | z_0) p(x_1 | z_1)         (23)
Sum-Product Algorithm for HMMs

Figure: Factor graph for an HMM

μ_{f_{n-1} → z_{n-1}}(z_{n-1}) = μ_{z_{n-1} → f_n}(z_{n-1})   (24)

μ_{f_n → z_n}(z_n) = \sum_{z_{n-1}} f_n(z_{n-1}, z_n) μ_{z_{n-1} → f_n}(z_{n-1})                 (25)
                   = \sum_{z_{n-1}} p(z_n | z_{n-1}) p(x_n | z_n) μ_{z_{n-1} → f_n}(z_{n-1})     (26)

with μ_{f_n → z_n}(z_n) = α(z_n) and μ_{z_{n-1} → f_n}(z_{n-1}) = α(z_{n-1}).
Analogously for the β messages.
Sum-Product Algorithm for HMMs

The forward-backward algorithm is the sum-product algorithm applied to an HMM.
Viterbi Algorithm

Find the most probable sequence of states {ẑ_1, ..., ẑ_N} given all observations {x_1, ..., x_N}.

Max-product algorithm:
{ẑ_1, ..., ẑ_N} = argmax_{z_1, ..., z_N} p(z_1, ..., z_N, x_1, ..., x_N)   (27)

Max-sum algorithm:
{ẑ_1, ..., ẑ_N} = argmax_{z_1, ..., z_N} log p(z_1, ..., z_N, x_1, ..., x_N)   (28)

For HMMs, this is known as the Viterbi algorithm.
Derivation of Viterbi Algorithm

- Use the chain rule and conditional independence.
- Use factor graph notation and check equivalence.
- Rule of thumb: replace the summations in the sum-product algorithm by maximizations; take the log of the argument for the max-sum algorithm.
Recursion for Viterbi Algorithm

Figure: Factor graph for an HMM

ω(z_{n+1}) = μ_{f_{n+1} → z_{n+1}}(z_{n+1})   (29)
μ_{z_n → f_{n+1}}(z_n) = μ_{f_n → z_n}(z_n)   (30)

ω(z_{n+1}) = max_{z_n} { log f_{n+1}(z_n, z_{n+1}) + μ_{z_n → f_{n+1}}(z_n) }                   (31)
           = max_{z_n} { log [p(z_{n+1} | z_n) p(x_{n+1} | z_{n+1})] + μ_{f_n → z_n}(z_n) }     (32)
           = log p(x_{n+1} | z_{n+1}) + max_{z_n} { log p(z_{n+1} | z_n) + ω(z_n) }             (33)
Viterbi Algorithm (Max-Sum)

Initialization:
ω(z_0) = log p(z_0)   (34)

Forward recursion:
ω(z_{n+1}) = log p(x_{n+1} | z_{n+1}) + max_{z_n} { log p(z_{n+1} | z_n) + ω(z_n) }   (35)

Store the most probable previous state:
φ(z_n) = argmax_{z_{n-1}} { log p(x_n | z_n) + log p(z_n | z_{n-1}) + ω(z_{n-1}) }   (36)

At the root node:
z_N^max = argmax_{z_N} μ_{f_N → z_N}(z_N)   (37)

Backtracking
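A minimal log-space sketch of this max-sum recursion with backtracking, again assuming tabular parameters `pi`, `A`, `B` (all names are mine, not from the slides). The returned sequence includes the unobserved initial state z_0.

```python
import numpy as np

def viterbi(pi, A, B, x):
    """Most probable state sequence z_0..z_N via max-sum in log-space.

    pi[k]  = p(z_0 = k)
    A[i,j] = p(z_n = j | z_{n-1} = i)
    B[j,k] = p(x_n = k | z_n = j)
    x      = list of observation indices x_1..x_N
    """
    N, K = len(x), len(pi)
    logA, logB = np.log(A), np.log(B)
    omega = np.log(pi)                        # omega(z_0) = log p(z_0)
    phi = np.zeros((N, K), dtype=int)         # backpointers
    for n in range(N):
        # scores[i, j] = log p(z_{n+1}=j | z_n=i) + omega(z_n=i)
        scores = omega[:, None] + logA
        phi[n] = scores.argmax(axis=0)        # most probable previous state
        omega = logB[:, x[n]] + scores.max(axis=0)
    # backtracking from the most probable final state
    states = [int(omega.argmax())]
    for n in range(N - 1, -1, -1):
        states.append(int(phi[n, states[-1]]))
    return states[::-1]                       # z_0, z_1, ..., z_N
```

Working with logarithms avoids the numerical underflow that products of many small probabilities would otherwise cause.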
Intuition

Number of possible paths through the trellis: O(K^N)

Figure: Two paths in a fragment of a trellis

- Store only the most probable incoming path for each state in each step
- Backtracking
Example (Wikipedia)

Figure: Doctor in a village

Observed sequence: {normal, cold, dizzy}
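The example can be worked numerically in a few lines. The parameters below follow the Wikipedia Viterbi article (two states Healthy/Fever with the usual prior, transition, and emission tables); note that in that convention the first state is drawn from the prior and emits the first observation directly, with no separate z_0.

```python
import numpy as np

states = ["Healthy", "Fever"]
pi = np.array([0.6, 0.4])                          # p(z_1)
A = np.array([[0.7, 0.3], [0.4, 0.6]])             # transitions
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])   # emissions
obs = [0, 1, 2]                                    # normal, cold, dizzy

delta = pi * B[:, obs[0]]                          # delta_1(k) = p(z_1) p(x_1 | z_1)
psi = []                                           # backpointers
for o in obs[1:]:
    scores = delta[:, None] * A                    # delta_n(i) * p(z_{n+1}=j | z_n=i)
    psi.append(scores.argmax(axis=0))
    delta = scores.max(axis=0) * B[:, o]

path = [int(delta.argmax())]
for back in reversed(psi):
    path.append(int(back[path[-1]]))
path.reverse()
# most probable path: Healthy, Healthy, Fever (probability ~ 0.01512)
print([states[k] for k in path])
```

The probabilities are large enough here that plain products suffice; longer sequences would again call for log-space.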
Summary

- Sum-product / forward-backward algorithm: compute the state posteriors in an HMM
- Max-sum / max-product / Viterbi algorithm: compute the most likely sequence of hidden states in an HMM
- Cost linear instead of exponential in N
- Not covered: Baum-Welch algorithm (estimate the parameters of an HMM)
References

Bishop, PRML, Chapters 8.4 and 13.

Further reading
Factor graphs (with a slightly improved notation and emphasis on linear dynamical systems): Loeliger et al. (2007), The Factor Graph Approach to Model-Based Signal Processing