Hidden Markov Models
AIMA Chapter 15, Sections 1-5
Time and uncertainty

Consider a target-tracking problem:
  X_t = set of unobservable state variables at time t, e.g., Position_t, Appearance_t, etc.
  E_t = set of observable evidence variables at time t, e.g., ImagePixels_t
This assumes discrete time; the step size depends on the problem.
Notation: X_{a:b} = X_a, X_{a+1}, ..., X_{b-1}, X_b
Markov processes (Markov chains)

Construct a Bayes net from these variables:
Markov assumption: X_t depends on a bounded subset of X_{0:t-1}
  First-order Markov process:  P(X_t | X_{0:t-1}) = P(X_t | X_{t-1})
  Second-order Markov process: P(X_t | X_{0:t-1}) = P(X_t | X_{t-2}, X_{t-1})

[Diagram: first-order chain X_{t-2} -> X_{t-1} -> X_t -> X_{t+1} -> X_{t+2}; second-order chain adds arcs skipping one step]

Stationary process: transition model P(X_t | X_{t-1}) fixed for all t
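A stationary first-order chain can be explored by repeatedly propagating the state distribution through one fixed transition model. The sketch below uses an illustrative 2-state chain (the values are not from the slides) to show that one step only depends on the current distribution, and that repeated steps mix toward a stationary distribution.

```python
# Illustrative 2-state stationary Markov chain (values chosen for the example).
# T[(x, x')] = P(X_{t+1} = x' | X_t = x), fixed for all t (stationary process).
T = {('t', 't'): 0.7, ('t', 'f'): 0.3,
     ('f', 't'): 0.3, ('f', 'f'): 0.7}

def step(dist):
    """One step: P(X_{t+1} = x') = sum_x P(X_{t+1} = x' | x) P(x)."""
    return {s2: sum(dist[s1] * T[(s1, s2)] for s1 in dist) for s2 in ('t', 'f')}

dist = {'t': 1.0, 'f': 0.0}   # start in state 't' with certainty
for _ in range(50):
    dist = step(dist)
# for this symmetric chain, the distribution mixes toward <0.5, 0.5>
```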
Hidden Markov Model (HMM)

Sensor Markov assumption: P(E_t | X_{0:t}, E_{1:t-1}) = P(E_t | X_t)
Stationary process: transition model P(X_t | X_{t-1}) and sensor model P(E_t | X_t) fixed for all t
An HMM is a special type of Bayes net in which X_t is a single discrete random variable, with joint probability distribution
  P(X_{0:t}, E_{1:t}) = P(X_0) Π_{i=1}^{t} P(X_i | X_{i-1}) P(E_i | X_i)
Example

[Diagram: Rain_{t-1} -> Rain_t -> Rain_{t+1}, with Umbrella_t observed at each step]

Transition model P(R_t | R_{t-1}):    Sensor model P(U_t | R_t):
  R_{t-1}  P(R_t = true)                R_t    P(U_t = true)
  true     0.7                          true   0.9
  false    0.3                          false  0.2

The first-order Markov assumption is not exactly true in the real world! Possible fixes:
1. Increase the order of the Markov process
2. Augment the state, e.g., add Temp_t, Pressure_t
Example: robot motion. Augment position and velocity with Battery_t
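The HMM joint distribution P(X_{0:t}, E_{1:t}) = P(X_0) Π_i P(X_i | X_{i-1}) P(E_i | X_i) can be evaluated for one full assignment of the umbrella model as a simple product. The uniform prior P(R_0) = <0.5, 0.5> is an assumption here, matching the filtering example later in these slides.

```python
# Umbrella model CPTs; prior P(R_0) = <0.5, 0.5> is assumed (as in the
# filtering example on a later slide).
prior = {True: 0.5, False: 0.5}    # P(R_0)
trans = {True: 0.7, False: 0.3}    # P(R_t = true | R_{t-1})
sensor = {True: 0.9, False: 0.2}   # P(U_t = true | R_t)

def joint(rains, umbrellas):
    """P(r_0, ..., r_t, u_1, ..., u_t) = P(r_0) * prod_i P(r_i|r_{i-1}) P(u_i|r_i)."""
    p = prior[rains[0]]
    for r_prev, r, u in zip(rains, rains[1:], umbrellas):
        p *= trans[r_prev] if r else 1 - trans[r_prev]   # transition factor
        p *= sensor[r] if u else 1 - sensor[r]           # sensor factor
    return p

# e.g., rain and umbrella on both days: 0.5 * 0.7 * 0.9 * 0.7 * 0.9
p = joint([True, True, True], [True, True])
```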
Inference tasks

Filtering: P(X_t | e_{1:t})
  belief state; input to the decision process of a rational agent
Prediction: P(X_{t+k} | e_{1:t}) for k > 0
  evaluation of possible action sequences; like filtering without the evidence
Smoothing: P(X_k | e_{1:t}) for 0 <= k < t
  better estimate of past states, essential for learning
Most likely explanation: arg max_{x_{1:t}} P(x_{1:t} | e_{1:t})
  speech recognition, decoding with a noisy channel
Filtering

Aim: devise a recursive state estimation algorithm:
  P(X_{t+1} | e_{1:t+1}) = f(e_{t+1}, P(X_t | e_{1:t}))

P(X_{t+1} | e_{1:t+1}) = P(X_{t+1} | e_{1:t}, e_{t+1})
  = α P(e_{t+1} | X_{t+1}, e_{1:t}) P(X_{t+1} | e_{1:t})
  = α P(e_{t+1} | X_{t+1}) P(X_{t+1} | e_{1:t})

I.e., prediction + estimation. Prediction by summing out X_t:
P(X_{t+1} | e_{1:t+1}) = α P(e_{t+1} | X_{t+1}) Σ_{x_t} P(X_{t+1}, x_t | e_{1:t})
  = α P(e_{t+1} | X_{t+1}) Σ_{x_t} P(X_{t+1} | x_t, e_{1:t}) P(x_t | e_{1:t})
  = α P(e_{t+1} | X_{t+1}) Σ_{x_t} P(X_{t+1} | x_t) P(x_t | e_{1:t})

f_{1:t+1} = FORWARD(f_{1:t}, e_{t+1}) where f_{1:t} = P(X_t | e_{1:t})
Time and space requirements are constant (independent of t)
Filtering example

[Diagram: Rain_0 -> Rain_1 -> Rain_2, with Umbrella_1, Umbrella_2 observed; transition and sensor models as before]

With P(R_0) = <0.5, 0.5> and the umbrella observed on days 1 and 2:
  Day 1: predict <0.500, 0.500>; update on u_1 gives P(R_1 | u_1) = <0.818, 0.182>
  Day 2: predict <0.627, 0.373>; update on u_2 gives P(R_2 | u_1, u_2) = <0.883, 0.117>

P(X_{t+1} | e_{1:t+1}) = α P(e_{t+1} | X_{t+1}) Σ_{x_t} P(X_{t+1} | x_t) P(x_t | e_{1:t})
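The filtering steps above can be sketched directly as the forward recursion, one predict-then-update step per observation; this reproduces the umbrella numbers and assumes the prior P(R_0) = <0.5, 0.5>.

```python
# Forward (filtering) recursion for the umbrella HMM.
T = {True: 0.7, False: 0.3}    # P(R_{t+1} = true | R_t)
S = {True: 0.9, False: 0.2}    # P(U_t = true | R_t)

def forward(f, umbrella):
    """One step: f_{1:t+1} = alpha P(e_{t+1}|X_{t+1}) sum_x P(X_{t+1}|x) f(x)."""
    # predict by summing out the previous state
    pred = {r: sum((T[x] if r else 1 - T[x]) * f[x] for x in f)
            for r in (True, False)}
    # update with the evidence and normalize
    up = {r: (S[r] if umbrella else 1 - S[r]) * pred[r] for r in pred}
    z = sum(up.values())
    return {r: p / z for r, p in up.items()}

f = {True: 0.5, False: 0.5}    # assumed prior P(R_0)
for u in (True, True):         # umbrella observed on days 1 and 2
    f = forward(f, u)
# f is now P(R_2 | u_1, u_2) = <0.883, 0.117>
```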
Smoothing

[Diagram: chain X_0 -> X_1 -> ... -> X_k -> ... -> X_t, with evidence E_1, ..., E_k, ..., E_t]

Divide evidence e_{1:t} into e_{1:k}, e_{k+1:t}:
P(X_k | e_{1:t}) = P(X_k | e_{1:k}, e_{k+1:t})
  = α P(X_k | e_{1:k}) P(e_{k+1:t} | X_k, e_{1:k})
  = α P(X_k | e_{1:k}) P(e_{k+1:t} | X_k)
  = α f_{1:k} × b_{k+1:t}

Backward message computed by a backwards recursion:
P(e_{k+1:t} | X_k) = Σ_{x_{k+1}} P(e_{k+1:t} | X_k, x_{k+1}) P(x_{k+1} | X_k)
  = Σ_{x_{k+1}} P(e_{k+1:t} | x_{k+1}) P(x_{k+1} | X_k)
  = Σ_{x_{k+1}} P(e_{k+1} | x_{k+1}) P(e_{k+2:t} | x_{k+1}) P(x_{k+1} | X_k)
Smoothing example

For the umbrella sequence u_1, u_2:
  forward:  f_{1:1} = <0.818, 0.182>,  f_{1:2} = <0.883, 0.117>
  backward: b_{2:2} = <0.690, 0.410>,  b_{3:2} = <1.000, 1.000>
  smoothed: P(R_1 | u_1, u_2) = α f_{1:1} × b_{2:2} = <0.883, 0.117>

Forward-backward algorithm: cache forward messages along the way
Time linear in t (polytree inference), space O(t |f|)
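The backward recursion and the f × b combination above can be sketched as follows; this reproduces the backward message <0.69, 0.41> and the smoothed estimate P(R_1 | u_1, u_2) ≈ 0.883 for the umbrella model.

```python
# Backward recursion and forward-backward smoothing for the umbrella HMM.
T = {True: 0.7, False: 0.3}    # P(R_{t+1} = true | R_t)
S = {True: 0.9, False: 0.2}    # P(U_t = true | R_t)

def backward(b, umbrella):
    """b_{k+1:t}(r) = sum_x P(e_{k+1}|x) b_{k+2:t}(x) P(x | R_k = r)."""
    return {r: sum((S[x] if umbrella else 1 - S[x]) * b[x] *
                   (T[r] if x else 1 - T[r])
                   for x in (True, False))
            for r in (True, False)}

f1 = {True: 0.818, False: 0.182}              # filtered P(R_1 | u_1), from above
b = backward({True: 1.0, False: 1.0}, True)   # b_{2:2}, from b_{3:2} = <1, 1> and u_2
sm = {r: f1[r] * b[r] for r in f1}            # alpha f_{1:1} x b_{2:2}
z = sum(sm.values())
sm = {r: p / z for r, p in sm.items()}        # smoothed P(R_1 | u_1, u_2)
```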
Most likely explanation

Most likely sequence ≠ sequence of most likely states!!!!
Most likely path to each x_{t+1} = most likely path to some x_t plus one more step:
  max_{x_1...x_t} P(x_1, ..., x_t, X_{t+1} | e_{1:t+1})
    = P(e_{t+1} | X_{t+1}) max_{x_t} [ P(X_{t+1} | x_t) max_{x_1...x_{t-1}} P(x_1, ..., x_{t-1}, x_t | e_{1:t}) ]

Identical to filtering, except f_{1:t} is replaced by
  m_{1:t} = max_{x_1...x_{t-1}} P(x_1, ..., x_{t-1}, X_t | e_{1:t}),
i.e., m_{1:t}(i) gives the probability of the most likely path to state i.
The update has the sum replaced by max, giving the Viterbi algorithm:
  m_{1:t+1} = P(e_{t+1} | X_{t+1}) max_{x_t} ( P(X_{t+1} | x_t) m_{1:t} )
Viterbi example

Evidence: umbrella = true, true, false, true, true

  m_{1:1} = <.8182, .1818>
  m_{1:2} = <.5155, .0491>
  m_{1:3} = <.0361, .1237>
  m_{1:4} = <.0334, .0173>
  m_{1:5} = <.0210, .0024>

[Diagram: state-space paths over Rain_1 ... Rain_5 (true/false), with bold arcs marking the most likely path to each state]
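The Viterbi update m_{1:t+1} = P(e_{t+1} | X_{t+1}) max_{x_t} (P(X_{t+1} | x_t) m_{1:t}), with back-pointers to recover the path, can be sketched as below; starting from the normalized first update <.8182, .1818> reproduces the m values on this slide.

```python
# Viterbi algorithm for the umbrella HMM: like filtering with max in place of sum.
T = {True: 0.7, False: 0.3}    # P(R_{t+1} = true | R_t)
S = {True: 0.9, False: 0.2}    # P(U_t = true | R_t)

def viterbi(evidence):
    # m_{1:1}: first update from the assumed prior <0.5, 0.5>, normalized
    # to match the slide's <.8182, .1818>
    m = {r: 0.5 * (S[r] if evidence[0] else 1 - S[r]) for r in (True, False)}
    z = sum(m.values())
    m = {r: p / z for r, p in m.items()}
    back = []                  # back-pointers: best predecessor of each state
    for e in evidence[1:]:
        bp, new = {}, {}
        for r in (True, False):
            # max over predecessors x of P(r | x) * m(x)
            x = max((True, False), key=lambda s: (T[s] if r else 1 - T[s]) * m[s])
            bp[r] = x
            new[r] = (S[r] if e else 1 - S[r]) * (T[x] if r else 1 - T[x]) * m[x]
        back.append(bp)
        m = new
    # follow back-pointers from the most likely final state
    path = [max(m, key=m.get)]
    for bp in reversed(back):
        path.append(bp[path[-1]])
    return list(reversed(path)), m

path, m = viterbi([True, True, False, True, True])
# path is the most likely rain sequence; m holds the final message m_{1:5}
```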
Example umbrella problems

Filtering:
  P(X_{t+1} | e_{1:t+1}) = α P(e_{t+1} | X_{t+1}) Σ_{x_t} P(X_{t+1} | x_t) P(x_t | e_{1:t}) =: f_{1:t+1}
Smoothing:
  P(X_k | e_{1:t}) = α f_{1:k} × b_{k+1:t}
  P(e_{k+1:t} | X_k) = Σ_{x_{k+1}} P(e_{k+1} | x_{k+1}) P(e_{k+2:t} | x_{k+1}) P(x_{k+1} | X_k) =: b_{k+1:t}

Transition model P(R_t | R_{t-1}): R_{t-1} = true: 0.7, false: 0.3
Sensor model P(U_t | R_t): R_t = true: 0.9, false: 0.2

P(R_3 | u_1, u_2, u_3) = ?
arg max_{r_{1:3}} P(r_{1:3} | u_1, u_2, u_3) = ?
P(R_2 | u_1, u_2, u_3) = ?