Linear Dynamical Systems (Kalman filter)
(a) Overview of HMMs
(b) From HMMs to Linear Dynamical Systems (LDS)
Markov Chains with Discrete Random Variables

x_1 → x_2 → x_3 → … → x_T

Let's assume we have discrete random variables, e.g. taking 3 discrete values encoded as one-of-K (one-hot) vectors: x_t ∈ {(1,0,0)^T, (0,1,0)^T, (0,0,1)^T}.

Markov property: p(x_t | x_1, …, x_{t-1}) = p(x_t | x_{t-1}), e.g. p(x_t = (1,0,0)^T | x_{t-1} = (0,1,0)^T).

The chain is called stationary, homogeneous or time-invariant if the distribution p(x_t | x_{t-1}) does not depend on t.
Markov Chains with Discrete Random Variables

p(x_1, …, x_T) = p(x_1) ∏_{t=2}^{T} p(x_t | x_{t-1})

What do we need in order to describe the whole procedure?

(1) A probability for the first frame/timestamp, p(x_1). In order to define this probability we need the vector π = (π_1, π_2, …, π_K):

p(x_1 | π) = ∏_{c=1}^{K} π_c^{x_{1c}}

(2) A transition probability p(x_t | x_{t-1}). In order to define it we need a K×K transition matrix A = [a_{jk}]:

p(x_t | x_{t-1}, A) = ∏_{j=1}^{K} ∏_{k=1}^{K} a_{jk}^{x_{t-1,j} x_{tk}}
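The two ingredients above, π and A, fully specify the generative process. A minimal numpy sketch of sampling a state sequence from them (the 3-state π and A below are illustrative, not from the slides):

```python
import numpy as np

# Illustrative 3-state chain: pi is the initial distribution,
# A[j, k] = p(x_t = k | x_{t-1} = j) is the transition matrix (rows sum to 1).
pi = np.array([0.5, 0.3, 0.2])
A = np.array([[0.9, 0.05, 0.05],
              [0.1, 0.8,  0.1 ],
              [0.2, 0.2,  0.6 ]])

def sample_chain(pi, A, T, rng):
    """Draw one state sequence x_1, ..., x_T from p(x_1) * prod_t p(x_t | x_{t-1})."""
    states = np.empty(T, dtype=int)
    states[0] = rng.choice(len(pi), p=pi)                       # x_1 ~ pi
    for t in range(1, T):
        states[t] = rng.choice(A.shape[1], p=A[states[t - 1]])  # x_t | x_{t-1}
    return states

rng = np.random.default_rng(0)
x = sample_chain(pi, A, T=10, rng=rng)
```

States are stored as integer indices here; the one-of-K vectors of the slides are just the one-hot encoding of these indices.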
Markov Chains with Discrete Random Variables

(1) Using the transition matrix we can compute various probabilities regarding the future:

p(x_{t+1} | x_{t-1}, A): given by A^2
p(x_{t+2} | x_{t-1}, A): given by A^3
p(x_{t+n} | x_{t-1}, A): given by A^{n+1}

(2) The stationary probability of a Markov chain is very important: it indicates how probable it is to end up in each state under a random walk (the idea behind Google PageRank). It satisfies π^T A = π^T.
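The stationary distribution π^T A = π^T can be found by power iteration: start from any distribution and keep pushing it through the chain until it stops changing. A small sketch, reusing the illustrative 3-state matrix from before:

```python
import numpy as np

A = np.array([[0.9, 0.05, 0.05],
              [0.1, 0.8,  0.1 ],
              [0.2, 0.2,  0.6 ]])

def stationary(A, iters=1000):
    """Power iteration on pi^T A = pi^T: repeatedly apply the transition
    matrix to a distribution until it converges to the fixed point."""
    pi = np.full(A.shape[0], 1.0 / A.shape[0])
    for _ in range(iters):
        pi = pi @ A
    return pi

pi_inf = stationary(A)
# pi_inf now satisfies pi_inf @ A ≈ pi_inf and sums to 1
```

For an irreducible, aperiodic chain this converges regardless of the starting distribution; equivalently, π is the left eigenvector of A with eigenvalue 1.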
Latent Variables in a Markov Chain

z_1 → z_2 → z_3 → … → z_T (latent chain)
x_1, x_2, x_3, …, x_T (observations)

For discrete observations taking L values, the emission probabilities p(x_t | z_t) are collected in an L×K emission probability matrix B.
Latent Variables in a Markov Chain

z_1 → z_2 → z_3 → … → z_T (latent chain)
x_1, x_2, x_3, …, x_T (observations)

For continuous observations the emission can be Gaussian, one distribution per state:

p(x_t | z_t) = ∏_{k=1}^{K} N(x_t | μ_k, Σ_k)^{z_{tk}}

i.e., K Gaussian distributions.
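Evaluating a Gaussian emission means plugging x_t into each state's density N(x_t | μ_k, Σ_k). A minimal sketch with made-up parameters (K = 2 states, 2-D observations; the μ's and Σ's are illustrative):

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Density of a multivariate Gaussian N(x | mu, Sigma)."""
    d = len(mu)
    diff = x - mu
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

# Illustrative parameters: one Gaussian per latent state.
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), 0.5 * np.eye(2)]

x_t = np.array([2.8, 3.1])
emissions = np.array([gaussian_pdf(x_t, mus[k], Sigmas[k]) for k in range(2)])
# emissions[k] = p(x_t | z_t = k); this x_t lies near state 1's mean
```

In practice one works with log-densities for numerical stability, but the plain form matches the slide's notation.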
Factorization of an HMM

p(z_1, …, z_T) = p(z_1) ∏_{t=2}^{T} p(z_t | z_{t-1})

p(X, Z | θ) = p(x_1, x_2, …, x_T, z_1, z_2, …, z_T | θ)
            = [∏_{t=1}^{T} p(x_t | z_t)] p(z_1) ∏_{t=2}^{T} p(z_t | z_{t-1})
What can we do with an HMM? Given a string of observations and parameters:

(1) Filtering: for a timestamp t, find the probabilities of z_t given the observations so far:
p(z_t | x_1, x_2, …, x_t)

(2) Smoothing: for a timestamp t, find the probabilities of z_t given the whole string:
p(z_t | x_1, x_2, …, x_T)

(3) Decoding (Viterbi): given the observation string, find the string of hidden variables that maximizes the posterior:
arg max_{z_1, …, z_t} p(z_1, z_2, …, z_t | x_1, x_2, …, x_t)
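Filtering is computed by the normalized forward recursion. A minimal sketch for the discrete-emission case, using the slide's L×K emission matrix B (the toy π, A, B and observation string below are illustrative):

```python
import numpy as np

def forward_filter(pi, A, B, obs):
    """Normalized forward pass: alpha[t, k] = p(z_t = k | x_1, ..., x_t).
    pi: initial distribution (K,); A: transition matrix (K, K);
    B: emission matrix (L, K) with B[l, k] = p(x_t = l | z_t = k);
    obs: observed symbol indices (T,)."""
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))
    alpha[0] = pi * B[obs[0], :]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[obs[t], :]   # predict, then weight by emission
        alpha[t] /= alpha[t].sum()                     # normalize to a distribution
    return alpha

# Toy 2-state, 2-symbol HMM
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.2],   # p(x = 0 | z = k)
              [0.1, 0.8]])  # p(x = 1 | z = k)
alpha = forward_filter(pi, A, B, obs=[0, 0, 1])
```

Keeping and multiplying the per-step normalization constants instead of discarding them also yields p(x_1, …, x_T), i.e. the evaluation problem below.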
Hidden Markov Models

[Figure: schematic comparison of filtering, smoothing and decoding. Taken from Machine Learning: A Probabilistic Perspective by K. Murphy]
Hidden Markov Models

(4) Evaluation: find the probability of the observations under the model:
p(x_1, x_2, …, x_T)

(5) Prediction:
p(z_{t+δ} | x_1, x_2, …, x_t), p(x_{t+δ} | x_1, x_2, …, x_t)

(6) Parameter estimation via EM (the Baum-Welch algorithm): A, π, θ
Hidden Markov Models

Filtering: p(z_t | x_1, x_2, …, x_t)
Decoding: arg max_{z_1, …, z_t} p(z_1, …, z_t | x_1, …, x_t)
Prediction: p(z_{t+δ} | x_1, x_2, …, x_t), p(x_{t+δ} | x_1, x_2, …, x_t)
Fixed-lag smoothing: p(z_{t-τ} | x_1, x_2, …, x_t)
Smoothing: p(z_t | x_1, x_2, …, x_T)
Linear Dynamical Systems (LDS)

x_1 → x_2 → x_3 → … → x_T

Up until now we had Markov chains with discrete variables. How can we define a transition relationship with continuous-valued variables? Linearly, with Gaussian noise:

x_1 = μ_0 + u,        u ~ N(u | 0, P_0)
x_t = A x_{t-1} + v_t,  v ~ N(v | 0, Γ)

Equivalently, x_1 ~ N(x_1 | μ_0, P_0) and x_t ~ N(x_t | A x_{t-1}, Γ), i.e.:

p(x_1) = N(x_1 | μ_0, P_0)
p(x_t | x_{t-1}) = N(x_t | A x_{t-1}, Γ)
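Sampling such a continuous chain is a direct transcription of the two equations above. A short sketch (the 2-D A, Γ, μ_0, P_0 are illustrative values, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2-D linear-Gaussian chain
mu0 = np.zeros(2)
P0 = np.eye(2)
A = np.array([[0.99, 0.10],
              [0.00, 0.95]])
Gamma = 0.01 * np.eye(2)

T = 100
x = np.zeros((T, 2))
x[0] = rng.multivariate_normal(mu0, P0)                      # x_1 = mu0 + u
for t in range(1, T):
    x[t] = rng.multivariate_normal(A @ x[t - 1], Gamma)      # x_t = A x_{t-1} + v_t
```

Each step is a Gaussian centered on a linear function of the previous state, which is exactly what makes all the inference distributions later on stay Gaussian.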
Latent Variable Models (Dynamic, Continuous)
Linear Dynamical Systems (LDS)

y_1 → y_2 → y_3 → … → y_T (latent chain, sharing a common linear structure)
x_1, x_2, x_3, …, x_T (observations)

We want to find the parameters of this model.
Linear Dynamical Systems (LDS)

y_1 → y_2 → y_3 → … → y_T (latent chain)
x_1, x_2, x_3, …, x_T (observations)

Emission model: x_t = W y_t + e_t,  e ~ N(e | 0, Σ)
Transition model: y_1 = μ_0 + u,  y_t = A y_{t-1} + v_t,  u ~ N(u | 0, P_0),  v ~ N(v | 0, Γ)

Parameters: θ = {W, A, μ_0, Σ, Γ, P_0}
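The full generative model is the continuous chain plus a noisy linear read-out at each step. A minimal sketch of sampling (y, x) pairs from it, with illustrative position/velocity-style parameters θ = {W, A, μ_0, Σ, Γ, P_0}:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters for a 2-D latent state and 2-D observations
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])          # e.g. position/velocity dynamics
W = np.eye(2)                       # observe the state directly, plus noise
mu0, P0 = np.zeros(2), np.eye(2)
Gamma = 0.01 * np.eye(2)            # transition noise covariance
Sigma = 0.5 * np.eye(2)             # emission noise covariance

T = 50
y = np.zeros((T, 2))                # latent states
x = np.zeros((T, 2))                # observations
y[0] = rng.multivariate_normal(mu0, P0)
x[0] = rng.multivariate_normal(W @ y[0], Sigma)
for t in range(1, T):
    y[t] = rng.multivariate_normal(A @ y[t - 1], Gamma)  # y_t = A y_{t-1} + v_t
    x[t] = rng.multivariate_normal(W @ y[t], Sigma)      # x_t = W y_t + e_t
```

Inference then runs in the opposite direction: given only x, recover a distribution over y.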
Linear Dynamical Systems (LDS)

y_1 → y_2 → y_3 → … → y_T (latent chain)
x_1, x_2, x_3, …, x_T (observations)

First timestamp: p(y_1) = N(y_1 | μ_0, P_0)
Transition probability: p(y_t | y_{t-1}) = N(y_t | A y_{t-1}, Γ)
Emission: p(x_t | y_t) = N(x_t | W y_t, Σ)
HMM vs LDS

                          HMM (Markov chain with          LDS (Markov chain with
                          discrete latent variables)      continuous latent variables)
Initial:     p(y_1)       π (K×1)                         N(y_1 | μ_0, P_0)
Transition:  p(y_t|y_{t-1})  A (K×K)                      N(y_t | A y_{t-1}, Γ)
Emission:    p(x_t|y_t)   B (L×K), or K distributions     N(x_t | W y_t, Σ)
What can we do with LDS?

Example application: the Global Positioning System (GPS).
What can we do with LDS?

[Figure: GPS measurements — position {x_t, y_t, z_t} and clock]
What can we do with LDS?

We still have noisy measurements.

[Figure: noisy measurements and the resulting noisy path vs. the filtered path vs. the actual path]
What can we do with LDS? Given a string of observations and parameters:

(1) Filtering: for a timestamp t, find the distribution of y_t given the observations so far:
p(y_t | x_1, x_2, …, x_t)

(2) Smoothing: for a timestamp t, find the distribution of y_t given the whole time series:
p(y_t | x_1, x_2, …, x_T)

All the above probabilities are Gaussian. Their means and covariance matrices are computed recursively!
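The filtering recursion for the LDS is the Kalman filter: alternate a predict step (push the previous posterior through the transition model) and an update step (correct it with the new observation). A minimal numpy sketch under the notation of these slides; the toy 1-D usage at the bottom is illustrative:

```python
import numpy as np

def kalman_filter(x_obs, W, A, mu0, Sigma, Gamma, P0):
    """Filtering for the LDS: returns the means and covariances of the
    Gaussians p(y_t | x_1, ..., x_t) for t = 1, ..., T, computed recursively."""
    T, d = len(x_obs), len(mu0)
    means = np.zeros((T, d))
    covs = np.zeros((T, d, d))
    mu_pred, P_pred = mu0, P0                  # the prediction for t = 1 is the prior
    for t in range(T):
        if t > 0:
            # predict: push the previous posterior through the transition model
            mu_pred = A @ means[t - 1]
            P_pred = A @ covs[t - 1] @ A.T + Gamma
        # update: correct the prediction with the observation x_t
        S = W @ P_pred @ W.T + Sigma           # innovation covariance
        K = P_pred @ W.T @ np.linalg.inv(S)    # Kalman gain
        means[t] = mu_pred + K @ (x_obs[t] - W @ mu_pred)
        covs[t] = (np.eye(d) - K @ W) @ P_pred
    return means, covs

# Toy usage: a static 1-D state observed repeatedly through unit-variance noise.
obs = np.array([[1.1], [0.9], [1.05], [0.95]])
means, covs = kalman_filter(obs, W=np.eye(1), A=np.eye(1),
                            mu0=np.zeros(1), Sigma=np.eye(1),
                            Gamma=np.zeros((1, 1)), P0=np.eye(1))
```

With A = I and Γ = 0 the state never moves, so the filter reduces to recursive averaging: the posterior mean tracks the running average (with the prior acting as one extra pseudo-observation of μ_0) and the posterior covariance shrinks at every step.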
What can we do with LDS?

(3) Evaluation: find the probability of the observations under the model:
p(x_1, x_2, …, x_T)

(4) Prediction:
p(y_{t+δ} | x_1, x_2, …, x_t), p(x_{t+δ} | x_1, x_2, …, x_t)

(5) EM parameter estimation (the analogue of the Baum-Welch algorithm for HMMs):
θ = {W, A, μ_0, Σ, Γ, P_0}
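Prediction reuses only the predict half of the filtering recursion: with no new observations, the mean follows the dynamics and the covariance grows by Γ each step. A short sketch (the 2-D values in the usage line are illustrative):

```python
import numpy as np

def predict_ahead(mean_t, cov_t, A, Gamma, delta):
    """Roll the filtered Gaussian p(y_t | x_1, ..., x_t) forward delta steps
    with no new observations, giving p(y_{t+delta} | x_1, ..., x_t)."""
    mu, P = mean_t, cov_t
    for _ in range(delta):
        mu = A @ mu               # mean follows the deterministic dynamics
        P = A @ P @ A.T + Gamma   # uncertainty accumulates by Gamma each step
    return mu, P

mu, P = predict_ahead(np.array([1.0, 0.0]), np.eye(2),
                      A=np.eye(2), Gamma=0.1 * np.eye(2), delta=5)
```

The corresponding observation prediction p(x_{t+δ} | x_1, …, x_t) is then Gaussian with mean W μ and covariance W P W^T + Σ.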