Linear Dynamical Systems (Kalman filter)

Linear Dynamical Systems (Kalman filter)

(a) Overview of HMMs
(b) From HMMs to Linear Dynamical Systems (LDS)

Markov Chains with Discrete Random Variables

x_1 → x_2 → x_3 → ... → x_T

Let us assume we have discrete random variables in 1-of-K (one-hot) encoding, e.g., taking 3 discrete values x_t ∈ { (1,0,0)^T, (0,1,0)^T, (0,0,1)^T }.

Markov Property: p(x_t | x_1, ..., x_{t-1}) = p(x_t | x_{t-1})

The chain is Stationary, Homogeneous or Time-Invariant if the distribution p(x_t | x_{t-1}) does not depend on t.

Markov Chains with Discrete Random Variables

p(x_1, ..., x_T) = p(x_1) \prod_{t=2}^{T} p(x_t | x_{t-1})

What do we need in order to describe the whole procedure?

(1) A probability for the first frame/timestamp, p(x_1). In order to define this probability we need the vector π = (π_1, π_2, ..., π_K):

p(x_1 | π) = \prod_{c=1}^{K} π_c^{x_{1c}}

(2) A transition probability p(x_t | x_{t-1}). In order to define it we need a K×K transition matrix A = [a_{jk}]:

p(x_t | x_{t-1}, A) = \prod_{j=1}^{K} \prod_{k=1}^{K} a_{jk}^{x_{t-1,j} x_{tk}}
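
As a quick illustration (not from the slides), here is a minimal sketch of how π and A generate a chain of one-hot state vectors; the function and variable names (sample_markov_chain, pi, A, T) are assumptions made for this example.

```python
import numpy as np

def sample_markov_chain(pi, A, T, rng=None):
    """Sample a length-T chain of 1-of-K (one-hot) state vectors.

    pi : (K,)  initial state probabilities, p(x_1)
    A  : (K,K) transition matrix, A[j, k] = p(x_t = k | x_{t-1} = j)
    """
    rng = np.random.default_rng(rng)
    K = len(pi)
    states = np.zeros(T, dtype=int)
    states[0] = rng.choice(K, p=pi)                      # x_1 ~ pi
    for t in range(1, T):
        states[t] = rng.choice(K, p=A[states[t - 1]])    # x_t ~ p(x_t | x_{t-1})
    return np.eye(K)[states]                             # rows are one-hot vectors

# Example with K = 3 states
pi = np.array([0.5, 0.3, 0.2])
A = np.array([[0.9, 0.05, 0.05],
              [0.1, 0.8,  0.1 ],
              [0.2, 0.2,  0.6 ]])
X = sample_markov_chain(pi, A, T=10)
```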

Markov Chains with Discrete Random Variables

(1) Using the transition matrix we can compute various probabilities regarding the future:

p(x_{t+1} | x_{t-1}, A) is given by A^2
p(x_{t+2} | x_{t-1}, A) is given by A^3
p(x_{t+n} | x_{t-1}, A) is given by A^{n+1}

(2) The stationary probability of a Markov Chain is very important (it is an indication of how probable it is to end up in each of the states in a random walk), e.g., Google PageRank:

π^T A = π^T
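
A minimal sketch (an illustration added here, not part of the slides) of computing the stationary distribution satisfying π^T A = π^T by power iteration; the names stationary_distribution and n_iter are assumptions.

```python
import numpy as np

def stationary_distribution(A, n_iter=1000):
    """Approximate pi such that pi^T A = pi^T by repeatedly applying A."""
    K = A.shape[0]
    pi = np.full(K, 1.0 / K)       # start from the uniform distribution
    for _ in range(n_iter):
        pi = pi @ A                # one step of the chain
    return pi / pi.sum()

A = np.array([[0.9, 0.05, 0.05],
              [0.1, 0.8,  0.1 ],
              [0.2, 0.2,  0.6 ]])
pi_inf = stationary_distribution(A)
assert np.allclose(pi_inf @ A, pi_inf)   # pi_inf is (numerically) stationary
```

Equivalently, the stationary distribution is the left eigenvector of A with eigenvalue 1, normalized to sum to one.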

Latent Variables in a Markov Chain

[Graphical model: latent chain z_1 → z_2 → ... → z_T, each z_t emitting an observation x_t]

p(x_t | z_t): an L×K emission probability matrix B

Latent Variables in a Markov Chain

[Graphical model: latent chain z_1 → z_2 → ... → z_T, each z_t emitting an observation x_t]

p(x_t | z_t) = \prod_{k=1}^{K} N(x_t | μ_k, Σ_k)^{z_{tk}}

i.e., K Gaussian distributions, one per state.

Factorization of an HMM

p(z_1, ..., z_T) = p(z_1) \prod_{t=2}^{T} p(z_t | z_{t-1})

p(X, Z | θ) = p(x_1, x_2, ..., x_T, z_1, z_2, ..., z_T | θ)
            = [\prod_{t=1}^{T} p(x_t | z_t)] p(z_1) \prod_{t=2}^{T} p(z_t | z_{t-1})
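
To make the factorization concrete, here is a hedged sketch that evaluates log p(X, Z | θ) for the Gaussian-emission HMM above by summing the initial, transition and emission terms; the function name and array layout are assumptions for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def hmm_joint_log_prob(X, z, pi, A, means, covs):
    """log p(X, Z | theta) for an HMM with Gaussian emissions.

    X : (T, D) observations, z : (T,) integer state sequence,
    pi : (K,), A : (K, K), means : (K, D), covs : (K, D, D)
    """
    T = len(z)
    lp = np.log(pi[z[0]])                                            # log p(z_1)
    lp += sum(np.log(A[z[t - 1], z[t]]) for t in range(1, T))        # transitions
    lp += sum(multivariate_normal.logpdf(X[t], means[z[t]], covs[z[t]])
              for t in range(T))                                     # emissions
    return lp
```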

What can we do with an HMM?

Given a string of observations and the parameters:

(1) We want to find, for a timestamp t, the probabilities of z_t given the observations so far. This process is called Filtering (see the sketch below):
p(z_t | x_1, x_2, ..., x_t)

(2) We want to find, for a timestamp t, the probabilities of z_t given the whole string. This process is called Smoothing:
p(z_t | x_1, x_2, ..., x_T)

(3) Given the observation string, find the string of hidden variables that maximizes the posterior. This process is called Decoding (Viterbi):
arg max_{z_1, ..., z_t} p(z_1, z_2, ..., z_t | x_1, x_2, ..., x_t)
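
Filtering (task (1) above) is carried out by the forward algorithm. Below is a minimal sketch under the same Gaussian-emission assumption, with per-step normalization; the names forward_filter and alpha are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def forward_filter(X, pi, A, means, covs):
    """Return alpha[t, k] = p(z_t = k | x_1..x_t) and log p(x_1..x_T)."""
    T, K = len(X), len(pi)
    alpha = np.zeros((T, K))
    loglik = 0.0
    for t in range(T):
        # emission likelihoods p(x_t | z_t = k) for all states k
        lik = np.array([multivariate_normal.pdf(X[t], means[k], covs[k])
                        for k in range(K)])
        pred = pi if t == 0 else alpha[t - 1] @ A     # p(z_t | x_1..x_{t-1})
        unnorm = lik * pred
        c = unnorm.sum()                              # p(x_t | x_1..x_{t-1})
        alpha[t] = unnorm / c                         # filtered posterior
        loglik += np.log(c)
    return alpha, loglik
```

The accumulated log of the per-step normalizers is exactly the Evaluation quantity log p(x_1, ..., x_T) discussed on a later slide.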

Hidden Markov Models

[Figure: Filtering, Smoothing and Decoding illustrated; taken from Machine Learning: A Probabilistic Perspective by K. Murphy]

Hidden Markov Models

(4) Find the probability of the observations under the model. This process is called Evaluation:
p(x_1, x_2, ..., x_T)

(5) Prediction:
p(z_{t+δ} | x_1, x_2, ..., x_t) and p(x_{t+δ} | x_1, x_2, ..., x_t)

(6) EM parameter estimation (Baum-Welch algorithm) of A, π, θ
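
Decoding (task (3) two slides above) is typically solved with the Viterbi algorithm; Evaluation (task (4)) is already given by the log-likelihood accumulated in the forward pass sketched earlier. Below is a hedged log-space Viterbi sketch reusing the same illustrative parameter names.

```python
import numpy as np
from scipy.stats import multivariate_normal

def viterbi(X, pi, A, means, covs):
    """Most probable state path arg max_z p(z_1..z_T | x_1..x_T), in log-space."""
    T, K = len(X), len(pi)
    log_lik = np.array([[multivariate_normal.logpdf(X[t], means[k], covs[k])
                         for k in range(K)] for t in range(T)])
    delta = np.zeros((T, K))            # best log-score of a path ending in state k
    back = np.zeros((T, K), dtype=int)  # back-pointers
    delta[0] = np.log(pi) + log_lik[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + np.log(A)   # scores[j, k]: j -> k
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_lik[t]
    z = np.zeros(T, dtype=int)
    z[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):      # backtrack along the best path
        z[t] = back[t + 1, z[t + 1]]
    return z
```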

Hidden Markov Models

Filtering: p(z_t | x_1, x_2, ..., x_t)
Decoding: arg max_{z_1, ..., z_t} p(z_1, ..., z_t | x_1, ..., x_t)
Prediction: p(z_{t+δ} | x_1, x_2, ..., x_t), p(x_{t+δ} | x_1, x_2, ..., x_t)
Fixed-lag smoothing: p(z_{t-τ} | x_1, x_2, ..., x_t)
Smoothing: p(z_t | x_1, x_2, ..., x_T)

Linear Dynamical Systems (LDS)

x_1 → x_2 → x_3 → ... → x_T

Up until now we had Markov Chains with discrete variables. How can we define a transition relationship with continuous-valued variables?

x_1 = μ_0 + u,          u ~ N(u | 0, P_0)
x_t = A x_{t-1} + v_t,  v ~ N(v | 0, Γ)

or equivalently

x_1 ~ N(x_1 | μ_0, P_0),      i.e., p(x_1) = N(x_1 | μ_0, P_0)
x_t ~ N(x_t | A x_{t-1}, Γ),  i.e., p(x_t | x_{t-1}) = N(x_t | A x_{t-1}, Γ)

Latent Variable Models (Dynamic, Continuous)

Linear Dynamical Systems (LDS)

[Graphical model: latent chain y_1 → y_2 → ... → y_N, each y_n emitting an observation x_n]

The latent and observed variables share a common linear structure. We want to find the parameters.

Linear Dynamical Systems (LDS)

[Graphical model: latent chain y_1 → y_2 → ... → y_T, each y_t emitting an observation x_t]

Emission model:   x_t = W y_t + e_t,      e ~ N(e | 0, Σ)
Transition model: y_1 = μ_0 + u,          u ~ N(u | 0, P_0)
                  y_t = A y_{t-1} + v_t,  v ~ N(v | 0, Γ)

Parameters: θ = {W, A, μ_0, Σ, Γ, P_0}

Linear Dynamical Systems (LDS)

First timestamp:        p(y_1) = N(y_1 | μ_0, P_0)
Transition probability: p(y_t | y_{t-1}) = N(y_t | A y_{t-1}, Γ)
Emission:               p(x_t | y_t) = N(x_t | W y_t, Σ)
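
As an added illustration (the function name sample_lds and the array layout are assumptions), this sketch draws a latent trajectory and observations from the generative model defined by the three Gaussians above:

```python
import numpy as np

def sample_lds(A, W, mu0, P0, Gamma, Sigma, T, rng=None):
    """Sample y_1..y_T and x_1..x_T from the LDS defined above."""
    rng = np.random.default_rng(rng)
    Dy, Dx = A.shape[0], W.shape[0]
    Y = np.zeros((T, Dy))
    X = np.zeros((T, Dx))
    Y[0] = rng.multivariate_normal(mu0, P0)                   # y_1 = mu_0 + u
    for t in range(1, T):
        Y[t] = rng.multivariate_normal(A @ Y[t - 1], Gamma)   # y_t = A y_{t-1} + v_t
    for t in range(T):
        X[t] = rng.multivariate_normal(W @ Y[t], Sigma)       # x_t = W y_t + e_t
    return Y, X
```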

HMM vs LDS

HMM: Markov Chain with discrete latent variables. LDS: Markov Chain with continuous latent variables.

              HMM                                   LDS
Initial:      p(y_1 | π), π is K×1                  p(y_1) = N(y_1 | μ_0, P_0)
Transition:   p(y_t | y_{t-1}, A), A is K×K         p(y_t | y_{t-1}) = N(y_t | A y_{t-1}, Γ)
Emission:     p(x_t | y_t, B), B is L×K             p(x_t | y_t) = N(x_t | W y_t, Σ)
              (or K emission distributions)

What can we do with LDS?

Global Positioning System (GPS)

What can we do with LDS?

[Figure: GPS example with position measurements {x_t, y_t, z_t} and a clock]

What can we do with LDS?

We still have noisy measurements.

[Figure: noisy measurements, noisy path, filtered path, and actual path]

What can we do with LDS?

Given a string of observations and the parameters:

(1) We want to find, for a timestamp t, the probabilities of y_t given the observations so far. This process is called Filtering:
p(y_t | x_1, x_2, ..., x_t)

(2) We want to find, for a timestamp t, the probabilities of y_t given the whole time series. This process is called Smoothing:
p(y_t | x_1, x_2, ..., x_T)

All the above probabilities are Gaussian. Their means and covariance matrices are computed recursively (see the sketch below).
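
The filtering recursion for the LDS is the Kalman filter: p(y_t | x_1, ..., x_t) = N(y_t | μ_t, V_t), with μ_t and V_t updated recursively. Below is a minimal sketch, assuming the parameters θ = {W, A, μ_0, Σ, Γ, P_0} from the slides and illustrative Python names (no numerical safeguards):

```python
import numpy as np

def kalman_filter(X, A, W, mu0, P0, Gamma, Sigma):
    """Return the means mu_t and covariances V_t of p(y_t | x_1, ..., x_t)."""
    T, Dy = len(X), A.shape[0]
    mus = np.zeros((T, Dy))
    Vs = np.zeros((T, Dy, Dy))
    for t in range(T):
        if t == 0:
            mu_pred, V_pred = mu0, P0                    # prior p(y_1)
        else:
            mu_pred = A @ mus[t - 1]                     # predict step
            V_pred = A @ Vs[t - 1] @ A.T + Gamma
        S = W @ V_pred @ W.T + Sigma                     # innovation covariance
        K = V_pred @ W.T @ np.linalg.inv(S)              # Kalman gain
        mus[t] = mu_pred + K @ (X[t] - W @ mu_pred)      # update with x_t
        Vs[t] = (np.eye(Dy) - K @ W) @ V_pred
    return mus, Vs
```

The smoothed posteriors p(y_t | x_1, ..., x_T) are obtained by an additional backward (Rauch-Tung-Striebel) pass over these filtered estimates.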

What can we do with LDS?

(3) Find the probability of the observations under the model. This process is called Evaluation:
p(x_1, x_2, ..., x_T)

(4) Prediction:
p(y_{t+δ} | x_1, x_2, ..., x_t) and p(x_{t+δ} | x_1, x_2, ..., x_t)

(5) EM parameter estimation (the LDS analogue of the Baum-Welch algorithm) of
θ = {W, A, μ_0, Σ, Γ, P_0}
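
Prediction (task (4)) follows by propagating the last filtered estimate through the transition and emission models; below is a hedged sketch building on the kalman_filter output above (the name lds_predict is an assumption):

```python
import numpy as np

def lds_predict(mu_t, V_t, A, W, Gamma, Sigma, delta=1):
    """Gaussian predictions p(y_{t+delta} | x_1..x_t) and p(x_{t+delta} | x_1..x_t)."""
    mu, V = mu_t, V_t
    for _ in range(delta):
        mu = A @ mu                   # propagate the mean delta steps ahead
        V = A @ V @ A.T + Gamma       # propagate the covariance
    x_mean = W @ mu                   # predicted observation mean
    x_cov = W @ V @ W.T + Sigma       # predicted observation covariance
    return (mu, V), (x_mean, x_cov)
```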