Linear Dynamical Systems


Linear Dynamical Systems
Sargur N. Srihari, srihari@cedar.buffalo.edu
Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html

Two Models Described by the Same Graph
Latent variables and observations share the same graphical structure:
1. Hidden Markov Model: the latent variables are discrete; the observed variables in an HMM may be discrete or continuous.
2. Linear Dynamical System: both the latent and the observed variables are Gaussian.

Motivation
A simple problem arises in practical settings: measure the value of an unknown quantity z using a noisy sensor that returns a value x, where x is z plus zero-mean Gaussian noise. Given a single measurement, the best guess for z is to assume z = x. We can improve the estimate of z by taking many measurements and averaging them, since the random noise terms tend to cancel each other.

Time-Varying Data
Suppose we take regular measurements of x, obtaining x_1, .., x_N, and we wish to find the corresponding values z_1, .., z_N. Simply averaging the measurements reduces the error due to random noise, but it yields only a single averaged estimate. We can do better by averaging only the last few measurements x_{N-L}, .., x_N: if z is changing slowly and the random noise is high, use a relatively long averaging window; if z is varying rapidly and the noise is small, use z_N = x_N. We can do better still with a weighted average in which more recent measurements contribute more than less recent ones, as sketched below.
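As an illustration of this idea (not part of the original slides), here is a minimal sketch comparing a plain running average with an exponentially weighted one. The toy signal and the decay factor `lam` are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy example: a slowly drifting true value z_n observed with Gaussian noise.
N = 200
z = np.cumsum(0.05 * rng.standard_normal(N))   # slowly varying latent value
x = z + 0.5 * rng.standard_normal(N)           # noisy measurements x_n = z_n + noise

# Plain running average: one global estimate that ignores the time variation.
running_avg = np.cumsum(x) / np.arange(1, N + 1)

# Exponentially weighted average: recent measurements get more weight.
lam = 0.9                                      # decay factor (assumed for illustration)
ewma = np.empty(N)
ewma[0] = x[0]
for n in range(1, N):
    ewma[n] = lam * ewma[n - 1] + (1 - lam) * x[n]

print("final running-average error:", abs(running_avg[-1] - z[-1]))
print("final weighted-average error:", abs(ewma[-1] - z[-1]))
```

When z drifts over time, the weighted average tracks it while the plain average lags behind; the LDS can be seen as a principled, probabilistic version of this weighting.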

Formalization of the Intuitive Idea
How should the weighted average be formed? Linear dynamical systems provide a systematic approach: a probabilistic model that captures the time evolution and measurement processes, to which we can then apply inference and learning methods.

HMM versus LDS
Both models correspond to the same factorization:
p(X, Z | θ) = p(z_1 | π) [ ∏_{n=2}^{N} p(z_n | z_{n-1}, A) ] [ ∏_{m=1}^{N} p(x_m | z_m, φ) ]
where X = {x_1, .., x_N}, Z = {z_1, .., z_N}, θ = {π, A, φ}.
The LDS has continuous latent variables, so in the sum-product algorithm the summations become integrals. The general form of the inference algorithms is the same as for the HMM. The two models were developed independently.

Need for the Gaussian Distribution
The key requirement is to retain an efficient inference algorithm that is linear in the length of the chain. The posterior probability of z_{n-1} given x_1, .., x_{n-1} is denoted α(z_{n-1}). It is multiplied by the transition probability p(z_n | z_{n-1}) and the emission probability p(x_n | z_n), and we then marginalize over z_{n-1} to obtain a distribution over z_n. So that the distribution does not become more complex at each stage, we need a distribution from the exponential family, e.g., the Gaussian.
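Written out in the notation above, the recursion just described is

```latex
\alpha(z_n) \;=\; p(x_n \mid z_n) \int \alpha(z_{n-1})\, p(z_n \mid z_{n-1}) \, \mathrm{d}z_{n-1}
```

and the requirement is that α stays in the same (Gaussian) family after every such update.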

Linear-Gaussian Units
Both the latent variables and the observed variables are multivariate Gaussian, and their means are linear functions of the states of their parents in the graph. Such linear-Gaussian units are equivalent to a joint Gaussian distribution over the variables.

Kalman Filter
Since the model is a tree-structured graph, inference can be solved using the sum-product algorithm. The forward recursions, analogous to the alpha messages of the HMM, are known as the Kalman filter equations; the backward messages are known as the Kalman smoother equations.

Kalman Filter Applications
The Kalman filter is used widely in real-time tracking applications. It was used originally in the Apollo program navigation computer and is also used in computer vision.

Joint Distribution over Latent and Observed Variables
p(X, Z | θ) = p(z_1 | π) [ ∏_{n=2}^{N} p(z_n | z_{n-1}, A) ] [ ∏_{m=1}^{N} p(x_m | z_m, φ) ]
where X = {x_1, .., x_N}, Z = {z_1, .., z_N}, θ = {π, A, φ}.

Linear Dynamical System
The LDS is a linear-Gaussian model: the joint distribution over all variables, as well as all marginals and conditionals, is Gaussian. Therefore the sequence of individually most probable latent variable values is the same as the most probable latent sequence, and there is no need for an analogue of the Viterbi algorithm.

LDS Formulation
The transition and emission probabilities are written in the general linear-Gaussian form:
p(z_n | z_{n-1}) = N(z_n | A z_{n-1}, Γ)
p(x_n | z_n) = N(x_n | C z_n, Σ)
The initial latent variable has the form:
p(z_1) = N(z_1 | µ_0, V_0)
The parameters of the model, θ = {A, Γ, C, Σ, µ_0, V_0}, are determined using maximum likelihood through the EM algorithm.
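To make the generative model concrete, here is a minimal sketch (not from the slides) that samples a sequence from an LDS with the parameters above. The specific numerical values of A, Γ, C, Σ, µ_0, V_0 are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy parameters for a 2-D latent state with a 2-D noisy observation.
A = np.array([[1.0, 1.0], [0.0, 1.0]])   # transition matrix (constant-velocity style)
Gamma = 0.01 * np.eye(2)                 # transition noise covariance
C = np.eye(2)                            # emission matrix
Sigma = 0.1 * np.eye(2)                  # emission noise covariance
mu0 = np.zeros(2)                        # initial state mean
V0 = np.eye(2)                           # initial state covariance

def sample_lds(N):
    """Draw z_1..z_N and x_1..x_N from the linear-Gaussian model above."""
    z = np.empty((N, 2))
    x = np.empty((N, 2))
    z[0] = rng.multivariate_normal(mu0, V0)                   # p(z_1) = N(z_1 | mu0, V0)
    x[0] = rng.multivariate_normal(C @ z[0], Sigma)
    for n in range(1, N):
        z[n] = rng.multivariate_normal(A @ z[n - 1], Gamma)   # p(z_n | z_{n-1})
        x[n] = rng.multivariate_normal(C @ z[n], Sigma)       # p(x_n | z_n)
    return z, x

z, x = sample_lds(50)
print(z.shape, x.shape)
```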

Inference in LDS
Inference means finding the marginal distributions of the latent variables conditioned on the observation sequence. For given parameter settings we also wish to predict the next latent state z_n and the next observation x_n conditioned on the observed data x_1, .., x_{n-1}. This is done efficiently using the sum-product algorithm, which in this case yields the Kalman filter and smoother equations. The joint distribution over all latent and observed variables is simply Gaussian, so the inference problem can be solved using the marginals and conditionals of the multivariate Gaussian.

Uncertainty Reduction
The Kalman filter is a process of making successive predictions and then correcting those predictions in light of new observations (a sketch of this predict-correct cycle follows below).
Before observing x_n: p(z_{n-1} | x_1, .., x_{n-1}) incorporates the data up to step n-1; p(z_n | x_1, .., x_{n-1}) shows the diffusion arising from the non-zero variance of the transition probability p(z_n | z_{n-1}).
After observing x_n: p(x_n | z_n) is not a density with respect to z_n and is not normalized to one; including the new data leads to a revised state density p(z_n | x_1, .., x_n), shifted and narrowed compared with p(z_n | x_1, .., x_{n-1}).
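Here is a minimal sketch of the predict-correct cycle described above, written in the notation of the LDS formulation slide. It is an illustration under the stated linear-Gaussian assumptions, not the full set of equations from the course notes (which also track the likelihood of each observation); the usage values at the bottom are arbitrary.

```python
import numpy as np

def kalman_step(mu, V, x_new, A, Gamma, C, Sigma):
    """One predict-correct cycle of the Kalman filter.

    mu, V     : mean and covariance of p(z_{n-1} | x_1..x_{n-1})
    x_new     : the new observation x_n
    A, Gamma  : transition matrix and transition noise covariance
    C, Sigma  : emission matrix and emission noise covariance
    Returns the mean and covariance of p(z_n | x_1..x_n).
    """
    # Predict: push the previous posterior through the transition model.
    mu_pred = A @ mu
    P = A @ V @ A.T + Gamma            # covariance of p(z_n | x_1..x_{n-1})

    # Correct: incorporate the new observation via the Kalman gain.
    S = C @ P @ C.T + Sigma            # predicted observation covariance
    K = P @ C.T @ np.linalg.inv(S)     # Kalman gain
    mu_new = mu_pred + K @ (x_new - C @ mu_pred)
    V_new = P - K @ C @ P
    return mu_new, V_new

# Example usage with toy 2-D parameters (assumed for illustration):
A = np.array([[1.0, 1.0], [0.0, 1.0]]); Gamma = 0.01 * np.eye(2)
C = np.eye(2); Sigma = 0.1 * np.eye(2)
mu, V = np.zeros(2), np.eye(2)
mu, V = kalman_step(mu, V, np.array([0.3, -0.1]), A, Gamma, C, Sigma)
print(mu, V)
```

The "narrowing" described above appears as V_new having smaller variance than the predicted covariance P.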

Illustration: Using an LDS to Track a Moving Object
Blue points: true positions of the object in 2-D space at successive time steps. Green points: noisy measurements of the positions. Red crosses: means of the inferred posterior distributions obtained by running the Kalman filtering equations. Red ellipses: covariances of the inferred positions, corresponding to contours of one standard deviation.

Estimated Path
The estimated path (red dashed line) is closer to the true path (blue dashed line) than the noisy measurements (green dashed line) are.

Learning in LDS
The parameters θ = {A, Γ, C, Σ, µ_0, V_0} are determined using the EM algorithm. Denote the estimated parameter values at some particular cycle of the algorithm as θ_old. For these parameter values we run the inference algorithm to determine the posterior distribution of the latent variables, p(Z | X, θ_old). In the M step, the function Q(θ, θ_old) is maximized with respect to the components of θ.
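For reference, the function maximized in the M step is the usual expected complete-data log likelihood of EM, stated here in standard notation for completeness:

```latex
Q(\theta, \theta^{\mathrm{old}}) \;=\; \mathbb{E}_{Z \mid X,\, \theta^{\mathrm{old}}}\!\left[ \ln p(X, Z \mid \theta) \right]
```

The expectation is taken under the posterior p(Z | X, θ_old) computed in the E step by the Kalman smoother.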

Further Topics in LDS
Extensions of the LDS: using a mixture of K Gaussians; expanding the graphical representation, similar to the HMM extensions; the switching state-space model, which combines the HMM and the LDS.
Particle filters: if a non-Gaussian emission density is used, sampling methods can be employed to obtain a tractable inference algorithm (see the sketch below).
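As an illustration of the particle-filter idea mentioned above (not part of the slides), here is a minimal bootstrap particle filter. The transition, emission, and prior callables, and the toy 1-D example with heavy-tailed observation noise, are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)

def bootstrap_particle_filter(xs, sample_transition, emission_logpdf, sample_prior,
                              n_particles=500):
    """Generic bootstrap particle filter (sequential importance resampling).

    xs                : sequence of observations x_1..x_N
    sample_transition : function(particles) -> particles propagated by p(z_n | z_{n-1})
    emission_logpdf   : function(x, particles) -> log p(x | z) for each particle
    sample_prior      : function(n) -> n samples from p(z_1)
    Returns the filtered posterior mean of z_n at every step.
    """
    particles = sample_prior(n_particles)
    means = []
    for n, x in enumerate(xs):
        if n > 0:
            particles = sample_transition(particles)          # propagate through transition model
        logw = emission_logpdf(x, particles)                  # weight by (possibly non-Gaussian) emission
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append((w[:, None] * particles).sum(axis=0))    # weighted posterior mean estimate
        idx = rng.choice(n_particles, size=n_particles, p=w)  # resample according to the weights
        particles = particles[idx]
    return np.array(means)

# Tiny 1-D example: Gaussian random-walk latent state, Laplace (non-Gaussian) observation noise.
xs = np.cumsum(rng.standard_normal(100)) + rng.laplace(scale=0.5, size=100)
means = bootstrap_particle_filter(
    xs,
    sample_transition=lambda p: p + rng.standard_normal(p.shape),
    emission_logpdf=lambda x, p: -np.abs(x - p[:, 0]) / 0.5,
    sample_prior=lambda n: rng.standard_normal((n, 1)),
)
print(means.shape)
```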

Other Topics on Sequential Data
Sequential Data and Markov Models: http://www.cedar.buffalo.edu/~srihari/cse574/chap11/ch11.1-MarkovModels.pdf
Hidden Markov Models: http://www.cedar.buffalo.edu/~srihari/cse574/chap11/ch11.2-HiddenMarkovModels.pdf
Extensions of HMMs: http://www.cedar.buffalo.edu/~srihari/cse574/chap11/ch11.3-HMMExtensions.pdf
Conditional Random Fields: http://www.cedar.buffalo.edu/~srihari/cse574/chap11/ch11.5-ConditionalRandomFields.pdf