Machine Learning 4771


1 Machine Learning 4771 Instructor: Tony Jebara

2 Kalman Filtering: Linear Dynamical Systems and Kalman Filtering; Structure from Motion

3 Linear Dynamical Systems Examples: Audio: x = pitch, y = acoustic waveform. Vision: x = object pose, y = pixel coordinates. Industrial: x = state of steel, y = temperature/pressure. Military: x = target position, y = radar returns. Bio: x = protein levels, y = gene expression levels. Vision example (Rao).

4 Linear Dynamical Systems A Linear Dynamical System (LDS) or Kalman Filter (KF) has linear state dynamics and linear outputs: x_t = A x_{t-1} + G w_t with w_t ~ N(0, Q), and y_t = C x_t + v_t with v_t ~ N(0, R), where x_t = state (n-dimensional), y_t = observed output (p-dimensional), w_t = process noise (n-dimensional), v_t = measurement noise (p-dimensional). Also called the Linear-Gauss-Markov model (note the Markov property). Stationary LDS: A, C, G, Q, R do not change with t. Non-stationary LDS: A, C, G, Q, R depend on t.
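A minimal sketch of sampling from a stationary LDS as defined above (with G = I); the particular A, C, Q, R values below are arbitrary illustrative assumptions, not from the slides:

```python
import numpy as np

def simulate_lds(A, C, Q, R, x0, T, rng=None):
    """Draw x_0..x_T and y_0..y_T from x_t = A x_{t-1} + w_t, w_t ~ N(0,Q),
    y_t = C x_t + v_t, v_t ~ N(0,R)."""
    rng = np.random.default_rng() if rng is None else rng
    n, p = A.shape[0], C.shape[0]
    xs, ys = np.zeros((T + 1, n)), np.zeros((T + 1, p))
    xs[0] = x0
    ys[0] = C @ xs[0] + rng.multivariate_normal(np.zeros(p), R)
    for t in range(1, T + 1):
        xs[t] = A @ xs[t - 1] + rng.multivariate_normal(np.zeros(n), Q)
        ys[t] = C @ xs[t] + rng.multivariate_normal(np.zeros(p), R)
    return xs, ys

# Illustrative 2-state, 1-output system (made-up parameter values)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = np.array([[1.0, 0.0]])
Q, R = 0.01 * np.eye(2), 0.1 * np.eye(1)
xs, ys = simulate_lds(A, C, Q, R, x0=np.zeros(2), T=100)
```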

5 Gaussian Marginals/Conditionals Conditional & marginal from the joint (see derivation in Jordan 12.4): p(z | μ, Σ) = (2π)^{-D/2} |Σ|^{-1/2} exp( -1/2 (z - μ)^T Σ^{-1} (z - μ) ). For a joint Gaussian over (x, y) with mean (μ_x, μ_y) and covariance blocks [[Σ_xx, Σ_xy], [Σ_yx, Σ_yy]], the marginal is p(x) = N(x | μ_x, Σ_xx) and the conditional is p(y | x) = p(x, y) / p(x) = N(y | μ_y + Σ_yx Σ_xx^{-1} (x - μ_x), Σ_yy - Σ_yx Σ_xx^{-1} Σ_xy). Equivalently, y = μ_y + Σ_yx Σ_xx^{-1} (x - μ_x) + noise, or y = Σ_yx Σ_xx^{-1} x + noise if zero-mean.
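A small numerical sketch of this conditioning formula; the joint mean, covariance, and block sizes below are made-up values for illustration:

```python
import numpy as np

def condition_gaussian(mu, Sigma, dx, x_obs):
    """Given a joint N([x; y] | mu, Sigma) with x of dimension dx,
    return the mean and covariance of p(y | x = x_obs)."""
    mu_x, mu_y = mu[:dx], mu[dx:]
    Sxx, Sxy = Sigma[:dx, :dx], Sigma[:dx, dx:]
    Syx, Syy = Sigma[dx:, :dx], Sigma[dx:, dx:]
    gain = Syx @ np.linalg.inv(Sxx)
    cond_mean = mu_y + gain @ (x_obs - mu_x)   # mu_y + Syx Sxx^{-1} (x - mu_x)
    cond_cov = Syy - gain @ Sxy                # Syy - Syx Sxx^{-1} Sxy
    return cond_mean, cond_cov

# Illustrative joint over scalar x and scalar y (arbitrary numbers)
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.6], [0.6, 2.0]])
m, S = condition_gaussian(mu, Sigma, dx=1, x_obs=np.array([1.5]))
```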

6 LDS as a Gaussian More precisely, the LDS is one large Gaussian distribution. All variables are related by linear-Gaussian conditionals: x_t = A x_{t-1} + G w_t with w_t ~ N(0, Q) and y_t = C x_t + v_t with v_t ~ N(0, R), so p(x_t | x_{t-1}) = N(x_t | A x_{t-1}, Q) and p(y_t | x_t) = N(y_t | C x_t, R). We are assuming G = I, to simplify (Jordan's approach). The first hidden state has no parents, but still needs a Gaussian distribution: p(x_0) = N(x_0 | 0, Q). Without loss of generality, all are zero-mean Gaussians.

7 LDS as a Gaussian The LDS has the following graphical model and conditionals: p(x_t | x_{t-1}) = N(x_t | A x_{t-1}, Q), p(y_t | x_t) = N(y_t | C x_t, R), p(x_0) = N(x_0 | 0, P_0). Note (as with HMMs) it is stationary since A, C, R, Q are constant. Products of Gaussian distributions form a Gaussian: p(X, Y) = p(x_0) ∏_{t=1}^T p(x_t | x_{t-1}) ∏_{t=0}^T p(y_t | x_t) = N([X; Y] | 0, Σ). The graphical model describes a particular factorization of this large Gaussian over all variables. This factorization forms a huge yet sparse overall covariance Σ.

8 Properties of the LDS Since all variables in the LDS are Gaussian, we summarize their conditional probabilities not by tables p(q_t | q_{t-1}) but by the mean & covariance (expectations or moments) of a Gaussian: p(x_t | y_0, ..., y_{t'}) = N(x_t | x̂_{t|t'}, P_{t|t'}), with conditional mean x̂_{t|t'} = E[x_t | y_0, ..., y_{t'}] and conditional covariance P_{t|t'} = E[(x_t - x̂_{t|t'})(x_t - x̂_{t|t'})^T | y_0, ..., y_{t'}], conditioning on the observations y_0, ..., y_{t'}. The Lyapunov equation shows the evolution of the unconditional covariance: Σ_{t+1} = E[x_{t+1} x_{t+1}^T] = E[(A x_t + w_t)(A x_t + w_t)^T] = A E[x_t x_t^T] A^T + 2 A E[x_t w_t^T] + E[w_t w_t^T] = A Σ_t A^T + 0 + Q. If the eigenvalues of A have magnitude < 1, then at steady state Σ_∞ = A Σ_∞ A^T + Q.

9 Steady-State Covariance Consider the Lyapunov recursion Σ_{t+1} = A Σ_t A^T + Q as t goes to infinity. If the eigenvalues of A have magnitude < 1, we can solve for the covariance at steady state (t = infinity): Σ_∞ = A Σ_∞ A^T + Q. Example (Boyd): pick A and Q and iterate the recursion; different initializations (e.g. Σ_0 = 0 or Σ_0 = 100 I) converge to the same steady-state covariance.
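A sketch of iterating the Lyapunov recursion to its fixed point; the A and Q below are placeholder values, not the ones from the slide. (Something like scipy.linalg.solve_discrete_lyapunov could also be used to solve the steady-state equation directly.)

```python
import numpy as np

def steady_state_covariance(A, Q, Sigma0=None, tol=1e-10, max_iter=10000):
    """Iterate Sigma_{t+1} = A Sigma_t A^T + Q until it stops changing.
    Converges when all eigenvalues of A lie strictly inside the unit circle."""
    Sigma = np.zeros_like(Q) if Sigma0 is None else Sigma0.astype(float)
    for _ in range(max_iter):
        Sigma_next = A @ Sigma @ A.T + Q
        if np.max(np.abs(Sigma_next - Sigma)) < tol:
            return Sigma_next
        Sigma = Sigma_next
    return Sigma

# Placeholder A and Q (illustrative values only)
A = np.array([[0.6, 0.2], [-0.1, 0.5]])
Q = np.eye(2)
# Different initializations reach the same steady-state covariance
S1 = steady_state_covariance(A, Q, Sigma0=np.zeros((2, 2)))
S2 = steady_state_covariance(A, Q, Sigma0=100 * np.eye(2))
```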

10 LDS: Basic Operations Recall we did 3 basic things with Bayesian networks: 1) Evaluate: given y_0, ..., y_T compute the likelihood p(y_0, ..., y_T) = Σ_{q_0, ..., q_T} p(q, y); 2) Decode: given y_0, ..., y_T compute p(q_t); 3) Max Likelihood: given y_0, ..., y_T learn the parameters θ. We want 3 basic things with our LDSs: 1) Evaluate: given y_0, ..., y_T compute the likelihood p(y_0, ..., y_T) = ∫_{x_0} ... ∫_{x_T} p(x, y) dx_0 ... dx_T; 2) Filter/Smooth: given y_0, ..., y_T compute p(x_t); 3) Max Likelihood: given y_0, ..., y_T learn the parameters θ. We will derive filtering traditionally, but the Junction Tree algorithm does it all.

11 Kalman Filtering Used in online tracking, with information up to time t. Filtering: given y_0, ..., y_t compute p(x_t) incrementally via x̂_{t|t} = E[x_t | y_0, ..., y_t] and P_{t|t} = E[(x_t - x̂_{t|t})(x_t - x̂_{t|t})^T | y_0, ..., y_t]. Update with 2 steps: (x̂_{t|t}, P_{t|t}) → (x̂_{t+1|t}, P_{t+1|t}) → (x̂_{t+1|t+1}, P_{t+1|t+1}). Time update: moments of p(x_{t+1}) at t+1 given the same y's up to t: x̂_{t+1|t} = E[x_{t+1} | y_0, ..., y_t] = E[A x_t + w_t | y_0, ..., y_t] = A E[x_t | y_0, ..., y_t] + 0 = A x̂_{t|t}, and P_{t+1|t} = E[(x_{t+1} - x̂_{t+1|t})(x_{t+1} - x̂_{t+1|t})^T | y_0, ..., y_t] = E[(A x_t + w_t - A x̂_{t|t})(A x_t + w_t - A x̂_{t|t})^T | y_0, ..., y_t] = A P_{t|t} A^T + Q.

12 Kalman Filtering Measurement Update: moments of p(x_{t+1}) given y_{t+1}. Conditional mean: ŷ_{t+1|t} = E[y_{t+1} | y_0, ..., y_t] = E[C x_{t+1} + v_{t+1} | y_0, ..., y_t] = C x̂_{t+1|t} + 0. Conditional covariance: E[(y_{t+1} - ŷ_{t+1|t})(y_{t+1} - ŷ_{t+1|t})^T | y_0, ..., y_t] = E[(C x_{t+1} + v_{t+1} - C x̂_{t+1|t})(C x_{t+1} + v_{t+1} - C x̂_{t+1|t})^T | y_0, ..., y_t] = C P_{t+1|t} C^T + R. Conditional cross-correlation: E[(x_{t+1} - x̂_{t+1|t})(y_{t+1} - ŷ_{t+1|t})^T | y_0, ..., y_t] = P_{t+1|t} C^T. So: p(x_{t+1}, y_{t+1} | y_0, ..., y_t) = N( [x_{t+1}; y_{t+1}] | [x̂_{t+1|t}; C x̂_{t+1|t}], [[P_{t+1|t}, P_{t+1|t} C^T], [C P_{t+1|t}, C P_{t+1|t} C^T + R]] ).

13 Kalman Filtering Measurement Update: we have the joint p(x_{t+1}, y_{t+1} | y_0, ..., y_t) = N( [x_{t+1}; y_{t+1}] | [x̂_{t+1|t}; C x̂_{t+1|t}], [[P_{t+1|t}, P_{t+1|t} C^T], [C P_{t+1|t}, C P_{t+1|t} C^T + R]] ) and want to condition it to get p(x_{t+1} | y_{t+1}, y_0, ..., y_t). Recall the conditioning formula: p(x | y) = N(x | μ_x + Σ_xy Σ_yy^{-1} (y - μ_y), Σ_xx - Σ_xy Σ_yy^{-1} Σ_yx). Applying it: next conditional mean x̂_{t+1|t+1} = x̂_{t+1|t} + K_{t+1} (y_{t+1} - C x̂_{t+1|t}) and next conditional covariance P_{t+1|t+1} = P_{t+1|t} - K_{t+1} C P_{t+1|t}, where the Kalman gain is K_{t+1} = P_{t+1|t} C^T (C P_{t+1|t} C^T + R)^{-1}. (Recall from the time update: x̂_{t+1|t} = A x̂_{t|t}, P_{t+1|t} = A P_{t|t} A^T + Q.)

14 Kalman Filtering (summary) Start at time t = 0 and run forward until t = T:
x̂_{t+1|t} = A x̂_{t|t}
P_{t+1|t} = A P_{t|t} A^T + Q
K_{t+1} = P_{t+1|t} C^T (C P_{t+1|t} C^T + R)^{-1}
x̂_{t+1|t+1} = x̂_{t+1|t} + K_{t+1} (y_{t+1} - C x̂_{t+1|t})
P_{t+1|t+1} = P_{t+1|t} - K_{t+1} C P_{t+1|t}
Then, the log-likelihood is (using the Gaussian model): log-likelihood = Σ_{t=0}^T log N(y_t | C x̂_{t|t-1}, C P_{t|t-1} C^T + R).
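These recursions translate directly into code. Below is a minimal sketch of the forward pass, accumulating the log-likelihood term from this slide; the function name, the returned quantities, and the initialization via (x0, P0) = (x̂_{0|-1}, P_{0|-1}) are my own choices, not prescribed by the slides:

```python
import numpy as np

def kalman_filter(ys, A, C, Q, R, x0, P0):
    """Forward pass: filtered and one-step-predicted means/covariances, plus
    the total log-likelihood sum_t log N(y_t | C x_{t|t-1}, C P_{t|t-1} C^T + R)."""
    T, p = ys.shape
    n = A.shape[0]
    x_pred, P_pred = x0.copy(), P0.copy()            # x_{0|-1}, P_{0|-1}
    x_filt = np.zeros((T, n)); P_filt = np.zeros((T, n, n))
    x_preds = np.zeros((T, n)); P_preds = np.zeros((T, n, n))
    loglik = 0.0
    for t in range(T):
        x_preds[t], P_preds[t] = x_pred, P_pred
        # Innovation and its covariance
        S = C @ P_pred @ C.T + R
        innov = ys[t] - C @ x_pred
        loglik += -0.5 * (p * np.log(2 * np.pi) + np.linalg.slogdet(S)[1]
                          + innov @ np.linalg.solve(S, innov))
        # Measurement update
        K = P_pred @ C.T @ np.linalg.inv(S)
        x_filt[t] = x_pred + K @ innov
        P_filt[t] = P_pred - K @ C @ P_pred
        # Time update
        x_pred = A @ x_filt[t]
        P_pred = A @ P_filt[t] @ A.T + Q
    return x_filt, P_filt, x_preds, P_preds, loglik
```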

15 Filtering & Smoothing We have a recursive formula for going from (x̂_{t|t}, P_{t|t}) → (x̂_{t+1|t}, P_{t+1|t}) → (x̂_{t+1|t+1}, P_{t+1|t+1}). Initialize with (x̂_{0|-1} = 0, P_{0|-1} = P_0) and loop the updates. Filtering lets us compute the distribution over hidden states X online as we get more measurements: given y_0, ..., y_t compute p(x_t); p(x_t) does not get any information about future y's. Smoothing: assume we have access to the whole sequence at once, so we can get BETTER estimates for the hidden states X: given y_0, ..., y_T compute p(x_t). This is not real-time tracking, since we have all observations.

16 RTS Smoothing Smoothing goes back in time, propagating expectations we know at the last observation to earlier states. Procedure: (x̂_{t+1|T}, P_{t+1|T}) → (x̂_{t|T}, P_{t|T}). Filtering already gave: (x̂_{0|0}, P_{0|0}), ..., (x̂_{t|t}, P_{t|t}), ..., (x̂_{T|T}, P_{T|T}). Recall x̂_{t+1|t} = A x̂_{t|t}, so E[(x_t - x̂_{t|t})(x_{t+1} - x̂_{t+1|t})^T | y_0, ..., y_t] = P_{t|t} A^T. Thus, we have the joint probability p(x_t, x_{t+1} | y_0, ..., y_t) = N( [x_t; x_{t+1}] | [x̂_{t|t}; x̂_{t+1|t}], [[P_{t|t}, P_{t|t} A^T], [A P_{t|t}, P_{t+1|t}]] ). Using the conditioning rule: p(x_t | x_{t+1}, y_0, ..., y_t) = N(x_t | x̂_{t|t} + L_t (x_{t+1} - x̂_{t+1|t}), P_{t|t} - L_t P_{t+1|t} L_t^T), where we define L_t = P_{t|t} A^T P_{t+1|t}^{-1}.

17 RTS Smoothing Iteratively get the conditional mean over the whole sequence: x̂_{t|T} = E[x_t | y_0, ..., y_T] = ∫ x_t p(x_t | y_0, ..., y_T) dx_t = ∫∫ x_t p(x_t, x_{t+1} | y_0, ..., y_T) dx_t dx_{t+1} = ∫ [ ∫ x_t p(x_t | x_{t+1}, y_0, ..., y_T) dx_t ] p(x_{t+1} | y_0, ..., y_T) dx_{t+1} = ∫ [ x̂_{t|t} + L_t (x_{t+1} - x̂_{t+1|t}) ] p(x_{t+1} | y_0, ..., y_T) dx_{t+1} = x̂_{t|t} + L_t (x̂_{t+1|T} - x̂_{t+1|t}). Same for the conditional covariance: P_{t|T} = P_{t|t} + L_t (P_{t+1|T} - P_{t+1|t}) L_t^T. We end up with (x̂_{0|T}, P_{0|T}), ..., (x̂_{t|T}, P_{t|T}), ..., (x̂_{T|T}, P_{T|T}).
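A sketch of the backward pass following the L_t recursion on this slide; it assumes the filtered and one-step-predicted quantities come from a forward pass like the sketch after slide 14:

```python
import numpy as np

def rts_smoother(A, x_filt, P_filt, x_pred, P_pred):
    """Backward pass: given filtered (x_{t|t}, P_{t|t}) and predicted
    (x_{t|t-1}, P_{t|t-1}) moments, return smoothed (x_{t|T}, P_{t|T})."""
    T = x_filt.shape[0]
    x_smooth, P_smooth = x_filt.copy(), P_filt.copy()
    for t in range(T - 2, -1, -1):
        # L_t = P_{t|t} A^T P_{t+1|t}^{-1}
        L = P_filt[t] @ A.T @ np.linalg.inv(P_pred[t + 1])
        x_smooth[t] = x_filt[t] + L @ (x_smooth[t + 1] - x_pred[t + 1])
        P_smooth[t] = P_filt[t] + L @ (P_smooth[t + 1] - P_pred[t + 1]) @ L.T
    return x_smooth, P_smooth
```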

18 Junction Tree Approach Instead of deriving filtering & smoothing, we could use the junction tree algorithm (JTA): forward collect = filter, backward distribute = smooth. Same directed graphical model as an HMM, so we get the same junction tree as an HMM, with cliques ψ(x_t, x_{t+1}) and ψ(x_t, y_t) connected by separators φ(x_t) and ς(x_t). Except now the cliques & separators are not discrete tables but continuous functions of Gaussian form.

19 Junction Tree Approach Rewrite Gaussians in canonical or natural parameters (ξ, K) instead of mean and covariance (μ, Σ): N(x | μ, Σ) ∝ exp( -1/2 (x - μ)^T Σ^{-1} (x - μ) ) ∝ exp( ξ^T x - 1/2 x^T K x ), with ξ = Σ^{-1} μ and K = Σ^{-1}. Property: exp(ξ_1^T x - 1/2 x^T K_1 x) exp(ξ_2^T x - 1/2 x^T K_2 x) = exp(ξ_{1+2}^T x - 1/2 x^T K_{1+2} x), i.e. canonical parameters simply add under multiplication. Initialize all cliques to the conditional Gaussians: ψ(x_{t+1}, x_t) = exp( -1/2 (x_{t+1} - A x_t)^T Q^{-1} (x_{t+1} - A x_t) ) ∝ p(x_{t+1} | x_t), ψ(x_t, y_t) = exp( -1/2 (y_t - C x_t)^T R^{-1} (y_t - C x_t) ) ∝ p(y_t | x_t), and ζ(x_t) = φ(x_t) = 1.
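A tiny sketch of the canonical-parameter idea: converting between moment and canonical form and multiplying two potentials by adding their (ξ, K) parameters. The 1-D example values are made up:

```python
import numpy as np

def to_canonical(mu, Sigma):
    """Moment form (mu, Sigma) -> canonical form (xi, K) with K = Sigma^{-1}, xi = K mu."""
    K = np.linalg.inv(Sigma)
    return K @ mu, K

def multiply_canonical(xi1, K1, xi2, K2):
    """Product of exp(xi^T x - 0.5 x^T K x) potentials: parameters add."""
    return xi1 + xi2, K1 + K2

def to_moment(xi, K):
    """Canonical form back to moment form (for reading off marginals)."""
    Sigma = np.linalg.inv(K)
    return Sigma @ xi, Sigma

# Combine two 1-D Gaussian potentials (arbitrary illustrative values)
xi_a, K_a = to_canonical(np.array([1.0]), np.array([[2.0]]))
xi_b, K_b = to_canonical(np.array([-0.5]), np.array([[0.5]]))
mu, Sigma = to_moment(*multiply_canonical(xi_a, K_a, xi_b, K_b))
```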

20 Junction Tree Algorithm Canonical-form Gaussians make it easy to compute the continuous version of JTA message passing. Collect: φ*_S = ∫_{V\S} ψ_V, then ψ*_W = (φ*_S / φ_S) ψ_W. Distribute: φ**_S = ∫_{W\S} ψ*_W, then ψ**_V = (φ**_S / φ*_S) ψ_V. We get the same implementation as the previous filtering & smoothing. All cliques and separators become the marginals: ψ**(x_{t+1}, x_t) ∝ p(x_{t+1}, x_t | y_0, ..., y_T) and ζ**(x_t) ∝ p(x_t | y_0, ..., y_T).

21 Expectation Maximization If we knew the states X, the maximum complete likelihood would be easy: log p(X, Y) = log p(x_0) + Σ_{t=1}^T log p(x_t | x_{t-1}) + Σ_{t=0}^T log p(y_t | x_t) = -1/2 x_0^T P_0^{-1} x_0 - 1/2 Σ_{t=1}^T (x_t - A x_{t-1})^T Q^{-1} (x_t - A x_{t-1}) - 1/2 Σ_{t=0}^T (y_t - C x_t)^T R^{-1} (y_t - C x_t) + 1/2 log|P_0^{-1}| + (T/2) log|Q^{-1}| + ((T+1)/2) log|R^{-1}| + const. For example, take the derivative over C and set it to 0: ∂ log p / ∂C = Σ_{t=0}^T R^{-1} (y_t - C x_t) x_t^T = 0, giving C = ( Σ_{t=0}^T y_t x_t^T )( Σ_{t=0}^T x_t x_t^T )^{-1}.

22 Expectation Maximization But we don't know the hidden states, so we maximize the incomplete likelihood with EM. E-step: use the current model A, C, Q, R to get expectations; smoothing gives (x̂_{0|T}, P_{0|T}), ..., (x̂_{t|T}, P_{t|T}), ..., (x̂_{T|T}, P_{T|T}). M-step: maximize the expected complete likelihood (replace the x's and outer products with their expectations): C = ( Σ_{t=0}^T y_t x̂_t^T )( Σ_{t=0}^T E[x_t x_t^T] )^{-1}, A = ( Σ_{t=1}^T E[x_t x_{t-1}^T] )( Σ_{t=1}^T E[x_{t-1} x_{t-1}^T] )^{-1}, etc. We actually need the pairwise marginals (see the Ghahramani paper): p(x_{t-1}, x_t | y_0, ..., y_T) = N( [x_{t-1}; x_t] | [x̂_{t-1}; x̂_t], [[P_{t-1}, P_{t-1,t}], [P_{t,t-1}, P_t]] ). Iterate until convergence.
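A sketch of one M-step under these formulas; here `Ex`, `Exx`, and `Exx_prev` are assumed to hold the smoothed expectations E[x_t], E[x_t x_t^T], and E[x_t x_{t-1}^T] that the E-step (smoothing plus pairwise marginals) would supply:

```python
import numpy as np

def m_step_A_C(ys, Ex, Exx, Exx_prev):
    """One EM M-step for A and C from smoothed sufficient statistics.
    ys:       (T+1, p) observations
    Ex:       (T+1, n) smoothed means E[x_t | y_0..y_T]
    Exx:      (T+1, n, n) E[x_t x_t^T | y_0..y_T] = P_{t|T} + x_hat x_hat^T
    Exx_prev: (T, n, n)   E[x_t x_{t-1}^T | y_0..y_T] (pairwise marginals)."""
    # C = (sum_t y_t E[x_t]^T)(sum_t E[x_t x_t^T])^{-1}
    C = (ys.T @ Ex) @ np.linalg.inv(Exx.sum(axis=0))
    # A = (sum_{t>=1} E[x_t x_{t-1}^T])(sum_{t>=1} E[x_{t-1} x_{t-1}^T])^{-1}
    A = Exx_prev.sum(axis=0) @ np.linalg.inv(Exx[:-1].sum(axis=0))
    return A, C
```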

23 LDS Smoothing Example Given some data y_0, ..., y_T, fit an LDS model with EM to obtain A, C, Q, R, P_0. Then do smoothing to get ..., (x̂_{t|T}, P_{t|T}), ... and show the expected y's: ..., C x̂_{t|T}, ...

24 Nonlinear Dynamical Systems What about nonlinear dynamics? The LDS is simplistic since many real phenomena have nonlinear state evolution and nonlinear output given state: x_t = a(x_{t-1}) + w_t with w_t ~ N(0, Q), and y_t = c(x_t) + v_t with v_t ~ N(0, R), where x_t = state (n-dimensional), w_t = process noise (n-dimensional), y_t = observed output (p-dimensional), v_t = measurement noise (p-dimensional). Typically we prespecify the functions a() and c(). This is also called an Extended Kalman Filter. Good news: the basic filtering & smoothing algorithms stay the same and we can recover hidden states. Bad news: not guaranteed optimal, and EM learning is harder.

25 Extended Kalman Filtering The regular Kalman filter becomes non-stationary. We need to approximate the nonlinear a() and c() functions at each step in time with their best constant (linear) estimates A_t and C_t given our guess of the hidden state: A_t = ∂a(x)/∂x and C_t = ∂c(x)/∂x, each evaluated at the current state estimate x̂_{t-1|t-1}. Change the filtering equations as follows:
x̂_{t+1|t} = a(x̂_{t|t})
P_{t+1|t} = A_t P_{t|t} A_t^T + Q
K_{t+1} = P_{t+1|t} C_t^T (C_t P_{t+1|t} C_t^T + R)^{-1}
x̂_{t+1|t+1} = x̂_{t+1|t} + K_{t+1} (y_{t+1} - c(x̂_{t+1|t}))
P_{t+1|t+1} = P_{t+1|t} - K_{t+1} C_t P_{t+1|t}
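A minimal sketch of one EKF step under the linearization above; the nonlinear functions a, c and their Jacobians jac_a, jac_c are assumed to be supplied by the user, and evaluating the observation Jacobian at the predicted state is a common variant chosen here for illustration:

```python
import numpy as np

def ekf_step(x_filt, P_filt, y_next, a, c, jac_a, jac_c, Q, R):
    """One extended Kalman filter step: linearize a() and c() at the current
    estimates, then apply the usual time and measurement updates."""
    A_t = jac_a(x_filt)                       # Jacobian of dynamics at x_{t|t}
    x_pred = a(x_filt)                        # x_{t+1|t} = a(x_{t|t})
    P_pred = A_t @ P_filt @ A_t.T + Q
    C_t = jac_c(x_pred)                       # Jacobian of observation at x_{t+1|t}
    S = C_t @ P_pred @ C_t.T + R
    K = P_pred @ C_t.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (y_next - c(x_pred))
    P_new = P_pred - K @ C_t @ P_pred
    return x_new, P_new
```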
