Probabilistic Graphical Models
Brown University CSCI 2950-P, Spring 2013
Prof. Erik Sudderth
Lecture 12: Gaussian Belief Propagation, State Space Models and Kalman Filters
Guest Kalman filter lecture by Jason Pacheco
Some figures courtesy Michael Jordan's draft textbook, An Introduction to Probabilistic Graphical Models

Pairwise Markov Random Fields

p(x) = \frac{1}{Z} \prod_{(s,t) \in E} \psi_{st}(x_s, x_t) \prod_{s \in V} \psi_s(x_s)

V: set of N nodes or vertices, \{1, 2, \ldots, N\}
E: set of undirected edges (s,t) linking pairs of nodes
Z: normalization constant (partition function)

Simple parameterization, but still expressive and widely used in practice. Guaranteed Markov with respect to the graph. Any jointly Gaussian distribution can be represented by only pairwise potentials.

Belief Propagation (Integral-Product)

BELIEFS (posterior marginals):
\hat{p}_t(x_t) \propto \psi_t(x_t) \prod_{u \in \Gamma(t)} m_{ut}(x_t)

MESSAGES (sufficient statistics):
m_{ts}(x_s) \propto \int_{x_t} \psi_{st}(x_s, x_t)\, \psi_t(x_t) \prod_{u \in \Gamma(t) \setminus s} m_{ut}(x_t)\, dx_t

\Gamma(t): neighborhood of node t (adjacent nodes)

I) Message Product   II) Message Propagation

Gaussian Distributions
The simplest joint distribution that can capture an arbitrary mean & covariance, with justifications from the central limit theorem and the maximum entropy criterion. The Gaussian probability density assumes the covariance is positive definite. ML parameter estimates are the sample mean & sample covariance.
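
As a concrete illustration of the last point, here is a minimal NumPy sketch of the ML estimates (the function name and the simulated data are assumed for illustration, not from the slides):

```python
import numpy as np

def gaussian_ml_estimates(X):
    """ML estimates for a multivariate Gaussian from data X of shape (N, d)."""
    mu_hat = X.mean(axis=0)                         # sample mean
    centered = X - mu_hat
    Sigma_hat = centered.T @ centered / X.shape[0]  # sample covariance (ML form, divides by N)
    return mu_hat, Sigma_hat

# Recover known parameters from simulated data
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[1.0, -2.0], cov=[[2.0, 0.5], [0.5, 1.0]], size=5000)
mu_hat, Sigma_hat = gaussian_ml_estimates(X)
print(mu_hat)
print(Sigma_hat)
```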

Gaussian Conditionals & Marginals For any joint multivariate Gaussian distribution, all marginal distributions are Gaussians, and all conditional distributions are Gaussians

Partitioned Gaussian Distributions
Marginals, conditionals, intuition:
p(x_1 \mid x_2) = \frac{p(x_1, x_2)}{p(x_2)} \propto \exp\left\{ -\tfrac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu) + \tfrac{1}{2}(x_2 - \mu_2)^T \Sigma_{22}^{-1} (x_2 - \mu_2) \right\}

Gaussian Conditionals & Marginals
p(x_1 \mid x_2) = \mathcal{N}\left( x_1 \,\Big|\, \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \right)
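
A small NumPy sketch of this formula (the helper name and the numbers are assumed examples):

```python
import numpy as np

def gaussian_conditional(mu, Sigma, idx1, idx2, x2):
    """Mean and covariance of p(x1 | x2) for a joint Gaussian N(mu, Sigma)."""
    mu1, mu2 = mu[idx1], mu[idx2]
    S11 = Sigma[np.ix_(idx1, idx1)]
    S12 = Sigma[np.ix_(idx1, idx2)]
    S22 = Sigma[np.ix_(idx2, idx2)]
    gain = S12 @ np.linalg.inv(S22)      # Sigma_12 Sigma_22^{-1}
    cond_mean = mu1 + gain @ (x2 - mu2)
    cond_cov = S11 - gain @ S12.T        # Schur complement Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21
    return cond_mean, cond_cov

mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.3, 0.1],
                  [0.3, 1.0, 0.2],
                  [0.1, 0.2, 1.5]])
print(gaussian_conditional(mu, Sigma, [0], [1, 2], np.array([0.5, -0.5])))
```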

Gaussian Graphical Models
x \sim \mathcal{N}(\mu, \Sigma), \qquad J = \Sigma^{-1}

Gaussian Potentials

Interpreting GMRF Parameters
\mathrm{var}(x_s \mid x_{N(s)}) = (J_{s,s})^{-1}
\rho_{st \mid N(s,t)} = \frac{\mathrm{cov}(x_s, x_t \mid x_{N(s,t)})}{\sqrt{\mathrm{var}(x_s \mid x_{N(s,t)})\, \mathrm{var}(x_t \mid x_{N(s,t)})}} = \frac{-J_{s,t}}{\sqrt{J_{s,s}\, J_{t,t}}}
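
A quick numeric check of the second identity (the covariance matrix below is an assumed example, not from the slides):

```python
import numpy as np

# Check that the conditional correlation of (x_s, x_t) given the remaining
# variables equals -J_{s,t} / sqrt(J_{s,s} J_{t,t}), where J is the precision matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + 4.0 * np.eye(4)       # random positive-definite covariance
J = np.linalg.inv(Sigma)

s, t, rest = 0, 1, [2, 3]
# Conditional covariance of (x_s, x_t) given x_rest via the Schur complement
S_ab = Sigma[np.ix_([s, t], [s, t])]
S_ar = Sigma[np.ix_([s, t], rest)]
S_rr = Sigma[np.ix_(rest, rest)]
cond_cov = S_ab - S_ar @ np.linalg.inv(S_rr) @ S_ar.T
rho_direct = cond_cov[0, 1] / np.sqrt(cond_cov[0, 0] * cond_cov[1, 1])
rho_precision = -J[s, t] / np.sqrt(J[s, s] * J[t, t])
print(rho_direct, rho_precision)        # the two values agree
```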

Gaussian Markov Properties
x \sim \mathcal{N}(\mu, \Sigma), \qquad J = \Sigma^{-1}
Theorem 2.2. Let x \sim \mathcal{N}(0, P) be a Gaussian stochastic process which is Markov with respect to an undirected graph G = (V, E). Assume that x is not Markov with respect to any G' = (V, E') such that E' \subset E, and partition J = P^{-1} into a |V| \times |V| grid according to the dimensions of the node variables. Then for any s, t \in V such that s \neq t, J_{s,t} = J_{t,s}^T will be nonzero if and only if (s, t) \in E.

Inference with Gaussian Observations

Linear Gaussian Systems
Marginal likelihood, posterior distribution, and Gaussian BP updates. Gaussian BP complexity is linear, not cubic, in the number of nodes.

Belief Propagation (Integral-Product)

BELIEFS (posterior marginals):
\hat{p}_t(x_t) \propto \psi_t(x_t) \prod_{u \in \Gamma(t)} m_{ut}(x_t)

MESSAGES (sufficient statistics):
m_{ts}(x_s) \propto \int_{x_t} \psi_{st}(x_s, x_t)\, \psi_t(x_t) \prod_{u \in \Gamma(t) \setminus s} m_{ut}(x_t)\, dx_t

\Gamma(t): neighborhood of node t (adjacent nodes)

I) Message Product   II) Message Propagation

Gaussian Belief Propagation
The natural, canonical, or information parameterization of a Gaussian distribution arises from its quadratic form:
\mathcal{N}^{-1}(x \mid \vartheta, \Lambda) \propto \exp\left\{ -\tfrac{1}{2} x^T \Lambda x + \vartheta^T x \right\}, \qquad \Lambda = \Sigma^{-1}, \quad \vartheta = \Sigma^{-1}\mu
Gaussian BP represents messages and marginals in this form:
m_{ts}(x_s) = \alpha\, \mathcal{N}^{-1}(x_s \mid \vartheta_{ts}, \Lambda_{ts}), \qquad p(x_s \mid y) = \mathcal{N}^{-1}(x_s \mid \vartheta_s, \Lambda_s)

Gaussian Belief Propagation
Gaussian BP represents messages and marginals as
m_{ts}(x_s) = \alpha\, \mathcal{N}^{-1}(x_s \mid \vartheta_{ts}, \Lambda_{ts}), \qquad p(x_s \mid y) = \mathcal{N}^{-1}(x_s \mid \vartheta_s, \Lambda_s)
Gaussian BP belief updates then have a simple form:
p(x_s \mid y^n_s) = \alpha\, p(y_s \mid x_s) \prod_{t \in N(s)} m^n_{ts}(x_s), \qquad p(y_s \mid x_s) = \alpha\, \mathcal{N}^{-1}\!\left( C_s^T R_s^{-1} y_s,\; C_s^T R_s^{-1} C_s \right)
\vartheta^n_s = C_s^T R_s^{-1} y_s + \sum_{t \in N(s)} \vartheta^n_{ts}, \qquad \Lambda^n_s = C_s^T R_s^{-1} C_s + \sum_{t \in N(s)} \Lambda^n_{ts}

Gaussian Belief Propagation
m^n_{ts}(x_s) = \alpha \int_{x_t} \psi_{s,t}(x_s, x_t)\, p(y_t \mid x_t) \prod_{u \in N(t) \setminus s} m^{n-1}_{ut}(x_t)\, dx_t
The integrand is jointly Gaussian in (x_s, x_t), with information parameters \mathcal{N}^{-1}(\vartheta, \Lambda) given by
\vartheta = \begin{bmatrix} 0 \\ C_t^T R_t^{-1} y_t + \sum_{u \in N(t) \setminus s} \vartheta^{n-1}_{ut} \end{bmatrix}, \qquad \Lambda = \begin{bmatrix} J_{s(t)} & J_{s,t} \\ J_{t,s} & J_{t(s)} + C_t^T R_t^{-1} C_t + \sum_{u \in N(t) \setminus s} \Lambda^{n-1}_{ut} \end{bmatrix}

Gaussian Belief Propagation
Matrix Inversion Lemma:
M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}
M^{-1} = \begin{bmatrix} (A - BD^{-1}C)^{-1} & -(A - BD^{-1}C)^{-1} B D^{-1} \\ -D^{-1} C (A - BD^{-1}C)^{-1} & D^{-1} + D^{-1} C (A - BD^{-1}C)^{-1} B D^{-1} \end{bmatrix}
= \begin{bmatrix} A^{-1} + A^{-1} B (D - CA^{-1}B)^{-1} C A^{-1} & -A^{-1} B (D - CA^{-1}B)^{-1} \\ -(D - CA^{-1}B)^{-1} C A^{-1} & (D - CA^{-1}B)^{-1} \end{bmatrix}
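
A quick numeric check of the block form (the random matrices below are assumed examples):

```python
import numpy as np

# Verify the first block expression for M^{-1} numerically.
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)) + 4.0 * np.eye(3)
B = rng.standard_normal((3, 2))
C = rng.standard_normal((2, 3))
D = rng.standard_normal((2, 2)) + 4.0 * np.eye(2)
M = np.block([[A, B], [C, D]])

Dinv = np.linalg.inv(D)
S = np.linalg.inv(A - B @ Dinv @ C)     # inverse Schur complement of D
Minv_blocks = np.block([
    [S,               -S @ B @ Dinv],
    [-Dinv @ C @ S,    Dinv + Dinv @ C @ S @ B @ Dinv],
])
print(np.allclose(Minv_blocks, np.linalg.inv(M)))   # True
```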

Gaussian Belief Propagation
Integrating out x_t (via the matrix inversion lemma) gives closed-form message updates:
m^n_{ts}(x_s) = \alpha \int_{x_t} \psi_{s,t}(x_s, x_t)\, p(y_t \mid x_t) \prod_{u \in N(t) \setminus s} m^{n-1}_{ut}(x_t)\, dx_t = \mathcal{N}^{-1}(x_s \mid \vartheta^n_{ts}, \Lambda^n_{ts})
\Lambda^n_{ts} = J_{s(t)} - J_{s,t} \Big( J_{t(s)} + C_t^T R_t^{-1} C_t + \sum_{u \in N(t) \setminus s} \Lambda^{n-1}_{ut} \Big)^{-1} J_{t,s}
\vartheta^n_{ts} = -J_{s,t} \Big( J_{t(s)} + C_t^T R_t^{-1} C_t + \sum_{u \in N(t) \setminus s} \Lambda^{n-1}_{ut} \Big)^{-1} \Big( C_t^T R_t^{-1} y_t + \sum_{u \in N(t) \setminus s} \vartheta^{n-1}_{ut} \Big)

State Space Models
Like an HMM, but over continuous latent variables. The plant equations are the linear dynamics and measurement models given on the next two slides.

State Space Models
Dynamics: x_{t+1} = A x_t + w_t, with process noise w_t \sim \mathcal{N}(0, Q). Same as p(x_{t+1} \mid x_t) = \mathcal{N}(x_{t+1} \mid A x_t, Q).

State Space Models
Measurement: y_t = C x_t + v_t, with measurement noise v_t \sim \mathcal{N}(0, R). Same as p(y_t \mid x_t) = \mathcal{N}(y_t \mid C x_t, R). Together, this is known as the linear Gaussian assumption.
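
Putting the two pieces together (a standard identity, stated here for reference rather than taken from the slides), the model factorizes exactly like an HMM, but with Gaussian factors:
p(x_{1:T}, y_{1:T}) = p(x_1) \prod_{t=2}^{T} p(x_t \mid x_{t-1}) \prod_{t=1}^{T} p(y_t \mid x_t)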

Example: Target Tracking
Target state: position and velocity. Measurement: position only (lower-dimensional). Constant-velocity dynamics (a first-order differential equation).
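
A hedged sketch of this model in NumPy (the time step, noise levels, and initial state below are assumptions of the sketch): state = [position, velocity], measurement = noisy position only.

```python
import numpy as np

dt = 1.0
A = np.array([[1.0, dt],
              [0.0, 1.0]])         # constant-velocity dynamics
C = np.array([[1.0, 0.0]])         # observe position only (lower-dimensional)
Q = 0.01 * np.eye(2)               # process noise covariance
R = np.array([[0.25]])             # measurement noise covariance

rng = np.random.default_rng(3)
x = np.array([0.0, 1.0])           # start at position 0 with velocity 1
for t in range(5):
    x = A @ x + rng.multivariate_normal(np.zeros(2), Q)   # state transition
    y = C @ x + rng.multivariate_normal(np.zeros(1), R)   # noisy position measurement
    print(t, x, y)
```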

Example: Target Tracking

Correspondence With Factor Analysis
The state space measurement model has the same linear Gaussian form as factor analysis; the state space model additionally links the latent variables across time through the linear Gaussian dynamics.

Inference for State Space Models
Broken into two parts: filtering (forward pass), p(x_t \mid y_1, \ldots, y_t), and smoothing (backward pass), p(x_t \mid y_1, \ldots, y_T). Why "filter"? The term comes from signal processing: it filters out system noise to produce a state estimate.

Conditional Moments
Everything is Gaussian, so we can focus on computing conditional moments: the conditional mean and the conditional variance of the state given the observations.

Kalman Filter
Two recursive updates:
1) Time update: the best guess before seeing the measurement.
2) Measurement update: the corrected estimate after seeing the measurement.

Kalman Filter
1) Time update
2) Measurement update
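
For reference, the standard linear Gaussian recursions in the notation above (a reconstruction; \hat{x}_{t|s} and P_{t|s} denote the conditional mean and covariance of x_t given y_1, \ldots, y_s):
Time update: \hat{x}_{t|t-1} = A\, \hat{x}_{t-1|t-1}, \qquad P_{t|t-1} = A P_{t-1|t-1} A^T + Q
Measurement update: K_t = P_{t|t-1} C^T (C P_{t|t-1} C^T + R)^{-1}, \qquad \hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t (y_t - C \hat{x}_{t|t-1}), \qquad P_{t|t} = (I - K_t C) P_{t|t-1}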

Kalman Filter: Time Update
Recall the dynamics x_t = A x_{t-1} + w_t. Since the noise is zero in expectation, the conditional mean recursion is \hat{x}_{t|t-1} = A\, \hat{x}_{t-1|t-1}. Similarly for the covariance, P_{t|t-1} = A P_{t-1|t-1} A^T + Q.

Gaussian Conditionals For any joint multivariate Gaussian distribution, all conditional distributions are Gaussians

Gaussian Conditionals
For a Gaussian joint distribution with a given inverse covariance (precision) matrix, the conditional is Gaussian, with parameters given by the partitioned-Gaussian formulas above; in particular, the conditional covariance is the inverse of the corresponding block of the precision matrix.

Kalman Filter: Measurement Update
Form the joint over (x_t, y_t) from the time update and the measurement equation, then compute the conditional of x_t given y_t.

Kalman Filter: Measurement Update
The quantity multiplying the observation is called the Kalman gain matrix, because it multiplies the observation, i.e. produces "gain". The update takes a linear combination of the predicted mean and the observation, weighted by the predicted covariance.
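
A minimal NumPy sketch of these two updates (the function name, argument conventions, and initialization are assumptions of this sketch, not taken from the slides):

```python
import numpy as np

def kalman_filter(y, A, C, Q, R, mu0, P0):
    """Kalman filter for x_t = A x_{t-1} + w_t, y_t = C x_t + v_t.
    y has shape (T, m); returns filtered means and covariances at each step."""
    T, d = y.shape[0], A.shape[0]
    means, covs = np.zeros((T, d)), np.zeros((T, d, d))
    mu, P = mu0, P0
    for t in range(T):
        # Time update (predict): propagate the estimate through the dynamics
        mu_pred = A @ mu
        P_pred = A @ P @ A.T + Q
        # Measurement update (correct): Kalman gain weighs prediction vs. observation
        S = C @ P_pred @ C.T + R                # innovation covariance
        K = P_pred @ C.T @ np.linalg.inv(S)     # Kalman gain matrix
        mu = mu_pred + K @ (y[t] - C @ mu_pred)
        P = P_pred - K @ C @ P_pred             # = (I - K C) P_pred
        means[t], covs[t] = mu, P
    return means, covs
```

Note that P_pred and P never touch the data y, which is exactly the point made on the next slide.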

Kalman Filter
Consider the covariance updates: they are independent of the observed measurements. This is a property of Gaussians in general; the covariances depend only on the process and measurement noise, and so can be computed offline.

Kalman Filter

Information Filter
Recall that a Gaussian has an equivalent canonical parameterization, sometimes called the information form, with inverse covariance (precision) \Lambda = \Sigma^{-1} and canonical mean \vartheta = \Sigma^{-1}\mu. The recursive updates follow from the definitions; the conditioning of these matrices mirrors that of their inverses.

Information Filter
Define the predicted and updated information parameters and focus on the precision update. Applying the matrix inversion lemma gives the measurement update... wait a minute, this looks familiar...
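
A sketch of the resemblance (reconstructed in the notation above, not copied from the slide): in information form the measurement update is purely additive,
\Lambda_{t|t} = \Lambda_{t|t-1} + C^T R^{-1} C, \qquad \vartheta_{t|t} = \vartheta_{t|t-1} + C^T R^{-1} y_t
which has the same structure as the Gaussian BP belief update \Lambda^n_s = C_s^T R_s^{-1} C_s + \sum_{t \in N(s)} \Lambda^n_{ts}, \; \vartheta^n_s = C_s^T R_s^{-1} y_s + \sum_{t \in N(s)} \vartheta^n_{ts}.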

Gaussian Belief Propagation
Message update and belief update (as given above).

Gaussian BP => Kalman Filter
On the state space chain, the Gaussian BP message and belief updates reduce to the Kalman filter's time update and measurement update.

Smoother
Combines forward and backward probabilities to produce the full marginal posterior. Similar to inference in an HMM (the forward-backward algorithm).

Smoother
Can we just invert the dynamics and run the Kalman filter backwards?
(Chain graph: x_1, ..., x_{T-2}, x_{T-1}, x_T with observations y_1, ..., y_{T-2}, y_{T-1}, y_T.)
No: in the time-reversed chain, w_t is no longer independent of the past state (i.e. x_{t+1}, ..., x_T).

Unconditional Distribution
Marginal distributions of a Gaussian are Gaussian. The unconditional mean follows from the zero-mean noise assumption, and the covariance is computed recursively; it does not depend on the means.
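
As a reconstruction in the notation above (these exact equations are assumed, not copied from the slide), the unconditional moments of the state follow
\mu_{t+1} = A\, \mu_t, \qquad \Sigma_{t+1} = A\, \Sigma_t A^T + Q
so the covariance recursion indeed never involves the means.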

Smoother
Given the unconditional marginal of x_{t+1}, form the unconditional joint over (x_t, x_{t+1}), solve for the reverse dynamics p(x_t \mid x_{t+1}), and run the filter backwards with the new dynamics.

Smoother
(Chain graph: x_1, ..., x_{T-2}, x_{T-1}, x_T with observations y_1, ..., y_{T-2}, y_{T-1}, y_T.)

Smoother
Forward conditional and backward conditional: Gaussians are closed under multiplication, so multiplying the two produces the full smoothed marginal. The Kalman filter + smoother is equivalent to Gaussian BP.
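
For completeness, a sketch of the mathematically equivalent Rauch-Tung-Striebel (RTS) form of the backward pass, which folds the reverse-dynamics filter and the multiplication step into a single correction (this particular formulation is an assumption of the sketch, not taken from the slides); it consumes the filtered means and covariances produced by a forward pass such as the kalman_filter sketch above:

```python
import numpy as np

def rts_smoother(means, covs, A, Q):
    """Backward pass over filtered means (T, d) and covariances (T, d, d),
    returning the smoothed marginals p(x_t | y_1, ..., y_T)."""
    T = means.shape[0]
    sm_means, sm_covs = means.copy(), covs.copy()
    for t in range(T - 2, -1, -1):
        P_pred = A @ covs[t] @ A.T + Q                 # predicted covariance at t+1
        G = covs[t] @ A.T @ np.linalg.inv(P_pred)      # smoother gain
        sm_means[t] = means[t] + G @ (sm_means[t + 1] - A @ means[t])
        sm_covs[t] = covs[t] + G @ (sm_covs[t + 1] - P_pred) @ G.T
    return sm_means, sm_covs
```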

Summary
The Kalman filter is the optimal filter for the linear Gaussian state space model. The smoother provides full marginal inference. Gaussian BP produces an equivalent algorithm, and the correspondence is shown most clearly in the information form of the filter.