Dynamic models 1: Kalman filters, linearization, switching KFs, assumed density filters

Readings: Koller & Friedman, Chapter 16; Jordan, Chapters 13, 15; Uri Lerner's thesis, Chapters 3, 9. Dynamic models 1: Kalman filters, linearization, switching KFs, assumed density filters. Probabilistic Graphical Models 10-708, Carlos Guestrin, Carnegie Mellon University, November 16th, 2005.

Announcement: special recitation lectures. Pradeep will give two special lectures, Nov. 22 & Dec. 1, 5-6pm, during recitation, covering variational methods, loopy BP, and their relationship. Don't miss them!

Adventures of our BN hero: compact representation for probability distributions, fast inference, fast learning, approximate inference. But... who are the most popular kids? 1. Naïve Bayes; 2 and 3. Hidden Markov models (HMMs) and Kalman filters.

The Kalman filter: an HMM with Gaussian distributions. It has been around for at least 50 years and is possibly the most used graphical model ever: it's what runs your cruise control, tracks missiles, controls robots, ... And it's so simple, which possibly explains why it's so widely used. Many interesting models build on it. Today: review and extensions.

Example of KF: SLAT, Simultaneous Localization and Tracking [Funiak, Guestrin, Paskin, Sukthankar '05]. Place some cameras around an environment; you don't know where they are. You could measure all the locations, but that requires lots of grad student (Stano) time. Intuition: a person walks around; if camera 1 sees the person and then camera 2 sees the person, we learn about the relative positions of the cameras.

Example of KF SLAT Simultaneous Localization and Tracking [Funiak, Guestrin, Paskin, Sukthankar 05]

Multivariate Gaussian: p(X) ~ N(µ; Σ), with mean vector µ and covariance matrix Σ.
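
As a concrete illustration (not from the slides), here is a minimal sketch of representing a multivariate Gaussian by its mean vector and covariance matrix and evaluating its density with scipy's multivariate_normal; the 2-D numbers are made up for the example.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative 2-D Gaussian: mean vector and covariance matrix.
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 0.5]])

p = multivariate_normal(mean=mu, cov=Sigma)
x = np.array([1.5, -1.8])
print(p.pdf(x))       # density at x
print(p.rvs(size=3))  # a few samples
```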

Conditioning a Gaussian: joint Gaussian p(X,Y) ~ N(µ; Σ); the conditional p(Y|X) ~ N(µ_{Y|X}; σ²) is a linear Gaussian.

Gaussian is a linear model. Conditional linear Gaussian: p(Y|X) ~ N(β_0 + βX; σ²).

Conditioning a Gaussian (multivariate case): joint Gaussian p(X,Y) ~ N(µ; Σ); conditional linear Gaussian p(Y|X) ~ N(µ_{Y|X}; Σ_{YY|X}).

Conditional linear Gaussian (CLG), general case: p(Y|X) ~ N(β_0 + BX; Σ_{YY|X}).
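
To make the conditioning operation concrete, here is a minimal sketch (not from the slides) of the standard formulas for conditioning a joint Gaussian p(X,Y) ~ N(µ; Σ) on X = x: the conditional mean is µ_Y + Σ_{YX} Σ_{XX}⁻¹ (x − µ_X) and the conditional covariance is Σ_{YY} − Σ_{YX} Σ_{XX}⁻¹ Σ_{XY}. The function name and the block sizes and numbers below are illustrative.

```python
import numpy as np

def condition_gaussian(mu, Sigma, x_idx, y_idx, x_value):
    """Return mean and covariance of p(Y | X = x_value) for a joint Gaussian N(mu, Sigma).

    x_idx, y_idx: index lists selecting the X and Y blocks of the joint.
    """
    mu_x, mu_y = mu[x_idx], mu[y_idx]
    S_xx = Sigma[np.ix_(x_idx, x_idx)]
    S_yx = Sigma[np.ix_(y_idx, x_idx)]
    S_yy = Sigma[np.ix_(y_idx, y_idx)]

    gain = np.linalg.solve(S_xx, S_yx.T).T        # Σ_YX Σ_XX^{-1} (S_xx is symmetric)
    mu_y_given_x = mu_y + gain @ (x_value - mu_x)
    S_y_given_x = S_yy - gain @ S_yx.T
    return mu_y_given_x, S_y_given_x

# Usage with an illustrative 3-D joint (X is dimension 0, Y is dimensions 1 and 2):
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 2.0, 0.3],
                  [0.2, 0.3, 1.5]])
m, S = condition_gaussian(mu, Sigma, x_idx=[0], y_idx=[1, 2], x_value=np.array([0.7]))
```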

Understanding a linear Gaussian, the 2-D case: variance increases over time (motion noise adds up), and the object doesn't necessarily move in a straight line.

Tracking with a Gaussian 1: prior p(X_0) ~ N(µ_0; Σ_0); transition model p(X_{i+1} | X_i) ~ N(B X_i + β; Σ_{X_{i+1}|X_i}).

Tracking with Gaussians 2, making observations: we have p(X_i); the detector observes O_i = o_i; we want to compute p(X_i | O_i = o_i). Use Bayes' rule; this requires a CLG observation model p(O_i | X_i) ~ N(W X_i + v; Σ_{O_i|X_i}).
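
Putting the transition and observation models together, here is a minimal sketch (not from the slides) of one Kalman filter step in standard mean/covariance form, assuming the linear-Gaussian models above: transition X_{i+1} = B X_i + β + noise with covariance Q, and observation O_i = W X_i + v + noise with covariance R. The matrix names follow the slides; the function names are my own.

```python
import numpy as np

def kf_predict(mu, Sigma, B, beta, Q):
    """Prediction: push p(X_i) through the linear-Gaussian transition model."""
    mu_pred = B @ mu + beta
    Sigma_pred = B @ Sigma @ B.T + Q
    return mu_pred, Sigma_pred

def kf_update(mu, Sigma, W, v, R, o):
    """Condition on an observation O = o under the CLG observation model."""
    S = W @ Sigma @ W.T + R                  # innovation covariance
    K = Sigma @ W.T @ np.linalg.inv(S)       # Kalman gain
    mu_post = mu + K @ (o - (W @ mu + v))    # corrected mean
    Sigma_post = Sigma - K @ W @ Sigma       # corrected covariance
    return mu_post, Sigma_post
```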

Operations in Kalman filter (DBN over states X_1, ..., X_5 with observations O_1 = o_1, ..., O_5 = o_5): compute p(X_t | o_{1:t}). Start with p(X_0). At each time step t: condition on the observation; prediction (multiply in the transition model); roll-up (marginalize the previous time step). I'll describe one implementation of the KF; there are others, e.g., the information filter.

Canonical form: the standard and canonical forms are related (K = Σ⁻¹, h = Σ⁻¹µ). Conditioning is easy in canonical form; marginalization is easy in standard form.

Conditioning in canonical form: first multiply the factors; then condition on the observed value B = y.

Operations in Kalman filter (recap): compute p(X_t | o_{1:t}); start with p(X_0); at each time step t: condition on the observation; prediction (multiply in the transition model); roll-up (marginalize the previous time step).

Prediction & roll-up in canonical form: first multiply in the transition model; then marginalize out X_t.
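
As a sketch of these canonical-form operations (my own illustration, assuming the standard information-form identities rather than copying the slides): a canonical factor is parameterized by a precision matrix K and vector h (K = Σ⁻¹, h = Σ⁻¹µ); multiplying factors adds their (K, h); conditioning on B = y keeps the other block and shifts h; marginalizing takes a Schur complement.

```python
import numpy as np

def multiply(K1, h1, K2, h2):
    """Product of two canonical-form Gaussian factors over the same variables."""
    return K1 + K2, h1 + h2

def condition(K, h, a_idx, b_idx, y):
    """Condition a canonical factor over (A, B) on B = y; returns a factor over A."""
    K_aa = K[np.ix_(a_idx, a_idx)]
    K_ab = K[np.ix_(a_idx, b_idx)]
    return K_aa, h[a_idx] - K_ab @ y

def marginalize(K, h, keep_idx, drop_idx):
    """Marginalize out the variables in drop_idx (Schur complement)."""
    K_kk = K[np.ix_(keep_idx, keep_idx)]
    K_kd = K[np.ix_(keep_idx, drop_idx)]
    K_dd = K[np.ix_(drop_idx, drop_idx)]
    inv_dd = np.linalg.inv(K_dd)
    K_marg = K_kk - K_kd @ inv_dd @ K_kd.T
    h_marg = h[keep_idx] - K_kd @ inv_dd @ h[drop_idx]
    return K_marg, h_marg
```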

What if observations are not CLG? Often observations are not CLG (CLG means O_i = B X_i + β_o + ε). Consider a motion detector: O_i = 1 if a person is likely to be in the region. The posterior is then not Gaussian.

Linearization, incorporating nonlinear evidence: p(O_i | X_i) is not CLG, but we can find a Gaussian approximation of p(X_i, O_i) = p(X_i) p(O_i | X_i), instantiate the evidence O_i = o_i, and obtain a Gaussian for p(X_i | O_i = o_i). Why do we hope this would be any good? Locally, a Gaussian may be OK.

Linearization as integration: for a Gaussian approximation of p(X_i, O_i) = p(X_i) p(O_i | X_i) we need to compute the moments E[O_i], E[O_i²], E[O_i X_i]. Note: each integral is the product of a Gaussian with an arbitrary function.

Linearization as numerical integration: the integrand is the product of a Gaussian with an arbitrary function, so we can use effective numerical integration with a Gaussian quadrature method: approximate the integral as a weighted sum over integration points, where Gaussian quadrature defines the location of the points and their weights; this is exact if the arbitrary function is a polynomial of bounded degree. The number of integration points is exponential in the number of dimensions d; exactness only for monomials requires exponentially fewer points. With 2d+1 points, this method is equivalent to an effective unscented Kalman filter, and it generalizes to many more points.
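
A minimal sketch of a 2d+1-point scheme in the unscented-transform style (my own illustration, assuming the standard sigma-point construction rather than the exact quadrature rule from the lecture): propagate 2d+1 points, chosen from the mean and the columns of a scaled matrix square root of the covariance, through a nonlinear function g, then recover approximate moments from the weighted sum.

```python
import numpy as np

def sigma_point_moments(mu, Sigma, g, kappa=1.0):
    """Approximate E[g(X)] and Cov[g(X)] for X ~ N(mu, Sigma) with 2d+1 sigma points."""
    d = len(mu)
    L = np.linalg.cholesky((d + kappa) * Sigma)   # scaled matrix square root
    points = [mu] + [mu + L[:, j] for j in range(d)] + [mu - L[:, j] for j in range(d)]
    weights = np.array([kappa / (d + kappa)] + [0.5 / (d + kappa)] * (2 * d))

    gs = np.array([np.atleast_1d(g(p)) for p in points])
    mean = weights @ gs
    cov = sum(w * np.outer(gp - mean, gp - mean) for w, gp in zip(weights, gs))
    return mean, cov
```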

Operations in non-linear Kalman filter: compute p(X_t | o_{1:t}); start with p(X_0); at each time step t: condition on the observation (using numerical integration); prediction (multiply in the transition model, using numerical integration); roll-up (marginalize the previous time step).

What if the person chooses different motion models? With probability θ, move more or less straight; with probability 1−θ, do the moonwalk.

The moonwalk

What if the person chooses different motion models? With probability θ, move more or less straight; with probability 1−θ, do the moonwalk.

Switching Kalman filter: at each time step, choose one of k motion models (you never know which one!). p(X_{i+1} | X_i, Z_{i+1}) is a CLG indexed by Z_{i+1}: p(X_{i+1} | X_i, Z_{i+1}=j) ~ N(β_0^j + B^j X_i; Σ^j_{X_{i+1}|X_i}).

Inference in switching KF, one step: suppose p(X_0) is Gaussian, Z_1 takes one of two values, and p(X_1 | X_0, Z_1) is CLG. Marginalize X_0, marginalize Z_1: obtain a mixture of two Gaussians!

Multi-step inference: suppose p(X_i) is a mixture of m Gaussians, Z_{i+1} takes one of two values, and p(X_{i+1} | X_i, Z_{i+1}) is CLG. Marginalize X_i, marginalize Z_{i+1}: obtain a mixture of 2m Gaussians! The number of Gaussians grows exponentially!
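
To see the blow-up mechanically, here is a minimal sketch (my own illustration, with hypothetical data containers) of one switching-KF prediction step: each of the m incoming Gaussians is pushed through each of the k motion models, giving k·m components before any collapsing.

```python
import numpy as np

# Hypothetical containers: a mixture is a list of (weight, mean, covariance) triples,
# and each motion model j is (prior p(Z=j), B_j, beta_j, Q_j) for a linear-Gaussian transition.
def switching_predict(mixture, motion_models):
    """One prediction step of a switching KF: returns a mixture with k*m components."""
    new_mixture = []
    for w, mu, Sigma in mixture:
        for pz, B, beta, Q in motion_models:
            mu_pred = B @ mu + beta
            Sigma_pred = B @ Sigma @ B.T + Q
            new_mixture.append((w * pz, mu_pred, Sigma_pred))
    return new_mixture  # after T steps: m * k**T components, unless we collapse
```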

Visualizing growth in number of Gaussians

Computational complexity of inference in switching Kalman filters: even with (only) 2 motion models, the filtering query is NP-hard! [Lerner & Parr '01] Why? The graphical model is a tree: inference is efficient if all variables are discrete, and efficient if all variables are Gaussian, but not with a hybrid model (a combination of discrete and continuous).

Bounding the number of Gaussians: P(X_i) has exponentially many Gaussians (2^i with two motion models), but usually most of these bumps have low probability and overlap. Intuitive approximate inference: generate k·m Gaussians, then approximate with m Gaussians.

Collapsing Gaussians, a single Gaussian from a mixture: given a mixture P = {<w_i; N(µ_i, Σ_i)>}, obtain the approximation Q ~ N(µ, Σ) with µ = Σ_i w_i µ_i and Σ = Σ_i w_i (Σ_i + (µ_i − µ)(µ_i − µ)^T). Theorem: P and Q have the same first and second moments. KL projection: Q is the single Gaussian with lowest KL divergence from P.
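
A minimal sketch of this moment-matching collapse (my own illustration of the formulas above):

```python
import numpy as np

def collapse(weights, means, covs):
    """Collapse a Gaussian mixture into the single Gaussian with matching first and second moments."""
    w = np.asarray(weights) / np.sum(weights)
    mu = sum(wi * mi for wi, mi in zip(w, means))
    Sigma = sum(wi * (Si + np.outer(mi - mu, mi - mu))
                for wi, mi, Si in zip(w, means, covs))
    return mu, Sigma
```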

Collapsing a mixture of Gaussians into a smaller mixture of Gaussians: a hard problem, akin to a clustering problem. Several heuristics exist; cf. Uri Lerner's Ph.D. thesis.

Operations in non-linear switching Kalman filter: compute a mixture of Gaussians for p(X_t | o_{1:t}); start with p(X_0). At each time step t, for each of the m Gaussians in p(X_i | o_{1:i}): condition on the observation (using numerical integration); prediction (multiply in the transition model, using numerical integration), obtaining k Gaussians; roll-up (marginalize the previous time step); then project the k·m Gaussians into m Gaussians to obtain p(X_{i+1} | o_{1:i+1}). A sketch of this loop follows below.
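
Combining the pieces, a hypothetical skeleton of one such time step (reusing the switching_predict sketch above; condition_step is a stand-in for the numerical-integration observation update, and the projection here is a deliberately crude heuristic):

```python
def switching_kf_step(mixture, motion_models, observation, condition_step, m):
    """One assumed-density step: condition, expand to k*m components, project back to m.

    condition_step(mu, Sigma, o) is a hypothetical stand-in for the observation update;
    a full implementation would also reweight each component by its observation likelihood.
    """
    conditioned = [(w, *condition_step(mu, Sigma, observation)) for w, mu, Sigma in mixture]
    expanded = switching_predict(conditioned, motion_models)  # k*m components
    # Crude projection back to m components: keep the m highest-weight Gaussians and
    # renormalize (real systems use smarter collapsing heuristics; cf. Lerner's thesis).
    expanded.sort(key=lambda comp: comp[0], reverse=True)
    kept = expanded[:m]
    total = sum(w for w, _, _ in kept)
    return [(w / total, mu, Sigma) for w, mu, Sigma in kept]
```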

Assumed density filtering. Examples of very important assumed density filtering: the non-linear KF and approximate inference in the switching KF. General picture: select an assumed density (e.g., a single Gaussian, a mixture of m Gaussians, ...); after conditioning, prediction, or roll-up, the distribution is no longer representable with the assumed density (e.g., non-linear, a mixture of k·m Gaussians, ...); project back into the assumed density (e.g., via numerical integration, collapsing, ...).

When the non-linear KF is not good enough: sometimes the distribution in the non-linear KF is not well approximated by a single Gaussian, e.g., a banana-like distribution. Assumed density filtering solutions: Solution 1, reparameterize the problem and solve with a single Gaussian; Solution 2, more typically, approximate with a mixture of Gaussians.

Reparameterized KF for SLAT [Funiak, Guestrin, Paskin, Sukthankar 05]

When a single Gaussian ain't good enough [Fox et al.]: sometimes a smart parameterization is not enough; the distribution has multiple hypotheses. Possible solutions: sampling (particle filtering) or a mixture of Gaussians. Quick overview of one such solution.

Approximating the non-linear KF with a mixture of Gaussians. Robot example: P(X_i) is a Gaussian, but P(X_{i+1}) is a banana; approximate P(X_{i+1}) as a mixture of m Gaussians, e.g., using discretization, sampling, ... Problem: with P(X_{i+1}) a mixture of m Gaussians, P(X_{i+2}) is m bananas. One solution: apply the collapsing algorithm to project the m bananas into m Gaussians.

What you need to know. Kalman filter: probably the most used BN; assumes Gaussian distributions; equivalent to a linear system; simple matrix operations for computations. Non-linear Kalman filter: usually the observation or motion model is not CLG; use numerical integration to find a Gaussian approximation. Switching Kalman filter: a hybrid model with discrete and continuous variables; represent the belief as a mixture of Gaussians; the number of mixture components grows exponentially in time; approximate each time step with fewer components. Assumed density filtering: the fundamental abstraction behind most algorithms for dynamical systems; assume a representation for the density; every time the density is not representable, project it back into the representation.