What you need to know about Kalman Filters

Readings: K&F: 4.5, 12.2, 12.3, 12.4, 18.1, 18.2, 18.3, 18.4

Switching Kalman Filter; Dynamic Bayesian Networks
Graphical Models 10708
Carlos Guestrin, Carnegie Mellon University
November 27th, 2006

What you need to know about Kalman Filters
- Kalman filter
  - Probably the most used Bayes net
  - Assumes Gaussian distributions
  - Equivalent to a linear system
  - Simple matrix operations for computations
- Non-linear Kalman filter
  - Usually, the observation or motion model is not CLG (conditional linear Gaussian)
  - Use numerical integration to find a Gaussian approximation
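To make "simple matrix operations" concrete, here is a minimal sketch of one predict/update cycle of a standard linear-Gaussian Kalman filter in NumPy. The matrices A, Q, H, R and the toy values are generic placeholders, not taken from the lecture.

```python
import numpy as np

def kalman_step(x, P, z, A, Q, H, R):
    """One predict/update cycle of a linear-Gaussian Kalman filter.
    x, P: previous mean and covariance; z: new observation.
    A, Q: transition matrix and noise covariance; H, R: observation model."""
    # Predict: push the Gaussian belief through the linear dynamics
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Update: condition on the observation z (everything stays Gaussian)
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Toy 2-D example with placeholder dynamics and observation model
A = np.array([[1.0, 1.0], [0.0, 1.0]])       # constant-velocity dynamics
Q = 0.01 * np.eye(2)
H = np.array([[1.0, 0.0]])                   # observe position only
R = np.array([[0.1]])
x, P = np.zeros(2), np.eye(2)
x, P = kalman_step(x, P, np.array([0.9]), A, Q, H, R)
print(x, P, sep="\n")
```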

What if the person chooses different motion models?
- With probability θ, move more or less straight
- With probability 1-θ, do the moonwalk

The moonwalk
[Figure slide]

Switching Kalman filter
- At each time step, choose one of k motion models
- You never know which one!
- P(X_{i+1} | X_i, Z_{i+1}) is a CLG indexed by Z_{i+1}:
  P(X_{i+1} | X_i, Z_{i+1} = j) ~ N(β_0^j + B^j X_i ; Σ^j_{X_{i+1}|X_i})
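A tiny generative sketch of this model, assuming two 1-D linear-Gaussian motion models; the particular numbers (θ, the β's, B's, and Σ's) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical 1-D motion models: x_{i+1} = beta0_j + B_j * x_i + noise
beta0 = [0.0, -0.5]       # straight-ish motion vs. "moonwalk"
B     = [1.0, -1.0]
sigma = [0.1,  0.3]       # Sigma^j for the transition noise
theta = 0.9               # P(Z_{i+1} = straight)

x, traj = 0.0, []
for i in range(20):
    z = 0 if rng.random() < theta else 1          # pick a motion model
    x = beta0[z] + B[z] * x + rng.normal(0.0, np.sqrt(sigma[z]))
    traj.append((z, x))
print(traj[:5])
```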

Inference in switching KF: one step
- Suppose P(X_0) is Gaussian
- Z_1 takes one of two values
- P(X_1 | X_0, Z_1) is CLG
- Marginalize X_0
- Marginalize Z_1
- Obtain a mixture of two Gaussians!

Multi-step inference
- Suppose P(X_i) is a mixture of m Gaussians
- Z_{i+1} takes one of two values
- P(X_{i+1} | X_i, Z_{i+1}) is CLG
- Marginalize X_i
- Marginalize Z_{i+1}
- Obtain a mixture of 2m Gaussians!
- The number of Gaussians grows exponentially!!!
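Writing out the one-step marginalization behind these bullets (in the slides' notation, with Gaussian P(X_0) and two values of Z_1):

\[
P(X_1) \;=\; \sum_{j=1}^{2} P(Z_1 = j) \int P(X_0)\, P(X_1 \mid X_0, Z_1 = j)\, dX_0 .
\]

Each term in the sum is a single Gaussian (integrating a CLG against a Gaussian stays Gaussian), so P(X_1) is a mixture of two Gaussians. Repeating the argument, a mixture of m Gaussians at time i becomes a mixture of 2m (in general k·m, with k motion models) at time i+1, i.e. on the order of k^t components after t steps.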

Visualizing growth in number of Gaussians
[Figure slide]

Computational complexity of inference in switching Kalman filters
- Switching Kalman filter with (only) 2 motion models
- Query: the problem is NP-hard!!! [Lerner & Parr '01]
- Why!!!? The graphical model is a tree:
  - Inference is efficient if all variables are discrete
  - Inference is efficient if all variables are Gaussian
  - But not with a hybrid model (a combination of discrete and continuous)

Bounding the number of Gaussians
- P(X_i) has 2m Gaussians, but usually most are bumps that have low probability and overlap
- Intuitive approximate inference:
  - Generate k·m Gaussians
  - Approximate with m Gaussians

Collapsing Gaussians: single Gaussian from a mixture
- Given a mixture P = {⟨w_i ; N(μ_i, Σ_i)⟩}
- Obtain the approximation Q ~ N(μ, Σ)
- Theorem: P and Q have the same first and second moments
- KL projection: Q is the single Gaussian with lowest KL divergence from P
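The moment-matching theorem on this slide pins the collapsed Gaussian down: μ = Σ_i w_i μ_i and Σ = Σ_i w_i (Σ_i + (μ_i − μ)(μ_i − μ)^T). A minimal NumPy sketch; the example mixture at the bottom is made up.

```python
import numpy as np

def collapse(weights, means, covs):
    """Moment-match a Gaussian mixture with a single Gaussian.
    weights: (k,), means: (k, d), covs: (k, d, d). Returns (mu, Sigma)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                  # normalize mixture weights
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    mu = w @ means                                   # mu = sum_i w_i mu_i
    diffs = means - mu                               # (k, d)
    # Sigma = sum_i w_i (Sigma_i + (mu_i - mu)(mu_i - mu)^T)
    sigma = np.einsum('i,ijk->jk', w, covs) \
          + np.einsum('i,ij,ik->jk', w, diffs, diffs)
    return mu, sigma

# Example: collapse a two-component 2-D mixture
mu, sigma = collapse([0.7, 0.3],
                     [[0.0, 0.0], [2.0, 1.0]],
                     [np.eye(2), 0.5 * np.eye(2)])
print(mu, sigma, sep="\n")
```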

Collapsing a mixture of Gaussians into a smaller mixture of Gaussians
- Hard problem!
- Akin to a clustering problem
- Several heuristics exist; c.f. the K&F book

Operations in the non-linear switching Kalman filter
[Figure: chain X_1, ..., X_5 with observed values O_1, ..., O_5]
- Compute a mixture of Gaussians for P(X_i | o_{1:i})
- Start with P(X_0)
- At each time step:
  - For each of the m Gaussians in P(X_i | o_{1:i}):
    - Condition on the observation (use numerical integration)
    - Prediction (multiply by the transition model, use numerical integration)
    - Obtain k Gaussians
    - Roll-up (marginalize the previous time step)
  - Project the k·m Gaussians into m Gaussians, giving P(X_{i+1} | o_{1:i+1})
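A simplified sketch of one such time step, assuming 1-D linear-Gaussian motion and observation models (so the per-component condition/predict steps are exact, rather than needing the numerical integration the slide mentions) and collapsing the k·m components all the way down to a single Gaussian (m = 1); all model numbers are invented.

```python
import numpy as np

def collapse_1d(w, mu, var):
    """Moment-match a 1-D Gaussian mixture with a single Gaussian."""
    w = np.asarray(w) / np.sum(w)
    m = np.sum(w * mu)
    v = np.sum(w * (var + (mu - m) ** 2))
    return m, v

def adf_step(belief, models, p_z, obs, obs_var):
    """belief: list of (weight, mean, var); models: list of (a, b, q)
    with x' = a*x + b + N(0, q); obs seen through an identity observation model."""
    comps = []
    for w, mu, var in belief:                    # m Gaussians in P(X_i | o_1:i)
        for j, (a, b, q) in enumerate(models):   # k motion models
            mu_p = a * mu + b                    # prediction
            var_p = a * a * var + q
            s = var_p + obs_var                  # innovation variance
            k_gain = var_p / s
            mu_c = mu_p + k_gain * (obs - mu_p)  # condition on the new observation
            var_c = (1.0 - k_gain) * var_p
            lik = np.exp(-0.5 * (obs - mu_p) ** 2 / s) / np.sqrt(2 * np.pi * s)
            comps.append((w * p_z[j] * lik, mu_c, var_c))
    ws, mus, vs = map(np.array, zip(*comps))
    mu, var = collapse_1d(ws, mus, vs)           # project k*m components -> 1
    return [(1.0, mu, var)]

belief = [(1.0, 0.0, 1.0)]
models = [(1.0, 0.0, 0.05), (-1.0, -0.5, 0.3)]   # "straight" vs. "moonwalk"
for obs in [0.2, 0.5, 0.4]:
    belief = adf_step(belief, models, p_z=[0.9, 0.1], obs=obs, obs_var=0.1)
print(belief)
```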

Announcements
- Lectures the rest of the semester:
  - Wed. 11/30, regular class time: Causality (Richard Scheines)
  - Last class: Friday 12/1, regular class time: finish Dynamic BNs & overview of advanced topics
- Deadlines & presentations:
  - Project poster presentations: Dec. 1st, 3-6pm (NSH Atrium); popular vote for best poster
  - Project write-up: Dec. 8th by 2pm, by email; the 8-page limit will be strictly enforced
  - Final: out Dec. 1st, due Dec. 15th by 2pm (strict deadline); no late days on the final!

Assumed density filtering
- Examples of very important assumed density filtering:
  - Non-linear KF
  - Approximate inference in the switching KF
- General picture:
  - Select an assumed density (e.g., a single Gaussian, a mixture of m Gaussians, ...)
  - After conditioning, prediction, or roll-up, the distribution is no longer representable with the assumed density (e.g., non-linear, a mixture of k·m Gaussians, ...)
  - Project back into the assumed density (e.g., numerical integration, collapsing, ...)

When the non-linear KF is not good enough
- Sometimes, the distribution in a non-linear KF is not approximated well by a single Gaussian (e.g., a banana-like distribution)
- Assumed density filtering:
  - Solution 1: reparameterize the problem and solve as a single Gaussian
  - Solution 2: more typically, approximate as a mixture of Gaussians

Reparameterized KF for SLAT [Funiak, Guestrin, Paskin, Sukthankar '05]
[Figure slide]

When a single Gaussian ain't good enough [Fox et al.]
- Sometimes, a smart parameterization is not enough
- The distribution has multiple hypotheses
- Possible solutions:
  - Sampling: particle filtering
  - Mixture of Gaussians
- See the book for details

Approximating the non-linear KF with a mixture of Gaussians
- Robot example: P(X_i) is a Gaussian, P(X_{i+1}) is a banana
- Approximate P(X_{i+1}) as a mixture of m Gaussians (e.g., using discretization, sampling, ...)
- Problem: with P(X_{i+1}) a mixture of m Gaussians, P(X_{i+2}) is m bananas
- One solution: apply the collapsing algorithm to project the m bananas into m Gaussians

What you need to know
- Switching Kalman filter
  - Hybrid model: discrete and continuous variables
  - Represent the belief as a mixture of Gaussians
  - The number of mixture components grows exponentially in time
  - Approximate each time step with fewer components
- Assumed density filtering
  - Fundamental abstraction of most algorithms for dynamical systems
  - Assume a representation for the density
  - Every time the density is not representable, project it into the representation

More than just a switching KF
- The switching KF selects among k motion models
- The discrete variable can depend on the past: a Markov model over the hidden variable
- What if k is really large?
- Generalize HMMs to a large number of variables

Dynamic Bayesian network (DBN)
- An HMM is defined by:
  - Transition model P(X^{(t+1)} | X^{(t)})
  - Observation model P(O^{(t)} | X^{(t)})
  - Starting state distribution P(X^{(0)})
- DBN: use a Bayes net to represent each of these compactly
  - Starting state distribution P(X^{(0)}) is a BN
  - (Silly) example: performance-in-grad-school DBN
    - Variables: Happiness, Productivity, Hirability, Fame
    - Observations: Paper, Schmooze

Transition model: Two Time-slice Bayes Net (2-TBN)
- Process over variables X
- 2-TBN: represents the transition and observation models P(X^{(t+1)}, O^{(t+1)} | X^{(t)})
  - X^{(t)} are interface variables (we don't represent a distribution over these variables)
- As with a BN, exponential reduction in representation complexity
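A toy sketch of a 2-TBN for the grad-school example, stored as per-variable parent sets and CPTs. The parent structure and all probabilities below are made up for illustration; the slide does not specify them.

```python
import random

# Each time-(t+1) variable lists its parents (tagged with the slice they come
# from) and a CPT mapping parent assignments to P(var = True).
two_tbn = {
    "Productivity": {"parents": [("t", "Happiness")],
                     "cpt": {(True,): 0.8, (False,): 0.4}},
    "Happiness":    {"parents": [("t", "Happiness"), ("t+1", "Productivity")],
                     "cpt": {(True, True): 0.9, (True, False): 0.6,
                             (False, True): 0.5, (False, False): 0.2}},
    "Hirability":   {"parents": [("t", "Hirability"), ("t+1", "Productivity")],
                     "cpt": {(True, True): 0.95, (True, False): 0.7,
                             (False, True): 0.5, (False, False): 0.1}},
    "Fame":         {"parents": [("t", "Fame")],
                     "cpt": {(True,): 0.9, (False,): 0.05}},
    # Observations at t+1 depend only on slice-(t+1) hidden variables
    "Paper":        {"parents": [("t+1", "Productivity")],
                     "cpt": {(True,): 0.7, (False,): 0.1}},
    "Schmooze":     {"parents": [("t+1", "Fame")],
                     "cpt": {(True,): 0.8, (False,): 0.3}},
}

def sample_slice(prev):
    """Sample the time-(t+1) slice given the time-t assignment `prev`."""
    cur = {}
    for var, spec in two_tbn.items():   # dict order is a topological order here
        vals = tuple(prev[p] if slc == "t" else cur[p]
                     for slc, p in spec["parents"])
        cur[var] = random.random() < spec["cpt"][vals]
    return cur

prev = {"Happiness": True, "Productivity": True, "Hirability": False, "Fame": False}
print(sample_slice(prev))
```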

Unrolled DBN
- Start with P(X^{(0)})
- For each time step, add variables as defined by the 2-TBN

Sparse DBN and fast inference
- Sparse DBN: fast inference?
[Figure: DBN over variables A-F, unrolled for time slices t through t+3]

Even after one time step!!
- What happens when we marginalize out time t?
[Figure: variables A-F at times t and t+1]

Sparse DBN and fast inference 2
- Sparse DBN: fast inference? Almost!?
- A structured representation of the belief often yields a good approximation
[Figure: sparse DBN over variables A-F, unrolled for time slices t through t+3]

BK algorithm for approximate DBN inference [Boyen, Koller '98]
- Assumed density filtering:
  - Choose a factored representation P̂ for the belief state
  - Every time step, when the belief is not representable with P̂, project it into the representation
[Figure: DBN over variables A-F, unrolled for time slices t through t+3]

A simple example of BK: fully-factorized distribution
- Assumed density: fully factorized, P̂(X^{(t+1)}) = ∏_i P̂(X_i^{(t+1)})
- True P(X^{(t+1)}) vs. assumed density P̂(X^{(t+1)})
[Figure: variables A-F at times t and t+1]

Computing the fully-factorized distribution at time t+1
- Assumed density: fully factorized, P̂(X^{(t+1)}) = ∏_i P̂(X_i^{(t+1)})
- Computing P̂(X_i^{(t+1)}) for each variable X_i (a toy numeric sketch follows after the next slide)
[Figure: variables A-F at times t and t+1]

General case for BK: junction tree represents the distribution
- Assumed density: a factored distribution whose clusters are represented by a junction tree
- True P(X^{(t+1)}) vs. assumed density P̂(X^{(t+1)})
[Figure: variables A-F at times t and t+1]
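Putting the two previous slides together, here is a toy sketch of the fully-factorized BK update for a small discrete DBN. Observations are omitted for brevity, each time-(t+1) variable depends only on slice-t variables, and the parent sets and CPT numbers are invented for illustration.

```python
import itertools

# P(child = True | parent assignment); parents listed per child, all in slice t.
parents = {"A": ["A"], "B": ["A", "B"], "C": ["B", "C"]}
cpt = {
    "A": {(True,): 0.9, (False,): 0.2},
    "B": {(True, True): 0.8, (True, False): 0.6,
          (False, True): 0.4, (False, False): 0.1},
    "C": {(True, True): 0.7, (True, False): 0.5,
          (False, True): 0.3, (False, False): 0.05},
}

def bk_step(belief):
    """belief: dict var -> P(var = True) at time t (the assumed density).
    Returns the projected, fully-factorized belief at time t+1."""
    new_belief = {}
    for var, pas in parents.items():
        p_true = 0.0
        # Sum over joint parent assignments, treating the time-t parents as
        # independent; this is exactly the fully-factorized BK approximation.
        for assignment in itertools.product([True, False], repeat=len(pas)):
            w = 1.0
            for pa, val in zip(pas, assignment):
                w *= belief[pa] if val else (1.0 - belief[pa])
            p_true += w * cpt[var][assignment]
        new_belief[var] = p_true
    return new_belief

belief = {"A": 0.5, "B": 0.5, "C": 0.5}
for t in range(3):
    belief = bk_step(belief)
    print(t + 1, belief)
```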

Computing the factored belief state in the next time step
- Introduce the observations in the current time step
- Use the junction tree to calibrate the time-t beliefs
- Compute the t+1 belief, project into the approximate belief state
  - Marginalize into the desired factors; this corresponds to a KL projection
- Equivalent to computing the marginals over the factors directly
  - For each factor in the t+1-step belief, use variable elimination
[Figure: variables A-F at times t and t+1]

Error accumulation
- Each time step, the projection introduces error
- Will the error add up, causing unbounded approximation error as t → ∞?

Contraction in Markov process
[Figure slide]

BK Theorem
- The error does not grow unboundedly!
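One generic way to see why bounded error is plausible (a back-of-the-envelope argument under stated assumptions, not the precise statement or constants from Boyen-Koller): if the stochastic transition contracts the approximation error by a factor (1 − γ) per step and each projection adds at most ε, then

\[
\delta_{t+1} \le (1-\gamma)\,\delta_t + \varepsilon
\qquad\Longrightarrow\qquad
\delta_t \le \varepsilon \sum_{s \ge 0} (1-\gamma)^s = \frac{\varepsilon}{\gamma},
\]

so the error stays bounded for all t (though ε/γ may still be large, as the summary slide notes).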

Example: BAT network [Forbes et al.]
[Figure slide]

BK results [Boyen, Koller '98]
[Figure slide]

Thin Junction Tree Filters [Paskin '03]
- BK assumes fixed approximation clusters
- TJTF adapts the clusters over time, attempting to minimize the projection error

Hybrid DBN (many continuous and discrete variables)
- DBN with a large number of discrete and continuous variables
- The number of mixture-of-Gaussian components blows up in one time step!
- Need many smart tricks; e.g., see the Lerner thesis
- Example: Reverse Water Gas Shift System (RWGS) [Lerner et al. '02]

DBN summary
- DBNs
  - Factored representation of HMMs/Kalman filters
  - A sparse representation does not lead to efficient inference
- Assumed density filtering
  - BK: a factored belief state representation is the assumed density
  - Contraction guarantees that the error does not blow up (but it could still be large)
  - Thin junction tree filter adapts the assumed density over time
  - Extensions for hybrid DBNs