
Handling uncertainty over time: predicting, estimating, recognizing, learning
Chris Atkeson, 2004

Why do we care?
Speech recognition makes use of the dependence of words and phonemes across time. Knowing where your robot is makes use of reasoning about processes that unfold over time. So does most reasoning, actually: medical diagnosis, politics, the stock market, and the choices you make every day, for example.

What is a state?
Everything you need to know to make the best prediction about what happens next. It depends on how you define the system you care about. States are called x or s. Dependence on time can be indicated by x(t). States can be discrete or continuous. AI researchers tend to say "state" when they mean some features derived from the state; this should be discouraged. A belief state is your knowledge about the state, which is typically a probability p(x). Processes with state are called Markov processes.

Dealing with time
The concept of state gives us a handy way of thinking about how things evolve over time. We will use discrete time, for example 0.001, 0.002, 0.003, ... The state at time t_k, x(t_k), will be written x_k or x[k].
Deterministic state transition function: x_{k+1} = f(x_k)
Stochastic state transition function: p(x_{k+1} | x_k)
Mildly stochastic state transition function: x_{k+1} = f(x_k) + ε, with ε being Gaussian.

Hidden state
Sometimes the state is directly measurable/observable. Sometimes it isn't; then you have hidden state and a hidden Markov model, or HMM. Examples: Do you have a disease? What am I thinking about? What is wrong with the Mars rover? Where is the Mars rover?

[Diagram: hidden-state chain x_{k-1} → x_k → x_{k+1}, with measurements y_{k-1}, y_k, y_{k+1}.]

Measurements
Measurements (y) are also called evidence (e) and observables (o). Measurements can be discrete or continuous.
Deterministic measurement function: y_k = g(x_k)
Stochastic measurement function: p(y_k | x_k)
Mildly stochastic measurement function: y_k = g(x_k) + υ, with υ being Gaussian.
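
To make the notation concrete, here is a minimal simulation sketch of the mildly stochastic model above; f, g, the noise scales, and the initial state are illustrative assumptions, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return 0.9 * x + 1.0   # illustrative dynamics (an assumption)

def g(x):
    return x ** 2          # illustrative nonlinear measurement (an assumption)

K = 10
x = np.zeros(K)
y = np.zeros(K)
x[0] = 5.0                 # known initial state
for k in range(K):
    y[k] = g(x[k]) + rng.normal(scale=0.5)          # y_k = g(x_k) + ups
    if k + 1 < K:
        x[k + 1] = f(x[k]) + rng.normal(scale=0.1)  # x_{k+1} = f(x_k) + eps
print(np.round(x, 2))
print(np.round(y, 2))
```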

Standard problems
Predict the future. Estimate the current state (filtering). Estimate what happened in the past (smoothing). Find the most likely state trajectory (sequence/trajectory recognition, as in speech). Learn about the process (learn the state transition and measurement models).

Prediction, Case 0
Deterministic state transition function x_{k+1} = f(x_k) and known state x_k: just apply f() n times to get x_{k+n}. When we worry about learning the state transition function and the fact that it will always have errors, the question will arise: to predict x_{k+n}, is it better to learn x_{k+1} = f_1(x_k) and iterate, or to learn x_{k+n} = f_n(x_k) directly?

Prediction, Case 1: discrete states, belief state p(x_k)
Use tables to represent p(x_k).
Propagate the belief state: p(x_{k+1}) = Σ_{x_k} p(x_{k+1} | x_k) p(x_k)
Matrix notation: vector p_k, transition matrix M with M_ij = p(x_i | x_j); here i and j index components (states), not time.
Propagate the belief state: p_{k+1} = M p_k
Stationary distribution: M_∞ = lim_{n→∞} M^n
Mixing time: the n for which M^n ≈ M_∞
(A code sketch appears below.)

Prediction, Case 2: continuous states, belief state p(x_k)
Propagate the belief state analytically if possible: p(x_{k+1}) = ∫ p(x_{k+1} | x_k) p(x_k) dx_k
Particle filtering (actually many ways to implement): sample p(x_k); for each sample, sample p(x_{k+1} | x_k); normalize/resample the resulting samples to get p(x_{k+1}); iterate to get p(x_{k+n}).

Prediction, Case 3
Mildly stochastic state transition function with p(x_k) being N(µ, Σ_x): x_{k+1} = f(x_k) + ε, with ε being N(0, Σ_ε) and independent of the process.
E(x_{k+1}) ≈ f(µ)
A = ∂f/∂x
Var(x_{k+1}) ≈ A Σ_x A^T + Σ_ε
p(x_{k+1}) is N(E(x_{k+1}), Var(x_{k+1})). Exact if f() is linear. Iterate to get p(x_{k+n}). Much simpler than particle filtering. (A code sketch appears below.)

Filtering, in general
Start with p(x_{k-1}^+). Predict p(x_k^-). Apply the measurement using Bayes Rule to get p(x_k^+) = p(x_k | y_k):
p(x_k | y_k) = p(y_k | x_k) p(x_k) / p(y_k)
Sometimes we ignore p(y_k) and just renormalize as necessary, so all we have to do is p(x_k | y_k) = α p(y_k | x_k) p(x_k)
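
A minimal sketch of Prediction, Case 1, with a made-up 3-state transition matrix (an assumption for illustration):

```python
import numpy as np

# Made-up 3-state transition matrix; M[i, j] = p(x_i | x_j),
# so each column sums to 1 and the belief propagates as p_{k+1} = M p_k.
M = np.array([[0.8, 0.1, 0.0],
              [0.2, 0.7, 0.3],
              [0.0, 0.2, 0.7]])

p = np.array([1.0, 0.0, 0.0])   # belief state: start in state 0 for sure
for _ in range(50):             # propagate the belief 50 steps
    p = M @ p
print(np.round(p, 4))           # close to the stationary distribution

# Stationary distribution as a column of lim M^n; the mixing time is the
# n at which M^n stops changing noticeably.
print(np.round(np.linalg.matrix_power(M, 200)[:, 0], 4))
```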

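And a sketch of Prediction, Case 3: propagating a Gaussian belief through assumed nonlinear dynamics, with the Jacobian A = ∂f/∂x estimated numerically. The dynamics and covariances are illustrative assumptions.

```python
import numpy as np

def f(x):
    # Illustrative nonlinear dynamics (an assumption, not from the lecture)
    return np.array([np.sin(x[0]) + x[1], 0.9 * x[1]])

def jacobian(func, x, h=1e-6):
    """Numerical Jacobian A = df/dx, built column by column."""
    fx = func(x)
    A = np.zeros((len(fx), len(x)))
    for i in range(len(x)):
        dx = np.zeros(len(x))
        dx[i] = h
        A[:, i] = (func(x + dx) - fx) / h
    return A

mu = np.array([0.5, 1.0])      # p(x_k) = N(mu, Sigma_x)
Sigma_x = 0.1 * np.eye(2)
Sigma_eps = 0.01 * np.eye(2)   # process noise covariance Sigma_eps

for _ in range(3):             # iterate to get p(x_{k+n})
    A = jacobian(f, mu)                        # A = df/dx at the mean
    Sigma_x = A @ Sigma_x @ A.T + Sigma_eps    # Var ~ A Sigma A^T + Sigma_eps
    mu = f(mu)                                 # E ~ f(mu)
print(mu)
print(Sigma_x)
```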
Filtering, Case 1: discrete states, belief state p(x_k), likelihood p(y_k | x_k)
Use tables to represent p(x_k).
Propagate the belief state: p(x_{k+1}^-) = Σ_{x_k} p(x_{k+1} | x_k) p(x_k^+)
Weight each entry by the measurement likelihood: p(x_{k+1}^+) ∝ p(y_{k+1} | x_{k+1}) p(x_{k+1}^-)
Normalize so the probabilities sum to 1.
This is called a Discrete Bayes Filter. (A code sketch appears below.)

Filtering, Case 2: continuous states, belief state p(x_k)
Particle filtering (actually many ways to implement): sample p(x_k); for each sample, sample p(x_{k+1} | x_k); weight each sample by p(y_k | x_k); normalize/resample the resulting samples to get p(x_{k+1}); iterate to get p(x_{k+n}).

Particle Filter Algorithm
(Slides from EEE 581 Lecture 16, "Particle Filters: a Gentle Introduction", http://www.fulton.asu.edu/~morrell/581/)
Create particles as samples from the initial state distribution p(x_0). For k going from 1 to K: update each particle using the state update equation; compute weights for each particle using the observation value; (optionally) resample particles.

[Figures: the initial state distribution p(x_0), shown first as samples only, then as samples and weights; each particle has a value and a weight.]
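
A minimal sketch of the Discrete Bayes Filter, with made-up transition and likelihood tables (assumptions for illustration):

```python
import numpy as np

# Made-up two-state example; columns of M sum to 1.
M = np.array([[0.9, 0.2],      # M[i, j] = p(x_{k+1} = i | x_k = j)
              [0.1, 0.8]])
lik = np.array([[0.8, 0.3],    # lik[y, i] = p(y | x = i), y in {0, 1}
                [0.2, 0.7]])

p = np.array([0.5, 0.5])       # uniform prior belief
for y in [0, 0, 1, 0]:         # a made-up measurement sequence
    p = M @ p                  # predict:  p(x^-)
    p = lik[y] * p             # weight by p(y | x)
    p = p / p.sum()            # normalize so the entries sum to 1
    print(np.round(p, 3))
```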

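And a minimal bootstrap particle filter along the lines of the algorithm above; the dynamics, measurement function, and noise levels are illustrative assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000                             # number of particles

def f(x):                            # assumed nonlinear dynamics
    return 0.5 * x + 25 * x / (1 + x ** 2)

def g(x):                            # assumed measurement function
    return 0.5 * x

# Simulate a trajectory to filter: ground truth and noisy measurements.
xs, ys = [], []
x = 0.1
for _ in range(10):
    x = f(x) + rng.normal(scale=1.0)
    xs.append(x)
    ys.append(g(x) + rng.normal(scale=1.0))

particles = rng.normal(0.0, 2.0, size=N)     # samples from p(x_0)
for x_true, y in zip(xs, ys):
    # Update each particle with the state update (sample p(x_{k+1} | x_k))
    particles = f(particles) + rng.normal(scale=1.0, size=N)
    # Weight by the observation likelihood p(y | x), here Gaussian, sigma = 1
    w = np.exp(-0.5 * (y - g(particles)) ** 2)
    w /= w.sum()
    # Resample with replacement, weight as probability of selection
    particles = particles[rng.choice(N, size=N, p=w)]
    print(f"estimate {particles.mean():6.2f}   true {x_true:6.2f}")
```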
Importance Sampling
Ideally, the particles would represent samples drawn from the distribution p(x). In practice, we usually cannot get p(x) in closed form; in any case, it would usually be difficult to draw samples from p(x). So we use importance sampling: particles are drawn from an importance distribution, and particles are weighted by importance weights.

State Update
x_{k+1} = f_0(x_k) + noise. Things are more complicated if you have a multimodal p(x_{k+1} | x_k). [Figure: particles propagated through the state update.]

Compute Weights
Use p(y | x) to alter the weights. [Figure: particle weights before and after the update.]

Resampling
Can also draw samples with replacement, using p(y | x) × weight as the probability of selection. In inference problems, most weights tend to zero except a few (from particles that closely match the observations), which become large. We resample to concentrate particles in regions where p(x | y) is larger.

Advantages of Particle Filters
Under general conditions, the particle filter estimate becomes asymptotically optimal as the number of particles goes to infinity. Non-linear, non-Gaussian state update and observation equations can be used. Multi-modal distributions are not a problem. Particle filter solutions to inference problems are often easy to formulate.

Disadvantages of Particle Filters
Naïve formulations of problems usually result in significant computation times. It is hard to tell if you have enough particles. The best importance distribution and/or resampling methods may be very problem-specific.
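
One common way to implement the resampling step is systematic resampling; the slides do not name a particular scheme, so this is just one reasonable choice, sketched minimally:

```python
import numpy as np

def systematic_resample(particles, weights, rng):
    """Draw len(particles) equally spaced points in [0, 1) and walk the
    cumulative weights; lower variance than naive multinomial resampling."""
    n = len(particles)
    positions = (rng.random() + np.arange(n)) / n
    idx = np.searchsorted(np.cumsum(weights), positions)
    return particles[idx]

rng = np.random.default_rng(2)
particles = np.array([0.0, 1.0, 2.0, 3.0])
weights = np.array([0.05, 0.05, 0.80, 0.10])  # one particle fits the data well
print(systematic_resample(particles, weights, rng))  # mostly copies of 2.0
```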

Conclusions
Particle filters (and other Monte Carlo methods) are a powerful tool for solving difficult inference problems. Formulating a filter is now a tractable exercise for many previously difficult or impossible problems. Implementing a filter effectively may require significant creativity and expertise to keep the computational requirements tractable.

Particle Filtering Comments
Reinvented many times in many fields: sequential Monte Carlo, condensation, bootstrap filtering, interacting particle approximations, survival of the fittest, ... Do you need R^d samples to cover the space? R is a crude measure of linear resolution, d is the dimensionality. You maintain a belief state p(x); how do you answer the question "Where is the robot now?" With the mean, the best sample, a robust mean, the maximum likelihood estimate? What happens if p(x) really is multimodal?

Return to our regularly scheduled programming: Filtering

Filtering, Case 3
Mildly stochastic state transition function with p(x_k) being N(µ, Σ_x): x_{k+1} = f(x_k) + ε, with ε being N(0, Σ_ε) and independent of the process.
Mildly stochastic measurement function: y_k = g(x_k) + υ, with υ being N(0, Σ_υ) and independent of everything else.
This leads to Kalman Filtering. Nonlinear f() or g() means you are doing Extended Kalman Filtering (EKF).

Filtering, Case 3: Prediction Step
E(x_{k+1}^-) ≈ f(µ)
A = ∂f/∂x
Var(x_{k+1}^-) ≈ A Σ_x A^T + Σ_ε
p(x_{k+1}^-) is N(E(x_{k+1}^-), Var(x_{k+1}^-))

Filtering, Case 3: Measurement Update Step
E(x_k^+) ≈ E(x_k^-) + K_k (y_k − g(E(x_k^-)))
C = ∂g/∂x
Σ_k^- = Var(x_k^-)
S_k = C Σ_k^- C^T + Σ_υ
K_k = Σ_k^- C^T S_k^{-1}
Var(x_k^+) ≈ Σ_k^- − K_k C Σ_k^-
p(x_k^+) is N(E(x_k^+), Var(x_k^+))
This all comes from the Gaussian lecture.
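
A minimal EKF sketch implementing the prediction and measurement update steps above; f, g, the noise covariances, and the measurement values are assumptions for illustration:

```python
import numpy as np

def jacobian(func, x, h=1e-6):
    """Numerical Jacobian of func at x."""
    fx = func(x)
    J = np.zeros((len(fx), len(x)))
    for i in range(len(x)):
        dx = np.zeros(len(x))
        dx[i] = h
        J[:, i] = (func(x + dx) - fx) / h
    return J

def f(x):   # assumed dynamics: position integrates a decaying velocity
    return np.array([x[0] + 0.1 * x[1], 0.95 * x[1]])

def g(x):   # assumed measurement: observe position only
    return np.array([x[0]])

Sigma_eps = 0.01 * np.eye(2)   # process noise covariance
Sigma_ups = 0.1 * np.eye(1)    # measurement noise covariance

mu = np.array([0.0, 1.0])
Sigma = np.eye(2)
for y in [0.12, 0.21, 0.33]:               # made-up measurements
    # Prediction step
    A = jacobian(f, mu)
    mu = f(mu)
    Sigma = A @ Sigma @ A.T + Sigma_eps
    # Measurement update step
    C = jacobian(g, mu)
    S = C @ Sigma @ C.T + Sigma_ups        # S_k = C Sigma C^T + Sigma_ups
    K = Sigma @ C.T @ np.linalg.inv(S)     # K_k = Sigma C^T S^{-1}
    mu = mu + K @ (np.array([y]) - g(mu))
    Sigma = Sigma - K @ C @ Sigma
    print(np.round(mu, 3))
```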

Unscented Filter
Numerically find the best-fit Gaussian instead of doing the analytical computation. Good if f() or g() is strongly nonlinear.

What I would like to see
Combine a particle system and a Kalman filter, so each particle maintains a simple distribution instead of just a point estimate.

Smoothing, in general
Have y_{1:N}, want p(x_k | y_{1:N}). We know how to compute p(x_k | y_{1:k}) from the filtering slides.
p(x_k | y_{1:N}) = p(x_k | y_{1:k}, y_{k+1:N})
p(x_k | y_{1:N}) ∝ p(x_k | y_{1:k}) p(y_{k+1:N} | y_{1:k}, x_k)
p(x_k | y_{1:N}) ∝ p(x_k | y_{1:k}) p(y_{k+1:N} | x_k)
p(y_{k+1:N} | x_k) = Σ/∫ p(y_{k+1:N} | x_k, x_{k+1}) p(x_{k+1} | x_k) dx_{k+1}
= Σ/∫ p(y_{k+1:N} | x_{k+1}) p(x_{k+1} | x_k) dx_{k+1}
= Σ/∫ p(y_{k+1} | x_{k+1}) p(y_{k+2:N} | x_{k+1}) p(x_{k+1} | x_k) dx_{k+1}
(Σ/∫ means sum over discrete states or integrate over continuous ones.) Note the recursion implied by p(y_{k+i+1:N} | x_{k+i}).

Smoothing, general comments
Need to maintain distributional information at all time steps from the forward filter.
Case 1, discrete states: forward/backward algorithm. (A code sketch appears below.)
Case 2, continuous states, nasty dynamics or noise: particle smoothing (expensive).
Case 3, continuous states, Gaussian noise: Kalman smoother.
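
A minimal sketch of Case 1 (forward/backward) smoothing, reusing the style of the discrete Bayes filter sketch earlier; the tables and the measurement sequence are made up:

```python
import numpy as np

# Case 1 smoothing on a made-up two-state HMM.
M = np.array([[0.9, 0.2],     # M[i, j] = p(x_{k+1} = i | x_k = j)
              [0.1, 0.8]])
lik = np.array([[0.8, 0.3],   # lik[y, i] = p(y | x = i)
                [0.2, 0.7]])
ys = [0, 0, 1, 1, 1]

# Forward pass (the filter): alpha_k proportional to p(x_k | y_{1:k})
alphas = []
a = np.array([0.5, 0.5])
for y in ys:
    a = lik[y] * (M @ a)
    a = a / a.sum()
    alphas.append(a)

# Backward pass: beta_k proportional to p(y_{k+1:N} | x_k), using the
# recursion from the slide; rescaled each step for numerical stability.
betas = [np.ones(2) for _ in ys]
for k in range(len(ys) - 2, -1, -1):
    b = M.T @ (lik[ys[k + 1]] * betas[k + 1])
    betas[k] = b / b.sum()

# Combine: p(x_k | y_{1:N}) proportional to alpha_k * beta_k
for a, b in zip(alphas, betas):
    gamma = a * b
    print(np.round(gamma / gamma.sum(), 3))
```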

Finding the most likely state trajectory
The goal in speech recognition:
p(x_1, x_2, ..., x_N | y_{1:N}) ≠ p(x_1 | y_{1:N}) p(x_2 | y_{1:N}) ... p(x_N | y_{1:N})
Are we screwed? Computing the joint probability is hard!

Viterbi Algorithm
max p(x_{1:k} | y_{1:k}) = max p(y_{1:k} | x_{1:k}) p(x_{1:k})
= max p(y_{1:k-1}, y_k | x_{1:k}) p(x_{1:k})
= max p(y_{1:k-1} | x_{1:k-1}) p(y_k | x_k) p(x_{1:k})
= max p(y_{1:k-1} | x_{1:k-1}) p(y_k | x_k) p(x_k | x_{1:k-1}) p(x_{1:k-1})
= max [p(y_{1:k-1} | x_{1:k-1}) p(x_{1:k-1})] p(y_k | x_k) p(x_k | x_{1:k-1})
= max p(x_{1:k-1} | y_{1:k-1}) p(y_k | x_k) p(x_k | x_{k-1})
Note the recursion. Do we evaluate this over all possible sequences?

Viterbi Algorithm (2)
Use dynamic programming: take max p(x_{1:k-1} | y_{1:k-1}) p(y_k | x_k) p(x_k | x_{k-1}) over predecessors. (A code sketch appears at the end.)
[Trellis diagram: columns of states at times k-2, k-1, and k; each node at time k-1 holds p(x_{1:k-1} | y_{1:k-1}), and each edge to time k is scored by p(y_k | x_k) p(x_k | x_{k-1}).]

Viterbi Algorithm (3)
Well, this still only really works for discrete states. Continuous states have too many possible states at each step: D dimensions with resolution R in each dimension implies R^D states at each time step. Ask me about local sequence maximization.

Learning
Given data, we want to learn the dynamic/transition and sensor models. Smooth, choose the most likely state at each time, learn the models, iterate. This is known as the EM algorithm. Discrete case: the Baum-Welch Algorithm.
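
Finally, a minimal sketch of the Viterbi dynamic program from the slides above, on a made-up two-state HMM, in log space to avoid underflow:

```python
import numpy as np

M = np.array([[0.9, 0.2],     # M[i, j] = p(x_k = i | x_{k-1} = j)
              [0.1, 0.8]])
lik = np.array([[0.8, 0.3],   # lik[y, i] = p(y | x = i)
                [0.2, 0.7]])
ys = [0, 0, 1, 1, 0]

logM, loglik = np.log(M), np.log(lik)
delta = np.log([0.5, 0.5]) + loglik[ys[0]]   # best log-prob ending in each state
back = []                                    # backpointers to best predecessors
for y in ys[1:]:
    scores = logM + delta[None, :]           # scores[i, j]: best path ending in j, then j -> i
    back.append(np.argmax(scores, axis=1))   # max over predecessors (dynamic programming)
    delta = np.max(scores, axis=1) + loglik[y]

# Trace the most likely state trajectory backwards.
state = int(np.argmax(delta))
path = [state]
for bp in reversed(back):
    state = int(bp[state])
    path.append(state)
print(path[::-1])
```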