The Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision


Transcription:

The Particle Filter Non-parametric implementation of the Bayes filter: it represents the belief (posterior) by a set of random state samples. This representation is approximate, can represent distributions that are not Gaussian, and can model non-linear transformations. Basic principle: a set of state hypotheses ("particles") and survival of the fittest.

The Bayes Filter Algorithm (Rep.) Algorithm Bayes_filter(Bel(x), d):
1. if d is a sensor measurement z then
2.   η = 0
3.   for all x do
4.     Bel'(x) = p(z | x) Bel(x)
5.     η = η + Bel'(x)
6.   for all x do Bel'(x) = η⁻¹ Bel'(x)
7. else if d is an action u then
8.   for all x do Bel'(x) = ∫ p(x | u, x') Bel(x') dx'
9. return Bel'(x)
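
To make the control flow concrete, here is a minimal Python sketch of one Bayes filter step over a discrete state space; the array shapes of the sensor and motion models are illustrative assumptions, not part of the slides.

```python
import numpy as np

def bayes_filter_update(belief, datum, kind, sensor_model=None, motion_model=None):
    """One Bayes filter step over a discrete state space.

    belief:       1D array, belief Bel(x) over all states
    datum:        sensor measurement z or action u (an index here)
    kind:         'measurement' or 'action'
    sensor_model: array S[z, x] = p(z | x)         (assumed shape)
    motion_model: array T[u, x, x'] = p(x | u, x')  (assumed shape)
    """
    if kind == 'measurement':
        new_belief = sensor_model[datum] * belief   # p(z|x) Bel(x)
        new_belief /= new_belief.sum()              # normalize with eta
    else:  # action
        new_belief = motion_model[datum] @ belief   # sum_x' p(x|u,x') Bel(x')
    return new_belief
```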

Mathematical Description Set of weighted samples S = {⟨x[i], w[i]⟩ | i = 1, ..., M}, where the x[i] are the state hypotheses and the w[i] are the importance weights. The samples represent the probability distribution p(x) ≈ Σ_{i=1}^{M} w[i] δ_{x[i]}(x), a point mass distribution ("Dirac").

The Particle Filter Algorithm
1. Algorithm Particle_filter(S_{t-1}, u_t, z_t):
2. for i = 1 to M do
3.   sample x_t[i] ~ p(x_t | u_t, x_{t-1}[i])   (sample from proposal)
4.   w_t[i] = p(z_t | x_t[i])   (compute sample weights)
5.   add ⟨x_t[i], w_t[i]⟩ to the temporary set S̄_t
6. for i = 1 to M do: draw an index j with probability ∝ w_t[j] and add x_t[j] to S_t   (resampling)
7. return S_t
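
A minimal Python sketch of this loop is shown below; sample_motion and measurement_likelihood stand in for robot-specific motion and sensor models and are assumptions for illustration.

```python
import numpy as np

def particle_filter_step(particles, u, z, sample_motion, measurement_likelihood):
    """One particle filter step (prediction, weighting, resampling).
    sample_motion(particles, u) and measurement_likelihood(z, particles)
    are placeholders for the robot's motion and sensor models."""
    # 1. Sample from the proposal (motion model)
    predicted = sample_motion(particles, u)
    # 2. Compute importance weights from the observation model
    weights = measurement_likelihood(z, predicted)
    weights /= weights.sum()
    # 3. Resample with replacement according to the weights
    indices = np.random.choice(len(predicted), size=len(predicted), p=weights)
    return predicted[indices]
```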

Localization with Particle Filters Each particle is a potential pose of the robot. The proposal distribution is the motion model of the robot (prediction step). The observation model is used to compute the importance weight (correction step). Randomized algorithms are usually called Monte Carlo algorithms, therefore we call this: Monte Carlo Localization.

A Simple Example The initial belief is a uniform distribution (global localization). This is represented by an (approximately) uniform sampling of initial particles.

Sensor Information The sensor model is used to compute the new importance weights.

Robot Motion After resampling and applying the motion model, the particles are distributed more densely at three locations.

Sensor Information Again, we set the new importance weights equal to the sensor model.

Robot Motion Resampling and application of the motion model: one location of dense particles is left. The robot is localized.

A Closer Look at the Algorithm (the particle filter algorithm from above, repeated):
1. Algorithm Particle_filter(S_{t-1}, u_t, z_t):
2. for i = 1 to M do
3.   sample x_t[i] ~ p(x_t | u_t, x_{t-1}[i])   (sample from proposal)
4.   w_t[i] = p(z_t | x_t[i])   (compute sample weights)
5.   add ⟨x_t[i], w_t[i]⟩ to the temporary set S̄_t
6. for i = 1 to M do: draw an index j with probability ∝ w_t[j] and add x_t[j] to S_t   (resampling)
7. return S_t

Sampling from Proposal This can be done in the following ways: adding the motion vector to each particle directly (this assumes perfect motion), or sampling from the motion model, e.g. for a 2D motion with translation velocity v and rotation velocity ω: the position is updated by moving along the (noisy) heading with a noisy translation velocity, and the orientation is updated by adding a noisy rotation.
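
A hedged sketch of such motion-model sampling for planar poses (x, y, θ) is given below; the Gaussian noise model and its parameters are assumptions for illustration, not the exact model from the slides.

```python
import numpy as np

def sample_motion(particles, v, w, dt, sigma_v=0.1, sigma_w=0.05):
    """Sample a new pose (x, y, theta) for every particle from a simple
    noisy velocity motion model. Noise parameters are illustrative."""
    n = len(particles)
    v_noisy = v + np.random.normal(0.0, sigma_v, n)
    w_noisy = w + np.random.normal(0.0, sigma_w, n)
    x, y, theta = particles[:, 0], particles[:, 1], particles[:, 2]
    x_new = x + v_noisy * dt * np.cos(theta)
    y_new = y + v_noisy * dt * np.sin(theta)
    theta_new = theta + w_noisy * dt
    return np.column_stack([x_new, y_new, theta_new])
```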

Motion Model Sampling (Example) [Figure: particles sampled from the motion model, starting from an initial pose marked "Start".]

Computation of Importance Weights Computation of the sample weights: Proposal distribution: p(x_t | x_{t-1}, u_t) (we sample from that using the motion model). Target distribution (new belief): η p(z_t | x_t) p(x_t | x_{t-1}, u_t) Bel(x_{t-1}) (we cannot directly sample from that, hence importance sampling). Computation of importance weights: w_t = target / proposal ∝ p(z_t | x_t).
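
Written out (a short sketch in the standard MCL notation, with η the normalizer of the new belief), the motion-model and old-belief terms cancel between target and proposal:

```latex
w_t^{[i]} = \frac{\text{target}}{\text{proposal}}
          = \frac{\eta\, p(z_t \mid x_t^{[i]})\, p(x_t^{[i]} \mid x_{t-1}^{[i]}, u_t)\, Bel(x_{t-1}^{[i]})}
                 {p(x_t^{[i]} \mid x_{t-1}^{[i]}, u_t)\, Bel(x_{t-1}^{[i]})}
          \;\propto\; p(z_t \mid x_t^{[i]})
```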

Proximity Sensor Models How can we obtain the sensor model? By sensor calibration. [Figure: calibrated measurement distributions for a laser sensor and a sonar sensor.]

Resampling Given: a set of weighted samples. Wanted: a random sample, where the probability of drawing x_i is equal to w_i. Typically done M times with replacement to generate the new sample set (for i = 1 to M: draw an index j with probability w_j and add x_j to the new set).

Resampling [Figure: roulette-wheel view of the weights w_1, ..., w_n.] Standard n-times sampling results in high variance; this requires more particles and has O(n log n) complexity. Instead: low-variance sampling samples only once (one random offset, then n equally spaced pointers), has linear time complexity, and is easy to implement.
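
A minimal sketch of low-variance (systematic) resampling in Python, assuming the weights are already normalized:

```python
import numpy as np

def low_variance_resample(particles, weights):
    """Low-variance (systematic) resampling: draw one random offset and
    pick n equally spaced pointers on the cumulative weight distribution."""
    n = len(weights)
    positions = np.random.uniform(0, 1.0 / n) + np.arange(n) / n
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0                      # guard against rounding errors
    indices = np.searchsorted(cumulative, positions)
    return particles[indices]
```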

Sample-based Localization (sonar)

Initial Distribution

After Ten Ultrasound Scans

After 65 Ultrasound Scans

Estimated Path

Kidnapped Robot Problem The approach described so far is able to track the pose of a mobile robot and to globally localize the robot. How can we deal with localization errors (i.e., the kidnapped robot problem)? Idea: introduce uniform samples at every resampling step; this adds new hypotheses.

Summary There are mainly 4 different types of sampling methods: the transformation method, rejection sampling, importance sampling, and MCMC. The transformation method is only rarely applicable. Rejection sampling is often very inefficient. Importance sampling is used in the particle filter, which can be used for robot localization. An efficient implementation of the resampling step is low-variance sampling.

Prof. Daniel Cremers Markov Chain Monte Carlo

Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient. An alternative is Markov Chain Monte Carlo (MCMC). It keeps a record of the current state, and the proposal depends on that state. The most common algorithms are the Metropolis-Hastings algorithm and Gibbs sampling.

Markov Chains Revisited A Markov chain is a distribution over discrete-state random variables x_1, ..., x_T so that p(x_1, ..., x_T) = p(x_1) p(x_2 | x_1) ··· = p(x_1) Π_{t=2}^{T} p(x_t | x_{t-1}). The graphical model of a Markov chain is a simple chain x_1 → x_2 → ... → x_T. We will denote the marginal p(x_t) as a row vector π_t and the transition probabilities p(x_t | x_{t-1}) as a matrix A. A Markov chain can also be visualized as a state transition diagram.

The State Transition Diagram [Figure: the chain unrolled over time (t-2, t-1, t), with states k = 1, 2, 3 on the vertical axis and transition probabilities such as A_11 and A_33 labelling the edges.]

Some Notions The Markov chain is said to be homogeneous if the transition probabilities are all the same at every time step t (here we only consider homogeneous Markov chains). The transition matrix is row-stochastic, i.e. all entries are between 0 and 1 and all rows sum up to 1. Observation: the probabilities of reaching the states can be computed using a vector-matrix multiplication.

The Stationary Distribution The probability to reach state k is π_{k,t} = Σ_{i=1}^{K} π_{i,t-1} A_{ik}, or, in matrix notation: π_t = π_{t-1} A. We say that π_t is stationary if π_t = π_{t-1}. Questions: How can we know that a stationary distribution exists? And if it exists, how do we know that it is unique?

The Stationary Distribution (Existence) To find a stationary distribution we need to solve the eigenvector problem A^T v = v. The stationary distribution is then π = v^T, where v is the eigenvector for which the eigenvalue is 1. This eigenvector needs to be normalized so that it is a valid distribution. Theorem (Perron-Frobenius): Every row-stochastic matrix has such an eigenvector, but this vector may not be unique.
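
As a small numerical sketch (the transition matrix below is an illustrative example, not from the slides), the stationary distribution can be computed exactly this way with NumPy:

```python
import numpy as np

# Illustrative 3-state row-stochastic transition matrix (not from the slides).
A = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])

# Solve A^T v = v: take the eigenvector of A^T whose eigenvalue is 1.
eigvals, eigvecs = np.linalg.eig(A.T)
v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
pi = v / v.sum()            # normalize so it is a valid distribution

print(pi)                   # stationary distribution
print(pi @ A)               # equals pi again (pi is stationary)
```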

Stationary Distribution (Uniqueness) A Markov chain can have many stationary distributions. Sufficient for a unique stationary distribution: we can reach every state from any other state in a finite number of steps with non-zero probability (i.e. the chain is ergodic). This is equivalent to the property that the transition matrix is irreducible: ∀ i, j ∃ m: (A^m)_{ij} > 0. [Figure: a four-state transition diagram as an example.]

Main Idea of MCMC So far, we specified the transition probabilities and analysed the resulting distribution; this was used, e.g., in HMMs. Now we want to sample from an arbitrary distribution. To do that, we design the transition probabilities so that the resulting stationary distribution is our desired (target) distribution!

Detailed Balance Definition: A transition distribution satisfies the property of detailed balance with respect to a distribution π if π_i A_{ij} = π_j A_{ji}. The chain is then said to be reversible. [Figure: probability flow between states 1 and 3 via A_13 and A_31 from time t-1 to t.]

Making a Distribution Stationary Theorem: If a Markov chain with transition matrix A is irreducible and satisfies detailed balance wrt. the distribution π, then π is a stationary distribution of the chain. Proof: Σ_{i=1}^{K} π_i A_{ij} = Σ_{i=1}^{K} π_j A_{ji} = π_j Σ_{i=1}^{K} A_{ji} = π_j for all j, from which it follows π = π A. This is a sufficient, but not necessary condition.
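
A tiny numerical check of this statement (the symmetric chain below is an illustrative example, not from the slides):

```python
import numpy as np

# A reversible 3-state chain with self-loops; by symmetry its stationary
# distribution is uniform.
A = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
pi = np.array([1/3, 1/3, 1/3])

# Detailed balance: pi_i * A_ij == pi_j * A_ji for all i, j ...
assert np.allclose(pi[:, None] * A, (pi[:, None] * A).T)
# ... which implies stationarity: pi A == pi
assert np.allclose(pi @ A, pi)
```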

Sampling with a Markov Chain The idea of MCMC is to sample state transitions based on a proposal distribution q. The most widely used algorithm is the Metropolis-Hastings (MH) algorithm. In MH, the decision whether to move to a proposed state is based on a given probability: if the proposal distribution is q(x' | x), then we move to state x' with probability min(1, p̃(x') q(x | x') / (p̃(x) q(x' | x))), where p̃ is the unnormalized target distribution.

The Metropolis-Hastings Algorithm
Initialize x^0
for s = 0, 1, 2, ... do
  define x = x^s
  sample x' ~ q(x' | x)
  compute the acceptance probability α = p̃(x') q(x | x') / (p̃(x) q(x' | x))
  compute r = min(1, α)
  sample u ~ U(0, 1)
  set the new sample to x^{s+1} = x' if u < r, and x^{s+1} = x^s if u ≥ r
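
A minimal Python sketch of MH with a symmetric Gaussian random-walk proposal (so the q-terms cancel in the acceptance ratio); the target and all parameter values are illustrative assumptions:

```python
import numpy as np

def metropolis_hastings(p_tilde, x0, n_samples, proposal_std=1.0):
    """Metropolis-Hastings with a symmetric Gaussian random-walk proposal.
    p_tilde is the unnormalized target density; proposal_std is an assumption."""
    x = x0
    samples = []
    for _ in range(n_samples):
        x_prop = x + np.random.normal(0.0, proposal_std)   # sample x' ~ q(x'|x)
        alpha = p_tilde(x_prop) / p_tilde(x)                # acceptance ratio
        if np.random.uniform() < min(1.0, alpha):
            x = x_prop                                      # accept x'
        samples.append(x)                                   # otherwise keep x^s
    return np.array(samples)

# Example: a mixture of two 1D Gaussians as the (unnormalized) target.
target = lambda x: np.exp(-0.5 * (x + 3.0)**2) + np.exp(-0.5 * (x - 3.0)**2)
chain = metropolis_hastings(target, x0=0.0, n_samples=5000, proposal_std=8.0)
```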

Why Does This Work? We have to prove that the transition probability of the MH algorithm satisfies detailed balance wrt. the target distribution. Theorem: If p_MH(x' | x) is the transition probability of the MH algorithm, then p(x) p_MH(x' | x) = p(x') p_MH(x | x'). Proof sketch (for x ≠ x'): p(x) p_MH(x' | x) = p(x) q(x' | x) min(1, p(x') q(x | x') / (p(x) q(x' | x))) = min(p(x) q(x' | x), p(x') q(x | x')), which is symmetric in x and x', so it also equals p(x') p_MH(x | x').

Why Does This Work? We have to prove that the transition probability of the MH algorithm satisfies detailed balance wrt. the target distribution. Theorem: If p_MH(x' | x) is the transition probability of the MH algorithm, then p(x) p_MH(x' | x) = p(x') p_MH(x | x'). Note: All formulations are valid for discrete and for continuous variables!

Choosing the Proposal A proposal distribution is valid if it gives a non-zero probability of moving to the states that have a non-zero probability in the target. A good proposal is the Gaussian, because it has a non-zero probability for all states. However, the variance of the Gaussian is important! With low variance, the sampler does not explore sufficiently, e.g. it stays fixed to a particular mode; with too high variance, the proposal is rejected too often and the samples are a bad approximation.

Example Target is a mixture of 2 1D Gaussians; the proposal is a Gaussian with different variances. [Figures: MH chains over 1000 iterations and the resulting sample histograms for N(0, 1.000²), N(0, 8.000²), and N(0, 500.000²) proposals.]

Gibbs Sampling
Initialize {z_i : i = 1, ..., M}
For τ = 1, ..., T:
  Sample z_1^{(τ+1)} ~ p(z_1 | z_2^{(τ)}, ..., z_M^{(τ)})
  Sample z_2^{(τ+1)} ~ p(z_2 | z_1^{(τ+1)}, z_3^{(τ)}, ..., z_M^{(τ)})
  ...
  Sample z_M^{(τ+1)} ~ p(z_M | z_1^{(τ+1)}, ..., z_{M-1}^{(τ+1)})
Idea: sample from the full conditional. This can be obtained, e.g., from the Markov blanket in graphical models.

Gibbs Sampling: Example Use an MRF on a binary image with edge potentials ψ(x_s, x_t) = exp(J x_s x_t) ("Ising model") and node potentials ψ(x_t) = N(y_t | x_t, σ²), where x_t ∈ {-1, +1} and the y_t are the observed (noisy) pixel values.

Gibbs Sampling: Example Use an MRF on a binary image with edge potentials ψ(x_s, x_t) = exp(J x_s x_t) ("Ising model") and node potentials ψ(x_t) = N(y_t | x_t, σ²). Sample each pixel in turn. [Figures: the state after 1 sample, after 5 samples, and the mean after 15 sweeps of Gibbs sampling.]
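
A compact Python sketch of this Gibbs sampler; the coupling J, noise σ, and number of sweeps are illustrative assumptions, not values from the slides:

```python
import numpy as np

def gibbs_ising_denoise(y, J=1.0, sigma=0.5, n_sweeps=15):
    """Gibbs sampling for the Ising MRF above: labels x in {-1,+1},
    edge potentials exp(J*x_s*x_t), node potentials N(y_t | x_t, sigma^2)."""
    x = np.where(y > 0, 1, -1)              # initialize from the noisy image y
    H, W = y.shape
    mean = np.zeros((H, W))
    for _ in range(n_sweeps):
        for i in range(H):
            for j in range(W):
                # Sum over the 4-neighbourhood (Markov blanket of pixel (i,j))
                nb = 0.0
                if i > 0: nb += x[i - 1, j]
                if i < H - 1: nb += x[i + 1, j]
                if j > 0: nb += x[i, j - 1]
                if j < W - 1: nb += x[i, j + 1]
                # Log-odds of x_ij = +1: edge potentials give 2*J*nb, the ratio
                # of the two Gaussian node potentials gives 2*y/sigma^2
                logit = 2.0 * J * nb + 2.0 * y[i, j] / sigma**2
                p_plus = 1.0 / (1.0 + np.exp(-logit))
                x[i, j] = 1 if np.random.rand() < p_plus else -1
        mean += x
    return x, mean / n_sweeps               # final sample and running mean
```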

Gibbs Sampling is a Special Case of MH The proposal distribution in Gibbs sampling is q(x' | x) = p(x'_i | x_{-i}) I(x'_{-i} = x_{-i}). This leads to an acceptance rate of α = p(x') q(x | x') / (p(x) q(x' | x)) = (p(x'_i | x'_{-i}) p(x'_{-i}) p(x_i | x'_{-i})) / (p(x_i | x_{-i}) p(x_{-i}) p(x'_i | x_{-i})) = 1, since x'_{-i} = x_{-i}. Although the acceptance is 100%, Gibbs sampling does not converge faster, as it only updates one variable at a time.

Summary Markov Chain Monte Carlo is a family of sampling algorithms that can sample from arbitrary distributions by moving in state space. The most widely used methods are Metropolis-Hastings (MH) and Gibbs sampling. MH uses a proposal distribution and accepts a proposed state randomly. Gibbs sampling does not use a proposal distribution, but samples from the full conditionals.

Prof. Daniel Cremers 11. Variational Inference

Motivation A major task in probabilistic reasoning is to evaluate the posterior distribution p(Z | X) of a set of latent variables Z given data X (inference). However, this is often not tractable, e.g. because the latent space is high-dimensional. Two different solutions are possible: sampling methods and variational methods. In variational optimization, we seek a tractable distribution q that approximates the posterior. Optimization is done using functionals.

Motivation (Note) Careful: Different notation! In Bishop (and in the following slides), Z are the hidden states and X are the observations.

Variational Inference In general, variational methods are concerned with mappings that take functions as input. Example: the entropy of a distribution p, H[p] = -∫ p(x) log p(x) dx, is a functional. Variational optimization aims at finding functions that minimize (or maximize) a given functional. This is mainly used to find approximations to a given function by choosing from a family. The aim is mostly tractability and simplification.

The KL-Divergence Aim: define a functional that resembles a difference between distributions p and q. Idea: use the average additional amount of information: -∫ p(x) log q(x) dx - (-∫ p(x) log p(x) dx) = -∫ p(x) log (q(x) / p(x)) dx = KL(p‖q). This is known as the Kullback-Leibler divergence. It has the properties: KL(q‖p) ≠ KL(p‖q), KL(p‖q) ≥ 0, and KL(p‖q) = 0 ⇔ p ≡ q. This follows from Jensen's inequality.
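
A tiny Python illustration of these properties for discrete distributions (the example distributions are made up):

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) = sum_x p(x) log(p(x)/q(x)) for discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sum(p * np.log(p / q))

p = [0.7, 0.2, 0.1]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q))   # > 0
print(kl_divergence(q, p))   # a different value: KL(q||p) != KL(p||q)
print(kl_divergence(p, p))   # 0, since KL(p||q) = 0 iff p == q
```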

Example: A Variational Formulation of EM Assume for a moment that we observe X and the binary latent variables Z. The likelihood is then: p(X, Z | π, µ, Σ) = Π_{n=1}^{N} p(z_n | π) p(x_n | z_n, µ, Σ).

Example: A Variational Formulation of EM Assume for a moment that we observe X and the binary latent variables Z. The likelihood is then: p(X, Z | π, µ, Σ) = Π_{n=1}^{N} p(z_n | π) p(x_n | z_n, µ, Σ), where p(z_n | π) = Π_{k=1}^{K} π_k^{z_nk} and p(x_n | z_n, µ, Σ) = Π_{k=1}^{K} N(x_n | µ_k, Σ_k)^{z_nk}. Remember: z_nk ∈ {0, 1} and Σ_{k=1}^{K} z_nk = 1.

Example: A Variational Formulation of EM Assume for a moment that we observe X and the binary latent variables Z. The likelihood is then: p(X, Z | π, µ, Σ) = Π_{n=1}^{N} p(z_n | π) p(x_n | z_n, µ, Σ), with p(z_n | π) and p(x_n | z_n, µ, Σ) as above, which leads to the log-formulation: log p(X, Z | π, µ, Σ) = Σ_{n=1}^{N} Σ_{k=1}^{K} z_nk (log π_k + log N(x_n | µ_k, Σ_k)).

The Complete-Data Log-Likelihood log p(X, Z | π, µ, Σ) = Σ_{n=1}^{N} Σ_{k=1}^{K} z_nk (log π_k + log N(x_n | µ_k, Σ_k)). This is called the complete-data log-likelihood. Advantage: solving for the parameters (π_k, µ_k, Σ_k) is much simpler, as the log is inside the sum! We could switch the sums and then, for every mixture component k, only look at the points that are associated with that component. This leads to simple closed-form solutions for the parameters. However: the latent variables Z are not observed!
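
For concreteness, a short Python sketch that evaluates this complete-data log-likelihood for given (illustrative) assignments and parameters, assuming SciPy is available:

```python
import numpy as np
from scipy.stats import multivariate_normal

def complete_data_log_likelihood(X, Z, pi, mu, Sigma):
    """log p(X, Z | pi, mu, Sigma) = sum_n sum_k z_nk (log pi_k + log N(x_n | mu_k, Sigma_k)).
    X: (N, D) data, Z: (N, K) one-hot assignments, pi: (K,) mixture weights,
    mu: list of K mean vectors, Sigma: list of K covariance matrices."""
    N, K = Z.shape
    log_terms = np.zeros((N, K))
    for k in range(K):
        log_terms[:, k] = np.log(pi[k]) + multivariate_normal.logpdf(X, mu[k], Sigma[k])
    return np.sum(Z * log_terms)
```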