ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering


ECE276A: Sensing & Estimation in Robotics
Lecture 10: Gaussian Mixture and Particle Filtering
Lecturer: Nikolay Atanasov (natanasov@ucsd.edu)
Teaching Assistants: Siwei Guo (s9guo@eng.ucsd.edu), Anwesan Pal (a2pal@eng.ucsd.edu)

Vectors Notation
$x \in \mathbb{R}^d$: d-dimensional vector in Euclidean space. It may be deterministic or random. If it is random, we describe it with a parametric distribution with parameters $\theta$ or $w$, or equivalently with its probability density function $p(x; w)$. Example: $x \sim \mathcal{N}(\mu, \Sigma)$ with pdf $\phi(x; \mu, \Sigma)$, where $\mathcal{N}$ is the distribution, $\phi$ is the pdf, and $\{\mu, \Sigma\}$ are the deterministic vector and matrix parameters.
$A \in \mathbb{R}^{n \times m}$: an $n \times m$ matrix. Its transpose is $A^T \in \mathbb{R}^{m \times n}$ and its inverse (if it exists) is $A^{-1} \in \mathbb{R}^{n \times n}$ for $n = m$.
$\|x\|_2 := \sqrt{x^T x}$: vector Euclidean norm
$x(t)$: a vector that changes in continuous time $t \in \mathbb{R}_{>0}$
$x_t$: a vector that changes in discrete time $t \in \mathbb{N}$
$\dot{x}(t) := \frac{d}{dt} x(t)$: time derivative of $x(t)$

Random Vectors Notation
$x \in \mathbb{R}^d$: d-dimensional random vector with CDF $F$ and pdf $p$
$E[h(x)]$: expectation of a function $h$ of the random vector $x$: $E[h(x)] := \int h(x) p(x)\, dx$
$\mu := E[x]$: mean of $x$, also called the first moment of $x$
$x^* := \arg\max_x p(x)$: mode of $x$
$E[x x^T]$: second moment of $x$
$\Sigma := E[(x - \mu)(x - \mu)^T] = E[x x^T] - \mu \mu^T$: covariance of $x$

Parameter Estimation Notation
$\mathcal{D} := \{(x_i, y_i)\}_{i=1}^n$: training data of examples $x_i \in \mathbb{R}^d$ and labels $y_i \in \{-1, 1\}$ (classification) or $y_i \in \mathbb{R}$ (regression). The elements $(x_i, y_i)$ are iid over $i$, sampled from an unknown joint pdf $p(x_i, y_i)$.
$p(x_i, y_i; \omega)$: generative model using pdf parameters $\omega$
$p(y_i \mid x_i; \omega)$: discriminative model using parameters $\omega$
$\max_\omega \prod_{i=1}^n p(x_i, y_i; \omega)$: training, i.e., parameter estimation via MLE
$\max_y p(x, y; \omega)$: testing, i.e., label prediction for a given new sample $x$ and already-trained parameters $\omega$
$\sigma(z) := \frac{1}{1 + \exp(-z)} \in \mathbb{R}$: sigmoid function, useful for modeling binary classification
$\mathrm{softmax}(z) := \frac{e^z}{\mathbf{1}^T e^z} \in \mathbb{R}^d$: softmax function, useful for modeling K-ary classification

Orientations Notation
$a \in \mathbb{R}^3$: an axis-angle representation of orientation/rotation, also called a rotation vector. The axis of rotation is $\xi := \frac{a}{\|a\|_2}$ and the angle is $\theta := \|a\|_2$.
$\hat{a} \in so(3)$: skew-symmetric matrix associated with cross products, $\hat{a} b = a \times b$ for $b \in \mathbb{R}^3$. Skew-symmetric matrices define the Lie algebra (i.e., local linear approximation) of the Lie group of rotations.
$R = \exp(\hat{a}) \in SO(3)$: rotation matrix representation of the rotation vector $a \in \mathbb{R}^3$. The matrix exponential function is $\exp(A) := \sum_{k=0}^{\infty} \frac{A^k}{k!}$ and for rotations can be computed in closed form via the Rodrigues formula.
$q = \exp([0, \frac{1}{2} a]) \in S^3$: quaternion representation of the rotation vector $a \in \mathbb{R}^3$. The quaternion exponential function is not the same as the matrix exponential function!
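As a quick illustration of the notation above, here is a minimal NumPy sketch (not course code; the function names are made up) of the hat map and the Rodrigues formula for computing $R = \exp(\hat{a})$ from a rotation vector:

```python
import numpy as np

def hat(a):
    """Skew-symmetric matrix such that hat(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def axangle_to_rotmat(a, eps=1e-9):
    """Rodrigues formula: R = I + (sin t / t) a_hat + ((1 - cos t) / t^2) a_hat^2, t = ||a||_2."""
    a = np.asarray(a, dtype=float)
    theta = np.linalg.norm(a)
    A = hat(a)
    if theta < eps:                      # small-angle fallback: first-order approximation
        return np.eye(3) + A
    return (np.eye(3)
            + (np.sin(theta) / theta) * A
            + ((1.0 - np.cos(theta)) / theta ** 2) * (A @ A))
```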

Transformations Notation
$\omega \in \mathbb{R}^3$: angular velocity; defines the rate of change of orientation, $\dot{R} = \hat{\omega} R$ for rotation matrices or equivalently $\dot{q} = [0, \frac{\omega}{2}] \circ q$ for quaternions.
$\zeta := (\omega, v) \in \mathbb{R}^6$: linear velocity $v \in \mathbb{R}^3$ and angular velocity $\omega \in \mathbb{R}^3$.
$\hat{\zeta} \in se(3)$: a twist matrix $\begin{bmatrix} \hat{\omega} & v \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{4 \times 4}$ defining the Lie algebra of the Lie group of rigid-body transformations. A twist $\hat{\zeta}$ defines the rate of change of pose: position $\dot{p} = \hat{\omega} p + v$ and orientation $\dot{R} = \hat{\omega} R$.
$g = \exp(\hat{\zeta}) \in SE(3)$: a matrix $g = \begin{bmatrix} R & p \\ 0 & 1 \end{bmatrix}$ representing a rigid-body transformation, where $R \in SO(3)$ is the rotation/orientation and $p \in \mathbb{R}^3$ is the translation/position.

Transformations Notation
${}_W g_B \in SE(3)$, $x_t \in SE(3)$: body-frame-to-world-frame transformation defined by the body pose ${}_W g_B$ or robot state $x_t$.
$g_1 g_2 = \begin{bmatrix} R_1 & p_1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} R_2 & p_2 \\ 0 & 1 \end{bmatrix}$: transformation composition in $SE(3)$, equivalent to addition in Euclidean space.
$g_1^{-1} g_2 = \begin{bmatrix} R_1^T & -R_1^T p_1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} R_2 & p_2 \\ 0 & 1 \end{bmatrix}$: transformation inverse composition in $SE(3)$, equivalent to subtraction in Euclidean space.
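The composition and inverse formulas above translate directly into a few lines of NumPy. A minimal sketch with made-up function names, assuming poses are stored as 4x4 homogeneous matrices:

```python
import numpy as np

def pose_compose(g1, g2):
    """Composition of two 4x4 homogeneous transforms."""
    return g1 @ g2

def pose_inverse(g):
    """Closed-form inverse of g = [[R, p], [0, 1]], namely [[R^T, -R^T p], [0, 1]]."""
    R, p = g[:3, :3], g[:3, 3]
    g_inv = np.eye(4)
    g_inv[:3, :3] = R.T
    g_inv[:3, 3] = -R.T @ p
    return g_inv
```

Inverse composition $g_1^{-1} g_2$ is then `pose_compose(pose_inverse(g1), g2)`.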

Filtering Notation
$x_t \in \mathbb{R}^d$: system/robot state to be estimated (e.g., position, orientation, etc.). Usually $x_t \sim \mathcal{N}(\mu_{t|t}, \Sigma_{t|t})$ with pdf $\phi(x_t; \mu_{t|t}, \Sigma_{t|t})$.
$u_t \in \mathbb{R}^{d_u}$: known system/robot control input (e.g., rotational velocity)
$z_t \in \mathbb{R}^{d_z}$: known system/robot measurement/observation (e.g., pixel coordinates)
$w_t \sim \mathcal{N}(0, W)$: Gaussian motion noise
$v_t \sim \mathcal{N}(0, V)$: Gaussian observation noise
$p_{t|t}(x_t)$: pdf of the robot state $x_t$ given past measurements $z_{0:t}$ and control inputs $u_{0:t-1}$: $p_{t|t}(x_t) := p(x_t \mid z_{0:t}, u_{0:t-1})$
$p_{t+1|t}(x_{t+1})$: predicted pdf of the robot state $x_{t+1}$ given past measurements $z_{0:t}$ and control inputs $u_{0:t}$: $p_{t+1|t}(x_{t+1}) := p(x_{t+1} \mid z_{0:t}, u_{0:t})$

Bayes Filter
Motion model: $x_{t+1} = a(x_t, u_t, w_t) \sim p_a(\cdot \mid x_t, u_t)$
Observation model: $z_t = h(x_t, v_t) \sim p_h(\cdot \mid x_t)$
Filtering keeps track of:
$p_{t|t}(x_t) := p(x_t \mid z_{0:t}, u_{0:t-1})$
$p_{t+1|t}(x_{t+1}) := p(x_{t+1} \mid z_{0:t}, u_{0:t})$
Bayes filter:
Predict: $p_{t+1|t}(x_{t+1}) = \int p_a(x_{t+1} \mid x_t, u_t)\, p_{t|t}(x_t)\, dx_t$
Update: $p_{t+1|t+1}(x_{t+1}) = \underbrace{\frac{1}{p(z_{t+1} \mid z_{0:t}, u_{0:t})}}_{\eta_{t+1}}\, p_h(z_{t+1} \mid x_{t+1})\, p_{t+1|t}(x_{t+1})$
Joint distribution: $p(x_{0:T}, z_{0:T}, u_{0:T-1}) = \underbrace{p_{0|0}(x_0)}_{\text{prior}} \prod_{t=0}^{T} \underbrace{p_h(z_t \mid x_t)}_{\text{observation model}} \prod_{t=1}^{T} \underbrace{p_a(x_t \mid x_{t-1}, u_{t-1})}_{\text{motion model}}$
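To make the predict/update recursion concrete, here is a minimal sketch of one Bayes filter step on a finite state space, where the motion and observation models are supplied as a (hypothetical) transition matrix and likelihood vector:

```python
import numpy as np

def bayes_filter_step(p_tt, P_a, likelihood):
    """One Bayes filter step over n discrete states.
      p_tt:       (n,) current belief p_{t|t}
      P_a:        (n, n) motion model for the applied u_t, P_a[i, j] = p_a(x_{t+1}=i | x_t=j, u_t)
      likelihood: (n,) observation model at the received z_{t+1}, p_h(z_{t+1} | x_{t+1}=i)
    """
    p_pred = P_a @ p_tt            # predict: sum_j p_a(x' | x_j, u_t) p_{t|t}(x_j)
    p_post = likelihood * p_pred   # update: multiply by the observation likelihood
    return p_post / p_post.sum()   # normalize by p(z_{t+1} | z_{0:t}, u_{0:t})
```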

Gaussian Mixture Filter
Prior: $x_t \mid z_{0:t}, u_{0:t-1} \sim p_{t|t}(x) := \sum_k \alpha^k_{t|t}\, \phi(x; \mu^k_{t|t}, \Sigma^k_{t|t})$
Motion model: $x_{t+1} = A x_t + B u_t + w_t$, $w_t \sim \mathcal{N}(0, W)$
Observation model: $z_t = H x_t + v_t$, $v_t \sim \mathcal{N}(0, V)$
Prediction:
$p_{t+1|t}(x) = \int p_a(x \mid s, u_t)\, p_{t|t}(s)\, ds = \sum_k \alpha^k_{t|t} \int p_a(x \mid s, u_t)\, \phi(s; \mu^k_{t|t}, \Sigma^k_{t|t})\, ds = \sum_k \alpha^k_{t|t}\, \phi(x; A \mu^k_{t|t} + B u_t,\, A \Sigma^k_{t|t} A^T + W)$
Update:
$p_{t+1|t+1}(x) = \frac{p_h(z_{t+1} \mid x)\, p_{t+1|t}(x)}{p(z_{t+1} \mid z_{0:t}, u_{0:t})} = \frac{\phi(z_{t+1}; H x, V) \sum_k \alpha^k_{t+1|t}\, \phi(x; \mu^k_{t+1|t}, \Sigma^k_{t+1|t})}{\sum_j \alpha^j_{t+1|t} \int \phi(z_{t+1}; H s, V)\, \phi(s; \mu^j_{t+1|t}, \Sigma^j_{t+1|t})\, ds}$
$= \sum_k \frac{\alpha^k_{t+1|t}\, \phi(z_{t+1}; H \mu^k_{t+1|t},\, H \Sigma^k_{t+1|t} H^T + V)}{\sum_j \alpha^j_{t+1|t}\, \phi(z_{t+1}; H \mu^j_{t+1|t},\, H \Sigma^j_{t+1|t} H^T + V)}\, \phi\bigl(x;\, \mu^k_{t+1|t} + K^k (z_{t+1} - H \mu^k_{t+1|t}),\, (I - K^k H)\, \Sigma^k_{t+1|t}\bigr)$
Kalman gain: $K^k := \Sigma^k_{t+1|t} H^T \bigl(H \Sigma^k_{t+1|t} H^T + V\bigr)^{-1}$

Gaussian Mixture Filter
pdf: $x_t \mid z_{0:t}, u_{0:t-1} \sim p_{t|t}(x) := \sum_k \alpha^k_{t|t}\, \phi(x; \mu^k_{t|t}, \Sigma^k_{t|t})$
mean: $\mu_{t|t} := E[x_t \mid z_{0:t}, u_{0:t-1}] = \int x\, p_{t|t}(x)\, dx = \sum_k \alpha^k_{t|t}\, \mu^k_{t|t}$
cov: $\Sigma_{t|t} := E[x_t x_t^T \mid z_{0:t}, u_{0:t-1}] - \mu_{t|t} \mu_{t|t}^T = \int x x^T p_{t|t}(x)\, dx - \mu_{t|t} \mu_{t|t}^T = \sum_k \alpha^k_{t|t} \bigl(\Sigma^k_{t|t} + \mu^k_{t|t} (\mu^k_{t|t})^T\bigr) - \mu_{t|t} \mu_{t|t}^T$
The GMF is just a bank of Kalman filters; it is sometimes called the Gaussian Sum filter.
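For the linear models above, the GMF is literally a bank of Kalman filters with component reweighting. A minimal NumPy/SciPy sketch of one predict+update step (variable names are illustrative, not from the course code):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmf_step(alphas, mus, Sigmas, A, B, H, W, V, u, z):
    """One Gaussian mixture filter step: alphas (K,), mus (K, d), Sigmas (K, d, d)."""
    new_alphas, new_mus, new_Sigmas = [], [], []
    for a, mu, S in zip(alphas, mus, Sigmas):
        mu_p = A @ mu + B @ u                      # Kalman predict per component
        S_p = A @ S @ A.T + W
        S_z = H @ S_p @ H.T + V                    # innovation covariance
        K = S_p @ H.T @ np.linalg.inv(S_z)         # Kalman gain K^k
        new_mus.append(mu_p + K @ (z - H @ mu_p))  # Kalman update per component
        new_Sigmas.append((np.eye(len(mu)) - K @ H) @ S_p)
        new_alphas.append(a * multivariate_normal.pdf(z, H @ mu_p, S_z))  # weight update
    new_alphas = np.array(new_alphas)
    return new_alphas / new_alphas.sum(), np.array(new_mus), np.array(new_Sigmas)
```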

Gaussian Mixture Filter
If the motion or observation models are nonlinear, we can apply the EKF or UKF tricks to obtain a nonlinear GMF.
Additional operations are needed when strong nonlinearities are present in the motion or observation models:
Refinement: introduces additional components to reduce the linearization error.
Pruning: approximates the overall distribution with a smaller number of components, e.g., using the KL divergence as a measure of accuracy.
More details:
"Nonlinear Gaussian Filtering: Theory, Algorithms, and Applications" (Huber)
"Bayesian Filtering and Smoothing" (Särkkä)

Histogram Filter
Represent the pdf via a histogram over a discrete set of possible locations.
The accuracy is limited by the grid size.
A fine grid (small cell size) becomes very computationally expensive in high-dimensional state spaces because the number of cells is exponential in the number of dimensions.
Idea: represent the pdf via adaptive discretization, e.g., octrees.

Histogram Filter
Prediction step:
Assume bounded Gaussian noise in the motion model.
The prediction step can be realized by shifting the data in the grid according to the control input and convolving the grid with a separable Gaussian kernel.
This reduces the prediction step cost from $O(n^2)$ to $O(n)$, where $n$ is the number of cells.
Update step:
To update and normalize the pdf upon sensory input, one has to iterate over all cells.
Is it possible to monitor which part of the state space is affected by the observations and only update that part?
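A minimal sketch of the shift-and-convolve prediction step on a 1-D grid, assuming the control input and the motion noise standard deviation are expressed in grid cells (SciPy handles the shift and the separable Gaussian convolution):

```python
import numpy as np
from scipy.ndimage import shift, gaussian_filter1d

def histogram_predict(belief, u_cells, noise_std_cells):
    """Histogram filter prediction on a 1-D grid: shift the belief by the control
    input, then convolve with a Gaussian kernel representing the motion noise."""
    shifted = shift(belief, u_cells, order=1, mode="constant", cval=0.0)  # apply control
    predicted = gaussian_filter1d(shifted, sigma=noise_std_cells)         # diffuse by noise
    return predicted / predicted.sum()                                    # renormalize
```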

Particle Filter
A Gaussian mixture filter with $\Sigma^k_{t|t} \to 0$ so that $\phi(x_t; \mu^k_{t|t}, \Sigma^k_{t|t}) \to \delta(x_t; \mu^k_{t|t}) := \mathbb{1}\{x_t = \mu^k_{t|t}\}$
Prior: $x_t \mid z_{0:t}, u_{0:t-1} \sim p_{t|t}(x) := \sum_{k=1}^{N_{t|t}} \alpha^k_{t|t}\, \delta(x; \mu^k_{t|t})$
Motion model: $x_{t+1} \sim p_a(\cdot \mid x_t, u_t)$
Observation model: $z_t \sim p_h(\cdot \mid x_t)$
Prediction: $p_{t+1|t}(x) = \sum_{k=1}^{N_{t|t}} \alpha^k_{t|t} \int p_a(x \mid s, u_t)\, \delta(s; \mu^k_{t|t})\, ds = \sum_{k=1}^{N_{t|t}} \alpha^k_{t|t}\, p_a(x \mid \mu^k_{t|t}, u_t) \overset{??}{\approx} \sum_{k=1}^{N_{t+1|t}} \alpha^k_{t+1|t}\, \delta(x; \mu^k_{t+1|t})$
Update: $p_{t+1|t+1}(x) = \frac{p_h(z_{t+1} \mid x) \sum_{k} \alpha^k_{t+1|t}\, \delta(x; \mu^k_{t+1|t})}{\int p_h(z_{t+1} \mid s) \sum_{j=1}^{N_{t+1|t}} \alpha^j_{t+1|t}\, \delta(s; \mu^j_{t+1|t})\, ds} = \sum_{k=1}^{N_{t+1|t}} \frac{\alpha^k_{t+1|t}\, p_h(z_{t+1} \mid \mu^k_{t+1|t})}{\sum_{j=1}^{N_{t+1|t}} \alpha^j_{t+1|t}\, p_h(z_{t+1} \mid \mu^j_{t+1|t})}\, \delta(x; \mu^k_{t+1|t})$

Particle Filter Resampling
How do we approximate the prediction step?
$p_{t+1|t}(x) = \sum_{k=1}^{N_{t|t}} \alpha^k_{t|t}\, p_a(x \mid \mu^k_{t|t}, u_t) \overset{??}{\approx} \sum_{k=1}^{N_{t+1|t}} \alpha^k_{t+1|t}\, \delta(x; \mu^k_{t+1|t})$
How do we avoid particle depletion, a situation in which most of the particle weights are close to zero?
Just like the GMF uses refinement and pruning, the particle filter uses a procedure called resampling to:
1. approximate the prediction step
2. avoid particle depletion during the update step
Resampling is applied at time $t$ if the effective number of particles $N_{\text{eff}} := \frac{1}{\sum_{k=1}^{N_{t|t}} (\alpha^k_{t|t})^2}$ is less than a threshold.

Particle Filter Prediction
How do we approximate the prediction step?
$p_{t+1|t}(x) = \sum_{k=1}^{N_{t|t}} \alpha^k_{t|t}\, p_a(x \mid \mu^k_{t|t}, u_t) \overset{??}{\approx} \sum_{k=1}^{N_{t+1|t}} \alpha^k_{t+1|t}\, \delta(x; \mu^k_{t+1|t})$
Since $p_{t+1|t}(x)$ is a mixture pdf, we can approximate it with particles by drawing samples directly from it.
Let $N$ be the number of particles in the approximation (usually $N = N_{t|t}$).
Bootstrap approximation: repeat $N$ times and normalize the weights at the end:
Draw $j \in \{1, \ldots, N_{t|t}\}$ with probability $\alpha^j_{t|t}$
Draw $\mu^j_{t+1|t} \sim p_a(\cdot \mid \mu^j_{t|t}, u_t)$
Add the weighted sample $(\mu^j_{t+1|t},\, p_{t+1|t}(\mu^j_{t+1|t}))$ to the new particle set
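A minimal sketch of the bootstrap approximation, assuming a user-supplied sampler motion_sample(mu, u, rng) that draws from $p_a(\cdot \mid \mu, u)$ (a hypothetical callback) and weights that are already normalized; for simplicity the sketch gives the drawn samples equal weights $1/N$, the usual bootstrap convention:

```python
import numpy as np

def bootstrap_predict(mus, alphas, u, motion_sample, rng=np.random.default_rng()):
    """Draw N particles from the predicted mixture sum_k alpha^k p_a(x | mu^k, u)."""
    N = len(mus)
    idx = rng.choice(N, size=N, p=alphas)          # draw component j with probability alpha^j
    new_mus = np.array([motion_sample(mus[j], u, rng) for j in idx])  # mu ~ p_a(. | mu^j, u)
    new_alphas = np.full(N, 1.0 / N)               # equally weighted predicted particle set
    return new_mus, new_alphas
```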

Particle Filter (figure slide)

Inverse Transform Sampling
Target distribution: how do we sample from a distribution with pdf $p(x)$ and CDF $F(x) = \int_{-\infty}^{x} p(s)\, ds$?
Inverse Transform Sampling:
1. Draw $u \sim U(0, 1)$
2. Return the inverse CDF value: $\mu = F^{-1}(u)$
3. The CDF of $F^{-1}(u)$ is: $P(F^{-1}(u) \le x) = P(u \le F(x)) = F(x)$
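A minimal sketch of inverse transform sampling for an exponential target, whose inverse CDF is available in closed form (the target is chosen purely for illustration):

```python
import numpy as np

def sample_exponential(rate, n, rng=np.random.default_rng()):
    """F(x) = 1 - exp(-rate * x), so F^{-1}(u) = -ln(1 - u) / rate."""
    u = rng.uniform(0.0, 1.0, size=n)   # 1. draw u ~ U(0, 1)
    return -np.log(1.0 - u) / rate      # 2. return the inverse CDF value mu = F^{-1}(u)
```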

Rejection Sampling
Target distribution: how do we sample from a complicated pdf $p(x)$?
Proposal distribution: use another pdf $q(x)$ that is easy to sample from (e.g., Uniform, Gaussian) and satisfies $\lambda p(x) \le q(x)$ with $\lambda \in (0, 1)$.
Rejection Sampling:
1. Draw $u \sim U(0, 1)$ and $\mu \sim q$
2. Return $\mu$ only if $u \le \frac{\lambda p(\mu)}{q(\mu)}$
If $\lambda$ is small, many rejections are necessary.
A good $q(x)$ and $\lambda$ are hard to choose in practice.
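A minimal sketch of rejection sampling with a Gaussian proposal $q = \mathcal{N}(0, 3^2)$ and a hypothetical bimodal target $p$; the constant $\lambda = 0.15$ is chosen by hand so that $\lambda p(x) \le q(x)$ for this particular pair:

```python
import numpy as np
from scipy.stats import norm

def p(x):
    """Hypothetical bimodal target density (illustration only)."""
    return 0.5 * norm.pdf(x, -2.0, 0.5) + 0.5 * norm.pdf(x, 2.0, 0.5)

def rejection_sample(n, lam=0.15, rng=np.random.default_rng()):
    """Rejection sampling with proposal q = N(0, 3^2); requires lam * p(x) <= q(x)."""
    samples = []
    while len(samples) < n:
        mu = rng.normal(0.0, 3.0)        # draw mu ~ q
        u = rng.uniform(0.0, 1.0)        # draw u ~ U(0, 1)
        if u <= lam * p(mu) / norm.pdf(mu, 0.0, 3.0):
            samples.append(mu)           # accept; otherwise reject and draw again
    return np.array(samples)
```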

Sample Importance Resampling (SIR)
How about rejection sampling without $\lambda$?
Sample Importance Resampling for a target distribution $p$ with proposal distribution $q$:
1. Draw $\mu^1, \ldots, \mu^N \sim q$
2. Compute importance weights $\alpha^k = \frac{p(\mu^k)}{q(\mu^k)}$ and normalize: $\alpha^k \leftarrow \frac{\alpha^k}{\sum_j \alpha^j}$
3. Draw $\mu^k$ independently with replacement from $\{\mu^1, \ldots, \mu^N\}$ with probability $\alpha^k$ and add to the final sample set with weight $\frac{1}{N}$
If $q$ is a poor approximation of $p$, then the best samples from $q$ are not necessarily good samples for resampling.
Markov Chain Monte Carlo methods (e.g., Metropolis-Hastings and Gibbs sampling):
The main drawback of rejection sampling and SIR is that choosing a good proposal distribution $q$ is hard.
Idea: let the proposed samples $\mu^k$ depend on the last accepted sample $\mu^{k-1}$, i.e., obtain correlated samples from a conditional proposal distribution $\mu^k \sim q(\cdot \mid \mu^{k-1})$.
Under certain conditions, the samples generated from $q(\cdot \mid \mu^{k-1})$ form an ergodic Markov chain with $p$ as its stationary distribution.
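A minimal sketch of SIR with the same (hypothetical) Gaussian proposal; the target density p is assumed to be a vectorized function of x:

```python
import numpy as np
from scipy.stats import norm

def sir(p, n, rng=np.random.default_rng()):
    """Sample Importance Resampling with proposal q = N(0, 3^2)."""
    mus = rng.normal(0.0, 3.0, size=n)        # 1. draw mu^1, ..., mu^N ~ q
    w = p(mus) / norm.pdf(mus, 0.0, 3.0)      # 2. importance weights p(mu^k) / q(mu^k)
    w = w / w.sum()                           #    normalize
    idx = rng.choice(n, size=n, p=w)          # 3. resample with replacement
    return mus[idx]                           #    final set, each sample with weight 1/N
```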

Stratified Resampling
In the last step of SIR, the weighted sample set $\{(\mu^k, \alpha^k)\}$ is resampled independently with replacement.
This might result in high-variance resampling: sometimes samples with large weights might not be selected, or samples with very small weights may be selected multiple times.
Stratified resampling guarantees that samples with large weights appear at least once and those with small weights at most once. Stratified resampling is optimal in terms of variance (Thrun et al., 2005).
Instead of selecting samples independently, use a sequential process:
Add the weights along the circumference of a circle.
Divide the circle into $N$ equal pieces and sample a uniform on each piece.
Samples with large weights are chosen at least once and those with small weights at most once.

Stratified Resampling
Stratified (low variance) resampling:
1: Input: particle set $\{(\mu^k, \alpha^k)\}_{k=1}^{N}$
2: Output: resampled particle set
3: $j \leftarrow 1$, $c \leftarrow \alpha^1$
4: for $k = 1, \ldots, N$ do
5:   $u \sim U(0, \frac{1}{N})$
6:   $\beta = u + \frac{k-1}{N}$
7:   while $\beta > c$ do
8:     $j = j + 1$, $c = c + \alpha^j$
9:   add $(\mu^j, \frac{1}{N})$ to the new set
Systematic resampling: the same as stratified resampling except that the same uniform is used for each piece, i.e., $u \sim U(0, \frac{1}{N})$ is sampled only once before the for loop above.
Sample importance resampling (SIR): draw $\mu^k$ independently with replacement from $\{\mu^1, \ldots, \mu^N\}$ with probability $\alpha^k$ and add to the final sample set with weight $\frac{1}{N}$.
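A NumPy sketch of the stratified resampler above, with the systematic variant as a flag; this is an illustration of the pseudocode, not the official course implementation, and it assumes the weights are normalized:

```python
import numpy as np

def low_variance_resample(mus, alphas, systematic=False, rng=np.random.default_rng()):
    """Stratified (one uniform per piece) or systematic (single shared uniform) resampling.
    mus: (N, d) array of particles, alphas: (N,) normalized weights."""
    N = len(alphas)
    if systematic:
        u = np.full(N, rng.uniform(0.0, 1.0 / N))   # same uniform reused for every piece
    else:
        u = rng.uniform(0.0, 1.0 / N, size=N)       # fresh uniform for each piece
    betas = u + np.arange(N) / N                    # one target point per piece of the circle
    cumw = np.cumsum(alphas)                        # running sum c in the pseudocode
    idx = np.minimum(np.searchsorted(cumw, betas), N - 1)  # "while beta > c: j += 1"
    return mus[idx], np.full(N, 1.0 / N)            # resampled set, equal weights 1/N
```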

Particle Filter Summary
Prior: $x_t \mid z_{0:t}, u_{0:t-1} \sim p_{t|t}(x) := \sum_{k=1}^{N_{t|t}} \alpha^k_{t|t}\, \delta(x; \mu^k_{t|t})$
Motion model: $x_{t+1} \sim p_a(\cdot \mid x_t, u_t)$
Observation model: $z_t \sim p_h(\cdot \mid x_t)$
Prediction: approximate the mixture by sampling:
$p_{t+1|t}(x) = \sum_{k=1}^{N_{t|t}} \alpha^k_{t|t} \int p_a(x \mid s, u_t)\, \delta(s; \mu^k_{t|t})\, ds = \sum_{k=1}^{N_{t|t}} \alpha^k_{t|t}\, p_a(x \mid \mu^k_{t|t}, u_t) \approx \sum_{k=1}^{N_{t+1|t}} \alpha^k_{t+1|t}\, \delta(x; \mu^k_{t+1|t})$
Update: rescale the particles based on the observation likelihood:
$p_{t+1|t+1}(x) = \frac{p_h(z_{t+1} \mid x) \sum_{k} \alpha^k_{t+1|t}\, \delta(x; \mu^k_{t+1|t})}{\int p_h(z_{t+1} \mid s) \sum_{j} \alpha^j_{t+1|t}\, \delta(s; \mu^j_{t+1|t})\, ds} = \sum_{k} \frac{\alpha^k_{t+1|t}\, p_h(z_{t+1} \mid \mu^k_{t+1|t})}{\sum_{j} \alpha^j_{t+1|t}\, p_h(z_{t+1} \mid \mu^j_{t+1|t})}\, \delta(x; \mu^k_{t+1|t})$
If $N_{\text{eff}} := \frac{1}{\sum_k (\alpha^k_{t+1|t+1})^2} \le$ threshold, resample the particle set $\{(\mu^k_{t+1|t+1}, \alpha^k_{t+1|t+1})\}$ via stratified or sample importance resampling.
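Putting the pieces together, a minimal sketch of one particle filter step with conditional resampling. The callbacks motion_sample and obs_likelihood stand in for $p_a$ and $p_h$ and are hypothetical; unlike the bootstrap sketch earlier, this variant propagates every particle and keeps its weight, deferring all resampling to the $N_{\text{eff}}$ test (it reuses low_variance_resample from the resampling sketch above):

```python
import numpy as np

def pf_step(mus, alphas, u, z, motion_sample, obs_likelihood,
            n_eff_threshold=None, rng=np.random.default_rng()):
    """One particle filter step.
      mus: (N, d) particles, alphas: (N,) normalized weights
      motion_sample(mu, u, rng): draws a sample from p_a(. | mu, u)
      obs_likelihood(z, mu):     evaluates p_h(z | mu)
    """
    # Predict: propagate each particle through the motion model
    mus = np.array([motion_sample(mu, u, rng) for mu in mus])
    # Update: rescale the weights by the observation likelihood and normalize
    alphas = alphas * np.array([obs_likelihood(z, mu) for mu in mus])
    alphas = alphas / alphas.sum()
    # Resample only if the effective number of particles drops below the threshold
    n_eff = 1.0 / np.sum(alphas ** 2)
    if n_eff_threshold is not None and n_eff < n_eff_threshold:
        mus, alphas = low_variance_resample(mus, alphas, rng=rng)
    return mus, alphas
```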

Rao-Blackwellized Particle Filter
The Rao-Blackwellized (marginalized) particle filter is applicable to conditionally linear-Gaussian models:
$x^n_{t+1} = f^n_t(x^n_t) + A^n_t(x^n_t)\, x^l_t + G^n_t(x^n_t)\, w^n_t$
$x^l_{t+1} = f^l_t(x^n_t) + A^l_t(x^n_t)\, x^l_t + G^l_t(x^n_t)\, w^l_t$
$z_t = h_t(x^n_t) + C_t(x^n_t)\, x^l_t + v_t$
Nonlinear states: $x^n_t$; linear states: $x^l_t$
Idea: exploit the linear-Gaussian sub-structure to handle high-dimensional problems:
$p(x^l_t, x^n_{0:t} \mid z_{0:t}, u_{0:t-1}) = \underbrace{p(x^l_t \mid z_{0:t}, u_{0:t-1}, x^n_{0:t})}_{\text{Kalman Filter}}\, \underbrace{p(x^n_{0:t} \mid z_{0:t}, u_{0:t-1})}_{\text{Particle Filter}} = \sum_{k=1}^{N_{t|t}} \alpha^k_{t|t}\, \delta(x^n_{0:t}; m^k_{t|t})\, \phi(x^l_t; \mu^k_{t|t}, \Sigma^k_{t|t})$
The RBPF is a combination of the particle filter and the Kalman filter, in which each particle has a Kalman filter associated with it.
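A minimal sketch (illustration only) of the per-particle data structure implied by this factorization: each particle carries a sample of the nonlinear state and a Gaussian over the linear state; the conditional Kalman predict/update for the linear state is omitted here:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class RBParticle:
    """One Rao-Blackwellized particle."""
    alpha: float          # particle weight
    x_n: np.ndarray       # sample of the nonlinear state (maintained by the particle filter)
    mu_l: np.ndarray      # mean of the linear state (maintained by a per-particle Kalman filter)
    Sigma_l: np.ndarray   # covariance of the linear state
```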