Sampling Methods for Bayesian Inference: A Tutorial
Frank Dellaert


Motivation
- How to track many INTERACTING targets?

Results: MCMC Dancers, q=10, n=500

Probabilistic Topological Maps: Results

Real-Time Urban Reconstruction
- Current main effort: 4D Atlanta
- 4D Atlanta, only real time, multiple cameras
- Large-scale SFM: closing the loop

Goals
- The Bayesian paradigm is a useful tool to represent knowledge and perform inference.
- Sampling is a nice way to implement the Bayesian paradigm, e.g. Condensation.
- Markov chain Monte Carlo methods are a nice way to implement sampling.

References
- Neal, Probabilistic Inference Using MCMC Methods
- Smith & Gelfand, Bayesian Statistics Without Tears
- MacKay, Introduction to Monte Carlo Methods
- Gilks et al., Introducing MCMC
- Gilks et al., MCMC in Practice

Probability of Robot Location
- Density representations of P(robot location) over (x, y):
  - Gaussian centered around a mean (x, y)
  - Mixture of Gaussians
  - Finite elements, i.e. a histogram
- Larger spaces -> we have a problem! Even a 2D state space has an infinite number of states.

Sampling as Representation
- Advantages:
  - Arbitrary densities
  - Memory = O(#samples)
  - Samples fall only in the typical set
  - Great visualization tool!
- Minus: approximate

How to Sample?
- Target density P(x); assumption: we can evaluate P(x) up to an arbitrary multiplicative constant.
- Why can't we just sample from P(x)?
- Methods (Numerical Recipes in C, Chapter 7):
  - Transformation method: Gaussians etc. (see the sketch below)
  - Rejection sampling
  - Importance sampling
  - Markov chain Monte Carlo
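
As a concrete instance of the transformation method, here is a minimal Box-Muller sketch (our example; the slides defer to Numerical Recipes):

    import numpy as np

    def box_muller(n, rng=None):
        """Transformation method: map Uniform(0,1) draws to N(0,1) samples."""
        rng = np.random.default_rng() if rng is None else rng
        u1 = rng.uniform(size=n)
        u2 = rng.uniform(size=n)
        r = np.sqrt(-2.0 * np.log(u1))   # radius from the transformed uniform
        theta = 2.0 * np.pi * u2         # uniform angle
        return r * np.cos(theta)         # standard normal samples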

Rejection Sampling: the Good...
- Target density P, proposal density Q.
- P and Q need only be known up to a factor, as P* and Q*.
- There must exist a c such that c Q*(x) >= P*(x) for all x.
- Here: 9% rejection rate.

...the Bad and the Ugly
- 50% rejection rate.
- 70% rejection rate.

Mean and Variance of a Sample
- Monte Carlo expected value, mean, and variance (1D).
- Example: expected angle α = 30°.
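
A generic sketch of this recipe; p_star, q_star, q_sample, and the bound c are placeholders the caller supplies, with c Q*(x) >= P*(x) assumed to hold everywhere:

    import numpy as np

    def rejection_sample(p_star, q_star, q_sample, c, n, rng=None):
        """Accept x ~ Q with probability P*(x) / (c Q*(x));
        accepted points are exact draws from P."""
        rng = np.random.default_rng() if rng is None else rng
        out = []
        while len(out) < n:
            x = q_sample(rng)                      # propose from Q
            if rng.uniform() < p_star(x) / (c * q_star(x)):
                out.append(x)                      # accept; else reject and retry
        return np.array(out)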

Monte Carlo Estimates (General)
- Estimate the expectation of any function f from the samples.

Bayes Law
- P(x|Z) ∝ P(Z|x) P(x)
- Data = Z, model = P(Z|x).
- Belief before = P(x), the prior distribution of x.
- P(Z|x) is the likelihood of x given Z.
- Belief after = P(x|Z), the posterior distribution of x given Z.

Inference by Rejection Sampling
- P(measured_angle | x, y) = N(predicted_angle, 3 degrees)
- Prior(x, y) -> Posterior(x, y | measured_angle = 20°)

Importance Sampling
- A good proposal density would be: the prior!
- Problem: there is no guaranteed c such that c P(x) >= P(x|Z) for all x.
- Idea: sample from P(x) and give each sample x^(r) an importance weight equal to P(Z|x^(r)).

Example: Importance Sampling
- { x^(r), y^(r) ~ Prior(x, y), w_r = P(Z | x^(r), y^(r)) }

Importance Sampling (General)
- Sample x^(r) from Q*.
- w_r = P*(x^(r)) / Q*(x^(r))
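
The general recipe in code; a self-normalized sketch, so P* and Q* need only be known up to constants:

    import numpy as np

    def importance_expectation(f, p_star, q_star, q_sample, n, rng=None):
        """Estimate E_P[f(x)] using samples x^(r) ~ Q with
        weights w_r = P*(x^(r)) / Q*(x^(r))."""
        rng = np.random.default_rng() if rng is None else rng
        x = q_sample(rng, n)                   # x^(r) ~ Q
        w = p_star(x) / q_star(x)              # importance weights
        return np.sum(w * f(x)) / np.sum(w)    # normalization cancels the constants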

Important Expectations
- Any expectation becomes a weighted average over the samples.

1D Robot Localization
- Prior P(X), likelihood L(X; Z), posterior P(X|Z).
- The histogram approach does not scale.

Monte Carlo Approximation
- Sample from P(X|Z) by:
  - sampling from the prior P(X), and
  - weighting each sample x^(r) with an importance weight equal to the likelihood L(x^(r); Z).

1D Importance Sampling

Particle Filter
- Particle filter = recursive importance sampling with modeled dynamics, over stages π^(1), π^(2), π^(3), ...
- First appeared in the 70s; re-discovered by Kitagawa, Isard, ...
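
A minimal 1D predict-weight-resample sketch of this recursive importance sampling; the Gaussian motion and measurement models (motion_sigma, meas_sigma) are illustrative assumptions, not from the slides:

    import numpy as np

    def particle_filter_step(particles, control, z, rng,
                             motion_sigma=0.1, meas_sigma=0.5):
        """One predict-weight-resample cycle of a 1D particle filter."""
        # Predict: push particles through the assumed motion model.
        particles = particles + control + rng.normal(0.0, motion_sigma, particles.size)
        # Weight: importance weight = likelihood L(x^(r); z).
        w = np.exp(-0.5 * ((z - particles) / meas_sigma) ** 2)
        w /= w.sum()
        # Resample: draw particles in proportion to their weights.
        idx = rng.choice(particles.size, size=particles.size, p=w)
        return particles[idx]

    rng = np.random.default_rng(0)
    particles = rng.uniform(0.0, 10.0, 500)   # samples from a flat 1D prior
    particles = particle_filter_step(particles, control=0.5, z=4.0, rng=rng)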

Monte Carlo Localization
- 3D particle filter for robot pose: Dellaert, Fox & Thrun, ICRA 99.

Segmentation Example
- Binary segmentation of an image.

Probability of a Segmentation
- Very high-dimensional: 256 x 256 = 65,536 pixels, so the dimension of the state space is N = 65,536!
- The number of binary segmentations is finite, but enormous: 2^65536.

Representation of P(Segmentation)
- Histogram? I don't think so!
- Assume pixels are independent: P(x_1, x_2, x_3, ...) = P(x_1) P(x_2) P(x_3) ... Clearly a problem!
- Markov Random Fields: a pixel is independent given its neighbors.
- Giveaway: samples!!!

Sampling in High-dimensional Spaces
- Exact schemes? If only we were so lucky!
- Rejection sampling: the rejection rate increases with N -> 100% (see the sketch below).
- Importance sampling: same problem, the vast majority of the weights -> 0.

Markov Chains
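
A small experiment (ours, not from the slides) that makes the rejection-sampling point concrete: rejection-sampling an N-dimensional unit Gaussian from a slightly wider Gaussian proposal, the acceptance rate decays like 1.5^-N:

    import numpy as np

    def acceptance_rate(n, s=1.5, trials=100_000, seed=0):
        """Empirical acceptance when rejection-sampling P = N(0, I_n) from
        Q = N(0, s^2 I_n) with the tight bound c = s^n (so P/(cQ) <= 1)."""
        rng = np.random.default_rng(seed)
        x = rng.normal(0.0, s, size=(trials, n))   # proposals from Q
        # P(x) / (c Q(x)) simplifies to exp(-0.5 (1 - 1/s^2) |x|^2)
        accept = np.exp(-0.5 * (1.0 - 1.0 / s**2) * np.sum(x**2, axis=1))
        return accept.mean()                       # theory: s**(-n)

    for n in (1, 2, 5, 10, 20):
        print(n, acceptance_rate(n))               # drops toward 0 as n grows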

A Simple Markov Chain
[Diagram: three states X1, X2, X3, with transition probabilities as in K below]

    K = [ 0.1  0.5  0.6
          0.6  0.2  0.3
          0.3  0.3  0.1 ]

- The columns of K are the next-state distributions from [1 0 0], [0 1 0], and [0 0 1].

Stationary Distribution
- q_1 = K q_0
- q_2 = K q_1 = K^2 q_0
- q_3 = K q_2 = K^2 q_1 = K^3 q_0
- ...
- q_10 = K q_9 = K^10 q_0

The Web as a Markov Chain
- Where do we end up if we click hyperlinks randomly?
- Answer: the stationary distribution!

Eigen-analysis
- K E = E D, and the eigenvalue λ_1 is always 1:

    E = [ 0.6396   0.7071  -0.2673
          0.6396  -0.7071   0.8018
          0.4264   0.0000  -0.5345 ]

    D = diag(1.0000, -0.4000, -0.2000), with the columns of E being e_1, e_2, e_3.

- Stationary distribution = e_1 / sum(e_1), i.e. the p with K p = p.
- q_n = K^n q_0 = E D^n c = p + c_2 λ_2^n e_2 + c_3 λ_3^n e_3 + ...

Google Pagerank
- Pagerank == the first eigenvector of the Web graph! (Example node: www.yahoo.com)
- The computation assumes a 15% random-restart probability.
- Sergey Brin and Lawrence Page, The Anatomy of a Large-Scale Hypertextual Web Search Engine, Computer Networks and ISDN Systems, 1998.
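
The eigen-analysis above is easy to reproduce; a numpy sketch that recovers the stationary distribution of K:

    import numpy as np

    K = np.array([[0.1, 0.5, 0.6],
                  [0.6, 0.2, 0.3],
                  [0.3, 0.3, 0.1]])    # column-stochastic transition matrix

    vals, vecs = np.linalg.eig(K)      # K E = E D
    i = int(np.argmax(np.isclose(vals, 1.0)))   # the eigenvalue that is always 1
    p = np.real(vecs[:, i])
    p = p / p.sum()                    # stationary = e_1 / sum(e_1)
    print(p)                           # [0.375 0.375 0.25], and K @ p == p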

Markov chain Monte Carlo
- Brilliant idea! Published June 1953; a top-10 algorithm!
- Set up a Markov chain.
- Run the chain until it is stationary.
- All subsequent samples are then samples from the stationary distribution.

Markov chain Monte Carlo Example
- In high-dimensional spaces:
  - Start at x_0 ~ q_0.
  - Propose a move K(x_{t+1} | x_t).
  - K is never stored as a big matrix: K is a function/search operator.

How do we get the right chain?
- Detailed balance: K(y|x) p(x) = K(x|y) p(y).
- Reject a fraction of the moves!
- Example with two states X4, X5 and proposed moves X4 -> X5 with probability 0.5, X5 -> X4 with probability 0.9:
  - As given, this chain satisfies detailed balance for p = (9/14, 5/14): 0.5 * 9/14 = 0.9 * 5/14.
  - To target p = (1/3, 2/3) instead, detailed balance requires 0.5 * 1/3 = a * 0.9 * 2/3, so thin the 0.9 move by the acceptance probability a = (0.5 * 1/3) / (0.9 * 2/3) = 5/18.
  - The accepted move becomes 0.9 * 5/18 = 0.25, with the remaining 0.75 as a self-loop at X5.
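
A quick numeric check of this example (our sketch, using the thinned transition matrix):

    import numpy as np

    # Thinned two-state chain: columns are the moves out of X4 and X5.
    K = np.array([[0.5, 0.25],
                  [0.5, 0.75]])
    p = np.array([1/3, 2/3])          # desired stationary distribution

    # Detailed balance: K(y|x) p(x) == K(x|y) p(y)
    print(np.isclose(K[1, 0] * p[0], K[0, 1] * p[1]))   # True

    # Iterating the chain converges to p from any start.
    q = np.array([1.0, 0.0])
    for _ in range(50):
        q = K @ q
    print(q)                           # ~ [0.3333, 0.6667]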

Metropolis-Hastings Algorithm
- Pick x^(0), then iterate:
  1. Propose x' from Q(x'; x^(t)).
  2. Calculate the ratio a = [P*(x') Q(x^(t); x')] / [P*(x^(t)) Q(x'; x^(t))].
  3. If a >= 1, accept: x^(t+1) = x'. Else accept with probability a; if rejected, x^(t+1) = x^(t).

Again!
1. x^(0) = 10
2. Proposal: x' = x - 1 with Pr 0.5, x' = x + 1 with Pr 0.5.
3. Calculate a: a = 1 if x' in [0, 20]; a = 0 if x' = -1 or x' = 21.
4. Accept if a = 1, reject if a = 0.
5. Go to 2.

1D Robot Localization
- Chain started at random; converges to the posterior.
- Leading eigenvalues of the localization chain: 1.0000, 0.9962.

Gibbs Sampling
- An MCMC method that always accepts.
- Algorithm: alternate between x_1 and x_2:
  1. Sample x_1 ~ P(x_1 | x_2).
  2. Sample x_2 ~ P(x_2 | x_1).
- Rationale: the conditional distributions are easy.
- = the Gauss-Seidel of samplers.
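
The "Again!" example as runnable code; a sketch that assumes nothing beyond the slide's uniform target on [0, 20] and its symmetric random-walk proposal:

    import numpy as np

    def mh_walk(steps=10_000, seed=0):
        """Metropolis-Hastings on {0, ..., 20} with a +/-1 random-walk proposal."""
        rng = np.random.default_rng(seed)
        x = 10                                    # x^(0) = 10
        chain = []
        for _ in range(steps):
            x_new = x + rng.choice((-1, 1))       # x-1 or x+1, Pr 0.5 each
            a = 1.0 if 0 <= x_new <= 20 else 0.0  # uniform target, symmetric Q
            if rng.uniform() < a:                 # accept with probability a
                x = x_new
            chain.append(x)                       # rejected moves repeat x^(t)
        return np.array(chain)

    samples = mh_walk()    # histogram of samples is ~uniform on 0..20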

Sampling Segmentations
- Prior model: Markov Random Field.
- Likelihood: 1 or 0 plus Gaussian noise.
- Gibbs sampling is the method of choice: the conditional densities are easy in an MRF.

Samples from the Prior
- Shown for a forgiving prior and for a stricter prior.
- Sampling the prior: P(being one | others) is HIGH if many ones are around you, LOW if many zeroes are around you.

Sampling the Posterior
- P(being one | others) is pulled towards 0 if the data is close to 0, pushed towards 1 if the data is close to 1, plus the influence of the prior...
- [Plot: conditional probabilities on a 0 to 1 axis across sites 0 to 8]

Samples from the Posterior
- Forgiving prior vs. stricter prior.

Relation to Belief Propagation
- In poly-trees: BP is exact.
- In MRFs: BP is a variational approximation; the computation is very similar to Gibbs.
- The difference:
  - BP can be faster in yielding a good estimate, but BP exactly calculates the wrong thing.
  - MC might take longer to converge, but MC approximately calculates the right thing.
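
A small Gibbs sampler in this spirit; the Ising-style coupling beta, the noise level sigma, and the initialization are illustrative assumptions, not values from the slides:

    import numpy as np

    def gibbs_segmentation(data, sweeps=50, beta=1.0, sigma=0.5, seed=0):
        """Gibbs sampling for binary segmentation: MRF prior over a 4-neighbor
        lattice, per-pixel likelihood N(label, sigma^2)."""
        rng = np.random.default_rng(seed)
        H, W = data.shape
        x = (data > 0.5).astype(int)              # init from thresholded data
        for _ in range(sweeps):
            for i in range(H):
                for j in range(W):
                    nbrs = [x[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                            if 0 <= a < H and 0 <= b < W]
                    # log-odds of label 1 vs 0 given Markov blanket and data
                    prior = beta * sum(2 * n - 1 for n in nbrs)
                    like = (data[i, j] ** 2 - (data[i, j] - 1.0) ** 2) / (2 * sigma ** 2)
                    p1 = 1.0 / (1.0 + np.exp(-(prior + like)))
                    x[i, j] = rng.uniform() < p1
        return x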

Application: Edge Classification
- Given the vanishing points of a scene, classify each pixel according to vanishing direction.
- Goal: MAP edge classifications.

Bayesian Model
- p(M | G, V) = p(G | M, V) p(M) / Z
- M = classifications, G = gradient magnitude/direction, V = vanishing points.
- Prior p(M): independent prior vs. MRF prior.
- Likelihood: p(G | M, V).
- Color code: red = VP1, green = VP2, blue = VP3, gray = other, white = off.

Classifications with MRF Prior

Gibbs Sampling & MRFs
- Sample from the distribution over labels for one site, conditioned on all the other sites in its Markov blanket (see the sketch below).
- Gibbs sampling over a 4-neighbor lattice, with clique potentials defined as A if i = j and B if i <> j.
- Gibbs sampling approximates the posterior distribution over classifications at each site (by iterating and accumulating statistics).
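
A sketch of the single-site conditional under these A/B clique potentials; A, B, and the log_lik hook are hypothetical placeholders, not values from the slides:

    import numpy as np

    def site_conditional(labels, i, j, n_labels, A=1.0, B=0.0, log_lik=None):
        """P(label at (i,j) | Markov blanket): Potts-style potential A when a
        neighbor's label agrees, B when it differs, plus an optional data term."""
        H, W = labels.shape
        nbrs = [labels[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                if 0 <= a < H and 0 <= b < W]
        log_p = np.zeros(n_labels)
        for k in range(n_labels):
            log_p[k] = sum(A if n == k else B for n in nbrs)
            if log_lik is not None:
                log_p[k] += log_lik(i, j, k)   # per-site log-likelihood term
        p = np.exp(log_p - log_p.max())        # normalize stably
        return p / p.sum()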

Directional MRF
- [Figure: original image with vanishing points]
- Give more weight to the potentials of neighbors which lie along the vanishing direction of the current model.
- Compared: independent prior, MRF prior, directional MRF prior.

Take-Home Points!
- The Bayesian paradigm is a useful tool to represent knowledge and perform inference.
- Sampling is a nice way to implement the Bayesian paradigm, e.g. Condensation.
- Markov chain Monte Carlo methods are a nice way to implement sampling.

Proposal Density Q
- Q(x'; x), a proposal that depends on the current state x.
- [Diagram: world knowledge z; states x and x' a distance d apart]

Sensor Model P(z|x)
- Most often an analytic expression; it can also be learned.

Step Size and #Samples
- Step size ε too large: all moves are rejected.
- Too small: a random walk.
- E[d] = ε sqrt(T)
- Rule of thumb: T >= (L/ε)^2

Discussion: an Example
- ε = 1 and L = 20 give T >= 400.
- Moral: avoid random walks.
- Bummer: this is just a lower bound.

MCMC in High Dimensions
- ε = s_min, L = s_max, so T = (s_max / s_min)^2.
- Good news: no curse of dimensionality in N.
- Bad news: quadratic dependence on the scale ratio.