Intro to Sampling Methods
Robert Collins, CSE586 Computer Vision II, Penn State Univ

Topics to be Covered:
- Monte Carlo Integration
- Sampling and Expected Values
- Inverse Transform Sampling (CDF)
- Ancestral Sampling
- Rejection Sampling
- Importance Sampling
- Markov Chain Monte Carlo

Integration and Area. (Sequence of figures: estimate an area by counting how many of the total # boxes fall inside the region; from http://videolectures.net/mlss08au_freitas_asm/)

Integration and Area. As we use more samples, our answer should get more and more accurate. It doesn't matter what the shape looks like: the same box-counting works for an arbitrary region (even disconnected) inside the unit square from (0,0) to (1,1), or for the area under a curve, aka integration!

Monte Carlo Integration. Goal: compute the definite integral of a function f(x) from a to b. Choose an upper bound c such that f(x) <= c on [a,b]. Generate N uniform random samples in the upper bound volume [a,b] x [0,c], and count the number K of samples that fall below the f(x) curve. Then

Answer = (K/N) * area of upper bound volume = (K/N) * (b-a)(c-0)
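
A minimal Matlab sketch of this box-counting estimate (the function f and the values of a, b, c below are our own illustrative choices, not from the slides):

f = @(x) exp(-x.^2);      % function to integrate (an assumed example)
a = 0; b = 2; c = 1;      % limits, and an upper bound c >= max f(x) on [a,b]
N = 100000;               % number of uniform samples
xs = a + (b-a)*rand(1,N); % x coordinates, uniform in [a,b]
ys = c*rand(1,N);         % y coordinates, uniform in [0,c]
K = sum(ys < f(xs));      % count samples falling below the curve
I = (K/N)*(b-a)*(c-0)     % estimate; approaches ~0.882 for this f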

Motivation: Normalizing Constant. Sampling-based integration is useful for computing the normalizing constant Z that turns an arbitrary non-negative function f(x) into a probability density function p(x):

Z = integral of f(x) dx

Compute this via sampling (Monte Carlo Integration). Then p(x) = (1/Z) f(x). Note: for complicated, multidimensional functions, this is the ONLY way we can compute this normalizing constant.

Motivation: Expected Values. If we can generate random samples x_i from a given distribution P(x), then we can estimate expected values of functions under this distribution by summation, rather than integration. That is, we can approximate

E_P[f(x)] = integral of f(x) P(x) dx

by first generating N i.i.d. samples from P(x) and then forming the empirical estimate

E_P[f(x)] ~ (1/N) * sum over i of f(x_i)

Expected Values and Sampling. Example: a discrete pdf P(x) over x in {1, 2, 3}, with P(1) = 1/4, P(2) = 1/4, P(3) = 2/4.

Expected Values and Sampling (cont). Generate 10 samples from P(x):

x_1 ... x_10 = 1, 3, 3, 2, 2, 3, 1, 3, 3, 1

E_P[f(x)] ~ (1/10) * (f(1) + f(3) + f(3) + f(2) + f(2) + f(3) + f(1) + f(3) + f(3) + f(1))
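
A minimal Matlab sketch of this idea (the pmf is the one above; everything else is our own illustration). Sampling uses the cumulative distribution, anticipating the inverse transform method on the next slides:

p = [1/4 1/4 2/4];     % the discrete pmf above
c = cumsum(p);         % cumulative distribution
N = 10000;
xs = arrayfun(@(u) find(u <= c, 1), rand(1,N));  % draw N samples
Eest = mean(xs)   % empirical E[x]; should approach 1*(1/4)+2*(1/4)+3*(2/4) = 2.25
% for a general f, use mean(f(xs)), e.g. mean(xs.^2)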

Inverse Transform Sampling. It is easy to sample from a discrete 1D distribution with weights w_k, k = 1,...,N, using the cumulative distribution function (visualize the cdf as bars rising from 0 to 1 over the indices 1,...,N).

Inverse Transform Sampling. It is easy to sample from a discrete 1D distribution, using the cumulative distribution function:
1) Generate uniform random u in the range [0,1].
2) Visualize a horizontal line at height u intersecting the bars of the cdf.
3) If the index of the intersected bar is j, output new sample x_k = j.
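
A minimal Matlab sketch of this procedure (the name samplefrom is ours; it also serves as the helper assumed by the ancestral sampling Matlab demo later):

function j = samplefrom(w)
% inverse transform sampling from a discrete pmf w:
% return the index of the first cdf bar at or above a uniform draw
u = rand;                     % 1) uniform u in [0,1]
j = find(u <= cumsum(w), 1);  % 2)+3) first bar whose cdf reaches u
end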

Efficiently Generating Many Samples (the naive approach is O(N log N), but we can do better). Basic idea: choose one initial small random number; deterministically sample the rest by crawling up the cdf function. This is O(N). Odd property: you generate the random numbers in sorted order... (Figure due to Arulampalam et al.)

Efficiently Generating Many Samples (cont). This approach, called Systematic Resampling (Kitagawa 96), is known to produce Monte Carlo estimates with minimum variance (more certainty). To explore further, see the quasi-random sampling final project idea for this class. (Figure due to Arulampalam et al.)
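
A minimal Matlab sketch of systematic resampling (the function name and signature are ours; it draws N indices in proportion to weights w using a single random offset):

function idx = systematic_resample(w, N)
w = w(:) / sum(w);          % normalize weights to a pmf
c = cumsum(w);              % cumulative distribution
c(end) = 1;                 % guard against floating-point round-off
u = (rand + (0:N-1)') / N;  % N evenly spaced points, one shared random offset
idx = zeros(N,1);
j = 1;
for i = 1:N                 % crawl up the cdf once: O(N) overall
    while u(i) > c(j)
        j = j + 1;
    end
    idx(i) = j;
end
end

Note that the points u(1) <= u(2) <= ... arrive in sorted order, which is exactly the odd property mentioned above.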

Example: Sampling From an Image. Start with a likelihood image to sample from. Concatenate values into a 1D vector and normalize to form a probability mass function. Do systematic resampling. Accumulate a histogram of the sample values generated and map counts back into the corresponding pixel locations. (Figures show the resulting histograms for 500, 1000, 5000, and 10000 samples.)
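
A minimal Matlab sketch of this pipeline (L is an assumed 2D likelihood image; systematic_resample is the sketch above):

p = L(:) / sum(L(:));                       % concatenate into 1D, normalize to a pmf
idx = systematic_resample(p, 5000);         % draw 5000 pixel indices
counts = accumarray(idx, 1, [numel(p) 1]);  % histogram of sampled indices
H = reshape(counts, size(L));               % map counts back to pixel locations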

Example: Ancestral Sampling (Chris Bishop, PRML). (Figure: a directed graphical model, here a chain x1 -> x2 -> x3 -> x4.)

Ancestral Sampling. Conditional probability tables, where values in each row sum to 1:

P(x1) = [.7 .3]
P(x2|x1) = [.4 .6; .5 .5]
P(x3|x2) = [.8 .2; .2 .8]
P(x4|x3) = [.5 .5; .7 .3]

Ancestral Sampling (cont). Using the conditional probability tables above: start by sampling from P(x1). Then sample from P(x2|x1). Then sample from P(x3|x2). Finally, sample from P(x4|x3). {x1, x2, x3, x4} is a sample from the joint distribution.

Ancestral Sampling: Matlab Demo

p1  = [.7 .3];         % marginal on x1 (root node)
p12 = [.4 .6; .5 .5];  % conditional probabilities p(xn|xn-1)
p23 = [.8 .2; .2 .8];  % rows must sum to one!
p34 = [.5 .5; .7 .3];
clear foo
for i = 1:10000
    x1 = samplefrom(p1);         % sample x1 ~ P(x1)
    x2 = samplefrom(p12(x1,:));  % sample x2 ~ P(x2|x1)
    x3 = samplefrom(p23(x2,:));  % sample x3 ~ P(x3|x2)
    x4 = samplefrom(p34(x3,:));  % sample x4 ~ P(x4|x3)
    % compute prob of this joint sample
    prob = p1(x1)*p12(x1,x2)*p23(x2,x3)*p34(x3,x4);
    foo(i,:) = [x1 x2 x3 x4 prob];
end
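
As a quick usage check (our own note; the reference value follows from summing the joint distribution defined by the tables over x1, x2, x3):

mean(foo(:,4) == 1)   % empirical P(x4=1); should approach 0.6084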

Ground Truth (to compare). The joint probability P(x1,x2,x3,x4), represented in a truth table over all 16 configurations of (x1,x2,x3,x4) and computed by multiplying the table entries; from it one can read off the MAP configuration and the marginals, to compare against the sampled estimates.

A Brief Overview of Sampling:
- Inverse Transform Sampling (CDF)
- Rejection Sampling
- Importance Sampling
For the last two, we can sample from an unnormalized distribution function. That is, to sample from distribution P, we only need to know a function P*, where P = P* / c, for some normalization constant c.

Rejection Sampling. Need a proposal density Q(x) [e.g. uniform or Gaussian], and a constant c such that cQ(x) is an upper bound for P*(x). Example with Q(x) uniform: generate uniform random samples in the upper bound volume, and accept the samples that fall below the P*(x) curve. The marginal density of the x coordinates of the accepted points is then proportional to P*(x). Note the relationship to Monte Carlo integration.

Rejection Sampling. More generally:
1) Generate sample x_i from a proposal density Q(x).
2) Generate sample u from uniform [0, cQ(x_i)].
3) If u <= P*(x_i), accept x_i; else reject.
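
A minimal Matlab sketch with a uniform proposal (the target P*, the interval [-3,3], and c are our own illustrative choices):

Pstar = @(x) exp(-x.^2/2);      % unnormalized target (a Gaussian shape)
N = 100000;
xi = -3 + 6*rand(1,N);          % 1) samples from Q = uniform on [-3,3]
% with Q(x) = 1/6 and c = 6, cQ(x) = 1 bounds Pstar everywhere
u = 1*rand(1,N);                % 2) u uniform in [0, cQ(xi)] = [0,1]
accepted = xi(u <= Pstar(xi));  % 3) keep samples falling under Pstar
% accepted is approximately distributed as the normalized Pstar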

Importance Sampling. Not for generating samples; it is a method to estimate the expected value of a function f(x) directly.
1) Generate x_i from Q(x).
2) An empirical estimate of E_Q[f(x)], the expected value of f(x) under distribution Q(x), is then

E_Q[f(x)] ~ (1/N) * sum over i of f(x_i)

3) However, we want E_P[f(x)], which is the expected value of f(x) under distribution P(x) = P*(x)/Z.

Importance Sampling. When we generate from Q(x), values of x where Q(x) is greater than P*(x) are overrepresented, and values where Q(x) is less than P*(x) are underrepresented. To mitigate this effect, introduce a weighting term w_i = P*(x_i) / Q(x_i).

Importance Sampling. New procedure to estimate E_P[f(x)]:
1) Generate N samples x_i from Q(x).
2) Form importance weights w_i = P*(x_i) / Q(x_i).
3) Compute the empirical estimate of E_P[f(x)], the expected value of f(x) under distribution P(x), as

E_P[f(x)] ~ (sum over i of w_i f(x_i)) / (sum over i of w_i)
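
A minimal Matlab sketch of the weighted estimate (the target P*, proposal Q, and test function f are our own illustrative choices):

Pstar = @(x) exp(-x.^2/2);        % unnormalized target
Qpdf  = @(x) ones(size(x)) / 10;  % proposal density: uniform on [-5,5]
f = @(x) x.^2;                    % function whose expectation we want
N = 100000;
xi = -5 + 10*rand(1,N);           % 1) samples from Q
w  = Pstar(xi) ./ Qpdf(xi);       % 2) importance weights
Ef = sum(w .* f(xi)) / sum(w)     % 3) estimate of E_P[x^2]; approaches 1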

Resampling. Note: we thus have a set of weighted samples (x_i, w_i), i = 1,...,N. If we really need random samples from P, we can generate them by resampling such that the likelihood of choosing value x_i is proportional to its weight w_i. This involves sampling from a discrete distribution of N possible values (the N values of x_i). Therefore, regardless of the dimensionality of vector x, we are resampling from a 1D distribution (we are essentially sampling from the indices 1,...,N, in proportion to the importance weights w_i). So we can use the inverse transform sampling method we discussed earlier.
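
A minimal Matlab sketch, reusing the weighted samples (xi, w) from the importance sampling sketch above:

c = cumsum(w) / sum(w);   % cdf over the N sample indices
idx = arrayfun(@(u) find(u <= c, 1), rand(1, numel(w)));
xnew = xi(idx);           % unweighted samples, approximately distributed as P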

Note on Proposal Functions. Computational efficiency is best if the proposal distribution looks a lot like the desired distribution (the area between the curves is small). These methods can fail badly when the proposal distribution has 0 density in a region where the desired distribution has non-negligible density. For this last reason, it is said that the proposal distribution should have heavy tails.

Sequential Monte Carlo Methods. Sequential Importance Sampling (SIS) and the closely related algorithm Sampling Importance Resampling (SIR) are known by various names in the literature:
- bootstrap filtering
- particle filtering
- Condensation algorithm
- survival of the fittest
General idea: importance sampling on time series data, with samples and weights updated as each new data term is observed. Well-suited for simulating Markov chains and HMMs! A minimal sketch follows.
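
A minimal Matlab sketch of a bootstrap (SIR) particle filter on a toy 1D model (the model x_t = a*x_{t-1} + noise, y_t = x_t + noise and all parameter values below are our own assumptions; systematic_resample is the earlier sketch):

a = 0.9; q = 1; r = 0.5;   % assumed motion and noise parameters
T = 50; N = 1000;          % time steps, particles
x = randn; ys = zeros(1,T);
for t = 1:T                % simulate a trajectory and noisy measurements
    x = a*x + sqrt(q)*randn;
    ys(t) = x + sqrt(r)*randn;
end
xp = randn(N,1);           % initial particle set
xhat = zeros(1,T);
for t = 1:T
    xp = a*xp + sqrt(q)*randn(N,1);      % predict: sample from the motion model
    w  = exp(-(ys(t)-xp).^2 / (2*r));    % weight: measurement likelihood
    w  = w / sum(w);
    xhat(t) = w' * xp;                   % weighted posterior mean estimate
    xp = xp(systematic_resample(w, N));  % resample: survival of the fittest
end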