Markov-Chain Monte Carlo

CSE586 Computer Vision II, Spring 2010, Penn State Univ.

Recall: Sampling Motivation. If we can generate random samples x_i from a given distribution P(x), then we can estimate expected values of functions under this distribution by summation rather than integration. That is, we can approximate

E[f(x)] = ∫ f(x) P(x) dx

by first generating N i.i.d. samples x_1, ..., x_N from P(x) and then forming the empirical estimate

E[f(x)] ≈ (1/N) Σ_{i=1}^N f(x_i).
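A minimal sketch of this idea in MATLAB (the choice of f(x) = x^2 and a standard normal P(x) are illustrative assumptions, chosen because the exact answer is known):

    % Monte Carlo estimate of E[f(x)] for f(x) = x.^2 under P(x) = N(0,1),
    % a distribution we know how to sample from directly.
    N = 100000;            % number of i.i.d. samples
    x = randn(N, 1);       % samples from P(x)
    estimate = mean(x.^2)  % empirical estimate; the exact value is 1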

Two catches: in practice P(x) is often known only up to an unknown normalization factor, and we only know how to sample directly from a few nice multidimensional distributions (uniform, normal).

Recall: Sampling Methods
- Inverse Transform Sampling (CDF)
- Rejection Sampling
- Importance Sampling

Problem Intuition: in high-dimensional problems, the typical set (the region of non-negligible probability in the state space) is a small fraction of the total space.

Recall: Markov Chain. Question: assume you start in some state, and then run the simulation for a large number of time steps. What percentage of time do you spend at X1, X2, and X3? Recall that we work with the transpose of the transition matrix (columns sum to one), so multiplying a distribution by K gives the distribution after one time step.

Starting from four possible initial distributions (for example [.33 .33 .33]), and tracking the distribution after each time step, all eventually end up at the same distribution -- this is the stationary distribution!

In MATLAB: [E,D] = eigs(K). The stationary distribution is the eigenvector of K with eigenvalue 1, normalized so its entries sum to one.
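As a concrete sketch (this particular K is made up for illustration; any column-stochastic transition matrix of an irreducible chain works):

    % Stationary distribution of a column-stochastic transition matrix K,
    % where K(i,j) = P(move to state i | currently in state j).
    K = [0.5 0.2 0.3;
         0.3 0.4 0.3;
         0.2 0.4 0.4];           % made-up entries; columns sum to one
    [p, lambda] = eigs(K, 1);    % eigenvector for the largest eigenvalue (= 1)
    p = p / sum(p)               % normalize: the stationary distribution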

General Idea. Start in some state, and then run the simulation for some number of time steps. After you have run it long enough (the burn-in period), start keeping track of the states you visit: {... X1 X2 X1 X3 X3 X2 X1 X2 X1 X1 X3 X3 X2 ...}. These are samples from the distribution you want, so you can now compute any expected values with respect to that distribution empirically.
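A sketch of this procedure, reusing the illustrative K from above; the fraction of time spent in each state approaches the stationary distribution:

    % Simulate the chain and measure empirical visit frequencies.
    K = [0.5 0.2 0.3; 0.3 0.4 0.3; 0.2 0.4 0.4];  % same made-up K as above
    T = 100000; burnin = 1000;
    x = 1;                                    % arbitrary starting state
    visits = zeros(3, 1);
    for t = 1:T
        x = find(rand() < cumsum(K(:, x)), 1);  % next state drawn from column x
        if t > burnin
            visits(x) = visits(x) + 1;        % record states after burn-in
        end
    end
    visits / sum(visits)                      % approaches the stationary p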

Theory (because it's important). A chain is irreducible if every state is accessible from every other state, and positive recurrent if the expected return time to every state is finite. If the Markov chain is positive recurrent, there exists a stationary distribution; if it is positive recurrent and irreducible, there exists a unique stationary distribution π. Then the average of a function f over samples of the Markov chain converges to the average with respect to the stationary distribution:

(1/N) Σ_{i=1}^N f(X_i) → Σ_x f(x) π(x).

The left-hand side we can compute empirically as we generate samples. The right-hand side is what we want to compute, and is infeasible to compute in any other way.

But how to design the chain? Assume you want to spend a particular percentage of time at X1, X2, and X3, say P(X1) = .2, P(X2) = .3, P(X3) = .5. What should the transition probabilities be?

K = [ ?  ?  ?
      ?  ?  ?
      ?  ?  ? ]

Let's consider a specific example.

Example: People Counting. Problem statement: given a foreground image and a person-sized bounding box*, find a configuration (number and locations) of bounding boxes that covers a majority of the foreground pixels while leaving a majority of the background pixels uncovered. [Figure: foreground image; person-sized bounding box.] *Note: the height, width, and orientation of the bounding box may depend on image location; we determine these relationships beforehand through a calibration procedure.

Likelihood Score. To measure how good a proposed configuration is, we generate a foreground image from it and compare with the observed foreground image to get a likelihood score. config = {(x_1, y_1, w_1, h_1, theta_1), (x_2, y_2, w_2, h_2, theta_2), (x_3, y_3, w_3, h_3, theta_3)}. [Figure: generated foreground image compared against observed foreground image to produce the likelihood score.]

Likelihood Score. Model each pixel with a Bernoulli distribution: a pixel of the generated image agrees with the corresponding observed pixel with some fixed probability. Simplifying by assuming the same probability at every pixel, the log likelihood reduces, up to constants, to minus the number of pixels that disagree!
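A sketch of this score in MATLAB (the random masks stand in for the generated and observed foreground images, which in the real system come from the configuration and the video):

    % Likelihood score as a pixel disagreement count.
    generated = rand(240, 320) > 0.5;            % hypothetical generated mask
    observed  = rand(240, 320) > 0.5;            % hypothetical observed mask
    num_disagree = nnz(generated ~= observed);   % pixels that disagree
    logL = -num_disagree                         % log likelihood up to constants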

Searching for the Max. The space of configurations is very large. We can't exhaustively search for the maximum likelihood configuration; we can't even uniformly sample the space to a reasonable degree of accuracy. config_k = {(x_1, y_1, w_1, h_1, theta_1), (x_2, y_2, w_2, h_2, theta_2), ..., (x_k, y_k, w_k, h_k, theta_k)}. Let N be the number of possible locations for (x_i, y_i) in a k-person configuration; then the size of config_k is N^k. And we don't even know how many people there are, so the size of the configuration space is N^0 + N^1 + N^2 + N^3 + ... If we also wanted to search over width, height, and orientation, this space would be larger still.

Searching for the Max: Local Search Approach
- Given a current configuration, propose a small change to it.
- Compare the likelihood of the proposed configuration with the likelihood of the current configuration.
- Decide whether to accept the change.

Proposals come in three kinds:
- Add a rectangle (birth). [Figure: current configuration -> proposed configuration with one rectangle added.]
- Remove a rectangle (death). [Figure: current configuration -> proposed configuration with one rectangle removed.]
- Move a rectangle. [Figure: current configuration -> proposed configuration with one rectangle moved.]

Searching for the Max: Naïve Acceptance. Accept the proposed configuration if it has a larger likelihood score, i.e., compute a = L(proposed) / L(current) and accept if a > 1. Problem: this leads to hill-climbing behavior that gets stuck in local maxima. [Figure: a likelihood curve; climbing from the start brings us to a local maximum, but we really want to be at the global maximum.]

Searching for the Max: The MCMC Approach. Generate random configurations from a distribution proportional to the likelihood! This generates many high-likelihood configurations and few low-likelihood ones. [Figure: likelihood curve with samples concentrated around its peaks.]

Searching for the Max: The MCMC Approach. Generating random configurations from a distribution proportional to the likelihood searches the space of configurations in an efficient way. Now just remember the generated configuration with the highest likelihood.

Sounds good, but how to do it? Think of configurations as nodes in a graph, and put a link between two nodes if you can get from one configuration to the other in one step (birth, death, or move). [Figure: graph of configurations A-E connected by birth/death and move/move edges.] Note that links come in pairs: birth/death and move/move.

Detailed Balance. Consider a pair of configuration nodes r, s. We want to generate them with frequencies proportional to their likelihoods L(r) and L(s). Let q(r,s) be the relative frequency of proposing configuration s when the current state is r (and vice versa). A sufficient condition to generate r and s with the desired frequencies is

L(r) q(r,s) = L(s) q(s,r)   (detailed balance).

Detailed Balance. Typically, your proposal frequencies do NOT satisfy detailed balance (unless you are extremely lucky). To fix this, we introduce a computational fudge factor a into one direction of the pair, so that detailed balance becomes

a * L(r) q(r,s) = L(s) q(s,r).

Solving for a:

a = [L(s) q(s,r)] / [L(r) q(r,s)].

MCMC Sampling: the Metropolis-Hastings algorithm.
- Propose a new configuration.
- Compute a = [L(proposed) q(proposed, current)] / [L(current) q(current, proposed)].
- Accept if a > 1.
- Else accept anyway with probability a. (This last step is the difference from the naïve algorithm.)
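A minimal sketch on a toy 1D problem (the bimodal target L and the Gaussian random-walk proposal are illustrative assumptions; the proposal density is written out in full to mirror the general ratio, even though it happens to be symmetric here):

    % Metropolis-Hastings on an unnormalized 1D target.
    L = @(x) exp(-(x - 2).^2 / 2) + exp(-(x + 2).^2 / 2);  % toy bimodal target
    q = @(r, s) exp(-(s - r).^2 / (2 * 0.5^2));  % q(r,s): density of proposing s from r
    T = 20000; x = 0; xs = zeros(T, 1);
    for t = 1:T
        xp = x + 0.5 * randn();                      % propose a new state
        a = (L(xp) * q(xp, x)) / (L(x) * q(x, xp));  % acceptance ratio
        if a >= 1 || rand() < a
            x = xp;                                  % accept
        end                                          % else keep current state
        xs(t) = x;                                   % record (repeats on reject)
    end
    histogram(xs, 50)                                % bimodal, matching L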

Trans-dimensional MCMC. Green's reversible-jump approach (RJMCMC) gives a general template for exploring and comparing states of differing dimension (different numbers of rectangles, in our case). Proposals come in reversible pairs: birth/death and move/move. We should add another term to the acceptance ratio for pairs that jump across dimensions; however, that term is 1 for our simple proposals.

MCMC in Action. [Movies: the sequence of proposed configurations and the sequence of accepted configurations.]

MCMC in Action. [Figure: the maximum likelihood configuration.] Looking good!

Examples

Note: you can just make q up on the fly. Difference from rejection sampling: instead of throwing away rejections, you replicate the current sample into the next time step.

Metropolis-Hastings Example. Target distribution: P(X1) = .2, P(X2) = .3, P(X3) = .5. Proposal distribution: q(x_i, x_((i-1) mod 3)) = .4 and q(x_i, x_((i+1) mod 3)) = .6. [Figure: three-state chain X1, X2, X3; MATLAB demo.]
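The slides refer to a MATLAB demo; a minimal sketch of such a demo under the stated target and proposal might look like this:

    % Metropolis-Hastings on three states, with the asymmetric neighbor
    % proposal from the slide: previous state w.p. 0.4, next state w.p. 0.6.
    P = [0.2 0.3 0.5];                 % target distribution
    prevs = @(i) mod(i - 2, 3) + 1;    % previous state, wrapping 1..3
    nexts = @(i) mod(i, 3) + 1;        % next state, wrapping 1..3
    T = 100000; x = 1; counts = zeros(1, 3);
    for t = 1:T
        if rand() < 0.4
            y = prevs(x); qxy = 0.4; qyx = 0.6;  % q(x,y) and q(y,x)
        else
            y = nexts(x); qxy = 0.6; qyx = 0.4;
        end
        a = (P(y) * qyx) / (P(x) * qxy);    % acceptance ratio
        if a >= 1 || rand() < a
            x = y;                          % accept the proposed state
        end
        counts(x) = counts(x) + 1;          % record the state each step
    end
    counts / T                              % approaches [0.2 0.3 0.5]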

Variants of MCMC. There are many variations on this general approach, some derived as special cases of the Metropolis-Hastings algorithm.

Metropolis algorithm: if the proposal distribution is symmetric, q(x, x') = q(x', x) (e.g., a Gaussian centered on the current state), the proposal terms cancel in the acceptance ratio, leaving a = L(x') / L(x).

A simpler version updates one component at a time, using 1D conditional distributions, or line search, or ...

[Figure: 1D marginal w.r.t. x1 and 1D marginal w.r.t. x2, updated in an interleaved fashion.]

Gibbs Sampler. A special case of MH in which the acceptance ratio is always 1 (so you always accept the proposal): each proposal draws one variable from its conditional distribution given the current values of all the others. [Reference: S. Brooks, Markov Chain Monte Carlo and its Application.]
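A minimal sketch, assuming a bivariate standard Gaussian target with correlation rho, where both 1D conditionals are known Gaussians:

    % Gibbs sampler for a 2D standard Gaussian with correlation rho, using
    % the exact conditionals x1 | x2 ~ N(rho*x2, 1 - rho^2) and vice versa.
    rho = 0.8; T = 5000;
    X = zeros(T, 2); x1 = 0; x2 = 0;
    s = sqrt(1 - rho^2);                  % conditional standard deviation
    for t = 1:T
        x1 = rho * x2 + s * randn();      % sample x1 given x2
        x2 = rho * x1 + s * randn();      % sample x2 given x1
        X(t, :) = [x1, x2];               % every proposal is accepted
    end
    corrcoef(X)                           % off-diagonal approaches rho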

Simulated Annealing. Introduce a temperature term that makes it more likely to accept proposals early on; this leads to more aggressive exploration of the state space. Gradually reduce the temperature, causing the process to spend more time exploring high-likelihood states. Rather than remembering all the states visited, keep track of the best state you've seen so far. This is a method that attempts to find the global maximum (MAP) state.
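A sketch of the annealed acceptance test, reusing the toy bimodal target from the earlier MH example; the symmetric proposal lets the q terms cancel, and the geometric cooling schedule is one common, assumed choice:

    % Simulated annealing: temper the acceptance ratio by 1/Temp and cool.
    L = @(x) exp(-(x - 2).^2 / 2) + exp(-(x + 2).^2 / 2);  % toy target
    Temp = 10; cooling = 0.9995;           % initial temperature and decay
    x = 0; best = x; bestL = L(x);
    for step = 1:20000
        xp = x + 0.5 * randn();            % symmetric random-walk proposal
        a = (L(xp) / L(x))^(1 / Temp);     % tempered acceptance ratio
        if a >= 1 || rand() < a
            x = xp;
        end
        if L(x) > bestL
            best = x; bestL = L(x);        % remember the best state so far
        end
        Temp = Temp * cooling;             % gradually reduce the temperature
    end
    best                                   % near one of the modes at +/- 2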

Trans-dimensional MCMC. Exploring alternative state spaces of differing dimensions (for example, when doing EM, also trying to estimate the number of clusters along with the parameters of each cluster). Green's reversible-jump approach (RJMCMC) gives a general template for exploring and comparing states of differing dimension.