Computer Vision Group Prof. Daniel Cremers. 11. Sampling Methods: Markov Chain Monte Carlo


Markov Chain Monte Carlo

In high-dimensional spaces, rejection sampling and importance sampling are very inefficient. An alternative is Markov Chain Monte Carlo (MCMC): it keeps a record of the current state, and the proposal depends on that state. The most common algorithms are the Metropolis-Hastings algorithm and Gibbs sampling.

Markov Chains Revisited

A Markov chain is a distribution over discrete-state random variables x_1, ..., x_T so that

p(x_1, ..., x_T) = p(x_1) \prod_{t=2}^{T} p(x_t | x_{t-1})

The graphical model of a Markov chain is a simple chain over x_1, ..., x_T. We will denote the state probabilities at time t as a row vector \pi_t. A Markov chain can also be visualized as a state transition diagram.

The State Transition Diagram

[Figure: a Markov chain unrolled over time, with states k = 1, 2, 3 stacked vertically and time steps t-2, t-1, t on the horizontal axis; self-transition probabilities A_{11} and A_{33} are marked on the corresponding edges.]

Some Notions

The Markov chain is said to be homogeneous if the transition probabilities are the same at every time step t (here we only consider homogeneous Markov chains). The transition matrix is row-stochastic, i.e. all entries are between 0 and 1 and every row sums up to 1. Observation: the probabilities of reaching the states can be computed using a vector-matrix multiplication, as shown in the sketch below.
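To make the observation concrete, here is a minimal sketch in Python/NumPy (the transition matrix values are illustrative, not from the lecture): propagating the state distribution one step is a single vector-matrix product.

```python
import numpy as np

# Row-stochastic transition matrix for a 3-state chain (illustrative values).
A = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])

pi = np.array([1.0, 0.0, 0.0])  # start in state 1 with probability 1

# One step of the chain: pi_t = pi_{t-1} A
for t in range(5):
    pi = pi @ A
    print(f"t={t+1}: {pi}")
```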

The Stationary Distribution

The probability to reach state k at time t is

\pi_{k,t} = \sum_{i=1}^{K} \pi_{i,t-1} A_{ik}

or, in matrix notation, \pi_t = \pi_{t-1} A. We say that \pi_t is stationary if \pi_t = \pi_{t-1}. Questions: How can we know that a stationary distribution exists? And if it exists, how do we know that it is unique?

The Stationary Distribution (Existence)

To find a stationary distribution we need to solve the eigenvector problem

A^T v = v

The stationary distribution is then \pi = v^T, where v is the eigenvector with eigenvalue 1. This eigenvector needs to be normalized so that it is a valid distribution. Theorem (Perron-Frobenius): Every row-stochastic matrix has such an eigenvector, but this vector may not be unique.
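A minimal sketch of this computation in NumPy (reusing the illustrative matrix from above): take the eigenvector of A^T with eigenvalue 1 and normalize it to sum to 1.

```python
import numpy as np

A = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])

# Eigenvectors of A^T; the stationary distribution is the one with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(A.T)
idx = np.argmin(np.abs(eigvals - 1.0))
v = np.real(eigvecs[:, idx])
pi = v / v.sum()           # normalize to a valid distribution

print(pi)                  # stationary distribution
print(pi @ A)              # equals pi, since pi A = pi
```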

Stationary Distribution (Uniqueness)

A Markov chain can have many stationary distributions. Sufficient for a unique stationary distribution: we can reach every state from any other state in a finite number of steps with non-zero probability (i.e. the chain is ergodic). This is equivalent to the property that the transition matrix is irreducible:

\forall i, j \; \exists m : (A^m)_{ij} > 0

[Figure: a four-state transition diagram with edge probabilities 0.1, 0.1, 1.0, 0.9, 0.5, 0.5, 0.9 illustrating (ir)reducibility.]
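Irreducibility can be checked numerically by accumulating powers of A and testing whether every entry eventually becomes positive. A sketch, again using the illustrative matrix from above:

```python
import numpy as np

def is_irreducible(A, max_power=None):
    """Check whether for every pair (i, j) some power of A has (A^m)_ij > 0."""
    n = A.shape[0]
    max_power = max_power or n   # paths of length <= n suffice for reachability
    reach = np.zeros_like(A)
    P = np.eye(n)
    for _ in range(max_power):
        P = P @ A
        reach += P
    return bool(np.all(reach > 0))

A = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])
print(is_irreducible(A))  # True: every state reaches every other state
```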

Main Idea of MCMC

So far, we specified the transition probabilities and analysed the resulting distribution. This was used, e.g., in HMMs. Now we want to sample from an arbitrary distribution. To do that, we design the transition probabilities so that the resulting stationary distribution is our desired (target) distribution!

Detailed Balance

Definition: A transition distribution A satisfies the property of detailed balance with respect to a distribution \pi if

\pi_i A_{ij} = \pi_j A_{ji}

The chain is then said to be reversible.

[Figure: two states 1 and 3 between time steps t-1 and t, with arrows A_{13} and A_{31} illustrating equal probability flow in both directions.]

Making a Distribution Stationary

Theorem: If a Markov chain with transition matrix A is irreducible and satisfies detailed balance with respect to the distribution \pi, then \pi is a stationary distribution of the chain.

Proof:

\sum_{i=1}^{K} \pi_i A_{ij} = \sum_{i=1}^{K} \pi_j A_{ji} = \pi_j \sum_{i=1}^{K} A_{ji} = \pi_j \quad \forall j

from which \pi = \pi A follows. This is a sufficient, but not necessary condition.
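A small numeric sanity check of the theorem (a hand-built reversible chain, not from the lecture; it uses a Metropolis-style construction that is introduced formally on the next slides): build A to satisfy detailed balance with respect to a chosen \pi and verify that \pi A = \pi.

```python
import numpy as np

pi = np.array([0.2, 0.3, 0.5])   # target distribution (illustrative)

n = len(pi)
Q = np.full((n, n), 1.0 / n)     # uniform symmetric proposal
A = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            A[i, j] = Q[i, j] * min(1.0, pi[j] / pi[i])
    A[i, i] = 1.0 - A[i].sum()   # rejection mass stays on the diagonal

# Detailed balance: pi_i A_ij == pi_j A_ji for all i, j
F = pi[:, None] * A
assert np.allclose(F, F.T)
# Stationarity follows: pi A == pi
assert np.allclose(pi @ A, pi)
print("detailed balance and stationarity verified")
```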

Sampling with a Markov Chain

The idea of MCMC is to sample state transitions based on a proposal distribution q. The most widely used algorithm is the Metropolis-Hastings (MH) algorithm. In MH, the decision whether to stay in a given state is based on a given probability. If the proposal distribution is q(x' | x), then we move to state x' with probability

\min\left(1, \frac{\tilde{p}(x') q(x | x')}{\tilde{p}(x) q(x' | x)}\right)

where \tilde{p} is the unnormalized target distribution.

The Metropolis-Hastings Algorithm

Initialize x^0
for s = 0, 1, 2, ...
    define x = x^s
    sample x' ~ q(x' | x)
    compute the acceptance probability \alpha = \frac{\tilde{p}(x') q(x | x')}{\tilde{p}(x) q(x' | x)}
    compute r = \min(1, \alpha)
    sample u ~ U(0, 1)
    set the new sample to x^{s+1} = x' if u < r, and x^{s+1} = x^s if u >= r
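A minimal sketch of the algorithm in Python/NumPy (the target and the proposal scale are illustrative choices, not from the lecture); note that only the unnormalized target density is needed.

```python
import numpy as np

def metropolis_hastings(p_tilde, x0, n_samples, sigma=1.0, rng=None):
    """MH with a symmetric Gaussian random-walk proposal q(x'|x) = N(x'; x, sigma^2).

    Because the proposal is symmetric, q(x|x') / q(x'|x) cancels in alpha.
    """
    rng = rng or np.random.default_rng(0)
    x = x0
    samples = np.empty(n_samples)
    for s in range(n_samples):
        x_prop = x + sigma * rng.standard_normal()   # sample x' ~ q(x'|x)
        alpha = p_tilde(x_prop) / p_tilde(x)         # acceptance probability
        if rng.uniform() < min(1.0, alpha):          # u ~ U(0,1), accept if u < r
            x = x_prop
        samples[s] = x                               # otherwise keep x^s
    return samples

# Unnormalized target: mixture of two 1D Gaussians (illustrative).
p_tilde = lambda x: np.exp(-0.5 * (x + 20)**2) + np.exp(-0.5 * (x - 20)**2)
samples = metropolis_hastings(p_tilde, x0=0.0, n_samples=10000, sigma=8.0)
```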

Why Does This Work?

We have to prove that the transition probability of the MH algorithm satisfies detailed balance with respect to the target distribution.

Theorem: If p_{MH}(x' | x) is the transition probability of the MH algorithm, then

p(x) p_{MH}(x' | x) = p(x') p_{MH}(x | x')

Proof (for x' \neq x):

p(x) p_{MH}(x' | x) = p(x) q(x' | x) \min\left(1, \frac{p(x') q(x | x')}{p(x) q(x' | x)}\right) = \min\left(p(x) q(x' | x), p(x') q(x | x')\right)

This expression is symmetric in x and x', so it equals p(x') p_{MH}(x | x').

Note: All formulations are valid for discrete and for continuous variables!

Choosing the Proposal

A proposal distribution is valid if it gives a non-zero probability of moving to the states that have a non-zero probability in the target. A good proposal is the Gaussian, because it has a non-zero probability for all states. However, the variance of the Gaussian is important: with a low variance, the sampler does not explore sufficiently, e.g. it stays fixed to a particular mode; with a too high variance, the proposal is rejected too often and the samples are a bad approximation.

Example

The target is a mixture of two 1D Gaussians; the proposal is a Gaussian with different variances. [Figure: three trace plots of the samples over 1000 iterations, for proposals N(0, 1²), N(0, 8²), and N(0, 500²); the medium-variance proposal explores both modes, the narrow proposal stays near one mode, and the wide proposal is rejected most of the time.]
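Using the sketch from the algorithm slide above, the effect of the proposal variance can be reproduced by varying sigma (values as in the figure):

```python
for sigma in [1.0, 8.0, 500.0]:
    samples = metropolis_hastings(p_tilde, x0=0.0, n_samples=10000, sigma=sigma)
    # With sigma=1 the chain stays near one mode; with sigma=8 it visits both;
    # with sigma=500 most proposals are rejected and the chain moves rarely.
    print(sigma, samples.mean(), samples.std())
```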

Gibbs Sampling

Initialize {z_i : i = 1, ..., M}
For \tau = 1, ..., T:
    Sample z_1^{(\tau+1)} ~ p(z_1 | z_2^{(\tau)}, ..., z_M^{(\tau)})
    Sample z_2^{(\tau+1)} ~ p(z_2 | z_1^{(\tau+1)}, z_3^{(\tau)}, ..., z_M^{(\tau)})
    ...
    Sample z_M^{(\tau+1)} ~ p(z_M | z_1^{(\tau+1)}, ..., z_{M-1}^{(\tau+1)})

Idea: sample from the full conditionals. These can be obtained, e.g., from the Markov blanket in graphical models.
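As a minimal sketch (a standard textbook example, not from the lecture): Gibbs sampling for a bivariate Gaussian with correlation rho, where both full conditionals are 1D Gaussians.

```python
import numpy as np

def gibbs_bivariate_gaussian(rho, n_samples, rng=None):
    """Gibbs sampler for (z1, z2) ~ N(0, [[1, rho], [rho, 1]]).

    Full conditionals: z1 | z2 ~ N(rho * z2, 1 - rho^2), and symmetrically.
    """
    rng = rng or np.random.default_rng(0)
    z1, z2 = 0.0, 0.0
    cond_std = np.sqrt(1.0 - rho**2)
    samples = np.empty((n_samples, 2))
    for t in range(n_samples):
        z1 = rho * z2 + cond_std * rng.standard_normal()  # sample z1 | z2
        z2 = rho * z1 + cond_std * rng.standard_normal()  # sample z2 | z1
        samples[t] = z1, z2
    return samples

samples = gibbs_bivariate_gaussian(rho=0.9, n_samples=5000)
print(np.corrcoef(samples.T))  # empirical correlation approaches 0.9
```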

Gibbs Sampling: Example

Use an MRF on a binary image with edge potentials

\psi(x_s, x_t) = \exp(J x_s x_t)   ("Ising model")

and node potentials

\phi(x_t) = N(y_t | x_t, \sigma^2),   x_t \in \{-1, 1\}

[Figure: graphical model with latent pixels x_s, x_t connected in a grid and an observation y_t attached to each x_t.]

Gibbs Sampling: Example

Use an MRF on a binary image with edge potentials \psi(x_s, x_t) = \exp(J x_s x_t) ("Ising model") and node potentials \phi(x_t) = N(y_t | x_t, \sigma^2). Sample each pixel in turn.

[Figure: denoising results on a binary image; the state after 1 Gibbs sample, after 5 samples, and the mean after 15 sweeps.]
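A sketch of this sampler in Python/NumPy (J, sigma, and the test image are illustrative): the full conditional of a pixel depends only on its four grid neighbours and its own observation.

```python
import numpy as np

def gibbs_ising_denoise(y, J=1.0, sigma=0.5, n_sweeps=15, rng=None):
    """Gibbs sampling for the Ising model with Gaussian observations y."""
    rng = rng or np.random.default_rng(0)
    H, W = y.shape
    x = np.where(y > 0, 1, -1)        # initialize from the observations
    mean = np.zeros_like(y)
    for sweep in range(n_sweeps):
        for i in range(H):
            for j in range(W):
                # Sum of the (up to four) grid neighbours.
                nb = sum(x[a, b] for a, b in [(i-1, j), (i+1, j), (i, j-1), (i, j+1)]
                         if 0 <= a < H and 0 <= b < W)
                # Log-odds of x_ij = +1 vs -1: edge term + observation term,
                # which simplifies to 2*J*nb + 2*y_ij / sigma^2.
                logodds = 2 * J * nb + 2 * y[i, j] / sigma**2
                p_plus = 1.0 / (1.0 + np.exp(-logodds))
                x[i, j] = 1 if rng.uniform() < p_plus else -1
        mean += x
    return mean / n_sweeps            # estimate of the posterior mean

# Illustrative noisy observation of a half -1 / half +1 image.
rng = np.random.default_rng(1)
truth = np.ones((32, 32)); truth[:, :16] = -1
y = truth + 0.5 * rng.standard_normal(truth.shape)
x_mean = gibbs_ising_denoise(y)
print(x_mean[0, :4], x_mean[0, -4:])  # near -1 on the left, near +1 on the right
```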

Gibbs Sampling for GMMs

Again, we start with the full joint distribution:

p(x, z, \mu, \Sigma, \pi) = p(x | z, \mu, \Sigma) p(z | \pi) p(\pi) \prod_{k=1}^{K} p(\mu_k) p(\Sigma_k)

It can be shown that the full conditionals are:

p(z_i = k | x_i, \mu, \Sigma, \pi) \propto \pi_k N(x_i | \mu_k, \Sigma_k)

p(\pi | z) = Dir(\{\alpha_k + \sum_{i=1}^{N} z_{ik}\}_{k=1}^{K})

p(\mu_k | \Sigma_k, z, x) = N(\mu_k | m_k, V_k)   (linear-Gaussian)

p(\Sigma_k | \mu_k, z, x) = IW(\Sigma_k | S_k, \nu_k)
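A sketch of the sampler for a simplified setting (1D data with known component variance sigma², a Gaussian prior on the means, and a symmetric Dirichlet prior; all hyperparameters are illustrative), which keeps the conditionals short:

```python
import numpy as np

def gibbs_gmm_1d(x, K=2, sigma=1.0, alpha=1.0, m0=0.0, v0=100.0,
                 n_iters=100, rng=None):
    """Gibbs sampling for a 1D GMM with known component variance sigma^2."""
    rng = rng or np.random.default_rng(0)
    mu = rng.standard_normal(K)
    pi = np.full(K, 1.0 / K)
    for it in range(n_iters):
        # 1) z_i | x_i, mu, pi  ~  Cat( pi_k N(x_i | mu_k, sigma^2) )
        logp = np.log(pi) - 0.5 * (x[:, None] - mu)**2 / sigma**2
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        z = np.array([rng.choice(K, p=row) for row in p])
        # 2) pi | z  ~  Dir(alpha + counts)
        counts = np.bincount(z, minlength=K)
        pi = rng.dirichlet(alpha + counts)
        # 3) mu_k | z, x  ~  N(m_k, v_k)  (conjugate Gaussian update)
        for k in range(K):
            xk = x[z == k]
            vk = 1.0 / (1.0 / v0 + len(xk) / sigma**2)
            mk = vk * (m0 / v0 + xk.sum() / sigma**2)
            mu[k] = mk + np.sqrt(vk) * rng.standard_normal()
    return mu, pi

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(-5, 1, 200), rng.normal(5, 1, 300)])
print(gibbs_gmm_1d(x))  # means near -5 and 5, weights near 0.4 and 0.6
```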

Gibbs Sampling for GMMs

First, we initialize all variables. Then we iterate over sampling from each conditional in turn. In the end, we look at the sampled parameters, e.g. \mu_k and \Sigma_k.

How Often Do We Have To Sample?

Here, after 50 sampling rounds the values do not change any more. In general, the mixing time is related to the eigengap \gamma = \lambda_1 - \lambda_2 of the transition matrix:

\tau \le O\left(\frac{1}{\gamma} \log n\right)
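For a finite chain, the eigengap can be read off the eigenvalues of the transition matrix. A minimal sketch, reusing the illustrative matrix from above:

```python
import numpy as np

A = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])

# Sort eigenvalue magnitudes; lambda_1 = 1 for a row-stochastic matrix.
lams = np.sort(np.abs(np.linalg.eigvals(A)))[::-1]
gamma = lams[0] - lams[1]          # the eigengap
print(f"eigengap = {gamma:.3f}")   # a larger gap means faster mixing
```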

Gibbs Sampling is a Special Case of MH

The proposal distribution in Gibbs sampling is

q(x' | x) = p(x'_i | x_{-i}) \, I(x'_{-i} = x_{-i})

This leads to an acceptance rate of

\alpha = \frac{p(x') q(x | x')}{p(x) q(x' | x)} = \frac{p(x'_i | x'_{-i}) p(x'_{-i}) p(x_i | x'_{-i})}{p(x_i | x_{-i}) p(x_{-i}) p(x'_i | x_{-i})} = 1

Although the acceptance is 100%, Gibbs sampling does not converge faster, as it only updates one variable at a time.

Summary

Markov Chain Monte Carlo is a family of sampling algorithms that can sample from arbitrary distributions by moving in state space. The most used methods are Metropolis-Hastings (MH) and Gibbs sampling. MH uses a proposal distribution and accepts a proposed state randomly. Gibbs sampling does not use a proposal distribution, but samples from the full conditionals.