CSC 446 Notes: Lecture 13

1 The Problem

We have already studied how to calculate the probability of one or more variables using the message passing method. However, sometimes the structure of the graph is too complicated for this to be tractable. The relation between diseases and symptoms is a good example: the variables are all coupled together, which gives the graph a high tree width. Another case is that of continuous variables, where during message passing we must evaluate

    $r_{m \to n} = \int f(\vec{x}_m) \prod_{n' \neq n} q_{n' \to m}(x_{n'}) \, d(\vec{x}_m \setminus x_n).$

If this integral cannot be computed, what can we do to evaluate the probability of the variables? This is what sampling is used for.

2 How to Sample a Continuous Variable: Basics

Let us set the above aside for a moment and ask: if we want to sample a continuous variable, how can we ensure that the points we pick satisfy the distribution of that variable? The question is easy for uniformly distributed variables, since a computer can generate such random numbers directly. For more complicated distributions, we can use the inverse of the cumulative distribution function (CDF) to map the uniform distribution onto the required distribution, where the CDF of a distribution with probability density function (PDF) $P$ is

    $\mathrm{CDF}(x) = \int_{-\infty}^{x} P(t) \, dt.$

For example, to sample from a standard normal distribution, the points we pick are computed as $X = \mathrm{erf}^{-1}(x)$, where $x$ is drawn from a uniform distribution and

    $\mathrm{erf}(x) = \int_{0}^{x} \mathcal{N}(t, 0, 1) \, dt.$

We can play the same trick for many other distributions. However, some distributions do not have a closed-form integral for their CDF, which makes this method fail. Under such conditions, we turn to a framework called Markov chain Monte Carlo (MCMC).
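To make the inverse-CDF trick concrete, here is a minimal Python sketch (not from the lecture; the helper name sample_standard_normal and the use of the standard library's NormalDist are assumptions). It draws uniform numbers and pushes them through the inverse CDF of the standard normal.

import random
from statistics import NormalDist

def sample_standard_normal(n, seed=0):
    """Inverse-CDF (probability integral transform) sampling from N(0, 1)."""
    rng = random.Random(seed)
    inv_cdf = NormalDist(mu=0.0, sigma=1.0).inv_cdf   # inverse CDF of the target
    return [inv_cdf(rng.random()) for _ in range(n)]  # u ~ Uniform(0, 1) -> X

if __name__ == "__main__":
    xs = sample_standard_normal(10_000)
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    print(f"sample mean {mean:.3f}, sample variance {var:.3f}")  # roughly 0 and 1

The same pattern works for any distribution whose inverse CDF is available in closed form; when it is not, the MCMC machinery below takes over.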

3 The Metropolis-Hastings Algorithm

Before discussing this method in more detail, let us review some basic properties of Markov chains. A first-order Markov chain is a series of random variables in which each variable depends only on the previous state, that is,

    $x_t \sim P(x_t \mid x_{t-1}).$

Our goal is to find a Markov chain whose distribution matches a given distribution we want to sample from, so that by running the Markov chain we get results as if we were sampling from the original distribution. In other words, we want the Markov chain to eventually (1) explore the entire space of the original distribution and (2) reflect the original PDF.

The general algorithm for generating the samples is the Metropolis-Hastings algorithm. It draws a candidate $x' \sim Q(x'; x_t)$ and then accepts it with probability

    $\min\left(1, \frac{P(x') \, Q(x_t; x')}{P(x_t) \, Q(x'; x_t)}\right).$

The key here is the function $Q$, called the proposal distribution, which is used to reduce the complexity of sampling from the original distribution. We therefore have to select a $Q$ that is easy to sample from, for instance a Gaussian. Note that there is a trade-off in choosing the variance of the Gaussian, which determines the step size of the Markov chain. If the variance is too small, the chain will take a long time, or may even be unable, to cover the entire space. If the variance is too large, the probability of accepting a new candidate becomes small, and the chain may stay in the same state for a very long time. Either extreme makes the chain fail to simulate the original PDF.

If we could sample from $P$ directly, that is $Q(x'; x_t) = P(x')$, we would have

    $\frac{P(x') \, Q(x_t; x')}{P(x_t) \, Q(x'; x_t)} = 1,$

which means the candidate we draw is always accepted. This tells us that $Q$ should approximate $P$.

By the way, how do we calculate $P(x')$? There are two cases. In one case, although we cannot integrate $P(x)$, $P(x)$ itself is easy to compute. In the other, $P(x) = f(x)/Z$, where $Z = \int f(x) \, dx$ is what we do not know. But since $f(x) = Z P(x)$, we can simply substitute $f(x)$ for $P(x)$ when calculating the acceptance probability, because the unknown constant $Z$ cancels in the ratio.
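As a concrete illustration (a minimal sketch, not the lecture's code; the unnormalized target f and all names are assumptions), the Python snippet below runs Metropolis-Hastings with a symmetric Gaussian random-walk proposal, so the Q terms cancel and the acceptance ratio reduces to f(x')/f(x), which is exactly why the unknown Z never matters.

import numpy as np

def f(x):
    # Unnormalized target f(x) = Z * P(x): two Gaussian bumps (illustrative only).
    return np.exp(-0.5 * (x - 2.0) ** 2) + 0.5 * np.exp(-0.5 * (x + 2.0) ** 2)

def metropolis_hastings(f, x0=0.0, n_samples=50_000, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n_samples)
    for t in range(n_samples):
        x_prop = x + step * rng.standard_normal()   # draw x' ~ Q(x'; x_t)
        accept_prob = min(1.0, f(x_prop) / f(x))    # Q symmetric, so Q terms cancel
        if rng.random() < accept_prob:              # accept with that probability
            x = x_prop
        samples[t] = x                              # otherwise keep the old state
    return samples

if __name__ == "__main__":
    s = metropolis_hastings(f)
    print("mean of the drawn samples:", s.mean())

The step parameter is the proposal standard deviation discussed above: shrinking it raises the acceptance rate but slows exploration, while inflating it causes most candidates to be rejected.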

4 Proof of the Method

In this section, our goal is to prove that the Markov chain generated by the Metropolis-Hastings algorithm has a unique stationary distribution, namely the distribution we want to sample from. We will first introduce the definition of a stationary distribution and a method for proving stationarity, and then apply these tools to accomplish our goal.

(a) Stationary distribution

A distribution is said to be stationary with respect to a Markov chain if it remains the same before and after taking one step of the chain, which can be written as

    $\Pi_{t+1} = T \Pi_t = \Pi_t, \qquad \text{or} \qquad \Pi_i = \sum_j T_{ij} \Pi_j,$

where $\Pi$ is the vector containing the stationary distribution over states, with elements $\Pi_i = P(x = i)$, and $T$ is the transition probability matrix whose element $T_{ij} = P(x_t = i \mid x_{t-1} = j)$ is the probability that the variable moves from state $j$ to state $i$. For example, the two Markov chains in Figure 1(a) and 1(b) both have the stationary distribution $\Pi = \begin{bmatrix} 1/2 & 1/2 \end{bmatrix}^\top$.

[Figure 1: Example Markov chains (a)-(e).]

The stationary distribution of a Markov chain can be calculated by solving the system

    $T \Pi = \Pi, \qquad \sum_i \Pi_i = 1.$

Note that a Markov chain may have more than one stationary distribution; a simple example is a chain whose transition matrix is the identity, as in Figure 1(c). If a Markov chain has a stationary distribution and that stationary distribution is unique, the chain is guaranteed to converge to it eventually, no matter what state it starts in.

(b) Detailed Balance: a property that ensures a stationary distribution

Once we know a Markov chain has a unique stationary distribution, we can use it to sample from a given distribution. We now give a sufficient (but not necessary) condition for a distribution $\Pi$ to be stationary, a property of the transition matrix called detailed balance:

    $T_{ij} \Pi_j = T_{ji} \Pi_i,$

which says that the probability of being in state $j$ and moving to $i$ equals the probability of being in state $i$ and moving to $j$; the property is also called reversibility because of this symmetry. Starting from this definition, we have

    $\sum_j T_{ij} \Pi_j = \sum_j T_{ji} \Pi_i = \Pi_i \sum_j T_{ji}.$

Noting that $\sum_j T_{ji} = 1$, we conclude

    $\sum_j T_{ij} \Pi_j = \Pi_i,$

which is exactly the second form of the definition of a stationary distribution given above. Therefore, if a distribution makes the transition matrix of a Markov chain satisfy detailed balance, that distribution is a stationary distribution of the chain.

Note that although a periodic Markov chain such as the one in Figure 1(d) satisfies detailed balance, we do not call it stationary, because it never truly converges and thus is not guaranteed to approximate the original PDF. In practice, we often add a small self-transition probability, as in Figure 1(e), to avoid such periodicity.
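The following short NumPy check (an illustration added here, not part of the original notes) works through these definitions on an assumed two-state chain in the spirit of Figure 1(a), with self-transition probability 0.7: it recovers the stationary distribution from $T\Pi = \Pi$, verifies detailed balance, and shows convergence from an arbitrary starting distribution.

import numpy as np

# T[i, j] = P(x_t = i | x_{t-1} = j); each column sums to one.
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])

# Solve T @ pi = pi with sum(pi) = 1 via the eigenvector for eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
pi = pi / pi.sum()
print("stationary distribution:", pi)            # -> [0.5, 0.5]

# Detailed balance: T[i, j] * pi[j] == T[j, i] * pi[i] for all i, j.
flow = T * pi                                    # flow[i, j] = T[i, j] * pi[j]
print("detailed balance holds:", np.allclose(flow, flow.T))

# Convergence: repeatedly applying T to any starting distribution approaches pi.
p = np.array([1.0, 0.0])
for _ in range(50):
    p = T @ p
print("after 50 steps from [1, 0]:", p)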

Note that detailed balance does not ensure the uniqueness of the stationary distribution of a Markov chain. Such uniqueness is nevertheless necessary, or the Markov chain would not converge to the PDF we want. What we can do, when we construct the chain at the very beginning, is make sure that (1) any state is reachable from any other state and (2) the chain is aperiodic. Under these conditions, the stationary distribution is unique.

(c) Final proof

Now let us return to the Metropolis-Hastings algorithm and prove that the transition matrix of its Markov chain satisfies detailed balance; together with the construction above, this gives the chain the desired unique stationary distribution. According to the Metropolis-Hastings algorithm, the transition probability of the chain is

    $T(x'; x) = Q(x'; x) \min\left(1, \frac{P(x') \, Q(x; x')}{P(x) \, Q(x'; x)}\right).$

If $x' = x$, detailed balance holds automatically because of the symmetry in its definition: the condition $T_{ij} \Pi_j = T_{ji} \Pi_i$ is trivially true when $i = j$, which is exactly the case $x' = x$.

For the case $x' \neq x$, pulling $Q(x'; x)$ inside the minimum gives

    $T(x'; x) = \min\left(Q(x'; x), \frac{P(x') \, Q(x; x')}{P(x)}\right).$

Multiplying both sides by $P(x)$, we obtain

    $T(x'; x) \, P(x) = \min\left(Q(x'; x) \, P(x), \; P(x') \, Q(x; x')\right),$

and the right-hand side is symmetric in $x$ and $x'$, so it also equals $T(x; x') \, P(x')$. Therefore the transition matrix satisfies detailed balance, the Markov chain of the Metropolis-Hastings algorithm does have the desired stationary distribution, and we can use such a chain to simulate the original PDF.

5 Gibbs Sampling

Now, back to the very first problem of this lecture: we want to compute

    $P(x_k) = \sum_{\vec{x} \setminus x_k} \frac{1}{Z} \prod_m f(\vec{x}_m)$

without knowing $Z$. We can use Gibbs sampling, shown in Algorithm 1 below, where $x_{-k}$ means all the variables of $\vec{x}$ except $x_k$. Note that Gibbs sampling is actually a particular instance of the Metropolis-Hastings algorithm in which the new candidate is always accepted; this is proved after Algorithm 1.
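As a concrete (and entirely hypothetical) instance of this scheme, the sketch below runs Gibbs sampling on a bivariate Gaussian with correlation rho, chosen only because both full conditionals $P(x_k \mid x_{-k})$ have a simple closed form; the general procedure is given in Algorithm 1, which follows.

import numpy as np

# Hypothetical model: (x1, x2) ~ bivariate Gaussian, zero mean, unit variances,
# correlation rho.  The full conditionals are
#   x1 | x2 ~ N(rho * x2, 1 - rho^2),   x2 | x1 ~ N(rho * x1, 1 - rho^2),
# and each Gibbs draw from them is always accepted, as proved after Algorithm 1.
def gibbs_bivariate_gaussian(rho=0.8, n_sweeps=20_000, seed=0):
    rng = np.random.default_rng(seed)
    x1, x2 = 0.0, 0.0
    sd = np.sqrt(1.0 - rho ** 2)
    samples = np.empty((n_sweeps, 2))
    for t in range(n_sweeps):
        x1 = rng.normal(rho * x2, sd)   # resample x1 ~ P(x1 | x2)
        x2 = rng.normal(rho * x1, sd)   # resample x2 ~ P(x2 | x1)
        samples[t] = (x1, x2)
    return samples

if __name__ == "__main__":
    s = gibbs_bivariate_gaussian()
    print("empirical correlation:", np.corrcoef(s.T)[0, 1])  # close to rho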

Algorithm 1: Gibbs Sampling
    for k = 1 ... K do
        $x_k \sim P(x_k \mid x_{-k}) = \frac{1}{Z_k} \prod_{m \in M(k)} f(\vec{x}_m)$

Here $M(k)$ is the set of factors containing $x_k$, and $Z_k$ is the corresponding local normalization constant. To see that the candidate is always accepted, substitute

    $P(x') = P(x'_{-k}) \, P(x'_k \mid x'_{-k})$
    $P(x) = P(x_{-k}) \, P(x_k \mid x_{-k})$
    $Q(x'; x) = P(x'_k \mid x_{-k})$
    $Q(x; x') = P(x_k \mid x'_{-k})$
    $x'_{-k} = x_{-k}$

into the probability of accepting the new candidate:

    $\frac{P(x') \, Q(x; x')}{P(x) \, Q(x'; x)} = \frac{P(x'_{-k}) \, P(x'_k \mid x'_{-k}) \, P(x_k \mid x'_{-k})}{P(x_{-k}) \, P(x_k \mid x_{-k}) \, P(x'_k \mid x_{-k})} = 1,$

since $x'_{-k} = x_{-k}$ makes every factor in the numerator cancel with a factor in the denominator. Therefore, the Gibbs sampling algorithm will always accept the candidate.

Shujie Chen 3/2; DG 3/3