Convergence Rate of Markov Chains

Will Perkins

April 16, 2013

Convergence

Last class we saw that if X_n is an irreducible, aperiodic, positive recurrent Markov chain, then there exists a stationary distribution \mu on the state space X, so that no matter where the chain starts, X_n \to \mu in distribution as n \to \infty. A very natural question to ask is: how long does it take? And why do we care about the rate of convergence to the stationary distribution?

Sampling Algorithms

How do you sample from a given probability distribution (with a computer)? It depends on what description you have of the distribution.

1. If you know the distribution function F(t), then you can sample a uniform [0, 1] rv U and take X = F^{-1}(U). What is the distribution of X?

2. Rejection sampling: if we can draw a box around the graph of the density function f(x), then we can sample a uniform point under the curve by taking a uniform point in the box and rejecting it if the point is above the curve. [Picture]. Rejection sampling is useful when you have a procedure to compute f(x) but the form is difficult to work with analytically. What is the analogue for discrete distributions?
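As a concrete illustration (not part of the original slides), here is a minimal Python sketch of both methods; the choice of the exponential distribution for the inverse transform, and the Beta(2,2)-style density with its bounding box for rejection, are assumptions made just for the example.

```python
import math
import random

# Inverse transform: for Exp(1), F(t) = 1 - e^{-t}, so F^{-1}(u) = -log(1 - u).
def sample_exponential():
    u = random.random()
    return -math.log(1.0 - u)

# Rejection sampling: density f(x) = 6x(1 - x) on [0, 1] (a Beta(2,2)),
# enclosed in the box [0, 1] x [0, 1.5] since f peaks at f(1/2) = 1.5.
def sample_beta22():
    while True:
        x = random.random()              # (x, y) uniform in the box
        y = random.uniform(0.0, 1.5)
        if y <= 6.0 * x * (1.0 - x):     # keep it iff it lies under the curve
            return x
```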

Sampling Algorithms

Those are the easy cases. Sometimes we only know indirect information about the distribution we would like to sample from. E.g., consider a given graph G on n vertices, and say we want to sample an independent set from the graph, uniformly over all independent sets. Where do we even start? We don't even know the number of independent sets of the graph (counting them is a difficult computational problem).

Markov Chain Monte Carlo

One idea comes from stationary distributions. If we could somehow set up a Markov Chain so that our desired distribution is stationary, then we could just run the chain and (eventually) the distribution would converge to the desired distribution. So we could just sample X_N for large enough N. Two questions:

1. How can we set up such a Markov Chain?
2. How large does N have to be?

We will start with the second question today.

Total Variation Distance

What does it mean for the distribution of X_n to be close to \mu? Recall total variation distance:

\|P - Q\|_{TV} = \frac{1}{2} \sum_{x \in X} |P(x) - Q(x)|

Exercise: if \|P_n - Q\|_{TV} \to 0, then P_n \to Q in distribution. Alternate formulation:

\|P - Q\|_{TV} = \max_{A \subseteq X} |P(A) - Q(A)|
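For intuition, here is a short Python sketch (an illustration of mine, not from the slides) computing both formulations for distributions on a finite state space; they agree, with the maximum attained at the event A = {x : P(x) > Q(x)}.

```python
def tv_distance(p, q):
    """(1/2) * sum_x |p(x) - q(x)| for dicts mapping state -> probability."""
    states = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in states)

def tv_distance_max(p, q):
    """max_A |P(A) - Q(A)|, attained at A = {x : p(x) > q(x)}."""
    states = set(p) | set(q)
    return sum(p.get(x, 0.0) - q.get(x, 0.0)
               for x in states if p.get(x, 0.0) > q.get(x, 0.0))
```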

Mixing Times

Let X_n have distribution \pi_n^j when X_0 = j, and let \mu be the stationary distribution of the MC.

Definition: The mixing time of the Markov Chain X_n is

\tau_{1/4} = \inf\{n : \sup_{j \in X} \|\pi_n^j - \mu\|_{TV} < 1/4\}

In other words, no matter in which state the chain starts, after \tau_{1/4} steps the distribution is at distance at most 1/4 from stationary. We could pick any constant strictly between 0 and 1/2 in place of 1/4, and \tau would be the same up to a constant factor.
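For small finite chains this definition can be checked by brute force. A sketch (my illustration, not from the lecture; it assumes the chain is given as a NumPy row-stochastic matrix with a unique stationary distribution):

```python
import numpy as np

def stationary(P):
    """Stationary distribution as the left eigenvector of P for eigenvalue 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()

def mixing_time(P, eps=0.25, max_steps=10_000):
    """Smallest n with sup_j ||pi_n^j - mu||_TV < eps."""
    mu = stationary(P)
    dists = np.eye(P.shape[0])   # row j is the distribution of X_0 given X_0 = j
    for n in range(1, max_steps + 1):
        dists = dists @ P        # row j becomes pi_n^j
        if 0.5 * np.abs(dists - mu).sum(axis=1).max() < eps:
            return n
    return None                  # did not mix within max_steps
```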

Examples

Usually we are interested in the asymptotics of the mixing times of large Markov Chains, indexed by some parameter n, e.g. random walks on graphs with n vertices. What is the mixing time of a random walk on the complete graph K_n? What is the mixing time of a random walk on the cycle C_n? We are looking for upper and lower bounds that hopefully differ by only a constant factor.

Reversibility

Let \pi be a probability distribution on X. The detailed balance equations are:

\pi(i) p_{ij} = \pi(j) p_{ji} for all i, j \in X.

Lemma: If \pi satisfies the detailed balance equations, then it is a stationary distribution (and for an irreducible chain, the unique one). A chain with such a distribution is called reversible.

Proof:

Reversibility

Reversibility is a good way to find a stationary distribution. Example: say the transition matrix of a finite Markov chain is symmetric: p_{ij} = p_{ji}. What is the stationary distribution of such a Markov chain? Exercise (in class): for a given graph G on n vertices, find a Markov Chain on the independent sets of the graph so that the stationary distribution of the chain is the uniform distribution over all independent sets.
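A quick numerical sanity check of detailed balance (an illustration of mine, not from the slides), using the symmetric-matrix example, where the uniform distribution balances every transition:

```python
import numpy as np

def satisfies_detailed_balance(P, pi, tol=1e-12):
    """Check pi(i) * P[i, j] == pi(j) * P[j, i] for all i, j."""
    flow = pi[:, None] * P           # flow[i, j] = pi(i) p_ij
    return np.allclose(flow, flow.T, atol=tol)

# A symmetric transition matrix: the uniform distribution is stationary.
P = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
print(satisfies_detailed_balance(P, np.ones(3) / 3))   # True
```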

Ehrenfest Urn

Ehrenfest Urn (lazy version): n balls lie in two urns, A and B. At each step, pick a ball uniformly at random and place it in one of the two urns uniformly at random. What is the stationary distribution of the Ehrenfest Urn?

Lazy random walk on the hypercube: state space {0, 1}^n; at each step, pick a coordinate uniformly at random and with probability 1/2 flip it. What is the stationary distribution of this random walk? How are the two chains related?
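The relation becomes transparent in code (a sketch of mine, not from the slides): coordinate i of the hypercube walk records which urn ball i sits in, so the Ehrenfest chain is the projection onto the number of ones.

```python
import random

def lazy_hypercube_step(x):
    """One lazy step on {0,1}^n: pick a coordinate uniformly and set it
    to a fresh uniform bit (equivalently, flip it with probability 1/2)."""
    i = random.randrange(len(x))
    x[i] = random.randint(0, 1)
    return x

def urn_count(x):
    """The Ehrenfest chain: balls in urn A, if x[i] = 1 means ball i is in A."""
    return sum(x)
```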

The Metropolis Algorithm

Say we want to sample from a different distribution, not necessarily uniform. Can we change the transition rates in such a way that our desired distribution is stationary? Amazingly, yes. Say we have a distribution \pi over X so that

\pi(x) = \frac{w(x)}{\sum_{y \in X} w(y)}

i.e. we know the proportions but not the normalizing constant (and X is much too big to compute it).

The Metropolis Algorithm

Metropolis-Hastings Algorithm:

1. Create a graph structure on X so the graph is connected and has maximum degree D.
2. Define the following transition probabilities:
   - p(x, y) = \frac{1}{2D} \min\{w(y)/w(x), 1\} if x and y are neighbors;
   - p(x, y) = 0 if x and y are not neighbors;
   - p(x, x) = 1 - \sum_{y \neq x} p(x, y).
3. Check that this Markov chain is irreducible, aperiodic, reversible, and has stationary distribution \pi.
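A minimal Python sketch of one step of this chain (my illustration; it assumes neighbors(x) returns a list of the neighbors of x in the chosen graph structure and that w(x) > 0):

```python
import random

def metropolis_step(x, neighbors, w, D):
    """One step of the Metropolis chain above. Each neighbor y is proposed
    with probability 1/(2D) and accepted with probability min(w(y)/w(x), 1);
    otherwise the chain holds at x."""
    nbrs = neighbors(x)
    r = random.random()
    if r >= len(nbrs) / (2 * D):
        return x                         # no proposal: the lazy/holding mass
    y = nbrs[int(r * 2 * D)]             # uniform proposal among the neighbors
    if random.random() < min(w(y) / w(x), 1.0):
        return y                         # accept the move
    return x                             # reject and stay put
```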

Example

Say we want to sample large independent sets from a graph G, i.e.

P(I) = \frac{\lambda^{|I|}}{Z} where Z = \sum_J \lambda^{|J|}

and the sum is over all independent sets J. Note that for \lambda > 1 this distribution gives more weight to the largest independent sets. Use the Metropolis Algorithm to find a Markov Chain with this distribution as the stationary distribution.
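One natural instantiation (a sketch of mine, not the lecture's official answer): make two independent sets adjacent when they differ in exactly one vertex, propose toggling a uniform vertex, and accept with the Metropolis probability. For simplicity this drops the factor-1/2 laziness from the slide, which does not change the stationary distribution.

```python
import random

def independent_set_step(I, adj, n, lam):
    """One Metropolis step for the hard-core distribution P(I) ~ lam^|I|.
    I is a set of vertices; adj maps each vertex to its set of neighbors."""
    v = random.randrange(n)                 # propose toggling a uniform vertex
    if v in I:
        if random.random() < min(1.0 / lam, 1.0):   # weight ratio lam^{-1}
            I = I - {v}
    elif all(u not in I for u in adj[v]):           # adding v stays independent
        if random.random() < min(lam, 1.0):         # weight ratio lam
            I = I | {v}
    return I                                # blocked proposals leave I unchanged

# Example: the path 0 - 1 - 2 with lambda = 2.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
I = set()
for _ in range(10_000):
    I = independent_set_step(I, adj, 3, 2.0)
```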

How to Bound Mixing Times

The first technique we will consider is coupling. Recall: a coupling of two Markov Chains X_n and Y_n is a way to define them on the same probability space, i.e. specify a joint distribution so that the marginal distributions are correct but the dependence between the chains tells us something. We can specify a coupling of P and Q, both distributions on X, as a probability distribution \mu on X \times X so that

P(x) = \sum_y \mu(x, y) and Q(y) = \sum_x \mu(x, y)

How to Bound Mixing Times

The maximal coupling of P and Q is the coupling that maximizes \sum_x \mu(x, x), the total probability along the diagonal.

Lemma: Let \mu be a maximal coupling of P and Q. Then

\|P - Q\|_{TV} = 1 - \sum_x \mu(x, x)

Proof:
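The maximal coupling puts mass min(P(x), Q(x)) on each diagonal entry, which gives a third formula for total variation distance. A one-line check (my illustration) that agrees with the tv_distance sketch above:

```python
def tv_via_maximal_coupling(p, q):
    """||P - Q||_TV = 1 - sum_x min(p(x), q(x)): the mass that even a
    maximal coupling fails to put on the diagonal."""
    states = set(p) | set(q)
    return 1.0 - sum(min(p.get(x, 0.0), q.get(x, 0.0)) for x in states)
```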

How to Bound Mixing Times

We will look at a particular type of coupling of two Markov chains. Let X_n and Y_n be two copies of the same Markov chain but with different starting positions: X_0 = x for some arbitrary state x, and Y_0 will have distribution \pi, the stationary distribution of the chain.

Proposition:

\|X_n - Y_n\|_{TV} \le \Pr[X_n \neq Y_n]

Proof:

How to Bound Mixing Times

In particular, Y_n has distribution \pi for all n, so we have

\|X_n - \pi\|_{TV} \le \Pr[X_n \neq Y_n]

One useful feature we can require of our coupling is that once X_n and Y_n collide, they move together. In this case we have:

Proposition: Let \tau be the first time X_n = Y_n. Then

\|X_n - \pi\|_{TV} \le \Pr[\tau > n]

Proof:

Examples

The Ehrenfest Urn: how can we couple X_n and Y_n? The idea is to look at the refined chain, the random walk on the hypercube. Here there is a good candidate for a coupling: to update, pick one of the n coordinates at random for both chains together, and update both chains to the same value. Check that this is a valid coupling and that once the chains collide they move together. What is the collision time?
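A simulation sketch of this coupling (my illustration; it starts Y from an exact uniform sample, the stationary distribution of the lazy walk). Once a coordinate has been updated, the two chains agree on it forever, so the collision time is at most the coupon-collector time to touch every coordinate, of order n log n, matching the O(n log n) mixing bound.

```python
import random

def coupled_collision_time(n):
    """Run the coupled lazy walks on {0,1}^n until they meet: both chains
    update the same uniform coordinate to the same fresh bit at each step."""
    x = [0] * n                                    # X starts at a fixed corner
    y = [random.randint(0, 1) for _ in range(n)]   # Y starts at stationarity
    t = 0
    while x != y:
        i = random.randrange(n)
        b = random.randint(0, 1)
        x[i] = b
        y[i] = b
        t += 1
    return t
```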

What about lower bounds?