University of Chicago Autumn 2003 CS37101-1 Markov Chain Monte Carlo Methods

Lecture 4: October 21, 2003
Bounding the mixing time via coupling
Eric Vigoda

4.1 Introduction

In this lecture we'll use the coupling method to get a priori bounds on the mixing time of various Markov chains. Our first example generates a random spanning tree. We then introduce the path coupling framework, which simplifies our work. Finally, we apply path coupling to analyze a Markov chain for generating a random k-coloring of a graph.

4.2 Random Spanning Trees

4.2.1 Introduction

Given an undirected graph G = (V, E), we are interested in generating a spanning tree of G uniformly at random. A spanning tree is a subset S ⊆ E such that |S| = |V| − 1 and S contains no cycles; thus S contains a unique path between every pair of vertices. Let Ω(G) denote the set of spanning trees of G.

It is convenient to transform G into a directed graph G⃗ and look at arborescences of G⃗. From G, we form G⃗ by replacing each edge {u, v} by the pair of anti-parallel edges uv and vu. An arborescence rooted at a vertex r of a directed graph is a subset S of edges such that |S| = |V| − 1 and every vertex, except r, has exactly one edge directed away from it; thus S forms a tree rooted at r with all edges pointing towards r. Let Ω_r(G⃗) denote the set of arborescences rooted at r. It suffices to consider arborescences rooted at any fixed vertex:

Lemma 4.1 For G = (V, E) and r ∈ V, there is a bijection between Ω(G) and Ω_r(G⃗).

Proof: For a spanning tree S, form its associated arborescence by simply pointing all edges towards r (observe there is a unique way of doing this). For the reverse direction, given

an arborescence, simply remove the orientations; the resulting undirected edges form a spanning tree.

4.2.2 Markov chains

For simplicity, let us assume G is a d-regular graph. Here is a simple Markov chain MC on the set of all arborescences Ω(G⃗) = ∪_{v ∈ V} Ω_v(G⃗). From X_t ∈ Ω(G⃗) with root u:

- Choose a random neighbor v of u. Let vw denote the unique edge oriented away from v in X_t.
- Set X′ = X_t ∖ {vw} ∪ {uv}. Thus, X′ is an arborescence rooted at v.
- Let X_{t+1} = X′ with probability 1/2, and X_{t+1} = X_t otherwise.

Since the graph is regular, the chain is symmetric. The self-loop probability of 1/2 (i.e., staying at the same state) ensures the chain is aperiodic. Therefore, the chain is ergodic and its unique stationary distribution is uniform over Ω(G⃗).

The above chain was originally analyzed by Broder and Aldous. Propp and Wilson [5] introduced a slight variant of the above chain which we'll use. Given a root r ∈ V, the following chain MC_r has state space Ω_r(G⃗). From Y_t ∈ Ω_r(G⃗):

- Run the chain MC with initial state X_0 = Y_t.
- Stop at the first time t′ > 0 where X_{t′} ∈ Ω_r(G⃗). In words, stop as soon as the root returns to r.
- Set Y_{t+1} = X_{t′}.

Exercise 4.2 For a regular graph G, prove that the stationary distribution of MC_r is uniform over Ω_r(G⃗).
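To make the transition rule concrete, here is a minimal Python sketch of one step of MC, assuming the arborescence is stored as a parent map (each vertex other than the root records the first vertex on its path towards the root). The function and variable names are illustrative, not from the lecture.

```python
import random

def mc_step(parent, root, adj, rng=random):
    """One (lazy) step of the chain MC on arborescences (a sketch).

    parent[x] is the vertex that x's unique outgoing edge points to;
    the root has no outgoing edge, so root is not a key of parent.
    adj[u] lists the neighbors of u in the undirected graph G."""
    if rng.random() < 0.5:          # self-loop with probability 1/2
        return parent, root
    v = rng.choice(adj[root])       # random neighbor v of the current root u
    new_parent = dict(parent)
    del new_parent[v]               # remove vw, the edge oriented away from v
    new_parent[root] = v            # add the edge uv; v becomes the new root
    return new_parent, v
```

The move mirrors the text: remove the unique edge oriented away from v, add the edge from the old root to v, and accept the move with probability 1/2.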

4.2.3 Analysis

To analyze the mixing time we'll bound the coupling time of the chain. For every pair (Y_t, Y′_t) ∈ Ω_r(G⃗)², we'll define a coupling (Y_t, Y′_t) → (Y_{t+1}, Y′_{t+1}). The following lemma shows how the time until the chains couple gives a bound on the mixing time.

Lemma 4.3 Given a coupling (Y_t, Y′_t) → (Y_{t+1}, Y′_{t+1}) for all (Y_t, Y′_t) ∈ Ω_r(G⃗)², such that for all Y_0, Y′_0 there exists a time T where Pr(Y_T ≠ Y′_T | Y_0, Y′_0) ≤ 1/4, then T_mix(1/4) ≤ T.

This lemma, with slightly different language, was proven in Lecture 2.

We will bound the first time T at which the chains couple via the cover time. The cover time of a graph G is the number of steps until a random walk on G visits every vertex at least once. Denote this time by T_cover.

Lemma 4.4 (Propp and Wilson [5]) E(T) ≤ 4 E(T_cover).

Proof: Denote the Hamming distance (i.e., the number of differing edges) between the pair of chains by H(Y_t, Y′_t) = |Y_t ∖ Y′_t|. Once H(Y_t, Y′_t) = 0 the chains have coupled. It will be straightforward to bound the time until this distance reaches zero.

The coupling is the obvious choice. A step Y_t → Y_{t+1} of MC_r involves a sequence of transitions X_0 → X_1 → ... → X_{t′} of the chain MC. Call this sequence an excursion. Both chains Y_t, Y′_t use the identical excursion.

Consider the effect of an excursion on the resulting arborescence Y_{t+1}. Say vertex v ≠ r is visited during the excursion if X_s ∈ Ω_v(G⃗) for some time s < t′ (i.e., the root moved to v for some intermediate arborescence). For the last such time s, let w denote the next root. Then vw ∈ X_{s+1}, and vw ∈ X_{t′} (since s was the last visit to v). This implies vw ∈ Y_{t+1}. Similarly, vw ∈ Y′_{t+1}. Thus, if a vertex v ≠ r is visited during an excursion, then the two chains agree on the edge coming out of v, i.e., vw ∈ Y_{t+1} and vw ∈ Y′_{t+1}. Therefore, once every vertex is visited on some excursion, the two chains are identical on every edge. The expected time until every vertex is visited is the cover time. Running at most

a further cover time brings us back to the root r. There is an extra factor of two coming from the self-loop probability of 1/2 in the definition of MC. This proves the lemma.

We can then bound the mixing time by Markov's inequality:

T_mix(1/4) ≤ 16 E(T_cover).

4.2.4 Cover time

We conclude with a few final remarks about the cover time. For any graph, Feige [2, 3] proved

n log n ≤ E(T_cover) ≤ O(|V| · |E|).

For some specific classes of graphs, such as random graphs, more precise results are known.

4.3 Path Coupling

In the above example the distance is non-increasing. Typically things are not that easy, and even defining a good coupling can be complicated. The following lemma of Bubley and Dyer [1], known as the path coupling lemma, significantly simplifies the definition and analysis of a coupling. Before stating and proving the lemma, we need some basics about couplings.

4.3.1 Composing Couplings

Consider a coupling α of µ (on Ω_1) with ν (on Ω_2), and a coupling β of ν with ω (on Ω_3). Define their composition as, for all x ∈ Ω_1, z ∈ Ω_3,

γ(x, z) = (α ∘ β)(x, z) := Σ_{y ∈ Ω_2} α(x, y) β(y, z) / ν(y).

Alternatively, we can construct γ algorithmically in the following manner. First, sample (X, Y) from α. Given (X, Y) = (x, y), we then sample (Y, Z) from β conditional on Y = y. We output (X, Z).

Exercise 4.5 Prove γ is a coupling of µ with ω.
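The composition formula can be checked mechanically. Below is a small Python sketch (distributions and couplings represented as plain dictionaries; all names are illustrative) that computes γ = α ∘ β:

```python
from collections import defaultdict

def compose(alpha, beta, nu):
    """Compose a coupling alpha of (mu, nu) with a coupling beta of (nu, omega):
    gamma(x, z) = sum over y of alpha(x, y) * beta(y, z) / nu(y)."""
    gamma = defaultdict(float)
    for (x, y), a in alpha.items():
        for (y2, z), b in beta.items():
            if y2 == y and nu[y] > 0:   # match on the shared coordinate y
                gamma[(x, z)] += a * b / nu[y]
    return dict(gamma)
```

For example, composing the identity coupling of a uniform bit with an independent coupling returns the independent coupling, and the x-marginal of γ is again µ, as Exercise 4.5 requires.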

4.3.2 Statement of Lemma

Consider a Markov chain on Ω. The path coupling lemma says it suffices to define and analyze a coupling for pairs in a subset of Ω × Ω, assuming the subset connects the state space.

Lemma 4.6 (Bubley and Dyer [1]) Let S ⊆ Ω × Ω be such that the graph (Ω, S) is connected. For (X, Y) ∈ Ω × Ω, define dist(X, Y) as the shortest-path length between X and Y in the graph (Ω, S). (Thus, if (X, Y) ∈ S, then dist(X, Y) = 1.) If there exists β < 1 and, for all (X_t, Y_t) ∈ S, a coupling (X_t, Y_t) → (X_{t+1}, Y_{t+1}) such that

E(dist(X_{t+1}, Y_{t+1}) | X_t, Y_t) ≤ β dist(X_t, Y_t),   (4.1)

then

T_mix(ɛ) ≤ log(d_max/ɛ) / (1 − β),   (4.2)

where d_max is the diameter of the graph (Ω, S). If (4.1) holds with β = 1 and there exists α > 0 such that, for this coupling and all (X_t, Y_t) ∈ Ω²,

Pr(dist(X_{t+1}, Y_{t+1}) ≠ dist(X_t, Y_t) | X_t, Y_t) ≥ α,

then

T_mix(ɛ) ≤ (e d_max² / α) log(1/ɛ).

Proof: We'll only prove the β < 1 case.

We first construct a coupling for all (X_t, Y_t) ∈ Ω × Ω which satisfies (4.1). We do this by simply composing couplings along a shortest path between X_t and Y_t. Let Z_t^0 = X_t, Z_t^1, ..., Z_t^j = Y_t, where j = dist(X_t, Y_t), denote an arbitrary such shortest path. From the hypothesis of the lemma we know

E(dist(Z_{t+1}^i, Z_{t+1}^{i+1}) | Z_t^i, Z_t^{i+1}) ≤ β dist(Z_t^i, Z_t^{i+1}).

We then have, using the triangle inequality for dist,

E(dist(X_{t+1}, Y_{t+1}) | X_t, Y_t) ≤ Σ_{0 ≤ i < j} E(dist(Z_{t+1}^i, Z_{t+1}^{i+1}) | Z_t^i, Z_t^{i+1})
  ≤ β Σ_{0 ≤ i < j} dist(Z_t^i, Z_t^{i+1})
  = β dist(X_t, Y_t).

Inductively we have

Pr(X_t ≠ Y_t | X_0, Y_0) ≤ E(dist(X_t, Y_t) | X_0, Y_0) ≤ β^t d_max ≤ ɛ

for t ≥ log(d_max/ɛ)/(1 − β), where the last inequality uses that log(1/β) > 1 − β.

4.4 Random Graph Colorings

For G = (V, E), let Ω = {1, 2, ..., k}^V. Let N(v) denote the neighbors of vertex v. The following Markov chain is known as the Glauber dynamics. From X_t ∈ Ω:

- Choose z uniformly at random from V, and c uniformly at random from {1, ..., k}.
- For all w ≠ z, set X_{t+1}(w) = X_t(w).
- If no neighbor of z has color c (i.e., c ∉ X_t(N(z))), then set X_{t+1}(z) = c; otherwise set X_{t+1}(z) = X_t(z).

A proper k-coloring is an assignment σ ∈ Ω such that all adjacent vertices receive different colors.

Exercise 4.7 Let Δ denote the maximum degree of G. Prove that when k > Δ + 1, the stationary distribution of the above Glauber dynamics is uniform over the proper k-colorings.

We'll prove fast convergence via path coupling. We begin with the following theorem.

Theorem 4.8 Let Δ denote the maximum degree of G. If k > 3Δ, then τ(ɛ) ≤ nk ln(n/ɛ).

Proof: Let S denote the pairs of colorings which differ at exactly one vertex. Consider a pair of such colorings (X_t, Y_t) ∈ S where X_t(v) ≠ Y_t(v), say X_t(v) = c_X and Y_t(v) = c_Y. The coupling is simply the identity coupling: both chains attempt to update the same vertex z to the same color c. Observe that only updates with z ∈ N(v) and c ∈ {c_X, c_Y} might

succeed or fail in exactly one chain. All other updates succeed or fail in both chains. Thus, these are the only updates which might increase the distance. The only updates which might decrease the distance are successful recolorings of v. Since there are at most Δ colors in the neighborhood of v, there are at least k − Δ available colors for v. Combining these contributions, we have, for (X_t, Y_t) ∈ S,

E(dist(X_{t+1}, Y_{t+1}) | X_t, Y_t) ≤ 1 + (1/kn)(2Δ − (k − Δ)) ≤ 1 − 1/kn

for k > 3Δ. Plugging this bound into the path coupling lemma proves the theorem.

Modifying the above coupling we can achieve the following improvement.

Theorem 4.9 (Jerrum [4]) If k > 2Δ, then τ(ɛ) ≤ nk ln(n/ɛ).

Proof: The set S remains the same; we simply modify the coupling slightly. If X_t attempts to recolor a neighbor w of v to color c_X, then Y_t attempts to recolor w to c_Y. Similarly, if X_t attempts to recolor w ∈ N(v) to c_Y, then Y_t attempts to recolor w to c_X. In all other cases, X_t and Y_t attempt to modify the same vertex to the same color.

Observe that w ∈ N(v) cannot receive color c_X in X_t (since this is v's color), and w cannot receive c_Y in Y_t. Thus, such a coupled update has no effect on the pair of colorings. Therefore, there are at most Δ coupled recolorings which might increase the distance: attempting to recolor w ∈ N(v) to c_Y in X_t and to c_X in Y_t. We now have

E(dist(X_{t+1}, Y_{t+1}) | X_t, Y_t) ≤ 1 + (1/kn)(Δ − (k − Δ)) ≤ 1 − 1/kn

for k > 2Δ.

Exercise 4.10 Why in the above proof did we need the Glauber dynamics to be defined on the set {1, ..., k}^V, instead of just on the set of proper k-colorings? We need it in order to claim that the distance increases by at most one (instead of two) after we attempt to recolor w ∈ N(v) to c_Y in X_t and to c_X in Y_t.
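The identity coupling used in the proof of Theorem 4.8 is easy to simulate. The Python sketch below (illustrative names, not code from the lecture) runs two Glauber chains on the same (vertex, color) proposals; starting from colorings that differ at a single vertex, the pair coalesces quickly when k > 3Δ:

```python
import random

def coupled_glauber_step(X, Y, adj, k, rng):
    """One step of the identity coupling: both chains receive the same
    (vertex, color) proposal and each accepts it per the Glauber rule."""
    z = rng.choice(list(X))      # uniform random vertex
    c = rng.randint(1, k)        # uniform random color from {1, ..., k}
    X2, Y2 = dict(X), dict(Y)
    if all(X[w] != c for w in adj[z]):   # c unused in N(z) in chain X
        X2[z] = c
    if all(Y[w] != c for w in adj[z]):   # c unused in N(z) in chain Y
        Y2[z] = c
    return X2, Y2
```

For instance, on a 4-cycle (Δ = 2) with k = 7 > 3Δ colors, two chains started one disagreement apart coalesce after a modest number of steps, in line with the τ(ɛ) ≤ nk ln(n/ɛ) bound.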

References

[1] R. Bubley and M. Dyer. Path coupling: a technique for proving rapid mixing in Markov chains. In 38th Annual Symposium on Foundations of Computer Science, pages 223–231, Miami Beach, FL, October 1997. IEEE.

[2] U. Feige. A tight lower bound on the cover time for random walks on graphs. Random Structures & Algorithms, 6(4):433–438, 1995.

[3] U. Feige. A tight upper bound on the cover time for random walks on graphs. Random Structures & Algorithms, 6(1):51–54, 1995.

[4] M. Jerrum. A very simple algorithm for estimating the number of k-colorings of a low-degree graph. Random Structures & Algorithms, 7(2):157–165, 1995.

[5] J. G. Propp and D. B. Wilson. How to get a perfectly random sample from a generic Markov chain and generate a random spanning tree of a directed graph. Journal of Algorithms, 27(2):170–217, 1998.