The coupling method - Simons Counting Complexity Bootcamp, 2016


The coupling method - Simons Counting Complexity Bootcamp, 2016 Nayantara Bhatnagar (University of Delaware) Ivona Bezáková (Rochester Institute of Technology) January 26, 2016

Techniques for bounding the mixing time: Probabilistic techniques - coupling, martingales, strong stationary times, coupling from the past. Eigenvalues and eigenfunctions. Functional, isoperimetric and geometric inequalities - Cheeger's inequality, conductance, Poincaré and Nash inequalities, discrete curvature. Comprehensive accounts: Levin-Peres-Wilmer 2009, Aldous-Fill 1999/2014.

In this part of the tutorial
- Coupling distributions
- Coupling Markov chains
- Path Coupling
- Exact sampling - coupling from the past

Definition - Total variation distance

Let $\mu$ and $\nu$ be probability measures on the same measurable space $(\Omega, \mathcal{F})$. The total variation distance between $\mu$ and $\nu$ is given by
$$\|\mu - \nu\|_{tv} = \sup_{A \in \mathcal{F}} |\mu(A) - \nu(A)|.$$
Here $\Omega$ is finite and $\mathcal{F} = 2^{\Omega}$, so
$$\|\mu - \nu\|_{tv} = \max_{A \subseteq \Omega} |\mu(A) - \nu(A)| = \frac{1}{2} \sum_{x \in \Omega} |\mu(x) - \nu(x)|.$$
[Figure: two overlapping densities, with overlap area $\gamma = 1 - \|\mu - \nu\|_{tv}$ and excess areas $\alpha = \beta = \|\mu - \nu\|_{tv}$.]
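As a quick illustration, here is a minimal Python sketch of the finite-$\Omega$ formula; the two example distributions are made up for illustration, not from the talk.

```python
# Total variation distance on a finite set, via
# ||mu - nu||_tv = (1/2) * sum_x |mu(x) - nu(x)|.

def tv_distance(mu, nu):
    """TV distance between two dicts mapping state -> probability."""
    states = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in states)

mu = {"a": 0.5, "b": 0.3, "c": 0.2}
nu = {"a": 0.25, "b": 0.25, "c": 0.5}
print(tv_distance(mu, nu))  # 0.3
```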

Definition - Coupling of distributions (Doeblin 1938)

Let $\mu$ and $\nu$ be probability measures on the same measurable space $(\Omega, \mathcal{F})$. A coupling of $\mu$ and $\nu$ is a pair of random variables $(X, Y)$ on the probability space $(\Omega \times \Omega, \mathcal{F} \otimes \mathcal{F}, P)$ such that the marginals coincide:
$$P(X \in A) = \mu(A), \quad P(Y \in A) = \nu(A), \quad \forall A \in \mathcal{F}.$$

Example - Biased coins

$\mu$: Bernoulli($p$), probability $p$ of heads ($= 1$). $\nu$: Bernoulli($q$), probability $q > p$ of heads. $H_\mu(n)$ = no. of heads in $n$ tosses from $\mu$; $H_\nu(n)$ = no. of heads in $n$ tosses from $\nu$.

Proposition 1. $P(H_\mu(n) > k) \le P(H_\nu(n) > k)$.

Proof. For $1 \le i \le n$, let $(X_i, Y_i)$ be independent couplings of $\mu$ and $\nu$, built as follows. Let $X_i \sim \mu$. If $X_i = 1$, set $Y_i = 1$. If $X_i = 0$, set $Y_i = 1$ w.p. $\frac{q-p}{1-p}$ and $0$ otherwise. Then $Y_i \sim \nu$ and $X_i \le Y_i$ always, so
$$P(X_1 + \dots + X_n > k) \le P(Y_1 + \dots + Y_n > k).$$
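A minimal simulation of this monotone coupling, following the construction in the proof (function and variable names are ours):

```python
import random

def coupled_coins(p, q, n, seed=None):
    """Monotone coupling of n Bernoulli(p) and Bernoulli(q) tosses, q > p.
    X_i ~ Bernoulli(p); if X_i = 1 set Y_i = 1, else Y_i = 1 w.p. (q-p)/(1-p).
    By construction X_i <= Y_i always, yet marginally Y_i ~ Bernoulli(q)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = 1 if rng.random() < p else 0
        y = 1 if x == 1 or rng.random() < (q - p) / (1 - p) else 0
        xs.append(x)
        ys.append(y)
    return xs, ys

xs, ys = coupled_coins(0.3, 0.6, 10_000, seed=1)
assert all(x <= y for x, y in zip(xs, ys))
print(sum(xs) / len(xs), sum(ys) / len(ys))  # approx 0.3 and 0.6
```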

Coupling and Total Variation Distance

Lemma 2. Let $\mu$ and $\nu$ be distributions on $\Omega$. Then
$$\|\mu - \nu\|_{tv} \le \inf_{\text{couplings } (X,Y)} P(X \ne Y).$$

Proof. For any coupling $(X, Y)$ and $A \subseteq \Omega$,
$$\mu(A) - \nu(A) = P(X \in A) - P(Y \in A) \le P(X \in A, Y \notin A) \le P(X \ne Y).$$
Similarly, $\nu(A) - \mu(A) \le P(X \ne Y)$. Therefore, $\|\mu - \nu\|_{tv} \le P(X \ne Y)$.

Maximal coupling

[Figure: the overlap construction achieving the infimum; overlap area $\gamma = 1 - \|\mu - \nu\|_{tv}$, excess areas $\alpha = \beta = \|\mu - \nu\|_{tv}$.]

Lemma 3. Let $\mu$ and $\nu$ be distributions on $\Omega$. Then
$$\|\mu - \nu\|_{tv} = \inf_{\text{couplings } (X,Y)} P(X \ne Y).$$
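The standard construction behind Lemma 3 samples from the normalized overlap of $\mu$ and $\nu$ with probability $\gamma = 1 - \|\mu - \nu\|_{tv}$ (forcing $X = Y$), and from the normalized excesses otherwise. A sketch in Python; the helper names are ours, not from the talk:

```python
import random

def sample_maximal_coupling(mu, nu, rng=random):
    """One draw (X, Y) from a maximal coupling of mu and nu (dicts state -> prob),
    achieving P(X != Y) = ||mu - nu||_tv."""
    states = sorted(set(mu) | set(nu))
    overlap = {x: min(mu.get(x, 0.0), nu.get(x, 0.0)) for x in states}
    gamma = sum(overlap.values())          # = 1 - ||mu - nu||_tv
    if rng.random() < gamma:
        x = _draw({s: w / gamma for s, w in overlap.items()}, rng)
        return x, x                        # the two draws agree on the overlap
    d = 1.0 - gamma                        # = ||mu - nu||_tv
    x = _draw({s: max(mu.get(s, 0.0) - nu.get(s, 0.0), 0.0) / d for s in states}, rng)
    y = _draw({s: max(nu.get(s, 0.0) - mu.get(s, 0.0), 0.0) / d for s in states}, rng)
    return x, y

def _draw(dist, rng):
    """Sample from a dict state -> probability."""
    r, acc = rng.random(), 0.0
    for s, w in dist.items():
        acc += w
        if r < acc:
            return s
    return s  # guard against floating-point round-off

rng = random.Random(0)
mu = {"a": 0.5, "b": 0.5}
nu = {"a": 0.2, "b": 0.8}
pairs = [sample_maximal_coupling(mu, nu, rng) for _ in range(10_000)]
print(sum(x != y for x, y in pairs) / len(pairs))  # approx ||mu - nu||_tv = 0.3
```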

Coupling Markov chains (Pitman 1974, Griffeath 1974/5, Aldous 1983)

A coupling of Markov chains with transition matrix $P$ is a process $(X_t, Y_t)_{t=0}^{\infty}$ such that both $(X_t)$ and $(Y_t)$ are Markov chains with transition matrix $P$.

Applied to bounding the rate of convergence to stationarity of MCs for sampling tilings of lattice regions, particle processes, card shuffling, random walks on lattices and other natural graphs, Ising and Potts models, colorings, independent sets... (Aldous-Diaconis 1986, Bayer-Diaconis 1992, Bubley-Dyer 1997, Luby-Vigoda 1999, Luby-Randall-Sinclair 2001, Vigoda 2001, Wilson 2004, ...)

Simple random walk on $\{0, 1, \dots, n\}$

SRW: Move up or down with probability $\frac{1}{2}$ if possible. Do nothing if the attempted move would leave the interval.

Claim 4. If $0 \le x \le y \le n$, then $P^t(y, 0) \le P^t(x, 0)$.

A coupling $(X_t, Y_t)$ of $P^t(x, \cdot)$ and $P^t(y, \cdot)$: $X_0 = x$, $Y_0 = y$. Let $b_1, b_2, \dots$ be i.i.d. $\{\pm 1\}$-valued Bernoulli(1/2). At the $i$th step, attempt to add $b_i$ to both $X_{i-1}$ and $Y_{i-1}$.

Simple random walk on $\{0, 1, \dots, n\}$

For all $t$, $X_t \le Y_t$. Therefore,
$$P^t(y, 0) = P(Y_t = 0) \le P(X_t = 0) = P^t(x, 0).$$

Note: Any coupling can be modified so that the chains stay together after the first time they meet. We'll assume this to be the case.
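A short sketch of this coupling in Python, assuming the boundary behavior described above (stay put when the step would exit the interval):

```python
import random

def coupled_srw(x0, y0, n, t, seed=None):
    """Monotone coupling of two SRWs on {0, 1, ..., n}: both chains attempt
    the same +/-1 increment each step, staying put at the boundary."""
    rng = random.Random(seed)
    x, y = x0, y0
    for _ in range(t):
        b = rng.choice((-1, 1))
        x = min(max(x + b, 0), n)
        y = min(max(y + b, 0), n)
    return x, y

# With x0 <= y0 the ordering X_t <= Y_t is preserved at every step.
x, y = coupled_srw(2, 7, 10, 1000, seed=0)
assert x <= y
```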

Distance to stationarity and the mixing time

Define
$$d(t) := \max_{x \in \Omega} \|P^t(x, \cdot) - \pi\|_{tv}, \qquad \bar{d}(t) := \max_{x, y \in \Omega} \|P^t(x, \cdot) - P^t(y, \cdot)\|_{tv}.$$
By the triangle inequality, $d(t) \le \bar{d}(t) \le 2 d(t)$. The mixing time is
$$t_{mix}(\varepsilon) := \min\{t : d(t) \le \varepsilon\}.$$
It's standard to work with $t_{mix} := t_{mix}(1/4)$ (any constant $\varepsilon < 1/2$ works) because $t_{mix}(\varepsilon) \le \lceil \log_2 \varepsilon^{-1} \rceil \, t_{mix}$.

Bound on the mixing time

Theorem 5. Let $(X_t, Y_t)$ be a coupling with $X_0 = x$ and $Y_0 = y$. Let $\tau$ be the first time they meet (and thereafter coincide). Then
$$\|P^t(x, \cdot) - P^t(y, \cdot)\|_{tv} \le P_{x,y}(\tau > t).$$

Proof. By Lemma 2, $\|P^t(x, \cdot) - P^t(y, \cdot)\|_{tv} \le P(X_t \ne Y_t) = P_{x,y}(\tau > t)$.

Corollary 6. $t_{mix} \le 4 \max_{x,y} E_{x,y}(\tau)$.

Proof. By Markov's inequality,
$$d(t) \le \bar{d}(t) \le \max_{x,y} P_{x,y}(\tau > t) \le \frac{\max_{x,y} E_{x,y}(\tau)}{t}.$$

Example - Card shuffling by random transpositions

Shuffling MC on $\Omega = S_n$: Choose a card $X_t$ and an independent position $Y_t$ uniformly. Exchange card $X_t$ with $\sigma_t(Y_t)$ (the card at position $Y_t$). The stationary distribution is uniform over the permutations $S_n$.

Example - Card shuffling by random transpositions

Coupling of $\sigma_t$, $\sigma'_t$: Choose a card $X_t$ and an independent position $Y_t$ uniformly. Use the same $X_t$ and $Y_t$ to update both $\sigma_t$ and $\sigma'_t$. Let $M_t$ = number of cards at the same position in $\sigma$ and $\sigma'$.

Case 1: $X_t$ is in the same position in both decks. Then $M_{t+1} = M_t$.

Case 2: $X_t$ is in different positions and $\sigma(Y_t) = \sigma'(Y_t)$. Then $M_{t+1} = M_t$.

Case 3: $X_t$ is in different positions and $\sigma(Y_t) \ne \sigma'(Y_t)$. Then $M_{t+1} > M_t$ (the number of matches increases by 1, 2 or 3).

Example - Card shuffling by random transpositions

Proposition 7 (Broder). Let $\tau$ be the first time $M_t = n$. For any $x, y$,
$$E_{x,y}(\tau) < \frac{\pi^2}{6} n^2, \qquad \text{so } t_{mix} = O(n^2).$$

Proof. Let $\tau_i$ = number of steps to increase $M_t$ from $i - 1$ to $i$, so $\tau = \tau_1 + \tau_2 + \dots + \tau_n$. Then
$$P(M_{t+1} > M_t \mid M_t = i) = \frac{(n-i)^2}{n^2}, \qquad E(\tau_{i+1} \mid M_t = i) = \frac{n^2}{(n-i)^2}.$$
Therefore, for any $x, y$,
$$E_{x,y}(\tau) \le n^2 \sum_{i=0}^{n-1} \frac{1}{(n-i)^2} < \frac{\pi^2}{6} n^2.$$
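A simulation sketch of this coupling, tracking the first time the two decks agree; the data structures (position lists plus card-to-position maps) are our choice, not from the talk:

```python
import random

def coalescence_time(n, seed=None):
    """Coupled random-transposition shuffles: both decks use the same random
    card X_t and position Y_t each step. Returns the first time the decks agree."""
    rng = random.Random(seed)
    sigma = list(range(n))                        # deck 1: position -> card
    tau = list(range(n))                          # deck 2: a different start
    rng.shuffle(tau)
    pos_s = {c: i for i, c in enumerate(sigma)}   # card -> position, deck 1
    pos_t = {c: i for i, c in enumerate(tau)}     # card -> position, deck 2
    t = 0
    while sigma != tau:
        card = rng.randrange(n)                   # X_t: a uniform card
        pos = rng.randrange(n)                    # Y_t: an independent position
        for deck, where in ((sigma, pos_s), (tau, pos_t)):
            i = where[card]                       # current position of the card
            deck[i], deck[pos] = deck[pos], deck[i]
            where[deck[i]], where[deck[pos]] = i, pos
        t += 1
    return t

print(coalescence_time(20, seed=42))  # typically on the order of n^2
```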

In this part of the tutorial
- Coupling distributions
- Coupling Markov chains
  - Coloring MC
- Path Coupling
- Exact sampling - coupling from the past

A few remarks on yesterday's talk

Shuffling by random transpositions has $t_{mix} = O(n \log n)$.

Strong stationary times - random times at which the MC is guaranteed to be at stationarity.

Maximal/optimal coupling: [Figure: the overlap construction; $\gamma = 1 - \|\mu - \nu\|_{tv}$, $\alpha = \beta = \|\mu - \nu\|_{tv}$.]

Lemma 3. Let $\mu$ and $\nu$ be distributions on $\Omega$. Then
$$\|\mu - \nu\|_{tv} = \inf_{\text{couplings } (X,Y)} P(X \ne Y).$$

Sampling Colorings

Graph $G = (V, E)$ with max. degree $\Delta$. $\Omega$ = set of proper $q$-colorings of $G$. Metropolis MC: Select $v \in V$ and $k \in [q]$ uniformly. If $k$ is allowed at $v$ (no neighbor of $v$ has color $k$), recolor $v$ with $k$. Stationary distribution: uniform over the colorings $\Omega$ (for $q > \Delta + 1$).

A coupling for colorings

Recall,

Theorem 8. Let $(X_t, Y_t)$ be a coupling with $X_0 = x$ and $Y_0 = y$. Let $\tau$ be the first time they meet (and thereafter coincide). Then $d(t) \le \max_{x,y} P_{x,y}(\tau > t)$.

Theorem 9. Let $G$ have max. degree $\Delta$ and let $q > 4\Delta$. Then
$$t_{mix}(\varepsilon) \le \frac{1}{1 - 4\Delta/q} \, n (\log n + \log(\varepsilon^{-1})).$$

Coupling: Generate a vertex-color pair $(v, k)$ uniformly. Update both colorings $X_t$ and $Y_t$ by recoloring $v$ with $k$ if it is allowed.

Case analysis

For colorings $X_t, Y_t$, let $D_t := |\{v : X_t(v) \ne Y_t(v)\}|$.
- $D_{t+1} = D_t - 1$ w.p. at least $\frac{D_t (q - 2\Delta)}{qn}$
- $D_{t+1} = D_t + 1$ w.p. at most $\frac{2\Delta D_t}{qn}$

Therefore
$$E(D_{t+1} \mid X_t, Y_t) \le D_t - \frac{D_t(q - 2\Delta)}{qn} + \frac{2\Delta D_t}{qn} = D_t \left(1 - \frac{q - 4\Delta}{qn}\right).$$
Iterating,
$$E(D_t \mid X_0, Y_0) \le D_0 \left(1 - \frac{q - 4\Delta}{qn}\right)^t \le n \left(1 - \frac{q - 4\Delta}{qn}\right)^t.$$

Mixing time

We've shown
$$E(D_t \mid X_0 = x, Y_0 = y) \le n \left(1 - \frac{q - 4\Delta}{qn}\right)^t \le n \exp\left(-\frac{(q - 4\Delta)\, t}{qn}\right).$$
By Theorem 8 and Markov's inequality,
$$d(t) \le \max_{x,y} P_{x,y}(\tau > t) = \max_{x,y} P(D_t \ge 1 \mid X_0 = x, Y_0 = y) \le \max_{x,y} E(D_t \mid X_0 = x, Y_0 = y).$$
Therefore,
$$t_{mix}(\varepsilon) \le \frac{1}{1 - 4\Delta/q} \, n (\log n + \log(\varepsilon^{-1})).$$
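A sketch of the coupled Metropolis chains on a toy instance; the cycle graph and the two starting colorings are hypothetical, chosen so that $q > 4\Delta$:

```python
import random

def coupled_coloring_step(G, q, X, Y, rng):
    """One step of the identity coupling for the Metropolis coloring chain:
    both chains draw the same vertex v and color k, and each recolors v with k
    iff no neighbor of v already has color k in that chain."""
    v = rng.randrange(len(X))
    k = rng.randrange(q)
    for col in (X, Y):
        if all(col[u] != k for u in G[v]):
            col[v] = k

def disagreements(X, Y):
    return sum(1 for a, b in zip(X, Y) if a != b)

# Toy example: a cycle on 8 vertices (Delta = 2), q = 9 > 4*Delta = 8 colors.
n, q = 8, 9
G = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
rng = random.Random(0)
X = [(3 * v) % q for v in range(n)]       # two arbitrary proper colorings
Y = [(3 * v + 1) % q for v in range(n)]
t = 0
while disagreements(X, Y) > 0:
    coupled_coloring_step(G, q, X, Y, rng)
    t += 1
print("coalesced after", t, "steps")
```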

Fast mixing for $q > 2\Delta$ (Jerrum 1995, Salas-Sokal 1997)

Theorem 10. Let $G$ have max. degree $\Delta$. If $q > 2\Delta$, the mixing time of the Metropolis chain on colorings is
$$t_{mix}(\varepsilon) \le \left(\frac{q}{q - 2\Delta}\right) n (\log n + \log(\varepsilon^{-1})).$$

Two ideas:
- Use path metrics on $\Omega$ to couple only colorings with a single difference, simplifying the proof.
- Couple colorings with just one difference in a smarter way.

Contraction in $D_t$

Recall, we showed contraction in one step: for some $\alpha > 0$,
$$E(D_{t+1} \mid X_t, Y_t) \le D_t e^{-\alpha}.$$
Then $d(t) \le n e^{-\alpha t}$, so
$$t_{mix}(\varepsilon) \le \frac{\log n + \log \varepsilon^{-1}}{\alpha}.$$

Path Coupling (Bubley-Dyer 1997)

Connected graph $(\Omega, E_0)$ on the state space $\Omega$. Length function $l : E_0 \to [1, \infty)$. A path from $x_0$ to $x_r$ is $\xi = (x_0, x_1, \dots, x_r)$ with $(x_{i-1}, x_i) \in E_0$. The length of the path $\xi$ is $l(\xi) := \sum_{i=1}^{r} l((x_{i-1}, x_i))$. The path metric on $\Omega$ is
$$\rho(x, y) = \min\{l(\xi) : \xi \text{ is a path between } x, y\}.$$

Path Coupling (Bubley-Dyer 1997)

Theorem 11 (Bubley-Dyer 97). If for each $(x, y) \in E_0$ there is a coupling $(X_1, Y_1)$ of $P(x, \cdot)$ and $P(y, \cdot)$ such that for some $\alpha > 0$,
$$E_{x,y}(\rho(X_1, Y_1)) \le e^{-\alpha} \rho(x, y),$$
then
$$t_{mix}(\varepsilon) \le \frac{\log(\mathrm{diam}(\Omega)) + \log(\varepsilon^{-1})}{\alpha},$$
where $\mathrm{diam}(\Omega) = \max_{x,y} \rho(x, y)$.

Metropolis chain on extended space

Graph $G = (V, E)$. $\tilde{\Omega}$ = set of all $q$-colorings of $G$ (including improper ones). Metropolis MC: Select $v \in V$ and $k \in [q]$ uniformly. If $k$ is allowed at $v$, update. Stationary distribution: uniform on $\Omega$, the proper colorings.

Path coupling for colorings

Let $x \sim y$ in $(\tilde{\Omega}, E_0)$ if the colorings differ at exactly 1 vertex. For $(x, y) \in E_0$, let $l(x, y) = 1$.

Coupling

Let $v$ be the disagreeing vertex in colorings $x$ and $y$. Pick a vertex $u$ and a color $k$ uniformly at random.
- If $u \notin N(v)$, attempt to update $u$ with $k$ in both chains.
- If $u \in N(v)$:
  - If $k \notin \{x(v), y(v)\}$, attempt to update $u$ with $k$ in both chains.
  - Otherwise, attempt to update $u$ in $x$ with $k$ and in $y$ with the color in $\{x(v), y(v)\} \setminus \{k\}$.

$$E_{x,y}(\rho(X_1, Y_1)) \le \rho(x, y) - \frac{q - \Delta}{qn} + \frac{\Delta}{qn} = 1 - \frac{q - 2\Delta}{qn}.$$

Sampling colorings

Conjecture: $O(n \log n)$ mixing for $q \ge \Delta + 2$.
(Vigoda 1999) $O(n^2 \log n)$ mixing for $q > \frac{11}{6}\Delta$.
(Hayes-Sinclair 2005) $\Omega(n \log n)$ lower bound.
Restricted cases - triangle-free graphs, large max. degree. Non-Markovian couplings. (Dyer, Flaxman, Frieze, Hayes, Molloy, Vigoda)

Exact Sampling - Coupling from the past (Propp-Wilson 1996)

Two copies of the chain are run until they meet. When the chains meet, are they at stationarity? No: e.g. with $\pi(a) = \frac{1}{3}$, $\pi(b) = \frac{2}{3}$, but the chains never meet at $a$.

Coupling from the past (CFTP): If we run the chain backwards / from the past, starting from a time such that all trajectories meet by time 0, we are guaranteed to be at stationarity.

Random function representation of a MC

Ergodic MC with transition matrix $P$. A distribution $F$ over functions $f : \Omega \to \Omega$ is a random function representation (RFR) iff
$$P_F(f(x) = y) = P(x, y).$$
E.g. for the SRW on $\{0, 1, \dots, n\}$, let $f_+(i) = \min\{i + 1, n\}$ and $f_-(i) = \max\{i - 1, 0\}$, with $F$ uniform on $\{f_+, f_-\}$.

Proposition 12. Every transition matrix has an RFR, not necessarily unique.

$F$ defines a coupling on $\Omega \times \Omega$ via $(x, y) \mapsto (f(x), f(y))$, $f \sim F$.
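The SRW example as code, a direct transcription of $f_+$ and $f_-$:

```python
import random

# Random function representation of the SRW on {0, 1, ..., n}:
# F is uniform over f_plus and f_minus, and P_F(f(x) = y) = P(x, y).
n = 10
f_plus  = lambda i: min(i + 1, n)
f_minus = lambda i: max(i - 1, 0)

def sample_f(rng):
    return rng.choice((f_plus, f_minus))

# Applying one sampled f to every state simultaneously couples all trajectories.
rng = random.Random(0)
f = sample_f(rng)
print([f(i) for i in range(n + 1)])
```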

Simulating forwards and backwards in time

Associate a random function $f_t \sim F$ to each $t \in \{-\infty, \dots, \infty\}$.

Forward simulation of the chain from $x$ for $t$ steps:
$$F_0^t(x) = f_{t-1} \circ \dots \circ f_0(x), \qquad P^t(x, y) = P(F_0^t(x) = y).$$
Backward simulation of the chain from $x$ for $t$ steps:
$$F_{-t}^0(x) = f_{-1} \circ \dots \circ f_{-t}(x).$$
Forward: let $S$ be a time such that $|F_0^S(\Omega)| = 1$ (all trajectories have coalesced). The chain need not be stationary at time $S$.
CFTP: Let $S$ be such that $|F_{-S}^0(\Omega)| = 1$. The chain is stationary!

Why does CFTP work?

$$\lim_{t \to \infty} P(F_0^t(x) = y) = \pi(y) = \lim_{t \to \infty} P(F_{-t}^0(x) = y).$$
Let $S$ be such that $|F_{-S}^0(\Omega)| = 1$ and let $t > S$. Then
$$F_{-t}^0(x) = f_{-1} \circ \dots \circ f_{-t}(x) = F_{-S}^0\big(f_{-S-1} \circ \dots \circ f_{-t}(x)\big) = F_{-S}^0(x),$$
since $F_{-S}^0$ is constant on $\Omega$. As $t > S$ was arbitrary, $F_{-t}^0$ has the same distribution as $F_{-S}^0$, so $F_{-S}^0(x) \sim \pi$.

Implementing CFTP

Issues/Benefits:
- Choosing $F$ so that $E(S)$ is small.
- When $\Omega$ is large, detecting when $|F_{-S}^0(\Omega)| = 1$.
- Exact samples even in the absence of bounds on the mixing time.

Monotone CFTP: a partial order on states respected by the coupling, with a maximal state $x_{max}$ and a minimal state $x_{min}$. Run CFTP from $x_{max}$ and $x_{min}$; all trajectories will have converged when these coalesce. Examples: Ising model, dimer configurations of the hexagonal grid, bipartite independent set.
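A minimal monotone-CFTP sketch for the SRW on $\{0, \dots, n\}$, using the $f_+/f_-$ representation above. The doubling of $T$ and the reuse of past randomness are the standard implementation choices, not spelled out on the slide:

```python
import random

def monotone_cftp(n, seed=None):
    """Monotone CFTP for the SRW on {0, ..., n} with the f_plus/f_minus RFR.
    Runs from times -T with T doubling, REUSING the same random functions,
    until the trajectories from x_min = 0 and x_max = n coalesce; the common
    value is an exact sample from the stationary distribution."""
    rng = random.Random(seed)
    moves = []                           # moves[k] is the step used at time -(k+1)
    T = 1
    while True:
        while len(moves) < T:            # extend the randomness further into the past
            moves.append(rng.choice((-1, 1)))
        lo, hi = 0, n                    # minimal and maximal states
        for k in range(T - 1, -1, -1):   # compose f_{-T}, ..., f_{-1}
            b = moves[k]
            lo = min(max(lo + b, 0), n)
            hi = min(max(hi + b, 0), n)
        if lo == hi:
            return lo                    # all of Omega has coalesced by time 0
        T *= 2

print([monotone_cftp(3, seed=s) for s in range(10)])  # exact uniform samples
```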

Strong stationary times

A random variable $\tau$ is a stopping time for $(X_t)$ if $\mathbf{1}_{\{\tau = t\}}$ is a function only of $(X_0, \dots, X_t)$. A strong stationary time (SST) for $(X_t)$ with stationary dist. $\pi$ is a randomized stopping time such that
$$P_x(\tau = t, X_\tau = y) = P_x(\tau = t)\, \pi(y).$$

Theorem 13. If $\tau$ is an SST, then $d(t) \le \max_x P_x(\tau > t)$.

Broder stopping time for random shuffling

MC on $S_n$: Choose cards $L_t$ and $R_t$ u.a.r. and transpose them if different. Stopping time: Mark $R_t$ if both
- $R_t$ is unmarked, and
- either $L_t$ is marked or $L_t = R_t$.

The time $\tau$ for all cards to be marked is an SST. Write $\tau = \tau_0 + \dots + \tau_{n-1}$, where $\tau_k$ = number of transpositions after the $k$th card is marked, up to and including when the $(k+1)$st card is marked. Then
$$\tau_k \sim \mathrm{Geom}\left(\frac{(k+1)(n-k)}{n^2}\right).$$
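A simulation sketch of Broder's stopping time. Since the marking rule depends only on which card labels are drawn, the permutation itself need not be tracked just to sample $\tau$ (a simplification we make here):

```python
import random

def broder_sst(n, seed=None):
    """Simulate Broder's stopping time for random transpositions: each step,
    choose L_t, R_t uniformly (transposing if different), and mark R_t if R_t
    is unmarked and (L_t is marked or L_t = R_t). Returns the time at which
    all cards are marked, which is a strong stationary time."""
    rng = random.Random(seed)
    marked = [False] * n
    t = 0
    while not all(marked):
        L, R = rng.randrange(n), rng.randrange(n)
        # The transposition itself does not affect which labels are marked.
        if not marked[R] and (marked[L] or L == R):
            marked[R] = True
        t += 1
    return t

samples = [broder_sst(50, seed=s) for s in range(20)]
print(sum(samples) / len(samples))  # approx 2n(log n + O(1))
```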

Coupon collector estimate

Can also calculate:
$$\frac{n^2}{(k+1)(n-k)} = \frac{n^2}{n+1} \left(\frac{1}{k+1} + \frac{1}{n-k}\right),$$
so
$$E(\tau) = \sum_{k=0}^{n-1} E(\tau_k) = 2n(\log n + O(1)), \qquad \mathrm{Var}(\tau) = O(n^2).$$
Let $t_0 = E(\tau) + 2\sqrt{\mathrm{Var}(\tau)}$. By Chebyshev, $P(\tau > t_0) \le \frac{1}{4}$. Hence $t_{mix} \le (2 + o(1))\, n \log n$.