Path Coupling and Aggregate Path Coupling

Path Coupling and Aggregate Path Coupling 0 Path Coupling and Aggregate Path Coupling Yevgeniy Kovchegov Oregon State University (joint work with Peter T. Otto from Willamette University)

Path Coupling and Aggregate Path Coupling 1 Mixing times for Glauber dynamics. Important question: the relationship between the mixing time and the equilibrium phase transition of the corresponding statistical mechanical models. For models that exhibit a continuous phase transition: used standard path coupling (Bubley and Dyer 97). There rapid mixing is proved by showing contraction of the mean coupling distance between neighboring configurations. For models that exhibit a discontinuous phase transition, the standard path coupling method fails. Our approach combines aggregate path coupling and large deviation theory to determine the mixing times of a large class of statistical mechanical models, including those that exhibit a discontinuous phase transition. The aggregate path coupling method extends the use of the path coupling technique in the absence of contraction of the mean coupling distances betweenl neighboring configurations.

Path Coupling and Aggregate Path Coupling 2 Markov Chain Monte Carlo (MCMC). Goal: simulating an Ω-valued random variable distributed according to a given probability distribution π(z), given a complex nature of large discrete space Ω. MCMC: generating a Markov chain {X t } over Ω, with distribution µ t (z) = P (X t = z) converging rapidly to its unique stationary distribution, π(z). Metropolis-Hastings algorithm: Consider a connected neighborhood network with points in Ω. Suppose we know the ratios for any two neighbor points z and z on the network. of π(z ) π(z) Let for z and z connected by an edge of the network, the transition probability be set to p(z, z ) = 1 M min {1, π(z ) π(z) } for M large enough.

Path Coupling and Aggregate Path Coupling 3 Gibbs Sampling: Ising Model. Every vertex v of G = (V, E) is assigned a spin σ(v) { 1, +1} The probability of a configuration σ { 1, +1} V is π(σ) = e βh(σ) Z(β), where β = 1 T

Path Coupling and Aggregate Path Coupling 4 Gibbs Sampling: Ising Model. Every vertex v of G = (V, E) is assigned a spin σ(v) { 1, +1} The probability of a configuration σ { 1, +1} V is π(σ) = e βh(σ) Z(β), where β = 1 T +1 1 1 1 +1 +1 1 +1 +1 1 +1 1 1 +1 1

Path Coupling and Aggregate Path Coupling 5 Gibbs Sampling: Ising Model. σ { 1, +1} V, the Hamiltonian H(σ) = 1 σ(u)σ(v) = 2 u,v: u v edges e=[u,v] σ(u)σ(v) and probability of a configuration σ { 1, +1} V is π(σ) = e βh(σ) Z(β), where β = 1 T Z(β) = e βh(σ) - normalizing factor. σ { 1,+1} V

Path Coupling and Aggregate Path Coupling 6 Ising Model: local Hamiltonian H(σ) = 1 2 u,v: u v σ(u)σ(v) = The local Hamiltonian H local (σ, v) = u: u v edges e=[u,v] σ(u)σ(v). σ(u)σ(v) Observe: conditional probability for σ(v) is given by H local (σ, v): H(σ) = H local (σ, v) σ(u 1 )σ(u 2 ) e=[u 1,u 2 ]: u 1,u 2 v

Path Coupling and Aggregate Path Coupling 7 Gibbs Sampling: Ising Model via Glauber dynamics. +1 1 1 1 +1 +1 1 σ(v) +1 1 +1 1 1 +1 1 Observe: conditional probability for σ(v) is given by H local (σ, v): H(σ) = H local (σ, v) e=[u 1,u 2 ]: u 1,u 2 v σ(u 1 )σ(u 2 )

Path Coupling and Aggregate Path Coupling 8 Gibbs Sampling: Ising Model via Glauber dynamics. +1 1 1 1 +1 +1 1? +1 1 +1 1 1 +1 1 Randomly pick v G, erase the spin σ(v). Choose σ + or σ : Prob(σ σ + ) = e βh(σ + ) e βh(σ ) +e βh(σ + ) = e βh local (σ +,v) e βh local (σ,v) +e βh local (σ +,v)= e 2β e 2β +e 2β.

Path Coupling and Aggregate Path Coupling 9 Glauber dynamics: Rapid mixing. Glauber dynamics - a random walk on state space S (here { 1, +1} V ) s.t. needed π is stationary w.r.t. Glauber dynamics. In high temperatures (i.e. β = T 1 small enough) it takes O(n log n) iterations to get ε-close to π. Here V = n. Need: max v V deg(v) tanh(β) < 1 Thus the Glauber dynamics is a fast way to generate π. It is an important example of Gibbs sampling.

Path Coupling and Aggregate Path Coupling 10 Close enough distribution and mixing time. What is ε-close to π? Start with σ 0 : +1 +1 +1 +1 +1 +1 +1 +1 1 1 1 1 1 1 1 If P t (σ) is the probability distribution after t iterations, the total variation distance P t π T V = 1 P t (σ) π(σ) ε. 2 σ { 1,+1} V

Path Coupling and Aggregate Path Coupling 11 Mixing Times. Total variation distance between two distributions µ and ν: µ ν TV = sup µ(a) ν(a) = 1 A Ω 2 Maximal distance to stationary: x Ω d(t) = max P t (x, ) π x Ω TV µ(x) ν(x) where P t (x, ) is the transition probability starting iat x and π is its stationary distribution. Definition. Given ε > 0, the mixing time of the Markov chain is defined by t mix (ε) = min{t : d(t) ε} Mixing times categorized into two groups: rapid mixing = polynomial growth w.r.t. system size slow mixing = exponential growth w.r.t. system size

Path Coupling and Aggregate Path Coupling 12 Coupling Method. Ω - sample space, {p(i, j)} i,j Ω - transition probabilities Construct process (X t, Y t ) on Ω Ω such that X t is a {p(i, j)}-markov chain and Y t is a {p(i, j)}-markov chain. Once X t = Y t, let X t+1 = Y t+1, X t+2 = Y t+2,.... Coupling time: τ c = min{t : X t = Y t }. Successful coupling: P (τ c < ) = 1

Path Coupling and Aggregate Path Coupling 13 Mixing Times, Coupling and Path Coupling. Coupling Inequality. Let (X t, Y t ) be a coupling of a Markov chain where Y t is distributed by the stationary distribution π. The coupling time of the Markov chain is defined by τ c := min{t : X t = Y t }. Then, for all initial states x, and thus P t (x, ) π TV P (X t Y t ) = P (τ c > t) τ mix (ε) E[τ c] ε Thus τ mix (ε) = O(E[τ c ])

Path Coupling and Aggregate Path Coupling 14 Path coupling method Let {(X, Y )} be a coupling of P (x, ) and P (y, ) for which X 0 = x and Y 0 = y. Then P (x, ) P (y, ) T V P x,y (X Y ) Define a metric ρ on the space of configurations and let (x = x 0, x 1,..., x r = y) be a minimal path joining configurations x and y such that each pair of configurations (x j 1, x j ) are neighbors with respect to ρ. Then n P x,y (X Y ) E x,y [ρ(x, Y )] E xj 1,x j [ρ(x j 1, X j )] j=1

Path Coupling and Aggregate Path Coupling 15 Path coupling method Suppose the state space Ω of a Markov chain is the vertex set of a graph with path metric ρ. Suppose that for each edge {σ, τ} there exists a coupling (X, Y ) of the distributions P (σ, ) and P (τ, ) such that Then E σ,τ [ρ(x, Y )] ρ(σ, τ)e α for some α > 0 t mix (ε) log(ε) + log(diam(ω)). α Contraction is required for ALL pairs of neighboring configurations.

Path Coupling and Aggregate Path Coupling 16 Mean-field Blume-Capel model Spin model defined on the complete graph on n vertices. The spin at site j is denoted by ω j, taking values in Λ = {1, 0, 1}. The configuration space is the set Λ n of sequences ω = (ω 1, ω 2,..., ω n ) with each ω j Λ. In terms of a positive parameter K > 0 representing the interaction strength, the Hamiltonian is defined by 2 n H n,k (ω) = ωj 2 K n ω j. n j=1 j=1

Path Coupling and Aggregate Path Coupling 17 Mean-field Blume-Capel model For n N, inverse temperature β > 0, and K > 0, the Gibbs measure or canonical ensemble for the meanfield B-C model is the sequence of probability measures 1 P n,β,k (B) = Z n (β, K) exp[ βh n,k ] dp n B where P n is the product measure with marginals ρ = 1 3 (δ 1 + δ 0 + δ 1 ) and Z n (β, K) is the partition function Z n (β, K) = exp[ βh n,k ] dp n Λ n Taking the system size n to infinity, called the thermodynamic limit, yields the equilibrium state of the system.

Path Coupling and Aggregate Path Coupling 18 Mean-field Blume-Capel model Absorbing the noninteracting component of the Hamiltonian into the product measure P n yields [ ( ) ] 2 1 P n,β,k (dω) = Z n (β, K) exp Sn (ω) nβk P n,β (dω). n In this formula S n (ω) equals the total spin n j=1 ω j, P n,β is the product measure on Λ n with marginals ρ β (dω j ) = 1 Z(β) exp( βω2 j ) ρ(dω j), and Z(β) and Z n (β, K) are the appropriate normalizations.

Path Coupling and Aggregate Path Coupling 19 Mean-field Blume-Capel model With respect to the mean-field Blume-Capel model P n,β,k, S n /n satisfies the large deviations principle with speed n and rate function I β,k (z) = J β (z) βkz 2 inf y R {J β(y) βky 2 } where c β (t) = log and Λ exp(tω 1 ) ρ β (dω 1 ) = log J β (z) = sup{tz c β (t)}. t R ( ) 1 + e β (e t + e t ) 1 + 2e β

Path Coupling and Aggregate Path Coupling 20 Mean-field Blume-Capel model Large deviations principle. (a) For each closed set C, ( ) 1 lim sup n n log P Sn n,β,k n C (b) For each open set G, ( ) 1 lim inf n n log P Sn n,β,k n G inf z C I β,k(z) inf z G I β,k(z) Equilibrium macrostates: Ẽ β,k = {x [ 1, 1] : I β,k (x) = 0} = {x [ 1, 1] : x is a global min pt of J β (x) βkx 2 }

Path Coupling and Aggregate Path Coupling 21 Mean-field Blume-Capel model Free energy functional: G β,k (x) = βkx 2 c β (2βKx) Ẽ β,k = {x [ 1, 1] : x is a global min. point of G β,k (x)} G β,k exhibits two distinct behaviors for (a) β β c = log 4 and (b) β > β c.

Path Coupling and Aggregate Path Coupling 22 β β c = log 4 Mean-field Blume-Capel model K = K c (2) (β) second-order, continuous phase transition point

Path Coupling and Aggregate Path Coupling 23 Mean-field Blume-Capel model: Equilibrium phase diagram R.S. Ellis, P.T. Otto, and H. Touchette in AAP 2005

Path Coupling and Aggregate Path Coupling 24 Mean-field Blume-Capel model β > β c = log 4 K = K 1 (β) metastable critical point K = K c (1) (β) discontinuous, first-order phase transition point

Path Coupling and Aggregate Path Coupling 25 Mean-field Blume-Capel model: Equilibrium phase diagram R.S. Ellis, P.T. Otto, and H. Touchette in AAP 2005

Path Coupling and Aggregate Path Coupling 26 Glauber dynamics for BC model Choose vertex of underlying complete graph uniformly then update the spin at the vertex according to P n,β,k conditioned on the event that the spins at all other vertices remain unchanged. Reversible Markov chain with stationary distribution P n,β,k.

Path Coupling and Aggregate Path Coupling 27 Mean coupling distance for BC dynamics Path metric ρ on Ω n = { 1, 0, 1} n is defined by ρ(σ, τ) = n j=1 σj τ j For a coupling (X, Y ) of one step of the Glauber dynamics of the BC model starting in neighboring configurations σ and τ, asymptotically as n, E σ,τ [ρ(x, Y )] n 1 n n 1 n + (n 1) n [ c β (n 1) + 2βK n ( 2βK S ) ( n(τ) c β 2βK S )] n(σ) n n [ Sn (τ) S ] ( n(σ) c β 2βK S ) n(σ) n n n

Path Coupling and Aggregate Path Coupling 28 Behavior of c β

Path Coupling and Aggregate Path Coupling 29 Rapid mixing for β β c E σ,τ [ρ(x, Y )] n 1 n +(n 1) 2βK n [ Sn (τ) n ] ( S n(σ) c n β 2βK S n(σ) n ) Contraction of mean coupling distance between neighboring configurations σ and τ if ( c β 2βK S ) n(σ) < 1 n 2βK For β β c = log 4, ( c β 2βK S ) n(σ) n < c β (0) = 1 2βK (2) c (β) Rapid mixing when K < K c (2) (β).

Path Coupling and Aggregate Path Coupling 30 Behavior of c β

Path Coupling and Aggregate Path Coupling 31 Rapid mixing for β > β c. Aggregate path coupling: Let (σ = x 0, x 1,..., x r = τ) be a path connecting σ to τ and monotone increasing in ρ such that (x i 1, x i ) are neighboring configurations. E σ,τ [ρ(x, Y )] = r i=1 (n 1) n E xi 1,x i [ρ(x i 1, X i )] (n 1) ρ(σ, τ)+ n [ c β ( ) ( )] 2βK 2βK n S n(τ) c β n S n(σ) Assume S n (σ)/n 0. [ ] K Sn (τ) S n (σ) E σ,τ [ρ(x, Y )] K 1 (β) n [ ρ(σ, τ) 1 1 ( 1 K )] n K 1 (β)

Path Coupling and Aggregate Path Coupling 32 Rapid mixing for β > β c Let (X, Y ) be a coupling of one step of the Glauber dynamics of the BC model that begin in configurations σ and τ, not necessarily neighbors. 0, K 1(β) K K 1 (β) ( Suppose β ) > β c and K < K 1 (β). Then for any α there exists an ε > 0 such that, asymptotically as n, whenever S n (σ) < εn. E σ,τ [ρ(x, Y )] e α/n ρ(σ, τ)

Path Coupling and Aggregate Path Coupling 33 Rapid mixing for β > β c For β > β c and K < K 1 (β) For Y 0 dist = P n,β,k, P n,β,k {S n /n dx} = δ 0 as n. P t (X 0, ) P n,β,k T V P {X t Y t } = P {ρ(x t, Y t ) 1} E[ρ(X t, Y t )] e αt/n E[ρ(X 0, Y 0 )] + ntp n,β,k { S n /n ε} ne αt/n + ntp n,β,k { S n /n ε} Rapid mixing when K < K 1 (β).

Path Coupling and Aggregate Path Coupling 34 Slow mixing Bottleneck ratio (Cheeger constant) argument. For two configurations ω and τ, define the edge measure Q as Q(ω, τ) = P n,β,k (ω)p (ω, τ) and Q(A, B) = Q(ω, τ) ω A,τ B Here P (ω, τ) is the transition probability of the Glauber dynamics of the BC model. The bottleneck ratio of the set S is defined by Then Φ(S) = Q(S, Sc ) P n,β,k (S) and Φ = min Φ(S) S:P n,β,k (S) 1 2 t mix = t mix (1/4) 1 4Φ

Path Coupling and Aggregate Path Coupling 35 Slow mixing Suppose G β,k has a minimum (either local or global) point at z > 0. Let z be the corresponding local maximum point of G β,k such that 0 z < z. Define the bottleneck set A = { ω : z < S n(ω) n 1 The bottleneck set A exists, and thus slow mixing, for (a) β β c and K > K c (2) (β), and (b) β > β c and K > K 1 (β). }

Path Coupling and Aggregate Path Coupling 36 Equilibrium structure versus mixing times Y. K., P.T. Otto, and M. Titus in JSP 2011

Path Coupling and Aggregate Path Coupling 37 Generalizing Aggregate Path Coupling Method Configuration space: Let q be a fixed integer and define Λ = {e 1, e 2,..., e q }, where e k are the q standard basis vectors of R q, and ω = (ω 1, ω 2,..., ω n ) Λ n. The magnetization vector (a.k.a empirical measure or proportion vector): L n (ω) = (L n,1 (ω), L n,2 (ω),..., L n,q (ω)), where the kth component is defined by L n,k (ω) = 1 n δ(ω i, e k ) n i=1 - proportion of spins in configuration ω.

Path Coupling and Aggregate Path Coupling 38 Generalizing Aggregate Path Coupling Method Interaction representation function: H(z) = H 1 (z 1 ) + H 2 (z 2 ) +... + H q (z q ) Example: for the Curie-Weiss-Potts (CWP) model, H(z) = 1 2 z, z = 1 2 z2 1 1 2 z2 2... 1 2 z2 q. Hamiltonian: H n (ω) = nh(l n (ω)) P n,β (B) = 1 Z n (β) Canonical ensemble: B exp [ βh n (ω)] dp n = 1 Z n (β) B exp [nβ H (L n (ω))] dp n where P n = ρ... ρ and Z n (β) = Λ n exp [ βh n (ω)] dp n.

Path Coupling and Aggregate Path Coupling 39 Generalizing Aggregate Path Coupling Method q ( ) ν Relative entropy: R(ν ρ) = ν k log k ρ k k=1 Large deviations principle (R.S. Ellis, K. Haven, and B. Turkington, JSP 2000): The empirical measure L n satisfies the LDP with respect to the Gibbs measure P n,β with rate function I β (z) = R(z ρ) + βh(z) inf{r(t ρ) + βh(t)}. t Equilibrium macrostates: E β := {ν P : ν minimizes R(ν ρ) + βh(ν)}

Path Coupling and Aggregate Path Coupling 40 Generalizing Aggregate Path Coupling Method Let ρ be the uniform distribution. Logarithmic moment generating function of X 1 : ( ) 1 q Γ(z) = log exp{z k } q k=1 Free energy functional for the Gibbs ensemble P n,β : G β (z) = β( H) ( H(z)) Γ( β H(z)), where F (z) = sup x R q{ x, z F (x)} denotes Legendre- Fenchel transform. Then (M. Costeniuc, R.S. Ellis, and H. Touchette JMP 2005) and, therefore, inf {R(z ρ) + βh(z)} = inf {G β (z)} z P z R q E β = {z P : z minimizes G β (z)}

Path Coupling and Aggregate Path Coupling 41 Generalizing Aggregate Path Coupling Method E β = {z P : z minimizes G β (z)} We consider only the single phase region of the Gibbs ensemble; i.e. values of β where G β (z) has a unique global minimum, at z β. There, P n,β (L n dx) δ zβ as n For the Curie-Weiss-Potts model, the single phase region are values of β such that 0 < β < β c := 2(q 1) log(q 1) q 2

Path Coupling and Aggregate Path Coupling 42 Generalizing Aggregate Path Coupling Method Glauber dynamics: (i) Select a vertex i uniformly, (ii) Update the spin at vertex i according to the distribution P n,β, conditioned on the event that the spins at all vertices not equal to i remain unchanged. For configuration σ = (σ 1, σ 2,..., σ n ), let σ i,e k be the configuration that agrees with σ at all vertices j i and s.t. the spin at the vertex i is e k ; i.e. σ i,e k = (σ 1, σ 2,..., σ i 1, e k, σ i+1,..., σ n ) Then if the current configuration is σ and vertex i is selected, the probability the spin at i is updated to e k, denoted by P (σ σ i,e k), is equal to P (σ σ i,e k) = exp { βnh(l n (σ i,e k)) } q l=1 exp { βnh(l n (σ i,e l)) }

Path Coupling and Aggregate Path Coupling 43 Generalizing Aggregate ( Path Coupling ) Method Let g H,β (z) := g H,β 1 (z),..., gq H,β (z), where g H,β l (z) = exp ( β [ lh](z)) q exp ( β [ k H](z)) k=1 Main Result (Y.K. and Peter T. Otto, JSP 2015): Let z β be the unique equilibrium macrostate. Suppose δ (0, 1) s.t. inf π:z z β q k=1 π g H,β z z β 1 k (y), dy 1 δ for all z in P. Then the mixing time of the Glauber dynamics satisfies t mix = O(n log n)

Path Coupling and Aggregate Path Coupling 44 Generalizing Aggregate Path Coupling Method Application: the generalized Curie-Weiss-Potts model (GCWP). In GCWP model (B. Jahnel, C. Külske, E. Rudelli, and J. Wegener, MPRF 2015), for r 2, q H(z) = 1 r j=1 z r j Then β s (q, r) := sup g H,β e βzr 1 k k (z) =. Next, define e βzr 1 1 +...+e βzr 1 q { β 0 : g H,β k (z) < z k for all z P such that z k (1/q, 1] Corollary (Y.K. and Peter T. Otto, JSP 2015): If β < β s (q, r), then t mix = O(n log n). }

Path Coupling and Aggregate Path Coupling 45 Generalizing Aggregate Path Coupling Method Remark: Rapid mixing region for classical Curie-Weiss- Potts model (GCWP with r = 2) was first obtained (among other things) in P. Cuff, J. Ding, O. Louidor, E. Lubetzy, Y. Peres and A. Sly, JSP 2012. The rapid mixing region for GCWP was obtained in Y.K. and Peter T. Otto, JSP 2015 as a few page Corollary to the Main Result. Note that β s (q, r) β c (q, r) Here, the region of rapid mixing β < β s (q, r) is where G β (z) = β( H) ( H(z)) Γ( β H(z)) has a unique local minimum (=global minimum).