Decomposition Methods and Sampling Circuits in the Cartesian Lattice


Dana Randall
College of Computing and School of Mathematics
Georgia Institute of Technology, Atlanta, GA
(Supported in part by NSF Grant No. CCR)

Abstract. Decomposition theorems are useful tools for bounding the convergence rates of Markov chains. The theorems relate the mixing rate of a Markov chain to smaller, derivative Markov chains, defined by a partition of the state space, and can be useful when standard, direct methods fail. Not only does this simplify the chain being analyzed, but it allows a hybrid approach whereby different techniques for bounding convergence rates can be used on different pieces. We demonstrate this approach by giving bounds on the mixing time of a chain on circuits of length $2n$ in $\mathbb{Z}^d$.

1 Introduction

Suppose that you want to sample from a large set of combinatorial objects. A popular method for doing this is to define a Markov chain whose state space $\Omega$ consists of the elements of the set, and use it to perform a random walk. We first define a graph $H$ connecting pairs of states that are close under some metric. This underlying graph on the state space, representing allowable transitions, is known as the Markov kernel. To define the transition probabilities of the Markov chain, we need to consider the desired stationary distribution $\pi$ on $\Omega$. A method known as the Metropolis algorithm assigns probabilities to the edges of $H$ so that the resulting Markov chain will converge to this distribution. In particular, if $\Delta$ is the maximum degree of any vertex in $H$, and $(x, y)$ is any edge,
$$P(x, y) = \frac{1}{2\Delta} \min\left(1, \frac{\pi(y)}{\pi(x)}\right).$$
We then assign self-loops all remaining probability at each vertex, so $P(x, x) \geq 1/2$ for all $x \in \Omega$. If $H$ is connected, $\pi$ will be the unique stationary distribution of this Markov chain. We can see this by verifying that detailed balance is satisfied on every edge $(x, y)$, i.e., $\pi(x)P(x, y) = \pi(y)P(y, x)$. As a result, if we start at any vertex in $\Omega$ and perform a random walk according to the transition probabilities defined by $P$, and we walk long enough, we will converge to the desired distribution. For this to be useful, we need to converge rapidly to $\pi$, so that after a small, polynomial number of steps, our samples are chosen from a distribution which is provably arbitrarily close to stationarity. A Markov chain with this property is rapidly mixing.
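The Metropolis rule above translates directly into a one-step sampling routine. Below is a minimal sketch in Python, assuming an abstract `neighbors` function for the kernel $H$ and a `weight` function proportional to $\pi$ (the names and the proposal-slot scheme are ours; the normalizing constant of $\pi$ cancels in the ratio):

```python
import random

def metropolis_step(x, neighbors, weight, max_degree):
    """One step of the Metropolis chain on the kernel H.

    neighbors(x) -- the states adjacent to x in H
    weight(x)    -- any function proportional to pi(x)
    """
    if random.random() < 0.5:
        return x  # lazy half-step ensures P(x, x) >= 1/2
    nbrs = neighbors(x)
    i = random.randrange(max_degree)  # one of Delta proposal slots
    if i >= len(nbrs):
        return x  # dummy slot: vertices of degree < Delta hold
    y = nbrs[i]
    # accept with probability min(1, pi(y)/pi(x))
    if random.random() < min(1.0, weight(y) / weight(x)):
        return y
    return x
```

Each edge $(x, y)$ of $H$ is then traversed with probability $\frac{1}{2\Delta}\min(1, \pi(y)/\pi(x))$, as required.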

Consider, for example, the set of independent sets $\mathcal{I}$ of some graph $G$. Taking the Hamming metric, we can define $H$ by connecting any two independent sets that differ by the addition or deletion of a single vertex. A popular stationary distribution is the Gibbs distribution, which assigns weight $\pi(I) = \gamma^{|I|}/Z_\gamma$ to $I$, where $\gamma > 0$ is an input parameter of the system, $|I|$ is the size of the independent set $I$, and $Z_\gamma = \sum_{J \in \mathcal{I}} \gamma^{|J|}$ is the normalizing constant known as the partition function. In the Metropolis chain, we have $P(I, I') = \frac{1}{2n}\min(1, \gamma)$ if $I'$ is formed by adding a vertex to $I$, and $P(I, I') = \frac{1}{2n}\min(1, \gamma^{-1})$ if $I'$ is formed by deleting a vertex from $I$.

Recently there has been great progress in the design and analysis of Markov chains which are provably efficient. One of the most popular proof techniques is coupling. Informally, coupling says that if two copies of the Markov chain can be simultaneously simulated so that they end up in the same state very quickly, regardless of the starting states, then the chain is rapidly mixing. In many instances this is not hard to establish, which gives a very easy proof of fast convergence. Despite the appeal of these simple coupling arguments, a major drawback is that many Markov chains which appear to be rapidly mixing do not seem to admit coupling proofs. In fact, the complexity of typical Markov chains often makes it difficult to use any of the standard techniques, which include bounding the conductance, the log-Sobolev constant or the spectral gap, all closely related to the mixing rate.

The decomposition method offers a way to systematically simplify the Markov chain by breaking it into more manageable pieces. The idea is that it should be easier to apply some of these techniques to the simplified Markov chains and then infer a bound on the original Markov chain. In this survey we will concentrate on the state decomposition theorem, which utilizes some partition of the state space. It says that if the Markov chain is rapidly mixing when restricted to each piece of the partition, and if there is sufficient flow between the pieces (defined by a projection of the chain), then the original Markov chain must be rapidly mixing as well. This allows us to take a top-down approach to mixing rate analysis, whereby we need only consider the mixing rates of the restrictions and the projection. In many cases it is easier to define good couplings on these simpler Markov chains, or to use one of the other known methods of analysis. We note, however, that using indirect methods such as decomposition or comparison (defined later) invariably adds orders of magnitude to the bounds on the running time of the algorithm. Hence it is wise to use these methods judiciously unless the goal is simply to establish a polynomial bound on the mixing rate.
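Before turning to the machinery, here is a minimal sketch of the insert/delete Metropolis chain on independent sets described above (Python; the set-based representation and the function name are our own illustrative choices):

```python
import random

def independent_set_step(ind_set, graph, gamma):
    """One Metropolis move for pi(I) proportional to gamma^|I|.

    graph -- dict mapping each vertex to the set of its neighbors
    """
    if random.random() < 0.5:
        return ind_set  # laziness: P(I, I) >= 1/2
    v = random.choice(list(graph))  # uniform vertex, probability 1/n
    if v in ind_set:
        # deletion, accepted with probability min(1, 1/gamma)
        if random.random() < min(1.0, 1.0 / gamma):
            return ind_set - {v}
    elif not (graph[v] & ind_set):
        # addition keeps the set independent; accept with min(1, gamma)
        if random.random() < min(1.0, gamma):
            return ind_set | {v}
    return ind_set
```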

2 Mixing machinery

In what follows, we assume that $\mathcal{M}$ is an ergodic (i.e., irreducible and aperiodic), reversible Markov chain with finite state space $\Omega$, transition probability matrix $P$, and stationary distribution $\pi$. The time a Markov chain takes to converge to its stationary distribution, i.e., the mixing time of the chain, is measured in terms of the distance between the distribution at time $t$ and the stationary distribution. Letting $P^t(x, y)$ denote the $t$-step probability of going from $x$ to $y$, the total variation distance at time $t$ is
$$\|P^t, \pi\|_{tv} = \max_{x \in \Omega} \frac{1}{2} \sum_{y \in \Omega} |P^t(x, y) - \pi(y)|.$$
For $\varepsilon > 0$, the mixing time $\tau(\varepsilon)$ is
$$\tau(\varepsilon) = \min\{t : \|P^{t'}, \pi\|_{tv} \leq \varepsilon \ \forall\, t' \geq t\}.$$
We say a Markov chain is rapidly mixing if the mixing time is bounded above by a polynomial in $n$ and $\log \frac{1}{\varepsilon}$, where $n$ is the size of each configuration in the state space.

It is well known that the mixing rate is related to the spectral gap of the transition matrix. For the transition matrix $P$, we let $\mathrm{Gap}(P) = \lambda_0 - \lambda_1$ denote its spectral gap, where $\lambda_0, \lambda_1, \ldots, \lambda_{|\Omega|-1}$ are the eigenvalues of $P$ and $1 = \lambda_0 > \lambda_1 \geq |\lambda_i|$ for all $i \geq 2$. The following result relates the spectral gap and mixing times of a chain (see, e.g., [18]).

Theorem 1. Let $\pi_* = \min_{x \in \Omega} \pi(x)$. For all $\varepsilon > 0$ we have
(a) $\tau(\varepsilon) \leq \frac{1}{\mathrm{Gap}(P)} \log\left(\frac{1}{\pi_* \varepsilon}\right)$;
(b) $\tau(\varepsilon) \geq \frac{\lambda_1}{2\,\mathrm{Gap}(P)} \log\left(\frac{1}{2\varepsilon}\right)$.

Hence, if $1/\mathrm{Gap}(P)$ is bounded above by a polynomial, we are guaranteed fast (polynomial time) convergence. For most of what follows we will rely on the spectral gap bound on mixing. Conversely, Theorem 1 is useful for deriving a bound on the spectral gap from a coupling proof, which provides bounds on the mixing time.
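For toy examples these quantities can be checked numerically against Theorem 1. A minimal sketch, assuming the chain is small enough that $P$ fits in memory as a dense matrix:

```python
import numpy as np

def spectral_gap(P):
    """Gap(P) = lambda_0 - lambda_1 for a transition matrix P.
    For a reversible chain the eigenvalues are real; .real discards
    numerical noise from the general eigensolver."""
    eigs = np.sort(np.linalg.eigvals(P).real)[::-1]  # descending order
    return eigs[0] - eigs[1]

def tv_distance(P, pi, t):
    """Total variation distance after t steps from the worst start:
    max_x (1/2) sum_y |P^t(x, y) - pi(y)|."""
    Pt = np.linalg.matrix_power(P, t)
    return 0.5 * np.abs(Pt - pi).sum(axis=1).max()
```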

We now review some of the main techniques used to bound the mixing rate of a chain, including the decomposition theorem.

2.1 Path Coupling

One of the most popular methods for bounding mixing times has been the coupling method. A coupling is a Markov chain on $\Omega \times \Omega$ with the following properties. Instead of updating the pair of configurations independently, the coupling updates them so that (i) the two processes tend to coalesce, or move together under some measure of distance, yet (ii) each process, viewed in isolation, is performing transitions exactly according to the original Markov chain. A valid coupling ensures that once the pair of configurations coalesce, they agree from that time forward. The mixing time can be bounded by the expected time for configurations to coalesce under any valid coupling. The method of path coupling simplifies our goal by letting us bound the mixing rate of a Markov chain by considering only a small subset of $\Omega \times \Omega$ [3, 6].

Theorem 2. (Dyer and Greenhill [6]) Let $\Phi$ be an integer-valued metric defined on $\Omega \times \Omega$ which takes values in $\{0, \ldots, B\}$. Let $U$ be a subset of $\Omega \times \Omega$ such that for all $(x, y) \in \Omega \times \Omega$ there exists a path $x = z_0, z_1, \ldots, z_r = y$ between $x$ and $y$ such that $(z_i, z_{i+1}) \in U$ for $0 \leq i < r$ and
$$\sum_{i=0}^{r-1} \Phi(z_i, z_{i+1}) = \Phi(x, y).$$
Define a coupling $(x, y) \mapsto (x', y')$ of the Markov chain $\mathcal{M}$ on all pairs $(x, y) \in U$. Suppose that there exists $\alpha < 1$ such that $\mathrm{E}[\Phi(x', y')] \leq \alpha\, \Phi(x, y)$ for all $(x, y) \in U$. Then the mixing time of $\mathcal{M}$ satisfies
$$\tau(\varepsilon) \leq \frac{\log(B \varepsilon^{-1})}{1 - \alpha}.$$
Useful bounds can also be derived in the case that $\alpha = 1$ in the theorem (see [6]).

2.2 The disjoint decomposition method

Madras and Randall [12] introduced two decomposition theorems which relate the mixing rate of a Markov chain to the mixing rates of related Markov chains. The state decomposition theorem allows the state space to be decomposed into overlapping subsets; the mixing rate of the original chain can be bounded by the mixing rates of the restricted Markov chains, which are forced to stay within the pieces, and the ergodic flow between these sets. The density decomposition theorem is of a similar flavor, but relates a Markov chain to a family of other Markov chains with the same Markov kernel, where the transition probabilities of the original chain can be described as a weighted average of the transition probabilities of the chains in the family. We will concentrate on the state decomposition theorem, and will present a newer version of the theorem due to Martin and Randall [15] which allows the decomposition of the state space to be a partition, rather than requiring that the pieces overlap.

Suppose that the state space is partitioned into $m$ disjoint pieces $\Omega_1, \ldots, \Omega_m$. For each $i = 1, \ldots, m$, define $P_i = P\{\Omega_i\}$ as the restriction of $P$ to $\Omega_i$ which rejects moves that leave $\Omega_i$. In particular, the restriction to $\Omega_i$ is a Markov chain $\mathcal{M}_i$, where the transition matrix $P_i$ is defined as follows: if $x \neq y$ and $x, y \in \Omega_i$ then $P_i(x, y) = P(x, y)$; if $x \in \Omega_i$ then $P_i(x, x) = 1 - \sum_{y \in \Omega_i,\, y \neq x} P_i(x, y)$. Let $\pi_i$ be the normalized restriction of $\pi$ to $\Omega_i$, i.e., $\pi_i(A) = \pi(A \cap \Omega_i)/\pi(\Omega_i)$. Notice that if $\Omega_i$ is connected then $\pi_i$ is the stationary distribution of $P_i$. Next, define $\bar{P}$ to be the following aggregated transition matrix on the state space $[m]$:
$$\bar{P}(i, j) = \frac{1}{\pi(\Omega_i)} \sum_{x \in \Omega_i,\, y \in \Omega_j} \pi(x) P(x, y).$$

Theorem 3. (Martin and Randall [15]) Let $P_i$ and $\bar{P}$ be as above. Then the spectral gaps satisfy
$$\mathrm{Gap}(P) \geq \frac{1}{2}\, \mathrm{Gap}(\bar{P}) \min_{i \in [m]} \mathrm{Gap}(P_i).$$
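When a chain is small enough to tabulate, the restrictions and the projection can be formed explicitly and Theorem 3 checked numerically. A sketch, where `blocks` is our own bookkeeping: a list of index lists, one per piece $\Omega_i$:

```python
import numpy as np

def restriction(P, block):
    """P_i: keep transitions inside the block, push rejected
    moves onto the diagonal as self-loops."""
    Pi = P[np.ix_(block, block)].copy()
    np.fill_diagonal(Pi, 0.0)
    np.fill_diagonal(Pi, 1.0 - Pi.sum(axis=1))
    return Pi

def projection(P, pi, blocks):
    """Aggregated chain P_bar(i, j) = (1/pi(B_i)) * sum over
    x in B_i, y in B_j of pi(x) P(x, y)."""
    m = len(blocks)
    Pbar = np.zeros((m, m))
    for i, Bi in enumerate(blocks):
        for j, Bj in enumerate(blocks):
            Pbar[i, j] = (pi[Bi][:, None] * P[np.ix_(Bi, Bj)]).sum() / pi[Bi].sum()
    return Pbar
```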

A useful corollary allows us to replace $\bar{P}$ in the theorem with the Metropolis chain defined on the same Markov kernel, provided some simple conditions are satisfied. Since the transitions of the Metropolis chain are fully defined by the stationary distribution $\pi$, this is often easier to analyze than the true projection. Define $P_M$ on the set $[m]$, with Metropolis transitions $P_M(i, j) = \min\{1, \pi(\Omega_j)/\pi(\Omega_i)\}$. Let $\partial_i(\Omega_j) = \{y \in \Omega_j : \exists x \in \Omega_i \text{ with } P(x, y) > 0\}$.

Corollary 1. [15] With $P_M$ as above, suppose there exist $\beta > 0$ and $\gamma > 0$ such that
(a) $P(x, y) \geq \beta$ whenever $P(x, y) > 0$;
(b) $\pi(\partial_i(\Omega_j)) \geq \gamma\, \pi(\Omega_j)$ whenever $\bar{P}(i, j) > 0$.
Then
$$\mathrm{Gap}(P) \geq \frac{1}{2}\, \beta \gamma\, \mathrm{Gap}(P_M) \min_{i = 1, \ldots, m} \mathrm{Gap}(P_i).$$

2.3 The comparison method

When applying the decomposition theorem, we reduce the analysis of a Markov chain to bounding the convergence times of smaller related chains. In many cases it will be much simpler to analyze variants of these auxiliary Markov chains instead of the true restrictions and projections. The comparison method tells us ways in which we can slightly modify one of these Markov chains without qualitatively changing the mixing time. For instance, it allows us to add additional transition edges or to amplify some of the transition probabilities, which can be useful tricks for simplifying the analysis of a chain.

Let $P$ and $\tilde{P}$ be two reversible Markov chains on the same state space $\Omega$ with the same stationary distribution $\pi$. The comparison method allows us to relate the mixing times of these two chains (see [4] and [17]). In what follows, suppose that $\mathrm{Gap}(\tilde{P})$, the spectral gap of $\tilde{P}$, is known (or suitably bounded) and we desire a bound on $\mathrm{Gap}(P)$, the spectral gap of $P$, which is unknown. Following [4], we let $E(P) = \{(x, y) : P(x, y) > 0\}$ and $E(\tilde{P}) = \{(x, y) : \tilde{P}(x, y) > 0\}$ denote the sets of edges of the two chains, viewed as directed graphs. For each $(x, y) \in E(\tilde{P})$, define a path $\gamma_{xy}$ using a sequence of states $x = x_0, x_1, \ldots, x_k = y$ with $(x_i, x_{i+1}) \in E(P)$, and let $|\gamma_{xy}|$ denote the length of the path. Let $\Gamma(z, w) = \{(x, y) \in E(\tilde{P}) : (z, w) \in \gamma_{xy}\}$ be the set of paths that use the transition $(z, w)$ of $P$. Finally, define
$$A = \max_{(z, w) \in E(P)} \frac{1}{\pi(z) P(z, w)} \sum_{(x, y) \in \Gamma(z, w)} |\gamma_{xy}|\, \pi(x) \tilde{P}(x, y).$$

Theorem 4. (Diaconis and Saloff-Coste [4]) With the above notation, the spectral gaps satisfy
$$\mathrm{Gap}(P) \geq \frac{1}{A}\, \mathrm{Gap}(\tilde{P}).$$

It is worthwhile to note that there are several other comparison theorems which turn out to be useful, especially when applying decomposition techniques. The following lemma helps us reason about a Markov chain by slightly modifying the transition probabilities (see, e.g., [10]). We use this trick in our main application, sampling circuits.

Lemma 1. Suppose $P$ and $\tilde{P}$ are Markov chains on the same state space, each reversible with respect to the distribution $\pi$. Suppose there are constants $c_1$ and $c_2$ such that $c_1 \tilde{P}(x, y) \leq P(x, y) \leq c_2 \tilde{P}(x, y)$ for all $x \neq y$. Then
$$c_1\, \mathrm{Gap}(\tilde{P}) \leq \mathrm{Gap}(P) \leq c_2\, \mathrm{Gap}(\tilde{P}).$$
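Given an explicit choice of canonical paths, the congestion constant $A$ of Theorem 4 can be computed mechanically for small chains. A sketch, assuming `paths` maps each edge of $\tilde{P}$ to its path (a representation chosen here for illustration):

```python
from collections import defaultdict

def comparison_constant(P, P_tilde, pi, paths):
    """A = max over edges (z, w) of P of
    (1 / (pi(z) P(z, w))) * sum over paths gamma_xy through (z, w)
    of |gamma_xy| * pi(x) * P_tilde(x, y).

    paths -- dict mapping an edge (x, y) of P_tilde to the state
             sequence x = x_0, ..., x_k = y along edges of P
    """
    load = defaultdict(float)
    for (x, y), path in paths.items():
        length = len(path) - 1  # |gamma_xy|
        for z, w in zip(path, path[1:]):
            load[(z, w)] += length * pi[x] * P_tilde[x, y]
    return max(load[(z, w)] / (pi[z] * P[z, w]) for (z, w) in load)
```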

3 Sampling circuits in the Cartesian lattice

A circuit in $\mathbb{Z}^d$ is a walk along lattice edges which starts and ends at the origin. Our goal is to sample from $\mathcal{C}$, the set of circuits of length $2n$. It is useful to represent each walk as a string of $2n$ letters using $\{a_1, \ldots, a_d\}$ and their inverses $\{a_1^{-1}, \ldots, a_d^{-1}\}$, where $a_i$ represents a positive step in the $i$th direction and $a_i^{-1}$ represents a negative step. Since these are closed circuits, the number of times $a_i$ appears must equal the number of times $a_i^{-1}$ appears, for all $i$. We will show how to uniformly sample from the set of all circuits of length $2n$ using an efficient Markov chain. The primary tool will be finding an appropriate decomposition of the state space. We outline the proof here and refer the reader to [16] for complete details.

Using a similar strategy, Martin and Randall showed how to use a Markov chain to sample circuits in regular $d$-ary trees, i.e., paths of length $2n$ which trace edges of the tree starting and ending at the origin [15]. This problem generalizes to sampling Dyck paths according to a distribution which favors walks that hit the $x$-axis a large number of times, known in the statistical physics community as adsorbing staircase walks. Here too the decomposition method was the basis of the analysis. We note that there are other simple algorithms for sampling circuits on trees which do not require Markov chains. In contrast, to our knowledge, the Markov chain based algorithm discussed in this paper is the first efficient method for sampling circuits on $\mathbb{Z}^d$.

3.1 The Markov chain on circuits

The Markov chain on $\mathcal{C}$ is based on two types of moves: transpositions of neighboring letters in the word (which keep the numbers of each letter fixed) and rotations, which replace an adjacent pair $(a_i, a_i^{-1})$ with $(a_j, a_j^{-1})$, for some pair of letters $a_i$ and $a_j$. We now define the transition probabilities $P$ of $\mathcal{M}$, where we write $x \in_u X$ to mean that we choose $x$ from the set $X$ uniformly. Starting at $\sigma$, do the following. With probability $1/2$, pick $i \in_u [2n-1]$ and transpose $\sigma_i$ and $\sigma_{i+1}$. With probability $1/2$, pick $i \in_u [2n-1]$ and $k \in_u [d]$, and if $\sigma_i$ and $\sigma_{i+1}$ are inverses (where $\sigma_i$ is a step in the positive direction), then replace them with $(a_k, a_k^{-1})$. Otherwise keep $\sigma$ unchanged. The chain is aperiodic, ergodic and reversible, and the transitions are symmetric, so the stationary distribution of this Markov chain is the uniform distribution on $\mathcal{C}$.
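For concreteness, one transition of $\mathcal{M}$ can be sketched as follows, encoding $a_i$ as $+i$ and $a_i^{-1}$ as $-i$ (the encoding is our own):

```python
import random

def circuit_step(sigma, d):
    """One transition of the circuits chain M.

    sigma -- list of length 2n over {1..d} and {-1..-d},
             where -i encodes the inverse letter a_i^{-1}
    """
    sigma = list(sigma)                   # work on a copy
    i = random.randrange(len(sigma) - 1)  # neighboring positions i, i+1
    if random.random() < 0.5:
        # transposition: swap neighboring letters
        sigma[i], sigma[i + 1] = sigma[i + 1], sigma[i]
    else:
        # rotation: replace an adjacent (a_j, a_j^{-1}) with (a_k, a_k^{-1})
        k = random.randrange(1, d + 1)
        if sigma[i] > 0 and sigma[i + 1] == -sigma[i]:
            sigma[i], sigma[i + 1] = k, -k
    return sigma
```

Transpositions preserve the trace defined in the next subsection; only rotations move between pieces of the decomposition.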

3.2 Bounding the mixing rate of the circuits Markov chain

We bound the mixing rate of $\mathcal{M}$ by appealing to the decomposition theorem. Let $\sigma \in \mathcal{C}$ and let $x_i$ equal the number of occurrences of $a_i$, and hence of $a_i^{-1}$, in $\sigma$, for all $i$. Define the trace $\mathrm{Tr}(\sigma)$ to be the vector $X = (x_1, \ldots, x_d)$. This defines a partition of the state space $\mathcal{C} = \cup_X\, \mathcal{C}_X$, where the union is over all partitions of $n$ into $d$ pieces and $\mathcal{C}_X$ is the set of words $\sigma \in \mathcal{C}$ such that $\mathrm{Tr}(\sigma) = X = (x_1, \ldots, x_d)$. The cardinality of the set $\mathcal{C}_X$ is $\binom{2n}{x_1, x_1, \ldots, x_d, x_d}$, the number of distinct words (or permutations) of length $2n$ using the letters with these prescribed multiplicities. The number of sets in the partition of the state space is exactly the number of partitions of $n$ into $d$ pieces, $D = \binom{n+d-1}{d-1}$.

Each restricted Markov chain consists of all the words which have a fixed trace. Hence, transitions in the restricted chains consist of only transpositions, as rotations would change the trace. The projection $\bar{P}$ consists of a simplex containing $D$ vertices, each representing a distinct partition of $n$. Letting $\Phi(X, Y) = \frac{1}{2}\|X - Y\|_1$, two points $X$ and $Y$ are connected by an edge of $\bar{P}$ iff $\Phi(X, Y) = 1$, where $\|\cdot\|_1$ denotes the $\ell_1$ metric. In the following we make no attempt to optimize the running time, and instead simply provide polynomial bounds on the convergence rates.

Step 1. The restricted Markov chains: Consider any of the restricted chains $P_X$ on the set of configurations with trace $X$. We need to show that this simpler chain, connecting pairs of words differing by a transposition of adjacent letters, converges quickly for any fixed trace. We can analyze the transposition moves on this set by mapping $\mathcal{C}_X$ to the set of linear extensions of a particular partial order. Consider the alphabet $\cup_i \{\{a_{i,1}, \ldots, a_{i,x_i}\} \cup \{A_{i,1}, \ldots, A_{i,x_i}\}\}$, and the partial order defined by the relations $a_{i,1} \prec a_{i,2} \prec \cdots \prec a_{i,x_i}$ and $A_{i,1} \prec A_{i,2} \prec \cdots \prec A_{i,x_i}$, for all $i$. It is straightforward to see that there is a bijection between the set of circuits in $\mathcal{C}_X$ and the set of linear extensions of this partial order (mapping $a^{-1}$ to $A$). Furthermore, this bijection preserves transpositions. We appeal to the following theorem due to Bubley and Dyer [2]:

Theorem 5. The transposition Markov chain on the set of linear extensions of a partial order on $n$ elements has mixing time $O(n^4(\log^2 n + \log \varepsilon^{-1}))$.

Referring to Theorem 1, we can derive the following bound.

Corollary 2. The Markov chain $P_X$ has spectral gap $\mathrm{Gap}(P_X) \geq 1/(c n^4 \log^2 n)$ for some constant $c$.

Step 2. The projection of the Markov chain: The states $\bar{\Omega}$ of the projection consist of partitions of $n$ into $d$ pieces, so $|\bar{\Omega}| = D$. The stationary probability of $X = (x_1, \ldots, x_d)$ is $\bar{\pi}(X) \propto \binom{2n}{x_1, x_1, \ldots, x_d, x_d}$, proportional to the number of words with these multiplicities. The Markov kernel is defined by connecting two partitions $X$ and $Y$ if the distance $\Phi(X, Y) = \|X - Y\|_1/2 = 1$. Before applying Corollary 1 we first need to bound the mixing rate of the Markov chain defined by Metropolis probabilities. In particular, if $X = (x_1, \ldots, x_d)$ and $Y = (x_1, \ldots, x_i + 1, \ldots, x_j - 1, \ldots, x_d)$, then
$$P_M(X, Y) = \frac{1}{2n^2} \min\left(1, \frac{\bar{\pi}(Y)}{\bar{\pi}(X)}\right) = \frac{1}{2n^2} \min\left(1, \frac{x_j^2}{(x_i + 1)^2}\right).$$
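A sketch of one move of this projection chain follows. Here the proposal picks an ordered pair of coordinates uniformly, so the proposal constant is $1/d^2$ rather than the $1/(2n^2)$ above; only the Metropolis acceptance ratio is faithful to the text:

```python
import random

def projection_step(X):
    """One Metropolis move on traces X = (x_1, ..., x_d) with sum n:
    move one unit from coordinate j to coordinate i, accepting with
    probability min(1, x_j^2 / (x_i + 1)^2), the ratio pi(Y)/pi(X)
    of the multinomial weights."""
    d = len(X)
    i, j = random.randrange(d), random.randrange(d)
    if i == j or X[j] == 0:
        return X  # nothing to move
    if random.random() < min(1.0, X[j] ** 2 / (X[i] + 1) ** 2):
        Y = list(X)
        Y[i] += 1
        Y[j] -= 1
        return tuple(Y)
    return X
```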

We analyze this Metropolis chain indirectly by first considering a variant $\tilde{P}_M$ which admits a simpler path coupling proof. Using the same Markov kernel, define the transitions
$$\tilde{P}_M(X, Y) = \frac{1}{2n^2 (x_i + 1)^2}.$$
In particular, the $x_i$ in the denominator is the value which would be increased by the rotation. Notice that detailed balance is satisfied:
$$\frac{\bar{\pi}(X)}{\bar{\pi}(Y)} = \frac{(x_i + 1)^2}{x_j^2} = \frac{\tilde{P}_M(Y, X)}{\tilde{P}_M(X, Y)}.$$
This guarantees that $\tilde{P}_M$ has the same stationary distribution as $P_M$, namely $\bar{\pi}$. The mixing rate of this chain can be bounded directly using path coupling. Let $U \subseteq \bar{\Omega} \times \bar{\Omega}$ be pairs of states $X$ and $Y$ such that $\Phi(X, Y) = 1$. We couple by choosing the same pair of indices $i$ and $j$, and the same bit $b \in \{-1, 1\}$, to update each of $X$ and $Y$, where the probability for accepting each of these moves is dictated by the transitions of $\tilde{P}_M$.

Lemma 2. For any pair $(X_t, Y_t) \in U$, the expected change in distance after one step of the coupled chain is
$$\mathrm{E}[\Phi(X_{t+1}, Y_{t+1})] \leq \left(1 - \frac{1}{n^6}\right) \Phi(X_t, Y_t).$$

Proof. If $(X_t, Y_t) \in U$, then there exist coordinates $k$ and $k'$ such that $y_k = x_k + 1$ and $y_{k'} = x_{k'} - 1$. Without loss of generality, assume that $k = 1$ and $k' = 2$. We need to determine the expected change in distance after one step of the coupled chain. Suppose that in this move we try to add 1 to $x_i$ and $y_i$ and subtract 1 from $x_j$ and $y_j$. We consider three cases.

Case 1: If $|\{i, j\} \cap \{1, 2\}| = 0$, then both processes accept the move with the same probability and $\Phi(X_{t+1}, Y_{t+1}) = 1$.

Case 2: If $|\{i, j\} \cap \{1, 2\}| = 1$, then we shall see that the expected change is also zero. Assume without loss of generality that $i = 1$ and $j = 3$, and first consider the case $b = 1$. Then we move from $X$ to $X' = (x_1 + 1, x_2, x_3 - 1, \ldots, x_d)$ with probability $\frac{1}{2n^2(x_1 + 1)^2}$ and from $Y$ to $Y' = (x_1 + 2, x_2 - 1, x_3 - 1, \ldots, x_d)$ with probability $\frac{1}{2n^2(x_1 + 2)^2}$. Since $\tilde{P}_M(X, X') > \tilde{P}_M(Y, Y')$, with probability $\tilde{P}_M(Y, Y')$ we update both $X$ and $Y$; with probability $\tilde{P}_M(X, X') - \tilde{P}_M(Y, Y')$ we update just $X$; and with all remaining probability we update neither. In the first case we end up with $X'$ and $Y'$, in the second we end up with $X'$ and $Y$, and in the final case we stay at $X$ and $Y$. All of these pairs are unit distance apart, so the expected change in distance is zero. If $b = -1$, then $\tilde{P}_M(X, X') = \tilde{P}_M(Y, Y') = \frac{1}{2n^2(x_3 + 1)^2}$ and again the coupling keeps the configurations unit distance apart.

Case 3: If $|\{i, j\} \cap \{1, 2\}| = 2$, then we shall see that the expected change is at most zero. Assume without loss of generality that $i = 1$, $j = 2$ and $b = 1$. The probability of moving from $X$ to $X' = (x_1 + 1, x_2 - 1, \ldots, x_d) = Y$ is $\tilde{P}_M(X, X') = \frac{1}{2n^2(x_1 + 1)^2}$. The probability of moving from $Y$ to $Y' = (x_1 + 2, x_2 - 2, \ldots, x_d)$ is $\tilde{P}_M(Y, Y') = \frac{1}{2n^2(x_1 + 2)^2}$. So with probability $\tilde{P}_M(Y, Y')$ we update both configurations, keeping them unit distance apart, and with probability $\tilde{P}_M(X, X') - \tilde{P}_M(Y, Y') \geq \frac{1}{2n^6}$ we update just $X$, decreasing the distance to zero.

When $b = -1$ the symmetric argument shows that we again have a small chance of decreasing the distance. Summing over all of these possibilities yields the lemma.

The path coupling theorem implies that the mixing time is bounded by $\tau(\varepsilon) = O(n^6 \log n)$. Furthermore, we get the following bound on the spectral gap.

Theorem 6. The Markov chain $\tilde{P}_M$ on $\bar{\Omega}$ has spectral gap $\mathrm{Gap}(\tilde{P}_M) \geq c'/(n^6 \log n)$ for some constant $c'$.

This bounds the spectral gap of the modified Metropolis chain $\tilde{P}_M$, but we can readily compare the spectral gaps of $P_M$ and $\tilde{P}_M$ using Lemma 1. Since all the transitions of $P_M$ are at least as large as those of $\tilde{P}_M$, we find:

Corollary 3. The Markov chain $P_M$ on $\bar{\Omega}$ has spectral gap $\mathrm{Gap}(P_M) \geq c'/(n^6 \log n)$.

Step 3. Putting the pieces together: These bounds on the spectral gaps of the restrictions $P_i$ and the Metropolis projection $P_M$ enable us to apply the decomposition theorem to derive a bound on the spectral gap of $P$, the original chain.

Theorem 7. The Markov chain $P$ is rapidly mixing on $\mathcal{C}$ and the spectral gap satisfies $\mathrm{Gap}(P) \geq c''/(n^{12} d \log^3 n)$, for some constant $c''$.

Proof. To apply Corollary 1, we need to bound the parameters $\beta$ and $\gamma$. We find that $\beta \geq \frac{1}{4nd}$, the minimum probability of a transition. To bound $\gamma$ we need to determine what fraction of the words in $\mathcal{C}_X$ are neighbors of a word in $\mathcal{C}_Y$ when $\Phi(X, Y) = 1$ (since $\pi$ is uniform within each of these sets). If $X = (x_1, \ldots, x_d)$ and $Y = (x_1, \ldots, x_i + 1, \ldots, x_j - 1, \ldots, x_d)$, this fraction is exactly the likelihood that a word in $\mathcal{C}_X$ has an $a_i$ followed by an $a_i^{-1}$, and this is easily determined to be at least $1/n$. Combining $\beta \geq \frac{1}{4nd}$ and $\gamma \geq \frac{1}{n}$ with the bounds from Corollaries 2 and 3, Corollary 1 gives the claimed lower bound on the spectral gap.

4 Other applications of decomposition

The key step to applying the decomposition theorem is finding an appropriate partition of the state space. In most examples a natural choice seems to be to cluster configurations of equal probability together so that the distribution for each of the restricted chains is uniform, or so that the restrictions share some essential feature which will make it easy to bound the mixing rate. In the example of Section 3, the state space is divided into subsets, each representing a partition of $n$ into $d$ parts. It followed that the vertices of the projection formed a $d$-dimensional simplex, where the Markov kernel was formed by connecting vertices which are neighbors in the simplex. We briefly outline two other recent applications of the decomposition theorem where we get other natural graphs for the projection. In the first case the graph defining the Markov kernel of the projection is one-dimensional, and in the second it is a hypercube.

4.1 Independent sets

Our first example is sampling independent sets of a graph according to the Gibbs measure. Recall that $\pi(I) = \gamma^{|I|}/Z_\gamma$, where $\gamma > 0$ is an input parameter and $Z_\gamma$ normalizes the distribution. There has been much activity in studying how to sample independent sets for various values of $\gamma$ using a simple, natural Markov chain based on inserting, deleting or exchanging vertices at each step. Works of Luby and Vigoda [9] and Dyer and Greenhill [5] imply that this chain is rapidly mixing if $\gamma \leq 2/(\Delta - 2)$, where $\Delta$ is the maximum number of neighbors of any vertex in $G$. It was shown by Borgs et al. [1] that this chain is slowly mixing on some graphs for $\gamma$ sufficiently large. Alternatively, Madras and Randall [12] showed that this algorithm is fast for every value of $\gamma$, provided we restrict the state space to independent sets of size at most $n^* = |V|/(2(\Delta + 1))$. This relies heavily on earlier work of Dyer and Greenhill [6] showing that a Markov chain defined by exchanges is rapidly mixing on the set of independent sets of fixed size $k$, whenever $k \leq n^*$.

The decomposition here is quite natural: we partition $\mathcal{I}$, the set of independent sets of $G$, into pieces $\mathcal{I}_k$ according to their size. The restrictions arising from this partition permit exchanges, but disallow insertions or deletions, as these exit the state space of the restricted Markov chain. These are now exactly the Markov chains proven to be rapidly mixing by Dyer and Greenhill (but with slightly greater self-loop probabilities) and hence can also be seen to be rapidly mixing. Consequently, we need only bound the mixing rate of the projection. Here the projection is a one-dimensional graph on $\{0, \ldots, n^*\}$. Further calculation determines that the stationary distribution $\bar{\pi}(k)$ of the projection is unimodal in $k$, implying that the projection is also rapidly mixing. We refer the reader to [12] for details.

4.2 The swapping algorithm

To further demonstrate the versatility and potential of the decomposition method, we review an application of a very different flavor. In recent work, Madras and Zheng [13] show that the swapping algorithm is rapidly mixing for the mean-field Ising model (i.e., the Ising model on the complete graph), as well as for a simpler toy model. Given a graph $G = (V, E)$, the ferromagnetic Ising model consists of a graph $G$ whose vertices represent particles and whose edges represent interactions between particles. A spin configuration is an assignment of spins, either $+$ or $-$, to each of the vertices, where adjacent vertices prefer to have the same spin. Let $J_{x,y} > 0$ be the interaction energy between vertices $x$ and $y$, where $(x, y) \in E$. Let $\sigma \in \Omega = \{+, -\}^V$ be any assignment of $\{+, -\}$ to each of the vertices. The Hamiltonian of $\sigma$ is
$$H(\sigma) = \sum_{(x, y) \in E} J_{x,y}\, 1_{\sigma_x \neq \sigma_y},$$
where $1_A$ is the indicator function which is 1 when the event $A$ is true and 0 otherwise. The probability that the Ising spin state is $\sigma$ is given by the Gibbs distribution:
$$\pi(\sigma) = \frac{e^{-\beta H(\sigma)}}{Z(G)},$$

where $\beta$ is inverse temperature and $Z(G) = \sum_\sigma e^{-\beta H(\sigma)}$. It is well known that at sufficiently low temperatures the distribution is bimodal (as a function of the number of vertices assigned $+$), and any local dynamics will be slowly mixing. The simplest local dynamics, Glauber dynamics, is the Markov chain defined by choosing a vertex at random and flipping the spin at that vertex with the appropriate Metropolis probability.

Simulated tempering, which varies the temperature during the runtime of an algorithm, appears to be a useful way to circumvent this difficulty [8, 14]. The chain moves between $m$ close temperatures that interpolate between the temperature of interest and a very high temperature, where the local dynamics converges rapidly. The swapping algorithm is a variant of tempering, introduced by Geyer [7], where the state space is $\Omega^m$ and each configuration $S = (\sigma_1, \ldots, \sigma_m) \in \Omega^m$ consists of one sample at each temperature. The stationary distribution is $\pi(S) = \prod_{i=1}^m \pi_i(\sigma_i)$, where $\pi_i$ is the distribution at temperature $i$. The transitions of the swapping algorithm consist of two types of moves: with probability $1/2$ choose $i \in_u [m]$ and perform a local update of $\sigma_i$ (using Glauber dynamics at this fixed temperature); with probability $1/2$ choose $i \in_u [m-1]$ and move from $S = (\sigma_1, \ldots, \sigma_m)$ to $S' = (\sigma_1, \ldots, \sigma_{i+1}, \sigma_i, \ldots, \sigma_m)$, i.e., swap configurations $i$ and $i+1$, with the appropriate Metropolis probability. The idea behind the swapping algorithm, and other versions of tempering, is that, in the long run, the trajectory of each Ising configuration will spend equal time at each temperature, potentially greatly speeding up mixing. Experimentally, this appears to overcome obstacles to sampling at low temperatures.

Madras and Zheng show that the swapping algorithm is rapidly mixing on the mean-field Ising model at all temperatures. Let $\Omega^+ \subseteq \Omega$ be the set of configurations that are predominantly $+$, and similarly $\Omega^-$. Define the trace of a configuration $S$ to be $\mathrm{Tr}(S) = (v_1, \ldots, v_m) \in \{+, -\}^m$, where $v_i = +$ if $\sigma_i \in \Omega^+$ and $v_i = -$ if $\sigma_i \in \Omega^-$. The analysis of the swapping chain uses decomposition by partitioning the state space according to the trace. The projection for this decomposition is the $m$-dimensional hypercube, where each vertex represents a distinct trace. The stationary distribution is uniform on the hypercube because, at each temperature, the likelihoods of being in $\Omega^+$ and $\Omega^-$ are equal due to symmetry. Relying on the comparison method, it suffices to analyze the following simplification of the projection: starting at any vertex $V = (v_1, \ldots, v_m)$ in the hypercube, pick $i \in_u [m]$. If $i = 1$, then with probability $1/2$ flip the first bit; if $i > 1$, then with probability $1/2$ transpose $v_{i-1}$ and $v_i$; and with all remaining probability do nothing. This chain is easily seen to be rapidly mixing on the hypercube and can be used to infer a bound on the spectral gap of the projection chain.

To analyze the restrictions, Madras and Zheng first prove that the simple, single-flip dynamics on $\Omega^+$ is rapidly mixing at any temperature; this result is analytical, relying on the fact that the underlying graph is complete for the mean-field model. Using simple facts about Markov chains on product spaces, it can be shown that each of the restricted chains must also be rapidly mixing (even without including any swap moves). Once again decomposition completes the proof of rapid mixing, and we can conclude that the swapping algorithm is efficient on the complete graph.
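For concreteness, one transition of the swapping chain can be sketched as follows (Python; `glauber_step` and `hamiltonian` are assumed supplied for the model at hand, and the function names are ours):

```python
import math
import random

def swapping_step(S, betas, glauber_step, hamiltonian):
    """One move of the swapping chain on Omega^m.

    S     -- list of m configurations, one per temperature
    betas -- the m inverse temperatures
    """
    m = len(S)
    if random.random() < 0.5:
        i = random.randrange(m)      # local update at level i
        S[i] = glauber_step(S[i], betas[i])
    else:
        i = random.randrange(m - 1)  # propose swapping levels i and i+1
        # Metropolis ratio pi_i(s_{i+1}) pi_{i+1}(s_i) / (pi_i(s_i) pi_{i+1}(s_{i+1}))
        # = exp((beta_i - beta_{i+1}) * (H(s_i) - H(s_{i+1})))
        dH = hamiltonian(S[i]) - hamiltonian(S[i + 1])
        if random.random() < min(1.0, math.exp((betas[i] - betas[i + 1]) * dH)):
            S[i], S[i + 1] = S[i + 1], S[i]
    return S
```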

Acknowledgements

I wish to thank Russell Martin for many useful discussions, especially regarding the technical details of Section 3, and Dimitris Achlioptas for improving an earlier draft of this paper.

References

1. C. Borgs, J.T. Chayes, A. Frieze, J.H. Kim, P. Tetali, E. Vigoda, and V.H. Vu. Torpid mixing of some MCMC algorithms in statistical physics. Proc. 40th IEEE Symposium on Foundations of Computer Science.
2. R. Bubley and M. Dyer. Faster random generation of linear extensions. Discrete Mathematics, 201:81-88.
3. R. Bubley and M. Dyer. Path coupling: A technique for proving rapid mixing in Markov chains. Proc. 38th Annual IEEE Symposium on Foundations of Computer Science.
4. P. Diaconis and L. Saloff-Coste. Comparison theorems for reversible Markov chains. Annals of Applied Probability, 3.
5. M. Dyer and C. Greenhill. On Markov chains for independent sets. Journal of Algorithms, 35:17-49.
6. M. Dyer and C. Greenhill. A more rapidly mixing Markov chain for graph colorings. Random Structures and Algorithms, 13.
7. C.J. Geyer. Markov chain Monte Carlo maximum likelihood. Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface (E.M. Keramidas, ed.), Interface Foundation, Fairfax Station.
8. C.J. Geyer and E.A. Thompson. Annealing Markov chain Monte Carlo with applications to ancestral inference. J. Amer. Statist. Assoc.
9. M. Luby and E. Vigoda. Fast convergence of the Glauber dynamics for sampling independent sets. Random Structures and Algorithms, 15.
10. N. Madras and M. Piccioni. Importance sampling for families of distributions. Ann. Appl. Probab., 9.
11. N. Madras and D. Randall. Factoring graphs to bound mixing rates. Proc. 37th Annual IEEE Symposium on Foundations of Computer Science.
12. N. Madras and D. Randall. Markov chain decomposition for convergence rate analysis. Annals of Applied Probability (to appear).
13. N. Madras and Z. Zheng. On the swapping algorithm. Preprint.
14. E. Marinari and G. Parisi. Simulated tempering: a new Monte Carlo scheme. Europhys. Lett.
15. R.A. Martin and D. Randall. Sampling adsorbing staircase walks using a new Markov chain decomposition method. Proceedings of the 41st Symposium on the Foundations of Computer Science (FOCS 2000).
16. R.A. Martin and D. Randall. Disjoint decomposition with applications to sampling circuits in some Cayley graphs. Preprint.
17. D. Randall and P. Tetali. Analyzing Glauber dynamics by comparison of Markov chains. Journal of Mathematical Physics, 41.
18. A.J. Sinclair. Algorithms for random generation & counting: a Markov chain approach. Birkhäuser, Boston, 1993.


More information

DIFFERENTIAL POSETS SIMON RUBINSTEIN-SALZEDO

DIFFERENTIAL POSETS SIMON RUBINSTEIN-SALZEDO DIFFERENTIAL POSETS SIMON RUBINSTEIN-SALZEDO Abstract. In this paper, we give a sampling of the theory of differential posets, including various topics that excited me. Most of the material is taken from

More information

Definition A finite Markov chain is a memoryless homogeneous discrete stochastic process with a finite number of states.

Definition A finite Markov chain is a memoryless homogeneous discrete stochastic process with a finite number of states. Chapter 8 Finite Markov Chains A discrete system is characterized by a set V of states and transitions between the states. V is referred to as the state space. We think of the transitions as occurring

More information

Learning convex bodies is hard

Learning convex bodies is hard Learning convex bodies is hard Navin Goyal Microsoft Research India navingo@microsoft.com Luis Rademacher Georgia Tech lrademac@cc.gatech.edu Abstract We show that learning a convex body in R d, given

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Testing the Expansion of a Graph

Testing the Expansion of a Graph Testing the Expansion of a Graph Asaf Nachmias Asaf Shapira Abstract We study the problem of testing the expansion of graphs with bounded degree d in sublinear time. A graph is said to be an α- expander

More information

Analysis of Markov chain algorithms on spanning trees, rooted forests, and connected subgraphs

Analysis of Markov chain algorithms on spanning trees, rooted forests, and connected subgraphs Analysis of Markov chain algorithms on spanning trees, rooted forests, and connected subgraphs Johannes Fehrenbach and Ludger Rüschendorf University of Freiburg Abstract In this paper we analyse a natural

More information

MONOTONE COUPLING AND THE ISING MODEL

MONOTONE COUPLING AND THE ISING MODEL MONOTONE COUPLING AND THE ISING MODEL 1. PERFECT MATCHING IN BIPARTITE GRAPHS Definition 1. A bipartite graph is a graph G = (V, E) whose vertex set V can be partitioned into two disjoint set V I, V O

More information

Mixing Rates for the Gibbs Sampler over Restricted Boltzmann Machines

Mixing Rates for the Gibbs Sampler over Restricted Boltzmann Machines Mixing Rates for the Gibbs Sampler over Restricted Boltzmann Machines Christopher Tosh Department of Computer Science and Engineering University of California, San Diego ctosh@cs.ucsd.edu Abstract The

More information

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can

More information

Two-coloring random hypergraphs

Two-coloring random hypergraphs Two-coloring random hypergraphs Dimitris Achlioptas Jeong Han Kim Michael Krivelevich Prasad Tetali December 17, 1999 Technical Report MSR-TR-99-99 Microsoft Research Microsoft Corporation One Microsoft

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

Markov Chains on Orbits of Permutation Groups

Markov Chains on Orbits of Permutation Groups Markov Chains on Orbits of Permutation Groups Mathias Niepert Universität Mannheim mniepert@gmail.com Abstract We present a novel approach to detecting and utilizing symmetries in probabilistic graphical

More information

17 : Markov Chain Monte Carlo

17 : Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo

More information

Markov Chain Monte Carlo Inference. Siamak Ravanbakhsh Winter 2018

Markov Chain Monte Carlo Inference. Siamak Ravanbakhsh Winter 2018 Graphical Models Markov Chain Monte Carlo Inference Siamak Ravanbakhsh Winter 2018 Learning objectives Markov chains the idea behind Markov Chain Monte Carlo (MCMC) two important examples: Gibbs sampling

More information

Exact Goodness-of-Fit Testing for the Ising Model

Exact Goodness-of-Fit Testing for the Ising Model Exact Goodness-of-Fit Testing for the Ising Model The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published Publisher

More information

Flip dynamics on canonical cut and project tilings

Flip dynamics on canonical cut and project tilings Flip dynamics on canonical cut and project tilings Thomas Fernique CNRS & Univ. Paris 13 M2 Pavages ENS Lyon November 5, 2015 Outline 1 Random tilings 2 Random sampling 3 Mixing time 4 Slow cooling Outline

More information

Proclaiming Dictators and Juntas or Testing Boolean Formulae

Proclaiming Dictators and Juntas or Testing Boolean Formulae Proclaiming Dictators and Juntas or Testing Boolean Formulae Michal Parnas The Academic College of Tel-Aviv-Yaffo Tel-Aviv, ISRAEL michalp@mta.ac.il Dana Ron Department of EE Systems Tel-Aviv University

More information

Algorithmic Problems Arising in Posets and Permutations

Algorithmic Problems Arising in Posets and Permutations Algorithmic Problems Arising in Posets and Permutations A Thesis Submitted to the Faculty in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science by Shahrzad

More information

Random Dyadic Tilings of the Unit Square

Random Dyadic Tilings of the Unit Square Random Dyadic Tilings of the Unit Square Svante Janson, Dana Randall and Joel Spencer June 24, 2002 Abstract A dyadic rectangle is a set of the form R = [a2 s, (a+1)2 s ] [b2 t, (b+1)2 t ], where s and

More information

Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo

Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo Andrew Gordon Wilson www.cs.cmu.edu/~andrewgw Carnegie Mellon University March 18, 2015 1 / 45 Resources and Attribution Image credits,

More information

The Monte Carlo Method

The Monte Carlo Method The Monte Carlo Method Example: estimate the value of π. Choose X and Y independently and uniformly at random in [0, 1]. Let Pr(Z = 1) = π 4. 4E[Z] = π. { 1 if X Z = 2 + Y 2 1, 0 otherwise, Let Z 1,...,

More information

A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait

A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj Adriana Ibrahim Institute

More information