Randomized and Approximation Algorithms
Leonard Schulman, Georgia Tech, Fall 1997
Scribe: Rajneesh Hegde
Dec. 1997

Rapidly mixing Markov chains

Goal: To sample from interesting probability distributions. We shall start with a few examples.

1. The uniform distribution on $\{0,1\}^n$.
2. The uniform distribution on unbiased random walks.
3. The uniform distribution on permutations of $\{1, \ldots, n\}$.
4. The uniform distribution on spanning trees of a given graph.
5. The uniform distribution on solutions to the knapsack problem. (Given natural numbers $a_1, \ldots, a_n, b$, a solution is $x \in \{0,1\}^n$ such that $\sum_{i=1}^n x_i a_i \le b$.)
6. The uniform distribution on matchings/perfect matchings of a given graph.
7. The uniform distribution on self-avoiding random walks in the $k$-dimensional grid.

Let's now consider a random walk on an undirected graph $G$, where, at each vertex $v$: with probability $1/2$, we stay put; with probability $1/(2d_v)$, we go to each of the neighbors of $v$.

Proposition 1. If $G$ is connected, then this chain has a unique stationary distribution.

Proof: We claim that $\pi_v \propto d_v$, for all $v \in V(G)$, is a stationary distribution. Indeed, up to normalization,
$$(\pi A)_v \;\propto\; \frac{1}{2}\,d_v + \sum_{w \sim v} \frac{1}{2 d_w}\,d_w \;=\; \frac{1}{2}\,d_v + \frac{1}{2}\,d_v \;=\; d_v.$$
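As a quick sanity check (mine, not part of the notes): the following sketch simulates this lazy walk on a small example graph and compares the empirical visit frequencies with $d_v / \sum_u d_u$. The graph and all names are illustrative choices.

```python
import random
from collections import Counter

# Example graph as an adjacency list (an arbitrary choice for illustration).
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}

def lazy_step(v):
    """One step of the walk: stay put w.p. 1/2, else move to a uniform neighbor."""
    if random.random() < 0.5:
        return v
    return random.choice(adj[v])

steps = 200_000
v, visits = 0, Counter()
for _ in range(steps):
    v = lazy_step(v)
    visits[v] += 1

total_degree = sum(len(nbrs) for nbrs in adj.values())
for u in sorted(adj):
    # Empirical frequency vs. the claimed stationary value d_u / (sum of degrees).
    print(u, round(visits[u] / steps, 3), len(adj[u]) / total_degree)
```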
What's left to prove is that there cannot be more than one stationary distribution for this chain. Consider an arbitrary distribution $p$ which is different from the above stationary distribution $\pi$. We shall show that within a certain number $k$ of steps, $L_1(pA^k)$ becomes strictly less than $L_1(p)$, which means that $p$ cannot be a stationary distribution. (By $L_1(p)$, we mean $L_1(p - \pi)$, where $L_1$ is the usual 1-norm.) Let $p' = pA$. We have
$$p'_v = \frac{1}{2}\,p_v + \sum_{w \sim v} \frac{1}{2 d_w}\,p_w.$$
Writing the same expression for $\pi'$ and subtracting from the above, we get
$$p'_v - \pi'_v = \frac{1}{2}\,(p_v - \pi_v) + \sum_{w \sim v} \frac{1}{2 d_w}\,(p_w - \pi_w). \qquad (1)$$
Now
$$L_1(p') = \sum_v |p'_v - \pi'_v| \le \sum_v \Big[\frac{1}{2}\,|p_v - \pi_v| + \sum_{w \sim v} \frac{1}{2 d_w}\,|p_w - \pi_w|\Big] = \sum_v |p_v - \pi_v| = L_1(p),$$
since each term $\frac{1}{2 d_w}|p_w - \pi_w|$ appears once for each of the $d_w$ neighbors of $w$. Now the above inequality will be strict if, for some vertex $v$, the expansion for $p'_v - \pi'_v$ (given by (1)) has two terms of opposite signs. In other words, there should be two vertices $u$ and $w$, at a distance of 1 or 2 from each other, such that $p_u - \pi_u$ and $p_w - \pi_w$ are of opposite signs.

It remains to show that, after a certain number $k$ of steps, $pA^k$ is such that there are two vertices with the above property. Define $V_+ = \{v : p_v - \pi_v > 0\}$ and $V_- = \{v : p_v - \pi_v < 0\}$. Since the distribution $p$ is different from $\pi$, each of these sets is non-empty. Suppose that $p$ is such that there are no two vertices $u$ and $w$ with the above property, that is, the minimum distance between a vertex in $V_+$ and a vertex in $V_-$ is at least 3. We prove that for the distribution $p' = pA$, the set $V'_+$ consists of the set $V_+$ and its "neighborhood", that is, all vertices that are at a distance of at most one from a vertex in $V_+$. Indeed, for any vertex $v$ at a distance of at most one from a vertex in $V_+$, equation (1) shows that $p'_v - \pi'_v$ will be positive (all its terms are nonnegative, since no vertex within distance two of $V_+$ lies in $V_-$, and the term coming from $V_+$ is strictly positive), which means that $v \in V'_+$. Similarly, $V'_-$ consists of the set $V_-$ and its neighborhood. Thus the distance between $V_+$ and $V_-$ decreases by two in each round until it is either 1 or 2, at which point the next step strictly decreases $L_1$. $\Box$

Note that just producing a random sample can be very expensive. (For instance, in the knapsack problem with $a_i = 3b/n$, if we just pick a random $x \in \{0,1\}^n$, it will be infeasible with high probability.)

The Approach: Introduce a graph on the objects to be sampled from, and construct a random walk on that graph which achieves the desired stationary distribution. For efficient sampling, the walk should approach its stationary distribution fairly rapidly. More formally, the walk should be "rapidly mixing". We define the mixing time of a Markov chain $M$ (with transition matrix $A$) as
$$\max_{\text{initial distributions } p}\ \min\{t : L_1(pA^t) \le 1/2\}.$$
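For a small chain this definition can be evaluated directly by iterating $p \mapsto pA$. A sketch, assuming numpy; it uses the fact that the maximum over initial distributions is attained at a point mass ($L_1$ distance to $\pi$ is convex in $p$, and non-increasing in $t$ since $A$ is stochastic).

```python
import numpy as np

# Lazy-walk transition matrix for the 4-vertex example graph used earlier.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
n = len(adj)
A = np.zeros((n, n))
for v, nbrs in adj.items():
    A[v, v] = 0.5                      # stay put with probability 1/2
    for w in nbrs:
        A[v, w] = 0.5 / len(nbrs)      # probability 1/(2 d_v) per neighbor

deg = np.array([len(adj[v]) for v in range(n)], dtype=float)
pi = deg / deg.sum()                   # stationary distribution

def mixing_time(A, pi):
    worst = 0
    for v in range(len(pi)):           # point-mass starts suffice
        p = np.zeros(len(pi))
        p[v] = 1.0
        t = 0
        while np.abs(p - pi).sum() > 0.5:   # L1(p A^t) still above 1/2
            p = p @ A
            t += 1
        worst = max(worst, t)
    return worst

print(mixing_time(A, pi))
```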
The usefulness of the above definition is partly due to

Claim 2. If $t$ is the mixing time of a Markov chain, then $L_1(pA^{kt}) \le 2^{-k}$.

Proof: We proceed by induction on $k$. Write $pA^{(k-1)t}$ as $\pi + f = \pi + f_+ - f_-$, where
$$f_+ = \max\{pA^{(k-1)t} - \pi,\ 0\} \quad\text{and}\quad f_- = \max\{-(pA^{(k-1)t} - \pi),\ 0\}$$
coordinatewise. Note that $L_1(f) = L_1(f_+) + L_1(f_-)$, and $L_1(f_+) = L_1(f_-) = l$ (say), since $f$ sums to zero. By induction, $2l = L_1(f) \le 2^{-(k-1)}$, i.e. $l \le 2^{-k}$. Now,
$$pA^{kt} = pA^{(k-1)t} A^t = (\pi + f_+ - f_-)\,A^t \qquad (2)$$
$$= \Big(\pi + l\,\frac{f_+}{l} - l\,\frac{f_-}{l}\Big)\,A^t \qquad (3)$$
$$= \pi + l\,(\pi + f_{++} - f_{+-}) - l\,(\pi + f_{-+} - f_{--}) \qquad (4)$$
$$= \pi + l\,(f_{++} - f_{+-}) - l\,(f_{-+} - f_{--}) \qquad (5)$$
where
$$f_{++} = \max\Big\{\frac{f_+}{l}A^t - \pi,\ 0\Big\}, \qquad f_{+-} = \max\Big\{-\Big(\frac{f_+}{l}A^t - \pi\Big),\ 0\Big\},$$
and similarly for $f_{-+}$ and $f_{--}$. Since $f_+/l$ is a probability distribution, it follows from the definition of mixing time that
$$L_1(f_{++}) + L_1(f_{+-}) \le \frac{1}{2}.$$
Similarly, $L_1(f_{-+}) + L_1(f_{--}) \le \frac{1}{2}$. Rewriting (5), we have
$$L_1(pA^{kt}) = L_1\big(l(f_{++} - f_{+-}) - l(f_{-+} - f_{--})\big) \le l\big(L_1(f_{++}) + L_1(f_{+-})\big) + l\big(L_1(f_{-+}) + L_1(f_{--})\big) \le \frac{l}{2} + \frac{l}{2} = l = \frac{L_1(f)}{2} \le 2^{-k}. \qquad \Box$$
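A numerical illustration of Claim 2 (my check, not part of the notes): iterate the small lazy walk above for $kt$ steps and compare $L_1(pA^{kt})$ with $2^{-k}$. Here $t = 4$ is an assumed upper bound on the mixing time of that 4-vertex chain.

```python
import numpy as np

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
n = len(adj)
A = np.zeros((n, n))
for v, nbrs in adj.items():
    A[v, v] = 0.5
    for w in nbrs:
        A[v, w] = 0.5 / len(nbrs)
pi = np.array([len(adj[v]) for v in range(n)], dtype=float)
pi /= pi.sum()

t = 4                                   # assumed to be >= the mixing time
p = np.array([1.0, 0.0, 0.0, 0.0])      # a point-mass initial distribution
for k in range(1, 6):
    for _ in range(t):
        p = p @ A
    # Observed L1 distance after k*t steps vs. the 2^{-k} bound of Claim 2.
    print(k, round(np.abs(p - pi).sum(), 5), 2.0 ** -k)
```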
Definition. A search/sampling/counting problem $P$ is called self-reducible if there exists an $n$-bit representation for the objects defined by the problem $P$ (of size $m$) such that, for $i = 0, 1$: the set of objects whose first bit is $i$ is the set of objects of another problem $P_i$, with $|P_i| \le m$, and $P_i$ is polynomial-time computable from $P$.

e.g. - Matchings in a graph $G$, where the representation of a matching is given by one bit for each edge of the graph, with a 1 denoting that the corresponding edge has been included. For a bit corresponding to an edge $e$ with ends $u$, $v$: $P_0$ denotes the set of matchings of $G \setminus e$ and $P_1$ denotes those of $G \setminus \{u, v\}$.

For the knapsack problem, represent a solution by $n$ bits, with a 1 signifying the inclusion of the corresponding object in the knapsack. Here, for the bit corresponding to object $i$: $P_0$ denotes the solutions of the problem with the object $i$ deleted, and $P_1$ denotes the solutions of the problem with object $i$ deleted and a knapsack of capacity $b' = b - a_i$.

Theorem 3. The following statements are polynomial-time equivalent for a self-reducible problem:

1. For all $k$, there exists an algorithm running in time $n^{f(k)}$ that samples from a distribution $p$ on the objects such that $L_1(p - \text{unif.}) \le n^{-k}$.
2. For all $k$, there exists an algorithm running in time $n^{f(k)}$ that estimates the number of objects to within a multiplicative factor $1 + n^{-k}$, except for an error event of probability $\le n^{-k}$.

Proof: Consider the self-reducibility tree of the problem, i.e. a tree where the root denotes the problem itself, its children denote the problems $P_0$ and $P_1$, and so on. The leaves thus represent the objects of the original problem.

Part 1 ⇒ 2: Suppose we are given a sampling algorithm as mentioned in statement 1, i.e. we sample from a distribution with $L_1 \le c\,n^{-(k+1)}$ (throughout, $c$ is used as a generic symbol for some constant). Hence, after a polynomial number of samples, we can estimate $p_0$ (the fraction of leaves in the larger subtree, either $P_0$ or $P_1$) to within an additive error of $c\,n^{-(k+1)}$, except for an error probability $P(\text{err}) \le n^{-(k+1)}$. Consider the following counting algorithm. Go down the self-reducibility tree, computing the probabilities of successive subtrees. At each branch, pursue the heavier subtree. Let $q_i$ be the estimated probability of the larger side at level $i$. The estimator for the number of objects in the problem is
$$\prod_{i=1}^{n} \frac{1}{q_i}.$$
Note that, if none of the estimates fail (i.e. fall outside the additive error bound), then, writing $p_i$ for the true probability of the side pursued at level $i$,
$$\frac{\text{true value}}{\text{estimate}} = \prod_i \frac{q_i}{p_i} \le \prod_i \frac{p_i + c\,n^{-(k+1)}}{p_i} = \prod_i \Big(1 + \frac{c\,n^{-(k+1)}}{p_i}\Big) \le \big(1 + 2c\,n^{-(k+1)}\big)^n \le 1 + n^{-k},$$
using $p_i \ge 1/2$ (we always pursue the heavier subtree) and absorbing constants into $c$. Similar arithmetic shows that the estimate is at least $(1 - n^{-k})$ times the true value. The probability of failure is bounded by the sum of the probabilities of any single estimate failing; thus the total $P(\text{err}) \le n \cdot n^{-(k+1)} = n^{-k}$.
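A sketch of this direction for the knapsack example. The brute-force `uniform_sampler` below stands in for the hypothetical near-uniform sampler of statement 1 (it enumerates all solutions, which is exponential, but keeps the sketch self-contained and runnable); all names are mine, not the notes'.

```python
import itertools
import random

def uniform_sampler(a, b):
    """Stand-in sampler: enumerate all knapsack solutions, pick one uniformly."""
    sols = [x for x in itertools.product([0, 1], repeat=len(a))
            if sum(ai * xi for ai, xi in zip(a, x)) <= b]
    return random.choice(sols)

def count_via_sampling(a, b, samples=2000):
    """Walk down the self-reducibility tree, always pursuing the heavier
    subtree, and return the product of 1/q_i over the levels."""
    a = list(a)
    estimate = 1.0
    while a:
        # Estimate the probability that the first remaining bit is 1.
        q1 = sum(uniform_sampler(a, b)[0] for _ in range(samples)) / samples
        if q1 >= 0.5:
            q, b = q1, b - a[0]        # heavier side is P_1: capacity shrinks
        else:
            q = 1.0 - q1               # heavier side is P_0: same capacity
        a = a[1:]                      # the object is deleted either way
        estimate /= q
    return estimate

a, b = [3, 1, 4, 1, 5], 7
print(count_via_sampling(a, b))
# Exact count, for comparison.
print(sum(1 for x in itertools.product([0, 1], repeat=len(a))
          if sum(ai * xi for ai, xi in zip(a, x)) <= b))
```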
Part 2 ⇒ 1: Suppose we have an algorithm that estimates the number of objects in the problem as mentioned in statement 2 of the theorem. Let $q_0$, $q_1$ be the fractions of the estimates for the left and right subtrees, respectively, of the root. We construct a random sample as follows. Go down from the root either way with probabilities $q_0$ and $q_1$ respectively. Continue this procedure. The leaf that this will eventually lead us to is the random sample that we generate.

The probability that a given estimate is outside the error bound is assumed to be at most $n^{-(k+1)}$; hence the probability that any of them fails is at most $n^{-k}$. In this case the sampled distribution may be arbitrary; this contributes an $n^{-k}$ term to the overall $L_1$ distance of the sampled distribution from the uniform distribution. On the other hand, conditional on no estimate being outside the error bound, we shall show by induction on the level that, for level $l$,
$$L_1(p, q) \le \frac{c\,l}{n^{k+1}},$$
where the root is considered as level 0, and $p$, $q$ are the true and estimated probability distributions, respectively, for the vertices at level $l$.

Let $v$ be a vertex at level $l-1$, and $p_v$, $q_v$ its true and estimated probabilities; let $|p_v - q_v| = r$. Let $q'_0$, $q'_1$ be the estimated fractions of the two subtrees below $v$ (so the estimated probabilities of its children are $q_v q'_0$ and $q_v q'_1$); since the counts are accurate to within a multiplicative factor $1 + n^{-(k+1)}$, each estimated fraction is within an additive $c/n^{k+1}$ of the true fraction. Thus, in the worst case, $p_v$ gets partitioned into
$$p_{v0} = p_v\Big(q'_0 - \frac{c}{n^{k+1}}\Big) \quad\text{and}\quad p_{v1} = p_v\Big(q'_1 + \frac{c}{n^{k+1}}\Big).$$
Combining the above,
$$|p_{v0} - q_v q'_0| \le (q_v + r)\Big(q'_0 + \frac{c}{n^{k+1}}\Big) - q_v q'_0 = q_v\,\frac{c}{n^{k+1}} + r\Big(q'_0 + \frac{c}{n^{k+1}}\Big),$$
and likewise for the other child, so
$$L_1\big(\{p_{v0}, p_{v1}\},\ \{q_v q'_0, q_v q'_1\}\big) \le 2 q_v\,\frac{c}{n^{k+1}} + r\Big(q'_0 + q'_1 + \frac{2c}{n^{k+1}}\Big) \le \frac{c\,q_v}{n^{k+1}} + r,$$
absorbing the lower-order $r \cdot n^{-(k+1)}$ term into the generic constant. The sum of the $r$'s over the vertices at level $l-1$ is $L_1(\text{level } l-1)$, and the $q_v$'s sum to 1. Hence
$$L_1(\text{level } l) \le L_1(\text{level } l-1) + \frac{c}{n^{k+1}} \le (l-1)\,\frac{c}{n^{k+1}} + \frac{c}{n^{k+1}} = \frac{c\,l}{n^{k+1}},$$
completing the induction. Hence $L_1(\text{level } n) \le c\,n^{-k}$. $\Box$
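Conversely, a sketch of this direction for knapsack, with an exact recursive counter standing in for the approximate counter of statement 2. With exact counts the resulting leaf is exactly uniform; the analysis above bounds the extra $L_1$ error when the counts are only $(1 + n^{-(k+1)})$-accurate. All names are mine.

```python
import random

def count(a, b):
    """Exact number of x in {0,1}^n with sum a_i x_i <= b (stand-in counter)."""
    if b < 0:
        return 0
    if not a:
        return 1
    return count(a[1:], b) + count(a[1:], b - a[0])   # first bit 0 or 1

def sample(a, b):
    """Walk the self-reducibility tree, branching with the counters' fractions."""
    x = []
    a = tuple(a)
    while a:
        n0 = count(a[1:], b)           # subtree P_0: object deleted
        n1 = count(a[1:], b - a[0])    # subtree P_1: object taken, capacity shrinks
        bit = 1 if random.random() < n1 / (n0 + n1) else 0
        if bit:
            b -= a[0]
        x.append(bit)
        a = a[1:]
    return x

print(sample((3, 1, 4, 1, 5), 7))
```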