Spectral Graph Theory and Applications, WS 2011/2012
Lecture 6: Random Walks versus Independent Sampling
Lecturer: Thomas Sauerwald & He Sun

For many problems it is necessary to draw samples from some distribution $D$ on a typically large set $V$. In order to do so, one often considers a Markov chain on $V$ whose limiting distribution is $D$. The efficiency of the sampling algorithm requires the Markov chain to converge quickly. Two famous examples where this approach has been applied successfully are approximation algorithms for the permanent of non-negative matrices [SJ89] and for the volume of convex bodies [DFK91].

In these lecture notes, we first see how the convergence of random walks can be related to the second largest eigenvalue of the transition matrix. In the second part, we reveal a more sophisticated property of random walks on expander graphs. Roughly speaking, it turns out that the samples returned by a random walk of length $t$ on an expander are very similar to $t$ independent samples of the vertices of the expander. This property can then be used to reduce the number of required random bits for a broad class of randomized algorithms.

1 Random Walks on Graphs

We now focus on random walks on graphs. We assume $G = (V, E)$ is an undirected, unweighted and connected graph which is also $d$-regular. Recall that $M = D^{-1} A$ is the normalized adjacency matrix, which will be the transition matrix of the random walk on $G$.

Lemma 6.1. Let $M$ be any symmetric transition matrix. Then for any probability vector $x$,
\[ \| M^t x - \pi \| \le \lambda^t, \]
where $\pi = (1/n, \dots, 1/n)$ is the uniform vector and $\lambda = \max\{ |\lambda_2|, |\lambda_n| \}$. In particular, for $t = O(\log n / \log(1/\lambda)) = O(\log n / (1 - \lambda))$, we have $\| M^t x - \pi \|_\infty \le 1/(2n)$, i.e., for all $u, v \in V$,
\[ M^t_{u,v} \in \left[ \tfrac{1}{2n}, \tfrac{3}{2n} \right]. \]

Proof. Let $v_1 = \pi, v_2, \dots, v_n$ be orthonormal eigenvectors of $M$, so that $v_2, \dots, v_n$ are orthogonal to $\pi$. Express $x$ in terms of this basis as
\[ x = \sum_{i=1}^n \alpha_i v_i. \]
Since $x$ is a probability vector and all $v_i$ with $i \ge 2$ are orthogonal to $\pi = (1/n, \dots, 1/n)$, it follows
that $\alpha_1 = 1$. Then,
\begin{align*}
\| M x - \pi \|^2 &= \Big\| M \Big( \sum_{i=1}^n \alpha_i v_i \Big) - \pi \Big\|^2 \\
&= \Big\| \pi + \sum_{i=2}^n \alpha_i \lambda_i v_i - \pi \Big\|^2 \\
&= \Big\| \sum_{i=2}^n \alpha_i \lambda_i v_i \Big\|^2 \\
&= \sum_{i=2}^n \alpha_i^2 \lambda_i^2 && \text{(since the $v_i$'s are orthonormal)} \\
&\le \lambda^2 \sum_{i=2}^n \alpha_i^2 \\
&= \lambda^2 \| x - \pi \|^2.
\end{align*}
Taking square roots yields $\| M x - \pi \| \le \lambda \| x - \pi \|$, so that for any $t \ge 1$,
\[ \| M^t x - \pi \| = \| M (M^{t-1} x) - \pi \| \le \lambda \| M^{t-1} x - \pi \| \le \dots \le \lambda^t \| x - \pi \|. \]
Finally,
\[ \| M^t x - \pi \| \le \lambda^t \| x - \pi \| \le \lambda^t \| x \| \le \lambda^t \| x \|_1 = \lambda^t, \]
where in the second inequality we have used that $\| x - \pi \|^2 + \| \pi \|^2 = \| x \|^2$, since $x - \pi$ and $\pi$ are orthogonal, which immediately implies $\| x - \pi \| \le \| x \|$. $\square$

2 Probability Amplification by Random Walks on Expanders

Suppose we run a randomized algorithm for which there is an unknown (bad) set $B \subseteq V$ on which the algorithm cannot solve the problem. For instance, $V$ could describe all possible choices for the (random) bits the algorithm could use, and $B$ those choices on which it fails. Then after $t$ repetitions of the algorithm we find the correct answer if and only if at least one sample lies outside $B$. While the obvious way to amplify the success probability is to generate $t$ independent samples, there is a more clever and somewhat surprising solution. One performs a $t$-step random walk, with a random starting vertex, on an expander graph with vertex set $V$. Despite the large dependencies between consecutive vertices, it turns out that the probability for the random walk to hit at least one vertex outside $B$ is very close to the probability that we have when we sample all $t$ vertices independently from $V$.
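The mixing bound of Lemma 6.1 can be checked numerically. The following sketch (using numpy; the cycle $C_5$ is merely an arbitrary small test graph, not one fixed by the notes) verifies $\| M^t x - \pi \| \le \lambda^t$ for a walk started at a single vertex:

```python
import numpy as np

# Random walk on the cycle C_5: 2-regular, connected and non-bipartite,
# hence lambda < 1 and the walk converges to the uniform vector pi.
n = 5
A = np.zeros((n, n))
for u in range(n):
    A[u, (u + 1) % n] = A[u, (u - 1) % n] = 1.0
M = A / 2.0                                  # M = D^{-1} A with d = 2

# lambda = max{|lambda_2|, |lambda_n|}: second largest absolute eigenvalue.
lam = np.sort(np.abs(np.linalg.eigvalsh(M)))[-2]

pi = np.full(n, 1.0 / n)
x = np.zeros(n)
x[0] = 1.0                                   # walk starts at vertex 0

for t in range(1, 21):
    x = M @ x
    # ||M^t x - pi|| <= lambda^t, as Lemma 6.1 promises.
    assert np.linalg.norm(x - pi) <= lam ** t + 1e-12

gap = np.linalg.norm(x - pi)                 # distance after 20 steps
```

Since $C_5$ has $\lambda \approx 0.809$, a number of steps of order $\log n / (1 - \lambda)$ already brings the walk close to uniform, in line with the bound on $t$ stated in the lemma.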
Figure 1: Illustration of independent samples and the corresponding random walk. $B$ represents the set of bad inputs that we seek to avoid.

Theorem 6.2 (Ajtai-Komlós-Szemerédi (1987), Alon-Feige-Wigderson-Zuckerman (1995), [AKS87, AFWZ95]). Let $G$ be a $d$-regular graph with $n$ vertices and spectral expansion $\lambda$. Let $B \subseteq V$ with $|B| = \beta n$, and let $X_0, X_1, \dots, X_t$ be a random walk on $G$ whose start vertex $X_0$ is chosen uniformly at random. Then,
\[ \Pr[B] := \Pr[\forall\, 0 \le i \le t \colon X_i \in B] \le (\beta + \lambda)^t. \]

Proof. Define a new matrix $P$ as follows:
\[ P_{u,v} = \begin{cases} 1 & \text{if } u = v \in B, \\ 0 & \text{otherwise.} \end{cases} \]

Lemma 6.3. Let $\pi = (1/n, \dots, 1/n)$ be the uniform vector. Then,
\[ \Pr[B] = \| (PM)^t P \pi \|_1. \]

Proof. Note that $P\pi$ is the vector which is $1/n$ at components corresponding to vertices in $B$ and $0$ otherwise. Moreover, $M^t_{u,v}$ is the probability for a random walk on $G$ starting from $v$ to be located at $u$ at step $t$. Since $P$ is a projection matrix, $[(PM)^t]_{u,v}$ is the same probability as before, but now the random walk is additionally required to use only vertices in $B$. We now argue (slightly) more formally: for every $u \in V$,
\[ \big[ (PM)^t P\pi \big]_u = \sum_{w \in V} \big[ (PM)^t \big]_{u,w} \big[ P\pi \big]_w = \frac{1}{n} \sum_{w \in B} \big[ (PM)^t \big]_{u,w} = \Pr[X_0 \in B, X_1 \in B, \dots, X_t \in B, X_t = u]. \]
Summing over all $u \in V$, the claim of the lemma follows. $\square$

We continue with another lemma.

Lemma 6.4. For any vector $v \in \mathbb{R}^n$, $\| PMPv \| \le (\beta + \lambda) \| v \|$.

Proof. We first note that we can assume that $Pv = v$; otherwise, we replace $v$ by $Pv$, which does not change the left-hand side and can only make the right-hand side smaller. For the same reason, we can also assume that $v$ is non-negative, and by scaling we may also assume that $\sum_i v_i = 1$.
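Lemma 6.3 can be sanity-checked by brute force: on a small graph, the $\ell_1$-norm of $(PM)^t P\pi$ must equal the total probability, summed over all length-$t$ paths, that the walk never leaves $B$. The sketch below (numpy again; the cycle $C_5$ with $B = \{0,1\}$ and $t = 4$ are arbitrary illustrative choices) performs exactly this comparison:

```python
import itertools
import numpy as np

# Lemma 6.3 checked by brute force on the cycle C_5 with B = {0, 1}.
n, t = 5, 4
A = np.zeros((n, n))
for u in range(n):
    A[u, (u + 1) % n] = A[u, (u - 1) % n] = 1.0
M = A / 2.0
B = {0, 1}
P = np.diag([1.0 if v in B else 0.0 for v in range(n)])
pi = np.full(n, 1.0 / n)

# Right-hand side of Lemma 6.3: the l1-norm of (PM)^t P pi.
rhs = np.linalg.norm(np.linalg.matrix_power(P @ M, t) @ (P @ pi), 1)

# Left-hand side: sum the probabilities of all length-t walks inside B.
lhs = 0.0
for path in itertools.product(range(n), repeat=t + 1):
    if all(v in B for v in path):
        p = 1.0 / n                      # uniformly random start vertex
        for i in range(t):
            p *= M[path[i + 1], path[i]] # transition probabilities
        lhs += p

assert abs(lhs - rhs) < 1e-12
```

On this example both sides equal $2 \cdot \frac{1}{5} \cdot (1/2)^4 = 0.025$: a walk confined to two adjacent vertices is forced to alternate between them.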
Hence, we can express $v$ as
\[ Pv = v = \pi + z, \]
where $z = v - \pi$ is orthogonal to $\pi$. With this, we obtain
\[ PMPv = PMv = PM\pi + PMz = P\pi + PMz, \]
and thus
\[ \| PMPv \| \le \| P\pi \| + \| PMz \|. \qquad (1) \]
We now bound the two summands on the right-hand side separately. By the Cauchy-Schwarz inequality,
\[ 1 = \sum_i v_i = \sum_i \mathbf{1}_{i \in B} \, v_i \le \sqrt{\beta n} \, \| v \|. \]
Since $\| P\pi \| = \sqrt{\beta n} \cdot \tfrac{1}{n} = \sqrt{\beta / n}$, it follows that
\[ \| P\pi \| = \frac{\beta}{\sqrt{\beta n}} \le \beta \| v \|. \]
For the other summand, recall that $z$ is orthogonal to $\pi$. Hence, $z = \sum_{i=2}^n c_i v_i$, where $v_i$ is the $i$-th eigenvector of $M$. Moreover, since $P$ is a projection,
\[ \| PMz \| \le \| Mz \| = \Big\| \sum_{i=2}^n c_i \lambda_i v_i \Big\| \le \lambda \Big\| \sum_{i=2}^n c_i v_i \Big\| = \lambda \| z \| \le \lambda \| v \|, \]
where the last inequality holds since $v = \pi + z$ and $z$ is orthogonal to $\pi$, and hence
\[ \| z \|^2 \le \| z \|^2 + \| \pi \|^2 = \| v - \pi \|^2 + \| \pi \|^2 = \| v \|^2. \]
Plugging the two inequalities into equation (1) yields the lemma. $\square$

By combining the two lemmas, we obtain the theorem as follows:
\[ \| (PM)^t P\pi \|_1 \le \sqrt{n} \, \| (PM)^t P\pi \| = \sqrt{n} \, \| (PMP)^t \pi \| \le \sqrt{n} \, (\beta + \lambda)^t \| \pi \| = (\beta + \lambda)^t, \]
where the first inequality is Cauchy-Schwarz, the equality uses $(PM)^t P = (PMP)^t$ (as $P^2 = P$), the second inequality applies Lemma 6.4 $t$ times, and $\| \pi \| = 1/\sqrt{n}$. $\square$

There exist various extensions of Theorem 6.2; for instance, where we only consider a subset of the time-steps in $\{1, \dots, t\}$, or where the set to be avoided changes over time. We refer to [HLW06] and the references therein for further details.

Let us now apply Theorem 6.2 to a probabilistic algorithm $A$ for a language $L \in \mathrm{RP}$ (the class of problems with one-sided error).
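To see Theorem 6.2 in action, one can compute the exact confinement probability $\| (PM)^t P\pi \|_1$ step by step and compare it with the bound $(\beta + \lambda)^t$. The complete graph $K_8$ below is an arbitrary example (not from the notes) with $\lambda = 1/(n-1) = 1/7$ and, for $B = \{0,1\}$, $\beta = 1/4$:

```python
import numpy as np

# Theorem 6.2 checked on the complete graph K_8: its transition matrix has
# eigenvalues 1 and -1/(n-1), so the spectral expansion is lambda = 1/7.
n, t_max = 8, 10
M = (np.ones((n, n)) - np.eye(n)) / (n - 1)   # random walk on K_8
lam = 1 / (n - 1)                             # spectral expansion of K_n
B = {0, 1}
beta = len(B) / n                             # beta = 1/4
P = np.diag([1.0 if v in B else 0.0 for v in range(n)])
pi = np.full(n, 1.0 / n)

probs = []
x = P @ pi                                    # start: uniform, restricted to B
for t in range(1, t_max + 1):
    x = P @ (M @ x)                           # one step, then confine to B
    probs.append(x.sum())                     # = Pr[X_0, ..., X_t all in B]
    assert probs[-1] <= (beta + lam) ** t + 1e-12
```

Here the exact probability equals $(1/4) \cdot (1/7)^t$ and lies far below the bound $(1/4 + 1/7)^t = (11/28)^t$, since $K_8$ mixes in a single step; on graphs with larger $\lambda$ the bound is much tighter.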
Algorithm            | Error Probability | Random Bits
Rand. Algorithm      | 1/4               | k
t Repetitions        | (1/4)^t           | t · k
t-step Random Walk   | (1/4)^t           | k + O(t) · log(d)

Figure 2: Comparison of the methods for probability amplification. If $k$ is a sufficiently large constant, then for any value of $t$, the $t$-step Random Walk algorithm requires fewer random bits to achieve the same error probability as the $t$ repetitions.

Complexity Class RP. The complexity class RP consists of all languages $L$ for which there exists a probabilistic polynomial-time Turing machine $M$ such that
\[ x \in L \implies \Pr[M(x) = 1] \ge \tfrac{3}{4}, \qquad x \notin L \implies \Pr[M(x) = 1] = 0. \]

To decide whether a given input $x$ is in $L$, the algorithm $A$ samples a random string $r \in \{0,1\}^k$ of length $k$ and computes in polynomial time a boolean function $A(x, r)$. If $x \notin L$, then $A(x, r) = 0$ for all $r$. If $x \in L$, then the probability (over $r$) that $A(x, r) = 0$ is at most $\beta = 1/4$.

Now take a graph $G = (V, E)$ with $V = \{0,1\}^k$ which has spectral expansion $\lambda$, and suppose that $\lambda$ is a constant smaller than $1 - \beta = 3/4$. Then the new algorithm $\tilde{A}$ based on random walks is defined as follows:

(1) Pick a vertex $X_0 \in V$ uniformly at random.
(2) Perform a random walk $(X_0, X_1, \dots, X_t)$ of length $t$.
(3) Return $\bigvee_{i=0}^t A(x, X_i)$.

By definition of RP, we only need to consider the case where $x \in L$. Let $B := \{ r \colon A(x, r) = 0 \}$ be the bad set, so that $|B| \le \beta \cdot 2^k$. By Theorem 6.2,
\[ \Pr\big[ \tilde{A} \text{ fails} \big] = \Pr[\forall\, 0 \le i \le t \colon X_i \in B] \le (\beta + \lambda)^t, \]
which is at most $2^{-\Omega(t)}$, since $\beta + \lambda$ is a constant smaller than $1$. Adjusting the constants, we conclude that the error probability is at most $4^{-t}$ if we use $k + O(t) \cdot \log(d)$ random bits.

Complexity Class BPP. The complexity class BPP consists of all languages $L$ for which there exists a probabilistic polynomial-time Turing machine $M$ such that
\[ x \in L \implies \Pr[M(x) = 1] \ge \tfrac{3}{4}, \qquad x \notin L \implies \Pr[M(x) = 0] \ge \tfrac{3}{4}. \]
For this class, we can achieve a similar probability amplification by performing a random walk of length $t$ and returning the majority vote of $A(x, X_0), \dots, A(x, X_t)$. For more details, see [HLW06].
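The amplification procedure $\tilde{A}$ can be sketched schematically as follows. Everything in this sketch is a toy stand-in rather than the construction from the notes: `run_A` plays the role of the RP algorithm $A(x, \cdot)$, `BAD` is its (normally unknown) bad set of density $\beta = 1/4$, and the 4-regular circulant graph is a crude hypothetical substitute for a genuine expander on $\{0,1\}^k$:

```python
import random

random.seed(0)   # fixed seed: the whole demonstration is deterministic

# Toy stand-ins for the RP setting: V = {0,...,2^k - 1} encodes the random
# strings, and BAD is the set of strings on which A errs (density 1/4).
k = 10
n = 1 << k                                   # |V| = 2^k
BAD = set(random.sample(range(n), n // 4))

def run_A(r):
    # One-sided error: for x in L, the answer is 1 unless r is bad.
    return 0 if r in BAD else 1

def neighbors(u):
    # 4-regular circulant graph (crude stand-in for an expander).
    return [(u + s) % n for s in (1, -1, n // 3, -(n // 3))]

def amplified(t):
    u = random.randrange(n)                  # costs k truly random bits
    outputs = [run_A(u)]
    for _ in range(t):                       # log2(4) = 2 bits per step
        u = random.choice(neighbors(u))
        outputs.append(run_A(u))
    return max(outputs)                      # OR of the individual runs
```

Each call `amplified(t)` errs only if the entire walk stays inside `BAD`, and it consumes $k$ random bits for the start vertex plus $2$ bits per step, i.e. $k + 2t$ bits in total, instead of the $(t+1) \cdot k$ bits that independent repetitions would need.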
References

[AFWZ95] Noga Alon, Uriel Feige, Avi Wigderson, and David Zuckerman. Derandomized graph products. Computational Complexity, 5(1):60–75, 1995.

[AKS87] Miklós Ajtai, János Komlós, and Endre Szemerédi. Deterministic simulation in LOGSPACE. In Proceedings of the 19th Annual ACM Symposium on Theory of Computing, pages 132–140, 1987.
[DFK91] Martin E. Dyer, Alan M. Frieze, and Ravi Kannan. A random polynomial-time algorithm for approximating the volume of convex bodies. Journal of the ACM, 38(1), 1991.

[HLW06] Shlomo Hoory, Nathan Linial, and Avi Wigderson. Expander graphs and their applications. Bulletin (New Series) of the American Mathematical Society, 43(4):439–561, 2006.

[SJ89] Alistair Sinclair and Mark Jerrum. Approximate counting, uniform generation and rapidly mixing Markov chains. Information and Computation, 82(1):93–133, 1989.