Lecture 8: Spectral Graph Theory III


A Theorist's Toolkit (CMU 18-859T, Fall 13)

Lecture 8: Spectral Graph Theory III
October 2013
Lecturer: Ryan O'Donnell                                    Scribe: Jeremy Karp

1 Recap

Last time we showed that every function f : V → R is uniquely expressible as

    f = f̂(0)φ_0 + f̂(1)φ_1 + ... + f̂(n−1)φ_{n−1}.

In this representation, the φ_i's form an orthonormal basis of eigenvectors of L. We can also compute the eigenvalues of L and order them as

    0 = λ_0 ≤ λ_1 ≤ ... ≤ λ_{n−1}.

The number of eigenvalues equal to 0 is the number of connected components of the graph. Also recall that if we apply L to any function f,

    Lf = λ_0 f̂(0)φ_0 + λ_1 f̂(1)φ_1 + ... + λ_{n−1} f̂(n−1)φ_{n−1}.

It also follows that

    ⟨f, g⟩ = Σ_{i=0}^{n−1} f̂(i)ĝ(i).

A corollary of this is

    ξ[f] = ⟨f, Lf⟩ = Σ_{i>0} λ_i f̂(i)².

We also have E[f²] = Σ_i f̂(i)² and E[f] = ⟨f, 1⟩ = f̂(0). By orthonormality, everything in this last inner product drops out except the first term, so

    Var[f] = E[f²] − E[f]² = Σ_{i>0} f̂(i)².
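These recap identities are easy to sanity-check numerically. The sketch below is my own illustration, not part of the lecture; it assumes the 5-cycle, which is 2-regular, so π is uniform and L = I − A/d is symmetric with ⟨f, g⟩ = (1/n)Σ f(u)g(u).

```python
import numpy as np

# Toy example (my choice): the 5-cycle, 2-regular, so pi is uniform and
# L = I - A/d is symmetric with the lecture's normalized eigenvalues.
n, d = 5, 2
A = np.zeros((n, n))
for u in range(n):
    A[u, (u + 1) % n] = A[(u + 1) % n, u] = 1
L = np.eye(n) - A / d

lam, Q = np.linalg.eigh(L)            # eigenvalues in ascending order
phi = np.sqrt(n) * Q                  # orthonormal w.r.t. <f,g> = (1/n) sum f g

assert np.sum(np.abs(lam) < 1e-9) == 1   # one zero eigenvalue: G is connected

rng = np.random.default_rng(0)
f = rng.standard_normal(n)
fhat = phi.T @ f / n                  # Fourier coefficients fhat(i) = <f, phi_i>

var_f = np.mean(f**2) - np.mean(f)**2
xi_f = f @ L @ f / n                  # xi[f] = <f, Lf>
assert abs(var_f - np.sum(fhat[1:]**2)) < 1e-9
assert abs(xi_f - np.sum(lam[1:] * fhat[1:]**2)) < 1e-9
```

The two assertions at the end are exactly Var[f] = Σ_{i>0} f̂(i)² and ξ[f] = Σ_{i>0} λ_i f̂(i)² from the recap.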

2 Conductance and the second eigenvalue

Now we can observe that

    min_{Var[f]=1} ξ[f] = λ_1.

This comes from the observation that the variance is a sum of non-negative numbers f̂(i)² (for i > 0) that add up to 1, while ξ[f] is the sum of these same numbers multiplied by the eigenvalues λ_i. The smallest this sum can be is the smallest of these eigenvalues, λ_1. Another way to state this idea is as the Poincaré inequality:

    λ_1 · Var[f] ≤ ξ[f].

Note that λ_1 is referred to as the second eigenvalue, despite its index being 1.

We care about the relationship between Var[f] and ξ[f] because it relates to conductance. Recall from last time the definition of conductance:

Definition 2.1. The conductance of S ⊆ V is

    Φ(S) = Pr_{u→v}[v ∉ S | u ∈ S] = ξ[1_S] / E[1_S].

We can relate this to the sparsest cut in the graph by defining the conductance of a graph:

Definition 2.2. Φ_G, the conductance of G, is

    Φ_G = min_{S : 0 < vol(S) ≤ 1/2} Φ(S).

Note that a lower bound on the conductance is

    (1/2) · min_{S : 0 < vol(S) ≤ 1/2} { ξ[1_S] / Var[1_S] },

because Var[1_S] = vol(S)(1 − vol(S)) ≥ (1/2)vol(S) = (1/2)E[1_S]. As we saw in Section 1, we can similarly express the second eigenvalue as the following minimization problem:

    λ_1 = min_{f : V → R, Var[f] ≠ 0} { ξ[f] / Var[f] }.

This is a relaxation of the optimization problem that we solved to determine the conductance of G: we now optimize over a larger class of functions. Interestingly, we can compute λ_1 efficiently, but it is NP-hard to determine the conductance of a graph. In the next section, we relate these two values to each other using Cheeger's inequality.
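To see the relaxation concretely, the sketch below (my own example on the 6-cycle, not from the lecture) brute-forces ξ[1_S]/Var[1_S] over all indicator functions of proper nonempty sets and checks that λ_1, the minimum of the same ratio over all functions, is no larger.

```python
import numpy as np
from itertools import combinations

# The relaxation in action on the 6-cycle (assumed example): lambda_1 minimizes
# xi[f]/Var[f] over ALL f, so it can only be smaller than the minimum of the
# same ratio restricted to indicator functions 1_S.
n, d = 6, 2
A = np.zeros((n, n))
for u in range(n):
    A[u, (u + 1) % n] = A[(u + 1) % n, u] = 1
L = np.eye(n) - A / d
lam1 = np.linalg.eigvalsh(L)[1]       # second-smallest eigenvalue of L

def ratio(S):                          # xi[1_S] / Var[1_S]
    one_S = np.array([1.0 if u in S else 0.0 for u in range(n)])
    xi = one_S @ L @ one_S / n
    var = np.mean(one_S**2) - np.mean(one_S)**2
    return xi / var

best = min(ratio(set(S)) for k in range(1, n)
                         for S in combinations(range(n), k))
assert lam1 <= best + 1e-9
```

On the 6-cycle λ_1 = 1 − cos(2π/6) = 1/2, while the best indicator ratio (a half-cycle) is 2/3, so the relaxation is strict here.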

3 Cheeger's inequality

The following theorem is called Cheeger's inequality:

Theorem 3.1 (Alon, 1986).

    (1/2)λ_1 ≤ Φ_G ≤ 2√(λ_1).

Proof. The first inequality is a consequence of our phrasing of λ_1 as a minimization problem. Specifically, we have

    (1/2)λ_1 = (1/2) min_{f : V → R, Var[f] ≠ 0} { ξ[f] / Var[f] }
             ≤ (1/2) min_{S : 0 < vol(S) ≤ 1/2} { ξ[1_S] / Var[1_S] }
             ≤ Φ_G.

The second inequality is proved by Theorem 3.2.

Observe that a small λ_1 means there exists a function f for which ξ[f] is small compared to Var[f]. A natural question to ask is whether we can convert such a function into a {0,1}-valued function, to which Φ_G would apply. This conversion is where we lose a square-root factor in the second inequality.

Theorem 3.2. Let f be a non-constant function such that ξ[f] ≤ λ_1 Var[f]. Then there exists S ⊆ V such that 0 < vol(S) ≤ 1/2 and

    Φ(S) ≤ 2√(λ_1).

Figure 1: An illustration of how S is determined by function f and value t

In fact, there exists a set S with these properties of the form S = {u : f(u) ≥ t} or S = {u : f(u) < t} for some value t. This result allows us to approximately solve the sparsest cut problem. However, we need to do some more work before we are ready to prove Theorem 3.2.
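The threshold sets in Theorem 3.2 suggest the standard "sweep cut" procedure: sort the vertices by their f-value and try every prefix. The sketch below is my own illustration (the barbell-style graph of two triangles joined by a bridge, and the sweep itself, are assumptions, not from the lecture); it checks that the best sweep cut of the second eigenfunction obeys Φ(S) ≤ 2√(λ_1).

```python
import numpy as np

# Assumed example: two triangles joined by a bridge; the sparsest cut is
# the bridge. General degrees, so vol(S) is measured in edge endpoints.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1
deg = A.sum(1)
m2 = deg.sum()                          # 2m = total degree
Dih = np.diag(1 / np.sqrt(deg))
N = np.eye(n) - Dih @ A @ Dih           # symmetric normalized Laplacian
lam, Q = np.linalg.eigh(N)
lam1 = lam[1]
f = Dih @ Q[:, 1]                       # eigenfunction of L = I - D^{-1} A

def phi(S):                             # conductance of the smaller-volume side
    cut = sum(1 for u in S for v in range(n) if A[u, v] and v not in S)
    vol = deg[list(S)].sum()
    return cut / min(vol, m2 - vol)

order = np.argsort(f)                   # sweep: prefixes in f-order
best = min(phi(set(order[:k])) for k in range(1, n))
assert best <= 2 * np.sqrt(lam1) + 1e-9
```

Here the sweep recovers the bridge cut (conductance 1/7), comfortably below the 2√(λ_1) guarantee.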

Definition 3.3. g : V → R is convenient if g ≥ 0 and vol({u : g(u) ≠ 0}) ≤ 1/2.

We claim without proof that we may assume f is convenient in our proof of Theorem 3.2, losing only a factor of 2 in the inequality; i.e., we may assume ξ[f] ≤ 2λ_1 Var[f]. A reasonable question to ask about the cut-off value t discussed earlier is: on average over t, how big is the boundary of the set we obtain? This question is answered by the following lemma:

Lemma 3.4 (Coarea formula). For a non-negative function g : V → R⁺,

    ∫_0^∞ Pr_{u∼v}[(u, v) crosses the g_{>t} boundary] dt = E_{u∼v}[|g(u) − g(v)|],

where g_{>t} = {u : g(u) > t}.

Proof.

    ∫_0^∞ Pr_{u∼v}[(u, v) crosses the g_{>t} boundary] dt
      = ∫_0^∞ Pr_{u∼v}[t is between g(u), g(v)] dt
      = E_{u∼v}[ ∫_0^∞ 1{t is between g(u), g(v)} dt ]
      = E_{u∼v}[|g(u) − g(v)|].

Figure 2: We see in this illustration that the proof of the coarea formula relies on the non-negativity of g

We also have the following:

Corollary 3.5. Let g be convenient. Then E_{u∼v}[|g(u) − g(v)|] ≥ 2 Φ_G E_{u∼π}[g(u)].

In the proof below, note that ξ[1_S] = Pr_{u∼v}[u ∈ S, v ∉ S].
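Since the coarea formula is an exact identity, it can be checked by direct computation. Below is a minimal sketch (the 5-cycle and the particular g are my own choices): for an integer-valued g, the integral over t reduces to a finite sum over integer thresholds.

```python
import numpy as np

# Coarea formula check on the 5-cycle with an assumed non-negative g.
n, d = 5, 2
A = np.zeros((n, n))
for u in range(n):
    A[u, (u + 1) % n] = A[(u + 1) % n, u] = 1
g = np.array([0.0, 2.0, 1.0, 3.0, 0.0])

# u ~ v: uniform ordered adjacent pair (equivalent, for a regular graph,
# to a uniform vertex plus a uniform neighbor)
pairs = [(u, v) for u in range(n) for v in range(n) if A[u, v]]
lhs = np.mean([abs(g[u] - g[v]) for u, v in pairs])   # E_{u~v}|g(u)-g(v)|

# integral of the crossing probability; exact because g is integer-valued,
# so the integrand is constant on each interval [t, t+1)
rhs = sum(np.mean([(g[u] > t) != (g[v] > t) for u, v in pairs])
          for t in range(int(g.max())))
assert abs(lhs - rhs) < 1e-12
```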

Proof.

    E_{u∼v}[|g(u) − g(v)|] = ∫_0^∞ 2 ξ[1_{g_{>t}}] dt
      = ∫_0^∞ 2 Φ(g_{>t}) vol(g_{>t}) dt
      ≥ 2 Φ_G ∫_0^∞ Pr_{u∼π}[u ∈ g_{>t}] dt
      = 2 Φ_G ∫_0^∞ Pr_{u∼π}[g(u) > t] dt
      = 2 Φ_G E_{u∼π}[g(u)].

Note that the inequality in this proof relies on the fact that g is convenient: vol(g_{>t}) ≤ 1/2 for every t > 0, so Φ(g_{>t}) ≥ Φ_G.

We can now use this corollary to prove Theorem 3.2.

Proof of Theorem 3.2. For any convenient g that is not identically 0, Corollary 3.5 gives

    Φ_G ≤ E_{u∼v}[|g(u) − g(v)|] / (2 E_{u∼π}[g(u)]).

We have a convenient function f such that ξ[f] ≤ 2λ_1 Var[f] ≤ 2λ_1 E[f²]. Let g = f². Then g is still non-negative and its support is the same as the support of f, so g is also convenient. Now,

    E_{u∼v}[|f(u)² − f(v)²|] = E_{u∼v}[|f(u) − f(v)| · |f(u) + f(v)|]
      ≤ √(E_{u∼v}[(f(u) − f(v))²]) · √(E_{u∼v}[(f(u) + f(v))²])      (Cauchy–Schwarz)
      = √(2 ξ[f]) · √(E_{u∼v}[(f(u) + f(v))²])
      ≤ √(4 λ_1 E[f²]) · √(2 E_{u∼v}[f(u)² + f(v)²])
      = √(4 λ_1 E[f²]) · √(4 E[f²])
      = 4 √(λ_1) · E[f²].

We then combine this calculation with the previous one to prove the theorem:

    Φ_G ≤ E_{u∼v}[|g(u) − g(v)|] / (2 E[g(u)])
        = E_{u∼v}[|f(u)² − f(v)²|] / (2 E[f(u)²])
        ≤ 4 √(λ_1) E[f²] / (2 E[f²])
        = 2 √(λ_1).
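Both sides of Theorem 3.1 can be verified by brute force on small graphs. The experiment below is my own (random 7-vertex graphs, π(u) proportional to deg(u), volumes measured in edge endpoints); it checks (1/2)λ_1 ≤ Φ_G ≤ 2√(λ_1) on 20 random connected samples.

```python
import numpy as np
from itertools import combinations

# Assumed experiment: brute-force Phi_G on small random connected graphs
# and compare against lambda_1 of the normalized Laplacian.
rng = np.random.default_rng(1)
checked = 0
while checked < 20:
    n = 7
    U = np.triu(rng.random((n, n)) < 0.5, 1).astype(float)
    A = U + U.T
    deg = A.sum(1)
    if deg.min() == 0 or (np.linalg.matrix_power(A + np.eye(n), n) == 0).any():
        continue                      # skip disconnected samples
    Dih = np.diag(1 / np.sqrt(deg))
    lam1 = np.linalg.eigvalsh(np.eye(n) - Dih @ A @ Dih)[1]
    m2 = deg.sum()                    # 2m
    phis = []
    for k in range(1, n):
        for S in combinations(range(n), k):
            vol = deg[list(S)].sum()
            if vol > m2 / 2:
                continue              # only sets with vol(S) <= 1/2
            cut = sum(A[u, v] for u in S for v in range(n) if v not in S)
            phis.append(cut / vol)
    phi_G = min(phis)
    assert lam1 / 2 <= phi_G + 1e-9   # easy direction of Cheeger
    assert phi_G <= 2 * np.sqrt(lam1) + 1e-9   # hard direction
    checked += 1
```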

Before moving on, let's consider more specifically how we can use Theorem 3.1 to efficiently compute a set S of approximately minimal conductance in G. Given G, it is possible to compute a value λ̃_1 and a function f : V → R such that:

a) λ_1 ≤ λ̃_1 ≤ λ_1 + ε, and b) ξ[f] ≤ λ̃_1 Var[f],

in time either:

i) Õ(n) · (1/ε), using the power method, or
ii) poly(n) · log(1/ε), using more traditional linear-algebra methods.

Note that in some sense, (i) is not a polynomial-time algorithm, since the input length of ε is roughly log(1/ε). That said, since the Cheeger algorithm only gives an approximation to the sparsest cut anyway, maybe you don't mind if ε is rather big. (ii) is actually a polynomial-time algorithm, but the polynomial in n isn't very good (perhaps n⁴), so in practice it might not be a great algorithm to run.

4 Convergence to the stationary distribution

So far, we have proved Cheeger's inequality, which provides a method for finding the sparsest cut in a graph up to a square-root factor. Cheeger's inequality also implies that λ_1 is large if and only if there does not exist a set of small conductance. This observation relates to the speed of convergence to π, the stationary distribution. By this we mean: if we start at any vertex in the graph and take a random walk, how long does it take for the probability distribution of our location to converge to π? We will see that if λ_1 is large, then convergence occurs quickly. Graphs where λ_1 is large are called expander graphs.

If we want to determine the speed of convergence, we should first ask what it means for two distributions to be close. We first observe that

    Kφ_i = (I − L)φ_i = φ_i − λ_i φ_i = (1 − λ_i)φ_i,

where K = I − L is the random-walk operator. Then let κ_i = 1 − λ_i, which gives us the ordering 1 = κ_0 ≥ κ_1 ≥ ... ≥ κ_{n−1} ≥ −1. Then if λ_1 is large, κ_1 has a large gap from 1.
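A minimal sketch of option (i), the power method, follows (my own illustration; the 8-cycle and the iteration count are assumptions, not from the lecture). Iterating the lazy walk operator on a vector that is kept orthogonal to the constant function converges to the second eigenvector, and the Rayleigh quotient then gives λ_1.

```python
import numpy as np

# Power-method sketch for lambda_1 on an assumed regular example (C8).
n, d = 8, 2
A = np.zeros((n, n))
for u in range(n):
    A[u, (u + 1) % n] = A[(u + 1) % n, u] = 1
M = 0.5 * (np.eye(n) + A / d)          # lazy walk operator, eigenvalues in [0,1]

rng = np.random.default_rng(2)
x = rng.standard_normal(n)
x -= x.mean()                           # project off the constant eigenvector
for _ in range(2000):
    x = M @ x
    x -= x.mean()                       # guard against numerical drift
    x /= np.linalg.norm(x)
mu1 = x @ M @ x                         # Rayleigh quotient: 2nd eigenvalue of M
lam1_est = 2 * (1 - mu1)                # M = (1/2)(I + K), so lambda = 2(1 - mu)
lam1_true = np.linalg.eigvalsh(np.eye(n) - A / d)[1]
assert abs(lam1_est - lam1_true) < 1e-6
```

In practice one would stop after O((1/λ_1) log(n/ε)) iterations rather than a fixed 2000; the fixed count is just to keep the sketch simple.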

4.1 Lazy graphs

Unfortunately, when a graph is bipartite we never converge to the stationary distribution, because the number of steps we have taken determines which side of the bipartition we must be on. Fortunately, we can fix this issue by introducing lazy graphs.

Definition 4.1. The lazy version of a graph G = (V, E) is G_L = (V, E ∪ S), where S is a set of self-loops such that each degree-d node has d self-loops in S.

Lazy versions of graphs have the property that at each step of the random walk, with probability 1/2 we stay at the same vertex, and with probability 1/2 we take a random step according to the same probability distribution as in the original graph. This fixes the issue we had with bipartite graphs, while keeping many of the important properties of the original graph. In the lazy version of our graph, the random-walk operator is (1/2)I + (1/2)K, and if we apply this operator to φ_i we get

    ((1/2)I + (1/2)K)φ_i = (1/2)φ_i + (1/2)κ_i φ_i = ((1/2) + (1/2)κ_i)φ_i.

So in the lazy version of the graph, φ_0, ..., φ_{n−1} are still eigenvectors, but each eigenvalue κ_i becomes (1/2) + (1/2)κ_i, which is between 0 and 1. Similarly, the eigenvalues of the new Laplacian are also between 0 and 1.

4.2 Closeness of two distributions

We now return to the question of how to measure closeness of distributions.

Definition 4.2. A probability density is a function ψ : V → R such that ψ ≥ 0 and E_{u∼π}[ψ(u)] = 1.

A probability density is sort of like a relative density with respect to π. More formally, a probability density function ψ corresponds to a probability distribution D_ψ where Pr_{D_ψ}[u] := ψ(u)π(u). For example, if ψ ≡ 1 then D_ψ = π.

Fact 4.3.

    ⟨ψ, f⟩ = E_{u∼π}[ψ(u)f(u)] = Σ_u π(u)ψ(u)f(u) = Σ_u D_ψ(u)f(u) = E_{u∼D_ψ}[f(u)].
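The bipartite obstruction of Section 4.1 and its lazy fix are easy to see numerically. In this sketch (my own example on the 4-cycle, which is bipartite), the plain walk started at a vertex keeps all its mass on one side of the bipartition forever, while the lazy walk converges to π.

```python
import numpy as np

# Assumed example: bipartite C4 with sides {0,2} and {1,3}.
n = 4
A = np.zeros((n, n))
for u in range(n):
    A[u, (u + 1) % n] = A[(u + 1) % n, u] = 1
K = A / 2                             # walk matrix (row-stochastic)
M = 0.5 * (np.eye(n) + K)             # lazy version

p0 = np.zeros(n); p0[0] = 1.0
plain = p0 @ np.linalg.matrix_power(K, 101)   # odd number of steps
lazy = p0 @ np.linalg.matrix_power(M, 101)

# after an odd number of plain steps, all mass sits on the side {1,3}
assert abs(plain[0]) < 1e-12 and abs(plain[2]) < 1e-12
# the lazy walk is (numerically) at the uniform stationary distribution
assert np.allclose(lazy, 0.25, atol=1e-9)
```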

Additionally, the density corresponding to the probability distribution placing all probability on a single vertex u is ψ_u = (1/π(u)) · 1_u. We can now ask: is D_ψ close to π?

Definition 4.4. The χ²-divergence of D_ψ from π is

    d_{χ²}(D_ψ, π) = Var_π[ψ] = E_{u∼π}[(ψ(u) − 1)²] = ‖ψ − 1‖² = Σ_i ψ̂(i)² − ψ̂(0)² = Σ_{i>0} ψ̂(i)².

This definition makes some intuitive sense, because how far ψ is from 1 at each point is an indicator of how similar D_ψ is to π. Going forward, we will use χ²-divergence as our measure of closeness, even though it has some strange properties. One unusual feature, in particular, is that this measure of closeness is not symmetric. We will also present an alternative closeness measure, total variation distance.

Definition 4.5. The total variation distance between D_ψ and π is

    d_TV(D_ψ, π) = (1/2) Σ_{u∈V} |D_ψ(u) − π(u)|.

Note that total variation distance is symmetric and lies between 0 and 1. The following fact illustrates that the two measures of closeness are related:

Fact 4.6.

    d_TV(D_ψ, π) = (1/2) Σ_{u∈V} |D_ψ(u) − π(u)|
      = (1/2) Σ_{u∈V} π(u) · |D_ψ(u)/π(u) − 1|
      = (1/2) E_{u∼π}[|ψ(u) − 1|]
      ≤ (1/2) √(E_{u∼π}[(ψ(u) − 1)²])
      = (1/2) √(d_{χ²}(D_ψ, π)).

We are now interested in the question of how close we are to π if we take a random walk for t steps from some pre-determined starting vertex u. To answer this, suppose ψ is a density. Then Kψ is some function.
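Fact 4.6's comparison of the two closeness measures can be spot-checked numerically. The sketch below is my own experiment (random distributions on 10 points, not from the lecture); it verifies that the relative density ψ = D/π has E_π[ψ] = 1 and that d_TV ≤ (1/2)√(d_{χ²}) on every sample.

```python
import numpy as np

# Assumed experiment: d_TV <= (1/2) sqrt(chi^2-divergence) on random D vs pi.
rng = np.random.default_rng(3)
n = 10
pi = rng.random(n); pi /= pi.sum()
for _ in range(100):
    D = rng.random(n); D /= D.sum()
    psi = D / pi                                  # density of D relative to pi
    assert abs((pi * psi).sum() - 1) < 1e-12      # E_pi[psi] = 1
    chi2 = (pi * (psi - 1) ** 2).sum()            # Var_pi[psi]
    tv = 0.5 * np.abs(D - pi).sum()
    assert tv <= 0.5 * np.sqrt(chi2) + 1e-12      # Fact 4.6
```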

Fact 4.7. ⟨Kψ, 1⟩ = ⟨ψ, K1⟩ = ⟨ψ, 1⟩ = E[ψ] = 1.

Then Kψ is also a density, because its expectation is 1 and it is non-negative.

Theorem 4.8. D_{Kψ} is the probability distribution obtained by drawing a vertex from D_ψ and then taking one random step.

Proof sketch. We use Fact 4.3 about ⟨ψ, f⟩:

    ⟨Kψ, f⟩ = ⟨ψ, Kf⟩ = E_{u∼D_ψ}[(Kf)(u)] = E_{u∼D_ψ}[E_{v∼u}[f(v)]] = E_{u∼D_{Kψ}}[f(u)].

It is also true that D_{K^tψ} corresponds to taking t random steps. Note also that the operator K^t satisfies K^tφ_i = κ_i^t φ_i; i.e., the eigenvectors of K^t are still 1 = φ_0, φ_1, ..., φ_{n−1}, and the associated eigenvalues are κ_i^t. Let us henceforth assume that G is made lazy, so that κ_i ≥ 0 for all i. Thus we have 1 = κ_0^t ≥ κ_1^t ≥ ... ≥ κ_{n−1}^t ≥ 0. (Note that if we hadn't done that, then maybe κ_{n−1} = −1 and so |κ_{n−1}^t| = 1!)

We also saw that the χ²-divergence of D_ψ from π is Var[ψ], and that this upper-bounds d_TV(D_ψ, π) as in Fact 4.6; i.e., it is indeed a reasonable way to measure closeness to π. Now we can do a simple calculation:

    d_{χ²}(D_{K^tψ}, π) = Var[K^tψ]
      = Σ_{i>0} (κ_i^t ψ̂(i))²
      = Σ_{i>0} κ_i^{2t} ψ̂(i)²
      ≤ max_{i>0} {κ_i^{2t}} · Σ_{i>0} ψ̂(i)²
      = κ_1^{2t} Var[ψ]
      = κ_1^{2t} d_{χ²}(D_ψ, π).

Thus if we do a random walk in G with the starting vertex chosen according to D_ψ, the χ²-divergence from π of our distribution at time t literally just goes down exponentially in

t, with the base of the exponent being κ_1². In particular, if κ_1 has a big gap from 1, say κ_1 = 1 − λ_1 where λ_1 (the second eigenvalue of L) is large, then the exponential decay is (1 − λ_1)^{2t} d_{χ²}(D_ψ, π). We may also upper-bound this more simply as

    exp(−λ_1)^{2t} d_{χ²}(D_ψ, π) = exp(−2λ_1 t) d_{χ²}(D_ψ, π).

Finally, what is the worst possible starting distribution? It seems intuitively clear that it should be a distribution concentrated on a single vertex, some u. As we saw, this distribution has density ψ_u = (1/π(u)) · 1_u. It satisfies

    d_{χ²}(ψ_u, π) = Var[ψ_u] = E[ψ_u²] − 1 = 1/π(u) − 1 ≤ 1/π(u).

Note that in the case of a regular graph G, all of the π(u)'s are 1/n, so this distance is upper-bounded by n. In general, we'll just say it's at most 1/π_min, where π_min is defined to be min_u {π(u)}. Finally, it is an exercise (use convexity!) to show that the divergence from π after t steps starting from any distribution is no bigger than the maximum divergence over all single-vertex starting points. We conclude with a theorem.

Theorem 4.9. For a lazy graph G, if we run a t-step random walk from any starting distribution, the resulting distribution D on vertices satisfies

    d_{χ²}(D, π) ≤ exp(−2λ_1 t) · (1/π_min),

where π_min = min_u {π(u)}, which is 1/n in case G is regular. Inverting this, we may say that we get ε-close to π (in χ²-divergence) after t = ln(n/ε)/(2λ_1) steps when G is regular. Typically, we might have λ_1 equal to some small absolute constant like .1, and we might choose ε polynomially small in n. In this case, we get super-duper-close to π after just O(log n) steps!
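Both the κ_1^{2t} decay rate and the final mixing-time bound can be checked numerically in the regular case. The sketch below is my own experiment on the 3-regular cube graph Q3 (the graph and the choice of ε are assumptions, not from the lecture).

```python
import math
import numpy as np

# Assumed example: Q3, vertices are 3-bit strings, edges flip one bit
# (3-regular, so pi is uniform, pi(u) = 1/n).
n = 8
A = np.zeros((n, n))
for u in range(n):
    for b in range(3):
        A[u, u ^ (1 << b)] = 1
M = 0.5 * (np.eye(n) + A / 3)                  # lazy walk matrix
kappa1 = np.sort(np.linalg.eigvalsh(M))[-2]    # second-largest kappa
lam1 = 1 - kappa1                              # lazy graph's lambda_1

pi = np.full(n, 1 / n)
def chi2(D):                                   # chi^2-divergence from pi
    return ((D - pi) ** 2 / pi).sum()

# decay rate: chi^2 at time t is at most kappa_1^(2t) times its initial value
D = np.zeros(n); D[0] = 1.0
d0 = chi2(D)                                   # equals 1/pi(u) - 1 = n - 1
for t in range(1, 30):
    D = D @ M
    assert chi2(D) <= kappa1 ** (2 * t) * d0 + 1e-12

# mixing time: t = ln(n/eps)/(2*lam1) steps suffice for chi^2 <= eps
eps = 1e-6
t_mix = math.ceil(math.log(n / eps) / (2 * lam1))
D = np.zeros(n); D[0] = 1.0
for _ in range(t_mix):
    D = D @ M
assert chi2(D) <= eps
```

Here λ_1 = 1/3 for the lazy Q3, so t_mix is a couple dozen steps even for ε = 10⁻⁶, matching the O(log n) moral of Theorem 4.9.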