Graph Partitioning Algorithms and Laplacian Eigenvalues Luca Trevisan Stanford Based on work with Tsz Chiu Kwok, Lap Chi Lau, James Lee, Yin Tat Lee, and Shayan Oveis Gharan
spectral graph theory: use linear algebra to understand/characterize graph properties and to design efficient graph algorithms
linear algebra and graphs: take an undirected graph G = (V, E) and consider the matrix A[i, j] := weight of edge (i, j). eigenvalues ↔ combinatorial properties; eigenvectors ↔ algorithms for combinatorial problems
fundamental thm of spectral graph theory
G = (V, E) undirected, A adjacency matrix, d_v degree of v, D diagonal matrix of degrees
L := I − D^{−1/2} A D^{−1/2}
L is symmetric, all its eigenvalues are real, and 0 = λ_1 ≤ λ_2 ≤ … ≤ λ_n ≤ 2
λ_k = 0 iff G has at least k connected components
λ_n = 2 iff G has a bipartite connected component
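As a quick numerical sanity check of the theorem, the sketch below (assuming numpy is available; the 4-cycle example and helper function are illustrative, not from the talk) builds L = I − D^{−1/2} A D^{−1/2} for a 4-cycle, which is connected and bipartite:

```python
import numpy as np

def normalized_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2} for an adjacency matrix A."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

# 4-cycle: connected (so lambda_1 = 0 with multiplicity 1) and bipartite (so lambda_n = 2)
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
eigs = np.linalg.eigvalsh(normalized_laplacian(A))  # sorted ascending
print(eigs)  # approximately [0, 1, 1, 2]
```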
(figure: an example graph) eigenvalues: 0, 2/3, 2/3, 2/3, 2/3, 2/3, 5/3, 5/3, 5/3, 5/3
(figure: a disconnected example graph) eigenvalues: 0, 0, .69, .69, 1.5, 1.5, 1.8, 1.8
(figure: a bipartite example graph) eigenvalues: 0, 2/3, 2/3, 2/3, 4/3, 4/3, 4/3, 2
eigenvalues as solutions to optimization problems
If M ∈ R^{n×n} is symmetric, and λ_1 ≤ λ_2 ≤ … ≤ λ_n are its eigenvalues counted with multiplicities, then
λ_k = min_{A k-dim subspace of R^n} max_{x ∈ A, x ≠ 0} (x^T M x)/(x^T x)
and the eigenvectors of λ_1, …, λ_k are a basis of an optimal A
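A numerical illustration of one direction of this min-max characterization (a sketch assuming numpy; the random symmetric matrix is just an example): the span of the k bottom eigenvectors achieves the minimum, since every vector in it has Rayleigh quotient at most λ_k.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))
M = (M + M.T) / 2            # a random symmetric matrix
lam, V = np.linalg.eigh(M)   # eigenvalues ascending, eigenvector columns orthonormal

# Every x in the span of the first k eigenvectors has Rayleigh quotient <= lam[k-1],
# so this subspace witnesses the min-max value.
k = 3
max_R = max(
    (x @ M @ x) / (x @ x)
    for x in (V[:, :k] @ rng.standard_normal(k) for _ in range(1000))
)
print(max_R <= lam[k - 1] + 1e-9)  # True
```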
why eigenvalues relate to connected components
If G = (V, E) is undirected and L is its Laplacian, then
(x^T L x)/(x^T x) = (Σ_{(u,v) ∈ E} (x_u − x_v)^2)/(Σ_v d_v x_v^2)   (after substituting D^{1/2} x for x)
If λ_1 ≤ λ_2 ≤ … ≤ λ_n are the eigenvalues of L with multiplicities,
λ_k = min_{A k-dim subspace of R^V} max_{x ∈ A, x ≠ 0} (Σ_{(u,v) ∈ E} (x_u − x_v)^2)/(Σ_v d_v x_v^2)
Note: Σ_{(u,v) ∈ E} (x_u − x_v)^2 = 0 iff x is constant on each connected component
if G has k connected components
λ_k = min_{A k-dim subspace of R^V} max_{x ∈ A, x ≠ 0} (Σ_{(u,v) ∈ E} (x_u − x_v)^2)/(Σ_v d_v x_v^2)
Consider the space of all vectors that are constant on each connected component; it has dimension at least k and so it witnesses λ_k = 0.
if λ_k = 0
λ_k = min_{A k-dim subspace of R^V} max_{x ∈ A, x ≠ 0} (Σ_{(u,v) ∈ E} (x_u − x_v)^2)/(Σ_v d_v x_v^2)
The eigenvectors x^(1), …, x^(k) of λ_1, …, λ_k are all constant on connected components.
The mapping F(v) := (x_v^(1), x_v^(2), …, x_v^(k)) is constant on connected components.
It maps to at least k distinct points (because the x^(i) are linearly independent).
⇒ G has at least k connected components
conductance
vol(S) := Σ_{v ∈ S} d_v
φ(S) := |E(S, V − S)| / vol(S)
φ(G) := min_{S ⊆ V : 0 < vol(S) ≤ vol(V)/2} φ(S)
In regular graphs, same as edge expansion and, up to a factor of 2, same as non-uniform sparsest cut
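To make the definition concrete, here is a brute-force computation of φ(G) (pure Python, exponential in |V|, so only for toy graphs; the two-triangles-with-a-bridge example is illustrative, not from the talk):

```python
from itertools import combinations

def min_conductance(n, edges):
    """Brute-force phi(G): min over S with 0 < vol(S) <= vol(V)/2 of |E(S, V-S)| / vol(S)."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    total_vol = sum(deg)
    best = float('inf')
    for size in range(1, n):
        for subset in combinations(range(n), size):
            S = set(subset)
            vol = sum(deg[v] for v in S)
            if 0 < vol <= total_vol / 2:
                cut = sum(1 for u, v in edges if (u in S) != (v in S))
                best = min(best, cut / vol)
    return best

# Two triangles joined by one edge: the best cut removes the bridge, phi = 1/7
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(min_conductance(6, edges))  # 0.14285714285714285 (= 1/7)
```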
conductance (figure: the disconnected example graph, eigenvalues 0, 0, .69, .69, 1.5, 1.5, 1.8, 1.8): φ(G) = φ({0, 1, 2}) = 0/6 = 0
conductance (figure: example graph, eigenvalues 0, .055, .055, .211, …): φ(G) = 2/18
conductance (figure: example graph, eigenvalues 0, 2/3, 2/3, 2/3, 2/3, 2/3, 5/3, 5/3, 5/3, 5/3): φ(G) = 5/15
finding S of small conductance
G is a social network: S is a set of users more likely to be friends with each other than with anyone else
G is a similarity graph on the items of a data set: S is a set of items more similar to each other than to the other items; [Shi, Malik]: image segmentation
G is an input of an NP-hard optimization problem: recurse on S and V − S, use dynamic programming to combine in time 2^{O(|E(S, V − S)|)}
conductance versus λ_2
Recall: λ_2 = 0 ⟺ G disconnected ⟺ φ(G) = 0
Cheeger inequality (Alon, Milman, Dodziuk, mid-1980s): λ_2/2 ≤ φ(G) ≤ √(2 λ_2)
Cheeger inequality
λ_2/2 ≤ φ(G) ≤ √(2 λ_2)
Constructive proof of φ(G) ≤ √(2 λ_2): given an eigenvector x of λ_2, sort the vertices v according to x_v; some set S of vertices occurring as an initial segment or final segment of the sorted order has φ(S) ≤ √(2 λ_2) ≤ 2 √(φ(G))
This sweep algorithm can be implemented in Õ(|V| + |E|) time
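The sweep fits in a few lines; below is a sketch assuming numpy (the two-triangles-with-a-bridge graph is an illustrative example, and conductance is taken in the symmetric form cut/min(vol(S), vol(V)−vol(S)) so every prefix can be scored):

```python
import numpy as np

def sweep_cut(A):
    """Cheeger sweep: sort vertices by D^{-1/2} x, where x is the eigenvector of
    lambda_2 of L = I - D^{-1/2} A D^{-1/2}; return the best prefix cut."""
    n = len(A)
    d = A.sum(axis=1)
    dis = 1.0 / np.sqrt(d)
    L = np.eye(n) - dis[:, None] * A * dis[None, :]
    lam, V = np.linalg.eigh(L)
    order = np.argsort(dis * V[:, 1])       # sweep order from the lambda_2 eigenvector
    total_vol = d.sum()
    best, best_S = np.inf, None
    S, vol, cut = set(), 0.0, 0.0
    for v in order[:-1]:
        # moving v into S flips the cut status of every edge incident on v
        cut += sum(A[v, u] * (-1 if u in S else 1) for u in range(n) if u != v)
        S.add(v)
        vol += d[v]
        phi = cut / min(vol, total_vol - vol)
        if phi < best:
            best, best_S = phi, set(S)
    return best, best_S, lam[1]

# Two triangles joined by a bridge edge (2, 3): the sweep finds the bridge cut
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
phi, S, lam2 = sweep_cut(A)
print(phi)  # 0.14285714285714285 (= 1/7), and phi <= sqrt(2 * lam2) as Cheeger guarantees
```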
applications
λ_2/2 ≤ φ(G) ≤ √(2 λ_2)
rigorous analysis of the sweep algorithm (practical performance usually better than the worst-case analysis)
to certify φ(G) ≥ Ω(1), enough to certify λ_2 ≥ Ω(1)
mixing time of the random walk is O((1/λ_2) · log |V|) and so also O((1/φ^2(G)) · log |V|)
Lee, Oveis Gharan, T, 2012
From the fundamental theorem of spectral graph theory: λ_k = 0 ⟺ G has at least k connected components
We prove: λ_k small ⟺ G has k disjoint sets of small conductance
order-k conductance
Define the order-k conductance
φ_k(G) := min_{S_1, …, S_k disjoint} max_{i=1,…,k} φ(S_i)
Note: φ(G) = φ_2(G); φ_k(G) = 0 ⟺ G has at least k connected components
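φ_k can likewise be brute-forced on toy instances (pure Python, exponential time; the example graphs are illustrative, not from the talk). A perfect matching on 6 vertices has three connected components, so φ_3 = 0; on two triangles joined by an edge, φ_2 recovers φ(G) = 1/7.

```python
from itertools import product

def phi_k(n, edges, k):
    """Brute-force order-k conductance: min over k disjoint nonempty sets S_1..S_k
    of max_i phi(S_i), where phi(S) = |E(S, V-S)| / vol(S)."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1

    def phi(S):
        cut = sum(1 for u, v in edges if (u in S) != (v in S))
        return cut / sum(deg[v] for v in S)

    best = float('inf')
    for labels in product(range(k + 1), repeat=n):   # label 0 means "in no S_i"
        sets = [{v for v in range(n) if labels[v] == i} for i in range(1, k + 1)]
        if all(sets):
            best = min(best, max(phi(S) for S in sets))
    return best

matching = [(0, 1), (2, 3), (4, 5)]                               # three components
bridged = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]  # two triangles + bridge
print(phi_k(6, matching, 3))  # 0.0: at least 3 connected components
print(phi_k(6, bridged, 2))   # 0.14285714285714285: the two triangles, each with phi = 1/7
```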
(figure: the disconnected example graph, eigenvalues 0, 0, .69, .69, 1.5, 1.5, 1.8, 1.8) φ_3(G) = max{0, 2/6, 2/4} = 1/2
(figure: example graph, eigenvalues 0, .055, .055, .211, …) φ_3(G) = max{2/12, 2/12, 2/14} = 1/6
λ_k = 0 ⟺ φ_k(G) = 0
(Lee, Oveis Gharan, T, 2012): λ_k/2 ≤ φ_k(G) ≤ O(k^2) √(λ_k)
The upper bound is constructive: in nearly-linear time we can find k disjoint sets, each of conductance at most O(k^2) √(λ_k).
λ k 2 φ k(g) O(k 2 ) λ k λ k
λ_k/2 ≤ φ_k(G) ≤ O(k^2) √(λ_k)
φ_k(G) ≤ O(√(log k)) · √(λ_{1.1k}) (Lee, Oveis Gharan, T, 2012)
φ_k(G) ≤ O(√(log k)) · √(λ_{O(k)}) (Louis, Raghavendra, Tetali, Vempala, 2012)
The factor O(√(log k)) is tight
spectral clustering
The following algorithm works well in practice (but to solve what problem?):
compute the k eigenvectors x^(1), …, x^(k) of the k smallest eigenvalues of L
define the mapping F : V → R^k, F(v) := (x_v^(1), x_v^(2), …, x_v^(k))
apply k-means to the points that the vertices are mapped to
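A minimal sketch of this recipe (assuming numpy; the bare-bones Lloyd-style k-means with farthest-point initialization and the three-triangles example are illustrative, not the talk's implementation). When λ_k = 0, the embedding sends each connected component to a single point, so k-means recovers the components exactly:

```python
import numpy as np

def spectral_clustering(A, k, iters=50):
    """Embed each vertex via the k bottom eigenvectors of the normalized
    Laplacian, then cluster the embedded points with a bare-bones k-means."""
    n = len(A)
    d = A.sum(axis=1)
    dis = 1.0 / np.sqrt(d)
    L = np.eye(n) - dis[:, None] * A * dis[None, :]
    _, V = np.linalg.eigh(L)
    F = V[:, :k]                      # row v is the embedding F(v) in R^k

    # deterministic farthest-point initialization, then Lloyd iterations
    centers = [F[0]]
    while len(centers) < k:
        d2 = np.min([((F - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(F[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((F[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        centers = np.array([F[labels == j].mean(axis=0) for j in range(k)])  # assumes no empty cluster
    return labels

# Three disjoint triangles: lambda_3 = 0, each component embeds to one point
A = np.zeros((9, 9))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (6, 7), (7, 8), (6, 8)]:
    A[u, v] = A[v, u] = 1.0
labels = spectral_clustering(A, 3)
print(labels.reshape(3, 3))  # one cluster per row
```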
k = 3 spectral embedding of a random graph
spectral clustering
The LOT and LRTV algorithms find k low-conductance sets if they exist. The first step is the spectral embedding; then geometric partitioning, but not k-means.
Cheeger inequality
λ_2/2 ≤ φ(G) ≤ √(2 λ_2)
φ(G) ≤ O(k) λ_2/√(λ_k) (Kwok, Lau, Lee, Oveis Gharan, T, 2013), and the upper bound holds for the sweep algorithm
The nearly-linear time sweep algorithm finds S such that, for every k, φ(S) ≤ φ(G) · O(k/√(λ_k))
proof structure
1. We find a 2k-valued vector y such that ‖x − y‖^2 ≤ O(λ_2/λ_k), where x is an eigenvector of λ_2
2. For every k-valued vector y, applying the sweep algorithm to x finds a set S such that φ(S) ≤ O(k)(λ_2 + √(λ_2) · ‖x − y‖)
So the sweep algorithm finds S with φ(S) ≤ O(k) λ_2 · (1/√(λ_k))
intuition for first result
Given x, we find a 2k-valued vector y such that ‖x − y‖^2 ≤ O(R(x)/λ_k)
Suppose R(x) = 0 and λ_k > 0. Then:
x is constant on connected components (because R(x) = 0)
G has < k connected components (because λ_k > 0)
So x is at distance zero from a vector (itself) that is (k − 1)-valued
idea of proof of first result
Consider x as an embedding of the vertices into the line
Partition the points that vertices are mapped to into 2k intervals, minimizing the 2k-means cost with squared distances
Round each point to the closest interval boundary
If the rounded vector is far from the original vector, then many points are far from the interval boundaries; in that case we construct a witness that λ_k is small by finding k disjointly supported vectors of small Rayleigh quotient
intuition for second result
For every k-valued vector y, applying the sweep algorithm to x finds a set S such that φ(S) ≤ O(k)(R(x) + √(R(x)) · ‖x − y‖)
It was known:
φ(S) ≤ √(2 R(x))
‖x − y‖ = 0 ⟹ φ(S) ≤ k · R(x)
Our proof blends the two arguments
eigenvalue gap
If λ_k = 0 and λ_{k+1} > 0, then G has exactly k connected components
[Tanaka 2012] If λ_{k+1} > 5^k λ_k, then the vertices of G can be partitioned into k sets of small conductance, each inducing a subgraph of large conductance [non-algorithmic proof]
[Oveis Gharan, T 2013] Sufficient: φ_{k+1}(G) > (1 + ε) φ_k(G); if λ_{k+1} > O(k^2) λ_k, then the partition can be found algorithmically
bounded threshold rank
[Arora, Barak, Steurer], [Barak, Raghavendra, Steurer], [Guruswami, Sinop], …
If λ_k is large, then a cut of approximately minimal conductance can be found in time exponential in k using spectral methods [ABS] or convex relaxations [BRS, GS]
[Kwok, Lau, Lee, Oveis Gharan, T]: the nearly-linear time sweep algorithm finds a good approximation if λ_k is large
[Oveis Gharan, T]: the Linial-Goemans relaxation (solvable in time polynomial independently of k) can be rounded better than in the Arora-Rao-Vazirani analysis if λ_k is large
challenge: explain the practical performance of the sweep algorithm
λ_2/2 ≤ φ(G) ≤ φ(output of algorithm) ≤ √(2 λ_2)
(Kwok, Lau, Lee, Oveis Gharan, T): φ(output of algorithm) ≤ O(k) λ_2/√(λ_k)
challenge: explain the practical performance of the sweep algorithm
λ_2/2 ≤ φ(G) ≤ φ(output of algorithm) ≤ √(2 λ_2)
(Spielman, Teng) In bounded-degree planar graphs, λ_2 ≤ O(1/n), so φ(output of algorithm) ≤ O(1/√n), matching the planar separator theorem.
explain the practical performance of the sweep algorithm
λ_2/2 ≤ φ(G) ≤ φ(output of algorithm) ≤ √(2 λ_2)
Can we find interesting classes of graphs for which λ_2 ≪ φ(G)?
lucatrevisan.wordpress.com/tag/expanders/