Graph Partitioning Algorithms and Laplacian Eigenvalues Luca Trevisan Stanford Based on work with Tsz Chiu Kwok, Lap Chi Lau, James Lee, Yin Tat Lee, and Shayan Oveis Gharan
spectral graph theory: use linear algebra to understand/characterize graph properties and to design efficient graph algorithms
linear algebra and graphs: take an undirected graph G = (V, E) and consider the matrix A[i, j] := weight of edge (i, j). eigenvalues ↔ combinatorial properties; eigenvectors ↔ algorithms for combinatorial problems
fundamental thm of spectral graph theory
G = (V, E) undirected, A adjacency matrix, d_v degree of v, D diagonal matrix of degrees
L := I − D^{−1/2} A D^{−1/2}
L is symmetric, all its eigenvalues are real, and 0 = λ_1 ≤ λ_2 ≤ … ≤ λ_n ≤ 2
λ_k = 0 iff G has at least k connected components
λ_n = 2 iff G has a bipartite connected component
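As a quick numerical sanity check of the theorem, the sketch below (assuming numpy is available; the 4-cycle example and helper function are illustrative, not from the talk) builds L = I − D^{−1/2} A D^{−1/2} for a 4-cycle, which is connected and bipartite:

```python
import numpy as np

def normalized_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2} for an adjacency matrix A."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

# 4-cycle: connected (so lambda_1 = 0 with multiplicity 1) and bipartite (so lambda_n = 2)
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
eigs = np.linalg.eigvalsh(normalized_laplacian(A))  # sorted ascending
print(eigs)  # approximately [0, 1, 1, 2]
```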
(figure: an example graph) eigenvalues: 0, 2/3, 2/3, 2/3, 2/3, 2/3, 5/3, 5/3, 5/3, 5/3
(figure: a disconnected example graph) eigenvalues: 0, 0, .69, .69, 1.5, 1.5, 1.8, 1.8
(figure: a bipartite example graph) eigenvalues: 0, 2/3, 2/3, 2/3, 4/3, 4/3, 4/3, 2
eigenvalues as solutions to optimization problems
If M ∈ R^{n×n} is symmetric, and λ_1 ≤ λ_2 ≤ … ≤ λ_n are its eigenvalues counted with multiplicities, then
λ_k = min_{A k-dim subspace of R^n} max_{x ∈ A, x ≠ 0} (x^T M x)/(x^T x)
and the eigenvectors of λ_1, …, λ_k are a basis of an optimal A
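A numerical illustration of one direction of this min-max characterization (a sketch assuming numpy; the random symmetric matrix is just an example): the span of the k bottom eigenvectors achieves the minimum, since every vector in it has Rayleigh quotient at most λ_k.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))
M = (M + M.T) / 2            # a random symmetric matrix
lam, V = np.linalg.eigh(M)   # eigenvalues ascending, eigenvector columns orthonormal

# Every x in the span of the first k eigenvectors has Rayleigh quotient <= lam[k-1],
# so this subspace witnesses the min-max value.
k = 3
max_R = max(
    (x @ M @ x) / (x @ x)
    for x in (V[:, :k] @ rng.standard_normal(k) for _ in range(1000))
)
print(max_R <= lam[k - 1] + 1e-9)  # True
```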
why eigenvalues relate to connected components
If G = (V, E) is undirected and L is its Laplacian, then
(x^T L x)/(x^T x) = (Σ_{(u,v) ∈ E} (x_u − x_v)^2)/(Σ_v d_v x_v^2)   (after substituting D^{1/2} x for x)
If λ_1 ≤ λ_2 ≤ … ≤ λ_n are the eigenvalues of L with multiplicities,
λ_k = min_{A k-dim subspace of R^V} max_{x ∈ A, x ≠ 0} (Σ_{(u,v) ∈ E} (x_u − x_v)^2)/(Σ_v d_v x_v^2)
Note: Σ_{(u,v) ∈ E} (x_u − x_v)^2 = 0 iff x is constant on each connected component
if G has k connected components
λ_k = min_{A k-dim subspace of R^V} max_{x ∈ A, x ≠ 0} (Σ_{(u,v) ∈ E} (x_u − x_v)^2)/(Σ_v d_v x_v^2)
Consider the space of all vectors that are constant on each connected component; it has dimension at least k and so it witnesses λ_k = 0.
if λ_k = 0
λ_k = min_{A k-dim subspace of R^V} max_{x ∈ A, x ≠ 0} (Σ_{(u,v) ∈ E} (x_u − x_v)^2)/(Σ_v d_v x_v^2)
The eigenvectors x^(1), …, x^(k) of λ_1, …, λ_k are all constant on connected components.
The mapping F(v) := (x_v^(1), x_v^(2), …, x_v^(k)) is constant on connected components.
It maps to at least k distinct points (because the x^(i) are linearly independent).
⇒ G has at least k connected components
conductance
vol(S) := Σ_{v ∈ S} d_v
φ(S) := |E(S, V − S)| / vol(S)
φ(G) := min_{S ⊆ V : 0 < vol(S) ≤ vol(V)/2} φ(S)
In regular graphs, same as edge expansion and, up to a factor of 2, same as non-uniform sparsest cut
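To make the definition concrete, here is a brute-force computation of φ(G) (pure Python, exponential in |V|, so only for toy graphs; the two-triangles-with-a-bridge example is illustrative, not from the talk):

```python
from itertools import combinations

def min_conductance(n, edges):
    """Brute-force phi(G): min over S with 0 < vol(S) <= vol(V)/2 of |E(S, V-S)| / vol(S)."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    total_vol = sum(deg)
    best = float('inf')
    for size in range(1, n):
        for subset in combinations(range(n), size):
            S = set(subset)
            vol = sum(deg[v] for v in S)
            if 0 < vol <= total_vol / 2:
                cut = sum(1 for u, v in edges if (u in S) != (v in S))
                best = min(best, cut / vol)
    return best

# Two triangles joined by one edge: the best cut removes the bridge, phi = 1/7
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(min_conductance(6, edges))  # 0.14285714285714285 (= 1/7)
```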
conductance (figure: the disconnected example graph, eigenvalues 0, 0, .69, .69, 1.5, 1.5, 1.8, 1.8): φ(G) = φ({0, 1, 2}) = 0/6 = 0
conductance (figure: example graph, eigenvalues 0, .055, .055, .211, …): φ(G) = 2/18
conductance (figure: example graph, eigenvalues 0, 2/3, 2/3, 2/3, 2/3, 2/3, 5/3, 5/3, 5/3, 5/3): φ(G) = 5/15
finding S of small conductance
G is a social network: S is a set of users more likely to be friends with each other than with anyone else
G is a similarity graph on the items of a data set: S is a set of items more similar to each other than to the other items; [Shi, Malik]: image segmentation
G is an input of an NP-hard optimization problem: recurse on S and V − S, use dynamic programming to combine in time 2^{O(|E(S, V − S)|)}
conductance versus λ_2
Recall: λ_2 = 0 ⟺ G disconnected ⟺ φ(G) = 0
Cheeger inequality (Alon, Milman, Dodziuk, mid-1980s): λ_2/2 ≤ φ(G) ≤ √(2 λ_2)
Cheeger inequality
λ_2/2 ≤ φ(G) ≤ √(2 λ_2)
Constructive proof of φ(G) ≤ √(2 λ_2): given an eigenvector x of λ_2, sort the vertices v according to x_v; some set S of vertices occurring as an initial segment or final segment of the sorted order has φ(S) ≤ √(2 λ_2) ≤ 2 √(φ(G))
This sweep algorithm can be implemented in Õ(|V| + |E|) time
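The sweep fits in a few lines; below is a sketch assuming numpy (the two-triangles-with-a-bridge graph is an illustrative example, and conductance is taken in the symmetric form cut/min(vol(S), vol(V)−vol(S)) so every prefix can be scored):

```python
import numpy as np

def sweep_cut(A):
    """Cheeger sweep: sort vertices by D^{-1/2} x, where x is the eigenvector of
    lambda_2 of L = I - D^{-1/2} A D^{-1/2}; return the best prefix cut."""
    n = len(A)
    d = A.sum(axis=1)
    dis = 1.0 / np.sqrt(d)
    L = np.eye(n) - dis[:, None] * A * dis[None, :]
    lam, V = np.linalg.eigh(L)
    order = np.argsort(dis * V[:, 1])       # sweep order from the lambda_2 eigenvector
    total_vol = d.sum()
    best, best_S = np.inf, None
    S, vol, cut = set(), 0.0, 0.0
    for v in order[:-1]:
        # moving v into S flips the cut status of every edge incident on v
        cut += sum(A[v, u] * (-1 if u in S else 1) for u in range(n) if u != v)
        S.add(v)
        vol += d[v]
        phi = cut / min(vol, total_vol - vol)
        if phi < best:
            best, best_S = phi, set(S)
    return best, best_S, lam[1]

# Two triangles joined by a bridge edge (2, 3): the sweep finds the bridge cut
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
phi, S, lam2 = sweep_cut(A)
print(phi)  # 0.14285714285714285 (= 1/7), and phi <= sqrt(2 * lam2) as Cheeger guarantees
```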
applications
λ_2/2 ≤ φ(G) ≤ √(2 λ_2)
rigorous analysis of the sweep algorithm (practical performance usually better than the worst-case analysis)
to certify φ(G) ≥ Ω(1), enough to certify λ_2 ≥ Ω(1)
mixing time of the random walk is O((1/λ_2) · log |V|) and so also O((1/φ^2(G)) · log |V|)
Lee, Oveis Gharan, T, 2012
From the fundamental theorem of spectral graph theory: λ_k = 0 ⟺ G has at least k connected components
We prove: λ_k small ⟺ G has k disjoint sets of small conductance
order-k conductance
Define the order-k conductance
φ_k(G) := min_{S_1, …, S_k disjoint} max_{i=1,…,k} φ(S_i)
Note: φ(G) = φ_2(G); φ_k(G) = 0 ⟺ G has at least k connected components
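φ_k can likewise be brute-forced on toy instances (pure Python, exponential time; the example graphs are illustrative, not from the talk). A perfect matching on 6 vertices has three connected components, so φ_3 = 0; on two triangles joined by an edge, φ_2 recovers φ(G) = 1/7.

```python
from itertools import product

def phi_k(n, edges, k):
    """Brute-force order-k conductance: min over k disjoint nonempty sets S_1..S_k
    of max_i phi(S_i), where phi(S) = |E(S, V-S)| / vol(S)."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1

    def phi(S):
        cut = sum(1 for u, v in edges if (u in S) != (v in S))
        return cut / sum(deg[v] for v in S)

    best = float('inf')
    for labels in product(range(k + 1), repeat=n):   # label 0 means "in no S_i"
        sets = [{v for v in range(n) if labels[v] == i} for i in range(1, k + 1)]
        if all(sets):
            best = min(best, max(phi(S) for S in sets))
    return best

matching = [(0, 1), (2, 3), (4, 5)]                               # three components
bridged = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]  # two triangles + bridge
print(phi_k(6, matching, 3))  # 0.0: at least 3 connected components
print(phi_k(6, bridged, 2))   # 0.14285714285714285: the two triangles, each with phi = 1/7
```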
(figure: the disconnected example graph, eigenvalues 0, 0, .69, .69, 1.5, 1.5, 1.8, 1.8) φ_3(G) = max{0, 2/6, 2/4} = 1/2
(figure: example graph, eigenvalues 0, .055, .055, .211, …) φ_3(G) = max{2/12, 2/12, 2/14} = 1/6
λ_k = 0 ⟺ φ_k(G) = 0
(Lee, Oveis Gharan, T, 2012): λ_k/2 ≤ φ_k(G) ≤ O(k^2) √(λ_k)
The upper bound is constructive: in nearly-linear time we can find k disjoint sets, each of conductance at most O(k^2) √(λ_k).
λ k 2 φ k(g) O(k 2 ) λ k λ k
λ_k/2 ≤ φ_k(G) ≤ O(k^2) √(λ_k)
φ_k(G) ≤ O(√(log k)) · √(λ_{1.1k}) (Lee, Oveis Gharan, T, 2012)
φ_k(G) ≤ O(√(log k)) · √(λ_{O(k)}) (Louis, Raghavendra, Tetali, Vempala, 2012)
The factor O(√(log k)) is tight
spectral clustering
The following algorithm works well in practice (but to solve what problem?):
compute the k eigenvectors x^(1), …, x^(k) of the k smallest eigenvalues of L
define the mapping F : V → R^k, F(v) := (x_v^(1), x_v^(2), …, x_v^(k))
apply k-means to the points that the vertices are mapped to
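A minimal sketch of this recipe (assuming numpy; the bare-bones Lloyd-style k-means with farthest-point initialization and the three-triangles example are illustrative, not the talk's implementation). When λ_k = 0, the embedding sends each connected component to a single point, so k-means recovers the components exactly:

```python
import numpy as np

def spectral_clustering(A, k, iters=50):
    """Embed each vertex via the k bottom eigenvectors of the normalized
    Laplacian, then cluster the embedded points with a bare-bones k-means."""
    n = len(A)
    d = A.sum(axis=1)
    dis = 1.0 / np.sqrt(d)
    L = np.eye(n) - dis[:, None] * A * dis[None, :]
    _, V = np.linalg.eigh(L)
    F = V[:, :k]                      # row v is the embedding F(v) in R^k

    # deterministic farthest-point initialization, then Lloyd iterations
    centers = [F[0]]
    while len(centers) < k:
        d2 = np.min([((F - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(F[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((F[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        centers = np.array([F[labels == j].mean(axis=0) for j in range(k)])  # assumes no empty cluster
    return labels

# Three disjoint triangles: lambda_3 = 0, each component embeds to one point
A = np.zeros((9, 9))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (6, 7), (7, 8), (6, 8)]:
    A[u, v] = A[v, u] = 1.0
labels = spectral_clustering(A, 3)
print(labels.reshape(3, 3))  # one cluster per row
```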
k = 3 spectral embedding of a random graph
spectral clustering
The LOT and LRTV algorithms find k low-conductance sets if they exist. The first step is the spectral embedding; then geometric partitioning, but not k-means.
Cheeger inequality
λ_2/2 ≤ φ(G) ≤ √(2 λ_2)
φ(G) ≤ O(k) λ_2/√(λ_k) (Kwok, Lau, Lee, Oveis Gharan, T, 2013), and the upper bound holds for the sweep algorithm
The nearly-linear time sweep algorithm finds S such that, for every k, φ(S) ≤ φ(G) · O(k/√(λ_k))
proof structure
1. We find a 2k-valued vector y such that ‖x − y‖^2 ≤ O(λ_2/λ_k), where x is an eigenvector of λ_2
2. For every k-valued vector y, applying the sweep algorithm to x finds a set S such that φ(S) ≤ O(k)(λ_2 + √(λ_2) · ‖x − y‖)
So the sweep algorithm finds S with φ(S) ≤ O(k) λ_2 · (1/√(λ_k))
intuition for first result
Given x, we find a 2k-valued vector y such that ‖x − y‖^2 ≤ O(R(x)/λ_k)
Suppose R(x) = 0 and λ_k > 0. Then:
x is constant on connected components (because R(x) = 0)
G has < k connected components (because λ_k > 0)
So x is at distance zero from a vector (itself) that is (k − 1)-valued
idea of proof of first result
Consider x as an embedding of the vertices into the line
Partition the points that vertices are mapped to into 2k intervals, minimizing the 2k-means cost with squared distances
Round each point to the closest interval boundary
If the rounded vector is far from the original vector, then many points are far from the interval boundaries; in that case we construct a witness that λ_k is small by finding k disjointly supported vectors of small Rayleigh quotient
intuition for second result
For every k-valued vector y, applying the sweep algorithm to x finds a set S such that φ(S) ≤ O(k)(R(x) + √(R(x)) · ‖x − y‖)
It was known:
φ(S) ≤ √(2 R(x))
‖x − y‖ = 0 ⟹ φ(S) ≤ k · R(x)
Our proof blends the two arguments
eigenvalue gap
If λ_k = 0 and λ_{k+1} > 0, then G has exactly k connected components
[Tanaka 2012] If λ_{k+1} > 5^k λ_k, then the vertices of G can be partitioned into k sets of small conductance, each inducing a subgraph of large conductance [non-algorithmic proof]
[Oveis Gharan, T 2013] Sufficient: φ_{k+1}(G) > (1 + ε) φ_k(G); if λ_{k+1} > O(k^2) λ_k, then the partition can be found algorithmically
bounded threshold rank
[Arora, Barak, Steurer], [Barak, Raghavendra, Steurer], [Guruswami, Sinop], …
If λ_k is large, then a cut of approximately minimal conductance can be found in time exponential in k using spectral methods [ABS] or convex relaxations [BRS, GS]
[Kwok, Lau, Lee, Oveis Gharan, T]: the nearly-linear time sweep algorithm finds a good approximation if λ_k is large
[Oveis Gharan, T]: the Linial-Goemans relaxation (solvable in time polynomial independently of k) can be rounded better than in the Arora-Rao-Vazirani analysis if λ_k is large
challenge: explain the practical performance of the sweep algorithm
λ_2/2 ≤ φ(G) ≤ φ(output of algorithm) ≤ √(2 λ_2)
(Kwok, Lau, Lee, Oveis Gharan, T): φ(output of algorithm) ≤ O(k) λ_2/√(λ_k)
challenge: explain the practical performance of the sweep algorithm
λ_2/2 ≤ φ(G) ≤ φ(output of algorithm) ≤ √(2 λ_2)
(Spielman, Teng) In bounded-degree planar graphs, λ_2 ≤ O(1/n), so φ(output of algorithm) ≤ O(1/√n), matching the planar separator theorem.
explain the practical performance of the sweep algorithm
λ_2/2 ≤ φ(G) ≤ φ(output of algorithm) ≤ √(2 λ_2)
Can we find interesting classes of graphs for which λ_2 ≪ φ(G)?
lucatrevisan.wordpress.com/tag/expanders/