Graph Partitioning Algorithms and Laplacian Eigenvalues

Similar documents
Lecture 17: Cheeger's Inequality and the Sparsest Cut Problem

Lecture 12: Introduction to Spectral Graph Theory, Cheeger's inequality

Lecture 13: Spectral Graph Theory

EIGENVECTOR NORMS MATTER IN SPECTRAL GRAPH THEORY

Random Walks and Evolving Sets: Faster Convergences and Limitations

Graph Clustering Algorithms

Markov chain methods for small-set expansion

Lecture 14: Random Walks, Local Graph Clustering, Linear Programming

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 11 Luca Trevisan February 29, 2016

1 Matrix notation and preliminaries from spectral graph theory

Spectral Algorithms for Graph Mining and Analysis. Yiannis Koutis. University of Puerto Rico - Rio Piedras NSF CAREER CCF

Graph Partitioning Using Random Walks

Spectral Graph Theory and its Applications. Daniel A. Spielman Dept. of Computer Science Program in Applied Mathematics Yale University

18.S096: Spectral Clustering and Cheeger's Inequality

ORIE 6334 Spectral Graph Theory November 22, Lecture 25

Lecture 7. 1 Normalized Adjacency and Laplacian Matrices

Lecture 12: Graph Laplacians and Cheeger's Inequality

Unique Games and Small Set Expansion

Stanford University CS366: Graph Partitioning and Expanders Handout 13 Luca Trevisan March 4, 2013

Unique Games Conjecture & Polynomial Optimization. David Steurer

Testing Cluster Structure of Graphs. Artur Czumaj

Approximation & Complexity

Improved Cheeger's Inequality: Analysis of Spectral Partitioning Algorithms through Higher Order Spectral Gap

SDP Relaxations for MAXCUT

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

Mining of Massive Datasets. Jure Leskovec, Anand Rajaraman, Jeff Ullman. Stanford University

1 Matrix notation and preliminaries from spectral graph theory

ORIE 6334 Spectral Graph Theory September 22, Lecture 11

Spectral and Electrical Graph Theory. Daniel A. Spielman Dept. of Computer Science Program in Applied Mathematics Yale University

A Schur Complement Cheeger Inequality

arxiv: v4 [cs.ds] 23 Jun 2015

LOCAL CHEEGER INEQUALITIES AND SPARSE CUTS

Lecture 9: Low Rank Approximation

STA141C: Big Data & High Performance Statistical Computing

Networks and Their Spectra

Lower bounds on the size of semidefinite relaxations. David Steurer Cornell

Data Analysis and Manifold Learning Lecture 7: Spectral Clustering

Ma/CS 6b Class 23: Eigenvalues in Regular Graphs

ANNALES. SHAYAN OVEIS GHARAN Spectral Graph Theory via Higher Order Eigenvalues and Applications to the Analysis of Random Walks

University of Bristol - Explore Bristol Research. Publisher's PDF, also known as Version of record

Spectral Clustering on Handwritten Digits Database

eigenvalue bounds and metric uniformization Punya Biswal & James R. Lee University of Washington Satish Rao U. C. Berkeley

Spectra of Adjacency and Laplacian Matrices

Topic: Balanced Cut, Sparsest Cut, and Metric Embeddings Date: 3/21/2007

Laplacian Matrices of Graphs: Spectral and Electrical Theory

Maximum cut and related problems

Linear algebra and applications to graphs Part 1

Conductance, the Normalized Laplacian, and Cheeger's Inequality

Lecture Introduction. 2 Brief Recap of Lecture 10. CS-621 Theory Gems October 24, 2012

Lecture 8: Spectral Graph Theory III

Lecture 1. a_{i,j} = 1 if (i, j) ∈ E and 0 otherwise; l_{i,j} = d_i if i = j, and ...

Markov Chains and Spectral Clustering

CS168: The Modern Algorithmic Toolbox Lectures #11 and #12: Spectral Graph Theory

Lecture: Local Spectral Methods (3 of 4) 20 An optimization perspective on local spectral methods

Lecture 5. Max-cut, Expansion and Grothendieck's Inequality

Geometry of Similarity Search

Lectures 15: Cayley Graphs of Abelian Groups

Lecture 1 and 2: Random Spanning Trees

Partitioning Algorithms that Combine Spectral and Flow Methods

Spectral Clustering. Guokun Lai 2016/10

Random Lifts of Graphs

MATH 567: Mathematical Techniques in Data Science Clustering II

Reductions Between Expansion Problems

Lecture 14: SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Lecturer: Sanjeev Arora

Diffusion and random walks on graphs

Nonlinear Eigenproblems in Data Analysis

Sum-of-Squares Method, Tensor Decomposition, Dictionary Learning

ON THE QUALITY OF SPECTRAL SEPARATORS

Lecture: Local Spectral Methods (1 of 4)

Spectral Clustering. Zitao Liu

Approximating the isoperimetric number of strongly regular graphs

Convex and Semidefinite Programming for Approximation

U.C. Berkeley Better-than-Worst-Case Analysis Handout 3 Luca Trevisan May 24, 2018

Spectral Clustering. Spectral Clustering? Two Moons Data. Spectral Clustering Algorithm: Bipartioning. Spectral methods

CSC2420 Spring 2017: Algorithm Design, Analysis and Theory Lecture 11

1 Review: symmetric matrices, their eigenvalues and eigenvectors

Combinatorial Optimization

Inverse Power Method for Non-linear Eigenproblems

THE HIDDEN CONVEXITY OF SPECTRAL CLUSTERING

1 Adjacency matrix and eigenvalues

Dissertation Defense

Four graph partitioning algorithms. Fan Chung University of California, San Diego

Graph Partitioning by Spectral Rounding: Applications in Image Segmentation and Clustering

Graph Sparsification I : Effective Resistance Sampling

On a Cut-Matching Game for the Sparsest Cut Problem

Graphs in Machine Learning

Tight Size-Degree Lower Bounds for Sums-of-Squares Proofs

A local algorithm for finding dense bipartite-like subgraphs

Space-Variant Computer Vision: A Graph Theoretic Approach

Aditya Bhaskara CS 5968/6968, Lecture 1: Introduction and Review 12 January 2016

Lecture: Flow-based Methods for Partitioning Graphs (1 of 2)

Introduction to Spectral Graph Theory and Graph Clustering

Convergence of Random Walks

Unsupervised Learning

Machine Learning for Data Science (CS4786) Lecture 11

approximation algorithms I

Invertibility and Largest Eigenvalue of Symmetric Matrix Signings

Positive Semi-definite programing and applications for approximation

Petar Maymounkov Problem Set 1 MIT with Prof. Jonathan Kelner, Fall 07

Transcription:

Graph Partitioning Algorithms and Laplacian Eigenvalues Luca Trevisan Stanford Based on work with Tsz Chiu Kwok, Lap Chi Lau, James Lee, Yin Tat Lee, and Shayan Oveis Gharan

spectral graph theory: use linear algebra to understand/characterize graph properties and to design efficient graph algorithms

linear algebra and graphs: Take an undirected graph G = (V, E) and consider the matrix A[i, j] := weight of edge (i, j). Then: eigenvalues ↔ combinatorial properties; eigenvectors ↔ algorithms for combinatorial problems.

fundamental thm of spectral graph theory: G = (V, E) undirected, A adjacency matrix, d_v degree of v, D diagonal matrix of degrees, and L := I − D^{−1/2} A D^{−1/2}. Then L is symmetric, all its eigenvalues are real, and

0 = λ_1 ≤ λ_2 ≤ ... ≤ λ_n ≤ 2

Moreover: λ_k = 0 iff G has at least k connected components, and λ_n = 2 iff G has a bipartite connected component.
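A minimal numerical check of the statement above (my sketch, not part of the talk; the example graph, a union of two triangles, is chosen only for illustration):

```python
# Sketch: build L = I - D^{-1/2} A D^{-1/2} and check that the multiplicity
# of the eigenvalue 0 equals the number of connected components.
import numpy as np

def normalized_laplacian(A):
    """Normalized Laplacian of a weighted adjacency matrix A."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

# Two disjoint triangles: 6 vertices, 2 connected components.
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[u, v] = A[v, u] = 1.0

lam = np.linalg.eigvalsh(normalized_laplacian(A))   # ascending order
print(np.round(lam, 3))   # [0. 0. 1.5 1.5 1.5 1.5]: lambda_2 = 0, as predicted
```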

[Example graph: Laplacian spectrum 0, 2/3, 2/3, 2/3, 2/3, 2/3, 5/3, 5/3, 5/3, 5/3]

[Example graph: Laplacian spectrum 0, 0, .69, .69, 1.5, 1.5, 1.8, 1.8]

[Example graph: Laplacian spectrum 0, 2/3, 2/3, 2/3, 4/3, 4/3, 4/3, 2]

eigenvalues as solutions to optimization problems: If M ∈ R^{n×n} is symmetric and λ_1 ≤ λ_2 ≤ ... ≤ λ_n are its eigenvalues counted with multiplicities, then

λ_k = min_{A : k-dim subspace of R^n} max_{x ∈ A, x ≠ 0} (x^T M x) / (x^T x)

and the eigenvectors of λ_1, ..., λ_k are a basis of an optimal A.
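A quick numerical sanity check of this min-max characterization (my addition; the random symmetric matrix is an assumption chosen for illustration): over the subspace spanned by the first k eigenvectors, the Rayleigh quotient never exceeds λ_k.

```python
# Sketch: Rayleigh quotients over the span of the first k eigenvectors
# stay below lambda_k, the value achieved by the k-th eigenvector itself.
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((8, 8))
M = (M + M.T) / 2                       # random symmetric matrix
lam, V = np.linalg.eigh(M)              # eigenvalues in ascending order

k = 3
X = V[:, :k] @ rng.standard_normal((k, 1000))    # random vectors in the optimal subspace
rayleigh = np.einsum('ij,ij->j', X, M @ X) / np.einsum('ij,ij->j', X, X)
print(rayleigh.max(), "<=", lam[k - 1])          # max over the subspace is lambda_k
```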

why eigenvalues relate to connected components: If G = (V, E) is undirected and L is its normalized Laplacian, then

x^T L x / x^T x = ( Σ_{(u,v)∈E} (x_u − x_v)² ) / ( Σ_v d_v x_v² )

(up to the change of variable x ↦ D^{1/2} x, which does not affect the min-max below). If λ_1 ≤ λ_2 ≤ ... ≤ λ_n are the eigenvalues of L with multiplicities, then

λ_k = min_{A : k-dim subspace of R^V} max_{x ∈ A, x ≠ 0} ( Σ_{(u,v)∈E} (x_u − x_v)² ) / ( Σ_v d_v x_v² )

Note: Σ_{(u,v)∈E} (x_u − x_v)² = 0 iff x is constant on connected components.

if G has k connected components:

λ_k = min_{A : k-dim subspace of R^V} max_{x ∈ A, x ≠ 0} ( Σ_{(u,v)∈E} (x_u − x_v)² ) / ( Σ_v d_v x_v² )

Consider the space of all vectors that are constant on each connected component; it has dimension at least k, and so it witnesses λ_k = 0.

if λ_k = 0:

λ_k = min_{A : k-dim subspace of R^V} max_{x ∈ A, x ≠ 0} ( Σ_{(u,v)∈E} (x_u − x_v)² ) / ( Σ_v d_v x_v² )

The eigenvectors x^{(1)}, ..., x^{(k)} of λ_1, ..., λ_k are all constant on connected components. The mapping F(v) := (x^{(1)}_v, x^{(2)}_v, ..., x^{(k)}_v) is constant on connected components, and it maps to at least k distinct points; so G has at least k connected components.

conductance vol(s) := v S d v φ(s) := E(S, S) vol(s) φ(g) := min S V :0<vol(S) 1 2 vol(v ) φ(s) In regular graphs, same as edge expansion and, up to factor of 2 non-uniform sparsest cut

[Example graph with spectrum 0, 0, .69, .69, 1.5, 1.5, 1.8, 1.8: φ(G) = φ({0, 1, 2}) = 0/6 = 0]

[Example graph with spectrum 0, .055, .055, .211, ...: φ(G) = 2/18]

[Example graph with spectrum 0, 2/3, 2/3, 2/3, 2/3, 2/3, 5/3, 5/3, 5/3, 5/3: φ(G) = 5/15]

finding S of small conductance:
- G is a social network: S is a set of users more likely to be friends with each other than with anyone else
- G is a similarity graph on the items of a data set: S is a set of items more similar to each other than to the other items [Shi, Malik: image segmentation]
- G is an input of an NP-hard optimization problem: recurse on S and V − S, and use dynamic programming to combine the solutions in time 2^{O(|E(S, V−S)|)}

conductance versus λ_2: Recall that λ_2 = 0 ⟺ G is disconnected ⟺ φ(G) = 0. The Cheeger inequality (Alon, Milman, Dodziuk, mid-1980s) is a quantitative version of this fact:

λ_2 / 2 ≤ φ(G) ≤ √(2 λ_2)

Cheeger inequality: λ_2 / 2 ≤ φ(G) ≤ √(2 λ_2)

Constructive proof of φ(G) ≤ √(2 λ_2): given an eigenvector x of λ_2, sort the vertices v according to x_v; then some set S occurring as an initial segment or a final segment of the sorted order has φ(S) ≤ √(2 λ_2) ≤ 2 √(φ(G)). This sweep algorithm can be implemented in Õ(|V| + |E|) time (see the sketch below).

applications of λ_2 / 2 ≤ φ(G) ≤ √(2 λ_2):
- rigorous analysis of the sweep algorithm (practical performance is usually better than the worst-case analysis)
- to certify φ(G) ≥ Ω(1), it is enough to certify λ_2 ≥ Ω(1)
- the mixing time of the random walk is O((1/λ_2) · log |V|), and so it is also O((1/φ²(G)) · log |V|)

Lee, Oveis Gharan, T, 2012: From the fundamental theorem of spectral graph theory: λ_k = 0 ⟺ G has k connected components. We prove: λ_k small ⟺ G has k disjoint sets of small conductance.

order-k conductance: Define

φ_k(G) := min_{S_1, ..., S_k disjoint} max_{i=1,...,k} φ(S_i)

Note: φ(G) = φ_2(G), and φ_k(G) = 0 ⟺ G has k connected components.
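The brute-force approach extends directly to φ_k (my addition; it reuses the `conductance` helper from the earlier sketch and enumerates all labelings, so again tiny graphs only):

```python
# Sketch: phi_k(G) = min over k disjoint nonempty sets of the max conductance.
# Each vertex gets a label in {0,...,k-1}, or the label k meaning "in no set".
from itertools import product
import numpy as np

def phi_k(A, k):
    n = len(A)
    best = np.inf
    for labels in product(range(k + 1), repeat=n):
        parts = [[v for v in range(n) if labels[v] == i] for i in range(k)]
        if any(not p for p in parts):
            continue
        best = min(best, max(conductance(A, p) for p in parts))
    return best
```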

[Example graph with spectrum 0, 0, .69, .69, 1.5, 1.5, 1.8, 1.8: φ_3(G) = max{0, 2/6, 2/4} = 1/2]

[Example graph with spectrum 0, .055, .055, .211, ...: φ_3(G) = max{2/12, 2/12, 2/14} = 1/6]

higher-order Cheeger inequality: λ_k = 0 ⟺ φ_k = 0 has a robust analogue (Lee, Oveis Gharan, T, 2012):

λ_k / 2 ≤ φ_k(G) ≤ O(k²) √λ_k

The upper bound is constructive: in nearly-linear time we can find k disjoint sets, each of conductance at most O(k²) √λ_k.

λ_k / 2 ≤ φ_k(G) ≤ O(k²) √λ_k

φ_k(G) ≤ O(√(log k)) · √(λ_{1.1·k}) (Lee, Oveis Gharan, T, 2012)

φ_k(G) ≤ O(√(log k)) · √(λ_{O(k)}) (Louis, Raghavendra, Tetali, Vempala, 2012)

The factor O(√(log k)) is tight.

spectral clustering: The following algorithm works well in practice (but to solve what problem?):
- compute the k eigenvectors x^{(1)}, ..., x^{(k)} of the k smallest eigenvalues of L
- define the mapping F : V → R^k, F(v) := (x^{(1)}_v, x^{(2)}_v, ..., x^{(k)}_v)
- apply k-means to the points that the vertices are mapped to (see the sketch below)
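A compact sketch of this heuristic (my addition), with scipy's k-means standing in for any k-means implementation:

```python
# Sketch: embed each vertex by the first k eigenvectors of the normalized
# Laplacian, then cluster the embedded points with k-means.
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_clustering(A, k):
    d = A.sum(axis=1)
    d_is = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
    L = np.eye(len(A)) - d_is[:, None] * A * d_is[None, :]
    _, V = np.linalg.eigh(L)
    F = V[:, :k]                        # row v is F(v) = (x_v^(1), ..., x_v^(k))
    _, labels = kmeans2(F, k, minit='++')
    return labels
```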

[Figure: spectral embedding of a random graph, k = 3]

spectral clustering: the LOT algorithms and the LRTV algorithm find k low-conductance sets if they exist. The first step is the spectral embedding; it is followed by a geometric partitioning step, but not by k-means.

improved Cheeger inequality: in addition to λ_2 / 2 ≤ φ(G) ≤ √(2 λ_2), we have

φ(G) ≤ O(k) · λ_2 / √λ_k (Kwok, Lau, Lee, Oveis Gharan, T, 2013)

and the upper bound holds for the sweep algorithm. The nearly-linear-time sweep algorithm finds S such that, for every k,

φ(S) ≤ φ(G) · O(k / √λ_k)

proof structure:

1. We find a 2k-valued vector y such that ‖x − y‖² ≤ O(λ_2 / λ_k), where x is the eigenvector of λ_2.

2. For every k-valued vector y, applying the sweep algorithm to x finds a set S such that φ(S) ≤ O(k) · (λ_2 + √λ_2 · ‖x − y‖).

So the sweep algorithm finds S with φ(S) ≤ O(k) · λ_2 / √λ_k.

intuition for the first result: Given x, we find a 2k-valued vector y such that ‖x − y‖² ≤ O(R(x) / λ_k), where R(·) is the Rayleigh quotient. Suppose R(x) = 0 and λ_k > 0. Then x is constant on connected components (because R(x) = 0), and G has < k connected components (because λ_k > 0). So x is at distance zero from a vector (itself) that is (k − 1)-valued.

idea of proof of the first result:
- Consider x as an embedding of the vertices into the line.
- Partition the points that the vertices are mapped to into 2k intervals, minimizing the 2k-means objective with squared distances (see the sketch below).
- Round each point to the closest interval boundary.
- If the rounded vector is far from the original vector, then many points are far from an interval boundary; in that case we construct a witness that λ_k is small, by finding k disjointly supported vectors of small Rayleigh quotient.
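On the line, an optimal k-means clustering always uses contiguous intervals, so this partitioning step can be done exactly by dynamic programming over the sorted points. A sketch (my addition; O(k n²) time, not claimed to be the paper's algorithm):

```python
# Sketch: partition sorted points into k intervals minimizing the total
# squared distance to interval means (1-D k-means, solved exactly by DP).
import numpy as np

def interval_kmeans(points, k):
    x = np.sort(points)
    n = len(x)
    c1 = np.concatenate(([0.0], np.cumsum(x)))        # prefix sums
    c2 = np.concatenate(([0.0], np.cumsum(x ** 2)))   # prefix sums of squares

    def cost(i, j):   # squared-distance cost of cluster x[i:j] around its mean
        s1, s2, m = c1[j] - c1[i], c2[j] - c2[i], j - i
        return s2 - s1 * s1 / m

    dp = np.full((k + 1, n + 1), np.inf)
    dp[0, 0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            dp[c, j] = min(dp[c - 1, i] + cost(i, j) for i in range(c - 1, j))
    return dp[k, n]   # optimal cost; the boundaries are recoverable by backtracking
```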

intuition for the second result: For every k-valued vector y, applying the sweep algorithm to x finds a set S such that

φ(S) ≤ O(k) · (R(x) + √R(x) · ‖x − y‖)

It was known that the sweep finds S with φ(S) ≤ √(2 R(x)), and that if x itself is k-valued (‖x − y‖ = 0) it finds S with φ(S) ≤ O(k) · R(x). Our proof blends the two arguments.

eigenvalue gap: if λ_k = 0 and λ_{k+1} > 0, then G has exactly k connected components.

[Tanaka 2012] If λ_{k+1} > 5^k · λ_k, then the vertices of G can be partitioned into k sets of small conductance, each inducing a subgraph of large conductance. [Non-algorithmic proof]

[Oveis Gharan, T 2013] It is sufficient that φ_{k+1}(G) > (1 + ε) · φ_k(G); if λ_{k+1} > O(k²) · λ_k, then the partition can be found algorithmically.

bounded threshold rank [Arora, Barak, Steurer], [Barak, Raghavendra, Steurer], [Guruswami, Sinop], ...: If λ_k is large, then a cut of approximately minimal conductance can be found in time exponential in k, using spectral methods [ABS] or convex relaxations [BRS, GS].

[Kwok, Lau, Lee, Oveis Gharan, T]: the nearly-linear-time sweep algorithm finds a good approximation if λ_k is large.

[Oveis Gharan, T]: the Linial-Goemans relaxation (solvable in polynomial time, independently of k) can be rounded better than in the Arora-Rao-Vazirani analysis if λ_k is large.

challenge: explain the practical performance of the sweep algorithm, for which

λ_2 / 2 ≤ φ(G) ≤ φ(output of algorithm) ≤ √(2 λ_2)

(Kwok, Lau, Lee, Oveis Gharan, T): φ(output of algorithm) ≤ O(k) · λ_2 / √λ_k

(Spielman, Teng): in bounded-degree planar graphs, λ_2 ≤ O(1/n), and so φ(output of algorithm) ≤ O(1/√n), matching the planar separator theorem (see the usage sketch below).
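A usage sketch (my addition) of the earlier `sweep` function on a grid, an easy stand-in for a bounded-degree planar graph:

```python
# Sketch: on an m x m grid (n = m^2 vertices), lambda_2 = O(1/n), so the
# sweep cut is guaranteed to have conductance O(1/sqrt(n)).
import numpy as np

def grid_adjacency(m):
    A = np.zeros((m * m, m * m))
    for i in range(m):
        for j in range(m):
            v = i * m + j
            if i + 1 < m:
                A[v, v + m] = A[v + m, v] = 1.0   # vertical edge
            if j + 1 < m:
                A[v, v + 1] = A[v + 1, v] = 1.0   # horizontal edge
    return A

S, phi_S = sweep(grid_adjacency(12))   # sweep() from the sketch above
print(len(S), phi_S)                   # about half the vertices, phi_S = O(1/m)
```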

explain the practical performance of the sweep algorithm:

λ_2 / 2 ≤ φ(G) ≤ φ(output of algorithm) ≤ √(2 λ_2)

Can we find interesting classes of graphs for which λ_2 ≪ φ(G)? (When λ_2 is as small as ≈ φ²(G), the upper bound √(2 λ_2) is within a constant factor of φ(G), so the sweep cut is near-optimal.)

lucatrevisan.wordpress.com/tag/expanders/