Graph Partitioning Using Random Walks


Graph Partitioning Using Random Walks: A Convex Optimization Perspective. Lorenzo Orecchia, Computer Science.

Why Spectral Algorithms for Graph Problems?
In practice:
- Simple to implement
- Can exploit very efficient linear algebra routines
- Perform well in practice for many problems
In theory:
- Connections between spectral and combinatorial objects
- Connections to Markov chains and probability theory
- Intuitive geometric viewpoint
RECENT ADVANCES: fast algorithms for fundamental combinatorial problems rely on spectral and optimization ideas.

Spectral Algorithms for Graph Partitioning
Spectral algorithms are widely used in many graph-partitioning applications: clustering, image segmentation, community detection, etc.
CLASSICAL VIEW:
- Based on Cheeger's inequality
- Eigenvector sweep cuts reveal sparse cuts in the graph
NEW TREND:
- Random-walk vectors replace eigenvectors: fast algorithms for graph partitioning, local graph partitioning, empirical network analysis
- Different random walks: PageRank, heat kernel, etc.

Graphs As Linear Operators
Adjacency matrix A: A_ij = A_ji = w_ij if i ≠ j, and 0 if i = j (weights are similarities).
Diagonal degree matrix D: d_i = Σ_j w_ij, D = diag(d).
Random walk matrix W: W = A D^{-1} (probability transition matrix).
Laplacian matrix L: L = D - A = (I - W) D.
L is positive semidefinite, i.e., L ⪰ 0.
The quadratic form x^T L x measures how quickly the random walk mixes from the vector x.
The all-ones vector 1 is a 0-eigenvector, because the degree vector D1 is the stationary distribution of the random walk.
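To make these operators concrete, here is a minimal numpy sketch (my own illustration, not part of the talk) that builds A, D, W, and L for a small made-up weighted graph and checks the properties stated above.

    import numpy as np

    # Small weighted, undirected example graph given as (i, j, w_ij) triples (made up).
    edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 0, 0.5), (2, 3, 1.0)]
    n = 4

    A = np.zeros((n, n))
    for i, j, w in edges:
        A[i, j] = A[j, i] = w          # adjacency: symmetric, zero diagonal

    d = A.sum(axis=1)                   # degrees d_i = sum_j w_ij
    D = np.diag(d)
    W = A @ np.linalg.inv(D)            # random walk matrix W = A D^{-1} (columns sum to 1)
    L = D - A                           # Laplacian L = D - A = (I - W) D

    assert np.allclose(L, (np.eye(n) - W) @ D)
    assert np.allclose(L @ np.ones(n), 0)          # all-ones vector is in the kernel of L
    assert np.all(np.linalg.eigvalsh(L) >= -1e-9)  # L is positive semidefinite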

Example: Spectral Clustering
Small eigenvalues of the Laplacian control the clustering structure of the graph:
- λ_min = λ_1 = 0 by construction
- λ_2 = 0 if G is disconnected
- Robust to noise when the eigengap is large
Components are recovered by computing the 2nd-smallest eigenvector, aka the Fiedler vector.
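For illustration, here is a minimal Python sketch (mine, not from the slides) of this classical pipeline: compute the Fiedler vector of the normalized problem and take the best sweep cut by conductance, assuming a connected graph with positive degrees.

    import numpy as np

    def fiedler_sweep_cut(A):
        # Fiedler vector of the generalized problem L x = lam D x, then best sweep cut.
        d = A.sum(axis=1)
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
        L_norm = np.eye(len(d)) - D_inv_sqrt @ A @ D_inv_sqrt   # normalized Laplacian
        vals, vecs = np.linalg.eigh(L_norm)
        fiedler = D_inv_sqrt @ vecs[:, 1]          # generalized eigenvector for 2nd-smallest eigenvalue
        order = np.argsort(fiedler)                # sweep nodes in order of the 1-d embedding
        best, best_phi = None, np.inf
        vol_total = d.sum()
        for k in range(1, len(order)):
            S = order[:k]
            cut = A[np.ix_(S, np.setdiff1d(order, S))].sum()
            vol = min(d[S].sum(), vol_total - d[S].sum())
            phi = cut / vol                        # conductance of the prefix cut
            if phi < best_phi:
                best, best_phi = S, phi
        return best, best_phi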

Why Random Walks? A Practitioner's View
Advantages of random walks:
1) Quick approximation to the eigenvector in massive graphs.
The Fiedler vector can be approximated by applying the power method to W: for a random y_0 with y_0^T D^{-1} 1 = 0, compute D^{-1} W^t y_0.
Heuristic: for massive graphs, pick t as large as computationally affordable.
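A rough numpy sketch of this heuristic (my own; the exact orthogonalization convention on the slide may differ, here the stationary component of W = A D^{-1} is projected out each step):

    import numpy as np

    def approx_fiedler_by_walk(A, t=200, seed=0):
        # Heuristic sketch: approximate a Fiedler-like vector with t random-walk steps,
        # as suggested on the slide, instead of an exact eigensolver.
        rng = np.random.default_rng(seed)
        d = A.sum(axis=1)
        W = A / d                                  # W = A D^{-1}: column j divided by d_j
        y = rng.standard_normal(A.shape[0])
        for _ in range(t):
            y -= (y.sum() / d.sum()) * d           # project out the top (stationary) component of W
            y = W @ y                              # one walk step
            y /= np.linalg.norm(y)                 # renormalize to keep the iterate well-scaled
        return y / d                               # D^{-1} W^t y_0, as on the slide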

Why Random Walks? A Practitioner's View
2) Statistical robustness. Real-world graphs are noisy: the input graph is a noisy measurement of an underlying ground-truth graph.
GOAL: estimate the eigenvector of the ground-truth graph.
OBSERVATION: the eigenvector of the input graph can have very large variance, as it can be very sensitive to noise. Random-walk vectors provide better, more stable estimates.

What's Wrong With Eigenvectors?
1) Computationally, eigenvector computation may be slow.
2) Statistically, eigenvectors are not robust.
[Figure: eigenvector of the ground-truth graph vs. eigenvector of a noisy measurement; the eigenvector changes completely.]
OBSERVATION: the eigenvector of the input graph can be very sensitive to noise.

Talk Outline
Stable analogues of the Laplacian eigenvectors:
- Formulate the eigenvector as a convex optimization problem
- Regularization yields a stable optimum
- Natural connection to random walks
Applications to graph partitioning:
- Spectral clustering
- Balanced graph partitioning
- Detecting well-connected subgraphs

Stable Analogues of Laplacian Eigenvectors

The Laplacian Eigenvalue Problem
SOCP formulation (for regular G):
  min x^T L x   s.t.  ||x||_2 = 1,  x^T 1 = 0,  x ∈ R^n.
The eigenvector is a one-dimensional embedding of the graph.
ISSUES:
1. Non-convexity
2. Stability requires multidimensional embeddings
Multidimensional SOCP:
  min Tr(Y^T L Y)   s.t.  Tr(Y^T Y) = 1,  Tr(Y^T 1 1^T Y) = 0,  Y ∈ R^{n×k},
which yields a multidimensional embedding. Both programs have the same optimum, attained by a one-dimensional (rank-one) embedding.

Laplacian Eigenvalue as a Semidefinite Program
SDP formulation:
  min L • X   s.t.  I • X = 1,  11^T • X = 0,  X ⪰ 0.
The feasible set is a spectrahedron.
Both programs have the same optimum: from an optimal eigenvector x*, take X* = x*(x*)^T.
NB: this linear SDP is a convex representation of the original eigenvector problem, with a rank-1 optimal solution, i.e., a 1-dimensional vector embedding.
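As an illustration (not from the talk), this is how the SDP can be written down directly in cvxpy for a toy 4-cycle; the solver and graph are my own choices.

    import numpy as np
    import cvxpy as cp

    # Toy Laplacian of a 4-cycle, purely for illustration.
    A = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]], dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    n = L.shape[0]

    X = cp.Variable((n, n), symmetric=True)
    constraints = [X >> 0,                 # X lies in the spectrahedron (PSD)
                   cp.trace(X) == 1,       # I . X = 1
                   cp.sum(X) == 0]         # 11^T . X = sum of all entries of X = 0
    prob = cp.Problem(cp.Minimize(cp.trace(L @ X)), constraints)
    prob.solve()

    # For a connected graph the optimal value equals lambda_2(L); a rank-1 optimum exists,
    # though the solver may return a higher-rank X when lambda_2 is degenerate.
    vals, vecs = np.linalg.eigh(X.value)
    print(prob.value, vals[-1], vecs[:, -1])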

What is Regularization?
Regularization is a fundamental technique in optimization: it turns an optimization problem into a well-behaved optimization problem, with a stable optimum, a unique optimal solution, and smoothness conditions.
  min_{x ∈ H}  L(x) + η·F(x),   with parameter η > 0 and regularizer F.
Benefits of regularization in learning and statistics:
- Decreases sensitivity to noise
- Prevents overfitting

Regularized Spectral Optimization
Regularized SDP (regularizer F, parameter η > 0):
  min L • X + η·F(X)   s.t.  I • X = 1,  11^T • X = 0,  X ⪰ 0.
ASSUMPTION: the regularizer F is 1-strongly convex over the spectrahedron with respect to some norm.
RESULTS: let X* be the optimal solution.
1. The embedding corresponding to X* is multidimensional.
2. The embedding of X* can be approximated well in time O(|E| polylog(|E|)).
3. Approximation guarantee: the objective value is near λ_2, i.e., L • X* − λ_2 ≤ η · max F(X) over the feasible set.
4. Stability (aka smoothness) under an arbitrary perturbation H: ‖X*(L + H) − X*(L)‖ ≤ ‖H‖.

Examples of Regularizers
Applicable regularizers are SDP versions of common regularizers:
- von Neumann entropy: F_H(X) = Tr(X log X)
- p-norm, p > 1: F_p(X) = (1/p)‖X‖_p^p = (1/p) Tr(X^p)
- 1/2-quasi-norm: F_{1/2}(X) = −2 Tr(X^{1/2})
For each regularizer, we get a different trade-off between regularization and stability.

Regularized Eigenvectors and Random Walks
Regularized SDP: min L • X + η·F(X)  s.t.  I • X = 1, 11^T • X = 0, X ⪰ 0.
The optimal solution of the regularized program is a random-walk vector:
- F = F_H (entropy): heat-kernel diffusion, X* ∝ e^{−tL}, where t depends on η.
- F = F_p (p-norm): lazy random walk, X* ∝ (aI + (1 − a)W)^{1/(p−1)}, where a depends on η.
- F = F_{1/2} (1/2-quasi-norm): personalized PageRank, X* ∝ (I − aW)^{−1}, where a depends on η.
This interpretation has computational and analytical advantages.
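To make the correspondence concrete, here is a small numpy/scipy sketch (mine, not from the talk) that forms the three random-walk objects for a toy graph; the parameters t, a, p are arbitrary stand-ins for the unspecified dependence on η.

    import numpy as np
    from scipy.linalg import expm, fractional_matrix_power

    # Toy graph, purely for illustration.
    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    d = A.sum(axis=1)
    D = np.diag(d)
    W = A @ np.linalg.inv(D)          # random walk matrix W = A D^{-1}
    L = D - A                         # Laplacian
    I = np.eye(len(d))

    t, a, p = 1.0, 0.85, 2.0
    heat_kernel = expm(-t * L)                                          # entropy regularizer -> e^{-tL}
    lazy_walk = fractional_matrix_power(a * I + (1 - a) * W, 1 / (p - 1))  # p-norm -> (aI+(1-a)W)^{1/(p-1)}
    ppr = np.linalg.inv(I - a * W)                                      # 1/2-quasi-norm -> (I - aW)^{-1}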

Applications

Robust Spectral Clustering
PROBLEM: consider spectral clustering in the wild: no eigengap assumption; the input graph may be noisy.
APPROACH: perform k-means on the regularized spectral embedding.
[Figure: noisy measurement vs. regularized embedding.]
RESULT: guarantees on the stability of clusters under different levels of noise.
With Zhenyu Liao (BU). In progress.
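A hypothetical sketch of such a pipeline (my own reading, not the exact algorithm of the work in progress), using a heat-kernel-regularized embedding and scikit-learn's k-means:

    import numpy as np
    from scipy.linalg import expm
    from sklearn.cluster import KMeans

    def regularized_spectral_clustering(A, k, t=1.0):
        # Embed nodes with a heat-kernel-regularized embedding Y, where Y Y^T = e^{-tL},
        # then cluster the rows of Y with k-means.
        L = np.diag(A.sum(axis=1)) - A
        Y = expm(-0.5 * t * L)             # rows of Y give the embedding, since Y Y^T = e^{-tL}
        return KMeans(n_clusters=k, n_init=10).fit_predict(Y)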

Balanced Graph Partitioning
GOAL: find a large sparse cut S ⊆ V.
SPARSITY GUARANTEE on conductance: φ(S) = |E(S, S̄)| / min{vol(S), vol(S̄)}.
SIZE GUARANTEE on volume: 1/2 > vol(S)/vol(V) > b.
With Nisheeth Vishnoi (EPFL) and Sushant Sachdeva (Yale).
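For concreteness, a small helper (my own) that evaluates a candidate cut S against the two guarantees above; the balance target b is a hypothetical parameter.

    import numpy as np

    def cut_quality(A, S, b=0.1):
        # Evaluate a candidate vertex set S: conductance phi(S) and balance vol(S)/vol(V).
        mask = np.zeros(A.shape[0], dtype=bool)
        mask[np.asarray(S)] = True
        cut = A[mask][:, ~mask].sum()                 # weight of edges crossing (S, S_bar)
        vol_S, vol_V = A[mask].sum(), A.sum()
        phi = cut / min(vol_S, vol_V - vol_S)         # conductance
        balance = min(vol_S, vol_V - vol_S) / vol_V   # lies in (b, 1/2] when the size guarantee holds
        return phi, balance, balance > b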

Eigenvector Approach: The Worst Case
Recursive Eigenvector Algorithm: compute the smallest (nontrivial) Laplacian eigenvector; use it to find a cut; if the cut is unbalanced, remove it and reiterate on the residual graph.
Worst case: a graph with Ω(n) nearly-disconnected components S_1, S_2, S_3, ... forces Ω(n) eigenvector computations.
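A sketch of this recursive procedure (my own, reusing fiedler_sweep_cut from the earlier sketch and assuming the residual graph stays connected; the balance threshold is a hypothetical parameter):

    import numpy as np

    def recursive_partition(A, balance=0.25):
        # Recursive eigenvector heuristic: cut with the Fiedler sweep cut; if the cut is
        # unbalanced, remove it and reiterate on the residual graph. On a graph made of
        # many nearly-disconnected small pieces this peels one piece per iteration.
        nodes = np.arange(A.shape[0])
        removed = []
        while A.shape[0] > 2:
            S, _ = fiedler_sweep_cut(A)                    # one eigenvector computation
            vol_S, vol = A[S].sum(), A.sum()
            if min(vol_S, vol - vol_S) >= balance * vol:   # balanced enough: done
                return nodes[S], removed
            removed.append(nodes[S])                       # unbalanced: remove and reiterate
            keep = np.setdiff1d(np.arange(A.shape[0]), S)
            A, nodes = A[np.ix_(keep, keep)], nodes[keep]
        return nodes, removed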

Regularized Eigenvector Approach
The regularized eigenvector embedding reveals a sparse, large cut in one shot.
RESULT: balanced graph partitioning in nearly-linear time O(|E| polylog(|E|)).

Detecting Well-Connected Subsets
GOAL: detect a subset S that is well-connected.
Suppose the second-smallest eigenvector looks like this [figure: eigenvector values with the subset S marked]. Question: what can we say about the connectivity of S?
Using our stable analogue of the eigenvector, we obtain a multidimensional embedding, and we can guarantee strong lower bounds on the connectivity of the subgraph induced by S.
Joint work with Nisheeth Vishnoi (EPFL) and Sushant Sachdeva (Yale).

And more...
Different graph partitioning formulations:
- Local graph partitioning
- Different partitioning objectives
Sparsification:
- Fastest construction of graph sparsifiers with constant average degree
- Uses the PageRank regularizer
Dynamically evolving networks:
- Analysis of local edge-switching protocols that guarantee global connectivity properties
THE END. More details at orecchia.net