Graph Partitioning Using Random Walks


Graph Partitioning Using Random Walks: A Convex Optimization Perspective. Lorenzo Orecchia, Computer Science.

Why Spectral Algorithms for Graph Problems?
In practice:
- Simple to implement
- Can exploit very efficient linear algebra routines
- Perform well in practice for many problems
In theory:
- Connections between spectral and combinatorial objects
- Connections to Markov chains and probability theory
- Intuitive geometric viewpoint
RECENT ADVANCES: fast algorithms for fundamental combinatorial problems rely on spectral and optimization ideas.

Spectral Algorithms for Graph Partitioning
Spectral algorithms are widely used in many graph-partitioning applications: clustering, image segmentation, community detection, etc.
CLASSICAL VIEW:
- Based on Cheeger's inequality
- Eigenvector sweep cuts reveal sparse cuts in the graph
NEW TREND:
- Random-walk vectors replace eigenvectors: fast algorithms for graph partitioning, local graph partitioning, empirical network analysis
- Different random walks: PageRank, heat kernel, etc.

Graphs As Linear Operators
Adjacency matrix A: A_ij = A_ji = w_ij if i ≠ j, and 0 if i = j (weights are similarities).
Diagonal degree matrix D: d_i = Σ_j w_ij, D = diag(d).
Random walk matrix W: W = A D^{-1} (probability transition matrix).
Laplacian matrix L: L = D - A = (I - W) D.
L is positive semidefinite, i.e., L ⪰ 0.
The quadratic form x^T L x measures how quickly the random walk mixes from the vector x.
The all-ones vector 1 is a 0-eigenvector, because the degree vector D1 is the stationary distribution of the random walk.
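To make these operators concrete, here is a minimal numpy sketch (my own illustration, not part of the talk) that builds A, D, W, and L for a small made-up weighted graph and checks the properties stated above.

    import numpy as np

    # Small weighted, undirected example graph given as (i, j, w_ij) triples (made up).
    edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 0, 0.5), (2, 3, 1.0)]
    n = 4

    A = np.zeros((n, n))
    for i, j, w in edges:
        A[i, j] = A[j, i] = w          # adjacency: symmetric, zero diagonal

    d = A.sum(axis=1)                   # degrees d_i = sum_j w_ij
    D = np.diag(d)
    W = A @ np.linalg.inv(D)            # random walk matrix W = A D^{-1} (columns sum to 1)
    L = D - A                           # Laplacian L = D - A = (I - W) D

    assert np.allclose(L, (np.eye(n) - W) @ D)
    assert np.allclose(L @ np.ones(n), 0)          # all-ones vector is in the kernel of L
    assert np.all(np.linalg.eigvalsh(L) >= -1e-9)  # L is positive semidefinite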

Example: Spectral Clustering
Small eigenvalues of the Laplacian control the clustering structure of the graph:
- λ_min = λ_1 = 0 by construction
- λ_2 = 0 if G is disconnected
- Robust to noise when the eigengap is large
Components are recovered by computing the 2nd-smallest eigenvector, aka the Fiedler vector.
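For illustration, here is a minimal Python sketch (mine, not from the slides) of this classical pipeline: compute the Fiedler vector of the normalized problem and take the best sweep cut by conductance, assuming a connected graph with positive degrees.

    import numpy as np

    def fiedler_sweep_cut(A):
        # Fiedler vector of the generalized problem L x = lam D x, then best sweep cut.
        d = A.sum(axis=1)
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
        L_norm = np.eye(len(d)) - D_inv_sqrt @ A @ D_inv_sqrt   # normalized Laplacian
        vals, vecs = np.linalg.eigh(L_norm)
        fiedler = D_inv_sqrt @ vecs[:, 1]          # generalized eigenvector for 2nd-smallest eigenvalue
        order = np.argsort(fiedler)                # sweep nodes in order of the 1-d embedding
        best, best_phi = None, np.inf
        vol_total = d.sum()
        for k in range(1, len(order)):
            S = order[:k]
            cut = A[np.ix_(S, np.setdiff1d(order, S))].sum()
            vol = min(d[S].sum(), vol_total - d[S].sum())
            phi = cut / vol                        # conductance of the prefix cut
            if phi < best_phi:
                best, best_phi = S, phi
        return best, best_phi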

Why Random Walks? A Practitioner's View
Advantages of random walks:
1) Quick approximation to the eigenvector in massive graphs.
The Fiedler vector can be approximated by applying the power method to W: for a random y_0 with y_0^T D^{-1} 1 = 0, compute D^{-1} W^t y_0.
Heuristic: for massive graphs, pick t as large as computationally affordable.
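A rough numpy sketch of this heuristic (my own; the exact orthogonalization convention on the slide may differ, here the stationary component of W = A D^{-1} is projected out each step):

    import numpy as np

    def approx_fiedler_by_walk(A, t=200, seed=0):
        # Heuristic sketch: approximate a Fiedler-like vector with t random-walk steps,
        # as suggested on the slide, instead of an exact eigensolver.
        rng = np.random.default_rng(seed)
        d = A.sum(axis=1)
        W = A / d                                  # W = A D^{-1}: column j divided by d_j
        y = rng.standard_normal(A.shape[0])
        for _ in range(t):
            y -= (y.sum() / d.sum()) * d           # project out the top (stationary) component of W
            y = W @ y                              # one walk step
            y /= np.linalg.norm(y)                 # renormalize to keep the iterate well-scaled
        return y / d                               # D^{-1} W^t y_0, as on the slide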

Why Random Walks? A Practitioner's View
2) Statistical robustness. Real-world graphs are noisy: the input graph is a noisy measurement of an underlying ground-truth graph.
GOAL: estimate the eigenvector of the ground-truth graph.
OBSERVATION: the eigenvector of the input graph can have very large variance, as it can be very sensitive to noise. Random-walk vectors provide better, more stable estimates.

What's Wrong With Eigenvectors?
1) Computationally, eigenvector computation may be slow.
2) Statistically, eigenvectors are not robust.
[Figure: eigenvector of the ground-truth graph vs. eigenvector of a noisy measurement; the eigenvector changes completely.]
OBSERVATION: the eigenvector of the input graph can be very sensitive to noise.

Talk Outline
Stable analogues of the Laplacian eigenvectors:
- Formulate the eigenvector as a convex optimization problem
- Regularization yields a stable optimum
- Natural connection to random walks
Applications to graph partitioning:
- Spectral clustering
- Balanced graph partitioning
- Detecting well-connected subgraphs

Stable Analogues of Laplacian Eigenvectors

The Laplacian Eigenvalue Problem
SOCP formulation (for regular G):
  min x^T L x   s.t.  ||x||_2 = 1,  x^T 1 = 0,  x ∈ R^n.
The eigenvector is a one-dimensional embedding of the graph.
ISSUES:
1. Non-convexity
2. Stability requires multidimensional embeddings
Multidimensional SOCP:
  min Tr(Y^T L Y)   s.t.  Tr(Y^T Y) = 1,  Tr(Y^T 1 1^T Y) = 0,  Y ∈ R^{n×k},
which yields a multidimensional embedding. Both programs have the same optimum, attained by a one-dimensional (rank-one) embedding.

Laplacian Eigenvalue as a Semidefinite Program
SDP formulation:
  min L • X   s.t.  I • X = 1,  11^T • X = 0,  X ⪰ 0.
The feasible set is a spectrahedron.
Both programs have the same optimum: from an optimal eigenvector x*, take X* = x*(x*)^T.
NB: this linear SDP is a convex representation of the original eigenvector problem, with a rank-1 optimal solution, i.e., a 1-dimensional vector embedding.
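As an illustration (not from the talk), this is how the SDP can be written down directly in cvxpy for a toy 4-cycle; the solver and graph are my own choices.

    import numpy as np
    import cvxpy as cp

    # Toy Laplacian of a 4-cycle, purely for illustration.
    A = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]], dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    n = L.shape[0]

    X = cp.Variable((n, n), symmetric=True)
    constraints = [X >> 0,                 # X lies in the spectrahedron (PSD)
                   cp.trace(X) == 1,       # I . X = 1
                   cp.sum(X) == 0]         # 11^T . X = sum of all entries of X = 0
    prob = cp.Problem(cp.Minimize(cp.trace(L @ X)), constraints)
    prob.solve()

    # For a connected graph the optimal value equals lambda_2(L); a rank-1 optimum exists,
    # though the solver may return a higher-rank X when lambda_2 is degenerate.
    vals, vecs = np.linalg.eigh(X.value)
    print(prob.value, vals[-1], vecs[:, -1])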

What is Regularization?
Regularization is a fundamental technique in optimization: it turns an optimization problem into a well-behaved optimization problem, with a stable optimum, a unique optimal solution, and smoothness conditions.
  min_{x ∈ H}  L(x) + η·F(x),   with parameter η > 0 and regularizer F.
Benefits of regularization in learning and statistics:
- Decreases sensitivity to noise
- Prevents overfitting

Regularized Spectral Optimization
Regularized SDP (regularizer F, parameter η > 0):
  min L • X + η·F(X)   s.t.  I • X = 1,  11^T • X = 0,  X ⪰ 0.
ASSUMPTION: the regularizer F is 1-strongly convex over the spectrahedron with respect to some norm.
RESULTS: let X* be the optimal solution.
1. The embedding corresponding to X* is multidimensional.
2. The embedding of X* can be approximated well in time O(|E| polylog(|E|)).
3. Approximation guarantee: the objective value is near λ_2, i.e., L • X* − λ_2 ≤ η · max F(X) over the feasible set.
4. Stability (aka smoothness) under an arbitrary perturbation H: ‖X*(L + H) − X*(L)‖ ≤ ‖H‖.

Examples of Regularizers
Applicable regularizers are SDP versions of common regularizers:
- von Neumann entropy: F_H(X) = Tr(X log X)
- p-norm, p > 1: F_p(X) = (1/p)‖X‖_p^p = (1/p) Tr(X^p)
- 1/2-quasi-norm: F_{1/2}(X) = −2 Tr(X^{1/2})
For each regularizer, we get a different trade-off between regularization and stability.

Regularized Eigenvectors and Random Walks
Regularized SDP: min L • X + η·F(X)  s.t.  I • X = 1, 11^T • X = 0, X ⪰ 0.
The optimal solution of the regularized program is a random-walk vector:
- F = F_H (entropy): heat-kernel diffusion, X* ∝ e^{−tL}, where t depends on η.
- F = F_p (p-norm): lazy random walk, X* ∝ (aI + (1 − a)W)^{1/(p−1)}, where a depends on η.
- F = F_{1/2} (1/2-quasi-norm): personalized PageRank, X* ∝ (I − aW)^{−1}, where a depends on η.
This interpretation has computational and analytical advantages.
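To make the correspondence concrete, here is a small numpy/scipy sketch (mine, not from the talk) that forms the three random-walk objects for a toy graph; the parameters t, a, p are arbitrary stand-ins for the unspecified dependence on η.

    import numpy as np
    from scipy.linalg import expm, fractional_matrix_power

    # Toy graph, purely for illustration.
    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    d = A.sum(axis=1)
    D = np.diag(d)
    W = A @ np.linalg.inv(D)          # random walk matrix W = A D^{-1}
    L = D - A                         # Laplacian
    I = np.eye(len(d))

    t, a, p = 1.0, 0.85, 2.0
    heat_kernel = expm(-t * L)                                          # entropy regularizer -> e^{-tL}
    lazy_walk = fractional_matrix_power(a * I + (1 - a) * W, 1 / (p - 1))  # p-norm -> (aI+(1-a)W)^{1/(p-1)}
    ppr = np.linalg.inv(I - a * W)                                      # 1/2-quasi-norm -> (I - aW)^{-1}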

Applications

Robust Spectral Clustering
PROBLEM: consider spectral clustering in the wild: no eigengap assumption; the input graph may be noisy.
APPROACH: perform k-means on the regularized spectral embedding.
[Figure: noisy measurement vs. regularized embedding.]
RESULT: guarantees on the stability of clusters under different levels of noise.
With Zhenyu Liao (BU). In progress.
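A hypothetical sketch of such a pipeline (my own reading, not the exact algorithm of the work in progress), using a heat-kernel-regularized embedding and scikit-learn's k-means:

    import numpy as np
    from scipy.linalg import expm
    from sklearn.cluster import KMeans

    def regularized_spectral_clustering(A, k, t=1.0):
        # Embed nodes with a heat-kernel-regularized embedding Y, where Y Y^T = e^{-tL},
        # then cluster the rows of Y with k-means.
        L = np.diag(A.sum(axis=1)) - A
        Y = expm(-0.5 * t * L)             # rows of Y give the embedding, since Y Y^T = e^{-tL}
        return KMeans(n_clusters=k, n_init=10).fit_predict(Y)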

Balanced Graph Partitioning
GOAL: find a large sparse cut S ⊆ V.
SPARSITY GUARANTEE on conductance: φ(S) = |E(S, S̄)| / min{vol(S), vol(S̄)}.
SIZE GUARANTEE on volume: 1/2 > vol(S)/vol(V) > b.
With Nisheeth Vishnoi (EPFL) and Sushant Sachdeva (Yale).
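For concreteness, a small helper (my own) that evaluates a candidate cut S against the two guarantees above; the balance target b is a hypothetical parameter.

    import numpy as np

    def cut_quality(A, S, b=0.1):
        # Evaluate a candidate vertex set S: conductance phi(S) and balance vol(S)/vol(V).
        mask = np.zeros(A.shape[0], dtype=bool)
        mask[np.asarray(S)] = True
        cut = A[mask][:, ~mask].sum()                 # weight of edges crossing (S, S_bar)
        vol_S, vol_V = A[mask].sum(), A.sum()
        phi = cut / min(vol_S, vol_V - vol_S)         # conductance
        balance = min(vol_S, vol_V - vol_S) / vol_V   # lies in (b, 1/2] when the size guarantee holds
        return phi, balance, balance > b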

Eigenvector Approach: The Worst Case
Recursive Eigenvector Algorithm: compute the smallest (nontrivial) Laplacian eigenvector; use it to find a cut; if the cut is unbalanced, remove it and reiterate on the residual graph.
Worst case: a graph with Ω(n) nearly-disconnected components S_1, S_2, S_3, ... forces Ω(n) eigenvector computations.
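A sketch of this recursive procedure (my own, reusing fiedler_sweep_cut from the earlier sketch and assuming the residual graph stays connected; the balance threshold is a hypothetical parameter):

    import numpy as np

    def recursive_partition(A, balance=0.25):
        # Recursive eigenvector heuristic: cut with the Fiedler sweep cut; if the cut is
        # unbalanced, remove it and reiterate on the residual graph. On a graph made of
        # many nearly-disconnected small pieces this peels one piece per iteration.
        nodes = np.arange(A.shape[0])
        removed = []
        while A.shape[0] > 2:
            S, _ = fiedler_sweep_cut(A)                    # one eigenvector computation
            vol_S, vol = A[S].sum(), A.sum()
            if min(vol_S, vol - vol_S) >= balance * vol:   # balanced enough: done
                return nodes[S], removed
            removed.append(nodes[S])                       # unbalanced: remove and reiterate
            keep = np.setdiff1d(np.arange(A.shape[0]), S)
            A, nodes = A[np.ix_(keep, keep)], nodes[keep]
        return nodes, removed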

Regularized Eigenvector Approach
The regularized eigenvector embedding reveals a sparse, large cut in one shot.
RESULT: balanced graph partitioning in nearly-linear time O(|E| polylog(|E|)).

Detecting Well-Connected Subsets
GOAL: detect a subset S that is well-connected.
Suppose the second-smallest eigenvector looks like this [figure: eigenvector values with the subset S marked]. Question: what can we say about the connectivity of S?
Using our stable analogue of the eigenvector, we obtain a multidimensional embedding, and we can guarantee strong lower bounds on the connectivity of the subgraph induced by S.
Joint work with Nisheeth Vishnoi (EPFL) and Sushant Sachdeva (Yale).

And more...
Different graph partitioning formulations:
- Local graph partitioning
- Different partitioning objectives
Sparsification:
- Fastest construction of graph sparsifiers with constant average degree
- Uses the PageRank regularizer
Dynamically evolving networks:
- Analysis of local edge-switching protocols that guarantee global connectivity properties
THE END. More details at orecchia.net