Four graph partitioning algorithms. Fan Chung University of California, San Diego

Similar documents
The heat kernel as the pagerank of a graph

A local graph partitioning algorithm using heat kernel pagerank

Graph Partitioning Using Random Walks

LOCAL CHEEGER INEQUALITIES AND SPARSE CUTS

Diffusion and random walks on graphs

Lecture: Local Spectral Methods (2 of 4) 19 Computing spectral ranking with the push procedure

Detecting Sharp Drops in PageRank and a Simplified Local Partitioning Algorithm

Algorithmic Primitives for Network Analysis: Through the Lens of the Laplacian Paradigm

Spectral Graph Theory and its Applications. Daniel A. Spielman Dept. of Computer Science Program in Applied Mathematics Yale Unviersity

Cutting Graphs, Personal PageRank and Spilling Paint

Personal PageRank and Spilling Paint

Random Walks and Evolving Sets: Faster Convergences and Limitations

Spectral Graph Theory and You: Matrix Tree Theorem and Centrality Metrics

Facebook Friends! and Matrix Functions

Spectral and Electrical Graph Theory. Daniel A. Spielman Dept. of Computer Science Program in Applied Mathematics Yale Unviersity

Vectorized Laplacians for dealing with high-dimensional data sets

A Local Spectral Method for Graphs: With Applications to Improving Graph Partitions and Exploring Data Graphs Locally

Data Analysis and Manifold Learning Lecture 7: Spectral Clustering

Lecture: Local Spectral Methods (3 of 4) 20 An optimization perspective on local spectral methods

Machine Learning for Data Science (CS4786) Lecture 11

Lecture 14: Random Walks, Local Graph Clustering, Linear Programming

Data Analysis and Manifold Learning Lecture 3: Graphs, Graph Matrices, and Graph Embeddings

Graph Partitioning Algorithms and Laplacian Eigenvalues

Spectral Analysis of Random Walks

Spectral Graph Theory Lecture 2. The Laplacian. Daniel A. Spielman September 4, x T M x. ψ i = arg min

Locally-biased analytics

1 Matrix notation and preliminaries from spectral graph theory

Lecture 13: Spectral Graph Theory

Networks and Their Spectra

Lecture: Local Spectral Methods (1 of 4)

Algorithms, Graph Theory, and Linear Equa9ons in Laplacians. Daniel A. Spielman Yale University

Eigenvalue Problems Computation and Applications

Lab 8: Measuring Graph Centrality - PageRank. Monday, November 5 CompSci 531, Fall 2018

Algorithms, Graph Theory, and the Solu7on of Laplacian Linear Equa7ons. Daniel A. Spielman Yale University

Laplacian and Random Walks on Graphs

CMPSCI 791BB: Advanced ML: Laplacian Learning

Using Local Spectral Methods in Theory and in Practice

A local algorithm for finding dense bipartite-like subgraphs

Heat Kernel Based Community Detection

Link Analysis Ranking

UNIVERSITY OF CALIFORNIA, SAN DIEGO. Local and Distributed Computation for Large Graphs

Finding central nodes in large networks

Parallel Local Graph Clustering

ECEN 689 Special Topics in Data Science for Communications Networks

Laplacian Matrices of Graphs: Spectral and Electrical Theory

Spectral Graph Theory

Convergence of Random Walks

The Push Algorithm for Spectral Ranking

Lecture 12: Introduction to Spectral Graph Theory, Cheeger s inequality

PageRank. Ryan Tibshirani /36-662: Data Mining. January Optional reading: ESL 14.10

Conductance, the Normalized Laplacian, and Cheeger s Inequality

Spectral Graph Theory

CS168: The Modern Algorithmic Toolbox Lectures #11 and #12: Spectral Graph Theory

CS6220: DATA MINING TECHNIQUES

Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering

Eugene Wigner [4]: in the natural sciences, Communications in Pure and Applied Mathematics, XIII, (1960), 1 14.

Regularized Laplacian Estimation and Fast Eigenvector Approximation

Linear algebra and applications to graphs Part 1

Data Analysis and Manifold Learning Lecture 9: Diffusion on Manifolds and on Graphs

0.1 Naive formulation of PageRank

Online Social Networks and Media. Link Analysis and Web Search

1998: enter Link Analysis

eigenvalues, markov matrices, and the power method

Multi-commodity allocation for dynamic demands using PageRank vectors

Local properties of PageRank and graph limits. Nelly Litvak University of Twente Eindhoven University of Technology, The Netherlands MIPT 2018

Spectral radius, symmetric and positive matrices

Analytically tractable processes on networks

Today. Next lecture. (Ch 14) Markov chains and hidden Markov models

Beyond Scalar Affinities for Network Analysis or Vector Diffusion Maps and the Connection Laplacian

Combinatorics of optimal designs

An Algorithmist s Toolkit September 10, Lecture 1

Introducing the Laplacian

Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University

Spectral Clustering. Spectral Clustering? Two Moons Data. Spectral Clustering Algorithm: Bipartioning. Spectral methods

eigenvalue bounds and metric uniformization Punya Biswal & James R. Lee University of Washington Satish Rao U. C. Berkeley

CS6220: DATA MINING TECHNIQUES

Markov Chains, Random Walks on Graphs, and the Laplacian

Spectral Graph Theory Lecture 3. Fundamental Graphs. Daniel A. Spielman September 5, 2018

Solving Local Linear Systems with Boundary Conditions Using Heat Kernel Pagerank

Data Mining Recitation Notes Week 3

MAE 298, Lecture 8 Feb 4, Web search and decentralized search on small-worlds

Clustering compiled by Alvin Wan from Professor Benjamin Recht s lecture, Samaneh s discussion

Introduction to Search Engine Technology Introduction to Link Structure Analysis. Ronny Lempel Yahoo Labs, Haifa

Graph Models The PageRank Algorithm

Mathematical Properties & Analysis of Google s PageRank

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

PSRGs via Random Walks on Graphs

Testing Cluster Structure of Graphs. Artur Czumaj

CS249: ADVANCED DATA MINING

Network Essence: PageRank Completion and Centrality-Conforming Markov Chains

Kristina Lerman USC Information Sciences Institute

arxiv: v1 [math.co] 27 Jul 2012

Slide source: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University.

Dirichlet PageRank and Ranking Algorithms Based on Trust and Distrust

STA141C: Big Data & High Performance Statistical Computing

Algebraic Representation of Networks

DATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS

Communities Via Laplacian Matrices. Degree, Adjacency, and Laplacian Matrices Eigenvectors of Laplacian Matrices

Finding normalized and modularity cuts by spectral clustering. Ljubjana 2010, October

Ranking and sparsifying a connection graph

Transcription:

Four graph partitioning algorithms Fan Chung University of California, San Diego

History of graph partitioning NP-hard approximation algorithms Spectral method, Fiedler 73, Folklore Multicommunity flow, Leighton+Rao 88 Semidefinite programming, xxxxxxxxxxxxxarora+rao+vazirani 04 Expander flow, Arora+Hazan+Kale 04 Single commodity flows, xxxxxxxxxxkhandekar+rao+vazirani 06

traditional applications of graph partition algorithms: Divide-and-conquer algorithms Circuit layout & designs Parallel computing Hierarchical clusterings Bioinformatics

Applications of partitioning algorithms for massive graphs Web search identify communities locate hot spots trace targets combat link spam epidemics

Motivations Outline of the talk Conductance and Cheeger s inequality Four graph paritioning algorithms xxby using: eigenvectors random walks PageRank heat kernel Local graph algorithms Future directions

Two types of cuts: Vertex cut edge cut How good is the cut?

e(s,v-s) Vol S e(s,v-s) S Vol S = Σ v ε S deg(v) S = Σ 1 v ε S V-S S

The Cheeger constant for graphs The Cheeger constant h Φ G G = ess (, ) min S min( vol S, vol S ) The volume of S is vol( S) = dx h G and its variations are sometimes called Φ G x S conductance, isoperimetric number,

The Cheeger inequality The Cheeger constant Φh G = ess (, ) min S min( volsvols, ) The Cheeger inequality 2Φ λ G λ : the first nontrivial eigenvalue of the xx(normalized) Laplacian. Φ 2 G 2

The spectrum of a graph Adjacency matrix Many ways to define the spectrum of a graph. How are the eigenvalues related to properties of graphs?

The spectrum of a graph Adjacency matrix Combinatorial Laplacian L= D A diagonal degree matrix Normalized Laplacian Random walks Rate of convergence adjacency matrix

The spectrum of a graph Discrete Laplace operator 1 Δ f ( x) = ( f ( x) f( y)) d Lxy (, ) = not symmetric in general x y x 1 if x = y { 1 if x y and x y d x Normalized Laplacian symmetric normalized with eigenvalues L( x, y ) = { 1 if x = y 1 if x y and x y d d x 0= λ0 λ1 λn 1 2 = λ y

Can you hear the shape of a network? λ dictates many properties of a graph. connectivity diameter isoperimetry z(bottlenecks) How good is the cut by using the eigenvalue λ?

Finding a cut by a sweep Using a sweep by the eigenvector, can reduce the exponential number of choices of subsets to a linear number.

Finding a cut by a sweep Using a sweep by the eigenvector, can reduce the exponential number of choices of subsets to a linear number. Still, there is a lower bound guarantee by using the Cheeger inequality. 2Φ λ Φ 2 2

Partitioning algorithm The Cheeger inequality Using eigenvector f, the Cheeger inequality can be stated as α Φ 2Φ λ 2 2 2 2 where λ is the first non-trivial eigenvalue of the Laplacian and α is the minimum Cheeger ratio in a sweep using the eigenvector. f

Eigenvalue problem for n x n matrix:. n 30 billion websites Hard to compute eigenvalues Even harder to compute eigenvectors

In the old days, compute for a given (whole) graph. In reality, can only afford to compute locally. (Access to a (huge) graph, e.g., for a vertex v, find its neighbors. Bounded number of access.)

Finding a cut by a sweep Using a sweep by the eigenvector can reduce the exponential number of choices of subsets to a linear number. Using a local sweep by random walks, PageRank and its variations can further reduce the a linear number of choices to a specified finite number of sizes.

Four one-sweep graph partitioning algorithms graph spectral method spectral partition algorithm random walks PageRank local partition algorithms heat kernel

4 Partitioning algorithm 4 Cheeger inequalities graph spectral method random walks PageRank Fiedler 73, Cheeger, 60 s Mihail 89 Lovasz, Simonovits, 90, 93 Spielman, Teng, 04 Andersen, Chung, Lang, 06 heat kernel Chung, PNAS, 08.

Graph partitioning Local graph partitioning Courtesy of Reid Andersen

What is a local graph partitioning algorithm? A local graph partitioning algorithm finds a small cut near the given seed(s) with running time depending only on the size of the output.

The definition of PageRank given by Brin and Page is based on random walks.

Partitioning Computing PageRank History of computing Pagerank Brin+Page 98 Personalized PageRank, Haveliwala 03 Computing personalized PageRank, xxxxxxxxxxxxxxxxxxxxxxxxjeh+widom 03 xxxxxxxxxxxxxxxxxxxxxxxx Berkhin 06

Random walks in a graph. G : a graph P : transition probability matrix Puv (, ) = 1 d u if u v, 0 otherwise. d u := the degree of u. A lazy walk: W = I + P 2

Original definition of PageRank A (bored) surfer either surf a random webpage with probability α or surf a linked webpage with probability 1- α α : the jumping constant p = α + 1 1 1 (,,..., ) (1 α n n n ) pw

Definition of personalized PageRank Two equivalent ways to define PageRank pr(α,s) (1) p = αs+ (1 α) pw s: the seed as a row vector α : the jumping constant s

Definition of PageRank Two equivalent ways to define PageRank p=pr(α,s) (1) p = αs+ (1 α) pw (2) p s = ( 1, 1,..., 1 n n n) t t = α (1 α) ( sw ) t= 0 s = some seed, e.g., (1,0,...,0) the (original) PageRank personalized PageRank (Organize the random walks by a scalar α.)

Partitioning algorithm using random walks Mihail 89, Lovász+Simonovits, 90, 93 k vol( S) W ( u, S) π ( S) 1 du β 2 k 8 k Leads to a Cheeger inequality: β 2Φ λ 8logn β G Φ 8logn 2 2 G where is the minimum Cheeger ratio over sweeps by using a lazy walk of k steps from every vertex for an appropriate range of k.

Algorithmic aspects of PageRank Fast approximation algorithm for x personalized PageRank greedy type algorithm, linear complexity Can use the jumping constant to approximate PageRank with a support of the desired size. Errors can be effectively bounded.

Approximate the pagerank vector : pr( α, s) = p+ pr( α, r) Approximate pagerank Residue vector

Partitioning algorithm using PageRank Using the PageRank vector with seed as a subset S vol( S) vol( G)/4, and a Cheeger inequality can be obtained : γ ΦS 8logs γ u Φ 2 2 u u 8log where is the minimum Cheeger ratio over sweeps by using personalized PageRank with a random seed in S. The volume of the set of such u is > vol(s)/4. s

A partitioning algorithm using PageRank Algorithm(φ,s,b): Compute ε-approximate Pagerank p=pr(α,s) with α=0.1/(φ 2 b), ε=2 -b /b. One sweep algorithm using p for finding cuts with conductance < φ. Performance analysis: If s is in a set S with conductance Φ>φ 2 log s, with constant probability, the algorithm outputs a cut C with condutance < φ, of size 1 order s and vol( C S) > vol( S). (Improving previous bounds by a factor of φlog s. ) 4

Courtesy of Reid Andersen.

Courtesy of Reid Andersen

Kevin Lang 2007

4 Partitioning algorithm 4 Cheeger inequalities graph spectral method random walks PageRank Fiedler 73, Cheeger, 60 s Mihail 89 Lovasz, Simonovits, 90, 93 Spielman, Teng, 04 Andersen, Chung, Lang, 06 heat kernel Chung, PNAS, 08.

PageRank versus heat kernel, s k k (1 ) ( sw ) ρts, k = 0 pα = α α t = e k = 0 s ( tw ) k! Geometric sum Exponential sum k

PageRank versus heat kernel p k k α, s = α (1 α) ( sw ) ρts, k = 0 t = e k = 0 s ( tw ) k! Geometric sum Exponential sum k p = α + (1 α) pw ρ = ρ( I W) t recurrence Heat equation

Definition of heat kernel t t t Ht = e I + tw + W + + W + 2 k! t( I W) = e = e tl 2 k 2 k (......) 2 k t 2 k t k = I tl+ L +... + ( 1) L +... 2 k! Ht = ( I W) Ht t ρ = sh ts, t

Partitioning algorithm using the heat kernel Theorem: vol( S) 2 tκtu, /4 ρtu, ( S) π( S) e du where κ tu, is the minimum Cheeger ratio over sweeps by using heat kernel pagerank over all u in S.

Partitioning algorithm using the heat kernel Theorem: vol( S) 2 tκtu, /4 ρtu, ( S) π( S) e du where κ tu, is the minimum Cheeger ratio over sweeps by using heat kernel pagerank over all u in S. Theorem: For vol S 2/3 ( ) vol( G), ρ, ( S ) π th ( S ) e S ts. (Improving the previous PageRank lower bound 1-t h S.)

Theorem: ( S), ( S ) ( S S ts ) (1 ( S )) e π ρ π π ht/1 ( ) Sketch of a proof: Consider Show Then Ft ( ) = log( ρ ( S) π ( S)) 2 Ft () 0 2 t ts, ΦS Ft () F(0) = t t 1 π S ( ) Solve and get /1 ( ( )), ( S ) ( S ht S S ts ) (1 ( S )) e π ρ π π

Random walks versus heat kernel How fast is the convergence to the stationary distribution? For what k, can one have fw k π? Choose t to satisfy the required property.

Partitioning algorithm using the heat kernel Using the upper and lower bounds, a Cheeger inequality can be obtained : 2 2 κ S ΦS ΦS λs 8 8 where λ S is the Dirichlet eigenvalue of the Laplacian, and κ S is the minimum Cheeger ratio over sweeps by using heat kernel with seeds S for appropriate t.

Dirichlet eigenvalues for a subset S V λ S = inf f u v ( f( u) f( v)) w f( w) 2 d w 2 S V over all f satisfying the Dirichlet boundary condition: f() v = 0 for all v S.

Partitioning algorithm using the heat kernel Using the upper and lower bounds, a Cheeger inequality can be obtained : 2 2 κ S ΦS ΦS λs 8 8 where λ S is the Dirichlet eigenvalue of the Laplacian, and κ S is the minimum Cheeger ratio over sweeps by using heat kernel with seeds S for appropriate t.

Partitioning algorithm using the heat kernel Using the upper and lower bounds, a Cheeger inequality can be obtained : 2 2 κ S ΦS ΦS λs 8 8 where λ S is the Dirichlet eigenvalue of the Laplacian, and κ S is the minimum Cheeger ratio over sweeps by using heat kernel with seeds S for appropriate t.

Partitioning algorithm using the heat kernel Using the upper and lower bounds, a Cheeger inequality can be obtained : 2 2 κu Φu ΦS λs 8logs 8logs where λ S is the Dirichlet eigenvalue of the Laplacian, and κ u is the minimum Cheeger ratio over sweeps by using heat kernel with a random seed in S. The volume of the set of such u is > vol(s)/4.

Courtesy of Reid Andersen

Future directions: Use PageRank and the heat kernel xxpagerank to shed light on: The geometry of graphs? Solving combinatorial problems, such xxas covering, packing, matching, etc. Graph drawing, visualization Metric embedding

Courtesy of Reid Andersen