Spectral Clustering
Shannon Quinn
(with thanks to William Cohen of Carnegie Mellon University, and J. Leskovec, A. Rajaraman, and J. Ullman of Stanford University)
Graph Partitioning
- Undirected graph G(V, E)
- Bi-partitioning task: divide the vertices into two disjoint groups A and B
- Questions:
  - How can we define a good partition of G?
  - How can we efficiently identify such a partition?
[Figure: example graph with nodes 1 to 6, shown unpartitioned and split into groups A and B]
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
Graph Partitioning
- What makes a good partition?
  - Maximize the number of within-group connections
  - Minimize the number of between-group connections
[Figure: the six-node example graph split into groups A and B]
Graph Cuts
- Express partitioning objectives as a function of the edge cut of the partition
- Cut: set of edges with only one vertex in a group:
  $cut(A, B) = \sum_{i \in A,\, j \in B} w_{ij}$
[Figure: the six-node example graph with groups A and B, for which cut(A, B) = 2]
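A minimal sketch of the cut definition above, in plain Python. The edge list is an assumption: it is the six-node example graph as reconstructed from the adjacency matrix on the later slides, with unit weights. The function name `cut` and the optional `weight` dictionary are illustrative only.

```python
# Edge list assumed for the six-node example graph (unit weights by default).
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]

def cut(edges, group_a, group_b, weight=None):
    """Total weight of the edges with one endpoint in group_a and the other in group_b."""
    weight = weight or {}
    total = 0
    for i, j in edges:
        crosses = (i in group_a and j in group_b) or (i in group_b and j in group_a)
        if crosses:
            total += weight.get((i, j), 1)  # unit weight unless one is supplied
    return total

A = {1, 2, 3}
B = {4, 5, 6}
print(cut(EDGES, A, B))  # edges (1, 5) and (3, 4) cross, so this prints 2
```

Note that cutting off node 6 alone also costs 2 on this graph, which is the kind of tie the minimum-cut criterion cannot break.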
Graph Cut Criterion
- Criterion: Minimum-cut
  - Minimize weight of connections between groups: $\arg\min_{A,B} cut(A, B)$
- Degenerate case:
  [Figure: example graph with the "optimal cut" and the "minimum cut" marked]
- Problem:
  - Only considers external cluster connections
  - Does not consider internal cluster connectivity
Graph Cut Criteria
- Criterion: Normalized-cut [Shi-Malik, 97]
  - Connectivity between groups relative to the density of each group:
    $ncut(A, B) = \frac{cut(A, B)}{vol(A)} + \frac{cut(A, B)}{vol(B)}$
  - vol(A): total weight of the edges with at least one endpoint in A, usually computed as $vol(A) = \sum_{i \in A} d_i$ with $d_i$ the (weighted) degree of node i
- Why use this criterion?
  - Produces more balanced partitions
- How do we efficiently find a good partition?
  - Problem: computing the optimal cut is NP-hard
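To see why the normalized cut favors balanced partitions, here is a small sketch on the six-node example graph. Two things are assumptions: the edge list (reconstructed from the adjacency matrix on the later slides), and the choice to compute vol(A) as the sum of node degrees in A, which counts edges inside A twice. The helper names are illustrative.

```python
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]

def cut(edges, a, b):
    """Number of edges with one endpoint in a and the other in b (unit weights)."""
    return sum(1 for i, j in edges if (i in a and j in b) or (i in b and j in a))

def vol(edges, group):
    """Sum of the degrees of the nodes in group (assumed definition of vol)."""
    return sum((i in group) + (j in group) for i, j in edges)

def ncut(edges, a, b):
    """Normalized cut: cut(A,B)/vol(A) + cut(A,B)/vol(B)."""
    c = cut(edges, a, b)
    return c / vol(edges, a) + c / vol(edges, b)

balanced = ncut(EDGES, {1, 2, 3}, {4, 5, 6})   # 2/8 + 2/8
lopsided = ncut(EDGES, {6}, {1, 2, 3, 4, 5})   # 2/2 + 2/14
print(balanced, lopsided)
```

Both partitions cut exactly two edges, but the balanced split scores 0.5 while isolating node 6 scores above 1, so ncut prefers the balanced one.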
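Spectral Graph Partitioning
- A: adjacency matrix of undirected G
  - $A_{ij} = 1$ if $(i, j)$ is an edge, else 0
- x is a vector in $\mathbb{R}^n$ with components $(x_1, \ldots, x_n)$
  - Think of it as a label/value of each node of G
- What is the meaning of $A \cdot x$?
  $\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$
  $y_i = \sum_{j=1}^{n} A_{ij} x_j = \sum_{(i,j) \in E} x_j$
- Entry $y_i$ is a sum of labels $x_j$ of neighbors of i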
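The neighbor-sum reading of A·x can be checked directly. This is a sketch in plain Python; the neighbor lists are an assumption (the six-node example graph as reconstructed from the adjacency matrix on the later slides), and `propagate` is an illustrative name.

```python
# Neighbor lists assumed for the six-node example graph.
NEIGHBORS = {1: [2, 3, 5], 2: [1, 3], 3: [1, 2, 4],
             4: [3, 5, 6], 5: [1, 4, 6], 6: [4, 5]}

def propagate(neighbors, x):
    """One application of A to x: each node receives the sum of its neighbors' labels."""
    return {i: sum(x[j] for j in nbrs) for i, nbrs in neighbors.items()}

# Label nodes 1-3 with +1 and nodes 4-6 with -1, then apply A once.
x = {1: 1.0, 2: 1.0, 3: 1.0, 4: -1.0, 5: -1.0, 6: -1.0}
y = propagate(NEIGHBORS, x)
print(y)
```

For example, node 1 has neighbors 2, 3 and 5, so its new value is 1 + 1 - 1 = 1.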
What is the meaning of Ax?
- j-th coordinate of $A \cdot x$:
  - Sum of the x-values of neighbors of j
  - Make this a new value at node j: $A \cdot x = \lambda \cdot x$
- Spectral Graph Theory:
  - Analyze the spectrum of the matrix representing G
  - Spectrum: eigenvectors $x_i$ of a graph, ordered by the magnitude (strength) of their corresponding eigenvalues $\lambda_i$:
    $\Lambda = \{\lambda_1, \lambda_2, \ldots, \lambda_n\}$, $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$
Matrix Representations
- Adjacency matrix (A):
  - n × n matrix
  - $A = [a_{ij}]$, $a_{ij} = 1$ if there is an edge between nodes i and j

        1  2  3  4  5  6
    1   0  1  1  0  1  0
    2   1  0  1  0  0  0
    3   1  1  0  1  0  0
    4   0  0  1  0  1  1
    5   1  0  0  1  0  1
    6   0  0  0  1  1  0

- Important properties:
  - Symmetric matrix
  - Eigenvectors are real and orthogonal
Matrix Representations
- Degree matrix (D):
  - n × n diagonal matrix
  - $D = [d_{ii}]$, $d_{ii}$ = degree of node i

        1  2  3  4  5  6
    1   3  0  0  0  0  0
    2   0  2  0  0  0  0
    3   0  0  3  0  0  0
    4   0  0  0  3  0  0
    5   0  0  0  0  3  0
    6   0  0  0  0  0  2
Matrix Representations
- Laplacian matrix (L):
  - n × n symmetric matrix
  - $L = D - A$

         1   2   3   4   5   6
    1    3  -1  -1   0  -1   0
    2   -1   2  -1   0   0   0
    3   -1  -1   3  -1   0   0
    4    0   0  -1   3  -1  -1
    5   -1   0   0  -1   3  -1
    6    0   0   0  -1  -1   2

- What is the trivial eigenpair?
  - $x = (1, \ldots, 1)$ with $\lambda = 0$, since every row of L sums to zero
- Important properties:
  - Eigenvalues are non-negative real numbers
  - Eigenvectors are real and orthogonal
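The three matrices and the listed properties can be checked numerically. This is a sketch assuming NumPy is available and assuming the edge list of the six-node example graph; `np.linalg.eigh` is used because L is symmetric, and it returns eigenvalues in ascending order.

```python
import numpy as np

# Edge list assumed for the six-node example graph (nodes numbered 1..6).
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
n = 6

A = np.zeros((n, n))
for i, j in EDGES:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1.0   # undirected, so A is symmetric

D = np.diag(A.sum(axis=1))   # degrees on the diagonal
L = D - A                    # unnormalized Laplacian

eigenvalues, eigenvectors = np.linalg.eigh(L)

# Trivial eigenpair: the all-ones vector is mapped to zero.
print(np.allclose(L @ np.ones(n), 0.0))
# Eigenvalues are non-negative (up to rounding error).
print(bool(eigenvalues.min() > -1e-9))
# Eigenvectors returned by eigh are orthonormal.
print(np.allclose(eigenvectors.T @ eigenvectors, np.eye(n)))
```

All three checks should print True for this graph.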
Spectral Clustering: Graph = Matrix
- $W v_1 = v_2$ propagates weights from neighbors
[Figure: nodes from three groups (labeled x, y, z) plotted against the eigenvectors e1, e2, e3] [Shi & Meila, 2002]
Spectral Clustering: Graph = Matrix
- $W v_1 = v_2$ propagates weights from neighbors
- $W v = \lambda v$: v is an eigenvector with eigenvalue $\lambda$
- If W is connected but roughly block diagonal with k blocks then
  - the top eigenvector is a constant vector
  - the next k eigenvectors are roughly piecewise constant, with pieces corresponding to blocks
- Spectral clustering:
  - Find the top k+1 eigenvectors $v_1, \ldots, v_{k+1}$
  - Discard the top one
  - Replace every node a with the k-dimensional vector $x_a = \langle v_2(a), \ldots, v_{k+1}(a) \rangle$
  - Cluster with k-means
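The recipe above can be sketched with NumPy as follows. Several choices here are assumptions rather than part of the slides: W is taken to be the row-normalized matrix D^-1 A (so that its top eigenvector is constant), its eigenvectors are obtained through the symmetric matrix D^-1/2 A D^-1/2, the number of eigenvectors kept (`n_vectors`) is a separate parameter from the number of clusters, and the k-means routine is a bare-bones Lloyd iteration with a deterministic start. The toy graph (two 4-cliques joined by one edge) is made up for illustration.

```python
import numpy as np

def top_eigenvectors(A, n_vectors):
    """Eigenvectors 2..n_vectors+1 of W = D^-1 A, ordered by decreasing eigenvalue."""
    d = A.sum(axis=1)
    S = A / np.sqrt(np.outer(d, d))        # symmetric, same eigenvalues as W
    _, U = np.linalg.eigh(S)               # eigenvalues in ascending order
    V = U[:, ::-1] / np.sqrt(d)[:, None]   # v = D^-1/2 u is an eigenvector of W
    return V[:, 1:n_vectors + 1]           # discard the top (constant) eigenvector

def kmeans(X, k, iters=100):
    """Plain Lloyd iterations with a deterministic farthest-point initialization."""
    centers = [X[0]]
    while len(centers) < k:
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        updated = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                            else centers[c] for c in range(k)])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels

def spectral_clusters(A, n_clusters, n_vectors):
    """Embed each node with the leading non-constant eigenvectors, then run k-means."""
    return kmeans(top_eigenvectors(A, n_vectors), n_clusters)

# Toy graph (hypothetical): two 4-cliques joined by the single edge (3, 4).
A = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0

labels = spectral_clusters(A, n_clusters=2, n_vectors=1)
print(labels)
```

On this graph a single non-constant eigenvector is enough to separate the two cliques; how many eigenvectors are informative in general depends on the data.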
Spectral Clustering: Graph = Matrix
- $W v_1 = v_2$ propagates weights from neighbors
- $W v = \lambda v$: v is an eigenvector with eigenvalue $\lambda$
- smallest eigenvectors of D-A are largest eigenvectors of A (exact when all degrees are equal)
- smallest eigenvectors of I-W are largest eigenvectors of W
- Suppose each $y(i) = +1$ or $-1$:
  - Then y is a cluster indicator that splits the nodes into two
  - What is $y^T (D - A) y$?
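$$\begin{aligned}
y^T (D - A) y &= \sum_i d_i y_i^2 - \sum_{i,j} a_{ij} y_i y_j \\
&= \frac{1}{2} \left( \sum_i d_i y_i^2 - 2 \sum_{i,j} a_{ij} y_i y_j + \sum_j d_j y_j^2 \right) \\
&= \frac{1}{2} \left( \sum_{i,j} a_{ij} y_i^2 - 2 \sum_{i,j} a_{ij} y_i y_j + \sum_{i,j} a_{ij} y_j^2 \right) \\
&= \frac{1}{2} \sum_{i,j} a_{ij} (y_i - y_j)^2
\end{aligned}$$
- This measures the size of CUT(y): with $y_i = \pm 1$, each cut edge contributes 4 and each within-group edge contributes 0
- In the same way, $y^T (I - W) y$ measures the size of NCUT(y)
- NCUT: roughly minimize the ratio of transitions between classes vs. transitions within classes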
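The first and last lines of this derivation can be compared numerically. This is a sketch in plain Python, assuming the edge list of the six-node example graph with unit weights; the function names are illustrative.

```python
# Edge list assumed for the six-node example graph (unit weights).
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]

def laplacian_form(edges, y):
    """y^T (D - A) y, expanded as sum_i d_i y_i^2 - sum_{i,j} a_ij y_i y_j."""
    degree = {}
    for i, j in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    diagonal = sum(d * y[i] ** 2 for i, d in degree.items())
    # Each undirected edge appears twice in the double sum, as (i, j) and (j, i).
    off_diagonal = sum(2 * y[i] * y[j] for i, j in edges)
    return diagonal - off_diagonal

def pairwise_form(edges, y):
    """1/2 * sum_{i,j} a_ij (y_i - y_j)^2, which is one term per undirected edge."""
    return sum((y[i] - y[j]) ** 2 for i, j in edges)

# Cluster indicator: +1 for nodes 1-3, -1 for nodes 4-6.
y = {1: 1, 2: 1, 3: 1, 4: -1, 5: -1, 6: -1}
cut_size = sum(1 for i, j in EDGES if y[i] != y[j])

print(laplacian_form(EDGES, y), pairwise_form(EDGES, y), 4 * cut_size)  # 8 8 8
```

The two forms agree for any y; for a +1/-1 indicator they both equal four times the number of cut edges.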
So far...
- How to define a good partition of a graph?
  - Minimize a given graph cut criterion
- How to efficiently identify such a partition?
  - Approximate using information provided by the eigenvalues and eigenvectors of a graph
- Spectral Clustering
Spectral Clustering Algorithms
- Three basic stages:
  1) Pre-processing
     - Construct a matrix representation of the graph
  2) Decomposition
     - Compute eigenvalues and eigenvectors of the matrix
     - Map each point to a lower-dimensional representation based on one or more eigenvectors
  3) Grouping
     - Assign points to two or more clusters, based on the new representation
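One way to lay out the three stages for a two-way split is sketched below with NumPy. The specific choices are assumptions: the matrix is the unnormalized Laplacian L = D - A, the embedding is the eigenvector of the second-smallest eigenvalue, and the grouping splits at 0. The edge list is the assumed six-node example graph, and the function names simply mirror the stage names.

```python
import numpy as np

def preprocess(edges, n):
    """Stage 1: build the Laplacian L = D - A from an edge list (nodes 1..n)."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i - 1, j - 1] = A[j - 1, i - 1] = 1.0
    return np.diag(A.sum(axis=1)) - A

def decompose(L):
    """Stage 2: map each node to its entry in the second-smallest eigenvector."""
    _, vectors = np.linalg.eigh(L)   # eigenvalues come back in ascending order
    return vectors[:, 1]

def group(x, split=0.0):
    """Stage 3: assign nodes to two clusters by thresholding the 1-D embedding."""
    nodes = np.arange(1, len(x) + 1)
    return set(nodes[x > split].tolist()), set(nodes[x <= split].tolist())

# Edge list assumed for the six-node example graph.
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
parts = group(decompose(preprocess(EDGES, 6)))
print(parts)
```

The sign of an eigenvector is arbitrary, so which group is printed first may vary; the two groups themselves are {1, 2, 3} and {4, 5, 6}.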
Example: Spectral Partitioning
[Figure: value of x_2 plotted against rank in x_2]
Example: Spectral Partitioning
[Figure: components of x_2, with value of x_2 plotted against rank in x_2]
Example: Spectral Partitioning
[Figure: two panels, components of x_1 and components of x_3]
k-Way Spectral Clustering
- How do we partition a graph into k clusters?
- Two basic approaches:
  - Recursive bi-partitioning [Hagen et al., 92]
    - Recursively apply a bi-partitioning algorithm in a hierarchical divisive manner
    - Disadvantages: inefficient, unstable
  - Cluster multiple eigenvectors [Shi-Malik, 00]
    - Build a reduced space from multiple eigenvectors
    - Commonly used in recent papers
    - A preferable approach
Why use multiple eigenvectors?
- Approximates the optimal cut [Shi-Malik, 00]
  - Can be used to approximate the optimal k-way normalized cut
- Emphasizes cohesive clusters
  - Increases the unevenness in the distribution of the data
  - Associations between similar points are amplified, associations between dissimilar points are attenuated
  - The data begins to approximate a clustering
- Well-separated space
  - Transforms data to a new embedded space, consisting of k orthogonal basis vectors
- Multiple eigenvectors prevent instability due to information loss
Spectral Clustering: Graph = Matrix
- $W v_1 = v_2$ propagates weights from neighbors
- $W v = \lambda v$: v is an eigenvector with eigenvalue $\lambda$
- smallest eigenvectors of D-A are largest eigenvectors of A (exact when all degrees are equal)
- smallest eigenvectors of I-W are largest eigenvectors of W
- Q: How do I pick v to be an eigenvector for a block-stochastic matrix?
Spectral Clustering: Graph = Matrix
- $W v_1 = v_2$ propagates weights from neighbors
- $W v = \lambda v$: v is an eigenvector with eigenvalue $\lambda$
- smallest eigenvectors of D-A are largest eigenvectors of A (exact when all degrees are equal)
- smallest eigenvectors of I-W are largest eigenvectors of W
- Suppose each $y(i) = +1$ or $-1$:
  - Then y is a cluster indicator that cuts the nodes into two
  - What is $y^T (D - A) y$? The cost of the graph cut defined by y
  - What is $y^T (I - W) y$? Also a cost of a graph cut defined by y
- How to minimize it?
  - Turns out: to minimize $y^T X y / (y^T y)$, find the smallest eigenvector of X
  - But: this will not be +1/-1, so it is a relaxed solution
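The relaxation can be illustrated on the six-node example graph with NumPy. This sketch assumes X = D - A and the reconstructed edge list. One detail is an assumption worth stating: for D - A the very smallest eigenvector is the constant vector (eigenvalue 0, all nodes in one group), so the sketch takes the next one, which is the smallest among vectors orthogonal to the constant vector. Rounding by sign is one simple way to get back to a +1/-1 indicator.

```python
import numpy as np

# Edge list assumed for the six-node example graph.
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
A = np.zeros((6, 6))
for i, j in EDGES:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1.0
X = np.diag(A.sum(axis=1)) - A           # X = D - A

def rayleigh(X, y):
    """The ratio y^T X y / (y^T y) that the relaxed problem minimizes."""
    return float(y @ X @ y) / float(y @ y)

indicator = np.array([1, 1, 1, -1, -1, -1], dtype=float)

values, vectors = np.linalg.eigh(X)      # ascending eigenvalues
relaxed = vectors[:, 1]                  # skip the constant vector (eigenvalue 0)
rounded = np.sign(relaxed)               # back to a +1/-1 indicator

print(rayleigh(X, indicator))            # 8 / 6 for the {1,2,3} vs {4,5,6} split
print(rayleigh(X, relaxed))              # equals the second-smallest eigenvalue
print(np.round(relaxed, 3))              # real-valued entries, not +1/-1
print(rounded)
```

The relaxed vector achieves a lower ratio than any +1/-1 indicator can, but its entries are real numbers; on this graph, rounding them by sign recovers the {1, 2, 3} vs {4, 5, 6} split (up to an overall sign flip).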
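Spectral Clustering: Graph = Matrix
- $W v_1 = v_2$ propagates weights from neighbors
- $W v = \lambda v$: v is an eigenvector with eigenvalue $\lambda$
[Figure: eigenvalues λ1, λ2, λ3, λ4, λ5,6,7,... with the eigengap marked, alongside the eigenvectors e1, e2, e3] [Shi & Meila, 2002]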
Some more terms
- If A is an adjacency matrix (maybe weighted) and D is a (diagonal) matrix giving the degree of each node
  - Then D-A is the (unnormalized) Laplacian
  - $W = A D^{-1}$ is a probabilistic adjacency matrix
  - I-W is the (normalized or random-walk) Laplacian
  - etc.
- The largest eigenvectors of W correspond to the smallest eigenvectors of I-W
  - So sometimes people talk about "bottom eigenvectors of the Laplacian"
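These definitions translate directly into NumPy. The sketch below assumes the edge list of the six-node example graph and checks two facts: the columns of W = A D^-1 sum to 1, and the eigenvalues of I - W are exactly 1 minus those of W, which is why the largest eigenvectors of W are the smallest of I - W. `np.linalg.eigvals` is used because W is not symmetric in general; its eigenvalues are still real here, so only the real parts are kept.

```python
import numpy as np

# Edge list assumed for the six-node example graph.
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
A = np.zeros((6, 6))
for i, j in EDGES:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1.0

D = np.diag(A.sum(axis=0))
W = A @ np.linalg.inv(D)       # column j of A divided by the degree of node j
I = np.eye(6)

w_vals = np.sort(np.linalg.eigvals(W).real)
lap_vals = np.sort(np.linalg.eigvals(I - W).real)

print(np.allclose(W.sum(axis=0), 1.0))               # each column is a distribution
print(np.allclose(np.sort(1.0 - w_vals), lap_vals))  # spectrum of I - W is 1 - spectrum of W
print(round(float(w_vals[-1]), 6), round(abs(float(lap_vals[0])), 6))
```

The last line shows the matching pair at the extremes: the top eigenvalue of W is 1 and the bottom eigenvalue of I - W is 0.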
[Figure: two graph constructions, each shown with panels labeled A and W: a k-NN graph (easy), and a fully connected graph weighted by distance]
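Both constructions can be sketched from a set of points with NumPy. The details are assumptions, since the slide only names the two options: the k-NN graph here links i and j if either is among the other's k nearest neighbors, and the fully connected graph uses Gaussian weights exp(-d^2 / (2 sigma^2)) as one common way to weight by distance. The sample points and the function names are hypothetical.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def knn_graph(points, k):
    """0/1 adjacency: i and j are linked if either is among the other's k nearest."""
    dist = pairwise_distances(points)
    np.fill_diagonal(dist, np.inf)             # a point is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]  # indices of the k closest points
    A = np.zeros_like(dist)
    rows = np.repeat(np.arange(len(points)), k)
    A[rows, nearest.ravel()] = 1.0
    return np.maximum(A, A.T)                  # symmetrize

def full_graph(points, sigma=1.0):
    """Fully connected graph with Gaussian weights (an assumed weighting)."""
    dist = pairwise_distances(points)
    A = np.exp(-dist ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)                   # no self-loops
    return A

# Two small, well-separated groups of points (made up for illustration).
points = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
A_knn = knn_graph(points, k=2)
A_full = full_graph(points)
print(A_knn)
print(np.round(A_full, 3))
```

With k = 2 the k-NN graph has no edges between the two groups, while the fully connected graph keeps every pair but gives distant pairs weights close to zero.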
Spectral Clustering: Pros and Cons
- Elegant, and well-founded mathematically
- Works quite well when relations are approximately transitive (like similarity)
- Very noisy datasets cause problems
  - Informative eigenvectors need not be in the top few
  - Performance can drop suddenly from good to terrible
- Expensive for very large datasets
  - Computing eigenvectors is the bottleneck
Use cases and runtimes
- K-Means
  - FAST
  - Embarrassingly parallel
  - Not very useful on anisotropic data
- Spectral clustering
  - Excellent quality under many different data forms
  - Much slower than K-Means
Further Reading
- Spectral Clustering Tutorial: http://www.informatik.uni-hamburg.de/ml/contents/people/luxburg/publications/Luxburg07_tutorial.pdf