Spectral Clustering. Shannon Quinn


Spectral Clustering. Shannon Quinn (with thanks to William Cohen of Carnegie Mellon University, and J. Leskovec, A. Rajaraman, and J. Ullman of Stanford University)

Graph Partitioning. Undirected graph G. Bi-partitioning task: divide the vertices into two disjoint groups A and B. Questions: How can we define a good partition of G? How can we efficiently identify such a partition? [Figure: a six-node example graph (nodes 1-6) split into groups A and B.]

Graph Partitioning. What makes a good partition? Maximize the number of within-group connections; minimize the number of between-group connections. [Figure: the example graph partitioned into groups A and B.]

Graph Cuts. Express partitioning objectives as a function of the edge cut of the partition. Cut: the set of edges with only one vertex in a group:

$$\mathrm{cut}(A,B) = \sum_{i \in A,\ j \in B} w_{ij}$$

[Figure: the example graph, for which cut(A,B) = 2.]
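To make the definition concrete, here is a minimal NumPy sketch (my own illustration, not part of the original slides) that evaluates the cut weight for the six-node example graph; `W` is a symmetric weighted adjacency matrix and `in_A` is a boolean membership mask:

```python
import numpy as np

def cut(W, in_A):
    """Total weight of edges with exactly one endpoint in group A."""
    return W[np.ix_(in_A, ~in_A)].sum()

# The six-node example graph from the slides, with unit edge weights.
W = np.array([[0, 1, 1, 0, 1, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [1, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
in_A = np.array([True, True, True, False, False, False])  # A = {1,2,3}
print(cut(W, in_A))  # 2.0: edges 3-4 and 1-5 cross the partition
```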

Graph Cut Criterion. Criterion: minimum cut. Minimize the weight of connections between groups: $\arg\min_{A,B} \mathrm{cut}(A,B)$. Degenerate case: [Figure: the "minimum cut" slices off a single node, while the "optimal cut" splits the graph into two balanced halves.] Problem: this only considers external cluster connections; it does not consider internal cluster connectivity.

Graph Cut Criteria [Shi-Malik]. Criterion: normalized cut [Shi-Malik, '97]. Connectivity between groups relative to the density of each group:

$$\mathrm{ncut}(A,B) = \frac{\mathrm{cut}(A,B)}{\mathrm{vol}(A)} + \frac{\mathrm{cut}(A,B)}{\mathrm{vol}(B)}$$

where vol(A) is the total weight of the edges with at least one endpoint in A: $\mathrm{vol}(A) = \sum_{i \in A} d_i$. Why use this criterion? It produces more balanced partitions. How do we efficiently find a good partition? Problem: computing the optimal cut is NP-hard.
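Continuing the sketch above (again my own illustration), the normalized cut of a given partition can be computed directly from its definition:

```python
def ncut(W, in_A):
    """Normalized cut: cut/vol(A) + cut/vol(B)."""
    c = cut(W, in_A)
    vol_A = W[in_A].sum()    # sum of degrees of nodes in A
    vol_B = W[~in_A].sum()   # sum of degrees of nodes in B
    return c / vol_A + c / vol_B

print(ncut(W, in_A))  # 2/8 + 2/8 = 0.5 for the example partition
```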

Spectral Graph Partitioning. A: adjacency matrix of an undirected graph G; $a_{ij} = 1$ if $(i,j)$ is an edge, else 0. x is a vector in $\mathbb{R}^n$ with components $(x_1, \dots, x_n)$; think of it as a label/value on each node of G. What is the meaning of $A \cdot x$? Entry $y_i$ is a sum of the labels $x_j$ of the neighbors of node $i$:

$$\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}, \qquad y_i = \sum_{j=1}^{n} a_{ij}\, x_j = \sum_{(i,j) \in E} x_j$$

What is the meaning of Ax? The $j$th coordinate of $A \cdot x$ is the sum of the x-values of the neighbors of $j$; make this the new value at node $j$, i.e. $A \cdot x = \lambda \cdot x$. Spectral Graph Theory: analyze the spectrum of a matrix representing G. Spectrum: the eigenvectors $x_i$ of the graph, ordered by the magnitude (strength) of their corresponding eigenvalues $\lambda_i$:

$$\Lambda = \{\lambda_1, \lambda_2, \dots, \lambda_n\}, \qquad \lambda_1 \le \lambda_2 \le \dots \le \lambda_n$$

Matrix Representations. Adjacency matrix (A): an n × n matrix A = [a_ij] with a_ij = 1 if there is an edge between nodes i and j. For the example graph:

        1  2  3  4  5  6
    1   0  1  1  0  1  0
    2   1  0  1  0  0  0
    3   1  1  0  1  0  0
    4   0  0  1  0  1  1
    5   1  0  0  1  0  1
    6   0  0  0  1  1  0

Important properties: symmetric matrix; eigenvectors are real and orthogonal.

Matrix Representations. Degree matrix (D): an n × n diagonal matrix D = [d_ii] with d_ii = degree of node i. For the example graph:

        1  2  3  4  5  6
    1   3  0  0  0  0  0
    2   0  2  0  0  0  0
    3   0  0  3  0  0  0
    4   0  0  0  3  0  0
    5   0  0  0  0  3  0
    6   0  0  0  0  0  2

Matrix Representations. Laplacian matrix (L): an n × n symmetric matrix, L = D − A. For the example graph:

        1  2  3  4  5  6
    1   3 -1 -1  0 -1  0
    2  -1  2 -1  0  0  0
    3  -1 -1  3 -1  0  0
    4   0  0 -1  3 -1 -1
    5  -1  0  0 -1  3 -1
    6   0  0  0 -1 -1  2

What is the trivial eigenpair? x = (1, ..., 1), for which L · x = 0, so λ = λ₁ = 0. Important properties: eigenvalues are non-negative real numbers; eigenvectors are real and orthogonal.
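These properties are easy to check numerically; a small self-contained sketch (my own illustration) for the example graph:

```python
import numpy as np

# Adjacency matrix of the six-node example graph.
A = np.array([[0, 1, 1, 0, 1, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [1, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))  # degree matrix
L = D - A                   # (unnormalized) Laplacian

print(L @ np.ones(6))         # all zeros: (1,...,1) is the trivial eigenvector, λ = 0
print(np.linalg.eigvalsh(L))  # real, non-negative eigenvalues (up to rounding)
```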

Spectral Clustering: Graph = Matrix. W*v1 = v2 propagates weights from neighbors. [Figure: entries of the leading eigenvectors e1, e2, e3 of a block-structured weight matrix; entries belonging to the same block take similar values. (Shi & Meila, 2001)]

Spectral Clustering: Graph = Matrix. W*v1 = v2 propagates weights from neighbors. W·v = λ·v: v is an eigenvector with eigenvalue λ. If W is connected but roughly block diagonal with k blocks, then the top eigenvector is a constant vector, and the next k eigenvectors are roughly piecewise constant, with pieces corresponding to the blocks.

Spectral Clustering: Graph = Matrix. W*v1 = v2 propagates weights from neighbors. W·v = λ·v: v is an eigenvector with eigenvalue λ. If W is connected but roughly block diagonal with k blocks, then the top eigenvector is a constant vector, and the next k eigenvectors are roughly piecewise constant, with pieces corresponding to the blocks. Spectral clustering: find the top k+1 eigenvectors $v_1, \dots, v_{k+1}$; discard the top one; replace every node a with the k-dimensional vector $x_a = \langle v_2(a), \dots, v_{k+1}(a) \rangle$; cluster with k-means.
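Here is a minimal sketch of that recipe (my own illustration; it assumes NumPy and scikit-learn are available). It works with the symmetrically normalized matrix $N = D^{-1/2} A D^{-1/2}$, which has the same eigenvalues as the random-walk matrix $D^{-1}A$ (the transpose of the slides' $AD^{-1}$) and whose eigenvectors map to it via $D^{-1/2}$, so the stable symmetric eigensolver can be used:

```python
import numpy as np
from sklearn.cluster import KMeans  # assumption: scikit-learn is installed

def spectral_clustering(A, k, seed=0):
    """Embed each node with eigenvectors v_2..v_{k+1}, then run k-means."""
    d = A.sum(axis=1)
    D_isqrt = np.diag(1.0 / np.sqrt(d))
    N = D_isqrt @ A @ D_isqrt                # same spectrum as D^{-1} A
    vals, vecs = np.linalg.eigh(N)           # eigenvalues in ascending order
    top = D_isqrt @ vecs[:, -(k + 1):]       # top k+1 eigenvectors, mapped back
    X = top[:, :-1]                          # discard the top (constant) one
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)

print(spectral_clustering(A, k=2))  # with A from the sketch above: e.g. [0 0 0 1 1 1]
```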

Spectral Clustering: Graph = Matrix. W*v1 = v2 propagates weights from neighbors. W·v = λ·v: v is an eigenvector with eigenvalue λ. The smallest eigenvectors of D−A are the largest eigenvectors of A; the smallest eigenvectors of I−W are the largest eigenvectors of W. Suppose each y(i) = +1 or −1: then y is a cluster indicator that splits the nodes into two. What is $y^\top (D-A)\, y$?

For such a ±1 indicator vector y:

$$y^\top (D - A)\, y = \sum_i d_i y_i^2 - \sum_{i,j} a_{ij}\, y_i y_j = \sum_{(i,j) \in E} a_{ij} (y_i - y_j)^2 = 4 \cdot \big(\text{size of CUT}(y)\big)$$

and similarly

$$\mathrm{NCUT}(y) \approx \frac{y^\top (I - W)\, y}{\text{size of } y}$$

NCUT: roughly, minimize the ratio of transitions between classes to transitions within classes.
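A quick numeric check of the first identity (my own illustration, reusing `L` from the earlier sketch):

```python
y = np.array([1, 1, 1, -1, -1, -1], dtype=float)  # indicator for {1,2,3} vs {4,5,6}
print(y @ L @ y)  # 8.0 = 4 * 2: the partition cuts two unit-weight edges
```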

So far. How to define a good partition of a graph? Minimize a given graph cut criterion. How to efficiently identify such a partition? Approximate it using the information provided by the eigenvalues and eigenvectors of the graph. This is spectral clustering.

Spectral Clustering Algorithms. Three basic stages: 1) Pre-processing: construct a matrix representation of the graph. 2) Decomposition: compute the eigenvalues and eigenvectors of the matrix; map each point to a lower-dimensional representation based on one or more eigenvectors. 3) Grouping: assign points to two or more clusters, based on the new representation.

Example: Spectral Partitioning. [Figure: value of $x_2$ plotted against rank in $x_2$.]

Example: Spectral Partitioning. [Figure: components of $x_2$ (value vs. rank in $x_2$).]

Example: Spectral partitioning. [Figure: components of $x_1$ and of $x_3$.]

k-wa Spectral Clusterng How do we partton a graph nto k clusters? Two basc approaches: Recursve b- parttonng [Hagen et al., 9] Recursvel appl b- parttonng algorthm n a herarchcal dvsve manner Dsadvantages: InefLcent, unstable Cluster multple egenvectors [Sh- Malk, 00] Buld a reduced space from multple egenvectors Commonl used n recent papers A preferable approach J. Leskovec, A. Raaraman, J. Ullman: Mnng of Massve Datasets, http://www.mmds.org

Why use multiple eigenvectors? Approximates the optimal cut [Shi-Malik, '00]: can be used to approximate the optimal k-way normalized cut. Emphasizes cohesive clusters: increases the unevenness in the distribution of the data; associations between similar points are amplified, and associations between dissimilar points are attenuated; the data begins to approximate a clustering. Well-separated space: transforms the data into a new embedded space consisting of k orthogonal basis vectors. Multiple eigenvectors prevent instability due to information loss.

Spectral Clustering: Graph = Matrix. W*v1 = v2 propagates weights from neighbors. W·v = λ·v: v is an eigenvector with eigenvalue λ. The smallest eigenvectors of D−A are the largest eigenvectors of A; the smallest eigenvectors of I−W are the largest eigenvectors of W. Q: How do I pick v to be an eigenvector for a block-stochastic matrix?


Spectral Clustering: Graph = Matrix. W*v1 = v2 propagates weights from neighbors. W·v = λ·v: v is an eigenvector with eigenvalue λ. The smallest eigenvectors of D−A are the largest eigenvectors of A; the smallest eigenvectors of I−W are the largest eigenvectors of W. Suppose each y(i) = +1 or −1: then y is a cluster indicator that cuts the nodes into two. What is $y^\top (D-A)\, y$? The cost of the graph cut defined by y. What is $y^\top (I-W)\, y$? Also a cost of a graph cut defined by y. How do we minimize it? It turns out that to minimize $y^\top X y / (y^\top y)$ you take the smallest eigenvector of X (for a nontrivial cut, the smallest eigenvector orthogonal to the trivial constant one). But this will not have entries ±1, so it is a relaxed solution.
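Concretely (my own sketch, reusing `L = D - A` from the earlier code): the relaxed minimizer orthogonal to the constant vector is the second-smallest eigenvector of D−A, the Fiedler vector, and rounding its signs gives back a ±1 cluster indicator:

```python
vals, vecs = np.linalg.eigh(L)  # eigenpairs of D - A, ascending eigenvalues
fiedler = vecs[:, 1]            # smallest nontrivial eigenvector (relaxed y)
print(np.sign(fiedler))         # e.g. [ 1.  1.  1. -1. -1. -1.]
```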

Spectral Clustering: Graph = Matrix. W*v1 = v2 propagates weights from neighbors. W·v = λ·v: v is an eigenvector with eigenvalue λ. [Figure: sorted eigenvalues of a block-stochastic matrix with eigenvectors e1, e2, e3; an "eigengap" separates λ₂ from λ₃, λ₄, λ₅, …. (Shi & Meila, 2001)]

Some more terms. If A is an adjacency matrix (maybe weighted) and D is a (diagonal) matrix giving the degree of each node, then: D−A is the (unnormalized) Laplacian; $W = AD^{-1}$ is a probabilistic adjacency matrix; I−W is the (normalized, or random-walk) Laplacian; etc. The largest eigenvectors of W correspond to the smallest eigenvectors of I−W, so sometimes people talk about the "bottom eigenvectors of the Laplacian".
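The correspondence follows from the fact that if $Wv = \lambda v$ then $(I - W)v = (1 - \lambda)v$, which is easy to confirm numerically (my own illustration, reusing `A` from above):

```python
W_rw = A / A.sum(axis=0)               # column-stochastic W = A D^{-1}
lam = np.sort(np.linalg.eigvals(W_rw).real)
mu = np.sort(np.linalg.eigvals(np.eye(6) - W_rw).real)
print(np.allclose(mu, 1 - lam[::-1]))  # True: top of W <-> bottom of I - W
```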

[Figure: two common ways to build the similarity graph and its matrices A and W: a k-nn graph (easy), and a fully connected graph weighted by distance.]

Spectral Clustering: Pros and Cons. Elegant and mathematically well-founded. Works quite well when relations are approximately transitive (like similarity). Very noisy datasets cause problems: informative eigenvectors need not be in the top few, and performance can drop suddenly from good to terrible. Expensive for very large datasets: computing eigenvectors is the bottleneck.

Use cases and runtimes. K-Means: fast and embarrassingly parallel, but not very useful on anisotropic data. Spectral clustering: excellent quality under many different data forms, but much slower than K-Means.

Further Reading. Spectral clustering tutorial (Ulrike von Luxburg): http://www.informatik.uni-hamburg.de/ml/contents/people/luxburg/publications/Luxburg07_tutorial.pdf