STA141C: Big Data & High Performance Statistical Computing

1 STA141C: Big Data & High Performance Statistical Computing
Lecture 12: Graph Clustering
Cho-Jui Hsieh, UC Davis
May 29, 2018

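2 Graph Clustering
Given a graph $G = (V, E, W)$:
$V$: nodes $\{v_1, \dots, v_n\}$
$E$: edges $\{e_1, \dots, e_m\}$
$W$: weight matrix, with
$$W_{ij} = \begin{cases} w_{ij}, & \text{if } (i, j) \in E \\ 0, & \text{otherwise} \end{cases}$$
Goal: partition $V$ into $k$ clusters of nodes,
$$V = V_1 \cup V_2 \cup \dots \cup V_k, \qquad V_i \cap V_j = \emptyset \quad \forall i \neq j$$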

3 Similarity Graph
Example: similarity graph
Given samples $x_1, \dots, x_n$
Weights (similarities) indicate the closeness of samples

4 Similarity Graph
E.g., Gaussian kernel:
$$W_{ij} = e^{-\|x_i - x_j\|^2 / \sigma^2}$$
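As a rough illustration of the Gaussian kernel weights, the sketch below builds the full similarity matrix for a handful of points. The function name `gaussian_similarity` and the toy data are assumptions made for this example, and the diagonal is left at 1 (some implementations set it to 0 instead).

```python
import math

def gaussian_similarity(points, sigma):
    """W[i][j] = exp(-||x_i - x_j||^2 / sigma^2) for every pair of samples."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            W[i][j] = math.exp(-sq_dist / sigma ** 2)
    return W

# Hypothetical toy data: two nearby samples and one far away.
points = [(0.0, 0.0), (0.1, 0.0), (3.0, 4.0)]
W = gaussian_similarity(points, sigma=1.0)
print(W[0][1])  # close pair: weight near 1
print(W[0][2])  # distant pair: weight near 0
```

Nearby samples get weights close to 1 and distant samples get weights close to 0, which is what makes $W$ usable as a graph weight matrix.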

5 Social graph
Nodes: users in a social network
Edges: $W_{ij} = 1$ if users $i$ and $j$ are friends, otherwise $W_{ij} = 0$

8 Partitioning into Two Clusters
Partition the graph into two sets $V_1, V_2$ to minimize the cut value:
$$\mathrm{cut}(V_1, V_2) = \sum_{v_i \in V_1,\, v_j \in V_2} W_{ij}$$
Also, the sizes of $V_1$ and $V_2$ need to be similar (balance).
One classical way of enforcing balance:
$$\min_{V_1, V_2} \ \mathrm{cut}(V_1, V_2) \quad \text{s.t. } |V_1| = |V_2|, \ V_1 \cup V_2 = \{1, \dots, n\}, \ V_1 \cap V_2 = \emptyset$$
This is NP-hard (no polynomial-time algorithm is known, and none exists unless P = NP).
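To make the balanced min-cut problem concrete, here is a minimal brute-force sketch on a small graph. The example graph (two triangles joined by one edge) and the helper names are assumptions for illustration. The search enumerates every equal-size split, so it only works for tiny graphs.

```python
from itertools import combinations

# Hypothetical example graph: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]

def cut_value(W, V1, V2):
    """cut(V1, V2): total weight of edges with one end in V1 and the other in V2."""
    return sum(W[i][j] for i in V1 for j in V2)

def balanced_min_cut(W):
    """Try every split with |V1| = |V2| (assumes an even number of nodes)."""
    n = len(W)
    best = None
    for V1 in combinations(range(n), n // 2):
        if 0 not in V1:
            continue  # keep node 0 in V1 so mirrored splits are not revisited
        V2 = tuple(v for v in range(n) if v not in V1)
        value = cut_value(W, V1, V2)
        if best is None or value < best[0]:
            best = (value, V1, V2)
    return best

best_value, best_V1, best_V2 = balanced_min_cut(W)
print(best_value, best_V1, best_V2)
```

The number of equal-size splits grows exponentially with $n$, which is why exhaustive search is not practical beyond toy examples.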

10 Kernighan-Lin Algorithm
Start with some partitioning $V_1, V_2$
Calculate the change in cut if 2 vertices are swapped
Swap the pair of vertices (1 in $V_1$ and 1 in $V_2$) that decreases the cut the most
Iterate until convergence
Used when we need exactly balanced clusters (e.g., circuit design)
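The swap procedure described on this slide can be sketched as follows. This is a simplified greedy variant written for illustration, not the full Kernighan-Lin algorithm, which also locks swapped vertices and evaluates whole sequences of swaps. The graph and the function names are assumptions.

```python
# Hypothetical example graph: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]

def cut_value(W, V1, V2):
    return sum(W[i][j] for i in V1 for j in V2)

def greedy_swap_partition(W, V1, V2):
    """Repeatedly apply the single swap that lowers the cut the most."""
    V1, V2 = set(V1), set(V2)
    while True:
        current = cut_value(W, V1, V2)
        best_gain, best_pair = 0, None
        for a in V1:
            for b in V2:
                new_V1 = (V1 - {a}) | {b}
                new_V2 = (V2 - {b}) | {a}
                gain = current - cut_value(W, new_V1, new_V2)
                if gain > best_gain:
                    best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            return V1, V2  # no swap improves the cut: converged
        a, b = best_pair
        V1 = (V1 - {a}) | {b}
        V2 = (V2 - {b}) | {a}

# Start from a poor but balanced split (cut value 5).
V1, V2 = greedy_swap_partition(W, {0, 1, 3}, {2, 4, 5})
print(V1, V2, cut_value(W, V1, V2))
```

Every swap exchanges one vertex from each side, so the two sets keep their initial sizes throughout, which is the exact-balance property the slide mentions.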

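11 Objective function that considers balance
Ratio-Cut:
$$\min_{V_1, V_2} \left\{ \frac{\mathrm{Cut}(V_1, V_2)}{|V_1|} + \frac{\mathrm{Cut}(V_1, V_2)}{|V_2|} \right\} := \mathrm{RC}(V_1, V_2)$$
Normalized-Cut:
$$\min_{V_1, V_2} \left\{ \frac{\mathrm{Cut}(V_1, V_2)}{\deg(V_1)} + \frac{\mathrm{Cut}(V_1, V_2)}{\deg(V_2)} \right\} := \mathrm{NC}(V_1, V_2),$$
where
$$\deg(V_c) := \sum_{v_i \in V_c,\, (i,j) \in E} W_{ij} = \mathrm{links}(V_c, V)$$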
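A small sketch of the two objectives may help show how they penalize lopsided splits. The example graph and function names are assumptions for illustration.

```python
# Hypothetical example graph: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]

def cut_value(W, V1, V2):
    return sum(W[i][j] for i in V1 for j in V2)

def degree(W, Vc):
    """deg(V_c) = links(V_c, V): total weight attached to the nodes in V_c."""
    return sum(W[i][j] for i in Vc for j in range(len(W)))

def ratio_cut(W, V1, V2):
    c = cut_value(W, V1, V2)
    return c / len(V1) + c / len(V2)

def normalized_cut(W, V1, V2):
    c = cut_value(W, V1, V2)
    return c / degree(W, V1) + c / degree(W, V2)

balanced = ([0, 1, 2], [3, 4, 5])
lopsided = ([0], [1, 2, 3, 4, 5])
rc_balanced, nc_balanced = ratio_cut(W, *balanced), normalized_cut(W, *balanced)
rc_lopsided, nc_lopsided = ratio_cut(W, *lopsided), normalized_cut(W, *lopsided)
print(rc_balanced, nc_balanced)
print(rc_lopsided, nc_lopsided)
```

Splitting off the single node 0 cuts only two edges, yet both objectives score it worse than the balanced split, because the small side contributes a large term.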

12 Generalize to k clusters
Ratio-Cut:
$$\min_{V_1, \dots, V_k} \sum_{c=1}^{k} \frac{\mathrm{Cut}(V_c, V - V_c)}{|V_c|}$$
Normalized-Cut:
$$\min_{V_1, \dots, V_k} \sum_{c=1}^{k} \frac{\mathrm{Cut}(V_c, V - V_c)}{\deg(V_c)}$$
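The k-cluster objectives can be evaluated with one loop over the clusters. The sketch below is illustrative only. The function name `kway_objectives` and the example graph are assumptions.

```python
# Hypothetical example graph: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]

def kway_objectives(W, clusters):
    """Return (Ratio-Cut, Normalized-Cut) summed over all clusters."""
    n = len(W)
    rc = nc = 0.0
    for Vc in clusters:
        inside = set(Vc)
        # Cut(V_c, V - V_c): weight leaving the cluster
        cut = sum(W[i][j] for i in inside for j in range(n) if j not in inside)
        # deg(V_c): all weight attached to the cluster
        deg = sum(W[i][j] for i in inside for j in range(n))
        rc += cut / len(inside)
        nc += cut / deg
    return rc, nc

rc2, nc2 = kway_objectives(W, [[0, 1, 2], [3, 4, 5]])
rc3, nc3 = kway_objectives(W, [[0, 1], [2, 3], [4, 5]])
print(rc2, nc2)
print(rc3, nc3)
```

With $k = 2$ this reduces to the two-cluster objectives, since $\mathrm{Cut}(V_1, V - V_1) = \mathrm{Cut}(V_1, V_2)$.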

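14 Reformulation
Recall $\deg(V_c) = \mathrm{links}(V_c, V)$
Define the diagonal matrix
$$D = \mathrm{diag}\big(\deg(v_1), \deg(v_2), \dots, \deg(v_n)\big),$$
where $\deg(v_i) = \sum_j W_{ij}$ is the degree of node $v_i$.
$y_c \in \{0, 1\}^n$: indicator vector for the $c$-th cluster
We have
$$y_c^T y_c = |V_c|, \qquad y_c^T D y_c = \deg(V_c), \qquad y_c^T W y_c = \mathrm{links}(V_c, V_c)$$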
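The three indicator-vector identities can be checked numerically on a small graph. The sketch below is illustrative only, with an assumed example graph and cluster. Note that $\mathrm{links}(V_c, V_c)$ sums over ordered pairs, so each internal edge is counted twice.

```python
# Hypothetical example graph: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
n = len(W)
cluster = {0, 1, 2}
y = [1 if i in cluster else 0 for i in range(n)]  # indicator vector y_c
deg = [sum(row) for row in W]                     # diagonal entries of D

yTy = sum(y[i] * y[i] for i in range(n))                               # |V_c|
yTDy = sum(deg[i] * y[i] * y[i] for i in range(n))                     # deg(V_c)
yTWy = sum(y[i] * W[i][j] * y[j] for i in range(n) for j in range(n))  # links(V_c, V_c)

print(yTy, yTDy, yTWy)
print(yTDy - yTWy)  # deg(V_c) - links(V_c, V_c) = Cut(V_c, V - V_c)
```

The last line is the step that turns the cut into the quadratic form $y_c^T (D - W) y_c$.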

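15 Ratio Cut
Rewrite the ratio-cut objective:
$$\begin{aligned}
\mathrm{RC}(V_1, \dots, V_k) &= \sum_{c=1}^{k} \frac{\mathrm{Cut}(V_c, V - V_c)}{|V_c|} \\
&= \sum_{c=1}^{k} \frac{\deg(V_c) - \mathrm{links}(V_c, V_c)}{|V_c|} \\
&= \sum_{c=1}^{k} \frac{y_c^T D y_c - y_c^T W y_c}{y_c^T y_c} \\
&= \sum_{c=1}^{k} \frac{y_c^T (D - W) y_c}{y_c^T y_c} \\
&= \sum_{c=1}^{k} \frac{y_c^T L y_c}{y_c^T y_c}
\end{aligned}$$
($L = D - W$ is called the graph Laplacian.)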

17 More on Graph Laplacian
$L$ is symmetric positive semi-definite
For any $x$,
$$x^T L x = \frac{1}{2} \sum_{(i,j)} W_{ij} (x_i - x_j)^2,$$
where the sum runs over all ordered pairs $(i, j)$.
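The quadratic-form identity is easy to verify numerically. The sketch below compares both sides for an arbitrary vector, assuming a symmetric weight matrix. The example graph, the test vector, and the function names are assumptions for illustration.

```python
# Hypothetical example graph: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]

def laplacian(W):
    """L = D - W, with D the diagonal matrix of row sums of W."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0) - W[i][j] for j in range(n)]
            for i in range(n)]

def quadratic_form(L, x):
    n = len(L)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))

def pairwise_form(W, x):
    """0.5 * sum over ordered pairs (i, j) of W_ij * (x_i - x_j)^2."""
    n = len(W)
    return 0.5 * sum(W[i][j] * (x[i] - x[j]) ** 2
                     for i in range(n) for j in range(n))

x = [0.5, -1.0, 2.0, 0.0, 3.0, -2.5]  # arbitrary test vector
lhs = quadratic_form(laplacian(W), x)
rhs = pairwise_form(W, x)
print(lhs, rhs)
```

Because the right-hand side is a sum of non-negative terms whenever all $W_{ij} \ge 0$, the identity also shows why $L$ is positive semi-definite.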

20 Solving Ratio-Cut
We have shown Ratio-Cut is equivalent to
$$\mathrm{RCut} = \sum_{c=1}^{k} \frac{y_c^T L y_c}{y_c^T y_c} = \sum_{c=1}^{k} \left( \frac{y_c}{\|y_c\|} \right)^T L \left( \frac{y_c}{\|y_c\|} \right)$$
Define $\bar{y}_c = y_c / \|y_c\|$ (normalized indicator) and $Y = [\bar{y}_1, \bar{y}_2, \dots, \bar{y}_k]$, so that $Y^T Y = I$
Relaxed to a real-valued problem:
$$\min_{Y^T Y = I} \mathrm{Trace}(Y^T L Y)$$
Solution: eigenvectors corresponding to the smallest $k$ eigenvalues of $L$
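Before the relaxation, $Y$ is built from normalized indicator vectors, and the trace objective equals the ratio cut exactly. The sketch below checks both $Y^T Y = I$ and $\mathrm{Trace}(Y^T L Y) = \mathrm{RCut}$ on a small graph. The graph and the chosen clusters are assumptions for illustration.

```python
import math

# Hypothetical example graph: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
n = len(W)
deg = [sum(row) for row in W]
L = [[(deg[i] if i == j else 0) - W[i][j] for j in range(n)] for i in range(n)]

clusters = [[0, 1, 2], [3, 4, 5]]
k = len(clusters)

# Column c of Y is y_c / ||y_c||, and ||y_c|| = sqrt(|V_c|) for a 0/1 indicator.
Y = [[0.0] * k for _ in range(n)]
for c, Vc in enumerate(clusters):
    for i in Vc:
        Y[i][c] = 1.0 / math.sqrt(len(Vc))

# Y^T Y should be the k x k identity because the clusters are disjoint.
gram = [[sum(Y[i][a] * Y[i][b] for i in range(n)) for b in range(k)]
        for a in range(k)]

# Trace(Y^T L Y) = sum over c of ybar_c^T L ybar_c.
LY = [[sum(L[i][j] * Y[j][c] for j in range(n)) for c in range(k)]
      for i in range(n)]
trace = sum(Y[i][c] * LY[i][c] for i in range(n) for c in range(k))

print(gram)
print(trace)  # equals the ratio cut of this partition, 1/3 + 1/3
```

The relaxation keeps the constraint $Y^T Y = I$ but drops the requirement that the columns be scaled 0/1 indicators.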

24 Solving Ratio-Cut
Let $Y \in \mathbb{R}^{n \times k}$ be these eigenvectors. Are we done?
No: $Y$ does not have 0/1 values (its columns are not indicators), since we are solving a relaxed problem.
Solution: run k-means on the rows of $Y$
Summary of spectral clustering algorithms:
1. Compute $Y \in \mathbb{R}^{n \times k}$: the eigenvectors corresponding to the $k$ smallest eigenvalues of the (normalized) Laplacian matrix
2. Run k-means to cluster the rows of $Y$
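The two-step summary can be sketched end to end in plain Python. This is a rough illustration under several assumptions, not a reference implementation. It uses the unnormalized Laplacian, replaces a library eigensolver with orthogonal (subspace) iteration on $sI - L$ where $s$ is twice the largest degree, and seeds k-means with a farthest-point rule. All function names and the example graph are hypothetical.

```python
import random

def laplacian(W):
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

def orthonormalize(cols):
    """Gram-Schmidt on a list of column vectors."""
    basis = []
    for v in cols:
        v = list(v)
        for b in basis:
            proj = sum(x * y for x, y in zip(v, b))
            v = [x - proj * y for x, y in zip(v, b)]
        norm = sum(x * x for x in v) ** 0.5
        basis.append([x / norm for x in v])
    return basis

def smallest_eigenspace(L, k, iters=500, seed=0):
    """Rows of an n x k basis for the k smallest eigenvalues of L."""
    n = len(L)
    shift = 2.0 * max(L[i][i] for i in range(n))  # upper bound on eigenvalues of L
    rng = random.Random(seed)
    cols = orthonormalize([[rng.uniform(-1, 1) for _ in range(n)]
                           for _ in range(k)])
    for _ in range(iters):
        # Multiply each column by (shift * I - L), then re-orthonormalize.
        cols = orthonormalize(
            [[shift * v[i] - sum(L[i][j] * v[j] for j in range(n))
              for i in range(n)] for v in cols])
    return [[cols[c][i] for c in range(k)] for i in range(n)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, iters=20):
    centers = [rows[0]]
    while len(centers) < k:  # farthest-point seeding
        centers.append(max(rows, key=lambda r: min(sq_dist(r, c) for c in centers)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(r, centers[c])) for r in rows]
        for c in range(k):
            members = [r for r, l in zip(rows, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def spectral_clustering(W, k):
    Y = smallest_eigenspace(laplacian(W), k)  # step 1: eigenvectors
    return kmeans(Y, k)                       # step 2: k-means on rows of Y

# Hypothetical graph: three triangles chained by two weak edges of weight 0.1.
n = 9
W = [[0.0] * n for _ in range(n)]
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0), (3, 4, 1.0), (3, 5, 1.0),
         (4, 5, 1.0), (6, 7, 1.0), (6, 8, 1.0), (7, 8, 1.0),
         (2, 3, 0.1), (5, 6, 0.1)]
for i, j, w in edges:
    W[i][j] = W[j][i] = w

labels = spectral_clustering(W, 3)
print(labels)
```

Only the subspace spanned by the eigenvectors matters for step 2, because rotating the basis leaves the distances between rows of $Y$ unchanged. In practice a library eigensolver would replace the iteration used here.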

25 Eigenvectors of Laplacian
If the graph is disconnected ($k$ connected components), the Laplacian is block diagonal, and the first $k$ eigenvectors (all with eigenvalue 0) can be taken to be the indicator vectors of the $k$ components.
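For a disconnected graph, a quick numerical check shows that the indicator vector of each component is sent to zero by $L$, so each is an eigenvector with eigenvalue 0. The graph below (two triangles with no connecting edge) is an assumption for illustration.

```python
# Hypothetical disconnected graph: triangles {0,1,2} and {3,4,5}, no bridge.
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
n = len(W)
deg = [sum(row) for row in W]
L = [[(deg[i] if i == j else 0) - W[i][j] for j in range(n)] for i in range(n)]

components = [[0, 1, 2], [3, 4, 5]]
products = []
for comp in components:
    y = [1 if i in comp else 0 for i in range(n)]  # component indicator
    Ly = [sum(L[i][j] * y[j] for j in range(n)) for i in range(n)]
    products.append(Ly)
    print(Ly)  # all zeros: L y = 0 * y

# Block-diagonal structure: no entries couple the two components.
off_block = [L[i][j] for i in components[0] for j in components[1]]
print(off_block)
```

In this ideal case the rows of $Y$ take only $k$ distinct values, one per component, so k-means recovers the components immediately.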

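28 Eigenvectors of Laplacian
What if the graph is connected?
The smallest eigenvalue, 0, then has a single eigenvector (up to scaling):
$$L \mathbf{1} = (D - A) \mathbf{1} = 0$$
($\mathbf{1} = [1, 1, \dots, 1]^T$ is the eigenvector with eigenvalue 0; $A$ is the weight matrix, written $W$ on earlier slides.)
However, the eigenvectors for the 2nd to $k$-th smallest eigenvalues are still useful for clustering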
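To see why the second eigenvector is useful on a connected graph, the sketch below approximates it with power iteration and splits the nodes by sign. This is an illustrative approach under stated assumptions. It iterates with $sI - L$ (with $s$ twice the largest degree) and removes the all-ones component at every step. The function name and the graph are hypothetical.

```python
# Hypothetical connected graph: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]

def second_eigenvector(W, iters=200):
    """Approximate the eigenvector of L = D - W for its 2nd smallest eigenvalue."""
    n = len(W)
    deg = [sum(row) for row in W]
    shift = 2.0 * max(deg)  # upper bound on the eigenvalues of L
    v = [float(i) for i in range(n)]  # arbitrary non-constant start
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the all-ones component
        # Multiply by (shift * I - L), using (L v)_i = deg_i v_i - sum_j W_ij v_j.
        v = [shift * v[i] - (deg[i] * v[i] - sum(W[i][j] * v[j] for j in range(n)))
             for i in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    Lv = [deg[i] * v[i] - sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * Lv[i] for i in range(n))  # Rayleigh quotient
    return eigenvalue, v

eigenvalue, v = second_eigenvector(W)
positive_side = {i for i, x in enumerate(v) if x > 0}
print(eigenvalue)
print(positive_side)
```

On this graph the entries of the second eigenvector have one sign on the first triangle and the opposite sign on the second, so the sign pattern alone already gives the two clusters.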

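29 Normalized Cut
Rewrite Normalized Cut: let $\tilde{y}_c = \dfrac{D^{1/2} y_c}{\|D^{1/2} y_c\|}$, then
$$\begin{aligned}
\mathrm{NCut} &= \sum_{c=1}^{k} \frac{\mathrm{Cut}(V_c, V - V_c)}{\deg(V_c)} \\
&= \sum_{c=1}^{k} \frac{y_c^T (D - A) y_c}{y_c^T D y_c} \\
&= \sum_{c=1}^{k} \frac{\tilde{y}_c^T D^{-1/2} (D - A) D^{-1/2} \tilde{y}_c}{\tilde{y}_c^T \tilde{y}_c}
\end{aligned}$$
Normalized Laplacian:
$$\tilde{L} = D^{-1/2} (D - A) D^{-1/2} = I - D^{-1/2} A D^{-1/2}$$
Normalized Cut eigenvectors correspond to the smallest eigenvalues of $\tilde{L}$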
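The change of variables can be checked numerically for one cluster. The sketch below computes the cluster's Normalized Cut term directly and again through the normalized Laplacian, assuming every node has positive degree. The graph and cluster are assumptions for illustration.

```python
import math

# Hypothetical example graph: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
n = len(A)
deg = [sum(row) for row in A]  # assumed strictly positive
cluster = {0, 1, 2}
y = [1.0 if i in cluster else 0.0 for i in range(n)]

# Direct form: y^T (D - A) y / (y^T D y)
numerator = sum(y[i] * ((deg[i] if i == j else 0) - A[i][j]) * y[j]
                for i in range(n) for j in range(n))
denominator = sum(deg[i] * y[i] * y[i] for i in range(n))
direct = numerator / denominator

# Normalized form: y_tilde = D^{1/2} y / ||D^{1/2} y||
scaled = [math.sqrt(deg[i]) * y[i] for i in range(n)]
norm = math.sqrt(sum(s * s for s in scaled))
y_tilde = [s / norm for s in scaled]

# Normalized Laplacian: I - D^{-1/2} A D^{-1/2}
L_norm = [[(1.0 if i == j else 0.0) - A[i][j] / math.sqrt(deg[i] * deg[j])
           for j in range(n)] for i in range(n)]
via_normalized = sum(y_tilde[i] * L_norm[i][j] * y_tilde[j]
                     for i in range(n) for j in range(n))

print(direct, via_normalized)  # both equal Cut / deg(V_c) = 1/7
```

Since $\tilde{y}_c$ has unit norm, the denominator $\tilde{y}_c^T \tilde{y}_c$ is 1, which is why the quadratic form alone matches the direct value.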

30 Kmeans vs Spectral Clustering
Kmeans: decision boundary is linear
Spectral clustering: boundary can be non-convex curves
$\sigma$ in $W_{ij} = e^{-\|x_i - x_j\|^2 / \sigma^2}$ controls the clustering results (focus on local or global structure)

31 Kmeans vs Spectral Clustering

32 Coming up
Neural networks
Questions?
