Spectral clustering: overview

MATH 567: Mathematical Techniques in Data Science
Clustering II
Dominique Guillot
Department of Mathematical Sciences, University of Delaware
May 8, 2017

This lecture is based on: U. von Luxburg, A Tutorial on Spectral Clustering, Statistics and Computing, 17(4), 2007.

Overview of spectral clustering:
1. Construct a similarity matrix measuring the similarity of pairs of objects (e.g. s_ij = exp(−‖x_i − x_j‖^2 / (2σ^2))).
2. Use the similarity matrix to construct a (weighted or unweighted) graph.
3. Compute eigenvectors of the (normalized or unnormalized) graph Laplacian: L = D − W, L_sym = D^{−1/2} L D^{−1/2}.
4. Construct a matrix containing the first K eigenvectors of L or L_sym as columns.
5. Each row identifies a vertex of the graph with a point in R^K.
6. Cluster those points using the K-means algorithm.

Example

Let us try to cluster the following graph: two 4-cliques on the vertex sets {1,2,3,4} and {5,6,7,8}, joined by the single edge {2,5} (the graph is determined by the degree sequence 3, 4, 3, 3, 4, 3, 3, 3 appearing on the diagonal of L below):

A =
[0 1 1 1 0 0 0 0
 1 0 1 1 1 0 0 0
 1 1 0 1 0 0 0 0
 1 1 1 0 0 0 0 0
 0 1 0 0 0 1 1 1
 0 0 0 0 1 0 1 1
 0 0 0 0 1 1 0 1
 0 0 0 0 1 1 1 0]

We have L = D − W:

L =
[ 3 −1 −1 −1  0  0  0  0
 −1  4 −1 −1 −1  0  0  0
 −1 −1  3 −1  0  0  0  0
 −1 −1 −1  3  0  0  0  0
  0 −1  0  0  4 −1 −1 −1
  0  0  0  0 −1  3 −1 −1
  0  0  0  0 −1 −1  3 −1
  0  0  0  0 −1 −1 −1  3]

The eigenvector associated with the second-smallest eigenvalue of L is

v_2 ≈ (0.3825, 0.2470, 0.3825, 0.3825, −0.2470, −0.3825, −0.3825, −0.3825)^T

(up to sign). Its sign is constant on each clique, so its sign pattern separates the two clusters.

The unnormalized Laplacian

Proposition: The matrix L satisfies the following properties:
1. For any f ∈ R^n:  f^T L f = (1/2) Σ_{i,j=1}^n w_ij (f_i − f_j)^2.
2. L is symmetric and positive semidefinite.
3. 0 is an eigenvalue of L with associated constant eigenvector 𝟙 = (1, …, 1)^T.

Proof: To prove (1), write
f^T L f = f^T D f − f^T W f = Σ_i d_i f_i^2 − Σ_{i,j} w_ij f_i f_j
= (1/2) ( Σ_i d_i f_i^2 − 2 Σ_{i,j} w_ij f_i f_j + Σ_j d_j f_j^2 )
= (1/2) Σ_{i,j} w_ij (f_i − f_j)^2.
(2) follows from (1). (3) is easy.
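The example can be checked in a few lines of NumPy. The sketch below (illustrative code, not from the lecture) rebuilds the graph as two 4-cliques joined by the edge {2,5}, forms L = D − W, and verifies that the eigenvector of the second-smallest eigenvalue is constant in sign on each clique:

```python
import numpy as np

# Example graph from the slide: two 4-cliques on {1,2,3,4} and {5,6,7,8}
# joined by the single edge {2,5}. Vertices are 0-indexed below.
A = np.zeros((8, 8))
A[:4, :4] = 1 - np.eye(4)      # first clique
A[4:, 4:] = 1 - np.eye(4)      # second clique
A[1, 4] = A[4, 1] = 1          # bridge edge between vertices 2 and 5

D = np.diag(A.sum(axis=1))     # degree matrix, diag(3,4,3,3,4,3,3,3)
L = D - A                      # unnormalized graph Laplacian

# eigh returns eigenvalues in ascending order for symmetric matrices.
eigvals, eigvecs = np.linalg.eigh(L)
v2 = eigvecs[:, 1]             # eigenvector of the second-smallest eigenvalue

# The sign of v2 is constant on each clique, so thresholding at 0
# recovers the two clusters.
labels = (v2 >= 0).astype(int)
```

Here the second-smallest eigenvalue comes out to 3 − √7 ≈ 0.3542, and thresholding v_2 at 0 splits the vertices exactly into the two cliques.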
The unnormalized Laplacian (cont.)

Proposition: Let G be an undirected graph with non-negative weights. Then:
1. The multiplicity k of the eigenvalue 0 of L equals the number of connected components A_1, …, A_k of the graph.
2. The eigenspace of the eigenvalue 0 is spanned by the indicator vectors 𝟙_{A_1}, …, 𝟙_{A_k} of those components.

Proof: If f is an eigenvector associated to 0, then
0 = f^T L f = (1/2) Σ_{i,j=1}^n w_ij (f_i − f_j)^2,
so f_i = f_j whenever w_ij > 0. Thus f is constant on the connected components of G. We conclude that the eigenspace of 0 is contained in span(𝟙_{A_1}, …, 𝟙_{A_k}). Conversely, it is not hard to see that each 𝟙_{A_i} is an eigenvector associated to 0 (write L in block diagonal form). It follows that the eigenspace of 0 is exactly span(𝟙_{A_1}, …, 𝟙_{A_k}).

The normalized Laplacians

Proposition: The normalized Laplacians satisfy the following properties:
1. For every f ∈ R^n:
f^T L_sym f = (1/2) Σ_{i,j=1}^n w_ij ( f_i/√d_i − f_j/√d_j )^2.
2. λ is an eigenvalue of L_rw with eigenvector u if and only if λ is an eigenvalue of L_sym with eigenvector w = D^{1/2} u.
3. λ is an eigenvalue of L_rw with eigenvector u if and only if λ and u solve the generalized eigenproblem L u = λ D u.

Proof: The proof of (1) is similar to the proof of the analogous result for the unnormalized Laplacian. (2) and (3) follow easily by using appropriate rescalings.

The normalized Laplacians (cont.)

Proposition: Let G be an undirected graph with non-negative weights. Then:
1. The multiplicity k of the eigenvalue 0 of both L_sym and L_rw equals the number of connected components A_1, …, A_k of G.
2. For L_rw, the eigenspace of the eigenvalue 0 is spanned by the indicator vectors 𝟙_{A_i}, i = 1, …, k.
3. For L_sym, the eigenspace of the eigenvalue 0 is spanned by the vectors D^{1/2} 𝟙_{A_i}, i = 1, …, k.

Proof: Similar to the proof of the analogous result for the unnormalized Laplacian.

Graph cuts

Given a graph G with (weighted) adjacency matrix W = (w_ij), we define:
- W(A, B) := Σ_{i∈A, j∈B} w_ij,
- |A| := the number of vertices in A,
- vol(A) := Σ_{i∈A} d_i.

Given a partition A_1, …, A_k of the vertices of G, we let
cut(A_1, …, A_k) := (1/2) Σ_{i=1}^k W(A_i, Ā_i).

The min-cut problem consists of solving:
min cut(A_1, …, A_k) over all partitions V = A_1 ∪ ⋯ ∪ A_k with A_i ∩ A_j = ∅ for i ≠ j.
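The first proposition is easy to verify numerically. The following sketch (illustrative; the small 5-vertex graph is made up for the check) builds a graph with two connected components and confirms that 0 is an eigenvalue of L with multiplicity 2, with zero-eigenvectors constant on each component:

```python
import numpy as np

# A graph with two connected components: a triangle on {0,1,2} and a
# single edge {3,4}. (This small graph is made up purely for the check.)
A = np.zeros((5, 5))
A[0, 1] = A[1, 2] = A[0, 2] = 1   # triangle
A[3, 4] = 1                        # edge
A = A + A.T                        # symmetrize

L = np.diag(A.sum(axis=1)) - A     # unnormalized Laplacian
eigvals, eigvecs = np.linalg.eigh(L)

# Multiplicity of the eigenvalue 0 = number of connected components.
multiplicity = int(np.sum(np.abs(eigvals) < 1e-8))

# Any eigenvector for the eigenvalue 0 lies in span(1_{A_1}, 1_{A_2}),
# hence is constant on each component.
u = eigvecs[:, 0]
```

The same check with L_sym or L_rw would find multiplicity 2 as well, with the L_sym eigenspace spanned by D^{1/2} 𝟙_{A_i}.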
Graph cuts (cont.)

The min-cut problem can be solved efficiently when k = 2 (see Stoer and Wagner, 1997). In practice, however, it often does not lead to satisfactory partitions: in many cases, the solution of min-cut simply separates one individual vertex from the rest of the graph.

We would like clusters to have a reasonably large number of points. We therefore modify the min-cut problem to enforce such constraints.

Balanced cuts

The two most common objective functions used as a replacement for the min-cut objective are:

RatioCut (Hagen and Kahng, 1992):
RatioCut(A_1, …, A_k) := Σ_{i=1}^k cut(A_i, Ā_i)/|A_i|.

Normalized cut (Shi and Malik, 2000):
Ncut(A_1, …, A_k) := Σ_{i=1}^k cut(A_i, Ā_i)/vol(A_i).

Here Ā_i denotes the complement of A_i, and cut(A_i, Ā_i) = W(A_i, Ā_i).

Note: both objective functions take larger values when the clusters A_i are small. The resulting clusters are therefore more balanced. However, the resulting optimization problems are NP-hard (see Wagner and Wagner, 1993).

Spectral clustering

Spectral clustering provides a way to relax the RatioCut and the Normalized cut problems.

Strategy:
1. Express the original problem as a linear algebra problem involving discrete/combinatorial constraints.
2. Relax/remove the constraints.

RatioCut with k = 2: solve
min_{A ⊂ V} RatioCut(A, Ā).

Given A ⊂ V, let f ∈ R^n be given by
f_i := √(|Ā|/|A|)   if v_i ∈ A,
f_i := −√(|A|/|Ā|)  if v_i ∈ Ā.

Relaxing RatioCut

Let L = D − W be the (unnormalized) Laplacian of G. Then:
f^T L f = (1/2) Σ_{i,j} w_ij (f_i − f_j)^2
= Σ_{i∈A, j∈Ā} w_ij ( √(|Ā|/|A|) + √(|A|/|Ā|) )^2
= W(A, Ā) ( |Ā|/|A| + |A|/|Ā| + 2 )
= W(A, Ā) ( (|A| + |Ā|)/|A| + (|A| + |Ā|)/|Ā| )
= |V| ( W(A, Ā)/|A| + W(A, Ā)/|Ā| )
= |V| · RatioCut(A, Ā),

since |A| + |Ā| = |V| and W(A, Ā) = W(Ā, A).
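The computation above can be verified numerically. In this sketch (illustrative code, reusing the two-clique example graph), we build the RatioCut relaxation vector f and check the identity f^T L f = |V| · RatioCut(A, Ā), together with the side facts f ⊥ 𝟙 and ‖f‖^2 = n used on the next slide:

```python
import numpy as np

# Two-clique example graph again (two 4-cliques plus the bridge {2,5}).
n = 8
W = np.zeros((n, n))
W[:4, :4] = 1 - np.eye(4)
W[4:, 4:] = 1 - np.eye(4)
W[1, 4] = W[4, 1] = 1

L = np.diag(W.sum(axis=1)) - W

A = [0, 1, 2, 3]          # one clique ...
Abar = [4, 5, 6, 7]       # ... and its complement

cut = W[np.ix_(A, Abar)].sum()              # cut(A, Abar) = W(A, Abar) = 1
ratio_cut = cut / len(A) + cut / len(Abar)  # RatioCut(A, Abar) = 1/2

# The relaxation vector f from the slide:
f = np.empty(n)
f[A] = np.sqrt(len(Abar) / len(A))
f[Abar] = -np.sqrt(len(A) / len(Abar))

lhs = f @ L @ f          # f^T L f
rhs = n * ratio_cut      # |V| * RatioCut(A, Abar)
```

For this balanced split (|A| = |Ā| = 4), f is just the ±1 indicator vector, and both sides of the identity equal 4.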
Relaxing RatioCut (cont.)

We showed:
f^T L f = (1/2) Σ_{i,j} w_ij (f_i − f_j)^2 = |V| · RatioCut(A, Ā).

Moreover, note that
Σ_i f_i = Σ_{i∈A} √(|Ā|/|A|) − Σ_{i∈Ā} √(|A|/|Ā|) = |A| √(|Ā|/|A|) − |Ā| √(|A|/|Ā|) = √(|A||Ā|) − √(|A||Ā|) = 0.
Thus f ⊥ 𝟙. Finally,
‖f‖^2 = Σ_i f_i^2 = |A| · |Ā|/|A| + |Ā| · |A|/|Ā| = |Ā| + |A| = |V| = n.

Thus, we have shown that the RatioCut problem is equivalent to:
min_{A ⊂ V} f^T L f   subject to f ⊥ 𝟙, ‖f‖ = √n, f_i defined as above.

This is a discrete optimization problem since the entries of f can only take two values: √(|Ā|/|A|) and −√(|A|/|Ā|). The problem is NP-hard.

The natural relaxation of the problem is to remove the discreteness condition on f and solve:
min_{f ∈ R^n} f^T L f   subject to f ⊥ 𝟙, ‖f‖ = √n.

Relaxing RatioCut (cont.)

Using properties of the Rayleigh quotient, it is not hard to show that the solution f* of
min_{f ∈ R^n} f^T L f   subject to f ⊥ 𝟙, ‖f‖ = √n
is an eigenvector of L corresponding to the second-smallest eigenvalue. Clearly, if f* is the solution of the relaxed problem, then
f*^T L f* ≤ |V| · min_{A ⊂ V} RatioCut(A, Ā).

How do we get the clusters from f*? We could set
v_i ∈ A  if f*_i ≥ 0,
v_i ∈ Ā  if f*_i < 0.
More generally, we cluster the coordinates of f* using K-means. This is the unnormalized spectral clustering algorithm for k = 2. The above process can be generalized to k clusters.

Unnormalized spectral clustering: summary

The unnormalized spectral clustering algorithm:
[Algorithm box from von Luxburg (2007): construct a similarity graph with weight matrix W; compute the unnormalized Laplacian L; compute the first k eigenvectors u_1, …, u_k of L; form the matrix U ∈ R^{n×k} containing them as columns; cluster the rows of U with K-means; assign vertex i to the cluster of row i.]

Source: von Luxburg, 2007.
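The full pipeline for k = 2 can be sketched end to end. The code below is an illustrative sketch (the function name is mine, and thresholding the second eigenvector at 0 stands in for K-means, as the slide allows for k = 2), applied to two well-separated Gaussian blobs:

```python
import numpy as np

def unnormalized_spectral_clustering_k2(X, sigma=1.0):
    """Illustrative sketch of unnormalized spectral clustering for k = 2:
    Gaussian similarity graph -> L = D - W -> second eigenvector ->
    threshold at 0 (a stand-in for K-means when k = 2)."""
    # Steps 1-2: fully connected similarity graph with Gaussian weights.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Step 3: unnormalized Laplacian.
    L = np.diag(W.sum(axis=1)) - W
    # Steps 4-6 for k = 2: second eigenvector, then threshold its sign.
    _, eigvecs = np.linalg.eigh(L)
    return (eigvecs[:, 1] >= 0).astype(int)

# Two well-separated blobs of 2-D points:
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (10, 2)),
               rng.normal(5.0, 0.1, (10, 2))])
labels = unnormalized_spectral_clustering_k2(X, sigma=1.0)
```

Because the cross-blob similarities are tiny, the graph is nearly disconnected and the second eigenvector's sign recovers the two blobs.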
Normalized spectral clustering

Relaxing the RatioCut leads to unnormalized spectral clustering. By relaxing the Ncut problem, we obtain the normalized spectral clustering algorithm of Shi and Malik (2000). The algorithm uses L_rw.

Note: The solutions of the generalized eigenproblem L u = λ D u are the eigenvectors of L_rw.

See von Luxburg (2007) for details.
Source: von Luxburg, 2007.

The normalized clustering algorithm of Ng et al.

Another popular variant of the spectral clustering algorithm was provided by Ng, Jordan, and Weiss (2002). The algorithm uses L_sym instead of L (unnormalized clustering) or L_rw (Shi and Malik's normalized clustering).

See von Luxburg (2007) for details.
Source: von Luxburg, 2007.
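The Ng–Jordan–Weiss variant can be sketched as follows. This is an illustrative implementation (the function name and the naive Lloyd-iteration K-means with farthest-point initialization are mine; a real implementation would use a library K-means): compute the first k eigenvectors of L_sym, normalize the rows of the eigenvector matrix to unit length, then run K-means on the rows.

```python
import numpy as np

def njw_spectral_clustering(W, k):
    """Sketch of the Ng-Jordan-Weiss variant: first k eigenvectors of
    L_sym = I - D^{-1/2} W D^{-1/2}, row normalization, then K-means."""
    d = W.sum(axis=1)
    D_isqrt = np.diag(1.0 / np.sqrt(d))   # assumes no isolated vertices
    L_sym = np.eye(len(d)) - D_isqrt @ W @ D_isqrt
    _, eigvecs = np.linalg.eigh(L_sym)
    U = eigvecs[:, :k]                    # first k eigenvectors as columns
    T = U / np.linalg.norm(U, axis=1, keepdims=True)  # NJW row normalization

    # Naive K-means (Lloyd) on the rows of T, with a deterministic
    # farthest-point initialization instead of random seeding.
    centers = [T[0]]
    for _ in range(1, k):
        dists = np.min([((T - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(T[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(100):
        labels = np.argmin(((T[:, None, :] - centers[None, :, :]) ** 2).sum(-1),
                           axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = T[labels == j].mean(axis=0)
    return labels

# Two 4-cliques joined by one edge, as in the earlier example:
W = np.zeros((8, 8))
W[:4, :4] = 1 - np.eye(4)
W[4:, 4:] = 1 - np.eye(4)
W[1, 4] = W[4, 1] = 1
labels = njw_spectral_clustering(W, 2)
```

On the example graph, the row-normalized spectral embedding places each clique's vertices close together, and K-means recovers the two cliques.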