COMPSCI 514: Algorithms for Data Science
Arya Mazumdar
University of Massachusetts at Amherst
Fall 2018

Lecture 8: Spectral Clustering

Spectral clustering: curse of dimensionality and dimensionality reduction

- $A$: $n \times d$ data matrix.
- Find the space $V$ formed by the top $k$ (right) singular vectors.
- Project $A$ onto $V$, obtaining $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$.
- Cluster the projected points (a total of $n$ $k$-dimensional points).
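As a quick illustration, here is a minimal sketch of this pipeline in Python (numpy and scikit-learn assumed; the helper name `spectral_project_and_cluster` and the use of k-means are illustrative choices, not fixed by the slides):

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_project_and_cluster(A, k):
    """Project the rows of A onto the top-k right singular vectors, then cluster.

    A: n x d data matrix. Returns cluster labels for the n projected points.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # A_k = sum_{i=1}^k sigma_i u_i v_i^T; its rows, written in the
    # coordinates of V, are the n projected k-dimensional points.
    coords = U[:, :k] * s[:k]
    return KMeans(n_clusters=k, n_init=10).fit_predict(coords)
```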

Benefits of projection

[Figure 7.3: Clusters in the full space and their projections (courtesy: the textbook)]

Benefits of projection

- $A$: $n \times d$ data matrix.
- $C$: $n \times d$ matrix whose $i$th row is the center of the cluster to which $a_i$, the corresponding row of $A$, belongs.
- The rank of $C$ is $k$.
- $\sum_{i=1}^{n} \|a_i - c_i\|_2^2 = \|A - C\|_F^2$
- $A_k$: projection of $A$ onto the first $k$ singular vectors.

Projection may not lead to data loss

Note that
$$\|A_k - C\|_F \le \|A_k - A\|_F + \|A - C\|_F \le 2\|A - C\|_F,$$
where the first step is the triangle inequality and the second uses the fact that $A_k$ is the closest rank-$k$ matrix to $A$, while $C$ also has rank $k$.

A good clustering for $A_k$ is a good clustering for $A$.
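A small numeric sanity check of this chain of inequalities (a sketch; the random data and the half-and-half choice of centers are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 20))

# A rank-2 "cluster center" matrix C: one shared center per half of the rows.
k = 2
C = np.zeros_like(A)
C[:50] = A[:50].mean(axis=0)
C[50:] = A[50:].mean(axis=0)

# A_k: the best rank-k approximation of A (truncated SVD).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k]

fro = np.linalg.norm                                # Frobenius norm by default
assert fro(A_k - A) <= fro(A - C)                   # A_k is the closest rank-k matrix to A
assert fro(A_k - C) <= fro(A_k - A) + fro(A - C)    # triangle inequality
assert fro(A_k - C) <= 2 * fro(A - C)               # the bound from the slide
```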

Spectral algorithms for graph clustering

- Find clusters in social networks; find communities on the internet.
- Partition a graph.
- A cut divides the graph in two; we minimize the number of edges across the cut.

10.4.1 What Makes a Good Partition?

How to cut: Given a graph, we would like to divide the nodes into two sets so that the cut, or set of edges that connect nodes in different sets, is minimized. However, we also want to constrain the selection of the cut so that the two sets are approximately equal in size. The next example illustrates the point.

Example 10.14: Recall our running example of the graph in Fig. 10.1. There, it is evident that the best partition puts {A, B, C} in one set and {D, E, F, G} in the other. The cut consists only of the edge (B, D) and is of size 1. No nontrivial cut can be smaller.

[Figure 10.11: The smallest cut might not be the best cut]

The smallest cut is not the best cut: In Fig. 10.11 is a variant of our example, where we have added the node H and two extra edges, (H, C) and (C, G). If all we wanted was to minimize the size of the cut, then the best choice would be to put H in one set and all the other nodes in the other set. But it should be apparent that if we reject partitions where one set is too small, then the best we can do is to use the cut consisting of edges (B, D) and (C, G), which partitions the graph into two equal-sized sets {A, B, C, H} and {D, E, F, G}.

10.4.2 Normalized Cuts

A proper definition of a good cut must balance the size of the cut itself against the difference in the sizes of the sets that the cut creates. One choice that serves well is the normalized cut, defined next.

How to cut

Suppose a cut partitions a graph into two parts $S$ and $T$.
- $\mathrm{Vol}(S)$: number of edges with at least one end in $S$.
- $\mathrm{Cut}(S, T)$: number of edges with one end in $S$ and the other end in $T$.

Normalized cut:
$$\frac{\mathrm{Cut}(S, T)}{\mathrm{Vol}(S)} + \frac{\mathrm{Cut}(S, T)}{\mathrm{Vol}(T)}$$
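In code, the normalized cut can be computed directly from the adjacency matrix. The sketch below (the helper `normalized_cut` is illustrative) also re-checks the point of Fig. 10.11, where the balanced cut beats the smallest cut:

```python
import numpy as np

def normalized_cut(A, in_S):
    """Normalized cut of a partition (S, T), with Vol as defined above.

    A: symmetric 0/1 adjacency matrix; in_S: boolean mask of the nodes in S.
    Vol(S) = number of edges with at least one endpoint in S.
    """
    in_T = ~in_S
    cut = A[np.ix_(in_S, in_T)].sum()                       # Cut(S, T)
    vol_S = A[in_S].sum() - A[np.ix_(in_S, in_S)].sum() / 2
    vol_T = A[in_T].sum() - A[np.ix_(in_T, in_T)].sum() / 2
    return cut / vol_S + cut / vol_T

# The graph of Fig. 10.11 (nodes A..H as 0..7, H = 7).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (3, 4), (3, 5), (3, 6),
         (4, 5), (5, 6), (7, 2), (2, 6)]
A = np.zeros((8, 8)); A[tuple(zip(*edges))] = 1; A += A.T

S = np.zeros(8, dtype=bool); S[7] = True              # the smallest cut: {H} alone
print(normalized_cut(A, S))                           # ~1.09
S = np.zeros(8, dtype=bool); S[[0, 1, 2, 7]] = True   # the best cut: {A, B, C, H}
print(normalized_cut(A, S))                           # ~0.62: smaller is better
```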

How to cut

- A cut is a partition into two parts.
- Let one part be the positive part and the other the negative part.
- Consider a membership vector $x \in \mathbb{R}^n$.
- The sign of the entry $x_i$ denotes whether node $i$ is in the positive or the negative cluster.

How to cut: assumptions on $x$

- We assume there are approximately the same number of positive and negative entries (or the weight of the positive part equals the weight of the negative part): $\sum_i x_i = 0$.
- If there exists an edge between nodes $i$ and $j$, then $x_i$ and $x_j$ are likely to have the same sign: for an edge $(i, j)$, $(x_i - x_j)^2$ should be small.

10.4.3 Some Matrices That Describe Graphs

Describing a graph: To develop the theory of how matrix algebra can help us find good graph partitions, we first need to learn about three different matrices that describe aspects of a graph. The first should be familiar: the adjacency matrix, which has a 1 in row i and column j if there is an edge between nodes i and j, and 0 otherwise.

[Figure 10.12: Repeat of the graph of Fig. 10.1, with nodes A through G]

Example 10.16: We repeat our running example graph in Fig. 10.12. Its adjacency matrix appears in Fig. 10.13. Note that the rows and columns correspond to the nodes A, B, ..., G in that order. For example, the edge (B, D) is reflected by the fact that the entry in row 2 and column 4 is 1, and so is the entry in row 4 and column 2.

Adjacency matrix (Fig. 10.13):

0 1 1 0 0 0 0
1 0 1 1 0 0 0
1 1 0 0 0 0 0
0 1 0 0 1 1 1
0 0 0 1 0 1 0
0 0 0 1 1 0 1
0 0 0 1 0 1 0

The second matrix we need is the degree matrix for a graph. This matrix has nonzero entries only on the diagonal: the entry for row and column i is the degree of the ith node.

Describing a graph: the degree matrix

Example 10.17: The degree matrix for the graph of Fig. 10.12 is shown in Fig. 10.14. We use the same order of the nodes as in Example 10.16. For instance, the entry in row 4 and column 4 is 4, because node D has edges to four other nodes. The entry in row 4 and column 5 is 0, because that entry is not on the diagonal.

Degree matrix D (Fig. 10.14):

2 0 0 0 0 0 0
0 3 0 0 0 0 0
0 0 2 0 0 0 0
0 0 0 4 0 0 0
0 0 0 0 2 0 0
0 0 0 0 0 3 0
0 0 0 0 0 0 2

Describing a graph: the graph Laplacian

The third matrix is the Laplacian matrix: $L = D - A$.

Laplacian matrix for Fig. 10.12 (Fig. 10.15):

 2 -1 -1  0  0  0  0
-1  3 -1 -1  0  0  0
-1 -1  2  0  0  0  0
 0 -1  0  4 -1 -1 -1
 0  0  0 -1  2 -1  0
 0  0  0 -1 -1  3 -1
 0  0  0 -1  0 -1  2
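All three matrices are easy to reproduce, for example in numpy (a sketch; nodes A through G are numbered 0 through 6, with the edge list read off the adjacency matrix above):

```python
import numpy as np

# Running example (Fig. 10.12): nodes A..G as 0..6, edges from Fig. 10.13.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (3, 4), (3, 5), (3, 6), (4, 5), (5, 6)]
n = 7
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1          # adjacency matrix (Fig. 10.13)
D = np.diag(A.sum(axis=1))         # degree matrix (Fig. 10.14)
L = D - A                          # Laplacian matrix (Fig. 10.15)
print(L)
```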

The graph Laplacian

- The smallest eigenvalue of $L$ is zero and corresponds to the eigenvector $\frac{1}{\sqrt{n}}[1\ 1\ \dots\ 1]^T$.
- What is the second smallest eigenvalue-eigenvector pair of $L$?
- $L$ is a symmetric positive semidefinite matrix (which means that for any $x$, $x^T L x \ge 0$); we will see why shortly.
- It has an orthonormal set of eigenvectors.

The graph Laplacian

- The smallest eigenvalue of $L$ is zero and corresponds to the eigenvector $\frac{1}{\sqrt{n}}[1\ 1\ \dots\ 1]^T$.
- The second (smallest) eigenvector is orthogonal to the smallest eigenvector, so it must satisfy $\frac{1}{\sqrt{n}}[1\ 1\ \dots\ 1]\, x = 0$, i.e., $\sum_i x_i = 0$.
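Both facts are easy to confirm numerically on the running example (a sketch, rebuilding $L$ as above):

```python
import numpy as np

# The running example graph (Fig. 10.12, nodes A..G as 0..6).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (3, 4), (3, 5), (3, 6), (4, 5), (5, 6)]
A = np.zeros((7, 7)); A[tuple(zip(*edges))] = 1; A += A.T
L = np.diag(A.sum(axis=1)) - A

vals, vecs = np.linalg.eigh(L)      # eigh: symmetric matrices, ascending order
assert vals.min() > -1e-9           # all eigenvalues >= 0, so L is PSD
assert np.isclose(vals[0], 0)       # the smallest eigenvalue is 0 ...
# ... with the constant unit vector (1/sqrt(n)) [1 ... 1]^T as its eigenvector:
assert np.allclose(np.abs(vecs[:, 0]), 1 / np.sqrt(7))
```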

The second (smallest) eigenvector of the Laplacian

Minimize $x^T L x$ such that $\sum_i x_i^2 = 1$ and $\sum_i x_i = 0$.

The second (smallest) eigenvector of the Laplacian: meaning

$V = \{1, 2, \dots, n\}$: set of vertices; $E$: set of edges.

Note that
$$x^T L x = x^T D x - x^T A x = \sum_{i=1}^{n} d_i x_i^2 - \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} x_i x_j = \sum_{(i,j) \in E} (x_i^2 + x_j^2) - 2 \sum_{(i,j) \in E} x_i x_j = \sum_{(i,j) \in E} (x_i - x_j)^2.$$

Note: this is what we wanted to optimize. The second smallest eigenvector of $L$ gives a good cluster membership vector.
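This identity can be checked numerically for an arbitrary $x$ (a sketch, again on the running example):

```python
import numpy as np

# The running example graph again (Fig. 10.12, nodes A..G as 0..6).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (3, 4), (3, 5), (3, 6), (4, 5), (5, 6)]
A = np.zeros((7, 7)); A[tuple(zip(*edges))] = 1; A += A.T
L = np.diag(A.sum(axis=1)) - A

x = np.random.default_rng(1).normal(size=7)
# x^T L x equals the sum of (x_i - x_j)^2 over the edges (i, j):
assert np.isclose(x @ L @ x, sum((x[i] - x[j]) ** 2 for i, j in edges))
```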

The spectral clustering for graphs

- Find the second (smallest) eigenvector of the Laplacian: minimize $x^T L x$ such that $\sum_i x_i^2 = 1$ and $\sum_i x_i = 0$.
- Assign node $i$ to the cluster $\mathrm{sign}(x_i)$.
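Putting the pieces together, a minimal sketch of the whole algorithm (the helper name `spectral_bipartition` is illustrative):

```python
import numpy as np

def spectral_bipartition(A):
    """Split a connected graph in two by the sign of the Laplacian's
    second-smallest eigenvector (the Fiedler vector).

    A: symmetric adjacency matrix. Returns a +1/-1 label per node.
    """
    L = np.diag(A.sum(axis=1)) - A            # L = D - A
    _, vecs = np.linalg.eigh(L)               # eigenvalues in ascending order
    x = vecs[:, 1]                            # second (smallest) eigenvector
    return np.where(x >= 0, 1, -1)            # node i -> cluster sign(x_i)
```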

Example: the spectral clustering for graphs

[Figure 10.16: Graph for illustrating partitioning by spectral analysis, with nodes 1 through 6]

We choose one set to be those nodes $i$ whose corresponding vector component $x_i$ is positive, and the other set to be those whose components are negative. This choice does not guarantee a partition into sets of equal size, but the sizes are likely to be close. We believe that the cut between the two sets will have a small number of edges, because $(x_i - x_j)^2$ is likely to be smaller if both $x_i$ and $x_j$ have the same sign than if they have different signs. Thus, minimizing $x^T L x$ under the required constraints will tend to give $x_i$ and $x_j$ the same sign if there is an edge $(i, j)$.

Example 10.19: Let us apply the above technique to the graph of Fig. 10.16. The Laplacian matrix for this graph is shown in Fig. 10.17. By standard methods or math packages we can find all the eigenvalues and eigenvectors of this matrix; we shall simply tabulate them in Fig. 10.18, from lowest eigenvalue to highest. Note that we have not scaled the eigenvectors to have length 1, but could do so easily if we wished.

Laplacian matrix $L = D - A$ (Fig. 10.17):

 3 -1 -1 -1  0  0
-1  2 -1  0  0  0
-1 -1  3  0  0 -1
-1  0  0  3 -1 -1
 0  0  0 -1  2 -1
 0  0 -1 -1 -1  3

Example: the spectral clustering for graphs (continued)

Eigenpairs of the Laplacian matrix L (Fig. 10.18), one column per eigenpair, from lowest eigenvalue to highest (eigenvectors not scaled to length 1):

Eigenvalue:  0    1    3    3    4    5
Node 1:      1    1   -5    1    1    1
Node 2:      1    2    4    2   -1    0
Node 3:      1    1    1   -3    1   -1
Node 4:      1   -1   -5    1   -1   -1
Node 5:      1   -2    4    2    1    0
Node 6:      1   -1    1   -3   -1    1

The second eigenvector has three positive and three negative components. It makes the unsurprising suggestion that one group should be {1, 2, 3}, the nodes with positive components, and the other group should be {4, 5, 6}.

Clusters: {1, 2, 3} and {4, 5, 6}
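Running the `spectral_bipartition` sketch from the previous slide on this six-node graph (edges read off the Laplacian in Fig. 10.17) reproduces the two clusters:

```python
import numpy as np

# Nodes 1..6 as 0..5; edges read off the Laplacian of Fig. 10.17.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 5), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6)); A[tuple(zip(*edges))] = 1; A += A.T

labels = spectral_bipartition(A)   # the sketch from the previous slide
print(labels)   # nodes {1, 2, 3} get one sign, nodes {4, 5, 6} the other
```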