Spectral Feature Vectors for Graph Clustering


Bin Luo, Richard C. Wilson, and Edwin R. Hancock

Department of Computer Science, University of York, York, UK
Anhui University, P.R. China
{luo,wilson,erh}@cs.york.ac.uk

Abstract. This paper investigates whether vectors of graph-spectral features can be used for the purposes of graph clustering. We commence from the eigenvalues and eigenvectors of the adjacency matrix. Each of the leading eigenmodes represents a cluster of nodes and is mapped to a component of a feature vector. The spectral features used as components of the vectors are the eigenvalues, the cluster volume, the cluster perimeter, the cluster Cheeger constant, the inter-cluster edge distance, and the shared perimeter length. We explore the use of both central and pairwise clustering methods on these vectors. On a database of view-graphs, the vectors of eigenvalues and shared perimeter lengths provide the best clusters.

1 Introduction

Graph clustering is an important yet relatively under-researched topic in machine learning [9,,4]. The importance of the topic stems from the fact that it is a key tool for learning the class-structure of data abstracted in terms of relational graphs. Problems of this sort are posed by a multitude of unsupervised learning tasks in knowledge engineering, pattern recognition and computer vision. The process can be used to structure large databases of relational models [] or to learn equivalence classes. One of the reasons for limited progress in the area has been the lack of algorithms suitable for clustering relational structures. In particular, the problem has proved elusive to conventional central clustering techniques. The reason for this is that it has proved difficult to define what is meant by the mean or representative graph for each cluster. However, Munger, Bunke and Jiang [] have recently taken some important steps in this direction by developing a genetic algorithm for searching for median graphs.

Generally speaking, there are two different approaches to graph clustering. The first of these is pairwise clustering [7]. This requires only that a set of pairwise distances between graphs be supplied. The clusters are located by identifying sets of graphs that have strong mutual pairwise affinities. There is therefore no need to explicitly identify a representative (mean, mode or median) graph for each cluster. Unfortunately, the literature on pairwise clustering is much less developed than that on central clustering. The second approach is to embed graphs in a pattern space [].

Although the pattern spaces generated in this way are well organised, there are two obstacles to the practical implementation of the method. Firstly, it is difficult to deal with graphs with different numbers of nodes. Secondly, the node and edge correspondences must be known so that the nodes and edges can be mapped in a consistent way to a vector of fixed length.

In this paper, we attempt to overcome these two problems by using graph-spectral methods to extract feature vectors from symbolic graphs []. Spectral graph theory is a branch of mathematics that aims to characterise the structural properties of graphs using the eigenvalues and eigenvectors of the adjacency matrix, or of the closely related Laplacian matrix. There are a number of well-known results. For instance, the ease with which a graph can be separated into two parts is gauged by the difference between the first and second eigenvalues (this property has been widely exploited in the computer vision literature to develop grouping and segmentation algorithms). In routing theory, on the other hand, considerable use is made of the fact that the leading eigenvector of the adjacency matrix gives the steady-state random walk on the graph.

Here we adopt a different approach. Our aim is to use the leading eigenvectors of the adjacency matrix to define clusters of nodes. From the clusters we extract structural features and, using the eigenvalue order to index the components, we construct feature vectors. The length of the vectors is determined by the number of leading eigenvalues. The graph-spectral features explored include the eigenvalue spectrum, cluster volume, cluster perimeter, cluster Cheeger constant, shared perimeter and cluster distances. The specific technical goals in this paper are two-fold. First, we aim to investigate whether the independent or principal components of the spectral feature vectors can be used to embed graphs in a pattern space suitable for clustering. Second, we investigate which of the spectral features results in the best clusters.

2 Graph Spectra

In this paper we are concerned with the set of graphs $G_1, G_2, \ldots, G_k, \ldots, G_N$. The $k$th graph is denoted by $G_k = (V_k, E_k)$, where $V_k$ is the set of nodes and $E_k \subseteq V_k \times V_k$ is the edge set. Our approach in this paper is a graph-spectral one. For each graph $G_k$ we compute the adjacency matrix $A_k$. This is a $|V_k| \times |V_k|$ matrix whose element with row index $i$ and column index $j$ is

\[
A_k(i,j) = \begin{cases} 1 & \text{if } (i,j) \in E_k \\ 0 & \text{otherwise.} \end{cases} \qquad (1)
\]

From the adjacency matrices $A_k$, $k = 1 \ldots N$, we can calculate the eigenvalues $\lambda_k$ by solving the equation $|A_k - \lambda_k I| = 0$ and the associated eigenvectors $\phi^{\omega}_k$ by solving the system of equations $A_k \phi^{\omega}_k = \lambda^{\omega}_k \phi^{\omega}_k$. We order the eigenvectors according to the decreasing magnitude of the eigenvalues, i.e. $|\lambda^1_k| > |\lambda^2_k| > \ldots > |\lambda^{|V_k|}_k|$. The eigenvectors are stacked in order to construct the modal matrix $\Phi_k = (\phi^1_k \mid \phi^2_k \mid \ldots \mid \phi^{|V_k|}_k)$.
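A minimal NumPy sketch of this eigendecomposition step is given below. The graph is assumed to be supplied as an undirected edge list; the function names are illustrative rather than taken from the paper.

```python
import numpy as np

def adjacency_matrix(num_nodes, edges):
    """Binary adjacency matrix A_k of equation (1) for an undirected graph."""
    A = np.zeros((num_nodes, num_nodes))
    for i, j in edges:
        A[i, j] = 1.0
        A[j, i] = 1.0
    return A

def modal_matrix(A):
    """Eigenvalues and eigenvectors of A, ordered by decreasing eigenvalue magnitude."""
    evals, evecs = np.linalg.eigh(A)            # A is symmetric
    order = np.argsort(-np.abs(evals))
    return evals[order], evecs[:, order]        # modal matrix Phi_k has phi^omega as columns

# Example: a 4-node path graph
A = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])
lam, Phi = modal_matrix(A)
```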

We use only the first $n$ eigenmodes of the modal matrix to define spectral clusters for each graph. The components of the eigenvectors are used to compute the probabilities that nodes belong to clusters. The probability that the node indexed $i \in V_k$ in graph $k$ belongs to the cluster with eigenvalue order $\omega$ is

\[
s^k_{i,\omega} = \frac{\Phi_k(i,\omega)}{\sum_{\omega'=1}^{n}\Phi_k(i,\omega')}. \qquad (2)
\]

3 Spectral Features

Our aim is to use spectral features for the modal clusters of the graphs under study to construct feature vectors. To overcome the correspondence problem, we use the order of the eigenvalues to establish the order of the components of the feature vectors. We study a number of features suggested by spectral graph theory.

3.1 Unary Features

We commence by considering unary features for the arrangement of modal clusters. The features studied are listed below.

Leading Eigenvalues: Our first vector of spectral features is constructed from the ordered eigenvalues of the adjacency matrix. For the graph indexed $k$, the vector is $B_k = (\lambda^1_k, \lambda^2_k, \ldots, \lambda^n_k)^T$.

Cluster Volume: The volume $\mathrm{Vol}(S)$ of a subgraph $S$ of a graph $G$ is defined to be the sum of the degrees of the nodes belonging to the subgraph, i.e. $\mathrm{Vol}(S) = \sum_{i \in S} \deg(i)$, where $\deg(i)$ is the degree of node $i$. By analogy, for the modal clusters, we define the volume of the cluster indexed $\omega$ in the graph indexed $k$ to be

\[
\mathrm{Vol}^{\omega}_k = \frac{\sum_{i \in V_k} s^k_{i\omega}\deg(i)}{\sum_{\omega=1}^{n}\sum_{i \in V_k} s^k_{i\omega}\deg(i)}. \qquad (3)
\]

The feature vector for the graph indexed $k$ is $B_k = (\mathrm{Vol}^1_k, \mathrm{Vol}^2_k, \ldots, \mathrm{Vol}^n_k)^T$.

Cluster Perimeter: For a subgraph $S$, the set of perimeter edges is $\partial(S) = \{(u,v) \mid (u,v) \in E \wedge u \in S \wedge v \notin S\}$. The perimeter length of the subgraph is defined to be the number of edges in the perimeter set, i.e. $\Gamma(S) = |\partial(S)|$. Again, by analogy, the perimeter length of the modal cluster indexed $\omega$ is

\[
\Gamma^{\omega}_k = \frac{\sum_{i \in V_k}\sum_{j \in V_k} s^k_{i\omega}(1 - s^k_{j\omega})A_k(i,j)}{\sum_{\omega=1}^{n}\sum_{i \in V_k}\sum_{j \in V_k} s^k_{i\omega}(1 - s^k_{j\omega})A_k(i,j)}. \qquad (4)
\]

The perimeter values are ordered according to the modal index of the relevant cluster to form the graph feature vector $B_k = (\Gamma^1_k, \Gamma^2_k, \ldots, \Gamma^n_k)^T$.
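The unary features above can be computed directly from the modal matrix. The sketch below continues the previous snippet and follows equations (2)-(4); the dictionary packaging, and the use of absolute eigenvector components in equation (2), are our assumptions rather than details from the paper.

```python
import numpy as np

def cluster_memberships(Phi, n):
    """Membership weights s[i, w] of eq. (2) for the first n eigenmodes.

    Absolute values are taken so the weights stay non-negative (an assumption;
    eq. (2) is stated in terms of the raw eigenvector components).
    """
    S = np.abs(Phi[:, :n])
    return S / S.sum(axis=1, keepdims=True)

def unary_features(A, Phi, lam, n):
    """Eigenvalue, cluster-volume and cluster-perimeter vectors of Section 3.1."""
    s = cluster_memberships(Phi, n)
    deg = A.sum(axis=1)                                   # node degrees
    vol = s.T @ deg                                       # eq. (3), before normalisation
    per = np.array([np.sum(np.outer(s[:, w], 1.0 - s[:, w]) * A)
                    for w in range(n)])                   # eq. (4), before normalisation
    return {"eigenvalues": lam[:n],                       # B_k from the leading eigenvalues
            "volumes": vol / vol.sum(),
            "perimeters": per / per.sum()}
```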

Cheeger Constant: The Cheeger constant for the subgraph $S$ is defined as follows. Suppose that $\hat{S} = V - S$ is the complement of the subgraph $S$. Further, let $E(S,\hat{S}) = \{(u,v) \mid u \in S \wedge v \in \hat{S}\}$ be the set of edges that connect $S$ to $\hat{S}$. The Cheeger constant for the subgraph $S$ is

\[
H(S) = \frac{|E(S,\hat{S})|}{\min[\mathrm{Vol}(S), \mathrm{Vol}(\hat{S})]}. \qquad (5)
\]

The cluster analogue of the Cheeger constant is

\[
H^{\omega}_k = \frac{\Gamma^{\omega}_k}{\min[\mathrm{Vol}^{\omega}_k, \mathrm{Vol}^{\hat{\omega}}_k]}, \qquad (6)
\]

where

\[
\mathrm{Vol}^{\hat{\omega}}_k = \sum_{\omega=1}^{n}\sum_{i \in V_k} s^k_{i,\omega}\deg(i) - \mathrm{Vol}^{\omega}_k \qquad (7)
\]

is the volume of the complement of the cluster indexed $\omega$. Again, the cluster Cheeger numbers are ordered to form a spectral feature vector $B_k = (H^1_k, H^2_k, \ldots, H^n_k)^T$.

3.2 Binary Features

In addition to the unary cluster features, we have studied pairwise cluster attributes.

Shared Perimeter: The first pairwise cluster attribute studied is the shared perimeter of each pair of clusters. For the pair of subgraphs $S$ and $T$, the shared perimeter is the set of edges $P(S,T) = \{(u,v) \mid u \in S \wedge v \in T\}$. Hence, our cluster-based measure of shared perimeter for the clusters indexed $u$ and $v$ is

\[
U_k(u,v) = \frac{\sum_{(i,j) \in E_k} s^k_{i,u}\, s^k_{j,v}\, A_k(i,j)}{\sum_{(i',j') \in E_k} s^k_{i',u}\, s^k_{j',v}}. \qquad (8)
\]

Each graph is represented by a shared perimeter matrix $U_k$. We convert these matrices into long vectors by stacking the columns of the matrix $U_k$ in eigenvalue order. The resulting vector is $B_k = (U_k(1,1), U_k(1,2), \ldots, U_k(1,n), U_k(2,1), \ldots, U_k(2,n), \ldots, U_k(n,n))^T$. Each entry in the long vector corresponds to a different pair of spectral clusters.

Cluster Distances: The between-cluster distance is defined as the path length, i.e. the minimum number of edges, between the most significant nodes in a pair of clusters. The most significant node in a cluster is the one having the largest coefficient in the eigenvector associated with the cluster. For the cluster indexed $u$ in the graph indexed $k$, the most significant node is $i^k_u = \arg\max_i s^k_{iu}$. To compute the distance, we note that if we multiply the adjacency matrix $A_k$ by itself $l$ times, then the matrix $(A_k)^l$ represents the distribution of paths of length $l$ in the graph $G_k$. In particular, the element $(A_k)^l(i,j)$ is the number of paths of length $l$ edges between the nodes $i$ and $j$. Hence the minimum distance between the most significant nodes of the clusters $u$ and $v$ is $d_{u,v} = \min\{\, l \mid (A_k)^l(i^k_u, i^k_v) > 0 \,\}$. If we only use the first $n$ leading eigenvectors to describe the graphs, the between-cluster distances for each graph can be written as an $n \times n$ matrix, which can be converted to an $n^2$ long vector $B_k = (d_{1,1}, d_{1,2}, \ldots, d_{1,n}, d_{2,1}, \ldots, d_{n,n})^T$.
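For concreteness, the remaining cluster attributes can be sketched in the same NumPy setting as the earlier snippets. The reading of equation (8) used below (normalising by the membership product over all node pairs) and the helper names are assumptions of ours; the paper's exact normalisation is not reproduced verbatim.

```python
import numpy as np

def cheeger_constants(A, s):
    """Cluster Cheeger constants, eqs. (5)-(7), using unnormalised volumes and perimeters."""
    deg = A.sum(axis=1)
    vol = s.T @ deg                              # weighted volume of each modal cluster
    vol_hat = vol.sum() - vol                    # volume of the complement, eq. (7)
    per = np.array([np.sum(np.outer(s[:, w], 1.0 - s[:, w]) * A)
                    for w in range(s.shape[1])])
    return per / np.minimum(vol, vol_hat)        # eq. (6)

def shared_perimeter(A, s):
    """Shared perimeter matrix U_k, eq. (8); normalisation over all node pairs assumed."""
    num = s.T @ A @ s                                    # sum_{i,j} s_{i,u} s_{j,v} A(i,j)
    den = np.outer(s.sum(axis=0), s.sum(axis=0))         # sum_{i,j} s_{i,u} s_{j,v}
    return num / den

def cluster_distances(A, s):
    """Path lengths between the most significant nodes of each pair of clusters."""
    n, num_nodes = s.shape[1], A.shape[0]
    sig = s.argmax(axis=0)                       # most significant node per cluster
    D = np.full((n, n), np.inf)
    np.fill_diagonal(D, 0.0)
    P = np.eye(num_nodes)
    for length in range(1, num_nodes):
        P = P @ A                                # P = A^length counts paths of that length
        reachable = P[np.ix_(sig, sig)] > 0
        D[np.isinf(D) & reachable] = length      # smallest length with a connecting path
    return D
```

The long vectors B_k are then obtained by flattening U_k and D in eigenvalue order, e.g. with `.flatten()`.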

4 Embedding the Spectral Vectors in a Pattern Space

In this section we describe two methods for embedding graphs in eigenspaces. The first of these involves performing principal components analysis on the covariance matrices for the spectral pattern vectors. The second method involves performing multidimensional scaling on a set of pairwise distances between vectors.

4.1 Eigendecomposition of the Graph Representation Matrices

Our first method makes use of principal components analysis and follows the parametric eigenspace idea of Murase and Nayar []. The relational data for each graph is vectorised in the way outlined in Section 3. The $N$ different graph vectors are arranged in view order as the columns of the matrix $S = [B_1 \mid B_2 \mid \ldots \mid B_k \mid \ldots \mid B_N]$. Next, we compute the covariance matrix for the elements in the different rows of the matrix $S$. This is found by taking the matrix product $C = SS^T$. We extract the principal component directions for the relational data by performing an eigendecomposition on the covariance matrix $C$. The eigenvalues $\lambda_i$ are found by solving the eigenvalue equation $|C - \lambda I| = 0$ and the corresponding eigenvectors $e_i$ are found by solving the eigenvector equation $C e_i = \lambda_i e_i$. We use the first three leading eigenvectors to represent the graphs extracted from the images. The co-ordinate system of the eigenspace is spanned by the three orthogonal vectors $E = (e_1, e_2, e_3)$. The individual graphs, represented by the long vectors $B_k$, $k = 1, 2, \ldots, N$, can be projected onto this eigenspace using the formula $x_k = E^T B_k$. Hence each graph $G_k$ is represented by a 3-component vector $x_k$ in the eigenspace.
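Section 4.1 is a standard parametric-eigenspace projection. A compact NumPy sketch is shown below; as in the text, the covariance-style matrix is formed without mean-centring (conventional PCA would subtract the row means first), and the function name is our own.

```python
import numpy as np

def pca_embed(feature_vectors, dim=3):
    """Parametric-eigenspace projection of Section 4.1.

    feature_vectors: list of N spectral pattern vectors B_k, all of length K.
    Returns an (N, dim) array whose k-th row is x_k = E^T B_k.
    """
    S = np.column_stack(feature_vectors)       # K x N data matrix S = [B_1 | ... | B_N]
    C = S @ S.T                                # C = S S^T, as in the text (no mean-centring)
    evals, evecs = np.linalg.eigh(C)
    E = evecs[:, np.argsort(-evals)[:dim]]     # leading eigenvectors e_1, ..., e_dim
    return S.T @ E                             # row k holds the projection of B_k
```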

4.2 Multidimensional Scaling

Multidimensional scaling (MDS) [] is a procedure which allows data specified in terms of a matrix of pairwise distances to be embedded in a Euclidean space. The classical multidimensional scaling method was proposed by Torgerson [] and Gower []. Shepard and Kruskal developed a different scaling technique called ordinal scaling []. Here we intend to use the method to embed the graphs extracted from different viewpoints in a low-dimensional space.

To commence, we require pairwise distances between graphs. We do this by computing the L2 norms between the spectral pattern vectors for the graphs. For the graphs indexed $i_1$ and $i_2$, the distance is

\[
d_{i_1,i_2} = \sqrt{\sum_{\alpha=1}^{K}\left[B_{i_1}(\alpha) - B_{i_2}(\alpha)\right]^2}, \qquad (9)
\]

where $K$ is the length of the spectral pattern vectors. The pairwise dissimilarities $d_{i_1,i_2}$ are used as the elements of an $N \times N$ dissimilarity matrix $D$, whose elements are defined as follows:

\[
D_{i_1,i_2} = \begin{cases} d_{i_1,i_2} & \text{if } i_1 \neq i_2 \\ 0 & \text{if } i_1 = i_2. \end{cases} \qquad (10)
\]

In this paper, we use the classical multidimensional scaling method to embed the view-graphs in a Euclidean space using the matrix of pairwise dissimilarities $D$. The first step of MDS is to calculate a matrix $T$ whose element with row $r$ and column $c$ is given by

\[
T_{rc} = -\tfrac{1}{2}\left[d_{rc}^2 - \hat{d}_{r.}^2 - \hat{d}_{.c}^2 + \hat{d}_{..}^2\right],
\]

where $\hat{d}_{r.} = \frac{1}{N}\sum_{c=1}^{N} d_{rc}$ is the average dissimilarity value over the $r$th row, $\hat{d}_{.c}$ is the similarly defined average value over the $c$th column, and $\hat{d}_{..} = \frac{1}{N^2}\sum_{r=1}^{N}\sum_{c=1}^{N} d_{r,c}$ is the average dissimilarity value over all rows and columns of the dissimilarity matrix.

We subject the matrix $T$ to an eigenvector analysis to obtain a matrix of embedding co-ordinates $X$. If the rank of $T$ is $k$, $k \leq N$, then we will have $k$ non-zero eigenvalues. We arrange these $k$ non-zero eigenvalues in descending order, i.e. $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_k > 0$. The corresponding ordered eigenvectors are denoted by $e_i$, where $\lambda_i$ is the $i$th eigenvalue. The embedding co-ordinate system for the graphs obtained from different views is $X = [f_1, f_2, \ldots, f_k]$, where $f_i = \sqrt{\lambda_i}\, e_i$ are the scaled eigenvectors. For the graph indexed $i$, the embedded vector of co-ordinates is $x_i = (X_{i,1}, X_{i,2}, X_{i,3})^T$.
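The double-centring of T and the scaling by the square roots of the eigenvalues are the standard classical MDS recipe, which can be written compactly with the centring matrix $J = I - \frac{1}{N}\mathbf{1}\mathbf{1}^T$. The sketch below is illustrative; the function name and the three-dimensional truncation are our choices.

```python
import numpy as np

def mds_embed(D, dim=3):
    """Classical MDS embedding of Section 4.2.

    D: N x N symmetric matrix of pairwise distances with zero diagonal.
    Returns an (N, dim) array whose i-th row is the embedded vector x_i.
    """
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N        # centring matrix
    T = -0.5 * J @ (D ** 2) @ J                # double-centred matrix T
    evals, evecs = np.linalg.eigh(T)
    order = np.argsort(-evals)[:dim]           # leading (non-zero) eigenvalues
    lam = np.clip(evals[order], 0.0, None)     # guard against small negative values
    return evecs[:, order] * np.sqrt(lam)      # columns f_i = sqrt(lambda_i) e_i
```

For an N x K array `B` of spectral vectors, the distance matrix of equation (9) can be formed as `D = np.sqrt(((B[:, None, :] - B[None, :, :]) ** 2).sum(-1))`.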

5 Experiments

Our experiments have been conducted with 2D image sequences of 3D objects which undergo slowly varying changes in viewer angle. The image sequences for three different model houses are shown in Figure 1. For each object in the view sequence, we extract corner features. From the extracted corner points we construct Delaunay graphs. The sequences of extracted graphs are shown in Figure 2. Hence, for each object, we have a sequence of different graphs. In Table 1 we list the number of feature points in each of the views. From inspection of the graphs in Figure 2 and the number of feature points in Table 1, it is clear that the different graphs for the same object undergo significant changes in structure as the viewing direction changes. Hence, this data presents a challenging graph clustering problem. Our aim is to investigate which combination of spectral feature vector and embedding strategy gives the best set of graph clusters. In other words, we aim to see which method gives the best definition of clusters for the different objects.

Table 1. Number of feature points extracted from the three image sequences (rows: CMU, MOVI and Chalet; columns: image number).

In Figure 3 we compare the results obtained with the different spectral feature vectors. In the centre column of the figure, we show the matrix of pairwise Euclidean distances between the feature vectors for the different graphs (this is best viewed in colour). The matrix has one row and one column for each of the images in the three sequences, with the three sequences concatenated, and the images are ordered according to their position in the sequence. From top to bottom, the different rows show the results obtained when the feature vectors are constructed using the eigenvalues of the adjacency matrix, the cluster volumes, the cluster perimeters, the cluster Cheeger constants, the shared perimeter lengths and the inter-cluster edge distances. From the pattern of pairwise distances, it is clear that the eigenvalues and the shared perimeter length give the best block structure in the matrix. Hence these two attributes may be expected to result in the best clusters. To test this assertion, in the left-most and right-most columns of Figure 3 we show the leading eigenvectors of the embedding spaces for the spectral feature vectors. The left-hand column shows the results obtained with principal components analysis. The right-hand column shows the results obtained with multidimensional scaling. From the plots, it is clear that the best clusters are obtained when MDS is applied to the vectors of eigenvalues and shared perimeter lengths. Principal components analysis, on the other hand, does not give a space in which there is a clear cluster structure.

We now embark on a more quantitative analysis of the different spectral representations. To do this we plot the normalised squared eigenvalues $\hat{\lambda}_i = \lambda_i^2 / \sum_{i=1}^{n}\lambda_i^2$ against the eigenvalue magnitude order $i$.

Fig. 1. Image sequences

Fig. 2. Graph representation of the sequences

In the case of the parametric eigenspace, these normalised squared eigenvalues represent the fraction of the total data variance residing in the direction of the relevant eigenvector. In the case of multidimensional scaling, they represent the variance of the inter-graph distances in the directions of the eigenvectors of the similarity matrix. The first two plots are for the case of the parametric eigenspaces. The left-hand plot of Figure 4 is for the unary attributes (eigenvalues, cluster volumes, cluster perimeters and Cheeger constants), while the middle plot is for the pairwise attributes (shared perimeters and cluster distances). The main feature to note is that, of the unary features, the vector of adjacency matrix eigenvalues has the fastest rate of decay, i.e. the eigenspace has a lower latent dimensionality, while the vector of Cheeger constants has the slowest rate of decay, i.e. the eigenspace has greater dimensionality. In the case of the binary attributes, the shared perimeter results in the eigenspace of lower dimensionality. In the last plot of Figure 4 we show the eigenvalues of the graph similarity matrix. We repeat the sequence of plots for the three house data-sets, but merge the curves for the unary and binary attributes into a single plot. Again the vector of adjacency matrix eigenvalues gives the space of lower dimensionality, while the vector of inter-cluster distances gives the space of greatest dimensionality.

Finally, we compare the performances of the graph embedding methods using measures of their classification accuracy. Each of the six graph-spectral features mentioned above is used. We have assigned the graphs to classes using the K-means classifier. The classifier has been applied to the raw Euclidean distances, and to the distances in the reduced-dimension feature spaces obtained using PCA and MDS. In Table 2 we list the number of correctly classified graphs. From the table, it is clear that the eigenvalues and the shared perimeters are the best features, since they return the highest correct classification rates. Cluster distance is the worst feature for clustering graphs. We note also that classification in the feature space produced by PCA is better than in the original feature-vector spaces. However, the best results come from the MDS-embedded spaces.
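A sketch of this classification experiment is given below, assuming scikit-learn's KMeans and a majority-vote mapping from clusters to object labels; neither the implementation nor the cluster-to-class assignment rule is specified in the paper, so both are our assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_accuracy(X, labels, n_classes=3, seed=0):
    """Cluster the embedded graph vectors and score them against the true object labels.

    X: (N, d) array of co-ordinates (raw spectral vectors, PCA space or MDS space).
    labels: (N,) array of ground-truth object indices (0, 1, 2, ...).
    Each K-means cluster is mapped to the object label most common inside it;
    this majority-vote convention is our assumption, not taken from the paper.
    """
    assignments = KMeans(n_clusters=n_classes, n_init=10, random_state=seed).fit_predict(X)
    correct = 0
    for c in range(n_classes):
        members = labels[assignments == c]
        if members.size:
            correct += np.sum(members == np.argmax(np.bincount(members)))
    return correct / labels.size
```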

Fig. 3. Eigenspace and MDS space embedding using the spectral features of binary adjacency graph spectra, cluster volumes, cluster perimeters, cluster Cheeger constants, shared perimeters and cluster distances

6 Conclusions

In this paper we have investigated how vectors of graph-spectral attributes can be used for the purposes of clustering graphs. The attributes studied are the leading eigenvalues, and the volumes, perimeters, shared perimeters and Cheeger numbers for modal clusters. The best clusters emerge when we apply MDS to the vectors of leading eigenvalues. The best clusters also result when we use cluster volume or shared perimeter.

Fig. 4. Comparison of graph spectral features for eigenspaces. The left plot is for the unary features (eigenvalues, cluster volumes, cluster perimeters and Cheeger constants) in eigenspace, the middle plot is for the binary features (shared perimeters and cluster distances) in eigenspace, and the right plot is for all the spectral features in MDS space. Each plot shows the normalised squared eigenvalues against the eigenvalue index.

Table 2. Correct classifications for the six spectral features (eigenvalues, volumes, perimeters, Cheeger constants, shared perimeters and cluster distances) using the raw feature vectors, the PCA embedding and the MDS embedding.

Hence, we have shown how to cluster purely symbolic graphs using simple spectral attributes. The graphs studied in our analysis are of different size, and we do not need to locate correspondences. Our future plans involve studying in more detail the structure of the pattern spaces resulting from our spectral features. Here we intend to investigate the use of ICA as an alternative to PCA as a means of embedding the graphs in a pattern space. We also intend to study how support vector machines and the EM algorithm can be used to learn the structure of the pattern spaces. Finally, we intend to investigate whether the spectral attributes studied here can be used for the purposes of organising large image databases.

References

1. H. Bunke. Error correcting graph matching: On the influence of the underlying cost function. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21:917-922, 1999.
2. F. R. K. Chung. Spectral Graph Theory. American Mathematical Society, CBMS series 92, 1997.
3. C. Chatfield and A. J. Collins. Introduction to Multivariate Analysis. Chapman & Hall, 1980.
4. R. Englert and R. Glantz. Towards the clustering of graphs. In 2nd IAPR-TC-15 Workshop on Graph-Based Representations, 1999.
5. J. B. Kruskal. Nonmetric multidimensional scaling: A numerical method. Psychometrika, 29:115-129, 1964.

6. J. C. Gower. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika, 53:325-338, 1966.
7. B. Luo, A. Robles-Kelly, A. Torsello, R. C. Wilson, and E. R. Hancock. Clustering shock trees. In Proceedings of GbR, 2001.
8. H. Murase and S. K. Nayar. Illumination planning for object recognition using parametric eigenspaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(12):1219-1227, 1994.
9. S. Rizzi. Genetic operators for hierarchical graph clustering. Pattern Recognition Letters, 19:1293-1300, 1998.
10. J. Segen. Learning graph models of shape. In Proceedings of the Fifth International Conference on Machine Learning, 1988.
11. K. Sengupta and K. L. Boyer. Organizing large structural modelbases. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(4), April 1995.
12. W. S. Torgerson. Multidimensional scaling: I. Theory and method. Psychometrika, 17:401-419, 1952.