Finding normalized and modularity cuts by spectral clustering. Ljubljana 2010, October


Finding normalized and modularity cuts by spectral clustering. Marianna Bolla, Institute of Mathematics, Budapest University of Technology and Economics, marib@math.bme.hu. Ljubljana, October 2010

Outline

- Find community structure in networks: the network is an antiexpander, but the clusters are good expanders. How many clusters? Penalize clusters of extremely different sizes/volumes.
- Minimum multiway cut problems: ratio cut and normalized cut, based on Laplacian and normalized Laplacian spectra. Communities with sparse between-cluster (and dense within-cluster) connections.
- Modularity cuts: spectral decomposition of the modularity and normalized modularity matrix. Communities with more within-cluster (and fewer between-cluster) connections than expected under independence.


Notation

$G = (V, \mathbf{W})$: edge-weighted graph on $n$ vertices; $\mathbf{W}$: $n \times n$ symmetric matrix, $w_{ij} \ge 0$, $w_{ii} = 0$ ($w_{ij}$: similarity between vertices $i$ and $j$). Simple graph: 0/1 weights.

W.l.o.g. $\sum_{i=1}^n \sum_{j=1}^n w_{ij} = 1$: a joint distribution, with marginal entries
$$d_i = \sum_{j=1}^n w_{ij}, \quad i = 1, \dots, n$$
(generalized vertex degrees), $D = \mathrm{diag}(d_1, \dots, d_n)$.

Laplacian: $L = D - W$. Normalized Laplacian: $L_D = I - D^{-1/2} W D^{-1/2}$.
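
To make the notation concrete, here is a minimal numpy sketch (the 4-vertex similarity matrix is hypothetical, chosen only for illustration) that builds $d$, $D$, $L$ and $L_D$ exactly as defined above:

```python
import numpy as np

# Hypothetical toy similarity matrix: symmetric, zero diagonal.
W = np.array([[0., 2., 1., 0.],
              [2., 0., 1., 0.],
              [1., 1., 0., 3.],
              [0., 0., 3., 0.]])
W /= W.sum()                                  # w.l.o.g. sum_{i,j} w_ij = 1

d = W.sum(axis=1)                             # generalized degrees d_i
D = np.diag(d)
L = D - W                                     # Laplacian L = D - W
D_isqrt = np.diag(1.0 / np.sqrt(d))
L_D = np.eye(len(d)) - D_isqrt @ W @ D_isqrt  # normalized Laplacian
```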

For $1 \le k \le n$, $P_k = (V_1, \dots, V_k)$: $k$-partition of the vertices; $V_1, \dots, V_k$: disjoint, non-empty vertex subsets (clusters); $\mathcal{P}_k$: the set of all $k$-partitions.

$e(V_a, V_b) = \sum_{i \in V_a} \sum_{j \in V_b} w_{ij}$: weighted cut between $V_a$ and $V_b$.

$\mathrm{Vol}(V_a) = \sum_{i \in V_a} d_i$: volume of $V_a$.

Ratio cut of $P_k = (V_1, \dots, V_k)$ given $W$:
$$g(P_k, W) = \sum_{a=1}^{k-1} \sum_{b=a+1}^{k} \left( \frac{1}{|V_a|} + \frac{1}{|V_b|} \right) e(V_a, V_b) = \sum_{a=1}^{k} \frac{e(V_a, \bar{V}_a)}{|V_a|}$$

Normalized cut of $P_k = (V_1, \dots, V_k)$ given $W$:
$$f(P_k, W) = \sum_{a=1}^{k-1} \sum_{b=a+1}^{k} \left( \frac{1}{\mathrm{Vol}(V_a)} + \frac{1}{\mathrm{Vol}(V_b)} \right) e(V_a, V_b) = \sum_{a=1}^{k} \frac{e(V_a, \bar{V}_a)}{\mathrm{Vol}(V_a)} = k - \sum_{a=1}^{k} \frac{e(V_a, V_a)}{\mathrm{Vol}(V_a)}$$

Minimum $k$-way ratio cut and normalized cut of $G = (V, W)$:
$$g_k(G) = \min_{P_k \in \mathcal{P}_k} g(P_k, W) \quad \text{and} \quad f_k(G) = \min_{P_k \in \mathcal{P}_k} f(P_k, W)$$
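
Both cut functionals are straightforward to evaluate for a given partition; a small numpy sketch (assuming `W` is a symmetric similarity matrix and `labels` an integer cluster-label array, as in the toy example above):

```python
import numpy as np

def ratio_cut(W, labels):
    """g(P_k, W) = sum_a e(V_a, complement of V_a) / |V_a|."""
    return sum(W[labels == a][:, labels != a].sum() / (labels == a).sum()
               for a in np.unique(labels))

def normalized_cut(W, labels):
    """f(P_k, W) = sum_a e(V_a, complement of V_a) / Vol(V_a)."""
    d = W.sum(axis=1)
    return sum(W[labels == a][:, labels != a].sum() / d[labels == a].sum()
               for a in np.unique(labels))

# e.g. with the toy W above: normalized_cut(W, np.array([0, 0, 1, 1]))
```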

The k-means algorithm

The problem: given points $\mathbf{x}_1, \dots, \mathbf{x}_n \in \mathbb{R}^d$ and an integer $1 \le k \le n$, find the $k$-partition of the index set $\{1, \dots, n\}$ (or equivalently, the clustering of the points into $k$ disjoint non-empty subsets) which minimizes the following $k$-variance:
$$S_k^2(\mathbf{x}_1, \dots, \mathbf{x}_n) = \min_{P_k \in \mathcal{P}_k} S_k^2(P_k; \mathbf{x}_1, \dots, \mathbf{x}_n) = \min_{P_k = (V_1, \dots, V_k)} \sum_{a=1}^{k} \sum_{j \in V_a} \| \mathbf{x}_j - \mathbf{c}_a \|^2, \qquad \mathbf{c}_a = \frac{1}{|V_a|} \sum_{j \in V_a} \mathbf{x}_j.$$

Usually $d \le k \ll n$.

Finding the global minimum is NP-complete, but iterating the k-means algorithm, first described in MacQueen (1963), finds a local minimum in polynomial time. If there exists a well-separated $k$-clustering of the points (even the largest within-cluster distance is smaller than the smallest between-cluster one), convergence of the algorithm to the global minimum was proved by Dunn (1973-74), given a convenient starting configuration. Under relaxed conditions, the speed of the algorithm is increased by a filtration in Kanungo et al. (2002). The algorithm runs faster as the separation between the clusters increases, and an overall running time of $O(kn)$ can be guaranteed.
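
For concreteness, a minimal sketch of the Lloyd-type iteration (not MacQueen's sequential variant, and without Kanungo's filtering speed-up); it converges to a local minimum of the $k$-variance:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd-type k-means iteration: converges to a local minimum of S_k^2."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # assignment step: nearest center in squared Euclidean distance
        dist2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist2.argmin(axis=1)
        # update step: centers become the cluster means (skip emptied clusters)
        for a in range(k):
            if (labels == a).any():
                centers[a] = X[labels == a].mean(axis=0)
    return labels, centers
```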

Sometimes the points $\mathbf{x}_1, \dots, \mathbf{x}_n$ are endowed with positive weights $d_1, \dots, d_n$; w.l.o.g. $\sum_{i=1}^n d_i = 1$. Weighted $k$-variance of the points:
$$\tilde{S}_k^2(\mathbf{x}_1, \dots, \mathbf{x}_n) = \min_{P_k \in \mathcal{P}_k} \tilde{S}_k^2(P_k; \mathbf{x}_1, \dots, \mathbf{x}_n) = \min_{P_k = (V_1, \dots, V_k)} \sum_{a=1}^{k} \sum_{j \in V_a} d_j \| \mathbf{x}_j - \mathbf{c}_a \|^2, \qquad \mathbf{c}_a = \frac{1}{\sum_{j \in V_a} d_j} \sum_{j \in V_a} d_j \mathbf{x}_j.$$

E.g., $d_1, \dots, d_n$ is a discrete probability distribution and a random vector takes on the values $\mathbf{x}_1, \dots, \mathbf{x}_n$ with these probabilities, as in a MANOVA (Multivariate Analysis of Variance) setup. The above algorithms are easily adapted to this situation.

Partition matrices

$Z_k = (z_1, \dots, z_k)$: $n \times k$ balanced partition matrix; $k$-partition vector $z_a = (z_{1a}, \dots, z_{na})^T$, where $z_{ia} = 1/\sqrt{|V_a|}$ if $i \in V_a$, and 0 otherwise. $Z_k$ is suborthogonal: $Z_k^T Z_k = I_k$.

The ratio cut of the $k$-partition $P_k$ given $W$:
$$g(P_k, W) = \mathrm{tr}\, Z_k^T L Z_k = \sum_{a=1}^{k} z_a^T L z_a. \tag{1}$$

We want to minimize it over balanced $k$-partition matrices $Z_k \in \mathcal{Z}_k^B$.
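
Identity (1) is easy to verify numerically; a sketch that builds $Z_k$ and checks $g(P_k, W) = \mathrm{tr}\, Z_k^T L Z_k$ against the `ratio_cut` helper above:

```python
import numpy as np

def balanced_partition_matrix(labels, k):
    """Z_k with z_ia = 1/sqrt(|V_a|) for i in V_a, 0 otherwise; Z_k^T Z_k = I_k."""
    Z = np.zeros((len(labels), k))
    for a in range(k):
        members = labels == a
        Z[members, a] = 1.0 / np.sqrt(members.sum())
    return Z

# Identity (1), with W, L and ratio_cut from the sketches above:
#   labels = np.array([0, 0, 1, 1])
#   Z = balanced_partition_matrix(labels, 2)
#   np.allclose(np.trace(Z.T @ L @ Z), ratio_cut(W, labels))   # -> True
```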

Estimation by Laplacian eigenvalues

$G$ is connected; the spectrum of $L$: $0 = \lambda_1 < \lambda_2 \le \dots \le \lambda_n$, with unit-norm, pairwise orthogonal eigenvectors $u_1, u_2, \dots, u_n$, $u_1 = \frac{1}{\sqrt{n}}\,\mathbf{1}$.

The discrete problem is relaxed to a continuous one: $r_1, \dots, r_n \in \mathbb{R}^k$ are representatives of the vertices, $X = (r_1, \dots, r_n)^T = (x_1, \dots, x_k)$:
$$\min_{X^T X = I_k} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} w_{ij} \|r_i - r_j\|^2 = \min_{X^T X = I_k} \mathrm{tr}\, X^T L X = \sum_{i=1}^{k} \lambda_i,$$
and equality is attained with $X = (u_1, \dots, u_k)$.

$$g_k(G) = \min_{Z_k \in \mathcal{Z}_k^B} \mathrm{tr}\, Z_k^T L Z_k \ge \sum_{i=1}^{k} \lambda_i \tag{2}$$
and equality can be attained only in the trivial case $k = 1$; otherwise the eigenvectors $u_i$ ($i = 2, \dots, k$) cannot be partition vectors, since their coordinates sum to 0 because of orthogonality to the vector $u_1 \propto \mathbf{1}$.

Optimum choice of $k$?
$$\mathrm{tr}\, Z_k^T L Z_k = \sum_{i=1}^{n} \lambda_i \sum_{a=1}^{k} (u_i^T z_a)^2. \tag{3}$$
This sum is the smallest possible if the largest $(u_i^T z_a)^2$ terms correspond to eigenvectors belonging to the smallest eigenvalues. Thus the sum is decreased the most by keeping only the $k$ smallest eigenvalues in the inner summation; then the corresponding eigenvectors are close to the subspace $F_k = \mathrm{Span}\{z_1, \dots, z_k\}$.

With
$$g_{k,k}(Z_k, L) := \sum_{i=1}^{k} \lambda_i \sum_{a=1}^{k} (u_i^T z_a)^2,$$
we maximize $g_{k,k}(Z_k, L)$ over $\mathcal{Z}_k^B$ for given $L$. The vectors $\sqrt{\lambda_i}\, u_i$ are projected onto the subspace $F_k$:
$$\sqrt{\lambda_i}\, u_i = \sum_{a=1}^{k} \sqrt{\lambda_i}\, (u_i^T z_a)\, z_a + \mathrm{ort}_{F_k}(\sqrt{\lambda_i}\, u_i), \quad i = 1, \dots, k.$$
As $\sqrt{\lambda_1}\, u_1 = \mathbf{0}$, there is no use in projecting it. By the Pythagorean equality:
$$\lambda_i = \|\sqrt{\lambda_i}\, u_i\|^2 = \sum_{a=1}^{k} \lambda_i (u_i^T z_a)^2 + \mathrm{dist}^2(\sqrt{\lambda_i}\, u_i, F_k), \quad i = 1, \dots, k.$$

Let
$$X_k' = (\sqrt{\lambda_1}\, u_1, \dots, \sqrt{\lambda_k}\, u_k) = (r_1', \dots, r_n')^T, \qquad S_k^2(X_k') = \sum_{i=1}^{k} \mathrm{dist}^2(\sqrt{\lambda_i}\, u_i, F_k).$$
Then
$$\sum_{i=1}^{k} \lambda_i = g_{k,k}(Z_k, L) + S_k^2(P_k, X_k'), \tag{4}$$
where the partition matrix $Z_k$ corresponds to the $k$-partition $P_k$.

We are looking for the $k$-partition maximizing the first term. In view of (4), increasing $g_{k,k}(Z_k, L)$ can be achieved by decreasing $S_k^2(X_k')$; the latter is obtained by applying the k-means algorithm with $k$ clusters to the $k$-dimensional representatives $r_1', \dots, r_n'$. As the first column of $X_k'$ is $\mathbf{0}$, this is equivalent to applying the k-means algorithm with $k$ clusters to the $(k-1)$-dimensional representatives that are the row vectors of the $n \times (k-1)$ matrix obtained from $X_k'$ by deleting its first column.

If $\lambda_k \ll \lambda_{k+1}$: we gain the most by omitting the $n - k$ largest eigenvalues.
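
Putting the pieces together, a sketch of the resulting ratio-cut heuristic (scikit-learn's `KMeans` is assumed to be available; any k-means implementation, such as the one sketched earlier, would do):

```python
import numpy as np
from sklearn.cluster import KMeans   # assumed available

def spectral_ratio_cut_clusters(W, k):
    """Relaxation of the ratio cut: representatives sqrt(lambda_i) u_i for
    i = 2, ..., k (the first column is identically 0 and is dropped),
    then k-means with k clusters, as on the slides."""
    d = W.sum(axis=1)
    L = np.diag(d) - W
    lam, U = np.linalg.eigh(L)                      # ascending eigenvalues of L
    X = U[:, 1:k] * np.sqrt(np.maximum(lam[1:k], 0.0))
    return KMeans(n_clusters=k, n_init=10).fit_predict(X)
```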

Perturbation results

$W = W_w + W_b$: within- and between-cluster edges w.r.t. $P_k$; $D = D_w + D_b$ and $L = L_w + L_b$.

$\rho$: the smallest positive eigenvalue of $L_w$; $\varepsilon$: the largest eigenvalue of $L_b$. Suppose $\varepsilon(P_k) < \rho(P_k)$.

$\rho = \min_i \rho_i$ is large if the clusters are good expanders.

$\varepsilon \le 2 \max_{i \in \{1, \dots, n\}} \sum_{j:\, c(j) \ne c(i)} w_{ij}$, where $c(i)$ is the cluster membership of vertex $i$. It is small if from each vertex only a few small-weight edges emanate to clusters different from the vertex's own cluster.

We proved (Bolla, Tusnády, Discrete Math. 1994) that
$$S_k^2(X_k) = \min_{P_k \in \mathcal{P}_k} S_k^2(P_k, X_k) \le k \min_{P_k \in \mathcal{P}_k} \frac{\varepsilon(P_k)}{\rho(P_k)}.$$

By Weyl's perturbation theorem,
$$0 < \lambda_k \le \varepsilon < \rho \le \lambda_{k+1} \le \rho + \varepsilon \tag{5}$$
for the $\varepsilon$ and $\rho$ of any $k$-partition with $\varepsilon < \rho$. Consequently,
$$\frac{\lambda_k}{\lambda_{k+1}} \le \min_{P_k \in \mathcal{P}_k} \frac{\varepsilon(P_k)}{\rho(P_k)}. \tag{6}$$

Theorem. If for a given $k$-partition $\frac{\varepsilon(P_k)}{\rho(P_k)} < 1$, then
$$S_k^2(P_k, X_k) \le k\, \frac{[\varepsilon(P_k)]^2}{[\rho(P_k) - \varepsilon(P_k)]^2},$$
where $X_k = (u_1, \dots, u_k)$.

Theorem.
$$S_k^2(P_k, X_k') \le \frac{[\varepsilon(P_k)]^2}{[\rho(P_k) - \varepsilon(P_k)]^2} \sum_{i=1}^{k} \lambda_i,$$
where $X_k' = (\sqrt{\lambda_1}\, u_1, \dots, \sqrt{\lambda_k}\, u_k)$. Consequently,
$$S_k^2(X_k') \le \left( \sum_{i=1}^{k} \lambda_i \right) \min_{P_k \in \mathcal{P}_k} \frac{[\varepsilon(P_k)]^2}{[\rho(P_k) - \varepsilon(P_k)]^2}.$$

Minimizing the normalized cut

$n \times k$ normalized partition matrix: $Z_k = (z_1, \dots, z_k)$, $z_a = (z_{1a}, \dots, z_{na})^T$, where $z_{ia} = 1/\sqrt{\mathrm{Vol}(V_a)}$ if $i \in V_a$, and 0 otherwise.

The normalized cut of the $k$-partition $P_k$ given $W$:
$$f(P_k, W) = \mathrm{tr}\, Z_k^T L Z_k = \mathrm{tr}\,(D^{1/2} Z_k)^T L_D (D^{1/2} Z_k) \tag{7}$$
or equivalently,
$$f(P_k, W) = k - \mathrm{tr}\,(D^{1/2} Z_k)^T (D^{-1/2} W D^{-1/2})(D^{1/2} Z_k).$$

We want to minimize $f(P_k, W)$ over the $k$-partitions. This is equivalent to maximizing $\mathrm{tr}\, Z_k^T W Z_k$ over normalized $k$-partition matrices $Z_k \in \mathcal{Z}_k^N$.

Normalized Laplacian eigenvalues

$G$ is connected; $0 = \lambda_1 < \lambda_2 \le \dots \le \lambda_p < 1 \le \lambda_{p+1} \le \dots \le \lambda_n \le 2$: eigenvalues of $L_D$, with corresponding unit-norm, pairwise orthogonal eigenvectors $u_1, \dots, u_n$, $u_1 = (\sqrt{d_1}, \dots, \sqrt{d_n})^T$.

Continuous relaxation: $X = (r_1, \dots, r_n)^T = (x_1, \dots, x_k)$:
$$\min_{X^T D X = I_k} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} w_{ij} \|r_i - r_j\|^2 = \min_{X^T D X = I_k} \mathrm{tr}\, X^T L X = \sum_{i=1}^{k} \lambda_i,$$
and the minimum is attained with $x_i = D^{-1/2} u_i$ ($i = 1, \dots, k$). In particular, $x_1 = \mathbf{1}$.

$$f_k(G) = \min_{Z_k \in \mathcal{Z}_k^N} \mathrm{tr}\, Z_k^T L Z_k \ge \sum_{i=1}^{k} \lambda_i$$
and equality can be attained only in the trivial case $k = 1$; otherwise the transformed eigenvectors $D^{-1/2} u_i$ ($i = 2, \dots, k$) cannot be partition vectors, since their coordinates, weighted by the $d_j$'s, sum to 0 due to orthogonality to $u_1$. Equivalently,
$$\max_{Z_k \in \mathcal{Z}_k^N} \mathrm{tr}\, Z_k^T W Z_k \le k - \sum_{i=1}^{k} \lambda_i = \sum_{i=1}^{k} (1 - \lambda_i).$$

Optimum choice of $k$

$$\mathrm{tr}\,(Z_k^T W Z_k) = \sum_{i=1}^{n} (1 - \lambda_i) \sum_{a=1}^{k} [u_i^T (D^{1/2} z_a)]^2$$
is increased if we neglect the terms belonging to the eigenvalues that are at least 1; hence the outer summation stops at $p$. The inner sum is the largest in the $k = p$ case, when the unit-norm, pairwise orthogonal vectors $D^{1/2} z_1, \dots, D^{1/2} z_p$ are close to the orthonormal eigenvectors $u_1, \dots, u_p$, respectively. Let
$$F_p := \mathrm{Span}\{D^{1/2} z_1, \dots, D^{1/2} z_p\},$$
$$f_{p,p}(Z_p, W) := \sum_{i=1}^{p} (1 - \lambda_i) \sum_{a=1}^{p} [u_i^T (D^{1/2} z_a)]^2,$$
$$\mathrm{tr}\,(D^{1/2} Z_p)^T (D^{-1/2} W D^{-1/2})(D^{1/2} Z_p) \le f_{p,p}(Z_p, W).$$

For given $W$, we maximize $f_{p,p}(Z_p, W)$ over $\mathcal{Z}_p^N$. The vectors $\sqrt{1 - \lambda_i}\, u_i$ are projected onto the subspace $F_p$:
$$\sqrt{1 - \lambda_i}\, u_i = \sum_{a=1}^{p} [(\sqrt{1 - \lambda_i}\, u_i)^T D^{1/2} z_a]\, D^{1/2} z_a + \mathrm{ort}_{F_p}(\sqrt{1 - \lambda_i}\, u_i), \quad i = 1, \dots, p.$$
As $\sqrt{1 - \lambda_1}\, u_1 = u_1$ is in $F_p$, its orthogonal component is $\mathbf{0}$. By the Pythagorean equality:
$$1 - \lambda_i = \sum_{a=1}^{p} [(\sqrt{1 - \lambda_i}\, u_i)^T D^{1/2} z_a]^2 + \mathrm{dist}^2(\sqrt{1 - \lambda_i}\, u_i, F_p), \quad i = 1, \dots, p.$$

Representation

$$X_p = (r_1, \dots, r_n)^T = (\sqrt{1 - \lambda_1}\, D^{-1/2} u_1, \dots, \sqrt{1 - \lambda_p}\, D^{-1/2} u_p)$$
$$\mathrm{dist}^2(\sqrt{1 - \lambda_i}\, u_i, F_p) = \sum_{j=1}^{n} d_j (r_{ji} - c_{ji})^2, \quad i = 1, \dots, p,$$
where $r_{ji}$ is the $i$th coordinate of the vector $r_j$ and $c_{ji}$ is the same for the vector $c_j \in \mathbb{R}^p$; further, there are at most $p$ distinct ones among the centers $c_1, \dots, c_n$ assigned to the vertex representatives. Namely,
$$c_{ji} = \frac{1}{\mathrm{Vol}(V_a)} \sum_{l \in V_a} d_l\, r_{li}, \quad j \in V_a, \quad i = 1, \dots, p.$$
$$\tilde{S}_p^2(P_p, X_p) = \sum_{i=1}^{p} \mathrm{dist}^2(\sqrt{1 - \lambda_i}\, u_i, F_p) = \sum_{j=1}^{n} d_j \|r_j - c_j\|^2.$$

$$\sum_{i=1}^{p} (1 - \lambda_i) = f_{p,p}(Z_p, W) + \tilde{S}_p^2(P_p, X_p).$$

We are looking for the $p$-partition maximizing the first term. In view of the above formula, increasing $f_{p,p}(Z_p, W)$ can be achieved by decreasing $\tilde{S}_p^2(X_p)$; the latter is obtained by applying the k-means algorithm with $p$ clusters to the $p$-dimensional representatives $r_1, \dots, r_n$, with respective weights $d_1, \dots, d_n$. As the first column of $X_p$ is $\mathbf{1}$, this is equivalent to applying the k-means algorithm with $p$ clusters to the $(p-1)$-dimensional representatives that are the row vectors of the $n \times (p-1)$ matrix obtained from $X_p$ by deleting its first column.
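
A sketch of the corresponding normalized-cut heuristic (scikit-learn's `KMeans` is assumed, and its `sample_weight` argument carries the degrees $d_1, \dots, d_n$; the number of clusters is passed in by the caller, who is expected to choose it by the spectral gap, so here `k` plays the role of $p$):

```python
import numpy as np
from sklearn.cluster import KMeans   # assumed available

def normalized_cut_clusters(W, k):
    """Weighted k-means step of the slides: representatives
    sqrt(1 - lambda_i) D^{-1/2} u_i for i = 2, ..., k (the constant first
    column is dropped), clustered with weights d_1, ..., d_n."""
    W = W / W.sum()
    d = W.sum(axis=1)
    D_isqrt = np.diag(1.0 / np.sqrt(d))
    L_D = np.eye(len(d)) - D_isqrt @ W @ D_isqrt
    lam, U = np.linalg.eigh(L_D)                      # ascending eigenvalues
    X = (D_isqrt @ U[:, 1:k]) * np.sqrt(np.maximum(1.0 - lam[1:k], 0.0))
    return KMeans(n_clusters=k, n_init=10).fit_predict(X, sample_weight=d)
```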

Similarly, for $d < k \le p$:
$$\sum_{i=1}^{d} (1 - \lambda_i) = \sum_{i=1}^{d} \sum_{a=1}^{k} [(\sqrt{1 - \lambda_i}\, u_i)^T D^{1/2} z_a]^2 + \sum_{i=1}^{d} \mathrm{dist}^2(\sqrt{1 - \lambda_i}\, u_i, F_k) =: f_{k,d}(Z_k, W) + \tilde{S}_k^2(P_k, X_d),$$
$$X_d = (\sqrt{1 - \lambda_1}\, D^{-1/2} u_1, \dots, \sqrt{1 - \lambda_d}\, D^{-1/2} u_d).$$

In the presence of a spectral gap between $\lambda_d$ and $\lambda_{d+1} < 1$, neither $\sum_{i=1}^{d} (1 - \lambda_i)$ nor $\tilde{S}_k^2(X_d)$ is increased significantly by introducing one more eigenvalue-eigenvector pair (that is, by using $(d+1)$-dimensional representatives instead of $d$-dimensional ones). Consequently, $f_{k,d}(Z_k, W)$ would not change much, and $k = d$ clusters based on $d$-dimensional representatives suffice. Increasing the number of clusters ($k+1, \dots, p$) decreases the sum of the inner variances, $\tilde{S}_{k+1}^2(X_d) \le \tilde{S}_k^2(X_d)$, but not significantly.

In the $k = p$ special case,
$$\mathrm{tr}\, Z_p^T L Z_p \ge p - f_{p,p}(Z_p, W) = \sum_{i=1}^{p} \lambda_i + \tilde{S}_p^2(P_p, X_p),$$
and the same holds for the minima:
$$f_p(G) \ge \sum_{i=1}^{p} \lambda_i + \tilde{S}_p^2(X_p),$$
a sharper lower estimate for the minimum normalized $p$-way cut than $\sum_{i=1}^{p} \lambda_i$.

Spectral gap and variance

In Bolla, Tusnády, Discrete Math., 1994:

Theorem. In the representation $X_2 = (D^{-1/2} u_1, D^{-1/2} u_2) = (\mathbf{1}, D^{-1/2} u_2)$:
$$\tilde{S}_2^2(X_2) \le \frac{\lambda_2}{\lambda_3}$$

Similarly:

Theorem. In the representation $X_2' = (\sqrt{1 - \lambda_1}\, D^{-1/2} u_1, \sqrt{1 - \lambda_2}\, D^{-1/2} u_2) = (\mathbf{1}, \sqrt{1 - \lambda_2}\, D^{-1/2} u_2)$:
$$\tilde{S}_2^2(X_2') \le \frac{\lambda_2 (1 - \lambda_2)}{\lambda_3}$$

Can this be generalized to $k > 2$?

Isoperimetric number

Definition. The Cheeger constant of the weighted graph $G = (V, W)$ is
$$h(G) = \min_{\substack{U \subset V \\ \mathrm{Vol}(U) \le 1/2}} \frac{e(U, \bar{U})}{\mathrm{Vol}(U)}$$

Theorem (Bolla, Molnár-Sáska, Discrete Math. 2004). Let $\lambda_2$ be the smallest positive eigenvalue of $L_D$. Then
$$\frac{\lambda_2}{2} \le h(G) \le \min\{1, \sqrt{2\lambda_2}\}.$$
If $\lambda_2 \le 1$, then the upper estimate can be improved to $h(G) \le \sqrt{\lambda_2 (2 - \lambda_2)}$.
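
For tiny graphs the Cheeger constant can be computed by brute force, which makes the theorem easy to check numerically; a sketch (exponential in $n$, for sanity checks only):

```python
import numpy as np
from itertools import combinations

def cheeger_constant(W):
    """Brute-force h(G) = min over U with Vol(U) <= Vol(V)/2 of e(U, U-bar)/Vol(U)."""
    n = len(W)
    d = W.sum(axis=1)
    half = d.sum() / 2.0           # = 1/2 under the slides' normalization
    best = np.inf
    for r in range(1, n):
        for U in map(list, combinations(range(n), r)):
            vol = d[U].sum()
            if vol <= half:
                Ubar = [i for i in range(n) if i not in U]
                best = min(best, W[np.ix_(U, Ubar)].sum() / vol)
    return best

# with L_D from the notation sketch: lam2 = np.linalg.eigvalsh(L_D)[1]
# then lam2 / 2 <= cheeger_constant(W) <= np.sqrt(2 * lam2)
```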

$f_2(G)$ is the symmetric version of $h(G)$: $h(G) \le f_2(G) \le 2h(G)$, hence $f_2(G) \le 2\sqrt{\lambda_2(2 - \lambda_2)}$ for $\lambda_2 \le 1$.

$h(G)$ and $f_2(G)$ were estimated by Mohar, J. Comb. Theory B, 1989 for simple graphs (a similar formula, with the maximum degree in place of the volumes).

Theorem. Suppose $G = (V, W)$ is connected, and the $\lambda_i$'s are the eigenvalues of $L_D$. Then $\sum_{i=1}^{k} \lambda_i \le f_k(G)$; and if the optimal $k$-dimensional representatives can be classified into $k$ well-separated clusters in such a way that the maximum cluster diameter $\varepsilon$ satisfies $\varepsilon \le \min\{1/\sqrt{2k}, \sqrt{2}\, \min_i \sqrt{\mathrm{Vol}(V_i)}\}$, with the $k$-partition $(V_1, \dots, V_k)$ induced by these clusters, then
$$f_k(G) \le c^2 \sum_{i=1}^{k} \lambda_i,$$
where $c = 1 + \varepsilon c' / (\sqrt{2} - \varepsilon c')$ and $c' = 1 / \min_i \sqrt{\mathrm{Vol}(V_i)}$.

The Newman-Girvan modularity for simple graphs

$G = (V, A)$: simple graph, $|V| = n$, $A$: 0/1 adjacency matrix. Number of edges: $e = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}$; $d_i = \sum_{j=1}^{n} a_{ij}$: the degree of vertex $i$.

$h_{ij} := \frac{d_i d_j}{2e}$: expected number of $i$-$j$ edges under random attachment.

$P_k = (V_1, \dots, V_k)$: $k$-partition of the vertices (modules).

Newman and Girvan, Phys. Rev. E, 2004 introduced a modularity that takes large values for stronger-than-expected intra-module connections.

Definition. The Newman-Girvan modularity of the $k$-partition $P_k$ given $A$:
$$Q_{NG}(P_k, A) = \frac{1}{2e} \sum_{a=1}^{k} \sum_{i,j \in V_a} (a_{ij} - h_{ij})$$

Obviously, $Q_{NG}$ takes values less than 1, and since $\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} = \sum_{i=1}^{n} \sum_{j=1}^{n} h_{ij}$, it is 0 for $k = 1$ (no community structure at all); therefore only the integers $k \in [2, n]$ are of interest. For given $A$ and $k$, we are looking for $\max_{P_k \in \mathcal{P}_k} Q_{NG}(P_k, A)$, more precisely for the optimum $k$-partition attaining the arg max; and eventually for the optimum $k$ too (there may be several local maxima).
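
A direct numpy transcription of the definition (assuming `A` is a symmetric 0/1 adjacency matrix and `labels` an integer cluster-label array):

```python
import numpy as np

def newman_girvan_modularity(A, labels):
    """Q_NG(P_k, A) = (1/2e) * sum_a sum_{i,j in V_a} (a_ij - d_i d_j / (2e))."""
    two_e = A.sum()                         # = 2e for a symmetric 0/1 matrix
    d = A.sum(axis=1)
    B = A - np.outer(d, d) / two_e          # entries a_ij - h_ij
    return sum(B[np.ix_(labels == a, labels == a)].sum()
               for a in np.unique(labels)) / two_e
```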

$H = (h_{ij})$: a matrix of rank 1. $B = A - H$: modularity matrix. The row sums of $B$ are zero, therefore $B$ always has a 0 eigenvalue with the trivial eigenvector $\mathbf{1} = (1, 1, \dots, 1)^T$. If $G$ is the complete graph, then $B$ is negative semidefinite; but typically it is indefinite.

$\beta_1 \ge \beta_2 \ge \dots \ge \beta_n$: eigenvalues of $B$, with unit-norm, pairwise orthogonal eigenvectors $u_1, \dots, u_n$. Newman, Phys. Rev. E, 2006 uses the eigenvectors belonging to the positive eigenvalues of $B$.

Under the null hypothesis, vertices $i$ and $j$ are connected to each other independently, with probabilities proportional (actually, because of the normalizing condition, equal) to the product of their generalized degrees. For edge-weighted graphs $G = (V, W)$, w.l.o.g. $\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} = 1$, we suppose $h_{ij} = d_i d_j$, $i, j = 1, \dots, n$.

Definition. The Newman-Girvan modularity of $P_k$ given $W$:
$$Q(P_k, W) = \sum_{a=1}^{k} \sum_{i,j \in V_a} (w_{ij} - h_{ij}) = \sum_{a=1}^{k} [e(V_a, V_a) - \mathrm{Vol}^2(V_a)]$$

For given $k$ we maximize $Q(P_k, W)$ over $\mathcal{P}_k$. This task is equivalent to minimizing $\sum_{a \ne b} \sum_{i \in V_a,\, j \in V_b} (w_{ij} - h_{ij})$.

We want to penalize partitions with clusters of extremely different sizes or volumes.

Definition. Balanced Newman-Girvan modularity of $P_k$ given $W$:
$$Q_B(P_k, W) = \sum_{a=1}^{k} \frac{1}{|V_a|} \sum_{i,j \in V_a} (w_{ij} - h_{ij}) = \sum_{a=1}^{k} \left[ \frac{e(V_a, V_a)}{|V_a|} - \frac{\mathrm{Vol}^2(V_a)}{|V_a|} \right]$$

Definition. Normalized Newman-Girvan modularity of $P_k$ given $W$:
$$Q_N(P_k, W) = \sum_{a=1}^{k} \frac{1}{\mathrm{Vol}(V_a)} \sum_{i,j \in V_a} (w_{ij} - h_{ij}) = \sum_{a=1}^{k} \frac{e(V_a, V_a)}{\mathrm{Vol}(V_a)} - 1$$

Maximizing the normalized Newman-Girvan modularity over $\mathcal{P}_k$ is equivalent to minimizing the normalized cut.

Maximizing the balanced Newman-Girvan modularity

$B = W - H$: modularity matrix. Spectrum: $\beta_1 \ge \dots \ge \beta_p > 0 = \beta_{p+1} \ge \dots \ge \beta_n$. Unit-norm, pairwise orthogonal eigenvectors: $u_1, \dots, u_n$, $u_{p+1} = \mathbf{1}/\sqrt{n}$.
$$Q_B(P_k, W) = Q_B(Z_k, B) = \sum_{a=1}^{k} z_a^T B z_a = \mathrm{tr}\, Z_k^T B Z_k.$$
We want to maximize $\mathrm{tr}\, Z_k^T B Z_k$ over balanced $k$-partition matrices $Z_k \in \mathcal{Z}_k^B$.

Continuous relaxation:
$$\max_{Y^T Y = I_k} \mathrm{tr}(Y^T B Y) = \max_{y_a^T y_b = \delta_{ab}} \sum_{a=1}^{k} y_a^T B y_a = \sum_{a=1}^{k} \beta_a,$$
and equality is attained when $y_1, \dots, y_k$ are eigenvectors of $B$ corresponding to $\beta_1, \dots, \beta_k$. Though the vectors themselves are not necessarily unique (e.g., in the case of multiple eigenvalues), the subspace $\mathrm{Span}\{y_1, \dots, y_k\}$ is unique if $\beta_k > \beta_{k+1}$.
$$\max_{Z_k \in \mathcal{Z}_k^B} Q_B(Z_k, B) \le \sum_{a=1}^{k} \beta_a \le \sum_{a=1}^{p+1} \beta_a = \sum_{a=1}^{p} \beta_a. \tag{8}$$
Both inequalities can hold with equality only in the case $k = 1$, $p = 0$, when the underlying graph is the complete graph: $A = \mathbf{1}\mathbf{1}^T - I$, $W = \frac{1}{n(n-1)} A$. In this case there is only one cluster, with a partition vector of equal coordinates (the balanced eigenvector belonging to the single 0 eigenvalue).

The maximum in (8) with respect to $k$ is attained with the choice $k = p + 1$.
$$Q_B(Z_k, B) = \mathrm{tr}\, Z_k^T B Z_k = \sum_{a=1}^{k} z_a^T \left( \sum_{i=1}^{n} \beta_i u_i u_i^T \right) z_a = \sum_{i=1}^{n} \beta_i \sum_{a=1}^{k} (u_i^T z_a)^2.$$
We can increase the last sum by neglecting the terms belonging to the negative eigenvalues; hence the outer summation stops at $p$, or equivalently at $p + 1$. In this case the inner sum is the largest when $k = p + 1$ and the partition vectors $z_1, \dots, z_{p+1}$ are close to the eigenvectors $u_1, \dots, u_{p+1}$, respectively. As both systems consist of orthonormal sets of vectors, the two subspaces spanned by them should be close to each other. The subspace $F_{p+1} = \mathrm{Span}\{z_1, \dots, z_{p+1}\}$ consists of stepwise constant vectors with $p + 1$ steps; therefore $u_{p+1} \in F_{p+1}$, and it suffices to process only the first $p$ eigenvectors.

$$Q_B(Z_{p+1}, B) \le Q_{p+1,p}(Z_{p+1}, B) := \sum_{i=1}^{p} \beta_i \sum_{a=1}^{p+1} (u_i^T z_a)^2,$$
and in the sequel, for given $B$, we maximize $Q_{p+1,p}(Z_{p+1}, B)$ over $\mathcal{Z}_{p+1}^B$. Project the vectors $\sqrt{\beta_i}\, u_i$ onto the subspace $F_{p+1}$:
$$\sqrt{\beta_i}\, u_i = \sum_{a=1}^{p+1} [(\sqrt{\beta_i}\, u_i)^T z_a]\, z_a + \mathrm{ort}_{F_{p+1}}(\sqrt{\beta_i}\, u_i), \quad i = 1, \dots, p. \tag{9}$$
$$\beta_i = \|\sqrt{\beta_i}\, u_i\|^2 = \sum_{a=1}^{p+1} [(\sqrt{\beta_i}\, u_i)^T z_a]^2 + \mathrm{dist}^2(\sqrt{\beta_i}\, u_i, F_{p+1}), \quad i = 1, \dots, p.$$

$$\sum_{i=1}^{p} \beta_i = \sum_{i=1}^{p} \sum_{a=1}^{p+1} [(\sqrt{\beta_i}\, u_i)^T z_a]^2 + \sum_{i=1}^{p} \mathrm{dist}^2(\sqrt{\beta_i}\, u_i, F_{p+1}) = Q_{p+1,p}(Z_{p+1}, B) + S_{p+1}^2(X_p),$$
where the rows of $X_p = (\sqrt{\beta_1}\, u_1, \dots, \sqrt{\beta_p}\, u_p)$ are regarded as $p$-dimensional representatives of the vertices. We could as well take $(p+1)$-dimensional representatives, as the last coordinates are zeros, and hence $S_{p+1}^2(X_p) = S_{p+1}^2(X_{p+1})$. Maximizing $Q_{p+1,p}$ is equivalent to minimizing $S_{p+1}^2(X_p)$, which can be achieved by applying the k-means algorithm to the $p$-dimensional representatives with $p + 1$ clusters.
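
A sketch of the whole recipe for the balanced modularity (scikit-learn's `KMeans` is assumed; the eigenvalue threshold `tol` is an arbitrary numerical tolerance, and at least one positive eigenvalue is assumed):

```python
import numpy as np
from sklearn.cluster import KMeans   # assumed available

def balanced_modularity_clusters(W, tol=1e-9):
    """The slides' recipe: take the p positive eigenvalues of the modularity
    matrix B = W - dd^T, form X_p = (sqrt(beta_1) u_1, ..., sqrt(beta_p) u_p),
    and run k-means with p + 1 clusters on its rows."""
    W = W / W.sum()                        # normalization sum_{i,j} w_ij = 1
    d = W.sum(axis=1)
    B = W - np.outer(d, d)                 # modularity matrix, h_ij = d_i d_j
    beta, U = np.linalg.eigh(B)            # ascending order
    pos = beta > tol                       # the p positive eigenvalues
    X = U[:, pos] * np.sqrt(beta[pos])     # p-dimensional representatives
    return KMeans(n_clusters=int(pos.sum()) + 1, n_init=10).fit_predict(X)
```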

More generally, if there is a gap between $\beta_k$ and $\beta_{k+1} > 0$, then we look for $k + 1$ clusters based on $k$-dimensional representatives of the vertices; the $k$ leading eigenvectors are projected onto the $k$-dimensional subspace of $F_{k+1}$ orthogonal to $\mathbf{1}$. Calculating eigenvectors is costly; the Lánczos method performs well if we calculate only the eigenvectors belonging to some leading eigenvalues followed by a spectral gap. Some authors suggest using as many eigenvectors as possible. In fact, using more eigenvectors (up to $p$) is better from the point of view of accuracy, but using fewer eigenvectors (up to a gap in the positive part of the spectrum) is better from the computational point of view. We have to compromise.

Normalized modularity matrix

$B_D = D^{-1/2} B D^{-1/2}$
$$Q_N(P_k, W) = Q_N(Z_k, B) = \sum_{a=1}^{k} z_a^T B z_a = \mathrm{tr}\,(D^{1/2} Z_k)^T B_D (D^{1/2} Z_k)$$

$1 \ge \beta_1' \ge \dots \ge \beta_n' \ge -1$: spectrum of $B_D$; $u_1', \dots, u_n'$: unit-norm, pairwise orthogonal eigenvectors, $u_1' = (\sqrt{d_1}, \dots, \sqrt{d_n})^T$.

$p$: the number of positive eigenvalues of $B_D$ (this $p$ does not necessarily coincide with that of $B$).

$$\max_{Z_k \in \mathcal{Z}_k^N} Q_N(Z_k, B) \le \sum_{a=1}^{k} \beta_a' \le \sum_{a=1}^{p+1} \beta_a' = \sum_{a=1}^{p} \beta_a'.$$
$$Q_N(Z_k, B) = \sum_{i=1}^{n} \beta_i' \sum_{a=1}^{k} [(u_i')^T (D^{1/2} z_a)]^2.$$
We can increase this sum by neglecting the terms belonging to the negative eigenvalues; hence the outer summation stops at $p$, or equivalently at $p + 1$. The inner sum is the largest in the $k = p + 1$ case, when the unit-norm, pairwise orthogonal vectors $D^{1/2} z_1, \dots, D^{1/2} z_{p+1}$ are close to the eigenvectors $u_1', \dots, u_{p+1}'$, respectively. In fact, the two subspaces spanned by them should be close to each other.

$F_{p+1} = \mathrm{Span}\{D^{1/2} z_1, \dots, D^{1/2} z_{p+1}\}$: not stepwise constant vectors!
$$X_p = (\sqrt{\beta_1'}\, D^{-1/2} u_1', \dots, \sqrt{\beta_p'}\, D^{-1/2} u_p')$$
$$\tilde{S}_{p+1}^2(P_k, X_p) = \sum_{i=1}^{p} \mathrm{dist}^2(\sqrt{\beta_i'}\, u_i', F_{p+1}) = \sum_{j=1}^{n} d_j \|x_j - c_j\|^2 \to \min,$$
which is minimized by the weighted k-means algorithm; the number of eigenvectors kept is again guided by the spectral gap.
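
The normalized variant differs only in the matrix and in the degree weights; a sketch under the same assumptions as before:

```python
import numpy as np
from sklearn.cluster import KMeans   # assumed available

def normalized_modularity_clusters(W, tol=1e-9):
    """Normalized variant: positive eigenvalues of B_D = D^{-1/2} B D^{-1/2},
    representatives sqrt(beta'_i) D^{-1/2} u'_i, then weighted k-means with
    p + 1 clusters and weights d_1, ..., d_n."""
    W = W / W.sum()
    d = W.sum(axis=1)
    D_isqrt = np.diag(1.0 / np.sqrt(d))
    B_D = D_isqrt @ (W - np.outer(d, d)) @ D_isqrt
    beta, U = np.linalg.eigh(B_D)
    pos = beta > tol                       # assumes p >= 1 positive eigenvalues
    X = (D_isqrt @ U[:, pos]) * np.sqrt(beta[pos])
    km = KMeans(n_clusters=int(pos.sum()) + 1, n_init=10)
    return km.fit_predict(X, sample_weight=d)
```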

Pure community structure

$G$: disjoint union of $k$ complete graphs on $n_1, \dots, n_k$ vertices.

Eigenvalues of the modularity matrix:
- $k - 1$ positive $\beta_i$'s ($> 1$); in the $n_1 = \dots = n_k$ special case it is one multiple eigenvalue
- 0: single eigenvalue
- $-1$: eigenvalue with multiplicity $n - k$

Eigenvalues of the normalized modularity matrix:
- 1: with multiplicity $k - 1$
- 0: single eigenvalue
- $n - k$ negative eigenvalues in $(-1, 0)$; in the $n_1 = \dots = n_k$ special case there is one negative eigenvalue with multiplicity $n - k$

Piecewise constant eigenvectors belonging to the positive eigenvalues $\Rightarrow$ $k$ clusters.
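
These spectral claims are easy to check numerically; a sketch for the modularity matrix of a disjoint union of complete graphs (scipy's `block_diag` is assumed available; the sizes are arbitrary toy values):

```python
import numpy as np
from scipy.linalg import block_diag   # assumed available

# Disjoint union of k = 3 complete graphs: check the claimed spectrum of B.
sizes = [4, 5, 6]
A = block_diag(*[np.ones((m, m)) - np.eye(m) for m in sizes])
d = A.sum(axis=1)
B = A - np.outer(d, d) / A.sum()      # h_ij = d_i d_j / (2e)
beta = np.sort(np.linalg.eigvalsh(B))[::-1]
# expect: k - 1 = 2 positive eigenvalues, one zero, and -1 with multiplicity n - k
print(np.round(beta, 4))
```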

Pure anticommunity structure

$G$: complete $k$-partite graph on $n_1, \dots, n_k$ vertices.

Eigenvalues of the modularity matrix:
- $k - 1$ negative eigenvalues
- 0: all the other eigenvalues

Eigenvalues of the normalized modularity matrix:
- $k - 1$ negative eigenvalues in $(-1, 0)$
- 0: all the other eigenvalues

MINIMIZE THE MODULARITY! Piecewise constant eigenvectors belonging to the negative eigenvalues $\Rightarrow$ $k$ clusters.

Block matrices

Definition. The $n \times n$ symmetric real matrix $B$ is a blown-up matrix if there is a $k \times k$ symmetric so-called pattern matrix $P$ with entries $0 \le p_{ij} \le 1$, and there are positive integers $n_1, \dots, n_k$ with $\sum_{i=1}^{k} n_i = n$, such that, after rearranging its rows and columns, the matrix $B$ can be divided into $k \times k$ blocks, where block $(i, j)$ is an $n_i \times n_j$ matrix with entries all equal to $p_{ij}$ ($1 \le i, j \le k$).

$G = (V, B)$ is a weighted graph with possible loops. Generalized random graph: $G$ is a random graph whose edges exist independently, with probabilities $p_{ij}$. Modularity matrix: $k - 1$ structural (large absolute value) eigenvalues $\Rightarrow$ $k$ clusters by the corresponding eigenvectors.

Pure community structure: $p_{ii} = 1$, $p_{ij} = 0$ ($i \ne j$). Pure anticommunity structure: $p_{ii} = 0$, $p_{ij} = 1$ ($i \ne j$).
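
A sketch of a blown-up matrix and of sampling a generalized random graph from it (the helper names are mine, not the talk's):

```python
import numpy as np

def blow_up(P, sizes):
    """Blown-up matrix: block (i, j) is an n_i x n_j block with entries p_ij."""
    return np.block([[np.full((m, n), P[i, j]) for j, n in enumerate(sizes)]
                     for i, m in enumerate(sizes)])

def generalized_random_graph(P, sizes, seed=0):
    """Edges exist independently with probabilities p_ij; sampled on the
    upper triangle and symmetrized, with no loops."""
    rng = np.random.default_rng(seed)
    prob = blow_up(np.asarray(P, dtype=float), sizes)
    A = np.triu((rng.random(prob.shape) < prob).astype(float), k=1)
    return A + A.T

# e.g. pure community structure: generalized_random_graph(np.eye(3), [30, 40, 50])
```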

Eigenvalues of blown-up matrices and their Laplacians

GENERAL CASE: the $p_{ij}$'s are arbitrary ($P$ nonsingular); $\Delta := \mathrm{diag}(n_1, \dots, n_k)$ with $n_i \ge cn$ ($i = 1, \dots, k$). $\mathrm{rank}(B) = k$; the non-zero eigenvalues $\lambda_i = \Theta(n)$ have stepwise constant eigenvectors, and they are the eigenvalues of $\Delta^{1/2} P \Delta^{1/2}$.

Laplacian eigenvalues:
- 0: single eigenvalue
- $\lambda_1, \dots, \lambda_{k-1} = \Theta(n)$ with stepwise constant eigenvectors
- $\gamma_i = \Theta(n)$ with multiplicity $n_i - 1$; eigenspace: vectors with coordinates 0 except in block $i$, where the coordinates sum to 0 ($i = 1, \dots, k$)

Normalized Laplacian eigenvalues: there exists $\delta > 0$ (independent of $n$) such that there are $k$ eigenvalues in $[0, 1 - \delta] \cup [1 + \delta, 2]$ with stepwise constant eigenvectors; all the other eigenvalues are 1.

COMPLETE $k$-PARTITE GRAPH: $p_{ii} = 0$, $p_{ij} = 1$ ($i \ne j$).

Non-zero eigenvalues of $B$: $\lambda_1, \dots, \lambda_k = \Theta(n)$ with stepwise constant eigenvectors; $\lambda_1, \dots, \lambda_k \in [-\max_i n_i, -\min_i n_i] \cup [n - \max_i n_i, n - \min_i n_i]$.

Laplacian eigenvalues:
- 0: single eigenvalue
- $\lambda_1 = \dots = \lambda_{k-1} = n$ with stepwise constant eigenvectors
- $\gamma_i = n - n_i$ with multiplicity $n_i - 1$; eigenspace: vectors with coordinates 0 except in block $i$, where the coordinates sum to 0 ($i = 1, \dots, k$)

Normalized Laplacian eigenvalues: there exists $\delta > 0$ (independent of $n$) such that there are $k - 1$ eigenvalues in $[1 + \delta, 2]$ with stepwise constant eigenvectors; 0 is a single eigenvalue, and all the other eigenvalues are 1.

$k$ DISJOINT CLUSTERS: $A = \bigoplus_{i=1}^{k} A_i$ block-diagonal; diagonal/off-diagonal entries of the $n_i \times n_i$ matrix $A_i$: $\nu_i$ / $\mu_i$.

Large eigenvalues of $A$: $\lambda_i = (n_i - 1)\mu_i + \nu_i = \Theta(n)$ with stepwise constant eigenvectors. Small eigenvalues of $A$: $\nu_i - \mu_i$ with multiplicity $n_i - 1$ ($i = 1, \dots, k$).

Laplacian eigenvalues:
- 0: with multiplicity $k$; eigenspace: stepwise constant vectors
- $n_i \mu_i$ with multiplicity $n_i - 1$ ($i = 1, \dots, k$)

Normalized Laplacian eigenvalues:
- 0: with multiplicity $k$
- $\frac{n_i \mu_i}{\nu_i + (n_i - 1)\mu_i}$ with multiplicity $n_i - 1$ ($i = 1, \dots, k$)

Union of $k$ complete graphs ($\nu_i = 0$): the non-zero eigenvalues are $\frac{n_i}{n_i - 1} > 1$.