Finding normalized and modularity cuts by spectral clustering. Ljubljana, October 2010


1 Finding normalized and modularity cuts by spectral clustering Marianna Bolla, Institute of Mathematics, Budapest University of Technology and Economics. Ljubljana, October 2010

2 Outline Find community structure in networks. Antiexpanders, but the clusters are good expanders. How many clusters? Penalize clusters of extremely different sizes/volumes. Minimum multiway cut problems: ratio cut and normalized cut, based on the Laplacian and normalized Laplacian spectra. Communities with sparse between-cluster (and dense within-cluster) connections. Modularity cuts: spectral decomposition of the modularity and normalized modularity matrix. Communities with more within-cluster (and fewer between-cluster) connections than expected under independence.


6 Notation $G = (V, W)$: edge-weighted graph on $n$ vertices; $W$: $n \times n$ symmetric matrix, $w_{ij} \ge 0$, $w_{ii} = 0$ ($w_{ij}$: similarity between vertices $i$ and $j$). Simple graph: 0/1 weights. W.l.o.g. $\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} = 1$: joint distribution with marginal entries $d_i = \sum_{j=1}^{n} w_{ij}$, $i = 1, \dots, n$ (generalized vertex degrees). $D = \mathrm{diag}(d_1, \dots, d_n)$. Laplacian: $L = D - W$. Normalized Laplacian: $L_D = I - D^{-1/2} W D^{-1/2}$.
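
The following short NumPy sketch (my own illustration, not part of the talk) builds the matrices of this slide for a toy weighted graph; the weights are arbitrary.

```python
import numpy as np

# A small edge-weighted graph on 4 vertices; the weights are only illustrative.
W = np.array([[0., .3, .2, 0.],
              [.3, 0., .1, 0.],
              [.2, .1, 0., .2],
              [0., 0., .2, 0.]])
W = W / W.sum()                      # w.l.o.g. the entries sum to 1

d = W.sum(axis=1)                    # generalized vertex degrees
D = np.diag(d)
L = D - W                            # Laplacian
D_isqrt = np.diag(1.0 / np.sqrt(d))
L_D = np.eye(len(d)) - D_isqrt @ W @ D_isqrt   # normalized Laplacian

print(np.linalg.eigvalsh(L)[0])      # smallest Laplacian eigenvalue: 0 (connected graph)
```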

7 $1 \le k \le n$. $P_k = (V_1, \dots, V_k)$: $k$-partition of the vertices; $V_1, \dots, V_k$: disjoint, non-empty vertex subsets (clusters); $\mathcal{P}_k$: the set of all $k$-partitions. $e(V_a, V_b) = \sum_{i \in V_a} \sum_{j \in V_b} w_{ij}$: weighted cut between $V_a$ and $V_b$. $\mathrm{Vol}(V_a) = \sum_{i \in V_a} d_i$: volume of $V_a$.

8 Ratio cut of $P_k = (V_1, \dots, V_k)$ given $W$: $g(P_k, W) = \sum_{a=1}^{k-1} \sum_{b=a+1}^{k} \left( \frac{1}{|V_a|} + \frac{1}{|V_b|} \right) e(V_a, V_b) = \sum_{a=1}^{k} \frac{e(V_a, \bar{V}_a)}{|V_a|}$. Normalized cut of $P_k = (V_1, \dots, V_k)$ given $W$: $f(P_k, W) = \sum_{a=1}^{k-1} \sum_{b=a+1}^{k} \left( \frac{1}{\mathrm{Vol}(V_a)} + \frac{1}{\mathrm{Vol}(V_b)} \right) e(V_a, V_b) = \sum_{a=1}^{k} \frac{e(V_a, \bar{V}_a)}{\mathrm{Vol}(V_a)} = k - \sum_{a=1}^{k} \frac{e(V_a, V_a)}{\mathrm{Vol}(V_a)}$. Minimum $k$-way ratio cut and normalized cut of $G = (V, W)$: $g_k(G) = \min_{P_k \in \mathcal{P}_k} g(P_k, W)$ and $f_k(G) = \min_{P_k \in \mathcal{P}_k} f(P_k, W)$.
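
A minimal sketch (my own, with an arbitrary example partition) of how $g(P_k, W)$ and $f(P_k, W)$ can be evaluated for an explicit $k$-partition:

```python
import numpy as np

def cut(W, A, B):
    """Weighted cut e(A, B) between two index lists."""
    return W[np.ix_(A, B)].sum()

def ratio_cut(W, parts):
    n = W.shape[0]
    return sum(cut(W, Va, [i for i in range(n) if i not in Va]) / len(Va)
               for Va in parts)

def normalized_cut(W, parts):
    d = W.sum(axis=1)
    n = W.shape[0]
    return sum(cut(W, Va, [i for i in range(n) if i not in Va]) / d[Va].sum()
               for Va in parts)

W = np.array([[0., .3, .2, 0.],
              [.3, 0., .1, 0.],
              [.2, .1, 0., .2],
              [0., 0., .2, 0.]])
W = W / W.sum()
print(ratio_cut(W, [[0, 1], [2, 3]]), normalized_cut(W, [[0, 1], [2, 3]]))
```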

9 The k-means algorithm The problem: given the points $x_1, \dots, x_n \in \mathbb{R}^d$ and an integer $1 \le k \le n$, find the $k$-partition of the index set $\{1, \dots, n\}$ (or equivalently, the clustering of the points into $k$ disjoint non-empty subsets) which minimizes the following $k$-variance: $S_k^2(x_1, \dots, x_n) = \min_{P_k \in \mathcal{P}_k} S_k^2(P_k, x_1, \dots, x_n) = \min_{P_k = (V_1, \dots, V_k)} \sum_{a=1}^{k} \sum_{j \in V_a} \| x_j - c_a \|^2$, where $c_a = \frac{1}{|V_a|} \sum_{j \in V_a} x_j$. Usually, $d \le k \ll n$.

10 Finding the global minimum is NP-complete, but the iteration of the k-means algorithm, first described in MacQueen (1963), is capable of finding a local minimum in polynomial time. If there exists a well-separated $k$-clustering of the points (even the largest within-cluster distance is smaller than the smallest between-cluster one), the convergence of the algorithm to the global minimum, with a convenient starting, was proved by Dunn ( ). Under relaxed conditions, the speed of the algorithm is increased by a filtration in Kanungo et al. (2002). The algorithm runs faster as the separation between the clusters increases, and an overall running time of $O(kn)$ can be guaranteed.

11 Sometimes the points $x_1, \dots, x_n$ are endowed with the positive weights $d_1, \dots, d_n$; w.l.o.g. $\sum_{i=1}^{n} d_i = 1$. Weighted $k$-variance of the points: $\tilde{S}_k^2(x_1, \dots, x_n) = \min_{P_k \in \mathcal{P}_k} \tilde{S}_k^2(P_k, x_1, \dots, x_n) = \min_{P_k = (V_1, \dots, V_k)} \sum_{a=1}^{k} \sum_{j \in V_a} d_j \| x_j - c_a \|^2$, where $c_a = \frac{1}{\sum_{j \in V_a} d_j} \sum_{j \in V_a} d_j x_j$. E.g., $d_1, \dots, d_n$ is a discrete probability distribution and a random vector takes on the values $x_1, \dots, x_n$ with these probabilities; e.g., in a MANOVA (Multivariate Analysis of Variance) setup. The above algorithms can easily be adapted to this situation.
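
A minimal Lloyd-type iteration for the weighted $k$-variance above (my own sketch; the talk only states that the usual k-means algorithms adapt to the weighted case):

```python
import numpy as np

def weighted_kmeans(X, d, k, n_iter=50, seed=0):
    """Heuristically minimize the weighted k-variance of the rows of X with weights d."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # assign every point to its nearest center
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        # recompute every center as the d-weighted mean of its cluster
        for a in range(k):
            mask = labels == a
            if mask.any():
                centers[a] = (d[mask, None] * X[mask]).sum(0) / d[mask].sum()
    S2 = (d * ((X - centers[labels]) ** 2).sum(1)).sum()   # weighted k-variance
    return labels, S2

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, .1, (20, 2)), rng.normal(3, .1, (20, 2))])
d = np.full(len(X), 1 / len(X))
print(weighted_kmeans(X, d, 2)[1])
```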

12 Partition matrices $n \times k$ balanced partition matrix: $Z_k = (z_1, \dots, z_k)$. $k$-partition vector: $z_a = (z_{1a}, \dots, z_{na})^T$, where $z_{ia} = \frac{1}{\sqrt{|V_a|}}$ if $i \in V_a$ and 0 otherwise. $Z_k$ is suborthogonal: $Z_k^T Z_k = I_k$. The ratio cut of the $k$-partition $P_k$ given $W$: $g(P_k, W) = \mathrm{tr}\, Z_k^T L Z_k = \sum_{a=1}^{k} z_a^T L z_a$. (1) We want to minimize it over balanced $k$-partition matrices $Z_k \in \mathcal{Z}_k^B$.

13 Estimation by Laplacian eigenvalues $G$ is connected; the spectrum of $L$: $0 = \lambda_1 < \lambda_2 \le \dots \le \lambda_n$, with unit-norm, pairwise orthogonal eigenvectors $u_1, u_2, \dots, u_n$, $u_1 = \frac{1}{\sqrt{n}} \mathbf{1}$. The discrete problem is relaxed to a continuous one: $r_1, \dots, r_n \in \mathbb{R}^k$ are representatives of the vertices, $X = (r_1, \dots, r_n)^T = (x_1, \dots, x_k)$, and $\min_{X^T X = I_k} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} w_{ij} \| r_i - r_j \|^2 = \min_{X^T X = I_k} \mathrm{tr}\, X^T L X = \sum_{i=1}^{k} \lambda_i$, and equality is attained with $X = (u_1, \dots, u_k)$.

14 $g_k(G) = \min_{Z_k \in \mathcal{Z}_k^B} \mathrm{tr}\, Z_k^T L Z_k \ge \sum_{i=1}^{k} \lambda_i$ (2), and equality can be attained only in the $k = 1$ trivial case; otherwise the eigenvectors $u_i$ ($i = 2, \dots, k$) cannot be partition vectors, since their coordinates sum to 0 because of the orthogonality to the $u_1 \propto \mathbf{1}$ vector. Optimum choice of $k$? $\mathrm{tr}\, Z_k^T L Z_k = \sum_{i=1}^{n} \lambda_i \sum_{a=1}^{k} (u_i^T z_a)^2$. (3) This sum is the smallest possible if the largest $(u_i^T z_a)^2$ terms correspond to eigenvectors belonging to the smallest eigenvalues. Thus, the above sum is the most decreased by keeping only the $k$ smallest eigenvalues in the inner summation, and the corresponding eigenvectors are close to the subspace $F_k = \mathrm{Span}\{z_1, \dots, z_k\}$.

15 Let $g_{k,k}(Z_k, L) := \sum_{i=1}^{k} \lambda_i \sum_{a=1}^{k} (u_i^T z_a)^2$; we maximize $g_{k,k}(Z_k, L)$ over $\mathcal{Z}_k^B$ for given $L$. The vectors $\sqrt{\lambda_i}\, u_i$ are projected onto the subspace $F_k$: $\sqrt{\lambda_i}\, u_i = \sum_{a=1}^{k} \sqrt{\lambda_i}\, (u_i^T z_a)\, z_a + \mathrm{ort}_{F_k}(\sqrt{\lambda_i}\, u_i)$, $i = 1, \dots, k$. As $\sqrt{\lambda_1}\, u_1 = 0$, there is no use in projecting it. By the Pythagorean equality: $\lambda_i = \| \sqrt{\lambda_i}\, u_i \|^2 = \sum_{a=1}^{k} \lambda_i (u_i^T z_a)^2 + \mathrm{dist}^2(\sqrt{\lambda_i}\, u_i, F_k)$, $i = 1, \dots, k$.

16 $X_k = (\sqrt{\lambda_1}\, u_1, \dots, \sqrt{\lambda_k}\, u_k) = (r_1, \dots, r_n)^T$. $S_k^2(X_k) = \sum_{i=1}^{k} \mathrm{dist}^2(\sqrt{\lambda_i}\, u_i, F_k)$. $\sum_{i=1}^{k} \lambda_i = g_{k,k}(Z_k, L) + S_k^2(P_k, X_k)$, (4) where the partition matrix $Z_k$ corresponds to the $k$-partition $P_k$.

17 We are looking for the $k$-partition maximizing the first term. In view of (4), increasing $g_{k,k}(Z_k, L)$ can be achieved by decreasing $S_k^2(X_k)$; the latter is obtained by applying the k-means algorithm with $k$ clusters to the $k$-dimensional representatives $r_1, \dots, r_n$. As the first column of $X_k$ is 0, it is equivalent to apply the k-means algorithm with $k$ clusters to the $(k-1)$-dimensional representatives that are the row vectors of the $n \times (k-1)$ matrix obtained from $X_k$ by deleting its first column. $\lambda_k \ll \lambda_{k+1}$: we gain the most by omitting the $n - k$ largest eigenvalues.
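
A compact sketch of the resulting ratio-cut spectral clustering recipe (my own reading of slides 13-17; scikit-learn's KMeans is assumed to be available and is used only as a convenient k-means implementation):

```python
import numpy as np
from sklearn.cluster import KMeans

def ratio_cut_spectral_clustering(W, k):
    """Cluster the vertices of the weighted graph W into k parts via Laplacian eigenvectors."""
    d = W.sum(axis=1)
    L = np.diag(d) - W
    lam, U = np.linalg.eigh(L)                   # eigenvalues in increasing order
    X = U[:, 1:k] * np.sqrt(lam[1:k])            # (k-1)-dimensional representatives
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
```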

18 Perturbation results $W = W_w + W_b$: within- and between-cluster edges w.r.t. $P_k$; $D = D_w + D_b$ and $L = L_w + L_b$. $\rho$: the smallest positive eigenvalue of $L_w$; $\varepsilon$: the largest eigenvalue of $L_b$. Suppose $\varepsilon(P_k) < \rho(P_k)$. $\rho = \min_i \rho_i$ is large if the clusters are good expanders. $\varepsilon \le 2 \max_{i \in \{1, \dots, n\}} \sum_{j:\, c(j) \ne c(i)} w_{ij}$, where $c(i)$ is the cluster membership of vertex $i$. It is small if from each vertex there are only few, small-weight edges emanating to clusters different from the vertex's own cluster.

19 We proved (B, Tusnády, Discrete Math. 1994) that $S_k^2(X_k) = \min_{P_k \in \mathcal{P}_k} S_k^2(P_k, X_k) \le k \min_{P_k \in \mathcal{P}_k} \frac{\varepsilon(P_k)}{\rho(P_k)}$. By Weyl's perturbation theorem, $0 < \lambda_k \le \varepsilon < \rho \le \lambda_{k+1} \le \rho + \varepsilon$ (5) for the $\varepsilon$ and $\rho$ of any $k$-partition with $\varepsilon < \rho$. Consequently, $\frac{\lambda_k}{\lambda_{k+1}} \le \min_{P_k \in \mathcal{P}_k} \frac{\varepsilon(P_k)}{\rho(P_k)}$. (6)

20 Theorem If for a given $k$-partition $\frac{\varepsilon(P_k)}{\rho(P_k)} < 1$, then $S_k^2(P_k, X_k) \le k \frac{[\varepsilon(P_k)]^2}{[\rho(P_k) - \varepsilon(P_k)]^2}$, where $X_k = (u_1, \dots, u_k)$. Theorem $S_k^2(P_k, X_k') \le \frac{[\varepsilon(P_k)]^2}{[\rho(P_k) - \varepsilon(P_k)]^2} \sum_{i=1}^{k} \lambda_i$, where $X_k' = (\sqrt{\lambda_1}\, u_1, \dots, \sqrt{\lambda_k}\, u_k)$. Consequently, $S_k^2(X_k') \le \sum_{i=1}^{k} \lambda_i \cdot \min_{P_k \in \mathcal{P}_k} \frac{\varepsilon(P_k)^2}{(\rho(P_k) - \varepsilon(P_k))^2}$.

21 Minimizing the normalized cut $n \times k$ normalized partition matrix: $Z_k = (z_1, \dots, z_k)$, $z_a = (z_{1a}, \dots, z_{na})^T$, where $z_{ia} = \frac{1}{\sqrt{\mathrm{Vol}(V_a)}}$ if $i \in V_a$ and 0 otherwise. The normalized cut of the $k$-partition $P_k$ given $W$: $f(P_k, W) = \mathrm{tr}\, Z_k^T L Z_k = \mathrm{tr}\, (D^{1/2} Z_k)^T L_D (D^{1/2} Z_k)$ (7), or equivalently, $f(P_k, W) = k - \mathrm{tr}\, (D^{1/2} Z_k)^T (D^{-1/2} W D^{-1/2})(D^{1/2} Z_k)$. We want to minimize $f(P_k, W)$ over the $k$-partitions. This is equivalent to maximizing $\mathrm{tr}\, Z_k^T W Z_k$ over normalized $k$-partition matrices $Z_k \in \mathcal{Z}_k^N$.

22 Normalized Laplacian eigenvalues $G$ is connected; $0 = \lambda_1 < \lambda_2 \le \dots \le \lambda_p < 1 \le \lambda_{p+1} \le \dots \le \lambda_n \le 2$: eigenvalues of $L_D$ with corresponding unit-norm, pairwise orthogonal eigenvectors $u_1, \dots, u_n$, $u_1 = (\sqrt{d_1}, \dots, \sqrt{d_n})^T$. Continuous relaxation: $X = (r_1, \dots, r_n)^T = (x_1, \dots, x_k)$, $\min_{X^T D X = I_k} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} w_{ij} \| r_i - r_j \|^2 = \min_{X^T D X = I_k} \mathrm{tr}\, X^T L X = \sum_{i=1}^{k} \lambda_i$, and the minimum is attained with $x_i = D^{-1/2} u_i$ ($i = 1, \dots, k$). Especially, $x_1 = \mathbf{1}$.

23 $f_k(G) = \min_{Z_k \in \mathcal{Z}_k^N} \mathrm{tr}\, Z_k^T L Z_k \ge \sum_{i=1}^{k} \lambda_i$, and equality can be attained only in the $k = 1$ trivial case; otherwise the transformed eigenvectors $D^{-1/2} u_i$ ($i = 2, \dots, k$) cannot be partition vectors, since the $d$-weighted sum of their coordinates is 0, due to the orthogonality to $u_1$. Equivalently, $\max_{Z_k \in \mathcal{Z}_k^N} \mathrm{tr}\, Z_k^T W Z_k \le k - \sum_{i=1}^{k} \lambda_i = \sum_{i=1}^{k} (1 - \lambda_i)$.

24 Optimum choice of k $\mathrm{tr}\, Z_k^T W Z_k = \sum_{i=1}^{n} (1 - \lambda_i) \sum_{a=1}^{k} [u_i^T (D^{1/2} z_a)]^2$. This is increased if we neglect the terms belonging to the eigenvalues that are at least 1; hence, the outer summation stops at $p$. The inner sum is the largest in the $k = p$ case, when the unit-norm, pairwise orthogonal vectors $D^{1/2} z_1, \dots, D^{1/2} z_p$ are close to the orthonormal eigenvectors $u_1, \dots, u_p$, respectively. $F_p := \mathrm{Span}\{D^{1/2} z_1, \dots, D^{1/2} z_p\}$. $f_{p,p}(Z_p, W) := \sum_{i=1}^{p} (1 - \lambda_i) \sum_{a=1}^{p} [u_i^T (D^{1/2} z_a)]^2$, and $\mathrm{tr}\, (D^{1/2} Z_p)^T (D^{-1/2} W D^{-1/2})(D^{1/2} Z_p) \le f_{p,p}(Z_p, W)$.

25 For given $W$, we maximize $f_{p,p}(Z_p, W)$ over $\mathcal{Z}_p^N$. The vectors $\sqrt{1 - \lambda_i}\, u_i$ are projected onto the subspace $F_p$: $\sqrt{1 - \lambda_i}\, u_i = \sum_{a=1}^{p} [(\sqrt{1 - \lambda_i}\, u_i)^T D^{1/2} z_a]\, D^{1/2} z_a + \mathrm{ort}_{F_p}(\sqrt{1 - \lambda_i}\, u_i)$ ($i = 1, \dots, p$). As $\sqrt{1 - \lambda_1}\, u_1 = u_1$ is in $F_p$, its orthogonal component is 0. By the Pythagorean equality: $1 - \lambda_i = \sum_{a=1}^{p} [(\sqrt{1 - \lambda_i}\, u_i)^T D^{1/2} z_a]^2 + \mathrm{dist}^2(\sqrt{1 - \lambda_i}\, u_i, F_p)$ ($i = 1, \dots, p$).

26 Representation $X_p = (r_1, \dots, r_n)^T = (\sqrt{1 - \lambda_1}\, D^{-1/2} u_1, \dots, \sqrt{1 - \lambda_p}\, D^{-1/2} u_p)$. $\mathrm{dist}^2(\sqrt{1 - \lambda_i}\, u_i, F_p) = \sum_{j=1}^{n} d_j (r_{ji} - c_{ji})^2$, $i = 1, \dots, p$, where $r_{ji}$ is the $i$th coordinate of the vector $r_j$ and $c_{ji}$ is the same for the vector $c_j \in \mathbb{R}^p$; further, there are at most $p$ different ones among the centers $c_1, \dots, c_n$ assigned to the vertex representatives. Namely, $c_{ji} = \frac{1}{\mathrm{Vol}(V_a)} \sum_{l \in V_a} d_l r_{li}$, $j \in V_a$, $i = 1, \dots, p$. $\tilde{S}_p^2(P_p, X_p) = \sum_{i=1}^{p} \mathrm{dist}^2(\sqrt{1 - \lambda_i}\, u_i, F_p) = \sum_{j=1}^{n} d_j \| r_j - c_j \|^2$.

27 $\sum_{i=1}^{p} (1 - \lambda_i) = f_{p,p}(Z_p, W) + \tilde{S}_p^2(P_p, X_p)$. We are looking for the $p$-partition maximizing the first term. In view of the above formula, increasing $f_{p,p}(Z_p, W)$ can be achieved by decreasing $\tilde{S}_p^2(X_p)$; the latter is obtained by applying the k-means algorithm with $p$ clusters to the $p$-dimensional representatives $r_1, \dots, r_n$ with respective weights $d_1, \dots, d_n$. As the first column of $X_p$ is $\mathbf{1}$, it is equivalent to apply the k-means algorithm with $p$ clusters to the $(p-1)$-dimensional representatives that are the row vectors of the $n \times (p-1)$ matrix obtained from $X_p$ by deleting its first column.
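
A sketch of the normalized-cut version (my own reading of slides 21-27): the representatives are the transformed eigenvectors $D^{-1/2} u_i$ scaled by $\sqrt{1 - \lambda_i}$, clustered by the degree-weighted k-means of slide 11, here re-implemented inline.

```python
import numpy as np

def normalized_cut_spectral_clustering(W, k, n_iter=50, seed=0):
    d = W.sum(axis=1)
    D_isqrt = np.diag(1 / np.sqrt(d))
    L_D = np.eye(len(d)) - D_isqrt @ W @ D_isqrt
    lam, U = np.linalg.eigh(L_D)                 # 0 = lam[0] <= lam[1] <= ... <= 2
    X = (D_isqrt @ U[:, 1:k]) * np.sqrt(np.maximum(1 - lam[1:k], 0))
    rng = np.random.default_rng(seed)            # degree-weighted Lloyd iteration
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for a in range(k):
            m = labels == a
            if m.any():
                centers[a] = (d[m, None] * X[m]).sum(0) / d[m].sum()
    return labels
```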

28 Similarly, for $d < k \le p$: $\sum_{i=1}^{d} (1 - \lambda_i) = \sum_{i=1}^{d} \sum_{a=1}^{k} [(\sqrt{1 - \lambda_i}\, u_i)^T D^{1/2} z_a]^2 + \sum_{i=1}^{d} \mathrm{dist}^2(\sqrt{1 - \lambda_i}\, u_i, F_k) =: f_{k,d}(Z_k, W) + \tilde{S}_k^2(P_k, X_d)$, where $X_d = (\sqrt{1 - \lambda_1}\, D^{-1/2} u_1, \dots, \sqrt{1 - \lambda_d}\, D^{-1/2} u_d)$. In the presence of a spectral gap between $\lambda_d$ and $\lambda_{d+1} < 1$, neither $\sum_{i=1}^{d} (1 - \lambda_i)$ nor $\tilde{S}_k^2(X_d)$ is increased significantly by introducing one more eigenvalue–eigenvector pair (by using $(d+1)$-dimensional representatives instead of $d$-dimensional ones). Consequently, $f_{k,d}(Z_k, W)$ would not change much, and $k = d$ clusters based on $d$-dimensional representatives will suffice. Increasing the number of clusters ($k+1, \dots, p$) will decrease the sum of the inner variances, $\tilde{S}_{k+1}^2(X_d) \le \tilde{S}_k^2(X_d)$, but not significantly.

29 In the $k = p$ special case $\mathrm{tr}\, Z_p^T L Z_p \ge p - f_{p,p}(Z_p, W) = \sum_{i=1}^{p} \lambda_i + \tilde{S}_p^2(P_p, X_p)$, which is true for the minima too: $f_p(G) \ge \sum_{i=1}^{p} \lambda_i + \tilde{S}_p^2(X_p)$, a sharper lower estimate for the minimum normalized $p$-way cut than $\sum_{i=1}^{p} \lambda_i$.

30 Spectral gap and variance In B, Tusnády, Discrete Math., 1994: Theorem In the representation $X_2 = (D^{-1/2} u_1, D^{-1/2} u_2) = (\mathbf{1}, D^{-1/2} u_2)$: $\tilde{S}_2^2(X_2) \le \frac{\lambda_2}{\lambda_3}$. Similarly, Theorem In the representation $X_2' = (\sqrt{1 - \lambda_1}\, D^{-1/2} u_1, \sqrt{1 - \lambda_2}\, D^{-1/2} u_2) = (\mathbf{1}, \sqrt{1 - \lambda_2}\, D^{-1/2} u_2)$: $\tilde{S}_2^2(X_2') \le \frac{\lambda_2 (1 - \lambda_2)}{\lambda_3}$. Can it be generalized for $k > 2$?

31 Isoperimetric number Definition The Cheeger constant of the weighted graph $G = (V, W)$ is $h(G) = \min_{U \subset V,\ \mathrm{Vol}(U) \le 1/2} \frac{e(U, \bar{U})}{\mathrm{Vol}(U)}$. Theorem (B, M-Sáska, Discrete Math. 2004) Let $\lambda_2$ be the smallest positive eigenvalue of $L_D$. Then $\frac{\lambda_2}{2} \le h(G) \le \min\{1, \sqrt{2 \lambda_2}\}$. If $\lambda_2 \le 1$, then the upper estimate can be improved to $h(G) \le \sqrt{\lambda_2 (2 - \lambda_2)}$.
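
A brute-force sanity check of the Cheeger bounds above on a tiny graph (my own; enumerating all vertex subsets is feasible only for very small $n$):

```python
import numpy as np
from itertools import combinations

def cheeger_constant(W):
    n = W.shape[0]
    d = W.sum(axis=1)                # total volume is 1 after normalization
    best = np.inf
    for r in range(1, n):
        for U in map(list, combinations(range(n), r)):
            if d[U].sum() <= 0.5:
                Ubar = [i for i in range(n) if i not in U]
                best = min(best, W[np.ix_(U, Ubar)].sum() / d[U].sum())
    return best

# two triangles joined by a single edge
W = np.zeros((6, 6))
for i, j in [(0,1),(0,2),(1,2),(3,4),(3,5),(4,5),(2,3)]:
    W[i, j] = W[j, i] = 1.0
W /= W.sum()
d = W.sum(axis=1)
L_D = np.eye(6) - np.diag(1/np.sqrt(d)) @ W @ np.diag(1/np.sqrt(d))
lam2 = np.linalg.eigvalsh(L_D)[1]
print(lam2 / 2, "<=", cheeger_constant(W), "<=", min(1, np.sqrt(2 * lam2)))
```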

32 $f_2(G)$ is the symmetric version of $h(G)$: $f_2(G) \le 2 h(G)$, hence $f_2(G) \le 2 \sqrt{\lambda_2 (2 - \lambda_2)}$ for $\lambda_2 \le 1$. $h(G)$ and $f_2(G)$ were estimated by Mohar, J. Comb. Theory B, for simple graphs (a similar formula involving the maximum degree). Theorem Suppose that $G = (V, W)$ is connected, and the $\lambda_i$'s are the eigenvalues of $L_D$. Then $\sum_{i=1}^{k} \lambda_i \le f_k(G)$, and in the case when the optimal $k$-dimensional representatives can be classified into $k$ well-separated clusters in such a way that the maximum cluster diameter $\varepsilon$ satisfies $\varepsilon \le \min\{ 1/\sqrt{2k},\ \sqrt{2} \min_i \sqrt{\mathrm{Vol}(V_i)} \}$, with the $k$-partition $(V_1, \dots, V_k)$ induced by the clusters above, then $f_k(G) \le c^2 \sum_{i=1}^{k} \lambda_i$, where $c = 1 + \varepsilon c' / (\sqrt{2} - \varepsilon c')$ and $c' = 1 / \min_i \sqrt{\mathrm{Vol}(V_i)}$.

33 The Newman–Girvan modularity for simple graphs $G = (V, A)$: simple graph, $|V| = n$, $A$ is the 0/1 adjacency matrix. Number of edges: $e = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}$. $d_i = \sum_{j=1}^{n} a_{ij}$: the degree of vertex $i$. $h_{ij} := \frac{d_i d_j}{2e}$: expected number of $i$–$j$ edges under random attachment. $P_k = (V_1, \dots, V_k)$: $k$-partition of the vertices (modules). Newman and Girvan, Phys. Rev. E, 2004, introduced a modularity with large values for stronger (than expected) intra-module connections.

34 Definition The Newman–Girvan modularity of the $k$-partition $P_k$ given $A$: $Q_{NG}(P_k, A) = \frac{1}{2e} \sum_{a=1}^{k} \sum_{i,j \in V_a} (a_{ij} - h_{ij})$. Obviously, $Q_{NG}$ takes on values less than 1, and due to $\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} = \sum_{i=1}^{n} \sum_{j=1}^{n} h_{ij}$ it is 0 for $k = 1$ (no community structure at all); therefore, only integers $k \in [2, n]$ are of interest. For given $A$ and $k$, we are looking for $\max_{P_k \in \mathcal{P}_k} Q_{NG}(P_k, A)$, more precisely, for the optimum $k$-partition attaining this maximum; and eventually, for the optimum $k$ too (there may be several local maxima).
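
A minimal sketch (my own) of evaluating $Q_{NG}$ for a simple graph and a partition given as a list of index lists:

```python
import numpy as np

def newman_girvan_modularity(A, parts):
    d = A.sum(axis=1)
    two_e = d.sum()                           # 2e
    B = A - np.outer(d, d) / two_e            # a_ij - h_ij
    return sum(B[np.ix_(Va, Va)].sum() for Va in parts) / two_e

# two triangles joined by a single edge
A = np.zeros((6, 6))
for i, j in [(0,1),(0,2),(1,2),(3,4),(3,5),(4,5),(2,3)]:
    A[i, j] = A[j, i] = 1.0
print(newman_girvan_modularity(A, [[0,1,2],[3,4,5]]))   # 5/14, the natural 2-partition
print(newman_girvan_modularity(A, [list(range(6))]))    # 0 (up to rounding) for k = 1
```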

35 $H = (h_{ij})$: matrix of rank 1. $B = A - H$: modularity matrix. The row sums of $B$ are zero, therefore it always has a 0 eigenvalue with the trivial eigenvector $\mathbf{1} = (1, 1, \dots, 1)^T$. If $G$ is the complete graph, then $B$ is negative semidefinite; but typically, it is indefinite. $\beta_1 \ge \beta_2 \ge \dots \ge \beta_n$: eigenvalues of $B$ with unit-norm, pairwise orthogonal eigenvectors $u_1, \dots, u_n$. Newman, Phys. Rev. E, 2006, uses the eigenvectors belonging to the positive eigenvalues of $B$.
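
The two statements above are easy to check numerically; a quick illustration (my own) for the complete graph $K_n$:

```python
import numpy as np

n = 8
A = np.ones((n, n)) - np.eye(n)               # complete graph K_n
d = A.sum(axis=1)
B = A - np.outer(d, d) / d.sum()              # modularity matrix
print(np.allclose(B @ np.ones(n), 0))         # True: the row sums of B are zero
print(np.linalg.eigvalsh(B).max() <= 1e-9)    # True: B is negative semidefinite for K_n
```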

36 Under the null hypothesis, vertices $i$ and $j$ are connected to each other independently, with probabilities proportional (actually, because of the normalizing condition, equal) to their generalized degrees. For edge-weighted graphs $G = (V, W)$, w.l.o.g. $\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} = 1$, and $h_{ij} = d_i d_j$ ($i, j = 1, \dots, n$) is supposed. Definition The Newman–Girvan modularity of $P_k$ given $W$: $Q(P_k, W) = \sum_{a=1}^{k} \sum_{i,j \in V_a} (w_{ij} - h_{ij}) = \sum_{a=1}^{k} [e(V_a, V_a) - \mathrm{Vol}^2(V_a)]$.

37 For given $k$ we maximize $Q(P_k, W)$ over $\mathcal{P}_k$. This task is equivalent to minimizing $\sum_{a \ne b} \sum_{i \in V_a,\, j \in V_b} (w_{ij} - h_{ij})$. We want to penalize partitions with clusters of extremely different sizes or volumes. Definition Balanced Newman–Girvan modularity of $P_k$ given $W$: $Q_B(P_k, W) = \sum_{a=1}^{k} \frac{1}{|V_a|} \sum_{i,j \in V_a} (w_{ij} - h_{ij}) = \sum_{a=1}^{k} \left[ \frac{e(V_a, V_a)}{|V_a|} - \frac{\mathrm{Vol}^2(V_a)}{|V_a|} \right]$.

38 Definition Normalized Newman–Girvan modularity of $P_k$ given $W$: $Q_N(P_k, W) = \sum_{a=1}^{k} \frac{1}{\mathrm{Vol}(V_a)} \sum_{i,j \in V_a} (w_{ij} - h_{ij}) = \sum_{a=1}^{k} \frac{e(V_a, V_a)}{\mathrm{Vol}(V_a)} - 1$. Maximizing the normalized Newman–Girvan modularity over $\mathcal{P}_k$ is equivalent to minimizing the normalized cut.

39 Maximizing the balanced Newman–Girvan modularity $B = W - H$: modularity matrix. Spectrum: $\beta_1 \ge \dots \ge \beta_p > 0 = \beta_{p+1} \ge \dots \ge \beta_n$. Unit-norm, pairwise orthogonal eigenvectors: $u_1, \dots, u_n$, $u_{p+1} = \mathbf{1} / \sqrt{n}$. $Q_B(P_k, W) = Q_B(Z_k, B) = \sum_{a=1}^{k} z_a^T B z_a = \mathrm{tr}\, Z_k^T B Z_k$. We want to maximize $\mathrm{tr}\, Z_k^T B Z_k$ over balanced $k$-partition matrices $Z_k \in \mathcal{Z}_k^B$.

40 Continuous relaxation: $\max_{Y^T Y = I_k} \mathrm{tr}(Y^T B Y) = \max_{y_a^T y_b = \delta_{ab}} \sum_{a=1}^{k} y_a^T B y_a = \sum_{a=1}^{k} \beta_a$, and equality is attained when $y_1, \dots, y_k$ are eigenvectors of $B$ corresponding to $\beta_1, \dots, \beta_k$. Though the vectors themselves are not necessarily unique (e.g., in case of multiple eigenvalues), the subspace $\mathrm{Span}\{y_1, \dots, y_k\}$ is unique if $\beta_k > \beta_{k+1}$. $\max_{Z_k \in \mathcal{Z}_k^B} Q_B(Z_k, B) \le \sum_{a=1}^{k} \beta_a \le \sum_{a=1}^{p+1} \beta_a = \sum_{a=1}^{p} \beta_a$. (8) Both inequalities can be attained by equality only in the $k = 1$, $p = 0$ case, when the underlying graph is the complete graph: $A = \mathbf{1}\mathbf{1}^T - I$, $W = \frac{1}{n(n-1)} A$. In this case there is only one cluster, with a partition vector of equal coordinates (balanced eigenvector belonging to the single 0 eigenvalue).

41 The maximum with respect to $k$ of the maximum in (8) is attained with the choice $k = p + 1$. $Q_B(Z_k, B) = \mathrm{tr}\, Z_k^T B Z_k = \sum_{a=1}^{k} z_a^T \left( \sum_{i=1}^{n} \beta_i u_i u_i^T \right) z_a = \sum_{i=1}^{n} \beta_i \sum_{a=1}^{k} (u_i^T z_a)^2$. We can increase the last sum if we neglect the terms belonging to the negative eigenvalues, hence the outer summation stops at $p$, or equivalently, at $p + 1$. In this case the inner sum is the largest in the $k = p + 1$ case, when the partition vectors $z_1, \dots, z_{p+1}$ are close to the eigenvectors $u_1, \dots, u_{p+1}$, respectively. As both systems consist of orthonormal sets of vectors, the two subspaces spanned by them should be close to each other. The subspace $F_{p+1} = \mathrm{Span}\{z_1, \dots, z_{p+1}\}$ consists of stepwise constant vectors on $p + 1$ steps, therefore $u_{p+1} \in F_{p+1}$, and it suffices to process only the first $p$ eigenvectors.

42 $Q_B(Z_{p+1}, B) \le Q_{p+1,p}(Z_{p+1}, B) := \sum_{i=1}^{p} \beta_i \sum_{a=1}^{p+1} (u_i^T z_a)^2$, and in the sequel, for given $B$, we want to maximize $Q_{p+1,p}(Z_{p+1}, B)$ over $\mathcal{Z}_{p+1}^B$. Project the vectors $\sqrt{\beta_i}\, u_i$ onto the subspace $F_{p+1}$: $\sqrt{\beta_i}\, u_i = \sum_{a=1}^{p+1} [(\sqrt{\beta_i}\, u_i)^T z_a]\, z_a + \mathrm{ort}_{F_{p+1}}(\sqrt{\beta_i}\, u_i)$, $i = 1, \dots, p$. (9) $\beta_i = \| \sqrt{\beta_i}\, u_i \|^2 = \sum_{a=1}^{p+1} [(\sqrt{\beta_i}\, u_i)^T z_a]^2 + \mathrm{dist}^2(\sqrt{\beta_i}\, u_i, F_{p+1})$, $i = 1, \dots, p$.

43 $\sum_{i=1}^{p} \beta_i = \sum_{i=1}^{p} \sum_{a=1}^{p+1} [(\sqrt{\beta_i}\, u_i)^T z_a]^2 + \sum_{i=1}^{p} \mathrm{dist}^2(\sqrt{\beta_i}\, u_i, F_{p+1}) = Q_{p+1,p}(Z_{p+1}, B) + S_{p+1}^2(X_p)$, where the rows of $X_p = (\sqrt{\beta_1}\, u_1, \dots, \sqrt{\beta_p}\, u_p)$ are regarded as $p$-dimensional representatives of the vertices. We could as well take $(p+1)$-dimensional representatives, as the last coordinates are zeros, and hence $S_{p+1}^2(X_p) = S_{p+1}^2(X_{p+1})$. Maximizing $Q_{p+1,p}$ is equivalent to minimizing $S_{p+1}^2(X_p)$, which can be achieved by applying the k-means algorithm to the $p$-dimensional representatives with $p + 1$ clusters.
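
A compact sketch of the resulting procedure for the balanced modularity (my own reading of slides 39-43; sklearn's KMeans is assumed to be available, and $W$ is assumed to be normalized so that its entries sum to 1):

```python
import numpy as np
from sklearn.cluster import KMeans

def balanced_modularity_clusters(W):
    d = W.sum(axis=1)
    B = W - np.outer(d, d)                    # modularity matrix (entries of W sum to 1)
    beta, U = np.linalg.eigh(B)
    pos = beta > 1e-12                        # keep the p positive eigenvalues
    p = int(pos.sum())
    if p == 0:
        return np.zeros(W.shape[0], dtype=int)    # no community structure indicated
    X = U[:, pos] * np.sqrt(beta[pos])            # p-dimensional vertex representatives
    return KMeans(n_clusters=p + 1, n_init=10, random_state=0).fit_predict(X)
```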

44 More generally, if there is a gap $\beta_k \gg \beta_{k+1} > 0$, then we look for $k + 1$ clusters based on $k$-dimensional representatives of the vertices. The $k$ leading eigenvectors are projected onto the $k$-dimensional subspace of $F_{k+1}$ orthogonal to $\mathbf{1}$. Calculating eigenvectors is costly; the Lánczos method performs well if we calculate only the eigenvectors belonging to some leading eigenvalues followed by a spectral gap. Some authors suggest using as many eigenvectors as possible. In fact, using more eigenvectors (up to $p$) is better from the point of view of accuracy, but using fewer eigenvectors (up to a gap in the positive part of the spectrum) is better from the computational point of view. We have to compromise.

45 Normalized modularity matrix $B_D = D^{-1/2} B D^{-1/2}$. $Q_N(P_k, W) = Q_N(Z_k, B) = \sum_{a=1}^{k} z_a^T B z_a = \mathrm{tr}\, (D^{1/2} Z_k)^T B_D (D^{1/2} Z_k)$. $1 \ge \beta_1 \ge \dots \ge \beta_n \ge -1$: spectrum of $B_D$; $u_1, \dots, u_n$: unit-norm, pairwise orthogonal eigenvectors; $(\sqrt{d_1}, \dots, \sqrt{d_n})^T$ is the eigenvector belonging to the 0 eigenvalue. $p$: number of positive eigenvalues of $B_D$ (this $p$ does not necessarily coincide with that of $B$).

46 $\max_{Z_k \in \mathcal{Z}_k^N} Q_N(Z_k, B) \le \sum_{a=1}^{k} \beta_a \le \sum_{a=1}^{p+1} \beta_a = \sum_{a=1}^{p} \beta_a$. $Q_N(Z_k, B) = \sum_{i=1}^{n} \beta_i \sum_{a=1}^{k} [u_i^T (D^{1/2} z_a)]^2$. We can increase this sum if we neglect the terms belonging to the negative eigenvalues, hence the outer summation stops at $p$, or equivalently, at $p + 1$. The inner sum is the largest in the $k = p + 1$ case, when the unit-norm, pairwise orthogonal vectors $D^{1/2} z_1, \dots, D^{1/2} z_{p+1}$ are close to the eigenvectors $u_1, \dots, u_{p+1}$, respectively. In fact, the two subspaces spanned by them should be close to each other.

47 $F_{p+1} = \mathrm{Span}\{D^{1/2} z_1, \dots, D^{1/2} z_{p+1}\}$ — not stepwise constant vectors! $X_p = (\sqrt{\beta_1}\, D^{-1/2} u_1, \dots, \sqrt{\beta_p}\, D^{-1/2} u_p)$. $\tilde{S}_{p+1}^2(P_{p+1}, X_p) = \sum_{i=1}^{p} \mathrm{dist}^2(\sqrt{\beta_i}\, u_i, F_{p+1}) = \sum_{j=1}^{n} d_j \| x_j - c_j \|^2 \to \min$: weighted k-means algorithm; the spectral-gap considerations are the same as before.
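
A sketch of the representatives used in the normalized-modularity case (my own reading of slides 45-47); the degree-weighted k-means step is the same Lloyd iteration as shown after slide 27, so only the construction of the representatives is given:

```python
import numpy as np

def normalized_modularity_representatives(W):
    """Return the p-dimensional representatives and p; cluster them into p + 1 groups
    with k-means weighted by the degrees d (entries of W are assumed to sum to 1)."""
    d = W.sum(axis=1)
    B = W - np.outer(d, d)
    D_isqrt = np.diag(1 / np.sqrt(d))
    B_D = D_isqrt @ B @ D_isqrt               # normalized modularity matrix
    beta, U = np.linalg.eigh(B_D)
    pos = beta > 1e-12
    X = (D_isqrt @ U[:, pos]) * np.sqrt(beta[pos])
    return X, int(pos.sum())
```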

48 Pure community structure $G$: disjoint union of $k$ complete graphs on $n_1, \dots, n_k$ vertices. Eigenvalues of the modularity matrix: $k - 1$ positive $\beta_i$'s ($> 1$); in the $n_1 = \dots = n_k$ special case this is a multiple eigenvalue. 0: single eigenvalue. $-1$: eigenvalue with multiplicity $n - k$. Eigenvalues of the normalized modularity matrix: 1 with multiplicity $k - 1$; 0: single eigenvalue; $n - k$ negative eigenvalues in $(-1, 0)$; in the $n_1 = \dots = n_k$ special case there is one negative eigenvalue with multiplicity $n - k$. Piecewise constant eigenvectors belonging to the positive eigenvalues $\Rightarrow$ $k$ clusters.

49 Pure anticommunity structure $G$: complete $k$-partite graph on $n_1, \dots, n_k$ vertices. Eigenvalues of the modularity matrix: $k - 1$ negative eigenvalues; 0: all the other eigenvalues. Eigenvalues of the normalized modularity matrix: $k - 1$ negative eigenvalues in $(-1, 0)$; 0: all the other eigenvalues. MINIMIZE THE MODULARITY! Piecewise constant eigenvectors belonging to the negative eigenvalues $\Rightarrow$ $k$ clusters.
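
The two spectra described on slides 48-49 are easy to reproduce numerically; a small illustration (my own) with $k = 3$ groups of sizes 3, 4, 5:

```python
import numpy as np

def normalized_modularity_spectrum(A):
    W = A / A.sum()
    d = W.sum(axis=1)
    D_isqrt = np.diag(1 / np.sqrt(d))
    return np.round(np.linalg.eigvalsh(D_isqrt @ (W - np.outer(d, d)) @ D_isqrt), 3)

sizes, n = [3, 4, 5], 12
community = np.zeros((n, n)); start = 0
for s in sizes:                               # disjoint union of complete graphs
    community[start:start+s, start:start+s] = 1 - np.eye(s)
    start += s
anticommunity = 1 - np.eye(n) - community     # complete 3-partite graph
print(normalized_modularity_spectrum(community))      # k-1 = 2 eigenvalues equal to 1
print(normalized_modularity_spectrum(anticommunity))  # k-1 = 2 negative, the rest 0
```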

50 Block matrices Definition The $n \times n$ symmetric real matrix $B$ is a blown-up matrix if there is a $k \times k$ symmetric so-called pattern matrix $P$ with entries $0 \le p_{ij} \le 1$, and there are positive integers $n_1, \dots, n_k$ with $\sum_{i=1}^{k} n_i = n$, such that after rearranging its rows and columns, the matrix $B$ can be divided into $k \times k$ blocks, where block $(i, j)$ is an $n_i \times n_j$ matrix with entries all equal to $p_{ij}$ ($1 \le i, j \le k$). $G = (V, B)$ is a weighted graph with possible loops. Generalized random graph: $G$ is a random graph whose edges exist independently, with probabilities $p_{ij}$. Modularity matrix: $k - 1$ structural (large absolute value) eigenvalues $\Rightarrow$ $k$ clusters by the corresponding eigenvectors. Pure community structure: $p_{ii} = 1$, $p_{ij} = 0$ ($i \ne j$). Pure anticommunity structure: $p_{ii} = 0$, $p_{ij} = 1$ ($i \ne j$).
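
A minimal sketch (my own) of a blown-up matrix and of a generalized random graph sampled from it; the pattern matrix and the block sizes are only illustrative:

```python
import numpy as np

def blow_up(P, sizes):
    """Blown-up matrix: block (a, b) is a constant sizes[a] x sizes[b] block equal to P[a, b]."""
    k = len(sizes)
    return np.block([[np.full((sizes[a], sizes[b]), P[a, b]) for b in range(k)]
                     for a in range(k)])

P = np.array([[.8, .1, .1],
              [.1, .7, .2],
              [.1, .2, .6]])
sizes = [5, 6, 7]
B = blow_up(P, sizes)
rng = np.random.default_rng(0)
A = (rng.random(B.shape) < B)                 # independent edges with probabilities p_ij
A = np.triu(A, 1); A = (A + A.T).astype(float)   # symmetric 0/1 graph without loops
print(B.shape, np.linalg.matrix_rank(B))      # rank(B) = k = 3
```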

51 Eigenvalues of blown-up matrices and their Laplacians GENERAL CASE The $p_{ij}$'s are arbitrary, $\mathrm{rank}(P) = k$. $\Delta := \mathrm{diag}(n_1, \dots, n_k)$ with $n_i \ge cn$ ($i = 1, \dots, k$). $\mathrm{rank}(B) = k$; non-zero eigenvalues: $\lambda_i = \Theta(n)$ with stepwise constant eigenvectors; the $\lambda_i$'s are the eigenvalues of $\Delta^{1/2} P \Delta^{1/2}$. Laplacian eigenvalues: 0: single eigenvalue; $\lambda_1, \dots, \lambda_{k-1} = \Theta(n)$ with stepwise constant eigenvectors; $\gamma_i = \Theta(n)$ with multiplicity $n_i - 1$, eigenspace: vectors with coordinates 0 except on block $i$, where the sum of the coordinates is 0 ($i = 1, \dots, k$).

52 Normalized Laplacian eigenvalues: there exists $\delta > 0$ (independent of $n$) such that there are $k$ eigenvalues in $[0, 1 - \delta] \cup [1 + \delta, 2]$ with stepwise constant eigenvectors. All the other eigenvalues are 1.

53 COMPLETE k-PARTITE GRAPH $p_{ii} = 0$, $p_{ij} = 1$ ($i \ne j$). Non-zero eigenvalues of $B$: $\lambda_1, \dots, \lambda_k = \Theta(n)$ with stepwise constant eigenvectors; $\lambda_1, \dots, \lambda_k \in [-\max_i n_i, -\min_i n_i] \cup [n - \max_i n_i, n - \min_i n_i]$. Laplacian eigenvalues: 0: single eigenvalue; $\lambda_1 = \dots = \lambda_{k-1} = n$ with stepwise constant eigenvectors; $\gamma_i = n - n_i$ with multiplicity $n_i - 1$, eigenspace: vectors with coordinates 0 except on block $i$, where the sum of the coordinates is 0 ($i = 1, \dots, k$).

54 Normalized Laplacian eigenvalues: there exists $\delta > 0$ (independent of $n$) such that there are $k - 1$ eigenvalues in $[1 + \delta, 2]$ with stepwise constant eigenvectors. 0 is a single eigenvalue; all the other eigenvalues are 1.

55 k DISJOINT CLUSTERS $A = \bigoplus_{i=1}^{k} A_i$ block-diagonal. Diagonal / off-diagonal entries of the $n_i \times n_i$ matrix $A_i$: $\nu_i$ / $\mu_i$. Large eigenvalues of $A$: $\lambda_i = (n_i - 1)\mu_i + \nu_i = \Theta(n)$ with stepwise constant eigenvectors. Small eigenvalues of $A$: $\nu_i - \mu_i$ with multiplicity $n_i - 1$ ($i = 1, \dots, k$). Laplacian eigenvalues: 0 with multiplicity $k$, eigenspace: stepwise constant vectors; $n_i \mu_i$ with multiplicity $n_i - 1$ ($i = 1, \dots, k$). Normalized Laplacian eigenvalues: 0 with multiplicity $k$; $\frac{n_i \mu_i}{\nu_i + (n_i - 1)\mu_i}$ with multiplicity $n_i - 1$ ($i = 1, \dots, k$). Union of $k$ complete graphs ($\nu_i = 0$): the non-zero eigenvalues are $\frac{n_i}{n_i - 1} > 1$.
