Introduction to Spectral Graph Theory and Graph Clustering
Chengming Jiang
ECS 231, Spring 2016
University of California, Davis
Motivation

Image partitioning in computer vision (figure omitted).
Motivation

Community detection in network analysis (figure omitted).
Outline

I. Graph and graph Laplacian
   - Graph
   - Weighted graph
   - Graph Laplacian
II. Graph clustering
   - Graph clustering
   - Normalized cut
   - Spectral clustering
I.1 Graph

An (undirected) graph is a pair $G = (V, E)$, where $V = \{v_i\}$ is a set of vertices and $E \subseteq V \times V$ is a set of edges.

Remarks:
- An edge is a pair $\{u, v\}$ with $u \neq v$ (no self-loops);
- There is at most one edge between $u$ and $v$ (simple graph).
I.1 Graph

For every vertex $v \in V$, the degree $d(v)$ of $v$ is the number of edges incident to $v$:
$$d(v) = |\{u \in V \mid \{u, v\} \in E\}|.$$
Let $d_i = d(v_i)$; the degree matrix is $D = D(G) = \mathrm{diag}(d_1, \ldots, d_n)$. For the running four-vertex example,
$$D = \mathrm{diag}(2, 3, 3, 2).$$
I.1 Graph

Given a graph $G = (V, E)$ with $|V| = n$ and $|E| = m$, the incidence matrix $D(G)$ of $G$ is an $n \times m$ matrix with
$$d_{ij} = \begin{cases} 1 & \text{if } \exists k \text{ s.t. } e_j = \{v_i, v_k\} \\ 0 & \text{otherwise.} \end{cases}$$
(Note that this notation collides with the degree matrix $D$ above; context disambiguates.) For the example graph:

          e1 e2 e3 e4 e5
    v1  [  1  1  0  0  0 ]
    v2  [  1  0  1  1  0 ]
    v3  [  0  1  1  0  1 ]
    v4  [  0  0  0  1  1 ]
I.1 Graph

Given a graph $G = (V, E)$ with $|V| = n$ and $|E| = m$, the adjacency matrix $A(G)$ of $G$ is a symmetric $n \times n$ matrix with
$$a_{ij} = \begin{cases} 1 & \text{if } \{v_i, v_j\} \in E \\ 0 & \text{otherwise.} \end{cases}$$

$$A(G) = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix}$$
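The three matrices above are easy to build numerically. The following sketch constructs the degree, adjacency, and incidence matrices for the running four-vertex example (the edge list and variable names are mine; `M` is used for the incidence matrix to avoid colliding with the degree matrix `D`):

```python
import numpy as np

# Edges of the running 4-vertex example (0-indexed): e1..e5
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
n, m = 4, len(edges)

# Adjacency matrix A: a_ij = 1 iff {v_i, v_j} is an edge
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1

# Degree matrix D: d_i = number of edges incident to v_i
D = np.diag(A.sum(axis=1))

# Incidence matrix M: m_ij = 1 iff vertex v_i is an endpoint of edge e_j
M = np.zeros((n, m), dtype=int)
for j, (u, v) in enumerate(edges):
    M[u, j] = M[v, j] = 1

print(np.diag(D))  # [2 3 3 2]
```

A useful sanity check: for an unweighted simple graph, $M M^T = D + A$ (each diagonal entry counts incident edges, each off-diagonal entry counts shared edges).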
I.2 Weighted graph

A weighted graph is a pair $G = (V, W)$, where $V = \{v_i\}$ is a set of vertices with $|V| = n$, and $W \in \mathbb{R}^{n \times n}$ is the weight matrix, with
$$w_{ij} = w_{ji} \ge 0 \text{ if } i \ne j, \qquad w_{ij} = 0 \text{ if } i = j.$$
The underlying graph of $G$ is $\hat{G} = (V, E)$ with $E = \{\{v_i, v_j\} \mid w_{ij} > 0\}$.

Remarks:
- If $w_{ij} \in \{0, 1\}$, then $W = A$, the adjacency matrix of $\hat{G}$;
- Since $w_{ii} = 0$, there are no self-loops in $\hat{G}$.
I.2 Weighted graph

For every vertex $v_i \in V$, the degree $d(v_i)$ of $v_i$ is the sum of the weights of the edges incident to $v_i$:
$$d(v_i) = \sum_{j=1}^{n} w_{ij}.$$
Let $d_i = d(v_i)$; the degree matrix is $D = D(G) = \mathrm{diag}(d_1, \ldots, d_n)$.

Remark: Let $d = (d_1, \ldots, d_n)^T$ and $\mathbf{1} = (1, \ldots, 1)^T$; then $d = W\mathbf{1}$ and $D = \mathrm{diag}(d)$.
I.2 Weighted graph

Given a subset of vertices $A \subseteq V$, we define the volume $\mathrm{vol}(A)$ by
$$\mathrm{vol}(A) = \sum_{v_i \in A} d(v_i) = \sum_{v_i \in A} \sum_{j=1}^{n} w_{ij}.$$

Remark: If $\mathrm{vol}(A) = 0$, all the vertices in $A$ are isolated.

Example: If $A = \{v_1, v_3\}$, then
$$\mathrm{vol}(A) = d(v_1) + d(v_3) = (w_{12} + w_{13}) + (w_{31} + w_{32} + w_{34}).$$
I.2 Weighted graph

Given two subsets of vertices $A, B \subseteq V$, we define $\mathrm{links}(A, B)$ by
$$\mathrm{links}(A, B) = \sum_{v_i \in A,\, v_j \in B} w_{ij}.$$

Remarks:
- $A$ and $B$ are not necessarily distinct;
- Since $W$ is symmetric, $\mathrm{links}(A, B) = \mathrm{links}(B, A)$;
- $\mathrm{vol}(A) = \mathrm{links}(A, V)$.
I.2 Weighted graph

The quantity $\mathrm{cut}(A)$ is defined by $\mathrm{cut}(A) = \mathrm{links}(A, V \setminus A)$.
The quantity $\mathrm{assoc}(A)$ is defined by $\mathrm{assoc}(A) = \mathrm{links}(A, A)$.

Remarks:
- $\mathrm{cut}(A)$ measures how much link weight escapes from $A$;
- $\mathrm{assoc}(A)$ measures how much link weight stays within $A$;
- $\mathrm{cut}(A) + \mathrm{assoc}(A) = \mathrm{vol}(A)$.
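These quantities translate directly into code. A minimal sketch (function names are mine), using the unweighted example graph from the earlier slides to check the identity $\mathrm{cut}(A) + \mathrm{assoc}(A) = \mathrm{vol}(A)$:

```python
import numpy as np

def vol(W, A):
    """Sum of the degrees of the vertices in A (A: list of vertex indices)."""
    return W[A, :].sum()

def links(W, A, B):
    """Total weight of the (ordered) links from A to B."""
    return W[np.ix_(A, B)].sum()

def cut(W, A):
    comp = [i for i in range(W.shape[0]) if i not in A]
    return links(W, A, comp)

def assoc(W, A):
    return links(W, A, A)

# Unweighted example graph from the earlier slides (W = adjacency matrix)
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
A = [0, 2]  # A = {v1, v3}
print(cut(W, A), assoc(W, A), vol(W, A))  # 3.0 2.0 5.0
```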
I.3 Graph Laplacian

Given a weighted graph $G = (V, W)$, the (graph) Laplacian $L$ of $G$ is defined by
$$L = D - W,$$
where $D$ is the degree matrix of $G$.

Remark: $D = \mathrm{diag}(W\mathbf{1})$.
I.3 Graph Laplacian

Properties of the Laplacian:
1. $x^T L x = \frac{1}{2} \sum_{i,j=1}^{n} w_{ij}(x_i - x_j)^2$ for all $x \in \mathbb{R}^n$;
2. $L \succeq 0$ (positive semidefinite) if $w_{ij} \ge 0$ for all $i, j$;
3. $L\mathbf{1} = 0$;
4. If the underlying graph of $G$ is connected, then $0 = \lambda_1 < \lambda_2 \le \lambda_3 \le \ldots \le \lambda_n$;
5. If the underlying graph of $G$ is connected, then the dimension of the nullspace of $L$ is 1.
I.3 Graph Laplacian

Proof of Property 1. Since $L = D - W$, we have
$$x^T L x = x^T D x - x^T W x = \sum_{i=1}^{n} d_i x_i^2 - \sum_{i,j=1}^{n} w_{ij} x_i x_j$$
$$= \frac{1}{2}\Big(\sum_{i=1}^{n} d_i x_i^2 - 2\sum_{i,j=1}^{n} w_{ij} x_i x_j + \sum_{j=1}^{n} d_j x_j^2\Big)$$
$$= \frac{1}{2}\Big(\sum_{i,j=1}^{n} w_{ij} x_i^2 - 2\sum_{i,j=1}^{n} w_{ij} x_i x_j + \sum_{i,j=1}^{n} w_{ij} x_j^2\Big)$$
$$= \frac{1}{2}\sum_{i,j=1}^{n} w_{ij}(x_i - x_j)^2.$$
I.3 Graph Laplacian

Proof of Property 2. Since $L^T = D - W^T = D - W = L$, $L$ is symmetric. Since $x^T L x = \frac{1}{2}\sum_{i,j=1}^{n} w_{ij}(x_i - x_j)^2$ and $w_{ij} \ge 0$ for all $i, j$, we have $x^T L x \ge 0$.

Proof of Property 3. $L\mathbf{1} = (D - W)\mathbf{1} = D\mathbf{1} - W\mathbf{1} = d - d = 0$.

Properties 4 and 5: skipped for now; see §2.2 of [Gallier, 2013].
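Properties 1–4 are easy to check numerically on a random weighted graph. A quick sketch (a dense random positive weight matrix is used, so the underlying graph is connected and Property 4 applies):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# Random symmetric nonnegative weight matrix with zero diagonal
W = rng.random((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)

D = np.diag(W.sum(axis=1))
L = D - W

# Property 1: x^T L x = (1/2) * sum_ij w_ij * (x_i - x_j)^2
x = rng.standard_normal(n)
quad = 0.5 * sum(W[i, j] * (x[i] - x[j]) ** 2
                 for i in range(n) for j in range(n))
print(np.isclose(x @ L @ x, quad))  # True

# Properties 2-4: L is PSD, L @ 1 = 0, and lambda_2 > 0 (connected graph)
eigvals = np.linalg.eigvalsh(L)  # ascending order
print(eigvals[0] > -1e-10, np.allclose(L @ np.ones(n), 0))  # True True
```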
II.1 Graph clustering

$k$-way partitioning: given a weighted graph $G = (V, W)$, find a partition $A_1, A_2, \ldots, A_k$ of $V$, such that
- $A_1 \cup A_2 \cup \ldots \cup A_k = V$;
- $A_i \cap A_j = \emptyset$ for $i \ne j$;
- for any $i \ne j$, the edges between $A_i$ and $A_j$ have low weight and the edges within each $A_i$ have high weight.

If $k = 2$, it is a two-way partitioning.
II.1 Graph clustering

Recall the (two-way) cut:
$$\mathrm{cut}(A) = \mathrm{links}(A, \bar{A}) = \sum_{v_i \in A,\, v_j \in \bar{A}} w_{ij},$$
where $\bar{A} = V \setminus A$.
II.1 Graph clustering problems

The mincut problem is
$$\min_A \mathrm{cut}(A) = \min_A \sum_{v_i \in A,\, v_j \in \bar{A}} w_{ij}.$$
In practice, the mincut easily yields unbalanced partitions.

Example (figure omitted): $\min \mathrm{cut}(A) = 1 + 2 = 3$.
II.2 Normalized cut

The normalized cut [Shi & Malik, 2000] is defined by
$$\mathrm{Ncut}(A) = \frac{\mathrm{cut}(A)}{\mathrm{vol}(A)} + \frac{\mathrm{cut}(\bar{A})}{\mathrm{vol}(\bar{A})}.$$
II.2 Normalized cut

Minimal Ncut: $\min_A \mathrm{Ncut}(A)$.

Example (figure omitted):
$$\min \mathrm{Ncut}(A) = \frac{4}{3+6+6+3} + \frac{4}{3+6+6+3} = \frac{4}{9}.$$
II.2 Normalized cut

1. Let $x = (x_1, \ldots, x_n)^T$ be the indicator vector, such that
$$x_i = \begin{cases} 1 & \text{if } v_i \in A \\ -1 & \text{if } v_i \in \bar{A}. \end{cases}$$

2. Then
$$(\mathbf{1} + x)^T D (\mathbf{1} + x) = 4\sum_{v_i \in A} d_i = 4\,\mathrm{vol}(A);$$
$$(\mathbf{1} + x)^T W (\mathbf{1} + x) = 4\sum_{v_i \in A,\, v_j \in A} w_{ij} = 4\,\mathrm{assoc}(A);$$
$$(\mathbf{1} + x)^T L (\mathbf{1} + x) = 4\big(\mathrm{vol}(A) - \mathrm{assoc}(A)\big) = 4\,\mathrm{cut}(A);$$
and
$$(\mathbf{1} - x)^T D (\mathbf{1} - x) = 4\sum_{v_i \in \bar{A}} d_i = 4\,\mathrm{vol}(\bar{A});$$
$$(\mathbf{1} - x)^T W (\mathbf{1} - x) = 4\sum_{v_i \in \bar{A},\, v_j \in \bar{A}} w_{ij} = 4\,\mathrm{assoc}(\bar{A});$$
$$(\mathbf{1} - x)^T L (\mathbf{1} - x) = 4\big(\mathrm{vol}(\bar{A}) - \mathrm{assoc}(\bar{A})\big) = 4\,\mathrm{cut}(\bar{A}).$$
(Please verify after class.)
II.2 Normalized cut

3. $\mathrm{Ncut}(A)$ can now be written as
$$\mathrm{Ncut}(A) = \frac{1}{4}\left(\frac{(\mathbf{1}+x)^T L (\mathbf{1}+x)}{k\,\mathbf{1}^T D \mathbf{1}} + \frac{(\mathbf{1}-x)^T L (\mathbf{1}-x)}{(1-k)\,\mathbf{1}^T D \mathbf{1}}\right)
= \frac{1}{4}\cdot\frac{\big((\mathbf{1}+x) - b(\mathbf{1}-x)\big)^T L \big((\mathbf{1}+x) - b(\mathbf{1}-x)\big)}{b\,\mathbf{1}^T D \mathbf{1}},$$
where $k = \mathrm{vol}(A)/\mathrm{vol}(V)$, $b = k/(1-k)$, and $\mathrm{vol}(V) = \mathbf{1}^T D \mathbf{1}$.

4. Let $y = (\mathbf{1}+x) - b(\mathbf{1}-x)$; then
$$\mathrm{Ncut}(A) = \frac{1}{4}\cdot\frac{y^T L y}{b\,\mathbf{1}^T D \mathbf{1}}, \qquad
y_i = \begin{cases} 2 & \text{if } v_i \in A \\ -2b & \text{if } v_i \in \bar{A}. \end{cases}$$
II.2 Normalized cut

5. Since $b = k/(1-k) = \mathrm{vol}(A)/\mathrm{vol}(\bar{A})$, we have
$$\frac{1}{4}\,y^T D y = \sum_{v_i \in A} d_i + b^2 \sum_{v_i \in \bar{A}} d_i = \mathrm{vol}(A) + b^2\,\mathrm{vol}(\bar{A}) = b\big(\mathrm{vol}(\bar{A}) + \mathrm{vol}(A)\big) = b\,\mathbf{1}^T D \mathbf{1}.$$
In addition,
$$y^T D \mathbf{1} = y^T d = 2\sum_{v_i \in A} d_i - 2b\sum_{v_i \in \bar{A}} d_i = 2\,\mathrm{vol}(A) - 2b\,\mathrm{vol}(\bar{A}) = 0.$$
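The identities in steps 1-5 can be verified numerically on the small example graph. The sketch below picks $A = \{v_1, v_2\}$ (my choice, purely for illustration) and checks that the Rayleigh-quotient form of $\mathrm{Ncut}(A)$ matches the direct definition, and that $y^T D \mathbf{1} = 0$:

```python
import numpy as np

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
d = W.sum(axis=1)
D = np.diag(d)
L = D - W
one = np.ones(4)

x = np.array([1.0, 1.0, -1.0, -1.0])     # indicator vector, A = {v1, v2}
volA = d[x > 0].sum()
volAbar = d[x < 0].sum()
cutA = W[np.ix_(x > 0, x < 0)].sum()

k = volA / (one @ D @ one)
b = k / (1 - k)
y = (one + x) - b * (one - x)

ncut_direct = cutA / volA + cutA / volAbar
ncut_rayleigh = (y @ L @ y) / (4 * b * (one @ D @ one))
print(np.isclose(ncut_direct, ncut_rayleigh), np.isclose(y @ D @ one, 0))
# True True
```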
II.2 Normalized cut

6. The minimal normalized cut problem is thus the binary optimization
$$y^* = \arg\min_y \frac{y^T L y}{y^T D y} \quad \text{s.t. } y_i \in \{2, -2b\},\ y^T D \mathbf{1} = 0. \tag{1}$$

7. Relaxation:
$$y^* = \arg\min_y \frac{y^T L y}{y^T D y} \quad \text{s.t. } y \in \mathbb{R}^n,\ y^T D \mathbf{1} = 0. \tag{2}$$
II.2 Normalized cut

Variational principle. Let $A, B \in \mathbb{R}^{n \times n}$ with $A^T = A$, $B^T = B \succ 0$, and let $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_n$ be the eigenvalues of $Au = \lambda Bu$ with corresponding eigenvectors $u_1, u_2, \ldots, u_n$. Then
$$\min_x \frac{x^T A x}{x^T B x} = \lambda_1, \qquad \arg\min_x \frac{x^T A x}{x^T B x} = u_1,$$
and
$$\min_{x^T B u_1 = 0} \frac{x^T A x}{x^T B x} = \lambda_2, \qquad \arg\min_{x^T B u_1 = 0} \frac{x^T A x}{x^T B x} = u_2.$$
More general forms exist.
II.2 Normalized cut

For the matrix pair $(L, D)$, it is known that $(\lambda_1, y_1) = (0, \mathbf{1})$. By the variational principle, the relaxed minimal Ncut (2) is equivalent to finding the second smallest eigenpair $(\lambda_2, y_2)$ of
$$Ly = \lambda D y. \tag{3}$$

Remarks:
- $L$ is extremely sparse and $D$ is diagonal;
- The precision requirement for the eigenvectors is low, say $O(10^{-3})$.
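A minimal sketch of this step, assuming a connected graph with positive degrees: the generalized problem $Ly = \lambda Dy$ is solved as a standard symmetric eigenproblem for $D^{-1/2} L D^{-1/2}$, and the vertices are split by the sign of the second eigenvector (thresholding at zero is one simple choice; the next slide discusses searching for a better splitting point). The function name `fiedler_split` and the dumbbell test graph are my own illustration:

```python
import numpy as np

def fiedler_split(W):
    """Two-way split from the second smallest eigenpair of Ly = lambda*D*y.

    Solved via the normalized Laplacian D^{-1/2} L D^{-1/2}, whose
    eigenvector v relates to the generalized one by y = D^{-1/2} v.
    """
    d = W.sum(axis=1)
    L = np.diag(d) - W
    d_isqrt = 1.0 / np.sqrt(d)
    Ln = d_isqrt[:, None] * L * d_isqrt[None, :]
    vals, vecs = np.linalg.eigh(Ln)      # eigenvalues in ascending order
    y = d_isqrt * vecs[:, 1]             # second smallest eigenpair
    return y >= 0                        # split at zero (one simple choice)

# Two triangles joined by a single bridge edge: the triangles should separate
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
print(fiedler_split(W))
```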
II.2 Normalized cut

Image segmentation example (figures omitted): the original graph, a heatmap of the eigenvectors, and the result of min Ncut.
II.3 Spectral clustering

Remaining issues with Ncut:
- Once the indicator vector is computed, how do we search for the splitting point such that the resulting partition has the minimal $\mathrm{Ncut}(A)$ value?
- How do we use the extreme eigenvectors to do a $k$-way partitioning?

Both problems are addressed by the spectral clustering algorithm.
II.3 Spectral clustering

Spectral clustering algorithm [Ng, Jordan & Weiss, 2001]. Given a weighted graph $G = (V, W)$:
1. compute the normalized Laplacian $L_n = D^{-1/2}(D - W)D^{-1/2}$;
2. find the eigenvectors $X = [x_1, \ldots, x_k]$ corresponding to the $k$ smallest eigenvalues of $L_n$;
3. form $Y \in \mathbb{R}^{n \times k}$ by normalizing each row of $X$, so that $Y(i,:) = X(i,:)/\|X(i,:)\|$;
4. treat each row $Y(i,:)$ as a point in $\mathbb{R}^k$ and cluster the rows into $k$ clusters via K-means, producing labels $c_i \in \{1, \ldots, k\}$.

The label $c_i$ indicates the cluster that $v_i$ belongs to.
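The four steps above can be sketched compactly in numpy. This is an illustrative implementation under my own assumptions: the tiny Lloyd's-iteration loop with a deterministic initialization stands in for a full K-means routine, and the two-clique demo graph is mine; it assumes all degrees are positive:

```python
import numpy as np

def spectral_clustering(W, k, n_iter=50):
    """Sketch of the slide's algorithm: normalized Laplacian + K-means."""
    d = W.sum(axis=1)
    d_isqrt = 1.0 / np.sqrt(d)
    # Step 1: normalized Laplacian L_n = D^{-1/2} (D - W) D^{-1/2}
    Ln = d_isqrt[:, None] * (np.diag(d) - W) * d_isqrt[None, :]
    # Step 2: eigenvectors of the k smallest eigenvalues (eigh: ascending)
    _, vecs = np.linalg.eigh(Ln)
    X = vecs[:, :k]
    # Step 3: normalize each row of X
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Step 4: minimal Lloyd's K-means (deterministic init: k spread-out rows)
    centers = Y[np.linspace(0, len(Y) - 1, k).astype(int)]
    for _ in range(n_iter):
        labels = np.argmin(((Y[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = Y[labels == c].mean(axis=0)
    return labels

# Demo: two 4-vertex cliques joined by one weak edge; expect the cliques back
W = np.zeros((8, 8))
W[:4, :4] = 1.0
W[4:, 4:] = 1.0
np.fill_diagonal(W, 0.0)
W[3, 4] = W[4, 3] = 0.1
print(spectral_clustering(W, 2))
```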
II.3 Spectral clustering

Synthetic example (figures omitted): the original data, the computed eigenvectors, and the 2-way, 3-way, and 4-way clusterings.
References

1. Jean Gallier, Notes on Elementary Spectral Graph Theory: Applications to Graph Clustering Using Normalized Cuts, 2013.
2. Jianbo Shi and Jitendra Malik, Normalized Cuts and Image Segmentation, 2000.
3. Andrew Y. Ng, Michael I. Jordan, and Yair Weiss, On Spectral Clustering: Analysis and an Algorithm, 2001.