
Introduction to Spectral Graph Theory and Graph Clustering
Chengming Jiang
ECS 231, Spring 2016, University of California, Davis

Motivation: image partitioning in computer vision

Motivation: community detection in network analysis

Outline
I. Graph and graph Laplacian
   - Graph
   - Weighted graph
   - Graph Laplacian
II. Graph clustering
   - Graph clustering
   - Normalized cut
   - Spectral clustering

I.1 Graph

An (undirected) graph is a pair $G = (V, E)$, where $V = \{v_i\}$ is a set of vertices and $E$ is a subset of $V \times V$.

Remarks:
- An edge is a pair $\{u, v\}$ with $u \neq v$ (no self-loops);
- There is at most one edge between $u$ and $v$ (simple graph).

I.1 Graph

For every vertex $v \in V$, the degree $d(v)$ of $v$ is the number of edges adjacent to $v$:
$$d(v) = |\{u \in V \mid \{u, v\} \in E\}|.$$
Let $d_i = d(v_i)$; the degree matrix is $D = D(G) = \mathrm{diag}(d_1, \ldots, d_n)$. For the example graph:
$$D = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}.$$

I.1 Graph

Given a graph $G = (V, E)$ with $|V| = n$ and $|E| = m$, the incidence matrix $D(G)$ of $G$ is an $n \times m$ matrix with
$$d_{ij} = \begin{cases} 1 & \text{if } \exists k \text{ s.t. } e_j = \{v_i, v_k\} \\ 0 & \text{otherwise.} \end{cases}$$
(The symbol $D$ is overloaded here; whether it denotes the incidence matrix or the degree matrix is clear from context.) For the example graph:
$$D(G) = \begin{array}{c|ccccc} & e_1 & e_2 & e_3 & e_4 & e_5 \\ \hline v_1 & 1 & 1 & 0 & 0 & 0 \\ v_2 & 1 & 0 & 1 & 1 & 0 \\ v_3 & 0 & 1 & 1 & 0 & 1 \\ v_4 & 0 & 0 & 0 & 1 & 1 \end{array}$$

I.1 Graph

Given a graph $G = (V, E)$ with $|V| = n$ and $|E| = m$, the adjacency matrix $A(G)$ of $G$ is a symmetric $n \times n$ matrix with
$$a_{ij} = \begin{cases} 1 & \text{if } \{v_i, v_j\} \in E \\ 0 & \text{otherwise.} \end{cases}$$
For the example graph:
$$A(G) = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix}$$
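
As a quick sanity check (a minimal NumPy sketch, not part of the slides), the degree matrix from the earlier slide can be recovered from this adjacency matrix by summing its rows:

```python
import numpy as np

# Adjacency matrix of the 4-vertex example graph
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]])

d = A.sum(axis=1)   # vertex degrees: [2, 3, 3, 2]
D = np.diag(d)      # degree matrix diag(2, 3, 3, 2), matching the earlier slide
```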

I.2 Weighted graph

A weighted graph is a pair $G = (V, W)$, where $V = \{v_i\}$ is a set of vertices with $|V| = n$, and $W \in \mathbb{R}^{n \times n}$ is the weight matrix with
$$w_{ij} = \begin{cases} w_{ji} \ge 0 & \text{if } i \neq j \\ 0 & \text{if } i = j. \end{cases}$$
The underlying graph of $G$ is $\hat{G} = (V, E)$ with $E = \{\{v_i, v_j\} \mid w_{ij} > 0\}$. If $w_{ij} \in \{0, 1\}$, then $W = A$, the adjacency matrix of $\hat{G}$. Since $w_{ii} = 0$, there are no self-loops in $\hat{G}$.

I.2 Weighted graph

For every vertex $v_i \in V$, the degree $d(v_i)$ of $v_i$ is the sum of the weights of the edges adjacent to $v_i$:
$$d(v_i) = \sum_{j=1}^n w_{ij}.$$
Let $d_i = d(v_i)$; the degree matrix is $D = D(G) = \mathrm{diag}(d_1, \ldots, d_n)$.

Remark: let $d = \mathrm{diag}(D)$ (the vector of degrees) and denote $\mathbf{1} = (1, \ldots, 1)^T$; then $d = W\mathbf{1}$.

I.2 Weighted graph

Given a subset of vertices $A \subseteq V$, we define the volume $\mathrm{vol}(A)$ by
$$\mathrm{vol}(A) = \sum_{v_i \in A} d(v_i) = \sum_{v_i \in A} \sum_{j=1}^n w_{ij}.$$
Remark: if $\mathrm{vol}(A) = 0$, all the vertices in $A$ are isolated.

Example: if $A = \{v_1, v_3\}$, then
$$\mathrm{vol}(A) = d(v_1) + d(v_3) = (w_{12} + w_{13}) + (w_{31} + w_{32} + w_{34}).$$

I.2 Weighted graph

Given two subsets of vertices $A, B \subseteq V$, we define the links $\mathrm{links}(A, B)$ by
$$\mathrm{links}(A, B) = \sum_{v_i \in A,\, v_j \in B} w_{ij}.$$
Remarks:
- $A$ and $B$ are not necessarily distinct;
- since $W$ is symmetric, $\mathrm{links}(A, B) = \mathrm{links}(B, A)$;
- $\mathrm{vol}(A) = \mathrm{links}(A, V)$.

I.2 Weighted graph

The quantity $\mathrm{cut}(A)$ is defined by
$$\mathrm{cut}(A) = \mathrm{links}(A, V \setminus A).$$
The quantity $\mathrm{assoc}(A)$ is defined by
$$\mathrm{assoc}(A) = \mathrm{links}(A, A).$$
Remarks:
- $\mathrm{cut}(A)$ measures how many links escape from $A$;
- $\mathrm{assoc}(A)$ measures how many links stay within $A$;
- $\mathrm{cut}(A) + \mathrm{assoc}(A) = \mathrm{vol}(A)$.
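
These set functions are one-liners in NumPy; a small sketch (reusing the 0/1 example graph as the weight matrix, an assumption for illustration) that also checks the last identity:

```python
import numpy as np

def links(W, A, B):
    """links(A, B): sum of w_ij over v_i in A, v_j in B."""
    return W[np.ix_(A, B)].sum()

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

n = W.shape[0]
A    = [0, 2]                                  # A = {v1, v3}, 0-based indices
Abar = [i for i in range(n) if i not in A]     # complement of A

vol   = links(W, A, range(n))                  # vol(A)   = links(A, V)
cut   = links(W, A, Abar)                      # cut(A)   = links(A, V minus A)
assoc = links(W, A, A)                         # assoc(A) = links(A, A)
assert np.isclose(cut + assoc, vol)            # cut(A) + assoc(A) = vol(A)
```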

I.3 Graph Laplacian

Given a weighted graph $G = (V, W)$, the (graph) Laplacian $L$ of $G$ is defined by
$$L = D - W,$$
where $D$ is the degree matrix of $G$.

Remark: $D = \mathrm{diag}(W\mathbf{1})$.

I.3 Graph Laplacian

Properties of the Laplacian:
1. $x^T L x = \frac{1}{2} \sum_{i,j=1}^n w_{ij}(x_i - x_j)^2$ for all $x \in \mathbb{R}^n$;
2. $L \succeq 0$ if $w_{ij} \ge 0$ for all $i, j$;
3. $L\mathbf{1} = 0$;
4. if the underlying graph of $G$ is connected, then $0 = \lambda_1 < \lambda_2 \le \lambda_3 \le \cdots \le \lambda_n$;
5. if the underlying graph of $G$ is connected, then the dimension of the nullspace of $L$ is 1.

I.3 Graph Laplacian

Proofs. Property 1: since $L = D - W$, we have
$$
x^T L x = x^T D x - x^T W x
= \sum_{i=1}^n d_i x_i^2 - \sum_{i,j=1}^n w_{ij} x_i x_j
$$
$$
= \frac{1}{2}\Big( \sum_{i=1}^n d_i x_i^2 - 2\sum_{i,j=1}^n w_{ij} x_i x_j + \sum_{j=1}^n d_j x_j^2 \Big)
$$
$$
= \frac{1}{2}\Big( \sum_{i,j=1}^n w_{ij} x_i^2 - 2\sum_{i,j=1}^n w_{ij} x_i x_j + \sum_{i,j=1}^n w_{ij} x_j^2 \Big)
= \frac{1}{2} \sum_{i,j=1}^n w_{ij} (x_i - x_j)^2.
$$

I.3 Graph Laplacian

Property 2: since $L^T = D - W^T = D - W = L$, $L$ is symmetric. Since $x^T L x = \frac{1}{2}\sum_{i,j=1}^n w_{ij}(x_i - x_j)^2$ and $w_{ij} \ge 0$ for all $i, j$, we have $x^T L x \ge 0$.

Property 3: $L\mathbf{1} = (D - W)\mathbf{1} = D\mathbf{1} - W\mathbf{1} = d - d = 0$.

Properties 4 and 5: skipped for now; see Section 2.2 of [Gallier, 2013].
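
These properties are easy to probe numerically; a short sketch (the example weight matrix is an assumption carried over from above) checking properties 1, 3, and 4:

```python
import numpy as np

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
n = W.shape[0]
L = np.diag(W.sum(axis=1)) - W            # L = D - W

rng = np.random.default_rng(0)
x = rng.standard_normal(n)
quad = 0.5 * sum(W[i, j] * (x[i] - x[j])**2
                 for i in range(n) for j in range(n))
assert np.isclose(x @ L @ x, quad)        # property 1

assert np.allclose(L @ np.ones(n), 0)     # property 3: L 1 = 0

lam = np.linalg.eigvalsh(L)               # eigenvalues in ascending order
print(lam)  # lam[0] ~ 0 < lam[1], as in property 4 (the graph is connected)
```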

Outline
I. Graph and graph Laplacian
   - Graph
   - Weighted graph
   - Graph Laplacian
II. Graph clustering
   - Graph clustering
   - Normalized cut
   - Spectral clustering

II.1 Graph clustering

k-way partitioning: given a weighted graph $G = (V, W)$, find a partition $A_1, A_2, \ldots, A_k$ of $V$ such that
- $A_1 \cup A_2 \cup \cdots \cup A_k = V$;
- $A_i \cap A_j = \emptyset$ for $i \neq j$;
- for any $i \neq j$, the edges between $(A_i, A_j)$ have low weight and the edges within $A_i$ have high weight.
If $k = 2$, it is a two-way partitioning.

II.1 Graph clustering

Recall the (two-way) cut:
$$\mathrm{cut}(A) = \mathrm{links}(A, \bar{A}) = \sum_{v_i \in A,\, v_j \in \bar{A}} w_{ij},$$
where $\bar{A} = V \setminus A$.

II.1 Graph clustering problems

The mincut problem is
$$\min_A \mathrm{cut}(A) = \min_A \sum_{v_i \in A,\, v_j \in \bar{A}} w_{ij}.$$
In practice, the mincut easily yields unbalanced partitions. In the pictured example, $\min \mathrm{cut}(A) = 1 + 2 = 3$.

II.2 Normalized cut

The normalized cut [Shi and Malik, 2000] is defined by
$$\mathrm{Ncut}(A) = \frac{\mathrm{cut}(A)}{\mathrm{vol}(A)} + \frac{\mathrm{cut}(\bar{A})}{\mathrm{vol}(\bar{A})}.$$
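
Ncut is a direct transcription of this formula; a minimal sketch (the boolean-mask layout is an implementation choice, not from the slides):

```python
import numpy as np

def ncut(W, A):
    """Normalized cut of the two-way partition (A, complement of A)."""
    n = W.shape[0]
    mask = np.zeros(n, dtype=bool)
    mask[A] = True
    cut  = W[np.ix_(mask, ~mask)].sum()   # cut(A), equal to cut(Abar) by symmetry of W
    volA = W[mask].sum()                  # vol(A)
    volB = W[~mask].sum()                 # vol(Abar)
    return cut / volA + cut / volB
```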

II.2 Normalized cut

Minimal Ncut: $\min_A \mathrm{Ncut}(A)$. For the pictured example,
$$\min \mathrm{Ncut}(A) = \frac{4}{3+6+6+3} + \frac{4}{3+6+6+3} = \frac{4}{9}.$$

II.2 Normalized cut

1. Let $x = (x_1, \ldots, x_n)^T$ be the indicator vector such that
$$x_i = \begin{cases} 1 & \text{if } v_i \in A \\ -1 & \text{if } v_i \in \bar{A}. \end{cases}$$
2. Then
$$(\mathbf{1} + x)^T D (\mathbf{1} + x) = 4\sum_{v_i \in A} d_i = 4\,\mathrm{vol}(A);$$
$$(\mathbf{1} + x)^T W (\mathbf{1} + x) = 4\sum_{v_i \in A,\, v_j \in A} w_{ij} = 4\,\mathrm{assoc}(A);$$
$$(\mathbf{1} + x)^T L (\mathbf{1} + x) = 4\,(\mathrm{vol}(A) - \mathrm{assoc}(A)) = 4\,\mathrm{cut}(A);$$
and similarly
$$(\mathbf{1} - x)^T D (\mathbf{1} - x) = 4\,\mathrm{vol}(\bar{A}); \quad
(\mathbf{1} - x)^T W (\mathbf{1} - x) = 4\,\mathrm{assoc}(\bar{A}); \quad
(\mathbf{1} - x)^T L (\mathbf{1} - x) = 4\,\mathrm{cut}(\bar{A}).$$
(Please verify after class.)
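
Since the slide leaves the verification as an exercise, here is a quick numeric check on the running example (the choice $A = \{v_1, v_3\}$ is arbitrary):

```python
import numpy as np

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))
L = D - W
one = np.ones(4)

x = np.array([1.0, -1.0, 1.0, -1.0])       # indicator of A = {v1, v3}
mask = x > 0
volA   = W[mask].sum()                     # vol(A)
assocA = W[np.ix_(mask, mask)].sum()       # assoc(A)
cutA   = volA - assocA                     # cut(A)

assert np.isclose((one + x) @ D @ (one + x), 4 * volA)
assert np.isclose((one + x) @ W @ (one + x), 4 * assocA)
assert np.isclose((one + x) @ L @ (one + x), 4 * cutA)
```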

II.2 Normalized cut

3. $\mathrm{Ncut}(A)$ can now be written as
$$\mathrm{Ncut}(A) = \frac{1}{4}\left(\frac{(\mathbf{1}+x)^T L (\mathbf{1}+x)}{k\,\mathbf{1}^T D \mathbf{1}} + \frac{(\mathbf{1}-x)^T L (\mathbf{1}-x)}{(1-k)\,\mathbf{1}^T D \mathbf{1}}\right)
= \frac{1}{4}\,\frac{\big((\mathbf{1}+x) - b(\mathbf{1}-x)\big)^T L \big((\mathbf{1}+x) - b(\mathbf{1}-x)\big)}{b\,\mathbf{1}^T D \mathbf{1}},$$
where $k = \mathrm{vol}(A)/\mathrm{vol}(V)$, $b = k/(1-k)$ and $\mathrm{vol}(V) = \mathbf{1}^T D \mathbf{1}$.
4. Let $y = (\mathbf{1}+x) - b(\mathbf{1}-x)$; then
$$\mathrm{Ncut}(A) = \frac{1}{4}\,\frac{y^T L y}{b\,\mathbf{1}^T D \mathbf{1}},
\quad\text{where } y_i = \begin{cases} 2 & \text{if } v_i \in A \\ -2b & \text{if } v_i \in \bar{A}. \end{cases}$$

II.2 Normalized cut

5. Since $b = k/(1-k) = \mathrm{vol}(A)/\mathrm{vol}(\bar{A})$, we have
$$\frac{1}{4}\, y^T D y = \sum_{v_i \in A} d_i + b^2 \sum_{v_i \in \bar{A}} d_i
= \mathrm{vol}(A) + b^2\,\mathrm{vol}(\bar{A})
= b\,(\mathrm{vol}(\bar{A}) + \mathrm{vol}(A)) = b\,\mathbf{1}^T D \mathbf{1}.$$
In addition,
$$y^T D \mathbf{1} = y^T d = 2\sum_{v_i \in A} d_i - 2b \sum_{v_i \in \bar{A}} d_i
= 2\,\mathrm{vol}(A) - 2b\,\mathrm{vol}(\bar{A}) = 0.$$
Combining with step 4, $\mathrm{Ncut}(A) = y^T L y / (y^T D y)$.

II.2 Normalized cut

6. The minimal normalized cut therefore solves the binary optimization problem
$$y^* = \arg\min_y \frac{y^T L y}{y^T D y} \quad \text{s.t. } y_i \in \{2, -2b\},\; y^T D \mathbf{1} = 0. \tag{1}$$
7. Relaxation:
$$y^* = \arg\min_y \frac{y^T L y}{y^T D y} \quad \text{s.t. } y \in \mathbb{R}^n,\; y^T D \mathbf{1} = 0. \tag{2}$$

II.2 Normalized cut

Variational principle. Let $A, B \in \mathbb{R}^{n \times n}$ with $A^T = A$, $B^T = B \succ 0$, and let $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$ be the eigenvalues of $Au = \lambda Bu$ with corresponding eigenvectors $u_1, u_2, \ldots, u_n$. Then
$$\min_x \frac{x^T A x}{x^T B x} = \lambda_1, \qquad \arg\min_x \frac{x^T A x}{x^T B x} = u_1,$$
and
$$\min_{x^T B u_1 = 0} \frac{x^T A x}{x^T B x} = \lambda_2, \qquad \arg\min_{x^T B u_1 = 0} \frac{x^T A x}{x^T B x} = u_2.$$
A more general form exists.
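
SciPy's symmetric-definite generalized eigensolver makes this easy to see in action; a small sketch with randomly generated $A$ and $B$ (the construction $B = NN^T + nI$ is just one way to guarantee positive definiteness):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
n = 5
M = rng.standard_normal((n, n))
A = M + M.T                      # symmetric A
N = rng.standard_normal((n, n))
B = N @ N.T + n * np.eye(n)      # symmetric positive definite B

lam, U = eigh(A, B)              # generalized eigenpairs, ascending order
u1 = U[:, 0]
# The Rayleigh quotient of u1 attains the smallest eigenvalue:
assert np.isclose(u1 @ A @ u1 / (u1 @ B @ u1), lam[0])
```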

II.2 Normalized cut

For the matrix pair $(L, D)$, it is known that $(\lambda_1, y_1) = (0, \mathbf{1})$. By the variational principle, the relaxed minimal Ncut (2) is equivalent to finding the second smallest eigenpair $(\lambda_2, y_2)$ of
$$L y = \lambda D y. \tag{3}$$
Remarks:
- $L$ is extremely sparse and $D$ is diagonal;
- the precision requirement for the eigenvectors is low, say $O(10^{-3})$.
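
A sketch of solving (3) with a sparse eigensolver (the sign split at the end is one common splitting heuristic; the next section discusses choosing the splitting point more carefully):

```python
import numpy as np
from scipy.sparse import csr_matrix, diags
from scipy.sparse.linalg import eigsh

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
d = W.sum(axis=1)
L = csr_matrix(np.diag(d) - W)
D = diags(d)

# Two smallest eigenpairs of L y = lambda D y; the tiny negative shift keeps
# the shift-invert factorization nonsingular (L itself is singular).
vals, vecs = eigsh(L, k=2, M=D, sigma=-1e-6, which='LM')
y2 = vecs[:, 1]                       # relaxed Ncut indicator (Fiedler-type vector)
labels = (y2 > 0).astype(int)         # sign split into A and its complement
```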

II.2 Normalized cut

Image segmentation: original graph

II.2 Normalized cut

Image segmentation: heatmap of eigenvectors

II.2 Normalized cut

Image segmentation: result of min Ncut

II.3 Spectral clustering

Remaining issues with Ncut:
- Once the indicator vector is computed, how do we search for the splitting point so that the resulting partition has the minimal $\mathrm{Ncut}(A)$ value?
- How do we use the extreme eigenvectors to do k-way partitioning?
These two problems are addressed by the spectral clustering algorithm.

II.3 Spectral clustering

Spectral clustering algorithm [Ng et al., 2001]. Given a weighted graph $G = (V, W)$:
1. compute the normalized Laplacian $L_n = D^{-1/2}(D - W)D^{-1/2}$;
2. find the $k$ eigenvectors $X = [x_1, \ldots, x_k]$ corresponding to the $k$ smallest eigenvalues of $L_n$;
3. form $Y \in \mathbb{R}^{n \times k}$ by normalizing each row of $X$, i.e. $Y(i,:) = X(i,:)/\|X(i,:)\|$;
4. treat each row $Y(i,:)$ as a point and cluster them into $k$ clusters via k-means, with labels $c_i \in \{1, \ldots, k\}$.
The label $c_i$ indicates the cluster that $v_i$ belongs to.
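
A compact NumPy/SciPy sketch of these four steps (dense linear algebra for small graphs; it assumes no isolated vertices, and a sparse eigensolver would replace np.linalg.eigh at scale):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_clustering(W, k, seed=0):
    """Sketch of the algorithm on this slide: normalized Laplacian,
    k smallest eigenvectors, row normalization, then k-means."""
    d = W.sum(axis=1)
    D_isqrt = np.diag(1.0 / np.sqrt(d))               # D^{-1/2}; assumes d_i > 0
    Ln = D_isqrt @ (np.diag(d) - W) @ D_isqrt         # step 1: normalized Laplacian
    _, vecs = np.linalg.eigh(Ln)                      # eigenvalues in ascending order
    X = vecs[:, :k]                                   # step 2: k smallest eigenvectors
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)  # step 3: row-normalize
    _, labels = kmeans2(Y, k, minit='++', seed=seed)  # step 4: k-means on the rows
    return labels                                     # labels[i] in {0, ..., k-1}
```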

II.3 Spectral clustering

Synthetic example: original data

II.3 Spectral clustering

Synthetic example: computed eigenvectors

II.3 Spectral clustering

Synthetic example: 2-way clustering

II.3 Spectral clustering

Synthetic example: 3-way clustering

II.3 Spectral clustering

Synthetic example: 4-way clustering

References
1. Jean Gallier, Notes on Elementary Spectral Graph Theory: Applications to Graph Clustering Using Normalized Cuts, 2013.
2. Jianbo Shi and Jitendra Malik, Normalized Cuts and Image Segmentation, 2000.
3. Andrew Y. Ng, Michael I. Jordan, and Yair Weiss, On Spectral Clustering: Analysis and an Algorithm, 2001.