COMPSCI 514: Algorithms for Data Science
Arya Mazumdar
University of Massachusetts at Amherst
Fall 2018

Lecture 8: Spectral Clustering

Spectral clustering: curse of dimensionality and dimensionality reduction

- $A$: $n \times d$ data matrix.
- Find the space $V$ formed by the top $k$ (right) singular vectors.
- Project $A$ onto $V$, obtaining $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$.
- Cluster the projected points (a total of $n$ $k$-dimensional points).
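As a quick illustration, here is a minimal sketch of this pipeline in Python (numpy and scikit-learn assumed; the helper name `spectral_project_and_cluster` and the use of k-means are illustrative choices, not fixed by the slides):

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_project_and_cluster(A, k):
    """Project the rows of A onto the top-k right singular vectors, then cluster.

    A: n x d data matrix. Returns cluster labels for the n projected points.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # A_k = sum_{i=1}^k sigma_i u_i v_i^T; its rows, written in the
    # coordinates of V, are the n projected k-dimensional points.
    coords = U[:, :k] * s[:k]
    return KMeans(n_clusters=k, n_init=10).fit_predict(coords)
```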

Benefits of projection

[Figure 7.3: Clusters in the full space and their projections (courtesy: the textbook)]

Benefits of projection

- $A$: $n \times d$ data matrix.
- $C$: $n \times d$ matrix whose $i$th row is the center of the cluster to which $a_i$, the corresponding row of $A$, belongs.
- The rank of $C$ is $k$.
- $\sum_{i=1}^{n} \|a_i - c_i\|_2^2 = \|A - C\|_F^2$
- $A_k$: projection of $A$ onto the first $k$ singular vectors.

Projection may not lead to data loss

Note that
$$\|A_k - C\|_F \le \|A_k - A\|_F + \|A - C\|_F \le 2\|A - C\|_F,$$
where the first step is the triangle inequality and the second uses the fact that $A_k$ is the closest rank-$k$ matrix to $A$, while $C$ also has rank $k$.

A good clustering for $A_k$ is a good clustering for $A$.
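A small numeric sanity check of this chain of inequalities (a sketch; the random data and the half-and-half choice of centers are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 20))

# A rank-2 "cluster center" matrix C: one shared center per half of the rows.
k = 2
C = np.zeros_like(A)
C[:50] = A[:50].mean(axis=0)
C[50:] = A[50:].mean(axis=0)

# A_k: the best rank-k approximation of A (truncated SVD).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k]

fro = np.linalg.norm                                # Frobenius norm by default
assert fro(A_k - A) <= fro(A - C)                   # A_k is the closest rank-k matrix to A
assert fro(A_k - C) <= fro(A_k - A) + fro(A - C)    # triangle inequality
assert fro(A_k - C) <= 2 * fro(A - C)               # the bound from the slide
```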

Spectral algorithms for graph clustering

- Find clusters in social networks; find communities on the internet.
- Partition a graph.
- A cut divides the graph in two; we minimize the number of edges across the cut.

10.4.1 What Makes a Good Partition?

How to cut: Given a graph, we would like to divide the nodes into two sets so that the cut, or set of edges that connect nodes in different sets, is minimized. However, we also want to constrain the selection of the cut so that the two sets are approximately equal in size. The next example illustrates the point.

Example 10.14: Recall our running example of the graph in Fig. 10.1. There, it is evident that the best partition puts {A, B, C} in one set and {D, E, F, G} in the other. The cut consists only of the edge (B, D) and is of size 1. No nontrivial cut can be smaller.

[Figure 10.11: The smallest cut might not be the best cut]

The smallest cut is not the best cut: In Fig. 10.11 is a variant of our example, where we have added the node H and two extra edges, (H, C) and (C, G). If all we wanted was to minimize the size of the cut, then the best choice would be to put H in one set and all the other nodes in the other set. But it should be apparent that if we reject partitions where one set is too small, then the best we can do is to use the cut consisting of edges (B, D) and (C, G), which partitions the graph into two equal-sized sets {A, B, C, H} and {D, E, F, G}.

10.4.2 Normalized Cuts

A proper definition of a good cut must balance the size of the cut itself against the difference in the sizes of the sets that the cut creates. One choice that serves well is the normalized cut, defined next.

How to cut

Suppose a cut partitions a graph into two parts $S$ and $T$.
- $\mathrm{Vol}(S)$: number of edges with at least one end in $S$.
- $\mathrm{Cut}(S, T)$: number of edges with one end in $S$ and the other end in $T$.

Normalized cut:
$$\frac{\mathrm{Cut}(S, T)}{\mathrm{Vol}(S)} + \frac{\mathrm{Cut}(S, T)}{\mathrm{Vol}(T)}$$
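In code, the normalized cut can be computed directly from the adjacency matrix. The sketch below (the helper `normalized_cut` is illustrative) also re-checks the point of Fig. 10.11, where the balanced cut beats the smallest cut:

```python
import numpy as np

def normalized_cut(A, in_S):
    """Normalized cut of a partition (S, T), with Vol as defined above.

    A: symmetric 0/1 adjacency matrix; in_S: boolean mask of the nodes in S.
    Vol(S) = number of edges with at least one endpoint in S.
    """
    in_T = ~in_S
    cut = A[np.ix_(in_S, in_T)].sum()                       # Cut(S, T)
    vol_S = A[in_S].sum() - A[np.ix_(in_S, in_S)].sum() / 2
    vol_T = A[in_T].sum() - A[np.ix_(in_T, in_T)].sum() / 2
    return cut / vol_S + cut / vol_T

# The graph of Fig. 10.11 (nodes A..H as 0..7, H = 7).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (3, 4), (3, 5), (3, 6),
         (4, 5), (5, 6), (7, 2), (2, 6)]
A = np.zeros((8, 8)); A[tuple(zip(*edges))] = 1; A += A.T

S = np.zeros(8, dtype=bool); S[7] = True              # the smallest cut: {H} alone
print(normalized_cut(A, S))                           # ~1.09
S = np.zeros(8, dtype=bool); S[[0, 1, 2, 7]] = True   # the best cut: {A, B, C, H}
print(normalized_cut(A, S))                           # ~0.62: smaller is better
```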

How to cut

- A cut is a partition into two parts.
- Let one part be the positive part and the other the negative part.
- Consider a membership vector $x \in \mathbb{R}^n$.
- The sign of the entry $x_i$ denotes whether node $i$ is in the positive or the negative cluster.

How to cut: assumptions on $x$

- We assume there are approximately the same number of positive and negative entries (or the weight of the positive part equals the weight of the negative part): $\sum_i x_i = 0$.
- If there exists an edge between nodes $i$ and $j$, then $x_i$ and $x_j$ are likely to have the same sign: for an edge $(i, j)$, $(x_i - x_j)^2$ should be small.

10.4.3 Some Matrices That Describe Graphs

Describing a graph: To develop the theory of how matrix algebra can help us find good graph partitions, we first need to learn about three different matrices that describe aspects of a graph. The first should be familiar: the adjacency matrix, which has a 1 in row i and column j if there is an edge between nodes i and j, and 0 otherwise.

[Figure 10.12: Repeat of the graph of Fig. 10.1, with nodes A through G]

Example 10.16: We repeat our running example graph in Fig. 10.12. Its adjacency matrix appears in Fig. 10.13. Note that the rows and columns correspond to the nodes A, B, ..., G in that order. For example, the edge (B, D) is reflected by the fact that the entry in row 2 and column 4 is 1, and so is the entry in row 4 and column 2.

Adjacency matrix (Fig. 10.13):

0 1 1 0 0 0 0
1 0 1 1 0 0 0
1 1 0 0 0 0 0
0 1 0 0 1 1 1
0 0 0 1 0 1 0
0 0 0 1 1 0 1
0 0 0 1 0 1 0

The second matrix we need is the degree matrix for a graph. This matrix has nonzero entries only on the diagonal: the entry for row and column i is the degree of the ith node.

Describing a graph: the degree matrix

Example 10.17: The degree matrix for the graph of Fig. 10.12 is shown in Fig. 10.14. We use the same order of the nodes as in Example 10.16. For instance, the entry in row 4 and column 4 is 4, because node D has edges to four other nodes. The entry in row 4 and column 5 is 0, because that entry is not on the diagonal.

Degree matrix D (Fig. 10.14):

2 0 0 0 0 0 0
0 3 0 0 0 0 0
0 0 2 0 0 0 0
0 0 0 4 0 0 0
0 0 0 0 2 0 0
0 0 0 0 0 3 0
0 0 0 0 0 0 2

Describing a graph: the graph Laplacian

The third matrix is the Laplacian matrix: $L = D - A$.

Laplacian matrix for Fig. 10.12 (Fig. 10.15):

 2 -1 -1  0  0  0  0
-1  3 -1 -1  0  0  0
-1 -1  2  0  0  0  0
 0 -1  0  4 -1 -1 -1
 0  0  0 -1  2 -1  0
 0  0  0 -1 -1  3 -1
 0  0  0 -1  0 -1  2
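All three matrices are easy to reproduce, for example in numpy (a sketch; nodes A through G are numbered 0 through 6, with the edge list read off the adjacency matrix above):

```python
import numpy as np

# Running example (Fig. 10.12): nodes A..G as 0..6, edges from Fig. 10.13.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (3, 4), (3, 5), (3, 6), (4, 5), (5, 6)]
n = 7
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1          # adjacency matrix (Fig. 10.13)
D = np.diag(A.sum(axis=1))         # degree matrix (Fig. 10.14)
L = D - A                          # Laplacian matrix (Fig. 10.15)
print(L)
```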

The graph Laplacian

- The smallest eigenvalue of $L$ is zero and corresponds to the eigenvector $\frac{1}{\sqrt{n}}[1\ 1\ \dots\ 1]^T$.
- What is the second smallest eigenvalue-eigenvector pair of $L$?
- $L$ is a symmetric positive semidefinite matrix (which means that for any $x$, $x^T L x \ge 0$); we will see why shortly.
- It has an orthonormal set of eigenvectors.

The graph Laplacian

- The smallest eigenvalue of $L$ is zero and corresponds to the eigenvector $\frac{1}{\sqrt{n}}[1\ 1\ \dots\ 1]^T$.
- The second (smallest) eigenvector is orthogonal to the smallest eigenvector, so it must satisfy $\frac{1}{\sqrt{n}}[1\ 1\ \dots\ 1]\, x = 0$, i.e., $\sum_i x_i = 0$.
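Both facts are easy to confirm numerically on the running example (a sketch, rebuilding $L$ as above):

```python
import numpy as np

# The running example graph (Fig. 10.12, nodes A..G as 0..6).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (3, 4), (3, 5), (3, 6), (4, 5), (5, 6)]
A = np.zeros((7, 7)); A[tuple(zip(*edges))] = 1; A += A.T
L = np.diag(A.sum(axis=1)) - A

vals, vecs = np.linalg.eigh(L)      # eigh: symmetric matrices, ascending order
assert vals.min() > -1e-9           # all eigenvalues >= 0, so L is PSD
assert np.isclose(vals[0], 0)       # the smallest eigenvalue is 0 ...
# ... with the constant unit vector (1/sqrt(n)) [1 ... 1]^T as its eigenvector:
assert np.allclose(np.abs(vecs[:, 0]), 1 / np.sqrt(7))
```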

The second (smallest) eigenvector of the Laplacian

Minimize $x^T L x$ such that $\sum_i x_i^2 = 1$ and $\sum_i x_i = 0$.

The second (smallest) eigenvector of the Laplacian: meaning

$V = \{1, 2, \dots, n\}$: set of vertices; $E$: set of edges.

Note that
$$x^T L x = x^T D x - x^T A x = \sum_{i=1}^{n} d_i x_i^2 - \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} x_i x_j = \sum_{(i,j) \in E} (x_i^2 + x_j^2) - 2 \sum_{(i,j) \in E} x_i x_j = \sum_{(i,j) \in E} (x_i - x_j)^2.$$

Note: this is what we wanted to optimize. The second smallest eigenvector of $L$ gives a good cluster membership vector.
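This identity can be checked numerically for an arbitrary $x$ (a sketch, again on the running example):

```python
import numpy as np

# The running example graph again (Fig. 10.12, nodes A..G as 0..6).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (3, 4), (3, 5), (3, 6), (4, 5), (5, 6)]
A = np.zeros((7, 7)); A[tuple(zip(*edges))] = 1; A += A.T
L = np.diag(A.sum(axis=1)) - A

x = np.random.default_rng(1).normal(size=7)
# x^T L x equals the sum of (x_i - x_j)^2 over the edges (i, j):
assert np.isclose(x @ L @ x, sum((x[i] - x[j]) ** 2 for i, j in edges))
```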

The spectral clustering for graphs

- Find the second (smallest) eigenvector of the Laplacian: minimize $x^T L x$ such that $\sum_i x_i^2 = 1$ and $\sum_i x_i = 0$.
- Assign node $i$ to the cluster $\mathrm{sign}(x_i)$.
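Putting the pieces together, a minimal sketch of the whole algorithm (the helper name `spectral_bipartition` is illustrative):

```python
import numpy as np

def spectral_bipartition(A):
    """Split a connected graph in two by the sign of the Laplacian's
    second-smallest eigenvector (the Fiedler vector).

    A: symmetric adjacency matrix. Returns a +1/-1 label per node.
    """
    L = np.diag(A.sum(axis=1)) - A            # L = D - A
    _, vecs = np.linalg.eigh(L)               # eigenvalues in ascending order
    x = vecs[:, 1]                            # second (smallest) eigenvector
    return np.where(x >= 0, 1, -1)            # node i -> cluster sign(x_i)
```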

Example: the spectral clustering for graphs

[Figure 10.16: Graph for illustrating partitioning by spectral analysis, with nodes 1 through 6]

We choose one set to be those nodes $i$ whose corresponding vector component $x_i$ is positive, and the other set to be those whose components are negative. This choice does not guarantee a partition into sets of equal size, but the sizes are likely to be close. We believe that the cut between the two sets will have a small number of edges, because $(x_i - x_j)^2$ is likely to be smaller if both $x_i$ and $x_j$ have the same sign than if they have different signs. Thus, minimizing $x^T L x$ under the required constraints will tend to give $x_i$ and $x_j$ the same sign if there is an edge $(i, j)$.

Example 10.19: Let us apply the above technique to the graph of Fig. 10.16. The Laplacian matrix for this graph is shown in Fig. 10.17. By standard methods or math packages we can find all the eigenvalues and eigenvectors of this matrix; we shall simply tabulate them in Fig. 10.18, from lowest eigenvalue to highest. Note that we have not scaled the eigenvectors to have length 1, but could do so easily if we wished.

Laplacian matrix $L = D - A$ (Fig. 10.17):

 3 -1 -1 -1  0  0
-1  2 -1  0  0  0
-1 -1  3  0  0 -1
-1  0  0  3 -1 -1
 0  0  0 -1  2 -1
 0  0 -1 -1 -1  3

Example: the spectral clustering for graphs (continued)

Eigenpairs of the Laplacian matrix L (Fig. 10.18), one column per eigenpair, from lowest eigenvalue to highest (eigenvectors not scaled to length 1):

Eigenvalue:  0    1    3    3    4    5
Node 1:      1    1   -5    1    1    1
Node 2:      1    2    4    2   -1    0
Node 3:      1    1    1   -3    1   -1
Node 4:      1   -1   -5    1   -1   -1
Node 5:      1   -2    4    2    1    0
Node 6:      1   -1    1   -3   -1    1

The second eigenvector has three positive and three negative components. It makes the unsurprising suggestion that one group should be {1, 2, 3}, the nodes with positive components, and the other group should be {4, 5, 6}.

Clusters: {1, 2, 3} and {4, 5, 6}
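Running the `spectral_bipartition` sketch from the previous slide on this six-node graph (edges read off the Laplacian in Fig. 10.17) reproduces the two clusters:

```python
import numpy as np

# Nodes 1..6 as 0..5; edges read off the Laplacian of Fig. 10.17.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 5), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6)); A[tuple(zip(*edges))] = 1; A += A.T

labels = spectral_bipartition(A)   # the sketch from the previous slide
print(labels)   # nodes {1, 2, 3} get one sign, nodes {4, 5, 6} the other
```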