MATH 567: Mathematical Techniques in Data Science, Clustering II


Spectral clustering: overview

MATH 567: Mathematical Techniques in Data Science, Clustering II
Dominique Guillot
Department of Mathematical Sciences, University of Delaware
May 8, 2017

This lecture is based on U. von Luxburg, "A Tutorial on Spectral Clustering", Statistics and Computing, 17 (4), 2007.

Overview of spectral clustering:

1. Construct a similarity matrix measuring the similarity of pairs of objects, e.g. s_ij = exp(-||x_i - x_j||^2 / (2σ^2)).
2. Use the similarity matrix to construct a (weighted or unweighted) graph.
3. Compute eigenvectors of the (normalized or unnormalized) graph Laplacian: L = D - W, L_sym = D^(-1/2) L D^(-1/2).
4. Construct a matrix containing the first K eigenvectors of L or L_sym as columns.
5. Each row of that matrix identifies a vertex of the graph with a point in R^K.
6. Cluster those points using the K-means algorithm. (A code sketch of this pipeline is given below, after the example and the first proposition.)

Example

Let us try to cluster the following graph. [The slide displays the adjacency matrix A of an 8-vertex graph with two clusters, together with its Laplacian L = D - W; the matrix entries and the signs of the eigenvector below were lost in transcription.] The second eigenvector of L,

v_2 = (±0.38577, ±0.4777, ±0.38577, ±0.38577, ±0.4777, ±0.38577, ±0.38577, ±0.38577)^T,

is constant in sign on each cluster, so thresholding its entries at 0 separates the two clusters.

The unnormalized Laplacian

Proposition: The matrix L satisfies the following properties:

1. For any f in R^n: f^T L f = (1/2) Σ_{i,j} w_ij (f_i - f_j)^2.
2. L is symmetric and positive semidefinite.
3. 0 is an eigenvalue of L, with the constant vector 1 as an associated eigenvector.

Proof: To prove (1), write

f^T L f = f^T D f - f^T W f
        = Σ_i d_i f_i^2 - Σ_{i,j} f_i f_j w_ij
        = (1/2) ( Σ_i d_i f_i^2 - 2 Σ_{i,j} f_i f_j w_ij + Σ_j d_j f_j^2 )
        = (1/2) Σ_{i,j} w_ij (f_i - f_j)^2.

(2) follows from (1). (3) is easy.
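For concreteness, here is a minimal sketch in Python of the six-step pipeline from the overview above, assuming NumPy, SciPy, and scikit-learn are available. The function name and parameters are illustrative choices, not part of the lecture.

# Minimal sketch of the spectral clustering pipeline above.
# Assumes every vertex has positive degree; names are illustrative.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def spectral_clustering(X, K, sigma=1.0, normalized=False):
    # Step 1: Gaussian similarity s_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    W = np.exp(-cdist(X, X, "sqeuclidean") / (2 * sigma**2))
    np.fill_diagonal(W, 0.0)          # Step 2: W is the weighted adjacency matrix.
    d = W.sum(axis=1)
    L = np.diag(d) - W                # Step 3: unnormalized Laplacian L = D - W.
    if normalized:                    # ... or L_sym = D^(-1/2) L D^(-1/2).
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
        L = D_inv_sqrt @ L @ D_inv_sqrt
    # Step 4: eigenvectors of the K smallest eigenvalues, as columns.
    eigvals, eigvecs = np.linalg.eigh(L)
    U = eigvecs[:, :K]
    # Steps 5-6: each row of U represents a vertex; cluster the rows with K-means.
    return KMeans(n_clusters=K, n_init=10).fit_predict(U)

Calling spectral_clustering(X, K=2) on the two-moons data, for instance, separates the moons where plain K-means on X would not.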

The unnormalized Laplacian (cont.)

Proposition: Let G be an undirected graph with non-negative weights. Then the multiplicity k of the eigenvalue 0 of L equals the number of connected components A_1, ..., A_k in the graph. The eigenspace of eigenvalue 0 is spanned by the indicator vectors 1_{A_1}, ..., 1_{A_k} of those components.

Proof: If f in R^n is an eigenvector associated to the eigenvalue 0, then

0 = f^T L f = (1/2) Σ_{i,j} w_ij (f_i - f_j)^2.

Thus f_i = f_j whenever w_ij > 0, so f is constant on each connected component. We conclude that the eigenspace of 0 is contained in span(1_{A_1}, ..., 1_{A_k}). Conversely, it is not hard to see that each 1_{A_i} is an eigenvector associated to 0 (write L in block diagonal form). It follows that the eigenspace of 0 is spanned by the indicators of the components.

The normalized Laplacians

Proposition: The normalized Laplacians satisfy the following properties:

1. For every f in R^n, we have f^T L_sym f = (1/2) Σ_{i,j} w_ij (f_i/√d_i - f_j/√d_j)^2.
2. λ is an eigenvalue of L_rw with eigenvector u if and only if λ is an eigenvalue of L_sym with eigenvector w = D^(1/2) u.
3. λ is an eigenvalue of L_rw with eigenvector u if and only if λ and u solve the generalized eigenproblem Lu = λDu.

Proof: The proof of (1) is similar to the proof of the analogous result for the unnormalized Laplacian. (2) and (3) follow easily by using appropriate rescalings.

The normalized Laplacians (cont.)

Proposition: Let G be an undirected graph with non-negative weights. Then the multiplicity k of the eigenvalue 0 of both L_sym and L_rw equals the number of connected components A_1, ..., A_k. For L_rw, the eigenspace of eigenvalue 0 is spanned by the indicator vectors 1_{A_i}, i = 1, ..., k. For L_sym, the eigenspace of eigenvalue 0 is spanned by the vectors D^(1/2) 1_{A_i}, i = 1, ..., k.

Proof: Similar to the proof of the analogous result for the unnormalized Laplacian.

Graph cuts

Given a graph G with (weighted) adjacency matrix W = (w_ij), we define:

W(A, B) := Σ_{i in A, j in B} w_ij,
|A| := the number of vertices in A,
vol(A) := Σ_{i in A} d_i.

Given a partition A_1, ..., A_k of the vertices of G, we let

cut(A_1, ..., A_k) := (1/2) Σ_{i=1}^k W(A_i, Ā_i),

where Ā_i := V \ A_i denotes the complement of A_i. The min-cut problem consists of solving:

min cut(A_1, ..., A_k) over all partitions V = A_1 ∪ ... ∪ A_k with A_i ∩ A_j = ∅ for i ≠ j.
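The connected-components proposition above is easy to check numerically. The following sketch, on an illustrative graph not taken from the slides, builds two disjoint triangles and verifies that the eigenvalue 0 of L has multiplicity 2, with zero-eigenvectors constant on each component.

# Numerical check: two connected components => eigenvalue 0 has multiplicity 2.
import numpy as np

# Two disjoint triangles: vertices {0,1,2} and {3,4,5}.
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    W[i, j] = W[j, i] = 1.0

L = np.diag(W.sum(axis=1)) - W
eigvals, eigvecs = np.linalg.eigh(L)

print(np.sum(np.isclose(eigvals, 0.0)))   # multiplicity of 0 -> prints 2
# Each zero-eigenvector lies in the span of the two indicator vectors,
# hence is constant on {0,1,2} and on {3,4,5}:
print(np.round(eigvecs[:, :2], 3))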

Graph cuts (cont.)

The min-cut problem can be solved efficiently when k = 2 (see Stoer and Wagner, 1997). In practice, however, it often does not lead to satisfactory partitions: in many cases, the solution of min-cut simply separates one individual vertex from the rest of the graph. We would like clusters to have a reasonably large number of points, so we modify the min-cut problem to enforce such constraints.

Balanced cuts

The two most common objective functions used as a replacement for the min-cut objective are:

RatioCut (Hagen and Kahng, 1992):
RatioCut(A_1, ..., A_k) := (1/2) Σ_{i=1}^k W(A_i, Ā_i)/|A_i| = Σ_{i=1}^k cut(A_i, Ā_i)/|A_i|.

Normalized cut (Shi and Malik, 2000):
Ncut(A_1, ..., A_k) := (1/2) Σ_{i=1}^k W(A_i, Ā_i)/vol(A_i) = Σ_{i=1}^k cut(A_i, Ā_i)/vol(A_i).

Note: both objective functions take larger values when the clusters A_i are small, so the resulting clusters are more balanced. However, the resulting minimization problems are NP-hard; see Wagner and Wagner (1993).

Spectral clustering

Spectral clustering provides a way to relax the RatioCut and the Normalized cut problems. Strategy:

1. Express the original problem as a linear algebra problem involving discrete/combinatorial constraints.
2. Relax/remove the constraints.

RatioCut with k = 2: solve min_{A ⊂ V} RatioCut(A, Ā). Given A ⊂ V, let f in R^n be given by

f_i := √(|Ā|/|A|)    if v_i in A,
f_i := -√(|A|/|Ā|)   if v_i in Ā.

Relaxing RatioCut

Let L = D - W be the (unnormalized) Laplacian of G. Then

f^T L f = (1/2) Σ_{i,j} w_ij (f_i - f_j)^2
        = (1/2) Σ_{i in A, j in Ā} w_ij (√(|Ā|/|A|) + √(|A|/|Ā|))^2 + (1/2) Σ_{i in Ā, j in A} w_ij (√(|Ā|/|A|) + √(|A|/|Ā|))^2
        = W(A, Ā) (|Ā|/|A| + |A|/|Ā| + 2)
        = W(A, Ā) ((|A| + |Ā|)/|A| + (|A| + |Ā|)/|Ā|)
        = |V| RatioCut(A, Ā),

since |A| + |Ā| = |V| and W(A, Ā) = W(Ā, A). (A numerical check of this identity follows below.)
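The identity f^T L f = |V| RatioCut(A, Ā) is easy to verify numerically. Here is a small sketch assuming NumPy; the random weighted graph and the choice of the subset A are illustrative, not from the lecture.

# Numerical check of f^T L f = |V| * RatioCut(A, Abar) for the f defined above.
import numpy as np

rng = np.random.default_rng(0)
n = 8
W = rng.random((n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W

A = np.array([0, 1, 2])                        # an arbitrary subset A of V
Abar = np.setdiff1d(np.arange(n), A)
cut = W[np.ix_(A, Abar)].sum()                 # cut(A, Abar) = W(A, Abar) for k = 2
ratio_cut = cut / len(A) + cut / len(Abar)

f = np.empty(n)
f[A] = np.sqrt(len(Abar) / len(A))
f[Abar] = -np.sqrt(len(A) / len(Abar))

print(f @ L @ f, n * ratio_cut)                # the two numbers agree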

Relaxing RatioCut (cont.)

We showed: f^T L f = (1/2) Σ_{i,j} w_ij (f_i - f_j)^2 = |V| RatioCut(A, Ā).

Moreover, note that

Σ_i f_i = Σ_{i in A} √(|Ā|/|A|) - Σ_{i in Ā} √(|A|/|Ā|) = |A| √(|Ā|/|A|) - |Ā| √(|A|/|Ā|) = 0.

Thus f ⊥ 1. Finally,

||f||^2 = Σ_i f_i^2 = |A| (|Ā|/|A|) + |Ā| (|A|/|Ā|) = |A| + |Ā| = |V| = n.

Thus, we have shown that the RatioCut problem is equivalent to:

min_{A ⊂ V} f^T L f   subject to f ⊥ 1, ||f|| = √n, f_i defined as above.

This is a discrete optimization problem, since the entries of f can only take two values: √(|Ā|/|A|) and -√(|A|/|Ā|). The problem is NP-hard. The natural relaxation is to remove the discreteness condition on f and solve:

min_{f in R^n} f^T L f   subject to f ⊥ 1, ||f|| = √n.

Relaxing RatioCut (cont.)

Using properties of the Rayleigh quotient, it is not hard to show that the solution of

min_{f in R^n} f^T L f   subject to f ⊥ 1, ||f|| = √n

is an eigenvector of L corresponding to the second smallest eigenvalue. Clearly, if f* is the solution of the relaxed problem, then f*^T L f* ≤ |V| min_{A ⊂ V} RatioCut(A, Ā).

How do we get the clusters from f*? We could set

v_i in A if f*_i ≥ 0,   v_i in Ā if f*_i < 0.

More generally, we cluster the coordinates of f* using K-means. This is the unnormalized spectral clustering algorithm for k = 2; the process generalizes to k clusters. (A sketch of the k = 2 procedure follows below.)

Unnormalized spectral clustering: summary

The unnormalized spectral clustering algorithm. [The slide reproduces the algorithm box from von Luxburg (2007).]

Source: von Luxburg (2007).
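Here is a short sketch of the k = 2 procedure above, assuming NumPy: compute the eigenvector for the second smallest eigenvalue of L and split the vertices by sign. The two-block similarity matrix is an illustrative stand-in for the lecture's examples.

# Sketch: relaxed RatioCut for k = 2 via the second eigenvector of L.
import numpy as np

rng = np.random.default_rng(1)
n = 10
W = 0.05 * rng.random((n, n))                 # weak background similarity
W[:5, :5] += 1.0                              # strong block: vertices 0..4
W[5:, 5:] += 1.0                              # strong block: vertices 5..9
W = (W + W.T) / 2; np.fill_diagonal(W, 0.0)

L = np.diag(W.sum(axis=1)) - W
_, eigvecs = np.linalg.eigh(L)
f = eigvecs[:, 1]                             # solution of the relaxed problem

labels = (f >= 0).astype(int)                 # v_i in A iff f_i >= 0
print(labels)                                 # recovers the two blocks (up to label swap)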

Normalized spectral clustering

Relaxing the RatioCut leads to unnormalized spectral clustering. By relaxing the Ncut problem, we obtain the normalized spectral clustering algorithm of Shi and Malik (2000). [The slide reproduces the algorithm box from von Luxburg (2007).]

Note: the solutions of Lu = λDu are the eigenvectors of L_rw. See von Luxburg (2007) for details.

Source: von Luxburg (2007).

The normalized clustering algorithm of Ng et al.

Another popular variant of the spectral clustering algorithm was provided by Ng, Jordan, and Weiss (2001). The algorithm uses L_sym instead of L (unnormalized clustering) or L_rw (Shi and Malik's normalized clustering). [The slide reproduces the algorithm box from von Luxburg (2007).] See von Luxburg (2007) for details.

Source: von Luxburg (2007).
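For concreteness, here is a hedged sketch of the Ng, Jordan, and Weiss variant described above: embed the vertices using the first K eigenvectors of L_sym, renormalize each row of the embedding to unit length (the step specific to Ng et al.), then cluster with K-means. The helper name and the use of scikit-learn are assumptions, not from the slides.

# Sketch of the Ng-Jordan-Weiss normalized spectral clustering variant.
# Assumes every vertex has positive degree.
import numpy as np
from sklearn.cluster import KMeans

def ng_jordan_weiss(W, K):
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt   # L_sym = D^(-1/2) L D^(-1/2)
    _, eigvecs = np.linalg.eigh(L_sym)
    U = eigvecs[:, :K]                                     # first K eigenvectors of L_sym
    # The step specific to Ng et al. (2001): normalize each row of U to unit length.
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    return KMeans(n_clusters=K, n_init=10).fit_predict(U)

The row normalization compensates for the D^(1/2) factor that L_sym introduces into the indicator-vector eigenspace, so that points from the same cluster map near a common direction on the unit sphere before K-means is applied.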