Spectral Clustering. Shannon Quinn


Spectral Clustering. Shannon Quinn (with thanks to William Cohen of Carnegie Mellon University, and J. Leskovec, A. Rajaraman, and J. Ullman of Stanford University)

Graph Partitioning. Undirected graph G. Bi-partitioning task: divide the vertices into two disjoint groups A and B. Questions: How can we define a good partition of G? How can we efficiently identify such a partition? [Figure: a six-node example graph (nodes 1-6) split into groups A and B.]

Graph Partitioning. What makes a good partition? Maximize the number of within-group connections; minimize the number of between-group connections. [Figure: the example graph partitioned into groups A and B.]

Graph Cuts. Express partitioning objectives as a function of the edge cut of the partition. Cut: the set of edges with only one vertex in a group:

$$\mathrm{cut}(A,B) = \sum_{i \in A,\ j \in B} w_{ij}$$

[Figure: the example graph, for which cut(A,B) = 2.]
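To make the definition concrete, here is a minimal NumPy sketch (my own illustration, not part of the original slides) that evaluates the cut weight for the six-node example graph; `W` is a symmetric weighted adjacency matrix and `in_A` is a boolean membership mask:

```python
import numpy as np

def cut(W, in_A):
    """Total weight of edges with exactly one endpoint in group A."""
    return W[np.ix_(in_A, ~in_A)].sum()

# The six-node example graph from the slides, with unit edge weights.
W = np.array([[0, 1, 1, 0, 1, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [1, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
in_A = np.array([True, True, True, False, False, False])  # A = {1,2,3}
print(cut(W, in_A))  # 2.0: edges 3-4 and 1-5 cross the partition
```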

Graph Cut Criterion. Criterion: minimum cut. Minimize the weight of connections between groups: $\arg\min_{A,B} \mathrm{cut}(A,B)$. Degenerate case: [Figure: the "minimum cut" slices off a single node, while the "optimal cut" splits the graph into two balanced halves.] Problem: this only considers external cluster connections; it does not consider internal cluster connectivity.

Graph Cut Criteria [Shi-Malik]. Criterion: normalized cut [Shi-Malik, '97]. Connectivity between groups relative to the density of each group:

$$\mathrm{ncut}(A,B) = \frac{\mathrm{cut}(A,B)}{\mathrm{vol}(A)} + \frac{\mathrm{cut}(A,B)}{\mathrm{vol}(B)}$$

where vol(A) is the total weight of the edges with at least one endpoint in A: $\mathrm{vol}(A) = \sum_{i \in A} d_i$. Why use this criterion? It produces more balanced partitions. How do we efficiently find a good partition? Problem: computing the optimal cut is NP-hard.
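Continuing the sketch above (again my own illustration), the normalized cut of a given partition can be computed directly from its definition:

```python
def ncut(W, in_A):
    """Normalized cut: cut/vol(A) + cut/vol(B)."""
    c = cut(W, in_A)
    vol_A = W[in_A].sum()    # sum of degrees of nodes in A
    vol_B = W[~in_A].sum()   # sum of degrees of nodes in B
    return c / vol_A + c / vol_B

print(ncut(W, in_A))  # 2/8 + 2/8 = 0.5 for the example partition
```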

Spectral Graph Partitioning. A: adjacency matrix of an undirected graph G; $a_{ij} = 1$ if $(i,j)$ is an edge, else 0. x is a vector in $\mathbb{R}^n$ with components $(x_1, \dots, x_n)$; think of it as a label/value on each node of G. What is the meaning of $A \cdot x$? Entry $y_i$ is a sum of the labels $x_j$ of the neighbors of node $i$:

$$\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}, \qquad y_i = \sum_{j=1}^{n} a_{ij}\, x_j = \sum_{(i,j) \in E} x_j$$

What is the meaning of Ax? The $j$th coordinate of $A \cdot x$ is the sum of the x-values of the neighbors of $j$; make this the new value at node $j$, i.e. $A \cdot x = \lambda \cdot x$. Spectral Graph Theory: analyze the spectrum of a matrix representing G. Spectrum: the eigenvectors $x_i$ of the graph, ordered by the magnitude (strength) of their corresponding eigenvalues $\lambda_i$:

$$\Lambda = \{\lambda_1, \lambda_2, \dots, \lambda_n\}, \qquad \lambda_1 \le \lambda_2 \le \dots \le \lambda_n$$

Matrix Representations. Adjacency matrix (A): an n × n matrix A = [a_ij] with a_ij = 1 if there is an edge between nodes i and j. For the example graph:

        1  2  3  4  5  6
    1   0  1  1  0  1  0
    2   1  0  1  0  0  0
    3   1  1  0  1  0  0
    4   0  0  1  0  1  1
    5   1  0  0  1  0  1
    6   0  0  0  1  1  0

Important properties: symmetric matrix; eigenvectors are real and orthogonal.

Matrix Representations. Degree matrix (D): an n × n diagonal matrix D = [d_ii] with d_ii = degree of node i. For the example graph:

        1  2  3  4  5  6
    1   3  0  0  0  0  0
    2   0  2  0  0  0  0
    3   0  0  3  0  0  0
    4   0  0  0  3  0  0
    5   0  0  0  0  3  0
    6   0  0  0  0  0  2

Matrix Representations. Laplacian matrix (L): an n × n symmetric matrix, L = D − A. For the example graph:

        1  2  3  4  5  6
    1   3 -1 -1  0 -1  0
    2  -1  2 -1  0  0  0
    3  -1 -1  3 -1  0  0
    4   0  0 -1  3 -1 -1
    5  -1  0  0 -1  3 -1
    6   0  0  0 -1 -1  2

What is the trivial eigenpair? x = (1, ..., 1), for which L · x = 0, so λ = λ₁ = 0. Important properties: eigenvalues are non-negative real numbers; eigenvectors are real and orthogonal.
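These properties are easy to check numerically; a small self-contained sketch (my own illustration) for the example graph:

```python
import numpy as np

# Adjacency matrix of the six-node example graph.
A = np.array([[0, 1, 1, 0, 1, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [1, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))  # degree matrix
L = D - A                   # (unnormalized) Laplacian

print(L @ np.ones(6))         # all zeros: (1,...,1) is the trivial eigenvector, λ = 0
print(np.linalg.eigvalsh(L))  # real, non-negative eigenvalues (up to rounding)
```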

Spectral Clustering: Graph = Matrix. W*v1 = v2 propagates weights from neighbors. [Figure: entries of the leading eigenvectors e1, e2, e3 of a block-structured weight matrix; entries belonging to the same block take similar values. (Shi & Meila, 2001)]

Spectral Clustering: Graph = Matrix. W*v1 = v2 propagates weights from neighbors. W·v = λ·v: v is an eigenvector with eigenvalue λ. If W is connected but roughly block diagonal with k blocks, then the top eigenvector is a constant vector, and the next k eigenvectors are roughly piecewise constant, with pieces corresponding to the blocks.

Spectral Clustering: Graph = Matrix. W*v1 = v2 propagates weights from neighbors. W·v = λ·v: v is an eigenvector with eigenvalue λ. If W is connected but roughly block diagonal with k blocks, then the top eigenvector is a constant vector, and the next k eigenvectors are roughly piecewise constant, with pieces corresponding to the blocks. Spectral clustering: find the top k+1 eigenvectors $v_1, \dots, v_{k+1}$; discard the top one; replace every node a with the k-dimensional vector $x_a = \langle v_2(a), \dots, v_{k+1}(a) \rangle$; cluster with k-means.
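Here is a minimal sketch of that recipe (my own illustration; it assumes NumPy and scikit-learn are available). It works with the symmetrically normalized matrix $N = D^{-1/2} A D^{-1/2}$, which has the same eigenvalues as the random-walk matrix $D^{-1}A$ (the transpose of the slides' $AD^{-1}$) and whose eigenvectors map to it via $D^{-1/2}$, so the stable symmetric eigensolver can be used:

```python
import numpy as np
from sklearn.cluster import KMeans  # assumption: scikit-learn is installed

def spectral_clustering(A, k, seed=0):
    """Embed each node with eigenvectors v_2..v_{k+1}, then run k-means."""
    d = A.sum(axis=1)
    D_isqrt = np.diag(1.0 / np.sqrt(d))
    N = D_isqrt @ A @ D_isqrt                # same spectrum as D^{-1} A
    vals, vecs = np.linalg.eigh(N)           # eigenvalues in ascending order
    top = D_isqrt @ vecs[:, -(k + 1):]       # top k+1 eigenvectors, mapped back
    X = top[:, :-1]                          # discard the top (constant) one
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)

print(spectral_clustering(A, k=2))  # with A from the sketch above: e.g. [0 0 0 1 1 1]
```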

Spectral Clustering: Graph = Matrix. W*v1 = v2 propagates weights from neighbors. W·v = λ·v: v is an eigenvector with eigenvalue λ. The smallest eigenvectors of D−A are the largest eigenvectors of A; the smallest eigenvectors of I−W are the largest eigenvectors of W. Suppose each y(i) = +1 or −1: then y is a cluster indicator that splits the nodes into two. What is $y^\top (D-A)\, y$?

For such a ±1 indicator vector y:

$$y^\top (D - A)\, y = \sum_i d_i y_i^2 - \sum_{i,j} a_{ij}\, y_i y_j = \sum_{(i,j) \in E} a_{ij} (y_i - y_j)^2 = 4 \cdot \big(\text{size of CUT}(y)\big)$$

and similarly

$$\mathrm{NCUT}(y) \approx \frac{y^\top (I - W)\, y}{\text{size of } y}$$

NCUT: roughly, minimize the ratio of transitions between classes to transitions within classes.
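A quick numeric check of the first identity (my own illustration, reusing `L` from the earlier sketch):

```python
y = np.array([1, 1, 1, -1, -1, -1], dtype=float)  # indicator for {1,2,3} vs {4,5,6}
print(y @ L @ y)  # 8.0 = 4 * 2: the partition cuts two unit-weight edges
```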

So far. How to define a good partition of a graph? Minimize a given graph cut criterion. How to efficiently identify such a partition? Approximate it using the information provided by the eigenvalues and eigenvectors of the graph. This is spectral clustering.

Spectral Clustering Algorithms. Three basic stages: 1) Pre-processing: construct a matrix representation of the graph. 2) Decomposition: compute the eigenvalues and eigenvectors of the matrix; map each point to a lower-dimensional representation based on one or more eigenvectors. 3) Grouping: assign points to two or more clusters, based on the new representation.

Example: Spectral Partitioning. [Figure: value of $x_2$ plotted against rank in $x_2$.]

Example: Spectral Partitioning. [Figure: components of $x_2$ (value vs. rank in $x_2$).]

Example: Spectral partitioning. [Figure: components of $x_1$ and of $x_3$.]

k-wa Spectral Clusterng How do we partton a graph nto k clusters? Two basc approaches: Recursve b- parttonng [Hagen et al., 9] Recursvel appl b- parttonng algorthm n a herarchcal dvsve manner Dsadvantages: InefLcent, unstable Cluster multple egenvectors [Sh- Malk, 00] Buld a reduced space from multple egenvectors Commonl used n recent papers A preferable approach J. Leskovec, A. Raaraman, J. Ullman: Mnng of Massve Datasets, http://www.mmds.org

Why use multiple eigenvectors? Approximates the optimal cut [Shi-Malik, '00]: can be used to approximate the optimal k-way normalized cut. Emphasizes cohesive clusters: increases the unevenness in the distribution of the data; associations between similar points are amplified, and associations between dissimilar points are attenuated; the data begins to approximate a clustering. Well-separated space: transforms the data into a new embedded space consisting of k orthogonal basis vectors. Multiple eigenvectors prevent instability due to information loss.

Spectral Clustering: Graph = Matrix. W*v1 = v2 propagates weights from neighbors. W·v = λ·v: v is an eigenvector with eigenvalue λ. The smallest eigenvectors of D−A are the largest eigenvectors of A; the smallest eigenvectors of I−W are the largest eigenvectors of W. Q: How do I pick v to be an eigenvector for a block-stochastic matrix?


Spectral Clustering: Graph = Matrix. W*v1 = v2 propagates weights from neighbors. W·v = λ·v: v is an eigenvector with eigenvalue λ. The smallest eigenvectors of D−A are the largest eigenvectors of A; the smallest eigenvectors of I−W are the largest eigenvectors of W. Suppose each y(i) = +1 or −1: then y is a cluster indicator that cuts the nodes into two. What is $y^\top (D-A)\, y$? The cost of the graph cut defined by y. What is $y^\top (I-W)\, y$? Also a cost of a graph cut defined by y. How do we minimize it? It turns out that to minimize $y^\top X y / (y^\top y)$ you take the smallest eigenvector of X (for a nontrivial cut, the smallest eigenvector orthogonal to the trivial constant one). But this will not have entries ±1, so it is a relaxed solution.
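Concretely (my own sketch, reusing `L = D - A` from the earlier code): the relaxed minimizer orthogonal to the constant vector is the second-smallest eigenvector of D−A, the Fiedler vector, and rounding its signs gives back a ±1 cluster indicator:

```python
vals, vecs = np.linalg.eigh(L)  # eigenpairs of D - A, ascending eigenvalues
fiedler = vecs[:, 1]            # smallest nontrivial eigenvector (relaxed y)
print(np.sign(fiedler))         # e.g. [ 1.  1.  1. -1. -1. -1.]
```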

Spectral Clustering: Graph = Matrix. W*v1 = v2 propagates weights from neighbors. W·v = λ·v: v is an eigenvector with eigenvalue λ. [Figure: sorted eigenvalues of a block-stochastic matrix with eigenvectors e1, e2, e3; an "eigengap" separates λ₂ from λ₃, λ₄, λ₅, …. (Shi & Meila, 2001)]

Some more terms. If A is an adjacency matrix (maybe weighted) and D is a (diagonal) matrix giving the degree of each node, then: D−A is the (unnormalized) Laplacian; $W = AD^{-1}$ is a probabilistic adjacency matrix; I−W is the (normalized, or random-walk) Laplacian; etc. The largest eigenvectors of W correspond to the smallest eigenvectors of I−W, so sometimes people talk about the "bottom eigenvectors of the Laplacian".
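The correspondence follows from the fact that if $Wv = \lambda v$ then $(I - W)v = (1 - \lambda)v$, which is easy to confirm numerically (my own illustration, reusing `A` from above):

```python
W_rw = A / A.sum(axis=0)               # column-stochastic W = A D^{-1}
lam = np.sort(np.linalg.eigvals(W_rw).real)
mu = np.sort(np.linalg.eigvals(np.eye(6) - W_rw).real)
print(np.allclose(mu, 1 - lam[::-1]))  # True: top of W <-> bottom of I - W
```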

[Figure: two common ways to build the similarity graph and its matrices A and W: a k-nn graph (easy), and a fully connected graph weighted by distance.]

Spectral Clustering: Pros and Cons. Elegant and mathematically well-founded. Works quite well when relations are approximately transitive (like similarity). Very noisy datasets cause problems: informative eigenvectors need not be in the top few, and performance can drop suddenly from good to terrible. Expensive for very large datasets: computing eigenvectors is the bottleneck.

Use cases and runtimes. K-Means: fast and embarrassingly parallel, but not very useful on anisotropic data. Spectral clustering: excellent quality under many different data forms, but much slower than K-Means.

Further Reading. Spectral clustering tutorial (Ulrike von Luxburg): http://www.informatik.uni-hamburg.de/ml/contents/people/luxburg/publications/Luxburg07_tutorial.pdf