Grouping 2: Spectral and Agglomerative Clustering. CS 510 Lecture #16, April 2nd, 2014


Julian Hood


2 Grouping (review)
Goal:
- Detect local image features (SIFT)
- Describe image patches around features
  - SIFT, SURF, HoG, LBP, ...
- Group features to form codes
  - Lots of features (from all training data)
  - High-dimensional feature vectors (see above)
  - Number K of codes (clusters) is known

3 Grouping (review cont.)
Two generative models of clustering:
- K-Means
  - Assumes all clusters have the same variance in every dimension
  - Assumes all clusters have the same variance as each other
- Expectation Maximization (EM)
  - Fits an arbitrary Gaussian to every cluster
  - More general; can cluster more complex cases
  - Risk of over-fitting

4 Alternative Approaches
- If we don't need to map to probabilities
  - We don't need to model underlying distributions
  - We can optimize some other function
- Goal
  - Divide N samples into K clusters (groups)
  - Maximizing the similarity of samples within groups
  - Minimizing the similarity of samples between groups
- Problem
  - Exponentially many possible groups
  - Direct solutions are NP-hard
  - Therefore, we look for heuristic solutions

5 Preview
- Spectral Clustering
  - Divide data using K processes
  - Maximizing similarity within groups
  - Subject to an artificial orthogonality constraint
- Agglomerative clustering with Ward's Linkage
  - Recursively merge samples
  - Minimizing variance within groups
  - Greedy heuristic

6 Spectral Clustering
1. Define an affinity (similarity) matrix $A$
   - $a_{i,j}$ is large (~1) if samples $a_i$ and $a_j$ are similar
   - $a_{i,j}$ is small (but $\ge 0$) if the samples are dissimilar
   - $\exp(-\|a_i - a_j\|_{L2})$ is a common choice
   - Similarities below a threshold are often set to 0
   - The affinity matrix must be symmetric
2. Define a diagonal degree matrix $D$
   - $d_{ii} = \sum_j a_{ij}$
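Steps 1 and 2 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names `affinity_matrix` and `degree_matrix`, the toy samples, and the threshold value 0.01 are all hypothetical and do not come from the lecture.

```python
import math

def affinity_matrix(samples, threshold=0.0):
    """Return A with a_ij = exp(-||x_i - x_j||_2); values below `threshold` become 0."""
    n = len(samples)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist = math.dist(samples[i], samples[j])  # Euclidean (L2) distance
            a = math.exp(-dist)
            A[i][j] = a if a >= threshold else 0.0
    return A

def degree_matrix(A):
    """Return the diagonal matrix D with d_ii = sum_j a_ij."""
    n = len(A)
    return [[sum(A[i]) if i == j else 0.0 for j in range(n)] for i in range(n)]

# Toy data (assumed): two nearby samples and one far away.
samples = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
A = affinity_matrix(samples, threshold=0.01)
D = degree_matrix(A)
```

Because the distance is symmetric, so is `A`. The threshold zeroes the affinities between the far-away sample and the other two, which is what makes the matrix sparse.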

7 Spectral Clustering (II)
3. Define the Laplacian matrix $L$: $L = D - A$
   - What does this matrix look like?
   - $L$ has an important property:
     $$f^T L f = \frac{1}{2} \sum_{i,j=1}^{n} a_{ij} (f_i - f_j)^2$$
   - $f$ is any vector, but we will interpret it as a vector of cluster labels
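The identity for $f^T L f$ can be checked numerically on a small example. The sketch below assumes a hand-written symmetric 3x3 affinity matrix and an arbitrary vector `f`; the function names are illustrative only.

```python
def laplacian(A):
    """Return L = D - A, where D is the diagonal matrix of row sums of A."""
    n = len(A)
    return [[(sum(A[i]) if i == j else 0.0) - A[i][j] for j in range(n)]
            for i in range(n)]

def quadratic_form(L, f):
    """Compute f^T L f directly."""
    n = len(f)
    return sum(f[i] * L[i][j] * f[j] for i in range(n) for j in range(n))

def pairwise_penalty(A, f):
    """Compute (1/2) * sum_{i,j} a_ij * (f_i - f_j)^2."""
    n = len(f)
    return 0.5 * sum(A[i][j] * (f[i] - f[j]) ** 2
                     for i in range(n) for j in range(n))

# Assumed toy affinities (symmetric) and an arbitrary label vector.
A = [[0.0, 0.8, 0.1],
     [0.8, 0.0, 0.3],
     [0.1, 0.3, 0.0]]
f = [1.0, 2.0, -1.0]

L = laplacian(A)
lhs = quadratic_form(L, f)    # f^T L f
rhs = pairwise_penalty(A, f)  # (1/2) sum a_ij (f_i - f_j)^2
```

Both sides agree up to floating-point rounding. Printing `L` also shows what the matrix looks like: row sums on the diagonal, negated affinities off the diagonal, and every row summing to zero.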

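8 Proof
$$f^T L f = f^T D f - f^T A f = \sum_{i=1}^{n} d_i f_i^2 - \sum_{i,j=1}^{n} f_i f_j a_{i,j}$$
$$= \frac{1}{2} \left( \sum_{i=1}^{n} d_i f_i^2 - 2 \sum_{i,j=1}^{n} f_i f_j a_{i,j} + \sum_{j=1}^{n} d_j f_j^2 \right)$$
$$= \frac{1}{2} \sum_{i,j=1}^{n} a_{ij} (f_i - f_j)^2$$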

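9 Spectral Clustering (III)
4. Now take the eigenvectors of $L$
   - Yes, compute $L L^T = R \lambda R^T$ ($L$ is symmetric, so $L L^T$ has the same eigenvectors as $L$)
   - Remember the definition of an eigenvector; for the $i$-th unit-length eigenvector $f_i$: $f_i^T L f_i = \lambda_i$
   - So if an eigenvector $f$ has eigenvalue 0:
     $$0 = f^T L f = \frac{1}{2} \sum_{i,j=1}^{n} a_{ij} (f_i - f_j)^2$$
   - In other words, every pair of samples either
     - has the same label, or
     - has 0 similarity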

10 Spectral Clustering (IV)
- The number of 0 eigenvalues is the number of disconnected groups in $L$
- Every $L$ has at least one 0 eigenvalue
  - The corresponding eigenvector is $(1, 1, \ldots, 1)$
- More generally, eigenvectors with small eigenvalues minimize
  $$\frac{1}{2} \sum_{i,j=1}^{n} a_{ij} (f_i - f_j)^2$$
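A small check makes the link between zero eigenvalues and disconnected groups concrete. The sketch assumes a hand-built affinity matrix with two disconnected groups, {0, 1} and {2, 3}, and shows that the all-ones vector and each group's indicator vector satisfy $Lf = 0$, while a vector that splits a group does not. The matrix and helper names are illustrative, not from the lecture.

```python
def laplacian(A):
    """Return L = D - A, where D is the diagonal matrix of row sums of A."""
    n = len(A)
    return [[(sum(A[i]) if i == j else 0.0) - A[i][j] for j in range(n)]
            for i in range(n)]

def matvec(L, f):
    """Return the matrix-vector product L f."""
    return [sum(L[i][j] * f[j] for j in range(len(f))) for i in range(len(L))]

def is_zero(v, tol=1e-12):
    return all(abs(x) < tol for x in v)

# Assumed affinities: samples 0 and 1 are similar, samples 2 and 3 are similar,
# and there is no similarity across the two groups.
A = [[0.0, 0.9, 0.0, 0.0],
     [0.9, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.7],
     [0.0, 0.0, 0.7, 0.0]]
L = laplacian(A)

ones_ok = is_zero(matvec(L, [1.0, 1.0, 1.0, 1.0]))     # (1, 1, ..., 1)
group_a_ok = is_zero(matvec(L, [1.0, 1.0, 0.0, 0.0]))  # indicator of {0, 1}
group_b_ok = is_zero(matvec(L, [0.0, 0.0, 1.0, 1.0]))  # indicator of {2, 3}
mixed_ok = is_zero(matvec(L, [1.0, 0.0, 1.0, 0.0]))    # splits both groups
```

The two indicator vectors are linearly independent and both have eigenvalue 0, so this $L$ has two zero eigenvalues, one per disconnected group.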

11 Spectral Clustering (V)
- In other words, if samples are projected onto the eigenvectors of $L$:
  - The most similar samples will cluster
  - Selecting K eigenvectors generates K orthogonal processes
5. Project the data onto the K eigenvectors with the smallest eigenvalues
6. Cluster the projected samples using K-Means
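Steps 1 through 6 can be put together as a short NumPy sketch. It is not the lecture's implementation: the toy points, the 0.01 threshold, the helper names, and the deterministic farthest-point initialization of K-Means are all assumptions chosen to keep the example small and reproducible.

```python
import numpy as np

def spectral_embedding(A, k):
    """Return an (n, k) array: the k eigenvectors of L = D - A with the smallest eigenvalues."""
    L = np.diag(A.sum(axis=1)) - A
    # eigh is for symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(L)
    return eigenvectors[:, :k]

def kmeans(X, k, n_iter=20):
    """Minimal K-Means with farthest-point initialization (an illustrative sketch)."""
    centers = [X[0]]
    while len(centers) < k:
        # Distance from every sample to its nearest already-chosen center.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

# Assumed toy data: two well-separated groups of three points each.
points = np.array([[0, 0], [0, 1], [1, 0], [8, 8], [8, 9], [9, 8]], dtype=float)
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
A = np.exp(-dist)   # step 1: affinity matrix
A[A < 0.01] = 0.0   # similarities below a threshold are set to 0

embedding = spectral_embedding(A, k=2)  # steps 2-5
labels = kmeans(embedding, k=2)         # step 6
```

With the cross-group affinities thresholded to 0, the rows of `embedding` are essentially constant within each group, so K-Means only has to separate two tight clumps.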

12 Spectral Clustering (summary)
- The eigenvectors of $L$ minimize
  $$\frac{1}{2} \sum_{i,j=1}^{n} a_{ij} (f_i - f_j)^2$$
- Since the $f$'s are labels, this tries to give similar samples similar labels
- Dissimilar samples can have different labels ($a_{ij}$ is 0 or small)
- Works best when $A$ contains many 0s
- The $f$'s are not integers, however
  - So K-Means cleans it up

13 Spectral Clustering (last SC slide, I promise)
- Efficient way to exploit gaps without assuming distributions of samples
- Weakness: doesn't find very small groups
  - Groups with very few samples have too small an effect on the eigendecomposition

14 Agglomerative Clustering
- Initialize every sample to be its own cluster
  - Groups of size 1
- Iteratively find the most similar pair of groups and merge them
  - Until the number of groups equals K
- Note that this is a simple greedy search using whatever function (linkage) measures similarity

15 Ward's Linkage
- Minimize the total variance
  - The sum of the squared distances from every point to its cluster center
- Initial total variance is zero (singleton groups)
- On every step, merge the two groups with the smallest gain in variance
  $$\mathrm{Gain}(A, B) = \mathrm{Var}(A \cup B) - \mathrm{Var}(A) - \mathrm{Var}(B)$$
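The greedy merge loop with Ward's linkage can be sketched directly from these definitions. This is a naive illustration that re-evaluates every pair at each step, not an efficient or authoritative implementation; the function names and toy points are assumptions. Here `variance` follows the slide's definition: the sum of squared distances to the cluster center, not a per-sample average.

```python
def variance(cluster):
    """Sum of squared distances from each point to the cluster centroid."""
    dim = len(cluster[0])
    center = [sum(p[d] for p in cluster) / len(cluster) for d in range(dim)]
    return sum((p[d] - center[d]) ** 2 for p in cluster for d in range(dim))

def ward_gain(a, b):
    """Gain(A, B) = Var(A u B) - Var(A) - Var(B)."""
    return variance(a + b) - variance(a) - variance(b)

def agglomerate(points, k):
    """Start from singleton clusters and greedily merge until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Greedy step: pick the pair whose merge adds the least variance.
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: ward_gain(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters

# Assumed toy data: three points near the origin and two far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (8.0, 8.0), (8.0, 9.0)]
clusters = agglomerate(points, k=2)
```

For two singletons the gain is half their squared distance, so the first merges join the closest pairs. Swapping `ward_gain` for another linkage function changes the similarity measure without touching the greedy loop.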

16 Agglomerative + Ward's
- Minimizes intra-class variance
- Assumes Euclidean geometry
- Heuristic does not find the global optimum
