Grouping 2: Spectral and Agglomerative Clustering
CS 510 Lecture #16, April 2nd, 2014
Grouping (review)
- Goal:
  - Detect local image features (SIFT)
  - Describe image patches around features (SIFT, SURF, HoG, LBP, ...)
  - Group features to form codes
- Lots of features (from all training data)
- High-dimensional feature vectors (see above)
- Number K of codes (clusters) is known
Grouping (review cont.)
- Two generative models of clustering:
  - K-Means
    - Assumes all clusters have the same variance in every dimension
    - Assumes all clusters have the same variance as each other
  - Expectation Maximization (EM)
    - Fits an arbitrary Gaussian to every cluster
    - More general; can cluster more complex cases
    - Risk of over-fitting
Alternative Approaches
- If we don't need to map to probabilities, we don't need to model the underlying distributions; we can optimize other functions
- Goal: divide N samples into K clusters (groups)
  - Maximizing the similarity of samples within groups
  - Minimizing the similarity of samples between groups
- Problem: exponentially many possible groupings
  - Direct solutions are NP-hard
- Therefore, we look for heuristic solutions
Preview
- Spectral Clustering
  - Divide data using K processes
  - Maximizing similarity within groups
  - Subject to an artificial orthogonality constraint
- Agglomerative clustering with Ward's Linkage
  - Recursively merge samples
  - Minimizing variance within groups
  - Greedy heuristic
Spectral Clustering
1. Define an affinity (similarity) matrix A
   - $a_{ij}$ is large (~1) if samples $i$ and $j$ are similar
   - $a_{ij}$ is small (but >= 0) if samples are dissimilar
   - $a_{ij} = \exp(-\|x_i - x_j\|_2)$ is common, where $x_i$ denotes sample $i$
   - Similarities below a threshold are often set to 0
   - The affinity matrix must be symmetric
2. Define a diagonal degree matrix D: $d_{ii} = \sum_j a_{ij}$
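Steps 1 and 2 can be sketched in NumPy; the function names and the 0.1 threshold below are illustrative assumptions, not from the lecture:

```python
import numpy as np

def affinity_matrix(X, threshold=0.1):
    """Build a symmetric affinity matrix from the rows of X.

    Uses the common choice a_ij = exp(-||x_i - x_j||_2); entries
    below `threshold` (an assumed value) are zeroed, as the slide suggests.
    """
    diffs = X[:, None, :] - X[None, :, :]   # pairwise differences
    dists = np.linalg.norm(diffs, axis=2)   # pairwise L2 distances
    A = np.exp(-dists)                      # similarities in (0, 1]
    A[A < threshold] = 0.0                  # sparsify small similarities
    return A

def degree_matrix(A):
    """Diagonal degree matrix with d_ii = sum_j a_ij."""
    return np.diag(A.sum(axis=1))
```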
Spectral Clustering (II)
3. Define the Laplacian matrix L: $L = D - A$
   - What does this matrix look like?
   - L has an important property:
     $f^T L f = \frac{1}{2} \sum_{i,j=1}^{n} a_{ij} (f_i - f_j)^2$
   - $f$ is any vector, but we will interpret it as a vector of cluster labels
Proof
$f^T L f = f^T D f - f^T A f = \sum_{i=1}^{n} d_i f_i^2 - \sum_{i,j=1}^{n} f_i f_j a_{ij}$
$= \frac{1}{2} \left[ \sum_{i=1}^{n} d_i f_i^2 - 2 \sum_{i,j=1}^{n} f_i f_j a_{ij} + \sum_{j=1}^{n} d_j f_j^2 \right]$
$= \frac{1}{2} \sum_{i,j=1}^{n} a_{ij} (f_i - f_j)^2$
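The identity above can be checked numerically on a small affinity matrix (the values of A here are made up for illustration):

```python
import numpy as np

# A hypothetical 3x3 symmetric affinity matrix
A = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.2],
              [0.1, 0.2, 0.0]])
D = np.diag(A.sum(axis=1))   # degree matrix
L = D - A                    # Laplacian

f = np.array([1.0, 1.0, -1.0])   # a candidate label vector

lhs = f @ L @ f
rhs = 0.5 * sum(A[i, j] * (f[i] - f[j]) ** 2
                for i in range(3) for j in range(3))
```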
Spectral Clustering (III)
4. Now take the eigenvectors of L
   - Yes, compute the eigendecomposition $L = R \Lambda R^T$ (L is symmetric)
   - Remember the definition of an eigenvector: $L f_i = \lambda_i f_i$
   - So if an eigenvector has eigenvalue 0:
     $0 = f^T L f = \frac{1}{2} \sum_{i,j=1}^{n} a_{ij} (f_i - f_j)^2$
   - In other words, all pairs either:
     - Have the same label, or
     - Have 0 similarity
Spectral Clustering (IV)
- The number of 0 eigenvalues is the number of disconnected groups in L
- Every L has at least one 0 eigenvalue
  - The corresponding eigenvector is (1, 1, ..., 1)
- More generally, eigenvectors with small eigenvalues minimize
  $\frac{1}{2} \sum_{i,j=1}^{n} a_{ij} (f_i - f_j)^2$
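Both properties are easy to verify numerically on a toy affinity matrix with two disconnected groups (a made-up example):

```python
import numpy as np

# Affinity matrix for two disconnected groups, {0, 1} and {2, 3}
A = np.zeros((4, 4))
A[0, 1] = A[1, 0] = 1.0
A[2, 3] = A[3, 2] = 1.0

L = np.diag(A.sum(axis=1)) - A    # Laplacian L = D - A

eigvals = np.linalg.eigvalsh(L)   # ascending order: [0, 0, 2, 2]
num_zero = int(np.sum(np.abs(eigvals) < 1e-9))
```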
Spectral Clustering (V)
- In other words, if samples are projected onto the eigenvectors of L:
  - The most similar samples will cluster
  - Selecting K eigenvectors generates K orthogonal processes
5. Project the data onto the K eigenvectors with the smallest eigenvalues
6. Cluster the projected samples using K-Means
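Steps 3-6 together, as a sketch; the helper names and the deterministic farthest-point initialization for K-Means are my own choices, not from the lecture:

```python
import numpy as np

def spectral_embed(A, k):
    """Project samples onto the k eigenvectors of L = D - A
    with the smallest eigenvalues (steps 3-5)."""
    L = np.diag(A.sum(axis=1)) - A
    eigvals, eigvecs = np.linalg.eigh(L)   # eigh sorts eigenvalues ascending
    return eigvecs[:, :k]                  # n x k embedding

def kmeans(X, k, iters=50):
    """A minimal Lloyd's K-Means (step 6) -- a sketch, not production code.
    Uses farthest-point initialization to keep the example deterministic."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])    # farthest point from chosen centers
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2),
                           axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels
```

On an affinity matrix with two disconnected groups, the embedding separates the groups exactly, so K-Means recovers them.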
Spectral Clustering (summary)
- The eigenvectors of L minimize $\frac{1}{2} \sum_{i,j=1}^{n} a_{ij} (f_i - f_j)^2$
- Since the f's are labels, this tries to give similar samples similar labels
- Dissimilar samples can have different labels ($a_{ij}$ is 0 or small)
- Works best when A contains many 0s
- The f's are not integers, however
  - So K-Means cleans it up
Spectral Clustering (last SC slide, I promise)
- An efficient way to exploit gaps without assuming distributions of samples
- Weakness: doesn't find very small groups
  - Groups with very few samples have too small an effect on the eigendecomposition
Agglomerative Clustering
- Initialize every sample to be its own cluster
  - Groups of size 1
- Iteratively find the most similar pair of groups and merge them
  - Until the number of groups equals K
- Note that this is a simple greedy search, using whatever function (linkage) measures similarity
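A minimal sketch of this greedy loop with a pluggable linkage function; the names `agglomerate` and `centroid_linkage` are my own, and centroid linkage is just one illustrative choice of similarity measure:

```python
import numpy as np

def agglomerate(X, k, linkage):
    """Greedy agglomerative clustering: start with singleton clusters,
    then repeatedly merge the pair with the lowest linkage cost until
    only k clusters remain. Returns lists of sample indices."""
    clusters = [[i] for i in range(len(X))]      # groups of size 1
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = linkage(X[clusters[i]], X[clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)          # most similar pair so far
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the best pair
        del clusters[j]
    return clusters

def centroid_linkage(a, b):
    """Example linkage: distance between cluster centroids."""
    return np.linalg.norm(a.mean(axis=0) - b.mean(axis=0))
```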
Ward's Linkage
- Minimize the total variance
  - The sum of the squared distances from every point to its cluster center
- Initial total variance is zero (singleton groups)
- On every step, merge the two groups with the smallest gain in variance:
  $\mathrm{Gain}(A, B) = \mathrm{Var}(A \cup B) - \mathrm{Var}(A) - \mathrm{Var}(B)$
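A sketch of Ward's gain under the definitions above (the function names are mine; `variance` is the slide's total variance, i.e. the sum of squared distances to the centroid, so singleton clusters have zero variance):

```python
import numpy as np

def variance(points):
    """Sum of squared distances from each point to the cluster centroid."""
    return np.sum((points - points.mean(axis=0)) ** 2)

def wards_gain(a, b):
    """Increase in total within-cluster variance caused by merging a and b:
    Gain(A, B) = Var(A u B) - Var(A) - Var(B)."""
    return variance(np.vstack([a, b])) - variance(a) - variance(b)
```

Merging nearby groups yields a smaller gain than merging distant ones, which is why the greedy step prefers them.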
Agglomerative + Ward's
- Minimizes intra-class variance
- Assumes Euclidean geometry
- Heuristic: does not find the global optimum