UNSUPERVISED CLUSTERING WITH MST: APPLICATION TO ASTEROID DATA
May 2005
O. Michel, P. Bendjoya,


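2 Classification... a certain know-how

(a, e, i) computation

Proposed metric (Zappala et al., 99):

$$d(i,j) = \widehat{mm}_{ij}\,\hat{a}_{ij}\,\sqrt{\frac{5}{4}\left(\frac{a_i - a_j}{\hat{a}_{ij}}\right)^2 + 2\,(e_i - e_j)^2 + 2\,(i_i - i_j)^2}\qquad [\mathrm{m\,s^{-1}}]$$

where $\widehat{mm}_{ij} = (mm_i + mm_j)/2$, $\hat{a}_{ij} = (a_i + a_j)/2$, and the mean motion $mm$ is tied to the semi-major axis $a$ by Kepler's third law, $mm^2 a^3 = \text{const}$.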
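As a rough illustration, the metric above can be evaluated as in the following sketch. The units are assumptions that the slide does not state: a in astronomical units, inclination in radians, and a heliocentric Kepler constant such that the mean motion is 2*pi*a^(-3/2) rad/yr. The function names are hypothetical.

```python
import math

# Assumed conversion factor: 1 au/yr expressed in m/s (Julian year).
AU_PER_YEAR_IN_M_PER_S = 149_597_870_700.0 / (365.25 * 86400.0)


def mean_motion(a_au):
    """Mean motion in rad/yr from Kepler's third law, mm^2 a^3 = const.

    Assumption: heliocentric orbit, a in au, so const = (2*pi)^2.
    """
    return 2.0 * math.pi * a_au ** -1.5


def family_distance(p, q):
    """Distance d(i, j) in m/s between two objects given as (a, e, i)."""
    a1, e1, i1 = p
    a2, e2, i2 = q
    a_hat = 0.5 * (a1 + a2)                              # mean semi-major axis
    mm_hat = 0.5 * (mean_motion(a1) + mean_motion(a2))   # mean of mean motions
    s = (1.25 * ((a1 - a2) / a_hat) ** 2
         + 2.0 * (e1 - e2) ** 2
         + 2.0 * (i1 - i2) ** 2)
    return mm_hat * a_hat * math.sqrt(s) * AU_PER_YEAR_IN_M_PER_S
```

The prefactor mm * a is an orbital velocity, which is why the result comes out in m/s; for a = 1 au it is close to 29.8 km/s.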

3 FoA detection methods

Notice that:
- No supervised approach is possible.
- Many objects that do NOT belong to a family may have close (a, e, i) parameters: interlopers, outliers.
- {(a, e, i) + d} is NOT a Euclidean space.

Existing satisfactory approaches:
- Hierarchical Clustering Method (Zappala et al., 99, 95)
- Wavelet-based over-density detection (Bendjoya et al., 993, 97)

BUT: both imply a heavy computational burden.
BUT: both require ad hoc parameter or threshold definitions.

4 New needs

About. asteroids

5 MST: definition

Let $X = \{x_1, \ldots, x_n\}$ be a realization of n i.i.d. random vectors, where each $x_i \in \mathbb{R}^d$ follows a distribution P with Lebesgue density λ. The $x_i$ are taken as the vertices of a connected acyclic graph $T_n$ (a spanning tree), with edges $e_{i,j}$ and length

$$L_\gamma(T_n) := \sum_{e_{i,j} \in T_n} |e_{i,j}|^\gamma, \qquad \gamma \in\, ]0, d[.$$

By definition, the Minimal Spanning Tree (MST) is, among all such graphs, the one of minimal length:

$$T_n^* := \arg\min_{T_n} L_\gamma(T_n).$$

An exact solution can be computed by an algorithm of complexity O(n log n).
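To make the definition concrete, here is a minimal sketch that computes the power-weighted MST length for a small point set. It uses Kruskal's algorithm over all pairwise Euclidean distances, which costs O(n^2 log n) and is therefore not the O(n log n) method mentioned above; the function name `mst_length` is hypothetical.

```python
import math
from itertools import combinations


def mst_length(points, gamma=1.0):
    """Sum of |e|^gamma over the edges of the Euclidean MST of `points`."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All candidate edges, shortest first.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    total = 0.0
    used = 0
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:              # the edge joins two components: keep it
            parent[ri] = rj
            total += length ** gamma
            used += 1
            if used == n - 1:
                break
    return total
```

Because raising lengths to a positive power preserves their ordering, the same tree minimizes the length for every γ > 0; only the value of the sum changes.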

6 Figure: a random set of n = 28 realizations from a 2D separable uniform density over $[0, 1]^2$, and the MST spanning these points. Panels: "28 random samples" and "MST"; axes $z_1$ and $z_2$.

7 Applications of the MST: motivations

- Image indexing
- Content-based retrieval
- Registration
- Target detection
- Robust entropy and divergence estimation
- Mutual information between data flows
- ...

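8 Entropic Spanning Graphs

For $\alpha = (d - \gamma)/d$ and $0 < \gamma < d$:

$$L_\gamma(Z_n) = \min_{T} \sum_{e \in T} |e|^\gamma$$

$$\hat{H}_\alpha(Z_n) = \frac{1}{1-\alpha}\left[\ln\frac{L_\gamma(Z_n)}{n^\alpha} - \ln \beta_{L,\gamma}\right] \qquad \text{(Rényi's entropy)}$$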
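The estimator can be written as a one-line function once the MST length is known. In this sketch the constant β_{L,γ} is passed in as a parameter `beta`, since the slide gives no value for it; the function name is hypothetical.

```python
import math


def renyi_entropy_estimate(mst_len, n, d, gamma, beta):
    """Renyi entropy estimate of order alpha = (d - gamma) / d.

    mst_len: power-weighted MST length L_gamma(Z_n)
    n:       number of points
    d:       dimension of the space
    gamma:   edge exponent, with 0 < gamma < d
    beta:    the constant beta_{L,gamma} (assumed known, supplied by the caller)
    """
    if not 0.0 < gamma < d:
        raise ValueError("gamma must satisfy 0 < gamma < d")
    alpha = (d - gamma) / d
    return (math.log(mst_len / n ** alpha) - math.log(beta)) / (1.0 - alpha)
```

With d = 2 and γ = 1, the order is α = 1/2, and doubling the MST length raises the estimate by 2 ln 2.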

9 Back to FoA detection: alternative graph (MST) based approach

Motivations:
(a) performance of MST and k-MST for unsupervised clustering (Hero et al., 99, 2)
(b) relation of the MST length to the Rényi entropy of the underlying distribution (Hero et al., 98)
(c) relation of MST-based single-linkage clustering to entropy clustering (Michel, 2)
(d) exploitation of Prim's algorithm for MST construction to detect clusters (Olman, 4)
(e) existence of O(n log n) implementations of Prim's MST construction

HOWEVER, points (b) and (c) require that a Euclidean metric is used.

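10 Alternate metric, from dimensional analysis

The orbital elements are mapped to rescaled coordinates (x, y, z), built from a, e and sin i with a in the denominator, such that the squared distance takes the Euclidean form

$$d_e^2(i,j) = (x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2.$$

Remark 1: since a > 2 units for our data, no singularity is introduced.
Remark 2: this transform amounts to squeezing and stretching the (a, e, i) space so that Euclidean distances apply.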

11 Exploiting Prim's algorithm

Summary of Prim's algorithm (nearest-neighbour accretion method):
- Initialize with an arbitrary $T_1$.
- Let $T_m$ be a partial MST on m vertices, obtained after m iterations.
- Iteration m: among the unconnected vertices, connect the one closest to $T_m$, making $T_{m+1}$.
- Iterate until no unconnected vertex is left.

Olman's idea: record l(m), the length of the edge built to connect a new vertex at iteration m.
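The accretion procedure and the recording of l(m) can be sketched as follows. This is a plain O(n^2) version of Prim's algorithm with Euclidean distances, written for readability and not the O(n log n) implementation referred to elsewhere in these slides; the function name and the convention l(0) = 0 for the starting vertex are assumptions.

```python
import math


def prim_accretion(points, start=0):
    """Run Prim's algorithm and record the accretion sequence.

    Returns (order, lengths): order[m] is the vertex connected at step m and
    lengths[m] is the length l(m) of the edge used to connect it.
    lengths[0] is 0.0 because the starting vertex needs no edge.
    """
    n = len(points)
    if n == 0:
        return [], []
    # Distance from every vertex to the current partial tree.
    dist_to_tree = [math.dist(points[start], p) for p in points]
    in_tree = [False] * n
    in_tree[start] = True
    order = [start]
    lengths = [0.0]
    for _ in range(n - 1):
        # Closest unconnected vertex to the partial tree.
        u = min((k for k in range(n) if not in_tree[k]),
                key=dist_to_tree.__getitem__)
        in_tree[u] = True
        order.append(u)
        lengths.append(dist_to_tree[u])
        # The new vertex may bring some unconnected vertices closer.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < dist_to_tree[v]:
                    dist_to_tree[v] = d
    return order, lengths
```

On data made of well-separated groups, l(m) stays small while a group is being absorbed and shows an isolated peak each time the tree has to jump to the next group.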

12 Example of application on simulation data, 1

Figure: edge length versus aggregation index, for several values of s; simulated data in coordinates x = dga, y = exc, z = inc; coded data shown in the xoy, xoz and yz projections.

13 Example of application on simulation data, 2

Figure: data in (a, e, i) coordinates and extracted clusters (axes x = a, y = e, z = i); edge length versus accretion index.

Remark: for both examples, the threshold was determined by $\eta = \alpha\,\mathrm{std}(\{|e_i|\},\ i = 1, \ldots, n)$, with α set to a fixed value here.
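One possible reading of this thresholding rule is sketched below: given an accretion order and the edge lengths l(m) recorded during Prim's construction, every edge longer than η starts a new group. The use of the population standard deviation, the exclusion of the zero-length placeholder for the first vertex, and the function name are assumptions.

```python
import statistics


def split_by_threshold(order, lengths, alpha):
    """Cut an accretion sequence wherever the edge length exceeds eta.

    order:   vertices in the order they were connected
    lengths: lengths[m] is the edge used to connect order[m]; lengths[0]
             belongs to the starting vertex and is ignored
    alpha:   multiplier of the standard deviation of the edge lengths
    Returns (eta, clusters) with clusters as lists of vertex indices.
    """
    if len(order) < 2:
        return 0.0, [list(order)]
    eta = alpha * statistics.pstdev(lengths[1:])
    clusters = [[order[0]]]
    for vertex, length in zip(order[1:], lengths[1:]):
        if length > eta:
            clusters.append([vertex])    # long edge: a new group starts here
        else:
            clusters[-1].append(vertex)  # short edge: same group continues
    return eta, clusters
```

Because Prim's algorithm absorbs every vertex reachable through short edges before it takes a long one, each run between two above-threshold edges is a single-linkage cluster at level η. Small runs can then be discarded as background.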

14 ROC: probability of detection (Pd) versus probability of false alarm (Pfa)
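For a single simulated family, one point of such a curve could be computed as in the sketch below. The definitions used are assumptions, since the slides do not spell them out: Pd is taken as the fraction of true family members that the clustering recovers, and Pfa as the fraction of background objects (interlopers) wrongly assigned to the family.

```python
def detection_rates(true_members, detected_members, all_objects):
    """Return (Pd, Pfa) for one detected cluster against one true family."""
    true_set = set(true_members)
    detected = set(detected_members)
    background = set(all_objects) - true_set
    if not true_set or not background:
        raise ValueError("need at least one family member and one background object")
    pd = len(detected & true_set) / len(true_set)       # detections among the family
    pfa = len(detected & background) / len(background)  # false alarms among the background
    return pd, pfa
```

Sweeping the clustering threshold and plotting the resulting (Pfa, Pd) pairs traces out one ROC curve.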

15 ROC curves, initial metric d

Figure 1Clust (axes: Pfa, Pd). Panel title: family (2), interlopers; tests d25v5,3,6. Curves: (a) d25v5, (b) d25v3, (c) d25v6. Markers: o = MST clustering, * = WVT clustering.
Caption: ROC for simulation files containing a single family of 25 asteroids, created from an initial impact of parameter 5, 3 or 6, respectively. ROCs for the WT-based and MST-based approaches. Initial metric d.

Figure 2Clust (axes: Pfa, Pd). Panel title: families (,5), interlopers; tests d25.5v2.5, d25.5v5.5. Curves: (a) d25.5v5.5, (b) d25.5v2.5. Markers: o = MST clustering, * = WVT clustering.
Caption: ROC for simulation files containing two families of asteroids, of (25 and 5) or (25 and 5) objects, created from initial impacts of parameters (2 and 5) or (5 and 5), respectively. ROCs for the WT-based and MST-based approaches. Initial metric d.

16 ROC curves, Euclidean norm

Figure 1Clust (axes: Pfa, Pd; Euclidean norm). Panel title: family (2), interlopers; tests d25v5,3,. Markers: diamond = MST clustering, * = WVT clustering.
Caption: ROC for simulation files containing a single family of 25 asteroids, created from an initial impact of parameter 5, 3 or 6, respectively. ROCs for the WT-based approach using d and the MST-based approach using the Euclidean metric $d_e$.

Figure 2Clust (axes: Pfa, Pd; Euclidean norm). Panel title: families (,5), interlopers; tests d25.5v2.5, d25.5v5.5. Markers: diamond = MST clustering, * = WVT clustering.
Caption: ROC for simulation files containing two families of asteroids, of (25 and 5) or (25 and 5) objects, created from initial impacts of parameters (2 and 5) or (5 and 5), respectively. ROCs for the WT-based approach using d and the MST-based approach using the Euclidean metric $d_e$.

17 Setting the threshold, 1

Empirical approach:
- Let α vary over a bounded range; for each fixed α, measure $N_c$, the number of detected significant clusters.
- Record $N_c = f(\alpha)$.
- Estimate $\alpha_{opt} = \max(\alpha)$ over the largest plateau in f(α).

Figure: number of detected clusters (clusters with more than a minimum number of objects) versus α.
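The plateau rule can be sketched as a small search over a sampled curve. In this sketch the plateau size is measured by its width in α, and the function name is hypothetical; `alphas` is assumed to be sorted in increasing order, with `counts[k]` the number of significant clusters found at `alphas[k]`.

```python
def alpha_from_largest_plateau(alphas, counts):
    """Return (alpha_opt, n_clusters) for the widest plateau of counts.

    alpha_opt is the largest alpha on the widest run of constant cluster
    count; ties are resolved in favour of the first such run.
    """
    if not alphas or len(alphas) != len(counts):
        raise ValueError("alphas and counts must be non-empty and of equal length")
    best_start = best_end = 0
    start = 0
    for k in range(1, len(counts) + 1):
        # A run ends at the end of the data or when the count changes.
        if k == len(counts) or counts[k] != counts[start]:
            end = k - 1
            if alphas[end] - alphas[start] > alphas[best_end] - alphas[best_start]:
                best_start, best_end = start, end
            start = k
    return alphas[best_end], counts[best_end]
```

For example, with counts 5, 3, 2, 2, 2, 1, 1 sampled at α = 0.5, 1.0, ..., 3.5, the widest plateau is the run of 2s, and the rule selects α = 2.5.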

18 Setting the threshold, 2

Entropy-based approach:
- Let α vary over a bounded range; for each fixed α, compute $H_{in}$ and $H_{out}$ from the MST length of the detected significant clusters.
- Record $H_{in} - H_{out} = f(\alpha)$.
- Estimate $\alpha_{opt}$ at the largest step in f(α).

Figure: clusters entropy minus background entropy (arbitrary units, exponential scale) versus α.
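Locating the largest step in a sampled curve is a short computation, sketched below. Returning the α just after the jump instead of the one just before it is an arbitrary choice made for this illustration, and the function name is hypothetical; `values[k]` stands for the entropy difference measured at `alphas[k]`.

```python
def alpha_at_largest_step(alphas, values):
    """Return the alpha immediately after the largest jump in `values`."""
    if len(alphas) != len(values) or len(values) < 2:
        raise ValueError("need two or more samples, with equal-length inputs")
    # Absolute change between consecutive samples.
    steps = [abs(values[k + 1] - values[k]) for k in range(len(values) - 1)]
    k = max(range(len(steps)), key=steps.__getitem__)
    return alphas[k + 1]
```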

19 Conclusion

- Tools exist, whatever the dimension of the space.
- An entropic justification of the segmentation.
- To your convenience...

Thank you
