Spectral Clustering of Polarimetric SAR Data With Wishart-Derived Distance Measures

Stian Normann Anfinsen, Robert Jenssen, Torbjørn Eltoft
Computational Earth Observation and Machine Learning Laboratory
Department of Physics and Technology, University of Tromsø, Norway
Outline
- Motivation
- Introduction to Spectral Clustering
- Distance Measures for PolSAR Covariance Matrices
- A New Algorithm
- Results
- Conclusions and Future Work
Motivation
- Seeking (near) optimal statistical classification
- Disregarding covariance matrix structure (decomposition theory) and spatial information - for now
- Improve on the Wishart classifier: Lee et al. (IJRS, 1994), Lee et al. (TGRS, 1999), Pottier & Lee (EUSAR, 2000), ...
- Apply modern pattern recognition tools: kernel methods, spectral clustering, information theoretic learning
The Wishart Classifier Revisited

Initialisation: Segmentation in H/A/α space (the Cloude-Pottier-Wishart (CPW) classifier)
- Class mean coherency matrices $\mathbf{V}_i$ are calculated from the initial partitioning of the data:
  $$\mathbf{V}_i = \left\langle \mathbf{T}_j \mid \text{pixel } j \in \text{class } i \right\rangle, \quad i = 1, \ldots, k$$
  where $\mathbf{T}_j = \langle \mathbf{k}\mathbf{k}^H \rangle$ and $\mathbf{k} = \frac{1}{\sqrt{2}}\left[S_{hh} + S_{vv},\; S_{hh} - S_{vv},\; 2S_{hv}\right]^T$.

Iterative classification
- Minimum distance classification based on the Wishart distance between the pixel coherency matrix $\mathbf{T}$ and $\mathbf{V}_i$: assign class $\omega_j$ with $j = \arg\min_{i \in \{1,\ldots,k\}} d_W(\mathbf{T}, \mathbf{V}_i)$
- Iterative reclassification and update of the class means (a minimal sketch follows)
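As a minimal sketch of this iteration (not the authors' implementation; function names and the lack of empty-class handling are illustrative assumptions):

```python
import numpy as np

def wishart_distance(T, V):
    """Wishart distance d_W(T, V) = ln|V| + tr(V^{-1} T)."""
    _, logdet = np.linalg.slogdet(V)
    return logdet + np.trace(np.linalg.solve(V, T)).real

def wishart_classify(T_pixels, V_means, n_iter=10):
    """Iterative minimum distance Wishart classification.

    T_pixels: (M, p, p) array of pixel coherency matrices
    V_means:  (k, p, p) array of class mean coherency matrices
    """
    for _ in range(n_iter):
        # Assign each pixel to the class with minimum Wishart distance
        d = np.array([[wishart_distance(T, V) for V in V_means]
                      for T in T_pixels])
        labels = d.argmin(axis=1)
        # Update the class means from the new partitioning
        # (empty classes are not handled in this sketch)
        V_means = np.array([T_pixels[labels == i].mean(axis=0)
                            for i in range(len(V_means))])
    return labels, V_means
```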
The Wishart Classifier Revisited

Delivers consistently good results:
- Few parameters, easy to use, computationally efficient
- Approaches a ML solution - if it converges

But has some drawbacks:
- The initialisation uses a fixed number of classes, and is restricted to one class per predetermined zone in H/A/α space
- Inherits the well known disadvantages of k-means; e.g., convergence is not guaranteed, and may be slow

Conclusion: State of the art algorithms from pattern recognition and machine learning should be tested.
Clustering by Pairwise Affinities
- Based on distances d_ij between all pixel pairs (i, j)
- Propagates similarity from pixel to pixel
- Yields flexible discrimination surfaces
- Nonlinear mapping to kernel space, where clustering is done with linear methods
- The mapping is found by eigendecomposition

[Figure: Examples of capabilities - the same data set shown in input space and in kernel space]
Spectral Clustering

Pairwise distances d_ij are transformed to affinities, e.g.:
$$a_{ij} = \exp\left\{ -\frac{d_{ij}^2}{2\sigma^2} \right\}$$

The pairwise affinities a_ij between N data points are stored in an affinity matrix A:
$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NN} \end{bmatrix}$$

The optimal data partitioning is derived from the eigendecomposition of A. Hence, spectral clustering. There are different ways of using the eigenvalues and eigenvectors of A to obtain an optimal clustering. E.g., using the u eigenvectors corresponding to the largest eigenvalues yields a new u-dimensional feature space (eigenspace):
$$\begin{bmatrix} \mathbf{e}_1^T \\ \mathbf{e}_2^T \\ \vdots \\ \mathbf{e}_u^T \end{bmatrix} = \left[ \boldsymbol{\phi}_1 \; \boldsymbol{\phi}_2 \; \cdots \; \boldsymbol{\phi}_N \right]$$
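A minimal sketch of this construction in Python/NumPy, using the unnormalized affinity matrix exactly as written above (many spectral clustering variants would normalize A first; that choice is an assumption left out here):

```python
import numpy as np

def spectral_embedding(D, sigma, u):
    """Map N points to a u-dimensional eigenspace from pairwise distances.

    D:     (N, N) matrix of pairwise distances d_ij
    sigma: kernel bandwidth
    u:     eigenspace dimension
    """
    # Gaussian affinities a_ij = exp(-d_ij^2 / (2 sigma^2))
    A = np.exp(-D**2 / (2.0 * sigma**2))
    # Eigendecomposition of the symmetric affinity matrix
    eigvals, eigvecs = np.linalg.eigh(A)
    # Keep the u eigenvectors with the largest eigenvalues; row i of the
    # returned matrix is the eigenspace representation phi_i of point i
    order = np.argsort(eigvals)[::-1][:u]
    return eigvecs[:, order], eigvals[order]
```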
Spectral Clustering

We have a mapping from input feature space to eigenspace:
$$\Phi(\mathbf{T}_i): \mathbf{T}_i \mapsto \boldsymbol{\phi}_i$$

- The eigenspace feature set can be clustered by simple, linear discrimination methods, e.g. k-means with Euclidean distance.
- We use an information theoretic method, which partitions the data by implicit maximization of the Cauchy-Schwarz divergence between the cluster pdfs in input space. The pdfs are estimated nonparametrically.
- Data points outside the size-N sample can be mapped to eigenspace using the Nyström approximation (see the sketch below):
  $$\Phi_j(\mathbf{T}) \approx \frac{\sqrt{N}}{\lambda_j} \sum_{i=1}^{N} e_{ji}\, a(\mathbf{T}, \mathbf{T}_i), \quad j = 1, \ldots, u$$
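A sketch of the Nyström extension under the same assumptions (eigvecs and eigvals come from the sample affinity matrix, as in the embedding sketch above; dist is any pairwise distance measure):

```python
import numpy as np

def nystrom_map(T_new, T_sample, eigvecs, eigvals, sigma, dist):
    """Approximate eigenspace coordinates of an out-of-sample point.

    T_new:    coherency matrix of a new pixel
    T_sample: the N sampled coherency matrices
    eigvecs:  (N, u) eigenvectors e_j of the affinity matrix
    eigvals:  (u,) corresponding eigenvalues lambda_j
    dist:     pairwise distance function d(., .)
    """
    N = len(T_sample)
    # Affinities between the new point and the N sampled points
    d = np.array([dist(T_new, T_i) for T_i in T_sample])
    a = np.exp(-d**2 / (2.0 * sigma**2))
    # Phi_j(T) ~ (sqrt(N) / lambda_j) * sum_i e_ji * a(T, T_i)
    return np.sqrt(N) / eigvals * (eigvecs.T @ a)
```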
Relation to Kernel Methods

May be related to Mercer kernel-based algorithms, such as: Support Vector Machines, Kernel PCA, Kernel k-means, etc.

The pairwise affinities are inner products in a Mercer kernel space:
$$a_{ij} = a(\mathbf{T}_i, \mathbf{T}_j) = \langle \boldsymbol{\phi}_i, \boldsymbol{\phi}_j \rangle$$

a_ij is a Mercer kernel function and A a Mercer kernel matrix iff a(T_i, T_j) is:
- positive semi-definite
- symmetric
- continuous

With these restrictions, how do we select the distance measure?
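For a finite sample, the first two conditions can be checked numerically on the affinity matrix itself (continuity must be argued analytically); a small sketch:

```python
import numpy as np

def check_mercer(A, tol=1e-10):
    """Empirical check of symmetry and positive semi-definiteness of A."""
    symmetric = np.allclose(A, A.conj().T)
    # All eigenvalues of the (symmetrized) matrix should be non-negative,
    # up to numerical tolerance
    eigvals = np.linalg.eigvalsh((A + A.conj().T) / 2)
    psd = eigvals.min() >= -tol * max(abs(eigvals.max()), 1.0)
    return symmetric, psd
```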
Coherency Matrix Distance Measures

Wishart distance (Lee et al., IJRS '94)
$$d_W(\mathbf{T}_1, \mathbf{T}_2) = \ln|\mathbf{T}_2| + \mathrm{tr}(\mathbf{T}_2^{-1}\mathbf{T}_1)$$
- Can be symmetrized, but d_W(T_i, T_i) depends on T_i. Not suitable!

Bartlett distance (Conradsen et al., TGRS '03)
$$d_B(\mathbf{T}_1, \mathbf{T}_2) = \ln\left( \frac{|\mathbf{T}_1 + \mathbf{T}_2|^2}{|\mathbf{T}_1|\,|\mathbf{T}_2|} \right) - 2p \ln 2$$
- Based on the log-likelihood ratio test of equality for two unknown covariance matrices.

Symmetrized normalized log-likelihood (SNLL) distance (proposed here)
$$d_{SNLL}(\mathbf{T}_1, \mathbf{T}_2) = \frac{1}{2}\,\mathrm{tr}\left(\mathbf{T}_1^{-1}\mathbf{T}_2 + \mathbf{T}_2^{-1}\mathbf{T}_1\right) - p$$
- Based on the log-likelihood ratio test of equality for one known and one unknown covariance matrix.
- Symmetrized version of the revised Wishart distance (Kersten et al., TGRS '05).
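Minimal sketches of the two usable distances for p x p coherency matrices (p = 3 for full-polarimetric data), written with slogdet and solve for numerical stability:

```python
import numpy as np

def bartlett_distance(T1, T2, p=3):
    """d_B = ln(|T1 + T2|^2 / (|T1| |T2|)) - 2p ln 2."""
    _, ld12 = np.linalg.slogdet(T1 + T2)
    _, ld1 = np.linalg.slogdet(T1)
    _, ld2 = np.linalg.slogdet(T2)
    return 2.0 * ld12 - ld1 - ld2 - 2.0 * p * np.log(2.0)

def snll_distance(T1, T2, p=3):
    """d_SNLL = (1/2) tr(T1^{-1} T2 + T2^{-1} T1) - p."""
    tr = np.trace(np.linalg.solve(T1, T2) + np.linalg.solve(T2, T1)).real
    return 0.5 * tr - p
```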
The New Algorithm: Summary
- Replaces the H/A/α space initialisation with spectral clustering.
- A subset of N pixels, randomly sampled from the image, is clustered.
- Remaining pixels may be classified in kernel space (eigenspace), using the Nyström approximation.
- Alternatively, remaining pixels may be classified in input space with the minimum distance Wishart classifier. The latter solution has a much lower computational cost, and our experience is that the classification results are essentially equal.
- Hence, only the initialisation of the CPW classifier is changed (see the pipeline sketch below).
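Putting the pieces together, one possible end-to-end sketch, reusing the functions sketched earlier; k-means (here SciPy's kmeans2) stands in for the authors' information theoretic clustering, so this approximates rather than reproduces the actual algorithm:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_wishart(T_all, k, sigma, N=6400, n_iter=10, seed=0):
    """Spectral initialisation of the Wishart classifier (sketch)."""
    rng = np.random.default_rng(seed)
    sample = T_all[rng.choice(len(T_all), size=N, replace=False)]
    # Pairwise distances over the sample (O(N^2) - the dominant cost)
    D = np.array([[snll_distance(Ti, Tj) for Tj in sample] for Ti in sample])
    Phi, _ = spectral_embedding(D, sigma, u=k)
    # Cluster in eigenspace (stand-in for the information theoretic method)
    _, labels = kmeans2(Phi, k, minit='++')
    # Class mean coherency matrices from the spectral partitioning
    V = np.array([sample[labels == i].mean(axis=0) for i in range(k)])
    # Classify the full image with the minimum distance Wishart classifier
    return wishart_classify(T_all, V, n_iter=n_iter)
```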
The New Algorithm: Parameters
- Number of clusters k: must be manually selected, but the effective number of classes in the classification result, k_eff, is data adaptive.
- Sample size N: trade-off with computational cost.
- Kernel bandwidth σ: a robust automatic selection rule is under investigation.
- Eigenspace dimension u: can be fixed to u = k for simplicity.
Test Data Set: Flevoland, L-band

200 × 320 subset of an AIRSAR L-band data set of an agricultural area in Flevoland, The Netherlands, August 1989. Courtesy of NASA/JPL.
Ground Truth Data
Evaluation
- Qualitative analysis (visual inspection)
- Quantitative analysis: we calculate a matching matrix M that relates predicted (P) and actual (A) class labels, and derive classification merits from M (Ferro-Famil et al., TGRS '01), as sketched below:
  - Descriptivity D_i: the fraction of the dominant predicted class labels within an actual class (quantifies homogeneity).
  - Compactness C_i: quantifies to what extent the dominant predicted class also dominates other actual classes.
  - Representivity R_i: quantifies to what extent the dominant predicted class is predicted for other actual classes.
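A sketch of the matching matrix and descriptivity under one plausible reading of the definitions above (the row normalization is an assumption; compactness and representivity follow Ferro-Famil et al. and are omitted here):

```python
import numpy as np

def matching_matrix(actual, predicted, k):
    """M[i, j]: percentage of actual class i assigned predicted label j."""
    M = np.zeros((k, k))
    for a, p in zip(actual, predicted):
        M[a, p] += 1
    return 100.0 * M / M.sum(axis=1, keepdims=True)

def descriptivity(M):
    """D_i: fraction of the dominant predicted label within actual class i."""
    return M.max(axis=1)
```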
Qualitative Analysis: Cloude-Pottier-Wishart (CPW) Classifier

Parameters: k = 16, k_eff = 9, it = 10 (no. of iterations in the Wishart classifier).

Observations:
- Classes 2 and 9 are covered by the same cluster.
- Classes 4 and 10 are covered by the same cluster.
- Homogeneous classification in the ground truth areas.
Qualitative Analysis: Bartlett Spectral Wishart (BSW) Classifier

Parameters: k = 16, k_eff = 12, σ = 0.42, N = 6400 (10%), it = 10.

Observations:
- Classes 1 and 5 are covered by the same cluster.
- Some interference by a second cluster in classes 3 and 5.
- Not as homogeneous a classification as for the CPW classifier, but better delineation of some areas.
Qualitative Analysis: SNLL Spectral Wishart (SSW) Classifier

Parameters: k = 16, k_eff = 15, σ = 0.42, N = 6400 (10%), it = 10.

Observations:
- Unique dominant cluster for all ground truth areas.
- Less homogeneous classification than the other methods, largely due to the higher effective number of classes.
Matching matrix for the CPW classifier. Rows are actual classes (A1-A10), columns are predicted classes (P1-P10); C_i is the compactness of each actual class, the bottom row R_i is the representivity of each predicted class, and the diagonal gives the descriptivity D_i.

         P1     P2     P3     P4     P5     P6     P7     P8     P9    P10  |  C_i
A1     99.8    0.2      0      0      0      0      0      0    0.2      0  | 99.5
A2        0   99.5      0      0    0.5      0      0      0   99.5      0  |    0
A3        0      0   99.7      0      0    0.3      0      0      0      0  | 99.3
A4        0      0      0   92.8    7.2      0      0      0      0   92.8  |    0
A5        0   20.2      0      0   79.8      0      0      0   20.2      0  | 39.5
A6        0      0      0      0      0    100      0      0      0      0  |  100
A7        0      0      0      0      0    0.3   99.7      0      0      0  | 99.3
A8        0      0    3.0      0      0    1.5      0   95.5      0      0  | 91.0
A9        0    100      0      0      0      0      0      0    100      0  |    0
A10       0      0      0    100      0      0      0      0      0    100  |    0
R_i    99.8      0   96.7      0   72.2   97.8   99.7   95.5      0    7.2  |
Matching matrix for the Bartlett distance classifier (same layout as above).

         P1     P2     P3     P4     P5     P6     P7     P8     P9    P10  |  C_i
A1     99.8      0      0      0      0      0      0      0    0.2      0  | 99.7
A2        0   98.5      0      0    0.5      0      0      0    1.5      0  | 97.0
A3        0      0   72.9      0      0    0.1      0   15.3      0      0  | 57.6
A4        0      0      0   99.2    0.8      0      0      0      0   0.03  | 98.4
A5        0    5.6      0    1.8   83.2      0      0      0    8.5      0  | 67.3
A6        0      0      0      0      0   99.8      0   0.02      0      0  | 99.8
A7        0      0      0      0      0    0.6   98.1      0      0      0  | 97.5
A8        0      0    8.3      0      0    7.4   0.02   89.0      0      0  | 77.0
A9        0    3.7      0      0    1.8      0      0      0   94.5      0  | 89.1
A10       0      0      0    0.4    0.1      0      0      0    0.6   98.6  | 97.5
R_i    99.8   89.2   64.7   97.0   80.5   91.7   98.1   77.4   83.9   98.6  |
Matching matrix for the SNLL distance classifier (same layout as above).

         P1     P2     P3     P4     P5     P6     P7     P8     P9    P10  |  C_i
A1     99.8      0      0      0      0      0      0      0    0.2      0  | 99.7
A2        0   98.6      0      0    0.1      0      0      0    0.9      0  | 97.6
A3        0      0   61.3      0      0   0.05      0    6.0      0      0  | 55.2
A4        0      0      0   93.0    1.8      0      0      0      0      0  | 91.2
A5        0    1.0      0    0.1   72.1      0      0      0    3.4      0  | 67.7
A6        0      0      0      0      0   99.0      0      0      0      0  | 99.0
A7        0      0      0      0    0.2    0.1   84.1      0      0      0  | 83.9
A8        0      0    1.2      0      0    3.9    0.1   91.8      0      0  | 86.6
A9        0    0.2      0      0    6.4      0      0      0   87.1      0  | 80.6
A10       0      0      0    0.2    0.1      0      0      0    0.4   94.5  | 93.7
R_i    99.8   97.4   60.1   92.7   63.6   94.9   84.0   85.8   82.3   94.5  |
Quantitative Analysis: Descriptivity

Quantitative Analysis: Compactness

Quantitative Analysis: Representivity

Quantitative Analysis: Effective no. classes

Convergence Speed
Conclusions and Future Work
- We have selected two distance measures suited for calculation of pairwise affinities for PolSAR data coherency matrices.
- We have demonstrated how PolSAR data can be segmented by spectral clustering of coherency matrices.
- The algorithm improves the classification result of the CPW classifier, while using the same information (derived from the statistics of a single pixel).
- Performance analysis shows that spectral clustering gives a better initialisation of the Wishart classifier than the H/A/α initialisation, both in terms of classification result and convergence speed.
Conclusions and Future Work
- Further work will concentrate on methods for robust selection of the kernel bandwidth σ, and on studies of the data adaptive k_eff, in order to develop and verify a fully automatic segmentation algorithm.
- We will also study how spatial information and information from polarimetric decompositions can be included in the distance measure, to assimilate more prior information in the kernel function.
- The algorithm will be tested on different data sets.
Thank you!

Stian Normann Anfinsen
Computational Earth Observation and Machine Learning Laboratory
University of Tromsø
URL: http://www.phys.uit.no/ceo-ml/