Statistical Clustering and Mineral Spectral Unmixing in an AVIRIS Hyperspectral Image of Cuprite, NV


CS229 Report, December 2005

Mario Parente, Argyris Zymnis

I. INTRODUCTION

Hyperspectral imaging is a technique for obtaining a spectrum at each of a large array of spatial positions, so that a recognizable image is obtained at each of a set of discrete wavelengths. The images might be of a rock in the laboratory, a field study site from an aircraft or a rover camera, or a whole planet from a spacecraft or Earth-based telescope. By analyzing the spectral features (generally neighborhoods of local minima in the spectra) one can map materials. A simplistic explanation is that specific chemical bonds in different materials manifest themselves as absorption features at different wavelengths; by mapping where those bonds occur in the spectrum one can identify what is called the unique spectral signature of the material. The factors affecting the spectra of natural materials and the causes of absorption features are numerous and combine in complex ways. They are not the focus of this paper; a comprehensive tutorial can be found in [3].

Spectral unmixing is the procedure by which the measured spectrum of a pixel is decomposed into a collection of constituent spectra, or endmembers, and a set of corresponding fractions, or abundances, that indicate the proportion of each endmember present in the pixel. In the case of rocks or soils the endmembers can be consistent with the minerals present in the observed geologic surface. In this work we present a novel technique for endmember selection from a database of minerals based on simple convex optimization techniques.

Spectral unmixing can assume a linear or a nonlinear combination of the endmembers, depending on the nature of the observed surface [9]. Unfortunately, nonlinear schemes can be impractical for hyperspectral imaging because they require multiple views of the same scene from different angles [11]. Linear spectral unmixing is based on the assumption that the spectrum of each pixel of the scene is a convex combination of the spectra of its component minerals [13]. Deterministic modeling of the mixture cannot explain the statistical variability of the spectra within a class, due for example to illumination differences, altimetry, grain size of the material, and other causes. Several attempts have been made to correct this problem; these approaches allow the endmembers of the mixture to be random variables (mostly Gaussians) [6], [14].

The present work assumes a mixture of multidimensional pdfs for the statistical distribution of the spectra of the individual pixels composing the scene. Each pdf (a multivariate Gaussian) represents the likelihood of a certain mineral mixture in the scene. We make use of the Gauss Mixture Vector Quantization (GMVQ) algorithm [1], [7] as an alternative to the EM algorithm [5] to learn the mixture parameters. We also explore cluster analysis with the correlation distance [8]. The clustering stage is useful for selecting a few centroids, each representative of part of the image, so that the unmixing is performed only on the centroids rather than on the whole image. This decreases the computational cost of the processing, especially when a large number of images is acquired. Each Gaussian mean produced by the clustering stage, thought to be representative of a specific mineral mixture, is unmixed in this work by a constrained least-squares algorithm which can be cast as a quadratic program.
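As a concrete illustration of the linear mixing assumption, the following minimal sketch generates a synthetic pixel as a convex combination of endmember spectra plus noise. All names, sizes, and values are illustrative assumptions, not the Cuprite data.

```python
# A minimal sketch of the linear mixing model: each pixel spectrum is a
# convex combination of endmember spectra plus noise. Sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)

n_bands, n_endmembers = 50, 3
D = rng.random((n_bands, n_endmembers))       # columns: endmember spectra
a = np.array([0.6, 0.3, 0.1])                 # abundances: a >= 0, sum(a) == 1
pixel = D @ a + 0.01 * rng.standard_normal(n_bands)  # observed mixed spectrum
```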
II. AVIRIS CUPRITE DATASET

Spectral data collected over Cuprite, Nevada, USA have been widely used to evaluate remote sensing technology and for spectral unmixing ([4], [15] and references therein). For the purpose of our study, a subset of the AVIRIS bands (bands 172-221, i.e. 1.99 to 2.48 µm) was selected because of the better discrimination of mineral signatures in that range. Figure 1(b) plots the spectra that represent the spectral variability in the image. These were obtained by treating the image as a 50-dimensional data cloud (one dimension per selected band) and capturing its corners.

Fig. 1. AVIRIS hyperspectral image of Cuprite, NV: (a) RGB composite of bands 183, 193 and 207; (b) spectral variability.

III. CLUSTERING

Our goal in clustering is to capture as much as possible of the spectral variability in the image. We tried some statistical measures of cluster validation, such as the Gap statistic. By polling several experts in mineral identification we assessed that the statistically optimal number of clusters is too low to capture the variability in the data that those experts would consider if they scanned the pixel spectra. Since this is a completely unsupervised classification, and for the reasons just mentioned, we consider capturing most of the spectra of figure 1(b) a reasonable goal for the clustering stage and for selecting the number of clusters. We experimented with three different setups: K-means, GMVQ, and cluster analysis with the correlation measure. For each k (number of clusters) we perform 5 runs from random initial points and choose the run with the minimum value of the objective function (distortion or distance), as sketched below.
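A minimal sketch of this restart strategy, assuming a pixels-by-bands data matrix X is already loaded; scikit-learn's KMeans implements the keep-the-best-of-n_init-runs behavior directly.

```python
# Run K-means 5 times from random initial points and keep the run with the
# lowest distortion (inertia). X is assumed to be a (pixels x bands) array.
from sklearn.cluster import KMeans

def best_of_five_kmeans(X, k):
    km = KMeans(n_clusters=k, n_init=5, init="random")
    km.fit(X)  # scikit-learn retains the restart with minimum inertia
    return km.cluster_centers_, km.labels_, km.inertia_
```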

A. Lloyd Clustering Algorithm for Gauss Mixture Design

The Gauss Mixture Vector Quantization algorithm can be seen as an alternative to the EM algorithm for fitting a finite Gauss mixture \{p_m, g_m\} to a training set \{x_1, x_2, \ldots, x_N\} (see [1], section 4, and [10]). The design of a Gauss mixture implemented by GMVQ is as follows [1]:

Minimum Distortion (Nearest Neighbor) Step: For i = 1, \ldots, N encode each training vector x_i into the index \alpha^{c+1}(x_i) corresponding to the minimum-distortion Gaussian model f(x_i \mid \theta_m^c), that is [1],

\alpha^{c+1}(x_i) = \arg\min_m \left( -\ln f(x_i \mid \theta_m^c) + \lambda \ln \frac{1}{p_m^c} \right)   (1)
                 = \arg\min_m \left( \tfrac{1}{2} \ln |K_m^c| + \tfrac{1}{2} (x_i - \mu_m^c)^T (K_m^c)^{-1} (x_i - \mu_m^c) - \lambda \ln p_m^c \right),   (2)

where \mu_m^c and K_m^c are the current estimates of the mean and covariance of the distribution of the samples.

Centroid Step: Given all the vectors \{x_i\} belonging to the m-th partition, determine a Gaussian density f(x_i \mid \theta_m^c) (i.e., its parameters \mu_m^c and K_m^c) so as to minimize

\sum_i \left[ \tfrac{1}{2} \ln |K_m^c| + \tfrac{1}{2} (x_i - \mu_m^c)^T (K_m^c)^{-1} (x_i - \mu_m^c) \right].   (3)

The minimization can be performed by alternately fixing \mu_m^c or K_m^c and minimizing over the other [1]. The solutions are the sample mean in the m-th partition,

\mu_m^{c+1} = \frac{1}{N_m} \sum_{i \in m} x_i,   (4)

and the sample covariance in the m-th partition,

K_m^{c+1} = \frac{1}{N_m} \sum_{i \in m} (x_i - \mu_m^{c+1})(x_i - \mu_m^{c+1})^T,   (5)

where N_m is the number of training vectors assigned to the m-th partition. The algorithm includes one more step, to calculate the optimal length function \ln(1/p_m^c) given the partition and the centroids [1].

We propose a variation on the GMVQ algorithm that discards the penalty term \lambda \ln(1/p_m^c) in the distortion measure (by setting \lambda = 0), because we observed in simulations that algorithms that use measures of the membership probability of each cluster (like the prior of each cluster in EM, and the optimal length function in GMVQ) penalize too heavily the clusters with fewer assigned samples.

1) Euclidean Distance: If we set the covariance term K_m equal to the identity, the distortion measure becomes the Euclidean distance and the GMVQ algorithm reduces to the well-known K-means (or vanilla Vector Quantization). The Euclidean distance is invariant under orthogonal transformations of the data, but it does not take the correlation of the variables into account. The results in figure 2 (left) show that the Euclidean distortion is able to pick up the clusters of figure 1(b) that are relatively very large or very small in norm, which stand out as solitary, but only a few of the spectra in the central range are distinguished. The reason is that those spectra differ mostly in shape. We also found the segmentation map a little too fragmented.

2) Mahalanobis Distance: If the only constraint is \lambda = 0, then the distance measure is similar to the Mahalanobis distance (with an additional term that accounts for the volume of the Gaussian cluster). The classifier becomes quadratic. The increased flexibility of the boundary is traded off against a decrease of the ratio of between-cluster variance to within-cluster variance. From figure 3 (left) we see results similar to the previous case. The reason might be that the covariances of the clusters are very similar, and we know that in that case the classification boundary is linear. We also note that the GMVQ estimate of the mean is a simple average, as in K-means. The fragmentation of the segmentation map is somewhat improved by the use of second-order statistics.
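The following is our own compact sketch (not the authors' code) of one Lloyd iteration of this \lambda = 0 variant, assuming a data matrix X (samples x bands) and current per-cluster means and covariances; the ridge term is a numerical safeguard we add, not part of the paper.

```python
# One GMVQ Lloyd iteration with lambda = 0: assign each sample to the
# Gaussian with minimum distortion 0.5*log|K_m| + 0.5*(x-mu_m)^T K_m^{-1} (x-mu_m),
# then re-estimate each cluster's sample mean and covariance.
import numpy as np

def gmvq_step(X, means, covs, ridge=1e-6):
    N, d = X.shape
    M = len(means)
    dist = np.empty((N, M))
    for m in range(M):
        K = covs[m] + ridge * np.eye(d)          # keep K invertible
        diff = X - means[m]
        _, logdet = np.linalg.slogdet(K)
        maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(K), diff)
        dist[:, m] = 0.5 * logdet + 0.5 * maha   # lambda = 0: no -log p_m term
    labels = dist.argmin(axis=1)                 # minimum-distortion step
    for m in range(M):                           # centroid step
        pts = X[labels == m]
        if len(pts) > 1:
            means[m] = pts.mean(axis=0)
            covs[m] = np.cov(pts, rowvar=False)
    return labels, means, covs
```

Setting covs to identity matrices and skipping the covariance update recovers the K-means special case discussed above.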

Fig. 2. Cluster centroids (left) and cluster map (right) for the Euclidean distance.

Fig. 3. Cluster centroids (left) and cluster map (right) for the Mahalanobis distance.

B. Correlation-based Cluster Analysis

The correlation-based distance between a pixel x_i and a centroid \mu is

d_c(x_i, \mu) = 1 - \rho(x_i, \mu) = 1 - \frac{(x_i - \bar{x}_i \mathbf{1})^T (\mu - \bar{\mu} \mathbf{1})}{\|x_i - \bar{x}_i \mathbf{1}\|_2 \, \|\mu - \bar{\mu} \mathbf{1}\|_2},

where \bar{x}_i and \bar{\mu} are the means of the entries of x_i and \mu respectively. This measure is invariant to scaling and to (vertical) shifting of the spectral values. We tried this setup to take into account the shape of the spectra, considered as signals. The drawback is that the actual magnitudes of the spectra are ignored. If the inputs are standardized, this distance is equivalent to the Euclidean distance. Another drawback is that d_c(x_i, \mu) = 0 only implies a linear relationship between x_i and \mu. Furthermore, the centroids are not obvious to interpret.

We actually obtained the best results with the correlation-based distance, as we can see from figure 4 (left). The obvious limitation of the measure is that the high-norm cluster of figure 1(b) is misplaced, because the measure is normalized. On the other hand, we capture almost the full variability in the data, and the segmentation map seems less fragmented. In a development of this project we will explore a clustering algorithm based on Shape and Gain Vector Quantization, which takes into account correlation (shape) and norm (gain) simultaneously.

Fig. 4. Cluster means (left) and cluster map (right) for the correlation distance.
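A direct transcription of this distance into code, as a sketch assuming 1-D numpy arrays with non-constant entries (so the norms in the denominator are nonzero):

```python
# Correlation-based distance d_c(x, mu) = 1 - rho(x, mu): both vectors are
# mean-centered, making the measure invariant to vertical shifts and scaling.
import numpy as np

def correlation_distance(x, mu):
    xc = x - x.mean()
    mc = mu - mu.mean()
    rho = (xc @ mc) / (np.linalg.norm(xc) * np.linalg.norm(mc))
    return 1.0 - rho
```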

IV. MINERAL IDENTIFICATION AND UNMIXING

We assume that a dictionary of mineral spectra is available to us; for this dataset we extracted the dictionary from [3] and [14]. Suppose the mineral dictionary is given by the matrix D = [v_1 \ v_2 \ \ldots \ v_n], where v_i, i = 1, \ldots, n, are the individual mineral spectra. We want to find abundances a^{(j)} such that, for each cluster centroid (mean) \mu_j,

\mu_j \approx \sum_{i=1}^{n} a_i^{(j)} v_i = D a^{(j)}   (6)

in a least-squares sense. Since it is unreasonable to assume that a given spectrum is the linear combination of a large number of dictionary spectra, we want to impose a limit on the number of nonzero abundance coefficients. We can view this problem as selecting a small number of regressors out of a given set in order to approximate (in a least-squares sense) a given vector. Specifically, we would like to solve the following problem for each cluster centroid:

minimize   \|D a^{(j)} - \mu_j\|_2
subject to Card(a^{(j)}) \le r
           a^{(j)} \ge 0   (7)

Here Card(a^{(j)}) denotes the cardinality of a^{(j)}, i.e. the number of nonzero elements of a^{(j)}, in other words its sparsity structure. For our particular problem the dictionary contains n = 117 minerals, and we would like to express each centroid as a linear combination of approximately r = 5 of them.

Problem (7) reduces to a quadratic program (QP) if the cardinality constraint is removed. With this constraint present, however, the problem is combinatorial and thus very hard to solve: to find the global optimum of (7) we would have to solve n!/(r!(n-r)!) quadratic programs, one for each sparsity structure of the abundance vector. Solving such a number of problems is intractable even for a modest value of r. There exist, however, efficient heuristics for finding approximate solutions. As explained in [2], section 6.3.2, one method that works satisfactorily is to first solve the following problem for a range of values of \lambda:

minimize   \|D a^{(j)} - \mu_j\|_2 + \lambda \|a^{(j)}\|_1   (8)

Increasing \lambda puts more weight on minimizing the \ell_1 norm of the abundance vector a^{(j)}, which makes the solution of (8) sparser. We can then use the sparsity pattern given by this problem to solve the original problem. Problem (8) is equivalent, for an appropriate choice of \epsilon, to

minimize   \|D a^{(j)} - \mu_j\|_2
subject to \|a^{(j)}\|_1 \le \epsilon   (9)

This problem can be expressed as a QP after a simple transformation of variables. The parameter \epsilon limits the maximum allowable \ell_1 norm of a^{(j)}. In particular, if we choose \epsilon large, the problem essentially becomes unconstrained. On the other hand, if \epsilon is less than the \ell_1 norm of the optimal solution of the unconstrained least-squares problem, then the constraint in (9) is tight, i.e. the solution a^{(j)} of (9) satisfies \|a^{(j)}\|_1 = \epsilon. Since an \ell_1-norm constraint on a^{(j)} changes its sparsity structure, we can vary \epsilon until we obtain the desired cardinality of the solution a^{(j)}.

Now suppose we obtain an acceptable solution a^{(j)} of (9). We construct the matrix \tilde{D}, consisting of the columns of D that correspond to the nonzero entries of a^{(j)}, and solve the following problem for each cluster centroid:

minimize   \|\tilde{D} \tilde{a}^{(j)} - \mu_j\|_2
subject to \tilde{a}^{(j)} \ge 0   (10)

The solution of (10) for a centroid \mu_j gives the abundances (weights) of that cluster, corresponding to equation (6). To express these as percentages, we then normalize the vector \tilde{a}^{(j)}.
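A sketch of this two-stage heuristic follows, using generic library solvers (an l1-penalized regression for stage (8) and nonnegative least squares for stage (10)) rather than the authors' QP formulation; the starting penalty lam and the doubling schedule are our own assumptions.

```python
# Two-stage sparse unmixing heuristic: (i) l1-penalized least squares to find
# a sparse support with at most r minerals, (ii) nonnegative least squares
# restricted to that support, then normalization to fractional abundances.
import numpy as np
from scipy.optimize import nnls
from sklearn.linear_model import Lasso

def sparse_unmix(D, mu, lam=1e-3, r=5):
    # Stage 1: increase the l1 penalty until at most r coefficients survive.
    a = Lasso(alpha=lam, positive=True, max_iter=10000).fit(D, mu).coef_
    while np.count_nonzero(a) > r:
        lam *= 2.0
        a = Lasso(alpha=lam, positive=True, max_iter=10000).fit(D, mu).coef_
    support = np.flatnonzero(a)
    # Stage 2: nonnegative least squares on the selected columns of D.
    abundances = np.zeros(D.shape[1])
    if support.size:
        a_tilde, _ = nnls(D[:, support], mu)
        abundances[support] = a_tilde
    total = abundances.sum()
    return abundances / total if total > 0 else abundances
```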

V. UNMIXING RESULTS

Figure 5 shows the estimated abundance maps for three minerals whose presence in this region is unanimously agreed on by experts (e.g. [4]). The maximum of the scale is 100% (dark red). We found one reference providing quantitative mineral abundance data for this dataset, namely [15]. For the most common minerals our method produced abundance maps which are qualitatively similar to those obtained there. The values of the abundances are in broad accordance, but we do not have a definitive answer as to which are the most accurate, for lack of ground-truth data on mineral abundances.

VI. CONCLUSION AND FUTURE WORK

In this work we explored clustering techniques on a well-known hyperspectral image. We assessed that cluster analysis with the correlation distortion measure picks up most of the variability in the dataset. Despite the lack of quantitative reference data for mineral abundances for this dataset, our results were qualitatively in accordance with other studies. In future studies we will devise reliable performance measures for cluster validation and mineral unmixing. We will also explore the clustering performance of an algorithm that takes into account both shape and norm, as an improvement of our clustering stage.

REFERENCES

[1] A. Aiyer, K. Pyun, Y. Huang, D. O'Brien and R.M. Gray, "Lloyd Clustering of Gauss Mixture Models for Image Compression and Classification," Signal Processing: Image Communication, Vol. 20, pp. 459-485, 2005.
[2] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.
[3] R.N. Clark, "Spectroscopy of Rocks and Minerals, and Principles of Spectroscopy," in Manual of Remote Sensing, A. Rencz, Ed., John Wiley and Sons, New York, 1999.
[4] R.N. Clark, G.A. Swayze, K.E. Livo, R.F. Kokaly, S.J. Sutley, J.B. Dalton, R.R. McDougal, and C.A. Gent, "Imaging Spectroscopy: Earth and Planetary Remote Sensing with the USGS Tetracorder and Expert Systems," Journal of Geophysical Research, Vol. 108, No. E12, December 2003.
[5] A.P. Dempster, N.M. Laird and D.B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Ser. B, 39, 1977.
[6] M.T. Eismann and R.C. Hardie, "Stochastic spectral unmixing with enhanced endmember class separation," Applied Optics, Vol. 43, No. 36, December 2004.
[7] R.M. Gray, "Gauss Mixture Vector Quantization," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, May 2001.
[8] T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning, Springer, 2001.
[9] N. Keshava and J.F. Mustard, "Spectral unmixing," IEEE Signal Processing Magazine, January 2002.
[10] M. Parente, "An Investigation of the Properties of Expectation-Maximization and Gauss Mixture Vector Quantization in Density Estimation and Clustering," EE391 Report, Stanford University, September 2004.
[11] M. Petrou, "Mixed pixel classification: an overview," submitted to World Scientific, 1998.
[12] R. Redner and H. Walker, "Mixture densities, maximum likelihood and the EM algorithm," SIAM Review, 26(2), pp. 195-239, April 1984.
[13] J.J. Settle and N.A. Drake, "Linear mixing and the estimation of ground cover proportions," International Journal of Remote Sensing, 14, pp. 1159-1177, 1993.
[14] D. Stein, "Application of the Normal Compositional Model to the analysis of Hyperspectral Imagery," IEEE, 2004.
[15] D.W. Stein, "The Normal Compositional Model with Applications to Hyperspectral Image Analysis," MIT Lincoln Laboratory, Project Report NGA-8, March 2005.

Fig. 5. Alunite HS295.3B (left), Kaolinite CM9 (middle) and Muscovite GDS113 (right) abundance maps.