Mapping of hyperspectral AVIRIS data using machine-learning algorithms


Can. J. Remote Sensing, Vol. 35, Suppl. 1, pp. S106–S116, 2009

Mapping of hyperspectral AVIRIS data using machine-learning algorithms

Björn Waske, Jon Atli Benediktsson, Kolbeinn Árnason, and Johannes R. Sveinsson

Abstract. Hyperspectral imaging provides detailed spectral and spatial information on the land cover that enables a precise differentiation between various surface materials. On the other hand, the performance of traditional and widely used statistical classification methods is often limited in this context, and thus alternative methods are required. In the study presented here, the performance of two machine-learning techniques, namely support vector machines (SVMs) and random forests (RFs), is investigated, and the classification results are compared with those from well-known methods (i.e., the maximum likelihood classifier and the spectral angle mapper). The classifiers are applied to an Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) dataset that was acquired near the Hekla volcano in Iceland. The results clearly show the advantages of the two proposed classifier algorithms in terms of accuracy. They significantly outperform the other methods and achieve overall accuracies of approximately 90%. Although SVM and RF show some diversity in the classification results, the global performance of the two classifiers is very similar. Thus, both methods can be considered attractive for the classification of hyperspectral data.

Introduction

Hyperspectral imaging is characterized by high spectral resolution with up to hundreds of narrow bands. Hyperspectral sensors provide spectrally continuous spatial information of the land surface, ranging from the visible to the short-wave infrared regions of the electromagnetic spectrum, which enables detailed discrimination between similar surface materials (Clark, 1999; van der Meer and de Jong, 2001). Imaging spectroscopy has been used for more than two decades in the field of geology, and the reflectance properties of various surface materials have been addressed in several studies (Goetz et al., 1982; Kruse, 1988). Hyperspectral data have been successfully used for classification problems that require a very precise description of the spectral feature space, e.g., for mapping of lithology (Harris et al., 2005; Chen et al., 2007; Sahoo et al., 2007).
The operational availability of hyperspectral data will increase through future spaceborne missions, such as the German EnMAP, giving added importance to hyperspectral research with a focus on the classification of hyperspectral data. However, the performance of widely used traditional classification methods like the maximum likelihood classifier (MLC) can be limited when applied to hyperspectral datasets because of the high-dimensional feature space and a finite number of training samples. An increasing number of bands can often decrease classification accuracy, i.e., the well-known Hughes phenomenon (Hughes, 1968), which becomes even more critical as soon as ground-truth and reference data are limited, as in remote study regions that are difficult to access.

Received 15 January 2009. Accepted 21 July 2009. Published on the Web at http://pubservices.nrc-cnrc.ca/rp-ps/journaldetail.jsp?jcode=cjrs&=eng on 5 March 2010.
B. Waske,¹ J.A. Benediktsson, and J.R. Sveinsson. Faculty of Electrical and Computer Engineering, University of Iceland, 107 Reykjavík, Iceland.
K. Árnason. National Land Survey of Iceland, Stillholti 16-18, 300 Akranes, Iceland.
¹Corresponding author. Present address: Faculty of Agriculture, Institute of Geodesy and Geoinformation, University of Bonn, 53115 Bonn, Germany (e-mail: bwaske@uni-bonn.de).

The MLC is derived from the Bayes rule when classes have equal prior probabilities, and each pixel is assigned to the class with the highest likelihood. The approach is based on the assumption that the probability density function of each class is multivariate, and often a Gaussian distribution is assumed. Many processes can be described by this model. Moreover, this type of model simplifies the approach because the model is described only by the mean vector and the covariance matrix. However, the MLC requires a theoretical minimum of (number of bands + 1) training pixels per class to avoid the covariance matrix becoming singular. Thus, alternative classifier algorithms may be better suited to these cases.

The spectral angle mapper (SAM) is an example of a common technique used for classifying hyperspectral imagery (Kruse et al., 1993; Dennison et al., 2004; Chen et al., 2007). This method, which is based on spectral similarity, compares unknown image spectra with known endmember spectra by calculating the angle between the corresponding vectors in feature space (Kruse et al., 1993). The similarity measure does not consider the absolute reflectance values but refers only to the shape of the different spectra. Although SAM can achieve good results and can outperform statistical classifiers in terms of accuracy (Chen et al., 2007), recent developments from the field of machine learning have been generating more accurate results, and SAM is not particularly well suited to complex feature spaces (Harken and Sugumaran, 2005; van der Linden et al., 2007a). A description of MLC and SAM is given in Richards and Jia (2006).
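Since SAM assigns each pixel to the reference spectrum with the smallest angle, it can be sketched in a few lines of NumPy (function names and data here are hypothetical, for illustration only):

```python
import numpy as np

def spectral_angle(pixel, reference):
    """Angle (in radians) between a pixel spectrum and a reference spectrum."""
    cos = np.dot(pixel, reference) / (np.linalg.norm(pixel) * np.linalg.norm(reference))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def sam_classify(pixels, endmembers):
    """Assign each pixel to the endmember with the smallest spectral angle."""
    angles = np.array([[spectral_angle(p, e) for e in endmembers] for p in pixels])
    return np.argmin(angles, axis=1)

# Because only the shape of the spectrum matters, a scaled copy of an
# endmember (e.g., the same material under different illumination) still
# maps to that endmember.
endmembers = np.array([[0.1, 0.5, 0.9], [0.9, 0.5, 0.1]])
pixels = 3.0 * endmembers
print(sam_classify(pixels, endmembers))  # [0 1]
```

The scale invariance shown in the example is exactly the "shape, not absolute reflectance" property described above.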
Support vector machines (SVMs) are well known in the field of machine learning and pattern recognition and were introduced to the field of remote sensing during the last few years (Huang et al., 2002; Melgani and Bruzzone, 2004). The SVM classifier is based on generating a hyperplane between the training samples that separates two classes in a multidimensional feature space. The optimization procedure aims at maximizing the margin between the closest training samples and the hyperplane. The approach performs well with small training sets, even when high-dimensional datasets are classified, because it considers only the training data close to the class boundary (Melgani and Bruzzone, 2004; Pal and Mather, 2006; Fauvel et al., 2006). In Fauvel et al. (2006), for example, the number of available training samples was successively reduced when SVMs were applied to an urban hyperspectral dataset. In that study, the overall accuracy achieved with a small training set of 10 samples per class was only slightly lower (i.e., 1.5%) than that achieved with a training set of 100 samples per class. Although SVMs perform well with a small number of training samples, a large number of samples may guarantee that adequate samples are included in the training set (Foody and Mathur, 2004). However, appropriate sampling strategies can decrease the training set size considerably without any significant loss in classification accuracy (Foody et al., 2006). SVMs have been used successfully in challenging hyperspectral applications such as the classification of heterogeneous urban areas (van der Linden et al., 2007b) and the mapping of similar lithologic classes (Sahoo et al., 2007). Another important development in the field of machine learning and image classification is the concept of multiple classifier systems (Polikar, 2006).
Multiple classifier systems (MCSs) and classifier ensembles do not refer to a specific classification algorithm but describe a more general classification strategy. In contrast to traditional classification approaches, ensembles use several classifiers and combine the different outputs to generate a final result. This idea was introduced some time ago in the context of remote sensing (Benediktsson and Swain, 1992; Benediktsson et al., 2007) and is particularly useful in the context of hyperspectral and multisource applications (Briem et al., 2002; Ham et al., 2006; Waske and Benediktsson, 2007; Cheung-Wai Chan and Paelinckx, 2008). Whereas some MCSs are based on the combination of various algorithms, others employ variants of the same algorithm. Each individual classifier is trained on modified input data (e.g., subsets of training samples or input features), and the different outputs are combined by a voting strategy. Self-learning decision tree (DT) classifiers are particularly interesting in this context because of their simple handling and fast training times. A DT employs a hierarchical classification concept, whereby the training data are successively separated into smaller, increasingly homogeneous subclasses using a set of tests. These test rules, which are defined at each node of the tree, are generally based on the training data and an impurity measurement (Safavian and Landgrebe, 1991; Friedl and Brodley, 1997). Breiman's random forest (RF) classifier (Breiman, 2001), which has been used successfully for the classification of hyperspectral and multisource imagery (Gislason et al., 2006; Ham et al., 2006; Cheung-Wai Chan and Paelinckx, 2008; Waske and van der Linden, 2008), is one example of a decision tree based classifier system. The performance of SVMs and RFs for the classification of hyperspectral imagery is addressed in the study presented here. The study site is the area surrounding Hekla volcano in south-central Iceland.
The area is characterized by recent volcanic activity comprising various historic lava flows, which form the most prominent features. The very similar spectral patterns of these mostly nonvegetated volcanic surface classes result in a challenging pattern recognition and classification problem. The results achieved by SVMs and RFs are compared and discussed. The present study addresses the following research questions: (1) Can advanced classification techniques improve the accuracy and reliability of the classification achieved by well-known methods such as MLC and SAM? (2) How do two different machine-learning approaches perform in comparison with each other? The study site and dataset are described, and then the classifier algorithms are introduced. The actual methods for the application of the classifiers to the dataset are explained, and

the results are presented. The final section of the paper discusses the results and presents the conclusions.

Study site and data

The study site is the region surrounding Hekla volcano, one of the most active volcanoes in Iceland. Hekla has erupted quite regularly every 10 years since 1970, i.e., in 1970, 1980–1981, 1991, and 2000. The volcano is located on the southwestern margin of the Eastern volcanic zone in south Iceland. Hekla is a ridge built by repeated eruptions on a narrow volcanic fissure zone and comprises mainly andesitic and basaltic lavas and tephra. The elevation of the volcano is 1488 m, and it rises some 800–1000 m above the surrounding area (Soosalu et al., 2003).

Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) data collected on a cloud-free day (17 June 1991) were used for the classification. The sensor operates in the visible, near-infrared, and mid-infrared portions of the electromagnetic spectrum, with its sensitivity range spanning wavelengths from 0.4 to 2.4 µm. The sensor system has 224 data channels, utilizing four spectrometers, and each spectral band is approximately 10 nm in width (Green et al., 1998). During the image acquisition, spectrometer 4 was not working properly. This particular spectrometer operates in the wavelength range from 1.84 to 2.40 µm (64 bands). These 64 bands were deleted from the imagery along with the first channels of the other spectrometers, leaving 157 data channels for analysis. The image strip is 2048 pixels × 614 pixels (see Figure 1), with a spatial resolution of 20 m (Benediktsson and Kanellopoulos, 1999). The dataset contains the calibrated image radiance returned by the AVIRIS instrument during data acquisition. The units are microwatts per square centimetre per nanometre per steradian, multiplied by 200 to maintain precision in the 16-bit data.
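Given the stated scaling, stored 16-bit values convert back to physical radiance by dividing by 200; a small illustrative sketch with made-up digital numbers:

```python
import numpy as np

# Made-up 16-bit stored values; each stored number is radiance * 200.
dn = np.array([4000, 10000], dtype=np.int16)
radiance_uW_cm2_nm_sr = dn / 200.0  # microwatts / (cm^2 * nm * sr)
print(radiance_uW_cm2_nm_sr)  # [20. 50.]
```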
The mapping procedure focuses on 22 lithology and land cover classes, mainly lava flows from different eruptions and older hyaloclastites (erupted subglacially during the last periods of the Ice Age). Most of the area is completely devoid of vegetation, and only some of the lava in the southern part of the area (bottom part of Figure 1) is wholly or partly covered by vegetation. The reference data were generated based on detailed geological and vegetation maps, expert knowledge, and image interpretation. The reference information for each class was divided equally into independent training and test data. Afterward, different training sets were generated to investigate the impact of the number of training samples on the classification accuracy. Using an equalized random sampling approach, which guarantees that all classes are equally represented in the sample set, 100, 200, and up to 400 samples per class were selected out of the whole training dataset (from now on referred to as tr100, tr200, and tr400; see Table 1).

Classification algorithms

Support vector machines

The basic concept of SVMs for a linearly nonseparable case aims at the definition of a separating hyperplane in a multidimensional feature space mapped by a kernel function, whereby all computations are completed in the original feature space. The redistributed data enable the fitting of a linear hyperplane between the training samples of two classes. The optimization problem attempts to maximize the margin between the hyperplane and the closest training samples (i.e., the support vectors) and to minimize the error of the training samples that cannot be separated (Vapnik, 1998). The influence of the nonseparable samples is controlled by a regularization parameter C. A detailed introduction to the general concept of SVMs is given by Burges (1998) and Schölkopf and Smola (2002). A brief discussion follows.
Let x_i ∈ R^d (i = 1, 2, …, L) be a training set of L samples in a d-dimensional feature space R^d, with corresponding class labels y_i ∈ {1, −1}. The separating hyperplane f(x) is described by a normal vector w ∈ R^d and the bias b, where |b|/‖w‖ is the distance between the hyperplane and the origin, with ‖w‖ the Euclidean norm of w:

f(x) = w · x + b    (1)

The support vectors are located on the two hyperplanes w · x + b = ±1 that are parallel to the separating hyperplane. The approach aims at maximizing the margin between the closest samples and the hyperplane, which leads to the following optimization problem:

min { ‖w‖²/2 + C Σ_{i=1}^{L} ξ_i }    (2)

where the ξ_i are slack variables and C is a regularization parameter, which are used to handle misclassified samples in nonseparable cases. The parameter C acts as a penalty for samples that lie on the wrong side of the hyperplane. In fact, it controls the shape of the solution and consequently affects the generalization capability of the classifier. Thus, a large value of C might cause overfitting to the training data. The linear SVM approach is extended to nonlinearly separable cases by so-called kernel methods. During nonlinear mapping the data are transferred by a kernel function into a higher dimensional feature space. The final hyperplane decision function can be defined as

f(x) = Σ_{i=1}^{L} α_i y_i k(x_i, x) + b    (3)

where the α_i are Lagrange multipliers and k denotes the kernel function.

Figure 1. Overview of the location of the study site, covering approximately 41 km × 12 km. AVIRIS dataset (the red, green, and blue bands (RGB) are represented by principal components PC1, PC2, and PC3, respectively) and locations of ground-truth information for the 22 classes.

A widely used kernel function is the Gaussian radial basis function (RBF) kernel (Vapnik, 1998; Schölkopf and Smola, 2002), which is given by

k(x_i, x_j) = exp(−γ ‖x_i − x_j‖²)    (4)

The SVM training requires the estimation of the kernel parameter γ and the regularization parameter C. The two parameters are usually determined by a grid search, testing possible combinations of C and γ in a user-defined range. Lastly, the best combination of γ and C is selected based on cross-validation. In contrast to DT classifiers, which directly provide a final class membership, and the well-known maximum likelihood classifier, which is based on class likelihoods, the output images of the SVM provided by Equation (3) contain the distances between each pixel and the separating hyperplane. These distance images can then be used to determine the final class membership, using a simple majority vote or the absolute distances. The SVM is a binary method in that a single SVM separates just two classes. However, remote sensing applications normally deal with multiclass problems, and there are two main strategies for multiclass extensions of SVMs, both of which divide a classification problem with c classes into several binary subproblems. The one-against-one (OAO) method generates a set of c(c − 1)/2 individual SVM classifiers, one for each possible pair of classes. A majority vote is applied to the c(c − 1)/2 rule images to compute the final class membership. In contrast, the one-against-all (OAA) strategy is based on c binary classifiers, each separating one class from the remaining ones. The absolute maximum of Equation (3) defines the final class label.
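The multiclass strategies reduce to simple operations on the stacked rule images; a minimal sketch with made-up distance values:

```python
import numpy as np

# Hypothetical OAA rule images: one signed distance f_c(x) per class c
# (Equation (3)), for each of two pixels.
rule_images = np.array([
    [ 0.7, -1.2],   # class 0 vs. rest
    [-0.3,  0.9],   # class 1 vs. rest
    [ 0.2, -0.4],   # class 2 vs. rest
])  # shape (n_classes, n_pixels)

# One-against-all: the class whose classifier gives the largest output wins.
# (One-against-one would instead apply a majority vote over the
# c(c - 1)/2 pairwise rule images.)
oaa_labels = np.argmax(rule_images, axis=0)
print(oaa_labels)  # [0 1]
```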
Alternatively, the SVM classifier can be directly formulated as a multiclass problem (Hsu and Lin, 2002), which requires fewer support vectors and seems adequate to derive accurate

classification accuracies with a small sample set (Mathur and Foody, 2004).

Table 1. Land cover classes and number of samples within training set tr400 and the test set.

Class                                  Training set tr400   Test set
Andesite lava 1970                     400                  950
Andesite lava 1980 I                   400                  424
Andesite lava 1980 II                  400                  748
Andesite lava 1991 I                   400                  1369
Andesite lava 1991 II                  205                  205
Andesite lava with birch bushes        400                  790
Andesite lava with sparse moss cover   400                  511
Andesite lava with thick moss cover    400                  2007
Old unvegetated andesite lava I        400                  1083
Old unvegetated andesite lava II       400                  799
Old unvegetated andesite lava III      357                  356
Hyaloclastite formation I              400                  702
Hyaloclastite formation II             342                  342
Hyaloclastite formation III            400                  635
Hyaloclastite formation IV             341                  341
Lava covered with tephra and scoria    350                  350
Lichen covered basalt lava             400                  511
Rhyolite                               202                  202
Scoria                                 275                  275
Volcanic tephra                        400                  1631
Firn and glacier ice                   229                  229
Snow                                   506                  506

Random forests

The strategy of a classifier ensemble, such as RF, is based on the assumption that independent classifiers produce individual errors that are not shared by the majority of the other classifiers. In the RF, each individual DT within the ensemble is trained on a bootstrapped sample (i.e., a subset of the training data) of the original training samples. This approach is based on the random selection of training samples from the whole training sample set with replacement. The method randomly selects n samples from a training sample set of the same size n. Because the selection is done with replacement, some training samples can be selected several times for a specific training set, whereas other samples are not considered in this particular sample set. In contrast, random selection without replacement selects m samples out of a set of size n, where m < n; thereby, each training sample can be selected only once in each sample subset. In addition, the split rule at each node of a DT is derived using only a randomly selected feature subset of the input data. Lastly, a majority vote is used to create the final classification result. The computational complexity is reduced by limiting the number of input features considered at each internal node. This enables RF to deal with high-dimensional datasets, and the algorithm is generally computationally lighter than conventional ensemble strategies (Gislason et al., 2006).

Different methods exist for the generation of DT classifiers, which are usually based on a measurement of the impurity of the data within potential nodes. The RF algorithm is based on the Gini index of Breiman (2001). This measurement describes the impurity of a given node. The method aims to find the largest homogeneous subclass within the training samples and discriminate it from the remaining training dataset (Zambon et al., 2006). The split rule that results in the maximum reduction of impurity is used as the decision rule (Duda et al., 2001). The Gini index is given by

Gini(t) = Σ_{i=1}^{c} p_{ω_i} (1 − p_{ω_i})    (5)

where c is the number of classes, and p_{ω_i} is the probability, or the relative frequency, of class ω_i at node t, defined as

p_{ω_i} = l_{ω_i} / L    (6)

in which l_{ω_i} is the number of samples belonging to class ω_i and L is the total number of training samples.

Methods

To simplify the parameter selection for the SVM, the imagery was scaled between 0 and 1 before training. The OAO approach was used to solve the multiclass problem using a set of binary classifiers with optimized parameters for γ and C.
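The Gini impurity of Equations (5) and (6) can be computed directly from the class counts at a node; a minimal sketch (function name hypothetical):

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node: sum over classes of p * (1 - p),
    where p is the relative class frequency at the node."""
    L = len(labels)
    return sum((n / L) * (1 - n / L) for n in Counter(labels).values())

print(gini(["lava"] * 10))                # 0.0 for a pure node
print(gini(["lava"] * 5 + ["snow"] * 5))  # 0.5, the two-class maximum
```

A split is scored by how much it reduces this impurity in the child nodes relative to the parent.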
The training of the SVM with a Gaussian kernel and the classification were performed using imageSVM (available from www.hu-geomatics.de) (Janz et al., 2007). ImageSVM is a freely available IDL/ENVI implementation that uses the LIBSVM approach (Chang and Lin, 2001) for the training of the SVM. The kernel parameters C and γ are determined by a grid search, using a three-fold cross-validation: possible combinations of C and γ are tested in a user-defined range, and the best parameter combination is used for the final classification. Freely available Fortran code was used for the RF classification (available from www.stat.berkeley.edu/~breiman/randomforests/). To investigate the influence of the number of iterations (i.e., the number of DTs within the ensemble), 25, 50, 75, and 100 iterations were performed. The size of the feature subset, which is user defined, was set to the square root of the number of input features (i.e., 13), which has shown good results in previous studies (Gislason et al., 2006; Cheung-Wai Chan and Paelinckx, 2008). Both classifiers were trained with the same training sample sets, containing 100, 200, and up to 400 randomly selected samples per class. For comparison, SAM and an MLC were applied to the dataset using the same training datasets. For SAM, the spectra of the training samples of each lithological unit class were averaged to construct a final class spectrum.
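The study itself used imageSVM and Breiman's Fortran code; an equivalent setup can be sketched with scikit-learn (an assumption of this note, not the authors' tooling), with random data standing in for the 157 AVIRIS bands:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Stand-in for the [0, 1]-scaled image: 157 bands, a few classes.
rng = np.random.default_rng(0)
X = rng.random((300, 157))
y = rng.integers(0, 3, size=300)

# RBF-kernel SVM: grid search over C and gamma with three-fold
# cross-validation; SVC handles the multiclass case one-against-one.
svm = GridSearchCV(SVC(kernel="rbf"),
                   {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]},
                   cv=3).fit(X, y)

# Random forest: 100 trees, sqrt(157) ~ 13 features tried at each split.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt").fit(X, y)
```

The grid ranges shown are illustrative; the paper only states that a user-defined range was searched.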

Canadian Journal of Remote Sensing / Journal canadien de télédétection The classification results were evaluated by a detailed accuracy assessment and visual interpretation. Accuracy assessment was performed using overall accuracies and confusion matrices to derivate the producer s and user s accuracies. In addition, the kappa coefficient of agreement was derived. For a statistical-based comparison of the produced maps, the significance of differences between two different maps was assessed using a McNemar test with a 95% confidence interval. This nonparametric test, which is based on the standardized normal test statistics, is defined as follows: z = f f f + f 12 21 12 21 where f 12 is the number of samples correctly classified by classifier 1 but misclassified by classifier 2, and f 21 is the number of samples misclassified by classifier 1 but correctly classified by classifier 2 (Foody, 2004). Besides the McNemar test, the maps produced by RF and SVM were compared by Spearman s rank correlation coefficient. Classification results Accuracy assessment Comparing the test areas, the accuracy assessment demonstrates that both machine-learning algorithms perform very accurately and achieve overall accuracies of 94.7% (SVM) and 88.7% (RF) using a small training dataset from 100 samples per class (see Table 2). In contrast, the SAM classifier results in significantly lower accuracy, and the MLC cannot be applied in this case. 2 As expected, the more accurate results are generated by a larger training set, resulting in overall accuracies of 96.9% (SVM) and 90.7% (RF). Comparison of these results with the accuracies achieved by the MLC (67.7%) and SAM (69.7%) indicates that the two machine-learning algorithms SVM and RF significantly outperform traditional methods in terms of overall accuracy. The inclusion of additional samples in the training process (i.e., tr400) further increases the classification accuracy. 
However, the most obvious effect is that the accuracy achieved by the MLC increases by up to 13% when the size of the training set is increased. In contrast, the accuracies of the SVM and RF increase by only 1.0% and 2.5%, respectively. The kappa coefficients show characteristics similar to those of the overall accuracies (see Table 2). In addition, a maximum likelihood classification was performed using a dimensionality-reduced dataset. A conventional principal component analysis was applied to the original image, and 100 bands (principal components, PCs) were selected, which explain approximately 99% of the total variance of the dataset. Whereas the overall accuracy achieved with the largest training sample set remains similar to the accuracy achieved on the whole dataset, the accuracy increased significantly, up to 88%, using the training set with 200 samples per class. Similar results were achieved when the training set with 100 samples per class was applied to the selected principal components. These results confirm the (well-known) dependency of the MLC approach on the number of training samples (with regard to the dimensionality of the dataset). The classification results achieved by SVM and RF were compared using the McNemar test with a 95% confidence interval. The test demonstrates that the two methods provide significantly different results. The values of Spearman's rank correlation coefficient ρ (between 0.57 and 0.65) underline this observation. However, these comparisons need to be investigated in more detail. To investigate the impact of the number of iterations within the RF (i.e., the number of individual DTs in the ensemble) on the classification accuracy, classifications with 25, 50, 75, and 100 iterations were performed. Twenty-five iterations within the RF ensemble resulted in 89.5% accuracy on the test areas (Table 3), and additional iterations within the ensemble have smaller, nonsignificant effects on the overall accuracy.
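The dimensionality-reduction step described above, retaining enough principal components to explain approximately 99% of the total variance, can be sketched as follows. The helper name and the synthetic low-rank data are illustrative assumptions, not the processing chain actually used on the AVIRIS image.

```python
import numpy as np

def n_components_for_variance(X, threshold=0.99):
    """Number of principal components needed to explain `threshold`
    of the total variance (PCA via SVD on the mean-centered data)."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values
    var = s ** 2                             # proportional to PC variances
    ratio = np.cumsum(var) / var.sum()       # cumulative explained variance
    return int(np.searchsorted(ratio, threshold) + 1)

# Toy example: 200 "spectra" with 3 informative directions plus tiny noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 50))
X += 1e-6 * rng.normal(size=X.shape)
k = n_components_for_variance(X, 0.99)
```

For a real hyperspectral cube, the image would first be reshaped to (pixels × bands) before applying the same procedure.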
The producer's and user's accuracies (Table 4) enable a class-wise comparison of the two machine-learning techniques in terms of accuracy. The two classes Firn and glacier ice and Snow are classified most accurately, irrespective of the algorithm. One reason for this is the strong difference between these two classes and all other classes within the spectral feature space. The SVM and the RF perform worse on the four land cover types Old unvegetated andesite lava III, Hyaloclastite formation I, Hyaloclastite formation III, and Scoria in terms of classification accuracy; for the SVM this results in producer's accuracies between 86.6% and 91.6%. These values are much lower than those achieved for the other classes (96.5% on average). The producer's accuracies for these difficult classes achieved by the RF are even lower than those of the SVM (e.g., 61.7% for Hyaloclastite formation III). In addition, the differences between the producer's and user's accuracies were calculated to evaluate the balance between the two measures (Figure 2). Whereas many classes show balanced accuracies, some classes are strongly biased, particularly towards the producer's accuracies. The imbalance is largest for the RF for Old unvegetated andesite lava III and Hyaloclastite formation I, whereas the SVM causes the largest differences for Andesite lava 1991 II and Hyaloclastite formation III. Moreover, the results produced by the RF seem to be more imbalanced than the accuracies achieved by the SVM. One reason for the lower and imbalanced accuracies might be the similarity between the spectral responses of different classes, which results in confusion between these lithological units. The spectra of hyaloclastite units, such as Hyaloclastite formation I, and of Old unvegetated andesite lava III appear similar; consequently, differentiation between these classes is more difficult.
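The producer's and user's accuracies discussed above are derived from the confusion matrix as per-class recall and precision, respectively. A minimal sketch with an invented two-class matrix:

```python
import numpy as np

def producer_user_accuracies(cm):
    """cm[i, j] = number of reference (ground truth) samples of class i
    assigned to class j by the classifier.
    Producer's accuracy = per-class recall (row-wise),
    user's accuracy = per-class precision, i.e. map reliability (column-wise)."""
    cm = np.asarray(cm, dtype=float)
    producers = np.diag(cm) / cm.sum(axis=1)  # correct / reference totals
    users = np.diag(cm) / cm.sum(axis=0)      # correct / mapped totals
    return producers, users

# Invented two-class confusion matrix for illustration.
cm = [[90, 10],
      [30, 70]]
pa, ua = producer_user_accuracies(cm)
```

The difference pa - ua per class is what Figure 2 visualizes: a large positive gap means the class is mapped less reliably than it is detected.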
2 The MLC requires at least (number of bands + 1) training pixels per class to avoid a singular covariance matrix.

Vol. 35, Suppl. 1, 2009

Table 2. Overall accuracy and kappa coefficient using different classifier algorithms and training datasets.

            tr100               tr200               tr400
Classifier  Accuracy (%)  Kappa  Accuracy (%)  Kappa  Accuracy (%)  Kappa
SVM         94.7          0.94   96.9          0.97   98.1          0.98
RF          88.7          0.88   90.7          0.90   93.2          0.93
MLC         n/a           n/a    67.7          0.66   90.5          0.89
SAM         69.2          0.67   69.7          0.67   69.2          0.67

Visual interpretation

The visual interpretation supports the results of the statistical accuracy assessment; the maps appear similar in many regions (Figure 3) but also show some differences. In particular, large homogeneous regions do not show significant differences. The main discrepancy appears to be that some linear structures of the class Volcanic tephra are more distinct on the RF map than on the SVM map; however, these linear features are assumed to be related to artifacts. Parallel linear features of Hyaloclastite formation III and Old unvegetated andesite lava II are visible in the southeast of the map (i.e., the lower right-hand corner). The two maps are significantly different in this region, and in this case the structures appear more distinct in the SVM result than in the RF classification. In addition, some differences are visible in smaller regions.

Discussion and conclusion

The performance of two algorithms from the field of machine learning, support vector machines (SVMs) and random forests (RFs), on the classification of hyperspectral imagery was assessed in this paper. Generally, SVMs and RFs have the potential to be more accurate than the maximum likelihood classifier in the classification of hyperspectral data because of the limited number of training samples that are usually available and the high dimensionality of such datasets. The low accuracy achieved by the spectral angle mapper (SAM) for the current study area is not necessarily a general limitation of the method; SAM has performed well in several studies, particularly when applied to spectroscopy data.
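For reference, SAM's decision rule, assigning each pixel to the class whose mean spectrum subtends the smallest spectral angle, can be sketched as follows; the class names and spectra below are invented. Because the angle is invariant to scaling of a spectrum, SAM is insensitive to overall illumination differences, which is one reason it is popular for spectroscopy data.

```python
import math

def spectral_angle(s, r):
    """Spectral angle (radians) between a pixel spectrum s and a
    reference (class-mean) spectrum r, as used by SAM."""
    dot = sum(a * b for a, b in zip(s, r))
    norm_s = math.sqrt(sum(a * a for a in s))
    norm_r = math.sqrt(sum(b * b for b in r))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    return math.acos(max(-1.0, min(1.0, dot / (norm_s * norm_r))))

def sam_classify(pixel, class_means):
    """Assign the class whose mean spectrum has the smallest angle."""
    return min(class_means, key=lambda c: spectral_angle(pixel, class_means[c]))

# Invented class-mean spectra for demonstration only.
class_means = {
    "snow": [1.0, 0.9, 0.8],
    "lava": [0.1, 0.2, 0.4],
}
label = sam_classify([2.0, 1.8, 1.6], class_means)  # a scaled "snow" spectrum
```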
Moreover, the approach can be improved, e.g., by using spectral libraries. Nevertheless, for a typical supervised classification approach, the two proposed algorithms may be superior because they provide better results largely independent of the size of the training sample set. The high producer's and user's accuracies underline the good performance; on average, the individual lithological units are classified with accuracies of approximately 90%. As mentioned previously, the methods perform worse on some land cover types (e.g., Old unvegetated andesite lava III, Hyaloclastite formation I, Hyaloclastite formation III, and Scoria) in terms of classification accuracy, resulting in values that are much lower than the average accuracies achieved for the other classes. One reason for this observation might be the similarity between the spectral responses of various classes, which results in confusion between different classes. In terms of the overall and class accuracies, SVMs outperform RFs. Moreover, SVMs achieve more balanced producer's and user's accuracies. Nevertheless, the RFs achieve accurate results with overall accuracies of 90% and higher. The differences between the produced maps were also underlined by the McNemar test and Spearman's rank correlation coefficients. It is interesting to note that the two classifiers do not generate identical results. Diversity is the basic assumption of an adequate classifier combination, since combining similar classification results would not further improve the overall accuracy. Thus, it could be interesting to combine both algorithms to further increase the classification accuracy (Waske and van der Linden, 2008).

Table 3. Overall accuracy using the RF with a different number of iterations and training set tr200.

No. of iterations   25     50     75     100
Accuracy (%)        89.5   90.2   90.4   90.5
The main reason for the discrepancies between the maps may be the two different classifier concepts. SVM classifiers are based on a few relatively complex decision boundaries. In linearly nonseparable cases, the data are transferred into a higher dimensional feature space, which enables the generation of a decision boundary that appears nonlinear in the original feature space. In contrast, the RF approach is based on several DT classifiers, which construct many rather simple decision boundaries that are parallel to the feature axes. On the whole, the two classification approaches are useful for the classification of hyperspectral data. They perform well, even with small training sample sets. This observation is particularly important for the classification of remote and isolated environments with difficult access and a lack of infrastructure (e.g., northern environments). In such regions, fieldwork is often limited (and very costly), and mapping strategies based on limited ground-truth data seem favorable. Adequate sampling strategies can further decrease the number of required training samples (Foody et al., 2006). Although the SVM results in higher overall accuracies, the RF is also attractive as an accurate and simple approach. The RF is easy to handle because it mainly depends on only two user-defined values (i.e., the number of selected features and the number of iterations). On the other hand, the parameters for the SVM cannot be defined as simply as those for the RF, but an automatic grid search enables parameter selection in the SVM case. In conclusion, both classification algorithms are useful for discriminating different volcanic units in Iceland based on hyperspectral datasets. It is anticipated that these classification techniques will also be appropriate for most northern environments where hyperspectral data are used for mapping lithology.

Table 4. Producer's (PA) and user's (UA) accuracies using SVM and RF (100 iterations) with training set tr200.

Class                                  SVM PA (%)  SVM UA (%)  RF PA (%)  RF UA (%)
Andesite lava 1970                         95.4        93.4       87.6       87.6
Andesite lava 1980 I                       98.4        96.9       96.5       85.7
Andesite lava 1980 II                      95.6        96.2       87.8       88.4
Andesite lava 1991 I                       98.4        99.4       99.2       98.5
Andesite lava 1991 II                     100.0        87.9       97.6       81.3
Andesite lava with birch bushes            99.7        99.1       97.3       93.1
Andesite lava with sparse moss cover       99.2        98.8       89.8       95.4
Andesite lava with thick moss cover        99.8        99.8       97.2       98.4
Old unvegetated andesite lava I            97.9        98.2       94.3       97.6
Old unvegetated andesite lava II           96.7        95.9       91.8       87.3
Old unvegetated andesite lava III          89.6        77.4       89.0       63.3
Hyaloclastite formation I                  91.5        96.6       65.5       84.7
Hyaloclastite formation II                 94.7        97.0       81.0       79.8
Hyaloclastite formation III                86.6        91.7       61.7       78.1
Hyaloclastite formation IV                 97.9        96.8       97.6       87.9
Lava with tephra and scoria                97.1        97.9       95.7       83.3
Lichen covered basalt lava                 95.9        95.3       91.0       88.7
Rhyolite                                   98.5        95.6       88.1       68.5
Scoria                                     91.6        96.2       67.6       66.0
Volcanic tephra                            97.8        98.9       92.5       99.2
Firn and glacier ice                      100.0        99.6       99.6       98.7
Snow                                      100.0       100.0       99.6      100.0

Figure 2. Differences between producer's and user's accuracy generated by the SVM and RF classifications.

Figure 3. Classification results achieved by (a) SVM and (b) RF.

Acknowledgements

This work was partially funded by the Research Fund of the University of Iceland and the Icelandic Research Fund. We appreciate the comments of the anonymous reviewers, who helped us significantly improve the paper.

References

Benediktsson, J.A., and Kanellopoulos, I. 1999. Classification of multisource and hyperspectral data based on decision fusion. IEEE Transactions on Geoscience and Remote Sensing, Vol. 37, No. 3, pp. 1367–1377.

Benediktsson, J.A., and Swain, P.H. 1992. Consensus theoretic classification methods. IEEE Transactions on Systems, Man and Cybernetics, Vol. 22, No. 4, pp. 688–704.

Benediktsson, J.A., Chanussot, J., and Fauvel, M. 2007. Multiple classifier systems in remote sensing: from basics to recent developments. In Multiple Classifier Systems, Proceedings of the 7th International Workshop, MCS 2007, Prague. Lecture Notes in Computer Science 4472. Edited by M. Haindl, J. Kittler, and F. Roli. Springer, Berlin.

Breiman, L. 2001. Random forests. Machine Learning, Vol. 45, No. 1, pp. 5–32.

Briem, G.J., Benediktsson, J.A., and Sveinsson, J.R. 2002. Multiple classifiers applied to multisource remote sensing data. IEEE Transactions on Geoscience and Remote Sensing, Vol. 40, No. 10, pp. 2291–2299.

Burges, C.J.C. 1998. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, Vol. 2, pp. 121–167.

Chang, C.-C., and Lin, C.-J. 2001. LIBSVM: a library for support vector machines. Available at www.csie.ntu.edu.tw/~cjlin/libsvm.

Chen, X., Warner, T.A., and Campagna, D.J. 2007. Integrating visible, near-infrared and short-wave infrared hyperspectral and multispectral thermal imagery for geological mapping at Cuprite, Nevada. Remote Sensing of Environment, Vol. 110, No. 3, pp. 344–356.

Cheung-Wai Chan, J., and Paelinckx, D. 2008. Evaluation of Random Forest and Adaboost tree-based ensemble classification and spectral band selection for ecotope mapping using airborne hyperspectral imagery. Remote Sensing of Environment, Vol. 112, No. 6, pp. 2999–3011.

Clark, R.N. 1999. Spectroscopy of rocks and minerals, and principles of spectroscopy. In Remote sensing for the Earth sciences: Manual of remote sensing. Edited by A.N. Rencz. 3rd ed. John Wiley and Sons, New York. Vol. 3, pp. 3–58.

Dennison, P.E., Halligan, K.Q., and Roberts, D.A. 2004. A comparison of error metrics and constraints for multiple endmember spectral mixture analysis and spectral angle mapper. Remote Sensing of Environment, Vol. 93, No. 3, pp. 359–367.

Duda, R.O., Hart, P.E., and Stork, D.G. (Editors). 2001. Pattern classification. 2nd ed. John Wiley & Sons Inc., Chichester, N.Y.

Fauvel, M., Chanussot, J., and Benediktsson, J.A. 2006. Evaluation of kernels for multiclass classification of hyperspectral remote sensing data. In ICASSP 2006, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 14–19 May 2006, Toulouse, France. IEEE, New York. Vol. 2, pp. 813–816.

Foody, G.M. 2004. Thematic map comparison: evaluating the statistical significance of differences in classification accuracy. Photogrammetric Engineering & Remote Sensing, Vol. 70, pp. 627–633.

Foody, G.M., and Mathur, A. 2004. A relative evaluation of multiclass image classification of support vector machines. IEEE Transactions on Geoscience and Remote Sensing, Vol. 42, pp. 1335–1343.

Foody, G.M., Mathur, A., Sanchez-Hernandez, C., and Boyd, D.S. 2006. Training set size requirements for the classification of a specific class. Remote Sensing of Environment, Vol. 104, No. 1, pp. 1–14.

Friedl, M.A., and Brodley, C.E. 1997. Decision tree classification of land cover from remotely sensed data. Remote Sensing of Environment, Vol. 61, No. 3, pp. 399–409.

Gislason, P.O., Benediktsson, J.A., and Sveinsson, J.R. 2006. Random Forests for land cover classification. Pattern Recognition Letters, Vol. 27, pp. 294–300.

Goetz, A.F.H., Rowan, L.C., and Kingston, M.J. 1982. Mineral identification from orbit: initial results from the shuttle multispectral infrared radiometer. Science (Washington, D.C.), Vol. 218, pp. 1020–1031.

Green, R.O., Eastwood, M.L., Sarture, C.M., Chrien, T.G., Aronsson, M., Chippendale, B.J., Faust, J.A., Pavri, B.E., Chovit, C.J., Solis, M., Olah, M.R., and Williams, O. 1998. Imaging spectroscopy and the airborne visible/infrared imaging spectrometer (AVIRIS). Remote Sensing of Environment, Vol. 65, No. 3, pp. 227–248.

Ham, J., Chen, Y., and Crawford, M.M. 2006. Investigation of the Random Forest framework for classification of hyperspectral data. IEEE Transactions on Geoscience and Remote Sensing, Vol. 43, No. 3, pp. 492–501.

Harken, J., and Sugumaran, R. 2005. Classification of Iowa wetlands using an airborne hyperspectral image: a comparison of the spectral angle mapper classifier and an object-oriented approach. Canadian Journal of Remote Sensing, Vol. 31, No. 2, pp. 167–174.

Harris, J.R., Rogge, D., Hitchcock, R., Ijewliw, O., and Wright, D. 2005. Mapping lithology in Canada's Arctic: application of hyperspectral data using the minimum noise fraction transformation and matched filtering. Canadian Journal of Earth Sciences, Vol. 42, No. 12, pp. 2173–2193.

Hsu, C.-W., and Lin, C.-J. 2002. A comparison of methods for multi-class support vector machines. IEEE Transactions on Neural Networks, Vol. 13, No. 2, pp. 415–425.

Huang, C., Davis, L.S., and Townshend, J.R. 2002. An assessment of support vector machines for land cover classification. International Journal of Remote Sensing, Vol. 23, No. 4, pp. 725–749.

Hughes, G.F. 1968. On the mean accuracy of statistical pattern recognizers. IEEE Transactions on Information Theory, Vol. 14, pp. 55–63.

Janz, A., Schiefer, S., Waske, B., and Hostert, P. 2007. imageSVM: a user-oriented tool for advanced classification of hyperspectral data using support vector machines. In Imaging Spectroscopy: Innovation in Environmental Research, Proceedings of the 5th Workshop of the EARSeL Special Interest Group on Imaging Spectroscopy, 23–25 April 2007, Bruges, Belgium. EARSeL Secretariat, Hannover, Germany.

Kruse, F.A. 1988. Use of airborne imaging spectrometer data to map minerals associated with hydrothermally altered rocks in the northern Grapevine Mountains, Nevada and California. Remote Sensing of Environment, Vol. 24, No. 1, pp. 31–51.

Kruse, F.A., Lefkoff, A.B., Boardman, J.W., Heidebrecht, K.B., Shapiro, A.T., Barloon, P.J., and Goetz, A.F.H. 1993. The Spectral Image Processing System (SIPS): interactive visualization and analysis of imaging spectrometer data. Remote Sensing of Environment, Vol. 44, No. 2–3, pp. 145–163.

Mathur, A., and Foody, G.M. 2008. Multiclass and binary SVM classification: implications for training and classification users. IEEE Geoscience and Remote Sensing Letters, Vol. 5, No. 2, pp. 241–245.

Melgani, F., and Bruzzone, L. 2004. Classification of hyperspectral remote sensing images with support vector machines. IEEE Transactions on Geoscience and Remote Sensing, Vol. 42, No. 8, pp. 1778–1790.

Pal, M., and Mather, P.M. 2006. Some issues in the classification of DAIS hyperspectral data. International Journal of Remote Sensing, Vol. 27, No. 14, pp. 2895–2916.

Polikar, R. 2006. Ensemble based systems in decision making. IEEE Circuits and Systems Magazine, Vol. 6, pp. 21–45.

Richards, J.A., and Jia, X. 2006. Remote sensing digital image analysis: an introduction. Springer, New York.

Safavian, S.R., and Landgrebe, D.A. 1991. A survey of decision tree classifier methodology. IEEE Transactions on Systems, Man, and Cybernetics, Vol. 21, pp. 660–674.

Sahoo, B.C., Oommen, T., Misra, D., and Newby, G. 2007. Using the one-dimensional S-transform as a discrimination tool in classification of hyperspectral images. Canadian Journal of Remote Sensing, Vol. 33, No. 6, pp. 551–560.

Schölkopf, B., and Smola, A. 2002. Learning with kernels. MIT Press, Cambridge, Mass.

Soosalu, H., Einarsson, P., and Jakobsdóttir, S. 2003. Volcanic tremor related to the 1991 eruption of the Hekla volcano, Iceland. Bulletin of Volcanology, Vol. 65, No. 8, pp. 562–577.

Vapnik, V.N. 1998. Statistical learning theory. Wiley, New York.

van der Linden, S., Waske, B., and Hostert, P. 2007a. Towards an optimized use of the spectral angle space. In Imaging Spectroscopy: Innovation in Environmental Research, Proceedings of the 5th Workshop of the EARSeL Special Interest Group on Imaging Spectroscopy, 23–25 April 2007, Bruges, Belgium. EARSeL Secretariat, Hannover, Germany.

van der Linden, S., Janz, A., Waske, B., Eiden, M., and Hostert, P. 2007b. Classifying segmented hyperspectral data from a heterogeneous urban environment using support vector machines. Journal of Applied Remote Sensing [online], Vol. 1, No. 013543.

van der Meer, F., and de Jong, S.M. 2001. Imaging spectrometry: basic principles and prospective applications. Kluwer Academic Publishers, London, UK.

Waske, B., and Benediktsson, J.A. 2007. Fusion of support vector machines for classification of multisensor data. IEEE Transactions on Geoscience and Remote Sensing, Vol. 45, No. 12, pp. 3858–3866.

Waske, B., and van der Linden, S. 2008. Classifying multilevel imagery from SAR and optical sensors by decision fusion. IEEE Transactions on Geoscience and Remote Sensing, Vol. 46, No. 5, pp. 1457–1466.

Zambon, M., Lawrence, R., Bunn, A., and Powell, S. 2006. Effect of alternative splitting rules on image processing using classification tree analysis. Photogrammetric Engineering & Remote Sensing, Vol. 72, pp. 25–30.