Online publication date: 22 January 2010

Remote Sensing Letters
Publisher: Taylor & Francis
Publication details, including instructions for authors and subscription information: http://www.informaworld.com/smpp/title~content=t915281289

To cite this article: Ghimire, B., Rogan, J. and Miller, J. (2010) 'Contextual land-cover classification: incorporating spatial dependence in land-cover classification models using random forests and the Getis statistic', Remote Sensing Letters, 1: 1, 45–54
To link to this article: DOI: 10.1080/01431160903252327
URL: http://dx.doi.org/10.1080/01431160903252327

Remote Sensing Letters
Vol. 1, No. 1, March 2010, 45–54

Contextual land-cover classification: incorporating spatial dependence in land-cover classification models using random forests and the Getis statistic

B. GHIMIRE*(a), J. ROGAN(a) and J. MILLER(b)
(a) Graduate School of Geography, Clark University, Worcester, MA 01610, USA
(b) Department of Geography and the Environment, University of Texas, Austin, TX 78712, USA
*Corresponding author. Email: bghimire@clarku.edu

(Received 10 April 2009; in final form 11 August 2009)

Remote Sensing Letters ISSN 2150-704X print/ISSN 2150-7058 online © 2010 Taylor & Francis, http://www.tandf.co.uk/journals, DOI: 10.1080/01431160903252327

Land-cover characterization of large heterogeneous landscapes is challenging because of the confusion caused by high intra-class variability and heterogeneous landscape artefacts. Neighbourhood context can be used to supplement spectral information, and a novel way of incorporating spatial dependence in a heterogeneous region is tested here using an ensemble learning technique called random forests and a measure of local spatial dependence called the Getis statistic. The overall Kappa accuracy of the random forest classifier that used a combination of spectral and local spatial (Getis) variables at three different neighbourhood sizes (3 × 3, 7 × 7, and 11 × 11) ranged from 0.85 to 0.92. This accuracy was higher than that of a non-spatial random forest classifier, run using the spectral variables only, which had an overall Kappa accuracy of 0.78. This study demonstrated that the use of the Getis statistic with different neighbourhood sizes leads to a substantial increase in the per class classification accuracy of heterogeneous land-cover categories.

1. Introduction

Methods of land-cover characterization using medium spatial resolution data (15–30 m) are well established and near operational (Franklin and Wulder 2002). Unfortunately, at fine spatial scales, the size of the pixel is still often larger than the land-cover element, causing class mixing within pixels (Blaschke et al. 2004), and over large areas map products lack spatial precision owing to heterogeneous landscape artefacts such as topographic displacement, moisture gradients, and antecedent disturbances (i.e. natural and anthropogenic) (Rogan and Miller 2006).

Random forest classifiers provide a new way to produce thematic maps that are potentially robust to variations in class reflectance caused by gradients or disturbances in regional scale mapping and to high intra-class variability influenced by landscape heterogeneity, which is evident in the class labelling errors in the calibration data used in land-cover characterization (see Ham et al. 2005, Gislason et al. 2006, Lippitt et al. 2008, Rogan et al. 2008). Random forest classifiers are ensemble algorithms developed in the field of machine learning and use bootstrap samples with replacement to grow a large set of classification trees (Breiman 2001). In a land-cover classification context, pixels are assigned to the categories that receive the maximum number of votes from the collection of multiple trees. Random forests do not overfit the data distributions because the large number of trees grown reduces generalization error (Breiman 2001, Pal 2005), and they have been shown to increase land-cover classification accuracy (Ham et al. 2005, Pal 2005, Gislason et al. 2006) because the classification error of one permutation can be overcome by the ensemble of permutations (Kotsiantis and Pintelas 2004). Furthermore, random forests provide measures of variable importance (e.g. mean decrease in Gini coefficient) that can be used to understand the contribution of specific variables in a classification (Gislason et al. 2006).

Random forests represent the state of the art in land-cover classification but, as with most classification methods, do not include spatial dependence among neighbouring pixels even though spatial dependence or context can be used to increase land-cover classification accuracies (Stuckens et al. 2000). The incorporation of spatial dependence into random forests classification has the potential to increase the precision of class designation by minimizing intra-class variation (Wulder and Boots 1998). Thus, the incorporation of spatial dependence in random forests has the potential to compensate for their inability to utilize spatial relationships among neighbouring pixels.

The incorporation of spatial dependence or context has been approached in the literature using (i) texture-based transformed spectral bands as variables in land-cover classification in order to capture the local variability of the digital numbers within a neighbourhood using geostatistics (Atkinson and Lewis 2000) and spatial autocorrelation statistics (Myint et al. 2007) and (ii) a majority smoothing filter applied to the classified image (Mather 2004).
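Approach (ii) above, the majority smoothing filter, can be illustrated with a minimal sketch. This is not the implementation used in the study: the function name `majority_filter`, the clipping of windows at the image edge, and the tie rule are assumptions made only for illustration.

```python
from collections import Counter


def majority_filter(labels, size=3):
    """Replace each cell by the most frequent label in its size x size window.

    Assumptions (not from the paper): windows are clipped at the image edge,
    and on a tie the original label is kept if it is one of the modes.
    """
    half = size // 2
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for r in range(rows):
        for c in range(cols):
            window = [
                labels[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            counts = Counter(window)
            best = max(counts.values())
            modes = sorted(lab for lab, n in counts.items() if n == best)
            out[r][c] = labels[r][c] if labels[r][c] in modes else modes[0]
    return out


# Toy classified image: one isolated "urban" pixel inside forest.
classified = [
    ["forest", "forest", "forest"],
    ["forest", "urban", "forest"],
    ["forest", "forest", "forest"],
]
smoothed = majority_filter(classified, size=3)
```

In this toy grid the isolated pixel is relabelled as forest. A larger window removes more isolated pixels but can also erase small genuine patches.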
In this context, this study presents a novel contextual texture-based classification method to assess the capability of a random forest algorithm to classify land cover by integrating spectral and Getis variables in a heterogeneous landscape using Landsat-7 Enhanced Thematic Mapper Plus (ETM+) imagery. An improved approach to incorporating spatial dependence in land-cover classification is through the Getis statistic, Gi*. This statistic behaves similarly to a moving filter in a remote sensing context by considering pixel values within a local neighbourhood of the focus pixel (which helps to remove labelling errors caused by noisy data or complex spectral measurement space) while simultaneously accommodating the values in the entire image, reflecting global landscape heterogeneity characteristics (Wulder and Boots 1998, Richards and Jia 2006).

2. Methodology

2.1 Random forests

A random forest classifier (i.e. random forests) consists of a group of tree-based classifiers {h(x, Θk), k = 1, ...} where x is the input vector and Θk are independent identically distributed random vectors (Breiman 2001). Random forests use bootstrap samples with replacement to grow a large collection of classification trees, which assign each pixel to a class based on the maximum number of votes that a class receives from the collection of trees. Each tree is grown from a randomly and independently selected subspace (a certain proportion of pixels) of the measurement space (training pixels) that is used to train the random forest classifier, and the remaining samples, called out-of-bag cases, are used to assess the accuracy of the classification. Variable importance is calculated as the sum of the decrease in Gini for each variable calculated from the group of tree-based classifiers. This algorithm is easy to implement because the user has to adjust only two parameters: (i) the number of trees to grow and (ii) the number of randomly selected split variables at each node. By the strong law of large numbers, the generalization error converges as more trees are added, so increasing the number of trees does not lead to overfitting (Breiman 2001).

2.2 Getis statistic

The Getis statistic is a local indicator of spatial dependence (Getis and Ord 1992, Ord and Getis 1995), which describes local variability in spatial dependence (Wulder and Boots 1998). The application of the Getis statistic results in the creation of a new image representing the spatial structure of a given spectral image (Wulder and Boots 1998). The Getis statistic, Gi*, can be used to identify clusters of high values called hot spots or clusters of low values called cold spots. It is computed as

$$G_i^* = \frac{\sum_j w_{ij} x_j}{\sum_j x_j} \qquad (1)$$

where G_i^* is the Getis statistic value, w_{ij} are the contiguity-based weights specified as w_{ij} = 1 if pixel j falls in the filter window of the central pixel i and w_{ij} = 0 if pixel j falls outside, and x_j is the value of pixel j (including the value at the central pixel i). Conceptually, G_i^* behaves similarly to a moving filter window and can be computed for each pixel of a given spectral band as the ratio of the sum of radiometric values in the neighbourhood of a central pixel to the sum of radiometric values of all pixels in the study area.

2.3 Data and methods

Satellite sensor images of Cape Cod, Massachusetts, captured by Landsat-7 ETM+ on 7 September 2001 (path/row 11/31) were used in this analysis. True-colour 1:5000 orthorectified aerial photographs from April 2001 (http://www.mass.gov/mgis/) were used to identify land-cover categories in the area and to serve as a source for calibration and validation datasets. The eight land-cover classes considered in this study were cranberry bogs, pasture or row crops, forest, grassland, urban, water, wetland, and sand quarry.
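Equation (1) in section 2.2 can be sketched in a few lines. This is a minimal illustration and not the code used in the study: the function name `getis_gi_star`, the toy band, and the clipping of the window at the image edge are assumptions, since the text does not say how edge pixels were treated.

```python
def getis_gi_star(band, size=3):
    """Per-pixel ratio of the window sum (centre included) to the image sum.

    `band` is a list of rows of radiometric values. Windows are clipped at
    the image edge, which is an assumption of this sketch.
    """
    half = size // 2
    rows, cols = len(band), len(band[0])
    total = float(sum(sum(row) for row in band))
    if total == 0.0:
        raise ValueError("the sum of all pixel values must be non-zero")
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            local = 0.0
            # w_ij = 1 inside the window of pixel i, 0 outside.
            for rr in range(max(0, r - half), min(rows, r + half + 1)):
                for cc in range(max(0, c - half), min(cols, c + half + 1)):
                    local += band[rr][cc]
            out[r][c] = local / total
    return out


# Toy band with a bright 2 x 2 cluster in the centre.
band = [
    [1, 1, 1, 1],
    [1, 9, 9, 1],
    [1, 9, 9, 1],
    [1, 1, 1, 1],
]
gi = getis_gi_star(band, size=3)
```

For this toy band the image sum is 48. The window around pixel (1, 1) sums to 41, giving 41/48, while the clipped corner window sums to 12, giving 0.25, so the bright cluster stands out as a hot spot. Computing the same function on each spectral band with window sizes of 3, 7, and 11 would yield one Getis variable per band and neighbourhood size.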
The ground reference dataset of land-cover categories was split randomly into a 75% subset of calibration samples and a 25% subset of validation samples. Table 1 presents the different land-cover categories examined in this analysis and the number of calibration and validation pixels assigned to each category.

Table 1. Number of calibration and validation pixels for different land-cover classes.

Class name              Calibration pixels   Validation pixels
Cranberry bogs                          23                   9
Pasture or row crops                    22                  10
Forest                                 157                  67
Grassland                               91                  33
Urban                                  163                  69
Water                                   81                  39
Wetland                                133                  65
Sand quarry                             48                  15
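The bootstrap sampling, out-of-bag cases, and majority voting described in section 2.1 can be sketched on the calibration set of table 1. This is an illustrative sketch only: the helper names `bootstrap_split` and `majority_vote` and the fixed random seed are assumptions, and no trees are actually grown.

```python
import random
from collections import Counter


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement; the rest are out-of-bag."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


def majority_vote(tree_votes):
    """Return the class that receives the most votes from the trees."""
    return Counter(tree_votes).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed, chosen only to make the sketch repeatable
n_calibration = 718     # sum of the calibration pixels listed in table 1
in_bag, out_of_bag = bootstrap_split(n_calibration, rng)
oob_fraction = len(out_of_bag) / n_calibration

# Hypothetical votes of five trees for one pixel.
label = majority_vote(["forest", "wetland", "forest", "urban", "forest"])
```

On average roughly one third of the calibration pixels are left out of each bootstrap sample, which is why the out-of-bag cases can serve as an internal accuracy check for each tree.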

A random forest classification model, hereafter called the non-spatial random forest classifier, was run on the six spectral bands using the random forest package in the R statistical software (R Development Core Team 2008). This model was calibrated by varying two input parameters: (i) the number of trees from 100 to 1000 in increments of 100 and (ii) the number of randomly selected split variables from 1 to 6 (where 6 refers to the number of bands) in increments of 1, in order to select the single model with the highest accuracy among these parameter combinations. Post-classification smoothing majority filters of 3 × 3, 7 × 7, and 11 × 11 pixel sizes using a mode decision rule were applied to this non-spatial random forest classified image.

In contrast, another classification model, hereafter called the Getis-random forest classifier, utilized the random forest classification algorithm run on the six Getis variables computed on each spectral band in addition to the six spectral bands. Three different Getis-random forest classifiers were tested for Getis statistic neighbourhood sizes of 3 × 3, 7 × 7, and 11 × 11 pixels with the same input parameters as those used for the non-spatial random forest classifier, in order to provide consistent results for comparison with the non-spatial random forest classifier. These three neighbourhood sizes were considered to investigate the influence of different spatial scales on the classification accuracy.

Comparison of the accuracy of the different variants of the random forest model was performed using the Kappa statistic, which takes into account the actual agreement specified by the major diagonal of the confusion matrix and the chance agreement indicated by the row and column totals of the confusion matrix (Congalton 1991). Furthermore, accuracies of individual classes were assessed with per class Kappa measures (Stehman 1997).

3. Results

The non-spatial random forest classifier parameterized with 600 trees and 3 randomly selected split variables at each node had the highest overall validation accuracy (overall Kappa = 0.78 with Z = 29.15; average per class Kappa = 0.72 with standard deviation = 0.24) amongst the non-spatial random forest classifiers parameterized by varying the number of trees and number of split variables (table 2). Figure 1 shows the classified map produced by the non-spatial random forest classifier. Table 3 shows the confusion matrix of the non-spatial random forest classifier. The post-classification smoothing majority filters applied to the non-spatial random forest classified image showed a decreasing trend in accuracy with increasing neighbourhood size (table 2), reflecting the trade-off between retaining the signal and removing unwarranted noisy pixels. A majority filter neighbourhood size of 3 × 3 pixels produced the highest accuracy (overall Kappa = 0.89 with Z = 43.81; average per class Kappa = 0.82 with standard deviation = 0.30), whereas 11 × 11 pixels had the lowest accuracy (overall Kappa = 0.72 with Z = 24.79; average per class Kappa = 0.61 with standard deviation = 0.32).

Table 2. Accuracy measures of classifiers with and without spatial dependence.

Classification method                                            Overall        Overall   Mean per      Standard
                                                                 accuracy (%)   Kappa     class Kappa   deviation
Non-spatial random forest                                        81.76          0.7782    0.7245        0.2417
Non-spatial random forest, 3 × 3 pixels majority filter          90.55          0.8850    0.8215        0.2968
Non-spatial random forest, 7 × 7 pixels majority filter          86.64          0.8363    0.7600        0.3153
Non-spatial random forest, 11 × 11 pixels majority filter        77.52          0.7214    0.6112        0.3205
Getis-random forest, 3 × 3 pixels                                87.62          0.8496    0.8123        0.1708
Getis-random forest, 7 × 7 pixels                                93.49          0.9209    0.8961        0.1199
Getis-random forest, 11 × 11 pixels                              93.49          0.9212    0.9303        0.0629

Figure 1. Land-cover maps produced by (a) the non-spatial random forest classifier (top left), (b) the Getis-random forest classifier using an 11 × 11 pixels moving window (top right) and (c) difference map of the non-spatial random forest and Getis-random forest (11 × 11 pixels) land-cover maps (bottom left).

Table 3. Confusion matrix of the non-spatial random forest classifier using validation samples. Columns follow the same class order as the rows (CB, cranberry bogs; PR, pasture or row crops; FO, forest; GR, grassland; UR, urban; WA, water; WE, wetland; SQ, sand quarry).

Class name                 CB      PR      FO      GR      UR      WA      WE      SQ   Total   Commission error
Cranberry bogs              6       0       0       0       1       0       0       0       7   0.1429
Pasture or row crops        1       2       0       0       1       0       0       0       4   0.5
Forest                      0       0      60       5       2       0       7       0      74   0.1892
Grassland                   2       4       0      27       6       0       1       1      41   0.3415
Urban                       0       4       2       1      53       0       6       1      67   0.209
Water                       0       0       0       0       0      39       0       0      39   0
Wetland                     0       0       5       0       5       0      51       0      61   0.1639
Sand quarry                 0       0       0       0       1       0       0      13      14   0.0714
Total                       9      10      67      33      69      39      65      15     307
Omission error         0.3333     0.8  0.1045  0.1818  0.2319       0  0.2154  0.1333

The Getis-random forest classifier that integrates the spectral and Getis variables led to a significant increase in land-cover map accuracy (table 2). In general, there was an increase in overall Kappa value with increasing neighbourhood sizes from 3 × 3 to 11 × 11 pixels for the Getis-random forest classifiers. The largest neighbourhood size of 11 × 11 pixels resulted in the highest overall accuracy for the Getis-random forest classifier (overall Kappa of 0.92 with Z = 53.87; average per class Kappa = 0.93 with standard deviation = 0.06). This accuracy was significantly different at the 0.05 level (Z = 4.51) from that of the non-spatial random forest classifier and was also higher than that produced by the post-classification majority filters of sizes ranging from 3 × 3 pixels to 11 × 11 pixels applied to the non-spatial random forest classified image. The Getis variables were selected by the Getis-random forest classifier because they were able to explain a substantial amount of variance, as indicated by the Gini index variable importance measure (figure 2). The predicted categories of the Getis-random forest classifier having a window or neighbourhood size of 11 × 11 pixels are shown in figure 1. The difference map showing areas of difference and no difference between the land-cover maps produced by the non-spatial random forest classifier and the Getis-random forest classifier is also shown in figure 1.

Figure 2. Variable importance contributions of different bands in terms of percent mean reduction in Gini index of the Getis-random forest classifier with 11 × 11 pixels moving window. [Bar chart; y-axis: Gini index (%), 0 to 16; x-axis: spectral bands 1, 2, 3, 4, 5, 7 and the corresponding Getis bands 1, 2, 3, 4, 5, 7.]

Table 4 shows the confusion matrix along with the class-specific accuracy measures for the Getis-random forest classifier. Table 5 shows the increase in per class Kappa value of each land-cover category for the Getis-random forest classifier over the non-spatial classifier. The highest increase in per class Kappa values was found for heterogeneous map categories, such as pasture or row crops (increase in per class Kappa = 373.23%) and cranberry bogs (increase in per class Kappa = 34.45%). The pasture or row crops category was initially confused with the grassland and urban categories, and the cranberry bog category was confused with the pasture or row crops and grassland categories when the non-spatial classifier was applied, because the pasture or row crops and cranberry bog categories were characterized by low inter-class separability and high intra-class variability.
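The overall and per class Kappa values reported for the non-spatial classifier can be recomputed from the counts in table 3. The sketch below is not the authors' code. It assumes that rows hold the classified labels and columns the reference labels (inferred from the commission and omission error margins), and the per class formula is an assumption chosen because it matches the non-spatial per class values in table 5.

```python
# Counts from table 3 (non-spatial random forest, validation samples).
# Assumed orientation: rows = classified labels, columns = reference labels.
TABLE_3 = [
    [6, 0, 0, 0, 1, 0, 0, 0],    # cranberry bogs
    [1, 2, 0, 0, 1, 0, 0, 0],    # pasture or row crops
    [0, 0, 60, 5, 2, 0, 7, 0],   # forest
    [2, 4, 0, 27, 6, 0, 1, 1],   # grassland
    [0, 4, 2, 1, 53, 0, 6, 1],   # urban
    [0, 0, 0, 0, 0, 39, 0, 0],   # water
    [0, 0, 5, 0, 5, 0, 51, 0],   # wetland
    [0, 0, 0, 0, 1, 0, 0, 13],   # sand quarry
]


def margins(matrix):
    """Return (row totals, column totals, grand total)."""
    rows = [sum(r) for r in matrix]
    cols = [sum(c) for c in zip(*matrix)]
    return rows, cols, sum(rows)


def overall_kappa(matrix):
    """Observed agreement and Kappa = (po - pe) / (1 - pe)."""
    rows, cols, n = margins(matrix)
    po = sum(matrix[i][i] for i in range(len(matrix))) / n
    pe = sum(r * c for r, c in zip(rows, cols)) / n ** 2
    return po, (po - pe) / (1 - pe)


def per_class_kappa(matrix):
    """Conditional Kappa per class, based on the reference (column) totals.

    Assumed form: (n_ii / column_i - row_i / n) / (1 - row_i / n).
    """
    rows, cols, n = margins(matrix)
    return [
        (matrix[i][i] / cols[i] - rows[i] / n) / (1 - rows[i] / n)
        for i in range(len(matrix))
    ]


po, kappa = overall_kappa(TABLE_3)
class_kappa = per_class_kappa(TABLE_3)
mean_class_kappa = sum(class_kappa) / len(class_kappa)
```

With these counts the observed agreement is 251/307 (81.76%) and the overall Kappa rounds to 0.7782, in line with table 2. The per class values for cranberry bogs and pasture or row crops round to 0.6589 and 0.1894, and the mean per class Kappa rounds to 0.7245.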
With the Getis-random forest classifier, however, the Getis variables provided additional dimensions in the feature space that increased inter-class separability and reduced intra-class variability. In contrast, small increases in Kappa values were observed for homogeneous classes, such as water (increase in per class Kappa = 0%) and forest (increase in per class Kappa = 4.99%), because the homogeneous categories were highly separable using the spectral variables only and additional variables did not contribute to increasing inter-class separability.

Table 4. Confusion matrix of the Getis-random forest classifier with 11 × 11 pixels moving window using validation samples. Columns follow the same class order as the rows (CB, cranberry bogs; PR, pasture or row crops; FO, forest; GR, grassland; UR, urban; WA, water; WE, wetland; SQ, sand quarry).

Class name                 CB      PR      FO      GR      UR      WA      WE      SQ   Total   Commission error
Cranberry bogs              8       0       0       0       0       0       0       0       8   0
Pasture or row crops        0       9       0       0       2       0       0       0      11   0.1818
Forest                      0       0      62       0       2       0       1       0      65   0.0462
Grassland                   0       0       0      33       2       0       0       0      35   0.0571
Urban                       0       1       2       0      60       0       3       0      66   0.0909
Water                       0       0       0       0       0      39       0       0      39   0
Wetland                     1       0       3       0       2       0      61       0      67   0.0896
Sand quarry                 0       0       0       0       1       0       0      15      16   0.0625
Total                       9      10      67      33      69      39      65      15     307
Omission error         0.1111     0.1  0.0746       0  0.1304       0  0.0615       0

Table 5. Per class Kappa values of the non-spatial and Getis-random forest classifiers.

Class name              Kappa value            Kappa value               Increase in Kappa of Getis-random
                        (non-spatial           (Getis-random forest,     forest over non-spatial
                        random forest)         11 × 11 pixels)           random forest (%)
Cranberry bogs          0.6589                 0.8859                     34.45
Pasture or row crops    0.1894                 0.8963                    373.23
Forest                  0.8623                 0.9053                      4.99
Grassland               0.7902                 1                          26.55
Urban                   0.7034                 0.8338                     18.54
Water                   1                      1                           0.00
Wetland                 0.7312                 0.9213                     26.00
Sand quarry             0.8603                 1                          16.24

4. Conclusion

Spatial dependence can describe the local spatial structure and variability of land-cover categories and can increase land-cover classification accuracies in heterogeneous landscapes that are difficult to classify because of high intra-class variability and low inter-class separability. The increase in accuracy of the Getis-random forest classifier over the non-spatial random forest classifier can be attributed to the large increase in average per class Kappa of heterogeneous map categories in comparison to homogeneous classes. The variation in Kappa accuracy of the Getis-random forest classifier for different neighbourhood sizes is related to the differences in the size of the pixel neighbourhood and the shape and size of the land-cover patches or objects represented by the pixel neighbourhood. It is hypothesized that the Getis statistic used to describe contextual or spatial information has distinct advantages over conventional contextual texture-based classification approaches because, unlike standard contextual methods that consider only values in a given neighbourhood of each pixel, the Getis statistic is a ratio of the values in the neighbourhood of each pixel to the values of the entire image.

References

ATKINSON, P.M. and LEWIS, P., 2000, Geostatistical classification for remote sensing: an introduction. Computers and Geosciences, 26, pp. 361–371.
BLASCHKE, T., BURNETT, C. and PEKKARINEN, A., 2004, Image segmentation methods for object-based analysis and classification. In Remote Sensing Image Analysis: Including the Spatial Domain, S.M. de Jong and F.D. van der Meer (Eds), pp. 211–236 (Dordrecht: Kluwer Academic Publishers).
BREIMAN, L., 2001, Random forests. Machine Learning, 45, pp. 5–32.
CONGALTON, R.G., 1991, A review of assessing the accuracy of classifications of remotely sensed data. Remote Sensing of Environment, 37, pp. 35–46.
FRANKLIN, S.E. and WULDER, M.A., 2002, Remote sensing methods in medium spatial resolution satellite data land cover classification of large areas. Progress in Physical Geography, 26, pp. 173–205.
GETIS, A. and ORD, J.K., 1992, The analysis of spatial association by use of distance statistics. Geographical Analysis, 24, pp. 189–206.
GISLASON, P.O., BENEDIKTSSON, J.A. and SVEINSSON, J.R., 2006, Random forests for land cover classification. Pattern Recognition Letters, 27, pp. 294–300.
HAM, J., CHEN, Y., CRAWFORD, M.M. and GHOSH, J., 2005, Investigation of the random forest framework for classification of hyperspectral data. IEEE Transactions on Geoscience and Remote Sensing, 43, pp. 492–501.
KOTSIANTIS, S.B. and PINTELAS, P.E., 2004, Combining bagging and boosting. International Journal of Computational Intelligence, 1, pp. 324–333.

LIPPITT, C.D., ROGAN, J., LI, Z., EASTMAN, J.R. and JONES, T.G., 2008, Mapping selective logging in mixed deciduous forest: a comparison of machine learning algorithms. Photogrammetric Engineering and Remote Sensing, 74, pp. 1201–1211.
MATHER, P.M., 2004, Computer Processing of Remotely-Sensed Images: An Introduction (Chichester, UK: John Wiley & Sons Ltd).
MYINT, S.W., WENTZ, E.A. and PURKIS, S.J., 2007, Employing spatial metrics in urban land-use/land-cover mapping: comparing the Getis and Geary indices. Photogrammetric Engineering and Remote Sensing, 73, pp. 1403–1415.
ORD, J.K. and GETIS, A., 1995, Local spatial autocorrelation statistics: distributional issues and an application. Geographical Analysis, 27, pp. 286–306.
PAL, M., 2005, Random forest classifier for remote sensing classification. International Journal of Remote Sensing, 26, pp. 217–222.
R DEVELOPMENT CORE TEAM, 2008, R: A Language and Environment for Statistical Computing (Vienna, Austria: R Foundation for Statistical Computing).
RICHARDS, J.A. and JIA, X., 2006, Remote Sensing Digital Image Analysis: An Introduction (Berlin, Germany: Springer-Verlag).
ROGAN, J., FRANKLIN, J., STOW, D., MILLER, J., WOODCOCK, C. and ROBERTS, D., 2008, Mapping land-cover modifications over large areas: a comparison of machine learning algorithms. Remote Sensing of Environment, 112, pp. 2272–2283.
ROGAN, J. and MILLER, J., 2006, Integrating GIS and remotely sensed data for mapping forest disturbance and change. In Understanding Forest Disturbance and Spatial Pattern: Remote Sensing and GIS Approaches, M. Wulder and S.E. Franklin (Eds), pp. 133–170 (Boca Raton, FL: Taylor & Francis).
STEHMAN, S.V., 1997, Selecting and interpreting measures of thematic classification accuracy. Remote Sensing of Environment, 62, pp. 77–89.
STUCKENS, J., COPPIN, P.R. and BAUER, M.E., 2000, Integrating contextual information with per-pixel classification for improved land cover classification. Remote Sensing of Environment, 71, pp. 282–296.
WULDER, M. and BOOTS, B., 1998, Local spatial autocorrelation characteristics of remotely sensed imagery assessed with the Getis statistic. International Journal of Remote Sensing, 19, pp. 2223–2231.