AGOG 485/585 / APLN 533, Spring 2019. Lecture 5: MODIS land cover product (MCD12Q1). Additional sources of MODIS data

Outline
- Current status of land cover products
- Overview of the MCD12Q1 algorithm
- Mapping global urban areas
Reading: Textbook Chapter 20. Optional: Chapter 12. Check additional readings on the website.

MODIS Global Land Cover Product
Designed to provide information on the state and the seasonal-to-decadal dynamics of global land cover. There are two datasets:
- MODIS land cover type (MCD12Q1): 5 main layers in which land cover is mapped using several different classification systems.
- MODIS land cover dynamics (MCD12Q2): 7 layers, to support studies of seasonal and interannual variation in land surface and ecosystem properties. Also known as the Global Vegetation Phenology product.

MCD12Q1: Land Cover Types
Layers depict different land cover classifications:
- International Geosphere-Biosphere Programme (IGBP) classification; the most likely alternative IGBP class is also provided.
- 14-class system developed at the University of Maryland.
- 6-biome system used by the MODIS LAI/FAPAR algorithm, including broadleaf and cereal crops.
- Biome classification proposed by Steve Running et al.: leaf type, leaf longevity, plant persistence.
- Plant functional type classification described by Bonan et al.
Classification confidence is also provided for each pixel.

MCD12Q1 classifications

MCD12Q1: Algorithm Description (Collection 4)
Input data (242 bands, 1000 m resolution):
- 16-day Nadir BRDF-Adjusted Reflectances (NBARs) assembled over one year of observations: 7 spectral bands, 0.4-2.4 µm, similar to Landsat
- 16-day Enhanced Vegetation Index (EVI)
- Land Surface Temperature product (16-day maximum temperature)
All inputs are cloud-cleared and atmospherically corrected.
- Texture (maximum) images of spectral band 1
- USGS DEM topography (slope, aspect, gradient, elevation)
In addition, the annual mean, minimum and maximum of each input feature identified above.

MCD12Q1: Algorithm Description (Collection 5.1)
Input data (135 bands, 500 m resolution):
- 32-day Nadir BRDF-Adjusted Reflectances (NBARs) assembled over one year of observations: 7 spectral bands, 0.4-2.4 µm, similar to Landsat
- 32-day Enhanced Vegetation Index (EVI)
- Land Surface Temperature product (32-day average maximum temperature)
All inputs are cloud-cleared and atmospherically corrected.
In addition, the annual mean, minimum and maximum of each input feature identified above.
- Collection 4 land cover
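The Collection 5.1 band count can be reproduced with simple bookkeeping. The sketch below assumes twelve 32-day compositing periods per year (the slide does not state this number) and nine features per period (7 NBAR bands, EVI, and land surface temperature), and it leaves out the Collection 4 land cover layer. The feature names are hypothetical.

```python
# Assumption: twelve 32-day compositing periods cover one year.
PERIODS_PER_YEAR = 12

# Hypothetical feature names: 7 NBAR bands, EVI, and maximum LST.
FEATURES = [f"nbar_b{i}" for i in range(1, 8)] + ["evi", "lst_max"]


def band_names(periods=PERIODS_PER_YEAR, features=FEATURES):
    """List every input band: per-period values plus annual statistics."""
    per_period = [
        f"{name}_p{p:02d}" for p in range(1, periods + 1) for name in features
    ]
    annual = [
        f"{name}_{stat}" for stat in ("mean", "min", "max") for name in features
    ]
    return per_period + annual


def annual_stats(series):
    """Annual mean, minimum and maximum of one feature's time series."""
    return {
        "mean": sum(series) / len(series),
        "min": min(series),
        "max": max(series),
    }


names = band_names()
print(len(names))  # 12 * 9 + 3 * 9 = 135
```

Under these assumptions the per-period values contribute 108 bands and the annual statistics 27, which matches the 135 bands quoted on the slide.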

MCD12Q1: Flowchart

MCD12Q1: Algorithm Description
- At the global scale, many land cover classes have multimodal distributions.
- The MODIS land cover product employs a supervised decision tree classification algorithm called C4.5.
- C4.5 is a nonparametric classifier: it makes no assumptions about the frequency distribution of the data being classified.
- Parametric classifiers, such as maximum likelihood classifiers, assume a particular frequency distribution, and multimodal data violate that assumption.
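A toy example may help show why multimodal classes are a problem for a parametric classifier. The sketch below uses made-up one-dimensional feature values for two hypothetical classes, A and B, each with two interleaved modes. It compares a Gaussian maximum likelihood classifier with hand-written threshold rules of the kind a decision tree could learn. It is an illustration only, not the MCD12Q1 implementation.

```python
import math

# Made-up 1-D feature values; both classes are bimodal and interleaved.
samples = (
    [(x, "A") for x in (0.00, 0.02, 0.04, 0.60, 0.62, 0.64)]
    + [(x, "B") for x in (0.30, 0.32, 0.34, 0.90, 0.92, 0.94)]
)


def fit_gaussians(data):
    """Fit one mean and variance per class (the parametric assumption)."""
    params = {}
    for label in sorted({c for _, c in data}):
        xs = [x for x, c in data if c == label]
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        params[label] = (mean, var)
    return params


def gaussian_ml_predict(x, params):
    """Assign the class with the highest Gaussian log-likelihood."""
    def loglik(mean, var):
        return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

    return max(params, key=lambda c: loglik(*params[c]))


def threshold_rules_predict(x):
    """Axis-parallel splits placed between the modes, as a tree might learn."""
    if x < 0.17:
        return "A"
    if x < 0.47:
        return "B"
    if x < 0.77:
        return "A"
    return "B"


def accuracy(predict):
    """Fraction of the toy samples that a predictor labels correctly."""
    return sum(predict(x) == c for x, c in samples) / len(samples)


params = fit_gaussians(samples)
print(accuracy(lambda x: gaussian_ml_predict(x, params)))  # 0.5
print(accuracy(threshold_rules_predict))                   # 1.0
```

Each class is summarized by a single mean, so the Gaussian classifier places one boundary between the two means and mislabels the inner mode of each class. The threshold rules place a boundary between every pair of neighboring modes.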

MCD12Q1: Training data
- The MCD12Q1 algorithm relies heavily on a database of land cover exemplars to estimate the classification.
- System for Terrestrial Ecosystem Parameterization (STEP): 2095 sites distributed globally.
- Dynamic database; requires ongoing maintenance.
- Sites included in the database are derived from manual interpretation of Landsat TM data, augmented by ancillary map data where available.

MCD12Q1: Training data

Decision Tree Classification
- Goal: optimal prediction of class labels from a set of feature values.
- Basic approach: supervised learning using training data.
- Boosting (10 decision trees): an ensemble classification technique developed in the machine learning community. It improves class discrimination by estimating multiple classifiers while forcing each successive classifier to focus on difficult (frequently misclassified) cases.
- The final classification is produced by an accuracy-weighted vote.
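The accuracy-weighted vote can be sketched as follows. The slide does not give the exact weighting, so this example assumes the log-odds weight used in AdaBoost-style boosting. The labels and accuracies are hypothetical, and five trees are shown instead of ten for brevity.

```python
import math
from collections import defaultdict


def vote_weight(accuracy):
    """Log-odds weight (assumed, AdaBoost-style): accurate trees count more."""
    return math.log(accuracy / (1.0 - accuracy))


def weighted_vote(predictions, accuracies):
    """Sum the weights of the trees voting for each label; return the winner."""
    totals = defaultdict(float)
    for label, acc in zip(predictions, accuracies):
        totals[label] += vote_weight(acc)
    return max(totals, key=totals.get)


# Hypothetical labels from 5 boosted trees for one pixel, with the
# training accuracy of each tree.
preds = ["forest", "savanna", "savanna", "forest", "savanna"]
accs = [0.95, 0.60, 0.60, 0.90, 0.55]

print(weighted_vote(preds, accs))  # forest
```

Three of the five trees vote for savanna, but the two forest votes come from much more accurate trees, so the weighted vote differs from a simple majority.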

Decision Tree Classification
- Tree structure: root node (all data), internal nodes, and terminal or leaf nodes (predictions).
- Building the decision tree: recursive partitioning of training data into successively more homogeneous subsets.
- Multiple leaf nodes per class: leaf nodes identify the class assignment, and sub-classes are allocated individual leaves.
[Figure: tree diagram labeled with root, internal nodes, and leaf nodes]
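Recursive partitioning can be sketched in a few lines. This minimal example is not C4.5: it chooses splits by plain information gain (C4.5 uses the gain ratio) and does no pruning. The training values and the class names A and B are made up so that each class has two modes and therefore ends up with more than one leaf.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def best_split(rows, labels):
    """Return (gain, feature index, threshold) of the best binary split."""
    base, best = entropy(labels), None
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between observed values
            left = [c for r, c in zip(rows, labels) if r[j] <= t]
            right = [c for r, c in zip(rows, labels) if r[j] > t]
            rest = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            if best is None or base - rest > best[0]:
                best = (base - rest, j, t)
    return best


def build_tree(rows, labels):
    """Partition recursively until every subset is pure (no pruning)."""
    if len(set(labels)) == 1:
        return labels[0]  # leaf node: class assignment
    split = best_split(rows, labels)
    if split is None:  # identical feature values: fall back to majority
        return Counter(labels).most_common(1)[0][0]
    _, j, t = split
    left = [(r, c) for r, c in zip(rows, labels) if r[j] <= t]
    right = [(r, c) for r, c in zip(rows, labels) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree(*map(list, zip(*left))),
        "right": build_tree(*map(list, zip(*right))),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


def count_leaves(tree):
    """Number of leaf nodes in the tree."""
    if not isinstance(tree, dict):
        return 1
    return count_leaves(tree["left"]) + count_leaves(tree["right"])


rows = [[0.00], [0.02], [0.60], [0.62], [0.30], [0.32], [0.90], [0.92]]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
tree = build_tree(rows, labels)
print(count_leaves(tree))     # 4 leaves: two for A, two for B
print(predict(tree, [0.61]))  # A
```

Each mode of a class receives its own leaf, which is the "multiple leaf nodes per class" behavior described on the slide.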

Land Cover Product, New England, 2001
[Figure: land cover map. Legend: Evergreen Needleleaf Forest, Agriculture, Agriculture/Natural Vegetation Mosaic, Deciduous Broadleaf Forest, Mixed Forest, Urban. Source: EOS IWG, October 31, 2001]

MCD12Q1: Algorithm Description
[Figure panels: Classification Confidence map; Second Most-Likely Class]

MCD12Q1: Land cover validation
The validation plan uses multiple approaches.
Level 1: Comparison with existing data sources. Examples:
- Global AVHRR land cover datasets: DISCover, UMD
- Humid Tropics: Landsat Pathfinder
- Forest Cover: FAO Forest Resources Assessment
- Western Europe: CORINE
- United States: USGS/EPA NLCD/MRLC

MCD12Q1: Land cover validation
Level 2: Quantitative studies of output and training data
- Per-pixel confidence statistics
- Test site cross-comparisons
Level 3: Sample-based statistical studies
- Stratified random sampling according to proper statistical principles
- Costly, but needed for making proper accuracy statements
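A stratified random sample for a Level 3 study can be drawn as in the sketch below, where the mapped classes serve as strata. The pixel IDs and class sizes are hypothetical, and equal allocation per class is an assumption made for illustration; the slide does not say how the sample is allocated.

```python
import random


def stratified_sample(pixels_by_class, n_per_class, seed=0):
    """Draw up to n_per_class pixel IDs at random from every mapped class."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    sample = {}
    for label in sorted(pixels_by_class):
        ids = pixels_by_class[label]
        sample[label] = rng.sample(ids, min(n_per_class, len(ids)))
    return sample


# Hypothetical map: class label -> IDs of the pixels mapped to that class.
mapped = {
    "forest": list(range(0, 1000)),
    "cropland": list(range(1000, 1200)),
    "wetland": list(range(1200, 1210)),
}

sample = stratified_sample(mapped, 25)
print({label: len(ids) for label, ids in sample.items()})
# {'cropland': 25, 'forest': 25, 'wetland': 10}
```

Sampling within each class guarantees that rare classes such as wetlands are represented, which a simple random sample of all pixels would not.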

MCD12Q1: Confidence values by land cover type

IGBP Class                                Confidence (%)
 1. Evergreen Needleleaf                  68.3
 2. Evergreen Broadleaf                   89.3
 3. Deciduous Needleleaf                  66.7
 4. Deciduous Broadleaf                   65.9
 5. Mixed Forest                          65.4
 6. Closed Shrubland                      60.0
 7. Open Shrubland                        75.3
 8. Woody Savanna                         64.0
 9. Savanna                               67.8
10. Grasslands                            70.6
11. Permanent Wetlands                    52.3
12. Cropland                              76.4
14. Cropland/Natural Vegetation Mosaic    60.7
15. Snow and Ice                          87.2
16. Barren                                90.0
Overall Confidence                        76.3

MCD12Q1: Confidence values by continent

Region                Confidence (%)
Africa                79.4
Australia/Pacific     83.2
Eurasia               76.8
North America         71.9
South America         78.5
Overall Confidence    76.3

MCD12Q1: Cross validation with training sites
Cross-validation procedure:
- Hide 10% of the training sites and classify with the remaining 90%; repeat 10 times, so that each of the 10 unique subsets of sites is hidden once.
- This provides a confusion matrix based on unseen pixels, where the whole training site is unseen.
- Not a stratified random sample, but a reasonable indication of within-class accuracy.
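The key point of the procedure is that whole sites, not individual pixels, are held out. A minimal sketch of such a site-level 10-fold split is given below. The pixel records are hypothetical placeholders, and the classifier itself is omitted.

```python
import random


def site_folds(site_ids, k=10, seed=0):
    """Split the unique site IDs into k disjoint folds of near-equal size."""
    sites = sorted(set(site_ids))
    random.Random(seed).shuffle(sites)
    return [sites[i::k] for i in range(k)]


def cross_validation_splits(pixels, k=10, seed=0):
    """Yield (train, test) pixel lists; each pixel is (site_id, features, label).

    All pixels of a held-out site go to the test set, so no site
    contributes pixels to both sets.
    """
    for fold in site_folds([p[0] for p in pixels], k, seed):
        held_out = set(fold)
        train = [p for p in pixels if p[0] not in held_out]
        test = [p for p in pixels if p[0] in held_out]
        yield train, test


# Hypothetical data: 50 sites with 4 pixels each (placeholder features/labels).
pixels = [(site, [0.0], "class") for site in range(50) for _ in range(4)]

for train, test in cross_validation_splits(pixels):
    print(len(train), len(test))  # 180 20 on every iteration
```

Pooling the test-set predictions from the ten iterations gives one confusion matrix in which every pixel was classified by trees that never saw its site.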

MCD12Q1: Confusion Matrix

MCD12Q1: Overall accuracies
- Proper accuracy statements require proper statistical sampling.
- The AVHRR state of the art has been 60-70%, depending on class and region.
- MODIS accuracies fall in the 70-80% range.
- Most mistakes are between similar classes.
- Land cover change should not be inferred from comparing successive land cover maps.
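Overall and per-class accuracies are computed from a confusion matrix. The sketch below uses a small made-up matrix for three hypothetical classes; the counts are not MCD12Q1 results. Rows are reference (true) classes and columns are mapped classes.

```python
classes = ["forest", "savanna", "cropland"]

# Hypothetical counts: rows = reference class, columns = mapped class.
matrix = [
    [80, 15, 5],
    [20, 70, 10],
    [5, 10, 85],
]


def overall_accuracy(m):
    """Correctly mapped samples (the diagonal) over all samples."""
    return sum(m[i][i] for i in range(len(m))) / sum(map(sum, m))


def producers_accuracy(m, i):
    """Share of reference samples of class i that were mapped as class i."""
    return m[i][i] / sum(m[i])


def users_accuracy(m, j):
    """Share of samples mapped as class j that truly belong to class j."""
    return m[j][j] / sum(row[j] for row in m)


print(round(overall_accuracy(matrix), 3))  # 0.783
for i, name in enumerate(classes):
    print(name, producers_accuracy(matrix, i), round(users_accuracy(matrix, i), 3))
```

In this toy matrix most off-diagonal counts fall between forest and savanna, mirroring the point that mistakes concentrate between similar classes.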

What about urban areas?
- Occupy less than 1% of the terrestrial surface (<2% according to other accounts)
- Have significant effects on global environmental processes
- >50% of the world's human population
- >70-80% of economic activities
- Major contributors to pollution
What is an urban area?
- Built environment (= impervious surfaces) occupies >50% of a pixel
- Vegetated pixels (>50% green, e.g. parks) are not urban
- Minimal mapping unit = contiguous patches of built-up land, 1 km²
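The urban definition can be expressed as a per-pixel rule followed by a patch-size filter. The sketch below makes several assumptions that are not on this slide: 0.5 km pixels (taken from the 500 m MODIS product discussed later), 4-connectivity for "contiguous", and an inclusive 1 km² lower bound for the minimal mapping unit. The built-up fractions are made up.

```python
def urban_mask(built_fraction):
    """A pixel is urban when built environment covers more than 50% of it."""
    return [[f > 0.5 for f in row] for row in built_fraction]


def patches(mask):
    """Group True cells into 4-connected patches (iterative flood fill)."""
    seen, result = set(), []
    for r, row in enumerate(mask):
        for c, is_urban in enumerate(row):
            if not is_urban or (r, c) in seen:
                continue
            stack, patch = [(r, c)], []
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                patch.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < len(mask) and 0 <= nx < len(mask[ny])
                    if inside and mask[ny][nx] and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            result.append(patch)
    return result


def urban_patches(built_fraction, pixel_km=0.5, min_area_km2=1.0):
    """Keep patches whose area reaches the (assumed) minimal mapping unit."""
    pixel_area = pixel_km ** 2
    return [
        p for p in patches(urban_mask(built_fraction))
        if len(p) * pixel_area >= min_area_km2
    ]


# Hypothetical built-up fractions for a 3 x 4 block of pixels.
built = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.6, 0.2, 0.6],
    [0.1, 0.3, 0.0, 0.4],
]

print(len(urban_patches(built)))  # 1: the isolated pixel at (1, 3) is dropped
```

The 2 x 2 block in the upper left covers 1 km² and is kept, while the single built-up pixel on the right covers only 0.25 km² and falls below the mapping unit.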

What about urban areas?
[Figure from Schneider et al. 2010]

Nighttime lights of the conterminous US (Defense Meteorological Satellite Program, DMSP)

Methodology of mapping urban areas for the first version of global land cover (2003)

Examples of mapped urban areas (2003)

Mapping urban areas using 500 m MODIS (Schneider et al. 2010)
Preliminary stratification of global urban areas into 16 quasi-homogeneous strata (urban ecoregions), based on:
- Biome designation (climate and vegetation)
- Regional differences in urban topology (structure, organization, and historic development)
- Level of economic development (per capita Gross Domestic Product)

Urban ecoregions (Schneider et al 2010)

Mapping urban areas using 500 m MODIS (Schneider et al. 2010)
- Supervised classification with ensemble decision trees (C4.5 classifier) and boosting
- Trained on the same 2095 STEP sites plus 182 additional urban areas
- One-year time series of the 8-day, 7-band MODIS NBAR product (~500 m)
- Auxiliary datasets (e.g., DMSP nighttime lights) are not used

Mapping urban areas using 500 m MODIS (Schneider et al. 2010)
The C4.5 classifier is run twice:
- The first run uses all 2277 training areas.
- Problematic areas are then refined using posterior probabilities and Bayes' rule. The probability of a pixel being urban is estimated as one minus the probability of the non-urban classes (primarily shrublands), which comes from decision trees trained without the urban training areas (the second run), i.e. P(urban) = 1 - P(shrubland).
Accuracy assessment uses 140 independent reference sites classified manually using Landsat imagery.
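The refinement step can be illustrated with a small numerical sketch. The complement rule follows the slide, P(urban) = 1 - P(shrubland). How the priors enter is not described here, so the Bayes' rule function below is a generic form, and the prior values and the shrubland probability are hypothetical.

```python
def p_urban_from_second_run(p_nonurban):
    """Complement rule from the slide: P(urban) = 1 - P(non-urban class)."""
    if not 0.0 <= p_nonurban <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return 1.0 - p_nonurban


def bayes_posterior(class_probs, priors):
    """Generic Bayes' rule: posterior is proportional to probability x prior."""
    joint = {c: class_probs[c] * priors[c] for c in class_probs}
    total = sum(joint.values())
    return {c: value / total for c, value in joint.items()}


# Hypothetical pixel: the second run (trees trained without urban
# sites) assigns it to shrubland with probability 0.3.
p_shrub = 0.3
p_urban = p_urban_from_second_run(p_shrub)

# Hypothetical priors, e.g. reflecting how rare urban land is locally.
posterior = bayes_posterior(
    {"urban": p_urban, "shrubland": p_shrub},
    {"urban": 0.2, "shrubland": 0.8},
)
print(round(posterior["urban"], 3))  # 0.368
```

With these made-up numbers the tree-based estimate favors urban (0.7), but a low prior pulls the posterior below 0.5, which shows how prior information can change the label of an ambiguous pixel.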

Estimating posterior probabilities

Comparison of global maps