Accurate digital mapping of rare soils

Similar documents
Prediction of Soil Properties Using Fuzzy Membership

Overview and Recent Developments

Species Distribution Modeling

APPLICATION OF FUZZY SETS FUNCTION FOR LAND ATTRIBUTES MAPPING. Dwi Putro Tejo Baskoro ABSTRACT

Improvements on 2D modelling with 3D spatial data: Tin prospectivity of Khartoum, Queensland, Australia

Sneak Preview of the Saskatchewan Soil Information System (SKSIS)

This is trial version

Comparing CORINE Land Cover with a more detailed database in Arezzo (Italy).

Four Soil Orders on a Vermont mountaintop One-third of the world s soil orders in a 2500 square meter research plot

Learning with multiple models. Boosting.

GLOBAL SYMPOSIUM ON SOIL ORGANIC CARBON, Rome, Italy, March 2017

EFFECT OF ANCILLARY DATA ON THE PERFORMANCE OF LAND COVER CLASSIFICATION USING A NEURAL NETWORK MODEL. Duong Dang KHOI.

USING HYPERSPECTRAL IMAGERY

Applying Artificial Neural Networks to problems in Geology. Ben MacDonell

Lesson 6: Accuracy Assessment

Geoderma 155 (2010) Contents lists available at ScienceDirect. Geoderma. journal homepage:

Desertification Hazard Zonation by Means of ICD Method in Kouhdasht Watershed

Technical Drafting, Geographic Information Systems and Computer- Based Cartography

Development of statewide 30 meter winter sage grouse habitat models for Utah

Dynamic Modeling of Land Use and Coverage at Quarta Colônia, RS, Brazil

RESEARCH ON KNOWLEDGE-BASED PREDICTIVE NATURAL RESOURCE MAPPING SYSTEM

Sediment- yield estimation, by M-PSIAC method in a GIS environment, case study:jonaghn river sub basin(karun basin)

Linking Arid Land Surface Characteristics to Soil Hydrologic and Ecosystem Functions in Mojave Desert Landscapes

Predictive soil mapping with limited sample data

SoLIM: A new technology for soil survey

GIS in Water Resources Midterm Exam Fall 2008 There are 4 questions on this exam. Please do all 4.

Pierce Cedar Creek Institute GIS Development Final Report. Grand Valley State University

GRAPEVINE LAKE MODELING & WATERSHED CHARACTERISTICS

WEALTH FROM WATER PILOT PROGRAM

Land Use MTRI Documenting Land Use and Land Cover Conditions Synthesis Report

AGOG 485/585 /APLN 533 Spring Lecture 5: MODIS land cover product (MCD12Q1). Additional sources of MODIS data

SIF_7.1_v2. Indicator. Measurement. What should the measurement tell us?

Comparison of Intermap 5 m DTM with SRTM 1 second DEM. Jenet Austin and John Gallant. May Report to the Murray Darling Basin Authority

Shalaby, A. & Gad, A.

STUDY ON FOREST VEGETATION CLASSIFICATION BASED ON MULTI- TEMPORAL REMOTE SENSING IMAGES

Image segmentation for wetlands inventory: data considerations and concepts

Versioning of GlobalSoilMap.net raster property maps for the North American Node

NR402 GIS Applications in Natural Resources. Lesson 9: Scale and Accuracy

Building a Vibrant and Enduring Spatial Science John P. Wilson IWGIS2014 Beijing, China

Comparison of MLC and FCM Techniques with Satellite Imagery in A Part of Narmada River Basin of Madhya Pradesh, India

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts

Effect of land cover / use change on soil erosion assessment in Dubračina catchment (Croatia)

KNOWLEDGE-BASED CLASSIFICATION OF LAND COVER FOR THE QUALITY ASSESSEMENT OF GIS DATABASE. Israel -

Spatial Decision Tree: A Novel Approach to Land-Cover Classification

Climate History, Lake Evolution, and Predicting Organic Richness of the Green River Formation, Piceance Creek Basin, Colorado

Preliminary Research on Grassland Fineclassification

Using remote sensing and Detection of Early Season Invasives (DESI) to analyze the temporal dynamics of invasive cheatgrass (Bromus tectorum).

1. Introduction. S.S. Patil 1, Sachidananda 1, U.B. Angadi 2, and D.K. Prabhuraj 3

Using Remote Sensing to Analyze River Geomorphology

The use and applications of the Soilscapes datasets

Terrain Attributes Aid Soil Mapping on Low-Relief Indiana

Climate Change and Biomes

Species Distribution Models

Eco-hydromorphic Characterization of the Louisiana Coastal Region Using Multiple Remotely Sensed Data Sources and Analyses

D G Rossiter. Soil Science Division. February 2001

Investigation of the Effect of Transportation Network on Urban Growth by Using Satellite Images and Geographic Information Systems

GeoComputation 2011 Session 4: Posters Accuracy assessment for Fuzzy classification in Tripoli, Libya Abdulhakim khmag, Alexis Comber, Peter Fisher ¹D

A Help Guide for Using gssurgo to Find Potential Wetland Soil Landscapes

Supporting Information Appendix

PROANA A USEFUL SOFTWARE FOR TERRAIN ANALYSIS AND GEOENVIRONMENTAL APPLICATIONS STUDY CASE ON THE GEODYNAMIC EVOLUTION OF ARGOLIS PENINSULA, GREECE.

Pilot area description Aalborg south

Biodiversity Blueprint Overview

GDEs rely on groundwater

A Method to Improve the Accuracy of Remote Sensing Data Classification by Exploiting the Multi-Scale Properties in the Scene

The application of classification tree analysis to soil type prediction in a desert landscape

Ecoregions Glossary. 7.8B: Changes To Texas Land Earth and Space

Tropics & Sub-Tropics. How can predictive approaches be improved: Data Sparse Situations

INTRODUCTION Landslides are bad but good

Using Grassland Vegetation Inventory Data

p of increase in r 2 of quadratic over linear model Model Response Estimate df r 2 p Linear Intercept < 0.001* HD

The Global Scope of Climate. The Global Scope of Climate. Keys to Climate. Chapter 8

COMMON GIS TECHNIQUES FOR VECTOR AND RASTER DATA PROCESSING. Ophelia Wang, Department of Geography and the Environment, University of Texas

Global SoilMappingin a Changing World

Aim and objectives Components of vulnerability National Coastal Vulnerability Assessment 2

Types of spatial data. The Nature of Geographic Data. Types of spatial data. Spatial Autocorrelation. Continuous spatial data: geostatistics

Basin characteristics

Methods for generating vegetation maps from remotely

Object Based Imagery Exploration with. Outline

An examination of geographic patterns of soil climate and its classification in the U.S. system of soil taxonomy

Classification according to patent rock material/origin, soil distribution and orders

A global map of mangrove forest soil carbon at 30 m spatial resolution

Star-Galaxy Separation in the Era of Precision Cosmology

REMOTE SENSING ACTIVITIES. Caiti Steele

Earth History: Record in the Rocks

PENNSYLVANIA. Ordinary processes at Earth's surface and just below it cause rocks to change and soils to form. Page 1 of 3. S8.A.1.1.

A Logistic Regression Method for Urban growth modeling Case Study: Sanandaj City in IRAN

GENERALIZED LINEAR MIXED MODELS FOR ANALYZING ERROR IN A SATELLITE-BASED VEGETATION MAP OF UTAH

Pilot area description Tissoe

How does erosion happen?

79 International Journal of Scientific & Engineering Research, Volume 4, Issue 12, December-2013 ISSN

Scientific registration n : 2180 Symposium n : 35 Presentation : poster MULDERS M.A.

Esri Image & Mapping Forum 9 July 2017 Geiger-Mode for Conservation Planning & Design by USDA NRCS NGCE

Summary Description Municipality of Anchorage. Anchorage Coastal Resource Atlas Project

Lab 7: Sedimentary Structures

Preliminary Calculation of Landscape Integrity in West Virginia Based on Distance from Weighted Disturbances

Spectroscopy-supported digital soil mapping

Topographical Change Monitoring for Susceptible Landslide Area Determination by Using Multi-Date Digital Terrain Models and LiDAR

Australian Soil and Land Survey Field Handbook THIRD EDITION THE NATIONAL COMMITTEE ON SOIL AND TERRAIN

New Digital Soil Survey Products to Quantify Soil Variability Over Multiple Scales

GEOGRAPHY (029) CLASS XI ( ) Part A: Fundamentals of Physical Geography. Map and Diagram 5. Part B India-Physical Environment 35 Marks

Transcription:

Accurate digital mapping of rare soils Colby Brungard 1,, Budiman Minasny 2 1 Department of Plants, Soils, and Climate, Utah State University, United States 2 Faculty of Agriculture & Environment, The University of Sydney, NSW 2006, Australia

Background: Soil class prediction from a DSM study, Wyoming, USA U.S. Soil Taxonomy Subgroup Classes Pedons a % of total b Producer s accuracy % Ustic Haplargid 26 46 86 Ustic Torriorthent 21 37 83 Badland 6 10 50 Ustic Paleargid 2 4 0 Majority classes = good prediction Minority/rare classes = poor prediction Ustic Torrifluvent 2 4 0 Total 57 100 a Total number of pedons per subgroup class. b Percent of total observations represented by each subgroup class. Table modified from: Brungard, C.W., Boettinger, J.L., Duniway, M.C., Wills, S.A., Edwards Jr., T.C. 2015. Machine learning for predicting soil classes in three semi-arid landscapes, Geoderma, Volumes 239 240

Problem: Rare soil: a soil taxonomic class with few observations Accurate modeling of rare soils is needed for Rare plant distribution (e.g., Schoenocrambe suffrutescens) (Baker et al., 2016) Vineyard suitability (e.g. Terra Rossa) Targeted land management (e.g., wetland restoration) However; rare soils are difficult to accurately predict

Why are some soils classes rare? Endemic soils Soils restricted to a particular geographic area based on a unique combination of soil-forming factors (Bockheim, 2005) Likely small areas, infrequently sampled Sampling issues Biased sampling Arbitrary study area boundaries

Why are rare soils difficult to predict? 1) Conceptual reasons Different defining and predicting variables we define soil classes with a set of soil profile variables, but try to model them with environmental variables Soil classes difficult to separate in feature space

Soil Classes Difficult to Separate in Feature Space

Why are rare soils difficult to predict? 2) Practical reasons Accurate classification requires many observations A few classes have the majority of the observations Good prediction accuracy Rare soil classes (minority classes) have few observations Poor prediction accuracy

Possible solutions to practical problems Decrease the number of classes Combine minor classes with similar classes or into other class Can exclude rare classes Should be done for classes with fewer observations than covariates Use a weighting/cost scheme Useful, but cannot produce class probability predictions (often most interesting) Increase number of observations in rare classes

How to increase observation numbers in rare classes? Expert knowledge Case-based reasoning (Shi et al., 2004) Expert knowledge used to generate synthetic observations Data-driven approach Synthetic minority-class oversampling (e.g., SMOTE) Synthetic observations generated by similarity to existing observations Currently only for two-class problems (soil datasets are multiclass)

Synthetic minority-class oversampling for multi-class soil datasets - SIMONA Goal: Generate equal number of observations in each soil class Synthetic samples similar to field samples in geographic and feature space An alternative algorithm from Abdi and Hashemi (2015) which only sampled the feature space

SIMONA - Synthetic minority class oversampling for multiclass soil datasets 1. Calculate weights for each minority class observation Weights based on n surrounding points with matching class label. Weight higher for more surrounding observations with matching class labels 2. Randomly choose one minority class observation by weight 3. Extract all covariate values around observation in a w-meter buffer 4. Calculate similarity between observation and covariates with Gower s similarity index (including categorical variables) 5. Randomly choose one of the p most similar samples 6. Add to synthetic sample set 7. Repeat until number of original + synthetic samples = the number of observations in the majority class

Preliminary Case Study Area: 296 km 2 North-central Wyoming, USA Elevation: ~ 1500 m Geology: mudstone, sandstone, conglomerate, limestone, shale and coal Climate: cool and dry Land use: oil, gas, coal production Photo courtesy of elifino57: https://ssl.panoramio.com/user/6155411

Sampling 57 pedons 5 subgroup* classes Imbalanced - 83% of observations in two classes U.S. Soil Taxonomy Subgroup Classes Pedons a % of total b Ustic Haplargid 26 46 Ustic Torriorthent 21 37 Badland 6 10 Ustic Paleargid 2 4 Ustic Torrifluvent 2 4 Total 57 100 a Total number of pedons per subgroup class. b Percent of total observations represented by each subgroup class. * Subgroups according to US Soil Taxonomy

SIMONA Covariates: Plan curvature, Diffuse Insolation, Landsat band ratio 5/2, Catchment Slope U.S. Soil Taxonomy Original + Original Synthetic Subgroup Classes Synthetic Ustic Haplargid 26 0 26 Ustic Torriorthent 21 5 26 Badland 6 20 26 Minor a 4 22 26 Total 57 104 a Combined 3 minor classes with < = 2 observations

Modelling CART and Neural Network models Original samples Original + Synthetic samples Case weights Original samples assigned high class weight (1) Synthetic samples assigned low case weight (0.25) Model accuracy Bootstrap sampling repeated 10 times Overall accuracy Kappa Confusion matrices

Results: Model Accuracy Overall Accuracy* Model Original Samples Original + Synthetic CART 0.57 0.64 Neural Net 0.67 0.77 * mean of bootstrap sampling repeated 10 times Kappa* Model Original Samples Original + Synthetic CART 0.26 0.52 Neural Net 0.43 0.70 * mean of bootstrap sampling repeated 10 times

Results Confusion Matrices from bootstrap sampling (neural network model only) Accurate models have large diagonal and small off-diagonal values Original samples Original + synthetic Actual Badland Ustic Haplargid Ustic Torriorthent Minor Badland 1 0 0 1 Ustic Haplargid 3 40 7 4 Predicted Ustic Torriorthent 5 11 26 3 Minor 0 0 0 0 Predicted Actual Badland Ustic Haplargid Ustic Torriorthent Minor Badland 21 0 3 0 Ustic Haplargid 1 17 5 2 Ustic Torriorthent 1 5 18 1 Minor 0 3 2 21

Results: Spatial Prediction with only original samples

Results: Spatial Prediction with synthetic samples

Conclusions SIMONA (Synthetic minority class oversampling for multiclass soil datasets) is promising for increasing rare soil class prediction accuracy Need to further test with independent validation data set May provide a means to increase prediction accuracy when additional field sampling is limited

References Abdi, L., and S. Hashemi. 2015. To combat multi-class imbalanced problems by means of over-sampling and boosting techniques. Soft Computing, 19:12. pp 3369-3385. Baker, J.B., Fonnesbeck, B.B., and J.L., Boettinger. 2016. Modeling Rare Endemic Shrub Habitat in the Uinta Basin Using Soil, Spectral, and Topographic Data. Soil Science Society of America Journal, 80. pp. 395 408 Bockheim, J.G. 2005. Soil endemism and its relation to soil formation theory. Geoderma, 129. pp. 109 124 Shi, X., Zhu, A-X., Burt, J.E., Qi, F., and D. Simonson. 2004. A Case-based Reasoning Approach to Fuzzy Soil Mapping. Soil Sci. Soc. Am. J. 68. pp. 885 894