Spatial Decision Tree: A Novel Approach to Land-Cover Classification


Spatial Decision Tree: A Novel Approach to Land-Cover Classification
Zhe Jiang (1), Shashi Shekhar (1), Xun Zhou (1), Joseph Knight (2), Jennifer Corcoran (2)
(1) Department of Computer Science & Engineering, (2) Department of Forest Resources, University of Minnesota, Twin Cities

Highlights
- Public engagement with science and technology: Coursera MOOC "From GPS and Google Maps to Spatial Computing" reached 21,844 participants across 182 countries
- Enhanced infrastructure for education
- Interdisciplinary survey paper on spatiotemporal change footprint discovery
- Encyclopedia of GIS (Springer, 2nd edition): multiple articles on climate change
- AIP/IEEE Computing in Science & Engineering special issue on Computing and Climate
- Enhanced infrastructure for research
- Spatial decision trees can help improve wetland maps for climate models

Highlights: Understanding
- Large semantic gap between Data Science and Climate Science
- Data Science results are hard to interpret in Climate Science
- Data Science assumptions violate laws of physics
- Consequence: unnecessary errors, e.g., salt-and-pepper noise
- Physics-guided data mining concepts are potentially transformative
- Ex.: spatial decision trees bring explicit physics (e.g., spatial continuity) to wetland mapping
- Ex.: interval-of-persistent-change detection uses physics (e.g., violation of continuity)

Spatial Decision Tree: Motivation
Wetland mapping matters for:
- Climate change: wetlands are a major source of the greenhouse gas methane [1]
- Natural disaster management: defense against hurricanes, flood control
- Biodiversity: habitats for wildlife species
[1] Bryan Walsh, "How Wetlands Worsen Climate Change," Time Magazine, 2010

Wetland Mapping Example
(Figure: (a) aerial photo (train), (b) aerial photo (test), (c) true classes, (d) DT prediction; classes are wetland and dry land.)
Training samples: upper half; test samples: lower half. DT: decision tree.

Challenges
- Spatial autocorrelation effect: samples violate the i.i.d. assumption, producing salt-and-pepper noise (white circles)
- Spatial anisotropy: asymmetric spatial neighborhoods (blue circle)
- Spatial heterogeneity: areas with the same features correspond to distinct class labels (white circle)
- High computational cost: a large amount of focal computation over spatial neighborhoods of different sizes
(Figure: feature maps, ground truth classes, and decision tree prediction; wetland vs. dry land.)

Problem Statement
Given:
- training and test samples from a raster spatial framework
- a spatial neighborhood definition and its maximum size
Find: a (spatial) decision tree
Objective: minimize classification error and salt-and-pepper noise
Constraints:
- training samples are contiguous patches
- spatial autocorrelation, anisotropy, and heterogeneity exist
- the training dataset can be large, with high computational cost

Example with Decision Tree
Input: a table of records, one per pixel A-R:

ID  f1  f2  class
A   3   3   red
B   3   3   red
C   1   2   green
D   3   1   red
E   3   1   red
F   3   1   red
G   3   3   red
H   1   2   green
I   1   2   green
J   3   1   red
K   1   1   red
L   3   1   red
M   1   2   green
N   1   2   green
O   3   1   red
P   3   1   red
Q   3   1   red
R   1   1   red

Output: a decision tree with the local test f1 <= 1 (yes: green; no: red). The green branch receives C, H, I, M, N, K, and R; the red branch receives A, B, D, E, F, G, J, L, O, P, and Q. Since K and R are truly red, the predicted map contains salt-and-pepper noise, e.g., at pixel K. (In this example the Gamma index Γ1 on feature f1 is given explicitly; most often Γ1 is computed on the fly.)
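The salt-and-pepper effect in this example can be reproduced with a few lines of Python (a minimal sketch; the dictionary layout is hypothetical, the values come from the table above):

```python
# The slide's 18-pixel example: a purely local decision-tree test
# f1 <= 1 predicts "green", else "red", ignoring the neighborhood.
records = {
    "A": (3, 3, "red"), "B": (3, 3, "red"), "C": (1, 2, "green"),
    "D": (3, 1, "red"), "E": (3, 1, "red"), "F": (3, 1, "red"),
    "G": (3, 3, "red"), "H": (1, 2, "green"), "I": (1, 2, "green"),
    "J": (3, 1, "red"), "K": (1, 1, "red"), "L": (3, 1, "red"),
    "M": (1, 2, "green"), "N": (1, 2, "green"), "O": (3, 1, "red"),
    "P": (3, 1, "red"), "Q": (3, 1, "red"), "R": (1, 1, "red"),
}

def local_tree(f1, f2):
    # Local test on feature f1 only.
    return "green" if f1 <= 1 else "red"

pred = {pid: local_tree(f1, f2) for pid, (f1, f2, _) in records.items()}
errors = sorted(pid for pid, (_, _, c) in records.items() if pred[pid] != c)
print(errors)  # ['K', 'R'] -- isolated red pixels mislabeled green
```

The two errors are exactly the salt-and-pepper pixels the slide highlights: their feature values look "green" locally, but their neighborhoods are red.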

Related Work Summary
Single decision tree:
- Existing work (traditional decision tree): local feature test & information gain
- Proposed work (spatial decision tree): focal feature test & spatial information gain
Ensemble of decision trees:
- Existing work (random forest ensemble): bootstrap sampling
- Proposed work (spatial ensemble): geographic space partitioning

Proposed Approach: Focal Feature Test
- Test both local and focal (neighborhood) information
- The focal test uses local autocorrelation statistics, e.g., the Gamma index
(Figure: local vs. focal vs. zonal map operations.)

Proposed Approach - 2
- Tree traversal direction depends on both local and focal (neighborhood) information
- The focal test uses a local autocorrelation statistic, e.g., a (normalized) Gamma index over a pixel's neighborhood:

  Γ_i = Σ_j S_{i,j} W_{i,j} / Σ_j W_{i,j}

where i, j are pixel locations, S_{i,j} is the similarity between locations i and j, and W_{i,j} is an element of the adjacency (neighborhood) matrix.
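A local Gamma index of this form can be sketched as follows (a hypothetical reconstruction: S_{i,j} is taken as +1 when two cells agree on the local test outcome and -1 otherwise, and W restricts the sum to the 3x3 neighborhood):

```python
# Local Gamma index for each cell of a grid of boolean test outcomes.
# Assumption: S[i][j] = +1 if cells agree on the local test, else -1;
# W limits comparisons to the 3x3 window (up to 8 neighbors).
def local_gamma(outcome, r, c):
    rows, cols = len(outcome), len(outcome[0])
    num = den = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == dc == 0:
                continue
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                s = 1 if outcome[rr][cc] == outcome[r][c] else -1
                num += s          # S_{i,j} * W_{i,j} with W = 1 in the window
                den += 1          # sum of W_{i,j}
    return num / den

# A lone True surrounded by False gets Gamma = -1 (all neighbors disagree),
# the signature of a salt-and-pepper pixel.
grid = [[False, False, False],
        [False, True,  False],
        [False, False, False]]
print(local_gamma(grid, 1, 1))  # -1.0
print(local_gamma(grid, 0, 0))  # (1 + 1 - 1) / 3 neighbors
```

Values near +1 indicate agreement with the neighborhood; values near -1 indicate an isolated, likely noisy prediction.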

Example Focal Tests
Traditional decision tree: the input is a table of records (ID, f1, f2, class). The local test f1 <= 1 sends C, H, I, M, N, K, and R to the green branch and the rest to the red branch; pixels K and R (truly red) become salt-and-pepper noise in the predicted map.
Spatial decision tree: the inputs are the feature maps (f1, f2), the class map, and an adaptive 3-by-3 neighborhood. The focal function yields Γ1 = -1 at pixels K and R and Γ1 = 1 elsewhere. The focal test (f1 <= 1) AND (Γ1 >= 0) therefore sends only C, H, I, M, and N to the green branch and A, B, D, E, F, G, J, K, L, O, P, Q, and R to the red branch, eliminating the salt-and-pepper noise.
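The focal test above can be sketched as a two-part predicate (a hypothetical layout; the Γ1 values are taken directly from the slide rather than recomputed from the neighborhood):

```python
# Focal test: branch on the local value AND the neighborhood statistic.
f1     = {"C": 1, "H": 1, "I": 1, "K": 1, "M": 1, "N": 1, "R": 1,
          "A": 3, "B": 3, "D": 3, "E": 3, "F": 3, "G": 3,
          "J": 3, "L": 3, "O": 3, "P": 3, "Q": 3}
gamma1 = {pid: 1 for pid in f1}
gamma1["K"] = gamma1["R"] = -1   # isolated pixels disagree with their neighbors

def focal_tree(pid):
    # Traverse to the green leaf only if the local test passes AND the
    # neighborhood agrees (non-negative local autocorrelation).
    return "green" if f1[pid] <= 1 and gamma1[pid] >= 0 else "red"

green = sorted(p for p in f1 if focal_tree(p) == "green")
print(green)  # ['C', 'H', 'I', 'M', 'N'] -- K and R now fall in the red branch
```

Compared with the purely local test, the extra Γ1 >= 0 conjunct is what redirects K and R to the red branch.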

Evaluation: Case Study
Questions to answer:
- SDT vs. DT: classification accuracy
- SDT vs. DT: salt-and-pepper noise
- Computational scalability of SDT
Dataset: Chanhassen, MN (wetland mapping)
- 2 classes: wetland, dry land
- Features: high-resolution (3 m x 3 m) aerial photos (RGB, NIR, NDVI) from 2003, 2005, and 2008
- Training set: randomly selected circular patches; test set: the remaining pixels in the scene; three scenes are used
- Maximum neighborhood size: 11 pixels by 11 pixels

Wetland Mapping Comparison: Scene 1
(Figure: (a) aerial photo (train), (b) aerial photo (test), (c) true classes, (d) DT prediction, (e) SDT prediction; classes are wetland and dry land.)
Training samples: upper half; test samples: lower half. DT: decision tree; SDT: spatial decision tree (11x11 neighborhood).

Classification Performance: Scene 2
Trends:
1. DT predictions show salt-and-pepper noise
2. SDT improves accuracy and reduces salt-and-pepper noise
(Figure legend: true wetland, true dryland, false wetland, false dryland.)

Evaluation: Classification Performance
Classification accuracy and salt-and-pepper noise level:

Model  Confusion matrix    Prec.  Recall  F-measure  Autocorrelation
DT     99,141   10,688     0.81   0.75    0.78       0.87
       15,346   45,805
SDT    99,390   10,439     0.83   0.83    0.83       0.93
       10,618   50,533

Significance test between confusion matrices:

Model  Khat  Khat variance  Z-score  Significance
DT     0.66  3.6e-6         26.8     significant
SDT    0.73  3.0e-6

The spatial decision tree reduces salt-and-pepper noise and misclassification errors compared with the decision tree.
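The Khat (Cohen's kappa) values and the Z-score above can be reproduced from the two confusion matrices (a sketch; the Khat variances are taken from the slide rather than rederived):

```python
# Kappa (Khat) from a 2x2 confusion matrix, and the Z-test comparing two
# independent kappa estimates given their variances.
def kappa(cm):
    (tp, fp), (fn, tn) = cm
    n = tp + fp + fn + tn
    po = (tp + tn) / n                                           # observed agreement
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2  # chance agreement
    return (po - pe) / (1 - pe)

k_dt  = kappa([[99141, 10688], [15346, 45805]])
k_sdt = kappa([[99390, 10439], [10618, 50533]])
# Variances as reported on the slide.
z = (k_sdt - k_dt) / (3.6e-6 + 3.0e-6) ** 0.5

print(round(k_dt, 2), round(k_sdt, 2), round(z, 1))  # 0.66 0.73 26.8
```

A Z-score of 26.8 is far beyond the usual 1.96 cutoff, so the improvement of SDT over DT is statistically significant.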

Computational Bottleneck Analysis
1. Focal computation takes the vast majority of the time
2. Focal computation cost grows faster than local computation as the training set size increases
Focal computation is the bottleneck!

Incremental Update Approach
Key idea: reduce redundant focal computation by reusing results across candidate test thresholds Γ(f < δ).
(Figure: (a) feature values on a grid; (b)-(e) test-outcome indicators and focal Gamma values for thresholds δ = 1, 2, 3, 4; candidate δ ∈ {1, 2, 3, 4, 5, 6, 7, 8}. Raising δ by one step flips the indicators of only the cells whose feature value crosses the new threshold, so the focal values can be updated incrementally rather than recomputed from scratch.)
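The incremental idea can be sketched as follows (a hypothetical implementation: the test is assumed to be f <= δ, and indicators are updated only where the feature value equals the new threshold instead of rebuilding the indicator grid for every candidate δ):

```python
# Incremental update of test indicators across increasing thresholds.
# indicator[i] = 1 if feature[i] <= delta else -1 (assumed test form).
from collections import defaultdict

feature = [1, 9, 9, 9, 2, 9, 9, 9, 3, 8, 7, 6, 4, 5, 5, 5]

# Bucket cell indices by feature value so each threshold step touches
# only the cells that actually flip.
by_value = defaultdict(list)
for i, v in enumerate(feature):
    by_value[v].append(i)

indicator = [-1] * len(feature)        # start below the smallest threshold
flips_per_step = {}
for delta in sorted(set(feature)):     # candidate thresholds, ascending
    for i in by_value[delta]:
        indicator[i] = 1               # only these cells change at this delta
    flips_per_step[delta] = len(by_value[delta])
    # ...focal Gamma values would be re-derived only around flipped cells...

print(sum(flips_per_step.values()) == len(feature))  # True: each cell flips once
```

Across all candidate thresholds each cell flips exactly once, so the total indicator-update work is O(N) instead of O(N * number-of-thresholds); the focal recomputation is likewise confined to the neighborhoods of flipped cells.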

Evaluation of Computational Cost
Notation:
F: number of features (12)
N: number of samples
N_d: number of distinct feature values
S_max: maximum neighborhood size
N_0: minimum node size
The refined algorithm significantly reduces computational cost.

Conclusions
- Ignoring spatial autocorrelation leads to errors, e.g., salt-and-pepper noise
- Proposed a novel spatial decision tree approach with focal tests
- Evaluation shows the proposed method reduced salt-and-pepper noise and improved classification accuracy
- Designed computational refinements to improve scalability

Publications on Spatial Decision Trees
[1] Z. Jiang, S. Shekhar, X. Zhou, J. Knight, J. Corcoran. Focal-Test-Based Spatial Decision Tree Learning. IEEE Transactions on Knowledge and Data Engineering (TKDE) 27(6): 1547-1559, 2015.
[2] Z. Jiang, S. Shekhar, X. Zhou, J. Knight, J. Corcoran. Focal-Test-Based Spatial Decision Tree Learning: A Summary of Results. IEEE International Conference on Data Mining (ICDM), pp. 320-329, 2013.
[3] Z. Jiang, S. Shekhar, P. Mohan, J. Knight, J. Corcoran. Learning Spatial Decision Tree for Geographical Classification: A Summary of Results. ACM International Conference on Advances in GIS, pp. 390-393, 2012.
[4] Z. Jiang, S. Shekhar, A. Kamzin, J. Knight. Learning a Spatial Ensemble of Classifiers for Raster Classification: A Summary of Results. IEEE International Conference on Data Mining Workshop, 2014.

Challenges Revisited
- Spatial autocorrelation effect: samples violate the i.i.d. assumption, producing salt-and-pepper noise (white circles)
- Spatial anisotropy: asymmetric spatial neighborhoods (blue circle)
- Spatial heterogeneity: areas with the same features correspond to distinct class labels (white circle)
- High computational cost: a large amount of focal computation over spatial neighborhoods of different sizes
(Figure: feature maps, ground truth classes, and decision tree prediction; wetland vs. dry land.)

Future Work
Key idea I: focal feature test
- Tree traversal direction depends on both local and focal (neighborhood) information
- The focal test uses local autocorrelation statistics, e.g., the Gamma index
Key idea II: spatial information gain (SIG)
- SIG = information gain * α + spatial autocorrelation * (1 - α)
- Tree node test selection depends on both class purification and autocorrelation structure
Key idea III: spatial ensemble of local trees
- Partition the geographic space and learn local classifiers
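The SIG formula for key idea II can be sketched as a weighted split score (a hypothetical scoring function: entropy-based information gain and a mean local Gamma index of the candidate split stand in for the two terms):

```python
# Spatial information gain: blend class purification with autocorrelation.
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in (labels.count(l) for l in set(labels)))

def info_gain(parent, left, right):
    n = len(parent)
    return entropy(parent) - (len(left) / n) * entropy(left) \
                           - (len(right) / n) * entropy(right)

def spatial_info_gain(parent, left, right, mean_gamma, alpha=0.5):
    # SIG = InfoGain * alpha + SpatialAutocorrelation * (1 - alpha);
    # mean_gamma is the average local Gamma index under the candidate split.
    return alpha * info_gain(parent, left, right) + (1 - alpha) * mean_gamma

parent  = ["red"] * 12 + ["green"] * 6
split_a = (["green"] * 5 + ["red"] * 2, ["red"] * 10 + ["green"])  # noisy split
split_b = (["green"] * 5, ["red"] * 12 + ["green"])                # smooth split
# The spatially smoother split (higher mean Gamma) scores higher under SIG.
print(spatial_info_gain(parent, *split_a, mean_gamma=0.2) <
      spatial_info_gain(parent, *split_b, mean_gamma=0.9))  # True
```

With alpha = 1 this reduces to the traditional information gain; smaller alpha increasingly penalizes splits whose predictions disagree with their spatial neighborhoods.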

Proposed Approach: Spatial Ensemble
Traditional ensemble (random forest):
1. Assumes an i.i.d. distribution
2. Bootstrap sampling
3. Learns a tree from each sample with a random feature subset
Spatial ensemble (spatial forest):
1. Assumes spatial heterogeneity
2. Spatial partitioning
3. Learns a local tree model in each partition
(Figure: component decision trees with tests on f1 in each ensemble.)
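The spatial-ensemble idea can be sketched with tiny local models (a hypothetical setup: the raster is split into two geographic partitions and a majority-class lookup per feature value is fit in each, standing in for the local trees):

```python
# Spatial ensemble sketch: partition pixels by location, fit a local model
# per partition, and route each prediction to its partition's model.
from collections import Counter, defaultdict

def fit_local(samples):
    # Local "tree": majority class per feature value within the partition.
    by_feature = defaultdict(list)
    for f, label in samples:
        by_feature[f].append(label)
    return {f: Counter(ls).most_common(1)[0][0] for f, ls in by_feature.items()}

# (x, feature, label): the same feature value means different classes in
# the west (x < 5) and east (x >= 5) halves -- spatial heterogeneity.
data = [(x, 1, "wetland") for x in range(5)] + \
       [(x, 1, "dryland") for x in range(5, 10)]

partitions = {"west": lambda x: x < 5, "east": lambda x: x >= 5}
models = {name: fit_local([(f, l) for x, f, l in data if inside(x)])
          for name, inside in partitions.items()}

def predict(x, f):
    name = "west" if x < 5 else "east"
    return models[name][f]

print(predict(2, 1), predict(7, 1))  # wetland dryland
```

A single global model (or a bootstrap-sampled forest) sees feature value 1 mapped to both classes and must pick one for the whole map; the spatial ensemble resolves the conflict by letting each geographic partition keep its own mapping.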

(Figure: one feature image; ground truth with spatial clusters (islands/archipelagos); partitions P1 and P2 with their per-partition predictions; comparison of ground truth, single DT prediction, spatial ensemble, and random forest.)