Statistical Perspectives on Geographic Information Science. Michael F. Goodchild University of California Santa Barbara

Similar documents
Uncertainty in Geographic Information: House of Cards? Michael F. Goodchild University of California Santa Barbara

Types of spatial data. The Nature of Geographic Data. Types of spatial data. Spatial Autocorrelation. Continuous spatial data: geostatistics

ENGRG Introduction to GIS

Representation of Geographic Data

Accuracy and Uncertainty

A General Framework for Conflation

Lecture 5 Geostatistics

Deriving Uncertainty of Area Estimates from Satellite Imagery using Fuzzy Land-cover Classification

Geostatistics and Spatial Scales

The Nature and Value of Geographic Information. Michael F. Goodchild University of California Santa Barbara

Software. People. Data. Network. What is GIS? Procedures. Hardware. Chapter 1

Quality and Coverage of Data Sources

Lecture 8. Spatial Estimation

Basics of GIS. by Basudeb Bhatta. Computer Aided Design Centre Department of Computer Science and Engineering Jadavpur University

Improving Spatial Data Interoperability

Spatial Concepts: Data Models 2

Luc Anselin Spatial Analysis Laboratory Dept. Agricultural and Consumer Economics University of Illinois, Urbana-Champaign

UNCERTAINTY AND ERRORS IN GIS

Visualizing Spatial Uncertainty of Multinomial Classes in Area-class Mapping

Michael Harrigan Office hours: Fridays 2:00-4:00pm Holden Hall

Sampling The World. presented by: Tim Haithcoat University of Missouri Columbia

Overview of Statistical Analysis of Spatial Data

KNOWLEDGE-BASED CLASSIFICATION OF LAND COVER FOR THE QUALITY ASSESSEMENT OF GIS DATABASE. Israel -

Geography 38/42:376 GIS II. Topic 1: Spatial Data Representation and an Introduction to Geodatabases. The Nature of Geographic Data

Introduction to GIS. Phil Guertin School of Natural Resources and the Environment GeoSpatial Technologies

POPULAR CARTOGRAPHIC AREAL INTERPOLATION METHODS VIEWED FROM A GEOSTATISTICAL PERSPECTIVE

GIST 4302/5302: Spatial Analysis and Modeling

SVY2001: Lecture 15: Introduction to GIS and Attribute Data

GIS and Spatial Statistics: One World View or Two? Michael F. Goodchild University of California Santa Barbara

Geog 469 GIS Workshop. Data Analysis

Data Accuracy to Data Quality: Using spatial statistics to predict the implications of spatial error in point data

The Nature of Geographic Data

Introduction-Overview. Why use a GIS? What can a GIS do? Spatial (coordinate) data model Relational (tabular) data model

Object-based feature extraction of Google Earth Imagery for mapping termite mounds in Bahia, Brazil

Spatial Process VS. Non-spatial Process. Landscape Process

Introduction to GIS I

MODELING DEM UNCERTAINTY IN GEOMORPHOMETRIC APPLICATIONS WITH MONTE CARLO-SIMULATION

2.2 Geographic phenomena

Spatial Decision Tree: A Novel Approach to Land-Cover Classification

Cell-based Model For GIS Generalization

Geographical Information Systems

GIST 4302/5302: Spatial Analysis and Modeling

UNCERTAINTY EVALUATION OF MILITARY TERRAIN ANALYSIS RESULTS BY SIMULATION AND VISUALIZATION

Propagation of Errors in Spatial Analysis

Advanced analysis and modelling tools for spatial environmental data. Case study: indoor radon data in Switzerland

GEOGRAPHY 350/550 Final Exam Fall 2005 NAME:

A spatial literacy initiative for undergraduate education at UCSB

An Information Model for Maps: Towards Cartographic Production from GIS Databases

CyberGIS: What Still Needs to Be Done? Michael F. Goodchild University of California Santa Barbara

Introduction to Spatial Statistics and Modeling for Regional Analysis

VISUAL ANALYTICS APPROACH FOR CONSIDERING UNCERTAINTY INFORMATION IN CHANGE ANALYSIS PROCESSES

GEO-DATA INPUT AND CONVERSION. Christos G. Karydas,, Dr. Lab of Remote Sensing and GIS Director: Prof. N. Silleos

Lecture 1: Geospatial Data Models

30 YEARS SINCE Z_GIS FOUNDING: PROVIDING SOLUTIONS FOR SOCIETY AND BUSINESS

NR402 GIS Applications in Natural Resources. Lesson 9: Scale and Accuracy

GIST 4302/5302: Spatial Analysis and Modeling Lecture 2: Review of Map Projections and Intro to Spatial Analysis

The Generalized Likelihood Uncertainty Estimation methodology

8/28/2011. Contents. Lecture 1: Introduction to GIS. Dr. Bo Wu Learning Outcomes. Map A Geographic Language.

Introduction to GIS - 2

Reference Database and Validation Concept for Remote Sensing based Forest Cover Change Products in Fiji INTERNATIONAL CLIMATE INITIATIVE

SPACE Workshop NSF NCGIA CSISS UCGIS SDSU. Aldstadt, Getis, Jankowski, Rey, Weeks SDSU F. Goodchild, M. Goodchild, Janelle, Rebich UCSB

Impacts of sensor noise on land cover classifications: sensitivity analysis using simulated noise

Geographic Information Systems (GIS) in Environmental Studies ENVS Winter 2003 Session III

Lecture 5. Symbolization and Classification MAP DESIGN: PART I. A picture is worth a thousand words

Interpolating Raster Surfaces

A Basic Introduction to Geographic Information Systems (GIS) ~~~~~~~~~~

Integrating LiDAR data into the workflow of cartographic representation.

Areal Differentiation Using Fuzzy Set Membership and Support Logic

Dynamic Multipath Estimation by Sequential Monte Carlo Methods

Understanding Geographic Information System GIS

Introducing GIS analysis

F denotes cumulative density. denotes probability density function; (.)

USING LANDSAT IN A GIS WORLD

Spatial Analysis and Modeling (GIST 4302/5302) Guofeng Cao Department of Geosciences Texas Tech University

Modeling Urban Sprawl: from Raw TIGER Data with GIS

Raster Spatial Analysis Specific Theory

Remote sensing of sealed surfaces and its potential for monitoring and modeling of urban dynamics

Spatial Data Mining. Regression and Classification Techniques

GIST 4302/5302: Spatial Analysis and Modeling

INTRODUCTION TO PATTERN RECOGNITION

Particle Filters. Pieter Abbeel UC Berkeley EECS. Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

Nature of Spatial Data. Outline. Spatial Is Special

An Introduction to Geographic Information System

Intro to GIS Summer 2012 Data Visualization

Geographically weighted methods for examining the spatial variation in land cover accuracy

Twenty Years of Progress: GIScience in Michael F. Goodchild University of California Santa Barbara

Geospatial technology for land cover analysis

Digital Chart Cartography: Error and Quality control

Fundamental Spatial Concepts. Michael F. Goodchild University of California Santa Barbara

PROANA A USEFUL SOFTWARE FOR TERRAIN ANALYSIS AND GEOENVIRONMENTAL APPLICATIONS STUDY CASE ON THE GEODYNAMIC EVOLUTION OF ARGOLIS PENINSULA, GREECE.

Exploring Digital Earth. Michael F. Goodchild University of California Santa Barbara

Intensity Analysis of Spatial Point Patterns Geog 210C Introduction to Spatial Data Analysis

Version 1.1 GIS Syllabus

Practical Bayesian Optimization of Machine Learning. Learning Algorithms

Transiogram: A spatial relationship measure for categorical data

[Figure 1 about here]

Computer Vision Group Prof. Daniel Cremers. 14. Sampling Methods

Fundamentals of ArcGIS Desktop Pathway

Results: MCMC Dancers, q=10, n=500

Transcription:

Statistical Perspectives on Geographic Information Science Michael F. Goodchild University of California Santa Barbara

Statistical geometry Geometric phenomena subject to chance spatial phenomena emphasis on form GIS Distribution of a population of phenomena points, lines, areas Distribution of repeated measurements of a single phenomenon points, lines, areas, maps

Outline Classical examples GIS motivation General issues GIS uncertainty uncertainty in area-class maps

Buffon s needle Consider a needle of unit length dropped randomly onto a set of parallel lines unit distance apart probability that the needle will intersect a line?

Analytical results p(intersection) = 2/π = 0.6366 Experimental determination of π 5 th decimal place npq/np = 10-5 n ~ 10 10 lines s units apart, needle length l p = 2l / πs relevance to GIScience?

l s 2 E(number of cells intersected) = 4l / πs l < s p(ends in different cells) = (4ls l 2 ) / πs 2

Applications Short needle quadrat-based experiments avoid missing point-to-point interactions when pairs are in different cells databases partitioned into tiles avoid having to access multiple tiles when e.g. computing distance Long needle operations on raster databases, e.g. intervisibility depend on number of intersected cells

Geography in the 1960s Barry Boots dissertation (PhD 1972) statistical models of tesselations if all nodes in a boundary network are 3- valent then from Euler it follows that the average number of edges per face will be slightly less than 6 irrespective of the causative process

Uncertainty in spatial data All spatial data leave the user uncertain to some degree about the exact nature of the real world no representation can be exact all representations involve some combination of approximation, measurement, generalization

GPS tracks as distributions

The area class map Assigns every location x to a class Mark and Csillag term c = f(x) a nominal field (or perhaps ordinal) classified scene soil map, vegetation cover map, land use map Need to model uncertainty in this type of map

Uncertainty modeling Area-class maps are made by a long and complex process involving many stages, some partially subjective Maps of the same theme for the same area will not be the same level of detail, generalization vague definitions of classes variation among observers measuring instrument error different classifiers, training sites different sensors

Error and uncertainty Error: true map plus distortion systematic measurements disturbed by stochastic effects accuracy (deviation from true value) precision (deviation from mean value) variation ascribed to error Uncertainty: differences reflect uncertainty about the real world no true map possible consensus map combining maps can improve estimates

Models of uncertainty Determine effects of uncertainty/variation/error on results of analysis if there is known variation, the results of a single analysis cannot be claimed to be correct uncertainty analysis an essential part of GIS error model the preferred term

Traditional error analysis Measurements subject to distortion z' = z + δz Propagate through transformations r = f(z) r + δr = f(z + δz) But f is rarely known complex compilation and interpretation complex spatial dependencies between elements of resulting data set

Spatial dependence In true values z In errors e cov(e i,e j ) a decreasing positive function of distance geostatistical framework Scale effects, generalization as convolutions of z

If this were not true If Tobler s First Law of Geography did not apply to errors in maps If errors were statistically independent If relative errors were as large as absolute errors Errors in derived products would be impossibly large e.g. slope e.g. length Shapes would be unrecognizable

Realization A single instance from an error model an error model must be stochastic Monte Carlo simulation The Gaussian distribution metaphor scalar realizations a Gaussian distribution for maps an entire map as a realization

The area class map Field of nominal values c(x), n>1 spatially autocorrelated in raster, count of i,i joins greater than expected In vector, collection of discrete objects nodes, edges, areas in coverage model polygons in shapefile model

A collection of discrete objects Three conflicts with the observed nature of area-class maps In repeated mappings, positions, attributes, and numbers of objects will vary (topological variation) Positional uncertainties will vary widely depending on boundary clarity Confusion of attributes will vary within polygons may be greatest in the center contrary to the egg-yolk model

suburban core land use type urban

Six requirements of an error model for area-class maps 1. Address confusion at every point between observed class c' and consensus class c 2. Variation between realizations should emulate variation between repeated mappings 3. Autocorrelations in outcomes at nearby points 4. Emulate effects when maps are generalized, both thematically and cartographically

Continued requirements: 5. Realizations should be invariant under changes in underlying representation, e.g., raster cell size 6. Nominal case: results invariant under reordering of classes Review known models against these requirements

1. The confusion matrix Useful descriptive device quality control Comparing classifiers, observers, scales, accuracies p(c' c) Applied per-pixel or per-polygon Per-polygon case: no within-polygon variation (violates 1) no variation in topology (violates 2) Per-pixel case: no spatial dependence in outcomes (violates 3, 5)

2. The epsilon band Addresses only positional accuracy in a fixed topology Assumes uniform degree of positional accuracy Violates 1, 2, 4

Models based on vectors of probabilities At every location: P(x) = {p 1,p 2,...,p n } assume raster representation, P constant over cell Simple random assignment no spatial dependence violates 3 violates 5 since cell size would be evident in outcomes How to induce spatial dependence?

Spatial dependence in outcomes Independent outcomes zero spatial dependence between pixels perfect positive spatial dependence within pixels implies pixel size is meaningful Induce spatial dependence range >> pixel size spatial dependence falls smoothly independent of pixel size

3. Simple convolution Generate independent outcomes in each pixel Convolve using a modal filter induces spatial dependence size of filter determines range of dependence satisfies 3, 5 Posterior proportions not equal to prior probabilities convolution favors more probable classes

4. Sequential assignment Goodchild, Sun, and Yang IJGIS (1992) Random field z with controlled spatial dependence U(0,1) assign class e.g. {0.2,0.3,0.5} 1 (0.0,0.2); 2 (0.2,0.5); 3 (0.5,1.0) z = 0.1, c = 1 z = 0.3, c = 2 z = 0.8, c = 3

Comparing to criteria 1. Within-polygon variation: yes 2. Topological variation: yes 3. Spatial dependence: yes 4. Generalization: increase cell size, smooth z, smooth P 5. Independent of cell size: yes 6. Invariant under reordering: no

Indicator Kriging Assign Class 1, notclass 1 Among notclass 1, assign Class 2, notclass 2 Continue to Class n-1 notclass n-1 = Class n

Process-based interpretation Class i antecedent to class i-1 e.g. agriculture invaded by urban e.g. grassland invaded by forest shape of boundary between class i and class i-1 determined by class i some applications have inherently ordered classes but in this model all classes are ordered

5. Shuffling across realizations Shuffling within realizations unacceptable because of heterogeneity Generate N realizations with random assignment Establish target spatial dependencies Pick random pixel pick random pair of realizations swap if closer to target in both realizations

Properties No justifying interpretation it works Spatial dependence characterized at pixel level no generalization possible, violating 4 It satisfies all other criteria

6. Phase-space model m dimensional "phase" space defined by field variables partition into n regions Generate m random fields to locate x in phase space Assign x to one of n classes compare classifiers Goodchild and Dubuc (1987)

z 1 (x) z 2 (x) z 3 (x) z 2 c=2 c=4 c=1 c=3 z 1 4 3 2 2 1 3 2

Properties of the model No specification of P(x) two versions of the model Independent generation of z in each realization no memory between realizations constant proportions Fix z and generate distortions in each realization memory determined by z not by P varying proportions size of distortion determines amount of variation between realizations

Properties of the model Only classes adjacent in phase space can be adjacent geographically Classifiers provide obvious basis but how to calibrate variances, covariances of random fields? how to calibrate in other cases? model is over-specified but strongly motivated by process All criteria satisfied generalization by smoothing z, coarsening phasespace classification

Conclusions Understanding of uncertainty should be process-based phase space ordinal field Spatial dependence and topological variation are critical for applications missing in the simpler models Some useful methods shuffling most practical phase space most satisfying