Types of spatial data. The Nature of Geographic Data. Types of spatial data. Spatial Autocorrelation. Continuous spatial data: geostatistics

Similar documents
Accuracy and Uncertainty

Nature of Spatial Data. Outline. Spatial Is Special

Michael Harrigan Office hours: Fridays 2:00-4:00pm Holden Hall

Lecture 8. Spatial Estimation

Overview of Statistical Analysis of Spatial Data

Lecture 5 Geostatistics

Contents. Learning Outcomes 2012/2/26. Lecture 6: Area Pattern and Spatial Autocorrelation. Dr. Bo Wu

GIST 4302/5302: Spatial Analysis and Modeling Lecture 2: Review of Map Projections and Intro to Spatial Analysis

GIST 4302/5302: Spatial Analysis and Modeling

Spatial Data Mining. Regression and Classification Techniques

Representation of Geographic Data

Statistical Perspectives on Geographic Information Science. Michael F. Goodchild University of California Santa Barbara

Watershed Sciences 4930 & 6920 GEOGRAPHIC INFORMATION SYSTEMS

Introducing GIS analysis

ENGRG Introduction to GIS

2/7/2018. Module 4. Spatial Statistics. Point Patterns: Nearest Neighbor. Spatial Statistics. Point Patterns: Nearest Neighbor

Uncertainty in Geographic Information: House of Cards? Michael F. Goodchild University of California Santa Barbara

GIST 4302/5302: Spatial Analysis and Modeling

KAAF- GE_Notes GIS APPLICATIONS LECTURE 3

The Nature of Geographic Data

Luc Anselin Spatial Analysis Laboratory Dept. Agricultural and Consumer Economics University of Illinois, Urbana-Champaign

Basics of Geographic Analysis in R

What s special about spatial data?

Class 9. Query, Measurement & Transformation; Spatial Buffers; Descriptive Summary, Design & Inference

Spatial Regression. 1. Introduction and Review. Luc Anselin. Copyright 2017 by Luc Anselin, All Rights Reserved

Spatial Analysis and Modeling (GIST 4302/5302) Guofeng Cao Department of Geosciences Texas Tech University

Everything is related to everything else, but near things are more related than distant things.

Temporal vs. Spatial Data

Sampling Populations limited in the scope enumerate

Concepts and Applications of Kriging. Eric Krause

SPATIAL ANALYSIS. Transformation. Cartogram Central. 14 & 15. Query, Measurement, Transformation, Descriptive Summary, Design, and Inference

GEOGRAPHY 350/550 Final Exam Fall 2005 NAME:

UNCERTAINTY AND ERRORS IN GIS

Areal data. Infant mortality, Auckland NZ districts. Number of plant species in 20cm x 20 cm patches of alpine tundra. Wheat yield

SPACE Workshop NSF NCGIA CSISS UCGIS SDSU. Aldstadt, Getis, Jankowski, Rey, Weeks SDSU F. Goodchild, M. Goodchild, Janelle, Rebich UCSB

Outline. 15. Descriptive Summary, Design, and Inference. Descriptive summaries. Data mining. The centroid

Spatial Analysis 1. Introduction

The Study on Trinary Join-Counts for Spatial Autocorrelation

Concepts and Applications of Kriging. Eric Krause Konstantin Krivoruchko

The Nature and Value of Geographic Information. Michael F. Goodchild University of California Santa Barbara

Spatial Analysis I. Spatial data analysis Spatial analysis and inference

A spatial literacy initiative for undergraduate education at UCSB

Raster Spatial Analysis Specific Theory

GIST 4302/5302: Spatial Analysis and Modeling

Lesson 6: Accuracy Assessment

Spatial Process VS. Non-spatial Process. Landscape Process

Lecture 4. Spatial Statistics

Spatial analysis. Spatial descriptive analysis. Spatial inferential analysis:

Outline. Geographic Information Analysis & Spatial Data. Spatial Analysis is a Key Term. Lecture #1

Interpolating Raster Surfaces

UNIVERSITY OF TECHNOLOGY, SYDNEY

Quality and Coverage of Data Sources

Sampling. Where we re heading: Last time. What is the sample? Next week: Lecture Monday. **Lab Tuesday leaving at 11:00 instead of 1:00** Tomorrow:

NR402 GIS Applications in Natural Resources. Lesson 9: Scale and Accuracy

POPULAR CARTOGRAPHIC AREAL INTERPOLATION METHODS VIEWED FROM A GEOSTATISTICAL PERSPECTIVE

EMERGING MARKETS - Lecture 2: Methodology refresher

Spatial Analysis II. Spatial data analysis Spatial analysis and inference

Spatial Analysis and Modeling (GIST 4302/5302) Guofeng Cao Department of Geosciences Texas Tech University

Introduction to GIS. Phil Guertin School of Natural Resources and the Environment GeoSpatial Technologies

Stochastic calculus for summable processes 1

BAYESIAN MODEL FOR SPATIAL DEPENDANCE AND PREDICTION OF TUBERCULOSIS

Geometric Algorithms in GIS

Chapter 1: Basic Concepts

I don t have much to say here: data are often sampled this way but we more typically model them in continuous space, or on a graph

Fundamental Spatial Concepts. Michael F. Goodchild University of California Santa Barbara

David Tenenbaum GEOG 090 UNC-CH Spring 2005

Geostatistics and Spatial Scales

Quantitative Methods Geography 441. Course Requirements

GIS and the Built Environment

Chapter 02 Maps. Multiple Choice Questions

Triad-Friendly Approaches to Data Collection Design

SUPPLEMENTARY INFORMATION

Spatial Autocorrelation

Exploratory Spatial Data Analysis (ESDA)

LECTURE 15: SIMPLE LINEAR REGRESSION I

Supplementary material: Methodological annex

Spatial analysis. 0 move the objects and the results change

Chapter 6 Spatial Analysis

INTRODUCTION TO GEOGRAPHIC INFORMATION SYSTEM By Reshma H. Patil

Introduction to GIS I

2 Systems of Linear Equations

Assume that you have made n different measurements of a quantity x. Usually the results of these measurements will vary; call them x 1

Spatial Regression Modeling

Geographically weighted methods for examining the spatial variation in land cover accuracy

Statistics for cities

GIS for ChEs Introduction to Geographic Information Systems

Advanced Quantitative Research Methodology Lecture Notes: January Ecological 28, 2012 Inference1 / 38

Harvard University. Rigorous Research in Engineering Education

Geographers Perspectives on the World

3/9/2015. Overview. Introduction - sampling. Introduction - sampling. Before Sampling

A GEOSTATISTICAL APPROACH TO PREDICTING A PHYSICAL VARIABLE THROUGH A CONTINUOUS SURFACE

Propagation of Errors in Spatial Analysis

The Building Blocks of the City: Points, Lines and Polygons

USING GIS CARTOGRAPHIC MODELING TO ANALYSIS SPATIAL DISTRIBUTION OF LANDSLIDE SENSITIVE AREAS IN YANGMINGSHAN NATIONAL PARK, TAIWAN

Geostatistical Interpolation: Kriging and the Fukushima Data. Erik Hoel Colligium Ramazzini October 30, 2011

Sampling The World. presented by: Tim Haithcoat University of Missouri Columbia

On dealing with spatially correlated residuals in remote sensing and GIS

1. Write down the term 2. Write down the book definition 3. Put the definition in your own words 4. Draw an image and/or put a Real Life Example

Treatment of Error in Experimental Measurements

2/2/2015 GEOGRAPHY 204: STATISTICAL PROBLEM SOLVING IN GEOGRAPHY MEASURES OF CENTRAL TENDENCY CHAPTER 3: DESCRIPTIVE STATISTICS AND GRAPHICS

Transcription:

The Nature of Geographic Data Types of spatial data Continuous spatial data: geostatistics Samples may be taken at intervals, but the spatial process is continuous e.g. soil quality Discrete data Irregular: zonal data, regions, states, districts, postcodes, zipcodes Regular lattice data: constructed grid, raster representation Types of spatial data Point patterns Events occurring in a continuous (or finely discretised) space Polygons, and lines (GIS) Q: Where would a continuous space based model be appropriate in social sciences? Spatial Autocorrelation First law of geography: everything is related to everything else, but near things are more related than distant things Waldo Tobler Many geographers would say I don t understand spatial autocorrelation Actually, they don t understand the mechanics, they do understand the concept. 1

Spatial autocorrelation Lattice or zone data Variable (x) recorded at places s Is the data random or are there similarities between neighbours? Does a high value of x tend to be associated with a high value of x in neighbouring places (and low values with low)? Spatial Autocorrelation Spatial Autocorrelation correlation of a variable with itself through space. If there is any systematic pattern in the spatial distribution of a variable, it is said to be spatially autocorrelated If nearby or neighboring areas are more alike, this is positive spatial autocorrelation Negative autocorrelation describes patterns in which neighboring areas are unlike Random patterns exhibit no spatial autocorrelation Why spatial autocorrelation is important Most statistics are based on the assumption that the values of observations in each sample are independent of one another Positive spatial autocorrelation may violate this, if the samples were taken from nearby areas Goals of spatial autocorrelation Measure the strength of spatial autocorrelation in a map test the assumption of independence or randomness Spatial Autocorrelation Spatial Autocorrelation is, conceptually as well as empirically, the two-dimensional equivalent of redundancy It measures the extent to which the occurrence of an event in an areal unit constrains, or makes more probable, the occurrence of an event in a neighboring areal unit. 2

Spatial Autocorrelation Non-spatial independence suggests many statistical tools and inferences are inappropriate. Correlation coefficients or ordinary least squares regressions (OLS) to predict a consequence assumes that the observations have been selected randomly. If the observations, however, are spatially clustered in some way, the estimates obtained from the correlation coefficient or OLS estimator will be biased and overly precise. They are biased because the areas with higher concentration of events will have a greater impact on the model estimate and they will overestimate precision because, since events tend to be concentrated, there are actually fewer number of independent observations than are being assumed. Indices of Spatial Autocorrelation Moran s I Geary s C Ripley s K Join Count Analysis Random - no spatial autocorrelation Overly dispersed - negatively autocorrelated c1 0 2 4 6 8 10 0 2 4 6 8 10 c2 c1 0 2 4 6 8 10 0 2 4 6 8 10 c2 3

Positive spatial autocorrelation Why does spatial auto correlation occur? c1 0 2 4 6 8 10 0 2 4 6 8 10 c2 Reaction functions? Spillovers, externalities? Unobserved similarities between places? Diffusion (disease spread)? Common activity in neighbouring areas (crime)? Common policy across neighbouring areas? Map Join Counts BB WW BW Tot. A 0 0 24 24 B 5 7 12 24 C 10 10 4 24 Computing the Joins When using a computer to compute joins, we will have twice as many joins (ex. WB, BW). Just divide by two n L = = i 1 If we have n b Black cells and n w = n n b white cells the probability p of a black cell is p b = n b /n p w = n w /n 4

Computing the probabilities Start with first cell. The probability it is black is p b and the probability of white is p w The probability of BB in two adjacent cells is p b * p b or p 2 b Probability of BW is p b * p w + p b * p w or 2 p b p w Example from Election 2000 A spatial analysis of Election 2000 Did the blue and red map really say something significant about the locations where people voted by County? ArcView example Sampling The sampling density determines the resolution of the data Samples taken at 1 km intervals will miss variation smaller than 1 km Standard approaches to sampling: Random Systematic Stratified 5

Random samples Every location is equally likely to be chosen Systematic samples Sample points are spaced at regular intervals Stratified samples Stratified samples Requires knowledge about distinct, spatially defined sub-populations (spatial subsets such as ecological zones) More sample points are chosen in areas where higher variability is expected 6

The Koch Snowflake Length = 1 Selfsimilarity and fractals First iteration 4 Length = 3 After 2 iterations 4 2 Length = 3 After 3 iterations After n iterations 4 3 Length = 3 4 n Length = 3 7

After iterations (work with me here, people) The Koch snowflake is six of these put together to form... Length 4 = 3 =... well, a snowflake. Notice that the perimeter of the Koch snowflake is infinite... Uncertainty... but that the area it bounds is finite (indeed, it is contained in the white square). John Wiley & Sons Ltd 8

Introduction Imperfect or uncertain reconciliation [science, practice] [concepts, application] [analytical capability, social context] It is impossible to make a perfect representation of the world, so uncertainty about it is inevitable Sources of Uncertainty Measurement error: different observers, measuring instruments Specification error: omitted variables Ambiguity, vagueness and the quality of a GIS representation A catch-all for incomplete representations or a quality measure U1: Conception Spatial uncertainty Natural geographic units? Bivariate/multivariate extensions? Discrete objects Vagueness Statistical, cartographic, cognitive Ambiguity Values, language 9

Scale & Geographic Individuals Regions Uniformity Function Relationships typically grow stronger when based on larger geographic units Scale and Spatial Autocorrelation No. of geographic Correlation areas 48.2189 24.2963 12.5757 6.7649 3.9902 U2: Measurement/representation Representational models filter reality differently Vector Raster 0.9 1.0 0.5 0.9 0.1 0.5 0.0 0.1 10

Statistical measures of uncertainty: nominal case How to measure the accuracy of nominal attributes? e.g., a vegetation cover map The confusion matrix compares recorded classes (the observations) with classes obtained by some more accurate process, or from a more accurate source (the reference) Example of a misclassification or confusion matrix. A grand total of 304 parcels have been checked. The rows of the table correspond to the land use class of each parcel as recorded in the database, and the columns to the class as recorded in the field. The numbers appearing on the principal diagonal of the table (from top left to bottom right) reflect correct classification. A B C D E Total A 80 4 0 15 7 106 B 2 17 0 9 2 30 C 12 5 9 4 8 38 D 7 8 0 65 0 80 E 3 2 1 6 38 50 Total 104 36 10 99 55 304 Confusion Matrix Statistics Percent correctly classified total of diagonal entries divided by the grand total, times 100 209/304*100 = 68.8% but chance would give a score of better than 0 Kappa statistic normalized to range from 0 (chance) to 100 evaluates to 58.3% Sampling for the Confusion Matrix Examining every parcel may not be practical Rarer classes should be sampled more often in order to assess accuracy reliably sampling is often stratified by class 11

Per-Polygon and Per-Pixel Assessment Error can occur in both attributes of polygons, and positions of boundaries better to conceive of the map as a field, and to sample points this reflects how the data are likely to be used, to query class at points An example of a vegetation cover map. Two strategies for accuracy assessment are available: to check by area (polygon), or to check by point. In the former case a strategy would be devised for field checking each area, to determine the area's correct class. In the latter, points would be sampled across the state and the correct class determined at each point. Interval/Ratio Case Errors distort measurements by small amounts Accuracy refers to the amount of distortion from the true value Precision refers to the variation among repeated measurements and also to the amount of detail in the reporting of a measurement The term precision is often used to refer to the repeatability of measurements. In both diagrams six measurements have been taken of the same position, represented by the center of the circle. On the left, successive measurements have similar values (they are precise), but show a bias away from the correct value (they are inaccurate). On the right, precision is lower but accuracy is higher. 12

Reporting Measurements The amount of detail in a reported measurement (e.g., output from a GIS) should reflect its accuracy 14.4m implies an accuracy of 0.1m 14m implies an accuracy of 1m Excess precision should be removed by rounding Measuring Accuracy Root Mean Square Error is the square root of the average squared error the primary measure of accuracy in map accuracy standards and GIS databases e.g., elevations in a digital elevation model might have an RMSE of 2m the abundances of errors of different magnitudes often closely follow a Gaussian or normal distribution The Gaussian or Normal distribution. The height of the curve at any value of x gives the relative abundance of observations with that value of x. The area under the curve between any two values of x gives the probability that observations will fall in that range. The range between 1 standard deviation and +1 standard deviation is in blue. It encloses 68% of the area under the curve, indicating that 68% of observations will fall between these limits. Plot of the 350 m contour for the State College, Pennsylvania, U.S.A. topographic quadrangle. The contour has been computed from the U.S. Geological Survey's digital elevation model for this area. Uncertainty in the location of the 350 m contour based on an assumed RMSE of 7 m. The Gaussian distribution with a mean of 350 m and a standard deviation of 7 m gives a 95% probability that the true location of the 350 m contour lies in the colored area, and a 5% probability that it lies outside. 13

A Useful Rule of Thumb for Positional Accuracy Positional accuracy of features on a paper map is roughly 0.5mm on the map e.g., 0.5mm on a map at scale 1:24,000 gives a positional accuracy of 12m this is approximately the U.S. National Map Accuracy Standard and also allows for digitizing error, stretching of the paper, and other common sources of positional error A useful rule of thumb is that positions measured from maps are accurate to about 0.5 mm on the map. Multiplying this by the scale of the map gives the corresponding distance on the ground. Map scale Ground distance corresponding to 0.5 mm map distance 1:1250 62.5 cm 1:2500 1.25 m 1:5000 2.5 m 1:10,000 5 m 1:24,000 12 m 1:50,000 25 m 1:100,000 50 m 1:250,000 125 m 1:1,000,000 500 m 1:10,000,000 5 km Correlation of Errors Absolute positional errors may be high reflecting the technical difficulty of measuring distances from the Equator and the Greenwich Meridian Relative positional errors over short distances may be much lower positional errors tend to be strongly correlated over short distances As a result, positional errors can largely cancel out in the calculation of properties such as distance or area U3: Analysis. Error Propagation Addresses the effects of errors and uncertainty on the results of GIS analysis Almost every input to a GIS is subject to error and uncertainty In principle, every output should have confidence limits or some other expression of uncertainty 14

Error in the measurement of the area of a square 100 m on a side. Each of the four corner points has been surveyed; the errors are subject to bivariate Gaussian distributions with standard deviations in x and y of 1 m (dashed circles). The red polygon shows one possible surveyed square (one realization of the error model). Ecological Fallacy assuming a summarized variable is true for the entire population In this case the measurement of area is subject to a standard deviation of 200 sq m; a result such as 10,014.603 is quite likely, though the true area is 10,000 sq m. In principle, the result of 10,014.603 should be rounded to the known accuracy and reported as as 10,000. MAUP Living with Uncertainty Scale + aggregation = MAUP can be investigated through simulation of large numbers of alternative zoning schemes It is easy to see the importance of uncertainty in GIS but much more difficult to deal with it effectively but we may have no option, especially in disputes that are likely to involve litigation 15

Some Basic Principles Uncertainty is inevitable in GIS Data obtained from others should never be taken as truth efforts should be made to determine quality Effects on GIS outputs are often much greater than expected there is an automatic tendency to regard outputs from a computer as the truth More Basic Principles Use as many sources of data as possible and cross-check them for accuracy Be honest and informative in reporting results add plenty of caveats and cautions Consolidation Uncertainty is more than error Richer representations create uncertainty! Need for a priori understanding of data and sensitivity analysis 16