What's special about spatial data?


Road map: geographic information analysis; the need to develop spatial thinking; some fundamental geographic concepts (PBCS); the effects of space; spatial autocorrelation; the potential of spatial data.

Geographic information analysis: what is it? It covers spatial data manipulation (what we do in setting up a GIS); spatial data analysis and geovisualization (descriptive and exploratory); spatial statistical analysis (statistical methods to determine whether the data are typical or unexpected relative to a statistical model: patterns are observed, processes are inferred); and spatial modeling (constructing models to predict spatial outcomes). Techniques and methods that enable the representation, description, measurement, comparison, and generation of spatial patterns are central to the study of geographic information analysis.

Representation of geography: vector and raster (the traditional GISystems perspective) versus objects and fields (the GIScience perspective). Objects are a series of entities located in space; we recognize point, line, network, and area objects. The particular instantiation can change as the scale (resolution) changes (the multiple-representation problem). Attributes identify the objects. Fields are properties that vary continuously across space. The key issues are continuity and self-definition (everywhere has a value, and the set of values defines the field); fields are represented using any one of many tessellation schemes (raster cells, TINs, hexagons). How you represent the object or field in a spatial database depends on what you want to do with it.

Patterns and processes: most GISystems people think of spatial data as being built upon the entity-attribute model. From the spatial analyst's perspective, however, spatial data are a representation of the patterns and processes that we hope to understand. While GISystems may lack some of the tools needed for sophisticated spatial analyses, I strongly recommend that you use the capabilities of a GIS to organize, transform, visualize, and describe the data before you embark on the more complex process of performing spatial analysis.

GIS versus spatial statistics (figure: software packages arranged between GIS and statistics; labels include Diva GIS, GWR, R, Estat, Crimestat, Fragstat, ArcGIS, GeoDa, Quantum, SatScan, Passage, SAS, and SPSS).

Why such an interest now? (Diagram contrasting old and new workflows. Labels for one workflow: research question, data gathering, questions, maps, (statistical) analysis, conclusions. Labels for the other: data, spatial analysis, maps, conclusions.) Spatial analysis is a specialized form of knowledge discovery in databases (KDD).

The need to develop spatial thinking: we need to move from atomistic decision units to social-spatial interaction (or, think of the forest, not just the trees). Relevant concepts include spatial spillovers (the spatial lag: how neighbours affect you) and spatial multipliers; spatial context (neighbourhood effects, contextual effects); spatial mismatch; and spatial disparities.

The need to develop spatial thinking (continued): there is a tidal wave of geolocated observations (Twitter, VGI, open data, Facebook, etc.). There is often a mismatch between the spatial scale of the process and the spatial scale of the observations: the units we impose are not necessarily the units associated with the process (e.g., administrative units such as census tracts are not behavioural units such as neighbourhoods). That is, the patterns we observe were often generated by processes operating in a different spatial unit.

The need to develop spatial thinking (continued): error terms show systematic patterns. Examples include distance decay in observations (precision decreases with distance from sensors) and the neighbourhood effect in (hedonic) house price models. The change of spatial support problem (COSP) arises when, e.g., school catchments do not match census tracts, or stand-level variables do not match landscape-level variables. Variables measured at different spatial scales can form nested, hierarchical structures or non-nested, overlapping ones.

Maps lead to questions

Spatial analytical questions: Where do things happen (patterns, clusters, hotspots)? Why do things happen where they happen (location decisions)? How does where things happen affect other things (context), and how does context affect what happens (interactions)? Where should things be located (optimization)? Before we can start our investigations, however, we need to cover a few spatial data basics.

Is it okay to work with lat/lon? What issues arise when working with latitudes and longitudes in spatial analysis? The method may assume you are working in a planar coordinate system. Topological relations may change as a result of projecting the data. Your lat/lon maps will distort the apparent spatial relations of the data. So you should always work with data that have been projected using the proper datum (which should be NAD83, not WGS84), and always use a projection-based coordinate system, unless

(Figure: the same data mapped in projected UTM NAD83 coordinates and in lat/lon.) It should be obvious that the results of any statistical analysis would be affected if lat/lon coordinates were used as is.
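
The effect of treating lat/lon as planar coordinates can be illustrated with a small sketch. It is not taken from the slides: it assumes a spherical Earth with a mean radius of 6371 km, and the function names and the two point pairs are hypothetical. It compares a naive distance in degrees with a great-circle (haversine) distance for two pairs of points that are each one degree of longitude apart.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with mean radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def naive_degree_distance(lat1, lon1, lat2, lon2):
    """Euclidean distance computed directly on degrees (treats lat/lon as planar)."""
    return math.hypot(lat2 - lat1, lon2 - lon1)


# Hypothetical point pairs, each one degree of longitude apart.
equator_pair = (0.0, -80.0, 0.0, -79.0)
north_pair = (60.0, -80.0, 60.0, -79.0)

for name, pair in (("equator", equator_pair), ("60 N", north_pair)):
    print(name,
          "degrees:", naive_degree_distance(*pair),
          "km:", round(haversine_km(*pair), 1))
```

The naive distance is the same for both pairs, while the ground distance at 60 degrees north is about half of that at the equator, so any distance-based statistic computed on raw degrees would be distorted.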

What are the effects of space? What are the assumptions of traditional aspatial statistics? Samples are random; observations are independent and identically distributed; and observations can be adequately described using a Gaussian distribution (along with many others). Time series analysis introduced exceptions, recognizing that a single time series is a sample of size 1 composed of n sequentially correlated values (autocorrelation was introduced). Spatial analysis follows the same logic: a map of data values can be viewed as a sample of size 1 with repeated, correlated nearby measures (spatial autocorrelation is recognized).

How do spatial data violate the basic aspatial statistical assumptions? Random samples? No. Spatial autocorrelation (SA) is assumed to be characteristic of spatial data (why else use spatial analysis?). If geography is worth studying at all, it must be because phenomena do not vary randomly through space. SA influences any statistic that depends on n, the sample size, because it introduces redundancy: each additional observation provides less new information than n indicates. There are various measures of SA, both global and local, that we will get to later.

Spatial autocorrelation: "Everything is related to everything else, but near things are more related than distant things." (Tobler) Related ideas: spatial interaction, contagion, externalities, spill-overs, copycatting. Spatial autocorrelation is determined by both similarities in position and similarities in attributes. When spatial autocorrelation exists, many standard statistical tests should not be used, or should be adapted for spatial dependency. When spatial autocorrelation is identified, we also need to be aware of sample bias.

Spatial autocorrelation (continued): spatial autocorrelation is the measure by which we quantify proximity effects, a very important component of health geography. For example, 8 percent of childhood asthma may be attributed to living in homes within 250 feet of busy roadways (air pollution effects).

Global vs local indices: a global measure of spatial autocorrelation gives a single summary measure of the patterns of association for the whole study region. This can conceal more localised patterns within the region. Global measures can often be broken down into local measures, where the patterns of association are measured and compared for sub-regions (e.g., local Moran's I, local Getis G). Local measures can be used to identify hot spots and cold spots of something (e.g., crime).

Global vs local: measures of global autocorrelation include the join counts method (nominal, often binary data), Moran's I (continuous data), the Getis G statistic, and Geary's C. Measures of local autocorrelation include local Moran's I (LISA) and the local Getis G statistic (Gi*), used for hot spot clustering.

Gore/Bush presidential election 2000: is there evidence of clustering by state? Use join counts to answer this question. Actual join counts: Jbb = 60, Jgg = 21, Jbg = 28, total = 109. There are many BB joins. Total number of joins = 109 = sum of neighbours / 2.

Join count statistic for Gore/Bush 2000 by state

Share of votes in the election: Bush (Pb) = 0.49885; Gore (Pg) = 0.50115.

Join    Actual   Expected   Std dev   Z-score
Jbb       60      27.125     8.667     3.7930
Jgg       21      27.375     8.704    -0.7325
Jbg       28      54.500     5.220    -5.0763
Total    109     109.000

The expected number of joins is calculated from the proportion of votes each candidate received in the election (for Bush/Bush, 109 x 0.49885 x 0.49885 = 27.125), where K = 109 is the total number of joins. There are far more Bush/Bush joins (actual = 60) than would be expected (27). Since the test score (3.79) is greater than the critical value (about 2.58 for a two-tailed test at the 1% level), the result is statistically significant at the 99% confidence level (p <= 0.01): strong evidence of spatial autocorrelation (clustering). There are far fewer Bush/Gore joins (actual = 28) than would be expected (54.5). Since the absolute value of the test score (5.08) is greater than the critical value, the result is again statistically significant at the 99% confidence level (p <= 0.01): again, strong evidence of spatial autocorrelation (clustering).
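
The arithmetic on this slide can be checked with a short sketch. The vote shares, actual join counts, and standard deviations are copied from the slide; the standard deviations are taken as given rather than derived, and the variable names and the two-tailed 1% critical value of about 2.576 are assumptions made for illustration.

```python
total_joins = 109
p_bush, p_gore = 0.49885, 0.50115

# Expected joins if states were coloured at random with these vote shares.
expected = {
    "BB": total_joins * p_bush ** 2,
    "GG": total_joins * p_gore ** 2,
    "BG": total_joins * 2 * p_bush * p_gore,
}
actual = {"BB": 60, "GG": 21, "BG": 28}
std_dev = {"BB": 8.667, "GG": 8.704, "BG": 5.220}  # copied from the slide

CRITICAL_Z = 2.576  # assumption: two-tailed test at the 1% level

z_scores = {k: (actual[k] - expected[k]) / std_dev[k] for k in actual}

for kind, z in z_scores.items():
    # Significance depends on the magnitude of z, not its sign.
    significant = abs(z) > CRITICAL_Z
    print(kind, round(expected[kind], 3), round(z, 3), significant)
```

The comparison uses the absolute value of the z-score, which is why the strongly negative Bush/Gore score counts as significant even though it is numerically smaller than the critical value.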

Moran's I and the correlation coefficient r: differences and similarities. The correlation coefficient r measures the relationship between two variables (figure examples: income and education, r = 0.71; price and quantity, r = -0.71). Moran's I involves one variable only: it is the correlation between a variable X and the spatial lag of X, formed by averaging all the values of X for the neighbouring polygons (figure examples: crime rate and crime in nearby areas, r = 0.71; grocery store density and nearby grocery store density, r = -0.71).
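
A minimal sketch of Moran's I along these lines, assuming row-standardised weights (the spatial lag is the mean of the neighbours' values) and assuming every unit has at least one neighbour. The function name, the six-cell chain of neighbours, and the data values are hypothetical and only serve to show positive and negative spatial autocorrelation.

```python
def morans_i(values, neighbours):
    """Moran's I with row-standardised weights.

    values: list of numbers, one per spatial unit.
    neighbours: list of lists; neighbours[i] holds the indices adjacent to unit i.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Spatial lag: average deviation of each unit's neighbours.
    lag = [sum(dev[j] for j in neighbours[i]) / len(neighbours[i])
           for i in range(n)]
    # With row-standardised weights the n / S0 scaling factor equals 1.
    return sum(d * l for d, l in zip(dev, lag)) / sum(d * d for d in dev)


# Six units in a row; each unit neighbours the units on either side.
chain = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]

clustered = [1, 2, 3, 4, 5, 6]    # similar values sit next to each other
alternating = [1, 6, 1, 6, 1, 6]  # dissimilar values sit next to each other

print(round(morans_i(clustered, chain), 4))    # positive autocorrelation
print(round(morans_i(alternating, chain), 4))  # negative autocorrelation
```

Like r, the statistic is positive when a unit and its neighbours deviate from the mean in the same direction and negative when they deviate in opposite directions.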

Hot spots and cold spots: a hot spot is a place where high values cluster together (e.g., a high crime area); a cold spot is a place where low values cluster together (e.g., a low crime area). Moran's I and Geary's C cannot distinguish between them: they only indicate clustering, and cannot tell whether the clusters are hot spots, cold spots, or both.

Getis-Ord General (global) G statistic: the G statistic distinguishes between hot spots and cold spots by identifying spatial concentrations. G is relatively large if high values cluster together and relatively small if low values cluster together. The General G statistic is interpreted relative to its expected value, the value for which there is no spatial association: G larger than the expected value indicates potential hot spots; G smaller than the expected value indicates potential cold spots. A Z test statistic is used to test whether the difference is statistically significant. The calculation of G is based on a neighbourhood distance within which clustering is expected to occur. Reference: Getis, A. and Ord, J.K. (1992). The analysis of spatial association by use of distance statistics. Geographical Analysis, 24(3), 189-206.
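
As a hedged illustration of comparing G with its expected value, the sketch below computes the General G with binary weights, G = sum of w_ij x_i x_j over i != j divided by the sum of x_i x_j over i != j, and its expectation under no spatial association, W / (n(n - 1)), where W is the sum of the weights. It assumes positive data values and uses a simple adjacency list in place of a distance band; the function names and data are hypothetical, and the Z test is not included.

```python
def general_g(values, neighbours):
    """Getis-Ord General G with binary weights (values must be positive)."""
    n = len(values)
    # Numerator: cross-products of values for neighbouring pairs only.
    numerator = sum(values[i] * values[j]
                    for i in range(n) for j in neighbours[i])
    # Denominator: cross-products over all pairs with i != j.
    total = sum(values)
    denominator = total * total - sum(v * v for v in values)
    return numerator / denominator


def expected_g(neighbours):
    """Expected General G under no spatial association: W / (n (n - 1))."""
    n = len(neighbours)
    weight_sum = sum(len(js) for js in neighbours)
    return weight_sum / (n * (n - 1))


# Six units in a row; each unit neighbours the units on either side.
chain = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]

high_clustered = [9, 9, 9, 1, 1, 1]  # high values adjacent to each other
low_clustered = [9, 1, 1, 1, 1, 9]   # low values adjacent, high values apart

print(round(expected_g(chain), 4))
print(round(general_g(high_clustered, chain), 4))  # above the expected value
print(round(general_g(low_clustered, chain), 4))   # below the expected value
```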

The effect of SA on n: Griffith (2003, p. 83) presents a formula for calculating the effective sample size (n*) from the number of sample points (n) and the spatial autocorrelation (ρ or Moran's I) of the data. Chun and Griffith (2013) show that for a DEM consisting of 9181 values with a Moran's I of 0.9967, the effective sample size is 12! Moran's I is the spatial equivalent of the standard correlation coefficient.

The potential of spatial data: not all news is bad news when it comes to working with spatial data, as there are insights that can be obtained when working in space. Four important spatial concepts influence the patterns we observe, and can hopefully be used to help infer something about the processes that lie behind the patterns: distance, adjacency, interaction, and neighbourhood.

Distance: typically measured as the crow flies (on a planar surface, not along the topographic surface). GIS has enabled many other distances to be used, such as least-cost distance (developing a cost surface from relevant factors, analogous to a topographic surface, and then identifying the path of least resistance across it, like water flowing downhill) and network-based distances (travel only along linear features, which may have impedances such as speed limits, through nodes or intersections, which may have turn restrictions). Time and direction may also play an important role in determining the appropriate distance (e.g., rush hour, the jet stream).

Adjacency: typically considered to be a binary condition: one spatial entity is either adjacent to another or not. How to determine adjacency is subject to interpretation. Should only geography be considered? Often a cut-off distance is used (e.g., < 1000 m). Other criteria could also easily be used: Is direct travel permitted (e.g., flight schedules, bridges over rivers)? Are they nearest neighbours, or among the five closest entities? Are they rook's-case (von Neumann) or queen's-case (Moore) neighbours (especially in tessellation models)? Is there a minimum length of border between the two entities?
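
The rook's-case and queen's-case definitions can be sketched for a regular grid. This is a minimal illustration with a hypothetical function name and a 3 x 3 grid; real tessellations such as census polygons need shared-border or shared-vertex tests instead of row and column offsets.

```python
def grid_neighbours(row, col, n_rows, n_cols, case="rook"):
    """Return the (row, col) cells adjacent to a cell on a regular grid.

    rook: cells sharing an edge (von Neumann neighbourhood).
    queen: cells sharing an edge or a corner (Moore neighbourhood).
    """
    if case == "rook":
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif case == "queen":
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        raise ValueError("case must be 'rook' or 'queen'")
    # Keep only the cells that fall inside the grid.
    return [(row + dr, col + dc) for dr, dc in offsets
            if 0 <= row + dr < n_rows and 0 <= col + dc < n_cols]


# Centre and corner cells of a 3 x 3 grid.
print(len(grid_neighbours(1, 1, 3, 3, "rook")), len(grid_neighbours(1, 1, 3, 3, "queen")))
print(len(grid_neighbours(0, 0, 3, 3, "rook")), len(grid_neighbours(0, 0, 3, 3, "queen")))
```

The corner cell has fewer neighbours than the centre cell under either definition, a small-scale instance of an edge effect.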

Interaction: the observed effects of distance and adjacency: nearer things are more related than distant things. Normally scaled to range from 0 (no interaction) to 1 (a high degree of interaction); adjacency is a binary form of interaction. Typically measured using an inverse distance weighting approach: $\omega_{ij} \propto \frac{1}{d^k}$, where $\omega_{ij}$ is the interaction weight between two entities i and j that are a distance d apart in space. The distance exponent, k, controls the rate of decline of the weight. The inverse power law is typically used.

Interaction (continued): the interaction between two entities can be positively weighted by some attributes of the entities. A commonly used measure is the size of the entities, based upon an attribute such as their populations, $p_i$ and $p_j$: $\omega_{ij} \propto \frac{p_i p_j}{d^k}$. Other measures that could be used include trade volume or the number of species in common. Interaction is important in the development of spatial interpolation methods.
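
The two interaction weights can be sketched as follows. The function names, the default exponent k = 2, and the example numbers are assumptions made for illustration; dividing each weight by the row total is only one possible way to scale weights into the 0 to 1 range.

```python
def inverse_distance_weight(d, k=2.0):
    """Unscaled interaction weight 1 / d**k for a positive distance d."""
    if d <= 0:
        raise ValueError("distance must be positive")
    return 1.0 / d ** k


def gravity_weight(p_i, p_j, d, k=2.0):
    """Unscaled interaction weight p_i * p_j / d**k (population-weighted)."""
    return p_i * p_j * inverse_distance_weight(d, k)


def normalise(weights):
    """Scale weights so they sum to 1 (each then lies between 0 and 1)."""
    total = sum(weights)
    return [w / total for w in weights]


# One entity with three others at distances 1, 2, and 4 (arbitrary units).
raw = [inverse_distance_weight(d, k=1.0) for d in (1.0, 2.0, 4.0)]
print([round(w, 4) for w in normalise(raw)])

# A larger exponent makes the weight decline faster with distance.
print(inverse_distance_weight(2.0, k=1.0), inverse_distance_weight(2.0, k=2.0))

# Population-weighted interaction between two hypothetical places.
print(gravity_weight(100, 50, 10.0, k=2.0))
```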

Neighbourhoods: we may want to formalize the concept of adjacency by declaring that all of the spatial entities adjacent to Y form the neighbourhood of Y (however adjacency was defined). In other cases we may define a neighbourhood as the region of space within a specified distance of Y. We could also use (e.g.) clustering methods to identify groups of entities that share similar characteristics.

Summary: autocorrelation undermines conventional inferential statistics (n* << n). The modifiable areal unit problem (MAUP) also undermines conventional methods, in particular correlation and regression. Scale (extent and grain) can have a significant impact and should always be an explicit decision. Edge effects are almost always present, although some approaches try to reduce their effect. Spatial is special!

References
Anselin, L. and S. J. Rey. 2014. Modern Spatial Econometrics in Practice. GeoDa Press.
Bivand, R., E. Pebesma and V. Gómez-Rubio. Applied Spatial Data Analysis with R (2nd ed.). Springer.
Brunsdon, C. and L. Comber. 2015. An Introduction to R for Spatial Analysis & Mapping. Sage.
Chun, Y. and D. Griffith. 2013. Spatial Statistics & Geostatistics. Sage.
Griffith, D. 2003. Spatial Autocorrelation and Spatial Filtering. Springer.
O'Sullivan, D. and D. Unwin. 2003. Geographic Information Analysis. Wiley.