Lecture 3: Exploratory Spatial Data Analysis (ESDA) Prof. Eduardo A. Haddad

Similar documents
Lecture 3: Exploratory Spatial Data Analysis (ESDA) Prof. Eduardo A. Haddad

OPEN GEODA WORKSHOP / CRASH COURSE FACILITATED BY M. KOLAK

Exploratory Spatial Data Analysis Using GeoDA: : An Introduction

Mapping and Analysis for Spatial Social Science

Spatial Autocorrelation

Spatial Regression. 1. Introduction and Review. Luc Anselin. Copyright 2017 by Luc Anselin, All Rights Reserved

Outline ESDA. Exploratory Spatial Data Analysis ESDA. Luc Anselin

Exploratory Spatial Data Analysis (ESDA)

Exploratory Spatial Data Analysis (And Navigating GeoDa)

Geovisualization. Luc Anselin. Copyright 2016 by Luc Anselin, All Rights Reserved

Introduction to Spatial Statistics and Modeling for Regional Analysis

KAAF- GE_Notes GIS APPLICATIONS LECTURE 3

Luc Anselin Spatial Analysis Laboratory Dept. Agricultural and Consumer Economics University of Illinois, Urbana-Champaign

Soc/Anth 597 Spatial Demography March 14, GeoDa 0.95i Exercise A. Stephen A. Matthews. Outline. 1. Background

Geographical Information Systems Institute. Center for Geographic Analysis, Harvard University. GeoDa: Spatial Autocorrelation

Attribute Data. ArcGIS reads DBF extensions. Data in any statistical software format can be

EXPLORATORY SPATIAL DATA ANALYSIS OF BUILDING ENERGY IN URBAN ENVIRONMENTS. Food Machinery and Equipment, Tianjin , China

Local Spatial Autocorrelation Clusters

Outline. Introduction to SpaceStat and ESTDA. ESTDA & SpaceStat. Learning Objectives. Space-Time Intelligence System. Space-Time Intelligence System

Prentice Hall Stats: Modeling the World 2004 (Bock) Correlated to: National Advanced Placement (AP) Statistics Course Outline (Grades 9-12)

Nature of Spatial Data. Outline. Spatial Is Special

SPACE Workshop NSF NCGIA CSISS UCGIS SDSU. Aldstadt, Getis, Jankowski, Rey, Weeks SDSU F. Goodchild, M. Goodchild, Janelle, Rebich UCSB

Geographical Information Systems Institute. Center for Geographic Analysis, Harvard University. GeoDa: Exploratory Spatial Data Analysis

SASI Spatial Analysis SSC Meeting Aug 2010 Habitat Document 5

Global Spatial Autocorrelation Clustering

Lecture 1: Introduction to Spatial Econometric

Where to Invest Affordable Housing Dollars in Polk County?: A Spatial Analysis of Opportunity Areas

Finding Hot Spots in ArcGIS Online: Minimizing the Subjectivity of Visual Analysis. Nicholas M. Giner Esri Parrish S.

Spatial Data Mining. Regression and Classification Techniques

Exploratory Spatial Data Analysis and GeoDa

Spatial Analysis 2. Spatial Autocorrelation

Creating and Managing a W Matrix

Where Do Overweight Women In Ghana Live? Answers From Exploratory Spatial Data Analysis

Spatial Analysis I. Spatial data analysis Spatial analysis and inference

Basics of Geographic Analysis in R

The Use of Spatial Weights Matrices and the Effect of Geometry and Geographical Scale

Tracey Farrigan Research Geographer USDA-Economic Research Service

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Temporal vs. Spatial Data

Spatial Analysis 1. Introduction

Spatial Autocorrelation (2) Spatial Weights

Lecture 5 Geostatistics

Lecture 8. Spatial Estimation

Business Statistics. Lecture 10: Course Review

STARS: Space-Time Analysis of Regional Systems

From Practical Data Analysis with JMP, Second Edition. Full book available for purchase here. About This Book... xiii About The Author...

Everything is related to everything else, but near things are more related than distant things.

GIS and Spatial Statistics: One World View or Two? Michael F. Goodchild University of California Santa Barbara

Concepts and Applications of Kriging. Eric Krause

The Implementation of Autocorrelation-Based Regioclassification in ArcMap Using ArcObjects

2/7/2018. Module 4. Spatial Statistics. Point Patterns: Nearest Neighbor. Spatial Statistics. Point Patterns: Nearest Neighbor

Introduction GeoXp : an R package for interactive exploratory spatial data analysis. Illustration with a data set of schools in Midi-Pyrénées.

Construction Engineering. Research Laboratory. Approaches Towards the Identification of Patterns in Violent Events, Baghdad, Iraq ERDC/CERL CR-09-1

In matrix algebra notation, a linear model is written as

What s special about spatial data?

Spatial Data, Spatial Analysis and Spatial Data Science

Concepts and Applications of Kriging

Spatial analysis. Spatial descriptive analysis. Spatial inferential analysis:

Dr Arulsivanathan Naidoo Statistics South Africa 18 October 2017

Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues

Migration Clusters in Brazil: an Analysis of Areas of Origin and Destination Ernesto Friedrich Amaral

BNG 495 Capstone Design. Descriptive Statistics

AP Final Review II Exploring Data (20% 30%)

The Analysis of Sustainability Development of Eastern and South Eastern Europe in the Post Socialist Period

Michael Harrigan Office hours: Fridays 2:00-4:00pm Holden Hall

Concepts and Applications of Kriging. Eric Krause Konstantin Krivoruchko

Contents. Acknowledgments. xix

The GeoDa Book. Exploring Spatial Data. Luc Anselin

Overview of Statistical Analysis of Spatial Data

The Nature of Geographic Data

A Local Indicator of Multivariate Spatial Association: Extending Geary s c.

Introduction. Part I: Quick run through of ESDA checklist on our data

Points. Luc Anselin. Copyright 2017 by Luc Anselin, All Rights Reserved

GIST 4302/5302: Spatial Analysis and Modeling

GIST 4302/5302: Spatial Analysis and Modeling Lecture 2: Review of Map Projections and Intro to Spatial Analysis

LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION

Stat 5101 Lecture Notes

Final Project: An Income and Education Study of Washington D.C.

Subject CS1 Actuarial Statistics 1 Core Principles

Working Paper No Introduction to Spatial Econometric Modelling. William Mitchell 1. April 2013

Concepts and Applications of Kriging

GIST 4302/5302: Spatial Analysis and Modeling

Learning Objectives for Stat 225

DIFFUSION AND DYNAMIC TREND OF SPATIAL DEPENDENCE

Using AMOEBA to Create a Spatial Weights Matrix and Identify Spatial Clusters, and a Comparison to Other Clustering Algorithms

An Introduction to Spatial Autocorrelation and Kriging

Community Health Needs Assessment through Spatial Regression Modeling

Descriptive Univariate Statistics and Bivariate Correlation

Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee

Representation of Geographic Data

Spatial Regression Modeling

Glossary for the Triola Statistics Series

NEW YORK DEPARTMENT OF SANITATION. Spatial Analysis of Complaints

Unconstrained Ordination

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

AP Statistics Cumulative AP Exam Study Guide

Visualize and interactively design weight matrices

Spatial correlation and demography.

HUMAN CAPITAL CATEGORY INTERACTION PATTERN TO ECONOMIC GROWTH OF ASEAN MEMBER COUNTRIES IN 2015 BY USING GEODA GEO-INFORMATION TECHNOLOGY DATA

Measures of Spatial Dependence

Transcription:

Lecture 3: Exploratory Spatial Data Analysis (ESDA) Prof. Eduardo A. Haddad

Key message Spatial dependence First Law of Geography (Waldo Tobler): Everything is related to everything else, but near things are more related than distant things 2

Spatial analysis Locational invariance Spatial analysis is not locationally invariant The results change when the locations of the study objects change Where matters! Mapping and Geovisualization Showing interesting patterns Exploratory Spatial Data Analysis Discovering interesting patterns Spatial Modeling Explaining interesting patterns 3

GeoDa https://spatial.uchicago.edu/ 4

ESDA Exploratory data analysis (EDA) uses a set of techniques to: maximize insight into a data set uncover underlying structures extract important variables detect outliers and anomalies test underlying assumptions suggest hypotheses develop parsimonious models ESDA includes spatial attributes of the data 5

Geovisualization Beyond mapping: Combining map and scientific visualization methods Exploit human pattern recognition capabilities Statistical maps Innovative map devices (quantile map, percentile map, box map, standard deviation map, conditional map) Map movie 6

Exercise 1 (mapping) Open and close a project Load a shape file with the proper indicator (Key) Select functions from the menu or toolbar Make a simple choropleth map Select items in the map Data: Morocco (gdppc_07); Brazil_UF (Y00, G00, RG00); Brazil_MR (RENDAPC, ESGT, M1SM) 7

Exercise 2 (data) Open and navigate the data table Select and sort items in the table Create new variables in the table Data: Morocco (gdppc_03, gdppc_07) Which regions presented the highest per capita GRP growth from 2003 to 2007 in Morocco? Table > Calculator 8

Exercise 3 (EDA) This exercise illustrates some basic techniques for exploratory data analysis, or EDA. It covers the visualization of the non-spatial distribution of data by means of a histogram and box plot, and highlights the notion of linking, which is fundamental in GeoDa. Data: Brazil_UF (Y00, G00) Histogram: The histogram is a discrete approximation to the density function of a random variable 9

Exercise 3 (EDA) Box plot: It shows the median, first and third quartiles of a distribution (the 50%, 25% and 75% points in the cumulative distribution) as well as a notion of outlier. An observation is classified as an outlier when it lies more than a given multiple of the interquartile range (the difference in value between the 75% and 25% observation) above or below respectively the value for the 75 th percentile and 25 th percentile. The standard multiples used are 1.5 and 3 times the interquartile range. 10

Box plot Outlier Hinge = 1.5 Interquartile range (25% a 75%) Median Mean 11

Exercise 3 (EDA) Scatter plot: The visualization of the bivariate association between variables can be done by means of a scatter plot Converting the scatter plot to a correlation plot, in which the regression slope corresponds to the correlation between the two variables (as opposed to a bivariate regression slope in the default case). Any observations beyond the value of 2 can be informally designated as outliers 12

Exercise 3 (EDA) Conditional plots: Conditional plots, also known as facet graphs or Trellis graphs (Becker, Cleveland, and Shyu 1996), provide a means to assess interactions between more than two variables. Multiple graphs or maps are constructed for different subsets of the observations, obtained as a result of conditioning on the value of two variables. Conditional scatter plot Data: Brazil_MR (RENDA=f(POP), ESGT, M1SM) 13

Exercise 4 (spatial scale and rate of density) Inference can change with scale Problem of spatial aggregation State versus micro-region Intensity maps Extensive variable tends to be correlated with size (such as area or total population) Rate or density is more suitable for a choropleth map, and is referred to as an intensive variable. Data: Brazil_UF (Y00) and Brazil_MR (RENDAPC, RENDA, POP) Rates-calculated map (Raw rate; Excess risk) 14

Spatial dependence What happens at one place depends on events in nearby places Remember: All things are related but nearby things are more related than distant things (Tobler) Categorizing Type: substantive versus nuisance Direction: positive versus negative Issues Time versus space Inference 15

Concepts Spatial weights Spatial lag Spatial autocorrelation 16

Spatial weights matrix Definition: N by N positive matrix W, elements w ij w ij nonzero for neighbors, 0 otherwise w ii = 0, no self-neighbors Geographic weights (contiguity, distance, general, graph-based weights) Socio-economic weights 17

Contiguity weights Contiguity: sharing a common boundary of non-zero length Three views of contiguity: rook bishop queen rook queen 18

Exercise 5 (spatial weights) Contiguity-based spatial weights Create a first order contiguity spatial weights file from a polygon shape file, using both rook and queen criteria (.GAL) Connectivity structure of the weights in a histogram Higher order contiguity Data: Brazil_UF; Morocco 19

Exercise 6 (spatial weights) Distance-based spatial weights Create a distance-based spatial weights file from a point shape file, by specifying a distance band Adjust the critical distance Create a spatial weights file based on a k-nearest neighbor criterion Data: Hedonic; Brazil_UF; Morocco 20

Spatially lagged variables Spatially lagged variables are an essential part of the computation of spatial autocorrelation tests and the specification of spatial regression models Wy: is the average of a variable in the neighboring spatial units (e.g. WY00 is the average per capita income of the neighbors) 21

Exercise 7 (spatial lag) Scatter plot of Y00 and WY00 Use standardized values Data: Brazil_UF (Y00) 22

Clustering Global characteristic Property of overall pattern = all observations are like values more grouped in space than random Test by means of a global spatial autocorrelation statistic No location of the clusters determined 23

Clusters Local characteristic Where are the like values more grouped in space than random? Property of local pattern = location-specific Test by means of a local spatial autocorrelation statistic Local clusters may be compatible with global spatial randomness 24

Spatial autocorrelation statistic Formal test of match between value similarity and locational similarity Statistic summarizes both aspects Significance: how likely is it (p-value) that the computed statistic would take this (extreme) value in a spatially random pattern? 25

Attribute similarity Summary of the similarity or dissimilarity of a variable at different locations: Variable y at locations i, j with i j Measures of similarity: Cross product: y i y j Measures of dissimilarity: Squared differences: (y i - y j ) 2 Absolute differences: y i - y j 26

Global spatial autocorrelation (Moran s I) I = n S 0 n i=1 n j =1 n i=1 w ij z i z j 2 zi z i = x i x Cross-product statistic Similar to a correlation coefficient Value depends on W 27

Moran scatter plot Moran s I as a regression slope Regress Wz on z Moran scatter plot: linear association between Wz on y-axis and z on the x-axis Each point is a pair (zi, Wzi), slope is I 28

Spatial average of standardized value of the variable Schematic presentation of Moran scatter plot Standardized value of the variable 29

Exercise 8 (Moran scatter plot) Same as exercise 7 Check the influence of outliers ± two standard-deviations Use GeoDa menu: Space > Univariate Moran s I Data: Brazil_UF (Y00) 30

Inference Null hypothesis: spatial randomness Observed spatial pattern of values is equally likely as any other spatial pattern Values at one location do not depend on values at other (neighboring) locations Under spatial randomness, the location of values may be altered without affecting the information content of the data 31

Computational inference Permutation approach Reshuffle observations each observation equally likely to fall on each location Construct reference distribution from random permutations Normal approximation: I 1 n 1 Not zero, but approaches zero as n E Pseudo significance 32

Exercise 9 (inference) Illustrate the concept of spatial autocorrelation by contrasting real data that is highly correlated in space (% African-American 2000 Census tract data for Milwaukee MSA) with the same observations randomly distributed over space Box maps and Moran scatterplots Data: MSA (PCTBLCK, PCTBLACK) 33

Global versus local analysis Global analysis One statistic to summarize pattern Clustering Homogeneity Local analysis Location-specific statistics Clusters Heterogeneity 34

LISA definition Local Indicator of Spatial Autocorrelation Anselin (1995) Local Spatial Statistic Indicate significant spatial autocorrelation for each location Local-global relation Sum of LISA proportional to a corresponding global indicator of spatial autocorrelation 35

Local Moran statistic l i = z i m 2 w ij j z j m 2 = z i 2 i, l i i = nl, l = l i n i Global is mean of locals Link local-global 36

Inference Computational Conditional permutation Hold value at i fixed, permute others 37

LISA significance map Locations with significant local statistics Sensitivity analysis to p-value Choropleth Map Shading by significance Non-significant locations not highlighted 38

LISA cluster map Only the significant locations Matches significance map Types of spatial autocorrelation Spatial clusters high-high (red), low-low (blue) Spatial outliers high-low (light red), low-high (light blue) 39

Spatial clusters and spatial outliers Spatial outliers Individual locations Spatial clusters Core of the cluster in LISA map Cluster itself also includes neighbors Use p < 0.01 to identify meaningful cluster cores and their neighbors Conditional local cluster maps 40

Caveats LISA clusters and hot spots Suggest interesting locations Suggest significant spatial structure Do not explain Need to account for multivariate relations Univariate spatial autocorrelation due to other covariates Spatial Econometrics 41

Exercise 10 (LISA) Compute the local Moran statistic and associated significance map and cluster map Assess the sensitivity of the cluster map to the number of permutations and significance level Interpret the notion of spatial cluster and spatial outlier Data: Brazil_MR (RENDAPC); MSA (PCTBLCK, PCTBLACK); Grid100 (ZAR09, RANZAR09) 42

Exercise 11 (Conditional local cluster map) The cluster maps can be incorporated in a conditional map view. This is accomplished by selecting the Show As Conditional Map option Data: Brazil_MR (RENDAPC, ESGT, M1SM) 43

Reference The notes for this lecture were adapted from those elaborated by Prof. Sergio Rey for the course Geographic Information Analysis GPH 483/598, held in the Spring of 2014 at the School of Geographical Sciences and Urban Planning at the Arizona State University. They also relied on previous material prepared by Prof. Eduardo Haddad for the course Regional and Urban Economics, held yearly at the Department of Economics at the University of Sao Paulo. 44