Exploratory Spatial Data Analysis (ESDA)

VANGHR's method of ESDA follows a typical geospatial framework of selecting variables, exploring spatial patterns, and regression analysis. The primary software tools used for ESDA are Arizona State University's GeoDa and Esri's ArcGIS (Spatial Statistics Toolbox). VANGHR selects the appropriate dependent and independent variables to examine. A combination of histograms and scatter plot matrices is used to visualize the distribution of the data in space. Spatial autocorrelation is used to examine the nature of the spatial pattern of observation points (clustered vs. random vs. dispersed). Hot spot analysis is used to explore the clustering patterns of selected variables. Spatial autocorrelation is also used to explore the association between two variables. The next step in the process is to run an ordinary least squares (OLS) regression. Based on the results of the OLS regression, VANGHR determines whether a geographically weighted regression analysis is necessary.

Figure 1: VANGHR Analytic Process

Spatial Autocorrelation in Bivariate Analysis

Using GeoDa's multivariate LISA (local indicator of spatial association) analysis, the correlation between the two variables is examined. The analysis produces both an overall Moran's I for the study area and a LISA value for each feature. Only those features with a statistically significant (p ≤ 0.05) LISA value are mapped. Default permutation values are accepted. A map of the nature of the association is also produced. This cluster map classifies the association of the two variables in each census tract as High/High, Low/Low, Low/High, or High/Low.

In GeoDa, a new project is opened with a polygon shapefile containing the appropriate variables. A weights matrix is then created for the polygon shapefile. The Multivariate LISA tool is selected from the Space toolbar with the following inputs:

1st Variable (Y): Dependent Variable
2nd Variable (X): Independent Variable
Choose to open Cluster Map and Moran Scatter Plot

The results of the analysis are saved by right-clicking the map output, choosing Save Results, and selecting LISA indices, Clusters, and Significances. The shapefile table is then saved to a new shapefile. This new shapefile is then opened in ArcGIS, and the cluster field is symbolized to indicate the association between the two variables at the local level.

Geospatial Regression Methods in Multivariate Analysis

Geographically Weighted Regression (GWR) is a technique for exploratory data analysis that provides estimates of regression coefficients for each geographical location, based on a weighting of other observations near that location (Mitchell, 2005). The basic assumption is that observations exhibit spatial dependency. This is rooted in Tobler's first law of geography: "Everything is related to everything else, but near things are more related than distant things" (Tobler, 1970). Ordinary Least Squares (OLS) regression serves as a starting point for building a well-specified GWR and guides the researcher in selecting the key explanatory variables. Once the OLS regression is complete, the model must pass six tests before the analyst proceeds to GWR. The six tests are:

1. Coefficients have the expected signs: a check that the sign associated with each variable is appropriate. For example, in modeling stroke hospitalization, if the population aged 65 and over is negatively associated with stroke hospitalization, further investigation of the model is likely needed, since age (65 and over) is one of the risk factors for stroke hospitalization.
2. AIC (Akaike Information Criterion): a measure of how the model performed, used to compare different regression models. The model with the lower AIC is held to be better. This indicator also addresses the benefit of moving from OLS regression (global) to GWR (local regression).
3. Variance Inflation Factor (VIF): a test of whether two or more variables are telling the same story (multicollinearity). The rule is that any variable with a VIF greater than 7.5 should be removed.
4. Jarque-Bera statistic: a test of whether the residuals are normally distributed, which also checks for model bias. Since the null hypothesis is that the residuals are normally distributed, the test should not be statistically significant. A p-value of 0.05 or less means that the model is biased and cannot be trusted. Run spatial autocorrelation (Moran's Index) to make sure the residuals are random. If the residuals cluster, that is an indication that a critical explanatory variable is missing from the model.
5. Adjusted R-Square: the adjusted R-Square from the OLS is used to compare goodness of fit. This value is compared with the adjusted R-Square from GWR to see which model accounts for a higher proportion of the variance in the dependent variable.

6. Koenker statistic: a test that guides which variables to select, based on the p-value. For example, if the Koenker test is significant, the analyst can trust only the robust probability. This assumes that the errors in the model are normally distributed. This is a test for non-stationarity, and the robust probability will be an indicator of regional variation.

Once the model has passed the six tests for proper specification, the analyst selects the statistically significant variables from the OLS model, based on their p-values, to run the GWR. One aspect of GWR is that the estimated parameters depend, in part, on the weighting function or kernel selected. In executing the GWR, the number of neighbors used for each local estimation becomes very important. The shape and size of the bandwidth determine which features will be used to calibrate each local equation. The recommended practice is to let the program choose the bandwidth or neighbor value, so that it identifies an optimal fixed distance or an optimal adaptive number of neighbors. Select either Fixed or Adaptive as the kernel type and AICc or CV as the bandwidth method.

If the observations are reasonably regularly positioned in the study area, the analyst uses the Fixed kernel. This kernel measures distance in the units of the feature class's coordinate system. The assumption is that the bandwidth at each regression point is constant across the study area. At the regression point, the weight of a data point is unity, and the weight decreases as distance from the regression point increases (Fotheringham et al., 2002). If the observations are clustered, the analyst applies the Adaptive kernel. For the bandwidth parameter, VANGHR's approach is that any selection of the number of neighbors should be based on theory, by letting the data determine the number of neighbors to be included in the analysis. This is because, as the bandwidth gets larger, the weights approach unity and the local GWR model approaches the global OLS model (Fotheringham et al., 2002).
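The behavior of the weights can be illustrated with a minimal sketch. It assumes a Gaussian kernel, which is one common choice and not necessarily the kernel used by the GWR tool in ArcGIS. The function names and the distances are hypothetical and serve only as an illustration.

```python
import math

def gaussian_weight(distance, bandwidth):
    """Gaussian kernel (an assumed kernel form): 1 at the regression point, decaying with distance."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def fixed_kernel_weights(distances, bandwidth):
    """Fixed kernel: the same bandwidth (a distance) is used at every regression point."""
    return [gaussian_weight(d, bandwidth) for d in distances]

def adaptive_kernel_weights(neighbor_distances, n_neighbors):
    """Adaptive kernel: the bandwidth is the distance to the n-th nearest neighbor,
    so it shrinks where features are dense and grows where they are sparse."""
    bandwidth = sorted(neighbor_distances)[n_neighbors - 1]
    return [gaussian_weight(d, bandwidth) for d in neighbor_distances]

# Hypothetical distances from one regression point, in coordinate-system units.
# The first entry (0.0) is the regression point itself.
distances = [0.0, 1.0, 2.0, 5.0, 10.0]

for bandwidth in (1.0, 5.0, 1000.0):
    weights = fixed_kernel_weights(distances, bandwidth)
    print(bandwidth, [round(w, 3) for w in weights])

# Adaptive weights for the same point, using its two nearest neighbors to set the bandwidth.
print([round(w, 3) for w in adaptive_kernel_weights(distances[1:], 2)])
```

With the largest bandwidth every weight is close to 1, so each local regression uses nearly the same equally weighted data and the local fit becomes indistinguishable from the global OLS fit.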
Because an overly large bandwidth reduces GWR to the global model, it is imperative to let the data specify the distance or number of neighbors to be included in the model calibration. When features have a large variation in distribution (sparse in some areas, dense in others), the adaptive kernel becomes the best choice.

Another technique VANGHR uses to select an appropriate value for the parameter is to plot distance against the spatial autocorrelation z-score and take as the optimum the distance at which the z-score peaks, that is, the point where it begins to decline. The optimum distance ensures that each feature has at least one neighbor. An alternate technique utilized by VANGHR in selecting the number-of-neighbors parameter is to use a hot spot analysis based on the dependent variable. The p-values of the resulting z-scores in the hot spot analysis are evaluated, and the number of features with a p-value of 0.05 or less is the number of neighbors to use as the parameter in the GWR analysis.

The output of the GWR is mapped (Local R-Square) to show where the model performed well and whether it is answering the question being asked. The standardized residuals are also mapped to see if the residuals are random. The analyst then runs spatial autocorrelation to make sure the residuals are random. If they form clusters of high and low residuals, the analyst takes a second look at the data to make sure a critical variable is not missing from the model.
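The residual check can be sketched as a global Moran's I with a permutation test. This is an illustration of the statistic only, not the GeoDa or ArcGIS implementation. The chain-shaped neighbor structure, the residual values, and the function names are all hypothetical, and the two-sided pseudo p-value is one of several reasonable conventions.

```python
import random

def chain_weights(n):
    """Hypothetical binary weights: features lie on a line, each adjacent to its neighbors."""
    return {i: {j: 1.0 for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}

def morans_i(values, weights):
    """Global Moran's I for values indexed 0..n-1 and weights given as {i: {j: w_ij}}."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(w for nbrs in weights.values() for w in nbrs.values())
    cross = sum(w * dev[i] * dev[j] for i, nbrs in weights.items() for j, w in nbrs.items())
    return (n / s0) * (cross / sum(d * d for d in dev))

def permutation_p_value(values, weights, n_perm=999, seed=0):
    """Pseudo p-value: share of random relabelings at least as far from E[I] as the observed I."""
    rng = random.Random(seed)
    expected = -1.0 / (len(values) - 1)  # expectation of I under spatial randomness
    observed = morans_i(values, weights)
    shuffled = list(values)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(morans_i(shuffled, weights) - expected) >= abs(observed - expected):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)

# Hypothetical standardized residuals: high values at one end, low values at the other.
residuals = [2.0, 1.5, 1.0, -1.0, -1.5, -2.0]
weights = chain_weights(len(residuals))

observed, p_value = permutation_p_value(residuals, weights)
print("Moran's I:", round(observed, 3), "pseudo p-value:", p_value)
```

A clearly positive index with a small pseudo p-value indicates clustered residuals, which in the workflow above sends the analyst back to look for a missing explanatory variable.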

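Two of the OLS screening checks described earlier, the Jarque-Bera test and the VIF rule, can also be sketched in a few lines. The sketch uses the textbook Jarque-Bera formula and, as a simplifying assumption, computes VIF only for the two-variable case, where it reduces to 1 / (1 - r^2). All data values and function names are hypothetical, and ArcGIS may compute these statistics differently in detail.

```python
import math

def jarque_bera(residuals):
    """Jarque-Bera statistic and p-value; the null hypothesis is normally distributed residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # chi-square survival function with 2 degrees of freedom
    return jb, p_value

def two_variable_vif(x1, x2):
    """VIF when the model has exactly two explanatory variables: 1 / (1 - r^2)."""
    n = len(x1)
    mean1, mean2 = sum(x1) / n, sum(x2) / n
    sxy = sum((a - mean1) * (b - mean2) for a, b in zip(x1, x2))
    sxx = sum((a - mean1) ** 2 for a in x1)
    syy = sum((b - mean2) ** 2 for b in x2)
    r_squared = sxy ** 2 / (sxx * syy)
    return 1.0 / (1.0 - r_squared)

# Hypothetical residuals with one large outlier: strongly skewed.
skewed_residuals = [-0.5] * 19 + [9.5]
jb, p_value = jarque_bera(skewed_residuals)
if p_value <= 0.05:
    print("Jarque-Bera significant: residuals are not normally distributed")

# Hypothetical explanatory variables that move almost in lockstep.
vif = two_variable_vif([1, 2, 3, 4, 5], [2, 4, 6, 8, 11])
if vif > 7.5:
    print("VIF above 7.5: the two variables are redundant")
```

With more than two explanatory variables, each VIF comes from regressing one variable on all of the others, which this sketch does not attempt.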
Example Exploratory Spatial Data Analysis Maps

High Priority Target Areas -- Asthma

References:

Fotheringham, A.S., Brunsdon, C. and Charlton, M. (2002) Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. Chichester: Wiley.

Mitchell, A. (2005) The ESRI Guide to GIS Analysis, Volume 2: Spatial Measurements and Statistics. Redlands, CA: ESRI Press.

Tobler, W. (1970) "A computer movie simulating urban growth in the Detroit region". Economic Geography, 46(2): 234-240.

Contact Information: Steve Sedlock ssedlock@vnghr.org Virginia Network for Geospatial Health Research, Inc. PO Box 15818 Richmond, VA 23227 804.264.3325