Spatial Regression Modeling

Paul Voss & Katherine Curtis
The Center for Spatially Integrated Social Science
Santa Barbara, CA
July 12-17, 2009
Day 4

Plan for today
- Focus on spatial heterogeneity
- A bit of wrap-up and EDA regarding our regression results from yesterday morning
- Putting it all together: a worked example
- Spatial heterogeneity in relationships: Geographically Weighted Regression (GWR)
- Lab: spatial regression diagnostics & model strategies; GWR

Questions?

As class ended yesterday, we quickly looked at the results of a simple OLS multivariate regression model using GeoDa
- Dependent variable: sqrt(ppov)
- Independent variables: sqrt(unem), sqrt(pfhh), log(hdplus)

Compare the R results with the same model run in GeoDa. As a reminder, here were the GeoDa results

GeoDa output: OLS regression. Virtually identical results

Advantages of AIC and SC
- Both measures improve (decline) as R² increases, but they degrade as the model size increases
- Like adjusted R², they place a premium on achieving a given fit with a smaller number of parameters per observation, but the penalty for added variables is greater
- Both criteria have their virtues; neither has an advantage over the other. The SC, with its heavier penalty for degrees of freedom lost, will lean toward a simpler model
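A minimal sketch of how the two criteria trade fit against model size, assuming the usual definitions AIC = -2 lnL + 2k and SC = -2 lnL + k ln(n). The function names are hypothetical. The log likelihood (-578.94) and n (3,074) are taken from the child poverty example later in this deck; k = 12 is an assumption (eleven predictors plus the intercept), chosen because it reproduces the AIC of 1181.88 reported there.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion: -2 lnL + 2k (smaller is better)."""
    return -2.0 * log_likelihood + 2.0 * k


def schwarz_criterion(log_likelihood, k, n):
    """Schwarz criterion: -2 lnL + k ln(n); heavier penalty once ln(n) > 2."""
    return -2.0 * log_likelihood + k * math.log(n)


# OLS column of the child poverty example; k = 12 is an assumption.
ols_aic = aic(-578.94, 12)
ols_sc = schwarz_criterion(-578.94, 12, 3074)

print(round(ols_aic, 2))   # 1181.88, matching the value in the results table
print(ols_sc > ols_aic)    # True: SC penalizes the same 12 parameters more
```

With n = 3,074, each added parameter costs 2 units under AIC but roughly 8 units under SC, which is why the SC leans toward the simpler model.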

Some general comments regarding goodness-of-fit measures
- R² is fine for OLS; other estimators often generate a pseudo-R² (which is usually the squared correlation of observed and predicted values)
- Log likelihood values can be used to compare models only if the models are nested
- AIC is a more general measure than log likelihood, but it must play by the rules:
  - Like log likelihood, AIC can be used to compare models of the same type (e.g., OLS) with the same DV but different specifications, or
  - It can be used to compare models of different types (e.g., OLS, spatial regression, GWR) if they have the same model specification

End of digression regarding goodness-of-fit measures. What about all those other diagnostics at the end of the GeoDa regression output?

Recall, the lower half of the GeoDa output from the OLS regression run looked like this

We need to consider this part of the output seriously. There are some troubling numbers here. Let's take them one at a time

Part 2 of the GeoDa regression output

One advantage of R over GeoDa is that the diagnostic analysis capability is much richer in R. And we looked at some of this in the lab session on Tuesday

A very useful diagnostic plot. What are we looking for here and what's our conclusion?

There are other common residual diagnostic plots. Here's a residual vs. carrier plot. Heteroskedasticity? Probably

Another residual vs. carrier plot. Heteroskedasticity? Probably, although the fan seems a bit softer to me

Another residual vs. carrier plot. Seems to be much less heteroskedasticity coming from the education variable

What about the rejection of normality among residuals found in the GeoDa run? They don't look too bad here. Maybe it's the sample size that's just giving us a large JB statistic. The Shapiro-Wilk normality test gives us the same story, however:
data: residuals(reg1)
W = 0.9716, p-value = 6.702e-16

How can the Q-Q plot of residuals further inform us? The residual Q-Q plot confirms that we have a problem. The problem continues even if we remove the most obvious offenders (almost all counties in Texas)

What about the rejection of normality among residuals found in the GeoDa run? Here it is with 8 serious outliers removed from the plot. Better, but still some problems. Shapiro-Wilk normality test:
data: residuals(reg1)[-c(275, 774, 1048, 1066, 1114, 1130, 1135, 1143)]
W = 0.9915, p-value = 3.867e-07

What's the take-home message?
- GeoDa and R give us the same message (which they should!)
- Unless you're very fortunate, the OLS model diagnostics almost always leave us with a bad taste. Why? Because we need to and want to move on. We especially want to do something about the spatial autocorrelation in the residuals
- But neither the residual Moran statistic nor the Lagrange multiplier statistics are trustworthy in the presence of non-normality and heteroskedasticity
- Furthermore, Monte Carlo simulations have shown that residual spatial autocorrelation itself induces heteroskedasticity
- This is where we begin to look for ways of reducing the unresolved heterogeneity that appears to be plaguing our OLS model

Perhaps a worked example will help
An example using county levels of child poverty, all U.S. counties: 2000

Objective
- Considering spatial effects when analyzing the uneven geographic distribution of child poverty in the U.S.
- Still a work in progress
- A central issue is whether to model our data under an assumption of spatial dependence or spatial heterogeneity

Our Data
- Proportion of children in poverty (transformed)
- 2000 Census of Population (U.S.)
- Counties are the unit of analysis (n = 3,074)
- Contiguous 48 states only
- Most independent cities merged with surrounding county

Log Odds of Proportion of Children in Poverty: 2000

Our Model (after Friedman and Lichter, 1998)
- Industrial Structure: % Extr. ind.; % Non dur. mfg. ind.; % Misc. svcs.; % Prof. svcs.
- Emp. Oppy. Structure: % Unemp.; % Males under emp.
- Family Structure: % of families w/ children headed by females
- Control Variables: % Hispanic; % Black; % HS or less; % Emp. in county
- Dependent variable: Log Odds % Children in Poverty

Begin with a Standard Regression Approach and examine the diagnostics

Standard Regression Results

Variable (expected sign)                   Parameter   (p-value)
Industrial composition:
  Extractive (+)                              2.377    (0.0000)
  Non-durable manufacturing (+)              -0.101    (0.5359) ns
  Miscellaneous services (+)                  0.246    (0.2215) ns
  Professional services (-)                   0.614    (0.0000)
Local employment opportunities:
  Prop. LF unemployed (+)                     4.942    (0.0000)
  Prop. male underemployment (+)              1.415    (0.0000)
Family structure:
  Prop. female-headed families (+)            3.883    (0.0000)
Control variables:
  Proportion black (+)                       -0.045    (0.5254) ns
  Proportion Hispanic (+)                     0.396    (0.0000)
  Prop. HS education or less (+)              2.405    (0.0000)
  Proportion work in county (+)               0.236    (0.0000)
Intercept                                    -4.854    (0.0000)

Adjusted R²                                   0.751
Jarque-Bera test (normality of errors)     7480.38     (0.0000)
B-P test (heteroskedasticity)               401.27     (0.0000)
Moran's I (residuals)                         0.331    (0.0000)

ns = not significant
Compare: Moran's I on the dependent variable = 0.590
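The residual Moran's I in the results above can be sketched as follows. This is a minimal illustration, assuming the standard definition I = (n / S0) * (z'Wz / z'z), where z holds deviations from the mean and S0 is the sum of all weights. The function name, the four-unit toy layout, and the binary weights are assumptions for illustration only; the county analysis itself would use a full weights matrix, and the inference (p-value) step is not shown.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                      # deviations from mean
    s0 = sum(sum(row) for row in weights)               # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))    # z'Wz
    total = sum(d * d for d in z)                       # z'z
    return (n / s0) * (cross / total)


# Hypothetical example: four areas in a row, each neighboring the next.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
residuals = [1.0, 2.0, 3.0, 4.0]   # a smooth gradient: similar values adjoin

print(morans_i(residuals, w))      # about 0.333: positive autocorrelation
```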

Test for Normality of Residuals

$$JB = \frac{n-k}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$$

$$H_0: JB = 0$$

where S is the skewness and K the kurtosis of the residuals. The statistic has an asymptotic chi-squared distribution with two degrees of freedom (one for skewness, one for kurtosis). The 5% critical value is 5.99
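A short sketch of the Jarque-Bera calculation, assuming S and K are the moment-based sample skewness and kurtosis of the residuals and k is the number of estimated regression parameters (as in the n - k term of the formula above). The function name and the toy residuals are hypothetical; this is not the GeoDa implementation.

```python
def jarque_bera(residuals, k=0):
    """Jarque-Bera statistic: (n - k)/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n   # second central moment
    m3 = sum((e - mean) ** 3 for e in residuals) / n   # third central moment
    m4 = sum((e - mean) ** 4 for e in residuals) / n   # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return (n - k) / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


# Hypothetical toy residuals; compare the statistic with the 5% critical value.
jb = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(jb, "reject normality" if jb > 5.99 else "do not reject normality")
```

Because the statistic scales with n, even modest skewness or excess kurtosis yields a very large JB with more than 3,000 counties, which is the sample-size concern raised earlier in the deck.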

Tests for Heteroskedasticity

$$H_0: \mathrm{Var}[\varepsilon_i] = E[\varepsilon_i^2] = \sigma^2$$

$$H_A: \sigma_i^2 = \sigma^2 f\left(\alpha_0 + \sum_p \alpha_p z_{pi}\right)$$

- Breusch and Pagan (1979)
- Koenker and Bassett (1982)
- White (1980)

Tests are not valid in the presence of spatial dependence
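To make the idea behind these tests concrete, here is a minimal sketch of the studentized (Koenker) form of the Breusch-Pagan statistic for a single z variable: regress the squared residuals on z and take n times the R² of that auxiliary regression. The restriction to one z variable, the function name, and the toy data are all assumptions for illustration; under the null the statistic is asymptotically chi-squared with one degree of freedom (5% critical value 3.84).

```python
def koenker_bp(residuals, z):
    """n * R^2 from regressing squared residuals on one variable z."""
    n = len(residuals)
    e2 = [e * e for e in residuals]
    e2_bar = sum(e2) / n
    z_bar = sum(z) / n
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    s_ee = sum((v - e2_bar) ** 2 for v in e2)
    s_ze = sum((zi - z_bar) * (v - e2_bar) for zi, v in zip(z, e2))
    if s_zz == 0.0 or s_ee == 0.0:
        return 0.0          # no variation to explain: no evidence against H0
    r_squared = (s_ze * s_ze) / (s_zz * s_ee)
    return n * r_squared


# Hypothetical residuals whose spread "fans out" as z increases.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
fanning = [0.1, -0.2, 0.4, -0.5, 0.9, -1.1, 1.6, -1.8]
print(koenker_bp(fanning, z))   # large values point to heteroskedasticity
```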

So, what to do now?
- Proceed directly to re-specify the process under a spatial dependence assumption?
  - A common approach, but not necessarily the best one. We really should try first to do something with the obvious spatial heterogeneity (non-stationarity)
  - Moreover, are we ready at this point to assume that poverty is a social phenomenon resulting from spatial interaction?
- Eliminate or control for the spatial heterogeneity?
  - Model it first and then work with residuals as the new dependent variable
  - Add a trend surface to our OLS model
- Seek other variables to improve specification?
- Identify spatial regimes and interaction effects
- Shift to a space-time framework

We'll proceed to perform each of these options and comment along the way. Hopefully, in the end we'll have some sense of reasonable ways to introduce the different spatial effects as possible alternative data-generating processes w.r.t. child poverty

So let's proceed directly to re-specify the process under a spatial dependence assumption. The Lagrange Multiplier statistics suggested a preference for a spatial error dependence model
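For reference, the two spatial dependence specifications compared next are usually written as below. This is the standard textbook notation, not something shown on the slides: W is the spatial weights matrix, and ρ and λ are assumed here to correspond to the "spatial parameter" reported for the lag and error models respectively.

```latex
\begin{align}
\text{Spatial lag:}   \quad & y = \rho W y + X\beta + \varepsilon \\
\text{Spatial error:} \quad & y = X\beta + u, \qquad u = \lambda W u + \varepsilon
\end{align}
```

In the lag form the dependence runs through the outcome itself (neighbors' poverty levels enter the equation), while in the error form it is confined to the unmodeled part of the process.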

Spatial Dependence Models

Variable                          (A) = OLS              (A) + Spatial Lag      (A) + Spatial Error
(Several control variables)       *** (***)              *** (***)              *** (***)
Industrial composition:
  Extractive                       2.377 (0.0000)         1.904 (0.0000)         2.080 (0.0000)
  Non-durable manufacturing       -0.101 (0.5359) ns      0.011 (0.9383) ns     -0.069 (0.6720) ns
  Miscellaneous services           0.246 (0.2215) ns     -0.035 (0.8432) ns     -0.030 (0.8830) ns
  Professional services            0.614 (0.0000)         0.292 (0.0122)         0.295 (0.0366)
Local employment opportunities:
  Prop. LF unemployed              4.942 (0.0000)         3.636 (0.0000)         3.798 (0.0000)
  Prop. male underemployment       1.415 (0.0000)         1.296 (0.0000)         1.519 (0.0000)
Family structure:
  Prop. female-headed families     3.883 (0.0000)         3.844 (0.0000)         4.129 (0.0000)
Intercept                         -4.854 (0.0000)        -3.591 (0.0000)        -4.347 (0.0000)
Spatial parameter                                         0.351 (0.0000)         0.612 (0.0000)
Diagnostics:
  Robust LM (lag)                 155.82 (0.0000)
  Robust LM (error)               352.20 (0.0000)
  Heteroskedasticity (B-P)        401.27 (0.0000)        740.47 (0.0000)        800.09 (0.0000)
  Log likelihood                 -578.94                -259.10                -205.94
  AIC                            1181.88                 544.21                 435.88
  Moran's I (residuals)             0.331                  0.079                 -0.048

So where are we?
- Happy with these results? In particular, are we satisfied with the spatial error dependence model?
- Write it up and ship it off?
- But what's our theory? How do we introduce the spatial dependence model, or interpret it?
- What about apparent unresolved spatial dependence and heterogeneity? Just let it go?

A brief digression to address the theory question: why do high poverty counties (and low poverty counties) cluster in space?
- Poverty spillover
- Importance of family
- Welfare levels as product of spatial dependence
- Tendency of non-poor to live near one another
- Role of government in fostering economic segregation
- Legacy effects

So let's say we're satisfied with the spatial dependence model
- But what about the apparent heteroskedasticity in the spatial error model?
- This clearly is a sign that we've not resolved the nonstationarity (heterogeneity) in our model, and this has consequences
- Just ignore the heterogeneity?
- Try to model it first and then examine the remaining dependence process assuming stationarity?
- Bring it into our model? How? Trend surface?

Trend Surface Analysis
- One commonly suggested approach to handling spatial heterogeneity
- Alas, there are problems (Ripley, 1981):
  - It tends to distort ("wave") the surface at the edges of the area in order to fit points in the center
  - For higher orders, the polynomial terms tend to be highly correlated, causing fierce multicollinearity problems
  - Uneven distribution in the data can distort the fit
  - High positive autocorrelation in the process can lead to a tendency to fit a surface of too high an order
- Nevertheless...

Second-Order Trend Surface?

Trend Surface Models

Variable                          (A) OLS                (B) Trend Surface only  (C) OLS Vars + Trend
(Several control variables)       *** (***)                                      *** (***)
Industrial composition:
  Extractive                       2.377 (0.0000)                                 2.488 (0.0000)
  Non-durable manufacturing       -0.101 (0.5359) ns                              0.043 (0.7781) ns
  Miscellaneous services           0.246 (0.2215) ns                              0.005 (0.9785) ns
  Professional services            0.614 (0.0000)                                 0.584 (0.0000)
Local employment opportunities:
  Prop. LF unemployed              4.942 (0.0000)                                 5.402 (0.0000)
  Prop. male underemployment       1.415 (0.0000)                                 1.567 (0.0000)
Family structure:
  Prop. female-headed families     3.883 (0.0000)                                 4.039 (0.0000)
Intercept                         -4.854 (0.0000)         7.081 (0.0000)         -1.568 (0.0026)
Large-scale spatial effect:
  Long                                                    0.037 (0.0021)          0.030 (0.0000)
  Lat                                                    -0.319 (0.0000)         -0.067 (0.0000)
  Long²                                                   0.000 (0.8756) ns       0.000 (0.8271) ns
  Lat²                                                    0.002 (0.0000)         -0.000 (0.0628) ns
  Long × Lat                                             -0.001 (0.0000)         -0.001 (0.0000)
AIC                              1181.88                4745.20                  789.12
Moran's I (residuals)               0.331                  0.474                   0.250
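The five large-scale terms in the table are simply the second-order polynomial expansion of each county's coordinates. A minimal sketch follows, with hypothetical function names and made-up longitudes; it also illustrates the multicollinearity problem attributed to Ripley: over a range like that of the contiguous U.S., longitude and its square are almost perfectly correlated.

```python
import math


def trend_surface_terms(lon, lat):
    """Second-order trend surface regressors: Long, Lat, Long^2, Lat^2, Long*Lat."""
    return [lon, lat, lon * lon, lat * lat, lon * lat]


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical centroid longitudes spanning roughly the contiguous U.S.
lons = [-120.0, -110.0, -100.0, -90.0, -80.0, -70.0]
lon_sq = [trend_surface_terms(lon, 40.0)[2] for lon in lons]

print(trend_surface_terms(-100.0, 40.0))
print(pearson_r(lons, lon_sq))   # very close to -1
```

Centering the coordinates before squaring is a common way to reduce this correlation, though the slides do not say whether that was done here.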

Might even try adding the spatial dependence processes to this latter model. Such models assume the data-generating process is approximated both by a spatial heterogeneity component and a spatial dependence component

OLS Vars. + Trend Surface + Spatial Effects

Variable                          (C) OLS vars + trend   (C) + Spatial Lag      (C) + Spatial Error
(Several control variables)       *** (***)              *** (***)              *** (***)
Industrial composition:
  Extractive                       2.488 (0.0000)         2.047 (0.0000)         2.161 (0.0000)
  Non-durable manufacturing        0.043 (0.7781) ns      0.041 (0.7753) ns     -0.042 (0.7941) ns
  Miscellaneous services           0.005 (0.9785) ns     -0.060 (0.7375) ns     -0.046 (0.8159) ns
  Professional services            0.584 (0.0000)         0.369 (0.0019)         0.350 (0.0111)
Local employment opportunities:
  Prop. LF unemployed              5.402 (0.0000)         4.152 (0.0000)         4.077 (0.0000)
  Prop. male underemployment       1.567 (0.0000)         1.411 (0.0000)         1.564 (0.0000)
Family structure:
  Prop. female-headed families     4.039 (0.0000)         3.912 (0.0000)         4.188 (0.0000)
Intercept                         -1.568 (0.0026)        -2.578 (0.0000)        -1.446 (0.1065) ns
Large-scale spatial effect:
  Long                             0.030 (0.0000)         0.016 (0.0125)         0.024 (0.0539) ns
  Lat                             -0.067 (0.0000)        -0.009 (0.5547) ns     -0.078 (0.0060)
  Long²                            0.000 (0.8271) ns     -0.000 (0.2715) ns     -0.000 (0.4405) ns
  Lat²                            -0.000 (0.0628) ns     -0.001 (0.0001)        -0.000 (0.3379) ns
  Long × Lat                      -0.001 (0.0000)        -0.001 (0.0000)        -0.001 (0.0000)
AIC                               789.12                 426.93                 336.95
Moran's I (residuals)               0.250                  0.073                 -0.032

We now continue by asking whether unresolved spatial heterogeneity (and the consequent heteroskedasticity in the model) might better be addressed by exploring the role of a key missing variable. What might be a good candidate for such a variable? Possibly a poverty legacy variable?

Models with Legacy Variable (1980 child poverty)

Variable                          OLS + Legacy           OLS + Legacy + Trend   OLS + Leg + Trend + Sp Error
(Several control variables)       *** (***)              *** (***)              *** (***)
Industrial composition:
  Extractive                       1.581 (0.0000)         1.710 (0.0000)         1.555 (0.0000)
  Non-durable manufacturing       -0.011 (0.9391) ns      0.118 (0.4043) ns      0.022 (0.8861) ns
  Miscellaneous services           0.066 (0.7216) ns     -0.153 (0.3819) ns     -0.196 (0.2923) ns
  Professional services            0.426 (0.0004)         0.449 (0.0001)         0.286 (0.0260)
Local employment opportunities:
  Prop. LF unemployed              3.917 (0.0000)         4.232 (0.0000)         3.297 (0.0000)
  Prop. male underemployment       0.886 (0.0000)         1.067 (0.0000)         1.218 (0.0000)
Family structure:
  Prop. female-headed families     3.373 (0.0000)         3.530 (0.0000)         3.802 (0.0000)
Intercept:
Legacy:
  Poverty rate in 1980             2.070 (0.0000)         1.898 (0.0000)         1.704 (0.0000)
Large-scale trend                                         *** (***)              *** (***)
Spatial error parameter                                                          0.426 (0.0000)
Diagnostics:
  Heteroskedasticity (B-P)        440.12                 688.38                 951.40
  Log likelihood                 -301.05                -121.92                   9.26
  AIC                             628.10                 279.85                  17.49
  Moran's I (residuals)             0.270                  0.185                 -0.022

Concluding Comments
- Attention to spatial effects gives us better parameter estimates than OLS and a model with much improved diagnostics
- Combining corrections for both spatial heterogeneity and spatial dependence appears to give us a more realistic model
- Addition of the temporally lagged dependent variable ("legacy" effect) gives us a strong additional predictor variable without seriously attenuating the contributions of other IVs
- W.r.t. the determinants of child poverty, attributes of place retain a strong causal influence even after controlling for family-level attributes and the temporally lagged dependent variable

Questions about the example?

Geographically Weighted Regression Again, our child poverty data

Log Odds of Proportion of Children in Poverty: 2000

Same Model (after Friedman and Lichter, 1998)
- Industrial Structure: % Extr. ind.; % Non dur. ind.; % Misc. svcs.; % Prof. svcs.
- Emp. Oppy. Structure: % Unemp.; % Males under emp.
- Family Structure: % of families w/ children headed by females
- Control Variables: % Hispanic; % Black; % HS or less; % Emp. in county
- Dependent variable: Log Odds % Children in Poverty

Now we'll take a look at a particular form of spatial heterogeneity: heterogeneity in relationships. Enter GWR

Geographically Weighted Regression: the analysis of spatially varying relationships A. Stewart Fotheringham, Chris Brunsdon, and Martin Charlton (John Wiley & Sons, Ltd, 2002)

GWR Software: GWR 3.x
Software developed by Stewart Fotheringham, Martin Charlton, and Chris Brunsdon, University of Newcastle upon Tyne (at the time)

Standard Regression

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i$$

where:

$$\hat{\beta} = (X^T X)^{-1} X^T y$$

Contrast Geographically Weighted Regression (GWR)

$$y_i = \beta_{0i} + \beta_{1i} x_{1i} + \beta_{2i} x_{2i} + \dots + \beta_{ki} x_{ki} + \varepsilon_i$$

where:

$$\hat{\beta}_i = (X^T W_i X)^{-1} X^T W_i y$$
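The GWR estimator above is weighted least squares repeated at every regression point, with weights that decay with distance from that point. Below is a minimal sketch for one predictor plus an intercept, which is the two-parameter case of the weighted formula. The Gaussian kernel, the fixed bandwidth, the function names, and the toy data are all assumptions for illustration; GWR 3.x offers its own kernel and bandwidth options, which are not reproduced here.

```python
import math


def gaussian_weights(coords, target, bandwidth):
    """Kernel weights w = exp(-0.5 * (d / h)^2), d = distance to the target."""
    return [math.exp(-0.5 * (math.hypot(cx - target[0], cy - target[1])
                             / bandwidth) ** 2)
            for cx, cy in coords]


def local_fit(x, y, w):
    """Weighted least squares for y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar)
               for wi, xi, yi in zip(w, x, y))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Hypothetical data: the x-y relationship is y = x in the west, y = 3x in the east.
coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0, 3.0, 6.0, 9.0]

for point in [(1, 0), (11, 0)]:
    w = gaussian_weights(coords, point, bandwidth=2.0)
    print(point, local_fit(x, y, w))   # local slope near 1 in the west, near 3 in the east
```

A single global regression on these six points would report one compromise slope; the local fits recover the two regimes, which is the sense in which GWR describes heterogeneity in relationships.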

Schematic Showing Kernel Estimation

Basic Operation of GWR 3.x
Data + Ideas -> GWR Model Editor (creates a control file) -> Run program -> Listing file, Output file -> Map results (ArcGIS, other GIS files)

Here's how it works. Log Odds of Proportion of Children in Poverty: 2000

Regression Intercept Global = -4.984

T-value: Regression Intercept

Variable: Prop. H.S. Educ. or Less. Global: 2.6. Local: -4.1 to +6.5

Variable: Proportion Labor Force Unemployed. Global: 4.2. Local: -9.0 to +14.2

Variable: Prop. Extractive Industries. Global: 2.5. Local: -14.6 to +25.5

Variable: Prop. Female Headed Households. Global: 3.9. Local: -4.3 to +8.4 ns

Local R² Values. Global R²: 0.713

Strengths of GWR
- Potentially important tool when exploring spatial data. Nothing is the same everywhere
- Helps you to understand spatial heterogeneity in your data
- Provides a better understanding of the global model
- Serves as a device for possibly identifying specification errors in the global model (e.g., important interaction effects). Thus, GWR and local analysis become a potential model-building procedure
- Permits logistic regression and Poisson regression as well as normal regression

Okay. Pretty slick! Everyone likes this, right? Well, No

Some faults regarding GWR
- Very rote. Regression is undertaken at each regression point without much care regarding regression assumptions
- Data with spatially autocorrelated residuals are fit with OLS rather than a spatial regression model (MLE, IV/GMM)
- The results are not easily amenable to tabular presentation; they often make great maps, however

But wait, there's more
- Usual rule in regression: n observations, k parameters; n >> k
- GWR fits n × k parameters with only n observations
- Some unusual results arising from GWR are not yet fully understood. For example, it has been observed and commented upon that GWR can sometimes generate high (negative) correlations among estimated parameters

[Figure: two scatterplots]
Variables: Proportion Black (PROBLCK) vs. Proportion Nondurable Manufacturing (PRONMAN); Pearson's r = .368
Parameters: Proportion Black vs. Proportion Nondurable Manufacturing (axes PARM_2 and PARM_9); Pearson's r = -.729

David Wheeler & Michael Tiefelsdorf, Journal of Geographical Systems 7 (2005): 161-187
- GWR appears to amplify regression parameter correlations present in the global model
- One local parameter pattern can be used to predict another local parameter pattern
- A misspecified kernel function increases coefficient correlation
- Perhaps local spatial autocorrelation among the residuals influences the parameter correlation

Readings for today
- Fotheringham, A. Stewart, and Chris Brunsdon. 1999. Local Forms of Spatial Analysis. Geographical Analysis 31(4):340-358.
- Wheeler, David, and Michael Tiefelsdorf. 2005. Multicollinearity and Correlation among Local Regression Coefficients in Geographically Weighted Regression. Journal of Geographical Systems 7:161-187.
- Messner, Steven F., and Luc Anselin. 2004. Spatial Analyses of Homicide with Areal Data. Pp. 127-144 in Michael F. Goodchild and Donald G. Janelle (eds.) Spatially Integrated Social Science. (Oxford: Oxford University Press).
- O'Loughlin, John, Colin Flint, and Luc Anselin. 1994. The Geography of the Nazi Vote: Context, Confession, and Class in the Reichstag Election of 1930. Annals of the Association of American Geographers 84(3):351-380.

Questions?

Afternoon Lab GWR hands on (in R)