Modeling Spatial Relationships Using Regression Analysis. Lauren M. Scott, PhD Lauren Rosenshein Bennett, MS


Workshop Overview
Answering "why?" questions
Introduce regression analysis
- What it is and why you care
- Example applications
Work through solving a real problem using regression
- Finding a properly specified OLS model
- The 6 things you must check!
- Exploring regional variation using GWR
- "What if?" scenarios
- Identifying opportunities for tailored remediation
Modeling variations in per capita Medicare spending

What is regression analysis?
Model, examine, and explore spatial relationships
Better understand the factors behind observed spatial patterns
Predict outcomes based on that understanding
Methods: Ordinary Least Squares (OLS) and Geographically Weighted Regression (GWR)
[Figure: plot of observed values and predicted values, axes from 0 to 100]

What's the big deal?
Pattern analysis (without regression):
- Are there places where people persistently die young?
- Where are test scores consistently high?
- Where are 911 emergency call hot spots?
Regression analysis:
- Why are people persistently dying young?
- What factors contribute to consistently high test scores?
- Which variables effectively predict 911 emergency call volumes?

Regression analysis terms and concepts
Dependent variable (Y): What you are trying to model or predict (e.g., residential burglary).
Explanatory variables (X): Variables you believe cause or explain the dependent variable (e.g., income, vandalism, number of households).
Coefficients (β): Values, computed by the regression tool, reflecting the relationship between explanatory variables and the dependent variable.
Residuals (ε): The portion of the dependent variable that isn't explained by the model; the model under- and over-predictions.
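As a minimal sketch of how these four terms fit together, the example below fits a one-variable OLS model in plain Python. The data values and the names `fit_simple_ols`, `vandalism`, and `burglary` are invented for illustration and do not come from the workshop data.

```python
def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # coefficient (beta) for X
    intercept = mean_y - slope * mean_x    # value of Y when X is zero
    # Residuals: observed minus predicted (the under- and over-predictions).
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return intercept, slope, residuals

# Hypothetical data: explanatory variable X and dependent variable Y.
vandalism = [1.0, 2.0, 3.0, 4.0, 5.0]
burglary = [2.1, 3.9, 6.2, 7.8, 10.0]

intercept, slope, residuals = fit_simple_ols(vandalism, burglary)
print("intercept:", round(intercept, 4))
print("slope:", round(slope, 4))
print("residuals:", [round(e, 4) for e in residuals])
```

With an intercept in the model, the residuals sum to zero, so positive values mark under-predictions and negative values mark over-predictions.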

Regression model coefficients
Coefficient sign (+/-) and magnitude reflect each explanatory variable's relationship to the dependent variable.

Variable      Coefficient
Intercept     1.625506
INCOME        -0.000030
VANDALISM     0.133712
HOUSEHOLDS    0.012425
LOWER CITY    0.136569

In the tool output, an asterisk (*) indicates the explanatory variable is statistically significant.
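To illustrate how the coefficients combine into a prediction, the sketch below plugs the values from the slide into the linear equation Y = intercept + Σ(coefficient × X). The input values for the example area are hypothetical, and treating LOWER CITY as a 0/1 indicator is an assumption; the slide does not state the units of any variable.

```python
# Coefficients as listed on the slide.
intercept = 1.625506
coefficients = {
    "INCOME": -0.000030,
    "VANDALISM": 0.133712,
    "HOUSEHOLDS": 0.012425,
    "LOWER CITY": 0.136569,
}

# Hypothetical explanatory values for one area (not from the workshop data).
# LOWER CITY is assumed here to be a 0/1 indicator variable.
area = {"INCOME": 40000, "VANDALISM": 10, "HOUSEHOLDS": 200, "LOWER CITY": 1}

# Each term shows how much that variable moves the prediction.
contributions = {name: coefficients[name] * area[name] for name in coefficients}
predicted = intercept + sum(contributions.values())

for name, value in contributions.items():
    print(f"{name:11s} contributes {value:+.6f}")
print("predicted value:", round(predicted, 6))
```

The negative INCOME coefficient lowers the prediction as income rises, while the three positive coefficients raise it.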

Why use regression?
Explore correlations
- Does higher Medicare spending translate to better health or to better quality health care?
Predict unknown values
- How many claims for heat-related illnesses can we expect given current weather forecasts?
Understand key factors
- Why are people dying young in South Dakota?

Explore Correlations
Crime Analysis
- Are you more likely to be robbed in a rich or poor neighborhood?
- Do we see more crime in neighborhoods where there is more vandalism ("Broken Window" theory) or less vandalism?

Predictive Modeling
Property values
- If we have data for recent home sales, can we predict home values?
Demand for services
- If we know the number of 911 calls is a function of population, education, and jobs, can we use population projections to predict future demand?

Understand Key Factors
Natural Resource Management
- What are the most important habitat characteristics for an endangered animal?
Education
- What are the key factors contributing to consistently high test scores?
Public Health
- What variables most effectively explain high rates of childhood obesity?

OLS Regression Lauren Rosenshein Bennett

OLS analysis
Why is Medicare spending so high in the South? Are people just sicker there?
The HCC index explained 66% of the variation in the per capita costs dependent variable: Adjusted R-Squared [2]: 0.656
However, significant spatial autocorrelation among model residuals indicates important explanatory variables are missing from the model.

Build a multivariate regression model
- Explore variable relationships using the scatterplot matrix
- Consult theory and field experts
- Look for spatial variables
- Run OLS (this is iterative)
- Use Exploratory Regression

Our best OLS model
Per capita Medicare costs as a function of:
- Dehydration Admissions
- Hospital Beds
- Imaging Events
- Evaluation and Management Costs
- Distance to Houston
This model tells 86% of the story, and the over- and under-predictions aren't clustered! But are we done?

Check OLS results
1 Coefficients have the expected sign.
2 No redundancy among explanatory variables.
3 Coefficients are statistically significant.
4 Residuals are normally distributed.
5 Residuals are not spatially autocorrelated.
6 Strong Adjusted R-Squared value.

1. Coefficient signs
Coefficients should have the expected signs.
Imaging Events per 1000 people + PQI10: Dehydration Admissions + Distance to Houston

2. Coefficient significance
Look for statistically significant explanatory variables.
Coefficients for all of the explanatory variables are statistically significant at the 0.05 level.
Since the Koenker test is statistically significant, we can only trust the robust coefficient probabilities.
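To show why robust probabilities can differ from the standard ones, here is a small sketch comparing the classical standard error of a slope with a heteroscedasticity-robust one. It uses White's HC0 estimator for a one-variable model, which is an assumption chosen for illustration; the slides do not say how the OLS tool computes its robust values. The data and the function name `slope_standard_errors` are hypothetical.

```python
import math

def slope_standard_errors(x, y):
    """Return (slope, classical SE, robust HC0 SE) for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dev = [xi - mean_x for xi in x]
    sxx = sum(d * d for d in dev)
    slope = sum(d * (yi - mean_y) for d, yi in zip(dev, y)) / sxx
    intercept = mean_y - slope * mean_x
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Classical SE assumes every residual has the same variance.
    sigma2 = sum(e * e for e in resid) / (n - 2)
    classical_se = math.sqrt(sigma2 / sxx)
    # HC0 weights each squared residual by its squared deviation in x,
    # so it does not rely on the constant-variance assumption.
    robust_se = math.sqrt(
        sum(d * d * e * e for d, e in zip(dev, resid)) / (sxx * sxx)
    )
    return slope, classical_se, robust_se

# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, classical_se, robust_se = slope_standard_errors(x, y)
print("slope:", round(slope, 4))
print("t (classical):", round(slope / classical_se, 2))
print("t (robust):", round(slope / robust_se, 2))
```

The two t-statistics generally differ, and when the Koenker test is significant it is the robust version that should drive the significance decision.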

Check for variable redundancy
Multicollinearity:
- Term used to describe the phenomenon when two or more of the variables in your model are highly correlated.
Variance inflation factor (VIF):
- Detects the severity of multicollinearity.
- Explanatory variables with a VIF greater than 7.5 should be removed one by one.

3. Multicollinearity
Find a set of explanatory variables that have low VIF values.
In a strong model, each explanatory variable gets at a different aspect of the dependent variable.
As a rule of thumb, VIF values should be smaller than 7.5.
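The VIF for an explanatory variable can be computed as 1 / (1 - R²), where R² comes from regressing that variable on all of the other explanatory variables. The sketch below does this in plain Python with a small linear solver. All names (`solve_linear_system`, `r_squared`, `vif`) and the data are hypothetical; `home_value` is constructed to track `income` almost exactly so that both exceed the 7.5 threshold.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def r_squared(columns, y):
    """R-squared from regressing y on the given columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)]
           for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def vif(columns):
    """Variance inflation factor for each explanatory variable."""
    result = []
    for j, col in enumerate(columns):
        others = columns[:j] + columns[j + 1:]
        result.append(1.0 / (1.0 - r_squared(others, col)))
    return result

# Hypothetical explanatory variables; home_value nearly duplicates income.
income = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
home_value = [1.1, 1.9, 2.9, 4.1, 5.1, 5.9, 6.9, 8.1]
distance = [6.0, 4.0, 6.0, 4.0, 4.0, 6.0, 4.0, 6.0]

vifs = vif([income, home_value, distance])
for name, value in zip(["income", "home_value", "distance"], vifs):
    flag = "redundant" if value > 7.5 else "ok"
    print(f"{name:10s} VIF = {value:8.2f}  {flag}")
```

Removing one variable of the redundant pair and recomputing, as the slide suggests, would bring the remaining VIF values back toward 1.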

Checking for model bias
The residuals of a good model should be normally distributed with a mean of zero.
The Jarque-Bera test checks model bias.

4. Model bias
When the Jarque-Bera test is statistically significant:
- The model is biased
- Results are not reliable
- Sometimes this indicates a key variable is missing from the model
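For reference, the Jarque-Bera statistic combines the skewness S and kurtosis K of the residuals as JB = n/6 × (S² + (K − 3)²/4) and is compared against a chi-square distribution with 2 degrees of freedom. The sketch below is a plain-Python version of that textbook formula; the residual values are hypothetical, and the chi-square approximation is rough for samples this small.

```python
import math

def jarque_bera(residuals):
    """Return the Jarque-Bera statistic and its chi-square (2 df) p-value."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    statistic = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # For 2 degrees of freedom the chi-square survival function is exp(-x/2).
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value

# Hypothetical residuals: one symmetric set, one with a single large
# under-prediction that skews the distribution.
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
skewed = [-1.0] * 19 + [19.0]

jb_sym, p_sym = jarque_bera(symmetric)
jb_skew, p_skew = jarque_bera(skewed)
print("symmetric:", round(jb_sym, 4), round(p_sym, 4))
print("skewed:   ", round(jb_skew, 2), p_skew)
```

A small p-value, as in the skewed case, is the statistically significant result the slide warns about.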

5. Model performance
Compare models by looking for the lowest AIC value.
- As long as the dependent variable remains fixed, the AIC values for different OLS/GWR models are comparable.
Look for a model with a high Adjusted R-Squared value.
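The two performance measures can be sketched as follows. Adjusted R-Squared uses the standard formula 1 − (1 − R²)(n − 1)/(n − k − 1). The `aicc` function uses one common least-squares form of the corrected AIC, which is an assumption here: software packages count parameters differently, so only compare values produced by the same formula. The residual sums of squares and parameter counts are hypothetical.

```python
import math

def adjusted_r_squared(r_squared, n_obs, n_explanatory):
    """Penalize R-squared for the number of explanatory variables."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_explanatory - 1)

def aicc(rss, n_obs, n_params):
    """Corrected AIC for a least-squares fit (one common textbook form)."""
    aic = n_obs * math.log(rss / n_obs) + 2 * n_params
    return aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)

# Hypothetical comparison: model B adds three parameters but barely
# reduces the residual sum of squares, so its AICc is worse (higher).
model_a = aicc(rss=100.0, n_obs=50, n_params=3)
model_b = aicc(rss=98.0, n_obs=50, n_params=6)
print("AICc A:", round(model_a, 3))
print("AICc B:", round(model_b, 3))
print("Adjusted R-Squared:", round(adjusted_r_squared(0.9, 50, 5), 4))
```

Both measures reward fit and penalize complexity, which is why they are more useful for comparing models than plain R-Squared.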

6. Spatial Autocorrelation
Statistically significant clustering of under- and over-predictions.
Random spatial pattern of under- and over-predictions.
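A common measure of residual clustering is Global Moran's I. The sketch below computes only the index itself, using a hypothetical chain of six locations where each location neighbors the ones on either side; it omits the z-score and p-value that a full significance test would also report. The names `morans_i` and `chain_weights` are illustrative.

```python
def morans_i(values, weights):
    """Global Moran's I for values under a spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)   # total of all weights
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)

def chain_weights(n):
    """Hypothetical weights: locations on a line, adjacent ones are neighbors."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
            for i in range(n)]

w = chain_weights(6)
clustered = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]    # under-predictions grouped
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # neighbors always differ

i_clustered = morans_i(clustered, w)
i_alternating = morans_i(alternating, w)
print("clustered residuals:  ", round(i_clustered, 3))
print("alternating residuals:", round(i_alternating, 3))
```

Positive values indicate that similar residuals sit next to each other, which is the clustering a properly specified model should not show; values near zero are consistent with a random pattern.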

Online help is helpful!
The 6 checks:
- Coefficients have the expected sign.
- Coefficients are statistically significant.
- No redundancy among explanatory variables.
- Residuals are normally distributed.
- Residuals are not spatially autocorrelated.
- Strong Adjusted R-Squared (good model performance).

Also Helpful: Exploratory Regression

Are we done?
A statistically significant Koenker OLS diagnostic is often evidence that Geographically Weighted Regression (GWR) will improve model results.
GWR allows you to explore geographic variation, which can help you tailor effective remediation efforts.

Global vs. local regression models
OLS
- Global regression model
- One equation, calibrated using data from all features
- Relationships are fixed
GWR
- Local regression model
- One equation for every feature, calibrated using data from nearby features
- Relationships are allowed to vary across the study area
For each explanatory variable, GWR creates a coefficient surface showing you where relationships are strongest.
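The difference between one global equation and one equation per feature can be sketched with a weighted one-variable regression. In this hypothetical example the features lie along a line, the true slope is 1 in the western half and 3 in the eastern half, and a Gaussian kernel (an assumption; other kernels are possible) gives nearby features more weight. The names `weighted_fit` and `local_slopes` are illustrative.

```python
import math

def weighted_fit(x, y, w):
    """Weighted least squares for y = intercept + slope * x."""
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y)
              for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def local_slopes(coords, x, y, bandwidth):
    """Fit one equation per feature, weighting other features by distance."""
    slopes = []
    for c0 in coords:
        w = [math.exp(-0.5 * ((c - c0) / bandwidth) ** 2) for c in coords]
        slopes.append(weighted_fit(x, y, w)[1])
    return slopes

# Hypothetical features along a line; the relationship changes at coord 5.
coords = [float(c) for c in range(10)]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0]
y = [(1.0 if c < 5 else 3.0) * xi for c, xi in zip(coords, x)]

global_slope = weighted_fit(x, y, [1.0] * len(x))[1]  # equal weights = OLS
slopes = local_slopes(coords, x, y, bandwidth=1.0)
print("global slope:", round(global_slope, 3))
print("local slopes:", [round(s, 2) for s in slopes])
```

The single global slope averages over both regimes, while the list of local slopes plays the role of a coefficient surface and recovers the regional difference.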

Exploring Regional Variation Lauren Rosenshein Bennett, MS

Running GWR
GWR is a local spatial regression model
Modeled relationships are allowed to vary
GWR variables are the same as OLS, except:
- Do not include spatial regime (dummy) variables
- Do not include variables with little value variation

Defining local
GWR constructs an equation for each feature
Coefficients are estimated using nearby feature values
GWR requires a definition for "nearby"

Defining local
GWR requires a definition for "nearby"
Kernel type
- Fixed: Nearby is determined by a fixed distance band
- Adaptive: Nearby is determined by a fixed number of neighbors
Bandwidth method
- AIC or Cross Validation (CV): GWR will find the optimal distance or optimal number of neighbors
- Bandwidth parameter: User-provided distance or user-provided number of neighbors
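The two kernel types can be illustrated by how they select neighbors. This sketch only shows neighbor selection, not the distance weighting a full GWR kernel applies; the point coordinates and the names `fixed_neighbors` and `adaptive_neighbors` are hypothetical.

```python
import math

def fixed_neighbors(points, i, distance_band):
    """Fixed kernel: every other feature within the distance band."""
    return [j for j, p in enumerate(points)
            if j != i and math.dist(points[i], p) <= distance_band]

def adaptive_neighbors(points, i, k):
    """Adaptive kernel: the k nearest other features, however far away."""
    others = sorted((j for j in range(len(points)) if j != i),
                    key=lambda j: math.dist(points[i], points[j]))
    return others[:k]

# Hypothetical features: a tight cluster plus one isolated feature (index 4).
points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 11)]

print("fixed, clustered feature:   ", fixed_neighbors(points, 0, 1.5))
print("fixed, isolated feature:    ", fixed_neighbors(points, 4, 1.5))
print("adaptive, isolated feature: ", adaptive_neighbors(points, 4, 2))
```

With a fixed distance band the isolated feature has no neighbors to calibrate its equation, whereas the adaptive kernel always finds the requested number, which is why adaptive kernels are often preferred when feature density varies.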

Interpreting GWR results
Compare GWR R2 and AICc values to OLS R2 and AICc values
- The better model has a lower AICc and a higher R2.
Model predictions, residuals, standard errors, coefficients, and condition numbers are written to the output feature class.

Mapped Output
Residual maps show model under- and over-predictions. They shouldn't be clustered.
Coefficient maps show how modeled relationships vary across the study area.

Mapped Output
Maps of Local R2 values show where the model is performing best.
To see variation in model stability, apply graduated color rendering to Condition Numbers.

GWR prediction
Calibrate the GWR model using known values for the dependent variable and all of the explanatory variables.
Provide a feature class of prediction locations containing values for all of the explanatory variables.
GWR will create an output feature class with the computed predictions.
[Figure legend: Observed, Modeled, Predicted]

Resources for learning more
www.esriurl.com/spatialstats
- Short videos
- Articles and blogs
- Online documentation
- Supplementary model and script tools
- Hot spot, Regression, and ModelBuilder tutorials
resources.arcgis.com
QUESTIONS?
LBennett@Esri.com
LScott@Esri.com