Running head: GEOGRAPHICALLY WEIGHTED REGRESSION 1. Geographically Weighted Regression. Chelsey-Ann Cu GEOB 479 L2A. University of British Columbia


Running head: GEOGRAPHICALLY WEIGHTED REGRESSION 1 Geographically Weighted Regression Chelsey-Ann Cu 32482135 GEOB 479 L2A University of British Columbia Dr. Brian Klinkenberg 9 February 2018

I. What is Geographically Weighted Regression?

Regression encompasses a wide range of methods for modelling relationships between independent and dependent variables. In a regression analysis, one would normally assume that the relationship being modelled is uniform throughout the study area. This entails that the dependent variable ($y$) is linearly related to the independent variable ($x_1$) through the estimated parameters ($\beta_0$ and $\beta_1$): $y = \beta_0 + \beta_1 x_1 + \varepsilon$ (Charlton, Fotheringham, & Brunsdon, 2009). The parameters are estimated so as to minimize the sum of the squared residuals over the observations. This model is usually fitted using an Ordinary Least Squares (OLS) procedure. OLS is a global method of linear regression that generates predictions or models a dependent variable in terms of its relationship to a set of explanatory variables. The OLS estimator is $\beta = (X^{T}X)^{-1}X^{T}y$, where $\beta$ is the vector of estimated parameters, $X$ is the design matrix, which contains the values of the independent variables and a column of 1s, $y$ is the vector of observed values, and $(X^{T}X)^{-1}$ is the inverse of the variance-covariance matrix (Charlton, Fotheringham, & Brunsdon, 2009, p. 1). A goodness-of-fit measure is usually used with this method to check the ability of a model to replicate the observed values. It is expressed as $R^2$, a value from 0 to 1 that measures the proportion of variation in the observed values which is accounted for by the model.

This basic regression model assumes that observations are independent of one another. Unfortunately, this idealized situation does not always hold. With geography, there is an assumption that phenomena will vary across space. Tobler expressed this as "Everything is related to everything else, but near things are more related than distant things" (Tobler, 1970, p. 236). This means that both variables and residuals in a given model could show spatial dependence. As a result, parameters estimated with linear regression models could be inefficient and biased if there is a spatial aspect involved. Another issue with linear regression analysis is that it assumes the relationship being modelled is the same throughout the study area. In fact, the processes that create the relationships of interest may change across space.

Geographically Weighted Regression (GWR) is a method for exploratory spatial data analysis (Charlton, 2009). GWR takes the location ($u$) in the study area as an input and attaches spatial coordinates to the data points. This creates a new formula: $y(u) = \beta_0(u) + \beta_1(u) x_1 + \varepsilon(u)$ (Charlton, Fotheringham, & Brunsdon, 2009). Here, the locations are assumed to be where the data were collected, which allows a separate estimate of the parameters to be made and mapped at each location. Because GWR takes local geography into account, observation points that are near the location being estimated have a greater effect (weight) on the estimate than points that are further away. The weights are computed with a weighting scheme (also known as a kernel) that is governed by a bandwidth; as the bandwidth grows larger, the GWR model approaches the OLS model (Charlton, Fotheringham, & Brunsdon, 2009).

GWR is a useful analysis tool that can be used for a variety of purposes. It can be used to better understand phenomena by looking at how changes in one variable relate to changes in another. Moreover, it can produce a consistent and accurate model which is useful in predicting phenomena. However, it is important to remember that with GWR, all independent variables that affect the dependent variable must be taken into account; otherwise the model produced in the analysis will be an inaccurate representation. The accuracy of a GWR model can be assessed by looking at the residuals, as in an OLS model.
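The contrast between the global OLS fit and the local GWR fit can be illustrated with a minimal Python sketch. It is not the procedure used in the case study. It assumes a single explanatory variable, a Gaussian kernel (one common choice of weighting scheme), and made-up data in which the true slope is 1 in the west and 3 in the east. The function names (gaussian_weights, weighted_fit, gwr_fit, ols_fit) are hypothetical.

```python
import math

def gaussian_weights(coords, u, bandwidth):
    """Weight each observation by its distance to location u (Gaussian kernel, assumed)."""
    return [math.exp(-0.5 * (math.dist(c, u) / bandwidth) ** 2) for c in coords]

def weighted_fit(x, y, w):
    """Weighted least squares for y = b0 + b1*x; returns (b0, b1)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def ols_fit(x, y):
    """Global fit: every observation gets the same weight."""
    return weighted_fit(x, y, [1.0] * len(x))

def gwr_fit(coords, x, y, u, bandwidth):
    """Local fit at location u: nearby observations count for more."""
    return weighted_fit(x, y, gaussian_weights(coords, u, bandwidth))

# Made-up data: ten points on a west-east line; the slope is 1 in the west, 3 in the east.
coords = [(float(i), 0.0) for i in range(10)]
x = [1.0, 2.0, 3.0, 4.0, 5.0] * 2
y = [xi for xi in x[:5]] + [3.0 * xi for xi in x[5:]]

b0_ols, b1_ols = ols_fit(x, y)                                  # one slope for the whole area
west = gwr_fit(coords, x, y, u=(0.0, 0.0), bandwidth=1.0)       # local estimate in the west
east = gwr_fit(coords, x, y, u=(9.0, 0.0), bandwidth=1.0)       # local estimate in the east
wide = gwr_fit(coords, x, y, u=(0.0, 0.0), bandwidth=1e6)       # very large bandwidth
print(b1_ols, west[1], east[1], wide[1])
```

With a small bandwidth the two local slopes differ from each other and from the single global slope. With a very large bandwidth all weights are nearly equal, so the local estimate falls back to the OLS estimate.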

II. Case Study: Vancouver Children's Social Scores

This study takes into account various neighbourhood factors and their relationship to the development of children's social skills in Vancouver. The analysis first used exploratory regression to determine the variables that affect social skills the most. The results identified gender, language and income as the best variables to use. Then an Ordinary Least Squares analysis was performed to determine the level of correlation in the absence of spatial information (see Figure 1). Finally, a Geographically Weighted Regression analysis was done to determine the spatial correlation between children's social skills and the given parameters (see Figure 2).

Figure 1. Regression Analysis for Social Scores of Children in Vancouver, BC. This map shows the standardized residuals of the OLS analysis results. The adjusted $R^2$ value is 0.374.

Figure 2. Regression Analysis for Social Scores of Children in Vancouver, BC. This map shows the results of a GWR analysis based on the local $R^2$ value to show the accuracy of the predicted model. The most accurate points are in red, while the least accurate are in green.

The results of both analyses were then compiled, and the absolute value of the difference in predicted values was used to create a map of the difference in regression between the two types of analysis (see Figure 3). The figure shows a large discrepancy between OLS and GWR results on the east side of Vancouver, where the majority of the red points are. On the west side of the map, there is less discrepancy, showing that the two models fit well with one another in this particular instance. This demonstrates that the OLS model and the GWR model are fairly similar so long as there is no relevant spatial factor in the analysis.

Figure 3. The Difference in Regression Analysis. This was computed by taking the absolute value of the difference between the estimated/predicted values of the GWR and OLS that were calculated for each child.
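The quantity mapped in Figure 3 is simple arithmetic, sketched here in Python. The predicted scores are hypothetical values chosen for illustration, not the study's data, and prediction_gap is an assumed helper name.

```python
def prediction_gap(ols_pred, gwr_pred):
    """Absolute difference between the two models' predictions, per observation."""
    if len(ols_pred) != len(gwr_pred):
        raise ValueError("both models must predict the same observations")
    return [abs(g - o) for o, g in zip(ols_pred, gwr_pred)]

# Hypothetical predicted social scores for four children (not the study's data).
ols_pred = [50.0, 52.0, 47.5, 61.0]
gwr_pred = [50.5, 45.0, 47.0, 68.5]

gaps = prediction_gap(ols_pred, gwr_pred)
# Index of the child for whom the spatial and aspatial models disagree most.
largest = max(range(len(gaps)), key=gaps.__getitem__)
print(gaps, largest)
```

A small gap means the local model adds little beyond the global one at that point. A large gap flags a location where the relationship appears to depend on where the child lives.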

In order to look further into the spatial contexts and the given parameters, a grouping analysis was performed. This took natural clusters in the data and clumped them together to form 4 groups. They are as follows:

Table 1. Grouping Analysis for Enumeration Areas.

Group | Income ($) | Neighbourhood families with > 4 members (%) | Children spend > 30 hours in childcare (%) | Neighbourhood families that are single parents (%) | Neighbourhood immigrants that have spent < 5 years in Canada (%)
Group 1 | High | High | Average | Low | Average
Group 2 | Low | High | High | High | Average
Group 3 | Average | Low | Low | Low | Low
Group 4 | Average | Average | Average | Average | High

By grouping similar neighbourhood characteristics, the variation in social skills in terms of another factor can be more easily pinpointed. Figure 4 maps these groups, which are built from the 5 neighbourhood factors in Table 1, as context for their impacts on the social scores of children. Here, one can see that the east side of Vancouver is predominantly Group 2, while the downtown area is dominated by Group 3 and the west side is a mixture of Groups 1 and 3. For the most part, children living in the red areas (Group 2) experience larger negative impacts on social scores from the different variables (income, gender, and language scores). This can be seen in the coming analysis.
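The text does not say which algorithm the grouping analysis used. As an illustration only, the sketch below assumes plain k-means, one common way to find natural clusters, applied to hypothetical standardized neighbourhood values (two factors, three groups). The kmeans function and the data are assumptions, not the study's method or data.

```python
import math

def kmeans(rows, k, iterations=20):
    """Plain k-means: returns a group label (0..k-1) for each row."""
    centres = [list(r) for r in rows[:k]]  # deterministic start: the first k rows
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assign each row to its nearest centre.
        labels = [min(range(k), key=lambda j: math.dist(r, centres[j])) for r in rows]
        # Move each centre to the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical standardized values per enumeration area: (income, single-parent share).
areas = [
    (1.2, -1.0),   # high income, few single parents
    (-1.1, 1.3),   # low income, many single parents
    (0.1, 0.0),    # average on both
    (1.0, -0.9),
    (-1.0, 1.1),
    (0.0, 0.2),
]
groups = kmeans(areas, k=3)
print(groups)
```

Areas with similar values end up with the same label, which is the sense in which Table 1 describes each group as High, Average or Low on a factor.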

Figure 4. Grouping Analysis for Enumeration Areas. This map corresponds with the categories laid out in Table 1. The analysis shows which neighbourhoods have similar characteristics in terms of income, family size, childcare, single parents, and immigrants. Areas that are similar in these variables are grouped together.

For the purposes of this analysis, the points showing the differences between OLS and GWR overlay the data to show the difference between spatial and aspatial analysis (Figures 5, 7, 9). The maps showing the results of the GWR by local $R^2$ values overlay the same layers to show the accuracy of the results (Figures 6, 8, 10). Each of the maps looks at a particular parameter under the assumption that the others do not affect children's social scores.

The following map indicates that there is a connection between income and a child's social skills (see Figure 5). For this parameter, every $1,000 increase in income is associated with either a decrease of 2 points in children's social scores (red areas) or an increase of 2 points (green areas). Income has a lower impact on a child's social skills on the east side of Vancouver than on the west side. This could be because, with higher income, children can afford additional opportunities to socialize, such as extra-curricular activities (Richard & Dodge, 1982). In contrast, children living in lower income areas are not afforded the same opportunities due to limits in factors such as family funds or parental time constraints. In certain areas, like the east side of Vancouver, where income varies greatly among neighbours, social scores decrease substantially with a change in income.

Figure 5. Relationship between Income and Social Scores with the difference in regression analysis. For every thousand dollars of increase in income, there is the above represented change in social scores.

The local $R^2$ values from the GWR analysis show that the results around the Downtown, Kitsilano and East Vancouver areas are the most accurate (see Figure 6). This is because $R^2$ is the proportion of variation in the dependent variable (social skills) that is explained by the model. It is measured from 0 to 1, with a number closer to 1 indicating a more accurate fit. In Figure 6, the accurate points are shown in red, while green points have higher uncertainty.
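The definition of $R^2$ given above can be written out as a short Python sketch. The unweighted form is the standard one. The optional weights are an assumption about how a local version could be formed by reusing the kernel weights around one location; the text does not describe how the GWR software computes its local $R^2$. The data are made up.

```python
def r_squared(y, y_hat, w=None):
    """Share of the (optionally weighted) variation in y accounted for by y_hat."""
    w = [1.0] * len(y) if w is None else w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    ss_res = sum(wi * (yi - fi) ** 2 for wi, yi, fi in zip(w, y, y_hat))  # unexplained
    ss_tot = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))          # total
    return 1.0 - ss_res / ss_tot

# Made-up observed scores and predictions that fit the first two points only.
y = [2.0, 4.0, 6.0, 8.0]
y_hat = [2.0, 4.0, 5.0, 5.0]

global_r2 = r_squared(y, y_hat)
# Assumed local weights: the first two points are near the location, the rest are far.
local_r2 = r_squared(y, y_hat, w=[1.0, 1.0, 0.1, 0.1])
print(global_r2, local_r2)
```

In this example the local value is higher than the global one because the poorly predicted points are far away and receive little weight, which is how one location on the map can appear more accurate than another.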

Figure 6. Relationship between Income and Social Scores with the GWR by local $R^2$.

Furthermore, the analysis showed that there is a relation between the social scores and language scores of children (see Figure 7). On the east side of Vancouver, language scores seem to have a higher impact on social skills than they do on the west side. This could be attributed to the social factors of the grouped areas that were previously discussed. The map below shows that there are clusters of areas where slightly higher language scores correlate with higher social scores, especially in mid-eastern Vancouver. These are areas with lower incomes, larger families and more single-parent families (see Figure 4). Also, in mid-west Vancouver, there is a large area where language has a low effect on children's social skills. This could be attributed to less accurate data in general (see Figure 8), or to a shift in population dynamics, such as a community that has more social interaction with neighbours. It is also important to note that there is overall very little variation in the effect of language scores on social scores, as seen in the legend scale on the maps (see Figures 7, 8).

Figure 7. Relationship between Language Scores and Social Scores with difference in regression analysis. For every one unit of increase in language score, there is the above represented change in social scores.

Figure 8. Relationship between Language Scores and Social Scores with the GWR by local $R^2$.

Finally, the relationship between social scores and gender was also analyzed (see Figure 9). On the map, red indicates areas where being female negatively impacts social scores, decreasing them by up to 17 points. Green indicates areas where being female positively impacts social scores, by 2 points. This map can be read next to the income map (Figure 5) as showing counteracting variables. For instance, in areas where being female increases social scores, a thousand-dollar difference in income largely decreases social scores. This can also be compared to language scores: in the same areas where being female positively impacts social scores, high language scores also increase social scores. One can assume that these areas with extreme values have data points that are more susceptible to variability in parameters.

Figure 9. Relationship between Gender and Social Scores with difference in regression analysis. This map represents the change in scores if the child is female versus male and the corresponding shift in social score due to gender.

Figure 10. Relationship between Gender and Social Scores with the GWR by local $R^2$.

III. Other Applications of GWR

GWR is a valuable analysis tool in many situations other than the example highlighted in the case study. Since most variables that are analyzed for correlation represent things in the real world which are inherently spatial, GWR adds the missing spatial factor to the analysis to enhance accuracy. It is reasonable to assume that any set of variables with real-world locations is affected by location, and that location should therefore be included as a parameter in the analysis. Below are several examples of applications of GWR which further highlight the benefits of implementing the model.

A. Housing Costs

One example of GWR at work is real estate pricing. The limitation of linear models in this field is that they often over- or underestimate the asking prices in some neighbourhoods (Legg & Bowe, 2009). In order to improve model accuracy, GWR is used to eliminate some of the residual errors. Legg and Bowe (2009) conducted a study that took 93 homes listed on the market and applied both a linear regression analysis and GWR based on location and square footage. With the GWR model, the measured $R^2$ was higher, indicating an increase in accuracy. Their results showed that lot value coefficients indicate that as lots are located nearer the urban core and farther from the rural townships, lot square footage price increases. In contrast, coefficients suggest that the larger the house, the less it contributes to the listing price (Legg & Bowe, 2009, p. 45).

B. Crime

Other phenomena, such as the distribution of crime across cities, can also be analyzed with a GWR model. Cameron et al. (2016) look at the relationship between the location of alcohol outlets and the location of violent crimes in New Zealand. Here, they used data on police-attended violent incidents at the census area level. Their results confirmed that more violent events occur closer to establishments that sell liquor. Through a spatial analysis they were able to identify that certain places, such as bars or night clubs, are associated with an increase in violence annually. Their findings are important because they show areas where intervention is most crucial to minimizing alcohol-related violence.

C. Social Justice

Further applications of GWR can be seen in differential access to resources. Tsiko (2016) explores the spatial variation of factors that affect women's access to land in Africa. The study found that less educated or HIV-positive women are more likely than educated women to be given access to own land. Also, an increase in population density negatively affected women's ability to access family land. These factors were investigated further to try to identify how they affected landownership.

IV. Conclusion

As identified by the case study and the further examples given, Geographically Weighted Regression is a useful spatial analysis method that can be used in a variety of contexts. It is important to account for spatial variability in analyzing and explaining phenomena, given that patterns vary with geography. In other words, the nearness of an object oftentimes affects how and what results will occur.

References

Cameron, M. P., et al. (2016). Alcohol Outlet Density and Violence: A Geographically Weighted Regression Approach. Drug and Alcohol Review, 35(3), 280-288. doi:10.1111/dar.12295

Charlton, M. (2009). GWR Explained. Retrieved from Geographically Weighted Modelling: http://gwr.maynoothuniversity.ie/what-is-gwr/

Charlton, M., Fotheringham, S., & Brunsdon, C. (2009). Geographically weighted regression. White paper. National Centre for Geocomputation. National University of Ireland Maynooth.

Legg, R. J., & Bowe, T. (2009). Applying geographically weighted regression to a real estate problem, an example from Marquette, Michigan. ArcUser, 44-45.

Richard, B. A., & Dodge, K. A. (1982). Social maladjustment and problem solving in school-aged children. Journal of Consulting and Clinical Psychology, 50(2), 226.

Tobler, W. R. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46(2), 234-240.

Tsiko, R. (2016). Geographically Weighted Regression of Determinants Affecting Women's Access to Land in Africa. Geosciences, 6(1), 16. doi:10.3390/geosciences6010016