Geographically Weighted Regression LECTURE 2 : Introduction to GWR II

Stewart.Fotheringham@nuim.ie http://ncg.nuim.ie/gwr

A Simulation Experiment
Model: $Y_i = \alpha_i + \beta_{1i} X_{1i} + \beta_{2i} X_{2i}$
Data on $X_1$ and $X_2$ drawn randomly for 2,500 locations on a 50 x 50 matrix such that $r(X_1, X_2)$ is controlled. Results shown to be independent of $r(X_1, X_2)$.
Experiment 1 (parameters spatially invariant):
$\alpha_i = 10$ for all i
$\beta_{1i} = 3$ for all i
$\beta_{2i} = -5$ for all i
$Y_i$ obtained from the model above.
Data used to calibrate the model by global regression and by GWR.

Results (Experiment 1)
Global: Adj. $R^2$ = 1.0; AIC = -59,390; K = 3
$\alpha$ (est.) = 10; $\beta_1$ (est.) = 3; $\beta_2$ (est.) = -5
GWR: Adj. $R^2$ = 1.0; AIC = -59,386; K = 6.5; N = 2,434
$\alpha_i$ (est.) = 10 for all i
$\beta_{1i}$ (est.) = 3 for all i
$\beta_{2i}$ (est.) = -5 for all i
Conclusion: GWR does NOT appear to suggest any spurious nonstationarity when relationships are constant.
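A minimal sketch of Experiment 1 follows, to show why both calibrations recover the same values. The slides do not say how $X_1$ and $X_2$ were drawn, so independent standard normal draws are assumed here. The Gaussian distance-decay kernel and its bandwidth of 5 grid units are also assumptions made for illustration, not the settings used in the lecture. Because Y contains no error term, any positively weighted fit returns the true parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
side = 50

# Grid coordinates for the 2,500 locations.
gi, gj = np.meshgrid(np.arange(side), np.arange(side), indexing="ij")
coords = np.column_stack([gi.ravel(), gj.ravel()]).astype(float)
n = coords.shape[0]

# Assumption: X1 and X2 are independent standard normal draws.
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
y = 10.0 + 3.0 * x1 - 5.0 * x2  # Experiment 1: constant parameters, no error term

X = np.column_stack([np.ones(n), x1, x2])


def weighted_fit(X, y, w):
    """Weighted least squares, done by scaling each row by sqrt(weight)."""
    s = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * s[:, None], y * s, rcond=None)
    return coef


def local_coef(point, bandwidth=5.0):
    """Local fit at `point` with Gaussian distance-decay weights (assumed kernel)."""
    d = np.linalg.norm(coords - point, axis=1)
    return weighted_fit(X, y, np.exp(-0.5 * (d / bandwidth) ** 2))


global_coef = weighted_fit(X, y, np.ones(n))      # ordinary global regression
corner_coef = local_coef(np.array([0.0, 0.0]))    # local fit at a grid corner
centre_coef = local_coef(np.array([25.0, 25.0]))  # local fit at the grid centre

print(global_coef, corner_coef, centre_coef)
```

The global fit and both local fits agree, which is the behaviour the conclusion above describes: local weighting alone does not create apparent variation.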

Experiment 2 (parameters spatially variant):
$0 \le i \le 50$, $0 \le j \le 50$
$\alpha_i = 0 + 0.2i + 0.2j$ (range 0 to 20)
$\beta_{1i} = -5 + 0.1i + 0.1j$ (range -5 to 5)
$\beta_{2i} = -5 + 0.2i + 0.2j$ (range -5 to 15)
$Y_i$ obtained in the same way.
Data used to calibrate the model by global regression and by GWR.

Results (Experiment 2)
Global: Adj. $R^2$ = 0.04; AIC = 17,046; K = 3
$\alpha$ (est.) = 10.26; $\beta_1$ (est.) = -0.1; $\beta_2$ (est.) = 5.28
These are close to the averages of the local estimates (10; 0; 5).
GWR: Adj. $R^2$ = 0.997; AIC = 2,218; K = 167; N = 129
$\alpha_i$ (est.) range = 2 to 18.6
$\beta_{1i}$ (est.) range = -4.3 to 4.7
$\beta_{2i}$ (est.) range = -3.9 to 13.6
Conclusion: GWR identifies spatial nonstationarity in relationships; the global model fails completely.
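Experiment 2 can be sketched in the same spirit. Several choices below are assumptions rather than details from the slides: the 50 grid positions per axis are spread evenly over [0, 50], the regressors are independent standard normal draws, and the local fit uses bisquare weights over k = 129 nearest neighbours, reading the reported N as a neighbour count. The fit statistics will therefore not match the slide; the point is only that the global estimates sit near the surface averages while local fits track the local values.

```python
import numpy as np

rng = np.random.default_rng(1)
side = 50

# Assumption: 50 positions per axis, evenly spaced over [0, 50].
axis = np.linspace(0.0, 50.0, side)
gi, gj = np.meshgrid(axis, axis, indexing="ij")
i, j = gi.ravel(), gj.ravel()
coords = np.column_stack([i, j])
n = i.size

# True parameter surfaces, as defined for Experiment 2.
alpha = 0.0 + 0.2 * i + 0.2 * j
beta1 = -5.0 + 0.1 * i + 0.1 * j
beta2 = -5.0 + 0.2 * i + 0.2 * j

# Assumption: independent standard normal regressors, no error term.
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
y = alpha + beta1 * x1 + beta2 * x2
X = np.column_stack([np.ones(n), x1, x2])


def weighted_fit(w):
    """Weighted least squares, done by scaling each row by sqrt(weight)."""
    s = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * s[:, None], y * s, rcond=None)
    return coef


def local_fit(point, k=129):
    """Local fit with bisquare weights over the k nearest neighbours of `point`."""
    d = np.linalg.norm(coords - point, axis=1)
    h = np.sort(d)[k - 1]  # distance to the k-th nearest observation
    w = np.where(d < h, (1.0 - (d / h) ** 2) ** 2, 0.0)
    return weighted_fit(w)


global_coef = weighted_fit(np.ones(n))         # one set of estimates for the whole grid
low_coef = local_fit(np.array([10.0, 10.0]))   # true values there: 4, -3, -1
high_coef = local_fit(np.array([40.0, 40.0]))  # true values there: 16, 3, 11

print(global_coef, low_coef, high_coef)
```

Under these assumptions the global estimates land near (10, 0, 5), the averages of the three surfaces, while the two local fits differ from each other in the direction the true surfaces imply.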

[Figure: parameter surfaces with legend ranges α(i) 0 to 20; β1(i) -5 to 5; β2(i) -5 to 15]

An Empirical Example - House Prices in London
1990 sales price data for 12,493 houses in London (excludes houses sold below market value), along with various attributes of each property and a postcode, so locations down to 100 m can be obtained via the Central Postcode Directory
Neighbourhood data obtained for enumeration districts (via postcode-to-ED look-up table)

Locations of house sales in data set

To what extent are differences in average house prices a function of differences in the intrinsic value associated with different areas and to what extent are they due to different mixes of properties? To answer this, we need regression techniques to account for variations in housing attributes so that we can derive a comparable value per sq.m.

Hedonic Price Modelling
Basic premise: $P_i = f[S(i), N(i)]$, with S(i) the structural attributes of property i and N(i) its neighbourhood attributes. Lancaster (1966) J. Political Economy
Overviews (very popular technique):
Meen and Andrew (1998) Modelling Regional House Prices: A Review of the Literature. DETR
Orford (1999) Valuing the Built Environment: GIS and House Price Analysis. Ashgate: Aldershot
Issues: Almost all applications are global, implying no coefficient variation over space, whereas several authors have argued that the assumption of uniform price coefficients is unrealistic even within a single metropolitan area.

Global Regression Parameter Estimates

Variable       Parameter Estimate    T value
Intercept                  58,900       23.3
FLRAREA                       697       49.3
FLRDETACH*                    205        7.5
FLRFLAT*                     -123       -5.6
FLRBNGLW*                     -87       -1.4
FLRTRRCD*                    -119       -6.2
BLDPWW1**                  -2,340       -3.9
BLDPOSTW**                 -2,786       -3.1
BLD60S**                   -5,177       -5.0
BLD70S**                   -2,421       -2.1
BLD80S**                    6,315        6.9
GARAGE                      5,956       10.6
CENHEAT                     7,777       12.4
BATH2+                     22,297       19.1
PROF                           72        3.0
UNEMPLOY                     -211       -5.5
ln(distcl)                -18,137      -30.1

$R^2$ = 0.60
* Excluded house type is Semi-detached
** Excluded age is Inter-war 1914-1939

Price / Square Metre of Various House Types Estimated from the Global Regression Results

House Type       Price / Sq. M. (£)
Detached                        902
Semi-Detached                   697
Bungalow                        610
Terraced                        578
Flat                            574
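The per-square-metre figures above follow from the global estimates by simple addition, on the reading that FLRAREA is the floor-area price for the excluded type (semi-detached) and that each FLR* term is an adjustment to it for another house type. The sketch below makes that arithmetic explicit; the dictionary names are illustrative only.

```python
# Coefficients copied from the global regression results.
FLRAREA = 697  # price per square metre for the excluded type (semi-detached)

# Floor-area adjustments by house type, relative to semi-detached.
type_offsets = {
    "Detached": 205,      # FLRDETACH
    "Semi-Detached": 0,   # excluded (reference) category
    "Bungalow": -87,      # FLRBNGLW
    "Terraced": -119,     # FLRTRRCD
    "Flat": -123,         # FLRFLAT
}

# Price per square metre = reference price + type-specific adjustment.
price_per_sqm = {house: FLRAREA + offset for house, offset in type_offsets.items()}

for house, price in price_per_sqm.items():
    print(f"{house:<14}{price:>6}")
```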

Price Comparisons of Equivalent Houses by Age Built

Period of housing   Pre-1914  1914-1939  1940-1959  1960-1969  1970-1979  1980-1989
Pre-1914                   -     -2,340        446      2,837         81     -8,655
1914-1939              2,340          -      2,786      5,177      2,421     -6,315
1940-1959               -446     -2,786          -      2,391       -365     -9,101
1960-1969             -2,837     -5,177     -2,391          -     -2,756    -11,492
1970-1979                -81     -2,421        365      2,756          -     -8,736
1980-1989              8,655      6,315      9,101     11,492      8,736          -
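Each entry in this comparison appears to be the age coefficient of the row period minus that of the column period, with the excluded inter-war period set to zero. The sketch below reproduces the entries on that reading; the function name is hypothetical.

```python
# Age-of-building coefficients from the global regression (inter-war is the reference).
age_coef = {
    "Pre-1914": -2340,   # BLDPWW1
    "1914-1939": 0,      # excluded (reference) period
    "1940-1959": -2786,  # BLDPOSTW
    "1960-1969": -5177,  # BLD60S
    "1970-1979": -2421,  # BLD70S
    "1980-1989": 6315,   # BLD80S
}


def price_difference(row_period, col_period):
    """Price of a house from row_period minus an equivalent one from col_period."""
    return age_coef[row_period] - age_coef[col_period]


periods = list(age_coef)
for row in periods:
    cells = ["-" if row == col else f"{price_difference(row, col):,}" for col in periods]
    print(f"{row:<10}" + "".join(f"{c:>9}" for c in cells))
```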

However, these are all global results, i.e. averages over the whole of London. Might there be differences across London in some of these relationships?

Using GWR
In this case an adaptive kernel is used - a bisquare function
Calibration yielded an optimal number of nearest neighbours = 931
Results presented in a series of parameter surfaces - those shown all have significant spatial variation
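As a rough illustration of an adaptive bisquare kernel, the sketch below uses a common form: the bandwidth h at each calibration point is the distance to its k-th nearest observation, and the weight is (1 - (d/h)^2)^2 for d < h and zero otherwise. The function name and the tiny example data are hypothetical, and the exact convention used in the GWR software may differ in detail.

```python
import numpy as np


def adaptive_bisquare_weights(coords, point, k):
    """Bisquare weights at `point`, with bandwidth set by the k-th nearest observation."""
    d = np.linalg.norm(coords - point, axis=1)
    h = np.partition(d, k - 1)[k - 1]  # k-th smallest distance = local bandwidth
    w = np.zeros_like(d)
    inside = d < h
    w[inside] = (1.0 - (d[inside] / h) ** 2) ** 2
    return w


# Five observations along a line; calibrate at the first one with k = 4.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [10.0, 0.0]])
weights = adaptive_bisquare_weights(coords, np.array([0.0, 0.0]), k=4)
print(weights)
```

Because the bandwidth is a neighbour count rather than a fixed distance, the kernel widens where observations are sparse and narrows where they are dense.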

Value of terraced property per m² (global estimate = 578)

Pre-1914 housing compared to inter-war (global estimate = -2,340)

1960s housing compared to inter-war (global estimate = -5,177)

10 Reasons Why You Might Want to Use GWR in Your Research

1. Conforms to different philosophical approaches
A post-modernist view: relationships are intrinsically different across space, e.g. differences in attitudes, preferences or different administrative, political or other contextual effects produce different responses to the same stimuli.
A positivist view: global statements can be made, but models are not properly specified to allow us to make them. GWR is a good indicator of when and in what way a global model is misspecified.
Can all contextual effects ever be modelled?

2. GWR is part of a growing trend towards local analysis (as opposed to traditional global types of analysis)
Local statistics are spatial disaggregations of global statistics

Global                       Local
similarities across space    differences across space
single-valued statistics     multi-valued statistics
non-mappable                 mappable
GIS unfriendly               GIS friendly
search for regularities      search for exceptions
aspatial                     spatial

3. Provides a useful link to GIS
GIS are very useful for the storage, manipulation and display of spatial data.
They are less useful for the analysis of spatial data.
There have been repeated calls for this to change.
In some cases the link between GIS and spatial analysis has been a step backwards.
One important way the situation can be improved is to develop better spatial analytical tools that can take advantage of the features of GIS.

An important catalyst for the better integration of GIS and spatial analysis has been the development of local spatial statistical techniques.
Chief among these has been the development of Geographically Weighted Regression (GWR).

4. GWR is widely applicable to almost any form of spatial data
Link between health and wealth
Modelling presence/absence of a disease
Examining spatial patterns of a disease (e.g. GW log odds ratio)
Educational attainment levels
Determinants of house prices
Determinants of critical load variations in lakes
Urban temperature variations
Economic performance indicators

5. GWR is a truly spatial technique
It uses locational information as well as attribute information.
It employs a spatial weighting function with the assumption that near places are more similar than distant ones.
The outputs are location-specific and geocoded, so they can easily be mapped and subjected to further spatial analysis.

6. Residuals from GWR are generally much lower and are not spatially autocorrelated
GWR models give much better fits to data, even accounting for increases in the number of parameters.
GWR residuals are generally not spatially autocorrelated, so reducing or removing the need for spatial regression models.

Residuals from Global Model

Residuals from GWR Model

7. User-friendly software for GWR (GWR 3.0) makes it simple
Currently about 7,000 lines of FORTRAN code
VB front-end to create a control file and run the program in Windows
Can, if you want, run the code directly under Unix with a control file

8. The concept of Geographical Weighting can be extended to many other statistics
In GWR, the weight around a given point is based on a kernel.
However, regression is not the only technique in which weighting can be applied.

Most descriptive statistics can be geographically weighted
Continuous
  Univariate: Mean, Standard Deviation, Skewness, Median, Interquartile Range
  Bi/Multivariate: Correlation Coefficient, Regression Coefficients
Discrete
  Proportions, Odds Ratios
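To make the idea concrete, here is a minimal sketch of a geographically weighted mean and standard deviation at one location. The Gaussian kernel, the fixed bandwidth and the function name are assumptions chosen for brevity; any other kernel could be substituted in the same way.

```python
import numpy as np


def gw_mean_sd(coords, values, point, bandwidth):
    """Geographically weighted mean and standard deviation of `values` at `point`."""
    d = np.linalg.norm(coords - point, axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)  # Gaussian kernel (assumed)
    w = w / w.sum()                          # normalise weights to sum to one
    mean = np.sum(w * values)
    sd = np.sqrt(np.sum(w * (values - mean) ** 2))
    return mean, sd


# Two well-separated clusters with different values.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                   [100.0, 0.0], [101.0, 0.0], [100.0, 1.0]])
values = np.array([0.0, 0.0, 0.0, 10.0, 10.0, 10.0])

west_mean, west_sd = gw_mean_sd(coords, values, np.array([0.0, 0.0]), bandwidth=5.0)
east_mean, east_sd = gw_mean_sd(coords, values, np.array([100.0, 0.0]), bandwidth=5.0)
print(west_mean, east_mean)
```

The ordinary mean of these values is 5 everywhere, whereas the weighted version returns a different value at each location, which is what makes the statistic mappable.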

9. Extensions of Geographical Weighting can be applied to other modelling techniques
GW Poisson regression
GW logistic models
GW kernel density estimation
GW principal components analysis

10. Finally, can use GWR as a Spatial Microscope
Instead of determining an optimal bandwidth during the calibration of a GWR model, a bandwidth can be input a priori.
A series of bandwidths can be selected and the resulting parameter surfaces examined at different levels of smoothing.
For example, consider a very simple model of house prices regressed on floor area for 570 houses in Tyne & Wear, North East England.
Surfaces of the local floorspace parameter are derived for bandwidths corresponding to 400, 350, 300, 250, 200, 150, 100 and 50 nearest neighbours (NN).
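The procedure can be sketched as a loop over preset bandwidths. The Tyne & Wear data are not reproduced here, so the locations, floor areas, prices and the west-to-east gradient in the floor-area slope below are entirely synthetic stand-ins, and the bisquare nearest-neighbour weighting is an assumed kernel choice.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 570

# Synthetic stand-in data (assumption): random locations on a 10 x 10 square.
coords = rng.uniform(0.0, 10.0, size=(n, 2))
floor_area = rng.uniform(50.0, 150.0, size=n)
true_slope = 0.5 + 0.1 * coords[:, 0]  # hypothetical west-to-east gradient
price = 20.0 + true_slope * floor_area + rng.normal(0.0, 5.0, size=n)
X = np.column_stack([np.ones(n), floor_area])


def local_slopes(k):
    """Local floor-area slope at every observation, using k nearest neighbours."""
    slopes = np.empty(n)
    for idx in range(n):
        d = np.linalg.norm(coords - coords[idx], axis=1)
        h = np.sort(d)[k - 1]  # adaptive bandwidth: distance to k-th nearest point
        w = np.where(d < h, (1.0 - (d / h) ** 2) ** 2, 0.0)  # bisquare weights
        s = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(X * s[:, None], price * s, rcond=None)
        slopes[idx] = coef[1]
    return slopes


bandwidths = [400, 350, 300, 250, 200, 150, 100, 50]
spread = {}
for k in bandwidths:
    slopes = local_slopes(k)
    spread[k] = float(slopes.max() - slopes.min())  # range of the local surface
    print(k, round(spread[k], 3))
```

Mapping `local_slopes(k)` for each k gives the sequence of surfaces; in this synthetic case the range of local estimates is wider at 50 NN than at 400 NN, reflecting less smoothing at small bandwidths.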

[Figures: local floorspace parameter surfaces for bandwidths of 400, 350, 300, 250, 200, 150 and 100 NN]

Summary
GWR appears to be a useful method to investigate spatial non-stationarity - simply assuming relationships are stationary over space is no longer tenable.
GWR can be likened to a spatial microscope - it allows us to see variations in relationships that were previously unobservable.
Can use GWR as a model diagnostic or to identify interesting locations for investigation.
Windows-based software makes it easy to apply to any spatial data set.
