Geographically Weighted Analysis - Review and Prospect
Chris Brunsdon, February 8
The Basics. GWPCA. Conclusion


Geographically Weighted Regression (GWR)
In a nutshell: a local statistical technique to analyse spatial variations in relationships.
Global averages of spatial data are not always helpful:
- climate data
- health data
This problem can also occur with global statistics that measure relationships in spatial data:
- regression
- correlation

Spatial Non-Stationarity
Spatial non-stationarity occurs when a relationship (or pattern) that applies in one region does not apply in another.
Global models are statements about processes or patterns which
- are assumed to be stationary and as such are location independent
- are assumed to apply in all locations
Local models are spatial disaggregations of global models, the results of which are location-specific.
The template of the model is the same, but the specifics may alter: the model may always be a linear regression model with certain variables, but the coefficients alter geographically.
The above is essentially a description of GWR.

An Example of Spatial Non-Stationarity - 1
[Figure: maps of % With Degrees and % Foreign Born, Georgia State (Source: US Census 1990)]

An Example of Spatial Non-Stationarity - 2
[Figure: % With Degrees plotted against % Foreign Born, in panels by Northing]

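The GWR Model
Standard Global Regression:
$$y_i = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$
Geographically Weighted Regression:
$$y_i = \alpha(u_i, v_i) + \beta_1(u_i, v_i)\, x_{i1} + \beta_2(u_i, v_i)\, x_{i2} + \dots + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$
where $(u_i, v_i)$ is the location of observation $i$.
Note: the coefficients in GWR are now functions, not variables.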

A Calibration Algorithm - 1
For a given point (u, v):
- Consider a window of radius h
- Calibrate regression just using data falling in that window.
Scanning the window across the study area gives a surface of regression parameters...

A Calibration Algorithm - 2
To avoid sudden jumps when scanning the window, we use a weighted regression calibration.
Hence Geographically Weighted Regression.

A Calibration Algorithm - 3
Weighting details: a possible scheme
$$w_i(u, v) = \begin{cases} \left\{ 1 - \dfrac{d^2}{h^2} \right\}^2 & \text{if } d < h \\ 0 & \text{otherwise} \end{cases}$$
where $d = \sqrt{(u - u_i)^2 + (v - v_i)^2}$.
h is called the bandwidth.
Other weighting functions could be used, e.g. Gaussian.
Results are generally more sensitive to h than to the choice of weighting function.
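The weighting scheme can be sketched in a few lines of Python. This is a minimal illustration rather than the author's code: it assumes the compact kernel is the bisquare form, uses a Gaussian as the alternative, and the function names (`bisquare_weights`, `gaussian_weights`, `distances`) are made up for this example.

```python
import numpy as np

def distances(u, v, coords):
    """Euclidean distance from (u, v) to each row of an (n, 2) coordinate array."""
    coords = np.asarray(coords, dtype=float)
    return np.hypot(coords[:, 0] - u, coords[:, 1] - v)

def bisquare_weights(d, h):
    """Assumed bisquare kernel: (1 - (d/h)^2)^2 inside the window, 0 outside."""
    d = np.asarray(d, dtype=float)
    return np.where(d < h, (1.0 - (d / h) ** 2) ** 2, 0.0)

def gaussian_weights(d, h):
    """Gaussian alternative: exp(-0.5 * (d/h)^2), which never reaches exactly zero."""
    d = np.asarray(d, dtype=float)
    return np.exp(-0.5 * (d / h) ** 2)

# Weights for four observations at increasing distance from the focal point
d = distances(0.0, 0.0, [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [2.0, 0.0]])
w_bisquare = bisquare_weights(d, h=1.0)
w_gaussian = gaussian_weights(d, h=1.0)
```

The bisquare kernel drops observations at or beyond distance h entirely, while the Gaussian only down-weights them.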

A Calibration Algorithm - 4
Calibration Formula
$$\hat{\beta}(u, v) = \left\{ X^T W(u, v) X \right\}^{-1} X^T W(u, v)\, y$$
where $W(u, v) = \mathrm{Diagonal}\left(w_1(u, v), w_2(u, v), \dots, w_n(u, v)\right)$.
X is the matrix of independent variables.
y is the vector of the dependent variable.
cf. Global Regression Formula
$$\hat{\beta} = \left\{ X^T X \right\}^{-1} X^T y$$
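As a rough sketch of the calibration formula, the local coefficients at a point can be obtained by solving the weighted normal equations. The function name `gwr_fit_at`, the bisquare kernel and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def gwr_fit_at(u, v, coords, x, y, h):
    """Local coefficients at (u, v): solve (X'WX) beta = X'Wy.

    Uses an assumed bisquare kernel with bandwidth h.
    """
    d = np.hypot(coords[:, 0] - u, coords[:, 1] - v)
    w = np.where(d < h, (1.0 - (d / h) ** 2) ** 2, 0.0)
    X = np.column_stack([np.ones(len(y)), x])   # intercept column plus predictors
    XtW = X.T * w                               # same as X.T @ diag(w)
    return np.linalg.solve(XtW @ X, XtW @ y)

# Synthetic data: the slope drifts from about 1 in the west to about 3 in the east
rng = np.random.default_rng(0)
coords = rng.uniform(0, 1, size=(400, 2))
x = rng.normal(size=400)
y = 0.5 + (1.0 + 2.0 * coords[:, 0]) * x + rng.normal(scale=0.1, size=400)

west = gwr_fit_at(0.1, 0.5, coords, x, y, h=0.3)   # [intercept, slope] near the west edge
east = gwr_fit_at(0.9, 0.5, coords, x, y, h=0.3)   # [intercept, slope] near the east edge
```

With a very large h every weight approaches 1 and the local fit collapses to the global regression formula.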

A Calibration Algorithm - 5
An extension of the method:
- Use a different bandwidth in different places, i.e. h(u, v)
- Typically, bandwidth at (u, v) is distance to kth nearest neighbour.
- Useful if density of observations is variable, e.g. urban/rural.
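A minimal sketch of the kth-nearest-neighbour bandwidth, assuming plain Euclidean distances; `adaptive_bandwidth` is a hypothetical helper name.

```python
import numpy as np

def adaptive_bandwidth(u, v, coords, k):
    """Distance from (u, v) to its k-th nearest observation (k counted from 1)."""
    coords = np.asarray(coords, dtype=float)
    d = np.hypot(coords[:, 0] - u, coords[:, 1] - v)
    return float(np.partition(d, k - 1)[k - 1])

# Observations cluster near the origin and thin out to the east
coords = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [10.0, 0.0]])
h_dense = adaptive_bandwidth(0.0, 0.0, coords, k=2)    # small window where data are dense
h_sparse = adaptive_bandwidth(10.0, 0.0, coords, k=2)  # wider window where data are sparse
```

Note that with a kernel that is zero at d = h, the kth neighbour itself receives zero weight, so k is often chosen slightly larger than the number of points meant to contribute.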

Over- and Under-Fitting
[Figure: panels illustrating over- and under-fitting]

Cross-Validation and h
[Figure: Cross Validation Example, RMS Prediction Error against Bandwidth h (km)]
Cross-validation: for a range of h-values, predict a holdback sample from a model fitted to the remaining data, then find the h-value that is the best predictor.
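One way to sketch this procedure is leave-one-out cross-validation, where the holdback sample is a single observation at a time. That choice, the Gaussian kernel, the candidate bandwidths and the synthetic data are all assumptions for illustration.

```python
import numpy as np

def local_fit(u, v, coords, X, y, h):
    """Weighted least squares at (u, v) with an assumed Gaussian kernel."""
    d = np.hypot(coords[:, 0] - u, coords[:, 1] - v)
    w = np.exp(-0.5 * (d / h) ** 2)
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ y)

def cv_rmse(coords, x, y, h):
    """Leave-one-out RMS prediction error for bandwidth h."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                 # hold back observation i
        beta = local_fit(coords[i, 0], coords[i, 1],
                         coords[keep], X[keep], y[keep], h)
        errors[i] = y[i] - X[i] @ beta           # error predicting the held-back value
    return float(np.sqrt(np.mean(errors ** 2)))

# Synthetic data with a slope that varies from west to east
rng = np.random.default_rng(1)
coords = rng.uniform(0, 1, size=(200, 2))
x = rng.normal(size=200)
y = 0.5 + (1.0 + 2.0 * coords[:, 0]) * x + rng.normal(scale=0.1, size=200)

candidates = [0.1, 0.2, 0.4, 0.8, 5.0]           # 5.0 is effectively a global model here
scores = [cv_rmse(coords, x, y, h) for h in candidates]
best_h = candidates[int(np.argmin(scores))]
```

Plotting `scores` against `candidates` gives the kind of curve shown in the cross-validation figure.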

Results of GWR - Slope
[Figure: map of the slope coefficient]

Results of GWR - Intercept
[Figure: map of the intercept coefficient]

Results of GWR - Slope
[Figure: map of the slope coefficient, using grid sampling]

Further Issues
- Local standard error: reliability of estimates
- Significance testing: Monte Carlo approaches
  $H_0$: No spatial variation in coefficients
  $H_1$: GWR assumption is true
  Tests whether the GWR assumption is valid. Could also be used to justify global models on occasions.
- Multivariate GWR
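A Monte Carlo test of this kind can be sketched as a randomisation test: shuffle the locations among the observations, recompute the local coefficients, and compare the observed spatial variability of a coefficient with its distribution under shuffling. The slides do not give the procedure in detail, so the test statistic (variance of the local slopes), the Gaussian kernel and all names below are assumptions.

```python
import numpy as np

def local_slopes(coords, x, y, h):
    """Local slope estimate at every observation location (assumed Gaussian kernel)."""
    X = np.column_stack([np.ones(len(y)), x])
    slopes = np.empty(len(y))
    for i, (u, v) in enumerate(coords):
        d = np.hypot(coords[:, 0] - u, coords[:, 1] - v)
        w = np.exp(-0.5 * (d / h) ** 2)
        XtW = X.T * w
        slopes[i] = np.linalg.solve(XtW @ X, XtW @ y)[1]
    return slopes

def monte_carlo_p(coords, x, y, h, n_sim=99, seed=0):
    """p-value for H0 (no spatial variation in the slope) by shuffling locations."""
    rng = np.random.default_rng(seed)
    observed = np.var(local_slopes(coords, x, y, h))
    simulated = np.array([
        np.var(local_slopes(coords[rng.permutation(len(y))], x, y, h))
        for _ in range(n_sim)
    ])
    return (1 + np.sum(simulated >= observed)) / (n_sim + 1)

# Synthetic data with a genuinely non-stationary slope
rng = np.random.default_rng(2)
coords = rng.uniform(0, 1, size=(100, 2))
x = rng.normal(size=100)
y = 0.5 + (1.0 + 2.0 * coords[:, 0]) * x + rng.normal(scale=0.1, size=100)

p_value = monte_carlo_p(coords, x, y, h=0.2)
```

A small p-value suggests the coefficient varies over space more than chance alone would produce; a large one is the situation in which a global model may be justified.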

Results of GWR - Multivariate
[Figure: maps for % Foreign Born and % Elderly (65)]
Note: adding extra variables can alter interpretation due to correlation between predictors, just like in other kinds of regression...

PCA
Multivariate relationships:
- Issues with collinearity
- Treating variables symmetrically
- Multivariate outliers
Principal Components:
- Identifies collinearity: based on the Σ-matrix of several variables
- Can identify multivariate outliers

PCA as Model - 1
[Figure: Comparison: OLS Regression]

PCA as Model - 2
[Figure: Comparison: PCA]

Interpretation
- PCA is a kind of line fitting algorithm
- Based on perpendicular distances.
- Error to be minimised is based on fitting both x and y, not just y.
- Residuals are the perpendicular distances mentioned above
- The equation of the best fit line gives the loadings on each variable for PC1
- The projections of the points onto the line correspond to the scores for PC1
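The line-fitting view can be made concrete with a small sketch. It assumes the first principal component is taken from the SVD of the centred data, and the names `first_pc_line`, `pca_slope` and `ols_slope` are illustrative.

```python
import numpy as np

def first_pc_line(x, y):
    """Fit the first principal component line through (x, y).

    Returns the centre, the unit loading vector, the PC1 scores and the
    perpendicular distance of each point from the line.
    """
    Z = np.column_stack([x, y])
    centre = Z.mean(axis=0)
    Zc = Z - centre
    _, _, vt = np.linalg.svd(Zc, full_matrices=False)
    loadings = vt[0]
    if loadings[0] < 0:                          # fix the sign: first loading positive
        loadings = -loadings
    scores = Zc @ loadings                       # position of each projection on the line
    residuals = Zc - np.outer(scores, loadings)  # perpendicular offsets from the line
    return centre, loadings, scores, np.linalg.norm(residuals, axis=1)

rng = np.random.default_rng(3)
x = rng.normal(size=300)
y = x + rng.normal(scale=0.5, size=300)

centre, loadings, scores, perp = first_pc_line(x, y)
pca_slope = loadings[1] / loadings[0]
ols_slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # minimises vertical errors only
```

Because OLS minimises only the vertical errors, its slope is pulled towards zero relative to the perpendicular-distance fit when y is noisy.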

The Multivariate Situation
Same idea still applies, BUT for the first k components in m dimensions:
- Find the k-dimensional subspace minimising perpendicular distances in m-space. The equations of the subspace give the loadings in terms of the input variables.
- Residuals are the perpendicular distances mentioned above
- Coordinates projected onto subspace: ordination plot
Type 1 multidimensional outliers (found in above plot):
- They fit the subspace model, but are unusual in the subspace
Type 2 multidimensional outliers (big residuals):
- Don't even fit the subspace model!
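The two kinds of outlier can be separated with two distances: one measured inside the fitted subspace and one measured perpendicular to it. The sketch below is an assumption-laden illustration; the standardised score distance and the names `pca_outlier_distances`, `score_dist` and `orth_dist` are not from the slides.

```python
import numpy as np

def pca_outlier_distances(Z, k):
    """Distances for the first k components of an (n, m) data matrix.

    score_dist: standardised distance within the k-dimensional subspace.
    orth_dist:  perpendicular distance from the subspace (the residual).
    """
    Zc = Z - Z.mean(axis=0)
    _, s, vt = np.linalg.svd(Zc, full_matrices=False)
    V = vt[:k].T                                  # (m, k) loadings
    scores = Zc @ V                               # coordinates in the subspace
    score_sd = s[:k] / np.sqrt(len(Z) - 1)        # standard deviation of each score
    score_dist = np.sqrt(np.sum((scores / score_sd) ** 2, axis=1))
    orth_dist = np.linalg.norm(Zc - scores @ V.T, axis=1)
    return score_dist, orth_dist

# 200 points lying close to a plane in 3-D, plus two planted outliers
rng = np.random.default_rng(4)
base = rng.normal(size=(200, 3)) * np.array([1.0, 1.0, 0.05])
in_plane_outlier = [8.0, 0.0, 0.0]    # fits the plane but is extreme within it
off_plane_outlier = [0.0, 0.0, 3.0]   # does not fit the plane at all
Z = np.vstack([base, in_plane_outlier, off_plane_outlier])

score_dist, orth_dist = pca_outlier_distances(Z, k=2)
```

Here row 200 stands out on `score_dist` and row 201 on `orth_dist`, matching the two outlier types described above.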

Geographically Weighted PCA
Might want to find outliers locally.
A local outlier:
- Is not an unusual observation in the data set as a whole
- But is unlike its geographical neighbours
Can use locally weighted PCA to investigate local multivariate outliers.
How to do it:
- Apply geographical weighting windows to the perpendicular distance minimising algorithm
- Thus PCA loadings are viewed as functions of (u, v), like regression coefficients in GWR.
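A locally weighted PCA at one location can be sketched as the eigendecomposition of a geographically weighted covariance matrix, which is one standard way of minimising weighted perpendicular distances. The bisquare kernel, the function name `gw_pca_at` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def gw_pca_at(u, v, coords, Z, h):
    """Local eigenvalues and loadings at (u, v) from a weighted covariance matrix.

    Assumes a bisquare kernel and at least a few observations inside the window.
    """
    d = np.hypot(coords[:, 0] - u, coords[:, 1] - v)
    w = np.where(d < h, (1.0 - (d / h) ** 2) ** 2, 0.0)
    w = w / w.sum()                                 # normalise weights to sum to 1
    Zc = Z - w @ Z                                  # centre on the local weighted mean
    cov = (Zc * w[:, None]).T @ Zc                  # local weighted covariance
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]                 # largest variance first
    evals, evecs = evals[order], evecs[:, order]
    evecs = evecs * np.where(evecs[0] < 0, -1.0, 1.0)   # first loading positive
    return evals, evecs

# Two variables: positively related in the west, negatively related in the east
rng = np.random.default_rng(5)
coords = rng.uniform(0, 1, size=(500, 2))
z1 = rng.normal(size=500)
sign = np.where(coords[:, 0] < 0.5, 1.0, -1.0)
z2 = sign * z1 + rng.normal(scale=0.3, size=500)
Z = np.column_stack([z1, z2])

evals_w, load_w = gw_pca_at(0.15, 0.5, coords, Z, h=0.3)
evals_e, load_e = gw_pca_at(0.85, 0.5, coords, Z, h=0.3)
```

The first column of `load_w` has two positive entries while the first column of `load_e` has a positive and a negative entry, so the loadings change with location just as GWR coefficients do.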

An Example
Baltic Soil Survey (Reimann et al, 2000).
- Agricultural soils were collected from 10 European countries over a large region surrounding the Baltic Sea
- 768 sites
- Here we concentrate on topsoil samples
- Trace compounds: SiO2, TiO2, Al2O3, Fe2O3, MnO, MgO, CaO, Na2O, K2O and P2O5. % by weight calculated.
- Data has 768 rows and 10 columns. Also, the x and y coordinates of each site are recorded.
- Data standardised to z-scores.
Key task: identify local patterns and outliers...

Survey Locations

Choosing h: Bandwidth Selection
- Much like the procedure in GWR
- Measure perpendicular distances in a holdback sample
- Choose h to minimise this
[Figure: CV Score against Bandwidth]

PCA Results - Highest Loadings

PCA Results - Sternutation Plot of Loading

PCA Results - Sternutation Plot of Loading

Unique sign patterns in geographically weighted loadings

SiO2  TiO2  Al2O3  Fe2O3  MnO  MgO  CaO  Na2O  K2O  P2O5
 +     -     -      -      -    -    -    -     -    -
 +     +     +      +      +    +    +    +     +    -
 +     +     +      +      +    +    +    +     +    +

Relatively small number of patterns exhibited: only 3 out of a possible 2^10 = 1024.
NB. First sign always positive by convention.
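Counting the distinct sign patterns is straightforward once the local loadings are stacked one row per site. The sketch below assumes such an array is available; `sign_patterns` and the toy loadings are hypothetical.

```python
from collections import Counter

import numpy as np

def sign_patterns(loadings):
    """Count sign patterns across rows of an (n_sites, n_variables) loading array.

    Each row is flipped, if needed, so that its first loading is positive.
    """
    L = np.asarray(loadings, dtype=float)
    L = L * np.where(L[:, :1] < 0, -1.0, 1.0)
    return Counter("".join("+" if value >= 0 else "-" for value in row) for row in L)

# Toy loadings for three sites and three variables
toy = [[0.5, -0.2, 0.1],
       [-0.5, 0.2, -0.1],    # same pattern as the first row after flipping
       [0.3, 0.3, 0.3]]
patterns = sign_patterns(toy)
n_possible = 2 ** 10         # possible sign patterns for ten variables
```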

PCA Results - Sign Patterns

Hunting of Type 2 - High Perpendicular Distances
[Figure: sites with high perpendicular distances, labelled 50, 65, 0 and 59]

Hunting of Type 2 - Parallel Coordinates
[Figure: parallel coordinates plots for Sites 0, 59, 50 and 65 over SiO2, TiO2, Al2O3, Fe2O3, MnO, MgO, CaO, Na2O, K2O and P2O5]

Conclusions
GWR/GWPCA as data miner:
- Certainly a useful rôle
- But PCA can also be seen as a model
- Possibly data mining / data modelling not such a clear cut distinction?
Further extensions...

Conclusions
The End, with thanks to Martin Charlton for his helpful comments and discussion.