A GEOSTATISTICAL APPROACH TO PREDICTING A PHYSICAL VARIABLE THROUGH A CONTINUOUS SURFACE
Katherine E. Williams, University of Denver
GEOG3010 Geographic Information Analysis
April 28, 2011
Overview
Data; regression model; Ordinary Least Squares; diagnostic statistics; Exploratory Regression; Geographically Weighted Regression; interpolation; measured vs. predicted.
Pre-Processing Recommendations
Make sure data is clean and organized. Use consistent projections. Work in a file geodatabase (if possible). Explore and understand your data: distributions, variables, etc. Read the statistical documentation. Be ready to analyze outcomes; the software will not do everything for you.
Data
Soil Moisture Spatial Survey (SMSS): 117 points.
Dependent variable: Volumetric Water Content (VWC).
21 candidate explanatory variables.
Goal: create a continuous VWC surface that models changes induced by microtopographic features (smaller than 10 meters).
SMSS Study Area Map
SMSS Plan
1. Find the model, built from my explanatory variables, that best predicts VWC.
2. Create an interpolated surface for VWC.
Which method should I use? How do I find the available options? How do I choose between methods?
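Ordinary Least Squares Regression
Describes the relationship between a dependent variable and one or more explanatory variables. It is a global model: a single regression equation is fitted to all observations.
$y = \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \varepsilon$
where $y$ is the dependent variable, $X_1, \dots, X_n$ are the explanatory variables, $\beta_1, \dots, \beta_n$ are their coefficients, and $\varepsilon$ is the random error.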
Ordinary Least Squares Regression
OLS chooses the coefficients that minimize the sum of squared differences between observed and predicted values. Residuals are the portion of the dependent variable that the regression model does not account for.
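The least-squares idea can be sketched by solving the normal equations $(X^\top X)\beta = X^\top y$ directly. This is a minimal, dependency-free illustration, not how ArcGIS computes OLS; it assumes an intercept column is prepended (the slide's equation omits $\beta_0$), and the helper names `solve` and `ols` are hypothetical.

```python
# Minimal OLS sketch: solve the normal equations (X^T X) b = X^T y.
# Assumption: an intercept column of 1s is prepended, so coefficients[0] is b0.


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(X, y):
    """Return (coefficients, residuals) for y = b0 + b1*x1 + ... + bk*xk."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]  # unexplained part of y
    return beta, residuals


# Made-up data generated from y = 2 + 0.5*x1 - 1*x2 (exact fit, zero residuals).
beta, residuals = ols([[1, 2], [2, 1], [3, 5], [4, 3]], [0.5, 2.0, -1.5, 1.0])
```

With exact data the recovered coefficients are approximately `[2.0, 0.5, -1.0]`; with real data the residuals are whatever the model cannot explain.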
OLS Pitfalls
Misspecification; missing variables; outliers (scatter plot); non-linear relationships (scatter plot); multicollinearity (VIF); normal distribution bias (Jarque-Bera); nonstationarity (Koenker BP); residual spatial autocorrelation (Moran's I).
Exploratory Regression Tool
Part of the Supplementary Spatial Statistics Toolbox: http://blogs.esri.com/dev/blogs/geoprocessing/archive/2010/10/11/supplementary-spatial-statistics-now-available-for-download_2100_.aspx
Iterates through candidate OLS models and identifies those that are properly specified. Diagnostic statistics assess model performance, variable redundancy, bias, and the distribution of the residuals.
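To see why an automated search helps, it is worth counting how many models there are. The sketch below enumerates every combination of one to three variables out of 21 candidates, assuming a cap of three variables per model; the variable names are hypothetical placeholders, and the real tool additionally fits and screens each model against the diagnostics below.

```python
from itertools import combinations
from math import comb


def candidate_models(variables, max_vars):
    """Yield every combination of 1..max_vars explanatory variables."""
    for k in range(1, max_vars + 1):
        yield from combinations(variables, k)


# Hypothetical placeholder names standing in for the 21 SMSS candidates.
names = [f"var{i}" for i in range(1, 22)]
n_models = sum(1 for _ in candidate_models(names, 3))

# Cross-check against the binomial coefficients: C(21,1) + C(21,2) + C(21,3).
assert n_models == sum(comb(len(names), k) for k in range(1, 4))
print(n_models)  # 1561 (21 + 210 + 1330)
```

Even with only three variables per model there are 1,561 OLS runs to check by hand, which is the work the tool automates.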
Exploratory Regression for SMSS
Passing model with 3 explanatory variables: slope, vegetation type, and percent cover.
Scatter Plot
A simple way to look at the data distribution. Patterns should be linear; if they are not, consider a transformation. For outliers, remove the data points and re-run the model.
[Figure: example scatter plots of a dependent variable against an explanatory variable, showing random, linear, and non-linear patterns]
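Coefficient of Determination (Adjusted R²)
The ability of the model to explain the variability in the dependent variable, adjusted for the number of explanatory variables. Values normally range from 0 to 1 (adjusted R² can dip below 0 for a very poor model); higher values explain more. Above 0.50 is considered passing for the Exploratory Regression tool.
SMSS value: 0.61 (final OLS).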
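The adjustment can be written as $\bar{R}^2 = 1 - (1 - R^2)\,\frac{n-1}{n-k-1}$ for $n$ observations and $k$ explanatory variables. The sketch below applies that textbook formula; the raw R² of 0.62 in the example is a hypothetical number chosen for illustration, not a reported SMSS result.

```python
def r_squared(observed, fitted):
    """Share of the variance in `observed` explained by `fitted`."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Penalize R^2 for k explanatory variables (intercept not counted) and n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical raw R^2 of 0.62 with the SMSS sample size (n = 117) and 3 variables.
print(round(adjusted_r_squared(0.62, 117, 3), 4))  # 0.6099
```

The penalty grows as $k$ approaches $n$, which is why adjusted R² is preferred when comparing models with different numbers of variables.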
Variance Inflation Factor (VIF)
Measures explanatory variable redundancy. High values indicate that explanatory variables are strongly correlated with one another (multicollinearity). Passing VIF value: < 7.5.
SMSS values: slope (1.23), vegetation type (1.46), and percent cover (1.21).
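VIF for variable $j$ is $1/(1 - R_j^2)$, where $R_j^2$ comes from regressing variable $j$ on all the other explanatory variables. The following is a minimal sketch of that definition using plain least squares; the function names and the two made-up variables are hypothetical.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a square system a @ x = b."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(observed, fitted):
    mean = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2); columns is a list of equal-length variable value lists."""
    result = []
    for j, target in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        # Design matrix: intercept plus every other explanatory variable.
        rows = [[1.0] + [c[t] for c in others] for t in range(len(target))]
        p = len(rows[0])
        xtx = [[sum(r[p1] * r[p2] for r in rows) for p2 in range(p)] for p1 in range(p)]
        xty = [sum(r[p1] * v for r, v in zip(rows, target)) for p1 in range(p)]
        beta = solve(xtx, xty)
        fitted = [sum(bb * v for bb, v in zip(beta, r)) for r in rows]
        # A perfectly redundant variable (R_j^2 = 1) would divide by zero here.
        result.append(1.0 / (1.0 - r_squared(target, fitted)))
    return result


# Two made-up variables with correlation 0.8, so each VIF is 1 / (1 - 0.64).
print(vif([[1, 2, 3, 4, 5], [2, 1, 4, 3, 5]]))
```

Values near 1, like the SMSS values above, mean each variable carries information the others do not.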
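Corrected Akaike's Information Criterion (AICc)
A measure of model fit that penalizes model complexity. It can be used to compare models with the same dependent variable; lower values are better. It can also serve as the criterion for choosing a bandwidth (for example, the GWR kernel bandwidth).
SMSS value: -122.176 (final OLS).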
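A common textbook form for a least-squares fit is $\mathrm{AIC} = n\ln(\mathrm{RSS}/n) + 2k$ with the small-sample correction $\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}$. This sketch assumes that form; ArcGIS may include extra constant terms, so absolute values will not match its report, and the two models compared here are hypothetical.

```python
import math


def aicc(rss, n, k):
    """Corrected AIC for a least-squares fit.

    rss: residual sum of squares, n: observations, k: estimated parameters.
    """
    aic = n * math.log(rss / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)  # small-sample correction


# Two hypothetical models of the same dependent variable on the same 10 points.
simpler = aicc(rss=2.0, n=10, k=2)
richer = aicc(rss=1.0, n=10, k=3)
print(richer < simpler)  # True: the extra parameter earns its penalty here
```

Only differences between models fitted to the same dependent variable are meaningful, which matches the comparison rule above.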
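Jarque-Bera Statistic
Tests whether the residuals are normally distributed (a check for model bias). The null hypothesis is a normal distribution, so a significant p-value means the residuals are not normally distributed.
SMSS p-value: 0.08 (not significant at the 0.05 level).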
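The statistic combines the skewness $S$ and kurtosis $K$ of the residuals as $JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$, which under the null hypothesis follows a chi-square distribution with 2 degrees of freedom. A minimal sketch of that standard formula (ArcGIS's implementation details may differ):

```python
import math


def jarque_bera(residuals):
    """Return (JB statistic, p-value) for a list of residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    d = [r - mean for r in residuals]
    m2 = sum(x ** 2 for x in d) / n
    m3 = sum(x ** 3 for x in d) / n
    m4 = sum(x ** 4 for x in d) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # chi-square survival function with 2 df
    return jb, p_value


jb, p = jarque_bera([-1.0, 1.0, -1.0, 1.0])  # symmetric but flat-topped
```

A p-value above 0.05, like the SMSS value of 0.08, means normality is not rejected at that level.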
Koenker's (BP) Statistic
Tests whether the relationship between the dependent and explanatory variables is consistent, both in geographic space and in data space. A significant value indicates that it is not (nonstationarity), and a Geographically Weighted Regression should be considered.
SMSS p-value: 0.000002* (* = statistically significant).
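Koenker's studentized Breusch-Pagan test regresses the squared OLS residuals on the explanatory variables and uses $LM = nR^2$, which is approximately chi-square with as many degrees of freedom as explanatory variables. To stay short, this sketch assumes a single explanatory variable (1 degree of freedom); the function name is hypothetical.

```python
import math


def koenker_bp_single(x, residuals):
    """Koenker (studentized BP) test with one explanatory variable x.

    Regress squared residuals on x; LM = n * R^2 ~ chi-square(1).
    """
    n = len(x)
    e2 = [r * r for r in residuals]
    mean_x = sum(x) / n
    mean_e2 = sum(e2) / n
    sxy = sum((a - mean_x) * (b - mean_e2) for a, b in zip(x, e2))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_e2) ** 2 for b in e2)
    if syy == 0.0:
        return 0.0, 1.0  # constant squared residuals: no sign of inconsistency
    r2 = sxy * sxy / (sxx * syy)  # R^2 of the auxiliary simple regression
    lm = n * r2
    p_value = math.erfc(math.sqrt(lm / 2.0))  # chi-square(1) survival function
    return lm, p_value


# Made-up residuals whose squares grow exactly with x (strongly non-constant).
lm, p = koenker_bp_single([1, 2, 3, 4], [1.0, math.sqrt(2), math.sqrt(3), 2.0])
```

Here $R^2 = 1$, so $LM = n = 4$ and $p \approx 0.046$; a tiny p-value such as the SMSS 0.000002 signals that the relationships change across the study area.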
Moran's Index (Moran's I)
Measures spatial autocorrelation in the residuals. Values generally range from -1 to 1 (-1 = dispersion, values near 0 = complete spatial randomness (CSR), 1 = clustering). Significant values indicate deviation from CSR: clustering of under- and over-predictions suggests that a key explanatory variable is missing.
SMSS value: 0.555171
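Global Moran's I is $I = \frac{n}{S_0}\,\frac{\sum_i \sum_j w_{ij} z_i z_j}{\sum_i z_i^2}$, where $z_i$ are deviations from the mean and $S_0$ is the sum of all weights. This minimal sketch takes the weights matrix as given and omits the z-score and p-value; the simple line-contiguity weights in the example are an assumption for illustration.

```python
def morans_i(values, weights):
    """Global Moran's I for values and an n x n spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * num / den


def expected_i(n):
    """Expected value under spatial randomness: slightly below 0."""
    return -1.0 / (n - 1)


# Four points on a line; each point's neighbors are the adjacent points.
w = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
clustered = morans_i([1, 1, 5, 5], w)   # similar values side by side
dispersed = morans_i([1, 5, 1, 5], w)   # alternating values
```

The clustered pattern gives $I = 1/3$ and the alternating one $I = -1$; applied to residuals, a clearly positive value means over- and under-predictions cluster in space.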
Moran's I Report
Spatial Weights Matrix
Conceptualizes the spatial relationships between features and provides the structure for Moran's I, Hot Spot, and clustering statistics. Relationships can be defined by neighbors or by distance, and they should reflect the real relationships in the data.
SMSS spatial weights matrix: inverse distance squared.
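An inverse-distance-squared conceptualization sets $w_{ij} = 1/d_{ij}^2$ for $i \ne j$ and $w_{ii} = 0$. The sketch below builds such a matrix from planar coordinates; the optional `threshold` cut-off and the function name are hypothetical additions, and row standardization is left out.

```python
def inverse_distance_squared_weights(points, threshold=None):
    """Build an n x n weights matrix with w_ij = 1 / d_ij^2 (w_ii = 0).

    threshold: optional distance beyond which weights are set to 0.
    Coincident points would divide by zero and must be removed first.
    """
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d2 = (points[i][0] - points[j][0]) ** 2 + (points[i][1] - points[j][1]) ** 2
            if threshold is not None and d2 > threshold ** 2:
                continue  # outside the neighborhood: no influence
            w[i][j] = 1.0 / d2
    return w


w = inverse_distance_squared_weights([(0, 0), (1, 0), (0, 2)])
```

Squaring the distance makes influence fall off quickly, so nearby points dominate, which suits a variable expected to change over short distances.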
OLS Diagnostic Summary for SMSS
Missing variables; outliers (scatter plot); non-linear relationships (scatter plot); multicollinearity (VIF); normal distribution bias (Jarque-Bera); nonstationarity (Koenker BP); residual spatial autocorrelation (Moran's I).
Geographically Weighted Regression
A local regression model: it calculates an individual regression equation for each point. Like a multilevel model, it combines an average value with a random component; the local coefficients disaggregate local variations from the overall model.
Geographically Weighted Regression
Based on a neighborhood or distance search, controlled by the kernel and bandwidth. GWR does not provide robust diagnostic statistics, so relationships should be inspected in OLS first!
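The core of GWR is a weighted least-squares fit repeated at every location, with weights that fall off with distance according to the kernel and bandwidth. This minimal sketch fits one local equation $y = b_0 + b_1 x$ with a Gaussian kernel $w_i = \exp(-\tfrac{1}{2}(d_i/b)^2)$; the Gaussian choice, the single explanatory variable, and the function name are assumptions made to keep it short.

```python
import math


def gwr_local_fit(points, x, y, location, bandwidth):
    """Weighted least-squares fit of y = b0 + b1*x centred on `location`."""
    w = []
    for px, py in points:
        d = math.hypot(px - location[0], py - location[1])
        w.append(math.exp(-0.5 * (d / bandwidth) ** 2))  # Gaussian kernel weight
    sw = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted means
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Made-up data: the x-y relationship is positive in the west, negative in the east.
pts = [(0, 0), (1, 0), (2, 0), (3, 0), (100, 0), (101, 0), (102, 0), (103, 0)]
xs = [1, 2, 3, 4, 1, 2, 3, 4]
ys = [2, 4, 6, 8, 9, 8, 7, 6]
west = gwr_local_fit(pts, xs, ys, (1.5, 0), bandwidth=5.0)
east = gwr_local_fit(pts, xs, ys, (101.5, 0), bandwidth=5.0)
```

A single global OLS line would average these two opposite slopes away; the local fits recover roughly `(0, 2)` in the west and `(10, -1)` in the east.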
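SMSS Study GWR
Bandwidth: 266 m
Residual Squares: 1.26
Effective Number: 28.5
Adjusted R²: 0.71
AICc: -140.9467
Moran's I p-value: 0.865530
Compared with the final OLS model (adjusted R² 0.61, AICc -122.176), the GWR explains more of the variance and has a lower AICc, and the non-significant Moran's I p-value gives no evidence of remaining spatial autocorrelation in the residuals.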
What's Next?
We have a properly specified OLS/GWR model. It says something about why soil moisture varies: which physical variables control, or are indicators of, soil moisture. However, it only describes soil moisture at the sample points, where soil moisture is already known; it does not inform us about locations where soil moisture has not been measured.
Interpolation
Estimates values for a continuous raster surface from limited input point data. It relies on the premise that spatial objects are spatially correlated: nearby values tend to be more alike than distant ones. There are many types (IDW, Kriging, Natural Neighbor, Spline, Trend, etc.), and the best technique often depends on the data. SMSS compares IDW and Ordinary Kriging.
Inverse Distance Weighting
An exact interpolator: distance determines each sample's weight. Weights decay as a power of distance, and a higher power makes them decay more steeply. Search window parameters: shape, bandwidth, and number of neighbors.
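IDW estimates $\hat{z}(s_0) = \sum_i w_i z_i / \sum_i w_i$ with $w_i = 1/d_i^{p}$, where $p$ is the power. A minimal sketch, assuming planar coordinates and a simple nearest-neighbors search instead of a shaped search window; the `max_neighbors` parameter is hypothetical.

```python
import math


def idw(points, values, target, power=2.0, max_neighbors=None):
    """Inverse distance weighted estimate at `target`."""
    dists = []
    for (px, py), v in zip(points, values):
        d = math.hypot(px - target[0], py - target[1])
        if d == 0.0:
            return v  # exact interpolator: return the measured value
        dists.append((d, v))
    dists.sort()  # nearest first
    if max_neighbors is not None:
        dists = dists[:max_neighbors]
    weights = [1.0 / d ** power for d, _ in dists]
    return sum(w * v for w, (_, v) in zip(weights, dists)) / sum(weights)


pts, vals = [(0, 0), (2, 0)], [0.0, 10.0]
print(idw(pts, vals, (0.5, 0), power=2.0))  # 1.0: pulled hard toward the nearer 0
print(idw(pts, vals, (0.5, 0), power=1.0))  # 2.5: gentler decay
```

Raising the power from 1 to 2 moves the estimate toward the nearest sample, which is the effect of a "higher power" setting.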
IDW Parameters
Ordinary Kriging
Uses the spatial relationships in the data to build a prediction model: build a semivariogram, fit a model, predict. Explore the data to specify the most appropriate method, including the search window and model type.
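The first two kriging steps can be sketched as: bin point pairs by separation distance and compute the empirical semivariance $\gamma(h) = \frac{1}{2N(h)}\sum (z_i - z_j)^2$, then compare it to a model curve. This sketch shows the spherical model (one common choice, assumed here) but omits the model fitting and the kriging prediction itself; the function names and equal-width lag bins are assumptions.

```python
import math


def empirical_semivariogram(points, values, lag_width, n_lags):
    """Return (lag centre, semivariance) for each non-empty distance bin."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            h = math.dist(points[i], points[j])
            k = int(h // lag_width)
            if k < n_lags:
                sums[k] += 0.5 * (values[i] - values[j]) ** 2
                counts[k] += 1
    return [((k + 0.5) * lag_width, sums[k] / counts[k]) for k in range(n_lags) if counts[k]]


def spherical(h, nugget, partial_sill, rng):
    """Spherical semivariogram model; flat at nugget + partial_sill beyond the range."""
    if h == 0:
        return 0.0
    if h >= rng:
        return nugget + partial_sill
    r = h / rng
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)


# Made-up alternating values on a line: neighbors differ, every-other points agree.
gamma = empirical_semivariogram([(0, 0), (1, 0), (2, 0), (3, 0)], [0, 1, 0, 1], 1.0, 3)
print(gamma)  # [(1.5, 0.5), (2.5, 0.0)]
```

In practice the nugget, partial sill, and range are tuned so the model curve follows the empirical points before predictions are made.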
Semivariogram & Kriging Parameters
IDW Cross Validation
Kriging Cross Validation
SMSS IDW Surface
SMSS OK Surface
SMSS Cross Validation
Field points, measured (n = 117)
Summary statistic                Ordinary Kriging            Inverse Distance Weighting
Mean prediction error            0.00632622                  0.00467828
RMS prediction error             0.20752391                  0.20251504
Predicted regression function    y = 0.2345 * x + 0.2019     y = 0.2362 * x + 0.1994
Error regression function        y = -0.7655 * x + 0.2019    y = -0.7638 * x + 0.1994
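The summary statistics can be reproduced from measured and predicted values. This sketch uses ordinary least squares for the regression functions (ArcGIS may use a robust fit, so coefficients need not match exactly). It also shows why each error slope equals the predicted slope minus 1 with the same intercept: the error is predicted minus measured, so regressing it on the measured value subtracts exactly 1 from the slope. The function name and the data are hypothetical.

```python
import math


def cross_validation_summary(measured, predicted):
    """Mean and RMS of (predicted - measured) plus both regression functions."""
    n = len(measured)
    errors = [p - m for m, p in zip(measured, predicted)]
    mean_error = sum(errors) / n
    rms_error = math.sqrt(sum(e * e for e in errors) / n)
    mx = sum(measured) / n
    mp = sum(predicted) / n
    slope = sum((m - mx) * (p - mp) for m, p in zip(measured, predicted)) / sum(
        (m - mx) ** 2 for m in measured
    )
    intercept = mp - slope * mx
    return {
        "mean_error": mean_error,
        "rms_error": rms_error,
        "predicted_fn": (slope, intercept),   # predicted = slope * measured + intercept
        "error_fn": (slope - 1.0, intercept),  # error = (slope - 1) * measured + intercept
    }


# Made-up predictions that are smoothed toward the mean (slope below 1).
stats = cross_validation_summary([1, 2, 3, 4], [1.5, 2.0, 2.5, 3.0])
```

A predicted slope well below 1, as in the field-point table, means high values are under-predicted and low values over-predicted: the interpolated surface is smoother than the measurements.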
Where Are We?
A continuous interpolated surface for VWC. Is it sufficient? Am I modeling the variance related to topography?
Regression Model Prediction
Regression models are equations: unknown values of the dependent variable can be computed from known explanatory variables. Prediction is built into the GWR interface under Additional Parameters. Explanatory variables must be entered in the same order as in the input data.
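Prediction is just evaluating the fitted equation with new explanatory values, and the ordering warning exists because coefficients are matched to variables by position. One way to guard against ordering mistakes in your own scripts is to key coefficients by variable name, as in this sketch; the coefficient values and variable names below are made up and are not the SMSS model.

```python
def predict(intercept, coefficients, explanatory):
    """Evaluate y = b0 + sum(b_name * x_name), matching variables by name."""
    missing = set(coefficients) - set(explanatory)
    if missing:
        raise KeyError(f"missing explanatory variables: {sorted(missing)}")
    return intercept + sum(coefficients[name] * explanatory[name] for name in coefficients)


# Hypothetical coefficients and one hypothetical prediction location.
coefs = {"slope": 0.01, "percent_cover": 0.002}
vwc = predict(0.1, coefs, {"percent_cover": 50, "slope": 5})  # order no longer matters
```

In GWR the coefficients differ at every location, so the same evaluation is repeated with the local coefficients for each prediction point.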
Predicted Points
If the explanatory variables are known, then the dependent variable can be computed with OLS or GWR.
Prediction locations: field points, and 10 m and 5 m grids.
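Regular grids of prediction points can be generated from a bounding box. This sketch places one point at each cell centre (an assumption; other conventions use cell corners); the function name and the 100 m by 50 m box are hypothetical. Halving the spacing roughly quadruples the point count, consistent with the SMSS grids (10,557 points at 10 m vs. 42,531 at 5 m).

```python
def grid_points(xmin, ymin, xmax, ymax, spacing):
    """Cell-centre points of a regular grid covering the bounding box."""
    nx = int((xmax - xmin) // spacing)
    ny = int((ymax - ymin) // spacing)
    return [
        (xmin + (i + 0.5) * spacing, ymin + (j + 0.5) * spacing)
        for j in range(ny)
        for i in range(nx)
    ]


coarse = grid_points(0, 0, 100, 50, 10)  # 10 x 5 cells
fine = grid_points(0, 0, 100, 50, 5)     # 20 x 10 cells
print(len(coarse), len(fine))  # 50 200
```

A real study area is irregular, so points falling outside its boundary would still need to be clipped.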
Defining Explanatory Variables
Create continuous surfaces of the explanatory variables, extract their values to the points, and input them into GWR (this does not further inform the GWR model). Then re-run the interpolation.
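Extracting raster values to points amounts to finding the cell that contains each point. This sketch does a nearest-cell (no interpolation) lookup on a north-up raster stored as a list of rows; the argument names describing the raster origin and cell size are hypothetical.

```python
def extract_value(raster, x_left, y_top, cell_size, x, y):
    """Value of the raster cell containing (x, y), or None if outside the raster."""
    col = int((x - x_left) // cell_size)
    row = int((y_top - y) // cell_size)  # rows count downward from the top edge
    if 0 <= row < len(raster) and 0 <= col < len(raster[0]):
        return raster[row][col]
    return None


# A made-up 2 x 2 raster with 10 m cells whose top-left corner is at (0, 20).
surface = [[1, 2], [3, 4]]
print(extract_value(surface, 0, 20, 10, 15, 15))  # 2 (top row, right column)
print(extract_value(surface, 0, 20, 10, 5, 5))    # 3 (bottom row, left column)
```

Points outside the raster return `None`, so missing explanatory values can be filtered out before prediction.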
SMSS 10 & 5 meter IDW Surfaces
SMSS 10 & 5 meter OK Surfaces
Cross Validation
10 m grid points, GWR predicted (n = 10557)
Summary statistic                Ordinary Kriging            Inverse Distance Weighting
Mean prediction error            -0.00005312                 0.00006528
RMS prediction error             0.12523941                  0.12139920
Predicted regression function    y = 0.6253 * x + 0.0762     y = 0.6603 * x + 0.0692
Error regression function        y = -0.3747 * x + 0.0762    y = -0.3397 * x + 0.0692
5 m grid points, GWR predicted (n = 42531)
Summary statistic                Ordinary Kriging            Inverse Distance Weighting
Mean prediction error            0.00002023                  0.00000738
RMS prediction error             0.10854328                  0.10241228
Predicted regression function    y = 0.8153 * x + 0.0409     y = 0.8419 * x + 0.0353
Error regression function        y = -0.1847 * x + 0.0409    y = -0.1581 * x + 0.0353
Now What Do We Have?
A modeled surface at high resolution from limited data, picking up midslope variability.
Summary of SMSS
Lessons Learned
Spatial statistical methods are powerful; however, all statistical methods must be carefully evaluated to avoid misspecification. Think outside the box. Make sure you are answering the right questions. Use your resources! ArcGIS has extensive documentation and experts who are willing to help.
Additional Resources
ArcGIS Desktop Help
ArcGIS Resource Center: Spatial Statistics Toolbox
http://help.arcgis.com/en/arcgisdesktop/10.0/help/#/an_overview_of_the_spatial_statistics_toolbox/005p00000002000000/
Geoprocessing Blog: documentation with the Supplementary Spatial Statistics Toolbox download
http://blogs.esri.com/dev/blogs/geoprocessing/default.aspx