Global Spatial Autocorrelation Clustering

Luc Anselin
http://spatial.uchicago.edu

join count statistics
Moran's I
Moran scatter plot
non-parametric spatial autocorrelation

Join Count Statistics

Recap - Spatial Autocorrelation Statistic
combination of attribute similarity and locational similarity: $f(x_i, x_j)$ and $w_{ij}$
general form: $\sum_i \sum_j f(x_i, x_j) w_{ij}$

Spatial Autocorrelation for Binary Data
only two values observed, e.g., presence and absence
$x_i = 1$, or B(lack)
$x_i = 0$, or W(hite)
coding of 1 or 0 is arbitrary; typically choose 1 for the smaller category
unlike event analysis (point patterns), all locations are observed, with either 1 or 0 values

[Figure: neighborhoods with burglaries (= 1), Walnut Hills, Cincinnati, OH]

Match Value-Location
count the number of times similar values occur for neighbors (= joins)
if $x_i = 1$ and $x_j = 1$ for i-j neighbors, count as a BB join
if $x_i = 0$ and $x_j = 0$ for i-j neighbors, count as a WW join

Positive Spatial Autocorrelation
similarity of neighbors
BB = 1 -- 1: $x_i x_j = 1$ for a join, 0 otherwise
WW = 0 -- 0: $(1 - x_i)(1 - x_j) = 1$ for a join, 0 otherwise

Negative Spatial Autocorrelation
dissimilarity of neighbors
BW = 1 -- 0 or 0 -- 1: $(x_i - x_j)^2 = 1$ for dissimilar neighbors, 0 otherwise

Join Count Statistics
binary spatial weights, $w_{ij} = 1$ or 0
BB: $(1/2) \sum_i \sum_j w_{ij} x_i x_j$
WW: $(1/2) \sum_i \sum_j w_{ij} (1 - x_i)(1 - x_j)$
BW: $(1/2) \sum_i \sum_j w_{ij} (x_i - x_j)^2$
BB + WW + BW = $(1/2) S_0 = (1/2) \sum_i \sum_j w_{ij}$
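The three join count formulas can be checked on a toy example. The following is a minimal sketch, assuming a symmetric binary weights matrix stored as a dense NumPy array; the function name `join_counts` and the four-location layout are hypothetical and only serve as illustration.

```python
import numpy as np

def join_counts(w, x):
    """Return (BB, WW, BW) for a symmetric binary weights matrix w and a 0/1 vector x."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    bb = 0.5 * (x @ w @ x)                    # (1/2) sum_ij w_ij x_i x_j
    ww = 0.5 * ((1 - x) @ w @ (1 - x))        # (1/2) sum_ij w_ij (1 - x_i)(1 - x_j)
    diff = x[:, None] - x[None, :]
    bw = 0.5 * np.sum(w * diff ** 2)          # (1/2) sum_ij w_ij (x_i - x_j)^2
    return bb, ww, bw

# Hypothetical layout: four locations on a line, each a neighbor of the next.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
x = np.array([1, 1, 0, 0])

bb, ww, bw = join_counts(w, x)
print(bb, ww, bw, 0.5 * w.sum())
```

In this layout there is one join of each type, and the three counts add up to half the sum of the weights, as in the identity above.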

null hypothesis: spatial randomness

Join count statistics, queen contiguity (n = 457, S0 = 2968)

        Actual   Random 1   Random 2
BB         130         95         79
WW         845        808        814
BW         509        581        591
Sum       1484       1484       1484

Analytical Inference
based on sampling theory
binomial distribution, using sampling with replacement or without replacement
moments (mean, variance, higher) can be computed
uses a (poor) normal approximation
Walnut Hills burglary: BB = 130, E[BB] = 91.7, Var[BB] = 77.9, z-value = 4.34, p-value < 0.00007
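The reported z-value follows directly from the observed count and its moments under the null. The sketch below redoes that arithmetic with the numbers from the slide; the one-sided upper-tail probability under the normal approximation is an assumption about how the p-value was obtained.

```python
import math

# Values reported on the slide for the Walnut Hills burglary example.
bb_observed = 130
bb_expected = 91.7
bb_variance = 77.9

# Standardize the observed BB count with its analytical mean and variance.
z = (bb_observed - bb_expected) / math.sqrt(bb_variance)

# Assumption: one-sided upper-tail probability of the standard normal.
p_upper = 0.5 * math.erfc(z / math.sqrt(2))

print(round(z, 2), p_upper)
```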

Permutation Inference
reshuffle 0-1 values over locations
recompute statistic (BB) for each reshuffled pattern
builds a reference distribution
compare observed statistic to reference distribution
pseudo p-value = (M + 1)/(R + 1)
M = number of reference values equal to or greater than the observed statistic
R = number of replications (e.g., 999)
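The permutation steps can be sketched as follows. This is an illustration under stated assumptions, not the implementation behind the slides: the 6 x 6 grid, the rook contiguity helper `rook_weights`, and the clustered 0-1 pattern are all hypothetical.

```python
import numpy as np

def rook_weights(rows, cols):
    """Binary rook contiguity weights for a regular rows x cols grid (hypothetical helper)."""
    n = rows * cols
    w = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                w[i, i + 1] = w[i + 1, i] = 1
            if r + 1 < rows:
                w[i, i + cols] = w[i + cols, i] = 1
    return w

def bb_stat(w, x):
    """BB join count: (1/2) sum_ij w_ij x_i x_j."""
    return 0.5 * (x @ w @ x)

def pseudo_p_value(w, x, replications=999, seed=0):
    """Reshuffle x over locations and return (observed BB, reference values, pseudo p-value)."""
    rng = np.random.default_rng(seed)
    observed = bb_stat(w, x)
    reference = np.array([bb_stat(w, rng.permutation(x)) for _ in range(replications)])
    m = int(np.sum(reference >= observed))   # values equal to or greater than the statistic
    return observed, reference, (m + 1) / (replications + 1)

# Hypothetical clustered pattern: a 3 x 3 block of ones in the corner of a 6 x 6 grid.
grid = np.zeros((6, 6))
grid[:3, :3] = 1
w = rook_weights(6, 6)
observed, reference, p = pseudo_p_value(w, grid.ravel())
print(observed, reference.mean(), p)
```

With R = 999 the smallest attainable pseudo p-value is 1/1000, which is why the observed statistic is counted in both numerator and denominator.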

[Figure: reference distribution, BB = 130]

[Figure: reference distribution for spatially random values, BB = 95]

Moran's I

Moran's I
the most commonly used of many spatial autocorrelation statistics
$I = [\sum_i \sum_j w_{ij} z_i z_j / S_0] / [\sum_i z_i^2 / N]$
with $z_i = y_i - \bar{y}$: deviations from the mean
cross-product statistic ($z_i z_j$), similar to a correlation coefficient
value depends on weights ($w_{ij}$)

Moran's I examined more closely
scaling factors in numerator and denominator
in numerator: $S_0 = \sum_i \sum_j w_{ij}$; for binary weights, this is the number of non-zero elements in the weights matrix, or the number of neighbor pairs (x2)
in denominator: N, the total number of observations
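To make the two scaling factors concrete, here is a minimal sketch of the statistic as written above, assuming a dense weights matrix; the function name `morans_i` and the two four-location patterns are hypothetical.

```python
import numpy as np

def morans_i(w, y):
    """Moran's I = [sum_ij w_ij z_i z_j / S0] / [sum_i z_i^2 / N], z in deviations from the mean."""
    w = np.asarray(w, dtype=float)
    z = np.asarray(y, dtype=float) - np.mean(y)
    s0 = w.sum()      # sum of all weights
    n = z.size        # number of observations
    return (z @ w @ z / s0) / (z @ z / n)

# Hypothetical layout: four locations on a line, binary weights.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

i_trend = morans_i(w, [1, 2, 3, 4])        # neighbors hold similar values
i_alternating = morans_i(w, [1, 0, 1, 0])  # neighbors hold dissimilar values
print(i_trend, i_alternating)
```

The smooth trend gives a positive value and the alternating pattern a negative one, matching the sign interpretation given later in the deck.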

Inference
how to assess whether the computed value of Moran's I is significantly different from its value under a spatially random distribution
analytically (assume normal distribution, etc.)
computationally: compare the value to a reference distribution obtained from a series of randomly permuted patterns

Approximate Inference
assumes either normal distribution or randomization (each value equally likely to occur at any location)
analytical derivation of moments (mean, variance) of the statistic under the null hypothesis (spatial randomness)
$z = (I - E[I]) / SD[I]$
approximate the resulting z-value by a standard normal distribution

[Figure: Cleveland 2015 Q4 house sales prices (in $1,000)]

Normal vs randomization inference for Moran's I, queen weights

           Normal         Randomization
MI         0.282          0.282
E[MI]      -0.0049        -0.0049
Var[MI]    0.00178        0.00158
z-value    6.81           7.22
p-value    << 0.000001    << 0.000001

Moran's I inference under randomization, different spatial weights

           Queen          K6             Distance
MI         0.282          0.343          0.335
E[MI]      -0.0049        -0.0049        -0.0049
Var[MI]    0.00158        0.00124        0.00110
z-value    7.22           9.88           10.3
p-value    << 0.000001    << 0.000001    << 0.000001

Permutation Approach
by itself, even a high Moran's I does not indicate significance
need to construct a reference distribution
randomly reshuffle observations and recompute Moran's I each time
compare observed value to reference distribution

Standardized z-value
standardize by subtracting the mean and dividing by the standard deviation, both computed from the reference distribution
z = [Observed I - Mean(I)] / Standard Deviation(I)
z-values are comparable across variables and across spatial weights
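The permutation approach and the standardized z-value can be combined in a few lines. The sketch below is an assumption-laden illustration: the 5 x 5 grid, the rook weights helper, and the gradient variable are hypothetical and unrelated to the Cleveland data.

```python
import numpy as np

def rook_weights(rows, cols):
    """Binary rook contiguity weights for a regular grid (hypothetical helper)."""
    n = rows * cols
    w = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                w[i, i + 1] = w[i + 1, i] = 1
            if r + 1 < rows:
                w[i, i + cols] = w[i + cols, i] = 1
    return w

def morans_i(w, y):
    z = y - y.mean()
    return (z @ w @ z / w.sum()) / (z @ z / z.size)

def permutation_z(w, y, replications=999, seed=0):
    """Return (observed I, standardized z, pseudo p-value) from a permutation reference distribution."""
    rng = np.random.default_rng(seed)
    observed = morans_i(w, y)
    reference = np.array([morans_i(w, rng.permutation(y)) for _ in range(replications)])
    z = (observed - reference.mean()) / reference.std(ddof=1)
    m = int(np.sum(reference >= observed))
    return observed, z, (m + 1) / (replications + 1)

# Hypothetical variable: a smooth gradient, value = row index + column index.
rows, cols = np.indices((5, 5))
y = (rows + cols).ravel().astype(float)
observed, z, p = permutation_z(rook_weights(5, 5), y)
print(observed, z, p)
```

Because the mean and standard deviation come from the reshuffled patterns rather than from analytical moments, the resulting z-value changes slightly with the random seed and the number of replications.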

[Figure: 999 permutations and reference distribution (queen weights); MI = 0.282, Mean = -0.0045, s.d. = 0.0401, z-value = 7.15]

Interpretation of Moran's I

Sign of Moran's I
theoretical mean is $-1/(N - 1)$, essentially zero for large N
positive and significant = clustering of like values
NOT clustering of high or low; could be either or a combination
negative and significant = alternating values
presence of spatial outliers
spatial heterogeneity (checkerboard pattern)

Significance of Moran's I
compute pseudo p-value
compare standardized z-value to standard normal (approximate only)
the sign of the coefficient is meaningful only if Moran's I is significant

Comparing Moran's I
Moran's I depends on spatial weights
relative magnitude is meaningful for the same weights and different variables, but NOT for different spatial weights
instead, use standardized z-value to compare

Moran's I, different spatial weights

           Queen          K6             Distance
MI         0.282          0.343          0.335
E[MI]      -0.0049        -0.0049        -0.0049
Var[MI]    0.00158        0.00124        0.00110
z-value    7.22           9.88           10.3
p-value    << 0.000001    << 0.000001    << 0.000001

MI is largest for K6, but z is largest for Distance

Clustering vs Clusters
Moran's I is a global statistic, i.e., a single value for the whole spatial pattern
Moran's I does NOT provide the location of clusters
cluster detection requires a local statistic

True vs Apparent Contagion
the indication of clustering does not provide an explanation for why the clustering occurs
different processes can result in the same pattern
true contagion: evidence of clustering due to spatial interaction (peer effects, mimicking, etc.)
apparent contagion: evidence of clustering due to spatial heterogeneity (different spatial structures create local similarity)

Geary's c

Focus on Dissimilarity
squared difference as measure of dissimilarity
similar to notion of variogram (geostatistics)
values between 0 and 2

Geary's c Statistic
$c = (N - 1) \sum_i \sum_j w_{ij} (x_i - x_j)^2 / (2 S_0 \sum_i z_i^2)$
alternatively, $c = [\sum_i \sum_j w_{ij} (x_i - x_j)^2 / (2 S_0)] / [\sum_i z_i^2 / (N - 1)]$
with $z_i$ in deviations from the mean
$S_0 = \sum_i \sum_j w_{ij}$: sum of all the weights
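A direct transcription of the first formula, as a minimal sketch assuming a dense weights matrix; the function name `gearys_c` and the four-location examples are hypothetical.

```python
import numpy as np

def gearys_c(w, x):
    """Geary's c = (N - 1) sum_ij w_ij (x_i - x_j)^2 / (2 S0 sum_i z_i^2)."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    z = x - x.mean()                               # deviations from the mean
    s0 = w.sum()                                   # sum of all the weights
    diff2 = (x[:, None] - x[None, :]) ** 2         # squared differences for all pairs
    return (x.size - 1) * np.sum(w * diff2) / (2 * s0 * (z @ z))

# Hypothetical layout: four locations on a line, binary weights.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

c_trend = gearys_c(w, [1, 2, 3, 4])        # similar neighbors
c_alternating = gearys_c(w, [1, 0, 1, 0])  # dissimilar neighbors
print(c_trend, c_alternating)
```

Similar neighbors push c below 1 and dissimilar neighbors push it above 1, the reverse of the sign convention for Moran's I.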

Interpretation
positive spatial autocorrelation: c < 1 or z < 0
negative spatial autocorrelation: c > 1 or z > 0
opposite sign of Moran's I

Inference
same approach as for Moran's I
analytical, approximate (normal, randomization)
permutation

Geary's c inference under randomization, different spatial weights

           Queen          K6             Distance
c          0.716          0.496          0.544
E[c]       1.0            1.0            1.0
Var[c]     0.00381        0.00545        0.00285
z-value    -4.61          -6.82          -8.55
p-value    < 0.000004     << 0.000001    << 0.000001

[Figure: 999 permutations and reference distribution (queen weights); c = 0.716, Mean = 0.998, s.d. = 0.062, z-value = -4.58]

Moran Scatter Plot

Recap: Moran's I
$I = [\sum_i \sum_j w_{ij} z_i z_j / S_0] / [\sum_i z_i^2 / N]$
with $z_i = y_i - \bar{y}$: deviations from the mean
for row-standardized weights, $S_0 = N$, so
$I = \sum_i \sum_j w_{ij} z_i z_j / \sum_i z_i^2 = \sum_i z_i (\sum_j w_{ij} z_j) / \sum_i z_i^2$
Moran's I is the slope in a regression of $\sum_j w_{ij} z_j$ on $z_i$

Moran Scatter Plot
scatter plot of $[z_i, \sum_j w_{ij} z_j]$
the value at i on the x-axis, its spatial lag (weighted average of neighboring values) on the y-axis
slope of linear fit is Moran's I
use local regression (Lowess) to identify possible structural breaks
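The equivalence between the slope of the scatter plot and Moran's I can be verified numerically. This is a minimal sketch assuming row-standardized weights; the six-location line layout and the values of `y` are hypothetical.

```python
import numpy as np

# Hypothetical layout: six locations on a line, each a neighbor of the next.
n = 6
w_binary = np.zeros((n, n))
for i in range(n - 1):
    w_binary[i, i + 1] = w_binary[i + 1, i] = 1

# Row-standardize so that each row sums to 1 (and S0 = N).
w = w_binary / w_binary.sum(axis=1, keepdims=True)

y = np.array([2.0, 3.0, 5.0, 4.0, 9.0, 8.0])   # hypothetical values
z = y - y.mean()                               # x-axis: deviations from the mean
lag = w @ z                                    # y-axis: spatial lag, average of the neighbors

# Slope of the linear fit of the spatial lag on z.
slope = np.polyfit(z, lag, 1)[0]

# Moran's I for row-standardized weights: sum_i z_i (sum_j w_ij z_j) / sum_i z_i^2.
moran = (z @ lag) / (z @ z)
print(slope, moran)
```

The two numbers agree because z has mean zero, so the regression intercept does not affect the slope.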

[Figure: Moran scatter plot with outliers, Cleveland house prices, queen weights]

[Figure: outliers (in red)]

[Figure: Moran scatter plot without outliers]

Categories of Local Spatial Autocorrelation
four quadrants of the scatter plot
upper right and lower left: positive spatial autocorrelation
clusters of like values; locations are similar to their neighbors
lower right and upper left: negative spatial autocorrelation
spatial outliers; locations are different from their neighbors
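The quadrant assignment can be sketched as a simple sign comparison. The example below is an illustration only: the function name `moran_quadrants`, the five-location layout, and the treatment of exact zeros as "high" are all assumptions.

```python
import numpy as np

def moran_quadrants(w, y):
    """Label each location by the signs of its own deviation and its spatial lag."""
    z = np.asarray(y, dtype=float) - np.mean(y)
    lag = np.asarray(w, dtype=float) @ z
    # Assumption: values exactly at the mean are grouped with "high".
    return np.where(z >= 0,
                    np.where(lag >= 0, "high-high", "high-low"),
                    np.where(lag >= 0, "low-high", "low-low"))

# Hypothetical layout: five locations on a line, row-standardized weights.
n = 5
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1
w = w / w.sum(axis=1, keepdims=True)

labels = moran_quadrants(w, [10, 9, 1, 2, 8])
print(list(labels))
```

High-high and low-low correspond to the upper right and lower left quadrants; high-low and low-high correspond to the lower right and upper left quadrants. The labels say nothing about significance.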

Positive Spatial Autocorrelation
all comparisons relative to the mean, not absolute high or low
high-high
low-low

Negative Spatial Autocorrelation
spatial outliers
high-low
low-high

[Figure: Moran scatter plot with Lowess smoother]

Nonparametric Spatial Covariance Function

Alternative Perspective - Nonparametric
non-parametric approach
no model for covariance; let the data suggest the functional form
based on sample autocorrelation
covariance function must be positive definite

Sample Autocorrelation
computed for each pair i, j
$\rho_{ij} = \rho(z_i, z_j) = z_i^* z_j^* / [(1/n) \sum_h (z_h - z_m)^2]$
$z_i^* = z_i - z_m$: deviations from the mean
in practice, easier to use standardized $z_i$
incidental parameter problem: one parameter for each pair i-j, i.e., $n(n-1)/2$ individual values of $\rho_{ij}$

Non-Parametric Principle
spatial autocorrelation as an unspecified function of distance
$\rho_{ij} = g(d_{ij})$
how to fit the function? use a kernel estimator

Kernel Estimator
Hall-Patil (1994)
$\rho(d) = \sum_i \sum_j K(d_{ij}/h) (z_i z_j) / \sum_i \sum_j K(d_{ij}/h)$
K is the kernel function (many choices)
h is the bandwidth
results depend critically on h, less so on K
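A kernel-weighted average of the cross products can be sketched as below. Several choices here are assumptions rather than part of the slides: the kernel is Gaussian, its argument is read as $(d_{ij} - d)/h$ so that pairs near the evaluation distance d get the most weight, each pair is counted once with i = j excluded, and the grid coordinates and values are hypothetical.

```python
import numpy as np

def kernel_correlogram(coords, y, distances, bandwidth):
    """Kernel-weighted average of z_i z_j at each evaluation distance (Gaussian kernel assumed)."""
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    z = (y - y.mean()) / y.std()                       # standardized values
    delta = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((delta ** 2).sum(axis=2))              # pairwise distances d_ij
    upper = np.triu_indices(len(y), k=1)               # each pair once, i != j
    d_pairs = d[upper]
    zz_pairs = np.outer(z, z)[upper]                   # cross products z_i z_j
    rho = []
    for d0 in distances:
        k = np.exp(-0.5 * ((d_pairs - d0) / bandwidth) ** 2)
        rho.append(np.sum(k * zz_pairs) / np.sum(k))
    return np.array(rho)

# Hypothetical data: a 6 x 6 grid of points with a smooth gradient.
gx, gy = np.meshgrid(np.arange(6), np.arange(6))
coords = np.column_stack([gx.ravel(), gy.ravel()])
y = (gx + gy).ravel()

rho = kernel_correlogram(coords, y, distances=[1.0, 5.0], bandwidth=0.5)
print(rho)
```

For this gradient the estimate is positive at short distances and lower at long distances. Changing `bandwidth` alters the curve noticeably, in line with the remark that results depend critically on h.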

Kernel Regression
$z_i z_j = g(d_{ij})$
local regression
depends on choice of kernel function and bandwidth
values of the estimated $g(d_{ij})$ do not necessarily result in a valid (positive semi-definite) variance-covariance matrix

[Figure: correlogram, full distance range]

[Figure: correlogram, 20,000 ft max distance]

Interpretation
range of spatial autocorrelation
alternative to specifying spatial weights
sensitive to kernel fit
may violate Tobler's law