Spatial Analysis 2. Spatial Autocorrelation

Spatial Analysis 2 Spatial Autocorrelation

Spatial Autocorrelation a relationship between nearby spatial units of the same variable If, for every pair of subareas i and j in the study region, the drawings which yield x i and x j are correlated, then we say that there is spatial autocorrelation in the region.

Spatial Autocorrelation Spatially located data (georeferenced data) in region R. Given: Variable X A realization of X in area i of R is x i The set of values, { x i }, that are related over space are spatially autocorrelated

Spatial Autocorrelation Given a set S containing n geographical units, spatial autocorrelation refers to the relationship between some variable observed in each of the n localities and a measure of geographical proximity defined for all n(n -1) pairs chosen from n. Hubert, Golledge and Costanza, 1981

Representations of Spatial Autocorrelation Moran s I (global covariance) Geary s c (global differences) Γ (cross product) Ripley s K (cumulative pairs over distance) ρ, λ (autoregressive coefficients in various regression formulations) Getis and Ord s G i and G i * (local clustering) Anselin s I i and c i (local indicators of spatial association (LISA)) Ord and Getis O (a local representation taking into account global autocorrelation) 1/γ (inverse of the semivariogram; i.e., the correlogram)

Advantages of the Study of Spatial Autocorrelation Provides tests on model misspecification; Determines the strength of the spatial effects on any variable in the model; Allows for tests on assumptions of spatial stationarity and spatial heterogeneity; Finds the possible dependent relationship that a realization of a variable may have on other realizations; Identifies the role that distance decay or spatial interaction might have on any spatial autoregressive model; Helps to recognize the influence that the geometry of spatial units under study might have on the realizations of a variable; Allows us to identify the strength of associations among realizations of a variable between spatial units; Gives us the means to test hypotheses about spatial relationships; Gives us the opportunity to weigh the importance of temporal effects; Provides a focus on a spatial unit to better understand the effect that it might have on other units and vice versa ( local spatial autocorrelation ); Helps in the study of outliers.

Global Statistics Nearest Neighbor k-function Global Autocorrelation Statistics Moran s I Geary s c Semivariance Spatial Autocorrelation coefficient(s) Semi-variance

Representation of Spatial Autocorrelation: WY Cross Product W The Spatial Weights Matrix The Spatial Association of All Sites to All Other Sites d, d 2, 1/0, 1/d Y The Attribute Association Matrix The Association of the Attributes at Each Site to the Attributes at All Other Sites +,-,/,x

The Spatial Weights Matrix W is the formal expression of the spatial association between objects (it is the pair-wise geometry of objects being studied).

Spatial Configuration (W matrix) 1 or 0 (1 for some defined aspect of the geometry of the region being studied and 0 otherwise) 1/d α (a distance decay representation) s/p (a common boundary representation s is length of side in common, p is perimeter) d 2 -d 1 (a difference in distance representation) G i, I i (a local statistics representation)

1 2 0 1 1 0 1 0 1 0 1 1 0 1 0 0 1 0 3 4

Commentators on W Dacey: varying results given schemes Cliff and Ord: rook s and queen s cases Griffith: better under-specified Florax & Rey: over-specification reduces power Kooijman: maximize Moran s Openshaw: computer search for best model Bartels: binary defensible Hammersley-Clifford: near neighbors in Markov Tiefelsdorf, Griffith, Boots: standardization Florax and Graff: bias due to matrix sparseness Getis and Aldstadt: minimize AIC

Some New W Schemes Fotheringham, Brunsdon, and Charlton s bandwidth distance decay (1996) LeSage s Gaussian distance decline (1999) McMillen s tri-cube distance decline (1998) Getis and Aldstadt s local statistics model (2001, 2002) Aldstadt and Getis AMOEBA (2003) Folmer subsume in model (2007)

The Attribute Matrix Y The variable under study. One variable at a time. Interval scale (other scales under special conditions). For examples: residuals from regression; a socio-economic variable (number of crimes, household income, number of artifacts, etc.)

1 [ 12] 2 [ 10 ] 0 2 7-9 -2 0 5-11 -7-5 0-16 [ 5 ] 9 11 16 0 [ 2 1 ] 3 4

WY: Covariance Set W to preferred spatial weights matrix (rooks, queens, distance decline, etc.) Set Y to (y i - µ) (y j -µ) Set scale to n/ W [Σ(y i -µ) 2 ] I = n Σ Σ w ij (y i - µ) (y j -µ)/ W Σ(y i -µ) 2 where W is sum of all w ij and i j This is Moran s I.

WY: Additive Set W to 1/0 spatial weights matrix 1 within d; 0 outside of d Set Y to (y i +y j ) Set scale to Σ W ij (d) / Σ (y i ) G(d) = Σ W ij (d) (y i +y j ) / Σ (y i ) and i j This is Getis and Ord s G.

WY: Difference Set W to preferred spatial weights matrix Set Y to (y i -y j ) 2 Set scale to (n-1)/2wσ(y i -µ) 2 c = (n - 1) Σ Σ W ij (y i -y i) ) 2 / 2WΣ(y i -µ) 2 where W is sum of all W ij and i j. This is Geary s c.

WY: Difference Set W to 1/0 weights matrix; 1 within ad and 0 otherwise; a is an integer; d is a constant distance Set Y to (y i -y j ) 2 Set scale to 1/2 γ(ad) = 1/2 Σ Σ W ij (y i -y j) ) 2 This is the semivariance

Attribute Relationships Y Types of Relationships Additive association (clustering): (Y i + Y j ) Multiplicative association (product): (Y i Y j ) Covariation (correlation): (Y i - Ῡ)(Y j - Ῡ) Differences (homogeneity/heterogeneity): (Y i -Y j ) Inverse (relativity): (Y i /Y j ) All Relationships Subject to Mathematical Manipulation (power, logs, abs, etc.)

Local Statistics Focus on a site i To what extent are values at sites in vicinity of i associated with i Anselin s LISA: Moran s, Geary s local Getis and Ord s G i * (A clustering statistic)

The I i Statistic (A LISA Statistic): Local Indicator of Spatial Association The I i statistic is local, that is, it is focused on a site and is normally distributed. It is designed to yield a measure of pattern that can be translated into standard normal variates. A covariance statistic. A measure of similarity (differences). Well-suited for study of residuals from regression Indicates the extent to which a location (site) is surrounded to a distance d (or by contiguity) by a cluster of high or low values (H-H, L-L, H-L). Can identify clusters of H-H, L-L, H-L.

Local G i* Statistic [Σ j w ij (d)x j - W j * X ] G i *(d) = s{[ns 1j *-W j2 *]/(N-1)} 1/2 all j w ij (d) is element of 1/0 spatial weights matrix where 1 within d of i, 0 otherwise W j * = Σ w ij (d) S 1j * = Σ j w ij 2 (all j) Statistic distributed normally [G i *(d) = Z score]

The G i *, G i Statistic The G * i statistic is local, that is, it is focused on sites and is normally distributed. It is designed to yield a measure of pattern in standard normal variates. Indicates the extent to which a location (site) is surrounded to a distance d by a cluster of high or low values. Usually, the user specifies maximum search distance and number of increments. The output is a listing of the G i* (d) value for the sample point at a specified distance (d).

K Function Analysis: A Global Statistic L(d) = {(A[ΣΣ K(d ij )]/ π N(N-1) } ½ where K(d ij ) is the number of pairs of points within d of i, and A is the area of the region under study. Used to discern the clustering pattern of the specified variable within the entire study area. The output file gives a table showing L values for each distance (d) increment. E[L(d)] in a random distribution is d.

o o o o o o o o o o

The Random Expectation of K 120 100 Random Expectation of K Observed L* (d) 80 60 40 20 0 10 20 30 40 50 60 70 80 90 100 Distance (meters)

K-function for Aedes aegypti Pupae, by Household, Iquitos, Peru 180 160 Pups CI (min) CI (max) 140 120 100 Random House K-func 80 60 40 20 0 10 20 30 40 50 60 70 80 90 100 Distance