Spatial analysis is the quantitative study of phenomena that are located in space.


© Hyon-Jung Kim, 2016

1 Introduction

Spatial analysis is the quantitative study of phenomena that are located in space. Spatial data analysis usually refers to an analysis of observations in which the spatial locations of the sites are taken into account, and it includes the reduction of spatial patterns to a few clear and useful summaries. Spatial statistics goes beyond this: the summaries are compared with what might be expected from theories of how the pattern might have originated and developed, i.e., inferential statistics. Spatial statistics therefore involves the inferential level of analysis: model building, testing, and interpretation. It is a vast subject, in large part because spatial data are of so many different types.

Spatial data: data that are location specific and that vary in space.

The observations may be:
- univariate or multivariate
- categorical or continuous
- real-valued (numerical) or not real-valued
- observational or experimental

The data locations may:
- be points, regions, line segments, or curves
- be regularly or irregularly spaced
- be regularly or irregularly shaped
- belong to Euclidean or non-Euclidean space

The mechanism that generates the data locations may be:
- known or unknown
- random or non-random
- related or unrelated to the processes that govern the observations

Typical data:
- a sample of observations from the process of interest
- often very noisy, and NOT independent

Three prototypes of data:

1. Geostatistical data
The components of geostatistical data are the locations and the measurements at each location.
e.g. rainfall measurements in Tampere, temperature at weather stations in Finland, air pollutant measurements, soil pH in water, etc.

2. Lattice data (areal or aggregate data)
Counts or averages of a quantity on subregions that make up a larger region.
e.g. presence or absence of a plant species in square quadrats over a study area, number of deaths due to SIDS in the counties of North Carolina, pixel values from remote sensing (satellites).

3. Spatial point patterns
e.g. locations of bird nests in a suitable habitat (evidence of territoriality), locations of lunar craters (meteor impacts or volcanism), etc.

Note that the distinction between these types is not always clear-cut. In particular, geostatistical data and lattice data have many similarities.

Spatial Structure

Large-scale structure (global):
- Mean function of a geostatistical process
- Intensity of a spatial point process
- Mean vector of lattice data

Small-scale structure (local):
- Variogram or covariance function of a geostatistical process (and lattice process)
- Ripley's K function, second-order intensity, and nearest-neighbor functions for a spatial point process
- Neighbor weights for a lattice process

Stationarity implies a constant large-scale structure and a small-scale structure that depends on the spatial locations only through their relative positions (formal descriptions will be discussed later).

Main objectives of Spatial Statistics
- Inference for spatial structure
- Inference for non-spatial structure
- Prediction of unobserved variables
- Design issues, such as where to take observations or how to arrange treatments in a spatial experiment

Temporal Statistics, Spatial Statistics, and Spatio-Temporal Statistics

The inherent difference between temporal statistics and spatial statistics arises because time flows in one direction only, from past to present to future.
- In spatial statistics, observations are often irregularly spaced and models must be more flexible.
- In geostatistics and lattice data analysis, observations are usually assumed to be dependent and non-identically distributed; in particular, models usually include a trend.
- In space, interaction regarding each observation generally occurs in all directions, and many geostatistical/lattice models incorporate omnidirectional interaction.
- In time series, prediction usually consists of extrapolating to a future time point. In geostatistics/lattice analysis, interpolation is as important as extrapolation.
- Geostatistics and lattice data analysis are most similar to that subfield of modern longitudinal data analysis which explicitly models the temporal correlation among observations.

- Spatial point pattern analysis is most similar to failure time data analysis.

Spatio-temporal statistics: the data are observations with identifiable and observed spatial and temporal labels.
e.g. earthquakes (locations random in time and space), change in locations of trees over time, environmental monitoring of water quality, etc.

Space-time data can be modeled either as a collection of spatially correlated time series or as a collection of temporally correlated spatial random fields, lattice processes, or spatial point processes. There are many possibilities for combining spatial data types with temporal data types and the interaction between them. We will focus on pure spatial statistics in this course, but occasionally we will discuss spatio-temporal extensions of certain issues, topics, or methods.

Basic Notation and Statistical Model
- Space $S$, which for concreteness we assume to be Euclidean: $S = \mathbb{R}^d$, where $d = 1, 2,$ or $3$.
- Study region $A \subset S$.
- Spatial data (or point) locations $s_1, \ldots, s_n$, with $s_i \in D$, where $D$ is an index set.
- Observations $Z(s_1), Z(s_2), \ldots, Z(s_n)$.
- Covariates $X(s_1), X(s_2), \ldots, X(s_n)$.
- Model: $\{Z(s) : s \in D\}$, $D \subset \mathbb{R}^d$.

This is a stochastic process, i.e. a collection of random variables indexed by points or regions in $D$. Either the $Z$ values or the $s$ values, or both, are random. The $X$ values are usually assumed to be nonrandom.

1.1 Visualization

1. Visualizing Geostatistical (point-referenced) Data

The best way to visualize these data is to display them on a map and to differentiate the values of the measurements of interest by colour or size.

Example: field observations of air pollution in the northeast US. The points are air pollution monitors, colour-coded by the monthly average PM2.5 concentration (µg/m^3) on a gradient from blue (low) to red (high).

[Figure: monitor locations coloured by PM2.5 concentration (µg/m^3); axes: longitude, latitude.]

Alternatively, we can display the points as gradients in size (2nd figure): larger circles represent higher monthly average PM2.5 concentrations and smaller circles represent lower concentrations. Note that the choice of colour and size gradients can lead to different conclusions!

Goals of spatial statistics applied to geostatistical data
- Explore the spatial pattern in the observations (often called "spatial structure").
- Quantify the spatial pattern with a function.
- Model the spatial correlation/covariance in the observations.

- Make predictions at unobserved locations: interpolation, smoothing.

Additional considerations: account for spatial structure in regression models and/or test a null hypothesis of no spatial structure.

2. Areal data

Areal units are often referenced as polygons. The centroids of the areal units may be useful as a spatial reference, in combination with the area of the polygon. The best way to visualize these data is to display them as a map, differentiating the areal units by colour. Areal data (lattices) use neighbor relationships.

Examples:
- Median household income in Los Angeles neighborhoods
- State-specific (or county-, census tract-, zip code-specific) election results
- County hospital admission rates for influenza

Information collected in areal units may be census related, health related, or environmental (satellite estimates of pollution, land cover).

Goals of spatial statistics applied to areal data


- Understand the linkage between areal units.
- Determine the spatial patterns of areal units within a region.
- If there is a spatial pattern, how strong is it?

A pattern identified through visualization is often subjective. Independent measurements will usually show no pattern.

[Figure: Visualizing Areal Data: Example.]


3. Point Pattern Data

A spatial point process is a stochastic mechanism that generates events in 2D. An event is an observation (e.g. presence/absence) and the point is its location.
- Mapped point pattern: events in a study area D have been recorded.
- Sampled point pattern: events are recorded after taking samples in an area D.

Examples:
- Locations of homeless people in Los Angeles
- Cases of malaria in Nairobi
- Locations of a specific tree species in a forest

[Figure: Point Pattern Data: An Everyday Example.]

If a point pattern has different categories, as with the homeless data, the categories may be coloured separately. Often conclusions cannot be drawn from visual inspection alone.

Goals for point pattern data:
- Model some spatial pattern and determine whether the observed point pattern fits this model.
- Measure the intensity: the mean number of events per unit area.
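The intensity measure named above (mean number of events per unit area) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the rectangular window, the simulated events, and the helper names `estimate_intensity` and `quadrat_counts`. Counting events in quadrats is a common exploratory step that the notes do not spell out.

```python
import numpy as np

def estimate_intensity(points, xlim, ylim):
    """Events per unit area in the rectangular window xlim x ylim."""
    points = np.asarray(points, dtype=float)
    area = (xlim[1] - xlim[0]) * (ylim[1] - ylim[0])
    return len(points) / area

def quadrat_counts(points, xlim, ylim, nx, ny):
    """Number of events in each cell of an nx-by-ny grid of quadrats."""
    points = np.asarray(points, dtype=float)
    counts, _, _ = np.histogram2d(
        points[:, 0], points[:, 1], bins=[nx, ny], range=[xlim, ylim]
    )
    return counts.astype(int)

# Hypothetical data: 200 events placed uniformly at random in a 10 x 10 window.
rng = np.random.default_rng(0)
events = rng.uniform(0.0, 10.0, size=(200, 2))

lam_hat = estimate_intensity(events, (0.0, 10.0), (0.0, 10.0))
counts = quadrat_counts(events, (0.0, 10.0), (0.0, 10.0), nx=5, ny=5)

print("estimated intensity:", lam_hat)
print("quadrat counts:\n", counts)
```

Comparing the spread of the quadrat counts with their mean gives a first, informal impression of clustering or regularity, which is what the questions that follow ask about.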


Questions we would like to answer:
- Is there a regular pattern in the points?
- Is there clustering of the points?
- Can we define a point process that our events follow?
- Is there an underlying population distribution from which events arise in a region?

4. Spatio-temporal data

All three types of data we have described may be referenced in space and in time. That is, data that are location specific can have replicates in time: each observation has a location, a time, and a value.
- Geostatistical: relationship between daily air pollution measured at discrete locations in the US Northeast and hospital admissions.
- Areal: examining birth rates from year to year in US states.
- Point process: changes in spatial clustering of homeless individuals from 2015 to [...]

[Figure: Visualizing Areal Data. Crude birth rates by state based on equal-interval cut points. Source: Monmonier, M. Lying with Maps. Statistical Science 2005, 20(3).]

2 Geostatistics

The (stochastic) process varies continuously over space, but the data are measured only at discrete locations.
- Process (Markov Random Field): $\{Z(s) : s \in D\}$, $D \subset \mathbb{R}^d$
- Observations: $z_1 = Z(s_1), z_2 = Z(s_2), \ldots, z_n = Z(s_n)$

First law of geography: nearby quantities tend to be more alike than those far apart.

The usual model for many kinds of data is

Datum = Mean + Residual

In a geostatistical context, the basic model takes the form

\[
\begin{aligned}
Z(s) &= m(s) + \epsilon(s) && \text{(large-scale variation + small-scale variation)} \\
     &= m(s) + W(s) + \delta(s) && \text{(mean + smooth component + white noise)} \\
     &= \text{signal} + \text{noise}
\end{aligned}
\]

where $m(s) \equiv E[Z(s)]$ is the mean function, which is usually a nonrandom quantity.

When we specify the distribution of $\epsilon(s)$ sufficiently, the distribution of $\{Z(s) : s \in D\}$ is specified. However, the usual random sampling assumptions are generally not appropriate: geostatistical data generally represent an incomplete sampling of a single realization. Some further assumptions about $Z(\cdot)$ must be made for inference to be possible, and one such assumption is stationarity (to be discussed later in detail).

2.1 Exploratory Data Analysis

1. Non-spatial summaries
- Numerical summaries: mean, median, standard deviation, range, etc.
- Graphic tools: stem-and-leaf plots, box plots, etc.

2. Descriptive statistics for spatial information

a) Methods mainly to explore large-scale variation:
- Plot of $Z_i$ versus each marginal coordinate
- Plot of the mean or median of $Z_i$ versus row index or column index (for data locations on a regular grid)
- 2-D or 3-D scatterplots: a plot of $Z_i$ vs. data location (for d = 3)
- Indicator maps: assign each data point to one of only two classes using two symbols
- Contour plots, greyscale maps, proportional symbol maps
- Spatial moving averages: estimation by averaging the values at neighboring sampled data points
- Nonparametric smoothing: kernel estimation (Bailey and Gatrell, section 2.3.2), LOESS (locally weighted polynomial regression)
- Mean or median polish
  - Requires a rectangular grid, say $p \times q$.
  - Decomposes the data: data = overall + row effect + column effect + residual (i.e. removes some trend, the large-scale variation).
  - Alternately subtract row means (medians) and column means (medians) and accumulate these in extra cells. Repeat this procedure until another iteration produces virtually no change.

b) Methods to explore small-scale variation:
- h-scatterplots (or same-lag scatterplots)
  - Methods to explore dependence
  - Require regular spacing between data locations
  - For a fixed vector $e$ of unit length and a scalar $h$, plot $Z(s_i + he)$ vs. $Z(s_i)$ for all $i$
  - May reveal the direction of dependence, outliers, or the existence of nonstationarity in the mean and/or variance

- 3-D plot of standard deviation versus spatial location, computed from a moving window
- Scatterplot of standard deviation vs. mean, computed from a moving window
- Semivariogram cloud
  - Plot $(Z(s_i) - Z(s_j))^2$ or $|Z(s_i) - Z(s_j)|^{1/2}$ vs. $\|s_i - s_j\|$ for all possible pairs of observations.
  - Note that this implicitly assumes some kind of stationarity.

e.g. Coal Ash Data (Cressie)
The data contain 208 coal ash core samples collected on a grid. Suppose $X$ = % coal ash, $Y_1$ = % coal ash of the neighbor to the East, and $Y_2$ = % coal ash of the second nearest neighbor to the East. Let $D_1^2 = (X - Y_1)^2$, $D_2^2 = (X - Y_2)^2$, etc. Make boxplots of $D_1^2, D_2^2, \ldots$ and put them side by side.

Empirical (or sample, or experimental) semivariogram (Matheron, 1962)
(Assume for now that the large-scale variation of $Z(\cdot)$ has been removed or is ignorable.)

\[
\hat\gamma(h) = \frac{1}{2|N(h)|} \sum_{N(h)} \{Z(s_i) - Z(s_j)\}^2
\]

where $N(h) = \{(s_i, s_j) : s_i - s_j = h;\ i, j = 1, 2, \ldots, n\}$ and $|N(h)|$ is the number of distinct pairs in $N(h)$.

Sample covariance function
The usual estimator is

\[
\hat C(h) = \frac{1}{|N(h)|} \sum_{N(h)} (Z(s_i) - \bar Z)(Z(s_j) - \bar Z)
\]

which is the spatial generalization of the sample autocovariance function used in time series analysis. (This will be discussed in more depth later.)
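As a rough illustration of the empirical semivariogram for data on a regular grid (as with the eastward neighbors in the coal ash example), the sketch below averages squared differences over all pairs separated by one fixed lag vector. The function name, the (row, column) lag convention, and the simulated white-noise field are assumptions made for this example, not part of the notes.

```python
import numpy as np

def empirical_semivariogram(z, lag):
    """Classical (Matheron) estimator on a regular grid.

    z   : 2-D array of observations, indexed as z[row, col]
    lag : (d_row, d_col) integer lag vector h, in grid units
    Returns (gamma_hat, n_pairs).
    """
    z = np.asarray(z, dtype=float)
    d_row, d_col = lag
    n_rows, n_cols = z.shape
    # Index ranges such that both s_i and s_i + h fall inside the grid.
    r0, r1 = max(0, -d_row), min(n_rows, n_rows - d_row)
    c0, c1 = max(0, -d_col), min(n_cols, n_cols - d_col)
    head = z[r0:r1, c0:c1]
    tail = z[r0 + d_row:r1 + d_row, c0 + d_col:c1 + d_col]
    diff = tail - head
    ok = ~np.isnan(diff)              # skip pairs with a missing value
    n_pairs = int(ok.sum())
    if n_pairs == 0:
        return np.nan, 0
    gamma_hat = np.sum(diff[ok] ** 2) / (2 * n_pairs)
    return gamma_hat, n_pairs

# Hypothetical field with no spatial dependence and variance 1:
# the semivariogram should be close to 1 at every nonzero lag.
rng = np.random.default_rng(0)
field = rng.normal(size=(200, 200))
for h in [(0, 1), (0, 2), (1, 0)]:
    g, n = empirical_semivariogram(field, h)
    print(h, round(g, 3), n)
```

Calling the function for a sequence of lags in one direction gives a directional sample semivariogram; missing grid cells can be coded as `np.nan`.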

2.2 Models

Stationarity

a) Strict stationarity: requires that the joint probability distribution of the data depends only on the relative positions of the sites at which the data were taken.

b) Second-order stationarity
  i) The variate's mean is constant.
  ii) The covariance between variates at two sites depends only on the sites' relative positions: $C(s, t) = C(s + h, t + h)$ for all $h$.

c) Intrinsic stationarity
  i) The mean is constant: $E[Z(s)] = \mu$ for all $s \in D$.
  ii) $\frac{1}{2}\mathrm{Var}[Z(s) - Z(t)]$ depends only on the lag difference $s - t$, for all $s, t \in D$.

Trend surface (Mean functions)

The first requirement for stationarity (that the spatial variate have a constant mean) does not seem reasonable in many cases. What seems more reasonable is that sites close to one another should have similar means, but sites far apart need not. This kind of local rather than global stationarity leads to the postulation of a continuous, relatively smooth, but nonconstant function for the mean.
- The conventional multiple regression model: $Z(s) = X(s)\beta + \epsilon(s)$
- A very useful class of mean functions is the polynomials, e.g. $m(x, y) = \beta_0 + \beta_1 x + \beta_2 y$
- Another kind of continuous (but less smooth) function is the surface that results from performing a median polish.
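The median polish mentioned in the last item (alternately sweeping row and column medians out of a rectangular table until nothing changes) might be sketched as follows. This follows the usual Tukey-style procedure; the function name, the stopping rule, and the toy table are assumptions for illustration.

```python
import numpy as np

def median_polish(z, max_iter=20, tol=1e-8):
    """Decompose a p x q table as overall + row effect + column effect + residual."""
    z = np.asarray(z, dtype=float)
    resid = z.copy()
    overall = 0.0
    row = np.zeros(z.shape[0])
    col = np.zeros(z.shape[1])
    for _ in range(max_iter):
        # Sweep the row medians out of the residuals into the row effects.
        r_med = np.median(resid, axis=1)
        resid -= r_med[:, None]
        row += r_med
        # Move the median of the column effects into the overall term.
        shift = np.median(col)
        col -= shift
        overall += shift
        # Sweep the column medians out of the residuals into the column effects.
        c_med = np.median(resid, axis=0)
        resid -= c_med[None, :]
        col += c_med
        # Move the median of the row effects into the overall term.
        shift = np.median(row)
        row -= shift
        overall += shift
        # Stop once a full sweep changes (virtually) nothing.
        if max(np.abs(r_med).max(), np.abs(c_med).max()) < tol:
            break
    return overall, row, col, resid

# Hypothetical 4 x 5 grid: additive trend plus one outlying cell.
row_eff = np.array([-1.0, 0.0, 1.0, 3.0])
col_eff = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
table = 10.0 + row_eff[:, None] + col_eff[None, :]
table[2, 4] += 50.0

overall, row, col, resid = median_polish(table)
print("overall:", overall)
print("row effects:", row)
print("column effects:", col)
print("residuals:\n", resid)
```

The fitted surface `overall + row[k] + col[l]` plays the role of the mean function, and the residuals are what one would pass on to a semivariogram analysis. Because medians are used, a single outlying cell tends to stay in the residuals rather than distort the fitted trend.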

An alternative to a parametric approach to modeling the mean function is a nonparametric approach using splines, LOESS, or a kernel estimator.

Recall the basic model $Z(s) = m(s) + \epsilon(s)$: if we assume that $\{\epsilon(s) : s \in D\}$ is a Gaussian process, then the distribution of $\{Z(s) : s \in D\}$ is completely specified. The convention in geostatistics is that the distribution of $\{Z(s) : s \in D\}$ is specified through its covariance function, as a function of the coordinates of the two corresponding sites.

Covariance functions

A covariance function needs to satisfy the following properties:

a) Evenness: $C(h) = C(-h)$ for all $h$.

b) Nonnegative definiteness:
\[
\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j C(s_i - s_j) \ge 0
\]
for all $n$, all sequences $\{a_i, i = 1, \ldots, n\}$, and all sequences of spatial locations $\{s_i, i = 1, \ldots, n\}$.

Properties a) and b) imply $C(0) \ge 0$ and $|C(h)| \le C(0)$ for all $h$.

Bochner's theorem: a continuous function is nonnegative definite if and only if it is the Fourier transform of a finite positive Borel measure.

Isotropy and Anisotropy

A stationary covariance function is called isotropic if the covariance between any two values depends only on the Euclidean distance $\|s - t\|$ between the locations, i.e., $C(h) = C(\|h\|)$. When the covariance also depends on the direction, it is called anisotropic.

Isotropic, parametric (valid) covariance function models

Let $r = \|h\|$ for convenience.

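Tent (triangular, piecewise linear) model (valid in $\mathbb{R}^1$ only)
\[
C(r; \theta) =
\begin{cases}
\theta_1 (1 - r/\theta_2) & \text{for } 0 \le r \le \theta_2 \\
0 & \text{for } \theta_2 < r
\end{cases}
\]

Spherical model
\[
C(r; \theta) =
\begin{cases}
\theta_1 \left(1 - \dfrac{3r}{2\theta_2} + \dfrac{r^3}{2\theta_2^3}\right) & \text{for } 0 \le r \le \theta_2 \\
0 & \text{for } \theta_2 < r
\end{cases}
\]

Exponential model
\[
C(r; \theta) = \theta_1 \exp(-\theta_2 r), \qquad \theta_1 \ge 0,\ \theta_2 \ge 0
\]

Gaussian model
\[
C(r; \theta) = \theta_1 \exp(-\theta_2 r^2), \qquad \theta_1 \ge 0,\ \theta_2 \ge 0
\]

Rational quadratic model
\[
C(r; \theta) = \theta_1 \left(\theta_2 - \frac{r^2}{1 + r^2/\theta_2}\right), \qquad \theta_1 \ge 0,\ \theta_2 \ge 0
\]

Matern class of models
\[
C(r; \theta) = \frac{\theta_1}{2^{\theta_3 - 1}\,\Gamma(\theta_3)} \left(\frac{2\theta_3^{1/2} r}{\theta_2}\right)^{\theta_3} K_{\theta_3}\!\left(\frac{2\theta_3^{1/2} r}{\theta_2}\right), \qquad \theta_1 \ge 0,\ \theta_2 \ge 0,\ \theta_3 > 0
\]
where $K_{\theta_3}$ is the modified Bessel function of the third kind of order $\theta_3$.

Cosine model
\[
C(r; \theta) = \theta_1 \cos(r/\theta_2), \qquad \theta_1 \ge 0,\ \theta_2 \ge 0
\]

Wave or hole-effect model
\[
C(r; \theta) = \theta_1 \theta_2 \frac{\sin(r/\theta_2)}{r}, \qquad \theta_1 \ge 0,\ \theta_2 \ge 0
\]

Note that we can construct more complicated models using the following rules:
- If $C_1(\cdot)$ and $C_2(\cdot)$ are valid covariance functions in $\mathbb{R}^d$, then so is $C(\cdot) \equiv C_1(\cdot) + C_2(\cdot)$.
- If $C_0(\cdot)$ is a valid covariance function in $\mathbb{R}^d$ and $b > 0$, then $C(\cdot) \equiv b\,C_0(\cdot)$ is a valid covariance function in $\mathbb{R}^d$.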

- If $C_1(\cdot)$ and $C_2(\cdot)$ are valid covariance functions in $\mathbb{R}^{d_1}$ and $\mathbb{R}^{d_2}$ respectively, then $C(\cdot) \equiv C_1(\cdot)\,C_2(\cdot)$ is a valid covariance function in $\mathbb{R}^{d_1 + d_2}$.
- A valid isotropic covariance function in $\mathbb{R}^{d_1}$ may not be a valid isotropic covariance function in $\mathbb{R}^{d_2}$ where $d_2 > d_1$. However, the converse is true: a function that is valid in $\mathbb{R}^{d_2}$ is also valid in $\mathbb{R}^{d_1}$.

With the exception of the tent and cosine models, which are valid in $\mathbb{R}^1$ only, all the models listed above are valid in $\mathbb{R}^2$ and $\mathbb{R}^3$.

Semivariogram

Traditionally, geostatistical practitioners have adopted a slightly more general kind of stationarity assumption (intrinsic stationarity) than second-order stationarity, and they have modeled the small-scale dependence through a function (the semivariogram) somewhat different from the covariance function:
\[
\gamma(s - t) = \frac{1}{2}\mathrm{Var}[Z(s) - Z(t)].
\]
The function $2\gamma(\cdot)$ is called the variogram. When the process is intrinsically stationary, the semivariogram can also be expressed as
\[
\gamma(h) = \frac{1}{2} E[Z(s) - Z(t)]^2, \qquad h = s - t.
\]

A second-order stationary random process with covariance function $C(\cdot)$ is intrinsically stationary, with semivariogram
\[
\gamma(h) = C(0) - C(h),
\]
but the converse is not true in general. That is, there exist processes that are intrinsically stationary but not second-order stationary.

The semivariogram must satisfy the following properties:

a) It vanishes at 0, i.e. $\gamma(0) = 0$.

b) Evenness: $\gamma(h) = \gamma(-h)$.

c) It needs to be conditionally negative definite; that is, it must satisfy
\[
\sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \lambda_j \gamma(s_i - s_j) \le 0
\]
for each set of locations $s_1, \ldots, s_n$ and all $\lambda_1, \ldots, \lambda_n$ such that $\sum_{i=1}^{n} \lambda_i = 0$.

d) $\lim_{\|h\| \to \infty} \gamma(h)/\|h\|^2 = 0$.
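The validity conditions above can be checked numerically for a particular model and set of sites. The sketch below is only an illustration under assumed values (the exponential model with $\theta_1 = 2$, $\theta_2 = 0.5$ and 30 randomly placed sites): it builds the covariance matrix, derives the semivariogram through $\gamma(h) = C(0) - C(h)$, and evaluates the two quadratic forms.

```python
import numpy as np

def exp_cov(r, theta1, theta2):
    """Exponential covariance model C(r) = theta1 * exp(-theta2 * r)."""
    return theta1 * np.exp(-theta2 * r)

# Hypothetical configuration: 30 sites placed at random in a 10 x 10 region.
rng = np.random.default_rng(1)
sites = rng.uniform(0.0, 10.0, size=(30, 2))

# Matrix of pairwise Euclidean distances ||s_i - s_j||.
dist = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)

C = exp_cov(dist, theta1=2.0, theta2=0.5)   # covariance matrix {C(s_i - s_j)}
G = C[0, 0] - C                             # semivariogram matrix via C(0) - C(h)

# Nonnegative definiteness: all eigenvalues of C should be >= 0.
min_eig = np.linalg.eigvalsh(C).min()

# Conditional negative definiteness: lam' G lam <= 0 whenever sum(lam) = 0.
lam = rng.normal(size=30)
lam -= lam.mean()                           # enforce the sum-to-zero constraint
quad = lam @ G @ lam

print("smallest eigenvalue of C:", min_eig)
print("quadratic form for gamma:", quad)
```

A check of this kind on one configuration cannot prove that a function is valid, but a negative eigenvalue is enough to show that a candidate function is not a covariance function for those sites.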

Attributes of the semivariogram
- Nugget effect: microscale variability
- Sill (= partial sill + nugget effect)
- Range or effective range: the range of an isotropic semivariogram (or covariance function) is defined as the distance beyond which the correlation is equal to 0. Of the models listed, only the tent and spherical models have a range (which is equal to $\theta_2$). For isotropic models that do not have a range, the effective range, if one exists, is defined as the distance beyond which the correlation does not exceed 0.05, i.e. at which the semivariogram reaches 95% of the variance (or $C(0)$, the partial sill). The exponential, Gaussian, rational quadratic, and Matern models all have effective ranges, but the cosine model does not.
- Slope

Examples of valid isotropic semivariogram models

Tent (valid in $\mathbb{R}^1$ only)
\[
\gamma(r; \theta) =
\begin{cases}
\theta_1 r/\theta_2 & \text{for } 0 \le r \le \theta_2 \\
\theta_1 & \text{for } \theta_2 < r
\end{cases}
\]

Linear
\[
\gamma(r; \theta_1) = \theta_1 r, \qquad \theta_1 \ge 0
\]

Power
\[
\gamma(r; \theta) = \theta_1 r^{\theta_2}, \qquad \theta_1 \ge 0,\ 0 \le \theta_2 < 2
\]

Spherical
\[
\gamma(r; \theta) =
\begin{cases}
\theta_1 \left(\dfrac{3r}{2\theta_2} - \dfrac{r^3}{2\theta_2^3}\right) & \text{for } 0 \le r \le \theta_2 \\
\theta_1 & \text{for } \theta_2 < r
\end{cases}
\]

Exponential
\[
\gamma(r; \theta) = \theta_1 \{1 - \exp(-\theta_2 r)\}, \qquad \theta_1 \ge 0,\ \theta_2 \ge 0
\]

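Gaussian model
\[
\gamma(r; \theta) = \theta_1 \{1 - \exp(-\theta_2 r^2)\}, \qquad \theta_1 \ge 0,\ \theta_2 \ge 0
\]

Rational quadratic model
\[
\gamma(r; \theta) = \theta_1 \frac{r^2}{1 + r^2/\theta_2}, \qquad \theta_1 \ge 0,\ \theta_2 \ge 0
\]

Cosine model
\[
\gamma(r; \theta) = \theta_1 \{1 - \cos(r/\theta_2)\}, \qquad \theta_1 \ge 0,\ \theta_2 \ge 0
\]

Wave or hole-effect model
\[
\gamma(r; \theta) = \theta_1 \left\{1 - \theta_2 \frac{\sin(r/\theta_2)}{r}\right\}, \qquad \theta_1 \ge 0,\ \theta_2 \ge 0
\]

Matern class of models
\[
\gamma(r; \theta) = \theta_1 \left\{1 - \frac{1}{2^{\theta_3 - 1}\,\Gamma(\theta_3)} \left(\frac{2\theta_3^{1/2} r}{\theta_2}\right)^{\theta_3} K_{\theta_3}\!\left(\frac{2\theta_3^{1/2} r}{\theta_2}\right)\right\}, \qquad \theta_1 \ge 0,\ \theta_2 \ge 0,\ \theta_3 > 0
\]

- The exponential model is a special case of the Matern model with $\theta_3 = 1/2$; the Gaussian model is the limiting case of the Matern model as $\theta_3 \to \infty$.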
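A few of the semivariogram models above, written as functions in the same $(\theta_1, \theta_2)$ parameterization, may make the role of the range concrete. The sketch is illustrative only. It assumes the common convention that the effective range is the distance at which the semivariogram reaches 95% of its sill, and the two `effective_range_*` helpers are names introduced here.

```python
import numpy as np

def spherical(r, theta1, theta2):
    """Spherical semivariogram: reaches the sill theta1 exactly at r = theta2."""
    r = np.asarray(r, dtype=float)
    inside = theta1 * (1.5 * r / theta2 - 0.5 * (r / theta2) ** 3)
    return np.where(r <= theta2, inside, theta1)

def exponential(r, theta1, theta2):
    """Exponential semivariogram: approaches the sill theta1 asymptotically."""
    r = np.asarray(r, dtype=float)
    return theta1 * (1.0 - np.exp(-theta2 * r))

def gaussian(r, theta1, theta2):
    """Gaussian semivariogram: parabolic near the origin."""
    r = np.asarray(r, dtype=float)
    return theta1 * (1.0 - np.exp(-theta2 * r ** 2))

def effective_range_exponential(theta2):
    # Solve 1 - exp(-theta2 * r) = 0.95 for r.
    return np.log(20.0) / theta2

def effective_range_gaussian(theta2):
    # Solve 1 - exp(-theta2 * r**2) = 0.95 for r.
    return np.sqrt(np.log(20.0) / theta2)

theta1, theta2 = 2.0, 0.5
r_exp = effective_range_exponential(theta2)
r_gau = effective_range_gaussian(theta2)
print("exponential effective range:", r_exp, "->", exponential(r_exp, theta1, theta2))
print("Gaussian effective range:   ", r_gau, "->", gaussian(r_gau, theta1, theta2))
print("spherical at its range 3.0: ", spherical(3.0, theta1, 3.0))
```

For the exponential model this gives an effective range of about $3/\theta_2$, which is the usual rule of thumb.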

[Figure: four panels showing the exponential, Gaussian, spherical, and power semivariograms plotted against lag h; the power panel is labelled with several values of ω.]

Modeling anisotropy

a) Range anisotropy
- Most often seen in practice (sill and nugget are the same).
- Geometric anisotropy is the easiest to model. Any valid isotropic model can be generalized to make it geometrically anisotropic, e.g.
\[
C(h; \theta) = \theta_1 \exp\left[-\theta_2 \left(h_1^2 + \theta_3 h_1 h_2 + \theta_4 h_2^2\right)^{1/2}\right]
\]

b) Sill anisotropy
- Either the assumption of second-order stationarity is violated, or there are measurement errors which are correlated or do not have mean zero.

c) Nugget anisotropy
- Can be caused by correlated measurement errors.
- Typically occurs in one direction only.

d) Slope anisotropy
- Can be dealt with in a similar fashion to geometric anisotropy.

Other types of anisotropy
i) Geometric anisotropy: a covariance function is geometrically anisotropic if a positive definite matrix $A$ exists such that $C(h) = C([h'Ah]^{1/2})$ for all $h$.
ii) Zonal anisotropy

Estimation of $C(\cdot)$ and $\gamma(\cdot)$ (revisited)

Empirical (or sample, or experimental) semivariogram

For a sample from a given realization of $Z(\cdot)$ where the mean function is taken to be constant, the empirical semivariogram is an unbiased estimator of the semivariogram, given by
\[
\hat\gamma(h) = \frac{1}{2|N(h)|} \sum_{s_i - s_j = h} \{Z(s_i) - Z(s_j)\}^2 .
\]

When a non-constant trend is assumed, the sample semivariogram is computed from the residuals
\[
\hat\epsilon(s_i) = Z(s_i) - m(s_i; \hat\beta), \qquad i = 1, \ldots, n,
\]
as
\[
\hat\gamma(h) = \frac{1}{2|N(h)|} \sum_{s_i - s_j = h} \{\hat\epsilon(s_i) - \hat\epsilon(s_j)\}^2 .
\]

This estimator is unbiased for the semivariogram (assuming the correct mean function has been adopted); it is a method-of-moments type estimator.

When the data locations are irregularly spaced, we partition the lag space $H = \{s - t : s, t \in D\}$ into lag classes or windows $H_1, \ldots, H_k$, say, and assign each lag in the data set to one of these classes. For irregularly spaced data, the estimator is only approximately unbiased, because the grouping (binning) of lags into classes causes a blurring effect. We need to replace $s_i - s_j = h$ with $s_i - s_j \in T(h)$, where $T(h)$ is a tolerance region about $h$:
\[
\gamma^{+}(h_l) = \frac{1}{2}\,\mathrm{AVG}\{[Z(s_i) - Z(s_j)]^2 : s_i - s_j \in T(h_l)\}
\]

Two main types of partitions:
1. Polar partitioning, i.e. angle and distance classes
2. Rectangular partitioning

Rules of thumb to be considered (Journel and Huijbregts, 1978):
i) The empirical semivariogram should be considered only for distances for which the number of pairs is greater than (about) 30.
ii) The distance of reliability is half the maximum distance over the field of data.

Robust semivariogram estimators
- Cressie and Hawkins (1984):
\[
\bar\gamma(h) = \frac{\left\{\dfrac{1}{|N(h)|} \displaystyle\sum_{N(h)} |\hat\epsilon(s_i) - \hat\epsilon(s_j)|^{1/2}\right\}^{4}}{2\,[0.457 + 0.494/|N(h)|]}
\]
- Genton (1998)

Sample covariance function

Recall that the estimator is given by
\[
\hat C(h) = \frac{1}{|N(h)|} \sum_{N(h)} (Z(s_i) - \bar Z)(Z(s_j) - \bar Z)
\]

This estimator is biased even for regularly spaced data and is meaningful only if the process is second-order stationary.

NOTE: $\hat\gamma(h) \ne \hat C(0) - \hat C(h)$

Correlation function (Correlogram)
\[
\rho(h) = C(h)/C(0)
\]

Isotropy means that the semivariance depends only on the distance between points, not on direction. Anisotropy means that the semivariance also depends on direction.

Checking for isotropy
- Superimposition of directional sample semivariograms
- Rose diagram: consists of smoothing the sample semivariograms and then connecting, with a smooth curve in the lag space, those lag vectors $h$ for which the smoothed semivariograms are roughly equal. In effect, this plots estimated isocorrelation contours (in the case of a second-order stationary process).

[Figure: Geostatistical Data: Anisotropy. Directional semivariograms; axes: distance (h), degrees; semivariance.]
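For irregularly spaced data, the binning into distance and angle classes (polar partitioning) and the superimposition of directional semivariograms can be sketched as below. This is a minimal illustration, not the notes' own procedure: the function name, the column layout of the result, the angle convention (degrees counterclockwise from the x-axis), and the simulated data are assumptions. The robust column uses the Cressie and Hawkins form with the constants 0.457 and 0.494.

```python
import numpy as np

def binned_semivariogram(coords, z, bin_edges, direction=None, angle_tol=22.5):
    """Binned empirical semivariogram for irregularly spaced 2-D data.

    Returns one row per distance bin with columns
    [mean distance, number of pairs, classical estimate, robust estimate].
    If `direction` (degrees) is given, only pairs whose lag vector lies
    within `angle_tol` degrees of that direction are used.
    """
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)          # each distinct pair once
    d = coords[j] - coords[i]
    dist = np.hypot(d[:, 0], d[:, 1])
    absdiff = np.abs(z[j] - z[i])
    keep = np.ones(dist.shape, dtype=bool)
    if direction is not None:
        # Lag vectors h and -h are equivalent, so work modulo 180 degrees.
        ang = np.degrees(np.arctan2(d[:, 1], d[:, 0])) % 180.0
        delta = np.abs(ang - (direction % 180.0))
        delta = np.minimum(delta, 180.0 - delta)
        keep = delta <= angle_tol
    n_bins = len(bin_edges) - 1
    which = np.digitize(dist, bin_edges) - 1     # bin index of every pair
    out = np.full((n_bins, 4), np.nan)
    for b in range(n_bins):
        sel = keep & (which == b)
        n = int(sel.sum())
        out[b, 1] = n
        if n == 0:
            continue
        out[b, 0] = dist[sel].mean()
        out[b, 2] = 0.5 * np.mean(absdiff[sel] ** 2)
        out[b, 3] = np.mean(np.sqrt(absdiff[sel])) ** 4 / (2.0 * (0.457 + 0.494 / n))
    return out

# Hypothetical data without spatial structure, 300 sites in a 10 x 10 region.
rng = np.random.default_rng(3)
xy = rng.uniform(0.0, 10.0, size=(300, 2))
vals = rng.normal(size=300)
edges = np.linspace(0.0, 5.0, 11)                # up to half the maximum extent

omni = binned_semivariogram(xy, vals, edges)
east = binned_semivariogram(xy, vals, edges, direction=0.0)
north = binned_semivariogram(xy, vals, edges, direction=90.0)
print(np.round(omni, 3))
```

Plotting the `east` and `north` estimates on the same axes is one way to carry out the superimposition check; bins with fewer than about 30 pairs would be ignored under the rule of thumb quoted earlier.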

2.3 Estimation for geostatistical models

In summary, the general (or classical) model we use for the analysis of geostatistical data is
\[
Z(s) = m(s; \beta) + \epsilon(s)
\]
where $m(\cdot; \beta)$ is a specified family of continuous functions, $\beta$ is a vector of unknown parameters, $\{\epsilon(s) : s \in D\}$ is an intrinsically (or second-order) stationary process with mean zero and semivariogram $\gamma(\cdot; \theta)$ (or covariance function $C(\cdot; \theta)$), and $\theta$ is a vector of unknown parameters.

Overview of the geostatistical method:
i) Using exploratory techniques, prior knowledge, etc., set up an appropriate model (e.g. the model given above) with assumptions on the mean function and the stationarity of the process that generated the data.
ii) Estimate $\beta$ for the mean function (if it is not assumed to be constant) to obtain $\hat\beta$ (e.g. by ordinary least squares or median polish).
iii) Obtain the fitted residuals $\hat\epsilon(s_i) = Z(s_i) - m(s_i; \hat\beta)$. Compute the empirical semivariogram of the residuals.
iv) Select a valid semivariogram model that is compatible with the plot from the previous step. Fit the chosen model to the empirical semivariogram to estimate the model's parameters.
v) Using the fitted semivariogram model, re-estimate $\beta$ by generalized least squares (or some other method which accounts for correlation among observations).
vi) Repeat steps iii) to v) if needed.
vii) Predict ("krige") unobserved values at sites (or over regions) of interest and estimate the corresponding variances of prediction error. Determine optimal locations at which to take additional observations, and repeat the above steps if needed.

Semivariogram Model Fitting

Although the empirical semivariogram is unbiased for the semivariogram, it may not be conditionally negative definite. Neither the sample semivariogram nor the sample covariance function can be used directly for statistical inference, e.g. spatial prediction (kriging).

Instead, fit a valid semivariogram model to the sample semivariogram.

Methods of fitting

i) By inspection (by eye)

ii) Ordinary nonlinear least squares (OLS)
\[
\min_{\theta} \sum_{h} [\hat\gamma(h) - \gamma(h; \theta)]^2
\]
Semivariogram estimates are correlated!

iii) Weighted nonlinear least squares (WNLS; Cressie, 1985)
A weighted nonlinear least squares estimator is defined as a value $\hat\theta$ that minimizes the weighted residual sum of squares function
\[
\sum_{h} |N(h)|\, \frac{[\hat\gamma(h) - \gamma(h; \theta)]^2}{[\gamma(h; \theta)]^2} .
\]
Note that the nonparametric estimates at large lags tend to receive relatively less weight.

iv) Generalized nonlinear least squares (GLS)
\[
\min_{\theta}\ [\hat\gamma - \gamma(\theta)]'\, [\widehat{\mathrm{Var}}(\hat\gamma)]^{-1}\, [\hat\gamma - \gamma(\theta)]
\]
- Derivation and calculation of $\widehat{\mathrm{Var}}(\hat\gamma)$?

v) Maximum likelihood (ML) / Restricted maximum likelihood (REML)
- Assuming normality for the model $Z = X\beta + \epsilon$, the log-likelihood (up to an additive constant) is
\[
L(\beta, \theta; Z) = -\frac{1}{2} \log|V| - \frac{1}{2} (Z - X\beta)' V^{-1} (Z - X\beta)
\]
where $V = V(\theta)$ denotes the covariance matrix of $Z = (Z_1, \ldots, Z_n)'$ and $X$ is the model matrix for the covariates.
- Estimates $\theta$ and $\beta$ simultaneously by finding the values that maximize $L(\beta, \theta)$.
- Applicable to processes with second-order stationary errors only.
- The restricted MLE (REML estimator) maximizes the log-likelihood function associated with $n - \mathrm{rank}(X)$ linearly independent error contrasts. It is known to be less biased than the MLE and is thus often preferred, especially when $\mathrm{rank}(X)$ is appreciable relative to $n$.
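The weighted least squares criterion in iii) can be made concrete with a small sketch for the exponential semivariogram. To stay dependency-free it minimizes the criterion by a plain grid search, which is a simplification chosen for this example (in practice a numerical optimizer would normally be used). The lag values, pair counts, and noise-free "empirical" values are hypothetical.

```python
import numpy as np

def exp_semivariogram(r, theta1, theta2):
    """Exponential model gamma(r) = theta1 * (1 - exp(-theta2 * r))."""
    return theta1 * (1.0 - np.exp(-theta2 * r))

def wls_criterion(theta, lags, gamma_hat, n_pairs):
    """Weighted residual sum of squares: sum |N(h)| (g_hat - g)^2 / g^2."""
    model = exp_semivariogram(lags, *theta)
    return np.sum(n_pairs * (gamma_hat - model) ** 2 / model ** 2)

def fit_wls_grid(lags, gamma_hat, n_pairs, theta1_grid, theta2_grid):
    """Return the grid point (theta1, theta2) with the smallest criterion."""
    best, best_val = None, np.inf
    for t1 in theta1_grid:
        for t2 in theta2_grid:
            val = wls_criterion((t1, t2), lags, gamma_hat, n_pairs)
            if val < best_val:
                best, best_val = (float(t1), float(t2)), val
    return best, best_val

# Hypothetical empirical semivariogram: exact model values at lags 1..8
# (theta1 = 2.0, theta2 = 0.5) with pair counts that shrink at long lags.
lags = np.arange(1.0, 9.0)
n_pairs = np.array([400, 380, 350, 300, 240, 180, 120, 60])
gamma_hat = exp_semivariogram(lags, 2.0, 0.5)

theta_hat, crit = fit_wls_grid(
    lags, gamma_hat, n_pairs,
    theta1_grid=np.linspace(0.5, 4.0, 36),   # candidate sills (step 0.1)
    theta2_grid=np.linspace(0.1, 1.5, 15),   # candidate decay rates (step 0.1)
)
print("fitted (theta1, theta2):", theta_hat, "criterion:", crit)
```

Dividing by the squared model value is what down-weights the large lags, where the fitted semivariogram is largest and the pair counts are usually smallest.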

Model Selection Procedures
- Visual inspection of the semivariogram plot
- Minimized weighted (or generalized) residual sum of squares function
- Maximized log-likelihood (or restricted log-likelihood) function
- Penalized likelihood criteria, e.g. Akaike's criterion:
\[
\mathrm{AIC} = L(\hat\beta, \hat\theta) - (\text{no. of estimated parameters})
\]

Estimating the large-scale variation

If the mean function $m(s; \beta)$ is a linear or nonlinear function of the elements of $\beta$, then linear or nonlinear least squares can be used to fit the model to the data. This is called trend surface analysis. This approach is quite easy to implement owing to the wide availability of computing software (e.g. PROC REG in SAS or lm in S-Plus, etc.).

Other approaches:

Median polish
The mean function is taken to be $m(x_l, y_k; \beta) = a + r_k + c_l$.

Locally weighted least squares (LOESS)
- Only assumes that the mean function is smooth.
- Estimates the smooth trend in a moving fashion by fitting a site-specific first-order or second-order polynomial to only the data most proximate to a site.
- Fits using weighted least squares, with weights inversely related to distance from the site.

Kernel estimator
A type of local smoother which calculates a weighted average of the observations near a target point $s$:
\[
\sum_{i=1}^{n} \frac{1}{b^2}\, k\!\left(\frac{s - s_i}{b}\right) z_i
\]

where $k(\cdot)$ is called a kernel function, or simply a kernel, satisfying some moment conditions (e.g. a quadratic or uniform kernel).

Smoothing splines
An estimator which minimizes a functional criterion (a penalized residual sum of squares) so as to fit the data well while retaining some degree of smoothness.

Spatial Regression

i) Generalized least squares (GLS) with known covariance matrix
Model: $Z = X\beta + \epsilon$, $E(\epsilon) = 0$, $\mathrm{Var}(\epsilon) = V(\theta)$, where $V = V(\theta)$ is a completely specified positive definite matrix.
- GLS estimator of $\beta$: $\hat\beta_{GLS} = (X'V^{-1}X)^{-1}X'V^{-1}Z$

ii) Estimated generalized least squares (EGLS)
In practice, the true value of $\theta$, and consequently $V$, is rarely known. A natural way to deal with this problem is to replace $\theta$ in the evaluation of $V$ by an estimator $\hat\theta$, thereby obtaining $\hat V$.
- EGLS estimator of $\beta$: $\hat\beta_{EGLS} = (X'\hat V^{-1}X)^{-1}X'\hat V^{-1}Z$

Example: ***

Mean structure or Covariance structure?

As mentioned previously, in practice the decomposition of the data into large-scale and small-scale variation is not clear-cut. This problem is often summarized as follows (Statistics for Spatial Data, Cressie): "One man's mean structure is another man's covariance structure." If replications of a spatial process are available, statistical procedures exist for distinguishing between the two structures. In practice, however, geostatistical data are not usually replicated, so we must settle for plausibility rather than a high degree of certainty.
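The GLS and EGLS estimators above differ only in whether the true or a plugged-in covariance matrix is supplied, so one function covers both. The sketch below is a minimal illustration under assumed inputs: a first-order trend surface as the design matrix, an exponential covariance with assumed parameters, and simulated data. The function name `gls` is introduced here.

```python
import numpy as np

def gls(X, z, V):
    """Return (X' V^-1 X)^-1 X' V^-1 z and the matrix (X' V^-1 X)^-1."""
    Vinv_X = np.linalg.solve(V, X)       # V^-1 X without forming V^-1
    Vinv_z = np.linalg.solve(V, z)       # V^-1 z
    XtVinvX = X.T @ Vinv_X
    beta = np.linalg.solve(XtVinvX, X.T @ Vinv_z)
    cov_beta = np.linalg.inv(XtVinvX)    # covariance of beta-hat when V is correct
    return beta, cov_beta

# Hypothetical setting: 50 sites, trend m(x, y) = b0 + b1*x + b2*y,
# errors with exponential covariance C(r) = exp(-0.5 r).
rng = np.random.default_rng(2)
sites = rng.uniform(0.0, 10.0, size=(50, 2))
X = np.column_stack([np.ones(50), sites])
dist = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
V = 1.0 * np.exp(-0.5 * dist)            # in EGLS this would be V(theta-hat)

beta_true = np.array([3.0, 0.5, -0.2])
errors = np.linalg.cholesky(V) @ rng.normal(size=50)   # correlated errors
z = X @ beta_true + errors

beta_gls, cov_gls = gls(X, z, V)
beta_ols, _ = gls(X, z, np.eye(50))      # V = I reduces GLS to OLS
print("GLS:", np.round(beta_gls, 3))
print("OLS:", np.round(beta_ols, 3))
print("GLS standard errors:", np.round(np.sqrt(np.diag(cov_gls)), 3))
```

In the iterative scheme of the geostatistical method, `V` would be rebuilt from the semivariogram fitted to the current residuals before each re-estimation of $\beta$.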

2.4 Spatial Prediction (Kriging)

Goal: predict a value of $Z(s)$ at $s_0$ (an arbitrary location in $D$).

Spatial prediction usually refers to interpolating, rather than extrapolating, a value of a random spatial process. The main idea relies on a form of weighted averaging in which the weights are chosen such that the error associated with the predictor is less than for any other linear sum.

The term kriging comes from D.G. Krige, a South African mining engineer who in the 1950s developed empirical methods for predicting ore grades at unsampled locations using the known grades of ore sampled at nearby sites.

For kriging:
i) First choose a parametric model for the semivariogram or covariance function.
ii) Estimate the semivariogram (covariance) parameters.
iii) Make predictions and uncertainty estimates given the parameter estimates.

The types of kriging:
a. Simple kriging: assumes a constant, known mean. It is not often used because, for the unbiasedness constraint to be applicable in the kriging equations, the expected value must in practice be estimated.
b. Ordinary kriging: assumes a constant, unknown mean (the mean needs to be estimated).
c. Universal kriging: assumes a trend in x and y, and may include other spatially varying covariates.

1. Ordinary Kriging (O.K.) by D.G. Krige

Basic assumptions:
i) The mean function is assumed to be constant.
ii) The semivariogram is assumed to be known.

Restrictions to obtain an ordinary kriging predictor:

i) It is a linear combination of the data values.
ii) It is unbiased.
iii) It minimizes the variance of the prediction error among all functions satisfying the above two properties:
$$\min \mathrm{Var}\Big[\sum_{i=1}^{n} \lambda_i Z(s_i) - Z(s_0)\Big] \quad \text{subject to} \quad \sum_{i=1}^{n} \lambda_i = 1.$$
Then,
$$\hat Z(s_0) = \sum_{i=1}^{n} \lambda_i Z(s_i) \quad \text{with} \quad E[\hat Z(s_0)] = \mu.$$
Kriging gives us the best linear unbiased predictor (BLUP) at any new location $s_0$. Using the method of Lagrange multipliers (from calculus), it can be shown that the optimal coefficients $\lambda_1, \ldots, \lambda_n$ are the first $n$ elements of the vector $\lambda_o$ that satisfies the following system of linear equations, known as the ordinary kriging equations:
$$\Gamma_o \lambda_o = \gamma_o$$
where
$$\lambda_o = (\lambda_1, \ldots, \lambda_n, m)'$$
$$\gamma_o = [\gamma(s_1 - s_0), \ldots, \gamma(s_n - s_0), 1]'$$
$$(\Gamma_o)_{ij} = \begin{cases} \gamma(s_i - s_j) & \text{for } i = 1, \ldots, n;\ j = 1, \ldots, n \\ 1 & \text{for } i = n + 1;\ j = 1, \ldots, n \\ 0 & \text{for } i = n + 1;\ j = n + 1 \end{cases}$$
and $m$ is a Lagrange multiplier and $\Gamma_o$ is symmetric.
The minimized variance, called the kriging variance, is given by
$$\sigma^2_{OK}(s_0) = \sum_{i=1}^{n} \lambda_i \gamma(s_i - s_0) + m = \lambda_o' \gamma_o.$$
PAGE 28
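The ordinary kriging equations can be assembled and solved directly. The sketch below is illustrative only: the function name `ordinary_kriging`, the four sites on a square, the data values and the exponential semivariogram are assumptions, and the semivariogram is treated as known with no nugget.

```python
import numpy as np

def ordinary_kriging(sites, z, s0, gamma):
    """Solve Gamma_o lambda_o = gamma_o; return (prediction, kriging variance)."""
    n = len(z)
    dist = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=2)
    G = np.ones((n + 1, n + 1))      # last row and column of ones
    G[:n, :n] = gamma(dist)          # gamma(s_i - s_j) block
    G[n, n] = 0.0                    # corner entry
    g = np.ones(n + 1)               # last element is 1
    g[:n] = gamma(np.linalg.norm(sites - s0, axis=1))
    lam = np.linalg.solve(G, g)      # (lambda_1, ..., lambda_n, m)
    return float(lam[:n] @ z), float(lam @ g)

# Hypothetical configuration: four sites on a square, prediction at the centre.
gamma = lambda h: 1.0 - np.exp(-h / 2.0)   # assumed semivariogram model
sites = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
z = np.array([1.0, 3.0, 2.0, 4.0])
pred, var = ordinary_kriging(sites, z, np.array([1.0, 1.0]), gamma)
print(pred, var)
```

Because the prediction location is equidistant from all four sites, the weights are equal by symmetry and the prediction is the sample mean. Predicting at a sampled site returns the observed value with zero kriging variance, since no nugget is assumed.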

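Example:
[Figure: plot of the example configuration, with axes x and y.]
Take $\gamma(h) = 1 - \exp(-\|h\|/2)$. The semivariogram entries that appear in the ordinary kriging system are
- in $\Gamma_o$: $1 - \exp(-\sqrt{5}/2)$, $1 - \exp(-1/2)$, $1 - \exp(-1)$
- in $\gamma_o$: $1 - \exp(-\sqrt{2}/2)$, $1 - \exp(-1)$, $1 - \exp(-\sqrt{2}/2)$
$\lambda_o = \Gamma_o^{-1} \gamma_o =$
$\sigma^2_{OK}(s_0) = \lambda_o' \gamma_o =$
PAGE 29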

Alternative expressions for the ordinary kriging predictor and prediction variance, which do not involve the unknown Lagrange multiplier, are given below. Define
$$\lambda = (\lambda_1, \ldots, \lambda_n)', \quad \gamma = (\gamma(s_1 - s_0), \ldots, \gamma(s_n - s_0))', \quad \Gamma = \{\gamma(s_i - s_j)\}.$$
Then it can be shown that
$$m = -\frac{1 - \mathbf{1}' \Gamma^{-1} \gamma}{\mathbf{1}' \Gamma^{-1} \mathbf{1}}, \qquad \lambda = \Gamma^{-1}\Big[\gamma + \mathbf{1}\,\frac{1 - \mathbf{1}' \Gamma^{-1} \gamma}{\mathbf{1}' \Gamma^{-1} \mathbf{1}}\Big].$$
So the OK predictor can be obtained as
$$\hat Z(s_0) = \Big[\gamma + \mathbf{1}\,\frac{1 - \mathbf{1}' \Gamma^{-1} \gamma}{\mathbf{1}' \Gamma^{-1} \mathbf{1}}\Big]' \Gamma^{-1} Z$$
and the kriging variance is
$$\sigma^2_{OK}(s_0) = \gamma' \Gamma^{-1} \gamma - \frac{(1 - \mathbf{1}' \Gamma^{-1} \gamma)^2}{\mathbf{1}' \Gamma^{-1} \mathbf{1}}.$$
A $100(1 - \alpha)\%$ prediction interval for $Z(s_0)$, assuming the random field is Gaussian, is
$$\hat Z(s_0) \pm z_{\alpha/2}\, \sigma_{OK}(s_0)$$
where $z_{\alpha/2}$ is the upper $\alpha/2$ percentage point of the standard normal distribution.

Remarks: Ordinary kriging is derived under the assumption of a constant mean. This assumption will be relaxed later in the discussion of universal kriging. It is also derived under the assumption that the semivariogram is known. In practice, it is rarely known and must be estimated, and the estimator $\hat\gamma(\cdot)$ replaces $\gamma(\cdot)$ in the kriging equations and the kriging variance. However, the estimated kriging variance tends to underestimate the prediction error variance of the estimated OK predictor, because it does not account for the error incurred in estimating $\theta$.

Example continued from p 20:
PAGE 30

Suppose we wish to minimize the kriging variance at $s_0$ (a new site inside the sampling configuration), and we have sufficient resources to take an observation at any one of the remaining unsampled sites (excluding $s_0$). Kriging variances at $s_0$ corresponding to the addition of each of the sites A, B, C and D are:
[Table of kriging variances for sites A to D.]

Cross Validation
Cross validation is a method of evaluating the aptness of a spatial correlation model using only data from the sample. It can be used for evaluating choices of search radius, lag tolerance, etc.
Procedure:
i) For location $s_i$, omit $z_i$ from the data set temporarily.
ii) Predict $Z(s_i) = z_i$ from the remaining points and call the prediction $\hat z_i$.
iii) Compare the prediction $\hat z_i$ to $z_i$.
iv) Repeat the above steps for all points $i = 1, \ldots, n$ in the sample.
v) Compute summary statistics and graphs of the cross-validation error distribution.
Summary statistics:
1. Average of the prediction sum of squares (PRESS): $\frac{1}{n} \sum_{i=1}^{n} (z_i - \hat z_i)^2$, where $\hat z_i$ is the prediction of $z_i$ from the rest of the data.
2. Mean of standardized PRESS residuals: $\frac{1}{n} \sum_{i=1}^{n} (z_i - \hat z_i)/\hat\sigma_i$, where $\hat\sigma_i^2$ is the mean squared prediction error for predicting $z_i$ from the rest.
3. Root mean squared standardized prediction residuals: $\sqrt{\frac{1}{n} \sum_{i=1}^{n} \big(\frac{z_i - \hat z_i}{\hat\sigma_i}\big)^2}$
4. Histograms, scatterplots, or maps of PRESS residuals or standardized PRESS residuals.
Cautions: The model that appears best may depend on which summary statistic is used.
PAGE 31
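The leave-one-out procedure and the first three summary statistics can be sketched as below. This is an illustration under stated assumptions: ordinary kriging with a known exponential semivariogram is used as the predictor, and the names `ok_predict` and `loo_cross_validation`, the six sites and the data values are all hypothetical.

```python
import numpy as np

def ok_predict(sites, z, s0, gamma):
    """Ordinary kriging prediction and kriging variance at s0."""
    n = len(z)
    G = np.ones((n + 1, n + 1))
    G[:n, :n] = gamma(np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=2))
    G[n, n] = 0.0
    g = np.ones(n + 1)
    g[:n] = gamma(np.linalg.norm(sites - s0, axis=1))
    lam = np.linalg.solve(G, g)
    return lam[:n] @ z, lam @ g

def loo_cross_validation(sites, z, gamma):
    """Leave each z_i out in turn, predict it, and summarise the errors."""
    n = len(z)
    err = np.empty(n)   # z_i - zhat_i
    sd = np.empty(n)    # sigma_hat_i, root of the kriging variance
    for i in range(n):
        keep = np.arange(n) != i
        pred, var = ok_predict(sites[keep], z[keep], sites[i], gamma)
        err[i] = z[i] - pred
        sd[i] = np.sqrt(var)
    std = err / sd
    return {
        "press_mean": float(np.mean(err ** 2)),       # statistic 1
        "std_mean": float(np.mean(std)),              # statistic 2
        "std_rmse": float(np.sqrt(np.mean(std ** 2))),  # statistic 3
    }

gamma = lambda h: 1.0 - np.exp(-h / 2.0)   # assumed semivariogram model
sites = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 0.0],
                  [0.5, 2.0], [1.5, 1.5], [3.0, 2.0]])
z = np.array([1.0, 1.4, 2.1, 0.8, 1.7, 2.6])
stats = loo_cross_validation(sites, z, gamma)
print(stats)
```

If the semivariogram model is adequate, the mean of the standardized residuals would be expected to be near 0 and their root mean square near 1; this reading is a common rule of thumb rather than something stated in the notes.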

Universal Kriging
The constant-mean assumption in ordinary kriging may not be reasonable in many practical situations. Two extensions which allow for a nonconstant mean are universal kriging and median polish kriging. Assume
$$Z(s) = \beta_0 + \beta_1 f_1(s) + \cdots + \beta_p f_p(s) + \epsilon(s)$$
where the $f_j(\cdot)$'s are functions of spatial location (which can be any covariates measured at each location) and $\epsilon(\cdot)$ is assumed to be intrinsically stationary. Again, we seek a linear unbiased predictor which minimizes the variance of the prediction error:
$$\min \mathrm{Var}\Big[\sum_{i=1}^{n} \lambda_i Z(s_i) - Z(s_0)\Big] \quad \text{subject to} \quad E\Big[\sum_{i=1}^{n} \lambda_i Z(s_i)\Big] = \beta_0 + \beta_1 f_1(s_0) + \cdots + \beta_p f_p(s_0).$$
(This yields a set of $p + 1$ constraints.) There are then $p + 1$ Lagrange multipliers to be found, and the algebra is messier than in the case of ordinary kriging. The optimal coefficients $\lambda_1, \ldots, \lambda_n$ are the first $n$ elements of the vector $\lambda_U$ that satisfies the following system of linear equations (UK equations):
$$\Gamma_U \lambda_U = \gamma_U$$
where
$$\lambda_U = (\lambda_1, \ldots, \lambda_n, m_0, m_1, \ldots, m_p)'$$
$$\gamma_U = [\gamma(s_1 - s_0), \ldots, \gamma(s_n - s_0), 1, f_1(s_0), \ldots, f_p(s_0)]'$$
$$(\Gamma_U)_{ij} = \begin{cases} \gamma(s_i - s_j) & \text{for } i = 1, \ldots, n;\ j = 1, \ldots, n \\ f_{j-1-n}(s_i) & \text{for } i = 1, \ldots, n;\ j = n + 1, \ldots, n + p + 1 \\ 0 & \text{for } i = n + 1, \ldots, n + p + 1;\ j = n + 1, \ldots, n + p + 1 \end{cases}$$
with $f_0 \equiv 1$, and $\Gamma_U$ is a symmetric $(n + p + 1) \times (n + p + 1)$ matrix.
We should try to understand why the trend exists, based on the nature of our data, and use a simple form of the trend if possible. Then, we subtract this trend from the observed
PAGE 32

data to obtain the residuals. We then use the residuals to compute the sample variogram, fit a model variogram to it, predict the values at the unsampled locations (krige the residuals), and finally add the kriged residuals back to the trend.

OTHER EXTENSIONS OF ORDINARY KRIGING:
So far we have considered point kriging, i.e. prediction at a single site. Sometimes it is desirable to predict the average value over a region. This can be done by a straightforward extension of OK called ordinary block kriging. In some cases, a quantity such as $P(Z(s_0) \geq z_0 \mid Z)$ is of more importance (e.g. ozone levels in air cannot exceed 2 ppm in environmental monitoring), and a method for predicting such a quantity is called indicator kriging, which utilizes 0-1 data (exceeds the standard or not). In other situations, there are measurements for more than one variable at each data location. An extended method which utilizes dependence between variables, as well as dependence within variables, to predict values at unsampled locations is called cokriging.

Block Kriging
Suppose that we want to predict the average value of $Z$ over a region $B$, i.e.,
$$Z(B) \equiv \frac{1}{|B|} \int_B Z(s)\, ds$$
where $|B|$ is the area of the block. The theoretical development is similar to that of ordinary kriging and yields the ordinary block kriging equations,
$$\Gamma_{OB} \lambda_{OB} = \gamma_{OB}$$
where
$$\gamma_{OB} = [\gamma(B, s_1), \ldots, \gamma(B, s_n), 1]', \qquad \gamma(B, s_i) = |B|^{-1} \int_B \gamma(u - s_i)\, du.$$
The ordinary block kriging predictor of $Z(B)$ is given by
$$\hat Z(B) = \sum_{i=1}^{n} \lambda_{OB,i} Z(s_i)$$
where $\lambda_{OB,1}, \ldots, \lambda_{OB,n}$ are the first $n$ elements of $\lambda_{OB}$. The kriging variance is given by
$$\lambda_{OB}' \gamma_{OB} - |B|^{-2} \int_B \int_B \gamma(u - v)\, du\, dv.$$
PAGE 33

In practice, it will generally be necessary to evaluate the integrals by a numerical integration procedure.
- Change of support problem

Median Polish Kriging
First do a median polish fit of overall, row and column effects, and compute the residuals from this fit. Then perform ordinary kriging on those residuals to get, say, $\hat\epsilon(s_0)$. To get the median polish kriging predictor of $Z(s_0)$, add the planar interpolated median polish fit at $s_0$ to the kriged residual:
$$\hat Z(s_0) = m(s_0; \hat a, \{\hat r_k\}, \{\hat c_l\}) + \hat\epsilon(s_0)$$
The kriging variance of the median polish kriging predictor is taken (with little modification) to be the ordinary kriging variance based on the median polish residuals.

Indicator Kriging
Define the indicator random field
$$I(s, z) = \begin{cases} 1 & \text{if } Z(s) \leq z \\ 0 & \text{otherwise} \end{cases}$$
The indicator random field is intrinsically stationary if the following conditions hold:
i) $E[I(s, z)] = F(z)$ for all $s$ and all $z \in \mathbb{R}$.
ii) $\mathrm{Var}[I(s, z) - I(s + h, z)] = 2\gamma_{I,z}(h, z)$ for all $h$ and all $z \in \mathbb{R}$.
Indicator kriging proceeds as does ordinary kriging, but with $I(s_i, z)$ in place of $z_i$ and $\gamma_{I,z}(\cdot)$ instead of $\gamma(\cdot)$. Prediction is often carried out at $K$ levels $z_1, \ldots, z_K$, which requires the $K$ corresponding semivariograms to be estimated and modeled.

- Other simple methods of spatial prediction:
1. Method of polygons.
2. Weighted average based on triangulation.
3. Inverse distance (k-nn) method.
PAGE 34
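The notes only name the inverse distance (k-nn) method, so the following is one common reading of it rather than the author's definition: the prediction is a weighted average of the $k$ nearest observations with weights proportional to $1/d^{power}$. The function name `idw_predict`, the defaults `k=3` and `power=2.0`, and the example data are assumptions.

```python
import numpy as np

def idw_predict(sites, z, s0, k=3, power=2.0):
    """Inverse-distance-weighted average of the k nearest observations."""
    d = np.linalg.norm(sites - s0, axis=1)
    if np.any(d == 0.0):                 # s0 coincides with a data site
        return float(z[np.argmin(d)])
    nearest = np.argsort(d)[:k]          # indices of the k nearest sites
    w = 1.0 / d[nearest] ** power        # weights decay with distance
    return float(w @ z[nearest] / w.sum())

# Hypothetical data: four sites on a square.
sites = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
z = np.array([1.0, 3.0, 2.0, 4.0])
centre = idw_predict(sites, z, np.array([1.0, 1.0]), k=4)
near_first = idw_predict(sites, z, np.array([0.1, 0.1]), k=1)
print(centre, near_first)
```

Unlike kriging, this predictor uses no model of spatial dependence and provides no prediction variance.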

Characterization of spatial cross-dependence
When sampling over a spatial domain, measurements are often collected on more than one variable, say $m$ variables, and we may also be interested in the correlations between them. Consider for now the simplest case of two variables, i.e. $m = 2$. As before, there are several functions that can be used to characterize the dependence of two variables.

Cross-covariance function
$$C_{ij}(s, t) = \mathrm{Cov}(Z_i(s), Z_j(t)), \quad i, j = 1, 2$$
Note that, in general, $C_{ij}(s, t) \neq C_{ij}(t, s)$ and $C_{ij}(s, t) \neq C_{ji}(s, t)$ for $i \neq j$.

Traditional cross-variogram
$$2\nu_{ij}(s, t) = \mathrm{Cov}(Z_i(s) - Z_i(t),\ Z_j(s) - Z_j(t)), \quad i, j = 1, 2$$
Pseudo cross-variogram
$$2\gamma_{ij}(s, t) = \mathrm{Var}(Z_i(s) - Z_j(t)), \quad i, j = 1, 2$$
Note that $\nu_{ij}$ requires that data on both variables be measured at the same locations, or at least at many of the same locations, whereas $\gamma_{ij}$ requires that the two variables be measured in the same units in order to be meaningful. It is recommended to standardize the variables before estimating the latter quantity.

Estimation: (with $h = s - t$, sums taken over the pairs in $N(h)$, and $|N(h)|$ the number of such pairs)
Sample cross-covariance function
$$\hat C_{ij}(s, t) = \frac{1}{|N(h)|} \sum_{N(h)} Z_i(s) Z_j(t) - \bar Z^{(i)}_{-h}\, \bar Z^{(j)}_{+h}$$
Sample cross-variograms
$$2\hat\nu_{ij}(s, t) = \frac{1}{|N(h)|} \sum_{N(h)} (Z_i(s) - Z_i(t))(Z_j(s) - Z_j(t))$$
$$2\hat\gamma_{ij}(s, t) = \frac{1}{|N(h)|} \sum_{N(h)} (Z_i(s) - Z_j(t))^2 - \big(\bar Z^{(i)}_{-h} - \bar Z^{(j)}_{+h}\big)^2$$
Example:
PAGE 35

Cokriging
Suppose that the data are now $m \times 1$ vectors $Z(s_1), \ldots, Z(s_n)$, and we want to predict one or more of the variables at an unsampled location. Denote the $j$th element of the $i$th of these vectors by $Z_j(s_i)$. Let $s_0$ denote the unsampled site. First consider that we wish to predict, say, $Z_1(s_0)$. We could simply apply ordinary kriging to the first variable to get a predicted value. However, if the other variables are correlated with the first variable, then a better predictor can be obtained by basing the prediction on all of the elements of $Z(s_1), \ldots, Z(s_n)$. The best linear unbiased predictor of $Z_1(s_0)$ based on all of these is called the (ordinary) cokriging predictor. When we wish to predict the entire vector of variables at an unsampled site, i.e. $Z(s_0)$, this can be accomplished using similar ideas and is called multivariate spatial prediction.

For $m = 2$, define $Z_1 = [Z_1(s_1), \ldots, Z_1(s_n)]'$ and $Z_2 = [Z_2(s_1), \ldots, Z_2(s_n)]'$. Then the cokriging predictor of $Z_1(s_0)$ is given by
$$\lambda_1' Z_1 + \lambda_2' Z_2$$
whereas the multivariate spatial predictor of $Z(s_0)$ is given by
$$\Lambda_1' Z_1 + \Lambda_2' Z_2$$
where $\Lambda_1$ and $\Lambda_2$ are matrices.

COKRIGING EQUATIONS:
Assume that $m = 2$ and, for simplicity, that the two variables are jointly second-order stationary. The model for the process is $Z(s) = \beta + \epsilon(s)$. Or
$$\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} = \begin{pmatrix} \mathbf{1} & \mathbf{0} \\ \mathbf{0} & \mathbf{1} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} + \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \end{pmatrix}.$$
Also,
$$Z(s_0) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} + \begin{pmatrix} \epsilon_1(s_0) \\ \epsilon_2(s_0) \end{pmatrix}.$$
PAGE 36

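Define
$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$$
where
$$\Sigma_{ij} = \mathrm{Cov}(\epsilon_i, \epsilon_j) = \begin{pmatrix} C_{ij}(s_1, s_1) & \cdots & C_{ij}(s_1, s_n) \\ \vdots & & \vdots \\ C_{ij}(s_n, s_1) & \cdots & C_{ij}(s_n, s_n) \end{pmatrix}$$
and, with $\epsilon = (\epsilon_1', \epsilon_2')'$,
$$c_1 = \mathrm{Cov}(\epsilon, \epsilon_1(s_0)) = \big(C_{11}(s_1, s_0), \ldots, C_{11}(s_n, s_0), C_{21}(s_1, s_0), \ldots, C_{21}(s_n, s_0)\big)'.$$
The cokriging equations to predict $Z_1(s_0)$ are
$$\begin{pmatrix} \Sigma_{11} & \Sigma_{12} & \mathbf{1} & \mathbf{0} \\ \Sigma_{21} & \Sigma_{22} & \mathbf{0} & \mathbf{1} \\ \mathbf{1}' & \mathbf{0}' & 0 & 0 \\ \mathbf{0}' & \mathbf{1}' & 0 & 0 \end{pmatrix} \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ m_1 \\ m_2 \end{pmatrix} = \begin{pmatrix} c_1 \\ 1 \\ 0 \end{pmatrix}$$
The ordinary cokriging predictor of $Z_1(s_0)$ is then $\lambda_1' Z_1 + \lambda_2' Z_2$. With the equations written in covariance form and the multipliers signed as above, the associated cokriging variance is $C_{11}(s_0, s_0) - (\lambda_1', \lambda_2') c_1 - m_1$.
Note that the symmetry condition $C_{ij}(s, t) = C_{ij}(t, s)$ should be satisfied in order for cokriging based on $2\nu_{ij}$ to give the optimal predictor. This condition is not required for $2\gamma_{ij}$, which always gives the same predictor as cokriging based on the cross-covariance function.
We can get the same results using the variance-based cross-variogram. The cokriging
PAGE 37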

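equations in these terms are
$$\begin{pmatrix} \Gamma_{11} & \Gamma_{12} & \mathbf{1} & \mathbf{0} \\ \Gamma_{21} & \Gamma_{22} & \mathbf{0} & \mathbf{1} \\ \mathbf{1}' & \mathbf{0}' & 0 & 0 \\ \mathbf{0}' & \mathbf{1}' & 0 & 0 \end{pmatrix} \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ m_1 \\ m_2 \end{pmatrix} = \begin{pmatrix} \gamma_1 \\ 1 \\ 0 \end{pmatrix}$$
The ordinary cokriging predictor of $Z_1(s_0)$ is then $\lambda_1' Z_1 + \lambda_2' Z_2$, and the associated cokriging variance is given by $(\lambda_1', \lambda_2') \gamma_1 + m_1$.
In order to implement cokriging, we need to estimate the cross-covariance functions or cross-variograms, choose valid parametric models for these functions, and fit the models to the estimates. Much research is still needed on these topics, especially because of the scarcity of known valid models.

EXAMPLE: $m = 2$
$$\gamma_{ij}(k, l) = \begin{cases} 1 - \exp(-|l - k|) & \text{for } i = j \\ 1 - 0.5 \exp(-|l - k + 1|) & \text{for } i \neq j \end{cases}$$
PAGE 38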

Space-time Geostatistics
Suppose that we have observed spatial data at each of $m$ time points, i.e.,
$$\{Z(s_{1i}, t_i), \ldots, Z(s_{n_i i}, t_i) : i = 1, \ldots, m\}$$
where $s_{1i}, \ldots, s_{n_i i}$ are the $n_i$ data locations at time $i$, and $t_1 < t_2 < \ldots < t_m$ are the times of observation. When $n_i = n$ for all $i$, the data are said to be rectangular. The data are usually assumed to be an incomplete sampling of one realization of the stochastic process $\{Z(s, t) : s \in D(t),\ t \in T\}$. If $D(t) \equiv D$ and $T = \{1, 2, \ldots\}$, then we can view this as a time series of spatial processes. If the temporal correlation is non-negligible, then we generally need to assume spatial and temporal stationarity of some kind.
The generic space-time problem is to use the data to predict $Z(s_0, t_0)$, where $s_0 \in D$ and $t_0 \in T$. Typically, $t_0 \geq t_m$. In principle, we can use ideas from spatial kriging to perform space-time kriging, but some differences arise:
- Data in time often reveal a cyclical or periodic component, but data in space usually do not. This can be dealt with by using a mean function model that contains some periodic components.
- We must use a valid space-time covariance function or semivariogram. Possible approaches:
i) Include an extra parameter to scale properly for time.
ii) Assume space-time additivity.
iii) Assume space-time separability.
PAGE 39

Lattice Data
Recall the definition of lattice data: observations are taken at a finite number of sites that together constitute the entire study region. For this type of data, there is no possibility of a response between data locations. When the data locations are points, geostatistical methods can be used to handle the data, so we shall focus on the cases where the data locations are regions. For areal (lattice) data, we use neighbour information to define spatial relationships.
Examples:
- Cancer rate in each city district
- Census data with zipcode division for a metropolitan area
- Remotely sensed data

Exploratory Data Analysis
Many of the EDA tools previously introduced for geostatistical data can also be applied to lattice data.
For data on a regular grid: median polish, plots of row or column mean versus row or column index, same-lag scatterplots.
For irregularly spaced regions: 3-D scatterplots, semivariogram cloud, plots of each datum against the average of its nearest neighbors, gray-scale maps, plot of response versus area of region, etc.
The data analysis involves: representation of spatial proximity, testing for spatial pattern using Moran's I or Geary's c statistic, and modeling with autoregressive models (SAR, CAR).

Measures of spatial autocorrelation
The main objective is to measure how strong the tendency is for observations from nearby regions to be more (or less) alike than observations from regions far apart, and then to judge whether any apparent tendency is sufficiently strong that it is unlikely to be due to chance alone.
- The data locations may be points or regions, and the response variables can be either discrete or continuous.
PAGE 40

Examples of spatial autocorrelation for binary (0-1) data:
[Figure: example maps of binary data.]

The general cross-product statistic
Notation:
- Let $Z_i$ denote the response at the $i$th location, $i = 1, \ldots, n$.
- Let $Y_{ij}$ be a measure of how similar or dissimilar the responses are at locations $i$ and $j$.
- Let $W_{ij}$ be a measure of the spatial proximity of locations $i$ and $j$.
- Define matrices (for future reference) $Y = (Y_{ij})$ and $W = (W_{ij})$. $W$ is called a proximity matrix.
The general cross-product statistic is given by
$$C = \sum_i \sum_j W_{ij} Y_{ij}.$$
If $C$ is too small:
If $C$ is too large:
Example (hypothetical): Let $Y_{ij} = (Z_i - Z_j)^2$ for binary $Z_i$'s, and
$$W_{ij} = \begin{cases} 1 & \text{if locations } i \text{ and } j \text{ are adjacent} \\ 0 & \text{otherwise} \end{cases}$$
PAGE 41
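The cross-product statistic for the hypothetical example above can be computed as follows. The sketch assumes four locations arranged in a row with adjacency between consecutive locations; the function name `cross_product_statistic` and the binary responses are illustrative, and the diagonal ($i = j$) is excluded from the sum.

```python
import numpy as np

def cross_product_statistic(W, Y):
    """C = sum over i != j of W_ij * Y_ij."""
    W = np.asarray(W, dtype=float)
    Y = np.asarray(Y, dtype=float)
    off_diagonal = ~np.eye(len(W), dtype=bool)   # exclude i == j
    return float(np.sum(W[off_diagonal] * Y[off_diagonal]))

# Hypothetical data: four locations in a row, binary responses.
z = np.array([1, 1, 0, 0])
Y = (z[:, None] - z[None, :]) ** 2               # Y_ij = (Z_i - Z_j)^2
W = np.zeros((4, 4))
for i in range(3):                               # consecutive locations are adjacent
    W[i, i + 1] = W[i + 1, i] = 1.0

C = cross_product_statistic(W, Y)
print(C)
```

Each adjacent pair is counted twice (once as $(i, j)$ and once as $(j, i)$), so with this dissimilarity measure $C$ is twice the number of adjacent pairs whose responses differ.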

Testing the statistical significance of $C$: $H_0$: no correlation
- Normal approximation
- Comparison to randomization distribution
- Monte Carlo approach

i) Normal approximation of $C$:
Let
$$S_0 = \sum_{i \neq j} W_{ij}, \quad S_1 = \frac{1}{2} \sum_{i \neq j} (W_{ij} + W_{ji})^2, \quad S_2 = \sum_i (W_{i\cdot} + W_{\cdot i})^2$$
Define $T_0$, $T_1$, and $T_2$ similarly, but for the $Y_{ij}$'s. Then,
$$E(C) = \frac{S_0 T_0}{n(n - 1)}$$
and
$$\mathrm{Var}(C) = \frac{S_1 T_1}{2n(n - 1)} + \frac{(S_2 - 2S_1)(T_2 - 2T_1)}{4n(n - 1)(n - 2)} + \frac{(S_0^2 + S_1 - S_2)(T_0^2 + T_1 - T_2)}{n(n - 1)(n - 2)(n - 3)} - [E(C)]^2$$
$C$ is approximately $N(E(C), \mathrm{Var}(C))$. Compute
$$z = \frac{C - E(C)}{\sqrt{\mathrm{Var}(C)}}$$
Example continued:
PAGE 42

ii) Randomization distribution
- List all possible arrangements of the observed responses over the locations, obtained by permuting the responses.
- Compute $C$ for each arrangement, and rank these values.
- Determine where the data's $C$ value fits in; the P-value for the test is the proportion of $C$ values in the randomization distribution that are as extreme as or more extreme than the observed $C$.
Example continued:

iii) Monte Carlo approach
- Observe that complete enumeration of the possible arrangements may be computationally prohibitive even for moderately sized data sets.
- So instead, obtain a random sample from the randomization distribution and follow the same type of procedure.
- To implement this random sampling, generate $n$ random numbers (one for each data location), rank these random numbers from smallest to largest, then rearrange the observations in accordance with the ranking of the random numbers. $C$ is computed for this arrangement, and the whole process is repeated $m$ times.
- The P-value estimates the proportion of $C$ values as extreme as or more extreme than the observed $C$, and is given by
$$P = \frac{1 + \text{number of } C \text{ values} \geq \text{observed } C}{1 + m}$$
Example:
PAGE 43
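The Monte Carlo procedure can be sketched as below. The sketch makes several assumptions beyond the notes: `rng.permutation` stands in for the rank-the-random-numbers step (both produce a uniformly random rearrangement), the test is upper-tailed, $W$ has a zero diagonal, and the function name `monte_carlo_pvalue`, the ten-location chain and the choice $Y_{ij} = Z_i Z_j$ are hypothetical.

```python
import numpy as np

def monte_carlo_pvalue(z, W, similarity, m=999, seed=0):
    """Upper-tail Monte Carlo P-value for C = sum_ij W_ij * Y_ij."""
    rng = np.random.default_rng(seed)

    def statistic(values):
        Y = similarity(values[:, None], values[None, :])
        return float(np.sum(W * Y))      # W is assumed to have a zero diagonal

    c_obs = statistic(z)
    # Recompute C for m random rearrangements of the responses.
    c_sim = np.array([statistic(rng.permutation(z)) for _ in range(m)])
    p = (1 + np.sum(c_sim >= c_obs)) / (1 + m)
    return c_obs, float(p)

# Hypothetical data: ten locations in a row, the ones clustered at one end.
z = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0], dtype=float)
W = np.zeros((10, 10))
for i in range(9):
    W[i, i + 1] = W[i + 1, i] = 1.0

c_obs, p = monte_carlo_pvalue(z, W, lambda a, b: a * b)   # Y_ij = Z_i * Z_j
print(c_obs, p)
```

With $Y_{ij} = Z_i Z_j$, a large $C$ means that ones tend to be adjacent, so a small upper-tail P-value points towards clustering. For a dissimilarity measure such as $(Z_i - Z_j)^2$ the relevant tail would be the lower one.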

Join-Count Statistics
A subclass of general cross-product statistics for use with binary data.
- Code the data as either 1 (black) or 0 (white). The black-white classification is for the purpose of making a map.
- Question of interest: Are neighboring locations more likely to display the same color (or opposite colors) than we would expect in the absence of spatial correlation?
Procedures:
Classify the joins between contiguous regions as BB, BW, or WW. Define $W_{ij} = 1$ if regions $i$ and $j$ share an edge, and 0 otherwise (using the rook's definition of neighborhood). Other ways of defining neighborhoods: bishop's, queen's, etc.
Count the number of joins of a specified type, e.g. the number of BW joins $= BW$.
Note that if we define $Y_{ij} = (Z_i - Z_j)^2$, then
$$C = \sum_i \sum_j W_{ij} Y_{ij} = 2BW,$$
i.e. $BW = C/2$. Likewise, $BB = C^{*}/2$, where $C^{*}$ is the value of $C$ obtained by defining $Y_{ij} = Z_i Z_j$. If the total number of joins in the system is $J$, then $WW = J - BB - BW$.

BW statistic: (There is some evidence that this statistic is slightly better than the other two.)
Let $b$ = number of black regions and $w$ = number of white regions ($b + w = n$). Note that
$$E(BW) = \tfrac{1}{2} E(C) \quad \text{and} \quad \mathrm{Var}(BW) = \tfrac{1}{4} \mathrm{Var}(C).$$
It can also be shown that
$$T_0 = 2bw, \quad T_1 = 2T_0, \quad T_2 = 4nbw$$
If the regions form a rectangular $r \times c$ lattice, and the rook's contiguity definition is used, then
$$S_0 = 2(2rc - r - c), \quad S_1 = 2S_0, \quad S_2 = 8(8rc - 7r - 7c + 4)$$
PAGE 44
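Join counts on a rectangular lattice with rook's contiguity can be obtained by comparing each cell with its right-hand and lower neighbours, so that every join is visited once. The function name `join_counts` and the example map are assumptions made for illustration.

```python
import numpy as np

def join_counts(grid):
    """Count BB, BW and WW joins on an r x c lattice under rook's contiguity."""
    g = np.asarray(grid, dtype=int)
    pairs = [
        (g[:, :-1], g[:, 1:]),   # horizontal joins: cell and right neighbour
        (g[:-1, :], g[1:, :]),   # vertical joins: cell and lower neighbour
    ]
    bb = bw = joins = 0
    for a, b in pairs:
        bb += int(np.sum(a * b))     # both cells black (coded 1)
        bw += int(np.sum(a != b))    # cells of opposite colour
        joins += a.size              # every pair is one join
    return {"BB": bb, "BW": bw, "WW": joins - bb - bw, "J": joins}

# Hypothetical 3 x 4 map with a black block in the upper-left corner.
grid = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0]]
counts = join_counts(grid)
print(counts)
```

The total number of joins agrees with $J = 2rc - r - c$, which is half of $S_0$ for the rook's definition.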


Commonly used definitions of neighborhood for areal data (border/edge connectivity)
- Rook's: spatial correlation down columns and across rows
- Bishop's: spatial correlation in the diagonal directions
- Queen's: omni-directional correlation
Under the queen's definition, a single shared boundary point makes two regions neighbours. The rook's definition requires more than a single shared point.

The same approach can be used for data at irregularly spaced and shaped locations; in that case the formulas given for $T_0$, $T_1$, and $T_2$ still apply, but those for $S_0$, $S_1$, and $S_2$ do not.
- BB statistic (with $Y_{ij} = Z_i Z_j$): $T_0 = b(b - 1)$, $T_1 = 2T_0$, $T_2 = 4b(b - 1)^2$
Extensions to polytomous categorical data (i.e. a multi-colored map) are possible.

3. Moran's and Geary's statistic (for continuous data)
Moran's I (1950, Biometrika):
$$I = \frac{n \sum_i \sum_j W_{ij} (Z_i - \bar Z)(Z_j - \bar Z)}{S_0 \sum_i (Z_i - \bar Z)^2}$$
where $\bar Z = \sum_i Z_i / n$.
$E(I) = -\frac{1}{n - 1}$ under independence. Values $I > -\frac{1}{n - 1}$ indicate positive spatial autocorrelation, and values $I < -\frac{1}{n - 1}$ indicate negative spatial autocorrelation.
- Normal approximation to the distribution of $I$ under independence ($n > 25$): $E(I)$ as before.
PAGE 45
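Moran's I can be computed directly from its definition. The sketch below assumes a proximity matrix with zero diagonal; the function name `morans_i`, the four-location chain and the two response vectors are hypothetical.

```python
import numpy as np

def morans_i(z, W):
    """Moran's I = n * sum_ij W_ij (z_i - zbar)(z_j - zbar) / (S_0 * sum_i (z_i - zbar)^2)."""
    z = np.asarray(z, dtype=float)
    W = np.asarray(W, dtype=float)   # assumed to have a zero diagonal
    n = len(z)
    dev = z - z.mean()               # deviations from the mean
    s0 = W.sum()                     # S_0 = sum of all proximity weights
    return float(n * (dev @ W @ dev) / (s0 * (dev @ dev)))

# Hypothetical data: four locations in a row, consecutive ones adjacent.
W = np.zeros((4, 4))
for i in range(3):
    W[i, i + 1] = W[i + 1, i] = 1.0

i_trend = morans_i([1.0, 2.0, 3.0, 4.0], W)         # smooth gradient
i_alternating = morans_i([1.0, 0.0, 1.0, 0.0], W)   # neighbours always differ
print(i_trend, i_alternating, -1.0 / (4 - 1))
```

The gradient gives a value above $-1/(n - 1)$ and the alternating pattern a value below it, in line with the interpretation of positive and negative spatial autocorrelation.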
