STK4150: Chapter 4 - Fundamentals of spatial processes
Lecture notes
Odd Kolbjørnsen and Geir Storvik
January 30, 2017
Spatial processes
- Typically correlation between nearby sites
  - Mostly positive correlation
  - Negative correlation under competition
- Often part of a space-time process
  - Temporal snapshot
  - Temporal aggregation
- Statistical analysis: incorporate spatial dependence into spatial statistical models
- Active research field
- Computationally intensive tasks have led to specialized software
Hierarchical (statistical) models
- Data model
- Process model
- In the time series setting: state space models

Example (state space model):
  $Y_t = \alpha Y_{t-1} + W_t$, $t = 2, 3, \ldots$  (process model)
  $Z_t = \beta Y_t + \eta_t$, $t = 1, 2, 3, \ldots$  (data model)
  $W_t \overset{ind}{\sim} (0, \sigma_W^2)$, $\eta_t \overset{ind}{\sim} (0, \sigma_\eta^2)$

We will do a similar type of modelling now, separating the process model and the data model:
  $Z(s_i) = Y(s_i) + \varepsilon(s_i)$, $\varepsilon(s_i)$ iid
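To make the separation concrete, here is a minimal simulation sketch of the state space example (the parameter values and the Python/NumPy implementation are illustrative assumptions, not from the notes):

```python
# Simulate the state space model: process model Y_t = alpha*Y_{t-1} + W_t,
# data model Z_t = beta*Y_t + eta_t (parameter values are illustrative).
import numpy as np

rng = np.random.default_rng(0)
T, alpha, beta = 100, 0.8, 1.0
sigma_W, sigma_eta = 1.0, 0.5

Y = np.zeros(T)                                          # latent process
Z = np.zeros(T)                                          # observations
Y[0] = rng.normal(0.0, sigma_W)
Z[0] = beta * Y[0] + rng.normal(0.0, sigma_eta)
for t in range(1, T):
    Y[t] = alpha * Y[t - 1] + rng.normal(0.0, sigma_W)   # process model
    Z[t] = beta * Y[t] + rng.normal(0.0, sigma_eta)      # data model
```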
Spatial prediction
- An unknown function is of interest, i.e. $Y(s)$, $s \in D$
- Standard problem: observe $Z = (Z_1, \ldots, Z_m)$ with
  $Z_i = Y(s_i) + \varepsilon_i$, $i = 1, \ldots, m$
- Want to predict the function at an unobserved position $s_0$, i.e. $Y(s_0)$
- Multiple methods; the assumptions and frameworks differ:
  - Interpolation, regression, splines
  - Kriging
  - Hierarchical model (with a Gaussian random field)
Methods for spatial prediction
- Regression: find the function minimizing the least squares error,
  $f(s) = \sum_{i=1}^{K} \beta_i f_i(s) = \beta^T \mathbf{f}(s)$
  Know: $\hat\beta = (F^T F)^{-1} F^T Z$, with $F_{ij} = f_j(s_i)$.
- Kriging: find the optimal unbiased linear predictor under squared error loss,
  $\min_{k, L} E\{(Y(s_0) - k - L^T Z)^2\}$
- Hierarchical model: find the optimal predictor under squared error loss,
  $\min_{a(Z)} E\{(Y(s_0) - a(Z))^2\}$
  Know: $a(Z) = E\{Y(s_0) \mid Z\}$
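As a sketch of the regression approach (the quadratic basis and the synthetic data are illustrative assumptions):

```python
# Regression predictor: basis functions f_j evaluated at the data sites form
# F, and beta-hat = (F^T F)^{-1} F^T Z, here via a least-squares solver.
import numpy as np

rng = np.random.default_rng(1)
s = np.sort(rng.uniform(0, 10, 30))              # observation sites (1D)
Z = np.sin(s) + rng.normal(0, 0.2, s.size)       # observations

F = np.column_stack([np.ones_like(s), s, s**2])  # F_ij = f_j(s_i)
beta, *_ = np.linalg.lstsq(F, Z, rcond=None)     # solves (F^T F) beta = F^T Z

s0 = 5.0                                         # unobserved location
f0 = np.array([1.0, s0, s0**2])                  # f(s0)
Y_hat = f0 @ beta                                # predicted Y(s0)
```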
Geostatistical models
- Assume $\{Y(s)\}$ is a Gaussian process: $(Y(s_1), \ldots, Y(s_m))$ is multivariate Gaussian for all $m$ and $s_1, \ldots, s_m$
- Need to specify
  $\mu(s) = E(Y(s))$
  $C_Y(s, s') = \mathrm{cov}(Y(s), Y(s'))$
- Assuming 2nd order stationarity:
  $\mu(s) = \mu$ for all $s$
  $C_Y(s, s') = C_Y(s - s')$ for all $s, s'$
- Common extension: $\mu(s) = x(s)^T \beta$
- Often: $Z(s_i) \mid Y(s_i), \sigma_\varepsilon^2 \overset{ind}{\sim} N(Y(s_i), \sigma_\varepsilon^2)$
Covariance function and variogram
Dependence can be specified through covariance functions or the variogram:
  $2\gamma_Y(h) \equiv \mathrm{var}[Y(s+h) - Y(s)]$
    $= \mathrm{var}[Y(s+h)] + \mathrm{var}[Y(s)] - 2\,\mathrm{cov}[Y(s+h), Y(s)]$
    $= 2C_Y(0) - 2C_Y(h)$
- Variograms are more general than covariance functions
  - The variogram can exist even if $\mathrm{var}[Y(s)] = \infty$!
  - In variograms, relative changes are modelled rather than the process itself.
- In geostatistics it is common to use variograms.
- All covariance formulas have corresponding variogram formulas.
- $\gamma_Y(h)$ is sometimes called the semi-variogram.
Stationary, isotropic, anisotropic
- Strong stationarity: for any $(s_1, \ldots, s_m)$ and any $h$,
  $[Y(s_1), \ldots, Y(s_m)] \overset{d}{=} [Y(s_1 + h), \ldots, Y(s_m + h)]$
- Stationarity in mean: $E(Y(s)) = \mu$ for all $s \in D$
- Stationary covariance (depends only on the lag):
  $\mathrm{cov}(Y(s), Y(s+h)) = C_Y(h)$ for all $s, s+h \in D$
- Isotropic covariance (depends only on the length of the lag):
  $\mathrm{cov}(Y(s), Y(s+h)) = C_Y(\|h\|)$ for all $s, s+h \in D$
- Geometric anisotropy in covariance (rotated coordinate system):
  $\mathrm{cov}(Y(s), Y(s+h)) = C_Y(\|Ah\|)$ for all $s, s+h \in D$
- Weak stationarity: stationarity in mean and a stationary covariance (recall time series)
Isotropic covariance functions/variograms
- Matérn covariance function:
  $C_Y(h; \theta) = \sigma_1^2 \{2^{\theta_2 - 1} \Gamma(\theta_2)\}^{-1} (\|h\|/\theta_1)^{\theta_2} K_{\theta_2}(\|h\|/\theta_1)$
- Powered exponential:
  $C_Y(h; \theta) = \sigma_1^2 \exp\{-(\|h\|/\theta_1)^{\theta_2}\}$
  - Exponential: $C_Y(h; \theta) = \sigma_1^2 \exp\{-\|h\|/\theta_1\}$
  - Gaussian: $C_Y(h; \theta) = \sigma_1^2 \exp\{-(\|h\|/\theta_1)^2\}$
- Note: there are many different ways to define the range parameter $\theta_1$, e.g. $\sigma_1^2 \exp\{-3(\|h\|/R)^{\nu}\}$
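Both families above are easy to evaluate numerically; here is a sketch (Python with SciPy, parameter names following the slide; the implementation itself is an assumption for illustration):

```python
# Matern and powered-exponential covariance functions as on the slide.
import numpy as np
from scipy.special import gamma, kv   # kv = modified Bessel K_nu

def matern(h, sigma2=1.0, theta1=1.0, theta2=0.5):
    """Matern covariance; theta1 = scale, theta2 = smoothness."""
    u = np.abs(np.asarray(h, dtype=float)) / theta1
    out = np.full_like(u, sigma2)                     # C(0) = sigma^2
    pos = u > 0
    out[pos] = (sigma2 * u[pos] ** theta2 * kv(theta2, u[pos])
                / (2.0 ** (theta2 - 1.0) * gamma(theta2)))
    return out

def powered_exponential(h, sigma2=1.0, theta1=1.0, theta2=1.0):
    """theta2 = 1 gives the exponential, theta2 = 2 the Gaussian covariance."""
    return sigma2 * np.exp(-(np.abs(np.asarray(h, dtype=float)) / theta1) ** theta2)
```

For $\theta_2 = 1/2$ the Matérn reduces to the exponential covariance, which can be checked numerically against powered_exponential with $\theta_2 = 1$.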
Stationary spatial covariance functions
[Figure: examples of stationary spatial covariance functions]
Simulation of stationary spatial random field
[Figure: simulated realizations of a stationary spatial random field]
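A minimal sketch of how such a realization can be simulated, via a Cholesky factor of the covariance matrix (the 1D grid, the exponential covariance, and all parameter values are illustrative assumptions):

```python
# Simulate a mean-zero stationary Gaussian random field on a 1D grid:
# build the covariance matrix, factor it, and multiply white noise.
import numpy as np

rng = np.random.default_rng(2)
s = np.linspace(0.0, 10.0, 200)                     # grid of sites
H = np.abs(s[:, None] - s[None, :])                 # pairwise distances
C = 1.0 * np.exp(-H / 2.0)                          # exponential covariance, theta1 = 2
L = np.linalg.cholesky(C + 1e-10 * np.eye(s.size))  # small jitter for stability
Y = L @ rng.standard_normal(s.size)                 # one realization of the field
```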
Isotropic covariance
- The benefit of the isotropy assumption is that we only need 1D functions.
- Note: not all covariance functions/variograms valid in 1D are valid as isotropic covariance functions in higher dimensions.
Bochner's theorem
A covariance function needs to be positive definite.
Theorem (Bochner, 1955). If $\int |C_Y(h)|\, dh < \infty$, then a valid real-valued covariance function can be written as
  $C_Y(h) = \int \cos(\omega^T h)\, f_Y(\omega)\, d\omega$
where $f_Y(\omega) \ge 0$ is symmetric about $\omega = 0$.
$f_Y(\omega)$: the spectral density of $C_Y(h)$.
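As a worked example (a standard result, included here for illustration): in one dimension the exponential covariance has a nonnegative spectral density, so Bochner's theorem confirms it is a valid covariance function.

```latex
% Spectral density of the 1D exponential covariance (standard result).
C_Y(h) = \sigma^2 e^{-|h|/\theta_1}
\quad\Longrightarrow\quad
f_Y(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-i\omega h}\, C_Y(h)\, dh
            = \frac{\sigma^2\,\theta_1}{\pi\,\bigl(1+\theta_1^2\omega^2\bigr)}
            \;\ge\; 0 .
```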
Nugget effect and sill
We have
  $C_Z(h) = \mathrm{cov}[Z(s), Z(s+h)]$
    $= \mathrm{cov}[Y(s) + \varepsilon(s),\, Y(s+h) + \varepsilon(s+h)]$
    $= \mathrm{cov}[Y(s), Y(s+h)] + \mathrm{cov}[\varepsilon(s), \varepsilon(s+h)]$
    $= C_Y(h) + \sigma_\varepsilon^2 I(h = 0)$
Assume
  $C_Y(0) = \sigma_Y^2$, $\lim_{h \to 0} [C_Y(0) - C_Y(h)] = 0$
Then
  $C_Z(0) = \sigma_Y^2 + \sigma_\varepsilon^2$  (sill)
  $\lim_{h \to 0} [C_Z(0) - C_Z(h)] = \sigma_\varepsilon^2 = c_0$  (nugget effect)
It is possible to include a nugget effect also in the $Y$-process.
Nugget/sill
[Figure: variogram illustrating the nugget effect and the sill]
Nugget/sill
- The sill is the fixed finite level which the variogram converges towards at large lags (i.e. $\|h\| \to \infty$).
- Variograms without a sill ($\gamma(h) \to \infty$ as $\|h\| \to \infty$) have no equivalent correlation function.
- The nugget effect is variability below the resolution of our model.
- In the setting with point observations, i.e. $Z_i = Y(s_i) + \varepsilon_i$, it is difficult (often impossible) to distinguish a nugget effect in the random field $Y(s_i)$ from the observation error $\varepsilon_i$. This becomes a modelling choice.
- Some software mixes the nugget effect and the observation error.
Estimation of the variogram/covariance function
  $2\gamma_Z(h) \equiv \mathrm{var}[Z(s+h) - Z(s)] \overset{\text{const. expectation}}{=} E[Z(s+h) - Z(s)]^2$
- Can estimate from all pairs having distance $h$ between them.
- Problem: few/no pairs for every $h$. Simplifications:
  - Isotropy: $\gamma_Z(h) = \gamma_Z^0(\|h\|)$
  - Lag bins: $2\hat\gamma_Z^0(h) = \mathrm{ave}\{(Z(s_i) - Z(s_j))^2 : \|s_i - s_j\| \in T(h)\}$
- If covariates, use residuals. (A code sketch follows below.)
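A sketch of the lag-binned estimator (the vectorized NumPy implementation is an assumption for illustration):

```python
# Binned empirical semivariogram: average (Z(s_i)-Z(s_j))^2 / 2 over pairs
# whose distance falls in each lag bin T(h).
import numpy as np

def empirical_variogram(s, Z, bins):
    """s: (n,d) site coordinates, Z: (n,) values, bins: lag bin edges."""
    d = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)
    sq = (Z[:, None] - Z[None, :]) ** 2
    iu = np.triu_indices(len(Z), k=1)                # count each pair once
    d, sq = d[iu], sq[iu]
    gamma_hat = []
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (d >= lo) & (d < hi)
        gamma_hat.append(sq[in_bin].mean() / 2 if in_bin.any() else np.nan)
    return np.array(gamma_hat)
```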
Empirical variogram: boreality data
[Figure: empirical variogram of the boreality data]
Testing for independence
- Under independence: $\gamma_Z^0(h) = \sigma_Z^2$ for all $h$
- Test statistic: $F = \hat\gamma_Z(h_1)/\hat\sigma_Z^2$, with $h_1$ the smallest observed distance
- Reject $H_0$ when $|F - 1|$ is large
- Permutation test: keep the spatial positions and the values of the residuals, but scramble the pairing, i.e. reassign the residuals to new spatial positions.
  - Recalculate $F$ for all permutations of $Z$ (or for a random sample of permutations)
  - If the observed $F$ is extreme relative to the permutation distribution (e.g. above the 97.5% percentile), reject $H_0$
- Boreality example: P-value = 0.0001
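A sketch of the permutation test, reusing empirical_variogram from the previous sketch (the bin edges, number of permutations, and the two-sided comparison against 1 are illustrative assumptions):

```python
# Permutation test for spatial independence based on F = gamma_hat(h1)/sigma_hat^2.
import numpy as np
# assumes empirical_variogram() from the previous sketch is in scope

def permutation_test(s, Z, bin_edges=(0.0, 1.0), n_perm=999, seed=0):
    """Two-sided permutation test of F against its value 1 under H0."""
    rng = np.random.default_rng(seed)
    edges = np.asarray(bin_edges, dtype=float)
    F = lambda v: empirical_variogram(s, v, edges)[0] / v.var()
    F_obs = F(Z)
    F_perm = np.array([F(rng.permutation(Z)) for _ in range(n_perm)])
    # reject H0 (independence) when |F - 1| is large relative to permutations
    p = (1 + np.sum(np.abs(F_perm - 1) >= np.abs(F_obs - 1))) / (n_perm + 1)
    return F_obs, p
```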
Prediction in multivariate Gaussian models
$(Z(s_1), \ldots, Z(s_m), Y(s_0))$ is multivariate Gaussian.
Can use the rules for conditional distributions: if
  $\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim \mathrm{MVN}\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right)$
then we need
  $E(X_1 \mid X_2) = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (X_2 - \mu_2)$
  $\mathrm{var}(X_1 \mid X_2) = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$
- Expectations: as for ordinary linear regression
- Covariances: new!
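The conditioning rule translates directly into code; a minimal sketch (assumed implementation, not from the notes):

```python
# Conditional mean and covariance of X1 | X2 for a jointly Gaussian vector.
import numpy as np

def mvn_conditional(mu1, mu2, S11, S12, S22, x2):
    """Mean and covariance of X1 | X2 = x2 for jointly Gaussian (X1, X2)."""
    W = S12 @ np.linalg.inv(S22)        # Sigma_12 Sigma_22^{-1}
    mean = mu1 + W @ (x2 - mu2)         # E(X1 | X2)
    cov = S11 - W @ S12.T               # var(X1 | X2); Sigma_21 = Sigma_12^T
    return mean, cov
```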
Prediction in the spatial model
  $\begin{pmatrix} Y(s_0) \\ Z \end{pmatrix} \sim \mathrm{MVN}\left( \begin{pmatrix} \mu(s_0) \\ \mu_Z \end{pmatrix}, \begin{pmatrix} C_Y(s_0, s_0) & c(s_0)^T \\ c(s_0) & C_Z \end{pmatrix} \right)$
where
  $c(s_0) = \mathrm{cov}[Z, Y(s_0)] = \mathrm{cov}[Y, Y(s_0)] = (C_Y(s_0, s_1), \ldots, C_Y(s_0, s_m))^T = c_Y(s_0)$
  $C_Z = \{C_Z(s_i, s_j)\}$, with $C_Z(s_i, s_j) = C_Y(s_i, s_i) + \sigma_\varepsilon^2$ if $s_i = s_j$, and $C_Y(s_i, s_j)$ if $s_i \ne s_j$
This gives
  $E(Y(s_0) \mid Z) = \mu(s_0) + c(s_0)^T C_Z^{-1} (Z - \mu_Z)$
  $\mathrm{var}(Y(s_0) \mid Z) = C_Y(s_0, s_0) - c(s_0)^T C_Z^{-1} c(s_0)$
Kriging = prediction
Model:
  $Y(s) = x(s)^T \beta + \delta(s)$
  $Z_i = Y(s_i) + \varepsilon_i$
Prediction of $Y(s_0)$, i.e. at an unobserved location.
- Linear predictors: $\{L^T Z + k\}$
- The optimal predictor minimizes
  $\mathrm{MSPE}(L, k) \equiv E[Y(s_0) - L^T Z - k]^2 = \mathrm{var}[Y(s_0) - L^T Z - k] + \{E[Y(s_0) - L^T Z - k]\}^2$
- Note: we do not make any distributional assumptions.
Kriging
  $\mathrm{MSPE}(L, k) = \mathrm{var}[Y(s_0) - L^T Z - k] + \{E[Y(s_0) - L^T Z - k]\}^2$
    $= \mathrm{var}[Y(s_0) - L^T Z] + \{\mu_Y(s_0) - L^T \mu_Z - k\}^2$
The second term is zero if $k = \mu_Y(s_0) - L^T \mu_Z$.
First term (with $c(s_0) = \mathrm{cov}[Z, Y(s_0)]$):
  $\mathrm{var}[Y(s_0) - L^T Z] = C_Y(s_0, s_0) - 2L^T c(s_0) + L^T C_Z L$
The derivative wrt $L^T$:
  $-2c(s_0) + 2C_Z L = 0$
giving
  $L^* = C_Z^{-1} c(s_0)$
  $\hat Y(s_0) = \mu_Y(s_0) + c(s_0)^T C_Z^{-1} [Z - \mu_Z]$
  $\mathrm{MSPE}(L^*, k^*) = C_Y(s_0, s_0) - c(s_0)^T C_Z^{-1} c(s_0)$
Kriging (simple kriging)
The kriging prediction with a known mean is
  $\hat Y(s_0) = \mu_Y(s_0) + c_Y(s_0)^T C_Z^{-1} (Z - \mu_Z)$
where
  $c(s_0) = \mathrm{cov}[Z, Y(s_0)] = \mathrm{cov}[Y, Y(s_0)] = (C_Y(s_0, s_1), \ldots, C_Y(s_0, s_m))^T = c_Y(s_0)$
  $C_Z = \{C_Z(s_i, s_j)\}$, with $C_Z(s_i, s_j) = C_Y(s_i, s_i) + \sigma_\varepsilon^2$ if $s_i = s_j$, and $C_Y(s_i, s_j)$ otherwise
(A code sketch follows below.)
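A sketch of the simple kriging predictor for 1D sites with an exponential covariance (the covariance choice and all parameter values are illustrative assumptions):

```python
# Simple kriging: Y-hat(s0) = mu + c_Y(s0)^T C_Z^{-1} (Z - mu), plus its MSPE.
import numpy as np

def simple_kriging(s, Z, s0, mu, sigma2=1.0, theta1=2.0, sigma2_eps=0.1):
    """Predict Y(s0) for 1D sites s, exponential covariance (illustrative)."""
    cov = lambda d: sigma2 * np.exp(-np.abs(d) / theta1)
    H = np.abs(s[:, None] - s[None, :])
    C_Z = cov(H) + sigma2_eps * np.eye(s.size)      # C_Y + sigma_eps^2 I
    c0 = cov(s - s0)                                # c_Y(s0)
    w = np.linalg.solve(C_Z, c0)                    # C_Z^{-1} c_Y(s0)
    pred = mu + w @ (Z - mu)                        # simple kriging predictor
    mspe = sigma2 - c0 @ w                          # kriging variance
    return pred, mspe
```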
Gaussian assumptions
Recall now that $(Y, Z)$ is MVN:
  $\begin{pmatrix} Z \\ Y(s_0) \end{pmatrix} \sim \mathrm{MVN}\left( \begin{pmatrix} \mu_Z \\ \mu_Y(s_0) \end{pmatrix}, \begin{pmatrix} C_Z & c_Y(s_0) \\ c_Y(s_0)^T & C_Y(s_0, s_0) \end{pmatrix} \right)$
This gives
  $Y(s_0) \mid Z \sim N\left(\mu_Y(s_0) + c(s_0)^T C_Z^{-1} [Z - \mu_Z],\; C_Y(s_0, s_0) - c(s_0)^T C_Z^{-1} c(s_0)\right)$
Same as kriging! The book derives this directly without using the formula for the conditional distribution.
Compare to interpolation
Assume no observation error, and consider deviations from $\mu(s)$, i.e. $f(s) = Y(s) - \mu(s)$. Set $f_i(s) = C_Y(s - s_i)$. Then:
  $F = C_Z$ ($= C_Y$, no observation error)
  $(F^T F)^{-1} F^T = C_Z^{-1}$ (symmetry)
  $\hat\beta = C_Z^{-1} Z$
  $\hat f(s_0) = \sum_{i=1}^{n} \hat\beta_i C_Y(s_0 - s_i) = \hat\beta^T c(s_0) = Z^T C_Z^{-1} c(s_0)$
If we include $\mu(s)$ we get the same result again:
  $\hat Y(s_0) = \mu(s_0) + c(s_0)^T C_Z^{-1} (Z - \mu_Z)$
Spatial prediction: comments
- Given the mean and covariance function, kriging and the Gaussian random field model give identical spatial predictions.
- There are stronger assumptions underlying the Gaussian model than are needed for kriging:
  - Kriging is the optimal linear predictor for any distribution (not only the Gaussian).
  - For the Gaussian model the linear predictor is optimal among all predictors; for other distributions there might be other predictors which are better.
- The Gaussian model is easier to extend using a hierarchical approach.
- In kriging and Gaussian random fields, the interpolating functions are determined by the data.
- In kriging it is common to use a plug-in estimate for the covariance function.
- The hierarchical approach is suited to including uncertainty in the model parameters.
- The mean squared prediction error (MSPE) is used for both kriging and the hierarchical model (HM).
So far and next
- Simple kriging
  - Linear predictor
  - Assumes known parameters
  - Equal to the conditional expectation
- Unknown parameters
  - Ordinary kriging
  - Plug-in estimates
  - Bayesian approach
- Non-Gaussian models
Unknown parameters
So far we have assumed the parameters known; what if they are unknown?
- Direct approach: universal and ordinary kriging (for mean parameters)
- Plug-in estimates/empirical Bayes
- Bayesian approach
Kriging (cont.)
Assuming
  $E[Y(s)] = x(s)^T \beta$
  $Z(s_i) \mid Y(s_i), \sigma_\varepsilon^2 \overset{ind}{\sim} \mathrm{Gau}(Y(s_i), \sigma_\varepsilon^2)$
then
  $\mu_Z = \mu_Y = X\beta$
  $C_Z = \Sigma_Y + \sigma_\varepsilon^2 I$
  $c_Y(s_0) = (C_Y(s_0, s_1), \ldots, C_Y(s_0, s_m))^T$
and
  $\hat Y(s_0) = x(s_0)^T \beta + c_Y(s_0)^T [\Sigma_Y + \sigma_\varepsilon^2 I]^{-1} (Z - X\beta)$
Kriging
- Simple kriging, $E[Y(s)] = x(s)^T \beta$, $\beta$ known:
  $\hat Y(s_0) = x(s_0)^T \beta + c_Y(s_0)^T C_Z^{-1} (Z - X\beta)$
- Universal kriging, $E[Y(s)] = x(s)^T \beta$, $\beta$ unknown:
  $\hat Y(s_0) = x(s_0)^T \hat\beta_{gls} + c_Y(s_0)^T C_Z^{-1} (Z - X \hat\beta_{gls})$
  $\hat\beta_{gls} = [X^T C_Z^{-1} X]^{-1} X^T C_Z^{-1} Z$
- Bayesian kriging, $E[Y(s)] = x(s)^T \beta$, with $\beta \sim N(\beta_0, \Sigma_0)$:
  $\hat Y(s_0) = x(s_0)^T \hat\beta_B + c_Y(s_0)^T C_Z^{-1} (Z - X \hat\beta_B)$
  $\hat\beta_B = \beta_0 + \Sigma_0 X^T (X \Sigma_0 X^T + C_Z)^{-1} (Z - X\beta_0)$
- It is possible to show that SK and UK are limiting cases of BK:
  $\Sigma_0 \to 0$: BK $\to$ SK;  $\Sigma_0 \to \infty$: BK $\to$ UK
(A code sketch of universal kriging follows below.)
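A sketch of universal kriging with the GLS mean estimate (the design matrix, intercept plus coordinate, the exponential covariance, and the parameter values are assumptions for illustration):

```python
# Universal kriging: beta-hat_gls = (X^T C_Z^-1 X)^-1 X^T C_Z^-1 Z, then
# Y-hat(s0) = x(s0)^T beta-hat_gls + c_Y(s0)^T C_Z^-1 (Z - X beta-hat_gls).
import numpy as np

def universal_kriging(s, Z, s0, sigma2=1.0, theta1=2.0, sigma2_eps=0.1):
    """Predict Y(s0), mean x(s)^T beta with x(s) = (1, s) (assumed design)."""
    cov = lambda d: sigma2 * np.exp(-np.abs(d) / theta1)
    X = np.column_stack([np.ones_like(s), s])        # design matrix
    x0 = np.array([1.0, s0])                         # x(s0)
    C_Z = cov(s[:, None] - s[None, :]) + sigma2_eps * np.eye(s.size)
    Ci_X = np.linalg.solve(C_Z, X)                   # C_Z^{-1} X
    beta_gls = np.linalg.solve(X.T @ Ci_X, Ci_X.T @ Z)
    c0 = cov(s - s0)                                 # c_Y(s0)
    resid = Z - X @ beta_gls
    return x0 @ beta_gls + c0 @ np.linalg.solve(C_Z, resid)
```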
Ordinary kriging
Ordinary kriging, $E[Y(s)] = \mu$ unknown (special case of UK):
  $\hat Y(s_0) = \left\{c_Y(s_0) + \mathbf{1}\, \dfrac{1 - \mathbf{1}^T C_Z^{-1} c_Y(s_0)}{\mathbf{1}^T C_Z^{-1} \mathbf{1}}\right\}^T C_Z^{-1} Z$
    $= \hat\mu_{gls} + c_Y(s_0)^T C_Z^{-1} (Z - \mathbf{1}\hat\mu_{gls})$
  $\hat\mu_{gls} = [\mathbf{1}^T C_Z^{-1} \mathbf{1}]^{-1} \mathbf{1}^T C_Z^{-1} Z$
To minimize the MSPE, we look for an unbiased linear predictor with minimum variance:
  $\mathrm{MSPE}(\lambda) = E(Y(s_0) - \lambda^T Z)^2$
Unbiasedness constraint: $E[Y(s_0)] - E[\lambda^T Z] = 0$
Prediction variance: $PV(\lambda) = C_Y(s_0, s_0) - 2\lambda^T c_Y(s_0) + \lambda^T C_Z \lambda$
Ordinary kriging equations
We have $E[Y(s_0)] = \mu$ and $E[\lambda^T Z] = \lambda^T E[Z] = \lambda^T \mathbf{1} \mu$, thus
  $\mu = \lambda^T \mathbf{1} \mu \quad\Rightarrow\quad \lambda^T \mathbf{1} = 1$
The problem then becomes
  $\min_\lambda PV(\lambda)$ subject to $\lambda^T \mathbf{1} = 1$
Solved by a Lagrange multiplier, i.e. minimize
  $C_Y(s_0, s_0) - 2\lambda^T c_Y(s_0) + \lambda^T C_Z \lambda - 2\kappa(\lambda^T \mathbf{1} - 1)$
Differentiation wrt $\lambda^T$ gives (which is combined into the final result presented on the previous page):
  $-2 c_Y(s_0) + 2 C_Z \lambda = 2\kappa \mathbf{1}$
  $\lambda^T \mathbf{1} = 1$
(A code sketch follows below.)
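The stationarity conditions form a linear system in $(\lambda, \kappa)$; a minimal sketch of solving it (assumed implementation, not from the notes):

```python
# Solve the ordinary-kriging system  C_Z lambda - kappa*1 = c_Y(s0),
# 1^T lambda = 1  as one augmented linear system.
import numpy as np

def ordinary_kriging_weights(C_Z, c0):
    """Return the weights lambda and the Lagrange multiplier kappa."""
    n = len(c0)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = C_Z
    A[:n, n] = -1.0            # -kappa * 1 term
    A[n, :n] = 1.0             # unbiasedness constraint 1^T lambda = 1
    b = np.append(c0, 1.0)
    sol = np.linalg.solve(A, b)
    lam, kappa = sol[:n], sol[n]
    return lam, kappa          # weights sum to 1; predictor is lam @ Z
```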