Chapter 4 - Fundamentals of spatial processes
Lecture notes
Geir Storvik
January 21, 2013
STK4150 - Intro

Spatial processes
- Typically correlation between nearby sites
- Mostly positive correlation
- Negative correlation when there is competition
- Typically part of a space-time process
  - Temporal snapshot
  - Temporal aggregation
Statistical analysis
- Incorporate spatial dependence into spatial statistical models
- Relatively new in statistics, an active research field!
Hierarchical (statistical) models
- Data model
- Process model
- In a time-series setting: state-space models
Hierarchical (statistical) models
- Data model
- Process model
- In a time-series setting: state-space models

Example
Y_t = α Y_{t-1} + W_t,  t = 2, 3, ...      (process model)
Z_t = β Y_t + η_t,      t = 1, 2, 3, ...   (data model)
W_t ~ ind. (0, σ_W^2),  η_t ~ ind. (0, σ_η^2)

We will do a similar type of modelling now, separating the process model and the data model:
Z(s_i) = Y(s_i) + ε(s_i),  ε(s_i) iid
Prediction
Typical aim:
- Observed Z = (Z(s_1), ..., Z(s_m))
- Want to predict Y(s_0)
Squared-error loss function: optimal predictor is E(Y(s_0) | Z)
Empirical methods: E(Y(s_0) | Z) ≈ E(Y(s_0) | Z, θ̂)
Bayesian methods: E(Y(s_0) | Z) = ∫_θ E(Y(s_0) | Z, θ) p(θ | Z) dθ
Both require E(Y(s_0) | Z, θ)
- Z | Y are independent, but dependent marginally
- Require a model for {Y(s)}
Geostatistical models
Assume {Y(s)} is a Gaussian process:
(Y(s_1), ..., Y(s_m)) is multivariate Gaussian for all m and s_1, ..., s_m
Need to specify
µ(s) = E(Y(s))
C_Y(s, s′) = cov(Y(s), Y(s′))
Usually assume 2nd-order stationarity:
µ(s) = µ for all s
C_Y(s, s′) = C_Y(s − s′) for all s, s′
Possible extension: µ(s) = x(s)^T β
Often: Z(s_i) | Y(s_i), σ_ε^2 ~ ind. Gau(Y(s_i), σ_ε^2)
Prediction
(Z(s_1), ..., Z(s_m), Y(s_0)) is multivariate Gaussian
Can use the rules for conditional distributions: if
(X_1, X_2)^T ~ MVN( (µ_1, µ_2)^T, [Σ_11  Σ_12; Σ_21  Σ_22] )
then
E(X_1 | X_2) = µ_1 + Σ_12 Σ_22^{-1} (X_2 − µ_2)
var(X_1 | X_2) = Σ_11 − Σ_12 Σ_22^{-1} Σ_21
Expectations: as for ordinary linear regression
Covariances: new!
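The conditioning rule above is easy to check numerically. Below is a small sketch (not from the lecture notes; the function name and setup are my own) that computes E(X_1 | X_2) and var(X_1 | X_2) with NumPy:

```python
import numpy as np

def conditional_mvn(mu1, mu2, S11, S12, S22, x2):
    """Mean and covariance of X1 | X2 = x2 for jointly Gaussian (X1, X2).

    Implements E(X1|X2)   = mu1 + S12 S22^{-1} (x2 - mu2)
    and        var(X1|X2) = S11 - S12 S22^{-1} S12^T.
    """
    w = np.linalg.solve(S22, x2 - mu2)               # S22^{-1}(x2 - mu2)
    cond_mean = mu1 + S12 @ w
    cond_cov = S11 - S12 @ np.linalg.solve(S22, S12.T)
    return cond_mean, cond_cov
```

For two standard normals with covariance 0.5, conditioning on X_2 = 1 gives mean 0.5 and variance 1 − 0.5² = 0.75, as the formula predicts.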
Variogram and covariance function
- Dependence can be specified through covariance functions
- Such functions do not always exist, but prediction can still be carried out
- More general concept: the variogram
Note:
2γ_Y(h) ≡ var[Y(s + h) − Y(s)]
        = var[Y(s + h)] + var[Y(s)] − 2 cov[Y(s + h), Y(s)]
        = 2C_Y(0) − 2C_Y(h)
The variogram can exist even if var[Y(s)] = ∞!
This is often assumed in spatial statistics, hence the use of variograms.
When 2γ_Y(h) = 2γ_Y(‖h‖), the variogram is isotropic.
Isotropic covariance functions/variograms
Matérn covariance function:
C_Y(h; θ) = σ_1^2 {2^{θ_2 − 1} Γ(θ_2)}^{-1} (‖h‖/θ_1)^{θ_2} K_{θ_2}(‖h‖/θ_1)
Powered exponential:
C_Y(h; θ) = σ_1^2 exp{−(‖h‖/θ_1)^{θ_2}}
Exponential:
C_Y(h; θ) = σ_1^2 exp{−‖h‖/θ_1}
Gaussian:
C_Y(h; θ) = σ_1^2 exp{−(‖h‖/θ_1)^2}
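These covariance families can be coded directly; a sketch using SciPy's modified Bessel function K_ν (function names are my own). A useful sanity check is that the Matérn family with smoothness θ_2 = 1/2 reduces to the exponential covariance:

```python
import numpy as np
from scipy.special import kv, gamma  # kv = modified Bessel K_nu

def matern(h, sigma2, theta1, nu):
    """Matern covariance sigma2 * {2^(nu-1) Gamma(nu)}^-1 (h/theta1)^nu K_nu(h/theta1).

    h is an array of distances; C(0) = sigma2 is handled as the limit.
    """
    h = np.asarray(h, dtype=float)
    c = np.full(h.shape, float(sigma2))              # value at h = 0
    nz = h > 0
    x = h[nz] / theta1
    c[nz] = sigma2 * x**nu * kv(nu, x) / (2.0**(nu - 1.0) * gamma(nu))
    return c

def powered_exponential(h, sigma2, theta1, p):
    """Powered-exponential covariance sigma2 * exp{-(h/theta1)^p}."""
    return sigma2 * np.exp(-(np.asarray(h, dtype=float) / theta1) ** p)
```

With p = 2 the powered-exponential form gives the Gaussian covariance, and with p = 1 the exponential, matching the special cases listed above.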
Bochner's theorem
A covariance function needs to be positive definite.
Theorem (Bochner, 1955). If ∫ |C_Y(h)| dh < ∞, then a valid real-valued covariance function can be written as
C_Y(h) = ∫ cos(ω^T h) f_Y(ω) dω
where f_Y(ω) ≥ 0 is symmetric about ω = 0.
f_Y(ω): spectral density of C_Y(h).
Nugget effect and sill
We have
C_Z(h) = cov[Z(s), Z(s + h)]
       = cov[Y(s) + ε(s), Y(s + h) + ε(s + h)]
       = cov[Y(s), Y(s + h)] + cov[ε(s), ε(s + h)]
       = C_Y(h) + σ_ε^2 I(h = 0)
Assume
C_Y(0) = σ_Y^2,  lim_{h→0} [C_Y(0) − C_Y(h)] = 0
Then
C_Z(0) = σ_Y^2 + σ_ε^2                       (sill)
lim_{h→0} [C_Z(0) − C_Z(h)] = σ_ε^2 = c_0    (nugget effect)
Possible to include a nugget effect also in the Y-process.
Nugget/sill [figure]
Estimation of the variogram/covariance function
2γ_Z(h) ≡ var[Z(s + h) − Z(s)]
        = E[Z(s + h) − Z(s)]^2     (constant expectation)
Can estimate from all pairs of sites separated by distance h.
Problem: few/no pairs for every h.
Simplifications:
γ_Z(h) = γ_Z^0(‖h‖)
Use
2γ̂_Z^0(h) = ave{(Z(s_i) − Z(s_j))^2 : ‖s_i − s_j‖ ∈ T(h)}
where T(h) is a tolerance region around h.
If covariates, use residuals.
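The binned estimator above can be sketched in a few lines; this is an illustrative implementation (names and binning scheme are my own choices), where each distance bin plays the role of the tolerance region T(h):

```python
import numpy as np

def empirical_variogram(coords, z, bin_edges):
    """Classical variogram estimator:
    2*gamma_hat(h) = average of (Z(s_i) - Z(s_j))^2 over pairs whose
    distance falls in the tolerance bin around h.
    Returns bin centres and the semivariogram gamma_hat."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq = (z[:, None] - z[None, :]) ** 2
    iu = np.triu_indices(len(z), k=1)            # count each pair once
    d, sq = d[iu], sq[iu]
    centres, gammas = [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        m = (d >= lo) & (d < hi)
        if m.any():
            centres.append(0.5 * (lo + hi))
            gammas.append(0.5 * sq[m].mean())    # gamma = (1/2) ave squared diff
    return np.array(centres), np.array(gammas)
```

For data with covariates one would pass regression residuals as `z`, as the slide notes.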
Empirical variogram: Boreality data [figure]
Testing for independence
Under independence: γ_Z^0(h) = σ_Z^2
Test statistic: F = γ̂_Z(h_1)/σ̂_Z^2, with h_1 the smallest observed distance
Reject H_0 when |F − 1| is large
Permutation test: recalculate F for permutations of Z
If the observed F is above the 97.5% percentile, reject H_0
Boreality example: p-value = 0.0001
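A toy version of this permutation test can be sketched as follows (an illustrative implementation, not the one used for the Boreality analysis; here γ̂ is computed from the pairs at exactly the smallest observed distance, and a two-sided p-value is based on |F − 1|):

```python
import numpy as np

def permutation_independence_test(coords, z, n_perm=999, seed=0):
    """Permutation test for spatial independence based on
    F = gamma_hat(h1) / sigma2_hat, h1 the smallest observed distance.
    Returns the observed F and a permutation p-value."""
    rng = np.random.default_rng(seed)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    iu = np.triu_indices(len(z), k=1)
    h1 = d[iu].min()
    pair = np.argwhere(np.isclose(d, h1) & (d > 0))
    pair = pair[pair[:, 0] < pair[:, 1]]         # pairs at the smallest distance

    def F(v):
        g = 0.5 * np.mean((v[pair[:, 0]] - v[pair[:, 1]]) ** 2)
        return g / v.var(ddof=1)

    f_obs = F(z)
    f_perm = np.array([F(rng.permutation(z)) for _ in range(n_perm)])
    p = (1 + np.sum(np.abs(f_perm - 1) >= np.abs(f_obs - 1))) / (n_perm + 1)
    return f_obs, p
```

Permuting Z breaks any spatial structure while keeping the marginal distribution fixed, which is exactly the null hypothesis of independence.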
Kriging = prediction
Model:
Y = Xβ + δ
Z = Y + ε
Prediction of Y(s_0): linear predictors {L^T Z + k}
The optimal predictor minimizes
MSPE(L, k) ≡ E[Y(s_0) − L^T Z − k]^2
           = var[Y(s_0) − L^T Z − k] + {E[Y(s_0) − L^T Z − k]}^2
Note: no distributional assumptions are made.
Kriging
MSPE(L, k) = var[Y(s_0) − L^T Z − k] + {E[Y(s_0) − L^T Z − k]}^2
           = var[Y(s_0) − L^T Z] + {µ_Y(s_0) − L^T µ_Z − k}^2
The second term is zero if k = µ_Y(s_0) − L^T µ_Z.
First term (with c(s_0) = cov[Z, Y(s_0)]):
var[Y(s_0) − L^T Z] = C_Y(s_0, s_0) − 2L^T c(s_0) + L^T C_Z L
Derivative wrt L^T:
−2c(s_0) + 2C_Z L = 0
giving
L* = C_Z^{-1} c(s_0)
Ŷ(s_0) = µ_Y(s_0) + c(s_0)^T C_Z^{-1} [Z − µ_Z]
MSPE(L*, k*) = C_Y(s_0, s_0) − c(s_0)^T C_Z^{-1} c(s_0)
Kriging
c(s_0) = cov[Z, Y(s_0)] = cov[Y, Y(s_0)]
       = (C_Y(s_0, s_1), ..., C_Y(s_0, s_m))^T = c_Y(s_0)
C_Z = {C_Z(s_i, s_j)}, where
C_Z(s_i, s_j) = C_Y(s_i, s_i) + σ_ε^2   if s_i = s_j
C_Z(s_i, s_j) = C_Y(s_i, s_j)           if s_i ≠ s_j
Ŷ(s_0) = µ_Y(s_0) + c_Y(s_0)^T C_Z^{-1} (Z − µ_Z)
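Putting the pieces together, the simple-kriging predictor and its MSPE can be sketched as below (illustrative code, my own names; `cov_fn` is any isotropic covariance C_Y evaluated at a distance, e.g. the exponential model). With no nugget (σ_ε^2 = 0) the predictor interpolates the data exactly:

```python
import numpy as np

def simple_kriging(coords, z, s0, mu, cov_fn, sigma2_eps):
    """Simple-kriging predictor of Y(s0) under Z = Y + eps:
    Yhat(s0) = mu + c_Y(s0)^T C_Z^{-1} (Z - mu),  C_Z = Sigma_Y + sigma2_eps I,
    MSPE     = C_Y(s0, s0) - c_Y(s0)^T C_Z^{-1} c_Y(s0)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    C_Z = cov_fn(d) + sigma2_eps * np.eye(len(z))
    c0 = cov_fn(np.linalg.norm(coords - s0, axis=-1))
    w = np.linalg.solve(C_Z, c0)                  # C_Z^{-1} c_Y(s0)
    pred = mu + w @ (z - mu)
    mspe = cov_fn(0.0) - c0 @ w
    return pred, mspe
```

Predicting at an observed location with no nugget returns the observation itself with zero MSPE, which is a quick correctness check on the formulas.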
Gaussian assumptions
Assume now Y, Z are MVN:
(Z, Y(s_0))^T ~ MVN( (µ_Z, µ_Y(s_0))^T, [C_Z  c_Y(s_0); c_Y(s_0)^T  C_Y(s_0, s_0)] )
This gives
Y(s_0) | Z ~ N(µ_Y(s_0) + c_Y(s_0)^T C_Z^{-1} [Z − µ_Z], C_Y(s_0, s_0) − c_Y(s_0)^T C_Z^{-1} c_Y(s_0))
Same as kriging!
The book derives this directly, without using the formula for the conditional distribution.
Kriging (cont.)
Assuming
E[Y(s)] = x(s)^T β
Z(s_i) | Y(s_i), σ_ε^2 ~ ind. Gau(Y(s_i), σ_ε^2)
Then
µ_Z = µ_Y = Xβ
C_Z = Σ_Y + σ_ε^2 I
c_Y(s_0) = (C_Y(s_0, s_1), ..., C_Y(s_0, s_m))^T
and
Ŷ(s_0) = x(s_0)^T β + c_Y(s_0)^T [Σ_Y + σ_ε^2 I]^{-1} (Z − Xβ)
Summary previous time / plan
Simple kriging
- Linear predictor
- Assumes parameters known
- Equal to the conditional expectation
Unknown parameters
- Ordinary kriging
- Plug-in estimates
- Bayesian approach
Non-Gaussian models
Unknown parameters
So far we have assumed the parameters known; what if they are unknown?
- Plug-in estimates / empirical Bayes
- Bayesian approach
- Direct approach: ordinary kriging
Estimation
Likelihood:
L(θ) = p(z; θ) = ∫_y p(z | y; θ) p(y; θ) dy
Gaussian processes: can be derived analytically
Optimization can still be problematic
Many (too many?) routines in R available

> gls(Bor~Wet,correlation=corGaus(form=~x+y,nugget=TRUE),data=Boreality)
Generalized least squares fit by REML
  Model: Bor ~ Wet
  Data: Boreality
  Log-restricted-likelihood: -1363.146

Coefficients:
(Intercept)         Wet
   15.06705    77.66940

Correlation Structure: Gaussian spatial correlation
 Formula: ~x + y
 Parameter estimate(s):
      range      nugget
460.3941829   0.6112509
Degrees of freedom: 533 total; 531 residual
Residual standard error: 3.636527
Bayesian approach
Hierarchical model:
  Variable          Densities          Notation in book
  Data model:       Z ~ p(z | Y, θ)    [Z | Y, θ]
  Process model:    Y ~ p(y | θ)       [Y | θ]
  Parameter:        θ
Simultaneous model: p(y, z | θ) = p(z | y, θ) p(y | θ)
Marginal model: L(θ) = p(z | θ) = ∫_y p(z, y | θ) dy
Inference: θ̂ = argmax_θ L(θ)

Bayesian approach: include a model for θ
  Variable          Densities          Notation in book
  Data model:       Z ~ p(z | Y, θ)    [Z | Y, θ]
  Process model:    Y ~ p(y | θ)       [Y | θ]
  Parameter model:  θ ~ p(θ)           [θ]
Simultaneous model: p(y, z, θ)
Marginal model: p(z) = ∫_θ ∫_y p(z, y | θ) p(θ) dy dθ
Inference: θ̂ = ∫_θ θ p(θ | z) dθ
Bayesian approach - example
Assume z_1, ..., z_m are iid,
z_i = µ + ε_i,  ε_i ~ N(0, σ^2)
σ^2 known, interest in estimating µ.
ML: µ̂_ML = z̄, with Var[z̄] = σ^2/m
Bayesian approach: assume µ ~ N(µ_0, kσ^2). One can show that
E[µ | z] = (k^{-1} µ_0 + m z̄)/(k^{-1} + m)
Var[µ | z] = σ^2/(k^{-1} + m)
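The posterior formulas above are a one-liner to implement; a minimal sketch (function name my own):

```python
import numpy as np

def posterior_mu(z, sigma2, mu0, k):
    """Posterior of mu under z_i ~ N(mu, sigma2) iid, prior mu ~ N(mu0, k*sigma2):
    E[mu|z]   = (mu0/k + m*zbar) / (1/k + m)
    Var[mu|z] = sigma2 / (1/k + m)"""
    m, zbar = len(z), np.mean(z)
    post_mean = (mu0 / k + m * zbar) / (1.0 / k + m)
    post_var = sigma2 / (1.0 / k + m)
    return post_mean, post_var
```

As k → ∞ (a vague prior) the posterior mean tends to z̄ and the posterior variance to σ^2/m, recovering the ML answer.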
Prediction - example
Predict a new point z_0.
Plug-in rule: ẑ_0 = z̄, var[ẑ_0] = σ^2
Bayesian approach:
E[z_0 | z] = E[E[z_0 | µ, z]] = E[µ | z] = (k^{-1} µ_0 + m z̄)/(k^{-1} + m)
var[z_0 | z] = E[var[z_0 | µ, z]] + var[E[z_0 | µ, z]]
             = σ^2 + var[µ | z]
             = σ^2 + σ^2/(k^{-1} + m)
Taking the uncertainty in µ into account.
INLA
INLA = integrated nested Laplace approximation (software)
C code, R interface
Can be installed by
source("http://www.math.ntnu.no/inla/givemeinla.r")
Both empirical Bayes and Bayes
Example - simulated data
Assume
Y(s) = β_0 + β_1 x(s) + σ_δ δ(s)
Z(s_i) = Y(s_i) + σ_ε ε(s_i)
{δ(s)} Matérn correlation, {ε(s_i)} independent
Simulations:
- Simulation on a 20 × 30 grid
- {x(s)} a sine curve in the horizontal direction
- Parameter values (β_0, β_1) = (2, 0.3), σ_δ = 1, σ_ε = 0.3 and (θ_1, θ_2) = (2, 3)
Show images
Show code/results
Matérn model - parametrization
Model   Correlation form                     Range par   Smoothness par
Book    (‖h‖/θ_1)^{θ_2} K_{θ_2}(‖h‖/θ_1)     θ_1         θ_2
geoR    (‖h‖/φ)^κ K_κ(‖h‖/φ)                 φ           κ
INLA    (‖h‖κ)^ν K_ν(‖h‖κ)                   1/κ         ν
Note: INLA reports an estimate of another range, defined as
r = √8/κ
corresponding to a distance where the covariance function is approximately zero.
An estimate of the range parameter can then be obtained by
1/κ = r/√8
Note: INLA works with precisions τ_y = σ_y^{-2}, τ_z = σ_z^{-2}.
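The two conversions in the notes (INLA's reported range to the book's range parameter, and precision to standard deviation) are easy to get wrong by hand, so a tiny helper is worth sketching (function names are my own; the r = √8/κ convention is the one stated on this slide):

```python
import numpy as np

def inla_range_to_theta1(r):
    """Convert the range r reported by INLA (r = sqrt(8)/kappa, as on this
    slide) to the book's Matern range parameter theta1 = 1/kappa = r/sqrt(8)."""
    return r / np.sqrt(8.0)

def precision_to_sd(tau):
    """INLA works with precisions tau = 1/sigma^2; invert to a std deviation."""
    return 1.0 / np.sqrt(tau)
```

Applied to the reported range 5.977 this gives θ_1 ≈ 2.113, and to the precision 13.2739 it gives σ ≈ 0.274, matching the result table two slides below.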
Output INLA - empirical Bayes

Fixed effects:
                 mean          sd    0.025quant  0.5quant   0.975quant  kld
(Intercept)  2.5757115  0.006218343   2.5635180  2.5757114   2.5879077    0
x            0.3714519  0.040367838   0.2922947  0.3714515   0.4506262    0

Random effects:
Name   Model           Max KLD
delta  Matern2D model

Model hyperparameters:
                                           mean      sd    0.025quant  0.5quant  0.975quant
Precision for the Gaussian observations  28.8979  1.5841     25.9174   28.8546     32.1247
Precision for delta                       2.2549  0.1507      1.9738    2.2498      2.5644
Range for delta                           5.9772  0.2936      5.4222    5.9700      6.5730

Marginal Likelihood: -455.08
Warning: Interpret the marginal likelihood with care if the prior model is improper.
Posterior marginals for linear predictor and fitted values computed
Result - INLA - empirical Bayes
Parameter  True value  Estimate
β_0        2.0         2.423
β_1        0.3         0.357
σ_z        0.3         1/√13.2739 = 0.274
σ_y        1.0         1/√0.5802  = 1.313
θ_1        2.0         5.977/√8   = 2.113
Ordinary kriging
E[Y(s)] = µ, µ unknown, observations Z = (Z_1, ..., Z_m)
MSPE(λ) = E[Y(s_0) − λ^T Z]^2
Constraint (unbiasedness): E[λ^T Z] = E[Y(s_0)] = µ
E[λ^T Z] = λ^T E[Z] = λ^T 1 µ  ⇒  λ^T 1 = 1
Lagrange multiplier: minimize MSPE(λ) − 2κ(λ^T 1 − 1), where
MSPE(λ) = C_Y(s_0, s_0) − 2λ^T c_Y(s_0) + λ^T C_Z λ
Differentiation wrt λ^T:
−2c_Y(s_0) + 2C_Z λ = 2κ1
λ^T 1 = 1
giving
λ* = C_Z^{-1} [c_Y(s_0) + κ1],  κ = (1 − c_Y(s_0)^T C_Z^{-1} 1)/(1^T C_Z^{-1} 1)
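The constrained solution above can be sketched directly (illustrative code, names my own); by construction the weights must sum to one, which is a convenient check:

```python
import numpy as np

def ordinary_kriging_weights(C_Z, c0):
    """Ordinary-kriging weights lambda = C_Z^{-1}[c0 + kappa*1] with
    kappa = (1 - 1^T C_Z^{-1} c0) / (1^T C_Z^{-1} 1), so that 1^T lambda = 1."""
    one = np.ones(len(c0))
    Ci_c0 = np.linalg.solve(C_Z, c0)     # C_Z^{-1} c_Y(s0)
    Ci_1 = np.linalg.solve(C_Z, one)     # C_Z^{-1} 1
    kappa = (1.0 - one @ Ci_c0) / (one @ Ci_1)
    return Ci_c0 + kappa * Ci_1
```

The predictor is then λ^T Z, and the unbiasedness constraint λ^T 1 = 1 holds for any positive-definite C_Z and any c0.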
Kriging
Simple kriging, EY(s) = x(s)^T β known:
Ŷ(s_0) = x(s_0)^T β + c_Y(s_0)^T C_Z^{-1} (Z − Xβ)
Ordinary kriging, EY(s) = µ unknown:
Ŷ(s_0) = {c_Y(s_0) + 1 (1 − 1^T C_Z^{-1} c_Y(s_0))/(1^T C_Z^{-1} 1)}^T C_Z^{-1} Z
       = µ̂_gls + c_Y(s_0)^T C_Z^{-1} (Z − 1µ̂_gls)
µ̂_gls = [1^T C_Z^{-1} 1]^{-1} 1^T C_Z^{-1} Z
Universal kriging, EY(s) = x(s)^T β unknown:
Ŷ(s_0) = x(s_0)^T β̂_gls + c_Y(s_0)^T C_Z^{-1} (Z − Xβ̂_gls)
β̂_gls = [X^T C_Z^{-1} X]^{-1} X^T C_Z^{-1} Z
Non-Gaussian data
Counts, binary data: Gaussian assumption inappropriate
Can still have a Gaussian assumption on the latent process, but a non-Gaussian data distribution:
Y(s) = x(s)^T β + ε(s),  {ε(s)} Gaussian process
Z(s_i) | Y(s_i), θ_1 ~ ind. f(Y(s_i), θ_1)
Best linear predictor still possible, but not reasonable(?)
Conditional expectation E[Y(s_0) | Z] still optimal under squared loss
- Not easy to compute anymore
Exponential-family model (EFM):
f(z) = exp{(zη − b(η))/a(θ_1) + c(z, θ_1)},  η = x^T β
Includes Binomial, Poisson, Gaussian, Gamma
Estimation
Likelihood:
L(θ) = p(z; θ) = ∫_y p(z | y; θ) p(y; θ) dy
Gaussian processes: can be derived analytically
Non-Gaussian: can be problematic
- Monte Carlo
- Laplace approximations
Computation of the conditional expectation
E[Y(s_0) | Z] = E[E[Y(s_0) | Y, Z]]
Monte Carlo:
E[E[Y(s_0) | Y, Z]] ≈ (1/M) Σ_{m=1}^M E[Y(s_0) | Y^m, Z]
where Y^m ~ [Y | Z] (MCMC)
Laplace approximation: approximate [Y | Z] by a Gaussian
Computer software is available for both approaches.
Both also give uncertainty measures var[Y(s_0) | Z].
Laplace approximation
L(θ) = p(z; θ) = ∫_y p(z | y; θ) p(y; θ) dy = ∫_y exp{f(y; θ)} dy
where f(y; θ) = log p(z | y; θ) + log p(y; θ).
Taylor approximation around y_0, the max-point of f(y; θ) (depending on θ):
f(y) ≈ f(y_0) − (1/2)(y − y_0)^T H (y − y_0),  H = −∂²f(y)/∂y∂y^T |_{y=y_0}
L(θ) ≈ ∫_y exp{f(y_0) − (1/2)(y − y_0)^T H (y − y_0)} dy
     = exp{f(y_0)} (2π)^{m/2} |H|^{-1/2}
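For scalar y the approximation reduces to log L(θ) ≈ f(y_0) + (1/2)log(2π) − (1/2)log H. A minimal numerical sketch (my own function, with H estimated by a central finite difference at the mode; for a quadratic f the approximation is exact):

```python
import numpy as np

def laplace_log_integral(f, y0, eps=1e-5):
    """Laplace approximation to log of integral of exp{f(y)} dy, scalar y.
    y0 must be the maximizer of f; H = -f''(y0) is estimated numerically."""
    H = -(f(y0 + eps) - 2.0 * f(y0) + f(y0 - eps)) / eps**2
    return f(y0) + 0.5 * np.log(2.0 * np.pi) - 0.5 * np.log(H)
```

Applying it to f(y) = −(y − 1)²/2, whose integral is exactly √(2π), recovers the exact value, since here the Taylor expansion has no remainder.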
Example - simulated data
Assume
Y(s) = β_0 + β_1 x(s) + σ_δ δ(s)
Z(s_i) ~ Poisson(exp(Y(s_i)))
{δ(s)} Matérn correlation
Simulations:
- Simulation on a 20 × 30 grid
- {x(s)} a sine curve in the horizontal direction
- Parameter values (β_0, β_1) = (2, 0.3), σ_δ = 1, σ_ε = 0.3 and (θ_1, θ_2) = (2, 3)
Show images
Show code/results