Hierarchical Modelling for Multivariate Spatial Data

Hierarchical Modelling for Multivariate Spatial Data Geography 890, Hierarchical Bayesian Models for Environmental Spatial Data Analysis February 15, 2011 1

Point-referenced spatial data often come as multivariate measurements at each location. 2 Geography 890

Point-referenced spatial data often come as multivariate measurements at each location. Examples: Environmental monitoring: stations yield measurements on ozone, NO, CO, and PM 2.5. 2 Geography 890

Point-referenced spatial data often come as multivariate measurements at each location. Examples: Environmental monitoring: stations yield measurements on ozone, NO, CO, and PM 2.5. Community ecology: assembiages of plant species due to water availibility, temerature, and light requirements. Forestry: measurements of stand characteristics age, total biomass, and average tree diameter. Atmospheric modeling: at a given site we observe surface temperature, precipitation and wind speed 2 Geography 890

Each location contains m spatial regressions Y k (s) = µ k (s) + w k (s) + ɛ k (s), k = 1,..., m. Mean: µ(s) = [µ k (s)] m k=1 = [xt k (s)β k] m k=1 Cov: w(s) = [w k (s)] m k=1 MV GP (0, Γ w(, )) Γ w (s, s ) = [Cov(w k (s), w k (s ))] m k,k =1 Error: ɛ(s) = [ɛ k (s)] m k=1 MV N(0, Ψ) Ψ is an m m p.d. matrix, e.g. usually Diag(τk 2)m k=1. 3 Geography 890

Multivariate Gaussian process w(s) MV GP (0, Γ w ( )) with Γ w (s, s ) = [Cov(w k (s), w k (s ))] m k,k =1 Example: with m = 2 ( Γ w (s, s Cov(w1 (s), w ) = 1 (s )) Cov(w 1 (s), w 2 (s ) )) Cov(w 2 (s), w 1 (s )) Cov(w 2 (s), w 2 (s )) For finite set of locations S = {s 1,..., s n }: V ar ([w(s i )] n i=1) = Σ w = [Γ w (s i, s j )] n i,j=1 4 Geography 890

Multivariate Gaussian process Properties: Γ w (s, s) = Γ T w(s, s ) lim s s Γ w (s, s ) is p.d. and Γ w (s, s) = V ar(w(s)). For sites in any finite collection S = {s 1,..., s n }: n i=1 j=1 n u T i Γ w (s i, s j )u j 0 for all u i, u j R m. Any valid Γ W must satisfy the above conditions. The last property implies that Σ w is p.d. In complete generality: Γ w (s, s ) need not be symmetric. Γ w (s, s ) need not be p.d. fors s. 5 Geography 890

Modelling cross-covariances Moving average or kernel convolution of a process: Let Z(s) GP (0, ρ( )). Use kernels to form: w j (s) = κ j (u)z(s + u)du = κ j (s s )Z(s )ds Γ w (s s ) has (i, j)-th element: [Γ w (s s )] i,j = κ i (s s + u)κ j (u )ρ(u u )dudu Convolution of Covariance Functions: ρ 1, ρ 2,...ρ m are valid covariance functions. Form: [Γ w (s s )] i,j = ρ i (s s t)ρ j (t)dt. 6 Geography 890

Modelling cross-covariances Constructive approach Let v k (s) GP (0, ρ k (s, s )), for k = 1,..., m be m independent GP s with unit variance. Form the simple multivariate process v(s) = [v k (s)] m k=1 : v(s) MV GP (0, Γ v (, )) with Γ v (s, s ) = Diag(ρ k (s, s )) m k=1. Assume w(s) = A(s)v(s) arises as a space-varying linear transformation of v(s). Then: Γ w (s, s ) = A(s)Γ v (s, s )A T (s ) is a valid cross-covariance function. 7 Geography 890

Modelling cross-covariances Constructive approach, contd. When s = s, Γ v (s, s) = I m, so: Γ w (s, s) = A(s)A T (s) A(s) identifies with any square-root of Γ w (s, s). Can be taken as lower-triangular (Cholesky). A(s) is unknown! Should we first model A(s) to obtain Γ w (s, s)? Or should we model Γ w (s, s ) first and derive A(s)? A(s) is completely determined from within-site associations. 8 Geography 890

Modelling cross-covariances Constructive approach, contd. If A(s) = A: w(s) is stationary when v(s) is. Γ w (s, s ) is symmetric. Γ v (s, s ) = ρ(s, s )I m Γ w = ρ(s, s )AA T Last specification is called intrinsic and leads to separable models: Σ w = H(φ) Λ; Λ = AA T 9 Geography 890

Hierarchical modelling Let y = [Y(s i )] n i=1 and w = [W(s i)] n i=1. First stage: n y β, w, Ψ MV N ( Y(s i ) X(s i ) T β + w(s i ), Ψ ) i=1 10 Geography 890

Hierarchical modelling Let y = [Y(s i )] n i=1 and w = [W(s i)] n i=1. First stage: n y β, w, Ψ MV N ( Y(s i ) X(s i ) T β + w(s i ), Ψ ) i=1 Second stage: w θ MV N(0, Σ w (Φ)) where Σ w (Φ) = [Γ W (s i, s j ; Φ)] n i,j=1. Third stage: Priors on Ω = (β, Ψ, Φ). 10 Geography 890

Bayesian computations Choice: Fit as [y Ω] [Ω] or as [y β, w, Ψ] [w Φ] [Ω]. 11 Geography 890

Bayesian computations Choice: Fit as [y Ω] [Ω] or as [y β, w, Ψ] [w Φ] [Ω]. Conditional model: Conjugate distributions are available for Ψ and other variance parameters. Easy to program. Marginalized model: need Metropolis or Slice sampling for most variance-covariance parameters. Harder to program. But reduced parameter space (no w s) results in faster convergence Σ w (Φ) + I Ψ is more stable than Σ w (Φ). 11 Geography 890

Bayesian computations Recovering the w s? Interest often lies in the spatial surface w y. 12 Geography 890

Bayesian computations Recovering the w s? Interest often lies in the spatial surface w y. They are recovered from [w y, X] = [w Ω, y, X] [Ω y, X]dΩ using posterior samples: 12 Geography 890

Bayesian computations Recovering the w s? Interest often lies in the spatial surface w y. They are recovered from [w y, X] = [w Ω, y, X] [Ω y, X]dΩ using posterior samples: Obtain Ω (1),..., Ω (G) [Ω y, X] For each Ω (g), draw w (g) [w Ω (g), y, X] NOTE: With Gaussian likelihoods [w Ω, y, X] is also Gaussian. With other likelihoods this may not be easy and often the conditional updating scheme is preferred. 12 Geography 890

Spatial prediction Often we need to predict Y (s) at a new set of locations { s 0,..., s m } with associated predictor matrix X. Sample from predictive distribution: [ỹ y, X, X] = [ỹ, Ω y, X, X]dΩ = [ỹ y, Ω, X, X] [Ω y, X]dΩ, [ỹ y, Ω, X, X] is multivariate normal. Sampling scheme: Obtain Ω (1),..., Ω (G) [Ω y, X] For each Ω (g), draw ỹ (g) [ỹ y, Ω (g), X, X]. 13 Geography 890

Illustration Illustration from: Finley, A.O., S. Banerjee, A.R. Ek, and R.E. McRoberts. (2008) Bayesian multivariate process modeling for prediction of forest attributes. Journal of Agricultural, Biological, and Environmental Statistics, 13:60 83. 14 Geography 890

Illustration Bartlett Experimental Forest Slight digression why we fit a model: Association between response and covariates, β, (e.g., ecological interpretation) Residual spatial and/or non-spatial associations and patterns (i.e., given covariates) Subsequent prediction 15 Geography 890

Illustration Bartlett Experimental Forest Study objectives: Evaluate methods for multi-source forest attribute mapping Find the best model, given the data Produce maps of biomass and uncertainty, by tree species Study area: USDA FS Bartlett Experimental Forest (BEF), NH 1,053 ha heavily forested Major tree species: American beech (BE), eastern hemlock (EH), red maple (RM), sugar maple (SM), and yellow birch (YB) 16 Geography 890

Illustration Bartlett Experimental Forest Bartlett Experimental Forest Image provided by www.fs.fed.us/ne/durham/4155/bartlett 17 Geography 890

Illustration Bartlett Experimental Forest Response variables: Metric tons of total tree biomass per ha Measured on 437 1 10 ha plots Models fit using random subset of 218 plots Prediction at remaining 219 plots Inventory plots Latitude (meters) 0 1000 3000 BE EH Latitude (meters) 0 1000 2000 3000 4000 RM SM Longitude (meters) Latitude (meters) 1000 2000 3000 4000 0 1000 3000 YB Longitude (meters) 1000 2000 3000 4000 18 Geography 890

Illustration Bartlett Experimental Forest Covariates DEM derived elevation and slope Spring, Summer, Fall Landsat ETM+ Tasseled Cap features (brightness, greeness, wetness) Latitude (meters) 0 1000 2000 3000 4000 Elevation Slope 1000 2000 3000 4000 1000 2000 3000 4000 Longitude (meters) Longitude (meters) 19 Geography 890

Illustration Bartlett Experimental Forest Candidate models Each model includes 55 covariates and 5 intercepts, therefore, X T is 1090 60. 20 Geography 890

Illustration Bartlett Experimental Forest Candidate models Each model includes 55 covariates and 5 intercepts, therefore, X T is 1090 60. Different specifications of variance structures: 1 Non-spatial multivariate Diag(Ψ) = τ 2 2 Diag(K), same φ, Diag(Ψ) 20 Geography 890

Illustration Bartlett Experimental Forest Model comparison Deviance Information Criterion (DIC): D(Ω) = 2 log L(Data Ω) D(Ω) = E Ω Y [D(Ω)] p D = D(Ω) D( Ω); Ω = E Ω Y [Ω] DIC = D(Ω) + p D. Lower DIC is better. 21 Geography 890

Illustration Bartlett Experimental Forest Selected model Model 5: K, different φ, Diag(Ψ) Parameters: K = 15, φ = 5, Diag(Ψ) = 5 22 Geography 890

Illustration Bartlett Experimental Forest Selected model Model 5: K, different φ, Diag(Ψ) Parameters: K = 15, φ = 5, Diag(Ψ) = 5 Focus on spatial cross-covariance matrix K (for brevity). Posterior inference of cor(k), e.g., 50 (2.5, 97.5) percentiles: BE EH BE 1 EH 0.16 (0.13, 0.21) 1 RM -0.20 (-0.23, -0.15) 0.45 (0.26, 0.66) SM -0.20 (-0.22, -0.17) -0.12 (-0.16, -0.09) YB 0.07 (0.04, 0.08) 0.22 (0.20, 0.25) These relationships expressed in mapped random spatial effects, w. 22 Geography 890

Illustration Bartlett Experimental Forest E[w Data] Inventory plots Latitude (meters) 0 1000 3000 BE EH Latitude (meters) 0 1000 2000 3000 4000 RM SM Longitude (meters) Latitude (meters) 1000 2000 3000 4000 0 1000 3000 YB Longitude (meters) 1000 2000 3000 4000 E[w Data] Validation plots Latitude (meters) 0 1000 3000 BE EH Latitude (meters) 0 1000 2000 3000 4000 RM SM Longitude (meters) Latitude (meters) 0 1000 2000 3000 4000 0 1000 3000 YB Longitude (meters) 0 1000 2000 3000 4000 23 Geography 890

Illustration Bartlett Experimental Forest E[Y Data] Validation plots Latitude (meters) 0 1000 3000 BE EH Latitude (meters) 0 1000 2000 3000 4000 RM SM Longitude (meters) Latitude (meters) 0 1000 2000 3000 4000 0 1000 3000 YB Longitude (meters) 0 1000 2000 3000 4000 P (2.5 < Y < 97.5 Data) Validation plots Latitude (meters) 0 1000 3000 BE EH Latitude (meters) 0 1000 2000 3000 4000 RM SM Longitude (meters) Latitude (meters) 0 1000 2000 3000 4000 0 1000 3000 YB Longitude (meters) 0 1000 2000 3000 4000 24 Geography 890

Illustration Bartlett Experimental Forest Summary Proposed Bayesian hierarchical spatial methodology: Partition sources of uncertainty Provides hypothesis testing Reveal spatial patterns and missing covariates Allow flexible inference Access parameters posterior distribution Access posterior predictive distribution Provide consistent prediction of multiple variables Maintains spatial and non-spatial association 25 Geography 890