Hierarchical Modeling for Multivariate Spatial Data

Sudipto Banerjee (1) and Andrew O. Finley (2)
(1) Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A.
(2) Department of Forestry & Department of Geography, Michigan State University, Lansing, Michigan, U.S.A.
March 4, 2013
NEON Applied Bayesian Regression Spatio-temporal Workshop

Point-referenced spatial data often come as multivariate measurements at each location. Examples:
  Environmental monitoring: stations yield measurements on ozone, NO, CO, and PM2.5.
  Community ecology: assemblages of plant species due to water availability, temperature, and light requirements.
  Forestry: measurements of stand characteristics such as age, total biomass, and average tree diameter.
  Atmospheric modeling: at a given site we observe surface temperature, precipitation, and wind speed.
We anticipate dependence between measurements both at a particular location and across locations.

Multivariate Gaussian process
Each location contains $m$ spatial regressions:
$$Y_k(s) = \mu_k(s) + w_k(s) + \epsilon_k(s), \quad k = 1, \ldots, m.$$
  Mean: $\mu(s) = [\mu_k(s)]_{k=1}^m = [x_k^T(s)\beta_k]_{k=1}^m$
  Cov: $w(s) = [w_k(s)]_{k=1}^m \sim MVGP(0, \Gamma_w(\cdot, \cdot))$ with $\Gamma_w(s, s') = [\mathrm{Cov}(w_k(s), w_{k'}(s'))]_{k,k'=1}^m$
  Error: $\epsilon(s) = [\epsilon_k(s)]_{k=1}^m \sim MVN(0, \Psi)$, where $\Psi$ is an $m \times m$ p.d. matrix, usually $\mathrm{Diag}(\tau_k^2)_{k=1}^m$.
Example with $m = 2$:
$$\Gamma_w(s, s') = \begin{pmatrix} \mathrm{Cov}(w_1(s), w_1(s')) & \mathrm{Cov}(w_1(s), w_2(s')) \\ \mathrm{Cov}(w_2(s), w_1(s')) & \mathrm{Cov}(w_2(s), w_2(s')) \end{pmatrix}$$
For a finite set of locations $S = \{s_1, \ldots, s_n\}$:
$$\mathrm{Var}\left([w(s_i)]_{i=1}^n\right) = \Sigma_w = [\Gamma_w(s_i, s_j)]_{i,j=1}^n$$

Multivariate Gaussian process
Properties:
  $\Gamma_w(s', s) = \Gamma_w^T(s, s')$
  $\lim_{s' \to s} \Gamma_w(s, s')$ is p.d. and $\Gamma_w(s, s) = \mathrm{Var}(w(s))$.
  For sites in any finite collection $S = \{s_1, \ldots, s_n\}$:
  $$\sum_{i=1}^n \sum_{j=1}^n u_i^T \Gamma_w(s_i, s_j) u_j \geq 0 \quad \text{for all } u_i, u_j \in \mathbb{R}^m.$$
Any valid $\Gamma_w$ must satisfy the above conditions. The last property implies that $\Sigma_w$ is p.d.
In complete generality:
  $\Gamma_w(s, s')$ need not be symmetric.
  $\Gamma_w(s, s')$ need not be p.d. for $s \neq s'$.

Moving average or kernel convolution of a process:
Let $Z(s) \sim GP(0, \rho(\cdot))$. Use kernels to form (the second equality takes each kernel to be symmetric about zero):
$$w_j(s) = \int \kappa_j(u) Z(s + u)\,du = \int \kappa_j(s - s') Z(s')\,ds'$$
$\Gamma_w(s - s')$ has $(i, j)$-th element:
$$[\Gamma_w(s - s')]_{i,j} = \int\!\!\int \kappa_i(s - s' + u)\,\kappa_j(u')\,\rho(u - u')\,du\,du'$$

Convolution of covariance functions:
$\rho_1, \rho_2, \ldots, \rho_m$ are valid covariance functions. Form:
$$[\Gamma_w(s - s')]_{i,j} = \int \rho_i(s - s' - t)\,\rho_j(t)\,dt.$$
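The convolution-of-covariance-functions construction can be checked numerically. The sketch below is only an illustration under stated assumptions: one spatial dimension, Gaussian covariance functions $\rho_k(h) = \exp(-h^2 / (2 a_k^2))$, and made-up scale values and site locations. None of these choices come from the slides. It compares a numerical integral with the closed form for Gaussian convolutions, then assembles $\Sigma_w$ and checks symmetry and non-negative definiteness.

```python
import numpy as np

def gauss_cov(h, a):
    """Gaussian covariance function rho(h) = exp(-h^2 / (2 a^2)) (assumed form)."""
    return np.exp(-h**2 / (2.0 * a**2))

def conv_cross_cov(h, a_i, a_j, half_width=30.0, n_grid=6001):
    """Numerically evaluate [Gamma_w(h)]_{ij} = int rho_i(h - t) rho_j(t) dt."""
    t = np.linspace(-half_width, half_width, n_grid)
    dt = t[1] - t[0]
    return np.sum(gauss_cov(h - t, a_i) * gauss_cov(t, a_j)) * dt

def closed_form(h, a_i, a_j):
    """Closed form of the same integral for two Gaussian covariance functions."""
    s2 = a_i**2 + a_j**2
    return np.sqrt(2.0 * np.pi) * a_i * a_j / np.sqrt(s2) * np.exp(-h**2 / (2.0 * s2))

# Hypothetical scales for m = 2 outcomes and n = 6 sites on a line.
scales = np.array([0.5, 1.5])
sites = np.linspace(0.0, 5.0, 6)
m, n = len(scales), len(sites)

# Assemble Sigma_w = [Gamma_w(s_i - s_j)] with site-major ordering.
Sigma_w = np.zeros((n * m, n * m))
for i in range(n):
    for j in range(n):
        h = sites[i] - sites[j]
        for k in range(m):
            for l in range(m):
                Sigma_w[i * m + k, j * m + l] = closed_form(h, scales[k], scales[l])

min_eig = np.linalg.eigvalsh(Sigma_w).min()
print("numeric vs closed form:", conv_cross_cov(0.7, 0.5, 1.5), closed_form(0.7, 0.5, 1.5))
print("smallest eigenvalue of Sigma_w:", min_eig)
```

The smallest eigenvalue should be non-negative up to floating-point error, which is the finite-collection condition listed among the properties.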

Constructive approach
Let $v_k(s) \sim GP(0, \rho_k(s, s'))$, for $k = 1, \ldots, m$, be $m$ independent GPs with unit variance.
Form the simple multivariate process $v(s) = [v_k(s)]_{k=1}^m$:
$$v(s) \sim MVGP(0, \Gamma_v(\cdot, \cdot)) \quad \text{with} \quad \Gamma_v(s, s') = \mathrm{Diag}(\rho_k(s, s'))_{k=1}^m.$$
Assume $w(s) = A(s)v(s)$ arises as a space-varying linear transformation of $v(s)$. Then
$$\Gamma_w(s, s') = A(s)\,\Gamma_v(s, s')\,A^T(s')$$
is a valid cross-covariance function.
When $s = s'$, $\Gamma_v(s, s) = I_m$, so:
$$\Gamma_w(s, s) = A(s)A^T(s)$$
$A(s)$ identifies with any square root of $\Gamma_w(s, s)$. It can be taken as lower-triangular (Cholesky).
$A(s)$ is unknown!
  Should we first model $A(s)$ to obtain $\Gamma_w(s, s)$?
  Or should we model $\Gamma_w(s, s')$ first and derive $A(s)$?
  $A(s)$ is completely determined from within-site associations.

Constructive approach, contd.
If $A(s) = A$:
  $w(s)$ is stationary when $v(s)$ is.
  $\Gamma_w(s, s')$ is symmetric.
  $\Gamma_v(s, s') = \rho(s, s') I_m \Rightarrow \Gamma_w = \rho(s, s') A A^T$
The last specification is called intrinsic and leads to separable models:
$$\Sigma_w = H(\phi) \otimes \Lambda; \quad \Lambda = A A^T$$

Hierarchical modelling
Let $y = [Y(s_i)]_{i=1}^n$ and $w = [w(s_i)]_{i=1}^n$.
First stage:
$$y \mid \beta, w, \Psi \sim \prod_{i=1}^n MVN\left(Y(s_i) \mid X(s_i)^T \beta + w(s_i), \Psi\right)$$
Second stage:
$$w \mid \Phi \sim MVN(0, \Sigma_w(\Phi)), \quad \text{where } \Sigma_w(\Phi) = [\Gamma_w(s_i, s_j; \Phi)]_{i,j=1}^n.$$
Third stage: priors on $\Omega = (\beta, \Psi, \Phi)$.
Marginalized likelihood:
$$y \mid \beta, \Phi, \Psi \sim MVN(X\beta, \Sigma_w(\Phi) + I \otimes \Psi)$$

Bayesian computations
Choice: fit as $[y \mid \Omega] \times [\Omega]$ or as $[y \mid \beta, w, \Psi] \times [w \mid \Phi] \times [\Omega]$.
Conditional model:
  Conjugate distributions are available for $\Psi$ and other variance parameters. Easy to program.
Marginalized model:
  Needs Metropolis or slice sampling for most variance-covariance parameters. Harder to program.
  But the reduced parameter space (no $w$'s) results in faster convergence.
  $\Sigma_w(\Phi) + I \otimes \Psi$ is more stable than $\Sigma_w(\Phi)$.
But what about $\Sigma_w^{-1}(\Phi)$? Matrix inversion is EXPENSIVE: $O(n^3)$.

Recovering the $w$'s?
Interest often lies in the spatial surface $w \mid y$. It is recovered from
$$[w \mid y, X] = \int [w \mid \Omega, y, X] \times [\Omega \mid y, X]\,d\Omega$$
using posterior samples:
  Obtain $\Omega^{(1)}, \ldots, \Omega^{(G)} \sim [\Omega \mid y, X]$.
  For each $\Omega^{(g)}$, draw $w^{(g)} \sim [w \mid \Omega^{(g)}, y, X]$.
NOTE: With Gaussian likelihoods, $[w \mid \Omega, y, X]$ is also Gaussian. With other likelihoods this may not be easy, and the conditional updating scheme is often preferred.
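For the Gaussian case noted above, one draw of $w$ given a single parameter value $\Omega^{(g)}$ can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the separable model $\Sigma_w = H(\phi) \otimes \Lambda$ with an exponential correlation function, an intercept-only design, and synthetic data. All numeric values and the helper name `draw_w` are hypothetical. The conditional distribution used is $w \mid \Omega, y \sim MVN\left(\Sigma_w \Sigma_y^{-1}(y - X\beta),\ \Sigma_w - \Sigma_w \Sigma_y^{-1} \Sigma_w\right)$ with $\Sigma_y = \Sigma_w + I \otimes \Psi$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup; every value here is an illustrative assumption.
n, m = 25, 2
coords = rng.uniform(0.0, 1.0, size=(n, 2))
phi = 6.0                                   # spatial decay parameter
A = np.array([[1.0, 0.0], [0.5, 0.8]])      # lower-triangular square root
Lambda = A @ A.T                            # within-site covariance, Lambda = A A^T
Psi = np.diag([0.1, 0.2])                   # diagonal measurement-error covariance

# Separable cross-covariance: Sigma_w = H(phi) kron Lambda (site-major stacking).
dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
H = np.exp(-phi * dists)
Sigma_w = np.kron(H, Lambda)
Sigma_eps = np.kron(np.eye(n), Psi)

# Intercept-only design: one intercept per outcome.
X = np.kron(np.ones((n, 1)), np.eye(m))
beta = np.array([1.0, -0.5])

# Simulate w and y from the hierarchical model.
w_true = rng.multivariate_normal(np.zeros(n * m), Sigma_w)
y = X @ beta + w_true + rng.multivariate_normal(np.zeros(n * m), Sigma_eps)

def draw_w(y, X, beta, Sigma_w, Sigma_eps, rng):
    """Draw w from [w | Omega, y, X] for a Gaussian likelihood."""
    Sigma_y = Sigma_w + Sigma_eps
    # gain = Sigma_w Sigma_y^{-1}, computed with a solve instead of an explicit inverse.
    gain = np.linalg.solve(Sigma_y, Sigma_w).T
    cond_mean = gain @ (y - X @ beta)
    cond_cov = Sigma_w - gain @ Sigma_w
    cond_cov = 0.5 * (cond_cov + cond_cov.T)   # remove round-off asymmetry
    return rng.multivariate_normal(cond_mean, cond_cov), cond_mean, cond_cov

# Here the simulating parameter values stand in for one posterior draw Omega^(g).
w_draw, cond_mean, cond_cov = draw_w(y, X, beta, Sigma_w, Sigma_eps, rng)
print("correlation of E[w | y] with true w:", np.corrcoef(cond_mean, w_true)[0, 1])
```

In a full analysis this function would be called once per retained posterior sample, so the $O(n^3)$ solve is repeated $G$ times.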

Spatial prediction
Often we need to predict $Y(s)$ at a new set of locations $\{\tilde{s}_0, \ldots, \tilde{s}_m\}$ with associated predictor matrix $\tilde{X}$.
Sample from the predictive distribution:
$$[\tilde{y} \mid y, X, \tilde{X}] = \int [\tilde{y}, \Omega \mid y, X, \tilde{X}]\,d\Omega = \int [\tilde{y} \mid y, \Omega, X, \tilde{X}] \times [\Omega \mid y, X]\,d\Omega,$$
where $[\tilde{y} \mid y, \Omega, X, \tilde{X}]$ is multivariate normal.
Sampling scheme:
  Obtain $\Omega^{(1)}, \ldots, \Omega^{(G)} \sim [\Omega \mid y, X]$.
  For each $\Omega^{(g)}$, draw $\tilde{y}^{(g)} \sim [\tilde{y} \mid y, \Omega^{(g)}, X, \tilde{X}]$.

From: Finley, A.O., S. Banerjee, A.R. Ek, and R.E. McRoberts. (2008) Bayesian multivariate process modeling for prediction of forest attributes. Journal of Agricultural, Biological, and Environmental Statistics, 13:60-83.

Slight digression: why we fit a model
  Association between response and covariates, $\beta$ (e.g., ecological interpretation)
  Residual spatial and/or non-spatial associations and patterns (i.e., given covariates)
  Subsequent prediction

Study objectives:
  Evaluate methods for multi-source forest attribute mapping
  Find the best model, given the data
  Produce maps of biomass and uncertainty, by tree species

Study area:
  USDA FS (F), NH
  1,053 ha, heavily forested
  Major tree species: American beech, eastern hemlock, red maple, sugar maple, and yellow birch

Response variables:
  Metric tons of total tree biomass per ha
  Measured on 437 1/10 ha plots
  Models fit using a random subset of 218 plots
  Prediction at the remaining 219 plots
[Figure: inventory plots. Image provided by www.fs.fed.us/ne/durham/4155/bartlett]
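The fit and hold-out split described above pairs naturally with the sampling scheme on the spatial prediction slide. The sketch below shows one predictive draw $\tilde{y}^{(g)}$ for a single parameter value, under assumptions that are not from the study: a separable model with exponential correlation, an intercept-only design, and small synthetic data (60 sites, 2 outcomes). The helper names `exp_corr`, `rows`, and `draw_y_new` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def exp_corr(a, b, phi):
    """Exponential correlation between two sets of 2-D coordinates (assumed form)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.exp(-phi * d)

def rows(idx, m):
    """Positions in the site-major stacked vector belonging to the sites in idx."""
    return (idx[:, None] * m + np.arange(m)).ravel()

# Synthetic data; all values are illustrative assumptions.
n_all, m = 60, 2
coords = rng.uniform(size=(n_all, 2))
phi = 5.0
Lambda = np.array([[1.0, 0.4], [0.4, 0.8]])
Psi = np.diag([0.1, 0.15])
beta = np.array([2.0, 1.0])
X_all = np.kron(np.ones((n_all, 1)), np.eye(m))
Sigma_all = np.kron(exp_corr(coords, coords, phi), Lambda) + np.kron(np.eye(n_all), Psi)
y_all = rng.multivariate_normal(X_all @ beta, Sigma_all)

# Random split into fitting sites and hold-out sites.
perm = rng.permutation(n_all)
fit, hold = perm[:30], perm[30:]
y, X = y_all[rows(fit, m)], X_all[rows(fit, m)]
X_new = X_all[rows(hold, m)]

def draw_y_new(y, X, X_new, s_fit, s_new, beta, Lambda, Psi, phi, rng):
    """One draw from [y_new | y, Omega, X, X_new], which is multivariate normal."""
    Sigma_y = np.kron(exp_corr(s_fit, s_fit, phi), Lambda) + np.kron(np.eye(len(s_fit)), Psi)
    Sigma_new = np.kron(exp_corr(s_new, s_new, phi), Lambda) + np.kron(np.eye(len(s_new)), Psi)
    C = np.kron(exp_corr(s_new, s_fit, phi), Lambda)    # Cov(y_new, y)
    gain = np.linalg.solve(Sigma_y, C.T).T              # C Sigma_y^{-1}
    mean = X_new @ beta + gain @ (y - X @ beta)
    cov = Sigma_new - gain @ C.T
    cov = 0.5 * (cov + cov.T)                           # remove round-off asymmetry
    return rng.multivariate_normal(mean, cov), mean, cov

# The simulating parameter values stand in for one posterior draw Omega^(g).
y_draw, pred_mean, pred_cov = draw_y_new(
    y, X, X_new, coords[fit], coords[hold], beta, Lambda, Psi, phi, rng
)
rmse = np.sqrt(np.mean((pred_mean - y_all[rows(hold, m)]) ** 2))
print("hold-out RMSE of the predictive mean:", rmse)
```

Repeating the draw over all posterior samples gives the predictive distribution at each hold-out plot, from which means and interval widths can be mapped.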

Covariates
  DEM-derived elevation and slope
  Spring, Summer, Fall Landsat ETM+ Tasseled Cap features (brightness, greenness, wetness)
[Figure: maps of elevation and slope]

Candidate models
Each model includes 55 covariates and 5 intercepts; therefore, $X^T$ is $1090 \times 60$.
Different specifications of variance structures:
  1. Non-spatial multivariate, $\mathrm{Diag}(\Psi) = \tau^2$
  2. $\mathrm{Diag}(K)$, same $\phi$, $\mathrm{Diag}(\Psi)$
  3. $K$, same $\phi$, $\mathrm{Diag}(\Psi)$
  4. $\mathrm{Diag}(K)$, different $\phi$, $\mathrm{Diag}(\Psi)$
  5. $K$, different $\phi$, $\mathrm{Diag}(\Psi)$
  6. $K$, different $\phi$, $\Psi$

Model comparison
Deviance Information Criterion (DIC):
$$D(\Omega) = -2 \log L(\text{Data} \mid \Omega)$$
$$\bar{D}(\Omega) = E_{\Omega \mid Y}[D(\Omega)]$$
$$p_D = \bar{D}(\Omega) - D(\bar{\Omega}); \quad \bar{\Omega} = E_{\Omega \mid Y}[\Omega]$$
$$DIC = \bar{D}(\Omega) + p_D.$$
Lower DIC is better.

  Model   p_D   DIC
  1       35    8559
  2       35    8543
  3       35    8521
  4       33    8536
  5       34    8505
  6       38    8507

Selected model
Model 5: $K$, different $\phi$, $\mathrm{Diag}(\Psi)$
Number of parameters: 15 in $K$, 5 in $\phi$, 5 in $\mathrm{Diag}(\Psi)$
Focus on the spatial cross-covariance matrix $K$ (for brevity).
Posterior inference of $\mathrm{cor}(K)$, reported as 50 (2.5, 97.5) percentiles. The lower-triangular matrix layout did not survive extraction; the recoverable off-diagonal entries, in the order they appear, are:
  0.16 (0.13, 0.21)
  -0.20 (-0.23, -0.15)
  0.45 (0.26, 0.66)
  -0.20 (-0.22, -0.17)
  -0.12 (-0.16, -0.09)
  0.07 (0.04, 0.08)
  0.22 (0.20, 0.25)
These relationships are expressed in the mapped random spatial effects, $w$.
[Figure: maps of $E[w \mid \text{Data}]$, with inventory plots]
[Figure: maps of $E[Y \mid \text{Data}]$ and $P(2.5 < Y < 97.5 \mid \text{Data})$]
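The DIC quantities from the model comparison slide can be computed directly from posterior draws. The sketch below uses a deliberately simple stand-in model, not the forest biomass models: $y_i \sim N(\mu, 1)$ with a flat prior on $\mu$, so that posterior draws can be simulated exactly as $\mu \mid y \sim N(\bar{y}, 1/n)$. The data, the sample sizes, and the `deviance` helper are assumptions made for illustration. For this one-parameter model, $p_D$ should come out close to 1.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data with known unit variance (illustrative assumption).
n = 50
y = rng.normal(loc=2.0, scale=1.0, size=n)

# Exact posterior draws of mu under a flat prior: mu | y ~ N(ybar, 1/n).
G = 20000
mu_draws = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(n), size=G)

def deviance(mu, y):
    """D(Omega) = -2 log L(Data | Omega) for y_i ~ N(mu, 1)."""
    return len(y) * np.log(2.0 * np.pi) + np.sum((y - mu) ** 2)

D_draws = np.array([deviance(mu, y) for mu in mu_draws])
D_bar = D_draws.mean()                        # posterior mean deviance
D_at_mean = deviance(mu_draws.mean(), y)      # deviance at the posterior mean
p_D = D_bar - D_at_mean                       # effective number of parameters
DIC = D_bar + p_D

print("p_D:", p_D)
print("DIC:", DIC)
```

For the spatial models in the table, the same three steps apply with $\Omega$ replaced by the full parameter vector and the deviance evaluated from the multivariate normal likelihood.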

Summary
Proposed Bayesian hierarchical spatial methodology:
  Partitions sources of uncertainty
  Provides hypothesis testing
  Reveals spatial patterns and missing covariates
  Allows flexible inference
    Access to the parameters' posterior distribution
    Access to the posterior predictive distribution
  Provides consistent prediction of multiple variables
    Maintains spatial and non-spatial association
Extendable model template:
  Cluster plot sample design: multiresolution models
  Non-continuous response: generalized linear models
  Observations over time and space: spatiotemporal models