spbayes: An R Package for Univariate and Multivariate Hierarchical Point-referenced Spatial Models

Similar documents
Hierarchical Modelling for Univariate Spatial Data

Hierarchical Modeling for Multivariate Spatial Data

Journal of Statistical Software

Hierarchical Modelling for Univariate Spatial Data

Hierarchical Modelling for Multivariate Spatial Data

Hierarchical Modeling for Univariate Spatial Data

spbayes: an R package for Univariate and Multivariate Hierarchical Point-referenced Spatial Models

Hierarchical Modeling for non-gaussian Spatial Data

Nearest Neighbor Gaussian Processes for Large Spatial Data

Hierarchical Modelling for non-gaussian Spatial Data

Gaussian predictive process models for large spatial data sets.

Hierarchical Modelling for Univariate and Multivariate Spatial Data

Hierarchical Modeling for Spatio-temporal Data

Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets

Hierarchical Modeling for Spatial Data

Introduction to Spatial Data and Models

Introduction to Geostatistics

Introduction to Spatial Data and Models

Hierarchical Modelling for non-gaussian Spatial Data

Introduction to Spatial Data and Models

Models for spatial data (cont d) Types of spatial data. Types of spatial data (cont d) Hierarchical models for spatial data

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.

MCMC algorithms for fitting Bayesian models

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Modelling Multivariate Spatial Data

Bayesian Linear Models

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Bayesian Linear Models

Model Assessment and Comparisons

Bayesian spatial hierarchical modeling for temperature extremes

Bayesian Linear Regression

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Introduction to Spatial Data and Models

Analysis of Marked Point Patterns with Spatial and Non-spatial Covariate Information

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Some notes on efficient computing and setting up high performance computing environments

Geostatistical Modeling for Large Data Sets: Low-rank methods

Bayesian Linear Models

A Framework for Daily Spatio-Temporal Stochastic Weather Simulation

CBMS Lecture 1. Alan E. Gelfand Duke University

The Wishart distribution Scaled Wishart. Wishart Priors. Patrick Breheny. March 28. Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/11

Principles of Bayesian Inference

On Gaussian Process Models for High-Dimensional Geostatistical Datasets

Lecture 23. Spatio-temporal Models. Colin Rundel 04/17/2017

Comparing Non-informative Priors for Estimation and Prediction in Spatial Models

Metropolis-Hastings Algorithm

Gaussian processes for spatial modelling in environmental health: parameterizing for flexibility vs. computational efficiency

ST 740: Markov Chain Monte Carlo

Bayesian Dynamic Modeling for Space-time Data in R

Bayesian data analysis in practice: Three simple examples

Principles of Bayesian Inference

A Note on the comparison of Nearest Neighbor Gaussian Process (NNGP) based models

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

Hierarchical Modeling for Univariate Spatial Data

Advanced Statistical Modelling

Practical Bayesian Optimization of Machine Learning. Learning Algorithms

Technical Vignette 5: Understanding intrinsic Gaussian Markov random field spatial models, including intrinsic conditional autoregressive models

Principles of Bayesian Inference

Computational statistics

A Divide-and-Conquer Bayesian Approach to Large-Scale Kriging

Wrapped Gaussian processes: a short review and some new results

Empirical Bayes methods for the transformed Gaussian random field model with additive measurement errors

Markov Chain Monte Carlo

Bayesian Hierarchical Models

Bayesian non-parametric model to longitudinally predict churn

CPSC 540: Machine Learning

MCMC 2: Lecture 2 Coding and output. Phil O Neill Theo Kypraios School of Mathematical Sciences University of Nottingham

BAYESIAN MODEL FOR SPATIAL DEPENDANCE AND PREDICTION OF TUBERCULOSIS

Overview of Spatial Statistics with Applications to fmri

Bayesian Estimation of Input Output Tables for Russia

Regression with correlation for the Sales Data

Nonstationary spatial process modeling Part II Paul D. Sampson --- Catherine Calder Univ of Washington --- Ohio State University

Default Priors and Effcient Posterior Computation in Bayesian

Approaches for Multiple Disease Mapping: MCAR and SANOVA

Introduction to Machine Learning CMU-10701

Chapter 4 - Fundamentals of spatial processes Lecture notes

Or How to select variables Using Bayesian LASSO

Effective Sample Size in Spatial Modeling

A general mixed model approach for spatio-temporal regression data

Flexible Regression Modeling using Bayesian Nonparametric Mixtures

Package spatial.gev.bma

Introduction to Bayes and non-bayes spatial statistics

Bayesian Modeling and Inference for High-Dimensional Spatiotemporal Datasets

A Review of Pseudo-Marginal Markov Chain Monte Carlo

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC

STA 4273H: Statistical Machine Learning

Overall Objective Priors

A Process over all Stationary Covariance Kernels

Making rating curves - the Bayesian approach

Bayesian Multivariate Logistic Regression

TAKEHOME FINAL EXAM e iω e 2iω e iω e 2iω

Lecture 19. Spatial GLM + Point Reference Spatial Data. Colin Rundel 04/03/2017

Multivariate Gaussian Random Fields with SPDEs

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Spatial Models: A Quick Overview Astrostatistics Summer School, 2017

Multivariate spatial modeling

MCMC Methods: Gibbs and Metropolis

Bayesian Methods for Machine Learning

Bayesian hierarchical models for spatially misaligned data in R

Log Gaussian Cox Processes. Chi Group Meeting February 23, 2016

Transcription:

spbayes: An R Package for Univariate and Multivariate Hierarchical Point-referenced Spatial Models Andrew O. Finley 1, Sudipto Banerjee 2, and Bradley P. Carlin 2 1 Michigan State University, Departments of Forestry and Geography 2 University of Minnesota, Division of Biostatistics July 29, 2007 1 spbayes

Overview Outline: Need for new software tools spbayes overview Spatial models and Bayesian inference Illustration Future direction Summary 2 spbayes

Need for new software tools Motivation Spatial data is used in an array of disciplines economics, genetics, natural sciences, sociology,... 3 spbayes

Need for new software tools Motivation Spatial data is used in an array of disciplines economics, genetics, natural sciences, sociology,... Trends in data availability and analytical needs: Increasing volume of spatial data Addressing interrelated variables of interest Summarizing uncertainty at many levels (Hierarchical modeling) Delivering spatial analysis products to end users 3 spbayes

Need for new software tools Scope Scope of interest: Point-referenced or geostatistical data vs. areally referenced data Point locations defined on R 2 (e.g., latitude-longitude or Easting-Northing) Multivariate generalized linear model Simple spatial dependence structures stationary and isotropic spatial processes 4 spbayes

Need for new software tools Software options Current software options: BUGS (Bayesian Inference Using Gibbs Sampling) Flexible coding environment Too slow (e.g., matrix operations) 5 spbayes

Need for new software tools Software options Current software options: BUGS (Bayesian Inference Using Gibbs Sampling) Flexible coding environment Too slow (e.g., matrix operations) geor and georglm Univariate only Limited prior specifications Too canned and cumbersome 5 spbayes

Need for new software tools Software options Current software options: BUGS (Bayesian Inference Using Gibbs Sampling) Flexible coding environment Too slow (e.g., matrix operations) geor and georglm Univariate only Limited prior specifications Too canned and cumbersome Model specific coding Efficient code C, C++, FORTRAN Optimized libs BLAS (Basic Linear Algebra Subprogs.), LAPACK (Linear Algebra Package) Too time consuming 5 spbayes

spbayes overview spbayes features Models: Univariate and multivariate spatial regression Choice of spatial and non-spatial covariance matrices Fully Bayesian specification Diagnostics: Model choice DIC (Deviance Information Criterion) CODA (Convergence Diagnosis and Output Analysis) ready output Interface and underlying code: R package Written in C++ and BLAS, LAPACK, and SPARSKIT R s foreign language interface permits OS portability and processor optimization 6 spbayes

spbayes overview spbayes features Models: Univariate and multivariate spatial regression Choice of spatial and non-spatial covariance matrices Fully Bayesian specification Diagnostics: Model choice DIC (Deviance Information Criterion) CODA (Convergence Diagnosis and Output Analysis) ready output Interface and underlying code: R package Written in C++ and BLAS, LAPACK, and SPARSKIT R s foreign language interface permits OS portability and processor optimization 6 spbayes

spbayes overview spbayes features Models: Univariate and multivariate spatial regression Choice of spatial and non-spatial covariance matrices Fully Bayesian specification Diagnostics: Model choice DIC (Deviance Information Criterion) CODA (Convergence Diagnosis and Output Analysis) ready output Interface and underlying code: R package Written in C++ and BLAS, LAPACK, and SPARSKIT R s foreign language interface permits OS portability and processor optimization 6 spbayes

Spatial models Point referenced spatial models and Bayesian inference (in a nutshell) 7 spbayes

Spatial models Univariate model Simple linear model + random spatial effects Y (s) = µ(s) + w(s) + ɛ(s), Response: Y (s) at some site Mean: µ = X(s) T β Spatial random effects: w(s) GP (0, σ 2 ρ(θ; s s )) Non-spatial variance: ɛ(s) iid N(0, τ 2 ) D Y (s), X(s) 8 spbayes

Spatial models Gaussian process Spatial Gaussian process (GP ): Say w(s) GP (0, σ 2 ρ( )) and Cov(w(s), w(s )) = σ 2 ρ ( θ; s s ) D w(s i) w(s j) 9 spbayes

Spatial models Gaussian process Spatial Gaussian process (GP ): Say w(s) GP (0, σ 2 ρ( )) and Cov(w(s), w(s )) = σ 2 ρ ( θ; s s ) Let w = [w(s i )] n i=1, then w MV N(0, σ 2 H(θ)), where H(θ) = [ρ(θ; s i s j )] n i,j=1 D w(s i) w(s j) 9 spbayes

Spatial models Gaussian process Realization of a Gaussian process: Changing θ and holding σ 2 = 1: w MV N(0, σ 2 H(θ)), where H(θ) = [ρ(θ; s i s j )] n i,j=1 10 spbayes

Spatial models Gaussian process Realization of a Gaussian process: Changing θ and holding σ 2 = 1: w MV N(0, σ 2 H(θ)), where H(θ) = [ρ(θ; s i s j )] n i,j=1 Correlation model for H(θ): e.g., exponential decay ρ(θ; t) = exp( θt) if t > 0. 10 spbayes

Spatial models Gaussian process Realization of a Gaussian process: Changing θ and holding σ 2 = 1: w MV N(0, σ 2 H(θ)), where H(θ) = [ρ(θ; s i s j )] n i,j=1 Correlation model for H(θ): e.g., exponential decay ρ(θ; t) = exp( θt) if t > 0. Other valid models e.g., Gaussian, Spherical, Matérn, etc. 10 spbayes

Spatial models Multivariate model Multivariate spatial model Y(s) = µ(s) + w(s) + ɛ(s), Response: Y(s) = [Y i (s)] q i=1 Mean: µ = X T (s)β, where X T (s) = [x T i (s)]q i=1 Spatial random effects: w(s) MV GP (0, K(s, s ; θ)) Non-spatial variance: ɛ(s) MV N(0, Ψ) 11 spbayes

Spatial models Multivariate model Multivariate spatial model Y(s) = µ(s) + w(s) + ɛ(s), Response: Y(s) = [Y i (s)] q i=1 Mean: µ = X T (s)β, where X T (s) = [x T i (s)]q i=1 Spatial random effects: w(s) MV GP (0, K(s, s ; θ)) Non-spatial variance: ɛ(s) MV N(0, Ψ) Care is needed in choosing K(s, s ; θ) insure symmetric and positive definite qn qn covariance matrix. 11 spbayes

Spatial models Multivariate model Multivariate spatial model Y(s) = µ(s) + w(s) + ɛ(s), Response: Y(s) = [Y i (s)] q i=1 Mean: µ = X T (s)β, where X T (s) = [x T i (s)]q i=1 Spatial random effects: w(s) MV GP (0, K(s, s ; θ)) Non-spatial variance: ɛ(s) MV N(0, Ψ) Care is needed in choosing K(s, s ; θ) insure symmetric and positive definite qn qn covariance matrix. K(s, s ; θ) = A[ q k=1 ρ k(s, s ; θ k )]A T 11 spbayes

Bayesian inference Hierarchical modeling Model parameterization 1 Unmarginalized likelihood: First stage: Y β, w, Ψ MV N(X T β + w, I n Ψ) 12 spbayes

Bayesian inference Hierarchical modeling Model parameterization 1 Unmarginalized likelihood: First stage: Y β, w, Ψ MV N(X T β + w, I n Ψ) Second stage: w A, θ MV N(0, Σ w ), where Σ w = [K(s i, s j ; θ)] n i,j=1 12 spbayes

Bayesian inference Hierarchical modeling Model parameterization 1 Unmarginalized likelihood: First stage: Y β, w, Ψ MV N(X T β + w, I n Ψ) Second stage: w A, θ MV N(0, Σ w ), where Σ w = [K(s i, s j ; θ)] n i,j=1 2 Marginalized likelihood (used in spbayes): Y Ω MV N(X T β, Σ w + I n Ψ), where Ω = {β, A, θ, Ψ}. 12 spbayes

Bayesian inference Hierarchical modeling Sampling and quantities of interest spbayes MCMC sampling scheme for Ω: β Gibbs (default) or Metropolis-Hastings (MH) A, Ψ, θ MH but soon slice sampling Recover w given Ω samples: P (w Data) P (w Ω, Data)P (Ω Data)dΩ. Prediction given Ω samples and new X(s 0 )... X(s m ): P (w Data) P (w w, Ω, Data)P (w Ω, Data)P (Ω Data)dΩdw, P (Y Data) P (Y Ω, Data)P (Ω Data)dΩ. 13 spbayes

Bayesian inference Hierarchical modeling Sampling and quantities of interest spbayes MCMC sampling scheme for Ω: β Gibbs (default) or Metropolis-Hastings (MH) A, Ψ, θ MH but soon slice sampling Recover w given Ω samples: P (w Data) P (w Ω, Data)P (Ω Data)dΩ. Prediction given Ω samples and new X(s 0 )... X(s m ): P (w Data) P (w w, Ω, Data)P (w Ω, Data)P (Ω Data)dΩdw, P (Y Data) P (Y Ω, Data)P (Ω Data)dΩ. 13 spbayes

Bayesian inference Hierarchical modeling Sampling and quantities of interest spbayes MCMC sampling scheme for Ω: β Gibbs (default) or Metropolis-Hastings (MH) A, Ψ, θ MH but soon slice sampling Recover w given Ω samples: P (w Data) P (w Ω, Data)P (Ω Data)dΩ. Prediction given Ω samples and new X(s 0 )... X(s m ): P (w Data) P (w w, Ω, Data)P (w Ω, Data)P (Ω Data)dΩdw, P (Y Data) P (Y Ω, Data)P (Ω Data)dΩ. 13 spbayes

Illustration Synthetic illustration Synthetic data Given two response variables (i.e., q = 2), exponential spatial correlation function, and β = ( 1 1 ) ( 1 2, K(0; θ) = 2 8 ) ( 9 0, Ψ = 0 2 ) ( 0.6, θ = 0.1 ). 14 spbayes

Illustration Synthetic illustration Synthetic data (cont d) Y 1 Y 2 15 spbayes

Illustration Synthetic illustration Potential candidate models fit with ggt.sp in spbayes Model 1: w = 0 (non-spatial) Model 2: A = σi q and Ψ = 0 Model 3: A = σi q and Ψ = τ 2 I q Model 4: A = diag[σ i ] q i=1 and Ψ = diag[τ i 2]q i=1 Model 5: A and Ψ = diag[τi 2]q i=1 Model 6: A, Ψ = diag[τi 2]q i=1, and θ = {φ k} q k=1 Model 7: A, Ψ, and θ = {φ k } q k=1 16 spbayes

Illustration Synthetic illustration Specifying a model using ggt.sp Recall Model 6: A, Ψ = diag[τ 2 i ]q i=1, and θ = {φ k} q k=1 Step 1: Define parameters priors, hyperpriors K.prior <- prior(dist="iwish", df=2, S=diag(c(3,6))) Psi.prior.1 <- prior(dist="ig", shape=2, scale=7) Psi.prior.2 <- prior(dist="ig", shape=2, scale=5) phi.prior <- prior(dist="unif", a=0.06, b=3) 17 spbayes

Illustration Synthetic illustration Step 2: Define parameters sampling info starting value, MH tuning, etc. var.update.control <- list( "K"=list(sample.order=0, starting=diag(1, 2), tuning=diag(c(0.1, 0.5, 0.1)), prior=k.prior), "Psi"=list(sample.order=1, starting=1, tuning=0.3, prior=list(psi.prior.1, Psi.prior.2)) "phi"=list(sample.order=2, starting=0.5, tuning=0.5, prior=list(phi.prior, phi.prior)) ) beta.control <- list(update="gibbs", prior=prior(dist="flat")) 18 spbayes

Illustration Synthetic illustration Step 3: Run control number of samples, etc. run.control <- list("n.samples"=5000, "sp.effects"=true) Step 4: Call ggt.sp ggt.model.6 <- ggt.sp( formula=list(y.1~1, Y.2~1), run.control=run.control, coords=coords, var.update.control=var.update.control, beta.update.control=beta.control, cov.model="exponential") Step 5: Prediction if desired sp.pred <-sp.predict(ggt.model.6, pred.coords=coords, pred.covars=covars) 19 spbayes

Illustration Synthetic illustration Step 3: Run control number of samples, etc. run.control <- list("n.samples"=5000, "sp.effects"=true) Step 4: Call ggt.sp ggt.model.6 <- ggt.sp( formula=list(y.1~1, Y.2~1), run.control=run.control, coords=coords, var.update.control=var.update.control, beta.update.control=beta.control, cov.model="exponential") Step 5: Prediction if desired sp.pred <-sp.predict(ggt.model.6, pred.coords=coords, pred.covars=covars) 19 spbayes

Illustration Synthetic illustration Step 3: Run control number of samples, etc. run.control <- list("n.samples"=5000, "sp.effects"=true) Step 4: Call ggt.sp ggt.model.6 <- ggt.sp( formula=list(y.1~1, Y.2~1), run.control=run.control, coords=coords, var.update.control=var.update.control, beta.update.control=beta.control, cov.model="exponential") Step 5: Prediction if desired sp.pred <-sp.predict(ggt.model.6, pred.coords=coords, pred.covars=covars) 19 spbayes

Illustration Synthetic illustration CODA ready posterior samples held in ggt.sp object. For example, summary of ggt.model.6$p.samples: Parameter Estimates: 50% (2.5%, 97.5%) β 1,0 1.086 (0.555, 1.628) β 2,0-0.273 (-1.619, 1.157) K 1,1 1.801 (0.542, 6.949) K 2,1-1.784 (-3.604, -0.357) K 2,2 8.253 (4.645, 13.221) Ψ 1,1 7.478 (3.020, 10.276) Ψ 2,2 2.276 (0.832, 5.063) φ 1 1.024 (0.243, 2.805) φ 2 0.193 (0.073, 0.437) 20 spbayes

Illustration Synthetic illustration Finally, given the model object and new X(s 0 )... X(s m ) Y1 Y2 21 spbayes

Future direction Future of spbayes Extend to other GLMs More priors and sampling routines (e.g., slice sampling) Include general model specification (e.g., MCMC package s metrop) Address BIG-N challenges 22 spbayes

Future direction s New function sp.lm Fully Bayesian lm with spatial effects Minimal arguments for fitting and prediciton BIG-N through Predictive Process Knot-based dimension reduction Fit models with n>10,000 with common desktop computer See Banerjee et al. (2007) w w 23 spbayes

Summary Summary spbayes begins to meet a statistical computing need: Flexible model specification Efficient MCMC computation Useful parameter and predictive inference Portable and scalable code base 24 spbayes

Acknowledgements and References This work was supported by: NASA Headquarters under the Earth System Science Fellowship Grant NGT5 NSF Grant 0706870 USDA Forest Service Forest Inventory and Analysis program University of Minnesota, School of Statistics References: Finley, A.O., Banerjee, S., and Carlin. B.P. (2007). spbayes: An R package for Univariate and Multivariate Hierarchical Point-referenced Spatial Models Journal of Statistical Software. http://www.jstatsoft.org 19(4). Finley, A.O., Banerjee, S., Ek, A.R. and McRoberts, R.E. (In press). Bayesian multivariate process modeling for prediction of forest attributes. Journal of Agricultural, Biological, and Environmental Statistics. Banerjee, S., and Finley, A.O. (In press). Bayesian multiresolution modeling of spatially replicated data. Journal of Statistical Planning and Inference. Finley, A.O., Banerjee, S., and McRoberts, R.E. (In press). A Bayesian approach to quantifying uncertainty in multi-source forest area estimates. Environmental and Ecological Statistics. 25 spbayes

Questions Questions 26 spbayes