Gaussian processes for spatial modelling in environmental health: parameterizing for flexibility vs. computational efficiency

Similar documents
Spatial smoothing using Gaussian processes

Computational techniques for spatial logistic regression with large datasets

Computational techniques for spatial logistic regression with large datasets

Spatial Modelling Using a New Class of Nonstationary Covariance Functions

Measurement Error in Spatial Modeling of Environmental Exposures

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Nonstationary Covariance Functions for Gaussian Process Regression

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Disease mapping with Gaussian processes

Analysing geoadditive regression data: a mixed model approach

Technical Vignette 5: Understanding intrinsic Gaussian Markov random field spatial models, including intrinsic conditional autoregressive models

Nonstationary spatial process modeling Part II Paul D. Sampson --- Catherine Calder Univ of Washington --- Ohio State University

Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets

Spatial bias modeling with application to assessing remotely-sensed aerosol as a proxy for particulate matter

Statistical and epidemiological considerations in using remote sensing data for exposure estimation

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines

random fields on a fine grid

A STATISTICAL TECHNIQUE FOR MODELLING NON-STATIONARY SPATIAL PROCESSES

Modelling geoadditive survival data

STATISTICAL MODELS FOR QUANTIFYING THE SPATIAL DISTRIBUTION OF SEASONALLY DERIVED OZONE STANDARDS

Hierarchical Modelling for Univariate Spatial Data

Models for spatial data (cont d) Types of spatial data. Types of spatial data (cont d) Hierarchical models for spatial data

Hierarchical Modeling for Univariate Spatial Data

Bayesian Methods for Machine Learning

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson

A Framework for Daily Spatio-Temporal Stochastic Weather Simulation

Bayesian smoothing of irregularly-spaced data using Fourier basis functions

NONSTATIONARY GAUSSIAN PROCESSES FOR REGRESSION AND SPATIAL MODELLING

Introduction to Spatial Data and Models

Wrapped Gaussian processes: a short review and some new results

Nearest Neighbor Gaussian Processes for Large Spatial Data

spbayes: An R Package for Univariate and Multivariate Hierarchical Point-referenced Spatial Models

A short introduction to INLA and R-INLA

A general mixed model approach for spatio-temporal regression data

STA 4273H: Sta-s-cal Machine Learning

Introduction to Spatial Data and Models

Predicting Long-term Exposures for Health Effect Studies

Bayesian Hierarchical Models

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

Gaussian Process Regression

Parameter Estimation in the Spatio-Temporal Mixed Effects Model Analysis of Massive Spatio-Temporal Data Sets

Summary STK 4150/9150

Bayesian spatial quantile regression

Bayesian covariate models in extreme value analysis

Default Priors and Effcient Posterior Computation in Bayesian

What s for today. Continue to discuss about nonstationary models Moving windows Convolution model Weighted stationary model

On Gaussian Process Models for High-Dimensional Geostatistical Datasets

GAUSSIAN PROCESS REGRESSION

The Bayesian approach to inverse problems

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Bayesian non-parametric model to longitudinally predict churn

Spatial Statistics with Image Analysis. Lecture L08. Computer exercise 3. Lecture 8. Johan Lindström. November 25, 2016

Deblurring Jupiter (sampling in GLIP faster than regularized inversion) Colin Fox Richard A. Norton, J.

The Use of Spatial Exposure Predictions in Health Effects Models: An Application to PM Epidemiology

Model Selection for Gaussian Processes

arxiv: v4 [stat.me] 14 Sep 2015

Comparing Non-informative Priors for Estimation and. Prediction in Spatial Models

Inversion Base Height. Daggot Pressure Gradient Visibility (miles)

Models for models. Douglas Nychka Geophysical Statistics Project National Center for Atmospheric Research

The Matrix Reloaded: Computations for large spatial data sets

State Space Representation of Gaussian Processes

Aggregated cancer incidence data: spatial models

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.

Introduction to Geostatistics

Direct Learning: Linear Classification. Donglin Zeng, Department of Biostatistics, University of North Carolina

Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach"

The Matrix Reloaded: Computations for large spatial data sets

On Bayesian Computation

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

STA 4273H: Statistical Machine Learning

Stochastic Spectral Approaches to Bayesian Inference

Hierarchical Modeling for Multivariate Spatial Data

Handbook of Spatial Statistics Chapter 2: Continuous Parameter Stochastic Process Theory by Gneiting and Guttorp

Neutron inverse kinetics via Gaussian Processes

Hierarchical Modelling for Univariate Spatial Data

Some general observations.

CPSC 540: Machine Learning

A Complete Spatial Downscaler

Sub-kilometer-scale space-time stochastic rainfall simulation

Spatial Statistics with Image Analysis. Outline. A Statistical Approach. Johan Lindström 1. Lund October 6, 2016

Student-t Process as Alternative to Gaussian Processes Discussion

Multi-resolution models for large data sets

Flexible Spatio-temporal smoothing with array methods

ESTIMATING THE MEAN LEVEL OF FINE PARTICULATE MATTER: AN APPLICATION OF SPATIAL STATISTICS

STA 4273H: Statistical Machine Learning

Introduction to Gaussian Processes

Reliability Monitoring Using Log Gaussian Process Regression

Pumps, Maps and Pea Soup: Spatio-temporal methods in environmental epidemiology

Basics of Point-Referenced Data Models

A spatio-temporal model for extreme precipitation simulated by a climate model

Approximate Bayesian computation for spatial extremes via open-faced sandwich adjustment

Multivariate spatial modeling

Statistícal Methods for Spatial Data Analysis

Computer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)

Information geometry for bivariate distribution control

Climate Change: the Uncertainty of Certainty

Bayesian linear regression

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University

Transcription:

Gaussian processes for spatial modelling in environmental health: parameterizing for flexibility vs. computational efficiency Chris Paciorek March 11, 2005 Department of Biostatistics Harvard School of Public Health www.biostat.harvard.edu/~paciorek

Increased attention to spatial analysis in public health data availability: geocoding and GPS for assigning point locations to individuals and monitors GIS software: easy data management and manipulation graphical presentation spatially-varying covariate generation interest amongst researchers: strong applied interest in kriging and related smoothing methods opportunities for more sophisticated spatio-temporal modelling, particularly Bayesian hierarchical modelling 1

Petrochemical exposure in Kaohsiung, Taiwan leukemia brain cancer northing (km) 2490 2500 2510 2520 2530 northing (km) 2490 2500 2510 2520 2530 165 170 175 180 185 190 easting (km) 165 170 175 180 185 190 easting (km) n = 495 n = 433 n 1 = 141 n 1 = 121 2

Possible approaches for health analysis Explicitly estimate pollutant exposure - difficult retrospectively Use distance to exposure source as covariate Use a moving window/multiple testing to detect clusters of cases default approach - software available Include space as a covariate to provide a map of risk H i Ber(p(x i, s i )) logit(p(x i, s i )) = x T i β + g θ (s i ) 3

Particulate matter exposure in the Nurses Health Study estimate individual exposure, 1985-2003 EPA monitoring for large-scale spatio-temporal heterogeneity spatially-varying covariates for local heterogenity distance to roads, climate variables, local land use,... generated using GIS basic additive exposure model: log E i N(f(x i, s i ), η 2 ) f(x i, s i ) = p h p (x i ) + g θ (s i ) geocoding of individual residences every two years relate estimated exposure to health outcomes (chronic heart disease) 4

geocoding and GIS make this possible; spatial statistics provides a rigorous framework 5

Health outcomes by postcode in NSW, Australia methodological challenges areal (postcode) units vary drastically in size data misalignment relate areal data to a latent smooth process, g θ ( ) (Kelsall & Wakefield, Rathouz) computational challenges: 650 units, 5 years daily data, 2 sexes, 9 age groups 6

Outline Motivating examples Introduction to Gaussian processes (GPs) Fast Gaussian process modelling Flexible Gaussian process modelling Bayes and overfitting The future: flexibility + efficiency + hierarchical modelling 7

Kriging as a GP model Y i N(g(s i ), η 2 ) g( ) GP(µ, C( ; θ)) Bayesian model specifies prior distributions for θ (Bayesian kriging) Empirical Bayes/marginal likelihood (i.e., kriging) integrate g train = (g(s 1 ),..., g(s n )) out of model estimate θ maximizing marginal posterior maximizing marginal likelihood fitting variogram model for C( ; θ) point estimate for spatial process: E(g test Y, θ) based conditional normal calculations 8

GAUSSIAN PROCESS DISTRIBUTION Infinite-dimensional joint distribution for g(x), x X : Example: g( ) a spatial process, X = R 2 g( ) GP(µ( ), C(, )) Finite-dimensional marginals are normal Types of covariance functions, C(x i, x j ): stationary, isotropic stationary, anisotropic nonstationary X x j x j x i x i 9

GAUSSIAN PROCESS DISTRIBUTION Infinite-dimensional joint distribution for g(x), x X : Example: g( ) a spatial process, X = R 2 g( ) GP(µ( ), C(, )) Finite-dimensional marginals are normal Types of covariance functions, C(x i, x j ): stationary, isotropic stationary, anisotropic nonstationary X x j x i x j x i 10

GAUSSIAN PROCESS DISTRIBUTION Infinite-dimensional joint distribution for g(x), x X : Example: g( ) a spatial process, X = R 2 g( ) GP(µ( ), C(, )) Finite-dimensional marginals are normal Types of covariance functions, C(x i, x j ): stationary, isotropic stationary, anisotropic nonstationary X x j x j x i x i 11

Stationary Correlation Functions Correlation Correlation fxns ρ = 0.2 ρ = 0.04 0.0 0.1 0.2 0.3 0.4 0.5 Distance Z(x) 2.0 1.0 0.0 1.0 Sample fxns 0.00 0.10 0.20 0.30 x Matérn form ( ) ν ( ) 1 R(τ) = 2 ντ 2 ντ Γ(ν)2 ν 1 ρ Kν ρ ; ν > 0, ρ > 0 Differentiability controlled by ν, asymptotic advantages (Stein) Familiar exponential (ν = 0.5) and squared exponential (Gaussian) (ν ) correlations as special and limiting cases 12

Computational challenges of GPs even marginal likelihood in normal error model is intensive: g( ) GP(µ( ), C θ (, )) Y N(µ, C θ + η 2 I) O(n 3 ) fitting: C θ + η 2 I and (C θ + η 2 I) 1 (Y µ1) non-gaussian spatial models particularly difficult spatial process can t be integrated out MCMC mixing is very slow because of high-level structure correlation amongst process values and between process values and process hyperparameters value 0 5 10 15 ACF 0.0 0.4 0.8 0 2000 6000 10000 iteration 0 20 40 60 80 100 lag/10 13

Petrochemical exposure in Kaohsiung, Taiwan leukemia brain cancer northing (km) 2490 2500 2510 2520 2530 northing (km) 2490 2500 2510 2520 2530 165 170 175 180 185 190 easting (km) 165 170 175 180 185 190 easting (km) n = 495 n = 433 n 1 = 141 n 1 = 121 14

Modelling Framework H i Ber(p(x i, s i )) logit(p(x i, s i )) = x i T β + g θ (s i ) basic spatial model for g s θ = (g θ (s 1 ),..., g θ (s n )) GAM: g θ ( ) is a two-dimensional smooth term basis representation g s θ = Zu Gaussian process representation: g( ) GP(µ( ), C θ (, )) g s θ N(µ, C θ ) GLMM: g s θ = Zu correlated random effects, u N(0, Σ) 15

Approaches Bayesian spectral basis model fit by MCMC (Wikle, 2002) [B-SB] penalized likelihood based on mixed model (radial basis functions) with REML smoothing (Kammann and Wand, 2003; Ngo and Wand, 2004) [PL-PQL] penalized likelihood with GCV smoothing (Wood, 2001, 2003, 2004) [PL-GCV] Bayesian mixed model/radial basis functions fit by MCMC (Zhao and Wand 2004) [B-Geo] 16

Bayesian spectral basis function model computationally efficient basis function construction (Wikle 2002) g # = Zu and g s = σp g # piecewise constant gridded surface on k by k grid P maps observation locations to nearest grid point Z is the Fourier (spectral) basis and Zu is the inverse FFT Zu is approximately a Gaussian process (GP) when... u N(0, diag(π θ (ω))) for Fourier frequencies, ω spectral density, π θ ( ), of GP covariance function defines V(u) 17

Bayesian spectral basis functions ω 2 = 0 ω 2 = 1 ω 2 = 2 ω 2 = 3 ω 1 = 0 ω 1 = 1 ω 1 = 2 ω 1 = 3 18

Comparison with usual GP specification spectral basis uses FFT O ( (k 2 ) log(k 2 ) ) additional observations are essentially free for fixed grid fast computation and prediction of surface given coefficients a priori independent coefficients give fast computation of prior and help with mixing 19

Penalized likelihood using GLMM framework with REML [PL-PQL] g s = Zu, Z = Ψ nk Ω 1 2 kk, u N(0, σ2 u) - variance component provides complexity penalty Ω contains pairwise spatial covariances between k knot locations and Ψ between n data locations and k knot locations potential covariance functions: thin plate spline generalized covariance function, ( C(τ) = τ 2 log τ ) ν ( ) 1 Matérn correlation function, R(τ) = 2 ντ 2 ντ Γ(ν)2 ν 1 ρ Kν ρ, with ρ and ν fixed computationally efficient approximation of a Gaussian process representation for g s PQL approach - IWLS fitting of (β, u) with REML estimation of σ 2 u within the iterations using MM software 20

GLMM basis functions radial basis functions centered at the knots 4 of 64 functions displayed: 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.0 0.1 0.2 0.3 0.4 0.5 0.6 TPS Matérn 21

Penalized likelihood using GCV [PL-GCV] thin plate spline basis for g( ) truncated eigendecomposition of basis matrix increases computational efficiency IWLS fitting of (β, u) with GCV estimation of penalty easy implementation using the R mgcv library gam() 22

Bayesian geoadditive model [B-Geo] Bayesian version of GLMM framework already described g s = Zu, Z = Ψ nk Ω 1 2 kk, u N(0, σ2 u) natural Bayesian complexity penalty through prior on u thin plate spline covariance or Matérn correlation basis construction of Ψ and Ω MCMC implementation - ensuring mixing is not simple Metropolis-Hastings for u using conditional posterior mean and variance based on linearized observations joint proposals for σ 2 u and u to ensure that u remains compatible with its variance component 23

Simulated datasets 3 case-control scenarios: n 0 = 1, 000; n 1 = 200; n test = 2500 on 50 by 50 grid 1 cohort scenario: n = 10, 000; n test = 2500 on 50 by 50 grid 6 4 2 0 2 4 6 0.0002 0.0006 0.0010 6 4 2 0 2 4 6 0.0002 0.0006 0.0010 6 4 2 0 2 4 6 6 4 2 0 2 4 6 6 4 2 0 2 4 6 0.0002 0.0006 0.0010 20 10 0 10 20 0.10 0.11 0.12 0.13 6 4 2 0 2 4 6 20 10 0 10 20 24

Assessment on 50 simulated datasets flat isotropic MSE 0.0000 0.0004 0.0008 0.0012 PL PQL PL GCV B Geo B SB MSE 0.0005 0.0010 0.0015 0.0020 0.0025 PL PQL PL GCV B Geo B SB anisotropic cohort MSE 0.0010 0.0020 0.0030 MSE 0.00005 0.00015 0.00025 PL PQL PL GCV B Geo B SB PL PQL PL GCV B SB null 25

Mixing and speed of Bayesian methods speed (1000 its) speed (1000 eff. its) log posterior trace σ trace ρ trace B-Geo 15 min. 104 hr. 660 650 640 630 620 0 5 10 15 0 2000 4000 6000 8000 10000 0 2000 4000 6000 8000 10000 B-SB 1.3 min. 6 hr. 50000 100000 200000 0.0 0.5 1.0 1.5 2.0 0 1 2 3 0 2000 4000 6000 8000 10000 0 2000 4000 6000 8000 10000 0 2000 4000 6000 8000 10000 26

Taiwan revisited - assessment leukemia brain cancer Summed test deviance over 10-fold C-V sets leukemia brain cancer PL-GCV 590.1 529.8 PL-PQL 585.6 529.5 B-Geo 583.3 525.7 B-SB 582.1 525.1 null 581.6 525.5 northing (km) 2490 2500 2510 2520 2530 0.24 0.26 0.28 0.30 0.32 northing (km) 2490 2500 2510 2520 2530 0.24 0.26 0.28 0.30 0.32 165 170 175 180 185 190 easting (km) 165 170 175 180 185 190 easting (km) 27

Assessment on count simulations n = 225, n test = 2500 on 50 by 50 grid 0 2 4 6 8 10 0 2 4 6 8 10 0 2 4 6 8 10 MSE 0.00 0.05 0.10 0.15 0.20 MSE 0.2 0.4 0.6 0.8 1.0 MSE 0.2 0.4 0.6 0.8 1.0 1.2 PL PQL PL GCV B Geo B SB null PL PQL PL GCV B Geo B SB null PL PQL PL GCV B Geo B SB null 28

Penalization in the spectral approach GP representation zeroes out high-frequency coefficients as appropriate Spatial hyperparameter controls coefficient variances g( ) GP(µ( ), σ 2 R(, ; ρ, ν)) prior penalty sample functions log(variance) 40 30 20 10 0 ρ = 0.1 ρ = 0.5 ρ = 2.5 0 50 150 250 frequency variance 0.000 0.010 0.020 ρ = 0.1 ρ = 0.5 ρ = 2.5 0 20 40 60 80 frequency f(x) 2 1 0 1 2 0.0 0.4 0.8 x 29

Heterogeneous penalties y 2 0 2 4 6 8 true gam() spectral x spatially-varying penalties are one option (e.g., Lang & Brezger 2004; Crainiceanu et al. 2004) spatially-varying ρ in a GP context is another 30

Outline Motivating examples Introduction to Gaussian processes (GPs) Fast Gaussian process modelling Flexible Gaussian process modelling Why Bayes works for smoothing The future: flexibility + efficiency + hierarchical modelling 31

A nonstationary covariance Higdon, Swall, and Kern (1999) model: C NS (x i, x j ) = k xi (u)k xj (u)du R p guaranteed positive definite Gaussian kernels give closed form: k xi (u) exp ( (u x i ) T Σ 1 i (u x i ) ) ( R NS (x i, x j ) = c ij exp (x i x j ) T ( Σi + Σ j 2 ) 1 (x i x j )) g( ) GP(µ, σ 2 R NS (, ; Σ( ))) 32

Nonstationary GPs in 1-D 0.1 0.3 0.5 Kernel standard deviation Some kernels f(x) 1.0 0.0 0.5 Some sample functions f(x) 0.5 0.5 1.5 0.0 0.4 0.8 0.0 0.4 0.8 x x 0.0 0.4 0.8 x f(x) 2.5 1.5 0.5 0.5 0.0 0.4 0.8 x f(x) 0.5 1.0 1.5 2.0 0.0 0.4 0.8 x 33

Nonstationary GPs in 2-D Sample function Kernel structure Sample function image x 2 1 g(x[1], x[2]) 0 1 2 1.0 0.8 x[2] 0.6 0.4 0.2 0.0 0.0 0.2 0.4 x[1] 0.6 0.8 1.0 x 2 x 1 x 1 34

Generalizing the kernel convolution approach Squared exponential form: exp ( ( τij ρ ) 2 ) c ij exp ( (x i x j ) T ( Σi + Σ j 2 ) 1 (x i x j )) infinitely-differentiable sample paths Distance measures : isotropy τij 2 = (x i x j ) T (x i x j ) anisotropy τij 2 = (x i x j ) T Σ 1 (x i x j ) ( ) nonstationarity Q ij = (x i x j ) T Σi +Σ 1 j 2 (xi x j ) Can we replace τ 2 ij with Q ij in other stationary correlation functions? 35

A class of nonstationary covariance functions Theorem 1: if R(τ) is positive definite for R P, P = 1, 2,..., then R NS (x i, x j ) = c ij R( Q ij ) is positive definite for R P, P = 1, 2,... Theorem 2: smoothness (differentiability) properties of the original stationary correlation retained Specific case of Matérn nonstationary covariance: 1 Γ(ν)2 ν 1 ( ) ν ( ) 2 ντ 2 ντ K ν ρ ρ 1 ( Γ(ν)2 ν 1 2 ) ν ( νq ij Kν 2 ) νq ij advantages: more flexible form, differentiability not constrained, possible asymptotic advantages 36

Exponential and Matérn sample functions (stationary) ν = 0.5 ν = 4 x 2 3 2 1 0 1 2 3 4 x 2 2 1 0 1 2 3 4 x 1 x 1 37

A basic Bayesian nonstationary spatial model Bayesian nonstationary kriging model Y i N(g(x i ), η 2 ), x i R 2 g( ) GP(µ, σ 2 R NS (, ; ν, Σ( )) Let R NS be the nonstationary Matérn correlation Kernels (Σ x ) constructed based on stationary GP priors define multiple kernel matrices, Σ x, x X smoothly-varying (element-wise) in domain positive definite Fit via MCMC, including parameters determining Σ( ) 38

Smoothly-varying kernel matrices Spectral decomposition for each Σ x = Γ T x Λ x Γ x in R 2, parameterize each kernel using unnormalized eigenvector coordinates (a x, b x ) and the second eigenvalue (log λ 2,x ) define stationary GP priors for Φ( ) {(a( ), b( ), log(λ 2 ( ))} efficiently parameterize each GP using basis function approximation (Zhao & Wand, 2004) a x, b x λ 2x x λ 1x 39

Colorado precipitation characterization 37 38 39 40 41 109 108 107 106 105 104 103 102 40

Estimating Colorado precipitation Posterior mean Posterior std dev 37 38 39 40 41 a 5.0 5.5 6.0 6.5 7.0 37 38 39 40 41 c 0.0 0.2 0.4 109 107 105 103 109 107 105 103 37 38 39 40 41 b 5.0 5.5 6.0 6.5 7.0 37 38 39 40 41 d 0.0 0.2 0.4 109 107 105 103 109 107 105 103 41

Why doesn t Bayes overfit? Fourier basis involves k 2 (=4096, e.g.) coefficients Nonstationary covariance involves very highly-parameterized covariance structure No direct penalty on complicated spatial functions marg. lik. (prior predictive) P(Y M2) data space P(Y M1) P (y M i ) y 2.0 1.0 0.0 0.5 1.0 1.5 Y 2 x y 0.5 0.0 0.5 1.0 1.5 2.0 Y 1 x y 1.0 0.5 0.0 0.5 1.0 x Model 1 ρ = 2.5 Model 2 ρ = 0.5-40.3-18.9-12.7-27.5-21.1-16.4 42

What does a Bayesian approach give us? ability to create rich hierarchical models that reflect our understanding of the system in environmental health applications the ability to incorporate time, latent variables, misaligned data a natural penalty on overfitting a recipe (perhaps slow) for estimation proper characterization of the uncertainty challenge lies less on the modelling side than with computations, model comparison and evaluation, and reproducibility 43

Future methodological work collaborative work on spatio-temporal modelling computational approaches for applying existing methodological ideas to real health data GP computations and parameterization: flexibility + efficiency + hierarchical modelling computational tricks for the nonstationary covariance; e.g., knotbased approaches for faster computation use of a wavelet basis with irregular spatial data in a similar framework as the spectral basis combining deterministic and stochastic models, e.g. for air pollution useful, practical methods for designing spatial monitoring networks and determining power 44