Gaussian predictive process models for large spatial data sets.

Gaussian predictive process models for large spatial data sets. Sudipto Banerjee, Alan E. Gelfand, Andrew O. Finley, and Huiyan Sang. Presenters: Halley Brantley and Chris Krut. September 28, 2015.

Overview: recap of Gaussian processes; spatial regression; the univariate predictive process; multivariate Gaussian processes and the linear model of coregionalization; the multivariate predictive process; extensions to non-Gaussian and space-time data.

Gaussian Process Definition. $Y(s)$ is a Gaussian process with mean function $\mu(s)$ and covariance function $H(s, s') = \mathrm{cov}(Y(s), Y(s'))$ if for every finite set of locations $s_1, \ldots, s_n$ the vector $\tilde{Y} = (Y(s_1), \ldots, Y(s_n))^T$ satisfies
$$\tilde{Y} \sim MVN_n(\tilde{\mu}, H), \quad (1)$$
where $\tilde{\mu} = (\mu(s_1), \ldots, \mu(s_n))^T$ and $\{H\}_{ij} = H(s_i, s_j; \phi)$. To be a valid covariance function, $H(s, s'; \phi)$ must be positive semidefinite in the sense that every covariance matrix $H$ it generates satisfies $v^T H v \geq 0$ [2, p. 80].
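
As a concrete illustration (not from the paper), here is a minimal Python sketch that builds the matrix $H$ from an exponential covariance function, one standard valid choice, and draws one realization of $\tilde{Y}$; the helper name `exp_cov` and the jitter term are illustrative assumptions.

```python
import numpy as np

def exp_cov(S1, S2, sigma2=1.0, phi=2.0):
    """Exponential covariance H(s, s'; phi) = sigma2 * exp(-phi * ||s - s'||)."""
    d = np.linalg.norm(S1[:, None, :] - S2[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)

rng = np.random.default_rng(0)
S = rng.uniform(0, 10, size=(200, 2))       # n = 200 locations in [0, 10]^2
H = exp_cov(S, S)                           # {H}_ij = H(s_i, s_j; phi)
H += 1e-10 * np.eye(len(S))                 # tiny jitter for numerical stability
Y = np.linalg.cholesky(H) @ rng.standard_normal(len(S))  # one draw of Y~ with mu = 0
```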

Spatial Regression Model
$$Y(s) = X(s)^T \beta + w(s) + \varepsilon(s) \quad (2)$$
$X(s)$ is a vector of covariates and $\beta$ the corresponding coefficients. $\varepsilon(s)$ is an independent process (the nugget effect) with variance $\tau^2$. $w(s)$ is a spatial random effect, $w(s) \sim GP(0, C(s, s'; \theta))$ with $C(s, s'; \theta) = \sigma^2 \rho(s, s'; \theta)$. Marginally, $Y \sim N(X\beta, \Sigma_Y)$ with $\Sigma_Y = C(\theta) + \tau^2 I$.

Computational Challenges. Fitting the above model requires calculating determinants and inverses of large matrices, whose cost grows as $O(n^3)$ in the matrix size $n$; memory limitations create further problems. A lot of work has been done on fitting models for large spatial data sets.
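
To make the scaling concrete, a small timing sketch (not from the slides): doubling $n$ multiplies the cost of the Cholesky factorization, the usual route to the determinant and inverse of a covariance matrix, by roughly eight.

```python
import time
import numpy as np

rng = np.random.default_rng(1)
for n in (500, 1000, 2000, 4000):
    A = rng.standard_normal((n, n))
    C = A @ A.T + n * np.eye(n)              # a well-conditioned SPD n x n "covariance"
    t0 = time.perf_counter()
    L = np.linalg.cholesky(C)                # O(n^3) flops: doubling n costs ~8x
    logdet = 2.0 * np.log(np.diag(L)).sum()  # the determinant a likelihood needs
    print(f"n={n:5d}  time={time.perf_counter() - t0:.3f}s  logdet={logdet:.1f}")
```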

Univariate Predictive Process
$$Y(s) = X(s)^T \beta + \tilde{w}(s) + \varepsilon(s) \quad (3)$$
Fix knots $s_1^*, \ldots, s_m^*$ and let $w^* = (w(s_1^*), \ldots, w(s_m^*))^T$. The predictive process is
$$\tilde{w}(s) = E(w(s) \mid w^*) = c(s; \theta)^T C^{*-1}(\theta) w^*,$$
where $c(s; \theta)^T = (C(s, s_1^*; \theta), \ldots, C(s, s_m^*; \theta))$ and $C^*(\theta) = [C(s_i^*, s_j^*; \theta)]_{i,j=1}^m$. Then $\tilde{w}(s) \sim GP(0, c^T(s; \theta) C^{*-1}(\theta) c(s'; \theta))$. Advantage: we now work with $m \times m$ matrices instead of $n \times n$.
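
A minimal Python sketch of the construction, under illustrative choices (exponential covariance, a 10 × 10 knot grid); note that only $m \times m$ systems are ever factorized.

```python
import numpy as np

def exp_cov(S1, S2, sigma2=1.0, phi=2.0):
    d = np.linalg.norm(S1[:, None, :] - S2[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)

rng = np.random.default_rng(2)
S = rng.uniform(0, 10, size=(3000, 2))                  # n = 3000 data locations
g = np.linspace(0.5, 9.5, 10)
knots = np.array([(x, y) for x in g for y in g])        # m = 100 knots on a grid

C_star = exp_cov(knots, knots) + 1e-10 * np.eye(len(knots))              # m x m
w_star = np.linalg.cholesky(C_star) @ rng.standard_normal(len(knots))    # w* ~ N(0, C*)

c = exp_cov(S, knots)                         # n x m; row i is c(s_i; theta)^T
w_tilde = c @ np.linalg.solve(C_star, w_star) # w~(s_i) = c(s_i)^T C*^{-1} w*
```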

Properties. $\tilde{w}(s_0) = E(w(s_0) \mid w^*)$ minimizes $E\{(w(s_0) - f(w^*))^2\}$ over all real-valued functions $f(w^*)$. Also, $\tilde{w}(s_j^*) = c^T(s_j^*; \theta) C^{*-1}(\theta) w^* = w(s_j^*)$: the predictive process interpolates the parent process $w(s)$ at the knots. Let $w_a = (w(s_1), \ldots, w(s_n), w(s_1^*), \ldots, w(s_m^*))$ and let $p(w_a \mid Y)$ be the posterior distribution of $w_a$ with all other parameters fixed.
1. $p(w_a \mid Y) \propto p(w_a)\, p(Y \mid w)$, since $p(Y \mid w) = p(Y \mid w_a)$.
2. The posterior for the predictive process model replaces $p(Y \mid w)$ with $q(Y \mid w^*)$.
3. We want to preserve $q(Y \mid w_a) = q(Y \mid w^*)$.
4. The authors claim the predictive process model corresponds to the density that minimizes the reverse Kullback–Leibler divergence between the posteriors $q(w_a \mid Y)$ and $p(w_a \mid Y)$.

Knot Selection. In addition to specifying a covariance function, the predictive process relies on specifying a set of knots $S^*$: both the number of knots $m$ and their locations. Choosing the knots to be all $n$ spatial locations recovers the original model, so in choosing $m$ we balance performance against computational complexity. The authors consider modifications to a standard grid of knots (close pairs, infill).

To compare performance, the covariance function of the parent process is compared with that of the predictive process: 200 locations are generated uniformly over a $[0, 10] \times [0, 10]$ square; knots form a $10 \times 10$ equally spaced grid; the Matérn covariance is used with $\sigma^2 = 1$, range parameter $\phi = 2$, and four values of the smoothness $\nu$. Covariances for 2,000 of the roughly 40,000 distance pairs are plotted for the predictive process.
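
The comparison can be sketched in a few lines of Python (an illustrative reconstruction, not the authors' code), assuming one common Matérn parameterization that reduces to the exponential covariance at $\nu = 0.5$.

```python
import numpy as np
from scipy.special import gamma, kv

def matern(d, sigma2=1.0, phi=2.0, nu=0.5):
    """One common Matern parameterization; reduces to exponential at nu = 0.5."""
    d = np.asarray(d, dtype=float)
    out = np.full(d.shape, sigma2)              # C(0) = sigma2
    pos = d > 0
    u = phi * d[pos]
    out[pos] = sigma2 * (2 ** (1 - nu) / gamma(nu)) * u ** nu * kv(nu, u)
    return out

def cross_cov(S1, S2, **kw):
    d = np.linalg.norm(S1[:, None, :] - S2[None, :, :], axis=-1)
    return matern(d, **kw)

rng = np.random.default_rng(3)
S = rng.uniform(0, 10, size=(200, 2))           # 200 locations on [0, 10]^2
g = np.linspace(0.5, 9.5, 10)
knots = np.array([(x, y) for x in g for y in g])  # 10 x 10 knot grid

for nu in (0.5, 1.0, 1.5, 5.0):
    C_star = cross_cov(knots, knots, nu=nu) + 1e-8 * np.eye(len(knots))
    c = cross_cov(S, knots, nu=nu)                   # n x m
    pp_cov = c @ np.linalg.solve(C_star, c.T)        # cov of w~ at all location pairs
    parent = cross_cov(S, S, nu=nu)
    i, j = np.triu_indices(len(S), k=1)
    # mean discrepancy between parent and predictive-process covariances
    print(f"nu={nu}: mean |parent - pp| = {np.abs(parent[i, j] - pp_cov[i, j]).mean():.3f}")
```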

[Figure 1, p. 831: covariances of $w(s)$ against distance (line) and covariances of $\tilde{w}(s)$ against distance (points), for smoothness parameter $\nu$ = 0.5, 1, 1.5, and 5.]

Alternative Scenario. Exponential covariance: set $\nu = 0.5$ and choose four values of the range parameter.

[Figure 2, p. 832: covariances of $w(s)$ against distance (line) and covariances of $\tilde{w}(s)$ against distance (points), for range parameter 2, 4, 6, and 12.]

Take-Aways 1. The covariance functions agree better at larger distances, especially as smoothness and range increase, so dense knots may be needed at short distances. Knot selection with a packed subset (instead of just a grid) may improve results.

Lattice plus close pairs: start with a regular $k \times k$ lattice of knots, then intensify the grid by randomly choosing $m$ of the lattice points and placing an additional knot close to each of them. Lattice plus infill: start with knots on a regular $k \times k$ lattice, then intensify the grid by placing a more finely spaced lattice within $m$ randomly chosen cells of the original lattice.
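
A hedged Python sketch of the two designs; the helper names, the jitter radius `eps`, and the infill resolution `fine` are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def lattice_plus_close_pairs(k, m, eps=0.2, lo=0.0, hi=10.0, rng=None):
    """Regular k x k lattice of knots, plus one extra knot placed within
    ~eps of each of m randomly chosen lattice points."""
    rng = np.random.default_rng(rng)
    g = np.linspace(lo, hi, k)
    lattice = np.array([(x, y) for x in g for y in g])
    picks = lattice[rng.choice(len(lattice), size=m, replace=False)]
    jitter = rng.uniform(-eps, eps, size=picks.shape)
    return np.vstack([lattice, picks + jitter])

def lattice_plus_infill(k, m, fine=3, lo=0.0, hi=10.0, rng=None):
    """Regular k x k lattice, plus a finer fine x fine sub-lattice placed
    inside each of m randomly chosen cells of the original lattice."""
    rng = np.random.default_rng(rng)
    g = np.linspace(lo, hi, k)
    lattice = np.array([(x, y) for x in g for y in g])
    extras = []
    cells = [(i, j) for i in range(k - 1) for j in range(k - 1)]
    for idx in rng.choice(len(cells), size=m, replace=False):
        i, j = cells[idx]
        fx = np.linspace(g[i], g[i + 1], fine + 2)[1:-1]
        fy = np.linspace(g[j], g[j + 1], fine + 2)[1:-1]
        extras.extend((x, y) for x in fx for y in fy)
    return np.vstack([lattice, np.array(extras)])

knots = lattice_plus_close_pairs(k=8, m=10, rng=0)   # 64 lattice knots + 10 close pairs
```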

Simulation 1. Simulate $Y(s)$ at 3,000 irregularly scattered locations $s$ from $Y(s) = x^T(s)\beta + w(s) + \epsilon(s)$. See Figure 3a, p. 838.

See Figures 3a, 3b, and 3c, p. 838.

See Table 1, p. 837.

See Table 2, p. 839.

See Table 3, p. 839.

Take-Aways 2. Estimation is more sensitive to the number of knots than to the underlying design. Close-pair designs appear to improve estimation of the shorter ranges, as seen for $\lambda_2$ with 256 knots. Predictions are much more robust (little change as the number of knots increases).

Simulation 2. 15,000 locations (vs. 3,000 in Simulation 1) and a non-stationary random field; the full model is computationally infeasible. The domain is divided into 3 regions, each with a different intercept.

[Figure 4a–b, p. 840: simulated sites, OLS residuals, and knots. Figure 4c, p. 840: spatial residuals.]

See Table 4, p. 841.

Take-Aways 3. Higher knot density gives better estimation. The spatial residuals are smoother and illustrate regional anisotropy.

Application. Forest biomass and other variables related to current carbon stocks are important for quantifying the ecological and economic viability of forest landscapes. We want to know how biomass changes across the landscape (as a continuous, interpolated surface) and how homogeneous it is across the region.

Data. Point-referenced (log-transformed) biomass data observed at 9,500 locations (USDA). $Y(s)$: biomass from trees; $X_1(s)$: the cross-sectional area of all stems above 1.37 m from the ground (basal area); $X_2(s)$: the number of tree stems at that location (stem density). Spatially varying coefficient model: $Y(s) = x^T(s)\beta(s) + \epsilon(s)$. Predictive process model: $Y = X\beta + Z^T C^T(\theta) C^{*-1}(\theta) w^* + \epsilon$. See Figure 5, p. 843.

See Table 5, p. 844.

[Figure 6, p. 845: posterior (mean) estimates of spatial surfaces from the spatially varying coefficients model: (a) intercept; (b) basal area coefficient; (c) stem density coefficient; (d) (log-) biomass.]

Bivariate Gaussian Process
$$\begin{pmatrix} w_1(s) \\ w_2(s) \end{pmatrix} \sim MVGP_2(0, \Gamma_w(s, s'))$$
with cross-covariance function
$$\Gamma_w(s, s') = \begin{pmatrix} \mathrm{cov}(w_1(s), w_1(s')) & \mathrm{cov}(w_1(s), w_2(s')) \\ \mathrm{cov}(w_2(s), w_1(s')) & \mathrm{cov}(w_2(s), w_2(s')) \end{pmatrix}.$$
For observed locations $s_1, \ldots, s_n$, the covariance matrix induced by $\Gamma_w(s, s')$ is $2n \times 2n$.

Multivariate Gaussian Process
$$(w_1(s), \ldots, w_k(s))^T \sim MVGP_k(0, \Gamma_w(s, s'))$$
with cross-covariance function
$$\Gamma_w(s, s') = \begin{pmatrix} \mathrm{cov}(w_1(s), w_1(s')) & \cdots & \mathrm{cov}(w_1(s), w_k(s')) \\ \vdots & \ddots & \vdots \\ \mathrm{cov}(w_k(s), w_1(s')) & \cdots & \mathrm{cov}(w_k(s), w_k(s')) \end{pmatrix}.$$
For observed locations $s_1, \ldots, s_n$, the covariance matrix induced by $\Gamma_w(s, s')$ is $kn \times kn$.

Multivariate Spatial Regression
$$Y(s) = X(s)^T \beta + w(s) + \varepsilon(s) \quad (4)$$
Linear Model of Coregionalization [1]:
$$w(s) = (w_1(s), \ldots, w_k(s))^T = A(s) v(s), \qquad v(s) = (v_1(s), \ldots, v_k(s))^T,$$
where the $v_j(s)$ are independent $GP(0, \rho_j(s, s'))$ processes, so $v(s) \sim GP(0, \Gamma_v(s, s'))$ with $\Gamma_v(s, s') = \mathrm{diag}([\rho_j(s, s')]_{j=1}^k)$ and
$$\Gamma_w(s, s') = A(s)\, \Gamma_v(s, s')\, A^T(s').$$
Since $\Gamma_w(s, s) = A(s) A^T(s)$, we can take $A(s)$ to be lower triangular.
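
A small Python sketch of the coregionalization construction, with $A$ held constant over space for simplicity (the slides allow a spatially varying $A(s)$); the correlation functions and the entries of $A$ are illustrative.

```python
import numpy as np

def lmc_cross_cov(s, t, A, rhos):
    """Gamma_w(s, s') = A Gamma_v(s, s') A^T with Gamma_v = diag(rho_j(s, s'))."""
    Gamma_v = np.diag([rho(s, t) for rho in rhos])
    return A @ Gamma_v @ A.T

# Example: k = 2 processes with exponential correlations and a fixed
# lower-triangular A, so Gamma_w(s, s) = A A^T.
rho1 = lambda s, t: np.exp(-2.0 * np.linalg.norm(s - t))
rho2 = lambda s, t: np.exp(-0.5 * np.linalg.norm(s - t))
A = np.array([[1.0, 0.0],
              [0.7, 0.5]])

s, t = np.array([0.0, 0.0]), np.array([1.0, 1.0])
print(lmc_cross_cov(s, t, A, [rho1, rho2]))   # 2 x 2 cross-covariance block
```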

Multivariate Predictive Process
$$Y(s) = X(s)^T \beta + \tilde{w}(s) + \varepsilon(s) \quad (5)$$
$$\tilde{w}(s) = \mathrm{cov}(w(s), w^*)\, \mathrm{var}^{-1}(w^*)\, w^* = C^T(s; \theta)\, C^{*-1}(\theta)\, w^*,$$
where $C(s; \theta)$ stacks the blocks $\Gamma_w(s, s_1^*; \theta), \ldots, \Gamma_w(s, s_m^*; \theta)$ vertically into an $mk \times k$ matrix, and $C^*(\theta) = [\Gamma_w(s_i^*, s_j^*)]_{i,j=1}^m$ is an $mk \times mk$ matrix.

Additional Computational Savings
$$A^* = \mathrm{blockdiag}(A(s_1^*), \ldots, A(s_m^*)),$$
$$\Sigma_v = [\Gamma_v(s_i^*, s_j^*)]_{i,j=1}^m, \qquad \Gamma_v(s_i^*, s_j^*) = \mathrm{diag}(\rho_1(s_i^*, s_j^*), \ldots, \rho_k(s_i^*, s_j^*)).$$

Additional Computational Savings (cont'd)
$$C^* = A^* \Sigma_v A^{*T}, \qquad C^{*-1} = (A^{*T})^{-1}\, \Sigma_v^{-1}\, (A^*)^{-1}.$$
Reordering $\Sigma_v$ by process rather than by location, $\Sigma_v = P^T \tilde{H} P$ with a permutation matrix $P$ satisfying $P^{-1} = P^T$, where
$$\tilde{H} = \mathrm{blockdiag}(H_1, \ldots, H_k), \qquad H_i = [\rho_i(s_j^*, s_{j'}^*)]_{j,j'=1}^m,$$
so inverting $\Sigma_v$ reduces to inverting $k$ matrices of size $m \times m$.

General Framework: Spatial Mixed Model
$$Y(s) = X^T(s) \beta + Z^T(s) w(s) + \varepsilon(s)$$
$Y(s)$ is a $q \times 1$ vector of responses at location $s$. $X^T(s) = \mathrm{diag}(x_1^T(s), \ldots, x_q^T(s))$. $Z^T(s)$ is a $q \times k$ design matrix. $w(s)$ is a $k \times 1$ vector of spatial effects. $\beta$ is a vector of coefficients of length $p = \sum_{l=1}^q p_l$. Predictive process model:
$$Y(s) = X^T(s) \beta + Z^T(s) \tilde{w}(s) + \varepsilon(s)$$

Implementation
$$Y = X\beta + Z^T C^T(\theta) C^{*-1}(\theta) w^* + \varepsilon, \qquad \varepsilon \sim N(0, I_n \otimes \Psi).$$
$Y = [Y(s_i)]_{i=1}^n$ is an $nq \times 1$ vector of responses. $X^T = [X(s_i)^T]_{i=1}^n$ is an $nq \times p$ matrix of covariates. $Z^T = \mathrm{blockdiag}(Z^T(s_1), \ldots, Z^T(s_n))$ is an $nq \times nk$ design matrix. $C^T(\theta) = [\Gamma_w(s_i, s_j^*)]_{i,j=1}^{n,m}$; $w^*$ and $C^*$ are from the predictive process. After marginalizing out $w^*$:
$$f(Y \mid \Omega) = MVN(X\beta,\; Z^T C^T(\theta) C^{*-1}(\theta) C(\theta) Z + I_n \otimes \Psi).$$

Sherman–Woodbury–Morrison Formula
1. $(A + UCV)^{-1} = A^{-1} - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1}$ [4]
2. $\det(A + UWV^T) = \det(W^{-1} + V^T A^{-1} U)\, \det(W)\, \det(A)$ [3]
Likelihood calculations for $Y$ involve computing the determinant and inverse of $Z^T C^T(\theta) C^{*-1}(\theta) C(\theta) Z + I_n \otimes \Psi$. Using these identities, the computations involve $mk \times mk$ matrices instead of $nq \times nq$.
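
A self-contained Python sketch (illustrative, with a diagonal $A$ playing the role of $I_n \otimes \Psi$ and $C = I$ for simplicity) that applies both identities and checks them against the direct $O(n^3)$ computation.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 2000, 64                        # "big" and "small" dimensions
U = rng.standard_normal((n, m))
C = np.eye(m)                          # m x m middle matrix
A_diag = rng.uniform(1.0, 2.0, n)      # diagonal A (like I_n ⊗ Ψ), cheap to invert

# Woodbury: (A + U C U^T)^{-1} = A^{-1} - A^{-1} U (C^{-1} + U^T A^{-1} U)^{-1} U^T A^{-1}
Ainv_U = U / A_diag[:, None]                     # A^{-1} U, costs O(nm)
cap = np.linalg.inv(C) + U.T @ Ainv_U            # m x m "capacitance" matrix

def apply_inverse(y):
    """Apply (A + U C U^T)^{-1} to y using only m x m solves."""
    z = y / A_diag
    return z - Ainv_U @ np.linalg.solve(cap, U.T @ z)

# Determinant lemma: det(A + U C U^T) = det(C^{-1} + U^T A^{-1} U) det(C) det(A)
_, logdet_cap = np.linalg.slogdet(cap)
logdet = logdet_cap + 0.0 + np.log(A_diag).sum()  # log det(C) = 0 since C = I

# Check against the direct O(n^3) computation
M = np.diag(A_diag) + U @ C @ U.T
y = rng.standard_normal(n)
assert np.allclose(apply_inverse(y), np.linalg.solve(M, y))
assert np.isclose(logdet, np.linalg.slogdet(M)[1])
```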

Extensions: Non-Gaussian and Spatio-Temporal Data
1. Non-Gaussian data (binary, count, categorical), e.g., binomial data (probit and logistic models) and count data. Assume an appropriate transformation $\eta(s) = g(E(Y(s))) = X^T(s)\beta + w(s)$ with $g(\cdot)$ known. In general one cannot marginalize out $w(s)$, and its full conditional is not available in closed form. Clever trick: writing $\eta(s) = g(E(Y(s))) = X^T(s)\beta + w(s) + \varepsilon(s)$, the added $\varepsilon(s)$ produces full conditionals for the spatial effects that are multivariate normal.
2. Space-time data: knots must now be specified over both space and time, and the predictive process model extends naturally to this case.

References
[1] S. Banerjee, B.P. Carlin, and A.E. Gelfand. Hierarchical Modeling and Analysis for Spatial Data. Monographs on Statistics and Applied Probability (101), 2004.
[2] Carl Edward Rasmussen. Gaussian Processes for Machine Learning. 2006.
[3] Wikipedia. Matrix determinant lemma. 2015. [Online; accessed 16 September 2015].
[4] Wikipedia. Woodbury matrix identity. 2015. [Online; accessed 16 September 2015].