Spatially-Varying Covariance Functions for Nonstationary Spatial Process Modeling


Spatially-Varying Covariance Functions for Nonstationary Spatial Process Modeling

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By Mark D. Risser, B.S., M.S.

Graduate Program in Statistics, The Ohio State University, 2015

Dissertation Committee: Catherine A. Calder, Ph.D., Advisor; Peter F. Craigmile, Ph.D.; Christopher M. Hans, Ph.D.

© Copyright by Mark D. Risser, 2015

Abstract

In many environmental applications involving spatially-referenced data, limitations on the number and locations of observations motivate the need for practical and efficient models for spatial interpolation, or kriging. A key component of models for continuously-indexed spatial data is the covariance function, which is traditionally assumed to belong to a parametric class of stationary models. While convenient, the assumption of stationarity is rarely realistic; as a result, there is a rich literature on alternative methodologies which capture and model the nonstationarity present in most environmental processes. The first contribution of this dissertation is to provide a rigorous and concise description of the existing literature on nonstationary methods, paying particular attention to process convolution (also called kernel smoothing or moving average) approaches, since these serve as the motivation for this dissertation. The remaining contributions address the limitations innate to the existing methods by developing new approaches that are computationally feasible and yield interpretable results. For illustration, the methods are applied to both meteorological and soil science data sets.

First, we address the lack of off-the-shelf software for fitting nonstationary convolution-based spatial models. Convolution-based models are highly flexible yet notoriously difficult to fit, even with relatively small data sets. The general lack of prepackaged options for model fitting makes it difficult to compare new methodology in nonstationary modeling with other existing methods, and as a result most new models are simply compared to stationary models. We present a new convolution-based nonstationary covariance function for spatial Gaussian process models that provides efficient computing in two ways: first, by representing spatially-varying parameters via a discrete mixture or mixture component model, and second, by estimating the mixture component parameters through a local likelihood approach. In order to make computation for a convolution-based nonstationary spatial model readily available, we also present and describe the convospat package for R.

Next, we build on the growing literature of covariate-driven nonstationary spatial modeling and propose a Bayesian model for continuously-indexed spatial data based on a flexible parametric covariance regression structure for a convolution-kernel covariance matrix. The resulting model is a parsimonious representation of the kernel process, and we explore properties of the implied model, including a description of the resulting nonstationary covariance function and the interpretational benefits in the kernel parameters. Furthermore, we demonstrate that our model provides a practical compromise between stationary spatial covariance functions and highly parameterized nonstationary ones that do not perform well in practice.

Finally, we propose a fully Bayesian covariate-driven nonstationary spatial model which can accommodate covariate information that is either not fully observed or observed at a different resolution than the spatial process of interest. Furthermore, while allowing covariates to inform the second-order properties of the process, the model mimics a nonparametric approach by not using the covariate directly in this specification. Partially observed covariates are incorporated using a clustering or segmentation algorithm, and multivariate covariate information is accommodated in a Bayesian model-averaging framework. A discussion of extensions and future work concludes the dissertation.

Dedicated to Laura, my wife and partner in life's adventures, as well as my loving and supportive family: Dave, Doris, Jon, and Jonalyn.

Acknowledgments

First and foremost, I would like to thank my advisor, Kate Calder, for the immense role she has played in my development as a researcher and statistician. I am extremely grateful for her leadership, guidance, and mentorship over the past three and a half years as I transitioned from a new graduate student to an independent researcher, and especially through my recent job search. Kate has provided me with endless and invaluable advice, and has set an excellent example for me of what it means to be an academic statistician. I will strive to imitate her commitment to research, teaching, and service to both my home university and the greater statistics community. I am honored to call Kate a friend, and I look forward to a continued professional relationship in the years ahead.

I would like to thank the members of my committee: Dr. Peter Craigmile for his comments on my research, expertise in spatial statistics, and help with job and postdoc applications; Dr. Chris Hans for his comments on my research and expertise in Bayesian analysis and statistical computing; and Dr. Ying Sun, who served on my candidacy committee, for her comments on my research and expertise in spatial statistics and computing.

I would like to thank the other professors in the Department of Statistics at Ohio State who have given much support and guidance throughout my development as a statistician and educator: Dr. Jackie Miller, Dr. Radu Herbei, Dr. Chris Holloman, Dr. Elizabeth Stasny, Dr. Angela Dean, and Dr. Elly Kaizar. I would also like to thank Dr. Veronica Berrocal at the University of Michigan for being a fantastic collaborator.

I would like to thank my graduate student colleagues at Ohio State, whose support and camaraderie were absolutely invaluable to surviving graduate school: Brittney Bailey, Kevin Donges, Dave Kline, Zach Thomas, Shivi Vaidyanathan, and Staci White. I would like to thank my wife Laura, my parents Dave and Doris, my brother Jon and sister-in-law Jonalyn, and many other friends for their constant love, support, and confidence in my abilities. Finally, I would like to thank God for walking with me daily, equipping me to face life's challenges, and providing me with the opportunities that have brought me to where I am today.

Vita

B.S., Mathematics, Eastern Mennonite University
University Fellow, Graduate School, The Ohio State University
Research Associate, Dr. Jackie Miller, The Ohio State University
M.S., Statistics, The Ohio State University
Graduate Teaching Associate, Department of Statistics, The Ohio State University
Research Associate, Dr. Radu Herbei, The Ohio State University
Junior Consultant, Statistical Consulting Service, The Ohio State University

Publications

Risser, Mark D. and Calder, Catherine A. Regression-based covariance functions for nonstationary spatial modeling. Environmetrics, 26(4).

Miller, Jackie B., Risser, Mark D., and Griffiths, Robert P. Student choice, instructor flexibility: moving beyond the blended instructional model. Issues and Trends in Educational Technology, Vol. 1, No. 1.

Fields of Study

Major Field: Statistics

Table of Contents

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

1. Introduction
   1.1 Gaussian processes for spatial statistical modeling
   1.2 Second-order properties of spatial processes
   1.3 Parametric models for stationary spatial covariance functions
   1.4 Canonical Gaussian process-based spatial statistical model
   1.5 Prediction
   1.6 Evaluation criteria
   1.7 Motivation and contributions of this work
   1.8 Data descriptions
       United States monthly meteorology
       Soil organic carbon stocks from the continental United States

2. Nonstationary spatial modeling
   Deformation methods
   Basis function expansions
   Gaussian Markov random field methods
   Process convolution or kernel smoothing methods
       2.4.1 Main result
       Discrete process convolution model
       Convolution of locally stationary processes
       Spatially-varying parameters
   Methods for including covariate information in a covariance function

3. Local likelihood estimation for covariance functions with spatially-varying parameters
   Introduction
   Spatially-varying parameters via a discrete mixture representation
   Computationally efficient inference: local likelihood estimation
   Using the convospat package for R
       Nonstationary model fitting
       Anisotropic model fitting
       Evaluation criteria and plotting functions
       Other functions
   Example 1: simulated data
   Example 2: annual precipitation
       Spatial model summaries
       Visualizations of nonstationarity
   Discussion

4. Regression-based covariance functions for nonstationary spatial modeling
   Introduction
   A class of regression-based nonstationary covariance functions
       Alternative kernel matrix regression models
       Parameter interpretations, model geometry, and parsimony
       General properties of C^R
   Computational details
       Prior specification
       Markov chain Monte Carlo (MCMC)
   Application: annual precipitation in Colorado, USA
       Data
       Model comparison
       Results
   Discussion

5. Treed covariate segmentation models for soil carbon and other nonstationary spatial processes
   5.1 Introduction
   Segmentation approaches for nonstationary spatial models
       Mixture of stationary processes
       Independent segments
       Mixture component model
   Treed covariate models via Bayesian CART
       Case 1: categorical treed response
       Case 2: continuous treed response
       Posterior sampling via Metropolis-Hastings for the tree
   Multivariable segmentation and model averaging
       Prior selection for mean and covariance parameters
       Markov chain Monte Carlo
       Evaluation criteria for model averaging
   Application: soil organic carbon stocks for the Great Lakes region, USA
       Covariates used for segmentation
       Holdout data, tree selection, mean structure, and model comparison
       Results
   Discussion

6. Contributions and Future Work

Bibliography

List of Tables

3.1 Parameter estimates from the simulated data, comparing the stationary and nonstationary models.

3.2 A brief summary of the different models fit to the precipitation data.

3.3 Model details and evaluation for the best model of each type fit to the precipitation data, selected based on maximizing CRPS. Note: SV indicates spatially-varying. The computational time given is for a Dual Quad Core Xeon 2.66GHz machine with 32GB RAM.

3.4 Parameter estimates for the five best spatial models fit to the precipitation data set, indicating which parameters are spatially-varying (SV).

4.1 Posterior means and 95% credible intervals for the stationary (S-M1) and full nonstationary (FNS-M2) models. Coefficient estimates in bold indicate those with a 95% credible interval that does not include zero. Note: all covariates have been standardized.

5.1 Fixed hyperparameter values used for the nonstationary models and stationary model (as appropriate).

5.2 Posterior means for parameters in each of the individual models.

5.3 95% posterior credible intervals for parameters in each of the individual models.

5.4 Out-of-sample (CRPS, MSPE, log score) and in-sample (iMSPE, ilogscore) evaluation criteria for each model (recall: large CRPS and log score is better; small MSPE is better). Best models for each criterion are in bold.

List of Figures

1.1 Colorado annual precipitation data for 1981, in log mm.

1.2 Annual precipitation data for 1997, in log mm.

1.3 Soil organic carbon stocks, for each Rapid Carbon Assessment (RaCA) site.

2.1 Visualization of the discrete basis kernel approach of Higdon (1998). The blue + symbols represent basis locations with fixed (locally estimated) basis kernel functions corresponding to the dashed black ellipses. The symbols represent three arbitrary spatial locations and the corresponding ellipses (kernel functions), calculated as in (2.9).

3.1 Left: true mixture component ellipses with observation locations (red) and holdout locations (green). Right: simulated data.

3.2 Predictions and prediction errors from the stationary model (a. and b.) and the nonstationary model (c. and d.).

3.3 True mixture component ellipses (solid red) with fit radius (dashed gray), nonstationary ellipses (solid black), and the stationary ellipse (dashed blue).

3.4 Estimated correlations for a reference point, showing the nonstationary (left) and stationary (center) models, as well as the true correlation (right).

3.5 Predictions and prediction standard errors for the stationary model (plots (a) and (b)) and the nonstationary model NS1 (plots (c) and (d)).

3.6 Estimated mixture component ellipses for the nonstationary model (red), the stationary (anisotropic) model (blue), and the estimation region (dashed black).

3.7 Plots of the estimated spatially-varying process variance σ² (right) and nugget variance τ² (left) for model NS.

3.8 Correlation plots for three reference points, comparing the stationary model S (top) and nonstationary model NS4 (bottom).

4.1 A visualization of the parameter space for two-dimensional kernel matrices under the covariance regression model.

4.2 Observation stations (n = 217), labeled by log annual precipitation total (upper left) and residuals from a simple linear regression of log annual precipitation on elevation, slope, and the slope/elevation interaction (upper right); topographical map of Colorado with elevations in meters (lower left); a representation of the change in elevation or slope, measured as a west-to-east gradient (lower right).

4.3 A plot of the estimated spatially-varying process variance, calculated using the posterior mean of the parameters.

4.4 Correlation plots for four reference points in the stationary model S-M1 (top) and nonstationary model FNS-M2 (bottom), calculated using the posterior mean parameter estimates.

4.5 Boxplots of the evaluation criteria for each of the models fit to the Colorado precipitation data, summarized for each of 20 holdout replicates. Recall: S-M1 is the stationary model, FNS-M2 is the full nonstationary model, RNS-M3 is the reduced nonstationary model. Small MSPE indicates better model fit; larger CRPS and log score indicate better model fit. The bar plot on the right summarizes which model is chosen as best under each criterion when the three models are compared separately for each of the 20 hold-out sets.

5.1 The Great Lakes region subset of the Rapid Carbon Assessment (RaCA) SOC data.

5.2 The Great Lakes region subset of the Rapid Carbon Assessment (RaCA) land use-land cover classes. Recall: CRP refers to a cropland site which corresponds to a Conservation Reserve Program.

5.3 The Great Lakes region subset of the Rapid Carbon Assessment (RaCA) drainage classes. Note: VPD = very poorly drained, SPD = somewhat poorly drained, PD = poorly drained, MWD = moderately well drained, WD = well drained, SED = somewhat excessively drained, ED = excessively drained.

5.4 The four-segment trees used for land use-land cover (left) and drainage classes (right).

5.5 Nonstationary model summaries using the drainage class tree. Top: posterior boxplots for mean and variance parameters. Middle: (locally) stationary correlation plots for an arbitrary 6 by 6 subset of space. Bottom: bar charts of the land use-land cover variable within each segment (recall: ED = excessively drained, MWD = moderately well drained, PD = poorly drained, SED = somewhat excessively drained, SPD = somewhat poorly drained, VPD = very poorly drained, WD = well drained; NA = not available).

5.6 Nonstationary model summaries using the drainage class tree. Top: posterior boxplots for mean and variance parameters. Middle: (locally) stationary correlation plots for an arbitrary 6 by 6 subset of space. Bottom: bar charts of the land use-land cover variable within each segment (recall: C = cropland, F = farmland, P = pastureland, W = wetland, X = CRP site).

5.7 Posterior samples of V, the model label.

Chapter 1: Introduction

1.1 Gaussian processes for spatial statistical modeling

Despite the rising popularity of spatio-temporal statistical modeling, there is still a strong need for flexible and computationally efficient spatial models appropriate for spatial prediction. For example, in the case of meteorological, agricultural, or geological data where fixed monitoring stations are used to collect observations of a spatial process, monitoring sites are not always located where information about the spatial process is desired. In such situations, it may be of interest to generate a filled-in prediction map of the spatial process based on a sparse, finite number of observations, as well as to estimate the uncertainty in these predictions.

The standard way to model a point-referenced, continuously-indexed spatial process is to specify that observations of the process are generated by a particular stochastic mechanism or stochastic process. A Gaussian process (GP) is an extremely popular choice for the stochastic process, because all of its finite-dimensional distributions are Gaussian and because the process is completely specified by a characterization of its first- and second-order properties. Furthermore, for a GP, the second-order properties can be easily specified by one of the widely used classes of valid spatial covariance functions. The spatial covariance function for a GP describes the degree and nature of spatial dependence (or covariance) present in a spatial process. In fact, a hallmark principle of spatial statistics states that, in general, values of the spatial GP which are close together are more likely to be similar (dependent), while values which are far apart are most likely unrelated (approximately independent).

1.2 Second-order properties of spatial processes

Define {Y(s) : s ∈ G} to be a general univariate and real-valued spatial stochastic process of interest, where the spatial domain G ⊂ R^d, d ≥ 1. Furthermore, without loss of generality, assume that the process Y(·) is mean-zero, i.e., E[Y(s)] = 0 for all s ∈ G. Define C(·, ·) to be the spatial covariance function of {Y(s) : s ∈ G}, such that

C(s, s') ≡ Cov[Y(s), Y(s')] = E[Y(s)Y(s')],

for all s, s' ∈ G. The covariance function is always symmetric, i.e., C(s, s') = C(s', s), and when s = s' the covariance function defines the variance of the process, C(s, s) = Var[Y(s)]. The covariance function must be a nonnegative definite function, meaning that

Σ_{i=1}^{n} Σ_{j=1}^{n} a_i a_j C(s_i, s_j) ≥ 0,   (1.1)

for any positive integer n, any set of locations {s_i : i = 1, ..., n} ⊂ G, and any set of real numbers {a_i : i = 1, ..., n}.

In order to learn about the covariance function from realizations of a spatial process, further assumptions are nearly always made regarding the properties of C. The most common is that of stationarity, meaning that some features of C do not depend on the spatial location. More formally, a process {Y(s) : s ∈ G} is said to be second-order stationary (or weakly stationary) if the following two properties hold for all spatial lags h ∈ R^d:

1. E[Y(s)] = E[Y(s + h)] = c for some constant c, and
2. C(s, s + h) = C(0, h).

For fixed h, note that C(0, h) is a constant which does not depend on s. The covariance function for a spatial process which is second-order stationary can be written as C(s, s') = C(s − s') and is often simply called a stationary covariance function. The first requirement is not restrictive, since we have already specified Y(·) to be mean-zero, and non-constant mean behavior can be introduced into a different component of a statistical model (see Section 1.4). However, the second requirement is much more restrictive, as it is rarely reasonable to assume that the spatial dependence structure does not depend on spatial location.

A complete characterization of the class of valid covariance functions is given by Bochner's theorem (Bochner, 1959; Adler, 1981), which states that a real-valued function defined on R^d is the covariance function of a stationary process if and only if it is even and nonnegative definite. Bochner's theorem is a powerful result, as it enables the construction of stationary processes by utilizing the existing literature on nonnegative definite functions.

Two special cases of second-order stationary processes are isotropic processes and anisotropic processes. An isotropic process has a covariance function which can be written in terms of the length of the spatial lag h, or

C(s, s + h) = C(‖h‖),   (1.2)

where ‖·‖ represents the Euclidean norm in R^d, i.e., ‖x‖ = (Σ_{k=1}^{d} x_k²)^{1/2}. Isotropic processes are particularly restrictive, as not even directionality impacts the covariance function. Anisotropic processes are a slight generalization of isotropic processes, in that both distance and direction are incorporated into the covariance function by way of a linear transformation of the lag vector h. That is, the covariance function can be written

C(s, s + h) = C(‖A^{1/2} h‖),   (1.3)

where A is a d × d positive definite matrix (often called the anisotropy matrix) which allows the range of dependence to be longer or shorter in particular directions. Intuitively, the isotropic covariance function (1.2) yields spherical correlation patterns, while the anisotropic covariance function (1.3) yields ellipsoidal correlation patterns.

1.3 Parametric models for stationary spatial covariance functions

Several parametric models for isotropic covariance functions are particularly popular in spatial statistical modeling. In a traditional framework, each of these depends on (at least) three parameters σ² > 0, τ² ≥ 0, and φ > 0, which represent the process variance, nugget, and range, respectively. However, in more modern hierarchical modeling frameworks (see Section 1.4), it is often preferable to separate the nugget from the covariance function model and incorporate it in a different component of the model. Aside from the resulting hierarchical structure, this decision is made based on physical meaning, since in spatial modeling the nugget τ² represents measurement error (although it also accounts for microscale variability that cannot be captured at the resolution of the data). Thus, the nugget is often included in a model for observed data, rather than in a theoretical latent process model. In what follows, we will opt for the hierarchical model specification, which separates the nugget from the covariance function itself.

Using this framework, the process variance (also called the partial sill) is C(0) = σ² and represents the variability in the process. The range parameter φ does not directly represent the range of the covariance function, but instead determines how quickly the covariance function decays to zero: smaller values of φ correspond to a faster decay, while larger values of φ correspond to slower decay. A summary of commonly used parametric isotropic covariance functions is available in chapter 2 of Banerjee et al. (2014). One of the most popular parametric models is the Matérn covariance function (e.g., Stein, 1999), which depends on an additional smoothness parameter κ > 0 that controls the smoothness of the resulting spatial field. The Matérn covariance function is

M_κ(h) = σ² [1 / (Γ(κ) 2^{κ−1})] (h/φ)^κ K_κ(h/φ),   h ≥ 0,   (1.4)

where K_κ(·) denotes the modified Bessel function of the third kind of order κ. Two special cases of the Matérn covariance function are κ = 0.5 (appropriate for non-smooth processes), which results in the exponential covariance function

M_{0.5}(h) = σ² exp{−h/φ},   h ≥ 0,

and the limit κ → ∞ (appropriate for extremely smooth processes), which results in the Gaussian covariance function

M_∞(h) = σ² exp{−(h/φ)²},   h ≥ 0.
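As a concrete illustration, the Matérn family (1.4) can be evaluated numerically using SciPy's modified Bessel function K_κ. The sketch below is illustrative (the function name and parameter defaults are not from the convospat package) and checks the κ = 0.5 exponential special case.

```python
import numpy as np
from scipy.special import gamma, kv  # kv is the modified Bessel function K_kappa

def matern(h, sigma2=1.0, phi=1.0, kappa=0.5):
    """Matern covariance M_kappa(h) of (1.4), for scalar or array distances h >= 0."""
    h = np.atleast_1d(np.asarray(h, dtype=float))
    out = np.full(h.shape, sigma2)        # M_kappa(0) = sigma^2, the partial sill
    pos = h > 0
    u = h[pos] / phi                      # scaled distance h / phi
    out[pos] = sigma2 * (u ** kappa) * kv(kappa, u) / (gamma(kappa) * 2 ** (kappa - 1))
    return out
```

At κ = 0.5 this reduces to σ² exp{−h/φ}, and decreasing φ produces a faster decay to zero, matching the role of the range parameter described above.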

1.4 Canonical Gaussian process-based spatial statistical model

A general modeling framework for a univariate spatial Gaussian process can now be defined as follows. With {Y(s) : s ∈ G} again a mean-zero Gaussian process with general covariance function C, now define {Z(s) : s ∈ G} to be a mean-adjusted, noisy version of Y(·), also defined for all s ∈ G ⊂ R^d, d ≥ 1. Then, the model can be written as

Z(s) = µ(s) + Y(s) + ε(s),   (1.5)

where E[Z(s)] = µ(s) is a deterministic mean function; ε(·) is a stochastic component that represents measurement error or microscale variability and is independently distributed as N(0, τ²(s)) with τ²(·) unknown; and ε(·) and Y(·) are independent. (In general, N(a, b) is the univariate Gaussian distribution with mean a and variance b.) It follows that for a fixed, finite set of n spatial locations {s_1, ..., s_n} ⊂ G, the random (observed) vector Z = (Z(s_1), ..., Z(s_n))′ will have a multivariate Gaussian distribution

[Z | Y, µ, D] = N_n(µ + Y, D),   (1.6)

where D = diag[τ²(s_1), ..., τ²(s_n)], and, conditional on the other parameters in the model, the process vector Y = (Y(s_1), ..., Y(s_n))′ is distributed as

[Y | Ω] = N_n(0, Ω),   (1.7)

where N_n(a, B) is the n-dimensional Gaussian distribution with mean vector a and covariance matrix B. The elements of Ω are Ω_ij ≡ C(s_i, s_j). In (1.5), no assumptions are made regarding the second-order properties of Y(·).
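The hierarchical specification (1.5)-(1.7) is straightforward to simulate from, which is a useful check on the notation. Below is a minimal sketch under assumed illustrative parameter values, using a stationary exponential covariance for C: it builds Ω, draws Y | Ω ~ N_n(0, Ω) via a Cholesky factor, and adds the mean and nugget to form Z.

```python
import numpy as np

rng = np.random.default_rng(2015)

# n observation locations in G = [0, 1]^2 (illustrative)
n = 60
s = rng.uniform(size=(n, 2))

# Omega_ij = C(s_i, s_j), here a stationary exponential covariance (Matern, kappa = 0.5)
sigma2, phi, tau2 = 1.0, 0.3, 0.1
dist = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)
Omega = sigma2 * np.exp(-dist / phi)

# Y | Omega ~ N_n(0, Omega), drawn via Cholesky (small jitter for numerical stability)
L = np.linalg.cholesky(Omega + 1e-10 * np.eye(n))
Y = L @ rng.standard_normal(n)

# Z = mu + Y + eps, with D = diag[tau^2, ..., tau^2] the nugget variance matrix;
# marginally Z ~ N_n(mu, D + Omega)
mu = np.zeros(n)
eps = np.sqrt(tau2) * rng.standard_normal(n)
Z = mu + Y + eps
```

The constructed Ω is symmetric and nonnegative definite, as required by (1.1), with diagonal elements equal to the process variance σ².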

The additive framework of (1.5) lends itself well to hierarchical modeling, although it is also possible to integrate over the process Y(·) to arrive at the marginal distribution for Z(·), which is

[Z | µ, D, Ω] = ∫ [Z | Y, µ, D][Y | Ω] dY = N_n(µ, D + Ω).   (1.8)

The marginal likelihood arising from (1.8) is the likelihood often used in more traditional frameworks, which incorporate the nugget into the covariance function.

Statistical inference for a spatial model proceeds in one of several ways, each of which is conditional upon observed values z of the random vector Z. The first is classical geostatistics, or kriging, which does not make the Gaussian process assumption as in (1.5) or (1.8) but still uses the concept of a spatial covariance function. One approach to kriging estimates the variance/covariance parameters D and Ω using nonparametric approaches and the mean parameters µ using a generalized least squares procedure. Likelihood analyses, on the other hand, rely on the Gaussian process assumption and estimate the variance/covariance parameters by numerically optimizing the likelihood associated with (1.8), often in a restricted maximum likelihood (REML) approach (see, e.g., Section 3.3); the mean parameters are again estimated using generalized least squares. Finally, fully Bayesian analyses (many of which also rely on the Gaussian process assumption) seek to perform inference on all unknown quantities, including the latent process values Y, using the posterior distribution of all unknown quantities given Z = z, which is

p(Y, µ, D, Ω | Z = z) ∝ p(z | Y, µ, D) p(Y | Ω) p(µ, D, Ω).   (1.9)

Bayesian inference treats the unknown parameters as themselves random variables, and hence must also assign a prior distribution for the parameters, denoted p(µ, D, Ω).
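For concreteness, the marginal Gaussian likelihood of (1.8), the objective that likelihood-based analyses optimize numerically, can be evaluated directly. A minimal sketch with made-up one-dimensional locations and an exponential covariance (these numbers are illustrative, not from the applications in later chapters):

```python
import numpy as np
from scipy.stats import multivariate_normal

def marginal_loglik(z, mu, D, Omega):
    """Log marginal likelihood of (1.8): log N_n(z; mu, D + Omega)."""
    return multivariate_normal.logpdf(z, mean=mu, cov=D + Omega)

# Toy example: three locations on a line, exponential covariance, nugget tau^2 = 0.25
s = np.array([0.0, 0.4, 1.0])
Omega = 2.0 * np.exp(-np.abs(s[:, None] - s[None, :]) / 0.5)
D = 0.25 * np.eye(3)
z = np.array([0.1, -0.3, 0.2])
ll = marginal_loglik(z, np.zeros(3), D, Omega)
```

In a likelihood analysis this quantity would be passed to a numerical optimizer over (σ², φ, τ²); in a Bayesian analysis it appears inside the posterior (1.9) after integrating out Y.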

24 The prior distribution summarizes the a priori knowledge about µ, D, and Ω, or knowledge before the analysis is conducted (for further details, see Chapters 4 and 5). Bayes Theorem (1.9) provides a systematic way to update the a priori knowledge about the unknown parameters with observed data to arrive at the posterior distribution, upon which all inference is based. 1.5 Prediction One additional reason for the popularity of the Gaussian process for spatial statistical modeling is the ease with which predictions of the process at unobserved locations can be obtained. This is due to the well-known conditional distribution properties of the multivariate Gaussian distribution, which provide a closed-form distribution for the process at unobserved locations given the value of the process at observed locations (and conditional on parameters). Thus, when using a Gaussian process model, it is always straightforward to obtain a filled-in prediction map of the spatial process of interest, along with a quantification of the uncertainty in the predictions. In a Bayesian context, predictions at unobserved locations can be obtained as follows: with Z = z as the n observed process values, now define Z to be the values at m unobserved locations. Also, define θ to be a generic vector of all mean, variance, and covariance parameters. The posterior predictive distribution of interest is p(z Z = z) = θ p(z, θ z)dθ = Following the model specification in (1.5), [ ] ([ ] [ Z µµ D + Z θ ΩZ Ω N n+m, ZZ Ω Z Z D + Ω Z 8 θ p(z θ, z)p(θ z)dθ. (1.10) ]),

25 so by conditional properties of the multivariate Gaussian distribution, Z Z = z, θ N m (µ Z z, Σ Z z), (1.11) where and µ Z z = µ Ω Z Z(D + Ω Z ) 1 (z µ) (1.12) Σ Z z = (D + Ω Z ) Ω Z Z(D + Ω Z ) 1 Ω ZZ. (1.13) For most prior choices, the integral in (1.10) is not available in closed form, but given Markov chain Monte Carlo samples from the posterior p(θ z), say {θ l, l = 1, 2,..., L}, we can compute a Monte Carlo estimate of the posterior predictive mean E[Z z] = L 1 L l=1 Z l, (1.14) where Z Z and the Z l are draws from the distribution [Z z, θ l ] from (1.11). Other inferential quantities, such as (1 α)100% posterior predictive intervals for α [0, 1], can be calculated by finding the ( 100(α/2) ) th and (100(1 α/2)) th percentiles of the {Zl ; l = 1,..., L}. 1.6 Evaluation criteria Given that prediction is often the most important inferential goal of spatial statistics, the various models fit in the next chapters will be compared in terms of out-ofsample prediction. That is, for each data set, 10 to 20 percent of the observations will be held out and used as test data (denoted Z test, with m total observations), while the remaining 80 to 90 percent of the observations will be used as training data (denoted Z train ) to fit each model and predict at the m test data locations. Three evaluation 9

26 criteria will be used to compare predictions with the held-out data for each of the models. First, the mean squared prediction error is MSP E = 1 m m (zj ẑj ) 2, (1.15) j=1 where z j is the jth held-out observed value and ẑ j is the corresponding predicted posterior mean (from (1.14)). Smaller MSPE indicates better predictions. Second, for a more formal comparison, the continuous rank probability score will be used (a proper scoring rule; see Gneiting and Raftery, 2007). For the jth prediction, this is defined as ( CRP S j CRP S(F j, zj ) = Fj (x) 1{x zj } ) 2 dx, (1.16) where F j ( ) is the cumulative distribution function (CDF) for the predictive distribution of z j given Z train and 1{ } is the indicator function. A Monte Carlo estimate of the CRPS can be obtained by averaging over the posterior samples obtained from the MCMC algorithm, ĈRP S j = 1 L L l=1 ( Fj (x; θ l ) 1{x z j } ) 2 dx, where {θ l, l = 1, 2,..., L} are the posterior samples and F j ( ; θ l ) is the conditional univariate (Gaussian) predictive cumulative distribution function given in (1.11) with θ = θ l. In this case, given that the predictive CDF is conditionally Gaussian, a computational shortcut can be used for calculating (1.16): when F is Gaussian with mean µ and variance σ 2, CRP S ( [ ( ) F, zj 1 z ) = σ j µ 2 φ π σ 10 z j µ σ ( ( z j µ 2 Φ σ ) 1) ],

27 where φ and Φ denote the probability density and cumulative distribution functions, respectively, of a standard Gaussian random variable. The reported metric will be the average over all holdout locations, ĈRP S = m 1 m j=1 ĈRP S j. Larger CRPS indicates better model fit. Finally, the logarithmic score will be used, defined as logscore = 1 L log {p(z θ l, z)} (1.17) L (Good, 1952). A larger logarithmic score indicates better model fit. 1.7 Motivation and contributions of this work l=1 While assumptions of second-order stationarity (see Section 1.2) for a Gaussian process are both convenient and widely made, the main idea of this thesis is that these assumptions are almost never appropriate in real-world applications. Instead, a spatial process will almost always display some sort of nonstationarity, in which features of the process vary over space. In some cases the stationary and nonstationary components of a process can be separated, such that the first-order properties are nonstationary (spatially-varying) and a stationary covariance function is used, but in most cases even this assumption does not truly reflect the expected behavior of a spatial process. The primary motivation of this dissertation is to address these limiting assumptions by seeking new classes of spatial statistical models which use nonstationary covariance functions. In Chapter 2, we provide a comprehensive summary of existing nonstationary methods, giving particular attention to process convolution models as these will be the focus of this dissertation. In the remaining chapters, we then seek to address the limitations in the existing literature on nonstationary modeling. First, in 11

Chapter 3, we use local likelihood estimation to provide a computationally efficient way to estimate a nonstationary spatial Gaussian process model, even for relatively large data sets (on the order of n = 1000). The model in Chapter 3 is also useful as a comparison tool for new nonstationary methods, as the existing methods are difficult to implement and do not provide any pre-packaged tools for model fitting. In Chapter 4, we build upon the existing literature on covariate-driven nonstationary covariance function modeling, implementing a model which provides a parsimonious representation of a nonstationary process and therefore efficient computation. Furthermore, the model parameterization allows for clear interpretations with respect to how the covariates relate to the second-order properties of the process. Finally, in Chapter 5, an additional approach for covariate-driven nonstationary modeling is outlined, such that the covariates inform but do not completely specify the second-order properties. Furthermore, and more importantly, the method does not require values of the covariate at all observation and prediction locations of interest, nor do the covariate values even need to be at the same resolution as the observations of the spatial process of interest. Across each of these chapters, the cross-cutting themes and contributions of this dissertation are to provide new nonstationary spatial models which are practical to implement and which also provide summaries and interpretations as to how and why a spatial process exhibits nonstationary behavior.
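The evaluation criteria above can be made concrete in code. The following sketch (Python, with hypothetical held-out values and predictive moments; not taken from the dissertation's software) computes the MSPE of (1.15) and the closed-form Gaussian CRPS, using the larger-is-better sign convention adopted here:

```python
import math

import numpy as np


def crps_gaussian(z, mu, sigma):
    """Closed-form CRPS for a Gaussian predictive CDF with mean mu and
    standard deviation sigma, under the sign convention where larger
    (closer to zero) values indicate better predictions."""
    zt = (z - mu) / sigma
    pdf = math.exp(-0.5 * zt * zt) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(zt / math.sqrt(2.0)))
    return sigma * (1.0 / math.sqrt(math.pi) - 2.0 * pdf
                    - zt * (2.0 * cdf - 1.0))


# Hypothetical held-out observations and posterior predictive moments
z_holdout = np.array([1.2, 0.8, 1.5])
pred_mean = np.array([1.0, 1.0, 1.4])
pred_sd = np.array([0.3, 0.3, 0.2])

mspe = np.mean((z_holdout - pred_mean) ** 2)  # equation (1.15)
crps = np.mean([crps_gaussian(z, m, s)
                for z, m, s in zip(z_holdout, pred_mean, pred_sd)])
```

As a sanity check, for a standard Gaussian predictive distribution evaluated at its own mean the closed form gives $1/\sqrt{\pi} - 2\phi(0) \approx -0.234$, and the score approaches zero (its maximum) as the predictive distribution concentrates on the observed value.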

Figure 1.1: Colorado annual precipitation data for 1981, in log mm.

1.8 Data descriptions

United States monthly meteorology

Meteorological data from the continental United States is available online from the National Center for Atmospheric Research at Data/US.monthly.met/index.shtml, which contains monthly observations of precipitation, minimum temperature, and maximum temperature at each of almost 12,000 monitoring stations (although there is a somewhat large number of missing station values). The precipitation measurements are used for two applications in this dissertation (Sections 3.6 and 4.4): first, a subset of the 1981 annual precipitation data which includes records from Colorado, United States, and second, a larger subset of the annual precipitation from 1997 from the western United States. The Colorado data is chosen to match the data set used in Paciorek and Schervish (2006),

where the 1981 records were chosen since they have the most stations (217) without missing monthly values. The dataset used for analysis, which included precipitation, latitude, and longitude, was obtained upon request from Dr. Chris Paciorek. The records from the western United States from 1997 were chosen as a subset (consisting of 1270 observations) because precipitation is smoother and more densely observed over the central and eastern United States. For both of these applications, the annual totals are transformed to the log scale in order to make the Gaussian process assumption more reasonable. The datasets are plotted in Figures 1.1 and 1.2.

Figure 1.2: Annual precipitation data for 1997, in log mm.

Figure 1.3: Soil organic carbon stocks (Mg/ha, to 100 cm depth) for each Rapid Carbon Assessment (RaCA) site.

Soil organic carbon stocks from the continental United States

The data set used in Chapter 5 to illustrate our proposed methodology consists of measurements of soil organic carbon (SOC), available online from the Natural Resources Conservation Service (NRCS) at detail/soils/survey/?cid=nrcs142p (Wills et al., 2013). The data are provided as part of the Rapid Carbon Assessment (RaCA) project, a project initiated by the Soil Science Division (SSD) of NRCS to capture information on the carbon content of soils across the conterminous United States at a single point in time. RaCA specifically emphasizes soil organic carbon (SOC) stocks, or the amount of SOC in a volume (area and depth of soil). Additional variables such as land use/land cover (LULC) classes, soil series, soil moisture, and the SSD major land resource

area (MLRA) regions are available through the soilDB package in R (Beaudette and Skovlin, 2015). A summary report of the sampling methods and data description is provided in Wills et al. (2013). While measurements of SOC stocks are available for each site to depths of five, ten, twenty, thirty, fifty, and one hundred centimeters (in Mg C ha$^{-1}$), we will use the one hundred centimeter depth measurement, since our focus is on estimating total SOC, not a soil depth profile (e.g., Mishra et al., 2009; Minasny et al., 2006). A plot of the raw data for the entire conterminous United States is shown in Figure 1.3.

Chapter 2: Nonstationary spatial modeling

As discussed in Chapter 1, an important component of modeling a spatial Gaussian process $\{Y(s) : s \in G\}$ is the spatial covariance function $C$, used to model the second-order properties of dependence over space in the process. Covariance functions for $Y(\cdot)$ are traditionally chosen to belong to some parametric class of stationary or isotropic models, in which the dependence between the process at two locations is a function of only the separation vector or separation distance between the locations, respectively (see Sections 1.2 and 1.3). This modeling assumption is made mostly for convenience and is rarely appropriate in real-world applications; as a result, there is a rich literature on alternative methodologies for modeling the second-order nonstationarity present in most problems. Four primary approaches for introducing nonstationarity into a covariance function model are deformation methods, basis function expansions, Markov random field methods using stochastic partial differential equations (SPDEs), and kernel smoothing or process convolution methods. The process convolution approach will be given a particularly thorough summary in this chapter, as it will be the focus of this dissertation. In what follows, the spatial domain of interest $G$ will be of dimension $d$, i.e., $G \subseteq \mathbb{R}^d$; without loss of generality it will be assumed that the spatial process is mean-zero.

2.1 Deformation methods

One of the earliest methods for introducing nonstationarity into a spatial model is known as the deformation method, due to Sampson and Guttorp (1992). The fundamental idea of isotropic models is that the covariance between observation locations is a function of Euclidean distance and so, intuitively, the deformation method obtains a nonstationary covariance structure by rescaling interpoint distances in a systematic way over $G$. More formally, deformation involves transforming the geographic region of interest ($G$) to a different, deformed space (say, $D$) wherein isotropy holds. The transformation $\xi : G \to D$ is ideally one-to-one and, in general, a nonlinear mapping. Formally, the covariance of the spatial process $Y(\cdot)$ between two locations $s, s' \in G$ is given by

$$C(s, s') = g\left( \lVert \xi(s) - \xi(s') \rVert \right),$$

for an arbitrary isotropic covariance function $g$ which is valid on $\mathbb{R}^d$ for $d \geq 1$. In the original paper, Sampson and Guttorp (1992) use a two-step nonparametric approach to estimation: first, they use multidimensional scaling (see, e.g., Mardia et al., 1979) to generate a two-dimensional coordinate representation (in $D$) of the observation locations in $G$, with interpoint distances in $D$ representing sample spatial dispersions. Second, a thin-plate spline interpolation is used to fill in the mapping for all points in the geographic region of interest.

The original deformation model introduced by Sampson and Guttorp (1992) suffered from being unable to quantify the uncertainty introduced in estimating the mapping from $G$ to $D$, so several Bayesian alternatives were subsequently proposed. Two alternatives were suggested independently, due to Damian et al. (2001) and

Schmidt and O'Hagan (2003), differing primarily in their specification of a prior distribution on the mapping $\xi(\cdot)$. A major problem in the Sampson and Guttorp paper was that the estimated mapping was often not one-to-one and folded over itself; therefore, Damian et al. (2001) introduced a prior distribution for the transformed observation locations $(\xi(s_1), \ldots, \xi(s_n))$ which penalizes non-smooth maps, including ones that fold. The parameters for the prior distribution are fixed or calculated from the geographical coordinates. Alternatively, Schmidt and O'Hagan (2003) propose a Gaussian process prior for the mapping $\xi(\cdot)$, again fixing most of the prior parameters or calculating them based on the observation locations. Both of these Bayesian models require intricate MCMC algorithms for model fitting and, due to the high dimensionality of the parameter space, are quite difficult to fit.

All of the deformation methods mentioned thus far require replicates of the spatial data, which can, in general, be obtained from detrended spatially-referenced observations over time, although such replications may not always be available. Anderes and Stein (2008) address this limitation and introduce an approximate likelihood-based deformation approach for a single replicate of densely observed data. In their approach, the transformation $\xi(\cdot)$ is parameterized in terms of local affine transformations, with the parameters of the transformation characterized by an ellipse. The parameters of these ellipses are estimated at each observation location and then smoothed over the spatial region. The likelihood-based approach used here is desirable in that it imposes no requirement on the configuration of observation locations and gives estimates which are easy to obtain and efficient.
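The deformation construction is straightforward to prototype. Below is a minimal sketch (not the Sampson-Guttorp estimation procedure; the mapping $\xi$ and the exponential choice of $g$ are purely illustrative assumptions) showing how a fixed deformation induces a valid nonstationary covariance matrix:

```python
import numpy as np


def xi(locs):
    """Hypothetical smooth deformation xi: G -> D that stretches the
    first coordinate increasingly toward one side of the domain."""
    x, y = locs[:, 0], locs[:, 1]
    return np.column_stack([x + 0.3 * x ** 2, y])


def deformation_cov(locs, range_par=1.0, sill=1.0):
    """C(s, s') = g(||xi(s) - xi(s')||), with g an isotropic exponential
    covariance (valid on R^d for all d), evaluated in the deformed space."""
    d = xi(locs)
    diff = d[:, None, :] - d[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return sill * np.exp(-dist / range_par)


rng = np.random.default_rng(0)
locs = rng.uniform(0.0, 2.0, size=(50, 2))
C = deformation_cov(locs)
# Because g is a valid isotropic covariance applied to genuine Euclidean
# distances (in D), the resulting matrix is symmetric positive semi-definite.
```

Replacing the fixed $\xi$ with an estimated mapping (multidimensional scaling plus a thin-plate spline, as in Sampson and Guttorp) is what turns this construction into a data-driven model.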

2.2 Basis function expansions

Basis function expansions provide a constructive way to model the nonstationarity in a spatial process. The main idea of basis function expansions comes from the Karhunen-Loève decomposition of a (mean-zero) spatial process,

$$Y(s) = \sum_{l=1}^{\infty} \sqrt{\lambda_l} \, W_l \, E_l(s), \qquad (2.1)$$

where the $W_l$ are uncorrelated, standardized random variables, the $\lambda_l$ are eigenvalues, and the $E_l(\cdot)$ are orthogonal eigenfunctions. Choosing the $W_l$ to be Gaussian specifies $Y(\cdot)$ to be a GP (e.g., Nychka et al., 2002). The $E_l(\cdot)$ being orthogonal eigenfunctions requires

$$\int_G E_j(s) E_k(s) \, ds = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{otherwise}, \end{cases}$$

for all $j, k$. The covariance function of the process (2.1) is

$$C(s, s') = \sum_{l=1}^{\infty} \lambda_l E_l(s) E_l(s'), \qquad (2.2)$$

where the $\lambda_l$ and $E_l(\cdot)$ come from the Fredholm integral equation

$$\int_G C(s, s') E_l(s) \, ds = \lambda_l E_l(s').$$

If the infinite series (2.1) and (2.2) are truncated to the leading $L$ terms, the finite sum approximation

$$\hat{C}(s, s') = \sum_{l=1}^{L} \lambda_l E_l(s) E_l(s') \qquad (2.3)$$

is used instead. It can be shown that this truncation minimizes the variance of the truncation error over all sets of $L$ basis functions when the $E_l(\cdot)$ are the exact solutions to the Fredholm equation (Wikle, 2010) and, as a low-rank representation of the process $Y(\cdot)$, can facilitate computation. Estimating the covariance function in this way clearly results in a nonstationary covariance structure (i.e., $\hat{C}(s, s') \neq \hat{C}(s - s')$).
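Numerically, the truncated expansion (2.3) is just an eigendecomposition kept to its leading terms. The sketch below (a simulated nonstationary covariance matrix with spatially-varying marginal variance; all specifics are illustrative assumptions) shows how a modest number of terms yields a close low-rank approximation:

```python
import numpy as np

# Hypothetical nonstationary covariance on a one-dimensional grid:
# exponential correlation scaled by a spatially-varying standard deviation.
s = np.linspace(0.0, 1.0, 100)
sd = 1.0 + s  # marginal standard deviation grows across the domain
Sigma = np.outer(sd, sd) * np.exp(-np.abs(s[:, None] - s[None, :]) / 0.2)

# Spectral decomposition Sigma = E diag(lam) E^T, eigenvalues descending.
lam, E = np.linalg.eigh(Sigma)
lam, E = lam[::-1], E[:, ::-1]

# Finite-sum approximation (2.3) using the leading L terms.
L = 10
Sigma_hat = (E[:, :L] * lam[:L]) @ E[:, :L].T

# Fraction of total variance captured and relative Frobenius error.
var_frac = lam[:L].sum() / lam.sum()
rel_err = np.linalg.norm(Sigma - Sigma_hat) / np.linalg.norm(Sigma)
```

Here the leading ten of one hundred eigenpairs already capture the bulk of the variance, which is exactly why the truncation facilitates computation as a low-rank representation.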

The main task with basis function expansions is clearly to model the eigenvalue/eigenfunction pairs $\{\lambda_l, E_l(\cdot)\}$. In practice, if an empirical covariance matrix $\hat{\Sigma}$ can be calculated, the pairs $\{\lambda_l, E_l(\cdot)\}$ can be approximated by the sample quantities obtained from the spectral decomposition $\hat{\Sigma} = \hat{E} \hat{D} \hat{E}^\top$. Here, $\hat{D} = \text{diag}(\hat{\lambda}_1, \ldots, \hat{\lambda}_n)$, and the columns of $\hat{E}$ are the estimated eigenvectors $\hat{E}_l(\cdot)$, which are called empirical orthogonal functions (EOFs). This is the approach taken by Holland et al. (1998), who use a slight variation of (2.3) and propose a new covariance function as the sum of a stationary covariance function (including a nugget) and a nonstationary component of the form of (2.3). The EOFs are calculated from a detrended empirical covariance matrix which removes the effect of the stationary component; only a single replicate of data was needed for the empirical estimate. Holland et al. (1998) compare this nonstationary covariance function to both a standard isotropic model as well as an exponential covariance function with spatially-varying marginal variances; the nonstationary model greatly reduced the mean square prediction error.

Alternatively, Nychka et al. (2002) use non-orthogonal multiresolution wavelet basis functions in place of the eigenfunction bases in (2.3), which relaxes the condition that the random variables $\{W_l\}$ are uncorrelated. Multiresolution bases, which have differing ranges of dependence over space, are useful for modeling nonstationary processes because the stochastic properties of the process can be controlled locally while still giving a globally nonstationary covariance function. Computational feasibility is obtained by restricting these basis functions to be translations and scalings of a few fixed functions, and the authors demonstrate the flexibility of the multiresolution model with simulations in which the wavelets approximate standard covariance models very well. The original method in this paper requires observations on a grid;

Matsuo et al. (2011) extend this approach to irregularly spaced observations by mapping to a regular domain and introduce an EM algorithm to estimate the covariance parameters.

2.3 Gaussian Markov random field methods

While not explicitly a Gaussian process model, the SPDE approach of Lindgren et al. (2011) introduces a model for a Gaussian Markov random field (GMRF) which approximates a particular Gaussian process model. GMRFs are popular for areal data, where it is easy to establish the necessary neighborhood structure, and the Markov properties of the model allow major computational gains due to working with the precision matrix instead of the covariance matrix. Unfortunately, GMRFs are poorly suited for spatial models for point-referenced data, since it is difficult to construct a GMRF with a specific spatial correlation structure; on the other hand, such a constructive formulation is natural for GPs. Lindgren et al. (2011) overcome this problem by providing an explicit strategy for constructing a GMRF which corresponds to a nonstationary GP with a known Matérn covariance function; the link between GMRFs and GPs is given by finding a finite basis function representation which is the solution to a particular SPDE. Nonstationarity is accomplished in this work by allowing the Matérn covariance function parameters (range and marginal variance) to vary over space; Lindgren et al. (2011) suggest using a low-dimensional representation in which these parameters vary smoothly over space according to a log-linear function.
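The log-linear idea can be sketched directly. The example below (an illustrative assumption, not the SPDE/GMRF machinery itself; all coefficients are hypothetical) lets the marginal standard deviation vary over space through a log-linear function of the coordinates and pairs it with a stationary exponential correlation, i.e., a Matérn correlation with smoothness 1/2:

```python
import numpy as np


def log_linear_sd(locs, beta0=0.0, beta=(0.5, -0.2)):
    """Spatially-varying marginal SD: log sigma(s) = beta0 + beta' s.
    The coefficients are hypothetical placeholders."""
    return np.exp(beta0 + locs @ np.asarray(beta))


def nonstationary_cov(locs, range_par=0.5):
    """C(s, s') = sigma(s) sigma(s') rho(||s - s'||): rescaling a valid
    stationary correlation by sigma(s) sigma(s') preserves validity."""
    sigma = log_linear_sd(locs)
    diff = locs[:, None, :] - locs[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return np.outer(sigma, sigma) * np.exp(-dist / range_par)


rng = np.random.default_rng(1)
locs = rng.uniform(0.0, 1.0, size=(40, 2))
C = nonstationary_cov(locs)
np.linalg.cholesky(C + 1e-10 * np.eye(len(locs)))  # confirms positive definiteness
```

Letting the range parameter vary over space is more delicate, since naively plugging a local range into a stationary correlation does not in general yield a valid covariance; this is one motivation for constructions such as the SPDE link discussed above.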


More information

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA Intro: Course Outline and Brief Intro to Marina Vannucci Rice University, USA PASI-CIMAT 04/28-30/2010 Marina Vannucci

More information

STA 4273H: Sta-s-cal Machine Learning

STA 4273H: Sta-s-cal Machine Learning STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our

More information

Recurrent Latent Variable Networks for Session-Based Recommendation

Recurrent Latent Variable Networks for Session-Based Recommendation Recurrent Latent Variable Networks for Session-Based Recommendation Panayiotis Christodoulou Cyprus University of Technology paa.christodoulou@edu.cut.ac.cy 27/8/2017 Panayiotis Christodoulou (C.U.T.)

More information

Summary STK 4150/9150

Summary STK 4150/9150 STK4150 - Intro 1 Summary STK 4150/9150 Odd Kolbjørnsen May 22 2017 Scope You are expected to know and be able to use basic concepts introduced in the book. You knowledge is expected to be larger than

More information

Spatial Dynamic Factor Analysis

Spatial Dynamic Factor Analysis Spatial Dynamic Factor Analysis Esther Salazar Federal University of Rio de Janeiro Department of Statistical Methods Sixth Workshop on BAYESIAN INFERENCE IN STOCHASTIC PROCESSES Bressanone/Brixen, Italy

More information

Extreme Value Analysis and Spatial Extremes

Extreme Value Analysis and Spatial Extremes Extreme Value Analysis and Department of Statistics Purdue University 11/07/2013 Outline Motivation 1 Motivation 2 Extreme Value Theorem and 3 Bayesian Hierarchical Models Copula Models Max-stable Models

More information

Karhunen-Loeve Expansion and Optimal Low-Rank Model for Spatial Processes

Karhunen-Loeve Expansion and Optimal Low-Rank Model for Spatial Processes TTU, October 26, 2012 p. 1/3 Karhunen-Loeve Expansion and Optimal Low-Rank Model for Spatial Processes Hao Zhang Department of Statistics Department of Forestry and Natural Resources Purdue University

More information

Hierarchical Modelling for Univariate and Multivariate Spatial Data

Hierarchical Modelling for Univariate and Multivariate Spatial Data Hierarchical Modelling for Univariate and Multivariate Spatial Data p. 1/4 Hierarchical Modelling for Univariate and Multivariate Spatial Data Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

CS281 Section 4: Factor Analysis and PCA

CS281 Section 4: Factor Analysis and PCA CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we

More information

Master of Science in Statistics A Proposal

Master of Science in Statistics A Proposal 1 Master of Science in Statistics A Proposal Rationale of the Program In order to cope up with the emerging complexity on the solutions of realistic problems involving several phenomena of nature it is

More information

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3 University of California, Irvine 2017-2018 1 Statistics (STATS) Courses STATS 5. Seminar in Data Science. 1 Unit. An introduction to the field of Data Science; intended for entering freshman and transfers.

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

Gaussian Process Regression Model in Spatial Logistic Regression

Gaussian Process Regression Model in Spatial Logistic Regression Journal of Physics: Conference Series PAPER OPEN ACCESS Gaussian Process Regression Model in Spatial Logistic Regression To cite this article: A Sofro and A Oktaviarina 018 J. Phys.: Conf. Ser. 947 01005

More information

Disease mapping with Gaussian processes

Disease mapping with Gaussian processes EUROHEIS2 Kuopio, Finland 17-18 August 2010 Aki Vehtari (former Helsinki University of Technology) Department of Biomedical Engineering and Computational Science (BECS) Acknowledgments Researchers - Jarno

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

Dynamic System Identification using HDMR-Bayesian Technique

Dynamic System Identification using HDMR-Bayesian Technique Dynamic System Identification using HDMR-Bayesian Technique *Shereena O A 1) and Dr. B N Rao 2) 1), 2) Department of Civil Engineering, IIT Madras, Chennai 600036, Tamil Nadu, India 1) ce14d020@smail.iitm.ac.in

More information

Spatial Statistics with Image Analysis. Outline. A Statistical Approach. Johan Lindström 1. Lund October 6, 2016

Spatial Statistics with Image Analysis. Outline. A Statistical Approach. Johan Lindström 1. Lund October 6, 2016 Spatial Statistics Spatial Examples More Spatial Statistics with Image Analysis Johan Lindström 1 1 Mathematical Statistics Centre for Mathematical Sciences Lund University Lund October 6, 2016 Johan Lindström

More information

Multi-resolution models for large data sets

Multi-resolution models for large data sets Multi-resolution models for large data sets Douglas Nychka, National Center for Atmospheric Research National Science Foundation Iowa State March, 2013 Credits Steve Sain, Tamra Greasby, NCAR Tia LeRud,

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

The performance of estimation methods for generalized linear mixed models

The performance of estimation methods for generalized linear mixed models University of Wollongong Research Online University of Wollongong Thesis Collection 1954-2016 University of Wollongong Thesis Collections 2008 The performance of estimation methods for generalized linear

More information

Gaussian predictive process models for large spatial data sets.

Gaussian predictive process models for large spatial data sets. Gaussian predictive process models for large spatial data sets. Sudipto Banerjee, Alan E. Gelfand, Andrew O. Finley, and Huiyan Sang Presenters: Halley Brantley and Chris Krut September 28, 2015 Overview

More information

CBMS Lecture 1. Alan E. Gelfand Duke University

CBMS Lecture 1. Alan E. Gelfand Duke University CBMS Lecture 1 Alan E. Gelfand Duke University Introduction to spatial data and models Researchers in diverse areas such as climatology, ecology, environmental exposure, public health, and real estate

More information

Fusing point and areal level space-time data. data with application to wet deposition

Fusing point and areal level space-time data. data with application to wet deposition Fusing point and areal level space-time data with application to wet deposition Alan Gelfand Duke University Joint work with Sujit Sahu and David Holland Chemical Deposition Combustion of fossil fuel produces

More information

Models for models. Douglas Nychka Geophysical Statistics Project National Center for Atmospheric Research

Models for models. Douglas Nychka Geophysical Statistics Project National Center for Atmospheric Research Models for models Douglas Nychka Geophysical Statistics Project National Center for Atmospheric Research Outline Statistical models and tools Spatial fields (Wavelets) Climate regimes (Regression and clustering)

More information

Modeling and Interpolation of Non-Gaussian Spatial Data: A Comparative Study

Modeling and Interpolation of Non-Gaussian Spatial Data: A Comparative Study Modeling and Interpolation of Non-Gaussian Spatial Data: A Comparative Study Gunter Spöck, Hannes Kazianka, Jürgen Pilz Department of Statistics, University of Klagenfurt, Austria hannes.kazianka@uni-klu.ac.at

More information

Statistical Models for Monitoring and Regulating Ground-level Ozone. Abstract

Statistical Models for Monitoring and Regulating Ground-level Ozone. Abstract Statistical Models for Monitoring and Regulating Ground-level Ozone Eric Gilleland 1 and Douglas Nychka 2 Abstract The application of statistical techniques to environmental problems often involves a tradeoff

More information

Introduction. Spatial Processes & Spatial Patterns

Introduction. Spatial Processes & Spatial Patterns Introduction Spatial data: set of geo-referenced attribute measurements: each measurement is associated with a location (point) or an entity (area/region/object) in geographical (or other) space; the domain

More information

Nearest Neighbor Gaussian Processes for Large Spatial Data

Nearest Neighbor Gaussian Processes for Large Spatial Data Nearest Neighbor Gaussian Processes for Large Spatial Data Abhi Datta 1, Sudipto Banerjee 2 and Andrew O. Finley 3 July 31, 2017 1 Department of Biostatistics, Bloomberg School of Public Health, Johns

More information

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations

More information

Spatial statistics, addition to Part I. Parameter estimation and kriging for Gaussian random fields

Spatial statistics, addition to Part I. Parameter estimation and kriging for Gaussian random fields Spatial statistics, addition to Part I. Parameter estimation and kriging for Gaussian random fields 1 Introduction Jo Eidsvik Department of Mathematical Sciences, NTNU, Norway. (joeid@math.ntnu.no) February

More information

Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets

Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets Abhirup Datta 1 Sudipto Banerjee 1 Andrew O. Finley 2 Alan E. Gelfand 3 1 University of Minnesota, Minneapolis,

More information

STATISTICS-STAT (STAT)

STATISTICS-STAT (STAT) Statistics-STAT (STAT) 1 STATISTICS-STAT (STAT) Courses STAT 158 Introduction to R Programming Credit: 1 (1-0-0) Programming using the R Project for the Statistical Computing. Data objects, for loops,

More information

MCMC Sampling for Bayesian Inference using L1-type Priors

MCMC Sampling for Bayesian Inference using L1-type Priors MÜNSTER MCMC Sampling for Bayesian Inference using L1-type Priors (what I do whenever the ill-posedness of EEG/MEG is just not frustrating enough!) AG Imaging Seminar Felix Lucka 26.06.2012 , MÜNSTER Sampling

More information

Non-Parametric Bayes

Non-Parametric Bayes Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian

More information

The Bayesian approach to inverse problems

The Bayesian approach to inverse problems The Bayesian approach to inverse problems Youssef Marzouk Department of Aeronautics and Astronautics Center for Computational Engineering Massachusetts Institute of Technology ymarz@mit.edu, http://uqgroup.mit.edu

More information

Comparing Non-informative Priors for Estimation and. Prediction in Spatial Models

Comparing Non-informative Priors for Estimation and. Prediction in Spatial Models Comparing Non-informative Priors for Estimation and Prediction in Spatial Models Vigre Semester Report by: Regina Wu Advisor: Cari Kaufman January 31, 2010 1 Introduction Gaussian random fields with specified

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

Empirical Bayesian Kriging

Empirical Bayesian Kriging Empirical Bayesian Kriging Implemented in ArcGIS Geostatistical Analyst By Konstantin Krivoruchko, Senior Research Associate, Software Development Team, Esri Obtaining reliable environmental measurements

More information

Spatial bias modeling with application to assessing remotely-sensed aerosol as a proxy for particulate matter

Spatial bias modeling with application to assessing remotely-sensed aerosol as a proxy for particulate matter Spatial bias modeling with application to assessing remotely-sensed aerosol as a proxy for particulate matter Chris Paciorek Department of Biostatistics Harvard School of Public Health application joint

More information

Geostatistical Modeling for Large Data Sets: Low-rank methods

Geostatistical Modeling for Large Data Sets: Low-rank methods Geostatistical Modeling for Large Data Sets: Low-rank methods Whitney Huang, Kelly-Ann Dixon Hamil, and Zizhuang Wu Department of Statistics Purdue University February 22, 2016 Outline Motivation Low-rank

More information

Multivariate Gaussian Random Fields with SPDEs

Multivariate Gaussian Random Fields with SPDEs Multivariate Gaussian Random Fields with SPDEs Xiangping Hu Daniel Simpson, Finn Lindgren and Håvard Rue Department of Mathematics, University of Oslo PASI, 214 Outline The Matérn covariance function and

More information

Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling

Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling Jon Wakefield Departments of Statistics and Biostatistics University of Washington 1 / 37 Lecture Content Motivation

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin

More information

CONTENTS. Preface List of Symbols and Notation

CONTENTS. Preface List of Symbols and Notation CONTENTS Preface List of Symbols and Notation xi xv 1 Introduction and Review 1 1.1 Deterministic and Stochastic Models 1 1.2 What is a Stochastic Process? 5 1.3 Monte Carlo Simulation 10 1.4 Conditional

More information

The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for Bayesian Estimation in a Finite Gaussian Mixture Model

The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for Bayesian Estimation in a Finite Gaussian Mixture Model Thai Journal of Mathematics : 45 58 Special Issue: Annual Meeting in Mathematics 207 http://thaijmath.in.cmu.ac.th ISSN 686-0209 The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for

More information