Bayesian hierarchical models for spatially misaligned data in R


Methods in Ecology and Evolution 24, 5, doi:./24-2x.29

APPLICATION

Andrew O. Finley 1*, Sudipto Banerjee 2 and Bruce D. Cook 3

1 Department of Forestry, Michigan State University, 26 Natural Resources Building, East Lansing, MI, USA; 2 Division of Biostatistics, School of Public Health, University of Minnesota, A46 Mayo Building, MMC 33, 42 Delaware Street S.E., Minneapolis, MN 55455, USA; and 3 Biospheric Sciences Laboratory, National Aeronautics and Space Administration, Goddard Space Flight Center, Code 6, Greenbelt, MD 277, USA

Summary

1. Spatial misalignment occurs when at least one of multiple outcome variables is missing at an observed location. For spatial data, prediction of these missing observations should be informed by within-location association among outcomes and by proximate locations where measurements were recorded.

2. This study details and illustrates a Bayesian regression framework for modelling spatially misaligned multivariate data. Particular attention is paid to developing valid probability models capable of estimating parameter posterior distributions and propagating uncertainty through to the outcomes' predictive distributions at locations where some or all of the outcomes are not observed.

3. Models and associated software are presented for both Gaussian and non-Gaussian outcomes. Model parameter and predictive inference within the proposed framework is illustrated using a synthetic data set and a forest inventory data set.

4. The proposed Markov chain Monte Carlo samplers were written in C++ and leverage R's Foreign Language Interface to call FORTRAN BLAS (Basic Linear Algebra Subprograms) and LAPACK (Linear Algebra Package) libraries for efficient matrix computations.
The models are implemented in the spMisalignLM and spMisalignGLM functions within the spBayes R package available via the Comprehensive R Archive Network (CRAN).

Key-words: multivariate, misalignment, missingness, Gaussian spatial process, linear model of coregionalization, Markov chain Monte Carlo

*Correspondence author. finleya@msu.edu

Introduction

Investment in long-term monitoring networks and advancement in sensor technologies are creating data-rich environments that provide extraordinary opportunities to understand the complexity of large and spatially indexed ecological data. Building such understanding often requires the analysis of spatially indexed data sets with multiple variables measured at each location. In such settings, it is commonly posited that there is association between the measurements at a given location as well as association among measurements across locations. In ecological analysis, we often seek inference about the association among these multiple variables or wish to predict their values at new locations. For example, consider the analysis of (i) species co-occurrence where species presence/absence or abundance is recorded at each location, for example Ovaskainen, Hottola & Siitonen (2); (ii) soil nutrient impact on local tree growth and competition where soil nutrient measurements coincide with tree inventory locations, for example Baribault, Kobe & Finley (22); or (iii) the relationship between multiple environmental stressors and measures of focal species fitness, for example Swope & Parker (22). In each case, development of a statistical model typically requires the full set of outcomes, for example species presence/absence, and covariates, for example soil nutrients or environmental stressors, at a set of locations. In such multivariate settings, it is common for different subsets of the outcome variables, or covariates, to be available at different locations.
In the statistical literature, this situation is referred to as spatial misalignment. Following the examples above, say observers record the presence/absence of different subsets of species at different locations, or, for a subset of locations, only some of the soil nutrients or plant stressors were measured, perhaps because of different sampling protocols or because the data were drawn from different databases. In such cases, it is necessary to impute or predict the value of the missing observations. Note that if there is misalignment among the covariates, then we might view them as outcomes in a model used to predict their missing observations. Regardless of where the misalignment occurs, these predictions should be informed by the within-location association among variables and by proximate locations where measurements were recorded.

24 The Authors. Methods in Ecology and Evolution 24 British Ecological Society

Further, it is common to seek prediction for the entire set of outcomes at new locations where no measurements were recorded. In both cases, an assessment of prediction uncertainty is also often desired.

Here, we consider point-point misalignment to distinguish it from point-areal misalignment, which refers to the situation where some variables may be referenced by their points, while others may have been aggregated over spatial regions. Although the term point-areal misalignment is used in the literature, following Gotway & Young (22), we prefer to classify this as a change-of-support problem. See also Mugglin, Carlin & Gelfand (2), Gelfand, Zhu & Carlin (2), Zhu, Carlin & Gelfand (23) and the references therein for methods for change-of-support problems.

A salient feature of our data is that every location generates, at most, only one replicate of the multiple outcomes. For empirical estimation of the association among these outcomes using sampling-based multivariate analysis methods, one must consider the observations at different spatial locations as independent replicates. This will, however, preclude estimation of the spatial associations. Under this setting, inference on associations should deploy fully model-based approaches using the flexibility of spatial stochastic processes.

Existing model-based methods for handling spatial point-point misalignment primarily aim to align disparate variables by accounting for additional uncertainty when kriging or other smoothing methods are used to align the spatially referenced data (Madsen, Ruppert & Altman 2; Buonaccorsi 29; Gryparis et al. 29; Paciorek et al. 29; Szpiro et al. 2; Lopiano, Young & Gotway 2, 23). These approaches build conditional regression-like models where the marginal distribution of the first outcome is specified, followed by the conditional distribution of the second outcome given the first and so on.
This approach is easily interpretable and ensures the validity of the resulting joint distributions of the process realizations. However, the approach is more suitable when the number of outcomes is small and there is a natural ordering that would suggest the sequence for constructing the conditional distributions. Settings such as ours lack such information on ordering, so joint modelling of the outcomes is preferable to avoid the explosion in models emerging from alternate ordering schemes. Joint models attempt to directly construct cross-covariance functions that describe the covariances between different outcomes at two, possibly different, locations. A model-based Bayesian approach for point-point misalignment was presented in Banerjee & Gelfand (22). More recently, joint modelling of point patterns and misaligned covariates was considered in Illian, Sorbye & Rue (22), while Ren & Banerjee (23) considered modelling spatial misalignment using a class of spatial latent factor models.

While the problem of spatial misalignment is ubiquitous, software to implement model-based analysis of such data is absent. Our current work focuses upon point-point misalignment and extends and integrates some of the aforementioned methodological work into a Bayesian hierarchical modelling framework. In addition, we demonstrate how this is implemented in our spBayes package for the R statistical programming language and environment.

Multivariate spatial regression with misalignment

Let $S_1, S_2, \ldots, S_m$ denote sets comprising $n_1, n_2, \ldots, n_m$ locations where $m$ outcomes have been observed. We collect all observations for the first outcome into an $n_1 \times 1$ column vector $y_1$, those for the second outcome into an $n_2 \times 1$ column vector $y_2$, and so on until we collect observations corresponding to the $m$-th outcome into an $n_m \times 1$ column vector $y_m$. Each of these is stacked into an $N \times 1$ column vector $y$, where $N = \sum_{i=1}^{m} n_i$.
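The stacking of misaligned observations described above can be illustrated with a small sketch. This is not spBayes code: the toy coordinates and values, and the helper name `stack_outcomes`, are hypothetical and serve only to show how outcome-specific location sets of different sizes combine into one vector of length N.

```python
# Hypothetical toy data: each outcome i is observed on its own location set S_i.
# Keys are (x, y) coordinates, values are the observed outcome at that location.
obs = {
    1: {(0.1, 0.2): 3.4, (0.5, 0.5): 2.9, (0.9, 0.1): 3.1},                    # n_1 = 3
    2: {(0.5, 0.5): 10.2, (0.7, 0.8): 11.0},                                   # n_2 = 2
    3: {(0.1, 0.2): -0.4, (0.7, 0.8): 0.3, (0.3, 0.9): 0.1, (0.6, 0.1): 0.0},  # n_3 = 4
}


def stack_outcomes(obs):
    """Stack y_1, ..., y_m into one vector y.

    Returns the stacked values, an index of (outcome, location) pairs in the
    same order, and the list of per-outcome sample sizes n_i.
    """
    y, index, n = [], [], []
    for i in sorted(obs):
        locations = sorted(obs[i])          # fixed ordering within each outcome
        n.append(len(locations))
        for s in locations:
            y.append(obs[i][s])
            index.append((i, s))
    return y, index, n


y, index, n = stack_outcomes(obs)
N = sum(n)  # N = n_1 + n_2 + ... + n_m
```

Keeping the `(outcome, location)` index alongside `y` is one way to remember which outcome is missing where, which is the information needed later to separate interpolation from prediction.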
The covariates corresponding to the $i$-th outcome $y_i$ are collected into an $n_i \times p_i$ matrix $X_i$, and we let $\beta_i$ denote the $p_i \times 1$ regression slope vector associated with $X_i$. The other key ingredient in the multivariate spatial regression model is the vector of unobserved spatial random effects. For any location $s$, indexed by some coordinate frame, we have a spatial random effect $w_i(s)$ associated with the $i$-th outcome $y_i(s)$ for $i = 1, 2, \ldots, m$. We collect the random effects corresponding to the $i$-th outcome into an $n_i \times 1$ vector $w_i$ so that it corresponds to $y_i$. The multivariate spatial linear regression model is given by

$$y_i = X_i \beta_i + w_i + \varepsilon_i, \quad i = 1, 2, \ldots, m, \qquad \text{(eqn 1)}$$

where $\varepsilon_i$ is an $n_i \times 1$ column vector of zero-centred residual random errors corresponding to the $i$-th outcome, such that the covariance between an element in $\varepsilon_i$ and an element in $\varepsilon_j$ is zero whenever $i$ and $j$ correspond to different outcomes. Two elements within $\varepsilon_i$ represent random errors associated with the $i$-th outcome measured at two different locations. The covariances between any two such elements and the variances of each element in $\varepsilon_i$ are placed as off-diagonal and diagonal entries in an $n_i \times n_i$ matrix $\Psi_i$, which is the variance-covariance matrix of $\varepsilon_i$. Each of the $\varepsilon_i$'s is assumed to be normally distributed, independent of the others, with mean zero and variance-covariance matrix $\Psi_i$.

Model (1) can be extended to accommodate non-Gaussian outcomes such as (i) binary data modelled using logit or probit regression, and (ii) count data modelled using Poisson regression. Diggle, Tawn & Moyeed (99) unify the use of generalized linear models in spatial data contexts. See also Lin et al. (2), Kamman & Wand (23), and Banerjee, Carlin & Gelfand (24). Essentially, we replace model (1) with the assumption that $E[y_i(s)]$ is linear on a transformed scale, that is, $g(E[y_i(s)]) = x_i(s)^\top \beta_i + w_i(s)$, where $g(\cdot)$ is a suitable link function and $x_i(s)$ is the $p_i \times 1$ vector that includes outcome- and location-specific covariates.
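To make the role of the link function concrete, the following minimal sketch evaluates the expected outcome at a single location from the linear predictor, under the identity, logit, probit and log links mentioned above. The function names and the example numbers are hypothetical; this is an illustration of the formula, not the interface of spMisalignGLM.

```python
import math


def inverse_logit(eta):
    """Numerically stable inverse of the logit link."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def expected_outcome(x, beta, w, link="identity"):
    """Return E[y_i(s)] given covariates x_i(s), slopes beta_i and effect w_i(s).

    The linear predictor is eta = x_i(s)' beta_i + w_i(s); the mean is g^{-1}(eta).
    """
    eta = sum(xj * bj for xj, bj in zip(x, beta)) + w
    if link == "identity":      # Gaussian outcome, model (1)
        return eta
    if link == "logit":         # binary outcome, logistic regression
        return inverse_logit(eta)
    if link == "probit":        # binary outcome, standard normal CDF
        return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))
    if link == "log":           # count outcome, Poisson regression
        return math.exp(eta)
    raise ValueError("unknown link: " + link)


# Hypothetical example: intercept plus one covariate, zero spatial effect.
p_presence = expected_outcome([1.0, 2.0], [0.5, -0.25], 0.0, link="logit")
```

The spatial random effect enters on the transformed scale, so the same value of `w` shifts a probability or a count rate by a different amount depending on where the linear predictor sits.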
Spatial association is captured by the spatial effects, that is, the $w_i$'s in (1). Any two entries in $w_i$ correspond to the spatial random effects for outcome $i$ from two different locations. These are assumed to be associated or correlated based upon a function of the separation or distance between the two locations. The essence of multivariate spatial modelling is to prescribe these covariances in such a way that the joint distribution of the $w_i$'s, for $i = 1, 2, \ldots, m$, in (1) is a multivariate normal distribution. The key modelling ingredient here is a multivariate spatial process, see, for example, Chiles & Delfiner (999), Cressie & Wikle (2), and Banerjee, Carlin & Gelfand (24). In our context, the multivariate spatial process is an infinite collection of $m \times 1$ vectors $w(s)$ indexed by spatial coordinates $s$ residing in two- or three-dimensional Euclidean space. The spatial random effects arise as a finite subset of this set indexed by the locations where the outcomes have been observed. A spatial process is well-defined whenever any finite collection of random effects has a legitimate probability distribution. When these distributions always belong to a multivariate normal family, we say that the spatial process is a Gaussian process.

In (1), each $w_i$ is an $n_i \times 1$ vector of spatial random effects collected over the locations where outcome $i$ has been observed. The covariance among the outcomes' spatial random effects is what allows observed outcomes to inform missing observations. The details on constructing and estimating the covariance among spatial random effects are given in Appendix S. In brief, say we wish to model the covariance between spatial random effects corresponding to two different outcomes at two different locations. That is, for outcomes $i$ and $j$, and locations $s_k$ and $s_l$, we must specify $\mathrm{cov}\{w_i(s_k), w_j(s_l)\}$ in a manner that will ensure a legitimate probability distribution for the joint distribution of $\{w_i : i = 1, 2, \ldots, m\}$. This covariance is specified using a spatial cross-covariance function that is constructed using outcome-specific spatial correlation functions, which include parameters to control the random effects' spatial dependence, for example the rate of spatial decay. Given parameter estimates, the cross-covariance functions provide inference about how outcomes covary in space, after accounting for covariates, and inform prediction.

We adopt the Bayesian paradigm for inference, see, for example, Gelman et al. (24), and build hierarchical models by modelling the parameters using probability distributions.
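One common way to build a valid cross-covariance function of the kind described above is the linear model of coregionalization listed among the key-words, in which $w(s) = A v(s)$ for independent unit-variance processes $v_k(s)$. The sketch below assumes that construction with a lower-triangular $A$ and exponential correlation functions; the exact parameterization used by the authors is in their Appendix S and may differ. All numerical values and function names are hypothetical.

```python
import math


def exp_corr(d, phi):
    """Exponential spatial correlation with decay parameter phi (an assumed choice)."""
    return math.exp(-phi * d)


def cross_cov(i, j, s, t, A, phi):
    """cov{w_i(s), w_j(t)} = sum_k A[i][k] * A[j][k] * rho(|s - t|; phi[k])."""
    d = math.hypot(s[0] - t[0], s[1] - t[1])
    return sum(A[i][k] * A[j][k] * exp_corr(d, phi[k]) for k in range(len(phi)))


def cov_matrix(index, A, phi):
    """Covariance of the stacked random effects for (outcome, location) pairs."""
    return [[cross_cov(i, j, s, t, A, phi) for (j, t) in index] for (i, s) in index]


def cross_corr(i, j, A):
    """Cross-correlation between outcomes i and j at a common location."""
    def k(a, b):
        return sum(x * y for x, y in zip(A[a], A[b]))   # entry of K = A A'
    return k(i, j) / math.sqrt(k(i, i) * k(j, j))


# Hypothetical values for m = 3 outcomes (0-based outcome labels).
A = [[1.0, 0.0, 0.0],
     [-0.8, 0.6, 0.0],
     [0.9, 0.1, 0.4]]
phi = [6.0, 8.0, 10.0]

# Misaligned design: outcome 0 at two sites, outcomes 1 and 2 at one site each.
index = [(0, (0.1, 0.2)), (0, (0.5, 0.5)), (1, (0.5, 0.5)), (2, (0.7, 0.8))]
C = cov_matrix(index, A, phi)
```

Because each term is a valid covariance scaled by a rank-one matrix, the sum is valid for any real $A$, which is why this construction sidesteps the ordering problem of conditional specifications. The matrix `C` only contains rows for outcomes actually observed at each site.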
Inference about the regression slopes, the spatial random effects, and the variances and covariances is based on Markov chain Monte Carlo (MCMC) sampling from posterior distributions.

As noted in the Introduction, a primary aim of our analysis is interpolation and prediction. Following terminology used in Banerjee & Gelfand (22), when we estimate the value of an outcome at a location where some of the other outcomes have been observed, we call it interpolation. When we seek to estimate the value of an outcome at a new location, where none of the outcomes have been observed, we call it prediction. In sampling-based Bayesian inference, we draw samples from the posterior predictive distributions of the outcome variable at unobserved locations given the observed data. The posterior predictive distribution is in fact the posterior distribution of $y_i(s_0)$ given $y$, where $s_0$ is the location at which we want to interpolate or predict. Additional details can be found in Appendix S.

Software implementation

The models described in the preceding section are available through the spMisalignLM and spMisalignGLM functions of the spBayes (version.3-) R package, for Gaussian and non-Gaussian outcomes, respectively. These functions are written in C++ and leverage R's Foreign Language Interface to call FORTRAN BLAS (Basic Linear Algebra Subprograms, see Blackford et al. 22) and LAPACK (Linear Algebra Package, see Anderson et al. 999) libraries for efficient matrix computations. A heavy reliance on BLAS and LAPACK functions allows the software to leverage multiprocessor/core machines via threaded implementations of BLAS and LAPACK, for example Intel's Math Kernel Library (MKL; en-us/intel-mkl). Use of MKL, or similar threaded libraries, can dramatically reduce sampler run-times. For example, the illustrative analyses offered in subsequent sections were conducted using R, and hence spBayes, compiled with MKL on an Intel Ivy Bridge i7 quad-core processor with hyperthreading.
The use of these parallel matrix operations results in a near-linear speedup in the MCMC sampler's run-time with the number of CPUs. In addition to Appendix S, Finley, Banerjee & Gelfand (23) provide specifics on efficient implementation of the multivariate Gaussian process parameter estimation.

Illustrative analyses

SYNTHETIC DATA

We consider a synthetic data set comprising three outcome variables observed over unique and common locations within a unit square domain. The analysis of these data demonstrates how the strength of correlation between the outcomes' spatial random effects and the range of spatial dependence influence the accuracy and precision of prediction and interpolation. The R code to reproduce this and subsequent analyses is available in Finley, Banerjee & Cook (24).

Following model (1) and using the true parameter values given in the first column of Table 1, we generated outcomes at all locations in Fig. 1(a). These outcomes are shown in Fig. 1(b-d). Outcome observations were then subsampled to create misalignment following the design in Fig. 1(a). Here, each circle contains those locations where the outcome identified by the circle's number is observed. Regions where the circles overlap identify those locations where multiple outcomes were observed. The true spatial cross-covariances used to generate the data can be converted to cross-correlations to facilitate interpretation. These correlations are provided in Table 1 and also displayed in their respective regions of overlap in Fig. 1(a).

Table 1. Parameter values used to generate the synthetic data, in the column labelled True, along with spMisalignLM estimated parameter posterior distribution 50 (2.5, 97.5) percentiles. The $\beta_{\cdot,0}$ correspond to the outcomes' regression intercepts, $\rho$ is the cross-correlation between the outcomes' spatial random effects, $\phi$ is the spatial cross-correlation decay parameter, and $\Psi$ is the non-spatial residual variance associated with each outcome. Subscripts indicate the associated outcome variable.

Parameter        True, Estimate
$\beta_{1,0}$    (67, 57)
$\beta_{2,0}$    (333, 639)
$\beta_{3,0}$    9 (99, 29)
$\rho_{1,2}$     6 ( 97, 66)
$\rho_{1,3}$     9 7 (63, 97)
$\rho_{2,3}$     ( 7, 7)
$\phi_1$         6 73 (439, 42)
$\phi_2$         (472, 2947)
$\phi_3$         (534, 252)
$\Psi_1$         5 (2, 2)
$\Psi_2$         4 (, 2)
$\Psi_3$         5 (2, 27)

Fig. 1. (a) Locations of observed and unobserved outcome variables. Data associated with each outcome are observed within its respective circle, indicated by numbers 1, 2 and 3. Intersecting regions contain locations where two or more outcomes are observed. Surfaces for outcomes 1, 2 and 3 are given in (b), (c) and (d), respectively.

Given spatially misaligned data, the spMisalignLM function called in the R code below generates posterior samples from the parameters of the posited model. This function takes each outcome's symbolic regression model and the locations where data are observed. Additionally, parameter starting values, prior distributions, MCMC Metropolis algorithm proposal distribution variances, the spatial correlation function and the number of desired MCMC samples are also passed to the spMisalignLM function. A full explanation of argument syntax and output is available in the function's manual available via CRAN.

Fig. 2. Misalignment model posterior predictive distribution median surfaces for outcomes 1, 2 and 3 in (a), (b) and (c), respectively.

Fig. 3. Misalignment model posterior predictive distribution uncertainty surfaces for outcomes 1, 2 and 3 in (a), (b) and (c), respectively.

The resulting MCMC samples were summarized using functions in the coda package and are displayed in Table 1. Here, we can see that the parameters' estimated 95% credible intervals include the true parameter values. As we will see in the subsequent analysis of the Penobscot Experimental Forest LiDAR and biomass data, the parameter estimates associated with the spatial random effects' cross-correlations can be used to explore hypotheses about association after accounting for the impact of covariates. Also, one can look to the spatial cross-correlation decay parameters to make inference about the geographical range of dependence among observations.

Given the spMisalignLM object m.miss and spatial coordinates with associated covariates, one can interpolate and predict using the spPredict function. In the code below, spPredict is used to generate posterior predictive samples for all three outcomes at all locations in Fig. 1(a). Figures 2 and 3 show the median and dispersion of the resulting posterior predictive distributions. The interpolated and predicted outcomes shown in Fig. 2(a-c) closely approximate the observed data in Fig. 1(b-d). We summarize the prediction uncertainty using the width between the lower and upper 95% posterior predictive credible intervals, given in Fig. 3(a), (b) and (c) for outcomes 1, 2 and 3, respectively. These surfaces show that stronger cross-correlation between outcomes results in more precise interpolation. For example, the spatial random effects associated with outcome 1 are strongly correlated with those of outcomes 2 and 3, that is, estimated cross-correlation of 6 and 7, respectively. As a result, Fig. 3(a) shows greater precision in interpolation of outcomes 2 and 3 when outcome 1 is observed (notice the lighter colours in circles 2 and 3). In contrast, when the cross-correlation is weak, there is less information available to inform interpolation. For example, a cross-correlation of 62 between outcomes 2 and 3 results in only a marginal gain in interpolation precision in either Fig. 3(b) or (c).

Table 2. Univariate spatial regression model prediction and misalignment model interpolation performance. Performance metrics are (i) root mean squared error (RMSE) between the observed and predicted or interpolated outcomes; (ii) mean width between the lower and upper 95% posterior predictive distribution credible intervals (CI width); and (iii) the percentage of observations covered by their respective 95% credible interval (CI cover). Table headings: Univariate outcome; Misalignment outcome; RMSE; CI width; CI cover.

Fig. 4. Penobscot Experimental Forest LiDAR and sample plot data extent and locations (a). Misalignment model posterior distribution median (black point symbol) and 95% credible intervals for Gp95 and BIO in (b) and (c), respectively. Map legend in (a): PEF study area; LVIS Lp25 and Lp95 extent; G-LiHT Gp95 observations; sample plot BIO observations. Axes in (b) and (c): observed versus fitted values.
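The three performance metrics defined in the Table 2 caption can be computed from posterior predictive samples roughly as follows. This is a minimal sketch under stated assumptions: the posterior median is used as the point prediction for RMSE (the text does not say which summary the authors used), credible limits are taken as the 2.5 and 97.5 percentiles, and the names `quantile` and `summarize` are hypothetical helpers rather than spBayes or coda functions.

```python
import math


def quantile(sorted_vals, q):
    """Quantile of an already sorted list, using linear interpolation."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def summarize(pred_samples, observed):
    """Compute RMSE, mean 95% CI width and percentage CI coverage.

    pred_samples[k] holds the posterior predictive draws for the k-th
    held-out observation; observed[k] is its true value.
    """
    sq_err, width, covered = 0.0, 0.0, 0
    for draws, y in zip(pred_samples, observed):
        d = sorted(draws)
        lower, median, upper = quantile(d, 0.025), quantile(d, 0.5), quantile(d, 0.975)
        sq_err += (median - y) ** 2       # assumption: median as point prediction
        width += upper - lower
        covered += lower <= y <= upper
    n = len(observed)
    return {
        "rmse": math.sqrt(sq_err / n),
        "ci_width": width / n,
        "ci_cover": 100.0 * covered / n,
    }


# Hypothetical draws: the same 101 evenly spaced values for two held-out sites.
draws = [float(v) for v in range(101)]
result = summarize([draws, draws], [50.0, 99.0])
```

Reading the three numbers together matters: a model can lower RMSE while producing intervals that are too narrow, which would show up as coverage well below the nominal 95%.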

We are in a prediction setting when none of the outcomes are observed at a given location. In Fig. 1(a), prediction occurs for all locations outside of the three circles. In the absence of covariates, prediction is only informed by proximate observed locations. The stronger the spatial dependence, the more information for prediction is gleaned from observed locations. For example, the spatial decay point estimate for outcome 1 is 73, which corresponds to an effective spatial range 3 domain distance units (where we define effective spatial range as the distance at which the spatial correlation drops to 0.05). The result of this relatively long spatial range is that predictions made just outside of circle 1 show more precise posterior predictive intervals (notice the halo around the circle).

Table 3. Estimated parameter posterior distribution 50 (2.5, 97.5) percentiles for the Penobscot Experimental Forest misalignment model. The $\beta_{\cdot,0}$ correspond to the outcomes' regression intercepts, $\rho_{Gp95,BIO}$ is the cross-correlation between the outcomes' spatial random effects, and $\Psi$ is the non-spatial residual variance associated with each outcome. Estimates for $\phi_{Gp95}$ and $\phi_{BIO}$ have been transformed to their respective effective spatial range in km.

Parameter                 Estimate
$\beta_{Gp95,0}$          6 (245, 99)
$\beta_{Gp95,Lp25}$       45 (2, 6)
$\beta_{Gp95,Lp95}$       4 ( 3, 5)
$\beta_{BIO,0}$           93 (95, 53)
$\Psi_{Gp95}$             75 (2, 9)
$\Psi_{BIO}$              4 (26, 9)
$\rho_{Gp95,BIO}$         36 (6, 9)
Gp95 eff. range (km)      299 (3, 46)
BIO eff. range (km)       49 (7, 54)

To assess the usefulness of estimating the covariance among the outcomes' spatial random effects for interpolation, we compare the misalignment model results to predictions generated by outcome-specific univariate spatial regressions. The univariate models are equivalent to model (1) but assume there is no covariance among the outcomes' random effects. These univariate models can be fit using the spLM function in spBayes.
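The effective spatial range used in this section, the distance at which spatial correlation falls to a small threshold, has a closed form for some correlation functions and can be found numerically for others. The sketch below assumes an exponential correlation function $\exp(-\phi d)$ and a threshold of 0.05; both are assumptions made for illustration, and the function names are hypothetical. For multivariate models the authors compute the range from the cross-covariance parameters, which this single-process sketch does not attempt.

```python
import math


def effective_range_exponential(phi, threshold=0.05):
    """Closed form for exp(-phi * d) = threshold, roughly 3 / phi at 0.05."""
    return -math.log(threshold) / phi


def effective_range(corr, threshold=0.05, upper=1.0, tol=1e-10):
    """Bisection search for a correlation function that decreases with distance.

    corr is any function of distance d with corr(0) = 1 that decays towards 0.
    """
    lo, hi = 0.0, upper
    while corr(hi) > threshold:     # expand the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if corr(mid) > threshold:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical decay parameter on a unit-square domain.
phi = 6.0
r_closed = effective_range_exponential(phi)
r_numeric = effective_range(lambda d: math.exp(-phi * d))
```

Reporting the range in distance units, as in Table 3, is usually easier to interpret than the raw decay parameter, because it can be compared directly with the spacing between observed locations.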
Summaries of prediction performance are given in Table 2 and show the misalignment model improves prediction accuracy and precision for each outcome, as reflected by lower RMSE and narrower 95% credible intervals compared to those of the univariate model.

PENOBSCOT EXPERIMENTAL FOREST LIDAR AND BIOMASS DATA

This illustrative analysis considers data from a 6-ha area within the US Forest Service Penobscot Experimental Forest (PEF; ME, USA). The PEF has been studied extensively beginning in the 1950s and is under active forest management as part of several long-term silvicultural experiments. A variety of forest variables are recorded on over 6 permanent georeferenced sample plots across the PEF. Light Detection and Ranging (LiDAR) data from the National Aeronautics and Space Administration (NASA) airborne Laser Vegetation Imaging Sensor (LVIS) and LiDAR, hyperspectral and thermal (G-LiHT; Cook et al. 23) sensors are also available for the PEF.

The objectives of this illustrative analysis are to produce predictive maps, with associated uncertainty, of (i) forest canopy height metrics from sparsely sampled LiDAR, for example G-LiHT, and (ii) forest variables measured at forest sample plots. For brevity, we consider only a subset of the available PEF data. The location and extent of these data are shown in Fig. 4(a) and include:

(i) forest canopy height 25th and 95th percentiles, labelled Lp25 and Lp95, respectively, measured in 23 using LVIS at a 25-m-diameter footprint across the extent of the study area;
(ii) forest canopy height 95th percentile, labelled Gp95, measured in 22 using the G-LiHT sensor at a 25-m-diameter footprint along a single transect across the study area;
(iii) metric tons of live above-ground tree biomass per ha, BIO, estimated at each of the 7 permanent sample plots between 2 and 22.

Fig. 5. Penobscot Experimental Forest misalignment model posterior predictive distribution summary surfaces. Posterior median for Gp95 and BIO given in (a) and (b), respectively. Range between the lower and upper 95% credible intervals for Gp95 and BIO given in (c) and (d), respectively. Map axes in km.

Here, we are interested in predicting both Gp95 and BIO at a fine spatial resolution across the study area. We expect a positive relationship between Gp95, which is a proxy for canopy height, and BIO. Further, although the forest structure has changed since 23 due to timber harvesting, the complete coverage LVIS Lp25 and Lp95 variables might explain some variability in the more current G-LiHT Gp95, and therefore, we use these metrics as covariates in the subsequent regression. This model is specified in the code below, along with parameter starting values, prior distributions, MCMC algorithm specifics and the spatial correlation function. Although not shown, variogram analysis of univariate non-spatial model residuals and other exploratory data analysis tools can help guide the choice of prior distributions and associated hyperparameters for the spatial and non-spatial covariances. Again, a full explanation of argument syntax and output is available in the function's manual available via CRAN.

The resulting MCMC samples were summarized using functions in the coda package and are displayed in Table 3. Here, we see the LVIS Lp25 covariate explains a substantial portion of variability in G-LiHT Gp95, that is, the 95% credible interval for $\beta_{Gp95,Lp25}$ does not include zero.
Given timber harvesting activity in the study area over the 9 years between the LiDAR measurements, the lack of relationship between the two sensors' 95th percentile canopy heights is not too surprising. The long effective spatial ranges estimated for Gp95 and BIO suggest there is substantial spatial structure among the residuals. The effective spatial ranges are calculated using the cross-covariance and spatial correlation functions' parameter estimates, see Finley, Banerjee & Cook (24) and Gelfand et al. (24, p. 292). Further, the spatial random effects of Gp95 and BIO are moderately correlated ($\rho_{Gp95,BIO}$ 36). Estimating this cross-correlation is useful for exploring hypotheses about the strength and direction of association among the outcomes' residual spatial structure, perhaps after accounting for some covariates. In this analysis, we could say there is a positive and significant (that is, the credible interval does not include zero) correlation between the residual spatial structure of Gp95 and BIO.

Given the spMisalignLM object m.miss and spatial coordinates with associated covariates, one can interpolate and predict using the spPredict function. In the supplemental analysis code (Finley, Banerjee & Cook 24), spPredict is used to generate posterior predictive samples for Gp95 and BIO at all 226 locations where Lp25 and Lp95 were observed. Surfaces of the resulting posterior predictive distributions' median and width between the lower and upper 95% credible intervals are given for Gp95 and BIO in Fig. 5. The posterior predictive medians shown in Fig. 5(a) and (b) closely approximate the observed data, see, for example, model fitted versus observed values in Fig. 4(b) and (c). However, more pertinent to this illustration, Fig. 5(c) and (d) show narrowing of the posterior predictive distribution at and near locations of interpolation for the respective outcome. For example, the narrowing of the posterior predictive distributions for predicted Gp95 at and near observed BIO locations is clearly seen in Fig. 5(c). Similarly, Fig. 5(d) shows the posterior predictive distributions for BIO narrow within and adjacent to the G-LiHT transect where Gp95 is observed.

Discussion and summary

The multivariate model should yield improved predictive inference, over univariate models, in settings where there is moderate-to-strong covariance among the outcomes' spatial random effects and where the spatial range of dependence is sufficiently long to allow observations to contribute information across locations.

The development in the section Multivariate spatial regression with misalignment, and subsequent analyses, assumes a constant covariance among outcomes over the domain. This assumption might be reasonable in many settings. However, a more flexible model would pursue a non-stationary formulation of the cross-covariance matrix, see, for example, Guhaniyogi et al. (23). Such non-stationary cross-covariance models could improve inference about changing patterns in the strength and direction of the correlation between outcomes at broad spatial scales.

In addition to improving prediction and interpolation in some settings, the multivariate misalignment model could be useful in designing efficient monitoring efforts. For example, if one had an a priori estimate of the covariance among outcomes, or could learn about this covariance through an initial sampling effort, then resources could be allocated to an appropriate level of sampling of outcome subsets. This represents a very active area of work that builds upon a rich literature on sampling designs for spatiotemporal environmental data, see, for example, Mateu & Müller (23).
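As a rough guide to how an a priori covariance estimate could inform such design decisions, consider the spatial random effects of two outcomes at a single location. The notation $K_{ii}$, $K_{ij}$, $K_{jj}$ for their variances and covariance is introduced here for illustration only, and the result below is the standard conditional distribution of a bivariate normal; it ignores the non-spatial residual variance and parameter uncertainty, so it should be read as a simplification rather than the full posterior predictive calculation.

```latex
\begin{aligned}
\begin{pmatrix} w_i(s) \\ w_j(s) \end{pmatrix}
  &\sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
     \begin{pmatrix} K_{ii} & K_{ij} \\ K_{ij} & K_{jj} \end{pmatrix} \right),
  \qquad \rho_{ij} = \frac{K_{ij}}{\sqrt{K_{ii} K_{jj}}}, \\
E\left[ w_j(s) \mid w_i(s) \right] &= \frac{K_{ij}}{K_{ii}} \, w_i(s), \\
\mathrm{Var}\left[ w_j(s) \mid w_i(s) \right] &= K_{jj} \left( 1 - \rho_{ij}^{2} \right).
\end{aligned}
```

Under these simplifying assumptions, observing outcome $i$ removes a fraction $\rho_{ij}^2$ of the variance of outcome $j$'s spatial effect at the same location, which is consistent with the pattern in the illustrations: strong cross-correlation narrows interpolation intervals noticeably, while weak cross-correlation yields only marginal gains.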
Further development of the multivariate misalignment model for inference about spatiotemporal processes is a logical next step and would likely find application for exploring complex and dynamic ecological processes.

Acknowledgements

This work was supported by National Science Foundation Grants DMS- 669, EF-3739, EF-2474 and EF , as well as NASA Carbon Monitoring System grants.

Data accessibility

Data deposited in the Dryad repository: 56dryad.3g9s2

References

Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., et al. (999) LAPACK Users' Guide, 3rd edn. Society for Industrial and Applied Mathematics, Philadelphia, PA. ISBN
Banerjee, S. & Gelfand, A.E. (22) Prediction, interpolation and regression for spatially misaligned data sets. Sankhya Series A, 64,
Banerjee, S., Carlin, B.P. & Gelfand, A.E. (24) Hierarchical Modeling and Analysis for Spatial Data. Chapman and Hall/CRC Press, Boca Raton, FL.
Baribault, T., Kobe, R.K. & Finley, A.O. (22) Tropical tree growth is correlated with soil phosphorus, potassium, and calcium, though not for legumes. Ecological Monographs, 2,
Blackford, S.L., Demmel, J., Dongarra, J., Duff, I., Hammarling, S., Henry, G., et al. (22) An Updated Set of Basic Linear Algebra Subprograms (BLAS). Transactions on Mathematical Software, 2,
Buonaccorsi, J.P. (29) Measurement Error: Models, Methods and Applications. Chapman & Hall/CRC, Boca Raton, FL.
Chiles, J.P. & Delfiner, P. (999) Geostatistics: Modelling Spatial Uncertainty. Wiley, New York.
Cook, B.D., Corp, L.W., Nelson, R.F., Middleton, E.M., Morton, D.C., McCorkel, J.T., et al. (23) NASA Goddard's Lidar, Hyperspectral and Thermal (G-LiHT) airborne imager. Remote Sensing, 5,
Cressie, N.A.C. & Wikle, C.K. (2) Statistics for Spatio-Temporal Data. Wiley, New York.
Diggle, P.J., Tawn, J.A. & Moyeed, R.A. (99) Model-based geostatistics (with discussion). Journal of the Royal Statistical Society, Series C (Applied Statistics), 47,
Finley, A.O., Banerjee, S. & Gelfand, A.E. (23) spbayes for large univariate and multivariate point-referenced spatio-temporal data models. arXiv:3. 92[stat.CO].
Finley, A.O., Banerjee, S. & Cook, B.D. (24) Data from: Bayesian hierarchical models for spatially misaligned data in R. Methods in Ecology and Evolution. doi:.56/dryad.3g9s2
Gelfand, A.E., Zhu, L. & Carlin, B.P. (2) On the change of support problem for spatio-temporal data. Biostatistics, 2,
Gelfand, A.E., Schmidt, A.M., Banerjee, S. & Sirmans, C.F. (24) Nonstationary multivariate process modelling through spatially varying coregionalization (with discussion). TEST, 3,
Gelman, A., Carlin, J.B., Stern, H.S. & Rubin, D.B. (24) Bayesian Data Analysis, 2nd edn. Chapman and Hall/CRC Press, Boca Raton, FL.
Gotway, C.A. & Young, L.J. (22) Combining incompatible spatial data. Journal of the American Statistical Association, 97,
Gryparis, A., Paciorek, C.J., Zeka, A., Schwartz, J. & Coull, B.A. (29) Measurement error caused by spatial misalignment in environmental epidemiology. Biostatistics,,
Guhaniyogi, R., Finley, A.O., Banerjee, S. & Kobe, R.K. (23) Modeling complex spatial dependencies: low-rank spatially-varying cross-covariances with application to soil nutrient data. Journal of Agricultural, Biological, and Environmental Statistics,,
Illian, J.B., Sorbye, S.H. & Rue, H. (22) A toolbox for fitting complex spatial point process models using integrated nested Laplace approximation (INLA). The Annals of Applied Statistics, 6,
Kamman, E.E. & Wand, M.P. (23) Geoadditive models. Applied Statistics, 52,.
Lin, X., Wahba, G., Xiang, D., Gao, F., Klein, R. & Klein, B. (2) Smoothing spline ANOVA models for large data sets with Bernoulli observations and the randomized GACV. Annals of Statistics, 2,
Lopiano, K.K., Young, L.J. & Gotway, C.A. (2) A comparison of errors in variables methods for use in regression models with spatially misaligned data. Statistical Methods in Medical Research, 2,
Lopiano, K.K., Young, L.J. & Gotway, C.A. (23) Estimated generalized least squares in spatially misaligned regression models with Berkson error. Biostatistics, 4,
Madsen, L., Ruppert, D. & Altman, N.S. (2) Regression with spatially misaligned data. Environmetrics, 9,
Mateu, J. & Müller, W.G. (23) Spatio-Temporal Design: Advances in Efficient Data Acquisition. John Wiley & Sons, Ltd., West Sussex.
Mugglin, A.S., Carlin, B.P. & Gelfand, A.E. (2) Fully model-based approaches for spatially misaligned data. Journal of the American Statistical Association, 95,
Ovaskainen, O., Hottola, J. & Siitonen, J. (2) Modeling species co-occurrence by multivariate logistic regression generates new hypotheses on fungal interactions. Ecology, 2,
Paciorek, C.J., Yanosky, J.D., Puett, R.C., Laden, F. & Suh, H.H. (29) Practical large-scale spatio-temporal modeling of particulate matter concentrations. The Annals of Applied Statistics, 3,
Ren, Q. & Banerjee, S. (23) Hierarchical factor models for large spatially misaligned data: a low-rank predictive process approach. Biometrics, 69, 9 3.
Swope, S.M. & Parker, I.M. (22) Complex interactions among biocontrol agents, pollinators, and an invasive weed: a structural equation modeling approach. Ecology, 22,
Szpiro, A.A., Sheppard, L. & Lumley, T. (2) Efficient measurement error correction with spatially misaligned data. Biostatistics, 2,
Zhu, L., Carlin, B.P. & Gelfand, A.E. (23) Hierarchical regression with misaligned spatial data: relating ambient ozone and pediatric asthma ER visits in Atlanta. Environmetrics, 4,

Received 5 November 23; accepted 26 February 24
Handling Editor: Bob O'Hara

Supporting Information

Additional Supporting Information may be found in the online version of this article.

Appendix S. Misalignment model specification.

24 The Authors. Methods in Ecology and Evolution 24 British Ecological Society, Methods in Ecology and Evolution, 5,


More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project

More information

BAYESIAN ESTIMATION OF LINEAR STATISTICAL MODEL BIAS

BAYESIAN ESTIMATION OF LINEAR STATISTICAL MODEL BIAS BAYESIAN ESTIMATION OF LINEAR STATISTICAL MODEL BIAS Andrew A. Neath 1 and Joseph E. Cavanaugh 1 Department of Mathematics and Statistics, Southern Illinois University, Edwardsville, Illinois 606, USA

More information

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Jonathan Gruhl March 18, 2010 1 Introduction Researchers commonly apply item response theory (IRT) models to binary and ordinal

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

Bayesian Inference. Chapter 9. Linear models and regression

Bayesian Inference. Chapter 9. Linear models and regression Bayesian Inference Chapter 9. Linear models and regression M. Concepcion Ausin Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in Mathematical Engineering

More information

ASA Section on Survey Research Methods

ASA Section on Survey Research Methods REGRESSION-BASED STATISTICAL MATCHING: RECENT DEVELOPMENTS Chris Moriarity, Fritz Scheuren Chris Moriarity, U.S. Government Accountability Office, 411 G Street NW, Washington, DC 20548 KEY WORDS: data

More information

Odds ratio estimation in Bernoulli smoothing spline analysis-ofvariance

Odds ratio estimation in Bernoulli smoothing spline analysis-ofvariance The Statistician (1997) 46, No. 1, pp. 49 56 Odds ratio estimation in Bernoulli smoothing spline analysis-ofvariance models By YUEDONG WANG{ University of Michigan, Ann Arbor, USA [Received June 1995.

More information

Restricted spatial regression in practice: geostatistical models, confounding, and robustness under model misspecification

Restricted spatial regression in practice: geostatistical models, confounding, and robustness under model misspecification Research Article Environmetrics Received: 10 September 2014, Revised: 12 January 2015, Accepted: 15 January 2015, Published online in Wiley Online Library: 18 February 2015 (wileyonlinelibrary.com) DOI:

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Quantile POD for Hit-Miss Data

Quantile POD for Hit-Miss Data Quantile POD for Hit-Miss Data Yew-Meng Koh a and William Q. Meeker a a Center for Nondestructive Evaluation, Department of Statistics, Iowa State niversity, Ames, Iowa 50010 Abstract. Probability of detection

More information

The STS Surgeon Composite Technical Appendix

The STS Surgeon Composite Technical Appendix The STS Surgeon Composite Technical Appendix Overview Surgeon-specific risk-adjusted operative operative mortality and major complication rates were estimated using a bivariate random-effects logistic

More information

Summary STK 4150/9150

Summary STK 4150/9150 STK4150 - Intro 1 Summary STK 4150/9150 Odd Kolbjørnsen May 22 2017 Scope You are expected to know and be able to use basic concepts introduced in the book. You knowledge is expected to be larger than

More information

FastGP: an R package for Gaussian processes

FastGP: an R package for Gaussian processes FastGP: an R package for Gaussian processes Giri Gopalan Harvard University Luke Bornn Harvard University Many methodologies involving a Gaussian process rely heavily on computationally expensive functions

More information

Statistícal Methods for Spatial Data Analysis

Statistícal Methods for Spatial Data Analysis Texts in Statistícal Science Statistícal Methods for Spatial Data Analysis V- Oliver Schabenberger Carol A. Gotway PCT CHAPMAN & K Contents Preface xv 1 Introduction 1 1.1 The Need for Spatial Analysis

More information