Scaling intrinsic Gaussian Markov random field priors in spatial modelling


Sigrunn Holbek Sørbye (a, corresponding author) and Håvard Rue (b)
a Department of Mathematics and Statistics, University of Tromsø, Norway.
b Department of Mathematical Sciences, Norwegian University of Science and Technology, Norway.

Abstract

In Bayesian analysis of spatial regression models, intrinsic Gaussian Markov random fields (IGMRFs) are commonly applied to model underlying spatial or temporal dependency structures. This type of model is characterized by a scaled precision matrix that reflects the neighbourhood structure of the model. The scaling, represented as a precision parameter, governs the degree of smoothness of the resulting random effect. Hyperprior choices for the precision parameter might have a strong influence on the output of posterior analysis, but are often made in a rather ad hoc manner. By mapping the random precision to the marginal variance of an IGMRF, we suggest that these hyperpriors should be rescaled to account for the model used, which also includes the resolution of the analysis. This provides an alternative interpretation of the parameter choices of the hyperprior, and sensitivity can be investigated in terms of these scaled hyperpriors. The given suggestions are demonstrated by analysing two different types of spatial-area data in R-INLA, in which observations are given in geographical regions and as a spatial point pattern, respectively.

Keywords: Bayesian hierarchical model, CAR-models, hyperprior, INLA, sensitivity analysis, spatial-area data.

1. Introduction

Intrinsic conditional autoregressive (ICAR) models (Besag et al., 1991) are widely used as priors to model underlying unobserved dependency structures in Bayesian hierarchical models. Formulated as intrinsic Gaussian Markov random fields (Rue and Held, 2005), this type of model includes a precision or scaling parameter which governs the smoothness of the resulting random effect.
A practical problem in fitting these models is that posterior estimates and inference can be rather sensitive to the choice and scaling of hyperpriors for the precision. This can be very noticeable in spatial analysis, where a spatial effect might easily be overfitted to the observations while a significant influence of measured covariates is not revealed. Despite this sensitivity, hyperpriors in hierarchical models have often been chosen in a rather ad hoc manner (Berger et al., 2005). The issue of prior selection in Bayesian analysis has been addressed by a number of authors and is still a subject of some controversy; see, for example, Gelman et al. (2004) and Lesaffre and Lawson (2012) for thorough presentations of different classes of priors.

Corresponding address: Department of Mathematics and Statistics, Faculty of Science, University of Tromsø, N-9037 Tromsø, Norway. E-mail address: sigrunn.sorbye@uit.no (Sigrunn Holbek Sørbye).
Preprint submitted to Elsevier, February 15, 2013.

Here, we focus solely on the issue of choosing hyperpriors for the precision of IGMRFs. The main motivation is that these models are widely applied as priors, among others within the methodology of integrated nested Laplace approximations (INLA), introduced in Rue et al. (2009). INLA is accessible through the R interface R-INLA, in which IGMRFs are used as latent models to explain both spatial and temporal dependency structures. In R-INLA, hyperpriors can be specified freely by the user, or one can use the default hyperpriors, in which the precision is given a vague Gamma distribution with predefined shape and inverse-scale parameters. Of course, such default parameters cannot be used blindly, and tuning them is left to the user. The same applies to BUGS (Bayesian inference Using Gibbs Sampling), where Gamma distributions with shape and inverse-scale parameters both set to the same small value have been used as defaults. This gives inappropriate priors for the precision of random effects, especially for continuous data (Lunn et al., 2009). Advice on prior specification using the INLA methodology is given in Fong et al. (2010), who analyse generalized linear mixed models. Roos and Held (2011) quantify sensitivity to prior assumptions for the same class of models, using sensitivity measures based on the Hellinger distance. Here we take a different approach, viewing prior choices for the precision of IGMRFs in terms of the marginal standard deviation of the models, calculated under linear constraints. Specifically, we suggest assigning hyperpriors to the scaled precisions of IGMRFs, obtained by scaling the precision with a reference standard deviation for each model. Such rescalings are easily implemented and give hyperpriors that are invariant to both the model used and the resolution of the analysis.
The specific prior models under consideration include the first- and second-order IGMRFs on a line, used to model smooth non-linear functions of covariates. Spatial effects are modelled using either a first-order IGMRF defined on an irregular lattice or a two-dimensional second-order IGMRF defined on a regular lattice. All of these models are described in sections 3.3 and 3.4 of Rue and Held (2005). A basic characteristic common to these models is that they penalize local deviation from either a constant level, a line or a plane. The marginal standard deviation of the model, integrating out the random precision, tells us how large we allow this deviation to be. This gives an alternative interpretation of the parameters of the hyperprior and is also used to perform a sensitivity analysis with respect to these parameter choices.

The plan of this paper is as follows. In section 2, we present some background on IGMRFs and their use as priors in latent Gaussian models. The idea of assigning hyperpriors to scaled precisions is presented in section 3, while section 4 applies scaled hyperpriors to two real data examples. Concluding remarks are given in section 5.

2. Intrinsic Gaussian Markov random fields

The class of Bayesian hierarchical regression models considered here is referred to as latent Gaussian models (Rue et al., 2009), a subclass of structured additive regression models (Fahrmeir and Tutz, 2001; Gelman et al., 2004). The observational vector y is assumed to belong to an exponential family, in which the mean µ = E(y) is linked to a structured additive predictor

$$\eta_i = g(\mu_i) = \beta_0 + \mathbf{z}_i^{T}\boldsymbol{\beta} + \sum_j f_j(c_{ji}) + u_i, \qquad i = 1, \ldots, n,$$

where all random components are assigned Gaussian priors. More specifically, β_0 denotes an intercept or offset, while z_i denotes a vector of covariates assumed to have a fixed linear effect on the response. The corresponding vector of unknown parameters β is assumed to have independent zero-mean Gaussian priors with fixed precisions, while the vector of unstructured random effects u is assigned an iid zero-mean Gaussian prior with random precision. The set of functions {f_j(·)} accounts for non-linear random effects of continuous covariates, taking the value c_ji for observation i, and these functions can be used to explain temporal, spatial or other underlying dependency structures in the data. We consider the case where one or more of these functions are modelled using IGMRFs, and the main focus is how to scale the hyperprior for the random precisions of these models.

More generally, latent Gaussian models can be expressed in a unified way as a three-stage Bayesian hierarchical model in which the observations y are assumed conditionally independent given a latent field x and hyperparameters θ. If x is a Gaussian Markov random field (GMRF), then

$$\mathbf{x} = (x_1, \ldots, x_n)^T \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{Q}^{-1}), \qquad (1)$$

where the precision matrix Q is positive definite. GMRFs are commonly referred to as conditional autoregressive (CAR) models (Besag, 1974), in which x is specified in terms of the conditional distributions

$$x_i \mid \mathbf{x}_{-i} \sim \mathcal{N}\Big(\mu_i - \sum_{j:\, j \sim i} \beta_{ij}(x_j - \mu_j),\; \tau_i^{-1}\Big),$$

where i ∼ j denotes that i and j are neighbouring nodes. This conditional specification gives a valid joint density for x, with Q_ii = τ_i and Q_ij = τ_i β_ij when i ∼ j, provided the resulting Q is symmetric and positive definite. The limiting intrinsic version (Besag et al., 1991) yields an improper joint density, as the resulting precision matrix Q in (1) is only positive semidefinite. However, the intrinsic models retain their Markov properties and have certain advantages over standard CAR-models, both conceptually and in practice (Besag and Kooperberg, 1995).
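The correspondence between the conditional specification and the precision matrix can be checked numerically. The following Python sketch (assuming NumPy; the 5-node proper CAR model is a hypothetical example, not taken from the paper) reads the conditional mean and precision of x_i directly off Q and compares them with ordinary Gaussian conditioning on the covariance Q^{-1}.

```python
import numpy as np

def car_conditional(Q, mu, x, i):
    """Conditional mean and precision of x_i given x_{-i}, read off the precision matrix Q."""
    others = np.arange(len(x)) != i
    beta = Q[i, others] / Q[i, i]                    # beta_ij = Q_ij / tau_i
    mean = mu[i] - beta @ (x[others] - mu[others])
    return mean, Q[i, i]                             # tau_i = Q_ii

def covariance_conditional(Q, mu, x, i):
    """The same quantities via ordinary conditioning on Sigma = Q^{-1}."""
    S = np.linalg.inv(Q)
    others = np.arange(len(x)) != i
    S_oo = S[np.ix_(others, others)]
    S_io = S[i, others]
    w = np.linalg.solve(S_oo, S_io)                  # Sigma_oo^{-1} Sigma_oi
    mean = mu[i] + w @ (x[others] - mu[others])
    return mean, 1.0 / (S[i, i] - S_io @ w)

# Hypothetical proper CAR model on a path of 5 nodes: first-difference penalty plus a ridge.
n = 5
D = np.diff(np.eye(n), axis=0)
Q = D.T @ D + 0.5 * np.eye(n)
rng = np.random.default_rng(42)
mu, x = rng.normal(size=n), rng.normal(size=n)

for i in range(n):
    m1, p1 = car_conditional(Q, mu, x, i)
    m2, p2 = covariance_conditional(Q, mu, x, i)
    print(i, round(m1, 6), round(m2, 6), round(p1, 6), round(p2, 6))
```

The two routes agree for every node, which is the sense in which the conditional specification with Q_ij = τ_i β_ij defines the joint density.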
In general, a zero-mean IGMRF of order k can be specified through the improper density

$$\pi(\mathbf{x}) = (2\pi)^{-(n-k)/2}\, (|\mathbf{Q}|^{*})^{1/2} \exp\Big\{-\tfrac{1}{2}\mathbf{x}^T\mathbf{Q}\mathbf{x}\Big\},$$

in which the precision matrix Q is positive semidefinite. The order of the IGMRF refers to the rank deficiency k of the precision matrix, and |Q|^* denotes the generalized determinant, equal to the product of the n − k non-zero eigenvalues of Q. Different IGMRFs can be characterized by writing the precision matrix as Q = τR, where τ denotes the random precision parameter and the matrix R reflects the specific neighbourhood structure of the model.

The first-order IGMRFs considered here are used to model either underlying smooth functions in one dimension or a spatially structured effect on an irregular lattice. Both of these first-order models capture local deviation from an overall constant level (represented by the null space of Q), which is beneficial in applications where the underlying mean level is approximately or locally constant; see section 3.3 in Rue and Held (2005) for details. The marginal standard deviation of the model, integrating out the uncertainty of the random precision, gives a priori information on how large we allow the local deviations to be. For these two models we impose the constraint Σ_i x_i = 0, which gives a proper joint density on an (n − 1)-dimensional subspace and hence a finite marginal standard deviation.

The first-order model used as a spatial prior for regional data will be referred to as the Besag-model. In this case the density of the spatial effect x is

$$\pi(\mathbf{x} \mid \tau) \propto \tau^{(n-1)/2} \exp\Big\{-\frac{\tau}{2} \sum_{i \sim j} w_{ij}(x_i - x_j)^2\Big\}, \qquad (2)$$

where w_ij denotes positive and symmetric weights for all pairs of adjacent nodes. The marginal variance of this model depends on both the shape and the number of nodes of the graph.

The first-order IGMRF on a line, also referred to as a first-order random walk (rw1 in R-INLA), assumes independent first-order increments

$$\Delta x_i = x_{i+1} - x_i \sim \mathcal{N}(0, \tau^{-1}), \qquad i = 1, \ldots, n-1.$$

For simplicity we assume that the n nodes are equally spaced with distance 1 (see section 3.2 in Rue and Held (2005) for the irregular case). The density of x = (x_1, …, x_n) is then

$$\pi(\mathbf{x} \mid \tau) \propto \tau^{(n-1)/2} \exp\Big\{-\frac{\tau}{2} \sum_{i=1}^{n-1} (x_{i+1} - x_i)^2\Big\}.$$

As noted in Akerkar et al. (2010), the hyperprior for the precision of random walk models needs to be rescaled if the number of nodes is changed, in order to obtain the same variance as before. Assume now that the interval is divided into kn equidistant nodes x'_1, …, x'_{kn}. The original variance can then be expressed as

$$\tau^{-1} = \mathrm{Var}(x_{i+1} - x_i) = \mathrm{Var}(x'_{k(i+1)} - x'_{ki}), \qquad i = 1, \ldots, n-1,$$

and the variance of the first-order increments of the new model equals

$$\tau_{\text{new}}^{-1} = \mathrm{Var}(x'_{i+1} - x'_i) = (k\tau)^{-1}, \qquad i = 1, \ldots, kn-1. \qquad (3)$$

Smooth underlying functions on the line can also be modelled using a second-order IGMRF, often referred to as a second-order random walk (rw2). This model assumes independent second-order increments

$$\Delta^2 x_i = x_{i+2} - 2x_{i+1} + x_i \sim \mathcal{N}(0, \tau^{-1}),$$

so that the density of x is

$$\pi(\mathbf{x} \mid \tau) \propto \tau^{(n-2)/2} \exp\Big\{-\frac{\tau}{2} \sum_{i=1}^{n-2} (x_i - 2x_{i+1} + x_{i+2})^2\Big\}.$$

This model captures local deviation from a line; see Rue and Held (2005). In order to define a proper joint density for x, we need to impose Σ_i i x_i = 0 in addition to the sum-to-zero constraint used for first-order models.
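To make the structure matrices R concrete, here is a minimal NumPy sketch (our own illustration, not R-INLA code). It builds R for rw1 and rw2 on unit-spaced nodes and for the Besag-model from a hypothetical edge list with weights w_ij, then checks the rank deficiency k and the quadratic forms appearing in the densities above.

```python
import numpy as np

def rw_structure(n, order):
    """Structure matrix R = D^T D of a random walk of the given order on n unit-spaced nodes."""
    # Rows of D give x_{i+1} - x_i (order 1) or x_{i+2} - 2x_{i+1} + x_i (order 2).
    D = np.diff(np.eye(n), n=order, axis=0)
    return D.T @ D

def besag_structure(n, edges, weights=None):
    """Besag structure matrix: R_ii = sum_j w_ij and R_ij = -w_ij for neighbours i ~ j."""
    R = np.zeros((n, n))
    for k, (i, j) in enumerate(edges):
        w = 1.0 if weights is None else weights[k]
        R[i, i] += w
        R[j, j] += w
        R[i, j] -= w
        R[j, i] -= w
    return R

n = 10
R1, R2 = rw_structure(n, 1), rw_structure(n, 2)
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]   # hypothetical 4-region map, each pair listed once
weights = [1.0, 2.0, 1.0, 0.5, 1.5]
Rb = besag_structure(4, edges, weights)

x = np.random.default_rng(0).normal(size=n)
print(np.linalg.matrix_rank(R1), np.linalg.matrix_rank(R2))   # n - 1 and n - 2
print(np.allclose(x @ R1 @ x, np.sum(np.diff(x) ** 2)))
print(np.allclose(x @ R2 @ x, np.sum(np.diff(x, n=2) ** 2)))
```

The constant vector lies in the null space of all three matrices, and the linear trend additionally lies in the null space of the rw2 matrix, which is why rw2 needs the extra constraint Σ_i i x_i = 0.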
Again, the hyperprior for the precision needs to be rescaled if the original interval is divided into kn nodes, for which the new variance is

$$\tau_{\text{new}}^{-1} = \mathrm{Var}(x'_{i+2} - 2x'_{i+1} + x'_i) = (k^3\tau)^{-1}, \qquad i = 1, \ldots, kn-2, \qquad (4)$$

as derived in Lindgren and Rue (2008).

To model spatial autocorrelation on a grid, we use the second-order IGMRF on a regular lattice (rw2d); see Rue and Held (2005). This model is constructed by assigning specific weights to different first- and second-order neighbours, and the resulting model approximates a thin plate spline. More specifically, the conditional mean of a node is a weighted combination of its neighbours: the four first-order neighbours in the cardinal directions are given weight 8, the nearest neighbours on the diagonals weight −2, and the second-order neighbours in the cardinal directions weight −1, with all weights divided by 20 in the interior of the lattice so that they sum to one. The resulting model penalizes local deviation from a plane, and again the marginal standard deviation of the model, integrating out the random precision, indicates how large we allow this deviation to be. The marginal standard deviation is calculated under the linear constraints Σ_ij x_ij = Σ_ij i x_ij = Σ_ij j x_ij = 0. If the resolution is changed, so that an original lattice of size n × n is divided into kn × kn grid cells, the new marginal variance is given by

$$\tau_{\text{new}}^{-1} = (k^4\tau)^{-1}, \qquad (5)$$

see Lindgren et al. (2011).

3. Scaling hyperpriors for the precision of IGMRFs

It is not easy to state clear preferences when it comes to hyperprior selection, and such distributions are often chosen in a rather casual way. For example, it is quite common to choose the same hyperprior for different precision parameters in a regression model, even though the underlying prior models are very different. In R-INLA, the default hyperprior for the precision τ of all IGMRFs is a vague Gamma distribution, τ ∼ Γ(1, b), with a small default inverse-scale parameter b. Usually, the parameters need to be tuned by the user. The main point we make here is that it is unnatural to use exactly the same parameters when assigning hyperpriors to the precisions of different IGMRF-models, as the marginal variances of these models are different. We suggest assigning hyperpriors to scaled precisions, taking the marginal variance into account.
This gives hyperpriors that are automatically scaled, both in terms of the IGMRF-model used and in terms of resolution.

3.1. Mapping the random precision to the marginal variance of the model

For any fixed precision τ, the marginal standard deviation of the components of a Gaussian vector x can be expressed as a function of τ by

$$\sigma_\tau(x_i) = \frac{\sigma_{\{\tau=1\}}(x_i)}{\sqrt{\tau}}, \qquad i = 1, \ldots, n.$$

The level of the marginal standard deviation differs between IGMRFs, but this is often not accounted for when assigning a hyperprior to the precision. In order to scale the hyperprior for different models, we propose to calculate a reference standard deviation of x for fixed τ = 1, for example using the geometric mean

$$\sigma_{\text{ref}}(\mathbf{x}) = \exp\Big(\frac{1}{n}\sum_{i=1}^{n} \log \sigma_{\{\tau=1\}}(x_i)\Big),$$

where the n nodes are still assumed equidistant with distance 1 for the IGMRFs defined on grids. The marginal standard deviation of all components of x is then approximated by

$$\sigma(x_i) \approx \frac{\sigma_{\text{ref}}(\mathbf{x})}{\sqrt{\tau}}, \qquad i = 1, \ldots, n. \qquad (6)$$

This implies that a hyperprior assigned to the scaled precision

$$\frac{\tau}{\sigma^2_{\text{ref}}(\mathbf{x})} \qquad (7)$$

is invariant to the model used. Note that the reference standard deviation needs to be computed under the given linear constraints for the different models, as otherwise it would not be finite.

Figure 1 illustrates the marginal standard deviation, using fixed precision τ = 1, for all components of the first- and second-order IGMRFs on a line, each of size n = 100. We immediately notice that the levels of these curves are very different. This gives a large difference in the calculated reference standard deviations of the two models, equal to σ_ref(x_rw1) = 3.89 and σ_ref(x_rw2) = 41.39, respectively. If we choose the same hyperprior for the precisions of these two models, we allow the local deviation of the second-order model to be much larger than that of the first-order model, which might not make sense in practice.

Figure 1: The marginal standard deviation of a first-order (left panel) and second-order (right panel) IGMRF on a line, using fixed precision τ = 1.

By scaling the precisions, we account not only for the type of IGMRF used but also for the number and resolution of the nodes used in an analysis, as the calculated reference standard deviations are coherent with the theoretical results in equations (3) to (5). This implies that if we have selected a hyperprior for the precision of an IGMRF at a given resolution, this hyperprior can easily be recalculated to give the same amount of smoothing at another resolution. This is also the case for IGMRFs defined on rectangular or irregular lattices, for which the hyperprior assigned to the scaled precision adjusts for both the number of nodes and the shape of the graph.

3.2. Specifications using the Gamma distribution

In the subsequent examples, we assume that the random precision of an IGMRF is assigned a Gamma distribution with shape parameter a and inverse-scale parameter b.
In the Gamma case, assigning a Γ(a, b) hyperprior to the scaled precision (7) is equivalent to assigning a Gamma distribution to τ with the same shape a and inverse-scale parameter b/σ²_ref(x). This implies that we can easily assign a hyperprior to the scaled precisions of different IGMRFs, using a common shape parameter a and an inverse-scale parameter adjusted to the model. For example, if we want to recalculate the hyperprior between a first- and a second-order IGMRF on the line, we can use the same shape parameter for both models but calculate a new inverse-scale parameter

$$b_{\text{rw2}} = b_{\text{rw1}}\, \frac{\sigma^2_{\text{ref}}(\mathbf{x}_{\text{rw1}})}{\sigma^2_{\text{ref}}(\mathbf{x}_{\text{rw2}})}.$$
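The reference standard deviation and this Gamma rescaling can be sketched as follows. This is our illustration in Python/NumPy, not the R-INLA implementation. It assumes that the linear constraints span the null space of R (sum-to-zero for rw1 and for the Besag-model on a connected graph, plus Σ_i i x_i = 0 for rw2), so that the constrained covariance at τ = 1 is the generalized inverse of R. For n = 100 the rw1 value should come out close to the 3.89 quoted above, and the rw2 value should be far larger.

```python
import numpy as np

def rw_structure(n, order):
    """Structure matrix R = D^T D of a random walk of the given order on n unit-spaced nodes."""
    D = np.diff(np.eye(n), n=order, axis=0)
    return D.T @ D

def reference_sd(R, rank_deficiency):
    """Geometric mean of the marginal sds at tau = 1 under constraints spanning the null space of R.

    Under such constraints the covariance is the generalized inverse of R, whose diagonal is
    sum_k V[i, k]^2 / w_k over the non-zero eigenpairs (w_k, V[:, k]).
    """
    w, V = np.linalg.eigh(R)                              # eigenvalues in ascending order
    w, V = w[rank_deficiency:], V[:, rank_deficiency:]    # drop the k null-space directions
    marginal_var = (V ** 2) @ (1.0 / w)
    return float(np.exp(np.mean(0.5 * np.log(marginal_var))))

def rescale_rate(b_from, sd_from, sd_to):
    """Inverse-scale parameter for a new model, keeping the same prior on tau / sigma_ref^2."""
    return b_from * sd_from ** 2 / sd_to ** 2

sd_rw1 = reference_sd(rw_structure(100, 1), rank_deficiency=1)
sd_rw2 = reference_sd(rw_structure(100, 2), rank_deficiency=2)
print(f"sigma_ref: rw1 = {sd_rw1:.2f}, rw2 = {sd_rw2:.2f}")

b_rw1 = 1e-3                                   # hypothetical inverse-scale chosen for rw1
b_rw2 = rescale_rate(b_rw1, sd_rw1, sd_rw2)    # b_rw1 * sigma_ref(rw1)^2 / sigma_ref(rw2)^2
```

Refining the rw1 grid from 100 to 400 nodes roughly doubles σ_ref, in line with the factor k in (3), so the rescaled inverse-scale parameter tracks the resolution without any further tuning.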

Naturally, these ideas can also be transferred to many choices of hyperprior other than the Gamma distribution. An important remaining question is how the parameters of a selected hyperprior should be tuned, as this governs the degree of smoothness of the resulting random effect. This might be crucial to posterior analysis, but different parameter choices of the hyperprior can be difficult to interpret directly. The mapping in (6) provides an alternative interpretation of the parameters of a hyperprior, as the precision can be linked to the marginal standard deviation of the model. Recall that the marginal standard deviation of an IGMRF specifies how large we allow the local deviation of the different models to be. We suggest that an intuitive way to investigate sensitivity to the hyperprior is to vary the size of this deviation. To do this, we use the mapping in (6) and define an upper limit U by

$$P(\sigma(x_i) > U) \approx P\Big(\frac{\tau}{\sigma^2_{\text{ref}}(\mathbf{x})} < U^{-2}\Big) = \alpha, \qquad (8)$$

where α is a fixed small probability. A priori, this upper limit for the marginal standard deviation expresses how large we allow the influence of the different random effects in a model to be. It also implies that any choice of parameters can be interpreted in terms of the upper limit

$$U = \Big(\frac{b\,\sigma^2_{\text{ref}}(\mathbf{x})}{F^{-1}(\alpha; a, 1)}\Big)^{1/2}, \qquad (9)$$

where F^{-1}(·; a, 1) denotes the inverse cumulative distribution function of the Gamma distribution with shape a and inverse-scale 1, and b is the inverse-scale parameter of the Gamma prior for τ.

4. Examples of spatial modelling

In the following two examples, we explore the idea of assigning hyperpriors to the scaled precisions of the different types of IGMRFs discussed. A priori, it seems natural to assume that the random effects in a regression model should have the same influence, which we impose by using a common upper limit U for the marginal standard deviations. Sensitivity to the hyperprior can then be investigated as a function of this upper limit for all random effects in the model simultaneously.
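Equation (9), and its inverse that turns a common upper limit U into model-specific inverse-scale parameters, can be evaluated with a few lines of code. The sketch below uses only the Python standard library. It computes the Gamma(a, 1) quantile by bisection on a power-series expansion of the regularized lower incomplete gamma function, a simple choice made here for self-containment (a library routine such as `scipy.stats.gamma.ppf` would normally be used). The σ_ref values in the usage lines are the rw1 and rw2 values quoted in section 3.1, and U = 1 is an arbitrary illustrative choice.

```python
import math

def gamma_cdf(x, a):
    """Regularized lower incomplete gamma P(a, x), i.e. the cdf of Gamma(a, 1), via its power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= x / (a + n)
        total += term
    return math.exp(a * math.log(x) - x - math.lgamma(a)) * total

def gamma_quantile(alpha, a, tol=1e-12):
    """F^{-1}(alpha; a, 1) by bisection; the bracket assumes alpha is not extremely close to 1."""
    lo, hi = 0.0, a + 10.0 * math.sqrt(a) + 10.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, a) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def upper_limit(a, b, sigma_ref, alpha=0.001):
    """Eq. (9): U = (b * sigma_ref^2 / F^{-1}(alpha; a, 1))^{1/2}, with b the inverse scale of tau."""
    return math.sqrt(b * sigma_ref ** 2 / gamma_quantile(alpha, a))

def rate_for_upper_limit(U, a, sigma_ref, alpha=0.001):
    """Inverse of eq. (9): the inverse-scale b that gives P(sigma(x_i) > U) = alpha."""
    return U ** 2 * gamma_quantile(alpha, a) / sigma_ref ** 2

# A common upper limit U gives model-specific inverse-scale parameters.
for name, s_ref in [("rw1", 3.89), ("rw2", 41.39)]:
    b = rate_for_upper_limit(U=1.0, a=1.0, sigma_ref=s_ref)
    print(name, b, upper_limit(a=1.0, b=b, sigma_ref=s_ref))
```

For a = 1 the Gamma(1, 1) quantile reduces to −log(1 − α), so the bisection is only needed for other shape parameters.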
The first example demonstrates the relevance of recalculating hyperpriors between first- and second-order IGMRFs for non-linear effects of continuous covariates. In the second example, tuning of the spatial effect is essential to posterior conclusions on the significance of fixed environmental covariates.

4.1. Leukemia survival data in Northwest England

We first review a data set analysed in Henderson et al. (2002) concerning spatial variation in survival of adult acute myeloid leukemia patients in the northwestern part of England. Data are registered for n = 1043 patients, diagnosed between 1982 and …. In addition to the survival time of each patient, the dataset contains information on sex, age and white blood cell count (wbc) at the time of diagnosis. Also, the Townsend deprivation index (tpi) is used to measure social deprivation, where higher values indicate poorer regions and lower values wealthier regions. Henderson et al. (2002) analysed the data with a multivariate frailty approach, including linear predictors for the covariate effects together with possible spatial variation based on the 24 districts. In Kneib and Fahrmeir (2007), the data set was analysed using a mixed model approach, modelling covariates as penalized splines. Here, we use the same approach as in Martino et al. (2011) and analyse the data in R-INLA, using a piecewise log-constant Cox model. We consider the following model, in which the survival times of the patients are linked to the predictor

$$\eta_i(t) = f_0(t) + \beta_1 z_{1i} + \beta_2 z_{2i} + f_1(\text{wbc}_i) + f_2(\text{tpi}_i) + f_s(s_i), \qquad i = 1, \ldots, n,$$

where f_0(t) denotes a log-baseline function that is piecewise constant along the time axis. We include sex (z_1i) and age (z_2i) as fixed linear effects and consider both first- and second-order IGMRFs to model the effects of the continuous covariates (wbc and tpi). We also include a spatially structured effect f_s(·) for the 24 districts, using the Besag-model in (2). For identifiability reasons, all functions for the random effects are constrained to sum to 0.

In defining hyperpriors for the three precision parameters θ = (τ_wbc, τ_tpi, τ_spat), we first calculate reference standard deviations for the relevant IGMRFs. Here, the continuous covariates wbc and tpi are discretized to have 50 unique values. Due to the wide range of wbc, the reference standard deviation for this covariate is much larger than for tpi; see Table 1. The table also displays the upper limits in (9), calculated using α = 0.001 and the default priors in R-INLA. We immediately notice that the upper limit for the rw2 model for wbc is unreasonably high, which implies that this model is allowed to have a much larger impact a priori than the other terms in the model.

                                  besag   rw1 (wbc)   rw2 (wbc)   rw1 (tpi)   rw2 (tpi)
σ_ref(x)                            …         …           …           …           …
Upper limit U (default prior)       …         …           …           …           …

Table 1: The reference standard deviations for different IGMRF priors, together with the upper limit defined in (8) for the corresponding marginal standard deviations, calculated using α = 0.001 and a default Γ(1, b) hyperprior for the precisions.

Figure 2: The estimated effect of white blood cell count (left panel) and the Townsend deprivation index (right panel), using a second-order IGMRF with default hyperprior (black solid line) and scaled hyperprior (red dashed line).
Using scaled hyperpriors, we fix the shape parameter at a = 1 and impose a common upper limit for all of the different IGMRFs, here set equal to U = log(1.5).
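For covariates such as wbc and tpi, the nodes are the discretized covariate values rather than unit-spaced indices, which is why the range of the covariate affects σ_ref. The following NumPy sketch assumes an rw1 model whose increments have variance proportional to the spacing δ_i (our reading of the irregular case referred to above) and uses hypothetical covariate ranges, not the actual leukemia data. With a = 1 the Gamma(1, 1) quantile is −log(1 − α), so the common limit U = log(1.5) translates directly into model-specific inverse-scale parameters.

```python
import numpy as np

def rw1_structure_irregular(locs):
    """rw1 structure matrix when x_{i+1} - x_i ~ N(0, delta_i / tau), delta_i = locs[i+1] - locs[i]."""
    locs = np.unique(locs)                     # sorted, duplicates removed
    D = np.diff(np.eye(len(locs)), axis=0)
    return D.T @ np.diag(1.0 / np.diff(locs)) @ D

def reference_sd(R, rank_deficiency=1):
    """Geometric mean of the constrained marginal sds at tau = 1 (generalized inverse of R)."""
    w, V = np.linalg.eigh(R)
    w, V = w[rank_deficiency:], V[:, rank_deficiency:]
    return float(np.exp(np.mean(0.5 * np.log((V ** 2) @ (1.0 / w)))))

# Hypothetical covariates, each discretized to 50 equally spaced values over its range.
wide = np.linspace(0.0, 500.0, 50)       # wide range, in the spirit of wbc
narrow = np.linspace(-6.0, 10.0, 50)     # narrow range, in the spirit of tpi
sd_wide = reference_sd(rw1_structure_irregular(wide))
sd_narrow = reference_sd(rw1_structure_irregular(narrow))

# Common upper limit with a = 1 and alpha = 0.001: b = U^2 * (-log(1 - alpha)) / sigma_ref^2.
U, alpha = np.log(1.5), 0.001
rates = {name: U ** 2 * -np.log1p(-alpha) / sd ** 2
         for name, sd in [("wide", sd_wide), ("narrow", sd_narrow)]}
print(sd_wide, sd_narrow, rates)
```

Stretching the covariate axis by a factor c multiplies σ_ref by √c under this assumption, so the wide-range covariate receives a much smaller inverse-scale parameter for the same U.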

The adjusted inverse-scale parameters accounting for the different models and resolutions are then b = … for the Besag-model, while the inverse-scale parameters for the two continuous covariates are b_wbc,rw1 = …, b_wbc,rw2 = …, b_tpi,rw1 = … and b_tpi,rw2 = …, using either the first- or the second-order IGMRF. We notice that these parameters are not very different from the default value, except for b_wbc,rw2, which becomes very low to account for the large marginal variance of the corresponding second-order IGMRF. Figure 2 illustrates that the effect of wbc is too variable if the hyperprior for the precision of the second-order IGMRF is not scaled appropriately. Using the scaled version, the effect of this covariate is seen to be linear, which is coherent with previous analyses (Martino et al., 2011; Sørbye and Rue, 2011). The differences between using the default and the scaled hyperpriors for tpi are small. Also, we did not find any significant differences in the estimated spatial effects when comparing the results using the default and the scaled hyperpriors (results not shown).

Figure 3: The estimated effect of wbc (left panel) and tpi (right panel), using a second-order IGMRF on a line and different upper limits U (legend: U = 5, U = 1, U = 0.5, U = …), where α = 0.001 and a = 1 in (9).

An important step in Bayesian analysis is to investigate sensitivity to prior choices. Here, sensitivity to the parameter choices of the hyperprior can be investigated simultaneously for all random effects modelled by IGMRFs, by varying the upper limit U in (8). One way to do this is to keep the shape parameter fixed at a = 1, while the inverse-scale parameters are determined by the common upper limit U. Figure 3 displays the resulting estimated effects using a second-order IGMRF to model wbc and tpi. Recall that this model penalizes local deviation from a line.
For the wbc covariate, the resulting effect is seen to be linear even when the upper limit is set to a quite large value, U = 5. This illustrates that the data support a linear model in this case. The estimated effect of tpi varies more as a function of U. If U is chosen too large, the estimated curve seems to reflect structure in the function that is not really supported by the data. If the upper limit for the marginal standard deviation is chosen small enough, the estimated effect is just a straight line. The corresponding estimated spatial effects did not vary much, and we conclude that in this example the spatial effect is not very sensitive to the tuning of the hyperprior.

4.2. Spatial point pattern model including environmental covariates

We now apply a second-order IGMRF defined on a lattice to model spatial autocorrelation, analysing a spatial point pattern discretized to a grid. We analyse the spatial point pattern formed by the species Protium tenuifolium, one of many tree species observed in a rainforest dataset from Barro Colorado, Panama. The full dataset is derived from a global network of plots coordinated by the Center for Tropical Forest Science (CTFS) and includes a large number of different tree species observed within a 50-ha plot. In addition to the locations of the trees, the dataset includes environmental covariates: two topography covariates and thirteen different soil covariates.

Figure 4: The point pattern formed by the species Protium tenuifolium observed within a 50-ha plot in Barro Colorado, Panama.

The species under consideration consists of a total of 4294 trees (Figure 4) and clearly exhibits spatial inhomogeneity, as the point intensity is higher on the right-hand side of the plot. A primary aim in analysing this type of data is typically to investigate the influence of environmental covariates, while spatially structured and unstructured terms can be included to account for spatial autocorrelation and overdispersion caused by unobserved trends and heterogeneity.

4.2.1. Modelling approach

We analyse the given spatial point pattern using a log-Gaussian Cox process (Møller et al., 1998), in which the intensity of the spatial point process is modelled by Λ(s) = exp{Z(s)}, where {Z(s) : s ∈ R²} is a Gaussian random field. Conditional on Z(s) = z(s), the observed count y_i within grid cell s_i is assumed to be Poisson distributed,

$$y_i \mid z(s) \sim \text{Poisson}\Big(\int_{s_i} \exp(z(s))\,ds\Big) \approx \text{Poisson}\big(|s_i| \exp(\eta_i)\big), \qquad i = 1, \ldots, n,$$

where |s_i| is the area of grid cell s_i. We assume that η_i can be modelled by the linear predictor

$$\eta_i = \beta_0 + \sum_{j=1}^{p} z_{ji}\,\beta_j + f_s(s_i) + u_i, \qquad (10)$$

where β_j represents the linear effect of the fixed covariate z_j for j = 1, …, p. The function f_s(·) represents a spatially structured random effect, here modelled by a second-order IGMRF defined on a lattice. The error term u denotes spatially unstructured iid random effects, u_i ∼ N(0, τ_u^{-1}), with random precision. In the current analysis, we divide the region of interest into n grid cells, each with an area of 100 m². We use the default choices in R-INLA for the fixed precisions of the covariates and for the hyperprior of the random precision of the error term. Alternatively, we could use the suggestions in Fong et al. (2010) for these terms.

The observed environmental covariates are highly correlated. Based on a standard GLM analysis (omitting the spatially structured and unstructured terms) and backward selection, we select a subset of covariates for which the variance inflation factor is less than 5, to avoid problems with multicollinearity. The resulting environmental covariates include terrain slope (Slope), together with soil covariates giving the content of Aluminium (Al), Copper (Cu), Iron (Fe), Manganese (Mn), Phosphorus (P), Zinc (Zn) and Nitrogen (N), and the pH value (ph) of the soil. All of these covariates are log-transformed and standardized prior to the analysis.

Figure 5: Estimated coefficients of the covariates (Slope, Al, Cu, Fe, Mn, P, Zn, N, ph) with 95% credible intervals when including only the fixed environmental covariates (left panel), the fixed effects and the unstructured term (middle panel), and the fixed effects together with both the spatially structured and unstructured terms (right panel) in (10).
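The discretization behind the Poisson approximation in the modelling approach above can be sketched as follows: points are binned into grid cells, and each cell contributes a Poisson term with mean |s_i| exp(η_i), i.e. an offset log|s_i| on the linear predictor. In this NumPy sketch the plot is assumed to be 1000 m × 500 m (one way of obtaining 50 ha), the cells are 10 m × 10 m (100 m²), and the 4294 points are simulated uniformly at random; they are placeholders, not the Protium tenuifolium data.

```python
import numpy as np

width, height, cell = 1000.0, 500.0, 10.0                          # assumed plot and cell sizes (m)
rng = np.random.default_rng(1)
pts = rng.uniform([0.0, 0.0], [width, height], size=(4294, 2))     # simulated placeholder points

xedges = np.arange(0.0, width + cell, cell)
yedges = np.arange(0.0, height + cell, cell)
counts, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=[xedges, yedges])
y = counts.ravel()                                 # one count y_i per grid cell s_i
area = np.full(y.shape, cell * cell)               # |s_i| = 100 m^2 for every cell

def poisson_loglik(eta, y, area):
    """Poisson log-likelihood with mean |s_i| exp(eta_i), dropping the constant -sum log(y_i!)."""
    log_mean = np.log(area) + eta
    return float(np.sum(y * log_mean - np.exp(log_mean)))

# For an intercept-only predictor eta_i = beta_0 the maximizer is log(total count / total area).
beta0_hat = np.log(y.sum() / area.sum())
print(counts.shape, int(y.sum()), beta0_hat, poisson_loglik(np.full(y.shape, beta0_hat), y, area))
```

In the full model (10), η_i would add the covariate terms, the rw2d spatial effect and the iid term; the offset log|s_i| stays the same.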
The estimated means and 95% credible intervals for the effects of the environmental covariates are given in Figure 5. The left panel shows the results when including only the fixed covariates in (10). All of these covariates are significant at the 5% level when the spatially structured and unstructured terms are omitted, but the variance is clearly underestimated. When the unstructured error term is included to model unobserved heterogeneity (middle panel), the variances increase slightly, but only one of the covariates becomes non-significant. When the spatially structured effect is also included (using default priors in R-INLA), all of the covariates become non-significant except for Mn (right panel).

4.2.2. Tuning the spatial effect

These results illustrate the trade-off involved in fitting a spatial effect to point patterns. If the spatial autocorrelation is not sufficiently explained, the variance is underestimated and conclusions might be invalid. On the other hand, an overly detailed spatial effect just mirrors the pattern itself, and a potentially significant impact of environmental covariates might not be revealed. To investigate this further, we need to tune the spatial effect. First, notice that with the default prior in R-INLA, the upper limit for the marginal standard deviation equals U = 1.85 (using α = 0.001), which gives a rather detailed spatial effect (Figure 6). To explain spatial autocorrelation at a larger scale, the spatial effect needs to be smoothed more, which can be achieved by increasing the shape parameter a and/or decreasing the inverse-scale parameter b of the Gamma hyperprior.

Figure 6: Estimated spatial effect for the rainforest species Protium tenuifolium, using the default hyperprior for the precision of the second-order IGMRF on a lattice.

Sensitivity to the hyperprior parameters can be investigated in several ways. One option is to use the same strategy as in the previous example, in which the upper limit in (8) is decreased to give a smoother spatial effect. For the given example, we choose to first increase the value of the shape parameter a, as this will explain spatial autocorrelation at larger scales (Beguin et al., 2012; Illian et al., 2012). Notice that for a Gamma hyperprior, the shape parameter a is related to the coefficient of variation c_v by

$$a = c_v^{-2} = \Big(\frac{\mu(\tau)}{\mathrm{sd}(\tau)}\Big)^2.$$

This implies that by selecting a = 1, the standard deviation of the precision is allowed to be as large as its mean, which might often be too drastic. Here, we reduce the standard deviation by letting a = 25 (so that c_v = 0.2), and then rerun the model for different values of U. Recall that if we change the resolution of the analysis, the inverse-scale parameter is rescaled automatically, as the reference standard deviation will then change. Figure 7 shows the resulting estimated spatial effect when U = 0.5, together with the estimated means and 95% credible intervals for the covariates.
The soil content of Manganese still has a positive effect on the estimated point pattern intensity. In addition, we find that the soil content of Aluminium has a significantly negative impact. These results agree with Schreeg et al. (2010), where Manganese is described as an essential plant nutrient, while Aluminium is non-essential for plant growth and possibly toxic. These results have also been confirmed for a wider range of values of a.
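The rescaling of the inverse-scale parameter b can be illustrated with a short calculation. The Python sketch below is not the authors' code or an R-INLA function, and the helper names are hypothetical. It assumes (i) that the upper limit in (8) is defined by P(σ_ref/√τ > U) = α, where τ ~ Γ(a, b) with b a rate and σ_ref denotes the reference standard deviation of the IGMRF, and (ii) that the R-INLA default hyperprior is Γ(1, 5·10⁻⁵). Both are our assumptions. Since bτ ~ Γ(a, 1), the condition gives b = U² F⁻¹(α; a, 1)/σ_ref², where F⁻¹(α; a, 1) is the α-quantile of Γ(a, 1).

```python
import math


def gamma_cdf(x, a):
    """CDF of Gamma(shape=a, rate=1) at x, i.e. the regularized lower incomplete gamma P(a, x)."""
    if x <= 0.0:
        return 0.0
    # Series: P(a, x) = x^a e^(-x) / Gamma(a) * sum_n x^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def gamma_quantile(alpha, a):
    """alpha-quantile of Gamma(shape=a, rate=1), found by bisection on gamma_cdf."""
    lo, hi = 0.0, max(a, 1.0)
    while gamma_cdf(hi, a) < alpha:  # widen the bracket if needed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, a) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rate_from_upper_limit(U, alpha, a, sigma_ref):
    """Rate b of a Gamma(a, b) prior on tau such that P(sigma_ref / sqrt(tau) > U) = alpha (assumed form of (8))."""
    # sigma_ref / sqrt(tau) > U  <=>  tau < sigma_ref^2 / U^2, and b * tau ~ Gamma(a, 1).
    return U**2 * gamma_quantile(alpha, a) / sigma_ref**2


def upper_limit(a, b, alpha, sigma_ref):
    """Inverse map: the upper limit U implied by a Gamma(a, b) prior and tail probability alpha."""
    return sigma_ref * math.sqrt(b / gamma_quantile(alpha, a))


alpha = 0.001
# Assumption: the R-INLA default is Gamma(1, 5e-5); back out sigma_ref from the quoted U = 1.85.
sigma_ref = 1.85 * math.sqrt(gamma_quantile(alpha, 1.0) / 5e-5)
# Smoother prior: shape a = 25 and upper limit U = 0.5.
b = rate_from_upper_limit(U=0.5, alpha=alpha, a=25.0, sigma_ref=sigma_ref)
print(f"sigma_ref = {sigma_ref:.2f}, b = {b:.4f}")
print(f"U for the default prior at alpha = 0.01: {upper_limit(1.0, 5e-5, 0.01, sigma_ref):.2f}")
```

Under these assumptions, backing out σ_ref ≈ 8.3 from U = 1.85 and then setting a = 25, U = 0.5 gives b ≈ 0.045, consistent with the Γ(25, 0.045) hyperprior in the caption of Figure 7. The last line also shows why the quantile probability matters: the same default prior corresponds to U ≈ 0.58 if α = 0.01 is used instead of α = 0.001. The Gamma quantile is obtained by bisection on a series expansion to keep the sketch dependency-free; scipy.stats.gamma.ppf would serve the same purpose.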

Figure 7: The estimated spatial effect (left panel) and the estimated means and 95% credible intervals for the covariates Slope, Al, Cu, Fe, Mn, P, Zn, N and pH (right panel), modelling the rainforest species Protium Tenuifolium using a Γ(25, 0.045) hyperprior for the precision of the second-order IGMRF on a lattice.

5. Concluding remarks

The main focus of this paper is to illustrate that the marginal standard deviations of commonly used IGMRF priors are very different, and we believe that this should be accounted for when assigning hyperpriors to the precision parameters of these models. Using R-INLA or other software for Bayesian analysis, the user typically needs to choose and scale these hyperpriors, and there are no clear answers to how this should be done. The ideas presented here do rely on making some subjective choices. However, by assigning hyperpriors to scaled precisions, the resulting hyperpriors are invariant to the choice of IGMRF model, including the resolution applied. More specifically, a given hyperprior can easily be recalculated to account for a different model. This implies that when the latent field is modelled using several IGMRFs, the rescaling can be used to control the influence of the resulting random effects and to give them comparable impact.

The marginal standard deviation of an IGMRF has an intuitive interpretation as the local deviation from an assumed underlying level, being either a constant, a line or a plane. This interpretation helps both in understanding different hyperprior choices and in assessing sensitivity to the choices made. If we impose the same upper limit for the marginal standard deviation of different IGMRFs, posterior sensitivity can be investigated simultaneously for all of the corresponding random effects. A possible strategy for such a sensitivity analysis is to vary the upper limit for the marginal standard deviation.
However, the exact value of the upper limit has to be interpreted with care, as it depends on the chosen quantile probability α. The presented strategy of assigning hyperpriors to scaled precisions of IGMRFs is easily implemented and can make the selection of these hyperpriors less ad-hoc than purely subjective choices. The given ideas are restricted to the class of IGMRF priors and do not cover many other commonly applied prior models. For example, a highly relevant alternative within spatial point pattern analysis is to use the Matérn correlation function to model underlying spatial dependency structures. This model is also implemented in R-INLA, where the user needs to select hyperpriors for both a precision and a range parameter. Guidelines for choosing these hyperpriors would be very useful. In general, prior selection remains a complicated field and an important subject of future research.
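The recalculation of a hyperprior for a different model can be sketched as follows. This is a minimal Python illustration, not the authors' implementation, and all function names are hypothetical. It assumes that the reference standard deviation σ_ref is the geometric mean of the marginal standard deviations of the IGMRF with unit precision, computed from the generalized inverse of the structure matrix under constraints orthogonal to its null space, and that the upper limit is defined by P(σ_ref/√τ > U) = α with τ ~ Γ(a, b). For fixed a and α, U is then proportional to σ_ref√b, so keeping U fixed requires b_new = b_old σ_ref,old²/σ_ref,new². For simplicity, first- and second-order random walks on a line stand in for the lattice model of the example.

```python
import numpy as np


def rw_structure(n, order):
    """Structure matrix R = D^T D of a random walk of the given order on n equally spaced nodes."""
    D = np.diff(np.eye(n), n=order, axis=0)  # (n - order) x n matrix of order-th differences
    return D.T @ D


def reference_sd(R, null_dim):
    """Geometric mean of the marginal standard deviations of an IGMRF with structure matrix R
    and unit precision, under constraints orthogonal to the null space of R (dimension null_dim).
    RW1 has a 1-dimensional null space (constants), RW2 a 2-dimensional one (lines).
    """
    eigval, eigvec = np.linalg.eigh(R)  # symmetric eigendecomposition, eigenvalues ascending
    # Skip the null_dim zero eigenvalues: diagonal of sum_j v_j v_j^T / lambda_j over the rest.
    marginal_var = (eigvec[:, null_dim:] ** 2) @ (1.0 / eigval[null_dim:])
    return float(np.exp(0.5 * np.mean(np.log(marginal_var))))


def rescale_rate(b, sigma_ref_from, sigma_ref_to):
    """Rate of a Gamma(a, b) precision prior that keeps U fixed (same a and alpha) in a new model."""
    # U is proportional to sigma_ref * sqrt(b), so sigma_ref**2 * b must stay the same.
    return b * sigma_ref_from**2 / sigma_ref_to**2


# sigma_ref for first- and second-order random walks at three resolutions.
for n in (50, 100, 200):
    s_rw1 = reference_sd(rw_structure(n, 1), null_dim=1)
    s_rw2 = reference_sd(rw_structure(n, 2), null_dim=2)
    print(f"n = {n:3d}: sigma_ref(RW1) = {s_rw1:8.2f}, sigma_ref(RW2) = {s_rw2:10.2f}")

# Hypothetical illustration: carry a Gamma(a, 0.045) prior from an RW1 to an RW2 model, n = 100.
s1 = reference_sd(rw_structure(100, 1), null_dim=1)
s2 = reference_sd(rw_structure(100, 2), null_dim=2)
print(f"rescaled rate for the RW2 model: {rescale_rate(0.045, s1, s2):.3e}")
```

The printed values show that σ_ref changes markedly with both the order of the IGMRF and the resolution n, which is why an unscaled Gamma hyperprior implies different marginal standard deviations for different models. The rate 0.045 in the last lines is only a hypothetical starting point. An eigendecomposition is used instead of a generic pseudo-inverse so that exactly the known null-space eigenvalues are dropped, regardless of round-off.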

Acknowledgements

The rainforest data set has been collected with support from the Center for Tropical Forest Science, the Smithsonian Tropical Research Institute, the John D. and Catherine T. MacArthur Foundation, the Mellon Foundation, the Celera Foundation, and numerous private individuals, and through the hard work of over 100 people from 10 countries over the past two decades. The plot project is part of the Center for Tropical Forest Science, a global network of large-scale demographic tree plots.

References

Akerkar, R., S. Martino, and H. Rue (2010). Implementing approximate Bayesian inference for survival analysis using integrated nested Laplace approximations. Preprint Statistics, Norwegian University of Science and Technology 1,
Beguin, J., S. Martino, H. Rue, and S. G. Cumming (2012). Hierarchical analysis of spatially autocorrelated ecological data using integrated nested Laplace approximation. Methods in Ecology and Evolution 3,
Berger, J. O., W. Strawderman, and D. Tang (2005). Posterior propriety and admissibility of hyperpriors in normal hierarchical models. The Annals of Statistics 33,
Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems (with discussion). Journal of the Royal Statistical Society, Ser. B 36,
Besag, J. and C. Kooperberg (1995). On conditional and intrinsic autoregressions. Biometrika 82,
Besag, J., J. York, and A. Mollié (1991). Bayesian image restoration, with two applications in spatial statistics. Annals of the Institute of Statistical Mathematics 43,
Fahrmeir, L. and G. Tutz (2001). Multivariate statistical modelling based on generalized linear models. 2nd edn. Springer, Berlin.
Fong, Y., H. Rue, and J. Wakefield (2010). Bayesian inference for generalized linear mixed models. Biostatistics 11,
Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin (2004). Bayesian Data Analysis. Chapman & Hall, London, Boca Raton.
Henderson, R., S. Shimakura, and D. Gorst (2002). Modeling spatial variation in leukemia survival data. Journal of the American Statistical Association 97,
Illian, J. B., S. H. Sørbye, H. Rue, and D. K. Hendrichsen (2012). Fitting a log Gaussian Cox process with temporally varying effects – a case study. Journal of Environmental Statistics 3,
Kneib, T. and L. Fahrmeir (2007). A mixed model approach for geoadditive hazard regression. Scandinavian Journal of Statistics 34,
Lesaffre, E. and A. B. Lawson (2012). Bayesian Biostatistics. John Wiley & Sons, Ltd.
Lindgren, F. and H. Rue (2008). On the second-order random walk model for irregular locations. Scandinavian Journal of Statistics 35,
Lindgren, F., H. Rue, and J. Lindström (2011). An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. Journal of the Royal Statistical Society, Ser. B 73,
Lunn, D., D. Spiegelhalter, A. Thomas, and N. Best (2009). The BUGS project: Evolution, critique and future directions. Statistics in Medicine 28,
Martino, S., R. Akerkar, and H. Rue (2011). Approximate Bayesian inference for survival models. Scandinavian Journal of Statistics 38,
Møller, J., A. R. Syversveen, and R. P. Waagepetersen (1998). Log Gaussian Cox processes. Scandinavian Journal of Statistics 25,
Roos, M. and L. Held (2011). Sensitivity analysis in Bayesian generalized linear mixed models for binary data. Bayesian Analysis 6,
Rue, H. and L. Held (2005). Gaussian Markov Random Fields. Chapman & Hall/CRC, Boca Raton.
Rue, H., S. Martino, and N. Chopin (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations (with discussion). Journal of the Royal Statistical Society, Ser. B 71,
Schreeg, L. A., W. J. Kress, D. L. Erickson, and N. G. Swenson (2010). Phylogenetic analysis of local-scale tree soil associations in a lowland moist tropical forest. PLoS ONE 5,
Sørbye, S. H. and H. Rue (2011). Simultaneous credible bands for latent Gaussian models. Scandinavian Journal of Statistics 38,
