
Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Melanie M. Wall, Bradley P. Carlin November 24, 2014

Outline of the talk: Introduction; Spatial Frailty Modelling; Lattice models; Bayesian implementation; Bayesian model choice; Application to Minnesota Infant Mortality; Model fitting; Mapping summaries; Model checking; Neonatal versus post-neonatal mortality; Comparison of Spatial Frailty and Logistic Regression Models; Concluding Remarks

Introduction
Time-to-event data are most often grouped into clusters, such as clinical sites or geographic regions. A hierarchical modelling approach using stratum-specific frailties is often appropriate for this type of data. Let $t_{ij}$ be the time to event or censoring for subject $j$ in stratum $i$, $j = 1, \ldots, n_i$, $i = 1, \ldots, I$, and let $x_{ij}$ be a vector of individual-specific covariates. The usual proportional hazards model is
$$h(t_{ij}; x_{ij}) = h_0(t_{ij}) \exp(\beta^T x_{ij}) \qquad (1)$$
In the frailty setting, a stratum-specific frailty $W_i$ is added to the linear predictor:
$$h(t_{ij}; x_{ij}) = h_0(t_{ij}) \exp(\beta^T x_{ij} + W_i) \qquad (2)$$

Introduction (cont.)
The frailties are assumed to be $W_i \overset{iid}{\sim} N(0, \sigma^2)$. The normal distribution is used because it makes it straightforward to introduce a correlation structure among the frailties. Placing a parametric (Weibull) model on the baseline hazard $h_0$ gives a fully parametric hierarchical frailty model:
$$h(t_{ij}; x_{ij}) = \rho t_{ij}^{\rho - 1} \exp(\beta^T x_{ij} + W_i) \qquad (3)$$
Placing prior distributions on $\rho$, $\beta$, and $\sigma$ completes the specification for a Bayesian implementation.
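To make the Weibull frailty model (3) concrete, the following is a minimal simulation sketch, not the authors' code. The sample sizes, the frailty standard deviation, the covariates, and the coefficient values are all hypothetical, and the function names are introduced here for illustration only.

```python
import numpy as np

def weibull_frailty_hazard(t, x, beta, rho, w):
    """Hazard rho * t**(rho - 1) * exp(x'beta + w), as in equation (3)."""
    return rho * t ** (rho - 1.0) * np.exp(x @ beta + w)

def simulate_event_times(X, beta, rho, w, rng):
    """Draw event times by inverting the cumulative hazard t**rho * exp(eta).

    If E ~ Exponential(1), then T = (E / exp(eta))**(1 / rho) has hazard (3).
    """
    eta = X @ beta + w
    e = rng.exponential(size=X.shape[0])
    return (e / np.exp(eta)) ** (1.0 / rho)

rng = np.random.default_rng(0)
n_strata, n_per_stratum = 5, 200            # hypothetical sizes
sigma = 0.5                                 # hypothetical frailty SD
W = rng.normal(0.0, sigma, size=n_strata)   # W_i iid N(0, sigma^2)
stratum = np.repeat(np.arange(n_strata), n_per_stratum)
X = rng.normal(size=(stratum.size, 2))      # two made-up covariates
beta = np.array([0.3, -0.2])                # hypothetical coefficients
times = simulate_event_times(X, beta, rho=1.5, w=W[stratum], rng=rng)
```

Every subject in stratum i shares the same W_i, so a positive frailty raises the hazard of the whole stratum and shortens its event times on average.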

Introduction (cont.)
When the strata are arranged spatially, the frailties can be modelled in two ways. Geostatistical approaches use the exact geographic locations (e.g. latitude and longitude) of the strata. Lattice approaches use only the positions of the strata relative to each other.

2. Spatial Frailty Modelling
2.1 Geostatistical Models
These models assume that the random process of interest $Y(s)$ is indexed continuously by $s$ throughout a space $D$. The goal is to predict the unobserved value $Y(t)$ at some target location $t$, given observations $Y \equiv \{Y(s_i)\}$ at known source locations $s_i$, $i = 1, \ldots, I$. The model is
$$Y \mid \mu, \theta \sim N_I(\mu \mathbf{1}, H(\theta)), \quad \theta = (\sigma^2, \phi) \qquad (4)$$
where $N_I$ denotes the $I$-dimensional normal distribution with stationary mean level $\mu$, and $H(\theta)_{ii'}$ is the covariance between $Y(s_i)$ and $Y(s_{i'})$.

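Assuming an isotropic setting,
$$H(\theta)_{ii'} = \sigma^2 \exp\{-\phi d_{ii'}\}, \quad \sigma^2 > 0, \ \phi > 0 \qquad (5)$$
where $d_{ii'}$ is the distance between $s_i$ and $s_{i'}$. We apply the geostatistical model (4) and (5) to the random frailties $W_i$ to give them a spatial structure:
$$W \mid \theta \sim N_I(0, H(\theta)) \qquad (6)$$
Adding prior distributions for $\rho$, $\beta$, and $\theta$ completes a Bayesian specification using (3) and (6).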
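As an illustration of (5) and (6), the sketch below builds the exponential covariance matrix from a set of centroid coordinates and draws one vector of spatially correlated frailties. The coordinates and the value of sigma2 are made up, Euclidean distance in km is an assumption, and phi = 0.043 is the posterior median quoted in the application section.

```python
import numpy as np

def exponential_covariance(coords, sigma2, phi):
    """H(theta)_{ii'} = sigma2 * exp(-phi * d_{ii'}) with Euclidean distances."""
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))     # pairwise distance matrix
    return sigma2 * np.exp(-phi * d)

rng = np.random.default_rng(1)
coords = rng.uniform(0.0, 300.0, size=(6, 2))   # hypothetical centroids, in km
H = exponential_covariance(coords, sigma2=0.25, phi=0.043)

# One draw of the frailty vector W | theta ~ N_I(0, H(theta)), as in (6).
W = rng.multivariate_normal(np.zeros(len(coords)), H)
```

Nearby regions receive highly correlated frailties, while regions separated by much more than 3/phi are nearly independent.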

Lattice Models
In this model, $W$ is defined only on discretely indexed regions that form a partition of the space $D$, and the prior usually incorporates information about the adjacency of regions:
$$W \mid \lambda \sim \mathrm{CAR}(\lambda) \qquad (7)$$
The most common form of this prior (Bernardinelli and Montomoli, 1992) has joint distribution proportional to
$$\lambda^{I/2} \exp\Big\{ -\frac{\lambda}{2} \sum_{i \,\mathrm{adj}\, i'} (W_i - W_{i'})^2 \Big\} \propto \lambda^{I/2} \exp\Big\{ -\frac{\lambda}{2} \sum_{i=1}^{I} m_i W_i (W_i - \bar{W}_i) \Big\}$$

where $i \,\mathrm{adj}\, i'$ denotes that regions $i$ and $i'$ are adjacent, $\bar{W}_i$ is the average of the $W_{i' \neq i}$ that are adjacent to $W_i$, and $m_i$ is the number of these adjacencies. This CAR prior is a member of the class of pairwise difference priors (Besag et al., 1995), which are identified only up to an additive constant. To permit the data to identify an intercept term $\beta_0$ in the hazard function (2), we also add the constraint $\sum_{i=1}^{I} W_i = 0$. The prior specification is then
$$W_i \mid W_{i' \neq i} \sim N\Big(\bar{W}_i, \frac{1}{\lambda m_i}\Big) \qquad (8)$$
and a gamma hyperprior distribution is placed on $\lambda$.
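The two forms of the CAR exponent and the conditional distribution (8) can be checked numerically on a toy map. This is a sketch under stated assumptions: the four-region adjacency matrix and the helper `car_conditional` are hypothetical, and each adjacent pair is counted once in the pairwise sum.

```python
import numpy as np

# Hypothetical adjacency matrix for 4 regions arranged in a line: 0-1-2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
m = A.sum(axis=1)          # m_i: number of neighbours of region i

rng = np.random.default_rng(2)
W = rng.normal(size=A.shape[0])
W -= W.mean()              # impose the sum-to-zero constraint

# Pairwise form: squared differences over adjacent pairs, each counted once.
pair_sum = sum((W[i] - W[j]) ** 2
               for i in range(len(W)) for j in range(i + 1, len(W))
               if A[i, j] == 1)

# Weighted form: sum_i m_i * W_i * (W_i - W_bar_i).
W_bar = (A @ W) / m        # average of the neighbours of each region
weighted_sum = float(np.sum(m * W * (W - W_bar)))

def car_conditional(i, W, lam):
    """Mean and variance of W_i given the other frailties, as in (8)."""
    return (A[i] @ W) / m[i], 1.0 / (lam * m[i])
```

The two sums agree for any W, which is why the joint density can be written in either form. Regions with more neighbours get a smaller conditional variance, since 1/(lambda m_i) shrinks as m_i grows.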

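Bayesian Implementation
Letting $\gamma_{ij}$ be a death indicator (1 if subject $j$ in stratum $i$ died, 0 if censored), the joint posterior distribution of interest is
$$p(\beta, W, \rho, \theta \mid t, x, \gamma) \propto L(\beta, W, \rho; t, x, \gamma)\, p(W \mid \theta)\, p(\beta)\, p(\rho)\, p(\theta) \qquad (9)$$
where the Weibull likelihood is
$$L(\beta, W, \rho; t, x, \gamma) \propto \prod_{i=1}^{I} \prod_{j=1}^{n_i} \{\rho t_{ij}^{\rho - 1} \exp(\beta^T x_{ij} + W_i)\}^{\gamma_{ij}} \exp\{-t_{ij}^{\rho} \exp(\beta^T x_{ij} + W_i)\} \qquad (10)$$
The model specification in the Bayesian setup is completed by assigning prior distributions for $\beta$, $\rho$, and $\theta$.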
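The likelihood (10) is easiest to work with on the log scale: each subject contributes the death indicator times the log hazard, minus the cumulative hazard. A minimal sketch follows; the function name, argument layout, and the four-subject data set are hypothetical and only illustrate the formula.

```python
import numpy as np

def weibull_frailty_loglik(t, gamma, X, stratum, beta, rho, W):
    """Log of likelihood (10), up to an additive constant.

    t       : event or censoring times, shape (n,)
    gamma   : death indicators (1 = died, 0 = censored), shape (n,)
    X       : covariate matrix, shape (n, p)
    stratum : integer stratum index i for each subject, shape (n,)
    """
    eta = X @ beta + W[stratum]                        # beta'x_ij + W_i
    log_hazard = np.log(rho) + (rho - 1.0) * np.log(t) + eta
    cum_hazard = t ** rho * np.exp(eta)                # t^rho * exp(eta)
    return float(np.sum(gamma * log_hazard - cum_hazard))

# Tiny made-up data set: 4 subjects in 2 strata, one covariate.
t = np.array([0.5, 1.0, 2.0, 1.5])
gamma = np.array([1, 0, 1, 1])
X = np.array([[0.0], [1.0], [0.0], [1.0]])
stratum = np.array([0, 0, 1, 1])
ll = weibull_frailty_loglik(t, gamma, X, stratum,
                            beta=np.array([0.2]), rho=1.2,
                            W=np.array([0.1, -0.1]))
```

Adding the log prior densities from (9) to this function gives the unnormalized log posterior that an MCMC sampler would evaluate.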

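2.4 Bayesian Model Choice
Bayes factors are notoriously difficult to compute, and the Bayes factor is only defined when the marginal density of $y$ under each model is proper. The deviance information criterion (DIC) is therefore used for model selection. The deviance statistic is
$$D(\theta) = -2 \log f(y \mid \theta) + 2 \log h(y) \qquad (11)$$
where $h(y)$ is some standardizing function of the data alone. The DIC is defined as
$$\mathrm{DIC} = \bar{D} + p_D$$
where $\bar{D} = E_{\theta \mid y}[D]$ is the posterior expected deviance, which measures fit, and $p_D = E_{\theta \mid y}[D] - D(E_{\theta \mid y}[\theta])$ is the effective number of parameters, which measures complexity. Smaller values of DIC indicate a preferable model.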
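Given posterior draws, DIC can be estimated by averaging the deviance over the draws and evaluating it once at the posterior mean. The sketch below uses a toy normal-mean model with known unit variance and a flat prior, not the survival model of the talk; the function names and the choice h(y) = 1 are assumptions made for illustration.

```python
import numpy as np

def dic_from_samples(deviance, theta_samples):
    """Return (DIC, D_bar, p_D) from posterior draws, one row per draw."""
    d_samples = np.array([deviance(theta) for theta in theta_samples])
    d_bar = d_samples.mean()                          # posterior mean deviance
    d_at_mean = deviance(theta_samples.mean(axis=0))  # deviance at posterior mean
    p_d = d_bar - d_at_mean                           # effective number of parameters
    return d_bar + p_d, d_bar, p_d

rng = np.random.default_rng(3)
y = rng.normal(loc=1.0, scale=1.0, size=50)           # made-up data

def normal_deviance(theta):
    """-2 log f(y | theta) for y_k ~ N(theta, 1), taking h(y) = 1."""
    return float(np.sum((y - theta[0]) ** 2) + y.size * np.log(2.0 * np.pi))

# Under a flat prior, the posterior of theta is N(mean(y), 1 / n).
theta_samples = rng.normal(y.mean(), 1.0 / np.sqrt(y.size), size=(20000, 1))
dic, d_bar, p_d = dic_from_samples(normal_deviance, theta_samples)
```

In this toy model there is one free parameter, and the estimated p_D comes out close to 1, which is the behaviour the talk relies on when it compares p_D with the actual parameter count of the no-frailty model.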

3. Application to Minnesota Infant Mortality
3.1 Model fitting
The data were obtained from the linked birth-death records data registry kept by the Minnesota Department of Health. The data comprise 267,646 live births occurring during the years 1992 to 1996, followed through the first year of life. Covariate information on birth weight, sex, race, mother's age, and the mother's total number of previous births is incorporated in this study. The contiguous county neighbor structure and the latitude and longitude of the county centroids were also recorded. This information is needed to implement the lattice and geostatistical models, respectively.

3.1 Model fitting (cont.)
In addition to the spatial models, they investigate the non-spatial frailty model (2), as well as a simple nonhierarchical (no frailty) model, which simply sets $W_i = 0$ for all $i$. Metropolis random walk steps with Gaussian proposals were used for sampling from the full conditionals for $\beta$, while Hastings independence steps with gamma proposals were used for updating $\rho$. For the geostatistical modeling of the $W_i$, they used the isotropic exponential correlation function. The inter-county distances $d_{ii'}$ are computed using the coordinates of the county centroids.
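For readers unfamiliar with the sampler mentioned above, here is a generic random walk Metropolis sketch with Gaussian proposals. It is not the authors' implementation: the function name, the step size, and the standard normal target standing in for a full conditional of beta are all assumptions.

```python
import numpy as np

def random_walk_metropolis(log_post, init, step, n_iter, rng):
    """Random walk Metropolis with symmetric Gaussian proposals."""
    current = np.asarray(init, dtype=float)
    current_lp = log_post(current)
    draws = np.empty((n_iter, current.size))
    accepted = 0
    for k in range(n_iter):
        proposal = current + step * rng.normal(size=current.size)
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        draws[k] = current
    return draws, accepted / n_iter

def toy_log_post(theta):
    """Stand-in target: independent standard normals (up to a constant)."""
    return -0.5 * float(np.sum(theta ** 2))

rng = np.random.default_rng(4)
draws, accept_rate = random_walk_metropolis(toy_log_post, init=[0.0, 0.0],
                                            step=1.0, n_iter=10000, rng=rng)
```

In the frailty model, `toy_log_post` would be replaced by the log of the full conditional for beta, that is, the log-likelihood plus the log prior with all other parameters held at their current values.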

Model fitting (cont.)
For the exponential correlation function, the quantity $3/\phi$ may be thought of as a measure of the effective isotropic range, i.e. the distance beyond which the correlation between observations drops below 0.05. They adopt a vague IG(2, 0.01) prior for $\sigma^2$, which has mean 100 but infinite variance. For $\phi$ they take a vague G(0.01, 100) prior, which has mean 1 and variance 100. For the lattice model, they use the CAR distribution for the $W_i$, putting a prior on the smoothness parameter $\lambda$.
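The effective range and the stated prior moments can be verified with a few lines of arithmetic. The parameterizations are assumptions chosen to reproduce the numbers on the slide: G(a, b) is taken as shape a and scale b, and IG(a, b) is taken to have mean 1/(b(a - 1)) for a > 1. The value phi = 0.043 is the posterior median reported later in the talk.

```python
import math

def effective_range(phi):
    """Distance 3 / phi at which exp(-phi * d) falls to exp(-3)."""
    return 3.0 / phi

def gamma_mean_var(shape, scale):
    """Mean and variance of a gamma distribution (shape-scale form, assumed)."""
    return shape * scale, shape * scale ** 2

def inverse_gamma_mean(a, b):
    """Mean 1 / (b * (a - 1)) of IG(a, b), valid for a > 1 (assumed form)."""
    return 1.0 / (b * (a - 1.0))

phi = 0.043
range_km = effective_range(phi)                 # roughly 70 km
corr_at_range = math.exp(-phi * range_km)       # equals exp(-3), just under 0.05

phi_prior_mean, phi_prior_var = gamma_mean_var(0.01, 100.0)
sigma2_prior_mean = inverse_gamma_mean(2.0, 0.01)
```

The variance of an inverse gamma distribution exists only when its shape exceeds 2, which is why the IG(2, 0.01) prior has a finite mean but infinite variance.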

3.1 Model fitting (cont.)
The models are compared using DIC and the effective model size $p_D$ (from the table). For the no-frailty model, $p_D$ is 8.72, very close to the actual number of parameters (nine). The DIC values suggest that each of the frailty models is substantially better than the no-frailty model, despite their increased size.


In all three models, all of the predictors are significant at the 0.05 level. Boys have a higher hazard of death during the first year of life. Evidence of the modest amount of spatial similarity in the dataset is provided by the posterior median for $\phi$ in the geostatistical model (Table 4); its value of 0.043 implies a median effective spatial range of $3/0.043 \approx 70$ km. This helps explain why the geostatistical and CAR results are so similar: in most cases, borrowing strength from counties having centroids within 70 km will be nearly the same as borrowing strength from adjacent counties. A benefit of fitting the spatial CAR structure is seen in the shorter 95% credible intervals for the covariate effects in the spatial models compared to the i.i.d. model.

3.2 Mapping summaries
Figures 3 and 4 of the paper map the posterior medians of the $W_i$ under the non-spatial (i.i.d. frailties) and CAR models, respectively, in the case where no covariates $x$ are included in the model. The fitted i.i.d. model indicates excess mortality in the north, which is accentuated and extended to a generally increasing pattern from south to north by the CAR model. This trend, combined with the clear emergence of the Minneapolis (county 27) and St Paul (county 62) urban area, strongly suggests the need for fitting covariates in the model, most of which vary spatially.

3.3 Model checking
Figure 1: Boxplots of posterior median frailties, i.i.d. and CAR models with and without covariates

Figure 1 shows the posterior median frailties for the four cases (i.i.d. with no covariates, CAR with no covariates, i.i.d. with covariates, CAR with covariates). The tightness of the full CAR boxplot suggests that this model is best at reducing the need for the frailty terms.

Neonatal versus post-neonatal mortality
Infant deaths are split into neonatal (death within the first 28 days) and post-neonatal (death between days 29 and 365). The two data sets are fitted separately using the CAR frailty model.

Sex, birth weight, and total births are significant for both groups, while mother's age and Native American race are significant only for the post-neonatal group, and black and unknown race are significant only for the neonatal group. Thus the two groups differ in ways that are both intuitive and substantively intriguing.

Comparison of Spatial Frailty and Logistic Regression Models
Since no infants in the dataset are censored because of loss to follow-up, competing risks, or any reason other than the end of the study, there is no ambiguity in defining a binary survival outcome for use in a logistic regression model. That is, we replace the event time data $t_{ij}$ with an indicator of whether the subject did ($Y_{ij} = 0$) or did not ($Y_{ij} = 1$) survive the first year. Letting $p_{ij} = P(Y_{ij} = 1)$, the model is
$$\mathrm{logit}(p_{ij}) = \hat{\beta}^T x_{ij} + W_i \qquad (12)$$
with the usual flat prior for $\hat{\beta}$ and an i.i.d., CAR, or geostatistical prior for the $W_i$. When the probability of death is very small, as it is in the case of infant mortality, the log odds and log relative risk become very similar.
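The closeness of the log odds ratio and the log relative risk for rare events can be seen with a small numerical comparison. The probabilities below are made-up values chosen only to contrast a rare outcome with a common one; they are not estimates from the Minnesota data.

```python
import numpy as np

def log_relative_risk(p1, p0):
    """log(p1 / p0)."""
    return np.log(p1 / p0)

def log_odds_ratio(p1, p0):
    """Difference of logits, logit(p1) - logit(p0)."""
    return np.log(p1 / (1.0 - p1)) - np.log(p0 / (1.0 - p0))

rare = (0.010, 0.005)      # hypothetical rare-event probabilities
common = (0.40, 0.20)      # hypothetical common-event probabilities

# Both pairs have the same relative risk (a doubling), so any difference
# between the gaps comes from how common the outcome is.
gap_rare = log_odds_ratio(*rare) - log_relative_risk(*rare)
gap_common = log_odds_ratio(*common) - log_relative_risk(*common)
```

The gap equals log((1 - p0) / (1 - p1)), which vanishes as both probabilities approach zero. This is why coefficients from the logistic model (12) and from the hazard model can be compared when mortality is low.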

Comparison of Spatial Frailty and Logistic Regression Models (cont.)
Figure: Posterior medians of the frailties $W_i$ from the survival model (horizontal axis) versus posterior medians of the logistic random effects from model (12) (vertical axis). The plotting character is the county number.

Concluding Remarks
Several hierarchical approaches to frailty modeling for spatially correlated survival data have been discussed. Previous work by Carlin and Hodges (1999) suggests a generalization of the basic model (3) to
$$h(t_{ij}; x_{ij}) = \rho_i t_{ij}^{\rho_i - 1} \exp(\beta^T x_{ij} + W_i) \qquad (13)$$
That is, they allow two sets of random effects: the existing frailty parameters $W_i$ and a new set of shape parameters $\rho_i$. This allows both the overall level and the shape of the hazard function over time to vary from county to county.

Banerjee, S., Wall, M. M. and Carlin, B. P. (2003). Frailty modeling for spatially correlated survival data, with application to infant mortality in Minnesota. Biostatistics, pp. 123-142.
Carlin, B. P. and Hodges, J. S. (1999). Hierarchical proportional hazards regression models for highly stratified data. Biometrics, pp. 1162-1170.