SPATIAL JOINT ANALYSIS OF LONGITUDINAL AND SURVIVAL AIDS DATA IN BRAZIL

Rui Martins, CIIEM - Escola Superior de Saúde Egas Moniz, ruimartins@ymail.com
Giovani L. Silva, CEAUL & DMIST - Technical University of Lisbon, gsilva@math.ist.utl.pt
Valeska Andreozzi, CEAUL & DEIO - University of Lisbon, valeska.andreozzi@fc.ul.pt

Martins, Silva and Andreozzi (2012) page 1 of 27

Joint analysis of longitudinal and survival data has received increasing attention in recent years, especially for AIDS data. Such data have usually been analysed by considering the time-to-event data (survival outcome) or the repeated measurements (longitudinal outcome) separately. As both outcomes are observed in the same individual, joint modelling of longitudinal and survival data is more appropriate because it takes into account the dependence between the two types of responses. Here we employ a Bayesian hierarchical model for jointly modelling longitudinal and spatial survival data for a cohort of patients with HIV/AIDS in Brazil, obtaining estimates of the parameters of interest via Markov chain Monte Carlo methods. We also include spatial random effects to account for unobserved heterogeneity amongst individuals living in the same region. The results show that the Bayesian joint model considerably improves the median survival time distributions compared with those obtained from separate longitudinal and survival models, and that introducing the spatial random effects improves the models.

Keywords: Joint modelling, Longitudinal data, Survival data, Spatial analysis, Bayesian inference.

1 INTRODUCTION

In many types of research, longitudinal and survival data are collected simultaneously. However, their longitudinal and survival characteristics have usually been analysed separately, even when there was at least a latent relation between the two. A joint analysis can be an improvement of those data

analysis because: (a) the failure mechanism may contain information on the longitudinal variable, especially in cases where such failures are informative; (b) it allows the insertion in the time-to-event model of an internal covariate in the sense of Kalbfleisch and Prentice [20]. Survival models with a time-dependent covariate require that the covariate be external, i.e., that its value at time point $t$ is not affected by the occurrence of an event at time point $u$, with $u < t$. However, the time-dependent covariates encountered in longitudinal studies do not satisfy this condition, because they are the output of a stochastic process generated by the subject, which is directly related to the failure mechanism [25]. Also, longitudinal covariates are often important indicators of disease progression, but they are prone to measurement error and random fluctuations, which makes their direct use as time-varying covariates in a survival model inappropriate [18]. Time-dependent variables rarely respect this principle. Therefore, the observation errors should be taken into account in the survival part. While (a) causes bias in both submodels, (b) causes bias only in the survival submodel. A reasonable way to overcome this problem is to model these kinds of data jointly.

One of the main reasons for the increasing interest in joint analysis is that these models can be used in a variety of problems. Wu and Carroll [39] were the first to deal with the problem of potentially informative dropouts in longitudinal studies (see also, among others, Follmann and Wu [12]). Xu and Zeger [42] extended the model of Faucett and Thomas [11] to evaluate surrogate markers from a Bayesian point of view. Tsiatis et al. [37] considered a problem in which a time-to-event is the outcome of interest and the repeated measures are treated as error-prone, time-varying covariates.
Wulfsohn and Tsiatis [40] made an important early contribution to the emerging literature on joint modelling, in which the repeated measures and time-to-event outcomes were assumed to be conditionally independent given a set of unobserved, subject-level random effects. Henderson et al. [15] investigated the joint evolution of the longitudinal and survival processes. Several extensions of joint models have been proposed, including, among others, flexible modelling of the subject-specific profiles of the longitudinal outcome using multiplicative random effects (Ding and Wang [9]), relaxation of common parametric assumptions for the random effects distribution (Song et al. [32]), replacement of the relative risk model by accelerated failure time models (Tseng et al. [36]) and handling of multiple failure times (Elashoff et al. [10]). Law et al. [24], Brown and Ibrahim [3], Chen et al. [6] and Chi and Ibrahim [8] developed joint models in the presence of cure fractions. The potential advantages of multiple longitudinal outcomes in joint models were considered by Chi and Ibrahim [7], Brown et al. [4], Song et al. [31] and Xu and Zeger [41]. Hu et al. [17] deal with competing risks. Tsiatis and Davidian [38] is required reading because it gives an excellent overview

of the subsequent literature on joint modelling of longitudinal and time-to-event data. There are also many other approaches to the joint analysis of longitudinal and event time data, within both Bayesian and more classical frameworks. Bayesian papers published in recent years include Ibrahim, Chen and Sinha [19], Guo and Carlin [14], Chi and Ibrahim [7] and Zhang et al. [43].

In this paper we focus on the event time. To do this we need to model longitudinally the time-dependent covariate used in the time-to-event model. Our approach is fully Bayesian, which enables us to extend the models used so far in the literature by incorporating spatial random effects, and thus to capture shared but unobserved heterogeneity among individuals who live in the same region. The starting point for our work is the article of Guo and Carlin [14], which addresses the problem of joint analysis by proposing a Bayesian hierarchical model and obtaining estimates of the parameters of interest via Markov chain Monte Carlo (MCMC) methods. Their model is inspired by Henderson, Diggle and Dobson [15], where the authors proposed a likelihood-based joint model that uses the EM algorithm and connects the longitudinal and survival responses through a zero-mean latent bivariate Gaussian process. Models like this are much easier to implement with a Bayesian approach than with an EM algorithm. The Bayesian approach avoids the difficulties of maximizing the likelihood function, basing inference on the posterior density of the parameters.

The remainder of this paper is organised as follows. In Section 2 we describe the data set. In Section 3 we outline the spatial joint model that we work with. Section 4 discusses Bayesian model assessment. In Section 5 a detailed analysis of the HIV/AIDS data set is conducted to apply the proposed methods, together with a residual and a sensitivity analysis. Finally, a brief summary and an indication of possible areas of future work are given in Section 6.
2 DESCRIPTION OF DATA STRUCTURE

The Brazilian National AIDS Program has been acknowledged as a success in controlling the epidemic and has generated three major electronic databases [13]: SINAN-AIDS (Information System for Notifiable Diseases of AIDS Cases), SISCEL (Laboratory Test Control System) and SICLOM (System for Logistic Control of Drugs). We consider a random sample from the three databases referred to above, which were combined into a single database containing both HIV and AIDS cases by applying the linkage strategy adopted by the Surveillance Unit of the Brazilian National AIDS Program, including all individuals in each database [13]. The longitudinal and survival data were collected in a network of 88 laboratories located in all 27 states of Brazil during the years. CD4+ T lymphocyte counts (a measure of immunologic and virologic status) and survival time were the responses collected in a random sample of

$n = 4653$ individuals from the original database. The explanatory variables included were: age (< 50 coded 0 and ≥ 50 coded 1), gender (Female = 0, Male = 1), PrevOI (previous opportunistic infection at study entry = 1, no previous infection = 0), state (patient's Brazilian state of residence), date of HIV/AIDS diagnosis and date of death (available if death happened before 31 December 2006 and censored otherwise). The survival time after diagnosis is calculated as the time, in years, between the date of diagnosis and the date of death. As noted by Souza et al. [33], the variable that accounts for age was chosen on the basis of the Ministry of Health recommendations, as the over-50 age group showed a higher proportion of delayed initiation of therapy when compared to the population group aged years. CD4 counts by gender, age and PrevOI (Figure 1) showed a high degree of skewness toward high CD4 counts, suggesting a power transformation. Table 1 summarizes the number of HIV/AIDS and dead individuals per Brazilian state. In total there were 320 deaths. 88% of the patients were between 15 and 49 years old; 2774 (60%) were males, of whom 220 died. 61% of the individuals had no previous infection; 6.7% lived in the Central-West, 11.5% in the Northeast, 4.8% in the North, 60% in the Southeast and 16.7% in the South. The median initial CD4+ T lymphocyte count was 245 cells/mm³ (men cells/mm³; women cells/mm³) and patients had on average 4.62 CD4 exams, resulting in a total of observations (Figure 2).

[Figure 1: Boxplots for the CD4 counts by gender, age and PrevOI. Panels: CD4 by region (C.West, North, Northeast, South, Southeast); CD4 by gender (Female, Male); CD4 by age (<50, >=50); CD4 by PrevOI (No, Yes).]

3 JOINT MODELLING FRAMEWORK

Often in clinical trials both baseline and longitudinal covariates are collected for each subject, together with an event time of interest termed the survival time.
For instance, in AIDS studies the CD4+ T lymphocyte counts of a patient are measured longitudinally and serve as a biomarker for time-to-AIDS or time-to-death (Tsiatis et al. [37]). Here we restrict our attention to the case in which we want to incorporate a time-dependent covariate measured with error (longitudinal outcome - CD4 counts) in a time-to-event model (death is our event of interest).

[Figure 2: Left panel shows the histogram of the number of CD4 exams per individual; right panel shows the evolution of the mean CD4 count by number of exams, for censored and dead individuals.]

Suppose we have a set of $n$ subjects followed over a certain period of time and we want to measure the true value of a longitudinal outcome $m_i(t)$, $i = 1, \dots, n$, at time point $t$ and a survival time $T_i^*$ to a certain endpoint for the $i$th subject. The event times may be subject to the usual random censoring, so that only the minimum of the survival time and the censoring time $C_i$, i.e., $T_i = \min(T_i^*, C_i)$, is observed. We define the event indicator as $\delta_i = I[T_i^* \le C_i]$, where $\delta_i = 1$ indicates a failure and $\delta_i = 0$ indicates a right-censored observation. Thus the observed data for the time-to-event outcome consist of the pairs $\{(T_i, \delta_i);\ i = 1, \dots, n\}$. A routine framework for characterizing associations among the longitudinal and time-to-event processes and covariates is to represent the relationship by a relative risk model [20],

\[
h_i(t \mid \mathcal{M}_i(t), x_i) = \lim_{dt \to 0} \frac{1}{dt} \Pr\{t \le T_i^* < t + dt \mid T_i^* \ge t, \mathcal{M}_i(t), x_i\} = h_0(t) \exp\{\beta^\top x_i + \gamma\, m_i(t)\}, \tag{1}
\]

where the baseline risk function $h_0(t)$ can be left unspecified, $\mathcal{M}_i(t) = \{m_i(u),\ 0 \le u < t\}$ denotes the history of the true and unobserved longitudinal process up to time point $t$, and $x_i$ is a vector of baseline covariates (such as a treatment indicator, history of diseases, etc.) with $\beta$ being the corresponding vector of regression coefficients. Similarly, the parameter $\gamma$ quantifies the effect of the underlying longitudinal outcome on the risk of an event; for instance, in the AIDS field, $\gamma$ measures the effect of the number of CD4 cells on the risk of death. Although the above formulation involves the longitudinal response at any time $t$, the response is collected on each subject only intermittently at some set of times $\{t_{ij} \le T_i,\ j = 1, \dots, n_i\}$.
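The censoring rule and the relative risk model (1) can be illustrated with a few lines of Python. This is only a minimal sketch under stated assumptions: the function names `observe` and `relative_risk_hazard` are hypothetical, the baseline hazard defaults to a constant 1 purely for illustration, and the trajectory `m` is supplied by the caller as a plain function of time.

```python
import math

def observe(true_time, censor_time):
    """Return (observed time, event indicator) under right censoring.

    The observed time is the minimum of the true event time and the
    censoring time; the indicator is 1 for a failure, 0 if censored.
    """
    observed = min(true_time, censor_time)
    delta = 1 if true_time <= censor_time else 0
    return observed, delta

def relative_risk_hazard(t, x, beta, gamma, m, h0=lambda t: 1.0):
    """Hazard of model (1): h0(t) * exp(beta'x + gamma * m(t)).

    h0 defaults to a constant baseline (an illustrative assumption).
    """
    linear = sum(b * xi for b, xi in zip(beta, x))
    return h0(t) * math.exp(linear + gamma * m(t))

# A subject who dies at 2.0 years with censoring at 5.0 years is a failure;
# one whose true event time exceeds the censoring time is censored.
print(observe(2.0, 5.0))
print(observe(7.0, 5.0))
```

With a negative `gamma`, a higher value of the longitudinal marker lowers the hazard, which is the direction one would expect for CD4 counts and the risk of death.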
Note that we do not assume a common set of measurement times for all subjects. Moreover, the true values

$m_i = \{m_i(t_{ij}),\ j = 1, \dots, n_i\}$ of the longitudinal response are not observed. What we actually observe is the true value of the longitudinal outcome, $m_i(t_{ij})$, contaminated with the measurement error $e_i(t_{ij})$. Hence the observed longitudinal data consist of the measurements $y_i = \{y_i(t_{ij}),\ j = 1, \dots, n_i\}$, where

\[
y_i(t_{ij}) = m_i(t_{ij}) + e_i(t_{ij}), \tag{2}
\]

and the errors are normally distributed, $e_i(t_{ij}) \sim N(0, \sigma^2)$.

Table 1: Number of HIV/AIDS and dead individuals per Brazilian state. Region notation: Central-West (CW), Northeast (NE), North (N), Southeast (SE), South (S).

Region | State | Dead (%) | Total
CW | Distrito Federal | 5 | 62
CW | Goiás | 2 | 59
CW | Mato Grosso | |
CW | Mato Grosso Sul | |
NE | Alagoas | |
NE | Bahia | 3 | 79
NE | Maranhão | 8 | 65
NE | Pernambuco | |
NE | Rio G. Norte | |
NE | Sergipe | 9 | 23
NE | Ceará | |
NE | Piauí | 3 | 34
NE | Paraíba | 4 | 52
N | Acre | |
N | Amapá | 0 | 12
N | Amazonas | 8 | 12
N | Pará | 50 | 4
N | Rondônia | |
N | Roraima | 14 | 7
N | Tocantins | 9 | 23
SE | Espírito Santo | 6 | 16
SE | Minas Gerais | |
SE | Rio de Janeiro | |
SE | São Paulo | |
S | Paraná | 6 | 34
S | Rio G. Sul | |
S | Santa Catarina | |

3.1 LONGITUDINAL MODEL

For the remainder of this paper we focus on normal data, $y_i(t_{ij}) \sim N(m_i(t_{ij}), \sigma^2)$, and we postulate the widely used linear mixed effects model (Laird and Ware [23]) to describe the longitudinal process. In (2) the mean parameter, $m_i(t_{ij})$, is then specified as a function of the available covariates and random effects:

\[
y_i(t_{ij}) = m_i(t_{ij}) + e_i(t_{ij}) = \mu_i(t_{ij}) + W_{1i}(t_{ij}) + e_i(t_{ij}), \tag{3}
\]

where $\mu_i(t_{ij})$ are the fixed effects, which can be described by a linear model, $\mu_i(t_{ij}) = x_{1i}^\top(t_{ij})\beta_1$; $x_{1i}(t_{ij})$ represents possibly time-varying explanatory variables and $\beta_1$ denotes the vector of their corresponding fixed effects parameters. $W_{1i}(t_{ij})$ is a latent Gaussian process.
In Section 3.3 we assume that $W_{1i}(t_{ij})$ can be specified as a linear random effects model, i.e., $W_{1i}(t_{ij}) = z_{1i}^\top(t_{ij})\, b_i$, where $b_i$ denotes a vector of random effects parameters corresponding to the random effects design vector $z_{1i}(t)$ (explanatory variables).
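As an illustration of (2) and (3), the following sketch simulates one subject's longitudinal measurements from a linear mixed model. It is a toy example, not the authors' code: the function name `simulate_longitudinal` is hypothetical, and the random effects design vector is assumed to be $z(t) = (1, t)$, i.e., a random intercept and a random slope.

```python
import random

def simulate_longitudinal(times, x_rows, beta1, b, sigma, rng):
    """Draw y(t_j) = x(t_j)'beta1 + z(t_j)'b + e(t_j) for one subject.

    Assumes z(t) = (1, t): b[0] is a random intercept, b[1] a random slope.
    """
    ys = []
    for t, x in zip(times, x_rows):
        mu = sum(c * v for c, v in zip(beta1, x))  # fixed effects part
        w1 = b[0] + b[1] * t                       # latent process z(t)'b
        e = rng.gauss(0.0, sigma)                  # measurement error
        ys.append(mu + w1 + e)
    return ys

rng = random.Random(2012)
times = [0.0, 1.0, 2.0]
x_rows = [[1.0, t] for t in times]  # intercept and time as fixed covariates
print(simulate_longitudinal(times, x_rows, [15.0, -0.5], (1.0, 0.2), 1.5, rng))
```

Setting `sigma` to zero returns the error-free trajectory $m_i(t_{ij})$, which is the quantity that enters the survival submodel.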

3.2 SPATIAL TIME-TO-EVENT MODEL

One of the most common approaches to modelling the time-to-event for the $i$th subject is to assume a Weibull distribution, $T_i \sim \mathrm{Weibull}(r, \upsilon_i(t))$, which reduces to the exponential distribution if $r = 1$. Therefore, to build the model in (1) we define the baseline hazard as

\[
h_0(t) = r\, t^{r-1} \tag{4}
\]

and introduce covariates through the scale parameter, $\upsilon_i(t)$:

\[
\log \upsilon_i(t) = x_{2i}^\top(t)\beta_2 + W_{2i}(t). \tag{5}
\]

From (4) and (5) the hazard at time $t$ is given by

\[
h_i(t) = r\, t^{r-1} \exp\{x_{2i}^\top(t)\beta_2 + W_{2i}(t)\}. \tag{6}
\]

The vectors $x_{2i}(t)$ (which may be a subset of the fixed effects design vector $x_{1i}(t)$) and $\beta_2$ represent (possibly time-dependent) explanatory variables and their corresponding regression coefficients. $W_{2i}(t)$, like $W_{1i}(t)$, is a latent Gaussian process.

Another member of the class of proportional hazards models is the semiparametric Cox regression model, where $h_0(t)$ in (1) is left completely unspecified. Although it allows a more flexible way to model the event time, we concentrate on parametric models, because the diagnostic tools for joint models of longitudinal and survival data that we are going to use, introduced in [28], require a complete specification of the likelihood and, as noted by Hsieh et al. [16], the nonparametric maximum likelihood estimate of the baseline hazard cannot be obtained explicitly under the random effects structure.

Researchers may be interested in estimating joint models with both spatial and non-spatial random effects. If they believe that subjects are clustered in a hierarchical structure, such that individuals within the same cluster share a common frailty, a hierarchical shared frailty modelling approach is appropriate instead. Such an approach can be useful in examining the relative contributions of spatial and non-spatial effects. Care must be taken here, because spatial and individual random effects are now identified only through their priors.
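The Weibull hazard (6) is simple to evaluate numerically. The sketch below is illustrative only: the names `weibull_hazard` and `weibull_survival` are hypothetical, and the linear predictor `eta` (standing for the covariate term plus the latent process) is assumed to be constant in time, in which case the cumulative hazard has the closed form $t^r \exp(\eta)$.

```python
import math

def weibull_hazard(t, r, eta):
    """Hazard (6): r * t**(r - 1) * exp(eta), with eta the linear predictor."""
    return r * t ** (r - 1.0) * math.exp(eta)

def weibull_survival(t, r, eta):
    """S(t) = exp(-t**r * exp(eta)); valid only for a time-constant eta."""
    return math.exp(-(t ** r) * math.exp(eta))

# With r = 1 the hazard no longer depends on t (exponential model).
print(weibull_hazard(1.0, 1.0, -2.0), weibull_hazard(5.0, 1.0, -2.0))
# With r > 1 the hazard increases over time.
print(weibull_hazard(1.0, 2.0, -2.0), weibull_hazard(5.0, 2.0, -2.0))
```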
We can allow the random effects at neighbouring locations to exhibit spatial dependence (Banerjee et al. [1]). This spatial dependence is incorporated by specifying an intrinsic conditionally autoregressive (ICAR) prior (Besag et al. [2]), which is popular in the disease mapping and image restoration literature. This model can be considered a limiting case of the CAR model. The ICAR prior distribution is used to account for the spatially correlated random effects in the time-to-event and longitudinal data across

neighbouring regions (Brazilian states), with the neighbours defined via an adjacency matrix. If we have irregular geographic regions, we can associate the effects in the dataset with areal units. As such, one can extend equation (6) to accommodate spatial random effects; in fact, we only have to make slight changes to our model. To illustrate this, let $k = 1, \dots, K$ denote the $k$th region in our study, where $K$ is the number of regions, and let $Q_k$ be an area-specific random effect. $Q_k$ captures the residual or unexplained (log) relative risk of an event (e.g. death) in area $k$. We often think of $Q_k$ as representing the effect of latent (unobserved) risk factors. To allow for spatial dependence between the spatial random effects in nearby areas, we assume an ICAR prior for these terms. The relative risk becomes

\[
h_{ik}(t) = r\, t^{r-1} \exp\{x_{2ik}^\top(t)\beta_2 + W_{2ik}(t) + Q_k\}. \tag{7}
\]

The subscript $ik$ denotes the $i$th individual living in the $k$th area, which has $n_k$ individuals living there. Note that the spatial component can be set equal to zero.

The critical step that distinguishes spatial modelling from standard modelling approaches is the incorporation of adjacency information for the observations and the parameterisation of spatial dependence across neighbouring areas. From a Bayesian perspective, this involves incorporating a prior distribution that accounts for the spatial dependence and smooths the spatial variability of a region toward its neighbours. Typically, an ICAR prior distribution incorporating adjacency information is employed to model this spatial dependence. Including the ICAR prior distribution in the spatial survival frailty models incorporates the potential spatial dependence among frailties at neighbouring locations.
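To make the smoothing toward neighbours concrete: under an ICAR prior, the conditional distribution of one area's effect given all the others is normal, with mean equal to the average of the neighbouring effects and variance inversely proportional to the number of neighbours. The sketch below computes these two quantities for a toy map; the function name `icar_conditional`, the three-area adjacency list and the numerical values are hypothetical and not taken from the Brazilian data.

```python
def icar_conditional(k, q, neighbours, sigma2_q):
    """Conditional mean and variance of the spatial effect of area k.

    q          : current values of all spatial effects
    neighbours : dict mapping each area to the list of areas sharing a border
    sigma2_q   : conditional variance hyper-parameter
    """
    nbrs = neighbours[k]
    n_k = len(nbrs)
    mean = sum(q[l] for l in nbrs) / n_k  # average over neighbouring areas
    variance = sigma2_q / n_k             # shrinks as neighbours increase
    return mean, variance

# Toy map: area 0 borders areas 1 and 2; areas 1 and 2 border only area 0.
neighbours = {0: [1, 2], 1: [0], 2: [0]}
q = [0.0, 0.4, -0.2]
print(icar_conditional(0, q, neighbours, 0.5))
```

Areas with many neighbours are therefore smoothed more strongly than areas with few, which matters for states on the edge of the map.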
By incorporating the spatial locations of areas, the ICAR prior thus produces a conditional distribution for the random effects that is normal, with a conditional mean equal to the average of the random effects of the neighbours of the $k$th state and a conditional variance that is inversely proportional to the number of units neighbouring $k$. More formally, to specify this ICAR model, we assume that the conditional distribution of the spatial random effect in region $k$, given the values of the spatial random effects in all other states $l \ne k$, depends only on the spatial random effects of the neighbouring regions of $k$, $\iota_k$. Here, we specify that area $k$ is a neighbour of $l$ if they share a boundary. The ICAR prior for $Q_k$ has the form

\[
Q_k \mid Q_{-k}, \sigma_Q^2 \sim N\left( \frac{1}{N_k} \sum_{l \in \iota_k} Q_l,\ \frac{\sigma_Q^2}{N_k} \right), \tag{8}
\]

where $N_k$ denotes the total number of neighbours of $k$ and $Q_{-k}$ the vector of all random effects terms minus $Q_k$ itself. The conditional variance of the

Gaussian Markov random field, $\sigma_Q^2$, is a hyper-parameter to which we can assign an Inverse Gamma prior with high variance, e.g. $\sigma_Q^2 \sim \mathrm{IG}(0.5,\ )$ [34]. That is, we say $X \sim \mathrm{IG}(a, b)$ if its density is $f(x) \propto x^{-(a+1)} \exp(-b/x)$. One should note that we can extend (7) to accommodate a second set of random effects, say $s_k$. These are unstructured state-specific random effects which, together with the spatially structured random effects, allow more flexibility than assuming only CAR random effects; we let the data decide how much of the residual risk (of death) is due to spatially structured variation and how much is unstructured over-dispersion. Random effects of this kind are frequently assumed to have an exchangeable Normal prior.

3.3 SPATIAL JOINT MODEL

Association between the longitudinal and survival processes can arise in two ways. One is through the deterministic effects of common explanatory variables; the other is to allow the survival and longitudinal submodels to share the same random effects. Joint models of this last type are also known as shared parameter models. Henderson et al. [15] proposed to connect the longitudinal and survival responses with a zero-mean latent bivariate Gaussian process $W_{ik}(t) = (W_{1ik}(t), W_{2ik}(t))$ at time $t$ that is realised independently for each individual. The repeated measurements and time-to-event data are assumed independent given this linking process and the covariates of interest. As a particular case, one can link (3) and (7) with the following form for the zero-mean latent bivariate Gaussian process $W_{ik}(t)$:

\[
W_{1ik}(t_{ikj}) = b_{1ik} + b_{2ik}\, t_{ikj} \tag{9}
\]

and

\[
W_{2ik}(t) = \gamma_1 b_{1ik} + \gamma_2 b_{2ik}, \tag{10}
\]

where $\gamma = (\gamma_1, \gamma_2)^\top$ and the random effects vector $b_{ik} = (b_{1ik}, b_{2ik})^\top$ is bivariate normal, $b_{ik} \sim N_2(0, \Sigma)$, and independent of $e_{ik}(t)$ in (3). This specification allows different subjects to have different baseline repeated measures and different time trends for the longitudinal responses during follow-up.
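The shared random effects in (9) and (10) can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the helper names `draw_random_effects`, `w1` and `w2` are hypothetical, the covariance matrix is an arbitrary positive definite example, and the bivariate normal draw uses a hand-written 2x2 Cholesky factor to keep the block free of external dependencies.

```python
import math
import random

def draw_random_effects(sigma, rng):
    """Draw b = (b1, b2) ~ N_2(0, Sigma) via a 2x2 Cholesky factor.

    sigma is ((s11, s12), (s12, s22)) and must be positive definite.
    """
    (s11, s12), (_, s22) = sigma
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 * l21)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (l11 * z1, l21 * z1 + l22 * z2)

def w1(b, t):
    """Longitudinal link (9): random intercept plus random slope times t."""
    return b[0] + b[1] * t

def w2(b, gamma):
    """Survival link (10): the same random effects scaled by gamma."""
    return gamma[0] * b[0] + gamma[1] * b[1]

rng = random.Random(7)
b = draw_random_effects(((1.0, 0.3), (0.3, 2.0)), rng)
print(w1(b, 1.5), w2(b, (-0.2, 0.4)))
```

Setting both components of `gamma` to zero makes `w2` vanish, so the two submodels no longer share information through the random effects.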
Note that $W_{2ik}(t) = 0$ corresponds to separate analyses of the longitudinal and survival data. One can see that the form of $W_{2ik}(t)$ is similar to that of $W_{1ik}(t)$, including subject-specific covariate effects and an intercept. Due to the significant separation in time between observations, correlation among residuals over time is assumed to be negligible, so the terms $e_{ik}(t) \sim N(0, \sigma^2)$ form a sequence of mutually independent measurement errors, assumed independent of $b_{ik}$. Rizopoulos et al. [27] showed that, as the number of repeated measurements per subject increases, a misspecification of the random effects distribution has a minimal effect on parameter estimators and standard errors. Thus, here we will assume that the $b_{ik}$ are iid

$N_p(0, \Sigma)$ random vectors. The structure of $\Sigma$ describes the association between repeated observations of the longitudinal data $y_{ik}$. The vector of time-independent random effects $b_{ik}$ links the longitudinal and survival processes. This means that these random effects account for both the association between the two submodels and the correlation between the repeated measurements in the longitudinal process (conditional independence).

The contribution $L_{ik}$ of the $i$th individual living in the $k$th area to the likelihood function, $L$, for the joint model involves two components, denoted by $L_{ik1}$ and $L_{ik2}$. The first component is the likelihood for $y_{ik}$ and the second component is the likelihood function for $T_{ik}$. Let $\theta$ be a generic vector of all the parameters in $L$ and assume:

1. The data from different subjects are independent;
2. For the $ik$th subject, given all the unknown parameters and covariates, the longitudinal data are independent of the survival data;
3. For the $ik$th subject, given $\{m_{ik}(t_{ikj}),\ j = 1, \dots, n_{ik}\}$, the $\{y_{ik}(t_{ikj})\}_{j=1}^{n_{ik}}$ are independent and $y_{ik}(t_{ikj})$ has the normal distribution $N(m_{ik}(t_{ikj}), \sigma^2)$.

Under these modelling assumptions, the contribution of the $ik$th subject to the conditional likelihood is given by

\[
L_{ik}(\theta \mid T_{ik}, \delta_{ik}, y_{ik}) = L_{ik1}(b_{ik}, \beta_1, \sigma^2 \mid y_{ik}) \times L_{ik2}(b_{ik}, \beta_2, \gamma, Q_k \mid T_{ik}, \delta_{ik}). \tag{11}
\]

Let $y_{ik}$ denote the $n_{ik} \times 1$ vector of longitudinal responses of the $i$th subject living in the $k$th region and let $p(\cdot)$ denote an appropriate probability density function.
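The likelihood of the longitudinal part is

\[
L_{ik1}(b_{ik}, \beta_1, \sigma^2 \mid y_{ik}) = p(y_{ik} \mid b_{ik}, \beta_1, \sigma^2) = \prod_{j=1}^{n_{ik}} p\{y_{ik}(t_{ikj}) \mid b_{ik}, \beta_1, \sigma^2\} = \prod_{j=1}^{n_{ik}} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{\left[ y_{ik}(t_{ikj}) - x_{1ik}^\top(t_{ikj})\beta_1 - z_{1ik}^\top(t_{ikj})\, b_{ik} \right]^2}{2\sigma^2} \right\} \tag{12}
\]

and the likelihood of the survival part is

\[
\begin{aligned}
L_{ik2}(b_{ik}, \beta_2, \gamma, Q_k \mid T_{ik}, \delta_{ik})
&= h_{ik}(T_{ik} \mid \mathcal{M}_{ik}(T_{ik}); b_{ik}, \beta_2, \gamma, Q_k)^{\delta_{ik}}\; S_{ik}(T_{ik} \mid \mathcal{M}_{ik}(T_{ik}); b_{ik}, \beta_2, \gamma, Q_k) \\
&= h_{ik}(T_{ik} \mid \mathcal{M}_{ik}(T_{ik}); b_{ik}, \beta_2, \gamma, Q_k)^{\delta_{ik}}\; \Pr(T_{ik}^* > T_{ik} \mid \mathcal{M}_{ik}(T_{ik}); b_{ik}, \beta_2, \gamma, Q_k) \\
&= h_{ik}(T_{ik} \mid \mathcal{M}_{ik}(T_{ik}); b_{ik}, \beta_2, \gamma, Q_k)^{\delta_{ik}}\; \exp\left\{ -\int_0^{T_{ik}} h_{ik}(s \mid \mathcal{M}_{ik}(s); b_{ik}, \beta_2, \gamma, Q_k)\, ds \right\},
\end{aligned} \tag{13}
\]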

where $h_{ik}$ is given by equation (7). For reasons that will become apparent in Section 5.2, a Weibull proportional hazards model with $r = 1$ in (7), i.e., an exponential model, is used to model the time-to-event. We can rewrite (13) as

\[
p(T_{ik}, \delta_{ik} \mid b_{ik}, \beta_2, \gamma, Q_k) = \left[ \exp\{x_{2ik}^\top(T_{ik})\beta_2 + W_{2ik}(T_{ik}) + Q_k\} \right]^{\delta_{ik}} \exp\left\{ -\int_0^{T_{ik}} \exp\{x_{2ik}^\top(s)\beta_2 + W_{2ik}(s) + Q_k\}\, ds \right\}. \tag{14}
\]

Finally, from (12) and (14) we can rewrite (11) as

\[
\begin{aligned}
L_{ik}(\theta \mid T_{ik}, \delta_{ik}, y_{ik}) ={}& \prod_{j=1}^{n_{ik}} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{\left[ y_{ik}(t_{ikj}) - x_{1ik}^\top(t_{ikj})\beta_1 - z_{1ik}^\top(t_{ikj})\, b_{ik} \right]^2}{2\sigma^2} \right\} \\
&\times \left[ \exp\{x_{2ik}^\top(T_{ik})\beta_2 + W_{2ik}(T_{ik}) + Q_k\} \right]^{\delta_{ik}} \exp\left\{ -\int_0^{T_{ik}} \exp\{x_{2ik}^\top(s)\beta_2 + W_{2ik}(s) + Q_k\}\, ds \right\}.
\end{aligned} \tag{15}
\]

To obtain the joint likelihood one can simply take the product of all the $n$ individual contributions to the likelihood:

\[
L(\theta) = \prod_{k=1}^{K} \prod_{i=1}^{n_k} L_{ik}(\theta) = \prod_{k=1}^{K} \prod_{i=1}^{n_k} L_{ik1}(b_{ik}, \beta_1, \sigma^2 \mid y_{ik}) \times \prod_{k=1}^{K} \prod_{i=1}^{n_k} L_{ik2}(b_{ik}, \beta_2, \gamma, Q_k \mid T_{ik}, \delta_{ik}). \tag{16}
\]

4 MODEL ASSESSMENT

We use the Deviance Information Criterion (DIC) (Spiegelhalter et al. [35]) to choose between the spatial and non-spatial joint models. The DIC values of models fitted to the same data can be compared to determine the best choice. As with other information criteria, smaller values of the DIC are favoured over larger values. The DIC is straightforward to obtain during an MCMC run because it can be computed directly from posterior samples.

4.1 COX-SNELL AND MARTINGALE RESIDUALS

Because DIC provides no information about the absolute adequacy of the models, other diagnostic tools are needed. Complementary to DIC, we also use Cox-Snell and martingale residuals to check graphically the fit of the survival part of the joint model with the best DIC. Let $\theta_t$ represent the parameters to be estimated in the survival submodel, including the individual and spatial random effects.
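The way DIC is assembled from posterior samples can be sketched on a deliberately reduced version of the survival likelihood. The example below is an assumption-laden toy: it uses an exponential model ($r = 1$) with a single time-constant log-rate `eta`, so that each pair $(T, \delta)$ contributes $\delta\,\eta - T \exp(\eta)$ to the log-likelihood, and the function names `deviance` and `dic` are hypothetical. It follows the usual definition of DIC as the posterior mean deviance plus the effective number of parameters $p_D$.

```python
import math

def deviance(eta, data):
    """-2 * log-likelihood of an exponential model with log-rate eta.

    data is a list of (T, delta) pairs; each pair contributes
    delta * eta - T * exp(eta) to the log-likelihood.
    """
    loglik = sum(delta * eta - t * math.exp(eta) for t, delta in data)
    return -2.0 * loglik

def dic(eta_draws, data):
    """Return (DIC, p_D) computed from posterior draws of eta."""
    d_bar = sum(deviance(e, data) for e in eta_draws) / len(eta_draws)
    eta_bar = sum(eta_draws) / len(eta_draws)
    p_d = d_bar - deviance(eta_bar, data)  # effective number of parameters
    return d_bar + p_d, p_d

# Hypothetical data and posterior draws, for illustration only.
data = [(2.0, 1), (3.5, 0), (1.2, 1)]
draws = [-1.1, -0.9, -1.0, -1.2, -0.8]
print(dic(draws, data))
```

In the full joint model the deviance would be based on (16) rather than on this single-parameter likelihood, but the averaging over draws is the same.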
Cox-Snell residuals, conditional on the parameters, are defined as the value of the cumulative risk function evaluated at the observed event times $T_{ik}$, i.e.,

\[
r^{CS}_{ik}(t \mid \hat\theta_t) = \int_0^t h_{ik}(s \mid \hat{\mathcal{M}}_{ik}(s), \hat\theta_t)\, ds = -\log \hat{S}_{ik}(t \mid \hat\theta_t). \tag{17}
\]

If the assumed model fits the data well, we expect $r^{CS}_{ik}(t \mid \cdot)$ to have a unit exponential distribution. Instead of plugging in estimates, we follow a suggestion in Rizopoulos [26] and calculate the posterior expectation of the Cox-Snell residuals:

\[
r^{CS}_{ik}(t) = \int r^{CS}_{ik}(t \mid \theta_t)\, p(\theta_t \mid T_{ik}, \delta_{ik}, y_{ik})\, d\theta_t. \tag{18}
\]

In practice, we compute $r^{CS}_{ik}(T_{ik})$, the residuals at the observed event times; therefore, when $T_{ik}$ is censored, $r^{CS}_{ik}(T_{ik})$ will be censored as well. To take censoring into account when checking the fit of the model, we can compare graphically the Kaplan-Meier estimate of the survival function of $r^{CS}_{ik}(T_{ik})$ with the survival function of the unit exponential distribution.

Martingale residuals measure the difference between the observed number of deaths in the interval $(0, t_{ik})$, where $t_{ik}$ is the event time of the $ik$th individual, and the number of deaths predicted by the model. The martingale residual for the $ik$th individual is

\[
r^{M}_{ik} = \delta_{ik} - r^{CS}_{ik}, \tag{19}
\]

where $\delta_{ik}$ is the event indicator. A plot of these residuals against a covariate indicates what the functional form of this covariate should look like. In particular, a straight-line plot indicates that a linear term is needed. To help interpret the plot, it is advisable to add a LOESS (or LOWESS) smoother of these residuals.

4.2 MULTIPLE-IMPUTATION-BASED RESIDUALS

As noted by Rizopoulos [28], the traditional approach of checking model assumptions by inspecting residual plots for the longitudinal outcome is difficult to use within a joint modelling framework. In particular, complications arise because the occurrence of events is related to the underlying evolution of the subject-specific longitudinal measures, which corresponds to a non-random dropout mechanism (i.e., the probability of dropout depends on unobserved longitudinal responses).
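For the exponential case ($r = 1$) with a time-constant linear predictor, the cumulative hazard at the observed time reduces to $T \exp(\eta)$, so (18) and (19) can be approximated by averaging over posterior draws. The sketch below rests on exactly those simplifying assumptions; the names `cox_snell` and `martingale` and the numerical draws are hypothetical.

```python
import math

def cox_snell(t_obs, eta_draws):
    """Posterior mean of the cumulative hazard t_obs * exp(eta), as in (18).

    Assumes an exponential hazard (r = 1) and a linear predictor eta that
    is constant in time; eta_draws are posterior draws for one subject.
    """
    return sum(t_obs * math.exp(e) for e in eta_draws) / len(eta_draws)

def martingale(delta, r_cs):
    """Martingale residual (19): event indicator minus Cox-Snell residual."""
    return delta - r_cs

# One subject who died (delta = 1) after 2.3 years, with illustrative draws.
draws = [-1.2, -0.9, -1.0]
r_cs = cox_snell(2.3, draws)
print(r_cs, martingale(1, r_cs))
```

Residuals computed this way for censored subjects are themselves censored, which is why the comparison with the unit exponential goes through a Kaplan-Meier estimate rather than a plain histogram.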
So the reference distribution of the standard residuals is not directly available, and residual plots can be misleading because these residuals should not be expected to exhibit standard properties such as zero mean or constant variance. Rizopoulos [28] proposes a new method, within a Bayesian perspective, to calculate residuals for joint models, called multiple-imputation-based residuals. To produce these residuals we must augment the observed data with randomly imputed longitudinal responses under the complete data model, corresponding to the longitudinal outcomes that would have been observed had the patients not failed (died). As the objective of multiple imputation is to investigate the fit of the model and not to make inferences after

the event time (death), there is no problem in considering values of the longitudinal outcome after the event time (see Kurland and Heagerty [22]). We must stress that multiple-imputation-based residuals are related to the visit-time process (see Section 4.3). If the longitudinal measurements are taken at random times (i.e., the time points at which they are taken are determined by the patients themselves), we must take the visiting process into account and model it. Otherwise the visiting process is known and we do not have to worry about it.

Let us assume that the joint model has already been fitted to the data set and that we have obtained all the posterior estimates $\hat\theta$. The multiple imputation method is based on repeated sampling from the posterior distribution of the missing part of the longitudinal response vector, $y^m_{ik} = \{y_{ik}(t_{ikj}) : t_{ikj} \ge T_{ik},\ j = 1, \dots, n'_{ik}\}$, given the observed data, averaged over the posterior of the parameters. The density of this distribution is

\[
p(y^m_{ik} \mid y^o_{ik}, T_{ik}, \delta_{ik}) = \int p(y^m_{ik} \mid y^o_{ik}, T_{ik}, \delta_{ik}, \theta)\, p(\theta \mid y^o_{ik}, T_{ik}, \delta_{ik})\, d\theta, \tag{20}
\]

where $y^o_{ik} = \{y_{ik}(t_{ikj}) : t_{ikj} < T_{ik},\ j = 1, \dots, n_{ik}\}$ is the observed longitudinal data for the $i$th individual living in the $k$th area before the observed event time. For the simulation scheme used to produce the imputed longitudinal measures, and for a deeper treatment of this subject, see the paper of Rizopoulos et al. [28]. The benefit of using the simulated $y^m_{ik}(t_{ikj})$ values together with $y^o_{ik}$ to calculate residuals is that these residuals now inherit the properties of the complete data model, and therefore they can be used directly in diagnostic plots without having to take dropout into account.
Two widely used residuals for checking the adequacy of the longitudinal part are the standardized marginal and standardized subject-specific residuals, which are defined as

\[
r^{ym}_{ik} = \hat{V}_{ik}^{-1/2} \left( y_{ik} - X_{ik}\hat\beta_1 \right), \tag{21}
\]

\[
r^{ys}_{ik}(t_{ikj}) = \left\{ y_{ik}(t_{ikj}) - x_{ik}^\top(t_{ikj})\hat\beta_1 - z_{ik}^\top(t_{ikj})\,\hat{b}_{ik} \right\} \hat\sigma^{-1}, \tag{22}
\]

where $\hat\beta_1$, $\hat\sigma$, $\hat{b}_{ik}$ and $\hat{V}_{ik}$ are the posterior estimates of the vector of fixed effects coefficients, the residual standard deviation, the random effects vector and the covariance matrix of $y_{ik}$, respectively. $X_{ik}$ is the design matrix whose rows are $x_{ik}^\top(t_{ikj})$. Marginal residuals can be used to investigate misspecification of the mean structure, and subject-specific residuals to check homoscedasticity.

4.3 VISITING TIMES

If the individuals in our data set have random visit times, then to implement the multiple imputation scheme we must model the visiting process

14 in order to predict the future visit times. Let u ikq (q = 2,..., n ik ) denote the time elapsed between visit q 1 and visit q for the ikth subject, assume that all individuals have at least one measurement and that the visiting process is noninformative. Under this assumptions the visiting process can be formulated as p(u ikq u ik2,..., u ik(q 1), y ik (t ik1 ),..., y ik (t ik(q 1) ); θ v ) = = p(u ikq y ik (t ik(q 1) ); θ v ) (23) where θ v is the vector parameterizing the visiting process and where we see that the time elapsed between visits depends only on the last observed longitudinal measurement. In Rizopoulos [28] one can found other forms for this visiting process, namely some that take into account the complete history of the longitudinal process up to time t ik(q 1) and all the elapsed times. For the multivariate elapsed visit times u ik = (u ik2,..., u iknik ), we propose a Weibull model with an individual multiplicative Gamma frailty for the ikth individual and a structured spatial random effect for the kth region h(u ikq x vik, ω ik ) = ru r 1 ikq ω ik exp(x vik β v + Q k), (24) where ω ik is the frailty term, x vik denotes the design vector which may contain a functional form of the observed longitudinal responses y ik (t ik1 ),..., y ik (t ik(nik 1)), β v is the vector of regression coefficients and Q k is an areaspecific random effect capturing the residual or unexplained (log) relative risk of an event (e.g. death) in region k as defined in section 3.2. The choice for this model is motivated by its flexibility and simplicity. 5 APPLICATION 5.1 FITTING JOINT MODEL Inside a Bayesian approach the methodology developed in section 3 is now applied to the HIV-AIDS data described in section 2. The parameters estimates for the proposed joint (spatial) model in this work are obtained through the use of an MCMC simulation within the WinBUGS software (module GeoBUGS [34] was used to produce the maps). 
To apply our Bayesian hierarchical model, let $y_{ik}(t_{ikj})$ denote the square root of the $j$th CD4 count measurement on the $i$th patient living in state $k$, one of the 27 Brazilian states, $k = 1, \ldots, 27$. For the transformed longitudinal measure, an individual linear trajectory (within the linear random effects model) was considered to account for patient-specific CD4 counts over time:

$$y_{ik}(t_{ikj}) \mid b_{ik}, \beta_1, \sigma^2 \sim \mathcal{N}\bigl(m_{ik}(t_{ikj}), \sigma^2\bigr),$$

$$y_{ik}(t_{ikj}) = \beta_{11} + \beta_{12}\,t_{ikj} + \beta_{13}\,\mathrm{gender}_{ik} + \beta_{14}\,\mathrm{age}_{ik} + \beta_{15}\,\mathrm{prevoi}_{ik} + b_{ik1} + b_{ik2}\,t_{ikj} + Q_k + e_{ikj}. \qquad (25)$$
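To make the structure of (25) concrete, the following minimal sketch evaluates the conditional mean trajectory for one patient. The coefficient values, the random effects and the 0/1 coding of the covariates are hypothetical and chosen only for illustration; they are not the fitted estimates.

```python
def mean_sqrt_cd4(t, gender, age, prevoi, beta, b, q_k=0.0):
    """Conditional mean of sqrt(CD4) at time t, following the form of (25).

    beta = (beta11, beta12, beta13, beta14, beta15); b = (b_ik1, b_ik2).
    """
    b11, b12, b13, b14, b15 = beta
    fixed_part = b11 + b12 * t + b13 * gender + b14 * age + b15 * prevoi
    random_part = b[0] + b[1] * t  # patient-specific intercept and slope
    return fixed_part + random_part + q_k

# Hypothetical coefficients and random effects, for illustration only.
beta = (17.0, 1.7, -0.6, -0.6, -2.0)
b = (1.0, -0.2)
trajectory = [mean_sqrt_cd4(t, gender=1, age=0, prevoi=0, beta=beta, b=b)
              for t in (0, 1, 2)]

# Squaring returns to the CD4 scale (a back-transformed value, not the mean CD4 count).
cd4_scale = [m ** 2 for m in trajectory]
```

The random intercept shifts the whole trajectory for a patient, while the random slope changes how quickly the square-root CD4 count evolves over time.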

Because we have little prior information about the parameters to be estimated, we want the data to dominate the prior distribution, and we assume reasonably noninformative priors for all parameters in this model. The prior distributions needed in the longitudinal component of the model are taken as

$$\sigma^2 \sim \mathrm{IG}(0.01, 0.01), \qquad \beta_1 = (\beta_{11}, \ldots, \beta_{15}) \sim \mathcal{N}_5(\mu_{\beta_1} = 0_5,\ \Sigma_{\beta_1} = 1000\,I_5),$$

where $\sigma^2$ is the error variance, $\mathrm{IG}$ denotes the inverse gamma distribution, $0_5 = (0, 0, 0, 0, 0)$ and $I_5$ is the 5-dimensional identity matrix. The prior for $\beta_1$ is therefore sufficiently diffuse.

For reasons that will become apparent in section 5.2, a Weibull proportional hazards model with $r = 1$ in (6), i.e. an exponential model, is used to model the survival time with HIV/AIDS:

$$T_{ik} \sim \mathrm{Weibull}(1, \upsilon_{ik}(t)) = \mathrm{Exp}(\upsilon_{ik}(t)),$$

$$\log(\upsilon_{ik}(t)) = \beta_{21} + \beta_{22}\,\mathrm{gender}_{ik} + \beta_{23}\,\mathrm{age}_{ik} + \beta_{24}\,\mathrm{prevoi}_{ik} + \gamma_1 b_{ik1} + \gamma_2 b_{ik2} + Q_k. \qquad (26)$$

The prior distributions used in the survival component are

$$\beta_2 = (\beta_{21}, \ldots, \beta_{24}) \sim \mathcal{N}_4(\mu_{\beta_2} = (0, 0, 0, 0),\ \Sigma_{\beta_2} = 1000\,I_4), \qquad \sigma^2_Q \sim \mathrm{IG}(0.5,\ ).$$

The spatial random effects $Q_k$ are assumed to follow an ICAR prior, and an inverse gamma prior is assigned to the hyperparameter $\sigma^2_Q$, following a suggestion in Spiegelhalter et al. [35]. Because the ICAR distribution is improper and is parameterised to include a sum-to-zero constraint on the random effects, a separate intercept term with a flat (improper uniform) prior must be included in the models [34]. Thus, whenever an ICAR model was assumed for the spatial random effects $Q_k$, the improper uniform prior was assumed for the intercept $\beta_{21}$.
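Under the exponential model in (26) the hazard is constant in time, so the survival function is $S(t) = \exp(-\upsilon t)$ and the median survival time is $\log 2 / \upsilon$. The sketch below illustrates this relationship; the function names and all parameter values are hypothetical and are not the estimates reported later.

```python
import math

def hazard_rate(gender, age, prevoi, beta, gamma, b, q_k=0.0):
    """Constant hazard obtained from the log-linear predictor in (26)."""
    log_rate = (beta[0] + beta[1] * gender + beta[2] * age + beta[3] * prevoi
                + gamma[0] * b[0] + gamma[1] * b[1] + q_k)
    return math.exp(log_rate)

def survival_probability(t, rate):
    """S(t) = exp(-rate * t) for an exponential survival time."""
    return math.exp(-rate * t)

def median_survival(rate):
    """Time at which S(t) = 0.5, that is log(2) / rate."""
    return math.log(2.0) / rate

# Hypothetical parameter values, for illustration only.
rate = hazard_rate(gender=1, age=0, prevoi=0,
                   beta=(-5.0, 0.4, 0.8, 1.0), gamma=(-0.2, -0.4), b=(1.0, -0.2))
median = median_survival(rate)
```

Evaluating `median_survival` at each MCMC draw of the parameters and random effects would give a posterior sample of a patient's median survival time.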
For the parameters common to both components we take

$$b_{ik} = (b_{ik1}, b_{ik2}) \sim \mathcal{N}_2(0, \Sigma), \qquad \Sigma \sim \mathrm{IWish}(R = 100\,I_2,\ \xi),$$

where $\mathrm{IWish}(R, \xi)$ denotes the inverse Wishart distribution, with $R$ a $2 \times 2$ positive definite matrix specified a priori. We set the degrees of freedom to $\xi = n/20 = 4654/20$, following a suggestion in Carlin and Louis (2001, p. 279) [5], to avoid confounding between fixed and random effects. Finally, for the association parameters $\gamma_1$ and $\gamma_2$ we choose vague normal priors:

$$\gamma_1 \sim \mathcal{N}(0, \sigma_{\gamma_1} = 100), \qquad \gamma_2 \sim \mathcal{N}(0, \sigma_{\gamma_2} = 100).$$

We use noninformative proper prior distributions for each of the parameters required in the joint model. The initial values of the parameters $\beta_1$ and $\beta_2$ for sampling are obtained by modelling the longitudinal and survival data separately. The priors for $\sigma^2$, $\beta_1$, $\beta_2$, $\Sigma$ and $\sigma^2_Q$ are motivated by their conjugacy, and we assume that they are independent a priori.

A set of models was fitted in our analysis. The fixed effects of the covariates gender, age and presence of previous opportunistic infections are always included in both submodels. By including individual and spatial random effects (structured, unstructured or both), we obtain a variety of candidate models, from which the best-fitting model is selected based on the DIC value of the joint model. The models considered are summarized in Table 2.

| Model | $W_1$ | $W_2$ | $V$ | $p_D$ | DIC |
|---|---|---|---|---|---|
| No random effects | | | | | |
| I | | | | | |
| Random intercept | | | | | |
| II | $b_1$ | | | | |
| III | $b_1$ | $\gamma_1 b_1$ | | | |
| Random intercept and random slope | | | | | |
| IV | $b_1 + b_2 t$ | | | | |
| V | $b_1 + b_2 t$ | $\gamma_1 b_1$ | | | |
| VI | $b_1 + b_2 t$ | $\gamma_2 b_2$ | | | |
| VII | $b_1 + b_2 t$ | $\gamma(b_1 + b_2)$ | | | |
| VIII | $b_1 + b_2 t$ | $\gamma_1 b_1 + \gamma_2 b_2$ | | | |
| Spatial random effects | | | | | |
| IX | $b_1 + b_2 t$ | $\gamma_1 b_1 + \gamma_2 b_2$ | $V$ | | |
| X | $b_1 + b_2 t$ | $\gamma(b_1 + b_2)$ | $V$ | | |
| XI | $b_1 + b_2 t$ | $\gamma(b_1 + b_2)$ | $V + s_k$ | | |
| XII | $b_1 + b_2 t$ | $\gamma(b_1 + b_2)$ | $s_k$ | | |

Table 2: Candidate Bayesian models.

5.2 RESULTS

Table 2 reports the $p_D$ and DIC scores for a variety of joint models for the HIV/AIDS data from Brazil, with different forms of the latent processes $W_1$ and $W_2$ and of the spatial random effects. Our data cannot reliably identify both $r$ and $\beta_{21}$ in the Weibull model. If both are included in the sampler, the chains exhibit strong negative cross-correlations between the samples of these two parameters, as well as strong positive autocorrelations within their individual chains. This had already been noted by Guo and Carlin [14]. To overcome this problem we had to use a thinning interval of 250 and a long run of iterations, which took approximately 25 hours of computation on a quad-core desktop. The posterior mean of the shape parameter of the Weibull distribution was not much different from 1. For this reason, and because of the computational time, we decided to use a Weibull model with $r = 1$ (i.e. an exponential model) to analyse the time-to-event data. We believe that Guo and Carlin [14] did not achieve convergence for the chains of these two parameters because they did not use such a large thinning interval.

Having settled on the exponential model, we based our results on a single MCMC chain, run after a burn-in period. To reduce autocorrelation among samples within the sequence, we kept every 50th iteration of the chain. Trace and density plots of the posterior distributions of the coefficients indicate that the MCMC algorithm converged and that the results are reliable.

Our first model (Model I) is a very simple one, with no random effects and no association between the submodels, and results in a very poor DIC. Including a random intercept in the longitudinal submodel (Model II) improves DIC considerably. Model III allows a shared random effect, which decreases DIC further. Models IV to VIII have a random intercept and slope in the longitudinal submodel. These five models have a lower DIC than their predecessors. Here the association between the two submodels was introduced in different ways: no association (Model IV); association through a shared random intercept (Model V) or a shared random slope (Model VI); a common association parameter for the random intercept and slope (Model VII); and separate association parameters for the intercept and the slope (Model VIII).
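The DIC comparisons above rest on two quantities computed from the MCMC output: the posterior mean deviance $\bar{D}$ and the effective number of parameters $p_D = \bar{D} - D(\bar{\theta})$, with $\mathrm{DIC} = \bar{D} + p_D$. The sketch below shows this computation together with burn-in removal and thinning. The function names and the toy deviance chain are made up for illustration, and in practice WinBUGS reports these quantities directly.

```python
def thin_chain(chain, burn_in, thin):
    """Drop the first burn_in iterations, then keep every thin-th draw."""
    return chain[burn_in::thin]

def dic_summary(deviance_draws, deviance_at_posterior_mean):
    """Return (p_D, DIC), with p_D = Dbar - D(theta_bar) and DIC = Dbar + p_D."""
    mean_deviance = sum(deviance_draws) / len(deviance_draws)
    p_d = mean_deviance - deviance_at_posterior_mean
    return p_d, mean_deviance + p_d

# Toy deviance chain (made-up numbers, not from the fitted models).
chain = [120.0, 110.0, 104.0, 101.0, 103.0, 102.0, 100.0, 104.0, 101.0, 102.0]
kept = thin_chain(chain, burn_in=2, thin=2)
p_d, dic = dic_summary(kept, deviance_at_posterior_mean=99.0)
```

A model with a smaller DIC is preferred, so adding random effects is worthwhile only when the drop in mean deviance outweighs the increase in $p_D$.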
The next four models introduce the spatial component. It has an interesting impact on DIC, decreasing it further, which suggests a latent spatial effect in our data set. The smallest DIC, and therefore the best fit, is achieved by Model IX. Henceforth, whenever we refer to "the model" we mean Model IX (the best model). It is interesting to note that with an unstructured spatial random effect $s_k$ in the survival part (Model XII), DIC is lower than with both structured, $Q_k$, and unstructured, $s_k$, spatial random effects (Model XI), but higher than with structured effects only (Model IX). We think this is because of confounding between the two spatial random effects. We also fitted other models, namely with spatial random effects in the longitudinal submodel, but this proved not to be a good choice because DIC increased.

Table 3 summarizes the posterior estimates of the most important parameters in the model and their 95% credible intervals.

| Parameter | Separate analysis: posterior mean | Separate analysis: 95% CI | Joint analysis: posterior mean | Joint analysis: 95% CI |
|---|---|---|---|---|
| Longitudinal submodel | | | | |
| Intercept ($\beta_{11}$) | | (17.14, 17.66) | | (17.19, 17.74) |
| Time ($\beta_{12}$) | 1.81 | (1.71, 1.90) | 1.71 | (1.61, 1.80) |
| Gender ($\beta_{13}$) | -0.63 | (-0.93, -0.32) | -0.65 | (-0.94, -0.35) |
| Age ($\beta_{14}$) | -0.51 | (-0.96, -0.05) | -0.58 | (-1.04, -0.13) |
| Prevoi ($\beta_{15}$) | -2.01 | (-2.33, -1.71) | -2.12 | (-2.44, -1.79) |
| $\sigma$ | | (6.87, 7.2) | 7.04 | (6.87, 7.21) |
| $\sigma_{b11}$ | | (25.64, 28.21) | | (25.64, 28.19) |
| $\sigma_{b22}$ | 5.20 | (4.82, 5.59) | 5.17 | (4.79, 5.59) |
| cor $= \sigma_{b12} / \sqrt{\sigma_{b11}\sigma_{b22}}$ | | (-0.43, -0.36) | -0.36 | (-0.40, -0.32) |
| Survival submodel | | | | |
| Intercept ($\beta_{21}$) | -4.30 | (-4.54, -4.07) | -5.03 | (-5.39, -4.70) |
| Gender ($\beta_{22}$) | 0.33 | (0.098, 0.59) | 0.40 | (0.16, 0.64) |
| Age ($\beta_{23}$) | 0.62 | (0.33, 0.88) | 0.76 | (0.47, 1.06) |
| Prevoi ($\beta_{24}$) | 0.87 | (0.63, 1.1) | 1.02 | (0.75, 1.24) |
| $\gamma_1$ | | | | (-0.26, -0.197) |
| $\gamma_2$ | | | | (-0.49, -0.26) |
| $\sigma^2_V$ | | | | (0.0002, 0.102) |

Table 3: Separate (Model IV) and joint (Model IX) Bayesian analysis.

The estimated longitudinal regression coefficient for gender is negative and significantly different from zero (the 95% credible interval does not contain 0), suggesting that male patients have lower CD4 counts during follow-up than females. The same holds for the regression coefficients for age and PrevOI. This means that patients aged above 50, and patients who already had some opportunistic infectious disease at study entry, have lower CD4 counts than individuals under 50 years old and individuals without an opportunistic infectious disease at study entry, respectively.

In the survival submodel the positive coefficient for gender suggests that the relative risk of dying is significantly lower in women than in men. This agrees with what we have seen in the longitudinal submodel, where men have lower CD4 counts, and a high CD4 count indicates a healthier individual. Age and PrevOI also have significant positive coefficients, which indicate a lower hazard rate in the group under 50 years old and a higher hazard for patients with some previous opportunistic infectious disease.
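Because the survival submodel is log-linear in the hazard, each coefficient in Table 3 can be read as a hazard ratio after exponentiation. The short sketch below does this for the joint-model estimates of gender, age and PrevOI. It is a plug-in calculation: the exponential of a posterior mean is not the posterior mean of the hazard ratio, and the transformed limits are valid only on the assumption that the reported intervals are quantile-based.

```python
import math

# Joint-model (Model IX) posterior means and 95% credible limits from Table 3.
survival_coefficients = {
    "gender": (0.40, 0.16, 0.64),
    "age": (0.76, 0.47, 1.06),
    "prevoi": (1.02, 0.75, 1.24),
}

def to_hazard_ratio(mean, lower, upper):
    """Exponentiate a log-hazard coefficient and its interval limits."""
    return math.exp(mean), math.exp(lower), math.exp(upper)

hazard_ratios = {name: to_hazard_ratio(*values)
                 for name, values in survival_coefficients.items()}

for name, (hr, lo, hi) in hazard_ratios.items():
    print(f"{name}: HR = {hr:.2f} ({lo:.2f}, {hi:.2f})")
```

All three lower limits exceed 1 on the hazard-ratio scale, which is the same statement as the credible intervals for the coefficients excluding zero.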
The posterior estimates of the association parameters $\gamma_1$ and $\gamma_2$ in the joint analysis are negative and significantly different from zero, providing strong evidence of association between the two submodels and indicating that both the initial level and the slope of the CD4 count are negatively associated with the hazard of death.

The Bayesian joint model improves the estimates of the median survival time compared with those obtained from the separate analysis. To illustrate this, consider the two patients described in Table 4. Figure 3 shows that Patient 61 has a relatively good CD4 trajectory (it starts relatively high and remains that way), while Patient 62 has a bad one (it starts low and does not increase much). The joint results differ substantially from the separate results, increasing the survival time for Patient 61 and decreasing it for Patient 62. Moreover, the joint model actually reverses the findings of the separate models, in the sense that the patient with the good CD4 trajectory is now predicted to survive much longer than the patient with the bad trajectory. For the median survival times of patients 61 and 62 we obtained point estimates of roughly 59 and 24, respectively.

| | Gender | Age | PrevOI | Region | State | Cens. | Obs. T. (days) |
|---|---|---|---|---|---|---|---|
| Patient 61 | M | 29 | no | C.West | Mato Grosso | Yes | 1624 |
| Patient 62 | F | 24 | no | C.West | Mato Grosso | Yes | 1390 |

Table 4: Measures of two patients.

[Figure 3 panels: CD4 over time (horizontal axis: year since entry); density of the median survival time for Patient 61, Model IV and Model IX; density of the median survival time for Patient 62, Model IV and Model IX.]

Figure 3: Left panel shows the CD4 trajectories for patients 61 and 62. The other two panels show the posterior distribution of the median survival time for the two patients using Model IX (solid black line) and using Model IV, in which the longitudinal and survival parts are modelled separately (dashed black line).

Figure 4 shows the map of spatial random effects based on Model IX, representing unobserved, or unknown, environmental or geographical factors. To make the spatial distribution of the random effects visually identifiable, the posterior means of the state-specific random effects were first coded according to the quintiles of their distribution. The map displays clear spatial variation, with increased hazard in the Central-Western region and low risk in parts of the northern and southern regions.

5.3 RESIDUALS

To check the fit of the model for the survival outcome we analysed the posterior estimates of the Cox-Snell and Martingale residuals, as indicated in section 4.1.
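As an illustration of these survival residuals, the sketch below computes them for the exponential model, assuming the usual definitions: the Cox-Snell residual is the estimated cumulative hazard at the observed time, and the Martingale residual is the event indicator minus the Cox-Snell residual. The function names and subject values are hypothetical, and section 4.1 remains the reference for the exact definitions used in this work.

```python
def cox_snell_residual(time, rate):
    """Cumulative hazard at the observed time; rate * time for a constant hazard."""
    return rate * time

def martingale_residual(time, rate, event):
    """Event indicator (1 = death, 0 = censored) minus the Cox-Snell residual."""
    return event - cox_snell_residual(time, rate)

# Hypothetical subjects: (observed time, hazard rate, event indicator).
subjects = [(100.0, 0.01, 1), (50.0, 0.004, 0), (400.0, 0.005, 1)]
residuals = [(cox_snell_residual(t, r), martingale_residual(t, r, d))
             for t, r, d in subjects]
```

If the model fits well, the Cox-Snell residuals should behave like a censored sample from a unit exponential distribution, which is what the Kaplan-Meier comparison in the residual plots examines.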
For the longitudinal outcome we combined in one plot the observed residuals corresponding to $y^o_{ik}$ with the multiply imputed residuals corresponding to $y^m_{ik}$, as discussed in Rizopoulos [28]. The linear predictor used to investigate the visiting process (24) was

$$x_{vik}\beta_v + \log(\omega_{ik}) + Q_k = \beta_{v0} + \beta_{v1}\,\mathrm{age}_{ik} + \beta_{v2}\,\mathrm{sex}_{ik} + \beta_{v3}\,\mathrm{prevoi}_{ik} + \beta_{v4}\,y_{ik}(t_{q-1}) + \log(\omega_{ik}) + Q_k. \qquad (27)$$

[Figure 4 legend: relative risk (Model IX).]

Figure 4: Map of Brazil showing spatially correlated heterogeneity in Model IX.

We assigned a noninformative prior distribution to each $\beta$ in (27); for the spatial random effects we assume an ICAR prior distribution as in section 3.3. For the frailties and for the shape parameter of the Weibull distribution we assigned a Gamma distribution with mean 1 and large variance, $\mathrm{Gamma}(\eta, \eta)$. We write $X \sim \mathrm{Gamma}(a, b)$ if its density is $f(x) \propto x^{a-1}\exp(-bx)$. Here $\eta$ is the precision of the $\omega_{ik}$, and small values of $\eta$ mean a closer positive relationship between the elapsed visit times within an individual and greater heterogeneity among individuals in their elapsed visit times.

Because it may be difficult to draw conclusions by examining each imputation separately, owing to missingness in areas with few observed data, Rizopoulos [28] suggests including the residuals from several imputations in one plot (we used $L = 5$ imputations) and checking for systematic trends using weighted loess fits, with weight one for the observed residuals and weight $1/L$ for the imputed ones. The plot of the standardized subject-specific residuals in Figure 5 shows a systematic trend for the observed residuals (solid grey line), but this behaviour is alleviated when the imputed residuals are considered (dashed grey line), so the homoscedasticity assumption for the errors $e_{ik}(t_{ikj})$ appears to hold. In the plot of the standardized marginal residuals, the weighted loess curve based on the observed data alone, plotted against the fitted values of CD4, shows a slight systematic trend (solid line). This behaviour is not present for the imputed residuals (dashed line), indicating that, after taking dropout into account, the fitted joint model seems to be a plausible model for this data set.

In Figure 5, the Martingale residuals show that the assumed relation between the longitudinal time-dependent covariate and the hazard function is adequate, because the loess smoother does not show severe discrepancies from zero. To aid visual assessment of the Kaplan-Meier estimate of the posterior estimates of the Cox-Snell residuals, the unit exponential distribution (dashed line), corresponding to a perfectly fitting model, was added to the plot. Although there is some deviation, the majority of points are close to the curve. The middle of the curve appears to deviate from the unit exponential distribution, but this deviation involves only about 7 per cent of our total observations. Further examination reveals that most of these observations correspond to individuals with only one CD4 measurement. Since most observations in our data come from individuals with more than one CD4 measurement, model adequacy may be deemed reasonable.

5.4 SENSITIVITY ANALYSIS

To investigate the influence of the hyperprior specifications, we carried out a sensitivity analysis with respect to the prior distribution for the spatial variance component, $\sigma^2_Q$, of our best model (Model IX), assuming a variety of inverse gamma priors $\mathrm{IG}(a, b)$, whose mode is $1/(b(a + 1))$. In particular, our experimental design used the following combinations suggested in Silva et al. [30]: $(a, 1/b)$ = (0.5, ), (0.001, 0.001), (0.01, 0.01), (0.1, 0.1), (2, 0.001), (0.2, ), and (10, 0.25), which are here denoted by A, B, C, D, E, F, and G, respectively.
Prior A was suggested by Kelsall and Wakefield [21] and expresses the prior belief that the random effects standard deviation is centred around 0.05, with a 1% prior probability of being smaller than 0.01 or larger than 2.5. C and D are variants of prior B, with dispersion larger than that of B, in increasing order; note, in addition, that B has much larger dispersion than prior A. Using an inverse gamma with small, equal values of the two parameters ($a = 1/b$) gives a distribution with very low probability on small values of the random effects standard deviation. Priors E and F correspond to distributions with the same mode as prior A, but with lower (E) and larger (F) dispersion. Prior G is quite concentrated and favours marked geographical variation; it is the furthest from a noninformative setting. The estimates of the variance based on Model IX are listed in Table 5. It is also important to assess the sensitivity of model selection to the hyperparameters, that is, whether the model selected by DIC and the other measures considered here changes with the assumed priors. Table 5 provides summary measures of model fit for Model IX under the various hyperparameter settings.
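The mode formula quoted for the inverse gamma priors, $1/(b(a + 1))$, can be evaluated directly for the settings whose values are fully given in the text. The sketch below does so for priors B, C, D, E and G; priors A and F are omitted because their second parameter is not available here. The function and variable names are hypothetical.

```python
def inverse_gamma_mode(a, inv_b):
    """Mode 1 / (b (a + 1)) of the inverse gamma prior, written with inv_b = 1/b."""
    return inv_b / (a + 1.0)

# (a, 1/b) pairs for the priors whose values are fully given in the text.
priors = {
    "B": (0.001, 0.001),
    "C": (0.01, 0.01),
    "D": (0.1, 0.1),
    "E": (2.0, 0.001),
    "G": (10.0, 0.25),
}

modes = {label: inverse_gamma_mode(a, inv_b)
         for label, (a, inv_b) in priors.items()}
```

Comparing the entries of `modes` shows where each prior concentrates its mass for $\sigma^2_Q$, which helps in reading the sensitivity results across settings.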


Multivariate spatial modeling

Multivariate spatial modeling Multivariate spatial modeling Point-referenced spatial data often come as multivariate measurements at each location Chapter 7: Multivariate Spatial Modeling p. 1/21 Multivariate spatial modeling Point-referenced

More information

Models for Multivariate Panel Count Data

Models for Multivariate Panel Count Data Semiparametric Models for Multivariate Panel Count Data KyungMann Kim University of Wisconsin-Madison kmkim@biostat.wisc.edu 2 April 2015 Outline 1 Introduction 2 3 4 Panel Count Data Motivation Previous

More information

Semiparametric Regression

Semiparametric Regression Semiparametric Regression Patrick Breheny October 22 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/23 Introduction Over the past few weeks, we ve introduced a variety of regression models under

More information

CIMAT Taller de Modelos de Capture y Recaptura Known Fate Survival Analysis

CIMAT Taller de Modelos de Capture y Recaptura Known Fate Survival Analysis CIMAT Taller de Modelos de Capture y Recaptura 2010 Known Fate urvival Analysis B D BALANCE MODEL implest population model N = λ t+ 1 N t Deeper understanding of dynamics can be gained by identifying variation

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

Mixture modelling of recurrent event times with long-term survivors: Analysis of Hutterite birth intervals. John W. Mac McDonald & Alessandro Rosina

Mixture modelling of recurrent event times with long-term survivors: Analysis of Hutterite birth intervals. John W. Mac McDonald & Alessandro Rosina Mixture modelling of recurrent event times with long-term survivors: Analysis of Hutterite birth intervals John W. Mac McDonald & Alessandro Rosina Quantitative Methods in the Social Sciences Seminar -

More information

Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs

Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs Michael J. Daniels and Chenguang Wang Jan. 18, 2009 First, we would like to thank Joe and Geert for a carefully

More information

Local Likelihood Bayesian Cluster Modeling for small area health data. Andrew Lawson Arnold School of Public Health University of South Carolina

Local Likelihood Bayesian Cluster Modeling for small area health data. Andrew Lawson Arnold School of Public Health University of South Carolina Local Likelihood Bayesian Cluster Modeling for small area health data Andrew Lawson Arnold School of Public Health University of South Carolina Local Likelihood Bayesian Cluster Modelling for Small Area

More information

TMA 4275 Lifetime Analysis June 2004 Solution

TMA 4275 Lifetime Analysis June 2004 Solution TMA 4275 Lifetime Analysis June 2004 Solution Problem 1 a) Observation of the outcome is censored, if the time of the outcome is not known exactly and only the last time when it was observed being intact,

More information

Modelling Survival Events with Longitudinal Data Measured with Error

Modelling Survival Events with Longitudinal Data Measured with Error Modelling Survival Events with Longitudinal Data Measured with Error Hongsheng Dai, Jianxin Pan & Yanchun Bao First version: 14 December 29 Research Report No. 16, 29, Probability and Statistics Group

More information

Statistical Inference and Methods

Statistical Inference and Methods Department of Mathematics Imperial College London d.stephens@imperial.ac.uk http://stats.ma.ic.ac.uk/ das01/ 31st January 2006 Part VI Session 6: Filtering and Time to Event Data Session 6: Filtering and

More information

Contents. Part I: Fundamentals of Bayesian Inference 1

Contents. Part I: Fundamentals of Bayesian Inference 1 Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian

More information

Extending causal inferences from a randomized trial to a target population

Extending causal inferences from a randomized trial to a target population Extending causal inferences from a randomized trial to a target population Issa Dahabreh Center for Evidence Synthesis in Health, Brown University issa dahabreh@brown.edu January 16, 2019 Issa Dahabreh

More information

Regional and demographic determinants of poverty in Brazil

Regional and demographic determinants of poverty in Brazil Regional and demographic determinants of poverty in Brazil Andre P. Souza Carlos R. Azzoni Veridiana A. Nogueira University of Sao Paulo, Brazil 1. Introduction Poor areas could be so because they concentrate

More information

A Bayesian multi-dimensional couple-based latent risk model for infertility

A Bayesian multi-dimensional couple-based latent risk model for infertility A Bayesian multi-dimensional couple-based latent risk model for infertility Zhen Chen, Ph.D. Eunice Kennedy Shriver National Institute of Child Health and Human Development National Institutes of Health

More information

Spatial Analysis of Incidence Rates: A Bayesian Approach

Spatial Analysis of Incidence Rates: A Bayesian Approach Spatial Analysis of Incidence Rates: A Bayesian Approach Silvio A. da Silva, Luiz L.M. Melo and Ricardo Ehlers July 2004 Abstract Spatial models have been used in many fields of science where the data

More information

Lecture 22 Survival Analysis: An Introduction

Lecture 22 Survival Analysis: An Introduction University of Illinois Department of Economics Spring 2017 Econ 574 Roger Koenker Lecture 22 Survival Analysis: An Introduction There is considerable interest among economists in models of durations, which

More information

A Bayesian Semi-parametric Survival Model with Longitudinal Markers

A Bayesian Semi-parametric Survival Model with Longitudinal Markers A Bayesian Semi-parametric Survival Model with Longitudinal Markers Song Zhang, Peter Müller and Kim-Anh Do Song.Zhang@utsouthwestern.edu Abstract We consider inference for data from a clinical trial of

More information

REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520

REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520 REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520 Department of Statistics North Carolina State University Presented by: Butch Tsiatis, Department of Statistics, NCSU

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Analysis of Cure Rate Survival Data Under Proportional Odds Model

Analysis of Cure Rate Survival Data Under Proportional Odds Model Analysis of Cure Rate Survival Data Under Proportional Odds Model Yu Gu 1,, Debajyoti Sinha 1, and Sudipto Banerjee 2, 1 Department of Statistics, Florida State University, Tallahassee, Florida 32310 5608,

More information

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An

More information

Semiparametric Models for Joint Analysis of Longitudinal Data and Counting Processes

Semiparametric Models for Joint Analysis of Longitudinal Data and Counting Processes Semiparametric Models for Joint Analysis of Longitudinal Data and Counting Processes by Se Hee Kim A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial

More information

Group Sequential Tests for Delayed Responses. Christopher Jennison. Lisa Hampson. Workshop on Special Topics on Sequential Methodology

Group Sequential Tests for Delayed Responses. Christopher Jennison. Lisa Hampson. Workshop on Special Topics on Sequential Methodology Group Sequential Tests for Delayed Responses Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj Lisa Hampson Department of Mathematics and Statistics,

More information

Evaluating the value of structural heath monitoring with longitudinal performance indicators and hazard functions using Bayesian dynamic predictions

Evaluating the value of structural heath monitoring with longitudinal performance indicators and hazard functions using Bayesian dynamic predictions Evaluating the value of structural heath monitoring with longitudinal performance indicators and hazard functions using Bayesian dynamic predictions C. Xing, R. Caspeele, L. Taerwe Ghent University, Department

More information

Chapter 2 Inference on Mean Residual Life-Overview

Chapter 2 Inference on Mean Residual Life-Overview Chapter 2 Inference on Mean Residual Life-Overview Statistical inference based on the remaining lifetimes would be intuitively more appealing than the popular hazard function defined as the risk of immediate

More information

Point process with spatio-temporal heterogeneity

Point process with spatio-temporal heterogeneity Point process with spatio-temporal heterogeneity Jony Arrais Pinto Jr Universidade Federal Fluminense Universidade Federal do Rio de Janeiro PASI June 24, 2014 * - Joint work with Dani Gamerman and Marina

More information

Power and Sample Size Calculations with the Additive Hazards Model

Power and Sample Size Calculations with the Additive Hazards Model Journal of Data Science 10(2012), 143-155 Power and Sample Size Calculations with the Additive Hazards Model Ling Chen, Chengjie Xiong, J. Philip Miller and Feng Gao Washington University School of Medicine

More information

A Sampling of IMPACT Research:

A Sampling of IMPACT Research: A Sampling of IMPACT Research: Methods for Analysis with Dropout and Identifying Optimal Treatment Regimes Marie Davidian Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective Anastasios (Butch) Tsiatis and Xiaofei Bai Department of Statistics North Carolina State University 1/35 Optimal Treatment

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

Frailty Modeling for clustered survival data: a simulation study

Frailty Modeling for clustered survival data: a simulation study Frailty Modeling for clustered survival data: a simulation study IAA Oslo 2015 Souad ROMDHANE LaREMFiQ - IHEC University of Sousse (Tunisia) souad_romdhane@yahoo.fr Lotfi BELKACEM LaREMFiQ - IHEC University

More information

Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling

Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling Jon Wakefield Departments of Statistics and Biostatistics University of Washington 1 / 37 Lecture Content Motivation

More information

Multistate Modeling and Applications

Multistate Modeling and Applications Multistate Modeling and Applications Yang Yang Department of Statistics University of Michigan, Ann Arbor IBM Research Graduate Student Workshop: Statistics for a Smarter Planet Yang Yang (UM, Ann Arbor)

More information

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation Biost 58 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 5: Review Purpose of Statistics Statistics is about science (Science in the broadest

More information

Using Estimating Equations for Spatially Correlated A

Using Estimating Equations for Spatially Correlated A Using Estimating Equations for Spatially Correlated Areal Data December 8, 2009 Introduction GEEs Spatial Estimating Equations Implementation Simulation Conclusion Typical Problem Assess the relationship

More information

Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements

Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements Jeffrey N. Rouder Francis Tuerlinckx Paul L. Speckman Jun Lu & Pablo Gomez May 4 008 1 The Weibull regression model

More information

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY Ingo Langner 1, Ralf Bender 2, Rebecca Lenz-Tönjes 1, Helmut Küchenhoff 2, Maria Blettner 2 1

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Best Linear Unbiased Prediction: an Illustration Based on, but Not Limited to, Shelf Life Estimation

Best Linear Unbiased Prediction: an Illustration Based on, but Not Limited to, Shelf Life Estimation Libraries Conference on Applied Statistics in Agriculture 015-7th Annual Conference Proceedings Best Linear Unbiased Prediction: an Illustration Based on, but Not Limited to, Shelf Life Estimation Maryna

More information

The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models

The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models John M. Neuhaus Charles E. McCulloch Division of Biostatistics University of California, San

More information

A Bayesian Nonparametric Approach to Causal Inference for Semi-competing risks

A Bayesian Nonparametric Approach to Causal Inference for Semi-competing risks A Bayesian Nonparametric Approach to Causal Inference for Semi-competing risks Y. Xu, D. Scharfstein, P. Mueller, M. Daniels Johns Hopkins, Johns Hopkins, UT-Austin, UF JSM 2018, Vancouver 1 What are semi-competing

More information

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Anastasios (Butch) Tsiatis Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

Markov Chain Monte Carlo in Practice

Markov Chain Monte Carlo in Practice Markov Chain Monte Carlo in Practice Edited by W.R. Gilks Medical Research Council Biostatistics Unit Cambridge UK S. Richardson French National Institute for Health and Medical Research Vilejuif France

More information

Approaches for Multiple Disease Mapping: MCAR and SANOVA

Approaches for Multiple Disease Mapping: MCAR and SANOVA Approaches for Multiple Disease Mapping: MCAR and SANOVA Dipankar Bandyopadhyay Division of Biostatistics, University of Minnesota SPH April 22, 2015 1 Adapted from Sudipto Banerjee s notes SANOVA vs MCAR

More information

Downloaded from:

Downloaded from: Camacho, A; Kucharski, AJ; Funk, S; Breman, J; Piot, P; Edmunds, WJ (2014) Potential for large outbreaks of Ebola virus disease. Epidemics, 9. pp. 70-8. ISSN 1755-4365 DOI: https://doi.org/10.1016/j.epidem.2014.09.003

More information

Survival Analysis for Case-Cohort Studies

Survival Analysis for Case-Cohort Studies Survival Analysis for ase-ohort Studies Petr Klášterecký Dept. of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, harles University, Prague, zech Republic e-mail: petr.klasterecky@matfyz.cz

More information

ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS

ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS 1. THE CLASS OF MODELS y t {y s, s < t} p(y t θ t, {y s, s < t}) θ t = θ(s t ) P[S t = i S t 1 = j] = h ij. 2. WHAT S HANDY ABOUT IT Evaluating the

More information

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary

More information

Integrated likelihoods in survival models for highlystratified

Integrated likelihoods in survival models for highlystratified Working Paper Series, N. 1, January 2014 Integrated likelihoods in survival models for highlystratified censored data Giuliana Cortese Department of Statistical Sciences University of Padua Italy Nicola

More information

Tests of independence for censored bivariate failure time data

Tests of independence for censored bivariate failure time data Tests of independence for censored bivariate failure time data Abstract Bivariate failure time data is widely used in survival analysis, for example, in twins study. This article presents a class of χ

More information

Part III Measures of Classification Accuracy for the Prediction of Survival Times

Part III Measures of Classification Accuracy for the Prediction of Survival Times Part III Measures of Classification Accuracy for the Prediction of Survival Times Patrick J Heagerty PhD Department of Biostatistics University of Washington 102 ISCB 2010 Session Three Outline Examples

More information

Beyond GLM and likelihood

Beyond GLM and likelihood Stat 6620: Applied Linear Models Department of Statistics Western Michigan University Statistics curriculum Core knowledge (modeling and estimation) Math stat 1 (probability, distributions, convergence

More information

Approximate Bayesian Computation

Approximate Bayesian Computation Approximate Bayesian Computation Michael Gutmann https://sites.google.com/site/michaelgutmann University of Helsinki and Aalto University 1st December 2015 Content Two parts: 1. The basics of approximate

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

Continuous Time Survival in Latent Variable Models

Continuous Time Survival in Latent Variable Models Continuous Time Survival in Latent Variable Models Tihomir Asparouhov 1, Katherine Masyn 2, Bengt Muthen 3 Muthen & Muthen 1 University of California, Davis 2 University of California, Los Angeles 3 Abstract

More information