Semiparametric Models for Joint Analysis of Longitudinal Data and Counting Processes


Semiparametric Models for Joint Analysis of Longitudinal Data and Counting Processes

by
Se Hee Kim

A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Biostatistics.

Chapel Hill
2010

Approved by:
Dr. Donglin Zeng, Advisor
Dr. David Couper, Reader
Dr. Danyu Lin, Reader
Dr. John Preisser, Reader
Dr. Ying So, Reader

© 2010 Se Hee Kim
ALL RIGHTS RESERVED

Abstract

SE HEE KIM: Semiparametric Models for Joint Analysis of Longitudinal Data and Counting Processes. (Under the direction of Dr. Donglin Zeng.)

In this dissertation, we study statistical methodology for joint modeling that correctly controls for the interplay between longitudinal and counting processes and makes the most efficient use of the data. Three types of joint modeling approaches are proposed, corresponding to three different study purposes.

In the first topic, we develop a method for joint modeling of longitudinal data and recurrent events in the presence of an informative terminal event. We focus on data from patients who experience the same type of event multiple times, such as repeated infection episodes or recurrent strokes, have longitudinal biomarkers, and may be subject to an event, for example death, that makes further observation impossible. To analyze such complicated data, we propose joint models based on a likelihood approach. A broad class of transformation models for the cumulative intensity of the recurrent events and the cumulative hazard of the terminal event is considered. We propose to estimate all the parameters using nonparametric maximum likelihood estimators (NPMLEs), and we provide computationally efficient EM algorithms to implement the proposed inference procedure. The estimators are shown to be asymptotically normal and semiparametrically efficient. Finally, we evaluate the performance of the proposed method through extensive simulations and an application to real data.

In the second topic, we develop a method for joint modeling of longitudinal and

cure-survival data. By cure-survival data, we mean time-to-event data in which a certain proportion of patients never have any event during a sufficiently long follow-up period. These patients are believed to have been cured by treatment, such as radiation therapy or an initial surgery, and are often the source of heavy tail probabilities in survival curves. To take into account the possibility of patients being cured, we propose to model the time to event through a transformed promotion time cure model, jointly with a linear mixed effects model for the longitudinal data. Owing to the transformations applied to the promotion time cure model, the proposed method can be used even when the proportionality assumption does not hold. All the parameters are estimated using NPMLEs, and inference procedures are implemented via a simple EM algorithm. Asymptotic properties of the proposed NPMLEs are derived based on empirical process theory. Simulation studies are conducted, and the method is applied to the ARIC data, to demonstrate the small-sample performance of the proposed method.

In the third topic, we develop a partially linear model for longitudinal data with informative censoring, where the main interest is in making inferences about the individual's trajectory of longitudinal responses, which may be informatively censored. Since a fully parameterized mean structure may be insufficient to capture the underlying patterns of the longitudinal and event processes, we propose to use a partially linear model for the longitudinal responses, in which an unspecified underlying function is formulated along with linear covariate effects, and a transformation model is used for the informative censoring times. We employ sieve estimation for the nonparametric trajectory of longitudinal responses, where the unknown trajectory is approximated by cubic B-spline basis functions. All parameters are estimated based on a likelihood approach, and inference procedures are implemented via the EM algorithm. We also investigate a reliable way to select the number of knots and the best transformation.

Through empirical process theory, the proposed estimators are shown to possess desirable asymptotic properties. The validity of the proposed method is confirmed by simulated and real data examples.


Acknowledgments

None of this would have been possible without the personal and practical support of numerous people. I would like to gratefully and sincerely thank my advisor, Dr. Donglin Zeng, for his guidance, encouragement, support, and deep consideration throughout the duration of this dissertation. He showed me different ways to approach a research problem and the need to be persistent to accomplish any goal. Furthermore, he always read and responded to the drafts of each page of my work more quickly than I could have hoped. I am grateful to my committee members, Drs. David Couper, Danyu Lin, John Preisser, and Ying So, for their valuable comments and insightful suggestions on this dissertation. I wish to thank Drs. Ying So and Gordon Johnston, who as my supervisors and friends at SAS guided my RA work and cared deeply about me. I truly enjoyed working with them. I would also like to express my sincere appreciation to Dr. John Preisser for inspiring and supporting me to become a professor. Special thanks go to my best friends, Joy Wu, Che-Chin Lie, and Chaeryon Kang, for their support and encouragement. Last, I can never find words to express my gratitude to my fiancé Seunggeun Lee and my family, who are always standing right next to me.


Table of Contents

Abstract
List of Tables
List of Figures

1 Introduction
  1.1 Joint Models of Longitudinal Data and Recurrent Events with Informative Terminal Event
  1.2 Joint Modeling of Longitudinal Data and Cure-Survival Data
  1.3 Partially Linear Model for Longitudinal Data with Informative Censoring

2 Literature Review
  2.1 Models for Longitudinal and Survival Data
    2.1.1 Transformation Models for Survival Data
    2.1.2 Partially Linear Models for Longitudinal Data
    2.1.3 Joint Models for Longitudinal Data and Survival Event
  2.2 Models for Cure-Survival Data
    2.2.1 Mixture Cure Models
    2.2.2 Promotion Time Cure Models
    2.2.3 Transformation of Promotion Time Cure Models
  2.3 Models for Longitudinal Data and Recurrent Events
  2.4 Models for Recurrent and Terminal Events

3 Joint Models of Longitudinal Data and Recurrent Events with Informative Terminal Event
  3.1 Introduction
  3.2 Joint Models
  3.3 Inference Procedure
    3.3.1 Nonparametric Maximum Likelihood Estimation
    3.3.2 EM Algorithm
  3.4 Asymptotic Properties
  3.5 Simulation Studies
  3.6 Data Application
  3.7 Concluding Remarks
  3.8 E-step and M-step in EM Algorithm
  3.9 Proof of Asymptotic Properties

4 Joint Modeling of Longitudinal and Cure-Survival Data
  4.1 Introduction
  4.2 Joint Models
  4.3 Inference Procedure
    4.3.1 NPMLEs for Transformation Models
    4.3.2 EM Algorithm
  4.4 Asymptotic Properties
  4.5 Simulation Studies
  4.6 Data Application
  4.7 Concluding Remarks
  4.8 Proof of Asymptotic Properties

5 Partially Linear Model for Longitudinal Data with Informative Censoring
  5.1 Introduction
  5.2 Joint Models
  5.3 Inference Procedure
    5.3.1 Sieve Approximation
    5.3.2 NPMLEs for Transformation Models
    5.3.3 EM Algorithm
  5.4 Asymptotic Properties
  5.5 Simulation Studies
  5.6 Data Application
  5.7 Concluding Remarks
  5.8 Proof of Asymptotic Properties

6 Summary and Future Research

Bibliography


List of Tables

3.1 Simulation results for G_R(x) = G_T(x) = x
3.2 Simulation results for G_R(x) = x and G_T(x) = log(1 + x)
3.3 Analysis results for the ARIC study. The Fisher transformation is used for testing ρ, while the 50:50 mixture of χ² distributions is used for testing variances
4.1 Simulation results for H(x) = x. t_p represents the pth percentile
4.2 Simulation results for H(x) = 2 log(1 + x/2). t_p represents the pth percentile
4.3 Simulation results for H(x) = log(1 + x). t_p represents the pth percentile
4.4 Analysis results for the ARIC study. The 50:50 mixture of χ² distributions is used for testing variances
5.1 Simulation results for H(x) = x and α(t) = sin(πt) exp(t/2)/{1 + exp(t/2)} based on m = 6 control points of B-spline curves. τ_p represents p% of τ (study duration)
5.2 Simulation results for H(x) = x and α(t) = sin(πt) exp(t/2)/{1 + exp(t/2)} based on m control points of B-spline curves. τ_p represents p% of τ (study duration)
5.3 Simulation results for H(x) = log(1 + x) and α(t) = sin(πt) exp(t/2)/{1 + exp(t/2)} based on m = 6 control points of B-spline curves. τ_p represents p% of τ (study duration)
5.4 Simulation results for H(x) = log(1 + x) and α(t) = sin(πt) exp(t/2)/{1 + exp(t/2)} based on m control points of B-spline curves. τ_p represents p% of τ (study duration)
5.5 Joint analysis results of the medical costs data. The 50:50 mixture of χ² distributions is used for testing variances


List of Figures

3.1 Log-likelihood surface under the logarithmic transformations for the ARIC study. The x-axis and y-axis correspond to the transformation parameter γ for the recurrent events and the terminal event, respectively
3.2 Predicted survival probability (a) and expected longitudinal SBP levels (b) for a subject who had one CHD event at the 5th year of study. The solid curves are point estimates, and the dotted curves are the 95% confidence bands
4.1 In the ARIC data: (a) Kaplan-Meier survival curve of the entire study population; (b) estimated survival curve of the non-immune subpopulation under the joint cure-survival model with the proportional odds structure. The solid curves are point estimates, and the dotted curves are 95% confidence bands
4.2 Predicted marginal survival rates of the entire population using the results in Table 4.4. The rates beyond the cure threshold are interpreted as the immune fractions or the cure rates (CR). The reference is taken as age 54, female, HDL-cholesterol 42 mg/dL, LDL-cholesterol 136 mg/dL, no hypertension medication use, never smoking, and no diabetes
5.1 Example of basis functions (cubic B-splines) for time t in [0, 1] under 5 control points {0.1, 0.15, 0.2, 0.4, 0.7}
5.2 Simulation results for the baseline coefficient function by (a) H(x) = x and α(t) = sin(πt) e^{t/2}/(1 + e^{t/2}); (b) H(x) = log(1 + x) and α(t) = sin(πt) e^{t/2}/(1 + e^{t/2}); (c) H(x) = x and α(t) = (t − 0.8)²; and (d) H(x) = log(1 + x) and α(t) = (t − 0.8)². The solid curves are true values, the dashed curves are estimates under m = 3, the dash-dotted curves are estimates under m = 6, and the dotted curves are estimates under m = …
5.3 Bayesian information criterion (BIC) for the transformation H(x) = log(1 + ηx)/η and the number of control knots (m). From top to bottom, the dot-long-dashed curve is for m = 8, the long-dashed curve is for m = 7, the short-dashed curve is for m = 4, the dot-short-dashed curve is for m = 6, the dotted curve is for m = 3, and the solid curve is for m = 5
5.4 Baseline coefficient function of hospital visit time in the medical cost data under the best fit of transformation H(x) = 2 log(1 + x/2) and 5 control points. The solid curves are estimates from the joint model, the dashed curves are estimates from the marginal model, and the dots are residual means of {Y(t) − β̂^T X₁(t)}


Chapter 1
Introduction

Joint modeling of longitudinal data and counting processes has become increasingly popular in a wide range of applications. In these applications, the longitudinal data serve as an outcome variable or as a covariate measured with error, observed at a series of time points, while the counting process often represents the time to single or multiple endpoints, an informative observation process, or informative censoring. Joint modeling starts from separate model building for each process and links the models together via correlated or common latent random effects in a variety of ways. Using the joint modeling approach, we can build a model to assess the effect of covariates on both the longitudinal measures and the time to events, optimize the use of the data through the information shared between the processes, and correct the biases due to the dependence between the processes. In this dissertation, we focus on simultaneous inferences for both longitudinal measures and times to single or multiple events, while accounting for the dependence between them.
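The linking idea described above can be sketched with a small simulation. Everything below is a hypothetical illustration (model form, parameter values, and variable names are invented, not the models proposed in this dissertation): a single shared random intercept drives both a linear mixed longitudinal model and an exponential event time, which is enough to induce dependence between the two processes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Shared random intercept that links the two processes (illustrative scale).
b = rng.normal(0.0, 1.0, size=n)

# Longitudinal process: linear trend + shared intercept + noise,
# observed at fixed times t = 0, 1, 2.
t = np.arange(3.0)
Y = 1.0 + 0.5 * t[None, :] + b[:, None] + rng.normal(0.0, 0.3, size=(n, 3))

# Event process: exponential event time whose rate increases with b,
# so a higher longitudinal level implies a shorter time to event.
T = rng.exponential(1.0 / (0.2 * np.exp(0.8 * b)))

# The dependence is induced purely by the shared effect: subjects in the
# upper half of b have systematically earlier events.
hi, lo = T[b > 0].mean(), T[b <= 0].mean()
print(hi < lo)  # True
```

Fitting such a model (rather than simulating from it) is exactly what the likelihood-based procedures in later chapters address; the sketch only shows why ignoring the shared effect would bias separate analyses.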

1.1 Joint Models of Longitudinal Data and Recurrent Events with Informative Terminal Event

We first consider joint modeling of longitudinal data and recurrent events along with another event, such as death, that discontinues further observation. We refer to the latter as a terminal event. Examples of recurrent events include multiple strokes, repeated bladder tumors, and informative measurement times such as emergency hospital visits. To model such a complicated system, we propose joint models: a linear mixed effects model is used for the longitudinal data, and a broad class of transformation models is used for the cumulative intensity and hazard functions of the recurrent and terminal events, respectively. Through the transformations, the proposed method applies more generally, without requiring the proportional hazards or proportional odds assumption. The random effects in the longitudinal model and additional dependent random effects in the recurrent event model are shared with the terminal event model, and hence they account for the respective dependencies with the terminal event.

1.2 Joint Modeling of Longitudinal Data and Cure-Survival Data

We next focus on the joint analysis of longitudinal and cure-survival data. By cure-survival data, we mean time-to-event data in which a certain proportion of patients never have any event during a sufficiently long follow-up period. These patients are believed to have been cured by treatment, such as radiation therapy or an initial surgery. The potential of being cured can produce a heavy tail probability in the survival curve, and ignoring the true cure proportion may be a source of bias in the estimates of the model parameters.
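The heavy-tail phenomenon is easy to reproduce in a toy simulation; the cure probability and event-time distribution below are arbitrary illustrative choices, not estimates from any data set. The empirical survival function flattens near the cure fraction instead of decaying to zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
cure_prob = 0.3  # assumed immune fraction (illustrative)

cured = rng.random(n) < cure_prob
event_time = rng.exponential(2.0, size=n)
# Cured subjects never experience the event within any follow-up window.
T = np.where(cured, np.inf, event_time)

# Empirical survival function: it plateaus near the cure fraction,
# producing the "heavy tail" described above.
grid = np.array([1.0, 5.0, 10.0, 20.0])
S = [(T > u).mean() for u in grid]
print(np.round(S, 3))
```

A naive survival model that forces S(t) to 0 would misfit the plateau, which is the bias the promotion time cure model of Chapter 4 is designed to avoid.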

To take the possibility of cure into account, we model the time to event through the promotion time cure model, jointly with a linear mixed effects model for the longitudinal data. Unlike other commonly used mixture cure rate models, the promotion time cure model does not deliberately separate the population into cured and uncured subpopulations, and hence it avoids the associated identifiability issues. Conditional on covariates and the random effects shared between the two models, we assume the longitudinal data are independent of the cure-survival data. The proposed method is flexible in that the proportionality assumption does not need to hold for the survival event.

1.3 Partially Linear Model for Longitudinal Data with Informative Censoring

Longitudinal data analysis is challenged by informative censoring, where the censorship can provoke biases in estimating model parameters. Most existing methods for jointly modeling longitudinal data and a censored event assume a fully parametric specification for the mean structure of the longitudinal responses. While parametric approaches are useful, questions always arise about the adequacy of the model assumptions. Indeed, many longitudinal studies, for example HIV/AIDS clinical trials, show that parametric models are not sufficient to reveal the complicated patterns of responses with covariates in practice. This motivates us to consider a partially linear model that combines an unspecified underlying trajectory of the longitudinal responses with linear covariate effects. Specifically, we propose a partially linear model for the longitudinal responses and a transformed survival model for the informative censoring. This semiparametric modeling approach allows sufficient flexibility to disclose complex patterns of longitudinal responses. In the proposed method, the dependence of the longitudinal data on the informative censorship is modeled by shared latent effects.
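As a rough sketch of the sieve idea, the snippet below approximates an unspecified trajectory α(t) with a cubic B-spline basis (built from scratch via the Cox-de Boor recursion) and estimates the linear covariate effect by least squares. The true functions, knots, and sample sizes are invented for illustration, and the fit deliberately ignores the informative-censoring and random-effects machinery of the proposed method.

```python
import numpy as np

def bspline_basis(t, knots, k):
    """Evaluate all degree-k B-spline basis functions at points t
    (Cox-de Boor recursion; knots must be nondecreasing)."""
    B = np.zeros((len(t), len(knots) - 1))
    for j in range(len(knots) - 1):                      # degree 0
        B[:, j] = (knots[j] <= t) & (t < knots[j + 1])
    for d in range(1, k + 1):                            # raise the degree
        Bn = np.zeros((len(t), len(knots) - d - 1))
        for j in range(len(knots) - d - 1):
            d1 = knots[j + d] - knots[j]
            if d1 > 0:
                Bn[:, j] += (t - knots[j]) / d1 * B[:, j]
            d2 = knots[j + d + 1] - knots[j + 1]
            if d2 > 0:
                Bn[:, j] += (knots[j + d + 1] - t) / d2 * B[:, j + 1]
        B = Bn
    return B

rng = np.random.default_rng(2)
n = 400
t = rng.uniform(0.0, 1.0, size=n)
x = rng.normal(size=n)
beta = 1.5                                  # linear covariate effect (example)
y = np.sin(np.pi * t) + beta * x + rng.normal(0.0, 0.2, size=n)

# Clamped cubic B-spline basis on [0, 1] with a few interior knots (the sieve).
knots = np.r_[[0.0] * 4, [0.25, 0.5, 0.75], [1.0] * 4]
B = bspline_basis(t, knots, k=3)

# Least-squares fit of the partially linear mean alpha(t) + beta * x.
design = np.column_stack([B, x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
beta_hat = coef[-1]
print(round(float(beta_hat), 2))
```

Because x is independent of t here, the spline part absorbs α(t) and the last coefficient recovers β; choosing the number of knots is the model-selection problem revisited in Chapter 5.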

Chapter 2
Literature Review

In this chapter, we review the literature on statistical methods for longitudinal and survival data in Section 2.1, for longitudinal and cure-survival data in Section 2.2, for longitudinal data and recurrent events in Section 2.3, and for recurrent and terminal events in Section 2.4.

2.1 Models for Longitudinal and Survival Data

In survival analysis, the most attractive models are the Cox proportional hazards model (Cox, 1972) and the proportional odds model (Bennett, 1983), which have been fully explored in theory and extensively used in practice. For two sets of covariate values, the proportional hazards model assumes that the associated hazard ratio is constant over time, while the proportional odds model assumes that the associated odds ratio of survival is constant over time. The two models are special cases of linear transformation models, which provide many useful alternatives. In Section 2.1.1, we review the transformation models for survival analysis. These transformation models will be one of the important features of the three topics proposed in this dissertation. In longitudinal data analysis, the main interest lies in the pattern or mean changes of responses measured at a series of observation times. To identify

the complicated trajectory of repeated measures, there has been increasing interest and activity in the general area of partially linear regression models. In Section 2.1.2, we review the methods and techniques developed for partially linear models. The acquired knowledge and skills for partially linear regression models will be an essential part of accomplishing the proposed work in Chapter 5. In longitudinal and survival data analysis, joint modeling approaches are one of the most popular ways to describe or control the dependence between longitudinal data and a time-to-event from the same subject. Depending on the purpose of the study, various joint modeling approaches have been useful in different applications. In Section 2.1.3, we review the various joint modeling approaches for longitudinal and survival data.

2.1.1 Transformation Models for Survival Data

A class of transformation models for survival functions was proposed by Cheng et al. (1995), in which an unknown transformation of the survival time is linearly related to the covariates with a completely specified error distribution. Specifically, let T be the failure time and let Z be a vector of covariates. We denote the survival function of T given Z by S_Z(t). Then, the Cox proportional hazards model can be written as

    log{−log(S_Z(t))} = H(t) + β^T Z,

and the proportional odds model can be written as

    logit(S_Z(t)) = H(t) + β^T Z,

where H(t) is a completely unspecified, strictly increasing function, and β is a vector of unknown regression coefficients. A natural generalization of these models is

    g(S_Z(t)) = H(t) + β^T Z,

where g is a known continuous and decreasing function. It is easy to see that the above equation is equivalent to the linear transformation model of Cheng et al. (1995),

    H(T) = β^T Z + ε,   (2.1)

where ε is a random error with a known distribution function F = 1 − g^{−1}. If F is the extreme value distribution, F(s) = 1 − exp{−exp(s)}, then (2.1) is the proportional hazards model, while if F is the standard logistic distribution, that is, P[ε > s] = exp(s)/{1 + exp(s)}, then (2.1) is the proportional odds model. We note that model (2.1) is appealing in that it is a familiar linear model and includes the proportional hazards and proportional odds models as special cases. However, model (2.1) can handle neither time-dependent covariates nor generalizations to counting processes such as recurrent events. Zeng and Lin (2006) proposed a class of semiparametric transformation models for general counting processes to accommodate time-varying covariates in the intensity functions of recurrent events. In particular, let N*(t) be the number of events that have occurred by time t, and let Z(·) be a vector of time-varying covariates. Then, the cumulative intensity function for N*(t) conditional on {Z(s); s ≤ t}, denoted by Λ_Z(t), takes the form

    Λ_Z(t) = G( ∫_0^t R*(s) e^{β^T Z(s)} dΛ(s) ),   (2.2)

where R*(·) is the indicator process for the risk set, Λ(·) is an arbitrary increasing function, and G is a continuously differentiable and strictly increasing function with G(0) = 0, G(∞) = ∞, and G′(0) > 0. As examples of the transformation function

G(·), one can consider the class of Box-Cox transformations,

    G(x) = {(1 + x)^ρ − 1}/ρ,  ρ ≥ 0,

with ρ = 0 corresponding to G(x) = log(1 + x), and the class of logarithmic transformations,

    G(x) = log(1 + γx)/γ,  γ ≥ 0,

with γ = 0 corresponding to G(x) = x. In both cases, the choice G(x) = x yields the proportional intensity or hazards models, while G(x) = log(1 + x) leads to the proportional odds models. We note that when N*(t) has a single jump at the survival time T and Z is time-invariant, (2.2) reduces to the linear transformation model (2.1) in that

    log Λ(T) = −β^T Z + log G^{−1}(−log ε*),

where ε* has a uniform distribution. Zeng and Lin (2007a) further extended the class of semiparametric transformation models for the intensity function of a counting process with random effects, which allows non-proportional intensities and various frailty distributions. By introducing the random effects, the proposed models account for the dependence of the recurrent event times within the same subject. Let X(·) and Z(·) be vectors of possibly time-dependent covariates associated with the fixed and random effects, respectively. Conditional on {X(s), Z(s), b; s ≤ t}, the cumulative intensity function for N*(t) has the form

    Λ(t | X, Z; b) = G( ∫_0^t R*(s) e^{β^T X(s) + b^T Z(s)} dΛ(s) ),   (2.3)

where b is a set of random effects with a parametric density function. These models are substantially flexible in the sense that one has a wide variety of options for the transformation G as well as for the distribution of the random effects.

2.1.2 Partially Linear Models for Longitudinal Data

Parametric regression models for longitudinal data have received tremendous attention, and the related methods have been well developed. However, a major limitation of these methods is that a fully parameterized mean structure may be insufficient for modeling the complicated relationship between the responses and covariates in various longitudinal studies. Examples include trajectories of CD4 cell counts in HIV/AIDS research (Zeger and Diggle, 1994; Lin and Ying, 2001; Huang et al., 2002; Brown et al., 2005); time-varying effects of gender and HIV status on the growth of infants born to HIV-infected mothers (Hoover et al., 1998); age effects on childhood respiratory disease (Diggle et al., 2002); and treatment effects on the longitudinal number of bladder tumors (Sun et al., 2005; Liang et al., 2009). These practical applications encouraged significant developments in nonparametric regression methods for longitudinal data, in which unspecified functions of time or covariates provide enough flexibility to reflect the complicated relationship between longitudinal outcomes and covariates. Nevertheless, although a semiparametric partially linear regression model is in many cases more desirable than modeling every covariate effect nonparametrically, only limited work has been done on semiparametric regression for correlated data. We review three ways of estimating the parameters in semiparametric regression models: kernel smoothing, smoothing splines, and regression splines.
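As a concrete reference point for the first of these approaches, a minimal Nadaraya-Watson kernel smoother is sketched below on simulated data. The bandwidth, curve, and sample size are illustrative; this is plain kernel regression for independent data, not the longitudinal backfitting or profile-kernel procedures reviewed next.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
t = np.sort(rng.uniform(0.0, 1.0, size=n))
y = np.sin(2 * np.pi * t) + rng.normal(0.0, 0.3, size=n)

def nw_smooth(t0, t, y, h):
    """Nadaraya-Watson estimate of E[Y | t = t0] with a Gaussian kernel:
    a weighted average of y with weights decaying in |t0 - t| / h."""
    w = np.exp(-0.5 * ((t0 - t) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

grid = np.linspace(0.05, 0.95, 10)
mu_hat = np.array([nw_smooth(g, t, y, h=0.05) for g in grid])

# The smoother tracks the underlying mean curve sin(2*pi*t).
err = np.max(np.abs(mu_hat - np.sin(2 * np.pi * grid)))
print(round(float(err), 3))
```

The bandwidth h plays the role that cross-validation selects in the backfitting procedure of Zeger and Diggle (1994): too small gives a noisy estimate, too large oversmooths the curvature.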
Kernel smoothing was considered by Zeger and Diggle (1994) and Lin and Carroll (2001), among others, for models with linear covariate effects and a nonparametric function of time with correlated observations. Let Y_ij = Y_i(t_ij) (i = 1, ..., n; j =

1, ..., m_i) be the jth outcome of the ith subject at time t_ij. Zeger and Diggle (1994) and Moyeed and Diggle (1994) proposed a semiparametric mixed effects model for longitudinal data,

    Y_ij = µ(t_ij) + X_ij^T β + W_i(t_ij) + ε_ij,   (2.4)

where µ(t) is a twice-differentiable smooth function of time t, β is a vector of regression coefficients associated with covariates X_ij, W_i(t) is a subject-specific stationary Gaussian process with mean zero, and ε_ij is white measurement noise with constant variance σ². They suggested a backfitting procedure that initially estimates µ(t) by a kernel smoother with the bandwidth parameter chosen via cross-validation, and then iteratively estimates µ(t) and β using generalized least squares. For clustered data, Lin and Carroll (2001) considered a marginal partially generalized linear model and the profile-kernel method, in which the nonparametric function is estimated using local linear kernel regression and the regression coefficients are estimated using profile estimating equations. Surprisingly, the resulting regression parameter estimators from the conventional profile-kernel method fail to achieve semiparametric efficiency. A smoothing spline is an alternative choice for the nonparametric estimation of µ(t); it uses a piecewise polynomial function with all the observation times as knots and smoothness constraints imposed at the knots. The most commonly used smoothing spline is the natural cubic smoothing spline, which approximates µ(t) by a piecewise cubic function with boundary constraints. The natural cubic smoothing spline was studied by Zhang et al. (1998) to estimate the nonparametric function of time in a partially linear model that expanded (2.4) with the addition of subject-specific random effect terms. They estimated β and µ(t) as a natural cubic

spline by maximizing the penalized likelihood function with the penalty term

    (λ/2) ∫_{T_1}^{T_2} [µ″(t)]² dt = (λ/2) µ^T K µ,

where λ ≥ 0 is a smoothing parameter controlling the balance between the goodness of fit and the roughness of the estimated µ(t), T_1 and T_2 specify the range of t, µ = (µ(t_11), ..., µ(t_{n,m_n}))^T, and K is the nonnegative definite smoothing matrix defined in equation (2.3) of Green and Silverman (1994). A key feature of this approach is that the proposed semiparametric model can be represented as a modified parametric linear mixed model. Therefore, the smoothing parameter and the variance components can be estimated simultaneously using restricted maximum likelihood. Another attractive method for estimating the nonparametric function is regression splines. The smoothing spline has the merit of avoiding the knot selection issue, since it uses all the observation points as knots. However, when the sample size is large, the computational demands grow substantially and become difficult to manage. In contrast, a key advantage of regression splines is their computational simplicity. Regression splines are a basis-function-based nonparametric regression method, which uses a small number of knots and implements a parametric regression using the bases. The most commonly used basis functions for regression splines are the B-spline basis. Rice and Wu (2001) adopted the B-spline basis with equally spaced knots in estimating µ(t) and a smooth random function W_i(t) in (2.4). The approximated mean function is

    µ(t) = Σ_{k=1}^{p} ξ_k B_k(t),

where {B_k(·)} is a basis for spline functions on the time range with a fixed knot

sequence. Similarly, the random function for the ith subject can be approximated with splines,

    W_i(t) = Σ_{k=1}^{q} ν_ik B̃_k(t),

where {B̃_k(·)} is a basis for the random spline function, which may differ from {B_k(·)}, and the ν_ik are random coefficients with mean zero and covariance matrix V. Then, conditional on p and q, the approximated model is a classical linear mixed effects model. Estimation of the parameters β, ξ, σ², and the covariance matrix V can be accomplished by the EM algorithm. For the regression spline method, the choices of the number and location of the knots are critical, since the estimation of µ(t) and W_i(t) can be very sensitive to these choices. Rice and Wu (2001) suggested using model selection techniques such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and leave-one-subject-out cross-validation.

2.1.3 Joint Models for Longitudinal Data and Survival Event

Analysis of longitudinal and survival data can be classified into three categories, depending on how one factors the joint distribution of the repeated measurements and an event time to meet the study objective. A joint model of the vector of repeated measurements Y and the event time T corresponds to the factorization

    f(Y, T) = f(Y | T) f(T) = f(T | Y) f(Y),

where f(·) denotes a density function. The three categories are referred to as selection models, pattern-mixture models, and simultaneous models. First, in selection models, the time to event is the endpoint of interest, and the common primary objective of the study is to assess the relationship between the event time and a longitudinal covariate process with measurement error. One example is modeling the probability

of death given the trajectory of CD4 cell counts, that is, f(T | Y). Second, in pattern-mixture models, the repeated measures are the primary endpoint; investigators focus on modeling f(Y | T) and are mainly interested in the effect of covariates on the longitudinal outcomes, while accounting for possible correlation with an event such as non-ignorable dropout or death. In these cases, the longitudinal process is subject to right-censoring because it is unobservable after the censoring time. Third, in simultaneous models, the repeated measures and the survival time are both important outcomes; that is, the focus is on f(Y, T). The primary goal of the joint analysis is to evaluate simultaneously the effects of covariates on the two types of outcomes, while accounting for the relationship between the longitudinal and event time data. In all three types of joint models, it is commonly assumed that the observation times of the longitudinal outcomes are not informative because the outcomes are measured at scheduled follow-up visits. Recent literature is briefly reviewed in the subsequent paragraphs.

Selection models

The association of longitudinal covariates with a failure time as the primary endpoint can be assessed through joint modeling of the Cox proportional hazards model for the failure time and a random process model for the longitudinal covariates when the longitudinal covariates are intermittently measured with error. In this situation, the longitudinal covariates may not be observed at the failure times. The presence of random error in a measured covariate causes the parameter estimators to be biased toward the null (Prentice, 1982). A naive approach is to substitute the closest observed covariate value prior to the failure time, often termed last value carried forward, into the Cox partial likelihood for each subject at each failure time. However, it is well known that substituting mismeasured values for the true covariates in the Cox model leads to biased estimation (Prentice, 1982). Various approaches have been proposed to deal with measurement error.
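The attenuation-toward-the-null phenomenon can be seen in a few lines. For transparency, the sketch uses a linear model rather than the Cox model, and all numbers are illustrative: under classical measurement error, the naive slope shrinks by the factor var(X)/{var(X) + var(error)}, here 1/2.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000

x = rng.normal(0.0, 1.0, size=n)          # true covariate
x_err = x + rng.normal(0.0, 1.0, size=n)  # covariate measured with error
y = 2.0 * x + rng.normal(0.0, 1.0, size=n)

def slope(a, b):
    """Simple-regression slope of b on a."""
    return np.cov(a, b)[0, 1] / np.var(a)

b_true = slope(x, y)       # close to the true effect 2.0
b_naive = slope(x_err, y)  # attenuated toward 0, close to 2.0 * 1/2 = 1.0
print(round(float(b_true), 1), round(float(b_naive), 1))
```

The joint-modeling approaches reviewed next avoid this attenuation by modeling the true covariate process rather than plugging in error-prone observations.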

Tsiatis et al. (1995) proposed a two-stage method in which, in the first stage, empirical Bayes estimates of the random covariates are computed, and, in the second stage, these are imputed into the partial likelihood of the Cox model as the true values of the time-dependent covariates at each event time point. However, the two-stage model did not use any survival information in modeling the covariate process, and hence the information is not utilized as efficiently as it could be. In addition, the estimated covariate values from stage one are regarded as fixed in stage two, so the approach does not propagate uncertainty from stage one to stage two. Instead of simply plugging the predicted covariate values into the Cox model, Wulfsohn and Tsiatis (1997) developed the two-stage idea in a different direction, maximizing the joint likelihood of the covariate process and the survival data. The joint maximization makes more efficient use of the data by utilizing information from both the longitudinal covariates and survival simultaneously. Wulfsohn and Tsiatis (1997) used the EM algorithm to estimate all the parameters in the covariate and survival processes together, assuming that the random effects characterizing the longitudinal covariate process are normally distributed. An attractive feature of this likelihood-based approach is its robustness against departures from the normal random effects assumption. Hsieh et al. (2006) confirmed that the likelihood-based procedure with normal random effects can be very efficient and robust as long as the longitudinal data are not too sparse and do not carry too large measurement errors. In contrast, for situations where the normality assumption on the random effects is violated or regarded as too strict, Tsiatis and Davidian (2001) proposed conditional score estimators.
The underlying idea of the conditional score approach is to treat the random effects as nuisance parameters, derive a sufficient statistic for them, and construct a set of estimating equations conditional on that sufficient statistic. The resulting estimating equations are then free of the

random effects. The proposed model is semiparametric in the sense that it does not require any distributional assumption on the random effects. Song et al. (2002) proposed another semiparametric model in which the parametric assumptions on the distribution of the random effects are relaxed to require only a smooth density. They took a likelihood-based approach with the EM algorithm for the estimation procedure. An important feature of this procedure, in contrast to the conditional score approach, is that it permits investigation of robustness to parametric assumptions on the random effects. Song and Wang (2008) proposed an even more flexible semiparametric model by introducing time-varying coefficients into the proportional hazards model for the failure time, which allows covariate effects to vary over time, in addition to making no distributional assumption on the underlying longitudinal covariate process. The estimation procedure was based on conditional score estimators, and asymptotic properties of the estimators were derived using martingale and empirical process theory.

Pattern-mixture models

Vonesh et al. (2006) presented a joint model of longitudinal and survival data, focusing on the estimation and comparison of serial trends over time while adjusting for possible informative censoring due to patient dropout. They emphasized, through extensive simulation studies, the need to account for non-ignorable dropout and death. They used a generalized linear mixed effects model for the repeated measurements and a family of accelerated failure time (AFT) models for the event time. The presented joint model is relatively flexible in that the family of AFT models includes various proportional hazards models (e.g., Weibull, extreme value, piecewise exponential) and non-proportional hazards models (e.g., log-normal). An alternative joint model was introduced by Liu et al. (2007) for medical costs recorded repeatedly at fixed time intervals in the presence of a terminating event, such as death. Both Vonesh et al. (2006) and Liu et al. (2007)

modeled the terminal event as a function of covariates and linked the terminal event to the pattern of repeated measures through random effects shared by the longitudinal and survival components. Taking a likelihood-based approach, Vonesh et al. (2006) used maximum likelihood (ML) estimation with the observed log-likelihood approximated by the second-order Laplace method, while Liu et al. (2007) carried out ML estimation through the EM algorithm.

Simultaneous models

Henderson et al. (2000) considered both longitudinal data and recurrent or single event times to be equally important endpoints and jointly formulated them via correlated latent Gaussian processes. For clustered data, Ratcliffe et al. (2004) proposed a joint model for longitudinal and survival outcomes that linked the two outcomes through cluster-level random effects. In their method, repeated measures were modeled using a mixed effects model incorporating both subject-level and cluster-level random effects, and survival data were modeled using a Cox model with the cluster-level random effects to allow for between-cluster heterogeneity. While most joint models associate repeated measures with survival data via common random effects or latent processes, Zeng and Cai (2005a) allowed each unobserved random factor to affect the longitudinal measure and the survival time differently. ML estimation with the EM algorithm was used in Henderson et al. (2000), Ratcliffe et al. (2004), and Zeng and Cai (2005a); however, the asymptotic properties of the proposed ML estimators were established for the first time by Zeng and Cai (2005a).

2.2 Models for Cure-Survival Data

A cure model is applicable when there exist immunes or long-term survivors in survival data. Cured subjects never experience the event endpoint

but are censored, because cure can never be observed. On the other hand, susceptible subjects would eventually develop the endpoint if followed long enough. The primary interest in such studies can be in the effect of covariates on the cure rate as well as on the time to event. In this section, we review, in the subsections below, approaches to modeling cure in survival analysis that do not involve any longitudinal data.

Mixture Cure Models

One of the commonly used cure models is the so-called mixture model, named after the basic concept that the underlying population consists of two subpopulations, the cured and the non-cured. The mixture cure model mixes a proportion π(Z_i) belonging to the cured subpopulation with the remaining fraction 1 − π(Z_i) that is not cured, such that

    S_pop(t | Z_i) = π(Z_i) + {1 − π(Z_i)} S_uc(t),

where Z_i is the vector of covariates and S_uc(t) is the conditional survival function for the uncured population. It is assumed that all patients in the non-cured subpopulation will eventually experience the event, while those in the cured subpopulation never will. Early work on such models was done by Berkson and Gage (1952), Farewell (1982, 1986), and Yamaguchi (1992) under completely specified parametric models. Berkson and Gage (1952) used a mixture of exponential distributions with a constant cure fraction π(Z_i) = π. Farewell (1982) adopted Weibull regression for survival and logistic regression for the cure fraction, given by

    π(Z_i) = exp(β^T Z_i) / {1 + exp(β^T Z_i)}.   (2.5)
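For concreteness, a minimal sketch (illustrative only; the covariate values, coefficients, and the exponential choice of S_uc are hypothetical) evaluating the mixture cure survival function with the logistic cure fraction (2.5):

```python
import numpy as np

def cure_fraction(z, beta):
    """Logistic cure fraction pi(Z) = exp(beta'Z) / (1 + exp(beta'Z)), as in (2.5)."""
    eta = np.dot(z, beta)
    return np.exp(eta) / (1.0 + np.exp(eta))

def s_pop(t, z, beta, rate=0.5):
    """Population survival S_pop(t|Z) = pi(Z) + (1 - pi(Z)) * S_uc(t),
    with an exponential S_uc(t) = exp(-rate * t) chosen purely for illustration."""
    pi = cure_fraction(z, beta)
    s_uc = np.exp(-rate * np.asarray(t, dtype=float))
    return pi + (1.0 - pi) * s_uc

z = np.array([1.0, 0.3])        # hypothetical covariates (incl. intercept)
beta = np.array([-1.0, 0.8])    # hypothetical coefficients
print(s_pop(0.0, z, beta))      # equals 1 at t = 0
print(s_pop(1e6, z, beta))      # plateaus at the cure fraction pi(Z)
```

At t = 0 the function equals 1, and as t grows it plateaus at the cure fraction π(Z) rather than decaying to zero, which is the defining feature of a cure model.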

Yamaguchi (1992) applied a cure model with a logistic mixture probability model (2.5) and an accelerated failure time model with a generalized gamma distribution. Laska and Meisner (1992) studied the cure model extensively, specifically with nonparametric failure time models, adopting Kaplan and Meier (1958) estimation. More recent work has focused on semiparametric approaches: mixtures in which the cure fraction is modeled through a logistic link (2.5) and the survival distribution has a completely or partially nonparametric component. Taylor (1995) introduced a more flexible mixture cure model, an extension of Farewell (1982), by leaving the conditional survival distribution for uncured individuals completely unspecified. To investigate the effects of covariates on the time to event, other semiparametric mixture models have been proposed (Kuk and Chen, 1992; Sy and Taylor, 2000; Peng and Dear, 2000; Lu and Ying, 2004). Kuk and Chen (1992) estimated the regression parameters first by eliminating the baseline survival function via a Monte Carlo approximation of a marginal likelihood, and then estimated the baseline survival function using an EM algorithm, given the regression parameter estimates. Sy and Taylor (2000) and Peng and Dear (2000) studied alternative estimation techniques using the classic EM algorithm to compute estimates of both the parametric and nonparametric components. The theoretical properties of the resulting estimators for the proportional hazards cure model remain to be established. Lu and Ying (2004) considered a class of transformation models for the event time. They proposed generalized estimating equations for parameter estimation, and the asymptotic properties were established via the usual counting process and associated martingale theory. However, their approach was limited to time-independent covariates due to the form of the transformations.
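Transformation classes of this kind are typically indexed by a single parameter. A standard example (a sketch with illustrative values, not necessarily the class used by Lu and Ying, 2004) is G_η(x) = (1 + ηx)^{−1/η} for η > 0, with limit exp(−x) as η → 0, which yields proportional hazards at η = 0 and proportional odds at η = 1:

```python
import math

def g_transform(x, eta):
    """Transformation G_eta(x) = (1 + eta*x)^(-1/eta) for eta > 0,
    with its eta -> 0 limit exp(-x); eta = 0 corresponds to proportional
    hazards and eta = 1 to proportional odds."""
    if eta == 0.0:
        return math.exp(-x)
    return (1.0 + eta * x) ** (-1.0 / eta)

x = 0.7
print(g_transform(x, 0.0))   # exp(-0.7): proportional hazards
print(g_transform(x, 1.0))   # 1 / 1.7: proportional odds
print(g_transform(x, 1e-6))  # close to exp(-0.7): continuity in eta
```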
Although the mixture cure model is intuitively appealing, it involves several unresolved issues, discussed by Farewell (1986), Laska and Meisner (1992), Taylor (1995), Chen et al. (1999), and Ibrahim et al. (2001). One problem

associated with the mixture model is identifiability. This arises from the lack of information at the end of the follow-up period when a substantial proportion of subjects are censored before the end of the study. As a result, it can be difficult to determine whether a censored subject belongs to the cured group or the susceptible group.

Promotion Time Cure Models

An alternative way to incorporate a cure fraction in survival analysis is the promotion time cure model, also referred to as the bounded cumulative hazard model (Yakovlev et al., 1996). The existing literature on promotion time cure models is mainly in the Bayesian context, since the population survival function is improper. These models have been proposed and studied by Yakovlev et al. (1996), Tsodikov (1998), and Chen et al. (1999), among others. The promotion time cure model was motivated by cancer clinical trials under the biological assumption that a patient has N metastatic tumor cells remaining after treatment. Let N_i be the number of metastatic cancerous cells of the ith patient, an unobservable latent variable. The N_i are assumed to have a Poisson distribution with mean π(Z_i). We denote the time for the kth metastatic cancer cell to produce a detectable tumor (the promotion time) by T̃_k (k = 1, ..., N_i) and assume that, conditional on N_i, the T̃_k are independently and identically distributed with cumulative distribution function F(t). If we interpret F(t) = 1 − S_uc(t), then F can be viewed, as in the mixture model, as the distribution function for the uncured patients. Then the time to relapse of cancer for the ith patient, defined by T_i = min{T̃_1, ..., T̃_{N_i}}, has the form of the population

survival function

    S_pop(t | Z_i) = P[N_i = 0] + Σ_{k≥1} P[T̃_1 > t, ..., T̃_k > t | N_i = k] P[N_i = k]
                   = exp{−π(Z_i)} + Σ_{k≥1} {1 − F(t)}^k π(Z_i)^k exp{−π(Z_i)} / k!
                   = exp{−π(Z_i) F(t)}.   (2.6)

In the promotion time cure model (2.6), the survival function is integrated into one formulation regardless of cure status. The hazard function is given by π(Z_i) f(t), where f(t) = dF(t)/dt. Thus, model (2.6) retains the proportional hazards structure when the covariates Z_i enter through π(Z_i) = exp(β^T Z_i). Moreover, if the regression coefficients β include an intercept term, say β_0, the baseline cumulative hazard function equals exp(β_0) F(t), which implies that model (2.6) becomes the Cox proportional hazards model with a bounded baseline cumulative hazard. The survival rate at t = ∞ can be interpreted as the cure rate, i.e., the cure rate is S_pop(∞ | Z_i) = exp{−π(Z_i)} > 0, leading to an improper survival function.

Transformation of Promotion Time Cure Models

In model (2.6), the independence assumption on {T̃_k | N_i; k = 1, ..., N_i} may not be realistic in practice, since the promotion times share features of the same patient, such as the patient's underlying health condition or dietary habits. To accommodate correlated cancer progression times, Zeng et al. (2006) introduced a subject-specific frailty ζ_i and assumed that, given (N_i, ζ_i), the T̃_k are mutually independent with distribution function F(t). Moreover, ζ_i provides an opportunity to reflect the underlying heterogeneity in the rate of metastatic cancer cells through the

assumption that N_i follows a Poisson distribution with mean ζ_i π(Z_i), conditional on (Z_i, ζ_i). Following a derivation similar to that of (2.6), the resulting survival function for the time to relapse T takes the form

    S(t | Z_i) = E_{ζ_i}[exp{−π(Z_i) F(t) ζ_i}],   (2.7)

where E_{ζ_i} denotes the expectation with respect to ζ_i. Explicitly specifying the distribution of ζ_i as a gamma distribution with unit mean and variance η, we can express (2.7) as

    S(t | Z_i) = [1 + η π(Z_i) F(t)]^{−1/η} = G_η(π(Z_i) F(t)),   (2.8)

where G_η(·) is a transformation with parameter η such that G_η(x) = (1 + ηx)^{−1/η} for η > 0 and G_η(x) = exp(−x) for η = 0. This class of transformations includes the proportional hazards model (when η = 0) and the proportional odds model (when η = 1) as special cases.

2.3 Models for Longitudinal Data and Recurrent Events

For the analysis of longitudinal data with informative observation times, a variety of joint models have been developed. Instead of considering a common set of observation times across all subjects, Lin and Ying (2001), Lin et al. (2004), and Sun et al.

(2005), among others, proposed to use counting processes to describe arbitrary observation times. The counting process approach allows subject-specific observation times through directly adjusted covariate effects, thereby providing a flexible tool for modeling the observation process. For the longitudinal component, Lin and Ying (2001) and Sun et al. (2005) modeled the pattern of longitudinal outcomes using a partially linear model, whereas Lin et al. (2004) modeled it using a nonparametric function with linear covariate effects. In these models, different assumptions have been made about the longitudinal outcome and observation processes. In Lin and Ying (2001), the observation process is assumed to be independent of the longitudinal outcome process after adjusting for some external covariates. In Lin et al. (2004), the intensity of observation at time t is assumed to be independent of the longitudinal outcome at that time point given the past observed data, whereas in Sun et al. (2005), the longitudinal outcome at time t is assumed to depend only on some external covariates and the past observation history, such as the total and recent numbers of observations. The commonly used approaches among these were marginal models based on estimating equations for both the longitudinal data and the time processes. Under such marginal approaches, it is challenging to obtain efficient estimators, and it is impossible to predict future outcomes of an individual given the past information. An alternative approach was suggested by Liang et al. (2009). Based on the idea that, in practice, the observation process may be correlated with the longitudinal outcomes through unmeasured confounders even after conditioning on external covariates, they studied a joint modeling approach using random effects. The longitudinal outcomes with irregular observation times were modeled through a partially linear mixed model, and the informative observation process was modeled by adopting a frailty nonhomogeneous Poisson process structure. However, their method is

limited to the case where both the distribution of the frailty and the conditional linear mean structure between the random effects in the longitudinal and observation processes can be specified.

2.4 Models for Recurrent and Terminal Events

In this section, we review previous research on joint modeling of recurrent and terminal events. Statistical methodology and theory for analyzing recurrent event data are typically developed under the assumption of non-informative censoring times. In many applications, however, a failure event serves as part of the censoring mechanism, meaning that the failure event terminates the observation of further recurrent events (so-called informative censoring), and the independent censoring assumption can be violated. For example, if the rate of recurrent tumors is high in a patient, this patient is also subject to an increased risk of death. The most popular way to model or control the dependence of recurrent events on a terminal event or informative censoring is a joint modeling approach. Joint (or shared) frailty (or random effects) models have been studied by several authors. In these models, the dependence between the recurrent and terminal events is specified via a common frailty variable allowed to have a multiplicative effect on their respective rates. The most popular distributional assumption on the frailty is a gamma distribution with unit mean, to avoid non-identifiability (Lancaster and Intrator, 1998; Liu et al., 2004; Ye et al., 2007; Huang and Liu, 2007). Lancaster and Intrator (1998) considered joint parametric modeling of recurrent event and survival data, using Poisson processes for the rate functions of the recurrent and terminal events. Liu et al. (2004) considered proportional hazards frailty models in which the recurrent and terminal event processes were jointly modeled through a shared gamma frailty.
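The dependence induced by a shared frailty can be seen in a small simulation (a hedged sketch, not the estimation procedure of any model cited above; all rates and the frailty variance are hypothetical): a gamma frailty with unit mean multiplies both the recurrence rate and the death hazard, so subjects with many recurrent events tend to die earlier even though, given the frailty, the two processes are independent.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
theta = 0.25                                  # frailty variance (hypothetical)
xi = rng.gamma(1.0 / theta, theta, size=n)    # shared gamma frailty, mean 1

lam_recur, lam_death = 2.0, 0.5               # hypothetical baseline rates
n_recur = rng.poisson(xi * lam_recur)         # recurrences over a unit time window
death = rng.exponential(1.0 / (xi * lam_death))  # terminal event times

# Given xi the two outcomes are independent; marginally, the shared
# frailty induces negative dependence: many recurrences, earlier death.
high = n_recur > np.median(n_recur)
print(death[high].mean(), death[~high].mean())
```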

Longitudinal + Reliability = Joint Modeling

Longitudinal + Reliability = Joint Modeling Longitudinal + Reliability = Joint Modeling Carles Serrat Institute of Statistics and Mathematics Applied to Building CYTED-HAROSA International Workshop November 21-22, 2013 Barcelona Mainly from Rizopoulos,

More information

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model Other Survival Models (1) Non-PH models We briefly discussed the non-proportional hazards (non-ph) model λ(t Z) = λ 0 (t) exp{β(t) Z}, where β(t) can be estimated by: piecewise constants (recall how);

More information

Semiparametric Mixed Effects Models with Flexible Random Effects Distribution

Semiparametric Mixed Effects Models with Flexible Random Effects Distribution Semiparametric Mixed Effects Models with Flexible Random Effects Distribution Marie Davidian North Carolina State University davidian@stat.ncsu.edu www.stat.ncsu.edu/ davidian Joint work with A. Tsiatis,

More information

Lecture 5 Models and methods for recurrent event data

Lecture 5 Models and methods for recurrent event data Lecture 5 Models and methods for recurrent event data Recurrent and multiple events are commonly encountered in longitudinal studies. In this chapter we consider ordered recurrent and multiple events.

More information

Multivariate Survival Analysis

Multivariate Survival Analysis Multivariate Survival Analysis Previously we have assumed that either (X i, δ i ) or (X i, δ i, Z i ), i = 1,..., n, are i.i.d.. This may not always be the case. Multivariate survival data can arise in

More information

A Bayesian Nonparametric Approach to Causal Inference for Semi-competing risks

A Bayesian Nonparametric Approach to Causal Inference for Semi-competing risks A Bayesian Nonparametric Approach to Causal Inference for Semi-competing risks Y. Xu, D. Scharfstein, P. Mueller, M. Daniels Johns Hopkins, Johns Hopkins, UT-Austin, UF JSM 2018, Vancouver 1 What are semi-competing

More information

Joint Modeling of Longitudinal Item Response Data and Survival

Joint Modeling of Longitudinal Item Response Data and Survival Joint Modeling of Longitudinal Item Response Data and Survival Jean-Paul Fox University of Twente Department of Research Methodology, Measurement and Data Analysis Faculty of Behavioural Sciences Enschede,

More information

Support Vector Hazard Regression (SVHR) for Predicting Survival Outcomes. Donglin Zeng, Department of Biostatistics, University of North Carolina

Support Vector Hazard Regression (SVHR) for Predicting Survival Outcomes. Donglin Zeng, Department of Biostatistics, University of North Carolina Support Vector Hazard Regression (SVHR) for Predicting Survival Outcomes Introduction Method Theoretical Results Simulation Studies Application Conclusions Introduction Introduction For survival data,

More information

Survival Analysis Math 434 Fall 2011

Survival Analysis Math 434 Fall 2011 Survival Analysis Math 434 Fall 2011 Part IV: Chap. 8,9.2,9.3,11: Semiparametric Proportional Hazards Regression Jimin Ding Math Dept. www.math.wustl.edu/ jmding/math434/fall09/index.html Basic Model Setup

More information

Semiparametric Regression

Semiparametric Regression Semiparametric Regression Patrick Breheny October 22 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/23 Introduction Over the past few weeks, we ve introduced a variety of regression models under

More information

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data Malaysian Journal of Mathematical Sciences 11(3): 33 315 (217) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES Journal homepage: http://einspem.upm.edu.my/journal Approximation of Survival Function by Taylor

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data

Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data Xuelin Huang Department of Biostatistics M. D. Anderson Cancer Center The University of Texas Joint Work with Jing Ning, Sangbum

More information

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January

More information

UNIVERSITY OF CALIFORNIA, SAN DIEGO

UNIVERSITY OF CALIFORNIA, SAN DIEGO UNIVERSITY OF CALIFORNIA, SAN DIEGO Estimation of the primary hazard ratio in the presence of a secondary covariate with non-proportional hazards An undergraduate honors thesis submitted to the Department

More information

Lecture 7 Time-dependent Covariates in Cox Regression

Lecture 7 Time-dependent Covariates in Cox Regression Lecture 7 Time-dependent Covariates in Cox Regression So far, we ve been considering the following Cox PH model: λ(t Z) = λ 0 (t) exp(β Z) = λ 0 (t) exp( β j Z j ) where β j is the parameter for the the

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview

Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations

More information

Spline-based sieve semiparametric generalized estimating equation for panel count data

Spline-based sieve semiparametric generalized estimating equation for panel count data University of Iowa Iowa Research Online Theses and Dissertations Spring 2010 Spline-based sieve semiparametric generalized estimating equation for panel count data Lei Hua University of Iowa Copyright

More information

STAT331. Cox s Proportional Hazards Model

STAT331. Cox s Proportional Hazards Model STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations

More information

Issues on quantile autoregression

Issues on quantile autoregression Issues on quantile autoregression Jianqing Fan and Yingying Fan We congratulate Koenker and Xiao on their interesting and important contribution to the quantile autoregression (QAR). The paper provides

More information

Mixture modelling of recurrent event times with long-term survivors: Analysis of Hutterite birth intervals. John W. Mac McDonald & Alessandro Rosina

Mixture modelling of recurrent event times with long-term survivors: Analysis of Hutterite birth intervals. John W. Mac McDonald & Alessandro Rosina Mixture modelling of recurrent event times with long-term survivors: Analysis of Hutterite birth intervals John W. Mac McDonald & Alessandro Rosina Quantitative Methods in the Social Sciences Seminar -

More information

Parametric Maximum Likelihood Estimation of Cure Fraction Using Interval-Censored Data

Parametric Maximum Likelihood Estimation of Cure Fraction Using Interval-Censored Data Columbia International Publishing Journal of Advanced Computing (2013) 1: 43-58 doi:107726/jac20131004 Research Article Parametric Maximum Likelihood Estimation of Cure Fraction Using Interval-Censored

More information

MAS3301 / MAS8311 Biostatistics Part II: Survival

MAS3301 / MAS8311 Biostatistics Part II: Survival MAS3301 / MAS8311 Biostatistics Part II: Survival M. Farrow School of Mathematics and Statistics Newcastle University Semester 2, 2009-10 1 13 The Cox proportional hazards model 13.1 Introduction In the

More information

REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520

REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520 REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520 Department of Statistics North Carolina State University Presented by: Butch Tsiatis, Department of Statistics, NCSU

More information

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA Kasun Rathnayake ; A/Prof Jun Ma Department of Statistics Faculty of Science and Engineering Macquarie University

More information

Power and Sample Size Calculations with the Additive Hazards Model

Power and Sample Size Calculations with the Additive Hazards Model Journal of Data Science 10(2012), 143-155 Power and Sample Size Calculations with the Additive Hazards Model Ling Chen, Chengjie Xiong, J. Philip Miller and Feng Gao Washington University School of Medicine

More information

Multi-state Models: An Overview

Multi-state Models: An Overview Multi-state Models: An Overview Andrew Titman Lancaster University 14 April 2016 Overview Introduction to multi-state modelling Examples of applications Continuously observed processes Intermittently observed

More information

Quantile Regression for Residual Life and Empirical Likelihood

Quantile Regression for Residual Life and Empirical Likelihood Quantile Regression for Residual Life and Empirical Likelihood Mai Zhou email: mai@ms.uky.edu Department of Statistics, University of Kentucky, Lexington, KY 40506-0027, USA Jong-Hyeon Jeong email: jeong@nsabp.pitt.edu

More information

Models for Multivariate Panel Count Data

Models for Multivariate Panel Count Data Semiparametric Models for Multivariate Panel Count Data KyungMann Kim University of Wisconsin-Madison kmkim@biostat.wisc.edu 2 April 2015 Outline 1 Introduction 2 3 4 Panel Count Data Motivation Previous

More information

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics Faculty of Health Sciences Regression models Counts, Poisson regression, 27-5-2013 Lene Theil Skovgaard Dept. of Biostatistics 1 / 36 Count outcome PKA & LTS, Sect. 7.2 Poisson regression The Binomial

More information

Consider Table 1 (Note connection to start-stop process).

Consider Table 1 (Note connection to start-stop process). Discrete-Time Data and Models Discretized duration data are still duration data! Consider Table 1 (Note connection to start-stop process). Table 1: Example of Discrete-Time Event History Data Case Event

More information

Step-Stress Models and Associated Inference

Step-Stress Models and Associated Inference Department of Mathematics & Statistics Indian Institute of Technology Kanpur August 19, 2014 Outline Accelerated Life Test 1 Accelerated Life Test 2 3 4 5 6 7 Outline Accelerated Life Test 1 Accelerated

More information

Review Article Analysis of Longitudinal and Survival Data: Joint Modeling, Inference Methods, and Issues

Review Article Analysis of Longitudinal and Survival Data: Joint Modeling, Inference Methods, and Issues Journal of Probability and Statistics Volume 2012, Article ID 640153, 17 pages doi:10.1155/2012/640153 Review Article Analysis of Longitudinal and Survival Data: Joint Modeling, Inference Methods, and

More information

Survival Analysis for Case-Cohort Studies

Survival Analysis for Case-Cohort Studies Survival Analysis for ase-ohort Studies Petr Klášterecký Dept. of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, harles University, Prague, zech Republic e-mail: petr.klasterecky@matfyz.cz

More information

Frailty Probit model for multivariate and clustered interval-censor

Frailty Probit model for multivariate and clustered interval-censor Frailty Probit model for multivariate and clustered interval-censored failure time data University of South Carolina Department of Statistics June 4, 2013 Outline Introduction Proposed models Simulation

More information

Chapter 2 Inference on Mean Residual Life-Overview

Chapter 2 Inference on Mean Residual Life-Overview Chapter 2 Inference on Mean Residual Life-Overview Statistical inference based on the remaining lifetimes would be intuitively more appealing than the popular hazard function defined as the risk of immediate

More information

Cox s proportional hazards model and Cox s partial likelihood

Cox s proportional hazards model and Cox s partial likelihood Cox s proportional hazards model and Cox s partial likelihood Rasmus Waagepetersen October 12, 2018 1 / 27 Non-parametric vs. parametric Suppose we want to estimate unknown function, e.g. survival function.

More information

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j Standard Errors & Confidence Intervals β β asy N(0, I( β) 1 ), where I( β) = [ 2 l(β, φ; y) ] β i β β= β j We can obtain asymptotic 100(1 α)% confidence intervals for β j using: β j ± Z 1 α/2 se( β j )

More information

Residuals and model diagnostics

Residuals and model diagnostics Residuals and model diagnostics Patrick Breheny November 10 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/42 Introduction Residuals Many assumptions go into regression models, and the Cox proportional

More information

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Anastasios (Butch) Tsiatis Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

[Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements

[Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements [Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements Aasthaa Bansal PhD Pharmaceutical Outcomes Research & Policy Program University of Washington 69 Biomarkers

More information

Modelling geoadditive survival data

Modelling geoadditive survival data Modelling geoadditive survival data Thomas Kneib & Ludwig Fahrmeir Department of Statistics, Ludwig-Maximilians-University Munich 1. Leukemia survival data 2. Structured hazard regression 3. Mixed model

More information

Analysing geoadditive regression data: a mixed model approach

Analysing geoadditive regression data: a mixed model approach Analysing geoadditive regression data: a mixed model approach Institut für Statistik, Ludwig-Maximilians-Universität München Joint work with Ludwig Fahrmeir & Stefan Lang 25.11.2005 Spatio-temporal regression

CTDL-Positive Stable Frailty Model

CTDL-Positive Stable Frailty Model M. Blagojevic 1, G. MacKenzie 2 1 Department of Mathematics, Keele University, Staffordshire ST5 5BG, UK and 2 Centre of Biostatistics, University of Limerick, Ireland

Survival Analysis I (CHL5209H)

Survival Analysis I (CHL5209H) Survival Analysis Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca January 7, 2015 Literature Clayton D & Hills M (1993): Statistical Models in Epidemiology. Not really

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Department of Mathematics Carl von Ossietzky University Oldenburg Sonja Greven Department of

Statistical Modeling and Analysis for Survival Data with a Cure Fraction

Statistical Modeling and Analysis for Survival Data with a Cure Fraction by Jianfeng Xu A thesis submitted to the Department of Mathematics and Statistics in conformity with the requirements for the degree

Multistate Modeling and Applications

Multistate Modeling and Applications Yang Yang Department of Statistics University of Michigan, Ann Arbor IBM Research Graduate Student Workshop: Statistics for a Smarter Planet Yang Yang (UM, Ann Arbor)

3003 Cure. F. P. Treasure

3003 Cure. F. P. Treasure 3003 Cure F. P. reasure November 8, 2000 Peter reasure / November 8, 2000/ Cure / 3003 1 Cure A Simple Cure Model he Concept of Cure A cure model is a survival model where a fraction of the population

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Institute of Statistics and Econometrics Georg-August-University Göttingen Department of Statistics

Maximum likelihood estimation in semiparametric regression models with censored data

Maximum likelihood estimation in semiparametric regression models with censored data Proofs subject to correction. Not to be reproduced without permission. Contributions to the discussion must not exceed 4 words. Contributions longer than 4 words will be cut by the editor. J. R. Statist.

A Sampling of IMPACT Research:

A Sampling of IMPACT Research: Methods for Analysis with Dropout and Identifying Optimal Treatment Regimes Marie Davidian Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

Lecture 3. Truncation, length-bias and prevalence sampling

Lecture 3. Truncation, length-bias and prevalence sampling 3.1 Prevalent sampling Statistical techniques for truncated data have been integrated into survival analysis in the last two decades. Truncation in

University of California, Berkeley

University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2004 Paper 153 A Note on Empirical Likelihood Inference of Residual Life Regression Ying Qing Chen Yichuan

Survival Analysis. Lu Tian and Richard Olshen Stanford University

Survival Analysis. Lu Tian and Richard Olshen Stanford University Survival Time/ Failure Time/Event Time We will introduce various statistical methods for analyzing survival outcomes What is the survival

Regularization in Cox Frailty Models

Regularization in Cox Frailty Models Andreas Groll 1, Trevor Hastie 2, Gerhard Tutz 3 1 Ludwig-Maximilians-Universität Munich, Department of Mathematics, Theresienstraße 39, 80333 Munich, Germany 2 University

Reduced-rank hazard regression

Chapter 2 Reduced-rank hazard regression Abstract The Cox proportional hazards model is the most common method to analyze survival data. However, the proportional hazards assumption might not hold. The

Simulation-based robust IV inference for lifetime data

Simulation-based robust IV inference for lifetime data Anand Acharya 1 Lynda Khalaf 1 Marcel Voia 1 Myra Yazbeck 2 David Wensley 3 1 Department of Economics Carleton University 2 Department of Economics

Package JointModel. R topics documented: June 9, 2016. Title: Semiparametric Joint Models for Longitudinal and Counting Processes. Version 1.0

Package JointModel. R topics documented: June 9, Title Semiparametric Joint Models for Longitudinal and Counting Processes Version 1. Package JointModel June 9, 2016 Title Semiparametric Joint Models for Longitudinal and Counting Processes Version 1.0 Date 2016-06-01 Author Sehee Kim Maintainer Sehee Kim

STATISTICAL ANALYSIS OF MULTIVARIATE INTERVAL-CENSORED FAILURE TIME DATA. A Dissertation Presented to the Faculty of the Graduate School

STATISTICAL ANALYSIS OF MULTIVARIATE INTERVAL-CENSORED FAILURE TIME DATA. A Dissertation Presented. the Faculty of the Graduate School STATISTICAL ANALYSIS OF MULTIVARIATE INTERVAL-CENSORED FAILURE TIME DATA A Dissertation Presented to the Faculty of the Graduate School University of Missouri-Columbia In Partial Fulfillment Of the Requirements

Contents. Part I: Fundamentals of Bayesian Inference 1

Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian

Lecture 22 Survival Analysis: An Introduction

Lecture 22 Survival Analysis: An Introduction University of Illinois Department of Economics Spring 2017 Econ 574 Roger Koenker There is considerable interest among economists in models of durations, which

Robustifying Trial-Derived Treatment Rules to a Target Population

Robustifying Trial-Derived Treatment Rules to a Target Population Yingqi Zhao Public Health Sciences Division Fred Hutchinson Cancer Research Center Workshop on Perspectives and Analysis for Personalized

Analysis of competing risks data and simulation of data following predefined subdistribution hazards

Analysis of competing risks data and simulation of data following predefined subdistribution hazards Bernhard Haller Institut für Medizinische Statistik und Epidemiologie Technische Universität München 27.05.2013

Multistate models and recurrent event models

Multistate models and recurrent event models Multistate models Multistate models and recurrent event models Patrick Breheny December 10 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/22 Introduction Multistate models In this final lecture,

Model Selection in Bayesian Survival Analysis for a Multi-country Cluster Randomized Trial

Model Selection in Bayesian Survival Analysis for a Multi-country Cluster Randomized Trial Jin Kyung Park International Vaccine Institute Min Woo Chae Seoul National University R. Leon Ochiai International

Some New Methods for Latent Variable Models and Survival Analysis. Latent-Model Robustness in Structural Measurement Error Models.

Some New Methods for Latent Variable Models and Survival Analysis. Latent-Model Robustness in Structural Measurement Error Models. Some New Methods for Latent Variable Models and Survival Analysis Marie Davidian Department of Statistics North Carolina State University 1. Introduction Outline 3. Empirically checking latent-model robustness

On the Breslow estimator

On the Breslow estimator Lifetime Data Anal (27) 13:471 48 DOI 1.17/s1985-7-948-y On the Breslow estimator D. Y. Lin Received: 5 April 27 / Accepted: 16 July 27 / Published online: 2 September 27 Springer Science+Business Media,

VARIABLE SELECTION AND STATISTICAL LEARNING FOR CENSORED DATA. Xiaoxi Liu

VARIABLE SELECTION AND STATISTICAL LEARNING FOR CENSORED DATA. Xiaoxi Liu VARIABLE SELECTION AND STATISTICAL LEARNING FOR CENSORED DATA Xiaoxi Liu A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F(u; x) and that the binary
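The latent-variable construction sketched in this snippet can be checked by simulation. The logistic choice of F and every name below are assumptions for illustration only; the point is that thresholding a latent variable at x'β reproduces P(Y = 1 | x) = F(x'β).

```python
import numpy as np

# Latent-variable sketch for binary data (logistic F assumed for
# illustration): Y = 1{ x'beta + eps > 0 } with eps ~ standard logistic
# implies P(Y = 1 | x) = 1 / (1 + exp(-x'beta)).
rng = np.random.default_rng(0)
beta = np.array([0.5, -1.0])
x = np.array([1.0, 0.3])              # first entry plays the intercept role

eta = x @ beta                        # linear predictor x'beta
eps = rng.logistic(size=200_000)      # latent continuous errors
y = (eta + eps > 0).astype(int)       # observed binary responses

p_model = 1.0 / (1.0 + np.exp(-eta))  # logistic CDF evaluated at eta
p_empirical = y.mean()                # Monte Carlo check of the identity
```

Replacing the logistic errors with standard normal ones gives the probit version of the same construction.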

Multistate models and recurrent event models

Multistate models and recurrent event models and recurrent event models Patrick Breheny December 6 Patrick Breheny University of Iowa Survival Data Analysis (BIOS:7210) 1 / 22 Introduction In this final lecture, we will briefly look at two other

Statistical Inference and Methods

Statistical Inference and Methods Department of Mathematics Imperial College London d.stephens@imperial.ac.uk http://stats.ma.ic.ac.uk/ das01/ 31st January 2006 Part VI Session 6: Filtering and Time to Event Data

Tests of independence for censored bivariate failure time data

Tests of independence for censored bivariate failure time data Abstract Bivariate failure time data are widely used in survival analysis, for example, in twin studies. This article presents a class of χ

Statistical Methods for Alzheimer's Disease Studies

Statistical Methods for Alzheimer's Disease Studies Rebecca A. Betensky, Ph.D. Department of Biostatistics, Harvard T.H. Chan School of Public Health July 19, 2016 OUTLINE 1 Statistical collaborations

A TWO-STAGE LINEAR MIXED-EFFECTS/COX MODEL FOR LONGITUDINAL DATA WITH MEASUREMENT ERROR AND SURVIVAL

A TWO-STAGE LINEAR MIXED-EFFECTS/COX MODEL FOR LONGITUDINAL DATA WITH MEASUREMENT ERROR AND SURVIVAL Christopher H. Morrell, Loyola College in Maryland, and Larry J. Brant, NIA

Continuous Time Survival in Latent Variable Models

Continuous Time Survival in Latent Variable Models Tihomir Asparouhov 1, Katherine Masyn 2, Bengt Muthen 3; 1 Muthen & Muthen, 2 University of California, Davis, 3 University of California, Los Angeles. Abstract

θ(1), θ(2), ..., θ(n)

θ(1), θ(2), ..., θ(n) Likelihoods for Multivariate Binary Data Log-Linear Model We have 2^n − 1 distinct probabilities, but we wish to consider formulations that allow more parsimonious descriptions as a function of covariates.

Instrumental variables estimation in the Cox Proportional Hazard regression model

Instrumental variables estimation in the Cox Proportional Hazard regression model James O'Malley, Ph.D. Department of Biomedical Data Science The Dartmouth Institute for Health Policy and Clinical Practice

Now consider the case where E(Y) = µ = Xβ and V(Y) = σ²G, where G is diagonal, but unknown.

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown. Weighting We have seen that if E(Y) = Xβ and V (Y) = σ 2 G, where G is known, the model can be rewritten as a linear model. This is known as generalized least squares or, if G is diagonal, with trace(g)

Model Selection in GLMs. (should be able to implement frequentist GLM analyses!) Today: standard frequentist methods for model selection

Model Selection in GLMs Last class: estimability/identifiability, analysis of deviance, standard errors & confidence intervals (should be able to implement frequentist GLM analyses!) Today: standard frequentist

Analysis of Time-to-Event Data: Chapter 4 - Parametric regression models

Analysis of Time-to-Event Data: Chapter 4 - Parametric regression models Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 Right censored

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Jianqing Fan Department of Statistics Chinese University of Hong Kong AND Department of Statistics

Statistics 262: Intermediate Biostatistics Regression & Survival Analysis

Statistics 262: Intermediate Biostatistics Regression & Survival Analysis Jonathan Taylor & Kristin Cobb Introduction This course is an applied course,

A class of latent marginal models for capture-recapture data with continuous covariates

A class of latent marginal models for capture-recapture data with continuous covariates F Bartolucci A Forcina Università di Urbino Università di Perugia FrancescoBartolucci@uniurbit forcina@statunipgit

Statistics in medicine

Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective Anastasios (Butch) Tsiatis and Xiaofei Bai Department of Statistics North Carolina State University Optimal Treatment

Frailty Modeling for clustered survival data: a simulation study

Frailty Modeling for clustered survival data: a simulation study IAA Oslo 2015 Souad ROMDHANE LaREMFiQ - IHEC University of Sousse (Tunisia) souad_romdhane@yahoo.fr Lotfi BELKACEM LaREMFiQ - IHEC University

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

Analysis of Cure Rate Survival Data Under Proportional Odds Model

Analysis of Cure Rate Survival Data Under Proportional Odds Model Yu Gu 1, Debajyoti Sinha 1, and Sudipto Banerjee 2 1 Department of Statistics, Florida State University, Tallahassee, Florida 32310-5608,

Survival Distributions, Hazard Functions, Cumulative Hazards

BIO 244: Unit 1 Survival Distributions, Hazard Functions, Cumulative Hazards 1.1 Definitions: The goals of this unit are to introduce notation, discuss ways of probabilistically describing the distribution
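The quantities this unit defines are tied together by S(t) = exp(−Λ(t)) and h(t) = f(t)/S(t). A quick numerical check; the exponential distribution and all names below are illustrative assumptions:

```python
import numpy as np

# Check the identities h(t) = f(t) / S(t) and S(t) = exp(-Lambda(t))
# on an exponential distribution with rate lam (constant hazard).
lam = 0.7
t = np.linspace(0.1, 5.0, 50)

f = lam * np.exp(-lam * t)   # density f(t)
S = np.exp(-lam * t)         # survival function S(t)
h = f / S                    # hazard; equals lam for the exponential
Lam = lam * t                # cumulative hazard Lambda(t)

S_from_Lam = np.exp(-Lam)    # recovers S(t) from the cumulative hazard
```

The same identities hold for any absolutely continuous survival distribution, with Λ(t) obtained by integrating h.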

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

FULL LIKELIHOOD INFERENCES IN THE COX MODEL October 20, 2007 BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach

Parametric Joint Modelling for Longitudinal and Survival Data

Parametric Joint Modelling for Longitudinal and Survival Data Paraskevi Pericleous Doctor of Philosophy School of Computing Sciences University of East Anglia July 2016 © This copy of the thesis has been

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Melanie M. Wall, Bradley P. Carlin

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P. Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Melanie M. Wall, Bradley P. Carlin November 24, 2014 Outlines of the talk

Default Priors and Efficient Posterior Computation in Bayesian

Default Priors and Efficient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

Advanced Methodology Developments in Mixture Cure Models

Advanced Methodology Developments in Mixture Cure Models University of South Carolina Scholar Commons Theses and Dissertations 1-1-2013 Chao Cai University of South Carolina Follow this and additional

Robust estimates of state occupancy and transition probabilities for Non-Markov multi-state models

Robust estimates of state occupancy and transition probabilities for Non-Markov multi-state models 26 March 2014 Overview Continuously observed data Three-state illness-death General robust estimator Interval

Rejoinder. 1 Phase I and Phase II Profile Monitoring. Peihua Qiu 1, Changliang Zou 2 and Zhaojun Wang 2

Rejoinder. 1 Phase I and Phase II Profile Monitoring. Peihua Qiu 1, Changliang Zou 2 and Zhaojun Wang 2 Rejoinder Peihua Qiu 1, Changliang Zou 2 and Zhaojun Wang 2 1 School of Statistics, University of Minnesota 2 LPMC and Department of Statistics, Nankai University, China We thank the editor Professor David

Cure Rate Models with Nonparametric Form of Covariate Effects

Cure Rate Models with Nonparametric Form of Covariate Effects Tianlei Chen Dissertation submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the

Estimating subgroup specific treatment effects via concave fusion

Estimating subgroup specific treatment effects via concave fusion Jian Huang University of Iowa April 6, 2016 Outline 1 Motivation and the problem 2 The proposed model and approach Concave pairwise fusion

Lecture 12. Multivariate Survival Data. Statistics 255 - Survival Analysis. Presented March 8, 2016

Lecture 12. Multivariate Survival Data Statistics Survival Analysis. Presented March 8, 2016 Statistics 255 - Survival Analysis Presented March 8, 2016 Dan Gillen Department of Statistics University of California, Irvine 12.1 Examples Clustered or correlated survival times Disease onset in family
