Joint Modeling of Event Time and Nonignorable Missing Longitudinal Data


Lifetime Data Analysis, 8, 2002
© 2002 Kluwer Academic Publishers. Printed in The Netherlands.

JEAN-FRANÇOIS DUPUY  jean-francois.dupuy@univ-ubs.fr
Laboratoire de Statistiques Appliquées de l'Université de Bretagne-Sud (Sabres), Université de Bretagne-Sud, Vannes, France

MOUNIR MESBAH
Laboratoire de Statistiques Appliquées de l'Université de Bretagne-Sud (Sabres), Université de Bretagne-Sud, Vannes, France

Received October 5, 2000; Revised December 6, 2001; Accepted December 26, 2001

Abstract. Survival studies usually collect two kinds of information on each participant: the time until some terminal event, and repeated measures of a time-dependent covariate. Such a covariate is called an internal time-dependent covariate. Usually, some subjects drop out of the study before the terminal event of interest occurs. One may then wish to evaluate the relationship between time to dropout and the internal covariate. The Cox model is a standard framework for this purpose. Here, we address this problem when the value of the covariate at dropout is unobserved. We propose a joint model that combines a first-order Markov model for the longitudinally measured covariate with a time-dependent Cox model for the dropout process. We consider maximum likelihood estimation in this model and show how estimation can be carried out via the EM-algorithm. We argue that the proposed joint model may have applications to longitudinal data with nonignorable dropout. Indeed, it can be viewed as a generalization of Diggle and Kenward's model (1994) to situations where dropout may occur at any point in time and may be censored. We therefore apply both models and compare their results on a data set of longitudinal measurements from patients in a cancer clinical trial.

Keywords: time-dependent Cox model, missing covariate values, nonparametric likelihood, EM-algorithm, nonignorable dropout

1. Introduction

1.1. Preliminaries

Many survival studies collect longitudinal measures of covariates on each study subject until a terminal event occurs, such as death, infection or disease progression. Usually, some subjects drop out of the study before this event occurs. We are interested in modelling the relationship between time to dropout and a longitudinally measured covariate. The precise definition of dropout is given later in this introduction. Kalbfleisch and Prentice (1980) call such a time-dependent covariate an internal covariate. More precisely, they define an internal covariate as the output of a stochastic process generated by the individual under study. This stochastic process is observed only as long as the individual remains in the study. Let Z be an internal covariate and let $\bar{Z}(t)$ denote its history $\{Z(u), 0 \le u \le t\}$ up to time t. Let T denote the time to some event. Kalbfleisch and Prentice (1980) define the hazard of this event at time t by

$$\lambda(t \mid \bar{Z}(t)) = \lim_{dt \to 0} \Pr[t \le T < t + dt \mid \bar{Z}(t), T \ge t]/dt, \qquad (1)$$

which is conditional on the covariate process up to t.

Fitting a Cox model (Cox, 1972) with internal covariates raises specific problems. For example, several authors (Cox and Oakes, 1984; Altman and De Stavola, 1994) point out a difficulty in clinical trials comparing two treatments: including an internal covariate whose path is directly affected by treatment may mask the treatment effect. Other authors discuss estimation of probabilities of the form $\Pr[T \ge t + \Delta t \mid \bar{Z}(t)]$; note that $\Pr[T \ge t \mid \bar{Z}(t)] = 1$. Such estimation also requires care, since these probabilities depend on how Z develops between t and $t + \Delta t$ (Andersen et al., 1993; Altman and De Stavola, 1994). Other references on these issues include Kalbfleisch and Prentice (1980) and Collett (1994). Some new developments involving internal covariates in a Cox model have recently been proposed. They address measurement error in an internal covariate, and include the work of Tsiatis et al. (1995), Wulfsohn and Tsiatis (1997), Dafni and Tsiatis (1998), and Tsiatis and Davidian (2001).

We consider a survival study in which the values of an internal covariate Z are measured at discrete times until a terminal event occurs. Some sequences of measurements may end prematurely, that is, before this event occurs. This phenomenon, which truncates the sequence of observations of Z, is called dropout (Diggle and Kenward, 1994; Little, 1995; Scharfstein et al., 1999). The Cox model (Cox, 1972) provides a standard framework for evaluating the relationship between time to dropout and the time-dependent covariate Z.
For a given individual, we assume that the path of Z changes only at the times $t_j$ at which Z is measured, so Z is piecewise constant. We also assume that the value of Z on $[t_j, t_{j+1}[$ is observed at the end of that interval. Hence, Z is not observed at the time of dropout. Following suggestions of Altman and De Stavola (1994) and Collett (1994), one may nevertheless fit a Cox model to these data by replacing the unobserved value of Z at the event time with the last observed value. This approach is not appropriate, however, if Z may vary in the instants preceding dropout. In this paper, we propose a joint modelling approach for dropout time and longitudinal covariate data. This model accommodates possible changes in Z just prior to dropout. We also propose applying this model in the context of nonignorable dropout.

1.2. Application to Nonignorable Dropout

Following Little and Rubin (1987), three main dropout processes can be distinguished in longitudinal studies. A dropout process is completely random when dropout is independent of both the observed and the unobserved measurements of the longitudinal variable. Dropout is random when it is independent of the unobserved measurements but depends on the observed ones. Dropout is nonignorable when it depends on unobserved measurements. Suppose dropout is completely random or random. Suppose also that the measurement and dropout models share no parameters, and that there is no functional relationship between the parameters of the measurement process and those of the dropout process. Then the longitudinal measurement process can be ignored when making likelihood-based inferences about the time-to-dropout model. This property does not hold when dropout is nonignorable.

A number of methods have recently been proposed to accommodate nonignorable dropout in longitudinal data. Diggle and Kenward (1994) combined a multivariate linear model for the longitudinal process with a logistic regression model for dropout. This logistic model allows dropout to depend on the measurement that is missing at the dropout time. Molenberghs et al. (1997) adopted a similar approach for longitudinal ordinal data. These models belong to the class of outcome-based selection models (Little, 1995; Hogan and Laird, 1997b). In this class, the joint density of the repeated measures and dropout time is the conditional density of dropout time given the longitudinal outcomes, multiplied by the marginal density of those outcomes. In some situations, dropout is related to a trend over time rather than to the unobserved value of the covariate. One may then link dropout time to the longitudinal outcomes through the individual random effects used to model the longitudinal process. This yields random-coefficient-based selection models (Wu and Carroll, 1988; Schluchter, 1992; DeGruttola and Tu, 1994; Ribaudo et al., 2000). An alternative class of models for the joint distribution of repeated measures and dropout time is the pattern-mixture class (Little, 1995; Hogan and Laird, 1997a). This approach stratifies the sample by time of dropout and then models the distribution of the repeated measures within each stratum.
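A small simulation can make the three dropout mechanisms concrete. This is a sketch under our own assumptions and is not part of the paper. It uses a Gaussian first-order Markov outcome, anticipating Section 2, together with a per-visit logistic dropout rule; all numerical values are arbitrary. The function `observed_mean_bias` (a hypothetical name) returns the mean of the still-observed values at the last visit minus the mean over all simulated subjects.

```python
import numpy as np


def observed_mean_bias(mechanism, n=20000, n_visits=6, alpha=0.95, sigma=0.5, seed=0):
    """Mean of the still-observed values at the last visit minus the full-data mean."""
    rng = np.random.default_rng(seed)
    z = np.empty((n, n_visits))
    z[:, 0] = rng.normal(4.0, 1.0, size=n)
    for j in range(1, n_visits):                      # first-order Markov (AR(1)) path
        z[:, j] = alpha * z[:, j - 1] + rng.normal(0.0, sigma, size=n)

    in_study = np.ones(n, dtype=bool)
    for j in range(1, n_visits):
        if mechanism == "completely random":          # depends on nothing
            eta = np.full(n, -2.0)
        elif mechanism == "random":                   # depends on the last observed value
            eta = -2.0 - 0.8 * (z[:, j - 1] - 4.0)
        elif mechanism == "nonignorable":             # depends on the value that goes missing
            eta = -2.0 - 0.8 * (z[:, j] - 4.0)
        else:
            raise ValueError(f"unknown mechanism: {mechanism!r}")
        drops = rng.random(n) < 1.0 / (1.0 + np.exp(-eta))
        in_study &= ~drops                            # monotone dropout: once out, always out
    last = z[:, -1]
    return last[in_study].mean() - last.mean()


for m in ("completely random", "random", "nonignorable"):
    print(m, round(observed_mean_bias(m), 3))
```

Under completely random dropout the difference should be close to zero. Under random and nonignorable dropout the observed values shift upward, because low values drop out preferentially. The shift under random dropout does not contradict ignorability. Ignorability means likelihood-based inference remains valid; it does not mean that naive summaries of the observed data are unbiased.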
Detailed reviews of these selection and pattern-mixture approaches can be found in Little (1995), Hogan and Laird (1997b), and Verbeke and Molenberghs (2000). In our setting, the value of the internal covariate at dropout is unobserved. Hence, we may interpret dropout as nonignorable. Moreover, the joint model we propose falls naturally into the class of outcome-based selection models. It can be seen as a generalization of Diggle and Kenward's selection model (1994) to situations where dropout may occur at any point in time and may be censored. The selection and pattern-mixture models cited above assume that dropout occurs at one of the pre-specified measurement times of the longitudinal variable. One exception is Hogan and Laird (1997a), who proposed a mixture model that also allows censoring of the dropout times. Our work is motivated by a data set on the quality of life (QoL) of subjects in a cancer clinical trial. QoL values are measured up to disease progression and constitute the internal covariate of interest. Some subjects drop out before disease progression. Since we are interested in characterizing the relationship between dropout and the longitudinal covariate, we treat the dropout time as censored when disease progression occurs first. In Section 2, we define notation and derive the joint likelihood for the dropout time and the internal covariate Z, assuming a Markov model for the marginal distribution of this covariate. In Section 3, we show how the EM-algorithm (Dempster et al., 1977) can be applied to estimate the parameters of the proposed model. In Section 4, we apply our model to actual data and compare the results with those obtained using the model of Diggle and Kenward (1994). A discussion follows in Section 5.

2. The Model

2.1. Preliminaries

Let Z denote an internal covariate and let $Z_i(t)$ denote the value of Z at time t for the ith individual under study (i = 1, ..., n). Repeated measurements of Z are taken on each subject at common fixed times $t_j$, j = 1, 2, ..., with $t_0 = 0$. In what follows, we write $Z_{ij}$ for the response value of the ith subject on $[t_j, t_{j+1}[$. If $l \in [t_j, t_{j+1}[$, let $[l] = j - 1$. With this notation, $Z_{i[l]}$ denotes the response value on $[t_{j-1}, t_j[$, which is recorded at time $t_j$ on the ith subject. In other words, $Z_{i[l]}$ is the last recorded value of Z on the ith subject before time l. Let $\mathbf{Z}_i(l) = (Z_{i[l]}, Z_i(l))^T$, i = 1, ..., n. Let $T_i$ denote the time to dropout for the ith individual, and let $C_i$ denote the potential censoring time for that individual. We then observe $S_i = \min(T_i, C_i)$ and the censoring indicator $\Delta_i = 1_{\{T_i \le C_i\}}$, where $1_{\{A\}}$ is the indicator of an event A. The value of Z at $s_i$, denoted $z_i(s_i)$, is not observed.

The time-to-dropout model assumes that the hazard of dropout is related to the internal covariate Z through a time-dependent Cox model (Cox, 1972). More precisely, the hazard of dropout at time t depends on the covariate history up to t in two ways: through the last observed value before t, $Z_{i[t]}$, and through the current unobserved value $Z_i(t)$. It is defined by

$$\lambda(t \mid \bar{Z}_i(t)) = \lambda_0(t)\exp\{\beta^T \mathbf{Z}_i(t)\} = \lambda_0(t)\exp\{\beta_0 Z_{i[t]} + \beta_1 Z_i(t)\}, \qquad (2)$$

where $\lambda_0(\cdot)$ is an unspecified baseline hazard function and $\beta = (\beta_0, \beta_1)^T$ is a vector of unknown regression parameters. Observed covariates X, possibly time-dependent, may also be included by defining the hazard as $\lambda_0(t)\exp\{\beta^T \mathbf{Z}_i(t) + \gamma^T X_i(t)\}$. For simplicity, however, we do not include X in what follows.
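The index convention $[l] = j - 1$ is easy to get wrong by one. Here is a minimal sketch of it and of the hazard (2). It assumes the measurement times are stored in an increasing list `[t_0, t_1, ...]` and that `z_recorded[k]` holds $z_{ik}$, the value on $[t_k, t_{k+1}[$ recorded at $t_{k+1}$. The function names and the constant baseline in the check below are hypothetical.

```python
import bisect
import math


def last_index(l, times):
    """[l] = j - 1 for l in [t_j, t_{j+1}[, with times = [t_0, t_1, ...] increasing.

    Returns -1 when l lies in [t_0, t_1[, i.e. before any value has been recorded.
    """
    j = bisect.bisect_right(times, l) - 1    # largest j with t_j <= l
    return j - 1


def dropout_hazard(t, z_recorded, z_current, beta0, beta1, times, baseline):
    """Eq. (2): lambda_0(t) * exp(beta0 * Z_[t] + beta1 * Z(t))."""
    k = last_index(t, times)
    if k < 0:
        raise ValueError("no value of Z recorded before t")
    return baseline(t) * math.exp(beta0 * z_recorded[k] + beta1 * z_current)


def observe(T, C):
    """Observed time S = min(T, C) and indicator Delta = 1{T <= C}."""
    return min(T, C), int(T <= C)


times = [0.0, 1.0, 2.0, 3.0, 4.0]
print(last_index(2.5, times))   # 2.5 lies in [t_2, t_3[, so [2.5] = 2 - 1 = 1
```

`bisect_right` ensures that a time equal to a measurement time $t_j$ is placed in $[t_j, t_{j+1}[$, matching the half-open intervals in the text.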
For ease of exposition, we develop the model assuming that the hazard of dropout at time t depends on the observed longitudinal process before t only through the last observed value. This can be generalized to a more complex functional of the observed history of Z. Let $f(z_{i0}, \ldots, z_{ij})$ denote the joint density of the responses $Z_{i0}, \ldots, Z_{ij}$.

2.2. Likelihood Function

The observed data for each subject i are $y_i = (s_i, \delta_i, z_{i0}, \ldots, z_{i[s_i]})$. We assume that, conditional on $(z_0, \ldots, z(s))$, T and the censoring time C are independent. This corresponds to the usual assumption of independent censoring in survival studies. We also assume that the distribution of C depends neither on $\beta$, $\theta$ or $\lambda_0$, nor on z(s). This last hypothesis can be viewed as an analogue of the noninformative censoring hypothesis stated by Nielsen in the context of frailty models. Here it is motivated by the fact that censoring arises when disease progression occurs first. In that case, doctors remove the subject from the longitudinal study on the basis of clinical criteria that are independent of the current value of Z. In other situations, however, C might depend on z(s).

We obtain the likelihood for the observed data in two steps. First, we write the likelihood for $(y_i, z_i(s_i))$ as the likelihood for $(s_i, \delta_i)$ given $z_{i0}, \ldots, z_{i[s_i]}, z_i(s_i)$, multiplied by the marginal density of $(Z_0, \ldots, Z_{[s]}, Z(s))$. We then integrate over the variable Z(s). This conditioning arises naturally from the definition (1) of the hazard function with internal covariates (Kalbfleisch and Prentice, 1980). Discarding the terms that pertain to censoring, the contribution of a single observation $y_i$ to a partial likelihood can be written as

$$L_i(\beta, \theta, \lambda_0(\cdot)) = \int \lambda_0(s_i)^{\delta_i} \exp\{\delta_i \beta^T \mathbf{z}_i(s_i)\} \exp\Big\{-\int_0^{s_i} \lambda_0(u)\, e^{\beta^T \mathbf{z}_i(u)}\, du\Big\}\, f(z_{i0}, \ldots, z_i(s_i))\, dz_i(s_i). \qquad (3)$$

However, the maximum of this likelihood does not exist when $\lambda_0$ ranges over the space of positive functions on $[0, +\infty)$. We therefore modify the likelihood by constraining the cumulative baseline hazard $\int_0^t \lambda_0(u)\,du$ to be a step function. This step function takes positive jumps $\lambda_1, \ldots, \lambda_p$ at the distinct observed dropout times $u_1 < u_2 < \cdots < u_p$. When no covariate values are missing, this approach yields the usual maximum partial likelihood estimator of $\beta$ and the Breslow estimator of $\int_0^t \lambda_0(u)\,du$. Similar approaches have been proposed for frailty models (Nielsen et al., 1992), for Cox regression with measurement error in the covariate (Wulfsohn and Tsiatis, 1997), and for Cox regression with missing covariates (Martinussen, 1999). The resulting likelihood is usually called a nonparametric likelihood, and maximizing it yields the so-called nonparametric maximum likelihood estimators. Only the representation of the baseline hazard is actually nonparametric. In our case, a proof of consistency of the nonparametric maximum likelihood estimators by Dupuy et al. (2001) suggests that the proposed method is asymptotically valid. Work on a proof of asymptotic normality of the estimators is still in progress.
It relies on techniques from empirical process theory, which Murphy (1995) used to establish the asymptotic theory for the frailty model. Let $\omega = (\beta, \lambda_1, \ldots, \lambda_p, \theta)$ be the vector of all parameters of the joint model. Using Eq. (3), a nonparametric likelihood function is obtained by multiplying the following contributions over subjects:

$$L_i(\omega) = \int \Big[\prod_{l=1}^{p} \lambda_l^{\delta_i 1_{\{u_l = s_i\}}}\Big] \exp\{\delta_i \beta^T \mathbf{z}_i(s_i)\} \exp\Big[-\sum_{l=1}^{p} \lambda_l\, e^{\beta^T \mathbf{z}_i(u_l)}\, 1_{\{u_l \le s_i\}}\Big] f(z_{i0}, \ldots, z_i(s_i))\, dz_i(s_i). \qquad (4)$$

If no information were missing, a full likelihood for $\beta, \lambda_1, \ldots, \lambda_p, \theta$ would factorize into two components. One component, for the dropout process, involves $\beta$ and $\lambda_1, \ldots, \lambda_p$; the other, for the longitudinal measurement process, involves $\theta$. Estimates of $(\beta, \lambda_1, \ldots, \lambda_p)$ and $\theta$ would then be obtained by maximizing these two components separately. We show later that, in our case, the EM-algorithm (Dempster et al., 1977) achieves a similar partition of the likelihood function.

Finally, we specify a particular form for the longitudinal variable by assuming a first-order Markov model. In a first-order Markov model,

$$f(z_{ij} \mid z_{i,j-1}, \ldots, z_{i0}) = f(z_{ij} \mid z_{i,j-1}).$$

It follows that the joint density of the responses $Z_{i0}, \ldots, Z_{ik}$ from subject i can be written as

$$f(z_{i0}, \ldots, z_{ik}) = f(z_{i0}) \prod_{j=1}^{k} f(z_{ij} \mid z_{i,j-1}).$$

The distribution of $Z_{ij}$ is conditioned on the previous response $z_{i,j-1}$, which is treated as an explanatory variable once it has been observed. No such explanatory variable is available for the measurement obtained at time $t_1$. We therefore drop that response from the model, except as a fixed value on which $Z_{i1}$ is conditioned. We further assume that $f(z_{ij} \mid z_{i,j-1})$ is a Gaussian density with mean $\alpha z_{i,j-1}$ and variance $\sigma^2$ (here $\theta = (\alpha, \sigma^2)$).

3. Parameter Estimation Using the EM-Algorithm

3.1. The EM-Algorithm

Inference for this joint model is conducted using the EM-algorithm (Dempster et al., 1977). EM alternates between two steps. The E-step computes the expected complete-data log-likelihood, conditional on the observed data and the current parameter estimates. The M-step updates the parameter estimates by maximizing this expected log-likelihood. We take $(y_i, z_i(s_i))$ as the complete data for the ith subject. At the (m + 1)th iteration of the EM-algorithm, the E-step computes the conditional expectation $E(\log L_c(\omega) \mid y_i, i = 1, \ldots, n; \omega^{(m)})$ of the complete-data log-likelihood given the observed data $y_i$, for the fixed set of parameter estimates $\omega^{(m)}$. To simplify notation, we write $E^{(m)}(\cdot)$ for $E(\cdot \mid y_i, i = 1, \ldots, n; \omega^{(m)})$.
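Let $Y_i(u) = 1_{\{t_{[s_i]+1} \le u \le s_i\}}$ and $W_i(u) = 1_{\{u < t_{[s_i]+1}\}}$. Then $E^{(m)}(\log L_c(\omega))$ is given by

$$E^{(m)}(\log L_c(\omega)) = \sum_{i=1}^{n} \Big[ \sum_{l=1}^{p} \delta_i 1_{\{u_l = s_i\}} \log \lambda_l - \sum_{l=1}^{p} \lambda_l\, e^{\beta^T \mathbf{z}_i(u_l)}\, W_i(u_l) + \log f(z_{i0}, \ldots, z_{i[s_i]}) + \delta_i E^{(m)}\{\beta^T \mathbf{Z}_i(s_i)\} - \sum_{l=1}^{p} \lambda_l\, E^{(m)}\{e^{\beta^T \mathbf{Z}_i(u_l)}\}\, Y_i(u_l) + E^{(m)}\{\log f(Z_i(s_i) \mid z_{i[s_i]})\} \Big].$$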
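One iteration of this E-step, together with the M-step described next, can be sketched in code. The authors' closed-form formulas are in their appendix, which is not reproduced here. The updates below are our own derivation from the expected log-likelihood above, assuming the Gaussian first-order Markov model of Section 2. The E-step uses Gauss-Hermite quadrature, the technique the text cites. In the M-step, the jumps are held at their current values while Newton-Raphson updates $\beta$, and are then updated in closed form; this ordering is a conditional-maximization variant. All function names, argument layouts, and the 20-node rule are illustrative assumptions.

```python
import numpy as np

# Gauss-Hermite rule for integrals of the form  integral exp(-t^2) h(t) dt
GH_T, GH_W = np.polynomial.hermite.hermgauss(20)


def e_step_nodes(z_last, delta, cum_jump, alpha, sigma2, beta0, beta1):
    """Nodes x_k for the unobserved Z_i(s_i) and their conditional weights.

    z_last   : last recorded value z_{i[s_i]}
    delta    : dropout indicator delta_i
    cum_jump : sum of the jumps lambda_l over the u_l with Y_i(u_l) = 1
    Any E(phi(Z_i(s_i)) | data) is then approximated by  w @ phi(x).
    """
    # Markov prior N(alpha * z_last, sigma2), mapped onto the Hermite nodes
    x = alpha * z_last + np.sqrt(2.0 * sigma2) * GH_T
    # terms of the complete-data log-likelihood that involve Z_i(s_i)
    log_lik = delta * beta1 * x - cum_jump * np.exp(beta0 * z_last + beta1 * x)
    log_w = np.log(GH_W) + log_lik
    w = np.exp(log_w - log_w.max())          # rescale before normalising
    return x, w / w.sum()


def update_theta(prev, cur, z_last, ez, ez2):
    """M-step a): closed-form (alpha, sigma2) for the Gaussian Markov model.

    prev, cur       : observed transitions (z_{i,j-1}, z_ij), pooled over subjects
    z_last, ez, ez2 : per subject z_{i[s_i]}, E Z_i(s_i), E Z_i(s_i)^2
    """
    alpha = (prev @ cur + z_last @ ez) / (prev @ prev + z_last @ z_last)
    rss = np.sum((cur - alpha * prev) ** 2)
    rss += np.sum(ez2 - 2.0 * alpha * z_last * ez + alpha**2 * z_last**2)
    return alpha, rss / (cur.size + z_last.size)


def update_beta_lambda(beta, lam, d, V, a, jump_idx, g, max_iter=50, tol=1e-10):
    """M-steps b) and c): Newton-Raphson for beta with the jumps held at lam,
    then the closed-form jump update lambda_l = d_l / sum of a_m exp(beta'v_m).

    Each row m of V is one term lambda_l * exp(beta'v_m) of the expected
    log-likelihood: v_m = (z_[u_l], z(u_l)) with a_m = 1 when W_i(u_l) = 1,
    or v_m = (z_{i[s_i]}, x_k) with a_m = w_k when Y_i(u_l) = 1.
    jump_idx[m] = l, d[l] = number of dropouts at u_l,
    g = sum_i delta_i * E(z_{i[s_i]}, Z_i(s_i)).
    """
    beta = np.asarray(beta, dtype=float).copy()
    lam_a = lam[jump_idx] * a

    def q(b):                                # beta-part of E^(m) log L_c
        return g @ b - np.sum(lam_a * np.exp(V @ b))

    for _ in range(max_iter):
        c = lam_a * np.exp(V @ beta)
        score = g - V.T @ c                  # U^(m)(beta)
        info = (V * c[:, None]).T @ V        # I^(m)(beta), the negative Hessian
        step = np.linalg.solve(info, score)
        t = 1.0
        while q(beta + t * step) < q(beta) and t > 1e-8:   # step-halving safeguard
            t /= 2.0
        beta = beta + t * step
        if np.max(np.abs(step)) < tol:
            break
    exposure = np.bincount(jump_idx, weights=a * np.exp(V @ beta), minlength=lam.size)
    return beta, d / exposure
```

With no dropout information (`delta = 0`, `cum_jump = 0`), the weights reduce to the Markov prior, so `w @ x` equals $\alpha z_{i[s_i]}$. An observed dropout with $\beta_1 < 0$ pulls the conditional mean of $Z_i(s_i)$ downward.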

$E^{(m)}(\log L_c(\omega))$ separates into two components: one involves $\beta$ and $\lambda_1, \ldots, \lambda_p$, and the other involves $\theta$. In the M-step, we solve $\partial E^{(m)}(\log L_c(\omega))/\partial\omega = 0$, where $\omega = (\beta_0, \beta_1, \lambda_1, \ldots, \lambda_p, \alpha, \sigma^2)^T$. This yields updated estimates $\beta^{(m+1)}, \lambda_1^{(m+1)}, \ldots, \lambda_p^{(m+1)}, \theta^{(m+1)}$. The M-step proceeds as follows:

a) We first maximize $E^{(m)}(\log L_c(\omega))$ with respect to $\alpha$ and $\sigma^2$ to obtain the updated estimates $\alpha^{(m+1)}$ and $\sigma^{2(m+1)}$.
b) There is no closed-form update for $\beta$, but the updated estimate can be approximated with the Newton-Raphson algorithm. At the (m + 1)th iteration of the EM-algorithm, the (d + 1)th Newton-Raphson iteration is
$$\beta^{(d+1)} = \beta^{(d)} + \{I^{(m)}(\beta^{(d)})\}^{-1}\, U^{(m)}(\beta^{(d)}),$$
where $U^{(m)}(\cdot)$ and $I^{(m)}(\cdot)$ are the score and the information for $\beta$, respectively.
c) Finally, we update $\lambda_l^{(m+1)}$ (l = 1, ..., p).

The EM-algorithm requires conditional expectations of the form $E^{(m)}(\phi(Z_i(s_i)))$. Because Z is continuous, these expectations require numerical integration. Since Z is normally distributed, we evaluate them using Gauss-Hermite quadrature (Crouch and Spiegelman, 1990). The appendix gives closed-form formulas for $\alpha^{(m+1)}$, $\sigma^{2(m+1)}$ and $\lambda_l^{(m+1)}$. It also gives expressions for $U^{(m)}(\beta^{(d)})$ and $I^{(m)}(\beta^{(d)})$, and for the evaluation of $E^{(m)}(\phi(Z_i(s_i)))$.

3.2. Estimation of Standard Errors

We derived maximum likelihood estimators of the parameters under the assumption that the cumulative baseline hazard is a step function with jumps at the distinct dropout times. A convenient consequence is that likelihood-based methods can be used to estimate the asymptotic variance of the estimators. The asymptotic variance of maximum likelihood estimators is usually estimated by the inverse of the observed Fisher information matrix. With missing data, this matrix may be obtained using the formulas of Louis (1982).
For nonparametric likelihood estimation, however, the observed-data information matrix is given by

$$-\sum_{i=1}^{n} \frac{\partial^2 \log L_i(\omega)}{\partial \omega\, \partial \omega^T},$$

evaluated at the maximum likelihood estimate $\hat\omega_n$ of the parameters. Estimating the asymptotic variances then requires inverting a high-dimensional matrix, whose dimension grows with the number p of distinct dropout times. Martinussen (1999) nevertheless uses this approach in his work on Cox regression with incomplete covariate measurements. An alternative is to estimate the asymptotic variance by inverting the negative second-order derivative of a profile log-likelihood. We illustrate this approach with the asymptotic variance of $\hat\beta_n$. The profile log-likelihood for $\beta$ is

$$\log L_P(\beta) = \log\Big(\prod_{i=1}^{n} L_i(\beta, \hat\eta_\beta)\Big),$$

where $\hat\eta_\beta = (\hat\alpha, \hat\lambda_1, \ldots, \hat\lambda_p, \hat\sigma^2)$ maximizes the log-likelihood $\log(\prod_{i=1}^{n} L_i(\omega))$ for fixed $\beta$. Gourieroux and Monfort (1996) show that

$$\Big(-\frac{\partial^2 \log L_P(\hat\beta_n)}{\partial \beta^2}\Big)^{-1} = \Big[\Big(-\frac{\partial^2 \sum_{i=1}^{n} \log L_i(\hat\omega_n)}{\partial \omega\, \partial \omega^T}\Big)^{-1}\Big]_{\beta\beta},$$

where $[\cdot]_{\beta\beta}$ denotes the sub-matrix of $[\cdot]$ associated with the parameter $\beta$. We can therefore estimate the asymptotic variance of $\hat\beta_n$ by $[-\partial^2 \log L_P(\hat\beta_n)/\partial \beta^2]^{-1}$. Since analytical differentiation of $\log L_P$ may be cumbersome, we use numerical differentiation instead. In particular, we approximate $\partial^2 \log L_P(\hat\beta_n)/\partial \beta^2$ by the following central difference (Nocedal and Wright, 1999):

$$\frac{\log L_P(\hat\beta_n - \epsilon) - 2\log L_P(\hat\beta_n) + \log L_P(\hat\beta_n + \epsilon)}{\epsilon^2},$$

where $\epsilon$ is a small perturbation. For a given value of $\beta$, $\hat\eta_\beta$ is computed with the EM-algorithm of Section 3.1, keeping $\beta$ fixed. Similar procedures give estimates of the asymptotic variances of the maximum likelihood estimators of the other parameters.

4. Example

To illustrate the proposed joint model, we now analyze data from a cancer clinical trial. We also explain how this model can be used in longitudinal studies with dropout to distinguish between random and nonignorable dropout. We then compare our results with those obtained by fitting the model of Diggle and Kenward (1994), hereafter DK, to the same data. The study involved 120 patients who were treated with a new therapy against cancer. One objective of the trial was to evaluate the effect of this therapy on overall QoL, measured by a questionnaire. The questionnaire produced a quantitative score between 0 and 7, with a higher score indicating better overall QoL. Each patient was asked to fill out the questionnaire regularly until disease progression. The number of scores per patient ranges from 1 to 13. Table 1 describes the data and may be read as follows. Of the 29 patients who reported 2 QoL measurements, 27 left the longitudinal study because of dropout. The dropout times of the other 2 patients were censored, because they left the longitudinal study following disease progression.
Disease progression occurred in 31 of the 120 patients, and the remaining subjects dropped out of the study. Because of the study design, the QoL score at dropout or at disease progression was unobserved. A simple way to fit a Cox model to these data would be to set the value of Z at the dropout time equal to the last observed value of Z. This amounts to assuming that $Z_i(t) = Z_{i[t]}$, so that the hazard reduces to $\lambda(t \mid \bar{Z}_i(t)) = \lambda_0(t)\exp(\beta_0 Z_{i[t]})$. In the context of longitudinal studies with dropout, this approach may be viewed as a random dropout analysis. Gender and a measure of severity of illness at baseline were also recorded for each patient. Neither covariate had a significant effect when included in the model, so we do not include them in the following analysis.

We then fit the model described in Section 2. Computations were carried out in the programming language Scilab (Scilab Group, 1998) (the interested reader may obtain this program at the following address: ...). The EM-algorithm was considered to have converged, and was stopped, when the increment in the log-likelihood fell below a prespecified threshold. The starting values were the parameter estimates from the random dropout analysis. About 20 iterations were needed to reach convergence. Table 2 displays the EM-algorithm estimates with their standard errors, together with the log-likelihood evaluated at the maximum likelihood estimate $\hat\omega_n$ of $\omega$.

The estimated hazard function from the first analysis is $\hat\lambda_0(t)\exp(-0.167\,Z_{[t]})$. The negative value of the regression parameter $\beta_0$ implies that individuals with low QoL are more likely to drop out. This is to be expected, since high values of Z indicate better overall QoL: a patient who feels well should be more likely to complete the QoL questionnaire.

Table 1. Numbers of scores and dropouts among the 120 patients included in the analysis. Columns: number of completed QoL assessments; number of patients; number of dropouts; number of censoring events.

Table 2. Maximum likelihood estimates (standard errors) from the random dropout and the nonignorable dropout analyses.

MLE | Random dropout | Nonignorable dropout
$\hat\beta_{0n}$ | −0.167 (0.078) | 0.089 (0.080)
$\hat\beta_{1n}$ | (not in model) | −0.316 (0.087)
$\hat\alpha_n$ | (0.008) | (0.009)
$\hat\sigma^2_n$ | (0.039) | (0.040)
log-likelihood | |
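The standard errors in Table 2 rest on the central-difference approximation of Section 3.2. Here is a minimal sketch of that step for a scalar parameter. In the paper, each evaluation of $\log L_P(b)$ means rerunning the EM-algorithm with $\beta$ fixed at b. In this sketch, a closed-form toy log-likelihood with a known answer stands in so that the code runs. The default perturbation follows a common rule of thumb for second differences, $\epsilon \approx (\text{machine epsilon})^{1/4}$; it is our assumption, not a value from the paper.

```python
import math


def profile_se(profile_loglik, beta_hat, eps=None):
    """Standard error of a scalar MLE from the curvature of its profile
    log-likelihood, using the central difference
        [l(b - eps) - 2 l(b) + l(b + eps)] / eps^2.
    """
    if eps is None:
        # rule of thumb for second differences: eps ~ (machine epsilon)^(1/4)
        eps = 2.2e-16 ** 0.25 * max(1.0, abs(beta_hat))
    f0 = profile_loglik(beta_hat)
    d2 = (profile_loglik(beta_hat - eps) - 2.0 * f0 + profile_loglik(beta_hat + eps)) / eps**2
    if d2 >= 0.0:
        raise ValueError("profile log-likelihood is not locally concave at beta_hat")
    return math.sqrt(-1.0 / d2)


# Check on a case with a known answer: a normal log-likelihood in its mean with
# sigma = 2 and n = 100, whose standard error is sigma / sqrt(n) = 0.2.
def toy_loglik(mu, xbar=1.0, sigma=2.0, n=100):
    return -n * (xbar - mu) ** 2 / (2.0 * sigma**2)


print(round(profile_se(toy_loglik, 1.0), 6))   # prints 0.2
```

A perturbation that is too small amplifies rounding error in the three log-likelihood values. This matters more when each value comes from an EM run stopped at a finite tolerance.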

The estimated hazard from the joint modelling approach is $\hat\lambda_0(t)\exp(0.089\,Z_{[t]} - 0.316\,Z(t))$. As suggested by several authors (Diggle and Kenward, 1994; Verbeke and Molenberghs, 2000), this model is easier to interpret when the hazard is rewritten as a function of the increment $Z(t) - Z_{[t]}$ and the level $Z_{[t]}$ of the outcome variable:

$$\hat\lambda_0(t)\exp\{-0.316\,(Z(t) - Z_{[t]}) - 0.227\,Z_{[t]}\}. \qquad (5)$$

The value −0.227 confirms that low levels of QoL are associated with higher rates of dropout. Moreover, the proposed joint model suggests that the hazard of dropout is associated with a change in the level of Z. More precisely, the hazard of dropout increases when QoL decreases. Finally, the conditional density of $Z_{ij}$ given $z_{i,j-1}$ is Gaussian with estimated mean $\hat\alpha_n z_{i,j-1}$, which suggests an overall decrease in QoL over time, and estimated variance $\hat\sigma^2_n$.

The proposed joint model may prove useful for distinguishing between random and nonignorable dropout. For our data, for example, one may suspect that dropout is nonignorable, since QoL was measured with a questionnaire filled out by each patient. A patient who feels poorly is then likely not to complete the questionnaire, which would make dropout nonignorable. A rigorous test of nonignorable versus random dropout is not yet available. Even so, Table 2, where $\hat\beta_{1n} = -0.316$ with a standard error of 0.087, supports this view.

The proposed joint model accommodates censoring and continuous dropout times. In what follows, we refer to these two features of our data as the JMAC (Joint Model's Application Conditions). The DK model covers less general situations, since it does not accommodate the JMAC. However, it can be fitted to our data by relaxing these conditions as follows.
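The rewriting behind (5) is a one-line identity. Applied to the estimates quoted above, it also gives hazard ratios that are easier to read. The two ratios below are our own arithmetic on the reported estimates, shown for illustration. The first compares a one-unit drop in QoL since the last assessment with no change, at the same last observed level z. The second compares two individuals whose QoL did not change, one of them one unit lower.

```latex
\begin{aligned}
\beta_0 Z_{[t]} + \beta_1 Z(t)
  &= \beta_1\,\{Z(t) - Z_{[t]}\} + (\beta_0 + \beta_1)\, Z_{[t]},\\
\hat\beta_{0n} + \hat\beta_{1n} &= 0.089 - 0.316 = -0.227,\\
\frac{\lambda\big(t \mid Z(t) - Z_{[t]} = -1,\ Z_{[t]} = z\big)}
     {\lambda\big(t \mid Z(t) - Z_{[t]} = 0,\ Z_{[t]} = z\big)}
  &= e^{0.316} \approx 1.37,\\
\frac{\lambda\big(t \mid Z(t) = Z_{[t]} = z - 1\big)}
     {\lambda\big(t \mid Z(t) = Z_{[t]} = z\big)}
  &= e^{0.227} \approx 1.25.
\end{aligned}
```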
In line with Diggle and Kenward (1994), we assumed that the probability of dropout at time $t_j$ ($j \ge 2$), given that the subject was still under study at time $t_{j-1}$, follows a logistic regression model:

$$\operatorname{logit} \Pr[T_i = t_j \mid T_i \ge t_j, z_{i0}, \ldots] = \psi_0 + \psi_1 z_{i,j-1} + \psi_2 z_{ij}.$$

When this model is applied to our data, $T_i$ denotes the time of the first missing measurement following dropout, and $z_{ij}$ denotes the unobserved measurement of Z at $t_j$. Subjects who reached disease progression were also treated as nonignorable dropouts, since the value of Z at that event was unobserved. We fit the DK model to our data using the PCMID function of the OSWALD suite for S-Plus (Smith, 1997). This led to the following fitted model:

$$\operatorname{logit} \Pr[T_i = t_j \mid T_i \ge t_j, z_{i0}, \ldots] = \hat\psi_0 - 0.036\,z_{i,j-1} - 0.170\,z_{ij} = \hat\psi_0 - 0.170\,(z_{ij} - z_{i,j-1}) - 0.206\,z_{i,j-1}. \qquad (6)$$

From this model, we also conclude that subjects with lower QoL, and subjects whose QoL decreased, were more likely to drop out. We now compare our results with those of the DK model. In particular, we want to evaluate the effect of ignoring the JMAC when analysing our data with the DK method. The following approach may give some insight into this question. Directly comparing the parameter estimates of the fitted models (5) and (6) may be misleading, since the two models rest on different modelling choices. We therefore compare instead the estimated relative risks of dropout between two individuals under each fitted model. Consider two individuals with the same last observed value z of Z. For the first individual, Z stays at z over time. For the second, Z changes by an amount denoted incr. Using (5), the relative risk of dropout between these individuals is $\exp(-0.316\,\mathrm{incr})$. Using (6), it is

$$\exp(\hat\psi_2\,\mathrm{incr})\,\frac{1 + \exp\{\hat\psi_0 + (\hat\psi_1 + \hat\psi_2)\, z\}}{1 + \exp\{\hat\psi_0 + \hat\psi_2\,\mathrm{incr} + (\hat\psi_1 + \hat\psi_2)\, z\}},$$

with $\hat\psi_2 = -0.170$ and $\hat\psi_1 + \hat\psi_2 = -0.206$. Figure 1 plots the estimated relative risks from these two models as functions of incr, as solid and dotted lines respectively. The DK model was fit under the conditions of discrete dropout times and no censoring. We also fit the proposed joint model under these conditions. This again gave a relative risk of the form $\exp(\hat\beta_1\,\mathrm{incr})$, with $\hat\beta_1$ re-estimated under these conditions, and its curve in Figure 1 lies close to the DK curve. Both the DK model and our model fitted under these conditions appear to underestimate the relative risk of dropout when incr is negative, compared with the joint model fitted under the JMAC. When incr is positive, the two models agree equally well with it. It is then of interest to investigate the effect of ignoring each of the JMAC in turn. We therefore fit the proposed joint model to our data twice: first ignoring censoring, then ignoring the continuous nature of the dropout process. In each case the estimated relative risk again took the form $\exp(\hat\beta_1\,\mathrm{incr})$, with $\hat\beta_1$ re-estimated. Figure 1 compares these results with those obtained by fitting the joint model to the full information in the data. The results obtained by ignoring each of the JMAC in turn are similar to each other. Again, they underestimate the relative risk of dropout for negative values of incr.
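Here is a minimal sketch of the two relative-risk formulas. It writes the DK model as $\operatorname{logit} = \psi_0 + \psi_1 z_{i,j-1} + \psi_2 z_{ij}$, and the coefficient defaults are the estimates quoted in (5) and (6). The intercept $\hat\psi_0$ is not reported in the text, and neither is the value of z used for Figure 1. The values `psi0 = -2.0` and `z = 4.0` below are therefore hypothetical, and the sketch does not reproduce Figure 1.

```python
import math


def rr_joint_model(incr, beta1=-0.316):
    """Hazard ratio from the joint model (5): exp(beta1 * incr)."""
    return math.exp(beta1 * incr)


def rr_dk(incr, z, psi0, psi2=-0.170, psi12=-0.206):
    """Relative risk of dropout at t_j under the DK logistic model (6), for an
    individual whose Z changes by incr versus one whose Z stays at z.

    psi12 = psi1 + psi2 is the coefficient of the level; psi0 is the
    intercept, which must be supplied.
    """
    eta_const = psi0 + psi12 * z            # linear predictor, Z constant
    eta_change = eta_const + psi2 * incr    # linear predictor, Z changed by incr
    p_const = 1.0 / (1.0 + math.exp(-eta_const))
    p_change = 1.0 / (1.0 + math.exp(-eta_change))
    return p_change / p_const


# psi0 = -2.0 and z = 4.0 are hypothetical values used only for illustration.
for incr in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(incr, round(rr_joint_model(incr), 3), round(rr_dk(incr, 4.0, -2.0), 3))
```

For negative incr, the DK relative risk stays below $\exp(\hat\psi_2\,\mathrm{incr})$, because a dropout probability cannot exceed 1. It approaches that value only when dropout is rare (very negative $\psi_0$). This is one concrete reason why a hazard ratio and a relative risk of probabilities should be compared with care.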
When the continuous nature of dropout is ignored, underestimation may occur because a dropout is placed at $t_j$ when it actually lies between $t_{j-1}$ and $t_j$. This convention lengthens the apparent follow-up, which dampens the strong prognostic value of a decrease in QoL for dropout. Similarly, treating censoring events as dropouts attenuates the impact of a decrease in QoL on dropout, because censoring events are not preceded by a decline in QoL.

Figure 1. Relative risk of dropout vs incr for the joint model (JM) fitted under JMAC and by relaxing each of the JMAC in turn, and for the DK model. Curves: joint model (JM); JM with no censoring; JM with discrete dropout times; JM with discrete dropout times and no censoring; Diggle and Kenward's model. Axes: incr (horizontal) and relative risk of dropout (vertical).

5. Discussion

In this paper, we have proposed a new approach to Cox regression with an internal time-dependent covariate whose value at the event time is not observed; here the event is dropout. The approach jointly models the longitudinal covariate and the hazard process. We have shown that the resulting likelihood can be maximized with an EM-algorithm, and we fit the proposed joint model to data from a cancer clinical trial. We suggested that model (2) may be useful for longitudinal data with nonignorable dropout. We also compared our approach with the DK model. The DK model assumes discrete dropout times and does not accommodate censoring. Under these conditions, the two models gave similar results, and both appeared to underestimate the risk of dropout for individuals whose longitudinal outcome decreases. To compare the two models, we used the hazard ratio from our model and a relative risk calculated from the DK model. This relative risk is conditional on the individuals not having dropped out yet. In general, however, hazard ratios and relative risks are different quantities that should be compared with care. More work on how to compare the results of the two models is still needed.

The growing recognition that longitudinal studies need models accommodating missing values makes model checking an important issue. Yet users of such models rarely perform goodness-of-fit analyses, despite being aware that misspecification can harm statistical inference. The following suggestions may be viewed as preliminary attempts to validate the proposed model. Whenever possible, one may rely on a double sampling scheme (Mesbah et al., 1992). In such a scheme, the QoL value at dropout is obtained for a subsample of the study population, and a validation study is conducted on this subsample. This may also raise difficulties, such as the size and choice of the subsample. An alternative would be goodness-of-fit tests of the model on the entire data set, including the unobserved values. Missing values, however, rule out such an approach. We might instead assess the validity of the marginal model (3), obtained by integrating over the missing quantities. This, however, does not validate the assumption about the distribution of the missing components given the observed data.
Validating this marginal model is important in order to draw reliable conclusions from the parameter estimates obtained by maximizing the integrated likelihood (4). One may also ask whether the estimates are robust to misspecification of the model assumptions. Various authors have discussed robustness in selection models, including Little (1995), Hogan and Laird (1997b), Molenberghs et al. (1997), and some of the discussants of the papers by Diggle and Kenward (1994) and Scharfstein et al. (1999). To date, very little work has been done on methods that investigate the sensitivity of selection-modelling results to the model assumptions (for a review, see Verbeke and Molenberghs (2000), who also adapt the DK model to a form tractable for sensitivity analysis (Ch. 19)). Hence, further work is needed to develop such a methodology for the model we propose. Some work has already been done for pattern-mixture models (a recent overview is given by Verbeke and Molenberghs (2000), Ch. 20). However, these models require large numbers of dropouts per dropout pattern to reliably estimate their usually large number of parameters. One feature of our data was the continuous nature of dropout, and Figure 1 suggests that this characteristic must be taken into account when it is present in the data. A problem with pattern-mixture models is that they cannot accommodate this feature.

Identification of parameters is another important issue for models of nonignorable dropout. It has recently been discussed by Scharfstein et al. (1999), although in a context different from that of longitudinal studies. There, the authors consider a study designed to end at a fixed time T, at which an outcome of interest Y is measured on each individual. Letting V(t) be some time-dependent covariate, they propose to model the hazard of dropout by $\lambda(t \mid V(t), Y) = \lambda_0(t \mid V(t)) \exp(\alpha_0 Y)$, thus introducing nonignorable dropout through the term $\exp(\alpha_0 Y)$. In the absence of further knowledge about Y and V(t), Scharfstein et al. (1999) make no assumption about the joint distribution of the observed V(t) and the potentially unobserved Y, and they show that the parameter $\alpha_0$ cannot be identified. The approach we take when applying model (2) to longitudinal data with nonignorable dropout is different, for the following reason: when interest lies in longitudinal trajectories rather than in a single measurement of an outcome, it may not be unreasonable to make certain model assumptions about the longitudinal process. Assuming a first-order Markov model for the longitudinal process, Dupuy et al. (2001) have shown that model (2) is identifiable under mild conditions.

It may be interesting to accommodate longitudinal categorical data in the model we propose, and to compare the results with those obtained from the model of Molenberghs et al. (1997). One may also investigate how the proposed method could be extended to non-monotone missing data, and compare the results with those of Troxel et al. (1998), who extended the DK model to this situation. Non-monotone missing data may be viewed as recurrent dropouts, which suggests extending methods for the analysis of recurrent events.
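Appendix

Updating Formulas for $\theta^{(m+1)}$, $\sigma^{2(m+1)}$ and $\lambda_l^{(m+1)}$

At the M-step of the (m+1)th iteration of the EM-algorithm, the estimates of $\theta$, $\sigma^2$ and $\lambda_l$ ($l = 1, \ldots, p$) are updated using, respectively:

$$\theta^{(m+1)} = \frac{\sum_{i=1}^{n} \left[ \sum_{j=1}^{[s_i]} z_{ij}\, z_{i,j-1} + E^{(m)}\big(z_{i[s_i]}\, Z_i(s_i)\big) \right]}{\sum_{i=1}^{n} \sum_{j=0}^{[s_i]} z_{ij}^2},$$

$$\sigma^{2(m+1)} = \frac{\sum_{i=1}^{n} \left[ \sum_{j=1}^{[s_i]} \big(z_{ij} - \theta^{(m+1)} z_{i,j-1}\big)^2 + E^{(m)}\Big(\big(Z_i(s_i) - \theta^{(m+1)} z_{i[s_i]}\big)^2\Big) \right]}{n + \sum_{i=1}^{n} [s_i]},$$

and

$$\lambda_l^{(m+1)} = \frac{\sum_{i=1}^{n} \delta_i\, 1\{u_l = s_i\}}{\sum_{i=1}^{n} \left[ E^{(m)}\big(e^{\beta^{(m+1)T} Z_i(u_l)}\big) Y_i(u_l) + e^{\beta^{(m+1)T} z_i(u_l)} W_i(u_l) \right]}.$$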
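The three updates can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' code: the function names `update_markov` and `update_baseline`, and the data layout, are hypothetical; $\theta$ denotes the first-order Markov coefficient; and the E-step is assumed to have already supplied $E^{(m)}(Z_i(s_i))$ and $E^{(m)}(Z_i(s_i)^2)$ for each subject. Because $z_{i[s_i]}$ is observed, $E^{(m)}(z_{i[s_i]} Z_i(s_i)) = z_{i[s_i]} E^{(m)}(Z_i(s_i))$, so the first two conditional moments suffice for $\theta$ and $\sigma^2$. For $\lambda_l$, each conditional expectation of $e^{\beta^T Z}$ is assumed to be represented by weighted support points (for instance quadrature nodes), so the denominator is a weighted sum over everyone at risk at $u_l$.

```python
import numpy as np


def update_markov(seqs, ez, ez2):
    """Hypothetical M-step updates theta^(m+1), sigma2^(m+1) (first-order Markov model).

    seqs : list of 1-D arrays (z_i0, ..., z_{i[s_i]}) of observed values per subject
    ez   : E^(m)(Z_i(s_i)) for each subject (conditional mean of the missing value)
    ez2  : E^(m)(Z_i(s_i)^2) for each subject (conditional second moment)
    """
    num = den = 0.0
    for z, m1 in zip(seqs, ez):
        # observed lag products plus z_{i[s_i]} * E^(m)(Z_i(s_i))
        num += np.dot(z[1:], z[:-1]) + z[-1] * m1
        den += np.dot(z, z)  # sum over j = 0..[s_i] of z_ij^2
    theta = num / den

    rss, n_trans = 0.0, 0
    for z, m1, m2 in zip(seqs, ez, ez2):
        resid = z[1:] - theta * z[:-1]
        # E^(m)((Z - theta z_last)^2) expanded using the first two conditional moments
        rss += resid @ resid + m2 - 2.0 * theta * z[-1] * m1 + theta**2 * z[-1] ** 2
        n_trans += len(z)  # [s_i] observed transitions + 1 transition at dropout
    return theta, rss / n_trans


def update_baseline(beta, n_drop, risk_sets):
    """Hypothetical update lambda_l^(m+1) = (dropouts at u_l) / S^(0)(beta^(m+1), u_l).

    n_drop[l]    : number of observed dropouts at u_l
    risk_sets[l] : (pts, wts), covariate vectors of everyone at risk at u_l;
                   weight 1 for an observed vector, E-step weights on the
                   support points of an unobserved one
    """
    return np.array(
        [d / np.sum(w * np.exp(p @ beta)) for d, (p, w) in zip(n_drop, risk_sets)]
    )
```

Note that the $\lambda_l$ update uses $\beta^{(m+1)}$, so in this sketch `update_baseline` would be called after $\beta$ has been updated within the same M-step.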

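Score and Information for $\beta$

It is convenient to introduce the following notation:

$$S_m^{(r)}(\beta, s) = \sum_{j=1}^{n} \left[ E^{(m)}\big(Z_j(s)^{\otimes r} e^{\beta^T Z_j(s)}\big) Y_j(s) + z_j(s)^{\otimes r} e^{\beta^T z_j(s)} W_j(s) \right]$$

for $r = 0, 1, 2$, where for any column vector $a$, $a^{\otimes 0} = 1$, $a^{\otimes 1} = a$ and $a^{\otimes 2} = aa^T$. Define $E_m(\beta, s) = S_m^{(1)}(\beta, s) / S_m^{(0)}(\beta, s)$. The score and information for $\beta$ are, respectively:

$$U^{(m)}(\beta) = \sum_{i=1}^{n} \delta_i \left[ E^{(m)}\big(Z_i(s_i)\big) - E_m(\beta, s_i) \right]$$

and

$$I^{(m)}(\beta) = \sum_{i=1}^{n} \delta_i \left[ \frac{S_m^{(2)}(\beta, s_i)}{S_m^{(0)}(\beta, s_i)} - E_m(\beta, s_i)^{\otimes 2} \right].$$

Computation of Conditional Expectations

The conditional expectations to be evaluated in the E-step of the EM-algorithm have the form $E(\phi(Z_i(s_i)))$. In particular, the choices $Z_i(s_i)$, $Z_i^{\otimes 2}(s_i)$, $e^{\beta^T Z_i(s_i)}$, $Z_i(s_i) e^{\beta^T Z_i(s_i)}$ and $Z_i^{\otimes 2}(s_i) e^{\beta^T Z_i(s_i)}$ for $\phi(Z_i(s_i))$ are of interest. These expectations are taken under the conditional distribution of $Z_i(s_i)$ given the observed data $y_i$, whose density is:

$$\frac{f_{\beta,\lambda}\big(s_i, \delta_i \mid z_{i0}, \ldots, z_i(s_i)\big)\, f_{\theta,\sigma^2}\big(z_i(s_i) \mid z_{i[s_i]}\big)}{\int_{\mathbb{R}} f_{\beta,\lambda}\big(s_i, \delta_i \mid z_{i0}, \ldots, z_i(s_i)\big)\, f_{\theta,\sigma^2}\big(z_i(s_i) \mid z_{i[s_i]}\big)\, dz_i(s_i)}.$$

Taken with respect to this density, the $E(\phi(Z_i(s_i)))$ are then equal to:

$$\frac{\int_{\mathbb{R}} \phi\big(z_i(s_i)\big)\, f_{\beta,\lambda}\big(s_i, \delta_i \mid z_{i0}, \ldots, z_i(s_i)\big)\, f_{\theta,\sigma^2}\big(z_i(s_i) \mid z_{i[s_i]}\big)\, dz_i(s_i)}{\int_{\mathbb{R}} f_{\beta,\lambda}\big(s_i, \delta_i \mid z_{i0}, \ldots, z_i(s_i)\big)\, f_{\theta,\sigma^2}\big(z_i(s_i) \mid z_{i[s_i]}\big)\, dz_i(s_i)}. \qquad (7)$$

Letting $W_i = (2\sigma^2)^{-1/2}\big(Z_i(s_i) - \theta z_{i[s_i]}\big)$, $\tilde Z_i(s_i) = (2\sigma^2)^{1/2} W_i + \theta z_{i[s_i]}$ and $\tilde{\mathbf Z}_i(s_i) = \big(z_{i[s_i]}, \tilde Z_i(s_i)\big)^T$, Eq. (7) can be rewritten after simplification as:

$$\frac{\int_{-\infty}^{+\infty} \phi\big(\tilde{\mathbf z}_i(s_i)\big) \exp\Big[\delta_i \beta_1 \tilde z_i(s_i) - \sum_{l=1}^{p} \lambda_l e^{\beta^T \tilde{\mathbf z}_i(s_i)} Y_i(u_l)\Big] e^{-w_i^2}\, dw_i}{\int_{-\infty}^{+\infty} \exp\Big[\delta_i \beta_1 \tilde z_i(s_i) - \sum_{l=1}^{p} \lambda_l e^{\beta^T \tilde{\mathbf z}_i(s_i)} Y_i(u_l)\Big] e^{-w_i^2}\, dw_i}.$$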
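After the change of variable, each E-step integral has the form $\int g(w)\, e^{-w^2}\, dw$, which is exactly what Gauss-Hermite quadrature handles. The sketch below is illustrative only: it assumes NumPy's `numpy.polynomial.hermite.hermgauss` for the nodes $a_j$ and weights $\omega_j$, a Cox covariate vector $(z_{i[s_i]}, Z_i(s_i))^T$ with $\beta_1$ the coefficient of the unobserved value, and $\theta$ for the Markov coefficient. The function names (`estep_support`, `score_information`, `newton_beta`) and the "weighted support points" data layout are hypothetical. For one subject it returns the points $\tilde{\mathbf z}_i(s_i)$ at $w_i = a_j$ together with normalised weights, so that $E(\phi(Z_i(s_i))) \approx \sum_j \text{wts}_j\, \phi(\text{pts}_j)$ for any $\phi$. It then evaluates $U^{(m)}(\beta)$ and $I^{(m)}(\beta)$ from such weighted points and takes Newton-Raphson steps in $\beta$ with the E-step weights held fixed. Using Newton-Raphson for this M-step is an assumption made for illustration.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def estep_support(z_last, delta, beta, theta, sigma2, lam, at_risk, n_nodes=20):
    """Gauss-Hermite support points and weights for E^(m)(phi(Z_i(s_i))).

    z_last  : last observed value z_{i[s_i]}
    delta   : dropout indicator delta_i (1 = observed dropout, 0 = censored)
    beta    : (beta_0, beta_1), assumed coefficients of (z_{i[s_i]}, Z_i(s_i))
    theta, sigma2 : current first-order Markov parameters
    lam     : baseline hazard jumps lambda_1, ..., lambda_p
    at_risk : indicators Y_i(u_1), ..., Y_i(u_p)
    Returns pts (N x 2, rows (z_{i[s_i]}, z~_i(s_i)) at w_i = a_j) and weights summing to 1.
    """
    a, omega = hermgauss(n_nodes)  # nodes a_j and weights omega_j for weight e^{-w^2}
    z_tilde = np.sqrt(2.0 * sigma2) * a + theta * z_last
    pts = np.column_stack([np.full(n_nodes, z_last), z_tilde])
    # exponent: delta_i beta_1 z~ - sum_l lambda_l e^{beta^T z~} Y_i(u_l)
    expo = delta * beta[1] * z_tilde - np.dot(lam, at_risk) * np.exp(pts @ beta)
    log_w = np.log(omega) + expo
    w = np.exp(log_w - log_w.max())  # subtract the max to avoid overflow
    return pts, w / w.sum()


def score_information(beta, events):
    """U^(m)(beta) and I^(m)(beta) summed over observed dropouts (delta_i = 1).

    Each event is (ez, pts, wts): ez = E^(m)(Z_i(s_i)); pts/wts stack the
    covariate vectors of everyone at risk at s_i, with weight 1 for an observed
    vector and E-step weights on the support points of an unobserved one.
    """
    q = len(beta)
    U, I = np.zeros(q), np.zeros((q, q))
    for ez, pts, wts in events:
        r = wts * np.exp(pts @ beta)
        s0, s1, s2 = r.sum(), r @ pts, (pts * r[:, None]).T @ pts
        e = s1 / s0  # E_m(beta, s_i)
        U += ez - e
        I += s2 / s0 - np.outer(e, e)
    return U, I


def newton_beta(beta, events, tol=1e-10, max_iter=50):
    """Newton-Raphson in beta, keeping the E-step weights fixed."""
    beta = np.asarray(beta, dtype=float)
    for _ in range(max_iter):
        U, I = score_information(beta, events)
        step = np.linalg.solve(I, U)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

For example, `pts, wts = estep_support(...)` followed by `wts @ pts` gives the vector $E^{(m)}(\tilde{\mathbf Z}_i(s_i))$ needed in the score. Working on the log scale before normalising keeps the weights finite when the cumulative-hazard term is large.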

We then use an N-point Gauss-Hermite quadrature to compute $E^{(m)}(\phi(Z_i(s_i)))$ using the following formula:

$$\frac{\sum_{j=1}^{N} \omega_j\, \phi\big(\tilde{\mathbf z}_i(s_i)\big) \exp\Big[\delta_i \beta_1^{(m)} \tilde z_i(s_i) - \sum_{l=1}^{p} \lambda_l^{(m)} e^{\beta^{(m)T} \tilde{\mathbf z}_i(s_i)} Y_i(u_l)\Big]}{\sum_{j=1}^{N} \omega_j \exp\Big[\delta_i \beta_1^{(m)} \tilde z_i(s_i) - \sum_{l=1}^{p} \lambda_l^{(m)} e^{\beta^{(m)T} \tilde{\mathbf z}_i(s_i)} Y_i(u_l)\Big]},$$

where $w_i$ takes on the N abscissa values $a_j$ ($j = 1, \ldots, N$), $\omega_j$ are the corresponding weights, and $\tilde z_i(s_i)$ and $\tilde{\mathbf z}_i(s_i)$ are evaluated at $w_i = a_j$.

Acknowledgments

The authors wish to thank an editor and four referees for their helpful comments and suggestions on earlier versions of this paper.

References

D. G. Altman and B. L. De Stavola, Practical problems in fitting a proportional hazards model to data with updated measurements of the covariates, Statistics in Medicine vol. 13.
P. K. Andersen, Ø. Borgan, R. D. Gill, and N. Keiding, Statistical Models Based on Counting Processes, Springer-Verlag: New York.
D. Collett, Modelling Survival Data in Medical Research, Chapman & Hall: London.
D. R. Cox, Regression models and life-tables (with discussion), Journal of the Royal Statistical Society, Series B vol. 34.
D. R. Cox and D. Oakes, Analysis of Survival Data, Chapman and Hall: London.
E. A. C. Crouch and D. Spiegelman, The evaluation of integrals of the form $\int_{-\infty}^{+\infty} f(t) \exp(-t^2)\, dt$: Application to logistic-normal models, Journal of the American Statistical Association vol. 85.
U. G. Dafni and A. A. Tsiatis, Evaluating surrogate markers of clinical outcome when measured with error, Biometrics vol. 54.
V. DeGruttola and X. M. Tu, Modelling progression of CD4-lymphocyte count and its relationship to survival time, Biometrics vol. 50.
A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm (with discussion), Journal of the Royal Statistical Society, Series B vol. 39 pp. 1–38.
P. J. Diggle and M. G. Kenward, Informative dropout in longitudinal data analysis (with discussion), Applied Statistics vol. 43, 1994.
J.-F. Dupuy, I. Grama, and M. Mesbah, Identifiability and consistency in a Cox model with missing time-dependent covariate, Technical Report SABRES 2001/11, University of South-Brittany, France, 2001.
C. Gourieroux and A. Monfort, Statistique et Modèles Econométriques, Economica.
J. W. Hogan and N. M. Laird, Mixture models for the joint distribution of repeated measurements and event times, Statistics in Medicine vol. 16, 1997a.
J. W. Hogan and N. M. Laird, Model-based approaches to analysing incomplete longitudinal and failure time data, Statistics in Medicine vol. 16, 1997b.
J. D. Kalbfleisch and R. L. Prentice, The Statistical Analysis of Failure Time Data, Wiley: New York.
R. J. A. Little, Modeling the dropout mechanism in repeated-measures studies, Journal of the American Statistical Association vol. 90, 1995.
R. J. A. Little and D. B. Rubin, Statistical Analysis with Missing Data, Wiley: New York.
T. A. Louis, Finding the observed information matrix when using the EM algorithm, Journal of the Royal Statistical Society, Series B vol. 44.
T. Martinussen, Cox regression with incomplete covariate measurements using the EM-algorithm, Scandinavian Journal of Statistics vol. 26.
M. Mesbah, J. Lellouch, and C. Huber, The choice of loglinear models in contingency tables when the variables of interest are not jointly observed, Biometrics vol. 48, 1992.
G. Molenberghs, M. G. Kenward, and E. Lesaffre, The analysis of longitudinal ordinal data with nonrandom dropout, Biometrika vol. 84, 1997.
S. A. Murphy, Asymptotic theory for the frailty model, The Annals of Statistics vol. 23.
G. G. Nielsen, R. D. Gill, P. K. Andersen, and T. I. A. Sørensen, A counting process approach to maximum likelihood estimation in frailty models, Scandinavian Journal of Statistics vol. 19.
J. Nocedal and S. J. Wright, Numerical Optimization, Springer: New York.
H. J. Ribaudo, S. G. Thompson, and T. G. Allen-Mersh, A joint analysis of quality of life and survival using a random effect selection model, Statistics in Medicine vol. 19.
D. O. Scharfstein, A. Rotnitzky, and J. M. Robins, Adjusting for nonignorable dropout using semiparametric nonresponse models (with discussion), Journal of the American Statistical Association vol. 94, 1999.
M. D. Schluchter, Methods for the analysis of informatively censored longitudinal data, Statistics in Medicine vol. 11.
Scilab Group, Introduction to Scilab, INRIA Meta2 Project/ENPC Cergrene, User's Guide.
D. M. Smith, Oswald: Object-Oriented Software for the Analysis of Longitudinal Data in S, WWW page.
A. B. Troxel, S. R. Lipsitz, and D. P. Harrington, Marginal models for the analysis of longitudinal measurements with nonignorable non-monotone missing data, Biometrika vol. 85, 1998.
A. A. Tsiatis and M. Davidian, A semiparametric estimator for the proportional hazards model with longitudinal covariates measured with error, Biometrika vol. 88.
A. A. Tsiatis, V. DeGruttola, and M. S. Wulfsohn, Modeling the relationship of survival to longitudinal data measured with error. Applications to survival and CD4 counts in patients with AIDS, Journal of the American Statistical Association vol. 90.
G. Verbeke and G. Molenberghs, Linear Mixed Models for Longitudinal Data, Springer-Verlag: New York, 2000.
M. C. Wu and R. J. Carroll, Estimation and comparison of changes in the presence of informative right censoring by modelling the censoring process, Biometrics vol. 44.
M. S. Wulfsohn and A. A. Tsiatis, A joint model for survival and longitudinal data measured with error, Biometrics vol. 53, 1997.
