Lecture 5 Models and methods for recurrent event data

Recurrent and multiple events are commonly encountered in longitudinal studies. In this chapter we consider ordered recurrent and multiple events.

Recurrent events (focused topic)
- time-to-events model (point process model)
- time-between-events model (gap times model)
- e.g. repeated infections/hospitalizations/tumor occurrences

Ordered multiple events
- HIV → AIDS → death
- birth → onset age of a genetic disease → death
- disease staging I → II → III → IV

Unordered multiple events

Time-to-events and time-between-events models

Time-to-events models
- Interest focuses on the occurrence rate of recurrent events over time.
- Time is measured from the time origin to the events.
- The time origin could be a fixed calendar time, the onset of treatment, or a biological event.

Time-between-events models
- The outcome variables of interest are the gap times between events.
- This type of model is more relevant when the cycling pattern of recurrent events is strong; for example, women's menstrual cycles.

5.1 Time-to-events models

Consider a continuous point process $N(t)$, where $N(t)$ represents the number of events occurring at or prior to $t$, $0 \le t \le \tau$.

Intensity function. The intensity function of a continuous point process on $[0, \tau]$ is conventionally defined as the occurrence rate of events given the event history,

$$\lambda(t \mid N^H(t)) = \lim_{\Delta \to 0^+} \frac{\Pr(N(t + \Delta) - N(t) > 0 \mid N^H(t))}{\Delta},$$

where $N^H(t) = \{N(u) : 0 \le u \le t\}$ represents the history of the point process before or at $t$, $t \in [0, \tau]$.

Remarks - The intensity function uniquely determines the probability structure of the point process under regularity conditions. - For recurrent events, the so-called conditional regression models are constructed on the basis of the intensity function.

Rate function. In contrast with the conditional interpretation of the intensity function, a rate function $\lambda(t)$, $t \in [0, \tau]$, is defined as the average number of events in unit time at $t$ for subjects in the population. More precisely,

$$\lambda(t) = \lim_{\Delta \to 0^+} \frac{\Pr(N(t + \Delta) - N(t) > 0)}{\Delta},$$

namely, the occurrence rate at $t$, unconditional on the event history $N^H(t)$.

Remarks
- In general, a rate function by itself does not fully determine the probability structure of the point process.
- The rate function is conceptually and quantitatively different from the intensity function, and it coincides with the intensity function only when the process is memoryless.
- For recurrent events, the so-called marginal regression models are constructed on the basis of the rate function.

Define the cumulative rate function (CRF) as

$$\Lambda(t) = \int_0^t \lambda(u)\,du, \quad t \in [0, T_0].$$

The CRF $\Lambda(t)$ is also the expected number of recurrent events occurring in $[0, t]$. Because $E[N(t)] = \Lambda(t)$, we frequently write $E[dN(t)] = \lambda(t)\,dt$.

5.1.1 Poisson process models

The Poisson process is a counting process model for multiple events occurring over a fixed time interval $[0, \tau]$, $\tau > 0$. The Poisson distribution is the probability distribution of the total number of events, $M$. The Poisson distribution is also sometimes used for modelling a count variable in other situations.

A point process is a stationary Poisson process if the following three conditions are satisfied (sketch):
1. The probability that exactly one event occurs in a small interval $[t, t + h]$ is approximately $\lambda h$, where $\lambda > 0$ is called the intensity (or rate) of events.
2. The probability that 2 or more events occur at the same time is approximately 0.
3. The numbers of events in disjoint regions are independent.

Let $\mu = \lambda\tau > 0$. The probability mass function of $M$ is

$$f_M(m) = \frac{e^{-\mu}\mu^m}{m!}, \quad m = 0, 1, 2, \ldots$$

A Poisson process is called a non-stationary Poisson process if the occurrence rate, $\lambda(t)$, is time dependent.
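As a small numerical illustration of the Poisson count distribution with mean $\mu = \lambda\tau$, the following sketch evaluates the probability mass function and checks that it sums to one and has mean $\mu$. The function name `poisson_pmf` and the values $\lambda = 0.5$, $\tau = 4$ are illustrative assumptions, not part of the lecture.

```python
import math

def poisson_pmf(m: int, mu: float) -> float:
    """P(M = m) for a Poisson count with mean mu = lambda * tau."""
    if m < 0:
        return 0.0
    return math.exp(-mu) * mu**m / math.factorial(m)

# Illustrative values (assumed): 0.5 events per year, followed for tau = 4 years.
lam, tau = 0.5, 4.0
mu = lam * tau

# Truncate the infinite support at 60; the remaining tail mass is negligible here.
probs = [poisson_pmf(m, mu) for m in range(60)]
total = sum(probs)                               # numerically close to 1
mean = sum(m * p for m, p in enumerate(probs))   # numerically close to mu
```

For a non-stationary process the same function could be reused with $\mu$ replaced by the cumulative rate over the interval, which is the usual Poisson-process result rather than something stated above.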

5.1.2 Nonparametric estimation of CRF

Data. Let $t_{i1} \le \cdots \le t_{i,m_i}$ be the ordered event times of subject $i$, with $m_i$ defined as the index of the last observed event. The observed data include $\{(m_i, c_i, t_{i1}, \ldots, t_{i,m_i}) : i = 1, \ldots, n\}$.

Population. Note that for a single event process (univariate survival time), the risk population at $t$ is composed of subjects who have not failed prior to $t$, so the risk population varies with $t$. In contrast, for a recurrent event process, the risk population at every $t$ coincides with the target population defined at time 0.

Risk set. Let $C_i$ represent the terminating time (censoring time) for observing $N_i(t)$. The risk set at $t$ is defined as $\{i : C_i \ge t\}$, which includes the subjects who are under observation at $t$. Define $R_i(t) = I(C_i \ge t)$ as the risk-set indicator, and $R(t) = \sum_{i=1}^n R_i(t)$.

Independent censoring. If $C_i$ is independent of $N_i(\cdot)$, the risk set forms a random sample from the risk population at $t$.

Under the independent censoring assumption, for $t > 0$ and $\Delta$ positive-valued but small, a crude estimate of the occurrence probability in $(t - \Delta, t]$ can be constructed as

$$\lambda(t)\Delta \approx \frac{\sum_{i=1}^n \sum_{j=1}^{m_i} I(t_{ij} \in (t - \Delta, t])}{R(t)}, \tag{1}$$

with $I(\cdot)$ representing the indicator function. The estimate is essentially an empirical measure with time-dependent sample size $R(t)$. A nonparametric estimate of the CRF corresponding to (1) can then be constructed as

$$\hat\Lambda(t) = \sum_{i=1}^n \sum_{j=1}^{m_i} \frac{I(t_{ij} \le t)}{R(t_{ij})}. \tag{2}$$

See Nelson (1988, JQT; 1995, Technometrics).
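The CRF estimate in (2) is a sum over all observed events, each weighted by the inverse of the number of subjects under observation at that event time. A minimal sketch follows; the function name `crf_estimate`, the list-based data layout, and the toy data are assumptions made for illustration, and the risk set is taken as the subjects with $C_i \ge t$ so that every observed event has at least one subject at risk.

```python
from typing import List, Sequence, Tuple

def crf_estimate(event_times: Sequence[Sequence[float]],
                 censor_times: Sequence[float]) -> List[Tuple[float, float]]:
    """Return [(t, Lambda_hat(t))] at each distinct observed event time.

    event_times[i] holds subject i's ordered event times (all at or before
    censor_times[i]); censor_times[i] is the censoring time C_i.
    """
    def at_risk(t: float) -> int:
        # R(t): number of subjects still under observation at t (assumed C_i >= t).
        return sum(1 for c in censor_times if c >= t)

    pooled = sorted(t for times in event_times for t in times)
    steps: List[Tuple[float, float]] = []
    cum = 0.0
    for t in pooled:
        cum += 1.0 / at_risk(t)          # each event adds 1 / R(t_ij)
        if steps and steps[-1][0] == t:  # merge tied event times into one step
            steps[-1] = (t, cum)
        else:
            steps.append((t, cum))
    return steps

# Toy data (assumed): three subjects with 2, 1 and 0 events.
est = crf_estimate([[1.0, 3.0], [2.0], []], [4.0, 2.5, 1.5])
```

With these toy data the risk set shrinks from 3 subjects at $t = 1$ to 2 at $t = 2$ and 1 at $t = 3$, so the increments of the estimate are 1/3, 1/2 and 1.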

5.1.3 Conditional Regression Models

Andersen and Gill (1982, AS) proposed a time-to-events model (the AG model) which extended Cox's proportional hazards model from single event data to recurrent event data. Suppose the dates of recurrent events are recorded on a continuous scale (e.g., by days or weeks), and the outcome measures of interest are recurrent events occurring in the time interval $[0, \tau]$, where the constant $\tau > 0$ is determined with the knowledge that recurrent events could potentially be observed up to $\tau$, say 3 years. Let $N^H(t)$ be the recurrent event history and $Z^H(t)$ the possibly time-dependent covariate history prior to $t$. For $t \in [0, \tau]$, the AG model assumes that events occur over time with the occurrence rate

$$\lambda(t \mid N^H(t), Z^H(t)) = \lambda_0(t)\exp\{X(t)\beta\}, \tag{3}$$

where $X(t) = \phi(N^H(t), Z^H(t))$ is a transformation of $(N^H(t), Z^H(t))$.

Pros and cons of conditional regression model

(i) The AG model can be thought of as a predictive model, since the event history is included as part of the conditional statistics in the rate function.
(ii) Use of the AG model to identify treatment effects is subject to constraints, since the model identifies treatment effects adjusted for subject-specific event history. In general, the AG model is not ideal for identifying treatment effects or population risks.
(iii) If the AG model uses a time-independent covariate, $X(t) = X$, the model is then required to be memoryless: two subjects with the same $X$ but different event histories would have the same predicted occurrence rate of events. Thus, if $X$ is a treatment indicator, two patients who receive the same treatment but have different hospitalization records would have the same level of risk for rehospitalization according to the AG model.

Statistical methods for conditional regression model

AG extended the partial likelihood methods from univariate survival data to recurrent event data. The partial likelihood score function for $\beta$ can be derived as

$$U(\beta, t) = \sum_{i=1}^n \int_0^t \{X_i(u) - \bar X(\beta, u)\}\,dN_i(u), \tag{4}$$

where

$$\bar X(\beta, t) = \frac{\sum_{i=1}^n R_i(t)\,X_i(t)\exp\{X_i(t)\beta\}}{\sum_{i=1}^n R_i(t)\exp\{X_i(t)\beta\}}.$$

Martingale theory was also developed to establish the large sample properties (as an extension of martingale theory for univariate survival analysis).
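To make the score function (4) concrete, the sketch below evaluates it at the end of follow-up for the simplest case of a scalar, time-independent covariate, where each observed event contributes the subject's covariate minus the risk-weighted average covariate at that event time. The function name `ag_score`, the data layout, and the toy data are illustrative assumptions; the risk set is taken as subjects with $C_k \ge t$.

```python
import math
from typing import Sequence

def ag_score(beta: float,
             x: Sequence[float],
             event_times: Sequence[Sequence[float]],
             censor_times: Sequence[float]) -> float:
    """Partial likelihood score U(beta) for a scalar, time-independent covariate."""
    score = 0.0
    for i, times in enumerate(event_times):
        for t in times:
            num = 0.0  # sum of R_k(t) * x_k * exp(x_k * beta)
            den = 0.0  # sum of R_k(t) * exp(x_k * beta)
            for xk, ck in zip(x, censor_times):
                if ck >= t:  # subject k is under observation at t
                    weight = math.exp(xk * beta)
                    num += xk * weight
                    den += weight
            score += x[i] - num / den  # X_i minus the weighted average X-bar(beta, t)
    return score

# Toy data (assumed): x is a treatment indicator for four subjects.
x = [1.0, 0.0, 1.0, 0.0]
events = [[1.0, 2.0], [3.0], [], [1.5]]
censor = [4.0, 4.0, 2.5, 3.5]
u0 = ag_score(0.0, x, events, censor)
u1 = ag_score(1.0, x, events, censor)
```

Because the score is non-increasing in $\beta$, an estimate could be obtained by solving $U(\beta) = 0$ with any one-dimensional root finder; that step is omitted here.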

5.1.4 Marginal Regression Models

Instead of the conditional regression model, we may consider a marginal model where the event history, $N^H(t)$, is not included as part of the conditional statistics:

$$\lambda(t \mid Z(t)) = \lambda_0(t)\exp\{Z(t)\beta\}.$$

The marginal model is generally ideal for identifying treatment effects and risk factors, but the estimation procedure of Lin, Wei, Yang and Ying (LWYY) depends heavily on the independent censoring assumption. The LWYY estimates could be very biased when the follow-up is terminated for reasons associated with the recurrent events, such as informative drop-out or death. Statistical inference procedures can be found in the articles of Pepe and Cai (1993, JASA) and Lin et al. (LWYY, 2000, JRSS-B).

5.1.5 Semi-parametric latent variable models

With the intention of dealing with censoring due to death or informative drop-out, Wang et al. (2001, JASA) proposed a semi-parametric latent variable model for time-to-events data:

$$\lambda(t \mid Z, X) = Z\,\lambda_0(t)\exp\{X\beta\}.$$

The model allows for informative censoring through the use of a latent variable $Z$. The model implies the marginal rate model

$$\lambda(t \mid X = x) = \lambda_0^*(t)\exp\{x\beta\},$$

where $\lambda_0^*(t) = E[Z]\,\lambda_0(t)$. The model has the feature of treating both the censoring and latent variable distributions as nonparametric components. The approach avoids modeling and estimating these nonparametric components by proper conditional likelihood techniques. As a related work, a joint model for recurrent events and a failure time was proposed and studied by Huang and Wang (2004, JASA).

5.2 Time-between-events models

Suppose the outcome measure of interest is the time between successive events (gap time). When time-between-events is the variable of interest, the occurrence of each recurrent event is considered as the time origin for the occurrence of the next event. Recurrence times could be considered as a type of correlated failure time data in survival analysis. This type of correlated data is, however, different from the correlated data collected from families (e.g., twin data or sibling data) because of the ordered nature of recurrent events.

5.2.1 Specific features of data

Informative $m$. For typical multivariate survival data such as family data, cluster size is usually assumed to be uncorrelated with the failure times of a cluster. For recurrence time data, the number of recurrent events, $m$, is typically correlated with the recurrence times in follow-up studies: a large $m$ is likely to imply shorter recurrence times, and vice versa. In some applications, $m$ is even used as the outcome measurement for analysis; e.g., in a Poisson model, $m$ is the Poisson count variable.

Induced informative censoring. Induced informative censoring is a special feature of ordered events. When the observation of the recurrent event process is censored at $C$, the censoring time for $Y_j$ is $\max\{C - \sum_{k=1}^{j-1} Y_k, 0\}$, for each $j = 2, 3, \ldots$. Because $\sum_{k=1}^{j-1} Y_k$ is correlated with $Y_j$ for $j \ge 2$, recurrence times of order greater than one are observed subject to informative censoring even if the censoring time $C$ is independent of $N(\cdot)$.
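The induced censoring mechanism can be seen directly when one subject's event times are converted to gap times: each gap can only be followed for whatever part of the observation window is left after the earlier gaps. The sketch below is an illustration under assumed names (`gap_times`, a single subject's ordered event times, and a censoring time `c`); it is not taken from the lecture.

```python
from typing import List, Sequence, Tuple

def gap_times(event_times: Sequence[float], c: float) -> List[Tuple[float, bool]]:
    """Split one subject's record into (gap, observed) pairs.

    event_times: ordered event times t_1, ..., t_m, all at or before c.
    c: censoring time C for the whole recurrent event process.
    The last pair is the censored gap, whose length is
    max(C - (sum of the earlier gaps), 0).
    """
    gaps: List[Tuple[float, bool]] = []
    prev = 0.0
    for t in event_times:
        gaps.append((t - prev, True))   # fully observed gap Y_j
        prev = t                        # prev now equals Y_1 + ... + Y_j
    gaps.append((max(c - prev, 0.0), False))  # censored (m+1)-th gap
    return gaps

# Toy record (assumed): events at times 2, 5 and 6, observation ends at C = 10.
record = gap_times([2.0, 5.0, 6.0], 10.0)
```

Here the fourth gap is censored at 4 = 10 - (2 + 3 + 1), so its censoring time depends on the earlier gap times even though $C$ itself is fixed.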

Intercepted sampling. Intercepted sampling is a well-known probability feature of renewal processes. It is a specific feature of recurrence time data because the sampling scheme used to observe recurrence times in longitudinal studies is similar to the intercepted sampling of renewal processes. For simplicity, assume the recurrence times $\{Y_j : j = 1, 2, \ldots\}$ are independent and identically distributed (iid). Let $f$, $S$ and $\mu$ represent the density function, survival function and mean of $Y_j$. Let $T = C - T_m$ and $R = T_{m+1} - C$ be the so-called backward and forward recurrence times, where $T_m$ and $T_{m+1}$ are the times of the last event before $C$ and the first event after $C$. When the censoring time, $C$, is sufficiently large so that an equilibrium condition is reached, the joint density of $(T, R)$ can be derived as

$$p_{T,R}(t, r) = \frac{f(t + r)\,I(t \ge 0, r \ge 0)}{\mu}. \tag{5}$$

The marginal density functions of $Y_{m+1}$, $T$ and $R$ can be derived, based on (5), as

$$p_{Y_{m+1}}(y) = \frac{y f(y)\,I(y \ge 0)}{\mu}, \tag{6}$$

$$p_T(t) = \frac{S(t)\,I(t \ge 0)}{\mu}, \tag{7}$$

$$p_R(r) = \frac{S(r)\,I(r \ge 0)}{\mu}. \tag{8}$$

The distribution of $Y_{m+1}$ is referred to as the length-biased distribution. In most longitudinal studies, however, the censoring time is not very large and therefore the equilibrium condition is not satisfied. In these cases, although the above distributional results do not hold, the bias from $Y_{m+1}$ is still significant and one should be careful when conducting statistical analysis. In general, because of the specific data features, standard statistical methods in survival analysis may or may not be appropriate for recurrence time data.
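For readers who want the intermediate steps, the marginal densities follow from the joint density $f(t + r)/\mu$ of the backward and forward recurrence times by direct integration, using $Y_{m+1} = T + R$. The exponential example in the last line is an added illustration, not part of the lecture.

```latex
\begin{align*}
% Backward recurrence time: integrate the joint density over r.
p_T(t) &= \int_0^\infty \frac{f(t + r)}{\mu}\,dr
        = \frac{1}{\mu}\int_t^\infty f(s)\,ds
        = \frac{S(t)}{\mu}, \qquad t \ge 0, \\
% Forward recurrence time: the same argument with t and r exchanged.
p_R(r) &= \int_0^\infty \frac{f(t + r)}{\mu}\,dt = \frac{S(r)}{\mu}, \qquad r \ge 0, \\
% Intercepted gap Y_{m+1} = T + R: integrate along the line t + r = y.
p_{Y_{m+1}}(y) &= \int_0^y p_{T,R}(t, y - t)\,dt
        = \int_0^y \frac{f(y)}{\mu}\,dt
        = \frac{y f(y)}{\mu}, \qquad y \ge 0, \\
% Illustration (assumed): Y_j exponential with mean mu, so E[Y_j^2] = 2 mu^2.
E[Y_{m+1}] &= \int_0^\infty y\,\frac{y f(y)}{\mu}\,dy
        = \frac{E[Y_j^2]}{\mu} = \frac{2\mu^2}{\mu} = 2\mu .
\end{align*}
```

In the exponential illustration the intercepted gap is on average twice as long as a typical gap, which shows how large the length bias can be.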

5.2.2 Transitional probability Model

Let $f_j(y \mid y_{i1}, \ldots, y_{i,j-1})$ denote the pdf of $Y_{ij}$ conditional on $(Y_{i1}, \ldots, Y_{i,j-1}) = (y_{i1}, \ldots, y_{i,j-1})$. Suppose the censoring time $C_i$ is independent of the recurrent event process $N_i(\cdot)$. Note that the likelihood function is

$$L \propto \prod_{i=1}^n \Big\{\prod_{j=1}^{m_i} f_j(y_{ij} \mid y_{i1}, \ldots, y_{i,j-1})\Big\}\, S_{m_i+1}(y^+_{i,m_i+1} \mid y_{i1}, \ldots, y_{i,m_i}),$$

where $S_{m_i+1}$ is the conditional survival function of $Y_{i,m_i+1}$ and $y^+_{i,m_i+1}$ is its censored value. A transitional probability model can be constructed by placing distributional assumptions on the conditional probability $f_j(y \mid y_{i1}, \ldots, y_{i,j-1})$. In applications, when a transitional probability model is used, it is frequently accompanied by a further first-order (or second-order) Markov assumption that the conditional pdf of $Y_{ij}$ depends on $(Y_{i1}, \ldots, Y_{i,j-1})$ only through $Y_{i,j-1}$.

In a regression setting, when covariates $x_i$ are present, we assume that, conditional on $x_i$, the censoring time $C_i$ is independent of $N_i(\cdot)$. The likelihood function is modified as

$$L \propto \prod_{i=1}^n \Big\{\prod_{j=1}^{m_i} f_j(y_{ij} \mid x_i, y_{i1}, \ldots, y_{i,j-1})\Big\}\, S_{m_i+1}(y^+_{i,m_i+1} \mid x_i, y_{i1}, \ldots, y_{i,m_i}).$$

5.2.3 Parametric Frailty Model

Frailty models are basically random-effects or latent-variable models, where the frailty is used to characterize a subject. Assume the following conditions:

(i) Conditional on a subject-specific latent variable $Z = z$, the recurrence times $\{Y_j : j = 1, 2, \ldots\}$ are independent.
(ii) (Independent censoring) $C$ and $(N(\cdot), Z)$ are independent.
(iii) (Distributional assumption) Conditional on $Z = z$, $Y_j$ is distributed with pdf $f_j(y \mid z; \theta)$, $\theta \in \Theta$. The latent variable $Z$ is distributed with pdf $h(z; \gamma)$, $\gamma \in \Gamma$.

With Assumptions (i), (ii) and (iii), the likelihood function from the data can be formulated as

$$L \propto \prod_{i=1}^n \int \Big\{\prod_{j=1}^{m_i} f_j(y_{ij} \mid z_i; \theta)\Big\}\, S_{m_i+1}(y^+_{i,m_i+1} \mid z_i; \theta)\, h(z_i; \gamma)\,dz_i.$$

The likelihood function is then maximized to derive estimates (MLEs) of $\theta$ and $\gamma$. Large sample distributions of the MLEs can be derived based on normal approximation.
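As one concrete instance of the frailty likelihood, suppose (purely as an assumed example) that, given $Z = z$, the gap times are exponential with rate $z\theta$, and that $Z$ has a gamma density with shape $a$ and rate $b$. The integral over $z$ then has a closed form: with $m$ observed gaps and total follow-up $s$ (observed gaps plus the censored gap), the subject's contribution is $\theta^m\, b^a\, \Gamma(a + m) / \{\Gamma(a)\,(b + \theta s)^{a + m}\}$. The sketch below evaluates its logarithm; `subject_loglik` and the parameter values are hypothetical.

```python
import math
from typing import Sequence

def subject_loglik(gaps: Sequence[float], censored_gap: float,
                   theta: float, a: float, b: float) -> float:
    """Log-likelihood contribution of one subject.

    Assumed model: Y_j | Z = z is exponential with rate z * theta, and
    Z is gamma with shape a and rate b. gaps are the m observed gap
    times; censored_gap is the censored (m+1)-th gap.
    """
    m = len(gaps)
    total = sum(gaps) + censored_gap  # total follow-up time of the subject
    return (m * math.log(theta)
            + a * math.log(b)
            + math.lgamma(a + m) - math.lgamma(a)
            - (a + m) * math.log(b + theta * total))

def loglik(data, theta: float, a: float, b: float) -> float:
    """Sum of contributions over subjects; data is a list of (gaps, censored_gap)."""
    return sum(subject_loglik(g, c, theta, a, b) for g, c in data)

# Toy evaluation (assumed values): two observed gaps, then 2.0 units of censored follow-up.
value = math.exp(subject_loglik([1.0, 0.5], 2.0, theta=0.8, a=2.0, b=2.0))
```

In practice the function `loglik` would be passed to a numerical optimizer over $(\theta, a, b)$; for frailty distributions without a closed-form integral, numerical integration or an EM-type algorithm would be needed instead.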

In a regression setting, when covariates $x$ are present, Assumptions (i)-(iii) can be modified as

(i) Conditional on $x$ and a subject-specific latent variable $Z = z$, the recurrence times $\{Y_j : j = 1, 2, \ldots\}$ are independently distributed.
(ii) (Independent censoring) Conditional on $x$, $C$ and $(N(\cdot), Z)$ are independent.
(iii) (Distributional assumption) Conditional on $x$ and $Z = z$, $Y_j$ is distributed with pdf $f_j(y \mid x, z; \theta)$, $\theta \in \Theta$. The latent variable $Z$ is distributed with pdf $h(z; \gamma)$, $\gamma \in \Gamma$.

With the modified assumptions, the likelihood function is expressed as

$$L \propto \prod_{i=1}^n \int \Big\{\prod_{j=1}^{m_i} f_j(y_{ij} \mid x_i, z_i; \theta)\Big\}\, S_{m_i+1}(y^+_{i,m_i+1} \mid x_i, z_i; \theta)\, h(z_i; \gamma)\,dz_i.$$

It is, however, generally difficult to compute the MLE. In the literature, EM algorithms and other computational algorithms have been developed to resolve the problem.

Appendix (optional reading)

A.1 Nonparametric estimation of the survival function

Recurrence times can be treated as a type of correlated survival data in statistical analysis. However, because of the ordered nature of recurrence times, statistical methods which are appropriate for clustered survival data may not be applicable to recurrence time data. In many medical papers, recurrence time data are analyzed by inappropriate methods, as indicated by Aalen and Husebye (1991). Specifically, for estimating the marginal survival function, the Kaplan-Meier estimator derived from the pooled data is frequently used for exploratory analysis, although the estimator is generally inappropriate for such analyses. Suppose recurrent events are of the same type and consider the problem of how to estimate the marginal survival function from univariate recurrence time data. Assume the following conditions are satisfied.

(i) (Conditional iid assumption) Conditional on a subject-specific latent variable $Z = z$, the recurrence times $\{Y_j : j = 1, 2, \ldots\}$ are independently and identically distributed.
(ii) (Independent censoring) $C$ and $(N(\cdot), Z)$ are independent.

Define the univariate recurrent survival function of $Y_j$ as

$$S(y) \equiv \Pr(Y_j > y) = \int S(y \mid z)\,dH(z),$$

where $S(y \mid z)$ is the conditional survival function of $Y_j$ given $Z = z$, and $H$ is the distribution function of $Z$.

Under (i) and (ii), let $F = 1 - S$. The nonparametric likelihood function can be formulated as

$$\prod_{i=1}^n \int \Big[\prod_{j=1}^{m_i} dF(u_{ij} \mid z_i)\Big]\, S(u^+_{i,m_i+1} \mid z_i)\,dH(z_i).$$

Conceptually, the likelihood function involves both infinitely many parameters (the conditional cdf's $F(\cdot \mid z_i)$) and a mixing distribution ($H$). With infinitely many parameters, the maximization of the likelihood function could be problematic, and therefore it is not used as the tool for finding an estimator of $S$. Instead of the nonparametric likelihood approach, Wang and Chang (1999, JASA) proposed a class of nonparametric estimators for estimating $S(y)$:

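Define the observed recurrence times as

$$u^*_{ij} = \begin{cases} u_{ij} & \text{if } j = 1, \ldots, m_i, \\ u^+_{i,m_i+1} & \text{if } j = m_i + 1, \end{cases}$$

and define

$$m_i^* = \begin{cases} m_i & \text{if } m_i = 0, \\ m_i - 1 & \text{if } m_i \ge 1. \end{cases}$$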

Let $w_i = w(c_i)$, where $w(\cdot)$ is a positive-valued function. The total mass of the risk set at $y$ is calculated as

$$R^*(y) = \sum_{i=1}^n \frac{w_i}{m_i^* + 1} \sum_{j=1}^{m_i^* + 1} I(u^*_{ij} \ge y),$$

and the mass evaluated at $y$ is

$$d^*(y) = \sum_{i=1}^n \frac{w_i\,I(m_i \ge 1)}{m_i^* + 1} \sum_{j=1}^{m_i^* + 1} I(u^*_{ij} = y).$$

Let $u^*_{(1)}, u^*_{(2)}, \ldots, u^*_{(K)}$ be the ordered and distinct uncensored times. The estimator takes the product limit expression,

$$\hat S_n(y) = \prod_{u^*_{(i)} \le y} \Big\{1 - \frac{d^*(u^*_{(i)})}{R^*(u^*_{(i)})}\Big\},$$

which is non-increasing in $y$ and satisfies $0 \le \hat S_n(y) \le 1$. Further, this estimator also possesses proper large sample properties.
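The product limit estimator above can be sketched in a few lines: a subject with at least one complete gap contributes only its uncensored gaps, each with mass $w_i / m_i$, while a subject with no complete gap contributes its single censored gap to the risk set only. The function name `wang_chang`, the data layout, the default weight $w_i = 1$, and the toy data are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

def wang_chang(gaps: Sequence[Sequence[float]],
               censored_gap: Sequence[float],
               weights: Optional[Sequence[float]] = None) -> List[Tuple[float, float]]:
    """Return [(u, S_hat(u))] at the ordered distinct uncensored gap times.

    gaps[i]: subject i's uncensored recurrence times (possibly empty).
    censored_gap[i]: subject i's censored last gap.
    weights[i]: w_i; defaults to 1 for every subject (an assumed choice).
    """
    n = len(gaps)
    w = list(weights) if weights is not None else [1.0] * n
    # The m_i* + 1 times used for subject i: all uncensored gaps when m_i >= 1,
    # otherwise the single censored gap.
    used = [list(g) if len(g) >= 1 else [censored_gap[i]] for i, g in enumerate(gaps)]

    def risk_mass(y: float) -> float:
        return sum(w[i] * sum(u >= y for u in used[i]) / len(used[i]) for i in range(n))

    def event_mass(y: float) -> float:
        # Only subjects with at least one uncensored gap contribute event mass.
        return sum(w[i] * sum(u == y for u in used[i]) / len(used[i])
                   for i in range(n) if len(gaps[i]) >= 1)

    surv = 1.0
    out: List[Tuple[float, float]] = []
    for u in sorted({u for g in gaps for u in g}):
        surv *= 1.0 - event_mass(u) / risk_mass(u)
        out.append((u, surv))
    return out

# Toy data (assumed): subjects with 2, 1 and 0 uncensored gaps.
curve = wang_chang([[2.0, 4.0], [3.0], []], [1.0, 2.0, 5.0])
```

When every subject has at most one uncensored gap and $w_i = 1$, this computation reduces to the ordinary Kaplan-Meier estimator for the first gap time.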

A.2 Semiparametric Regression Models

Conditional proportional hazards model. We now return to the general case in which recurrent events may or may not be of the same type. Prentice, Williams and Peterson (PWP; 1981, Biometrika) modeled time-between-event data by a conditional proportional hazards model, as an extension of the usual proportional hazards model for univariate failure time data:

$$\lambda(t \mid N(t^-) = j - 1, N^H(t), X^H(t)) = \lambda_{0j}(t - t_{j-1})\exp\{Z(t)\gamma_j\}, \tag{9}$$

for $t \ge t_{j-1}$. In the model,
- $N^H(t) = \{N(u) : 0 \le u \le t\}$ is the event history up to $t$
- $X^H(t) = \{X(u) : 0 \le u \le t\}$ is the covariate history up to $t$
- $\lambda_{0j}(\cdot)$ is the baseline hazard function
- $\gamma_j$ is the regression parameter for the $j$th recurrence time

The possibly time-dependent covariate history up to $t$ is denoted by $X^H(t)$. As an important requirement, the event history $N^H(t)$ must be part of the given knowledge (conditional statistics) in the PWP model. The time-dependent covariate vector $Z(t) = \phi(X^H(t), N^H(t))$ is a transformation of $(X^H(t), N^H(t))$. This model serves as a proper model for predicting future events given subject-specific covariates and event history information. However, since the event history is part of the conditional statistics in the model, the PWP model does not serve as an appropriate model for identifying treatment or prevention effects. The PWP model has been further extended to include both globally defined parameters $\beta$ and episode-specific parameters $\gamma_j$ (Chang and Wang, 1999, JASA):

$$\lambda(t \mid N(t^-) = j - 1, N^H(t), X^H(t)) = \lambda_{0j}(t - t_{j-1})\exp\{Z(t)\gamma_j + W(t)\beta\}, \tag{10}$$

for $t \ge t_{j-1}$, where $Z(t)$ and $W(t)$ are functions of $(X^H(t), N^H(t))$.

Marginal regression models. In contrast with conditional regression models, marginal regression models do not include the event history $N^H(t)$ as part of the covariates and therefore serve as appropriate models for identifying treatment effects or population-based risk factors. Without conditioning on event history, limited techniques have been developed for the analysis of marginal regression models, with the exceptions of Huang's accelerated failure time model (Y. Huang, 2000, JASA):

$$\log Y_j = \alpha_j + x_j\beta_j + \epsilon_j, \quad j = 1, 2, \ldots,$$

Lin, Wei and Robins's bivariate accelerated failure time model (1998, Biometrika):

$$\log Y_1 = \alpha_1 + x_1\beta_1 + \epsilon_1, \quad \log Y_2 = \alpha_2 + x_2\beta_2 + \epsilon_2,$$

and Huang and Chen's proportional hazards model for $Y_j$ (2002, LIDA):

$$\lambda(y \mid x) = \lambda_0(y)\exp\{x\beta\},$$

where $x$ is the vector of baseline covariates and $\lambda_0$ is the baseline hazard function shared by all the episodes. Note that the first two models only partially depend on $N(t)$, and the third model is essentially a renewal model.

Trend models. In many applications the distributional pattern of recurrence times can be used as an index for the progression of a disease. Such a distributional pattern is important for understanding the natural history of a disease or for confirming a long-term treatment effect. Assume

(i) within each subject, the recurrence times $Y_1, Y_2, \ldots$ are independently distributed with the survival functions $S_0, S_1, S_2, \ldots$, and
(ii) within each subject, the censoring time $C$ is independent of $N(\cdot)$.

Assumption (i) can be viewed as a frailty condition where the conditional independence of recurrence times holds within each subject. Assumption (ii) implies that, within a subject, the censoring mechanism is uninformative for the probability structure of the event process. In applications, one might be interested in testing the null hypothesis that the duration distributions of different episodes $Y_1, Y_2, \ldots$ remain the same, either to confirm the stability of the pattern of recurrence times or to identify the treatment efficacy over time; see Wang and Chen (2001, Biometrics) for nonparametric and semiparametric approaches to this problem.