Group Sequential Procedures for Poisson Process Data with Frailty

Wenxin Jiang
Department of Statistics, Northwestern University, Evanston, Illinois 60208, U.S.A.
January 7, 1998

Summary

Consider studies with recurrent events data modeled by local Poisson processes with frailty, or random effects. In a group sequential design, the increments of the test statistics are no longer independent. We explore the influences of the frailty, and tabulate the stopping boundaries and sample size ratios to control the overall type-I and type-II error rates. Applications in an animal trial with recurrent tumors are discussed, using procedures such as the group sequential tests and the repeated confidence intervals. Minimal cost analysis is considered for determining the optimal combination of study duration and sample size.

Key words: Frailty; Independent increments; Interim analyses; Poisson process; Minimal cost analysis; Multiplicative intensity model; Recurrent events.

1. Introduction

This paper investigates group sequential procedures for recurrent events data, allowing frailty (see Oakes, 1992), or the random heterogeneity of event frequencies among different subjects (see Lawless, 1987; Turnbull, Jiang, and Clark, 1997). Recurrent events data consist of individuals each being able to develop a number of events over time. Examples include data from medical studies of epileptic seizures, asthmatic attacks, infections, etc. In this context, "frailty" means that

different individuals in the data set can have different "baseline" event rates (on entry to the study), which may be regarded as sampled from an imaginary distribution. As we show in Section 2, when frailty is present, the increments of the sequentially calculated partial-likelihood score statistics no longer have the convenient independence properties (e.g., Tsiatis, Boucher, and Kim, 1995; Lan and DeMets, 1983), implying that the usual stopping rules or exit boundaries are no longer valid in general.

Sequential analyses with a fixed number of subjects but with variable follow-up times have been discussed in a different context of repeated measurements data by Armitage, Stratton and Worthington (1985); Geary (1988); Lee and DeMets (1991); Lee and DeMets (1992). Cook (1995) discussed the fixed design of clinical trials for recurrent events data in the same context as the present paper. Few works exist on sequential analysis of recurrent events data. In a recent paper, Cook and Lawless (1996) used robust pseudo-score test statistics which do not necessarily have independent increment structures and considered the evaluation of stopping boundaries of various types. The important idea of using the robust score-type tests, stemming from Lawless and Nadeau (1995) and Cook, Lawless and Nadeau (1996) and applied to sequential analyses in Cook and Lawless (1996), has the following virtues: (i) no distributional assumption is needed for the event process except for the assumption on the mean process; (ii) the score statistic does not require obtaining the parameter estimate to start with; (iii) the temporal trend of the mean process is modeled non-parametrically.

The current paper is intended to focus more on the design issues, and to directly model and investigate the effects of frailty, by focusing on the specific situation when the recurrent events data can be modeled by local Poisson processes with frailty. Here, "local" means that the event rate of the same individual is allowed to change over time. When transformed to the process of the cumulative count of events, this model is essentially a generalization of the Andersen-Gill model (Andersen et al., 1993) incorporating frailty. A "frailty parameter" is introduced in Section 2, which is proportional to the between-subject variance of the baseline event rate.

In particular, we address the following aspects, which differ from Cook and Lawless (1996), in an attempt to provide some guidelines for study design.

(i) We consider the Wald statistics which are often used in biomedical studies.

(ii) We consider the asymptotic joint distribution of the sequential test statistics under local alternatives (Section 2) to derive the formulas for obtaining stopping boundaries and sample size planning (Section 3), and repeated confidence intervals (Section 4) that allow extra flexibility for interim analyses (Jennison and Turnbull, 1984; Lai, 1984). For uniform follow-up plans (Section 3), we transform the test statistics and derive an iterative algorithm in the form of Armitage, McPherson and Rowe (1969) for calculating stopping boundaries and planning sample size. For 2 to 5-stage interim analyses with equal increments procedures (Section 3), we present concise tables (Tables 3, 4, 5) for the stopping boundaries and sample size planning, labeled by one parameter related to frailty.

(iii) We illustrate the calculation of a correlation parameter expressed in (8), to see how the dependence of the increments of the test statistic is induced by frailty. Two properties of the test statistics caused by frailty are discussed (P1 and P2 in Section 2), with their implications for study planning discussed in Section 3.

(iv) For studies with recurrent events data, designers often have a choice between increasing the sample size and increasing the length of the study period to achieve certain error rates. We present a minimal cost analysis (Section 5) for determining the optimal combination of sample size and study duration.

(v) Cook and Lawless (1996) use the error spending function approach, and estimate the covariance matrix of the sequential statistics to obtain the stopping boundaries stage by stage from data. This approach has the virtue of robustness under model misspecification. We consider the original Pocock (Pocock, 1977) and OF (O'Brien and Fleming, 1979) designs and summarize the covariance matrix in terms of the frailty parameter, under the local Poisson process model with frailty. This approach is more model-specific, but is especially suitable for design problems, due to the ease of summarizing the procedures in terms of pre-planned stopping boundaries and sample

sizes using the frailty parameter.

(vi) All the stopping criteria and sample size calculations are based on the joint asymptotic normality of the sequentially calculated score-type statistics. However, since the independent increment properties fail, the joint normality is not automatic even if, under mild regularity conditions, the sequential score statistics are marginally asymptotically normal. We outline a proof of the joint asymptotic normality in Lemma 1 in Appendix A.

In Section 2 the robust test statistics are introduced to account for frailty. Joint distributions of the test statistics evaluated at different interim analysis dates are derived and expressed in terms of certain correlation coefficients, which can be calculated for different models of follow-up processes. From the joint distributions, we calculate the adjusted stopping boundaries of the Pocock and the OF types, and tabulate them (Tables 1, 2, 3) for different correlation coefficients (Section 3). Recipes are also introduced for calculating sample sizes (Section 3) and constructing repeated confidence intervals (Section 4). The optimal choice between increasing sample size and study duration is discussed in Section 5. We then retrospectively use a data set from a rat experiment to illustrate the methods of this paper (Section 6), followed by a brief discussion (Section 7). All the Tables (1 to 5) are put in Appendix B.

2. Model and notation

First, let us consider a trial with a fixed design, with $n$ independent subjects labeled as $i = 1, \dots, n$. Each subject is randomly assigned to receive a certain treatment ($Z = 1$) or a placebo ($Z = 0$) with equal probabilities. The study duration is parameterized as Day 0 ($t = 0$) to Day $K-1$ ($t = K-1$). During the study period, subject $i$ enters at $t = t_e$ and exits either at the end of study $K$ or at a censoring time $c$ due to a loss to follow-up. The purpose of the trial is to detect the treatment effect in reducing the frequency of certain outcome events.

For subject $i$ at time $t$, the response variable $\tilde Y_{it}$ is the number of events, observed or not. Let $Y_{it} = H_{it}\tilde Y_{it}$ be the number of

observed events, where we introduce an indicator $H_{it}$ which takes value 1 if subject $i$ is observed at $t$, and 0 otherwise. We assume that the follow-up process $\{H_{it}\}$ is independent of the event process $\{\tilde Y_{it}\}$. A semi-parametric model in the multiplicative form is specified as

$$E^c(Y_{it}) = H_{it}\,\xi_i\,\lambda_t\,e^{Z_i\beta}, \tag{1}$$

where $\beta$ is the parameter for treatment effect (log risk ratio), $\lambda_t$ is a discrete baseline intensity function which represents the natural trend of disease progression, and $E^c$ represents the expectation conditional on the follow-up process, treatment assignment and a "frailty" factor $\xi_i$. Note that in this model, even if treatment assignments are the same for two patients, the event frequencies can differ due to different frailties $\xi_i$, which can come from all the different personal attributes (age, gender, genetic factors, family history, etc.). We simplistically assume that the $\xi_i$'s are independent and identically distributed (i.i.d.) random variables, with mean 1 and variance $\theta$ (the frailty parameter).

In the following we will assume a local Poisson process regression model, where the $Y_{it}$'s, conditional on the $Z_i$'s, $H_{it}$'s and the frailties $\xi_i$, are independent Poisson random variables with mean expressed by (1). We use this model as an example to formulate our method, which itself allows other models as well.

For the model described above, it is known (Lawless and Nadeau, 1995; Jiang, 1996; Jiang and Turnbull, 1997) that the usual partial likelihood estimate $\hat\beta$ is consistent for the treatment effect, and is asymptotically normal with variance estimable by a robust sandwich-type estimator, despite the existence of frailty. The partial likelihood score test is also valid, after using a robust estimate of the variance, for testing the null hypothesis $H_0: \beta = \beta_0$. The results can be summarized as follows.

Let the partial likelihood estimator be $\hat\beta = \arg\max_{b \in \mathbb{R}} L(b)$, where

$$L(b) = \log \prod_{i=1}^{n}\prod_{t=0}^{K-1}\left(\frac{e^{Z_i' b}}{\sum_{j=1}^{n} H_{jt}\,e^{Z_j' b}}\right)^{Y_{it}}.$$

Denoting $U(b) := \nabla_b L(b)$, the partial likelihood score statistic is $U(\beta_0)$. We have, under mild

regularity conditions, as $n \to \infty$, that $n^{1/2}(\hat\beta - \beta_0) \to \text{Normal}\{0,\ n\,\mathrm{var}(\hat\beta)\}$ in distribution, and $n^{-1/2}U(\beta_0) \to \text{Normal}\{0,\ n^{-1}\mathrm{var}\{U(\beta_0)\}\}$ in distribution, under the null hypothesis $H_0: \beta = \beta_0$. Robust estimates of the asymptotic variances $\mathrm{var}(\hat\beta)$ and $\mathrm{var}\{U(\beta_0)\}$ are discussed in Lawless and Nadeau (1995), and Jiang (1996), for example. In the present notation, they can be expressed as

$$\widehat{\mathrm{var}}(\hat\beta) = (n\hat I^{-1})(n^{-1}\hat V)(n\hat I^{-1}) \quad\text{and}\quad \widehat{\mathrm{var}}\{U(\beta_0)\} = (n^{-1}\hat V), \tag{2}$$

where $\hat I = -\nabla_b^2 L(b)\big|_{b=\hat\beta}$ and

$$\hat V = \sum_{i=1}^{n}\left[\sum_{t=0}^{K-1} H_{it}\left\{Y_{it} - \left(\frac{\sum_{j=1}^{n} Y_{jt}}{\sum_{j=1}^{n} H_{jt}e^{Z_j\hat\beta}}\right)e^{Z_i\hat\beta}\right\}\left\{Z_i - \left(\frac{\sum_{j=1}^{n} H_{jt}Z_j e^{Z_j\hat\beta}}{\sum_{j=1}^{n} H_{jt}e^{Z_j\hat\beta}}\right)\right\}\right]^2.$$

Denote $I(b) := -\nabla_b^2\,\ell(b)$, where $\ell(b) := \sum_{t=0}^{K-1} E[Y_{it}\{Z_i b - \log E(H_{it}e^{Z_i b})\}]$ is the uniform strong asymptotic limit of $n^{-1}L$, up to a constant independent of $b$. When the true parameter is $\beta$, the (strong) asymptotic limit of $n^{-1}\hat I$ is just $I(\beta)$, which will be used in the discussion below.

We now come to a sequential design, where we divide the entire study period into $[0, K) = [K_0, K_1) \cup [K_1, K_2) \cup \dots \cup [K_{Q-1}, K_Q)$, where $K_0 = 0$ and $K_Q = K$. $Q$ analyses are scheduled at $t = K_1, \dots, K_Q$, to perform the Wald test or the score test by examining $\hat\beta_q$ or $U_q(\beta_0)$ based on data in $t \in [0, K_q)$, $q = 1, \dots, Q$, allowing early termination of the trial. From now on, a subscript $q$ represents the quantity evaluated from data in $t \in [0, K_q)$, $q = 1, \dots, Q$.

To establish the stopping rules to control the overall type-I error rate, we need to know the joint distributions of $(\hat\beta_q - \beta_0)$, $q = 1, \dots, Q$, and of $U_q(\beta_0)$, $q = 1, \dots, Q$. Under a sequence of local alternatives, where the true parameter is $\beta = \beta_0 + n^{-1/2}\eta$, the joint distribution of the $n^{-1/2}U_q(\beta_0)$'s turns out to be asymptotically normal under mild regularity conditions. See Lemma 1 in Appendix A. The joint asymptotic distribution of the $n^{1/2}(\hat\beta_q - \beta_0)$'s can be obtained by noticing that $n^{1/2}(\hat\beta_q - \beta_0)$ is $\{I_q(\beta_0)\}^{-1}\{n^{-1/2}U_q(\beta_0)\} + o_p(1)$ as $n \to \infty$. When the test statistics $U_q(\beta_0)$'s and

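$(\hat\beta_q - \beta_0)$'s are normalized by being divided by their asymptotic standard errors, they both have the same asymptotic distribution. Denote the normalized statistics as $s_q(\beta_0) = U_q(\beta_0)[\widehat{\mathrm{var}}\{U_q(\beta_0)\}]^{-1/2}$ and $w_q(\beta_0) = (\hat\beta_q - \beta_0)\{\widehat{\mathrm{var}}(\hat\beta_q)\}^{-1/2}$, $q = 1, \dots, Q$. Then we have the following theorem.

Theorem 1. If the true parameter is $\beta = \beta_0 + n^{-1/2}\eta$ (local alternative), then under mild regularity conditions,

$$\begin{pmatrix} w_1(\beta_0)\\ \vdots\\ w_Q(\beta_0)\end{pmatrix} \ \text{and}\ \begin{pmatrix} s_1(\beta_0)\\ \vdots\\ s_Q(\beta_0)\end{pmatrix} \to \text{Normal}\left\{\begin{pmatrix}\mu_1\\ \vdots\\ \mu_Q\end{pmatrix},\ \begin{pmatrix}\rho_{11} & \cdots & \rho_{1Q}\\ \vdots & & \vdots\\ \rho_{Q1} & \cdots & \rho_{QQ}\end{pmatrix}\right\} \tag{3}$$

in distribution, as $n \to \infty$. Here for $q, q' \in \{1, \dots, Q\}$, $\mu_q = \eta\, I_q(\beta)[n^{-1}\mathrm{var}\{U_q(\beta)\}]^{-1/2} + o(1)$ and $\rho_{qq'} = \mathrm{corr}\{U_q(\beta), U_{q'}(\beta)\} + o(1)$, as $n \to \infty$.

For the rest of this section, we derive expressions for the $\mu_q$'s and $\rho_{qq'}$'s. We introduce the notation

$$\psi = \psi(\beta) = e^{\beta}(1 + e^{\beta})^{-1} \tag{4}$$

and

$$\Lambda(T_{i;qq'}) = \sum_{t=K_q}^{K_{q'}-1} H_{it}\lambda_t \quad\text{for } q, q' \in \{0, 1, \dots, Q\} \text{ and } q < q'. \tag{5}$$

We will also often omit the subscript $i$ for a generic subject. In the present randomization set-up, the treatment variable $Z = 0, 1$ with equal probability and is assumed independent of the follow-up process. Straightforward calculation leads to $I_q(\beta) = 2^{-1}\psi(\beta)E\{\Lambda(T_{i;0q})\}$. In order to evaluate the $\mathrm{var}\{U_q(\beta)\}$'s and the correlations $\rho_{qq'}$, we need to evaluate covariances of the following form: $\mathrm{cov}\{U_q(\beta), U_{q'}(\beta)\}$, $q, q' \in \{1, \dots, Q\}$. Notice that the score $U_q(\beta)$ is linear in the outcome variable $Y_{it}$. Under the local Poisson process regression model with frailty parameterized by the variance $\theta$, the covariance of the $Y_{it}$'s conditional on the $Z_i$'s and the $H_{it}$'s becomes

$$\mathrm{cov}\{(Y_{it}, Y_{i't'}) \mid Z_i\text{'s \& } H_{it}\text{'s}\} = \delta_{ii'}\{\delta_{tt'}H_{it}\lambda_t e^{Z_i\beta} + \theta(H_{it}\lambda_t e^{Z_i\beta})(H_{i't'}\lambda_{t'}e^{Z_{i'}\beta})\}, \tag{6}$$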

by first conditioning on the frailties $\xi_i$. Here $\delta_{ab} = 1$ if $a = b$, and 0 otherwise. Next, note that the expectations of the $U_q(\beta)$'s conditional on the $Z_i$'s and $H_{it}$'s are 0. Hence $\mathrm{cov}\{U_q(\beta), U_{q'}(\beta)\}$ is the same as $E\,\mathrm{cov}\{U_q(\beta), U_{q'}(\beta) \mid Z_i\text{'s \& } H_{it}\text{'s}\}$. Then, using (6), we obtain, for $q, q' \in \{1, \dots, Q\}$,

$$n^{-1}\mathrm{cov}\{U_q(\beta), U_{q'}(\beta)\} = 2^{-1}\psi(\beta)\,E\Big(\sum_{t=0}^{K_q-1}\sum_{t'=0}^{K_{q'}-1} H_{it}H_{it'}\lambda_t\delta_{tt'}\Big) + \theta\{\psi(\beta)\}^2\,E\Big(\sum_{t=0}^{K_q-1}\sum_{t'=0}^{K_{q'}-1} H_{it}H_{it'}\lambda_t\lambda_{t'}\Big) + o(1). \tag{7}$$

The convergence ($o(1)$) was proved first by showing that the conditional covariance converges almost surely, and then applying the dominated convergence theorem.

One thing worth mentioning is that the existence of frailty implies that the score statistics $U_q(\beta_0)$ no longer have independent increments. Using notation (5), equation (7) implies that

$$n^{-1}\mathrm{cov}\{U_1(\beta_0),\ U_2(\beta_0) - U_1(\beta_0)\} = \theta\{\psi(\beta_0)\}^2 E\{\Lambda(T_{i;01})\Lambda(T_{i;12})\} + o(1),$$

which is positive if the frailty parameter $\theta$ is positive. The independent increment structure has been a basis for much work in sequential analysis, e.g., Tsiatis, Boucher and Kim (1995), Jennison and Turnbull (1989). Now its failure means that we cannot use the usual test criteria, of the Pocock or the OF types for example.

The joint distribution of the test statistics depends on the $\mu_q$'s and the $\rho_{qq'}$'s. In the present model, these parameters depend on the frailty as follows:

$$\mu_q = n^{1/2}(\beta - \beta_0)\big[2\psi^{-1}\{E(\Lambda(T_{0q}))\}^{-1} + 4\theta\,E\{(\Lambda(T_{0q}))^2\}\{E(\Lambda(T_{0q}))\}^{-2}\big]^{-1/2},$$

$$\rho_{qq'} = \frac{E\,\Lambda(T_{0q}) + 2\theta\psi\,E\{\Lambda(T_{0q})\Lambda(T_{0q'})\}}{[E\{\Lambda(T_{0q})\} + 2\theta\psi\,E\{(\Lambda(T_{0q}))^2\}]^{1/2}\,[E\{\Lambda(T_{0q'})\} + 2\theta\psi\,E\{(\Lambda(T_{0q'}))^2\}]^{1/2}}, \tag{8}$$

for $q, q' \in \{1, \dots, Q\}$ and $q < q'$, where $\psi$ is defined in (4). The elements $\rho_{q'q}$ are required to be the same as the $\rho_{qq'}$'s, so as to make the variance-covariance matrix in (3) symmetric. In the case when $Q = 2$, a single parameter $\rho = \rho_{12}$ determines the whole variance-covariance matrix.
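As a rough numerical illustration of how frailty induces dependent increments, the following minimal sketch evaluates the leading terms of (7) and the correlation in (8) in the special case where the cumulative intensities are non-random (every subject followed for the whole of each stage). The function names are hypothetical, `theta` stands for the frailty variance, `psi` for the quantity in (4), and `lam_q` for the cumulative baseline intensity up to analysis $q$; this is an assumption-laden sketch, not code from the paper.

```python
import math


def psi(beta):
    """Quantity in (4): e^beta / (1 + e^beta)."""
    return math.exp(beta) / (1.0 + math.exp(beta))


def scaled_cov(lam_q, lam_qp, theta, beta):
    """Leading term of n^{-1} cov{U_q, U_q'} in (7).

    Assumes non-random cumulative intensities lam_q and lam_qp
    (an illustrative simplification, labeled as such).
    """
    p = psi(beta)
    # Poisson part: only time points shared by both analyses contribute.
    poisson_part = 0.5 * p * min(lam_q, lam_qp)
    # Frailty part: the product of the two cumulative intensities.
    frailty_part = theta * p ** 2 * lam_q * lam_qp
    return poisson_part + frailty_part


def correlation(lam_q, lam_qp, theta, beta):
    """Correlation of the score statistics at two analyses, as in (8)."""
    cov = scaled_cov(lam_q, lam_qp, theta, beta)
    var_q = scaled_cov(lam_q, lam_q, theta, beta)
    var_qp = scaled_cov(lam_qp, lam_qp, theta, beta)
    return cov / math.sqrt(var_q * var_qp)


def increment_cov(lam_1, lam_2, theta, beta):
    """n^{-1} cov{U_1, U_2 - U_1}: zero without frailty, positive with it."""
    return (scaled_cov(lam_1, lam_2, theta, beta)
            - scaled_cov(lam_1, lam_1, theta, beta))
```

For example, `increment_cov(3.0, 6.0, 0.0, 0.0)` vanishes, while any positive `theta` gives a positive value and a correlation above $2^{-1/2}$ for an interim look at half of the total cumulative intensity.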

Note that the $\rho_{qq'}$'s depend on the moments $M_{qq';d_1d_2} = E[\{\Lambda(T_{0q})\}^{d_1}\{\Lambda(T_{0q'})\}^{d_2}]$, where $d_1, d_2 \in \{0, 1, 2\}$ and $q, q' \in \{1, \dots, Q\}$. They depend on the form of the baseline intensity function $\lambda_t$, as well as the details of the follow-up process. While various models are possible, in the following we focus on the simplest situation, where $\mathrm{cov}\{\Lambda(T_{0q}), \Lambda(T_{0q'})\} = 0$, $1 \le q, q' \le Q$, which could be assumed when the lengths of the follow-up periods in each stage are nearly the same for all individuals (uniform follow-up plans). In this situation,

$$\mu_q = 2^{-1}\eta\,A_q^{1/2}, \quad 1 \le q \le Q, \tag{9}$$

$$\rho_{qq'} = (A_q/A_{q'})^{1/2}, \quad \rho_{q'q} = \rho_{qq'}, \quad 1 \le q \le q' \le Q, \tag{10}$$

where $A_q = (\theta + (2\psi\Lambda_q)^{-1})^{-1}$, $\Lambda_q$ is a shorthand notation for $E\,\Lambda(T_{0q})$, $1 \le q \le Q$, and $\eta = n^{1/2}(\beta - \beta_0)$. Obviously, each correlation $\rho_{qq'}$ is an increasing function of $\theta$ (non-negative). Hence, we have

Property 1 (P1) (Increased Correlations): For uniform follow-up plans, the existence of frailty leads to bigger pairwise correlations between the sequentially calculated test statistics (Wald or score) asymptotically.

Another implication of the frailty is the following:

Property 2 (P2) (Standard Error Inflation): The sequentially calculated test statistics (Wald or score), at each stage, have bigger asymptotic standard errors, due to frailty.

This is because (7) implies that $n^{-1}\mathrm{var}\{U_q(\beta)\}$ is an increasing function of $\theta$ (up to a term of $o(1)$), and the asymptotic variance of the test statistics (Wald or score) is proportional to $n^{-1}\mathrm{var}\{U_q(\beta)\}$. Note that (P2) is not restricted to the uniform follow-up plans.

In the following section, we will investigate the stopping boundaries and the sample size planning. The implications of (P1) and (P2) for these design aspects will then be made clear. From now

on we will concentrate only on the (normalized) Wald test statistic. However, up to the leading order of large sample size, things are the same for the normalized score test, since the two test statistics are asymptotically equivalent.

3. Stopping rules and sample size planning

Consider now the following stopping rule for the $Q$-stage analysis: For $q = 1, \dots, Q-1$, if $|w_q(\beta_0)| \ge c_q^{(Q)}$ then stop the trial and reject $H_0$ at time $K_q$; otherwise continue the trial and perform a test at time $K_{q+1}$. At time $K_Q$, if $|w_Q(\beta_0)| \ge c_Q^{(Q)}$ then reject $H_0$; otherwise retain $H_0$.

An exit boundary $\{c_1^{(Q)}, \dots, c_Q^{(Q)}\}$ is needed to preserve the overall type-I error rate, say, $\alpha_I$. A sample size is also required that achieves a certain power, say, $1 - \alpha_{II}$, at the alternative hypothesis $H_a: \beta = \beta_a$. Suppose the true parameter is parameterized in $\eta$ as $\beta(\eta) = \beta_0 + n^{-1/2}\eta$. Define

$$\Pi(\eta) = \mathrm{pr}_{\beta(\eta)}\big[\,|w_q(\beta_0)| < c_q^{(Q)},\ q = 1, \dots, Q\,\big]. \tag{11}$$

The error rate constraints are then simply

$$\Pi(0) = 1 - \alpha_I, \tag{12}$$

$$\Pi(\eta_a) = \alpha_{II}, \tag{13}$$

with an alternative hypothesis $H_a: \beta = \beta_a = \beta(\eta_a)$. The above two equations can be used to find the stopping boundary and the sample size (or the study duration, if the sample size is fixed). The integration involved in calculating the probability $\Pi(\eta)$ can be performed by using a multivariate normal integration package such as MULNOR (Schervish, 1984).

In the following, we consider the Pocock type boundary (Pocock, 1977), where all $c_q^{(Q)}$'s are assumed equal to some constant $c_P^{(Q)}$; as well as the OF type boundary (O'Brien and Fleming, 1979), where $c_q^{(Q)} = c_{OF}^{(Q)}(Q/q)^{1/2}$ is assumed for some constant $c_{OF}^{(Q)}$ and $q = 1, \dots, Q$. For the most commonly-used type-I error rate $\alpha_I = 0.05$, we obtained from (12) the constants $c_P^{(Q)}$ and $c_{OF}^{(Q)}$

for $Q = 2$, as a function of the correlation $\rho = \rho_{12}$, by numerical integration using the Gaussian quadrature method with 48 nodes. The results are tabulated below (Tables 1 and 2) for correlations ranging from 0.00 to 0.99, in increments of [...]. The value of $c_P^{(Q)}$ or $c_{OF}^{(Q)}$ for a negative correlation is the same as the value for a positive correlation with the same magnitude, due to the symmetry of the integration region.

In the following we illustrate the calculation of $\rho$ based on a uniform follow-up plan defined at the end of the last section. Assume a constant intensity rate $\lambda_t = r_0$. Then $\Lambda(T_{0q}) = r_0 T_{0q}$, where $T_{0q} := \sum_{t=0}^{K_q-1} H_t$. Assume that all subjects enter the study at time 0, there is no loss to follow-up, and one interim analysis is scheduled at $t = K_1 = K_2/2$. In this case $\Lambda(T_{01}) = r_0K_1$ and $\Lambda(T_{02}) = r_0K_2$. Then the parameters for the asymptotic distribution of the Wald statistic are

$$\mu_q = n^{1/2}(\beta - \beta_0)\{2\psi^{-1}(r_0K_q)^{-1} + 4\theta\}^{-1/2},\ q = 1, 2, \quad\text{and}\quad \rho = 2^{-1/2}\left(\frac{1 + 2\theta\psi r_0K_2}{1 + \theta\psi r_0K_2}\right)^{1/2}, \tag{14}$$

where $\psi$ is defined in (4). Strictly speaking, the parameters $r_0$, $\beta$ (in $\psi$) and $\theta$ in (14) need to be estimated before the study. They could be estimated by the method of moments, or by a method of negative binomial regression (Lawless, 1987; Abu-Libdeh, Turnbull and Clark, 1990; Turnbull et al., 1997), based on some pilot study data set, or on results from previous studies of a similar nature.

One thing to notice in (14) is that the correlation $\rho$, as a function of the frailty parameter $\theta$, is always bigger than the correlation without frailty ($\theta = 0$), which is $\rho(\theta = 0) = 2^{-1/2} \approx 0.7071$. Comparing with Tables 1 and 2, we see that it will always be conservative to use $c_P^{(Q)} \approx 2.18$ and $c_{OF}^{(Q)} \approx 1.98$ in testing $H_0$, which corresponds to neglecting the frailty by taking $\theta = 0$ and $\rho = 2^{-1/2}$. If frailty is taken into account, the rejection critical values $c_P^{(Q)}$ and $c_{OF}^{(Q)}$ will be smaller, making it easier to reject $H_0$ for the same value of the Wald statistic.
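The two-stage constants quoted above can be checked numerically. The sketch below is one possible way to do so, not the 48-node Gaussian quadrature used in the paper: it evaluates the correlation in (14) and solves (12) for $Q = 2$ by writing the bivariate normal probability as a one-dimensional integral (Simpson's rule) and bisecting on the constant. All function names are hypothetical; `theta` denotes the frailty variance and `psi` the quantity in (4).

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rho_14(theta, psi, r0, k2):
    """Correlation in (14): one interim look at K1 = K2 / 2, constant rate r0."""
    x = theta * psi * r0 * k2
    return math.sqrt(0.5 * (1.0 + 2.0 * x) / (1.0 + x))


def continue_prob(c1, c2, rho, n=2000):
    """P(|w1| < c1, |w2| < c2) under H0 for correlation rho in [0, 1).

    Conditions on w1 = u and integrates over u with Simpson's rule (n even).
    """
    s = math.sqrt(1.0 - rho * rho)
    h = 2.0 * c1 / n
    total = 0.0
    for k in range(n + 1):
        u = -c1 + k * h
        inner = norm_cdf((c2 - rho * u) / s) - norm_cdf((-c2 - rho * u) / s)
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * norm_pdf(u) * inner
    return total * h / 3.0


def solve_constant(rho, kind="pocock", alpha=0.05):
    """Bisection for the constant solving (12) with Q = 2.

    kind="pocock": c1 = c2 = c;  kind="of": c1 = sqrt(2) * c, c2 = c.
    """
    lo, hi = 1.5, 3.5
    for _ in range(50):
        c = 0.5 * (lo + hi)
        c1 = c if kind == "pocock" else math.sqrt(2.0) * c
        if continue_prob(c1, c, rho) < 1.0 - alpha:
            lo = c
        else:
            hi = c
    return 0.5 * (lo + hi)
```

With $\rho = 2^{-1/2}$ this should give constants close to the 2.18 (Pocock) and 1.98 (OF) quoted above, and a larger correlation, as produced by a positive `theta` in `rho_14`, should give smaller constants.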
This shows a major implication of property (P1) in the last section for the stopping boundaries. (P1) implies, by Slepian's inequality (e.g., Theorem

of Tong, 1980), that the existence of frailty allows the use of narrower stopping boundaries. The usual stopping boundaries obtained based on the independent increments assumption, neglecting frailty, are conservative.

In the rest of this section, we consider extensions to multi-stage analyses ($Q > 2$). The multidimensional numerical integration algorithms have decreasing accuracy and reliability as $Q$ increases. However, we notice that a convenient alternative algorithm is available for uniform follow-up plans. In this situation, (9), (10) and Theorem 1 imply that

$$(A_q - A_{q-1})^{-1/2}\big[A_q^{1/2}\{w_q(\beta_0) - \mu_q\} - A_{q-1}^{1/2}\{w_{q-1}(\beta_0) - \mu_{q-1}\}\big], \quad 1 \le q \le Q,$$

are independent standard normal random variables ($A_0 = 0$). This leads to the following iterative (1-dimensional) integration algorithm, in the same form as Armitage et al. (1969), for $\Pi(\eta)$ defined in (11):

$$\Pi(\eta) = \int_{-c_Q^{(Q)}}^{c_Q^{(Q)}} g(Q, z, \eta)\,dz,$$

where $g(q, z, \eta)$ is iteratively defined as follows:

$$g(1, z, 0) = \varphi(z) := (2\pi)^{-1/2}\exp(-z^2/2);$$

for $1 \le q \le Q-1$,

$$g(q+1, z, 0) = \left(\frac{A_{q+1}}{A_{q+1} - A_q}\right)^{1/2}\int_{-c_q^{(Q)}}^{c_q^{(Q)}} g(q, u, 0)\,\varphi\left\{\frac{A_{q+1}^{1/2}z - A_q^{1/2}u}{(A_{q+1} - A_q)^{1/2}}\right\}du;$$

and

$$g(Q, z, \eta) = g(Q, z, 0)\exp\{-8^{-1}A_Q\eta^2 + 2^{-1}\eta z(A_Q)^{1/2}\}.$$

This facilitates calculation of the stopping boundaries, powers and the sample sizes for multi-stage designs.

When $Q = 2$, the single-parameter ($\rho$) parameterization was very convenient for setting up concise tables for stopping boundaries. For $Q > 2$, the stopping boundary will in general depend on a $Q$-dimensional symmetric matrix determined by the recruitment/follow-up model. This will

lead to difficulties in summarizing the design aspects of the interim analyses. However, in a class of recruiting schemes a simpler description is achievable for the sequential designs. Suppose, in addition to the uniform follow-up, we have an equal increments procedure, where $\Lambda_q = q\Lambda$, $1 \le q \le Q$, for some constant $\Lambda$. The parameterization of the $\mu_q$'s and $\rho_{qq'}$'s can then be further simplified:

$$\mu_q = D\,h_q(\gamma), \quad \rho_{qq'} = h_q(\gamma)/h_{q'}(\gamma), \quad \rho_{q'q} = \rho_{qq'}, \quad 1 \le q \le q' \le Q, \tag{15}$$

where $D = 2^{-1}\eta(2\psi\Lambda)^{1/2}$, $h_q(\gamma) = ((\gamma - 0.5)(1 - \gamma)^{-1} + q^{-1})^{-1/2}$ and

$$\gamma = \gamma(\Lambda) := \rho_{12}^2 = (2\theta\psi\Lambda + 0.5)(2\theta\psi\Lambda + 1)^{-1}. \tag{16}$$

(15) is easily obtained from (9) and (10) by noting that $A_q^{1/2} = (2\psi\Lambda)^{1/2}h_q(\gamma)$, $1 \le q \le Q$. Hence we can in this case parameterize the power function $\Pi$ by $D$ and $\gamma$ ($\gamma \in [0.5, 1]$). $\gamma$ contains the input of the frailty parameter. When there is no frailty, $\gamma = 0.5$.

Consider now the null hypothesis $\beta = \beta_0$. Note that $\eta = n^{1/2}(\beta - \beta_0)$ in $D$ is 0, and so is $D$. The stopping boundaries can then be solved from (12), labeled by one parameter $\gamma$ only. Table 3 lists $c_P^{(Q)}$ and $c_{OF}^{(Q)}$ for $Q = 2, 3, 4, 5$ and $\gamma$ ranging from 0.50 to 0.90 in increments of 0.01, when $\alpha_I = 0.05$. When $Q = 1$, $c_P^{(Q)}$ and $c_{OF}^{(Q)}$ are simply $z_{\alpha_I/2} \approx 1.96$, where $z_\alpha$ is the $100(1-\alpha)$th percentile of the standard normal cumulative distribution function.

Consider now the alternative hypothesis $\beta = \beta_a$. To solve for the sample size from (13), note that (13) involves the two variables $D$ and $\gamma$ only through parameterization (15), and the sample size enters $D$ through $\eta = n^{1/2}(\beta - \beta_0)$. We therefore solve $D$ from (13) as a function of $\gamma$ to obtain $D = D(\gamma)$, which also (implicitly) depends on $Q$, $\alpha_I$ and $\alpha_{II}$ (the error rates required), as well as the type $T$ of stopping boundaries ($T = P$ or $OF$). Then, the definition of $D$ below (15) leads to

$$n = 4\{D(\gamma)\}^2(\beta_a - \beta_0)^{-2}(2\psi\Lambda)^{-1}, \tag{17}$$

where $\psi = \psi(\beta_a)$, and $\psi(\beta)$ is defined in (4). If $Q = 1$ and $\gamma = 0.5$ (no frailty), $D(\gamma)$ can be solved directly as $(z_{\alpha_I/2} + z_{\alpha_{II}})$. Hence, in a fixed design without frailty, the sample size needed for a study

length $Q\Lambda$ is

$$n_0 = 4(z_{\alpha_I/2} + z_{\alpha_{II}})^2(\beta_a - \beta_0)^{-2}(2\psi Q\Lambda)^{-1}. \tag{18}$$

$n_0$ is to be used as a reference sample size. The sample size required in (17) can then be expressed as $n = R_T^{(Q)}(\gamma)\,n_0$, where the ratio $R_T^{(Q)}(\gamma) = Q\,D(\gamma)^2(z_{\alpha_I/2} + z_{\alpha_{II}})^{-2}$ is tabulated at $\alpha_I = 0.05$, and $\alpha_{II} = 0.20$ (Table 4) and $0.10$ (Table 5), for $Q = 2$ to 5 and $\gamma = 0.50$ to $0.90$ in increments of [...]. In practice, the sample size could be obtained by first finding the fixed design "frailtiless" sample size $n_0$ from (18), then multiplying by a ratio $R_T^{(Q)}(\gamma)$ read from Table 4 or 5, if there is an initial estimate of the frailty parameter to determine $\gamma$. We find it very convenient to base the sample size planning on these tables. For comparison, the first row $R^{(1)}$ in those tables lists the sample size inflation ratio required in fixed designs with $\gamma > 0.5$, induced by the existence of frailty.

Now we comment on the implications of properties (P1) and (P2) of the last section for the sample size planning. (P1) and (P2) have opposite implications. With fixed type-I and type-II error rates, (P1) alone would imply that a smaller sample size could be used in the presence of frailty. However, in reality the increase of standard error, due to (P2), is often overwhelming, and the net result is that a much bigger sample size is required, due to the existence of frailty. The increase in the required sample size can be seen from Tables 4 and 5, where the ratios $R_T^{(Q)}(\gamma)$ increase with $\gamma$, and hence also increase with the frailty parameter.

4. Repeated confidence intervals

Repeated confidence intervals (RCIs) are a method which allows the study results to be evaluated flexibly at interim analyses without depending on rigid stopping criteria (Jennison and Turnbull, 1984; Lai, 1984; Coe and Tamhane, 1993). Here we construct the RCIs for recurrent events data with frailty.

Note first that the boundary $\{c_q^{(Q)}\}$ in the previous section is dependent on $\beta_0$ (the value of $\beta$ under $H_0$) through the correlations $\rho_{qq'}$, which depend on $\psi(\beta_0)$. We in this section explicitly

express such relation as $c_q^{(Q)} = c_q^{(Q)}(\beta_0)$. Equation (12) can then be rewritten as

$$\mathrm{pr}_{\beta_0}\{|w_q(\beta_0)| < c_q^{(Q)}(\beta_0),\ q = 1, \dots, Q\} = 1 - \alpha_I.$$

Let the RCIs be $I_q^{(Q)} = \{\beta_0 : |w_q(\beta_0)| < c_q^{(Q)}(\beta_0)\}$, $q = 1, \dots, Q$. Then $\mathrm{pr}_{\beta_0}\{\beta_0 \in I_q^{(Q)},\ q = 1, \dots, Q\} = 1 - \alpha_I$. Then $\mathrm{pr}_{\beta_0}\{\beta_0 \in I_\tau\} \ge 1 - \alpha_I$ for any stopping rule $\tau$. Note that $w_q(\beta_0) = (\hat\beta_q - \beta_0)\{\widehat{\mathrm{var}}(\hat\beta_q)\}^{-1/2} = (\hat\beta_q - \beta_0)/\widehat{\mathrm{se}}(\hat\beta_q)$. When the sample size $n$ is large, the leading order approximation to the RCI $I_q^{(Q)}$ is determined by

$$I_q^{(Q)} = \{\beta_0 : |\hat\beta_q - \beta_0|/\widehat{\mathrm{se}}(\hat\beta_q) = |w_q(\beta_0)| < c_q^{(Q)}(\hat\beta_q)\}, \quad q = 1, \dots, Q,$$

replacing $c(\beta_0)$ with $c(\hat\beta_q)$. The solution becomes

$$I_q^{(Q)} = \big(\hat\beta_q - c_q^{(Q)}(\hat\beta_q)\,\widehat{\mathrm{se}}(\hat\beta_q),\ \hat\beta_q + c_q^{(Q)}(\hat\beta_q)\,\widehat{\mathrm{se}}(\hat\beta_q)\big), \quad q = 1, \dots, Q. \tag{19}$$

5. Minimal cost analysis

For recurrent events study planning, we often have the choice of increasing the sample size or the study duration (or the expected number of events per subject) to achieve a certain power. The optimal combination of sample size and duration could be determined by a minimax-type cost analysis. This is made possible by the sample size calculation method presented in Section 3.

We assume an equal increments procedure where $\Lambda_q = q\Lambda$, $1 \le q \le Q$. Suppose the maximal cost of a study is $C_Q(n, \Lambda) = C_0\,n(Q\Lambda + \Lambda_0)$, where $C_0$ is a constant, and $C_0\Lambda_0$ is the starting cost for recruiting one subject. $\Lambda_0$, termed the initial duration, is the expected number of events to be observed that would cost the same as recruiting one extra subject. Note that $n = n_0R_T^{(Q)}(\gamma) \propto R_T^{(Q)}(\gamma)(Q\Lambda)^{-1}$ from Section 3, where $\gamma = \gamma(\Lambda)$ is defined in (16). We have

$$C_Q(n, \Lambda) \propto R_T^{(Q)}\{\gamma(\Lambda_0x)\}\{1 + (Qx)^{-1}\}$$

where $x := \Lambda/\Lambda_0$. An alternative expression in terms of $\gamma$ is

$$C_Q(n, \Lambda) \propto R_T^{(Q)}(\gamma)\{1 + 2\theta\psi\Lambda_0\,Q^{-1}(1 - \gamma)(\gamma - 0.5)^{-1}\}. \tag{20}$$

Up to a multiplicative constant, this could be calculated for each $\gamma$ from the tables of $R_T^{(Q)}(\gamma)$, to search for the minimizer $\gamma_{op}$. Then the minimizer in $\Lambda$ is $\Lambda_{op} = (2\theta\psi)^{-1}(\gamma_{op} - 0.5)(1 - \gamma_{op})^{-1}$. The optimal sample size is then

$$n_{op} = n_0R_T^{(Q)}(\gamma_{op}) = R_T^{(Q)}(\gamma_{op})\,4(z_{\alpha_I/2} + z_{\alpha_{II}})^2(\beta_a - \beta_0)^{-2}(2\psi Q\Lambda_{op})^{-1}.$$

In the fixed design $Q = 1$, $R^{(1)}(\gamma)$ can be analytically evaluated to be $0.5(1 - \gamma)^{-1}$, and the optimal points could then be solved as $\Lambda_{op} = (2\theta\psi\Lambda_0)^{-1/2}\Lambda_0$, $R^{(1)}(\gamma_{op}) = 1 + 2\theta\psi\Lambda_{op}$ and

$$n_{op} = (1 + 2\theta\psi\Lambda_{op})\,4(z_{\alpha_I/2} + z_{\alpha_{II}})^2(\beta_a - \beta_0)^{-2}(2\psi Q\Lambda_{op})^{-1}.$$

6. An example

Let us use a rat experiment data set (Gail, Santner and Brown, 1980; Thompson et al., 1978) retrospectively to illustrate our method. The data set itself was actually obtained from a fixed design. However, as an illustration of our method, we imagine that it was designed to have an interim analysis at halftime, and see what information a halftime analysis can provide in deciding whether we need to carry out the other half of the experiment. The data set has a relatively small sample size (48) and a relatively big average number of events (about 6). Further simulations will be required to test whether these conditions are good enough for our proposed asymptotic method to work satisfactorily. Here the main motivation is to use this example to illustrate the methodology.

Forty-eight female rats that remained tumor-free after sixty days of pre-treatment with a prevention drug (retinyl acetate) were randomized into two groups. In Group 1 (23 rats) they continue to receive

treatment ($Z = 1$); in Group 2 (25 rats) they receive placebo ($Z = 0$). Rats were palpated for tumors twice a week. For details see Thompson et al. (1978). Times of mammary tumor diagnoses were recorded, from which our response variables $Y_{it}$ are constructed. The objective of the study was to see if discontinuation of treatment leads to more tumors being diagnosed. Formally, we would like to test the null hypothesis $H_0: \beta = 0$ at level $\alpha_I = 0.05$. The original design was to follow all rats for a fixed length of time (122 days). However, imagine that an interim analysis was planned on the data gathered up to the 61st day (halftime), and we will see how the method described above can be applied. We will use the local Poisson process model with frailty, as introduced in Section 2, and the uniform follow-up plan as described in Section 3, to calculate the correlation (14) and perform the analysis.

Based on the halftime data ($q = 1$), we get the maximum partial likelihood estimate as $\hat\beta_1 = -0.7549$, with the robust estimate of standard error $\widehat{\mathrm{se}}(\hat\beta_1) = 0.2427$. Here the subscript 1 is used to denote the first interim analysis. Note that the Z-value (or the Wald statistic) is the ratio $-0.7549/0.2427 = -3.1104$. If we decided in advance to use Pocock's boundary, then $c_1^{(2)} = c_P^{(2)}$. We notice from Table 1 that $c_P^{(2)}$ for any correlation $\rho$ is not as large in magnitude as our test statistic. So $|w_1(0)| = |\hat\beta_1|/\widehat{\mathrm{se}}(\hat\beta_1) \ge c_1^{(2)}$. Then we could stop the trial and reject $H_0$ at halftime. If the OF procedure was decided beforehand, we would be comparing the Z-value with $c_1^{(2)} = 2^{1/2}c_{OF}^{(2)}$. From Table 2 we again see that even the largest possible $c_{OF}^{(2)}$ will make $c_1^{(2)}$ smaller than the magnitude of the Z-value. Therefore we decide to reject $H_0$ and discontinue the experiment at halftime. The actual analysis at Day 122 gives $\hat\beta_2 =$ [...], $\widehat{\mathrm{se}}(\hat\beta_2) = 0.1968$, and a Z-value of $-4.1819$.
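The table constants used in this example solve (12), and for equal increments procedures they can be cross-checked with the iterative one-dimensional integration described in Section 3. The sketch below is a plain midpoint-rule version of that recursion under the null hypothesis; the grid size, the function names and the symbol `gamma` (the squared two-stage correlation that labels Table 3, equal to 0.5 without frailty) are assumptions made for illustration, not the implementation used for the paper's tables.

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def h_sq(q, gamma):
    """Square of h_q in (15); A_q is proportional to it (requires gamma < 1)."""
    return 1.0 / ((gamma - 0.5) / (1.0 - gamma) + 1.0 / q)


def continue_prob(bounds, gamma, m=300):
    """P(|w_q| < bounds[q-1] for every stage q) under H0.

    Propagates the sub-density of the current statistic on a midpoint grid,
    one stage at a time, as in the Armitage-McPherson-Rowe type recursion.
    """
    a = [h_sq(q, gamma) for q in range(1, len(bounds) + 1)]
    step = 2.0 * bounds[0] / m
    grid = [-bounds[0] + (k + 0.5) * step for k in range(m)]
    dens = [norm_pdf(u) for u in grid]            # stage 1: standard normal
    for q in range(1, len(bounds)):
        diff = a[q] - a[q - 1]                    # variance of the increment
        scale = math.sqrt(a[q] / diff)
        root_next = math.sqrt(a[q])
        root_cur = math.sqrt(a[q - 1])
        sd = math.sqrt(diff)
        step_next = 2.0 * bounds[q] / m
        grid_next = [-bounds[q] + (k + 0.5) * step_next for k in range(m)]
        dens_next = []
        for z in grid_next:
            acc = 0.0
            for u, g in zip(grid, dens):
                acc += g * norm_pdf((root_next * z - root_cur * u) / sd)
            dens_next.append(scale * acc * step)
        grid, dens, step = grid_next, dens_next, step_next
    return sum(dens) * step
```

For instance, `continue_prob([2.18, 2.18], 0.5)` should be close to 0.95, in line with the conservative Pocock constant used here, and the same function accepts three or more boundary values for designs with more stages.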
RCIs with an overall level of confidence 95% ($\alpha_I = 0.05$) can provide a range of plausible values for $\beta$ at the interim analysis, without conforming to a rigid stopping rule. In order to calculate the critical value $c_1^{(2)}$, we need to find the correlation $\rho$ from (14). The constants $r_0$ and $\theta$ are needed from some pilot study. Since we do not have such information, two approaches could be used. One is to simply use the lower bound of $\rho$, which is 0.7071. Another is to estimate $r_0$ and $\theta$ by a negative

18 binomial regression or a method of moment estimation from the data set accumulated up to the time of the interim analysis. The second approach, unlike in the case of tests where boundaries need to be pre-specied bef ore the experiment, is legitimate at present for obtaining RCIs up to the leading order of large sample sizes. This is because the estimates of r 0 and are used only as approximations to their true values. The rst approach ( = 0:7071) is conservative but is simpler, and will result in a slightly wider RCI. We did perform the second approach for the half-time data set and found that the resulting RCIs are very close to the conservative ones. Here we decide only to report the conservative results, corresponding to using = 0:7071. For RCI derived from the ocock's procedure, we use c (2) 1 = c (2) = 2:18 from = 0:7071, in place of the coecient c (2) 1 ( ^ 1 ) in (19). We obtain I 1 = (?1:28;?0:226). For OF's procedure we use c (2) 1 = 2 1=2 c (2) O = (1:41)(1:98) and get (?1:43; 0:0763). These, in the scale of of the risk ratio (e ), become the intervals (0.277, 0.797) and (0.239, 0.927) respectively. These imply that the treatment eect could range from roughly none, to reducing the freuency of tumors to about a uarter. The exibility of RCI approach allows the decision on the continuation of the experiment being independent of the rigid stopping rules. If we decide to carry on the experiment, we can obtain a second RCI at Day 122 based on the complete data. Using a correlation of 0:7071 to obtain a most conservative interval, we get the following intervals in the scale of risk ratio (e ). For ocock type RCI we obtain (0.286, 0.674), while for OF type we obtain (0.298, 0.648). Notice that the OF type gives a wider RCI for the rst interim analysis, but gives a narrower RCI for the later one. In conclusion, we nd that the treatment eect reduces the tumor freuency to about one-third to two-thirds of the control group rats. 
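The interim RCIs quoted above are consistent with the familiar form "estimate plus or minus critical value times standard error". The sketch below reproduces them under that assumption; the helper names are illustrative, and small differences in the third digit can arise from rounding of the constants.

```python
import math

# Halftime values quoted in the text.
beta_hat_1 = -0.7549   # maximum partial likelihood estimate
se_hat_1 = 0.2427      # robust standard error
c_pocock = 2.18        # Pocock constant at correlation 0.7071
c_of = 1.98            # OF constant at correlation 0.7071


def rci(estimate, se, critical_value):
    """Return (lower, upper) for estimate +/- critical_value * se."""
    half_width = critical_value * se
    return estimate - half_width, estimate + half_width


def to_risk_ratio(interval):
    """Map an interval on the log scale to the risk ratio scale."""
    return tuple(math.exp(x) for x in interval)


pocock_log = rci(beta_hat_1, se_hat_1, c_pocock)
of_log = rci(beta_hat_1, se_hat_1, math.sqrt(2) * c_of)  # first-stage OF value
pocock_rr = to_risk_ratio(pocock_log)
of_rr = to_risk_ratio(of_log)

print("Pocock:", [round(x, 3) for x in pocock_log], [round(x, 3) for x in pocock_rr])
print("OF:    ", [round(x, 3) for x in of_log], [round(x, 3) for x in of_rr])
```

Both upper limits on the log scale are negative, so both risk-ratio intervals lie below 1, in line with the intervals reported in the text.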
Now imagine that the present study is a pilot study for estimating the sample size for the study of another drug, with the same timetable of scheduled analyses, i.e., an interim analysis at Day 61 ($K_1$), plus a possible final analysis at Day 122 ($K_2$). The present pilot study provides an estimate of $r_0 = 6.04/122$ (or about 1 tumor every 20 days), as well as a frailty parameter of 0.2665, from a negative binomial regression (see, e.g., Abu-Libdeh et al., 1990; or Turnbull et al., 1997). Therefore we can use the rough values $r_0 = 6/122$ and a frailty parameter of 0.3 in our sample size calculation. Suppose we are interested in detecting a drug effect of 80% in risk ratio, which corresponds to the alternative hypothesis $H_a: \beta = \beta_a = \log(0.80) = -0.2231$, with power 0.80 (type-II error rate 0.20). Then the quantity defined in (4), evaluated at $\beta_a$, equals 0.4444. These lead to a correlation of 0.8498. From Tables 1 and 2 we obtain $c^{(2)}_1 = c^{(2)}_2 = c^{(2)}_P = 2.133$ for Pocock's boundary, and $c^{(2)}_1 = 2^{1/2} c^{(2)}_{OF} = 2.780$, $c^{(2)}_2 = c^{(2)}_{OF} = 1.966$ for the OF boundary. Then we can use equation (13) to estimate the sample size. Alternatively we could use Table 4. Note that the correlation is about 0.85, which corresponds to the value 0.73 (rounded up) used to enter Table 4. From (18), we obtain a reference sample size $n_0$ of 119. According to Table 4, sample size ratios of 2.8817 and 2.7086 are needed for power 0.80, leading to adjusted sample sizes $n$ of 343 and 323 for the Pocock and OF designs, respectively. If frailty were neglected (a frailty parameter of 0, with the corresponding table value 0.5), however, the Pocock and OF designs with $Q = 2$ would lead to naively planned sample sizes of 132 and 120, respectively. The correct sample sizes are more than twice the naive ones. This increase in sample size is required, however, to achieve the real power of 0.80. If the naive sample sizes were used, the real power would fall short of the desired 0.80; for the OF procedure it would be only 0.416.

It is also interesting to look at the sample size needed for a fixed design, if the error rates are specified to be the same as before. Suppose we plan to observe the subjects for 122 days, and assume that $r_0 K_Q$ is about 6 and the frailty parameter is taken as 0.3 as before. In the previous formalism we then use $r_0 K_Q = 6$ when evaluating (16). The result is 0.81.
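The arithmetic behind the sequential-design sample sizes quoted above can be checked with a short sketch. It uses only numbers stated in the text; rounding the adjusted sample size up to the next integer is an assumption (it reproduces the quoted 343 and 323), and the variable names are illustrative.

```python
import math

# Alternative hypothesis on the log risk-ratio scale (80% risk ratio).
beta_a = math.log(0.80)

# OF constants for Q = 2 at correlation 0.8498, as quoted in the text.
c_of_stage2 = 1.966
c_of_stage1 = math.sqrt(2) * c_of_stage2

# Reference sample size from (18) and sample size ratios from Table 4.
n_0 = 119
ratios = {"Pocock": 2.8817, "OF": 2.7086}

# Assumption: the adjusted sample size is n_0 times the ratio, rounded up.
adjusted = {design: math.ceil(n_0 * r) for design, r in ratios.items()}

print(round(beta_a, 4), round(c_of_stage1, 3), adjusted)
```

Comparing these with the naive sizes of 132 and 120 gives the "more than twice" statement made in the text.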
For this fixed design, applying the corresponding sample size ratio $R^{(1)}$ to the previously obtained reference sample size $n_0 = 119$ gives an adjusted sample size $n$ of 314. Note that when frailty exists (frailty parameter 0.3), the sample size required for a fixed design (314) is only slightly smaller than the sample size required for a sequential design (e.g., 323 for the OF design in the last paragraph). On the other hand, a sequential design also allows the possibility of a shorter study period through possible stopping at halftime. These two observations suggest that sequential designs are to be recommended in trials with recurrent events data with frailty.

Now we consider a cost analysis. The type-I and type-II error rates are required to be 0.05 and 0.20, respectively. Suppose the initial duration described in Section 5 is 4, with $\beta_a = \log(0.8)$ and a frailty parameter of 0.3 as before. We used (20) and Table 4 to obtain the optimal values 0.68 and 2.11 (corresponding to about 40 days of follow-up in each stage) and $n_{op} = 385$ for the Pocock design, and the optimal values 0.67 and 1.93 (again corresponding to about 40 days of follow-up in each stage) and $n_{op} = 374$ for the OF design. Using these optimal sample sizes and durations, it is straightforward to check that the OF design has the smaller optimal maximal cost $C_Q$ (93% of that of the Pocock design).

7. Discussion

This paper is an attempt to prescribe how to perform interim analyses for studies with recurrent events data with frailty. At its most general level, our approach does not impose independent increment structures on the sequential test statistics. As a result, there may be other problems with non-independent increments of test statistics to which the present method could be applied. For example, Tables 1 and 2 depend only on the correlation, rather than on the underlying mechanism by which the correlation is induced. However, computation is most convenient when follow-up variation is negligible and the iterative one-dimensional integration technique can be applied. For equal-increment procedures, the parameterization can be further simplified, and tables for stopping boundaries and sample size planning are provided for 2- to 5-stage analyses. The failure of the independent increment structure leads, in this case, to a one-parameter family of stopping boundaries labeled by a parameter depending on the frailty, which reduce to the usual stopping boundaries when this parameter equals 0.5.
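To illustrate the remark that two-stage boundaries depend only on the correlation, the sketch below computes two-sided Pocock-type and OF-type constants for Q = 2 by one-dimensional numerical integration. It assumes that, under the null hypothesis, the two standardized statistics are bivariate normal with the given correlation and that the overall type-I error rate is 0.05. The function names, Simpson's rule, and the bisection search are choices made for this sketch, not necessarily the computational procedure used for the paper's tables.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rect_prob(a, b, rho, steps=2000):
    """P(|Z1| <= a, |Z2| <= b) for a standard bivariate normal pair.

    Conditions on Z1 = x, where Z2 is Normal(rho * x, 1 - rho^2), and
    integrates over x with Simpson's rule (steps must be even).
    """
    s = math.sqrt(1.0 - rho * rho)

    def f(x):
        return phi(x) * (Phi((b - rho * x) / s) - Phi((-b - rho * x) / s))

    h = 2.0 * a / steps
    total = f(-a) + f(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(-a + i * h)
    return total * h / 3.0


def boundary(rho, alpha=0.05, kind="pocock"):
    """Constant c with overall crossing probability alpha over two looks.

    Pocock type uses (c, c); OF type uses (sqrt(2) * c, c).
    """
    scale = 1.0 if kind == "pocock" else math.sqrt(2.0)
    lo, hi = 1.0, 4.0
    for _ in range(60):  # bisection on the crossing probability
        mid = 0.5 * (lo + hi)
        if 1.0 - rect_prob(scale * mid, mid, rho) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for rho in (0.7071, 0.8498):
    print(rho, round(boundary(rho), 3), round(boundary(rho, kind="of"), 3))
```

The printed constants can be compared with the values quoted in the text for correlations 0.7071 and 0.8498; a larger correlation should give smaller constants.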
The impact of the frailty on design aspects is explicitly investigated. The existence of frailty induces extra correlation between the sequentially calculated statistics, leading to smaller stopping boundary constants $c^{(Q)}_P$ and $c^{(Q)}_{OF}$ for a pre-specified type-I error rate. However, due to the inflation of the standard error caused by the frailty, the sample size needed to achieve a certain power will often be larger.

Our methods are asymptotic and depend on the large-sample approximation. A study of the agreement between the empirical and nominal type-I error rates is discussed in Cook and Lawless (1996), in the situation of constant recruitment rates. Further such studies under different parameter ranges and different recruitment plans will be helpful. Our method will probably be useful when the initial-stage analysis already involves a relatively large number of subjects (e.g., > 100), and a decision on whether or not to continue the follow-up/recruitment is to be made at each stage.

There are still many open questions left in this direction. Various recruitment procedures of practical interest may be considered in calculating the correlations. A different problem is to look at sequential bioequivalence tests involving recurrent events outcomes with frailty. When multivariate asymptotic normality does not hold but marginal asymptotic normality for each sequentially calculated score-type statistic does, conservative stopping boundaries may be obtained from Bonferroni-type probability inequalities, which should work well for a small number of interim analyses such as 5 or 6. When marginal asymptotic normality fails as well, e.g., due to a very small sample size at the initial stage, Chebyshev-type inequalities (or their correlation-adjusted versions) may provide conservative stopping boundaries; the drawback is that they may be very conservative. This is the price to be paid for being robust with respect to the distributions of the test statistics.
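A minimal sketch of the two conservative constructions mentioned above, for a constant two-sided boundary over Q looks. It assumes each statistic is standardized (mean 0 and variance 1 under the null); the Bonferroni version additionally assumes marginal normality, while the Chebyshev version does not. The function names are hypothetical, and the correlation-adjusted versions mentioned in the text are not attempted here.

```python
import math
from statistics import NormalDist


def bonferroni_boundary(num_looks, alpha=0.05):
    """Constant c with num_looks * P(|Z| > c) = alpha for standard normal Z."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * num_looks))


def chebyshev_boundary(num_looks, alpha=0.05):
    """Constant c with num_looks / c**2 = alpha, from P(|Z| >= c) <= 1 / c**2."""
    return math.sqrt(num_looks / alpha)


for q in (2, 5):
    print(q, round(bonferroni_boundary(q), 3), round(chebyshev_boundary(q), 3))
```

The Chebyshev constants are far larger than the Bonferroni ones, which illustrates the remark that such boundaries may be very conservative.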
Acknowledgment

The author is deeply indebted to Professor Bruce Turnbull for helpful discussions and teaching on sequential analysis. He is also grateful to Professor Ajit Tamhane for commenting on the manuscript. This research is partly supported by the URGC Award of Northwestern University, and the U.S. National Science Foundation grant DMS

References

Abu-Libdeh, H., Turnbull, B. W., and Clark, L. C. (1990). Analysis of multi-type recurrent events in longitudinal studies; application to a skin cancer prevention trial. Biometrics 46.

Andersen, P. K., Borgan, O., Gill, R. D. and Keiding, N. (1993). Statistical Models Based on Counting Processes. New York: Springer-Verlag.

Armitage, P., McPherson, C. K., and Rowe, B. C. (1969). Repeated significance tests on accumulating data. Journal of the Royal Statistical Society, Series A 132.

Armitage, P., Stratton, I. M. and Worthington, H. V. (1985). Repeated significance tests for clinical trials with a fixed number of patients and variable follow-up. Biometrics 41.

Coe, P. R. and Tamhane, A. C. (1993). Exact repeated confidence intervals for Bernoulli parameters in a group sequential clinical trial. Controlled Clinical Trials 14.

Cook, R. (1995). The design and analysis of randomized trials with recurrent events. Statistics in Medicine 14.

Cook, R. and Lawless, J. F. (1996). Interim monitoring of longitudinal comparative studies with recurrent event responses. Biometrics 52.

Cook, R. J., Lawless, J. F. and Nadeau, C. (1996). Robust tests for treatment comparisons based on recurrent event responses. Biometrics 52.

Gail, M. H., Santner, T. J. and Brown, C. C. (1980). An analysis of comparative carcinogenesis experiments based on multiple times to tumor. Biometrics 36.

Geary, D. N. (1988). Sequential testing in clinical trials with repeated measurements. Biometrika 75.

Jennison, C. and Turnbull, B. W. (1984). Repeated confidence intervals for group sequential clinical trials. Controlled Clinical Trials 5.

Jennison, C. and Turnbull, B. W. (1989). Interim analyses: the repeated confidence interval approach (with discussion). Journal of the Royal Statistical Society, Series B 51.

Jiang, W. (1996). Aspects of Misspecification in Statistical Models: Applications to Latent Variables, Measurement Error, Random Effects, Omitted Covariates and Incomplete Data. Ph.D. Thesis, Cornell University.

Jiang, W. and Turnbull, B. W. (1997). Semiparametric regression models for repeated events with random effects and measurement error. (Submitted to Journal of the American Statistical Association.)

Lai, T. L. (1984). Incorporating scientific, ethical and economic considerations into the design of clinical trials in the pharmaceutical industry: a sequential approach. Communications in Statistics (A), Theory and Methods 13.

Lan, K. K. and DeMets, D. L. (1983). Discrete sequential boundaries for clinical trials. Biometrika 70.

Lawless, J. F. (1987). Regression methods for Poisson process data. Journal of the American Statistical Association 82.

Lawless, J. F. and Nadeau, C. (1995). Some simple robust methods for the analysis of recurrent events. Technometrics 37.

Lee, J. W. and DeMets, D. L. (1991). Sequential comparison of changes with repeated measurements data. Journal of the American Statistical Association 86.

Lee, J. W. and DeMets, D. L. (1992). Sequential rank tests with repeated measurements in clinical trials. Journal of the American Statistical Association 87.

O'Brien, P. C. and Fleming, T. R. (1979). A multiple testing procedure for clinical trials. Biometrics 35.

Oakes, D. (1992). Frailty models for multiple event times. In Survival Analysis: State of the Arts, Ed. J. P. Klein and P. K. Goel. Netherlands: Kluwer Academic Publishers.

Pocock, S. J. (1977). Group sequential methods in the design and analysis of clinical trials. Biometrika 64.

Schervish, M. J. (1984). Multivariate normal probabilities with error bound (with corrections in 1985). Applied Statistics 33.

Thompson, H. F., Grubbs, C. J., Moon, R. C. and Sporn, M. B. (1978). Continual requirement of retinoid for maintenance of mammary cancer inhibition. Proceedings of the Annual Meeting of the American Association for Cancer Research 19, 74.

Tong, Y. L. (1980). Probability Inequalities in Multivariate Distributions. New York: Academic Press.

Tsiatis, A. A., Boucher, H. and Kim, K. (1995). Sequential methods for parametric survival models. Biometrika 82.

Turnbull, B. W., Jiang, W. and Clark, L. C. (1997). Regression models for recurrent event data: parametric random effects models with measurement error. Statistics in Medicine 16.

Appendix A

Lemma 1. If the true parameter is $\beta = \beta_0 + n^{-1/2}\delta$ (local alternative), then under mild regularity conditions, $n^{-1/2}\tilde U(\beta_0) \to \mathrm{Normal}\{\tilde I(\beta)\delta,\ n^{-1}V\}$ in distribution as $n \to \infty$. Here $\tilde U(b)$ is the column of the $U_q(b)$'s for each $b \in \mathbb{R}$; $V$ is the (asymptotic) variance-covariance matrix of $\tilde U(\beta)$; and $\tilde I(\beta)$ is the column of the partial likelihood informations $I_q(\beta)$, where $I_q(\beta)$ can be obtained from the asymptotic limit of $n^{-1}\hat I$ defined by (2), when restricted to the data in $t \in [0, K_q)$.

We give here only an outline of the proof. We claim that $n^{-1/2}\tilde U(\beta)$ is asymptotically normal with mean 0 if the true parameter is $\beta$. Then the Taylor expansion

$$n^{-1/2}\tilde U(\beta_0) = n^{-1/2}\tilde U(\beta) + \tilde I(\beta)\delta + o_p(1)$$

is used to obtain Lemma 1, using Slutsky's theorem. To prove the claimed asymptotic normality of $n^{-1/2}\tilde U(\beta)$, it suffices to show that $n^{-1/2}U^{-}(\beta)$ is asymptotically normal, where $U^{-}(\beta)$ is the column of the differences $\{U_q(\beta) - U_{q-1}(\beta)\}$ (with $U_0(\beta) = 0$). We show this by noting that $U^{-}(\beta)$ can be regarded as a usual score vector of dimension $Q$. This follows by constructing a $Q$-dimensional time-dependent covariate vector $\tilde Z_{it} = (\tilde Z_{it1}, \ldots, \tilde Z_{itQ})'$, where $\tilde Z_{itq} = Z_i 1_{qt}$, with $1_{qt} = 1$ if $t \in [K_{q-1}, K_q)$ and 0 otherwise, for $q = 1, \ldots, Q$. Construct also a $Q$-dimensional vector $\tilde b = (b_1, \ldots, b_Q)'$. Then $U^{-}(\beta) = \nabla_{\tilde b}\tilde L(\tilde b)|_{\tilde b = \tilde\beta}$, which is the score vector for the log partial likelihood

$$\tilde L(\tilde b) = \log \prod_{i=1}^{n} \prod_{t=0}^{K-1} \left( \frac{e^{\tilde Z_{it}'\tilde b}}{\sum_{j=1}^{n} H_{jt}\, e^{\tilde Z_{jt}'\tilde b}} \right)^{Y_{it}},$$

taking its value at $\tilde b = \tilde\beta = (\beta, \ldots, \beta)'$. Note that $\tilde\beta$ corresponds to the true parameter in this formalism, since $E\,\nabla_{\tilde b}\tilde L(\tilde b)|_{\tilde b = \tilde\beta} = 0$.

Finally, since $U^{-}(\beta) = \nabla_{\tilde b}\tilde L(\tilde b)|_{\tilde b = \tilde\beta}$ has the form of a usual score vector, there are several different methods to prove that it is asymptotically normal with mean 0. A proof can be made using a multivariate central limit theorem, by first showing that $U^{-}(\beta)$ is a sum of $n$ i.i.d. random vectors plus a term of order $o_p(1)$; or using a martingale central limit theorem; or alternatively by recognizing $\tilde L(\tilde b)$ as the profile likelihood for a system of $n$ independent Poisson processes.

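Appendix B

Five Useful Tables

Table 1. $c^{(2)}_P$ as a function of the correlation (type-I error rate 0.05).

Table 2. $c^{(2)}_{OF}$ as a function of the correlation (type-I error rate 0.05).

Table 3. $c^{(Q)}_P$ and $c^{(Q)}_{OF}$ as a function of the frailty-dependent parameter; $Q = 2, 3, 4, 5$ (type-I error rate 0.05). Columns: $c^{(2)}_P$, $c^{(2)}_{OF}$, $c^{(3)}_P$, $c^{(3)}_{OF}$, $c^{(4)}_P$, $c^{(4)}_{OF}$, $c^{(5)}_P$, $c^{(5)}_{OF}$.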
