Adaptive Extensions of a Two-Stage Group Sequential Procedure for Testing a Primary and a Secondary Endpoint (II): Sample Size Re-estimation

Research Article

Received XXXX

(DOI: /sim.0000)

Ajit C. Tamhane^1, Yi Wu^2 and Cyrus R. Mehta^3

In this Part II of the paper on adaptive extensions of a two-stage group sequential procedure (GSP) proposed by Tamhane, Mehta and Liu [1] (referred to as TML hereafter) for testing a primary and a secondary endpoint, we focus on second stage sample size re-estimation based on the first stage data. First we show that if we use the Cui, Hung and Wang [2] (referred to as CHW hereafter) statistics at the second stage then we can use the same primary and secondary boundaries as for the original procedure (without sample size re-estimation) and still control the type I familywise error rate (FWER). This extends their result for the single endpoint case. We further show that the secondary boundary can be sharpened in this case by taking the unknown correlation coefficient $\rho$ between the primary and secondary endpoints into account through the use of the confidence limit method proposed in Part I of this paper [3]. If we use the sufficient statistics instead of the CHW statistics then we need to modify both the primary and secondary boundaries; otherwise the error rate can be inflated. We show how to modify the boundaries of the original GSP to control the FWER. Power comparisons between competing procedures are provided. The procedures are illustrated with a clinical trial example.

Copyright © 0000 John Wiley & Sons, Ltd.

Keywords: Adaptive designs; Familywise error rate; Gatekeeping procedures; Multiple comparisons; Multiple endpoints; O'Brien-Fleming boundary; Pocock boundary; Sample size re-estimation.

^1 Department of IEMS, Northwestern University, Evanston, IL 60208, U.S.A.
^2 Department of […], Northwestern University, Evanston, IL 60208, U.S.A.
^3 Cytel Corporation and Harvard School of Public Health, Cambridge, MA 02139, U.S.A.
Correspondence to: Ajit C. Tamhane, Department of IEMS, Northwestern University, Evanston, IL. Phone: (847) […], atamhane@northwestern.edu

Statist. Med. 0000 [Version: 2010/03/10 v3.00]

1. Introduction

Adaptive designs have received wide acceptance from both clinical researchers and regulators. A common adaptation is sample size re-estimation based on interim estimates of the treatment effect and/or of the variance of the observations. Much work has been done on the problem of sample size re-estimation in group sequential designs; see, e.g., Fisher [4], Cui et al. [2], Jennison and Turnbull [5, 6, 7], Gao, Ware and Mehta [8] and Mehta and Pocock [9]. All of these works deal with a single endpoint. In this paper we extend these methods to the two endpoints case where the primary endpoint is a gatekeeper for the secondary endpoint.

The outline of the paper is as follows. In Section 2 we review the basic notation and assumptions. Section 3 describes two methods of sample size re-estimation. Section 4 studies two GSPs based on the CHW statistics. The first GSP uses separate $\alpha$-level primary and secondary boundaries; this choice of boundaries implicitly assumes the least favorable value of $\rho = 1$. The second GSP uses sharper boundaries that take into account the unknown $\rho$ through the confidence limit method proposed in Part I. A power comparison between the two GSPs is given to assess the power advantage of the second GSP. Section 5 discusses the use of the sufficient test statistics instead of the CHW test statistics. Section 6 discusses numerical evaluation of the critical boundaries for the GSP based on the sufficient test statistics. Section 7 gives simulation-based power comparisons between competing procedures. Section 8 shows how the methods presented in the previous sections for the one-sample problem extend to the two-sample problem. Section 9 gives a clinical trial example to illustrate the methodology proposed in the paper. Finally, Section 10 gives a discussion and final recommendation.

2. Notation and Background

The basic setup is the same as in Part I, which we review here briefly for convenience. Consider a two-stage GSP with sample sizes $n_1$ and $n_2$ with i.i.d. bivariate normal observations $(X_{ij}, Y_{ij})$ on the primary and secondary endpoints for the $j$th patient in the $i$th stage ($i = 1, 2$, $j = 1, \ldots, n_i$), where $X_{ij} \sim N(\mu_1, \sigma_1^2)$, $Y_{ij} \sim N(\mu_2, \sigma_2^2)$ and $\mathrm{corr}(X_{ij}, Y_{ij}) = \rho \ge 0$. Let $\delta_1 = \mu_1/\sigma_1$ and $\delta_2 = \mu_2/\sigma_2$. For convenience, assume that $\sigma_1$ and $\sigma_2$ are known, but all other parameters are unknown. In practice, $\sigma_1$ and $\sigma_2$ are unknown and are estimated. In Section 5 of Part I, we studied via simulation the effect on the FWER of using estimates in place of the unknown $\sigma_1$ and $\sigma_2$ and found that the FWER requirement (1) is satisfied.

The hypotheses to be tested are $H_1: \delta_1 = 0$ and $H_2: \delta_2 = 0$ against upper one-sided alternatives, subject to the gatekeeping restriction that $H_2$ can be tested only if $H_1$ is rejected; otherwise $H_2$ is accepted without a test. Any test procedure under consideration must satisfy the FWER requirement

$$\mathrm{FWER} = P\{\text{Reject at least one true } H_i \ (i = 1, 2)\} \le \alpha \tag{1}$$

for a specified $\alpha$ when either $H_1$ or $H_2$ is true.

The first stage test statistics are defined as

$$X_1 = X^{(1)} = \frac{\bar{X}^{(1)}}{\sigma_1/\sqrt{n_1}} \quad \text{and} \quad Y_1 = Y^{(1)} = \frac{\bar{Y}^{(1)}}{\sigma_2/\sqrt{n_1}}, \tag{2}$$

where $\bar{X}^{(1)}$ and $\bar{Y}^{(1)}$ are the first stage sample means. Similarly the second stage test statistics are defined as

$$X^{(2)} = \frac{\bar{X}^{(2)}}{\sigma_1/\sqrt{n_2}} \quad \text{and} \quad Y^{(2)} = \frac{\bar{Y}^{(2)}}{\sigma_2/\sqrt{n_2}}, \tag{3}$$

where $\bar{X}^{(2)}$ and $\bar{Y}^{(2)}$ are the second stage sample means. Then the cumulative test statistics at the final look can be expressed as

$$X_2 = \sqrt{f}\, X^{(1)} + \sqrt{1-f}\, X^{(2)}, \quad Y_2 = \sqrt{f}\, Y^{(1)} + \sqrt{1-f}\, Y^{(2)}, \tag{4}$$

where $f = n_1/(n_1 + n_2)$ is the information fraction (Lan and DeMets [10]) at the interim look.

The GSP, denoted by $\mathcal{P}$, operates as follows.

Stage 1: Take $n_1$ observations, $(X_{1j}, Y_{1j})$, $j = 1, \ldots, n_1$, and compute $(X_1, Y_1)$. If $X_1 \le c_1$, continue to Stage 2. If $X_1 > c_1$, reject $H_1$ and test $H_2$: if $Y_1 > d_1$, reject $H_2$; otherwise accept $H_2$. In either case, stop sampling.

Stage 2: Take $n_2$ observations, $(X_{2j}, Y_{2j})$, $j = 1, \ldots, n_2$, and compute $(X_2, Y_2)$. If $X_2 \le c_2$, accept $H_1$ and stop testing; otherwise reject $H_1$ and test $H_2$. If $Y_2 > d_2$, reject $H_2$; otherwise accept $H_2$.

The critical boundaries $(c_1, c_2)$ and $(d_1, d_2)$ of $\mathcal{P}$ are determined to satisfy the FWER requirement (1).

We next define the notation for the case where the second stage sample size is re-estimated in light of the first stage data. We will denote the re-estimated second stage sample size by $n_2'$; typically, $n_2' \ge n_2$. Rules for determining $n_2'$ are generally functions of the primary conditional power (CP) of the GSP, defined as

$$\mathrm{CP} = P\{X_2 > c_2 \mid X_1 = x_1\}. \tag{5}$$

In the sequel, we will consider two different statistics for adaptive designs. To define these statistics, denote the second stage test statistics for the primary and secondary endpoints based on $n_2'$ by

$$X'^{(2)} = \frac{\bar{X}'^{(2)}}{\sigma_1/\sqrt{n_2'}} \quad \text{and} \quad Y'^{(2)} = \frac{\bar{Y}'^{(2)}}{\sigma_2/\sqrt{n_2'}}, \tag{6}$$

where $\bar{X}'^{(2)}$ and $\bar{Y}'^{(2)}$ are the second stage sample means based on $n_2'$. The CHW statistics are defined as

$$X_2' = \sqrt{f}\, X^{(1)} + \sqrt{1-f}\, X'^{(2)} \quad \text{and} \quad Y_2' = \sqrt{f}\, Y^{(1)} + \sqrt{1-f}\, Y'^{(2)}, \tag{7}$$

which use the unadjusted weights, $\sqrt{f}$ and $\sqrt{1-f}$, based on the pre-planned sample sizes. As a result, they are not sufficient statistics, i.e., they are not functions of the overall sample means $\bar{X} = (n_1 \bar{X}^{(1)} + n_2' \bar{X}'^{(2)})/(n_1 + n_2')$ and $\bar{Y} = (n_1 \bar{Y}^{(1)} + n_2' \bar{Y}'^{(2)})/(n_1 + n_2')$, respectively. Therefore we will also consider the following sufficient test statistics at the second stage:

$$X_2'' = \frac{\bar{X}}{\sigma_1/\sqrt{n_1 + n_2'}} = \sqrt{f'}\, X^{(1)} + \sqrt{1-f'}\, X'^{(2)} \tag{8}$$

and

$$Y_2'' = \frac{\bar{Y}}{\sigma_2/\sqrt{n_1 + n_2'}} = \sqrt{f'}\, Y^{(1)} + \sqrt{1-f'}\, Y'^{(2)}, \tag{9}$$

where $f' = n_1/(n_1 + n_2')$ is the information fraction at the interim look for the adaptive design and $\sqrt{f'}$ and $\sqrt{1-f'}$ are the adjusted weights based on the adjusted sample size.

3. Methods for Sample Size Re-estimation

The second stage sample size will be re-estimated if the primary conditional power (CP) after the first stage is not large enough to meet the prespecified power requirement. Generally, this is because the actual treatment effect is smaller than the expected treatment effect for which the trial was designed to meet the prespecified power requirement. An expression for CP is given by

$$\begin{aligned}
\mathrm{CP} &= P\left(\sqrt{f}\, X^{(1)} + \sqrt{1-f}\, X^{(2)} > c_2 \,\Big|\, X^{(1)} = x_1\right) \\
&= P\left(X^{(2)} - \delta_1\sqrt{n_2} > \frac{c_2 - x_1\sqrt{f}}{\sqrt{1-f}} - \delta_1\sqrt{n_2} \,\Big|\, X^{(1)} = x_1\right) \\
&= 1 - \Phi\left(\frac{c_2 - x_1\sqrt{f}}{\sqrt{1-f}} - \delta_1\sqrt{n_2}\right).
\end{aligned} \tag{10}$$

We will consider two different rules for determining the second stage adjusted sample size $n_2'$.
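Expression (10) is straightforward to evaluate numerically. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the default of plugging in $x_1/\sqrt{n_1}$ for the unknown $\delta_1$ is an assumption made here for convenience.

```python
import math

def norm_cdf(z):
    """Standard normal c.d.f. computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def conditional_power(x1, c2, n1, n2, delta1=None):
    """Primary conditional power, expression (10).

    x1     : observed first stage statistic
    c2     : planned second stage critical constant
    n1, n2 : planned stage sample sizes
    delta1 : standardized effect; if None, x1 / sqrt(n1) is plugged in
             (an assumption of this sketch)
    """
    f = n1 / (n1 + n2)  # information fraction at the interim look
    if delta1 is None:
        delta1 = x1 / math.sqrt(n1)
    z = (c2 - x1 * math.sqrt(f)) / math.sqrt(1.0 - f) - delta1 * math.sqrt(n2)
    return 1.0 - norm_cdf(z)
```

With the plug-in estimate, the conditional power increases with $x_1$, which is why a rule that triggers on a CP interval is equivalent to a rule that triggers on an interval of $x_1$ values.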

1. Gao et al. [8] proposed determining $n_2'$ so that CP is preserved at the planned value $1 - \beta$. If $z_\beta$ denotes the $1 - \beta$ quantile of the standard normal distribution then from (10) we get the following formula for $n_2'$:

$$n_2' = \frac{1}{\delta_1^2}\left\{\frac{1}{\sqrt{n_2}}\left[c_2\sqrt{n_1 + n_2} - x_1\sqrt{n_1}\right] + z_\beta\right\}^2. \tag{11}$$

In this expression $\delta_1$ is an unknown parameter, which we can replace by its sample estimate, $\hat{\delta}_1 = x_1/\sqrt{n_1}$. However, this can result in impractically large $n_2'$ values if $x_1$ or, equivalently, CP is small. Therefore we truncate $n_2'$ using the following modification:

$$n_2' = \begin{cases} \min(n_2' \text{ according to (11)},\ \gamma n_2) & \text{if } \mathrm{CP}_{\min} \le \mathrm{CP} \le \mathrm{CP}_{\max} \\ n_2 & \text{otherwise,} \end{cases} \tag{12}$$

where $[\mathrm{CP}_{\min}, \mathrm{CP}_{\max}]$ is called the promising zone and $\gamma > 1$ is a prespecified constant.

2. Due to the sensitivity of the above formula to $\hat{\delta}_1$ and the numerical difficulties involved in computing the modified critical boundaries, the power comparisons reported in the present paper were carried out using a simpler version of the above rule, called the fixed increase (FI) rule, in which the dependence of $n_2'$ on $\hat{\delta}_1$ is dropped. Thus

$$n_2' = \begin{cases} \gamma n_2 & \text{if } \mathrm{CP}_{\min} \le \mathrm{CP} \le \mathrm{CP}_{\max} \\ n_2 & \text{otherwise.} \end{cases} \tag{13}$$

It is possible to devise other modifications. For example, one could increase $n_2$ to $n_2' = \gamma n_2$ for CP in the interval $[\mathrm{CP}_{\min}, \mathrm{CP}_{\mathrm{mid}}]$, where $\mathrm{CP}_{\mathrm{mid}}$ is some point in the interval $[\mathrm{CP}_{\min}, \mathrm{CP}_{\max}]$ (e.g., the midpoint of the interval), and linearly taper off $n_2'$ to $n_2$ over the interval $[\mathrm{CP}_{\mathrm{mid}}, \mathrm{CP}_{\max}]$.

4. Procedures Based on Cui-Hung-Wang (CHW) Statistics

We will offer two procedures, labeled $\mathcal{P}_1'$ and $\mathcal{P}_2'$, based on the CHW statistics. $\mathcal{P}_1'$ uses separate $\alpha$-level boundaries, $(c_1, c_2)$ and $(d_1, d_2)$, for testing $H_1$ and $H_2$ (see (14) below), and implicitly assumes that the unknown $\rho$ is at its least favorable value $\rho = 1$. On the other hand, $\mathcal{P}_2'$ uses the adjusted secondary boundary $(d_1', d_2')$ based on the confidence limit method for unknown $\rho$ proposed in Part I.

4.1. Procedure $\mathcal{P}_1'$

Let $(c_1, c_2)$ and $(d_1, d_2)$ be any $\alpha$-level boundaries for the primary and the secondary endpoint, respectively. Procedure $\mathcal{P}_1'$ is the same as $\mathcal{P}$ stated in Section 2 but uses the CHW statistics $(X_1, X_2')$ and $(Y_1, Y_2')$ in conjunction with these boundaries. In Proposition 1 we show that $\mathcal{P}_1'$ controls the FWER. In fact, we will show this result for a general $m$-stage GSP for $m \ge 2$, which is an extension of the corresponding result in CHW for the single endpoint case. We first introduce the necessary notation for this general case.

Consider a non-adaptive procedure $\mathcal{P}$ for a general $m$-stage GSP with sample sizes $n_1, \ldots, n_m$ which uses the test statistics $(X_1, \ldots, X_m)$ and $(Y_1, \ldots, Y_m)$ in conjunction with the boundaries $(c_1, \ldots, c_m)$ and $(d_1, \ldots, d_m)$, respectively. The test statistics $(X_i, Y_i)$ are the standardized cumulative sample means at the $i$th look ($i = 1, \ldots, m$) and $(c_1, \ldots, c_m)$ and $(d_1, \ldots, d_m)$ are separate $\alpha$-level boundaries which satisfy

$$P_{H_1}\left(\bigcup_{i=1}^{m}\{X_i > c_i\}\right) \le \alpha \quad \text{and} \quad P_{H_2}\left(\bigcup_{i=1}^{m}\{Y_i > d_i\}\right) \le \alpha. \tag{14}$$
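For $m = 2$, condition (14) can be checked by one-dimensional numerical integration, using the representation (4) of the cumulative statistic. The sketch below is illustrative only: the function names are hypothetical, and the constants used in the usage note (2.373, 1.678 for an O'Brien-Fleming-type boundary and 1.875 for a Pocock boundary, two equally spaced looks, one-sided $\alpha = 0.05$) are commonly tabulated values assumed here rather than taken from this paper.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_stage_type1(c1, c2, f=0.5, lower=-8.0, steps=4000):
    """P(X1 > c1 or X2 > c2) under the null hypothesis.

    Uses X2 = sqrt(f) * X1 + sqrt(1 - f) * Z with Z standard normal and
    independent of X1, and integrates over X1 <= c1 by the trapezoid rule.
    """
    stage1 = 1.0 - norm_cdf(c1)  # rejection at the interim look
    h = (c1 - lower) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lower + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        tail = 1.0 - norm_cdf((c2 - math.sqrt(f) * x) / math.sqrt(1.0 - f))
        total += weight * tail * norm_pdf(x)
    return stage1 + total * h
```

Calling `two_stage_type1(2.373, 1.678)` or `two_stage_type1(1.875, 1.875)` should return a value close to 0.05 if the assumed constants are correct; the same function applies to the secondary boundary $(d_1, d_2)$ under $H_2$.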

Consider a similar generalization of the adaptive $m$-stage GSP in which the sample sizes are adjusted in the same proportion at some stage $l < m$ for all subsequent stages $i > l$, i.e., the adjusted sample sizes are $n_i' = \gamma n_i$ for all stages $i > l$, where the common proportionality constant $\gamma \ge 1$ is a function of the data up to stage $l$. Let $N_i = n_1 + \cdots + n_i$ be the cumulative unadjusted sample sizes for stages $i = 1, \ldots, m$ and let $f_{ij} = n_j/N_i$ ($1 \le j \le i$, $1 \le i \le m$). Denote by $X^{(i)}$ and $Y^{(i)}$ the stagewise standardized statistics for stage $i$ in the non-adaptive part of the procedure, where

$$X^{(i)} = \frac{\sum_{j=1}^{n_i} X_{ij}}{\sigma_1\sqrt{n_i}} \quad \text{and} \quad Y^{(i)} = \frac{\sum_{j=1}^{n_i} Y_{ij}}{\sigma_2\sqrt{n_i}} \quad \text{for } 1 \le i \le l,$$

and by $X'^{(i)}$ and $Y'^{(i)}$ the corresponding statistics for the adaptive part of the procedure, where

$$X'^{(i)} = \frac{\sum_{j=1}^{n_i'} X_{ij}}{\sigma_1\sqrt{n_i'}} \quad \text{and} \quad Y'^{(i)} = \frac{\sum_{j=1}^{n_i'} Y_{ij}}{\sigma_2\sqrt{n_i'}} \quad \text{for } l + 1 \le i \le m.$$

Then the CHW statistics for stages $i > l$ can be expressed as

$$X_i' = \sum_{j=1}^{l}\sqrt{f_{ij}}\, X^{(j)} + \sum_{j=l+1}^{i}\sqrt{f_{ij}}\, X'^{(j)} \quad \text{and} \quad Y_i' = \sum_{j=1}^{l}\sqrt{f_{ij}}\, Y^{(j)} + \sum_{j=l+1}^{i}\sqrt{f_{ij}}\, Y'^{(j)}, \tag{15}$$

which generalize those in (7). These CHW statistics can be shown to be equivalent to Fisher's [4] variance function statistics when the sample size re-estimation is done only at one stage during the trial and the sample sizes for all subsequent stages are adjusted in the same proportion.

Proposition 1. Consider an $m$-stage group sequential procedure $\mathcal{P}$ with fixed sample sizes $n_1, \ldots, n_m$ which uses the test statistics $(X_1, \ldots, X_m)$ and $(Y_1, \ldots, Y_m)$ in conjunction with $\alpha$-level boundaries $(c_1, \ldots, c_m)$ and $(d_1, \ldots, d_m)$ satisfying (14). Let $\mathcal{P}_1'$ be an adaptive extension of $\mathcal{P}$ which adjusts all sample sizes after stage $l < m$ in the same proportion so that $n_i' = \gamma n_i$ for some $\gamma \ge 1$ for all $i > l$ and uses the statistics $(X_1, \ldots, X_l, X_{l+1}', \ldots, X_m')$ and $(Y_1, \ldots, Y_l, Y_{l+1}', \ldots, Y_m')$ in conjunction with the same $\alpha$-level boundaries $(c_1, \ldots, c_m)$ and $(d_1, \ldots, d_m)$, where $(X_i', Y_i')$ for $l + 1 \le i \le m$ are the CHW statistics defined in (15). Then $\mathcal{P}_1'$ controls the FWER at level $\alpha$.

Proof: We will first show that $\mathcal{P}$ controls the FWER at level $\alpha$. Consider three cases in which a type I error can occur.

Case 1 (Both $H_1$ and $H_2$ are true): In this case

$$\mathrm{FWER} = P_{H_1 \cap H_2}\left(\{\text{Reject } H_1\} \cup \{\text{Reject } H_2\}\right) = P_{H_1}(\text{Reject } H_1) = P_{H_1}\left(\bigcup_{i=1}^{m}\{X_i > c_i\}\right) \le \alpha,$$

where the second equality holds since $H_2$ can be rejected only if $H_1$ is rejected.

Case 2 ($H_1$ is true and $H_2$ is false): In this case

$$\mathrm{FWER} = P_{H_1}(\text{Reject } H_1) = P_{H_1}\left(\bigcup_{i=1}^{m}\{X_i > c_i\}\right) \le \alpha.$$

Case 3 ($H_1$ is false and $H_2$ is true): In this case

$$\mathrm{FWER} = P_{H_2}(\text{Reject } H_2) = P_{H_2}\left(\bigcup_{i=1}^{m}\{X_1 \le c_1, \ldots, X_{i-1} \le c_{i-1}, X_i > c_i, Y_i > d_i\}\right) < P_{H_2}\left(\bigcup_{i=1}^{m}\{Y_i > d_i\}\right) \le \alpha.$$

The same proofs go through for $\mathcal{P}_1'$ since, as shown by Cui et al. (1999), the joint distributions of $(X_1, \ldots, X_l, X_{l+1}', \ldots, X_m')$ and $(X_1, \ldots, X_l, X_{l+1}, \ldots, X_m)$ are the same under $H_1$. Similarly, the joint distributions of $(Y_1, \ldots, Y_l, Y_{l+1}', \ldots, Y_m')$ and $(Y_1, \ldots, Y_l, Y_{l+1}, \ldots, Y_m)$ are the same under $H_2$.

Remark 1: The above result for $m = 2$ follows directly from Propositions 1-3 in TML but it was not explicitly stated. The proof given there was more complicated. The present proof, due to Lingyun Liu [13], is more direct and simpler besides being more general.

4.2. Procedure $\mathcal{P}_2'$

As can be seen from the above proof, $\mathcal{P}_1'$ does not use any information on $\rho$; effectively it assumes the least favorable value $\rho = 1$. We now propose procedure $\mathcal{P}_2'$ that uses the confidence limit method of Part I based on the first stage sample correlation coefficient $r$ to obtain a sharper second stage boundary.

Proposition 2. Let $(c_1, c_2)$ and $(d_1, d_2)$ be any $\alpha$-level boundaries of $\mathcal{P}$ that control the FWER at level $\alpha$. Then $\mathcal{P}_2'$, which uses the CHW statistics in conjunction with the critical boundaries $(c_1', c_2')$ and $(d_1', d_2')$, controls the FWER at level $\alpha$ if we set $(c_1', c_2') = (c_1, c_2)$, $d_1' = d_1$ and $d_2'$ as the solution to the equation

$$\Phi_2\left(\frac{\sqrt{f}\,x_1 - c_2}{\sqrt{1-f}} + \delta_1\sqrt{n_2'},\ \frac{\sqrt{f}\,y_1 - d_2'}{\sqrt{1-f}} \,\Big|\, \rho\right) = \Phi_2\left(\frac{\sqrt{f}\,x_1 - c_2}{\sqrt{1-f}} + \delta_1\sqrt{n_2},\ \frac{\sqrt{f}\,y_1 - d_2}{\sqrt{1-f}} \,\Big|\, \rho\right), \tag{16}$$

where $(x_1, y_1)$ are the observed values of $(X_1, Y_1)$ and $\Phi_2(\cdot, \cdot \mid \rho)$ is the c.d.f. of the standard bivariate normal distribution with correlation coefficient $\rho$.

Proof: We have seen from the proof of Proposition 1 that the FWER is controlled under $H_1$ if $(c_1', c_2') = (c_1, c_2)$. Thus we only need to consider the configuration $\bar{H}_1 \cap H_2$ ($H_1$ false and $H_2$ true). As in the proof of Proposition 1, we first consider the FWER of $\mathcal{P}$, which can be written as $P_1 + P_2$ where

$$P_1 = P_{\bar{H}_1 \cap H_2}(X_1 > c_1, Y_1 > d_1) \quad \text{and} \quad P_2 = P_{\bar{H}_1 \cap H_2}(X_1 \le c_1, X_2 > c_2, Y_2 > d_2). \tag{17}$$

By conditioning on $(X_1, Y_1) = (x_1, y_1)$, using the fact that under $\bar{H}_1 \cap H_2$, $(X_1 - \delta_1\sqrt{n_1}, Y_1)$ is distributed as standard bivariate normal with correlation coefficient $\rho$, and denoting its p.d.f. $\phi_2(x_1 - \delta_1\sqrt{n_1}, y_1 \mid \rho)$ by $f_{X_1,Y_1}(x_1, y_1)$, we can write $P_2$ as

$$\begin{aligned}
P_2 &= \int_{-\infty}^{\infty}\int_{-\infty}^{c_1} P_{\bar{H}_1 \cap H_2}(X_2 > c_2, Y_2 > d_2 \mid x_1, y_1)\, f_{X_1,Y_1}(x_1, y_1)\,dx_1\,dy_1 \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{c_1} P_{\bar{H}_1 \cap H_2}\left(X^{(2)} > \frac{c_2 - \sqrt{f}\,x_1}{\sqrt{1-f}},\ Y^{(2)} > \frac{d_2 - \sqrt{f}\,y_1}{\sqrt{1-f}} \,\Big|\, x_1, y_1\right) f_{X_1,Y_1}(x_1, y_1)\,dx_1\,dy_1 \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{c_1} P\left(Z_1 > \frac{c_2 - \sqrt{f}\,x_1}{\sqrt{1-f}} - \delta_1\sqrt{n_2},\ Z_2 > \frac{d_2 - \sqrt{f}\,y_1}{\sqrt{1-f}}\right) f_{X_1,Y_1}(x_1, y_1)\,dx_1\,dy_1 \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{c_1} \Phi_2\left(\frac{\sqrt{f}\,x_1 - c_2}{\sqrt{1-f}} + \delta_1\sqrt{n_2},\ \frac{\sqrt{f}\,y_1 - d_2}{\sqrt{1-f}} \,\Big|\, \rho\right) \phi_2(x_1 - \delta_1\sqrt{n_1}, y_1 \mid \rho)\,dx_1\,dy_1.
\end{aligned} \tag{18}$$

In the above derivation we have used the fact that under $\bar{H}_1 \cap H_2$, $(Z_1, Z_2) = (X^{(2)} - \delta_1\sqrt{n_2}, Y^{(2)})$ has a standard bivariate normal distribution with correlation coefficient $\rho$.

The FWER of $\mathcal{P}_2'$ under $\bar{H}_1 \cap H_2$ can be analogously written as $P_1' + P_2'$ where $P_1'$ and $P_2'$ are defined as in (17) but with $(d_1, d_2)$ replaced by $(d_1', d_2')$ and $(X_2, Y_2)$ replaced by $(X_2', Y_2')$. Note that $P_1' = P_1$ since $c_1' = c_1$ and $d_1' = d_1$. Furthermore, $P_2'$ has the same integral expression (18) as for $P_2$ but with $d_2$ replaced by $d_2'$ and $n_2$ replaced by $n_2'$ (which is fixed conditioned on $X_1 = x_1$). Setting the integrands in the two integral expressions equal gives equation (16).
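Equation (16) has no closed-form solution for $d_2'$, but its left side is monotone in $d_2'$, so a bisection search works. The sketch below is one possible way to do this under stated assumptions, not the authors' implementation: all function names are hypothetical, $\rho$ and $\delta_1$ are passed in as plain numbers (in practice they would be estimates), $|\rho| < 1$ is required, and the bivariate normal c.d.f. is approximated by a one-dimensional trapezoid rule.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bvn_cdf(a, b, rho, lower=-8.0, steps=2000):
    """P(Z1 <= a, Z2 <= b) for a standard bivariate normal with |rho| < 1.

    Integrates phi(u) * Phi((b - rho * u) / sqrt(1 - rho^2)) over u <= a.
    """
    if a <= lower:
        return 0.0
    h = (a - lower) / steps
    s = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for k in range(steps + 1):
        u = lower + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * norm_pdf(u) * norm_cdf((b - rho * u) / s)
    return total * h

def adjusted_d2(x1, y1, c2, d2, n1, n2, n2_new, rho, delta1):
    """Solve equation (16) for the adjusted secondary constant by bisection."""
    f = n1 / (n1 + n2)
    rf, rg = math.sqrt(f), math.sqrt(1.0 - f)
    a_old = (rf * x1 - c2) / rg + delta1 * math.sqrt(n2)
    a_new = (rf * x1 - c2) / rg + delta1 * math.sqrt(n2_new)
    target = bvn_cdf(a_old, (rf * y1 - d2) / rg, rho)
    lo, hi = d2 - 5.0, d2 + 5.0  # search bracket, an assumption of this sketch
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        # The left side of (16) decreases as the candidate constant increases.
        if bvn_cdf(a_new, (rf * y1 - mid) / rg, rho) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

When `n2_new` equals `n2` the two sides of (16) coincide and the function returns $d_2$; when the sample size is increased and $\delta_1 > 0$, the first argument of $\Phi_2$ grows, so the solution moves above $d_2$.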

Remark 2: Note that $d_2'$ depends on the observed $(x_1, y_1)$ and thus $\mathcal{P}_2'$ controls the FWER conditionally on $(x_1, y_1)$. However, since this holds for every $(x_1, y_1)$, it also holds unconditionally.

Remark 3: Equation (16) involves two unknown parameters, $\rho$ and $\delta_1$. It is not clear how to use the upper confidence limit $\rho^*$ in place of $\rho$ because $\rho^*$ and its associated confidence level $1 - \varepsilon$ were determined in Part I to optimize the $(d_1, d_2)$ boundary. There is no analog of that problem here. So we used the sample estimate $r$ for $\rho$ and similarly the sample estimate $\hat{\delta}_1 = x_1/\sqrt{n_1}$ for $\delta_1$. On the other hand, for choosing the unadjusted second stage critical boundary for the secondary endpoint, $d_2$, we employed the confidence limit method proposed in Part I and used $\rho^*$ for the unknown $\rho$.

5. Procedures Based on Sufficient Statistics

We consider two adaptive group sequential procedures, $\mathcal{P}_1''$ and $\mathcal{P}_2''$, based on the sufficient statistics $(X_1, X_2'')$ and $(Y_1, Y_2'')$ (where $X_2''$ and $Y_2''$ are defined in (8) and (9)). These are used in conjunction with critical boundaries $(c_1'', c_2'')$ and $(d_1'', d_2'')$ (different for $\mathcal{P}_1''$ and $\mathcal{P}_2''$), which are determined below to control the FWER.

First we consider procedure $\mathcal{P}_1''$. In Proposition 3 we show that the critical boundaries $(c_1'', c_2'')$ and $(d_1'', d_2'')$ can be chosen so that $\mathcal{P}_1''$ makes the same decisions as any GSP based on the CHW statistics that controls the FWER, e.g., $\mathcal{P}_1'$ or $\mathcal{P}_2'$, by modifying their critical constants. Thus $\mathcal{P}_1''$ is not strictly a sufficient statistics based procedure; it simply re-expresses the rejection rule of the CHW statistics by making the second stage critical constants functions of the first stage data.

5.1. Procedure $\mathcal{P}_1''$

Proposition 3. Let $(c_1, c_2)$ and $(d_1, d_2)$ be the critical boundaries of $\mathcal{P}_1'$ (or $\mathcal{P}_2'$) that control the FWER at level $\alpha$. Denote the critical boundaries of $\mathcal{P}_1''$ by $(c_1'', c_2'')$ and $(d_1'', d_2'')$. Then $\mathcal{P}_1'$ (or $\mathcal{P}_2'$) and $\mathcal{P}_1''$ make identical decisions (and hence $\mathcal{P}_1''$ controls the FWER at level $\alpha$) if we set $c_1'' = c_1$, $d_1'' = d_1$,

$$c_2'' = \frac{1}{\sqrt{n_1 + n_2'}}\left[\sqrt{\frac{n_2'}{n_2}}\left(c_2\sqrt{n_1 + n_2} - \sqrt{n_1}\,x_1\right) + \sqrt{n_1}\,x_1\right] \tag{19}$$

and

$$d_2'' = \frac{1}{\sqrt{n_1 + n_2'}}\left[\sqrt{\frac{n_2'}{n_2}}\left(d_2\sqrt{n_1 + n_2} - \sqrt{n_1}\,y_1\right) + \sqrt{n_1}\,y_1\right]. \tag{20}$$

Proof: Procedures $\mathcal{P}_1'$ and $\mathcal{P}_1''$ are equivalent at the first stage because $c_1'' = c_1$ and $d_1'' = d_1$. For the second stage, it is easy to check using (7) and (8) that

$$X_2' > c_2 \iff X'^{(2)} > \frac{c_2 - \sqrt{f}\,x_1}{\sqrt{1-f}} \quad \text{and} \quad X_2'' > c_2'' \iff X'^{(2)} > \frac{c_2'' - \sqrt{f'}\,x_1}{\sqrt{1-f'}}.$$

Equating the right hand sides of the above two inequalities (to get the same rejection regions) and solving for $c_2''$ yields (19). The equation (20) for $d_2''$ is obtained in the same way.

Remark 4: The formula (19) for $c_2''$ was given by Gao et al. [8]. They showed that in fact $c_2'' \le c_2$ for $x_1 \in [1.1, […]]$ for $\alpha = […]$. Thus in this region we can increase $n_2$ to $n_2'$ without adjusting the critical constant $c_2$ and the resulting procedure would still be conservative.
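The equivalence in Proposition 3 can be verified numerically: for any second stage outcome, the CHW statistic exceeds $c_2$ exactly when the sufficient statistic exceeds the data-dependent constant from (19). The following sketch uses hypothetical function names and illustrative inputs that are not taken from the paper.

```python
import math

def sufficient_boundary(c2, x1, n1, n2, n2_new):
    """Equation (19): critical constant for the sufficient statistic that
    reproduces the CHW rejection rule. The same formula with (d2, y1) in
    place of (c2, x1) gives equation (20)."""
    inner = math.sqrt(n2_new / n2) * (c2 * math.sqrt(n1 + n2) - math.sqrt(n1) * x1)
    return (inner + math.sqrt(n1) * x1) / math.sqrt(n1 + n2_new)

def chw_statistic(x1, x_stage2, n1, n2):
    """Equation (7): weights based on the planned sample sizes."""
    f = n1 / (n1 + n2)
    return math.sqrt(f) * x1 + math.sqrt(1.0 - f) * x_stage2

def sufficient_statistic(x1, x_stage2, n1, n2_new):
    """Equation (8): weights based on the re-estimated sample size."""
    f_new = n1 / (n1 + n2_new)
    return math.sqrt(f_new) * x1 + math.sqrt(1.0 - f_new) * x_stage2

# Illustrative values (assumptions): planned n1 = n2 = 50, second stage doubled.
x1, c2, n1, n2, n2_new = 1.2, 1.678, 50, 50, 100
c2_suff = sufficient_boundary(c2, x1, n1, n2, n2_new)
for x_stage2 in (-1.0, 0.0, 0.5, 1.0, 1.1, 1.2, 1.5, 3.0):
    reject_chw = chw_statistic(x1, x_stage2, n1, n2) > c2
    reject_suff = sufficient_statistic(x1, x_stage2, n1, n2_new) > c2_suff
    assert reject_chw == reject_suff
```

Because the adjusted constant depends on $x_1$, it cannot be tabulated before the trial; this is the sense in which the procedure only re-expresses the CHW rule.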

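5.2. Procedure $\mathcal{P}_2''$

As noted before, $\mathcal{P}_1''$ is simply a re-expression of $\mathcal{P}_1'$. To gain the potential power advantage associated with sufficient statistics we need to determine $(c_2'', d_2'')$ unconditionally by integrating over all possible outcomes $(x_1, y_1)$ and solving the equation obtained by equating the resulting FWER integral to $\alpha$. Here it is assumed that $(c_1, d_1)$ are prespecified. We denote the corresponding procedure by $\mathcal{P}_2''$.

First consider the problem of FWER control under $H_1$. It is well-known ([2], [14]) that the $\alpha$-level boundary $(c_1, c_2)$ does not control the FWER when used in conjunction with the sufficient test statistics $(X_1, X_2'')$. We can determine $c_2''$ unconditionally as follows. For any given $c_1$, the equation for FWER control under $H_1$ can be written as $\mathrm{FWER} = P_1 + P_2 = \alpha$ where $P_1 = \Phi(-c_1)$ (here $\Phi(\cdot)$ is the standard normal c.d.f.) and

$$\begin{aligned}
P_2 &= P_{H_1}(X_1 \le c_1, X_2'' > c_2'') \\
&= \int_{-\infty}^{c_1} P_{H_1}\left(\sqrt{f'}\,X^{(1)} + \sqrt{1-f'}\,X'^{(2)} > c_2'' \,\Big|\, X^{(1)} = x_1\right)\phi(x_1)\,dx_1 \\
&= \int_{-\infty}^{c_1} P_{H_1}\left(X'^{(2)} > \frac{c_2'' - \sqrt{f'}\,x_1}{\sqrt{1-f'}}\right)\phi(x_1)\,dx_1 \\
&= \int_{-\infty}^{c_1} \Phi\left(\frac{-c_2'' + \sqrt{f'}\,x_1}{\sqrt{1-f'}}\right)\phi(x_1)\,dx_1,
\end{aligned} \tag{21}$$

where $f' = n_1/(n_1 + n_2')$ is a function of $x_1$ since $n_2'$ depends on CP and hence on $x_1$. We can solve for $c_2''$ by setting

$$P_2 = \int_{-\infty}^{c_1} \Phi\left(\frac{-c_2'' + \sqrt{f'}\,x_1}{\sqrt{1-f'}}\right)\phi(x_1)\,dx_1 = \alpha - \Phi(-c_1). \tag{22}$$

Next the equation for determining $d_2''$ to control the FWER under $\bar{H}_1 \cap H_2$ can be derived as follows. First, $\mathrm{FWER} = P_1 + P_2 = \alpha$ where

$$P_1 = P_{\bar{H}_1 \cap H_2}(X_1 > c_1, Y_1 > d_1) = \Phi_2\left(-c_1 + \delta_1\sqrt{n_1},\ -d_1 \mid \rho\right) \tag{23}$$

is a function of the specified $(c_1, d_1)$ and hence can be evaluated given $n_1$, $\delta_1$ and $\rho$. Next,

$$\begin{aligned}
P_2 &= P_{\bar{H}_1 \cap H_2}(X_1 \le c_1, X_2'' > c_2'', Y_2'' > d_2'') \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{c_1} P\left(\sqrt{f'}\,X^{(1)} + \sqrt{1-f'}\,X'^{(2)} > c_2'',\ \sqrt{f'}\,Y^{(1)} + \sqrt{1-f'}\,Y'^{(2)} > d_2'' \,\Big|\, X^{(1)} = x_1, Y^{(1)} = y_1\right) \\
&\qquad\qquad \times\, \phi_2(x_1 - \delta_1\sqrt{n_1}, y_1 \mid \rho)\,dx_1\,dy_1 \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{c_1} P\left(X'^{(2)} > \frac{c_2'' - \sqrt{f'}\,x_1}{\sqrt{1-f'}},\ Y'^{(2)} > \frac{d_2'' - \sqrt{f'}\,y_1}{\sqrt{1-f'}}\right)\phi_2(x_1 - \delta_1\sqrt{n_1}, y_1 \mid \rho)\,dx_1\,dy_1 \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{c_1} \Phi_2\left(\frac{-c_2'' + \sqrt{f'}\,x_1}{\sqrt{1-f'}} + \delta_1\sqrt{n_2'},\ \frac{-d_2'' + \sqrt{f'}\,y_1}{\sqrt{1-f'}} \,\Big|\, \rho\right)\phi_2(x_1 - \delta_1\sqrt{n_1}, y_1 \mid \rho)\,dx_1\,dy_1.
\end{aligned} \tag{24}$$

Having earlier obtained $c_2''$ from (22), we can solve for $d_2''$ from the above equation by setting (24) equal to $\alpha - P_1$ where $P_1$ is given by (23).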
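Equation (22) can be solved for the primary constant by combining a one-dimensional quadrature with bisection. The sketch below does this for the fixed increase rule (13). It is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the conditional power inside the rule is computed from (10) with $x_1/\sqrt{n_1}$ plugged in for $\delta_1$, and the boundary constants in the usage note are commonly tabulated values assumed here.

```python
import math

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def n2_fixed_increase(x1, c2, n1, n2, gamma, cp_min, cp_max):
    """FI rule (13), with CP from (10) and delta_1 replaced by x1 / sqrt(n1)."""
    f = n1 / (n1 + n2)
    delta_hat = x1 / math.sqrt(n1)
    cp = 1.0 - Phi((c2 - x1 * math.sqrt(f)) / math.sqrt(1.0 - f)
                   - delta_hat * math.sqrt(n2))
    return gamma * n2 if cp_min <= cp <= cp_max else n2

def stage2_error(c2_suff, c1, c2, n1, n2, gamma, cp_min, cp_max,
                 lower=-8.0, steps=2000):
    """Left side of (22), by the trapezoid rule over x1 in [lower, c1]."""
    h = (c1 - lower) / steps
    total = 0.0
    for k in range(steps + 1):
        x1 = lower + k * h
        n2_new = n2_fixed_increase(x1, c2, n1, n2, gamma, cp_min, cp_max)
        f_new = n1 / (n1 + n2_new)  # adjusted information fraction, depends on x1
        weight = 0.5 if k in (0, steps) else 1.0
        arg = (-c2_suff + math.sqrt(f_new) * x1) / math.sqrt(1.0 - f_new)
        total += weight * Phi(arg) * phi(x1)
    return total * h

def solve_c2_sufficient(alpha, c1, c2, n1, n2, gamma, cp_min, cp_max):
    """Bisection on (22); the left side decreases in the candidate constant."""
    target = alpha - Phi(-c1)
    lo, hi = 0.0, 5.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if stage2_error(mid, c1, c2, n1, n2, gamma, cp_min, cp_max) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `solve_c2_sufficient(0.05, 2.373, 1.678, 50, 50, 2.0, 0.30, 0.70)` treats (2.373, 1.678) as the planned primary boundary. With `gamma=1.0` the sample size never changes and the solution should fall back to approximately the planned second stage constant. Solving (24) for the secondary constant follows the same pattern but needs a two-dimensional integral and values for $\delta_1$ and $\rho$.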

Table 1. $(c_2'', d_2'')$ Values for the OF1-PO Boundary Using the Fixed Increase (FI) Rule (13) for Second Stage Sample Size Re-estimation ($\alpha = .05$, […] $= 1.0$ and $\rho = 0.5$)

[CP_min, CP_max]    γ = 2               γ = 4
[30%, 70%]          (1.6717, 1.703)     (1.6575, )
[40%, 80%]          (1.664, 1.731)      (1.643, )
[50%, 90%]          (1.6505, 1.78)      (1.6, )

6. Numerical Evaluation of Critical Boundaries

Evaluation of the integrals in the above equations is complicated by the fact that $f' = n_1/(n_1 + n_2')$ is a function of CP (since $n_2'$ is) and hence of $x_1$. Numerical integration is more difficult for the Mehta-Pocock (MP) rule (12) than for the fixed increase (FI) rule since in the former case $f'$ varies continuously with $x_1$ while in the latter case $f'$ is a step function of $x_1$.

Computation of $c_2''$ is relatively straightforward since equation (21) does not involve any unknown parameters. However, computation of $d_2''$ is more complicated because both $P_1$ given by (23) and $P_2$ given by (24) depend on the unknown parameters $\delta_1$ and $\rho$. We will use the estimates $\hat{\delta}_1 = x_1/\sqrt{n_1}$ and $\hat{\rho} = r$ as proposed in Remark 3. Note that this makes $d_2''$ conditional on the first stage data. However, as argued before, the FWER is unconditionally controlled since it is controlled for every observed $(x_1, y_1, r)$.

Table 1 gives the computed values of $(c_2'', d_2'')$ for the OF1-PO boundary for $\alpha = .05$, […] $= 1.0$, $\rho = 0.5$ and $\gamma = 2, 4$ (typically, sample size increases would be limited to a factor of 2 but we also considered a factor of 4 to study any patterns in the computed values). We considered three promising zones: [30%, 70%], [40%, 80%] and [50%, 90%]. For the OF1 boundary, without sample size re-estimation, we have $c_1 = […]$ and $c_2 = […]$. Similarly, for the PO boundary, we have $d_1 = d_2 = […]$. Note that the $c_2''$ values are computed constants since they do not depend on the estimates of any unknown parameters. However, the $d_2''$ values are random variables since they depend on $(x_1, y_1, r)$. The values reported in the table are averages obtained over 10,000 replications of $(x_1, y_1, r)$.

It is seen that the $c_2''$ values are larger and the $d_2''$ values are smaller for $\gamma = 2$ than for $\gamma = 4$. In either case, $c_2'' < c_2$ and $d_2'' < d_2$. Thus the critical boundaries of $\mathcal{P}_2''$ are sharper than those of $\mathcal{P}_1'$. The $c_2''$ values become smaller and the $d_2''$ values become larger as the promising zone $[\mathrm{CP}_{\min}, \mathrm{CP}_{\max}]$ shifts to the right, i.e., as the sample size adjustment is made for higher conditional power values.

7. Power Comparison

In this section we compare the secondary powers (probability of rejecting a false $H_2$) of the following procedures via simulation: $\mathcal{P}_1'$, which uses separate $\alpha$-level boundaries $(c_1, c_2)$ and $(d_1, d_2)$; $\mathcal{P}_2'$, using either the known $\rho$ assumption (denoted as $\mathcal{P}_2'(\rho)$) or the upper confidence limit $\rho^*$ proposed in Part I for dealing with unknown $\rho$ (denoted as $\mathcal{P}_2'(\rho^*)$); and $\mathcal{P}_2''$, based on sufficient statistics. Note that since the first three procedures use the same primary test statistics $(X_1, X_2')$ and the same critical boundary $(c_1, c_2)$, their primary powers are identical. $\mathcal{P}_2''$ will have a different primary power because it uses different test statistics $(X_1, X_2'')$ and a different critical boundary $(c_1'', c_2'')$; we will report those results in a separate paper.

We considered the same six combinations of different parameters as in Table 1. Throughout we kept the following quantities fixed: primary-secondary boundary combination: OF1-PO, $\alpha = .05$, $n_1 = n_2 = 50$, […] $= 1.0$, $\rho = 0.5$ and the number of replications per simulation run $= 10{,}000$. In each scenario we varied […] from 1.0 to 4.0 in steps of 0.5. The results are presented in Tables 2-7.

Table 2. Secondary Powers of $\mathcal{P}_1'$, $\mathcal{P}_2'(\rho^*)$, $\mathcal{P}_2'(\rho)$ and $\mathcal{P}_2''$ Using [30%, 70%] Promising Zone for Conditional Power and the FI Rule (13) with $\gamma = 2$ for Modified Second Stage Sample Size $n_2'$ (OF1-PO Boundary Combination, $\alpha = .05$, $n_1 = n_2 = 50$, […] $= 1.0$, $\rho = 0.5$)

Table 3. Secondary Powers of $\mathcal{P}_1'$, $\mathcal{P}_2'(\rho^*)$, $\mathcal{P}_2'(\rho)$ and $\mathcal{P}_2''$ Using [40%, 80%] Promising Zone for Conditional Power and the FI Rule (13) with $\gamma = 2$ for Modified Second Stage Sample Size $n_2'$ (OF1-PO Boundary Combination, $\alpha = .05$, $n_1 = n_2 = 50$, […] $= 1.0$, $\rho = 0.5$)

Table 4. Secondary Powers of $\mathcal{P}_1'$, $\mathcal{P}_2'(\rho^*)$, $\mathcal{P}_2'(\rho)$ and $\mathcal{P}_2''$ Using [50%, 90%] Promising Zone for Conditional Power and the FI Rule (13) with $\gamma = 2$ for Modified Second Stage Sample Size $n_2'$ (OF1-PO Boundary Combination, $\alpha = .05$, $n_1 = n_2 = 50$, […] $= 1.0$, $\rho = 0.5$)

For each simulation run we computed the percentage power gain achieved by $\mathcal{P}_2'(\rho^*)$ over $\mathcal{P}_1'$ as a fraction of the maximum achievable power gain of the gold standard procedure $\mathcal{P}_2'(\rho)$ for known $\rho$ over $\mathcal{P}_1'$ (which assumes $\rho = 1$), as defined below:

$$\text{Power Gain (\%)} = \frac{\text{Power}(\rho = \rho^*) - \text{Power}(\rho = 1)}{\text{Power}(\text{Known } \rho) - \text{Power}(\rho = 1)} \times 100. \tag{25}$$

For all scenarios, these power gains range between 70% and 80%. Procedure $\mathcal{P}_2''$ based on the sufficient statistics generally has the highest power, even higher than that of the gold standard procedure $\mathcal{P}_2'(\rho)$, demonstrating the power advantage of sufficient statistics. However, one must weigh this power advantage of $\mathcal{P}_2''$ against the difficulty of computing its critical constants $(c_2'', d_2'')$.
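The relative power gain in (25) is a simple ratio. A minimal sketch follows; the function name and the power values in the usage line are hypothetical and are not results from the paper.

```python
def power_gain_percent(power_ucl, power_rho1, power_known_rho):
    """Equation (25): gain of the confidence-limit procedure over the
    rho = 1 procedure, as a percentage of the gain achievable when rho
    is known."""
    max_gain = power_known_rho - power_rho1
    if max_gain == 0:
        raise ValueError("known-rho and rho = 1 powers coincide; gain is undefined")
    return 100.0 * (power_ucl - power_rho1) / max_gain

# Hypothetical secondary powers: 0.60 (rho = 1), 0.68 (upper confidence
# limit), 0.70 (known rho).
gain = power_gain_percent(0.68, 0.60, 0.70)
```

A value of 100 would mean that estimating $\rho$ through its upper confidence limit costs nothing relative to knowing $\rho$, and a value of 0 would mean that it recovers none of the achievable gain.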

Table 5. Secondary Powers of $\mathcal{P}_1'$, $\mathcal{P}_2'(\rho^*)$, $\mathcal{P}_2'(\rho)$ and $\mathcal{P}_2''$ Using [30%, 70%] Promising Zone for Conditional Power and the FI Rule (13) with $\gamma = 4$ for Modified Second Stage Sample Size $n_2'$ (OF1-PO Boundary Combination, $\alpha = .05$, $n_1 = n_2 = 50$, […] $= 1.0$, $\rho = 0.5$)

Table 6. Secondary Powers of $\mathcal{P}_1'$, $\mathcal{P}_2'(\rho^*)$, $\mathcal{P}_2'(\rho)$ and $\mathcal{P}_2''$ Using [40%, 80%] Promising Zone for Conditional Power and the FI Rule (13) with $\gamma = 4$ for Modified Second Stage Sample Size $n_2'$ (OF1-PO Boundary Combination, $\alpha = .05$, $n_1 = n_2 = 50$, […] $= 1.0$, $\rho = 0.5$)

Table 7. Secondary Powers of $\mathcal{P}_1'$, $\mathcal{P}_2'(\rho^*)$, $\mathcal{P}_2'(\rho)$ and $\mathcal{P}_2''$ Using [50%, 90%] Promising Zone for Conditional Power and the FI Rule (13) with $\gamma = 4$ for Modified Second Stage Sample Size $n_2'$ (OF1-PO Boundary Combination, $\alpha = .05$, $n_1 = n_2 = 50$, […] $= 1.0$, $\rho = 0.5$)

8. Extension to Two Samples

The basic setup and notation are the same as defined in Section 7 of Part I. The stagewise test statistics are given by

$$X^{(k)} = \frac{\bar{U}_{1\cdot k} - \bar{U}_{2\cdot k}}{\sigma_1\sqrt{1/n_{1k} + 1/n_{2k}}} \quad \text{and} \quad Y^{(k)} = \frac{\bar{V}_{1\cdot k} - \bar{V}_{2\cdot k}}{\sigma_2\sqrt{1/n_{1k} + 1/n_{2k}}} \quad (k = 1, 2), \tag{26}$$

where $\bar{U}_{i\cdot k}$ and $\bar{V}_{i\cdot k}$ are the sample means of the observations $U_{ijk}$ and $V_{ijk}$ averaged over patients $j = 1, \ldots, n_{ik}$, respectively. Next denote by $\bar{U}_{i\cdot\cdot}$ and $\bar{V}_{i\cdot\cdot}$ the overall sample means of the primary and secondary endpoints data for the $i$th group and $n_i = n_{i1} + n_{i2}$ ($i = 1, 2$). The cumulative test statistics $(X_1, Y_1)$ and $(X_2, Y_2)$ for the first and second stages are as defined in (11) and (12), respectively, of Part I. Note that we can express $X_2$, $Y_2$ as follows:

$$X_2 = \frac{\sqrt{\frac{n_2}{n_1}}\, n_{11}\bar{U}_{1\cdot 1} - \sqrt{\frac{n_1}{n_2}}\, n_{21}\bar{U}_{2\cdot 1}}{\sigma_1\sqrt{n_1 + n_2}} + \frac{\sqrt{\frac{n_2}{n_1}}\, n_{12}\bar{U}_{1\cdot 2} - \sqrt{\frac{n_1}{n_2}}\, n_{22}\bar{U}_{2\cdot 2}}{\sigma_1\sqrt{n_1 + n_2}}$$

and

$$Y_2 = \frac{\sqrt{\frac{n_2}{n_1}}\, n_{11}\bar{V}_{1\cdot 1} - \sqrt{\frac{n_1}{n_2}}\, n_{21}\bar{V}_{2\cdot 1}}{\sigma_2\sqrt{n_1 + n_2}} + \frac{\sqrt{\frac{n_2}{n_1}}\, n_{12}\bar{V}_{1\cdot 2} - \sqrt{\frac{n_1}{n_2}}\, n_{22}\bar{V}_{2\cdot 2}}{\sigma_2\sqrt{n_1 + n_2}}.$$

These can be expressed as linear combinations of the stagewise test statistics $(X^{(k)}, Y^{(k)})$ for $k = 1, 2$ only if $n_{11} = n_{21}$ and $n_{12} = n_{22}$, i.e., if the treatment and control groups have a balanced sample size allocation in both stages. From now on we will assume this condition (which may not hold exactly when the patients are randomized to the treatment and control groups). Then it is easy to show that $X_2$, $Y_2$ can be written as follows:

$$X_2 = \sqrt{f}\, X^{(1)} + \sqrt{1-f}\, X^{(2)} \quad \text{and} \quad Y_2 = \sqrt{f}\, Y^{(1)} + \sqrt{1-f}\, Y^{(2)}, \tag{27}$$

where

$$f = \frac{n_{11} + n_{21}}{n_1 + n_2} = \frac{n_{\cdot 1}}{n_{\cdot 1} + n_{\cdot 2}} \tag{28}$$

is the fraction of the total sample size allocated to the first stage, which is analogous to the information fraction $f = n_1/(n_1 + n_2)$ in the one-sample case.

Next consider the case where the second stage total sample size $n_{\cdot 2} = n_{12} + n_{22}$ is re-estimated to $n_{\cdot 2}' = n_{12}' + n_{22}'$. Let $\bar{U}_{i\cdot 2}'$ and $\bar{V}_{i\cdot 2}'$ ($i = 1, 2$) be the sample means of the second stage data with re-estimated sample sizes. The corresponding second stage standardized test statistics, $X'^{(2)}$ and $Y'^{(2)}$, are defined as in (26) for $k = 2$ but with all quantities in the formulae replaced by their primed analogs. Once again assuming a balanced sample size allocation with $n_{11} = n_{21}$ and $n_{12}' = n_{22}'$, the second stage CHW statistics are given by (7) with $X^{(1)}, X'^{(2)}, Y^{(1)}, Y'^{(2)}$ for the two-sample problem as defined above and $f$ given by (28). Similarly, the sufficient statistics are given by (8) and (9) with

$$f' = \frac{n_{\cdot 1}}{n_{\cdot 1} + n_{\cdot 2}'}.$$

An alternative and more direct way of defining the sufficient statistics is

$$X_2'' = \frac{\bar{U}_{1\cdot\cdot} - \bar{U}_{2\cdot\cdot}}{\sigma_1\sqrt{1/n_1 + 1/n_2}} \quad \text{and} \quad Y_2'' = \frac{\bar{V}_{1\cdot\cdot} - \bar{V}_{2\cdot\cdot}}{\sigma_2\sqrt{1/n_1 + 1/n_2}}, \tag{29}$$

where $n_1$ and $n_2$ now denote the total numbers of patients in the two groups, including the re-estimated second stage. This definition does not require the assumption of the balanced sample size allocation. We will use this definition in the example below.
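The relationship between the direct definition (29) and the weighted form (27) can be checked numerically. The sketch below uses hypothetical function names and made-up stage means; it shows that under a balanced allocation with the planned sample sizes the cumulative two-sample statistic coincides with the weighted combination of the stagewise statistics.

```python
import math

def two_sample_statistic(mean_trt, mean_ctl, sigma, n_trt, n_ctl):
    """Standardized difference of two group means, as in (26) and (29)."""
    return (mean_trt - mean_ctl) / (sigma * math.sqrt(1.0 / n_trt + 1.0 / n_ctl))

def pooled_mean(mean_a, n_a, mean_b, n_b):
    """Overall mean of one group from its two stagewise means."""
    return (n_a * mean_a + n_b * mean_b) / (n_a + n_b)

# Made-up illustration: balanced allocation, m1 patients per arm in stage 1
# and m2 per arm in stage 2, known sigma = 1.
m1, m2, sigma = 75, 75, 1.0
trt1, ctl1 = 0.30, 0.05   # stage 1 group means (hypothetical)
trt2, ctl2 = 0.25, -0.02  # stage 2 group means (hypothetical)

x_stage1 = two_sample_statistic(trt1, ctl1, sigma, m1, m1)
x_stage2 = two_sample_statistic(trt2, ctl2, sigma, m2, m2)
f = (m1 + m1) / (m1 + m1 + m2 + m2)  # equation (28)

weighted = math.sqrt(f) * x_stage1 + math.sqrt(1.0 - f) * x_stage2   # (27)
direct = two_sample_statistic(pooled_mean(trt1, m1, trt2, m2),
                              pooled_mean(ctl1, m1, ctl2, m2),
                              sigma, m1 + m2, m1 + m2)               # (29)
assert abs(weighted - direct) < 1e-12
```

With an unbalanced allocation the two quantities generally differ, which is why the direct form (29) is the convenient one for real data.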
We have assumed that the standard deviations $\sigma_1$ and $\sigma_2$ are known, but in practice they must be estimated. The first stage pooled (from the treatment and the control groups) estimate of $\sigma_1$ is given by

$$\hat{\sigma}_1^{(1)} = \sqrt{\frac{\sum_{i=1}^{2}\sum_{j=1}^{n_{i1}}\left(U_{ij1} - \bar{U}_{i\cdot 1}\right)^2}{n_{11} + n_{21} - 2}}$$

with an analogous expression for $\hat{\sigma}_2^{(1)}$. These estimates will be used to calculate $X^{(1)}$ and $Y^{(1)}$. The second stage pooled estimates, $\hat{\sigma}_1^{(2)}$ and $\hat{\sigma}_2^{(2)}$, have similar expressions except that the sample sizes $n_{i2}$ are changed to the re-estimated sample sizes $n_{i2}'$. Since $X'^{(2)}$ and $Y'^{(2)}$ must be based only on the second stage data, these second stage pooled estimates are used in their calculation and not the overall pooled estimates $\hat{\sigma}_1$ and $\hat{\sigma}_2$, where

$$\hat{\sigma}_1 = \sqrt{\frac{\sum_{i=1}^{2}\sum_{j=1}^{n_{i1}}\left(U_{ij1} - \bar{U}_{i\cdot\cdot}\right)^2 + \sum_{i=1}^{2}\sum_{j=1}^{n_{i2}'}\left(U_{ij2} - \bar{U}_{i\cdot\cdot}\right)^2}{n_1 + n_2 - 2}}$$

with an analogous expression for $\hat{\sigma}_2$. We use these overall pooled estimates in the calculation of the sufficient statistics (29) in the example below.

9. Example

We will use the same ISOLDE trial example from Part I. As discussed there, we will use the rate of decline in FEV1 as the primary endpoint and the rate of decline in FVC as the secondary endpoint, where the rates of decline are computed by dividing the difference between the final measurement (the timing of which varies from patient to patient depending upon how many visits they completed) and the baseline measurement (at randomization) by the time period (in months) between the two measurements.

To allow for sample size re-estimation with increased sample size at the second stage, we assume that the original trial was planned with a total of 300 patients with $n_{\cdot 1} = n_{\cdot 2} = 150$ patients in each stage (randomized between the treatment and the placebo). This design corresponds to primary power of 90% to detect a clinically significant difference of $\delta_1 = (\mu_1 - \mu_2)/\sigma$ or approximately […]. The sample size re-estimation rule is: if CP at the interim look is in the promising zone [50%, 90%] then we would use $n_{\cdot 2}' = 2n_{\cdot 2} = 300$ patients, bringing the total number of patients to 450. We further assume that the OF1-PO boundary combination with $\alpha = 0.05$ is used for the primary and the secondary endpoints. The corresponding OF1 critical values are $c_1 = […]$, $c_2 = […]$ and the PO critical values are $d_1 = d_2 = […]$.

Of the first 150 patients, $n_{11} = […]$ were randomized to the treatment and $n_{21} = 81$ to the control. At the interim look we have the following summary statistics:

$$\bar{U}_{1\cdot 1} = […],\ \bar{U}_{2\cdot 1} = […],\ \bar{V}_{1\cdot 1} = […],\ \bar{V}_{2\cdot 1} = […],\ \hat{\sigma}_1^{(1)} = […],\ \hat{\sigma}_2^{(1)} = […],\ r = […].$$

From these summary statistics we can calculate $X^{(1)} = […]$ and $Y^{(1)} = […]$. Since $X^{(1)} < c_1$, sampling continues to the second stage.
To determine if the second stage sample size needs to be re-estimated, we calculate the conditional power (CP). We have $\hat{\delta}_1 = (\bar{U}_{1\cdot 1} - \bar{U}_{2\cdot 1})/\hat{\sigma}_1^{(1)} = […]$. CP is given by the formula (10) with $n_2$ replaced by its two-sample analog $n_{12} n_{22}/(n_{12} + n_{22})$. At the interim look $n_{12}$ and $n_{22}$ are as yet unobserved, so we take $n_{12} = n_{22} = 150/2 = 75$, which gives $n_{12} n_{22}/(n_{12} + n_{22}) = […]$. Also, $f = 0.5$. Substituting these values in (10) we get

\[
\mathrm{CP} = 1 - \Phi([\ldots]) = \Phi([\ldots]) = [\ldots].
\]

Since CP falls in the promising zone we increase the second stage sample from 150 to 300.

It would be instructive to compare this sample size with what one gets using the Gao-Ware-Mehta formula (11). Modified for the two-sample setup, that formula becomes

\[
n_2^* = \frac{4}{\hat{\delta}_1^2} \left\{ \frac{1}{\sqrt{n_2}} \left[ c_2 \sqrt{n_1 + n_2} - x_1 \sqrt{n_1} \right] + z_\beta \right\}^2.
\]
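As a rough numerical companion to the two-sample Gao-Ware-Mehta formula, the sketch below evaluates it in Python. Several inputs are assumptions rather than values taken from the text: `x1` is taken to be the first stage statistic 1.7783, `delta_hat` is back-calculated from it assuming 150 - 81 = 69 treatment and 81 control patients at the first stage, and `c2_assumed = 1.9774` is an assumed final critical value (approximately the two-look O'Brien-Fleming constant at one-sided level 0.025). The function name is hypothetical.

```python
from math import sqrt, ceil
from statistics import NormalDist

def gwm_second_stage_size(delta_hat, n1, n2, c2, x1, power=0.90):
    """Two-sample form of the Gao-Ware-Mehta second stage sample size.

    n1, n2 are the planned total sample sizes of stages 1 and 2,
    c2 is the final critical value, x1 the observed first stage statistic.
    """
    z_beta = NormalDist().inv_cdf(power)  # about 1.28 for 90% power
    bracket = (c2 * sqrt(n1 + n2) - x1 * sqrt(n1)) / sqrt(n2) + z_beta
    return 4.0 / delta_hat ** 2 * bracket ** 2

# Assumed inputs (see the lead-in); none of these is asserted to be the
# authors' exact value.
n11, n21 = 150 - 81, 81                          # first stage group sizes
x1 = 1.7783                                      # first stage statistic
delta_hat = x1 / sqrt(n11 * n21 / (n11 + n21))   # back-calculated effect size
c2_assumed = 1.9774                              # assumed final critical value

n2_star = gwm_second_stage_size(delta_hat, 150, 150, c2_assumed, x1)
n2_star_rounded = ceil(n2_star)
```

Under these assumed inputs the result lands near 249, so rounding up gives 250 patients, noticeably fewer than the 300 used by the promising zone rule.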

Substituting $\hat{\delta}_1 = […]$, $n_1 = n_2 = 150$, $c_2 = […]$, $x_1 = […]$ and $z_\beta = 1.28$ for $1 - \beta = 0.90$, we get $n_2^* = 249.3$ or 250.

In the actual data if we consider the next 300 patients then we find that $n_{12}^* = […]$ were randomized to the treatment and $n_{22}^* = 139$ to the control, resulting in a total of $n_1 = 230$ patients on the treatment and $n_2 = 220$ on the control. The second stage summary statistics are as follows:

$\bar{U}_{1\cdot 2} = […]$, $\bar{U}_{2\cdot 2} = […]$, $\bar{V}_{1\cdot 2} = […]$, $\bar{V}_{2\cdot 2} = […]$, $\hat{\sigma}_1^{(2)} = […]$, $\hat{\sigma}_2^{(2)} = […]$.

From these summary statistics we calculate $X^{(2)} = 2.9195$ and $Y^{(2)} = 2.0069$. The CHW statistics equal

\[
X = \sqrt{f}\, X^{(1)} + \sqrt{1 - f}\, X^{(2)} = \frac{1}{\sqrt{2}}(1.7783) + \frac{1}{\sqrt{2}}(2.9195) = [\ldots]
\]

and

\[
Y = \sqrt{f}\, Y^{(1)} + \sqrt{1 - f}\, Y^{(2)} = \frac{1}{\sqrt{2}}(1.0151) + \frac{1}{\sqrt{2}}(2.0069) = [\ldots].
\]

$P_1$ compares $X$ and $Y$ with $c_2 = […]$ and $d_2 = 2.0661$, respectively. Thus $P_1$ rejects both $H_1$ and $H_2$. P compares $X$ with $c_2 = […]$ but $Y$ with $d = 2.0665$, which is determined from (16) given the observed values of $(X_1, Y_1)$, $\rho$ estimated by $r = […]$ and $\delta_1$ estimated by $\hat{\delta}_1 = […]$. So we obtain the same result.

Next we will apply P based on the sufficient statistics. The overall sample means and the standard deviations are

$\bar{U}_{1\cdot\cdot} = […]$, $\bar{U}_{2\cdot\cdot} = […]$, $\bar{V}_{1\cdot\cdot} = […]$, $\bar{V}_{2\cdot\cdot} = […]$, $\hat{\sigma}_1 = […]$, $\hat{\sigma}_2 = […]$.

Using (9) the sufficient statistics can be calculated as

\[
X = \frac{[\ldots]}{[\ldots]\sqrt{1/230 + 1/220}} = [\ldots] \quad \text{and} \quad Y = \frac{[\ldots]}{[\ldots]\sqrt{1/230 + 1/220}} = [\ldots].
\]

To apply P we calculated the modified critical constants $c = […]$ from (3) and $d = 2.0555$ from (4) given $(X_1, Y_1) = (1.7783, […])$, $\rho$ estimated by $r = […]$ and $\delta_1$ estimated by $\hat{\delta}_1 = […]$. These values are slightly smaller than $c_2$ and $d_2$, respectively. Once again, $X > c$ and $Y > d$, so we get the same result.

10. Discussion

Although P based on sufficient statistics is the most powerful among the procedures that were compared, there are some practical drawbacks associated with it. First, computation of its critical constants $(c, d)$ is rather complicated. Second, and more important, it requires that $(c, d)$ be prespecified, which makes the sample size increase binding if the conditional power falls in the promising zone; otherwise the stopping boundary will not be valid. This ties the hands of the Data Monitoring Committee, which may not want to increase the sample size because of other reasons such as slow accrual rate or excessive toxicity.
On the other hand, procedure P($\rho$) gives the flexibility to change the plans about sample size adaptation without the risk of inflating the type I error. These practical considerations must be weighed against the fact that the procedure P has the higher secondary power. Thus, although the CHW statistics are less efficient, they are more practically applicable because of their flexibility.
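The flexibility of the CHW statistics comes from their weights, which are fixed by the planned information fraction $f$ and do not change when the second stage sample size is altered. A minimal Python sketch of the combination follows, using the stage-wise statistics from the example in Section 9 as read from the text and $f = 0.5$. The function name `chw_statistic` is hypothetical.

```python
from math import sqrt

def chw_statistic(stage1, stage2, f):
    """CHW combination with prespecified weights sqrt(f) and sqrt(1 - f).

    The weights depend only on the planned information fraction f,
    not on the second stage sample size that was actually used.
    """
    return sqrt(f) * stage1 + sqrt(1.0 - f) * stage2

f_planned = 0.5  # planned information fraction at the interim look

# Stage-wise statistics as read from the example in Section 9.
x_chw = chw_statistic(1.7783, 2.9195, f_planned)  # primary endpoint
y_chw = chw_statistic(1.0151, 2.0069, f_planned)  # secondary endpoint
```

Because the second stage was doubled from 150 to 300 patients while the weights stayed at $1/\sqrt{2}$ each, the second stage data are down-weighted relative to their actual share of the information. That is the loss of efficiency the discussion refers to.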

The problem of binary data was considered in the Discussion section of Part I. It was noted there that extending the normal data results to binary data is difficult, especially in the two-sample case, because the correlation coefficient between the primary and the secondary endpoints depends on the corresponding success probabilities for the treatment and control groups. These probabilities must be equal (i.e., $H_1$ and $H_2$ must be true) for the assumption of a common correlation coefficient for the two groups to be valid.

Acknowledgment

The authors thank the referees for useful suggestions, Dr. Lingyun Liu for providing the proof of Proposition 1 and Dr. Stuart Pocock for providing the data used in the ISOLDE trial.

References

1. Tamhane AC, Mehta CR and Liu L. Testing a primary and secondary endpoint in a group sequential design. Biometrics 2010; 66.
2. Cui L, Hung HMJ and Wang SJ. Modification of sample size in group sequential clinical trials. Biometrics 1999; 55.
3. Tamhane AC, Wu Y and Mehta CR. An adaptive two-stage group sequential procedure for testing a primary and a secondary endpoint using the sample estimate of the unknown correlation between the endpoints. Submitted for publication.
4. Fisher LD. Self-designing clinical trials. Statistics in Medicine 1998; 17.
5. Jennison C and Turnbull BW. Mid-course sample size modification in clinical trials based on the observed treatment effect. Statistics in Medicine 2003; 22.
6. Jennison C and Turnbull BW. Adaptive and nonadaptive group sequential tests. Biometrika 2006; 93.
7. Jennison C and Turnbull BW. Adaptive seamless designs: Selection and prospective testing of hypotheses. Journal of Biopharmaceutical Statistics 2007; 17.
8. Gao P, Ware JH and Mehta CR. Sample size re-estimation for adaptive sequential design. Journal of Biopharmaceutical Statistics 2008; 18.
9. Mehta CR and Pocock SJ. Adaptive increase in sample size when interim results are promising: A practical guide with examples. Statistics in Medicine 2010; published online.
10. Lan KKG and DeMets DL. Discrete sequential boundaries for clinical trials. Biometrika 1983; 70.
11. O'Brien PC and Fleming TR. A multiple testing procedure for clinical trials. Biometrics 1979; 35.
12. Pocock SJ. Group sequential methods in the design and analysis of clinical trials. Biometrika 1977; 64.
13. Liu L. Personal communication.
14. Proschan MA and Hunsberger SA. Designed extension of studies based on conditional power. Biometrics 1995; 51.
