Group Sequential Methods. for Clinical Trials. Christopher Jennison, Dept of Mathematical Sciences, University of Bath, UK

Similar documents
Interim Monitoring of. Clinical Trials. Christopher Jennison, Dept of Mathematical Sciences, University of Bath, UK

Group Sequential Methods. for Clinical Trials. Christopher Jennison, Dept of Mathematical Sciences, University of Bath, UK

Group Sequential Tests for Delayed Responses. Christopher Jennison. Lisa Hampson. Workshop on Special Topics on Sequential Methodology

Monitoring clinical trial outcomes with delayed response: incorporating pipeline data in group sequential designs. Christopher Jennison

Group Sequential Designs: Theory, Computation and Optimisation

Interim Monitoring of Clinical Trials: Decision Theory, Dynamic Programming. and Optimal Stopping

Optimising Group Sequential Designs. Decision Theory, Dynamic Programming. and Optimal Stopping

The Design of Group Sequential Clinical Trials that Test Multiple Endpoints

Independent Increments in Group Sequential Tests: A Review

Sample size re-estimation in clinical trials. Dealing with those unknowns. Chris Jennison. University of Kyoto, January 2018

Group Sequential Tests for Delayed Responses

Monitoring clinical trial outcomes with delayed response: Incorporating pipeline data in group sequential and adaptive designs. Christopher Jennison

The SEQDESIGN Procedure

The Design of a Survival Study

Comparing Adaptive Designs and the. Classical Group Sequential Approach. to Clinical Trial Design

Testing a secondary endpoint after a group sequential test. Chris Jennison. 9th Annual Adaptive Designs in Clinical Trials

An Adaptive Futility Monitoring Method with Time-Varying Conditional Power Boundary

SAS/STAT 15.1 User s Guide The SEQDESIGN Procedure

Optimal group sequential designs for simultaneous testing of superiority and non-inferiority

SAMPLE SIZE RE-ESTIMATION FOR ADAPTIVE SEQUENTIAL DESIGN IN CLINICAL TRIALS

Adaptive Designs: Why, How and When?

Pubh 8482: Sequential Analysis

CHL 5225H Advanced Statistical Methods for Clinical Trials: Multiplicity

Pubh 8482: Sequential Analysis

Biometrika Trust. Biometrika Trust is collaborating with JSTOR to digitize, preserve and extend access to Biometrika.

Published online: 10 Apr 2012.

Pubh 8482: Sequential Analysis

Statistics and Probability Letters. Using randomization tests to preserve type I error with response adaptive and covariate adaptive randomization

Multiple Testing in Group Sequential Clinical Trials

Group-Sequential Tests for One Proportion in a Fleming Design

Inverse Sampling for McNemar s Test

Power assessment in group sequential design with multiple biomarker subgroups for multiplicity problem

Repeated confidence intervals for adaptive group sequential trials

Overrunning in Clinical Trials: a Methodological Review

METHODS OF EVALUATION OF PERFORMANCE OF ADAPTIVE DESIGNS ON TREATMENT EFFECT INTERVALS AND METHODS OF

Lecture 7 Time-dependent Covariates in Cox Regression

Frequency table: Var2 (Spreadsheet1) Count Cumulative Percent Cumulative From To. Percent <x<=

Statistical Aspects of Futility Analyses. Kevin J Carroll. nd 2013

Pubh 8482: Sequential Analysis

Adaptive Treatment Selection with Survival Endpoints

Division of Pharmacoepidemiology And Pharmacoeconomics Technical Report Series

Group sequential designs with negative binomial data

Group sequential designs for Clinical Trials with multiple treatment arms

Pubh 8482: Sequential Analysis

4. Issues in Trial Monitoring

Stat Seminar 10/26/06. General position results on uniqueness of nonrandomized Group-sequential decision procedures in Clinical Trials Eric Slud, UMCP

A Gatekeeping Test on a Primary and a Secondary Endpoint in a Group Sequential Design with Multiple Interim Looks

Reports of the Institute of Biostatistics

A simulation study for comparing testing statistics in response-adaptive randomization

Calculation of Efficacy and Futiltiy Boundaries using GrpSeqBnds (Version 2.3)

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

Non-Inferiority Tests for the Ratio of Two Proportions in a Cluster- Randomized Design

Adaptive Extensions of a Two-Stage Group Sequential Procedure for Testing a Primary and a Secondary Endpoint (II): Sample Size Re-estimation

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH

A Type of Sample Size Planning for Mean Comparison in Clinical Trials

Good Confidence Intervals for Categorical Data Analyses. Alan Agresti

INTERIM MONITORING AND CONDITIONAL POWER IN CLINICAL TRIALS. by Yanjie Ren BS, Southeast University, Nanjing, China, 2013

Exact Inference for Adaptive Group Sequential Designs

c Copyright 2014 Navneet R. Hakhu

Two-Phase, Three-Stage Adaptive Designs in Clinical Trials

TMA 4275 Lifetime Analysis June 2004 Solution

Bios 6649: Clinical Trials - Statistical Design and Monitoring

Sample size determination for a binary response in a superiority clinical trial using a hybrid classical and Bayesian procedure

BIOS 312: Precision of Statistical Inference

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

Survival Analysis for Case-Cohort Studies

Sample Size Determination

Group sequential designs for negative binomial outcomes

Lecture 12. Multivariate Survival Data Statistics Survival Analysis. Presented March 8, 2016

clinical trials Abstract Multi-arm multi-stage (MAMS) trials can improve the efficiency of the drug

Generalized Likelihood Ratio Statistics and Uncertainty Adjustments in Efficient Adaptive Design of Clinical Trials

Group Sequential Trial with a Biomarker Subpopulation

Multivariate Survival Analysis

AN INVESTIGATION OF TWO-STAGE TESTS. Marc Vandemeulebroecke. Otto-von-Guericke-Universität Magdeburg. and. Schering AG

4. Comparison of Two (K) Samples

On Using Truncated Sequential Probability Ratio Test Boundaries for Monte Carlo Implementation of Hypothesis Tests

PASS Sample Size Software. Poisson Regression

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

Statistics in medicine

Welcome! Webinar Biostatistics: sample size & power. Thursday, April 26, 12:30 1:30 pm (NDT)

Survival Regression Models

Two-stage Adaptive Randomization for Delayed Response in Clinical Trials

Power and Sample Size Calculations with the Additive Hazards Model

Sample Size and Power I: Binary Outcomes. James Ware, PhD Harvard School of Public Health Boston, MA

Accelerated Failure Time Models

Monitoring Paired Binary Surgical Outcomes Using Cumulative Sum Charts

Are Flexible Designs Sound?

Designing multi-arm multi-stage clinical trials using a risk-benefit criterion for treatment selection

Session 3 The proportional odds model and the Mann-Whitney test

When enough is enough: early stopping of biometrics error rate testing

Sequential Analysis of Quality of Life Measurements Using Mixed Rasch Models

6-1. Canonical Correlation Analysis

Lecture 7. Proportional Hazards Model - Handling Ties and Survival Estimation Statistics Survival Analysis. Presented February 4, 2016

Semiparametric Regression

PubH 7405: REGRESSION ANALYSIS INTRODUCTION TO LOGISTIC REGRESSION

High-Throughput Sequencing Course

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Truncated Life Test Sampling Plan Under Odd-Weibull Distribution

Whether to use MMRM as primary estimand.

Transcription:

Group Sequential Methods for Clinical Trials Christopher Jennison, Dept of Mathematical Sciences, University of Bath, UK http://www.bath.ac.uk/οmascj PSI, London, 22 October 2003 1

Plan of talk 1. Why sequential monitoring? 2. 1929, Dodge & Romig: 2stage sampling 3. 1940s: methods for manufacturing 4. 1950s and 0s: methods for medical studies 5. 1970s: group sequential tests. Types of test, including equivalence, and types of stopping rule 7. Sequential theory, including survival data 8. A unified approach for group sequential design, monitoring and analysis 9. Nuisance parameters: updating a design 10. Survival data example 11. Error spending 2

1. Motivation of interim monitoring In clinical trials, animal trials and epidemiological studies there are reasons of ethics administration (accrual,compliance,...) economics to monitor progress and accumulating data. Subjects should not be exposed to unsafe, ineffective or inferior treatments. National and international guidelines call for interim analyses to be performed and reported. It is now standard practice for medical studies to have a Data and Safety Monitoring Board to oversee the study and consider the option of early termination. 3

The need for special methods There is a danger that multiple looks at data can lead to overinterpretation of interim results Overall Type I error rate applying repeated significance tests at ff = 5% to accumulating data Number of tests Error rate 1 0.05 2 0.08 3 0.11 5 0.14 10 0.19 20 0.25 100 0.37 1 1.00 Pocock (1983) Clinical Trials Table 10.1, Armitage, et al. (199), Table 2. 4

2. Acceptance sampling Dodge & Romig (1929), Bell Systems Technical Journal. Components are classified as effective or defective. A batch is only accepted if the proportion of defectives in a sample is sufficiently low. Number defective Reject Reject Continue Accept Accept n 1 n 1 + n 2 Number of items sampled 5

3. Manufacturing production Barnard and Wald developed methods for industrial production and development. Wald (1947) published his Sequential Probability Ratio Test (SPRT) for testing between two simple hypotheses. Stopping boundaries and continuation region Sample sum S n Φ ψψψψψψψψψψψψψψψψψψψψψψψψψψψψψψψψψψψ Φ X @Φ ψψψψψψψψψψψψψψψψψψψψψψψψψψψψψψψψψψψ X n

The SPRT Ostensibly, the SPRT tests between H 1 : = 1 and H 2 : = 2. low high ff ff 1 2 In reality, it is usually used to choose between two sets of values. The SPRT has an optimality property if only 1 and 2 need be considered. However, it assumes continuous monitoring of the data and has no upper bound on the possible sample size. 7

4. Sequential monitoring of clinical trials In the 1950s, Armitage and Bross took sequential testing from industrial applications to comparative clinical trials. Their plans were fully sequential but with a bounded maximum sample size. The restricted test, Armitage (1957), S n Reject H 0 (((((((((((((((((((((((((((((( Accept H 0 n hhhhhhhhhhhhhhhhhhhhhhhhhhhhhh Reject H 0 testing H 0 : = 0 against = 0, where is the treatment difference. 8

Armitage s repeated significance test Armitage, McPherson & Rowe (199) applied a significance test of H 0 : = 0 after each new pair of observations. Numerical calculations gave the nominal significance level ff 0 to use in each of N repeated significance tests for an overall type I error probability ff. S n, ΦΦ A A@! οοοοοψψψψψψψ(((((((((( Reject H 0! Accept H 0 n l Reject H 0 QHHaaPPPPXXXXX```````hhhhhhhhhh 9

5. Group sequential tests In practice, one can only analyse a clinical trial on a small number of occasions. Shaw (19): talked of a block sequential analysis. Elfring & Schultz (1973): gave group sequential designs to compare two binary responses. McPherson (1974): use of repeated significance tests at a small number of analyses. Pocock (1977): provided clear guidelines for group sequential tests with given type I error and power. O Brien & Fleming (1979): an alternative to Pocock s repeated significance tests. 10

Pocock s repeated significance test To test H 0 : = 0 against = 0. Use standardised test statistics Z k, k = 1;:::;K. Stop to reject H 0 at analysis k if jz k j > c: If H 0 has not been rejected by analysis K, stop and accept H 0. Z k Reject H 0 Accept H 0 k Reject H 0 11

. Types of hypothesis testing problems Twosided test: testing H 0 : = 0 against = 0. Onesided test: testing H 0 :» 0 against > 0. Equivalence tests: onesided to show treatment A is as good as treatment B, within a margin ffi. twosided to show two treatment formulations are equal within an accepted tolerance. 12

A twosided equivalence test Conduct a test of H 0 : = 0 vs = 0 with type I error rate ff and power 1 fi at = ±ffi. Declare equivalence if H 0 is accepted. PrfDeclare equivalenceg 1 ff fi........................................... ffi 0 ffi Here, fi represents the consumer s risk. In design and implementation, give priority to attaining power 1 fi at = ±ffi. 13

Types of early stopping 1. Stopping to reject H 0 : no treatment difference Allows progress from a positive outcome Avoids exposing further patients to the inferior treatment Appropriate if no further checks are needed on, say, treatment safety or longterm effects. 2. Stopping to accept H 0 : no treatment difference Stopping for futility or abandoning a lost cause Saves time and effort when a study is unlikely to lead to a positive conclusion. 14

Onesided tests If we are only interested in showing that a new treatment is superior to a control, we should test H 0 :» 0 against > 0, requiring PrfReject H 0 j = 0g = ff, PrfReject H 0 j = ffig = 1 fi. A typical boundary is: Z k H P P P ρ ρρ Reject H 0 ```!!!! " "" Accept H 0 k E.g., Whitehead (1997), Pampallona & Tsiatis (1994). 15

Twosided tests with early stopping for H 0 Early stopping in favour of H 0 may be included in a twosided test to abandon a lost cause. Z k H H a aa Reject H 0 XXX hhh P PP Accept H 0 k Q QQ Φ Φ!!! οοο ((( Reject H 0 1

Onesided tests of H 0 : = 0 vs > 0 Early stopping to Z k Reject H 0 reject H 0 or accept H 0 I k Accept H 0 Early stopping only Z k Reject H 0 to reject H 0 I k Accept H 0 Abandoning a lost Z k Reject H 0 cause: Early stopping only I k to accept H 0 Accept H 0 17

Twosided tests of H 0 : = 0 vs = 0 Early stopping to reject H 0 Z k Reject H 0 Accept H 0 I k Reject H 0 An inner wedge: Early stopping to reject H 0 or Z k Reject H 0 Accept H 0 I k accept H 0 Reject H 0 Z k Reject H 0 Abandoning a lost cause: Only an inner wedge Accept H 0 I k Reject H 0 18

7. Joint distribution of parameter estimates Reference: Jennison & Turnbull, Ch. 11 Suppose our main interest is in the parameter and let b k be the estimate of based on data at analysis k. The information for at analysis k is I k = fvar( b k )g 1 ; k = 1;:::;K: Canonical joint distribution of b 1 ;:::; b K In very many situations, b 1 ;:::; b K are approximately multivariate normal, b k ο N( ; fi k g 1 ); k = 1;:::;K; and Cov( b k1 ; b k2 ) = Var( b k2 ) = fi k2 g 1 for k 1 < k 2 : 19

Sequential distribution theory The preceding results for the joint distribution of b 1 ;:::; b K can be demonstrated directly for: a single normal mean, = μ A μ B ; the effect size in a comparison of two normal means. The results also apply when is a parameter in: a general normal linear, a general model fitted by maximum likelihood (large sample theory). So, we have the theory to support general comparisons, including adjustment for covariates if required. 20

Canonical joint distribution of zstatistics In testing H 0 : = 0,thestandardised statistic at analysis k is Z k = b k q Var( b k ) = b k p Ik : For this, (Z 1 ;:::;Z K ) is multivariate normal, Z k ο N( p I k ; 1); k = 1;:::;K, Cov(Z k1 ;Z k2 )= q I k1 =I k2 for k 1 < k 2. 21

Canonical joint distribution of score statistics The score statistics S k = Z k p Ik, are also multivariate normal with S k ο N( I k ; I k ); k = 1;:::;K: The score statistics possess the independent increments property, Cov(S k S k 1 ; S k 0 S k 0 1 ) = 0 for k = k0 : It can be helpful to know the score statistics behave as Brownian motion with drift observed at times I 1 ;:::;I K. 22

Survival data The canonical joint distributions also arise for a) the estimates of a parameter in Cox s proportional hazards regression model b) a sequence of logrank statistics (score statistics) for comparing two survival curves and to zstatistics formed from these. For survival data, observed information is roughly proportional to the number of failures seen. Special types of group sequential test are needed to handle unpredictable and unevenly spaced information levels: see error spending tests. 23

8. Group sequential design, monitoring and analysis To have the usual features of a fixed sample study, Randomisation, stratification, etc., Adjustment for baseline covariates, Appropriate testing formulation, Inference on termination, plus the opportunity for early stopping. Response distributions: Normal, unknown variance Binomial Cox model or logrank test for survival data Normal linear models Generalized linear models 24

General approach Think through a fixed sample version of the study. Decide on the type of early stopping, number of analyses, and choice of stopping boundary: these will imply increasing the fixed sample size by a certain inflation factor. In interim monitoring, compute the standardised statistic Z k at each analysis and compare with critical values (calculated specifically in the case of an error spending test). On termination, one can obtain Pvalues and confidence intervals possessing the usual frequentist interpretations. 25

Example of a two treatment comparison, normal response, 2sided test Cholesterol reduction trial Treatment A: new, experimental treatment Treatment B: current treatment Primary endpoint: reduction in serum cholesterol level over a four week period Aim: To test for a treatment difference. High power should be attained if the mean cholesterol reduction differs between treatments by 0.4 mmol/l. DESIGN MONITORING ANALYSIS 2

DESIGN How would we design a fixedsample study? Denote responses by X Ai, i = 1;:::;n A, on treatment A, X Bi, i = 1;:::;n B, on treatment B. Suppose each X Ai ο N(μ A ;ff 2 ) and X Bi ο N(μ B ;ff 2 ): Problem: to test H 0 : μ A = μ B with twosided type I error probability ff = 0:05 and power 0.9 at jμ A μ B j = ffi = 0:4. We suppose ff 2 is known to be 0.5. (Facey, Controlled Clinical Trials, 1992) 27

Fixed sample design Standardised test statistic Z = μx A μ XB qff 2 =n A + ff 2 =n B : Under H 0, Z ο N(0; 1) so reject H 0 if jzj > Φ 1 (1 ff=2): Let μ A μ B =. Ifn A = n B = n, Z ο N( q 2ff 2 =n ; 1) so, to attain desired power at = ffi, aimfor n = fφ 1 (1 ff=2) + Φ 1 (1 fi)g 2 2ff 2 =ffi 2 = (1:90 + 1:282) 2 (2 0:5)=0:4 2 = 5:7; i.e., subjects on each treatment. 28

Group sequential design Specify type of early termination: stop early to reject H 0 Number of analyses: 5(fewerifwestopearly) Stopping boundary: O Brien & Fleming. Reject H 0 at analysis k, k = 1;:::;5, if jz k j > c p f5=kg, Z k k where Z k is the standardised statistic based on data at analysis k. 29

Example: cholesterol reduction trial O Brien & Fleming design From tables (JT, Table 2.3) or computer software c = 2:040 for ff = 0:05 so reject H 0 at analysis k if jz k j > 2:040 q 5=k: Also, for specified power, inflate the fixed sample size by a factor (JT, Table 2.4) IF = 1:02 to get the maximum sample size 1:02 5:7 = 8: Divide this into 5 groups of 13 or 14 observations per treatment. 30

Some designs with K analyses O Brien & Fleming Reject H 0 at analysis k if jz k j > c q K=k: Z k k Pocock Reject H 0 at analysis k if Z k jz k j > c: k Wang & Tsiatis, shape Reject H 0 at analysis k if Z k jz k j > c (k=k) 1=2 : k ( = 0 gives O Brien & Fleming, = 0:5 gives Pocock ) 31

Example: cholesterol reduction trial Properties of different designs Sample sizes are per treatment. Fixed sample size is. K Maximum Expected sample size sample size = 0 = 0:2 = 0:4 O Brien & Fleming 2 7 7 5 5 5 8 8 4 50 10 9 8 4 48 Wang & Tsiatis, = 0:25 Pocock 2 8 7 4 52 5 71 70 5 47 10 72 71 4 44 2 73 72 7 51 5 80 78 70 45 10 84 82 72 44 32

MONITORING Implementing the OBF test Divide the total sample size of 8 per treatment into 5 groups of roughly equal size, e.g., 14 in groups 1 to 3, 13 in groups 4 and 5. At analysis k, define μx (k) A = 1 n Ak nx Ak X Ai ; i=1 μ X (k) B = 1 n Bk nx Bk X Bi i=1 and Z k = q μx (k) A μ X (k) B ff 2 (1=n Ak + 1=n Bk ) : Stop to reject H 0 if jz k j > 2:040 q 5=k; k = 1;:::;5: Accept H 0 if jz 5 j < 2:040. 33

Implementing the 5analysis OBF test The stopping rule gives type I error rate ff = 0:050 and power 0.902 at = 0:4 if group sizes are equal to their design values. Note the minor effects of discrete group sizes. Perturbations in error rates also arise from small variations in the actual group sizes. For major departures from planned group sizes, we should really follow the error spending approach see later. 34

ANALYSIS Analysis on termination The sample space consists of all possible pairs (k; Z k ) on termination: Z k 5 4 3 2 1 1 2 3 4 5 1 2 3 4 5 k 35

Analysis on termination First, order the sample space. Z k 5 4 3 2 ) ρ ρ ρ ρ ρ ρ ρ= Ψ Ψ 1 1 1 2 3 4 5 k 2 3 4 5 P PPPP Pq Z ZZ Z ZZ Z~ @ @@ @ @@ @R @ @@ @ @@ @R We define Pvalues and confidence intervals with respect to this ordering. 3

The Pvalue for H 0 : μ A = μ B is the probability under H 0 of observing such an extreme outcome. Z k 5 4 3 2 ) ρ ρ ρ ρ ρ Λ ρ ρ= Ψ Λ Λ Ψ 1 1 1 2 3 4 5 k 2 3 4 5 P PPPP Pq @ Z @@ ZZ Z @ ZZ @@ Z~ @R @ @@ @ @@ @R E.g., if the test stops at analysis 3 with Z 3 = 4:2, the twosided Pvalue is Pr =0 fjz 1 j 4:5 or jz 2 j 3:23 or jz 3 j 4:2g = 0:0013: 37

A confidence interval on termination Suppose the test terminates at analysis k Λ with Z k Λ = Z Λ. A 100(1 ff)% confidence interval for = μ A μ B is the interval ( 1 ; 2 ) where Pr = 1 fan outcome above (k Λ ;Z Λ )g= ff=2 and Pr = 2 fan outcome below (k Λ ;Z Λ )g= ff=2: E.g., if the test stops at analysis 3 with Z 3 = 4:2, the 95% confidence interval for is (0:24; 0:91); using our specified ordering. Compare: fixed sample CI would be (0:35; 0:95). 38

9. Updating a design as a nuisance parameter is estimated The case of unknown variance We can design as for the case of known variance but use an estimate of ff 2 initially. If in doubt, err towards overestimating ff 2 in order to safeguard the desired power. At analysis k, estimate ff 2 by P s 2 k = (XAi X μ(k) A )2 + P (X Bi X μ(k) B )2 n Ak + n Bk 2 In place of Z k, define tstatistics T k = μx (k) A X μ(k) B qs 2 k (1=n Ak + 1=n Bk ) ; : then test at the same significance level used for Z k when ff 2 is known. 39

Updating the target sample size Recall, maximum sample size is set to be the fixed sample size multiplied by the Inflation Factor. In a 5group O Brien & Fleming design for the cholesterol example this is 1:02 fφ 1 (1 ff=2) + Φ 1 (1 fi)g 2 2ff 2 =ffi 2 = 134:8 ff 2 : After choosing the first group sizes using an initial estimate of ff 2, at each analysis k = 1; 2;::: we can reestimate the target for n A5 and n B5 as 134:8 s 2 k and modify future group sizes to achieve this. 40

Example: updating the sample size Initially: With initial estimate ^ff 2 = 0:5, aim for n A5 = n B5 = 134:8 0:5 = 8. Plan 14 observations per treatment group. Analysis 1: With n A1 = n B1 = 14 and s 2 1 = 0:80, aim for n A5 = n B5 = 134:8 0:80 = 108. For now, keep to 14 obs. per treatment group. Analysis 2: With n A2 = n B2 = 28 and s 2 2 = 0:9, aim for n A5 = n B5 = 134:8 0:9 = 93. Now increase group size to 22 obs. per treatment. 41

Example: updating the sample size Analysis 3: With n A3 = n B3 = 50 and s 2 3 = 0:5, aim for n A5 = n B5 = 134:8 0:5 = 88. Set next group size to 19 obs. per treatment. Analysis 4: With n A4 = n B4 = 9 and s 2 4 = 0:72, aim for n A5 = n B5 = 134:8 0:72 = 97. Set final group size to 28 obs. per treatment. Analysis 5: With n A5 = n B5 = 97, suppose s 2 5 = 0:74, so the target is n A5 = n B5 = 134:8 0:74 = 100 and the test may be slightly underpowered. 42

Remarks on reestimating sample size The target information for = μ A μ B is I max = IF fφ 1 (1 ff=2) + Φ 1 (1 fi)g 2 =ffi 2 = 1:02 (1:90 + 1:282) 2 =0:4 2 = 7:4: The relation between information and sample size I k = ρ 1 n Ak + 1 n Bk ff 2ff 1 involves the unknown ff 2. Hence, the initial uncertainty about the necessary sample size. In effect, we proceed by monitoring observed information: I max = 7:4 I 1 I 2 I 3 I 4 I 5 Information NB, state I max = 7:4 in the protocol, not n = ::: 43

Recapitulation Designing a group sequential test: Formulate the testing problem Create a fixed sample study design Choose number of analyses and boundary shape parameter Set maximum sample size equal to fixed sample size times the inflation factor Monitoring: Find observed information at each analysis Compare zstatistics with critical values Analysis: Pvalue and confidence interval on termination This method can be applied to many response distributions and statistical models. 44

10. A survival data example Oropharynx Clinical Trial Data (Kalbfleisch & Prentice, 1980, Appendix 1, Data Set II) Patient survival was compared on experimental Treatment A and standard Treatment B. Number Number entered of deaths k Date Trt A Trt B Trt A Trt B 1 12/9 38 45 13 14 2 12/70 5 70 30 28 3 12/71 81 93 44 47 4 12/72 95 100 3 5 12/73 95 100 9 73 45

The logrank statistic At stage k, observed number of deaths is d k. Elapsed times between study entry and failure are fi 1;k < fi 2;k < ::: < fi dk ;k (assuming no ties). Define r ia;k and r ib;k numbers at risk on Treatments A and B at fi i;k r ik = r ia;k + r ib;k total number at risk at fi i;k O k E k = P d k i=1 V k = P d k i=1 r ib;k r ik r ia;k r ib;k r 2 ik observed number of deaths on Treatment B at stage k expected number of deaths on Treatment B at stage k. variance of O k The standardised logrank statistic at stage k is Z k = O k E p k : Vk 4

Proportional hazards model Assume hazard rates h A on Treatment A and h B on Treatment B are related by h B (t) = h A (t): The log hazard ratio is = ln( ). Then, approximately, and Z k ο N( p I k ; 1) Cov(Z k1 ;Z k2 )= p (I k1 =I k2 ); 1» k 1» k 2» K; where I k = V k. For ß 1, wehavei k ß V k ß d k =4. 47

Design of the Oropharynx trial Onesided test of H 0 :» 0 vs > 0. Under the alternative > 1, i.e., Treatment A is better. Require: type I error probability ff = 0:05, power 1 fi = 0:95 at = 0:, i.e., = 1:8. Information needed for a fixed sample study is I f = fφ 1 (ff) + Φ 1 (fi)g 2 0: 2 = 30:0 Under the approximation I ß d=4 the total number of failures to be observed is d f = 4 I f = 120:2. 48

Design of the Oropharynx trial For a onesided test with up to 5 analyses, we could use a standard design created for equally spaced information levels. Z k H P P P ρ ρρ Reject H 0 ```!!!! " "" Accept H 0 I k However, increments in information between analyses will be unequal and unpredictable. This leads to consideration of an error spending design. 49

11. Error spending tests Lan & DeMets (1983) presented twosided tests which spend type I error as a function of observed information. Maximum information design: Error spending function f(i) f(i) ff Φ!! ρ " # ρ#!!φ" I max I k Set the boundary at analysis k to give cumulative Type I error f(i k ). Accept H 0 if I max is reached without rejecting H 0. 50

Error spending tests Analysis 1: Observed information I 1. Z k Reject H 0 if jz 1 j > c 1 where I 1 k Pr =0 fjz 1 j > c 1 g = f(i 1 ): Analysis 2: Cumulative information I 2. Reject H 0 if jz 2 j > c 2 where Pr =0 fjz 1 j < c 1 ; jz 2 j > c 2 g Z k I 1 I 2 k = f(i 2 ) f(i 1 ): etc. 51

Onesided error spending tests For a onesided test, define f(i) and g(i) to specify how type I and type II error probabilities are spent as a function of observed information. f(i) ff ρ "Φ!! ρ g(i) fi ρ "Φ!! ρ!! Φ"!! Φ" I max I I max I At analysis k, set boundary values (a k ; b k ) so that Pr =0 freject H 0 by analysis kg = f(i k ); Pr =ffi faccept H 0 by analysis kg = g(i k ): Power family of error spending tests: f(i) and g(i) / (I=I max ) ρ. 52

Onesided error spending tests 1. Values fa k ;b k g are easily computed using iterative formulae of McPherson, Armitage & Rowe (199). 2. Computation of (a k ; b k ) does not depend on future information levels, I k+1 ; I k+2 ; :::. 3. In a maximum information design, the study continues until the boundary is crossed or an analysis is reached with I k I max. 4. The value of I max should be chosen so that boundaries converge at the final analysis under a typical sequence of information levels, e.g., I k = (k=k) I max ; k = 1;:::;K: 53

Overrunning If one reaches I K > I max, solving for a K and b K is liable to give a K > b K. Z k H P P P ρ ρρ ``` " "" Reject H 0 ffi a K bk... " "" " "" k Accept H 0 Keeping b K as calculated guarantees type I error probability of exactly ff. So, reduce a K to b K and gain extra power. Overrunning may also occur if I K = I max but the information levels deviate from the equally spaced values (say) used in choosing I max. 54

Underrunning If a final information level I K < I max is imposed, solving for a K and b K is liable to give a K < b K. Z k H P P P ρ ρρ ``` " "" Reject H 0 b K ((( ffi ak!! k... Accept H 0 Again, with b K as calculated, the type I error probability is exactly ff. This time, increase a K to b K and attained power will be a little below 1 fi. 55

A onesided error spending design for the Oropharynx trial Specification: onesided test of H 0 :» 0 vs > 0, type I error probability ff = 0:05, power 1 fi = 0:95 at = ln( ) = 0:. At the design stage, assume K = 5 equally spaced information levels. Use a powerfamily test with ρ = 2, i.e., error spent is proportional to (I=I max ) 2. Information of a fixed sample test is inflated by a factor R(K; ff; fi; ρ) = 1:101 (JT, Table 7.). So, we require I max = 1:101 30:0 = 33:10, which needs a total of 4 33:10 = 132:4 deaths. 5

Summary data and critical values for the Oropharynx trial We construct error spending boundaries for the information levels actually observed. This gives boundary values (a 1 ; b 1 );:::;(a 5 ; b 5 ) for the standardised statistics Z 1 ;:::;Z 5. Number Number k entered of deaths I k a k b k Z k 1 83 27 5.43 1.0 3.00 1.04 2 12 58 12.58 0.37 2.49 1.00 3 174 91 21.11 0.3 2.13 1.21 4 195 129 30.55 1.51 1.81 0.73 5 195 142 33.28 1.73 1.73 0.87 This stopping rule would have led to termination at the 2nd analysis. 57

Covariate adjustment in the Oropharynx trial Covariate information was recorded for subjects: gender, initial condition, Tstaging, Nstaging. These can be included in a proportional hazards regression model along with treatment effect fi 1. The goal is then to test H 0 : fi 1 = 0 against the onesided alternative fi 1 > 0. At stage k we have the estimate ^fi(k) 1, v k = d Var(^fi(k) 1 ), I k = v 1 k and Z k = ^fi(k) 1 =p v k. All these are available from standard Cox regression software. The standardised statistics Z 1 ;:::;Z 5 have, approximately, the canonical joint distribution. 58

Covariateadjusted group sequential analysis of the Oropharynx trial Constructing the error spending test gives boundary values (a 1 ; b 1 );:::;(a 5 ; b 5 )for Z 1 ;:::;Z 5. k I k a k b k ^fi(k) 1 Z k 1 4.11 1.95 3.17 0.79 1.0 2 10.89 0.1 2.59 0.14 0.45 3 19.23 0.43 2.20 0.08 0.33 4 28.10 1.28 1.90 0.04 0.20 5 30.9 1.8 1.8 0.01 0.04 Under this model and stopping rule, the study would have terminated at the 3rd analysis. 59

Further topics Chapters of Jennison & Turnbull, Group Sequential Methods with Applications to Clinical Trials: Ch 9. Ch 10. Ch 12. Ch 15. Ch 1. Ch 17. Ch 18. Ch 19. Repeated confidence intervals Stochastic curtailment Special methods for binary data Multiple endpoints Multiarmed trials Adaptive treatment assignment Bayesian approaches Numerical computations for group sequential tests 0