Comparing Adaptive Designs and the Classical Group Sequential Approach to Clinical Trial Design


Comparing Adaptive Designs and the Classical Group Sequential Approach to Clinical Trial Design

Christopher Jennison
Department of Mathematical Sciences, University of Bath, UK
http://people.bath.ac.uk/mascj

KOL Lecture Series, PhRMA Adaptive Design Working Group, 12 September 2008

Plan of talk

1. Objectives of Adaptive Designs
2. Objectives of Group Sequential Tests
3. Case Study 1: Sample size re-estimation
4. Case Study 2: Switching to a patient sub-population
5. Conclusions

References:
Jennison, C. and Turnbull, B.W. Adaptive and non-adaptive group sequential tests. Biometrika (2006).
Jennison, C. and Turnbull, B.W. Adaptive seamless designs: Selection and prospective testing of hypotheses. Journal of Biopharmaceutical Statistics (2007).

1. Objectives of Adaptive Designs

Ordinarily, in a clinical trial one specifies at the outset:
- Patient population,
- Treatment,
- Primary endpoint,
- Hypothesis to be tested,
- Power at a specific effect size.

Adaptive designs allow these elements to be reviewed during the trial, because there may be limited information to guide these choices initially, but more knowledge will accrue as the study progresses.

Adaptive Designs

Adaptive methods can enable mid-study modifications, including:
- Changing sample size in response to estimates of a nuisance parameter,
- Changing sample size in response to interim estimates of the effect size,
- Switching the primary endpoint,
- Switching to a patient sub-population,
- Changing the null hypothesis (e.g., switching from a superiority trial to a non-inferiority trial),
- Treatment selection (combining dose selection in Phase II and a confirmatory Phase III trial).

Adaptive Designs

Adaptation must be carried out in a way that protects the type I error probability. Specialised methods have been developed to do this, and some adaptive approaches offer a high degree of flexibility.

Adaptive procedures should also achieve their objectives efficiently. A critical appraisal of an adaptive method needs a full specification of the rules that will be applied; then, properties such as power and expected sample size can be calculated, often by simulation. The overall properties of a likely implementation are relevant, even if investigators intend to behave flexibly.

2. Objectives of Group Sequential Tests

In a group sequential design, accumulating data are monitored regularly and the study stops as soon as there is sufficient evidence to reach a conclusion.

- Group sequential designs are available for testing a null hypothesis against a one-sided or two-sided alternative.
- Early termination may be for a positive result (success) or a negative outcome (stopping for futility).
- An efficient design reduces average sample size or time to a conclusion while protecting the type I error rate and maintaining the desired power.
- The group sequential design must be specified in a trial's protocol.

Group Sequential Tests

A group sequential test can be represented by a stopping boundary:

[Figure: stopping boundary plotted as Z against sample size n, with "Reject H0" above the upper boundary and "Accept H0" below the lower boundary.]

Group Sequential Tests

Some flexibility and adaptability is available in group sequential designs.

Error spending tests: Error spending designs offer a flexible way to deal with unpredictable group sizes.

Information monitoring: In the general theory of sequential tests, what really matters is the observed information for a treatment effect. Designing a trial so that information will reach a pre-specified level is an effective approach when nuisance parameters affect the sample size needed to meet the power condition.

Implementation: There is a wide literature on the theory and practice of group sequential tests, and a range of software products to help implement these methods.
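The error-spending idea can be sketched in a few lines. As an illustration (the ρ-family of Kim and DeMets is one common choice, not necessarily the one used in any given trial), the function f(t) = α min(t, 1)^ρ fixes the cumulative type I error that may be spent by observed information fraction t, so the amount available at each look depends only on the fractions actually attained:

```python
import numpy as np

def rho_spending(t, alpha=0.025, rho=2.0):
    """Kim-DeMets rho-family spending function: cumulative one-sided
    type I error allowed to be spent by information fraction t."""
    return alpha * np.minimum(np.asarray(t, dtype=float), 1.0) ** rho

# Five analyses at unpredictable information fractions: the error
# spent at each look adapts to the fractions actually observed
t = np.array([0.18, 0.45, 0.61, 0.83, 1.00])
cum_spent = rho_spending(t)
increments = np.diff(cum_spent, prepend=0.0)   # error available at each look
```

The increments are positive at every look and sum to α = 0.025 at the final analysis, which is what makes the boundary calculation work for any sequence of group sizes.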

Case study 1: Sample size modification to increase power

Investigators may start out optimistically and design a trial with power to detect a large treatment effect. Interim data may then suggest a smaller effect size, still clinically important but difficult to demonstrate with the chosen sample size.

An adaptive design can allow sample size to be increased during the trial, rescuing an under-powered study. Some would advocate this approach as a way to let the data say what power and sample size should be chosen. Alternatively, a group sequential design can achieve a desired power curve and save sample size through early stopping when the effect size is large.

Questions:
- Is there a down-side to the "wait and see" approach?
- How are the adaptive and group sequential approaches related?

Sample size modification to increase power

Example (Jennison & Turnbull, Biometrika, 2006, Example 2)

We start with a group sequential design with 5 analyses, testing H0: θ ≤ 0 against θ > 0 with one-sided type I error probability α = 0.025 and power 1 − β = 0.9 at θ = δ.

[Figure: stopping boundary for the initial design, Z(k) against sample size, with "Reject H0" above the upper boundary, "Accept H0" below the lower boundary, and a continuation region between.]

Sample size modification to increase power

Suppose, at analysis 2, a low interim estimate θ̂2 prompts investigators to consider the trial's power at effect sizes below δ, where power 0.9 was originally set:
- Lower effect sizes start to appear plausible,
- Conditional power under these effect sizes, using the current design, is low.

Applying the method of Cui, Hung and Wang (Biometrics, 1999):
- Sample sizes for groups 3 to 5 are multiplied by a factor γ,
- Sample sums from these groups are down-weighted by γ^(−1/2).

This gives the same variance as before, but the mean is multiplied by γ^(1/2). Using the new weighted sample sum in place of the original sample sum maintains the type I error rate and increases power.

We choose the factor γ to give conditional power 0.9 if θ is equal to θ̂2, with the constraint γ ≤ 6 so that sample size can be at most 4 times the original maximum.
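The variance-preserving property behind the Cui-Hung-Wang weighting can be checked with a small simulation. The numbers below (group size 20, unit variance, and the particular rule mapping θ̂2 to γ) are illustrative assumptions of mine, and early stopping is omitted so that only the final weighted test is examined:

```python
import numpy as np

rng = np.random.default_rng(1)

def chw_final_z(theta, n=20, sigma=1.0, gamma_max=6.0):
    """One trial under Cui-Hung-Wang weighting, adapting after group 2
    of 5; returns the final weighted Z statistic (no early stopping)."""
    s1 = rng.normal(theta, sigma, n).sum()
    s2 = rng.normal(theta, sigma, n).sum()
    theta_hat = (s1 + s2) / (2 * n)            # interim estimate after group 2
    # Data-dependent inflation factor for groups 3-5 (illustrative rule)
    gamma = min(max(1.0, 1.0 / max(theta_hat, 0.1) ** 2), gamma_max)
    m = int(round(gamma * n))                  # inflated group size
    # Down-weighting by gamma^(-1/2) restores each group sum's null variance
    s_rest = sum(rng.normal(theta, sigma, m).sum() / np.sqrt(gamma)
                 for _ in range(3))
    return (s1 + s2 + s_rest) / np.sqrt(5 * n * sigma ** 2)

# Despite the data-dependent gamma, the weighted statistic is N(0,1)
# under H0, so a final test at 1.96 keeps the type I error near 0.025
type1 = np.mean([chw_final_z(0.0) > 1.96 for _ in range(20000)])
```

Conditional on γ, each down-weighted group sum still has the null variance of an original group sum, which is why the final weighted Z remains standard normal under H0 whatever rule generated γ.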

Sample size modification to increase power

Re-design has raised the power curve at all effect sizes.

[Figure: power against θ/δ for the adaptive test with conditional power 0.9 at the estimated θ and for the original 5-group test.]

Overall power at θ = δ/2 has increased from 0.37 to 0.68.

Sample size modification to increase power

Reasons for re-design arose purely from observing θ̂2. A group sequential design responds to such interim estimates in the decision to stop the trial or to continue.

Investigators could have considered at the design stage how they would respond to low interim estimates of effect size. If they had thought this through and chosen the above adaptive procedure, they could also have examined its overall power curve. Assuming this power curve were acceptable, how else might it have been achieved?

An alternative group sequential design: Five-group designs matching key features of the adaptive test can be found. To be comparable, the power curve should be as high as that of the adaptive design. Can expected sample size be lower too?

Sample size modification to increase power

Power of our matched group sequential design is as high as that of the adaptive design at all effect sizes, and substantially higher at the largest θ values.

[Figure: power against θ/δ for the matched group sequential test, the adaptive test with conditional power 0.9 at the estimated θ, and the original 5-group test.]

Sample size modification to increase power

The group sequential design has significantly lower expected information than the adaptive design over a range of effect sizes. It has slightly higher expected information for θ > 0.8 δ, but this is where its power advantage is greatest.

[Figure: expected information against θ/δ for the matched group sequential test and the adaptive test with conditional power 0.9 at the estimated θ.]

Sample size modification to increase power

Jennison & Turnbull (Biometrika, 2006) define an efficiency ratio to compare expected sample size, adjusting for differences in attained power. By this measure, the adaptive design is up to 39% less efficient than the non-adaptive, group sequential alternative.

[Figure: efficiency ratio of the adaptive design vs. the group sequential test, plotted against θ/δ.]

Sample size modification to increase power

We have found similar inefficiency relative to group sequential tests in a wide variety of proposed adaptive designs. Paradoxically, adaptive designs should have an advantage from the extra freedom to choose group sizes in a response-dependent manner. Jennison & Turnbull (Biometrika, 2006) show that adaptation can lead to gains in efficiency over non-adaptive group sequential tests, but the gains are very slight. Moreover, sample size rules based on conditional power are far from optimal, hence the poor properties of adaptive designs using such rules.

Conclusion: Specify power properly at the outset; then group sequential designs offer a simple and efficient option.

Case study 2: Switching to a patient sub-population

A trial protocol defines a specific target population. Suppose it is believed the treatment may be effective in a certain sub-population, even if it is ineffective in the rest of the population.

Enrichment: focusing recruitment on a sub-population. At an interim analysis, the options are:
- Continue as originally planned, or
- Restrict the remainder of the study to a sub-population.

This choice will affect the licence a positive outcome can support. The possibility of testing several null hypotheses means a multiple testing procedure must be used.

Enrichment: Example

Sub-population 1 (proportion λ1) has treatment effect θ1; sub-population 2 (proportion λ2) has treatment effect θ2. The overall treatment effect is θ = λ1 θ1 + λ2 θ2.

We may wish to test:
- The null hypothesis for the full population, H0: θ ≤ 0 vs. θ > 0,
- The null hypothesis for sub-population 1, H1: θ1 ≤ 0 vs. θ1 > 0,
- The null hypothesis for sub-population 2, H2: θ2 ≤ 0 vs. θ2 > 0.

Multiple testing procedures

Suppose k null hypotheses, Hi: θi ≤ 0 for i = 1, ..., k, are to be considered. A procedure's family-wise error rate under a set of values (θ1, ..., θk) is

Pr{reject Hi for some i with θi ≤ 0} = Pr{reject any true Hi}.

The family-wise error rate is controlled strongly at level α if this error rate is at most α for all possible combinations of θi values. Then

Pr{reject any true Hi} ≤ α for all (θ1, ..., θk).

With such strong control, the probability of choosing to focus on the parameter θi and then falsely claiming significance for null hypothesis Hi is at most α.
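Strong control is a statement about every configuration of (θ1, ..., θk) at once. As a sketch, here is a Monte Carlo spot-check of one such configuration for a simple one-sided Bonferroni procedure (Bonferroni is used purely for illustration; it is not the procedure developed in this talk):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

k, alpha, n_sim = 3, 0.025, 200_000
theta = np.array([0.0, 1.0, 0.0])      # H1 and H3 are true nulls, H2 is false
crit = norm.ppf(1 - alpha / k)         # one-sided Bonferroni critical value
z = rng.standard_normal((n_sim, k)) + theta   # one Z statistic per hypothesis
reject = z > crit
# A family-wise error = rejecting any hypothesis whose theta_i <= 0
fwer = reject[:, theta <= 0].any(axis=1).mean()
```

The simulated family-wise error rate stays below α = 0.025 in this configuration; strong control requires the same bound to hold for every choice of theta, which Bonferroni delivers by the union bound.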

Closed testing procedures (Marcus et al., Biometrika, 1976)

For each subset I of {1, ..., k}, we define the intersection hypothesis H_I = ∩_{i∈I} Hi. We construct a level α test of each intersection hypothesis H_I: this test rejects H_I with probability at most α whenever all hypotheses specified in H_I are true.

Closed testing procedure: The simple hypothesis Hj: θj ≤ 0 is rejected if, and only if, H_I is rejected for every set I containing index j.

Proof of strong control of family-wise error rate: Let Ĩ denote the set of indices of all true hypotheses Hi. For a family-wise error to be committed, we must reject H_Ĩ. Since H_Ĩ is true, Pr{reject H_Ĩ} ≤ α and, thus, the probability of a family-wise error is no greater than α.
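The closure principle is compact to implement once a level-α test of each intersection hypothesis is available. A sketch, using a Bonferroni combination as an illustrative intersection test (any valid level-α intersection test could be substituted):

```python
def closed_test(p_intersections, alpha=0.025):
    """Closed testing procedure: p_intersections maps each non-empty
    frozenset I of hypothesis indices to the p-value of a level-alpha
    test of the intersection hypothesis H_I.  The simple hypothesis
    H_j is rejected iff every H_I with j in I is rejected."""
    indices = frozenset().union(*p_intersections)
    return {j for j in indices
            if all(p <= alpha for I, p in p_intersections.items() if j in I)}

# Toy example with k = 2, using a Bonferroni test for H_{12}
p1, p2 = 0.004, 0.40
pvals = {frozenset({1}): p1,
         frozenset({2}): p2,
         frozenset({1, 2}): min(1.0, 2 * min(p1, p2))}
closed_test(pvals)   # rejects H_1 only: every intersection containing 1 falls
```

Here H1 is rejected because both its own test and the Bonferroni test of H12 are significant at α, while H2 survives since its own p-value exceeds α.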

Enrichment: Example

Sub-population 1 (proportion λ1) has effect θ1; sub-population 2 (proportion λ2) has effect θ2.

First, consider a design testing for a whole-population effect, θ = λ1 θ1 + λ2 θ2. The design has two analyses and one-sided type I error probability 0.025. Sample size is set to achieve power 0.9 at θ = 20.

Data in each stage are summarised by a Z-value:

        Stage 1    Stage 2    Overall
H0      Z1,0       Z2,0       Z0 = (1/√2) Z1,0 + (1/√2) Z2,0

Enrichment: Example

Two-stage design, testing for a whole-population effect, θ:

        Stage 1    Stage 2    Overall
H0      Z1,0       Z2,0       Z0 = (1/√2) Z1,0 + (1/√2) Z2,0

Decision rules:
If Z1,0 < 0: stop at Stage 1, accept H0.
If Z1,0 ≥ 0: continue to Stage 2, then:
  If Z0 < 1.95: accept H0.
  If Z0 ≥ 1.95: reject H0.
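This decision rule is easy to check by simulation: under H0 the stage-wise statistics Z1,0 and Z2,0 are independent standard normals, and combining the futility stop at 0 with the final critical value 1.95 should give an overall type I error close to 0.025 (a sketch, with the rule exactly as on the slide):

```python
import numpy as np

rng = np.random.default_rng(7)

def rejects(z1, z2, c=1.95):
    """Two-stage rule from the slide: accept H0 at Stage 1 if Z1 < 0,
    otherwise reject H0 when Z0 = (Z1 + Z2)/sqrt(2) >= c."""
    return (z1 >= 0) & ((z1 + z2) / np.sqrt(2) >= c)

# Under H0, the stage-wise Z's are independent N(0,1)
z1, z2 = rng.standard_normal((2, 200_000))
type1 = rejects(z1, z2).mean()   # close to the nominal 0.025
```

The futility boundary at 0 removes only a sliver of the rejection region (paths with Z1 < 0 rarely recover to Z0 ≥ 1.95), which is why 1.95 rather than 1.96 suffices for one-sided level 0.025.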

Enrichment: Example

Assume equal sub-population proportions, so λ1 = λ2 = 0.5. Properties of the design for the whole-population effect, θ:

θ1    θ2    θ     Power for H0
20    20    20    0.90
10    10    10    0.37
20     0    10    0.37
 0    20    10    0.37

Is it feasible to identify at Stage 1 that θ is low, but that it would be worthwhile to switch resources to test a sub-population?

Enrichment: A closed testing procedure

We wish to be able to consider three null hypotheses. On rejection, we conclude:

H0: θ ≤ 0     Treatment is effective in the whole population
H1: θ1 ≤ 0    Treatment is effective in sub-population 1
H2: θ2 ≤ 0    Treatment is effective in sub-population 2

To apply a closed testing procedure, we need tests of the intersection hypotheses:

H01: θ ≤ 0 and θ1 ≤ 0
H02: θ ≤ 0 and θ2 ≤ 0
H12 = H012: θ1 ≤ 0 and θ2 ≤ 0

(Since θ = λ1 θ1 + λ2 θ2, it is clear that H012 is identical to H12.)

Enrichment: An adaptive design

At Stage 1, if θ̂ < 0, stop to accept H0: θ ≤ 0. If θ̂ > 0 and the trial continues:

- If θ̂2 < 0 and θ̂1 > θ̂2 + 8: restrict to sub-population 1 and test H1 only, needing to reject H1, H01, H12, H012.
- If θ̂1 < 0 and θ̂2 > θ̂1 + 8: restrict to sub-population 2 and test H2 only, needing to reject H2, H02, H12, H012.
- Else, continue with the full population and test H0, needing to reject H0, H01, H02, H012.

The same total sample size for Stage 2 is retained in all cases, increasing the numbers for the chosen sub-population when enrichment occurs.
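The interim branching can be written directly as a function of the Stage 1 estimates. A sketch of the slide's rule (the margin of 8 is from the slide; the return labels and the handling of exact ties at 0 are illustrative choices of mine):

```python
def stage1_decision(theta_hat, theta1_hat, theta2_hat, margin=8.0):
    """Stage 1 branching rule for the enrichment design (labels illustrative)."""
    if theta_hat < 0:
        return "stop: accept H0"
    if theta2_hat < 0 and theta1_hat > theta2_hat + margin:
        return "enrich: sub-population 1 (test H1, H01, H12, H012)"
    if theta1_hat < 0 and theta2_hat > theta1_hat + margin:
        return "enrich: sub-population 2 (test H2, H02, H12, H012)"
    return "continue: full population (test H0, H01, H02, H012)"

stage1_decision(5.0, 12.0, -3.0)   # enriches to sub-population 1
```

With θ̂ = 5, θ̂1 = 12 and θ̂2 = −3, the rule restricts to sub-population 1, since θ̂2 is negative and θ̂1 exceeds θ̂2 by more than the margin.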

Enrichment: An adaptive design

Each null hypothesis, Hi say, is tested in a 2-stage group sequential test. With Z-statistics Z1 and Z2 from Stages 1 and 2, Hi is rejected if Z1 ≥ 0 and (1/√2) Z1 + (1/√2) Z2 ≥ 1.95.

When continuing with the full population, we use the Z-statistics:

        Stage 1    Stage 2
H0      Z1,0       Z2,0
H01     Z1,0       Z2,0
H02     Z1,0       Z2,0
H012    Z1,0       Z2,0

where Zi,0 is based on θ̂ from responses in Stage i. So, there is no change from the original test of H0.

Enrichment: An adaptive design

When switching to sub-population 1, we use:

              Stage 1    Stage 2
H1            Z1,1       Z2,1
H01           Z1,0       Z2,1
H12 = H012    Z1,0       Z2,1

When switching to sub-population 2, we use:

              Stage 1    Stage 2
H2            Z1,2       Z2,2
H02           Z1,0       Z2,2
H12 = H012    Z1,0       Z2,2

where Zi,j is based on θ̂j from responses in Stage i. The need to reject intersection hypotheses adds to the simple tests of H1 or H2.

Simulation results: Power of non-adaptive and adaptive designs

      θ1   θ2   θ    Non-adaptive   Adaptive
                     Full pop'n     Sub-pop 1 only   Sub-pop 2 only   Full pop'n   Total
1.    30    0   15   0.68           0.42             0.00             0.42         0.84
2.    20   20   20   0.90           0.03             0.03             0.84         0.90
3.    20   10   15   0.68           0.10             0.01             0.60         0.71

Case 1: Testing focuses (correctly) on H1, but it is still possible to find an effect (wrongly) for the full population.
Case 2: Restricting to a sub-population reduces power for finding an effect in the full population.
Case 3: Adaptation improves power overall, but there is a small probability of restricting to the wrong sub-population.

Enrichment: Example

The rules for sticking with the full population or switching to a sub-population can be adjusted, but we cannot eliminate the probability of making an error in these decisions. This is to be expected, since the standard error of the interim estimates θ̂1 and θ̂2 is 12.3, much higher than the differences between θ1 and θ2 that interest us. Similar results are found if only one sub-population is specified as a candidate for restricted sampling.

Conclusions:
1. Switching to a sub-population can improve power. In general, larger sample sizes are needed for accurate sub-population inference.
2. This form of adaptation goes beyond standard group sequential designs.

Conclusions

Together, group sequential tests and adaptive designs provide a powerful set of techniques for clinical trial design.

- For early stopping, I would recommend group sequential tests, not adaptive sample size re-estimation (Case Study 1).
- Other applications involving multiple hypotheses need to use adaptive methods (Case Study 2).
- Adaptation can be part of a pre-planned and pre-tested trial design: does it do what is needed? Does it do so efficiently?
- Flexible adaptation brings risks as well as opportunities.