Type I error rate control in adaptive designs for confirmatory clinical trials with treatment selection at interim


Type I error rate control in adaptive designs for confirmatory clinical trials with treatment selection at interim
Frank Bretz, Statistical Methodology, Novartis
Joint work with Martin Posch (Medical University of Vienna) and Willi Maurer (Novartis)
Adaptive Designs KOL Lecture Series, February 12, 2010

Adaptive Designs with Treatment Selection (BRETZ ET AL., 2009)
[Schematic: first stage with treatment arms A, B, C, D and Control; treatment selection at interim; second stage with the selected arm(s) and Control]
The second stage trial is planned using first stage data. Efficacy is demonstrated with data from Stages 1 + 2. Control of the familywise error rate is required (EMEA REFLECTION PAPER, 2007).

The Simulation Approach for Type I Error Control
For predefined adaptation rules, adjusted critical values are determined by simulation or numerical integration under the global null hypothesis.
- Select the best: THALL ET AL., 1988; STALLARD AND TODD, 2003; ...
- Bayesian adaptive designs: BERRY ET AL., 2007; ...

Case Study
Late development phase of a new drug for a life-threatening disease.
Primary objective: compare two doses of a new compound with standard-of-care for efficacy and safety.
Three parallel groups:
- C: standard-of-care
- L: standard-of-care + low dose of new compound
- H: standard-of-care + high dose of new compound
Primary endpoint: 28-day all-cause mortality.
θ_C, θ_L, θ_H ... mortality rates in the control, the low dose and the high dose group.
One-sided tests of the null hypotheses H_L: θ_C = θ_L and H_H: θ_C = θ_H against the one-sided alternatives θ_C > θ_L and θ_C > θ_H.

Adaptation Strategy
- Planned total sample size is N_T = 2100. The first 900 patients are randomized in a 1:1:1 ratio.
- Interim analysis after 900 patients are observed: discontinue one of the treatments L or H, or the entire study, based on futility criteria.
- If both experimental treatments are continued: randomize the remaining 2100 - 900 = 1200 patients in a 1:1:1 ratio to the two experimental treatment arms and the control group.
- If only one experimental treatment is continued: randomize the remaining 1200 patients in a 1:1 ratio to the remaining treatment arm and the control group.

Futility Stopping Criteria
r_{C,1}, r_{L,1}, r_{H,1} ... interim event rates in the three groups; f_1, f_2 ... pre-specified futility thresholds.
- Drop treatment L for futility if r_{L,1}/r_{C,1} > 1 + f_1.
- Drop treatment H for futility if r_{H,1}/r_{C,1} > 1 + f_1, or if L is continued and r_{H,1} - r_{L,1} > f_2.
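A minimal Python sketch of this interim selection rule, as reconstructed above (the exact form of the comparison between the two dose arms should be checked against the trial protocol; the function and variable names are illustrative, not from the paper):

```python
def select_treatments(r_C1, r_L1, r_H1, f1, f2):
    """Return the set of experimental arms continued after the interim analysis.
    An empty set means the entire study is stopped for futility. Assumes r_C1 > 0."""
    keep_L = r_L1 / r_C1 <= 1 + f1
    keep_H = r_H1 / r_C1 <= 1 + f1
    # If L is continued, also drop H when its interim event rate exceeds L's by more than f2.
    if keep_L and r_H1 - r_L1 > f2:
        keep_H = False
    return {arm for arm, keep in (("L", keep_L), ("H", keep_H)) if keep}

# Example (f1 = 20%, f2 = 10%): interim rates 0.20 (control), 0.21 (low), 0.32 (high)
# -> H is dropped by both criteria, only L is continued
print(select_treatments(0.20, 0.21, 0.32, 0.20, 0.10))  # {'L'}
```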

The Test Procedure
Y_i ... test statistic to test H_i, i = L, H.
Reject H_i if treatment i is continued to the final analysis and Y_i ≥ c for some critical value c.
Two choices for Y_i: the max test and the pooled test.
Choose c to control the familywise error rate (FWER).

The max Test
Y_i = Z_i, i = L, H, where
$$Z_i = \frac{r_C - r_i}{\sqrt{2\,\bar r_i (1 - \bar r_i)/(n_1 + n_2)}}, \qquad \bar r_i = \frac{r_i + r_C}{2}, \quad i = L, H.$$
r_L, r_H, r_C ... event rates in the three groups based on all data available in the final analysis.
n_1, n_2 ... first and second stage per-group sample sizes.

The pooled Test
$$Y_i = \begin{cases} \min(Z_i, Z_p) & \text{if both treatments are selected} \\ Z_i & \text{if only one treatment is selected,} \end{cases}$$
where Z_p compares the control with the pooled treatment groups:
$$Z_p = \frac{r_C - (r_L + r_H)/2}{\sqrt{2\,\bar r (1 - \bar r)/\bigl(2(n_1 + n_2)\bigr)}}, \qquad \bar r = \frac{r_C + r_L + r_H}{3}.$$
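For concreteness, here is a Python sketch of the two final test statistics (NumPy assumed); the denominators follow the formulas as reconstructed above, and all function names are illustrative:

```python
import numpy as np

def z_pairwise(r_C, r_i, n):
    """Two-sample z-statistic comparing control with treatment i; n = n1 + n2 per group."""
    r_bar = (r_i + r_C) / 2
    return (r_C - r_i) / np.sqrt(2 * r_bar * (1 - r_bar) / n)

def z_pooled(r_C, r_L, r_H, n):
    """Z-statistic comparing control with the two pooled treatment groups
    (denominator as transcribed on the slide)."""
    r_bar = (r_C + r_L + r_H) / 3
    return (r_C - (r_L + r_H) / 2) / np.sqrt(2 * r_bar * (1 - r_bar) / (2 * n))

def y_max(r_C, r_L, r_H, n, selected):
    """max test: Y_i = Z_i for continued arms, -inf for dropped arms."""
    rates = {"L": r_L, "H": r_H}
    return {i: (z_pairwise(r_C, rates[i], n) if i in selected else -np.inf)
            for i in ("L", "H")}

def y_pooled(r_C, r_L, r_H, n, selected):
    """pooled test: Y_i = min(Z_i, Z_p) if both arms continued, else Y_i = Z_i."""
    rates = {"L": r_L, "H": r_H}
    out = {}
    for i in ("L", "H"):
        if i not in selected:
            out[i] = -np.inf
        elif len(selected) == 2:
            out[i] = min(z_pairwise(r_C, rates[i], n), z_pooled(r_C, r_L, r_H, n))
        else:
            out[i] = z_pairwise(r_C, rates[i], n)
    return out
```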

Simulation Algorithm
The critical value c is determined by a simulation algorithm:
Step 1: Specify the test statistic, f_1, f_2, n_1, and θ = θ_C = θ_L = θ_H.
Step 2: Generate first stage rates drawn from the B(θ, n_1) binomial distribution and apply the futility criteria.
Step 3: For the selected treatments, determine n_2, generate second stage rates, and compute Y_i.
Step 4: Set Y_i = -∞ for dropped treatments and set Y = max_{i=L,H} Y_i.
Step 5: Repeat Steps 2-4.
Definition of the critical value: c is the 100(1-α)th percentile of the pool of simulated Y values.
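A minimal sketch of Steps 1-5, reusing select_treatments and the y_max / y_pooled helpers from the earlier code fragments; the stage-2 allocation follows the adaptation strategy described above, and all names and the seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2010)

def simulate_critical_value(theta, n1, n2_total, f1, f2, y_fun,
                            alpha=0.025, n_sim=100_000):
    """Monte Carlo critical value c under the global null theta_C = theta_L = theta_H = theta."""
    y = np.full(n_sim, -np.inf)           # Step 4: stopped trials keep Y = -inf
    for b in range(n_sim):
        # Step 2: first-stage event counts, n1 patients per arm, event probability theta
        x1 = {arm: rng.binomial(n1, theta) for arm in ("C", "L", "H")}
        selected = select_treatments(x1["C"] / n1, x1["L"] / n1, x1["H"] / n1, f1, f2)
        if not selected:
            continue                      # whole study stopped for futility
        # Step 3: second-stage per-group sample size (1:1:1 if two arms kept, 1:1 otherwise)
        n2 = n2_total // (len(selected) + 1)
        n = n1 + n2
        r = {arm: (x1[arm] + rng.binomial(n2, theta)) / n
             for arm in {"C"} | selected}
        r.setdefault("L", np.nan)         # rate of a dropped arm is never used
        r.setdefault("H", np.nan)
        y[b] = max(y_fun(r["C"], r["L"], r["H"], n, selected).values())
    # Step 5: c = 100(1 - alpha)th percentile of the pooled simulated Y values
    return np.quantile(y, 1 - alpha)

# Example call (fewer runs than the 10^6 used for the slide's table):
# c_max = simulate_critical_value(theta=0.2, n1=300, n2_total=1200,
#                                 f1=0.20, f2=0.10, y_fun=y_max)
```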

Simulated critical boundaries
For nominal level α = 0.025, using 10^6 simulation runs. The simulated critical values depend on the assumed true mortality rate in the control group and on the futility thresholds.

Mortality rate           Futility bounds        Max test   Pooled test
θ_C = θ_L = θ_H          f_1 (%)   f_2 (%)      c          c
0.1                      20        10           2.22       1.96
0.1                      10         1           2.13       2.08
0.2                      20        10           2.21       1.94
0.2                      10         1           2.11       2.07

The mortality rate in the control group is unknown. A possible approach: use the observed mortality rate in the control group. What is the impact on the Type I error rate?

Weak Control of the Familywise Error Rate
Assuming the true mortality rate of the control group is known, the simulation based test derived under the assumption θ_L = θ_H = θ_C controls the FWER up to Monte Carlo error.
For other mortality configurations the FWER can still exceed α.
Example: Let θ_C = 0.2, f_1 = 20%, f_2 = 10% and consider the pooled test. Under the scenario θ_L = θ_C = 0.2, θ_H = 0.1, the Type I error rate for rejecting H_L is 2.6%.

The Impact of Unscheduled Adaptations
Unscheduled dropping of treatments based on safety endpoints may inflate the significance level.
Assume that, in addition to the pre-planned interim analysis after 900 patients, a safety analysis after 1500 patients is performed.
Worst case example: at the safety analysis, we drop a treatment if this increases the conditional probability to reject a null hypothesis given the data observed so far.

Mortality rate           Futility bounds        Type I error (%)
θ_C = θ_L = θ_H          f_1 (%)   f_2 (%)      Max test   Pooled test
0.1                      20        10           2.59       4.31
0.2                      20        10           2.59       4.52

Limitations of Simulation Based Procedures
The distribution of the final test statistics under the null hypotheses may depend on unknown parameters, e.g.:
- the unknown configuration of true and false null hypotheses and the effect sizes under the false null hypotheses;
- the absolute treatment effects, if the adaptation rule depends on interim estimates of absolute treatment effects;
- the joint distribution of data that is correlated with the primary endpoint and used in the adaptations;
- the joint distribution of covariates, if the test statistic is adjusted for covariates.

When do Simulation Based Procedures Work?
- If a point null hypothesis is tested and the distribution of the final test statistic is fully specified under the null hypothesis.
- If the least favorable configuration (LFC) across all nuisance parameters can be identified: the Type I error rate at the LFC gives an upper bound for the Type I error rate. The resulting test will be strictly conservative, and the LFC may be difficult to identify.

The Adaptive Bonferroni-Holm Test
Based on conditional error rates (MÜLLER AND SCHÄFER, 2001), adaptive level-α tests for the intersection hypothesis and the elementary hypotheses are defined. Reject an elementary hypothesis at multiple level α if both the intersection test and the elementary test reject.
The conditional error rate approach requires the computation of the conditional error rate for the test of the intersection hypothesis, which involves the conditional joint distribution of Z_L, Z_H.
We propose an alternative test that is based on the sum of the marginal conditional error rates of the elementary tests at level α/2 (POSCH, MAURER, BRETZ, 2009).

The Bonferroni-Holm Test as Closed Test
Non-adaptive fixed sample test: Z_L, Z_H ... Z-statistics for H_L, H_H; z_q ... standard normal q-quantile.
1. Global null hypothesis H_L ∩ H_H: reject if Z_L ≥ z_{1-α/2} or Z_H ≥ z_{1-α/2}.
2. If H_L ∩ H_H is rejected, test the elementary hypotheses H_L, H_H: reject H_L if Z_L ≥ z_{1-α}, reject H_H if Z_H ≥ z_{1-α}.
Adaptive counterparts:
1. Global null hypothesis: adaptive Bonferroni test.
2. Elementary null hypotheses: conditional rejection probability test.
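As a quick illustration, the fixed sample closed test on this slide can be written in a few lines of Python (one-sided z-tests, SciPy assumed; names are illustrative):

```python
from scipy.stats import norm

def bonferroni_holm_fixed(z_L, z_H, alpha=0.025):
    """Closed fixed sample Bonferroni-Holm test of H_L and H_H (one-sided z-tests)."""
    reject = {"L": False, "H": False}
    # Step 1: Bonferroni test of the intersection H_L ∩ H_H at level alpha
    if max(z_L, z_H) >= norm.ppf(1 - alpha / 2):
        # Step 2: elementary hypotheses at the full level alpha
        reject["L"] = z_L >= norm.ppf(1 - alpha)
        reject["H"] = z_H >= norm.ppf(1 - alpha)
    return reject

# e.g. bonferroni_holm_fixed(2.30, 1.70) -> {'L': True, 'H': False}
```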

Adaptive Test for Elementary Hypotheses
E.g., to test H_H:
- If no treatment was dropped, perform the fixed sample test (i.e., reject if Z_H ≥ z_{1-α}).
- If only treatment H is continued: let
$$a_H = P\bigl(Z_H \ge z_{1-\alpha} \mid Z_{H,t}\bigr)$$
and reject H_H if Z_{H,t+} ≥ z_{1-a_H}. (MÜLLER & SCHÄFER, 2001)
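Under the usual independent-increments (canonical joint distribution) assumption with interim information fraction t, the conditional error rate a_H has a simple closed form. The sketch below relies on that assumption, which is only an approximation for the binary endpoint of the case study; the function and argument names are illustrative:

```python
import math
from scipy.stats import norm

def conditional_error(z_interim, t, level):
    """a = P(Z >= z_{1-level} | Z_t = z_interim), assuming the independent-increments
    decomposition Z = sqrt(t) * Z_t + sqrt(1 - t) * Z_{t+} with Z_{t+} ~ N(0, 1) under H."""
    return norm.sf((norm.ppf(1 - level) - math.sqrt(t) * z_interim) / math.sqrt(1 - t))

# Adaptive test of H_H when only H is continued (Müller & Schäfer):
# a_H = conditional_error(z_H_interim, t, level=alpha)
# reject H_H if the post-adaptation statistic Z_{H,t+} >= z_{1 - a_H}
```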

Adaptive Bonferroni Test for H_L ∩ H_H
(Z_{L,t}, Z_{H,t}) ... Z-statistics of the observations collected before the adaptation.
(Z_{L,t+}, Z_{H,t+}) ... Z-statistics of the observations collected after the adaptation.
The adaptive Bonferroni test:
- If no treatment was dropped, perform the fixed sample Bonferroni test (i.e., reject if max(Z_L, Z_H) ≥ z_{1-α/2}).
- If (say) only treatment H is continued, compute
$$a = P\bigl(Z_L \ge z_{1-\alpha/2} \mid Z_{L,t}\bigr) + P\bigl(Z_H \ge z_{1-\alpha/2} \mid Z_{H,t}\bigr)$$
and reject H_L ∩ H_H if Z_{H,t+} ≥ z_{1-a}.
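The intersection test can be sketched the same way by summing the two marginal conditional error rates at level α/2, again under the independent-increments approximation and reusing conditional_error from the previous sketch; names are illustrative:

```python
from scipy.stats import norm

def adaptive_bonferroni_intersection(z_L_interim, z_H_interim, z_H_post, t, alpha=0.025):
    """Adaptive Bonferroni test of H_L ∩ H_H when (say) only treatment H is continued:
    sum the marginal conditional error rates of the two level-alpha/2 tests and compare
    the post-adaptation statistic of the continued arm against z_{1-a}."""
    a = (conditional_error(z_L_interim, t, alpha / 2)
         + conditional_error(z_H_interim, t, alpha / 2))
    a = min(a, 1.0)                 # the Bonferroni sum may exceed 1 in extreme cases
    return z_H_post >= norm.ppf(1 - a)
```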

Properties of the Adaptive Bonferroni-Holm Test
The adaptive Bonferroni-Holm test controls the FWER in the strong sense without pre-specification of the selection rule:
- selection may depend on all interim data, including safety and secondary endpoints;
- selection may depend on external data;
- no pre-specification of the analysis time is required (continuous monitoring);
- no binding futility stopping rule;
- sample sizes and allocation ratios may be adapted.

Power Comparison of the Procedures
After 900 of 2100 patients an interim analysis is performed. Control group mortality rate is θ_C = 0.2. Futility parameters f_1 = 10%, f_2 = 1%.

Rates              Max test (%)           Pooled test (%)        Adaptive BH (%)
θ_L      θ_H       P_any   P_L    P_H     P_any   P_L    P_H     P_any   P_L    P_H
0.2      0.14      82      1.5    82      77      1.7    76      80      2.1    80
0.14     0.14      92      82     32      92      83     32      92      84     31

P_any ... power to reject any elementary hypothesis.
P_L (P_H) ... power to reject H_L (H_H), respectively.

Conclusion
Simulation of the FWER under the global null hypothesis in adaptive multi-armed clinical trials is, in general, insufficient to control the FWER. In settings with nuisance parameters, FWER control may depend on specific assumptions about the nuisance parameters. Robust alternatives are tests based on conditional error rates and combination tests. However, simulations remain valuable for assessing the power of adaptive tests.

References I
F. Bretz, F. König, W. Brannath, E. Glimm, and M. Posch. Adaptive designs for confirmatory clinical trials. Statistics in Medicine, 28:1181-1217, 2009.
EMEA. Reflection paper on methodological issues in confirmatory clinical trials planned with an adaptive design. EMEA Doc. Ref. CHMP/EWP/2459/02, 2007. Available at http://www.emea.europa.eu/pdfs/human/ewp/245902enadopted.pdf.
H. H. Müller and H. Schäfer. A general statistical principle for changing a design any time during the course of a trial. Statistics in Medicine, 23:2497-2508, 2004.

References II
M. Posch, W. Maurer, and F. Bretz. Type I error rate control in adaptive designs for confirmatory clinical trials with treatment selection at interim. Pharmaceutical Statistics, 2010, to appear.
N. Stallard and S. Todd. Sequential designs for phase III clinical trials incorporating treatment selection. Statistics in Medicine, 22:689-703, 2003.
P. F. Thall, R. Simon, and S. S. Ellenberg. Two-stage selection and testing designs for comparative clinical trials. Biometrika, 75:303-310, 1988.