Power assessment in group sequential design with multiple biomarker subgroups for multiplicity problem

Similar documents
Testing a secondary endpoint after a group sequential test. Chris Jennison. 9th Annual Adaptive Designs in Clinical Trials

Group sequential designs for Clinical Trials with multiple treatment arms

Adaptive clinical trials with subgroup selection

Group Sequential Trial with a Biomarker Subpopulation

The Design of a Survival Study

The Design of Group Sequential Clinical Trials that Test Multiple Endpoints

Type I error rate control in adaptive designs for confirmatory clinical trials with treatment selection at interim

Multiple Testing in Group Sequential Clinical Trials

Adaptive designs beyond p-value combination methods. Ekkehard Glimm, Novartis Pharma EAST user group meeting Basel, 31 May 2013

Interim Monitoring of Clinical Trials: Decision Theory, Dynamic Programming. and Optimal Stopping

Comparing Adaptive Designs and the. Classical Group Sequential Approach. to Clinical Trial Design

A Gatekeeping Test on a Primary and a Secondary Endpoint in a Group Sequential Design with Multiple Interim Looks

Group Sequential Designs: Theory, Computation and Optimisation

Group Sequential Tests for Delayed Responses. Christopher Jennison. Lisa Hampson. Workshop on Special Topics on Sequential Methodology

Optimising Group Sequential Designs. Decision Theory, Dynamic Programming. and Optimal Stopping

Testing a Primary and a Secondary Endpoint in a Confirmatory Group Sequential Clinical Trial

Group sequential designs with negative binomial data

Adaptive Designs: Why, How and When?

Adaptive Extensions of a Two-Stage Group Sequential Procedure for Testing a Primary and a Secondary Endpoint (II): Sample Size Re-estimation

Overrunning in Clinical Trials: a Methodological Review

Group Sequential Tests for Delayed Responses

Monitoring clinical trial outcomes with delayed response: incorporating pipeline data in group sequential designs. Christopher Jennison

Sample size re-estimation in clinical trials. Dealing with those unknowns. Chris Jennison. University of Kyoto, January 2018

Group-Sequential Tests for One Proportion in a Fleming Design

CHL 5225H Advanced Statistical Methods for Clinical Trials: Multiplicity

Statistical Aspects of Futility Analyses. Kevin J Carroll. nd 2013

Superchain Procedures in Clinical Trials. George Kordzakhia FDA, CDER, Office of Biostatistics Alex Dmitrienko Quintiles Innovation

The SEQDESIGN Procedure

On Generalized Fixed Sequence Procedures for Controlling the FWER

Adaptive, graph based multiple testing procedures and a uniform improvement of Bonferroni type tests.

Estimation in Flexible Adaptive Designs

METHODS OF EVALUATION OF PERFORMANCE OF ADAPTIVE DESIGNS ON TREATMENT EFFECT INTERVALS AND METHODS OF

Adaptive Dunnett Tests for Treatment Selection

Mixtures of multiple testing procedures for gatekeeping applications in clinical trials

Pubh 8482: Sequential Analysis

MULTISTAGE AND MIXTURE PARALLEL GATEKEEPING PROCEDURES IN CLINICAL TRIALS

Statistics and Probability Letters. Using randomization tests to preserve type I error with response adaptive and covariate adaptive randomization

SAS/STAT 15.1 User s Guide The SEQDESIGN Procedure

Group sequential designs for negative binomial outcomes

Pubh 8482: Sequential Analysis

Adaptive Treatment Selection with Survival Endpoints

University of California, Berkeley

Independent Increments in Group Sequential Tests: A Review

Pubh 8482: Sequential Analysis

Exact Inference for Adaptive Group Sequential Designs

clinical trials Abstract Multi-arm multi-stage (MAMS) trials can improve the efficiency of the drug

A note on tree gatekeeping procedures in clinical trials

Group Sequential Methods. for Clinical Trials. Christopher Jennison, Dept of Mathematical Sciences, University of Bath, UK

OPTIMAL, TWO STAGE, ADAPTIVE ENRICHMENT DESIGNS FOR RANDOMIZED TRIALS USING SPARSE LINEAR PROGRAMMING

Adaptive Survival Trials

Two-Phase, Three-Stage Adaptive Designs in Clinical Trials

Alpha-recycling for the analyses of primary and secondary endpoints of. clinical trials

Optimizing trial designs for targeted therapies - A decision theoretic approach comparing sponsor and public health perspectives

Optimal group sequential designs for simultaneous testing of superiority and non-inferiority

SAMPLE SIZE RE-ESTIMATION FOR ADAPTIVE SEQUENTIAL DESIGN IN CLINICAL TRIALS

A Mixture Gatekeeping Procedure Based on the Hommel Test for Clinical Trial Applications

Repeated confidence intervals for adaptive group sequential trials

Bios 6649: Clinical Trials - Statistical Design and Monitoring

Sample Size/Power Calculation by Software/Online Calculators

Designing multi-arm multi-stage clinical trials using a risk-benefit criterion for treatment selection

Pubh 8482: Sequential Analysis

Adaptive Enrichment Designs for Randomized Trials with Delayed Endpoints, using Locally Efficient Estimators to Improve Precision

Bayes Factor Single Arm Time-to-event User s Guide (Version 1.0.0)

Division of Pharmacoepidemiology And Pharmacoeconomics Technical Report Series

Two-stage Adaptive Randomization for Delayed Response in Clinical Trials

Monitoring clinical trial outcomes with delayed response: Incorporating pipeline data in group sequential and adaptive designs. Christopher Jennison

Sample Size and Power I: Binary Outcomes. James Ware, PhD Harvard School of Public Health Boston, MA

An Adaptive Futility Monitoring Method with Time-Varying Conditional Power Boundary

Interim Monitoring of. Clinical Trials. Christopher Jennison, Dept of Mathematical Sciences, University of Bath, UK

Group Sequential Methods. for Clinical Trials. Christopher Jennison, Dept of Mathematical Sciences, University of Bath, UK

Finding Critical Values with Prefixed Early. Stopping Boundaries and Controlled Type I. Error for A Two-Stage Adaptive Design

Analysis of Multiple Endpoints in Clinical Trials

Control of Directional Errors in Fixed Sequence Multiple Testing

A Type of Sample Size Planning for Mean Comparison in Clinical Trials

Optimal exact tests for multiple binary endpoints

Calculation of Efficacy and Futiltiy Boundaries using GrpSeqBnds (Version 2.3)

ADAPTIVE SEAMLESS DESIGNS: SELECTION AND PROSPECTIVE TESTING OF HYPOTHESES

Statistica Sinica Preprint No: SS R1

4. Issues in Trial Monitoring

Technical Manual. 1 Introduction. 1.1 Version. 1.2 Developer

A Brief Introduction to Intersection-Union Tests. Jimmy Akira Doi. North Carolina State University Department of Statistics

Citation for final published version: Publishers page:

Pubh 8482: Sequential Analysis

Designs for Clinical Trials

Published online: 10 Apr 2012.

BLINDED EVALUATIONS OF EFFECT SIZES IN CLINICAL TRIALS: COMPARISONS BETWEEN BAYESIAN AND EM ANALYSES

arxiv: v2 [stat.me] 6 Sep 2013

On the Inefficiency of the Adaptive Design for Monitoring Clinical Trials

An Alpha-Exhaustive Multiple Testing Procedure

Bios 6649: Clinical Trials - Statistical Design and Monitoring

Alpha-Investing. Sequential Control of Expected False Discoveries

INTERIM MONITORING AND CONDITIONAL POWER IN CLINICAL TRIALS. by Yanjie Ren BS, Southeast University, Nanjing, China, 2013

Duke University. Duke Biostatistics and Bioinformatics (B&B) Working Paper Series. Randomized Phase II Clinical Trials using Fisher s Exact Test

The Limb-Leaf Design:

Sample size determination for a binary response in a superiority clinical trial using a hybrid classical and Bayesian procedure

Lecture 9 Two-Sample Test. Fall 2013 Prof. Yao Xie, H. Milton Stewart School of Industrial Systems & Engineering Georgia Tech

Bayesian designs of phase II oncology trials to select maximum effective dose assuming monotonic dose-response relationship

Stepwise Gatekeeping Procedures in Clinical Trial Applications

Bonferroni - based gatekeeping procedure with retesting option

A simulation study for comparing testing statistics in response-adaptive randomization

Transcription:

Power assessment in group sequential design with multiple biomarker subgroups for multiplicity problem Lei Yang, Ph.D. Statistical Scientist, Roche (China) Holding Ltd. Aug 30 th 2018, Shanghai Jiao Tong University Page 1

GROUP SEQUENTIAL DESIGN & EFFICACY SUPERIORITY INTERIM ANALYSIS Group sequential design Potentially stop trial after interim analysis for futility or efficacy Look into the data multiple times while maintaining integrity Early stopping for efficacy superiority at interim analysis All patients should have timely access to new effective treatment Launch new product early Early analysis may reveal problems (e.g. compliance, accrual rate) Page 2

HIERARCHICAL TESTING Group C H 0,BM H Group B 1 1 H 0,BM H & M Group A H 0, All-comer Test primary population at full alpha If significant, test next population with full alpha recycled and so forth Fixed testing sequence ranking by clinical relevance or likelihood of success Strong control of familywise error rate (FWER) due to (1) prospective specification of the testing sequence and (2) no further testing once the sequence breaks Carefully selecting the ordering of the tests is essential FDA Guidance for Industry (2017); Hung et al (2007); Glimm et al (2010); Tamhane et al (2010) Page 3

STRAIGHTFORWARD TESTING WITH ONE PRIMARY ANALYSIS ONLY Primary Analysis OS: BM H OS: BM H & M OS: All-comer FPE +32 months FPE: First patient enrolled Page 4

NESTED SUBGROUP ANALYSES WITHOUT IA Possible scenarios with 2 BM subgroups: Panel I: non-nested subgroup analyses where subgroups can be disjoint (Panel Ia) or can intersect (Panel Ib); Panel II: nested subgroup analyses Spiessens and Debois (2010) Page 5

INTERIM ANALYSIS INTRODUCES MORE COMPLEXITY Interim Analysis Primary Analysis OS: BM H alpha=0.004* OS: BM H OS: BM H & M alpha=0.004 OS: BM H & M OS: All-comer alpha=0.004 OS: All-comer FPE +21 months +32 months *O Brien-Fleming boundary used at 60% information fraction FPE: First patient enrolled Page 6

NESTED AND INTERSECTANT SUBGROUP ANALYSES WITH IA Possible scenarios with 2 BM subgroups: Panel I: non-nested subgroup analyses where subgroups can be disjoint (Panel Ia) or can intersect (Panel Ib); Panel II: nested subgroup analyses Spiessens and Debois (2010) Page 7

A MIXTURE OF INTERSECTANT AND NESTED SUBGROUPS Interim Analysis Primary Analysis OS: BM H alpha=0.004* OS: BM H OS: BM H & M alpha=0.004 OS: BM H & M OS: All-comer alpha=0.004 OS: All-comer FPE +21 months +32 months *O Brien-Fleming boundary used at 60% information fraction FPE: First patient enrolled Page 8

AN ARTIFICIAL PHASE III TIME-TO-EVENT CASE STUDY Drug A vs SOC by randomization ratio 1:1 Population: All-comer & two subgroups defined by predefined cut-offs of CDx assays All-comer (Group A) BM high and medium (Group B) BM high (Group C) Can be extended to more biomarker subgroups Prevalence rates are 75% for Group B and 50% for Group C Primary efficacy endpoint: OS HR assumptions (median OS of drug A vs SOC [months]) Biomarker high: 0.70 (14.3 vs 10) Biomarker high and medium: 0.75 (13.3 vs 10) All-comer: 0.80 (12.5 vs 10) Constant recruitment rate (40 pts/month for all-comer) and no drop-out assumed One-sided test with alpha = 0.025 Page 9

Withou t IA With IA WHEN CORRELATIONS AMONG BIOMARKER SUBGROUPS NOT CONSIDERED Key message HR (median OS [months]) Sample size Marginal power (IA) Events (IA) BM H (Group C) 0.70 (10 vs 14.3) 451 87.6% 305 68% BM H&M (Group B) 0.75 (10 vs 13.3) 676 87.3% 465 69% All-comer (Group A) 0.80 (10 vs 12.5) 902 80.0% 631 70% BM H (Group C) 0.70 (10 vs 14.3) 451 87.6% (40.8%) BM H&M (Group B) 0.75 (10 vs 13.3) 676 87.3% (40.4%) All-comer (Group A) 0.80 (10 vs 12.5) 902 80.0% (31.8%) Sample size determined to give 80% power for all-comer with EPR=70% without considering biomarker subgroups If so, both biomarker subgroups are over-powered when assessed respectively Target number of events slightly larger with IA Event patient ratio, EPR (IA) 307 (184) 68% (41%) 469 (281) 69% (42%) 636 (382) 71% (42%) Question How about if correlations are considered too? Page 10

CANONICAL JOINT DISTRIBUTION WITHOUT IA Distribution of OS effect estimates θ A = log HR A ~N θ A, var(θ A ), I A = var(θ A ) 1 = N A 4 θ B = log HR B ~N θ B, var(θ B ), I B = var(θ B ) 1 = N B 4 θ C = log HR C ~N θ C, var(θ C ), I C = var(θ C ) 1 = N C 4 with I A, I B, I C are information levels in Group A, B and C and N A, N B, N C are numbers of events respectively Under null hypotheses H 0,C : θ C = 0; H 0,B : θ B = 0; H 0,A : θ A = 0, logrank test statistics (Z C = θ C I C, Z B = θ B I B, Z A = θ A I A ) has approximately canonical joint distribution Z C Z B Z A ~N θ C I C θ B I B, θ A I A 1 τ CB τ CA τ CB 1 τ BA τ CA τ BA 1 where τ CB = I C I B, τ CA = I C I A, τ BA = I B I A are information fractions Define score statistics (S C, S B, S A ) = (Z C I 1C, Z B I 1B, Z A I 1A ), then (S C, S B S C, S A S B ) are independent For the case study without IA, τ CB = 0.64, τ CA = 0.46, τ BA = 0.72 Spiessens and Debois (2010); Jennison and Turnbull (1997, 2000) Page 11

OUTCOMES WHEN WITHOUT IA BM H PA BM H & M Allcomer Outcome of label claiming 1 + + + All-comer at PA 2 + + - BM H&M at PA Probability of outcome Accumulated probability 74.46% 74.5% (Marginal power for All-comer: 80.0%) 7.95% 82.4% (Marginal power for BM H&M: 87.3%) 3 + - BM H at PA 5.18% 87.6% (Marginal power for BM H: 87.6%) Hypothesis testing Reject H 0,C, H 0,B, H 0,A under alternative hypotheses at PA Reject H 0,C, H 0,B under alternative hypotheses at PA Reject H 0,C under alternative hypotheses at PA 4 - No 12.41% 100.0% - Key message Powers are lower when taking correlations into consideration (-5.5% for all-comer and -4.9% for BM H&M), when there is no IA Difference might be larger if assumptions changed (HR, prevalence rates, etc) Page 12

CANONICAL JOINT DISTRIBUTION WITH IA Distribution of OS effect estimates θ ka = log HR ka ~N θ A, var(θ ka ), I ka = var(θ ka ) 1 = N ka 4 θ kb = log HR kb ~N θ B, var(θ kb ), I kb = var(θ kb ) 1 = N kb 4 θ kc = log HR kc ~N θ C, var(θ kc ), I kc = var(θ kc ) 1 = N kc 4 with I ka, I kb, I kc are information levels in Group A, B and C and N ka, N kb, N kc are numbers of events respectively at kth stage, where k 1, 2 with 1 for IA, 2 for PA Under null hypotheses H 0,kC : θ C = 0; H 0,kB : θ B = 0; H 0,kA : θ A = 0, logrank test statistics (Z C = θ 1C I 1C, Z B = θ 1B I 1B, Z A = θ 1A I 1A, Z C = θ 2C I 2C, Z B = θ 2B I 2B, Z A = θ 2A I 2A ) has approximately canonical joint distribution where are information fractions Z C Z B Z A Z C Z B Z A ~N θ C θ B θ A θ C θ B θ A I 1C I 1B I 1A I 2C I 2B I 2A, 1 τ CB τ CA τ CB 1 τ BA τ CA τ BA 1 τ CC τ BC τ AC τ CB τ BB τ AB τ CA τ BA τ AA τ CC τ CB τ CA τ BC τ BB τ BA τ AC τ AB τ AA 1 τ C B τ C A τ C B 1 τ B A τ C A τ B A 1 τ CB = I 1C I 1B, τ CA = I 1C I 1A, τ CC = I 1C I 2C, τ CB = I 1C I 2B, τ CA = I 1C I 2A, τ BA = I 1B I 1A, τ BC = (I 1C I 1B ) (I 1C I 2C ), τ BB = I 1B I 2B, τ BA = I 1B I 2A, τ AC = (I 1C I 1A ) I 1C I 2C, τ AB = (I 1B I 1A ) (I 1B I 2B ), τ AA = I 1A I 2A, τ C B = I 2C I 2B, τ C A = I 2C I 2A, τ B A = I 2B I 2A, Define score statistics (S C, S B, S A, S C, S B, S A ) = (Z C I 1C, Z B I 1B, Z A I 1A, Z C I 2C, Z B I 2B, Z A I 2A ), then (S C, S B S C, S A S B, S C S A, S B S C, S A S B ) are not independent anymore For the case study with IA, τ CB = 0.64, τ CA = 0.46, τ CC = 0.60, τ CB = 0.38, τ CA = 0.28, τ BA = 0.72, τ BC = 0.38, τ BB = 0.60, τ BA = 0.43, τ AC = 0.28, τ AB = 0.43, τ AA = 0.60, τ C B = 0.64, τ C A = 0.46, τ B A = 0.72 Spiessens and Debois (2010); Jennison and Turnbull (1997, 2000) Page 13

A PRESPECIFIED TESTING RULE WITH IA Start from testing BM H in IA Overall hierarchical rule* If significant, go to next subgroup in IA, and stop when Allcomer in IA is significant If not significant for any test in IA, go to the same subgroup in PA; then stop if not significant; otherwise move forward until All-comer in PA is significant Key message The so-called overall hierarchical rule achieves strong control of FWER in the case of with an IA *Glimm et al (2010) Page 14

SIMPLIFIED TESTING STRATEGY Interim Analysis Primary Analysis OS: BM H alpha=0.004 OS: BM H OS: BM H & M alpha=0.004 OS: BM H & M OS: All-comer alpha=0.004 OS: All-comer Key message Dimensions reduced (n+1<2n*); nested subgroups only Score statistics with independent increment structure *n is the number of tests in either IA or PA with always n>=2. In the artificial case study, n=3. Page 15

OUTCOMES WHEN WITH IA BM H BM H & M IA PA Outcome of label claiming BM H BM H & M Allcomer Allcomer Probability of outcome Accumulated probability Power of IA 1 + + + All-comer at IA 22.39% 22.4% (Marginal power for Allcomer: 31.8%) 2 + + - + BM H&M at IA, All-comer at PA 8.52% 30.9% 3 + + - - BM H&M at IA 0.02% 30.9% (Marginal power for BM H&M: 40.4%) 4 + - + + BM H at IA, Allcomer at PA 5 + - + - BM H at IA, BM H&M at PA 9.75% 40.7% 0.10% 40.8% 6 + - - BM H at IA 0.02% 40.8% (Marginal power for BM H: 40.8%) Key message Powers are lower when taking correlations into consideration with IA (-9.4% for all-comer and -9.5% for BM H&H respectively at IA) Difference might be larger if assumptions changed Page 16

NON-PROPORTIONAL HAZARD RATIO MARGINAL POWER With out IA Sampl e size Event patient ratio, EPR (IA) Events (IA) PH: Marginal power (IA) BM H (Group C) 451 68% 305 87.6% 62.7% BM H&M (Group B) 676 69% 465 87.3% 62.2% All-comer (Group A) 902 70% 631 80.0% 56.8% NPH(assume 3M efficacy effect delay): Marginal power (IA) With IA BM H (Group C) 451 68% (41%) 307 (184) 87.6% (40.8%) BM H&M (Group B) 676 69% (42%) 469 (281) 87.3% (40.4%) All-comer (Group A) 902 71% (42%) 636 (382) 80.0% (31.8%) 62.7% (13.9%) 62.2% (12.1%) 56.8% (8.1%) Key message Power decrease are 23%~25% for PA when assuming 3 months efficacy effect delay while larger for IA (27%~49%) Decrease depends on specific assumption of treatment effect delay Page 17

NON-PROPORTIONAL HAZARD RATIO A TENTATIVE STRATEGY - How about if correlations are considered too? - Unfortunately NO theory established yet when assuming non-proportional HR! A tentative strategy To approximate the NPH model with a PH model by re-estimating a rough proportional HR* from NPH simulation Pros Very straightforward for audience to understand impacts of NPH Make cross-table comparison possible Cons Lack of solid theory support and good accuracy Need further investigation on the HR reestimation * e.g., with a relatively stable proportional HR around the timing of PA and beyond Page 18

NON-PROPORTIONAL HAZARD RATIO POWER FINDINGS BM H PA/IA BM H & M Allcomer Outcome of label claiming 1 + + + All-comer at PA 2 + + - BM H&M at PA PH Power of PA (IA)/Assumed HR NPH* Power of PA (IA)/Reestimated HR 74.5% (22.4%)/0.70 43.0% (7.9%)/0.77 82.4% (30.9%)/0.75 53.0% (12.2%)/0.81 3 + - BM H at PA 87.6% (40.8%)/0.80 62.7% (19.0%)/0.85 Key message Either taking correlation into account or NPH will lower down powers for both IA and PA When taking correlations into account, power decrease from PH to NPH is usually larger than the case of marginal power for PA (for IA, it is on the contrary) When assuming the NPH model, power decrease from marginal power to correlated one is usually larger than the case of PH model for PA (for IA, it is on the contrary) IA power for NPH considering correlation >= marginal IA power for NPH is mostlikely due to poor approximation of the tentative strategy (intuitively it should be < ) More investigation needed to understand the why * Assuming there is a 3 months efficacy effect delay Page 19

TOPICS FOR FURTHER EXTENSION Not uncommon in CIT New theory needs to be further developed Nonproportional HR assumption Secondary endpoint included in testing sequence Different alpha spending methods for primary & secondary? Correlation Power might be increased for secondary endpoint e.g., O B-F boundary for BM H vs (refined) Pocock boundary for others? Power might be increased for BM H&M and/or all-comer Different alpha spending methods over subgroups New testing method beyond hierarchical testing e.g., weighted test with loop-back or weighted test exploiting correlation Power might be increased Maurer and Bretz (2013); Xi et al (2017); Tamhane et al (2018) Page 20

TAKE-AWAY MESSAGES Interim analysis brings more complexity to group sequential design with multiple biomarker subgroups Correlations among subgroups should be considered to calculate power Scenario planning for claiming study success at IA/PA is be pre-discussed prior to study readouts Group sequential testing method, non-proportional HR and alpha spending methods need to be fully discussed prior to study conduct Page 21

THANK YOU Page 22