Pubh 8482: Sequential Analysis
Joseph S. Koopmeiners
Division of Biostatistics, University of Minnesota
Week 12

Review

So far, we have discussed the role of phase III clinical trials in drug development, the goal and unique characteristics of phase III clinical trials as confirmatory trials, and several approaches to adaptive designs in phase III: sequential testing, adaptively incorporating historical information, and seamless phase II/III designs.

Sample Size Calculations

Phase III clinical trials serve as the final evaluation before a drug is approved. One of the most important steps in designing a clinical trial is calculating the sample size. We require a large enough sample size to fully evaluate our scientific question; specifically, we desire a high probability of rejecting the null hypothesis if the alternative hypothesis is true.

Information Required for Power Calculations

The following information is needed to calculate the required sample size: the null hypothesis and the test being used, the desired type I error rate, the desired power, the alternative hypothesis on which to base power calculations, and the values of important nuisance parameters.

Alternative Hypothesis

Clinical trials are designed to reject the null hypothesis with high power assuming that a pre-specified alternative hypothesis is true. The alternative hypothesis is usually the expected effect size for the experimental treatment. Problem: what if we are unsure about the expected effect size? The study population may have changed throughout drug development, we may have an imprecise estimate of the effect size due to underpowered phase II trials, etc.

Clinically Meaningful Difference

One approach to overcoming this problem is to power the trial to detect a clinically meaningful difference. In this case, we have acceptable power for any larger difference and do not care about smaller differences. Problem: it is not easy to define a clinically meaningful difference, and it is very difficult to achieve unanimity on what constitutes one.

Nuisance Parameters

In addition to the alternative hypothesis, we also require knowledge of specific nuisance parameters: the variance for normally distributed data, the baseline hazard in survival analysis, etc. We do not usually have a precise estimate of the parameter of interest, and we are even less likely to have precise estimates of nuisance parameters!

External Pilot Data

Ideally, estimates of the alternative or nuisance parameters are available from phase II. Alternatively, we could complete an external pilot study to estimate these parameters. An external pilot study provides preliminary estimates of the alternative hypothesis and nuisance parameters. In addition, external pilot data allow us to evaluate our study protocol and identify potential problems in advance of the larger trial.

Disadvantages of External Pilot Data

External pilot studies are appealing because they are independent of the primary study, but they are not perfect. Running an external pilot study is expensive and delays the start of the trial. This is especially true because the protocol of an external pilot study still has to go through local review (IRB, etc.) before opening.

Internal Pilot Data

An alternative approach is to complete an internal pilot study. The internal pilot data can be used to estimate the unknown parameters and re-estimate the required sample size. The study then continues to enroll, and a standard, fixed-sample test is completed at study completion.

Re-estimation: Alternative Hypothesis vs. Nuisance Parameters

Internal pilot data can be used to estimate the true difference and the nuisance parameters. Re-estimation based on the nuisance parameters will not, in general, impact the type I error rate. Re-estimation based on the outcome (i.e., the true difference) can increase the type I error rate substantially. For now, we will only consider sample size re-estimation based on the nuisance parameters.

Internal Pilot Study: Example

Consider a two-arm, randomized clinical trial in which the primary outcome follows a normal distribution with unknown variance: $x_1, x_2, \ldots \sim N(\mu_x, \sigma^2)$ and $y_1, y_2, \ldots \sim N(\mu_y, \sigma^2)$. We will test the null hypothesis, $H_0: \mu_x = \mu_y$, using the two-sample t-test.

Sample Size Calculation

We calculate the per-group sample size using the formula
$$n = \frac{\left(\Phi^{-1}(1-\alpha/2) + \Phi^{-1}(1-\beta)\right)^2 \, 2\sigma^2}{\delta^2}$$
where $\alpha$ is the two-sided type I error rate, $\beta$ is the type II error rate, and $\delta = \mu_x - \mu_y$.
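As a quick computational check on this formula, here is a minimal Python sketch (the helper name required_n is mine):

```python
from math import ceil
from scipy.stats import norm

def required_n(alpha, beta, sigma, delta):
    """Per-group sample size for a two-sided, two-sample comparison of
    normal means: n = (z_{1-alpha/2} + z_{1-beta})^2 * 2*sigma^2 / delta^2."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(1 - beta)
    return ceil(z**2 * 2 * sigma**2 / delta**2)
```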

Initial Sample Size Calculation

We begin by calculating an initial sample size. This requires us to specify $\alpha$, $\beta$, and $\delta$. We make an initial guess at $\sigma_0^2$ and calculate the initial sample size, $n_0$, based on $\sigma_0^2$.

Internal Pilot Phase

For the internal pilot phase, we randomize a small number of subjects, $n_p \ll n_0$, to each group. Using these data, we estimate the variance, $\hat\sigma^2$. Using the estimated variance, we re-estimate the required sample size, keeping $\alpha$, $\beta$, and $\delta$ as specified.

Altering the Sample Size

Let $n_1$ be the required sample size estimated from the internal pilot data. We do not alter the sample size if $n_1 < n_0$; if $n_1 > n_0$, then $n_1$ replaces $n_0$ as the total sample size. Alternatively, you may only re-estimate the sample size if $n_1$ is substantially larger than $n_0$ (e.g., 25% larger), as in the sketch below. The trial continues until the total sample size is reached, and we compare the two groups using the t-test.
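A minimal sketch of this updating rule, reusing the required_n helper above; the threshold argument captures the optional "25% larger" variant (names are mine):

```python
def reestimated_n(n0, alpha, beta, sigma_hat, delta, threshold=0.0):
    """Final per-group sample size after the internal pilot.

    The sample size is only ever revised upward, and optionally only when
    the re-estimated size exceeds n0 by more than the given fraction."""
    n1 = required_n(alpha, beta, sigma_hat, delta)
    return n1 if n1 > n0 * (1 + threshold) else n0
```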

Example

Recall the MRFIT study. We designed a two-arm trial assuming the following parameters: $\alpha = 0.05$, $\beta = 0.10$, $\delta = 5$, $\sigma = 14$. This requires a sample size of 165 subjects/group.

Example

Let's assume that an internal pilot study was run and we estimate $\hat\sigma = 16$. A study with 165 subjects/group would be underpowered; assuming $\sigma = 16$, we would require 216 subjects/group. This is a substantial increase over our initial sample size estimate (a 31% increase).
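Using the required_n sketch above, the numbers on these two slides can be reproduced:

```python
n0 = required_n(0.05, 0.10, sigma=14, delta=5)  # 165 subjects/group
n1 = required_n(0.05, 0.10, sigma=16, delta=5)  # 216 subjects/group
print((n1 - n0) / n0)                           # ~0.31, a 31% increase
```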

Simulation Study

Consider a small simulation study to investigate the operating characteristics of this design. The initial design is based on the following parameters: $\alpha = 0.05$, $\beta = 0.10$, $\delta = 0.5$, $\sigma = 1$. This results in an initial sample size of 85 subjects/group. The sample size of the internal pilot study is 9 subjects/group.

Simulation Study

Consider the following true values of $\sigma$: 1.0, 1.5, 2.0, 2.5, 3.0. These values of $\sigma$ correspond to required sample sizes of 85, 190, 337, 526, and 757 subjects/group. We will consider the type I error rate and power with and without sample size re-estimation; a sketch of the simulation follows.
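A minimal Monte Carlo sketch of this simulation, reusing the helpers above (the construction and defaults are mine; the slides do not specify implementation details):

```python
import numpy as np
from scipy.stats import ttest_ind

def simulate(delta_true, sigma_true, n0=85, n_pilot=9,
             alpha=0.05, beta=0.10, delta_design=0.5,
             reps=10_000, seed=1):
    """Estimate the rejection rate of the internal pilot design."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        # Internal pilot: estimate sigma from the pooled pilot variances
        xp = rng.normal(delta_true, sigma_true, n_pilot)
        yp = rng.normal(0.0, sigma_true, n_pilot)
        sigma_hat = np.sqrt((xp.var(ddof=1) + yp.var(ddof=1)) / 2)
        n = reestimated_n(n0, alpha, beta, sigma_hat, delta_design)
        # Enroll the remaining subjects; final analysis uses all the data
        x = np.concatenate([xp, rng.normal(delta_true, sigma_true, n - n_pilot)])
        y = np.concatenate([yp, rng.normal(0.0, sigma_true, n - n_pilot)])
        rejections += ttest_ind(x, y).pvalue < alpha
    return rejections / reps
```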

Simulation Results: No Re-estimation

σ     δ = 0   δ = 0.5
1.0   0.051   0.896
1.5   0.046   0.580
2.0   0.048   0.372
2.5   0.054   0.253
3.0   0.048   0.189

Simulation Results: With Re-estimation

σ     δ = 0   δ = 0.5
1.0   0.051   0.919
1.5   0.055   0.859
2.0   0.049   0.861
2.5   0.047   0.860
3.0   0.050   0.867

Simulation Study: Conclusions

Incorrectly specifying $\sigma$ leads to dramatically underpowered studies. Re-estimating $\sigma$ in an internal pilot study results in power close to the desired level in all cases.

Internal Pilot Study: Binomial Data

Consider the slightly different scenario of comparing two binomial proportions: $x_1, x_2, \ldots \sim \text{Bern}(p_x)$ and $y_1, y_2, \ldots \sim \text{Bern}(p_y)$. We will test the null hypothesis, $H_0: p_x = p_y$, using the two-sample test of proportions.

Sample Size Calculation

We calculate the sample size using the formula
$$n_0 = \frac{\left(\Phi^{-1}(1-\alpha/2) + \Phi^{-1}(1-\beta)\right)^2 \, 2\bar{p}(1-\bar{p})}{\delta^2}$$
where $\alpha$ is the two-sided type I error rate, $\beta$ is the type II error rate, $\delta = p_y - p_x$, and
$$\bar{p} = (p_x + p_y)/2 = (2p_x + \delta)/2 = p_x + \delta/2.$$
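As before, a minimal computational check (the helper name required_n_binom is mine):

```python
from math import ceil
from scipy.stats import norm

def required_n_binom(alpha, beta, p_x, delta):
    """Per-group sample size for comparing two proportions."""
    p_bar = p_x + delta / 2  # = (p_x + p_y) / 2
    z = norm.ppf(1 - alpha / 2) + norm.ppf(1 - beta)
    return ceil(z**2 * 2 * p_bar * (1 - p_bar) / delta**2)

required_n_binom(0.05, 0.10, p_x=0.1, delta=0.1)  # 268 subjects/group
```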

Internal Pilot Data

We can again re-estimate the sample size using an internal pilot study. We enroll $n_p \ll n_0$ subjects/group and re-estimate the sample size, replacing $\bar{p}$ with either $\bar{p} = \hat{p}_x + \delta/2$ or $\bar{p} = (\hat{p}_x + \hat{p}_y)/2$, and set $n = \max(n_0, n_1)$.
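A sketch of this updating step under the first choice of $\bar{p}$, i.e., plugging the pilot estimate of $p_x$ into the formula above (names are mine):

```python
def reestimated_n_binom(n0, alpha, beta, p_x_hat, delta):
    """Re-estimate the sample size from the pilot estimate of p_x and
    take the larger of the initial and re-estimated sizes."""
    n1 = required_n_binom(alpha, beta, p_x_hat, delta)
    return max(n0, n1)
```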

Simulation Study

Consider a small simulation study to investigate the operating characteristics of this design. The initial design is based on the following parameters: $\alpha = 0.05$, $\beta = 0.10$, $\delta = 0.1$, $p_x = 0.1$. This results in an initial sample size of 268 subjects/group. The sample size of the internal pilot study is 14 subjects/group.

Simulation Study

Consider the following true values of $p_x$: 0.1, 0.2, 0.3, 0.4, 0.5. These values of $p_x$ correspond to required sample sizes of 268, 395, 479, 521, and 521 subjects/group. We will consider the type I error rate and power with and without sample size re-estimation.

Simulation Results: No Re-estimation

p_x   δ = 0   δ = 0.1
0.1   0.048   0.904
0.2   0.053   0.755
0.3   0.045   0.671
0.4   0.049   0.631
0.5   0.052   0.633

Simulation Results: With Re-estimation

p_x   δ = 0   δ = 0.1
0.1   0.047   0.923
0.2   0.050   0.882
0.3   0.052   0.885
0.4   0.049   0.891
0.5   0.052   0.889

Simulation Study: Conclusions

We observe the same pattern as before: incorrectly specifying $p_x$ leads to dramatically underpowered studies, and re-estimating $p_x$ in an internal pilot study results in power close to the desired level in all cases.

Internal Pilot Study: Summary

Incorrectly specifying nuisance parameters in a power calculation can have a dramatic impact on power. Internal pilot data can be used to estimate these nuisance parameters and to re-estimate the sample size. This results in power closer to the desired level and does not, in general, inflate the type I error rate.

Re-estimation Based on δ

To this point, we have only considered sample size re-estimation based on nuisance parameters, which has little, if any, impact on the type I error rate. Alternatively, one could use internal pilot data to estimate δ and re-estimate the sample size based on the estimated δ.

Basic Setup

The set-up is the same as when we were only estimating nuisance parameters from internal pilot data. We estimate an initial sample size, $n_0$, based on initial guesses at δ and any important nuisance parameters. We enroll an internal pilot sample, $n_p \ll n_0$, and estimate δ and the nuisance parameters. We re-estimate the sample size using these estimates, obtaining $n_1$, and set $n = \max(n_0, n_1)$. How do we test the null hypothesis?

Naive Approach

The simplest approach is to test the null hypothesis using a standard Z statistic, $Z = \hat\delta_n \sqrt{I_n}$, where $\hat\delta_n$ is the estimated treatment effect and $I_n$ is the information at the final sample size $n$. This can dramatically inflate the type I error rate (Cui, Hung and Wang, 1999).

For Illustration

I completed a simulation study where the initial sample size was estimated using the following: $\alpha = 0.05$, $\beta = 0.10$, $\delta = 0.5$, $\sigma = 1$. This requires a sample size of 85 subjects/group. I used an internal pilot study of 9 subjects/group to estimate δ and σ. This increased the type I error rate to 0.10; a sketch of such a simulation follows.
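A sketch of the kind of simulation that exhibits this inflation (my own construction: both δ and σ are re-estimated from the pilot, and the final analysis uses a naive z-test on all of the data; the slides do not specify the exact mechanism):

```python
import numpy as np
from scipy.stats import norm

def naive_type1(n0=85, n_pilot=9, alpha=0.05, beta=0.10,
                reps=20_000, seed=2):
    """Estimate the type I error rate of the naive z-test when n is
    re-estimated from pilot estimates of both delta and sigma."""
    rng = np.random.default_rng(seed)
    z_crit = norm.ppf(1 - alpha / 2)
    z_sum = z_crit + norm.ppf(1 - beta)
    rejections = 0
    for _ in range(reps):
        xp = rng.normal(0.0, 1.0, n_pilot)  # H0 is true: delta = 0
        yp = rng.normal(0.0, 1.0, n_pilot)
        delta_hat = max(abs(xp.mean() - yp.mean()), 1e-8)
        sigma_hat = np.sqrt((xp.var(ddof=1) + yp.var(ddof=1)) / 2)
        n1 = int(np.ceil(z_sum**2 * 2 * sigma_hat**2 / delta_hat**2))
        n = min(max(n0, n1), 100 * n0)  # cap the re-estimated size
        x = np.concatenate([xp, rng.normal(0.0, 1.0, n - n_pilot)])
        y = np.concatenate([yp, rng.normal(0.0, 1.0, n - n_pilot)])
        z = (x.mean() - y.mean()) / np.sqrt(2 / n)  # sigma = 1 known here
        rejections += abs(z) > z_crit
    return rejections / reps
```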

Unweighted Z Statistic

The naive approach weights all data equally; that is, the pilot-stage statistic $Z_{n_p}$ and the second-stage statistic $Z_{n-n_p}$ are combined with weights proportional to their sample sizes:
$$Z = \sqrt{\frac{n_p}{n}}\, Z_{n_p} + \sqrt{1-\frac{n_p}{n}}\, Z_{n-n_p}$$
where $n_p$ is the sample size of the pilot study and $n$ is the re-estimated sample size.

Weighted Z Statistic

An alternative approach is to use a weighted Z statistic:
$$Z = \sqrt{t}\, Z_{n_p} + \sqrt{1-t}\, Z_{n-n_p}$$
where $t = n_p/n_0$; that is, $t$ is the ratio of the pilot sample size to the initial sample size.
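A minimal sketch of this combination rule, assuming the two stage-wise statistics have already been computed (names are mine):

```python
import math

def weighted_z(z_pilot, z_rest, n_pilot, n0):
    """Combine stage-wise Z statistics with weights fixed by the
    ORIGINAL design, t = n_pilot / n0, regardless of the final n."""
    t = n_pilot / n0
    return math.sqrt(t) * z_pilot + math.sqrt(1 - t) * z_rest
```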

Weighted Z Statistic: Characteristics

The weighted Z statistic is a weighted average of the Z statistic calculated from the pilot data and the Z statistic calculated from the remaining data. This statistic controls the type I error rate. It accomplishes this by down-weighting the later data, which violates the principle of "one patient, one vote."

One Patient, One Vote

The principle of one patient, one vote says that each subject should have equal weight in the final analysis. The weighted Z statistic gives more weight to the initial pilot study. Furthermore, Chen, DeMets and Lan (2004) argue that the weighted Z statistic is unnecessary if the interim analysis is blinded.

An Alternative Approach

An alternative approach to sample size re-estimation, proposed by Mehta and Pocock (2010), uses conditional power. Conditional power refers to the conditional probability of rejecting the null hypothesis given the current data. This is similar to the predictive probability approach in the Bayesian paradigm.

Predictive Power Approach

At the interim analysis, we estimate $\hat\delta$ and determine the conditional power assuming $\delta = \hat\delta$. Mehta and Pocock consider three zones: unfavorable, promising, and favorable. A sketch of the conditional power calculation follows.
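A sketch of the conditional power calculation on the information scale, using the canonical independent-increments structure from earlier in the course (the parameterization and names are mine):

```python
from math import sqrt
from scipy.stats import norm

def conditional_power(z1, info1, info_max, delta, alpha=0.05):
    """P(Z at the final analysis exceeds the critical value | interim
    Z = z1, true effect delta). The score S_k = Z_k * sqrt(I_k) has
    independent increments with mean delta * I_k and variance I_k."""
    z_crit = norm.ppf(1 - alpha / 2)
    s1 = z1 * sqrt(info1)
    mean_inc = delta * (info_max - info1)
    sd_inc = sqrt(info_max - info1)
    return 1 - norm.cdf((z_crit * sqrt(info_max) - s1 - mean_inc) / sd_inc)
```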

Unfavorable

Unfavorable refers to cases with very low conditional power; in their example, Mehta and Pocock use 36% as the cut-off. In this case, an unreasonably large increase in the sample size would be required, and the trial is stopped for futility.

Favorable

Favorable refers to cases with high conditional power; in their example, Mehta and Pocock use 90% as the cut-off. In this case, we are likely to reject with the initial sample size and no increase is required. The trial continues as planned.

Promising

Promising refers to intermediate values of conditional power; in their example, Mehta and Pocock consider conditional power between 36% and 90% as promising. These trials are promising in the sense that there appears to be a difference between the two groups but we are underpowered to detect it. They suggest increasing the sample size in this case, as in the sketch below.
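The three-zone logic can then be sketched as follows, plugging $\delta = \hat\delta$ into the conditional_power helper above; the 36% and 90% cut-offs are the ones used in Mehta and Pocock's example:

```python
def promising_zone(z1, info1, info_max, delta_hat,
                   cp_low=0.36, cp_high=0.90):
    """Classify the interim result into one of the three zones."""
    cp = conditional_power(z1, info1, info_max, delta_hat)
    if cp < cp_low:
        return "unfavorable"   # stop for futility
    if cp < cp_high:
        return "promising"     # increase n, up to the pre-specified max
    return "favorable"         # continue with the planned sample size
```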

Mehta and Pocock Design

Determine an initial sample size based on an initial guess at the treatment effect, along with a maximum sample size for re-estimation. At the interim analysis, calculate the conditional power, evaluate it, and re-estimate the sample size accordingly. Test the null hypothesis at study completion using a modified test statistic to control the type I error rate. The key to this design is that the plan for re-estimation is pre-specified, which allows the type I error rate and power to be calculated.

Advantages

Sample size re-estimation allows us to salvage studies that are underpowered due to poor initial guesses at the effect size. The decision to re-estimate the sample size is based on an intuitive conditional power criterion. All decisions are pre-specified, which allows us to evaluate the type I error rate and power of the design.

Disadvantages

Statistical significance is closely tied to sample size: any difference will be statistically significant with a large enough sample size. A promising conditional power may therefore reflect a result that is not clinically significant, rather than an important difference that we are underpowered to detect.

Disadvantages

This design essentially represents a group sequential design with a pre-specified maximum sample size. You can construct a design with the same type I error rate and power by specifying the appropriate group sequential design. In essence, this really isn't new, and it potentially complicates the issue if not done correctly.

Sample Size Re-estimation: Summary

Sample size estimation depends on the correct specification of several important design parameters. Incorrectly specifying these parameters can lead to clinical trials that are dramatically underpowered. Internal pilot data can be used to estimate these unknown parameters. Sample size re-estimation based on nuisance parameters increases power when the nuisance parameters are misspecified, without increasing the type I error rate. Sample size re-estimation based on the true difference can inflate the type I error rate, and the test statistic must be adjusted to control it.