Pubh 8482: Sequential Analysis


Pubh 8482: Sequential Analysis Joseph S. Koopmeiners Division of Biostatistics University of Minnesota Week 10

Class Summary Last time... We began our discussion of adaptive clinical trials Specifically, we focused on Phase I clinical trials The goal of phase I is to evaluate the safety of a novel chemotherapeutic agent We discussed how this motivates adaptive designs and illustrated the pros and cons of adaptive designs

Phase II Clinical Trials Phase II is the second step in drug development Provides a bridge between evaluating toxicity and final, confirmatory clinical trials The focus in Phase II shifts from toxicity to efficacy Phase II clinical trials are much larger than Phase I (40-200 subjects)

Phase II Clinical Trials In Phase 1, we collect preliminary data on the safety profile, dose and administration schedule The goal of Phase II is to determine if the new drug has adequate efficacy to motivate a larger, phase III clinical trial In addition, we also further evaluate the toxicity profile of the novel agent Phase II clinical trials can be subdivided as follows: phase IIA trials phase IIB trials

Phase IIA The initial assessment of efficacy is made in phase IIA These are usually single-arm studies, comparing response rates to historical standards The goal at this phase is to eliminate ineffective agents

Phase IIB Multi-arm studies New drug is compared to standard of care or other experimental agents Goal is to identify most promising drugs for large confirmatory trials

Role of Phase II The number of candidate agents being developed has increased rapidly over the last 20 years Large, confirmatory, phase III clinical trials are expensive and time-consuming Phase II provides an important proving ground and allows for the winnowing out of ineffective agents

Design of Phase IIA Single-arm, open label studies Phase II trials are relatively small (at most about 100 subjects) Usually binary endpoints tumor response (i.e. did the tumor shrink?)

Stopping for futility in Phase IIA The majority of novel agents will not have acceptable efficacy We would prefer not to waste time and money studying an ineffective drug This motivates the use of two-stage designs that allow early termination for futility

Two-stage Simon Design The most popular two-stage design for phase II clinical trials is Simon's two-stage design Simon's two-stage design tests the response probability of a binary endpoint (usually tumor response) Simon's design has one interim analysis and allows early termination for futility but not efficacy There are two versions: Simon's optimal two-stage design Simon's minimax design

Two-stage Simon Design: Basic Design Simon's two-stage design is a one-arm design to test p = p_0 vs. p = p_a The basic set-up for both designs is as follows: Enroll n_1 subjects and let x_1 be the number of responses for the first n_1 subjects If x_1 ≤ c_1, stop for futility. Otherwise, enroll n_2 additional subjects and let x_2 be the number of responses for the next n_2 subjects Reject the null hypothesis if x_1 + x_2 > c_2

Simon's Optimal two-stage design The goal of Simon's optimal two-stage design is to find the design that minimizes the expected sample size under the null, E[SS | p_0] = P(x_1 ≤ c_1 | p_0) n_1 + P(x_1 > c_1 | p_0) (n_1 + n_2), among the class of two-stage designs with null hypothesis p_0, alternative hypothesis p_a, type I error α and type II error β

Simon's Minimax design The goal of Simon's minimax design is to find the design that minimizes the maximum sample size n_1 + n_2 among the class of two-stage designs with null hypothesis p_0, alternative hypothesis p_a, type I error α and type II error β

Simon's Two-stage design: Example Consider a two-stage design with p_0 = 0.2, p_a = 0.4, α = 0.1, β = 0.2 Optimal design: n_1 = 12, c_1 = 2, n_2 = 13, c_2 = 7 Minimax design: n_1 = 14, c_1 = 2, n_2 = 10, c_2 = 7 R code: library(clinfun); ph2simon(pu = 0.2, pa = 0.4, ep1 = 0.1, ep2 = 0.2)
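
As a cross-check on such designs, the exact operating characteristics of any two-stage design (n_1, c_1, n_2, c_2) follow directly from binomial probabilities. The sketch below is a minimal illustration (simon_oc is our own helper, not part of clinfun):

# Exact operating characteristics of a two-stage design (n1, c1, n2, c2)
# at a true response probability p: probability of early termination (PET),
# probability of rejecting H0, and expected sample size (ESS)
simon_oc <- function(n1, c1, n2, c2, p) {
  pet <- pbinom(c1, n1, p)                        # stop after stage 1 if x1 <= c1
  x1 <- (c1 + 1):n1                               # stage-1 results that continue
  p_reject <- sum(dbinom(x1, n1, p) *             # reject if x1 + x2 > c2
                    (1 - pbinom(c2 - x1, n2, p)))
  c(PET = pet, P_reject = p_reject, ESS = n1 + (1 - pet) * n2)
}

simon_oc(n1 = 12, c1 = 2, n2 = 13, c2 = 7, p = 0.2)  # P_reject is the type I error
simon_oc(n1 = 12, c1 = 2, n2 = 13, c2 = 7, p = 0.4)  # P_reject is the power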

Extensions to the Simon Design Several alternate multi-stage designs have been proposed Fleming proposed a two-stage design that allows early termination for futility and efficacy Bryant and Day developed a two-stage design that monitors both efficacy and toxicity Ensign et al. and Chen proposed three-stage designs In principle, these can be derived from the methods discussed in the first half of the course by calculating the exact operating characteristics for binomial data

Design of Phase IIB Usually larger than phase IIA (100-200 subjects) Randomized, multi-arm studies Endpoints: binary (tumor response) or survival (time to progression); overall survival would take too long

Type I error in phase IIB Larger type I error rates are acceptable in phase IIB α can be increased to 0.10 or more This is considered acceptable because the novel agent will be further evaluated in phase III Furthermore, at this point false negatives are a bigger concern than false positives That is, we don't want to miss a promising drug because of unnecessarily rigid error restrictions

Power in phase IIB Phase IIB studies are usually powered to detect a large treatment effect In this case, we are unlikely to find a statistically significant difference between groups One approach to overcome this problem is to embed a one-sample study within a randomized clinical trial In this way, we still have an avenue for advancing to phase III even if a significant difference is not observed between groups

Pick-the-Winner Design An alternate approach to randomized designs is the pick-the-winner approach For binary endpoints, the pick-the-winner approach treats patients with the treatment that appears to be the best at enrollment Pick-the-winner designs are designed to control the type II error and not the type I error These designs have much higher power than randomized designs These designs can substantially inflate the type I error rate

Pick-the-Winner Design: Limitations Furthermore, the pick-the-winner approach works best when there is a clear winner (i.e. when one treatment clearly has the best efficacy) The method struggles when several treatments have similar efficacy In addition, this design will pick a winner regardless of whether the best treatment has acceptable efficacy There is also no option to terminate early for futility

Summary of standard phase II designs Standard methods work well in simple cases but have limited flexibility Standard methods can easily accommodate one- and two-arm studies but do not perform well with multiple arms In addition, these designs lack the flexibility to accommodate additional research questions

Bayesian Adaptive Approaches Predictive probability approach Sequential stopping Adaptive randomization and dose allocation Hierarchical modeling

Predictive Probability Approach Consider the Phase IIA setting of the two-stage Simon Design In Phase IIA, the goal is to establish that the novel agent has adequate efficacy for further evaluation The two-stage Simon design offers an improvement over a fixed-sample design but is somewhat rigid in its structure One can extend this basic idea using the predictive probability approach

Predictive Probability: Basic Approach A maximum sample size is determined before the study begins Interim analyses are completed throughout the study At each interim analysis, we calculate the probability of rejecting the null hypothesis if the study reaches full enrollment The decision to continue the study is based on this probability

Predictive Probability for binomial data Let Y_1, Y_2, ... be i.i.d. Bernoulli random variables with probability p We would like to test the following hypotheses: H_0: p ≤ p_0 vs. H_1: p > p_0 Let N_max be the maximum number of subjects we can accrue

Likelihood and Prior Let X_j = Σ_{i=1}^{j} Y_i The likelihood after the first j observations is L(X_j | p) = (j choose X_j) p^{X_j} (1 - p)^{j - X_j} We will put a Beta(a, b) prior on p This results in a Beta posterior: p | X_j ~ Beta(a + X_j, b + j - X_j)
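
As a minimal illustration of this conjugate update in R (the interim numbers and null value below are arbitrary, chosen only for the example):

a <- 1; b <- 1                   # Beta(a, b) prior parameters
j <- 20; x <- 8                  # hypothetical interim data: 8 responses in 20 subjects
p0 <- 0.3                        # hypothetical null response rate
1 - pbeta(p0, a + x, b + j - x)  # posterior P(p > p0 | X_j) under Beta(a + x, b + j - x)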

Predictive Probability For simplicity, assume that we will reject the null hypothesis at study completion if P(p > p_0 | X_{N_max}) > γ for a pre-specified γ Let X_{N_max - j} = X_{N_max} - X_j = Σ_{i=j+1}^{N_max} Y_i At each interim analysis, we will calculate the predictive probability, which is defined as: PP = P( P(p > p_0 | X_j, X_{N_max - j}) > γ | X_j )

Calculating the Predictive Probability The predictive probability is calculated as follows: PP = ∫∫ I{P(p > p_0 | X_j, x_{N_max - j}) > γ} f(x_{N_max - j} | p) f(p | x_j) dx_{N_max - j} dp = ∫ I{P(p > p_0 | X_j, x_{N_max - j}) > γ} f(x_{N_max - j} | x_j) dx_{N_max - j}, where the second line integrates out p and makes use of the fact that: X_{N_max - j} | x_j ~ Beta-Binomial(N_max - j, a + x_j, b + j - x_j)

Calculating the Predictive Probability The previous integral can be re-written as a sum as follows: PP = Σ_{i=0}^{N_max - j} P(X_{N_max - j} = i | x_j) · I{P(p > p_0 | x_j, X_{N_max - j} = i) > γ}, where P(X_{N_max - j} = i | x_j) is calculated from the beta-binomial distribution

Predictive Probability algorithm Identify N_max At each interim analysis, we calculate the predictive probability (PP) If PP < θ_L, stop the trial and reject the alternative hypothesis If PP > θ_U, stop the trial and reject the null hypothesis Otherwise, continue the trial until N_max is reached

Predictive Probability Example Consider a phase IIA trial with the following characteristics: N_max = 40, p_0 = 0.60, γ = 0.90, Beta(0.6, 0.4) prior on p

Predictive Probability Example Consider an interim analysis with j = 23 and x_j = 16 Then p | x_j ~ Beta(16.6, 7.4), X_{N_max - j} | x_j ~ Beta-Binomial(17, 16.6, 7.4), and p | x_j, X_{N_max - j} = i ~ Beta(16.6 + i, 24.4 - i)

Predictive Probability Example

i    P(X_{N_max - j} = i | x_j)   P(p > 0.60 | x_j, X_{N_max - j} = i)   I{P(p > 0.60 | x_j, X_{N_max - j} = i) > 0.90}
0    0.000                        0.01                                   0
1    0.000                        0.01                                   0
2    0.000                        0.02                                   0
3    0.001                        0.06                                   0
4    0.002                        0.10                                   0
5    0.006                        0.17                                   0
6    0.014                        0.27                                   0
7    0.028                        0.38                                   0
8    0.050                        0.51                                   0
9    0.079                        0.63                                   0
10   0.113                        0.75                                   0
11   0.143                        0.84                                   0
12   0.159                        0.91                                   1
13   0.153                        0.95                                   1
14   0.125                        0.98                                   1
15   0.081                        0.99                                   1
16   0.038                        0.997                                  1
17   0.001                        0.999                                  1

Predictive Probability Example P(p > 0.60 | x_j, X_{N_max - j} = i) < 0.90 for i = 0, ..., 11 and P(p > 0.60 | x_j, X_{N_max - j} = i) > 0.90 for i = 12, ..., 17 Therefore, the predictive probability is P(X_{N_max - j} ∈ {12, ..., 17} | x_j) = 0.567 If, say, θ_L = 0.10 and θ_U = 0.90, then the trial continues to enroll more subjects
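
The calculation above can be reproduced in a few lines of R. This is a sketch (pred_prob is our own function, not from a package); it enumerates the possible numbers of future responses, weights the indicator by the beta-binomial probabilities, and sums. Applied to the example (x = 16, j = 23, N_max = 40, p_0 = 0.60, γ = 0.90, Beta(0.6, 0.4) prior), it should reproduce the predictive probability reported above, up to rounding.

# Predictive probability of rejecting H0 at N_max, given x responses in j subjects,
# a Beta(a, b) prior, null value p0 and posterior threshold gamma
pred_prob <- function(x, j, Nmax, p0, gamma, a, b) {
  m <- Nmax - j                                        # number of future subjects
  i <- 0:m                                             # possible numbers of future responses
  pmf <- choose(m, i) * beta(a + x + i, b + Nmax - x - i) /
    beta(a + x, b + j - x)                             # beta-binomial P(X_{Nmax-j} = i | x_j)
  post <- 1 - pbeta(p0, a + x + i, b + Nmax - x - i)   # P(p > p0 | x_j, X_{Nmax-j} = i)
  sum(pmf * (post > gamma))
}

pred_prob(x = 16, j = 23, Nmax = 40, p0 = 0.60, gamma = 0.90, a = 0.6, b = 0.4)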

Frequency of Interim Analyses In principle, these calculations can be completed after each subject Berry et al. recommend calculating predictive probabilities after each subject starting with the tenth subject Alternately, you could calculate the predictive probability after every 5 or 10 subjects This should be specified in advance

Designing a Predictive Probability Trial We must specify the following parameters: N_max, γ, θ_L and θ_U We would like to identify a design that achieves the desired type I error rate and power Designs with the desired operating characteristics can be found by searching over combinations of these parameters

Designing a Predictive Probability Trial: Example Consider a phase II clinical trial to test the tumor response for a novel treatment with p_0 = 0.2, p_1 = 0.4, α = 0.10, β = 0.10

Designing a Predictive Probability Trial: Example Simon's optimal two-stage design: n_1 = 17, c_1 = 3, n_2 = 20, c_2 = 10 Simon's minimax two-stage design: n_1 = 19, c_1 = 3, n_2 = 17, c_2 = 10

Designing a Predictive Probability Trial: Example The desired operating characteristics can be achieved by many combinations of N_max, γ, θ_L and θ_U One particular design is N_max = 37, γ = 0.90, θ_L = 0.011, θ_U = 1.0 (i.e. only allow early termination for futility)

Designing a Predictive Probability Trial: Example

Design                   E(N | p_0)   α       β
Optimal                  26.02        0.095   0.097
Minimax                  28.26        0.086   0.098
Predictive Probability   25.13        0.099   0.084

Predictive Probability Designs The decision to terminate the trial is based on the predictive probability The predictive probability is the probability of rejecting the null hypothesis if the study were to continue to the maximum sample size We illustrated that the predictive probability design can have better operating characteristics than the Simon design The drawback is that these designs are a bit more complicated and require more frequent, near-continuous monitoring

Sequential Stopping for Efficacy and Futility The predictive probability approach uses the probability of rejecting had we reached full enrollment as the basis for early termination Alternately, we could follow an approach to identifying stopping rules similar to the one used in the first half of the course That is, we specify a statistical model and determine critical values that produce the desired operating characteristics

Sequential Stopping for Efficacy and Futility Consider the simplest case of a one-arm trial to evaluate the response rate of a new treatment: y_1, ..., y_{N_max} i.i.d. Bern(p) Let p = p_0 be the rate of response for the standard of care We would like to show that the new treatment improves the response rate by at least δ to warrant further investigation That is, we require p > p_0 + δ

Sequential Stopping for Efficacy and Futility: Decision Rules Sequential monitoring of the null hypothesis is carried out by evaluating the following posterior probability: π_n = P(p > p_0 + δ | y_1, ..., y_n) Decisions to continue would be based on the following rule: if π_n > U_n, stop for efficacy; if L_n < π_n < U_n, continue; if π_n < L_n, stop for futility

Sequential Stopping for Efficacy and Futility: Decision Rules The primary difference between this approach and the group sequential procedures discussed earlier in the class is that we are proposing continuous monitoring L_n and U_n can be any shape, as before One simple approach is to make L_n and U_n constants L_n and U_n are set to achieve the desired operating characteristics

Sequential Stopping for Efficacy and Futility: Operating Characteristics In principle, operating characteristics can be calculated analytically for binomial endpoints This can be very difficult in the case of continuous monitoring An alternate approach is to evaluate the design via simulation We iteratively evaluate combinations of U n and L n to identify combinations that achieve the desired operating characteristics

Sequential Stopping for Efficacy and Futility: Example Consider a one-arm trial to evaluate the response rate of a novel agent We will assume a Beta(1, 1) prior for the response probability, p The posterior after j subjects will be Beta(1 + Σ_{i=1}^{j} y_i, 1 + j - Σ_{i=1}^{j} y_i) The response rate for the standard of care is p_0 = 0.20 The new treatment would have to have a response rate greater than p_0 + 0.10 = 0.30 to warrant further investigation

Sequential Stopping for Efficacy and Futility: Example We would like to limit the type I error rate to 0.10 In this case, the type I error rate is the probability of concluding that the drug is promising if p < p_0 We would like 90% power to reject the null hypothesis for some p > p_0 + δ For this example, we power our study assuming a response rate of 0.4

Sequential Stopping for Efficacy and Futility: Example We evaluate the operating characteristics of our study using simulation We can vary the following parameters: U_n, L_n, N_max, and r, the first subject at which we are willing to terminate the study

Sequential Stopping for Efficacy and Futility: Example

N_max   r    L_n    U_n    α       E(SS | p = 0.20)   1 - β   E(SS | p = 0.40)
50      5    0.10   0.90   0.087   21.2               0.780   18.8
50      10   0.10   0.90   0.048   25.0               0.778   23.8
50      5    0.05   0.90   0.083   28.6               0.805   20.6
50      5    0.10   0.80   0.199   17.1               0.882   12.3
70      5    0.10   0.90   0.083   23.1               0.836   21.1
70      10   0.10   0.90   0.042   28.2               0.846   26.3
70      5    0.05   0.90   0.080   33.4               0.862   23.0
70      5    0.10   0.80   0.199   18.5               0.898   12.9
90      5    0.10   0.90   0.078   24.7               0.860   22.4
90      10   0.10   0.90   0.043   29.8               0.876   27.9
90      5    0.05   0.90   0.084   35.7               0.897   25.0
90      10   0.10   0.80   0.203   18.9               0.903   13.3
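
Rows of this table can be approximated by straightforward Monte Carlo. The sketch below is our own minimal version (function and variable names are ours); it assumes that a trial reaching N_max without crossing either boundary does not reject, which is one natural reading of the design, and its results will match the table only up to simulation error.

# Simulate one trial of the continuously monitored design:
# Beta(1, 1) prior, monitoring starts at subject r,
# stop for efficacy if P(p > p0 + delta | data) > U, for futility if < L
sim_trial <- function(p_true, Nmax, r, L, U, p0 = 0.20, delta = 0.10) {
  y <- rbinom(Nmax, 1, p_true)
  for (n in r:Nmax) {
    post <- 1 - pbeta(p0 + delta, 1 + sum(y[1:n]), 1 + n - sum(y[1:n]))
    if (post > U) return(c(reject = 1, n = n))   # stop early for efficacy
    if (post < L) return(c(reject = 0, n = n))   # stop early for futility
  }
  c(reject = 0, n = Nmax)                        # inconclusive at Nmax: do not reject
}

set.seed(1)
null_res <- replicate(10000, sim_trial(0.20, Nmax = 90, r = 5, L = 0.05, U = 0.90))
alt_res  <- replicate(10000, sim_trial(0.40, Nmax = 90, r = 5, L = 0.05, U = 0.90))
rowMeans(null_res)   # approx. alpha and E(SS | p = 0.20)
rowMeans(alt_res)    # approx. 1 - beta and E(SS | p = 0.40)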

Sequential Stopping for Efficacy and Futility: Example The design with N_max = 90, r = 5, L_n = 0.05 and U_n = 0.90 had the following operating characteristics: α = 0.084, 1 - β = 0.897, E(SS | p = 0.20) = 35.7, E(SS | p = 0.40) = 25.0

Sequential Stopping for Efficacy and Futility: Example Tweaking the design slightly to N_max = 90, r = 5, L_n = 0.05 and U_n = 0.88 results in the following operating characteristics: α = 0.094, 1 - β = 0.912, E(SS | p = 0.20) = 35.4, E(SS | p = 0.40) = 23.2 In comparison, a fixed-sample design would require approximately 40 subjects

Sequential Stopping for Efficacy and Futility: Example The preceding was a somewhat haphazard search but illustrates the basic principles In reality, there are many designs that result in the correct type I and type II error rates In practice, one should complete a thorough search and compare the expected sample sizes of several potential designs

Sequential Stopping for Efficacy, Futility and Toxicity The previous design considers early termination for efficacy and futility but ultimately only considers one outcome: response In phase II, we are also interested in further evaluating the toxicity profile of a new drug In addition, only a small number of subjects were enrolled in phase I and we need to protect against a drug that is excessively toxic This motivates extensions to the previous designs that consider early termination for toxicity as well as efficacy and futility

Sequential Stopping for Efficacy, Futility and Toxicity Thall et al. (1995) extended the previous design to accommodate sequential stopping for efficacy, futility and toxicity This is essentially a phase IIA analog to the efficacy/toxicity trade-off designs we discussed for phase I The difference, though, is that we only consider a single dose and, therefore, the focus is on testing rather than dose-finding

Sequential Stopping for Efficacy, Futility and Toxicity: Outcomes We are now considering two binary outcomes: Efficacy: tumor response Toxicity: toxicity vs. no toxicity This can be thought of as a multinomial random variable with categories A_1: response with toxicity A_2: no response with toxicity A_3: response with no toxicity A_4: no response and no toxicity In this case: efficacy = A_1 ∪ A_3 and toxicity = A_1 ∪ A_2

Sequential Stopping for Efficacy, Futility and Toxicity: Probability Model We assume that our data follow a multinomial distribution with probabilities (p_1, p_2, p_3, p_4) We assume a Dirichlet prior for (p_1, p_2, p_3, p_4): (p_1, p_2, p_3, p_4) ~ Dir(θ_1, θ_2, θ_3, θ_4) This results in the following posterior distribution: (p_1, p_2, p_3, p_4) | y ~ Dir(θ_1 + y_1, θ_2 + y_2, θ_3 + y_3, θ_4 + y_4), where y_1, ..., y_4 are the counts for each of the four outcomes

Sequential Stopping for Efficacy, Futility and Toxicity: Probability Model We will make decisions by monitoring the marginal probabilities of efficacy and toxicity: P(efficacy) = p_E = p_1 + p_3 and P(toxicity) = p_T = p_1 + p_2 The posterior distributions for each of these two probabilities are: p_E | y ~ Beta(θ_1 + y_1 + θ_3 + y_3, θ_2 + y_2 + θ_4 + y_4) and p_T | y ~ Beta(θ_1 + y_1 + θ_2 + y_2, θ_3 + y_3 + θ_4 + y_4) This follows because aggregating the components of a Dirichlet into two groups yields a Beta distribution, the special case of the Dirichlet with only two probabilities

Sequential Stopping for Efficacy, Futility and Toxicity: Decision rules Let p_{E,s} be the probability of response for the standard of care Let p_{T,s} be the probability of toxicity for the standard of care We make our decisions about terminating the trial based on similar posterior probabilities as before: π_{n,E} = P(p_E > p_{E,s} + δ_E | y) and π_{n,T} = P(p_T > p_{T,s} + δ_T | y)
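
A minimal sketch of these interim quantities in R (eff_tox_probs is our own helper; the prior parameters, counts and thresholds below are arbitrary illustrations):

# Posterior probabilities pi_{n,E} and pi_{n,T} given a Dirichlet(theta) prior and
# counts y = (y1, y2, y3, y4) for A1 = response/toxicity, A2 = no response/toxicity,
# A3 = response/no toxicity, A4 = no response/no toxicity
eff_tox_probs <- function(y, theta, pE_s, pT_s, dE, dT) {
  a <- theta + y                                           # Dirichlet posterior parameters
  piE <- 1 - pbeta(pE_s + dE, a[1] + a[3], a[2] + a[4])    # P(p_E > p_{E,s} + delta_E | y)
  piT <- 1 - pbeta(pT_s + dT, a[1] + a[2], a[3] + a[4])    # P(p_T > p_{T,s} + delta_T | y)
  c(pi_E = piE, pi_T = piT)
}

eff_tox_probs(y = c(5, 3, 12, 10), theta = c(1, 1, 1, 1),  # 30 subjects, hypothetical counts
              pE_s = 0.30, pT_s = 0.20, dE = 0.10, dT = 0.05)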

Sequential Stopping for Efficacy, Futility and Toxicity: Decision rules The previous probabilities can be interpreted as follows: We hope to show that the new drug resulted in an improved response rate We do not want the probability of toxicity to exceed the standard of care by more than a small amount

Sequential Stopping for Efficacy, Futility and Toxicity: Sequential Monitoring The decision to continue is now updated to the following: if π_{n,E} > U_{n,E} and π_{n,T} < U_{n,T}, stop for efficacy; if π_{n,E} < L_{n,E}, stop for futility; if π_{n,T} > U_{n,T}, stop for excess toxicity; otherwise, continue

Sequential Stopping for Efficacy, Futility and Toxicity: Operating Characteristics The study can be designed to achieve the desired operating characteristics by altering the following parameters: U_{n,E}, which controls stopping for efficacy; L_{n,E}, which controls stopping for futility; and U_{n,T}, which controls stopping for excess toxicity The operating characteristics of this design can be evaluated by simulation U_{n,E}, L_{n,E} and U_{n,T} are chosen through an iterative procedure, as before

Sequential Stopping for Efficacy, Futility and Toxicity: Operating Characteristics It should be noted that there are now four possible true scenarios: improved efficacy, acceptable toxicity; improved efficacy, unacceptable toxicity; unacceptable efficacy, acceptable toxicity; unacceptable efficacy, unacceptable toxicity There are only two possible conclusions: the drug is promising (i.e. safe and efficacious) or the drug is not promising This complicates standard concepts of type I error and power You should consider the probability of rejecting under each scenario

Sequential Stopping in Phase II: Summary We discussed two scenarios for sequential monitoring in Phase II: Sequential stopping for efficacy and futility Sequential stopping for efficacy, futility and toxicity These methods are similar to the group sequential procedures discussed in the first half of the course Specify a probability model Develop critical values that result in desired operating characteristics These designs utilize continuous monitoring, which can reduce expected sample size Continuous monitoring also increases the burden on study personnel

Multiple Subpopulations The previous designs are appropriate for evaluating a novel agent in a single disease We are often interested in evaluating the efficacy of a novel agent in several diseases We may not have identified a target disease after phase I We have several potential targets but need to narrow down for phase III We are interested in the efficacy of the novel agent in sub-populations of a single disease

Pooling Across Groups One approach to the previous problem is to pool across diseases This only makes sense if the drug is equally efficacious for the various diseases It is more likely that the drug has different effects in the various populations, making interpretation difficult Furthermore, the drug could be very effective in one disease but the effect could be washed out because the drug is ineffective for other diseases

Running Individual Trials Instead, you could run individual clinical trials for each disease or sub-population This avoids the problems of pooling but is inefficient: Running multiple trials is time-consuming and expensive There may not be enough subjects to run multiple trials It is also possible that information about the effectiveness of the drug in one disease can be found in the other trials

Hierarchical Modeling in Phase II An alternate approach is to model the data hierarchically This allows information to be shared across the various disease classifications or sub-populations This avoids the problem of pooling across populations that may be different This will also be more efficient than running independent trials for each population

Hierarchical Modeling: Basic Approach Let j = 1, ..., J index J sub-populations Let y_j and θ_j generically represent the data and parameters The data are modeled hierarchically as follows: P(y_j | θ_j), P(θ_j | φ), where φ controls the borrowing of information across groups

Hierarchical Modeling: Binary Data Recall that in phase II, the outcome is usually tumor response, measured as a binary outcome Let y_j be the number of responses from n_j subjects for sub-population j: y_j ~ Bin(n_j, p_j), where p_j is the probability of tumor response for sub-population j

Hierarchical Modeling: Modeling p_j The probability of tumor response, p_j, is transformed using the logit transformation: θ_j = log(p_j / (1 - p_j)) We put a normal prior on θ_j to facilitate the borrowing of information across groups: θ_j ~ N(µ, τ)

Hierarchical Modeling: Hyperparameters We put a normal hyperprior on µ: µ ~ N(m_µ, s_µ) And a gamma hyperprior on τ: τ ~ Gamma(a_0, b_0)
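
To make the model concrete, here is a minimal sketch of its unnormalized log posterior in R (our own code, assuming τ is a precision so that θ_j has standard deviation 1/√τ, which is the parameterization under which a Gamma hyperprior is standard; in practice this model would be fit with general-purpose MCMC software). The data and hyperparameter values below are arbitrary illustrations.

# Unnormalized log posterior for the hierarchical phase II model:
#   y_j ~ Bin(n_j, p_j),  logit(p_j) = theta_j,  theta_j ~ N(mu, 1/tau),
#   mu ~ N(m_mu, s_mu),   tau ~ Gamma(a0, b0)
log_post <- function(theta, mu, tau, y, n, m_mu, s_mu, a0, b0) {
  p <- plogis(theta)                                      # inverse logit
  sum(dbinom(y, n, p, log = TRUE)) +                      # binomial likelihood
    sum(dnorm(theta, mu, 1 / sqrt(tau), log = TRUE)) +    # prior on theta_j
    dnorm(mu, m_mu, s_mu, log = TRUE) +                   # hyperprior on mu
    dgamma(tau, shape = a0, rate = b0, log = TRUE)        # hyperprior on tau
}

y <- c(4, 9, 6); n <- c(20, 25, 18)                       # hypothetical data, J = 3 groups
log_post(theta = qlogis(c(0.20, 0.35, 0.30)), mu = -1, tau = 2,
         y = y, n = n, m_mu = 0, s_mu = 2, a0 = 1, b0 = 1)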

Hierarchical Modeling: Conducting the Trial The previous model allows us to share information across sub-populations This does not fundamentally change how the trial is conducted The decision as to whether or not to continue the trial is made for each sub-population We may terminate the study within one sub-population while the others continue The trial continues until all sub-populations have terminated or reached a maximum sample size

Randomization in Phase II We have so far only discussed methods for phase IIA clinical trials Phase IIA trials are single arm trials comparing measures of efficacy to historical standards In Phase IIB, we begin to consider randomized trials to evaluate a novel agent Randomize subjects to novel treatment or standard of care Randomize subjects to one of several novel treatments

Standard Randomization The standard approach to randomizing subjects is to specify a fixed randomization ratio at the beginning of the study 1:1 - treatment:control 2:1 - treatment:control The disadvantage to this approach is that a large number of subjects may be randomized to an ineffective treatment In addition, it may be beneficial to limit enrollment to a subset of treatments that appear promising in a trial where several treatments are being considered

Adaptive Randomization An alternate approach is adaptive randomization In adaptive randomization, we adapt the randomization ratio based on interim results We increase the number of subjects randomized to promising treatments and limit the number randomized to ineffective treatments This has the advantage of maximizing the number of subjects receiving optimal care

Adaptive Randomization Adaptive randomization has been studied extensively throughout the history of statistics (Thompson, 1933, Louis, 1975, 1977) Thall and Wathen (2007) provide a review of adaptive randomization They propose a practical approach to adaptive randomization

Basic Approach Let A_1 and A_2 be the two arms of the trial Let θ_1 and θ_2 be the response rate for each arm If y represents the available data, we would randomize subjects to A_1 and A_2 with probabilities proportional to P(θ_1 > θ_2 | y)^c and (1 - P(θ_1 > θ_2 | y))^c, respectively, where c = n / (2N) and N is the maximum number of patients

Basic Approach Let N = 100 and P(θ_1 > θ_2 | y) = 0.7 This results in the following randomization ratios at various times in the study: n = 20, randomize (0.7/0.3)^{20/200} : 1 or 1.09 : 1; n = 60, randomize (0.7/0.3)^{60/200} : 1 or 1.29 : 1; n = 90, randomize (0.7/0.3)^{90/200} : 1 or 1.46 : 1

Generalizing to Multiple Groups This approach can easily be generalized to more than two groups In the case of more than 2 groups, the probability of being randomized to each arm is proportional to: P(θ_j = max_k θ_k | y)^c

Generalizing to Multiple Groups: Example Let N = 100, P(θ_1 = max_k θ_k | y) = 0.3, P(θ_2 = max_k θ_k | y) = 0.5, P(θ_3 = max_k θ_k | y) = 0.2 This results in the following randomization ratios at various times in the study: n = 20, randomize 1.04 : 1.10 : 1; n = 60, randomize 1.13 : 1.32 : 1; n = 90, randomize 1.20 : 1.51 : 1
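
The ratios above can be reproduced directly (a short sketch; rand_probs is our own helper and should return approximately the ratios shown, up to rounding):

# Randomization probabilities proportional to P(theta_j = max_k theta_k | y)^c, c = n / (2N)
rand_probs <- function(p_best, n, N) {
  w <- p_best^(n / (2 * N))
  w / sum(w)                                   # normalize to randomization probabilities
}

p_best <- c(0.3, 0.5, 0.2)                     # P(theta_j = max_k theta_k | y) for the 3 arms
for (n in c(20, 60, 90)) {
  pr <- rand_probs(p_best, n, N = 100)
  cat(n, ":", round(pr / pr[3], 2), "\n")      # ratios relative to arm 3
}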

Conducting an Adaptive Randomization Trial Start by identifying a maximum sample size Subjects will be enrolled and sequentially monitored using continuous or group sequential monitoring We must specify the following decision rules for terminating the study: early loser, early winner, futility, final winner

Early Loser Enrollment in an arm will stop if there is overwhelming evidence that the arm is not the best treatment This can be formalized as the following: P(θ_j = max_k θ_k | y) < p_L That is, the probability that treatment j is the best is very low In this case, enrollment in treatment j would stop and the trial continues only with the remaining treatments

Early Winner The trial stops and we identify the best treatment when there is overwhelming evidence that treatment j is the best treatment Formally, we stop and declare treatment j the winner if P(θ_j = max_k θ_k | y) > p_U In this case, the trial terminates and we declare treatment j the winner

Futility We could also stop an arm for futility if there is overwhelming evidence that the treatment does not meet some historical standard Formally: P(θ_j > θ_min | y) < P_F In this case, enrollment in arm j stops and the trial continues with the other arms

Final Winner At study completion, we identify treatment j as the best treatment if there is strong evidence that it is the best That is, P(θ_j = max_k θ_k | y) > p_{U,final} p_{U,final} is set less than p_U to increase the chance of picking a winner No treatment is selected if the above statement is false for all arms

Monitoring an Adaptive Randomization Trial At study initiation, subjects are randomized equally to each arm The randomization ratios are updated throughout the study based on the available data Individual arms or the entire study are terminated based on the pre-specified stopping rules If the study does not terminate early, the final decision is made at the maximum sample size using the pre-specified decision rule

Designing an Adaptive Randomization Trial As before, the operating characteristics of our study are not usually available analytically Instead, we conduct simulation studies to evaluate our design Simulations are completed under a variety of scenarios and identify design parameters that result in acceptable characteristics in a variety of settings

Adaptive Randomization: Example Consider the following example from Berry et al. (2012) Researchers would like to complete a phase II clinical trial to evaluate a novel sensitizer that enhances the efficacy of chemotherapy when given in combination This will be a two-arm study Arm 1 - chemotherapy alone Arm 2 - chemotherapy + sensitizer

Adaptive Randomization: Example We would like to compare the response rate for the two arms The clinician has good prior information on the two arms and prior distributions for the response rates were derived using the following prior mean and standard deviation for each response rate Arm 1 - prior mean = 0.55, prior standard deviation = 0.10 Arm 2 - prior mean = 0.75, prior standard deviation = 0.13 Beta priors for the two response rates can be derived from the prior mean and standard deviation for each response rate
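
Matching a Beta prior to a given mean and standard deviation uses the usual moment relations: for Beta(α, β), m = α/(α + β) and s² = m(1 − m)/(α + β + 1). A minimal sketch (beta_from_moments is our own helper):

# Beta parameters implied by a prior mean m and prior standard deviation s
beta_from_moments <- function(m, s) {
  total <- m * (1 - m) / s^2 - 1               # alpha + beta
  c(alpha = m * total, beta = (1 - m) * total)
}

beta_from_moments(0.55, 0.10)   # Arm 1: chemotherapy alone
beta_from_moments(0.75, 0.13)   # Arm 2: chemotherapy + sensitizer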

Adaptive Randomization: Example Consider a design with the following parameters: maximum sample size = 60, p_L = 0.025, p_U = 0.975, θ_min = 0.50 and P_F = 0.05, p_{U,final} = 0.90 Operating characteristics of our study can be evaluated by simulation

Operating Characteristics: Null Hypothesis

Arm     P(Eff)   P(Sel)   P(ES)   P(stop)   ASN    (2.5%, 97.5%)
Arm 1   0.55     0.01     0.00    0.11      19.6   (5, 38)
Arm 2   0.55     0.16     0.11    0.00      35.6   (8, 53)

Operating Characteristics: Alternative Hypothesis

Arm     P(Eff)   P(Sel)   P(ES)   P(stop)   ASN    (2.5%, 97.5%)
Arm 1   0.55     0.0      0.0     0.55      10.1   (4, 22)
Arm 2   0.70     0.74     0.55    0.0       30.8   (4, 51)

Operating Characteristics: Best Case Scenario

Arm     P(Eff)   P(Sel)   P(ES)   P(stop)   ASN    (2.5%, 97.5%)
Arm 1   0.55     0.0      0.0     0.89      7.01   (4, 16)
Arm 2   0.80     0.96     0.89    0.0       20.1   (4, 51)

Adaptive Randomization: Operating Characteristics The adaptive design has a type I error rate of 17% and 74% power The expected sample size under each scenario is: null hypothesis: 55; alternative hypothesis: 41; best case scenario: 27 In contrast, a corresponding fixed-sample design would require 67 subjects/group

Adaptive Randomization: Operating Characteristics These simulations results represent the operating characteristics in only a small number of scenarios In practice, you would consider a much broader set of scenarios In addition, you may also vary design parameters to determine their impact on the operating characteristics priors for response rates stopping rule parameters

Adaptive Randomization: Summary Standard randomized designs randomize subjects using a fixed ratio This results in a large number of subjects randomized to inferior treatments Adaptive randomization updates the randomization allocation as the study continues This results in shorter trials and more subjects treated with better treatments

Phase II Summary Phase II clinical trials provide the first look at the efficacy of a novel agent The goal of Phase II is to determine if a novel agent has sufficient efficacy to warrant a much larger phase III clinical trial In addition, we hope to answer several secondary questions about the drug: best dose, optimal population

Phase II Summary The wide array of questions asked in phase II motivates the use of adaptive designs We discussed several approaches to adaptive trial design in phase II predictive probability sequential monitoring hierarchical modeling adaptive randomization The goal of these methods is to answer our scientific question as quickly as possible and to treat as many subjects as possible with the optimal treatment