The Role of Operating Characteristics in Assesing Bayesian Designs in Pharmaceutical Development

Similar documents
Bayesian Methods in Clinical Research

Solutions to Exercises

Type I error rate control in adaptive designs for confirmatory clinical trials with treatment selection at interim

Pubh 8482: Sequential Analysis

Interim Monitoring of Clinical Trials: Decision Theory, Dynamic Programming. and Optimal Stopping

Group Sequential Designs: Theory, Computation and Optimisation

Group Sequential Tests for Delayed Responses. Christopher Jennison. Lisa Hampson. Workshop on Special Topics on Sequential Methodology

The Design of Group Sequential Clinical Trials that Test Multiple Endpoints

Optimising Group Sequential Designs. Decision Theory, Dynamic Programming. and Optimal Stopping

Testing a secondary endpoint after a group sequential test. Chris Jennison. 9th Annual Adaptive Designs in Clinical Trials

Fractional Replications

Group sequential designs for Clinical Trials with multiple treatment arms

Adaptive Designs: Why, How and When?

Monitoring clinical trial outcomes with delayed response: incorporating pipeline data in group sequential designs. Christopher Jennison

19. Blocking & confounding

The SEQDESIGN Procedure

Bios 6649: Clinical Trials - Statistical Design and Monitoring

Sequential Monitoring of Clinical Trials Session 4 - Bayesian Evaluation of Group Sequential Designs

Two-stage Adaptive Randomization for Delayed Response in Clinical Trials

Pubh 8482: Sequential Analysis

Adaptive designs beyond p-value combination methods. Ekkehard Glimm, Novartis Pharma EAST user group meeting Basel, 31 May 2013

Group Sequential Tests for Delayed Responses

The Design of a Survival Study

Bios 6649: Clinical Trials - Statistical Design and Monitoring

Comparison of Three Calculation Methods for a Bayesian Inference of Two Poisson Parameters

Chapter 30 Design and Analysis of

Adaptive Treatment Selection with Survival Endpoints

A simulation study for comparing testing statistics in response-adaptive randomization

Construction of Mixed-Level Orthogonal Arrays for Testing in Digital Marketing

Modeling & Simulation to Improve the Design of Clinical Trials

Bayesian concept for combined Phase 2a/b trials

Sample Size Determination

BMA-Mod : A Bayesian Model Averaging Strategy for Determining Dose-Response Relationships in the Presence of Model Uncertainty

The One-Quarter Fraction

An extrapolation framework to specify requirements for drug development in children

Case Study in the Use of Bayesian Hierarchical Modeling and Simulation for Design and Analysis of a Clinical Trial

Sample size re-estimation in clinical trials. Dealing with those unknowns. Chris Jennison. University of Kyoto, January 2018

Confidence Distribution

Adaptive, graph based multiple testing procedures and a uniform improvement of Bonferroni type tests.

Comparing Adaptive Designs and the. Classical Group Sequential Approach. to Clinical Trial Design

Fractional Factorial Designs

CHL 5225H Advanced Statistical Methods for Clinical Trials: Multiplicity

Estimation in Flexible Adaptive Designs

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH

A Clinical Trial Simulation System, Its Applications, and Future Challenges. Peter Westfall, Texas Tech University Kuenhi Tsai, Merck Research Lab

Group-Sequential Tests for One Proportion in a Fleming Design

SAS/STAT 15.1 User s Guide The SEQDESIGN Procedure

MATH602: APPLIED STATISTICS

c Copyright 2014 Navneet R. Hakhu

Group Sequential Trial with a Biomarker Subpopulation

FRACTIONAL REPLICATION

Statistics in medicine

Bayesian Enhancement Two-Stage Design for Single-Arm Phase II Clinical. Trials with Binary and Time-to-Event Endpoints

Adaptive Extensions of a Two-Stage Group Sequential Procedure for Testing a Primary and a Secondary Endpoint (II): Sample Size Re-estimation

Assessing the Effect of Prior Distribution Assumption on the Variance Parameters in Evaluating Bioequivalence Trials

Optimal group sequential designs for simultaneous testing of superiority and non-inferiority

a Sample By:Dr.Hoseyn Falahzadeh 1

3.4. A computer ANOVA output is shown below. Fill in the blanks. You may give bounds on the P-value.

Designing multi-arm multi-stage clinical trials using a risk-benefit criterion for treatment selection

TWO-LEVEL FACTORIAL EXPERIMENTS: IRREGULAR FRACTIONS

TWO-LEVEL FACTORIAL EXPERIMENTS: REGULAR FRACTIONAL FACTORIALS

Inverse Sampling for McNemar s Test

Statistical Design and Analysis of Experiments Part Two

Adaptive Dunnett Tests for Treatment Selection

4. Issues in Trial Monitoring

Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression:

A comparison of arm-based and contrast-based approaches to network meta-analysis (NMA)

Generalizing the MCPMod methodology beyond normal, independent data

On the efficiency of two-stage adaptive designs

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016

A Bayesian decision-theoretic approach to incorporating pre-clinical information into phase I clinical trials

Overrunning in Clinical Trials: a Methodological Review

6 Sample Size Calculations

Multiple Testing in Group Sequential Clinical Trials

Duke University. Duke Biostatistics and Bioinformatics (B&B) Working Paper Series. Randomized Phase II Clinical Trials using Fisher s Exact Test

Designs for Clinical Trials

STAT451/551 Homework#11 Due: April 22, 2014

Adaptive clinical trials with subgroup selection

Computer Aided Construction of Fractional Replicates from Large Factorials. Walter T. Federer Charles E. McCulloch. and. Steve C.

Dose-response modeling with bivariate binary data under model uncertainty

Statistics and Probability Letters. Using randomization tests to preserve type I error with response adaptive and covariate adaptive randomization

CSCI 688 Homework 6. Megan Rose Bryant Department of Mathematics William and Mary

Reference: Chapter 8 of Montgomery (8e)

23. Fractional factorials - introduction

Reports of the Institute of Biostatistics

An Adaptive Futility Monitoring Method with Time-Varying Conditional Power Boundary

A Gatekeeping Test on a Primary and a Secondary Endpoint in a Group Sequential Design with Multiple Interim Looks

arxiv: v1 [stat.me] 24 Jul 2013

Group Sequential Methods. for Clinical Trials. Christopher Jennison, Dept of Mathematical Sciences, University of Bath, UK

Monitoring clinical trial outcomes with delayed response: Incorporating pipeline data in group sequential and adaptive designs. Christopher Jennison

Independent Increments in Group Sequential Tests: A Review

BNP survival regression with variable dimension covariate vector

Power assessment in group sequential design with multiple biomarker subgroups for multiplicity problem

Division of Pharmacoepidemiology And Pharmacoeconomics Technical Report Series

Answer Keys to Homework#10

Stat 5303 (Oehlert): Fractional Factorials 1

METHODS OF EVALUATION OF PERFORMANCE OF ADAPTIVE DESIGNS ON TREATMENT EFFECT INTERVALS AND METHODS OF

Assignment 9 Answer Keys

CHL 5225 H Crossover Trials. CHL 5225 H Crossover Trials

On the Compounds of Hat Matrix for Six-Factor Central Composite Design with Fractional Replicates of the Factorial Portion

Transcription:

The Role of Operating Characteristics in Assesing Bayesian Designs in Pharmaceutical Development Professor Andy Grieve SVP Clinical Trials Methodology Innovation Centre, Aptiv Solutions GmbH Cologne, Germany

Outline Guidelines for Reporting Bayesian Analyses Operating Characteristics Bayesian Monitoring of Groups Sequential Designs Operating Characteristics of Thall and Wathen AD Accuracy of Simulation Designs Simulation Studies Conclusions 2011 Aptiv Solutions 2

Guidelines for Reporting Bayesian Analyses ROBUST BAYESWATCH BASIS Prior Distribution Specified Justified Sensitivity analysis Analysis Statistical model Analytical technique Results Central tendency SD or Credible Interval What s Missing? Introduction Intervention described Objectives of study Methods Design of Study Statistical model Prior / Loss function? When constructed Prior described Loss function described Use of Software MCMC, starting values, run-in, length of runs, convergence, diagnostics Results Interpretation Posterior distribution summarized Sensitivity analysis if alternativepriors used Research question Statistical model Likelihood, structure, prior & rationale Computation Software - convergence if MCMC, validation, methods for generating posterior summaries Model checks, sensitivity analysis Posterior Distribution Summaries used: i). Mean, std, quintiles ii) shape of posterior, (iii) joint posterior for multiple comparisons, (iv) Bayes factors Results of model checks and sensitivity analyses Interpretation of Results Limitation of Analysis 2011 Aptiv Solutions 3

What s Missing? Operating Characteristics Type I Error, Power etc Guidelines written by Bayesians Frequentist properties of Bayesian Procedures Bayesianly Justifiable And Relevant Frequency Calculations For The Applied Statistician Don Rubin (1979) Objective Bayes Berger & Bernardo (Uniformative) Calibrated Bayes Rubin, Lewis & Berry, Spiegelhalter Important for pharmaceutical statisticians? 2011 Aptiv Solutions 4

Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials FDA/CDRH 2010 Because of the inherent flexibility in the design of a Bayesian clinical trial, a thorough evaluation of the operating characteristics should be part of the trial design. This includes evaluation of: probability of erroneously approving an ineffective or unsafe device (type I error) probability of erroneously disapproving a safe and effective device (type II error) power (the converse of type II error: the probability of appropriately approving a safe and effective device) sample size distribution (and expected sample size) prior probability of claims for the device if applicable, probability of stopping at each interim look. 2011 Aptiv Solutions 5

Operating characteristics: Anti-Bayesian? What are Operating characteristics? How often do I make the wrong decision? Picking the wrong dose to take into later stage testing (Phase III). Deciding to continue developing the drug when the achievable effect size isn t sufficient. What if there s no effect at all? Type I error Given that there s something to find, how soon do I find it, and how often do I get it right? ASN Time to run study Type II error Anti-Bayesian? 2011 Aptiv Solutions 6

Operating characteristics Operating characteristics aim to answer many types of queestions If there is no effect, how often do we find a spurious effect? If there is an effect, how often do we pick it up? Do we make the right choice of dose? If we are estimating something, how close do we get to the true value? 2011 Aptiv Solutions 7

Simulation Results Bayesian Adaptive Design (NI) Nitin Patel(Cytel) FDA/DIA Stats Forum (Apr 2011) 10 5 simualtions Treatment Effect (P T ) Power Fixed Sample Size SSA+ES Sample Size Trial Duration (weeks) 0.145 0.025 NA 408 103 0.245 0.739 418 372 87 0.275 0.911 418 345 77 0.305 0.979 419 319 68 0.335 0.997 426 299 62 For fixed design with sample size of 418 the trial duration is 108 weeks Expected improvement when P T = 0.275: 17% fewer patients, 31 wks shorter 2011 Aptiv Solutions 8

Bayesian Monitoring of Clinical Trials Development follows Grossman et al (SIM, 1994) All data & priors are normal (known variance s 2 ) A maximum of n patients in each of 2 groups (trts:a and B) T interim analyses after tn/t (t=1,..t) patients per group Of interest is d=m A -m B The observed difference between groups of the t th cohort is d t with variance Ts d2 /n (where s d2 =2s 2 ) Prior information for d is available: corresponding to fn patients per group centred at d 0 Bayes Theorem implies that at the t th interim the posterior for d is: pd D t ~ N fnd 2011 Aptiv Solutions 9 t i1 d i n T D t tn T fn 0 2 s, tn fn T

Bayesian Monitoring of Clinical Trials Stopping rule: equivalent to: requiring Pr ob d dc Dt 1 yt d C 0 t 1/ 2 Dt Tfd /(t s /(tn / T fn) ft) D t T 1/ 2 (t ft) n 1/ 2 1/ 2 This is the general case and there are a number of tuning parameters: y t, f, d C and d 0 Z t s d C (t ft) Tfd 0 2011 Aptiv Solutions 10

Bayesian Monitoring of Clinical Trials Special Case 1: T=1, d C =0 Stopping rule requires: D (1 f) n 1/ 2 1/ 2 Z s fd 0 What are the frequency properties of this rule? Under the Null Hypothesis: d ~ N(0,s 2 d/n) PD s z (1 f) 1 z 1/ 2 1/ 2 1/ 2 d y f (nf) d0 fd 1/ 2 0 y n 1/ 2 1 f s d To control this at the 2.5% level we need z 1y z f (1 f) 0.975 1/ 2 1/ 2 z 0 2011 Aptiv Solutions 11

f Contours of Bayesian Decision Rule (y) To give a One-sided Type I Error 0f 2.5% 0.500 0.375 0.250 Determining Bayesian Decision Rule Difference between Treatment Means 0.125 0.000 0.0 0.5 1.0 1.5 2.0 Prior Standardised Effect Size (fn) 1/2 d 0 /s d = z 0 2011 Aptiv Solutions 12

Bayesian Monitoring of Clinical Trials Special Case 2: T=1, y0.025 In this case Pr ob( d d C D) 0.975 n 1 1/ 2 (1 f) dc (D 1/ 2 s (1 f) d fd 0 ) giving a condition for D which can be used to find a value for d c to give the appropriate type I error. 2011 Aptiv Solutions 13

f 0.100 0.200 0.300 0.400 0.500 0.600 Contours of Bayesian Decision Rule (d C s d /n 1/2 ) To give a One-sided Type I Error 0f 2.5% 0.500 0.375 0.250 0.125 0.000 0.0 0.5 1.0 1.5 2.0 Prior Standardised Effect Size (fn) 1/2 d 0 /s d = z 0 2011 Aptiv Solutions 14

Bayesian Monitoring of Clinical Trials Special Case 1 & 2 Whichever approach is used is turns out that using this approach is effectively discounting the prior information. 1/ 2 To see this substitute into D z y z (1 f (1 f) 0.025 f) n 1/ 2 1/ 2 Z 1/ 2 s z 0 fd 0 to give s D dz n 0.975 1/ 2 the standard, frequentist decision criteria 100% discounting 2011 Aptiv Solutions 15

Bayesian Monitoring of Clinical Trials Special Case 3: y t 0.025, d C =0, d 0 =0 (Sceptical Prior) A sceptical prior can be set up formally. Prior centred around 0, with a small probability g of achieving the alternative (d A ) - p(d>d A ) = g From which d sceptical prior A s d z1 g 1/ 2 (fn) Now suppose the trial has been designed with size a and power 1-b to detect the alternative hypothesis d A. So that n From which: Example: α=0.05, 1- b=0.90, g=0.05 => f ~ 1/4 s f 2 d z 1a / 2 2 da z 1b zg z 1 a / 2 z 2 1b 2 0 d A 2011 Aptiv Solutions 16

Bayesian Monitoring of Clinical Trials Special Case 3: y t 0.025, d C =0, d 0 =0 In this case: Pr ob( d d D ) 1/ 2 T D sd(nt) 1/ 2 T Dt 2 (t ft) which is equivalent to increasing the critical region by a factor Grossman et al(1994) call f the handicap C t t ft t Dt /(t ft) 1 1/ sd /(t / T f) s d n 1/ t 1/ 2 1/ 2 1/ 2 t (t ft) 2 1/ 2 0.975 0.975 0.975 2011 Aptiv Solutions 17

Bayesian Monitoring of Clinical Trials Special Case 3: y t 0.025, d C =0, d 0 =0 The frequentist properties of this handicapping are not so easy to derive. For T=2 a single interim the frequentist type-i error can be calculated using a bivariate normal probability function, e.g. the SAS function PROBNRM. For T > 3 Grossman et al (1994) use simulation to determine the handicap f that controls the two-sided type I error at 5% and 1% (20,000,000 trials) Alternatively use can be made of the algorithm derived by Armitage, MacPherson and Rowe (JRSSA, 1969) used a SAS implementation of FORTRAN program by Reboussin, DeMets, Kim and Lan or SEQ, SEQSCALE & SEQSHIFT (PROC IML) 2011 Aptiv Solutions 18

Handicap (f) Handicaps(f) To Control the Two-sided a for Upto 25 Analyses 1.0 0.8 0.6 0.4 0.2 a 0.30 0.25 0.20 0.15 0.10 0.05 0.01 0.0 0 5 10 15 20 25 Number of Analyses (T) 2011 Aptiv Solutions 19

Standardised Statistic Comparison of Critical Values O Brien/Fleming, Pocock & Handicapped Bayes 5 4 3 Bayes O Brien/Fleming 2 1 Pocock 0 0 1 2 3 4 5 Analysis 2011 Aptiv Solutions 20

Handicapped Bayes versus Optimal Designs (Pocock, 1982) Investigated properties of group sequential designs, in particular the Average Sample Number (ASN) Maximum number of groups, K Nominal significance level, a Required number* of patients per group 2n Maximum number* of patients 2nN Average number of patients until stopped under H A (ASN) 1 0.05 51.98 52.0 52.0 2 0.0294 28.39 56.8 37.2 3 0.0221 19.73 59.2 33.7 4 0.0182 15.19 60.8 32.3 5 0.0158 12.38 61.9 31.3 10 0.0106 6.50 65.0 29.8 20 0.0075 3.38 67.6 29.5 Multiply by s 2 /d 2 Generated optimal GSDs, for given power minimum ASN 2011 Aptiv Solutions 21

Standardised Statistic Comparison of Critical Values Optimal (75% & 80% Power) & Handicapped Bayes 5 4 3 2 1 0 0 1 2 3 4 5 Analysis 2011 Aptiv Solutions 22

Bayesian Adaptive Randomisation Thall and Wathen (Eur J Cancer, 2007) P( p Back to the idea of Thompson (1933) Similar to RPW binary outcome Randomisation to treatment A on the basis of a function of P(p A < p B Data) although in practice Thompson used P(p A < p B Data). Unstable Thall and Wathen (2007) A p B P( p Data) A C p B Data) 1P( p A C p B Data) C 2011 Aptiv Solutions 23

Bayesian Adaptive Randomisation Impact of Choice of C 1 C=1.0 P(p A < p B Data) C P(p A < p B Data) C + (1- P(p A < p B Data)) C 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 C=0.4 C=0.2 C=0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 P( p A p B Data) 2011 Aptiv Solutions 24

Bayesian Adaptive Randomisation Impact of Choice of C Thall and Whalen recommend C= n/(2n) n=current sample size N=study s maximum sample size Begins with C=0, ends with C=1/2 C=1/2 works well in many applications Giles et al (J Clin Oncology, 2003) Similar idea but now with 3 arms (2 experimental, 1 control) using functions of P(m 1 <m 0 data), P(m 2 <m 0 data), and P(m 1 <m 2 data), - m 2, m 1, and m 0 are the median survival times 2011 Aptiv Solutions 25

2 x 2 Contingency Table Data structure Response No Response Treatment A n 11 (p A ) n 12 (1-p B ) Treatment B n 21 (p B ) n 22 (1-p B ) Likelihood Prior Posterior p n 12 n21 1 p p p 22 n A 1A B 1 p p 11 n 1 1 12 221 21 1 p p p 111 A A B 1 B n 12121 n 1 n22221 2121 1 p p p n11111 A A B 1 B B 2011 Aptiv Solutions 26

2x2 Contingency Table - Posterior Inference The probability of interest is Prob( p A p B n2121 1 kmax(n n 21 21 12 Data) 12,0) n11 11 n21 21 1 n12 12 n22 k n21 21 n22 22 n11 11 n21 21 n12 12 n22 22 n11 11 n12 12 1 1 1 k 2 22 based on the cumulative hypergeometric function as is Fisher s exact test: 2011 Aptiv Solutions 27

Bayesian AD Thall & Wathen(EJC,2007) Type-I Error Based on T&W Criterion Thall & Wathen illustration is based on: N = 200 Stopping Rules If P(p A <p B Data) > 0.99 stop and choose B If P(p A <p B Data) < 0.01 stop and choose A (futility) What does the type I error look like? A complication is that the control rate, p A, is a nuisance parameter 2011 Aptiv Solutions 28

Randomisation Probability Bayesian AD Thall & Wathen(EJC,2007) N=200 Randomisation Probabilities (10 5 simulations) p A =0.25, p B =0.25(0.05)0.45 1.0 0.9 0.8 C=1 C=n/(2N) 0.7 0.6 0.5 0.4 0 20 40 60 80 100 120 140 160 180 200 Patient Number 2011 Aptiv Solutions 29

Standard Deviation Bayesian AD Thall & Wathen(EJC,2007) N=200 Variability of Randomisation Probabilities p A =0.25, p B =4.25 0.30 0.25 0.20 0.15 0.10 0.05 0.00 0 20 40 60 80 100 120 140 160 180 200 Patient Number C=1 C=n/(2N) 2011 Aptiv Solutions 30

Type I Error Bayesian AD Thall & Wathen(EJC,2007) N=200 Type-I Error Based on T&W Criterion - P(p A >p B Data)>0.99 10 6 Simulations / control rate 0.09 0.08 0.07 0.06 0.05 0.04 0.03 0.02 0.01 0.00 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 Nuisance Parameter (control rate) 2011 Aptiv Solutions 31

Bayesian AD Thall & Wathen(EJC, 2007) N=200 Control One-Sided Type-I Error The issue is the number of tests being conducted 1. Reduce the problem using cohorts (20, 50?) 2. Or choose decision criterion P(p A <p B Data)>? to control type-i error 2011 Aptiv Solutions 32

Critical Value Bayesian AD Thall & Wathen(EJC, 2007) N=200 Critical Value to Control One-Sided Type-I Error 10 6 Simulations / control rate 1.000 0.995 0.990 0.985 0.980 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 Nuisance Parameter (control rate) 2011 Aptiv Solutions 33

Critical Value Bayesian AD Thall & Wathen(EJC, 2007) N=200 Critical Value to Control One-Sided Type-I Error 10 6 Simulations / control rate 1.000 0.995 0.990 0.985 0.99747 0.980 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 Nuisance Parameter (control rate) 2011 Aptiv Solutions 34

Type I Error Bayesian AD Thall & Wathen(EJC, 2007) N=200 Type-I Error Based on P(p A <p B Data)>.99747 10 6 Simulations / control rate 0.030 0.025 0.020 0.015 0.010 0.005 0.000 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 Nuisance Parameter (control Rate) 2011 Aptiv Solutions 35

Type I Error Bayesian AD Thall & Wathen(EJC, 2007) N=200 Comparison of Type-I Error Based on T&W Criterion & Adjusted 0.09 0.08 0.07 0.06 0.05 0.04 0.03 0.02 0.01 0.00 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 Nuisance Parameter (control rate) 2011 Aptiv Solutions 36

Bayesian AD Thall & Wathen(EJC, 2007) N=200 Probability of Choosing B as a Function of Patient Accrual 10 5 Simulations 0.0 0.2 0.4 0.6 0.8 1.0 C=0.0 C=1.0 C=n/(2N) 0.25 0.30 0.35 0.40 0.45 0.25 0.30 0.35 0.40 0.45 0.25 0.30 0.35 0.40 0.45 Critical Value: 0.99 0.99747 q B 2011 Aptiv Solutions 37

Maxium Critical Value Bayesian AD Thall & Wathen(EJC, 2007) Maximum Critical Value vs Maximum Patient Number >10 6 Simulations / control rate 1.000 0.996 0.992 0.988 0.984 0.980 0 50 100 150 200 250 300 Maximum Number of Patients 2011 Aptiv Solutions 38

Crisiticism of This Approach Korn and Freidlin (J Clin Oncol, 2011) Their simulations show : Thall & Wathen AD inferior to1:1 randomisation in terms of information, benefits to patients in trial True I agree with Don Berry (J Clin Ocol 2011) that the greatest benefits are likely to accrue for trials with more than 2 arms Rather as in the case of T=1 in the group sequential case greater complexity gives more scope for Bayesian designs 2011 Aptiv Solutions 39

Setting Up Simulations How do we design a simulation experiment? What are the factors? Can we use a fractional factorial 2011 Aptiv Solutions 40

ASTIN Trial Acute Stroke: Dose Effect Curve (Grieve and Krams, Clinical Trials, 2005) Factors of the Simulation Experiment Termination Rule Decision theoretic Posterior Probabilities Prior Distribution Uninformative (flat) prior Informative prior Variance of Prior Constant Less variable at low end of the dose range Level of Smoothing High smoothing Low Smoothing Allocation Criteria Var(ED95) x Var(f(z*)) Var(f(z*)) Var(ED95) Determinant of Covar ((ED95) and Var(f(z*)) Randomisation Method Probability of allocation proportional to expected utility Allocate to maximum utility Probability uniform over doses st 0.9f(z 0 ) < E[f(z) Y] < 1.10f(z 0 ) Probability uniform over doses st 0.9f(z 0 ) < E[f(z) Y] 2 4 x 4 2 experiment 1/4 replicate 2011 Aptiv Solutions 41

ASTIN Trial Acute Stroke: Dose Effect Curve (Grieve and Krams, Clinical Trials, 2005) Aliasing Structure Contrast Effects Contrast Effects 1 A, BCDG, BEFH 19 BF, AEH 2 B, ACDG, AEFH 20 BG, ACD 3 C, ABDG 21 BH, AEF 4 D, ABCG 22 CD, ABG, EFGH 5 E, ABFH 19 BF, AEH 6 F, ABEH 20 BG, ACD 7 G, ABCD 21 BH, AEF 8 H, ABEF 22 CD, ABG, EFGH 9 AB, CDG, EFH 23 CE, DFGH 10 AC, BDG 24 CF, DEGH 11 AD, BCG 25 CG, ABD, DEFH 12 AE, BFH 26 CH, DEFG 13 AF, BEH 27 DE, CFGH 14 AG, BCD 28 DF, CEGH 15 AH, BEF 29 DG, ABC, CEFH 16 BC, ADG 30 DH, CEFG 17 BD, ACG 31 EF, ABH, CDGH 18 BE, AFH 32 EG, CDFH 2011 Aptiv Solutions 42

8 10 12 14 16 18 20 Change from baseline ASTIN Trial Acute Stroke: Dose Effect Curve (Grieve and Krams, Clinical Trials, 2005) Simulated Dose-Response Curves Height : 2 or 4 pts 0.0 0.5 1.0 1.5 Dose 2011 Aptiv Solutions 43

Simulation Experiment Simulated 20 replicates 64 experiments 9 curves 11520 individual studies Run-on a network of PCs/Workstations Having established optimal settings 1000 simulations were performed to establish operating characteristics 2011 Aptiv Solutions 44

Accuracy of Simulations Posch, Mauerer and Bretz (SIM, 2011) Studied adaptive design with treatment selection at an interim and sample size re-estimation Control FWER (familywise error rate) in a strong sense under all possible configurations of hypotheses Conclude: That you have to be careful with the assumptions behind the simulations. Intriguing point: the choice of seed has an impact on the estimated type I error 2011 Aptiv Solutions 45

Accuracy of Simulations Posch, Mauerer and Bretz (SIM, 2011) If it is important to be able to differentiate between 0.025 and 0.026 then we should power our simulation experiment for it A sample of 10 4 has only 10% power to detect H A =0.026 vs H 0 =0.025, 10 5 : 50% 80% power requires n=194,000 search 380 seeds 90% power requires n=260,000 search 1600 seeds 2011 Aptiv Solutions 46

Average Run Length Average Run Length to find a Good Seed 10000000000 1000000000 100000000 10000000 1000000 100000 10000 1000 100 10 1 N=10 6 : number of seeds =8*10 9 100 1000 10000 100000 1000000 Simulation Sample Size 2011 Aptiv Solutions 47

Conclusions Determining Decision Criteria Appropriate approach: Choose decision rule based on clinical or commercial criteria. Investigate operating characteristics If they are unacceptable e.g., type I error > 20% then look to change them BUT don t strive to get exact control Two examples 2011 Aptiv Solutions 48

Effect Over Placebo 49 ASTIN Trial Acute Stroke: Dose Effect Curve (Grieve and Krams, Clinical Trials, 2005) Efficacy (>2 pts) 0 ED95* Futility (< 1 pt) Dose Andy Grieve 2011 Aptiv Solutions 49

Change from baseline -3.0-2.5-2.0-1.5-1.0-0.5 0.0 0.0 0.6 50 POC Study in Neuropathic Pain Smith et al (Pharmaceutical Statistics, 2006) Probability of futility and dose-response curve. Change from baseline in mean pain score Probability of futility (<=1.5 improvement over PBO) Placebo 50mg 100mg 150mg 200mg 300mg 450mg 600mg Dose (mg) Horizontal reference line at P(Futility)=0.8 NDLM estimate of dose-response curve NDLM estimate 80% CI limits Andy Grieve Placebo 50mg 100mg 150mg 200mg 300mg 450mg 600mg 2011 Aptiv Solutions Dose (mg) 50