Group Sequential Tests for Delayed Responses

Similar documents
Group Sequential Tests for Delayed Responses. Christopher Jennison. Lisa Hampson. Workshop on Special Topics on Sequential Methodology

Monitoring clinical trial outcomes with delayed response: incorporating pipeline data in group sequential designs. Christopher Jennison

Monitoring clinical trial outcomes with delayed response: Incorporating pipeline data in group sequential and adaptive designs. Christopher Jennison

Group Sequential Designs: Theory, Computation and Optimisation

Comparing Adaptive Designs and the. Classical Group Sequential Approach. to Clinical Trial Design

Optimising Group Sequential Designs. Decision Theory, Dynamic Programming. and Optimal Stopping

Interim Monitoring of Clinical Trials: Decision Theory, Dynamic Programming. and Optimal Stopping

Sample size re-estimation in clinical trials. Dealing with those unknowns. Chris Jennison. University of Kyoto, January 2018

Adaptive Designs: Why, How and When?

Optimal group sequential designs for simultaneous testing of superiority and non-inferiority

The Design of Group Sequential Clinical Trials that Test Multiple Endpoints

Testing a secondary endpoint after a group sequential test. Chris Jennison. 9th Annual Adaptive Designs in Clinical Trials

Two-stage Adaptive Randomization for Delayed Response in Clinical Trials

Group Sequential Methods. for Clinical Trials. Christopher Jennison, Dept of Mathematical Sciences, University of Bath, UK

Group Sequential Methods. for Clinical Trials. Christopher Jennison, Dept of Mathematical Sciences, University of Bath, UK

SAMPLE SIZE RE-ESTIMATION FOR ADAPTIVE SEQUENTIAL DESIGN IN CLINICAL TRIALS

Power assessment in group sequential design with multiple biomarker subgroups for multiplicity problem

Pubh 8482: Sequential Analysis

Pubh 8482: Sequential Analysis

The Design of a Survival Study

Interim Monitoring of. Clinical Trials. Christopher Jennison, Dept of Mathematical Sciences, University of Bath, UK

Adaptive Extensions of a Two-Stage Group Sequential Procedure for Testing a Primary and a Secondary Endpoint (II): Sample Size Re-estimation

CHL 5225H Advanced Statistical Methods for Clinical Trials: Multiplicity

Bios 6649: Clinical Trials - Statistical Design and Monitoring

Overrunning in Clinical Trials: a Methodological Review

Independent Increments in Group Sequential Tests: A Review

Advanced Herd Management Probabilities and distributions

Biometrika Trust. Biometrika Trust is collaborating with JSTOR to digitize, preserve and extend access to Biometrika.

Estimation in Flexible Adaptive Designs

Confidence Distribution

Statistics and Probability Letters. Using randomization tests to preserve type I error with response adaptive and covariate adaptive randomization

An Adaptive Futility Monitoring Method with Time-Varying Conditional Power Boundary

Statistical Aspects of Futility Analyses. Kevin J Carroll. nd 2013

Inverse Sampling for McNemar s Test

Review. December 4 th, Review

The SEQDESIGN Procedure

Sample Size and Power I: Binary Outcomes. James Ware, PhD Harvard School of Public Health Boston, MA

Pubh 8482: Sequential Analysis

Designing multi-arm multi-stage clinical trials using a risk-benefit criterion for treatment selection

Statistics 262: Intermediate Biostatistics Non-parametric Survival Analysis

Pubh 8482: Sequential Analysis

Sequential Monitoring of Clinical Trials Session 4 - Bayesian Evaluation of Group Sequential Designs

Bios 6648: Design & conduct of clinical research

Review: Statistical Model

Interval estimation. October 3, Basic ideas CLT and CI CI for a population mean CI for a population proportion CI for a Normal mean

4. Issues in Trial Monitoring

Use of frequentist and Bayesian approaches for extrapolating from adult efficacy data to design and interpret confirmatory trials in children

Statistical Inference Theory Lesson 46 Non-parametric Statistics

Bios 6649: Clinical Trials - Statistical Design and Monitoring

Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview

Adaptive, graph based multiple testing procedures and a uniform improvement of Bonferroni type tests.

A Bayesian solution for a statistical auditing problem

Adaptive Dunnett Tests for Treatment Selection

Multiple Testing in Group Sequential Clinical Trials

Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto.

Testing a Primary and a Secondary Endpoint in a Confirmatory Group Sequential Clinical Trial

FAILURE-TIME WITH DELAYED ONSET

Decision theory. 1 We may also consider randomized decision rules, where δ maps observed data D to a probability distribution over

Bayesian methods for sample size determination and their use in clinical trials

ST745: Survival Analysis: Nonparametric methods

When enough is enough: early stopping of biometrics error rate testing

A simulation study for comparing testing statistics in response-adaptive randomization

A Gatekeeping Test on a Primary and a Secondary Endpoint in a Group Sequential Design with Multiple Interim Looks

SAS/STAT 15.1 User s Guide The SEQDESIGN Procedure

Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet.

Group sequential designs with negative binomial data

BIOS 312: Precision of Statistical Inference

CTDL-Positive Stable Frailty Model

Estimating Causal Effects of Organ Transplantation Treatment Regimes

Bias Variance Trade-off

Political Science 236 Hypothesis Testing: Review and Bootstrapping

STAT 4385 Topic 01: Introduction & Review

Adaptive Trial Designs

clinical trials Abstract Multi-arm multi-stage (MAMS) trials can improve the efficiency of the drug

BIOS 6649: Handout Exercise Solution

Two-stage k-sample designs for the ordered alternative problem

HYPOTHESIS TESTING. Hypothesis Testing

Adaptive clinical trials with subgroup selection

Adaptive Survival Trials

Repeated confidence intervals for adaptive group sequential trials

ADAPTIVE SEAMLESS DESIGNS: SELECTION AND PROSPECTIVE TESTING OF HYPOTHESES

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

Welcome! Webinar Biostatistics: sample size & power. Thursday, April 26, 12:30 1:30 pm (NDT)

Likelihood and p-value functions in the composite likelihood context

Introduction to Bayesian Statistics

Sequential Plans and Risk Evaluation

Previous lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing.

Are Flexible Designs Sound?

Power and Sample Size Calculations with the Additive Hazards Model

Estimation for Modified Data

Efficient multivariate normal distribution calculations in Stata

a Sample By:Dr.Hoseyn Falahzadeh 1

Pubh 8482: Sequential Analysis

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Estimating the Dynamic Effects of a Job Training Program with M. Program with Multiple Alternatives

Psychology 282 Lecture #4 Outline Inferences in SLR

Alpha-Investing. Sequential Control of Expected False Discoveries

Bayesian Optimal Interval Design for Phase I Clinical Trials

Group sequential designs for negative binomial outcomes

6 Sample Size Calculations

Transcription:

Group Sequential Tests for Delayed Responses Lisa Hampson Department of Mathematics and Statistics, Lancaster University, UK Chris Jennison Department of Mathematical Sciences, University of Bath, UK Read to the Royal Statistical Society London, May 2012

Outline 1 Group sequential tests

Outline 1 Group sequential tests 2 Delayed responses

Outline 1 Group sequential tests 2 Delayed responses 3 Optimal designs

Outline 1 Group sequential tests 2 Delayed responses 3 Optimal designs 4 Extensions

Outline 1 Group sequential tests 2 Delayed responses 3 Optimal designs 4 Extensions 5 Recovering efficiency

Outline 1 Group sequential tests 2 Delayed responses 3 Optimal designs 4 Extensions 5 Recovering efficiency 6 Summary

Group sequential monitoring of clinical trials Consider a clinical trial comparing a new treatment against a control. Let the treatment effect, θ, be the difference in average response between the new treatment and control. We can design a superiority trial to test H 0 : θ 0 against θ > 0 with one-sided type I error rate α and power 1 β at θ = δ. In a group sequential design, we monitor standardised test statistics Z k at analyses k = 1, 2,.... The stopping rule allows an early decision to reject H 0 or accept H 0.

A group sequential test (GST) A group sequential boundary for testing H 0 : θ 0 vs θ > 0 Z k Reject H 0 Accept H 0 k Here, the trial stops and rejects H 0 at the third of five analyses. Sequential testing can reduce expected sample size to around 60% or 70% of that of a fixed sample size design.

The problem of delayed responses Example A: Cholesterol reduction after 4 weeks of treatment In this example, there is a delay of four weeks between the start of treatment and observation of the primary endpoint. At each interim analysis we expect about 16 subjects to be in the pipeline, that is, to have started treatment but not yet provided a response. If a group sequential test reaches its conclusion at an interim analysis, one would still expect investigators to follow up the pipeline subjects and observe their responses. How should these data be analysed?

The problem of delayed responses A possible outcome for the cholesterol reduction trial of Example A Z k Reject H 0 Accept H 0 k Suppose Z 3 = 2.4, exceeding the boundary value of 2.3. The trial stops but, with the pipeline data included, Z = 2.1. Can the investigators claim significance at level α?

Using short term endpoints in Delayed Response GSTs We shall show how to design a Delayed Response Group Sequential Test which makes the best possible use of pipeline data. Nevertheless, we cannot achieve all of the reductions in expected sample size that are possible for an immediate response. We therefore seek ways to recover some of this lost efficiency. One way to do this is by fitting a joint model for the primary endpoint and a correlated response which is observed more rapidly.

Using short term endpoints in Delayed Response GSTs Example D: Prevention of fracture in postmenopausal women In this example, the primary endpoint is whether or not a fracture occurs within five years of entry to the study. Changes in bone mineral density (BMD) are measured after one year. It is expected that these two variables are correlated. How can we use the BMD data to gain information from subjects who have been followed for between one and five years? Would fitting a Kaplan-Meier curve for time to first fracture also help remembering that inference is about the binary outcome defined at five years?

Incorporating delayed responses into GSTs Consider a trial where response is observed time t after treatment. N Number recruited Number of responses Time t Interim t max analysis 1 We assume information is proportional to the observed number of responses. We will equally space interim analyses between times t and t max. T.W. Anderson (JASA, 1964) considers sequential tests for delayed responses. We follow this basic structure to construct GSTs.

Boundaries for a Delayed Response GST At interim analysis k, Z k is associated with information level I k = Var(ˆθ k ). Z k b 1 c 1 a 1 I 1 Ĩ 1 I 2 Ĩ 2 Ĩ 3 I If Z k > b k or Z k < a k, cease enrollment of future patients and follow-up all recruited subjects. At the decision analysis, based on information Ĩ k, reject H 0 if Z k > c k.

Calculating properties of Delayed Response GSTs Calculations of test properties (type I error rate, power, E θ (N) ) require the joint distributions of test statistic sequences: {Z 1,..., Z k, Z k }, for k = 1,..., K 1, {Z 1,..., Z K 1, Z K }. Each sequence is based on accumulating datasets. Given {I 1,..., I k, Ĩ k }, the sequence {Z 1,..., Z k, Z k } follows the canonical distribution for statistics generated by a GST for immediate responses (Jennison & Turnbull, JASA, 1997). Properties of Delayed Response GSTs can therefore be calculated using numerical routines devised for standard designs.

Reversals of anticipated final decisions Stopping with Z k > b k or Z k < a k indicates our likely final decision but there there may be a reversal. We could observe Z k b 1 a 1 I 1 Ĩ 1 c 1 I We optimise our designs to maximise the value of the additional pipeline responses for increasing the test s power.

Reversals of anticipated final decisions Stopping with Z k > b k or Z k < a k indicates our likely final decision but there there may be a reversal. We could observe Z k b 1 a 1 I 1 Ĩ 1 c 1 I We optimise our designs to maximise the value of the additional pipeline responses for increasing the test s power.

Optimal Delayed Response GSTs Let N represent the total number of subjects recruited. Let r be the fraction of a test s maximum sample size in the pipeline at each interim analysis. Objective: For a given r, maximum sample size n max, stages K and analysis schedule, we find the Delayed Response GST minimising Z F = E θ (N) f(θ) dθ with type I error rate α at θ = 0 and power 1 β at θ = δ. Here f(θ) is the density of a N(δ/2,(δ/2) 2 ) distribution. We create an unconstrained Bayes problem by adding a prior on θ and costs for sampling and for making incorrect decisions. We search for the combination of prior and costs which gives a solution with frequentist error rates α and β.

Efficiency loss when there is a delay in response It is required to test H 0 : θ 0 against θ > 0 with α = 0.025 and β = 0.1. Suppose (δ, σ 2 ) are such that the fixed sample test needs n fix = 100 subjects and set n max = 1.1 n fix. The figure shows the minima of F attained by optimised versions of our designs. 100 90 F 80 70 K=2 K=3 K=5 60 0 10 20 30 40 50 # subjects in the pipeline at an interim analysis

Revisiting Example A Example A: Cholesterol reduction after 4 weeks of treatment Responses are assumed normally distributed with variance σ 2 = 2. It is required to test H 0 : θ 0 against θ > 0 with type I error rate α = 0.025 at θ = 0, power 1 β = 0.9 at θ = δ = 1.0. The fixed sample test needs n fix = 86 subjects divided between the two treatments. We consider designs with a maximum sample size of 96, assuming a recruitment rate of 4 per week, giving 4 4 = 16 pipeline subjects at each interim analysis.

Revisiting Example A Once the trial is underway, data start to accrue after 4 weeks. Recruitment will close after 24 weeks. Interim analyses are planned after n 1 = 28 and n 2 = 54 observed responses. A decision analysis will be based on ñ 1 = 44 responses if recruitment stops at interim analysis 1 ñ 2 = 70 responses if recruitment stops at interim analysis 2 ñ 3 = 96 responses in the absence of early stopping. We derive a Delayed Response GST minimising Z F = E θ (N) f(θ) dθ, where f(θ) is the density of a N(0.5, 0.5 2 ) distribution.

Revisiting Example A Critical values for the optimised Delayed Response GST are shown below. 3 2.5 2 Z k 1.5 b 1 c 1 b 2 c 2 c 3 1 0.5 0 a 2 a 1 28 44 54 70 96 Number of Responses Critical values c 1 and c 2 are well below b 1 and b 2, so the probability of a reversal is small. Both c 1 and c 2 are less than 1.96. If desired, these can be raised to 1.96 with little change to the design s power curve.

Revisiting Example A The figure shows expected sample size curves for the fixed sample test with n fix = 85 patients, the Delayed Response GST minimising F, the GST for immediate responses with analyses after 32, 64 and 96 responses, also minimising F. 90 80 70 E θ (N) 60 50 n fix delay no delay 40 30 1 0 1 2 θ The delay in response means savings in E θ (N) are smaller than they would be if response were immediate.

Making inferences on termination How can we calculate a p-value for H 0 : θ 0 and a CI for θ? On termination of the test at stage T, (Ĩ T, Z T ) is a sufficient statistic for θ. We base inferences on a stage-wise ordering of the test s sample space for this pair. Z k c 1 c 2 c 3 Ĩ 1 Ĩ 2 Ĩ 3 I The sample space at Ĩ T = Ĩ k is partitioned by c k into high and low sets. This ordering ensures p-value calculations do not depend on future, possibly unpredictable, information levels.

Error spending Delayed Response GSTs We design error spending Delayed Response GSTs which reach a target information level Ĩ max in absence of early stopping, spend error probabilities as a function of I/I max. Let π k and γ k be cumulative type I and II error rates to be spent by stage k. Choosing c k to balance reversal probabilities under θ = 0 implies we may choose (a k, b k ) to satisfy P θ=0 {Z 1 C 1,..., Z k 1 C k 1, Z k b k } = π k π k 1 P θ=δ {Z 1 C 1,..., Z k 1 C k 1, Z k a k } = γ k γ k 1, and control the type I error rate at level α, and the type II error rate at a level just below β. Under this construction, the stage k stopping rule can be set without knowledge of Ĩ k.

Efficiency of error spending tests In the figure below, error spending tests are designed using the ρ-family of error spending functions. Values of F are attained by tests designed and conducted with K = 5, n max = 1.1 n fix, α = 0.025 and β = 0.1. 100 90 100F/n fix 80 70 error spending optimal 60 0.1 0.2 0.3 0.4 0.5 r Error spending Delayed Response GSTs are flexible and closely match the optimal tests for savings in E θ (N).

Dealing with unexpected overrunning Suppose a standard GST designed with I k and boundaries (a k, b k ) stops at analysis k < K with Z k > b k or Z k < a k. Question: If additional data are observed, how can these be incorporated into the final analysis while preserving the type I error rate? Solution: We partition the sample space at Ĩ k such that if Z k c k, reject H 0, if Z k c k, accept H 0. Requiring c k to balance the probabilities of reversing decisions under θ = 0 at stage k preserves the test s overall type I error rate. In addition, p-value calculations do not depend on Ĩ 1,..., Ĩ k 1, nor on information levels beyond stage k.

Efficiency loss when there is a delay in response In Section 3 of the paper, we consider tests with type I error rate 0.025 and power 0.9 at θ = δ, minimising F, the integral of E θ (N) with respect to a N(δ/2,(δ/2) 2 ) distribution for θ. Maximum sample size, n max, is 1.1 times the fixed sample size. There are r n max subjects in the pipeline at each interim analysis. We find optimal Delayed Response GSTs with K = 2, 3 and 5 analyses when the number of observed responses at analysis k is n k = k (1 r)nmax, k = 1,..., K 1. K The effect of delayed response on the efficiency of a Delayed Response GST increases with the pipeline fraction r.

Efficiency loss when there is a delay in response Minima of average E θ (N), F, as a function of the pipeline fraction r 100 100 F/n fix 90 80 70 K=2 K=3 K=5 60 0 0.1 0.2 0.3 0.4 0.5 Substantial savings in E θ (N) are still present for small values of r, when response is rapidly observed. About half the benefits of group sequential testing are lost as r increases to 0.25. r

Using a short term endpoint to recover efficiency Suppose a second endpoint, correlated with the primary endpoint, is available soon after treatment. For patient i on treatment T = A or B, let Y T,i X T,i = The short term endpoint, = The long term endpoint. Assume we have a parametric model for the joint distribution of (Y T,i, X T,i ) in which E(X A,i ) = µ A,2, E(X B,i ) = µ B,2 and θ = µ A,2 µ B,2. We analyse all the available data at each interim analysis.

Using a short term endpoint to recover efficiency At interim analysis k, subjects are Unobserved, Partially observed (just Y T,i available), or Fully observed (both Y T,i and X T,i available). We use maximum likelihood estimation to fit the full model to all the data available at analysis k, then extract b θ k and I k = {Var( b θ k )} 1. The sequence of estimates { b θ k } follows the standard joint distribution for a group sequential trial with observed information levels {I k }. Thus, these estimates can be used to design a Delayed Response GST in the usual way.

Using a short term endpoint to recover efficiency Values of F achieved using a second, short-term endpoint 100 F/n fix 100 90 80 70 60 κ=1 κ=0.5 κ=0.2 κ=0 0.1 0.2 0.3 0.4 0.5 r Results are for the previous testing problem with K = 5 analyses. We assume Y T,i and X T,i are bivariate normal with correlation 0.9. The ratio of time to short-term and long-term endpoints is κ. The solid line for κ = 1 is also the case of no short-term endpoint.

Using a short term endpoint to recover efficiency Although a short-term endpoint may be of clinical interest, we still make inferences about the primary endpoint alone. It is straightforward to extend this approach to use repeated measurements as follow-up continues for each patient. In Example D, we can fit a joint model for bone mineral density measured at one year and incidence of fracture within five years thereby increasing information about the latter. In an extended model, we could also use censored information on the fracture endpoint for subjects with less than five years of follow-up. Nuisance parameters, such as variances and correlation between short-term and long-term endpoints, can be estimated within the trial.

Summary In this presentation, we have discussed Formulation of a Delayed Response GST Optimisation of a Delayed Response GST P-values and confidence intervals on termination Error spending versions of these tests Unexpected overrunning Using a short-term endpoint to improve efficiency

Additional topics covered in the paper The paper also addresses Existence and uniqueness of optimal Delayed Response GSTs Computation of optimal Delayed Response GSTs Optimising designs for an objective combining expected sample size and time to a conclusion Adaptive choice of group sizes in a Delayed Response GST