Pubh 8482: Sequential Analysis

Joseph S. Koopmeiners
Division of Biostatistics, University of Minnesota
Week 8

P-values
When reporting results, we usually report p-values in place of reporting whether or not we reject the null hypothesis. For better or worse, p-values are usually used by investigators to evaluate the strength of evidence against the null hypothesis. P-values are then translated into hypothesis tests by comparing them to a nominal significance level (usually 0.05).

P-values and group sequential designs
We've seen that a group sequential testing procedure changes the sampling distribution of the test statistic $Z$. This means that inference based on the usual normal approximations is no longer appropriate. How do we calculate p-values for group sequential designs?

Problem
A p-value can be interpreted as the probability, under the null hypothesis, of observing a test statistic as extreme as or more extreme than what was observed. This is simple in a fixed-sample design: $Z_1 < Z_2$ implies that $Z_2$ is more extreme than $Z_1$. It is not so clear in the group sequential setting.

Problem
Which of the following realizations is more extreme: $(T = 1, Z_1 = 3.70)$ or $(T = 2, Z_2 = 4.50)$? It depends on how you order the sample space, and it is not obvious how this should be done. We will consider four possible orderings: the stage-wise ordering, the MLE ordering, the likelihood ratio ordering, and the score test ordering.

Stage-wise Ordering
The pair $(k_2, z_2) > (k_1, z_1)$ if any of the following is true:
$k_2 = k_1$ and $z_2 \ge z_1$;
$k_2 < k_1$ and $z_2 \ge b_{k_2}$;
$k_2 > k_1$ and $z_1 \le a_{k_1}$.

MLE Ordering
The pair $(k_2, z_2) > (k_1, z_1)$ if $z_2 / \sqrt{I_{k_2}} > z_1 / \sqrt{I_{k_1}}$.

Likelihood Ratio Ordering
The pair $(k_2, z_2) > (k_1, z_1)$ if $z_2 > z_1$.

Score Test Ordering
The pair $(k_2, z_2) > (k_1, z_1)$ if $z_2 \sqrt{I_{k_2}} > z_1 \sqrt{I_{k_1}}$.

Ordering: Example
Consider a one-sided power family test with $K = 5$ and the following stopping boundaries:
$a_1, \dots, a_5 = -3.49, -0.70, 0.43, 1.13, 1.63$
$b_1, \dots, b_5 = 8.11, 4.06, 2.70, 2.03, 1.63$
The information levels are 1, 2, 3, 4 and 5. Consider the following pairs of sufficient statistics:
$(k_1, z_1) = (2, 5.0)$
$(k_2, z_2) = (4, 0.5)$
$(k_3, z_3) = (5, 1.8)$

Stage-wise Ordering: Example
$(k_1, z_1) > (k_2, z_2)$ because $k_1 < k_2$ and $z_1 = 5.0 > b_{k_1} = 4.06$.
$(k_1, z_1) > (k_3, z_3)$ because $k_1 < k_3$ and $z_1 = 5.0 > b_{k_1} = 4.06$.
$(k_3, z_3) > (k_2, z_2)$ because $k_3 > k_2$ and $z_2 = 0.5 < a_{k_2} = 1.13$.

MLE Ordering: Example
$(k_1, z_1) > (k_2, z_2)$ because $z_1 / \sqrt{I_{k_1}} = 3.54 > z_2 / \sqrt{I_{k_2}} = 0.25$.
$(k_1, z_1) > (k_3, z_3)$ because $z_1 / \sqrt{I_{k_1}} = 3.54 > z_3 / \sqrt{I_{k_3}} = 0.80$.
$(k_3, z_3) > (k_2, z_2)$ because $z_3 / \sqrt{I_{k_3}} = 0.80 > z_2 / \sqrt{I_{k_2}} = 0.25$.

Likelihood Ratio Ordering: Example
$(k_1, z_1) > (k_2, z_2)$ because $z_1 = 5.0 > z_2 = 0.5$.
$(k_1, z_1) > (k_3, z_3)$ because $z_1 = 5.0 > z_3 = 1.8$.
$(k_3, z_3) > (k_2, z_2)$ because $z_3 = 1.8 > z_2 = 0.5$.

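Score Test Ordering: Example
$(k_1, z_1) > (k_2, z_2)$ because $z_1 \sqrt{I_{k_1}} = 7.07 > z_2 \sqrt{I_{k_2}} = 1.00$.
$(k_1, z_1) > (k_3, z_3)$ because $z_1 \sqrt{I_{k_1}} = 7.07 > z_3 \sqrt{I_{k_3}} = 4.02$.
$(k_3, z_3) > (k_2, z_2)$ because $z_3 \sqrt{I_{k_3}} = 4.02 > z_2 \sqrt{I_{k_2}} = 1.00$.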
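
The comparisons in this example can be reproduced with a short script. This is a minimal sketch rather than anything from the original slides: the function names are hypothetical, the information levels are taken to be 1 through 5, and $a_1$ and $a_2$ are assumed to be negative.

```python
import math

# Example design with K = 5 analyses.
# Assumptions: a_1 and a_2 are negative; information levels are 1..5.
A = [-3.49, -0.70, 0.43, 1.13, 1.63]   # lower boundaries a_1..a_5
B = [8.11, 4.06, 2.70, 2.03, 1.63]     # upper boundaries b_1..b_5
INFO = [1.0, 2.0, 3.0, 4.0, 5.0]       # information levels I_1..I_5


def stagewise_more_extreme(p2, p1):
    """True if p2 = (k2, z2) ranks at or above p1 = (k1, z1) stage-wise."""
    (k2, z2), (k1, z1) = p2, p1
    if k2 == k1:
        return z2 >= z1
    if k2 < k1:
        return z2 >= B[k2 - 1]
    return z1 <= A[k1 - 1]


def mle_stat(pair):
    """MLE ordering statistic z / sqrt(I_k)."""
    k, z = pair
    return z / math.sqrt(INFO[k - 1])


def lr_stat(pair):
    """Likelihood ratio ordering statistic: z itself."""
    return pair[1]


def score_stat(pair):
    """Score test ordering statistic z * sqrt(I_k)."""
    k, z = pair
    return z * math.sqrt(INFO[k - 1])


pairs = {"p1": (2, 5.0), "p2": (4, 0.5), "p3": (5, 1.8)}
comparisons = [("p1", "p2"), ("p1", "p3"), ("p3", "p2")]

results = {}
for hi, lo in comparisons:
    results[(hi, lo)] = {
        "stage-wise": stagewise_more_extreme(pairs[hi], pairs[lo]),
        "MLE": mle_stat(pairs[hi]) > mle_stat(pairs[lo]),
        "LR": lr_stat(pairs[hi]) > lr_stat(pairs[lo]),
        "score": score_stat(pairs[hi]) > score_stat(pairs[lo]),
    }
    print(hi, ">", lo, results[(hi, lo)])
```

In this particular example all four orderings agree on every comparison; the realizations in the earlier "which is more extreme?" question are the kind of case where they can disagree.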

Calculating P-values
The orderings described on the preceding slides allow us to order the sample space in a sensible manner. How do we translate these orderings into a p-value? Recall the definition of a p-value: the probability, under the null hypothesis, of observing a test statistic as extreme as or more extreme than what was observed.

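Calculating P-values
Write $(T, Z_T) \succeq (k, z_k)$ when the outcome $(T, Z_T)$ is at least as extreme as the observed pair $(k, z_k)$ under the chosen ordering.
One-sided upper p-value:
\[ P_{\theta=0}\big((T, Z_T) \succeq (k, z_k)\big) \]
One-sided lower p-value:
\[ P_{\theta=0}\big((T, Z_T) \preceq (k, z_k)\big) \]
The two-sided p-value is equal to twice the minimum of the upper and lower p-values.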

Calculating P-values: Stage-wise Ordering
For the sufficient statistic $(k, z_k)$, the one-sided upper p-value assuming the stage-wise ordering is:
\[ P_{\theta=0}\big((T, Z_T) \succeq (k, z_k)\big) = \sum_{i=1}^{k-1} \int_{b_i}^{\infty} f(i, z \mid \theta = 0)\,dz + \int_{z_k}^{\infty} f(k, z \mid \theta = 0)\,dz \]

Properties of the Stage-wise Ordering
The p-value is less than $\alpha$ if and only if $H_0$ is rejected. The p-value does not depend on information levels beyond the observed stopping stage.

Calculating P-values: MLE Ordering
For the sufficient statistic $(k, z_k)$, the one-sided upper p-value assuming the MLE ordering is:
\[ P_{\theta=0}\big((T, Z_T) \succeq (k, z_k)\big) = \sum_{i=1}^{K} \int_{z_k \sqrt{I_i / I_k}}^{\infty} f(i, z \mid \theta = 0)\,dz \]

Calculating P-values: Likelihood Ratio Ordering
For the sufficient statistic $(k, z_k)$, the one-sided upper p-value assuming the likelihood ratio ordering is:
\[ P_{\theta=0}\big((T, Z_T) \succeq (k, z_k)\big) = \sum_{i=1}^{K} \int_{z_k}^{\infty} f(i, z \mid \theta = 0)\,dz \]

Calculating P-values: Score Test Ordering
For the sufficient statistic $(k, z_k)$, the one-sided upper p-value assuming the score test ordering is:
\[ P_{\theta=0}\big((T, Z_T) \succeq (k, z_k)\big) = \sum_{i=1}^{K} \int_{z_k \sqrt{I_k / I_i}}^{\infty} f(i, z \mid \theta = 0)\,dz \]

Properties of the MLE, likelihood ratio and score test orderings
All three cases potentially involve integrating over regions that do not correspond to rejecting the null hypothesis. P-values depend on the information levels for future (unobserved) stopping times.

Calculating P-values: Example
Consider a group sequential design with two-sided O'Brien-Fleming boundaries, $K = 5$ equally spaced analyses (information fractions 0.2, 0.4, 0.6, 0.8 and 1) and $\alpha = 0.05$:
$b_1 = 4.56$, $b_2 = 3.23$, $b_3 = 2.63$, $b_4 = 2.28$, $b_5 = 2.04$
Two cases:
$(k, z_k) = (2, 3.5)$
$(k, z_k) = (5, 2.5)$

Calculating P-values: Example 1
Stage-wise ordering
\[ p = 2 \left( \int_{4.56}^{\infty} f(1, z \mid \theta = 0)\,dz + \int_{3.5}^{\infty} f(2, z \mid \theta = 0)\,dz \right) = 0.0005 \]

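Calculating P-values: Example 1
MLE ordering
\[
\begin{aligned}
p = 2 \Bigg( & \int_{3.5\sqrt{.2/.4}}^{\infty} f(1, z \mid \theta = 0)\,dz + \int_{3.5}^{\infty} f(2, z \mid \theta = 0)\,dz + \int_{3.5\sqrt{.6/.4}}^{\infty} f(3, z \mid \theta = 0)\,dz \\
& + \int_{3.5\sqrt{.8/.4}}^{\infty} f(4, z \mid \theta = 0)\,dz + \int_{3.5\sqrt{1/.4}}^{\infty} f(5, z \mid \theta = 0)\,dz \Bigg) = 0.0138
\end{aligned}
\]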

Calculating P-values: Example 1
Likelihood ratio ordering
\[ p = 2 \left( \sum_{i=1}^{5} \int_{3.5}^{\infty} f(i, z \mid \theta = 0)\,dz \right) = 0.0013 \]

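Calculating P-values: Example 1
Score test ordering
\[
\begin{aligned}
p = 2 \Bigg( & \int_{3.5\sqrt{.4/.2}}^{\infty} f(1, z \mid \theta = 0)\,dz + \int_{3.5}^{\infty} f(2, z \mid \theta = 0)\,dz + \int_{3.5\sqrt{.4/.6}}^{\infty} f(3, z \mid \theta = 0)\,dz \\
& + \int_{3.5\sqrt{.4/.8}}^{\infty} f(4, z \mid \theta = 0)\,dz + \int_{3.5\sqrt{.4/1}}^{\infty} f(5, z \mid \theta = 0)\,dz \Bigg) = 0.0258
\end{aligned}
\]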

Calculating P-values: Example 1 Summary
Ordering      p-value
Stage-wise    0.0005
MLE           0.0138
LR            0.0013
Score test    0.0258

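Calculating P-values: Example 2
Stage-wise ordering
\[
\begin{aligned}
p = 2 \Bigg( & \int_{4.56}^{\infty} f(1, z \mid \theta = 0)\,dz + \int_{3.23}^{\infty} f(2, z \mid \theta = 0)\,dz + \int_{2.63}^{\infty} f(3, z \mid \theta = 0)\,dz \\
& + \int_{2.28}^{\infty} f(4, z \mid \theta = 0)\,dz + \int_{2.5}^{\infty} f(5, z \mid \theta = 0)\,dz \Bigg) = 0.0295
\end{aligned}
\]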

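Calculating P-values: Example 2
MLE ordering
\[
\begin{aligned}
p = 2 \Bigg( & \int_{2.5\sqrt{.2/.4}}^{\infty} f(1, z \mid \theta = 0)\,dz + \int_{2.5}^{\infty} f(2, z \mid \theta = 0)\,dz + \int_{2.5\sqrt{.6/.4}}^{\infty} f(3, z \mid \theta = 0)\,dz \\
& + \int_{2.5\sqrt{.8/.4}}^{\infty} f(4, z \mid \theta = 0)\,dz + \int_{2.5\sqrt{1/.4}}^{\infty} f(5, z \mid \theta = 0)\,dz \Bigg) = 0.0913
\end{aligned}
\]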

Calculating P-values: Example 2
Likelihood ratio ordering
\[ p = 2 \left( \sum_{i=1}^{5} \int_{2.5}^{\infty} f(i, z \mid \theta = 0)\,dz \right) = 0.0481 \]

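Calculating P-values: Example 2
Score test ordering
\[
\begin{aligned}
p = 2 \Bigg( & \int_{2.5\sqrt{.4/.2}}^{\infty} f(1, z \mid \theta = 0)\,dz + \int_{2.5}^{\infty} f(2, z \mid \theta = 0)\,dz + \int_{2.5\sqrt{.4/.6}}^{\infty} f(3, z \mid \theta = 0)\,dz \\
& + \int_{2.5\sqrt{.4/.8}}^{\infty} f(4, z \mid \theta = 0)\,dz + \int_{2.5\sqrt{.4/1}}^{\infty} f(5, z \mid \theta = 0)\,dz \Bigg) = 0.2129
\end{aligned}
\]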

Calculating P-values: Example 2 Summary
Ordering      p-value
Stage-wise    0.0295
MLE           0.0913
LR            0.0481
Score test    0.2129
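
The stage-wise p-values in the two examples can be checked numerically. The following is a minimal sketch and not the method used to produce the slides: it assumes information fractions 0.2, 0.4, ..., 1.0, a symmetric two-sided design that continues while $|Z_i| < b_i$, and midpoint-rule integration on a fixed grid. The function name `stagewise_upper_p` is hypothetical.

```python
import math
from statistics import NormalDist


def stagewise_upper_p(k, z_obs, b, info, n_grid=300):
    """One-sided upper p-value under theta = 0, stage-wise ordering.

    Assumes the trial continues at stage i while |Z_i| < b[i-1];
    info[i-1] is the information level I_i. Works on the score scale
    S_i = Z_i * sqrt(I_i), whose increments are independent N(0, dI).
    """
    total = 0.0
    grid, dens, step = [], [], 0.0   # sub-density of S_{i-1} on the continuation region
    for i in range(1, k + 1):
        root_info = math.sqrt(info[i - 1])
        d_info = info[i - 1] - (info[i - 2] if i > 1 else 0.0)
        incr = NormalDist(0.0, math.sqrt(d_info))
        # "More extreme" region at stage i: above b_i before stage k, above z_obs at stage k
        cut = (z_obs if i == k else b[i - 1]) * root_info
        if i == 1:
            total += 1.0 - incr.cdf(cut)
        else:
            total += step * sum(d * (1.0 - incr.cdf(cut - s))
                                for s, d in zip(grid, dens))
        if i == k:
            break
        # Propagate the sub-density onto the stage-i continuation region
        limit = b[i - 1] * root_info
        new_step = 2.0 * limit / n_grid
        new_grid = [-limit + (j + 0.5) * new_step for j in range(n_grid)]
        if i == 1:
            new_dens = [incr.pdf(t) for t in new_grid]
        else:
            new_dens = [step * sum(d * incr.pdf(t - s) for s, d in zip(grid, dens))
                        for t in new_grid]
        grid, dens, step = new_grid, new_dens, new_step
    return total


b = [4.56, 3.23, 2.63, 2.28, 2.04]      # O'Brien-Fleming boundaries from the example
info = [0.2, 0.4, 0.6, 0.8, 1.0]        # assumed information fractions

p_example_1 = 2 * stagewise_upper_p(2, 3.5, b, info)
p_example_2 = 2 * stagewise_upper_p(5, 2.5, b, info)
print(p_example_1, p_example_2)
```

Up to rounding of the boundaries and grid error, the doubled values should be close to the stage-wise entries in the two summary tables. Note that the first call never touches the boundaries for stages 3 to 5, which illustrates why the stage-wise p-value does not depend on future information levels.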

P-values: Summary
There are many approaches to ordering the sample space after a group sequential clinical trial, and p-values can vary considerably depending on the ordering applied. The stage-wise ordering is preferred because the p-value is less than $\alpha$ if and only if $H_0$ is rejected, and because the p-value does not depend on information levels beyond the observed stopping stage.

Confidence intervals
In general, $(1 - \alpha)$-level confidence intervals for $\theta$ can be derived by inverting a hypothesis test with type I error $\alpha$. Confidence intervals after a group sequential test will also rely on the orderings of the sample space described previously. Properties of confidence intervals after a group sequential test will depend on how the sample space is ordered.

Inverting a hypothesis test
For any ordering and any value of $\theta_0$, we can find pairs $(k_u(\theta_0), z_u(\theta_0))$ and $(k_l(\theta_0), z_l(\theta_0))$ such that
\[ P_{\theta=\theta_0}\big((T, Z_T) \succeq (k_u(\theta_0), z_u(\theta_0))\big) = \alpha/2 \]
and
\[ P_{\theta=\theta_0}\big((T, Z_T) \preceq (k_l(\theta_0), z_l(\theta_0))\big) = \alpha/2 \]

Inverting a hypothesis test
The acceptance region,
\[ A(\theta_0) = \{(k, z) : (k_l(\theta_0), z_l(\theta_0)) < (k, z) < (k_u(\theta_0), z_u(\theta_0))\}, \]
defines a two-sided hypothesis test of $\theta = \theta_0$ with type I error rate $\alpha$. Therefore, the set
\[ \theta_{CS} = \{\theta : (T, Z_T) \in A(\theta)\} \]
is a $(1 - \alpha)$-level confidence set for $\theta$.

Inverting a hypothesis test
If $P_\theta\big((T, Z_T) \succeq (k, z)\big)$ is an increasing function of $\theta$ for all pairs $(k, z)$, then the set of all pairs $(k, z)$ is said to be stochastically ordered. We will refer to this as the monotonicity assumption. In this case, $(k_l(\theta_0), z_l(\theta_0))$ and $(k_u(\theta_0), z_u(\theta_0))$ are increasing in $\theta_0$, where increasing refers to the specified ordering of the sample space. Therefore, if the monotonicity assumption holds, the set $\theta_{CS}$ is an interval, $(\theta_L, \theta_U)$, where, for the observed pair $(k, z)$,
\[ P_{\theta_L}\big((T, Z_T) \succeq (k, z)\big) = P_{\theta_U}\big((T, Z_T) \preceq (k, z)\big) = \alpha/2 \]
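
Under the monotonicity assumption the two endpoints can be found by root-finding in $\theta$. The sketch below does this for the stage-wise ordering and the observed pair $(2, 3.5)$ from the earlier O'Brien-Fleming example. It is an illustration under stated assumptions, not the course's own software: the information fractions 0.2, ..., 1.0 are treated as the information levels (so $\theta$ is in those units), the design is taken to continue while $|Z_i| < b_i$, integration uses a midpoint grid, and the search bracket $[-20, 30]$ and both function names are hypothetical choices.

```python
import math
from statistics import NormalDist


def prob_at_least_as_extreme(theta, k, z_obs, b, info, n_grid=200):
    """P_theta((T, Z_T) is at least as extreme as (k, z_obs)), stage-wise ordering.

    Uses the score S_i = Z_i * sqrt(I_i), whose increments are
    independent N(theta * dI, dI); continuation while |Z_i| < b[i-1].
    """
    total = 0.0
    grid, dens, step = [], [], 0.0
    for i in range(1, k + 1):
        root_info = math.sqrt(info[i - 1])
        d_info = info[i - 1] - (info[i - 2] if i > 1 else 0.0)
        incr = NormalDist(theta * d_info, math.sqrt(d_info))
        cut = (z_obs if i == k else b[i - 1]) * root_info
        if i == 1:
            total += 1.0 - incr.cdf(cut)
        else:
            total += step * sum(d * (1.0 - incr.cdf(cut - s))
                                for s, d in zip(grid, dens))
        if i == k:
            break
        limit = b[i - 1] * root_info
        new_step = 2.0 * limit / n_grid
        new_grid = [-limit + (j + 0.5) * new_step for j in range(n_grid)]
        if i == 1:
            new_dens = [incr.pdf(t) for t in new_grid]
        else:
            new_dens = [step * sum(d * incr.pdf(t - s) for s, d in zip(grid, dens))
                        for t in new_grid]
        grid, dens, step = new_grid, new_dens, new_step
    return total


def solve_theta(target, k, z_obs, b, info, lo=-20.0, hi=30.0, iters=60):
    """Bisection for the theta at which the upper probability equals target.

    Relies on the probability being increasing in theta (monotonicity).
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if prob_at_least_as_extreme(mid, k, z_obs, b, info) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


b = [4.56, 3.23, 2.63, 2.28, 2.04]
info = [0.2, 0.4, 0.6, 0.8, 1.0]   # assumed information levels
alpha = 0.05
k_obs, z_obs = 2, 3.5

# Lower limit: upper tail probability alpha/2.
# Upper limit: lower tail probability alpha/2, i.e. upper tail 1 - alpha/2.
theta_L = solve_theta(alpha / 2, k_obs, z_obs, b, info)
theta_U = solve_theta(1 - alpha / 2, k_obs, z_obs, b, info)
print(theta_L, theta_U)
```

Bisection is used only because it needs nothing beyond monotonicity; any bracketing root-finder would serve equally well.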

Desired Properties of Confidence Intervals
We would like confidence sets formed after a group sequential design to have the following properties:
$\theta_{CS}$ should be an interval.
$\theta_{CS}$ should agree with the original test.
$\theta_{CS}$ should contain the MLE, $\hat\theta = Z_T / \sqrt{I_T}$.
Narrower confidence intervals are preferred.
$\theta_{CS}$ should be well defined when information levels are unpredictable.
Whether or not these properties hold depends on how the sample space is ordered.

$\theta_{CS}$ should be an interval
This holds for the stage-wise ordering when a two-sided or one-sided test is used, but not for a two-sided test with an inner wedge. This holds for the MLE ordering. This does not always hold for the score test or likelihood ratio ordering, but will be true in most instances.

$\theta_{CS}$ should agree with the original test
This holds for the stage-wise and MLE orderings. This does not necessarily hold for the likelihood ratio and score test orderings.

$\theta_{CS}$ should contain the MLE, $\hat\theta = Z_T / \sqrt{I_T}$
This may not occur for the stage-wise ordering. This will hold for the MLE, likelihood ratio and score test orderings.

Narrower confidence intervals are preferred
The width of a confidence interval depends on the design being used, the confidence level and the true value of $\theta$. Only limited numerical studies have been completed. The MLE and likelihood ratio orderings produce slightly narrower intervals, but the difference is negligible.

$\theta_{CS}$ should be well defined when information levels are unpredictable
This holds for the stage-wise ordering. As previously mentioned, the MLE, likelihood ratio and score test orderings rely on information at future, unobserved time-points. Therefore, only the stage-wise ordering can be used when information levels are unpredictable.

Confidence Intervals: Summary
Confidence intervals can be formed by inverting a hypothesis test. Confidence intervals will depend on how the sample space is ordered. The stage-wise ordering is most commonly used when continuation regions are an interval. The MLE ordering is most commonly used when continuation regions are not an interval.

Confidence Intervals: An Alternate Approach
The previously described approach is appropriate for constructing confidence intervals at study completion. We might, instead, prefer confidence intervals that can be formed at any interim analysis. This is particularly important for safety monitoring boards deciding whether or not the study should continue.

Repeated Confidence Intervals
Let $CI_1, CI_2, \dots, CI_K$ be a sequence of confidence intervals formed at analyses $k = 1, 2, \dots, K$. This sequence of confidence intervals is known as repeated confidence intervals. Repeated confidence intervals are impacted by multiple looks in the same way as repeated hypothesis tests. That is, a sequence of naive $100(1 - \alpha)\%$ confidence intervals will have less than $100(1 - \alpha)\%$ overall coverage across the $K$ analyses.

Coverage Probability of Naive 95% Repeated Confidence Intervals
K     Overall coverage probability
1     0.95
2     0.92
3     0.89
4     0.87
5     0.86
10    0.81
20    0.75
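
The overall coverage in this table is the probability that $|Z_k| < 1.96$ at every one of the $K$ analyses. A minimal sketch of that calculation follows, assuming equally spaced information (so the score $S_k = Z_k\sqrt{k}$ is a random walk with N(0, 1) steps) and midpoint-rule integration; the function name `naive_coverage` is hypothetical.

```python
import math
from statistics import NormalDist


def naive_coverage(max_looks, crit=1.96, n_grid=200):
    """Return {K: P(|Z_k| < crit for all k = 1..K)} at the true parameter.

    Assumes one unit of information per analysis, so the score
    S_k = Z_k * sqrt(k) has independent N(0, 1) increments and the
    interval at analysis k covers when |S_k| < crit * sqrt(k).
    """
    step_pdf = NormalDist().pdf
    coverage = {}
    grid, dens, step = [], [], 0.0   # sub-density of S_{k-1} on paths that covered so far
    for k in range(1, max_looks + 1):
        limit = crit * math.sqrt(k)
        new_step = 2.0 * limit / n_grid
        new_grid = [-limit + (j + 0.5) * new_step for j in range(n_grid)]
        if k == 1:
            new_dens = [step_pdf(t) for t in new_grid]
        else:
            new_dens = [step * sum(d * step_pdf(t - s) for s, d in zip(grid, dens))
                        for t in new_grid]
        grid, dens, step = new_grid, new_dens, new_step
        coverage[k] = step * sum(dens)   # mass still inside every interval
    return coverage


coverage = naive_coverage(20)
for looks in (1, 2, 3, 4, 5, 10, 20):
    print(looks, round(coverage[looks], 2))
```

Under these assumptions the printed values should track the table up to rounding, with coverage falling steadily as the number of looks grows.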

Repeated Confidence Intervals with Correct Coverage
The goal is to construct a sequence of repeated confidence intervals that provides correct overall coverage. The simplest approach to achieving this goal is to invert a group sequential test with the appropriate type I error probability.

Repeated Confidence Intervals with Correct Coverage
In general, a two-sided group sequential hypothesis test of $H_0 : \beta = \beta_0$ will reject at analysis $k$, for $k = 1, \dots, K$, if
\[ |Z_k| = \left| (\hat\beta_k - \beta_0) \sqrt{I_k} \right| > c_k \]
The general form of the repeated confidence intervals corresponding to this test is, for $k = 1, \dots, K$,
\[ CI_k = \left\{ \beta_0 : \left| (\hat\beta_k - \beta_0) \sqrt{I_k} \right| < c_k \right\} \]

Repeated Confidence Intervals with Correct Coverage
We know that
\[ P_{\beta_0}\left( \left| (\hat\beta_k - \beta_0) \sqrt{I_k} \right| > c_k \text{ for some } k = 1, \dots, K \right) = \alpha \]
This implies
\[ P_{\beta_0}\left( \beta_0 \notin CI_k \text{ for some } k = 1, \dots, K \right) = \alpha \]
That is, repeated confidence intervals derived from inverting a group sequential test will have correct overall coverage, $1 - \alpha$.

Repeated Confidence Intervals with Correct Coverage
In general, standard confidence intervals resulting from normal theory have the following form:
\[ \left( \hat\beta_k - z_{1-\alpha/2} / \sqrt{I_k},\ \hat\beta_k + z_{1-\alpha/2} / \sqrt{I_k} \right) \]
For a repeated confidence interval, the general form is
\[ \left( \hat\beta_k - c_k / \sqrt{I_k},\ \hat\beta_k + c_k / \sqrt{I_k} \right) \]
Note: this is the case for inverting a two-sided test but will not necessarily be the case for inverting a one-sided test.
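
As a small illustration of the two-sided form, the sketch below computes repeated confidence intervals at each analysis using the O'Brien-Fleming critical values from the earlier example as $c_k$. The interim estimates and information levels are hypothetical numbers chosen only to show the mechanics, and `repeated_ci` is a hypothetical helper name.

```python
import math


def repeated_ci(beta_hat, info_k, c_k):
    """Repeated confidence interval beta_hat -/+ c_k / sqrt(I_k)."""
    half_width = c_k / math.sqrt(info_k)
    return beta_hat - half_width, beta_hat + half_width


# Critical values c_1..c_5 (O'Brien-Fleming, K = 5, alpha = 0.05)
c = [4.56, 3.23, 2.63, 2.28, 2.04]

# Hypothetical interim data: estimates and information levels at each analysis
beta_hats = [0.62, 0.48, 0.51, 0.45, 0.47]
infos = [10.0, 20.0, 30.0, 40.0, 50.0]

intervals = [repeated_ci(bh, ik, ck) for bh, ik, ck in zip(beta_hats, infos, c)]
for k, (lo, hi) in enumerate(intervals, start=1):
    print(f"analysis {k}: ({lo:.3f}, {hi:.3f})")
```

With these critical values the early intervals are very wide and narrow quickly, because both $c_k$ and $1/\sqrt{I_k}$ shrink as information accrues.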

Width of Repeated Confidence Intervals
In order to achieve correct overall coverage, repeated confidence intervals will be wider than standard confidence intervals. Provided below are the ratios of the widths of 95% repeated confidence intervals, formed by inverting Pocock and O'Brien-Fleming boundaries, to the width of standard confidence intervals.
Analysis    Pocock    O'Brien-Fleming
1           1.231     2.328
2           1.231     1.646
3           1.231     1.344
4           1.231     1.164
5           1.231     1.041
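
Because both intervals share the factor $1/\sqrt{I_k}$, each ratio in this table is simply $c_k / z_{0.975}$. A short sketch, assuming the standard tabulated constants for $K = 5$ and two-sided $\alpha = 0.05$ (2.413 for Pocock, which is not stated in these slides, and 2.040 for O'Brien-Fleming, which matches $b_5 = 2.04$ in the earlier example):

```python
import math
from statistics import NormalDist

K = 5
z_fixed = NormalDist().inv_cdf(0.975)   # about 1.96

# Assumed tabulated constants for K = 5, two-sided alpha = 0.05
C_POCOCK = 2.413
C_OBF = 2.040

ratios = []
for k in range(1, K + 1):
    pocock_ratio = C_POCOCK / z_fixed                 # c_k is constant for Pocock
    obf_ratio = C_OBF * math.sqrt(K / k) / z_fixed    # c_k = C * sqrt(K / k)
    ratios.append((k, pocock_ratio, obf_ratio))
    print(k, round(pocock_ratio, 3), round(obf_ratio, 3))
```

Small differences in the third decimal place relative to the table can arise from rounding of the constants.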

Repeated Confidence Intervals: Summary
Repeated confidence intervals can be formed at any interim analysis. Repeated confidence intervals are calibrated to provide correct overall coverage. Repeated confidence intervals are wider than standard confidence intervals.