Pubh 8482: Sequential Analysis

Joseph S. Koopmeiners
Division of Biostatistics
University of Minnesota
Week 8

P-values
When reporting results, we usually report p-values rather than simply stating whether or not we reject the null hypothesis.
For better or worse, investigators usually use p-values to evaluate the strength of evidence against the null hypothesis.
P-values are then translated into hypothesis tests by comparing them to a nominal significance level (usually 0.05).

P-values and group sequential designs
We've seen that a group sequential testing procedure changes the sampling distribution of the test statistic $Z$.
This means that inference based on the usual normal approximation is no longer appropriate.
How do we calculate p-values for group sequential designs?

Problem
A p-value can be interpreted as the probability, under the null hypothesis, of observing a test statistic as extreme or more extreme than what was observed.
This is simple in a fixed-sample design: $Z_1 < Z_2$ implies that $Z_2$ is more extreme than $Z_1$.
It is not so clear in the group sequential setting.

Problem
Which of the following realizations is more extreme?
$(T = 1, Z_1 = 3.70)$
$(T = 2, Z_2 = 4.50)$
It depends on how you order the sample space, and it is not obvious how this should be done.
We will consider four possible orderings:
Stage-wise ordering
MLE ordering
Likelihood ratio ordering
Score test ordering

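Stage-wise Ordering
The pair $(k_2, z_2) \succ (k_1, z_1)$ if any of the following are true, where $a_k$ and $b_k$ are the lower and upper stopping boundaries at analysis $k$:
$k_2 = k_1$ and $z_2 \geq z_1$
$k_2 < k_1$ and $z_2 \geq b_{k_2}$
$k_2 > k_1$ and $z_1 \leq a_{k_1}$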

MLE Ordering
The pair $(k_2, z_2) \succ (k_1, z_1)$ if $z_2 / \sqrt{I_{k_2}} > z_1 / \sqrt{I_{k_1}}$

Likelihood Ratio Ordering
The pair $(k_2, z_2) \succ (k_1, z_1)$ if $z_2 > z_1$

Score Test Ordering
The pair $(k_2, z_2) \succ (k_1, z_1)$ if $z_2 \sqrt{I_{k_2}} > z_1 \sqrt{I_{k_1}}$

Ordering: Example
Consider a one-sided power family test with $K = 5$ and the following stopping boundaries:
$a_1, \ldots, a_5 = -3.49, -0.70, 0.43, 1.13, 1.63$
$b_1, \ldots, b_5 = 8.11, 4.06, 2.70, 2.03, 1.63$
Consider the following pairs of sufficient statistics, with information levels $I_1, \ldots, I_5 = 1, 2, 3, 4, 5$:
$(k_1, z_1) = (2, 5.0)$
$(k_2, z_2) = (4, 0.5)$
$(k_3, z_3) = (5, 1.8)$

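Stage-wise Ordering: Example
$(k_1, z_1) \succ (k_2, z_2)$ because $k_1 < k_2$ and $z_1 = 5.0 > b_{k_1} = b_2 = 4.06$
$(k_1, z_1) \succ (k_3, z_3)$ because $k_1 < k_3$ and $z_1 = 5.0 > b_{k_1} = b_2 = 4.06$
$(k_3, z_3) \succ (k_2, z_2)$ because $k_3 > k_2$ and $z_2 = 0.5 < a_{k_2} = a_4 = 1.13$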

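MLE Ordering: Example
$(k_1, z_1) \succ (k_2, z_2)$ because $z_1 / \sqrt{I_{k_1}} = 3.54 > z_2 / \sqrt{I_{k_2}} = 0.25$
$(k_1, z_1) \succ (k_3, z_3)$ because $z_1 / \sqrt{I_{k_1}} = 3.54 > z_3 / \sqrt{I_{k_3}} = 0.80$
$(k_3, z_3) \succ (k_2, z_2)$ because $z_3 / \sqrt{I_{k_3}} = 0.80 > z_2 / \sqrt{I_{k_2}} = 0.25$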

Likelihood Ratio Ordering: Example
$(k_1, z_1) \succ (k_2, z_2)$ because $z_1 = 5.0 > z_2 = 0.5$
$(k_1, z_1) \succ (k_3, z_3)$ because $z_1 = 5.0 > z_3 = 1.8$
$(k_3, z_3) \succ (k_2, z_2)$ because $z_3 = 1.8 > z_2 = 0.5$

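Score Test Ordering: Example
$(k_1, z_1) \succ (k_2, z_2)$ because $z_1 \sqrt{I_{k_1}} = 7.07 > z_2 \sqrt{I_{k_2}} = 1.00$
$(k_1, z_1) \succ (k_3, z_3)$ because $z_1 \sqrt{I_{k_1}} = 7.07 > z_3 \sqrt{I_{k_3}} = 4.02$
$(k_3, z_3) \succ (k_2, z_2)$ because $z_3 \sqrt{I_{k_3}} = 4.02 > z_2 \sqrt{I_{k_2}} = 1.00$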
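
The four orderings in this example can be checked with a few lines of Python. This is only a sketch: the function names are illustrative, and it assumes information levels $I_k = k$ and negative early lower boundaries ($a_1 = -3.49$, $a_2 = -0.70$), which is what a one-sided power family design would normally have.

```python
from math import sqrt

# Boundaries and information levels for the ordering example.
# Assumptions: a_1 and a_2 are negative, and I_k = k for k = 1..5.
A = [-3.49, -0.70, 0.43, 1.13, 1.63]   # lower boundaries a_1..a_5
B = [8.11, 4.06, 2.70, 2.03, 1.63]     # upper boundaries b_1..b_5
INFO = [1, 2, 3, 4, 5]                 # information levels I_1..I_5


def stagewise_greater(p2, p1):
    """True if p2 = (k2, z2) is at least as extreme as p1 = (k1, z1)."""
    (k2, z2), (k1, z1) = p2, p1
    if k2 == k1:
        return z2 >= z1
    if k2 < k1:
        return z2 >= B[k2 - 1]   # p2 stopped earlier by crossing the upper boundary
    return z1 <= A[k1 - 1]       # p1 stopped earlier by crossing the lower boundary


def mle_stat(pair):
    """MLE ordering compares z / sqrt(I_k)."""
    k, z = pair
    return z / sqrt(INFO[k - 1])


def lr_stat(pair):
    """Likelihood ratio ordering compares z directly."""
    return pair[1]


def score_stat(pair):
    """Score test ordering compares z * sqrt(I_k)."""
    k, z = pair
    return z * sqrt(INFO[k - 1])


p1, p2, p3 = (2, 5.0), (4, 0.5), (5, 1.8)
for name, stat in [("MLE", mle_stat), ("LR", lr_stat), ("score", score_stat)]:
    print(name, [round(stat(p), 2) for p in (p1, p2, p3)])
print("stage-wise:", stagewise_greater(p1, p2),
      stagewise_greater(p1, p3), stagewise_greater(p3, p2))
```

In this particular example all four orderings rank the three pairs the same way; that need not happen in general.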

Calculating P-values
The orderings described on the preceding slides allow us to order the sample space in a sensible manner.
How do we translate these orderings into a p-value?
Recall the definition of a p-value: the probability, under the null hypothesis, of observing a test statistic as extreme or more extreme than what was observed.

Calculating P-values
One-sided upper p-value: $P_{\theta=0}\big((T, Z_T) \succeq (k, z_k)\big)$
One-sided lower p-value: $P_{\theta=0}\big((T, Z_T) \preceq (k, z_k)\big)$
The two-sided p-value is equal to twice the minimum of the upper and lower p-values.

Calculating P-values: Stage-wise Ordering
For the sufficient statistic $(k, z_k)$, the one-sided upper p-value under the stage-wise ordering is:
$$P_{\theta=0}\big((T, Z_T) \succeq (k, z_k)\big) = \sum_{i=1}^{k-1} \int_{b_i}^{\infty} f(i, z \mid \theta = 0)\, dz + \int_{z_k}^{\infty} f(k, z \mid \theta = 0)\, dz$$

Properties of the Stage-wise Ordering
The p-value is less than $\alpha$ if and only if $H_0$ is rejected.
The p-value does not depend on information levels beyond the observed stopping stage.

Calculating P-values: MLE Ordering
For the sufficient statistic $(k, z_k)$, the one-sided upper p-value under the MLE ordering is:
$$P_{\theta=0}\big((T, Z_T) \succeq (k, z_k)\big) = \sum_{i=1}^{K} \int_{z_k \sqrt{I_i / I_k}}^{\infty} f(i, z \mid \theta = 0)\, dz$$

Calculating P-values: Likelihood Ratio Ordering
For the sufficient statistic $(k, z_k)$, the one-sided upper p-value under the likelihood ratio ordering is:
$$P_{\theta=0}\big((T, Z_T) \succeq (k, z_k)\big) = \sum_{i=1}^{K} \int_{z_k}^{\infty} f(i, z \mid \theta = 0)\, dz$$

Calculating P-values: Score Test Ordering
For the sufficient statistic $(k, z_k)$, the one-sided upper p-value under the score test ordering is:
$$P_{\theta=0}\big((T, Z_T) \succeq (k, z_k)\big) = \sum_{i=1}^{K} \int_{z_k \sqrt{I_k / I_i}}^{\infty} f(i, z \mid \theta = 0)\, dz$$
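
The MLE, likelihood ratio and score test p-values differ only in the lower limit of integration used at each stage. A minimal sketch of those limits follows; the function name, the observation $(k, z_k) = (2, 3.5)$ and the five equally spaced information levels are illustrative assumptions.

```python
from math import sqrt


def lower_limits(k, z_k, info, ordering):
    """Lower integration limit at each stage i = 1..K for the upper p-value."""
    I_k = info[k - 1]
    if ordering == "mle":      # z / sqrt(I_i) >= z_k / sqrt(I_k)
        return [z_k * sqrt(I_i / I_k) for I_i in info]
    if ordering == "lr":       # z >= z_k
        return [z_k for _ in info]
    if ordering == "score":    # z * sqrt(I_i) >= z_k * sqrt(I_k)
        return [z_k * sqrt(I_k / I_i) for I_i in info]
    raise ValueError(f"unknown ordering: {ordering}")


info = [0.2, 0.4, 0.6, 0.8, 1.0]   # assumed information levels I_1..I_5
for ordering in ("mle", "lr", "score"):
    print(ordering, [round(x, 3) for x in lower_limits(2, 3.5, info, ordering)])
```

Every limit uses all $K$ information levels, including those for analyses after the observed stopping stage.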

Properties of the MLE, likelihood ratio and score test orderings
All three cases potentially involve integrating over regions that do not correspond to rejecting the null hypothesis.
P-values depend on the information levels for future (unobserved) stopping times.

Calculating P-values: Example
Consider a group sequential design with two-sided O'Brien-Fleming boundaries, $K = 5$ and $\alpha = 0.05$:
$b_1 = 4.56$, $b_2 = 3.23$, $b_3 = 2.63$, $b_4 = 2.28$, $b_5 = 2.04$
Two cases:
$(k, z_k) = (2, 3.5)$
$(k, z_k) = (5, 2.5)$

Calculating P-values: Example 1
Stage-wise ordering
$$p = 2\left( \int_{4.56}^{\infty} f(1, z \mid \theta = 0)\, dz + \int_{3.5}^{\infty} f(2, z \mid \theta = 0)\, dz \right) = 0.0005$$

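Calculating P-values: Example 1
MLE ordering
$$p = 2\Big( \int_{3.5\sqrt{.2/.4}}^{\infty} f(1, z \mid \theta = 0)\, dz + \int_{3.5}^{\infty} f(2, z \mid \theta = 0)\, dz + \int_{3.5\sqrt{.6/.4}}^{\infty} f(3, z \mid \theta = 0)\, dz + \int_{3.5\sqrt{.8/.4}}^{\infty} f(4, z \mid \theta = 0)\, dz + \int_{3.5\sqrt{1/.4}}^{\infty} f(5, z \mid \theta = 0)\, dz \Big) = 0.0138$$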

Calculating P-values: Example 1
Likelihood ratio ordering
$$p = 2\left( \sum_{i=1}^{5} \int_{3.5}^{\infty} f(i, z \mid \theta = 0)\, dz \right) = 0.0013$$

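Calculating P-values: Example 1
Score test ordering
$$p = 2\Big( \int_{3.5\sqrt{.4/.2}}^{\infty} f(1, z \mid \theta = 0)\, dz + \int_{3.5}^{\infty} f(2, z \mid \theta = 0)\, dz + \int_{3.5\sqrt{.4/.6}}^{\infty} f(3, z \mid \theta = 0)\, dz + \int_{3.5\sqrt{.4/.8}}^{\infty} f(4, z \mid \theta = 0)\, dz + \int_{3.5\sqrt{.4/1}}^{\infty} f(5, z \mid \theta = 0)\, dz \Big) = 0.0258$$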

Calculating P-values: Example 1 Summary
Ordering      p-value
Stage-wise    0.0005
MLE           0.0138
LR            0.0013
Score test    0.0258

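Calculating P-values: Example 2
Stage-wise ordering
$$p = 2\Big( \int_{4.56}^{\infty} f(1, z \mid \theta = 0)\, dz + \int_{3.23}^{\infty} f(2, z \mid \theta = 0)\, dz + \int_{2.63}^{\infty} f(3, z \mid \theta = 0)\, dz + \int_{2.28}^{\infty} f(4, z \mid \theta = 0)\, dz + \int_{2.5}^{\infty} f(5, z \mid \theta = 0)\, dz \Big) = 0.0295$$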
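
The stage-wise p-values in Examples 1 and 2 can be approximated by recursive numerical integration. The sketch below rests on two assumptions: $f(i, z \mid \theta = 0)$ is read as the sub-density of stopping at analysis $i$ with $Z_i = z$, and the information levels are $I_k = k/5$. The helper names and the Simpson grid size are illustrative choices, not part of the course material.

```python
from math import erfc, exp, pi, sqrt


def upper_tail(x):
    """P(N(0,1) > x)."""
    return 0.5 * erfc(x / sqrt(2.0))


def normal_pdf(x, sd):
    return exp(-0.5 * (x / sd) ** 2) / (sd * sqrt(2.0 * pi))


def simpson_weights(n, h):
    """Composite Simpson weights for n + 1 equally spaced points (n even)."""
    w = [4.0 if j % 2 else 2.0 for j in range(n + 1)]
    w[0] = w[n] = 1.0
    return [h / 3.0 * x for x in w]


def stagewise_upper_p(k, z_k, b, info, n=200):
    """One-sided upper p-value at theta = 0 for a symmetric two-sided design.

    Works on the score scale S_i = Z_i * sqrt(I_i), which has independent
    N(0, I_i - I_{i-1}) increments under the null hypothesis.
    """
    # Stage 1: compare Z_1 with z_k (if k == 1) or with the boundary b_1.
    p = upper_tail(z_k if k == 1 else b[0])
    if k == 1:
        return p
    # Sub-density of S_1 on the stage-1 continuation region (-c, c).
    c = b[0] * sqrt(info[0])
    h = 2.0 * c / n
    grid = [-c + j * h for j in range(n + 1)]
    wts = simpson_weights(n, h)
    dens = [normal_pdf(s, sqrt(info[0])) for s in grid]
    for i in range(2, k + 1):
        sd = sqrt(info[i - 1] - info[i - 2])
        z_cut = z_k if i == k else b[i - 1]
        s_cut = z_cut * sqrt(info[i - 1])
        # Probability of reaching stage i and exceeding the cut-off there.
        p += sum(w * d * upper_tail((s_cut - s) / sd)
                 for w, d, s in zip(wts, dens, grid))
        if i < k:
            # Propagate the sub-density to the stage-i continuation region.
            c = b[i - 1] * sqrt(info[i - 1])
            h_new = 2.0 * c / n
            new_grid = [-c + j * h_new for j in range(n + 1)]
            dens = [sum(w * d * normal_pdf(t - s, sd)
                        for w, d, s in zip(wts, dens, grid))
                    for t in new_grid]
            grid, wts = new_grid, simpson_weights(n, h_new)
    return p


b = [4.56, 3.23, 2.63, 2.28, 2.04]   # O'Brien-Fleming boundaries from the example
info = [0.2, 0.4, 0.6, 0.8, 1.0]     # assumed information levels
p_ex1 = 2 * stagewise_upper_p(2, 3.5, b, info)
p_ex2 = 2 * stagewise_upper_p(5, 2.5, b, info)
print(round(p_ex1, 4), round(p_ex2, 4))
```

Because the boundaries are rounded to two decimals, the results should match the slide values only approximately. Note that the calculation for $(k, z_k) = (2, 3.5)$ never touches the information levels for analyses 3 to 5.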

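Calculating P-values: Example 2
MLE ordering (here $k = 5$, so the limits are $z_k \sqrt{I_i / I_5}$ with $I_5 = 1$)
$$p = 2\Big( \int_{2.5\sqrt{.2/1}}^{\infty} f(1, z \mid \theta = 0)\, dz + \int_{2.5\sqrt{.4/1}}^{\infty} f(2, z \mid \theta = 0)\, dz + \int_{2.5\sqrt{.6/1}}^{\infty} f(3, z \mid \theta = 0)\, dz + \int_{2.5\sqrt{.8/1}}^{\infty} f(4, z \mid \theta = 0)\, dz + \int_{2.5}^{\infty} f(5, z \mid \theta = 0)\, dz \Big) = 0.0913$$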

Calculating P-values: Example 2
Likelihood ratio ordering
$$p = 2\left( \sum_{i=1}^{5} \int_{2.5}^{\infty} f(i, z \mid \theta = 0)\, dz \right) = 0.0481$$

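Calculating P-values: Example 2
Score test ordering (here $k = 5$, so the limits are $z_k \sqrt{I_5 / I_i}$ with $I_5 = 1$)
$$p = 2\Big( \int_{2.5\sqrt{1/.2}}^{\infty} f(1, z \mid \theta = 0)\, dz + \int_{2.5\sqrt{1/.4}}^{\infty} f(2, z \mid \theta = 0)\, dz + \int_{2.5\sqrt{1/.6}}^{\infty} f(3, z \mid \theta = 0)\, dz + \int_{2.5\sqrt{1/.8}}^{\infty} f(4, z \mid \theta = 0)\, dz + \int_{2.5}^{\infty} f(5, z \mid \theta = 0)\, dz \Big) = 0.2129$$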

Calculating P-values: Example 2 Summary
Ordering      p-value
Stage-wise    0.0295
MLE           0.0913
LR            0.0481
Score test    0.2129

P-values: Summary
There are many approaches to ordering the sample space after a group sequential clinical trial.
P-values can vary considerably depending on the ordering applied.
The stage-wise ordering is preferred because:
The p-value is less than $\alpha$ if and only if $H_0$ is rejected.
The p-value does not depend on information levels beyond the observed stopping stage.

Confidence intervals
In general, $(1 - \alpha)$-level confidence intervals for $\theta$ can be derived by inverting a hypothesis test with type I error $\alpha$.
Confidence intervals after a group sequential test also rely on the orderings of the sample space described previously.
The properties of confidence intervals after a group sequential test depend on how the sample space is ordered.

Inverting a hypothesis test
For any ordering and any value $\theta_0$, we can find pairs $(k_u(\theta_0), z_u(\theta_0))$ and $(k_l(\theta_0), z_l(\theta_0))$ such that
$$P_{\theta=\theta_0}\big((T, Z_T) \succeq (k_u(\theta_0), z_u(\theta_0))\big) = \alpha/2$$
and
$$P_{\theta=\theta_0}\big((T, Z_T) \preceq (k_l(\theta_0), z_l(\theta_0))\big) = \alpha/2$$

Inverting a hypothesis test
The acceptance region,
$$A(\theta_0) = \{(k, z) : (k_l(\theta_0), z_l(\theta_0)) \prec (k, z) \prec (k_u(\theta_0), z_u(\theta_0))\},$$
defines a two-sided hypothesis test of $\theta = \theta_0$ with type I error rate $\alpha$.
Therefore, the set $\theta_{CS} = \{\theta : (T, Z_T) \in A(\theta)\}$ is a $(1 - \alpha)$-level confidence set for $\theta$.

Inverting a hypothesis test
If $P_\theta\big((T, Z_T) \succeq (k, z)\big)$ is an increasing function of $\theta$ for all pairs $(k, z)$, then the set of all pairs $(k, z)$ is said to be stochastically ordered.
We will refer to this as the monotonicity assumption.
In this case, $(k_l(\theta_0), z_l(\theta_0))$ and $(k_u(\theta_0), z_u(\theta_0))$ are increasing in $\theta_0$, where increasing refers to the specified ordering of the sample space.
Therefore, if the monotonicity assumption holds, the set $\theta_{CS}$ is an interval, $(\theta_L, \theta_U)$, where, for the observed pair $(k, z)$,
$$P_{\theta_L}\big((T, Z_T) \succeq (k, z)\big) = P_{\theta_U}\big((T, Z_T) \preceq (k, z)\big) = \alpha/2$$
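
Under the monotonicity assumption, $\theta_L$ and $\theta_U$ can be found by one-dimensional root finding. The sketch below does this for the stage-wise ordering and the earlier observation $(T, Z_T) = (2, 3.5)$ with first-stage boundary $b_1 = 4.56$. The information levels $I_1 = 0.2$ and $I_2 = 0.4$, the function names, the bisection bracket and the Simpson grid are all assumptions made for illustration.

```python
from math import erfc, exp, pi, sqrt

B1 = 4.56            # first-stage boundary from the O'Brien-Fleming example
I1, I2 = 0.2, 0.4    # assumed information levels at analyses 1 and 2


def upper_tail(x):
    """P(N(0,1) > x)."""
    return 0.5 * erfc(x / sqrt(2.0))


def tail_prob(theta, z_obs, n=400):
    """P_theta((T, Z_T) >= (2, z_obs)) under the stage-wise ordering."""
    # Stop at analysis 1 by crossing the upper boundary: Z_1 ~ N(theta*sqrt(I1), 1).
    p = upper_tail(B1 - theta * sqrt(I1))
    # Continue past analysis 1, then observe Z_2 >= z_obs. Simpson's rule on
    # the score scale S_1 = Z_1 * sqrt(I1), where S_1 ~ N(theta * I1, I1).
    c = B1 * sqrt(I1)
    h = 2.0 * c / n
    sd1, sd2 = sqrt(I1), sqrt(I2 - I1)
    total = 0.0
    for j in range(n + 1):
        s = -c + j * h
        w = 1.0 if j in (0, n) else (4.0 if j % 2 else 2.0)
        dens = exp(-0.5 * ((s - theta * I1) / sd1) ** 2) / (sd1 * sqrt(2.0 * pi))
        cond = upper_tail((z_obs * sqrt(I2) - s - theta * (I2 - I1)) / sd2)
        total += w * dens * cond
    return p + total * h / 3.0


def solve(target, z_obs, lo=-20.0, hi=30.0, tol=1e-6):
    """Bisection for theta with tail_prob(theta) = target (increasing in theta)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tail_prob(mid, z_obs) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


alpha = 0.05
theta_L = solve(alpha / 2, 3.5)
# P(<= observed) = alpha/2 is the same as P(>= observed) = 1 - alpha/2.
theta_U = solve(1 - alpha / 2, 3.5)
print(round(theta_L, 2), round(theta_U, 2))
```

Only the first two analyses enter the calculation, which is the stage-wise property that nothing beyond the observed stopping stage is needed.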

Desired Properties of Confidence Intervals
We would like confidence sets formed after a group sequential design to have the following properties:
$\theta_{CS}$ should be an interval
$\theta_{CS}$ should agree with the original test
$\theta_{CS}$ should contain the MLE, $\hat{\theta} = Z_T / \sqrt{I_T}$
Narrower confidence intervals are preferred
$\theta_{CS}$ should be well defined when information levels are unpredictable
Whether or not these properties hold depends on how the sample space is ordered.

$\theta_{CS}$ should be an interval
This holds for the stage-wise ordering when a two-sided or one-sided test is used, but not for a two-sided test with an inner wedge.
This holds for the MLE ordering.
This does not always hold for the score test or likelihood ratio orderings, but will be true in most instances.

$\theta_{CS}$ should agree with the original test
This holds for the stage-wise and MLE orderings.
This does not necessarily hold for the likelihood ratio and score test orderings.

$\theta_{CS}$ should contain the MLE, $\hat{\theta} = Z_T / \sqrt{I_T}$
This may not occur for the stage-wise ordering.
This will hold for the MLE, likelihood ratio and score test orderings.

Narrower confidence intervals are preferred
The width of a confidence interval depends on the design being used, the confidence level and the true value of $\theta$.
Only limited numerical studies have been completed.
The MLE and likelihood ratio orderings produce slightly narrower intervals, but the difference is negligible.

$\theta_{CS}$ should be well defined when information levels are unpredictable
This holds for the stage-wise ordering.
As previously mentioned, the MLE, likelihood ratio and score test orderings rely on information levels at future, unobserved time points.
Therefore, only the stage-wise ordering can be used when information levels are unpredictable.

Confidence Intervals: Summary
Confidence intervals can be formed by inverting a hypothesis test.
Confidence intervals will depend on how the sample space is ordered.
The stage-wise ordering is most commonly used when continuation regions are an interval.
The MLE ordering is most commonly used when continuation regions are not an interval.

Confidence Intervals: An Alternate Approach
The previously described approach is appropriate for constructing confidence intervals at study completion.
We might, instead, prefer confidence intervals that can be formed at any interim analysis.
This is particularly important for safety monitoring boards deciding whether or not the study should continue.

Repeated Confidence Intervals
Let $CI_1, CI_2, \ldots, CI_K$ be a sequence of confidence intervals formed at analyses $k = 1, 2, \ldots, K$.
Such a sequence is known as a sequence of repeated confidence intervals.
Repeated confidence intervals are affected by multiple looks in the same way as repeated hypothesis tests.
That is, a sequence of naive $100(1 - \alpha)\%$ confidence intervals will have less than $100(1 - \alpha)\%$ simultaneous coverage over the $K$ analyses.

Coverage Probability of Naive 95% Repeated Confidence Intervals
K     Overall Coverage Probability
1     0.95
2     0.92
3     0.89
4     0.87
5     0.86
10    0.81
20    0.75
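
The loss of coverage can be reproduced by simulation. Interval $k$ covers the true value exactly when the standardized statistic centered at the truth stays below 1.96 in absolute value, so it is enough to simulate the sequence $Z_1, \ldots, Z_K$ under the truth. The sketch below assumes equally spaced information levels; the function name, number of replicates and seed are arbitrary choices.

```python
import random
from math import sqrt


def naive_coverage(K, n_sim=20000, z_crit=1.96, seed=1):
    """Estimate P(|Z_k| < z_crit for all k = 1..K) with equally spaced looks."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_sim):
        s = 0.0
        ok = True
        for k in range(1, K + 1):
            s += rng.gauss(0.0, 1.0)          # independent score increment
            if abs(s / sqrt(k)) >= z_crit:    # interval k misses the true value
                ok = False
                break
        covered += ok
    return covered / n_sim


for K in (1, 2, 3, 4, 5, 10, 20):
    print(K, round(naive_coverage(K), 3))
```

The estimates carry Monte Carlo error of roughly 0.003 with 20,000 replicates, so they should agree with the table only to about two decimals.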

Repeated Confidence Intervals with Correct Coverage
The goal is to construct a sequence of repeated confidence intervals that provides correct overall coverage.
The simplest approach to achieving this goal is to invert a group sequential test with the appropriate type I error probability.

Repeated Confidence Intervals with Correct Coverage
In general, a two-sided group sequential hypothesis test of $H_0: \beta = \beta_0$ will reject if, for some $k = 1, \ldots, K$,
$$|Z_k| = \left| (\hat{\beta}_k - \beta_0) \sqrt{I_k} \right| > c_k$$
The general form of the repeated confidence intervals corresponding to this test is, for $k = 1, \ldots, K$,
$$CI_k = \left\{ \beta_0 : \left| (\hat{\beta}_k - \beta_0) \sqrt{I_k} \right| < c_k \right\}$$

Repeated Confidence Intervals with Correct Coverage
We know that
$$P_{\beta_0}\left( \left| (\hat{\beta}_k - \beta_0) \sqrt{I_k} \right| > c_k \text{ for some } k = 1, \ldots, K \right) = \alpha$$
This implies
$$P_{\beta_0}\left( \beta_0 \notin CI_k \text{ for some } k = 1, \ldots, K \right) = \alpha$$
That is, repeated confidence intervals derived from inverting a group sequential test will have correct overall coverage.

Repeated Confidence Intervals with Correct Coverage
In general, standard confidence intervals resulting from normal theory have the following form:
$$\left( \hat{\beta}_k - z_{1-\alpha/2} / \sqrt{I_k},\ \hat{\beta}_k + z_{1-\alpha/2} / \sqrt{I_k} \right)$$
For a repeated confidence interval, the general form is
$$\left( \hat{\beta}_k - c_k / \sqrt{I_k},\ \hat{\beta}_k + c_k / \sqrt{I_k} \right)$$
Note: this is the case for inverting a two-sided test but will not necessarily be the case for inverting a one-sided test.

Width of Repeated Confidence Intervals
In order to achieve correct overall coverage, repeated confidence intervals will be wider than standard confidence intervals.
The table below gives the ratio of the width of 95% repeated confidence intervals, formed by inverting Pocock and O'Brien-Fleming boundaries, to the width of standard confidence intervals.
Analysis    Pocock    O'Brien-Fleming
1           1.231     2.328
2           1.231     1.646
3           1.231     1.344
4           1.231     1.164
5           1.231     1.041
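
Since both intervals are centered at the same estimate, the width ratio at analysis $k$ is simply $c_k / z_{1-\alpha/2}$. A short sketch follows. It reuses the rounded O'Brien-Fleming boundaries quoted earlier in these notes; the Pocock constant 2.413 for $K = 5$ and $\alpha = 0.05$ is taken from standard tables and is an assumption here, as are the function names and the interim estimate in the last line.

```python
from math import sqrt
from statistics import NormalDist


def repeated_ci(beta_hat, info_k, c_k):
    """Repeated confidence interval: beta_hat -/+ c_k / sqrt(I_k)."""
    half = c_k / sqrt(info_k)
    return beta_hat - half, beta_hat + half


def width_ratio(c_k, alpha=0.05):
    """Width of the repeated CI relative to the standard (1 - alpha) CI."""
    return c_k / NormalDist().inv_cdf(1 - alpha / 2)


obf = [4.56, 3.23, 2.63, 2.28, 2.04]   # rounded O'Brien-Fleming boundaries
pocock = [2.413] * 5                    # assumed Pocock constant (K = 5, alpha = 0.05)

for k, (c_p, c_o) in enumerate(zip(pocock, obf), start=1):
    print(k, round(width_ratio(c_p), 3), round(width_ratio(c_o), 3))

# Hypothetical interim result at analysis 3: beta_hat = 0.5 with I_3 = 30.
print(repeated_ci(0.5, 30.0, obf[2]))
```

Small differences from the table in the third decimal are expected because the O'Brien-Fleming boundaries used here are rounded to two decimals.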

Repeated Confidence Intervals: Summary
Repeated confidence intervals can be formed at any interim analysis.
Repeated confidence intervals are calibrated to provide correct overall coverage.
Repeated confidence intervals are wider than standard confidence intervals.