PubH 8482: Sequential Analysis
Joseph S. Koopmeiners
Division of Biostatistics, University of Minnesota
Week 7
Course Summary
To this point, we have discussed group sequential testing, focusing on
Maintaining the correct type-I error rate and power
Decreasing the expected sample size
These approaches only provide a yes-or-no answer as to whether or not we reject the null hypothesis. Generally, more detail is provided when presenting results.
Four-Number Summary
In general, the following should always be reported when presenting results:
Point estimate
Confidence interval
p-value
Impact of a Group Sequential Design
Implementing a group sequential procedure will change the properties of standard point and interval estimators: group sequential procedures change the sampling distribution of standard estimators, and confidence intervals derived from normal approximations will no longer have nominal coverage. We will start by considering distribution theory for group sequential designs and then consider the implications for point and interval estimation.
Set-up
Let β be our parameter of interest and assume that the sequence of estimates β̂_1, ..., β̂_K follows a multivariate normal distribution with
β̂_k ~ N(β, I_{β,k}^{-1}) for k = 1, ..., K
Cov[β̂_k, β̂_j] = Var[β̂_j] = I_{β,j}^{-1} for k ≤ j
Set-up
Define θ̂_β = β̂ − β_0 and θ_β = β − β_0. For Z_k = θ̂_{β,k} √I_{β,k}, the sequence of test statistics (Z_1, ..., Z_K) follows a multivariate normal distribution with
Z_k ~ N(θ_β √I_{β,k}, 1) for k = 1, ..., K
Cov[Z_k, Z_j] = √(I_{β,k} / I_{β,j}) for k ≤ j
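This covariance structure is easy to check by simulation. Below is a minimal sketch (the information levels and effect size are illustrative, not from any design in these notes) that builds the Z statistics from independent score increments and verifies Cov[Z_1, Z_2] ≈ √(I_1/I_2).

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.3                      # illustrative standardized effect theta_beta
I = np.array([2.0, 5.0])         # illustrative information levels I_1 < I_2
n = 500_000

# Independent score increments: S_k - S_{k-1} ~ N(theta * Delta_k, Delta_k)
deltas = np.diff(np.concatenate([[0.0], I]))
increments = rng.normal(theta * deltas, np.sqrt(deltas), size=(n, 2))
S = np.cumsum(increments, axis=1)     # score process: S_k ~ N(theta * I_k, I_k)
Z = S / np.sqrt(I)                    # test statistics: Z_k ~ N(theta * sqrt(I_k), 1)

emp_cov = np.cov(Z[:, 0], Z[:, 1])[0, 1]
print(emp_cov, np.sqrt(I[0] / I[1]))  # empirical vs theoretical covariance
```

The independent-increments construction is what makes the covariance come out as √(I_k/I_j): only the shared early increments contribute.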
Notation
Let T be the stage at which stopping occurs:
T = min(k : Z_k ∉ C_k)
where C_k is the continuation region at stage k and C_K = ∅.
Notation
Let Z^(k) = (Z_1, ..., Z_k) be the vector of the first k test statistics and, for k = 1, ..., K, define
A_k = {z^(k) : z_i ∈ C_i, i = 1, ..., k−1, and z_k ∉ C_k}
i.e., A_k is the set of sample paths that terminate at stage k.
Density of (Z_1, ..., Z_k)
The joint density of (Z_1, ..., Z_k) follows a multivariate normal distribution as described above. The joint density of (Z_1, ..., Z_k) can also be written as a product of densities of independent normal random variables by considering the following transformation.
Transformations
Consider the following transformation. Let
y_1 = z_1 √I_{β,1} and Δ_1 = I_{β,1}
and, for i = 2, ..., k,
y_i = z_i √I_{β,i} − z_{i−1} √I_{β,i−1} and Δ_i = I_{β,i} − I_{β,i−1}
Joint Density of (y_1, ..., y_k)
For i = 1, y_1 is normally distributed with
E[y_1] = E[z_1 √I_{β,1}] = θ_β I_{β,1} = θ_β Δ_1
Var[y_1] = Var[z_1 √I_{β,1}] = I_{β,1} = Δ_1
For i = 2, ..., k, y_i is normally distributed with
E[y_i] = E[z_i √I_{β,i} − z_{i−1} √I_{β,i−1}] = θ_β (I_{β,i} − I_{β,i−1}) = θ_β Δ_i
Var[y_i] = Var[z_i √I_{β,i} − z_{i−1} √I_{β,i−1}] = I_{β,i} − I_{β,i−1} = Δ_i
Joint Density of (y_1, ..., y_k)
More importantly, we know that the y_i's are independent:
Cov[y_i, y_j] = 0 for i ≠ j
Recall that the Z_i's have independent increments.
Joint Density of (y_1, ..., y_k)
This means that we can write the joint density of (y_1, ..., y_k) as the product of independent normal densities:
f_{T,y(T)}(k, y^(k) | θ_β) = ∏_{i=1}^k (2πΔ_i)^{−1/2} exp(−(y_i − Δ_i θ_β)² / (2Δ_i))
Therefore, the joint density of (z_1, ..., z_k), f_{T,z(T)}(k, z^(k) | θ_β), can be evaluated by evaluating f_{T,y(T)}(k, y^(k) | θ_β) at the corresponding y^(k).
Joint Density of (y_1, ..., y_k): Example
Assume you have the following sequence of test statistics:
z^(k) = (0.73, 0.25, 0.33, 0.10)
and the following sequence of information:
I_{β,k} = (3.53, 5.00, 6.12, 7.07)
Joint Density of (y_1, ..., y_k): Example
The resulting sequences of y_i's and Δ_i's are
y^(k) = (0.73, 3.83, 3.27, 1.31)
and
Δ_k = (3.54, 1.46, 1.12, 0.95)
Joint Density of (y_1, ..., y_k): Example
Therefore, the joint density of z^(k) is
f_{T,z(T)}(4, (0.73, 0.25, 0.33, 0.10) | θ_β) = f_{T,y(T)}(4, (0.73, 3.83, 3.27, 1.31) | θ_β)
= ∏_{i=1}^4 (2πΔ_i)^{−1/2} exp(−(y_i − Δ_i θ_β)² / (2Δ_i))
= 2.3 × 10^{−7}
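The transformation and the product-form density can be sketched in a few lines of code. The helper names below are generic, and the inputs are illustrative values rather than the slide's numbers.

```python
import numpy as np
from scipy.stats import norm

def y_from_z(z, info):
    """Map test statistics z_i to increments y_i = z_i*sqrt(I_i) - z_{i-1}*sqrt(I_{i-1})."""
    s = np.asarray(z, float) * np.sqrt(np.asarray(info, float))
    return np.diff(np.concatenate([[0.0], s]))

def joint_density(y, deltas, theta):
    """f_{T,y}(k, y | theta): product of independent N(Delta_i*theta, Delta_i) densities."""
    y = np.asarray(y, float)
    d = np.asarray(deltas, float)
    return float(np.prod(norm.pdf(y, loc=d * theta, scale=np.sqrt(d))))

# illustrative inputs
z = [1.2, 1.5, 1.1]
info = [2.0, 4.0, 6.0]
y = y_from_z(z, info)
deltas = np.diff(np.concatenate([[0.0], info]))
print(joint_density(y, deltas, theta=0.0))
```

Because the y_i are independent, the joint density is just a product of univariate normal densities, one per increment.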
Equivalence of the two joint distributions
The preceding argument shows that the joint distributions of z^(k) and y^(k) are equivalent. Therefore, we can simply study f_{T,y(T)} to derive theoretical properties of z^(k).
Joint Density of (y_1, ..., y_k)
We can re-write f_{T,y(T)}(k, y^(k) | θ_β) as
f_{T,y(T)}(k, y^(k) | θ_β) = ∏_{i=1}^k (2πΔ_i)^{−1/2} exp(−(y_i − Δ_i θ_β)² / (2Δ_i))
= [∏_{i=1}^k (2πΔ_i)^{−1/2} exp(−y_i² / (2Δ_i))] × exp(θ_β Σ_{i=1}^k y_i − θ_β² Σ_{i=1}^k Δ_i / 2)
= [∏_{i=1}^k (2πΔ_i)^{−1/2} exp(−y_i² / (2Δ_i))] × exp(θ_β z_k √I_{β,k} − θ_β² I_{β,k} / 2)
= h(k, y^(k), I_1, ..., I_k) exp(θ_β z_k √I_{β,k} − θ_β² I_{β,k} / 2)
since Σ_{i=1}^k y_i = z_k √I_{β,k} and Σ_{i=1}^k Δ_i = I_{β,k}.
Joint Density of (y_1, ..., y_k)
There are two primary implications of the previous result:
By the factorization theorem, (Z_T, T) is sufficient for θ_β
Z_T / √I_{β,T} is the MLE of θ_β
Implications
Implications of the sufficiency of (Z_T, T) for θ_β: the only information about θ_β is contained in the stopping time and the final Z. That is, it only matters that you reached the kth analysis; the exact path followed to get there is irrelevant. This should be somewhat intuitive given that the Z's have independent increments: the final increment z_k √I_{β,k} − z_{k−1} √I_{β,k−1} is independent of the first k−1 test statistics.
Sub-densities of Z_k
To this point we have considered the joint densities of (Z_1, ..., Z_k) and (y_1, ..., y_k). We might also consider the sub-densities of Z_k, f(k, z_k | θ_β). The sub-densities can be found by integrating over all paths that result in terminating at the kth interim analysis.
Sub-densities of Z_k
That is, the kth sub-density, f(k, z_k | θ_β), is defined as
f(k, z_k | θ_β) = ∫_{B_k(y)} h(k, y^(k), I_1, ..., I_k) exp(θ_β z_k √I_{β,k} − θ_β² I_{β,k} / 2) dy_{k−1} ... dy_1
where B_k(y) is the set of all paths that result in terminating at the kth interim analysis.
Sub-densities of Z_k
Note that if θ_β = 0, the exponential factor equals 1 and
f(k, z_k | 0) = ∫_{B_k(y)} h(k, y^(k), I_1, ..., I_k) exp(θ_β z_k √I_{β,k} − θ_β² I_{β,k} / 2) dy_{k−1} ... dy_1
= ∫_{B_k(y)} h(k, y^(k), I_1, ..., I_k) dy_{k−1} ... dy_1
Sub-densities of Z_k
This implies that
f(k, z_k | θ_β) = f(k, z_k | 0) exp(θ_β z_k √I_{β,k} − θ_β² I_{β,k} / 2)
This is helpful because it allows us to easily calculate the sub-densities at multiple values of θ_β.
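This identity can be verified numerically. The sketch below uses an illustrative two-stage design (continuation region (−2, 2), information levels I_1 = 2 and I_2 = 4, all assumptions of this example rather than values from the notes), computes the stage-2 sub-density by direct integration over C_1, and checks the exponential-tilt relationship.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

I1, I2 = 2.0, 4.0        # illustrative information levels
a, b = -2.0, 2.0         # illustrative continuation region C_1
d2 = I2 - I1             # Delta_2

def g2(z, theta):
    """Stage-2 sub-density: integrate the stage-1 density times the
    conditional density of Z_2 given Z_1 = u over the continuation region."""
    integrand = lambda u: (norm.pdf(u - theta * np.sqrt(I1))
                           * np.sqrt(I2 / d2)
                           * norm.pdf((z * np.sqrt(I2) - u * np.sqrt(I1) - d2 * theta)
                                      / np.sqrt(d2)))
    val, _ = quad(integrand, a, b)
    return val

theta, z = 0.4, 2.5
lhs = g2(z, theta)
rhs = g2(z, 0.0) * np.exp(theta * z * np.sqrt(I2) - theta**2 * I2 / 2)
print(lhs, rhs)  # the two values agree
```

In practice this means the null sub-densities can be computed once and then tilted to any alternative θ_β.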
Defining the sub-densities recursively
The previous integral is potentially nasty. Luckily, the sub-densities can be defined recursively, which aids in computation.
Defining the sub-densities recursively
The general form of the sub-densities is
f(k, z_k | θ_β) = g_k(z | θ_β) if z ∉ C_k
f(k, z_k | θ_β) = 0 if z ∈ C_k
Defining the sub-densities recursively
Sub-density at the first interim analysis: at the first interim analysis, Z_1 is normally distributed with mean θ_β √I_1 and variance 1. Therefore,
g_1(z | θ_β) = φ(z − θ_β √I_1)
Defining the sub-densities recursively
For k = 2, ..., K, g_k is defined recursively as
g_k(z | θ_β) = ∫_{C_{k−1}} g_{k−1}(u | θ_β) (√I_k / √Δ_k) φ((z √I_k − u √I_{k−1} − Δ_k θ_β) / √Δ_k) du
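A numerical sketch of this recursion, using trapezoidal quadrature on a grid. The two-stage design below (symmetric continuation region (−2.8, 2.8), information 1.6 and 3.2) is hypothetical, chosen for illustration.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import trapezoid

def subdensities(info, lower, upper, theta, zmax=6.0, n=2001):
    """Compute g_1, ..., g_K on a grid via the recursion
    g_k(z) = int_{C_{k-1}} g_{k-1}(u) * sqrt(I_k/Delta_k)
             * phi((z*sqrt(I_k) - u*sqrt(I_{k-1}) - Delta_k*theta)/sqrt(Delta_k)) du."""
    zs = np.linspace(-zmax, zmax, n)
    gs = [norm.pdf(zs - theta * np.sqrt(info[0]))]   # g_1
    for k in range(1, len(info)):
        dk = info[k] - info[k - 1]
        cont = (zs > lower[k - 1]) & (zs < upper[k - 1])  # continuation region C_{k-1}
        u, gu = zs[cont], gs[-1][cont]
        kern = (np.sqrt(info[k] / dk)
                * norm.pdf((zs[:, None] * np.sqrt(info[k])
                            - u[None, :] * np.sqrt(info[k - 1])
                            - dk * theta) / np.sqrt(dk)))
        gs.append(trapezoid(kern * gu[None, :], u, axis=1))
    return zs, gs

# hypothetical two-stage design: C_1 = (-2.8, 2.8), I = (1.6, 3.2)
zs, gs = subdensities([1.6, 3.2], lower=[-2.8], upper=[2.8], theta=0.0)
p1 = trapezoid(np.where(np.abs(zs) >= 2.8, gs[0], 0.0), zs)  # Pr(stop at stage 1)
p2 = trapezoid(gs[1], zs)                                    # Pr(stop at stage 2); C_2 is empty
print(p1, p2, p1 + p2)  # the stopping probabilities sum to ~1
```

Each pass over k only requires a one-dimensional integral against the previous sub-density, which is what makes the recursion computationally attractive.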
Defining the sub-densities recursively
Essentially, each sub-density is the kernel of a normal density multiplied by a factor accounting for the possibility of terminating early. The factor is determined by integrating over all sample paths that result in terminating at the kth interim analysis using the recursive procedure described above.
Sub-Densities: Example
Consider a group sequential design with
O'Brien-Fleming stopping boundaries and α = 0.10
K = 4
90% power to reject assuming that θ_β = δ
Sub-Densities: Example
[Figure: sub-densities of Z_k for θ_β = 0, 0.5δ, δ, and 1.5δ; z ranges from −4 to 4, with density values between 0.0 and 0.4.]
Sub-Densities and stopping times
It should be noted that the sub-densities do not integrate to 1. Integrating each sub-density gives the probability of stopping at that interim analysis:
Pr(T = k | θ_β = θ) = ∫_{z ∉ C_k} f(k, z | θ_β = θ) dz
In contrast, the sum of the K integrals will equal 1.
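The correspondence between sub-density integrals and stopping probabilities can also be checked by simulating the trial directly. The sketch below uses a hypothetical two-stage symmetric design under the null; for stage 1 the sub-density integral has the closed form 2Φ(−c_1).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
theta = 0.0
I1 = 1.6           # illustrative stage-1 information
c1 = 2.8           # hypothetical two-sided stage-1 boundary
n = 400_000

# simulate stage-1 statistics: Z_1 ~ N(theta*sqrt(I1), 1)
z1 = rng.normal(theta * I1, np.sqrt(I1), n) / np.sqrt(I1)
p1_mc = (np.abs(z1) >= c1).mean()   # Pr(T = 1); all remaining trials stop at stage 2

print(p1_mc, 2 * norm.sf(c1))  # MC estimate vs exact integral of the stage-1 sub-density
```

With more stages, the same simulation logic extends by recording the first analysis at which |Z_k| crosses the boundary.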
Sub-Densities and stopping times: Example
For example, if θ_β = 0 and assuming the O'Brien-Fleming design discussed before:
Pr(T = 1) = 0.0006
Pr(T = 2) = 0.0140
Pr(T = 3) = 0.0358
Pr(T = 4) = 0.9496
Sub-Densities and stopping times: Example
If θ_β = δ:
Pr(T = 1) = 0.0239
Pr(T = 2) = 0.3407
Pr(T = 3) = 0.3594
Pr(T = 4) = 0.2760
Estimating β
To this point, we have considered distribution theory for a group sequential test of a general parameter β. We are also interested in point and interval estimates of β. In a fixed-sample test, point and interval estimates of β are based on a normal sampling distribution for β̂. We have seen that implementing a group sequential procedure changes the sampling distribution of Z. Group sequential procedures also change the sampling distribution of β̂ and thus change our approach to estimation after a group sequential test.
Sampling distribution of β̂
Previously, we defined the sub-densities of Z_k, f(k, z_k | θ). It should be clear that the overall density is simply
f(z | θ) = Σ_{k=1}^K f(k, z_k | θ)
How do we use this result to derive the sampling density of β̂?
Sampling distribution of β̂
Recall that Z_k = (β̂ − β_0) √I_k. Therefore, the sampling density of β̂ at β̂ = y is
f(y | β) = Σ_{k=1}^K f(k, (y − β_0) √I_k | θ) √I_k
Note that θ = β − β_0, in which case conditioning on θ is synonymous with conditioning on β.
Sampling distribution of β̂: example
Consider the case where x_1, x_2, ..., x_128 are i.i.d. N(µ, σ² = 20). We want to complete a group sequential test of H_0: µ = 0. In this case, β̂ = X̄.
Sampling distribution of β̂: example
Consider a group sequential design with
O'Brien-Fleming stopping boundaries and α = 0.10
K = 4
In this case, I_1, ..., I_4 = 1.6, 3.2, 4.8, 6.4
Density of β̂: Example
[Figure: sampling density f(β̂) under the group sequential design for β = 0, 0.5, and 1; β̂ ranges from −3 to 3.]
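The sampling distribution of β̂ = X̄ in the example above can be simulated directly. The critical values below are hypothetical O'Brien-Fleming-style bounds chosen for illustration (the exact α = 0.10 boundaries are not reproduced here); with symmetric bounds and a positive true mean, early stopping pulls the average estimate upward.

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma2 = 1.0, 20.0           # true mean; variance from the example
n_per_stage, K = 32, 4           # 128 observations in 4 equally spaced looks
crit = np.array([3.35, 2.37, 1.93, 1.67])  # hypothetical OBF-like two-sided bounds
n_trials = 200_000

# per-stage sums of 32 observations, then cumulative means at each look
stage_sums = rng.normal(mu * n_per_stage, np.sqrt(sigma2 * n_per_stage),
                        size=(n_trials, K))
cum_n = n_per_stage * np.arange(1, K + 1)
cum_mean = np.cumsum(stage_sums, axis=1) / cum_n
info = cum_n / sigma2            # I_k = n_k / sigma^2 = 1.6, 3.2, 4.8, 6.4
Z = cum_mean * np.sqrt(info)     # Z_k = beta_hat * sqrt(I_k), beta_0 = 0

stopped = np.abs(Z) >= crit
T = np.where(stopped.any(axis=1), stopped.argmax(axis=1), K - 1)
beta_hat = cum_mean[np.arange(n_trials), T]  # estimate at the stopping stage
print(beta_hat.mean())           # exceeds mu = 1: upward bias from early stopping
```

A histogram of `beta_hat` shows the distorted, non-normal shape sketched in the figure above.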
Density of β̂: Example
We see that the sampling distribution is substantially different when a group sequential test is used. The sampling distribution is no longer normal and, therefore, interval estimates based on the normal approximation are no longer valid. The difference between the sampling density under the group sequential test and the usual sampling density becomes more dramatic as β moves away from the null hypothesis.
Expected value of β̂
The expected value of β̂ after a group sequential test can be expressed as
E_β[β̂] = β_0 + Σ_{k=1}^K ∫_{z ∉ C_k} (z / √I_k) f(k, z | β) dz
For simplicity, we will now consider a two-stage design with continuation region C_1 = (a, b) in order to illustrate the bias due to a group sequential clinical trial.
Expected value of β̂ after a two-stage design
The expected value of β̂ after a two-stage design can be expressed as:
E[β̂] = β_0 + ∫_{−∞}^a (z_1 / √I_1) φ(z_1 − θ√I_1) dz_1
+ ∫_b^∞ (z_1 / √I_1) φ(z_1 − θ√I_1) dz_1
+ ∫_a^b ∫ (z_2 / √I_2) φ(z_1 − θ√I_1) (√I_2 / √(I_2 − I_1)) φ((z_2 √I_2 − z_1 √I_1 − (I_2 − I_1)θ) / √(I_2 − I_1)) dz_2 dz_1
Expected value of β̂ after a two-stage design
At stage 1, z_1 is a truncated normal random variable with mean θ√I_1 and variance 1, and
∫_{−∞}^a (z_1 / √I_1) φ(z_1 − θ√I_1) dz_1 = θ Φ(a − θ√I_1) − φ(a − θ√I_1) / √I_1
∫_b^∞ (z_1 / √I_1) φ(z_1 − θ√I_1) dz_1 = θ (1 − Φ(b − θ√I_1)) + φ(b − θ√I_1) / √I_1
Expected value of β̂ after a two-stage design
Considering the double integral, we see that
∫ (z_2 / √I_2) (√I_2 / √(I_2 − I_1)) φ((z_2 √I_2 − z_1 √I_1 − (I_2 − I_1)θ) / √(I_2 − I_1)) dz_2
is simply the expected value of x / I_2, where x = z_2 √I_2 is a normally distributed random variable with mean z_1 √I_1 + (I_2 − I_1)θ and variance I_2 − I_1. Therefore:
∫ (z_2 / √I_2) (√I_2 / √(I_2 − I_1)) φ((z_2 √I_2 − z_1 √I_1 − (I_2 − I_1)θ) / √(I_2 − I_1)) dz_2 = (z_1 √I_1 + (I_2 − I_1)θ) / I_2
Expected value of β̂ after a two-stage design
Therefore,
∫_a^b ∫ (z_2 / √I_2) φ(z_1 − θ√I_1) (√I_2 / √(I_2 − I_1)) φ((z_2 √I_2 − z_1 √I_1 − (I_2 − I_1)θ) / √(I_2 − I_1)) dz_2 dz_1
= ∫_a^b ((z_1 √I_1 + (I_2 − I_1)θ) / I_2) φ(z_1 − θ√I_1) dz_1
= θ (I_1 / I_2) (Φ(b − θ√I_1) − Φ(a − θ√I_1)) + (√I_1 / I_2)(φ(a − θ√I_1) − φ(b − θ√I_1)) + θ ((I_2 − I_1) / I_2) (Φ(b − θ√I_1) − Φ(a − θ√I_1))
= θ (Φ(b − θ√I_1) − Φ(a − θ√I_1)) + (φ(a − θ√I_1) − φ(b − θ√I_1)) √I_1 / I_2
Expected value of β̂ after a two-stage design
Summing everything up, we get
E[β̂] = β_0 + θ Φ(a − θ√I_1) − φ(a − θ√I_1) / √I_1
+ θ (1 − Φ(b − θ√I_1)) + φ(b − θ√I_1) / √I_1
+ θ (Φ(b − θ√I_1) − Φ(a − θ√I_1)) + (φ(a − θ√I_1) − φ(b − θ√I_1)) √I_1 / I_2
= β + (φ(b − θ√I_1) − φ(a − θ√I_1)) (I_2 − I_1) / (√I_1 I_2)
Bias of β̂
From the previous slide, we see that the bias in β̂ is
E[β̂] = β + (φ(b − θ√I_1) − φ(a − θ√I_1)) (I_2 − I_1) / (√I_1 I_2) = β + b(β)
where the bias b(β) depends on
β
a and b
I_1 and I_2
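The bias formula is straightforward to code. A sketch (the function name and the symmetric-bound inputs are illustrative):

```python
import numpy as np
from scipy.stats import norm

def bias(beta, a, b, I1, I2, beta0=0.0):
    """Two-stage bias b(beta) = (phi(b - theta*sqrt(I1)) - phi(a - theta*sqrt(I1)))
    * (I2 - I1) / (sqrt(I1) * I2), with continuation region (a, b)."""
    theta = beta - beta0
    return ((norm.pdf(b - theta * np.sqrt(I1)) - norm.pdf(a - theta * np.sqrt(I1)))
            * (I2 - I1) / (np.sqrt(I1) * I2))

# symmetric bounds: the bias vanishes at beta = 0 and is an odd function of beta
print(bias(0.0, -2.8, 2.8, 1.6, 3.2))  # 0.0
print(bias(1.0, -2.8, 2.8, 1.6, 3.2), -bias(-1.0, -2.8, 2.8, 1.6, 3.2))
```

Plotting this function over a range of β reproduces the bias curves shown in the examples that follow.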
Bias of β̂: Example
Consider a two-stage design with O'Brien-Fleming boundaries with α = 0.05:
a_1 = −2.80
b_1 = 2.80
I_1 = 1.6
I_2 = 3.2
Bias of ˆβ: Example bias 0.15 0.10 0.05 0.00 0.05 0.10 0.15 4 2 0 2 4 beta
Bias of ˆβ: Example What if we double the information? I 1 = 3.2 I 2 = 6.4
Bias of ˆβ: Example bias 0.10 0.05 0.00 0.05 0.10 4 2 0 2 4 beta
Bias of β̂: Example
Our first example considered symmetric bounds. In this case, the bias was naturally symmetric about 0. What if we use asymmetric bounds?
a_1 = 0.38
b_1 = 2.00
Information same as before
Bias of ˆβ: Example bias 0.15 0.10 0.05 0.00 0.05 0.10 0.15 4 2 0 2 4 beta
Bias of ˆβ: Example Again, doubling the information I 1 = 3.2 I 2 = 6.4
Bias of ˆβ: Example bias 0.10 0.05 0.00 0.05 0.10 4 2 0 2 4 beta
Bias of β̂: Summary
Implementing a group sequential procedure results in substantial bias for β̂. Bias is smallest at the extremes and in the middle of the continuation region, where the study either stops early or continues to full enrollment with high probability. Bias is symmetric for symmetric bounds and asymmetric for asymmetric bounds.
Correcting the Bias
We will consider two estimators for correcting the bias caused by a group sequential design:
Whitehead's mean adjusted estimator
The UMVUE suggested by Emerson and Fleming
Whitehead's Mean Adjusted Estimator
Whitehead's mean adjusted estimator is defined as β̂_w such that
β̂ = β̂_w + b(β̂_w)
That is, Whitehead's mean adjusted estimator is the value of β whose expectation equals the observed β̂.
Properties of Whitehead's Mean Adjusted Estimator
β̂_w must be found by numerical search
β̂_w is only bias-adjusted, not unbiased
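A sketch of the numerical search for the two-stage case, using the two-stage bias formula and SciPy's `brentq` root-finder. The design inputs are the illustrative symmetric bounds used earlier; the bracket width is an assumption justified by the bound on |b(·)| noted in the comment.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def bias(beta, a, b, I1, I2):
    """Two-stage bias b(beta) with continuation region (a, b), beta_0 = 0."""
    return ((norm.pdf(b - beta * np.sqrt(I1)) - norm.pdf(a - beta * np.sqrt(I1)))
            * (I2 - I1) / (np.sqrt(I1) * I2))

def whitehead(beta_hat, a, b, I1, I2):
    """Solve beta_hat = bw + b(bw) for Whitehead's mean adjusted estimator bw."""
    f = lambda bw: bw + bias(bw, a, b, I1, I2) - beta_hat
    # |b(.)| <= phi(0)*(I2 - I1)/(sqrt(I1)*I2) < 1 here, so a bracket of
    # +/- 1 around beta_hat is guaranteed to contain the root
    return brentq(f, beta_hat - 1.0, beta_hat + 1.0)

bw = whitehead(0.5, -2.8, 2.8, 1.6, 3.2)
print(bw, bw + bias(bw, -2.8, 2.8, 1.6, 3.2))  # second value recovers ~0.5
```

Since b(β̂_w) is positive for an observed estimate above the null here, the adjusted estimate is pulled back toward 0.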
UMVUE
Emerson and Fleming proposed the UMVUE, defined as
β̂_umvue = E[β̂_1 | (T, Z_T)]
where β̂_1 is the estimate of β after stage 1. Note that β̂_1 is an unbiased estimator of β, and we find the UMVUE by the Rao-Blackwell technique.
Properties of the UMVUE
β̂_umvue has the minimum variance among the class of unbiased estimators
Unbiasedness is a restrictive property and the set of unbiased estimators is narrow
This estimator has substantial variance and, in fact, has larger MSE than β̂_w
Estimating β
Implementing a group sequential design dramatically impacts the sampling distribution of β̂. This results in substantial bias in β̂, depending on the true value of β. Unbiased or bias-reduced estimators have been proposed, but we need to be mindful of the mean-variance trade-off when evaluating these estimators.