Information in a Two-Stage Adaptive Optimal Design
Department of Statistics, University of Missouri
Designed Experiments: Recent Advances in Methods and Applications (DEMA 2011)
Isaac Newton Institute for the Mathematical Sciences
Stanford University, June 14-16, 2011
Motivating Question
For adaptive designs, how does the selection of sequential treatments affect the properties of estimators? Even if the design is ancillary to the experiment, can it be ignored?
Heuristics Behind Adaptive Optimal Designs
Optimal designs (e.g., designs that minimize the variance of the estimated best dose) are functions of the unknown parameters of nonlinear response functions, so they must be estimated. If the MLEs are consistent, then in the limit the plug-in estimates of the optimal design will be consistent as well. Hence estimating the optimal design with accruing data from sequential cohorts of subjects should provide increasingly efficient designs and a reasonable overall strategy for treatment allocation. This strategy has been proposed frequently in the optimal design literature, starting with (or before) Box and Hunter (1963).
Outline: Information in a Two-Stage Model
1. One-Parameter Regression Model with Exponential Mean Function
2. Basic Review for Independent Observations
3. A Two-Stage Design
4. Illustration with Exponential Mean Function
5. Conclusions
Notation
treatments/stages $x_i$, $i = 1, 2$; total sample size $n = n_1 + n_2$; sample weights $w_i = n_i/n$, $\sum_i w_i = 1$;
design $\{w_i, x_i\}$, $n$ fixed; responses $y_i = (y_{i1}, \ldots, y_{in_i})$; expected response $\eta_i = \eta(x_i, \theta)$;
mean response $\bar{y}_i = n_i^{-1} \sum_{j=1}^{n_i} y_{ij}$.
A Regression Model with Exponential Mean Function
$$y = \eta(x, \theta) + \epsilon, \quad \epsilon \sim N(0, 1),$$
$$\eta(x, \theta) = \exp(-\theta x), \quad \theta \in (-\infty, \infty), \quad 0 < x \le b < \infty.$$
Observe responses $y_i = (y_{i1}, \ldots, y_{in_i})$ at $x_i$. For two treatments, in canonical exponential family form:
$$L(\theta; y_1, y_2 \mid x_1, x_2) = \prod_{i=1}^{2} f(\theta, y_i \mid x_i) \propto \exp\left\{-\frac{1}{2} \sum_{i=1}^{2} \sum_{j=1}^{n_i} (y_{ij} - \eta_i)^2\right\} \propto \exp\left\{\sum_{i=1}^{2} n w_i \left(\eta_i \bar{y}_i - \frac{1}{2}\eta_i^2\right)\right\}.$$
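As a quick sanity check on this setup, a minimal simulation sketch (assuming the unit-variance normal errors above; the design points, sample sizes, and grid search are illustrative choices, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

def eta(x, theta):
    # Mean response eta(x, theta) = exp(-theta * x)
    return np.exp(-theta * x)

def simulate(x, n, theta, rng):
    # n independent responses at design point x with N(0, 1) errors
    return eta(x, theta) + rng.standard_normal(n)

def log_likelihood(theta, xs, ys):
    # Up to an additive constant: -(1/2) * sum_i sum_j (y_ij - eta_i)^2
    return sum(-0.5 * np.sum((y - eta(x, theta)) ** 2) for x, y in zip(xs, ys))

theta_t = 1.0
y1 = simulate(0.5, 500, theta_t, rng)
y2 = simulate(1.0, 500, theta_t, rng)

# A crude grid search for the maximizer; it should land near theta_t
grid = np.linspace(0.1, 3.0, 600)
theta_hat = grid[np.argmax([log_likelihood(t, [0.5, 1.0], [y1, y2]) for t in grid])]
```

With a total sample of 1000 the grid maximizer lands close to the true value.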
A Regression Model
The probability that an estimate falls on the boundary goes to zero as $n \to \infty$, so I refer just to the interior for clarity of exposition.
Notation and Basic Elements: $j$th subject in $i$th stage
single unit score function
$$s_{ij} = s_{ij}(y_{ij} \mid x_i, \theta) = \frac{d}{d\theta} \ln f(\theta, y_{ij} \mid x_i) = (y_{ij} - \eta_i)\frac{d\eta_i}{d\theta} = -(y_{ij} - \eta_i)\, x_i \eta_i;$$
within-stage scores $S_i = \sum_{j=1}^{n_i} s_{ij}$; total score $S = \sum_{i=1}^{2} S_i = \sum_{i=1}^{2} n_i (\bar{y}_i - \eta_i) \frac{d\eta_i}{d\theta}$;
expected unit information
$$\mu_i = \mu(x_i, \theta) = \mathrm{Var}_{y_{ij}\mid x_i}[s_{ij}] = E_{y_{ij}\mid x_i}\left[-\frac{d}{d\theta} s_{ij}\right] = E_{y_{ij}\mid x_i}\left[\left(\frac{d\eta_i}{d\theta}\right)^2 - (y_{ij} - \eta_i)\frac{d^2\eta_i}{d\theta^2}\right] = \left(\frac{d\eta_i}{d\theta}\right)^2 = x_i^2 \eta_i^2;$$
per unit expected information $M(\xi, \theta) = \frac{1}{n}\mathrm{Var}[S] = \sum_{i=1}^{2} w_i \mu_i = \sum_{i=1}^{2} w_i x_i^2 \eta_i^2$.
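Under the exponential mean these information quantities are simple closed forms; the sketch below (Monte Carlo sample size is an arbitrary choice) checks $\mu_i = x_i^2\eta_i^2$ against the empirical variance of the unit score:

```python
import numpy as np

def mu(x, theta):
    # Expected unit information: (d eta / d theta)^2 = x^2 * exp(-2 * theta * x)
    return x ** 2 * np.exp(-2.0 * theta * x)

def M(weights, xs, theta):
    # Per-unit expected information M(xi, theta) = sum_i w_i * mu(x_i, theta)
    return sum(w * mu(x, theta) for w, x in zip(weights, xs))

# Monte Carlo check: Var[s_ij] with s_ij = -(y_ij - eta_i) * x_i * eta_i
rng = np.random.default_rng(1)
theta, x = 1.0, 0.5
e = np.exp(-theta * x)
y = e + rng.standard_normal(200_000)
s = -(y - e) * x * e
empirical = s.var()   # should be close to mu(x, theta)
```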
MLE approximation
1. $\ln\{L_n\}$ is twice differentiable in the neighborhood of the true parameter $\theta_t$, so a Taylor expansion of $\ln\{L_n\}$ yields
$$\ln\{L_n\} = \ln\{L_n\}\big|_{\theta=\theta_t} + (\theta - \theta_t)\, S\big|_{\theta=\theta_t} + \frac{1}{2}(\theta - \theta_t)^2 \left(\frac{dS}{d\theta}\right)\bigg|_{\theta=\tilde{\theta}},$$
where $\tilde{\theta} \in (\theta_t, \hat{\theta}_n)$.
2. $\max_\theta\{\ln\{L_n\}\}$ occurs where $S + (\theta - \theta_t)\frac{dS}{d\theta} = 0$.
3. Taking the derivative of $\ln\{L_n\}$ and rearranging terms, for $\hat{\theta}$ in the neighborhood of $\theta_t$,
$$\sqrt{n}\,(\hat{\theta}_n - \theta_t) \approx -\left(\frac{1}{n}\frac{dS}{d\theta}\bigg|_{\theta=\tilde{\theta}}\right)^{-1} \frac{1}{\sqrt{n}}\, S.$$
Asymptotic Normality of the MLE, Given $x_1$ and $x_2$
$$\frac{1}{\sqrt{n}}\, S = \sqrt{w_1}\,\frac{\sum_{j=1}^{n_1} s_{1j}}{\sqrt{n_1}} + \sqrt{w_2}\,\frac{\sum_{j=1}^{n_2} s_{2j}}{\sqrt{n_2}} \xrightarrow{D} N(0,\, w_1\mu_1 + w_2\mu_2);$$
$$\frac{1}{n}\frac{dS}{d\theta} = w_1\, \frac{\sum_{j=1}^{n_1} \frac{d}{d\theta}s_{1j}}{n_1} + w_2\, \frac{\sum_{j=1}^{n_2} \frac{d}{d\theta}s_{2j}}{n_2} \xrightarrow{a.s.} -(w_1\mu_1 + w_2\mu_2) \quad \text{(LLN)}.$$
By Slutsky's theorem,
$$-\left(\frac{1}{n}\frac{dS}{d\theta}\right)^{-1}\frac{1}{\sqrt{n}}\, S \xrightarrow{D} N\left(0,\, [w_1\mu_1 + w_2\mu_2]^{-1}\right).$$
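A Monte Carlo sketch of this fixed-design limit (Newton-Raphson on the score; the sample sizes, starting value, and replication count are arbitrary illustrative choices): the scaled variance of the MLE should approach $[w_1\mu_1 + w_2\mu_2]^{-1}$.

```python
import numpy as np

def mle(ys, xs, theta0=1.0, iters=25):
    # Newton-Raphson on S(theta) = sum_i n_i * (ybar_i - eta_i) * (-x_i * eta_i)
    theta = theta0
    for _ in range(iters):
        S = dS = 0.0
        for x, y in zip(xs, ys):
            n_i, ybar = len(y), y.mean()
            e = np.exp(-theta * x)
            S += -n_i * (ybar - e) * x * e                              # score
            dS += -n_i * (x * e) ** 2 + n_i * (ybar - e) * x ** 2 * e  # dS/dtheta
        theta -= S / dS
    return theta

rng = np.random.default_rng(2)
theta_t, x1, x2, n1, n2 = 1.0, 0.5, 1.0, 200, 200
ests = np.array([
    mle([np.exp(-theta_t * x1) + rng.standard_normal(n1),
         np.exp(-theta_t * x2) + rng.standard_normal(n2)], [x1, x2])
    for _ in range(2000)
])
n = n1 + n2
info = 0.5 * x1**2 * np.exp(-2*theta_t*x1) + 0.5 * x2**2 * np.exp(-2*theta_t*x2)
scaled_var = n * ests.var()   # should be near 1 / (w1*mu1 + w2*mu2)
```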
Adaptively Selecting the Stage 2 Design Point
Observe $y_1$ at fixed $x_1$. The MLE from the stage 1 data is $\hat{\theta}_1 = -\ln \bar{y}_1 / x_1$ if $0 < \bar{y}_1 < 1$, and is at the bounds otherwise. Then select the stage 2 design point as
$$x_2 = \arg\max_x \mathrm{Var}_{y_{2j}\mid x}[s_{2j}]\big|_{\theta=\hat{\theta}_1} = \arg\max_x \left\{ x^2 \exp(-2\hat{\theta}_1 x) \right\} = \min\left\{\frac{1}{\hat{\theta}_1},\, b\right\}.$$
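A sketch of this selection rule on the interior (the helper names are hypothetical, not from the talk):

```python
import numpy as np

def stage1_mle(ybar1, x1):
    # Interior MLE from stage 1: exp(-theta * x1) = ybar1
    # => theta_hat_1 = -ln(ybar1) / x1, valid for 0 < ybar1 < 1
    return -np.log(ybar1) / x1

def adaptive_x2(ybar1, x1, b):
    # x2 maximizes x^2 * exp(-2 * theta_hat_1 * x); the unconstrained
    # maximizer is 1 / theta_hat_1, truncated at the upper bound b
    return min(1.0 / stage1_mle(ybar1, x1), b)

# e.g. if ybar1 = exp(-0.5) at x1 = 0.5, then theta_hat_1 = 1 and x2 = 1
x2 = adaptive_x2(np.exp(-0.5), 0.5, b=100.0)
```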
The Adaptive Likelihood
Assuming responses given the treatment are independent of the past, i.e., $f(y_2 \mid x_2, x_1, y_1, \theta) = f(y_2 \mid x_2, \theta)$, the total likelihood after stage 2 is
$$L(x_1, x_2, y_1, y_2, \theta) = f(y_2 \mid x_2, \theta)\, f(x_2 \mid x_1, y_1, \theta)\, f(y_1 \mid x_1, \theta).$$
So long as $x_2$ is completely determined by $x_1$ and $y_1$, $f(x_2 \mid x_1, y_1, \theta)$ is a delta function; the design is ancillary. Note the density is no longer a member of the exponential family:
$$L(x_1, x_2, y_1, y_2, \theta) = f(y_2 \mid x_2, \theta)\, f(y_1 \mid x_1, \theta) \propto \exp\left\{ n w_1\left(\eta_1 \bar{y}_1 - \frac{1}{2}\eta_1^2\right) + n w_2\left(\eta_2(\bar{y}_1, x_1)\, \bar{y}_2 - \frac{1}{2}\eta_2^2(\bar{y}_1, x_1)\right)\right\}.$$
Adaptive Expected Information
$$\mathrm{Var}[s_{ij}] = E\left[-\frac{d}{d\theta} s_{ij}\right]; \quad E_{y_{ij}\mid x_i}\left[-\frac{d}{d\theta} s_{ij}\right] = E_{y_{ij}\mid x_i}\left[\left(\frac{d\eta_i}{d\theta}\right)^2 - (y_{ij} - \eta_i)\frac{d^2\eta_i}{d\theta^2}\right] = x_i^2 \eta_i^2 = \mu(x_i, \theta).$$
$$\mu(x_2, \theta) = x_2^2 \exp(-2\theta x_2) = \left(\frac{x_1}{\ln \bar{y}_1}\right)^2 \exp\left(2\theta\, \frac{x_1}{\ln \bar{y}_1}\right).$$
$$E\left[-\frac{d}{d\theta} s_{ij}\right] = \begin{cases} \mu(x_1, \theta) & \text{if } i = 1, \\ E_{\bar{y}_1}\left[\mu(x_2(\bar{y}_1), \theta)\right] & \text{if } i = 2. \end{cases}$$
Second Stage Information
NOTE: $\mu(x_2, \theta)$ is a random function of $\bar{y}_1$! $\mu(x_2, \theta)$ will converge to a constant only as $\bar{y}_1$ converges to a constant. Conditioning on $x_2$ is equivalent to conditioning on the stage 1 responses!
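This randomness can be seen directly by simulating $\bar{y}_1$ (a sketch; the stage 1 sample size and replication count are arbitrary, and stage 1 means falling outside the interior $(0, 1)$ are simply discarded):

```python
import numpy as np

rng = np.random.default_rng(3)

def mu(x, theta):
    # Expected unit information x^2 * exp(-2 * theta * x)
    return x ** 2 * np.exp(-2.0 * theta * x)

theta_t, x1, n1, b = 1.0, 0.5, 30, 100.0
ybar1 = np.exp(-theta_t * x1) + rng.standard_normal(20_000) / np.sqrt(n1)
ybar1 = ybar1[(ybar1 > 0) & (ybar1 < 1)]   # keep interior stage 1 outcomes

x2 = np.minimum(-x1 / np.log(ybar1), b)    # adaptive stage 2 point, 1/theta_hat_1
mu2 = mu(x2, theta_t)                      # random second-stage unit information

# mu2 never exceeds the locally optimal value mu(1/theta, theta) = e^{-2} at theta = 1
lo, med, hi = np.percentile(mu2, [2.5, 50, 97.5])
```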
$$\sqrt{n}\,(\hat{\theta}_n - \theta_t) \approx -\left(\frac{1}{n}\frac{dS}{d\theta}\right)^{-1} \frac{1}{\sqrt{n}}\, S; \qquad f(S) = \int f(S \mid \bar{y}_1)\, f(\bar{y}_1)\, d\bar{y}_1.$$
$$\frac{1}{\sqrt{n}}\, S \,\Big|\, \bar{y}_1 = \left(\sqrt{w_1}\,\frac{\sum_{j=1}^{n_1} s_{1j}}{\sqrt{n_1}} + \sqrt{w_2}\,\frac{\sum_{j=1}^{n_2} s_{2j}}{\sqrt{n_2}}\right)\Bigg|\, \bar{y}_1 \approx \sqrt{w_1}\,\frac{\sum_{j=1}^{n_1} s_{1j}}{\sqrt{n_1}} + N(0,\, w_2\mu_2);$$
$$\frac{1}{n}\frac{dS}{d\theta} = w_1\, \frac{\sum_{j=1}^{n_1} \frac{d}{d\theta}s_{1j}}{n_1} + w_2\, \frac{\sum_{j=1}^{n_2} \frac{d}{d\theta}s_{2j}}{n_2} \xrightarrow{a.s.} -(w_1\mu_1 + w_2\mu_2).$$
Illustration: $\theta = 1$, $x \in (.01, 100)$, $x_1 = 0.5$
optimal $x_2 = \arg\max_x \mathrm{Var}_{y_{2j}\mid x_2}[s_{2j}]\big|_{\theta=1} = 1.0$;
adaptive $x_2 = \arg\max_x \mathrm{Var}_{y_{2j}\mid x_2}[s_{2j}]\big|_{\theta=\hat{\theta}_1}$.
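A quick numerical check of the locally optimal value on this range (the grid resolution is an arbitrary choice):

```python
import numpy as np

theta = 1.0
x = np.linspace(0.01, 100.0, 200_000)
info = x ** 2 * np.exp(-2.0 * theta * x)   # Var[s_2j] as a function of x
x_opt = x[np.argmax(info)]                 # should be near 1/theta = 1.0
peak = info.max()                          # e^{-2}, approximately 0.135
```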
[Figure: Asymptotic Fisher information for $x = .5$ and $x = 1.0$ alone; two-sample locally optimal and median two-stage plug-in estimates for $n = 30, 100, 300$, versus $w_1$ at $x = .5$.]
[Figure: Two-sample locally optimal Fisher information and percentiles of two-stage plug-in estimates for $n = 1000$, versus $w_1$ at $x = .5$.]
[Figure: Stage 2 2.5th, 50th, and 97.5th percentiles of $\mu_2$; $n_1 = n_2$ ($w_i = .5$). (a) $n_i = 30$; (b) $n_i = 100$.]
Conclusions
The locally optimal adaptive design is ancillary, but informative.
The conditional incremental information after the first stage is a random variable depending on the stage one observations.
The conditional incremental information does not achieve the Cramér-Rao bound.
MLEs from the locally optimal adaptive design do not have the hoped-for optimality, and if the stage one design has a small sample size, their variance is random.
Thank you!
References
Yao, P. and Flournoy, N. (2010). Information in a Two-stage Adaptive Optimal Design for Normal Random Variables having a One Parameter Exponential Mean Function. In: Giovagnoli, A., Atkinson, A. C., Torsney, B., May, C. (eds.) mODa 9, pp. 229-236. Springer.
[Figure: Asymptotic Fisher$^{-1}$ for $x = .5$ and $x = 1.0$ alone; two-sample locally optimal and two-stage $n\,\mathrm{MSE}(\hat{\theta})$, $n = 1000$, versus $w_1$ at $x = .5$.]
[Figure: Asymptotic Fisher$^{-1}$ for $x = .5$ and $x = 1.0$ alone; two-sample locally optimal and two-stage $n\,\mathrm{MSE}(\hat{\theta})$, $n = 30, 100, 1000$, versus $w_1$ at $x = .5$.]
Remarks
The maximum of $\mu_2$ over $x_1$ is 0.135, which is the asymptotic Fisher information. The 97.5th percentiles of $\mu_2$ attain 0.135 at all but the highest values of $x_1$ for $n = 100$ and $30$. In contrast, the 97.5th percentile of $\frac{d}{d\theta} s_{2j}$ is greater than 0.135 except for values of $x_1$ somewhat less than one. Furthermore, $\frac{d}{d\theta} s_{2j}$ is negative with high probability.
Remarks
The median of $\mu_2$ attains its maximum value when $x_1 = 1$ for $n = 100$ and $30$. The median of $\mu_2$ comes closer to 0.135 at $x_1 = 1$ as the sample size increases. Indeed, the median of $\mu_2$ is close to 0.135 for a range of values of $x_1$ that includes $x_1 = 1$; this range is larger for $n = 100$ than for $n = 30$. For $n = 30$, the 2.5th percentile of $\mu_2$ is zero, except for a very small blip for $x_1$ just less than one; however, for $n = 100$, the 2.5th percentile of $\mu_2$ is nearly quadratic for $x_1 \in (0.2, 1.8)$, with its maximum approximately 50% of 0.135.