Information in a Two-Stage Adaptive Optimal Design

Information in a Two-Stage Adaptive Optimal Design
Department of Statistics, University of Missouri
Designed Experiments: Recent Advances in Methods and Applications (DEMA 2011)
Isaac Newton Institute for the Mathematical Sciences
Stanford University, June 14-16, 2011

Motivating Question

For adaptive designs, how does the selection of sequential treatments affect the properties of estimators? Even if the design is ancillary to the experiment, can it be ignored?

Heuristics Behind Adaptive Optimal Designs

Optimal designs (e.g., designs that minimize the variance of the estimated best dose) are functions of the unknown parameters for nonlinear response functions, so they must be estimated. If the MLEs are consistent, the plug-in estimates of the optimal design will be consistent in the limit. Hence estimating the optimal design with accruing data from sequential cohorts of subjects will provide increasingly efficient designs, and a reasonable overall strategy for treatment allocation; a sketch of this loop follows. This strategy has been proposed frequently in the optimal design literature, starting with (or before?) Box and Hunter (1963).
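As a concrete rendering of this strategy, here is a minimal sketch of the cohort-by-cohort loop; `fit_mle`, `locally_optimal_design`, and `run_cohort` are hypothetical placeholders to be supplied for a concrete model, not functions from the talk.

```python
# A minimal sketch of the cohort-by-cohort strategy described above.
# fit_mle, locally_optimal_design, and run_cohort are hypothetical
# placeholders, to be supplied for a concrete model.
def adaptive_optimal_design(n_cohorts, design0, run_cohort, fit_mle,
                            locally_optimal_design):
    data, design = [], design0
    theta_hat = None
    for _ in range(n_cohorts):
        data.append(run_cohort(design))             # treat the next cohort
        theta_hat = fit_mle(data)                   # re-estimate parameters
        design = locally_optimal_design(theta_hat)  # plug-in optimal design
    return theta_hat, design
```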

Outline: Information in a Two-Stage Model

1. One-Parameter Regression Model with Exponential Mean Function
2. Basic Review for Independent Observations
3. A Two-Stage Design
4. Illustration with Exponential Mean Function
5. Conclusions

Notation

- treatments/stages: $x_i$, $i = 1, 2$
- total sample size $n = \sum_i n_i$; sample weights $w_i = n_i/n$, $\sum_i w_i = 1$
- design $\{w_i, x_i\}$, $n$ fixed
- responses $y_i = (y_{i1}, \ldots, y_{in_i})$
- expected response $\eta_i = \eta(x_i, \theta)$; mean response $\bar{y}_i = n_i^{-1} \sum_{j=1}^{n_i} y_{ij}$

A Regression Model with Exponential Mean Function

$$y = \eta(x, \theta) + \epsilon, \quad \epsilon \sim N(0, 1), \qquad \eta(x, \theta) = \exp(-\theta x), \quad \theta \in (-\infty, \infty), \quad 0 < x \le b < \infty.$$

Observe responses $y_i = (y_{i1}, \ldots, y_{in_i})$ at $x_i$. For two treatments, in canonical exponential family form:
$$L(\theta; y_1, y_2 \mid x_1, x_2) = \prod_{i=1}^{2} f(\theta, y_i \mid x_i) \propto \exp\left\{ -\frac{1}{2} \sum_{i=1}^{2} \sum_{j=1}^{n_i} (y_{ij} - \eta_i)^2 \right\} \propto \exp\left\{ \sum_{i=1}^{2} n w_i \left( \eta_i \bar{y}_i - \frac{1}{2} \eta_i^2(x_i, \theta) \right) \right\}.$$
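To make the setup concrete, here is a minimal simulation sketch of this model (assumed values $\theta = 1$, $x_1 = 0.5$, $x_2 = 1$, $n_i = 50$, not from the talk), evaluating the log-likelihood up to an additive constant:

```python
# A minimal sketch (assumed values, not from the talk): simulate two-stage
# responses under eta(x, theta) = exp(-theta * x) with N(0, 1) errors and
# evaluate the log-likelihood, dropping additive constants.
import numpy as np

rng = np.random.default_rng(0)

def eta(x, theta):
    return np.exp(-theta * x)

def simulate_stage(x, n, theta_true):
    return eta(x, theta_true) + rng.standard_normal(n)

def log_likelihood(theta, stages):
    # stages: list of (x_i, y_i); equals -(1/2) sum_ij (y_ij - eta_i)^2
    return sum(-0.5 * np.sum((y - eta(x, theta)) ** 2) for x, y in stages)

y1 = simulate_stage(x=0.5, n=50, theta_true=1.0)
y2 = simulate_stage(x=1.0, n=50, theta_true=1.0)
print(log_likelihood(1.0, [(0.5, y1), (1.0, y2)]))
```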

A Regression Model

The probability that the estimate falls on the boundary goes to zero as $n \to \infty$, so I refer just to the interior for clarity of exposition.

Notation and Basic Elements: $j$th subject in $i$th stage

single unit score function:
$$s_{ij} = s_{ij}(y_{ij} \mid x_i, \theta) = \frac{d}{d\theta} \ln f(\theta, y_{ij} \mid x_i) = (y_{ij} - \eta_i)\frac{d\eta_i}{d\theta} = -(y_{ij} - \eta_i)\, x_i \eta_i$$

within-stage scores $S_i = \sum_{j=1}^{n_i} s_{ij}$; total score
$$S = \sum_{i=1}^{2} S_i = \sum_{i=1}^{2} n_i (\bar{y}_i - \eta_i) \frac{d\eta_i}{d\theta}$$

expected unit information:
$$\mu_i = \mu(x_i, \theta) = \mathrm{Var}_{y_{ij} \mid x_i}[s_{ij}] = -E_{y_{ij} \mid x_i}\left[\frac{d}{d\theta} s_{ij}\right] = E_{y_{ij} \mid x_i}\left[\left(\frac{d\eta_i}{d\theta}\right)^2 - (y_{ij} - \eta_i)\frac{d^2\eta_i}{d\theta^2} \,\middle|\, x_i\right] = \left(\frac{d\eta_i}{d\theta}\right)^2 = x_i^2 \eta_i^2$$

per unit expected information:
$$M(\xi, \theta) = \frac{1}{n}\mathrm{Var}[S] = \sum_{i=1}^{2} w_i \mu_i = \sum_{i=1}^{2} w_i x_i^2 \eta_i^2.$$
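A short sketch of these information formulas; note that $\mu(1, 1) = e^{-2} \approx 0.135$, the value that reappears in the remarks at the end:

```python
# Sketch of the information quantities: mu(x, theta) = x^2 * eta(x, theta)^2
# and M(xi, theta) = sum_i w_i * mu(x_i, theta) for a two-point design.
import numpy as np

def mu(x, theta):
    return x**2 * np.exp(-2 * theta * x)   # (d eta/d theta)^2 = x^2 eta^2

def M(weights, xs, theta):
    return sum(w * mu(x, theta) for w, x in zip(weights, xs))

print(mu(1.0, 1.0))                    # exp(-2) ~ 0.135: max over x at theta = 1
print(M([0.5, 0.5], [0.5, 1.0], 1.0))  # per-unit information of the design
```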

MLE approximation

1. $\ln L_n$ is twice differentiable in the neighborhood of the true parameter $\theta_t$, so a Taylor expansion of $\ln L_n$ yields
$$\ln L_n = \ln L_n\big|_{\theta=\theta_t} + (\theta - \theta_t)\, S\big|_{\theta=\theta_t} + \frac{1}{2}(\theta - \theta_t)^2 \left(\frac{dS}{d\theta}\right)_{\theta=\tilde\theta},$$
where $\tilde\theta \in (\theta_t, \hat\theta_n)$.

2. $\max_\theta \{\ln L_n\}$ occurs where $S + (\theta - \theta_t)\frac{dS}{d\theta} = 0$.

3. Taking the derivative of $\ln L_n$ and rearranging terms, for $\theta = \hat\theta_n$ in the neighborhood of $\theta_t$,
$$\sqrt{n}\left(\hat\theta_n - \theta_t\right) \approx -\left(\frac{1}{n}\frac{dS}{d\theta}\bigg|_{\theta=\tilde\theta}\right)^{-1} \frac{1}{\sqrt{n}}\, S.$$
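The following sketch mirrors this expansion by solving the score equation $S(\theta) = 0$ with Newton's method for one simulated two-point sample (illustrative values only; a grid search is safer if the sample is badly behaved):

```python
# Sketch: Newton's method on the score equation S(theta) = 0 for one
# simulated two-point sample (the likelihood depends on the data only
# through the stage means, so only ybar_i is simulated).
import numpy as np

rng = np.random.default_rng(1)
theta_t = 1.0
stages = [(x, n_i, np.exp(-theta_t * x) + rng.standard_normal() / np.sqrt(n_i))
          for x, n_i in ((0.5, 50), (1.0, 50))]     # (x_i, n_i, ybar_i)

def score_and_slope(theta):
    s = ds = 0.0
    for x, n_i, ybar in stages:
        e = np.exp(-theta * x)
        s += n_i * (ybar - e) * (-x * e)                     # S
        ds += n_i * (-(x * e) ** 2 + (ybar - e) * x**2 * e)  # dS/dtheta
    return s, ds

theta = 0.5            # starting value; assumes a well-behaved sample
for _ in range(25):
    s, ds = score_and_slope(theta)
    theta -= s / ds
print(theta)           # should land near theta_t = 1
```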

Asymptotic Normality of the MLE, Given $x_1$ and $x_2$

$$\frac{1}{\sqrt{n}}\, S = \sqrt{w_1}\, \frac{\sum_{j=1}^{n_1} s_{1j}}{\sqrt{n_1}} + \sqrt{w_2}\, \frac{\sum_{j=1}^{n_2} s_{2j}}{\sqrt{n_2}} \;\xrightarrow{D}\; N(0,\; w_1\mu_1 + w_2\mu_2),$$

and, by the LLN,
$$\frac{1}{n}\frac{dS}{d\theta} = w_1 \frac{\sum_{j=1}^{n_1} \frac{d}{d\theta} s_{1j}}{n_1} + w_2 \frac{\sum_{j=1}^{n_2} \frac{d}{d\theta} s_{2j}}{n_2} \;\xrightarrow{\text{a.s.}}\; -(w_1\mu_1 + w_2\mu_2).$$

By Slutsky's theorem,
$$-\left(\frac{1}{n}\frac{dS}{d\theta}\right)^{-1} \frac{1}{\sqrt{n}}\, S \;\xrightarrow{D}\; N\left(0,\; [w_1\mu_1 + w_2\mu_2]^{-1}\right).$$
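A Monte Carlo sketch of this limit for the fixed design $x = (0.5, 1.0)$ with $w_1 = w_2 = 0.5$: $n \cdot \mathrm{Var}(\hat\theta_n)$ should be close to $[w_1\mu_1 + w_2\mu_2]^{-1}$. The grid-based maximizer is a robustness choice for the sketch, not the talk's method.

```python
# Sketch: Monte Carlo check that n * Var(theta_hat) is close to
# [w1*mu1 + w2*mu2]^{-1} for the fixed design x = (0.5, 1.0), w_i = 0.5.
import numpy as np

rng = np.random.default_rng(2)
theta_t, xs, ns = 1.0, np.array([0.5, 1.0]), np.array([50, 50])
n = ns.sum()

def mle(ybars, grid=np.linspace(0.01, 5.0, 4000)):
    # robust grid maximizer of the log-likelihood in stage-mean form
    e = np.exp(-np.outer(grid, xs))
    ll = (ns * (e * ybars - e**2 / 2)).sum(axis=1)
    return grid[np.argmax(ll)]

est = [mle(np.exp(-theta_t * xs) + rng.standard_normal(2) / np.sqrt(ns))
       for _ in range(2000)]

mu = xs**2 * np.exp(-2 * theta_t * xs)
print(n * np.var(est), 1 / np.sum((ns / n) * mu))  # the two should be close
```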

Adaptively Selecting the Stage 2 Design Point

Observe $y_1$ at fixed $x_1$. The MLE from the stage 1 data is
$$\hat\theta_1 = -\ln \bar{y}_1 / x_1 \quad \text{if } 0 < \bar{y}_1 < 1; \text{ at the bounds otherwise.}$$
Then select the stage 2 design point as
$$x_2 = \arg\max_x \mathrm{Var}_{y_{2j} \mid x}[s_{2j}]\Big|_{\theta=\hat\theta_1} = \arg\max_x\, x^2 \exp\left(-2\hat\theta_1 x\right) = \min\left\{ 1/\hat\theta_1,\; b \right\} = \min\left\{ -x_1/\ln \bar{y}_1,\; b \right\}.$$
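In code, the stage-1 estimate and the adaptive stage-2 point look as follows; the handling of the boundary case is a plausible convention of mine, since the slides only say the estimate sits "at the bounds" there.

```python
# Sketch of the adaptive rule (the boundary convention x2 = b is assumed,
# not taken from the talk).
import numpy as np

rng = np.random.default_rng(3)
theta_t, x1, n1, b = 1.0, 0.5, 30, 100.0

ybar1 = np.exp(-theta_t * x1) + rng.standard_normal() / np.sqrt(n1)
if 0 < ybar1 < 1:
    theta1 = -np.log(ybar1) / x1     # stage-1 MLE on the interior
    x2 = min(1.0 / theta1, b)        # equivalently min{-x1/ln(ybar1), b}
else:
    x2 = b                           # boundary case (assumed convention)
print(ybar1, x2)
```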

The Adaptive Likelihood

Assuming responses given the treatment are independent of the past, i.e., $f(y_2 \mid x_2, x_1, y_1, \theta) = f(y_2 \mid x_2, \theta)$, the total likelihood after stage 2 is
$$L(x_1, x_2, y_1, y_2, \theta) = f(y_2 \mid x_2, \theta)\, f(x_2 \mid x_1, y_1, \theta)\, f(y_1 \mid x_1, \theta).$$
So long as $x_2$ is completely determined by $x_1$ and $y_1$, $f(x_2 \mid x_1, y_1, \theta)$ is a delta function; the design is ancillary. Note the density is no longer a member of the exponential family:
$$L(x_1, x_2, y_1, y_2, \theta) = f(y_2 \mid x_2, \theta)\, f(y_1 \mid x_1, \theta) \propto \exp\left\{ n w_1 \left( \eta_1 \bar{y}_1 - \frac{1}{2}\eta_1^2 \right) + n w_2 \left( \eta_2(\bar{y}_1, x_1)\, \bar{y}_2 - \frac{1}{2}\eta_2^2(\bar{y}_1, x_1) \right) \right\}.$$

Adaptive Expected Information

$$\mathrm{Var}[s_{ij}] = -E\left[\frac{d}{d\theta} s_{ij}\right], \qquad -E_{y_{ij} \mid x_i}\left[\frac{d}{d\theta} s_{ij}\right] = E_{y_{ij} \mid x_i}\left[\left(\frac{d\eta_i}{d\theta}\right)^2 - (y_{ij} - \eta_i)\frac{d^2\eta_i}{d\theta^2} \,\middle|\, x_i\right] = x_i^2 \eta_i^2 = \mu(x_i, \theta).$$

For the adaptive stage 2 point $x_2 = -x_1/\ln \bar{y}_1$,
$$\mu(x_2, \theta) = x_2^2 \exp(-2\theta x_2) = \left(\frac{x_1}{\ln \bar{y}_1}\right)^2 \exp\left\{ 2\theta \left( \frac{x_1}{\ln \bar{y}_1} \right) \right\},$$
so
$$-E\left[\frac{1}{n_i}\frac{d}{d\theta} S_i\right] = \begin{cases} \mu(x_1, \theta) & \text{if } i = 1, \\ E_{\bar{y}_1}\left[\mu\left(x_2(\bar{y}_1), \theta\right)\right] & \text{if } i = 2. \end{cases}$$
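Since the stage-2 information is an expectation over $\bar{y}_1$, it can be approximated by Monte Carlo; the sketch below (assumed values $x_1 = 0.5$, $n_1 = 30$, interior samples only) also prints the percentiles of $\mu_2$ that the later figures summarize.

```python
# Sketch: approximate E_{ybar1}[mu(x_2(ybar1), theta)] by Monte Carlo and
# compare with the locally optimal value exp(-2) ~ 0.135 at theta = 1.
import numpy as np

rng = np.random.default_rng(4)
theta, x1, n1, b = 1.0, 0.5, 30, 100.0

ybar1 = np.exp(-theta * x1) + rng.standard_normal(100_000) / np.sqrt(n1)
ok = (ybar1 > 0) & (ybar1 < 1)                # interior stage-1 samples only
x2 = np.minimum(-x1 / np.log(ybar1[ok]), b)
mu2 = x2**2 * np.exp(-2 * theta * x2)
print(mu2.mean(), np.exp(-2))                 # E[mu_2] vs. the optimal 0.135
print(np.percentile(mu2, [2.5, 50, 97.5]))    # cf. the percentile figures
```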

Second Stage Information

NOTE: $\mu(x_2, \theta)$ is a random function of $\bar{y}_1$! $\mu(x_2, \theta)$ converges to a constant only as $\bar{y}_1$ converges to a constant. Conditioning on $x_2$ is equivalent to conditioning on the stage 1 responses!

$$\sqrt{n}\left(\hat\theta_n - \theta_t\right) \approx -\left(\frac{1}{n}\frac{dS}{d\theta}\right)^{-1} \frac{1}{\sqrt{n}}\, S, \qquad f(S) = \int f(S \mid \bar{y}_1)\, f(\bar{y}_1)\, d\bar{y}_1.$$

Conditionally on $\bar{y}_1$ (which fixes the stage 1 score),
$$\frac{1}{\sqrt{n}}\, S \,\bigg|\, \bar{y}_1 = \left( \sqrt{w_1}\, \frac{\sum_{j=1}^{n_1} s_{1j}}{\sqrt{n_1}} + \sqrt{w_2}\, \frac{\sum_{j=1}^{n_2} s_{2j}}{\sqrt{n_2}} \right) \bigg|\, \bar{y}_1 \;\xrightarrow{D}\; \sqrt{w_1}\, \frac{\sum_{j=1}^{n_1} s_{1j}}{\sqrt{n_1}} + N(0,\; w_2\mu_2),$$
while
$$\frac{1}{n}\frac{dS}{d\theta} = w_1 \frac{\sum_{j=1}^{n_1} \frac{d}{d\theta} s_{1j}}{n_1} + w_2 \frac{\sum_{j=1}^{n_2} \frac{d}{d\theta} s_{2j}}{n_2} \;\xrightarrow{\text{a.s.}}\; -(w_1\mu_1 + w_2\mu_2).$$
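A small numeric sketch of the mixture structure: conditionally on $\bar{y}_1$, the stage-2 score contribution is Gaussian with variance $w_2\,\mu(x_2(\bar{y}_1), \theta)$, which moves with $\bar{y}_1$, so the unconditional law of $S$ is a normal mixture (assumed values $x_1 = 0.5$, $w_2 = 0.5$).

```python
# Sketch: the conditional variance w2 * mu(x_2(ybar_1), theta) of the
# normalized stage-2 score moves with ybar_1, making S a normal mixture.
import numpy as np

theta, x1, b, w2 = 1.0, 0.5, 100.0, 0.5
for ybar1 in (0.45, np.exp(-0.5), 0.75):      # below, at, above E[ybar_1]
    x2 = min(-x1 / np.log(ybar1), b)
    mu2 = x2**2 * np.exp(-2 * theta * x2)
    print(f"ybar1={ybar1:.3f}  x2={x2:.3f}  w2*mu2={w2 * mu2:.4f}")
```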

Illustration: $\theta = 1$, $x \in (0.01, 100)$, $x_1 = 0.5$

optimal: $x_2 = \arg\max_x \mathrm{Var}_{y_{2j} \mid x}[s_{2j}]\big|_{\theta=1} = 1.0$;
adaptive: $x_2 = \arg\max_x \mathrm{Var}_{y_{2j} \mid x}[s_{2j}]\big|_{\theta=\hat\theta_1}$.
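A simulation sketch of this comparison (assumed $n_1 = n_2 = 50$ and a grid-based MLE, restricted to interior stage-1 estimates as on the earlier slide; this is a reconstruction, not the talk's exact computation):

```python
# Reconstruction sketch of the illustration: compare n * Var(theta_hat)
# under the adaptive rule with the locally optimal fixed design x2 = 1.0,
# keeping only interior stage-1 samples as the slides do.
import numpy as np

rng = np.random.default_rng(5)
theta_t, x1, b, n1, n2 = 1.0, 0.5, 100.0, 50, 50
n = n1 + n2
grid = np.linspace(0.01, 5.0, 4000)

def mle(x2, ybar1, ybar2):
    # robust grid maximizer of the log-likelihood in stage-mean form
    e1, e2 = np.exp(-grid * x1), np.exp(-grid * x2)
    ll = n1 * (e1 * ybar1 - e1**2 / 2) + n2 * (e2 * ybar2 - e2**2 / 2)
    return grid[np.argmax(ll)]

for label, adaptive in (("optimal", False), ("adaptive", True)):
    est = []
    while len(est) < 2000:
        ybar1 = np.exp(-theta_t * x1) + rng.standard_normal() / np.sqrt(n1)
        if not (0 < ybar1 < 1):
            continue                          # interior cases only
        x2 = min(-x1 / np.log(ybar1), b) if adaptive else 1.0
        ybar2 = np.exp(-theta_t * x2) + rng.standard_normal() / np.sqrt(n2)
        est.append(mle(x2, ybar1, ybar2))
    print(label, n * np.var(est))
```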

[Figure] Asymptotic Fisher information for $x = 0.5$ and $x = 1.0$ alone; two-sample locally optimal, and median two-stage plug-in estimates for $n = 30, 100, 300$, versus $w_1$ at $x = 0.5$.

[Figure] Two-sample locally optimal Fisher information and percentiles of two-stage plug-in estimates for $n = 1000$, versus $w_1$ at $x = 0.5$.

[Figure] Stage 2: 2.5th, 50th, and 97.5th percentiles of $\mu_2$; $n_1 = n_2$ ($w_i = 0.5$). (a) $n_i = 30$; (b) $n_i = 100$.

Conclusions

- The locally optimal adaptive design is ancillary, but informative.
- The conditional incremental information after the first stage is a random variable depending on the stage one observations.
- The conditional incremental information does not achieve the Cramer-Rao bound.
- MLEs from the locally optimal adaptive design do not have the hoped-for optimality, and if the stage one design has a small sample size, their variance is random.

Thank you!

References

Yao, P. and Flournoy, N. (2010). Information in a two-stage adaptive optimal design for normal random variables having a one-parameter exponential mean function. In mODa 9 (eds. Giovagnoli, A., Atkinson, A. C., Torsney, B. and May, C.), pp. 229-236. Springer.

[Figure] Asymptotic inverse Fisher information for $x = 0.5$ and $x = 1.0$ alone; two-sample locally optimal, and two-stage $n\,\mathrm{MSE}(\hat\theta)$ for $n = 1000$, versus $w_1$ at $x = 0.5$.

[Figure] Asymptotic inverse Fisher information for $x = 0.5$ and $x = 1.0$ alone; two-sample locally optimal, and two-stage $n\,\mathrm{MSE}(\hat\theta)$ for $n = 30, 100, 1000$, versus $w_1$ at $x = 0.5$.

Remarks

$\max_{x_1}\{\mu_2\} = 0.135$, which is the asymptotic Fisher information. The 97.5th percentiles of $\mu_2$ attain 0.135 at all but the highest values of $x_1$ for $n = 100$ and $n = 30$. In contrast, the 97.5th percentile of $\frac{d}{d\theta} s_{2j}$ is greater than 0.135 except for values of $x_1$ somewhat less than one. Furthermore, $\frac{d}{d\theta} s_{2j}$ is negative with high probability.

Remarks

The median of $\mu_2$ attains its maximum value when $x_1 = 1$ for $n = 100$ and $n = 30$. The median of $\mu_2$ comes closer to 0.135 at $x_1 = 1$ as the sample size increases. Indeed, the median of $\mu_2$ is close to 0.135 for a range of values of $x_1$ that includes $x_1 = 1$; this range is larger for $n = 100$ than for $n = 30$. For $n = 30$, the 2.5th percentile of $\mu_2$ is zero, except for a very small blip for $x_1$ just less than one; however, for $n = 100$, the 2.5th percentile of $\mu_2$ is nearly quadratic for $x_1 \in (0.2, 1.8)$, with its maximum approximately 50% of 0.135.