COMPLETELY RANDOMIZED DESIGNS (CRD)


Slide 1

COMPLETELY RANDOMIZED DESIGNS (CRD)

For now, $t$ unstructured treatments (e.g. no factorial structure).

"Completely randomized" means no restrictions on the randomization of units to treatments (except for sample sizes, ordinarily fixed in advance), so with $N$ available units:

- Randomly select $n_1$ units for assignment to trt 1
- Randomly select $n_2$ remaining units for assignment to trt 2
- ...
- The last $n_t$ units are assigned to trt $t$

Or, any other procedure s.t. all $N!/(n_1!\,n_2! \cdots n_t!)$ possible assignment patterns have equal probability.

Ordinarily reflects the model and assumptions for 1-way ANOVA.
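
The randomization itself is easy to script; here is a minimal R sketch (the treatment count and sample sizes are hypothetical, not from the notes):

```r
# Completely randomized assignment: one random permutation of the unit labels,
# so every one of the N!/(n1! n2! ... nt!) assignment patterns is equally likely.
set.seed(1)                          # for reproducibility
n   <- c(4, 4, 4)                    # hypothetical n_i, fixed in advance
N   <- sum(n)
trt <- rep(seq_along(n), times = n)  # treatment labels 1, ..., t
assignment <- data.frame(unit = sample(1:N), trt = trt)
assignment[order(assignment$unit), ]
```

Here the units are permuted rather than the labels; the two conventions give the same distribution over assignment patterns.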

Slide 2

Cell means model: $y_{i,j} = \mu_i + \varepsilon_{i,j}$, $i = 1 \ldots t$, $j = 1 \ldots n_i$, $\sum_{i=1}^t n_i = N$

$\mu_i$ represents $E[y_{i,j}]$, an expected response corresponding to treatment $i$, in the laboratory in which the experiment was performed, on the day the experiment was run...

$E[\varepsilon_{i,j}] = 0$, $Var[\varepsilon_{i,j}] = \sigma^2$, independent, perhaps normal.

$\varepsilon$ represents (in part) unit-to-unit variation; independence is justified (in part) by complete randomization.

Matrix representation: $y = X\mu + \varepsilon$
$$[N \times 1] = [N \times t][t \times 1] + [N \times 1]$$

Slide 3

$$\begin{pmatrix} y_{1,1} \\ \vdots \\ y_{1,n_1} \\ y_{2,1} \\ \vdots \\ y_{2,n_2} \\ \vdots \\ y_{t,1} \\ \vdots \\ y_{t,n_t} \end{pmatrix}
=
\begin{pmatrix} 1_{n_1} & 0 & \cdots & 0 \\ 0 & 1_{n_2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1_{n_t} \end{pmatrix}
\begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_t \end{pmatrix}
+
\begin{pmatrix} \varepsilon_{1,1} \\ \vdots \\ \varepsilon_{1,n_1} \\ \varepsilon_{2,1} \\ \vdots \\ \varepsilon_{2,n_2} \\ \vdots \\ \varepsilon_{t,1} \\ \vdots \\ \varepsilon_{t,n_t} \end{pmatrix}$$

(Here $1_{n_i}$ denotes a column of $n_i$ ones.)

Slide 4

Effects model: $y_{i,j} = \alpha + \tau_i + \varepsilon_{i,j}$

This says the same thing mathematically... Let $\alpha$ = anything; then $\tau_i = \mu_i - \alpha$.

$\alpha$ now conceptually represents everything in common to all experimental runs; $\tau_i$ is the deviation from this due only to treatment $i$.

Matrix representation: $y = X\theta + \varepsilon$, $\theta = (\alpha, \tau_1, \tau_2, \ldots, \tau_t)'$
$$[N \times 1] = [N \times (t+1)][(t+1) \times 1] + [N \times 1]$$

$X$ is a different matrix now... will often use $X$ generically. Less-than-full-rank.

Slide 5

$$\begin{pmatrix} y_{1,1} \\ \vdots \\ y_{1,n_1} \\ y_{2,1} \\ \vdots \\ y_{2,n_2} \\ \vdots \\ y_{t,1} \\ \vdots \\ y_{t,n_t} \end{pmatrix}
=
\begin{pmatrix} 1_{n_1} & 1_{n_1} & 0 & \cdots & 0 \\ 1_{n_2} & 0 & 1_{n_2} & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 1_{n_t} & 0 & 0 & \cdots & 1_{n_t} \end{pmatrix}
\begin{pmatrix} \alpha \\ \tau_1 \\ \tau_2 \\ \vdots \\ \tau_t \end{pmatrix}
+
\begin{pmatrix} \varepsilon_{1,1} \\ \vdots \\ \varepsilon_{1,n_1} \\ \varepsilon_{2,1} \\ \vdots \\ \varepsilon_{2,n_2} \\ \vdots \\ \varepsilon_{t,1} \\ \vdots \\ \varepsilon_{t,n_t} \end{pmatrix}$$

Slide 6

Estimation: a review

$y = X\theta + \varepsilon$ ... (general $X$, iid errors). ($X\theta$ is the expectation of $y$... the "noiseless model".)

The MLE for normal errors, and the LS estimate in any case, is any solution to
$$X'X\hat\theta = X'y$$
the "normal equations", as in "norm", not the normal distribution.

$X$ of full column rank: $\hat\theta = (X'X)^{-1}X'y$

Otherwise: $\hat\theta = (X'X)^{-}X'y$, where $(X'X)^{-}$ is a generalized inverse, non-unique, defined by $AA^{-}A = A$.

Need to review statistical properties of $\hat\theta$.
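
As a numerical aside (not part of the notes), the normal equations can be solved in R even when $X'X$ is singular, e.g. with the Moore-Penrose g-inverse from MASS; the data here are simulated for illustration:

```r
library(MASS)                              # for ginv(), one generalized inverse
set.seed(2)
n   <- c(3, 4, 5); t <- length(n); N <- sum(n)
trt <- factor(rep(1:t, times = n))
X   <- cbind(1, model.matrix(~ trt - 1))   # effects-model X: t+1 columns, rank t
y   <- rnorm(N, mean = c(10, 12, 15)[trt]) # hypothetical responses
XtX <- crossprod(X)                        # X'X, singular here
theta.hat <- ginv(XtX) %*% crossprod(X, y) # one solution of X'X theta = X'y
drop(theta.hat)
all.equal(XtX %*% ginv(XtX) %*% XtX, XtX)  # defining property A A^- A = A
```

Different g-inverses give different solutions $\hat\theta$; the next slides show which functions of $\hat\theta$ do not depend on that choice.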

Slide 7

Expectations: Recall that for random $y$ with $E[y] = m$, $Var[y] = V$ (vector of means $m$, matrix of variances and covariances $V$):

$$E[Ay] = Am$$
$$Var[Ay] = AVA' \quad \text{(generalization of the "constant-squared" rule in the scalar case)}$$

So, $E[\hat\theta] = (X'X)^{-}X'(X\theta)$ ... generally
$E[\hat\theta] = \theta$ ... full rank

Slide 8

In the less-than-full-rank case, $\hat\theta$ is not uniquely determined, but consider estimable functions: $c'\theta$, where $c' = l'X$ for some $N$-vector $l$.

$$\widehat{c'\theta} = c'\hat\theta = l'X(X'X)^{-}X'y$$

$X(X'X)^{-}X' = H$ is invariant to the choice of g-inverse:
- projection matrix, into the space spanned by the columns of $X$
- symmetric, $N \times N$, idempotent
- $H$ for "hat" because $\hat y = X\hat\theta = Hy$

$$c'\hat\theta = l'X\hat\theta = l'Hy$$
$$E[c'\hat\theta] = l'HX\theta = l'X\theta = c'\theta$$

Likewise for $C = (c_1 \ldots c_m)'$: $E[C\hat\theta] = C\theta$ if $C = LX$.
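
A quick numerical illustration (simulated, not from the notes): the hat matrix comes out the same whether it is built from the cell-means $X$ or from the rank-deficient effects-model $X$, since the two have the same column space:

```r
library(MASS)
n   <- c(3, 4, 5); t <- length(n); N <- sum(n)
trt <- factor(rep(1:t, times = n))
Xc  <- model.matrix(~ trt - 1)       # cell-means X (full column rank)
Xe  <- cbind(1, Xc)                  # effects-model X (t+1 columns, rank t)
H   <- function(X) X %*% ginv(crossprod(X)) %*% t(X)
all.equal(H(Xc), H(Xe))              # same column space => same projection
all.equal(H(Xe) %*% H(Xe), H(Xe))    # idempotent, as claimed
```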

Slide 9

CRD examples:
- $\tau_i$ isn't estimable
- $\mu_i = \alpha + \tau_i$ is estimable, but is a function of "experiment" as well as treatment
- $\mu_i - \mu_j = \tau_i - \tau_j$ is estimable ... "comparative" (see $X$, slide 5)

Slide 10

Variances: Recall for estimable $C\theta$, $C\hat\theta = LX(X'X)^{-}X'y = LHy$:

$$Var[C\hat\theta] = LH(\sigma^2 I)H'L' = \sigma^2 LHL' = \sigma^2 LX(X'X)^{-}X'L' = \sigma^2 C(X'X)^{-}C' \quad \text{... generally}$$
$$Var[C\hat\theta] = \sigma^2 C(X'X)^{-1}C' \quad \text{... full rank}$$

In particular, $Var[\hat\theta] = \sigma^2(X'X)^{-1}$ ... full rank.

Slide 11

Optimal allocation: Interest in estimating $C\theta$ well, i.e. with minimum variance. Allocate $N$ units to treatments to minimize the average contrast variance:

$$\min_{n_i,\, i=1 \ldots t}\ \mathrm{ave}_j\ \sigma^2 c_j'(X'X)^{-1}c_j, \qquad \text{subject to } \sum_{i=1}^t n_i = N$$

(or a g-inverse if $X$ is less than full rank and $C\theta$ is estimable; $c_j'$ is the $j$th row of $C$)

$\sigma^2$ is a positive constant, so it doesn't matter:

$$\min_{n_i}\ \mathrm{trace}\ C(X'X)^{-1}C' \;=\; \min_{n_i}\ \mathrm{trace}\ C'C(X'X)^{-1}, \qquad \text{subject to } \sum_{i=1}^t n_i = N$$

"Linear optimality", a generalization of A-optimality. If $X$ is not of full rank but $C\theta$ is estimable, the same argument applies to $(X'X)^{-}$.

Slide 12

Example: Full-rank cell means model, $\theta = \mu$:
$$(X'X)^{-1} = \mathrm{diag}(1/n_1, 1/n_2, \ldots, 1/n_t)$$

Interest in estimating differences between the control (treatment 1) and each of the other treatments:

$$C = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ 1 & 0 & -1 & \cdots & 0 \\ \vdots & & & \ddots & \\ 1 & 0 & 0 & \cdots & -1 \end{pmatrix} \qquad [(t-1) \times t]$$

Slide 13

$$C'C = \begin{pmatrix} t-1 & -1 & \cdots & -1 \\ -1 & 1 & & 0 \\ \vdots & & \ddots & \\ -1 & 0 & & 1 \end{pmatrix} \qquad (\text{dimension } t \times t)$$

$$\mathrm{trace}\ C'C(X'X)^{-1} = (t-1)/n_1 + \sum_{i=2}^t 1/n_i$$

We want to minimize, w.r.t. $n_1, n_2, \ldots, n_t$:
$$(t-1)/n_1 + \sum_{i=2}^t 1/n_i, \qquad \text{subject to } \sum_{i=1}^t n_i = N$$

Slide 14

Constrained minimization by the method of Lagrangian multipliers:

$$\varphi = \left[(t-1)/n_1 + \sum_{i=2}^t 1/n_i\right] + \lambda\left(\sum_{i=1}^t n_i - N\right)$$

(The first term is what you want to minimize; the second is what is constrained to be zero.)

Set each partial derivative to zero and solve:
$$\partial\varphi/\partial n_1 = -(t-1)/n_1^2 + \lambda, \qquad \partial\varphi/\partial n_i = -1/n_i^2 + \lambda,\ i = 2 \ldots t, \qquad \partial\varphi/\partial\lambda = \sum_{i=1}^t n_i - N$$

$$n_1 = \sqrt{(t-1)/\lambda}, \qquad n_i = \sqrt{1/\lambda},\ i = 2 \ldots t$$

$$n_1 = \frac{\sqrt{t-1}\,N}{\sqrt{t-1} + (t-1)}, \qquad n_i = \frac{N}{\sqrt{t-1} + (t-1)}$$
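
For concreteness, a tiny R helper (hypothetical, with rounding left to the reader; see the next slide) evaluating this continuous solution:

```r
# Continuous optimal allocation for comparing treatment 1 (control)
# with each of the other t-1 treatments.
crd.alloc <- function(N, t) {
  denom <- sqrt(t - 1) + (t - 1)
  c(n1 = sqrt(t - 1) * N / denom, ni.each = N / denom)
}
crd.alloc(N = 60, t = 5)   # n1 = 20, each other n_i = 10
```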

Slide 15

Note: This is really a continuous optimization technique applied to a discrete problem. In practice, we need to round the solutions to integer values, or use them as a starting point in a discrete search.

Note also that $Cov[\hat\mu_1 - \hat\mu_i,\ \hat\mu_1 - \hat\mu_j] = \sigma^2/n_1$; a relatively larger $n_1$ reduces these...

Postscript: intuition might say you need less info about the control, since you have more experience with it... but this means you intend to USE that information... e.g. via a Bayesian prior.

Slide 16

Example (less-than-full-rank version): Effects model, $\theta = (\alpha, \tau_1, \ldots, \tau_t)'$:

$$X'X = \begin{pmatrix} N & n_1 & n_2 & \cdots & n_t \\ n_1 & n_1 & 0 & \cdots & 0 \\ n_2 & 0 & n_2 & \cdots & 0 \\ \vdots & & & \ddots & \\ n_t & 0 & 0 & \cdots & n_t \end{pmatrix}
\qquad
(X'X)^{-} = \begin{pmatrix} 0 & 0 & 0 & \cdots & 0 \\ 0 & 1/n_1 & 0 & \cdots & 0 \\ 0 & 0 & 1/n_2 & \cdots & 0 \\ \vdots & & & \ddots & \\ 0 & 0 & 0 & \cdots & 1/n_t \end{pmatrix}$$

Slide 17

A general algorithm for a g-inverse of a symmetric matrix (e.g. Searle, Linear Models, Wiley):
- find a full-rank principal minor of maximum dimension
- invert it
- fill in zeros everywhere else

Here $C$ becomes $(0 \mid C)$, the earlier $C$ with a leading column of zeros, and this $(X'X)^{-}$ is the earlier $(X'X)^{-1}$ with a leading row/column of zeros... so the problem and solution are the same.
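
A sketch of that algorithm applied to the CRD $X'X$ above (assuming $\mathrm{diag}(n)$ is the chosen full-rank principal minor):

```r
n <- c(3, 4, 5); t <- length(n); N <- sum(n)
XtX <- rbind(c(N, n), cbind(n, diag(n)))   # (t+1) x (t+1) effects-model X'X, rank t
G <- matrix(0, t + 1, t + 1)
G[-1, -1] <- diag(1 / n)                   # invert the minor, zeros elsewhere
all.equal(XtX %*% G %*% XtX, XtX)          # A A^- A = A, so G is a g-inverse
```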

Slide 18

Hypothesis testing: a review

Alternative (full) and null (restricted) hypotheses:
$$\mathrm{Hyp}_A:\ y = X_A\theta_A + \varepsilon$$
$$\mathrm{Hyp}_0:\ y = X_0\theta_0 + \varepsilon = X_AP\theta_0 + \varepsilon$$

Columns of $X_0$ are all linear combinations of columns of $X_A$; $X_A$ is $N \times p_A$, $P$ is $p_A \times p_0$, $p_0 < p_A$.

Slide 19

Test for treatment effects, cell means model: $X_A$ is the $N \times t$ matrix of treatment indicator columns (slide 3), $P = (1, 1, \ldots, 1)'$, and $X_0 = X_AP = 1_N$, a single column of ones.

Test for treatment effects, effects model: $X_A = (1_N \mid \text{indicator columns})$ (slide 5), $P = (1, 0, \ldots, 0)'$ or $(0, 1, \ldots, 1)'$, and again $X_0 = X_AP = 1_N$.

Slide 20

Test for "additional terms": partitioned model
$$\mathrm{Hyp}_0:\ y = X_1\theta_1 + \varepsilon$$
$$\mathrm{Hyp}_A:\ y = X_1\theta_1 + X_2\theta_2 + \varepsilon$$

This is a special case of the general form:
$$X_A = (X_1, X_2), \qquad P = \begin{pmatrix} I \\ 0 \end{pmatrix}, \qquad X_0 = X_AP = X_1$$

But $X_0$ doesn't have to be a subset of columns from $X_A$.

Slide 21

For either model:
$$\mathrm{SSE} = \sum_i \sum_j (y_{i,j} - \hat y_{i,j})^2 \quad \text{... sometimes "resid"}$$
$$= \sum_i \sum_j (y_{i,j} - x_i'\hat\theta)^2, \quad \text{where } x_i' \text{ is a row from } X$$
$$= \sum_i \sum_j (y_{i,j} - x_i'(X'X)^{-}X'y)^2 = \text{squared length of } y - X(X'X)^{-}X'y$$
$$= y'(I-H)'(I-H)y = y'(I - H - H' + H^2)y = y'(I-H)y$$
(key result for what's coming up)
$$= y'y - (Hy)'(Hy) = \text{length}^2 \text{ of } y\ -\ \text{length}^2 \text{ of } Hy$$

Slide 22

Given truth of $\mathrm{Hyp}_A$ (which means $\mathrm{Hyp}_0$ MAY also be true), the validity of $\mathrm{Hyp}_0$ is judged by how much worse the data fit the corresponding model:

$$\mathrm{SST} = \mathrm{SSE}(\mathrm{Hyp}_0) - \mathrm{SSE}(\mathrm{Hyp}_A) = y'(I-H_0)y - y'(I-H_A)y = y'H_Ay - y'H_0y = y'(H_A - H_0)y$$

Notation: Here and later, the subscript on $H$ is the subscript on the corresponding $X$. Here it is 0 or A; later also 1 or 2.

Slide 23

Example: Full-rank (cell means) model

$\mathrm{Hyp}_0:\ \mu_1 = \mu_2 = \ldots = \mu_t$, or $y_{i,j} = \mu + \varepsilon_{i,j}$
$\mathrm{Hyp}_A:$ at least two differ, or $y_{i,j} = \mu_i + \varepsilon_{i,j}$

$$X_0 = \begin{pmatrix} 1_{n_1} \\ 1_{n_2} \\ \vdots \\ 1_{n_t} \end{pmatrix} = 1_N, \qquad H_0 = 1(1'1)^{-1}1' = \tfrac{1}{N}J_{N \times N}$$

$$X_A = \begin{pmatrix} 1_{n_1} & \cdots & 0 \\ & \ddots & \\ 0 & \cdots & 1_{n_t} \end{pmatrix}, \qquad H_A = \begin{pmatrix} \tfrac{1}{n_1}J_{n_1 \times n_1} & \cdots & 0 \\ & \ddots & \\ 0 & \cdots & \tfrac{1}{n_t}J_{n_t \times n_t} \end{pmatrix}$$

Slide 24

Letting $y_i$ represent the $n_i$-element vector of observations associated with treatment $i$:

$$\mathrm{SST} = y'H_Ay - y'H_0y = \sum_i y_i'(\tfrac{1}{n_i}J)y_i - y'(\tfrac{1}{N}J)y = \sum_i \tfrac{1}{n_i}y_{i.}^2 - \tfrac{1}{N}y_{..}^2 = \sum_i n_i(\bar y_{i.})^2 - N(\bar y_{..})^2 = \sum_i n_i(\bar y_{i.} - \bar y_{..})^2$$

Example (cont.): Reduced-rank (effects) model, same problem. Due to the invariance of the hat matrices, $H_0$ and $H_A$ are the same as before, so the resultant sums of squares are also the same.
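
These identities are easy to check numerically; a sketch with simulated data (all values hypothetical):

```r
library(MASS)
set.seed(3)
n   <- c(8, 8, 4); t <- length(n); N <- sum(n)
trt <- factor(rep(1:t, times = n))
y   <- rnorm(N, mean = c(1, 2, 3)[trt], sd = 1.5)
XA  <- model.matrix(~ trt - 1); X0 <- matrix(1, N, 1)
H   <- function(X) X %*% ginv(crossprod(X)) %*% t(X)
SST.quad  <- drop(t(y) %*% (H(XA) - H(X0)) %*% y)   # y'(HA - H0)y
ybar.i    <- tapply(y, trt, mean)
SST.anova <- sum(n * (ybar.i - mean(y))^2)          # sum n_i (ybar_i. - ybar..)^2
all.equal(SST.quad, SST.anova)
```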

Slide 25

Expectations: Recall that for random $y$ with $E[y] = m$, $Var[y] = V$:
$$E[y'Ay] = m'Am + \mathrm{trace}(AV)$$

Suppose $\mathrm{Hyp}_A$ really is true (perhaps also $\mathrm{Hyp}_0$), so $E[y] = X_A\theta_A$:

$$E[\mathrm{SSE}(\mathrm{Hyp}_A)] = \theta_A'X_A'(I-H_A)X_A\theta_A + \mathrm{trace}((I-H_A)\sigma^2 I)$$
$$= \theta_A'(X_A'X_A - X_A'X_A)\theta_A + \sigma^2(\mathrm{trace}(I) - \mathrm{trace}(H_A)) = 0 + \sigma^2(N - \mathrm{rank}(X_A))$$

(The first term vanishes because $H_AX_A = X_A$; the second simplifies because, for idempotent matrices, rank = trace, and $\mathrm{rank}(H) = \mathrm{rank}(X)$.)

Slide 26

$$E[\mathrm{SSE}(\mathrm{Hyp}_0)] = \theta_A'X_A'(I-H_0)X_A\theta_A + \mathrm{trace}((I-H_0)\sigma^2 I)$$
$$= \theta_A'X_A'(I-H_0)X_A\theta_A + \sigma^2(N - \mathrm{rank}(X_0)) = Q_{A-0}(\theta_A) + \sigma^2(N - \mathrm{rank}(X_0))$$
(subscript $A-0$ comes from $\mathrm{space}(X_A)$ "minus" $\mathrm{space}(X_0)$)

For the partitioned model:
$$= (\theta_1'X_1' + \theta_2'X_2')(I-H_1)(X_1\theta_1 + X_2\theta_2) + \sigma^2(N - \mathrm{rank}(X_1))$$
$$= \theta_2'X_2'(I-H_1)X_2\theta_2 + \sigma^2(N - \mathrm{rank}(X_1)) \qquad (X_1 \text{ terms vanish... why?})$$
$$= Q_{2-1}(\theta_2) + \sigma^2(N - \mathrm{rank}(X_1))$$
(subscript $2-1$ comes from $\mathrm{space}(X_2)$ "minus" $\mathrm{space}(X_1)$)

$$E[\mathrm{SST}] = Q_{2-1}(\theta_2) + (\mathrm{rank}(X_A) - \mathrm{rank}(X_0))\sigma^2$$

Slide 27

Example: all-treatments-equal hypothesis. In either parameterization:
$$\mathrm{rank}(X_A) - \mathrm{rank}(X_0) = t - 1, \qquad H_0 = 1(1'1)^{-1}1' = \tfrac{1}{N}J$$

Full rank:
$$X_A'(I-H_0)X_A = X_A'X_A - \tfrac{1}{N}X_A'JX_A = \mathrm{diag}(n) - \tfrac{1}{N}nn', \quad \text{where } n = (n_1, n_2, \ldots, n_t)'$$
$$Q_{A-0}(\mu) = \sum_{i=1}^t n_i\mu_i^2 - \tfrac{1}{N}\left(\sum n_i\mu_i\right)^2 = \sum_{i=1}^t n_i(\mu_i - \bar\mu)^2, \qquad \bar\mu = \tfrac{1}{N}\sum_{i=1}^t n_i\mu_i$$

Reduced rank: $X_2'(I-H_1)X_2 = X_A'(I-H_0)X_A$ in the full-rank case, and
$$Q_{2-1}(\tau) = \sum_{i=1}^t n_i(\tau_i - \bar\tau)^2, \qquad \bar\tau = \tfrac{1}{N}\sum n_i\tau_i$$

Slide 28

Distribution theory: a review

1. $y \sim MVN(m, \sigma^2 I)$, $A$ and $B$ p.s.d. symmetric. Any two of:
- $A$ and $B$ idempotent
- $A + B$ idempotent
- $AB = 0$

implies all of:
- $\sigma^{-2}y'Ay \sim \chi^2(\mathrm{rank}(A),\ \sigma^{-2}m'Am)$
- $\sigma^{-2}y'By \sim \chi^2(\mathrm{rank}(B),\ \sigma^{-2}m'Bm)$
- the two quadratic forms are independent

2. For idempotent $A$, $\mathrm{rank}(A) = \mathrm{trace}(A)$.

Slide 29

Implications for the CRD:
$$\sigma^{-2}\,\mathrm{SSE}(\mathrm{Hyp}_A) = \sigma^{-2}y'(I-H_A)y \sim \chi^2(N-t,\ 0) \quad \text{i.e. central}$$
$$\sigma^{-2}\,\mathrm{SST} = \sigma^{-2}y'(H_A-H_0)y \sim \chi^2(t-1,\ Q(\tau)/\sigma^2)$$
$$\frac{\mathrm{SST}/(t-1)}{\mathrm{SSE}(\mathrm{Hyp}_A)/(N-t)} \sim F(t-1,\ N-t,\ Q(\tau)/\sigma^2)$$

Or, replace $\tau$ with $\mu$ in $Q$... same thing. Other things being equal, larger $Q$ means greater power.

Slide 30

Example: CRD with $N = 20$, $t = 3$, $n_1 = n_2 = 8$, $n_3 = 4$; numerator df $= 2$, denominator df $= 17$.

Suppose $\mu_1 = 1$, $\mu_2 = 2$, $\mu_3 = 3$, $\sigma = 1.5$. Then:
$$\bar\mu = 1.8$$
$$Q(\mu) = 8(1-1.8)^2 + 8(2-1.8)^2 + 4(3-1.8)^2 = 11.2$$
$$Q(\mu)/\sigma^2 = 11.2/2.25 \approx 4.98$$
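
The same arithmetic as an R check (numbers from the slide):

```r
n  <- c(8, 8, 4); N <- sum(n)
mu <- c(1, 2, 3); sigma <- 1.5
mubar <- sum(n * mu) / N            # 1.8
Q     <- sum(n * (mu - mubar)^2)    # 11.2
Q / sigma^2                         # noncentrality, about 4.98
```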

Slide 31

R:

```r
qf(.95, 2, 17)               # 95th quantile of central F, i.e. the critical value
1 - pf(3.5915, 2, 17, 4.98)  # probability of a value in the critical region
                             # under the non-central F, i.e. the power
```

SAS:

```
proc iml;
  crit  = finv(.95, 2, 17);
  power = 1 - probf(crit, 2, 17, 4.98);
  print crit power;
```

Slide 32

A practical point: We may not know enough about the problem to speculate about the (complete) true value of $\mu$, but may be willing to guess bounds for $Q$ (assume equal $n_i$). Let
$$D = \mu_{max} - \mu_{min} = \tau_{max} - \tau_{min}$$

Most favorable situation ($Q$ greatest), i.e. greatest variance among the $\mu$'s: $t/2$ of the $\mu$'s $= \mu_{max}$ and $t/2$ of the $\mu$'s $= \mu_{min}$, giving
$$Q = n_i\,t\,(\tfrac{1}{2}D)^2 = ND^2/4$$

Least favorable situation ($Q$ smallest), i.e. smallest variance among the $\mu$'s: one $\mu = \mu_{max}$, one $\mu = \mu_{min}$, the others $= (\mu_{max} + \mu_{min})/2$, giving
$$Q = 2\,n_i\,(\tfrac{1}{2}D)^2 = n_iD^2/2$$
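
A small helper (hypothetical, assuming equal $n_i$ and an even $t$ for the most-favorable split) for these two bounds:

```r
# Bounds on the noncentrality numerator Q for a given range D = mu_max - mu_min
Q.bounds <- function(D, ni, t) {
  c(most.favorable  = ni * t * (D / 2)^2,  # t/2 mu's at each extreme
    least.favorable = 2 * ni * (D / 2)^2)  # one mu at each extreme, rest centered
}
Q.bounds(D = 2, ni = 10, t = 4)            # 40 and 20
```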

Slide 33

Reduced normal equations for $\tau$:
$$y = X\theta + \varepsilon = X_1\beta + X_2\tau + \varepsilon$$

$\beta$ are "nuisance parameters"; for our CRD effects model, $X_1\beta = 1\alpha$.

Full normal equations: $(X'X)\hat\theta = X'y$, i.e.
$$\begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix}\begin{pmatrix} \hat\beta \\ \hat\tau \end{pmatrix} = \begin{pmatrix} X_1'y \\ X_2'y \end{pmatrix}$$
$$X_1'X_1\hat\beta + X_1'X_2\hat\tau = X_1'y$$
$$X_2'X_1\hat\beta + X_2'X_2\hat\tau = X_2'y$$

Slide 34

$$X_1'X_1\hat\beta + X_1'X_2\hat\tau = X_1'y$$
$$X_2'X_1\hat\beta + X_2'X_2\hat\tau = X_2'y$$

Pre-multiply the first equation by $X_2'X_1(X_1'X_1)^{-1}$, to match the leading first term of the second equation (will need a g-inverse if $X_1$ is of less than full rank):
$$X_2'X_1\hat\beta + X_2'H_1X_2\hat\tau = X_2'H_1y$$

Subtract this from the second equation to get the "reduced normal equations":
$$X_2'(I-H_1)X_2\hat\tau = X_2'(I-H_1)y$$
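
A numerical sketch (simulated data, not from the notes) that the reduced normal equations reproduce the treatment contrasts obtained from the group means:

```r
library(MASS)
set.seed(4)
n   <- c(3, 4, 5); t <- length(n); N <- sum(n)
trt <- factor(rep(1:t, times = n))
y   <- rnorm(N, mean = c(0, 1, 2)[trt])
X1  <- matrix(1, N, 1)                    # column for alpha
X2  <- model.matrix(~ trt - 1)            # treatment indicators
M   <- diag(N) - X1 %*% ginv(crossprod(X1)) %*% t(X1)    # I - H1
tau.hat <- ginv(t(X2) %*% M %*% X2) %*% t(X2) %*% M %*% y
# tau.hat itself is not unique, but contrasts of it are:
diff(drop(tau.hat))           # tau_2 - tau_1, tau_3 - tau_2
diff(tapply(y, trt, mean))    # the same contrasts, from the cell means
```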

Slide 35

Pattern of the reduced normal equations:
- The same form is found in weighted regression... there $I-H_1$ is replaced with a weight matrix, any matrix proportional to the inverse of $Var(y)$.
- The $X_2$-model is "projected" into the null space of $X_1$.
- Same as the regression of $\mathrm{resid}(y : X_1)$ on $\mathrm{resid}(X_2 : X_1)$.
- Note the reliance of the form on $X_2'$ and $H_1X_2$, or $(I-H_1)X_2$...

Slide 36

In fact, suppose we transform the independent variables in this problem, pretending the model isn't really partitioned, and that the single model matrix is
$$X_2^* = (I-H_1)X_2$$
Then the usual normal equations are
$$(X_2^*)'X_2^*\hat\tau = (X_2^*)'y$$
i.e. the correct result... we've "corrected for" $\beta$ by projecting $X_2$ out of the space spanned by $X_1$.

So, for example, $c'\tau$ is estimable iff $c' = l'X_2^*$, and
$$Var(\widehat{c'\tau}) = \sigma^2 c'((X_2^*)'X_2^*)^{-}c$$

Slide 37

$(X_2^*)'X_2^*$ is called the design information matrix in the book, and is sometimes denoted $I_{2-1}$. For CRDs:
$$H_1 = 1(1'1)^{-1}1' = \tfrac{1}{N}J_{N \times N}$$
$$X_2^* = (I-H_1)X_2 = \begin{pmatrix} (1-\tfrac{n_1}{N})1_{n_1} & -\tfrac{n_2}{N}1_{n_1} & \cdots & -\tfrac{n_t}{N}1_{n_1} \\ -\tfrac{n_1}{N}1_{n_2} & (1-\tfrac{n_2}{N})1_{n_2} & \cdots & -\tfrac{n_t}{N}1_{n_2} \\ \vdots & & \ddots & \\ -\tfrac{n_1}{N}1_{n_t} & -\tfrac{n_2}{N}1_{n_t} & \cdots & (1-\tfrac{n_t}{N})1_{n_t} \end{pmatrix}$$
$$I_{2-1} = (X_2^*)'X_2^* = \mathrm{diag}(n) - \tfrac{1}{N}nn', \qquad n = (n_1, n_2, \ldots, n_t)'$$

Note that $X_2^*$ and $I_{2-1}$ are each of rank $t-1$, reflecting the fact that $\tau$ isn't estimable.

Slide 38

So, for CRDs, given interest in $\tau$ and "accounting for" $\alpha$:
- the precision of $\widehat{c'\tau}$ is determined by $\sigma^2 c'I_{2-1}^{-}c$
- one g-inverse of $I_{2-1} = \mathrm{diag}(n) - \tfrac{1}{N}nn'$ is $I_{2-1}^{-} = \mathrm{diag}^{-1}(n)$... check it
- the power of the test for equality of the $\tau$'s is determined by $\tau'I_{2-1}\tau/\sigma^2$
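
"Check it" takes a few lines in R (the $n$ here is hypothetical):

```r
n   <- c(3, 4, 5); N <- sum(n); t <- length(n)
I21 <- diag(n) - outer(n, n) / N       # design information matrix
G   <- diag(1 / n)                     # candidate g-inverse
all.equal(I21 %*% G %*% I21, I21)      # TRUE: A A^- A = A
qr(I21)$rank                           # t - 1, as stated
```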

Slide 39

A brief note on nonlinear models: How do nonlinear models change things? Consider a very simple example: $t = 2$, and for the two treatments respectively:
$$y = \exp(\alpha + \tau_1) + \varepsilon, \qquad y = \exp(\alpha + \tau_2) + \varepsilon$$
i.e. the effects model we've used, but wrapping the fixed part of the model in $\exp()$.

As in our optimal allocation discussion, consider:
- designs for which the contrast $\tau_1 - \tau_2$ is to be estimated as precisely as possible, and
- constrained designs for which $n_1 + n_2 = N$ (given)

The linear algebra framework we've been using isn't entirely adequate, so we need to talk about variances in more fundamental terms:

Slide 40

For normal i.i.d. $\varepsilon$, the likelihood function for this problem is
$$L = \prod_{i=1}^{n_1} c\,\sigma^{-1}\exp[-(y_i - \exp(\alpha+\tau_1))^2/2\sigma^2]\ \prod_{j=1}^{n_2} c\,\sigma^{-1}\exp[-(y_j - \exp(\alpha+\tau_2))^2/2\sigma^2]$$

Given the necessary regularity conditions, the Fisher information matrix for efficient inference concerning $\theta = (\alpha, \tau_1, \tau_2)$ is
$$-E\left[\frac{\partial^2}{\partial\theta^2}\ln L\right] = \sigma^{-2}\begin{pmatrix} n_1\exp^2(\alpha+\tau_1) + n_2\exp^2(\alpha+\tau_2) & n_1\exp^2(\alpha+\tau_1) & n_2\exp^2(\alpha+\tau_2) \\ n_1\exp^2(\alpha+\tau_1) & n_1\exp^2(\alpha+\tau_1) & 0 \\ n_2\exp^2(\alpha+\tau_2) & 0 & n_2\exp^2(\alpha+\tau_2) \end{pmatrix}$$

For estimable functions, asymptotic variances can be computed from a g-inverse of this, e.g.
$$V = \sigma^2\begin{pmatrix} 0 & 0 & 0 \\ 0 & \frac{1}{n_1\exp^2(\alpha+\tau_1)} & 0 \\ 0 & 0 & \frac{1}{n_2\exp^2(\alpha+\tau_2)} \end{pmatrix}$$

Slide 41

As we would do in the linear models setting, we can compute a variance for the MLE of the estimable contrast $\tau_1 - \tau_2$ as $c'Vc$, where $c = (0, +1, -1)'$.

Using Lagrangian multipliers, you can show that the optimal allocation for any $N$ is
$$\frac{n_1}{n_2} = \exp^{2/3}(\tau_2 - \tau_1)$$

The trouble is, this is a function of $\theta$... i.e., the optimal design (or the performance of any design) depends on the unknown parameters you set out to investigate!

The beauty of linear models, from the design perspective, is that the parameter values don't matter... the parameters disappear as a result of how differentiation treats the linear form in Fisher's information.

How to do optimal allocation in a nonlinear setting?
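
A short grid search (a sketch; all parameter values hypothetical) makes the dependence concrete: the allocation minimizing the asymptotic variance of $\hat\tau_1 - \hat\tau_2$ moves as the unknown $\tau$'s move.

```r
# c'Vc for c = (0, +1, -1)', using the g-inverse V from slide 40
avar <- function(n1, N, alpha, tau1, tau2, sigma = 1)
  sigma^2 * (1 / (n1       * exp(2 * (alpha + tau1))) +
             1 / ((N - n1) * exp(2 * (alpha + tau2))))
N <- 30
for (tau2 in c(0, 0.5, 1)) {
  v <- sapply(1:(N - 1), avar, N = N, alpha = 0, tau1 = 0, tau2 = tau2)
  cat("tau2 =", tau2, "-> variance-minimizing n1 =", which.min(v), "\n")
}
```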

Slide 42

Two popular practical approaches:

1. Bayesian design (which, it can be argued, really isn't Bayesian, but is often helpful):
- Specify a prior distribution for the parameter vector, $p(\theta)$
- Find the design that minimizes $E_\theta\, c'Vc$

This tends to work well when there is some reasonable prior knowledge of the parameters, i.e. when $p(\theta)$ is not too diffuse.

2. Sequential experimentation: Begin with a reasonable first-stage design and iterate:
- Compute $\hat\theta$ based on the data collected so far
- Compute the next-stage design to minimize $c'Vc$ as if $\theta = \hat\theta$

For some problems, this can be shown to converge to the known-parameters optimal design, but the convergence may be slow unless the first-stage design is actually reasonable.
