Randomized Complete Block Designs David Allen University of Kentucky February 23, 2016
1 Randomized Complete Block Design
There are many situations where it is impossible to use a completely randomized design. Suppose a researcher at an agricultural experiment station wants to conduct trials to compare six fertilization programs for wheat with regard to nitrate content of the plants. It would be desirable for the recommendations to apply across the state. Because of variability in climate, soil type, fertility, etc., trials would be conducted at multiple locations. The trial at each location should be balanced with respect to treatments. Hence a separate randomization is done at each location.
2
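The separate randomization at each location can be sketched in a few lines of Python. This is only an illustration: the function name `randomize_rcbd`, the seed, and the choice of four locations are assumptions, not part of the original study.

```python
import random

def randomize_rcbd(n_treatments, n_blocks, seed=None):
    """Return one independent random ordering of treatments 1..n_treatments per block."""
    rng = random.Random(seed)
    treatments = list(range(1, n_treatments + 1))
    # Sampling all treatments without replacement gives a fresh permutation per block
    return [rng.sample(treatments, k=n_treatments) for _ in range(n_blocks)]

# Hypothetical call: six treatments, four locations (blocks), arbitrary seed
layout = randomize_rcbd(n_treatments=6, n_blocks=4, seed=2016)
for block, order in enumerate(layout, start=1):
    print(f"Block {block}: {order}")
```

Every treatment appears exactly once in each block, so each location is balanced, while the plot order differs from block to block.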
Kentucky 3
Data from an example in Kuehl [1] are shown in Table 1.

Block
  1    Treatment       2      5      4      1      6      3
       Response    40.89  37.99  37.18  34.98  34.89  42.07
  2    Treatment       1      3      4      6      5      2
       Response    41.22  49.42  45.85  50.15  41.99  46.69
  3    Treatment       6      3      5      1      2      4
       Response    44.57  52.68  37.61  36.94  46.65  40.23
  4    Treatment       2      4      6      5      3      1
       Response    41.90  39.20  43.29  40.45  42.91  39.97

Table 1: Nitrate content of wheat plants
4
The Model
The model in scalar notation is
\[ Y_{ij} = \mu_i + b_j + \varepsilon_{ij} \tag{1} \]
where $\mu_i$ is the mean of the $i$th treatment group, $i = 1, \ldots, t$, $b_j$ is a random component associated with the $j$th block, $j = 1, \ldots, b$, and $\varepsilon_{ij}$ is a random component specific to the $i$th treatment and $j$th block. The $b_j$ are independently distributed $N(0, \sigma_b^2)$, and the $\varepsilon_{ij}$ are independently distributed $N(0, \sigma^2)$. The $b_j$ and $\varepsilon_{ij}$ are jointly independent.
5
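To make the roles of the two random components concrete, here is a minimal simulation sketch of model (1). The function name `simulate_rcbd` and all parameter values are hypothetical and chosen only for illustration.

```python
import random

def simulate_rcbd(mu, n_blocks, sigma_b, sigma, seed=None):
    """Simulate Y[i][j] = mu[i] + b[j] + eps[i][j] for treatments i and blocks j."""
    rng = random.Random(seed)
    # One block effect per block, shared by every treatment in that block
    b = [rng.gauss(0.0, sigma_b) for _ in range(n_blocks)]
    # An independent error for each treatment-by-block cell
    return [[mu_i + b[j] + rng.gauss(0.0, sigma) for j in range(n_blocks)]
            for mu_i in mu]

# Hypothetical treatment means and standard deviations
y = simulate_rcbd(mu=[4.0, 5.0, 7.0, 9.0], n_blocks=3, sigma_b=2.0, sigma=1.0, seed=1)
for i, row in enumerate(y, start=1):
    print(f"treatment {i}:", [round(v, 2) for v in row])
```

Because the same `b[j]` enters every response in block `j`, responses within a block are correlated, which is what the variance calculations later in the notes account for.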
Study Objectives
The objectives of the study are
1. to test hypotheses, or place confidence intervals, on linear combinations of treatment means, and
2. to test the null hypothesis $\sigma_b^2 = 0$.
The test of the null hypothesis that the treatment means are all equal is commonly applied, but is discussed in another presentation.
6
Learning Objectives
I am sure you have encountered the randomized complete block design before. However, when a linear combination of treatment means is not a contrast, there is no exact test or confidence interval. The Satterthwaite procedure is used to obtain approximate tests or confidence intervals.
7
Analysis by SAS
I'm thinking that an analysis by SAS might help set the stage for the more theoretical development later. SAS code invoking proc glimmix is supplied that addresses the objectives above for some specific examples. Note the use of the model statement option ddfm = satterthwaite; on the output, there are fractional degrees of freedom associated with the confidence interval for the first treatment mean.
Exercise 1.1. Run SAS proc glimmix on the data in Table 1. State your conclusions relative to the study objectives stated on page 6.
8
2 Analytic Analysis
For the randomized complete block design, estimates of the population means are just the sample means. Derivation of the formula for the standard error of a linear combination of sample means is the subject of this section. Let $b$ (with no subscript) denote the number of blocks in the design and $t$ the number of treatments.
9
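Variances and Covariances
The $i$th and $j$th sample means in terms of model (1) are
\[ \bar{Y}_{i\cdot} = \mu_i + \bar{b}_{\cdot} + \bar{\varepsilon}_{i\cdot}, \qquad \bar{Y}_{j\cdot} = \mu_j + \bar{b}_{\cdot} + \bar{\varepsilon}_{j\cdot}, \]
where $i$ and $j$ here index two distinct treatments. The variance of each mean is
\[ \frac{\sigma_b^2}{b} + \frac{\sigma^2}{b}. \]
The covariance between the $i$th and $j$th means is
\[ E\left[(\bar{b}_{\cdot} + \bar{\varepsilon}_{i\cdot})(\bar{b}_{\cdot} + \bar{\varepsilon}_{j\cdot})\right] = E\left[\bar{b}_{\cdot}^2 + \bar{b}_{\cdot}\bar{\varepsilon}_{j\cdot} + \bar{b}_{\cdot}\bar{\varepsilon}_{i\cdot} + \bar{\varepsilon}_{i\cdot}\bar{\varepsilon}_{j\cdot}\right] = E\,\bar{b}_{\cdot}^2 = \frac{\sigma_b^2}{b}. \]
10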
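Variance of a Linear Combination
The variance of a linear combination is
\[ \operatorname{Var}\Big(\sum_i c_i \bar{Y}_{i\cdot}\Big) = \Big(\sum_i c_i\Big)^2 \frac{\sigma_b^2}{b} + \sum_i c_i^2 \, \frac{\sigma^2}{b}. \]
If $\sum_i c_i = 0$ the linear combination is called a contrast. For contrasts, $\sigma_b^2$ drops out of the variance and inference is straightforward.
11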
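The variance formula can be checked numerically by building the covariance matrix of the treatment means (common variance $\sigma_b^2/b + \sigma^2/b$ on the diagonal, common covariance $\sigma_b^2/b$ off it) and evaluating the quadratic form directly. The sketch below does this; the function names and the variance component values are hypothetical.

```python
def cov_matrix_of_means(t, b, sigma2_b, sigma2):
    """Covariance matrix of the t treatment means."""
    var = sigma2_b / b + sigma2 / b   # variance of each mean
    cov = sigma2_b / b                # covariance between two different means
    return [[var if i == j else cov for j in range(t)] for i in range(t)]

def var_quadratic_form(c, t, b, sigma2_b, sigma2):
    """Var(sum_i c_i * Ybar_i) computed as c' V c."""
    v = cov_matrix_of_means(t, b, sigma2_b, sigma2)
    return sum(c[i] * v[i][j] * c[j] for i in range(t) for j in range(t))

def var_closed_form(c, b, sigma2_b, sigma2):
    """Closed form: (sum c_i)^2 * sigma_b^2 / b + (sum c_i^2) * sigma^2 / b."""
    return sum(c) ** 2 * sigma2_b / b + sum(ci ** 2 for ci in c) * sigma2 / b

# Hypothetical variance components, for illustration only
t, b, sigma2_b, sigma2 = 4, 3, 2.0, 1.5
single_mean = [1, 0, 0, 0]    # not a contrast: coefficients sum to 1
difference = [1, -1, 0, 0]    # a contrast: coefficients sum to 0
for c in (single_mean, difference):
    print(c, var_quadratic_form(c, t, b, sigma2_b, sigma2),
          var_closed_form(c, b, sigma2_b, sigma2))
```

For the difference of two means the result does not change when `sigma2_b` is changed, which is the sense in which the block variance drops out for contrasts.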
Inference on Non-Contrasts
If a linear combination is not a contrast, it may not be possible to construct a Student's $t$ statistic. In such a case, it is common to use the Satterthwaite procedure described in the next section.
12
3 The Satterthwaite Procedure
Suppose there is a situation where an immediate estimate of $\operatorname{Var}(c^t\hat{\beta})$ does not exist. However, there are two independent sums of squares $SS_1$ and $SS_2$ with respective degrees of freedom $\nu_1$ and $\nu_2$. Furthermore, constants $c_1$ and $c_2$ are such that
\[ E\left(c_1 SS_1/\nu_1 + c_2 SS_2/\nu_2\right) = \operatorname{Var}(c^t\hat{\beta}). \]
If we use $c_1 SS_1/\nu_1 + c_2 SS_2/\nu_2$ as an estimator of the variance, what value of the degrees of freedom should be used?
13
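As a sketch of how this setup arises in the randomized complete block design, take $SS_1$ to be the block sum of squares and $SS_2$ the residual sum of squares, and assume the usual expected mean squares under model (1): $E(MS_{\text{Blocks}}) = \sigma^2 + t\sigma_b^2$ on $b - 1$ degrees of freedom and $E(MS_{\text{Residual}}) = \sigma^2$ on $(b-1)(t-1)$ degrees of freedom. These expectations are not stated in the notes up to this point and are an assumption here. For a single treatment mean, the constants $c_1$ and $c_2$ then follow by matching terms:

```latex
\begin{align*}
\operatorname{Var}(\bar{Y}_{i\cdot})
  &= \frac{\sigma_b^2}{b} + \frac{\sigma^2}{b} \\
  &= \frac{1}{bt}\left(\sigma^2 + t\sigma_b^2\right) + \frac{t-1}{bt}\,\sigma^2 \\
  &= c_1\,E(MS_{\text{Blocks}}) + c_2\,E(MS_{\text{Residual}}),
  \qquad c_1 = \frac{1}{bt}, \quad c_2 = \frac{t-1}{bt}.
\end{align*}
```

Both constants are nonzero, so the variance estimate combines two mean squares. For a contrast the $\sigma_b^2$ term is absent and only the residual mean square is needed.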
The Pivotal Quantity
The "pivotal quantity" for a confidence interval on $c^t\beta$ is
\[ t = \frac{c^t\hat{\beta} - c^t\beta}{\sqrt{c_1 SS_1/\nu_1 + c_2 SS_2/\nu_2}}. \]
"Pivotal quantity" above is in quotes because the distribution of $t$ is unknown if $c_1 \neq 0$. Common practice is to use the Satterthwaite approximation [2]. This approximation may be thought of as synthesizing a mean square.
14
Decomposing t
The approach is to approximate the distribution of $t$ by a $t$-distribution. That reduces the problem to finding the degrees of freedom of the approximating $t$-distribution. Define
\[ Z = \frac{c^t\hat{\beta} - c^t\beta}{\sqrt{c_1\sigma_1^2 + c_2\sigma_2^2}} \]
and
\[ U = \frac{c_1\sigma_1^2}{\nu_1\,(c_1\sigma_1^2 + c_2\sigma_2^2)}\,\frac{SS_1}{\sigma_1^2} + \frac{c_2\sigma_2^2}{\nu_2\,(c_1\sigma_1^2 + c_2\sigma_2^2)}\,\frac{SS_2}{\sigma_2^2}; \]
then $t = Z/\sqrt{U}$.
15
The distribution of $Z$ is standard normal. It remains to approximate the distribution of $U$ by a chi-square divided by its degrees of freedom, i.e., there exists a $\nu$ such that
\[ U \sim \chi^2(\nu)/\nu \]
is approximately satisfied.
16
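Degrees of freedom for approximating distribution
By approximately satisfied we mean $U$ and $\chi^2(\nu)/\nu$ should have the same variance. Now
\[ \operatorname{Var}(U) = \left[\frac{c_1\sigma_1^2}{\nu_1\,(c_1\sigma_1^2 + c_2\sigma_2^2)}\right]^2 2\nu_1 + \left[\frac{c_2\sigma_2^2}{\nu_2\,(c_1\sigma_1^2 + c_2\sigma_2^2)}\right]^2 2\nu_2 = 2\,\frac{c_1^2\sigma_1^4/\nu_1 + c_2^2\sigma_2^4/\nu_2}{(c_1\sigma_1^2 + c_2\sigma_2^2)^2} \tag{2} \]
and
\[ \operatorname{Var}\left(\chi^2(\nu)/\nu\right) = \frac{2}{\nu}. \tag{3} \]
17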
Equating variances (2) and (3) and solving for $\nu$ gives
\[ \nu = \frac{(c_1\sigma_1^2 + c_2\sigma_2^2)^2}{c_1^2\sigma_1^4/\nu_1 + c_2^2\sigma_2^4/\nu_2}. \]
Note that $\nu$ depends on unknown parameters; in practice $\sigma_1^2$ and $\sigma_2^2$ are replaced by the corresponding mean squares.
18
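A minimal sketch of the degrees-of-freedom formula, with the mean squares plugged in for the unknown variances, might look as follows. The function name `satterthwaite_df` and the numerical inputs are hypothetical and not taken from any data set in these notes.

```python
def satterthwaite_df(c1, ms1, nu1, c2, ms2, nu2):
    """Approximate degrees of freedom for the synthesized variance c1*ms1 + c2*ms2.

    The unknown sigma_1^2 and sigma_2^2 are replaced by the mean squares
    ms1 = SS1/nu1 and ms2 = SS2/nu2.
    """
    numerator = (c1 * ms1 + c2 * ms2) ** 2
    denominator = (c1 * ms1) ** 2 / nu1 + (c2 * ms2) ** 2 / nu2
    return numerator / denominator

# Hypothetical coefficients, mean squares, and degrees of freedom
nu = satterthwaite_df(c1=0.1, ms1=20.0, nu1=3, c2=0.4, ms2=2.5, nu2=12)
print(nu)
```

The result is generally not an integer, which is why fractional degrees of freedom appear in software output. When one of the coefficients is zero the formula returns the degrees of freedom of the remaining mean square.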
4 Numerical Analysis
In order to display the matrices involved, a situation with fewer blocks and fewer treatments is used. A tabular layout of the responses for four treatments and three blocks is

                       Block
Treatment       1        2        3       Mean
    1         Y_11     Y_12     Y_13      Ȳ_1·
    2         Y_21     Y_22     Y_23      Ȳ_2·
    3         Y_31     Y_32     Y_33      Ȳ_3·
    4         Y_41     Y_42     Y_43      Ȳ_4·
19
Simulated Data
Simulated data for this layout are

treat   block   response
  1       1        4.92
  1       2        5.36
  1       3        1.78
  2       1        2.96
  2       2        6.79
  2       3        4.63
  3       1        6.86
  3       2       10.02
  3       3        4.55
  4       1        8.43
  4       2       10.30
  4       3        8.13
20
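The Model in Matrix Notation
The model in matrix notation is
\[ Y = X\beta + Zb + \varepsilon \]
\[
\begin{pmatrix} 4.92 \\ 5.36 \\ 1.78 \\ 2.96 \\ 6.79 \\ 4.63 \\ 6.86 \\ 10.02 \\ 4.55 \\ 8.43 \\ 10.30 \\ 8.13 \end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \\ \mu_4 \end{pmatrix}
+
\begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}
+ \varepsilon
\]
21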
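The two design matrices are indicator matrices for the treatment and block labels, so they can be generated from the labels rather than typed by hand. The sketch below assumes the observations are ordered by treatment and then by block, as in the simulated data; the helper name `indicator_matrix` is hypothetical.

```python
def indicator_matrix(labels, n_levels):
    """One row per observation, with a 1 in the column of that observation's level."""
    return [[1 if label == level else 0 for level in range(1, n_levels + 1)]
            for label in labels]

# Labels for 4 treatments x 3 blocks, ordered by treatment and then by block
treat = [i for i in range(1, 5) for _ in range(3)]
block = [j for _ in range(4) for j in range(1, 4)]

X = indicator_matrix(treat, 4)   # 12 x 4 treatment design matrix
Z = indicator_matrix(block, 3)   # 12 x 3 block design matrix
for x_row, z_row in zip(X, Z):
    print(x_row, z_row)
```

Each row of `X` and each row of `Z` contains exactly one 1, reflecting that every observation belongs to one treatment and one block.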
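The Transformed Model
Elements of the transformed model are (columns, left to right: transformed $X$, transformed $Z$, transformed $Y$)
\[
\left(\begin{array}{rrrr|rrr|r}
-1.73 & 0 & 0 & 0 & -0.58 & -0.58 & -0.58 & -6.96 \\
0 & 1.73 & 0 & 0 & 0.58 & 0.58 & 0.58 & 8.30 \\
0 & 0 & 1.73 & 0 & 0.58 & 0.58 & 0.58 & 12.37 \\
0 & 0 & 0 & 1.73 & 0.58 & 0.58 & 0.58 & 15.51 \\
0 & 0 & 0 & 0 & 1.63 & -0.82 & -0.82 & -1.07 \\
0 & 0 & 0 & 0 & 0 & 1.41 & -1.41 & 4.73 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -0.73 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.39 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -2.19 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -1.22 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -1.39 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -0.66
\end{array}\right)
\]
22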
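The notes do not say how the transformation was computed. As a check, under the assumption that it is an orthogonal transformation whose first rows are normalized treatment indicators followed by two orthonormal contrasts among the blocks, the first six transformed responses can be reproduced from the simulated data up to sign and rounding. The variable names below are illustrative.

```python
import math

# Simulated responses: rows are treatments 1..4, columns are blocks 1..3
y = [
    [4.92, 5.36, 1.78],
    [2.96, 6.79, 4.63],
    [6.86, 10.02, 4.55],
    [8.43, 10.30, 8.13],
]
t, b = len(y), len(y[0])

# Normalized treatment totals: each treatment indicator vector has length sqrt(b)
treatment_entries = [sum(row) / math.sqrt(b) for row in y]

# Block totals and two orthonormal contrasts among them (assumed form)
block_totals = [sum(y[i][j] for i in range(t)) for j in range(b)]
contrast_1 = (2 * block_totals[0] - block_totals[1] - block_totals[2]) / math.sqrt(6 * t)
contrast_2 = (block_totals[1] - block_totals[2]) / math.sqrt(2 * t)

print([round(v, 2) for v in treatment_entries])
print(round(contrast_1, 2), round(contrast_2, 2))
```

The magnitudes agree with the first six entries of the transformed response. The first displayed entry carries the opposite sign, which is consistent with the sign ambiguity of an orthogonal transformation, since its whole row is negated.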
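Variance Matrix
The variance matrix of the transformed $Y$ is
\[
\left(\begin{array}{rrrrrrrrrrrr}
1 & -1 & -1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
-1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
-1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
-1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}\right) \sigma_b^2 + I\,\sigma^2 \tag{4}
\]
23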
An analysis of variance is

Analysis of Variance
             Sum of     Degrees of      Mean
Source       Squares     Freedom       Square
              510.97        4          127.74
Blocks         23.51        2           11.76
Residual        9.33        6            1.55
24
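The sums of squares in the table can be reproduced from the simulated data with the classical formulas. The sketch below assumes that the unlabeled first row, with 4 degrees of freedom, is the uncorrected sum of squares for the four treatment means; the label "Means" in the code is that assumption, not a label from the notes.

```python
# Simulated responses: rows are treatments 1..4, columns are blocks 1..3
y = [
    [4.92, 5.36, 1.78],
    [2.96, 6.79, 4.63],
    [6.86, 10.02, 4.55],
    [8.43, 10.30, 8.13],
]
t, b = len(y), len(y[0])

grand_total = sum(sum(row) for row in y)
total_ss = sum(v * v for row in y for v in row)   # uncorrected total sum of squares

# Uncorrected sum of squares for the t treatment means (t degrees of freedom)
means_ss = sum(sum(row) ** 2 for row in y) / b
# Block sum of squares, corrected for the grand mean (b - 1 degrees of freedom)
block_totals = [sum(y[i][j] for i in range(t)) for j in range(b)]
blocks_ss = sum(total ** 2 for total in block_totals) / t - grand_total ** 2 / (t * b)
# The residual is what remains ((t - 1)(b - 1) degrees of freedom)
residual_ss = total_ss - means_ss - blocks_ss

rows = [("Means", means_ss, t),
        ("Blocks", blocks_ss, b - 1),
        ("Residual", residual_ss, (t - 1) * (b - 1))]
for name, ss, df in rows:
    print(f"{name:9s} SS = {ss:7.2f}  df = {df}  MS = {ss / df:7.2f}")
```

Agreement with the table is up to rounding, because the simulated responses are displayed to only two decimal places.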
Exercises
Each of the following exercises refers to the mockup data.
Exercise 4.1. Estimate $\sigma_b^2$ and $\sigma^2$.
Exercise 4.2. Estimate $\mu_1$ and its standard error. Put a 95% confidence interval on $\mu_1$.
Exercise 4.3. Estimate $\mu_1 - \mu_2$ and its standard error. Put a 95% confidence interval on $\mu_1 - \mu_2$.
Exercise 4.4. Test $H_0\colon \sigma_b^2 = 0$ with $\alpha = 0.05$. What is the power of this test if $\sigma_b^2/\sigma^2 = 2$?
25
References
[1] Robert O. Kuehl. Design of Experiments: Statistical Principles of Research Design and Analysis. Duxbury Press, second edition, 2000.
[2] F. E. Satterthwaite. An approximate distribution of estimates of variance components. Biometrics Bulletin, 2:110–114, 1946.
26