Lecture 7 Randomized Complete Block Design (RCBD) [ST&D sections (except 9.6) and section 15.8]

Lecture 7 Randomized Complete Block Design () [ST&D sections 9.1 9.7 (except 9.6) and section 15.8] The Completely Randomized Design () 1. It is assumed that all experimental units (EU's) are uniform.. Treatments are randomly assigned to EUs such that each treatment occurs equally often in the experiment. (1 randomization per experiment) 3. It is advocated to include as much of the native variability of the experiment as possible within each EU. 4. When EU's are not uniform, experimental error () increases, F (MST/) decreases, and the experiment loses sensitivity. If the experiment is replicated in a variety of situations to increase its scope, the variability increases even further. The Randomized Complete Block Design () 1. The population of EU's is divided into a number of relatively homogeneous subpopulations or blocks, and it is assumed that all EU's within a given block are uniform.. Within each block, treatments are randomly assigned to EU's such that each treatment occurs equally often (usually once) in each block. (1 randomization per block) 3. It is advocated to minimize the native variability as much as possible within blocks and to maximize the native variability as much as possible among blocks. 4. Variation among blocks can be partitioned out of the experimental error (), thereby reducing this quantity and increasing the power of the test. Additional variability introduced when increasing the scope of the experiment can also be partitioned out of the. Blocks usually represent levels of naturally-occurring differences or sources of variation that are unrelated to the treatments, and the characterization of these differences is not of interest to the researcher. 1

Example: A field trial comparing three cultivars (A, B, and C) of sugar beet with four replications. North end of field 1 3 Hi N 4 5 6 7 8 9 10 11 1 South end of field Low N : One randomization per experiment North end of field B C B Hi N C A C A B A C B A South end of field Low N : One randomization per block Block North end of field Hi N 1 1 3 1 3 3 1 3 4 1 3 South end of field Low N Block North end of field Hi N 1 B A C A B C 3 A C B 4 A C B South end of field Low N

The linear model The model underlying each observation in the experiment: Y And the sum of squares: t r Y ij = µ + τ i + β j + ε ij ij = Y.. ij + Y + ( Y i. Y..) + ( Y. j Y..) + ( Y Y i. Y. j..) t ( Yij Y ) = r ( Y i. Y..) + t ( Y. j Y..) +.. ( Y Y i. Y. j+ Y..) i= 1 j= 1 i= 1 j= 1 i= 1 j= 1 r TSS = SST + SSB + SSE t r ij This partitioning of variance is possible because the sums of squares of treatments, blocks, and error are orthogonal to one another. This orthogonality is a direct result of the completeness of the block design. Source df SS MS F Total rt - 1 TSS Treatments t - 1 SST SST/(t-1) MST/ Error t(r - 1) TSS-SST SSE/r(t-1) (one replication per block-treatment combination) Source df SS MS F Total rt - 1 TSS Treatments t - 1 SST SST/(t-1) MST/ Blocks r - 1 SSB SSB/(r-1) Error (t-1)(r-1) TSS-SST-SSB SSE/(t-1)(r-1) 1. has (r - 1) fewer df e than the.. If there are no differences among blocks (SSB = 0), <. 3. If there are large enough differences among blocks (SSB >> 0), >. 3

Example: An experiment was conducted to investigate the effect of estrogen on weight gain in sheep. The treatments are combinations of sex of sheep (M, F) and level of estrogen (Est 0, Est 3 ). The sheep are blocked by ranch, with one replication of each treatment level at each ranch. Ranch Trtmt 1 3 4 M Est 0 M Est 3 F Est 0 F Est 3 Effect of estrogen on weight gain in sheep (lbs). Ranch (i.e. block) Treatment Treatment I II III IV Total Mean F-S0 47 5 6 51 1 53 M-S0 50 54 67 57 8 57 F-S3 57 53 69 57 36 59 M-S3 54 65 74 59 5 63 Block Total 08 4 7 4 98 Block Mean 5 56 68 56 58 ANOVA (treating blocks as reps) Source df SS MS F Totals 15 854 Treatment 3 08 69.33 1.9 NS Error 1 646 53.83 ANOVA Source df SS MS F Total 15 854 Treatment 3 08 69.33 8.91** Blocks 3 576 19.00 4.69** Error 9 70 7.78 4

Relative efficiency: When to block? F = MST = SSE F crit = Fα, df df trt, df e e Blocking reduces SSE, which reduces. Blocking reduces df e, which increases and increases F crit. The concept of relative efficiency formalizes the comparison between two experimental methods by quantifying this balance between loss of degrees of freedom and reduction in experimental error. The information per replication in a given design is: I = 1 df + 1 1 df 3 + ε RE 1: = I I 1 df df df df 1 1 + 1 1 3 + + 1 1 3 + 1 ( df = ( df 1 + 1)( df + 1)( df 1 + 3) + 3) 1 The main complication is how to estimate for the alternative design. If an experiment was conducted as an, can be estimated by the following formula (ST&D ): ˆ df B MSB + ( df df + df B T T + df + df e e ) Assume TSS of the two designs is the same. Rewrite TSS in terms of its components and simplify the expression. 5

For the interested: Derivation of the expected 1. Set the total sums of squares of each design equal to each other and rewrite in terms of mean squares and degrees of freedom: TSS SST df T ( MST R ( t 1) MST = TSS + SSB R + df B( + SSE MSB R + ( r 1) MSB R + df = SST R + SSE = df T ( C) + ( t 1)( r 1) R MST C + df C) = ( t 1) MST C C + t( r 1) C. Replace each mean square with the variance components of its expected mean square: ( t 1)( + r T ( ) + ( r 1)( [( t 1) + ( r 1) + ( t 1)( r 1)] t( r 1) C) = B( + ( tr 1) t( r 1) B( + ( tr 1) + t B( + t( r 1) = ( tr 1) C) ) + ( t 1)( r 1) B( + r( t 1) T ( = ( t 1)( C) + r T ( C ) = [( t 1) + t( r 1)] C) ) + t( r 1) + r( t 1) C ) T ( C) 3. Finally, rewrite this expression in terms of mean squares and degrees of freedom: = MSB + t( r 1) t( tr 1) MSB = + ( r 1) ( r 1) tr 1 tr 1 MSB = [( tr 1) ( r 1)] + ( r 1) tr 1 tr 1 r( t 1) + ( r 1) MSB = tr 1 ( dft ( + df ) + df BMSB = df + df + df T ( B( 6

Example: From the sheep experiment, = 7.78 and MSB = 19.0. Therefore: ˆ df BMSB + ( dft + dfe) 3(19) + (3+ 9)7.78 M SE = = 44.6 df + df + df 3+ 3+ 9 B T e RE ( df ( df + 1)( df + 1)( df + 3) ˆ + 3) (9+ 1)(1+ 3)44.6 = (1+ 1)(9+ 3)7.78 1 : = 1 5.51 Interpretation: It takes 5.51 replications in the to produce the same amount of information as one replication in the. Or, the is 5.51 time more efficient than the in this case. 1. When there are no significant differences among blocks distribution of the difference between two means () distribution of the difference between two means () α/ SSE = SSE > 0 Power < Power Power 1 β β 7

. When there are significant differences among blocks distribution of the difference between two means () α/ distribution of the difference between two means () 0 df e < df e SSE < SSE Power > Power or Power < Power Power 1 β β Assumptions of the model The model for the with a single replication per block-treatment combination: Y ij = µ + τ i + β j + ε ij 1. The residuals (ε ij ) are independent, homogeneous, and normally distributed.. The variance within each treatment levels is homogeneous across all treatment levels. 3. The main effects are additive. 8

Recall that experimental error is defined as the variation among experimental units that are treated alike. Ranch Trtmt 1 3 4 M Est 0 M Est 3 F Est 0 F Est 3 There is an expected value for each sheep, given by: Expected Y ij = µ + τ i + β j Observed Y ij = µ + τ i + β j + ε ij With only one replication per cell, the residuals are the combined effects of experimental error and any non-additive treatment*block interactions: ε ij = τ i *β j + error ij So when we use ε ij as estimates of the true experimental error, we are assuming that τ i *β j = 0. This assumption of no interaction is referred to as the assumption of additivity of the main effects. If this assumption is violated, all F-tests will be very inefficient and possibly misleading, particularly if the interaction effect is very large. 9

Example: Consider the hypothetical effects of two factors (A, B) on some response variable: Purely additive Factor A Factor B τ 1 = 1 τ = τ 3 = 3 β 1 = 1 3 4 β = 5 6 7 8 Purely multiplicative Factor A Factor B τ 1 = 1 τ = τ 3 = 3 β 1 = 1 1 3 β = 5 5 10 15 If multiplicative data of this sort are analyzed by a conventional ANOVA, the interaction SS will be large due to the nonadditivity of the treatment effects. In the case of multiplicative effects, there is a simple remedy. Simply transform the variable by taking the log of each mean to restore additivity: Restoring additivity (log of data) Factor A Factor B τ 1 = 0.00 τ = 0.30 τ 3 = 0.48 β 1 = 0.00 0.00 0.30 0.48 β = 0.70 0.70 1.00 1.18 10

Tukey s 1-df test for nonadditivity [ST&D 395] Under our linear model, each observation is characterized as: yij = µ + β + τ + ε i j ij The predicted value of each individual is given by: pred ij = µ + β + τ i j So, if we had no error in our experiment (i.e. if ε = 0 ), the observed data would exactly match its predicted values and a correlation plot of the two would yield a perfect line with slope = 1: ij Observed vs. Predicted Values (, no error) 0 18 Observed 16 14 1 10 10 1 14 16 18 0 Predicted Now let's introduce some error: Observed vs. Predicted (, with error) 0 18 Observed.. 16 14 1 10 10 1 14 16 18 0 Predicted 11

But what happens when you have an interaction (e.g. Block * Treatment) but lack the degrees of freedom necessary to include it in the linear model? ε ε + B*T Interaction Effects ij = RANDOMij Observed vs. Predicted (, with error and B*T) 0 18 Observed.. 16 14 1 10 10 1 14 16 18 0 Predicted SO, if the observed and predicted values obey a linear relationship, then the nonrandom Interaction Effects buried in the error term are sufficiently small to uphold our assumption of additivity of main effects. This test is easily implemented using SAS: Data Lambs; Input Sex_Est $ @@; Do Ranch = 1 to 4; Input Gain @@; Output; End; Cards; F0 47 5 6 51 M0 50 54 67 57 F3 57 53 69 57 M3 54 65 74 59 ; Proc GLM; Class Sex_Est Ranch; Model Gain = Ranch Sex_Est; Output out = LambsPR p = Pred r = Res; Proc GLM; Class Sex_Est Ranch; Model Gain = Ranch Sex_Est Pred*Pred; Run; Quit; * This is the Tukey 1 df test; NOTE: Pred*Pred is not in the Class statement, and it is the last term in the Model. 1

Output from the nd Proc GLM (the Tukey 1 df test): Source DF Type III SS Mean Square F Value Pr > F Ranch 3 0.73506585 0.450195 0.03 0.997 Sex_Est 3 0.980876 0.099364 0.01 0.9981 Pred*Pred 1 3.4188034 3.4188034 0.41 0.5395 NS This test is necessary ONLY when there is one observation per block-treatment combination. If there are two or more replications per block-treatment combination, the block*treatment interaction can be tested directly in an exploratory model. Example: Yield of penicillin from four different protocols (A D). Blocks are different stocks of an important reagent. The numbers below each observation (O) are the predicted values (P = Grand Mean + Treatment effect + Block effect) and the residuals (. Block Stock 1 Stock Stock 3 Stock 4 Stock 5 Treatment A B C D O: 88 O: 97 O: 94 P: 91 P: 95 P: 9 R: -3 R: R: O: 89 P: 90 R: -1 O: 84 P: 81 R: 3 O: 81 P: 83 R: - O: 87 P: 86 R: 1 O: 79 P: 80 R: -1 O: 77 P: 8 R: -5 O: 87 P: 84 R: 3 O: 9 P: 87 R: 5 O: 81 P: 81 R: 0 O: 9 P: 86 R: 6 O: 87 P: 88 R: -1 O: 89 P: 91 R: - O: 80 P: 85 R: -5 O: 79 P: 83 R: -4 O: 85 P: 85 R: 0 O: 84 P: 88 R: -4 O: 88 P: 8 R: 6 Treatment mean 84 85 89 86 Treatment effect - -1 3 0 Block Mean Block Effect 9 +6 83-3 85-1 88 8-4 Mean = 86 13

The SAS code for such an analysis: Data Penicillin; Do Block = 1 to 4; Do Trtmt = 1 to 4; Input Yield @@; Output; End; End; Cards; 89 88 97 94 84 77 9 79 81 87 87 85 87 9 89 84 79 81 80 88 ; Proc GLM Data = Penicillin; Class Block Trtmt; Model Yield = Block Trtmt; Output out = PenPR p = Pred r = Res; Proc Univariate Data = PenPR normal; * Testing for normality of residuals; Var Res; Proc GLM Data = Penicillin; * Testing for variance homogeneity (1 way ANOVA); Class Trtmt; Model Yield = Trtmt; Means Trtmt / hovtest = Levene; Proc Plot Data = PenPR; * Generating a plot of predicted vs. residual values; Plot Res*Pred = Trtmt; Proc GLM Data = PenPR; * Testing for nonadditivity; Class Block Trtmt; Model Yield = Block Trtmt Pred*Pred; Run; Quit; This dataset meets all assumptions: normality, variance homogeneity, and additivity: Tests for Normality Test --Statistic--- -----p Value------ Shapiro-Wilk W 0.967406 Pr < W 0.6994 NS Levene's Test for Homogeneity of Yield Variance ANOVA of Squared Deviations from Group Means Sum of Mean Source DF Squares Square F Value Pr > F Trtmt 3 9. 307.4 0.39 0.760 NS Error 16 1618.8 788.7 Tukey's 1 df test for nonadditivity Source DF Type III SS Mean Square F Value Pr > F Block 3 8.915818 9.43071939 0.8 0.8365 Trtmt 3 8.681454 9.471508 0.8 0.8367 Pred*Pred 1 6.46875000 6.46875000 0.79 0.3901 NS 14

No particular pattern presents itself in this graph. 15

Nesting within an Data Lambs; Input Sex_Est $ Ranch Animal Gain @@; Cards; f0 1 1 46 f0 1 51 f0 3 1 61 f0 4 1 50 m0 1 1 49 m0 1 53 m0 3 1 66 m0 4 1 56 f3 1 1 56 f3 1 5 f3 3 1 68 f3 4 1 56 m3 1 1 53 m3 1 64 m3 3 1 73 m3 4 1 58 f0 1 1 48 f0 1 53 f0 3 1 63 f0 4 1 5 m0 1 1 51 m0 1 55 m0 3 1 68 m0 4 1 58 f3 1 1 58 f3 1 54 f3 3 1 70 f3 4 1 58 m3 1 1 55 m3 1 66 m3 3 1 75 m3 4 1 60 ; Proc GLM Data = Lambs Order = Data; Class Ranch Sex_Est Animal; Model Gain = Ranch Sex_Est Animal(Ranch*Sex_Est); Random Animal(Ranch*Sex_Est); Test h = Sex_Est e = Animal(Ranch*Sex_Est); Contrast 'sex' Sex_Est 1-1 1-1 / e = Animal(Ranch*Sex_Est); Contrast 'estrogen' Sex_Est 1 1-1 -1 / e = Animal(Ranch*Sex_Est); Contrast 'interaction' Sex_Est 1-1 -1 1 / e = Animal(Ranch*Sex_Est); Means Sex_Est / Tukey e = Animal(Ranch*Sex_Est); Proc Varcomp Method = Type1; Class Ranch Sex_Est Animal; Model Gain = Ranch Sex_Est Animal(Ranch*Sex_Est); Run; Quit; Ranch Trtmt 1 3 4 M Est 0 M Est 3 F Est 0 F Est 3 measurements Type 1 Analysis of Variance Source Ranch Sex_Est Animal(Ranch*Sex_Es) Error Expected Mean Square Var(Error) + Var(Animal(Ranch*Sex_Es)) + 8 Var(Ranch) Var(Error) + Var(Animal(Ranch*Sex_Es)) + 8 Var(Sex_Est) Var(Error) + Var(Animal(Ranch*Sex_Es)) Var(Error) Type 1 Estimates Variance Component Estimate % Var(Ranch) 46.05556 65.6 Var(Sex_Est) 15.38889 1.9 Var(Animal(Ranch*Sex_Es)) 6.77778 9.7 Var(Error).00000.8 The only reason to analyze this dataset as a nested is to calculate the variance components. If you do not need the variance components, simply average the subsamples for each experimental unit and analyze it as a simple. 16

with multiple replications per block-treatment combination Data Lambs; Do Ranch = 1 to 4; Do Sex_Est = 'M0', 'M3', 'F0', 'F3'; Input Gain @@; Output; End; End; Cards; 58 64 46 66 56 57 46 54 59 68 51 63 54 64 56 6 Ranch Trtmt 1 3 4 M Est 0 M Est 3 F Est 0 F Est 3 58 6 58 58 53 63 45 53 56 73 46 54 55 71 5 57 ; Proc GLM Data = Lambs Order = Data; Class Ranch Sex_Est; Model Gain = Ranch Sex_Est Ranch*Sex_Est; e * The exploratory model; Proc GLM Data = Lambs Order = Data; Class Ranch Sex_Est; Model Gain = Ranch Sex_Est; Output out = LambsPR p = Pred r = Res; Contrast 'sex' Sex_Est 1-1 1-1; Contrast 'estrogen' Sex_Est 1 1-1 -1; Contrast 'interaction' Sex_Est 1-1 -1 1; Means Sex_Est / Tukey; * The ANOVA; Proc Univariate Data = LambsPR normal; Var Res; Proc GLM Data = Lambs Order = Data; Class Sex_Est; Model Gain = Sex_Est; Means Sex_Est / hovtest = Levene; * Testing normality; * Levene's test (1 way ANOVA); Run; Quit; Output from the exploratory model: Sum of Source DF Squares Mean Square F Value Pr > F Model 15 164.875000 84.35000 5.51 0.0008 Error 16 45.000000 15.31500 Corrected Total 31 1509.875000 Source DF Type III SS Mean Square F Value Pr > F Ranch 3 176.150000 58.7083333 3.83 0.0304 Sex_Est 3 951.650000 317.083333 0.7 <.0001 Ranch*Sex_Est 9 137.150000 15.361111 1.00 0.4811 NS 17

1 rep/cell 1 rep/cell with subsamples T1 T T1 T B1 B1 B B Class Block Trtmt; Model Y = Block Trtmt; Tukey Test Required Class Block Trtmt Pot; Model Y = Block Trtmt Pot(Block*Trtmt); Random Pot(Block*Trtmt); Test h = Trtmt e = Pot(Block*Trtmt); Tukey Test Required >1 rep/cell >1 rep/cell with subsamples T1 T T1 T B1 B1 B B Exploratory model: Class Block Trtmt; Model Y = Block Trtmt Block*Trtmt; Tukey Test not Required ANOVA: Class Block Trtmt; Model Y = Block Trtmt; Exploratory model: Class Bl Trt Pot; Model Y = Bl Trt Bl*Trt Pot(Bl*Trt); Random Pot(Bl*Trt); Test h = Bl*Trt e = Pot(Bl*Trt); Tukey Test not Required ANOVA: Class Bl Trt Pot; Model Y = Bl Trt Pot(Bl*Trt); Random Pot(Bl*Trt); Test h = Trt e = Pot(Bl*Trt); 18