Topic 28: Unequal Replication in Two-Way ANOVA
Outline
Two-way ANOVA with unequal numbers of observations in the cells
Data and model
Regression approach
Parameter estimates
Previous analyses with constant n are just a special case

Data for two-way ANOVA
Y is the response variable
Factor A with levels i = 1 to a
Factor B with levels j = 1 to b
$Y_{ijk}$ is the k-th observation in cell (i,j)
k = 1 to $n_{ij}$, and $n_{ij}$ may vary

Recall Bread Example
KNNL p 833
Y is the number of cases of bread sold
A is the height of the shelf display, a = 3 levels: bottom, middle, top
B is the width of the shelf display, b = 2 levels: regular, wide
n = 2 stores for each of the 3x2 treatment combinations (BALANCED)

Regression Approach
Create a-1 dummy variables to represent the levels of A
Create b-1 dummy variables to represent the levels of B
Multiply each of the a-1 dummy variables for A by each of the b-1 dummy variables for B to get the variables for AB
Let's look at the relationship among these sets of variables

Common Set of Variables
    data a2; set a1;
      X1 = (height eq 1) - (height eq 3);
      X2 = (height eq 2) - (height eq 3);
      X3 = (width eq 1) - (width eq 2);
      X13 = X1*X3;
      X23 = X2*X3;
$\sum_i \alpha_i = 0$, $\sum_j \beta_j = 0$
$\sum_i (\alpha\beta)_{ij} = 0$, $\sum_j (\alpha\beta)_{ij} = 0$

Run Proc Reg
    proc reg data=a2;
      model sales = X1 X2 X3 X13 X23 / XPX I;
      height: test X1, X2;
      width: test X3;
      interaction: test X13, X23;
    run;

X'X Matrix
Model Crossproducts X'X X'Y Y'Y
Variable    Intercept   X1   X2   X3   X13   X23
Intercept          12    0    0    0     0     0
X1                  0    8    4    0     0     0
X2                  0    4    8    0     0     0
X3                  0    0    0   12     0     0
X13                 0    0    0    0     8     4
X23                 0    0    0    0     4     8
Sets of variables are orthogonal
Crossproducts between sets are 0

Orthogonal X's
The order in which the variables are fit in the model does not matter
Type I SS = Type III SS
Order of fit does not matter for any choice of restrictions when $n_{ij}$ is constant
Orthogonality is lost when the $n_{ij}$ are not constant

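The orthogonality claim can be checked without the sales values, because X'X depends only on the design. The following is a minimal sketch in Python (not part of topic28.sas); the helper names `effect_code`, `design_row`, and `crossproducts` are made up for illustration, and the layout assumed is the balanced one from the bread example (3 heights, 2 widths, 2 stores per cell).

```python
from itertools import product

def effect_code(level, n_levels):
    """Sum-to-zero coding: n_levels - 1 columns, last level coded -1 in each."""
    return [(level == k) - (level == n_levels) for k in range(1, n_levels)]

def design_row(height, width):
    """Intercept, X1, X2 (height), X3 (width), X13, X23 (interaction)."""
    x1, x2 = effect_code(height, 3)
    (x3,) = effect_code(width, 2)
    return [1, x1, x2, x3, x1 * x3, x2 * x3]

def crossproducts(rows):
    """Return X'X as a list of lists."""
    p = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]

# Balanced layout: 2 stores in each of the 3 x 2 cells (12 rows in all).
rows = [design_row(h, w) for h, w in product((1, 2, 3), (1, 2)) for _ in range(2)]
xtx = crossproducts(rows)
for line in xtx:
    print(line)
```

Every crossproduct between the height set (X1, X2), the width set (X3), and the interaction set (X13, X23) comes out as 0, which is what makes the order of fit irrelevant here.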
KNNL Example
KNNL p 954
Y is the change in growth rates for children after a treatment
A is gender, a = 2 levels: male, female
B is bone development, b = 3 levels: severely, moderately, or mildly depressed
$n_{ij}$ = 3, 2, 2, 1, 3, 3 children in the groups

Read and check the data
    data a3;
      infile 'c:\...\ch23ta01.txt';
      input growth gender bone;
    proc print data=a3;
    run;

Obs   growth   gender   bone
  1      1.4        1      1
  2      2.4        1      1
  3      2.2        1      1
  4      2.1        1      2
  5      1.7        1      2
  6      0.7        1      3
  7      1.1        1      3
  8      2.4        2      1
  9      2.5        2      2
 10      1.8        2      2
 11      2.0        2      2
 12      0.5        2      3
 13      0.9        2      3
 14      1.3        2      3

Common Set of Variables
    data a3; set a3;
      X1 = (bone eq 1) - (bone eq 3);
      X2 = (bone eq 2) - (bone eq 3);
      X3 = (gender eq 1) - (gender eq 2);
      X13 = X1*X3;
      X23 = X2*X3;
$\sum_i \alpha_i = 0$, $\sum_j \beta_j = 0$
$\sum_i (\alpha\beta)_{ij} = 0$, $\sum_j (\alpha\beta)_{ij} = 0$

Run Proc Reg
    proc reg data=a3;
      model growth = X1 X2 X3 X13 X23 / XPX I;
    run;

X'X Matrix
Model Crossproducts X'X X'Y Y'Y
Variable    Intercept   X1   X2   X3   X13   X23
Intercept          14   -1    0    0     3     0
X1                 -1    9    5    3     1    -1
X2                  0    5   10    0    -1    -2
X3                  0    3    0   14    -1     0
X13                 3    1   -1   -1     9     5
X23                 0   -1   -2    0     5    10
Crossproduct terms are no longer all 0
Order of fit matters

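The unbalanced X'X matrix can be reproduced directly from the 14 observations in the data listing. This is a small Python sketch rather than the SAS run itself; the function name `design_row` is hypothetical, and the coding mirrors the data step for this example (X1 and X2 for bone, X3 for gender).

```python
# (growth, gender, bone) rows copied from the data listing
data = [
    (1.4, 1, 1), (2.4, 1, 1), (2.2, 1, 1), (2.1, 1, 2), (1.7, 1, 2),
    (0.7, 1, 3), (1.1, 1, 3), (2.4, 2, 1), (2.5, 2, 2), (1.8, 2, 2),
    (2.0, 2, 2), (0.5, 2, 3), (0.9, 2, 3), (1.3, 2, 3),
]

def design_row(gender, bone):
    """Intercept, X1, X2 (bone), X3 (gender), X13, X23 (interaction)."""
    x1 = (bone == 1) - (bone == 3)
    x2 = (bone == 2) - (bone == 3)
    x3 = (gender == 1) - (gender == 2)
    return [1, x1, x2, x3, x1 * x3, x2 * x3]

rows = [design_row(gender, bone) for _, gender, bone in data]
p = len(rows[0])
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]

names = ["Intercept", "X1", "X2", "X3", "X13", "X23"]
for name, line in zip(names, xtx):
    print(f"{name:<10}" + "".join(f"{v:4d}" for v in line))
```

For instance, the Intercept by X13 entry is 3 - 2 - 1 + 3 = 3: the four corner cells no longer cancel because their sample sizes differ.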
How does this impact the analysis?
In regression, this happens all the time (explanatory variables are correlated)
In regression, t tests look at the significance of a variable when it is fitted last
When comparing means, the order of fit alters the null hypothesis being tested

Prepare the data for a plot
    * a1 is built from the growth data read into a3;
    data a1; set a3;
      if (gender eq 1)*(bone eq 1) then gb='1_msev ';
      if (gender eq 1)*(bone eq 2) then gb='2_mmod ';
      if (gender eq 1)*(bone eq 3) then gb='3_mmild';
      if (gender eq 2)*(bone eq 1) then gb='4_fsev ';
      if (gender eq 2)*(bone eq 2) then gb='5_fmod ';
      if (gender eq 2)*(bone eq 3) then gb='6_fmild';

Plot the data
    title1 'Plot of the data';
    symbol1 v=circle i=none;
    proc gplot data=a1;
      plot growth*gb;
    run;

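Find the means
    proc means data=a1;
      var growth;
      output out=a2 mean=avgrowth;
      by gender bone;
    run;
The BY statement requires the data to be sorted by gender and bone, as they are in the listing
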
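Plot the means
    title1 'Plot of the means';
    symbol1 v='m' i=join c=blue;
    symbol2 v='f' i=join c=green;
    proc gplot data=a2;
      plot avgrowth*bone=gender;
    run;

Plot of the means
[Figure: avgrowth (vertical axis, about 0.8 to 2.4) versus bone (horizontal axis, levels 1 to 3), one joined profile per gender, plotted with M for gender 1 and F for gender 2; the M and F points overlap at bone = 3. Annotation on the plot: Interaction?]
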
Cell means model
$Y_{ijk} = \mu_{ij} + \varepsilon_{ijk}$
where $\mu_{ij}$ is the theoretical mean or expected value of all observations in cell (i,j)
the $\varepsilon_{ijk}$ are iid $N(0, \sigma^2)$
$Y_{ijk} \sim N(\mu_{ij}, \sigma^2)$, independent

Estimates
Estimate $\mu_{ij}$ by the mean of the observations in cell (i,j), $\bar{Y}_{ij\cdot}$
$\hat{\mu}_{ij} = \bar{Y}_{ij\cdot} = \sum_k Y_{ijk} / n_{ij}$
For each (i,j) combination, we can get an estimate of the variance
$s_{ij}^2 = \sum_k (Y_{ijk} - \bar{Y}_{ij\cdot})^2 / (n_{ij} - 1)$
We pool these to get an estimate of $\sigma^2$

Pooled estimate of $\sigma^2$
In general we pool the $s_{ij}^2$, using weights proportional to the df, $n_{ij} - 1$
The pooled estimate is
$s^2 = \sum_{ij} (n_{ij} - 1) s_{ij}^2 / \sum_{ij} (n_{ij} - 1)$
Nothing is different in terms of parameter estimates from the balanced design

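As a check on the pooling formula, here is a short Python sketch applied to the growth data from the listing (an illustration, not the course program). Since $(n_{ij} - 1) s_{ij}^2$ is just the within-cell sum of squared deviations, the sketch accumulates those sums directly; that also avoids dividing by zero in the cell that has a single observation.

```python
from collections import defaultdict

# (growth, gender, bone) rows copied from the data listing
data = [
    (1.4, 1, 1), (2.4, 1, 1), (2.2, 1, 1), (2.1, 1, 2), (1.7, 1, 2),
    (0.7, 1, 3), (1.1, 1, 3), (2.4, 2, 1), (2.5, 2, 2), (1.8, 2, 2),
    (2.0, 2, 2), (0.5, 2, 3), (0.9, 2, 3), (1.3, 2, 3),
]

cells = defaultdict(list)
for growth, gender, bone in data:
    cells[(gender, bone)].append(growth)

# Cell means estimate the mu_ij.
means = {cell: sum(ys) / len(ys) for cell, ys in cells.items()}

# Numerator: sum over cells of (n_ij - 1) * s_ij^2, i.e. the error SS.
sse = sum((y - means[cell]) ** 2 for cell, ys in cells.items() for y in ys)
# Denominator: sum over cells of (n_ij - 1), i.e. the error df.
df_error = sum(len(ys) - 1 for ys in cells.values())
s2 = sse / df_error

for cell in sorted(means):
    print(cell, len(cells[cell]), round(means[cell], 4))
print("SSE =", round(sse, 4), "df =", df_error, "s^2 =", round(s2, 4))
```

The pooled estimate agrees with the error mean square in the full-model ANOVA table (1.30 on 8 df).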
Run proc glm
    proc glm data=a1;
      class gender bone;
      model growth = gender|bone / solution;
      means gender*bone;
    run;
gender|bone is a shorthand way to write main effects and interactions

Parameter Estimates
The SOLUTION option on the MODEL statement gives parameter estimates for the GLM parameterization
The constraints are
Last level of each main effect is zero
Interaction terms with i = a or j = b are zero
These reproduce the cell means in the usual way

Parameter Estimates
Parameter           Estimate         Standard Error   t Value   Pr > |t|
Intercept            0.90000000 B         0.2327373      3.87     0.0048
gender 1            -0.00000000 B         0.3679900     -0.00     1.0000
bone 1               1.50000000 B         0.4654747      3.22     0.0122
bone 2               1.20000000 B         0.3291403      3.65     0.0065
gender*bone 1 1     -0.40000000 B         0.5933661     -0.67     0.5192
gender*bone 1 2     -0.20000000 B         0.5204165     -0.38     0.7108
Example: $\hat{\mu}_{22} = 0.90 + 0.00 + 1.20 + 0.00 = 2.10$

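Two more cells worked the same way, as a sketch using only the estimates in the table; any parameter that involves the last level of gender (2) or bone (3) is taken to be 0 under the GLM constraints. Cell (1,1) is the one case where all four kinds of terms are nonzero in principle.

```latex
\begin{aligned}
\hat{\mu}_{11} &= 0.90 + (-0.00) + 1.50 + (-0.40) = 2.00 \\
\hat{\mu}_{12} &= 0.90 + (-0.00) + 1.20 + (-0.20) = 1.90 \\
\hat{\mu}_{23} &= 0.90 + 0 + 0 + 0 = 0.90
\end{aligned}
```

These match the sample means of the corresponding cells in the data listing, for example (1.4 + 2.4 + 2.2)/3 = 2.00 for cell (1,1).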
Output
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               5        4.4742857    0.89485714      5.51   0.0172
Error               8        1.3000000    0.16250000
Corrected Total    13        5.7742857
Note DF and SS add as usual

Output Type I SS
Source         DF    Type I SS   Mean Square   F Value   Pr > F
gender          1    0.0028571    0.00285714      0.02   0.8978
bone            2    4.3960000    2.19800000     13.53   0.0027
gender*bone     2    0.0754286    0.03771429      0.23   0.7980
SSG + SSB + SSGB = 4.47429

Output Type III SS
Source         DF   Type III SS   Mean Square   F Value   Pr > F
gender          1    0.12000000    0.12000000      0.74   0.4152
bone            2    4.18971429    2.09485714     12.89   0.0031
gender*bone     2    0.07542857    0.03771429      0.23   0.7980
SSG + SSB + SSGB = 4.38514

Type I vs Type III
SS for Type I add up to the model SS
SS for Type III do not necessarily add up
Type I and Type III are the same for the interaction because it is the last term in the model
The Type I and Type III analyses for the main effects are not necessarily the same
Different hypotheses are being examined

Type I vs Type III
Most people prefer the Type III analysis
This can be misleading if the cell sizes differ greatly
Contrasts can provide some insight into the differences in hypotheses

Contrast for A*B
Same for Type I and Type III
Null hypothesis is that the profiles are parallel; see plot for interpretation
$\mu_{12} - \mu_{11} = \mu_{22} - \mu_{21}$ and $\mu_{13} - \mu_{12} = \mu_{23} - \mu_{22}$
$\mu_{11} - \mu_{12} - \mu_{21} + \mu_{22} = 0$ and $\mu_{12} - \mu_{13} - \mu_{22} + \mu_{23} = 0$

A*B Contrast statement
    contrast 'gender*bone Type I and III'
      gender*bone 1 -1 0 -1 1 0,
      gender*bone 0 1 -1 0 -1 1;
    run;

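Type III Contrast for gender
$(1)\mu_{11} = (1)(\mu + \alpha_1 + \beta_1 + (\alpha\beta)_{11})$
$(1)\mu_{12} = (1)(\mu + \alpha_1 + \beta_2 + (\alpha\beta)_{12})$
$(1)\mu_{13} = (1)(\mu + \alpha_1 + \beta_3 + (\alpha\beta)_{13})$
$(-1)\mu_{21} = (-1)(\mu + \alpha_2 + \beta_1 + (\alpha\beta)_{21})$
$(-1)\mu_{22} = (-1)(\mu + \alpha_2 + \beta_2 + (\alpha\beta)_{22})$
$(-1)\mu_{23} = (-1)(\mu + \alpha_2 + \beta_3 + (\alpha\beta)_{23})$
$L = 3\alpha_1 - 3\alpha_2 + (\alpha\beta)_{11} + (\alpha\beta)_{12} + (\alpha\beta)_{13} - (\alpha\beta)_{21} - (\alpha\beta)_{22} - (\alpha\beta)_{23}$

Contrast statement Gender Type III
    contrast 'gender Type III' gender 3 -3
      gender*bone 1 1 1 -1 -1 -1;
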
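Type I Contrast for gender
$(3)\mu_{11} = (3)(\mu + \alpha_1 + \beta_1 + (\alpha\beta)_{11})$
$(2)\mu_{12} = (2)(\mu + \alpha_1 + \beta_2 + (\alpha\beta)_{12})$
$(2)\mu_{13} = (2)(\mu + \alpha_1 + \beta_3 + (\alpha\beta)_{13})$
$(-1)\mu_{21} = (-1)(\mu + \alpha_2 + \beta_1 + (\alpha\beta)_{21})$
$(-3)\mu_{22} = (-3)(\mu + \alpha_2 + \beta_2 + (\alpha\beta)_{22})$
$(-3)\mu_{23} = (-3)(\mu + \alpha_2 + \beta_3 + (\alpha\beta)_{23})$
$L = (7\alpha_1 - 7\alpha_2) + (2\beta_1 - \beta_2 - \beta_3) + 3(\alpha\beta)_{11} + 2(\alpha\beta)_{12} + 2(\alpha\beta)_{13} - 1(\alpha\beta)_{21} - 3(\alpha\beta)_{22} - 3(\alpha\beta)_{23}$
The coefficients on the cell means are the cell sample sizes $n_{ij}$

Contrast statement Gender Type I
    contrast 'gender Type I' gender 7 -7 bone 2 -1 -1
      gender*bone 3 2 2 -1 -3 -3;
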
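Dividing each gender contrast by its total weight (3 for Type III, 7 for Type I) gives the two null hypotheses in terms of cell means. This is a restatement of the coefficients in the two contrast derivations, written out as a sketch to make the difference visible.

```latex
\begin{aligned}
\text{Type III: } H_0 &: \tfrac{1}{3}\,(\mu_{11} + \mu_{12} + \mu_{13})
  = \tfrac{1}{3}\,(\mu_{21} + \mu_{22} + \mu_{23}) \\
\text{Type I: } H_0 &: \tfrac{1}{7}\,(3\mu_{11} + 2\mu_{12} + 2\mu_{13})
  = \tfrac{1}{7}\,(\mu_{21} + 3\mu_{22} + 3\mu_{23})
\end{aligned}
```

Type III compares unweighted averages of the cell means; Type I compares averages weighted by the cell sizes $n_{ij}$, so its hypothesis depends on how many children happened to fall in each cell.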
Type III Contrast for bone
Null hypothesis is that the marginal means are the same
In terms of means, $H_0$: $\mu_{\cdot 1} = \mu_{\cdot 2}$ and $\mu_{\cdot 2} = \mu_{\cdot 3}$
    contrast 'bone Type III'
      bone 2 -2 0 gender*bone 1 -1 0 1 -1 0,
      bone 2 0 -2 gender*bone 1 0 -1 1 0 -1;

Contrast output
Contrast                      DF   Contrast SS   Mean Square   F Value   Pr > F
gender*bone Type I and III     2    0.07542857    0.03771429      0.23   0.7980
gender Type III                1    0.12000000    0.12000000      0.74   0.4152
gender Type I                  1    0.00285714    0.00285714      0.02   0.8978
bone Type III                  2    4.18971429    2.09485714     12.89   0.0031

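The two single-df gender contrast sums of squares can be reproduced by hand from the cell means. The sketch below, in Python, uses the standard formula for a contrast among independent cell means, SS = (sum of c_ij times cell mean)^2 / (sum of c_ij^2 / n_ij); the function name `contrast_ss` is hypothetical and the data are copied from the listing.

```python
# (growth, gender, bone) rows copied from the data listing
data = [
    (1.4, 1, 1), (2.4, 1, 1), (2.2, 1, 1), (2.1, 1, 2), (1.7, 1, 2),
    (0.7, 1, 3), (1.1, 1, 3), (2.4, 2, 1), (2.5, 2, 2), (1.8, 2, 2),
    (2.0, 2, 2), (0.5, 2, 3), (0.9, 2, 3), (1.3, 2, 3),
]

cells = {}
for growth, gender, bone in data:
    cells.setdefault((gender, bone), []).append(growth)

def contrast_ss(coef):
    """coef maps (gender, bone) to the coefficient on that cell mean."""
    estimate = sum(c * sum(cells[k]) / len(cells[k]) for k, c in coef.items())
    scale = sum(c * c / len(cells[k]) for k, c in coef.items())
    return estimate ** 2 / scale

order = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)]
type3 = dict(zip(order, (1, 1, 1, -1, -1, -1)))   # unweighted cell means
type1 = dict(zip(order, (3, 2, 2, -1, -3, -3)))   # weighted by n_ij

ss3, ss1 = contrast_ss(type3), contrast_ss(type1)
print(f"gender Type III SS = {ss3:.8f}")
print(f"gender Type I   SS = {ss1:.8f}")
```

Dividing either value by the error mean square 0.1625 gives the F values in the contrast output (0.74 and 0.02).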
Summary
Type I and Type III F tests test different null hypotheses
Should be aware of the differences
Most prefer Type III as it follows logic similar to regression analysis
Be wary, however, if the cell sizes vary dramatically

Comparing Means
If interested in Type III hypotheses, need to use LSMEANS to do comparisons
If interested in Type I hypotheses, need to use MEANS to do comparisons
We will show this difference via the ESTIMATE statement

SAS Commands
Will use the earlier contrast code to set up the ESTIMATE commands
    estimate 'gender Type III' gender 3 -3
             gender*bone 1 1 1 -1 -1 -1 / divisor=3;
    estimate 'gender Type I' gender 7 -7 bone 2 -1 -1
             gender*bone 3 2 2 -1 -3 -3 / divisor=7;

MEANS OUTPUT
Level of          ------------growth-----------
gender      N           Mean         Std Dev
1           7     1.65714286      0.62411843
2           7     1.62857143      0.75655862
Diff = 0.0286

LSMEANS OUTPUT
gender    growth LSMEAN
1            1.60000000
2            1.80000000
Diff = -0.20

Estimate output
Parameter          Estimate   Std Err
gender Type III      -0.200    0.2327
gender Type I         0.029    0.2155
Notice that these two estimates agree with the differences of the LSMEANS and of the MEANS, respectively

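The two kinds of marginal means, and the standard errors of their differences, can be sketched in a few lines of Python. This is an illustration under the cell means model, not SAS output; `raw_mean` and `ls_mean` are hypothetical helper names, and the error mean square 0.1625 is copied from the full-model ANOVA table.

```python
from math import sqrt

# (growth, gender, bone) rows copied from the data listing
data = [
    (1.4, 1, 1), (2.4, 1, 1), (2.2, 1, 1), (2.1, 1, 2), (1.7, 1, 2),
    (0.7, 1, 3), (1.1, 1, 3), (2.4, 2, 1), (2.5, 2, 2), (1.8, 2, 2),
    (2.0, 2, 2), (0.5, 2, 3), (0.9, 2, 3), (1.3, 2, 3),
]
MSE = 0.1625  # error mean square, full model with interaction (8 df)

cells = {}
for growth, gender, bone in data:
    cells.setdefault((gender, bone), []).append(growth)

def raw_mean(g):
    """MEANS-style: average of all observations for gender g."""
    ys = [y for (gg, _), obs in cells.items() if gg == g for y in obs]
    return sum(ys) / len(ys)

def ls_mean(g):
    """LSMEANS-style: unweighted average of the cell means for gender g."""
    means = [sum(obs) / len(obs) for (gg, _), obs in cells.items() if gg == g]
    return sum(means) / len(means)

diff_means = raw_mean(1) - raw_mean(2)
diff_lsmeans = ls_mean(1) - ls_mean(2)

n_gender = {g: sum(len(obs) for (gg, _), obs in cells.items() if gg == g)
            for g in (1, 2)}
se_means = sqrt(MSE * (1 / n_gender[1] + 1 / n_gender[2]))
se_lsmeans = sqrt(MSE * sum(1 / len(obs) for obs in cells.values())) / 3

print(f"MEANS   diff = {diff_means:.4f}, SE = {se_means:.4f}")
print(f"LSMEANS diff = {diff_lsmeans:.4f}, SE = {se_lsmeans:.4f}")
```

The LSMEANS difference has the larger standard error here because the unweighted average gives the single-observation cell as much weight as the three-observation cells.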
Analytical Strategy
First examine the interaction
Some options when the interaction is significant
Interpret the plot of means
Run A at each level of B and/or B at each level of A
Run as a one-way with ab levels
Use contrasts

Analytical Strategy
Some options when the interaction is not significant
Use a multiple comparison procedure for the main effects
Use contrasts for main effects
If needed, rerun without the interaction

Example continued
    proc glm data=a3;
      class gender bone;
      model growth = gender bone / solution;
      means gender bone / tukey lines;
    run;
The interaction is pooled into error here because the error df is small
The MEANS statement is for Type I hypotheses

Output
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               3        4.3988571    1.46628571     10.66   0.0019
Error              10        1.3754286    0.13754286
Corrected Total    13        5.7742857

Output Type I SS
Source    DF    Type I SS   Mean Square   F Value   Pr > F
gender     1   0.00285714    0.00285714      0.02   0.8883
bone       2   4.39600000    2.19800000     15.98   0.0008

Output Type III SS
Source    DF   Type III SS   Mean Square   F Value   Pr > F
gender     1    0.09257143    0.09257143      0.67   0.4311
bone       2    4.39600000    2.19800000     15.98   0.0008
Although the null hypotheses for gender differ, neither the Type I nor the Type III test is significant

Tukey comparisons
Group     Mean    N   bone
A       2.1000    4   1
A
A       2.0200    5   2
B       0.9000    5   3

Tukey Comparisons
Why don't we need a Tukey adjustment for gender?
The MEANS statement does provide mean estimates, so you know the directionality of the F test, but that is all the statement provides

Last slide
Read KNNL Chapter 23
We used program topic28.sas to generate the output for today