Nested 2-Way ANOVA as Linear Models - Unbalanced Example

Linear Models Nested -Way ANOVA ORIGIN As with other linear models, unbalanced data require use of the regression approach, in this case by contrast coding of independent variables using a scheme not described in full by KNNL p.. I have been unable to find this scheme in R, so I coded it by hand. Results match KNNL, but do not entirely match R's anova() report using either contr.sum or contr.treatment codings. Example comes from Chapter in Kuter et al. (KNNL) Applied Linear Statistical Models th Edition. < Here the independent variables were explicitly coded into p= Dummy KNNL dataset Table. variables X-X using KNNL coding scheme designed for nested variable B within A. Example: Cell Means ANOVA Model: K Variable Assignment: Y K N a length( Y) Fitted Values & Hat Matrix H: Residuals: READPRN ("c:/linearmodelsdata/trainingschoolubcodedr.txt" ) b ii N I X r identity( N) OV i j X j K j X T X X T Y Y h X H X X T X X T e Y Y h.7.7 7.7. < fitted values Yh < nxn Hat matrix < residuals Full Model ANOVA Table for Cell Means Model: Sum of Squares: Degrees of Freedom: SSTR Y T H J Y SSTR ( ) df R p df R N Y T ( I H) Y SSTO Y T I Nested -Way ANOVA as Linear Models - Unbalanced Example N augment( OVX) cols( X) i N J iii r Least Squares Estimation of the Regression Parameters: N J < total number of cases < Identity, One Matrix & One Vector for matrix calculations < design matrix p r Y ( ) ^ note intercept column in X Mean Squares: df E N p df E MSE df E SSTO df T N df T MSTO df T ^ SSTR sums as shown below, verified KNNL p. SSTO ( ) < meaning of coefficients not determined Y K 7 X SSTR MSTR df R W. Stein MSTR ( 77 ) MSE (. ) MSTO (.7 )

Linear Models Nested -Way ANOVA GLM Cell Means Decomposition of SSTR: #READ STRUCTURED DATA TABLE WITH NESTED CODE KNNL p. K=read.table("c:/LinearModelsData/TrainingSchoolUBcodedR.txt") K aach(k) Y=Score FM=lm(Y~X+X+X+X) summary(fm) anova(fm) RMa=lm(Y~X+X+X) RMbina=lm(Y~X) anova(rma,fm) anova(rmbina,fm) detach(k) X.... X....7 * X.. 7.7. * X.... ** Residuals.. Note: serial anova() is appropriate here because X-X need to be considered together, and not as separate marginal estimates. SSA.77 Prototype in R: SS BinA. #READ STRUCTURED DATA TABLE WITH NUMERIC CODED FACTOR K=read.table("c:/LinearModelsData/TrainingSchoolUBR.txt") K aach(k) Y=Score A=factor(School) B=factor(Instructor) contrasts(a)=contr.sum contrasts(b)=contr.sum FM=lm(Y~A+B%in%A) anova(fm) detach(k) > summary(fm) Call: lm(formula = Y ~ X + X + X + X) Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept).7.7.. *** X.7.7.7. X 7.7... ** X -.. -.7. * X -..7 -.77. **.. ^ anova() SS sum to SSTR above < derived from GLM anova(rm,fm) ^ confirmed KNNL p. A.... A:B.... * Residuals.. > K Score X X X X - - - - - 7 - - - - - > anova(rma,fm) Model : Y ~ X + X + X Model : Y ~ X + X + X + X Res.Df RSS Df Sum of Sq F Pr(>F).77..77.7. ^ Extra SS for A > anova(rmbina,fm) Model : Y ~ X Model : Y ~ X + X + X + X Res.Df RSS Df Sum of Sq F Pr(>F) 7..... * ^ Extra SS for B in A > K Score School Instructor 7 < report for A doesn't match KNNL, but matches summary() above.

Linear Models Nested -Way ANOVA Example: K READPRN ("c:/linearmodelsdata/trainingschoolubr.txt" ) Variable Assignment: Y K < response (dependent) variable Y K B B B A Y Y Y A Y Y < response blocks 7 a b N length( Y) Means & number of elements: GM mean( Y) GM. < grand mean A mean( ) mean( ) A. < Independent Factor A means & n i B meany meany meany meany meany A A A A A B... < Nested Factor B with A means, n j & A mean level Y b W augment Y b W < Within values & associated block means

Linear Models Nested -Way ANOVA Sums of Squares: SSA independent factor: i length A n A AM A SSA n AM GM SSA. i i SSB(A) nested factor: j length B i n B BM B AM B SSB n BM AM SSB. j j j within block error: k length W j O W BM W O BM k k k Note : It remains troubling that I have not found a way in R to automatically recreate ANOVA SS results verified above with KNNL. I presume that different hypotheses i = of regression coefficients derived from the ways R codes nested factor B within A, versus the code system shown in KNNL p., is responsible for discrepant SSA/MSA/F/P that are reported. The KNNL coding scheme is easily expandable to accomodate any number of factors and/or factor levels, but seemingly this needs to be done by hand. However, having done so, one can use KNNL as the cited reference for how SSA was calculated. Note : The results here conform to the method of calculations in SL Box.. So, Sokal & Rohlf can be cited instead for calculation of Sums of Squares and associated F tests. However, discrepency in hypotheses related to SSA. here (agreeing with SR) and.77 in above (agreeing with KNNL) remains unresolved. Note : Anova(car) will not work using R's coding for this nested factor. It works fine with the code system in KNNL. ANOVA Table: Sums of Squares: Degrees of freedom: Mean Squares: SSA SSA. df A length A df A SSA MSA MSA. df A lengtha SS B(A) SSB. df B length B df B SSB MSB MSB. df B lengthb df E length W df E MSE MSE. df E

Linear Models Nested -Way ANOVA Tests of Significance: Using Full and Reduced Linear Models approach KNNL. For effect in independent variable A: Null Hypothesis and Alternative: H : Regression coefficient for Treatment A is zero - no independent effect in A H : Regression coefficient not zero - treatment effect is evident in A Test Statistic: Decision Rule:. MSA F a MSE < set as desired F a. If Fs > F(-df A, df E ) then Reject H, otherwise accept H Probability: 7.7 qf df A df E pff a df A P min pf F a df A df E df E P. For effect in nested variable B(A): Null Hypothesis and Alternative: H : Regression coefficient for Nested B is zero - no effect for nested B(A) H : Regression coefficient not zero - effect evident in B(A) Test Statistic: Decision Rule:. MSB F b MSE < set as desired F b. If Fs > F(-df B, df E ) then Reject H, otherwise accept H Probability:. qf df B df E pff b df B P min pf F b df B df E df E P. A.... A:B.... * Residuals.. --- Signif. codes: ***. **. *...