Nested 2-Way ANOVA as Linear Models - Unbalanced Example

Similar documents
Biostatistics 380 Multiple Regression 1. Multiple Regression

R Output for Linear Models using functions lm(), gls() & glm()

ANOVA Randomized Block Design

Unbalanced Nested ANOVA - Sokal & Rohlf Example

Stat 412/512 TWO WAY ANOVA. Charlotte Wickham. stat512.cwick.co.nz. Feb

MODELS WITHOUT AN INTERCEPT

STAT763: Applied Regression Analysis. Multiple linear regression. 4.4 Hypothesis testing

Variance Decomposition and Goodness of Fit

1.) Fit the full model, i.e., allow for separate regression lines (different slopes and intercepts) for each species

Tests of Linear Restrictions

Recall that a measure of fit is the sum of squared residuals: where. The F-test statistic may be written as:

Example: Poisondata. 22s:152 Applied Linear Regression. Chapter 8: ANOVA

22s:152 Applied Linear Regression. Take random samples from each of m populations.

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA

Factorial Analysis of Variance with R

Inference for Regression

Chapter 8 Conclusion

STAT 401A - Statistical Methods for Research Workers

Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017

Chapter 11 - Lecture 1 Single Factor ANOVA

Regression and the 2-Sample t

16.3 One-Way ANOVA: The Procedure

Comparing Nested Models

Stat 5102 Final Exam May 14, 2015

1 Use of indicator random variables. (Chapter 8)

Workshop 7.4a: Single factor ANOVA

Extensions of One-Way ANOVA.

MAT3378 ANOVA Summary

3. Design Experiments and Variance Analysis

Econ 3790: Business and Economic Statistics. Instructor: Yogesh Uppal

ST430 Exam 2 Solutions

36-707: Regression Analysis Homework Solutions. Homework 3

ST430 Exam 1 with Answers

1 Multiple Regression

Multiple Predictor Variables: ANOVA

MATH 644: Regression Analysis Methods

Week 7 Multiple factors. Ch , Some miscellaneous parts

Lecture 13 Extra Sums of Squares

MS&E 226: Small Data

Booklet of Code and Output for STAC32 Final Exam

Ch 3: Multiple Linear Regression

Multiple Regression: Example

Unit 7: Random Effects, Subsampling, Nested and Crossed Factor Designs

Notes for Week 13 Analysis of Variance (ANOVA) continued WEEK 13 page 1

Figure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim

CAS MA575 Linear Models

Multiple Predictor Variables: ANOVA

df=degrees of freedom = n - 1

ANOVA: Comparing More Than Two Means

NC Births, ANOVA & F-tests

General Linear Statistical Models - Part III

EX1. One way ANOVA: miles versus Plug. a) What are the hypotheses to be tested? b) What are df 1 and df 2? Verify by hand. , y 3

R version ( ) Copyright (C) 2009 The R Foundation for Statistical Computing ISBN

" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2

SCHOOL OF MATHEMATICS AND STATISTICS

Statistics Lab #6 Factorial ANOVA

Example of treatment contrasts used by R in estimating ANOVA coefficients

Psychology 405: Psychometric Theory

ANOVA Analysis of Variance

Extensions of One-Way ANOVA.

STAT 350: Summer Semester Midterm 1: Solutions

Biostatistics 360 F&t Tests and Intervals in Regression 1

Homework 3 - Solution

Analysis of Variance and Design of Experiments-I

11 Factors, ANOVA, and Regression: SAS versus Splus

Econometrics. 4) Statistical inference

Lecture 10 Multiple Linear Regression

Statistics and Quantitative Analysis U4320

> modlyq <- lm(ly poly(x,2,raw=true)) > summary(modlyq) Call: lm(formula = ly poly(x, 2, raw = TRUE))

Ch 2: Simple Linear Regression

ANOVA (Analysis of Variance) output RLS 11/20/2016

(a) The percentage of variation in the response is given by the Multiple R-squared, which is 52.67%.

Unit 27 One-Way Analysis of Variance

School of Mathematical Sciences. Question 1

6. Multiple Linear Regression

Lecture 22 Mixed Effects Models III Nested designs

STAT Final Practice Problems

Chapter 3: Multiple Regression. August 14, 2018

Lab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test, October 2013

Inference. ME104: Linear Regression Analysis Kenneth Benoit. August 15, August 15, 2012 Lecture 3 Multiple linear regression 1 1 / 58

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.

Handout 4: Simple Linear Regression

Unbalanced Data in Factorials Types I, II, III SS Part 1

Lecture notes 13: ANOVA (a.k.a. Analysis of Variance)

Lecture 10. Factorial experiments (2-way ANOVA etc)

Multiple Regression Part I STAT315, 19-20/3/2014

1 Overview. 2 Multiple Regression framework. Effect Coding. Hervé Abdi

Josh Engwer (TTU) 1-Factor ANOVA / 32

In a one-way ANOVA, the total sums of squares among observations is partitioned into two components: Sums of squares represent:

CHAPTER 4 Analysis of Variance. One-way ANOVA Two-way ANOVA i) Two way ANOVA without replication ii) Two way ANOVA with replication

STATS Analysis of variance: ANOVA

Lecture 6 Multiple Linear Regression, cont.

Explanatory Variables Must be Linear Independent...

Chapter 10: Analysis of variance (ANOVA)

R 2 and F -Tests and ANOVA

Theorem A: Expectations of Sums of Squares Under the two-way ANOVA model, E(X i X) 2 = (µ i µ) 2 + n 1 n σ2

Motor Trend Car Road Analysis

Regression Analysis lab 3. 1 Multiple linear regression. 1.1 Import data. 1.2 Scatterplot matrix

Factorial and Unbalanced Analysis of Variance

Transcription:

Linear Models Nested -Way ANOVA ORIGIN As with other linear models, unbalanced data require use of the regression approach, in this case by contrast coding of independent variables using a scheme not described in full by KNNL p.. I have been unable to find this scheme in R, so I coded it by hand. Results match KNNL, but do not entirely match R's anova() report using either contr.sum or contr.treatment codings. Example comes from Chapter in Kuter et al. (KNNL) Applied Linear Statistical Models th Edition. < Here the independent variables were explicitly coded into p= Dummy KNNL dataset Table. variables X-X using KNNL coding scheme designed for nested variable B within A. Example: Cell Means ANOVA Model: K Variable Assignment: Y K N a length( Y) Fitted Values & Hat Matrix H: Residuals: READPRN ("c:/linearmodelsdata/trainingschoolubcodedr.txt" ) b ii N I X r identity( N) OV i j X j K j X T X X T Y Y h X H X X T X X T e Y Y h.7.7 7.7. < fitted values Yh < nxn Hat matrix < residuals Full Model ANOVA Table for Cell Means Model: Sum of Squares: Degrees of Freedom: SSTR Y T H J Y SSTR ( ) df R p df R N Y T ( I H) Y SSTO Y T I Nested -Way ANOVA as Linear Models - Unbalanced Example N augment( OVX) cols( X) i N J iii r Least Squares Estimation of the Regression Parameters: N J < total number of cases < Identity, One Matrix & One Vector for matrix calculations < design matrix p r Y ( ) ^ note intercept column in X Mean Squares: df E N p df E MSE df E SSTO df T N df T MSTO df T ^ SSTR sums as shown below, verified KNNL p. SSTO ( ) < meaning of coefficients not determined Y K 7 X SSTR MSTR df R W. Stein MSTR ( 77 ) MSE (. ) MSTO (.7 )

Linear Models Nested -Way ANOVA GLM Cell Means Decomposition of SSTR: #READ STRUCTURED DATA TABLE WITH NESTED CODE KNNL p. K=read.table("c:/LinearModelsData/TrainingSchoolUBcodedR.txt") K aach(k) Y=Score FM=lm(Y~X+X+X+X) summary(fm) anova(fm) RMa=lm(Y~X+X+X) RMbina=lm(Y~X) anova(rma,fm) anova(rmbina,fm) detach(k) X.... X....7 * X.. 7.7. * X.... ** Residuals.. Note: serial anova() is appropriate here because X-X need to be considered together, and not as separate marginal estimates. SSA.77 Prototype in R: SS BinA. #READ STRUCTURED DATA TABLE WITH NUMERIC CODED FACTOR K=read.table("c:/LinearModelsData/TrainingSchoolUBR.txt") K aach(k) Y=Score A=factor(School) B=factor(Instructor) contrasts(a)=contr.sum contrasts(b)=contr.sum FM=lm(Y~A+B%in%A) anova(fm) detach(k) > summary(fm) Call: lm(formula = Y ~ X + X + X + X) Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept).7.7.. *** X.7.7.7. X 7.7... ** X -.. -.7. * X -..7 -.77. **.. ^ anova() SS sum to SSTR above < derived from GLM anova(rm,fm) ^ confirmed KNNL p. A.... A:B.... * Residuals.. > K Score X X X X - - - - - 7 - - - - - > anova(rma,fm) Model : Y ~ X + X + X Model : Y ~ X + X + X + X Res.Df RSS Df Sum of Sq F Pr(>F).77..77.7. ^ Extra SS for A > anova(rmbina,fm) Model : Y ~ X Model : Y ~ X + X + X + X Res.Df RSS Df Sum of Sq F Pr(>F) 7..... * ^ Extra SS for B in A > K Score School Instructor 7 < report for A doesn't match KNNL, but matches summary() above.

Linear Models Nested -Way ANOVA Example: K READPRN ("c:/linearmodelsdata/trainingschoolubr.txt" ) Variable Assignment: Y K < response (dependent) variable Y K B B B A Y Y Y A Y Y < response blocks 7 a b N length( Y) Means & number of elements: GM mean( Y) GM. < grand mean A mean( ) mean( ) A. < Independent Factor A means & n i B meany meany meany meany meany A A A A A B... < Nested Factor B with A means, n j & A mean level Y b W augment Y b W < Within values & associated block means

Linear Models Nested -Way ANOVA Sums of Squares: SSA independent factor: i length A n A AM A SSA n AM GM SSA. i i SSB(A) nested factor: j length B i n B BM B AM B SSB n BM AM SSB. j j j within block error: k length W j O W BM W O BM k k k Note : It remains troubling that I have not found a way in R to automatically recreate ANOVA SS results verified above with KNNL. I presume that different hypotheses i = of regression coefficients derived from the ways R codes nested factor B within A, versus the code system shown in KNNL p., is responsible for discrepant SSA/MSA/F/P that are reported. The KNNL coding scheme is easily expandable to accomodate any number of factors and/or factor levels, but seemingly this needs to be done by hand. However, having done so, one can use KNNL as the cited reference for how SSA was calculated. Note : The results here conform to the method of calculations in SL Box.. So, Sokal & Rohlf can be cited instead for calculation of Sums of Squares and associated F tests. However, discrepency in hypotheses related to SSA. here (agreeing with SR) and.77 in above (agreeing with KNNL) remains unresolved. Note : Anova(car) will not work using R's coding for this nested factor. It works fine with the code system in KNNL. ANOVA Table: Sums of Squares: Degrees of freedom: Mean Squares: SSA SSA. df A length A df A SSA MSA MSA. df A lengtha SS B(A) SSB. df B length B df B SSB MSB MSB. df B lengthb df E length W df E MSE MSE. df E

Linear Models Nested -Way ANOVA Tests of Significance: Using Full and Reduced Linear Models approach KNNL. For effect in independent variable A: Null Hypothesis and Alternative: H : Regression coefficient for Treatment A is zero - no independent effect in A H : Regression coefficient not zero - treatment effect is evident in A Test Statistic: Decision Rule:. MSA F a MSE < set as desired F a. If Fs > F(-df A, df E ) then Reject H, otherwise accept H Probability: 7.7 qf df A df E pff a df A P min pf F a df A df E df E P. For effect in nested variable B(A): Null Hypothesis and Alternative: H : Regression coefficient for Nested B is zero - no effect for nested B(A) H : Regression coefficient not zero - effect evident in B(A) Test Statistic: Decision Rule:. MSB F b MSE < set as desired F b. If Fs > F(-df B, df E ) then Reject H, otherwise accept H Probability:. qf df B df E pff b df B P min pf F b df B df E df E P. A.... A:B.... * Residuals.. --- Signif. codes: ***. **. *...