Workshop 7.4a: Single factor ANOVA


Murray Logan
November 23, 2016

Table of contents

1 Revision
2 ANOVA parameterization
3 Partitioning of variance (ANOVA)
4 Worked examples

1. Revision

1.1. Estimation

     Y    X
   3.0    0
   2.5    1
   6.0    2
   5.5    3
   9.0    4
   8.6    5
  12.0    6

Each observation can be written as an instance of the linear model y_i = \beta_0 + \beta_1 x_i + \varepsilon_i:

 3.0 = \beta_0 \cdot 1 + \beta_1 \cdot 0 + \varepsilon_1
 2.5 = \beta_0 \cdot 1 + \beta_1 \cdot 1 + \varepsilon_2
 6.0 = \beta_0 \cdot 1 + \beta_1 \cdot 2 + \varepsilon_3
 5.5 = \beta_0 \cdot 1 + \beta_1 \cdot 3 + \varepsilon_4
 9.0 = \beta_0 \cdot 1 + \beta_1 \cdot 4 + \varepsilon_5
 8.6 = \beta_0 \cdot 1 + \beta_1 \cdot 5 + \varepsilon_6
12.0 = \beta_0 \cdot 1 + \beta_1 \cdot 6 + \varepsilon_7

1.2. Estimation

Collecting the same equations into matrix form:

\underbrace{\begin{pmatrix} 3.0 \\ 2.5 \\ 6.0 \\ 5.5 \\ 9.0 \\ 8.6 \\ 12.0 \end{pmatrix}}_{\text{response values}} =
\underbrace{\begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \\ 1 & 5 \\ 1 & 6 \end{pmatrix}}_{\text{model matrix}}
\underbrace{\begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}}_{\text{parameter vector}} +
\underbrace{\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \\ \varepsilon_7 \end{pmatrix}}_{\text{residual vector}}

1.3. Matrix algebra

Y = X\beta + \varepsilon

and the ordinary least squares estimate of the parameter vector is

\hat\beta = (X'X)^{-1}X'Y

2. ANOVA parameterization

2.1. Simple ANOVA

Three treatments (one factor with 3 levels), three replicates.
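As a quick check, the slope and intercept can be recovered either with lm() or directly from the normal equations. A minimal sketch using the revision data above:

Y <- c(3, 2.5, 6, 5.5, 9, 8.6, 12)
x <- 0:6
X <- cbind(1, x)                    # model matrix: intercept and slope columns
solve(t(X) %*% X) %*% t(X) %*% Y    # beta-hat = (X'X)^{-1} X'Y
coef(lm(Y ~ x))                     # lm() solves the same least-squares problem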

2.2. Simple ANOVA

Two treatments, three replicates.

2.3. Categorical predictor

   Y    A    dummy1   dummy2   dummy3
   2    G1   1        0        0
   3    G1   1        0        0
   4    G1   1        0        0
   6    G2   0        1        0
   7    G2   0        1        0
   8    G2   0        1        0
  10    G3   0        0        1
  11    G3   0        0        1
  12    G3   0        0        1

y_{ij} = \mu + \beta_1(\text{dummy}_1)_{ij} + \beta_2(\text{dummy}_2)_{ij} + \beta_3(\text{dummy}_3)_{ij} + \varepsilon_{ij}

2.4. Overparameterized

y_{ij} = \mu + \beta_1(\text{dummy}_1)_{ij} + \beta_2(\text{dummy}_2)_{ij} + \beta_3(\text{dummy}_3)_{ij} + \varepsilon_{ij}

Writing the intercept out as its own column makes the problem visible:

   Y    A    Intercept   dummy1   dummy2   dummy3
   2    G1   1           1        0        0
   3    G1   1           1        0        0
   4    G1   1           1        0        0
   6    G2   1           0        1        0
   7    G2   1           0        1        0
   8    G2   1           0        1        0
  10    G3   1           0        0        1
  11    G3   1           0        0        1
  12    G3   1           0        0        1

2.5. Overparameterized

y_{ij} = \mu + \beta_1(\text{dummy}_1)_{ij} + \beta_2(\text{dummy}_2)_{ij} + \beta_3(\text{dummy}_3)_{ij} + \varepsilon_{ij}

- three treatment groups
- four parameters to estimate
- we therefore need to re-parameterize

2.6. Categorical predictor

y_i = \mu + \beta_1(\text{dummy}_1)_i + \beta_2(\text{dummy}_2)_i + \beta_3(\text{dummy}_3)_i + \varepsilon_i

2.6.1. Means parameterization

Dropping the intercept leaves one parameter per group, each estimating that group's mean:

y_i = \beta_1(\text{dummy}_1)_i + \beta_2(\text{dummy}_2)_i + \beta_3(\text{dummy}_3)_i + \varepsilon_i

y_{ij} = \alpha_i + \varepsilon_{ij}, \quad i = 1, \ldots, p

2.7. Categorical predictor

2.7.1. Means parameterization

y_i = \beta_1(\text{dummy}_1)_i + \beta_2(\text{dummy}_2)_i + \beta_3(\text{dummy}_3)_i + \varepsilon_i

   Y    A    dummy1   dummy2   dummy3
   2    G1   1        0        0
   3    G1   1        0        0
   4    G1   1        0        0
   6    G2   0        1        0
   7    G2   0        1        0
   8    G2   0        1        0
  10    G3   0        0        1
  11    G3   0        0        1
  12    G3   0        0        1
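The overparameterization can also be demonstrated in R: with an intercept plus one dummy per group, the model matrix has four columns but only rank 3, so no unique estimates exist. A small sketch (the A vector is an assumption matching the toy data above):

A <- factor(rep(c("G1", "G2", "G3"), each = 3))
X <- cbind(intercept = 1,
           dummy1 = A == "G1", dummy2 = A == "G2", dummy3 = A == "G3")
ncol(X)       # 4 parameters to estimate...
qr(X)$rank    # ...but rank 3: one column is a linear combination of the others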

2.8. Categorical predictor

2.8.1. Means parameterization

        Y     A
  1   2.00   G1
  2   3.00   G1
  3   4.00   G1
  4   6.00   G2
  5   7.00   G2
  6   8.00   G2
  7  10.00   G3
  8  11.00   G3
  9  12.00   G3

y_i = \alpha_1 D_{1i} + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \varepsilon_i

that is, y_i = \alpha_p + \varepsilon_i, where p indexes the levels of the factor and the D's are dummy variables.

\begin{pmatrix} 2 \\ 3 \\ 4 \\ 6 \\ 7 \\ 8 \\ 10 \\ 11 \\ 12 \end{pmatrix} =
\begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix} +
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \\ \varepsilon_7 \\ \varepsilon_8 \\ \varepsilon_9 \end{pmatrix}

2.9. Categorical predictor

2.9.1. Means parameterization

  Parameter   Estimates         Null hypothesis
  α1          mean of group 1   H0: α1 = μ1 = 0
  α2          mean of group 2   H0: α2 = μ2 = 0
  α3          mean of group 3   H0: α3 = μ3 = 0

> summary(lm(Y ~ -1 + A))$coef
    Estimate Std. Error   t value     Pr(>|t|)
AG1        3  0.5773503  5.196152 2.022368e-03
AG2        7  0.5773503 12.124356 1.913030e-05
AG3       11  0.5773503 19.052559 1.351732e-06

These hypotheses test whether each group mean differs from zero; however, we are typically more interested in exploring effects (differences among means).
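The Y and A objects used in these snippets are never shown being created; they can be rebuilt from the data listing above. A sketch of one way to do so:

Y <- c(2, 3, 4, 6, 7, 8, 10, 11, 12)
A <- factor(rep(c("G1", "G2", "G3"), each = 3))
tapply(Y, A, mean)   # group means 3, 7, 11 -- matching the AG1, AG2, AG3 estimates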

2.10. Categorical predictor

y_i = \mu + \beta_1(\text{dummy}_1)_i + \beta_2(\text{dummy}_2)_i + \beta_3(\text{dummy}_3)_i + \varepsilon_i

2.10.1. Effects parameterization

Dropping the first dummy variable instead leaves the intercept as the mean of the first (control) group, and the remaining parameters as effects:

y_{ij} = \mu + \beta_2(\text{dummy}_2)_{ij} + \beta_3(\text{dummy}_3)_{ij} + \varepsilon_{ij}

2.11. Categorical predictor

2.11.1. Effects parameterization

y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \quad i = 1, \ldots, p - 1

y_i = \mu + \beta_2(\text{dummy}_2)_i + \beta_3(\text{dummy}_3)_i + \varepsilon_i

   Y    A    intercept   dummy2   dummy3
   2    G1   1           0        0
   3    G1   1           0        0
   4    G1   1           0        0
   6    G2   1           1        0
   7    G2   1           1        0
   8    G2   1           1        0
  10    G3   1           0        1
  11    G3   1           0        1
  12    G3   1           0        1

2.12. Categorical predictor

2.12.1. Effects parameterization

(Same data listing as in section 2.8.)

y_i = \mu + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \varepsilon_i

where there are p − 1 effect parameters (p = number of levels of the factor) and the D's are dummy variables.

\begin{pmatrix} 2 \\ 3 \\ 4 \\ 6 \\ 7 \\ 8 \\ 10 \\ 11 \\ 12 \end{pmatrix} =
\begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \mu \\ \alpha_2 \\ \alpha_3 \end{pmatrix} +
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \\ \varepsilon_7 \\ \varepsilon_8 \\ \varepsilon_9 \end{pmatrix}
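R's default treatment contrasts generate exactly this effects model matrix. A quick sketch (assuming the same toy factor as before):

A <- factor(rep(c("G1", "G2", "G3"), each = 3))
model.matrix(~ A)   # an intercept column plus dummies for G2 and G3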

2.13. Categorical predictor

2.13.1. Treatment contrasts

  Parameter   Estimates                                     Null hypothesis
  Intercept   mean of control group                         H0: μ = μ1 = 0
  α2          mean of group 2 minus mean of control group   H0: α2 = μ2 − μ1 = 0
  α3          mean of group 3 minus mean of control group   H0: α3 = μ3 − μ1 = 0

> contrasts(A) <- contr.treatment
> contrasts(A)
   2 3
G1 0 0
G2 1 0
G3 0 1
> summary(lm(Y ~ A))$coef
            Estimate Std. Error  t value     Pr(>|t|)
(Intercept)        3  0.5773503 5.196152 2.022368e-03
A2                 4  0.8164966 4.898979 2.713682e-03
A3                 8  0.8164966 9.797959 6.506149e-05
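The treatment-contrast estimates can be cross-checked against the group means. A sketch, again assuming the toy data:

Y <- c(2, 3, 4, 6, 7, 8, 10, 11, 12)
A <- factor(rep(c("G1", "G2", "G3"), each = 3))
means <- tapply(Y, A, mean)
means[1]              # intercept: control (G1) mean = 3
means[2] - means[1]   # A2: G2 minus G1 = 4
means[3] - means[1]   # A3: G3 minus G1 = 8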

2.15. Categorical predictor

2.15.1. User defined contrasts

Two planned comparisons:

- Grp2 vs Grp3
- Grp1 vs (Grp2 & Grp3)

       Grp1   Grp2   Grp3
  α2      0      1     -1
  α3      1   -0.5   -0.5

> contrasts(A) <- cbind(c(0, 1, -1), c(1, -0.5, -0.5))
> contrasts(A)
   [,1] [,2]
G1    0  1.0
G2    1 -0.5
G3   -1 -0.5

2.16. Categorical predictor

2.16.1. User defined contrasts

- at most p − 1 comparisons (contrasts)
- all contrasts must be orthogonal

2.17. Categorical predictor

2.17.1. Orthogonality

Four groups (A, B, C, D) permit p − 1 = 3 comparisons:

1. A vs B :: A > B
2. B vs C :: B > C
3. A vs C :: already implied by the first two (A > B and B > C entail A > C), so it carries no independent information

2.18. Categorical predictor

2.18.1. User defined contrasts

> contrasts(A) <- cbind(c(0, 1, -1), c(1, -0.5, -0.5))
> contrasts(A)
   [,1] [,2]
G1    0  1.0
G2    1 -0.5
G3   -1 -0.5

To check orthogonality, multiply the two contrast columns element-wise and sum:

 0 × 1.0  = 0
 1 × -0.5 = -0.5
-1 × -0.5 = 0.5
sum = 0
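The same element-wise check can be done in one line. A minimal sketch:

cc <- cbind(c(0, 1, -1), c(1, -0.5, -0.5))
sum(cc[, 1] * cc[, 2])   # 0: this pair of contrasts is orthogonal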

2.19. Categorical predictor

2.19.1. User defined contrasts

> contrasts(A) <- cbind(c(0, 1, -1), c(1, -0.5, -0.5))
> contrasts(A)
   [,1] [,2]
G1    0  1.0
G2    1 -0.5
G3   -1 -0.5
> crossprod(contrasts(A))
     [,1] [,2]
[1,]    2  0.0
[2,]    0  1.5

The off-diagonal elements of the cross-product are zero, confirming that the contrasts are orthogonal.

> summary(lm(Y ~ A))$coef
            Estimate Std. Error   t value     Pr(>|t|)
(Intercept)        7  0.3333333 21.000000 7.595904e-07
A1                -2  0.4082483 -4.898979 2.713682e-03
A2                -4  0.4714045 -8.485281 1.465426e-04

2.20. Categorical predictor

2.20.1. User defined contrasts

> contrasts(A) <- cbind(c(1, -0.5, -0.5), c(1, -1, 0))
> contrasts(A)
   [,1] [,2]
G1  1.0    1
G2 -0.5   -1
G3 -0.5    0
> crossprod(contrasts(A))
     [,1] [,2]
[1,]  1.5  1.5
[2,]  1.5  2.0

Here the off-diagonal element is 1.5, not zero: these two contrasts are not orthogonal.

3. Partitioning of variance (ANOVA)

3.1. ANOVA

3.1.1. Partitioning variance

3.2. ANOVA

3.2.1. Partitioning variance

> anova(lm(Y ~ A))
Analysis of Variance Table

Response: Y
          Df Sum Sq Mean Sq F value    Pr(>F)
A          2     96      48      48 0.0002035 ***
Residuals  6      6       1
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

3.3. Categorical predictor

3.3.1. Post-hoc comparisons

  No. of groups   No. of comparisons   Familywise Type I error probability
   3                3                  0.14
   5               10                  0.40
  10               45                  0.90
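The familywise error probabilities in the table come from 1 − (1 − 0.05)^m, where m is the number of pairwise comparisons. A sketch reproducing the third column:

m <- c(3, 10, 45)            # comparisons among 3, 5 and 10 groups: choose(k, 2)
round(1 - (1 - 0.05)^m, 2)   # 0.14 0.40 0.90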

3.4. Categorical predictor

3.4.1. Post-hoc comparisons

Bonferroni

> summary(lm(Y ~ A))$coef
            Estimate Std. Error   t value     Pr(>|t|)
(Intercept)        7  0.3333333 21.000000 7.595904e-07
A1                -8  0.9428090 -8.485281 1.465426e-04
A2                 4  0.8164966  4.898979 2.713682e-03
> 0.05/3
[1] 0.01666667

Each comparison is judged against the Bonferroni-adjusted significance threshold of 0.05/3.

3.5. Categorical predictor

3.5.1. Post-hoc comparisons

Tukey's test

> library(multcomp)
> data.lm <- lm(Y ~ A)
> summary(glht(data.lm, linfct = mcp(A = "Tukey")))

	 Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts

Fit: lm(formula = Y ~ A)

Linear Hypotheses:
             Estimate Std. Error t value Pr(>|t|)
G2 - G1 == 0   4.0000     0.8165   4.899  0.00653 **
G3 - G1 == 0   8.0000     0.8165   9.798  < 0.001 ***
G3 - G2 == 0   4.0000     0.8165   4.899  0.00679 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)

3.6. Assumptions

- Normality
- Homogeneity of variance
- Independence

As for regression (see the diagnostic sketch below).
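These assumptions can be screened with the usual regression diagnostics. One possible sketch, not part of the original workshop code:

data.lm <- lm(Y ~ A)
plot(data.lm, which = 1)   # residuals vs fitted values: homogeneity of variance
plot(data.lm, which = 2)   # normal Q-Q plot: normality of residuals
boxplot(Y ~ A)             # group spread and symmetry at a glance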

4. Worked examples

4.1. Worked examples

> day <- read.csv('../data/day.csv', strip.white=T)
> head(day)
  TREAT BARNACLE
1  ALG1       27
2  ALG1       19
3  ALG1       18
4  ALG1       23
5  ALG1       25
6  ALG2       24

4.2. Worked examples

Question: what effects do different substrate types have on barnacle recruitment?

Linear model:

Barnacle_i = \mu + \alpha_j + \varepsilon_i, \quad \varepsilon \sim N(0, \sigma^2)

4.3. Worked examples

> partridge <- read.csv('../data/partridge.csv', strip.white=T)
> head(partridge)
  GROUP LONGEVITY
1 PREG8        35
2 PREG8        37
3 PREG8        49
4 PREG8        46
5 PREG8        63
6 PREG8        39
> str(partridge)
'data.frame':	125 obs. of  2 variables:
 $ GROUP    : Factor w/ 5 levels "NONE0","PREG1",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ LONGEVITY: int  35 37 49 46 63 39 46 56 63 65 ...

4.4. Worked examples

Question: what effect does mating have on the longevity of male fruit flies?

Linear model:

Longevity_i = \mu + \alpha_j + \varepsilon_i, \quad \varepsilon \sim N(0, \sigma^2)
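The worked examples stop at the model specification. One way the barnacle analysis might proceed, following the steps from sections 3.2 and 3.5 (a sketch, assuming day.csv has been read in as above):

day.lm <- lm(BARNACLE ~ TREAT, data = day)
anova(day.lm)          # partition the variance: overall test of substrate type
summary(day.lm)$coef   # treatment contrasts against the first substrate level

library(multcomp)      # Tukey post-hoc pairwise comparisons
summary(glht(day.lm, linfct = mcp(TREAT = "Tukey")))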