Multiple Predictor Variables: ANOVA

Linear Models with Many Predictors
- Multiple regression has many predictors
- BUT so did 1-way ANOVA if treatments had more than 2 levels
- What if there are multiple treatment types and combinations?
- What if we have spatial gradients in our experiments?

Multiway ANOVA
- Extends the multiple-predictor framework
- Categorical treatments are orthogonal
- Reflects the reality of experiments
- Stepping-stone to factorial designs

Blocked Designs

What if you manipulate two factors?

    Block 1   Block 2   Block 3   Block 4
       A         B         C         D
       B         C         D         A
       C         D         A         B
       D         A         B         C

Randomized Controlled Blocked Design: a design in which each treatment has exactly 1 replicate within each level of a second factor (here, block).

Note: The layout above is a Latin square design. Every row and column contains one replicate of each treatment.

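The Latin square layout above can be generated and checked in a few lines. This is a minimal sketch in base R, assuming four hypothetical treatment labels A to D and a simple cyclic construction; the helper `is_latin` and the row labels are illustrative names, not part of the original slides.

```r
# Hypothetical treatment labels; any k distinct labels would work.
treatments <- c("A", "B", "C", "D")
k <- length(treatments)

# Cyclic construction: row r is the treatment vector shifted left by r - 1.
latin <- t(sapply(seq_len(k), function(r) {
  treatments[((seq_len(k) + r - 2) %% k) + 1]
}))
dimnames(latin) <- list(paste("Position", 1:k), paste("Block", 1:k))

# Shuffling whole rows and whole columns keeps the Latin property
# while randomizing where each treatment ends up.
set.seed(1)
randomized <- latin[sample(k), sample(k)]
dimnames(randomized) <- dimnames(latin)

# Illustrative check: no treatment repeats within any row or column.
is_latin <- function(m) {
  all(apply(m, 1, function(x) !anyDuplicated(x))) &&
    all(apply(m, 2, function(x) !anyDuplicated(x)))
}

print(latin)
print(randomized)
is_latin(randomized)
```

The unshuffled `latin` matrix reproduces the grid on the slide; the shuffled version is one possible randomized assignment.
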
Effects of Stickleback Density on Zooplankton
Units were placed across a lake so that 1 set of each treatment was blocked together.

Treatment and Block Effects
[Figure: two panels showing the response by Treatment (control, high, low) and by Block (1 to 5)]

Modeling & Evaluating Multiple Factors

Model for Multiway ANOVA/ANODEV

$$y_k = \beta_0 + \sum \beta_i x_i + \sum \beta_j x_j + \epsilon_k$$

$$\epsilon_k \sim N(0, \sigma^2), \qquad x_i = 0, 1$$

Or, with matrices...

$$Y = X\beta + \epsilon$$

$$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} \beta_{i1} \\ \beta_{i2} \\ \beta_{j1} \\ \beta_{j2} \end{pmatrix} + \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \end{pmatrix}$$

We can have as many groups as we need, so long as there is sufficient replication of each treatment combination.

Hypotheses for Multiway ANOVA/ANODEV
Treatment $H_0$: $\mu_{i1} = \mu_{i2} = \mu_{i3} = \dots$
Block $H_0$: $\mu_{j1} = \mu_{j2} = \mu_{j3} = \dots$

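To see how the 0/1 indicator matrix relates to what R actually fits, the sketch below builds it for a hypothetical 2 x 2 layout. The data frame `toy` and its level names `i1`, `i2`, `j1`, `j2` are assumptions chosen to mirror the subscripts in the matrix; they are not the zooplankton data.

```r
# Hypothetical 2 x 2 layout, in the same row order as the indicator matrix:
# (i1, j1), (i1, j2), (i2, j1), (i2, j2).
toy <- data.frame(
  treatment = factor(c("i1", "i1", "i2", "i2")),
  block     = factor(c("j1", "j2", "j1", "j2"))
)

# One indicator column per level of each factor.
X_full <- cbind(
  model.matrix(~ 0 + treatment, data = toy),
  model.matrix(~ 0 + block, data = toy)
)
print(X_full)
qr(X_full)$rank  # rank is less than the number of columns

# What lm() uses by default: an intercept plus treatment contrasts,
# which drops the first level of each factor.
X <- model.matrix(~ treatment + block, data = toy)
print(X)
qr(X)$rank
```

The two treatment columns and the two block columns each sum to a column of ones, so the four-column indicator matrix is not of full rank. Treatment contrasts describe the same fitted values with three estimable coefficients, which is why the coefficient table from `lm()` shows an intercept and omits the first level of each factor.
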
Sums of Squares for Multiway ANOVA
- Factors are orthogonal and balanced, so SST = SSA + SSB + SSR
- F-test using mean squares, as before
- Type I and Type II SS will produce the same result

Before we model it, make sure Block is a factor

    zoop$block <- factor(zoop$block)

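The additive partition of sums of squares can be checked numerically. The sketch below uses simulated data, not the zooplankton data: the data frame `sim`, its effect sizes, and the seed are all assumptions made for illustration, with 3 treatments crossed with 5 blocks to mimic a balanced design.

```r
# Simulated balanced design (an assumption, not the real data):
# 3 treatments x 5 blocks, one observation per cell.
set.seed(42)
sim <- expand.grid(
  treatment = factor(c("control", "low", "high")),
  block     = factor(1:5)
)
sim$y <- rnorm(nrow(sim), mean = 3)

# Fit the additive model with the factors entered in both orders.
tab_ab <- anova(lm(y ~ treatment + block, data = sim))
tab_ba <- anova(lm(y ~ block + treatment, data = sim))

# Total sum of squares around the grand mean.
sst <- sum((sim$y - mean(sim$y))^2)

# SST = SSA + SSB + SSR
all.equal(sum(tab_ab[["Sum Sq"]]), sst)

# Balanced and orthogonal: the order of entry does not change each factor's SS.
all.equal(tab_ab["treatment", "Sum Sq"], tab_ba["treatment", "Sum Sq"])
all.equal(tab_ab["block", "Sum Sq"], tab_ba["block", "Sum Sq"])
```
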
Two-Way ANOVA as a Linear Model

    zoop_lm <- lm(zooplankton ~ treatment + block, data=zoop)

Check Diagnostics
[Figure: diagnostic plots for zoop_lm. Panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance, and Constant Leverage: Residuals vs Factor Levels. Observations 1, 13, and 14 are labeled.]

Residuals by Groups and No Non-Additivity
[Figure: Pearson residuals plotted against treatment, block, and fitted values]

Tukey's Test for Non-Additivity

    library(car)
    residualPlots(zoop_lm, cex.lab=1.4)
    #            Test stat Pr(>|t|)
    # treatment         NA       NA
    # block             NA       NA
    # Tukey test     0.474    0.635

The ANOVA
But first, what are the DF for...
- Treatment (with 3 levels)
- Block (with 5 blocks)
- Residuals (with n=15)

    anova(zoop_lm)
    # Analysis of Variance Table
    #
    # Response: zooplankton
    #           Df Sum Sq Mean Sq F value   Pr(>F)
    # treatment  2 6.8573  3.4287 16.3660 0.001488
    # block      4 2.3400  0.5850  2.7924 0.101031
    # Residuals  8 1.6760  0.2095

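The degrees of freedom and the treatment F-test in the table can be reproduced by hand. This sketch only does arithmetic on the sums of squares printed above; the variable names are illustrative.

```r
# Design sizes from the slide: 3 treatment levels, 5 blocks, n = 15.
n <- 15
n_treatment <- 3
n_block <- 5

# Each factor uses (levels - 1) df; residual df is what remains
# after the intercept and both factors.
df_treatment <- n_treatment - 1
df_block     <- n_block - 1
df_resid     <- n - 1 - df_treatment - df_block

# Mean squares and F for treatment, from the printed sums of squares.
ms_treatment <- 6.8573 / df_treatment
ms_resid     <- 1.6760 / df_resid
f_treatment  <- ms_treatment / ms_resid

# Upper-tail probability of the F distribution.
p_treatment <- pf(f_treatment, df_treatment, df_resid, lower.tail = FALSE)

c(df_treatment = df_treatment, df_block = df_block, df_resid = df_resid)
c(F = f_treatment, p = p_treatment)
```

Up to rounding of the printed sums of squares, the results should agree with the `Df`, `F value`, and `Pr(>F)` entries in the treatment row of the table.
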
Coefficients via Treatment Contrasts

    summary(zoop_lm)$coef
    #                    Estimate Std. Error       t value     Pr(>|t|)
    # (Intercept)    3.420000e+00  0.3126766  1.093782e+01 4.330286e-06
    # treatmenthigh -1.640000e+00  0.2894823 -5.665286e+00 4.729729e-04
    # treatmentlow  -1.020000e+00  0.2894823 -3.523532e+00 7.805477e-03
    # block2         1.039137e-15  0.3737200  2.780521e-15 1.000000e+00
    # block3        -7.000000e-01  0.3737200 -1.873060e+00 9.794523e-02
    # block4        -1.000000e+00  0.3737200 -2.675800e+00 2.810839e-02
    # block5        -3.000000e-01  0.3737200 -8.027399e-01 4.453163e-01

Unique Effect of Each Treatment

    crPlots(zoop_lm)

[Figure: Component + Residual Plots of zooplankton against treatment (control, high, low) and block (1 to 5)]

Unique Effect of Each Treatment (visreg)
[Figure: visreg plots of zooplankton against treatment (control, high, low) and block (1 to 5)]

Exercise: Bees!
- Load the Bee Gene Expression Data
- Does bee type or colony matter?
- How much variation does this experiment explain?

Bee ANOVA

    anova(bee_lm)
    # Analysis of Variance Table
    #
    # Response: Expression
    #           Df  Sum Sq Mean Sq F value  Pr(>F)
    # type       1 2.69340 2.69340 35.3465 0.02714
    # colony     2 0.34293 0.17147  2.2502 0.30767
    # Residuals  2 0.15240 0.07620

Bee Effects

    crPlots(bee_lm)

[Figure: Component + Residual Plots of Expression against type (for, nurse) and colony (1 to 3)]

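The exercise asks how much variation the experiment explains. One way to answer from the printed table alone is to compute R-squared from the sums of squares; the sketch below does only that arithmetic, and the vector name `ss` is illustrative.

```r
# Sums of squares copied from the bee ANOVA table.
ss <- c(type = 2.69340, colony = 0.34293, Residuals = 0.15240)

# Proportion of total variation explained by type and colony together.
r2 <- 1 - ss[["Residuals"]] / sum(ss)

# Share of the total attributable to each term.
round(ss / sum(ss), 3)
r2
```

With the model object in hand, `summary(bee_lm)$r.squared` should give the same quantity up to rounding of the printed sums of squares.
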
What if my data is unbalanced?

Unbalancing the Zooplankton Data

    zoop_u <- zoop[-c(1,2),]

An Unbalanced ANOVA

    zoop_u_lm <- update(zoop_lm, data=zoop_u)
    anova(zoop_u_lm)
    # Analysis of Variance Table
    #
    # Response: zooplankton
    #           Df Sum Sq Mean Sq F value   Pr(>F)
    # treatment  2 4.1751 2.08754  16.481 0.003652
    # block      4 1.7480 0.43700   3.450 0.086009
    # Residuals  6 0.7600 0.12667

Is this valid? Can we use Type I sequential SS?

Unbalanced Data and Type I SS
Missing cells (i.e., treatment-block combinations) mean that order matters in testing SS

    zoop_u_lm1 <- lm(zooplankton ~ treatment + block, data=zoop_u)
    zoop_u_lm2 <- lm(zooplankton ~ block + treatment, data=zoop_u)

For treatment, Intercept versus Treatment and Block versus Block + Treatment no longer produce the same SS.

Treatment entered first (zoop_u_lm1):

    # Analysis of Variance Table
    #
    # Response: zooplankton
    #           Df Sum Sq Mean Sq F value   Pr(>F)
    # treatment  2 4.1751 2.08754  16.481 0.003652
    # block      4 1.7480 0.43700   3.450 0.086009
    # Residuals  6 0.7600 0.12667

Block entered first (zoop_u_lm2):

    # Analysis of Variance Table
    #
    # Response: zooplankton
    #           Df Sum Sq Mean Sq F value   Pr(>F)
    # block      4 2.2364 0.55910   4.414 0.052852
    # treatment  2 3.6867 1.84333  14.553 0.004993
    # Residuals  6 0.7600 0.12667

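The order dependence can be reproduced without the original data. The sketch below simulates a 3 x 5 blocked design, drops two rows to unbalance it, and compares sequential (Type I) SS under both orders. The data frame `sim`, the effect sizes, and the seed are assumptions for illustration only.

```r
# Simulated blocked design (an assumption, not the zooplankton data).
set.seed(607)
sim <- expand.grid(
  treatment = factor(c("control", "low", "high")),
  block     = factor(1:5)
)
sim$y <- 3 -
  0.8 * (sim$treatment == "low") -
  1.5 * (sim$treatment == "high") -
  0.2 * as.integer(sim$block) +
  rnorm(nrow(sim), sd = 0.4)

# Remove two cells, as on the slide, so the design is no longer balanced.
sim_u <- sim[-c(1, 2), ]

# Sequential SS with each factor entered first.
tab_tb <- anova(lm(y ~ treatment + block, data = sim_u))
tab_bt <- anova(lm(y ~ block + treatment, data = sim_u))

# The treatment SS now depends on whether block was already in the model.
c(treatment_first = tab_tb["treatment", "Sum Sq"],
  treatment_last  = tab_bt["treatment", "Sum Sq"])

# The residual SS is the same either way, because the full model is the same.
all.equal(tab_tb["Residuals", "Sum Sq"], tab_bt["Residuals", "Sum Sq"])
```
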
Solution: Marginal, or Type II SS
- SS of Block: Treatment versus Treatment + Block
- SS of Treatment: Block versus Block + Treatment

Note: Because of marginality, the sum of all SS will no longer equal SST

    Anova(zoop_u_lm1)
    # Anova Table (Type II tests)
    #
    # Response: zooplankton
    #           Sum Sq Df F value   Pr(>F)
    # treatment 3.6867  2  14.553 0.004993
    # block     1.7480  4   3.450 0.086009
    # Residuals 0.7600  6

Note the capital A - this is a function from the car package.
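
For an additive model like this one, the marginal comparisons listed above can also be carried out in base R, which may help show what the Type II table is computing. This is a sketch on simulated data (the data frame `sim`, effect sizes, and seed are assumptions), using nested-model comparisons with `anova()` and the equivalent shortcut `drop1()`; it is not a replacement for `car::Anova`.

```r
# Simulated unbalanced blocked design (an assumption, not the real data).
set.seed(607)
sim <- expand.grid(
  treatment = factor(c("control", "low", "high")),
  block     = factor(1:5)
)
sim$y <- 3 -
  0.8 * (sim$treatment == "low") -
  1.5 * (sim$treatment == "high") -
  0.2 * as.integer(sim$block) +
  rnorm(nrow(sim), sd = 0.4)
sim_u <- sim[-c(1, 2), ]

full         <- lm(y ~ treatment + block, data = sim_u)
no_treatment <- lm(y ~ block, data = sim_u)
no_block     <- lm(y ~ treatment, data = sim_u)

# SS of Treatment: Block versus Block + Treatment.
ss_treatment <- anova(no_treatment, full)[2, "Sum of Sq"]
# SS of Block: Treatment versus Treatment + Block.
ss_block     <- anova(no_block, full)[2, "Sum of Sq"]

# drop1() removes each term in turn from the full model,
# which gives the same marginal SS in one call.
marginal <- drop1(full, test = "F")
print(marginal)

# The marginal SS plus the residual SS need not add up to SST.
sst <- sum((sim_u$y - mean(sim_u$y))^2)
c(sum_of_parts = ss_treatment + ss_block + deviance(full), SST = sst)
```

The same pattern is visible in the zooplankton output above: the Type II treatment SS matches the treatment SS when block is entered first, and the Type II block SS matches the block SS when treatment is entered first.
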