Introduction to Analysis of Variance (ANOVA) Part 2

Single factor

Serpulid recruitment and biofilms
- Effect of biofilm type on number of recruiting serpulid worms in Port Phillip Bay
- Response variable: number of newly recruited worms
- Predictor variable: biofilm type, categorical with 4 groups (sterile substrata, lab biofilms with net, lab biofilms without net, field biofilms with net)
- Fixed or random?
- Replicates are settlement plates
Serpulid recruitment and biofilms

Source         df   MS      F      P
Biofilm type    3   0.080   6.01   0.003
Residual       24   0.013
Total          27
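As a rough illustration of where the df, MS and F columns of a single-factor ANOVA table come from, the sketch below partitions the variation into a between-groups part and a residual part. The function name `one_way_anova` and the data are hypothetical; these are not the serpulid counts.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (df_groups, df_resid, ms_groups, ms_resid, f_ratio)."""
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    # Between-groups SS: group size times squared deviation of group mean
    ss_groups = sum(len(g) * (m - grand_mean) ** 2
                    for g, m in zip(groups, group_means))
    # Residual SS: squared deviations of observations from their group mean
    ss_resid = sum((y - m) ** 2
                   for g, m in zip(groups, group_means) for y in g)
    df_groups = len(groups) - 1
    df_resid = len(all_values) - len(groups)
    ms_groups = ss_groups / df_groups
    ms_resid = ss_resid / df_resid
    return df_groups, df_resid, ms_groups, ms_resid, ms_groups / ms_resid

# Hypothetical data: 3 groups of 3 replicates (NOT the serpulid data)
example = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
df_g, df_r, ms_g, ms_r, f_ratio = one_way_anova(example)
print(df_g, df_r, ms_g, ms_r, f_ratio)
```

With 4 groups and 28 settlement plates, the same df formulas give 3 and 24, as in the table.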
[Figure: F distribution for 3 and 24 df, P(F) against F, with the critical value F = 3.01 marked at P = 0.05]

- Any F-ratio > 3.01 has a < 0.05 (5%) chance of occurring if H0 is true
- Serpulid example: F(3, 24) = 6.01
- We reject H0: statistically significant result
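The tail probabilities quoted above can be approximated without a statistics package. The sketch below integrates the F density numerically with Simpson's rule; the function names and the step count are illustrative choices, and the approach assumes a numerator df greater than 2 so that the density is finite at zero. In practice a statistics library would normally be used instead.

```python
import math

def f_pdf(x, d1, d2):
    """Density of the F distribution with d1 and d2 degrees of freedom."""
    if x <= 0:
        return 0.0
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))
    log_pdf = (0.5 * (d1 * math.log(d1 * x) + d2 * math.log(d2)
                      - (d1 + d2) * math.log(d1 * x + d2))
               - math.log(x) - log_beta)
    return math.exp(log_pdf)

def f_upper_tail(f_value, d1, d2, steps=20000):
    """Approximate P(F > f_value) as 1 minus the integral over [0, f_value].

    Uses Simpson's rule; steps must be even.
    """
    h = f_value / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(f_value, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return 1.0 - total * h / 3

p_critical = f_upper_tail(3.01, 3, 24)  # critical value from the slide
p_observed = f_upper_tail(6.01, 3, 24)  # serpulid F-ratio
print(round(p_critical, 3), round(p_observed, 3))
```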
[Figure: Log(serpulids) + SE (y-axis, 1.7 to 2.4) for each biofilm treatment (F, NL, SL, UL)]

Assumptions
- Apply to response variable within each group
- Apply to error terms from linear model
Normality
- Observations within each group come from normally distributed populations
- ANOVA is robust: use boxplots to check for skewness and outliers
- Use probability plots to check for overall normality of data

Homogeneity of variance
- Variances of group populations are the same
- Skewed populations produce unequal group variances
- ANOVA is reliable if group ns are equal and variances are not too different: ratio of largest to smallest variance no more than about 3:1
- Tests for equal variances: Bartlett's, Cochran's, Levene's tests
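The 3:1 guideline is easy to check before running the ANOVA. A minimal sketch, using hypothetical data and a hypothetical helper name `variance_ratio`:

```python
from statistics import variance

def variance_ratio(groups):
    """Ratio of the largest to the smallest sample variance among groups."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)

# Hypothetical data: three groups of equal size
groups = [[1, 2, 3], [2, 4, 6], [3, 4, 5]]
ratio = variance_ratio(groups)
print(ratio, "within 3:1 guideline" if ratio <= 3 else "exceeds 3:1 guideline")
```

A ratio above the guideline suggests looking at a transformation or one of the robust alternatives before relying on the F test.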
Residual
- Difference between observed and predicted value of response variable
- ANOVA residual is the difference between each Y-value and its group mean: $(y_{ij} - \bar{y}_i)$
- Residual plot: residuals against group means
- Outliers

Other plots
- Plot group variances against group means
- In skewed distributions (lognormal and Poisson), variance is positively related to mean
- In symmetrical distributions, variance is independent of mean
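The residuals for a residual plot can be computed directly from the definition. The sketch below pairs each group mean with its residuals, which is the form needed to plot residuals against group means. The data and the function name `anova_residuals` are hypothetical.

```python
from statistics import mean

def anova_residuals(groups):
    """Return a list of (group_mean, residuals), residual = y_ij - mean_i."""
    result = []
    for g in groups:
        m = mean(g)
        result.append((m, [y - m for y in g]))
    return result

# Hypothetical data: two groups, each with one unusually large value
groups = [[1.0, 2.0, 6.0], [4.0, 5.0, 9.0]]
pairs = anova_residuals(groups)
for group_mean, resid in pairs:
    print(group_mean, resid)
```

Residuals within each group sum to zero by construction, so a large positive residual such as the last value in each group stands out as a candidate outlier.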
Independence
- Observations independent within and between groups
- No replicate used more than once
- Must be considered at design stage

Robust ANOVA
- Tests with unequal variances: Welch test, Wilcox Z test
- Rank-based nonparametric tests: Kruskal-Wallis test, RT ANOVA
- Randomization test
- Generalized linear modeling
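As one example of the rank-based alternatives, the Kruskal-Wallis statistic H can be sketched in a few lines. This version assumes there are no tied values (ties need average ranks and a correction, which are omitted here); the data and function name are hypothetical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic; this sketch assumes no tied values."""
    pooled = sorted(y for g in groups for y in g)
    rank = {y: i + 1 for i, y in enumerate(pooled)}  # rank 1 = smallest value
    n_total = len(pooled)
    # Sum over groups of (rank sum squared) / (group size)
    sum_term = sum(sum(rank[y] for y in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * sum_term - 3 * (n_total + 1)

# Hypothetical data: three completely separated groups
groups = [[1.2, 2.5, 3.1], [4.0, 5.3, 6.8], [7.7, 8.1, 9.4]]
h = kruskal_wallis_h(groups)
print(h)
```

H is usually compared with a chi-square distribution with (number of groups - 1) degrees of freedom, an approximation that is rough for samples as small as these.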
ANOVA with 2 groups
- Null hypothesis: no difference between 2 population means
- ANOVA F-ratio test or t test
- $F = t^2$
- P values identical

Specific comparisons of groups
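The two-group identity F = t^2 can be checked numerically. The sketch below computes a pooled-variance t statistic and a single-factor ANOVA F-ratio for the same two hypothetical samples; the function names are illustrative.

```python
import math
from statistics import mean

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance."""
    ss_a = sum((y - mean(a)) ** 2 for y in a)
    ss_b = sum((y - mean(b)) ** 2 for y in b)
    pooled_var = (ss_a + ss_b) / (len(a) + len(b) - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))

def two_group_f(a, b):
    """Single-factor ANOVA F-ratio for two groups (1 df for groups)."""
    grand = mean(a + b)
    ms_groups = (len(a) * (mean(a) - grand) ** 2
                 + len(b) * (mean(b) - grand) ** 2)  # SS / 1 df
    ss_resid = (sum((y - mean(a)) ** 2 for y in a)
                + sum((y - mean(b)) ** 2 for y in b))
    return ms_groups / (ss_resid / (len(a) + len(b) - 2))

# Hypothetical samples with unequal sizes
a = [4.1, 5.0, 6.2, 5.5]
b = [6.8, 7.4, 8.1, 7.0, 7.9]
t = pooled_t(a, b)
f = two_group_f(a, b)
print(t ** 2, f)
```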
Type I error
- Probability of rejecting H0 when it is true: probability of a false significant result
- Set by significance level α (e.g. 0.05): 5% chance of falsely rejecting H0
- Probability of Type I error applies to each separate test

Specific comparisons of means
- Which groups are significantly different from which?
- Multiple pairwise t tests: each test with α = 0.05
- Increasing Type I error rate: the probability of at least one Type I error among all comparisons (familywise Type I error rate) increases
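How quickly the familywise rate grows can be sketched numerically. The code below assumes the c comparisons are independent, so that the chance of at least one Type I error is 1 - (1 - α)^c; pairwise comparisons among the same groups are not independent, so this is an approximation. Function names are illustrative.

```python
def n_pairwise(k):
    """Number of pairwise comparisons among k groups."""
    return k * (k - 1) // 2

def familywise_rate(alpha, c):
    """Probability of at least one Type I error in c independent tests."""
    return 1 - (1 - alpha) ** c

for k in (3, 5, 10):
    c = n_pairwise(k)
    print(k, c, round(familywise_rate(0.05, c), 2))
```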
Control of familywise error rate

No. of groups   No. of comparisons   Familywise probability of Type I error (α = 0.05)
 3               3                   0.14
 5              10                   0.40
10              45                   0.90

Familywise error rate = $1 - (1 - \alpha)^c$
where:
- $\alpha$ = critical p-value (probability of Type I error)
- $c$ = number of comparisons

Unplanned pairwise comparisons
Unplanned comparisons
- Comparisons done after significant ANOVA F test
- Comparing each group to each other group: which are significantly different from which?
- Lots of comparisons: not independent
- Control familywise (FW) Type I error rate to 0.05: significance level for each comparison must be below 0.05
- Termed unplanned (pairwise) multiple comparisons
- Test statistics: F, t, Q (studentized range statistic)
Multiple comparison tests
- Fisher's Least Significant Difference (LSD), Student-Newman-Keuls (SNK) test, Duncan's Multiple Range test: incomplete control of FW Type I error, not recommended
- Tukey's test, Ryan's (REGW) test: recommended
- Bonferroni adjusted pairwise tests (e.g. t tests): least powerful, most conservative

The logic of Bonferroni adjusted pairwise tests
- Recall: familywise error rate = $1 - (1 - \alpha)^c$
- Therefore a conservative correction is to divide the desired level of Type I error by the number of comparisons. This yields a new acceptable Type I error for each individual comparison: new critical p-value = $\alpha / c$
- For example, with α (familywise) = 0.05 and number of comparisons = 10, the new critical p-value for an individual comparison = 0.05/10 = 0.005
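The worked example can be checked, along with why the correction is called conservative. The sketch below applies the division and then plugs the per-comparison level back into the familywise formula, again assuming independent comparisons; function names are illustrative.

```python
def bonferroni_alpha(familywise_alpha, c):
    """Per-comparison critical p-value under the Bonferroni correction."""
    return familywise_alpha / c

def familywise_rate(alpha, c):
    """Probability of at least one Type I error in c independent tests."""
    return 1 - (1 - alpha) ** c

per_test = bonferroni_alpha(0.05, 10)
achieved = familywise_rate(per_test, 10)
print(per_test, round(achieved, 4))
```

The achieved familywise rate comes out slightly below the 0.05 target, which is the sense in which the Bonferroni adjustment errs on the conservative side.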
Relationship between education level and income: assumption assessment using untransformed data
[Figure: diagnostic plot, normal probability plot of INCOME (0 to 70)]
Test of homogeneity of variance

Relationship between education level and income: assumption assessment using log transformed data
[Figure: chart of Mean(Log(Income)) against EDUCATN (levels 1 to 7); each error bar is constructed using 1 standard error from the mean]
[Figure: diagnostic plot, normal probability plot of Log(Income)]
Test of homogeneity of variance
Formal ANOVA
Question: which groups differ? Note that this implies you have no specific hypotheses.

Least Squares Means Table
Level   Least Sq Mean   Std Error    Mean
1       0.9053344       0.18244865   0.90533
2       1.0240003       0.04609484   1.02400
3       1.2077589       0.03192186   1.20776
4       1.3462042       0.04710804   1.34620
5       1.2706101       0.04935252   1.27061
6       1.4823576       0.08445731   1.48236
7       1.5126769       0.11172652   1.51268

Tukey test on all pairwise comparisons
LSMeans Differences Tukey HSD, α = 0.050, Q = 2.97261
Each row gives Mean[i] - Mean[j] for i < j.

i   j   Mean[i]-Mean[j]   Std Err Dif   Lower CL Dif   Upper CL Dif
1   2   -0.11867          0.18818       -0.67806        0.44072
1   3   -0.30242          0.18522       -0.85301        0.24816
1   4   -0.44087          0.18843       -1.00101        0.11927
1   5   -0.36528          0.18901       -0.92712        0.19657
1   6   -0.57702          0.20105       -1.17466        0.02062
1   7   -0.60734          0.21394       -1.2433         0.0286
2   3   -0.18376          0.05607       -0.35043       -0.01709
2   4   -0.3222           0.06591       -0.51812       -0.12628
2   5   -0.24661          0.06753       -0.44735       -0.04587
2   6   -0.45836          0.09622       -0.74437       -0.17234
2   7   -0.48868          0.12086       -0.84795       -0.1294
3   4   -0.13845          0.0569        -0.3076         0.03071
3   5   -0.06285          0.05878       -0.23757        0.11187
3   6   -0.2746           0.09029       -0.54299       -0.00621
3   7   -0.30492          0.1162        -0.65033        0.0405
4   5    0.0756           0.06823       -0.12722        0.2784
4   6   -0.13615          0.09671       -0.42363        0.15132
4   7   -0.16647          0.12125       -0.52691        0.194
5   6   -0.21175          0.09782       -0.50253        0.07903
5   7   -0.24207          0.12214       -0.60515        0.121
6   7   -0.03032          0.14006       -0.44665        0.386

Level   Letters   Least Sq Mean
7       A B       1.5126769
6       A         1.4823576
4       A B       1.3462042
5       A B       1.2706101
3         B       1.2077589
2           C     1.0240003
1       A B C     0.9053344

Levels not connected by same letter are significantly different.
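The confidence limits in the Tukey output appear to follow the pattern difference ± Q × (standard error of the difference), with Q = 2.97261 as reported. The sketch below applies that pattern to two pairs, using least squares means and standard errors taken from the output; the function names are illustrative, and the reading of Q as a direct multiplier on the standard error is an assumption checked only against these values.

```python
Q = 2.97261  # multiplier reported in the Tukey HSD output

def tukey_interval(mean_i, mean_j, se_diff, q=Q):
    """Confidence interval for Mean[i] - Mean[j]."""
    diff = mean_i - mean_j
    return diff - q * se_diff, diff + q * se_diff

def is_significant(interval):
    """A pair differs significantly when the interval excludes zero."""
    lower, upper = interval
    return lower > 0 or upper < 0

ls_means = {1: 0.9053344, 2: 1.0240003, 3: 1.2077589}
ci_1_2 = tukey_interval(ls_means[1], ls_means[2], 0.18818)
ci_2_3 = tukey_interval(ls_means[2], ls_means[3], 0.05607)
print(ci_1_2, is_significant(ci_1_2))
print(ci_2_3, is_significant(ci_2_3))
```

The interval for levels 1 and 2 spans zero while the interval for levels 2 and 3 does not, which agrees with the letter groupings: level 1 shares the letter C with level 2, whereas levels 2 and 3 share no letter.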
The problems with presentation and unplanned comparisons
[Figure: two panels of group means labelled with Tukey letter groupings: ABC, C, B, AB, AB, A, AB]

Planned comparisons
Planned comparisons: what you should be doing!
- Also called contrasts
- Interesting and logical comparisons of means or combinations of means
- Planned before data analysis
- Ideally independent: therefore only a small number of comparisons allowed

Contrast logic (assume 4 groups)
- Array must sum to 0
- 1 1 -1 -1 = -1 -1 1 1 = 2 2 -2 -2 (all compare the first 2 groups to the second 2 groups)
- 1 -1 0 0 or -1 1 0 0 compares the 1st and 2nd groups
- 2 1 -3 0 compares the first 2 groups to the 3rd and weights the 1st group twice as much as the 2nd
- -1.5 -0.5 0.5 1.5 tests for a linear trend in groups
- Or simply set polynomial order = 1 in the contrast window; polynomial order = 2 tests for a quadratic fit
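The sum-to-zero rule and the equivalence of rescaled arrays can be checked mechanically. In the sketch below the coefficient signs are written out under the assumption that each array sums to zero as the rule requires; the function names are hypothetical.

```python
def is_valid_contrast(coefficients, tol=1e-9):
    """A contrast array is valid when its coefficients sum to zero."""
    return abs(sum(coefficients)) < tol

def is_proportional(c1, c2, tol=1e-9):
    """True when the two arrays are proportional (all cross-products vanish)."""
    return all(abs(a * d - b * c) < tol
               for a, b in zip(c1, c2) for c, d in zip(c1, c2))

# Four-group contrast arrays (signs assumed so that each sums to zero)
contrasts = {
    "first two vs last two": [1, 1, -1, -1],
    "group 1 vs group 2": [1, -1, 0, 0],
    "groups 1-2 vs group 3, weighted": [2, 1, -3, 0],
    "linear trend": [-1.5, -0.5, 0.5, 1.5],
}
for name, coefficients in contrasts.items():
    print(name, is_valid_contrast(coefficients))

# Rescaling or flipping the sign leaves the comparison unchanged
print(is_proportional([1, 1, -1, -1], [2, 2, -2, -2]))
print(is_proportional([1, 1, -1, -1], [-1, -1, 1, 1]))
```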
- Number of independent comparisons ≤ df for groups, e.g. 7 groups, 6 df, maximum 6 independent contrasts
- Each test can be done at 0.05: no correction for increased familywise error rate?

Methods for planned comparisons
Partition variance ANOVA
- Partition SS Groups: SS for each comparison
- 1 df test with F-ratio test as part of ANOVA: $F = MS_{Contrast} / MS_{Residual}$
- $H_0: \mu_1 = \mu_2$ or $H_0: \mu_1 - \mu_2 = 0$
- Linear combination of means using coefficients ($c_i$s): $c_1\bar{y}_1 + c_2\bar{y}_2 + \dots + c_i\bar{y}_i$, where $\sum c_i = 0$
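A 1 df contrast test can be sketched from group means alone. The code below uses the standard result that the contrast sum of squares is the squared linear combination divided by the sum of c_i^2 / n_i, then divides by the residual mean square to get F. The means, sample sizes, residual mean square and function name are all hypothetical.

```python
def contrast_f(means, ns, coefficients, ms_residual):
    """Return (estimate, ss_contrast, f_ratio) for a 1 df planned contrast."""
    assert abs(sum(coefficients)) < 1e-9, "contrast coefficients must sum to 0"
    # Linear combination of group means
    estimate = sum(c * m for c, m in zip(coefficients, means))
    # Contrast SS has 1 df, so MS_contrast equals SS_contrast
    ss_contrast = estimate ** 2 / sum(c ** 2 / n for c, n in zip(coefficients, ns))
    return estimate, ss_contrast, ss_contrast / ms_residual

# Hypothetical group means, sample sizes and residual mean square
means = [10.0, 12.0, 15.0, 11.0]
ns = [5, 5, 5, 5]
estimate, ss_contrast, f_ratio = contrast_f(means, ns, [1, -1, 0, 0], ms_residual=2.0)
print(estimate, ss_contrast, f_ratio)
```

The resulting F-ratio would be compared with an F distribution with 1 and the residual degrees of freedom.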
Newman (1994) Ecology 75:1085-1096
Effects of changing food levels on size and age at metamorphosis of tadpoles
- Four treatments used: low food (n=5), medium food (n=8), high food (n=6), food decreasing from high to low (n=7)
- H0: no effect of food levels on size of toads at metamorphosis
- Planned comparison of decreasing food vs constant high food. H0: no difference between decreasing food and high food on size of toads at metamorphosis

Source               df   SS       F       P
Food                  3   0.0448   17.41   <0.001
High vs decreasing    1   0.0345   40.27   <0.001
Residual             22   0.0189
Example: ANOVA coupled with hypothesis tests
Does educational level affect income?

Seven categories
1: No High School Degree
2: Dropped out of HS
3: High School Degree
4: Some College
5: College Degree
6: Some postgraduate study
7: Postgraduate degree

Specific hypotheses
- H1: Postgraduate degree > No postgraduate degree
- H2: Postgraduate experience > No postgraduate experience
- H3: College experience but no postgrad experience > No college experience

Survey2
Check assumptions: use log transformed data (as noted above)
Specific hypotheses
- H1: Postgraduate degree > No postgraduate degree
- H2: Postgraduate experience > No postgraduate experience
- H3: College experience but no postgrad experience > No college experience

Contrast specification (EDUCATN levels 1 to 7)

Contrast   1        2        3        4        5        6        7
H1        -0.167   -0.167   -0.167   -0.167   -0.167   -0.167    1
H2        -0.2     -0.2     -0.2     -0.2     -0.2      0.5      0.5
H3        -0.333   -0.333   -0.333    0.5      0.5      0        0

Output: Analysis of Variance

Source      Sum of Squares    df   Mean Square   F-ratio   P
Education    4.56              6   0.76           7.62     <0.0001
H1           0.670             1   0.670          6.80     0.0097
H2           1.834             1   1.834         18.36     <0.0001
H3           1.322             1   1.322         13.24     0.0003
Error       29.43            249   0.099
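Whether the three education contrasts are independent of one another can be examined with a dot product. The sketch below writes the coefficients as exact fractions, with signs assumed so that each array sums to zero and the group named first in each hypothesis gets the positive weight. It uses the equal-sample-size definition of orthogonality (sum of products of coefficients equals zero); with unequal group sizes, as here, the products would be weighted by 1/n, so this is only an approximate check.

```python
from fractions import Fraction as Fr

# Coefficients for EDUCATN levels 1 to 7 (signs are an assumption)
H1 = [Fr(-1, 6)] * 6 + [Fr(1)]
H2 = [Fr(-1, 5)] * 5 + [Fr(1, 2)] * 2
H3 = [Fr(-1, 3)] * 3 + [Fr(1, 2)] * 2 + [Fr(0)] * 2

def dot(c1, c2):
    """Sum of products of paired coefficients; zero means orthogonal."""
    return sum(a * b for a, b in zip(c1, c2))

for name, c in (("H1", H1), ("H2", H2), ("H3", H3)):
    print(name, "sums to", sum(c))

print("H1.H2 =", dot(H1, H2))
print("H1.H3 =", dot(H1, H3))
print("H2.H3 =", dot(H2, H3))
```

Under this definition H3 is orthogonal to both H1 and H2, while H1 and H2 are not orthogonal to each other, which fits the fact that both weight the postgraduate degree group positively.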
Trend analyses
- Trend through quantitative factor levels
- Orthogonal polynomials: linear trend, quadratic trend etc.
- Spacing of factor levels
[Figure: three panels of Y against Group showing linear, quadratic and cubic trends]

Test for linear trend using contrasts
Contrast specification (EDUCATN levels 1 to 7)

Level          1      2        3        4   5        6        7
Coefficient   -0.5   -0.333   -0.167    0   0.1667   0.3333   0.5
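Trend coefficients of this kind can be generated rather than typed in. The sketch below assumes equally spaced factor levels: linear coefficients are centred level scores scaled to run from -0.5 to 0.5, and quadratic coefficients are centred squared scores. The function names and the scaling are illustrative choices.

```python
from fractions import Fraction

def linear_trend(k):
    """Linear trend coefficients for k equally spaced levels, from -1/2 to 1/2."""
    centre = Fraction(k - 1, 2)
    return [(Fraction(i) - centre) / (k - 1) for i in range(k)]

def quadratic_trend(k):
    """Quadratic trend coefficients: squared centred scores minus their mean."""
    centre = Fraction(k - 1, 2)
    squares = [(Fraction(i) - centre) ** 2 for i in range(k)]
    mean_square = sum(squares) / k
    return [s - mean_square for s in squares]

linear = linear_trend(7)
quadratic = quadratic_trend(7)
print([round(float(c), 3) for c in linear])
print([int(c) for c in quadratic])
# Both arrays sum to zero and are orthogonal to each other
print(sum(linear), sum(quadratic), sum(a * b for a, b in zip(linear, quadratic)))
```

For unequally spaced levels the coefficients would need to be built from the actual level values instead of the positions 0 to k - 1.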