Multiple Comparisons: Error Rates, A Priori Tests, and Post-Hoc Tests

Multiple Comparisons: A Rationale
- Multiple comparison tests tease apart differences between the groups within our IV when we conduct ANOVAs
- The ANOVA provides a single F statistic assessing the tenability of H0, NOT how the groups differ (it is an omnibus test)
- Rejecting H0 simply tells us that H0: µ1 = µ2 = µ3 is not an accurate representation of the data
Multiple Comparisons: A Rationale
- Any of the following could be true:
  H1: µ1 ≠ µ2 ≠ µ3
  H1: µ1 ≠ µ2 = µ3
  H1: µ1 ≠ µ3 = µ2
  H1: µ1 = µ2 ≠ µ3
  H1: µ1 = µ3 ≠ µ2
- All are potentially valid alternative hypotheses. Which is most accurate?

Multiple Comparisons: Example
- Going back to our AN treatment example: which treatment is best for AN?
  H0: µ1 = µ2 = µ3
  H1: µ1 ≠ µ2 ≠ µ3
  H1: µ1 ≠ µ2 = µ3
  H1: µ1 ≠ µ3 = µ2
  H1: µ1 = µ2 ≠ µ3
  H1: µ1 = µ3 ≠ µ2
  (where µ1, µ2, and µ3 are the means of the three treatment groups)
Multiple Comparisons: Example
- We could simply compare differences between groups by conducting a series of individual two-group t-tests:
  Group 1 vs. Group 2
  Group 1 vs. Group 3
  Group 2 vs. Group 3
- From these three analyses, we can answer which of the alternative hypotheses best fits the data
- Why don't we do it this way?

Multiple Comparisons: The Limitation
- Too much work (3 more analyses!)
- Inflation of the Type I error rate
Error Associated With Multiple Comparisons
- When we want to consider error rates for a set of analyses, there are two ways to calculate error: per-comparison and familywise

Per-Comparison Error
- The probability of making an error on each individual analysis
- Could be Type I or Type II, but generally we think only about Type I error
- Per-comparison (PC) error rate = α
- Thus, for every analysis we conduct, there is a fixed amount of error we have to live with
Familywise Error
- The probability of making at least one error across a set of comparisons
- Familywise error rate: FW = 1 − (1 − α)^c
  where α = the per-comparison alpha level and c = the number of comparisons made

Error Associated With Multiple Comparisons
- For our AN treatment example:
  FW = 1 − (1 − .05)^3 = 1 − (.95)^3 = 1 − .8574 ≈ .14
- The familywise error rate is slightly less than the sum of the α values for all 3 analyses (3 × .05 = .15)
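The familywise formula can be verified with a few lines of Python (a minimal sketch; the function name is ours, not part of any statistics package):

```python
# Familywise error rate for c comparisons, each run at per-comparison alpha:
# FW = 1 - (1 - alpha)^c
def familywise_error(alpha, c):
    """Probability of at least one Type I error across c comparisons."""
    return 1 - (1 - alpha) ** c

fw = familywise_error(0.05, 3)
print(round(fw, 4))  # -> 0.1426
```

With c = 3 and α = .05 this gives .1426, the value computed on the slide.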
Error Associated With Multiple Comparisons
- When conducting multiple comparisons, we run the risk of inflating the error in our analysis
- The more analyses, the greater the chance of making a Type I error (3 t-tests: FW ≈ .14)

Multiple Comparisons: Conclusion
- Most multiple comparison procedures seek to minimize or eliminate the impact of familywise error
- This is the reason we use multiple comparison tests, rather than repeated t-tests, to evaluate group differences in ANOVA
Types of Multiple Comparison Tests
- A priori tests: based on hypotheses you have BEFORE collecting and analyzing data
  - Driven by your theory
  - Planned without seeing the results of the analyses
  - Because of this theoretical grounding, a priori tests must be a subset of all possible comparisons

A Priori Tests: FAQ
- Why can't I conduct a priori tests after collecting data?
  - Seeing the data (group means) may bias the hypotheses you generate
- Why can't I explore all possible comparisons when conducting a priori tests?
  - Exploring all possible comparisons is the process of conducting post hoc tests ("mining the data")
A Priori Tests: FAQ
- Can I conduct a priori tests without an ANOVA?
  - Yes! If you can narrow down your hypotheses to begin with, it is possible to conduct a priori tests without an F test; however, you should have a good reason for doing so

Types of Multiple Comparison Tests
- Post hoc tests: planned after the data are collected and the experimenter has examined the group means
  - These tests are subject to experimenter bias and may be influenced by expectancy effects
Post-Hoc Tests: FAQ
- "SPSS isn't printing out post-hoc results for me. It's giving me an error."
  - Remember: multiple comparison tests require at least 3 groups; you likely have a grouping variable with only 2 categories. With 2 groups, you can simply interpret the means.

Which Multiple Comparison Test Is Best?
- Given the grounding of a priori tests in theory, they are generally preferred to post hoc tests
- However, they are more difficult to plan and conduct, and they cannot speak to unexpected results
A Priori Tests: Multiple Comparison t-Tests
- The easiest a priori test is the multiple comparison t-test
- This test will NOT control the familywise error rate, so it is important to choose comparisons carefully
- For homogeneous variances:

  t = (x̄1 − x̄2) / sqrt(MS_Error/n1 + MS_Error/n2)

- Note: the variance estimate is MS_Error, the pooled error term from the ANOVA
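As a sketch (the function name is ours), the formula can be applied to the group means and ns from the descriptives output, with MS_Error = 7.448 taken from the one-way ANOVA table later in the handout:

```python
import math

# Multiple-comparison t using MS_Error as the pooled variance estimate.
def mc_t(mean1, mean2, n1, n2, ms_error):
    se = math.sqrt(ms_error / n1 + ms_error / n2)
    return (mean1 - mean2) / se

# Means and ns from the descriptives table; MS_Error = 7.448 from the ANOVA.
t = mc_t(-3.4583, 0.3182, 12, 11, 7.448)
print(round(t, 3))  # -> -3.315
```

The standard error here (about 1.139) matches the LSD output shown later, not the independent-samples t-tests, because those estimate variance from only the two groups being compared rather than from all three.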
A Priori Tests: Multiple Comparison t-Tests
SPSS Output: Descriptive Statistics (Change in weight at Post-Intervention, by Treatment Group)

  Group   N    Minimum   Maximum   Mean      Std. Deviation   Variance   Skewness   Kurtosis
  1       12   -10.00    2.00      -3.4583   3.60214          12.975     -.220      -.519
  2       11   -2.00     4.00      .3182     1.79266          3.214      .776       .198
  3       14   .00       9.00      3.7500    2.45537          6.029      .375       .373

SPSS Output: Independent Samples t-Tests

Group 1 vs. Group 2:
  Levene's Test for Equality of Variances: F = 4.826, Sig. = .039
  Equal variances assumed:     t = -3.135, df = 21,     Sig. (2-tailed) = .005, Mean Difference = -3.77652, Std. Error = 1.20453
  Equal variances not assumed: t = -3.222, df = 16.428, Sig. (2-tailed) = .005, Mean Difference = -3.77652, Std. Error = 1.17193

Group 1 vs. Group 3:
  Levene's Test for Equality of Variances: F = 2.281, Sig. = .144
  Equal variances assumed:     t = -6.037, df = 24,     Sig. (2-tailed) = .000, Mean Difference = -7.20833, Std. Error = 1.19406
  Equal variances not assumed: t = -5.862, df = 18.962, Sig. (2-tailed) = .000, Mean Difference = -7.20833, Std. Error = 1.22960
Group 2 vs. Group 3:
  Levene's Test for Equality of Variances: F = .580, Sig. = .454
  Equal variances assumed:     t = -3.886, df = 23,     Sig. (2-tailed) = .001, Mean Difference = -3.43182, Std. Error = .88318
  Equal variances not assumed: t = -4.037, df = 22.913, Sig. (2-tailed) = .001, Mean Difference = -3.43182, Std. Error = .85017

- Obviously, we would NOT conduct all 3 of these analyses for a priori comparisons; this example is for educational purposes only

A Priori Tests: Bonferroni Correction
- A way of adjusting the α level to correct for the number of analyses
- Divide the α used in the analysis by the number of comparisons conducted: α' = α / c
- Thus, the available Type I error rate is split amongst the comparisons, making the test more conservative
- Here: α' = .05 / 3 = .017
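The correction is a one-line computation; as a sketch (function name ours), each exact p-value from SPSS is then compared against the corrected threshold:

```python
# Bonferroni-corrected per-comparison alpha: alpha' = alpha / c
def bonferroni_alpha(alpha, c):
    return alpha / c

alpha_corrected = bonferroni_alpha(0.05, 3)
print(round(alpha_corrected, 3))  # -> 0.017

# Exact equal-variances p-values from the SPSS output, one per comparison:
p_values = [0.0050, 0.0000, 0.0007]
flags = [p < alpha_corrected for p in p_values]
print(flags)  # -> [True, True, True]
```

All three comparisons remain significant at the corrected level in this example.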
A Priori Tests: Bonferroni Correction
- Where, oh where, is the α = .017 table? It doesn't exist!
- The Bonferroni correction to α is useful ONLY when you can calculate the exact probability associated with a given finding
  - Computerized statistical packages like SPSS do this
- It is the most commonly used correction for familywise error

A Priori Tests: Multiple Comparison t-Tests
SPSS Output: t-tests evaluated at α = .017

Group 1 vs. Group 2:
  Equal variances assumed:     t = -3.135, df = 21,     Sig. (2-tailed) = .0050, Mean Difference = -3.77652, Std. Error = 1.20453
  Equal variances not assumed: t = -3.222, df = 16.428, Sig. (2-tailed) = .0052, Mean Difference = -3.77652, Std. Error = 1.17193

Group 1 vs. Group 3:
  Equal variances assumed:     t = -6.037, df = 24,     Sig. (2-tailed) = .0000, Mean Difference = -7.20833, Std. Error = 1.19406
  Equal variances not assumed: t = -5.862, df = 18.962, Sig. (2-tailed) = .0000, Mean Difference = -7.20833, Std. Error = 1.22960
Group 2 vs. Group 3:
  Equal variances assumed:     t = -3.886, df = 23,     Sig. (2-tailed) = .0007, Mean Difference = -3.43182, Std. Error = .88318
  Equal variances not assumed: t = -4.037, df = 22.913, Sig. (2-tailed) = .0005, Mean Difference = -3.43182, Std. Error = .85017

- Again, we would NOT conduct all 3 of these analyses for a priori comparisons; this example is for educational purposes only

Before Post-Hoc Tests: One-Way ANOVA Results

ANOVA: Change in weight at Post-Intervention
  Source           Sum of Squares   df   Mean Square   F        Sig.
  Between Groups   335.827          2    167.914       22.544   .000
  Within Groups    253.241          34   7.448
  Total            589.068          36
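The mean squares and F in the ANOVA table follow directly from the sums of squares and degrees of freedom, which can be checked by hand:

```python
# Reconstructing the mean squares and F from the ANOVA table's SS and df:
ss_between, df_between = 335.827, 2
ss_within, df_within = 253.241, 34

ms_between = ss_between / df_between  # 167.914 (matches the table)
ms_within = ss_within / df_within     # ~7.448  (this is MS_Error)
F = ms_between / ms_within
print(round(F, 3))  # -> 22.544
```

Note that ms_within is the MS_Error value used in the multiple comparison t-test and studentized range formulas in this handout.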
Post Hoc Tests: Fisher's Least Significant Difference (LSD) Test
- Also known as Fisher's LSD (no, not that one...)
- Same as the a priori t-tests we explored earlier, HOWEVER it requires a significant omnibus F test first
- When H1 is completely true (e.g., H1: µ1 ≠ µ3 ≠ µ2), the familywise error rate equals α
- When H1 is NOT completely true (e.g., H1: µ1 = µ3 ≠ µ2), the familywise error rate no longer equals α
- Since we know that H1 is not always completely true, AVOID FISHER'S LSD

SPSS Output: Multiple Comparisons, LSD
Dependent Variable: Change in weight at Post-Intervention

  (I) Group   (J) Group   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower   95% CI Upper
  1           2           -3.77652*               1.13921      .002   -6.0917        -1.4614
  1           3           -7.20833*               1.07364      .000   -9.3902        -5.0264
  2           1           3.77652*                1.13921      .002   1.4614         6.0917
  2           3           -3.43182*               1.09961      .004   -5.6665        -1.1972
  3           1           7.20833*                1.07364      .000   5.0264         9.3902
  3           2           3.43182*                1.09961      .004   1.1972         5.6665
  *. The mean difference is significant at the .05 level.
Post Hoc Tests: The Studentized Range Statistic
- A statistic reflecting the difference between the largest and smallest means
- Because we look only at the largest and smallest means, the studentized range statistic systematically underestimates the Type I error rate
- However, the studentized range statistic is a step toward calculating other post hoc tests

Post Hoc Tests: The Studentized Range Statistic
- First, rank order the means: -3.46, .32, 3.75
- Second, plug into the equation:

  q = (x̄_largest − x̄_smallest) / sqrt[ (MS_Error/n_l + MS_Error/n_s) / 2 ]
Post Hoc Tests: The Studentized Range Statistic
- A more useful form of the studentized range equation allows us to calculate the minimum difference between means that would be statistically significant:

  x̄_l − x̄_s = q_.05(groups, df_error) × sqrt[ (MS_Error/n_l + MS_Error/n_s) / 2 ]

Post Hoc Tests: Tukey's Honestly Significant Difference (HSD) Test
- Derived from the studentized range statistic
- Conservatively controls for the number of steps between comparison groups
- One of the most common post-hoc tests in use
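The minimum-difference form above can be sketched in a few lines. The q critical value must still come from a studentized-range table; the 3.47 used below is an assumed, approximate value for q_.05(3 groups, 34 error df), not something the code computes:

```python
import math

# Minimum mean difference significant at the chosen studentized-range
# critical value (the minimum-difference form of the q equation).
def min_significant_diff(q_crit, ms_error, n_l, n_s):
    return q_crit * math.sqrt((ms_error / n_l + ms_error / n_s) / 2)

# q_crit = 3.47 is an assumed table value for q_.05(3, 34); verify it
# against a studentized-range table before relying on the result.
d = min_significant_diff(3.47, 7.448, 12, 11)
print(round(d, 2))  # -> 2.8
```

Any observed difference between these two group means larger than this threshold would be declared significant by a Tukey-style test.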
SPSS Output: Multiple Comparisons, Tukey HSD
Dependent Variable: Change in weight at Post-Intervention

  (I) Group   (J) Group   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower   95% CI Upper
  1           2           -3.77652*               1.13921      .006   -6.5681        -.9850
  1           3           -7.20833*               1.07364      .000   -9.8392        -4.5774
  2           1           3.77652*                1.13921      .006   .9850          6.5681
  2           3           -3.43182*               1.09961      .010   -6.1263        -.7373
  3           1           7.20833*                1.07364      .000   4.5774         9.8392
  3           2           3.43182*                1.09961      .010   .7373          6.1263
  *. The mean difference is significant at the .05 level.

Post Hoc Tests: Scheffé Test
- Another derivation of the studentized range statistic and Tukey's HSD
- One of the most conservative post-hoc tests in use
SPSS Output: Multiple Comparisons, Scheffé
Dependent Variable: Change in weight at Post-Intervention

  (I) Group   (J) Group   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower   95% CI Upper
  1           2           -3.77652*               1.13921      .009   -6.6925        -.8605
  1           3           -7.20833*               1.07364      .000   -9.9565        -4.4602
  2           1           3.77652*                1.13921      .009   .8605          6.6925
  2           3           -3.43182*               1.09961      .014   -6.2464        -.6172
  3           1           7.20833*                1.07364      .000   4.4602         9.9565
  3           2           3.43182*                1.09961      .014   .6172          6.2464
  *. The mean difference is significant at the .05 level.

SPSS Output: All Three Tests Side by Side (Sig. values)
Dependent Variable: Change in weight at Post-Intervention

  Comparison      LSD    Tukey HSD   Scheffé
  Group 1 vs. 2   .002   .006        .009
  Group 1 vs. 3   .000   .000        .000
  Group 2 vs. 3   .004   .010        .014
Post-Hoc Test Comparisons
- For any given comparison: p_LSD < p_Tukey < p_Scheffé
- Scheffé is the most conservative; LSD is the least conservative (and potentially wrong)
- Which post-hoc test should you use? Consider:
  - The purpose of the analysis
  - Effect size
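The ordering can be checked directly against the Sig. values reported in the three SPSS outputs above (the ordering is non-strict here only because SPSS rounds very small p-values to .000):

```python
# Sig. values for the same three comparisons across the three SPSS outputs:
p = {
    "LSD":     [0.002, 0.000, 0.004],
    "Tukey":   [0.006, 0.000, 0.010],
    "Scheffe": [0.009, 0.000, 0.014],
}

# For each comparison: less conservative test -> smaller (or equal) p-value.
ordered = all(l <= t <= s
              for l, t, s in zip(p["LSD"], p["Tukey"], p["Scheffe"]))
print(ordered)  # -> True
```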