Multiple Comparisons


Multiple Comparisons: Error Rates, A Priori Tests, and Post-Hoc Tests

Multiple Comparisons: A Rationale

Multiple comparison tests function to tease apart differences between the groups within our IV when we conduct ANOVAs. Our ANOVA provides a single F statistic assessing the suitability of H0, NOT how groups differ; it is an omnibus test. Rejecting H0 simply tells us that

H0: µ1 = µ2 = µ3

is not an accurate representation of the data.

Multiple Comparisons: A Rationale

Any of the following could be true:

H1: µ1 ≠ µ2 ≠ µ3
H1: µ1 ≠ µ2 = µ3
H1: µ1 ≠ µ3 = µ2
H1: µ1 = µ2 ≠ µ3
H1: µ1 = µ3 ≠ µ2

All are potentially valid alternative hypotheses. Which is most accurate?

Multiple Comparisons: Example

Going back to our AN treatment example: which treatment is best for AN?

H0: µ1 = µ2 = µ3
H1: µ1 ≠ µ2 ≠ µ3
H1: µ1 ≠ µ2 = µ3
H1: µ1 ≠ µ3 = µ2
H1: µ1 = µ2 ≠ µ3
H1: µ1 = µ3 ≠ µ2

Multiple Comparisons: Example

We could simply compare differences between groups by conducting a series of individual two-group t-tests (sketched below): Group 1 vs. Group 2, Group 1 vs. Group 3, and Group 2 vs. Group 3. From these three analyses, we can answer which of the alternative hypotheses best fits the data. Why don't we do it this way?

Multiple Comparisons: The Limitation

It is too much work (3 more analyses!), and it inflates the Type I error rate.
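As a minimal sketch, here are those three pairwise tests in Python with scipy. The raw scores are not reproduced in these slides, so the data below are random placeholders generated to roughly match the example's group sizes, means, and standard deviations:

```python
# Three pairwise t-tests among three independent groups (illustrative only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
g1 = rng.normal(-3.5, 3.6, size=12)  # placeholder data, roughly matching
g2 = rng.normal(0.3, 1.8, size=11)   # the example's means, SDs, and n's
g3 = rng.normal(3.8, 2.5, size=14)

pairs = [("1 vs 2", g1, g2), ("1 vs 3", g1, g3), ("2 vs 3", g2, g3)]
for label, a, b in pairs:
    t, p = stats.ttest_ind(a, b)  # equal-variance (pooled) t-test
    print(f"Group {label}: t = {t:.3f}, p = {p:.4f}")
```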

Error Associated With Multiple Comparisons

When we want to consider error rates for analyses, there are two ways to calculate error: per-comparison and familywise.

Per-Comparison

The probability of making an error on each individual analysis. This could be a Type I or a Type II error, but generally we think only about Type I error. The per-comparison (PC) error rate = α. Thus, for every analysis we conduct, there is a fixed amount of error we have to live with.

Familywise

The probability of making an error somewhere within a set of comparisons:

familywise error rate = 1 − (1 − α)^c

where α is the per-comparison error rate and c is the number of comparisons made.

Thus, for our AN treatment example:

α_FW = 1 − (1 − .05)^3 = 1 − (.95)^3 = 1 − .8574 = .1426 ≈ .14

The familywise error rate is slightly less than the sum of the α values for all 3 analyses (3 × .05 = .15).
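The arithmetic above is a one-liner; a quick sketch:

```python
# Familywise Type I error rate for c independent comparisons at level alpha.
alpha = 0.05
c = 3
familywise = 1 - (1 - alpha) ** c
print(f"{familywise:.4f}")  # 0.1426 -- about .14, as computed above
```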

Error Associated With Multiple Comparisons

When conducting multiple comparisons, we run the risk of inflating the error in our analysis. The more analyses, the greater the chance of making a Type I error: with 3 t-tests, the familywise rate is already .14 (see the simulation sketch below).

Multiple Comparisons: Conclusion

Most multiple comparison procedures seek to minimize or eliminate the impact of familywise error. This is the reason we use multiple comparison tests, rather than a series of ordinary t-tests, to evaluate group differences in ANOVA.
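The inflation described above is easy to verify by simulation. This sketch draws three groups from one common population (so every null hypothesis is true), runs all three pairwise t-tests, and counts how often at least one comes out "significant"; because the three tests share data, the simulated rate lands near, though not exactly at, the .14 figure above:

```python
# Monte Carlo check of familywise Type I error inflation with 3 t-tests.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sims, false_alarms = 10_000, 0
for _ in range(n_sims):
    g = [rng.normal(0.0, 1.0, size=12) for _ in range(3)]  # H0 true
    pvals = [stats.ttest_ind(g[i], g[j]).pvalue
             for i, j in ((0, 1), (0, 2), (1, 2))]
    false_alarms += min(pvals) < 0.05  # any rejection is a Type I error
print(false_alarms / n_sims)  # well above the nominal .05
```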

Types of Multiple Comparison Tests

A Priori Tests: tests based on hypotheses you have BEFORE collecting and analyzing data. They are driven by your theory and planned without seeing the results of the analyses. However, because of this theoretical grounding, a priori tests must be a subset of all possible comparisons.

A Priori Tests: FAQ

Why can't I conduct a priori tests after collecting data? Seeing the data (group means) may bias the hypotheses you generate.

Why can't I explore all possible comparisons when conducting a priori tests? Exploring all possible comparisons is the process of conducting post hoc tests: mining the data.

A Priori Tests: FAQ

Can I conduct a priori tests without an ANOVA? Yes! If you can narrow down your hypotheses to begin with, it is possible to conduct a priori tests without an F test; however, you should have a good reason for doing so.

Types of Multiple Comparison Tests

Post Hoc Tests: tests planned after the data are collected and the experimenter has examined the group means. These tests are subject to experimenter bias and may be influenced by expectancy effects.

Post-hoc Tests: FAQ

SPSS isn't printing out post-hoc results for me; it's giving me an error. Remember: multiple comparison tests require at least 3 groups. You likely have a grouping variable with only 2 categories; in that case, you can simply interpret the two means directly.

Which MC procedure is best? Given their grounding in theory, a priori tests are generally preferred to post hoc tests, though they are more difficult to plan and conduct and cannot speak to unexpected results.

A Priori Tests: Multiple Comparison t-Tests

The easiest a priori test is the multiple comparison t-test. This test will NOT control the familywise error rate, so it is important to choose comparisons carefully.

For homogeneous variances:

t = (x̄1 − x̄2) / sqrt( MS_Error/n1 + MS_Error/n2 )

Note: the variance estimate is MS_Error, the pooled within-group variance from the omnibus ANOVA.
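A sketch of this t-test computed from summary statistics alone, using the Group 1 and Group 2 means and sizes from the SPSS descriptives below and MS_Error = 7.448 (df = 34) from the ANOVA table later in the handout. The result differs slightly from the ordinary two-sample t-tests shown next, because here the variance is pooled across all three groups:

```python
# t-test for two group means, using MS_Error from the omnibus ANOVA.
import math
from scipy import stats

mean1, n1 = -3.4583, 12          # Group 1 summary statistics
mean2, n2 = 0.3182, 11           # Group 2 summary statistics
ms_error, df_error = 7.448, 34   # from the one-way ANOVA table

t = (mean1 - mean2) / math.sqrt(ms_error / n1 + ms_error / n2)
p = 2 * stats.t.sf(abs(t), df_error)  # two-tailed p on df_Error
print(f"t({df_error}) = {t:.3f}, p = {p:.4f}")
```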

A Priori Tests: Multiple Comparison t-Tests

SPSS Output: Descriptive Statistics (Change in weight at Post-Intervention, by Treatment Group)

Group 1: N = 12, Minimum = -10.00, Maximum = 2.00, Mean = -3.4583, Std. Deviation = 3.60214, Variance = 12.975, Skewness = -.220, Kurtosis = -.519
Group 2: N = 11, Minimum = -2.00, Maximum = 4.00, Mean = .3182, Std. Deviation = 1.79266, Variance = 3.214, Skewness = .776, Kurtosis = .198
Group 3: N = 14, Minimum = .00, Maximum = 9.00, Mean = 3.7500, Std. Deviation = 2.45537, Variance = 6.029, Skewness = .375, Kurtosis = .373

SPSS Output: Independent Samples t-Tests

Group 1 vs. Group 2: Levene's test F = 4.826, Sig. = .039. Equal variances assumed: t = -3.135, df = 21, Sig. (2-tailed) = .005, Mean Difference = -3.77652, Std. Error = 1.20453. Equal variances not assumed: t = -3.222, df = 16.428, Sig. (2-tailed) = .005, Std. Error = 1.17193.

Group 1 vs. Group 3: Levene's test F = 2.281, Sig. = .144. Equal variances assumed: t = -6.037, df = 24, Sig. (2-tailed) = .000, Mean Difference = -7.20833, Std. Error = 1.19406. Equal variances not assumed: t = -5.862, df = 18.962, Sig. (2-tailed) = .000, Std. Error = 1.22960.

A Priori Tests: Multiple Comparison t-Tests

SPSS Output: Independent Samples t-Tests (continued)

Group 2 vs. Group 3: Levene's test F = .580, Sig. = .454. Equal variances assumed: t = -3.886, df = 23, Sig. (2-tailed) = .001, Mean Difference = -3.43182, Std. Error = .88318. Equal variances not assumed: t = -4.037, df = 22.913, Sig. (2-tailed) = .001, Std. Error = .85017.

Obviously, we would NOT conduct all 3 of these analyses for a priori comparisons; this example is for educational purposes only.

A Priori Tests: Bonferroni Correction

A way of adjusting the α level to account for the number of analyses: divide the α used in the analysis by the number of comparisons conducted, so that the available Type I error rate is split among the comparisons. This makes each test more conservative.

α' = α / c = .05 / 3 ≈ .017

A Priori Tests: Bonferroni Correction

Where, oh where, is the α = .017 table? It doesn't exist. The Bonferroni correction to α is useful ONLY when you can calculate the exact probability associated with a given finding, as computerized statistical packages like SPSS do. It is the most commonly used correction for familywise error.

A Priori Tests: Multiple Comparison t-Tests

SPSS Output: t-Tests, now evaluated against α = .017

Group 1 vs. Group 2: Levene's test F = 4.826, Sig. = .039. Equal variances assumed: t = -3.135, df = 21, Sig. (2-tailed) = .0050, Mean Difference = -3.77652, Std. Error = 1.20453. Equal variances not assumed: t = -3.222, df = 16.428, Sig. (2-tailed) = .0052, Std. Error = 1.17193.

Group 1 vs. Group 3: Levene's test F = 2.281, Sig. = .144. Equal variances assumed: t = -6.037, df = 24, Sig. (2-tailed) = .0000, Mean Difference = -7.20833, Std. Error = 1.19406. Equal variances not assumed: t = -5.862, df = 18.962, Sig. (2-tailed) = .0000, Std. Error = 1.22960.
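In practice, rather than comparing each p against .017 by eye, one can adjust the p-values directly. A sketch using statsmodels, fed the equal-variances p-values from the output above:

```python
# Bonferroni adjustment of the three observed p-values.
from statsmodels.stats.multitest import multipletests

pvals = [0.0050, 0.0000, 0.0007]  # from the t-test output above
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")
print(p_adj)   # each p multiplied by 3 (capped at 1.0)
print(reject)  # True where significant at the familywise .05 level
```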

A Priori Tests: Multiple Comparison t-Tests

SPSS Output: t-Tests, now evaluated against α = .017 (continued)

Group 2 vs. Group 3: Levene's test F = .580, Sig. = .454. Equal variances assumed: t = -3.886, df = 23, Sig. (2-tailed) = .0007, Mean Difference = -3.43182, Std. Error = .88318. Equal variances not assumed: t = -4.037, df = 22.913, Sig. (2-tailed) = .0005, Std. Error = .85017.

Obviously, we would NOT conduct all 3 of these analyses for a priori comparisons; this example is for educational purposes only.

Before Post-Hoc Tests

One-Way ANOVA results (Change in weight at Post-Intervention):

Between Groups: Sum of Squares = 335.827, df = 2, Mean Square = 167.914, F = 22.544, Sig. = .000
Within Groups: Sum of Squares = 253.241, df = 34, Mean Square = 7.448
Total: Sum of Squares = 589.068, df = 36
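As a quick check, the F ratio and its p-value can be reconstructed from the sums of squares in this table:

```python
# Rebuild F = MS_Between / MS_Within from the ANOVA table entries.
from scipy import stats

ss_between, df_between = 335.827, 2
ss_within, df_within = 253.241, 34
f_ratio = (ss_between / df_between) / (ss_within / df_within)
p_value = stats.f.sf(f_ratio, df_between, df_within)
print(f"F({df_between}, {df_within}) = {f_ratio:.3f}, p = {p_value:.6f}")
```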

Post Hoc Tests: Fisher's Least Significant Difference Test

Also known as Fisher's LSD (no, not that one...). It is the same as the a priori t-tests we explored earlier; HOWEVER, it requires a significant omnibus F test first. When H1 is completely true (e.g., H1: µ1 ≠ µ3 ≠ µ2), the familywise error rate equals α. When H1 is NOT completely true (e.g., H1: µ1 = µ3 ≠ µ2), the familywise error rate exceeds α. Since we know that H1 is not always completely true, AVOID FISHER'S LSD.

SPSS Output: Multiple Comparisons, LSD (Dependent Variable: Change in weight at Post-Intervention; each pair is shown once)

Group 1 vs. Group 2: Mean Difference (I-J) = -3.77652*, Std. Error = 1.13921, Sig. = .002, 95% CI [-6.0917, -1.4614]
Group 1 vs. Group 3: Mean Difference (I-J) = -7.20833*, Std. Error = 1.07364, Sig. = .000, 95% CI [-9.3902, -5.0264]
Group 2 vs. Group 3: Mean Difference (I-J) = -3.43182*, Std. Error = 1.09961, Sig. = .004, 95% CI [-5.6665, -1.1972]
*. The mean difference is significant at the .05 level.

Post Hoc Tests: The Studentized Range Statistic

A statistic reflecting the difference between the largest and smallest means. Because it looks only at the largest and smallest means, the studentized range statistic by itself systematically underestimates the Type I error rate; however, it is a step towards calculating other post hoc tests.

First, rank order the means: -3.46 (Group 1), .32 (Group 2), 3.75 (Group 3). Second, plug into the equation:

q = (x̄_largest − x̄_smallest) / sqrt( (MS_Error/n_l + MS_Error/n_s) / 2 )

Post Hoc Tests: The Studentized Range Statistic

A more useful form of the studentized range equation allows us to calculate the minimum difference between means that would be statistically significant:

x̄_l − x̄_s = q_.05(groups, df_Error) × sqrt( (MS_Error/n_l + MS_Error/n_s) / 2 )

Post Hoc Tests: Tukey's Honestly Significant Difference Test

Derived from the studentized range statistic, Tukey's HSD conservatively controls for the number of steps between comparison groups. It is one of the most common post-hoc tests in use.
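A sketch of this minimum significant difference for the example, using scipy's studentized range distribution (available as scipy.stats.studentized_range in scipy 1.7+); n_l and n_s are the sizes of the groups with the largest and smallest means:

```python
# Minimum significant difference from the studentized range statistic.
import math
from scipy.stats import studentized_range

k, df_error, ms_error = 3, 34, 7.448  # groups, error df, error mean square
n_l, n_s = 14, 12                     # n for the largest and smallest means
q_crit = studentized_range.ppf(0.95, k, df_error)  # q_.05(3, 34)
msd = q_crit * math.sqrt((ms_error / n_l + ms_error / n_s) / 2)
print(f"q_crit = {q_crit:.3f}, minimum significant difference = {msd:.3f}")
```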

Post Hoc Tests: Tukey's Honestly Significant Difference Test

SPSS Output: Multiple Comparisons, Tukey HSD (Dependent Variable: Change in weight at Post-Intervention)

Group 1 vs. Group 2: Mean Difference (I-J) = -3.77652*, Std. Error = 1.13921, Sig. = .006, 95% CI [-6.5681, -.9850]
Group 1 vs. Group 3: Mean Difference (I-J) = -7.20833*, Std. Error = 1.07364, Sig. = .000, 95% CI [-9.8392, -4.5774]
Group 2 vs. Group 3: Mean Difference (I-J) = -3.43182*, Std. Error = 1.09961, Sig. = .010, 95% CI [-6.1263, -.7373]
*. The mean difference is significant at the .05 level.

Post Hoc Tests: Scheffé Test

Another derivation of the studentized range statistic and Tukey's HSD, and one of the most conservative post-hoc tests in use.

Post Hoc Tests: Scheffé Test

SPSS Output: Multiple Comparisons, Scheffé (Dependent Variable: Change in weight at Post-Intervention)

Group 1 vs. Group 2: Mean Difference (I-J) = -3.77652*, Std. Error = 1.13921, Sig. = .009, 95% CI [-6.6925, -.8605]
Group 1 vs. Group 3: Mean Difference (I-J) = -7.20833*, Std. Error = 1.07364, Sig. = .000, 95% CI [-9.9565, -4.4602]
Group 2 vs. Group 3: Mean Difference (I-J) = -3.43182*, Std. Error = 1.09961, Sig. = .014, 95% CI [-6.2464, -.6172]
*. The mean difference is significant at the .05 level.

Side by side, the three procedures give the same mean differences and standard errors but different p-values and confidence intervals:

Group 1 vs. Group 2: Sig. = .002 (LSD), .006 (Tukey HSD), .009 (Scheffé)
Group 1 vs. Group 3: Sig. = .000 (LSD), .000 (Tukey HSD), .000 (Scheffé)
Group 2 vs. Group 3: Sig. = .004 (LSD), .010 (Tukey HSD), .014 (Scheffé)
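To reproduce a Tukey HSD table outside SPSS, scipy offers stats.tukey_hsd (scipy 1.8+); statsmodels' pairwise_tukeyhsd is an equivalent alternative. A sketch on the same placeholder data used earlier:

```python
# Tukey HSD on three independent groups (illustrative placeholder data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
g1 = rng.normal(-3.5, 3.6, size=12)  # same placeholder groups as before
g2 = rng.normal(0.3, 1.8, size=11)
g3 = rng.normal(3.8, 2.5, size=14)

result = stats.tukey_hsd(g1, g2, g3)
print(result)  # pairwise mean differences with adjusted p-values
print(result.confidence_interval(confidence_level=0.95).low)
```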

Post-Hoc Test Comparisons

p_LSD < p_Tukey < p_Scheffé: Scheffé is the most conservative, while LSD is the least conservative (and potentially wrong). Which post-hoc test should you use? That depends on the purpose of the analysis and the effect size you expect to detect.