Multiple Comparison Procedures Cohen Chapter 13. For EDUC/PSY 6600


"We have to go to the deductions and the inferences," said Lestrade, winking at me. "I find it hard enough to tackle facts, Holmes, without flying away after theories and fancies." Inspector Lestrade to Sherlock Holmes, "The Boscombe Valley Mystery." Cohen Chap 13 - Multiple Comparisons

ANOVA Omnibus
A significant F-ratio means the factor (IV) had an effect on the DV: the groups are not from the same population. But which levels of the factor differ? We must compare and contrast means from different levels; the omnibus test indicates only that at least 1 significant difference exists among all POSSIBLE comparisons.
Simple vs. complex comparisons:
- Simple comparisons: comparing 2 means, pairwise. It is possible for no pair of group means to significantly differ even when the omnibus F is significant.
- Complex comparisons: comparing combinations of more than 2 means.

Multiple Comparison Procedures
Multiple comparison procedures are used to detect simple or complex differences. A significant omnibus test is NOT always necessary: the omnibus F is inaccurate when its assumptions are violated (Type II error), and it is OKAY to conduct multiple comparisons when the omnibus p-value is CLOSE to significance.


Error Rates
α = p(Type I error), determined in the study design; generally α = .01, .05, or .10.
Per-comparison error rate (α_PC): the error rate for any 1 comparison; α = α_PC.
Experimentwise error rate (α_EW): p(at least 1 Type I error across all comparisons).
Relationship between α_PC and α_EW: α_EW = 1 - (1 - α_PC)^c, where c = the number of comparisons and (1 - α_PC)^c = p(not making a Type I error over c comparisons).

Error rates, continued
An ANOVA with 4 groups yields a significant F-statistic. Comparing each group with one another (X̄1 vs X̄2, X̄1 vs X̄3, X̄1 vs X̄4, X̄2 vs X̄3, X̄2 vs X̄4, X̄3 vs X̄4) gives c = 6. With α_PC = .05, α_EW = 1 - (.95)^6 = .26. What is α_EW when c = 10? (It rises to about .40.)
3 options: ignore α_PC or α_EW; modify α_PC; modify α_EW.

Comparisons
Post hoc (a posteriori): selected after data collection and analysis; used in exploratory research; a larger set of, or all possible, comparisons; inflated α_EW, i.e., increased p(Type I error).
Planned (a priori): selected before data collection; follow from hypotheses and theory. You are justified in conducting ANY planned comparison (the ANOVA doesn't need to be significant). α_EW is much smaller than with the alternatives, although α_EW can slightly exceed α even when planned; adjust when c is large or the set includes all possible comparisons.

Problems with comparisons
The decision to statistically test certain post hoc comparisons is made after examining the data. When only the most promising comparisons are selected, we need to correct for the inflated p(Type I error), because biased sample data often deviate from the population. When all possible pairwise comparisons are conducted, p(Type I error), i.e. α_EW, is the same for a priori and post hoc comparisons.

For example, suppose a significant F-statistic is obtained. Assume 20 pairwise comparisons are possible but, in the population, no significant differences exist, so obtaining the significant F-statistic was itself a Type I error. A post hoc comparison using the sample data then suggests the largest and smallest means differ.
- If we had conducted 1 planned comparison: a 1-in-20 chance (α = .05) of conducting this comparison and making a Type I error.
- If we had conducted all possible comparisons: a 100% chance (α = 1.00) of conducting this comparison and making a Type I error.
- If the researcher decides to make only 1 comparison after looking at the data, between the largest and smallest means, the chance of a Type I error is still 100%: all the other comparisons have been made "in the head," and this is only one of all possible comparisons. Testing the largest vs. smallest means is probabilistically similar to testing all possible comparisons.

Common techniques
a priori tests: multiple t-tests; Bonferroni (Dunn); Dunn-Šidák*; Holm*; linear contrasts.
post hoc tests: Fisher LSD; Tukey HSD; Student-Newman-Keuls (SNK); Tukey-b; Tukey-Kramer; Games-Howell; Duncan's; Dunnett's; REGWQ; Scheffé.
*adjusts α_PC. Italicized tests: not covered.

Common techniques, continued
Many more comparison techniques are available. Most statistical packages make no a priori / post hoc distinction: all are called post hoc tests (SPSS) or multiple comparisons (R). In practice, most a priori comparison techniques can be used as post hoc procedures. They are called post hoc not because they were planned after doing the study per se, but because they are conducted after an omnibus test.

A Priori procedures: multiple t-tests
Homogeneity of variance: use MS_W (the estimated pooled variance) and df_W (both from the ANOVA) to find the critical value (a smaller t_crit):

t = (X̄1 - X̄2) / sqrt(MS_W/n1 + MS_W/n2) = (X̄1 - X̄2) / sqrt(2·MS_W/n)  [equal n]

Heterogeneity of variance and equal n: in the equation above, replace MS_W with the group variances s_j² and df_W with df = 2(n_j - 1) for t_crit.
Heterogeneity of variance and unequal n: replace MS_W with s_j² and df_W with the Welch-Satterthwaite df for t_crit.
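A small sketch of the equal-n t-test above; the means, MS_W, and n are borrowed from the chapter's noise example for illustration:

```python
import math

def pooled_anova_t(mean1, mean2, ms_w, n):
    """Two-group t using MS_W from the ANOVA (equal n per group)."""
    return (mean1 - mean2) / math.sqrt(2 * ms_w / n)

# Means 9.2 and 6.6, MS_W = 1.90, n = 5 (from the chapter's example)
print(round(pooled_anova_t(9.2, 6.6, 1.90, 5), 2))   # 2.98
```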

A Priori procedures: Bonferroni (Dunn) t-test
Bonferroni inequality: p(occurrence of a set of events) ≤ the (additive) sum of the probabilities of each event.
Adjusting α_PC: if each comparison has p(Type I error) = α_PC = .05, then α_EW ≤ c·α_PC; p(at least 1 Type I error) can never exceed c·α_PC. So set α_PC = α_EW / c and conduct a standard independent-samples t-test per pair.
Example for 6 comparisons: α_PC = .05/6 = .0083.

A Priori procedures: Bonferroni (Dunn) t-test, continued
t-tables lack Bonferroni-corrected critical values, so use software: is the exact p-value ≤ the Bonferroni-corrected α-level? Example for 6 comparisons: α_PC = .05/6 = .0083.
More conservative: reduced p(Type I error). Less powerful: increased p(Type II error).
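A minimal sketch of the Bonferroni procedure: divide α_EW by c and compare each exact p-value to the corrected level (the p-values below are made up for illustration):

```python
alpha_ew, c = 0.05, 6
alpha_pc = alpha_ew / c                     # corrected per-comparison alpha
print(round(alpha_pc, 4))                   # 0.0083

# Compare each test's exact p-value (hypothetical values) to alpha_pc
p_values = [0.003, 0.020, 0.041]
decisions = [p <= alpha_pc for p in p_values]
print(decisions)                            # [True, False, False]
```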

A Priori procedures: linear contrasts - idea
A linear combination of means: L = c1·X̄1 + c2·X̄2 + ... + ck·X̄k = Σ c_j·X̄_j
Each group mean is weighted by a constant (c_j) and the products are summed. The weights are selected so that the means of interest are compared, and the sum of the weights = 0.
Example 1: 4 means; compare M1 to M2 and ignore the others. c1 = 1, c2 = -1, c3 = 0, c4 = 0:
L = (1)X̄1 + (-1)X̄2 + (0)X̄3 + (0)X̄4 = X̄1 - X̄2
Example 2: same 4 means; compare M1, M2, and M3 to M4. c1 = 1/3, c2 = 1/3, c3 = 1/3, c4 = -1:
L = (1/3)X̄1 + (1/3)X̄2 + (1/3)X̄3 + (-1)X̄4 = (X̄1 + X̄2 + X̄3)/3 - X̄4
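The weighted-sum idea can be sketched directly. The first three means come from the chapter's example; the fourth mean (5.0) is hypothetical, added only so both weight patterns above can be run:

```python
def contrast_value(means, weights):
    """L = sum of c_j * M_j; a contrast's weights must sum to zero."""
    assert abs(sum(weights)) < 1e-9, "contrast weights must sum to 0"
    return sum(c * m for c, m in zip(weights, means))

means = [9.2, 6.6, 6.2, 5.0]
print(round(contrast_value(means, [1, -1, 0, 0]), 2))          # M1 vs M2
print(round(contrast_value(means, [1/3, 1/3, 1/3, -1]), 2))    # (M1+M2+M3)/3 vs M4
```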

A Priori procedures: linear contrasts - SS
Each linear combination yields a contrast sum of squares:
Equal ns: SS_Contrast = n·L² / Σ c_j²
Unequal ns: SS_Contrast = L² / Σ (c_j² / n_j)
SS_Between can be partitioned into k - 1 SS_Contrasts: SS_Between = SS_Contrast1 + SS_Contrast2 + ... + SS_Contrast(k-1).
df for SS_B = k - 1; df for SS_Contrast = the number of groups/sets included in the contrast minus 1.
F = MS_Contrast / MS_W, where MS_Contrast = SS_Contrast / df_Contrast; as df = 1, MS_Contrast = SS_Contrast. MS_W comes from the omnibus ANOVA results. Equivalently, for equal ns, F = n·L² / (Σ c_j² · MS_W); for unequal ns, F = L² / (Σ (c_j²/n_j) · MS_W).
Max # of legal contrasts = df_B; you do not need to consume all available df. Use a smaller α_EW if the # of contrasts > df_B.

A Priori procedures: linear contrasts - example
Test each contrast (ANOVA: SS_Between = 26.53, SS_Within = 22.8; group means 9.2, 6.6, 6.2, each with n = 5).

Contrast 1: M_NoNoise versus M_Moderate and M_Loud
L = (-2)(9.2) + (1)(6.6) + (1)(6.2) = -18.4 + 12.8 = -5.6
SS_Contrast1 = 5·(-5.6)² / ((-2)² + 1² + 1²) = 156.8 / 6 = 26.13
df_Contrast = 2 - 1 = 1, so MS_Contrast1 = 26.13/1 = 26.13
df_W = 15 - 3 = 12, so MS_W = 22.8/12 = 1.90
F = 26.13/1.90 = 13.75, p < .05 (α = .05 and df_W = 12 give F_crit = 4.75)

Contrast 2: M_Moderate versus M_Loud
L = (0)(9.2) + (-1)(6.6) + (1)(6.2) = -0.4
SS_Contrast2 = 5·(-0.4)² / ((-1)² + 1²) = 0.8 / 2 = 0.40
df_Contrast = 2 - 1 = 1, so MS_Contrast2 = 0.40/1 = 0.40
F = 0.40/1.90 = 0.21, p > .05

Note: SS_B = SS_Contrast1 + SS_Contrast2 = 26.13 + 0.40 = 26.53
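The contrast computations above can be reproduced with a short sketch (equal-n formula; all values from the slide):

```python
# Group means and ANOVA quantities from the slides
means = [9.2, 6.6, 6.2]        # No Noise, Moderate, Loud
n = 5                          # per group
ms_w = 22.8 / 12               # MS_W = SS_W / df_W = 1.90

def contrast_F(means, weights, n, ms_w):
    """Return (L, SS_Contrast, F) for an equal-n contrast with df = 1."""
    L = sum(c * m for c, m in zip(weights, means))
    ss = n * L ** 2 / sum(c ** 2 for c in weights)
    return L, ss, ss / ms_w    # MS_Contrast = SS_Contrast when df = 1

L1, ss1, F1 = contrast_F(means, [-2, 1, 1], n, ms_w)   # No Noise vs the rest
L2, ss2, F2 = contrast_F(means, [0, -1, 1], n, ms_w)   # Moderate vs Loud
print(round(ss1, 2), round(F1, 2))   # 26.13 13.75
print(round(ss2, 2), round(F2, 2))   # 0.4 0.21
```

The two contrast sums of squares add back to SS_Between (26.53), as the slide notes.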

A Priori procedures: linear contrasts - orthogonality
Independent (orthogonal) contrasts: if M1 is larger than the average of M2 and M3, that tells us nothing about M4 and M5.
Dependent (non-orthogonal) contrasts: if M1 is larger than the average of M2 and M3, there is an increased probability that M1 > M2 or M1 > M3.
You can conduct non-orthogonal contrasts, but they bring dependency in the data, inefficiency in the analysis, redundant information, and increased p(Type I error).

A Priori procedures: linear contrasts - orthogonality rules
Orthogonality indicates the SS_Contrasts are independent partitions of SS_B. It is obtained when the Σ of the SS_Contrasts = SS_Between and two rules are met:
Rule 1: Σ_j c_j = 0 for each contrast.
Rule 2: Σ_j c_1j·c_2j = 0 for each pair of contrasts, where c_Lj are the contrast weights from the additional linear combinations.
From the example: Rule 1: L1 = (-2) + 1 + 1 = 0 and L2 = 0 + (-1) + 1 = 0. Rule 2: (-2)(0) + (1)(-1) + (1)(1) = 0 + (-1) + 1 = 0. Orthogonal!
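The two rules are easy to check mechanically; a sketch using the example's weights (function names are ours):

```python
def sums_to_zero(w):
    """Rule 1: each contrast's weights sum to zero."""
    return abs(sum(w)) < 1e-9

def orthogonal(w1, w2):
    """Rule 2: the cross-products of two contrasts' weights sum to zero (equal n)."""
    return abs(sum(a * b for a, b in zip(w1, w2))) < 1e-9

w1 = [-2, 1, 1]    # Contrast 1 from the example
w2 = [0, -1, 1]    # Contrast 2
print(sums_to_zero(w1), sums_to_zero(w2))   # True True
print(orthogonal(w1, w2))                   # True
```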

A Priori procedures: recommendations
- 1 pairwise comparison of interest: standard t-test.
- Several pairwise comparisons: Bonferroni or multiple t-tests. Bonferroni is the most widely used (it varies by field) and can be applied to many multiple-testing situations.
- 1 complex comparison: linear contrast.
- Several complex comparisons: orthogonal linear contrasts, no adjustment; non-orthogonal contrasts, Bonferroni correction or a more conservative α_PC.

Post hoc procedures: Fisher's LSD test
Aka Fisher's protected t-test; computationally the same as the multiple t-tests described previously. "Fisher's LSD test": conducted only after a significant F-statistic. "Multiple t-tests": planned a priori. One advantage is that equal ns are not required.
Logic: if H0 is true and all means equal one another, requiring a significant overall F-statistic first keeps α_EW fixed at α_PC.
Powerful: no adjustment to α_PC. The most liberal post hoc comparison: highest p(Type I error). Not recommended in most cases; only use when k = 3.

Post hoc procedures: the studentized range statistic q
The t-distribution was derived under the assumption of comparing only 2 sample means. With more than 2 means, the sampling distribution of t is NOT appropriate, as p(Type I error) > α; we need sampling distributions based on comparing multiple means.
Studentized range q-distribution: draw k random samples (equal n) from one population, take the difference between the highest and lowest means, and divide by sqrt(MS_W / n_j). This yields the probability of multiple mean differences, and the critical value varies to control α_EW.
Rank order the group means (low to high). r = the range, or distance, between the groups being compared: with 4 means, comparing M1 to M4 gives r = 4; comparing M3 to M4 gives r = 2. r is not part of the calculations; it is used to find the critical value q_crit (look up r, df_W from the ANOVA, and α; q_crit is always positive).
Most tests are of the form: q = (X̄1 - X̄2) / sqrt(MS_W / n_j)

Post hoc procedures: studentized range q
[Slide shows a table of studentized-range critical values q_crit, indexed by r and df_W.]

Post hoc procedures: studentized range q, continued
Note the square root of 2 missing from the denominator: each critical value (q_crit) in the q-distribution has already been multiplied by the square root of 2.

q = (X̄1 - X̄2) / sqrt(MS_W / n_j)   vs.   t = (X̄1 - X̄2) / sqrt(2·MS_W / n)

q assumes all samples are of the same n; unequal ns can lead to inaccuracies depending on the group size differences. If ns are unequal, the alternatives are: compute the harmonic mean of the ns (if they differ only slightly); equal variances: Tukey-Kramer, Gabriel, Hochberg's GT2; unequal variances: Games-Howell.
Post hoc tests that rely on the studentized range distribution: Tukey HSD, Tukey's b, S-N-K, Games-Howell, REGWQ, Duncan.

Post hoc procedures: Tukey's HSD test
Based on the premise that if the Type I error rate can be controlled for the comparison involving the largest and smallest means, it is controlled for all comparisons. A significant ANOVA is NOT required.
q_crit is based on df_W, α_EW (table .05), and the largest r: if we had 5 means, all comparisons would be evaluated using q_crit based on r = 5. q_crit is compared to q_obt; MS_W comes from the ANOVA.
One of the most conservative post hoc comparisons, with good control of α_EW. Compared to LSD: HSD is less powerful with 3 groups (more Type II error), and more conservative, with less Type I error, with more than 3 groups. Preferred with more than 3 groups.
Fisher's LSD is the most liberal; Tukey's HSD is nearly the most conservative; the other procedures fall in between.

Post hoc procedures: confidence intervals for HSD
Simultaneous confidence intervals for all possible pairs of population means at the same time:

μ1 - μ2 ∈ (X̄1 - X̄2) ± q_crit·sqrt(MS_W / n_j) = (X̄1 - X̄2) ± HSD

If the interval DOES include zero: fail to reject H0; the means are not shown to differ. If the interval does NOT include zero: REJECT H0; there is evidence of a DIFFERENCE.
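A sketch of the simultaneous interval for one pair, using the example's means and MS_W; the q_crit of 3.77 is an assumed value from a standard q-table (r = 3, df_W = 12, α = .05):

```python
import math

def tukey_ci(mean1, mean2, q_crit, ms_w, n):
    """Simultaneous CI for one pair: (M1 - M2) +/- q_crit * sqrt(MS_W / n)."""
    hsd = q_crit * math.sqrt(ms_w / n)
    diff = mean1 - mean2
    return diff - hsd, diff + hsd

# Largest vs smallest mean from the example; q_crit = 3.77 assumed from a table
lo, hi = tukey_ci(9.2, 6.2, 3.77, 1.90, 5)
print(round(lo, 2), round(hi, 2))   # interval excludes zero -> reject H0
```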

Post hoc procedures: Scheffé test
The most conservative and least powerful. Uses the F- rather than the t-distribution to find the critical value: F_Scheffé = (k - 1)·F_crit(k - 1, N - k). F_Scheffé is then the critical value used in testing: compare F_Contrast to F_Scheffé. Scheffé recommended running his test with α_EW = .10.
Similar in spirit to Bonferroni: α_PC is effectively set by allowing for all possible linear contrasts AND pairwise contrasts. Not recommended in most situations; only use for complex post hoc comparisons.
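The Scheffé critical value is simple arithmetic; F_crit(2, 12) = 3.89 below is an assumed table value at α = .05 for k = 3 groups and N = 15:

```python
def scheffe_critical(k, f_crit):
    """F_Scheffe = (k - 1) * F_crit(k - 1, N - k)."""
    return (k - 1) * f_crit

# k = 3 groups; F_crit(2, 12) = 3.89 at alpha = .05 (assumed table value)
print(round(scheffe_critical(3, 3.89), 2))   # 7.78
```

Any contrast whose F exceeds this inflated critical value is significant under Scheffé's procedure.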

Post hoc procedures: recommendations
- 1 pairwise comparison of interest: standard independent-samples t-test.
- Several pairwise comparisons: k = 3, LSD; k > 3, HSD or other alternatives such as Tukey-b or REGWQ; control group vs. a set of treatment groups, Dunnett's.
- 1 complex comparison (linear contrast): no adjustment.
- Several complex comparisons (linear contrasts): non-orthogonal, Scheffé test; orthogonal, use a more conservative α_PC.

Analysis of trend components
Try this when the independent variable (IV) is highly ordinal or truly continuous underneath.
- LINEAR regression: run a linear regression with the IV as the predictor; compare the F-statistic's p-value for source = regression to the ANOVA's source = between.
- CURVILINEAR regression: create a new variable equal to the IV squared; run a linear regression with BOTH the original IV and the squared IV as predictors; compare the F-statistic's p-value for source = regression.

Conclusion
Not all researchers agree about the best approach or methods. Method selection depends on: researcher preference (conservative vs. liberal), the seriousness of making a Type I vs. a Type II error, equal or unequal ns, and homo- or heterogeneity of variance. You can also run mixes of pairwise and complex comparisons. Adjusting α_PC downward decreases p(Type I error) but increases p(Type II error).
A priori comparisons are more powerful than post hoc and are the better choice: they are fewer in number, more meaningful, and they force thinking about the analysis in advance.
