Multiple Pairwise Comparison Procedures in One-Way ANOVA with Fixed Effects Model

When the omnibus F-test for ANOVA rejects the null hypothesis that all α_i's = 0, or equivalently that all μ_i are the same, then one usually wants to determine specifically which of the μ_i's differ from the others. In two-population t-tests, only a single comparison between population means (μ1 with μ2) is made. In ANOVA, more than two populations is standard, and multiple pairwise comparisons (i.e., μ1-μ2, μ1-μ3, and μ2-μ3 for three populations) or multiple linear contrasts (i.e., μ1 + μ2 - 2μ3, etc.) are often of interest. In these cases, comparisons are made simultaneously, and are therefore dependent upon the same sample data. As a result, it is standard practice to pool the estimate of within-group variance (MS_W). However, in comparing multiple hypotheses simultaneously an important complication arises. The existence of multiple dependent probabilities derived from each comparison implies that the joint probability for a family of comparisons together is greater than for each one separately. Thus, if one specifies Type I error α = 0.05 for one test, then the familywise α_fw for all tests together is always greater than α (e.g., less significant - see Zar 2010 Ch. 10).

From the mathematical end of things, statisticians routinely caution experimenters about the potential pitfalls of post hoc (analysis following data collection, or "unplanned") "data snooping" (to borrow a term from Neter et al. 1996). By this, they mean running a large number of simultaneous tests or confidence intervals, and then proceeding to report "statistically significant" findings as if discovered outside the context of all the tests run, or worse, as the result of a strategically chosen a priori (before data collection, or "planned") experimental design. The problem is that if enough simultaneous tests are run, the laws of probability predict that some tests will end up showing significance merely due to chance. This mathematically based caution is certainly correct.

Given this, are one or more "significant" planned or unplanned results in ANOVA tests to be considered valid or not? Much depends on what exactly is meant by the foundational concept of α in often widely differing theoretical and experimental contexts. Whereas mathematicians might like to draw a bright line between a priori and post hoc, rarely in experimental practice is the distinction so clear. All experiments exist within a framework of pre-existing literature and laboratory/field data collection. In addition, data collection in research often involves cycles of refinement, and thus is rarely a one-shot deal. So of course biologists regularly conduct complex "data snooping" in conceiving of problems, designing studies, and analyzing results. They could hardly do otherwise, and engaging in a debate about what a researcher knew about the data, and when he or she knew it relative to some often abstract point in time, is usually very silly. However, in my opinion, the a priori vs. post hoc distinction remains of interest from both theoretical and practical standpoints, and being aware of the issues involved makes it possible to construct stronger scientific arguments. The distinction also points to clear limitations in statistical reasoning in the sciences, to the extent that all of it must be acknowledged to be nothing more than a rough approximation. If one's data are overwhelmingly clear, then difficulties in the approximation of probabilities do not really matter.
However, if the data are unclear, then how one employs the approximation may influence what one might say within a test, but not necessarily what one might conclude. The take-home message remains the same: the data remain ambiguous. Biological interpretation, and experimental replication, must necessarily take precedence over abstract mathematical reasoning.

Simultaneous Inference Procedures in Practice:

Several multiple comparison procedures (MCPs) have been developed to adjust the probabilities of tests and the associated confidence interval widths to accommodate familywise assessments. Some methods explicitly permit "data snooping" whereas others do not. It is important to be aware of how, and under what circumstances, each procedure is employed. As a practical matter, of course, standard statistical packages offer a full battery of possibilities, and if the data permit, use of the "most conservative" (i.e., widest) confidence intervals is often considered evidence of good experimental design.
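To make the familywise inflation concrete, here is a minimal R sketch (an illustration, not part of the original worksheet) that tabulates α_fw = 1 - (1 - α)^m for m independent comparisons at α = 0.05, together with the Bonferroni upper bound m·α. For dependent comparisons the exact value differs, but α_fw still grows with m and is still bounded above by m·α.

#FAMILYWISE ERROR RATE ILLUSTRATION (sketch, not from the worksheet):
alpha=0.05
m=1:10                                   # number of simultaneous comparisons
alpha_fw=1-(1-alpha)^m                   # P(at least one Type I error) for independent tests
round(cbind(m, alpha_fw, bonferroni_bound=pmin(m*alpha,1)), 4)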

Data Structure: k groups with not necessarily the same numbers of observations and different means. Let index i,j indicate the ith column (treatment class) and jth row (object). Each treatment class #1, #2, #3, ..., #k contains objects (replicates) 1, 2, 3, ..., n_i, with sample sizes n1, n2, n3, ..., nk and sample means X1bar, X2bar, X3bar, ..., Xkbar.

Model: X_ij = μ + α_i + ε_ij

where: μ is the grand mean of all objects; α_i is the treatment effect, so μ_i = μ + α_i is the mean for each class i; and ε_ij is the error term specific to each object i,j.

Restriction: Σ_i n_i α_i = 0, or Σ_i α_i = 0, or α_k = 0   < the restriction allows estimation of k parameters. See Rosner 2006 p. 558.

Assumptions: the ε_ij are a random sample ~ N(0, σ²).

#ONE WAY ANOVA TABLE
#MULTIPLE t TESTS
ZAR=read.table("c:/DATA/Biostatistics/ZarEX10.3R.txt",header=TRUE)
ZAR
attach(ZAR)
Y=potassium
X=factor(variety)
options(digits=12)

#DESCRIPTIVE STATISTICS:
X1bar=mean(Y[X=="1"])
X1bar
X2bar=mean(Y[X=="2"])
X2bar
X3bar=mean(Y[X=="3"])
X3bar
n1=length(Y[X=="1"])
n1
n2=length(Y[X=="2"])
n2
n3=length(Y[X=="3"])
n3
s1=sqrt(var(Y[X=="1"]))
s1
s2=sqrt(var(Y[X=="2"]))
s2
s3=sqrt(var(Y[X=="3"]))
s3

#GENERATING ANOVA TABLE:
anova(lm(Y~X))

Descriptive statistics derived from R (Zar Example 10.3):
k = 3      < number of classes
N = 18     < total number of cases
means:  X1bar = 26.9833333333    X2bar = 25.6666666667    X3bar = 29.55
n:      n1 = 6                   n2 = 6                   n3 = 6
s:      s1 = 0.679460570355      s2 = 1.13078144072       s3 = 1.26767503722

Analysis of Variance Table (R output):
Response: Y
           Df  Sum Sq       Mean Sq      F value   Pr(>F)
X           2  46.80333333  23.40166667  20.97341  4.5151e-05 ***
Residuals  15  16.73666667   1.11577778
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
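As a quick cross-check (a sketch, not part of the worksheet), the Residuals Mean Sq can be reproduced from the three group standard deviations reported above, since MS_W is the pooled within-group variance:

#CHECKING MS_W FROM THE GROUP STANDARD DEVIATIONS (sketch):
s1=0.679460570355
s2=1.13078144072
s3=1.26767503722
n1=6; n2=6; n3=6
MSW=((n1-1)*s1^2+(n2-1)*s2^2+(n3-1)*s3^2)/(n1+n2+n3-3)
MSW                                      # 1.11577778, matching the Mean Sq for Residuals above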

One-Way ANOVA Table: values taken from R's ANOVA table:

Source    SS                        df                  MS
Between   SS_B = 46.80333333        df_B = k - 1 = 2    MS_B = SS_B/df_B = 23.4017
Within    SS_W = 16.73666667        df_W = N - k = 15   MS_W = SS_W/df_W = 1.1158
TOTAL     SS_T = SS_B + SS_W = 63.54

Multiple t-test / Fisher's LSD Test for Specific Treatment Pairs:

This test offers a "solution" to the problem of distinguishing single Type I error α versus familywise α_fw in multiple comparisons that ignores the issue entirely! According to Sheskin 1997 (Handbook of Parametric and Nonparametric Statistical Procedures), "multiple t-tests" is the term often employed for planned comparisons, whereas "Fisher's Least Significant Difference test" often refers to comparisons made post hoc. However, the two are computationally equivalent. Among MCPs this procedure is the most powerful and most capable of detecting differences between means, but, of course, it also carries the highest level of familywise error α_fw. The multiple t-test/Fisher's LSD is often employed when the researcher feels that "data snooping" is not a major issue in the study and/or the number of multiple comparisons is relatively low. Of course, this is a judgement call. So if the data permit, use of one of the procedures below is more "conservative" and is often judged to be more prudent. Studies may report both - if it matters.

Hypotheses:
H0: μ_i = μ_j for specific i & j   < Means in treatment classes i & j are the same
H1: μ_i ≠ μ_j for specific i & j   < Two-sided test

Test Statistic:
t_1 = (X1bar - X2bar) / sqrt(MS_W (1/n1 + 1/n2))    t_1 = 2.159     < comparing sample 1 with 2
t_2 = (X1bar - X3bar) / sqrt(MS_W (1/n1 + 1/n3))    t_2 = -4.2086   < comparing sample 1 with 3
t_3 = (X2bar - X3bar) / sqrt(MS_W (1/n2 + 1/n3))    t_3 = -6.3676   < comparing sample 2 with 3
^ Note that the pooled variance MS_W is used here, making the calculation of the t statistics a little different from the two-population calculation.

Sampling Distribution of the test statistic t:
If the Assumptions hold and H0 is true, then t ~ t(N-k)
where: k = number of classes, N = total number of observations
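The three t statistics can be reproduced in R directly from the summary values above (a sketch, not part of the original worksheet):

#REPRODUCING THE POOLED-VARIANCE t STATISTICS (sketch):
X1bar=26.9833333333; X2bar=25.6666666667; X3bar=29.55
n1=6; n2=6; n3=6
MSW=1.11577778
t1=(X1bar-X2bar)/sqrt(MSW*(1/n1+1/n2))   #  2.159
t2=(X1bar-X3bar)/sqrt(MSW*(1/n1+1/n3))   # -4.2086
t3=(X2bar-X3bar)/sqrt(MSW*(1/n2+1/n3))   # -6.3676
c(t1,t2,t3)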

Critical Value of the Test:
α = 0.05   < Probability of Type I error must be explicitly set
C_1 = qt(α/2, N-k) = -2.1314
C_2 = qt(1 - α/2, N-k) = 2.1314
C = |C_1| = C_2 = 2.1314
< Note: degrees of freedom = N - k = 15. This also differs from the two-population calculation.

Decision Rule:
IF |t| > C, THEN REJECT H0; OTHERWISE ACCEPT H0

Probability Value:
t_1 = 2.159:     P = 2(1 - pt(t_1, N-k)) = 0.04746       P = 2 pt(t_1, N-k) = 1.95254         < use this form if t < 0
t_2 = -4.2086:   P = 2(1 - pt(t_2, N-k)) = 1.99924       P = 2 pt(t_2, N-k) = 0.00076         < use this form if t < 0
t_3 = -6.3676:   P = 2(1 - pt(t_3, N-k)) = 1.999987      P = 2 pt(t_3, N-k) = 1.263977e-05    < use this form if t < 0

Prototype in R:
#PAIRWISE COMPARISON OF MEANS:
#MULTIPLE t TEST/FISHER'S LSD:
pairwise.t.test(Y,X,p.adj="none",alternative="two.sided")

        Pairwise comparisons using t tests with pooled SD
data: Y and X
   1       2
2  0.04746 -
3  0.00076 1.3e-05
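The two branches above collapse into a single expression using the absolute value of t. A short sketch (not from the worksheet) reproducing the unadjusted p-values:

#TWO-SIDED P VALUES FROM THE t STATISTICS (sketch):
N=18; k=3
tstat=c(2.159,-4.2086,-6.3676)
P=2*pt(-abs(tstat),df=N-k)
round(P,5)                               # 0.04746 0.00076 0.00001 (last value = 1.3e-05, as above)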

Multiple t-test / Fisher's LSD Confidence Intervals:

CI_1 = (X1bar - X2bar) ± C sqrt(MS_W (1/n1 + 1/n2))    CI_1 = (0.0168, 2.6165)
CI_2 = (X1bar - X3bar) ± C sqrt(MS_W (1/n1 + 1/n3))    CI_2 = (-3.8665, -1.2668)
CI_3 = (X2bar - X3bar) ± C sqrt(MS_W (1/n2 + 1/n3))    CI_3 = (-5.1832, -2.5835)

Bonferroni Multiple Comparisons Procedure:

If a specific and relatively small set of simultaneous tests is desired, this procedure will often give the narrowest confidence intervals among the MCPs that worry about α_fw, and it is often preferred for that reason. Since the Bonferroni method requires identifying a specific set of simultaneous tests, it is not considered appropriate for post hoc "data snooping". The Bonferroni approach of adjusting α_fw in a way that is directly related to a specified number of multiple comparisons is very widespread among statistical methods.

Hypotheses:
H0: μ_i = μ_j for specific i & j   < Means in treatment classes i & j are the same
H1: μ_i ≠ μ_j for specific i & j   < Two-sided test

Test Statistic: same as calculated above:
t_1 = 2.159     < comparing sample 1 with 2
t_2 = -4.2086   < comparing sample 1 with 3
t_3 = -6.3676   < comparing sample 2 with 3
< Note that the pooled variance MS_W is used here, making the calculation of the t statistics a little different from the two-population calculation.
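A sketch (not part of the worksheet) reproducing the Fisher's LSD intervals directly from the summary values and the critical value C = qt(0.975, 15):

#FISHER'S LSD CONFIDENCE INTERVALS (sketch):
C=qt(0.975,15)                           # 2.1314
SE=sqrt(1.11577778*(1/6+1/6))            # common standard error, equal n = 6
d=c(26.9833333333-25.6666666667, 26.9833333333-29.55, 25.6666666667-29.55)
cbind(lower=d-C*SE, upper=d+C*SE)
# rows: (0.0168, 2.6165)  (-3.8665, -1.2668)  (-5.1832, -2.5835)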

Sampling Distribution of the test statistic t:
If the Assumptions hold and H0 is true, then t ~ t(N-k)
where: k = number of classes, N = total number of observations

Critical Value of the Test:
α = 0.05   < Probability of Type I error must be explicitly set
g = 3      < Bonferroni factor = number of planned comparisons; here, the three pairwise comparisons of means
C_1 = qt(α/(2g), N-k) = -2.6937
C_2 = qt(1 - α/(2g), N-k) = 2.6937
C_B = |C_1| = C_2 = 2.6937
< Note: degrees of freedom = N - k = 15. Note also the Bonferroni factor g.

Decision Rule:
IF |t| > C_B, THEN REJECT H0; OTHERWISE ACCEPT H0

Probability Values (note the Bonferroni factor g below; adjusted values greater than 1 are reported as 1 by R):
t_1 = 2.159:     P = 2g(1 - pt(t_1, N-k)) = 0.14238       P = 2g pt(t_1, N-k) = 5.85762        < use this form if t < 0
t_2 = -4.2086:   P = 2g(1 - pt(t_2, N-k)) = 5.997721      P = 2g pt(t_2, N-k) = 0.002279       < use this form if t < 0
t_3 = -6.3676:   P = 2g(1 - pt(t_3, N-k)) = 5.999962      P = 2g pt(t_3, N-k) = 3.79193e-05    < use this form if t < 0

Prototype in R:
#BONFERRONI COMPARISON OF MEANS:
pairwise.t.test(Y,X,p.adj="bonferroni",alternative="two.sided")

        Pairwise comparisons using t tests with pooled SD
data: Y and X
   1      2
2  0.1424 -
3  0.0023 3.8e-05
P value adjustment method: bonferroni
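The same Bonferroni adjustment can be obtained by multiplying the unadjusted two-sided p-values by g = 3 (capped at 1), or equivalently with p.adjust(). A sketch, not from the worksheet:

#BONFERRONI-ADJUSTED P VALUES BY HAND (sketch):
P_raw=c(0.04746,0.00076,1.263977e-05)    # unadjusted p-values from the Fisher's LSD section
pmin(3*P_raw,1)                          # 0.14238 0.00228 3.79e-05
p.adjust(P_raw,method="bonferroni")      # same values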

Bonferroni Confidence Intervals:

CI_B1 = (X1bar - X2bar) ± C_B sqrt(MS_W (1/n1 + 1/n2))    CI_1 = (0.0168, 2.6165)      CI_B1 = (-0.3261, 2.9595)
CI_B2 = (X1bar - X3bar) ± C_B sqrt(MS_W (1/n1 + 1/n3))    CI_2 = (-3.8665, -1.2668)    CI_B2 = (-4.2095, -0.9239)
CI_B3 = (X2bar - X3bar) ± C_B sqrt(MS_W (1/n2 + 1/n3))    CI_3 = (-5.1832, -2.5835)    CI_B3 = (-5.5261, -2.2405)
^ Note: the Bonferroni intervals are wider than Fisher's LSD.

Holm Simultaneous Testing Procedure:

This procedure is an iterative refinement of the Bonferroni approach designed to provide a simultaneous probability of α_fw for a specific set of tests. Holm sometimes rejects a null hypothesis that Bonferroni would not with the same data and is, thus, more powerful. However, Holm is computationally more complex and lacks a direct computation of confidence intervals. Although Holm may be the preferred method for theoretical reasons, the power consideration by itself may not necessarily be a good reason for choosing the test. As with Bonferroni, this method is unsuitable for "data snooping".

Prototype in R:
#HOLM COMPARISON OF MEANS:
pairwise.t.test(Y,X,p.adj="holm",alternative="two.sided")

        Pairwise comparisons using t tests with pooled SD
data: Y and X
   1      2
2  0.0475 -
3  0.0015 3.8e-05
P value adjustment method: holm
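Holm's step-down adjustment can also be done by hand: order the unadjusted p-values from smallest to largest, multiply them by g, g-1, ..., 1, and enforce monotonicity. A sketch (not from the worksheet) matching the R output above:

#HOLM-ADJUSTED P VALUES BY HAND (sketch):
P_raw=c(0.04746,0.00076,1.263977e-05)
o=order(P_raw)                           # indices of the p-values, smallest first
P_holm=cummax(P_raw[o]*(3:1))[order(o)]  # multiply by g, g-1, 1; enforce monotonicity; restore original order
P_holm                                   # 0.0475 0.0015 3.8e-05
p.adjust(P_raw,method="holm")            # same values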

Tukey Multiple Pairwise Comparisons Procedure:

This procedure is designed to provide a simultaneous probability of α_fw when comparing the means of all possible pairs of populations within the ANOVA data structure. When sample sizes n_i differ, this procedure is also called the Tukey-Kramer procedure. Post hoc analysis is permitted with this procedure as long as one restricts the "snooping" to pairwise comparisons of population means.

Tukey Test Statistic:
q_1 = |X1bar - X2bar| / sqrt((MS_W/2)(1/n1 + 1/n2))    q_1 = 3.053251732725
q_2 = |X1bar - X3bar| / sqrt((MS_W/2)(1/n1 + 1/n3))    q_2 = 5.951908441386
q_3 = |X2bar - X3bar| / sqrt((MS_W/2)(1/n2 + 1/n3))    q_3 = 9.00516017411

Sampling Distribution of the test statistic q:
If the Assumptions hold and H0 is true, then q ~ q(k, N-k), the "studentized range" distribution
where: k = number of classes, N = total number of observations

Critical Value of the Test:
α = 0.05   < Probability of Type I error must be explicitly set
N = 18
k = 3
C_T = (1/sqrt(2)) qtukey(1 - α, k, N-k)
< The studentized range distribution doesn't exist in MathCad, so I'll calculate it using the function qtukey() in R:

#CALCULATING C FROM THE STUDENTIZED RANGE DISTRIBUTION:
alpha=0.05
N=18
k=3
qtukey(1-alpha,k,N-k)
[1] 3.67337762497

C_T = 3.67337762497/sqrt(2) = 2.5975   < critical value constructed from the "studentized range" distribution, scaled by 1/sqrt(2) to put it on the t-statistic scale used for the confidence intervals below.
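A sketch (not from the worksheet) reproducing the q statistics, which for equal n are simply the pooled-variance t statistics rescaled by sqrt(2):

#TUKEY q STATISTICS FROM THE SUMMARY VALUES (sketch):
MSW=1.11577778; n=6
d=c(26.9833333333-25.6666666667, 26.9833333333-29.55, 25.6666666667-29.55)
q=abs(d)/sqrt((MSW/2)*(1/n+1/n))
q                                        # 3.0533 5.9519 9.0052
sqrt(2)*abs(c(2.159,-4.2086,-6.3676))    # same values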

Decision Rule:
IF q > qtukey(1 - α, k, N-k) = 3.6734, THEN REJECT H0; OTHERWISE ACCEPT H0
(equivalently, IF |t| > C_T, since q = sqrt(2)|t| and C_T = qtukey(1 - α, k, N-k)/sqrt(2))

Probability Values:
#CALCULATING P FROM THE STUDENTIZED RANGE DISTRIBUTION:
#q STATISTICS FROM MATHCAD:
q1=3.053251732725
q2=5.951908441386
q3=9.00516017411
options(digits=12)
#FOR SAMPLE 1 VS 2:
P1=1-ptukey(q1,k,N-k)
P1
#FOR SAMPLE 1 VS 3:
P2=1-ptukey(q2,k,N-k)
P2
#FOR SAMPLE 2 VS 3:
P3=1-ptukey(q3,k,N-k)
P3

[1] 0.111388393096
[1] 0.00206212040461
[1] 3.55985514127e-05

Tukey Confidence Intervals for Multiple Pairwise Comparisons:

CI_T1 = (X1bar - X2bar) ± C_T sqrt(MS_W (1/n1 + 1/n2))
    CI_1 = (0.0168, 2.6165)      CI_B1 = (-0.3261, 2.9595)    CI_T1 = (-0.2674, 2.9008)
CI_T2 = (X1bar - X3bar) ± C_T sqrt(MS_W (1/n1 + 1/n3))
    CI_2 = (-3.8665, -1.2668)    CI_B2 = (-4.2095, -0.9239)   CI_T2 = (-4.1508, -0.9826)
CI_T3 = (X2bar - X3bar) ± C_T sqrt(MS_W (1/n2 + 1/n3))
    CI_3 = (-5.1832, -2.5835)    CI_B3 = (-5.5261, -2.2405)   CI_T3 = (-5.4674, -2.2992)
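A sketch (not from the worksheet) reproducing the Tukey intervals from C_T and the same pooled standard error used for the t statistics:

#TUKEY CONFIDENCE INTERVALS (sketch):
CT=qtukey(0.95,3,15)/sqrt(2)             # 2.5975
SE=sqrt(1.11577778*(1/6+1/6))
d=c(26.9833333333-25.6666666667, 26.9833333333-29.55, 25.6666666667-29.55)
cbind(lower=d-CT*SE, upper=d+CT*SE)
# rows: (-0.2674, 2.9008)  (-4.1508, -0.9826)  (-5.4674, -2.2992)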

Prototype in R:
#TUKEY HSD TEST:
TukeyHSD(aov(Y~X),conf.level=0.95)

  Tukey multiple comparisons of means
    95% family-wise confidence level
Fit: aov(formula = Y ~ X)

$X
     diff            lwr              upr             p adj
2-1  -1.31666666667  -2.900752846105  0.267419512772  0.111388393032
3-1   2.56666666667   0.982580487228  4.150752846105  0.002062120403
3-2   3.88333333333   2.299247153895  5.467419512772  0.000035598551

^ The P values and interval widths match those calculated above; the signs differ for CI_T2 and CI_T3 because TukeyHSD reports the differences as 3-1 and 3-2 rather than 1-3 and 2-3.
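As a usage note (not part of the worksheet), TukeyHSD objects also have a plot method that draws the family-wise confidence intervals; pairs whose interval excludes zero differ significantly.

#PLOTTING THE TUKEY INTERVALS (sketch):
plot(TukeyHSD(aov(Y~X),conf.level=0.95))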