Foundation of Medical Statistics
4. ANOVA and the multiple comparisons
Math and Stat in Medical Sciences, Basic Statistics, 26/10/2018

Analysis of variance (ANOVA)
Consider three or more populations $\Omega_1, \Omega_2, \ldots, \Omega_m$ ($m \ge 3$) whose means are $\mu_1, \mu_2, \ldots, \mu_m$. Then
1. Null hypothesis ($H_0$): $\mu_1 = \mu_2 = \cdots = \mu_m$.
2. Alternative hypothesis ($H_1$): $\mu_i \ne \mu_j$ for some $i$ and $j$.
This test is called the (one-way) analysis of variance, ANOVA.

Analysis of variance (ANOVA)
The two-way analysis of variance is used when there are two factors, and the multi-way analysis of variance when there are three or more factors. These are not treated here.

Assumption
Suppose that the factor $X$ is divided into $m$ levels $X_1, \ldots, X_m$.
Each $X_i$ follows a normal distribution.
The variances are equal.
Remark
(1) When $m = 2$, the one-way ANOVA is equivalent to the t test.
(2) Equality of variances can be verified with the Bartlett test or the Levene test.

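Remark (1) can be checked numerically. The sketch below is only an illustration: the two data lists are hypothetical, the function names `pooled_t` and `anova_f` are made up for this example, and the t test is assumed to be Student's two-sample test with a pooled (equal) variance. It uses the F statistic $F_0 = V_A / V_E$ that these slides define later and compares it with $t^2$.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Student's two-sample t statistic with a pooled (equal) variance."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))

def anova_f(groups):
    """One-way ANOVA statistic: between-group variance over within-group variance."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    s_a = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    s_e = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (s_a / (len(groups) - 1)) / (s_e / (len(values) - len(groups)))

# Hypothetical measurements for two groups (not taken from ANOVA.csv)
a = [4.1, 5.0, 5.9, 5.2]
b = [6.3, 7.1, 6.8, 7.6]
t = pooled_t(a, b)
f0 = anova_f([a, b])
print(t ** 2, f0)  # the two values agree up to rounding error
```

With two groups the statistics satisfy $F_0 = t^2$, so the two-sided t test and the one-way ANOVA lead to the same decision.
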
Table of data
Table: Example

Group (level)   Size    Data                                  Total   Mean          Variance
$X_1$           $n_1$   $x_{11}, x_{12}, \ldots, x_{1n_1}$    $T_1$   $\bar{x}_1$   $V_1$
$X_2$           $n_2$   $x_{21}, x_{22}, \ldots, x_{2n_2}$    $T_2$   $\bar{x}_2$   $V_2$
...
$X_m$           $n_m$   $x_{m1}, x_{m2}, \ldots, x_{mn_m}$    $T_m$   $\bar{x}_m$   $V_m$
Total           $N$                                           $T$

The mean of all data is $\bar{x} = T / N$.

Sum of squared deviation between groups
Let $S_T$ be the sum of squared deviations with respect to the total mean $\bar{x}$:
$$S_T = \sum_{i,j} (x_{ij} - \bar{x})^2$$
The sum of squared deviations between groups, $S_A$, is defined by
$$S_A = \sum_{i=1}^{m} n_i (\bar{x}_i - \bar{x})^2$$

Sum of squared deviation within a group
A larger $S_A$ indicates larger differences between the group means.
The sum of squared deviations within groups (sum of squares of errors), $S_E$, is defined by
$$S_E = \sum_{i=1}^{m} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 = (n_1 - 1)V_1 + \cdots + (n_m - 1)V_m$$
where $V_i$ is the unbiased variance of group $i$.

Degrees of freedom
Theorem: $S_T = S_A + S_E$.
Variances and the degrees of freedom
The degrees of freedom of $S_A$ is $\phi_A = m - 1$.
The degrees of freedom of $S_E$ is $\phi_E = N - m$.
The degrees of freedom of $S_T$ is $\phi_T = N - 1$.
Variances: $V_A = S_A / \phi_A$, $V_E = S_E / \phi_E$ (variance of errors).

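The decomposition in the theorem follows from splitting each deviation into a within-group part and a between-group part. A short derivation sketch, using only the symbols defined on the preceding slides:

```latex
\begin{align*}
S_T &= \sum_{i=1}^{m}\sum_{j=1}^{n_i} \bigl[(x_{ij}-\bar{x}_i) + (\bar{x}_i-\bar{x})\bigr]^2 \\
    &= \underbrace{\sum_{i=1}^{m}\sum_{j=1}^{n_i} (x_{ij}-\bar{x}_i)^2}_{S_E}
     + \underbrace{\sum_{i=1}^{m} n_i(\bar{x}_i-\bar{x})^2}_{S_A}
     + 2\sum_{i=1}^{m} (\bar{x}_i-\bar{x}) \sum_{j=1}^{n_i} (x_{ij}-\bar{x}_i) \\
    &= S_E + S_A ,
\end{align*}
% the cross term vanishes because \sum_{j=1}^{n_i} (x_{ij}-\bar{x}_i) = 0 for every i
```

The degrees of freedom add up in the same way: $\phi_T = \phi_A + \phi_E$, since $N - 1 = (m - 1) + (N - m)$.
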
Ratio of variances and F-distribution
Let $F_0 = V_A / V_E$.
Fact: Under $H_0$, $F_0$ follows the F-distribution with degrees of freedom $(\phi_A, \phi_E)$.
[Figure: density curve of an F-distribution]

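The quantities $S_A$, $S_E$, $\phi_A$, $\phi_E$ and $F_0$ can be computed directly from their definitions. The following is a minimal sketch with hypothetical data (not the contents of ANOVA.csv); the function name `one_way_anova` is an assumption made for this illustration.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (S_A, S_E, phi_A, phi_E, F_0) for a list of groups of numbers."""
    values = [x for g in groups for x in g]
    N, m = len(values), len(groups)
    grand_mean = mean(values)
    # Between-group sum of squares S_A
    s_a = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares S_E
    s_e = sum((x - mean(g)) ** 2 for g in groups for x in g)
    phi_a, phi_e = m - 1, N - m
    f0 = (s_a / phi_a) / (s_e / phi_e)
    return s_a, s_e, phi_a, phi_e, f0

# Hypothetical data: three groups of three observations each
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
s_a, s_e, phi_a, phi_e, f0 = one_way_anova(groups)

# Total sum of squares, to check S_T = S_A + S_E
values = [x for g in groups for x in g]
s_t = sum((x - mean(values)) ** 2 for x in values)
print(s_a, s_e, s_t, phi_a, phi_e, f0)
```

For this toy data $S_A = 26$, $S_E = 6$, $S_T = 32$ and $F_0 = 13$ with degrees of freedom $(2, 6)$. The value $F_0$ is then compared with the critical value of the F-distribution, which statistical software or an F table provides.
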
Decision
When $F_0 \ge F^{\phi_A}_{\phi_E}(\alpha)$ (p-value $\le \alpha$): $H_0: \mu_1 = \cdots = \mu_m$ is rejected. Hence $\mu_i \ne \mu_j$ for some $i, j$.
When $F_0 < F^{\phi_A}_{\phi_E}(\alpha)$ (p-value $> \alpha$): $H_0: \mu_1 = \cdots = \mu_m$ cannot be rejected.

Using EZR
Read ANOVA.csv into EZR.
1. Show the boxplots of groups 1 to 4.
2. Verify the equality of variances by the Bartlett test.
3. Perform the one-way ANOVA.

Remark 1
When normality or equality of variances is not satisfied, use the Kruskal-Wallis test, a nonparametric version of the analysis of variance.
The test is available in EZR. In R:
kruskal.test(list(data1, data2, data3, ...))

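To give an idea of what the Kruskal-Wallis test computes, here is a minimal sketch of its rank-based statistic $H = \frac{12}{N(N+1)} \sum_i R_i^2 / n_i - 3(N+1)$, where $R_i$ is the rank sum of group $i$. The data are hypothetical, the function name `kruskal_wallis_h` is made up for this illustration, and the sketch assumes there are no tied values (no tie correction is applied). It does not replace `kruskal.test`, which also reports the p-value.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic, assuming all values are distinct (no ties)."""
    pooled = sorted(x for g in groups for x in g)
    # Rank 1 for the smallest value, rank N for the largest
    rank = {x: r for r, x in enumerate(pooled, start=1)}
    N = len(pooled)
    total = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (N * (N + 1)) * total - 3 * (N + 1)

# Hypothetical data: three groups without tied values
groups = [[1.1, 2.3, 3.0], [2.9, 4.2, 5.1], [6.0, 7.4, 8.8]]
h = kruskal_wallis_h(groups)
print(round(h, 3))
```

For large samples, $H$ is compared with the chi-square distribution with $m - 1$ degrees of freedom.
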
Remark 2
In a two-way ANOVA with repeated observations, the effects of the two factors $X$ and $Y$ and their interaction $X \times Y$ can be tested. This analysis is available in EZR.

Multiple comparison problem
Suppose the above test has rejected the null hypothesis. Then at least one population mean differs from the others.
Question: Which population means differ?
ANOVA does not answer this question.

Misuse of t test
To find out, it seems natural to repeat the t test for all pairs.
But this should not be done. Why?

Because...
If there are 4 populations, we need to do ${}_4C_2 = 6$ tests.
Assuming that the reliability of a single t test is 95% and treating the tests as independent, the total reliability of the 6 t tests is
$$0.95^6 \times 100\% \approx 73.5\%$$
Thus the total reliability is lower than 95%.

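The arithmetic above can be reproduced for any number of groups. A minimal sketch, assuming the tests are independent (which pairwise t tests on shared groups are not exactly); the function name `total_reliability` is an assumption made for this illustration.

```python
from math import comb

def total_reliability(n_groups, per_test=0.95):
    """Number of pairwise tests and their combined reliability,
    treating the tests as independent."""
    n_tests = comb(n_groups, 2)
    return n_tests, per_test ** n_tests

n_tests, reliability = total_reliability(4)
print(n_tests, round(100 * reliability, 1))  # prints: 6 73.5
```

With 5 groups there are already 10 pairs, and the same calculation gives a total reliability of about 60%.
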
Multiple comparisons: Bonferroni method
Various multiple comparison methods have been proposed to avoid this difficulty.
Bonferroni correction: take a smaller significance level for each test, in order to guarantee a reliability of 95% even when the t test is repeated.
Since $(1 - \alpha)^n \ge 1 - n\alpha$, if we take the significance level of each test to be $\alpha/n$, then after $n$ t tests the total significance level is at most $\alpha$.

Example
If $n = 6$ and $\alpha = 0.05$, we may perform the 6 t tests at the significance level $0.05 / 6 \approx 0.0083$.

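Applied to a list of p-values, the Bonferroni correction is a single comparison per test. A minimal sketch; the p-values are hypothetical and the function name `bonferroni` is made up for this illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag each p-value as significant if it is below alpha / n."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]

# Hypothetical p-values from 6 pairwise t tests
p_values = [0.001, 0.012, 0.030, 0.004, 0.200, 0.049]
significant = bonferroni(p_values)
print(significant)  # prints: [True, False, False, True, False, False]
```

Only the p-values below $0.05 / 6 \approx 0.0083$ are declared significant, although four of the six are below the unadjusted level 0.05.
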
Multiple comparisons: Holm method
Bonferroni's method is conservative: as $n$ grows, $\alpha/n$ becomes very small and the power decreases.
Holm's method is an improvement of the Bonferroni method.

Multiple comparisons: Holm method
Perform the t test $n$ times (overall significance level $\alpha$), and arrange the resulting p-values (which the software will output) in ascending order
$$p_1 \le p_2 \le \cdots \le p_n.$$

Procedure
1. If $p_1 < \alpha/n$, $p_1$ is significant.
2. If $p_2 < \alpha/(n - 1)$, $p_2$ is significant.
3. If $p_3 < \alpha/(n - 2)$, $p_3$ is significant, and so on.
4. If $p_k \ge \alpha/(n - k + 1)$ for the first time, $p_k, \ldots, p_n$ are not significant.

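The step-down procedure above can be sketched as follows. The p-values are hypothetical and the function name `holm` is made up for this illustration; the result is reported in the original order of the tests.

```python
def holm(p_values, alpha=0.05):
    """Holm's step-down procedure: returns one True/False flag per p-value."""
    n = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value
    order = sorted(range(n), key=lambda i: p_values[i])
    significant = [False] * n
    for k, i in enumerate(order, start=1):
        if p_values[i] < alpha / (n - k + 1):
            significant[i] = True
        else:
            break  # p_k and all larger p-values are not significant
    return significant

# Hypothetical p-values from 6 pairwise t tests
p_values = [0.001, 0.012, 0.030, 0.004, 0.200, 0.049]
result = holm(p_values)
print(result)  # prints: [True, True, False, True, False, False]
```

Here 0.012 is significant because it is the third smallest p-value and $0.012 < 0.05/4 = 0.0125$. With the plain Bonferroni threshold $0.05/6 \approx 0.0083$ it would not be, which illustrates why Holm's method is less conservative.
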
Multiple comparisons: Tukey-Kramer method
A method to compare all pairs of $m$ groups in one test.
Assumption
Each group follows a normal distribution.
The variances are equal.

Using EZR
Read ANOVA.csv into EZR.
Perform the Tukey-Kramer method.

Using EZR
Result: The simultaneous confidence intervals are displayed.
They show a difference between group 1 and group 3, and between group 1 and group 4.

Multiple comparisons: Dunnett method
A method of comparison between the control group $X_1$ and each of the other groups $X_2, \ldots, X_m$ (there are $m - 1$ comparisons).

Multiple comparisons: nonparametric methods
When normality is not satisfied, use nonparametric methods.
Assumption
The distributions of all groups are assumed to have the same shape.
The sample size of each group is large (10 or more in each group).

Nonparametric methods
Comparisons of all pairs: Steel-Dwass method
Comparisons between the control group and each of the other groups: Steel method
These methods are available in EZR.