Lecture 14: ANOVA and the F-test

S. Massa, Department of Statistics, University of Oxford, 3 February 2016

Example
Consider a study of 973 individuals examining the relationship between duration of breastfeeding and adult intelligence. Each individual took 3 tests, and breastfeeding duration was recorded in 5 classes.

                               Duration of breastfeeding (months)
Test                           ≤ 1     2-3     4-6     7-9     > 9
N                              272     305     269     104      23
Verbal IQ        Adj. mean    99.7   102.3   102.7   105.7   103.0
                 SD           16.0    14.9    15.7    13.3    15.2
Performance IQ   Adj. mean    99.1   100.6   101.3   105.1   104.4
                 SD           15.8    15.2    15.6    13.9    14.9
Full Scale IQ    Adj. mean    99.4   101.7   102.3   106.0   104.0
                 SD           15.9    15.2    15.7    13.1    14.4

Example
First, notice that the table lists adjusted means. The raw data have been analysed to remove the effects of confounding factors (maternal smoking, parental income, etc.), so that the effect of breastfeeding can be isolated. We want to test whether the duration of breastfeeding affects adult intelligence.

The General Setup
Suppose we have independent samples from $K$ different normal distributions, with means $\mu_1, \dots, \mu_K$ and common variance $\sigma^2$. We test
$$H_0 : \mu_1 = \dots = \mu_K.$$
We call the $K$ groups levels. We have $n_i$ samples from the $i$-th level, $X_{i1}, X_{i2}, \dots, X_{i n_i}$, and $N = \sum_{i=1}^{K} n_i$ total observations. The overall sample mean is $\bar X$, while the sample mean of group $i$ is $\bar X_i$:
$$\bar X_i = \frac{1}{n_i} \sum_{j=1}^{n_i} X_{ij}, \qquad \bar X = \frac{1}{N} \sum_{i,j} X_{ij}.$$

The Idea Behind ANOVA
If the $K$ means are all equal, then the observations should be about as far from their own level mean $\bar X_i$ as they are from the overall mean $\bar X$. If the means are different, then observations should be noticeably closer to the mean of their own level than to the overall mean.

Between Groups and Error Sum of Squares
The Between Groups Sum of Squares (BSS) is the total squared deviation of the group means from the overall mean, with each group weighted by its size:
$$BSS = \sum_{i=1}^{K} n_i (\bar X_i - \bar X)^2.$$
The Error Sum of Squares (ESS) is the total squared deviation of the samples from their group means:
$$ESS = \sum_{i=1}^{K} \sum_{j=1}^{n_i} (X_{ij} - \bar X_i)^2 = \sum_{i=1}^{K} (n_i - 1) s_i^2,$$
where $s_i$ is the sample SD of the observations in level $i$.

Total Sum of Squares
The Total Sum of Squares (TSS) is the total squared deviation of the samples from the overall mean:
$$TSS = \sum_{i,j} (X_{ij} - \bar X)^2 = (N - 1) s^2,$$
where $s$ is the sample SD of all observations together. We also define the mean squares
$$BMS = \frac{BSS}{K - 1}, \qquad EMS = \frac{ESS}{N - K}.$$

ANOVA
Two basic mathematical facts lie behind ANOVA.
First: $TSS = ESS + BSS$. The variability in the data can be split into two pieces:
1. the variability among the means of the groups (BSS);
2. the variability within the groups (ESS).
We evaluate how the total variability is split between the two types: too much between-group variability casts doubt on the validity of the null hypothesis.
Second: under the null hypothesis, both EMS and BMS are estimates of $\sigma^2$.
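As a sketch of why the first fact holds, one can add and subtract the group mean inside each squared deviation. The derivation below uses only the definitions of $\bar X_i$, $\bar X$, BSS and ESS given earlier; the cross term vanishes because deviations from a group mean sum to zero within that group.

```latex
\begin{align*}
TSS &= \sum_{i=1}^{K} \sum_{j=1}^{n_i} (X_{ij} - \bar X)^2
     = \sum_{i=1}^{K} \sum_{j=1}^{n_i}
       \bigl[ (X_{ij} - \bar X_i) + (\bar X_i - \bar X) \bigr]^2 \\
    &= \sum_{i=1}^{K} \sum_{j=1}^{n_i} (X_{ij} - \bar X_i)^2
     + 2 \sum_{i=1}^{K} (\bar X_i - \bar X)
         \underbrace{\sum_{j=1}^{n_i} (X_{ij} - \bar X_i)}_{=\,0}
     + \sum_{i=1}^{K} n_i (\bar X_i - \bar X)^2 \\
    &= ESS + BSS.
\end{align*}
```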

The F-statistic
The F-statistic is
$$F = \frac{BMS}{EMS} = \frac{N - K}{K - 1} \cdot \frac{BSS}{ESS}.$$
The critical region is of the form $\{F \ge f\}$, where $f$ depends on the significance level of the test. We reject the null hypothesis for large values of $F$, that is, when BMS is much bigger than EMS.

The F-distribution
Under the null hypothesis the F-statistic has the F-distribution with $(K - 1, N - K)$ degrees of freedom. This is a continuous distribution on the positive real numbers with two parameters.
Figure: The probability density function of the F-distribution for various degrees of freedom.
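The null distribution can be checked empirically with a small simulation. The sketch below is illustrative only: the design (K = 3 groups of n = 10 standard normal observations), the seed, the number of trials and the helper names `f_statistic` and `rejection_rate` are all assumptions made for the example. The cut-off 3.35 is the tabulated 5% critical value of F(2, 27), so the simulated rejection rate should come out close to 0.05.

```python
import random


def f_statistic(groups):
    """Return the one-way ANOVA statistic F = BMS / EMS for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-groups and error sums of squares, as defined in the lecture.
    bss = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ess = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (bss / (k - 1)) / (ess / (n_total - k))


def rejection_rate(k=3, n=10, critical=3.35, trials=20000, seed=0):
    """Fraction of data sets simulated under H0 whose F reaches `critical`."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatability
    hits = 0
    for _ in range(trials):
        # All groups share the same mean and variance, so H0 is true.
        groups = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
        if f_statistic(groups) >= critical:
            hits += 1
    return hits / trials


rate = rejection_rate()
print(f"simulated P(F >= 3.35) under H0: {rate:.3f}")
```

Changing the group means inside the simulation (so that H0 is false) should push the rejection rate well above 0.05, which is the behaviour the critical region {F ≥ f} relies on.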

The F-statistic
The important quantities are summarised in the following table:

                          SS     d.f.    MS                     F
Between Groups            BSS    K - 1   BMS = BSS/(K - 1)      BMS/EMS
Within Groups (Errors)    ESS    N - K   EMS = ESS/(N - K)
Total                     TSS    N - 1

ANOVA: the Breastfeeding Study
Recall the data from the breastfeeding study, tabulated in the first example. Using the Full Scale IQ rows, find TSS via TSS = ESS + BSS.

Example: the Breastfeeding Study
Using the group sizes and SDs for Full Scale IQ,
$$ESS = \sum_{k=1}^{5} (n_k - 1) s_k^2 = 271 \times 15.9^2 + 304 \times 15.2^2 + 268 \times 15.7^2 + 103 \times 13.1^2 + 22 \times 14.4^2 \approx 227000.$$

Example: the Breastfeeding Study
Using the group sizes and adjusted means for Full Scale IQ, with overall mean $\bar x \approx 101.7$,
$$BSS = \sum_{k=1}^{5} n_k (\bar x_k - \bar x)^2 = 272 (99.4 - 101.7)^2 + 305 (101.7 - 101.7)^2 + 269 (102.3 - 101.7)^2 + 104 (106.0 - 101.7)^2 + 23 (104.0 - 101.7)^2 \approx 3579,$$
where the final figure carries the unrounded overall mean 101.74.

Example: the Breastfeeding Study
We complete the table as follows.

Table: ANOVA table for breastfeeding data: Full Scale IQ, Adjusted.

                    SS                     d.f.   MS                      F
Between Samples     3579                   4      894.8 (= 3579/4)        3.81 (= 894.8/234.6)
Within Samples      227000                 968    234.6 (= 227000/968)
Total               230600                 972
                    (= 3579 + 227000)

Since N = 973 and K = 5, under the null hypothesis the F-statistic is distributed according to F(4, 968).

Example: the Breastfeeding Study
Having computed F = 3.81, we now look up the critical value in our table for the 0.05 level. Here K - 1 = 4, so we pick the fourth column, and N - K = 968 is much more than 60, so we take the bottom row. The critical value turns out to be 2.37. Since 3.81 > 2.37, we reject the null hypothesis at the 0.05 level.
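The calculation for the breastfeeding study can be reproduced from the summary statistics alone. The sketch below copies the Full Scale IQ rows from the lecture's table; the variable names are illustrative, and the overall mean is recomputed as the size-weighted average of the group means rather than taken as the rounded 101.7.

```python
# Summary statistics for Full Scale IQ, copied from the lecture's table.
n = [272, 305, 269, 104, 23]                 # group sizes
mean = [99.4, 101.7, 102.3, 106.0, 104.0]    # adjusted group means
sd = [15.9, 15.2, 15.7, 13.1, 14.4]          # group standard deviations

K = len(n)
N = sum(n)

# The overall mean is the size-weighted average of the group means.
grand_mean = sum(ni * mi for ni, mi in zip(n, mean)) / N

# ESS = sum of (n_k - 1) s_k^2;  BSS = sum of n_k (mean_k - overall mean)^2.
ess = sum((ni - 1) * si ** 2 for ni, si in zip(n, sd))
bss = sum(ni * (mi - grand_mean) ** 2 for ni, mi in zip(n, mean))

bms = bss / (K - 1)
ems = ess / (N - K)
F = bms / ems

print(f"N = {N}, K = {K}, overall mean = {grand_mean:.2f}")
print(f"ESS = {ess:.0f}, BSS = {bss:.0f}, TSS = {ess + bss:.0f}")
print(f"BMS = {bms:.1f}, EMS = {ems:.1f}, F = {F:.2f}")
```

Only group sizes, means and SDs are needed, which is why the test can be carried out from a published table without access to the raw data.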

The Kruskal-Wallis Test
The F-test relies on one basic assumption: the samples are normally distributed. In other words, the F-test is parametric. The non-parametric version is known as the Kruskal-Wallis test. As with the rank sum test, the basic idea is to substitute the ranks for the actual observed values.

The Kruskal-Wallis test
Suppose there are $K$ levels, with $n_i$ observations in level $i$. Assign to each observation its rank relative to the whole sample. Sum the ranks in each group, giving rank sums $R_1, \dots, R_K$. The Kruskal-Wallis test statistic is
$$H = \frac{12}{N(N + 1)} \sum_{i=1}^{K} \frac{R_i^2}{n_i} - 3(N + 1). \tag{1}$$
Under the null hypothesis, $H$ is approximately $\chi^2_{K-1}$ distributed.

Exercise and Bone Density in Rats
A study was performed to examine the effect of exercise on bone density in rats. 30 rats were divided into three groups of ten: high, low and control. Their bone density was measured at the end of the treatment period. Test
$H_0$: different groups have the same mean density
$H_1$: different groups have different mean density

Exercise and Bone Density in Rats

High      626  650  622  674  626  643  622  650  643  631
Low       594  599  635  605  632  588  596  631  607  638
Control   614  569  653  593  611  600  603  593  621  554

Compute
$$\bar x_{high} = 638.70, \quad s^2_{high} = 275.34; \qquad \bar x_{low} = 612.5, \quad s^2_{low} = 373.61; \qquad \bar x_{cont} = 601.10, \quad s^2_{cont} = 748.77; \qquad \bar x = 617.4.$$
Thus
$$ESS = 9 s^2_{high} + 9 s^2_{low} + 9 s^2_{cont} = 12579.5,$$
$$BSS = 10 (638.7 - 617.4)^2 + 10 (612.5 - 617.4)^2 + 10 (601.1 - 617.4)^2 = 7433.9.$$

Exercise and Bone Density in Rats
ANOVA table:

                              SS       d.f.   MS      F
Between Treatments            7434     2      3717    7.98
Errors (Within Treatments)    12580    27     466
Total                         20014    29

The number of degrees of freedom here is $(K - 1, N - K) = (2, 27)$. There is no row for 27 in our table, so we use the row for 30 and find the critical value to be 3.32 (the exact value for 27 degrees of freedom, about 3.35, leads to the same conclusion). Since 7.98 > 3.32, we reject the null hypothesis at the 5% level.
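The ANOVA table for the rat data can be rebuilt from the raw observations. The sketch below uses Python's standard `statistics` module and follows the lecture's formulas, with ESS computed as the sum of (n_i - 1) s_i^2 and TSS as (N - 1) s^2; the variable names are illustrative. It also confirms numerically that TSS = ESS + BSS.

```python
from statistics import mean, variance

# Bone density measurements, copied from the lecture.
high = [626, 650, 622, 674, 626, 643, 622, 650, 643, 631]
low = [594, 599, 635, 605, 632, 588, 596, 631, 607, 638]
control = [614, 569, 653, 593, 611, 600, 603, 593, 621, 554]
groups = [high, low, control]

K = len(groups)
N = sum(len(g) for g in groups)
pooled = [x for g in groups for x in g]
grand_mean = mean(pooled)

# statistics.variance is the sample variance (divisor n - 1).
ess = sum((len(g) - 1) * variance(g) for g in groups)
bss = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
tss = (N - 1) * variance(pooled)

bms = bss / (K - 1)
ems = ess / (N - K)
F = bms / ems

print(f"Between: SS = {bss:.0f}, d.f. = {K - 1}, MS = {bms:.0f}, F = {F:.2f}")
print(f"Within:  SS = {ess:.0f}, d.f. = {N - K}, MS = {ems:.0f}")
print(f"Total:   SS = {tss:.0f}, d.f. = {N - 1}")
```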

Exercise and Bone Density in Rats I
Now use the Kruskal-Wallis test. First we assign ranks to the data relative to the whole sample, giving tied observations the average of their ranks as usual.

High      18.5  27.5  16.5  30    18.5  25.5  16.5  27.5  25.5  20.5
Low       6     8     23    11    22    3     7     20.5  12    24
Control   14    2     29    4.5   13    9     10    4.5   15    1

The rank sums are then
$$R_{high} = 226.5, \qquad R_{low} = 136.5, \qquad R_{cont} = 102.$$
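The ranking step can be sketched in a few lines. The helper `midranks` below is a hypothetical name introduced for this example; it assumes the usual convention that tied observations share the average of the ranks they occupy, which is what the ranks listed for the rat data reflect.

```python
def midranks(values):
    """Rank values from 1 to N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = average
        start = end + 1
    return ranks


# Bone density measurements, copied from the lecture.
high = [626, 650, 622, 674, 626, 643, 622, 650, 643, 631]
low = [594, 599, 635, 605, 632, 588, 596, 631, 607, 638]
control = [614, 569, 653, 593, 611, 600, 603, 593, 621, 554]

# Rank the pooled sample, then split the ranks back into the three groups.
ranks = midranks(high + low + control)
rank_sums = [sum(ranks[0:10]), sum(ranks[10:20]), sum(ranks[20:30])]
print(rank_sums)
```

A quick sanity check is that the rank sums add up to N(N + 1)/2, which is 465 for N = 30.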

Exercise and Bone Density in Rats II
Then compute
$$H = \frac{12}{N(N + 1)} \sum_{i=1}^{K} \frac{R_i^2}{n_i} - 3(N + 1) = \frac{12}{30 \times 31} \left[ \frac{226.5^2}{10} + \frac{136.5^2}{10} + \frac{102^2}{10} \right] - 3 \times 31 = 10.66.$$
At the 5% level, the critical value for $\chi^2$ with $K - 1 = 2$ degrees of freedom is 5.99. We therefore still reject the null hypothesis.
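The statistic can be checked from the rank sums quoted in the lecture. In the sketch below, `kruskal_wallis_h` is a hypothetical helper name implementing the formula for H exactly as stated, without any correction for ties. The p-value line relies on the fact that, for 2 degrees of freedom only, the upper tail of the chi-squared distribution has the closed form exp(-x/2); for other degrees of freedom a table or a statistics library would be needed.

```python
import math


def kruskal_wallis_h(rank_sums, sizes):
    """Kruskal-Wallis H from group rank sums and group sizes (no tie correction)."""
    n_total = sum(sizes)
    weighted = sum(r ** 2 / n for r, n in zip(rank_sums, sizes))
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Rank sums for the high, low and control groups, as quoted in the lecture.
H = kruskal_wallis_h([226.5, 136.5, 102.0], [10, 10, 10])

# Valid for K - 1 = 2 degrees of freedom only: P(chi2_2 >= x) = exp(-x / 2).
p_value = math.exp(-H / 2)

print(f"H = {H:.2f}, approximate p-value = {p_value:.4f}")
print("reject H0 at the 5% level" if H >= 5.99 else "do not reject H0")
```

Library implementations that apply a tie correction may report a slightly different value of H for these data, since several observations are tied.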

Recap
Given independent samples from $K$ normally distributed populations $N(\mu_1, \sigma^2), \dots, N(\mu_K, \sigma^2)$, we want to test whether the level means $\mu_1, \dots, \mu_K$ could all be equal. We compute
ESS: the squared deviations of observations from their own group mean; and
BSS: the squared deviations of group means from the overall mean.
Failure of the null hypothesis should result in a higher BMS compared to EMS. The F-statistic is defined as
$$F = \frac{N - K}{K - 1} \cdot \frac{BSS}{ESS} = \frac{BMS}{EMS}.$$
Under the null hypothesis $F$ has the F-distribution with $(K - 1, N - K)$ degrees of freedom.

Recap
We summarise our calculation in an ANOVA table:

                              SS         d.f.        MS              F
Between Treatments            BSS (A)    K - 1 (B)   BMS (X = A/B)   X/Y
Errors (Within Treatments)    ESS (C)    N - K (D)   EMS (Y = C/D)
Total                         TSS        N - 1

Recap
The F-test depends crucially on our data being normally distributed. If we have reason to believe this may not be satisfied, then we use the non-parametric Kruskal-Wallis test. Replace the data by their ranks relative to the whole sample, and let $R_i$ be the rank sum in the $i$-th level. Then
$$H = \frac{12}{N(N + 1)} \sum_{i=1}^{K} \frac{R_i^2}{n_i} - 3(N + 1). \tag{2}$$
Under the null hypothesis, $H$ is approximately $\chi^2_{K-1}$ distributed.