1 One-way Analysis of Variance
Suppose that a random sample of $q$ individuals receives treatment $T_i$, $i = 1, 2, \ldots, p$. Let $Y_{ij}$ be the response from the $j$th individual to be treated with the $i$th treatment $T_i$. A mathematical model that relates the expected response to the effectiveness of the treatment administered is

$$Y_{ij} = t_i + \epsilon_{(i)j}, \qquad j = 1, 2, \ldots, q; \; i = 1, 2, \ldots, p,$$

where

$t_i$ = mean influence of the $i$th treatment, $i = 1, 2, \ldots, p$
$\epsilon_{(i)j}$ = random error

(this is in fact a linear model). We make the further assumption that the $\epsilon_{(i)j}$'s are independent $N(0, \sigma^2)$ random variables.

The average response $\bar{Y}_{i+} = \frac{1}{q}\sum_{j=1}^{q} Y_{ij}$ from the individuals which received the $i$th treatment is a good estimator of the mean influence $t_i$ of the $i$th treatment $T_i$ (it is in fact the least squares estimator of $t_i$). This follows from the fact that $\bar{Y}_{i+} \sim N(t_i, \sigma^2/q)$. Consequently the variation between the $p$ $\bar{Y}_{i+}$'s, i.e.

$$\sum_{i=1}^{p} (\bar{Y}_{i+} - \bar{Y}_{++})^2, \quad \text{where } \bar{Y}_{++} = \frac{1}{p}\sum_{i=1}^{p} \bar{Y}_{i+} = \frac{1}{N}\sum_{i=1}^{p}\sum_{j=1}^{q} Y_{ij}, \; N = pq,$$

a variation with $(p-1)$ degrees of freedom, will be entirely due to randomness if the treatments are identical, i.e. if $t_1 = t_2 = \cdots = t_p$. In particular $\frac{1}{p-1}\sum_{i=1}^{p} (\bar{Y}_{i+} - \bar{Y}_{++})^2$ estimates $\sigma^2/q$ if the treatments are identical, i.e. if $t_1 = t_2 = \cdots = t_p$, or more specifically

$$\frac{SS_{Tr}}{p-1} = \frac{q}{p-1}\sum_{i=1}^{p} (\bar{Y}_{i+} - \bar{Y}_{++})^2 \quad \text{estimates } \sigma^2 \text{ if } t_1 = t_2 = \cdots = t_p.$$

However, if the treatments are not identical, $\frac{SS_{Tr}}{p-1}$ must measure (random variability per
source + treatment variability per source). Thus

$$\frac{SS_{Tr}}{p-1} \text{ estimates } \sigma^2 \quad \text{if } t_1 = t_2 = \cdots = t_p \qquad (1)$$

$$\frac{SS_{Tr}}{p-1} \text{ estimates } \sigma^2 + \frac{q}{p-1}\sum_{i=1}^{p}(t_i - \bar{t})^2 \quad \text{if the } t_i \text{ are not all equal} \qquad (2)$$

where $\bar{t} = \frac{1}{p}\sum_{i=1}^{p} t_i$.

If, for a fixed $i$, we now consider the responses $Y_{i1}, Y_{i2}, \ldots, Y_{iq}$ from the individuals which received the $i$th treatment only, any differences or variability between them will be exclusively due to error (randomness). Thus

$$\sum_{j=1}^{q} (Y_{ij} - \bar{Y}_{i+})^2, \qquad i = 1, 2, \ldots, p,$$

is part of the pure random variation, with $(q-1)$ degrees of freedom for each $i = 1, 2, \ldots, p$, irrespective of whether the treatments are the same or not. Thus

$$SS_R = \sum_{i=1}^{p}\sum_{j=1}^{q} (Y_{ij} - \bar{Y}_{i+})^2$$

is variation due to the random error, unaffected by the presence of treatment differences, and has $p(q-1) = N - p$ degrees of freedom. Thus

$$\frac{SS_R}{N-p} \text{ estimates } \sigma^2. \qquad (3)$$

$SS_R$ is in fact the residual sum of squares of the linear model we have assumed, and therefore $\frac{SS_R}{N-p}$ is the least squares estimator of $\sigma^2$. From (1)-(3) we see that

$$E\left(\frac{SS_{Tr}}{p-1}\right) = \begin{cases} \sigma^2 & \text{if } t_1 = t_2 = \cdots = t_p \\ \sigma^2 + \frac{q}{p-1}\sum_{i=1}^{p}(t_i - \bar{t})^2 & \text{otherwise} \end{cases} \qquad (4)$$

and

$$E\left(\frac{SS_R}{N-p}\right) = \sigma^2. \qquad (5)$$

From (4) and (5) we see that if one wishes to test the hypothesis of no differences between the treatments, i.e. $H_0: t_1 = t_2 = \cdots = t_p$ ($= t$ say), against the alternative that at least one of the treatments differs from the rest, i.e. $H_1:$ not all the $t_i$ are equal, the appropriate test procedure at $\alpha$-level of significance is
if $F = \dfrac{SS_{Tr}/(p-1)}{SS_R/(N-p)} \le k$, accept $H_0$; if $F > k$, accept $H_1$,

where $k$ is determined by the level of significance and will of course depend on the distribution of the statistic $F$ under the null hypothesis. From our knowledge of basic statistics we know that $SS_{Tr}/\sigma^2 \sim \chi^2_{p-1}$ when $H_0$ is true, and $SS_R/\sigma^2 \sim \chi^2_{N-p}$. (Recall from your first elementary statistics course that when sampling from the Normal distribution the sample variance is an unbiased estimator of the population variance, and that

$$(\text{sample size} - 1) \cdot \frac{\text{sample variance}}{\text{population variance}}$$

is chi-squared distributed with degrees of freedom equal to (sample size $-$ 1).) Since, further, $SS_{Tr}$ is a function of the l.s.e.'s of the $t_i$'s, which in turn are independent of $SS_R$, the model's residual sum of squares (see your notes on Linear Models), it follows that $F \sim F_{p-1,N-p}$ when $H_0$ is true. Consequently $k = F_{p-1,N-p;\alpha}$ and the appropriate test procedure is:

if $F = \dfrac{SS_{Tr}/(p-1)}{SS_R/(N-p)} \le F_{p-1,N-p;\alpha}$, do not reject $H_0$; if $F > F_{p-1,N-p;\alpha}$, reject $H_0$.

Note that since

$$\sum_{i=1}^{p}\sum_{j=1}^{q} (Y_{ij} - \bar{Y}_{++})^2 = q\sum_{i=1}^{p} (\bar{Y}_{i+} - \bar{Y}_{++})^2 + \sum_{i=1}^{p}\sum_{j=1}^{q} (Y_{ij} - \bar{Y}_{i+})^2 \qquad (6)$$

we have the identity

$$SS_{Tot} = SS_{Tr} + SS_R \qquad (7)$$

with $SS_{Tot} = \sum_{i=1}^{p}\sum_{j=1}^{q}(Y_{ij} - \bar{Y}_{++})^2$.

It is customary to present the analysis of the data from such a model in the form of a table, the Analysis of Variance (ANOVA) table, and, when calculating the sums of squares, to express them first in terms of totals rather than averages. In particular, if

$$T_{i+} = \sum_{j=1}^{q} Y_{ij} = q\bar{Y}_{i+} = \text{total response from all individuals treated with treatment } T_i$$
and

$$T_{++} = \sum_{i=1}^{p}\sum_{j=1}^{q} Y_{ij} = N\bar{Y}_{++} = \text{total yield from all plots in the experiment,}$$

then

$$SS_{Tr} = q\sum_{i=1}^{p} (\bar{Y}_{i+} - \bar{Y}_{++})^2 = \sum_{i=1}^{p} \frac{T_{i+}^2}{q} - \frac{T_{++}^2}{N} \qquad (8)$$

$$SS_R = \sum_{i=1}^{p}\sum_{j=1}^{q} (Y_{ij} - \bar{Y}_{i+})^2 = \sum_{i=1}^{p}\sum_{j=1}^{q} Y_{ij}^2 - \sum_{i=1}^{p} \frac{T_{i+}^2}{q} \qquad (9)$$

$$\phantom{SS_R} = SS_{Tot} - SS_{Tr} \qquad (10)$$

$$SS_{Tot} = \sum_{i=1}^{p}\sum_{j=1}^{q} (Y_{ij} - \bar{Y}_{++})^2 = \sum_{i=1}^{p}\sum_{j=1}^{q} Y_{ij}^2 - \frac{T_{++}^2}{N} \qquad (11)$$

The term $T_{++}^2/N$ is called the correction factor (C.F.).

ANOVA table

Source       Sum of Squares                                  d.f.    Mean Squares                 F                 EMS
Treatments   $\sum_i T_{i+}^2/q - T_{++}^2/N = SS_{Tr}$      $p-1$   $SS_{Tr}/(p-1) = MS_{Tr}$    $MS_{Tr}/MS_R$    $\sigma^2 + \frac{q}{p-1}\sum_i (t_i - \bar{t})^2$
Residual     $SS_{Tot} - SS_{Tr} = SS_R$                     $N-p$   $SS_R/(N-p) = MS_R$                            $\sigma^2$
Total        $\sum_{i,j} Y_{ij}^2 - T_{++}^2/N = SS_{Tot}$   $N-1$

If the number of individuals treated with each treatment is not the same for all treatments but varies from treatment to treatment, with $n_i$ individuals treated with the $i$th treatment, $i = 1, 2, \ldots, p$, $\sum_{i=1}^{p} n_i = N$, then the above Analysis of Variance table is still valid with the slight modification that $q$ is replaced by $n_i$ and, if need be, taken inside the summation with respect to $i$, as shown below.
ANOVA table

Source       Sum of Squares                                    d.f.    Mean Squares                 F                 EMS
Treatments   $\sum_i T_{i+}^2/n_i - T_{++}^2/N = SS_{Tr}$      $p-1$   $SS_{Tr}/(p-1) = MS_{Tr}$    $MS_{Tr}/MS_R$    $\sigma^2 + \frac{1}{p-1}\sum_i n_i(t_i - \bar{t})^2$
Residual     $SS_{Tot} - SS_{Tr} = SS_R$                       $N-p$   $SS_R/(N-p) = MS_R$                            $\sigma^2$
Total        $\sum_{i,j} Y_{ij}^2 - T_{++}^2/N = SS_{Tot}$     $N-1$

where now $\bar{t} = \frac{1}{N}\sum_{i=1}^{p} n_i t_i$.

2 Some practical considerations

As the analysis of variance of the one-way classification model is based on the assumptions of

- Normality of the observations
- constant variance of the observations

these assumptions need to be checked.

2.1 Checking Normality

The assumption of Normality can be checked graphically by producing separate box-plots and/or histograms and probability plots of the response values for each individual treatment, or by producing a probability plot of the standardised residuals of the one-way classification model. Each of these probability plots can be backed by a formal statistical test for Normality, such as the Anderson-Darling test.

The conclusions of the Analysis of Variance of the one-way classification model are fairly robust to failure of the assumption of Normality as long as the true distribution of the
responses is not too skewed, i.e. the results of the analysis of variance are not affected by small deviations from Normality. The data will need to be fairly skewed before the conclusions of the analysis of variance are invalidated. Strongly skewed data will need to be transformed to near-symmetric data before an analysis of variance can be reliably applied. If you need to transform the data, one method you can use is the Box-Cox method. However, since the response values come from $p$ different populations, the Minitab menu facility for finding the optimal Box-Cox transformation cannot be used; it only works when the data come from a single population. You will need to resort to the boxcox function in R, which allows for the response variable following a linear model. Recall that before you use the boxcox function you need to load the MASS package with library(MASS).

2.2 Checking for constant variance

The assumption of constant variance needs to be checked more diligently, since the analysis of variance of the one-way classification model is not robust against failure of equality of variances.

Bartlett's test of equality of variances

To test for equality of variances you can use Bartlett's test, provided you are fairly confident that the response values are Normally distributed, i.e. that $Y_{ij} \sim N(t_i, \sigma_i^2)$, $j = 1, 2, \ldots, n_i$, for each treatment $i = 1, 2, \ldots, p$. Bartlett's test is essentially the generalised likelihood ratio test for testing the hypothesis $H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_p^2$ against $H_1:$ not all the $\sigma_i^2$ are equal, using the data $Y_{ij}$, $j = 1, 2, \ldots, n_i$, from the $N(t_i, \sigma_i^2)$ distribution, $i = 1, 2, \ldots, p$. The test statistic is

$$X^2 = \frac{Q}{c} \quad \text{with} \quad Q = (N-p)\log s^2_{pooled} - \sum_{i=1}^{p} (n_i - 1)\log s_i^2$$
where

$$s_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i+})^2, \qquad s^2_{pooled} = \frac{1}{N-p}\sum_{i=1}^{p} (n_i - 1)s_i^2$$

and

$$c = 1 + \frac{1}{3(p-1)}\left(\sum_{i=1}^{p} \frac{1}{n_i - 1} - \frac{1}{N-p}\right).$$

Here $N = \sum_{i=1}^{p} n_i$. When $H_0$ is true, $X^2 \sim \chi^2_{p-1}$, so that the null hypothesis is rejected at the $100\alpha\%$ level of significance if $X^2 > \chi^2_{p-1;\alpha}$.

The Bartlett test is very sensitive to deviations from Normality and can give unreliable results if the response variable is not Normally distributed. It should therefore be avoided if there is any doubt about the Normality of the response variable. The alternative to Bartlett's test is the Levene test.

Levene's test of equality of variances

Levene's test for equality of variances, which is fairly robust to deviations from Normality, uses as its working data the absolute differences

$$D_{ij} = |Y_{ij} - \tilde{Y}_i|, \qquad j = 1, 2, \ldots, n_i, \; i = 1, 2, \ldots, p,$$

where the $Y_{ij}$'s are the response variable values and $\tilde{Y}_i$ is the median of these values for the $i$th treatment. It can be shown that under the assumption of Normality of the response values, i.e. under the assumption that $Y_{ij} \sim N(t_i, \sigma_i^2)$, the mean value of the working data $D_{ij}$ is proportional to $\sigma_i$:

$$\mu_{D_i} = E(D_{ij}) \propto \sigma_i, \qquad j = 1, 2, \ldots, n_i, \; i = 1, 2, \ldots, p.$$

Consequently, to test the hypothesis $H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_p^2$ against $H_1:$ not all the $\sigma_i^2$ are equal, one tests the equivalent hypothesis $H_0: \mu_{D_1} = \mu_{D_2} = \cdots = \mu_{D_p}$ against $H_1:$ not all the $\mu_{D_i}$ are equal, using the $D_{ij}$'s as the working data. This can be achieved by performing a one-way analysis of variance on the $D_{ij}$'s, which is exactly what Levene's test
does.

Remark: If you have strong evidence that the response values are Normally distributed, or near Normally distributed, then the Bartlett test is the more powerful of the two, i.e. it has the higher probability of correctly detecting differences between the variances. However, the Levene test should be preferred if there are doubts about the Normality of the data.

To perform either of the two tests in Minitab, use Stat > ANOVA > Test for Equal Variances; to perform the Levene test in SPSS, choose Analyze > Compare Means > One-Way ANOVA > Options > Homogeneity of variance test.

If the test for homogeneity of variance indicates that the variance is not constant, you will need to transform the response variable to stabilise the variance (make it have constant variance). Usually lack of Normality and lack of constant variance go hand in hand, and the transformation that rectifies Normality may also stabilise the variance. Thus a Box-Cox transformation may stabilise the variance. Alternatively, you may exploit the fact that the data are grouped into samples from different treatments, as follows. Suppose that the variance is not the same for all treatments, i.e.

$$E(Y_{ij}) = t_i, \quad Var(Y_{ij}) = \sigma_i^2, \qquad j = 1, 2, \ldots, n_i, \; i = 1, 2, \ldots, p,$$

and that for some $\lambda$

$$\sigma_i \propto t_i^{\lambda} \qquad (13)$$

i.e.

$$\log \sigma_i = \alpha + \lambda \log t_i \qquad (14)$$

$i = 1, 2, \ldots, p$. This can be verified by plotting $\log s_i$ against $\log \bar{Y}_{i+}$, $i = 1, 2, \ldots, p$, where $s_i^2 = \frac{1}{n_i-1}\sum_{j=1}^{n_i}(Y_{ij} - \bar{Y}_{i+})^2$ and $\bar{Y}_{i+} = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}$. If the plotted points lie about a straight line, that provides visual evidence that the assumption (13) is valid. In fact the slope of the best-fitting line through the plotted points provides an estimate $\hat{\lambda}$ of the power $\lambda$ in (13). The transformation of the responses $Y_{ij}$ that stabilises the variance can be found using the delta
method. In particular, if $Z = g(Y)$ is the required transformation, then to a first-order approximation

$$g(Y) \approx g(t) + (Y - t)g'(t)$$

where $t$ is the mean of $Y$. Taking the variance of both sides in the above expansion, we have

$$Var(g(Y)) \approx Var(Y)\,[g'(t)]^2$$

where $g'(\cdot)$ denotes the first derivative of $g(\cdot)$. Thus if $g(\cdot)$ is the right transformation for stabilising the variance, then $Var(g(Y)) = $ constant and

$$g'(t) \propto \frac{\text{constant}}{\text{s.d.}(Y)} = \frac{\text{constant}}{t^{\lambda}}$$

i.e.

$$g(t) \propto \int \frac{1}{t^{\lambda}}\,dt \propto t^{1-\lambda}.$$

Thus the suggested variance-stabilising transformation can be taken as $Y^{1-\hat{\lambda}}$, with $\hat{\lambda}$ the slope of the best-fitting line to the points of the scatterplot of $\log s_i$ against $\log \bar{Y}_{i+}$. (When $\hat{\lambda} = 1$ the integral gives $g(t) \propto \log t$, i.e. the log transformation.)
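This recipe is easy to carry out numerically. The Python sketch below is an illustration only (the sample data are made up, and the document itself works with R and Minitab): it estimates $\hat{\lambda}$ as the least-squares slope of $\log s_i$ against $\log \bar{Y}_{i+}$ and then applies $Y^{1-\hat{\lambda}}$, falling back to the log transformation when $\hat{\lambda} \approx 1$.

```python
import math

def estimate_lambda(groups):
    """Slope of log s_i against log Ybar_i+ over the treatment groups."""
    xs, ys = [], []
    for g in groups:
        n = len(g)
        m = sum(g) / n
        s2 = sum((y - m) ** 2 for y in g) / (n - 1)  # sample variance s_i^2
        xs.append(math.log(m))
        ys.append(0.5 * math.log(s2))                # log s_i = (1/2) log s_i^2
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    return (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
            / sum((x - xbar) ** 2 for x in xs))

def stabilise(groups, lam):
    """Apply the variance-stabilising transformation Y^(1 - lambda)."""
    if abs(1.0 - lam) < 1e-8:   # lambda = 1: the integral gives log Y instead
        return [[math.log(y) for y in g] for g in groups]
    return [[y ** (1.0 - lam) for y in g] for g in groups]

# Made-up groups whose standard deviation grows proportionally to the mean,
# so the fitted slope should be close to 1 and a log transformation results.
groups = [[9, 10, 11], [18, 20, 22], [27, 30, 33]]
lam_hat = estimate_lambda(groups)
stabilised = stabilise(groups, lam_hat)
```

With responses whose standard deviation is proportional to the mean, as here, the fitted slope is essentially 1 and the transformed groups end up with (near) equal variances.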
3 Pairwise Comparisons of treatments in a one-way analysis of variance model

Consider the Pulse data on 68 workers after they performed one of six different tasks.

[Table of task totals $T_{i+}$, $T_{i+}^2$, sample sizes $n_i$, $T_{i+}^2/n_i$ and task averages $\bar{Y}_{i+}$, together with the correction factor and the resulting ANOVA table (Tasks, Residuals, Total, with F ratio and p-value): numerical entries not recoverable in this copy.]
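The totals formulas (8)-(11) translate directly into code. This Python sketch (with made-up responses, not the Pulse data) computes the sums of squares from the treatment totals $T_{i+}$, the grand total $T_{++}$ and the correction factor, allowing unequal sample sizes by using $n_i$ in place of $q$.

```python
def one_way_anova(groups):
    """One-way ANOVA sums of squares via the totals formulas.

    groups: list of lists of responses, one inner list per treatment.
    Returns (SS_Tr, SS_R, SS_Tot, F).
    """
    p = len(groups)
    n = [len(g) for g in groups]
    N = sum(n)
    Ti = [sum(g) for g in groups]            # treatment totals T_i+
    T = sum(Ti)                              # grand total T_++
    CF = T * T / N                           # correction factor T_++^2 / N
    ss_tot = sum(y * y for g in groups for y in g) - CF
    ss_tr = sum(t * t / ni for t, ni in zip(Ti, n)) - CF
    ss_r = ss_tot - ss_tr
    F = (ss_tr / (p - 1)) / (ss_r / (N - p))
    return ss_tr, ss_r, ss_tot, F

# Illustrative (made-up) responses for three treatments of unequal size.
groups = [[61, 60, 63, 59], [63, 67, 71, 64, 65], [68, 66, 71, 67]]
ss_tr, ss_r, ss_tot, F = one_way_anova(groups)
```

The identity $SS_{Tot} = SS_{Tr} + SS_R$ of (7) holds by construction here, and the totals form agrees with the deviation form $\sum_i n_i(\bar{Y}_{i+} - \bar{Y}_{++})^2$.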
The analysis indicates that there is significant evidence of differences in the effects of the tasks on the pulse rate. One would want to know which pairs of treatments (tasks) differ.

3.1 Fisher's Least Significant Difference (LSD) procedure

Under this procedure one performs $\frac{1}{2}p(p-1)$ different hypothesis tests, each at $\alpha$-level of significance, in order to test the $\frac{1}{2}p(p-1)$ different hypotheses

$$H_{0(i,j)}: t_i - t_j = 0 \quad \text{vs} \quad H_{1(i,j)}: t_i - t_j \ne 0 \quad \text{for ALL } i \ne j.$$

For the $(i,j)$th test, the test statistic is

$$t_{(ij)} = \frac{\bar{Y}_{i+} - \bar{Y}_{j+}}{\hat{\sigma}\sqrt{\dfrac{1}{n_i} + \dfrac{1}{n_j}}}$$

and if $|t_{(ij)}| > t_{\nu;\alpha/2}$ we conclude that the $i$th and $j$th treatments differ. Here $\hat{\sigma} = \sqrt{MS_R}$ and $\nu$ = degrees of freedom of the Residual, both obtainable from the ANOVA table. Thus the $(i,j)$th pair of treatments are judged to differ if their sample averages $\bar{Y}_{i+}$ and $\bar{Y}_{j+}$ differ by more than

$$LSD_\alpha = \hat{\sigma}\, t_{\nu;\alpha/2}\sqrt{\frac{1}{n_i} + \frac{1}{n_j}}.$$

If of course all sample sizes are equal (to $q$ say), then the $\alpha$-level Least Significant Differences are the same for all samples and equal to

$$LSD_\alpha = \hat{\sigma}\, t_{\nu;\alpha/2}\sqrt{\frac{2}{q}}.$$

Example. For the Pulse data $\hat{\sigma} = \sqrt{MS_R} = 5.56$, $\nu = 62$ and $t_{62;0.025} = 2.00$. Hence the $LSD_{0.05}$'s are
[Table of $LSD_{0.05}$ values for the pairs of sample sizes $(n_i, n_j)$: (13,12), (13,11), (13,10), (12,11), (12,10), (12,12), (11,10), (10,10); numerical LSD entries not recoverable in this copy.]

[Table of pairwise comparisons using Fisher's LSD procedure: the fifteen differences $|\bar{Y}_{i+} - \bar{Y}_{j+}|$ with the significant pairs starred; most entries not recoverable in this copy.]

It is customary to present the averages in ascending order and underscore the pairs which do NOT differ significantly, as shown below:

(6) (5) (2) (1) (3) (4)

It must be realised that since we are performing $\binom{p}{2}$ tests, a lot of tests, each at 5% level of significance, we would expect $0.05\binom{p}{2}$ of these tests to show significant even when there are no differences between the treatments. In fact the probability of finding at least one significant difference amongst these $\frac{1}{2}p(p-1)$ tests, when the treatments are equal in their effects, increases dramatically with $p$ (= # of treatments), as shown in the table below.
[Table of the probability of at least one significant pair using the $LSD_{0.05}$ procedure, for increasing $p$: numerical entries not recoverable in this copy.]

It is therefore strongly recommended NOT to use the $LSD_\alpha$ method unless you have first performed an F test of the hypothesis

$$H_0: t_1 = t_2 = \cdots = t_p \quad \text{vs} \quad H_1: \text{not all the } t_i \text{ are equal}$$

which produced significant results. Note that $(\bar{Y}_{i+} - \bar{Y}_{j+}) \pm LSD_\alpha$ constitutes a $100(1-\alpha)\%$ confidence interval for $t_i - t_j$.

3.2 Tukey's Honest Significant Difference (HSD)

To get round this difficulty associated with Fisher's LSD method, of having a considerable risk of finding significant differences in pairs of treatments when they do not exist, Tukey suggested a single test at $\alpha$-level for testing whether all possible differences of pairs are non-significant or not. His approach also provides simultaneous $100(1-\alpha)\%$ confidence intervals for all pair differences $t_i - t_j$. Tukey's method assumes that the treatment samples are all of equal size (say $q$). From the theory of linear models we know that the statistics

$$T_i = \frac{\bar{Y}_{i+} - t_i}{\hat{\sigma}/\sqrt{q}}, \qquad i = 1, 2, \ldots, p,$$

where $\hat{\sigma} = \sqrt{MS_R}$, have distribution independent of $t_1, t_2, \ldots, t_p$, and consequently the distribution of

$$D = \max_{1 \le i \le p} T_i - \min_{1 \le i \le p} T_i$$

is independent of the $t_i$'s. Tukey computed the distribution of $D$ and found its critical values $q_{p,\nu;\alpha}$, defined by

$$Pr(D > q_{p,\nu;\alpha}) = \alpha.$$
Here $\nu$ = degrees of freedom of the residual. These critical values are tabulated and are called the upper percentage points of the studentised range. We therefore have that

$$Pr\left(\max_i T_i - \min_i T_i \le q_{p,\nu;\alpha}\right) = 1 - \alpha$$

i.e.

$$Pr\left((\bar{Y}_{i+} - \bar{Y}_{j+}) - \frac{\hat{\sigma}}{\sqrt{q}}\,q_{p,\nu;\alpha} \le t_i - t_j \le (\bar{Y}_{i+} - \bar{Y}_{j+}) + \frac{\hat{\sigma}}{\sqrt{q}}\,q_{p,\nu;\alpha} \;\text{ for ALL } i, j\right) = 1 - \alpha$$

i.e.

$$(\bar{Y}_{i+} - \bar{Y}_{j+}) \pm \frac{\hat{\sigma}}{\sqrt{q}}\,q_{p,\nu;\alpha}, \qquad i, j = 1, 2, \ldots, p, \qquad (15)$$

are $100(1-\alpha)\%$ simultaneous confidence intervals for all pair differences $t_i - t_j$. Consequently, if

$$\max_{1 \le i \le p} \bar{Y}_{i+} - \min_{1 \le i \le p} \bar{Y}_{i+} \ge \frac{\hat{\sigma}}{\sqrt{q}}\,q_{p,\nu;\alpha}$$

we reject the hypothesis $H_0: t_i - t_j = 0$ for all $i, j$, in favour of the alternative that for at least one pair the difference $t_i - t_j$ is non-zero. Any pair of treatments for which

$$|\bar{Y}_{i+} - \bar{Y}_{j+}| > \frac{\hat{\sigma}}{\sqrt{q}}\,q_{p,\nu;\alpha} \qquad (16)$$

are then declared to be significantly different.

If the treatment samples are not of equal size, it is suggested that Tukey's method is adjusted so that, in the set of simultaneous confidence intervals, the confidence interval for $t_i - t_j$ is replaced by

$$(\bar{Y}_{i+} - \bar{Y}_{j+}) \pm \frac{\hat{\sigma}\, q_{p,\nu;\alpha}}{\min(\sqrt{n_i}, \sqrt{n_j})} \qquad (17)$$

and that the difference $t_i - t_j$ is declared to be significant if

$$|\bar{Y}_{i+} - \bar{Y}_{j+}| > \frac{\hat{\sigma}\, q_{p,\nu;\alpha}}{\min(\sqrt{n_i}, \sqrt{n_j})}.$$

Tukey's method is very conservative (very reluctant to find pairs significant); in fact the Type I error is smaller than the nominal $\alpha$, and in consequence it has a rather low power of detecting true differences between pairs.
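Rule (16) is straightforward to mechanise once the studentised-range value $q_{p,\nu;\alpha}$ has been read from tables or software. In this Python sketch the critical value and the treatment averages are made-up illustrative inputs, not values from the Pulse data.

```python
import math
from itertools import combinations

def tukey_hsd_pairs(means, q, ms_r, q_crit):
    """Index pairs (i, j) whose averages differ by more than Tukey's HSD.

    means : treatment averages Ybar_i+ (equal group sizes q assumed)
    ms_r  : residual mean square, so sigma_hat = sqrt(ms_r)
    q_crit: tabulated studentised-range value q_{p,nu;alpha}
    """
    hsd = math.sqrt(ms_r / q) * q_crit  # (sigma_hat / sqrt(q)) * q_{p,nu;alpha}
    return [(i, j) for i, j in combinations(range(len(means)), 2)
            if abs(means[i] - means[j]) > hsd]

# Hypothetical averages for p = 6 treatments with q = 11 observations each;
# q_crit = 4.0 stands in for a value looked up in studentised-range tables.
significant = tukey_hsd_pairs(
    [96.6, 92.5, 99.0, 101.8, 95.6, 92.8], q=11, ms_r=30.9, q_crit=4.0)
```

Any pair returned violates the simultaneous confidence interval (15), so those treatments are declared significantly different.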
3.3 The Newman-Keuls test for pair comparisons

This test attempts to reduce the conservatism of Tukey's HSD test by making the HSD value on the r.h.s. of (16) dependent on the rank range of the treatment averages involved in the comparison. It is natural that when the treatment averages are ranked in ascending order of magnitude, the further apart the ranks of two such averages are, the bigger the difference in their values should be before we declare them as representing differing treatments. The Newman-Keuls test proceeds as follows. Order the treatment averages in ascending order of magnitude from smallest to largest: $\bar{Y}_{(1)}, \bar{Y}_{(2)}, \ldots, \bar{Y}_{(p)}$.

Step 1. If the range $\bar{Y}_{(p)} - \bar{Y}_{(1)} \le \frac{\hat{\sigma}}{\sqrt{q}}\,q_{p,\nu;\alpha}$, accept the hypothesis that all pair differences are zero and stop. If $\bar{Y}_{(p)} - \bar{Y}_{(1)} > \frac{\hat{\sigma}}{\sqrt{q}}\,q_{p,\nu;\alpha}$, conclude that $t_{(p)} \ne t_{(1)}$ and go to Step 2.

Step 2. (i) If the range $\bar{Y}_{(p-1)} - \bar{Y}_{(1)} \le \frac{\hat{\sigma}}{\sqrt{q}}\,q_{p-1,\nu;\alpha}$, accept the hypothesis that all the pair differences not involving $t_{(p)}$ are zero. (ii) If the range $\bar{Y}_{(p)} - \bar{Y}_{(2)} \le \frac{\hat{\sigma}}{\sqrt{q}}\,q_{p-1,\nu;\alpha}$, accept the hypothesis that all pair differences not involving $t_{(1)}$ are zero. If both (i) and (ii) are true, then stop; otherwise go to Step 3.

Step 3. If (i) of Step 2 is not true, compare the range of the set $(\bar{Y}_{(1)}, \bar{Y}_{(2)}, \ldots, \bar{Y}_{(p-2)})$ and the range of the set $(\bar{Y}_{(2)}, \bar{Y}_{(3)}, \ldots, \bar{Y}_{(p-1)})$ with $\frac{\hat{\sigma}}{\sqrt{q}}\,q_{p-2,\nu;\alpha}$. If (ii) of Step 2 is not true, compare the range of the set $(\bar{Y}_{(2)}, \bar{Y}_{(3)}, \ldots, \bar{Y}_{(p-1)})$ and the range of the set $(\bar{Y}_{(3)}, \bar{Y}_{(4)}, \ldots, \bar{Y}_{(p)})$ with $\frac{\hat{\sigma}}{\sqrt{q}}\,q_{p-2,\nu;\alpha}$.

One continues to examine the ranges of smaller subsets as long as the previous subset has a significant range. Each time a range proves non-significant,
the averages involved in it are included in a single group, and no subgroups of it can later be deemed to have significant ranges, so no further range tests are conducted on subgroups of this group.

The Newman-Keuls test requires treatment samples of equal size, but it can be used with samples of unequal sizes if $q$ is replaced by the harmonic mean $\tilde{q}$ of the $n_i$'s:

$$\tilde{q} = \frac{p}{\dfrac{1}{n_1} + \dfrac{1}{n_2} + \cdots + \dfrac{1}{n_p}}.$$

The Newman-Keuls procedure, although less conservative than Tukey's procedure, does nonetheless contain a certain amount of conservatism.

3.4 Duncan's Multiple Range test

Duncan's Multiple Range test is similar to the Newman-Keuls test except that it attempts to remove some of the conservatism by changing, in each step, the nominal significance level $\alpha_r$ of the studentised range coefficient $q_{r,\nu;\alpha_r}$, i.e. $\alpha_r$ changes with $r$ ($\alpha_r = 1 - (1-\alpha)^{r-1}$). The Duncan test therefore has its own table of coefficients $Q_\alpha(r, \nu)$ of the studentised range for an overall nominal $\alpha$-level of significance. Like the Newman-Keuls test, it can be used with unequal sample sizes if $q$ is replaced by the harmonic mean $\tilde{q}$ of the sample sizes $n_i$.

Further Reading: See D. C. Montgomery, Design and Analysis of Experiments.

Example. Consider the pulse rate data obtained on 68 workers after performing one of six tasks. Here

$$\tilde{q} = 6\left(\sum_{i=1}^{6} \frac{1}{n_i}\right)^{-1} = 11.3, \qquad \nu = 62, \qquad \hat{\sigma} = 5.56.$$

[Table of $r$, $Q_{0.05}(r, 62)$ and $Q_{0.05}(r, 62)\,\hat{\sigma}/\sqrt{\tilde{q}}$: numerical entries not recoverable in this copy.]
[Table of differences between the ordered task averages, with significant differences starred (among them 8.500, 6.917, 6.077 and 6.300) and non-significant differences in brackets: most entries and their task labels not recoverable in this copy.]

Although the Duncan Multiple Range test produced the same results as the Fisher LSD test on this occasion, this is not the case in general.

Reference: Douglas Montgomery, Design and Analysis of Experiments, John Wiley.
Faculty of Health Sciences Categorical covariate, Quantitative outcome Regression models Categorical covariate, Quantitative outcome Lene Theil Skovgaard April 29, 2013 PKA & LTS, Sect. 3.2, 3.2.1 ANOVA
More informationChap The McGraw-Hill Companies, Inc. All rights reserved.
11 pter11 Chap Analysis of Variance Overview of ANOVA Multiple Comparisons Tests for Homogeneity of Variances Two-Factor ANOVA Without Replication General Linear Model Experimental Design: An Overview
More informationCorrelation and the Analysis of Variance Approach to Simple Linear Regression
Correlation and the Analysis of Variance Approach to Simple Linear Regression Biometry 755 Spring 2009 Correlation and the Analysis of Variance Approach to Simple Linear Regression p. 1/35 Correlation
More informationChapter 12. Analysis of variance
Serik Sagitov, Chalmers and GU, January 9, 016 Chapter 1. Analysis of variance Chapter 11: I = samples independent samples paired samples Chapter 1: I 3 samples of equal size J one-way layout two-way layout
More informationTwo-Sample Inferential Statistics
The t Test for Two Independent Samples 1 Two-Sample Inferential Statistics In an experiment there are two or more conditions One condition is often called the control condition in which the treatment is
More informationStatistics for Engineers Lecture 9 Linear Regression
Statistics for Engineers Lecture 9 Linear Regression Chong Ma Department of Statistics University of South Carolina chongm@email.sc.edu April 17, 2017 Chong Ma (Statistics, USC) STAT 509 Spring 2017 April
More informationTentative solutions TMA4255 Applied Statistics 16 May, 2015
Norwegian University of Science and Technology Department of Mathematical Sciences Page of 9 Tentative solutions TMA455 Applied Statistics 6 May, 05 Problem Manufacturer of fertilizers a) Are these independent
More informationSummary of Chapter 7 (Sections ) and Chapter 8 (Section 8.1)
Summary of Chapter 7 (Sections 7.2-7.5) and Chapter 8 (Section 8.1) Chapter 7. Tests of Statistical Hypotheses 7.2. Tests about One Mean (1) Test about One Mean Case 1: σ is known. Assume that X N(µ, σ
More informationInferences About the Difference Between Two Means
7 Inferences About the Difference Between Two Means Chapter Outline 7.1 New Concepts 7.1.1 Independent Versus Dependent Samples 7.1. Hypotheses 7. Inferences About Two Independent Means 7..1 Independent
More informationK. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =
K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing
More informationWeek 14 Comparing k(> 2) Populations
Week 14 Comparing k(> 2) Populations Week 14 Objectives Methods associated with testing for the equality of k(> 2) means or proportions are presented. Post-testing concepts and analysis are introduced.
More information2.4.3 Estimatingσ Coefficient of Determination 2.4. ASSESSING THE MODEL 23
2.4. ASSESSING THE MODEL 23 2.4.3 Estimatingσ 2 Note that the sums of squares are functions of the conditional random variables Y i = (Y X = x i ). Hence, the sums of squares are random variables as well.
More informationThese are all actually contrasts (the coef sum to zero). What are these contrasts representing? What would make them large?
Lecture 12 Comparing treatment effects Orthogonal Contrasts What use are contrasts? Recall the Cotton data In this case, the treatment levels have an ordering to them this is not always the case) Consider
More informationBattery Life. Factory
Statistics 354 (Fall 2018) Analysis of Variance: Comparing Several Means Remark. These notes are from an elementary statistics class and introduce the Analysis of Variance technique for comparing several
More informationStatistical Inference: Estimation and Confidence Intervals Hypothesis Testing
Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire
More informationMultiple Testing. Gary W. Oehlert. January 28, School of Statistics University of Minnesota
Multiple Testing Gary W. Oehlert School of Statistics University of Minnesota January 28, 2016 Background Suppose that you had a 20-sided die. Nineteen of the sides are labeled 0 and one of the sides is
More informationResearch Article A Nonparametric Two-Sample Wald Test of Equality of Variances
Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1.1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner
More informationMAT2377. Rafa l Kulik. Version 2015/November/26. Rafa l Kulik
MAT2377 Rafa l Kulik Version 2015/November/26 Rafa l Kulik Bivariate data and scatterplot Data: Hydrocarbon level (x) and Oxygen level (y): x: 0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
More informationSociology 6Z03 Review II
Sociology 6Z03 Review II John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review II Fall 2016 1 / 35 Outline: Review II Probability Part I Sampling Distributions Probability
More informationSimple and Multiple Linear Regression
Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where
More information9 Correlation and Regression
9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the
More information3. (a) (8 points) There is more than one way to correctly express the null hypothesis in matrix form. One way to state the null hypothesis is
Stat 501 Solutions and Comments on Exam 1 Spring 005-4 0-4 1. (a) (5 points) Y ~ N, -1-4 34 (b) (5 points) X (X,X ) = (5,8) ~ N ( 11.5, 0.9375 ) 3 1 (c) (10 points, for each part) (i), (ii), and (v) are
More information22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA)
22s:152 Applied Linear Regression Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) We now consider an analysis with only categorical predictors (i.e. all predictors are
More informationOutline. Topic 20 - Diagnostics and Remedies. Residuals. Overview. Diagnostics Plots Residual checks Formal Tests. STAT Fall 2013
Topic 20 - Diagnostics and Remedies - Fall 2013 Diagnostics Plots Residual checks Formal Tests Remedial Measures Outline Topic 20 2 General assumptions Overview Normally distributed error terms Independent
More informationStatistics for EES Factorial analysis of variance
Statistics for EES Factorial analysis of variance Dirk Metzler June 12, 2015 Contents 1 ANOVA and F -Test 1 2 Pairwise comparisons and multiple testing 6 3 Non-parametric: The Kruskal-Wallis Test 9 1 ANOVA
More informationBasic Business Statistics 6 th Edition
Basic Business Statistics 6 th Edition Chapter 12 Simple Linear Regression Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of a dependent variable based
More informationMultiple comparisons The problem with the one-pair-at-a-time approach is its error rate.
Multiple comparisons The problem with the one-pair-at-a-time approach is its error rate. Each confidence interval has a 95% probability of making a correct statement, and hence a 5% probability of making
More informationDiagnostics and Remedial Measures: An Overview
Diagnostics and Remedial Measures: An Overview Residuals Model diagnostics Graphical techniques Hypothesis testing Remedial measures Transformation Later: more about all this for multiple regression W.
More informationLecture 4. Checking Model Adequacy
Lecture 4. Checking Model Adequacy Montgomery: 3-4, 15-1.1 Page 1 Model Checking and Diagnostics Model Assumptions 1 Model is correct 2 Independent observations 3 Errors normally distributed 4 Constant
More informationThe Distribution of F
The Distribution of F It can be shown that F = SS Treat/(t 1) SS E /(N t) F t 1,N t,λ a noncentral F-distribution with t 1 and N t degrees of freedom and noncentrality parameter λ = t i=1 n i(µ i µ) 2
More information1 Introduction to Minitab
1 Introduction to Minitab Minitab is a statistical analysis software package. The software is freely available to all students and is downloadable through the Technology Tab at my.calpoly.edu. When you
More informationInference for Regression Inference about the Regression Model and Using the Regression Line
Inference for Regression Inference about the Regression Model and Using the Regression Line PBS Chapter 10.1 and 10.2 2009 W.H. Freeman and Company Objectives (PBS Chapter 10.1 and 10.2) Inference about
More informationAnalysis of Variance (ANOVA)
Analysis of Variance ANOVA) Compare several means Radu Trîmbiţaş 1 Analysis of Variance for a One-Way Layout 1.1 One-way ANOVA Analysis of Variance for a One-Way Layout procedure for one-way layout Suppose
More informationMultiple Comparison Procedures Cohen Chapter 13. For EDUC/PSY 6600
Multiple Comparison Procedures Cohen Chapter 13 For EDUC/PSY 6600 1 We have to go to the deductions and the inferences, said Lestrade, winking at me. I find it hard enough to tackle facts, Holmes, without
More informationThis gives us an upper and lower bound that capture our population mean.
Confidence Intervals Critical Values Practice Problems 1 Estimation 1.1 Confidence Intervals Definition 1.1 Margin of error. The margin of error of a distribution is the amount of error we predict when
More informationChapte The McGraw-Hill Companies, Inc. All rights reserved.
er15 Chapte Chi-Square Tests d Chi-Square Tests for -Fit Uniform Goodness- Poisson Goodness- Goodness- ECDF Tests (Optional) Contingency Tables A contingency table is a cross-tabulation of n paired observations
More information2 Hand-out 2. Dr. M. P. M. M. M c Loughlin Revised 2018
Math 403 - P. & S. III - Dr. McLoughlin - 1 2018 2 Hand-out 2 Dr. M. P. M. M. M c Loughlin Revised 2018 3. Fundamentals 3.1. Preliminaries. Suppose we can produce a random sample of weights of 10 year-olds
More informationA nonparametric two-sample wald test of equality of variances
University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 211 A nonparametric two-sample wald test of equality of variances David
More informationLecture 5: Hypothesis tests for more than one sample
1/23 Lecture 5: Hypothesis tests for more than one sample Måns Thulin Department of Mathematics, Uppsala University thulin@math.uu.se Multivariate Methods 8/4 2011 2/23 Outline Paired comparisons Repeated
More informationAnalysis of Variance. Read Chapter 14 and Sections to review one-way ANOVA.
Analysis of Variance Read Chapter 14 and Sections 15.1-15.2 to review one-way ANOVA. Design of an experiment the process of planning an experiment to insure that an appropriate analysis is possible. Some
More informationMore about Single Factor Experiments
More about Single Factor Experiments 1 2 3 0 / 23 1 2 3 1 / 23 Parameter estimation Effect Model (1): Y ij = µ + A i + ɛ ij, Ji A i = 0 Estimation: µ + A i = y i. ˆµ = y..  i = y i. y.. Effect Modell
More informationLecture 15. Hypothesis testing in the linear model
14. Lecture 15. Hypothesis testing in the linear model Lecture 15. Hypothesis testing in the linear model 1 (1 1) Preliminary lemma 15. Hypothesis testing in the linear model 15.1. Preliminary lemma Lemma
More informationCh 3: Multiple Linear Regression
Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery
More informationLinear Combinations of Group Means
Linear Combinations of Group Means Look at the handicap example on p. 150 of the text. proc means data=mth567.disability; class handicap; var score; proc sort data=mth567.disability; by handicap; proc
More informationOne-way ANOVA (Single-Factor CRD)
One-way ANOVA (Single-Factor CRD) STAT:5201 Week 3: Lecture 3 1 / 23 One-way ANOVA We have already described a completed randomized design (CRD) where treatments are randomly assigned to EUs. There is
More information1 Introduction to One-way ANOVA
Review Source: Chapter 10 - Analysis of Variance (ANOVA). Example Data Source: Example problem 10.1 (dataset: exp10-1.mtw) Link to Data: http://www.auburn.edu/~carpedm/courses/stat3610/textbookdata/minitab/
More informationCHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)
FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter
More informationContrasts and Multiple Comparisons Supplement for Pages
Contrasts and Multiple Comparisons Supplement for Pages 302-323 Brian Habing University of South Carolina Last Updated: July 20, 2001 The F-test from the ANOVA table allows us to test the null hypothesis
More informationTA: Sheng Zhgang (Th 1:20) / 342 (W 1:20) / 343 (W 2:25) / 344 (W 12:05) Haoyang Fan (W 1:20) / 346 (Th 12:05) FINAL EXAM
STAT 301, Fall 2011 Name Lec 4: Ismor Fischer Discussion Section: Please circle one! TA: Sheng Zhgang... 341 (Th 1:20) / 342 (W 1:20) / 343 (W 2:25) / 344 (W 12:05) Haoyang Fan... 345 (W 1:20) / 346 (Th
More informationDensity Temp vs Ratio. temp
Temp Ratio Density 0.00 0.02 0.04 0.06 0.08 0.10 0.12 Density 0.0 0.2 0.4 0.6 0.8 1.0 1. (a) 170 175 180 185 temp 1.0 1.5 2.0 2.5 3.0 ratio The histogram shows that the temperature measures have two peaks,
More informationAnalysis of variance
Analysis of variance 1 Method If the null hypothesis is true, then the populations are the same: they are normal, and they have the same mean and the same variance. We will estimate the numerical value
More information