Why We Use Analysis of Variance to Compare Group Means and How It Works

The question of how to compare the population means of more than two groups is an important one to researchers. Suppose we are testing three new drugs against a control group for their effect on some medical problem. We are therefore comparing four groups:

Control Group
Drug A Treated Group
Drug B Treated Group
Drug C Treated Group

Assume we draw a sample of people from a population with a particular medical problem and randomly assign them to the four groups, so that each person is equally likely to be placed into any of the four groups. Because the process is random, the researcher has no control over who goes into which group either.

12/30/02 DOE_Class_2003, ECE-5XXX 1
ANOVA Basics and Examples

The drug is administered in such a manner that neither the administrator of the drug nor the patient knows which group the patient was assigned to or which drug (or placebo) the patient is receiving. After some period of time, each patient is measured on a variable that captures the medical effect of interest. The question is: how do we proceed with the analysis of the data?

Most beginning researchers who know about two-sample tests would propose performing a whole series of t-tests: comparing, for example, the Control Group with each of the Drug groups, then comparing the Drug groups with each other. The number of t-tests required to make all possible pairwise comparisons among the four groups is six:

Control with A
Control with B
Control with C
A with B
A with C
B with C
ANOVA

Each of these t-tests has an α level. Presume we predetermine that α should be set at 0.05, so that we have a reasonable β. Because the tests are no longer independent (we are performing statistical tests on the same set of data more than once), the α probabilities accumulate: by the Bonferroni inequality, the overall α level for the analysis of this experiment could be as high as 6 × 0.05 = 0.30. Clearly this is unacceptable for most research.

The problem can be reduced by using smaller α levels for the individual tests, but when this is done, the probability of committing a Type II error balloons to unacceptable levels. Additionally, if we have many groups (more than just four), the overall α level becomes unacceptable very quickly.

What is desired instead is a single test for the equality of all of the means:

H0: μ1 = μ2 = μ3 = μ4
H1: at least one pair of means is unequal
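The inflation of the overall α can be checked directly. The sketch below, in plain Python, compares the Bonferroni bound used on this slide (the "probabilities add" figure of 0.30) with the exact familywise error rate that would hold if the six tests were independent:

```python
# Familywise Type I error for m tests, each run at level alpha.
# The Bonferroni bound m * alpha is the slide's "as high as 0.30" figure;
# for truly independent tests the exact value is 1 - (1 - alpha)**m.
alpha = 0.05
m = 6  # six pairwise t-tests among four groups

bonferroni_bound = m * alpha
exact_independent = 1 - (1 - alpha) ** m

print(f"Bonferroni bound:          {bonferroni_bound:.2f}")   # 0.30
print(f"Exact (independent tests): {exact_independent:.3f}")  # 0.265
```

Either way, the chance of at least one false rejection is far above the nominal 0.05, which is exactly why a single overall test is preferred.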
ANOVA

This is the hypothesis that is tested with a completely randomized analysis of variance, or ANOVA. Notice that when the test is finished, we still do not know which pair (or pairs) of means are unequal. That is accomplished by performing a post hoc comparison, which identifies which pair (or pairs) of means are significantly different from each other.

Why is this hypothesis test about multiple means called analysis of variance? The answer is that it is possible to infer what is happening to the population means by examining and analyzing the variability, or variance, of the data. The overall variability of the data, as measured by the variance, can be calculated in several ways.

Way One: measure the overall variability of the data using the usual formula

s² = ΣᵢΣⱼ (xᵢⱼ − x̿)² / (N − 1)

In this formula, one can easily see that the variance is computed by summing the squared differences between each piece of data and the overall mean. In our four-group experiment, this amounts to finding the variability across all four groups by using the mean computed across all four groups, denoted by the grand mean x̿.
ANOVA

Way Two: partition the variance into several components. We could, for instance, find the variability within each group. For group j this is

Σᵢ (xᵢⱼ − x̄ⱼ)²

where x̄ⱼ is the mean of group j. We would have such a quantity for each of the four groups. Thus, another way to estimate the overall variability would be to add up the within-group variability of the individual groups:

SSwithin = Σⱼ Σᵢ (xᵢⱼ − x̄ⱼ)²
ANOVA

Now, if each of the group means is identically equal to the overall mean across all groups, the variability we computed in Way One will be exactly equal to what we compute in Way Two. When the group means are not all equal to the overall mean, Way One will be larger than Way Two. The reason is that Way One also includes variability arising from the differences between the individual group means and the overall mean. This between-group component is

SSbetween = Σⱼ nⱼ (x̄ⱼ − x̿)²

and the total variability is the sum of the between and within components:

SStotal = SSbetween + SSwithin
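This partition identity can be verified numerically. A minimal pure-Python sketch, using the three operator columns from the worked brewery example later in these notes as the groups:

```python
# Verify SS_total = SS_between + SS_within for a one-way layout,
# treating the three operators (from the worked example in these notes)
# as the groups.
groups = {
    "OPER 1": [4, 1, 6, 7, 2],
    "OPER 2": [0, 2, 3, 0, 9],
    "OPER 3": [7, 1, 8, 6, 8],
}
all_x = [x for g in groups.values() for x in g]
grand_mean = sum(all_x) / len(all_x)

# Way One: squared deviations of every observation from the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_x)

# Way Two: within-group deviations, plus the between-group component.
ss_within = sum(
    sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups.values()
)
ss_between = sum(
    len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
)

print(round(ss_total, 3), round(ss_between + ss_within, 3))  # the two agree
```

The two totals match exactly (up to floating-point rounding), confirming the decomposition.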
ANOVA

If we compute the ratio of the between-sample variation to the within-sample variation, each divided by its degrees of freedom, we get

F = [SSbetween / (k − 1)] / [SSwithin / (N − k)]

where k is the number of groups and N the total number of observations. As the differences among the group means increase, the numerator becomes large relative to the denominator, because the between variation depends on the differences between the group means and the overall mean, while the denominator is essentially independent of those differences.
ANOVA

This ratio follows a statistical distribution called the F-distribution, which is frequently used in statistics to make probability statements about the ratio of two variances. The F-test is always a one-tailed test. If the computed test statistic F exceeds the tabled critical value, one rejects the hypothesis that all of the population means are equal. One then concludes that at least one pair of means is significantly different and proceeds to a post hoc comparison procedure, such as the Scheffé test, to identify which pair or pairs of means caused us to reject the null hypothesis of equal group means.
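A sketch of the complete decision rule, applied as a one-way ANOVA to the three operator groups from the worked example later in these notes. The critical value 3.89 is F(0.05, 2, 12) read from a standard F table; it is an assumption of this sketch, not computed here:

```python
# One-way ANOVA F-test sketch: compute F and compare to the tabled
# critical value. Groups are the three operators from the worked example.
groups = [[4, 1, 6, 7, 2], [0, 2, 3, 0, 9], [7, 1, 8, 6, 8]]
k = len(groups)                      # number of groups
n = sum(len(g) for g in groups)      # total observations
grand_mean = sum(sum(g) for g in groups) / n

ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
f_crit = 3.89  # F(0.05, 2, 12), from a standard F table (assumed value)

print(f"F = {f_stat:.3f}; reject H0: {f_stat > f_crit}")
# → F = 1.366; reject H0: False
```

Since F does not exceed the critical value, we fail to reject H0; no post hoc comparison is needed in that case.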
ANOVA Calculations Example

Three operators measure sample volume deltas to specification for 5 different wort vats in a brewery on different days. Is there a difference between operators or between vat volumes?

Volume  OPER 1  OPER 2  OPER 3
V1         4       0       7
V2         1       2       1
V3         6       3       8
V4         7       0       6
V5         2       9       8
ANOVA Calculations Example

Run an ANOVA to test for any statistically significant difference between operators or between vat volumes. Because each operator measures each vat exactly once, this is an Anova: Two-Factor Without Replication test.
ANOVA Manual Calculations Example: 3 operators measure sample volume. Columns j = 3 (operators), rows i = 5 (vats).

Volume  OPER1: x  x²   OPER2: x  x²   OPER3: x  x²   Σxⱼ  (Σxⱼ)²  Σ(xⱼ²)
V1             4  16          0   0          7  49     11    121      65
V2             1   1          2   4          1   1      4     16       6
V3             6  36          3   9          8  64     17    289     109
V4             7  49          0   0          6  36     13    169      85
V5             2   4          9  81          8  64     19    361     149

Column totals:
Σxᵢ:       20    14    30     ΣΣxᵢⱼ = 64
(Σxᵢ)²:   400   196   900     Σ((Σxᵢ)²) = 1496
Σ(xᵢ²):   106    94   214     ΣΣ(xᵢⱼ²) = 414

Row totals: Σ(Σxⱼ) = 64, Σ((Σxⱼ)²) = 956, Σ(Σ(xⱼ²)) = 414
ANOVA Manual Calculations Example: 3 operators measure sample volume.

Total number n = 15; rows (vats) r = 5; columns (operators) c = 3.

Correction factor: CF = (ΣΣxᵢⱼ)²/n = 64²/15 = 273.07

Total I:                SStotal = ΣΣ(xᵢⱼ²) − CF = 414 − 273.07 = 140.93
Volumes (rows) II:      SSrows = Σ((Σxⱼ)²)/c − CF = 956/3 − 273.07 = 45.60
Operators (columns) III: SScols = Σ((Σxᵢ)²)/r − CF = 1496/5 − 273.07 = 26.13
Residual or error:      SSerror = I − II − III = 69.20

SOURCE OF VARIATION        SS       df              MS (estimate of σ²)   Fcal              Fbook or Fcrit       Significant?
Volumes (rows) II         45.60     4 = r − 1       11.40 (σᵢ²)           1.32 = σᵢ²/σₒ²    F(0.05,4,8) = 3.84   no
Operators (columns) III   26.13     2 = c − 1       13.07 (σⱼ²)           1.51 = σⱼ²/σₒ²    F(0.05,2,8) = 4.46   no
Residual or error         69.20     8 = (r−1)(c−1)   8.65 (σₒ²)
Total I                  140.93    14 = n − 1
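The manual two-factor calculations above can be reproduced in a few lines of plain Python, which makes the arithmetic easy to audit:

```python
# Two-factor ANOVA without replication: reproduce the SS, MS, and F values
# from the manual calculation. Rows = vats V1..V5, columns = operators 1..3.
data = [
    [4, 0, 7],
    [1, 2, 1],
    [6, 3, 8],
    [7, 0, 6],
    [2, 9, 8],
]
r, c = len(data), len(data[0])
n = r * c
total = sum(sum(row) for row in data)
cf = total ** 2 / n  # correction factor (grand total squared over n)

ss_total = sum(x * x for row in data for x in row) - cf
ss_rows = sum(sum(row) ** 2 for row in data) / c - cf
cols = [[row[j] for row in data] for j in range(c)]
ss_cols = sum(sum(col) ** 2 for col in cols) / r - cf
ss_error = ss_total - ss_rows - ss_cols

ms_rows = ss_rows / (r - 1)
ms_cols = ss_cols / (c - 1)
ms_error = ss_error / ((r - 1) * (c - 1))
f_rows, f_cols = ms_rows / ms_error, ms_cols / ms_error

print(round(ss_rows, 2), round(ss_cols, 2), round(ss_error, 2))  # 45.6 26.13 69.2
print(round(f_rows, 2), round(f_cols, 2))                        # 1.32 1.51
```

The printed values match the manual table: SS of 45.60, 26.13, and 69.20, and F ratios of 1.32 and 1.51.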
A Faster Way: ANOVA Using DATA ANALYSIS in EXCEL

Anova: Two-Factor Without Replication (a non-replicated experiment). Example: 3 operators measure sample volume.

SUMMARY   Count   Sum   Average   Variance
V1          3     11    3.6667    12.3333
V2          3      4    1.3333     0.3333
V3          3     17    5.6667     6.3333
V4          3     13    4.3333    14.3333
V5          3     19    6.3333    14.3333
OPER 1      5     20    4.0        6.5
OPER 2      5     14    2.8       13.7
OPER 3      5     30    6.0        8.5
A Faster Way: ANOVA Using DATA ANALYSIS in EXCEL

Anova: Two-Factor Without Replication. Example: 3 operators measure sample volume.

ANOVA
Source of Variation     SS        df   MS       F      P-value   Fcrit (0.05 level)
Volume (Rows)           45.60      4   11.40    1.32   0.34      3.84
Operator (Columns)      26.13      2   13.07    1.51   0.28      4.46
Error                   69.20      8    8.65
Total                  140.93     14

The P value is the probability that an F value at least as large as Fcal would occur due to random chance alone, i.e., if the null hypothesis were true. In our case, there is a 34% chance that the difference detected between Volumes is just due to random chance. Another way to express it: if we rejected the null hypothesis that there is no difference between Volumes, we would run a 34% risk of being wrong. Based on this sampling, the statement that there is a difference between volumes would be expected to be correct only 66% of the time! If P = 1, a larger F value would occur by random chance essentially every time. A low P value means that the factor being tested has a significant effect (not due to random chance). Equivalently, if Fcal > Fcrit, the factor has a significant effect!
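The Excel p-values can be reproduced from the F distribution's upper-tail (survival) function. This sketch assumes SciPy is available; the F statistics are taken from the table above:

```python
# Reproduce the Excel p-values: P(F > Fcal) under the F distribution
# with the appropriate degrees of freedom. Assumes SciPy is installed.
from scipy.stats import f

ms_error = 8.65
f_rows = 11.40 / ms_error            # Volumes, df = (4, 8)
f_cols = 13.0667 / ms_error          # Operators, df = (2, 8)

p_rows = f.sf(f_rows, 4, 8)          # survival function = 1 - CDF
p_cols = f.sf(f_cols, 2, 8)

print(round(p_rows, 2), round(p_cols, 2))  # ~0.34 and ~0.28
```

Both p-values agree with the Excel output, and both exceed 0.05, so neither factor is declared significant.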