1 Introduction to One-way ANOVA
- Augustine Willis
- 6 years ago
Review Source: Chapter 10 - Analysis of Variance (ANOVA). Example Data Source: Example problem 10.1 (dataset: exp10-1.mtw). Link to Data: CH10/ Link to Notes: 01.ppt 10_02.ppt 10_03.ppt

1 Introduction to One-way ANOVA

Suppose we wish to examine the differences between I different normal populations with possibly different means \mu_1, \mu_2, ..., \mu_I, but with all variances equal to \sigma^2 (a generalization of the two-sample t-test with equal population variances in Chapter 9). In One-Way Analysis of Variance (ANOVA), we begin with the following null hypothesis,

H_0 : \mu_1 = \mu_2 = \cdots = \mu_I

with the alternative hypothesis

H_a : \mu_l \ne \mu_m, for some l \ne m,

so, if the alternative is true, we say at least two means are different. Figure 1 is a plot of three (I = 3) normal distributions, all with variance equal to one (\sigma^2 = 1) but with means 100, 110, and 120.

Figure 1: Plot of three different normal densities with means 100, 110 and 120, with common variance equal to 1. This is an illustration of the null hypothesis, H_0 : \mu_1 = \mu_2 = \mu_3, being false.

To develop a test, we would draw random samples from each population in question, then use these data to draw inferences about the true state of nature in the underlying distributions. Two such
designed experiments are referred to as balanced and unbalanced single-factor designs associated with ANOVA.

Balanced Design Single Factor ANOVA: In a balanced design, we would draw independent random samples of the same size, say J, from each of the I populations. If the sample sizes were not all equal, the design would be said to be unbalanced (note: there is nothing inherently wrong with an unbalanced design). Table 1 (a) illustrates a balanced design with I treatments/groups and J measurements/observations, and Table 1 (b) presents an unbalanced design.

Table 1 (a): Illustration of a balanced single-factor/one-way design. \bar{X}_{i.} = \sum_{j=1}^{J} X_{ij}/J, i = 1, 2, ..., I, are the individual sample means for each treatment/group and S_1^2, S_2^2, ..., S_I^2 are the individual sample variances from each treatment/group.

Group/treatment   Random sample             Sample size   Mean           Var     Assumed distribution
1                 X_11, X_12, ..., X_1J     J             \bar{X}_{1.}   S_1^2   N(\mu_1, \sigma^2)
2                 X_21, X_22, ..., X_2J     J             \bar{X}_{2.}   S_2^2   N(\mu_2, \sigma^2)
...
I                 X_I1, X_I2, ..., X_IJ     J             \bar{X}_{I.}   S_I^2   N(\mu_I, \sigma^2)

where

\bar{X}_{..} = \frac{\bar{X}_{1.} + \bar{X}_{2.} + \cdots + \bar{X}_{I.}}{I} = \frac{\sum_{j=1}^{J} X_{1j} + \sum_{j=1}^{J} X_{2j} + \cdots + \sum_{j=1}^{J} X_{Ij}}{IJ} = \frac{\sum_{i=1}^{I} \sum_{j=1}^{J} X_{ij}}{IJ}

is the grand mean.

Table 1 (b): Illustration of an unbalanced single-factor/one-way design. \bar{X}_{i.} = \sum_{j=1}^{J_i} X_{ij}/J_i, i = 1, 2, ..., I, are the individual sample means for each treatment/group and S_1^2, S_2^2, ..., S_I^2 are the individual sample variances from each treatment/group.

Group/treatment   Random sample              Sample size   Mean           Var     Assumed distribution
1                 X_11, X_12, ..., X_1J_1    J_1           \bar{X}_{1.}   S_1^2   N(\mu_1, \sigma^2)
2                 X_21, X_22, ..., X_2J_2    J_2           \bar{X}_{2.}   S_2^2   N(\mu_2, \sigma^2)
...
I                 X_I1, X_I2, ..., X_IJ_I    J_I           \bar{X}_{I.}   S_I^2   N(\mu_I, \sigma^2)

where

\bar{X}_{..} = \frac{J_1 \bar{X}_{1.} + J_2 \bar{X}_{2.} + \cdots + J_I \bar{X}_{I.}}{J_1 + J_2 + \cdots + J_I} = \frac{\sum_{j=1}^{J_1} X_{1j} + \sum_{j=1}^{J_2} X_{2j} + \cdots + \sum_{j=1}^{J_I} X_{Ij}}{J_1 + J_2 + \cdots + J_I} = \frac{\sum_{i=1}^{I} \sum_{j=1}^{J_i} X_{ij}}{J_1 + J_2 + \cdots + J_I}
is the grand mean.

Example 1: The article Compression of Single-Wall Corrugated Shipping Containers Using Fixed and Floating Test Platens (J. Testing and Evaluation, 1992) describes an experiment in which several different types of boxes were compared with respect to compression strength (lb). Table 2 displays the data for each box type. The data represent independent random samples from each box type population. Since the sample sizes are all equal, this one-way ANOVA is considered a balanced design. If the sample sizes were not all equal, the design would be said to be unbalanced (note: there is nothing inherently wrong with an unbalanced design). Note: for this example there are I = 4 groups/samples and each sample has J = 6 observations. There are a total of I x J observations in a balanced design experiment, so in this example there are a total of 24 = 4 x 6 observations.

Table 2: Data associated with the one-way designed experiment for box type strength (columns: type of box, compression strength (lb), sample mean, sample SD, with the grand mean at the bottom).

We see that box type 2 has the largest sample mean strength, \bar{x}_{2.}, and box type 4 has the smallest sample mean strength, \bar{x}_{4.}. Box types 1 and 3 seem to be close in strength, on average, with sample mean strengths \bar{x}_{1.} and \bar{x}_{3.}, respectively. The question is: are the differences in the sample means large enough to conclude that the true means are different? Is the sample mean for box type 2 significantly greater than the rest? Is the sample mean for box type 4 significantly smaller than the others? What about the small mean difference between box types 1 and 3?

Data Structure for ANOVA: Before we begin this example, let's examine the typical (but not exclusive) data structure for conducting a one-way ANOVA using statistical software like MINITAB. The data in Table 2 are entered into MINITAB with one column (variable) indicating which group/sample the data are from, while the other column contains all of the measurements/observations.
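The balanced and unbalanced grand means above can be sketched numerically. In this minimal stdlib-Python illustration (the numbers are invented, not the Table 2 data), the balanced grand mean equals the plain average of the group means, while the unbalanced grand mean is the J_i-weighted average of the group means:

```python
def group_means(groups):
    """Individual sample means X-bar_i. for each treatment/group."""
    return [sum(g) / len(g) for g in groups]

def grand_mean(groups):
    """Grand mean X-bar.. : all observations pooled, regardless of group."""
    n = sum(len(g) for g in groups)
    return sum(sum(g) for g in groups) / n

balanced = [[10, 12, 14], [20, 22, 24], [30, 32, 34]]      # I = 3 groups, J = 3 each
unbalanced = [[10, 12], [20, 22, 24], [30, 32, 34, 36]]    # J_1 = 2, J_2 = 3, J_3 = 4

means_b = group_means(balanced)
print(grand_mean(balanced), sum(means_b) / len(means_b))   # equal in the balanced case

means_u = group_means(unbalanced)
weights = [len(g) for g in unbalanced]
weighted = sum(w * m for w, m in zip(weights, means_u)) / sum(weights)
print(grand_mean(unbalanced), weighted)                    # equal: J_i-weighted average
```

In the unbalanced case the plain average of the group means would be (11 + 22 + 33)/3 = 22, which differs from the grand mean; weighting by the sample sizes is what makes the two formulas agree.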
Figure 2: Screenshot of the data set for Example 10.1 illustrating the typical data structure for ANOVA.

The order of appearance of the columns doesn't matter; the variable containing the observations can come first, or the variable containing the group/treatment labels can come first.

Basic/Descriptive Statistics: Before doing ANOVA directly in MINITAB, let's compute the basic statistics and box-plots. First, go to Stat, Basic Statistics, Display Descriptive Statistics and complete the dialogue box displayed in Figure 3(a). After you select the correct options (statistics displayed), you will get the basic statistics displayed in Figure 3(b). The boxplot displayed in Figure 3(c) was obtained by Graph, Boxplot, selecting the with groups option, then Scale, and transposing the value and category scales. We can see from Figure 3(c) that the sample distribution for Box Type 4 doesn't seem to overlap with the other three sample distributions. Also, while there is quite a bit of overlap for the other three samples (Box Types 1, 2 and 3), Box Type 2 tends to have greater strength than the other two. Whatever overall test we develop, we would expect the null hypothesis to be rejected, and follow-up multiple comparisons should at least lead us to the conclusion that Box Type 4 has significantly less strength, on average, compared to the other three.
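The stacked layout described above (one column of group labels, one column of measurements) is easy to mimic in code. This sketch, with invented labels and values, groups a stacked list of (label, value) records and computes per-group descriptive statistics, roughly what MINITAB's Display Descriptive Statistics produces:

```python
from collections import defaultdict
from statistics import mean, stdev

# Stacked/long format: one record per observation, as in the MINITAB worksheet.
# Labels and values here are invented for illustration only.
stacked = [
    ("g1", 5.0), ("g1", 7.0), ("g1", 6.0),
    ("g2", 9.0), ("g2", 11.0), ("g2", 10.0),
    ("g3", 4.0), ("g3", 6.0), ("g3", 5.0),
]

# Unstack: collect the observations belonging to each group label.
by_group = defaultdict(list)
for label, value in stacked:
    by_group[label].append(value)

for label, values in sorted(by_group.items()):
    print(label, "n =", len(values),
          "mean =", round(mean(values), 2),
          "sd =", round(stdev(values), 2))
```

Because each row carries its own group label, the number of observations per group is free to vary, which is why the same layout serves both balanced and unbalanced designs.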
Figures 3 (a), (b) and (c):
Analysis of Means (ANOM): From the MINITAB description, ANOM is a graphical analog to ANOVA that tests the equality of population means. The graph displays each factor level mean, the overall mean, and the decision limits. If a point falls outside the decision limits, then evidence exists that the factor level mean represented by that point is significantly different from the overall mean. Figure 4 contains the ANOM for Example 1. Since the second box type mean is above the decision limits and the fourth box type mean is below the decision limits, this suggests the second is significantly above the rest and the fourth is significantly below. The first and third means are not significantly different from each other. We will show later that these conclusions are confirmed by ANOVA and multiple comparisons using Tukey's method.

Figure 4: Analysis of Means (ANOM) from Example 1 data.
1.1 Sum of Squares (balanced design)

Total Sum of Squares: The Total Sum of Squares (SST) would be the numerator of the sample variance if you were to compute the sample variance of all n = IJ observations without regard to group/treatment,

SST = \sum_{i=1}^{I} \sum_{j=1}^{J} (X_{ij} - \bar{X}_{..})^2.

This is why SST is referred to as the total amount of variability in the response/measurement variable. The degrees of freedom associated with SST are IJ - 1.

Error Sum of Squares: The Error Sum of Squares (SSE) is the numerator for pooled sample variances,

SSE = \sum_{i=1}^{I} \sum_{j=1}^{J} (X_{ij} - \bar{X}_{i.})^2.

SSE is considered the within treatment/group variability. Note, we can also express SSE as

SSE = (J - 1)S_1^2 + (J - 1)S_2^2 + \cdots + (J - 1)S_I^2.

SSE is the unexplained variability, due to uncertainty or random variability. The degrees of freedom associated with SSE are I(J - 1).

Treatment Sum of Squares: The Treatment Sum of Squares (SSTr) represents the variability between treatments/groups,

SSTr = J \sum_{i=1}^{I} (\bar{X}_{i.} - \bar{X}_{..})^2.

If all the population means were equal, SSTr would tend to be small. The bigger the differences between the means, the larger SSTr would tend to be. SSTr is the amount of variability explained by differences between groups/treatments. The degrees of freedom associated with SSTr are I - 1.

Decomposition of Sum of Squares: It can be shown that the total variability (SST) can be decomposed into the sum of SSTr and SSE. Also, the degrees of freedom decompose additively. That is,

SST = SSTr + SSE and df(Total) = df(Error) + df(Treatments).

So, the overall variability in the response/measurement variable is the sum of the between group/treatment variability and the within group (random) variability. In other words, it is the sum of explained variability and unexplained variability.

Coefficient of Determination (R^2): The coefficient of determination, denoted R^2, is the proportion of the total variability (SST) which is explained by the between treatment/group (SSTr) variability.
Since SST = SSTr + SSE, the coefficient of determination is defined to be

R^2 = \frac{SSTr}{SST} = 1 - \frac{SSE}{SST}.
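The decomposition SST = SSTr + SSE, and the R^2 built from it, can be checked numerically. The following stdlib-Python sketch (invented balanced data, not the box-strength measurements) computes each sum of squares directly from its definition:

```python
def anova_ss(groups):
    """Return (SST, SSTr, SSE), each computed from its definitional formula."""
    all_obs = [x for g in groups for x in g]
    grand = sum(all_obs) / len(all_obs)                      # grand mean X-bar..
    sst = sum((x - grand) ** 2 for x in all_obs)             # total variability
    sstr = sum(len(g) * ((sum(g) / len(g)) - grand) ** 2     # between-group
               for g in groups)
    sse = sum((x - sum(g) / len(g)) ** 2                     # within-group
              for g in groups for x in g)
    return sst, sstr, sse

groups = [[10.0, 12.0, 14.0], [20.0, 22.0, 24.0], [31.0, 29.0, 33.0]]
sst, sstr, sse = anova_ss(groups)
r_sq = sstr / sst
print(sst, sstr + sse)   # the two totals agree (up to floating-point rounding)
print(r_sq)              # proportion of total variability that is explained
```

Since SSTr and SSE are each sums of squared terms, both are nonnegative, and the identity SST = SSTr + SSE then forces 0 <= R^2 <= 1, as stated below.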
Note that 0 <= R^2 <= 1. An R^2 = 1 would indicate a perfect fit, in that 100% of the total variability is explained by the differences between treatments/groups and, therefore, there is no random variability.

Example 1 (cont): ANOVA: strength versus box-type

Factor    Type   Levels  Values
box-type  fixed  4       1, 2, 3, 4

Analysis of Variance for strength

Source    DF  SS  MS  F  P
box-type
Error
Total

S =   R-Sq = 79.01%   R-Sq(adj) = 75.86%

Calculator or algebraic simplification for Sum of Squares:

SST = \sum_{i=1}^{I} \sum_{j=1}^{J} (x_{ij} - \bar{x}_{..})^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} x_{ij}^2 - \frac{1}{IJ} x_{..}^2

SSTr = J \sum_{i=1}^{I} (\bar{x}_{i.} - \bar{x}_{..})^2 = \frac{1}{J} \sum_{i=1}^{I} x_{i.}^2 - \frac{1}{IJ} x_{..}^2

SSE = \sum_{i=1}^{I} \sum_{j=1}^{J} (x_{ij} - \bar{x}_{i.})^2, where x_{i.} = \sum_{j=1}^{J} x_{ij} (the group totals) and x_{..} = \sum_{i=1}^{I} x_{i.} (the grand total).

1.2 Sum of Squares (unbalanced)

The results for the unbalanced design are exactly the same as for the balanced design, except for the interim algebraic computations, which reflect the differing sample sizes. Once SST, SSE and SSTr are computed, the analysis for unbalanced designs is exactly the same as for balanced designs. The total sample size is n = \sum_{i=1}^{I} J_i = J_1 + J_2 + \cdots + J_I.

Total Sum of Squares: The Total Sum of Squares (SST) would be the numerator of the sample variance if you were to compute the sample variance of all J_1 + J_2 + \cdots + J_I observations without regard to group/treatment,

SST = \sum_{i=1}^{I} \sum_{j=1}^{J_i} (X_{ij} - \bar{X}_{..})^2.
This is why SST is referred to as the total amount of variability in the response/measurement variable. The total degrees of freedom are df = n - 1, where n = J_1 + J_2 + \cdots + J_I.

Error Sum of Squares: The Error Sum of Squares (SSE) is the numerator for pooled sample variances,

SSE = \sum_{i=1}^{I} \sum_{j=1}^{J_i} (X_{ij} - \bar{X}_{i.})^2.

SSE is considered the within treatment/group variability. Note, we can also express SSE as

SSE = (J_1 - 1)S_1^2 + (J_2 - 1)S_2^2 + \cdots + (J_I - 1)S_I^2.

SSE is the unexplained variability, due to uncertainty or random variability. The error degrees of freedom are n - I.

Treatment Sum of Squares: The Treatment Sum of Squares (SSTr) represents the variability between treatments/groups,

SSTr = \sum_{i=1}^{I} J_i (\bar{X}_{i.} - \bar{X}_{..})^2.

If all the population means were equal, SSTr would tend to be small. The bigger the differences between the means, the larger SSTr would tend to be. SSTr is the amount of variability explained by differences between groups/treatments. The treatment degrees of freedom are df = I - 1.

Decomposition of Sum of Squares: It can be shown that the total variability (SST) can be decomposed into the sum of SSTr and SSE. That is,

SST = SSTr + SSE and df(Total) = df(Error) + df(Treatment).

So, the overall variability in the response/measurement variable is the sum of the between group/treatment variability and the within group (random) variability. In other words, it is the sum of explained variability and unexplained variability.

Coefficient of Determination (R^2): The coefficient of determination, denoted R^2, is the proportion of the total variability (SST) which is explained by the between treatment/group (SSTr) variability. Since SST = SSTr + SSE, the coefficient of determination is defined to be

R^2 = \frac{SSTr}{SST} = 1 - \frac{SSE}{SST}.

Note that 0 <= R^2 <= 1. An R^2 = 1 would indicate a perfect fit, in that 100% of the total variability is explained by the differences between treatments/groups and, therefore, there is no random variability.
Calculator or algebraic simplification for Sum of Squares:

SST = \sum_{i=1}^{I} \sum_{j=1}^{J_i} (x_{ij} - \bar{x}_{..})^2 = \sum_{i=1}^{I} \sum_{j=1}^{J_i} x_{ij}^2 - \frac{1}{n} x_{..}^2

SSTr = \sum_{i=1}^{I} J_i (\bar{x}_{i.} - \bar{x}_{..})^2 = \sum_{i=1}^{I} \frac{1}{J_i} x_{i.}^2 - \frac{1}{n} x_{..}^2

SSE = \sum_{i=1}^{I} \sum_{j=1}^{J_i} (x_{ij} - \bar{x}_{i.})^2, where x_{i.} = \sum_{j=1}^{J_i} x_{ij}, x_{..} = \sum_{i=1}^{I} x_{i.}, and n = J_1 + J_2 + \cdots + J_I.
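The computational shortcuts above can be verified against the definitional forms. This stdlib-Python sketch (invented unbalanced data) checks that the total-and-squares form reproduces SST and that the group-totals form reproduces SSTr:

```python
def ss_definitional(groups):
    """SST and SSTr computed directly from their definitions."""
    all_obs = [x for g in groups for x in g]
    grand = sum(all_obs) / len(all_obs)
    sst = sum((x - grand) ** 2 for x in all_obs)
    sstr = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    return sst, sstr

def ss_shortcut(groups):
    """SST and SSTr via the calculator formulas, using group and grand totals."""
    n = sum(len(g) for g in groups)
    grand_total = sum(sum(g) for g in groups)                       # x..
    sst = sum(x * x for g in groups for x in g) - grand_total ** 2 / n
    sstr = sum(sum(g) ** 2 / len(g) for g in groups) - grand_total ** 2 / n
    return sst, sstr

groups = [[4.0, 6.0], [7.0, 9.0, 11.0], [1.0, 2.0, 3.0, 6.0]]  # J_i = 2, 3, 4
print(ss_definitional(groups))
print(ss_shortcut(groups))   # matches, up to floating-point rounding
```

SSE then follows from the decomposition as SST - SSTr, which is the usual hand-calculation route since it avoids computing each group mean's deviations.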
1.3 Mean Square Error

Mean Square Error (balanced design): The Mean Squared Error (MSE) is

MSE = \frac{S_1^2 + S_2^2 + \cdots + S_I^2}{I} = \frac{\sum_{i=1}^{I} \sum_{j=1}^{J} (X_{ij} - \bar{X}_{i.})^2}{I(J - 1)} = \frac{SSE}{I(J - 1)}.

Notice that MSE is an unbiased estimator of \sigma^2, since E(S_i^2) = \sigma^2, i = 1, 2, ..., I.

Mean Square Error (unbalanced design): The Mean Squared Error (MSE) is

MSE = \frac{(J_1 - 1)S_1^2 + (J_2 - 1)S_2^2 + \cdots + (J_I - 1)S_I^2}{n - I} = \frac{\sum_{i=1}^{I} \sum_{j=1}^{J_i} (X_{ij} - \bar{X}_{i.})^2}{n - I} = \frac{SSE}{n - I}.

Notice that MSE is an unbiased estimator of \sigma^2, since E(S_i^2) = \sigma^2, i = 1, 2, ..., I.

Mean Square for Treatments (both balanced and unbalanced designs): The mean square for treatments (MSTr) is

MSTr = \frac{SSTr}{I - 1}.

Note: if the null hypothesis is true, \mu_1 = \mu_2 = \cdots = \mu_I, then MSTr is also an unbiased estimator of \sigma^2. However, if the null hypothesis were false, then E(MSTr) > E(MSE) = \sigma^2.

F Ratio: If all the normality assumptions hold and the null hypothesis is true, then the ratio of the mean squares is distributed as an F distribution with numerator degrees of freedom equal to I - 1 and denominator degrees of freedom I(J - 1),

F = \frac{MSTr}{MSE} \sim F_{I-1, I(J-1)}.

ANOVA Table (balanced one-factor design)

Source of Variation   df       Sum of squares   Mean Square           f
Treatments            I-1      SSTr             MSTr = SSTr/(I-1)     MSTr/MSE
Error                 I(J-1)   SSE              MSE = SSE/[I(J-1)]
Total                 IJ-1     SST

ANOVA Table (unbalanced one-factor design)

Source of Variation   df       Sum of squares   Mean Square           f
Treatments            I-1      SSTr             MSTr = SSTr/(I-1)     MSTr/MSE
Error                 n-I      SSE              MSE = SSE/(n-I)
Total                 n-1      SST
where n = J_1 + J_2 + \cdots + J_I.

Example 1 (continued): Let \mu_1, \mu_2, \mu_3, and \mu_4 represent the true mean compression strength for each of box types 1, 2, 3 and 4, respectively. Assuming the populations are normally distributed with common variance, \sigma^2, use the data provided to test the null hypothesis that all population means are equal. That is, the null hypothesis is

H_0 : \mu_1 = \mu_2 = \cdots = \mu_I

with the alternative hypothesis

H_a : \mu_l \ne \mu_m, for some l \ne m.

The completed ANOVA table (produced by MINITAB) is given below. The instructions for MINITAB are given on the next page in Figure 5.

ANOVA: strength versus box-type

Factor    Type   Levels  Values
box-type  fixed  4       1, 2, 3, 4

Analysis of Variance for strength

Source    DF  SS  MS  F  P
box-type
Error
Total

S =   R-Sq = 79.01%   R-Sq(adj) = 75.86%

Coefficient of determination (R^2): The coefficient of determination is R^2 = 79.01%. This means that 79.01% of the total variability in compression strength is explained by the mean differences between box types.

p-value and the test: The p-value associated with the global hypothesis that all the population means are equal (the null hypothesis) is near zero, which would lead us to reject the null hypothesis at any reasonable significance level \alpha. Therefore, we reject the null hypothesis and conclude that the mean strengths are significantly different between the box types.

p-value interpretation: If the null hypothesis were true (all the population mean strengths were equal), there would be a near zero chance of observing sample mean differences as large as or larger than we did in this experiment/sample. So, since we concluded that the differences in sample mean strengths were statistically significant between the box types, and 79.01% of the variability is explained by the differences in strengths between the box types, we have a great deal of evidence that at least two of the box types are different, on average. We would like to investigate this further using multiple comparisons.
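The quantities in the ANOVA table are straightforward to assemble from the sums of squares. This stdlib-Python sketch (invented data, not the box-strength measurements) builds MSTr, MSE and the F ratio for a one-way layout; it handles the unbalanced case, and the balanced case is simply all J_i equal:

```python
def one_way_anova(groups):
    """Return the one-way ANOVA table quantities as a small dict."""
    I = len(groups)
    n = sum(len(g) for g in groups)
    all_obs = [x for g in groups for x in g]
    grand = sum(all_obs) / n
    sstr = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    sse = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    mstr = sstr / (I - 1)          # treatment mean square, df = I - 1
    mse = sse / (n - I)            # error mean square, df = n - I
    return {"df_tr": I - 1, "df_err": n - I,
            "SSTr": sstr, "SSE": sse,
            "MSTr": mstr, "MSE": mse, "F": mstr / mse}

# Well-separated group means with small within-group spread -> a large F.
groups = [[10.0, 11.0, 9.0], [20.0, 21.0, 19.0], [30.0, 31.0, 29.0]]
table = one_way_anova(groups)
print(table["F"])   # between-group variability dwarfs within-group variability
```

The F value would then be compared to the F_{I-1, n-I} reference distribution (MINITAB reports the p-value directly; computing it by hand requires F tables, which is why only the statistic is formed here).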
Figure 5: To produce this in MINITAB, go to Stat, ANOVA, One-Way. In the dialogue box, make sure to select the Response data in one column for all factor levels option, then in the box labeled Response put the strength variable (name or column number) and in the box labeled Factor put the box-type variable (name or column number). See the dialogue box below.
1.4 Multiple comparisons

So, as noted, if the null hypothesis is rejected via ANOVA, our conclusion is that at least two means are different. To examine the nature of these differences, we could do, as suggested previously, comparison box plots or some other graphical methods to determine which means are different from each other. However, we would want to follow that up with significance tests for a more formal analysis. To compare all means to each other, we would have to do I(I - 1)/2 comparisons. Recall, in Example 1 there were I = 4 treatment groups, so for all pairwise comparisons you would have to do 4(3)/2 = 6 paired tests, \mu_1 to \mu_2, \mu_1 to \mu_3, \mu_1 to \mu_4, \mu_2 to \mu_3, \mu_2 to \mu_4, and \mu_3 to \mu_4, to cover all possibilities. To control for the fact that we must do many different individual tests, we make adjustments at the individual test level to ensure that the family-wise or experiment-wise type I error level is fixed at \alpha. Tukey's studentized range test is one such method.

Tukey's Method: The process works similarly to hypothesis testing using confidence intervals, as in Chapter 9. To ensure the family-wise level is \alpha, we test the hypothesis for any two means \mu_i and \mu_j, for some i < j,

H_0 : \mu_i = \mu_j versus H_a : \mu_i \ne \mu_j,

by computing the adjusted confidence intervals

\bar{X}_{i.} - \bar{X}_{j.} \pm Q_{\alpha, I, I(J-1)} \sqrt{MSE/J},

where Q_{\alpha, I, I(J-1)} is found in Table A.10 and is an inflated version of the t-distribution quantile used in the Chapter 9 methods. For each pair, if the interval contains zero, then we fail to reject the null hypothesis and conclude that the two sample means are not significantly different. If the interval does not contain zero, then we reject the null hypothesis and conclude that the sample means are significantly different from each other (and, therefore, we conclude \mu_i \ne \mu_j).
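Given a studentized-range critical value Q (which these notes take from Table A.10; the value is supplied by the caller here, not computed), the Tukey intervals for all pairs follow directly from the formula above. A hypothetical sketch for the balanced case, with invented means, MSE and Q:

```python
from itertools import combinations
from math import sqrt

def tukey_intervals(means, mse, J, q):
    """Tukey simultaneous CIs for all pairwise mean differences (balanced design).

    means: group sample means X-bar_i.
    mse:   mean square error from the ANOVA table
    J:     common per-group sample size
    q:     Q_{alpha, I, I(J-1)} looked up in a studentized-range table
    """
    half_width = q * sqrt(mse / J)       # same margin for every pair when balanced
    out = {}
    for i, j in combinations(range(len(means)), 2):
        diff = means[i] - means[j]
        lo, hi = diff - half_width, diff + half_width
        significant = not (lo <= 0.0 <= hi)   # reject H0 iff 0 is outside the CI
        out[(i + 1, j + 1)] = (lo, hi, significant)
    return out

# Invented illustration: 4 groups, J = 6; the fourth mean sits well below the rest.
for pair, (lo, hi, sig) in sorted(tukey_intervals([700, 750, 690, 560],
                                                  2500.0, 6, 3.96).items()):
    print(pair, (round(lo, 1), round(hi, 1)),
          "significant" if sig else "not significant")
```

With these invented numbers, only the pairs involving the fourth group exclude zero, mirroring the kind of conclusion reached for box type 4 in Example 1.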
Example 1 (continued): Recall, previously, we rejected the global null hypothesis that all the population means are equal and concluded that at least two population means are different. Summary statistics, box plots and ANOM all suggested that the sample mean for box type 4 was significantly less than the other three. Here, we follow up with formal multiple comparisons using Tukey's method and MINITAB; see Figure 6 for the MINITAB instructions. Below is the output produced.

Tukey Pairwise Comparisons

Grouping Information Using the Tukey Method and 95% Confidence

box_type  N  Mean  Grouping
2         6        A
1         6        A
3         6        A
4         6        B

Means that do not share a letter are significantly different.
Since the means for box types 2, 1 and 3 all share the same letter, those means are not considered significantly different. However, box type 4 has a different letter than the other three, so we say the sample mean for box type 4 is significantly different from the other three. This confirms our visual inspection (graphical analysis). MINITAB also supplied the output for the Tukey simultaneous tests for differences between the means, upon which the above summary was based. This output is provided below:

Tukey Simultaneous Tests for Differences of Means

Difference  Difference  SE of                         Adjusted
of Levels   of Means    Difference  95% CI            T-Value  P-Value
                                    ( -22.6, 110.4)
                                    ( -81.4,  51.6)
                                    (-217.5, -84.5)
                                    (-125.4,   7.6)
                                    (-261.4,      )
                                    (-202.5, -69.5)

Individual confidence level = 98.89%

Figure 6: In MINITAB, follow all the steps to produce the ANOVA (as demonstrated previously), but click on the Comparisons button and fill in the dialogue box as indicated below.
Figure 7 (a): Figure 7 (b):
1.5 Checking the Assumptions

In class, I went through an example where we checked the assumptions. Recall, for the F-test in an ANOVA to be valid, we assume that the underlying populations are normally distributed with equal variances (the common variance assumption). To examine the normality and common variance assumptions, we produce a histogram, a normal probability plot and a residual plot, all based on the residuals. To get the plots in Figure 8, below, you fill in the Graphs dialogue box as indicated in Figure 9, on the next page.

Figure 8: Residual plots based on ANOVA for Example 1.

Based on the normal probability plot (upper left corner), the data do not indicate any significant deviations from normality. In the residual plot (upper right corner), we would expect random scatter about zero, and the spread of the data points should be constant. Based on this graph, it looks like the constant variance assumption holds.
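The residuals these plots are built from are simply e_ij = X_ij - X-bar_i., so they sum to zero within each group by construction. A rough stdlib-Python sketch (invented data) computes them, along with a per-group spread that can be eyeballed for the common variance assumption:

```python
from statistics import pstdev

def residuals(groups):
    """Residuals e_ij = X_ij - X-bar_i. for each treatment/group."""
    out = []
    for g in groups:
        m = sum(g) / len(g)
        out.append([x - m for x in g])
    return out

groups = [[10.0, 12.0, 14.0], [20.0, 25.0, 30.0], [7.0, 9.0, 11.0]]
res = residuals(groups)
for i, r in enumerate(res, start=1):
    # Within each group the residuals sum to zero; roughly similar spreads
    # across groups would support the common variance assumption (formal
    # checks, e.g. a normal probability plot of all residuals, go further).
    print("group", i, "residuals:", r, "spread:", round(pstdev(r), 3))
```

The normal probability plot in Figure 8 is just a plot of these pooled residuals against normal quantiles, and the residual plot is these values against the fitted group means.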
Figure 9: How to get the residual plots based on ANOVA for Example 1.
More information1 Introduction to Minitab
1 Introduction to Minitab Minitab is a statistical analysis software package. The software is freely available to all students and is downloadable through the Technology Tab at my.calpoly.edu. When you
More informationChapter 16. Simple Linear Regression and Correlation
Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will
More informationSTATS Analysis of variance: ANOVA
STATS 1060 Analysis of variance: ANOVA READINGS: Chapters 28 of your text book (DeVeaux, Vellman and Bock); on-line notes for ANOVA; on-line practice problems for ANOVA NOTICE: You should print a copy
More informationCHAPTER 4 Analysis of Variance. One-way ANOVA Two-way ANOVA i) Two way ANOVA without replication ii) Two way ANOVA with replication
CHAPTER 4 Analysis of Variance One-way ANOVA Two-way ANOVA i) Two way ANOVA without replication ii) Two way ANOVA with replication 1 Introduction In this chapter, expand the idea of hypothesis tests. We
More informationChapter 16. Simple Linear Regression and dcorrelation
Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will
More informationSTA2601. Tutorial letter 203/2/2017. Applied Statistics II. Semester 2. Department of Statistics STA2601/203/2/2017. Solutions to Assignment 03
STA60/03//07 Tutorial letter 03//07 Applied Statistics II STA60 Semester Department of Statistics Solutions to Assignment 03 Define tomorrow. university of south africa QUESTION (a) (i) The normal quantile
More informationBIOL Biometry LAB 6 - SINGLE FACTOR ANOVA and MULTIPLE COMPARISON PROCEDURES
BIOL 458 - Biometry LAB 6 - SINGLE FACTOR ANOVA and MULTIPLE COMPARISON PROCEDURES PART 1: INTRODUCTION TO ANOVA Purpose of ANOVA Analysis of Variance (ANOVA) is an extremely useful statistical method
More informationStat 529 (Winter 2011) Experimental Design for the Two-Sample Problem. Motivation: Designing a new silver coins experiment
Stat 529 (Winter 2011) Experimental Design for the Two-Sample Problem Reading: 2.4 2.6. Motivation: Designing a new silver coins experiment Sample size calculations Margin of error for the pooled two sample
More informationIn ANOVA the response variable is numerical and the explanatory variables are categorical.
1 ANOVA ANOVA means ANalysis Of VAriance. The ANOVA is a tool for studying the influence of one or more qualitative variables on the mean of a numerical variable in a population. In ANOVA the response
More informationOne-Way Analysis of Variance. With regression, we related two quantitative, typically continuous variables.
One-Way Analysis of Variance With regression, we related two quantitative, typically continuous variables. Often we wish to relate a quantitative response variable with a qualitative (or simply discrete)
More informationCHAPTER 13: F PROBABILITY DISTRIBUTION
CHAPTER 13: F PROBABILITY DISTRIBUTION continuous probability distribution skewed to the right variable values on horizontal axis are 0 area under the curve represents probability horizontal asymptote
More informationMuch of the material we will be covering for a while has to do with designing an experimental study that concerns some phenomenon of interest.
Experimental Design: Much of the material we will be covering for a while has to do with designing an experimental study that concerns some phenomenon of interest We wish to use our subjects in the best
More informationMathematics for Economics MA course
Mathematics for Economics MA course Simple Linear Regression Dr. Seetha Bandara Simple Regression Simple linear regression is a statistical method that allows us to summarize and study relationships between
More informationMathematical Notation Math Introduction to Applied Statistics
Mathematical Notation Math 113 - Introduction to Applied Statistics Name : Use Word or WordPerfect to recreate the following documents. Each article is worth 10 points and should be emailed to the instructor
More informationLecture 3: Inference in SLR
Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals
More informationEstimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.
Estimating σ 2 We can do simple prediction of Y and estimation of the mean of Y at any value of X. To perform inferences about our regression line, we must estimate σ 2, the variance of the error term.
More informationANOVA CIVL 7012/8012
ANOVA CIVL 7012/8012 ANOVA ANOVA = Analysis of Variance A statistical method used to compare means among various datasets (2 or more samples) Can provide summary of any regression analysis in a table called
More informationTable of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).
Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). For example P(X.04) =.8508. For z < 0 subtract the value from,
More informationMultiple comparisons - subsequent inferences for two-way ANOVA
1 Multiple comparisons - subsequent inferences for two-way ANOVA the kinds of inferences to be made after the F tests of a two-way ANOVA depend on the results if none of the F tests lead to rejection of
More informationCh. 1: Data and Distributions
Ch. 1: Data and Distributions Populations vs. Samples How to graphically display data Histograms, dot plots, stem plots, etc Helps to show how samples are distributed Distributions of both continuous and
More informationDESAIN EKSPERIMEN Analysis of Variances (ANOVA) Semester Genap 2017/2018 Jurusan Teknik Industri Universitas Brawijaya
DESAIN EKSPERIMEN Analysis of Variances (ANOVA) Semester Jurusan Teknik Industri Universitas Brawijaya Outline Introduction The Analysis of Variance Models for the Data Post-ANOVA Comparison of Means Sample
More informationKeller: Stats for Mgmt & Econ, 7th Ed July 17, 2006
Chapter 17 Simple Linear Regression and Correlation 17.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will
More informationConfidence Interval for the mean response
Week 3: Prediction and Confidence Intervals at specified x. Testing lack of fit with replicates at some x's. Inference for the correlation. Introduction to regression with several explanatory variables.
More informationRegression used to predict or estimate the value of one variable corresponding to a given value of another variable.
CHAPTER 9 Simple Linear Regression and Correlation Regression used to predict or estimate the value of one variable corresponding to a given value of another variable. X = independent variable. Y = dependent
More informationReview of Statistics 101
Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods
More informationVariance Decomposition and Goodness of Fit
Variance Decomposition and Goodness of Fit 1. Example: Monthly Earnings and Years of Education In this tutorial, we will focus on an example that explores the relationship between total monthly earnings
More informationINTRODUCTION TO DESIGN AND ANALYSIS OF EXPERIMENTS
GEORGE W. COBB Mount Holyoke College INTRODUCTION TO DESIGN AND ANALYSIS OF EXPERIMENTS Springer CONTENTS To the Instructor Sample Exam Questions To the Student Acknowledgments xv xxi xxvii xxix 1. INTRODUCTION
More informationThis module focuses on the logic of ANOVA with special attention given to variance components and the relationship between ANOVA and regression.
WISE ANOVA and Regression Lab Introduction to the WISE Correlation/Regression and ANOVA Applet This module focuses on the logic of ANOVA with special attention given to variance components and the relationship
More informationLAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION
LAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION In this lab you will learn how to use Excel to display the relationship between two quantitative variables, measure the strength and direction of the
More informationDesign of Engineering Experiments Part 2 Basic Statistical Concepts Simple comparative experiments
Design of Engineering Experiments Part 2 Basic Statistical Concepts Simple comparative experiments The hypothesis testing framework The two-sample t-test Checking assumptions, validity Comparing more that
More informationMultiple Regression Methods
Chapter 1: Multiple Regression Methods Hildebrand, Ott and Gray Basic Statistical Ideas for Managers Second Edition 1 Learning Objectives for Ch. 1 The Multiple Linear Regression Model How to interpret
More informationStatistics and Quantitative Analysis U4320
Statistics and Quantitative Analysis U3 Lecture 13: Explaining Variation Prof. Sharyn O Halloran Explaining Variation: Adjusted R (cont) Definition of Adjusted R So we'd like a measure like R, but one
More informationReview for Final. Chapter 1 Type of studies: anecdotal, observational, experimental Random sampling
Review for Final For a detailed review of Chapters 1 7, please see the review sheets for exam 1 and. The following only briefly covers these sections. The final exam could contain problems that are included
More informationKeppel, G. & Wickens, T.D. Design and Analysis Chapter 2: Sources of Variability and Sums of Squares
Keppel, G. & Wickens, T.D. Design and Analysis Chapter 2: Sources of Variability and Sums of Squares K&W introduce the notion of a simple experiment with two conditions. Note that the raw data (p. 16)
More informationBattery Life. Factory
Statistics 354 (Fall 2018) Analysis of Variance: Comparing Several Means Remark. These notes are from an elementary statistics class and introduce the Analysis of Variance technique for comparing several
More informationIndependent Samples ANOVA
Independent Samples ANOVA In this example students were randomly assigned to one of three mnemonics (techniques for improving memory) rehearsal (the control group; simply repeat the words), visual imagery
More informationMultiple Regression Examples
Multiple Regression Examples Example: Tree data. we have seen that a simple linear regression of usable volume on diameter at chest height is not suitable, but that a quadratic model y = β 0 + β 1 x +
More informationMultiple Regression. Inference for Multiple Regression and A Case Study. IPS Chapters 11.1 and W.H. Freeman and Company
Multiple Regression Inference for Multiple Regression and A Case Study IPS Chapters 11.1 and 11.2 2009 W.H. Freeman and Company Objectives (IPS Chapters 11.1 and 11.2) Multiple regression Data for multiple
More informationUsing SPSS for One Way Analysis of Variance
Using SPSS for One Way Analysis of Variance This tutorial will show you how to use SPSS version 12 to perform a one-way, between- subjects analysis of variance and related post-hoc tests. This tutorial
More informationFormal Statement of Simple Linear Regression Model
Formal Statement of Simple Linear Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters X i is a known constant, the value of the predictor
More informationAnalysis of Covariance. The following example illustrates a case where the covariate is affected by the treatments.
Analysis of Covariance In some experiments, the experimental units (subjects) are nonhomogeneous or there is variation in the experimental conditions that are not due to the treatments. For example, a
More informationAnalysing qpcr outcomes. Lecture Analysis of Variance by Dr Maartje Klapwijk
Analysing qpcr outcomes Lecture Analysis of Variance by Dr Maartje Klapwijk 22 October 2014 Personal Background Since 2009 Insect Ecologist at SLU Climate Change and other anthropogenic effects on interaction
More informationOne-Way ANOVA. Some examples of when ANOVA would be appropriate include:
One-Way ANOVA 1. Purpose Analysis of variance (ANOVA) is used when one wishes to determine whether two or more groups (e.g., classes A, B, and C) differ on some outcome of interest (e.g., an achievement
More informationThe entire data set consists of n = 32 widgets, 8 of which were made from each of q = 4 different materials.
One-Way ANOVA Summary The One-Way ANOVA procedure is designed to construct a statistical model describing the impact of a single categorical factor X on a dependent variable Y. Tests are run to determine
More informationIn a one-way ANOVA, the total sums of squares among observations is partitioned into two components: Sums of squares represent:
Activity #10: AxS ANOVA (Repeated subjects design) Resources: optimism.sav So far in MATH 300 and 301, we have studied the following hypothesis testing procedures: 1) Binomial test, sign-test, Fisher s
More informationCorrelation and the Analysis of Variance Approach to Simple Linear Regression
Correlation and the Analysis of Variance Approach to Simple Linear Regression Biometry 755 Spring 2009 Correlation and the Analysis of Variance Approach to Simple Linear Regression p. 1/35 Correlation
More informationSimple Linear Regression: One Quantitative IV
Simple Linear Regression: One Quantitative IV Linear regression is frequently used to explain variation observed in a dependent variable (DV) with theoretically linked independent variables (IV). For example,
More informationAnnouncements. Unit 4: Inference for numerical variables Lecture 4: ANOVA. Data. Statistics 104
Announcements Announcements Unit 4: Inference for numerical variables Lecture 4: Statistics 104 Go to Sakai s to pick a time for a one-on-one meeting. Mine Çetinkaya-Rundel June 6, 2013 Statistics 104
More informationMSc / PhD Course Advanced Biostatistics. dr. P. Nazarov
MSc / PhD Course Advanced Biostatistics dr. P. Nazarov petr.nazarov@crp-sante.lu 04-1-013 L4. Linear models edu.sablab.net/abs013 1 Outline ANOVA (L3.4) 1-factor ANOVA Multifactor ANOVA Experimental design
More informationChapte The McGraw-Hill Companies, Inc. All rights reserved.
12er12 Chapte Bivariate i Regression (Part 1) Bivariate Regression Visual Displays Begin the analysis of bivariate data (i.e., two variables) with a scatter plot. A scatter plot - displays each observed
More informationResearch Methods II MICHAEL BERNSTEIN CS 376
Research Methods II MICHAEL BERNSTEIN CS 376 Goal Understand and use statistical techniques common to HCI research 2 Last time How to plan an evaluation What is a statistical test? Chi-square t-test Paired
More informationTopic 22 Analysis of Variance
Topic 22 Analysis of Variance Comparing Multiple Populations 1 / 14 Outline Overview One Way Analysis of Variance Sample Means Sums of Squares The F Statistic Confidence Intervals 2 / 14 Overview Two-sample
More informationStat 6640 Solution to Midterm #2
Stat 6640 Solution to Midterm #2 1. A study was conducted to examine how three statistical software packages used in a statistical course affect the statistical competence a student achieves. At the end
More information