Notes for Week 13: Analysis of Variance (ANOVA) continued

Exam 3 is on Friday May 1. A part of one of the exam problems is on Prediction intervals :

When randomly sampling from a normal population, in order to predict a future value $X_{n+1}$ using its point estimate $\bar{X}$, the prediction error $\bar{X} - X_{n+1}$, being a linear combination of independent normals, is itself normally distributed with mean $E[\bar{X} - X_{n+1}] = \mu - \mu = 0$ and variance
$$V[\bar{X} - X_{n+1}] = V[\bar{X}] + V[X_{n+1}] = \frac{\sigma^2}{n} + \sigma^2 = \sigma^2\left(1 + \frac{1}{n}\right).$$
Thus the standardized variable
$$Z = \frac{\bar{X} - X_{n+1}}{\sigma\sqrt{1 + \frac{1}{n}}}$$
will be standard normal. Hence with probability $1 - \alpha$ we can say $-z_{\alpha/2} < Z < z_{\alpha/2}$, or solving for the future value $X_{n+1}$, we find that the $100(1-\alpha)\%$ prediction interval for a single future observation $X_{n+1}$ (PI for short) is :
$$\bar{X} - z_{\alpha/2}\,\sigma\sqrt{1 + \frac{1}{n}} \;<\; X_{n+1} \;<\; \bar{X} + z_{\alpha/2}\,\sigma\sqrt{1 + \frac{1}{n}}.$$
Note that the prediction interval, whose width is proportional to the factor $\sqrt{1 + 1/n}$ (close to 1), is considerably wider (by approximately a factor of $\sqrt{n}$) than the confidence interval for the mean, whose width is proportional to $1/\sqrt{n}$.

Completely randomized designs (or one way classification) : Given independent random samples from $k$ different populations, which could represent treatments, groups, etc., the experimenter wishes to test the null hypothesis $H_0 : \mu_1 = \mu_2 = \dots = \mu_k = \mu$ that the (actual population) means from these samples are the same. Denoting the $j$ th observation in the $i$ th sample by $y_{ij}$, we have the one way classification scheme
$$\begin{array}{llll}
 & \text{Observations} & \text{Means} & \text{Sums of Squares} \\
\text{Sample 1 :} & y_{11}, y_{12}, \dots, y_{1n_1} & \bar{y}_1 & \sum_{j=1}^{n_1} (y_{1j} - \bar{y}_1)^2 \\
\text{Sample 2 :} & y_{21}, y_{22}, \dots, y_{2n_2} & \bar{y}_2 & \sum_{j=1}^{n_2} (y_{2j} - \bar{y}_2)^2 \\
\vdots & & & \\
\text{Sample } k \text{ :} & y_{k1}, y_{k2}, \dots, y_{kn_k} & \bar{y}_k & \sum_{j=1}^{n_k} (y_{kj} - \bar{y}_k)^2
\end{array}$$
The sum of all observations, the total sample size N and the overall sample mean or grand mean are
$$T_\bullet = \sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij} = \sum_{i=1}^{k} T_i \quad\text{where}\quad T_i = \sum_{j=1}^{n_i} y_{ij}, \qquad N = \sum_{i=1}^{k} n_i, \qquad \bar{y} = \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij} = \frac{1}{N}\sum_{i=1}^{k} n_i\,\bar{y}_i = \frac{T_\bullet}{N}.$$

Total sum of squares decomposition : (SSE = within sample SS) + (SS(Tr) = between sample SS)

total sum of squares = error sum of squares + treatment sum of squares
SST = SSE + SS(Tr)
$$\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 + \sum_{i=1}^{k} n_i\,(\bar{y}_i - \bar{y})^2$$
Degrees of freedom : $N - 1 = (N - k) + (k - 1)$
Mean Square (MS) : treatment mean square MS(Tr) = SS(Tr)/(k-1), error mean square MSE = SSE/(N-k)

To get a mean square we divide each sum of squares by its number of degrees of freedom.

Derivation : The sum of squares identity follows with a little algebra upon squaring out $y_{ij} - \bar{y} = (y_{ij} - \bar{y}_i) + (\bar{y}_i - \bar{y})$ and summing the result, noting that the sum of the cross term vanishes by definition of the sample means, while the square of the last term is the same within a given sample of size $n_i$ as it doesn't depend on $j$.

With the correction term for the mean given by $C$ one has the Shortcut formulas :
$$C = \frac{T_\bullet^2}{N} = N\bar{y}^2, \qquad SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}^2 - C, \qquad SS(Tr) = \sum_{i=1}^{k} \frac{T_i^2}{n_i} - C, \qquad SSE = SST - SS(Tr)$$
where $T_i = \sum_{j=1}^{n_i} y_{ij} = n_i\,\bar{y}_i$ is the sum of observations in the $i$ th sample.

Hypothesis test : Under the assumption of the null hypothesis $H_0 : \mu_1 = \mu_2 = \dots = \mu_k = \mu$ that the treatment means are the same, both the error and the treatment mean squares are unbiased estimates of $\sigma^2$. That is, one has $\sigma^2 = E[MS(Tr)] = E[MSE]$. Thus under $H_0$ these mean square quantities behave very much like sample variances. Their ratio, upon which the test is based, is the F test statistic
$$F = F(k-1,\, N-k) = \frac{MS(Tr)}{MSE} = \frac{SS(Tr)/(k-1)}{SSE/(N-k)}$$
with $k-1$ numerator and $N-k$ denominator degrees of freedom. We reject $H_0$ at significance level $\alpha$ if the above F statistic exceeds the F critical value $F_\alpha(k-1,\, N-k)$
corresponding to the numbers of degrees of freedom above. Typically the results obtained by decomposing the total sum of squares into its parts are summarized in an Analysis of Variance Table :

Source of      Degrees of    Sum of
variation      freedom       squares    Mean square               F
Treatments     k-1           SS(Tr)     MS(Tr) = SS(Tr)/(k-1)     MS(Tr)/MSE
Error          N-k           SSE        MSE = SSE/(N-k)
Total          N-1           SST

Conclusion : Reject $H_0 : \mu_1 = \mu_2 = \dots = \mu_k = \mu$ if $F > F_\alpha(k-1,\, N-k)$. The alternative hypothesis here is that at least two of the population means are different : $H_a : \mu_m \neq \mu_n$ for some $m \neq n$ (at least two means are unequal).

EXAMPLE 1 In an effort to determine the most effective way to teach safety principles to a group of employees at Weedco, four different methods were used. Each of a sample of 20 employees was randomly assigned to one of the four groups. The first group was given programmed instruction booklets and worked through the course at their own pace. The second group attended lectures. The third group watched television presentations, and a fourth was divided into small discussion groups. At the end of the session, a test was given to the four groups. A high score of 10 was possible. The results were :

TEST GRADES
Programmed                           Group
instruction    Lecture    TV         discussion
6              8          7          5
5              7          9          5
6              8          6          6
5              8          8          6
6              8          9          5

The following is an Analysis of Variance Minitab software output with missing information :

ANALYSIS OF VARIANCE ON GRADES
SOURCE    DF    SS        MS       F
TREAT           26.550    8.850
ERROR
TOTAL           36.550

a) Complete the missing values :
SSE = SST - SS(Tr) = 36.550 - 26.550 = 10.0
N = 20 total observations so N-1 = 19 is the total DF (degrees of freedom).
There are k = 4 groups so k-1 = 3 is the treatment DF and N-k = 16 is the error DF.
MSE = SSE/(N-k) = 10/16 = .625
F = MS(Tr)/MSE = 8.850/.625 = 14.16
To summarize we have

SOURCE    DF    SS        MS       F
TREAT     3     26.550    8.850    14.16
ERROR     16    10.0      .625
TOTAL     19    36.550

b) Test at the .05 level that there is no difference among the four means.
$H_0 : \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu$
$H_a : \mu_m \neq \mu_n$ for some $m \neq n$
Reject $H_0$ if $F > F_{.05}(3, 16) = 3.24$.
Decision : F = 14.16 is greater than 3.24 so we reject $H_0$ at significance level .05.

For these sample means $\bar{y}_1 = 5.6$, $\bar{y}_2 = 7.8$, $\bar{y}_3 = 7.8$, $\bar{y}_4 = 5.4$, the largest difference is $\bar{y}_2 - \bar{y}_4 = 2.4$. To get a 95% confidence interval for the actual difference $\mu_2 - \mu_4$, for the sample variance $s^2$ we use the pooled MSE (the mean sum of squares for error), which is an unbiased estimator of $\sigma^2$. We re-work the statement
$$-t_{\alpha/2} < t = \frac{(\bar{y}_2 - \bar{y}_4) - (\mu_2 - \mu_4)}{\sqrt{s^2\left(\frac{1}{n_2} + \frac{1}{n_4}\right)}} < t_{\alpha/2}$$
where the number of degrees of freedom of the t variable is the same as the number N-k = 16 for $s^2 = MSE$. Since
$$s^2\left(\frac{1}{n_2} + \frac{1}{n_4}\right) = .625 \cdot \frac{2}{5} = .25 = \frac{1}{4} \quad\text{so}\quad \sqrt{s^2\left(\frac{1}{n_2} + \frac{1}{n_4}\right)} = \frac{1}{2},$$
we find our 95% CI for $\mu_2 - \mu_4$ is
$$\bar{y}_2 - \bar{y}_4 \pm t_{.025,\,16}\sqrt{s^2\left(\frac{1}{n_2} + \frac{1}{n_4}\right)} = 2.4 \pm 2.12/2 = [1.34,\ 3.46].$$
Note that we cannot be 95% sure that all $\binom{k}{2} = \binom{4}{2} = 6$ differences of means simultaneously lie in their six individual 95% confidence intervals, since knowing that 6 different events each have probability .95 does not allow us to conclude the same is true of their intersection. Bonferroni's method, discussed in section 12.4, says that if we want a confidence interval statement for all 6 differences of means to hold simultaneously with probability at least $1 - \alpha$, then in the individual statements we should replace $\alpha$ by $\alpha/6$ to get our CI's for each difference.

EXAMPLE 2 Problem 12.5 of text. The following are numbers of mistakes made in 5 successive days for 4 technicians working for a photographic laboratory :

Technician I    Technician II    Technician III    Technician IV
6               14               10                9
14              9                12                12
10              12               7                 8
8               10               15                10
11              14               11                11

Test at the level of significance $\alpha = .01$ whether the differences among the 4 sample means can be attributed to chance.

There are N = 20 total observations here, hence 19 total degrees of freedom. The grand mean is $\bar{y} = 10.65$. The four sample means are $\bar{y}_1 = 9.8$, $\bar{y}_2 = 11.8$, $\bar{y}_3 = 11$, $\bar{y}_4 = 10$, corresponding to the sample totals $T_1 = 49$, $T_2 = 59$, $T_3 = 55$, $T_4 = 50$. The grand total is $T_\bullet = 213$ so $C = 213^2/20 = 2268.45$.
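The totals and correction term for Example 2, and the shortcut formulas that use them, can be checked numerically. The following is a minimal Python sketch, assuming the data table is read column by column (one list per technician); the variable names are illustrative and not part of the notes.

```python
# Mistake counts from Example 2, one list per technician (columns of the data table).
mistakes = {
    "I":   [6, 14, 10, 8, 11],
    "II":  [14, 9, 12, 10, 14],
    "III": [10, 12, 7, 15, 11],
    "IV":  [9, 12, 8, 10, 11],
}

totals = {name: sum(obs) for name, obs in mistakes.items()}  # sample totals T_i
N = sum(len(obs) for obs in mistakes.values())                # total sample size
grand_total = sum(totals.values())                            # grand total
C = grand_total ** 2 / N                                      # correction term for the mean

# Shortcut formulas: SST = sum of y^2 - C, SS(Tr) = sum of T_i^2 / n_i - C.
sum_of_squares = sum(y * y for obs in mistakes.values() for y in obs)
SST = sum_of_squares - C
SSTr = sum(totals[name] ** 2 / len(obs) for name, obs in mistakes.items()) - C
SSE = SST - SSTr

print(totals, grand_total, round(C, 2))
print(sum_of_squares, round(SST, 2), round(SSTr, 2), round(SSE, 2))
```

Working from totals rather than deviations mirrors the hand calculation: only the raw sums and sums of squares are needed.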
$\sum_{i=1}^{4}\sum_{j=1}^{5} y_{ij}^2 = 2383$ so SST = 2383 - 2268.45 = 114.55 by the shortcut formula.
Since each $n_i = 5$, $SS(Tr) = \sum_{i=1}^{4} \frac{T_i^2}{5} - C = 2281.4 - 2268.45 = 12.95$.
Thus SSE = SST - SS(Tr) = 114.55 - 12.95 = 101.6 and we have the ANOVA table

SOURCE    DF    SS        MS        F
TREAT     3     12.95     4.3167    .67979
ERROR     16    101.6     6.350
TOTAL     19    114.55

Decision : Since (using table 6(b) of appendix B) $F = .67979 < F_{.01}(3, 16) = 5.29$, we do not reject $H_0$ at significance level .01.

Note that in this case, if we want to write down a confidence interval using the grand mean $\bar{y}$ to estimate the common mean $\mu$ belonging to all k = 4 populations, then for the pooled variance $s^2$, under the assumption of $H_0$, we can use the total sum of squares over the total degrees of freedom, or $s^2 = SST/(N-1) = 114.55/19 = 6.028947$, which gives s = 2.45539. To get a 95% CI for $\mu$ (for a t critical value with N-1 = 19 degrees of freedom) we solve the inequality
$$-t_{\alpha/2} < t = \frac{\bar{y} - \mu}{s/\sqrt{N}} < t_{\alpha/2}$$
giving the $100(1-\alpha)\%$ CI for $\mu$ :
$$\bar{y} \pm t_{.025,\,19}\, s/\sqrt{20} = 10.65 \pm 2.093\,(2.45539)/\sqrt{20} = 10.65 \pm 1.1491 = [9.501,\ 11.799]$$

EXAMPLE 3 Problem 12.7 of text. Given the following observations collected according to the one way analysis of variance design

Treatment 1 :   6    4    5
Treatment 2 :   13   10   13   12
Treatment 3 :   7    9    11
Treatment 4 :   3    6    1    4    1

a) Decompose each observation $y_{ij}$ as $y_{ij} = \bar{y} + (\bar{y}_i - \bar{y}) + (y_{ij} - \bar{y}_i)$ and obtain the sum of squares and degrees of freedom for each component.

There are N = 15 observations, with total sum $T_\bullet = 105$, hence grand mean 105/15 = 7. The sample sums are $T_1 = 15$, $T_2 = 48$, $T_3 = 27$, $T_4 = 15$. The corresponding sample means are $\bar{y}_1 = 15/3 = 5$, $\bar{y}_2 = 48/4 = 12$, $\bar{y}_3 = 27/3 = 9$, and $\bar{y}_4 = 15/5 = 3$. In matrix notation, $y_{ij} = \bar{y} + (\bar{y}_i - \bar{y}) + (y_{ij} - \bar{y}_i)$ reads
$$\begin{bmatrix} 6 & 4 & 5 & & \\ 13 & 10 & 13 & 12 & \\ 7 & 9 & 11 & & \\ 3 & 6 & 1 & 4 & 1 \end{bmatrix}
= \begin{bmatrix} 7 & 7 & 7 & & \\ 7 & 7 & 7 & 7 & \\ 7 & 7 & 7 & & \\ 7 & 7 & 7 & 7 & 7 \end{bmatrix}
+ \begin{bmatrix} -2 & -2 & -2 & & \\ 5 & 5 & 5 & 5 & \\ 2 & 2 & 2 & & \\ -4 & -4 & -4 & -4 & -4 \end{bmatrix}
+ \begin{bmatrix} 1 & -1 & 0 & & \\ 1 & -2 & 1 & 0 & \\ -2 & 0 & 2 & & \\ 0 & 3 & -2 & 1 & -2 \end{bmatrix}$$
$$SST = (-1)^2 + (-3)^2 + (-2)^2 + 6^2 + 3^2 + 6^2 + 5^2 + 0^2 + 2^2 + 4^2 + (-4)^2 + (-1)^2 + (-6)^2 + (-3)^2 + (-6)^2 = 238$$
with N-1 = 14 d. f.
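The decomposition in part a) of Example 3 can be reproduced numerically. The following is a minimal Python sketch, assuming the observations are stored as one list per treatment; the names `samples`, `treatment_dev` and `residual` are illustrative only.

```python
# Observations from Example 3, one list per treatment (sample sizes differ).
samples = [
    [6, 4, 5],
    [13, 10, 13, 12],
    [7, 9, 11],
    [3, 6, 1, 4, 1],
]

N = sum(len(s) for s in samples)
grand_mean = sum(sum(s) for s in samples) / N
sample_means = [sum(s) / len(s) for s in samples]

# Split each observation into grand mean + treatment deviation + residual.
treatment_dev = [[m - grand_mean] * len(s) for s, m in zip(samples, sample_means)]
residual = [[y - m for y in s] for s, m in zip(samples, sample_means)]

# The three pieces add back up to the original observation.
for s, t_row, r_row in zip(samples, treatment_dev, residual):
    for y, t, r in zip(s, t_row, r_row):
        assert abs(grand_mean + t + r - y) < 1e-12


def sum_of_squares(rows):
    """Sum of squared entries of a ragged array."""
    return sum(v * v for row in rows for v in row)


SST = sum((y - grand_mean) ** 2 for s in samples for y in s)
SSTr = sum_of_squares(treatment_dev)
SSE = sum_of_squares(residual)

print(grand_mean, sample_means)
print(SST, SSTr, SSE)
```

Summing the squared entries of the treatment-deviation and residual arrays separately shows directly that they add up to the total sum of squares.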
$SS(Tr) = 3(-2)^2 + 4(5)^2 + 3(2)^2 + 5(-4)^2 = 204$ with k-1 = 3 d. f.
SSE = SST - SS(Tr) = 34 with N-k = 11 d. f.

b) Construct an analysis of variance table and test the equality of treatments using $\alpha = .05$ :

SOURCE    DF    SS     MS        F
TREAT     3     204    68        22
ERROR     11    34     3.0909
TOTAL     14    238

By table 6(a) of appendix B, $F = 22 > F_{.05}(3, 11) = 3.59$ so we reject $H_0$ at level .05. (Said differently, F = 22 is significant at this level.)

The Model equation for one way classification is :
$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij} \quad\text{for } i = 1, 2, \dots, k;\ j = 1, 2, \dots, n_i$$
where the $\varepsilon_{ij}$ are independent normals with zero means and common variance $\sigma^2$. Here $\mu_i = \mu + \alpha_i$ gives the mean of the $i$ th population. The null hypothesis in this formulation says that, with $\alpha_i$ = the effect of the $i$ th treatment, all the effects are zero, or $H_0 : \alpha_1 = \alpha_2 = \dots = \alpha_k = 0$. Our best estimates of the parameters under a least squares criterion are $\hat{\mu} = \bar{y}$ = grand mean, $\hat{\alpha}_i = \bar{y}_i - \bar{y}$, $\hat{\mu}_i = \bar{y}_i$
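The least squares estimates and the resulting F statistic can be computed together. The following is a minimal Python sketch, applied here to the Example 1 test grades; the function name `fit_one_way` and its return layout are illustrative choices rather than anything defined in the notes, and the critical value is copied from the notes rather than computed.

```python
def fit_one_way(groups):
    """Least squares estimates and ANOVA quantities for a one way classification.

    `groups` is a list of samples (lists of numbers), one per treatment.
    """
    k = len(groups)
    N = sum(len(g) for g in groups)
    mu_hat = sum(sum(g) for g in groups) / N           # grand mean
    group_means = [sum(g) / len(g) for g in groups]    # estimates of the mu_i
    alpha_hat = [m - mu_hat for m in group_means]      # estimated treatment effects

    ss_tr = sum(len(g) * a * a for g, a in zip(groups, alpha_hat))
    sse = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    ms_tr = ss_tr / (k - 1)
    mse = sse / (N - k)
    return {
        "mu_hat": mu_hat, "alpha_hat": alpha_hat, "group_means": group_means,
        "df": (k - 1, N - k), "SSTr": ss_tr, "SSE": sse,
        "MSTr": ms_tr, "MSE": mse, "F": ms_tr / mse,
    }


# Example 1 test grades, one list per teaching method.
grades = [
    [6, 5, 6, 5, 6],   # programmed instruction
    [8, 7, 8, 8, 8],   # lecture
    [7, 9, 6, 8, 9],   # TV
    [5, 5, 6, 6, 5],   # group discussion
]
fit = fit_one_way(grades)
print(fit["mu_hat"], [round(a, 2) for a in fit["alpha_hat"]])
print(fit["df"], round(fit["SSTr"], 3), round(fit["SSE"], 3), round(fit["F"], 2))

# F_.05(3, 16) = 3.24 is the tabled value quoted in Example 1, not computed here.
print("reject H0" if fit["F"] > 3.24 else "do not reject H0")
```

On the Example 1 grades this reproduces the treatment and error sums of squares 26.55 and 10.0 and the statistic F = 14.16 from the completed table.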