QUEEN MARY, UNIVERSITY OF LONDON

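MTH6134 Statistical Modelling II
Solutions to Exercise Sheet 4
October 2017

1. Using $\bar{y}_{i.} = T_i/r_i$, $\bar{y}_{..} = G/n$, $\sum_{i=1}^{t} T_i = G$ and $\sum_{i=1}^{t} r_i = n$, we can write

\[
\begin{aligned}
\sum_{i=1}^{t} r_i (\bar{y}_{i.} - \bar{y}_{..})^2
&= \sum_{i=1}^{t} r_i \left( \bar{y}_{i.}^2 - 2\bar{y}_{i.}\bar{y}_{..} + \bar{y}_{..}^2 \right) \\
&= \sum_{i=1}^{t} r_i \bar{y}_{i.}^2 - 2\bar{y}_{..} \sum_{i=1}^{t} r_i \bar{y}_{i.} + \bar{y}_{..}^2 \sum_{i=1}^{t} r_i \\
&= \sum_{i=1}^{t} \frac{T_i^2}{r_i} - 2\frac{G}{n} \sum_{i=1}^{t} T_i + \frac{G^2}{n^2}\, n \\
&= \sum_{i=1}^{t} \frac{T_i^2}{r_i} - \frac{G^2}{n} = S_T .
\end{aligned}
\]

The form $S_T = \sum_{i=1}^{t} T_i^2/r_i - G^2/n$ of the treatment sum of squares is convenient for calculation by hand. However, the equivalent expression $S_T = \sum_{i=1}^{t} r_i (\bar{y}_{i.} - \bar{y}_{..})^2$ shows more clearly that $S_T$ is nonnegative.

We also have

\[
\begin{aligned}
\sum_{i=1}^{t} \sum_{j=1}^{r_i} (y_{ij} - \bar{y}_{i.})^2
&= \sum_{i=1}^{t} \sum_{j=1}^{r_i} \left( y_{ij}^2 - 2 y_{ij}\bar{y}_{i.} + \bar{y}_{i.}^2 \right) \\
&= \sum_{i=1}^{t} \sum_{j=1}^{r_i} y_{ij}^2 - 2 \sum_{i=1}^{t} \bar{y}_{i.} \sum_{j=1}^{r_i} y_{ij} + \sum_{i=1}^{t} r_i \bar{y}_{i.}^2 \\
&= \sum_{i=1}^{t} \sum_{j=1}^{r_i} y_{ij}^2 - 2 \sum_{i=1}^{t} \frac{T_i^2}{r_i} + \sum_{i=1}^{t} \frac{T_i^2}{r_i} \\
&= \sum_{i=1}^{t} \sum_{j=1}^{r_i} y_{ij}^2 - \sum_{i=1}^{t} \frac{T_i^2}{r_i} = S_E .
\end{aligned}
\]

In practice, to calculate the residual sum of squares for a given set of data, we would always find $S_G$ and $S_T$ first, and then compute $S_E$ as the difference $S_E = S_G - S_T$. The equivalent expression $S_E = \sum_{i=1}^{t} \sum_{j=1}^{r_i} (y_{ij} - \bar{y}_{i.})^2$ better shows that $S_E$ is nonnegative.

2. (a) An appropriate model for the data is the one-way ANOVA model for a completely randomised design. The treatments 1, 2 and 3 are Control, NT-H and NT-L, respectively. Each of these has replication $r_i = 5$ for $i = 1, 2, 3$.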

The corresponding model equation is

\[ Y_{ij} = \mu + \alpha_i + \epsilon_{ij} \quad \text{for } i = 1, 2, 3 \text{ and } j = 1, 2, \ldots, 5. \]

Here, $Y_{ij}$ is the weight increase for the $j$th rat under treatment $i$, $\mu$ represents the overall mean and $\alpha_i$ is the effect of treatment $i$. The $\epsilon_{ij}$ terms are random errors for which it is assumed that $\epsilon_{ij} \sim N(0, \sigma^2)$. Moreover, all of the $\epsilon_{ij}$ are assumed to be independent.

(b) The treatment totals for Control, NT-H and NT-L are, respectively, $T_1 = 578$, $T_2 = 361$ and $T_3 = 442$. The grand total is $G = 1{,}381$ and the sum of the squared responses is $\sum_{i=1}^{3} \sum_{j=1}^{5} y_{ij}^2 = 133{,}357$. The correction factor is

\[ G^2/n = 1{,}381^2/15 = 1{,}907{,}161/15 = 127{,}144.07. \]

From the totals given, the treatment sum of squares can be calculated as

\[ S_T = \tfrac{1}{5}\left(578^2 + 361^2 + 442^2\right) - 127{,}144.07 = \frac{659{,}769}{5} - 127{,}144.07 = 4{,}809.73. \]

Since $\sum_{i=1}^{3} \sum_{j=1}^{5} y_{ij}^2 = 133{,}357$, the total sum of squares is

\[ S_G = 133{,}357.00 - 127{,}144.07 = 6{,}212.93, \]

and so the residual sum of squares is

\[ S_E = 6{,}212.93 - 4{,}809.73 = 1{,}403.20. \]

The corresponding ANOVA table is then as follows:

Source        SS          df    MS          F
Treatments    4,809.73     2    2,404.87    20.57
Residual      1,403.20    12      116.93
Total         6,212.93    14

To test for differences between the treatments, we test $H_0: \alpha_1 = \alpha_2 = \alpha_3$ against the alternative $H_1$ that the effects of at least two of the treatments are different. For the test at the 5% level of significance, the observed value $F = 20.57$ is compared with the percentage point $F_{2,12,0.05} = 3.885$. Since $F > F_{2,12,0.05}$, we reject $H_0$. We can thus conclude that the three diets have different effects on the increase in body weight. Note that the test does not allow us to draw any more specific conclusions.

(c) The data points and means are included in the scatterplot. The means for the treatments are given below.

Treatment    Mean
Control      115.6
NT-H          72.2
NT-L          88.4
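The arithmetic in parts (b) and (c) can be checked with a short script. The following is only a sketch that works from the summary quantities quoted above (treatment totals, common replication and the sum of squared responses); the variable names are illustrative and not part of the exercise sheet.

```python
# Sketch: reproduce the one-way ANOVA quantities of Question 2(b) and the
# treatment means of 2(c) from the summary figures quoted in the solution.

totals = {"Control": 578, "NT-H": 361, "NT-L": 442}  # treatment totals T_i
r = 5                 # common replication r_i
sum_sq = 133_357      # sum of squared responses

t = len(totals)       # number of treatments
n = t * r             # total number of observations

G = sum(totals.values())           # grand total
correction = G**2 / n              # correction factor G^2 / n

S_T = sum(T**2 for T in totals.values()) / r - correction  # treatments SS
S_G = sum_sq - correction                                  # total SS
S_E = S_G - S_T                                            # residual SS

M_T = S_T / (t - 1)   # treatment mean square
M_E = S_E / (n - t)   # residual mean square
F = M_T / M_E         # observed F statistic

means = {name: T / r for name, T in totals.items()}

print(f"S_T = {S_T:.2f}, S_E = {S_E:.2f}, S_G = {S_G:.2f}")
print(f"M_T = {M_T:.2f}, M_E = {M_E:.2f}, F = {F:.2f}")
print(means)
```

Working from totals rather than raw data mirrors the hand calculation: $S_E$ is obtained as the difference $S_G - S_T$ rather than by summing squared residuals.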

[Figure: scatterplot of weight increase (vertical axis, Weight) against treatment (Control, NT-H, NT-L), showing the data points and the treatment means.]

The plot shows a clear distinction between results for Control and NT-H, and, less strongly, between Control and NT-L. The separation between results for treatments NT-H and NT-L is less clear. Regarding variability, there does not seem to be strong evidence for variability differing between treatments.

(d) Tables 12(a)-(f) of the New Cambridge Statistical Tables present per cent points of various F distributions. Under the null hypothesis, the test statistic of the F test in part (b) has an F distribution with $\nu_1 = 2$ and $\nu_2 = 12$ degrees of freedom. Looking at Table 12(a), we can see that the 10% point of the $F_{2,12}$ distribution is $F_{2,12,0.10} = 2.807$. The observed value of the test statistic in part (b) is $F = 20.57$. Since $F > F_{2,12,0.10}$, it follows that the p-value is smaller than 10%, and so 0.10 or 10% is an upper bound for the p-value.

One way to demonstrate that you have understood why this is so is to draw a picture which sketches the probability density function of the $F_{2,12}$ distribution and shows

- the value $F_{2,12,0.10} = 2.807$ and the corresponding area under the curve to its right, the size of which is 0.1 or 10% of the total area, and
- the value $F = 20.57$ of the test statistic and the corresponding area under the curve to its right. This area is equal to the p-value.

We want to find the smallest upper bound and so we compare $F = 20.57$ with the per cent points in the other tables in a similar way. In all cases, $F = 20.57$ is greater than the per cent point of the $F_{2,12}$ distribution. In particular, from Table 12(f), which gives 0.1% points, we see that $F > F_{2,12,0.001}$, where $F_{2,12,0.001} = 12.97$. Thus, 0.001 or 0.1% is an upper bound for the p-value of the test in part (b), and it is in fact the smallest one that we can find by using Tables 12(a)-(f).

3. The test statistic for the two-sample t test, assuming equal variances in the two populations from which the samples are drawn, can be written as

\[
t = \frac{\bar{y}_{1.} - \bar{y}_{2.}}
         {\sqrt{\dfrac{r_1 + r_2}{r_1 r_2} \cdot \dfrac{(r_1 - 1)s_1^2 + (r_2 - 1)s_2^2}{r_1 + r_2 - 2}}},
\]

where $\bar{y}_{i.} = T_i/r_i$ is the sample mean and $s_i^2 = \sum_{j=1}^{r_i} (y_{ij} - \bar{y}_{i.})^2/(r_i - 1)$ is the sample variance for the $i$th sample.
This form of the test statistic may appear to look different from the form of the statistic you met previously, but this is only due to the different notation.

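We want to show that $t^2$ is equal to $F = M_T/M_E$. Each of the two samples may be regarded as corresponding to one out of $t = 2$ treatments. We start by showing that, if there are only $t = 2$ treatments, then

\[ M_E = \frac{(r_1 - 1)s_1^2 + (r_2 - 1)s_2^2}{r_1 + r_2 - 2}. \tag{1} \]

Subsequently, we show that

\[ M_T = (\bar{y}_{1.} - \bar{y}_{2.})^2 \, \frac{r_1 r_2}{r_1 + r_2}. \tag{2} \]

From (1) and (2), it is then obvious that $t^2 = F$.

Question 1 shows that $S_E = \sum_{i=1}^{t} \sum_{j=1}^{r_i} (y_{ij} - \bar{y}_{i.})^2$. For $t = 2$ treatments, we have $n = r_1 + r_2$, and so $n - t = r_1 + r_2 - 2$. Hence, we have

\[
\begin{aligned}
M_E = \frac{S_E}{n - t}
&= \frac{1}{r_1 + r_2 - 2} \left\{ \sum_{j=1}^{r_1} (y_{1j} - \bar{y}_{1.})^2 + \sum_{j=1}^{r_2} (y_{2j} - \bar{y}_{2.})^2 \right\} \\
&= \frac{1}{r_1 + r_2 - 2} \left\{ (r_1 - 1)\,\frac{\sum_{j=1}^{r_1} (y_{1j} - \bar{y}_{1.})^2}{r_1 - 1} + (r_2 - 1)\,\frac{\sum_{j=1}^{r_2} (y_{2j} - \bar{y}_{2.})^2}{r_2 - 1} \right\} \\
&= \frac{(r_1 - 1)s_1^2 + (r_2 - 1)s_2^2}{r_1 + r_2 - 2},
\end{aligned}
\]

which is (1).

For $t = 2$ treatments, we have $t - 1 = 1$. The mean square for treatments is therefore equal to

\[ M_T = \frac{S_T}{t - 1} = S_T = \frac{T_1^2}{r_1} + \frac{T_2^2}{r_2} - \frac{G^2}{n}. \]

Finally, the following calculation shows that (2) holds as desired:

\[
\begin{aligned}
(\bar{y}_{1.} - \bar{y}_{2.})^2 \, \frac{r_1 r_2}{r_1 + r_2}
&= \left( \frac{T_1}{r_1} - \frac{T_2}{r_2} \right)^2 \frac{r_1 r_2}{r_1 + r_2} \\
&= \left( \frac{T_1^2}{r_1^2} - \frac{2 T_1 T_2}{r_1 r_2} + \frac{T_2^2}{r_2^2} \right) \frac{r_1 r_2}{r_1 + r_2} \\
&= \frac{1}{r_1 + r_2} \left( \frac{r_2}{r_1} T_1^2 - 2 T_1 T_2 + \frac{r_1}{r_2} T_2^2 \right) \\
&= \frac{1}{r_1 + r_2} \left( \frac{r_1 + r_2}{r_1} T_1^2 - T_1^2 - 2 T_1 T_2 + \frac{r_1 + r_2}{r_2} T_2^2 - T_2^2 \right) \\
&= \frac{T_1^2}{r_1} + \frac{T_2^2}{r_2} - \frac{(T_1 + T_2)^2}{r_1 + r_2} \\
&= \frac{T_1^2}{r_1} + \frac{T_2^2}{r_2} - \frac{G^2}{n} = M_T .
\end{aligned}
\]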
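The identity $t^2 = F$ can also be checked numerically. The sketch below uses two small made-up samples (hypothetical data, not taken from the exercise sheet) and computes the pooled two-sample t statistic and the one-way ANOVA F statistic separately; the helper names are illustrative.

```python
from math import sqrt

# Hypothetical data for illustration only; unequal sample sizes on purpose.
sample1 = [12.1, 14.3, 11.8, 13.5, 12.9]
sample2 = [15.2, 16.1, 14.8, 17.0]


def mean(xs):
    return sum(xs) / len(xs)


def sample_var(xs):
    """Sample variance with divisor r - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


r1, r2 = len(sample1), len(sample2)

# Two-sample t statistic with pooled variance (equal-variance assumption).
pooled_var = ((r1 - 1) * sample_var(sample1)
              + (r2 - 1) * sample_var(sample2)) / (r1 + r2 - 2)
t_stat = (mean(sample1) - mean(sample2)) / sqrt(
    (r1 + r2) / (r1 * r2) * pooled_var
)

# One-way ANOVA with t = 2 treatments, computed from totals.
T1, T2 = sum(sample1), sum(sample2)
G, n = T1 + T2, r1 + r2
S_T = T1**2 / r1 + T2**2 / r2 - G**2 / n
S_G = sum(y**2 for y in sample1 + sample2) - G**2 / n
S_E = S_G - S_T
M_T = S_T / (2 - 1)
M_E = S_E / (n - 2)
F = M_T / M_E

print(f"t^2 = {t_stat**2:.6f}, F = {F:.6f}")
print(f"pooled variance = {pooled_var:.6f}, M_E = {M_E:.6f}")
```

The residual mean square coincides with the pooled variance, which is equation (1), and the two printed statistics agree up to floating-point rounding.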

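4. (Feedback component)

(a) The model for the score $Y_{ij}$ of student $j$ given treatment $i$ is

\[ Y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \]

where $\mu$ and $\alpha_1, \ldots, \alpha_t$ are the model parameters representing the overall mean and the individual effect of each treatment; these parameters are constants with unknown value. The errors $\epsilon_{ij}$ are independent and identically distributed as $N(0, \sigma^2)$. There are four treatments, so $t = 4$, and thus $i \in \{1, 2, \ldots, t\}$. For each value of $i$, the student number is indexed by $j$, which ranges between 1 and $r_i$. The replication pattern for this problem is $r_1 = 16$, $r_2 = 14$, $r_3 = 13$ and $r_4 = 13$, so that the number of observations is $n = r_1 + r_2 + r_3 + r_4 = 56$.

(b) This part is solved by identifying information available in the question, and from it building the required statistics for the ANOVA table. The following steps are needed:

i. The means per method $\bar{y}_{i.}$ relate directly to the treatment totals via the formula $\bar{y}_{i.} = T_i/r_i$. Starting from the values $\bar{y}_{1.} = 74.4375$, $\bar{y}_{2.} = 70.7143$, $\bar{y}_{3.} = 70.9231$ and $\bar{y}_{4.} = 77.9231$, the totals per treatment are $T_1 = \bar{y}_{1.} r_1 = 74.4375 \times 16 = 1{,}191$, $T_2 = \bar{y}_{2.} r_2 = 70.7143 \times 14 = 990$, $T_3 = \bar{y}_{3.} r_3 = 70.9231 \times 13 = 922$ and $T_4 = \bar{y}_{4.} r_4 = 77.9231 \times 13 = 1{,}013$, each rounded to the nearest integer.

ii. The grand total is $G = T_1 + T_2 + T_3 + T_4 = 4{,}116$.

iii. The correction factor is $G^2/n = 4{,}116^2/56 = 16{,}941{,}456/56 = 302{,}526$.

iv. The treatment sum of squares is computed as

\[
\begin{aligned}
S_T &= \frac{T_1^2}{r_1} + \frac{T_2^2}{r_2} + \frac{T_3^2}{r_3} + \frac{T_4^2}{r_4} - \frac{G^2}{n}
     = \frac{1{,}191^2}{16} + \frac{990^2}{14} + \frac{922^2}{13} + \frac{1{,}013^2}{13} - 302{,}526 \\
    &= 88{,}655.0625 + 70{,}007.1428 + 65{,}391.0769 + 78{,}936.0769 - 302{,}526 = 463.4768.
\end{aligned}
\]

Strictly, the four terms displayed sum to 302,989.3591, which gives $S_T = 463.3591$; the value 463.4768 is what results when each term is evaluated as $r_i \bar{y}_{i.}^2$ from the means rounded to four decimal places. The difference is immaterial for the test, and 463.4768 is the value carried forward below.

v. For the total sum of squares, we use

\[ S_G = \sum_{i=1}^{4} \sum_{j=1}^{r_i} y_{ij}^2 - \frac{G^2}{n} = 311{,}388 - 302{,}526 = 8{,}862. \]

vi. The residual sum of squares is obtained as the difference $S_E = S_G - S_T = 8{,}862 - 463.4768 = 8{,}398.5232$.

We are now able to complete the analysis of variance table:

Source      SS            df    MS                        F
Methods       463.4768     3    M_T = S_T/3  = 154.49     M_T/M_E = 0.9565
Residual    8,398.5232    52    M_E = S_E/52 = 161.51
Total       8,862.0000    55

The hypothesis to be tested is $H_0: \alpha_1 = \alpha_2 = \alpha_3 = \alpha_4$ against the alternative that there exist at least two treatment effects $\alpha_i$ and $\alpha_j$, $i \neq j$, such that $\alpha_i \neq \alpha_j$.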
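The steps i.-vi. above can be sketched in code. The script below recovers the integer totals from the rounded means and then evaluates $S_T$ in two ways: from the totals, and directly from the rounded means as $\sum_i r_i \bar{y}_{i.}^2 - G^2/n$. It is an illustration only; the function name `anova_from_st` is hypothetical. The two routes differ slightly in $S_T$ (about 463.36 versus 463.48), which is purely a rounding effect and leaves the conclusion of the test unchanged.

```python
# Sketch: Question 4(b) from the summary statistics quoted in the solution.

reps = [16, 14, 13, 13]                                  # replications r_i
means_rounded = [74.4375, 70.7143, 70.9231, 77.9231]     # means, 4 d.p.
sum_sq = 311_388                                         # sum of squared scores

n = sum(reps)
t = len(reps)

# Step i: totals are integers, so rounding recovers them from the means.
totals = [round(m * r) for m, r in zip(means_rounded, reps)]

G = sum(totals)              # step ii
correction = G**2 / n        # step iii
S_G = sum_sq - correction    # step v


def anova_from_st(S_T):
    """Return (S_E, M_T, M_E, F) for a given treatment sum of squares."""
    S_E = S_G - S_T
    M_T = S_T / (t - 1)
    M_E = S_E / (n - t)
    return S_E, M_T, M_E, M_T / M_E


# Step iv, two routes.
S_T_from_totals = sum(T**2 / r for T, r in zip(totals, reps)) - correction
S_T_from_means = sum(r * m**2 for m, r in zip(means_rounded, reps)) - correction

F_from_totals = anova_from_st(S_T_from_totals)[3]
F_from_means = anova_from_st(S_T_from_means)[3]

print(f"totals = {totals}, G = {G}, correction = {correction:.0f}")
print(f"S_T from totals: {S_T_from_totals:.4f}, F = {F_from_totals:.4f}")
print(f"S_T from rounded means: {S_T_from_means:.4f}, F = {F_from_means:.4f}")
```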

Under $H_0$, the test statistic follows an $F_{3,52}$ distribution. An approximate critical value at the 5% significance level is 2.7904. This value is obtained by linear interpolation between the tabulated values $F_{3,40,0.05} = 2.839$ and $F_{3,60,0.05} = 2.758$, so that

\[
F_{3,52,0.05} \approx F_{3,40,0.05} + \frac{52 - 40}{60 - 40}\left(F_{3,60,0.05} - F_{3,40,0.05}\right)
= 2.839 - \frac{12}{20} \times 0.081 = 2.7904.
\]

As $F = 0.9565 < 2.7904 = F_{3,52,0.05}$, we do not reject $H_0$ and conclude that there is no significant difference in the average score between the methods of teaching.

The final part of the question involves the computation of $\sum_{i=1}^{4} \sum_{j=1}^{r_i} y_{ij}^2$ in step v. above, but only using information from the table. For each value of $i$, we compute the corresponding inner sum. This computation starts by noting that the second column of data given are standard errors of means. Each of the standard errors is $\text{s.e.(mean)}_i = \text{s.e.}_i/\sqrt{r_i}$, where

\[ \text{s.e.}_i = \sqrt{\frac{\sum_{j=1}^{r_i} y_{ij}^2 - r_i \bar{y}_{i.}^2}{r_i - 1}}. \]

Inverting the above relation gives the required inner sum:

\[ \sum_{j=1}^{r_i} y_{ij}^2 = \left\{ \text{s.e.(mean)}_i \sqrt{r_i} \right\}^2 (r_i - 1) + r_i \bar{y}_{i.}^2, \]

which is easy to evaluate for each value of $i$. The addition of the four inner sums is

\[
\sum_{i=1}^{4} \sum_{j=1}^{r_i} y_{ij}^2 = 90{,}558.9038 + 72{,}971.9699 + 67{,}309.9899 + 80{,}546.9702 \approx 311{,}388,
\]

which, up to rounding error in the tabulated means and standard errors, is the same value as given in the question.
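Both calculations in this last part are short enough to script. The sketch below interpolates the 5% critical value and inverts the standard-error relation to recover the inner sums. Note the assumption: the standard errors of the means are not reproduced in this solution, so the values in `se_means` were back-calculated from the inner sums quoted above and should be checked against the table in the question. The function names are illustrative.

```python
# Sketch: critical-value interpolation and recovery of the sum of squares.


def interpolate(x, x0, y0, x1, y1):
    """Linear interpolation of y at x between the points (x0, y0), (x1, y1)."""
    return y0 + (x - x0) / (x1 - x0) * (y1 - y0)


# Tabulated 5% points of F(3, 40) and F(3, 60), as quoted in the solution.
crit = interpolate(52, 40, 2.839, 60, 2.758)

reps = [16, 14, 13, 13]
means = [74.4375, 70.7143, 70.9231, 77.9231]

# ASSUMPTION: standard errors of the means, back-calculated from the inner
# sums quoted in the solution; they are not taken from the question itself.
se_means = [2.8165, 4.0361, 3.5072, 3.2134]


def inner_sum(se_mean, r, ybar):
    """Sum of squared responses for one treatment, from s.e.(mean), r, mean."""
    variance = se_mean**2 * r          # s.e._i^2 = (s.e.(mean)_i * sqrt(r))^2
    return variance * (r - 1) + r * ybar**2


inner_sums = [inner_sum(se, r, m) for se, r, m in zip(se_means, reps, means)]
total_sum_sq = sum(inner_sums)

print(f"interpolated critical value: {crit:.4f}")
print([round(s, 4) for s in inner_sums])
print(f"total sum of squares: {total_sum_sq:.4f}")
```

Linear interpolation in the denominator degrees of freedom is the approach used in the solution; interpolating in the reciprocal of the degrees of freedom is a common alternative and would give a slightly different value.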