Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Analysis of Variance and Design of Experiment-I
MODULE IX
LECTURE - 38
EXERCISES

Example 1 (Completely randomized design)

Suppose there are four types of medicines which claim to control the body temperature of patients having fever. Let us denote them as $M_1$, $M_2$, $M_3$ and $M_4$. Suppose there are 20 patients who are suffering from fever and have agreed to use the medicines. Our objective is to know whether there is any difference in the effects of the medicines. Note that the efficiency of a medicine also depends on the age of the person, but at present we ignore this aspect; we will consider it later in the randomized block design. At present, we consider the simplest set up, the completely randomized design.

Suppose it is decided to give medicine $M_1$ to 4 patients, $M_2$ to 5 patients, $M_3$ to 6 patients and $M_4$ to the remaining 5 patients. Note that the number of patients to be given a specific medicine is decided at random. Moreover, which patient is given which medicine is also decided randomly, so that other factors, e.g., age, do not affect the final conclusions. The medicines are administered and the observations on the number of hours of temperature control are recorded as follows:

Medicine   Number of hours
M1         6, 8, 4, 9
M2         7, 9, 6, 5, 4
M3         8, 10, 9, 12, 13, 5
M4         7, 5, 6, 6, 4

In the one way model
\[ y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \]
$\tau_i$ denotes the effect of the $i$-th medicine and there are 4 types of medicine, so $i = 1, 2, 3, 4$; $y_{ij}$ denotes the number of hours of temperature control for the $j$-th patient who is given the $i$-th medicine. So in our set up:

M1: $y_{11} = 6,\ y_{12} = 8,\ y_{13} = 4,\ y_{14} = 9$
M2: $y_{21} = 7,\ y_{22} = 9,\ y_{23} = 6,\ y_{24} = 5,\ y_{25} = 4$
M3: $y_{31} = 8,\ y_{32} = 10,\ y_{33} = 9,\ y_{34} = 12,\ y_{35} = 13,\ y_{36} = 5$
M4: $y_{41} = 7,\ y_{42} = 5,\ y_{43} = 6,\ y_{44} = 6,\ y_{45} = 4$

The null and alternative hypotheses under consideration are
\[ H_0: \tau_1 = \tau_2 = \tau_3 = \tau_4 \]
\[ H_1: \text{the effect of at least one pair of medicines is not the same.} \]

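We now test this null hypothesis using the set up of one way analysis of variance. First we explain how to compute all the related terms:

\[ n_1 = 4, \quad n_2 = 5, \quad n_3 = 6, \quad n_4 = 5, \qquad n = \sum_{i=1}^{4} n_i = 20 \]
\[ \bar{y}_{1o} = \frac{1}{n_1} \sum_{j=1}^{n_1} y_{1j} = \frac{6 + 8 + 4 + 9}{4} = 6.75 \]
\[ \bar{y}_{2o} = \frac{1}{n_2} \sum_{j=1}^{n_2} y_{2j} = \frac{7 + 9 + 6 + 5 + 4}{5} = 6.2 \]
\[ \bar{y}_{3o} = \frac{1}{n_3} \sum_{j=1}^{n_3} y_{3j} = \frac{8 + 10 + 9 + 12 + 13 + 5}{6} = 9.5 \]
\[ \bar{y}_{4o} = \frac{1}{n_4} \sum_{j=1}^{n_4} y_{4j} = \frac{7 + 5 + 6 + 6 + 4}{5} = 5.6 \]

G = sum of all the observations = 143
\[ \bar{y}_{oo} = \frac{1}{n} \sum_{i=1}^{4} \sum_{j=1}^{n_i} y_{ij} = \frac{G}{n} = \frac{143}{20} = 7.15. \]

The estimates of treatment effects are
\[ \hat{\tau}_1 = \bar{y}_{1o} - \bar{y}_{oo} = -0.40 \]
\[ \hat{\tau}_2 = \bar{y}_{2o} - \bar{y}_{oo} = -0.95 \]
\[ \hat{\tau}_3 = \bar{y}_{3o} - \bar{y}_{oo} = 2.35 \]
\[ \hat{\tau}_4 = \bar{y}_{4o} - \bar{y}_{oo} = -1.55 \]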

The total sum of squares is
\[ TSS = \sum_{i=1}^{4} \sum_{j=1}^{n_i} y_{ij}^2 - \frac{G^2}{n} = (\text{sum of squares of all the observations}) - \frac{G^2}{n} = 1149 - \frac{143^2}{20} = 1149 - 1022.45 = 126.55. \]

The treatment totals are the totals of the observations obtained by giving a specific treatment. The treatment totals due to treatments $\tau_1$, $\tau_2$, $\tau_3$ and $\tau_4$ are denoted as $T_1$, $T_2$, $T_3$ and $T_4$ respectively and are obtained as
\[ T_1 = \sum_{j=1}^{n_1} y_{1j} = 6 + 8 + 4 + 9 = 27 \]
\[ T_2 = \sum_{j=1}^{n_2} y_{2j} = 7 + 9 + 6 + 5 + 4 = 31 \]
\[ T_3 = \sum_{j=1}^{n_3} y_{3j} = 8 + 10 + 9 + 12 + 13 + 5 = 57 \]
\[ T_4 = \sum_{j=1}^{n_4} y_{4j} = 7 + 5 + 6 + 6 + 4 = 28. \]

The sum of squares due to treatment is
\[ SSTr = \sum_{i=1}^{4} \frac{T_i^2}{n_i} - \frac{G^2}{n} = \frac{27^2}{4} + \frac{31^2}{5} + \frac{57^2}{6} + \frac{28^2}{5} - \frac{143^2}{20} = 1072.75 - 1022.45 = 50.30. \]

The sum of squares due to error is
\[ SSE = TSS - SSTr = 126.55 - 50.30 = 76.25. \]

The mean square due to treatment is
\[ MSTr = \frac{SSTr}{4 - 1} = \frac{50.30}{3} = 16.77. \]

The mean square due to error is
\[ MSE = \frac{SSE}{20 - 4} = \frac{76.25}{16} = 4.77. \]

The value of the F-statistic is
\[ F = \frac{MSTr}{MSE} = 3.52. \]

The tabulated value of F at 3 and 16 degrees of freedom at 5% level of significance is $F_{tab}(3, 16) = 3.24$.

The analysis of variance table is now constructed as

Source of variation      Degrees of freedom   Sum of squares   Mean squares   F-value
Medicines (treatments)   4 - 1 = 3            50.30            16.77          3.52
Error                    20 - 4 = 16          76.25            4.77
Total                    20 - 1 = 19          126.55

Since $F > F_{tab}$, $H_0$ is rejected at 5% level of significance. This means that, on the basis of the given sample, it can be concluded that the treatment effects are not the same, i.e., the four medicines do not all have the same effect on the patients.

This conclusion poses the next question. When the null hypothesis is rejected, which of the treatment effects are responsible for the rejection, in the sense of whether all the treatment effects are different from each other or some of the treatments have the same effect? To know the answer, we go for a multiple comparison test. Various available multiple comparison tests can be used. It is not necessary that the conclusions based on different tests are always the same.
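The arithmetic of this example can be checked numerically. The following is a minimal Python sketch and not part of the original lecture; the function name `one_way_anova` and the dictionary keys are illustrative choices, and the observations are those recorded for the four medicines in Example 1. It uses the correction factor $G^2/n$ and allows unequal numbers of patients per medicine.

```python
# One-way ANOVA for a completely randomized design (illustrative sketch).
# Observations: hours of temperature control under medicines M1..M4.
groups = {
    "M1": [6, 8, 4, 9],
    "M2": [7, 9, 6, 5, 4],
    "M3": [8, 10, 9, 12, 13, 5],
    "M4": [7, 5, 6, 6, 4],
}


def one_way_anova(groups):
    """Return sums of squares, mean squares and F for unequal group sizes."""
    all_obs = [y for obs in groups.values() for y in obs]
    n = len(all_obs)                       # total number of observations
    grand_total = sum(all_obs)             # G
    cf = grand_total ** 2 / n              # correction factor G^2 / n
    tss = sum(y ** 2 for y in all_obs) - cf
    # Treatment sum of squares: sum of T_i^2 / n_i minus the correction factor.
    sstr = sum(sum(obs) ** 2 / len(obs) for obs in groups.values()) - cf
    sse = tss - sstr
    df_tr = len(groups) - 1
    df_e = n - len(groups)
    mstr = sstr / df_tr
    mse = sse / df_e
    return {
        "TSS": tss, "SSTr": sstr, "SSE": sse,
        "MSTr": mstr, "MSE": mse, "F": mstr / mse,
        "df": (df_tr, df_e),
    }


result = one_way_anova(groups)
for name, value in result.items():
    print(name, value)
```

The computed F is then compared with the tabulated value at the degrees of freedom returned under `"df"`; the critical value itself is taken from an F table rather than computed here.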

Example 2 (Randomized block design)

We continue here with the same set up as Example 1 but now conduct the experiment in the set up of a randomized block design. Suppose there are four medicines denoted as $M_1$, $M_2$, $M_3$ and $M_4$ which are to be tested over 20 patients to know their effect in controlling the body temperature of patients having fever. The effect of a medicine depends not only on its chemical composition but also on other factors. We consider here another important factor, the age of the patients. Based on age, we divide the patients into five age groups, the oldest being 50-65 years, which provide the five blocks denoted as $B_1$, $B_2$, $B_3$, $B_4$ and $B_5$ respectively. In each age group (or block) there are 4 patients, which is the same as the number of medicines. Now the four medicines are given to the four patients at random in each block and the readings on the number of hours of fever control are recorded. The observations are compiled in the following table, which is in the set up of an RBD.

             Block 1   Block 2   Block 3   Block 4   Block 5
Medicine 1   5         7         8         6         7
Medicine 2   4         6         7         5         5
Medicine 3   6         8         8         4         5
Medicine 4   4         7         9         7         6

One can observe here that the observations depend not only on the type of medicine given but also on how the patients are allocated to different blocks or, in simple words, how the blocks are constructed. The way in which the blocks are constructed introduces the block effect. Ideally, it is expected that the construction of blocks does not affect the efficiency of the medicines. So, besides the equality of treatment effects, it is also important to check whether the block effects of all the blocks are the same or not.

The computations involved are as follows:

               Block 1    Block 2    Block 3    Block 4    Block 5    Treatment totals   Means
M1             5          7          8          6          7          T_1 = 33           \bar{y}_{o1} = 6.6
M2             4          6          7          5          5          T_2 = 27           \bar{y}_{o2} = 5.4
M3             6          8          8          4          5          T_3 = 31           \bar{y}_{o3} = 6.2
M4             4          7          9          7          6          T_4 = 33           \bar{y}_{o4} = 6.6
Block totals   B_1 = 19   B_2 = 28   B_3 = 32   B_4 = 22   B_5 = 23   G = 124

The model is
\[ y_{ij} = \mu + \beta_i + \tau_j + \varepsilon_{ij} \]
where
* $y_{ij}$ is the number of hours of fever control when the $j$-th medicine is given to a patient in the $i$-th block.
* $\mu$ is the general mean effect.
* $\beta_i$ is the $i$-th block effect (effect of age), $i = 1, 2, 3, 4, 5$.
* $\tau_j$ is the $j$-th treatment effect (effect of medicine), $j = 1, 2, 3, 4$, as there are four medicines.
* $n = 20$.

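\[ \hat{\mu} = \bar{y}_{oo} = \frac{1}{20} \sum_{i=1}^{5} \sum_{j=1}^{4} y_{ij} = \frac{124}{20} = 6.2 \]

\[ \hat{\beta}_i = \frac{1}{4} \sum_{j=1}^{4} y_{ij} - \bar{y}_{oo} = \bar{y}_{io} - \bar{y}_{oo} \]
\[ \hat{\beta}_1 = 4.75 - 6.2 = -1.45 \]
\[ \hat{\beta}_2 = 7 - 6.2 = 0.8 \]
\[ \hat{\beta}_3 = 8 - 6.2 = 1.8 \]
\[ \hat{\beta}_4 = 5.5 - 6.2 = -0.7 \]
\[ \hat{\beta}_5 = 5.75 - 6.2 = -0.45 \]

\[ \hat{\tau}_j = \frac{1}{5} \sum_{i=1}^{5} y_{ij} - \bar{y}_{oo} = \bar{y}_{oj} - \bar{y}_{oo} \]
\[ \hat{\tau}_1 = 6.6 - 6.2 = 0.4 \]
\[ \hat{\tau}_2 = 5.4 - 6.2 = -0.8 \]
\[ \hat{\tau}_3 = 6.2 - 6.2 = 0 \]
\[ \hat{\tau}_4 = 6.6 - 6.2 = 0.4 \]

\[ \text{Correction factor} = \frac{G^2}{n} = \frac{124^2}{20} = 768.8 \]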

The total sum of squares is
\[ TSS = \sum_{i=1}^{5} \sum_{j=1}^{4} y_{ij}^2 - \frac{G^2}{n} = (\text{sum of squares of all the observations}) - \frac{G^2}{n} = 810 - 768.8 = 41.2. \]

The sum of squares due to blocks is
\[ SSBl = \sum_{i=1}^{5} \frac{B_i^2}{4} - \frac{G^2}{n} = 795.5 - 768.8 = 26.7. \]

The sum of squares due to treatments is
\[ SSTr = \sum_{j=1}^{4} \frac{T_j^2}{5} - \frac{G^2}{n} = 773.6 - 768.8 = 4.8. \]

The sum of squares due to error is
\[ SSE = TSS - SSBl - SSTr = 41.2 - 26.7 - 4.8 = 9.7. \]

The mean square due to blocks is
\[ MSBl = \frac{SSBl}{5 - 1} = \frac{26.7}{4} = 6.675. \]

The mean square due to treatment is
\[ MSTr = \frac{SSTr}{4 - 1} = \frac{4.8}{3} = 1.6. \]

The mean square due to error is
\[ MSE = \frac{SSE}{(5 - 1)(4 - 1)} = \frac{9.7}{12} = 0.808. \]

The F-statistic for testing $H_{0b}: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5$ is
\[ F_{bl} = \frac{MSBl}{MSE} = 8.26. \]
The critical value of F at 5% level of significance with 4 and 12 degrees of freedom is $F_{tab}(4, 12) = 3.26$. Based on the given set of data, the null hypothesis $H_{0b}$ is rejected as $F_{bl} > F_{tab}(4, 12)$.

The F-statistic for testing $H_{0t}: \tau_1 = \tau_2 = \tau_3 = \tau_4$ is
\[ F_{tr} = \frac{MSTr}{MSE} = 1.98. \]
The critical value of F at 5% level of significance with 3 and 12 degrees of freedom is $F_{tab}(3, 12) = 3.49$.

Based on the given set of data, the null hypothesis $H_{0t}$ is accepted as $F_{tr} < F_{tab}(3, 12)$. The analysis of variance table is compiled as follows:

Source of variation   Degrees of freedom   Sum of squares   Mean squares   F-value
Blocks                4                    26.7             6.675          8.26
Medicines             3                    4.8              1.6            1.98
Error                 12                   9.7              0.808
Total                 19                   41.2

Looking at the conclusions drawn from this arrangement of RBD, we observe the following on the basis of the given set of data. The null hypothesis corresponding to the medicines, i.e., the treatment effects, is accepted. This means that the data give no evidence of a difference among the medicines, i.e., the medicines can be treated as equally effective. The null hypothesis about the block effects is rejected, i.e., the effect due to the age of the patients is not the same in all the blocks.

Now note that in Example 1, the age effect was not considered and the conclusion was that the medicines do not all have the same effect. This conclusion is reversed in this example, when the age effect is incorporated. So the blocking factor plays an important role.
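The RBD computations can be reproduced with a few lines of code. The sketch below is an illustration only and not part of the original lecture; the function name `rbd_anova`, the variable `hours` and the dictionary keys are assumptions made for this example. Rows of `hours` are the four medicines and columns are the five blocks, as in the data table of Example 2.

```python
# ANOVA for a randomized block design (illustrative sketch).
# Rows: medicines M1..M4 (treatments); columns: blocks B1..B5 (age groups).
hours = [
    [5, 7, 8, 6, 7],
    [4, 6, 7, 5, 5],
    [6, 8, 8, 4, 5],
    [4, 7, 9, 7, 6],
]


def rbd_anova(y):
    """Return sums of squares, mean squares and F values for an RBD."""
    v = len(y)                 # number of treatments
    b = len(y[0])              # number of blocks
    n = v * b
    grand_total = sum(sum(row) for row in y)
    cf = grand_total ** 2 / n                      # correction factor G^2 / n
    tss = sum(x ** 2 for row in y for x in row) - cf
    treatment_totals = [sum(row) for row in y]
    block_totals = [sum(row[j] for row in y) for j in range(b)]
    # Each block total is a sum of v observations, each treatment total of b.
    ssbl = sum(t ** 2 for t in block_totals) / v - cf
    sstr = sum(t ** 2 for t in treatment_totals) / b - cf
    sse = tss - ssbl - sstr
    msbl = ssbl / (b - 1)
    mstr = sstr / (v - 1)
    mse = sse / ((b - 1) * (v - 1))
    return {
        "TSS": tss, "SSBl": ssbl, "SSTr": sstr, "SSE": sse,
        "MSBl": msbl, "MSTr": mstr, "MSE": mse,
        "F_blocks": msbl / mse, "F_treatments": mstr / mse,
    }


rbd = rbd_anova(hours)
for name, value in rbd.items():
    print(name, round(value, 3))
```

The two F values are compared with the tabulated values at (4, 12) and (3, 12) degrees of freedom respectively.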

Example 3 (Latin square design)

The mileage of a car is the number of kilometers it runs with one liter of petrol. The mileage depends on the type of petrol, the type of car as well as the driving habits of the driver. Suppose there are five varieties of petrol denoted as A, B, C, D, E; five different drivers denoted as D1, D2, D3, D4, D5; and five cars denoted as car 1, car 2, car 3, car 4 and car 5. An experiment is conducted to know the effect of these factors, viz., petrol, driver and car, using the Latin square design. A Latin square of order $5 \times 5$ is chosen and its rows and columns are randomized. The resulting square is given as follows:

A  D  B  C  E
D  A  C  E  B
E  B  A  D  C
B  C  E  A  D
C  E  D  B  A

Based on this Latin square, the experiment is conducted and the following data on the number of kilometers run with one liter of petrol are obtained:

                     Drivers
Cars     D1      D2      D3      D4      D5
Car 1    A 27    D 19    B 12    C 23    E 21
Car 2    D 19    A 26    C 13    E 24    B 17
Car 3    E 21    B 24    A 22    D 17    C 15
Car 4    B 26    C 18    E 30    A 28    D 20
Car 5    C 18    E 22    D 19    B 21    A 21

The interpretation of these data is as follows. The first cell has value 27. This means that when car 1 was driven by driver D1 with one liter of petrol of type A, it ran 27 km. Similarly, the second entry in the first row tells that when car 1 was driven by driver D2 using one liter of petrol of type D, it ran 19 km.

Now we conduct the analysis of the data as follows. The corresponding model is
\[ y_{ijk} = \mu + \alpha_i + \beta_j + \tau_k + \varepsilon_{ijk}, \]
where
$\mu$ is the general mean effect,
$\alpha_i$ is the main effect of the $i$-th row, i.e., the main effect of the $i$-th car, $i = 1, 2, 3, 4, 5$,
$\beta_j$ is the main effect of the $j$-th column, i.e., the main effect of the $j$-th driver, $j = 1, 2, 3, 4, 5$, and
$\tau_k$ is the main effect of the $k$-th treatment, i.e., the main effect of the $k$-th variety of petrol, $k = 1, 2, 3, 4, 5$.

The null hypotheses are
\[ H_{0R}: \alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = \alpha_5, \]
\[ H_{0C}: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5, \]
\[ H_{0T}: \tau_1 = \tau_2 = \tau_3 = \tau_4 = \tau_5. \]

The observations are tabulated as follows:

                              Drivers
Cars            D1          D2          D3         D4          D5         Row totals
Car 1           27          19          12         23          21         R_1 = 102
Car 2           19          26          13         24          17         R_2 = 99
Car 3           21          24          22         17          15         R_3 = 99
Car 4           26          18          30         28          20         R_4 = 122
Car 5           18          22          19         21          21         R_5 = 101
Column totals   C_1 = 111   C_2 = 109   C_3 = 96   C_4 = 113   C_5 = 94   Grand total G = 523

The row totals and column totals are given in this table. The treatment totals are obtained as follows. The treatment total due to A is the sum of all the observations obtained by the use of petrol A. It is given as
\[ T_A = T_1 = 27 + 26 + 22 + 28 + 21 = 124. \]

Similarly, the treatment totals due to the use of petrols B, C, D and E are
\[ T_B = T_2 = 12 + 17 + 24 + 26 + 21 = 100 \]
\[ T_C = T_3 = 23 + 13 + 15 + 18 + 18 = 87 \]
\[ T_D = T_4 = 19 + 19 + 17 + 20 + 19 = 94 \]
\[ T_E = T_5 = 21 + 24 + 21 + 30 + 22 = 118 \]

Here $n = 25$ and
\[ \text{Correction factor (CF)} = \frac{G^2}{n} = \frac{523 \times 523}{25} = 10941.16. \]

SSR = Sum of squares due to rows (cars)
\[ SSR = \sum_{i=1}^{5} \frac{R_i^2}{5} - \frac{G^2}{n} = 11018.2 - 10941.16 = 77.04 \]

SSC = Sum of squares due to columns (drivers)
\[ SSC = \sum_{j=1}^{5} \frac{C_j^2}{5} - \frac{G^2}{n} = 11004.6 - 10941.16 = 63.44 \]

SSTr = Sum of squares due to treatments (petrol)
\[ SSTr = \sum_{k=1}^{5} \frac{T_k^2}{5} - \frac{G^2}{n} = 11141 - 10941.16 = 199.84 \]

TSS = Total sum of squares
\[ TSS = \sum_{i=1}^{5} \sum_{j=1}^{5} \sum_{k=1}^{5} y_{ijk}^2 - \frac{G^2}{n} = 11425 - 10941.16 = 483.84 \]
where the sum runs over the 25 cells actually present in the Latin square.

Sum of squares due to error
\[ SSE = TSS - SSR - SSC - SSTr = 483.84 - 77.04 - 63.44 - 199.84 = 143.52 \]

Mean square due to rows
\[ MSR = \frac{SSR}{5 - 1} = \frac{77.04}{4} = 19.26 \]

Mean square due to columns
\[ MSC = \frac{SSC}{5 - 1} = \frac{63.44}{4} = 15.86 \]

Mean square due to treatments
\[ MSTr = \frac{SSTr}{5 - 1} = \frac{199.84}{4} = 49.96 \]

Mean square due to error
\[ MSE = \frac{SSE}{(5 - 1)(5 - 2)} = \frac{143.52}{12} = 11.96 \]

F-statistic for rows
\[ F_r = \frac{MSR}{MSE} = 1.61 \]

F-statistic for columns
\[ F_c = \frac{MSC}{MSE} = 1.33 \]

F-statistic for treatments
\[ F_{tr} = \frac{MSTr}{MSE} = 4.18 \]

The tabulated value of F at 4 and 12 degrees of freedom at 5% level of significance is $F_{tab}(4, 12) = 3.26$. These values can be compiled in the following analysis of variance table:

Source of variation   Degrees of freedom   Sum of squares   Mean squares   F-value
Rows                  4                    77.04            19.26          1.61
Columns               4                    63.44            15.86          1.33
Treatments            4                    199.84           49.96          4.18
Error                 12                   143.52           11.96
Total                 24                   483.84

We now conclude on the basis of the given data and the values of the F-statistics.

Since $F_r < F_{tab}(4, 12)$, $H_{0R}: \alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = \alpha_5$ is accepted. This means that the effect of all the cars on the mileage is the same.

Since $F_c < F_{tab}(4, 12)$, $H_{0C}: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5$ is accepted. This means that the effect of all the drivers on the mileage is the same.

Since $F_{tr} > F_{tab}(4, 12)$, $H_{0T}: \tau_1 = \tau_2 = \tau_3 = \tau_4 = \tau_5$ is rejected. This means that the varieties of petrol do not all have the same effect on the mileage. So different petrols affect the mileage differently.
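The Latin square analysis can also be verified in code. The following Python sketch is an illustration and not part of the original lecture; the names `layout`, `km` and `latin_square_anova` are assumptions made here. `layout` holds the randomized Latin square of petrol varieties (rows are cars, columns are drivers) and `km` holds the corresponding mileages from Example 3.

```python
# ANOVA for a Latin square design (illustrative sketch).
# layout[i][j]: petrol used by car i+1 with driver j+1; km[i][j]: mileage.
layout = ["ADBCE", "DACEB", "EBADC", "BCEAD", "CEDBA"]
km = [
    [27, 19, 12, 23, 21],
    [19, 26, 13, 24, 17],
    [21, 24, 22, 17, 15],
    [26, 18, 30, 28, 20],
    [18, 22, 19, 21, 21],
]


def latin_square_anova(layout, y):
    """Return totals, sums of squares and F values for a p x p Latin square."""
    p = len(y)
    n = p * p
    grand_total = sum(sum(row) for row in y)
    cf = grand_total ** 2 / n                      # correction factor G^2 / n
    tss = sum(x ** 2 for row in y for x in row) - cf
    row_totals = [sum(row) for row in y]
    col_totals = [sum(y[i][j] for i in range(p)) for j in range(p)]
    trt_totals = {}
    for i in range(p):
        for j in range(p):
            letter = layout[i][j]
            trt_totals[letter] = trt_totals.get(letter, 0) + y[i][j]
    ssr = sum(t ** 2 for t in row_totals) / p - cf
    ssc = sum(t ** 2 for t in col_totals) / p - cf
    sstr = sum(t ** 2 for t in trt_totals.values()) / p - cf
    sse = tss - ssr - ssc - sstr
    mse = sse / ((p - 1) * (p - 2))                # error df = (p-1)(p-2)
    return {
        "treatment_totals": trt_totals,
        "TSS": tss, "SSR": ssr, "SSC": ssc, "SSTr": sstr, "SSE": sse,
        "MSE": mse,
        "F_rows": (ssr / (p - 1)) / mse,
        "F_columns": (ssc / (p - 1)) / mse,
        "F_treatments": (sstr / (p - 1)) / mse,
    }


lsd = latin_square_anova(layout, km)
for name, value in lsd.items():
    print(name, value)
```

Each of the three F values has 4 and 12 degrees of freedom, so all are compared with the same tabulated value.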

Example 4 (Factorial experiment)

Suppose the rotations per minute (rpm) of an electric motor depend on four factors: voltage (A), current (B), temperature (C) and length of blades (D). Each factor has two levels as follows:

Voltage: level = 0 volts ($a_0$), level = 60 volts ($a_1$)
Current: level = 7 ampere ($b_0$), level = ampere ($b_1$)
Temperature: level = 0 degree centigrade ($c_0$), level = 35 degree centigrade ($c_1$)
Length of blades: level = 0 cm ($d_0$), level = 5 cm ($d_1$)

The experiment is conducted using an RBD in the set up of a $2^4$ factorial with four replications for each treatment combination, and the rpm is observed as follows.

Treatment combinations: (1), a, b, ab, c, ac, bc, abc, d, ad, bd, abd, cd, acd, bcd, abcd
Observed rpm (in hundreds), Replications 1 to 4:
7 6 3 6 6 6 0 9 38 50 70 0 80 60 9 5 7 0 5 3 0 5 3 8 79 6 7 8 6 30 7 0 50 36 0 50 60 3 6 6 70 5 5 3 8 6 0 0 6 33 5 7 50 5 80

The total rpm for each treatment combination is obtained by summing all the observations corresponding to that treatment combination over the four replications. For example, the total rpm
-- due to a is (a) = 7 + 9 + 8 + 5
-- due to b is (b) = 6 + 5 + 6 + 3

The total rpm are compiled in the following table:

Treatment combination   Total rpm
(1)                     () = 5
a                       (a) = 89
b                       (b) = 0
ab                      (ab) = 73
c                       (c) =
ac                      (ac) = 87
bc                      (bc) = 9
abc                     (abc) = 73
d                       (d) = 9
ad                      (ad) = 3
bd                      (bd) = 90
abd                     (abd) = 6
cd                      (cd) = 9
acd                     (acd) = 7
bcd                     (bcd) = 7
abcd                    (abcd) = 7

Now various sums of squares can be computed using the Yates procedure as follows:

Yates procedure
Column (1): treatment totals; columns (2) to (5): Yates procedure calculations

5 3 37 697 37 = [M]
89 3 350 0 570 = [A]
0 8 69 37 635 = [B]
73 8 3 6 = [AB]
3 77 73 85 = [C]
87 06 70 6 86 = [AC]
9 66 65 67 = [BC]
73 55 58 -5 - = [ABC]
9 79 3 73 = [D]
3 33 9 8 - = [AD]
90 6 83 -7 89 = [BD]
6 79 93 -38 = [ABD]
9 39 89 5 79 = [CD]
7 6 78 96 00 = [ACD]
7 6 3 65 83 = [BCD]
7 -3 -6 -77 - = [ABCD]

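The sums of squares are obtained by using
\[ SS(\text{Effect}) = \frac{[\text{Effect}]^2}{2^4 r}, \]
where $r = 4$ is the number of replications. For example,
\[ SS(A) = \frac{[A]^2}{2^4 r}, \qquad SS(B) = \frac{[B]^2}{2^4 r}, \]
and so on. These sums of squares are compiled in the following analysis of variance table.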

Analysis of variance table for factorial experiment

The tabulated value of the F-statistic at 5% level of significance with 1 and 45 degrees of freedom is $F_{tab}(1, 45) = 4.06$.

The F values for the null hypotheses corresponding to A, B, D, BC, AC, AD, BD, CD, ABC, ABD and ABCD are greater than $F_{tab}(1, 45) = 4.06$, so these null hypotheses are rejected. So these effects are significant.

The F values for the null hypotheses corresponding to C, AB, ACD and BCD are smaller than $F_{tab}(1, 45) = 4.06$, so these null hypotheses are accepted. Thus these effects are not significant.
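The Yates procedure and the sum of squares formula used in this example can be sketched in code. The following Python sketch is an illustration and not part of the original lecture; the function names `yates`, `effect_labels` and `sums_of_squares` are assumptions made here, and the totals in `toy_totals` are hypothetical numbers for a small $2^2$ experiment with $r = 2$ replications, chosen only to show the mechanics. The treatment totals must be supplied in standard order: (1), a, b, ab, c, ac, and so on.

```python
# Yates procedure for a 2^k factorial experiment (illustrative sketch).


def yates(totals):
    """Apply k rounds of pairwise sums and differences to 2^k treatment totals.

    Returns the final column: the grand total [M] followed by the effect
    totals in standard order ([A], [B], [AB], [C], ...).
    """
    n = len(totals)
    k = n.bit_length() - 1
    if n < 2 or 2 ** k != n:
        raise ValueError("number of treatment totals must be a power of 2")
    col = list(totals)
    for _ in range(k):
        # Upper half: sums of consecutive pairs; lower half: second minus first.
        sums = [col[i] + col[i + 1] for i in range(0, n, 2)]
        diffs = [col[i + 1] - col[i] for i in range(0, n, 2)]
        col = sums + diffs
    return col


def effect_labels(factors):
    """Labels in standard order, e.g. ['M', 'A', 'B', 'AB'] for two factors."""
    labels = []
    for idx in range(2 ** len(factors)):
        name = "".join(f for bit, f in enumerate(factors) if (idx >> bit) & 1)
        labels.append(name or "M")
    return labels


def sums_of_squares(totals, factors, r):
    """SS(effect) = [effect]^2 / (2^k * r) for every effect except [M]."""
    contrasts = yates(totals)
    denom = len(totals) * r
    labels = effect_labels(factors)
    return {lab: c ** 2 / denom for lab, c in zip(labels[1:], contrasts[1:])}


# Hypothetical totals for (1), a, b, ab in a 2^2 experiment with r = 2.
toy_totals = [10, 14, 12, 20]
toy_contrasts = yates(toy_totals)
toy_ss = sums_of_squares(toy_totals, ["A", "B"], r=2)
print(dict(zip(effect_labels(["A", "B"]), toy_contrasts)))
print(toy_ss)
```

For the four-factor experiment of this example, the same functions would be called with the 16 treatment totals in standard order, the factor list `["A", "B", "C", "D"]` and `r=4`.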