MAT 3378 3X Midterm Examination (Solutions)

1. An experiment with a completely randomized design was run to determine whether three specific firing temperatures affect the density of a certain type of brick. Overall, the experiment yielded, for $n_T = 13$ experimental units, a mean of 21.677 and a standard deviation of 0.1691. Summary statistics by treatment are given in the following table:

temperature    $i$    $n_i$    mean of density    standard deviation of density
100            1      5        21.74              0.1140
125            2      4        21.5               0.1414
150            3      4        21.775             0.1258

(a) Write down an appropriate ANOVA model for this experiment.

(b) Let $\mu_i$ be the mean density for treatment $i$, for $i = 1, 2, 3$. Consider the following linear combination of the three treatment means:
$$\mu = \sum_{i=1}^{3} \frac{n_i}{n_T}\,\mu_i.$$
Is $\mu$ a contrast of the treatment means? Explain.

(c) Estimate the variance $\sigma^2$ of the random error.

(d) Give a point estimate for $\mu$ and also compute its (estimated) standard error. Use these estimates to construct a 95% confidence interval for $\mu$.

Note: You can use the fact that t(.975; 10) = 2.228, t(.95; 10) = 1.813, t(.975; 12) = 2.179, t(.95; 12) = 1.782.

Solution:

(a) Let $Y_{ij}$ be the $j$th density for temperature $i$. A corresponding ANOVA model is
$$Y_{ij} = \mu_i + \varepsilon_{ij},$$
where the $\varepsilon_{ij}$ are independent $N(0, \sigma^2)$.
(b) The sum of the coefficients is $(n_1 + n_2 + n_3)/n_T = 1$, which is not equal to zero. Thus, $\mu$ is not a contrast.

(c) We will use a pooled estimate of the error variance:
$$\mathrm{MSE} = \frac{\mathrm{SSE}}{n_T - r} = \sum_{i=1}^{3} \frac{n_i - 1}{n_T - r}\, s_i^2 = \frac{(4)\,0.1140^2 + (3)\,0.1414^2 + (3)\,0.1258^2}{10} = 0.01594,$$
where $s_i^2$ is the sample variance of the density within treatment group $i$ and $r = 3$ is the number of treatments.

(d) A point estimate for $\mu$ is
$$\bar{Y} = \sum_{i=1}^{3} \frac{n_i}{n_T}\,\bar{Y}_i = (5/13)(21.74) + (4/13)(21.5) + (4/13)(21.775) = 21.6769.$$
Its (estimated) standard error is
$$s\{\bar{Y}\} = \sqrt{\mathrm{MSE} \sum_{i=1}^{3} \frac{(n_i/n_T)^2}{n_i}} = \sqrt{0.01594\left[\frac{(5/13)^2}{5} + \frac{(4/13)^2}{4} + \frac{(4/13)^2}{4}\right]} = \sqrt{\frac{0.01594}{13}} = 0.0350.$$
A 95% confidence interval for $\mu$ is
$$\bar{Y} \pm t(0.975;\, 13 - 3)\, s\{\bar{Y}\} = 21.6769 \pm (2.228)(0.0350) = [21.60,\ 21.75].$$

2. Consider the experiment from Question 1. Below is a table of p-values for the pairwise comparisons of the temperature effects using the Tukey method.
(a) Using a family significance level of 5% for the pairwise comparisons, describe the temperature effects.

(b) Each of these p-values is computed using a particular probability distribution. Give the name of this distribution along with the values of its parameters.

(c) Say that it is determined prior to the experiment that only a difference in the treatment means of more than 0.25 units would be considered important. Could we use the above table of p-values to determine if there are important temperature effects? Explain.

Solution:

(a) The largest mean density is $\bar{Y}_3 = 21.775$, which corresponds to a temperature of 150. This mean density is not significantly different from the mean density $\bar{Y}_1 = 21.74$ obtained at a temperature of 100. However, at a temperature of 125 we obtain a mean density of $\bar{Y}_2 = 21.5$, which is significantly different from the mean densities for the other two treatments.

(b) It is a studentized range distribution with $\nu = n_T - 3 = 10$ degrees of freedom and $r = 3$ groups, i.e. $q(r = 3;\ \nu = 10)$.

(c) A p-value is a measure of significance. It does not measure the size of the effect on the scale of the density. So, using the p-values, we can determine whether the difference between two means is statistically significant. However, using the p-values alone, we cannot determine whether a significant difference is important.

3. Two hockey teams try a new skate model to see if it increases the speed of players. They both compare with their old skate models. The first team has 23 players. Its test consists of 12 players with the old skates racing
around the rink, with average time $\bar{y}_1 = 58.95$ seconds and standard deviation $s_1 = 2.71$, and the other 11 players with the new skates racing around the rink, with average time $\bar{y}_2 = 58.78$ and standard deviation $s_2 = 1.75$. The response variable is the time to skate around the rink and the explanatory variable is the model of the skate.

(a) What type of experimental design is being implemented here? What is the basic unit of study (i.e. the experimental unit)?

(b) The other team has 22 regular players, and tests all players with the old skates and the new skates, so they all do one race with each. The summary statistics are $\bar{d} = 1.02$ and $s_d = 0.098$ for the difference in race time for each player (old skates minus new skates). What type of experimental design is being implemented here?

(c) Which team has the better method to test the new skates? Why?

(d) What reasons could lead one team to go for a less optimal experimental design?

Solution:

(a) The treatment, which is the model of the skate, is assigned to a player. So the player is the experimental unit. Assuming that the assignment is random, i.e. we randomly assign the players to the treatment groups, it is a completely randomized design.

(b) It is a repeated measures design. We can also consider each player as a block and, since all the treatments appear within each block, it is a complete block design. Furthermore, if we randomize the treatments within each player, i.e. randomly assign the order, then it is a randomized complete block design.

(c) The speed of players can vary greatly, and this can explain a large proportion of the variation in the response variable. By using repeated measures within a player, we are controlling for the varying speeds between the players. A comparison of the two times within a player eliminates the between-player variability. This generally means that the repeated measures design will be more powerful.
(d) Often constraints on resources will lead the investigator to choose a less optimal design. For example, if a team has to buy the new skates to test them, it may not be able to afford a new pair for every player (we should note that in that case it still might be better to use repeated measures on a smaller number of players).

4. Let $Y_1, Y_2, Y_3, Y_4$ be independent normal random variables, such that $Y_i \sim N(\mu_i, \sigma^2)$. We will assume that $\mu_1 = \mu_2 = 5$, $\mu_3 = 6$, $\mu_4 = 6.5$.

(a) Give the distribution of the following random variable:
$$\sum_{i=1}^{4} \left(\frac{Y_i - 5}{\sigma}\right)^2.$$

(b) Give the distribution of the random variables $U_1$ and $U_2$, where $U_1 = Q_1/\sigma^2$ and $U_2 = Q_2/\sigma^2$, with
$$Q_1 = (Y_1 - 5)^2 + (Y_2 - 5)^2 \quad\text{and}\quad Q_2 = (Y_3 - 5)^2 + (Y_4 - 5)^2.$$

(c) Give the distribution of $Q_2/Q_1$, where $Q_1$ and $Q_2$ are as defined in part (b).

Solution:

(a) Define $W_i = Y_i - 5$. Then $W_1, W_2, W_3, W_4$ are independent normal random variables with means $\mu_{W_i} = \mu_i - 5$ and a common variance $\sigma^2$. Thus,
$$\sum_{i=1}^{4} \left(\frac{Y_i - 5}{\sigma}\right)^2 = \sum_{i=1}^{4} \frac{W_i^2}{\sigma^2}$$
has a non-central chi-square distribution with $\nu = 4$ degrees of freedom, and its non-centrality parameter is
$$nc = \sum_{i=1}^{4} \frac{\mu_{W_i}^2}{\sigma^2} = \frac{0^2 + 0^2 + 1^2 + (1.5)^2}{\sigma^2} = \frac{3.25}{\sigma^2}.$$
(b) The random variable $U_1$ has a chi-square distribution with 2 degrees of freedom, while $U_2$ has a (non-central) chi-square distribution with 2 degrees of freedom, and its non-centrality parameter is
$$nc = \frac{(6 - 5)^2 + (6.5 - 5)^2}{\sigma^2} = \frac{3.25}{\sigma^2}.$$

(c) Since $U_1$ and $U_2$, as defined above, are independent,
$$\frac{Q_2}{Q_1} = \frac{Q_2/(2\sigma^2)}{Q_1/(2\sigma^2)} = \frac{U_2/\nu_2}{U_1/\nu_1}, \qquad \nu_1 = \nu_2 = 2,$$
has a (non-central) $F(2, 2, nc)$ distribution, where
$$nc = \frac{(6 - 5)^2 + (6.5 - 5)^2}{\sigma^2} = \frac{3.25}{\sigma^2}.$$
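As a rough numerical check of the Question 4 answers, the sketch below simulates the four normal variables and compares the simulated means of the three statistics with their theoretical values. It assumes $\sigma = 1$ (the question leaves $\sigma$ unspecified) and uses the standard fact that a non-central chi-square variable with $\nu$ degrees of freedom and non-centrality parameter $nc$, in the convention used in the solution, has mean $\nu + nc$. The function names, the seed and the number of draws are illustrative choices, not part of the exam.

```python
import random


def simulate_means(sigma=1.0, n_draws=100_000, seed=0):
    """Monte Carlo means of sum_i ((Y_i - 5)/sigma)^2, U_1 and U_2."""
    rng = random.Random(seed)
    mus = [5.0, 5.0, 6.0, 6.5]  # means given in Question 4
    total = u1 = u2 = 0.0
    for _ in range(n_draws):
        # Squared standardized deviations from 5, one per Y_i.
        z2 = [((rng.gauss(mu, sigma) - 5.0) / sigma) ** 2 for mu in mus]
        u1 += z2[0] + z2[1]   # U_1 = Q_1 / sigma^2
        u2 += z2[2] + z2[3]   # U_2 = Q_2 / sigma^2
        total += sum(z2)      # statistic from part (a)
    return total / n_draws, u1 / n_draws, u2 / n_draws


def theoretical_means(sigma=1.0):
    """Means implied by the (non-central) chi-square answers: nu + nc."""
    nc = ((6.0 - 5.0) ** 2 + (6.5 - 5.0) ** 2) / sigma ** 2
    return 4 + nc, 2.0, 2 + nc


if __name__ == "__main__":
    print("simulated  :", simulate_means())
    print("theoretical:", theoretical_means())
```

Agreement of the means is only a sanity check, not a proof of the distributional claims. Changing `sigma` shows how the non-centrality parameter $3.25/\sigma^2$ shrinks as the error variance grows.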
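Returning to Question 1, the arithmetic in parts (c) and (d) can be reproduced from the summary table with a short script. This is a minimal sketch: the function names are illustrative, and the critical value is the t(.975; 10) quoted in the question rather than computed. The standard error is evaluated directly from the formula $s\{\bar{Y}\} = \sqrt{\mathrm{MSE} \sum_i (n_i/n_T)^2/n_i}$.

```python
import math

# Summary statistics from the Question 1 table: (n_i, mean, standard deviation).
GROUPS = [(5, 21.74, 0.1140), (4, 21.5, 0.1414), (4, 21.775, 0.1258)]
T_CRIT = 2.228  # t(.975; 10), as quoted in the question


def pooled_mse(groups):
    """Pooled error variance: SSE / (n_T - r)."""
    n_total = sum(n for n, _, _ in groups)
    r = len(groups)
    sse = sum((n - 1) * s ** 2 for n, _, s in groups)
    return sse / (n_total - r)


def weighted_mean_ci(groups, t_crit):
    """Point estimate, standard error and CI for mu = sum (n_i/n_T) mu_i."""
    n_total = sum(n for n, _, _ in groups)
    estimate = sum(n / n_total * mean for n, mean, _ in groups)
    mse = pooled_mse(groups)
    se = math.sqrt(mse * sum((n / n_total) ** 2 / n for n, _, _ in groups))
    return estimate, se, (estimate - t_crit * se, estimate + t_crit * se)


if __name__ == "__main__":
    est, se, ci = weighted_mean_ci(GROUPS, T_CRIT)
    print(f"MSE = {pooled_mse(GROUPS):.5f}")
    print(f"estimate = {est:.4f}, standard error = {se:.4f}")
    print(f"95% CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
```

Because the weights are $n_i/n_T$, the sum inside the square root reduces to $1/n_T$, so the standard error equals $\sqrt{\mathrm{MSE}/n_T}$.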