Practice Final Exam
December 14, 2009

1 New Material

1.1 ANOVA

1. A purification process for a chemical involves passing it, in solution, through a resin on which impurities are adsorbed. A chemical engineer is testing the efficiency of 3 different resins in collecting impurities; he breaks each resin into 5 pieces and measures the concentration of impurities after passing through the resins. The data are as follows:

Concentration of impurities
Resin 1   Resin 2   Resin 3
0.046     0.038     0.031
0.025     0.035     0.042
0.014     0.031     0.020
0.017     0.022     0.018
0.043     0.012     0.039

Test the hypothesis that there is no difference in the efficiency of the resins, using analysis of variance techniques.

Solution We want to test the hypothesis $H_0: \mu_1 = \mu_2 = \mu_3$. The analysis of variance table is

Source       d.f.   SOS                       Mean Squares             F-statistic
Treatments   2      SSTr = 1.4533 x 10^-5     MSTr = 7.27 x 10^-6      F = 0.0487
Error        12     SSE = 0.0018              MSE = 1.49 x 10^-4
Total        14     SST = 0.0018

p-value: 0.953

Since our p-value is 0.953, we can accept the null hypothesis.

2. Four standard chemical procedures are used to determine the magnesium content in a certain chemical compound. Each procedure is used four times on a given compound with the following data resulting:

Magnesium content
Method 1   Method 2   Method 3   Method 4
76.42      80.41      74.20      86.20
78.62      82.26      72.68      86.04
80.40      81.15      78.84      84.36
78.20      79.20      80.32      80.68

Do the data indicate that the procedures yield equivalent results?

Solution We want to test the hypothesis $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$. The analysis of variance table is

Source       d.f.   SOS                Mean Squares      F-statistic
Treatments   3      SSTr = 135.7625    MSTr = 45.254     F = 7.474
Error        12     SSE = 72.66        MSE = 6.055
Total        15     SST = 208.423

p-value: 0.0044

Since our p-value is 0.0044, we can safely reject the null hypothesis.
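The two ANOVA tables above can be checked numerically. The following is a minimal sketch in plain Python, assuming the data values are as read from the two problem tables (three resins with five measurements each, four methods with four measurements each); the function name `one_way_anova` is an illustrative choice, not something from the text. It computes only the sums of squares and the F-statistic, not the p-value.

```python
def one_way_anova(groups):
    """Return (SSTr, SSE, F) for a one-way layout given as a list of samples."""
    k = len(groups)                                  # number of treatments
    n_total = sum(len(g) for g in groups)            # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-treatment sum of squares
    ss_tr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-treatment (error) sum of squares
    ss_e = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    f_stat = (ss_tr / (k - 1)) / (ss_e / (n_total - k))
    return ss_tr, ss_e, f_stat


# Data as read from the problem tables (assumed transcription)
resins = [
    [0.046, 0.025, 0.014, 0.017, 0.043],   # Resin 1
    [0.038, 0.035, 0.031, 0.022, 0.012],   # Resin 2
    [0.031, 0.042, 0.020, 0.018, 0.039],   # Resin 3
]
methods = [
    [76.42, 78.62, 80.40, 78.20],          # Method 1
    [80.41, 82.26, 81.15, 79.20],          # Method 2
    [74.20, 72.68, 78.84, 80.32],          # Method 3
    [86.20, 86.04, 84.36, 80.68],          # Method 4
]

resin_result = one_way_anova(resins)
method_result = one_way_anova(methods)
print("Resins  (SSTr, SSE, F):", resin_result)
print("Methods (SSTr, SSE, F):", method_result)
```

The p-values would additionally require the F distribution with (k - 1, N - k) degrees of freedom, which is left to a table or a statistics library.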
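3. For data $x_{ij}$, $i = 1, \dots, m$, $j = 1, \dots, n$, show that
$$\bar{x}_{\cdot\cdot} = \sum_{i=1}^{m} \bar{x}_{i\cdot}/m = \sum_{j=1}^{n} \bar{x}_{\cdot j}/n$$
where $\bar{x}_{\cdot\cdot} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} x_{ij}$ is the sample mean of all $x_{ij}$.

Solution We can write
$$\bar{x}_{\cdot\cdot} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} x_{ij} = \frac{1}{m}\sum_{i=1}^{m}\underbrace{\frac{1}{n}\sum_{j=1}^{n} x_{ij}}_{\bar{x}_{i\cdot}} = \sum_{i=1}^{m} \bar{x}_{i\cdot}/m$$
Also,
$$\bar{x}_{\cdot\cdot} = \frac{1}{mn}\sum_{j=1}^{n}\sum_{i=1}^{m} x_{ij} = \frac{1}{n}\sum_{j=1}^{n}\underbrace{\frac{1}{m}\sum_{i=1}^{m} x_{ij}}_{\bar{x}_{\cdot j}} = \sum_{j=1}^{n} \bar{x}_{\cdot j}/n$$

4. Problem 11.1.21 in the text.

Solution The first confidence interval, for example, is
$$\mu_1 - \mu_2 \in \bar{x}_1 - \bar{x}_2 \pm \frac{s\, q_{\alpha,k,\nu}}{\sqrt{2}}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 46.09 - 42.21 \pm \frac{4.33\, q_{0.05,5,40}}{\sqrt{2}}\sqrt{\frac{1}{10} + \frac{1}{9}} = (-1.803,\ 9.563)$$
This establishes that the hypothesis $\mu_1 = \mu_2$ is plausible at the 0.05 significance level, since the interval contains 0. Carrying out this procedure for all pairs, the confidence intervals that contain 0 are those for $\mu_1 - \mu_2$, $\mu_2 - \mu_5$, and $\mu_3 - \mu_4$. The largest mean is $\mu_3$ or $\mu_4$ and the smallest mean is either $\mu_2$ or $\mu_5$, which can be verified by looking at the values of $\bar{x}_1$ through $\bar{x}_5$.

1.2 Regression

1. The following table shows the number of units of a good that customers ordered, when the good was priced at various levels. (In economics, these points are said to lie on a demand curve.)

Number ordered   88   112   123   136   158   172
Price            50   40    35    30    20    15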
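How many units do you think would be ordered if the price were 25?

Solution We perform a regression of the form $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ with $y_i$ being the number ordered and $x_i$ the price. Solving for $\beta_0$ and $\beta_1$, we find that $\hat\beta_0 = 206.7$ and $\hat\beta_1 = -2.38$. At $x = 25$ we estimate $\hat{Y}_x = 206.7 - 2.38 \cdot 25 \approx 147$ units ordered.

2. Consider the simple linear regression model
$$Y = \beta_0 + \beta_1 x + \epsilon$$
Suppose that $0 < \beta_1 < 1$.

a) Show that if $x < \frac{\beta_0}{1 - \beta_1}$, then
$$x < E(Y) < \frac{\beta_0}{1 - \beta_1}$$

Solution We can write
$$E(Y) = \beta_0 + \beta_1 x \implies x = \frac{E(Y) - \beta_0}{\beta_1} < \frac{\beta_0}{1 - \beta_1} \implies E(Y) < \frac{\beta_1 \beta_0}{1 - \beta_1} + \beta_0 = \frac{\beta_0}{1 - \beta_1}$$
Now, to show that $x < E(Y)$, we just have to show that $x < \beta_0 + \beta_1 x$, since $E(Y) = \beta_0 + \beta_1 x$. This is straightforward: we have
$$x < \frac{\beta_0}{1 - \beta_1} \implies (1 - \beta_1)\, x < \beta_0 \implies x - \beta_1 x < \beta_0 \implies x < \beta_0 + \beta_1 x$$
as desired.

b) Show that if $x > \frac{\beta_0}{1 - \beta_1}$, then
$$x > E(Y) > \frac{\beta_0}{1 - \beta_1}$$
and hence conclude that $E(Y)$ always lies between $x$ and $\frac{\beta_0}{1 - \beta_1}$.

Solution We can write
$$E(Y) = \beta_0 + \beta_1 x \implies x = \frac{E(Y) - \beta_0}{\beta_1} > \frac{\beta_0}{1 - \beta_1} \implies E(Y) > \frac{\beta_1 \beta_0}{1 - \beta_1} + \beta_0 = \frac{\beta_0}{1 - \beta_1}$$
Now, to show that $x > E(Y)$, we just have to show that
$$x > \beta_0 + \beta_1 x$$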
since $E(Y) = \beta_0 + \beta_1 x$. This is, again, straightforward: we have
$$x > \frac{\beta_0}{1 - \beta_1} \implies (1 - \beta_1)\, x > \beta_0 \implies x - \beta_1 x > \beta_0 \implies x > \beta_0 + \beta_1 x$$
as desired.

3. It has been determined that the relation between stress ($S$) and the number of cycles to failure ($N$) for a particular type of alloy is given by
$$S = \frac{A}{N^m}$$
where $A$ and $m$ are unknown constants. An experiment is run yielding the following data:

Stress         55.0    50.5    43.5   42.5   42.0   41.0   35.7   34.5   33.0
N (millions)   0.223   0.925   6.75   18.1   29.1   50.5   126    215    445

a) Estimate $A$ and $m$ (hint: use a logarithmic transformation).

Solution Using a logarithmic transformation, we find that
$$\log S = \log A - m \log N$$
so there is a linear relationship between $\log S$ and $\log N$. We set $y = \log S$ and $x = \log N$, and we obtain the new table

y   4.01    3.92    3.77   3.75   3.74   3.71   3.57   3.54   3.50
x   -1.50   -0.08   1.91   2.89   3.37   3.92   4.84   5.37   6.10

Solving for $\beta_0$ and $\beta_1$ we find that $\hat\beta_0 = 3.92$ and $\hat\beta_1 = -0.066$, and therefore $\hat{A} = e^{\hat\beta_0} = 50.51$ and $\hat{m} = -\hat\beta_1 = 0.066$.

b) Estimate $\beta_0$, $\beta_1$, and $\beta_2$ if we instead use the relation
$$S = \beta_0 + \beta_1 N + \beta_2 N^2$$
Why is this (probably) a less reasonable model? In particular, what happens to each model as $N \to \infty$?

Solution This is a multi-variable regression of the form $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ with $y = S$, $x_1 = N$, $x_2 = x_1^2 = N^2$. We write the matrices
$$X = \begin{pmatrix} 1 & 0.223 & 0.0497 \\ 1 & 0.925 & 0.8556 \\ \vdots & \vdots & \vdots \\ 1 & 445 & 1.98 \times 10^5 \end{pmatrix}; \qquad Y = \begin{pmatrix} 55.0 \\ 50.5 \\ \vdots \\ 33.0 \end{pmatrix}$$
and solve the normal equations $X^T X \beta = X^T Y$, which gives us $\hat\beta_0 = 47.86$, $\hat\beta_1 = -0.114$, and $\hat\beta_2 = 0.0002$. This is not a very good model because $S \to \infty$ as $N \to \infty$ (as $\hat\beta_2 > 0$), which is not reflected in the data set, whereas in our original model we have $S \to 0$ as $N \to \infty$, which is reflected in the data set.

1.3 Multi-factor experiments

1. Suppose we observe the following data in a two-factor experiment:

                     Factor A
                     Level 1     Level 2
Factor B   Level 1   2, 4, 6     1, 3, 5
           Level 2   1, 3, 11    2, 6, 12
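Estimate the parameters $\mu$, $\alpha_i$, $\beta_j$, and $(\alpha\beta)_{ij}$ using the appropriate estimators.

Solution We estimate the parameters with
$$\hat\mu = \bar{x}_{\cdot\cdot\cdot} = 4.67$$
$$\hat\alpha_1 = \bar{x}_{1\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot} = -0.167 \qquad \hat\alpha_2 = \bar{x}_{2\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot} = 0.167$$
$$\hat\beta_1 = \bar{x}_{\cdot 1\cdot} - \bar{x}_{\cdot\cdot\cdot} = -1.167 \qquad \hat\beta_2 = \bar{x}_{\cdot 2\cdot} - \bar{x}_{\cdot\cdot\cdot} = 1.167$$
$$\widehat{(\alpha\beta)}_{11} = \bar{x}_{11\cdot} - \hat\mu - \hat\alpha_1 - \hat\beta_1 = 0.67 \qquad \widehat{(\alpha\beta)}_{12} = \bar{x}_{12\cdot} - \hat\mu - \hat\alpha_1 - \hat\beta_2 = -0.67$$
$$\widehat{(\alpha\beta)}_{21} = \bar{x}_{21\cdot} - \hat\mu - \hat\alpha_2 - \hat\beta_1 = -0.67 \qquad \widehat{(\alpha\beta)}_{22} = \bar{x}_{22\cdot} - \hat\mu - \hat\alpha_2 - \hat\beta_2 = 0.67$$

2. (This one's hard) Consider a two-factor experiment with cell means $\mu_{ij}$ decomposed as
$$\mu_{ij} = \mu + \alpha_i + \beta_j$$
Notice that there are no interaction effects (i.e. $(\alpha\beta)_{ij} = 0$ for all $ij$). Suppose that $(\mu, \alpha_1, \dots, \alpha_a, \beta_1, \dots, \beta_b)$ and $(\bar\mu, \bar\alpha_1, \dots, \bar\alpha_a, \bar\beta_1, \dots, \bar\beta_b)$ satisfy
$$\mu + \alpha_i + \beta_j = \bar\mu + \bar\alpha_i + \bar\beta_j \qquad (1)$$
for all $ij$ and
$$\sum_{i=1}^{a} \alpha_i = \sum_{i=1}^{a} \bar\alpha_i = \sum_{j=1}^{b} \beta_j = \sum_{j=1}^{b} \bar\beta_j = 0 \qquad (2)$$
Show that $\mu = \bar\mu$, $\alpha_i = \bar\alpha_i$, and $\beta_j = \bar\beta_j$. (This shows that the parameters $(\mu, \alpha_1, \dots, \alpha_a, \beta_1, \dots, \beta_b)$ are uniquely determined.)

Hint 1 First, show that $\mu = \bar\mu$. To do this, suppose for a contradiction that $\mu > \bar\mu$. Then from (1) it must be true that
$$\alpha_i + \beta_j < \bar\alpha_i + \bar\beta_j$$
If we sum over all $i$ and $j$, we have
$$\sum_{i=1}^{a}\sum_{j=1}^{b} (\alpha_i + \beta_j) < \sum_{i=1}^{a}\sum_{j=1}^{b} (\bar\alpha_i + \bar\beta_j)$$
which is a contradiction with (2). Why?

Hint 2 Since $\mu = \bar\mu$, we know from (1) that
$$\alpha_i + \beta_j = \bar\alpha_i + \bar\beta_j \qquad (3)$$
for all $ij$. Suppose that $\alpha_i > \bar\alpha_i$ for some index $i$. Since (3) says that $\alpha_i + \beta_j = \bar\alpha_i + \bar\beta_j$, it must be true that $\beta_j < \bar\beta_j$ for all $j$. However, this is a contradiction with (2). Why?

Solution Following Hint 1, we have
$$\sum_{i=1}^{a}\sum_{j=1}^{b} (\alpha_i + \beta_j) < \sum_{i=1}^{a}\sum_{j=1}^{b} (\bar\alpha_i + \bar\beta_j) \qquad (4)$$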
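However, (2) says that
$$\sum_{i=1}^{a} \alpha_i = \sum_{i=1}^{a} \bar\alpha_i = \sum_{j=1}^{b} \beta_j = \sum_{j=1}^{b} \bar\beta_j = 0$$
and therefore
$$\sum_{i=1}^{a}\sum_{j=1}^{b} (\alpha_i + \beta_j) = \sum_{i=1}^{a}\Big( b\alpha_i + \underbrace{\sum_{j=1}^{b} \beta_j}_{=0} \Big) = \sum_{i=1}^{a} b\alpha_i = b\sum_{i=1}^{a} \alpha_i = 0$$
We could also conclude that $\sum_{i=1}^{a}\sum_{j=1}^{b} (\bar\alpha_i + \bar\beta_j) = 0$, using the same reasoning. Then (4) says that $0 < 0$, a contradiction. (The case $\mu < \bar\mu$ is ruled out in the same way, with the inequalities reversed.)

Next, following Hint 2, we have $\beta_j < \bar\beta_j$ for all $j$. This is again a contradiction because (2) says that
$$\sum_{j=1}^{b} \beta_j = \sum_{j=1}^{b} \bar\beta_j = 0$$
but we concluded in the hint that
$$\beta_j < \bar\beta_j \text{ for all } j \implies \sum_{j=1}^{b} \beta_j < \sum_{j=1}^{b} \bar\beta_j$$
which again implies that $0 < 0$, a contradiction. The case $\alpha_i < \bar\alpha_i$ is symmetric, so $\alpha_i = \bar\alpha_i$ for all $i$, and (3) then gives $\beta_j = \bar\beta_j$ for all $j$.

3. Consider a two-factor layout. Show that the estimators
$$\hat\mu = \bar{x}_{\cdot\cdot\cdot} \qquad \hat\alpha_i = \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot} \qquad \hat\beta_j = \bar{x}_{\cdot j\cdot} - \bar{x}_{\cdot\cdot\cdot} \qquad \widehat{(\alpha\beta)}_{ij} = \bar{x}_{ij\cdot} - \hat\mu - \hat\alpha_i - \hat\beta_j$$
satisfy
$$\sum_{i=1}^{a} \hat\alpha_i = \sum_{j=1}^{b} \hat\beta_j = 0$$
(in fact, it is also true that $\sum_{i=1}^{a} \widehat{(\alpha\beta)}_{ij} = 0$ and $\sum_{j=1}^{b} \widehat{(\alpha\beta)}_{ij} = 0$, but you don't need to show that here).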
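Solution We have
$$\sum_{i=1}^{a} \hat\alpha_i = \sum_{i=1}^{a} (\bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot}) = \sum_{i=1}^{a} \bar{x}_{i\cdot\cdot} - a\,\bar{x}_{\cdot\cdot\cdot} = \sum_{i=1}^{a} \frac{1}{bn}\sum_{j=1}^{b}\sum_{k=1}^{n} x_{ijk} - a \cdot \frac{1}{abn}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} x_{ijk} = \frac{1}{bn}\sum_{i,j,k} x_{ijk} - \frac{1}{bn}\sum_{i,j,k} x_{ijk} = 0$$
Similarly,
$$\sum_{j=1}^{b} \hat\beta_j = \sum_{j=1}^{b} (\bar{x}_{\cdot j\cdot} - \bar{x}_{\cdot\cdot\cdot}) = \sum_{j=1}^{b} \bar{x}_{\cdot j\cdot} - b\,\bar{x}_{\cdot\cdot\cdot} = \sum_{j=1}^{b} \frac{1}{an}\sum_{i=1}^{a}\sum_{k=1}^{n} x_{ijk} - b \cdot \frac{1}{abn}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} x_{ijk} = \frac{1}{an}\sum_{i,j,k} x_{ijk} - \frac{1}{an}\sum_{i,j,k} x_{ijk} = 0$$

2 Old statistics material

1. Find a maximum likelihood estimator for the parameter $p$ in a Bernoulli random variable, letting $A$ be the number of successes and $B = n - A$ the number of failures.

Solution Let $x_1, \dots, x_n$ represent a collection of samples. We write the likelihood function
$$L(x_1, \dots, x_n; p) = p^A (1 - p)^B$$
$$l = \log L = A \log p + B \log(1 - p)$$
$$\frac{dl}{dp} = \frac{A}{p} - \frac{B}{1 - p} = 0 \implies \hat{p} = A/(A + B) = A/n$$

2. Find a maximum likelihood estimator for the parameter $p$ in a geometric random variable. Is this the same estimator that one would obtain with the method of moments?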
Solution Let $x_1, \dots, x_n$ represent a collection of samples. We write the likelihood function
$$L(x_1, \dots, x_n; p) = \left[(1 - p)^{x_1 - 1} p\right] \cdots \left[(1 - p)^{x_n - 1} p\right] = (1 - p)^{x_1 + \cdots + x_n - n}\, p^n$$
$$l = \log L = (x_1 + \cdots + x_n - n) \log(1 - p) + n \log p$$
$$\frac{dl}{dp} = \frac{n}{p} - \frac{x_1 + \cdots + x_n - n}{1 - p} = 0 \implies \hat{p} = \frac{n}{x_1 + \cdots + x_n} = 1/\bar{x}$$
which is indeed the same estimator as the method of moments would give.

3. During two consecutive seasons in the NBA, Larry Bird shot a pair of free throws on 338 occasions. On 251 occasions he made both shots; on 34 occasions he made the first shot but missed the second one; on 48 occasions he missed the first shot but made the second one; on 5 occasions he missed both shots.

a) Use these data to test the hypothesis that Bird's probability of making the first shot is equal to his probability of making the second shot.

Solution We'll model this as a two-population hypothesis test, using the method of paired samples. Let $x_1, \dots, x_{338}$ be indicators for the first shots ($x_i = 1$ if Bird made the first shot on occasion $i$, and $0$ otherwise), and let $y_1, \dots, y_{338}$ be the corresponding indicators for the second shots. We want to test the null hypothesis $H_0: \mu = 0$, where $\mu = E(Z)$ with $z_i = x_i - y_i$. Notice that, given the data, we know that we have $z_i = 0$ for $251 + 5 = 256$ occasions (the occasions when he made both or missed both), $z_i = -1$ for 48 occasions, and $z_i = 1$ for 34 occasions. Hence, we find that
$$\hat\mu = \bar{z} = \frac{256 \cdot 0 + 48 \cdot (-1) + 34 \cdot 1}{338} = -0.0414$$
We find that
$$S^2 = \frac{\sum_{i=1}^{338} (z_i - \bar{z})^2}{337} = \frac{256\,(0 - \bar{z})^2 + 48\,(-1 - \bar{z})^2 + 34\,(1 - \bar{z})^2}{337} = 0.2416$$
Our t-statistic is
$$t = \frac{\sqrt{n}\,(\bar{z} - \mu_0)}{s} = \frac{\sqrt{338}\,(-0.0414)}{\sqrt{0.2416}} = -1.55$$
Using the 10% significance level, we see that $t_{0.05,337} \approx z_{0.05} = 1.645$ for a sample this large. Since $|t| = 1.55 < 1.645$, the hypothesis is plausible.

b) Use these data to test the hypothesis that Bird's probability of making the second shot is the same regardless of whether he made or missed the first one.

Solution We'll again model this as a two-population hypothesis test, but this time we can't use paired samples.
The two populations we're comparing are:

a) The set of second shots, after a successful first shot (population A)
b) The set of second shots, after an unsuccessful first shot (population B)

There are $251 + 34 = 285$ occasions in which Bird successfully made his first shot, and $48 + 5 = 53$ occasions in which he missed his first shot. We'll test the hypothesis $H_0: \mu_A = \mu_B$. Let the members of population A be denoted by $x_1, \dots, x_{285}$, where $x_i$ is 0 if Bird missed his second shot and $x_i$ is 1 if he made his second shot, and let the members of population B be denoted by $y_1, \dots, y_{53}$, with similar definitions for $y_i$. Since Bird made his second shot on 251 of the 285 occasions that he made his first shot, we have
$$\bar{x} = 251/285 = 0.8807$$
Similarly, since Bird made his second shot on 48 of the 53 occasions that he missed his first shot, we have
$$\bar{y} = 48/53 = 0.905$$
So, we have $\bar{x}$ and $\bar{y}$; we just need the variances $S_x^2$ and $S_y^2$, and we'll be all set. We have
$$S_x^2 = \frac{\sum_{i=1}^{285} (x_i - \bar{x})^2}{284} = \frac{251\,(1 - 0.8807)^2 + 34\,(0 - 0.8807)^2}{284} = 0.1054$$
$$S_y^2 = \frac{\sum_{i=1}^{53} (y_i - \bar{y})^2}{52} = \frac{48\,(1 - 0.905)^2 + 5\,(0 - 0.905)^2}{52} = 0.0871$$
Our t-statistic is
$$t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{S_x^2}{n} + \frac{S_y^2}{m}}} = \frac{0.8807 - 0.905}{\sqrt{\frac{0.1054}{285} + \frac{0.0871}{53}}} = -0.5416$$
Using the 10% significance level, we see that $t_{0.05,52} = 1.677$. Since $|t| = 0.5416 < 1.677$, the hypothesis is accepted.

Hint Each shot is a Bernoulli random variable, with $X = 0$ indicating a miss and $X = 1$ indicating a basket. The first question asks you to test the hypothesis $H_0: \mu_1 = \mu_2$, where $\mu_1$ is the probability of success of the first shot and $\mu_2$ is the probability of success of the second shot. The second question asks you to test the hypothesis $H_0: \mu_1 = \mu_2$, where $\mu_1$ is the probability of success of the second shot when the first shot was a miss, and $\mu_2$ is the probability of success of the second shot when the first shot was a success.

4. In a certain chemical process, it is very important that a particular solution that is to be used as a reactant have a pH of exactly 8.20. Suppose 10 independent measurements yielded the following pH values:

8.18, 8.17, 8.16, 8.15, 8.17, 8.21, 8.22, 8.16, 8.19, 8.18

a) What conclusion can be drawn at the $\alpha = 0.10$ level of significance?

Solution We want to test the null hypothesis $H_0: \mu = 8.20$. We find that $\bar{x} = 8.179$. The sample variance is $S^2 = 0.00049889$, so $s = 0.0223$. The t-statistic is
$$t = \frac{\sqrt{n}\,(\bar{x} - \mu_0)}{s} = \frac{\sqrt{10}\,(8.179 - 8.20)}{0.0223} = -2.9779$$
Since $t_{0.05,9} = 1.833$, we find that $|t| > t_{0.05,9}$, so the hypothesis is rejected.

b) What about at the $\alpha = 0.05$ level of significance?

Solution Since $t_{0.025,9} = 2.262$, we find that $|t| > t_{0.025,9}$, so we can still reject the hypothesis.

5. A certain type of bipolar transistor has a mean value of current gain that is at least 210. A sample of these transistors is tested. If the sample mean value of current gain is 200 with a sample standard deviation of 35, would the claim be rejected at the 5 percent level of significance if

a) the sample size is 25?

Solution We want to test the null hypothesis $H_0: \mu \ge 210$.
If the sample size is 25 and $s = 35$, the t-statistic is
$$t = \frac{\sqrt{n}\,(\bar{x} - \mu_0)}{s} = \frac{\sqrt{25}\,(200 - 210)}{35} = -1.429$$
Since $t_{0.05,24} = 1.711$, we find that $t > -1.711$, so we can accept the null hypothesis.

b) the sample size is 64?

Solution We want to test the null hypothesis $H_0: \mu \ge 210$. If the sample size is 64 and $s = 35$, the t-statistic is
$$t = \frac{\sqrt{n}\,(\bar{x} - \mu_0)}{s} = \frac{\sqrt{64}\,(200 - 210)}{35} = -2.2857$$
Since $t_{0.05,63} \approx 1.67$, we find that $t < -1.67$, so we should reject the null hypothesis.
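The one-sample t-statistics in problems 4 and 5 all use the same formula, $t = \sqrt{n}(\bar{x} - \mu_0)/s$. A minimal sketch in plain Python follows, assuming the pH readings, the target pH of 8.20, and the transistor figures (hypothesized mean 210, sample mean 200, standard deviation 35) are as stated in those problems; the helper name `t_statistic` is illustrative. Because the sketch does not round $s$ before dividing, the pH statistic comes out slightly different from the hand-rounded value.

```python
import math


def t_statistic(sample_mean, mu0, s, n):
    """One-sample t-statistic: sqrt(n) * (sample mean - hypothesized mean) / s."""
    return math.sqrt(n) * (sample_mean - mu0) / s


# Problem 4: pH measurements (values as listed in the problem)
ph = [8.18, 8.17, 8.16, 8.15, 8.17, 8.21, 8.22, 8.16, 8.19, 8.18]
n = len(ph)
ph_mean = sum(ph) / n
# Sample standard deviation with the n - 1 divisor
ph_sd = math.sqrt(sum((x - ph_mean) ** 2 for x in ph) / (n - 1))
t_ph = t_statistic(ph_mean, 8.20, ph_sd, n)

# Problem 5: transistor current gain, two sample sizes
t_25 = t_statistic(200, 210, 35, 25)
t_64 = t_statistic(200, 210, 35, 64)

print("pH:", ph_mean, ph_sd, t_ph)
print("transistor, n = 25:", t_25)
print("transistor, n = 64:", t_64)
```

The critical values still have to come from a t table; the sketch only reproduces the test statistics that are compared against them.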
3 Probability Questions

1. The density function of $X$ is given by
$$f(x) = \begin{cases} a + bx^2 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$
If $E(X) = 3/5$, find $a$ and $b$.

Solution We have
$$\int_0^1 (a + bx^2)\, dx = 1 \implies \left[ ax + \frac{b}{3} x^3 \right]_0^1 = 1 \implies a + b/3 = 1$$
Next, we also have
$$\int_0^1 x\,(a + bx^2)\, dx = 3/5 \implies \left[ \frac{a}{2} x^2 + \frac{b}{4} x^4 \right]_0^1 = 3/5 \implies a/2 + b/4 = 3/5$$
Solving simultaneously for $a$ and $b$, we have $a = 3/5$, $b = 6/5$.

2. The lifetime in hours of electronic tubes is a random variable having a probability density function given by
$$f(x) = a^2 x e^{-ax}, \quad x \ge 0$$
Compute $E(X)$.

Solution We have
$$E(X) = \int_0^\infty x\,(a^2 x e^{-ax})\, dx = \int_0^\infty a^2 x^2 e^{-ax}\, dx$$
Integrating by parts, we find that
$$\int x^2 e^{-ax}\, dx = -e^{-ax}\, \frac{ax(ax + 2) + 2}{a^3}$$
and therefore
$$a^2 \int_0^\infty x^2 e^{-ax}\, dx = a^2 \left[ -e^{-ax}\, \frac{ax(ax + 2) + 2}{a^3} \right]_0^\infty = \frac{2}{a}$$

3. Consider a sequence of independent uniform random variables $X_i \sim U(0, 1)$.

a) Let
$$X = \max\{X_1, \dots, X_n\}$$
Write the c.d.f. and p.d.f. of $X$.
Solution The c.d.f. for $X$ is
$$F(x) = \Pr(X \le x) = \Pr(X_1, \dots, X_n \le x)$$
For any particular $X_i$, the probability that $X_i \le x$ is precisely $x$ (for $0 \le x \le 1$). Since the $X_i$ are independent, $F(x) = x^n$, so $f(x) = n x^{n-1}$.

b) Compute $E(X)$.

Solution We have
$$E(X) = \int_0^1 x\,(n x^{n-1})\, dx = \left[ \frac{n}{n + 1} x^{n+1} \right]_0^1 = \frac{n}{n + 1}$$

4. The annual rainfall in Cincinnati is normally distributed with mean 40.14 inches and standard deviation 8.7 inches.

a) What is the probability this year's rainfall will exceed 42 inches?

Solution Let $X$ denote the annual rainfall in Cincinnati. We have
$$\Pr(X > 42) = \Pr\left( \frac{X - 40.14}{8.7} > \frac{42 - 40.14}{8.7} \right) = \Pr(Z > 0.2138) = 1 - \Phi(0.2138) = 0.4154$$
where, as usual, $Z \sim N(0, 1)$.

b) What is the probability that the sum of the next 2 years' rainfall will exceed 84 inches?

Solution Let $X_1$ denote the rainfall next year, and $X_2$ the rainfall the year after that. Then $X_1, X_2 \sim N(40.14, 8.7^2)$ and therefore the sum $X = X_1 + X_2$ satisfies $X \sim N(80.28, 8.7^2 + 8.7^2)$. We have
$$\Pr(X > 84) = \Pr\left( \frac{X - 80.28}{\sqrt{8.7^2 + 8.7^2}} > \frac{84 - 80.28}{\sqrt{8.7^2 + 8.7^2}} \right) = \Pr(Z > 0.3023) = 1 - \Phi(0.3023) = 0.3812$$

c) What is the probability that the sum of the next 3 years' rainfall will exceed 126 inches?

Solution Let $X_1$ denote the rainfall next year, $X_2$ the rainfall the year after that, and $X_3$ the rainfall the year after that. Then $X_1, X_2, X_3 \sim N(40.14, 8.7^2)$ and therefore the sum $X = X_1 + X_2 + X_3$ satisfies $X \sim N(120.42, 8.7^2 + 8.7^2 + 8.7^2)$. We have
$$\Pr(X > 126) = \Pr\left( \frac{X - 120.42}{\sqrt{8.7^2 + 8.7^2 + 8.7^2}} > \frac{126 - 120.42}{\sqrt{8.7^2 + 8.7^2 + 8.7^2}} \right) = \Pr(Z > 0.3703) = 1 - \Phi(0.3703) = 0.3556$$

d) For parts b) and c), what independence assumptions are you making?

Solution We're assuming that $X_1$, $X_2$, and $X_3$ are all independent; this assumption is necessary to justify the statement that $\mathrm{Var}(X_1 + X_2) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2)$, for example.
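The three rainfall probabilities can be recomputed without a normal table. The sketch below is plain Python and assumes a yearly mean of 40.14 inches, a standard deviation of 8.7 inches, and independent years, as in the problem; the helper name `normal_tail` is illustrative. It uses the identity $1 - \Phi(z) = \tfrac{1}{2}\,\mathrm{erfc}(z/\sqrt{2})$.

```python
import math


def normal_tail(x, mean, sd):
    """Pr(X > x) for X ~ N(mean, sd^2), via the complementary error function."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


MEAN = 40.14   # yearly mean rainfall in inches (from the problem statement)
SD = 8.7       # yearly standard deviation in inches (from the problem statement)

# The sum of k independent years is N(k * MEAN, k * SD^2), so its sd is SD * sqrt(k)
p_one_year = normal_tail(42, MEAN, SD)
p_two_years = normal_tail(84, 2 * MEAN, SD * math.sqrt(2))
p_three_years = normal_tail(126, 3 * MEAN, SD * math.sqrt(3))

print(p_one_year, p_two_years, p_three_years)
```

The scaling of the standard deviation by $\sqrt{k}$ is exactly where the independence assumption from part d) enters.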