STAT5044: Regression and ANOVA
The Solution of Homework #2
Inyoung Kim

[Figure 1: The fitted line using the shipment route-number of ampules data; y plotted against x, with x from 0 to 3 and y from about 8 to 22.]

Problem# 1. The number of times a carton is transferred along the shipment route (X) and the number of ampules found broken upon arrival (Y). The summary of the simple linear regression fit is as follows:

    Call:
    lm(formula = y ~ x)

    Residuals:
       Min     1Q Median     3Q    Max
      -2.2   -1.2    0.3    0.8    1.8

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  10.2000     0.6633  15.377 3.18e-07 ***
    x             4.0000     0.4690   8.528 2.75e-05 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 1.483 on 8 degrees of freedom
    Multiple R-Squared: 0.9009,  Adjusted R-squared: 0.8885
    F-statistic: 72.73 on 1 and 8 DF,  p-value: 2.749e-05

(a) The estimated regression function is $\hat{Y} = 10.2 + 4X$. A linear regression function appears to give a good fit here: the data lie close to the fitted line (Figure 1), the p-value for testing $\beta_1 = 0$ is 2.75e-05, which indicates that the linear fit is statistically significant, $R^2 = 0.90$, and the Pearson correlation is 0.949158. However, X takes only four distinct values (0, 1, 2, 3). With a wider variety of X values, the fitted line would be more informative.

(b) A point estimate of the expected number of broken ampules when X = 1 transfer is made is $\hat{Y} = 10.2 + 4(1) = 14.2$.

(c) The expected number of broken ampules when there are 2 transfers is estimated as $\hat{Y} = 10.2 + 4(2) = 18.2$. Therefore, the estimated increase compared with 1 transfer is $\hat{Y}(2) - \hat{Y}(1) = 4(2 - 1) = 4$.

(d) Since $\bar{x} = 1$ and $\bar{y} = 14.2$, we have $\bar{y} = 10.2 + 4\bar{x}$; that is, the fitted line passes through the point $(\bar{x}, \bar{y})$.

Problem# 2. Refer to Airfreight breakage (Problem 1).

(a) Estimate $\beta_1$ with a 95% confidence interval. Interpret your interval estimate.

A.(a) The regression summary is the one shown in Problem 1. The 95% confidence intervals from confint(lmfit) are:

    > confint(lmfit)
                    2.5 %    97.5 %
    (Intercept)  8.670370 11.729630
    x            2.918388  5.081612

With confidence coefficient 0.95, we estimate that the mean number of broken ampules increases by somewhere between 2.92 and 5.08 for each additional time a carton is transferred.

NOTE: In a 95% confidence interval, 95% is NOT the probability that this particular interval contains $\beta_1$; it is the proportion of such intervals, over repeated samples, that would contain $\beta_1$.

(b) Conduct a t-test to decide whether or not there is a linear association between the number of times a carton is transferred (X) and the number of broken ampules (Y). Use a level of significance of 0.05. State the alternatives, decision rule, and conclusion. What is the P-value of the test?

A.(b)

Hypothesis: $H_0: \beta_1 = 0$ vs $H_a: \beta_1 \neq 0$.
Test statistic: $t^* = (4 - 0)/0.4690 = 8.528$.
Decision rule: reject $H_0$ if $|t^*| > t(1 - 0.05/2;\ 10 - 2) = 2.306$. Since 8.528 > 2.306, we reject $H_0$.
Conclusion: There is statistical evidence of a significant linear relationship between x and y at the 0.05 level of significance.
P-value: $P(|t_8| > |t^*|) = 2P(t_8 > |t^*|) = 2.75 \times 10^{-5}$, which is less than 0.05 (the level of significance). Therefore, we reach the same conclusion as with the decision rule based on the test statistic: the linear relationship between the number of transfers and the number of broken ampules is statistically significant at $\alpha = 0.05$.

(c) Conduct a t-test to decide whether or not there is a POSITIVE linear association between the number of times a carton is transferred (X) and the number of broken ampules (Y). Use a level of significance of 0.05. State the alternatives, decision rule, and conclusion. What is the P-value of the test?

A.(c) Hypothesis: $H_0: \beta_1 = 0$ vs $H_a: \beta_1 > 0$.
Test statistic: $t^* = (4 - 0)/0.4690 = 8.528$.
Decision rule: reject $H_0$ if $t^* > t(1 - 0.05;\ 10 - 2) = 1.860$. We reject $H_0$.
P-value: $P(t_8 > t^*) = 2.75 \times 10^{-5}/2 \approx 1.4 \times 10^{-5}$, which is less than 0.05 (the level of significance). Therefore, we reach the same conclusion as with the decision rule.
Conclusion: There is evidence that the positive linear relationship between the number of transfers and the number of broken ampules is statistically significant at $\alpha = 0.05$.

(d) $\beta_0$ represents here the mean number of ampules broken when no transfers of the shipment are made, i.e., when X = 0. Obtain a 95% confidence interval for $\beta_0$ and interpret it.

A.(d) From the confint output above, the 95% confidence interval for $\beta_0$ is (8.67, 11.73). With confidence coefficient 0.95, we estimate that the mean number of broken ampules when no transfers of the shipment are made is somewhere between 8.67 and 11.73.

(e) A consultant has suggested, on the basis of previous experience, that the mean number of broken ampules should not exceed 9.0 when no transfers are made. Conduct an appropriate test, using $\alpha = 0.025$. State the alternatives, decision rule, and conclusion. What is the P-value of the test?

A.(e) Hypothesis: $H_0: \beta_0 = 9$ vs $H_a: \beta_0 > 9$.
Test statistic: $t^* = (10.2 - 9)/0.6633 = 1.809$.
Decision rule: reject $H_0$ if $t^* > t(1 - 0.025;\ 8) = 2.306$. Since 1.809 < 2.306, we do not reject $H_0$.
Conclusion: There is no statistical evidence that the mean number of broken ampules exceeds 9.0 when no transfers are made.
P-value: $P(t_8 > t^*) = 1 - P(t_8 < 1.809) = 0.054$, which is not less than $\alpha = 0.025$. Hence we reach the same conclusion as with the decision rule.
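
The estimates, intervals, and tests in Problems 1 and 2(a) to 2(e) can be reproduced with a short script. The sketch below is a minimal Python re-computation, not the R code used for this solution. It assumes the data are the Airfreight breakage observations from the course textbook, listed in the code; these values are not given in this solution, but they reproduce the R output above exactly. Because the Python standard library has no Student-t distribution, the hypothetical helpers t_cdf and t_quantile use the closed-form t CDF that holds for an even number of degrees of freedom (here 8) and invert it by bisection.

```python
import math

# ASSUMPTION: Airfreight breakage data from the course textbook. These values are
# not listed in this solution, but they reproduce the lm() output above exactly.
x = [1, 0, 2, 0, 3, 1, 0, 1, 2, 0]          # number of transfers
y = [16, 9, 17, 12, 22, 13, 8, 15, 19, 11]  # number of broken ampules


def t_cdf(t, df):
    """Student-t CDF via the closed form that holds for an even number of df."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    s = t / math.sqrt(df + t * t)
    u = 1.0 - s * s
    term = total = 1.0
    for j in range(1, df // 2):
        term *= (2 * j - 1) / (2 * j) * u   # builds C(2j, j) / 4**j * u**j
        total += term
    return 0.5 + 0.5 * s * total


def t_quantile(p, df):
    """Invert t_cdf by bisection; assumes 0.5 < p < 1."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)
se_b1 = math.sqrt(mse / sxx)
se_b0 = math.sqrt(mse * (1 / n + xbar ** 2 / sxx))

tc = t_quantile(0.975, n - 2)                  # t(0.975; 8), about 2.306
ci_b1 = (b1 - tc * se_b1, b1 + tc * se_b1)     # Problem 2(a)
ci_b0 = (b0 - tc * se_b0, b0 + tc * se_b0)     # Problem 2(d)

t_b1 = b1 / se_b1                              # Problem 2(b) and 2(c)
p_b1 = 2 * (1 - t_cdf(t_b1, n - 2))            # two-sided P-value, 2(b)
t_e = (b0 - 9) / se_b0                         # Problem 2(e): H0 beta0 = 9
p_e = 1 - t_cdf(t_e, n - 2)                    # one-sided P-value, 2(e)

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, MSE = {mse:.4f}")
print(f"95% CI for beta1: ({ci_b1[0]:.4f}, {ci_b1[1]:.4f})")
print(f"95% CI for beta0: ({ci_b0[0]:.4f}, {ci_b0[1]:.4f})")
print(f"t = {t_b1:.3f}, P = {p_b1:.3g};  t(e) = {t_e:.3f}, P(e) = {p_e:.3f}")
```

Run as is, it should print $b_0 = 10.2$, $b_1 = 4$, MSE = 2.2, the same limits as confint(lmfit) up to rounding, $t^* = 8.528$ with P-value 2.75e-05, and $t^* = 1.809$ with a one-sided P-value of about 0.054 for part (e).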

(f) Obtain the power of your test in part (b) if actually $\beta_1 = 2.0$. Assume $\sigma\{b_1\} = 0.50$.

A.1(f) (Case 1) If $\sigma\{b_1\}$ is known, we use the normal distribution:
$$\text{Power} = P(\text{conclude } H_a \mid \beta_1 = 2) = P\left(\left|\frac{b_1 - 0}{0.5}\right| > 1.96 \;\middle|\; \beta_1 = 2\right) = P\left(\frac{b_1 - 2}{0.5} > 1.96 - \frac{2}{0.5}\right) + P\left(\frac{b_1 - 2}{0.5} < -1.96 - \frac{2}{0.5}\right) = 0.9793.$$
(Case 2) If $\sigma\{b_1\}$ is not known, we use the t distribution:
$$\text{Power} = P\left(\left|\frac{b_1 - 0}{s\{b_1\}}\right| > t(0.975;\,8) \;\middle|\; \beta_1 = 2\right) \approx P\left(t_8 > 2.31 - \frac{2}{0.5}\right) + P\left(t_8 < -2.31 - \frac{2}{0.5}\right) = 0.9352 + 0.0001151 = 0.9353.$$

A.2(f) Also obtain the power of your test in part (e) if actually $\beta_0 = 11$. Assume $\sigma\{b_0\} = 0.75$.
(Case 1) If $\sigma\{b_0\}$ is known, we use the normal distribution, with the one-sided critical value $z(0.95) = 1.645$:
$$\text{Power} = P\left(\frac{b_0 - 9}{0.75} > 1.645 \;\middle|\; \beta_0 = 11\right) = P\left(\frac{b_0 - 11}{0.75} > 1.645 - \frac{2}{0.75}\right) = 1 - P(Z < -1.021667) = 1 - 0.1535 = 0.8465.$$
(Case 2) If $\sigma\{b_0\}$ is not known, we use the t distribution, with $t(0.95;\,8) = 1.86$:
$$\text{Power} = P\left(\frac{b_0 - 9}{s\{b_0\}} > 1.86 \;\middle|\; \beta_0 = 11\right) \approx P\left(t_8 > 1.86 - \frac{2}{0.75}\right) = 1 - P(t_8 < -0.8067) = 1 - 0.2216 = 0.7784.$$

Problem# 3. Refer to Airfreight breakage (Problem 1).

(a) Because of changes in airline routes, shipments may have to be transferred more frequently than in the past. Estimate the mean breakage for the following numbers of transfers: X = 2, 4. Use separate 99% confidence intervals. Interpret your results.

A.(a) The 99% CI for $E(Y \mid x_{new})$ is
$$\hat{\beta}_0 + \hat{\beta}_1 x_{new} \pm t_{0.995,\,10-2}\sqrt{\hat{\sigma}^2\left(\frac{1}{10} + \frac{(x_{new} - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}\right)}.$$
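
Both the power calculations in Problem 2(f) and the interval formula in Problem 3(a) reduce to a few lines of arithmetic. The sketch below assumes $\sigma\{b_1\} = 0.50$ and $\sigma\{b_0\} = 0.75$ (the value implied by the term $2/\sigma\{b_0\} = 2.667$ in the power arithmetic), and takes $b_0 = 10.2$, $b_1 = 4$, MSE = 2.2, n = 10 and $\bar{x} = 1$ from the regression output, with $\sum_i (x_i - \bar{x})^2 = SSR/b_1^2 = 160/16 = 10$. As in Case 2 of the text, the t-based power shifts a central t distribution, which approximates the exact noncentral-t power rather than computing it. The $\beta_0$ power is evaluated at $\alpha = 0.05$, which matches the critical values 1.645 and 1.86 used in A.2(f), and at $\alpha = 0.025$, the level of the test in part (e). The helper names are hypothetical; the t CDF uses the closed form for even degrees of freedom, and the normal distribution comes from Python's statistics.NormalDist.

```python
import math
from statistics import NormalDist


def t_cdf(t, df):
    """Student-t CDF via the closed form that holds for an even number of df."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    s = t / math.sqrt(df + t * t)
    u = 1.0 - s * s
    term = total = 1.0
    for j in range(1, df // 2):
        term *= (2 * j - 1) / (2 * j) * u
        total += term
    return 0.5 + 0.5 * s * total


def t_quantile(p, df):
    """Invert t_cdf by bisection; assumes 0.5 < p < 1."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if t_cdf(mid, df) < p else (lo, mid)
    return (lo + hi) / 2


Z = NormalDist()  # standard normal
DF = 8            # n - 2 error degrees of freedom

# Problem 2(f), beta1: two-sided test at alpha = 0.05, true beta1 = 2,
# sigma{b1} = 0.5, so the standardized shift is 2 / 0.5 = 4.
shift1 = 2 / 0.5
z975 = Z.inv_cdf(0.975)
power1_z = (1 - Z.cdf(z975 - shift1)) + Z.cdf(-z975 - shift1)
t975 = t_quantile(0.975, DF)
power1_t = (1 - t_cdf(t975 - shift1, DF)) + t_cdf(-t975 - shift1, DF)


def power_b0(alpha):
    """Problem 2(f), beta0: one-sided test of beta0 = 9, true beta0 = 11,
    sigma{b0} = 0.75 (assumed). Returns (normal power, shifted-t power)."""
    shift0 = 2 / 0.75
    pz = 1 - Z.cdf(Z.inv_cdf(1 - alpha) - shift0)
    pt = 1 - t_cdf(t_quantile(1 - alpha, DF) - shift0, DF)
    return pz, pt


# Problem 3(a): 99% CI for E(Y | x_new) from summary values (Sxx = 160 / 4**2).
B0, B1, MSE, N, XBAR, SXX = 10.2, 4.0, 2.2, 10, 1.0, 10.0
t995 = t_quantile(0.995, DF)


def mean_ci(x_new):
    fit = B0 + B1 * x_new
    half = t995 * math.sqrt(MSE * (1 / N + (x_new - XBAR) ** 2 / SXX))
    return fit - half, fit + half


print(f"beta1 power: normal {power1_z:.4f}, t {power1_t:.4f}")
for a in (0.05, 0.025):
    pz, pt = power_b0(a)
    print(f"beta0 power at alpha={a}: normal {pz:.4f}, t {pt:.4f}")
for xn in (2, 4):
    lo, hi = mean_ci(xn)
    print(f"99% CI for E(Y | x={xn}): ({lo:.2f}, {hi:.2f})")
```

With the exact critical value $t(0.975;\,8) = 2.306$, the t-based power for $\beta_1$ comes out near 0.936, slightly above the 0.9353 obtained with the rounded value 2.31. At $\alpha = 0.025$ the $\beta_0$ power drops to about 0.76 (normal) and 0.64 (t).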

For $x_{new} = 2$ the 99% CI for $E(Y \mid x_{new} = 2)$ is $18.20 \pm 2.23 = (15.97, 20.43)$, and for $x_{new} = 4$ it is $26.20 \pm 4.98 = (21.22, 31.18)$. We conclude with confidence coefficient 0.99 that the mean number of broken ampules when a carton is transferred twice is somewhere between 15.97 and 20.43, and that the mean number when it is transferred four times is somewhere between 21.22 and 31.18. The interval at X = 4 is much wider because 4 lies far from $\bar{x} = 1$ and outside the observed range of X (0 to 3), so it relies on extrapolating the fitted line.

(b) The next shipment will entail two transfers. Obtain a 99% prediction interval for the number of broken ampules for this shipment. Interpret your prediction interval.

A.(b) The 99% PI for $Y_{new}$ at $X_{new}$ is
$$\hat{\beta}_0 + \hat{\beta}_1 X_{new} \pm t_{0.995,\,10-2}\sqrt{\hat{\sigma}^2\left(1 + \frac{1}{10} + \frac{(X_{new} - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}\right)}.$$
For $X_{new} = 2$ the 99% PI is (12.75, 23.65). We predict with confidence coefficient 0.99 that the number of broken ampules in the next shipment, which entails two transfers, will be somewhere between 12.75 and 23.65.

(c) In the next several days, three independent shipments will be made, each entailing two transfers. Obtain a 99% prediction interval for the mean number of ampules broken in the three shipments. Convert this interval into a 99% prediction interval for the total number of ampules broken in the three shipments.

A.(c) The 99% PI for the mean of m new observations at a given $X_{new}$ is
$$\hat{\beta}_0 + \hat{\beta}_1 X_{new} \pm t_{0.995,\,10-2}\sqrt{\hat{\sigma}^2\left(\frac{1}{m} + \frac{1}{10} + \frac{(X_{new} - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}\right)},$$
here with m = 3. The 99% PI for the mean of 3 new observations at $X_{new} = 2$ is (14.57, 21.83). Multiplying the limits by 3, the 99% PI for the total number of ampules broken in the three shipments is $(3 \times 14.57,\ 3 \times 21.83) \approx (43.70, 65.50)$.

(d) Determine the boundary values of the 99% confidence band for the regression line when $X_{new} = 2$ and when $X_{new} = 4$. Is your confidence band wider at these two points than the corresponding confidence intervals in part (a)? Should it be?

A.(d) The $1 - \alpha$ confidence band for the regression line is $\hat{y}_{new} \pm W s\{\hat{y}_{new}\}$, where $W^2 = 2F(1 - \alpha;\ 2,\ 8)$. The 99% confidence band is (15.44, 20.96) at $X_{new} = 2$ and (20.03, 32.37) at $X_{new} = 4$. Yes, the band is wider than the corresponding confidence intervals in part (a), and it should be: the band must cover the entire regression line simultaneously, so its multiplier $W = \sqrt{2F(0.99;\ 2,\ 8)} \approx 4.16$ exceeds $t(0.995;\ 8) \approx 3.355$.
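
The intervals in Problem 3(b) to 3(d) differ only in an extra variance term and in the multiplier, which the sketch below makes explicit. It is a minimal Python illustration, assuming the summary values from the regression output ($b_0 = 10.2$, $b_1 = 4$, MSE = 2.2, n = 10, $\bar{x} = 1$, $\sum_i (x_i - \bar{x})^2 = 10$) and a hard-coded table value $t(0.995;\,8) \approx 3.3554$. The Working-Hotelling multiplier uses the closed form of the F distribution with 2 numerator degrees of freedom, $P(F > f) = (1 + 2f/\nu)^{-\nu/2}$, so $F(0.99;\,2,\,8)$ needs no table. The names interval and f_quantile_df1_2 are hypothetical.

```python
import math

# Summary values from the regression output; Sxx = SSR / b1**2 = 160 / 16 = 10.
B0, B1, MSE, N, XBAR, SXX = 10.2, 4.0, 2.2, 10, 1.0, 10.0
T995 = 3.3554  # t(0.995; 8) from a t table (hard-coded assumption)


def f_quantile_df1_2(p, df2):
    """F(p; 2, df2) from the closed form P(F > f) = (1 + 2 f / df2) ** (-df2 / 2)."""
    return df2 / 2 * ((1 - p) ** (-2 / df2) - 1)


def interval(x_new, extra, mult):
    """fit +/- mult * sqrt(MSE * (extra + 1/n + (x_new - xbar)**2 / Sxx))."""
    fit = B0 + B1 * x_new
    half = mult * math.sqrt(MSE * (extra + 1 / N + (x_new - XBAR) ** 2 / SXX))
    return fit - half, fit + half


pi_single = interval(2, 1.0, T995)                  # 3(b): one new shipment
pi_mean3 = interval(2, 1 / 3, T995)                 # 3(c): mean of m = 3 shipments
pi_total3 = (3 * pi_mean3[0], 3 * pi_mean3[1])      # 3(c): total of 3 shipments
W = math.sqrt(2 * f_quantile_df1_2(0.99, N - 2))    # Working-Hotelling multiplier
band = {xn: interval(xn, 0.0, W) for xn in (2, 4)}  # 3(d)

print("99% PI, one shipment:", [round(v, 2) for v in pi_single])
print("99% PI, mean of 3:", [round(v, 2) for v in pi_mean3])
print("99% PI, total of 3:", [round(v, 2) for v in pi_total3])
print(f"W = {W:.3f}")
for xn, (lo, hi) in band.items():
    print(f"99% band at x={xn}: ({lo:.2f}, {hi:.2f})")
```

The extra term is 1 for a single new observation, 1/m for the mean of m new observations, and 0 for the band; only the band replaces the t multiplier with $W \approx 4.16$, which is why it is wider than the intervals in part (a).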
Problem# 4. Refer to Airfreight breakage (Problem 1).

(a) Set up the ANOVA table. Which elements are additive?

A.(a)

    Analysis of Variance Table

    Response: y
              Df Sum Sq Mean Sq F value    Pr(>F)
    x          1  160.0   160.0  72.727 2.749e-05 ***
    Residuals  8   17.6     2.2
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sums of squares and degrees of freedom are additive, that is, $SSTO = SSR + SSE$ (177.6 = 160.0 + 17.6) and $df_{total} = df_{reg} + df_{error}$ (9 = 1 + 8). The mean squares are not additive.

(b) Conduct an F test to decide whether or not there is a linear association between the number of times a carton is transferred and the number of broken ampules; control the $\alpha$ risk at 0.05. State the alternatives, decision rule, and conclusion.

A.(b) $H_0: y = \beta_0 + \varepsilon$ vs $H_a: y = \beta_0 + \beta_1 x + \varepsilon$, i.e., $H_0: \beta_1 = 0$ vs $H_a: \beta_1 \neq 0$.
Test statistic: $F^* = MSR/MSE = 160/2.2 = 72.73$.
Decision rule: reject $H_0$ if $F^* > F(0.95;\ 1,\ 8) = 5.32$. We reject $H_0$.
Conclusion: There is evidence that the linear association between the number of times a carton is transferred and the number of broken ampules is statistically significant at $\alpha = 0.05$.

(c) Obtain the t statistic for the test in part (b) and demonstrate numerically its equivalence to the F statistic obtained in part (b).

A.(c) $t^* = 8.528$ and $F^* = 72.727$. Hence $F^* = (t^*)^2$, since $8.528^2 = 72.73$.

(d) Calculate $R^2$ and r. What proportion of the variation in Y is accounted for by introducing X into the regression model?

A.(d) Since SSR = 160, SSE = 17.6, and SSTO = 160 + 17.6 = 177.6,
$$R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO} = 0.9009, \qquad R_a^2 = 1 - \frac{SSE/(n-2)}{SSTO/(n-1)} = 0.8885,$$
$$r = \operatorname{cor}(x, y) = +\sqrt{0.9009} = 0.949158,$$
where r is positive because $b_1 > 0$. About 90.1% of the variation in Y is accounted for by introducing X into the regression model.
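
The ANOVA decomposition and the identities used in Problem 4 can be checked directly from the data. The sketch below again assumes the textbook Airfreight breakage values (not listed in this solution, but consistent with the lm() and anova() output above); it recomputes SSR, SSE and SSTO, confirms $F^* = (t^*)^2$, and derives $R^2$, $R_a^2$ and r, taking the sign of r from $b_1$.

```python
import math

# ASSUMPTION: textbook Airfreight breakage data, consistent with the output above.
x = [1, 0, 2, 0, 3, 1, 0, 1, 2, 0]          # number of transfers
y = [16, 9, 17, 12, 22, 13, 8, 15, 19, 11]  # number of broken ampules

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

ssto = sum((yi - ybar) ** 2 for yi in y)                      # total SS, df = n - 1
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))   # error SS, df = n - 2
ssr = ssto - sse                                              # regression SS, df = 1

msr, mse = ssr / 1, sse / (n - 2)
f_stat = msr / mse
t_stat = b1 / math.sqrt(mse / sxx)

r2 = ssr / ssto
r2_adj = 1 - (sse / (n - 2)) / (ssto / (n - 1))
r = math.copysign(math.sqrt(r2), b1)   # sign of r follows the slope b1

print(f"SSR={ssr:.1f} SSE={sse:.1f} SSTO={ssto:.1f}  df: 1 + {n - 2} = {n - 1}")
print(f"F={f_stat:.3f}  t^2={t_stat ** 2:.3f}")
print(f"R^2={r2:.4f}  adj R^2={r2_adj:.4f}  r={r:.6f}")
```

It should print SSR = 160.0, SSE = 17.6 and SSTO = 177.6, F and $t^2$ both equal to 72.727, and $R^2 = 0.9009$, adjusted $R^2 = 0.8885$, r = 0.949158.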