Math 221: Linear Regression and Prediction Intervals
S. K. Hyde
Chapter 23 (Moore, 5th Ed.)

(Neter, Kutner, Nachtsheim, and Wasserman) The Toluca Company manufactures refrigeration equipment as well as many replacement parts. In the past, one of the replacement parts had been produced periodically in lots of varying sizes. When a cost improvement program was undertaken, company officials wished to determine the optimum lot size for producing this part. The production of this part involves setting up the production process (which must be done no matter what the lot size is) and machining and assembly operations. One key input for the model to ascertain the optimum lot size was the relationship between lot size and the labor hours required to produce the lot. To determine this relationship, data on lot size and work hours for 25 recent production runs were used. The production conditions were stable during the six-month period in which the 25 runs were made and were expected to remain the same during the next three years, the planning period for which the cost improvement program was being conducted. The output from SAS is below.

Lot  Hours    Lot  Hours    Lot  Hours    Lot  Hours
 80   399      80   352     110   435      90   468
 30   121     100   353     100   420      40   244
 50   221      50   157      30   212      80   342
 90   376      40   160      50   268      70   323
 70   361      70   252      90   377      60   224
 90   389     110   421     120   546      20   113
 30   273
Table 1: Data on Lot Size and Work Hours

Toluca Company
The REG Procedure
Model: MODEL1
Dependent Variable: y Work Hrs

Analysis of Variance
                          Sum of         Mean
Source            DF     Squares       Square    F Value    Pr > F
Model              1      252378       252378     105.88    <.0001
Error             23       54825   2383.71562
Corrected Total   24      307203

Root MSE           48.82331    R-Square    0.8215
Dependent Mean    312.28000    Adj R-Sq    0.8138
Coeff Var          15.63447

Parameter Estimates
                                Parameter    Standard
Variable    Label        DF     Estimate       Error    t Value    Pr > |t|
Intercept   Intercept     1     62.36586    26.17743       2.38      0.0259
x           Lot Size      1      3.57020     0.34697      10.29      <.0001

Obs      yhat      stdi     lower     upper
  9   419.386   50.8666   332.207   506.565
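The main entries of the printout can be reproduced directly from Table 1. The following is a minimal sketch in Python using only the standard library; `fit_line` is a hypothetical helper name, and it applies the usual least-squares formulas (slope = Sxy/Sxx, intercept = ȳ − b·x̄, Root MSE = √(SSE/(n − 2))) rather than anything taken from SAS itself.

```python
import math

# Table 1: (lot size, work hours) for the 25 production runs, read row by row.
lot = [80, 80, 110, 90, 30, 100, 100, 40, 50, 50, 30, 80, 90,
       40, 50, 70, 70, 70, 90, 60, 90, 110, 120, 20, 30]
hours = [399, 352, 435, 468, 121, 353, 420, 244, 221, 157, 212, 342, 376,
         160, 268, 323, 361, 252, 377, 224, 389, 421, 546, 113, 273]


def fit_line(x, y):
    """Least-squares fit of y = a + b*x plus the summary numbers the printout reports."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope estimate
    a = y_bar - b * x_bar              # intercept estimate
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # Error sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)                     # Corrected Total sum of squares
    mse = sse / (n - 2)                # Error mean square, df = n - 2
    s = math.sqrt(mse)                 # Root MSE
    se_b = s / math.sqrt(sxx)          # standard error of the slope
    return {
        "a": a,
        "b": b,
        "s": s,
        "se_a": s * math.sqrt(1 / n + x_bar ** 2 / sxx),  # standard error of the intercept
        "se_b": se_b,
        "t_b": b / se_b,               # t value for the slope
        "F": (sst - sse) / mse,        # Model SS / MSE (1 model df)
        "r_squared": 1 - sse / sst,
    }


fit = fit_line(lot, hours)
for key, value in fit.items():
    print(f"{key:>10}: {value:.5f}")
```

Each value should agree with the printout after rounding, for example a slope near 3.57020 with standard error near 0.34697, Root MSE near 48.82331, and R-Square near 0.8215.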
1. Find a 99% confidence interval for the true regression slope.

2. Test the hypothesis of no linear relationship.
   $H_0$:
   $H_a$:
   $\alpha$ =
   Test Statistic:
   P-value:
   Decision (circle one): Reject $H_0$ or Fail to Reject $H_0$
   Conclusion:

3. Predict the number of work hours needed for a lot size of 100.
4. Find a 90% prediction interval for the work hours needed for a lot size of 100.

5. Find a 99% prediction interval for the work hours needed for a lot size of 100.
Solution:

1. Find a 99% confidence interval for the true regression slope.

The degrees of freedom for the critical value can be read from the Error row of the printout (df = 23) or computed from the formula $df = n - 2 = 25 - 2 = 23$. The critical value from the t-table is then $t^* = 2.807$. The estimate of the true regression slope and its standard error are on the printout: $b = 3.57020$ and $SE_b = 0.34697$. Hence, a 99% confidence interval for the true regression slope is

$$b \pm t^* SE_b = 3.57020 \pm 2.807(0.34697) = (2.5963,\ 4.5441)$$

2. Test the hypothesis of no linear relationship.

This test is easy to perform because it is the same as the test that the true regression slope is zero. Hence,

$H_0$: $\beta = 0$
$H_a$: $\beta \neq 0$
$\alpha = 0.05$
Test Statistic: $t = b / SE_b = 3.57020 / 0.34697 = 10.29$ (the t Value in the x row of the printout)
P-value: P-value < 0.0001
Decision: Since the P-value is less than 0.05, reject $H_0$.
Conclusion: There is a significant linear relationship between the lot size and the number of work hours.

3. Predict the number of work hours needed for a lot size of 100.

The predicted number of work hours is

$$\hat{y} = 62.36586 + 3.57020(100) = 419.38586$$

Note that the printout also gives this prediction, under the label yhat. The $\hat{y}$ shown is for observation 9, the ordered pair (100, 353), whose lot size is 100.

4. Find a 90% prediction interval for the work hours needed for a lot size of 100.

The easiest way to get a prediction interval is to have the computer compute it. I had the computer compute a 90% prediction interval; its limits are the lower and upper columns of the printout. The prediction interval for $y$ when $x$ is 100 is

$$332.207 < y_{x=100} < 506.565$$

To compute it from the data, first compute the standard error of $\hat{y}$. This requires $\bar{x}$, $\sum x$, and $\sum x^2$, which are $\bar{x} = 70$, $\sum x = 1750$, and $\sum x^2 = 142300$. The standard error for the prediction of an individual $y$ at the lot size of interest, $x^* = 100$, is

$$SE_{\hat{y}} = s\sqrt{1 + \frac{1}{n} + \frac{n(x^* - \bar{x})^2}{n\sum x^2 - \left(\sum x\right)^2}} = 48.82331\sqrt{1 + \frac{1}{25} + \frac{25(100 - 70)^2}{25(142300) - (1750)^2}} = 50.8666387$$

Hence a 90% prediction interval for $y$ when $x = 100$ is

$$\hat{y} \pm t^* SE_{\hat{y}} = 419.38586 \pm 1.714(50.8666387) = (332.20044,\ 506.57128)$$

Here $t^* = 1.714$ is the t-table critical value for 90% confidence with 23 degrees of freedom. The computer's prediction interval is slightly more accurate because it uses an unrounded critical value instead of the three-decimal table value. Notice that the number 50.8666387 is also given on the printout, under the label stdi.

5. Find a 99% prediction interval for the work hours needed for a lot size of 100.

This only requires changing the critical value. For 99% confidence with 23 degrees of freedom, $t^* = 2.807$. Hence,

$$\hat{y} \pm t^* SE_{\hat{y}} = 419.38586 \pm 2.807(50.8666387) = (276.60,\ 562.17)$$
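The arithmetic in solutions 1 and 2 can be checked with a few lines of Python. This is a minimal sketch that takes the slope estimate and its standard error from the printout and the t-table value 2.807 from the worked solution; `slope_inference` is a hypothetical helper name, and no statistics library is assumed.

```python
def slope_inference(b, se_b, t_star):
    """Return the interval b ± t* SE_b and the t statistic for H0: beta = 0."""
    margin = t_star * se_b             # margin of error
    return (b - margin, b + margin), b / se_b


# Values read from the Parameter Estimates table (row x, Lot Size).
b, se_b = 3.57020, 0.34697
t_star_99 = 2.807  # t-table critical value for 99% confidence, df = n - 2 = 23

(lower, upper), t_stat = slope_inference(b, se_b, t_star_99)
print(f"99% CI for the slope: ({lower:.4f}, {upper:.4f})")
print(f"t statistic for H0: beta = 0: {t_stat:.2f}")
```

Running it reproduces the interval (2.5963, 4.5441) and the test statistic t = 10.29.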
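Solutions 4 and 5 can also be reproduced in code, including the small gap between the hand calculation and the printout. The sketch below uses only the Python standard library: it evaluates the standard-error formula from the summary sums and builds each interval twice, once with the t-table values 1.714 and 2.807 and once with critical values computed numerically. The helpers `t_cdf`, `t_critical`, and `prediction_interval` are hypothetical names; the numerical quantile (Simpson's rule on the t density plus bisection) is an assumed stand-in for a t table or a library routine such as SciPy's `scipy.stats.t.ppf`, and is not claimed to be how SAS computes its critical values.

```python
import math


def t_cdf(t, df, steps=2000):
    """P(T <= t) for t >= 0: 0.5 plus Simpson's rule on the t density over [0, t]."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = t / steps                      # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(t)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return 0.5 + total * h / 3


def t_critical(confidence, df):
    """Two-sided critical value t* with P(-t* < T < t*) = confidence, found by bisection."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 10.0                 # bracket wide enough for the levels used here
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def prediction_interval(x_star, n, sum_x, sum_x2, a, b, s, t_star):
    """Prediction interval for an individual y at x = x_star, from summary sums."""
    x_bar = sum_x / n
    y_hat = a + b * x_star
    se_y = s * math.sqrt(1 + 1 / n + n * (x_star - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2))
    return y_hat, se_y, (y_hat - t_star * se_y, y_hat + t_star * se_y)


# Inputs taken from the printout and the worked solution.
n, sum_x, sum_x2 = 25, 1750, 142300
a, b, s = 62.36586, 3.57020, 48.82331
df = n - 2

results = {}
for conf, table_t in [(0.90, 1.714), (0.99, 2.807)]:
    exact_t = t_critical(conf, df)
    y_hat, se_y, table_pi = prediction_interval(100, n, sum_x, sum_x2, a, b, s, table_t)
    _, _, exact_pi = prediction_interval(100, n, sum_x, sum_x2, a, b, s, exact_t)
    results[conf] = (exact_t, table_pi, exact_pi)
    print(f"{conf:.0%}: yhat = {y_hat:.5f}, SE = {se_y:.7f}")
    print(f"   table t* = {table_t}: ({table_pi[0]:.5f}, {table_pi[1]:.5f})")
    print(f"   exact t* = {exact_t:.4f}: ({exact_pi[0]:.3f}, {exact_pi[1]:.3f})")
```

With the unrounded 90% critical value (about 1.7139), the interval agrees with the printout's lower and upper limits to within about 0.001, which is consistent with the remark that the two answers differ only through the critical value.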