Math 221: Multiple Regression
S. K. Hyde
Chapter 27 (Moore, 5th Ed.)

The following data set contains observations on the fish biomass of 26 streams. The potential regressors from which we wish to explain the biomass ($y$) are the average depth of the stream ($x_1$), the area of instream cover ($x_2$) and the percent of canopy cover ($x_3$).

Obs no.       x1        x2        x3         y
      1   12.980     0.317     9.998    57.702
      2   14.295     2.028     6.776    59.296
      3   15.531     5.305     2.947    56.166
      4   15.133     4.738     4.201    55.767
      5   15.342     7.038     2.053    51.722
      6   17.149     5.982     0.055    60.446
      7   15.462     2.737     4.657    60.715
      8   12.801    10.663     3.048    37.447
      9   17.039     5.132     0.257    60.974
     10   13.172     2.039     8.738    55.270
     11   16.125     2.271     2.101    59.289
     12   14.340     4.077     5.545    54.027
     13   12.923     2.643     9.331    53.199
     14   14.231    10.401     1.041    41.896
     15   15.222     1.220     6.149    63.264
     16   15.740    10.612     1.691    45.798
     17   14.958     4.815     4.111    58.699
     18   14.125     3.153     8.453    50.086
     19   16.391     9.698     1.714    48.890
     20   16.452     3.912     2.145    62.213
     21   13.535     7.625     3.851    45.625
     22   14.199     4.474     5.112    53.923
     23   15.837     5.753     2.087    55.799
     24   16.565     8.546     8.974    56.741
     25   13.322     8.598     4.011    43.145
     26   15.949     8.290     0.248    50.706

Table 1: Fish Biomass data set on 26 streams
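The printout in Figure 1 was produced by SAS. As a rough illustration of what such a fit involves, the sketch below estimates the coefficients from the Table 1 data by solving the normal equations $X'Xb = X'y$ in plain Python. The helper names (`solve`, `fit_ols`) are hypothetical, and this is not the algorithm SAS uses internally. If the table is transcribed correctly, the estimates should be close to those in Figure 1.

```python
# Table 1 rows: (x1 = depth, x2 = instream cover area, x3 = percent canopy cover, y = biomass)
DATA = [
    (12.980, 0.317, 9.998, 57.702), (14.295, 2.028, 6.776, 59.296),
    (15.531, 5.305, 2.947, 56.166), (15.133, 4.738, 4.201, 55.767),
    (15.342, 7.038, 2.053, 51.722), (17.149, 5.982, 0.055, 60.446),
    (15.462, 2.737, 4.657, 60.715), (12.801, 10.663, 3.048, 37.447),
    (17.039, 5.132, 0.257, 60.974), (13.172, 2.039, 8.738, 55.270),
    (16.125, 2.271, 2.101, 59.289), (14.340, 4.077, 5.545, 54.027),
    (12.923, 2.643, 9.331, 53.199), (14.231, 10.401, 1.041, 41.896),
    (15.222, 1.220, 6.149, 63.264), (15.740, 10.612, 1.691, 45.798),
    (14.958, 4.815, 4.111, 58.699), (14.125, 3.153, 8.453, 50.086),
    (16.391, 9.698, 1.714, 48.890), (16.452, 3.912, 2.145, 62.213),
    (13.535, 7.625, 3.851, 45.625), (14.199, 4.474, 5.112, 53.923),
    (15.837, 5.753, 2.087, 55.799), (16.565, 8.546, 8.974, 56.741),
    (13.322, 8.598, 4.011, 43.145), (15.949, 8.290, 0.248, 50.706),
]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Move the row with the largest entry in this column into the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(data):
    """Return [b0, b1, b2, b3] solving the normal equations X'X b = X'y."""
    rows = [(1.0, x1, x2, x3) for x1, x2, x3, _ in data]  # leading 1.0 is the intercept column
    ys = [row[3] for row in data]
    p = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


coef = fit_ols(DATA)
residuals = [y - (coef[0] + coef[1] * x1 + coef[2] * x2 + coef[3] * x3)
             for x1, x2, x3, y in DATA]
print("estimates (b0, b1, b2, b3):", [round(b, 6) for b in coef])
```

Solving the normal equations directly is adequate for a small, well-behaved data set like this one; statistical software typically uses more numerically stable decompositions.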
Dependent Variable: Y (fish biomass)

Analysis of Variance

                     Sum of         Mean
Source      DF      Squares       Square    F Value    Prob>F
Model        3   1062.33881    354.11294    109.675    0.0001
Error       22     71.03239      3.22875
C Total     25   1133.37120

Root MSE     1.79687    R-square    0.9373
Dep Mean    53.80019    Adj R-sq    0.9288
C.V.         3.33990

Parameter Estimates

                 Parameter      Standard    T for H0:
Variable   DF     Estimate         Error    Parameter=0    Prob > |T|
INTERCEP    1     8.205460    6.29105860          1.304        0.2056
X1          1     3.560581    0.36122443          9.857        0.0001
X2          1    -1.640143    0.15952144        -10.282        0.0001
X3          1     0.334277    0.17827217          1.875        0.0741

Figure 1: SAS Multiple Regression printout
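Most of the summary entries on the printout are simple functions of the sums of squares and degrees of freedom. The sketch below recomputes them from the values in Figure 1 as a consistency check. The variable names are illustrative, not SAS names.

```python
import math

# Values copied from the Analysis of Variance table in Figure 1.
n, k = 26, 3                      # observations, independent variables
ss_model, ss_error, ss_total = 1062.33881, 71.03239, 1133.37120
dep_mean = 53.80019               # mean of the dependent variable

df_model, df_error, df_total = k, n - k - 1, n - 1

ms_model = ss_model / df_model    # mean square for the model
ms_error = ss_error / df_error    # mean square error (MSE)
f_stat = ms_model / ms_error      # overall F statistic
root_mse = math.sqrt(ms_error)    # Root MSE
r2 = ss_model / ss_total          # R-square
adj_r2 = 1 - ms_error / (ss_total / df_total)  # Adj R-sq
cv = 100 * root_mse / dep_mean    # C.V., in percent

print(f"F = {f_stat:.3f}, R-square = {r2:.4f}, Adj R-sq = {adj_r2:.4f}")
print(f"Root MSE = {root_mse:.5f}, C.V. = {cv:.5f}")
```

The recomputed values agree with the printout to the digits shown there.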
1. Find a 99% confidence interval for the coefficient of the canopy cover variable.

2. Test the hypothesis of no multiple linear relationship (i.e., the overall significance of the multiple linear regression equation).

   F =

3. How well does the multiple linear regression equation predict the data?

4. Test the hypothesis that the coefficient of $x_1$ is zero.

   t =
5. Test the hypothesis that the coefficient of $x_2$ is zero.

   t =

6. Test the hypothesis that the coefficient of $x_3$ is zero.

   t =

7. Predict the fish biomass for a stream depth of 5, an instream cover area of 3, and a canopy cover of 42 percent.
1. Find a 99% confidence interval for the coefficient of the canopy cover variable.

   The degrees of freedom for the critical value can be found either by reading the degrees of freedom for ERROR on the printout (df = 22) or by computing $df = n - k - 1 = 26 - 3 - 1 = 22$, where $k$ is the number of independent variables. The critical value found in the t-table is $t^* = 2.819$. The estimate and the standard error of the estimate for canopy cover can be found on the printout. They are 0.334277 and 0.17827217, respectively. Hence, a 99% confidence interval for the coefficient of canopy cover is

   $$b_3 \pm m = b_3 \pm t^* \, SE_{b_3} = 0.334277 \pm 2.819(0.17827217) = (-0.16827,\ 0.83683)$$

2. Test the hypothesis of no multiple linear relationship (i.e., the overall significance of the multiple linear regression equation).

   The test statistic and P-value for this test can be found on the printout in the Analysis of Variance (ANOVA) section. The F statistic is 109.675, and its P-value is given to the right of it in the table.

   $H_0$: The model is not significant
   $H_a$: The model is significant
   $\alpha = 0.05$
   $F = 109.675$
   $P = 0.0001$
   Decision: Reject $H_0$

   There is a significant multiple regression relationship between fish biomass and the depth of the stream, the area of instream cover, and the percent of canopy cover. Jointly, the three variables explain a significant amount of the variability in the fish biomass.

3. How well does the multiple linear regression equation predict the data?

   Since $R^2 = 0.9373$, the regression explains about 93.7% of the variability in fish biomass, so it can be concluded that the equation fits well. The adjusted $R^2$ value (0.9288) is not very different.

4. Test the hypothesis that the coefficient of $x_1$ is zero.

   The test statistic and P-value for this test can be found on the line labeled X1 in the Parameter Estimates table of the printout. The test statistic is 9.857 with a P-value of 0.0001.
   The test of hypothesis proceeds as follows:

   $H_0$: $\beta_1 = 0$
   $H_a$: $\beta_1 \neq 0$
   $\alpha = 0.05$
   $t = 9.857$
   $P = 0.0001$
   Decision: Reject $H_0$

   There is sufficient evidence to indicate that the coefficient for depth of the stream is different from zero.
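The confidence interval and the coefficient t tests both come from the same two printout columns, the estimate and its standard error. The sketch below recomputes them under two stated assumptions: the function names are hypothetical, and the tests use the critical-value approach with $t^* = 2.074$ (the standard t-table value for df = 22 and two-sided $\alpha = 0.05$, which is not quoted in the handout) rather than the P-value approach used in the worked solutions. Both approaches lead to the same decisions.

```python
# (estimate, standard error) pairs copied from the Parameter Estimates table in Figure 1.
COEFS = {
    "INTERCEP": (8.205460, 6.29105860),
    "X1": (3.560581, 0.36122443),
    "X2": (-1.640143, 0.15952144),
    "X3": (0.334277, 0.17827217),
}

T_CRIT_99 = 2.819  # df = 22, 99% confidence (value used in the handout)
T_CRIT_95 = 2.074  # df = 22, two-sided alpha = 0.05 (assumed t-table value, not in the handout)


def t_statistic(estimate, std_error):
    """t statistic for H0: coefficient = 0."""
    return estimate / std_error


def confidence_interval(estimate, std_error, t_crit):
    """Return (lower, upper) as estimate -/+ t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin


def reject_null(estimate, std_error, t_crit):
    """Two-sided test: reject H0 when |t| exceeds the critical value."""
    return abs(t_statistic(estimate, std_error)) > t_crit


ci_x3 = confidence_interval(*COEFS["X3"], T_CRIT_99)
print(f"99% CI for X3 coefficient: ({ci_x3[0]:.5f}, {ci_x3[1]:.5f})")

for name, (est, se) in COEFS.items():
    decision = "Reject H0" if reject_null(est, se, T_CRIT_95) else "Fail to reject H0"
    print(f"{name}: t = {t_statistic(est, se):.3f} -> {decision}")
```

Note that the 99% interval for the canopy cover coefficient contains zero, which is consistent with failing to reject $H_0$ for $x_3$.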
5. Test the hypothesis that the coefficient of $x_2$ is zero.

   The test statistic and P-value for this test can be found on the line labeled X2 in the Parameter Estimates table of the printout. The test statistic is -10.282 with a P-value of 0.0001. The test of hypothesis proceeds as follows:

   $H_0$: $\beta_2 = 0$
   $H_a$: $\beta_2 \neq 0$
   $\alpha = 0.05$
   $t = -10.282$
   $P = 0.0001$
   Decision: Reject $H_0$

   There is sufficient evidence to indicate that the coefficient for area of instream cover is different from zero.

6. Test the hypothesis that the coefficient of $x_3$ is zero.

   The test statistic and P-value for this test can be found on the line labeled X3 in the Parameter Estimates table of the printout. The test statistic is 1.875 with a P-value of 0.0741. The test of hypothesis proceeds as follows:

   $H_0$: $\beta_3 = 0$
   $H_a$: $\beta_3 \neq 0$
   $\alpha = 0.05$
   $t = 1.875$
   $P = 0.0741$
   Decision: Fail to reject $H_0$

   There is not sufficient evidence to indicate that the coefficient for percent of canopy cover is different from zero.

7. Predict the fish biomass for a stream depth of 5, an instream cover area of 3, and a canopy cover of 42 percent.

   $$\hat{y} = 8.20546 + 3.560581(5) - 1.640143(3) + 0.334277(42) = 35.12757$$
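The prediction is just the fitted equation evaluated at the requested values. The sketch below repeats that arithmetic and, as an extra check that is not part of the handout's solution, flags any input that lies outside the range observed in Table 1. The function names and the range check are illustrative assumptions.

```python
# Coefficient estimates copied from Figure 1.
B0, B1, B2, B3 = 8.205460, 3.560581, -1.640143, 0.334277

# Observed (min, max) of each regressor in Table 1.
OBSERVED_RANGE = {
    "depth": (12.801, 17.149),
    "cover_area": (0.317, 10.663),
    "canopy_pct": (0.055, 9.998),
}


def predict_biomass(depth, cover_area, canopy_pct):
    """Evaluate the fitted regression equation at the given regressor values."""
    return B0 + B1 * depth + B2 * cover_area + B3 * canopy_pct


def outside_observed_range(**inputs):
    """Return the names of inputs that fall outside the range seen in Table 1."""
    return [name for name, value in inputs.items()
            if not OBSERVED_RANGE[name][0] <= value <= OBSERVED_RANGE[name][1]]


y_hat = predict_biomass(depth=5, cover_area=3, canopy_pct=42)
flagged = outside_observed_range(depth=5, cover_area=3, canopy_pct=42)

print(f"predicted biomass: {y_hat:.5f}")
print("inputs outside the observed data range:", flagged)
```

Here the depth of 5 and the canopy cover of 42 percent both fall outside the values observed in the 26 streams, so the prediction is an extrapolation and should be read with caution.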