STAT212_E3
KING FAHD UNIVERSITY OF PETROLEUM & MINERALS
DEPARTMENT OF MATHEMATICS & STATISTICS
Term 171

STAT 212: BUSINESS STATISTICS II
Third Exam
Tuesday Dec 12, 2017 @ 6:00 PM

Name:                    ID #:                    Serial#:
Section:   1   2

Question No   Full Marks   Marks Obtained
Q1            15
Q2            35
Q3            10
Total         60

Directions:
1. You must show all your work to obtain full credit.
2. Round your answers to 4 decimal places.
3. You are allowed to use electronic calculators and other reasonable writing accessories that help write the exam.
Question One

An auditor for a county government would like to develop a model to predict county taxes based on the age of single-family houses. She selects a random sample of 19 single-family houses. Use the computer outputs to answer the following questions.

A. The auditor wants to fit a quadratic regression model
   $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \epsilon_i$

1. (4 Marks) Based on the scatter plot, what are the expected signs for the coefficients of $x$ and $x^2$? Explain.

2. State the quadratic regression equation and interpret the coefficients of $x$ and $x^2$.
   (2 Marks) The fitted equation:
   (2 Marks) Interpretation for the coefficient of $x$:
   (2 Marks) Interpretation for the coefficient of $x^2$:

B. (5 Marks) The auditor tried different models: a linear model, a quadratic model, a model with a transformation of x, and a model with a transformation of y. Which model(s) can be considered in addition to the quadratic model? Justify your answer.
Regression Analysis: Taxes versus Age
Taxes = 756 - 10.3 Age

Predictor   Coef      SE Coef   T       P
Constant    756.26    30.41     24.87   0.000
Age         -10.274   1.233     -8.33   0.000

S = 82.0567   R-Sq = 80.3%   R-Sq(adj) = 79.2%

Regression Analysis: Taxes versus Age^2
Taxes = 661 - 0.170 Age^2

Predictor   Coef       SE Coef   T       P
Constant    660.71     34.93     18.92   0.000
Age^2       -0.17012   0.03526   -4.82   0.000

S = 120.180   R-Sq = 57.8%   R-Sq(adj) = 55.3%

Regression Analysis: Taxes versus Age, Age^2
Taxes = 858 - 24.7 Age + 0.294 Age^2

Predictor   Coef      SE Coef   T       P
Constant    857.59    25.19     34.05   0.000
Age         -24.722   2.624     -9.42   0.000
Age^2       0.29354   0.05122   5.73    0.000

S = 48.4076   R-Sq = 93.6%   R-Sq(adj) = 92.7%

Regression Analysis: (1/taxes) versus Age
(1/taxes) = 0.00123 + 0.000038 Age

Predictor   Coef         SE Coef      T       P
Constant    0.00123381   0.00004632   26.63   0.000
Age         0.00003814   0.00000188   20.30   0.000

S = 0.000124981   R-Sq = 96.0%   R-Sq(adj) = 95.8%

Regression Analysis: Taxes versus Sqrt(1/age)
Taxes = 304 + 761 Sqrt(1/age)

Predictor     Coef     SE Coef   T       P
Constant      304.25   27.79     10.95   0.000
Sqrt(1/age)   761.05   70.21     10.84   0.000

S = 65.7661   R-Sq = 87.4%   R-Sq(adj) = 86.6%
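As an illustration of how the printed quadratic fit can be used, the following minimal sketch evaluates the fitted equation from the "Taxes versus Age, Age^2" output and locates the age at which the fitted parabola turns. The function names are hypothetical, and the coefficients are simply copied from the printout; this is a sketch for checking arithmetic, not a model answer.

```python
# Coefficients copied from the "Taxes versus Age, Age^2" output.
B0 = 857.59    # Constant
B1 = -24.722   # Age
B2 = 0.29354   # Age^2


def predicted_taxes(age):
    """Evaluate the fitted quadratic at a given house age (hypothetical helper)."""
    return B0 + B1 * age + B2 * age ** 2


def turning_point_age():
    """Age at which the fitted parabola stops decreasing: -b1 / (2 * b2)."""
    return -B1 / (2 * B2)


if __name__ == "__main__":
    print(round(predicted_taxes(10), 4))
    print(round(turning_point_age(), 4))
```

Because the Age^2 coefficient is positive, the fitted curve decreases up to the turning point and increases after it; whether that turning point lies inside the sampled range of ages cannot be read from the output alone.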
Question Two

Crazy Dave, a well-known baseball analyst, wants to determine which variables are important in predicting a team's wins in a given season. He has collected data on Wins, earned run average (E.R.A.), Runs Scored, Hits Allowed, Walks Allowed, Saves, Errors, and League (0 = American, 1 = National) for the 2009 season. He wants to develop a model to predict the number of wins. Use the computer outputs to answer the following:

Correlations: Wins, E.R.A., Runs Scored, Hits Allowed, Walks Allowed, Saves, Errors, League
(each entry is a correlation, with its p-value on the line beneath)

                Wins     E.R.A.   Runs Scored   Hits Allowed   Walks Allowed   Saves    Errors
E.R.A.          -0.637
                0.000
Runs Scored     0.606    0.067
                0.000    0.725
Hits Allowed    -0.525   0.875    0.168
                0.003    0.000    0.375
Walks Allowed   -0.388   0.276    -0.312        -0.100
                0.034    0.140    0.093         0.598
Saves           0.784    -0.437   0.317         -0.440         -0.229
                0.000    0.016    0.088         0.015          0.224
Errors          -0.405   0.181    -0.255        -0.003         0.283           -0.144
                0.026    0.339    0.174         0.988          0.130           0.447
A. Based on the matrix plot and the correlation matrix for all the variables:

1. (3 Marks) Which independent variable(s) is/are not linearly correlated with the number of Wins at α = 3%? Explain.

2. (3 Marks) Can we conclude that there is a NONLINEAR relationship between the number of Wins and Saves? Explain.

3. (3 Marks) If the forward selection method is used, which variable will be included in step 1? Explain.
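The screening that parts 1 and 3 describe can be sketched in a few lines. The values below are copied from the Wins column of the printed correlation matrix (League does not appear in the printed rows, so it is left out). The sketch assumes the usual textbook rule that forward selection first enters the predictor with the largest absolute correlation with the response; the variable names in the code are hypothetical.

```python
ALPHA = 0.03  # significance level stated in part 1

# (correlation with Wins, p-value), copied from the printed correlation matrix.
wins_corr = {
    "E.R.A.": (-0.637, 0.000),
    "Runs Scored": (0.606, 0.000),
    "Hits Allowed": (-0.525, 0.003),
    "Walks Allowed": (-0.388, 0.034),
    "Saves": (0.784, 0.000),
    "Errors": (-0.405, 0.026),
}

# Predictors whose correlation with Wins is not significant at ALPHA.
not_significant = [name for name, (r, p) in wins_corr.items() if p > ALPHA]

# Assumed forward-selection rule: enter the largest |r| with the response first.
first_entered = max(wins_corr, key=lambda name: abs(wins_corr[name][0]))

if __name__ == "__main__":
    print(not_significant)
    print(first_entered)
```

A correlation p-value only tests for a linear association, so it does not by itself settle the question in part 2; that part relies on the matrix plot.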
Regression Analysis: Wins versus E.R.A., Runs Scored,...
Wins = 77.4 - 10.2 E.R.A. + 0.0717 Runs Scored - 0.0147 Hits Allowed + 0.0000 Walks Allowed + 0.669 Saves - 0.115 Errors - 0.86 League

Predictor       Coef       SE Coef    T       P       VIF
Constant        77.39      24.60      3.15    0.005
E.R.A.          -10.213    3.756      -2.72   0.013   12.107
Runs Scored     0.071691   0.007876   9.10    0.000   1.580
Hits Allowed    -0.01469   0.01965    -0.75   0.463   11.577
Walks Allowed   0.00000    0.01827    0.00    1.000   3.004
Saves           0.66940    0.08813    7.60    0.000   1.674
Errors          -0.11500   0.03313    -3.47   0.002   1.232
League          -0.865     1.128      -0.77   0.452   1.513

S = 2.50672   R-Sq = 96.3%   R-Sq(adj) = ____%

Analysis of Variance
Source           DF   SS        MS       F       P
Regression       7    3642.73   520.39   82.82   0.000
Residual Error   22   138.24    6.28
Total            29   3780.97

B. (20 Marks) Based on the output Regression Analysis: Wins versus E.R.A., Runs Scored,..., answer the following questions:

1. (2 Marks) Which variable is the most important variable in explaining the number of Wins? Explain.

2. (3 Marks) Predict the number of Wins for the Team = San Diego.
3. (2 Marks) Interpret the coefficient for League.

4. (3 Marks) Compute and interpret the adjusted $R^2$.

5. (3 Marks) Write the hypotheses for testing the overall significance of the regression model, then state your conclusion.

6. (4 Marks) At the 5% level of significance, which variable(s) affect the number of wins? Explain.

7. (3 Marks) Is there reason to suspect the existence of collinearity? If yes, explain how to remove the collinearity from the model.
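Several quantities asked for above follow from the printed Analysis of Variance table by standard formulas. The sketch below is one way to check the arithmetic; it assumes n = 30 teams (Total DF 29 plus 1) and k = 7 predictors, uses the standard definitions adjusted R² = 1 - [SSE/(n-k-1)] / [SST/(n-1)] and VIF_j = 1/(1 - R_j²), and the helper name is hypothetical.

```python
# Values copied from the printed Analysis of Variance table.
SSR, SSE, SST = 3642.73, 138.24, 3780.97
K = 7    # number of predictors (Regression DF)
N = 30   # assumed sample size: Total DF (29) + 1

msr = SSR / K                # mean square regression
mse = SSE / (N - K - 1)      # mean square error
f_stat = msr / mse           # overall F statistic

r2 = SSR / SST
adj_r2 = 1 - (SSE / (N - K - 1)) / (SST / (N - 1))


def aux_r2_from_vif(vif):
    """R^2 of a predictor regressed on the other predictors, implied by its VIF."""
    return 1 - 1 / vif


if __name__ == "__main__":
    print(round(f_stat, 2), round(r2, 4), round(adj_r2, 4))
    # VIF printed for E.R.A. in the regression output.
    print(round(aux_r2_from_vif(12.107), 4))
```

The F value recomputed this way agrees with the printed 82.82 up to rounding of the mean squares.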
C. (2 Marks each = 6 Marks) Based on the output Best Subsets Regression: Wins versus E.R.A., Runs Scored,..., what are the best models based on the:

1. Adjusted R-square criterion?

2. Mallows Cp criterion?

3. Standard error?

Best Subsets Regression: Wins versus E.R.A., Runs Scored,...
Response is Wins

Predictor columns in the output, in order: E.R.A., Runs Scored, Hits Allowed, Walks Allowed, Saves, Errors, League. (The alignment of the X marks under these columns is not preserved here; only the number of marks per row is shown.)

#    Vars   R-Sq   R-Sq(adj)   Mallows Cp   S        Predictors included
1    1      61.4   60.0        206.2        7.2183   X
2    1      40.6   38.5        331.6        8.9579   X
3    2      82.9   81.6        78.9         4.8929   X X
4    2      77.2   75.6        112.9        5.6449   X X
5    3      94.3   93.6        12.6         2.8908   X X X
6    3      88.8   87.5        45.6         4.0413   X X X
7    4      96.0   95.4        3.8          2.4443   X X X X
8    4      94.3   93.4        14.1         2.9282   X X X X
9    5      96.2   95.5        4.7          2.4353   X X X X X
10   5      96.1   95.3        5.2          2.4663   X X X X X
11   6      96.3   95.4        6.0          2.4516   X X X X X X
12   6      96.3   95.3        6.6          2.4826   X X X X X X
13   7      96.3   95.2        8.0          2.5067   X X X X X X X
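The three criteria in part C can be applied mechanically to the printed table. The sketch below copies the table rows and, as an assumption, uses one common textbook reading of the Mallows Cp criterion: a model is a candidate when Cp is less than or equal to k + 1, where k is the number of predictors. Other texts phrase the rule slightly differently, so treat the threshold as illustrative.

```python
# (row #, Vars, R-Sq, R-Sq(adj), Mallows Cp, S), copied from the printed table.
models = [
    (1, 1, 61.4, 60.0, 206.2, 7.2183),
    (2, 1, 40.6, 38.5, 331.6, 8.9579),
    (3, 2, 82.9, 81.6, 78.9, 4.8929),
    (4, 2, 77.2, 75.6, 112.9, 5.6449),
    (5, 3, 94.3, 93.6, 12.6, 2.8908),
    (6, 3, 88.8, 87.5, 45.6, 4.0413),
    (7, 4, 96.0, 95.4, 3.8, 2.4443),
    (8, 4, 94.3, 93.4, 14.1, 2.9282),
    (9, 5, 96.2, 95.5, 4.7, 2.4353),
    (10, 5, 96.1, 95.3, 5.2, 2.4663),
    (11, 6, 96.3, 95.4, 6.0, 2.4516),
    (12, 6, 96.3, 95.3, 6.6, 2.4826),
    (13, 7, 96.3, 95.2, 8.0, 2.5067),
]

# Highest adjusted R-square and smallest standard error S.
best_by_adj_r2 = max(models, key=lambda m: m[3])[0]
best_by_s = min(models, key=lambda m: m[5])[0]

# Assumed Cp rule: keep models with Cp <= k + 1 (k = number of predictors).
cp_candidates = [m[0] for m in models if m[4] <= m[1] + 1]

if __name__ == "__main__":
    print(best_by_adj_r2, best_by_s, cp_candidates)
```

Among models that pass the Cp screen, the usual preference is for the one with the fewest predictors, which is a judgment the code leaves to the reader.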
Question Three

I. (2 Marks) For the following regression equation
   $\hat{y} = 12 - 2x_1 + 5x_2 - 3x_1 x_2$
   a unit increase in ______, while holding ______ constant at a value of 2, decreases the value of y on average by: ______.

II. (2 Marks) If the Durbin-Watson statistic approaches 0, it means that the residuals are ______.

III. (2 Marks) Which residual plot would you examine to determine whether the assumption of constant error variance is satisfied for a model with two independent variables x1 and x2?
   Answer:

IV. (2 Marks) The Variance Inflationary Factor (VIF) measures the
   a) correlation of the X variables with the Y variable.
   b) correlation of the X variables with each other.
   c) contribution of each X variable with the Y variable after all other X variables are included in the model.
   d) standard deviation of the slope.
   e) none of the above.

V. (2 Marks) The Cp statistic is used
   a) to determine if there is a problem of collinearity.
   b) to determine if the variances of the error terms are all the same in a regression model.
   c) to choose the best model.
   d) to determine if there is an irregular component in a time series.
   e) none of the above.
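For readers who want to see how the Durbin-Watson statistic in part II behaves, the sketch below computes it from its standard definition, the sum of squared successive differences of the residuals divided by the sum of squared residuals. Both residual series are hypothetical, made up purely for illustration, and are not taken from any output in this exam.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of (e_t - e_{t-1})^2 over sum of e_t^2."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals: neighbours are nearly equal (slowly drifting series).
slowly_drifting = [1.0, 1.1, 1.2, 1.1, 1.0, 0.9]

# Hypothetical residuals: the sign flips at every step.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

if __name__ == "__main__":
    print(durbin_watson(slowly_drifting))
    print(durbin_watson(alternating))
```

When successive residuals are close to each other the numerator is small and the statistic falls toward 0; when they alternate in sign it rises toward its upper limit of 4.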