Chapter 14 Multiple Regression Analysis


1. a. Multiple regression equation
   b. The Y-intercept
   c. $374,748, found by Ŷ = 64,100 + 0.394(796,000) + 9.6(6,940) − 11,600(6.0) (LO 1)

2. a. Multiple regression equation
   b. One dependent, four independent
   c. A regression coefficient
   d. 0.002
   e. 105.014, found by Ŷ = 11.6 + 0.4(6) + 0.286(280) + 0.112(97) + 0.002(35) (LO 1)

3. a. 497.736, found by
      Ŷ = 16.24 + 0.017(18) + 0.0028(26,500) + 42(3) + 0.0012(156,000) + 0.19(141) + 26.8(2.5)
   b. Two more social activities. Income added only 28 to the index; social activities added 53.6. (LO 1)

4. a. 30.69 cubic feet
   b. A decrease of 1.86 cubic feet, down to 28.83 cubic feet.
   c. Yes. Logically, as the amount of insulation and the outdoor temperature increase, the consumption of natural gas decreases. (LO 1)

5. a. s(Y·12) = √[SSE/(n − (k + 1))] = √[583.693/(65 − (2 + 1))] = √9.414 = 3.068
      About 95% of the residuals will be between ±6.136, found by 2(3.068).
   b. R² = SSR/SS total = 77.907/661.6 = 0.118
      The independent variables explain 11.8% of the variation.
   c. R²(adj) = 1 − [SSE/(n − (k + 1))]/[SS total/(n − 1)] = 1 − 9.414/(661.6/64) = 1 − 0.911 = 0.089 (LO 3)

6. a. s(Y·12345) = √[SSE/(n − (k + 1))] = √[2647.38/(52 − (5 + 1))] = √57.55 = 7.59
      About 95% of the residuals will be between ±15.17, found by 2(7.59).
   b. R² = SSR/SS total = 3710/6357.38 = 0.584
      The independent variables explain 58.4% of the variation.

   c. R²(adj) = 1 − [SSE/(n − (k + 1))]/[SS total/(n − 1)] = 1 − (2647.38/46)/(6357.38/51) = 1 − 57.55/124.65 = 1 − 0.462 = 0.538 (LO 3)

7. a. Ŷ = 84.998 + 2.391X₁ − 0.486X₂
   b. 9.674, found by Ŷ = 84.998 + 2.391(4) − 0.486(11)
   c. n = 65 and k = 2
   d. H₀: β₁ = β₂ = 0   H₁: Not all βs are 0
      Reject H₀ if F > 3.15. Since F = 4.14, reject H₀: not all of the net regression coefficients equal zero.
   e. For X₁: H₀: β₁ = 0 versus H₁: β₁ ≠ 0, t = 1.99. For X₂: H₀: β₂ = 0 versus H₁: β₂ ≠ 0, t = −2.38.
      Reject H₀ if t > 2.00 or t < −2.00. Delete variable 1 and keep variable 2.
   f. The regression analysis should be repeated with only X₂ as the independent variable. (LO 2, LO 4)

8. a. Ŷ = 7.987 + 0.12242X₁ + 0.12166X₂ + 0.6281X₃ + 0.5235X₄ + 0.6472X₅
   b. n = 52 and k = 5
   c. H₀: β₁ = β₂ = β₃ = β₄ = β₅ = 0   H₁: Not all βs are 0
      Reject H₀ if F > 2.417. Since the computed F is 12.89, reject H₀: not all of the regression coefficients are zero.
   d. For each variable, test H₀: βⱼ = 0 against H₁: βⱼ ≠ 0, rejecting H₀ if t > 2.013 or t < −2.013.
      The t values are 3.92, 2.27, −1.61, 3.69, and −1.62, so consider eliminating variables 3 and 5.
   e. Delete variable 3, then rerun. Perhaps you will delete variable 5 also. (LO 2, LO 4)

9. a. The regression equation is: Performance = 29.3 + 5.22 Aptitude + 22.1 Union

      Predictor   Coef  SE Coef     T      P
      Constant   29.28   12.77   2.29  0.041
      Aptitude   5.222   1.702   3.07  0.010
      Union     22.135   8.852   2.50  0.028

      S = 16.9166   R-Sq = 53.3%   R-Sq(adj) = 45.5%

      Analysis of Variance
      Source          DF      SS      MS     F      P
      Regression       2  3919.3  1959.6  6.85  0.010
      Residual Error  12  3434.0   286.2
      Total           14  7353.3

   b. These variables are effective in predicting performance. They explain 45.5 percent of the variation in performance. In particular, union membership increases typical performance by 22.1.
   c. H₀: β₂ = 0   H₁: β₂ ≠ 0   Reject H₀ if t < −2.1788 or t > 2.1788.
      Since 2.50 is greater than 2.1788, we reject the null hypothesis and conclude that union membership is significant and should be included. (LO 1, LO 5, LO 9)
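The decision rules in Exercises 7 and 8 compare the computed F and t statistics against table values. A minimal sketch of looking those critical values up with scipy, assuming only the n and k reported in the solutions:

```python
# Hedged sketch: critical values for the global F test and the individual
# coefficient t tests, given n observations and k predictors.
from scipy import stats

def f_critical(k, n, alpha=0.05):
    # Global test uses F with k and n - (k + 1) degrees of freedom
    return stats.f.ppf(1 - alpha, k, n - (k + 1))

def t_critical(k, n, alpha=0.05):
    # Two-tailed individual test uses t with n - (k + 1) degrees of freedom
    return stats.t.ppf(1 - alpha / 2, n - (k + 1))

# Exercise 7: n = 65, k = 2
fc = f_critical(2, 65)   # about 3.15, matching "reject H0 if F > 3.15"
tc = t_critical(2, 65)   # about 2.00, matching "reject H0 if t > 2.00 or t < -2.00"
```

The same two helpers reproduce the 2.417 and 2.013 cutoffs of Exercise 8 with k = 5 and n = 52.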

10. The number of family members would enter first. None of the other variables is significant.
    Response is Size on 4 predictors, with N = 10

    Step           1       2
    Constant  1271.6   712.7
    Family       541     372
    T-Value     6.12    3.59
    P-Value    0.000   0.009
    Income             12.1
    T-Value            2.26
    P-Value           0.059
    S            336     273
    R-Sq       82.42   89.82
    R-Sq(adj)  80.22   86.92  (LO 1, LO 10)

11. a. n = 40
    b. 4
    c. R² = 750/1250 = 0.60
    d. s(Y·1234) = √(500/35) = 3.7796
    e. H₀: β₁ = β₂ = β₃ = β₄ = 0   H₁: Not all βs equal 0
       H₀ is rejected if F > 2.641
       F = (750/4)/(500/35) = 13.125
       H₀ is rejected. At least one βᵢ does not equal zero. (LO 2, LO 3, LO 4)

12. H₀: β₁ = 0   H₀: β₂ = 0
    H₁: β₁ ≠ 0   H₁: β₂ ≠ 0
    H₀ is rejected if t < −2.074 or t > 2.074
    t₁ = 2.676/0.56 = 4.779   t₂ = −0.880/0.710 = −1.239
    The second variable can be deleted. (LO 5)

13. a. n = 26
    b. R² = 100/140 = 0.7143
    c. 1.4142, found by √2
    d. H₀: β₁ = β₂ = β₃ = β₄ = β₅ = 0   H₁: Not all βs are 0
       Reject H₀ if F > 2.71. Computed F = 10.0, so reject H₀: at least one regression coefficient is not zero.
    e. H₀ is rejected in each case if t < −2.086 or t > 2.086. X₁ and X₅ should be dropped. (LO 2, LO 3, LO 4, LO 5)

14. H₀: β₁ = β₂ = β₃ = β₄ = β₅ = 0   H₁: Not all βs equal zero.
    df₁ = 5 and df₂ = 20 − (5 + 1) = 14, so H₀ is rejected if F > 2.96

    Source        SS     df      MS      F

    Regression  448.28    5  89.656  17.58
    Error        71.40   14   5.100
    Total       519.68   19

    So H₀ is rejected. Not all of the regression coefficients equal zero. (LO 4)

15. a. $28,000
    b. R² = SSR/SS total = 3050.589/5250 = 0.581
    c. 9.199, found by √84.62
    d. H₀ is rejected if F > 2.97 (approximately)
       Computed F = 1016.67/84.62 = 12.01
       H₀ is rejected. At least one regression coefficient is not zero.
    e. The null hypothesis is rejected if the computed t is to the left of −2.056 or to the right of 2.056. The computed t values for X₂ and X₃ exceed the critical value. Thus, population and advertising expenses should be retained and the number of competitors, X₁, dropped. (LO 1, LO 3, LO 4, LO 5)

16. a. The strongest relationship is between sales and income (0.964). A problem could occur if both cars and outlets are part of the final solution. Also, outlets and income are strongly correlated (0.825). This is called multicollinearity.
    b. R² = 1593.81/1602.89 = 0.9943
    c. H₀ is rejected: at least one regression coefficient is not zero. The computed value of F is 14.42.
    d. Delete outlets and bosses. The critical values are −2.776 and 2.776.
    e. R² = 1593.66/1602.89 = 0.9942. There was little change in the coefficient of determination.
    f. The normality assumption appears reasonable.
    g. There is nothing unusual about the plots. (LO 4, LO 5, LO 6, LO 7)

17. a. The strongest correlation is between GPA and legal. There is no problem with multicollinearity.
    b. R² = 4.3595/5.0631 = 0.861
    c. H₀ is rejected if F > 5.41
       F = 1.4532/0.1407 = 10.33, so at least one coefficient is not zero.
    d. Any H₀ is rejected if t < −2.571 or t > 2.571. It appears that only GPA is significant; verbal and math could be eliminated.
    e. R² = 4.2061/5.0631 = 0.831. R² has been reduced by only 0.030.
    f. The residuals appear slightly positively skewed, but acceptable.
    g. There does not seem to be a problem with the plot. (LO 4, LO 5, LO 6, LO 7)
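Several of the exercises above (5, 11, 13) derive the standard error of estimate, R², and adjusted R² from nothing but the sums of squares. A minimal sketch of those three formulas:

```python
# Summary measures from SSE, SS total, n observations, and k predictors.
import math

def fit_summary(sse, ss_total, n, k):
    df_error = n - (k + 1)                         # error degrees of freedom
    s = math.sqrt(sse / df_error)                  # standard error of estimate
    r2 = 1 - sse / ss_total                        # coefficient of determination
    r2_adj = 1 - (sse / df_error) / (ss_total / (n - 1))
    return s, r2, r2_adj

# Exercise 5's figures: SSE = 583.693, SS total = 661.6, n = 65, k = 2
s, r2, r2_adj = fit_summary(583.693, 661.6, 65, 2)
# s ≈ 3.07, r2 ≈ 0.118, r2_adj ≈ 0.089
```

Note how the adjusted value penalizes the two predictors: R² falls from about 0.118 to 0.089.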

18. a. The correlation matrix is:

               Salary  Years  Rating
       Years    0.868
       Rating   0.547  0.187
       Master   0.311  0.280   0.458

       Years has the strongest correlation with salary. There does not appear to be a problem with multicollinearity.
    b. The regression equation is: Ŷ = 19.92 + 0.899X₁ + 0.154X₂ − 0.67X₃
       Ŷ = 33.655, or $33,655
    c. H₀ is rejected if F > 3.24
       Computed F = 301.06/5.71 = 52.72
       H₀ is rejected. At least one regression coefficient is not zero.
    d. A regression coefficient is dropped if its computed t is to the left of −2.120 or to the right of 2.120. Keep years and rating; drop masters.
    e. Dropping masters, we have: Salary = 20.1157 + 0.8926(years) + 0.1464(rating)
    f. The stem-and-leaf display and the histogram of the residuals revealed no problem with the assumption of normality. [MINITAB histogram omitted.]
    g. There does not appear to be a pattern in the MINITAB plot of the residuals (Y − Ŷ) against the fitted values. (LO 4, LO 5, LO 6, LO 7)

19. a. The correlation of Screen and Price is 0.893, so there does appear to be a linear relationship between the two.
    b. Price is the dependent variable.
    c. The regression equation is Price = −2484 + 110 Screen. For each additional inch of screen size, the price increases $110 on average.

    d. Using dummy indicator variables for Sharp and Sony, the regression equation is
       Price = −2308 + 94.1 Screen + 15 Manufacturer_Sharp + 381 Manufacturer_Sony
       Sharp can obtain on average $15 more than Samsung, and Sony can collect an additional $381.
    e. Here is some of the output:

       Predictor             Coef  SE Coef      T      P
       Constant           −2308.2    492.0  −4.69  0.000
       Screen               94.12    10.83   8.69  0.000
       Manufacturer_Sharp    15.1    171.6   0.09  0.931
       Manufacturer_Sony    381.4    168.8   2.26  0.036

       The p-value for Sharp is relatively large, so a test of its coefficient would not be rejected; Sharp may not have any real advantage over Samsung. On the other hand, the p-value for the Sony coefficient is quite small. That indicates the difference did not happen by chance, and there is some real advantage to Sony over Samsung.
    f. A histogram of the residuals indicates they follow a normal distribution. [Histogram omitted.]

    g. The residual variation may be increasing for larger fitted values. [Plot of residuals versus fitted values for the LCD data omitted.] (LO 1, LO 5, LO 6, LO 8)

20. a. The correlation of Income and Median Age is 0.721, so there does appear to be a linear relationship between the two.
    b. Income is the dependent variable.
    c. The regression equation is Income = 2285 + 362 Median Age. For each additional year of age, income increases $362 on average.
    d. Using an indicator variable for a coastal county, the regression equation is
       Income = 29619 + 137 Median Age + 1640 Coastal
       That changes the estimated effect of an additional year of age to $137 and makes the effect of living in a coastal county an additional $1,640 of income.
    e. Notice that the p-values on the following output are virtually zero. That indicates both variables are significant influences on income.

       Predictor       Coef         T      P
       Constant     29619.2   9269.67  0.000
       Median Age   136.887   1858.14  0.000
       Coastal       1639.7  54546.92  0.000

    f. A histogram of the residuals is close to a normal distribution. [Histogram omitted.]
    g. The variation is about the same across the different fitted values. [Plot of residuals versus fitted values omitted.] (LO 1, LO 5, LO 6, LO 8)
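Exercises 19 and 20 both fold a 0/1 indicator into the regression. A minimal sketch of that technique with numpy's least-squares solver; the six data points below are invented purely for illustration:

```python
# Hedged sketch: fitting income on a quantitative predictor plus a 0/1 dummy.
import numpy as np

age     = np.array([30.0, 35, 40, 45, 50, 55])   # hypothetical median ages
coastal = np.array([0.0, 1, 0, 1, 0, 1])         # 1 = coastal county (indicator)
income  = np.array([31.0, 38, 33, 40, 35, 42])   # hypothetical income ($000)

# Design matrix: intercept column, quantitative predictor, indicator
X = np.column_stack([np.ones_like(age), age, coastal])
b, *_ = np.linalg.lstsq(X, income, rcond=None)
# b[2] estimates the income gap between coastal and inland counties,
# holding median age constant (here the toy data give exactly 6, i.e. $6,000)
```

The indicator's coefficient is read exactly as in the solutions: the average shift in the response for group 1 relative to group 0, with the other predictor held fixed.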

21. a. [Scatterplots of Sales versus Advertising, Accounts, Competitors, and Potential omitted.]
       Sales seem to fall with the number of competitors and to rise with the number of accounts and with potential.
    b. Pearson correlations:

                     Sales  Advertising  Accounts  Competitors
       Advertising   0.159
       Accounts      0.783        0.173
       Competitors  −0.833       −0.380    −0.324
       Potential     0.470       −0.071     0.468       −0.220

       The number of accounts and the market potential are moderately correlated.
    c. The regression equation is
       Sales = 178 + 1.81 Advertising + 3.32 Accounts − 21.2 Competitors + 0.325 Potential

       Predictor        Coef  SE Coef       T      P
       Constant       178.32    12.96   13.76  0.000
       Advertising     1.807    1.081    1.67  0.109
       Accounts       3.3178   0.1629   20.37  0.000
       Competitors  −21.1850   0.7879  −26.89  0.000
       Potential      0.3245   0.4678    0.69  0.495

       S = 9.60441   R-Sq = 98.9%   R-Sq(adj) = 98.7%

       Analysis of Variance
       Source          DF      SS     MS       F      P
       Regression       4  176777  44194  479.10  0.000
       Residual Error  21    1937     92
       Total           25  178714

       The computed F value is quite large, so we can reject the null hypothesis that all of the regression coefficients are zero. We conclude that some of the independent variables are effective in explaining sales.

    d. Market potential and advertising have large p-values (0.495 and 0.109, respectively). You would probably drop them.
    e. If you omit potential, the regression equation is
       Sales = 180 + 1.68 Advertising + 3.37 Accounts − 21.2 Competitors

       Predictor        Coef  SE Coef       T      P
       Constant       179.84    12.62   14.25  0.000
       Advertising     1.677    1.052    1.59  0.125
       Accounts       3.3694   0.1432   23.52  0.000
       Competitors  −21.2165   0.7773  −27.30  0.000

       Now advertising is not significant. That would lead you to also cut the advertising variable and report the polished regression equation:
       Sales = 187 + 3.41 Accounts − 21.2 Competitors

       Predictor        Coef  SE Coef       T      P
       Constant       186.69    12.26   15.23  0.000
       Accounts       3.4081   0.1458   23.37  0.000
       Competitors  −21.1930   0.8028  −26.40  0.000

    f. [Histogram of the residuals (response is Sales) omitted.]
       The histogram looks normal. There are no problems shown in this plot.
    g. The variance inflation factor for both variables is 1.1. Since that is less than 10, there are no troubles; this value indicates the independent variables are not strongly correlated with each other. (LO 4, LO 5, LO 6, LO 7, LO 9)
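Part (g) above leans on the variance inflation factor. A sketch of how VIF is defined, VIF_j = 1/(1 − R_j²) where R_j² comes from regressing predictor j on the remaining predictors; the data here are random placeholders, not the exercise's data:

```python
# Hedged sketch of the variance inflation factor for one predictor column.
import numpy as np

def vif(X, j):
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])   # intercept + other predictors
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    r2 = 1 - (resid @ resid) / (((y - y.mean()) ** 2).sum())
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))    # two nearly uncorrelated columns
v = vif(X, 0)                    # should sit close to 1, as in part (g)
```

Values near 1 mean a predictor shares almost no information with the others; a common rule of thumb treats VIF above 10 as a multicollinearity warning.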

22. a. Ŷ = 5.7328 + 0.754X₁ + 0.59X₂ + 1.974X₃
    b. H₀: β₁ = β₂ = β₃ = 0   H₁: Not all βᵢ's = 0
       Reject H₀ if F > 3.07
       F = (11.3437/3)/(2.2446/21) = 35.38
    c. All coefficients are significant. Do not delete any.
    d. The residuals appear to be random. No problem. [Plot of residuals versus fitted values omitted.]
    e. The histogram of the residuals (N = 25) appears to be normal. No problem. [Histogram omitted.]

23. The computer output is:

    Predictor     Coef  Stdev  t-ratio      P
    Constant     651.9  345.3     1.89  0.071
    Service     13.422  5.125     2.62  0.015
    Age         −6.710  6.349    −1.06  0.301
    Gender      205.65  90.27     2.28  0.032
    Job         −33.45  89.55    −0.37  0.712

    Analysis of Variance
    Source       DF       SS      MS     F      P
    Regression    4  1066830  266708  4.77  0.005
    Error        25  1398651   55946
    Total        29  2465481

    a. Ŷ = 651.9 + 13.422X₁ − 6.71X₂ + 205.65X₃ − 33.45X₄
    b. R² = 0.433, which is somewhat low for this type of study.
    c. H₀: β₁ = β₂ = β₃ = β₄ = 0   H₁: Not all βᵢ's = 0
       Reject H₀ if F > 2.76
       F = (1066830/4)/(1398651/25) = 4.77
       H₀ is rejected. Not all of the βᵢ's equal zero.
    d. Using the 0.05 significance level, reject the hypothesis that a regression coefficient is 0 if t < −2.060 or t > 2.060. Service and gender should remain in the analysis; age and job should be dropped.
    e. Following is the computer output using the independent variables service and gender:

       Predictor    Coef  Stdev  t-ratio      P
       Constant    784.2  316.8     2.48  0.020
       Service     9.021  3.106     2.90  0.007
       Gender     224.41  87.35     2.57  0.016

       Analysis of Variance
       Source       DF       SS      MS     F      P
       Regression    2   998779  499389  9.19  0.001
       Error        27  1466703   54322
       Total        29  2465481

       A man earns $224 more per month than a woman. The difference between technical and clerical jobs is not significant. (LO 1, LO 4, LO 5, LO 6)

24. a. The correlation matrix is:

                Food  Income
       Income  0.156
       Size    0.876   0.098

       Income and size are not related. There is no multicollinearity.
    b. The regression equation is: Food = 2.84 + 0.0613 Income + 0.425 Size
       Each additional dollar of income leads to an increased food bill of about 6 cents. Another member of the family adds $425 to the food bill.
    c. R² = 18.4095/22.2935 = 0.826
       To conduct the global test: H₀: β₁ = β₂ = 0 versus H₁: Not all βᵢ's = 0. At the 0.05 significance level, H₀ is rejected if F > 3.44.

       Source      DF       SS      MS      F      P
       Regression   2  18.4095  9.2048  52.14  0.000
       Error       22   3.8840  0.1765
       Total       24  22.2935

       The computed value of F is 52.14, so H₀ is rejected. Not all of the regression coefficients are zero.
    d. Since the p-values are less than 0.05, there is no need to delete variables.

       Predictor     Coef  SE Coef      T      P
       Constant    2.8436   0.2618  10.86  0.000
       Income     0.06129  0.02250   2.72  0.012
       Size       0.42476  0.04222  10.06  0.000

    e. The residuals appear normally distributed. [Histogram of the residuals (response is Food) omitted.]
    f. The variance is about the same as we move from small fitted values to large ones, so heteroscedasticity is not a problem. [Plot of residuals versus fitted values (response is Food) omitted.] (LO 1, LO 3, LO 4, LO 5)
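Exercises 23 and 24 apply the same pruning rule: a predictor is a removal candidate when its |t| = |coefficient / standard error| falls below the two-tailed critical t. A sketch of that rule, using the coefficients and standard errors from Exercise 23's output (the gender figures reconstructed from its printed t-ratio):

```python
# Hedged sketch: flag predictors whose |t| misses the critical value.
from scipy import stats

def drop_candidates(coefs, ses, n, k, alpha=0.05):
    crit = stats.t.ppf(1 - alpha / 2, n - (k + 1))
    return [j for j, (b, se) in enumerate(zip(coefs, ses))
            if abs(b / se) < crit]

# Exercise 23: Service, Age, Gender, Job with n = 30 and k = 4
coefs = [13.422, -6.71, 205.65, -33.45]
ses   = [5.125, 6.349, 90.27, 89.55]
idx = drop_candidates(coefs, ses, 30, 4)   # flags age and job, as in part d
```

Dropping one variable at a time and refitting, as the solutions do, is safer than dropping all flagged variables at once, because the remaining t values change after each refit.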

25. a. The regression equation is P/E = 29.9 − 5.32 EPS + 1.45 Yield.
    b. Here is part of the software output:

       Predictor    Coef  SE Coef      T      P
       Constant   29.913    5.767   5.19  0.000
       EPS        −5.324    1.634  −3.26  0.005
       Yield       1.449    1.798   0.81  0.431

       Thus EPS has a significant relationship with P/E, while Yield does not.
    c. As EPS increases by one, P/E decreases by 5.324; as Yield increases by one, P/E increases by 1.449.
    d. The second stock has a residual of −18.43, signifying that its fitted P/E is around 21.
    e. [Histogram of the residuals omitted.] The distribution of the residuals may show a negative skew.
    f. [Plot of residuals versus fitted values (response is P/E) omitted.] There does not appear to be any problem with homoscedasticity.

    g. The correlation matrix is:

               P/E    EPS
       EPS    0.62
       Yield  0.54  0.162

       The correlation between the independent variables is 0.162. That is not multicollinearity. (LO 1, LO 3, LO 6, LO 7)

26. a. The regression equation is Tip = −0.586 + 0.0985 Bill + 0.598 Diners

       Predictor     Coef  SE Coef      T      P  VIF
       Constant    −0.586   0.3952  −1.48  0.150
       Bill       0.09852  0.02083   4.73  0.000  3.6
       Diners       0.598   0.2322   2.58  0.016  3.6

       Another diner on average adds about $0.60 to the tip.
    b. Analysis of Variance

       Source          DF       SS      MS      F      P
       Regression       2  112.596  56.298  89.96  0.000
       Residual Error  27   16.898   0.626
       Total           29  129.494

       The F value is quite large and the p-value is extremely small, so we reject the null hypothesis that all the regression coefficients are zero and conclude that at least one of the independent variables is significant.
    c. The null hypothesis in each individual test is that the coefficient is zero; it is rejected if t is less than −2.5183 or more than 2.5183. In this case both t values are larger than the critical value of 2.518, so neither variable should be removed.
    d. The coefficient of determination is 0.87, found by 112.596/129.494.
    e. [Histogram of the residuals (response is Tip) omitted.] The residuals are approximately normally distributed.

    f. [Plot of residuals versus fitted values (response is Tip) omitted.] The residuals look random. (LO 1, LO 6, LO 7)

27. a. The regression equation is Sales = 1.02 + 0.829 Infomercials

       Predictor       Coef  SE Coef     T      P
       Constant      1.0188   0.3105  3.28  0.006
       Infomercials  0.8291   0.1680  4.94  0.000

       S = 0.308675   R-sq = 65.2%   R-sq(adj) = 62.5%

       Analysis of Variance
       Source          DF      SS      MS      F      P
       Regression       1  2.3214  2.3214  24.36  0.000
       Residual Error  13  1.2386  0.0953
       Total           14  3.5600

       The global F test demonstrates there is a substantial connection between sales and the number of infomercials.
    b. [Histogram of the residuals (response is Sales) omitted.] The residuals are close to normally distributed. (LO 1, LO 3, LO 4, LO 5, LO 6)
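Exercise 27 reduces to a one-predictor least-squares fit. A minimal from-scratch sketch of that computation; the five data points are invented placeholders, not the exercise's data:

```python
# Hedged sketch: slope, intercept, and R^2 for simple linear regression.
import numpy as np

def simple_ols(x, y):
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    sxx = ((x - x.mean()) ** 2).sum()
    sxy = ((x - x.mean()) * (y - y.mean())).sum()
    slope = sxy / sxx
    intercept = y.mean() - slope * x.mean()
    yhat = intercept + slope * x
    r2 = 1 - ((y - yhat) ** 2).sum() / (((y - y.mean()) ** 2).sum())
    return intercept, slope, r2

x = [1, 2, 3, 4, 5]              # hypothetical number of infomercials aired
y = [1.9, 2.9, 3.8, 5.1, 6.0]    # hypothetical sales
b0, b1, r2 = simple_ols(x, y)    # b0 = 0.82, b1 = 1.04
```

With one predictor, the global F test of the solution is equivalent to the t test on the slope (F = t²), so either confirms the sales-infomercials connection.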

28. a. The regression equation is: Fall = 1.72 + 0.762 July

       Predictor    Coef  SE Coef     T      P
       Constant    1.721    1.123  1.53  0.149
       July       0.7618   0.1186  6.42  0.000

       Analysis of Variance
       Source          DF      SS      MS      F      P
       Regression       1  36.677  36.677  41.25  0.000
       Residual Error  13  11.559   0.889
       Total           14  48.236

       The global F test demonstrates there is a substantial connection between attendance at the Fall Festival and attendance at the fireworks.
    b. The Durbin-Watson statistic is 1.613. There is no problem with autocorrelation. (LO 1, LO 6, LO 7)

29. The computer output is as follows:

       Predictor     Coef   StDev      T      P
       Constant     57.03   39.99   1.43  0.157
       Bedrooms     7.118   2.551   2.79  0.006
       Size        0.3800  0.1468   2.59  0.011
       Pool       −18.321   6.999  −2.62  0.010
       Distance   −0.9295  0.7279  −1.28  0.205
       Garage      35.810   7.638   4.69  0.000
       Baths       23.315   9.025   2.58  0.011

       s = 33.21   R-sq = 53.2%   R-sq(adj) = 50.3%

       Analysis of Variance
       Source      DF       SS     MS      F      P
       Regression   6  122676  20446  18.54  0.000
       Error       98  108092   1103
       Total      104  230768

    a. The regression equation is:
       Price = 57.0 + 7.12 Bedrooms + 0.38 Size − 18.3 Pool − 0.929 Distance + 35.8 Garage + 23.3 Baths
       Each additional bedroom adds about $7,000 to the selling price; each additional square foot of space adds $380; each additional mile the home is from the center of the city lowers the selling price about $900; a pool lowers the selling price $18,300; and an attached garage adds an additional $35,800, as another bathroom adds $23,300.
    b. The R-square value is 0.532. The variation in the six independent variables explains somewhat more than half of the variation in the selling price.
    c. The correlation matrix is as follows:

                  Price  Bedrooms   Size   Pool  Distance  Garage
       Bedrooms   0.467
       Size       0.371     0.383
       Pool       0.294     0.050  0.210
       Distance  −0.347     0.153  0.117  0.139
       Garage     0.526     0.234  0.083  0.114     0.359
       Baths      0.382     0.329  0.240  0.055     0.195   0.221

       The independent variable garage has the strongest correlation with price. Distance and price are inversely related, as expected. There do not seem to be any strong correlations among the independent variables.

    d. To conduct the global test: H₀: β₁ = β₂ = β₃ = β₄ = β₅ = β₆ = 0 versus H₁: Not all of the net regression coefficients equal zero. The null hypothesis is rejected if F > 2.25. The computed value of F is 18.54, so the null hypothesis is rejected. We conclude that not all of the net coefficients equal zero.
    e. In reviewing the individual values, the only variable that appears not to be useful is distance; its p-value is greater than 0.05. We can drop this variable and rerun the analysis.
    f. The output is as follows:

       Predictor     Coef   StDev      T      P
       Constant     36.12   36.59   0.99  0.326
       Bedrooms     7.169   2.559   2.80  0.006
       Size        0.3919  0.1470   2.67  0.009
       Pool       −19.110   6.994  −2.73  0.007
       Garage      38.847   7.281   5.34  0.000
       Baths       24.624   8.995   2.74  0.007

       s = 33.32   R-sq = 52.4%   R-sq(adj) = 50.0%

       Analysis of Variance
       Source      DF       SS     MS      F      P
       Regression   5  120877  24175  21.78  0.000
       Error       99  109890   1110
       Total      104  230768

       In this solution, observe that the p-value for the global test is less than 0.05 and the p-values for each of the independent variables are also all less than 0.05. The R-square value is 0.524, so it did not change much when the independent variable distance was dropped.
    g. The following histogram was developed from the residuals in part f. The normality assumption is reasonable. [Histogram of the residuals omitted.]

    h. The following scatter diagram is based on the residuals in part f. There does not seem to be any pattern to the plotted data. [Plot of residuals versus fitted values omitted.] (LO 9)

30. a. The regression equation is:
       Wins = 39.7 + 392 BA + 0.189 SB − 0.991 Errors − 17.0 ERA + 0.115 HR + 0.07 League_AL
       For each additional point the team batting average increases, the number of wins goes up by 0.392. Each stolen base adds 0.189 to average wins. Each error lowers the average number of wins by 0.991. An increase of one in the ERA decreases the number of wins by 17.0. Each home run adds 0.115 wins. Playing in the American League raises the number of wins by 0.07.
    b. R-sq is 0.866, found by 3041.15/3512. Eighty-seven percent of the variation in the number of wins is accounted for by these six variables.
    c. The correlation matrix is as follows:

                   Wins      BA      SB  Errors    ERA     HR
       BA         0.461
       SB         0.034  −0.176
       Errors    −0.634  −0.166  −0.133
       ERA       −0.681   0.058  −0.230   0.480
       HR         0.438   0.317  −0.038  −0.279  0.087
       League_AL  0.049   0.224   0.027  −0.160  0.145  0.114

       Errors and ERA have the strongest correlations with winning; stolen bases and league have the weakest. The highest correlation among the independent variables is 0.48, so multicollinearity does not appear to be a problem.
    d. To conduct the global test: H₀: β₁ = β₂ = ... = β₆ = 0 versus H₁: Not all βᵢ's = 0. At the 0.05 significance level, H₀ is rejected if F > 2.528.

       Source          DF       SS      MS      F      P
       Regression       6  3041.15  506.86  24.76  0.000
       Residual Error  23   470.85   20.47
       Total           29  3512.00

       Since the p-value is virtually zero, you can reject the null hypothesis that all of the net regression coefficients are zero and conclude that at least one of the variables has a significant relationship to winning.

    e. League would be deleted first because of its large p-value.

       Predictor      Coef  SE Coef      T      P
       Constant      39.71    25.66   1.55  0.135
       BA           392.36    89.14   4.40  0.000
       SB           0.1892   0.3108   0.61  0.558
       Errors      −0.9910   0.6026  −1.64  0.114
       ERA         −16.976    2.417  −7.02  0.000
       HR          0.11451  0.02971   3.85  0.001
       League_AL     0.071    1.859   0.04  0.970

       The p-values for stolen bases, errors, and league are large. Think about deleting these variables.
    f. The reduced regression equation is: Wins = 36.2 + 406 BA − 19.3 ERA + 0.125 HR
    g. The residuals are normally distributed. [Histogram of residuals omitted.]
    h. [Plot of residuals versus fitted values omitted.] This plot appears to be random and to have a constant variance. (LO 1, LO 4, LO 5, LO 6, LO 7, LO 10)

31. a. The regression equation is:
       Maintenance = 102 + 5.94 Age + 0.374 Miles − 11.8 GasolineIndicator

       Each additional year of age adds $5.94 to upkeep cost. Every extra mile adds $0.374 to the maintenance total. Gasoline buses are cheaper to maintain than diesel buses by $11.8 per year.
    b. The coefficient of determination is 0.286, found by 65135/227692. About twenty-nine percent of the variation in maintenance cost is explained by these variables.
    c. The correlation matrix is:

                          Maintenance     Age  Miles
       Age                      0.465
       Miles                    0.450   0.522
       GasolineIndicator       −0.118  −0.068  0.025

       Age and miles both have moderately strong correlations with maintenance cost. The highest correlation among the independent variables is 0.522, between age and miles. That is smaller than 0.70, so multicollinearity may not be a problem.
    d. Analysis of Variance

       Source          DF      SS     MS      F      P
       Regression       3   65135  21712  10.15  0.000
       Residual Error  76  162558   2139
       Total           79  227692

       The p-value is zero. Reject the null hypothesis that all the coefficients are zero and conclude that at least one is important.
    e. Predictor            Coef  SE Coef      T      P
       Constant           102.30   112.90   0.91  0.368
       Age                 5.939    2.227   2.67  0.009
       Miles               0.374    0.145   2.58  0.012
       GasolineIndicator  −11.80    10.99  −1.07  0.286

       The p-value for the gasoline indicator is bigger than 0.10. Consider deleting it.
    f. The condensed regression equation is: Maintenance = 106 + 6.17 Age + 0.363 Miles
    g. [Histogram of the residuals omitted.] The normality assumption appears realistic.

    h. [Plot of residuals versus fitted values omitted.] This plot appears to be random and to have a constant variance. (LO 1, LO 4, LO 5, LO 6, LO 7, LO 10)
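Exercise 28(b) cited a Durbin-Watson statistic of 1.613 to rule out autocorrelation. A minimal sketch of that statistic, computed directly from its definition (the residual series below is a random placeholder):

```python
# Hedged sketch: DW = sum of squared successive residual differences
# divided by the residual sum of squares.
import numpy as np

def durbin_watson(resid):
    resid = np.asarray(resid, dtype=float)
    return (np.diff(resid) ** 2).sum() / (resid ** 2).sum()

# Independent residuals give DW near 2; values near 0 or 4 signal positive or
# negative autocorrelation, so the 1.613 of Exercise 28 raises no alarm.
dw_alt = durbin_watson([1.0, -1.0, 1.0, -1.0])   # short alternating series -> 3.0
rng = np.random.default_rng(0)
dw_iid = durbin_watson(rng.normal(size=1000))    # close to 2
```

The exact acceptance region depends on n and k through the tabulated dL and dU bounds, which the solutions consult implicitly.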