Chapter 14: Multiple Regression

14.1 Multiple Regression Analysis
14.2 Assumptions of the Multiple Regression Model
14.3 Standard Deviation of Errors
14.4 Coefficient of Multiple Determination
14.5 Computer Solution of Multiple Regression

In Chapter 13, we discussed simple linear regression and linear correlation. A simple regression model includes one independent and one dependent variable, and it presents a very simplified scenario of real-world situations. In the real world, a dependent variable is usually influenced by a number of independent variables. For example, the sales of a company's product may be determined by the price of that product, the quality of the product, and the advertising expenditure incurred by the company to promote that product. Therefore, it makes more sense to use a regression model that includes more than one independent variable. Such a model is called a multiple regression model, and such models are the subject of this chapter.

14.1 Multiple Regression Analysis

The simple linear regression model discussed in Chapter 13 was written as

$$y = A + Bx + \epsilon$$

This model includes one independent variable, which is denoted by $x$, and one dependent variable, which is denoted by $y$. As we know from Chapter 13, the term $\epsilon$ in the above model is called the random error. Usually a dependent variable is affected by more than one independent variable. When we include two or more independent variables in a regression model, it is called a multiple regression model. Remember, whether it is a simple or a multiple regression model, it always includes one and only one dependent variable.

A multiple regression model with $y$ as the dependent variable and $x_1, x_2, x_3, \ldots, x_k$ as independent variables is written as

$$y = A + B_1x_1 + B_2x_2 + B_3x_3 + \cdots + B_kx_k + \epsilon \qquad (1)$$

where $A$ represents the constant term, $B_1, B_2, B_3, \ldots, B_k$ are the regression coefficients of the independent variables $x_1, x_2, x_3, \ldots, x_k$, respectively, and $\epsilon$ represents the random error term. This model contains $k$ independent variables $x_1, x_2, x_3, \ldots, x_k$.

From model (1), it would seem that multiple regression models can only be used when the relationship between the dependent variable and each independent variable is linear, and that there can be no interaction between two or more of the independent variables. This is far from the truth. In the real world, a multiple regression model can be much more complex, but discussion of such models is outside the scope of this book. When each term contains a single independent variable raised to the first power, as in model (1), we call it a first-order multiple regression model. This is the only type of multiple regression model we will discuss in this chapter.

In regression model (1), $A$ represents the constant term, which gives the value of $y$ when all independent variables assume zero values. The coefficients $B_1, B_2, B_3, \ldots, B_k$ are called the partial regression coefficients. For example, $B_1$ is the partial regression coefficient of $x_1$. It gives the change in $y$ due to a one-unit change in $x_1$ when all other independent variables included in the model are held constant. In other words, if we change $x_1$ by one unit but keep $x_2, x_3, \ldots, x_k$ unchanged, the resulting change in $y$ is measured by $B_1$. Similarly, the value of $B_2$ gives the change in $y$ due to a one-unit change in $x_2$ when all other independent variables are held constant.

In model (1), $A, B_1, B_2, B_3, \ldots, B_k$ are called the true regression coefficients or population parameters. A positive value for a particular $B_i$ in model (1) indicates a positive relationship between $y$ and the corresponding $x_i$ variable; a negative value indicates a negative relationship. Remember that in a first-order regression model such as model (1), the relationship between each $x$ and $y$ is a straight-line relationship. In model (1), $A + B_1x_1 + B_2x_2 + B_3x_3 + \cdots + B_kx_k$ is called the deterministic portion and $\epsilon$ is the stochastic portion of the model.

When we use the $t$ distribution to make inferences about a single parameter of a multiple regression model, the degrees of freedom are calculated as

$$df = n - k - 1$$

where $n$ represents the sample size and $k$ is the number of independent variables in the model.

Definition: Multiple Regression Model. A regression model that includes two or more independent variables is called a multiple regression model. It is written as

$$y = A + B_1x_1 + B_2x_2 + B_3x_3 + \cdots + B_kx_k + \epsilon$$

where $y$ is the dependent variable, $x_1, x_2, x_3, \ldots, x_k$ are the $k$ independent variables, and $\epsilon$ is the random error term. When each of the $x_i$ variables represents a single variable raised to the first power, as in the above model, this model is referred to as a first-order multiple regression model.
For such a model with a sample size of $n$ and $k$ independent variables, the degrees of freedom are $df = n - k - 1$.

When a multiple regression model includes only two independent variables (with $k = 2$), model (1) reduces to

$$y = A + B_1x_1 + B_2x_2 + \epsilon$$

A multiple regression model with three independent variables (with $k = 3$) is written as

$$y = A + B_1x_1 + B_2x_2 + B_3x_3 + \epsilon$$
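It may help to see model (1) in executable form. The following minimal Python sketch is purely illustrative (the coefficient values are invented, not taken from the text); it evaluates the deterministic part of a first-order model and the degrees-of-freedom rule:

```python
# Illustrative sketch of a first-order multiple regression model and the
# df rule for t-based inference. Coefficient values are invented examples.

def deterministic_part(a, coefficients, x_values):
    """A + B1*x1 + B2*x2 + ... + Bk*xk (the model minus the random error)."""
    return a + sum(b * x for b, x in zip(coefficients, x_values))

def degrees_of_freedom(n, k):
    """df = n - k - 1 for n observations and k independent variables."""
    return n - k - 1

# With k = 2, model (1) reduces to y = A + B1*x1 + B2*x2 + error.
print(deterministic_part(5.0, [2.0, -1.5], [10, 4]))  # 5 + 2(10) - 1.5(4) = 19.0
print(degrees_of_freedom(n=12, k=2))                  # 9
```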

If model (1) is estimated using sample data, which is usually the case, the estimated regression equation is written as

$$\hat{y} = a + b_1x_1 + b_2x_2 + b_3x_3 + \cdots + b_kx_k \qquad (2)$$

In equation (2), $a, b_1, b_2, b_3, \ldots, b_k$ are the sample statistics, which are the point estimators of the population parameters $A, B_1, B_2, B_3, \ldots, B_k$, respectively. In model (1), $y$ denotes the actual values of the dependent variable for members of the sample. In the estimated model (2), $\hat{y}$ denotes the predicted or estimated values of the dependent variable. The difference between any pair of $y$ and $\hat{y}$ values gives the error of prediction. For a multiple regression model,

$$SSE = \sum (y - \hat{y})^2$$

where SSE stands for the error sum of squares. As in Chapter 13, the estimated regression equation (2) is obtained by minimizing the sum of squared errors, that is,

$$\text{minimize } \sum (y - \hat{y})^2$$

The estimated equation (2) obtained by minimizing the sum of squared errors is called the least squares regression equation. Usually the calculations in a multiple regression analysis are made with statistical software packages, such as MINITAB, rather than with the formulas manually. Even for a multiple regression equation with two independent variables, the formulas are complex and manual calculations are time consuming. In this chapter we will perform the multiple regression analysis using MINITAB. The solutions obtained from other statistical software packages, such as JMP, SAS, S-Plus, or SPSS, can be interpreted the same way. The TI-84 and Excel do not have built-in procedures for the multiple regression model.

14.2 Assumptions of the Multiple Regression Model

Like a simple linear regression model, a multiple (linear) regression model is based on certain assumptions. The following are the major assumptions for the multiple regression model (1).

Assumption 1: The mean of the probability distribution of $\epsilon$ is zero, that is, $E(\epsilon) = 0$. If we calculate the errors for all measurements for a given set of values of the independent variables for a population data set, the mean of these errors will be zero. In other words, while individual predictions will contain some error, on average our predictions will be correct. Under this assumption, the mean value of $y$ is given by the deterministic part of regression model (1). Thus,

$$E(y) = A + B_1x_1 + B_2x_2 + B_3x_3 + \cdots + B_kx_k$$

where $E(y)$ is the expected or mean value of $y$ for the population. This mean value of $y$ is also denoted by $\mu_{y|x_1, x_2, \ldots, x_k}$.

Assumption 2: The errors associated with different sets of values of the independent variables are independent. Furthermore, these errors are normally distributed and have a constant standard deviation, which is denoted by $\sigma_\epsilon$.

Assumption 3: The independent variables are not linearly related. However, they can have a nonlinear relationship. When independent variables are highly linearly correlated, it is referred to as multicollinearity. This assumption is about the nonexistence of the multicollinearity problem. For example, consider the following multiple regression model:

$$y = A + B_1x_1 + B_2x_2 + B_3x_3 + \epsilon$$

All of the following linear relationships (and other such linear relationships) between $x_1$, $x_2$, and $x_3$ must not hold for this model:

$$x_1 = x_2 + 4x_3 \qquad x_2 = 5x_1 - 2x_3 \qquad x_1 = 3.5x_2$$

If any such linear relationship exists, we can substitute one variable for another, which reduces the number of independent variables to two. However, nonlinear relationships, such as $x_1 = 4x_2^2$ and $x_2 = 2x_1 + 6x_3^2$, between $x_1$, $x_2$, and $x_3$ are permissible. In practice, multicollinearity is a major issue. Examining the correlation for each pair of independent variables is a good way to determine whether multicollinearity exists.

Assumption 4: There is no linear association between the random error term $\epsilon$ and each independent variable $x_i$.

14.3 Standard Deviation of Errors

The standard deviation of errors (also called the standard error of the estimate) for the multiple regression model (1) is denoted by $\sigma_\epsilon$, and it is a measure of variation among the errors. When sample data are used to estimate multiple regression model (1), the standard deviation of errors is denoted by $s_e$. The formula to calculate $s_e$ is

$$s_e = \sqrt{\frac{SSE}{n - k - 1}} \qquad \text{where } SSE = \sum (y - \hat{y})^2$$

Note that here SSE is the error sum of squares. We will not use this formula to calculate $s_e$ manually; rather, we will obtain it from the computer solution. Note that many software packages label $s_e$ as Root MSE, where MSE stands for mean square error.

14.4 Coefficient of Multiple Determination

In Chapter 13, we denoted the coefficient of determination for a simple linear regression model by $r^2$ and defined it as the proportion of the total sum of squares SST that is explained by the regression model. The coefficient of determination for the multiple regression model, usually called the coefficient of multiple determination, is denoted by $R^2$ and is defined as the proportion of the total sum of squares SST that is explained by the multiple regression model. It tells us how good the multiple regression model is and how well the independent variables included in the model explain the dependent variable. Like $r^2$, the value of the coefficient of multiple determination $R^2$ always lies in the range 0 to 1, that is,

$$0 \le R^2 \le 1$$

Just as in the case of the simple linear regression model, SST is the total sum of squares, SSR is the regression sum of squares, and SSE is the error sum of squares. SST is always equal to the sum of SSE and SSR. They are calculated as follows:

$$SSE = \sum e^2 = \sum (y - \hat{y})^2$$
$$SST = SS_{yy} = \sum (y - \bar{y})^2$$
$$SSR = \sum (\hat{y} - \bar{y})^2$$

SSR is the portion of SST that is explained by the use of the regression model, and SSE is the portion of SST that is not explained by the use of the regression model. The coefficient of multiple determination is given by the ratio of SSR and SST:

$$R^2 = \frac{SSR}{SST}$$
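In code, these sums of squares are one-liners. The following is a minimal sketch, assuming NumPy is available (the function name is ours), that computes $s_e$ and $R^2$ from any vectors of observed and fitted values:

```python
import numpy as np

def error_measures(y, y_hat, k):
    """Return (s_e, R^2) for observed y and least-squares fitted y_hat."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)           # error sum of squares
    sst = np.sum((y - y.mean()) ** 2)        # total sum of squares, SS_yy
    ssr = np.sum((y_hat - y.mean()) ** 2)    # regression sum of squares
    # For a least-squares fit with a constant term, SST = SSR + SSE.
    s_e = np.sqrt(sse / (n - k - 1))         # standard deviation of errors
    r_squared = ssr / sst                    # coefficient of multiple determination
    return s_e, r_squared
```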

The coefficient of multiple determination has one major shortcoming: the value of $R^2$ generally increases as we add more and more explanatory variables to the regression model (even if they do not belong in the model). Just because we can increase the value of $R^2$ does not imply that a regression equation with a higher value of $R^2$ does a better job of predicting the dependent variable. Such a value of $R^2$ can be misleading, and it will not represent the true explanatory power of the regression model. To eliminate this shortcoming, it is preferable to use the adjusted coefficient of multiple determination, denoted by $\bar{R}^2$, which is the coefficient of multiple determination adjusted for degrees of freedom. The value of $\bar{R}^2$ may increase, decrease, or stay the same as we add more explanatory variables to the regression model. If a new variable added to the regression model contributes significantly to explaining the variation in $y$, then $\bar{R}^2$ increases; otherwise it decreases. The value of $\bar{R}^2$ is calculated as

$$\bar{R}^2 = 1 - \frac{SSE/(n - k - 1)}{SST/(n - 1)} \qquad \text{or} \qquad \bar{R}^2 = 1 - (1 - R^2)\left(\frac{n - 1}{n - k - 1}\right)$$

Thus, if we know $R^2$, we can find the value of $\bar{R}^2$. Almost all statistical software packages give the values of both $R^2$ and $\bar{R}^2$ for a regression model. Another property of $\bar{R}^2$ to remember is that whereas $R^2$ can never be negative, $\bar{R}^2$ can be negative.

While a general rule of thumb is that a higher value of $R^2$ implies that a specific set of independent variables does a better job of predicting a specific dependent variable, it is important to recognize that some dependent variables have a great deal more variability than others. Therefore, $R^2 = .30$ could imply that a specific model is not a very strong model, yet it could be the best possible model in a certain scenario. Many good financial models have values of $R^2$ below .50.

14.5 Computer Solution of Multiple Regression

In this section, we take an example of a multiple regression model, solve it using MINITAB, interpret the solution, and make inferences about the population parameters of the regression model.

EXAMPLE 14-1 (Using MINITAB to find a multiple regression equation.)

A researcher wanted to find the effect of driving experience and the number of driving violations on auto insurance premiums. A random sample of 12 drivers insured with the same company and having similar auto insurance policies was selected from a large city.

Table 14.1

Monthly Premium   Driving Experience   Driving Violations
(dollars)         (years)              (past 3 years)
148                5                   2
 76               14                   0
100                6                   1
126               10                   3
194                4                   6
110                8                   2
114               11                   3
 86               16                   1
198                3                   5
 92                9                   1
 70               19                   0
120               13                   3

Table 14.1 lists the monthly auto insurance premiums (in dollars) paid by these drivers, their driving experiences (in years), and the numbers of driving violations committed by them during the past three years. Using MINITAB, find the regression equation of monthly premiums paid by drivers on the driving experiences and the numbers of driving violations.

Solution: Let

$y$ = the monthly auto insurance premium (in dollars) paid by a driver
$x_1$ = the driving experience (in years) of a driver
$x_2$ = the number of driving violations committed by a driver during the past three years

We are to estimate the regression model

$$y = A + B_1x_1 + B_2x_2 + \epsilon \qquad (3)$$

The first step is to enter the data of Table 14.1 into a MINITAB spreadsheet, as shown in Screen 14.1. Here we have entered the given data in columns C1, C2, and C3 and named them Monthly Premium, Driving Experience, and Driving Violations, respectively.

Screen 14.1 (MINITAB data window)

To obtain the estimated regression equation, select Stat > Regression > Regression. In the dialog box that opens, enter Monthly Premium in the Response box, and Driving Experience and Driving Violations in the Predictors box, as shown in Screen 14.2. Note that you can enter the column names C1, C2, and C3 instead of the variable names in these boxes. Click OK to obtain the output, which is shown in Screen 14.3. From the output given in Screen 14.3, the estimated regression equation is

$$\hat{y} = 110 - 2.75x_1 + 16.1x_2$$
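The chapter uses MINITAB, but the same fit can be cross-checked in other environments. Here is a sketch using Python's statsmodels package (our choice, not mentioned in the text) on the Table 14.1 data:

```python
import numpy as np
import statsmodels.api as sm

# Data from Table 14.1.
y  = np.array([148, 76, 100, 126, 194, 110, 114, 86, 198, 92, 70, 120])
x1 = np.array([  5, 14,   6,  10,   4,   8,  11, 16,   3,  9,  19,  13])  # experience
x2 = np.array([  2,  0,   1,   3,   6,   2,   3,  1,   5,  1,   0,   3])  # violations

X = sm.add_constant(np.column_stack([x1, x2]))   # adds the constant term a
fit = sm.OLS(y, X).fit()

print(fit.params)                       # approx. [110.28, -2.7473, 16.106]
print(np.sqrt(fit.scale))               # s_e, approx. 12.1459
print(fit.rsquared, fit.rsquared_adj)   # approx. 0.931 and 0.916
```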

Screen 14.2 (MINITAB Regression dialog box)

Screen 14.3 (MINITAB regression output; reproduced as text at the end of this section)

14.5.1 Estimated Multiple Regression Model

Example 14-2 describes, among other things, how the coefficients of the multiple regression model are interpreted.

EXAMPLE 14-2 (Interpreting parts of the MINITAB solution of multiple regression.)

Refer to Example 14-1 and the MINITAB solution given in Screen 14.3.
(a) Explain the meaning of the estimated regression coefficients.
(b) What are the values of the standard deviation of errors, the coefficient of multiple determination, and the adjusted coefficient of multiple determination?
(c) What is the predicted auto insurance premium paid per month by a driver with seven years of driving experience and three driving violations committed in the past three years?
(d) What is the point estimate of the expected (or mean) auto insurance premium paid per month by all drivers with 12 years of driving experience and 4 driving violations committed in the past three years?

Solution

(a) From the portion of the MINITAB solution marked I in Screen 14.3, the estimated regression equation is

$$\hat{y} = 110 - 2.75x_1 + 16.1x_2 \qquad (4)$$

From this equation, $a = 110$, $b_1 = -2.75$, and $b_2 = 16.1$. We can also read the values of these coefficients from the column labeled Coef in the portion of the output marked II in the MINITAB solution of Screen 14.3. From this column we obtain $a = 110.28$, $b_1 = -2.7473$, and $b_2 = 16.106$. Notice that in this column the coefficients of the regression equation appear with more digits after the decimal point. With these coefficient values, we can write the estimated regression equation as

$$\hat{y} = 110.28 - 2.7473x_1 + 16.106x_2 \qquad (5)$$

The value of $a = 110.28$ in the estimated regression equation (5) gives the value of $\hat{y}$ for $x_1 = 0$ and $x_2 = 0$. Thus, a driver with no driving experience and no driving violations committed in the past three years is expected to pay an auto insurance premium of $110.28 per month. Again, this is the technical interpretation of $a$. In reality, it may not hold, because none of the drivers in our sample has both zero experience and zero driving violations. As all of us know, some of the highest premiums are paid by teenagers who have just obtained their drivers' licenses.

The value of $b_1 = -2.7473$ in the estimated regression model gives the change in $\hat{y}$ for a one-unit change in $x_1$ when $x_2$ is held constant. Thus, a driver with one extra year of experience but the same number of driving violations is expected to pay $2.7473 (or $2.75) less per month for the auto insurance premium. Note that because $b_1$ is negative, an increase in driving experience decreases the premium paid; in other words, $y$ and $x_1$ have a negative relationship.

The value of $b_2 = 16.106$ in the estimated regression model gives the change in $\hat{y}$ for a one-unit change in $x_2$ when $x_1$ is held constant. Thus, a driver with one extra driving violation during the past three years but the same years of driving experience is expected to pay $16.106 (or $16.11) more per month for the auto insurance premium.
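Since equation (5) is just arithmetic, it can be evaluated directly for any driver. This small sketch (the function name premium is ours) anticipates the predictions worked out in parts (c) and (d) below:

```python
def premium(x1, x2):
    """Estimated monthly premium from equation (5)."""
    return 110.28 - 2.7473 * x1 + 16.106 * x2

print(round(premium(7, 3), 2))    # 139.37: 7 years' experience, 3 violations
print(round(premium(12, 4), 2))   # 141.74: 12 years' experience, 4 violations
```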

(b) The values of the standard deviation of errors, the coefficient of multiple determination, and the adjusted coefficient of multiple determination are given in part III of the MINITAB solution of Screen 14.3. From this part of the solution,

$$s_e = 12.1459, \quad R^2 = 93.1\%, \quad \bar{R}^2 = 91.6\%$$

Thus, the standard deviation of errors is 12.1459. The value of $R^2 = 93.1\%$ tells us that the two independent variables, years of driving experience and number of driving violations, explain 93.1% of the variation in the auto insurance premiums. The value of $\bar{R}^2 = 91.6\%$ is the coefficient of multiple determination adjusted for degrees of freedom; it states that, when adjusted for degrees of freedom, the two independent variables explain 91.6% of the variation in the dependent variable. Indeed, applying the formula from Section 14.4, $\bar{R}^2 = 1 - (1 - .931)(11/9) = .916$, or 91.6%.

(c) To find the predicted auto insurance premium paid per month by a driver with seven years of driving experience and three driving violations during the past three years, we substitute $x_1 = 7$ and $x_2 = 3$ in the estimated regression equation (5). Thus,

$$\hat{y} = 110.28 - 2.7473(7) + 16.106(3) = \$139.37$$

Note that this value of $\hat{y}$ is a point estimate of the predicted value of $y$, which is denoted by $y_p$. The concept of the predicted value of $y$ is the same as that for a simple linear regression model, discussed in Section 13.8.2 of Chapter 13.

(d) To obtain the point estimate of the expected (mean) auto insurance premium paid per month by all drivers with 12 years of driving experience and four driving violations during the past three years, we substitute $x_1 = 12$ and $x_2 = 4$ in the estimated regression equation (5). Thus,

$$\hat{y} = 110.28 - 2.7473(12) + 16.106(4) = \$141.74$$

This value of $\hat{y}$ is a point estimate of the mean value of $y$, which is denoted by $E(y)$ or $\mu_{y|x_1, x_2}$. The concept of the mean value of $y$ is the same as that for a simple linear regression model, discussed in Section 13.8.1 of Chapter 13.

14.5.2 Confidence Interval for an Individual Coefficient

The values of $a, b_1, b_2, b_3, \ldots, b_k$ obtained by estimating model (1) using sample data give the point estimates of $A, B_1, B_2, B_3, \ldots, B_k$, respectively, which are the population parameters. Using these sample statistics, we can construct confidence intervals for the corresponding population parameters. Because of the assumption that the errors are normally distributed, the sampling distribution of each $b_i$ is normal with mean equal to $B_i$ and standard deviation equal to $\sigma_{b_i}$. For example, the sampling distribution of $b_1$ is normal with mean $B_1$ and standard deviation $\sigma_{b_1}$. However, $\sigma_\epsilon$ is usually not known and, hence, we cannot find $\sigma_{b_i}$. Consequently, we use $s_{b_i}$ as an estimator of $\sigma_{b_i}$ and use the $t$ distribution to determine a confidence interval for $B_i$. The formula to obtain a confidence interval for a population parameter $B_i$ is given below. It is the same formula we used to make a confidence interval for $B$ in Section 13.5.2 of Chapter 13; the only difference is that for a particular $B_i$ of a multiple regression model, the degrees of freedom are $n - k - 1$.

Confidence Interval for $B_i$: The $(1 - \alpha)100\%$ confidence interval for $B_i$ is given by

$$b_i \pm t\,s_{b_i}$$

The value of $t$ used in this formula is obtained from the $t$ distribution table for $\alpha/2$ area in the right tail of the $t$ distribution curve and $(n - k - 1)$ degrees of freedom.
The values of $b_i$ and $s_{b_i}$ are obtained from the computer solution.
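As a numerical check of this formula, the following sketch (assuming SciPy is available; the values of $b_1$ and $s_{b_1}$ are read from Screen 14.3, as in Example 14-3 below) computes the 95% interval for $B_1$:

```python
from scipy import stats

b1, s_b1 = -2.7473, 0.9770     # Coef and SE Coef for Driving Exper (Screen 14.3)
n, k, conf = 12, 2, 0.95

t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - k - 1)   # about 2.262 for df = 9
lower, upper = b1 - t_crit * s_b1, b1 + t_crit * s_b1
print(round(lower, 4), round(upper, 4))   # approx. -4.957 and -0.537
```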

Example 14-3 describes the procedure for making a confidence interval for an individual regression coefficient $B_i$.

EXAMPLE 14-3 (Making a confidence interval for an individual coefficient of a multiple regression model.)

Determine a 95% confidence interval for $B_1$ (the coefficient of driving experience) for the multiple regression of auto insurance premium on driving experience and the number of driving violations. Use the MINITAB solution of Screen 14.3.

Solution: To make a confidence interval for $B_1$, we use the portion marked II in the MINITAB solution of Screen 14.3. From that portion of the MINITAB solution,

$$b_1 = -2.7473 \quad \text{and} \quad s_{b_1} = .9770$$

Note that the value of the standard deviation of $b_1$, $s_{b_1} = .9770$, is given in the column labeled SE Coef in part II of the MINITAB solution. The confidence level is 95%. The sample size is 12, which gives $n = 12$, and because there are two independent variables, $k = 2$. Therefore,

Area in each tail of the $t$ distribution $= \alpha/2 = (1 - .95)/2 = .025$
Degrees of freedom $= n - k - 1 = 12 - 2 - 1 = 9$

From the $t$ distribution table (Table V of Appendix C), the value of $t$ for .025 area in the right tail of the $t$ distribution curve and 9 degrees of freedom is 2.262. Then, the 95% confidence interval for $B_1$ is

$$b_1 \pm t\,s_{b_1} = -2.7473 \pm 2.262(.9770) = -2.7473 \pm 2.2100 = -4.9573 \text{ to } -.5373$$

Thus, the 95% confidence interval for $B_1$ is $-4.96$ to $-.54$. That is, we can state with 95% confidence that for one extra year of driving experience, the monthly auto insurance premium changes by an amount between $-\$4.96$ and $-\$.54$. Since both limits of the confidence interval are negative, we can also state that for each extra year of driving experience, the monthly auto insurance premium decreases by an amount between $.54 and $4.96.

By applying the procedure used in Example 14-3, we can make a confidence interval for any of the coefficients (including the constant term) of a multiple regression model, such as $A$ and $B_2$ in model (3). For example, the 95% confidence intervals for $A$ and $B_2$, respectively, are

$$a \pm t\,s_a = 110.28 \pm 2.262(14.62) = 77.21 \text{ to } 143.35$$
$$b_2 \pm t\,s_{b_2} = 16.106 \pm 2.262(2.613) = 10.20 \text{ to } 22.02$$

14.5.3 Testing a Hypothesis about an Individual Coefficient

We can perform a test of hypothesis about any coefficient $B_i$ of the regression model (1) using the same procedure that we used to make a test of hypothesis about $B$ for a simple regression model in Section 13.5.3 of Chapter 13. The only difference is that the degrees of freedom are equal to $n - k - 1$ for a multiple regression model. Again, because of the assumption that the errors are normally distributed, the sampling distribution of each $b_i$ is normal with mean equal to $B_i$ and standard deviation equal to $\sigma_{b_i}$. However, $\sigma_\epsilon$ is usually not known and, hence, we cannot find $\sigma_{b_i}$. Consequently, we use $s_{b_i}$ as an estimator of $\sigma_{b_i}$, and use the $t$ distribution to perform the test.

Test Statistic for $b_i$: The value of the test statistic $t$ for $b_i$ is calculated as

$$t = \frac{b_i - B_i}{s_{b_i}}$$

The value of $B_i$ is substituted from the null hypothesis. Usually, but not always, the null hypothesis is $H_0$: $B_i = 0$. The MINITAB solution contains this value of the $t$ statistic.

Example 14-4 illustrates the procedure for testing a hypothesis about a single coefficient.

EXAMPLE 14-4 (Testing a hypothesis about a coefficient of a multiple regression model.)

Using the 2.5% significance level, can you conclude that the coefficient of the number of years of driving experience in regression model (3) is negative? Use the MINITAB output obtained in Example 14-1 and shown in Screen 14.3 to perform this test.

Solution: From Example 14-1, our multiple regression model (3) is

$$y = A + B_1x_1 + B_2x_2 + \epsilon$$

where $y$ is the monthly auto insurance premium (in dollars) paid by a driver, $x_1$ is the driving experience (in years), and $x_2$ is the number of driving violations committed during the past three years. From the MINITAB solution, the estimated regression equation is

$$\hat{y} = 110.28 - 2.7473x_1 + 16.106x_2$$

To conduct a test of hypothesis about $B_1$, we use the portion marked II in the MINITAB solution given in Screen 14.3. From that portion of the MINITAB solution,

$$b_1 = -2.7473 \quad \text{and} \quad s_{b_1} = .9770$$

Note that the value of the standard deviation of $b_1$, $s_{b_1} = .9770$, is given in the column labeled SE Coef in part II of the MINITAB solution. To make a test of hypothesis about $B_1$, we perform the following five steps.

Step 1. State the null and alternative hypotheses. We are to test whether or not the coefficient of the number of years of driving experience in regression model (3) is negative, that is, whether or not $B_1$ is negative. The two hypotheses are

$$H_0: B_1 = 0 \qquad H_1: B_1 < 0$$

Note that we can also write the null hypothesis as $H_0$: $B_1 \ge 0$, which states that the coefficient of the number of years of driving experience in regression model (3) is either zero or positive.

Step 2. Select the distribution to use. The sample size is small ($n < 30$) and $\sigma_\epsilon$ is not known. The sampling distribution of $b_1$ is normal because the errors are assumed to be normally distributed. Hence, we use the $t$ distribution to make a test of hypothesis about $B_1$.

Step 3. Determine the rejection and nonrejection regions. The significance level is .025. The $<$ sign in the alternative hypothesis indicates that the test is left-tailed. Therefore, the area in the left tail of the $t$ distribution curve is .025. The degrees of freedom are

$$df = n - k - 1 = 12 - 2 - 1 = 9$$

From the $t$ distribution table (Table V in Appendix C), the critical value of $t$ for 9 degrees of freedom and .025 area in the left tail of the $t$ distribution curve is $-2.262$, as shown in Figure 14.1.

Figure 14.1: The rejection region (area $\alpha = .025$) lies to the left of the critical value $t = -2.262$; the nonrejection region lies to its right.

Step 4. Calculate the value of the test statistic and the p-value. The value of the test statistic $t$ for $b_1$ can be obtained from the MINITAB solution given in Screen 14.3. It appears in the column labeled T and the row named Driving Experience in the portion marked II of that MINITAB solution. Thus, the observed value of $t$ is $-2.81$. Also, in the same portion of the MINITAB solution, the p-value for this test is given in the column labeled P and the row named Driving Experience. This p-value is .020. However, MINITAB always gives the p-value for a two-tailed test. Because our test is one-tailed, the p-value for our test is

$$p\text{-value} = .020/2 = .010$$

Step 5. Make a decision. The value of the test statistic, $t = -2.81$, is less than the critical value of $t = -2.262$, so it falls in the rejection region. Consequently, we reject the null hypothesis and conclude that the coefficient of $x_1$ in regression model (3) is negative; that is, an increase in driving experience decreases the auto insurance premium. Also, the p-value for the test is .010, which is less than the significance level of $\alpha = .025$. Hence, based on this p-value as well, we reject the null hypothesis and conclude that $B_1$ is negative.

Note that the observed value of $t$ in Step 4 of Example 14-4 can be taken from the MINITAB solution only if the null hypothesis is $H_0$: $B_1 = 0$. If the null hypothesis sets $B_1$ equal to a number other than zero, then the $t$ value given in the MINITAB solution is no longer valid. For example, suppose the null hypothesis in Example 14-4 is $H_0$: $B_1 = -2$ and the alternative hypothesis is $H_1$: $B_1 < -2$. In this case the observed value of $t$ is calculated as

$$t = \frac{b_1 - B_1}{s_{b_1}} = \frac{-2.7473 - (-2)}{.9770} = -.765$$

To calculate this value of $t$, the values of $b_1$ and $s_{b_1}$ are obtained from the MINITAB solution of Screen 14.3, and the value of $B_1$ is substituted from $H_0$.

EXERCISES: CONCEPTS AND PROCEDURES

14.1 How are the coefficients of independent variables in a multiple regression model interpreted? Explain.

14.2 What are the degrees of freedom for a multiple regression model to make inferences about individual parameters?

Screen 14.3 (MINITAB output, reproduced as text):

Correlations: Monthly Prem, Driving Exper, No. of violations

                   Monthly Prem   Driving Exper
Driving Exper         -0.800
                       0.002

No. of violations      0.933         -0.660
                       0.000          0.020

Cell Contents: Pearson correlation
               P-Value

Regression Analysis: Monthly Prem versus Driving Exper, No. of violation

The regression equation is
Monthly Prem = 110 - 2.75 Driving Exper + 16.1 No. of violations

Predictor            Coef   SE Coef      T      P    VIF
Constant           110.28     14.62   7.54  0.000
Driving Exper     -2.7473    0.9770  -2.81  0.020  1.771
No. of violations  16.106     2.613   6.16  0.000  1.771

S = 12.1459   R-Sq = 93.1%   R-Sq(adj) = 91.6%

Analysis of Variance

Source          DF       SS      MS      F      P
Regression       2  17961.3  8980.6  60.88  0.000
Residual Error   9   1327.7   147.5
Total           11  19289.0

Source             DF   Seq SS
Driving Exper       1  12357.8
No. of violations   1   5603.5

Predicted Values for New Observations

New Obs     Fit  SE Fit          95% CI           95% PI
      1  115.02    3.55  (106.98, 123.05)  (86.39, 143.64)

Values of Predictors for New Observations

New Obs  Driving Exper  No. of violations
      1           10.0               2.00
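The T and P columns in this output follow directly from the Coef and SE Coef columns. A short sketch (assuming SciPy) reproduces them for $H_0$: $B_i = 0$ with $df = 9$; halving a two-tailed p-value gives the one-tailed value used in Example 14-4:

```python
from scipy import stats

rows = [("Constant",          110.28,   14.62),
        ("Driving Exper",     -2.7473,  0.9770),
        ("No. of violations",  16.106,  2.613)]

for name, coef, se_coef in rows:
    t = coef / se_coef                  # test statistic for H0: B_i = 0
    p = 2 * stats.t.sf(abs(t), df=9)    # two-tailed p-value, as MINITAB reports
    print(f"{name:18s} T = {t:6.2f}   P = {p:.3f}")
# Expected output, approximately: 7.54 / 0.000, -2.81 / 0.020, 6.16 / 0.000
```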
