Chapter 3: Multiple Regression

August 14, 2018
1 The multiple linear regression model

The model
$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \epsilon \tag{1}$$
is called a multiple linear regression model with $k$ regressors. The parameters $\beta_j$, $j = 0, 1, \ldots, k$, are called the regression coefficients. This model describes a hyperplane in the $k$-dimensional space of the regressor variables $x_j$. The parameter $\beta_j$ represents the expected change in the response $y$ per unit change in $x_j$ when all of the remaining regressor variables $x_i$ ($i \neq j$) are held constant. For this reason the parameters $\beta_j$, $j = 1, 2, \ldots, k$, are often called partial regression coefficients.

To estimate the $\beta$'s in (1), we use a sample of $n$ observations on $y$ and the associated $x$'s. The model for the $i$th observation is
$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + e_i, \qquad i = 1, 2, \ldots, n.$$
The assumptions for $e_i$, or equivalently for $y_i$, are analogous to those for simple linear regression, namely:

1. $E(e_i) = 0$ for $i = 1, 2, \ldots, n$, or, equivalently, $E(y_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}$.
2. $\mathrm{var}(e_i) = \sigma^2$ for $i = 1, 2, \ldots, n$, or, equivalently, $\mathrm{var}(y_i) = \sigma^2$.
3. $\mathrm{cov}(e_i, e_j) = 0$ for all $i \neq j$, or, equivalently, $\mathrm{cov}(y_i, y_j) = 0$.
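To make the model and its assumptions concrete, here is a minimal R sketch (simulated data with made-up coefficients, not the fuel data used later) that generates observations satisfying assumptions 1-3 and fits the model by least squares:

```r
# Simulate y = beta0 + beta1*x1 + beta2*x2 + e with iid N(0, sigma^2) errors,
# then estimate the regression coefficients with lm().
set.seed(1)
n  <- 100
x1 <- runif(n)
x2 <- runif(n)
e  <- rnorm(n, mean = 0, sd = 2)   # E(e) = 0, var(e) = sigma^2, cov(e_i, e_j) = 0
y  <- 3 + 1.5 * x1 - 2 * x2 + e
fit <- lm(y ~ x1 + x2)             # least squares fit of the model
coef(fit)                          # estimates of (beta0, beta1, beta2)
```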
2 Terms and Predictors

Regression problems start with a collection of potential predictors, which are either continuous or discrete. From the pool of potential predictors, we create a set of terms, which are the $X$-variables that appear in (1). The terms might include the following (a short sketch after this list shows how they become columns of the model matrix):
- The intercept. The mean function can be rewritten as $E(Y \mid X) = \beta_0 X_0 + \beta_1 X_1 + \cdots + \beta_p X_p$, where $X_0$ is a term that is always equal to one.
- Transformations of predictors. Sometimes the original predictors need to be transformed in some way to make (1) hold to a reasonable approximation.
- Polynomials. Problems with curved mean functions can sometimes be accommodated in the multiple linear regression model by including polynomial terms in the predictor variables.
- Interactions and other combinations of predictors. Products of predictors, called interactions, are often included in a mean function along with the original predictors to allow for joint effects of two or more variables.
- Dummy variables and factors. A categorical predictor with two or more levels is called a factor. Factors are included in multiple linear regression using dummy variables, which are typically terms that have only two values, often zero and one, indicating which category is present for a particular observation.

A regression with $k$ predictors may therefore combine to give fewer than $k$ terms or expand to require more than $k$ terms.
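The sketch below (hypothetical data, purely for illustration) uses R's model.matrix() to show how predictors turn into terms: an intercept column of ones, a polynomial term, dummy variables for a factor, and an interaction.

```r
# Three predictors (x, g, z) expand into five terms plus the intercept:
# x, the polynomial term I(x^2), two dummy variables for the 3-level
# factor g, and the interaction x:z.
d <- data.frame(x = 1:6,
                g = factor(rep(c("a", "b", "c"), 2)),
                z = c(0.3, -1.2, 0.5, 2.0, -0.7, 1.1))
model.matrix(~ x + I(x^2) + g + x:z, data = d)
```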
Figure 1 shows the scatterplot matrix for the fuel consumption data. In this plot, the relationships between all pairs of terms appear to be very weak, suggesting that for this problem the marginal plots including Fuel are quite informative about the multiple linear regression problem.
[Figure 1: Scatterplot matrix of Tax, Dlic, Income, logMiles, and Fuel for the fuel consumption data.]
A more traditional, and less informative, summary of the two-variable relationships is the matrix of sample correlations, shown in Table 1. In this instance, the correlation matrix helps to reinforce the relationships we see in the scatterplot matrix, with fairly small correlations between the predictors and Fuel, and essentially no correlation between the predictors themselves.
[Table 1: Sample correlations for the fuel data, among Tax, Dlic, Income, logMiles, and Fuel.]
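Both summaries can be reproduced in R. The sketch below assumes the fuel data are built from the fuel2001 data frame in the alr4 package; the variable constructions and scalings are assumptions and may differ from the version used in the text.

```r
# Sketch: construct the fuel data and reproduce Figure 1 and Table 1.
library(alr4)                        # assumed source of the fuel2001 data
fuel <- with(fuel2001, data.frame(
  Tax      = Tax,                    # state gasoline tax rate
  Dlic     = 1000 * Drivers / Pop,   # licensed drivers per 1000 population
  Income   = Income,                 # per-person income (scaling assumed)
  logMiles = log(Miles),             # log of federal-aid highway miles
  Fuel     = 1000 * FuelC / Pop))    # per-capita fuel consumption
pairs(fuel)                          # scatterplot matrix (cf. Figure 1)
round(cor(fuel), 3)                  # sample correlations (cf. Table 1)
```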
3 Ordinary Least Squares

3.1 Parameter estimation

In matrix notation, the model given by eq. (1) is
$$y = X\beta + \epsilon,$$
where
$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
X = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad
\epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}.$$
We wish to find the vector of least squares estimators, $\hat\beta$, that minimizes
$$S(\beta) = \sum_{i=1}^{n} e_i^2 = e'e = (y - X\beta)'(y - X\beta) = y'y - 2\beta'X'y + \beta'X'X\beta.$$
The least squares estimator must satisfy
$$\left.\frac{\partial S}{\partial \beta}\right|_{\hat\beta} = -2X'y + 2X'X\hat\beta = 0,$$
which simplifies to
$$X'X\hat\beta = X'y.$$
The above equations are the least squares normal equations. The least squares estimator of $\beta$ is
$$\hat\beta = (X'X)^{-1}X'y,$$
provided that the inverse matrix $(X'X)^{-1}$ exists. The matrix $(X'X)^{-1}$ will always exist if the regressors are linearly independent, that is, if no column of the $X$ matrix is a linear combination of the other columns.
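As a sketch with simulated data, $\hat\beta$ can be computed directly from the normal equations; solve(crossprod(X), crossprod(X, y)) solves $X'X\hat\beta = X'y$ without forming the inverse explicitly.

```r
set.seed(2)
n <- 50
X <- cbind(1, runif(n), runif(n))        # model matrix, intercept column of ones
y <- drop(X %*% c(2, -1, 3)) + rnorm(n)  # true beta = (2, -1, 3)
beta_hat <- solve(crossprod(X), crossprod(X, y))  # solves X'X b = X'y
beta_hat
all.equal(c(beta_hat), unname(coef(lm(y ~ X[, -1]))))  # agrees with lm()
```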
The fitted regression model corresponding to the levels of the regressor variables $x = [1, x_1, x_2, \ldots, x_k]'$ is
$$\hat y = x'\hat\beta = \hat\beta_0 + \sum_{j=1}^{k} \hat\beta_j x_j.$$
The vector of fitted values $\hat y$ is
$$\hat y = X\hat\beta = X(X'X)^{-1}X'y = Hy,$$
where the $n \times n$ matrix $H = X(X'X)^{-1}X'$ is usually called the hat matrix. The $n$ residuals may be conveniently written as
$$\hat e = y - \hat y = y - X\hat\beta = y - Hy = (I - H)y.$$

3.2 Properties of the least-squares estimators

Theorem 3.1 If $E(y) = X\beta$, then $\hat\beta$ is an unbiased estimator of $\beta$.

Theorem 3.2 If $\mathrm{cov}(y) = \sigma^2 I$, the covariance matrix of $\hat\beta$ is given by $\sigma^2(X'X)^{-1}$.

Theorem 3.3 (Gauss-Markov theorem) If $E(y) = X\beta$ and $\mathrm{cov}(y) = \sigma^2 I$, the least squares estimators $\hat\beta_j$, $j = 0, 1, \ldots, k$, have minimum variance among all linear unbiased estimators.
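A short sketch (simulated data again) computing the hat matrix, fitted values, and residuals directly from these formulas:

```r
set.seed(3)
n <- 50
X <- cbind(1, runif(n), runif(n))
y <- drop(X %*% c(2, -1, 3)) + rnorm(n)
H     <- X %*% solve(crossprod(X)) %*% t(X)  # hat matrix H = X (X'X)^{-1} X'
y_hat <- drop(H %*% y)                       # fitted values y_hat = H y
e_hat <- drop((diag(n) - H) %*% y)           # residuals e_hat = (I - H) y
all.equal(y_hat + e_hat, y)                  # y decomposes as H y + (I - H) y
```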
3.3 Estimation of $\sigma^2$

The residual sum of squares is
$$SS_{Res} = \sum_{i=1}^{n} (y_i - \hat y_i)^2 = \hat e'\hat e = (y - X\hat\beta)'(y - X\hat\beta) = y'(I - H)y.$$
The residual mean square is
$$MS_{Res} = \frac{SS_{Res}}{n - p},$$
where $p = k + 1$ is the number of terms, and it is an unbiased estimate of $\sigma^2$.

3.4 Fuel Consumption Data

Fit the fuel data by a multiple linear regression model with mean function
$$E(\mathrm{Fuel} \mid X) = \beta_0 + \beta_1\,\mathrm{Tax} + \beta_2\,\mathrm{Dlic} + \beta_3\,\mathrm{Income} + \beta_4\,\log(\mathrm{Miles}).$$
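In R this fit is a single lm() call; a sketch assuming the `fuel` data frame constructed earlier:

```r
m <- lm(Fuel ~ Tax + Dlic + Income + logMiles, data = fuel)
coef(m)    # beta_hat = (X'X)^{-1} X'y
sigma(m)   # residual standard error, sqrt(MS_Res), on n - p df
```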
The $5 \times 5$ matrix $(X'X)^{-1}$ is given by

[Table: the matrix $(X'X)^{-1}$, with rows and columns indexed by Intercept, Tax, Dlic, Income, and logMiles.]

The coefficients can then be calculated as
$$\hat\beta = (X'X)^{-1}X'y = (154.19,\ -4.228,\ 0.472,\ -6.135,\ 18.545)'.$$
The R output below gives the estimates $\hat\beta$ and their standard errors, computed from $\hat\sigma^2$ and the diagonal elements of $(X'X)^{-1}$.

[R output: coefficient table with Estimate, Std. Error, t value, and Pr(>|t|) for (Intercept), Tax, Dlic, Income, and logMiles; residual standard error on 46 degrees of freedom; multiple R-squared and adjusted R-squared; F-statistic on 4 and 46 DF, p-value: 9.33e-07.]
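The quantities in that table can be recovered by hand; a sketch, with `fuel` and `m` as above:

```r
XtXinv <- summary(m)$cov.unscaled   # the matrix (X'X)^{-1}
se     <- sqrt(diag(vcov(m)))       # std. errors: sqrt(sigma_hat^2 * C_jj)
cbind(Estimate = coef(m), Std.Error = se, t.value = coef(m) / se)
summary(m)                          # same table, plus R^2 and the overall F test
```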
4 The analysis of variance

The test for significance of regression is a test to determine whether there is a linear relationship between the response $y$ and any of the regressor variables $x_1, x_2, \ldots, x_k$. This procedure is often thought of as an overall or global test of model adequacy. The appropriate hypotheses are
$$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0 \quad \text{vs.} \quad H_1: \beta_j \neq 0 \text{ for at least one } j.$$
Rejection of this null hypothesis implies that at least one of the regressors $x_1, \ldots, x_k$ contributes significantly to the model. The test is based on the identity
$$SS_T = SS_R + SS_{Res},$$
where $SS_R = \hat\beta'X'y - n\bar y^2$, $SS_{Res} = y'y - \hat\beta'X'y$, and $SS_T = y'y - n\bar y^2$, and on the following ANOVA table.

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square    $F_0$
Regression            $SS_R$           $k$                  $MS_R$         $MS_R / MS_{Res}$
Residual              $SS_{Res}$       $n - k - 1$          $MS_{Res}$
Total                 $SS_T$           $n - 1$

Table 2: Analysis of Variance (ANOVA) for testing significance of regression.

Therefore, to test the null hypothesis, compute the test statistic $F_0$ and reject $H_0$ if $F_0 > F_{\alpha,\,k,\,n-k-1}$.
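A sketch of this global test computed from the sums of squares (using `fuel` and `m` from above):

```r
y      <- fuel$Fuel
n      <- length(y)
k      <- length(coef(m)) - 1               # number of regressors
SS_T   <- sum((y - mean(y))^2)
SS_Res <- sum(resid(m)^2)
SS_R   <- SS_T - SS_Res                     # from the identity SS_T = SS_R + SS_Res
F0     <- (SS_R / k) / (SS_Res / (n - k - 1))
pf(F0, k, n - k - 1, lower.tail = FALSE)    # Pr(> F0) for the global test
```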
4.1 $R^2$ and Adjusted $R^2$

$$R^2 = 1 - \frac{SS_{Res}}{SS_T}, \qquad R^2_{Adj} = 1 - \frac{SS_{Res}/(n - p)}{SS_T/(n - 1)}.$$

The adjusted $R^2$ penalizes us for adding terms that are not helpful, so it is very useful in evaluating and comparing candidate regression models.

The overall ANOVA table for the fuel data is given below.

[ANOVA table for the fuel data: rows Regression, Residual, and Total, with sums of squares, degrees of freedom, mean squares, and $F_0$.]

To get a significance level for the test, we compare $F_0$ with the $F(4, 46)$ distribution. Since the probability $\Pr(>F_0) = 9.33 \times 10^{-7}$ is a very small number, there is very strong evidence against the null hypothesis that the mean function does not depend on any of the terms. The value of $R^2$ indicates that about half the variation in Fuel is explained by the terms.
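Both quantities follow directly from the sums of squares; a sketch with `m` and `fuel` as above:

```r
n      <- nrow(fuel)
p      <- length(coef(m))                     # p = k + 1 terms
SS_Res <- sum(resid(m)^2)
SS_T   <- sum((fuel$Fuel - mean(fuel$Fuel))^2)
c(R2     = 1 - SS_Res / SS_T,
  R2_adj = 1 - (SS_Res / (n - p)) / (SS_T / (n - 1)))
```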
4.2 Tests on individual regression coefficients

The hypotheses for testing the significance of any individual regression coefficient, such as $\beta_j$, are
$$H_0: \beta_j = 0 \quad \text{vs.} \quad H_1: \beta_j \neq 0.$$
If $H_0$ is not rejected, this indicates that the regressor $x_j$ can be deleted from the model. The test statistic for this hypothesis is
$$t_0 = \frac{\hat\beta_j}{\sqrt{\hat\sigma^2 C_{jj}}} = \frac{\hat\beta_j}{\mathrm{se}(\hat\beta_j)},$$
where $C_{jj}$ is the diagonal element of $(X'X)^{-1}$ corresponding to $\hat\beta_j$. The null hypothesis is rejected if $|t_0| > t_{\alpha/2,\,n-k-1}$. Note that this is really a partial or marginal test because the regression coefficient $\hat\beta_j$ depends on all of the other regressor variables $x_i$ ($i \neq j$) that are in the model. Thus, this is a test of the contribution of $x_j$ given the other regressors in the model.
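A sketch computing the $t$ statistic for one coefficient (Tax) from $\hat\sigma^2$ and $C_{jj}$, with `m` as above:

```r
C  <- summary(m)$cov.unscaled                # (X'X)^{-1}
s2 <- sigma(m)^2                             # sigma_hat^2 = MS_Res
t0 <- coef(m)["Tax"] / sqrt(s2 * C["Tax", "Tax"])
t0                                           # matches the t value in summary(m)
2 * pt(abs(t0), df = df.residual(m), lower.tail = FALSE)  # two-sided p value
```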
Consider the regression model with $k$ regressors,
$$y = X\beta + e = X_1\beta_1 + X_2\beta_2 + e,$$
where $p = k + 1$, $\beta_1$ is a $(p - r)$-vector of coefficients, and $\beta_2$ is an $r$-vector of coefficients. We wish to test the hypotheses
$$H_0: \beta_2 = 0 \quad \text{vs.} \quad H_1: \beta_2 \neq 0.$$
To find the contribution of the terms in $\beta_2$ to the regression, fit the model assuming that the null hypothesis $H_0$ is true. The reduced model is
$$y = X_1\beta_1 + e.$$
The LS estimator of $\beta_1$ in the reduced model is $\hat\beta_1 = (X_1'X_1)^{-1}X_1'y$. The regression sum of squares is
$$SS_R(\beta_1) = \hat\beta_1'X_1'y - \left(\sum_{i=1}^{n} y_i\right)^{2}\!\big/\,n.$$
The regression sum of squares due to $\beta_2$, given that $\beta_1$ is already in the model, is
$$SS_R(\beta_2 \mid \beta_1) = SS_R(\beta) - SS_R(\beta_1),$$
with $p - (p - r) = r$ degrees of freedom. This sum of squares is called the extra sum of squares due to $\beta_2$ because it measures the increase in the regression sum of squares that results from adding the regressors $X_2$ to a model that already contains $X_1$. Now $SS_R(\beta_2 \mid \beta_1)$ is independent of $MS_{Res}$, and the null hypothesis $H_0: \beta_2 = 0$ may be tested by the statistic
$$F_0 = \frac{SS_R(\beta_2 \mid \beta_1)/r}{MS_{Res}}.$$
If $F_0 > F_{\alpha,\,r,\,n-p}$, we reject $H_0$, concluding that at least one of the parameters in $\beta_2$ is not zero and, consequently, that at least one of the regressors $x_{k-r+1}, \ldots, x_k$ in $X_2$ contributes significantly to the regression model. This is also called a partial $F$ test because it measures the contribution of the regressors in $X_2$ given that the other regressors in $X_1$ are in the model. For the fuel data, testing whether Tax can be dropped gives:

[R output: Analysis of Variance Table comparing Model 1: Fuel ~ Dlic + Income + logMiles with Model 2: Fuel ~ Tax + Dlic + Income + logMiles, with columns Res.Df, RSS, Df, Sum of Sq, F, and Pr(>F).]

Note that the $t$-statistic for Tax is $t = -2.083$, and $t^2 = (-2.083)^2 = 4.34$, the same as the $F$-statistic we just computed.
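This comparison is produced in R by fitting the reduced and full models and calling anova(); a sketch with `fuel` as above:

```r
m_full    <- lm(Fuel ~ Tax + Dlic + Income + logMiles, data = fuel)
m_reduced <- update(m_full, . ~ . - Tax)  # reduced model: H0 sets beta_Tax = 0
anova(m_reduced, m_full)                  # partial F test; here F = t^2 for Tax
```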
5 Confidence interval: estimation of the mean response

We may construct a confidence interval on the mean response at a particular point $x_0 = (1, x_{01}, \ldots, x_{0k})'$. The fitted value at this point is
$$\hat y_0 = x_0'\hat\beta.$$
This is an unbiased estimate of $E(y \mid x_0)$, and the variance of $\hat y_0$ is
$$\mathrm{Var}(\hat y_0) = \sigma^2\, x_0'(X'X)^{-1}x_0.$$
Therefore, a $100(1 - \alpha)$ percent confidence interval on the mean response at the point $x_0$ is
$$\hat y_0 - t_{\alpha/2,\,n-p}\sqrt{\hat\sigma^2\, x_0'(X'X)^{-1}x_0} \;\le\; E(y \mid x_0) \;\le\; \hat y_0 + t_{\alpha/2,\,n-p}\sqrt{\hat\sigma^2\, x_0'(X'X)^{-1}x_0}.$$
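With lm(), predict() computes this interval; in the sketch below the values in x0 are made up purely for illustration (`m` as above):

```r
x0 <- data.frame(Tax = 20, Dlic = 900, Income = 30, logMiles = 11)  # hypothetical point
predict(m, newdata = x0, interval = "confidence", level = 0.95)     # CI for E(y | x0)
```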
6 Prediction of new observations

A point estimate of the future observation $y_0$ at the point $x_0$ is $\hat y_0 = x_0'\hat\beta$. A $100(1 - \alpha)$ percent prediction interval for this future observation is
$$\hat y_0 - t_{\alpha/2,\,n-p}\sqrt{\hat\sigma^2\,(1 + x_0'(X'X)^{-1}x_0)} \;\le\; y_0 \;\le\; \hat y_0 + t_{\alpha/2,\,n-p}\sqrt{\hat\sigma^2\,(1 + x_0'(X'X)^{-1}x_0)}.$$
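The corresponding prediction interval uses interval = "prediction"; the extra 1 inside the square root makes it wider than the confidence interval (same hypothetical x0 and `m` as above):

```r
predict(m, newdata = x0, interval = "prediction", level = 0.95)  # PI for a new y0
```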