Ch 3: Multiple Linear Regression

1. Multiple Linear Regression Model

A multiple regression model has more than one regressor. For example, suppose we have one response variable and two regressor variables:

1. delivery time (y), the response
2. the number of cases of product stocked (x_1)
3. the distance walked by the route driver (x_2)

With these variables, the regression model is
$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon, $$
where we assume that the errors are i.i.d. $N(0, \sigma^2)$.

In general, a multiple linear regression model with k regressor variables is
$$ y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon, \qquad \varepsilon \ \text{i.i.d.} \ N(0, \sigma^2). $$

Example with the delivery time data: plot the scatter diagram (matrix).

[Figure: scatter plot matrix of d.time, case, and distance for the delivery time data.]
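A minimal R sketch that reproduces the scatter plot matrix above. The data frame name delivery is an assumption; the column names d.time, case, and distance follow the figure labels.

# scatter plot matrix of the response and the two regressors
pairs(delivery[, c("d.time", "case", "distance")])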
Data for multiple linear regression:

Observation (i)   Response (y_i)   Regressors
1                 y_1              x_11  x_12  ...  x_1k
2                 y_2              x_21  x_22  ...  x_2k
...               ...              ...
n                 y_n              x_n1  x_n2  ...  x_nk

With the data above, the multiple regression model is
$$ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i, \qquad i = 1, 2, \ldots, n. $$

Multiple regression model (matrix notation): the model is
$$ y = X\beta + \varepsilon, $$
where
$$ y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}. $$

In general, y is an n × 1 vector of observations, X is an n × p (p = k + 1) matrix of the regressor variables, β is a p × 1 vector of the regression coefficients, and ε is an n × 1 vector of random errors. The assumption on the errors is $\varepsilon \sim N(0, \sigma^2 I_n)$.
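In R, the model matrix X (including its leading column of 1s) can be built directly from the model formula; a sketch, again assuming the delivery data frame:

X <- model.matrix(d.time ~ case + distance, data = delivery)  # n x p, p = k + 1 = 3
y <- delivery$d.time
head(X)  # first column is the intercept column of 1s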
2. Least-Squares Estimation

In general, the multiple linear regression model is
$$ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i \quad (i = 1, 2, \ldots, n), $$
or, in matrix form,
$$ y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I_n). $$

We wish to find $\hat\beta$ that minimizes
$$ S(\beta) = \sum_{i=1}^n \varepsilon_i^2 = \varepsilon'\varepsilon = (y - X\beta)'(y - X\beta). $$

S(β) can be expressed as
$$ S(\beta) = (y - X\beta)'(y - X\beta) = y'y - \beta'X'y - y'X\beta + \beta'X'X\beta = y'y - 2\beta'X'y + \beta'X'X\beta, $$
since the scalar $y'X\beta$ equals its transpose $\beta'X'y$.

The LS estimator must satisfy
$$ \left.\frac{\partial S}{\partial \beta}\right|_{\hat\beta} = -2X'y + 2X'X\hat\beta = 0, $$
which gives
$$ X'X\hat\beta = X'y. $$
These are called the least-squares normal equations. The least-squares estimator of β is
$$ \hat\beta = (X'X)^{-1}X'y, $$
provided that $(X'X)^{-1}$ exists.
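A sketch of the estimator in R, solving the normal equations with the X and y built above (lm() computes the same thing, more stably, via a QR decomposition):

# solve (X'X) beta = X'y rather than inverting X'X explicitly
beta.hat <- solve(crossprod(X), crossprod(X, y))
beta.hat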
The fitted regression model is
$$ \hat y = X\hat\beta = X(X'X)^{-1}X'y = Hy, $$
where the n × n matrix $H = X(X'X)^{-1}X'$ is called the hat matrix.

The residual is
$$ e = y - \hat y = y - X\hat\beta = y - Hy = (I - H)y. $$

Example (the delivery time data). The multiple regression model is
$$ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i \quad (i = 1, 2, \ldots, 25), $$
i.e. $y = X\beta + \varepsilon$. The least-squares estimate of β is
$$ \hat\beta = \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = \begin{pmatrix} 2.3412 \\ 1.6159 \\ 0.0143 \end{pmatrix}, $$
so the least-squares fit is
$$ \hat y = 2.3412 + 1.6159 x_1 + 0.0143 x_2. $$

Properties of the least-squares estimators: under the assumptions $E(\varepsilon) = 0$ and $V(\varepsilon) = \sigma^2 I_n$,
$$ E(\hat\beta) = \beta \quad\text{and}\quad V(\hat\beta) = \sigma^2 (X'X)^{-1}. $$
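A sketch of the fitted values and residuals in R, both through the hat matrix and through lm() (delivery assumed as before):

H <- X %*% solve(crossprod(X)) %*% t(X)  # hat matrix
y.hat <- H %*% y                         # fitted values
e <- y - y.hat                           # residuals
fit <- lm(d.time ~ case + distance, data = delivery)
all.equal(as.vector(y.hat), unname(fitted(fit)))  # TRUE: same fit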
Estimation of σ²

$$ SS_E = \sum_{i=1}^n e_i^2 = e'e = (y - X\hat\beta)'(y - X\hat\beta) = y'y - 2\hat\beta'X'y + \hat\beta'X'X\hat\beta = y'y - \hat\beta'X'y, $$
where the last equality uses the normal equations $X'X\hat\beta = X'y$.

SS_E has n − (k + 1) degrees of freedom. The estimator of σ² is
$$ \hat\sigma^2 = MS_E = \frac{SS_E}{n - k - 1}. $$
Note that MS_E is model dependent. MS_E is an unbiased estimator of σ², that is, $E(MS_E) = \sigma^2$.

Example (the delivery time data):
$$ SS_E = y'y - \hat\beta'X'y = 18310.63 - 18076.90 = 233.726, $$
so the estimator is
$$ \hat\sigma^2 = \frac{233.726}{25 - 2 - 1} = 10.6239. $$
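The same computation in R (X, y, and beta.hat as above):

SSE <- sum((y - X %*% beta.hat)^2)  # equals t(y) %*% y - t(beta.hat) %*% t(X) %*% y
n <- nrow(X); p <- ncol(X)          # p = k + 1
MSE <- SSE / (n - p)                # 233.726 / 22 = 10.6239
sqrt(MSE)                           # residual standard error, 3.259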
Output from R (the delivery time data):

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.341231   1.096730   2.135 0.044170 *
case        1.615907   0.170735   9.464 3.25e-09 ***
distance    0.014385   0.003613   3.981 0.000631 ***

Residual standard error: 3.259 on 22 degrees of freedom

3. Hypothesis testing in multiple linear regression

Test for significance of regression. The appropriate hypotheses are
$$ H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0 \quad\text{vs.}\quad H_1: \beta_j \neq 0 \ \text{for at least one } j. $$
Rejecting the null hypothesis means that at least one of the regressors $x_1, x_2, \ldots, x_k$ contributes significantly to the model.

To test, perform the ANOVA as follows (an R sketch follows these steps):

1. Decompose SS_T into SS_R and SS_E, that is,
$$ y'y - \frac{1}{n}\Big(\sum_{i=1}^n y_i\Big)^2 = \Big\{\hat\beta'X'y - \frac{1}{n}\Big(\sum_{i=1}^n y_i\Big)^2\Big\} + \big\{y'y - \hat\beta'X'y\big\}. $$

2. Consider the degrees of freedom for each source of variation:
$$ df_{SS_T} = df_{SS_R} + df_{SS_E}, \qquad (n - 1) = k + (n - k - 1). $$

3. Compute the mean squares
$$ MS_R = \frac{SS_R}{k} \quad\text{and}\quad MS_E = \frac{SS_E}{n - k - 1}. $$

4. Obtain the F statistic
$$ F_0 = \frac{MS_R}{MS_E}. $$

5. Decision rule: reject $H_0$ if $F_0 > F_{\alpha, k, n-k-1}$.
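A sketch of the overall F test in R, reusing SSE, n, and p from above:

SST <- sum((y - mean(y))^2)
SSR <- SST - SSE
k <- p - 1
F0 <- (SSR / k) / (SSE / (n - p))  # 2775.408 / 10.6239 = 261.24
qf(0.95, k, n - p)                 # critical value F_{0.05, 2, 22} = 3.44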
ANOVA table:

Source of Variation   SS                           DF          MS     F_0
Regression            β̂'X'y − (1/n)(Σ y_i)²       k           MS_R   MS_R / MS_E
Error                 y'y − β̂'X'y                 n − k − 1   MS_E
Total                 y'y − (1/n)(Σ y_i)²          n − 1

Example (the delivery time data):

Source of Variation   SS          DF   MS         F_0
Regression            5550.8166   2    2775.408   261.24
Error                 233.726     22   10.6239
Total                 5784.54     24

Note that $F_{0.05, 2, 22} = 3.44$; since $F_0 = 261.24 > 3.44$, the regression is significant.

Tests on individual regression coefficients. To test the significance of any individual regression coefficient, consider
$$ H_0: \beta_j = 0 \quad\text{vs.}\quad H_1: \beta_j \neq 0. $$
The test statistic for this hypothesis is
$$ t_0 = \frac{\hat\beta_j}{\sqrt{\hat\sigma^2 C_{jj}}} = \frac{\hat\beta_j}{se(\hat\beta_j)}, $$
where $C_{jj}$ is the diagonal element of $(X'X)^{-1}$ corresponding to $\hat\beta_j$. Decision rule: reject $H_0$ if $|t_0| > t_{\alpha/2, n-k-1}$.
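The same t statistics, computed by hand in R from the diagonal of $(X'X)^{-1}$; they match the summary(fit) output shown next:

C <- solve(crossprod(X))          # (X'X)^{-1}
se <- sqrt(MSE * diag(C))         # standard errors of the coefficients
t0 <- as.vector(beta.hat) / se    # t statistics
2 * pt(abs(t0), df = n - p, lower.tail = FALSE)  # two-sided p-values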
Output from R (the delivery time data):

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.341231   1.096730   2.135 0.044170 *
case        1.615907   0.170735   9.464 3.25e-09 ***
distance    0.014385   0.003613   3.981 0.000631 ***

Residual standard error: 3.259 on 22 degrees of freedom
Multiple R-Squared: 0.9596, Adjusted R-squared: 0.9559
F-statistic: 261.2 on 2 and 22 DF, p-value: 4.441e-16

Coefficient of multiple determination. The R² is defined as
$$ R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_E}{SS_T}. $$
In general, R² always increases when a regressor is added to the model, regardless of the contribution of that variable. For this reason, one often uses an adjusted R², defined as
$$ R^2_{adj} = 1 - \Big(\frac{n - 1}{n - k - 1}\Big)\frac{SS_E}{SS_T}. $$

Example (the delivery time data):
$$ R^2_{adj} = 1 - \Big(\frac{24}{22}\Big)\frac{233.726}{5784.54} = 0.9559. $$
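A quick R check of both quantities (SSE and SST as computed above):

R2 <- 1 - SSE / SST                               # 0.9596
R2.adj <- 1 - (SSE / (n - p)) / (SST / (n - 1))   # 0.9559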
4. Confidence Intervals in Multiple Regression

C.I. on the regression coefficients. The statistic
$$ \frac{\hat\beta_j - \beta_j}{\hat\sigma\sqrt{C_{jj}}}, \qquad j = 0, 1, \ldots, k, $$
is distributed as t with n − (k + 1) df, so a 100(1 − α) percent C.I. for β_j is
$$ \hat\beta_j - t_{\alpha/2, n-(k+1)}\, se(\hat\beta_j) \le \beta_j \le \hat\beta_j + t_{\alpha/2, n-(k+1)}\, se(\hat\beta_j). $$

Example (the delivery time data): find a C.I. for β₁. With $\hat\beta_1 = 1.6159$, $C_{11} = 0.00274378$, and $\hat\sigma^2 = 10.6239$, we obtain
$$ 1.26181 \le \beta_1 \le 1.97001. $$

C.I. on the mean response at a particular point $x_0 = (1, x_{01}, \ldots, x_{0k})'$. The fitted value at this point is $\hat y_0 = x_0'\hat\beta$, with
$$ E(\hat y_0) = x_0'\beta = E(y \mid x_0) \quad\text{and}\quad Var(\hat y_0) = \sigma^2\, x_0'(X'X)^{-1}x_0. $$
The statistic
$$ \frac{\hat y_0 - E(y \mid x_0)}{\hat\sigma\sqrt{x_0'(X'X)^{-1}x_0}} $$
is distributed as t with n − (k + 1) df, so a 100(1 − α) percent C.I. on the mean response at $x_0$ is
$$ \hat y_0 - t_{\alpha/2, n-(k+1)}\,\hat\sigma\sqrt{x_0'(X'X)^{-1}x_0} \le E(y \mid x_0) \le \hat y_0 + t_{\alpha/2, n-(k+1)}\,\hat\sigma\sqrt{x_0'(X'X)^{-1}x_0}. $$

Example (the delivery time data): find a C.I. on the mean response at $x_0 = (1, 8, 275)'$:
$$ 17.66 \le E(y \mid x_0) \le 20.78. $$
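Both intervals are available directly in R (fit as above; the new point comes from the example):

confint(fit, level = 0.95)                           # C.I.s for the coefficients
x0 <- data.frame(case = 8, distance = 275)
predict(fit, newdata = x0, interval = "confidence")  # C.I. on the mean response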
5. Extra sum of squares method for testing

Introduction. The goal of this section is to investigate the contribution of a subset of the regressor variables to the model.

Let the vector of regression coefficients be partitioned as
$$ \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}. $$
We wish to test the hypotheses
$$ H_0: \beta_2 = 0 \quad\text{vs.}\quad H_1: \beta_2 \neq 0. $$
The full model may be written as
$$ y = X\beta + \varepsilon = X_1\beta_1 + X_2\beta_2 + \varepsilon. $$
Under the assumption that the null hypothesis is true, we have the reduced model
$$ y = X_1\beta_1 + \varepsilon. $$
In the reduced model, the model sum of squares (SS_R) is
$$ SS_R(\beta_1) = \hat\beta_1' X_1' y. $$
The SS_R due to β₂, given that β₁ is already in the model, is
$$ SS_R(\beta_2 \mid \beta_1) = SS_R(\beta) - SS_R(\beta_1), $$
and this sum of squares is called the extra sum of squares due to β₂. The test statistic for the null hypothesis is
$$ F_0 = \frac{SS_R(\beta_2 \mid \beta_1)/r}{MS_E}, $$
which follows the $F_{r, n-(k+1)}$ distribution under the null hypothesis, where r is the number of coefficients in β₂.
Partial F test. Consider the model
$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon. $$
The sums of squares
$$ SS_R(\beta_1 \mid \beta_0, \beta_2, \beta_3), \quad SS_R(\beta_2 \mid \beta_0, \beta_1, \beta_3), \quad SS_R(\beta_3 \mid \beta_0, \beta_1, \beta_2) $$
represent the partial contribution of each regressor to the model. Here
$$ SS_T = SS_R(\beta_1, \beta_2, \beta_3 \mid \beta_0) + SS_E, $$
and we can decompose the 3-df SS_R into sums of squares with 1 df each:
$$ SS_R(\beta_1, \beta_2, \beta_3 \mid \beta_0) = SS_R(\beta_1 \mid \beta_0) + SS_R(\beta_2 \mid \beta_1, \beta_0) + SS_R(\beta_3 \mid \beta_1, \beta_2, \beta_0). $$

Example (the delivery time data): investigate the contribution of the distance (x₂) to the model (see the R sketch after this example). The hypotheses are
$$ H_0: \beta_2 = 0 \quad\text{vs.}\quad H_1: \beta_2 \neq 0. $$
The extra SS due to β₂ is
$$ SS_R(\beta_2 \mid \beta_1, \beta_0) = SS_R(\beta_1, \beta_2, \beta_0) - SS_R(\beta_1, \beta_0) = SS_R(\beta_1, \beta_2 \mid \beta_0) - SS_R(\beta_1 \mid \beta_0). $$
The term $SS_R(\beta_1, \beta_2 \mid \beta_0) = 5550.8166$ is the SS_R of the model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$, and the term $SS_R(\beta_1 \mid \beta_0) = 5382.4088$ is the SS_R of the model $y = \beta_0 + \beta_1 x_1 + \varepsilon$. Hence
$$ SS_R(\beta_2 \mid \beta_1, \beta_0) = 168.4078 \ \text{with 1 df}. $$
The test statistic is
$$ F_0 = \frac{SS_R(\beta_2 \mid \beta_1, \beta_0)/1}{MS_E} = \frac{168.4078}{10.6239} = 15.85. $$
Reject $H_0$ because $15.85 > F_{0.05, 1, 22} = 4.30$.
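In R, the same partial F test is a nested-model comparison (delivery assumed as before):

reduced <- lm(d.time ~ case, data = delivery)
full    <- lm(d.time ~ case + distance, data = delivery)
anova(reduced, full)  # F = 15.85 on 1 and 22 df; note t0^2 = 3.981^2 = 15.85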
Testing the general linear hypothesis. Consider the full model
$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon. $$
We wish to test the hypothesis
$$ H_0: \beta_1 = \beta_3, \quad\text{i.e.}\quad H_0: T\beta = 0, \ \text{where}\ T = (0, 1, 0, -1). $$
Under the null hypothesis, the full model becomes the reduced model
$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_1 x_3 + \varepsilon = \beta_0 + \beta_1 (x_1 + x_3) + \beta_2 x_2 + \varepsilon = \gamma_0 + \gamma_1 z_1 + \gamma_2 z_2 + \varepsilon, $$
where $\gamma_0 = \beta_0$, $\gamma_1 = \beta_1 = \beta_3$, $z_1 = x_1 + x_3$, $\gamma_2 = \beta_2$, and $z_2 = x_2$.
The sum of squares due to the hypothesis is
$$ SS_H = SS_E(RM) - SS_E(FM), $$
with df = (n − 3) − (n − 4) = 1. The statistic for testing is
$$ F_0 = \frac{SS_H/1}{SS_E(FM)/(n - 4)}. $$
Consider again the full model
$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon, $$
and the hypothesis
$$ H_0: \beta_1 = \beta_3, \ \beta_2 = 0, \quad\text{i.e.}\quad H_0: T\beta = 0, \ \text{where}\ T = \begin{pmatrix} 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 0 \end{pmatrix}. $$
Under the null hypothesis, the full model becomes the reduced model
$$ y = \beta_0 + \beta_1 x_1 + \beta_1 x_3 + \varepsilon = \beta_0 + \beta_1 (x_1 + x_3) + \varepsilon = \gamma_0 + \gamma_1 z_1 + \varepsilon, $$
where $\gamma_0 = \beta_0$, $\gamma_1 = \beta_1 = \beta_3$, and $z_1 = x_1 + x_3$.
The sum of squares due to the hypothesis is
$$ SS_H = SS_E(RM) - SS_E(FM), $$
with df = (n − 2) − (n − 4) = 2. The statistic for testing is
$$ F_0 = \frac{SS_H/2}{SS_E(FM)/(n - 4)}. $$

In general, the test statistic for the hypothesis $T\beta = 0$ is
$$ F_0 = \frac{SS_H/r}{SS_E(FM)/(n - k - 1)}, $$
where r is the degrees of freedom due to the hypothesis; it is equal to the number of independent equations in $T\beta = 0$.
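A sketch of this test in R as a nested-model comparison. The data frame dat and the regressors x1, x2, x3 are hypothetical placeholders, since the delivery time data have only two regressors:

full    <- lm(y ~ x1 + x2 + x3, data = dat)
reduced <- lm(y ~ I(x1 + x3), data = dat)  # imposes beta1 = beta3 and beta2 = 0
anova(reduced, full)                       # F0 = (SSH/2) / (SSE(FM)/(n - 4))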