Outline
1. Multiple Linear Regression (Estimation, Inference, Diagnostics and Remedial Measures)
2. Special Topics for Multiple Regression: Extra Sums of Squares; Standardized Version of the Multiple Regression Model
3. Polynomial Regression Models
4. Interaction Regression Models
5. Model Selection
6. Weighted Least Squares and Ridge Regression
7. Principal Component Analysis
General Regression Model in Matrix Terms

    Y = (Y₁, Y₂, ..., Yₙ)'   the n × 1 vector of responses

    X = | 1  X₁₁  X₁₂  ...  X_{1,p-1} |
        | .   .    .   ...     .      |
        | 1  Xₙ₁  Xₙ₂  ...  X_{n,p-1} |   the n × p design matrix

    β = (β₀, β₁, ..., β_{p-1})'   the p × 1 vector of parameters

    ɛ = (ɛ₁, ɛ₂, ..., ɛₙ)'   the n × 1 vector of error terms
General Linear Regression in Matrix Terms

With E(ɛ) = 0 and

    σ²{ɛ} = | σ²  0  ...  0  |
            | 0   σ² ...  0  |
            | .   .  ...  .  |
            | 0   0  ...  σ² |  = σ²I

the model Y = Xβ + ɛ gives E(Y) = Xβ and σ²{Y} = σ²I.
Least Squares Solution

The matrix normal equations can be derived directly from the minimization with respect to β of

    Q = (y - Xβ)'(y - Xβ)
Least Squares Solution

We can solve the normal equations

    X'Xb = X'y

(provided the inverse of X'X exists) by multiplying both sides by (X'X)⁻¹:

    (X'X)⁻¹X'Xb = (X'X)⁻¹X'y

and since (X'X)⁻¹X'X = I, we have

    b = (X'X)⁻¹X'y
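As a concrete illustration, here is a minimal numpy sketch on simulated data (the variable names and sample sizes are ours, not from the text) that solves the normal equations directly:

```python
import numpy as np

# Simulated data: n observations, p parameters (including the intercept)
rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Solve the normal equations X'Xb = X'y for b
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)  # close to beta_true
```

In practice `np.linalg.lstsq` (or a QR decomposition) is preferred over forming X'X explicitly, for numerical stability, but it produces the same estimate here.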
Fitted Values and Residuals

Let ŷ = (ŷ₁, ŷ₂, ..., ŷₙ)' denote the vector of fitted values. In matrix notation we then have

    ŷ = Xb
Hat Matrix: Puts the Hat on y

We can also express the fitted values directly in terms of the X and y matrices:

    ŷ = X(X'X)⁻¹X'y

and we can further define H, the hat matrix:

    ŷ = Hy,    H = X(X'X)⁻¹X'

The hat matrix plays an important role in diagnostics for regression analysis.
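A quick numerical check (on hypothetical simulated data) that Hy reproduces the fitted values Xb:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = rng.normal(size=n)

# Hat matrix H = X(X'X)^{-1}X'
H = X @ np.linalg.inv(X.T @ X) @ X.T

# "Puts the hat on y": fitted values computed two ways agree
y_hat_via_H = H @ y
b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat_via_b = X @ b
print(np.allclose(y_hat_via_H, y_hat_via_b))  # True
```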
Hat Matrix Properties

1. The hat matrix is symmetric: H' = H
2. The hat matrix is idempotent, i.e. HH = H

Important idempotent matrix property: for a symmetric and idempotent matrix A, rank(A) = trace(A), the number of non-zero eigenvalues of A.
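These properties are easy to verify numerically (a sketch with a hypothetical design matrix; note that trace(H) = rank(H) = p):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 25, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
H = X @ np.linalg.inv(X.T @ X) @ X.T

print(np.allclose(H, H.T))    # True: symmetric
print(np.allclose(H @ H, H))  # True: idempotent, HH = H
print(np.trace(H))            # p = 4, since rank(H) = trace(H)
```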
Residuals

The residuals, like the fitted values ŷ, can be expressed as linear combinations of the response observations Yᵢ:

    e = y - ŷ = y - Hy = (I - H)y

Also, remember that e = y - ŷ = y - Xb; these are equivalent.
Covariance of Residuals

Starting with e = (I - H)y, we see that

    σ²{e} = (I - H) σ²{y} (I - H)'

but σ²{y} = σ²{ɛ} = σ²I, which means that

    σ²{e} = σ²(I - H)I(I - H) = σ²(I - H)(I - H)

and since I - H is symmetric and idempotent, we have

    σ²{e} = σ²(I - H)
Quadratic Forms

In general, a quadratic form is defined by

    y'Ay = Σᵢ Σⱼ aᵢⱼ Yᵢ Yⱼ,   where aᵢⱼ = aⱼᵢ

with A the matrix of the quadratic form. The ANOVA sums SSTO, SSE and SSR can all be arranged into quadratic forms:

    SSTO = y'(I - (1/n)J)y
    SSE  = y'(I - H)y
    SSR  = y'(H - (1/n)J)y

where J is the n × n matrix of all ones.
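The quadratic-form expressions can be checked against the direct sums of squares, along with the ANOVA decomposition SSTO = SSE + SSR (a sketch on simulated data):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)

I = np.eye(n)
J = np.ones((n, n))  # n x n matrix of all ones
H = X @ np.linalg.inv(X.T @ X) @ X.T

SSTO = y @ (I - J / n) @ y
SSE = y @ (I - H) @ y
SSR = y @ (H - J / n) @ y

print(np.isclose(SSTO, SSE + SSR))  # True: the ANOVA decomposition
```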
Inference

We can derive the sampling variance of the estimator b of the β vector by remembering that

    b = (X'X)⁻¹X'y = Ay

where A is a constant matrix:

    A = (X'X)⁻¹X',    A' = X(X'X)⁻¹

Using the standard matrix covariance operator, we see that

    σ²{b} = A σ²{y} A'
Variance of b

Since σ²{y} = σ²I we can write

    σ²{b} = (X'X)⁻¹X' (σ²I) X(X'X)⁻¹
          = σ²(X'X)⁻¹X'X(X'X)⁻¹
          = σ²(X'X)⁻¹I
          = σ²(X'X)⁻¹

Of course,

    E(b) = E((X'X)⁻¹X'y) = (X'X)⁻¹X'E(y) = (X'X)⁻¹X'Xβ = β
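The result σ²{b} = σ²(X'X)⁻¹ can be checked by simulation: hold the design fixed, redraw the errors many times, and compare the empirical covariance of the estimates with the theoretical one. This is a sketch; the design, sample size, and replication count are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma = 40, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # fixed design
beta = np.array([1.0, 2.0])
XtX_inv = np.linalg.inv(X.T @ X)
theory = sigma**2 * XtX_inv  # theoretical covariance sigma^2 (X'X)^{-1}

# Draw many response vectors from the same design and re-estimate b each time
reps = 20000
Y = X @ beta + rng.normal(scale=sigma, size=(reps, n))
B = Y @ X @ XtX_inv  # each row is b' = y'X(X'X)^{-1}
empirical = np.cov(B, rowvar=False)

print(np.allclose(empirical, theory, atol=0.005))  # True: Monte Carlo agreement
```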
F-test for Regression

    H₀: β₁ = β₂ = ... = β_{p-1} = 0
    Hₐ: not all βₖ (k = 1, ..., p-1) equal zero

Test statistic:

    F = MSR / MSE

Decision rule:

    if F ≤ F(1-α; p-1, n-p), conclude H₀
    if F > F(1-α; p-1, n-p), conclude Hₐ
R² and Adjusted R²

The coefficient of multiple determination R² is defined as

    R² = SSR/SSTO = 1 - SSE/SSTO,    0 ≤ R² ≤ 1

R² never decreases when more variables are added. Therefore, the adjusted R²:

    R²ₐ = 1 - (SSE/(n-p)) / (SSTO/(n-1)) = 1 - ((n-1)/(n-p)) (SSE/SSTO)

R²ₐ may decrease when p is large. The coefficient of multiple correlation is

    R = √R²   (always the positive square root!)
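The F statistic and the two R² measures can be computed together, and related through the algebraic identity F = (R²/(1-R²))·((n-p)/(p-1)). A sketch on simulated data (our own example, not from the text):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 60, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, 0.0, -1.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
SSE = np.sum((y - X @ b) ** 2)
SSTO = np.sum((y - y.mean()) ** 2)
SSR = SSTO - SSE

MSR, MSE = SSR / (p - 1), SSE / (n - p)
F = MSR / MSE  # compare to F(1-alpha; p-1, n-p)
R2 = SSR / SSTO
R2_adj = 1 - (SSE / (n - p)) / (SSTO / (n - 1))

# Identity relating F and R^2
print(np.isclose(F, (R2 / (1 - R2)) * (n - p) / (p - 1)))  # True
```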
Inferences

The estimated variance-covariance matrix is

    s²{b} = MSE (X'X)⁻¹

Then we have

    (bₖ - βₖ)/s{bₖ} ~ t(n-p),   k = 0, 1, ..., p-1

1-α confidence intervals:

    bₖ ± t(1-α/2; n-p) s{bₖ}
t Test

Tests for βₖ:

    H₀: βₖ = 0
    Hₐ: βₖ ≠ 0

Test statistic:

    t* = bₖ / s{bₖ}

Decision rule:

    if |t*| ≤ t(1-α/2; n-p), conclude H₀
    otherwise, conclude Hₐ
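Putting the last two slides together, the standard errors, confidence intervals, and t statistics can be obtained in a few lines. This is a sketch on simulated data; scipy is assumed available for the t quantile.

```python
import numpy as np
from scipy import stats  # assumed available, used only for the t quantile

rng = np.random.default_rng(6)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
MSE = np.sum((y - X @ b) ** 2) / (n - p)

s2_b = MSE * np.linalg.inv(X.T @ X)  # estimated variance-covariance s^2{b}
se = np.sqrt(np.diag(s2_b))          # s{b_k}

t_stats = b / se                       # t* = b_k / s{b_k}
t_crit = stats.t.ppf(0.975, df=n - p)  # t(1 - alpha/2; n - p) for alpha = 0.05
ci = np.column_stack([b - t_crit * se, b + t_crit * se])

# Reject H0: beta_k = 0 when |t*| > t(1 - alpha/2; n - p)
reject = np.abs(t_stats) > t_crit
```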
Extra Sums of Squares

A topic unique to multiple regression: an extra sum of squares measures the marginal decrease in the error sum of squares when one or several predictor variables are added to the regression model, given that the other variables are already in the model. Equivalently, one can view an extra sum of squares as measuring the marginal increase in the regression sum of squares.
Definitions

Definition:

    SSR(X₁ | X₂) = SSE(X₂) - SSE(X₁, X₂)

Equivalently,

    SSR(X₁ | X₂) = SSR(X₁, X₂) - SSR(X₂)

We can switch the order of X₁ and X₂ in these expressions, and we can easily generalize these definitions for more than two variables:

    SSR(X₃ | X₁, X₂) = SSE(X₁, X₂) - SSE(X₁, X₂, X₃)
    SSR(X₃ | X₁, X₂) = SSR(X₁, X₂, X₃) - SSR(X₁, X₂)
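A numerical check of the two equivalent definitions (a sketch; the simulated predictors and coefficients are hypothetical):

```python
import numpy as np

def sse(y, X):
    """Error sum of squares from regressing y on the columns of X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return resid @ resid

rng = np.random.default_rng(7)
n = 80
x1, x2 = rng.normal(size=(2, n))
y = 1 + 2 * x1 + 0.5 * x2 + rng.normal(size=n)
one = np.ones(n)

sse_2 = sse(y, np.column_stack([one, x2]))
sse_12 = sse(y, np.column_stack([one, x1, x2]))
ssr_1_given_2 = sse_2 - sse_12  # SSR(X1 | X2) = SSE(X2) - SSE(X1, X2)

# Equivalent form: SSR(X1, X2) - SSR(X2)
ssto = np.sum((y - y.mean()) ** 2)
ssr_12 = ssto - sse_12
ssr_2 = ssto - sse_2
print(np.isclose(ssr_1_given_2, ssr_12 - ssr_2))  # True
```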
ANOVA Table

Various software packages can provide extra sums of squares for regression analysis. These are usually provided in the order in which the input variables are supplied to the system.

[Figure: ANOVA table with the extra sums of squares decomposition]
Test Whether a Single βₖ = 0

Does Xₖ provide a statistically significant improvement to the regression model fit? We can use the general linear test approach.

Example: a first-order model with three predictor variables,

    Yᵢ = β₀ + β₁Xᵢ₁ + β₂Xᵢ₂ + β₃Xᵢ₃ + ɛᵢ

We want to answer the following hypothesis test:

    H₀: β₃ = 0
    Hₐ: β₃ ≠ 0
Test Whether a Single βₖ = 0

For the full model we have SSE(F) = SSE(X₁, X₂, X₃), with df_F = n - 4 degrees of freedom. The reduced model is

    Yᵢ = β₀ + β₁Xᵢ₁ + β₂Xᵢ₂ + ɛᵢ

and for this model we have SSE(R) = SSE(X₁, X₂), with df_R = n - 3 degrees of freedom associated with the reduced model.
Test Whether a Single βₖ = 0

The general linear test statistic is

    F = [(SSE(R) - SSE(F)) / (df_R - df_F)] / [SSE(F) / df_F]

which becomes

    F = [(SSE(X₁, X₂) - SSE(X₁, X₂, X₃)) / ((n-3) - (n-4))] / [SSE(X₁, X₂, X₃) / (n-4)]

but SSE(X₁, X₂) - SSE(X₁, X₂, X₃) = SSR(X₃ | X₁, X₂).
Test Whether a Single βₖ = 0

The general linear test statistic is therefore

    F = [SSR(X₃ | X₁, X₂) / 1] / [SSE(X₁, X₂, X₃) / (n-4)] = MSR(X₃ | X₁, X₂) / MSE(X₁, X₂, X₃)

This extra sum of squares has one associated degree of freedom.
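As a check, the one-degree-of-freedom partial F statistic equals the square of the t statistic for b₃, a standard identity. A sketch on simulated data (our own names and coefficients):

```python
import numpy as np

def sse(y, X):
    """Error sum of squares from regressing y on the columns of X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return r @ r

rng = np.random.default_rng(8)
n = 100
x1, x2, x3 = rng.normal(size=(3, n))
y = 1 + 2 * x1 - x2 + 0.3 * x3 + rng.normal(size=n)
one = np.ones(n)
X_full = np.column_stack([one, x1, x2, x3])
X_red = np.column_stack([one, x1, x2])

sse_full, sse_red = sse(y, X_full), sse(y, X_red)
ssr_3_given_12 = sse_red - sse_full  # SSR(X3 | X1, X2)
F = (ssr_3_given_12 / 1) / (sse_full / (n - 4))

# The same test as a t test: F = t^2 for b3
b = np.linalg.solve(X_full.T @ X_full, X_full.T @ y)
mse = sse_full / (n - 4)
se3 = np.sqrt(mse * np.linalg.inv(X_full.T @ X_full)[3, 3])
t3 = b[3] / se3
print(np.isclose(F, t3 ** 2))  # True
```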
Test Whether Some βₖ = 0

Another example:

    H₀: β₂ = β₃ = 0
    Hₐ: not both β₂ and β₃ are zero

The general linear test can be used again:

    F = [(SSE(X₁) - SSE(X₁, X₂, X₃)) / ((n-2) - (n-4))] / [SSE(X₁, X₂, X₃) / (n-4)]

But SSE(X₁) - SSE(X₁, X₂, X₃) = SSR(X₂, X₃ | X₁), so the expression can be simplified.
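The two-degree-of-freedom version of the test looks like this (a sketch on simulated data with hypothetical coefficients):

```python
import numpy as np

def sse(y, X):
    """Error sum of squares from regressing y on the columns of X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return r @ r

rng = np.random.default_rng(9)
n = 100
x1, x2, x3 = rng.normal(size=(3, n))
y = 1 + 2 * x1 - x2 + 0.5 * x3 + rng.normal(size=n)
one = np.ones(n)

sse_x1 = sse(y, np.column_stack([one, x1]))
sse_full = sse(y, np.column_stack([one, x1, x2, x3]))

# SSR(X2, X3 | X1) carries (n-2) - (n-4) = 2 degrees of freedom
ssr_23_given_1 = sse_x1 - sse_full
F = (ssr_23_given_1 / 2) / (sse_full / (n - 4))
print(F)  # compare to F(1-alpha; 2, n-4)
```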
Tests Concerning Regression Coefficients

Summary:
- The general linear test can be used to determine whether or not a predictor variable (or set of variables) should be included in the model.
- The ANOVA SSEs can be used to compute F test statistics.
- Some more general tests require fitting the model more than once, unlike the examples given.
Summary of Tests Concerning Regression Coefficients

- Test whether all βₖ = 0
- Test whether a single βₖ = 0
- Test whether some βₖ = 0
- Tests involving relationships among coefficients, for example:
      H₀: β₁ = β₂ vs. Hₐ: β₁ ≠ β₂
      H₀: β₁ = 3, β₂ = 5 vs. Hₐ: otherwise

Key point in all tests: form the full model and the reduced model.