18.S096 Problem Set 3 Fall 2013 Regression Analysis Due Date: 10/8/2013



1. The Projection ("Hat") Matrix and Case Influence/Leverage

Recall the setup for a linear regression model

$$y = X\beta + \epsilon$$

where $y$ and $\epsilon$ are $n$-vectors, $X$ is an $n \times p$ matrix (of full rank $p \le n$) and $\beta$ is the $p$-vector regression parameter.

The Ordinary-Least-Squares (OLS) estimate of the regression parameter is:

$$\hat\beta = (X^T X)^{-1} X^T y$$

The vector of fitted values of the dependent variable is given by:

$$\hat y = X\hat\beta = X (X^T X)^{-1} X^T y = H y,$$

where $H = X (X^T X)^{-1} X^T$ is the $n \times n$ "Hat Matrix", and the vector of residuals is given by:

$$\hat\epsilon = (I_n - H) y.$$

1 (a) Prove that $H$ is a projection matrix, i.e., $H$ has the following properties:

    Symmetric: $H^T = H$
    Idempotent: $H \times H = H$

1 (b) The $i$th diagonal element of $H$, $H_{i,i}$, is called the leverage of case $i$. Show that

$$\frac{d\hat y_i}{dy_i} = H_{i,i}$$

1 (c) If $X$ has full column rank $p$, show that

$$\text{Average}(H_{i,i}) = \frac{p}{n}$$

Hint: Use the property $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ for conformal matrices $A$ and $B$.

1 (d) Prove that the Hat matrix $H$ is unchanged if we replace the $(n \times p)$ matrix $X$ by $X' = XG$ for any non-singular $(p \times p)$ matrix $G$.
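The properties in 1 (a) through 1 (d) can be sanity-checked numerically before attempting the proofs. The following is a minimal sketch in Python with NumPy, not part of the original problem set: the helper name `hat_matrix`, the random design, the sizes, and the seed are all illustrative assumptions. A numerical check of this kind does not replace the proofs that the problems ask for.

```python
import numpy as np

def hat_matrix(X):
    """Return H = X (X^T X)^{-1} X^T for a full-column-rank X (hypothetical helper)."""
    # Solve (X^T X) B = X^T instead of forming the inverse explicitly.
    return X @ np.linalg.solve(X.T @ X, X.T)

rng = np.random.default_rng(0)       # arbitrary seed, for reproducibility only
n, p = 20, 4                         # illustrative sizes
X = rng.normal(size=(n, p))
# Small perturbation of the identity: an assumed, well-conditioned non-singular G.
G = np.eye(p) + 0.1 * rng.normal(size=(p, p))
H = hat_matrix(X)

is_symmetric = np.allclose(H, H.T)                  # 1 (a)
is_idempotent = np.allclose(H @ H, H)               # 1 (a)
mean_leverage = np.trace(H) / n                     # 1 (c): compare with p / n
is_invariant = np.allclose(H, hat_matrix(X @ G))    # 1 (d)

# 1 (b): yhat = H y is linear in y, so perturbing y_i by delta
# changes yhat_i by H[i, i] * delta.
y = rng.normal(size=n)
i, delta = 3, 1e-3
y_bumped = y.copy()
y_bumped[i] += delta
slope = ((H @ y_bumped)[i] - (H @ y)[i]) / delta

print(is_symmetric, is_idempotent, is_invariant)
print(mean_leverage, p / n)
print(slope, H[i, i])
```

The sketch solves a linear system rather than inverting $X^T X$, which is the usual numerically safer choice; the algebra being checked is the same.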

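1 (e) Consider the case where $X$ is $n \times (p+1)$ with a constant term and $p$ independent variables defining the regression model, i.e.,

$$X = \begin{bmatrix} 1 & x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\ 1 & x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n,1} & x_{n,2} & \cdots & x_{n,p} \end{bmatrix}$$

Define $G$ as follows:

$$G = \begin{bmatrix} 1 & -\bar x_1 & -\bar x_2 & \cdots & -\bar x_p \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}$$

where $\bar x_j = \sum_{i=1}^{n} x_{i,j} / n$, for $j = 1, 2, \ldots, p$.

By (d), the regression model with $X' = XG$ is equivalent to the original regression model in terms of having the same fitted values $\hat y$ and residuals $\hat\epsilon$.

If $\beta = (\beta_0, \beta_1, \ldots, \beta_p)^T$ is the regression parameter for $X$, show that $\beta' = G^{-1}\beta$ is the regression parameter for $X'$.

Solve for $G^{-1}$ and provide explicit formulas for the elements of $\beta'$.

Show that:

$$[X'^T X'] = \begin{bmatrix} n & 0_p^T \\ 0_p & \mathcal{X}^T \mathcal{X} \end{bmatrix}$$

where

$$\mathcal{X} = \begin{bmatrix} x_{1,1} - \bar x_1 & x_{1,2} - \bar x_2 & \cdots & x_{1,p} - \bar x_p \\ x_{2,1} - \bar x_1 & x_{2,2} - \bar x_2 & \cdots & x_{2,p} - \bar x_p \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} - \bar x_1 & x_{n,2} - \bar x_2 & \cdots & x_{n,p} - \bar x_p \end{bmatrix}$$

Prove the following formula for elements of the projection/hat matrix:

$$H_{i,j} = \frac{1}{n} + (x_i - \bar x)^T [\mathcal{X}^T \mathcal{X}]^{-1} (x_j - \bar x)$$

where $x_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,p})^T$ is the vector of independent variable values for case $i$, and $\bar x = (\bar x_1, \bar x_2, \ldots, \bar x_p)^T$.

The leverage of case $i$, $H_{i,i}$, increases with the second term, the squared Mahalanobis distance between $x_i$ and the mean vector $\bar x$.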
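The centering construction in 1 (e) can also be checked on simulated data. This is a sketch under stated assumptions, not part of the original problem set: the data are random, and the names `Z` (the $p$ independent variables), `Zc` (the centered matrix $\mathcal{X}$), and `Xp` (for $X'$) are hypothetical. The inverse of $G$ is computed numerically so that its closed form is left to the reader.

```python
import numpy as np

rng = np.random.default_rng(1)           # arbitrary seed
n, p = 15, 3                             # illustrative sizes
Z = rng.normal(size=(n, p))              # independent variables x_{i,j}
X = np.column_stack([np.ones(n), Z])     # n x (p+1) design with a constant term
y = rng.normal(size=n)                   # arbitrary response, for the beta' check

xbar = Z.mean(axis=0)
G = np.eye(p + 1)
G[0, 1:] = -xbar                         # first row is (1, -xbar_1, ..., -xbar_p)
Xp = X @ G                               # X' = XG: constant column, then centered columns
Zc = Z - xbar                            # centered matrix (script X in the text)

# Block-diagonal form of X'^T X'
block = np.zeros((p + 1, p + 1))
block[0, 0] = n
block[1:, 1:] = Zc.T @ Zc

# Leverage formula: H_{i,j} = 1/n + (x_i - xbar)^T [Zc^T Zc]^{-1} (x_j - xbar)
H = X @ np.linalg.solve(X.T @ X, X.T)
H_formula = 1.0 / n + Zc @ np.linalg.solve(Zc.T @ Zc, Zc.T)

# beta' = G^{-1} beta should coincide with the OLS fit on X'
beta = np.linalg.lstsq(X, y, rcond=None)[0]
beta_prime = np.linalg.inv(G) @ beta
beta_prime_fit = np.linalg.lstsq(Xp, y, rcond=None)[0]

print(np.allclose(Xp.T @ Xp, block))
print(np.allclose(H, H_formula))
print(np.allclose(beta_prime, beta_prime_fit))
```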

2. Case Deletion Influence Measures

2 (a) Sherman-Morrison-Woodbury (S-M-W) Theorem: Suppose that $A$ is a $p \times p$ symmetric matrix of rank $p$, and $a$ and $b$ are each $q \times p$ matrices of rank $q < p$. Then, provided inverses exist,

$$(A + a^T b)^{-1} = A^{-1} - A^{-1} a^T (I_q + b A^{-1} a^T)^{-1} b A^{-1}.$$

Prove the theorem.

2 (b) Case deletion impact on $\hat\beta$: Apply the S-M-W Theorem to show that the least squares estimate of $\beta$ when the $i$th case is deleted from the data is

$$\hat\beta_{(i)} = \hat\beta - \frac{(X^T X)^{-1} x_i \hat\epsilon_i}{1 - H_{i,i}},$$

where $x_i^T$ is the $i$th row of $X$ and $\hat\epsilon_i = y_i - \hat y_i = y_i - x_i^T \hat\beta$.

2 (c) A popular influence measure for a case $i$ is the $i$th Cook's distance

$$CD_i = \frac{1}{p\hat\sigma^2} \left| \hat y - \hat y_{(i)} \right|^2$$

where $\hat y_{(i)} = X \hat\beta_{(i)}$. Show that

$$CD_i = \frac{\hat\epsilon_i^2}{p\hat\sigma^2} \cdot \frac{H_{i,i}}{(1 - H_{i,i})^2}$$

2 (d) Case deletion impact on $\hat\sigma^2$: Let $\hat\sigma^2_{(i)}$ be the unbiased estimate of the residual variance $\sigma^2$ when case $i$ is deleted from the data. Show that:

$$\hat\sigma^2_{(i)} = \hat\sigma^2 + \left( \frac{1}{n - p - 1} \right) \left( \hat\sigma^2 - \frac{\hat\epsilon_i^2}{1 - H_{i,i}} \right)$$

3. Sequential ANOVA in Normal Linear Regression Models via the QR Decomposition

Recall from the lecture notes that the QR-decomposition, $X = QR$, is a factorization of the $n \times p$ matrix $X$ into $Q$, an $n \times p$ column-orthonormal matrix ($Q^T Q = I_p$, the $p \times p$ identity matrix), times $R$, a $p \times p$ upper-triangular matrix.

Denoting the $j$th column of $X$ and of $Q$ by $X_{[j]}$ and $Q_{[j]}$, respectively, we can write out the QR-decomposition for $X$, column-wise:

$$\begin{aligned} X_{[1]} &= Q_{[1]} R_{1,1} \\ X_{[2]} &= Q_{[1]} R_{1,2} + Q_{[2]} R_{2,2} \\ X_{[3]} &= Q_{[1]} R_{1,3} + Q_{[2]} R_{2,3} + Q_{[3]} R_{3,3} \\ &\;\;\vdots \\ X_{[p]} &= Q_{[1]} R_{1,p} + Q_{[2]} R_{2,p} + Q_{[3]} R_{3,p} + \cdots + Q_{[p]} R_{p,p} \end{aligned}$$
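The case-deletion identities of Problem 2 and the column-wise QR expansion above lend themselves to a brute-force numerical check: refit the model with one case removed and compare against the closed forms. The sketch below is an illustration only and is not part of the original problem set; the helper `ols`, the simulated data, the coefficient values, and the chosen case index are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)                    # arbitrary seed
n, p = 25, 3                                      # illustrative sizes
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)   # made-up coefficients

def ols(X, y):
    """Return (beta_hat, sigma2_hat) using the unbiased variance estimate (hypothetical helper)."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    return beta, resid @ resid / (X.shape[0] - X.shape[1])

beta, s2 = ols(X, y)
XtX_inv = np.linalg.inv(X.T @ X)
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)       # leverages H_{i,i}
e = y - X @ beta                                  # residuals

i = 7                                             # case to delete (arbitrary choice)
keep = np.arange(n) != i
beta_del, s2_del = ols(X[keep], y[keep])          # brute-force refit without case i

beta_formula = beta - XtX_inv @ X[i] * e[i] / (1 - h[i])            # 2 (b)
cook_direct = np.sum((X @ beta - X @ beta_del) ** 2) / (p * s2)     # 2 (c), definition
cook_formula = e[i] ** 2 * h[i] / (p * s2 * (1 - h[i]) ** 2)        # 2 (c), closed form
s2_formula = s2 + (s2 - e[i] ** 2 / (1 - h[i])) / (n - p - 1)       # 2 (d)

# Column-wise QR: X_[j] is a combination of Q_[1], ..., Q_[j] only.
Q, R = np.linalg.qr(X)                            # reduced QR: Q is n x p, R is p x p
j = 2                                             # zero-based index of the third column
col_j = Q[:, : j + 1] @ R[: j + 1, j]

print(np.allclose(beta_del, beta_formula))
print(np.isclose(cook_direct, cook_formula))
print(np.isclose(s2_del, s2_formula))
print(np.allclose(col_j, X[:, j]))
```

NumPy's `qr` may return an $R$ with negative diagonal entries; the column-wise expansion holds regardless of that sign convention.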

A common issue arising in a regression analysis with $p$ explanatory variables is whether just the first $k$ ($< p$) explanatory variables (given by the first $k$ columns of $X$) enter in the regression model. This can be expressed as a hypothesis about the regression parameter $\beta$,

$$H_0: \beta_{k+1} = \beta_{k+2} = \cdots = \beta_p = 0.$$

3 (a) Consider the estimate

$$\hat\beta_0 = \begin{pmatrix} \hat\beta_I \\ 0_{p-k} \end{pmatrix}$$

where

$$\hat\beta_I = (X_I^T X_I)^{-1} X_I^T y, \qquad X_I = \begin{bmatrix} X_{[1]} & X_{[2]} & \cdots & X_{[k]} \end{bmatrix}$$

Show that $\hat\beta_0$ is the constrained least-squares estimate of $\beta$ corresponding to the hypothesis $H_0$, i.e., $\hat\beta_0$ minimizes

$$SS(\beta) = (y - X\beta)^T (y - X\beta)$$

subject to $\beta_j = 0$, $j = k+1, k+2, \ldots, p$.

3 (b) Show that the QR-decomposition of $X_I$ is $X_I = Q_I R_I$, where $Q_I$ is the matrix of the first $k$ columns of $Q$ and $R_I$ is the upper-left $k \times k$ block of $R$. Furthermore, verify that:

$$\hat\beta_I = R_I^{-1} Q_I^T y, \quad \text{and} \quad \hat y_I = H_I y,$$

where $H_I = Q_I Q_I^T$, the $n \times n$ projection/hat matrix under the null hypothesis.

3 (c) From the lecture notes, recall the definition of

$$A = \begin{bmatrix} Q^T \\ W^T \end{bmatrix},$$

where

    $A$ is an $(n \times n)$ orthogonal matrix (i.e., $A^T = A^{-1}$)
    $Q$ is the column-orthonormal matrix in a Q-R decomposition of $X$

Note: $W$ can be constructed by continuing the Gram-Schmidt Orthonormalization process (which was used to construct $Q$ from $X$) with $X^* = \begin{bmatrix} X & I_n \end{bmatrix}$.

Then, consider

$$z = Ay = \begin{bmatrix} Q^T y \\ W^T y \end{bmatrix} = \begin{bmatrix} z_Q \\ z_W \end{bmatrix}, \qquad z_Q \text{ is } (p \times 1), \quad z_W \text{ is } (n - p) \times 1.$$

Prove the following relationships for the unconstrained regression model:

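$$\begin{aligned} y^T y &= y_1^2 + y_2^2 + \cdots + y_n^2 = z_1^2 + z_2^2 + \cdots + z_n^2 = z^T z \\ \hat y^T \hat y &= \hat y_1^2 + \hat y_2^2 + \cdots + \hat y_n^2 = z_1^2 + z_2^2 + \cdots + z_k^2 + \cdots + z_p^2 \\ \hat\epsilon^T \hat\epsilon &= \hat\epsilon_1^2 + \hat\epsilon_2^2 + \cdots + \hat\epsilon_n^2 = z_{p+1}^2 + z_{p+2}^2 + \cdots + z_n^2 \end{aligned}$$

Prove the following relationships for the constrained regression model:

$$\begin{aligned} y^T y &= y_1^2 + y_2^2 + \cdots + y_n^2 = z_1^2 + z_2^2 + \cdots + z_n^2 = z^T z \\ \hat y_I^T \hat y_I &= (\hat y_I)_1^2 + (\hat y_I)_2^2 + \cdots + (\hat y_I)_n^2 = z_1^2 + z_2^2 + \cdots + z_k^2 \\ \hat\epsilon_I^T \hat\epsilon_I &= (\hat\epsilon_I)_1^2 + (\hat\epsilon_I)_2^2 + \cdots + (\hat\epsilon_I)_n^2 = z_{k+1}^2 + \cdots + z_p^2 + z_{p+1}^2 + z_{p+2}^2 + \cdots + z_n^2 \end{aligned}$$

3 (d) Under the assumption of a normal linear regression model, the lecture notes detail how the distribution of $z = Ay$ is

$$z = \begin{pmatrix} z_Q \\ z_W \end{pmatrix} \sim N_n\left[ \begin{pmatrix} R\beta \\ 0_{n-p} \end{pmatrix}, \sigma^2 I_n \right]$$

$$\Longrightarrow \quad z_Q \sim N_p\left[ (R\beta), \sigma^2 I_p \right], \qquad z_W \sim N_{(n-p)}\left[ 0_{(n-p)}, \sigma^2 I_{(n-p)} \right],$$

and $z_Q$ and $z_W$ are independent.

For the unconstrained (and the constrained) model, deduce that:

$$SS_{ERROR} = \hat\epsilon^T \hat\epsilon \sim \sigma^2 \chi^2_{n-p},$$

a Chi-Square r.v. with $(n - p)$ degrees of freedom scaled by $\sigma^2$.

For the constrained model under $H_0$, deduce that:

$$SS_{REG(k+1,\ldots,p \mid 1,2,\ldots,k)} = \hat y^T \hat y - \hat y_I^T \hat y_I = \hat\epsilon_I^T \hat\epsilon_I - \hat\epsilon^T \hat\epsilon = z_{k+1}^2 + \cdots + z_p^2 \sim \sigma^2 \chi^2_{p-k},$$

a $\sigma^2$ multiple of a Chi-Square r.v. with $(p - k)$ degrees of freedom which is independent of $SS_{ERROR}$.

Under $H_0$, deduce that the statistic: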

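$$\hat F = \frac{SS_{REG(k+1,\ldots,p \mid 1,2,\ldots,k)} / (p - k)}{SS_{ERROR} / (n - p)}$$

has an $F$ distribution with $(p - k)$ degrees of freedom for the numerator and $(n - p)$ degrees of freedom for the denominator.

It is common practice to summarize in a table the calculations of the $F$-statistics for testing the null hypothesis that the last $(p - k)$ components of the regression parameter are zero:

ANOVA Table

| Source | Sum of Squares | Degrees of Freedom | Mean Square | F-Statistic |
|---|---|---|---|---|
| Regression on $1, 2, \ldots, k$ | $\hat y_I^T \hat y_I$ | $k$ | $\hat y_I^T \hat y_I / k$ | |
| Regression on $k+1, \ldots, p$, adjusting for $1, 2, \ldots, k$ | $\hat y^T \hat y - \hat y_I^T \hat y_I$ | $(p - k)$ | $MS_0 = (\hat y^T \hat y - \hat y_I^T \hat y_I) / (p - k)$ | $\hat F = MS_0 / MS_{Error}$ |
| Error | $\hat\epsilon^T \hat\epsilon$ | $(n - p)$ | $MS_{Error} = \hat\epsilon^T \hat\epsilon / (n - p)$ | |
| Total | $y^T y$ | $n$ | | |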
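The sums of squares in the ANOVA table can be read directly off the rotated vector $z = Ay$. The sketch below is an illustration under stated assumptions, not part of the original problem set: it uses NumPy's complete QR factorization so that the $n \times n$ orthogonal factor plays the role of $A^T = [Q \; W]$, and it compares the resulting $\hat F$ with one computed from two direct least-squares fits. The data, sizes, seed, and the helper `rss` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)                    # arbitrary seed
n, p, k = 40, 5, 2                                # illustrative sizes
X = rng.normal(size=(n, p))
# Simulate under H_0: only the first k columns enter (made-up coefficients).
y = X[:, :k] @ np.array([1.0, -1.0]) + rng.normal(size=n)

# Complete QR: Q_full is n x n orthogonal; its first p columns span col(X)
# and its first k columns span the first k columns of X.
Q_full, _ = np.linalg.qr(X, mode="complete")
z = Q_full.T @ y                                  # z = A y with A = Q_full^T

ss_reg_first_k = np.sum(z[:k] ** 2)               # yhat_I^T yhat_I
ss_reg_extra = np.sum(z[k:p] ** 2)                # yhat^T yhat - yhat_I^T yhat_I
ss_error = np.sum(z[p:] ** 2)                     # epshat^T epshat
F = (ss_reg_extra / (p - k)) / (ss_error / (n - p))

def rss(Xm, y):
    """Residual sum of squares from a direct least-squares fit (hypothetical helper)."""
    beta = np.linalg.lstsq(Xm, y, rcond=None)[0]
    r = y - Xm @ beta
    return r @ r

rss_full, rss_null = rss(X, y), rss(X[:, :k], y)
F_direct = ((rss_null - rss_full) / (p - k)) / (rss_full / (n - p))

print(F, F_direct)
print(ss_reg_first_k + ss_reg_extra + ss_error, y @ y)
```

Only squared components of $z$ enter the table, so the sign convention chosen by the QR routine has no effect on the result.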

MIT OpenCourseWare
18.S096 Mathematical Applications in Financial Industry
Fall 2013

For information about citing these materials or our Terms of Use, visit:
