RNy, ECON 460 autumn 03
Lecture note
Reference: Davidson and MacKinnon, Ch., in particular pages 57-58.

Projection matrices

The matrix
$$ M = I - X(X'X)^{-1}X' \qquad (1) $$
is often called the residual maker. That nickname is easy to understand, since
$$ My = (I - X(X'X)^{-1}X')y = y - X(X'X)^{-1}X'y = y - X\hat{\beta} = \hat{\varepsilon}. $$
$M$ plays a central role in many derivations. The following properties are worth noting (and showing for yourself):
$$ M' = M \quad \text{(symmetric matrix)} \qquad (2) $$
$$ MM = M \quad \text{(idempotent matrix)} \qquad (3) $$
$$ MX = 0 \quad \text{(regression of $X$ on $X$ gives a perfect fit)} \qquad (4) $$
Another way of interpreting $MX = 0$ is that since $M$ produces the OLS residuals, $M$ is orthogonal to $X$.

$M$ does not affect the residuals:
$$ M\hat{\varepsilon} = M(y - X\hat{\beta}) = My - MX\hat{\beta} = \hat{\varepsilon} \qquad (5) $$
Since $y = X\beta + \varepsilon$, we also have
$$ M\varepsilon = M(y - X\beta) = My - MX\beta = My = \hat{\varepsilon}. \qquad (6) $$
Note that this gives
$$ \hat{\varepsilon}'\hat{\varepsilon} = \varepsilon'M\varepsilon, \qquad (7) $$
which shows that SSR is a quadratic form in the theoretical disturbances (see the lecture note for use of this).

A second important matrix in regression analysis is
$$ P = X(X'X)^{-1}X', \qquad (8) $$
which is called the prediction matrix, since
$$ \hat{y} = X\hat{\beta} = X(X'X)^{-1}X'y = Py. \qquad (9) $$
$P$ is also symmetric and idempotent. In linear algebra, $M$ and $P$ are both known as projection matrices; the chapter in DM (page 57 in particular) gives the geometric interpretation. $M$ and $P$ are orthogonal:
$$ MP = PM = 0 \qquad (10) $$
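These properties are easy to check numerically. Below is a minimal sketch (my addition, not from DM or the original note), assuming NumPy is available; the design matrix and data are made up purely for illustration:

```python
import numpy as np

# A minimal numerical check of (2)-(4) and (10); X is an arbitrary
# full-rank design matrix with an intercept column.
rng = np.random.default_rng(0)
n, k = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])

P = X @ np.linalg.inv(X.T @ X) @ X.T   # prediction matrix, eq. (8)
M = np.eye(n) - P                      # residual maker, eq. (1)

assert np.allclose(M, M.T)             # (2): M is symmetric
assert np.allclose(M @ M, M)           # (3): M is idempotent
assert np.allclose(M @ X, 0)           # (4): MX = 0
assert np.allclose(M @ P, 0)           # (10): M and P are orthogonal

y = rng.normal(size=n)
res = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
assert np.allclose(M @ y, res)         # My reproduces the OLS residuals
```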
Since $M = I - P$, $M$ and $P$ are also complementary projections:
$$ M + P = I, \qquad (11) $$
which gives $(M + P)y = Iy$ and therefore
$$ y = Py + My = \hat{y} + \hat{\varepsilon}. \qquad (12) $$
The scalar product $y'y$ then becomes
$$ y'y = (y'P' + y'M')(Py + My) = \hat{y}'\hat{y} + \hat{\varepsilon}'\hat{\varepsilon}. $$
Written out, this is
$$ \underbrace{\sum_i Y_i^2}_{TSS} = \underbrace{\sum_i \hat{Y}_i^2}_{ESS} + \underbrace{\sum_i \hat{\varepsilon}_i^2}_{SSR}. \qquad (13) $$
You may be more used to writing this famous decomposition as
$$ \underbrace{\sum_i (Y_i - \bar{Y})^2}_{TSS} = \underbrace{\sum_i (\hat{Y}_i - \bar{\hat{Y}})^2}_{ESS} + \underbrace{\sum_i \hat{\varepsilon}_i^2}_{SSR}. \qquad (14) $$
Question: Why is there no conflict between (13) and (14) as long as $X$ contains a vector of ones (an intercept)?
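As a hint, the following sketch (again my addition, with made-up data) verifies both (13) and (14) numerically; note the role played by the residuals summing to zero:

```python
import numpy as np

# Numerical illustration of (13) and (14): with an intercept in X,
# both decompositions hold simultaneously.
rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
yhat = X @ beta
ehat = y - yhat

# Raw (uncentered) decomposition, eq. (13): y'y = yhat'yhat + e'e
assert np.isclose(y @ y, yhat @ yhat + ehat @ ehat)

# Centered decomposition, eq. (14); it relies on ehat summing to
# zero, which the intercept column guarantees (first normal equation).
assert np.isclose(ehat.sum(), 0)
assert np.isclose(((y - y.mean())**2).sum(),
                  ((yhat - y.mean())**2).sum() + (ehat**2).sum())
```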
Alternative ways of writing SSR that sometimes come in handy:
$$ \hat{\varepsilon}'\hat{\varepsilon} = (My)'My = y'My = y'\hat{\varepsilon} = \hat{\varepsilon}'y \qquad (15) $$
$$ \hat{\varepsilon}'\hat{\varepsilon} = y'y - y'P'Py = y'y - y'[X(X'X)^{-1}X'][X(X'X)^{-1}X']y \qquad (16) $$
$$ = y'y - y'X(X'X)^{-1}X'y = y'y - y'X\hat{\beta} = y'y - \hat{\beta}'X'y. $$

Partitioned regression

We now let $\hat{\beta} = \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix}$, where $\hat{\beta}_1$ has $k_1$ parameters and $\hat{\beta}_2$ has $k_2$ parameters, $k_1 + k_2 = k$. The corresponding partitioning of $X$ is $X = [\underset{n \times k_1}{X_1} \;\; \underset{n \times k_2}{X_2}]$. The OLS estimated model with this notation:
$$ y = X_1\hat{\beta}_1 + X_2\hat{\beta}_2 + \hat{\varepsilon}, \qquad \hat{\varepsilon} = My. \qquad (17) $$
We begin by writing out the normal equations $(X'X)\hat{\beta} = X'y$ in terms of the partitioning:
$$ \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix} \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \begin{bmatrix} X_1'y \\ X_2'y \end{bmatrix}, \qquad (18) $$
where we have used $X'X = \begin{bmatrix} X_1' \\ X_2' \end{bmatrix}[X_1 \;\; X_2]$; see page 0 and Exercise .9 in DM. The vector of OLS estimators can therefore be expressed as
$$ \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix}^{-1} \begin{bmatrix} X_1'y \\ X_2'y \end{bmatrix}. \qquad (19) $$
We need a result for the inverse of a partitioned matrix. There are several versions, but probably the most useful for us is
$$ \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} A^{11} & -A^{11}A_{12}A_{22}^{-1} \\ -A^{22}A_{21}A_{11}^{-1} & A^{22} \end{bmatrix}, \qquad (20) $$
where
$$ A^{11} = (A_{11} - A_{12}A_{22}^{-1}A_{21})^{-1} \qquad (21) $$
$$ A^{22} = (A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}. \qquad (22) $$
With the aid of (20)-(22) it can be shown that
$$ \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \begin{bmatrix} (X_1'M_2X_1)^{-1}X_1'M_2y \\ (X_2'M_1X_2)^{-1}X_2'M_1y \end{bmatrix}, \qquad (23) $$
where $M_1$ and $M_2$ are the two residual makers
$$ M_1 = I - X_1(X_1'X_1)^{-1}X_1' \qquad (24) $$
$$ M_2 = I - X_2(X_2'X_2)^{-1}X_2'. \qquad (25) $$
As an application, start with
$$ X = [X_1 \;\vdots\; X_2] = [\iota \;\vdots\; X_2], \quad \text{where} \quad \underset{n \times 1}{\iota} = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}, \quad X_2 = \begin{bmatrix} X_{12} & \cdots & X_{1k} \\ \vdots & & \vdots \\ X_{n2} & \cdots & X_{nk} \end{bmatrix}, $$
so that $\iota'\iota = n$ and
$$ M_1 = I - \iota(\iota'\iota)^{-1}\iota' = I - \frac{1}{n}\iota\iota' \equiv M_\iota, $$
the so-called centering matrix. Show that it is
$$ M_\iota = \begin{bmatrix} 1 - \frac{1}{n} & -\frac{1}{n} & \cdots & -\frac{1}{n} \\ -\frac{1}{n} & 1 - \frac{1}{n} & \cdots & -\frac{1}{n} \\ \vdots & & \ddots & \vdots \\ -\frac{1}{n} & -\frac{1}{n} & \cdots & 1 - \frac{1}{n} \end{bmatrix}. $$
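A quick numerical check of the centering interpretation (my addition; the variable names are mine):

```python
import numpy as np

# M_iota = I - iota(iota'iota)^{-1}iota' is the centering matrix:
# applied to any vector, it subtracts the sample mean.
n = 5
iota = np.ones((n, 1))
M_iota = np.eye(n) - iota @ iota.T / n   # (iota'iota)^{-1} = 1/n

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
assert np.allclose(M_iota @ x, x - x.mean())

# Diagonal entries 1 - 1/n, off-diagonal -1/n, as in the display above.
assert np.allclose(np.diag(M_iota), 1 - 1/n)
```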
By using the general formula for partitioned regression above, we get
$$ \hat{\beta}_2 = (X_2'M_1X_2)^{-1}X_2'M_1y = \left( [M_\iota X_2]'[M_\iota X_2] \right)^{-1}[M_\iota X_2]'y = \left[ (X_2 - \bar{X}_2)'(X_2 - \bar{X}_2) \right]^{-1}(X_2 - \bar{X}_2)'y, \qquad (26) $$
where $\bar{X}_2$ is the $n \times (k-1)$ matrix with the variable means in the $k-1$ columns. In the exercise set for Seminar you are invited to show the same result for $\hat{\beta}_2$ more directly, without the use of the results from partitioned regression, by rewriting the regression model.

Frisch-Waugh-Lovell (FWL) theorem

The expression for $\hat{\beta}_2$ in (23) suggests that there is another simple method for finding $\hat{\beta}_2$ that involves $M_1$: we premultiply the model (17), first with $M_1$ and then with $X_2'$:
$$ M_1y = M_1X_2\hat{\beta}_2 + \hat{\varepsilon} \qquad (27) $$
$$ X_2'M_1y = X_2'M_1X_2\hat{\beta}_2, \qquad (28) $$
where we have used $M_1X_1 = 0$ and $M_1\hat{\varepsilon} = \hat{\varepsilon}$ in (27), and $X_2'\hat{\varepsilon} = 0$ in (28), since the influence of the $k_1$ variables in $X_1$ has already been regressed out. We can find $\hat{\beta}_2$ from this normal equation:
$$ \hat{\beta}_2 = (X_2'M_1X_2)^{-1}X_2'M_1y \qquad (29) $$
It can be shown that the covariance matrix is
$$ \mathrm{Cov}(\hat{\beta}_2) = \hat{\sigma}^2 (X_2'M_1X_2)^{-1}, $$
where $\hat{\sigma}^2$ denotes the unbiased estimator of the common variance of the disturbances:
$$ \hat{\sigma}^2 = \frac{\hat{\varepsilon}'\hat{\varepsilon}}{n - k}. $$
The corresponding expressions for $\hat{\beta}_1$ are
$$ \hat{\beta}_1 = (X_1'M_2X_1)^{-1}X_1'M_2y \qquad (30) $$
$$ \mathrm{Cov}(\hat{\beta}_1) = \hat{\sigma}^2 (X_1'M_2X_1)^{-1}. $$
Interpretation of (29): use the symmetry and idempotency of $M_1$:
$$ \hat{\beta}_2 = \left( (M_1X_2)'(M_1X_2) \right)^{-1}(M_1X_2)'M_1y. $$
Here, $M_1y$ is the residual vector $\hat{\varepsilon}_{y \cdot X_1}$, and $M_1X_2$ is the matrix with the $k_2$ residual vectors obtained from regressing each variable in $X_2$ on $X_1$, $\hat{\varepsilon}_{X_2 \cdot X_1}$. But this means that $\hat{\beta}_2$ can be obtained by first running these two auxiliary regressions, saving the residuals in $\hat{\varepsilon}_{y \cdot X_1}$ and $\hat{\varepsilon}_{X_2 \cdot X_1}$, and then running a second regression with $\hat{\varepsilon}_{y \cdot X_1}$ as regressand and $\hat{\varepsilon}_{X_2 \cdot X_1}$ as the matrix of regressors:
$$ \hat{\beta}_2 = \left( (M_1X_2)'(M_1X_2) \right)^{-1}(M_1X_2)'(M_1y) = \left( \hat{\varepsilon}_{X_2 \cdot X_1}'\hat{\varepsilon}_{X_2 \cdot X_1} \right)^{-1} \hat{\varepsilon}_{X_2 \cdot X_1}'\hat{\varepsilon}_{y \cdot X_1} \qquad (31) $$
We have shown the Frisch-Waugh(-Lovell) theorem. Followers of ECON 450 spring 03 will have seen a scalar version of this; the 450 lecture note is still on that course page.

In sum: if we want to estimate the partial effects on $Y$ of changes in the variables in $X_2$ (i.e., controlling for the influence of $X_1$), we can do it in two ways: either by estimating the full multivariate model $y = X_1\hat{\beta}_1 + X_2\hat{\beta}_2 + \hat{\varepsilon}$, or by following the above two-step procedure of obtaining residuals with $X_1$ as regressor and then obtaining $\hat{\beta}_2$ from (31). A numerical check of this equivalence is sketched below.
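The sketch (my addition; the data and names are made up) compares $\hat{\beta}_2$ from the full regression with the two-step FWL estimate from (31):

```python
import numpy as np

# FWL theorem: beta2 from the full regression equals beta2 from the
# two-step residual regression, eq. (31).
rng = np.random.default_rng(2)
n = 100
X1 = np.column_stack([np.ones(n), np.arange(n)])   # e.g. intercept + trend
X2 = rng.normal(size=(n, 2))
y = (X1 @ np.array([1.0, 0.02])
     + X2 @ np.array([0.5, -1.0]) + rng.normal(size=n))

# Full regression: y on [X1 X2]
X = np.column_stack([X1, X2])
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Two-step procedure: residuals from regressing y and X2 on X1,
# then regress the former on the latter.
M1 = np.eye(n) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
e_y, e_X2 = M1 @ y, M1 @ X2
beta2_fwl = np.linalg.lstsq(e_X2, e_y, rcond=None)[0]

assert np.allclose(beta_full[2:], beta2_fwl)       # same partial effects
```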
Other applications:

- Trend correction
- Seasonal adjustment by seasonal dummy variables (DM p 69)
- Leverage (or influential observations), DM p 76-77; a small numerical sketch follows below
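As an illustration of the last point (my addition, only a sketch with made-up data): the leverage of observation $i$ is the $i$-th diagonal element of $P$, often called the hat matrix, and an outlying regressor value shows up as a large diagonal entry:

```python
import numpy as np

# Leverage: the influence of observation i is measured by the i-th
# diagonal element of P. An outlying x-value gets high leverage.
rng = np.random.default_rng(3)
n = 30
x = rng.normal(size=n)
x[0] = 10.0                                   # an outlier in the regressor
X = np.column_stack([np.ones(n), x])

P = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(P)                                # leverage values

assert np.isclose(h.sum(), X.shape[1])        # trace(P) = k
print(h[0], h[1:].mean())                     # h[0] is far above average k/n
```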