RNy, econ460 autumn 04
Lecture note

Orthogonalization and re-parameterization (5..3 and 7.. in HN)

Orthogonalization of variables, for example $X_2$ and $X_3$, means that variables that are correlated are made perfectly uncorrelated (orthogonal) by a transformation. In the context of a model, e.g., a regression equation, variables can be orthogonalized without changing the statistical properties of the model's disturbance; however, one or more of the original coefficients will be changed as a consequence of the transformation. We then have a re-parameterization of the model (with the aim of achieving orthogonalization).

In the bivariate regression model, the original parameterization is

$$Y_i = \beta_1 + \beta_2 X_i + \varepsilon_i \qquad (1)$$

If the assumptions of the model hold, the disturbance has the so-called classical properties. Now consider adding and subtracting $\beta_2 \bar{X}$:

$$Y_i = \underbrace{\beta_1 + \beta_2 \bar{X}}_{\delta_1} + \beta_2 (X_i - \bar{X}) + \varepsilon_i \qquad (2)$$

This is a re-parameterization of the model, because one of the parameters, the constant term, has changed. Importantly, the disturbance is unaffected; hence the statistical properties of the model are unaffected by the re-parameterization. This means that the maximized value of the likelihood function for (2) must be the same as for (1). The OLS/ML estimators of $\delta_1$ and $\beta_2$ are uncorrelated, although the estimators of $\beta_1$ and $\beta_2$ are correlated, as you will know from elementary econometrics.

In 7.. in HN, the original parameterization of the model equation is

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i \qquad (3)$$

We know that the re-parameterization

$$Y_i = \underbrace{\beta_1 + \beta_2 \bar{X}_2 + \beta_3 \bar{X}_3}_{\delta_1} + \beta_2 (X_{2i} - \bar{X}_2) + \beta_3 (X_{3i} - \bar{X}_3) + \varepsilon_i \qquad (4)$$

secures that the two regressors are uncorrelated with the intercept. They are still correlated with each other, however. In 7.. a parameterization is shown that gives uncorrelated regressors as well. The likelihood function can easily be maximized with respect to the parameters of the fully orthogonalized model (thus extending the step-wise maximization for the first model equation, $Y_i = \beta_1 + \varepsilon_i$).

Structural interpretation, 5..3
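The effect of the re-parameterization (1)-(2) is easy to check numerically. The following is a minimal sketch with simulated data (the data-generating values are my own, for illustration only): the residuals and the slope estimate are unchanged, the new intercept equals $\hat{\beta}_1 + \hat{\beta}_2\bar{X}$, and the centered regressor is orthogonal to the intercept column.

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical simulated data, for illustration
n = 100
x = rng.normal(5.0, 2.0, n)
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, n)

# Original parameterization (1): regressors [1, x]
X1 = np.column_stack([np.ones(n), x])
b1, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Re-parameterization (2): regressors [1, x - xbar]
X2 = np.column_stack([np.ones(n), x - x.mean()])
b2, *_ = np.linalg.lstsq(X2, y, rcond=None)

# Same slope, same residuals (hence same maximized likelihood);
# only the intercept changes, to delta_1 = beta_1 + beta_2 * xbar
assert np.isclose(b1[1], b2[1])
assert np.allclose(y - X1 @ b1, y - X2 @ b2)
assert np.isclose(b2[0], b1[0] + b1[1] * x.mean())

# The centered regressor is orthogonal to the intercept column,
# so the estimators of delta_1 and beta_2 are uncorrelated
assert np.isclose((X2.T @ X2)[0, 1], 0.0)
```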
in HN

In economics, the label "structural model" is often given to theoretical relationships that are linearized, supplemented with an additive disturbance, and then estimated. In addition to theoretical interpretability, the book suggests that a structural model should have the following features:
1. Parameter constancy with respect to changes in the sample (for example extensions).
2. Parameter invariance to regime shifts.
3. Invariance with respect to variable addition, for example from other studies.

At this point in the exposition, the book only illustrates requirement 3 by the simplest example of omitted variable bias, where the true model contains a second regressor:

$$Y_i = b_1 + b_2 X_{2i} + b_3 X_{3i} + v_i \qquad (5)$$

From elementary econometrics you are able to derive the bias expression for the OLS/ML estimator $\hat{\beta}_2$ given that (5) is the true model.

Projection matrices

Reference: Davidson and MacKinnon (DM), in particular pages 57-58, or a suitable book chapter on matrix algebra!

The matrix

$$M = I - X(X'X)^{-1}X' \qquad (6)$$

is often called the residual maker. That nickname is easy to understand, since

$$My = (I - X(X'X)^{-1}X')y = y - X(X'X)^{-1}X'y = y - X\hat{\beta} = \hat{\varepsilon}$$

$M$ plays a central role in many derivations. The following properties are worth noting (and showing for yourself):

$$M' = M, \quad \text{symmetric matrix} \qquad (7)$$

$$MM = M, \quad \text{idempotent matrix} \qquad (8)$$

$$MX = 0, \quad \text{regression of } X \text{ on } X \text{ gives perfect fit} \qquad (9)$$

Another way of interpreting $MX = 0$ is that, since $M$ produces the OLS residuals, $M$ is orthogonal to $X$. $M$ does not affect the residuals:

$$M\hat{\varepsilon} = M(y - X\hat{\beta}) = My - MX\hat{\beta} = \hat{\varepsilon} \qquad (10)$$

Since $y = X\beta + \varepsilon$, we also have

$$M\varepsilon = M(y - X\beta) = \hat{\varepsilon} \qquad (11)$$

Note that this gives

$$\hat{\varepsilon}'\hat{\varepsilon} = \varepsilon' M \varepsilon \qquad (12)$$

which shows that RSS is a quadratic form in the theoretical disturbances.

A second important matrix in regression analysis is

$$P = X(X'X)^{-1}X' \qquad (13)$$

which is called the prediction matrix, since

$$\hat{y} = X\hat{\beta} = X(X'X)^{-1}X'y = Py \qquad (14)$$
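The properties (7)-(10) and (14) are worth verifying numerically as well as algebraically. A minimal sketch with simulated data (names and numbers are mine, not from the note):

```python
import numpy as np

rng = np.random.default_rng(1)  # hypothetical data, for illustration only
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T   # prediction matrix (13)
M = np.eye(n) - P                      # residual maker (6)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # OLS via normal equations
resid = y - X @ beta_hat

assert np.allclose(M, M.T)               # (7) symmetric
assert np.allclose(M @ M, M)             # (8) idempotent
assert np.allclose(M @ X, 0)             # (9) MX = 0
assert np.allclose(M @ y, resid)         # My gives the OLS residuals
assert np.allclose(M @ resid, resid)     # (10) M does not affect the residuals
assert np.allclose(P @ y, X @ beta_hat)  # (14) Py gives the fitted values
```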
$P$ is also symmetric and idempotent. In linear algebra, $M$ and $P$ are both known as projection matrices; DM, page 57, in particular, gives the geometric interpretation. $M$ and $P$ are orthogonal:

$$MP = PM = 0 \qquad (15)$$

$M$ and $P$ are also complementary projections:

$$M = I - P, \quad M + P = I \qquad (16)$$

which gives $(M + P)y = Iy = y$, and therefore:

$$y = Py + My = \hat{y} + \hat{\varepsilon} \qquad (17)$$

The scalar product $y'y$ then becomes

$$y'y = (y'P' + y'M')(Py + My) = \hat{y}'\hat{y} + \hat{\varepsilon}'\hat{\varepsilon}$$

Written out, this is

$$\underbrace{\sum_i Y_i^2}_{TSS} = \underbrace{\sum_i \hat{Y}_i^2}_{ESS} + \underbrace{\sum_i \hat{\varepsilon}_i^2}_{RSS} \qquad (18)$$

You may be more used to writing this famous decomposition as

$$\underbrace{\sum_i (Y_i - \bar{Y})^2}_{TSS} = \underbrace{\sum_i (\hat{Y}_i - \bar{Y})^2}_{ESS} + \underbrace{\sum_i \hat{\varepsilon}_i^2}_{RSS} \qquad (19)$$

Question: Why is there no conflict between (18) and (19) as long as $X$ contains a vector of ones (an intercept)?

Alternative ways of writing RSS that sometimes come in handy:

$$\hat{\varepsilon}'\hat{\varepsilon} = (My)'My = y'My = y'\hat{\varepsilon} = \hat{\varepsilon}'y \qquad (20)$$

$$\hat{\varepsilon}'\hat{\varepsilon} = y'y - y'P'Py = y'y - y'X(X'X)^{-1}X'y = y'y - y'X\hat{\beta} = y'y - \hat{\beta}'X'y \qquad (21)$$

Partitioned regression

We now let $\hat{\beta} = \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{pmatrix}$, where $\hat{\beta}_1$ has $k_1$ parameters and $\hat{\beta}_2$ has $k_2$ parameters, $k_1 + k_2 = k$. The corresponding partitioning of $X$ is $X = [\underset{n \times k_1}{X_1} \;\; \underset{n \times k_2}{X_2}]$. The OLS estimated model with this notation:

$$y = X_1 \hat{\beta}_1 + X_2 \hat{\beta}_2 + \hat{\varepsilon}, \quad \hat{\varepsilon} = My \qquad (22)$$

We begin by writing out the normal equations:

$$(X'X)\hat{\beta} = X'y$$
In terms of the partitioning:

$$\begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix} \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{pmatrix} = \begin{pmatrix} X_1'y \\ X_2'y \end{pmatrix} \qquad (23)$$

where we have used

$$X'X = \begin{pmatrix} X_1' \\ X_2' \end{pmatrix} [X_1 \; X_2]$$

see page 0 and Exercise .9 in DM. The vector of OLS estimators can therefore be expressed as:

$$\begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{pmatrix} = \begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix}^{-1} \begin{pmatrix} X_1'y \\ X_2'y \end{pmatrix} \qquad (24)$$

We need a result for the inverse of a partitioned matrix. There are several versions, but probably the most useful for us is:

$$\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}^{-1} = \begin{pmatrix} A_{11\cdot 2}^{-1} & -A_{11\cdot 2}^{-1}A_{12}A_{22}^{-1} \\ -A_{22\cdot 1}^{-1}A_{21}A_{11}^{-1} & A_{22\cdot 1}^{-1} \end{pmatrix} \qquad (25)$$

where

$$A_{11\cdot 2} = A_{11} - A_{12}A_{22}^{-1}A_{21} \qquad (26)$$

$$A_{22\cdot 1} = A_{22} - A_{21}A_{11}^{-1}A_{12} \qquad (27)$$

With the aid of (25)-(27) it can be shown that

$$\begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{pmatrix} = \begin{pmatrix} (X_1'M_2X_1)^{-1}X_1'M_2y \\ (X_2'M_1X_2)^{-1}X_2'M_1y \end{pmatrix} \qquad (28)$$

where $M_1$ and $M_2$ are the two residual makers:

$$M_1 = I - X_1(X_1'X_1)^{-1}X_1' \qquad (29)$$

$$M_2 = I - X_2(X_2'X_2)^{-1}X_2' \qquad (30)$$

As an application, start with

$$X = [X_1 \; X_2] = [\iota \; X_2], \quad \text{where} \quad \iota = \underset{n \times 1}{\begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}}, \quad X_2 = \begin{pmatrix} X_{12} & \cdots & X_{1k} \\ \vdots & & \vdots \\ X_{n2} & \cdots & X_{nk} \end{pmatrix}$$

so

$$M_1 = I - \iota(\iota'\iota)^{-1}\iota' = I - \frac{1}{n}\iota\iota' \equiv M_\iota$$
the so-called centering matrix. Show that it is:

$$M_\iota = \begin{pmatrix} 1-\frac{1}{n} & -\frac{1}{n} & \cdots & -\frac{1}{n} \\ -\frac{1}{n} & 1-\frac{1}{n} & & \vdots \\ \vdots & & \ddots & -\frac{1}{n} \\ -\frac{1}{n} & \cdots & -\frac{1}{n} & 1-\frac{1}{n} \end{pmatrix}$$

By using the general formula for partitioned regression above, we get

$$\hat{\beta}_2 = (X_2'M_1X_2)^{-1}X_2'M_1y = \left( [M_\iota X_2]'[M_\iota X_2] \right)^{-1} [M_\iota X_2]'y = \left[ (X_2 - \bar{X}_2)'(X_2 - \bar{X}_2) \right]^{-1} (X_2 - \bar{X}_2)'y \qquad (31)$$

where $\bar{X}_2$ is the $n \times (k-1)$ matrix with the variable means in the $k-1$ columns. In the exercise set to Seminar you are invited to show the same result for $\hat{\beta}_2$ more directly, without the use of the results from partitioned regression, by re-writing the regression model.

Frisch-Waugh-Lovell (FWL) theorem

The expression for $\hat{\beta}_2$ in (28) suggests that there is another simple method for finding $\hat{\beta}_2$ that involves $M_1$: we premultiply the model (22), first with $M_1$ and then with $X_2'$:

$$M_1 y = M_1 X_2 \hat{\beta}_2 + \hat{\varepsilon} \qquad (32)$$

$$X_2'M_1 y = X_2'M_1 X_2 \hat{\beta}_2 \qquad (33)$$

where we have used $M_1 X_1 = 0$ and $M_1 \hat{\varepsilon} = \hat{\varepsilon}$ in (32), and $X_2'\hat{\varepsilon} = 0$ in (33), since the influence of the $k_1$ first variables has already been regressed out. We can find $\hat{\beta}_2$ from this normal equation:

$$\hat{\beta}_2 = (X_2'M_1X_2)^{-1}X_2'M_1y \qquad (34)$$

It can be shown that the covariance matrix is

$$\mathrm{Cov}(\hat{\beta}_2) = \hat{\sigma}^2 (X_2'M_1X_2)^{-1}$$

where $\hat{\sigma}^2$ denotes the unbiased estimator of the common variance of the disturbances:

$$\hat{\sigma}^2 = \frac{\hat{\varepsilon}'\hat{\varepsilon}}{n-k}$$

The corresponding expressions for $\hat{\beta}_1$ are:

$$\hat{\beta}_1 = (X_1'M_2X_1)^{-1}X_1'M_2y \qquad (35)$$

$$\mathrm{Cov}(\hat{\beta}_1) = \hat{\sigma}^2 (X_1'M_2X_1)^{-1}$$

Interpretation of (34): use symmetry and idempotency of $M_1$:

$$\hat{\beta}_2 = \left( (M_1X_2)'(M_1X_2) \right)^{-1} (M_1X_2)'M_1y$$

Here, $M_1 y$ is the residual vector $\hat{\varepsilon}_{Y \cdot X_1}$ from regressing $y$ on $X_1$, and $M_1 X_2$ is the matrix with the $k_2$ vectors of residuals obtained from regressing each variable in $X_2$ on $X_1$, $\hat{\varepsilon}_{X_2 \cdot X_1}$. But this means that $\hat{\beta}_2$ can be obtained by first running these auxiliary regressions, saving the residuals in $\hat{\varepsilon}_{Y \cdot X_1}$ and $\hat{\varepsilon}_{X_2 \cdot X_1}$, and then running a second regression with $\hat{\varepsilon}_{Y \cdot X_1}$ as regressand and $\hat{\varepsilon}_{X_2 \cdot X_1}$ as the matrix of regressors:

$$\hat{\beta}_2 = \left( (M_1X_2)'(M_1X_2) \right)^{-1} (M_1X_2)'(M_1y) = \left( \hat{\varepsilon}_{X_2 \cdot X_1}'\hat{\varepsilon}_{X_2 \cdot X_1} \right)^{-1} \hat{\varepsilon}_{X_2 \cdot X_1}'\hat{\varepsilon}_{Y \cdot X_1} \qquad (36)$$
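The equality of the full-regression estimate and the two-step residual-regression estimate of $\hat{\beta}_2$ in (36) can be checked numerically. A minimal sketch with simulated data (the design, with an intercept and a trend in $X_1$, is my own choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)  # hypothetical simulated data
n = 80
X1 = np.column_stack([np.ones(n), np.arange(n)])  # e.g. intercept and trend
X2 = rng.normal(size=(n, 2))
y = X1 @ np.array([1.0, 0.02]) + X2 @ np.array([0.5, -0.8]) + rng.normal(size=n)

# Route 1: full regression of y on [X1 X2]
X = np.hstack([X1, X2])
beta = np.linalg.solve(X.T @ X, X.T @ y)
beta2_full = beta[2:]

# Route 2: the two-step FWL procedure via the residual maker M1, eq. (29)
M1 = np.eye(n) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
e_y = M1 @ y    # residuals from regressing y on X1
e_X2 = M1 @ X2  # residuals from regressing each column of X2 on X1

beta2_fwl = np.linalg.solve(e_X2.T @ e_X2, e_X2.T @ e_y)  # eq. (36)
assert np.allclose(beta2_full, beta2_fwl)
```

With $X_1$ containing a trend, as here, the second route is exactly the "trend correction" idea: detrend $y$ and $X_2$ first, then regress the detrended series on each other.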
We have shown the Frisch-Waugh(-Lovell) theorem. [Followers of ECON 450 spring 03 or 04 will have seen a scalar version of this. The 450 lecture notes are still on those course pages.]

In sum: if we want to estimate the partial effects on $Y$ of changes in the variables in $X_2$ (i.e., controlled for the influence of $X_1$), we can do that in two ways: either by estimating the full multivariate model $y = X_1\hat{\beta}_1 + X_2\hat{\beta}_2 + \hat{\varepsilon}$, or by following the above two-step procedure of obtaining residuals with $X_1$ as regressor and then obtaining $\hat{\beta}_2$ from (36).

Other applications:
- Trend correction
- Seasonal adjustment by seasonal dummy variables (DM p 69)
- Leverage (or influential observations), DM p 76-77
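The seasonal-adjustment application works the same way: with seasonal dummies as $X_1$, the FWL theorem says that regressing seasonally adjusted $y$ on seasonally adjusted $x$ reproduces the slope from the full dummy-variable regression. A hypothetical quarterly example (my own numbers, not from DM):

```python
import numpy as np

rng = np.random.default_rng(3)  # hypothetical quarterly data, for illustration
n = 60
q = np.arange(n) % 4
D = np.eye(4)[q]             # four seasonal dummies, playing the role of X1
x = rng.normal(size=(n, 1))  # the regressor of interest, playing the role of X2
seas = np.array([2.0, -1.0, 0.5, 3.0])
y = D @ seas + 0.7 * x[:, 0] + rng.normal(scale=0.3, size=n)

# FWL: seasonally adjust y and x with the residual maker of D, then regress
MD = np.eye(n) - D @ np.linalg.inv(D.T @ D) @ D.T
b_fwl = np.linalg.solve((MD @ x).T @ (MD @ x), (MD @ x).T @ (MD @ y))

# Full regression with the dummies included gives the same slope on x
Xfull = np.hstack([D, x])
b_full = np.linalg.solve(Xfull.T @ Xfull, Xfull.T @ y)
assert np.isclose(b_full[4], b_fwl[0])
```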