2. OLS Part II

The OLS residuals are orthogonal to the regressors. If the model includes an intercept, the orthogonality of the residuals and regressors gives rise to three results, which have limited practical usefulness but are good exercises for understanding the algebra of least squares. Partitioned and partial regression is not treated as seriously as it might be in a graduate course, and is only given a cursory presentation. However, the residual maker matrix $M_i$ is presented, and is used in defining $R^2$, and in several other parts of the course.

2.1 Some basic properties of OLS

First, note that the LS residuals are orthogonal to the regressors:
$$X'Xb - X'y = 0 \qquad (\text{normal equations}; \ (k \times 1))$$
So, $X'(y - Xb) = X'e = 0$; or, $X'e = 0$.

Orthogonality implies linear independence (but not vice versa). See "Linearly Independent, Orthogonal, and Uncorrelated Variables" (Rodgers et al., 1984) for the definitions of, and relationships between, the three terms.

If the model includes an intercept term, then one regressor (say, the first column of $X$) is a unit vector. In this case we get some further results:

1. The LS residuals sum to zero.

From the first element of $X'e = 0$:
$$X'e = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_{12} & x_{22} & \cdots & x_{n2} \\ \vdots & & & \vdots \\ x_{1k} & x_{2k} & \cdots & x_{nk} \end{pmatrix} \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix} = \begin{pmatrix} \sum_i e_i \\ \vdots \\ \vdots \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix} \quad\Rightarrow\quad \sum_{i=1}^{n} e_i = 0$$
2. The fitted regression passes through the sample means.

The normal equations give $X'y = X'Xb$. Writing out the first row of this vector equation (the row corresponding to the unit-vector regressor):
$$\sum_{i=1}^{n} y_i = n\,b_1 + b_2 \sum_{i=1}^{n} x_{i2} + \cdots + b_k \sum_{i=1}^{n} x_{ik}$$
or, dividing through by $n$:
$$\bar{y} = b_1 + b_2 \bar{x}_2 + \cdots + b_k \bar{x}_k$$

3. The sample mean of the fitted y-values equals the sample mean of the actual y-values.

The model $y_i = x_i'\beta + \varepsilon_i$ is fitted as $y_i = x_i'b + e_i = \hat{y}_i + e_i$. Summing over the sample:
$$\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \hat{y}_i + \sum_{i=1}^{n} e_i \quad\Rightarrow\quad \bar{y} = \bar{\hat{y}} + 0 = \bar{\hat{y}}$$

Note: These last 3 results use the fact that the model includes an intercept.
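All three results can be verified numerically. Here is a minimal sketch (my own illustration, not part of the original notes), using simulated data in Python/NumPy, with the unit vector as the first column of $X$:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # unit vector first, then 2 regressors
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)     # simulated data

b = np.linalg.solve(X.T @ X, X.T @ y)   # OLS: solve the normal equations X'Xb = X'y
e = y - X @ b                           # LS residuals
y_hat = X @ b                           # fitted values

print(X.T @ e)                          # ~ 0: residuals orthogonal to the regressors
print(e.sum())                          # ~ 0: result 1, residuals sum to zero
print(y.mean(), b @ X.mean(axis=0))     # equal: result 2, fit passes through the means
print(y.mean(), y_hat.mean())           # equal: result 3, mean of fitted = mean of actual
```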
2.2 Partitioned and Partial Regression

We can solve for blocks of $b$. Suppose we partition our model into 2 blocks:
$$\underset{(n \times 1)}{y} = \underset{(n \times k_1)}{X_1}\,\underset{(k_1 \times 1)}{\beta_1} + \underset{(n \times k_2)}{X_2}\,\underset{(k_2 \times 1)}{\beta_2} + \underset{(n \times 1)}{\varepsilon}$$
We could solve for the OLS estimator of $\beta_1$ only:
$$b_1 = (X_1'X_1)^{-1}X_1'y - (X_1'X_1)^{-1}X_1'X_2 b_2$$
When does $b_1 = (X_1'X_1)^{-1}X_1'y$?

Above we have a solution for $b_1$ in terms of $b_2$. We can substitute in a similar solution for $b_2$ to obtain a solution for $b_1$ that is a function of only the $X$ and $y$ data.

Define:
$$M_2 = I - X_2(X_2'X_2)^{-1}X_2'$$
$M_2$ is a symmetric, idempotent matrix:
$$M_2M_2 = M_2'M_2 = M_2 = M_2M_2'$$
Then, we can write:
$$b_1 = (X_1'M_2X_1)^{-1}X_1'M_2y$$
By the symmetry of the problem, we can interchange the 1 and 2 subscripts, and get:
$$b_2 = (X_2'M_1X_2)^{-1}X_2'M_1y$$
So, finally, we can write:
$$b_1 = (X_1^{*\prime}X_1^{*})^{-1}X_1^{*\prime}y_1^{*} \qquad ; \qquad b_2 = (X_2^{*\prime}X_2^{*})^{-1}X_2^{*\prime}y_2^{*}$$
where: $X_1^{*} = M_2X_1$; $X_2^{*} = M_1X_2$; $y_1^{*} = M_2y$; $y_2^{*} = M_1y$.

These results are important for several reasons:
- Historically, for computational reasons. See the Frisch-Waugh-Lovell Theorem.
- In certain situations $b_1$ and $b_2$ may have different properties. This is difficult to show without having separate formulas.
- Having established the residual maker matrix $M_i$, it is used frequently to derive other results.

Why is $M_i$ called a "residual maker" matrix?
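The following sketch (my own, using simulated data) verifies the partitioned-regression result numerically: the $b_1$ block of the full OLS coefficient vector matches the coefficients from regressing $M_2y$ on $M_2X_1$. It also illustrates the answer to the question above: $M_2$ "makes residuals", in that $M_2y$ is exactly the residual vector from regressing $y$ on $X_2$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k1 = 200, 2
X1 = rng.normal(size=(n, k1))
X2 = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # block 2 holds the intercept
y = X1 @ np.array([1.5, -0.7]) + X2 @ np.array([2.0, 0.3, 0.9]) + rng.normal(size=n)

# Full regression on [X1, X2]; keep the first k1 coefficients
b1_full = np.linalg.lstsq(np.hstack([X1, X2]), y, rcond=None)[0][:k1]

# Partialled-out ("starred") data: premultiply by the residual maker M2
M2 = np.eye(n) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)
b1_fwl = np.linalg.lstsq(M2 @ X1, M2 @ y, rcond=None)[0]

print(np.allclose(b1_full, b1_fwl))    # True: the two routes agree (FWL)
c = np.linalg.lstsq(X2, y, rcond=None)[0]
print(np.allclose(M2 @ y, y - X2 @ c)) # True: M2 y = residuals from regressing y on X2
```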
2.3 Goodness-of-Fit

One way of measuring the quality of a fitted regression model is by the extent to which the model explains the sample variation of $y$.

The sample variance of $y$ is $\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$. Or, we could just use $\sum_{i=1}^{n}(y_i - \bar{y})^2$ to measure variability.

Our fitted regression model, using LS, gives us
$$y = Xb + e = \hat{y} + e$$
where $\hat{y} = Xb = X(X'X)^{-1}X'y$.

Recall that if the model includes an intercept, then the residuals sum to zero, and $\bar{\hat{y}} = \bar{y}$.

To simplify things, introduce the following matrix:
$$M^0 = \left[I_n - \tfrac{1}{n}\,\iota\iota'\right]$$
where $\iota$ is the $(n \times 1)$ vector $(1, 1, \ldots, 1)'$.

Note that:
- $M^0$ is an idempotent matrix.
- $M^0\iota = 0$.
- $M^0$ transforms the elements of a vector into deviations from the sample mean.
- $y'M^0y = y'M^{0\prime}M^0y = \sum_{i=1}^{n}(y_i - \bar{y})^2$.

Let's check the third of these results:
$$M^0 y = \left\{\begin{pmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{pmatrix} - \begin{pmatrix} 1/n & \cdots & 1/n \\ \vdots & & \vdots \\ 1/n & \cdots & 1/n \end{pmatrix}\right\} \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} y_1 - \bar{y} \\ y_2 - \bar{y} \\ \vdots \\ y_n - \bar{y} \end{pmatrix}$$

Returning to our fitted model, $y = Xb + e = \hat{y} + e$. So, we have:
$$M^0y = M^0\hat{y} + M^0e = M^0\hat{y} + e$$
[$M^0e = e$, because the residuals sum to zero.]

Then
$$y'M^0y = y'M^{0\prime}M^0y = (M^0\hat{y} + e)'(M^0\hat{y} + e) = \hat{y}'M^0\hat{y} + e'e + 2e'M^0\hat{y}$$
However, $e'M^0\hat{y} = e'M^{0\prime}\hat{y} = (M^0e)'\hat{y} = e'\hat{y} = e'X(X'X)^{-1}X'y = 0$, because $X'e = 0$.

So, we have (recalling that $\bar{\hat{y}} = \bar{y}$):
$$y'M^0y = \hat{y}'M^0\hat{y} + e'e$$
$$\sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n}e_i^2$$
$$SST = SSR + SSE$$

This lets us define the Coefficient of Determination:
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
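As a quick check (again my own sketch, with simulated data and an intercept in the model), the decomposition and the two equivalent $R^2$ formulas can be verified numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # includes an intercept
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ b
e = y - y_hat

sst = np.sum((y - y.mean()) ** 2)
ssr = np.sum((y_hat - y.mean()) ** 2)   # valid since mean(y_hat) = mean(y) here
sse = np.sum(e ** 2)

print(np.isclose(sst, ssr + sse))       # True: SST = SSR + SSE
print(ssr / sst, 1 - sse / sst)         # the two R^2 formulas agree
```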
Note: The second equality in the definition of $R^2$ holds only if the model includes an intercept.
$$R^2 = \frac{SSR}{SST} \geq 0 \qquad ; \qquad R^2 = 1 - \frac{SSE}{SST} \leq 1$$
So, $0 \leq R^2 \leq 1$. Interpretation of 0 and 1? Note that $R^2$ is unitless.

What happens if we add any regressor(s) to the model?
$$y = X_1\beta_1 + \varepsilon \qquad [1]$$
Then:
$$y = X_1\beta_1 + X_2\beta_2 + u \qquad [2]$$

(A) Applying LS to [2]: min. $(u'u)$; $u = y - X_1b_1 - X_2b_2$
(B) Applying LS to [1]: min. $(e'e)$; $e = y - X_1b_1$

Problem (B) is just Problem (A), subject to the restriction $\beta_2 = 0$. The minimized value in (A) must be $\leq$ the minimized value in (B). So, $u'u \leq e'e$.

What does this imply? Adding any regressor(s) to the model cannot increase (and typically will decrease) the sum of squared residuals. So, adding any regressor(s) to the model cannot decrease (and typically will increase) the value of $R^2$.

This means that $R^2$ is not really a very interesting measure of the quality of the regression model, in terms of explaining the sample variability of the dependent variable.

For these reasons, we usually use the adjusted Coefficient of Determination. We modify
$$R^2 = 1 - \left[\frac{e'e}{y'M^0y}\right]$$
to become:
$$\bar{R}^2 = 1 - \left[\frac{e'e/(n-k)}{y'M^0y/(n-1)}\right]$$

What are we doing here? We're adjusting for the degrees of freedom in the numerator and denominator.
- Degrees of freedom = the number of independent pieces of information.
- $e = y - Xb$. We estimate $k$ parameters from the $n$ data points, so we have $(n - k)$ degrees of freedom associated with the fitted model.
- In the denominator, we have constructed $\bar{y}$ from the sample, so we have lost one degree of freedom.

Some facts about $\bar{R}^2$:
- It is possible for $\bar{R}^2 < 0$ (even with an intercept in the model).
- $\bar{R}^2$ can increase or decrease when we add regressors. When will it increase (decrease)?
- In multiple regression, $\bar{R}^2$ will increase (decrease) if a variable is deleted, if and only if the associated t-statistic has absolute value less than (greater than) unity (demonstrated in the sketch below).
- If the model doesn't include an intercept, then $SST \neq SSR + SSE$, and in this case there is no longer any guarantee that $0 \leq R^2 \leq 1$.
- We must be careful comparing $R^2$ and $\bar{R}^2$ values across models.

Example:
(1) $C_i = 0.5 + 0.8\,Y_i$; $R^2 = 0.90$
(2) $\log(C_i) = 0.2 + 0.75\,Y_i$; $R^2 = 0.80$
The sample variation is in different units, so the two $R^2$ values are not comparable.
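To close, here is a sketch (my own; simulated data, and the helper `fit` is a hypothetical name) of the degrees-of-freedom adjustment and of the t-statistic rule above: adding a pure-noise regressor never lowers $R^2$, while deleting it raises $\bar{R}^2$ exactly when its t-statistic has absolute value below unity.

```python
import numpy as np

def fit(X, y):
    """OLS; returns R^2, adjusted R^2, and the vector of t-statistics."""
    n, k = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    sst = np.sum((y - y.mean()) ** 2)
    r2 = 1 - (e @ e) / sst
    adj = 1 - (e @ e / (n - k)) / (sst / (n - 1))
    se = np.sqrt((e @ e / (n - k)) * np.diag(np.linalg.inv(X.T @ X)))
    return r2, adj, b / se

rng = np.random.default_rng(7)
n = 40
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X1 @ np.array([1.0, 0.5]) + rng.normal(size=n)
X2 = np.column_stack([X1, rng.normal(size=n)])      # append a pure-noise regressor

r2_1, adj_1, _ = fit(X1, y)
r2_2, adj_2, t_2 = fit(X2, y)
print(r2_2 >= r2_1)                                 # True: R^2 cannot fall
print((adj_1 > adj_2) == (abs(t_2[-1]) < 1))        # True: the |t| < 1 rule
```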