Econometrics I. Professor William Greene Stern School of Business Department of Economics 5-1/36. Part 5: Regression Algebra and Fit




2 Econometrics I, Part 5: Regression Algebra and Fit; Restricted Least Squares

3 Minimizing $e'e$
$b$ minimizes $e'e = (y - Xb)'(y - Xb)$. Any other coefficient vector has a larger sum of squared residuals.
Proof: Let $d$ be any coefficient vector not equal to $b$, and let $u = y - Xd$ be its residual vector. Then $u = y - Xb + Xb - Xd = e - X(d - b)$.
The sum of squares using $d$ is $u'u = (y - Xd)'(y - Xd) = [e - X(d - b)]'[e - X(d - b)]$.
Expand to find $u'u = e'e + (d - b)'X'X(d - b)$. (The cross product term is $-2e'X(d - b) = 0$ because $X'e = 0$.)
With $v = X(d - b)$, $u'u = e'e + v'v > e'e$, since $v \neq 0$ when $X$ has full column rank.
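The proof above can be checked numerically. The following is a minimal sketch on simulated data; the sample size, coefficients, seed, and the competitor vector `d` are illustrative assumptions, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)   # least squares coefficients
e = y - X @ b                           # least squares residuals

d = b + rng.normal(scale=0.1, size=K)   # an arbitrary competitor, d != b
u = y - X @ d                           # residuals using d
v = X @ (d - b)

ee, uu, vv = e @ e, u @ u, v @ v
print("e'e =", ee, " u'u =", uu, " e'e + v'v =", ee + vv)
```

Any other choice of `d` should give the same pattern: the excess sum of squares equals `v'v`.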

4 Dropping a Variable
An important special case. Suppose
$b_{X,z} = [b, c]$ = the regression coefficients in a regression of $y$ on $[X, z]$;
$b_X = [d, 0]$ = the same, but computed to force the coefficient on $z$ to equal 0. This removes $z$ from the regression.
We are comparing the results that we get with and without the variable $z$ in the equation. Results which we can show:
Dropping a variable (or variables) cannot improve the fit; that is, it cannot reduce the sum of squared residuals.
Adding a variable (or variables) cannot degrade the fit; that is, it cannot increase the sum of squared residuals.

5 Adding a Variable Never Increases the Sum of Squares
Theorem 3.5 on text page 40. Let $u$ = the residual in the regression of $y$ on $[X, z]$ and $e$ = the residual in the regression of $y$ on $X$ alone. Then
$u'u = e'e - c^2 (z^{*\prime} z^*) \le e'e$,
where $z^* = M_X z$ (the residuals from a regression of $z$ on $X$) and $c$ is the coefficient on $z$ in the regression of $y$ on $[X, z]$.
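A small numerical check of the theorem, offered as a sketch under assumed simulated data; the helper `ols` and all variable names are hypothetical conveniences introduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
z = 0.5 * X[:, 1] + rng.normal(size=n)            # new variable, correlated with X
y = X @ np.array([1.0, 0.5, -0.5]) + 0.3 * z + rng.normal(size=n)

def ols(W, target):
    """Return (coefficients, residuals) from least squares of target on W."""
    coef, *_ = np.linalg.lstsq(W, target, rcond=None)
    return coef, target - W @ coef

_, e = ols(X, y)                                  # y on X alone
coef_full, u = ols(np.column_stack([X, z]), y)    # y on [X, z]
c = coef_full[-1]                                 # coefficient on z
_, z_star = ols(X, z)                             # z* = M_X z

drop_in_ssr = c**2 * (z_star @ z_star)
print("e'e =", e @ e, " u'u =", u @ u, " c^2 z*'z* =", drop_in_ssr)
```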

6 The Fit of the Regression
Variation: In the context of the model, we speak of covariation of a variable as movement of the variable, usually associated with (not necessarily caused by) movement of another variable.
Total variation $= \sum_{i=1}^n (y_i - \bar{y})^2 = y'M^0 y$.
$M^0 = I - i(i'i)^{-1}i'$ = the M matrix for $X$ = a column of ones.

7 Decomposing the Variation
$y_i = x_i'b + e_i$
$y_i - \bar{y} = x_i'b - \bar{x}'b + e_i = (x_i - \bar{x})'b + e_i$
$\sum_{i=1}^n (y_i - \bar{y})^2 = \sum_{i=1}^n [(x_i - \bar{x})'b]^2 + \sum_{i=1}^n e_i^2$ (Sum of cross products is zero.)
Total variation = regression variation + residual variation.
Recall the decomposition: $\mathrm{Var}[y] = \mathrm{Var}[E[y \mid x]] + E[\mathrm{Var}[y \mid x]]$ = variation of the conditional mean around the overall mean + variation around the conditional mean function.

8 Decomposing the Variation of Vector y
Decomposition: (This all assumes the model contains a constant term; one of the columns in $X$ is $i$.)
$y = Xb + e$, so $M^0 y = M^0 Xb + M^0 e = M^0 Xb + e$. (Deviations from means.)
$y'M^0 y = b'(X'M^0)(M^0 X)b + e'e = b'X'M^0 Xb + e'e$. ($M^0$ is idempotent and $e'M^0 X = e'X = 0$.)
Total sum of squares = regression sum of squares (SSR) + residual sum of squares (SSE).
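The matrix decomposition can be verified directly by building $M^0$. This is a sketch on assumed simulated data; the names `sst`, `ssr`, and `sse` are labels chosen here.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # first column is i
y = X @ np.array([2.0, 1.0, -1.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

i = np.ones(n)
M0 = np.eye(n) - np.outer(i, i) / n     # M0 = I - i(i'i)^{-1}i', since i'i = n

sst = y @ M0 @ y                        # total sum of squares
ssr = b @ X.T @ M0 @ X @ b              # regression sum of squares
sse = e @ e                             # residual sum of squares
print("SST =", sst, " SSR + SSE =", ssr + sse)
```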

9 A Fit Measure
$R^2 = \dfrac{b'X'M^0 Xb}{y'M^0 y} = 1 - \dfrac{e'e}{\sum_{i=1}^n (y_i - \bar{y})^2} = \dfrac{\text{regression variation}}{\text{total variation}}$
(Very Important Result.) $R^2$ is bounded by zero and one only if:
(a) there is a constant term in $X$, and
(b) the line is computed by linear least squares.
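Condition (a) can be illustrated with a tiny made-up data set (five observations chosen here purely for illustration). Computing $1 - e'e / \sum_i (y_i - \bar{y})^2$ for a regression without a constant term gives a value far below zero, while the same data with a constant give a value inside $[0, 1]$. The helper name `r2_one_minus` is hypothetical.

```python
import numpy as np

y = np.array([10.0, 11.0, 9.0, 10.0, 10.0])
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

def r2_one_minus(W, target):
    """1 - e'e / sum (y_i - ybar)^2 for least squares of target on W."""
    coef, *_ = np.linalg.lstsq(W, target, rcond=None)
    e = target - W @ coef
    return 1.0 - (e @ e) / np.sum((target - target.mean()) ** 2)

r2_no_const = r2_one_minus(x[:, None], y)                          # no constant term
r2_const = r2_one_minus(np.column_stack([np.ones_like(x), x]), y)  # with constant

print(r2_no_const)   # approximately -249.95
print(r2_const)      # approximately 0.05
```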

10 Adding Variables
$R^2$ never falls when a variable $z$ is added to the regression.
A useful general result for adding a variable: $R^2_{Xz}$ (with both $X$ and variable $z$) equals $R^2_X$ (with only $X$) plus the increase in fit due to $z$ after $X$ is accounted for:
$R^2_{Xz} = R^2_X + (1 - R^2_X)\, r^{*2}_{yz}$,
where $r^*_{yz}$ is the partial correlation between $y$ and $z$, controlling for $X$.
Useful practical wisdom: It is not possible meaningfully to accumulate $R^2$ by adding variables in sequence. The incremental fit added by each variable depends on the order. The increase in $R^2$ that occurs from $x_3$ in ($x_1$ then $x_2$ then $x_3$) is different from that in ($x_1$ then $x_3$ then $x_2$).
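The result for $R^2$ with and without $z$ can be checked numerically. In this sketch (simulated data and names are assumptions), the partial correlation is computed as the correlation between the residuals of $y$ on $X$ and the residuals of $z$ on $X$; both residual vectors have mean zero because $X$ contains a constant.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
z = 0.4 * X[:, 2] + rng.normal(size=n)
y = X @ np.array([1.0, 1.0, -1.0]) + 0.5 * z + rng.normal(size=n)

def resid(W, target):
    """Residuals from least squares of target on W."""
    coef, *_ = np.linalg.lstsq(W, target, rcond=None)
    return target - W @ coef

sst = np.sum((y - y.mean()) ** 2)
e = resid(X, y)                              # y on X
u = resid(np.column_stack([X, z]), y)        # y on [X, z]
z_star = resid(X, z)                         # z on X

r2_X = 1.0 - (e @ e) / sst
r2_Xz = 1.0 - (u @ u) / sst
r_star_sq = (e @ z_star) ** 2 / ((e @ e) * (z_star @ z_star))  # squared partial correlation

print(r2_Xz, r2_X + (1.0 - r2_X) * r_star_sq)
```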

11 Adding Variables to a Model
What is the effect of adding PN, PD, PS?

12 A Useful Result
The squared partial correlation of an $x$ in $X$ with $y$ is
(squared t-ratio) / (squared t-ratio + degrees of freedom).
We will define the 't-ratio' and 'degrees of freedom' later. Note how it enters:
$R^2_{Xz} = R^2_X + (1 - R^2_X)\, r^{*2}_{yz}$, so $r^{*2}_{yz} = (R^2_{Xz} - R^2_X)/(1 - R^2_X)$.
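A sketch of the t-ratio form of the squared partial correlation, on assumed simulated data. The t-ratio is computed here in the conventional way, coefficient divided by its estimated standard error with $s^2 = u'u/\text{df}$; since the slides defer the definition, treat that as an assumption of this example.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 36
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
z = rng.normal(size=n) + 0.3 * X[:, 1]
y = X @ np.array([1.0, 0.5, 0.5, -0.5]) + 0.4 * z + rng.normal(size=n)

W = np.column_stack([X, z])                  # full regressor matrix [X, z]
WtW_inv = np.linalg.inv(W.T @ W)
coef = WtW_inv @ W.T @ y
u = y - W @ coef
df = n - W.shape[1]                          # degrees of freedom
s2 = (u @ u) / df
t = coef[-1] / np.sqrt(s2 * WtW_inv[-1, -1]) # t-ratio on z
from_t = t**2 / (t**2 + df)

def resid(A, target):
    c, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ c

e, z_star = resid(X, y), resid(X, z)
direct = (e @ z_star) ** 2 / ((e @ e) * (z_star @ z_star))

print(from_t, direct)
```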

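13 Partial Correlation
Partial correlation is a difference in $R^2$s. For PS in the example above, $R^2$ without PS = .9861 and $R^2$ with PS = .9907, so
$r^{*2} = (.9907 - .9861)/(1 - .9861) \approx .33$,
which equals $t^2/(t^2 + (36 - 5))$, where $t$ is the t-ratio on PS.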

14 Comparing Fits of Regressions
Make sure the denominator in $R^2$ is the same, i.e., the same left-hand-side variable. Example: linear vs. loglinear. Loglinear will almost always appear to fit better because taking logs reduces variation.


16 (Linearly) Transformed Data
How does linear transformation affect the results of least squares? Let $Z = XP$ for $K \times K$ nonsingular $P$.
Based on $X$, $b = (X'X)^{-1}X'y$.
Based on $Z$, $c = (Z'Z)^{-1}Z'y = (P'X'XP)^{-1}P'X'y = P^{-1}(X'X)^{-1}(P')^{-1}P'X'y = P^{-1}b$.
Fitted value is $Zc = (XP)(P^{-1}b) = Xb$. The same!!
Residuals from using $Z$ are $y - Zc = y - Xb$ (we just proved this). The same!!
Sum of squared residuals must be identical, as $y - Xb = e = y - Zc$.
$R^2$ must also be identical, as $R^2 = 1 - e'e/y'M^0 y$ (!!).
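The invariance result can be confirmed with a short simulation. The data and the particular matrix `P` below are illustrative assumptions; `P` is upper triangular with nonzero diagonal, so it is nonsingular.

```python
import numpy as np

rng = np.random.default_rng(5)
n, K = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=n)

P = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [0.0, 0.0, 2.0]])          # nonsingular: determinant is 2
Z = X @ P

b = np.linalg.solve(X.T @ X, X.T @ y)    # coefficients based on X
c = np.linalg.solve(Z.T @ Z, Z.T @ y)    # coefficients based on Z

print("c       =", c)
print("P^-1 b  =", np.linalg.solve(P, b))
print("max |Zc - Xb| =", np.max(np.abs(Z @ c - X @ b)))
```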

17 Linear Transformation
What are the practical implications of this result?
(1) Transformation does not affect the fit of a model to a body of data.
(2) Transformation does affect the estimates. If $b$ is an estimate of something ($\beta$), then $c$ cannot be an estimate of $\beta$; it must be an estimate of $P^{-1}\beta$, which might have no meaning at all.
$Xb$ is the projection of $y$ into the column space of $X$. $Zc$ is the projection of $y$ into the column space of $Z$. But, since the columns of $Z$ are just linear combinations of those of $X$, the column space of $Z$ must be identical to that of $X$. Therefore, the projection of $y$ into the former must be the same as the latter, which produces the other results.

18 Principal Components
$Z = XC$
Fewer columns than $X$
Includes as much variation of $X$ as possible
Columns of $Z$ are orthogonal
Why do we do this?
Collinearity
Combine variables of ambiguous identity, such as test scores as measures of ability

19 What is a Principal Component?
$X$ = a data matrix (deviations from means)
$z = Xp$ = a linear combination of the columns of $X$.
Choose $p$ to maximize the variation of $z$, $z'z = p'X'Xp$, subject to the normalization $p'p = 1$.
How? $p$ = the eigenvector that corresponds to the largest eigenvalue of $X'X$. (Notes 7:41-44.)
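A sketch of computing the first principal component with an eigendecomposition, assuming simulated data. The unit-length normalization of $p$ and the comparison vector `q` are assumptions of this illustration; `np.linalg.eigh` returns eigenvalues in ascending order, so the last column is the eigenvector wanted.

```python
import numpy as np

rng = np.random.default_rng(7)
raw = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))  # correlated columns
X = raw - raw.mean(axis=0)                 # deviations from means

eigvals, eigvecs = np.linalg.eigh(X.T @ X) # symmetric matrix, ascending eigenvalues
p = eigvecs[:, -1]                         # eigenvector for the largest eigenvalue
z = X @ p                                  # first principal component

q = rng.normal(size=4)
q /= np.linalg.norm(q)                     # some other unit-length weight vector
z_other = X @ q

print("variation of z:      ", z @ z)
print("largest eigenvalue:  ", eigvals[-1])
print("variation using q:   ", z_other @ z_other)
```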


21 Movie Regression: Opening Week Box for 62 Films
Ordinary least squares regression. LHS = LOGBOX. Number of observations = 62.
Output columns: Variable, Coefficient, Standard Error, t-ratio, P[|T|>t], Mean of X.
Regressors (with the significance stars shown in the output): Constant ***, LOGBUDGT, STARPOWR, SEQUEL, MPRATING *, ACTION ***, COMEDY, ANIMATED **, HORROR.
Internet buzz variables: LOGADCT (coefficient .29451 **), LOGCMSON, LOGFNDGO, CNTWAIT ***.

22 The fit goes down when the 4 buzz variables are reduced to a single linear combination of them (PCBUZZ)
Ordinary least squares regression. LHS = LOGBOX.
Output columns: Variable, Coefficient, Standard Error, t-ratio, P[|T|>t], Mean of X.
Regressors (with the significance stars shown in the output): Constant ***, LOGBUDGT (coefficient .38159 **), STARPOWR, SEQUEL, MPRATING, ACTION **, COMEDY, ANIMATED *, HORROR, PCBUZZ (coefficient .39704 ***).


24 Adjusted R Squared
Adjusted $R^2$ (adjusted for degrees of freedom):
$\bar{R}^2 = 1 - \dfrac{n-1}{n-K}(1 - R^2)$
Degrees of freedom adjustment: $\bar{R}^2$ includes a penalty for variables that don't add much fit. It can fall when a variable is added to the equation.


26 What is being adjusted?
Adjusted $R^2$: the penalty for using up degrees of freedom.
$\bar{R}^2 = 1 - \dfrac{e'e/(n-K)}{y'M^0 y/(n-1)} = 1 - \dfrac{n-1}{n-K}(1 - R^2)$
Will $\bar{R}^2$ rise when a variable is added to the regression?
$\bar{R}^2$ is higher with $z$ than without $z$ if and only if the t-ratio on $z$ in the regression when it is added is larger than one in absolute value. (See p. 46 in text.)
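The "t-ratio larger than one" rule can be tried on simulated data. In this sketch the helper `fit`, the two candidate variables, and the seed are all assumptions made for illustration; the t-ratio uses the conventional standard error based on $s^2 = e'e/(n-K)$.

```python
import numpy as np

def fit(W, y):
    """Return R^2, adjusted R^2, and t-ratios for least squares of y on W."""
    n, K = W.shape
    WtW_inv = np.linalg.inv(W.T @ W)
    coef = WtW_inv @ W.T @ y
    e = y - W @ coef
    sst = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - (e @ e) / sst
    adj_r2 = 1.0 - (n - 1) / (n - K) * (1.0 - r2)
    s2 = (e @ e) / (n - K)
    t = coef / np.sqrt(s2 * np.diag(WtW_inv))
    return r2, adj_r2, t

rng = np.random.default_rng(6)
n = 36
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
z_relevant = rng.normal(size=n)            # actually enters y
z_noise = rng.normal(size=n)               # unrelated to y
y = X @ np.array([1.0, 0.5, -0.5, 0.25]) + 0.8 * z_relevant + rng.normal(size=n)

r2_without, adj_without, _ = fit(X, y)
results = {}
for name, z in {"relevant": z_relevant, "noise": z_noise}.items():
    r2_with, adj_with, t = fit(np.column_stack([X, z]), y)
    results[name] = (r2_with >= r2_without, adj_with > adj_without, abs(t[-1]) > 1.0)
    print(name, "adj R2 rises:", adj_with > adj_without, " |t| > 1:", abs(t[-1]) > 1.0)
```

For each candidate, the two printed flags should agree, while $R^2$ itself never falls.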

27 Full Regression (Without PD)
Ordinary least squares regression. LHS = G. Number of observations = 36. Parameters = 9. Degrees of freedom = 27.
Model test: F[8, 27] (prob) = 503.3 (.0000).
The output highlights R-squared, adjusted R-squared, the log Amemiya prediction criterion, and the Akaike information criterion.
Output columns: Variable, Coefficient, Standard Error, t-ratio, P[|T|>t], Mean of X.
Regressors (with the significance stars shown in the output): Constant **, PG ***, Y (coefficient .02214 ***), PNC, PUC, PPT, PN *, PS ***, YEAR **.

28 PD added to the model. $R^2$ rises, adjusted $R^2$ falls
Ordinary least squares regression. LHS = G. Number of observations = 36. Parameters = 10. Degrees of freedom = 26.
Fit: R-squared and adjusted R-squared are each shown beside the value they had without PD ("Was").
Output columns: Variable, Coefficient, Standard Error, t-ratio, P[|T|>t], Mean of X.
Regressors (with the significance stars shown in the output): Constant **, PG ***, Y (coefficient .02231 ***), PNC, PUC, PPT, PD (NOTE LOW t ratio), PN, PS **, YEAR *.

29 Linear Least Squares Subject to Restrictions
Restrictions: Theory imposes certain restrictions on parameters. Some common applications:
Dropping variables from the equation = certain coefficients in $b$ forced to equal 0. (Probably the most common testing situation: "Is a certain variable significant?")
Adding up conditions: Sums of certain coefficients must equal fixed values. Examples: adding up conditions in demand systems; constant returns to scale in production functions.
Equality restrictions: Certain coefficients must equal other coefficients. Example: using real vs. nominal variables in equations.
General formulation for linear restrictions: Minimize the sum of squares, $e'e$, subject to the linear constraint $Rb = q$.

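30 Restricted Least Squares
In practice, restrictions can usually be imposed by solving them out.
1. Force a coefficient to equal zero. Drop the variable from the equation.
Problem: Minimize for $\beta_1, \beta_2, \beta_3$: $\sum_{i=1}^n (y_i - \beta_1 x_{i1} - \beta_2 x_{i2} - \beta_3 x_{i3})^2$ subject to $\beta_3 = 0$.
Solution: Minimize for $\beta_1, \beta_2$: $\sum_{i=1}^n (y_i - \beta_1 x_{i1} - \beta_2 x_{i2})^2$.
2. Adding up restriction. Impose $\beta_1 + \beta_2 + \beta_3 = 1$. Strategy: $\beta_3 = 1 - \beta_1 - \beta_2$.
Solution: Minimize for $\beta_1, \beta_2$: $\sum_{i=1}^n (y_i - \beta_1 x_{i1} - \beta_2 x_{i2} - (1 - \beta_1 - \beta_2) x_{i3})^2 = \sum_{i=1}^n [(y_i - x_{i3}) - \beta_1 (x_{i1} - x_{i3}) - \beta_2 (x_{i2} - x_{i3})]^2$.
3. Equality restriction. Impose $\beta_3 = \beta_2$.
Problem: Minimize for $\beta_1, \beta_2, \beta_3$: $\sum_{i=1}^n (y_i - \beta_1 x_{i1} - \beta_2 x_{i2} - \beta_3 x_{i3})^2$ subject to $\beta_3 = \beta_2$.
Solution: Minimize for $\beta_1, \beta_2$: $\sum_{i=1}^n [y_i - \beta_1 x_{i1} - \beta_2 (x_{i2} + x_{i3})]^2$.
In each case, least squares using transformations of the data.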
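The adding up case (item 2) can be sketched in a few lines: transform the data, run ordinary least squares on the transformed variables, and recover the third coefficient from the restriction. The simulated data and variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 50
X = rng.normal(size=(n, 3))
y = X @ np.array([0.2, 0.3, 0.5]) + rng.normal(scale=0.5, size=n)
x1, x2, x3 = X.T

# Solve out beta_3 = 1 - beta_1 - beta_2 and regress (y - x3) on (x1 - x3), (x2 - x3).
y_t = y - x3
W = np.column_stack([x1 - x3, x2 - x3])
g, *_ = np.linalg.lstsq(W, y_t, rcond=None)
beta_restricted = np.array([g[0], g[1], 1.0 - g[0] - g[1]])

b = np.linalg.solve(X.T @ X, X.T @ y)      # unrestricted least squares
ssr_restricted = np.sum((y - X @ beta_restricted) ** 2)
ssr_unrestricted = np.sum((y - X @ b) ** 2)

print("restricted coefficients:", beta_restricted, " sum =", beta_restricted.sum())
print("SSR restricted >= SSR unrestricted:", ssr_restricted >= ssr_unrestricted)
```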

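31 Restricted Least Squares Solution
General approach: programming problem.
Minimize for $\beta$: $L = (y - X\beta)'(y - X\beta)$ subject to $R\beta = q$.
Each row of $R$ is the $K$ coefficients in a restriction. There are $J$ restrictions: $J$ rows.
$\beta_3 = 0$: $R = [0, 0, 1, 0, \dots]$, $q = (0)$, $J = 1$.
$\beta_2 = \beta_3$: $R = [0, 1, -1, 0, \dots]$, $q = (0)$, $J = 1$.
$\beta_2 = 0, \beta_3 = 0$: $R = \begin{bmatrix} 0 & 1 & 0 & 0 & \dots \\ 0 & 0 & 1 & 0 & \dots \end{bmatrix}$, $q = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$, $J = 2$.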

32 Solution Strategy
Quadratic program: minimize a quadratic criterion subject to linear restrictions.
All restrictions are binding.
Solve using the Lagrangean formulation: minimize over $(\beta, \lambda)$
$L^* = (y - X\beta)'(y - X\beta) + 2\lambda'(R\beta - q)$
(The 2 is for convenience; see below.)

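33 Restricted LS Solution: Necessary Conditions
$\partial L^*/\partial \beta = -2X'(y - X\beta) + 2R'\lambda = 0$
$\partial L^*/\partial \lambda = 2(R\beta - q) = 0$
Divide everything by 2. Collect in a matrix form:
$\begin{bmatrix} X'X & R' \\ R & 0 \end{bmatrix} \begin{bmatrix} \beta \\ \lambda \end{bmatrix} = \begin{bmatrix} X'y \\ q \end{bmatrix}$, or $A\delta = w$. Solution: $\hat{\delta} = A^{-1}w$.
Does not rely on full rank of $X$. Relies on column rank of $A = K + J$.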
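The stacked system of necessary conditions can be solved directly as one linear system. This sketch assumes simulated data with $K = 4$ and two illustrative restrictions ($\beta_2 = \beta_3$ and $\beta_4 = 0$); the names `A`, `w`, and `delta` label the blocks of the matrix form.

```python
import numpy as np

rng = np.random.default_rng(9)
n, K = 60, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, 0.7, 0.1]) + rng.normal(size=n)

R = np.array([[0.0, 1.0, -1.0, 0.0],     # beta_2 = beta_3
              [0.0, 0.0, 0.0, 1.0]])     # beta_4 = 0
q = np.zeros(2)
J = R.shape[0]

# [X'X  R'] [beta  ]   [X'y]
# [R    0 ] [lambda] = [q  ]
A = np.block([[X.T @ X, R.T],
              [R, np.zeros((J, J))]])
w = np.concatenate([X.T @ y, q])
delta = np.linalg.solve(A, w)
beta_hat, lam_hat = delta[:K], delta[K:]

print("restricted coefficients:", beta_hat)
print("multipliers:", lam_hat)
```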

34 Restricted Least Squares
If $X$ has full rank, there is a partitioned solution for $\beta^*$ and $\lambda^*$:
$\beta^* = b - (X'X)^{-1}R'[R(X'X)^{-1}R']^{-1}(Rb - q)$
$\lambda^* = [R(X'X)^{-1}R']^{-1}(Rb - q)$
where $b$ = the simple least squares coefficients, $b = (X'X)^{-1}X'y$.
Note $\beta^* = b$ and $\lambda^* = 0$ if $Rb - q = 0$.
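The partitioned solution translates almost line for line into code. The sketch below uses assumed simulated data and a single illustrative restriction ($\beta_2 + \beta_3 = 1$); it confirms that the restricted estimate satisfies the restriction.

```python
import numpy as np

rng = np.random.default_rng(10)
n, K = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.6, 0.6]) + rng.normal(size=n)

R = np.array([[0.0, 1.0, 1.0]])          # beta_2 + beta_3 = 1
q = np.array([1.0])

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                    # simple least squares coefficients
m = R @ b - q                            # discrepancy vector Rb - q

lam_star = np.linalg.solve(R @ XtX_inv @ R.T, m)   # [R (X'X)^-1 R']^-1 (Rb - q)
beta_star = b - XtX_inv @ R.T @ lam_star

print("b        =", b)
print("beta*    =", beta_star)
print("R beta*  =", R @ beta_star)
```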

35 Aspects of Restricted LS
1. $b^* = b - Cm$, where $m$ = the discrepancy vector $Rb - q$ and $C = (X'X)^{-1}R'[R(X'X)^{-1}R']^{-1}$. Note what happens if $m = 0$. What does $m = 0$ mean?
2. $\lambda = [R(X'X)^{-1}R']^{-1}(Rb - q) = [R(X'X)^{-1}R']^{-1}m$. When does $\lambda = 0$? What does this mean?
3. Combining results: $b^* = b - (X'X)^{-1}R'\lambda$. How could $b^* = b$?

36 Restrictions and the Criterion Function
Assume the full rank $X$ case. (The usual case.)
$b = (X'X)^{-1}X'y$ uniquely minimizes $(y - X\beta)'(y - X\beta) = \varepsilon'\varepsilon$.
$(y - Xb)'(y - Xb) < (y - Xb^*)'(y - Xb^*)$ for any $b^* \neq b$.
Imposing restrictions cannot improve the criterion value.
It follows that $R^{*2} < R^2$ whenever $b^* \neq b$: restrictions that are not already satisfied by $b$ must degrade the fit.
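The loss of fit can be quantified. Using the partitioned solution, the increase in the sum of squares works out to $(Rb - q)'[R(X'X)^{-1}R']^{-1}(Rb - q)$, which is nonnegative; this expression follows from substituting $b^*$ into the criterion and is stated here as an addition to the slide. A sketch on assumed simulated data, with an illustrative restriction $\beta_2 = \beta_3$:

```python
import numpy as np

rng = np.random.default_rng(11)
n, K = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.8, 0.2]) + rng.normal(size=n)

R = np.array([[0.0, 1.0, -1.0]])         # beta_2 = beta_3
q = np.array([0.0])

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
m = R @ b - q
V_m = np.linalg.solve(R @ XtX_inv @ R.T, m)
b_star = b - XtX_inv @ R.T @ V_m

e = y - X @ b                            # unrestricted residuals
e_star = y - X @ b_star                  # restricted residuals
loss = m @ V_m                           # m'[R (X'X)^-1 R']^-1 m

sst = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - (e @ e) / sst
r2_star = 1.0 - (e_star @ e_star) / sst
print("increase in sum of squares:", e_star @ e_star - e @ e, " formula:", loss)
print("R2 =", r2, " restricted R2 =", r2_star)
```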
