Econ 2120: Section 2, Part I - Linear Predictor Loose Ends
Ashesh Rambachan
Fall 2018
Outline
Big Picture
Matrix Version of the Linear Predictor and Least Squares Fit
  Linear Predictor
  Least Squares
  Omitted Variable Bias Formula
Residual Regression
Useful Trick
Orthogonality Conditions
Goal of the first half of the class: get you to think in terms of projections.
Best linear predictor: projection onto the space of linear functions of $X$.
Conditional expectation: projection onto the space of all functions of $X$.
Key: there is a one-to-one map between the projection problem and its orthogonality conditions.
Orthogonality Conditions
We can always (trivially) write $Y = g(X) + U$ with $U = Y - g(X)$. To understand what $g(X)$ describes, you just need to know the orthogonality condition that $U$ satisfies. Two cases:
(1) $g(X) = \beta_0 + \beta_1 X$ and $E[U] = E[UX] = 0 \implies g(X)$ is the best linear predictor.
(2) $E[U\, h(X)] = 0$ for all $h(\cdot) \implies g(X) = E[Y \mid X]$.
Orthogonality Conditions
Why is this useful? If you can transform your minimization problem into a projection problem, e.g.
$$\min_{\beta_0, \beta_1} E[(Y - \beta_0 - \beta_1 X)^2] = \min_{\beta_0, \beta_1} \| Y - \beta_0 - \beta_1 X \|^2,$$
you have a whole new set of (very) useful tools. By the projection theorem, we can characterize the solution by a set of orthogonality conditions AND we have uniqueness.
Best Linear Predictor
Why are we emphasizing the best linear predictor $E^*[Y \mid X]$ so much?
In some sense, it can always be computed. It requires very few assumptions beyond the existence of second moments of the joint distribution. You can always go to Stata, type reg y x, and get something that has the interpretation of a best linear predictor. Regression output = estimated coefficients of the best linear predictor.
BUT: it is a completely separate question whether this is an interesting interpretation. To say more, you need to assume more.
Linear Predictor
$X = (X_1, \ldots, X_K)'$ is a $K \times 1$ vector and $\beta = (\beta_1, \ldots, \beta_K)'$ is the $K \times 1$ vector of coefficients of the best linear predictor. The coefficients are characterized by $K$ orthogonality conditions
$$E[(Y - X'\beta) X_j] = E[X_j (Y - X'\beta)] = 0 \quad \text{for } j = 1, \ldots, K.$$
Rewrite in vector form to get
$$E[\underbrace{X}_{K \times 1} \underbrace{(Y - X'\beta)}_{1 \times 1}] = 0,$$
a system of $K$ linear equations. Provided $E[XX']$ is non-singular,
$$E[XY] - E[XX']\,\beta = 0 \;\implies\; \beta = E[XX']^{-1} E[XY].$$
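A minimal numerical sketch of this formula (numpy assumed; the data-generating process and coefficient values below are made up for illustration): approximate $E[XX']$ and $E[XY]$ with a very large simulated sample and solve the linear system.

```python
import numpy as np

# Approximate the population moments E[XX'] and E[XY] with a large simulated sample,
# then apply beta = E[XX']^{-1} E[XY]. The DGP below is purely illustrative.
rng = np.random.default_rng(0)
n = 1_000_000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])  # K = 3, includes a constant
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(size=n)  # mean-zero noise, so the BLP equals the linear model

Exx = X.T @ X / n   # approximates E[XX']
Exy = X.T @ Y / n   # approximates E[XY]
beta = np.linalg.solve(Exx, Exy)
print(beta)         # close to beta_true
```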
Least Squares
Data are $(y_i, x_{i1}, \ldots, x_{iK})$ for $i = 1, \ldots, n$. Let $x_i = (x_{i1}, \ldots, x_{iK})$ be the $1 \times K$ vector of covariates for the $i$-th observation and let $x^j = (x_{1j}, \ldots, x_{nj})'$ be the $n \times 1$ vector of observations of the $j$-th covariate. Define
$$\underbrace{y}_{n \times 1} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad \underbrace{b}_{K \times 1} = \begin{pmatrix} b_1 \\ \vdots \\ b_K \end{pmatrix}, \quad \underbrace{x}_{n \times K} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} x^1 & \cdots & x^K \end{pmatrix}.$$
The $n \times K$ matrix $x$ is sometimes referred to as the design matrix.
Least Squares
The least-squares coefficients $b_j$ are defined by $K$ orthogonality conditions
$$\underbrace{(x^j)'}_{1 \times n} \underbrace{(y - xb)}_{n \times 1} = 0 \quad \text{for } j = 1, \ldots, K.$$
Stack these orthogonality conditions to get
$$\begin{pmatrix} (x^1)' \\ \vdots \\ (x^K)' \end{pmatrix} \underbrace{(y - xb)}_{n \times 1} = \underbrace{x'}_{K \times n} (y - xb) = 0,$$
a system of $K$ linear equations. Provided $\underbrace{x'x}_{K \times K}$ is non-singular,
$$\underbrace{b}_{K \times 1} = (x'x)^{-1} x'y.$$
Least Squares
Can show (just algebra) that
$$x'x = \sum_{i=1}^n x_i' x_i, \qquad x'y = \sum_{i=1}^n x_i' y_i.$$
So we can also write the least-squares coefficients as
$$b = \left( \sum_{i=1}^n x_i' x_i \right)^{-1} \left( \sum_{i=1}^n x_i' y_i \right) = \left( \frac{1}{n} \sum_{i=1}^n x_i' x_i \right)^{-1} \left( \frac{1}{n} \sum_{i=1}^n x_i' y_i \right).$$
This formula is extremely useful when we discuss asymptotics.
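A minimal sketch of the sample formula (made-up data, numpy assumed): the stacked formula $(x'x)^{-1} x'y$, the averaged-sums version, and a library least-squares routine all return the same coefficients.

```python
import numpy as np

# The matrix formula, the averaged-sums version, and numpy's least-squares routine agree.
rng = np.random.default_rng(1)
n, K = 200, 3
x = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])  # design matrix with a constant
y = x @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=n)

b_matrix = np.linalg.solve(x.T @ x, x.T @ y)            # (x'x)^{-1} x'y
b_sums = np.linalg.solve((x.T @ x) / n, (x.T @ y) / n)  # averaged sums, same answer
b_lstsq, *_ = np.linalg.lstsq(x, y, rcond=None)
print(np.allclose(b_matrix, b_sums), np.allclose(b_matrix, b_lstsq))  # True True
```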
OVB formula (again)
Short linear predictor of $Y$:
$$E^*[Y \mid X_1, \ldots, X_{K-1}] = \alpha_1 X_1 + \ldots + \alpha_{K-1} X_{K-1}.$$
Long linear predictor of $Y$:
$$E^*[Y \mid X_1, \ldots, X_K] = \beta_1 X_1 + \ldots + \beta_K X_K.$$
Auxiliary linear predictor of $X_K$:
$$E^*[X_K \mid X_1, \ldots, X_{K-1}] = \gamma_1 X_1 + \ldots + \gamma_{K-1} X_{K-1}.$$
OVB formula (again)
Let $\tilde{X}_K = X_K - \gamma_1 X_1 - \ldots - \gamma_{K-1} X_{K-1}$. Then
$$X_K = \gamma_1 X_1 + \ldots + \gamma_{K-1} X_{K-1} + \tilde{X}_K,$$
$$Y = \beta_1^* X_1 + \ldots + \beta_{K-1}^* X_{K-1} + \beta_K \tilde{X}_K + U,$$
where $U = Y - E^*[Y \mid X_1, \ldots, X_K]$ and $\beta_j^* = \beta_j + \gamma_j \beta_K$ for $j = 1, \ldots, K-1$.
OVB formula (again)
Note that
$$E[(Y - \beta_1^* X_1 - \ldots - \beta_{K-1}^* X_{K-1}) X_j] = E[(\beta_K \tilde{X}_K + U) X_j] = 0 \quad \text{for } j = 1, \ldots, K-1.$$
These are exactly the orthogonality conditions that define the short linear predictor. So
$$\alpha_j = \beta_j^* = \beta_j + \gamma_j \beta_K \quad \text{for } j = 1, \ldots, K-1.$$
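A small numerical check of the OVB formula (made-up data, numpy assumed): fit the long, short, and auxiliary regressions by least squares and verify $\alpha_j = \beta_j + \gamma_j \beta_K$, which holds exactly in the sample analog.

```python
import numpy as np

# The OVB formula alpha_j = beta_j + gamma_j * beta_K holds exactly for least-squares fits.
rng = np.random.default_rng(2)
n = 500
x1 = np.ones(n)
x2 = rng.normal(size=n)
x3 = 0.7 * x2 + rng.normal(size=n)                   # x3 correlated with x2 on purpose
y = 1.0 + 2.0 * x2 - 1.5 * x3 + rng.normal(size=n)

X_long = np.column_stack([x1, x2, x3])
X_short = np.column_stack([x1, x2])

beta, *_ = np.linalg.lstsq(X_long, y, rcond=None)    # long: coefficients on (1, x2, x3)
alpha, *_ = np.linalg.lstsq(X_short, y, rcond=None)  # short: coefficients on (1, x2)
gamma, *_ = np.linalg.lstsq(X_short, x3, rcond=None) # auxiliary: x3 on (1, x2)

print(np.allclose(alpha, beta[:2] + gamma * beta[2]))  # True
```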
Residual Regression
Very useful trick that is used all the time. Consider the best linear predictor with $K$ covariates,
$$E^*[Y \mid X_1, \ldots, X_K] = \beta_1 X_1 + \ldots + \beta_K X_K,$$
and focus on $\beta_K$. Is there a simple closed-form expression for $\beta_K$?
Yes! Use residual regression.
Residual Regression
Form the auxiliary linear predictor of $X_K$ given $X_1, \ldots, X_{K-1}$. Denote this as
$$E^*[X_K \mid X_1, \ldots, X_{K-1}] = \gamma_1 X_1 + \ldots + \gamma_{K-1} X_{K-1}.$$
The associated residual is $\tilde{X}_K = X_K - \gamma_1 X_1 - \ldots - \gamma_{K-1} X_{K-1}$.
Theorem (Residual Regression): $\beta_K$ can be written as
$$\beta_K = \frac{E[Y \tilde{X}_K]}{E[\tilde{X}_K^2]}.$$
Residual Regression: proof
By definition, $X_K = \gamma_1 X_1 + \ldots + \gamma_{K-1} X_{K-1} + \tilde{X}_K$. Substitute into $E^*[Y \mid X_1, \ldots, X_K]$ to get
$$E^*[Y \mid X_1, \ldots, X_K] = \beta_1^* X_1 + \ldots + \beta_{K-1}^* X_{K-1} + \beta_K \tilde{X}_K,$$
where $\beta_j^* = \beta_j + \gamma_j \beta_K$ for $j = 1, \ldots, K-1$.
Residual Regression: proof
$\tilde{X}_K$ is a linear combination of $X_1, \ldots, X_K$, so it is orthogonal to the residual $Y - E^*[Y \mid X_1, \ldots, X_K]$:
$$E[(Y - \beta_1^* X_1 - \ldots - \beta_{K-1}^* X_{K-1} - \beta_K \tilde{X}_K) \tilde{X}_K] = 0.$$
$\tilde{X}_K$ is also orthogonal to $X_1, \ldots, X_{K-1}$, so the above simplifies to
$$E[(Y - \beta_K \tilde{X}_K) \tilde{X}_K] = 0 \;\implies\; \beta_K = \frac{E[Y \tilde{X}_K]}{E[\tilde{X}_K^2]}.$$
Residual Regression: intuition
The coefficient $\beta_K$ on $X_K$ is the coefficient of the best linear predictor of $Y$ given the residual $\tilde{X}_K$. If the conditional expectation is linear, $\beta_K$ is the "partial effect" of $X_K$ on $Y$ holding all else constant.
Exercise 1
Consider the long and short predictors in the population:
$$E^*[Y \mid 1, X_1, X_2, X_3] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3,$$
$$E^*[Y \mid 1, X_1, X_2] = \alpha_0 + \alpha_1 X_1 + \alpha_2 X_2.$$
(1) Provide a formula relating $\alpha_2$ and $\beta_2$.
(2) Suppose that $X_3$ is uncorrelated with $X_1$ and $X_2$. Does $\alpha_2 = \beta_2$?
(3) Suppose that $\mathrm{Cov}(X_2, X_3) = 0$, $\mathrm{Cov}(X_1, X_3) \neq 0$, and $\mathrm{Cov}(X_1, X_2) \neq 0$. Does $\alpha_2 = \beta_2$?
Exercise 2
The partial covariance between the random variables $Y$ and $X_K$ given $X_1, \ldots, X_{K-1}$ is
$$\mathrm{Cov}^*(Y, X_K \mid X_1, \ldots, X_{K-1}) = E[\tilde{Y} \tilde{X}_K],$$
where
$$\tilde{Y} = Y - E^*[Y \mid X_1, \ldots, X_{K-1}], \qquad \tilde{X}_K = X_K - E^*[X_K \mid X_1, \ldots, X_{K-1}].$$
The partial variance of $Y$ is
$$V^*(Y \mid X_1, \ldots, X_{K-1}) = E[\tilde{Y}^2].$$
The partial correlation between $Y$ and $X_K$ is
$$\mathrm{Cor}^*(Y, X_K \mid X_1, \ldots, X_{K-1}) = \frac{E[\tilde{Y} \tilde{X}_K]}{\left( E[\tilde{Y}^2]\, E[\tilde{X}_K^2] \right)^{1/2}}.$$
Exercise 2 (continued)
(1) Consider the linear predictor
$$E^*[Y \mid 1, X_1, \ldots, X_K] = \beta_0 + \beta_1 X_1 + \ldots + \beta_K X_K.$$
Use our residual regression results to show that
$$\beta_K = \frac{\mathrm{Cov}^*(Y, X_K \mid 1, X_1, \ldots, X_{K-1})}{V^*(X_K \mid 1, X_1, \ldots, X_{K-1})}.$$
Residual Regression: sample analog
Consider the least-squares fit using $K$ covariates,
$$\hat{y}_i = b_1 x_{i1} + \ldots + b_K x_{iK}.$$
Following a similar argument, one can show that $b_K$ is the coefficient of the least-squares fit of $y$ on $\tilde{x}^K$, the vector of residuals from the least-squares fit of $x^K$ on $x^1, \ldots, x^{K-1}$, with $i$-th element $\tilde{x}_{iK}$. That is, $b_K$ can be written as
$$b_K = \frac{\frac{1}{n} \sum_{i=1}^n y_i \tilde{x}_{iK}}{\frac{1}{n} \sum_{i=1}^n \tilde{x}_{iK}^2}.$$
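A small numerical check of the sample analog (made-up data, numpy assumed): residualize the last covariate on the others, regress $y$ on the residuals, and compare with the coefficient from the full least-squares fit.

```python
import numpy as np

# The coefficient on the last covariate from the full least-squares fit equals the
# coefficient from regressing y on the residualized covariate.
rng = np.random.default_rng(3)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
xK = 0.5 * X[:, 1] + rng.normal(size=n)           # last covariate, correlated with the others
y = X @ np.array([1.0, -1.0, 0.3]) + 2.0 * xK + rng.normal(size=n)

b_full, *_ = np.linalg.lstsq(np.column_stack([X, xK]), y, rcond=None)

gamma, *_ = np.linalg.lstsq(X, xK, rcond=None)    # auxiliary fit of x^K on the other covariates
xK_tilde = xK - X @ gamma                         # residualized covariate
b_K = (y @ xK_tilde) / (xK_tilde @ xK_tilde)      # residual-regression formula

print(np.allclose(b_full[-1], b_K))               # True
```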
Frisch-Waugh-Lovell Theorem
Residual regression is typically known as the Frisch-Waugh-Lovell Theorem. We are interested in the regression
$$Y = X_1 \beta_1 + X_2 \beta_2 + u,$$
where $Y$ is an $n \times 1$ vector, $X_1$ is an $n \times K_1$ matrix, $X_2$ is an $n \times K_2$ matrix, and $u$ is an $n \times 1$ vector of residuals. How can we write the least-squares estimate of $\beta_2$?
Frisch-Waugh-Lovell Theorem
It is the same as the estimate in the modified regression
$$M_{X_1} Y = M_{X_1} X_2 \beta_2 + M_{X_1} u,$$
where
$$M_{X_1} = I - X_1 (X_1' X_1)^{-1} X_1'$$
is the orthogonal complement of the projection matrix $X_1 (X_1' X_1)^{-1} X_1'$: it projects onto the orthogonal complement of the space spanned by the columns of $X_1$.
So $\beta_2$ can be written as the coefficient in the regression of the residuals of $Y$ on the residuals of $X_2$.
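A small numerical check of the FWL theorem (made-up data, numpy assumed): build $M_{X_1}$, regress $M_{X_1} Y$ on $M_{X_1} X_2$, and compare with the coefficients on $X_2$ from the full regression.

```python
import numpy as np

# Regressing M_{X1} Y on M_{X1} X2 reproduces the coefficients on X2 from the full regression.
rng = np.random.default_rng(4)
n = 400
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])          # n x K1
X2 = np.column_stack([rng.normal(size=n), rng.normal(size=n)])  # n x K2
Y = X1 @ np.array([1.0, 0.5]) + X2 @ np.array([-2.0, 3.0]) + rng.normal(size=n)

b_full, *_ = np.linalg.lstsq(np.column_stack([X1, X2]), Y, rcond=None)

M_X1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)        # annihilates the column space of X1
b2, *_ = np.linalg.lstsq(M_X1 @ X2, M_X1 @ Y, rcond=None)

print(np.allclose(b_full[2:], b2))                              # True
```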
Useful Trick - Orthogonal Decomposition
Notice we used the same trick in the OVB and residual regression results: decompose a variable into two pieces, one that lives in a simple space and another that is orthogonal to that space.
Useful Trick - Orthogonal Decomposition
Example: take a random variable $Y$ and decompose it into
$$Y = E[Y \mid X] + \underbrace{(Y - E[Y \mid X])}_{U}.$$
$E[Y \mid X]$ lives in the space of functions of $X$ only and $U$ is orthogonal to any function of $X$.
Example: take a random variable $Y$ and decompose it into
$$Y = E^*[Y \mid X_1, \ldots, X_K] + \underbrace{(Y - E^*[Y \mid X_1, \ldots, X_K])}_{V}.$$
$E^*[Y \mid X_1, \ldots, X_K]$ lives in the space of linear functions of $X_1, \ldots, X_K$ only and $V$ is orthogonal to any linear function of $X_1, \ldots, X_K$.
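A small sample-level illustration of the decomposition (made-up data, numpy assumed): the least-squares fitted value lives in the column space of the design matrix and the residual is orthogonal to every column, hence to every linear function of the covariates.

```python
import numpy as np

# Sample version of the orthogonal decomposition: y = fitted + residual,
# with x'(y - xb) = 0, so the residual is orthogonal to every linear function of x.
rng = np.random.default_rng(5)
n = 250
x = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = x @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

b, *_ = np.linalg.lstsq(x, y, rcond=None)
fitted = x @ b
resid = y - fitted

print(np.allclose(x.T @ resid, 0.0))   # orthogonality conditions hold in sample
```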