Econ 2120: Section 2


Part I - Linear Predictor Loose Ends
Ashesh Rambachan, Fall 2018

Outline: Big Picture; Matrix Version of the Linear Predictor and Least Squares Fit (Linear Predictor, Least Squares); Omitted Variable Bias Formula; Residual Regression; Useful Trick.


Orthogonality Conditions Goal of the first half of the class: get you to think in terms of projections. Best linear predictor: projection onto the space of linear functions of X. Conditional expectation: projection onto the space of all functions of X. Key: there is a one-to-one map between the projection problem and the orthogonality condition.

Orthogonality Conditions We can always (trivially) write Y = g(X) + U, where U = Y − g(X). To understand what g(X) describes, you just need to know the orthogonality condition that U satisfies. Two cases: (1) g(X) = β_0 + β_1 X and E[U] = E[UX] = 0 ⟹ g(X) is the best linear predictor. (2) E[U h(X)] = 0 for all h(·) ⟹ g(X) = E[Y | X].
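A minimal numpy sketch (my own illustration, not from the section notes) of the distinction: under an assumed data-generating process where E[Y | X] = X^2, the best-linear-predictor residual U satisfies E[U] = E[UX] = 0 but is not orthogonal to every function of X, so g(X) is the best linear predictor but not the conditional expectation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Assumed DGP for illustration: the conditional expectation E[Y | X] = X^2 is nonlinear.
x = rng.normal(size=n)
y = x**2 + rng.normal(size=n)

# Best linear predictor of Y given (1, X): solve the orthogonality conditions
# E[U] = E[U X] = 0 via their sample analogues (the normal equations).
X = np.column_stack([np.ones(n), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)
u = y - X @ beta

print(u.mean())           # ~0: E[U] = 0
print((u * x).mean())     # ~0: E[U X] = 0
print((u * x**2).mean())  # ~2, far from 0: U is not orthogonal to h(X) = X^2
```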

Orthogonality Conditions Why is this useful? If you can transform your minimization problem into a projection problem, e.g. min_{β_0, β_1} E[(Y − β_0 − β_1 X)^2] = min_{β_0, β_1} ||Y − β_0 − β_1 X||^2, you have a whole new set of (very) useful tools. By the projection theorem, we can characterize the solution by a set of orthogonality conditions AND we have uniqueness.

Best Linear Predictor Why are we emphasizing the best linear predictor E*[Y | X] so much? In some sense, it can always be computed: it requires very few assumptions beyond the existence of second moments of the joint distribution. You can always go to Stata, type reg y x, and get something that has the interpretation of a best linear predictor. Regression output = estimated coefficients of the best linear predictor. BUT: it is a completely separate question whether this is an interesting interpretation. To say more, you need to assume more.


Linear Predictor X = (X_1, ..., X_K)' is a K × 1 vector and β = (β_1, ..., β_K)' is the K × 1 vector of coefficients of the best linear predictor. The coefficients are characterized by K orthogonality conditions: E[(Y − X'β) X_j] = E[X_j (Y − X'β)] = 0 for j = 1, ..., K. Rewriting in vector form gives E[X (Y − X'β)] = 0, a K × 1 system of linear equations. Provided E[XX'] is non-singular, E[XY] − E[XX'] β = 0, so β = (E[XX'])^{-1} E[XY].
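As a small numerical illustration (hypothetical population moments, my own example), the coefficient vector solves the K × 1 linear system E[XX'] β = E[XY]:

```python
import numpy as np

# Hypothetical population second moments for K = 2 regressors
# (numbers made up purely for illustration).
E_XX = np.array([[1.0, 0.3],
                 [0.3, 2.0]])   # E[X X']
E_XY = np.array([0.5, 1.2])     # E[X Y]

# beta = (E[XX'])^{-1} E[XY]; solve the system rather than invert explicitly.
beta = np.linalg.solve(E_XX, E_XY)
print(beta)
```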

Least Squares Data are (y_i, x_{i1}, ..., x_{iK}) for i = 1, ..., n. Let x_i = (x_{i1}, ..., x_{iK}) be the 1 × K vector of covariates for the i-th observation and x^j = (x_{1j}, ..., x_{nj})' be the n × 1 vector of observations on the j-th covariate. Define the n × 1 vector y = (y_1, ..., y_n)', the K × 1 vector b = (b_1, ..., b_K)', and the n × K matrix x whose i-th row is x_i and whose j-th column is x^j, so x = (x^1 ... x^K). The n × K matrix x is sometimes referred to as the design matrix.

Least Squares The least-squares coefficients b_j are defined by K orthogonality conditions: (x^j)'(y − xb) = 0 for j = 1, ..., K, where (x^j)' is 1 × n and y − xb is n × 1. Stacking these K conditions gives x'(y − xb) = 0, a K × 1 system of linear equations. Provided the K × K matrix x'x is non-singular, b = (x'x)^{-1} x'y.

Least Squares Can show (just algebra) that x'x = Σ_{i=1}^n x_i' x_i and x'y = Σ_{i=1}^n x_i' y_i. Can also write the least-squares coefficients as b = (Σ_{i=1}^n x_i' x_i)^{-1} (Σ_{i=1}^n x_i' y_i) = ((1/n) Σ_{i=1}^n x_i' x_i)^{-1} ((1/n) Σ_{i=1}^n x_i' y_i). This formula is extremely useful when we discuss asymptotics.
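A quick numpy check on simulated data (my own example): the normal-equation form and the sample-average form give identical coefficients, and both match a library least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 500, 3

# Simulated design matrix and outcome (illustrative DGP).
x = rng.normal(size=(n, K))
y = x @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

# b = (x'x)^{-1} x'y, computed by solving the normal equations.
b = np.linalg.solve(x.T @ x, x.T @ y)

# Equivalent sample-average form, the one used for asymptotics.
b_avg = np.linalg.solve((x.T @ x) / n, (x.T @ y) / n)

print(np.allclose(b, b_avg))                                  # True
print(np.allclose(b, np.linalg.lstsq(x, y, rcond=None)[0]))   # True
```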

OVB formula (again) Short linear predictor of Y: E*[Y | X_1, ..., X_{K-1}] = α_1 X_1 + ... + α_{K-1} X_{K-1}. Long linear predictor of Y: E*[Y | X_1, ..., X_K] = β_1 X_1 + ... + β_K X_K. Auxiliary linear predictor of X_K: E*[X_K | X_1, ..., X_{K-1}] = γ_1 X_1 + ... + γ_{K-1} X_{K-1}.

OVB formula (again) Let X̃_K = X_K − γ_1 X_1 − ... − γ_{K-1} X_{K-1}. Then X_K = γ_1 X_1 + ... + γ_{K-1} X_{K-1} + X̃_K, and substituting into the long predictor gives Y = β̃_1 X_1 + ... + β̃_{K-1} X_{K-1} + β_K X̃_K + U, where U = Y − E*[Y | X_1, ..., X_K] and β̃_j = β_j + γ_j β_K for j = 1, ..., K − 1.

OVB formula (again) Note that ⟨Y − β̃_1 X_1 − ... − β̃_{K-1} X_{K-1}, X_j⟩ = ⟨β_K X̃_K + U, X_j⟩ = 0 for j = 1, ..., K − 1, since X̃_K is orthogonal to X_1, ..., X_{K-1} by construction and U is orthogonal to X_1, ..., X_K. These are exactly the orthogonality conditions that define the short linear predictor. So α_j = β̃_j = β_j + γ_j β_K for j = 1, ..., K − 1.
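A simulation check of the identity α_j = β_j + γ_j β_K (my own example, using sample least-squares fits as the analogue of the population predictors; the identity holds exactly in sample as well):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Simulated covariates, with X3 correlated with X1 and X2 (illustrative DGP).
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
x3 = 0.4 * x1 - 0.3 * x2 + rng.normal(size=n)
y = 1.0 * x1 + 2.0 * x2 - 1.5 * x3 + rng.normal(size=n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
beta = ols(np.column_stack([ones, x1, x2, x3]), y)   # long predictor
alpha = ols(np.column_stack([ones, x1, x2]), y)      # short predictor
gamma = ols(np.column_stack([ones, x1, x2]), x3)     # auxiliary predictor of x3

# OVB formula: alpha_j = beta_j + gamma_j * beta_3 for the included regressors.
print(np.allclose(alpha, beta[:3] + gamma * beta[3]))  # True
```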


Residual Regression Very useful trick that is used all the time. Consider the best linear predictor with K covariates, E*[Y | X_1, ..., X_K] = β_1 X_1 + ... + β_K X_K, and focus on β_K. Is there a simple closed-form expression for β_K? Yes! Use residual regression.

Residual Regression Form the auxiliary linear predictor of X_K given X_1, ..., X_{K-1}, denoted E*[X_K | X_1, ..., X_{K-1}] = γ_1 X_1 + ... + γ_{K-1} X_{K-1}, with associated residual X̃_K = X_K − γ_1 X_1 − ... − γ_{K-1} X_{K-1}. Theorem (Residual Regression): β_K can be written as β_K = E[Y X̃_K] / E[X̃_K^2].

Residual Regression: proof By definition, X_K = γ_1 X_1 + ... + γ_{K-1} X_{K-1} + X̃_K. Substitute into E*[Y | X_1, ..., X_K] to get E*[Y | X_1, ..., X_K] = β̃_1 X_1 + ... + β̃_{K-1} X_{K-1} + β_K X̃_K, where β̃_j = β_j + γ_j β_K for j = 1, ..., K − 1.

Residual Regression: proof X̃_K is a linear combination of X_1, ..., X_K, so it is orthogonal to the residual Y − E*[Y | X_1, ..., X_K]: ⟨Y − β̃_1 X_1 − ... − β̃_{K-1} X_{K-1} − β_K X̃_K, X̃_K⟩ = 0. Since X̃_K is also orthogonal to X_1, ..., X_{K-1}, this simplifies to ⟨Y − β_K X̃_K, X̃_K⟩ = 0, and so β_K = E[Y X̃_K] / E[X̃_K^2].

Residual Regression: intuition The coefficient β_K on X_K is the coefficient of the best linear predictor of Y given the residual X̃_K of X_K. If the conditional expectation is linear, β_K is the "partial effect" of X_K on Y, holding all else constant.

Exercise 1 Consider the long and short predictors in the population: E*[Y | 1, X_1, X_2, X_3] = β_0 + β_1 X_1 + β_2 X_2 + β_3 X_3 and E*[Y | 1, X_1, X_2] = α_0 + α_1 X_1 + α_2 X_2. (1) Provide a formula relating α_2 and β_2. (2) Suppose that X_3 is uncorrelated with X_1 and X_2. Does α_2 = β_2? (3) Suppose that Cov(X_2, X_3) = 0, Cov(X_1, X_3) ≠ 0, and Cov(X_1, X_2) ≠ 0. Does α_2 = β_2?

Exercise 2 The partial covariance between the random variables Y and X_K given X_1, ..., X_{K-1} is Cov(Y, X_K | X_1, ..., X_{K-1}) = E[Ỹ X̃_K], where Ỹ = Y − E*[Y | X_1, ..., X_{K-1}] and X̃_K = X_K − E*[X_K | X_1, ..., X_{K-1}]. The partial variance of Y is V(Y | X_1, ..., X_{K-1}) = E[Ỹ^2]. The partial correlation between Y and X_K is Cor(Y, X_K | X_1, ..., X_{K-1}) = E[Ỹ X̃_K] / (E[Ỹ^2] E[X̃_K^2])^{1/2}.

Exercise 2 (continued) (1) Consider the linear predictor E*[Y | 1, X_1, ..., X_K] = β_0 + β_1 X_1 + ... + β_K X_K. Use our residual regression results to show that β_K = Cov(Y, X_K | 1, X_1, ..., X_{K-1}) / V(X_K | 1, X_1, ..., X_{K-1}).

Residual Regression: sample analog Consider the least-squares fit using K covariates, ŷ_i = b_1 x_{i1} + ... + b_K x_{iK}. Following a similar argument, can show that b_K is the coefficient of the least-squares fit of y on x̃^K, the n × 1 vector of residuals from the least-squares fit of x^K on x^1, ..., x^{K-1}, with elements x̃_{iK} = x_{iK} − (fitted value of x_{iK} given x_{i1}, ..., x_{i,K-1}). That is, b_K can be written as b_K = ((1/n) Σ_{i=1}^n y_i x̃_{iK}) / ((1/n) Σ_{i=1}^n x̃_{iK}^2).
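A numpy check of the sample residual-regression formula (simulated data, my own example):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000

# Simulated correlated covariates and outcome (illustrative DGP).
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
x3 = -0.4 * x1 + 0.2 * x2 + rng.normal(size=n)
y = 2.0 * x1 - 1.0 * x2 + 0.7 * x3 + rng.normal(size=n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
b = ols(np.column_stack([ones, x1, x2, x3]), y)      # full fit; b[3] is b_K

# Residualize x3 on the other covariates, then regress y on that residual.
g = ols(np.column_stack([ones, x1, x2]), x3)
x3_tilde = x3 - np.column_stack([ones, x1, x2]) @ g
b_K = (y @ x3_tilde) / (x3_tilde @ x3_tilde)

print(np.allclose(b[3], b_K))  # True: residual regression recovers b_K exactly
```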

Frisch-Waugh-Lovell Theorem Residual regression is typically known as the Frisch-Waugh-Lovell Theorem. We are interested in the regression Y = X_1 β_1 + X_2 β_2 + u, where Y is an n × 1 vector, X_1 is an n × K_1 matrix, X_2 is an n × K_2 matrix, and u is an n × 1 vector of residuals. How can we write the least-squares coefficients in β_2?

Frisch-Waugh-Lovell Theorem β_2 is the same as the estimate in the modified regression M_{X_1} Y = M_{X_1} X_2 β_2 + M_{X_1} u, where M_{X_1} = I − X_1 (X_1' X_1)^{-1} X_1' is the complement of the projection matrix X_1 (X_1' X_1)^{-1} X_1'; it projects onto the orthogonal complement of the space spanned by the columns of X_1. So β_2 can be written as the coefficient vector from the regression of the residuals of Y on the residuals of X_2.
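A matrix-form check of the theorem on simulated data (my own example; forming M_{X_1} explicitly is fine for illustration, though in practice one would residualize with a least-squares solve rather than build an n × n matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
n, K1, K2 = 300, 2, 2

# Simulated blocks X1 (n x K1) and X2 (n x K2), correlated with each other.
X1 = rng.normal(size=(n, K1))
X2 = X1 @ rng.normal(size=(K1, K2)) + rng.normal(size=(n, K2))
Y = X1 @ np.array([1.0, -1.0]) + X2 @ np.array([0.5, 2.0]) + rng.normal(size=n)

# Full regression: coefficients on X2.
X = np.hstack([X1, X2])
beta2_full = np.linalg.lstsq(X, Y, rcond=None)[0][K1:]

# Annihilator M_{X1} = I - X1 (X1'X1)^{-1} X1'.
M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)

# Regress residualized Y on residualized X2.
beta2_fwl = np.linalg.lstsq(M1 @ X2, M1 @ Y, rcond=None)[0]

print(np.allclose(beta2_full, beta2_fwl))  # True
```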


Useful Trick - Orthogonal Decomposition Notice we used the same trick in both the OVB and the residual regression results: decompose a variable into two pieces, one piece that lives in a simple space and another piece that is orthogonal to that simple space.

Useful Trick - Orthogonal Decomposition Example: take a random variable Y and decompose it into Y = E[Y | X] + (Y − E[Y | X]) = E[Y | X] + U. E[Y | X] lives in the space of functions of X, and U is orthogonal to any function of X. Example: take a random variable Y and decompose it into Y = E*[Y | X_1, ..., X_K] + (Y − E*[Y | X_1, ..., X_K]) = E*[Y | X_1, ..., X_K] + V. E*[Y | X_1, ..., X_K] lives in the space of linear functions of X_1, ..., X_K, and V is orthogonal to any linear function of X_1, ..., X_K.