Chapter 4 Instrumental variables I

4.1 Selection on unobservables

Our point of departure, as in Chapter 2, will once more be the outcome equation:

\[ Y = D\beta + X\alpha + U, \tag{4.1} \]

where treatment intensity will once again be given by equation 4.2:

\[ D = X\pi + Z\gamma + U_D. \tag{4.2} \]

4.1.1 Selection on unobservables

Contrary to the assumption in Chapter 2, we will now assume that it is no longer the case that $U_D \perp U$, and will hence assume that:

\[ \operatorname{cov}(U_D, U) \neq 0. \tag{4.3} \]

Equation 4.3 implies that $D$ and $U$ are correlated, and we are thus in the presence of Source of Bias No. 1, as in Chapter 3. Why does the correlation between $U$ and $U_D$ cause problems? To see why, simply consider the expression for $\mathrm{E}[\hat\beta_{OLS}]$ that we derived in Chapter 2. You will certainly recall that equation 2.22 stated that, under the assumption of selection on observables (i.e. that $D = X\pi + U_D$):

\[ \mathrm{E}[\hat\beta_{OLS}] = \beta + \frac{\operatorname{cov}(U_D, U)}{\operatorname{var}(U_D)}. \]

It should be transparent that as soon as $\operatorname{cov}(U_D, U) \neq 0$, and as long as $\operatorname{var}(U_D)$ is finite, we will now have $\mathrm{E}[\hat\beta_{OLS}] \neq \beta$. In section 2.1.3 in Chapter 2, there was a first hint on how to proceed. Recall that we wrote:

\[ Y = D\beta + X\alpha + Z\phi + U, \qquad D = X\pi + Z\gamma + U_D, \]

and then assumed that we "forgot" to include $Z$ in our outcome equation, despite the

fact that it entered the treatment intensity equation. This yielded:

\[ \mathrm{E}[\hat\beta_{OLS}] = \beta + \frac{\gamma' \operatorname{var}(Z)\,\phi + \operatorname{cov}(U_D, U)}{\gamma'\left[\operatorname{var}(Z) + \mathrm{E}Z' M_X \mathrm{E}Z\right]\gamma + \operatorname{var}(U_D)}. \]

But what if we should "forget" $Z$ in our outcome equation? Can we make any headway if we were able to safely assume that $\phi = 0$, and that $\gamma \neq 0$? OLS is certainly not going to be the answer because, under these assumptions:

\[ \mathrm{E}[\hat\beta_{OLS}] = \beta + \frac{\operatorname{cov}(U_D, U)}{\gamma'\left[\operatorname{var}(Z) + \mathrm{E}Z' M_X \mathrm{E}Z\right]\gamma + \operatorname{var}(U_D)}. \]

4.2 Instrumental variables are consistent, but biased

Assume that $\phi = 0$, and that $\gamma \neq 0$. [1] Our pair of equations is then given by:

\[ Y = D\beta + X\alpha + U, \qquad D = X\pi + Z\gamma + U_D, \]

which we shall write in partialled out form as:

\[ M_X Y = M_X D\beta + M_X U, \tag{4.4} \]
\[ M_X D = M_X Z\gamma + M_X U_D. \tag{4.5} \]

[1] This section is based on Hahn and Hausman (2002b).

We will also write the structural equation in another way as:

\[ Y = (X\pi + Z\gamma + U_D)\beta + X\alpha + U = Z\gamma\beta + X(\pi\beta + \alpha) + U + U_D\beta, \]

or, in partialled out form, and posing $V = U + U_D\beta$:

\[ M_X Y = M_X Z\gamma\beta + M_X V. \tag{4.6} \]

4.2.1 Notation and basic setup

As most readers will know, the intuition behind two-stage least squares is to run OLS on our structural equation (as we saw, this would yield $\hat\beta_{OLS} = \frac{(M_X D)'(M_X Y)}{(M_X D)'(M_X D)}$, which we know to be biased and inconsistent), but to replace $D$ with its predicted value $\hat D$ from the first-stage reduced form. Note that the predicted value of $M_X D$ from the estimated partialled out reduced form can be written as:

\[ \widehat{M_X D} = M_X Z\hat\gamma = M_X Z \underbrace{\left[(M_X Z)'(M_X Z)\right]^{-1}(M_X Z)' M_X D}_{\hat\gamma} \tag{4.7} \]
\[ \phantom{\widehat{M_X D}} = \underbrace{M_X Z\left[(M_X Z)'(M_X Z)\right]^{-1}(M_X Z)'}_{P_{M_X Z}} M_X D = P_{M_X Z} M_X D, \tag{4.8} \]

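where:

\[ P_{M_X Z} = M_X Z\left[(M_X Z)'(M_X Z)\right]^{-1}(M_X Z)' \]

is the usual projection matrix of the instruments. We can then write the standard expression for the 2SLS estimate of $\beta$ as:

\[ \hat\beta_{2SLS} = \frac{(M_X Z\hat\gamma)' M_X Y}{(M_X Z\hat\gamma)'(M_X Z\hat\gamma)} = \frac{D' M_X P_{M_X Z} M_X Y}{D' M_X P_{M_X Z} M_X D} = \frac{D' P_{M_X Z} Y}{D' P_{M_X Z} D}. \tag{4.9} \]

The basic problem here is that we don't have $Z\gamma$, but an estimate, given by $Z\hat\gamma$. More specifically:

\[ \hat\gamma = \left[(M_X Z)'(M_X Z)\right]^{-1}(M_X Z)'(M_X D). \]

Beginning with 4.9 and subtracting the true value of $\beta$ yields:

\[ \hat\beta_{2SLS} - \beta = \frac{(M_X Z\hat\gamma)'\left[M_X Y - M_X Z\hat\gamma\beta\right]}{(M_X Z\hat\gamma)'(M_X Z\hat\gamma)}. \]

Replacing $M_X Y$ using 4.6:

\[ \begin{aligned} \hat\beta_{2SLS} - \beta &= \frac{(M_X Z\hat\gamma)'\big[\overbrace{M_X Z\gamma\beta + M_X V}^{M_X Y \text{ from 4.6}} - M_X Z\hat\gamma\beta\big]}{(M_X Z\hat\gamma)'(M_X Z\hat\gamma)} \\ &= \frac{(M_X Z\hat\gamma)'\left[M_X V - M_X Z(\hat\gamma - \gamma)\beta\right]}{(M_X Z\hat\gamma)'(M_X Z\hat\gamma)}. \end{aligned} \tag{4.10} \]

This is the key expression that we will work on in what follows.

4.2.2 Writing things in terms of the $R^2$ of the reduced form

Recall that the partial $R^2$ of the partialled out first-stage reduced form is defined as the proportion of the variance of $M_X D$ that is explained by $M_X Z$:

\[ R_f^2 = \frac{(M_X Z\hat\gamma)'(M_X Z\hat\gamma)}{(M_X D)'(M_X D)}, \]

so that:

\[ (M_X Z\hat\gamma)'(M_X Z\hat\gamma) = R_f^2\,(M_X D)'(M_X D), \]

and therefore

\[ \hat\beta_{2SLS} - \beta = \frac{(M_X Z\hat\gamma)'\left[M_X V - M_X Z(\hat\gamma - \gamma)\beta\right]}{(M_X Z\hat\gamma)'(M_X Z\hat\gamma)} = \frac{(M_X Z\hat\gamma)'\left[M_X V - M_X Z(\hat\gamma - \gamma)\beta\right]}{R_f^2\,(M_X D)'(M_X D)}. \tag{4.11} \]

The interpretation of this expression is clear: ceteris paribus, the bias of 2SLS increases as $R_f^2$ becomes small. This is a first indication of why it is so important that your instruments be "strong".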
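The equivalence of the three forms in 4.9, and the definition of the partial $R_f^2$, can be checked numerically. The following Python sketch is only an illustration, not part of the original derivation: it assumes NumPy, and the simulated design (sample size, coefficient values, the helper names `proj` and `simulate`) is an arbitrary, hypothetical choice.

```python
import numpy as np

def proj(A):
    """Orthogonal projection matrix onto the column space of A."""
    return A @ np.linalg.solve(A.T @ A, A.T)

def simulate(n=500, seed=0):
    """Draw (Y, D, X, Z) from equations 4.1-4.2 with cov(U_D, U) != 0.

    All numerical values below are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n), rng.normal(size=n)])   # K_X = 2
    Z = rng.normal(size=(n, 3))                              # K_Z = 3
    U_D = rng.normal(size=n)
    U = 0.8 * U_D + rng.normal(size=n)                       # endogeneity
    D = X @ np.array([1.0, 0.5]) + Z @ np.array([1.0, 0.5, 0.25]) + U_D
    Y = 2.0 * D + X @ np.array([1.0, -1.0]) + U              # true beta = 2
    return Y, D, X, Z

Y, D, X, Z = simulate()
n = len(Y)
M_X = np.eye(n) - proj(X)          # annihilator of X
P = proj(M_X @ Z)                  # P_{M_X Z}

# First stage on partialled-out data: gamma_hat and fitted values, (4.7)-(4.8)
gamma_hat = np.linalg.solve(Z.T @ M_X @ Z, Z.T @ M_X @ D)
fitted = M_X @ Z @ gamma_hat

# Three equivalent forms of the 2SLS estimate in (4.9)
b1 = fitted @ (M_X @ Y) / (fitted @ fitted)
b2 = (D @ M_X @ P @ M_X @ Y) / (D @ M_X @ P @ M_X @ D)
b3 = (D @ P @ Y) / (D @ P @ D)

# Partial R^2 of the first stage (section 4.2.2) and OLS for comparison
R2_f = (fitted @ fitted) / (D @ M_X @ D)
b_ols = (D @ M_X @ Y) / (D @ M_X @ D)
print(b1, b2, b3, b_ols, R2_f)
```

In this design the instruments are strong, so the three 2SLS values coincide and should lie much closer to the true coefficient than the OLS value does.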

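4.2.3 The numerator of 2SLS bias

We now return to dealing with 4.10. Equating 4.4 and 4.6 yields:

\[ \underbrace{M_X D\beta + M_X U}_{M_X Y \text{ from 4.4}} = \underbrace{M_X Z\gamma\beta + M_X V}_{M_X Y \text{ from 4.6}}, \]

which can be rearranged as:

\[ M_X V = M_X D\beta - M_X Z\gamma\beta + M_X U, \]

which in turn implies that:

\[ \begin{aligned} M_X V - M_X Z(\hat\gamma - \gamma)\beta &= M_X D\beta - M_X Z\gamma\beta + M_X U - M_X Z(\hat\gamma - \gamma)\beta \\ &= M_X D\beta + M_X U - M_X Z\hat\gamma\beta. \end{aligned} \tag{4.12} \]

Now consider the numerator of 4.10 and substitute using 4.7 and 4.12:

\[ (M_X Z\hat\gamma)'\left[M_X V - M_X Z(\hat\gamma - \gamma)\beta\right] = \underbrace{(P_{M_X Z} M_X D)'}_{(M_X Z\hat\gamma)' \text{ from 4.7}} \underbrace{\left[M_X D\beta + M_X U - M_X Z\hat\gamma\beta\right]}_{M_X V - M_X Z(\hat\gamma - \gamma)\beta \text{ from 4.12}}. \]

Now substitute again using 4.7:

\[ (M_X Z\hat\gamma)'\left[M_X V - M_X Z(\hat\gamma - \gamma)\beta\right] = (P_{M_X Z} M_X D)'\big[M_X D\beta + M_X U - \underbrace{P_{M_X Z} M_X D}_{M_X Z\hat\gamma \text{ from 4.7}}\beta\big], \]

which can be rewritten as:

\[ \begin{aligned} (M_X Z\hat\gamma)'\left[M_X V - M_X Z(\hat\gamma - \gamma)\beta\right] &= (P_{M_X Z} M_X D)'\big[\underbrace{(I - P_{M_X Z})}_{M_{M_X Z}} M_X D\beta + M_X U\big] \\ &= (P_{M_X Z} M_X D)'\left[M_{M_X Z} M_X D\beta + M_X U\right]. \end{aligned} \]

Carrying out the multiplication yields:

\[ (M_X Z\hat\gamma)'\left[M_X V - M_X Z(\hat\gamma - \gamma)\beta\right] = D' M_X P_{M_X Z} M_{M_X Z} M_X D\beta + (P_{M_X Z} M_X D)' M_X U, \]

where $P_{M_X Z} M_{M_X Z} = 0$ because $P_{M_X Z}$ and $M_{M_X Z}$ are orthogonal complements. [2] The equality therefore simplifies to:

\[ (M_X Z\hat\gamma)'\left[M_X V - M_X Z(\hat\gamma - \gamma)\beta\right] = (P_{M_X Z} M_X D)' M_X U. \]

[2] $P_{M_X Z} M_{M_X Z} = P_{M_X Z}(I - P_{M_X Z}) = P_{M_X Z} - P_{M_X Z} P_{M_X Z} = P_{M_X Z} - P_{M_X Z} = 0_n$.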

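Now develop the remaining term on the RHS using the "true" expression for $M_X D$ from equation 4.5:

\[ \begin{aligned} (M_X Z\hat\gamma)'\left[M_X V - M_X Z(\hat\gamma - \gamma)\beta\right] &= (P_{M_X Z} M_X D)' M_X U \\ &= \big(P_{M_X Z}\underbrace{(M_X Z\gamma + M_X U_D)}_{M_X D \text{ from 4.5}}\big)' M_X U \\ &= (P_{M_X Z} M_X Z\gamma)' M_X U + (P_{M_X Z} M_X U_D)' M_X U, \end{aligned} \]

and thus:

\[ (M_X Z\hat\gamma)'\left[M_X V - M_X Z(\hat\gamma - \gamma)\beta\right] = (P_{M_X Z} M_X Z\gamma)' M_X U + (P_{M_X Z} M_X U_D)' M_X U. \]

Now replace the projection matrix explicitly on the RHS of this expression:

\[ (M_X Z\hat\gamma)'\left[M_X V - M_X Z(\hat\gamma - \gamma)\beta\right] = \big(\underbrace{M_X Z\left[(M_X Z)'(M_X Z)\right]^{-1}(M_X Z)'}_{P_{M_X Z}} M_X Z\gamma\big)' M_X U + (P_{M_X Z} M_X U_D)' M_X U. \]

Simplifying terms then yields:

\[ \begin{aligned} (M_X Z\hat\gamma)'\left[M_X V - M_X Z(\hat\gamma - \gamma)\beta\right] &= (M_X Z\gamma)' M_X U + (P_{M_X Z} M_X U_D)' M_X U \\ &= \gamma' Z' M_X U + (M_X U_D)' P_{M_X Z} M_X U, \end{aligned} \]

so that:

\[ (M_X Z\hat\gamma)'\left[M_X V - M_X Z(\hat\gamma - \gamma)\beta\right] = \gamma' Z' M_X U + U_D' M_X P_{M_X Z} M_X U. \]

Note also that:

\[ \begin{aligned} M_X P_{M_X Z} M_X &= M_X M_X Z\left[(M_X Z)'(M_X Z)\right]^{-1}(M_X Z)' M_X \\ &= M_X Z\left[(M_X Z)'(M_X Z)\right]^{-1}(M_X Z)' \\ &= P_{M_X Z}, \end{aligned} \]

which implies that:

\[ (M_X Z\hat\gamma)'\left[M_X V - M_X Z(\hat\gamma - \gamma)\beta\right] = \gamma' Z' M_X U + U_D' P_{M_X Z} U. \]

Finally, take expectations:

\[ \mathrm{E}\left[(M_X Z\hat\gamma)'\left[M_X V - M_X Z(\hat\gamma - \gamma)\beta\right]\right] = \gamma'\,\mathrm{E}[Z' M_X U] + \mathrm{E}[U_D' P_{M_X Z} U]. \tag{4.13} \]

4.2.4 The denominator of 2SLS bias

Now consider the denominator of 4.10:

\[ (M_X Z\hat\gamma)'(M_X Z\hat\gamma) = (P_{M_X Z} M_X D)'(P_{M_X Z} M_X D) = (M_X D)' P_{M_X Z} (M_X D). \]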
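Because the algebra above is long, it may help to confirm it on simulated data before expectations are taken. The sketch below is an illustrative check only, assuming NumPy; the design and every numerical value are hypothetical choices. It compares the numerator of 4.10, computed from its definition, with the closed form $\gamma' Z' M_X U + U_D' P_{M_X Z} U$, and does the same for the denominator.

```python
import numpy as np

def proj(A):
    """Orthogonal projection matrix onto the column space of A."""
    return A @ np.linalg.solve(A.T @ A, A.T)

# Hypothetical simulated design (all values are illustrative assumptions)
rng = np.random.default_rng(1)
n, beta = 200, 2.0
gamma = np.array([1.0, 0.5])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, 2))
U_D = rng.normal(size=n)
U = 0.8 * U_D + rng.normal(size=n)
D = X @ np.array([1.0, 0.5]) + Z @ gamma + U_D
V = U + U_D * beta                      # V = U + U_D beta, as in (4.6)

M_X = np.eye(n) - proj(X)
MZ = M_X @ Z                            # partialled-out instruments
P = proj(MZ)                            # P_{M_X Z}
gamma_hat = np.linalg.solve(MZ.T @ MZ, MZ.T @ (M_X @ D))

# Numerator of (4.10), computed directly from its definition
num_direct = (MZ @ gamma_hat) @ (M_X @ V - MZ @ (gamma_hat - gamma) * beta)
# Closed form reached just before taking expectations in (4.13)
num_closed = gamma @ Z.T @ M_X @ U + U_D @ P @ U

# Denominator of (4.10) and the quadratic form of section 4.2.4
den_direct = (MZ @ gamma_hat) @ (MZ @ gamma_hat)
den_closed = (M_X @ D) @ P @ (M_X @ D)

# The two matrix facts used along the way
sandwich_ok = np.allclose(M_X @ P @ M_X, P)          # M_X P M_X = P
orthogonal_ok = np.allclose(P @ (np.eye(n) - P), 0)  # P M = 0
print(num_direct, num_closed, den_direct, den_closed)
```

The two numerators and the two denominators should agree to floating-point precision, whatever seed is used.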

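Substitute from the "true" reduced form and expand the expression; this yields:

\[ \begin{aligned} (M_X Z\hat\gamma)'(M_X Z\hat\gamma) &= (M_X D)' P_{M_X Z} (M_X D) \\ &= (M_X Z\gamma + M_X U_D)' P_{M_X Z} (M_X Z\gamma + M_X U_D). \end{aligned} \]

Now develop the product on the RHS:

\[ \begin{aligned} (M_X Z\hat\gamma)'(M_X Z\hat\gamma) ={}& \gamma'(M_X Z)' P_{M_X Z} (M_X Z)\gamma + \gamma'(M_X Z)' P_{M_X Z} M_X U_D \\ &+ (M_X U_D)' P_{M_X Z} M_X Z\gamma + (M_X U_D)' P_{M_X Z} M_X U_D, \end{aligned} \]

which, since $P_{M_X Z} M_X Z = M_X Z$, can be written more esthetically as:

\[ \begin{aligned} (M_X Z\hat\gamma)'(M_X Z\hat\gamma) ={}& \gamma'(M_X Z)'(M_X Z)\gamma + \gamma'(M_X Z)'(M_X U_D) \\ &+ (M_X U_D)'(M_X Z)\gamma + (M_X U_D)' P_{M_X Z} (M_X U_D), \end{aligned} \]

or

\[ (M_X Z\hat\gamma)'(M_X Z\hat\gamma) = \gamma' Z' M_X Z\gamma + \gamma' Z' M_X U_D + U_D' M_X Z\gamma + U_D' M_X P_{M_X Z} M_X U_D. \]

Since we have already shown that $M_X P_{M_X Z} M_X = P_{M_X Z}$, this simplifies to:

\[ (M_X Z\hat\gamma)'(M_X Z\hat\gamma) = \gamma' Z' M_X Z\gamma + \gamma' Z' M_X U_D + U_D' M_X Z\gamma + U_D' P_{M_X Z} U_D. \]

Finally, take expectations:

\[ \begin{aligned} \mathrm{E}\left[(M_X Z\hat\gamma)'(M_X Z\hat\gamma)\right] ={}& \gamma'\,\mathrm{E}[Z' M_X Z]\,\gamma + \gamma'\,\mathrm{E}[Z' M_X U_D] \\ &+ \mathrm{E}[U_D' M_X Z]\,\gamma + \mathrm{E}[U_D' P_{M_X Z} U_D]. \end{aligned} \tag{4.14} \]

4.2.5 Combining the two expressions

Pulling everything together, substituting 4.13 and 4.14 into 4.10, and approximating the expectation of the ratio by the ratio of the expectations, allows one to rewrite the expression for the bias of 2SLS as:

\[ \begin{aligned} \mathrm{E}[\hat\beta_{2SLS} - \beta] &\approx \frac{\mathrm{E}\left[(M_X Z\hat\gamma)'\left[M_X V - M_X Z(\hat\gamma - \gamma)\beta\right]\right]}{\mathrm{E}\left[(M_X Z\hat\gamma)'(M_X Z\hat\gamma)\right]} \\ &= \frac{\gamma'\,\mathrm{E}[Z' M_X U] + \mathrm{E}[U_D' P_{M_X Z} U]}{\gamma'\,\mathrm{E}[Z' M_X Z]\,\gamma + \gamma'\,\mathrm{E}[Z' M_X U_D] + \mathrm{E}[U_D' M_X Z]\,\gamma + \mathrm{E}[U_D' P_{M_X Z} U_D]}. \end{aligned} \]

Applying Theorem 2 to the various bilinear forms in this last expression, we obtain:

\[ \mathrm{E}[Z' M_X U] = \operatorname{tr}(M_X)\operatorname{cov}(Z, U) + \mathrm{E}[Z]' M_X \mathrm{E}[U] = 0, \]

\[ \begin{aligned} \mathrm{E}[U_D' P_{M_X Z} U] &= \operatorname{tr}(P_{M_X Z})\operatorname{cov}(U_D, U) + \mathrm{E}[U_D]' P_{M_X Z} \mathrm{E}[U] \\ &= \operatorname{tr}(P_{M_X Z})\operatorname{cov}(U_D, U), \end{aligned} \]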

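\[ \begin{aligned} \mathrm{E}[U_D' P_{M_X Z} U_D] &= \operatorname{tr}(P_{M_X Z})\operatorname{cov}(U_D, U_D) + \mathrm{E}[U_D]' P_{M_X Z} \mathrm{E}[U_D] \\ &= \operatorname{tr}(P_{M_X Z})\operatorname{var}(U_D), \end{aligned} \]

\[ \mathrm{E}[Z' M_X U_D] = \operatorname{tr}(M_X)\operatorname{cov}(Z, U_D) + \mathrm{E}[Z]' M_X \mathrm{E}[U_D] = 0, \]

\[ \mathrm{E}[Z' M_X Z] = \operatorname{tr}(M_X)\operatorname{var}(Z) + \mathrm{E}[Z]' M_X \mathrm{E}[Z]. \]

Note also that there is an alternative expression for the denominator, given by

\[ \mathrm{E}\left[R_f^2\,(M_X D)'(M_X D)\right] = R_f^2\,\mathrm{E}[D' M_X D] = R_f^2\left[\operatorname{tr}(M_X)\operatorname{var}(D) + \mathrm{E}[D]' M_X \mathrm{E}[D]\right]. \]

Substituting into the expression for $\mathrm{E}[\hat\beta_{2SLS} - \beta]$ yields:

\[ \mathrm{E}[\hat\beta_{2SLS} - \beta] = \frac{\gamma'\overbrace{\mathrm{E}[Z' M_X U]}^{=0} + \mathrm{E}[U_D' P_{M_X Z} U]}{\gamma'\,\mathrm{E}[Z' M_X Z]\,\gamma + \gamma'\underbrace{\mathrm{E}[Z' M_X U_D]}_{=0} + \underbrace{\mathrm{E}[U_D' M_X Z]}_{=0}\gamma + \mathrm{E}[U_D' P_{M_X Z} U_D]}, \]

so that:

\[ \mathrm{E}[\hat\beta_{2SLS} - \beta] = \frac{\operatorname{tr}(P_{M_X Z})\operatorname{cov}(U_D, U)}{\gamma'\left[\operatorname{tr}(M_X)\operatorname{var}(Z) + \mathrm{E}[Z]' M_X \mathrm{E}[Z]\right]\gamma + \operatorname{tr}(P_{M_X Z})\operatorname{var}(U_D)}. \]

In order to tidy up this expression, we need to establish the trace of $P_{M_X Z}$. Note that:

\[ \begin{aligned} \operatorname{tr}(P_{M_X Z}) &= \operatorname{tr}\left(M_X Z\left[(M_X Z)'(M_X Z)\right]^{-1}(M_X Z)'\right) \\ &= \operatorname{tr}\left((M_X Z)'(M_X Z)\left[(M_X Z)'(M_X Z)\right]^{-1}\right) \\ &= \operatorname{tr}\left(Z' M_X Z\,(Z' M_X Z)^{-1}\right) \\ &= \operatorname{tr}(I_{K_Z}) \\ &= K_Z, \end{aligned} \]

where the second equality follows from the commutative property of the trace operator. Recall also from Lemma 1 that $\operatorname{rank}(M_X) = n - K_X$. Therefore:

\[ \operatorname{tr}(M_X)\operatorname{var}(Z) + \mathrm{E}[Z]' M_X \mathrm{E}[Z] = (n - K_X)\operatorname{var}(Z) + \mathrm{E}[Z]' M_X \mathrm{E}[Z]. \]

Therefore, we can rewrite $\mathrm{E}[\hat\beta_{2SLS} - \beta]$ as:

\[ \begin{aligned} \mathrm{E}[\hat\beta_{2SLS} - \beta] &= \frac{\operatorname{tr}(P_{M_X Z})\operatorname{cov}(U_D, U)}{\gamma'\left[\operatorname{tr}(M_X)\operatorname{var}(Z) + \mathrm{E}[Z]' M_X \mathrm{E}[Z]\right]\gamma + \operatorname{tr}(P_{M_X Z})\operatorname{var}(U_D)} \\ &= \frac{K_Z\operatorname{cov}(U_D, U)}{(n - K_X)\,\gamma'\left[\operatorname{var}(Z) + \mathrm{E}Z' M_X \mathrm{E}Z\right]\gamma + K_Z\operatorname{var}(U_D)}, \end{aligned} \]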

or alternatively, in terms of the $R^2$ of the reduced form:

\[ \mathrm{E}[\hat\beta_{2SLS} - \beta] = \frac{K_Z\operatorname{cov}(U_D, U)}{R_f^2\left[(n - K_X)\operatorname{var}(D) + \mathrm{E}[D]' M_X \mathrm{E}[D]\right]}. \]

These are two very useful expressions for understanding 2SLS. They tell you the following:

• bias is increasing in the number of instruments, $K_Z$;
• bias is increasing in the "degree of endogeneity," $\operatorname{cov}(U_D, U)$;
• bias is decreasing in the fit of the first-stage reduced form, $R_f^2$;
• bias is increasing in the number of exogenous covariates, $K_X$;
• bias is decreasing in the variance of treatment intensity, $\operatorname{var}(D)$;
• as instruments become weak, $\lim_{\gamma \to 0} \mathrm{E}[\hat\beta_{2SLS} - \beta] = \frac{\operatorname{cov}(U_D, U)}{\operatorname{var}(U_D)} = \mathrm{E}[\hat\beta_{OLS} - \beta]$: weak instruments bias 2SLS towards OLS;
• as $n$ gets large ($n \to \infty$) the bias decreases: 2SLS is consistent.
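The comparative statics in this list can be read off the first bias expression directly. The Python sketch below is a hedged illustration, assuming NumPy: `approx_bias_2sls` is a hypothetical helper that simply evaluates the approximate bias formula, with the scalar `signal` standing in for $\gamma'[\operatorname{var}(Z) + \mathrm{E}Z' M_X \mathrm{E}Z]\gamma$, and `mc_bias_irrelevant` is a small simulation of the limiting case $\gamma = 0$ in a design without covariates. All numerical values are arbitrary.

```python
import numpy as np

def approx_bias_2sls(K_Z, cov_ud_u, signal, var_ud, n, K_X):
    """Approximate 2SLS bias: K_Z*cov / ((n - K_X)*signal + K_Z*var(U_D)).

    `signal` is an assumed scalar summary of gamma'[var(Z) + ...]gamma.
    """
    return K_Z * cov_ud_u / ((n - K_X) * signal + K_Z * var_ud)

def mc_bias_irrelevant(n=100, K_Z=20, rho=0.8, reps=400, seed=0):
    """Average 2SLS bias when the instruments are irrelevant (gamma = 0).

    Hypothetical design: no covariates X, var(U_D) = 1, cov(U_D, U) = rho,
    true beta = 1.
    """
    rng = np.random.default_rng(seed)
    draws = []
    for _ in range(reps):
        Z = rng.normal(size=(n, K_Z))
        U_D = rng.normal(size=n)
        U = rho * U_D + rng.normal(size=n)
        D = U_D                                # gamma = 0: Z does not enter
        Y = 1.0 * D + U
        PD = Z @ np.linalg.lstsq(Z, D, rcond=None)[0]   # first-stage fit P_Z D
        draws.append(PD @ Y / (PD @ D) - 1.0)           # D'PY / D'PD - beta
    return float(np.mean(draws))

few = approx_bias_2sls(K_Z=2, cov_ud_u=0.8, signal=0.05, var_ud=1.0, n=200, K_X=2)
many = approx_bias_2sls(K_Z=10, cov_ud_u=0.8, signal=0.05, var_ud=1.0, n=200, K_X=2)
large_n = approx_bias_2sls(K_Z=10, cov_ud_u=0.8, signal=0.05, var_ud=1.0, n=20000, K_X=2)
weak = approx_bias_2sls(K_Z=10, cov_ud_u=0.8, signal=0.0, var_ud=1.0, n=200, K_X=2)
simulated = mc_bias_irrelevant()
print(few, many, large_n, weak, simulated)
```

With `signal` set to zero the formula returns $\operatorname{cov}(U_D, U)/\operatorname{var}(U_D)$, the OLS bias, and the simulated average for irrelevant instruments should land close to the same number.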

4.3 The forbidden regression

One of the most common mistakes made in applying 2SLS in practice has been memorably christened the "forbidden regression" by Jerry Hausman. Though the version of the forbidden regression discussed in standard textbooks such as Wooldridge stems, for example, from confusing the linear prediction of a quadratic term with the square of the linear prediction, the Hausman version of the "forbidden regression" is arguably much more common. [3] There are probably two reasons for this. First, higher order or multiplicative terms are common, though not legion, in applied work. Second, and most importantly, running the Hausman forbidden regression is so easy to do. This is particularly true because of the bad habit of carrying out two-stage procedures in which the first-stage reduced form is used to produce an estimated value of treatment intensity, $\hat D$, which is then plugged into an OLS estimate of $\beta$ in the structural equation. The problem is that generations of applied researchers have plugged the wrong value of $\hat D$ into the structural equation.

[3] In the Wooldridge version, one confuses, for example, $P_Z(D^2)$ with $(P_Z D)^2$ for the quadratic example, or $P_Z(WD)$ with $W P_Z D$ for a multiplicative example.

In algebraic terms, the predicted value of $D$ used in the forbidden regression is obtained by incorrectly replacing $\hat\gamma = \left[(M_X Z)'(M_X Z)\right]^{-1}(M_X Z)'(M_X D)$ with $\hat\gamma_{FR} = (Z'Z)^{-1} Z' D$ and only then partialling out $X$ from the structural equation. In all fairness to the aforementioned researchers, it is often the case that a subset of the $X$s is unintentionally omitted from the first-stage reduced form. In what follows, I consider the case where all $X$s have been forgotten: the attentive reader will notice that the argument applies to the general case as long as the other non-offensive covariates have already been partialled out both from the structural equation and from the reduced form.

In words, the correct 2SLS procedure entails including all of the exogenous covariates that appear in the structural equation in the first-stage reduced form. The forbidden regression involves leaving some or all of them out. In what follows, I derive the expression for the bias of the 2SLS estimator in the case of the forbidden regression, and consider the conditions under which the bias in question will be equal to that of the correct 2SLS procedure. I also show, as is well-known, that 2SLS is inconsistent when the forbidden regression is run, except under very particular circumstances.

The incorrect predicted value of $D$ that is used in the forbidden regression can be written as:

\[ M_X Z\hat\gamma_{FR} = M_X Z\underbrace{(Z'Z)^{-1} Z' D}_{\hat\gamma_{FR}} = M_X \underbrace{Z(Z'Z)^{-1} Z'}_{P_Z} D = M_X P_Z D. \tag{4.15} \]

Consider the expression for bias given earlier in equation 4.11, where we have now replaced $\hat\gamma$ with $\hat\gamma_{FR}$:

\[ \hat\beta_{FR} - \beta = \frac{(M_X Z\hat\gamma_{FR})'\left[M_X V - M_X Z(\hat\gamma_{FR} - \gamma)\beta\right]}{(M_X Z\hat\gamma_{FR})'(M_X Z\hat\gamma_{FR})}. \]

Substituting $M_X P_Z D$ for $M_X Z\hat\gamma_{FR}$ into the numerator of the expression for $\hat\beta_{FR} - \beta$

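yields:

\[ \begin{aligned} (M_X Z\hat\gamma_{FR})'\left[M_X V - M_X Z(\hat\gamma_{FR} - \gamma)\beta\right] &= (M_X Z\hat\gamma_{FR})'\left[M_X D\beta + M_X U - M_X Z\hat\gamma_{FR}\beta\right] \\ &= (M_X P_Z D)'\left[M_X D\beta + M_X U - M_X P_Z D\beta\right] \\ &= (M_X P_Z D)'\left[M_X (I - P_Z) D\beta + M_X U\right] \\ &= D' P_Z M_X\left[M_X M_Z D\beta + M_X U\right], \end{aligned} \]

and thus:

\[ (M_X Z\hat\gamma_{FR})'\left[M_X V - M_X Z(\hat\gamma_{FR} - \gamma)\beta\right] = D' P_Z M_X M_Z D\beta + D' P_Z M_X U. \tag{4.16} \]

Partialling out $Z$ from the treatment intensity equation yields:

\[ M_Z D = M_Z X\pi + M_Z U_D, \]

whereas pre-multiplying treatment intensity by $P_Z$ and transposing yields:

\[ D' P_Z = \pi' X' P_Z + \gamma' Z' + U_D' P_Z. \]

Substituting these two expressions into 4.16:

\[ \begin{aligned} (M_X Z\hat\gamma_{FR})'\left[M_X V - M_X Z(\hat\gamma_{FR} - \gamma)\beta\right] ={}& D' P_Z M_X M_Z D\beta + D' P_Z M_X U \\ ={}& \overbrace{\left[\pi' X' P_Z + \gamma' Z' + U_D' P_Z\right]}^{D' P_Z} M_X \overbrace{\left[M_Z X\pi + M_Z U_D\right]}^{M_Z D}\beta \\ &+ \overbrace{\left[\pi' X' P_Z + \gamma' Z' + U_D' P_Z\right]}^{D' P_Z} M_X U. \end{aligned} \]

Developing terms on the RHS, one obtains:

\[ \begin{aligned} (M_X Z\hat\gamma_{FR})'\left[M_X V - M_X Z(\hat\gamma_{FR} - \gamma)\beta\right] ={}& \big[\pi' X' P_Z M_X M_Z X\pi + \gamma' Z' M_X M_Z X\pi + U_D' P_Z M_X M_Z X\pi \\ &\;\; + \pi' X' P_Z M_X M_Z U_D + \gamma' Z' M_X M_Z U_D + U_D' P_Z M_X M_Z U_D\big]\beta \\ &+ \pi' X' P_Z M_X U + \gamma' Z' M_X U + U_D' P_Z M_X U. \end{aligned} \]

Taking expectations yields:

\[ \begin{aligned} \mathrm{E}\left[(M_X Z\hat\gamma_{FR})'\left[M_X V - M_X Z(\hat\gamma_{FR} - \gamma)\beta\right]\right] ={}& \big[\pi'\,\mathrm{E}[X' P_Z M_X M_Z X]\,\pi + \gamma'\,\mathrm{E}[Z' M_X M_Z X]\,\pi \\ &\;\; + \mathrm{E}[U_D' P_Z M_X M_Z X]\,\pi + \pi'\,\mathrm{E}[X' P_Z M_X M_Z U_D] \\ &\;\; + \gamma'\,\mathrm{E}[Z' M_X M_Z U_D] + \mathrm{E}[U_D' P_Z M_X M_Z U_D]\big]\beta \\ &+ \pi'\,\mathrm{E}[X' P_Z M_X U] + \gamma'\,\mathrm{E}[Z' M_X U] + \mathrm{E}[U_D' P_Z M_X U]. \end{aligned} \tag{4.17} \]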
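Equation 4.16 is a purely algebraic identity, so it can be confirmed on any data set before expectations are taken. The sketch below is an illustrative check only, assuming NumPy; the design, with instruments deliberately correlated with the covariates, and all numerical values are hypothetical.

```python
import numpy as np

def proj(A):
    """Orthogonal projection matrix onto the column space of A."""
    return A @ np.linalg.solve(A.T @ A, A.T)

# Hypothetical design in which Z is correlated with X (illustrative values)
rng = np.random.default_rng(2)
n, beta = 200, 1.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = (0.5 * X[:, 1] + rng.normal(size=n)).reshape(-1, 1)
U_D = rng.normal(size=n)
U = 0.8 * U_D + rng.normal(size=n)
D = X @ np.array([1.0, 1.0]) + Z[:, 0] * 1.0 + U_D
Y = beta * D + X @ np.array([0.5, -1.0]) + U

M_X = np.eye(n) - proj(X)
P_Z = proj(Z)
M_Z = np.eye(n) - P_Z

# Forbidden first stage: D on Z only, then partial X out, as in (4.15)
gamma_fr = np.linalg.solve(Z.T @ Z, Z.T @ D)
fit_fr = M_X @ P_Z @ D
same_fit = np.allclose(M_X @ Z @ gamma_fr, fit_fr)

# Left- and right-hand sides of (4.16)
lhs = fit_fr @ (M_X @ D * beta + M_X @ U - fit_fr * beta)
rhs = D @ P_Z @ M_X @ M_Z @ D * beta + D @ P_Z @ M_X @ U

# The forbidden-regression estimate and its sampling error
b_fr = fit_fr @ (M_X @ Y) / (fit_fr @ fit_fr)
error_from_ratio = lhs / (fit_fr @ fit_fr)
print(lhs, rhs, b_fr - beta, error_from_ratio)
```

The last two printed numbers coincide, which confirms that the ratio being analysed is exactly $\hat\beta_{FR} - \beta$.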

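Applying Theorem 2 allows one to write:

\[ \begin{aligned} \mathrm{E}[X' P_Z M_X M_Z X] &= \operatorname{tr}(P_Z M_X M_Z)\operatorname{var}(X) + \mathrm{E}[X]' P_Z M_X M_Z \mathrm{E}[X] \\ &= \operatorname{tr}(M_Z P_Z M_X)\operatorname{var}(X) + \mathrm{E}[X]' P_Z M_X M_Z \mathrm{E}[X] \\ &= \mathrm{E}[X]' P_Z M_X M_Z \mathrm{E}[X], \end{aligned} \]

where the second equality follows from the commutative property of the trace, and the third from $M_Z P_Z = 0$,

\[ \begin{aligned} \mathrm{E}[Z' M_X M_Z X] &= \operatorname{tr}(M_X M_Z)\operatorname{cov}(Z, X) + \mathrm{E}[Z]' M_X M_Z \mathrm{E}[X], \\ \mathrm{E}[U_D' P_Z M_X M_Z X] &= 0, \\ \mathrm{E}[X' P_Z M_X M_Z U_D] &= 0, \\ \mathrm{E}[Z' M_X M_Z U_D] &= 0, \\ \mathrm{E}[U_D' P_Z M_X M_Z U_D] &= \operatorname{tr}(P_Z M_X M_Z)\operatorname{var}(U_D) = 0, \\ \mathrm{E}[X' P_Z M_X U] &= 0, \\ \mathrm{E}[Z' M_X U] &= 0, \\ \mathrm{E}[U_D' P_Z M_X U] &= \operatorname{tr}(P_Z M_X)\operatorname{cov}(U_D, U). \end{aligned} \]

Substituting the preceding expectations into the expression for the numerator given in 4.17 then yields:

\[ \begin{aligned} \mathrm{E}\left[(M_X Z\hat\gamma_{FR})'\left[M_X V - M_X Z(\hat\gamma_{FR} - \gamma)\beta\right]\right] ={}& \Big\{\pi'\,\mathrm{E}[X]' P_Z M_X M_Z \mathrm{E}[X] \\ &\;\; + \gamma'\left[\operatorname{tr}(M_X M_Z)\operatorname{cov}(Z, X) + \mathrm{E}[Z]' M_X M_Z \mathrm{E}[X]\right]\Big\}\pi\beta \\ &+ \operatorname{tr}(P_Z M_X)\operatorname{cov}(U_D, U). \end{aligned} \tag{4.18} \]

Now consider the denominator of the expression for $\hat\beta_{FR} - \beta$. Making the same substitutions as for the numerator yields:

\[ \begin{aligned} (M_X Z\hat\gamma_{FR})'(M_X Z\hat\gamma_{FR}) &= (M_X P_Z D)'(M_X P_Z D) = D' P_Z M_X P_Z D \\ &= \overbrace{\left[\pi' X' P_Z + \gamma' Z' + U_D' P_Z\right]}^{D' P_Z} M_X \overbrace{\left[P_Z X\pi + Z\gamma + P_Z U_D\right]}^{P_Z D}, \end{aligned} \]

and performing the multiplication gives us:

\[ \begin{aligned} (M_X Z\hat\gamma_{FR})'(M_X Z\hat\gamma_{FR}) ={}& \pi' X' P_Z M_X P_Z X\pi + \pi' X' P_Z M_X Z\gamma + \pi' X' P_Z M_X P_Z U_D \\ &+ \gamma' Z' M_X P_Z X\pi + \gamma' Z' M_X Z\gamma + \gamma' Z' M_X P_Z U_D \\ &+ U_D' P_Z M_X P_Z X\pi + U_D' P_Z M_X Z\gamma + U_D' P_Z M_X P_Z U_D, \end{aligned} \]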

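whereas taking expectations yields:

\[ \begin{aligned} \mathrm{E}\left[(M_X Z\hat\gamma_{FR})'(M_X Z\hat\gamma_{FR})\right] ={}& \pi'\,\mathrm{E}[X' P_Z M_X P_Z X]\,\pi + \pi'\,\mathrm{E}[X' P_Z M_X Z]\,\gamma \\ &+ \pi'\overbrace{\mathrm{E}[X' P_Z M_X P_Z U_D]}^{=0} + \gamma'\,\mathrm{E}[Z' M_X P_Z X]\,\pi \\ &+ \gamma'\,\mathrm{E}[Z' M_X Z]\,\gamma + \gamma'\overbrace{\mathrm{E}[Z' M_X P_Z U_D]}^{=0} \\ &+ \overbrace{\mathrm{E}[U_D' P_Z M_X P_Z X]}^{=0}\pi + \overbrace{\mathrm{E}[U_D' P_Z M_X Z]}^{=0}\gamma \\ &+ \mathrm{E}[U_D' P_Z M_X P_Z U_D], \end{aligned} \]

which can be simplified to:

\[ \begin{aligned} \mathrm{E}\left[(M_X Z\hat\gamma_{FR})'(M_X Z\hat\gamma_{FR})\right] ={}& \pi'\,\mathrm{E}[X' P_Z M_X P_Z X]\,\pi + \pi'\,\mathrm{E}[X' P_Z M_X Z]\,\gamma \\ &+ \gamma'\,\mathrm{E}[Z' M_X P_Z X]\,\pi + \gamma'\,\mathrm{E}[Z' M_X Z]\,\gamma \\ &+ \mathrm{E}[U_D' P_Z M_X P_Z U_D]. \end{aligned} \tag{4.19} \]

Noticing, again by the commutative property of the trace, that

\[ \operatorname{tr}(P_Z M_X P_Z) = \operatorname{tr}(P_Z P_Z M_X) = \operatorname{tr}(P_Z M_X), \]

then allows one to rewrite 4.19 as:

\[ \begin{aligned} \mathrm{E}\left[(M_X Z\hat\gamma_{FR})'(M_X Z\hat\gamma_{FR})\right] ={}& \pi'\left[\operatorname{tr}(P_Z M_X)\operatorname{var}(X) + \mathrm{E}[X]' P_Z M_X P_Z \mathrm{E}[X]\right]\pi \\ &+ \pi'\left[\operatorname{tr}(P_Z M_X)\operatorname{cov}(Z, X) + \mathrm{E}[X]' P_Z M_X \mathrm{E}[Z]\right]\gamma \\ &+ \gamma'\left[\operatorname{tr}(P_Z M_X)\operatorname{cov}(Z, X) + \mathrm{E}[Z]' M_X P_Z \mathrm{E}[X]\right]\pi \\ &+ \gamma'\left[(n - K_X)\operatorname{var}(Z) + \mathrm{E}[Z]' M_X \mathrm{E}[Z]\right]\gamma \\ &+ \operatorname{tr}(P_Z M_X)\operatorname{var}(U_D), \end{aligned} \]

or

\[ \mathrm{E}\left[(M_X Z\hat\gamma_{FR})'(M_X Z\hat\gamma_{FR})\right] = \operatorname{tr}(P_Z M_X)\left\{\begin{aligned} &\pi'\left[\operatorname{var}(X) + \mathrm{E}X' P_Z M_X P_Z \mathrm{E}X\right]\pi \\ &+ \pi'\left[\operatorname{cov}(Z, X) + \mathrm{E}X' P_Z M_X \mathrm{E}Z\right]\gamma \\ &+ \gamma'\left[\operatorname{cov}(Z, X) + \mathrm{E}Z' M_X P_Z \mathrm{E}X\right]\pi \\ &+ \frac{n - K_X}{\operatorname{tr}(P_Z M_X)}\,\gamma'\left[\operatorname{var}(Z) + \mathrm{E}Z' M_X \mathrm{E}Z\right]\gamma \\ &+ \operatorname{var}(U_D) \end{aligned}\right\}. \]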
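The trace identity used to rewrite 4.19, and the quadratic form $D' P_Z M_X P_Z D$ from which the denominator starts, are easy to verify numerically. The following lines are a small illustrative check, assuming NumPy; the simulated matrices are arbitrary, hypothetical choices.

```python
import numpy as np

def proj(A):
    """Orthogonal projection matrix onto the column space of A."""
    return A @ np.linalg.solve(A.T @ A, A.T)

# Hypothetical data: K_X = 3 covariates, K_Z = 2 instruments correlated with X
rng = np.random.default_rng(3)
n = 150
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Z = 0.5 * X[:, 1:] + rng.normal(size=(n, 2))
D = X @ np.array([1.0, 0.5, -0.5]) + Z @ np.array([1.0, 1.0]) + rng.normal(size=n)

P_Z = proj(Z)
M_X = np.eye(n) - proj(X)

# tr(P_Z M_X P_Z) = tr(P_Z P_Z M_X) = tr(P_Z M_X)
t_sandwich = np.trace(P_Z @ M_X @ P_Z)
t_short = np.trace(P_Z @ M_X)

# (M_X P_Z D)'(M_X P_Z D) = D' P_Z M_X P_Z D
fit_fr = M_X @ P_Z @ D
quad = D @ P_Z @ M_X @ P_Z @ D
print(t_sandwich, t_short, fit_fr @ fit_fr, quad)
```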

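4.3.1 Bias in the forbidden regression

We now pull everything together into the expression for $\mathrm{E}[\hat\beta_{FR} - \beta]$, which is trivially seen to be given by:

\[ \mathrm{E}[\hat\beta_{FR} - \beta] = \frac{\begin{aligned} &\Big\{\pi'\,\mathrm{E}[X]' P_Z M_X M_Z \mathrm{E}[X] + \gamma'\left[\operatorname{tr}(M_X M_Z)\operatorname{cov}(Z, X) + \mathrm{E}[Z]' M_X M_Z \mathrm{E}[X]\right]\Big\}\pi\beta \\ &+ \operatorname{tr}(P_Z M_X)\operatorname{cov}(U_D, U) \end{aligned}}{\operatorname{tr}(P_Z M_X)\left\{\begin{aligned} &\pi'\left[\operatorname{var}(X) + \mathrm{E}X' P_Z M_X P_Z \mathrm{E}X\right]\pi + \pi'\left[\operatorname{cov}(Z, X) + \mathrm{E}X' P_Z M_X \mathrm{E}Z\right]\gamma \\ &+ \gamma'\left[\operatorname{cov}(Z, X) + \mathrm{E}Z' M_X P_Z \mathrm{E}X\right]\pi + \frac{n - K_X}{\operatorname{tr}(P_Z M_X)}\,\gamma'\left[\operatorname{var}(Z) + \mathrm{E}Z' M_X \mathrm{E}Z\right]\gamma \\ &+ \operatorname{var}(U_D) \end{aligned}\right\}}. \tag{4.20} \]

Noticing that:

\[ \operatorname{tr}(M_X M_Z) = \operatorname{tr}(M_Z M_X) = \operatorname{tr}(M_X - P_Z M_X) = n - K_X - \operatorname{tr}(P_Z M_X), \]

allows one to rewrite this as:

\[ \mathrm{E}[\hat\beta_{FR} - \beta] = \frac{\begin{aligned} &\Big\{\pi'\,\mathrm{E}[X]' P_Z M_X M_Z \mathrm{E}[X] + \gamma'\left[\left(n - K_X - \operatorname{tr}(P_Z M_X)\right)\operatorname{cov}(Z, X) + \mathrm{E}[Z]' M_X M_Z \mathrm{E}[X]\right]\Big\}\pi\beta \\ &+ \operatorname{tr}(P_Z M_X)\operatorname{cov}(U_D, U) \end{aligned}}{\operatorname{tr}(P_Z M_X)\left\{\begin{aligned} &\pi'\left[\operatorname{var}(X) + \mathrm{E}X' P_Z M_X P_Z \mathrm{E}X\right]\pi + \pi'\left[\operatorname{cov}(Z, X) + \mathrm{E}X' P_Z M_X \mathrm{E}Z\right]\gamma \\ &+ \gamma'\left[\operatorname{cov}(Z, X) + \mathrm{E}Z' M_X P_Z \mathrm{E}X\right]\pi + \frac{n - K_X}{\operatorname{tr}(P_Z M_X)}\,\gamma'\left[\operatorname{var}(Z) + \mathrm{E}Z' M_X \mathrm{E}Z\right]\gamma \\ &+ \operatorname{var}(U_D) \end{aligned}\right\}}. \tag{4.21} \]

In order to clarify issues, consider a special case, $\pi = 0$: in this case, in which the omission of $X$ from the first-stage reduced form is "justified", 4.21 simplifies to:

\[ \mathrm{E}[\hat\beta_{FR} - \beta] = \frac{\operatorname{tr}(P_Z M_X)\operatorname{cov}(U_D, U)}{(n - K_X)\,\gamma'\left[\operatorname{var}(Z) + \mathrm{E}Z' M_X \mathrm{E}Z\right]\gamma + \operatorname{tr}(P_Z M_X)\operatorname{var}(U_D)}. \tag{4.22} \]

This is exactly the same expression as for $\mathrm{E}[\hat\beta_{2SLS} - \beta]$, except that $K_Z$ is replaced with $\operatorname{tr}(P_Z M_X)$. Since, by Theorem 1.24 in Takeuchi, Yanai, and Mukherjee (1982):

\[ \operatorname{tr}(P_Z M_X) \leq \min\left(\operatorname{tr}(P_Z), \operatorname{tr}(M_X)\right) = \min(K_Z, n - K_X) = K_Z, \]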

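it follows that:

\[ \left|\mathrm{E}[\hat\beta_{FR} - \beta]\right| \leq \left|\mathrm{E}[\hat\beta_{2SLS} - \beta]\right|. \]

Unsurprisingly, if the covariates $X$ should indeed be excluded from the first-stage reduced form, bias is smaller if one does so. In all other cases the expression for $\mathrm{E}[\hat\beta_{FR} - \beta]$ differs from that for $\mathrm{E}[\hat\beta_{2SLS} - \beta]$.

4.3.2 Inconsistency in the forbidden regression

Finally, note that, in contrast to the correct 2SLS procedure, running the forbidden regression yields an inconsistent estimate of $\beta$. To see why, rewrite 4.21 as:

\[ \mathrm{E}[\hat\beta_{FR} - \beta] = \frac{\begin{aligned} &\Big\{\tfrac{1}{n - K_X}\,\pi'\,\mathrm{E}[X]' P_Z M_X M_Z \mathrm{E}[X] + \gamma'\Big[\Big(1 - \tfrac{\operatorname{tr}(P_Z M_X)}{n - K_X}\Big)\operatorname{cov}(Z, X) + \tfrac{1}{n - K_X}\,\mathrm{E}[Z]' M_X M_Z \mathrm{E}[X]\Big]\Big\}\pi\beta \\ &+ \tfrac{\operatorname{tr}(P_Z M_X)}{n - K_X}\operatorname{cov}(U_D, U) \end{aligned}}{\begin{aligned} &\tfrac{\operatorname{tr}(P_Z M_X)}{n - K_X}\,\pi'\left[\operatorname{var}(X) + \mathrm{E}X' P_Z M_X P_Z \mathrm{E}X\right]\pi + \tfrac{\operatorname{tr}(P_Z M_X)}{n - K_X}\,\pi'\left[\operatorname{cov}(Z, X) + \mathrm{E}X' P_Z M_X \mathrm{E}Z\right]\gamma \\ &+ \tfrac{\operatorname{tr}(P_Z M_X)}{n - K_X}\,\gamma'\left[\operatorname{cov}(Z, X) + \mathrm{E}Z' M_X P_Z \mathrm{E}X\right]\pi + \gamma'\left[\operatorname{var}(Z) + \mathrm{E}Z' M_X \mathrm{E}Z\right]\gamma \\ &+ \tfrac{\operatorname{tr}(P_Z M_X)}{n - K_X}\operatorname{var}(U_D) \end{aligned}}, \]

where we have simply divided by $n - K_X$ in both the numerator and the denominator. It follows, taking the limit, that:

\[ \lim_{n \to \infty} \mathrm{E}[\hat\beta_{FR} - \beta] = \frac{\gamma'\operatorname{cov}(Z, X)\,\pi\beta}{\gamma'\left[\operatorname{var}(Z) + \mathrm{E}Z' M_X \mathrm{E}Z\right]\gamma}. \tag{4.23} \]

As such, running the forbidden regression will yield an inconsistent parameter estimate unless one of the three following conditions holds:

• $\operatorname{cov}(Z, X) = 0$;
• $\pi = 0$;
• $\beta = 0$.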
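The practical content of this result can be seen in a large simulated sample. The sketch below is only an illustration under stated assumptions: it uses NumPy, a scalar covariate and a scalar instrument with mean zero (so no constant is needed), and hypothetical parameter values; `fr_vs_2sls` is an illustrative helper, not a library routine. It does not attempt to reproduce the numerical value of 4.23, only the qualitative conclusion.

```python
import numpy as np

def fr_vs_2sls(n, rho=0.5, pi=1.0, gamma=1.0, beta=1.0, seed=0):
    """Return (correct 2SLS, forbidden regression) estimates of beta.

    Hypothetical design: scalar X and Z with cov(Z, X) = rho, unit variances.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=n)
    Z = rho * X + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
    U_D = rng.normal(size=n)
    U = 0.8 * U_D + rng.normal(size=n)          # cov(U_D, U) != 0
    D = pi * X + gamma * Z + U_D
    Y = beta * D + 1.0 * X + U

    def resid(a, b):
        """Residual of a after projecting on the single regressor b."""
        return a - b * (b @ a) / (b @ b)

    # Correct 2SLS: X is partialled out of Z, D and Y
    Zt, Dt, Yt = resid(Z, X), resid(D, X), resid(Y, X)
    b_2sls = (Zt @ Yt) / (Zt @ Dt)

    # Forbidden regression: first stage on Z only, then X partialled out
    D_fr = Z * (Z @ D) / (Z @ Z)                # P_Z D
    fit = resid(D_fr, X)                        # M_X P_Z D, as in (4.15)
    b_fr = (fit @ Yt) / (fit @ fit)
    return b_2sls, b_fr

b_2sls, b_fr = fr_vs_2sls(n=200_000)             # cov(Z,X), pi, beta all non-zero
_, b_fr_pi0 = fr_vs_2sls(n=200_000, pi=0.0)      # X does not enter the first stage
_, b_fr_rho0 = fr_vs_2sls(n=200_000, rho=0.0)    # Z uncorrelated with X
print(b_2sls, b_fr, b_fr_pi0, b_fr_rho0)
```

With the true coefficient set to one, the correct 2SLS estimate should sit very close to one, while the forbidden regression stays well away from it even at this sample size; setting either $\pi$ or $\operatorname{cov}(Z, X)$ to zero brings it back.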

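4.4 Bias correction and k class

Taking the numerator for 2SLS bias that we have just derived and combining it with the initial expression for the denominator from earlier allows us to write:

\[ \mathrm{E}[\hat\beta_{2SLS} - \beta] = \frac{K_Z\operatorname{cov}(U_D, U)}{D' P_{M_X Z} D}. \]

Though we do not have an estimate of $U$ since it depends upon the parameter, $\beta$, that we are seeking to estimate, we do have an estimate of $U_D$, given by:

\[ \hat U_D = D - X\hat\pi - Z\hat\gamma. \]

From the partitioned inverse formula that we saw in Chapter 2, we know that:

\[ \begin{bmatrix} \hat\pi \\ \hat\gamma \end{bmatrix} = \begin{bmatrix} (X'X)^{-1} X'\left[I - Z(Z' M_X Z)^{-1} Z' M_X\right] D \\ (Z' M_X Z)^{-1} Z' M_X D \end{bmatrix}. \]

It follows that:

\[ \begin{aligned} \hat U_D &= D - X\hat\pi - Z\hat\gamma \\ &= D - X(X'X)^{-1} X'\left[I - Z(Z' M_X Z)^{-1} Z' M_X\right] D - Z(Z' M_X Z)^{-1} Z' M_X D \\ &= \left[I - X(X'X)^{-1} X'\right] D - \left[I - X(X'X)^{-1} X'\right] Z(Z' M_X Z)^{-1} Z' M_X D \\ &= M_X D - M_X Z(Z' M_X Z)^{-1} Z' M_X D \\ &= M_X D - P_{M_X Z} D \\ &= M_X M_{M_X Z} D. \end{aligned} \]

Now note that:

\[ \widehat{\operatorname{cov}}(U_D, U) = \frac{1}{n - K_X - K_Z}\,\mathrm{E}[\hat U_D' U]. \]

Substituting into the expression for bias yields:

\[ \begin{aligned} \mathrm{E}[\hat\beta_{2SLS} - \beta] &= \frac{K_Z}{n - K_X - K_Z}\,\frac{\mathrm{E}\big[\overbrace{D' M_{M_X Z} M_X}^{\hat U_D'}\,\overbrace{(Y - X\alpha - D\beta)}^{U}\big]}{D' P_{M_X Z} D} \\ &= \frac{K_Z}{n - K_X - K_Z}\,\frac{\mathrm{E}\left[D' M_{M_X Z} M_X (Y - D\beta)\right]}{D' P_{M_X Z} D} \\ &\approx \frac{K_Z}{n - K_X - K_Z}\,\frac{D' M_{M_X Z} M_X (Y - D\beta)}{D' P_{M_X Z} D}, \end{aligned} \]

where the second line uses $M_X X = 0$. Now define the bias-corrected estimator $\hat\beta_N$ such that:

\[ \hat\beta_N = \hat\beta_{2SLS} - \widehat{\mathrm{E}}[\hat\beta_{2SLS} - \beta] = \frac{D' P_{M_X Z} Y}{D' P_{M_X Z} D} - \frac{K_Z}{n - K_X - K_Z}\,\frac{D' M_{M_X Z} M_X (Y - D\hat\beta_N)}{D' P_{M_X Z} D}. \]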

Solving for $\hat\beta_N$ then yields:

\[ \hat\beta_N = \frac{D' P_{M_X Z} Y - \frac{K_Z}{n - K_X - K_Z}\,D' M_{M_X Z} M_X Y}{D' P_{M_X Z} D - \frac{K_Z}{n - K_X - K_Z}\,D' M_{M_X Z} M_X D}. \]

This is the estimator first proposed by Nagar (1959). Donald and Newey (2001) have another version of this estimator which is almost identical, but where $K_Z - 2$ takes the place of $K_Z$ in the correction factor (drop the "2" and they become the same). Writing $c_{DN}$ for that modified correction factor:

\[ \hat\beta_{DN} = \frac{D' P_{M_X Z} Y - c_{DN}\,D' M_{M_X Z} M_X Y}{D' P_{M_X Z} D - c_{DN}\,D' M_{M_X Z} M_X D}. \]

Many of the estimators that one uses in practice (including OLS, 2SLS and Nagar, which we have already seen) can be expressed as special cases of what are known as k-class estimators (this goes a long way back to work in the 1950s by Henri Theil). The general form taken by these estimators is:

\[ \hat\beta_\kappa = \frac{D' P_{M_X Z} Y - \kappa\,D' M_{M_X Z} M_X Y}{D' P_{M_X Z} D - \kappa\,D' M_{M_X Z} M_X D}, \tag{4.24} \]

where $\kappa$ is a scalar. You should then have no trouble in seeing that:

• OLS: $\kappa = -1$:
\[ \hat\beta_\kappa = \frac{D' P_{M_X Z} Y + D' M_{M_X Z} M_X Y}{D' P_{M_X Z} D + D' M_{M_X Z} M_X D} = \frac{D' M_X Y}{D' M_X D} = \hat\beta_{OLS}. \]
• 2SLS: $\kappa = 0$:
\[ \hat\beta_\kappa = \frac{D' P_{M_X Z} Y}{D' P_{M_X Z} D} = \hat\beta_{2SLS}. \]
• Nagar: $\kappa = \frac{K_Z}{n - K_X - K_Z}$.

It is harder to see, but nevertheless true that:

• LIML (which stands for "Limited Information Maximum Likelihood"): $\kappa = \tilde\kappa$, the smallest eigenvalue of the matrix $W' P_{M_X Z} W\left(W' M_{M_X Z} M_X W\right)^{-1}$, where $W = [Y, D]$:
\[ \hat\beta_\kappa = \frac{D' P_{M_X Z} Y - \tilde\kappa\,D' M_{M_X Z} M_X Y}{D' P_{M_X Z} D - \tilde\kappa\,D' M_{M_X Z} M_X D} = \hat\beta_{LIML}, \]
• Fuller (1977): with $\tilde\kappa$ the smallest eigenvalue of the matrix $W' P_{M_X Z} W\left(W' M_{M_X Z} M_X W\right)^{-1}$,

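where $W = [Y, D]$, and for a parameter $a > 0$:

\[ \hat\beta_\kappa = \frac{D' P_{M_X Z} Y - \left(\tilde\kappa - \frac{a}{n - K_X - K_Z}\right) D' M_{M_X Z} M_X Y}{D' P_{M_X Z} D - \left(\tilde\kappa - \frac{a}{n - K_X - K_Z}\right) D' M_{M_X Z} M_X D} = \hat\beta_{Fuller}, \]

with $\tilde\kappa$ denoting the smallest eigenvalue just defined.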
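The k-class family in 4.24 is straightforward to compute once the two projection matrices are available. The sketch below is a minimal illustration, assuming NumPy and a hypothetical simulated design; `k_class` and `liml_kappa` are illustrative helper names, the values $\kappa = K_Z/(n - K_X - K_Z)$ for Nagar and $a = 1$ for Fuller are assumptions of this sketch, and building full $n \times n$ matrices is done for transparency rather than efficiency.

```python
import numpy as np

def proj(A):
    """Orthogonal projection matrix onto the column space of A."""
    return A @ np.linalg.solve(A.T @ A, A.T)

def pieces(X, Z):
    """Return P_{M_X Z} and M_{M_X Z} M_X for covariates X and instruments Z."""
    n = X.shape[0]
    M_X = np.eye(n) - proj(X)
    P = proj(M_X @ Z)
    return P, (np.eye(n) - P) @ M_X

def k_class(Y, D, X, Z, kappa):
    """k-class estimate of beta in the parametrisation of (4.24)."""
    P, MM = pieces(X, Z)
    num = D @ P @ Y - kappa * (D @ MM @ Y)
    den = D @ P @ D - kappa * (D @ MM @ D)
    return num / den

def liml_kappa(Y, D, X, Z):
    """Smallest eigenvalue of W'PW (W' M M_X W)^{-1}, with W = [Y, D]."""
    P, MM = pieces(X, Z)
    W = np.column_stack([Y, D])
    A = W.T @ P @ W @ np.linalg.inv(W.T @ MM @ W)
    return float(np.linalg.eigvals(A).real.min())

# Hypothetical simulated design (all values are illustrative assumptions)
rng = np.random.default_rng(4)
n, K_X, K_Z = 300, 2, 4
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, K_Z))
U_D = rng.normal(size=n)
U = 0.8 * U_D + rng.normal(size=n)
D = X @ np.array([1.0, 0.5]) + Z @ np.full(K_Z, 0.3) + U_D
Y = 2.0 * D + X @ np.array([1.0, -1.0]) + U

b_ols = k_class(Y, D, X, Z, kappa=-1.0)
b_2sls = k_class(Y, D, X, Z, kappa=0.0)
b_nagar = k_class(Y, D, X, Z, kappa=K_Z / (n - K_X - K_Z))
kap = liml_kappa(Y, D, X, Z)
b_liml = k_class(Y, D, X, Z, kappa=kap)
b_fuller = k_class(Y, D, X, Z, kappa=kap - 1.0 / (n - K_X - K_Z))  # a = 1

# Reference values computed independently of k_class
ols_ref = np.linalg.lstsq(np.column_stack([D, X]), Y, rcond=None)[0][0]
P_ref, _ = pieces(X, Z)
tsls_ref = (D @ P_ref @ Y) / (D @ P_ref @ D)
print(b_ols, b_2sls, b_nagar, b_liml, b_fuller)
```

Setting `kappa=-1.0` reproduces the coefficient on $D$ from an ordinary regression of $Y$ on $D$ and $X$, and `kappa=0.0` reproduces 4.9, which is a convenient sanity check on any implementation of 4.24.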