IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors Laura Mayoral IAE, Barcelona GSE and University of Gothenburg Gothenburg, May 2015

Roadmap of the course
- Introduction. Why do we need IVs?
- IV estimation: 2SLS (brief overview), GMM and LIML (even briefer!).
- Consequences of the failure of the assumptions behind IV estimation: irrelevant/weak instruments and endogenous/weakly endogenous instruments.
- Detection of weak instruments.
- Inference robust to weak instruments.

1. Introduction
1.1 Causality versus correlation
Goal: estimate the causal effect of some independent variable(s), X, on a dependent variable, Y.
OLS on a linear model relating Y and X: $Y = X\beta + U$
We obtain: $\hat\beta_{ols}$, $\hat\Sigma_{\hat\beta}$

Two key questions
- Under what circumstances does $\hat\beta_{ols}$ measure the causal effect of X on Y?
- How should we interpret $\hat\beta_{ols}$ when these conditions do not hold?

1.2. Endogeneity
Consider the simplest case: X contains a constant and one variable, x. Then,
$\hat\beta_{ols} = (X'X)^{-1}X'Y = \beta + (X'X)^{-1}X'U$ (1)
and, for the slope coefficient,
$\hat\beta_{ols} \xrightarrow{p} \beta + \frac{cov(x, U)}{var(x)}$ (2)
Consistency of $\hat\beta_{ols}$ requires $cov(x, U) = 0$.
If $cov(x, U) \neq 0$, X is endogenous.
If X is endogenous, what is $\hat\beta_{ols}$ measuring?
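A small simulation can make equation (2) concrete. The sketch below is only an illustration under assumed values (true slope 1, cov(x, U) = 0.5, var(x) = 1, normal noise); none of these numbers come from the course.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

random.seed(0)
n = 20000
beta = 1.0   # assumed true causal effect
rho = 0.5    # assumed cov(x, U); var(x) = 1 by construction

x = [random.gauss(0, 1) for _ in range(n)]
u = [rho * xi + random.gauss(0, 1) for xi in x]   # U is correlated with x
y = [beta * xi + ui for xi, ui in zip(x, u)]

beta_ols = cov(x, y) / cov(x, x)   # OLS slope when a constant is included
plim_ols = beta + rho / 1.0        # beta + cov(x, U) / var(x), as in equation (2)

print(f"OLS slope: {beta_ols:.3f}, predicted limit: {plim_ols:.3f}, true beta: {beta}")
```

With a large sample the OLS slope should sit close to the predicted limit rather than to the true slope, which is the sense in which OLS is inconsistent under endogeneity.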

1.3. Main causes of endogeneity
1. Omitted variables
2. Errors-in-variables (measurement error)
3. Simultaneous causality

i) Omitted variables
Problem: you should estimate $Y = X\beta + Z\delta + U$, but you estimate $Y = X\beta + U^*$, where $U^* = Z\delta + U$.
If $cov(X, Z) \neq 0$, X is endogenous.
The bias:
$\hat\beta_{ols} = (X'X)^{-1}X'Y = \beta + (X'X)^{-1}X'Z\delta + (X'X)^{-1}X'U \xrightarrow{p} \beta + E(X'X)^{-1}E(X'Z)\delta$

Example
Effect of an additional year of schooling on wages:
$\log(wage_i) = \beta_0 + \beta_1 tenure_i + \beta_2 educ_i + u_i$
- Innate ability, IA, is a missing (omitted) variable.
- educ and IA are likely to be correlated, so educ is endogenous.
- Is $\hat\beta_{2,ols}$ likely to overestimate or underestimate $\beta_2$?
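For intuition on the sign of the bias, the omitted-variable formula reduces to a simple expression when there is one regressor x and one omitted variable z with coefficient $\delta$ (plus a constant). This is a simplified sketch: it ignores tenure, and the signs assumed for ability are illustrative assumptions rather than results from the course.

```latex
\[
\hat\beta_{ols} \xrightarrow{p} \beta + \delta \, \frac{cov(x, z)}{var(x)}
\]
% Assumed for illustration: ability raises wages (\delta > 0) and is
% positively correlated with schooling (cov(educ, IA) > 0).
\[
\delta > 0, \quad cov(educ, IA) > 0
\;\Longrightarrow\;
\delta \, \frac{cov(educ, IA)}{var(educ)} > 0
\]
```

Under those assumed signs the OLS coefficient on educ would tend to overestimate $\beta_2$; if the two signs differed, it would tend to underestimate it.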

Solutions to omitted variables
- Include Z in the regression.
- If Z is not available but a proxy for Z, Z*, is, include the proxy. Example: wage vs. years of education, with an IQ score as a proxy for IA.
- Use panel data.
- Find an instrument for X and estimate your model using IV methods.

ii) Simultaneous causality
Problem: the direction of causality runs both ways, $X \Rightarrow Y$ and $Y \Rightarrow X$.
X is endogenous because:
$Y = X\beta + U$
$X = Y\delta + V$
$cov(X, U) \neq 0$ because X is a function of Y and Y is correlated with U!
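The claim can be checked by solving the two equations for X. The derivation below is a sketch for the scalar case and adds two assumptions not stated on the slide: $\beta\delta \neq 1$ and $cov(U, V) = 0$.

```latex
\[
X = \delta (X\beta + U) + V
\;\Longrightarrow\;
X (1 - \beta\delta) = \delta U + V
\;\Longrightarrow\;
X = \frac{\delta U + V}{1 - \beta\delta}
\]
\[
cov(X, U) = \frac{\delta \, var(U) + cov(V, U)}{1 - \beta\delta}
          = \frac{\delta \, \sigma_U^2}{1 - \beta\delta}
\quad \text{when } cov(U, V) = 0
\]
```

So whenever there is feedback from Y to X ($\delta \neq 0$), X is correlated with U and OLS on the first equation is inconsistent.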

Example
Elasticity of demand for wheat. Prices and quantities are jointly determined through the interaction of the supply and demand curves:
$\log(q_i) = \beta_0 + \beta_1 \log(p_i) + U_i$
P is endogenous.

Solution
- Instrumental variables.
- Other solutions: a randomized controlled experiment with no reverse causality.

iii) Errors-in-variables
Problem: $Y = X\beta + U$, but X is measured with error: we observe $X^*$ instead of X.
Why errors? Typographical errors, survey data, etc.
You estimate $Y = X^*\beta + U^*$, where the error term absorbs the measurement error:
$Y = X^*\beta + ((X - X^*)\beta + U)$
$X^*$ is endogenous if it is correlated with the measurement error $e = (X - X^*)$.
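A standard special case shows what this correlation does to OLS. The sketch below writes $X^*$ for the mismeasured scalar regressor and assumes classical measurement error, $X^* = X + w$ with $w$ uncorrelated with both X and U (so that $e = X - X^* = -w$); this assumption is added here for illustration and is not made on the slide.

```latex
\[
cov(X^*, e\beta + U) = cov(X + w, \, -\beta w + U) = -\beta \sigma_w^2
\]
\[
\hat\beta_{ols} \xrightarrow{p} \beta + \frac{cov(X^*, e\beta + U)}{var(X^*)}
= \beta - \beta \, \frac{\sigma_w^2}{\sigma_X^2 + \sigma_w^2}
= \beta \, \frac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2}
\]
```

Under this assumption the OLS estimate is pulled toward zero (attenuation bias), and the more so the noisier the measurement.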

Solutions to measurement error
- Get a more accurate measure of X.
- Use instrumental variables.

1.4. A general solution to endogeneity: instrumental variables
Suppose you suspect that X is endogenous, so you want to instrument it.
Simplest case: one endogenous variable, one IV.
$Y_i = \beta_0 + \beta_1 X_i + u_i$, with $X_i$ endogenous.
Simplest approach: use 2SLS to estimate $\beta$.

Two stage least squares, I
First stage: decompose X into two parts:
- a problematic component, correlated with U;
- a problem-free component, uncorrelated with U.
Second stage: regress Y on the problem-free component of X.

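Two stage least squares, II
Two OLS regressions:
First stage: $X_i = \pi_0 + \pi_1 Z_i + \nu_i \Rightarrow \hat X_i = \hat\pi_0 + \hat\pi_1 Z_i$
Second stage: $Y_i = \hat\beta_0 + \hat\beta_1 \hat X_i + \hat u_i$
It is easy to show that, in terms of sample covariances, $\hat\beta_1^{2sls} = \frac{cov(Z_i, Y_i)}{cov(Z_i, X_i)}$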
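The two stages can be sketched in a few lines of plain Python. This is an illustrative simulation, not the course's own code: the data-generating values (intercept 2, slope 1, first-stage coefficient 1, error correlation through the factor 0.8) and the helper names `cov` and `ols` are assumptions made for the example.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def ols(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    slope = cov(x, y) / cov(x, x)
    intercept = sum(y) / len(y) - slope * sum(x) / len(x)
    return intercept, slope

random.seed(1)
n = 5000
beta0, beta1 = 2.0, 1.0          # assumed true coefficients
z, x, y = [], [], []
for _ in range(n):
    zi = random.gauss(0, 1)      # instrument, independent of u
    ui = random.gauss(0, 1)
    vi = 0.8 * ui + random.gauss(0, 1)   # first-stage error, correlated with u
    xi = 0.5 + 1.0 * zi + vi     # X is endogenous through vi
    z.append(zi)
    x.append(xi)
    y.append(beta0 + beta1 * xi + ui)

# First stage: regress X on Z and keep the fitted values.
pi0_hat, pi1_hat = ols(z, x)
x_hat = [pi0_hat + pi1_hat * zi for zi in z]

# Second stage: regress Y on the fitted values.
b0_2sls, b1_2sls = ols(x_hat, y)

b1_ratio = cov(z, y) / cov(z, x)   # covariance-ratio form of the same estimator
b1_ols = ols(x, y)[1]              # naive OLS slope, for comparison

print(f"2SLS: {b1_2sls:.3f}, cov ratio: {b1_ratio:.3f}, OLS: {b1_ols:.3f}")
```

The second-stage slope and the covariance ratio agree up to floating-point error, while the naive OLS slope drifts away from the true value. Note that the standard errors printed by a manual second-stage regression would not be valid, which is one reason to use a dedicated IV routine in practice.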

Two stage least squares, III
If the sample size is large enough (under suitable assumptions), sample moments are close to the population moments:
$\frac{cov(Z_i, Y_i)}{cov(Z_i, X_i)} = \frac{\beta_1 cov(Z_i, X_i) + cov(Z_i, U_i)}{cov(Z_i, X_i)} = \beta_1 + \frac{cov(Z_i, U_i)}{cov(Z_i, X_i)}$
For $\hat\beta_1^{2sls}$ to approach $\beta_1$ as the sample size gets large, we need $\frac{cov(Z_i, U_i)}{cov(Z_i, X_i)} = 0$

Two stage least squares, IV: instrument validity
Z is a valid instrument for X if:
- Instrument relevance: $cov(Z_i, X_i) \neq 0$ (relevance condition)
- Instrument exogeneity: $cov(Z_i, U_i) = 0$ (exclusion condition)

Example I: 2SLS using Stata
Estimating the elasticity of demand for flaxseed.
First application of IV regression (1926); annual data, 1904-23.
$\log(q_i) = \beta_0 + \beta_1 \log(p_i) + \beta_2 year_i + u_i$
$\beta_1$ = % increase in Q if P increases by 1%.
IV: (log) rainfall (a "supply shifter").

Course overview, I
This course focuses on understanding the behaviour of IV estimators when the above assumptions fail or almost fail.
- If the relevance restriction fails (or is close to failing), Z is irrelevant (or weak).
- If the exclusion restriction fails (or is close to failing), Z is endogenous (or weakly endogenous).

Course overview, II
We'll focus on:
- How to estimate $\beta$ using IVs.
- What happens to your estimators and hypothesis tests when
  - Z is weak (the literature is ample here);
  - Z is (weakly) endogenous (the literature is much more limited here);
  - Z is weak and endogenous (this case is still largely unexplored).
- How to detect weak instruments.
- How to carry out inference when instruments are weak.

The literature typically treats the relevance and the exogeneity of the instruments as two different problems. But recall that, in large samples,
$\hat\beta_1^{2sls} \xrightarrow{p} \beta_1 + \frac{cov(Z_i, U_i)}{cov(Z_i, X_i)}$
Thus, what matters is the ratio of these two quantities.
When searching for IVs, we often face a tradeoff:
- More exogenous instruments (i.e., less correlated with the error term) are often weaker.
- Stronger instruments are usually at greater risk of being endogenous (i.e., more correlated with the error term).
Our conclusions will hopefully shed light on how to look for IVs in applied work.
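To see why the ratio matters, the sketch below holds a small amount of instrument endogeneity fixed and varies only the instrument's strength. All numbers (cov(Z, U) = 0.05, cov(Z, X) equal to 1.0 or 0.1, sample size 100,000) and the function name `iv_bias` are assumptions chosen for illustration.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def iv_bias(pi, gamma, n=100000, beta=1.0, seed=0):
    """IV estimate minus the true beta for one simulated sample.

    pi    = cov(Z, X): instrument strength (assumed value)
    gamma = cov(Z, U): instrument endogeneity (assumed value)
    """
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)
        e = rng.gauss(0, 1)
        ui = gamma * zi + e                         # error term, correlated with Z
        xi = pi * zi + 0.5 * e + rng.gauss(0, 1)    # X is endogenous through e
        z.append(zi)
        x.append(xi)
        y.append(beta * xi + ui)
    return cov(z, y) / cov(z, x) - beta

# Same small endogeneity, different strength. The formula predicts a bias of
# gamma / pi: 0.05 / 1.0 = 0.05 for the strong and 0.05 / 0.1 = 0.5 for the weak instrument.
bias_strong = iv_bias(pi=1.0, gamma=0.05)
bias_weak = iv_bias(pi=0.1, gamma=0.05)

print(f"bias with strong instrument: {bias_strong:.3f}")
print(f"bias with weak instrument:   {bias_weak:.3f}")
```

The same small violation of exogeneity is nearly harmless with a strong instrument and substantial with a weak one, because the denominator cov(Z, X) magnifies it.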

2. IV estimation
A bit of history: http://scholar.harvard.edu/stock/content/history-iv-regression

The Wrights (father and son) were interested in how to estimate the slopes of the supply and demand curves of agricultural products when observed prices and quantities are determined by the intersection of the supply and demand lines. Wright (1928) first solved the problem, noting that if there is an observed variable that shifts supply but not demand, this variable can be used to estimate the slope of the demand curve.

Different IV estimation procedures
- Two stage least squares (2SLS). Very popular, mainly because it's efficient (in the class of IV estimators) under conditional homoskedasticity.
- GMM (Hansen, 1982): more efficient than 2SLS under heteroskedasticity. See Wooldridge 2010 (Chapter 8).
- k-class estimators: LIML, Fuller k-estimators, ... (OLS and 2SLS also belong to this group).
- Limited information maximum likelihood (LIML): less precise than 2SLS but less biased when instruments are weak (we'll come back to this point).
In this short course we'll mainly focus on 2SLS. See Andrews and Stock (2005) and Stock and Yogo (2005) for a more general treatment.

2SLS: general treatment
Consider the following model:
$y = X\beta_0 + \epsilon$ (3)
where
- X is an $N \times K$ matrix; several components of X can be correlated with $\epsilon$ (multiple endogenous regressors).
- Z is an $N \times L$ matrix containing the exogenous variables in X plus the instruments.
- The number of instruments can be larger than the number of endogenous regressors, so $L \geq K$.

The 2SLS estimate of $\beta$ is given by
$\hat\beta_{2sls} = (X'P_z X)^{-1}(X'P_z y)$, where $P_z = Z(Z'Z)^{-1}Z'$
If both Z and X are $N \times 1$ (simplest case, as above):
$\hat\beta_{2sls} = \frac{Z'y}{Z'X}$
$(\hat\beta_{2sls} - \beta_0) = \frac{Z'\epsilon}{Z'X}$
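The matrix formula can be sketched with nothing but the Python standard library. The helpers `transpose`, `matmul`, `solve` and `tsls` are hypothetical names written for this example, and the simulated design (a constant, one endogenous regressor, two instruments, true coefficients 2 and 1) is an assumption; in practice one would rely on an established econometrics package.

```python
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    k = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[k:] for row in M]

def tsls(X, Z, y):
    """2SLS: (X'Pz X)^{-1} X'Pz y, computed without forming the N x N matrix Pz."""
    Zt = transpose(Z)
    ZtZ = matmul(Zt, Z)                      # L x L
    ZtX = matmul(Zt, X)                      # L x K
    Zty = matmul(Zt, [[yi] for yi in y])     # L x 1
    XtZ = transpose(ZtX)                     # K x L
    XPX = matmul(XtZ, solve(ZtZ, ZtX))       # X'Pz X, K x K
    XPy = matmul(XtZ, solve(ZtZ, Zty))       # X'Pz y, K x 1
    return [row[0] for row in solve(XPX, XPy)]

random.seed(2)
n = 5000
beta0 = [2.0, 1.0]              # assumed true coefficients (constant, x)
X, Z, y = [], [], []
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    eps = random.gauss(0, 1)
    x = 1.0 + z1 + 0.5 * z2 + 0.8 * eps + random.gauss(0, 1)   # endogenous regressor
    X.append([1.0, x])
    Z.append([1.0, z1, z2])     # exogenous constant plus two instruments: L = 3 > K = 2
    y.append(beta0[0] + beta0[1] * x + eps)

beta_2sls = tsls(X, Z, y)
print("2SLS estimate:", [round(b, 3) for b in beta_2sls])
```

Writing $X'P_z X$ as $(X'Z)(Z'Z)^{-1}(Z'X)$ keeps every intermediate matrix small (at most L by L), which is why the projection matrix itself is never built.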

Asymptotic properties
Consistency
Two standard conditions are needed (same as before!):
1. $E(Z'\epsilon) = 0$ (exclusion condition)
2. $rank(E(Z'X)) = K$ (relevance condition)
Condition 1 implies that Z is exogenous; condition 2, that it is strong.
Under these conditions, $\hat\beta_{2sls}$ is consistent: $\hat\beta_{2sls} - \beta_0 \xrightarrow{p} 0$

Asymptotic distribution
By the CLT, under conditional homoskedasticity of $\epsilon$ (T denotes the sample size, called N above),
$T^{1/2}(\hat\beta_{2sls} - \beta_0) \xrightarrow{d} N(0, \sigma_\epsilon^2 (E(X'Z) E(Z'Z)^{-1} E(Z'X))^{-1})$
If Z and X are univariate and $X = \Pi Z + v$:
$T^{1/2}(\hat\beta_{2sls} - \beta_0) \xrightarrow{d} N(0, \sigma_\epsilon^2 \frac{E(Z'Z)}{E(Z'X)^2}) = N(0, \frac{\sigma_\epsilon^2}{E(Z'Z)\Pi^2})$
Remark: the asymptotic variance is closely related to a measure of the strength of the instrument, $\mu^2$ (we'll come back to this point later on), where
$\mu^2 = \frac{E(Z'Z)\Pi^2}{\sigma_v^2}$
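The step from the general variance to the univariate one, and its link to $\mu^2$, can be spelled out as follows. The derivation assumes that the first-stage error is uncorrelated with the instrument, $E(Z'v) = 0$, which the slide does not state explicitly.

```latex
\[
E(Z'X) = E\big(Z'(\Pi Z + v)\big) = \Pi \, E(Z'Z) + E(Z'v) = \Pi \, E(Z'Z)
\]
\[
\sigma_\epsilon^2 \, \frac{E(Z'Z)}{E(Z'X)^2}
= \sigma_\epsilon^2 \, \frac{E(Z'Z)}{\Pi^2 \, E(Z'Z)^2}
= \frac{\sigma_\epsilon^2}{E(Z'Z)\,\Pi^2}
= \frac{\sigma_\epsilon^2}{\sigma_v^2} \cdot \frac{1}{\mu^2}
\]
```

The asymptotic variance is therefore inversely proportional to $\mu^2$: the weaker the instrument, the less precise the 2SLS estimate.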

Other IV estimation procedures
- 2SLS is very popular mainly because, under homoskedasticity of $\epsilon$, it is efficient (i.e., has the smallest variance) in the class of all instrumental variables estimators using instruments linear in Z.
- GMM (Hansen, 1982): more efficient than 2SLS if homoskedasticity fails. See Wooldridge, Chapter 8.
- Limited information maximum likelihood (Anderson and Rubin, 1949). LIML is a linear combination of the OLS and 2SLS estimates (with weights depending on the data), and the weights happen to be such that they (approximately) eliminate the 2SLS bias. Although less precise than 2SLS in general, LIML behaves much better than 2SLS under deviations from the standard assumptions (we'll come back to this).

Wrapping up
- We are usually interested in estimating causal effects.
- Independent variables are often endogenous, for different reasons.
- IV estimation is a general solution to the problem above.
- When carefully implemented, IV is great!
- But unfortunately, there are some problems as well...

Why should we not always use IV?
- It's difficult to find good IVs.
- Even if instruments are good, IV estimators are biased and their finite-sample properties are often problematic: most of the justification for the use of IV is asymptotic, and performance in small samples may be poor. (Remember: OLS is consistent AND unbiased if X is exogenous.)
- Even if instruments are good, the precision of IV estimates is lower than that of OLS estimates (i.e., standard errors are larger).
- IV estimators have good properties ONLY if instruments are both exogenous and strong. But what happens otherwise?
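The loss of precision can be illustrated with a small Monte Carlo experiment. In this sketch X is deliberately exogenous, so that OLS and IV are both consistent and only their spread differs; the sample size (100), the number of replications (2000), the first-stage coefficient (0.6) and the helper name `slope` are assumptions made for the example.

```python
import random
import statistics

def slope(w, x, y):
    """Simple IV slope cov(w, y) / cov(w, x); with w = x this is the OLS slope."""
    mw = sum(w) / len(w)
    num = sum((wi - mw) * yi for wi, yi in zip(w, y))
    den = sum((wi - mw) * xi for wi, xi in zip(w, x))
    return num / den

random.seed(3)
n, reps, beta = 100, 2000, 1.0     # assumed sample size, replications, true slope
ols_draws, iv_draws = [], []
for _ in range(reps):
    z = [random.gauss(0, 1) for _ in range(n)]
    x = [0.6 * zi + random.gauss(0, 1) for zi in z]     # x is exogenous here
    y = [beta * xi + random.gauss(0, 1) for xi in x]
    ols_draws.append(slope(x, x, y))    # OLS uses all the variation in x
    iv_draws.append(slope(z, x, y))     # IV uses only the part of x driven by z

sd_ols = statistics.stdev(ols_draws)
sd_iv = statistics.stdev(iv_draws)
print(f"sd(OLS) = {sd_ols:.3f}, sd(IV) = {sd_iv:.3f}")
```

Because IV discards the variation in X that is not explained by Z, its estimates are more dispersed than the OLS ones, and the gap widens as the first-stage relationship weakens.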

A few examples using Stata
You can use the built-in command ivreg (replaced by ivregress since Stata 10) or the user-written ivreg2. They are similar, but the latter provides more complete output.
A few examples using ivreg (from Wooldridge): http://fmwww.bc.edu/gstat/examples/wooldridge/wooldridge15.html
And some more: http://www.nuff.ox.ac.uk/teaching/economics/bond/IV%20Estimatio 20Using%20Stata.pdf