Applied Econometrics (MSc.) Lecture 3 Instrumental Variables

Similar documents
Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Instrumental Variables and the Problem of Endogeneity

Econ 510 B. Brown Spring 2014 Final Exam Answers

Dealing With Endogeneity

Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16)

Instrumental Variables, Simultaneous and Systems of Equations

Instrumental Variables

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors

1 Motivation for Instrumental Variable (IV) Regression

Economics 308: Econometrics Professor Moody

Handout 12. Endogeneity & Simultaneous Equation Models

Graduate Econometrics Lecture 4: Heteroskedasticity

Reliability of inference (1 of 2 lectures)

Multiple Linear Regression CIVL 7012/8012

ECON Introductory Econometrics. Lecture 16: Instrumental variables

08 Endogenous Right-Hand-Side Variables. Andrius Buteikis,

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors

ECON 4160: Econometrics-Modelling and Systems Estimation Lecture 9: Multiple equation models II

Instrumental variables estimation using heteroskedasticity-based instruments

Internal vs. external validity. External validity. This section is based on Stock and Watson s Chapter 9.

CHAPTER 6: SPECIFICATION VARIABLES

Linear Models in Econometrics

Instrumental Variables

Applied Statistics and Econometrics. Giuseppe Ragusa Lecture 15: Instrumental Variables

8. Instrumental variables regression

Empirical Economic Research, Part II

Econometrics of Panel Data

Simultaneous Equations with Error Components. Mike Bronner Marko Ledic Anja Breitwieser

Problem Set # 1. Master in Business and Quantitative Methods

Ch 7: Dummy (binary, indicator) variables

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Econometrics Summary Algebraic and Statistical Preliminaries

4 Instrumental Variables Single endogenous variable One continuous instrument. 2

1 Outline. 1. Motivation. 2. SUR model. 3. Simultaneous equations. 4. Estimation

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models

Week 11 Heteroskedasticity and Autocorrelation

Lecture Notes on Measurement Error

Chapter 6. Panel Data. Joan Llull. Quantitative Statistical Methods II Barcelona GSE

Econometrics. 7) Endogeneity

EMERGING MARKETS - Lecture 2: Methodology refresher

Wooldridge, Introductory Econometrics, 3d ed. Chapter 16: Simultaneous equations models. An obvious reason for the endogeneity of explanatory

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics Homework 4 Solutions

5. Erroneous Selection of Exogenous Variables (Violation of Assumption #A1)

Using Instrumental Variables to Find Causal Effects in Public Health

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data

ECO375 Tutorial 8 Instrumental Variables

Økonomisk Kandidateksamen 2004 (I) Econometrics 2. Rettevejledning

Econometrics Multiple Regression Analysis: Heteroskedasticity

ECNS 561 Multiple Regression Analysis

Truncation and Censoring

Measurement Error. Often a data set will contain imperfect measures of the data we would ideally like.

Outline. Nature of the Problem. Nature of the Problem. Basic Econometrics in Transportation. Autocorrelation

4 Instrumental Variables Single endogenous variable One continuous instrument. 2

Iris Wang.

ECON 497: Lecture Notes 10 Page 1 of 1

Simultaneous Equation Models (Book Chapter 5)

Applied Quantitative Methods II

Multiple Regression Analysis. Part III. Multiple Regression Analysis

Endogeneity. Tom Smith

We begin by thinking about population relationships.

Simultaneous Equation Models Learning Objectives Introduction Introduction (2) Introduction (3) Solving the Model structural equations

Topic 7: HETEROSKEDASTICITY

The general linear regression with k explanatory variables is just an extension of the simple regression as follows

Controlling for Time Invariant Heterogeneity

Lecture 4: Heteroskedasticity

LECTURE 11. Introduction to Econometrics. Autocorrelation

On Variance Estimation for 2SLS When Instruments Identify Different LATEs

Warwick Economics Summer School Topics in Microeconometrics Instrumental Variables Estimation

Econometrics Honor s Exam Review Session. Spring 2012 Eunice Han

Econometrics Problem Set 11

Economics 582 Random Effects Estimation

An explanation of Two Stage Least Squares

Instrumental variables estimation using heteroskedasticity-based instruments

The regression model with one stochastic regressor (part II)

Rockefeller College University at Albany


Econometrics 2, Class 1

More on Roy Model of Self-Selection

Dynamic Panels. Chapter Introduction Autoregressive Model

Applied Microeconometrics (L5): Panel Data-Basics

Econometrics I Lecture 7: Dummy Variables

Contest Quiz 3. Question Sheet. In this quiz we will review concepts of linear regression covered in lecture 2.

Birkbeck Working Papers in Economics & Finance

SIMULTANEOUS EQUATION MODEL

Identifying the Monetary Policy Shock Christiano et al. (1999)

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Handout 11: Measurement Error

1. You have data on years of work experience, EXPER, its square, EXPER2, years of education, EDUC, and the log of hourly wages, LWAGE

Re-estimating Euler Equations

Multiple Regression Analysis

Identification of the New Keynesian Phillips Curve: Problems and Consequences

The Simple Linear Regression Model

OSU Economics 444: Elementary Econometrics. Ch.10 Heteroskedasticity

Ability Bias, Errors in Variables and Sibling Methods. James J. Heckman University of Chicago Econ 312 This draft, May 26, 2006

Basic econometrics. Tutorial 3. Dipl.Kfm. Johannes Metzler

A Guide to Modern Econometric:

Lecture 12: Application of Maximum Likelihood Estimation:Truncation, Censoring, and Corner Solutions

Transcription:

Applied Econometrics (MSc.) Lecture 3 Instrumental Variables Estimation - Theory Department of Economics University of Gothenburg December 4, 2014 1/28

Why IV estimation? So far, in OLS, we assumed independence. E{x i ε i } = 0. In other words, all the explanatory variables are exogenous. There are a number of cases in economics where this assumption is unrealistic. When a variable is endogenous, the error term will be correlated with the explanatory variable. Thus, OLS is no more unbiased and inconsistent. The linear model no longer corresponds to a conditional expectation or a best linear approximation Many reasons for contemporaneous correlation between the error term and one or more of the X variables. 2/28

Causes of Endogeneity 1.Introduction of a lagged dependent variable A regression equation may contain a lagged dependent variable as one explanatory variable. Common in Labor Economics (unemployment duration models), Development Economics (poverty and consumption dynamics), and Agricultural Economics (technology adoption). Let the regression equation be given by y t = β 1 + β 2 x t + β 3 y t 1 + ε t (1) suppose ε t follows a MA(1) process given as ε t = ρε t 1 + υ t (2) 3/28

Causes of Endogeneity 1.Introduction of a lagged dependent variable contd... Rewriting the model as This implies y t = β 1 + β 2 x t + β 3 y t 1 + ρε t 1 + υ t y t 1 = β 1 + β 2 x t 1 + β 3 y t 2 + ε t 1 The lagged dependent variable and the error term are correlated and OLS will be biased and inconsistent! The possible solution is an IV technique. 4/28

Causes of Endogeneity 2. Measurement error in the explanatory variables measurement error in one or more of the explanatory variables leads to correlation with the error term suppose the relationship: y t = β 1 + β 2 w t + ν t you can think of y t as household savings, and w t as disposable income where the error term has a zero mean and finite variance 5/28

Causes of Endogeneity 2. Measurement error in the explanatory variables contd... E{ν t w t } = 0 implying that E{y t w t } = β 1 + β 2 w t If there is measurement error, x t = w t + u t. Even under a set of simplifying assumptions such as 1 u i (0, σ 2 u), 2 u i is independent of ν i, and 3 The measurement error is independent of the underlying true value w t Estimating the equation y t = β 1 + β 2 x t + ε t using OLS will result in biased and inconsistent parameter estimates Note: ε t = ν t β 2 u t 6/28

Causes of Endogeneity 3. Omitted variable bias One of the most common cause of endogeneity An omitted variable (which is captured by the error term) is correlated with one or more of the explanatory variables Also arises from unobservable omitted factors correlated with the explanatory variable(s) Consider an individual wage equation given by y i = x 1iβ 1 + x 2i β 2 + u i γ + ν i 7/28

Causes of Endogeneity 3. Omitted variable bias contd... where: y i is log wage of an individual, x 1i is a vector of individual characteristics,and x 2i refers to years of schooling. Let the variable u i represent ability Obviously, ability and years of schooling will be correlated = OLS will be biased and inconsistent Assuming for instance that E(x i, u i ) > 0, = OLS overestimates the returns to schooling! We ll see an application on this in the next section 8/28

Causes of Endogeneity 4. Simultaneity and reverse causality A situation where not only x i has an impact on y i, but also y i an impact on one or more of the x. Reverse causality arises when the two variables (x i and y i ) are determined simultaneously. A number of examples in Macro-economics where one needs a system of equations to determine endogenous variables. E.g: Demand and Supply. A classic example: the simple Keynesian consumption function - with a closed economy and no government. 9/28

Causes of Endogeneity 4. Simultaneity and reverse causality contd... Assume a closed economy with no government Let the aggregate consumption function be given by C t = a t + by t + ε t Where t = 1,..., T years, and 0 < b < 1 Aggregate income will be determined by the identity. Y t = C t + I t + ε t Where I t represents private investment - assumed exogenous. 10/28

Causes of Endogeneity 4. Simultaneity and reverse causality contd... It is easy to show in the consumption function that Cov(Y t, ε t ) = 0 What will be the change in consumption for a unit of change in income? Estimating the equation using OLS will result in inconsistent estimates because consumption and income are endogenous. One needs to solve for the reduced form equations 11/28

Causes of Endogeneity 4. Simultaneity and reverse causality contd... Y = C = a 1 b + 1 1 b I + 1 1 b ε a 1 b + b 1 b I + 1 1 b ε Which can be written in more general form as Y = π 1 + π 2 I + υ 1 C = π 1 + π 2 I + υ 2 OLS can be used to estimate the equations separately. These coefficients (may be) used to estimate b depending on identification. 12/28

Causes of Endogeneity 4. Simultaneity and reverse causality contd... More specifically, one needs to revert to other methods like, IV (instrumental variable), ILS (Indirect Least Squares), 2SLS(Two-stage least squares - a special case of the IV technique), LI/ML (Limited Information, Maximum Likelihood) methods, depending on identification. (We don t discuss these in detail in this lecture). 13/28

Instrumental Variables (IV) Estimation Single Endogenous Regressor and Single IV Consider the linear wage model y i = x 1iβ 1 + x 2i β 2 + ε i (3) To make the conditional expectation (the best linear approximation) of y i given x 1i and x 2i, we needed to impose and E{ε i x 1i } = 0 (4) E{ε i x 2i } = 0 (5) If not, the model no longer corresponds to E{y i x 1i, x 2i } = OLS will be biased and inconsistent In the above wage equation, ability or intelligence (which is unobserved and hence included in ε i ) would be correlated with education 14/28

Instrumental Variables (IV) Estimation Single Endogenous Regressor and Single IV cont. In such a case, E{ε i x 2i } = 0 and we say that x 2i is endogenous Other than education, many variables in the wage equation (union status, sickness, industry and marital status) are in fact potentially endogenous Married individuals earn on average 10% more wage than unmarried individuals in the US But this is not reflecting the causal effect of being married, rather it reflects the difference in unobservable characteristics of married and unmarried people Under additional model identifying assumptions, we would be able to derive another estimator The conditions in (4) and (5) are called moment conditions These conditions would be sufficient to identify the unknown parameters in the model 15/28

Instrumental Variables (IV) Estimation Single Endogenous Regressor and Single IV cont. The K parameters in β 1 and β 2 should be such that the following K equalities hold: E{(y i x 1iβ 1 x 2i β 2 )x 1i } = 0 (6) E{(y i x 1iβ 1 x 2i β 2 )x 2i } = 0 (7) These conditions are imposed on the estimator when estimating OLS through the corresponding sample moments That is, the OLS estimator b = (b 1, b 2) for β = (β 1, β 2) is solved from 1 N 1 N N (y i x 1ib 1 x 2i b 2 )x 1i = 0 (8) i=1 N (y i x 1ib 1 x 2i b 2 )x 2i = 0 (9) i=1 16/28

Instrumental Variables (IV) Estimation Single Endogenous Regressor and Single IV cont. These are the first-order conditions for the minimization of the least square criterion and the number of conditions = K (number of unknown parameters) b 1 and b 2 can be solved uniquely from (8) and (9) When (5) is violated, (9) drops out and we can no longer solve for b 1 and b 2 = β 1 and β 2 are no longer identified Identification requires at least one additional moment condition which is possible when we have what is called an Instrumental Variable (IV) An instrumental variable z 2i in this case is a variable such that: E{ε i z 2i } = 0 (the IV is exogenous) and E{z 2i x 2i } = 0 (the IV is relevant) 17/28

Instrumental Variables (IV) Estimation Single Endogenous Regressor and Single IV cont. In this case we have E{(y i x 1iβ 1 x 2i β 2 )z 2i } = 0 (10) Such an IV would be referred to as exogenous and would be sufficient to the model s K parameters Condition (10) is known as the exclusion restriction The IV estimator ˆβ IV can then be solved from 1 N 1 N N (y i x ˆβ 1i 1,IV x 2i ˆβ 2,IV )x 1i = 0 (11) i=1 N (y i x ˆβ 1i 1,IV x 2i ˆβ 2,IV )z 2i = 0 (12) i=1 18/28

Instrumental Variables (IV) Estimation Single Endogenous Regressor and Single IV cont. Solving these equations analytically gives the IV estimator as follows. N N ˆβ IV = ( z i x i) 1 ( z i y i ) (13) i=1 i=1 where x i = (x 1i, x 2i) and z i = (x 1i, z 2i) Do you see what happens when z 2i = x 2i? Identification of the model and consistency of the IV estimator requires that the moment conditions uniquely identify the parameters of interest 19/28

Instrumental Variables (IV) Estimation Single Endogenous Regressor and Single IV cont. This is equivalent to saying that π 2 in the following equation is significantly different from zero x 2i = x 1iπ 1 + z 2i π 2 + υ i (14) z 2i should also not be a linear combination of the elements in x 1 i If these conditions are satisfied, we say that the instrument is relevant (testable by H 0 : π 2 = 0) The IV estimator therefore would be implemented using a two-stage framework Stage 1: Estimate (14) (the reduced form equation), and get the predicted values of x 2i (the endogenous variable) Stage 2: Run OLS regression of the model using predicted values of from stage 1 instead of the endogenous varible (i.e., use ˆx 2i in place of x 2i ) 20/28

Instrumental Variables (IV) Estimation Single Endogenous Regressor and Single IV cont. ˆσ 2 = 1 N K N (y i x ˆβ i IV ) 2 (15) i=1 Like OLS, one can compute standard errors robust to heteroskedasticity of unknown form 21/28

Instrumental Variables (IV) Estimation Single Endogenous Regressor and Single IV cont. Practical challenges with the IV estimator 1 Finding an exogenous and relevant instrument 2 High standard errors compared to OLS Note however that the moment conditions we stated earlier are identifying, they cannot be tested statistically They can however be tested if there are more conditions than actually needed for identification One can however test endogeneity of x 2i using a variant of the Hausman test called (Durbin-Wu-Hausman test) by comparing the OLS and IV estimators for β provided that the instrument z 2i is valid 22/28

Instrumental Variables (IV) Estimation Single Endogenous Regressor and Single IV cont. Durbin-Wu-Hausman test: steps Step 1: estimate a reduced-form equation explaining x 2i from x 1i and z 2i and save the residuals, say ˆ υ i Step 2: add the residuals to the mode of interest and estimate an OLS model of y i = x 1iβ 1 + x 2i β 2 + υˆ i γ + e i (16) One can test the endogeneity of x 2i by performing a standard t-test on γ = 0 23/28

Instrumental Variables (IV) Estimation Multiple Endogenous Regressors If more than one explanatory variable is endogenous, the dimension of x 2i is increased accordingly and the model becomes y i = x 1iβ 1 + x 2iβ 2 + ε i (17) To estimate this equation, we need an instrument for each element in x 2i We need instrument for each element in x 2i, i.e., equal number of instruments with endogenous variables The IV estimator in this case would be N N ˆβ IV = ( z i x i) 1 ( z i y i ) (18) i=1 i=1 where now x i = (x 1i, x 2i) and z i = (x 1i, z 2i) 24/28

Instrumental Variables (IV) Estimation Multiple Endogenous Regressors The entire vector z i is called the vector of instruments An exogenous variable doesn t need an instrument (or it serves as its own instrument) If z i = x i, the IV model reduces to an OLS model, where each variable is instrumented by itself 25/28

IV Estimation Specification Tests In the exactly identified case, (1/N ) i ˆε iz i = 0 by construction = K = R and identifying restrictions are not testable But if the model is overidentified (i.e., if there are more instruments than endogenous variables), it would be possible to derive a test ststistic which has an asymptotic Chi-squared distribution with R K d.f The test is called an overidentifying restrictions test or Sargan test A simple way to compute the test statistic is by taking N R 2 of an auxiliary regression of IV residuals ˆε i upon the full set of instruments z i 26/28

IV Estimation Specification Tests Weak Instruments The instrument may exhibit only weak correlation with the endogenous regressor(s) The normal distribution provides a very poor approximation of the distribution of the true IV estimator (even if the sample size is large) The standard IV estimator would therefore be biased, its standard errors are misleading and hypothesis test are unreliable It is possible to investigate this using what is called the reduced form regression Consider the linear model. y i = x 1iβ 1 + x 2i β 2 + ε i (19) and assume that E{x 1i ε i } = 0, and additional instruments z 2i (for x 2i ) satisfy E{z 2i ε i } = 0 27/28

IV Estimation Specification Tests Weak Instruments Cont. The appropriate reduced form is given by x 2i = x 1iπ 1 + z 2iπ 2 + υ i (20) If π 2 = 0, the z 2i s are irrelevant and the IV estimator is inconsistent If π 2 is close to zero, the instruments are weak One can use the value of the F statistic for π 2 = 0 in this regression As a rule of thumb, if the F statistic is greater than 10, we don t need to worry about weak instruments If the IVs are insignificant in the reduced form regression, do not trust the IV results! If you have more instruments, try by dropping the weakest ones 28/28