Econometrics. 8) Instrumental variables

30C00200 Econometrics 8) Instrumental variables Timo Kuosmanen Professor, Ph.D. http://nomepre.net/index.php/timokuosmanen

Today's topics
- Theory of IV regression
- Overidentification
- Two-stage least squares (2SLS)
- Testing for endogeneity:
  - Weak instruments
  - Hausman test

Examples of instrumental variables
- In the case of measurement error, the instrument could be another measurement (or proxy) of the unobserved x.
- Example: twins study of returns to education. x is the years of schooling self-reported by the respondent; z is the years of schooling reported by the respondent's twin brother or sister.
- In time series and panel data models, past values of x observed in previous periods are frequently used as instruments.

IV estimator
Assume the regression model is

$$y = \beta_1 + \beta_2 x + \varepsilon$$

However, the exogeneity assumption Cov(ε, x) = 0 is violated. Examples: measurement error in x, an omitted variable absorbed in ε.
Assume we have an instrument z that is
- highly correlated with the endogenous x: Cov(x, z) >> 0
- uncorrelated with the disturbance ε: Cov(ε, z) = 0

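IV estimator
Recall the OLS estimator for the slope β_2:

$$b_2^{OLS} = \frac{\mathrm{Est.Cov}(x, y)}{\mathrm{Est.Var}(x)} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})}$$

The instrumental variable (IV) estimator:

$$b_2^{IV} = \frac{\mathrm{Est.Cov}(z, y)}{\mathrm{Est.Cov}(z, x)} = \frac{\sum_{i=1}^{n} (z_i - \bar{z})(y_i - \bar{y})}{\sum_{i=1}^{n} (z_i - \bar{z})(x_i - \bar{x})}$$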
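The two covariance-ratio formulas can be compared on simulated data. The following is a minimal sketch, not part of the lecture material: the data-generating process (true slope 2, a shared shock u that makes x endogenous) and the helper names `cov`, `ols_slope` and `iv_slope` are assumptions chosen purely for illustration.

```python
import random

def cov(a, b):
    """Sample covariance with the (n - 1) divisor."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)

def ols_slope(x, y):
    """Slope estimate Est.Cov(x, y) / Est.Var(x)."""
    return cov(x, y) / cov(x, x)

def iv_slope(z, x, y):
    """Slope estimate Est.Cov(z, y) / Est.Cov(z, x)."""
    return cov(z, y) / cov(z, x)

# Hypothetical data-generating process (an assumption, for illustration only):
# true slope beta2 = 2; x is endogenous because it shares the shock u with eps.
random.seed(0)
n = 5000
beta1, beta2 = 1.0, 2.0
z = [random.gauss(0, 1) for _ in range(n)]      # instrument, independent of eps
u = [random.gauss(0, 1) for _ in range(n)]      # shock common to x and eps
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
eps = [ui + random.gauss(0, 1) for ui in u]
y = [beta1 + beta2 * xi + ei for xi, ei in zip(x, eps)]

b_ols = ols_slope(x, y)
b_iv = iv_slope(z, x, y)
print(f"OLS slope: {b_ols:.3f}")
print(f"IV slope:  {b_iv:.3f}")
```

Under these assumptions Cov(x, ε) = 1 and Var(x) = 3, so OLS should settle near 2 + 1/3, while the IV estimate should stay close to the true slope of 2.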

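IV estimator
Substituting the model y = β_1 + β_2 x + ε, the instrumental variable (IV) estimator can be rewritten as:

$$b_2^{IV} = \frac{\mathrm{Est.Cov}(z, y)}{\mathrm{Est.Cov}(z, x)} = \frac{\mathrm{Est.Cov}(z, \beta_1 + \beta_2 x + \varepsilon)}{\mathrm{Est.Cov}(z, x)} = \frac{\beta_2\, \mathrm{Est.Cov}(z, x) + \mathrm{Est.Cov}(z, \varepsilon)}{\mathrm{Est.Cov}(z, x)} = \beta_2 + \frac{\mathrm{Est.Cov}(z, \varepsilon)}{\mathrm{Est.Cov}(z, x)}$$

Since we assumed Cov(z, ε) = 0, Est.Cov(z, ε) converges to zero as n grows, so b_2^{IV} converges in probability to β_2: the IV estimator is consistent. It is unbiased only approximately, in large samples, because the last term is a ratio of two sample covariances.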

Variance of IV estimator
Variance of the IV estimator:

$$\mathrm{Var}(b_2^{IV}) = \frac{\mathrm{Var}(\varepsilon)}{(n - 1)\, \mathrm{Var}(x)\, r_{zx}^2}$$

Precision of the IV estimator improves if
- the variance of the disturbance ε decreases
- the sample size n increases
- the variance of the regressor x increases
- the correlation (r_zx) of the regressor x and the instrument z increases
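To see how strongly the instrument's correlation with x drives precision, the variance formula can be evaluated directly. This is a small sketch; the function name `iv_slope_variance` and the input values are hypothetical, chosen only to contrast a strong and a weak instrument.

```python
def iv_slope_variance(var_eps, n, var_x, r_zx):
    """Var(b_IV) = Var(eps) / ((n - 1) * Var(x) * r_zx**2)."""
    if not 0 < abs(r_zx) <= 1:
        raise ValueError("r_zx must be a nonzero correlation in [-1, 1]")
    return var_eps / ((n - 1) * var_x * r_zx ** 2)

# Illustrative inputs (assumptions): same model, two instruments of different strength.
strong = iv_slope_variance(var_eps=1.0, n=101, var_x=1.0, r_zx=0.9)
weak = iv_slope_variance(var_eps=1.0, n=101, var_x=1.0, r_zx=0.1)
print(f"variance with r_zx = 0.9: {strong:.4f}")
print(f"variance with r_zx = 0.1: {weak:.4f}")
print(f"ratio weak / strong:      {weak / strong:.1f}")
```

Because r_zx enters squared, cutting the correlation from 0.9 to 0.1 inflates the variance by a factor of 81 in this example.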

OLS and IV as GMM estimators
The OLS residuals have the property

$$\sum_{i=1}^{n} x_i e_i^{OLS} = 0$$

Thus, Est.Cov(x, e) = 0. This is the sample counterpart to the assumed population orthogonality condition Cov(x, ε) = 0.
Note: we can derive the OLS estimator directly from the sample orthogonality condition. Assume centered data where the sample averages of x and y are equal to zero, and assume the constant term is zero. Then

$$\sum_{i=1}^{n} x_i e_i^{OLS} = \sum_{i=1}^{n} x_i (y_i - b^{OLS} x_i) = \sum_{i=1}^{n} x_i y_i - b^{OLS} \sum_{i=1}^{n} x_i^2 = 0$$

$$\Rightarrow \quad b^{OLS} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2} = \frac{\mathrm{Est.Cov}(x, y)}{\mathrm{Est.Var}(x)}$$

OLS and IV as GMM estimators
Analogously, the IV estimator is based on the population orthogonality condition Cov(z, ε) = 0. We can derive the IV estimator using the sample orthogonality condition

$$\sum_{i=1}^{n} z_i e_i^{IV} = 0$$

$$\sum_{i=1}^{n} z_i (y_i - b^{IV} x_i) = \sum_{i=1}^{n} z_i y_i - b^{IV} \sum_{i=1}^{n} z_i x_i = 0$$

$$\Rightarrow \quad b^{IV} = \frac{\sum_{i=1}^{n} z_i y_i}{\sum_{i=1}^{n} z_i x_i} = \frac{\mathrm{Est.Cov}(z, y)}{\mathrm{Est.Cov}(z, x)}$$

Both OLS and IV can be seen as special cases of the generalized method of moments (GMM).
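The two sample orthogonality conditions can be checked numerically. The sketch below uses a small made-up sample (the values of z, x and y are illustrative assumptions, not lecture data) and, unlike the derivation above, keeps an intercept, so the conditions hold without centering the data first.

```python
def mean(v):
    return sum(v) / len(v)

def cross(a, b):
    """Sum of cross-products of deviations from the means."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

# Small made-up sample (illustrative values only).
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.2, 1.9, 3.4, 3.8, 5.3, 5.9]
y = [2.0, 4.1, 5.8, 8.3, 9.9, 12.2]

def residuals(slope):
    """Residuals of y on x for a given slope, with the intercept set to match the means."""
    intercept = mean(y) - slope * mean(x)
    return [yi - intercept - slope * xi for xi, yi in zip(x, y)]

b_ols = cross(x, y) / cross(x, x)   # solves sum(x_i * e_i) = 0
b_iv = cross(z, y) / cross(z, x)    # solves sum(z_i * e_i) = 0

sum_x_e_ols = sum(xi * ei for xi, ei in zip(x, residuals(b_ols)))
sum_z_e_iv = sum(zi * ei for zi, ei in zip(z, residuals(b_iv)))
print(f"sum x_i * e_i (OLS residuals): {sum_x_e_ols:.2e}")
print(f"sum z_i * e_i (IV residuals):  {sum_z_e_iv:.2e}")
```

Both sums are zero up to floating-point rounding: each estimator is exactly the slope that makes its own moment condition hold in the sample.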

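IV regression in Stata
Two-stage least squares can be implemented in Stata using the command ivreg instead of the usual reg.
Syntax:
. ivreg y x3 x4 (x2 = z1 z2)
Here x2 is the endogenous regressor and z1, z2 are its instruments. The exogenous regressors x3 and x4 are listed outside the parentheses and are used as instruments automatically. In recent Stata versions the same specification is estimated with ivregress 2sls.
In matrix form:

$$\text{OLS:} \quad b = (X'X)^{-1} X'y \qquad\qquad \text{IV:} \quad b = (Z'X)^{-1} Z'y$$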

Over-identification
Thus far, we assumed that there exists a single instrumental variable z that is highly correlated with x but uncorrelated with ε.
Examples of instrumental variables:
- alternative proxy variables
- past values x_{t-1}
If a useful instrument is available, then there is potentially more than one instrument. If the past value x_{t-1} is a good instrument for x_t, then x_{t-2}, x_{t-3}, ... are likely useful instruments as well.
Choosing just one of the many instruments would be an inefficient use of the information available.
Solution: the two-stage least squares (2SLS) method.

Two-stage least squares (2SLS)
Assume we have one endogenous regressor x in the model

$$y = \beta_1 + \beta_2 x + \varepsilon$$

Assume we have (L-1) instruments z_2, z_3, ..., z_L for x.
Two-stage estimation procedure:
1) Regress x on the instruments by OLS:

$$x = \kappa_1 + \kappa_2 z_2 + \kappa_3 z_3 + \dots + \kappa_L z_L + v$$

where v is the first-stage disturbance (distinct from ε). Save the fitted values:

$$x^* = k_1 + k_2 z_2 + k_3 z_3 + \dots + k_L z_L$$

2) Use the fitted values x* in place of x to estimate the original regression equation by OLS:

$$y = \beta_1 + \beta_2 x^* + \varepsilon$$
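The two stages can be carried out by hand with nothing more than repeated OLS. The following is a rough sketch on simulated data: the helpers `solve` and `ols`, the two instruments z2 and z3, and all parameter values are assumptions made for illustration. It reproduces only the point estimate; the standard errors from a naive second-stage OLS would not be valid without adjustment.

```python
import random

def solve(a, b):
    """Solve the linear system a * coef = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(columns, y):
    """OLS coefficients of y on a constant plus the given regressor columns."""
    rows = [[1.0] + list(r) for r in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data (assumption): true beta2 = 2, x endogenous through the shock u.
random.seed(1)
n = 5000
z2 = [random.gauss(0, 1) for _ in range(n)]
z3 = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
x = [0.8 * a + 0.5 * b + c + random.gauss(0, 1) for a, b, c in zip(z2, z3, u)]
y = [1.0 + 2.0 * xi + ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

# Stage 1: regress x on the instruments and save the fitted values x*.
k1, k2, k3 = ols([z2, z3], x)
x_star = [k1 + k2 * a + k3 * b for a, b in zip(z2, z3)]

# Stage 2: regress y on the fitted values x*.
b1, b2 = ols([x_star], y)
print(f"first-stage coefficients: {k2:.3f}, {k3:.3f}")
print(f"2SLS slope estimate:      {b2:.3f}")
```

Using both instruments in stage 1 is what lets 2SLS combine them into a single fitted regressor instead of discarding one.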

Two-stage least squares (2SLS)
Practical notes:
- If we have more than one endogenous (problem) variable x, then stage 1 can be done separately for each variable.
- Different endogenous regressors can be instrumented with different z variables.
- All exogenous regressors x are usually included as instruments z.
- If OLS is used in the stepwise estimation, the standard errors of the second-stage regression need to be adjusted. Stata does this automatically when ivreg is used.

Example: production function of electricity distribution networks
Assume a Cobb-Douglas production function

$$\ln y_i = \beta_0 + \beta_1 L_i + \beta_2 K_i + \varepsilon_i$$

- Output: ln y = ln Energy (GWh)
- Inputs x: L = ln OPEX, K = ln Krepl
- Instrument for K: ln Knuse
OPEX = operational expenditure (incl. wages)
Krepl = capital stock (replacement value)
Knuse = capital stock (net use value)
Sample of 160 observations in years 2011 and 2012.

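CD function, direct OLS estimation

. regress lnenergy lnopex lnkrepl

      Source |       SS       df       MS              Number of obs =     160
-------------+------------------------------           F(  2,   157) =  911.83
       Model |  261.157887     2  130.578943           Prob > F      =  0.0000
    Residual |   22.483226   157  .143205261           R-squared     =  0.9207
-------------+------------------------------           Adj R-squared =  0.9197
       Total |  283.641113   159  1.78390637           Root MSE      =  .37842

------------------------------------------------------------------------------
    lnenergy |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnopex |   .4460534   .1190976     3.75   0.000      .210813    .6812938
     lnkrepl |    .591481   .1152412     5.13   0.000     .3638579    .8191042
       _cons |  -4.696363   .4388221   -10.70   0.000    -5.563119   -3.829606
------------------------------------------------------------------------------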

Two-stage least squares (2SLS)
Capital stock is hard to measure. Suppose our proxy for capital stock K contains measurement error. If that is the case, the OLS estimator of the output elasticity of K is biased towards zero.
Two alternative proxy measures of K: Krepl and Knuse.
Two-stage least squares:
- Stage 1: Regress ln Krepl on ln Knuse and ln OPEX. Record the predicted ln Krepl (ln PrKrepl).
- Stage 2: Regress ln Energy on ln OPEX and ln PrKrepl to estimate the production function of interest.

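2SLS regression

. reg3 (lnkrepl = lnopex lnknuse) (lnenergy = lnopex lnkrepl), exog(lnopex) 2sls

Two-stage least-squares regression
----------------------------------------------------------------------
Equation          Obs  Parms        RMSE    "R-sq"     F-Stat        P
----------------------------------------------------------------------
lnkrepl           160      2    .1486906    0.9862    5624.20   0.0000
lnenergy          160      2    .3923997    0.9148     858.94   0.0000
----------------------------------------------------------------------

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
lnkrepl      |
      lnopex |   .2911237   .0407567     7.14   0.000     .2109328    .3713145
     lnknuse |   .6873963   .0377983    18.19   0.000     .6130263    .7617664
       _cons |   1.755602   .1187836    14.78   0.000      1.52189    1.989314
-------------+----------------------------------------------------------------
lnenergy     |
      lnopex |   .0456137   .1489348     0.31   0.760    -.2474226      .33865
     lnkrepl |   .9875145   .1451144     6.81   0.000     .7019949    1.273034
       _cons |  -6.052248   .5352627   -11.31   0.000    -7.105403   -4.999093
------------------------------------------------------------------------------
Endogenous variables:  lnkrepl lnenergy
Exogenous variables:   lnopex lnknuse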

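IV (2SLS) regression

. ivregress 2sls lnenergy lnopex (lnkrepl = lnknuse lnopex)

Instrumental variables (2SLS) regression          Number of obs   =        160
                                                  Wald chi2(2)    =    1750.71
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.9148
                                                  Root MSE        =      .3887

------------------------------------------------------------------------------
    lnenergy |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lnkrepl |   .9875145   .1437475     6.87   0.000     .7057744    1.269254
      lnopex |   .0456137   .1475319     0.31   0.757    -.2435436     .334771
       _cons |  -6.052248   .5302208   -11.41   0.000    -7.091462   -5.013034
------------------------------------------------------------------------------
Instrumented:  lnkrepl
Instruments:   lnopex lnknuse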

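IV (GMM) regression

. ivregress gmm lnenergy lnopex (lnkrepl = lnknuse)

Instrumental variables (GMM) regression           Number of obs   =        160
                                                  Wald chi2(2)    =    1320.63
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.9148
GMM weight matrix: Robust                         Root MSE        =      .3887

------------------------------------------------------------------------------
             |               Robust
    lnenergy |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lnkrepl |   .9875145   .1540702     6.41   0.000     .6855425    1.289486
      lnopex |   .0456137   .1595625     0.29   0.775    -.2671231    .3583505
       _cons |  -6.052248   .5687178   -10.64   0.000    -7.166915   -4.937582
------------------------------------------------------------------------------
Instrumented:  lnkrepl
Instruments:   lnopex lnknuse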

Testing for weak instruments
- The F-test of joint significance in the first-stage regression serves as a useful diagnostic test for weak instruments.
- To avoid the problems caused by weak instruments (imprecise coefficients), the coefficients of the first-stage regression should be jointly significant: F-stat > F crit.
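As a rough check, the first-stage F statistic can be recomputed from the figures in the reg3 output for the lnkrepl equation (160 observations, 2 slope parameters, R-sq 0.9862). This is a sketch under assumptions: the helper name `overall_f` is hypothetical, and it uses the textbook relation between the overall F statistic and R-squared for a regression with an intercept.

```python
def overall_f(r_squared, n_obs, n_slopes):
    """Overall F statistic of a regression with an intercept, computed from R-squared."""
    numerator = r_squared / n_slopes
    denominator = (1.0 - r_squared) / (n_obs - n_slopes - 1)
    return numerator / denominator

# Figures taken from the first-stage (lnkrepl) row of the reg3 output.
f_first_stage = overall_f(r_squared=0.9862, n_obs=160, n_slopes=2)

# With a single excluded instrument, the F statistic for that instrument alone
# equals its squared t statistic (t = 18.19 for lnknuse in the same output).
f_excluded = 18.19 ** 2

print(f"overall first-stage F from rounded R-sq: {f_first_stage:.1f}")
print(f"F for the excluded instrument lnknuse:   {f_excluded:.1f}")
```

The first figure comes out at about 5,610 against the reported 5624.20; the gap is consistent with R-sq being rounded to four decimals. The overall F also tests lnopex, so the second figure (about 331), which isolates the excluded instrument lnknuse, is arguably the more relevant one here. Either way the instrument looks far from weak.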

Hausman test
Also referred to as the Durbin-Wu-Hausman test.
Rationale: it is not always clear whether endogeneity is a problem or not.
- If the exogeneity assumption Cov(x, ε) = 0 holds, then the OLS estimator is unbiased and efficient. The IV estimator is also consistent, but less efficient (OLS preferred).
- However, if the exogeneity assumption Cov(x, ε) = 0 fails, then the OLS estimator is biased and inconsistent. The IV estimator remains consistent (IV preferred).

Hausman test
H_0: Cov(x, ε) = 0; OLS preferred
H_1: Cov(x, ε) ≠ 0; IV preferred
Procedure:
- Estimate both the OLS and the IV regressions.
- Compare the estimated coefficients b^{OLS} and b^{IV} and their standard errors.
- If H_0 is true, then the difference b^{IV} - b^{OLS} should be small, arising only from sampling variation (the IV estimator being the less efficient of the two).
- If H_0 is true, the Hausman statistic follows a chi-squared distribution with degrees of freedom equal to the number of endogenous regressors instrumented in the IV model.
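For reference, the statistic behind this procedure is usually written as a quadratic form in the coefficient difference. The notation below is an assumption following the standard textbook form (it matches the expression Stata prints in its hausman output), with $\hat{V}$ denoting an estimated covariance matrix and $k$ the degrees of freedom:

```latex
H = \left(b^{IV} - b^{OLS}\right)'
    \left[\hat{V}\!\left(b^{IV}\right) - \hat{V}\!\left(b^{OLS}\right)\right]^{-1}
    \left(b^{IV} - b^{OLS}\right)
    \;\sim\; \chi^2(k) \quad \text{under } H_0
```

A large value of H means the two sets of estimates differ by more than sampling variation can explain, which is evidence against exogeneity of x.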

Hausman test in Stata
- Stata computes the Hausman test automatically.
- Run the IV and OLS regressions.
- Save the results after each regression with the command estimates store name. Example: estimates store CostIV and estimates store CostOLS
- The Hausman test is conducted with the command hausman. Example: hausman CostIV CostOLS, constant

Hausman test in Stata

. hausman IV OLS

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       IV          OLS         Difference          S.E.
-------------+----------------------------------------------------------------
     lnkrepl |    .9875145      .591481        .3960335        .0859234
      lnopex |    .0456137     .4460534       -.4004397        .0870714
------------------------------------------------------------------------------
          b = consistent under Ho and Ha; obtained from ivregress
          B = inconsistent under Ha, efficient under Ho; obtained from regress

    Test:  Ho:  difference in coefficients not systematic

                  chi2(2) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =       21.24
                Prob>chi2 =      0.0000
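The reported statistic can be approximately reproduced from the lnkrepl row alone. This is only a scalar sketch under an assumption: it treats the test as a single-coefficient contrast (squared difference divided by its squared standard error), which is not the full matrix computation Stata performs. The function names are hypothetical.

```python
import math

def scalar_hausman(b_iv, b_ols, se_diff):
    """Single-coefficient Hausman contrast: ((b_IV - b_OLS) / S.E. of the difference) squared."""
    return ((b_iv - b_ols) / se_diff) ** 2

def chi2_1_pvalue(stat):
    """Upper-tail p-value of a chi-squared(1) variable, via the normal tail."""
    return math.erfc(math.sqrt(stat / 2.0))

# Values copied from the lnkrepl row of the hausman output above.
stat = scalar_hausman(b_iv=0.9875145, b_ols=0.591481, se_diff=0.0859234)
p_value = chi2_1_pvalue(stat)
print(f"scalar Hausman statistic: {stat:.2f}")
print(f"p-value on 1 df:          {p_value:.2e}")
```

The scalar contrast lands at about 21.24, in line with the chi2 value in the output, and the p-value is far below conventional significance levels, so the exogeneity of ln Krepl is rejected and the IV estimates are preferred.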

Next time Mon 5 Oct Topic: Autocorrelation