Instrumental Variables

Similar documents
Instrumental Variables

Econometrics Master in Business and Quantitative Methods

Instrumental Variables, Simultaneous and Systems of Equations

Exogeneity and Causality

Linear IV and Simultaneous Equations

We begin by thinking about population relationships.

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data

ECON 4160, Spring term 2015 Lecture 7

The Statistical Property of Ordinary Least Squares

Generalized Method of Moments: I. Chapter 9, R. Davidson and J.G. MacKinnon, Econometric Theory and Methods, 2004, Oxford.

Final Exam. Economics 835: Econometrics. Fall 2010

Instrumental Variables and GMM: Estimation and Testing. Steven Stillman, New Zealand Department of Labour

PANEL DATA RANDOM AND FIXED EFFECTS MODEL. Professor Menelaos Karanasos. December Panel Data (Institute) PANEL DATA December / 1

Chapter 6: Endogeneity and Instrumental Variables (IV) estimator

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012

Chapter 6. Panel Data. Joan Llull. Quantitative Statistical Methods II Barcelona GSE

1. The OLS Estimator. 1.1 Population model and notation

Estimation of Dynamic Regression Models

Generalized Method of Moments (GMM) Estimation

The outline for Unit 3

Econometrics II - EXAM Answer each question in separate sheets in three hours

1 Outline. 1. Motivation. 2. SUR model. 3. Simultaneous equations. 4. Estimation

Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16)

Applied Econometrics (MSc.) Lecture 3 Instrumental Variables

GARCH Models Estimation and Inference

Asymptotic Theory. L. Magee revised January 21, 2013

Lecture 6: Dynamic panel models 1

Generalized Method of Moment

Heteroskedasticity. We now consider the implications of relaxing the assumption that the conditional

STAT 100C: Linear models

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Panel Data Models. Chapter 5. Financial Econometrics. Michael Hauser WS17/18 1 / 63

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1

4.8 Instrumental Variables

Introduction to Estimation Methods for Time Series models. Lecture 1

Maximum Likelihood Estimation

13. Time Series Analysis: Asymptotics Weakly Dependent and Random Walk Process. Strict Exogeneity

GARCH Models Estimation and Inference. Eduardo Rossi University of Pavia

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research

Greene, Econometric Analysis (5th ed, 2003)

Dealing With Endogeneity

IV estimators and forbidden regressions

Specification testing in panel data models estimated by fixed effects with instrumental variables

Simultaneous Equation Models

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

ECONOMETRIC THEORY. MODULE XVII Lecture - 43 Simultaneous Equations Models

Linear Regression with Time Series Data

SEM with observed variables: parameterization and identification. Psychology 588: Covariance structure and factor models

1 Motivation for Instrumental Variable (IV) Regression

Homoskedasticity. Var (u X) = σ 2. (23)

plim W 0 " 1 N = 0 Prof. N. M. Kiefer, Econ 620, Cornell University, Lecture 16.

Econ 510 B. Brown Spring 2014 Final Exam Answers

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

1. You have data on years of work experience, EXPER, its square, EXPER2, years of education, EDUC, and the log of hourly wages, LWAGE

Instrumental Variables and the Problem of Endogeneity

LECTURE 2 LINEAR REGRESSION MODEL AND OLS

Ma 3/103: Lecture 24 Linear Regression I: Estimation

Asymptotic Properties and simulation in gretl

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Our point of departure, as in Chapter 2, will once more be the outcome equation:

GARCH Models Estimation and Inference

ECONOMETRICS FIELD EXAM Michigan State University May 9, 2008

Simple Linear Regression Estimation and Properties

Econ 583 Final Exam Fall 2008

Introduction to Econometrics Final Examination Fall 2006 Answer Sheet

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator

Simultaneous Equations and Weak Instruments under Conditionally Heteroscedastic Disturbances

ECON 3150/4150, Spring term Lecture 7

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

11. Further Issues in Using OLS with TS Data

Economics 472. Lecture 10. where we will refer to y t as a m-vector of endogenous variables, x t as a q-vector of exogenous variables,

Chapter 1. GMM: Basic Concepts

An Introduction to Parameter Estimation

EC3062 ECONOMETRICS. THE MULTIPLE REGRESSION MODEL Consider T realisations of the regression equation. (1) y = β 0 + β 1 x β k x k + ε,

Lecture 8: Instrumental Variables Estimation

Missing dependent variables in panel data models

Econometrics. 7) Endogeneity

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models

The Linear Regression Model

MEI Exam Review. June 7, 2002

Econometrics of Panel Data

Interpreting Regression Results

Greene, Econometric Analysis (7th ed, 2012)

Agricultural and Applied Economics 637 Applied Econometrics II

The Estimation of Simultaneous Equation Models under Conditional Heteroscedasticity

2.1 Linear regression with matrices

Advanced Quantitative Methods: ordinary least squares

Cointegrated VAR s. Eduardo Rossi University of Pavia. November Rossi Cointegrated VAR s Financial Econometrics / 56

State-space Model. Eduardo Rossi University of Pavia. November Rossi State-space Model Fin. Econometrics / 53

Linear Regression with Time Series Data

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors

The Multiple Regression Model Estimation

Birkbeck Working Papers in Economics & Finance

Summer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University.

Econometrics of Panel Data

Intermediate Econometrics

Linear Regression. Junhui Qian. October 27, 2014

Exogeneity tests and weak identification

Answers to Problem Set #4

Topic 10: Panel Data Analysis

Transcription:

Università di Pavia 2010 Instrumental Variables Eduardo Rossi

Exogeneity Exogeneity Assumption: the explanatory variables which form the columns of X are exogenous. It implies that any randomness in the DGP that generated X is independent of the error terms ε in the DGP for y. This independence implies that E(ε X) = 0 the mean of the entire vector ε, that is, of every one of the ε t, is zero conditional on the entire matrix X. This assumption is acceptable for cross-section data but not for time series data. Eduardo Rossi c - Econometria Finanziaria 2010 2

Exogeneity Time series data: each observation might correspond to a year, quarter, month, or day. Even if we are willing to assume that ε t is in no way related to current and past values of the regressors, it must be related to future values if current values of the dependent variable affect future values of some of the regressors. In the context of time-series data, the exogeneity assumption is a very strong one that we may often not feel comfortable in making. Eduardo Rossi c - Econometria Finanziaria 2010 3

Innovation The assumption: E[ε t x t ] = 0 is substantially weaker than Exogeneity assumption, because this rules out the possibility that the mean of ε t may depend on the values of the regressors for any observation, while E[ε t x t ] = 0 rules out the possibility that it may depend on their values for the current observation. From the point of view of the error terms, it says that they are innovations. Eduardo Rossi c - Econometria Finanziaria 2010 4

Predeterminedness An innovation is a r.v. of which the mean is 0 conditional on the information in the x t, and so knowledge of the values taken by the latter is of no use in predicting the mean of the innovation. From the point of view of the explanatory variables x t, Predeterminedness assumption says that they are predetermined with respect to the error terms. Eduardo Rossi c - Econometria Finanziaria 2010 5

Endogeneity There is a problem of endogeneity in the linear model y t = x tβ +ε t if β is the parameter of interest and E(x t ε t ) 0. Eduardo Rossi c - Econometria Finanziaria 2010 6

Simultaneous equations Demand-supply system: q d,t q s,t q s,t = α 1 p t +α 2 x t +ε d,t = β 1 p t +ε s,t = q d,t = q t x t income exogenous. Prices and quantities are measured in logs and in deviation form their sample means. Interest centers on estimating the demand elasticity: E[ε d,t x t ] = 0 E[ε s,t x t ] = 0 E[ε 2 d,t x t ] = σ 2 s E[ε 2 s,t x t ] = σ 2 d E[ε s,t x t ] = E[ε d,t x t ] = 0 E[ε s,t ε d,t x t ] = 0 Eduardo Rossi c - Econometria Finanziaria 2010 7

Simultaneous equations Reduced form of the model: p t = α 2 x t β 1 α 1 + ε d,t ε s,t β 1 α 1 = π 1 x t +v 1t q t = β 1α 2 x t β 1 α 1 + β 1ε d,t α 1 ε s,t β 1 α 1 = π 2 x t +v 2t σ2 d Cov[p t,ε d,t ] = β 1 α 1 Neither the demand not the supply equation satisfies the assumptions of the classical regression model. The price elasticity of demand (α 1 ) cannot be consistently estimated by OLS regression of q on x and p. Because the endogenous variables are all correlated with the disturbances, the OLS estimators of the parameters of equations with endogenous variables on the RHS are inconsistent. Eduardo Rossi c - Econometria Finanziaria 2010 8

measurement error Measurement error in the regressors. Suppose that (y t,x t) are joint random variables E(y t x t ) = x t β is linear. But x t is not observed. Instead we observe x t = x t +u t u t (K 1) measurement error, independent of y t and x t. Eduardo Rossi c - Econometria Finanziaria 2010 9

measurement error where y t = x t β +ε t = (x t u t ) β +ε t = x tβ +v t v t = ε t u tβ. The problem is that E(x t v t ) = E[(x t +u t )(ε t u tβ)] = E(u t u t)β 0 If β 0 and E(u t u t) 0. It follows that if β is the OLS estimator [ ] 1 β = β + (x t x t) x t v t t t Eduardo Rossi c - Econometria Finanziaria 2010 10

measurement error then [ 1 plim β = β +plim N ] 1 1 (x t x t) N t x t v t t plim β = β (E(x t x t)) 1 E(u t u t)β β. Eduardo Rossi c - Econometria Finanziaria 2010 11

Instrumental Variables Estimator Let the equation of interest be y t = x tβ +ε t (1) where x t (K 1), E(x t ε t ) 0 so that there is a problem of endogeneity. Eq. (1) is called structural equation. In matrix notation, this can be written as y = Xβ +ε any solution to the problem of endogeneity requires additional information which we call instruments. Eduardo Rossi c - Econometria Finanziaria 2010 12

Instrumental Variables Estimator Definition The vector z t, (L 1), is an instrumental variable for eq.(1) if E(z t ε t ) = 0. In a typical set-up, some regressors in x t will be uncorrelated with ε t (for example, at least the intercept). Thus we make the partition x t = x 1t x 2t K 1 K 2 where E(x 1t ε t ) = 0 yet E(x 2t ε t ) 0. We call x 1t exogenous and x 2t endogenous. x 1t is an instrument for eq.(1). Eduardo Rossi c - Econometria Finanziaria 2010 13

Instrumental Variables Estimator So we have the partition z t = x 1t z 2t K 1 L 2 Three possible cases: Just-identified if L = K Over-identified if L > K Under-identified if L < K. Eduardo Rossi c - Econometria Finanziaria 2010 14

Reduced Form The reduced form relationship between the variables or regressors x t and the instruments z t is found by linear projection. Let Γ = E(z t z t) 1 E(z t x t) be the (L K) matrix of coefficients from a projection of x t on z t, and define u t = x t z tγ as the projection error. Then the reduced form linear relationship between x t and z t is x t = Γ z t +u t (2) Eduardo Rossi c - Econometria Finanziaria 2010 15

Reduced Form In matrix notation X = ZΓ+u (3) u (N K). By construction E(z t u t) = 0 So (3) is a projection and can be estimated by OLS X = Z Γ+û Γ = (Z Z) 1 Z X Substituting X = ZΓ+u in y = Xβ +ε y = (ZΓ+u)β +ε = Zλ+v (4) where λ = Γβ, v = uβ +ε. Eduardo Rossi c - Econometria Finanziaria 2010 16

Reduced Form Observe that E(z t v t ) = E(z t u t)β +E(z t ε t ) = 0 Thus (4) is a projection equation and may be estimated by OLS. This is y = Z λ+ v λ = (Z Z) 1 Z y The equation (4) is the reduced form for y. The system y = Zλ+v X = ZΓ+u OLS yields the reduced-form estimates ( λ, Γ). Eduardo Rossi c - Econometria Finanziaria 2010 17

Identification The structural parameter β relates to (λ,γ) through λ = Γβ. The parameter is identified, meaning that it can be recovered from the reduced form, if rank(γ) = K (5) Assume that (5) holds. If L = K, then β = Γ 1 λ. If (5) is not satisfied, then β cannot be recovered from (λ,γ). Note that a necessary (although not sufficient) condition for (5) is L K. Eduardo Rossi c - Econometria Finanziaria 2010 18

Identification Since X and Z have the common variables X 1, we can rewrite some of the expressions [ ] X = X 1 X 2 [ ] Z = X 1 Z 2 we can partition Γ as Γ = Γ 11 Γ 12 Γ 21 Γ 22 = I K 1 Γ 12 0 Γ 22 L 1 K 2 L 2 K 2 Eduardo Rossi c - Econometria Finanziaria 2010 19

Identification ZΓ = = [ [ ] X 1 Z 2 I K 1 Γ 12 0 Γ 22 ] X 1 X 1 Γ 12 +Z 2 Γ 22 β is identified if rank(γ) = K, which is true if and only if rank(γ 22 ) = K 2 (by the upper-diagonal structure of Γ). The key to identification of the model rests on the (L 2 K 2 ) matrix Γ 22. Eduardo Rossi c - Econometria Finanziaria 2010 20

Instrumental Variable Estimation Suppose that the model is just identified (L = K). Then β = Γ 1 λ. This suggests the Indirect Least Squares estimator: β IV = = Γ 1 λ [ ] (Z Z) 1 1 [ ] Z X (Z Z) 1 Z y = (Z X) 1 (Z Z)(Z Z) 1 (Z y) = (Z X) 1 (Z y) β IV is the instrumental variables estimator of β. For identification by any given sample, the condition is just that Z X should be nonsingular. Eduardo Rossi c - Econometria Finanziaria 2010 21

Consistency of IV Estimator Since ) p ( λ, Γ (λ,γ) and Γ is invertible, β is consistent. A more direct way to see consistency is to substitute the equation y = Xβ +ε into the equation for β to obtain β IV Given: ( Z ) X Q ZX = plim N = (Z X) 1 (Z (Xβ +ε)) = β +(Z X) 1 Z ε the IV estimator is consistent iff: ( N 1 Z X ) 1( N 1 Z ε ) p 0 deterministic with rank K(Asymptotic identification) or p lim N Z ε = 0 asymptotically uncorrelated. Eduardo Rossi c - Econometria Finanziaria 2010 22

The asymptotic distribution of IV estimator Q ZX Q ZZ ( Z ) X = plim deterministic with rank K(Asymptotic identification) N ( Z ) Z = plim positive definite N N ( βiv β) = [ ] N (Z X) 1 Z ε Under the hypothesis that E(εε X,Z) = σ 2 I n, The Central Limit Theorem gives us the limit distribution ) L ( N ( βiv β N 0,σ 2 Q 1 ZX Q ZZQ 1 ) XZ The estimate of σ 2 σ 2 = 1 N ( ) ( ) y X β y X β Eduardo Rossi c - Econometria Finanziaria 2010 23

Efficiency If the model is correctly specified the asymptotic covariance matrix: [ Var p lim N( βiv β)] = σ 2 Q 1 ZX Q ZZQ 1 XZ N ( Z = σ [plim 2 )] 1 [ ( X Z )][ ( Z X )] 1 Z plim plim N N N { [ ( X = σ 2 )][ ( Z Z )] 1 [ ( Z Z )] } 1 X plim plim plim N N N { (X = σ 2 )( Z Z ) 1 ( Z Z ) } 1 X plim N N N ( ) 1 1 = σ 2 plim N X P Z X If we have some choice over what instruments to use in the matrix Z, it makes sense to choose them so as to minimize the above asymptotic covariance matrix. Eduardo Rossi c - Econometria Finanziaria 2010 24

Optimal instruments Optimal instruments: instruments that minimize the asymptotic covariance matrix. In order to determine the optimal instruments, we must know the data generating process. Quite generally, we can suppose that the explanatory variables satisfy the relation: where X = X+V x t = E[x t Ω t ] E[v t Ω t ] = 0 where Ω t is the set of uncorrelated variables with ε t. It turns out that the matrix X provides the optimal instruments. Eduardo Rossi c - Econometria Finanziaria 2010 25

Hausman Specification Test It might not be obvious that the regressors are correlated with the disturbances or that the regressors are measured with error. If not, there would be some benefit to using the OLS rather than the IV estimator. We can compare the asymptotic variance-covariance matrices under the assumption that both are consistent, that is plim 1 N X ǫ = 0 Eduardo Rossi c - Econometria Finanziaria 2010 26

Hausman Specification Test The difference between the asymptotic covariance matrices of the two estimators is ( AVar( β IV ) AVar( β OLS ) = σ2 X N plim Z(Z Z) 1 Z ) 1 X N ( σ2 X ) 1 N plim X N = σ2 N plimn [ (X Z(Z Z) 1 Z X) 1 (X X) 1] Eduardo Rossi c - Econometria Finanziaria 2010 27

Hausman Specification Test Compare the inverses (X Z(Z Z) 1 Z X) = X P Z X = X (I N M Z )X = X X X M Z X since M Z is p.s.d. it follows that X M Z X is also so. Since (X Z(Z Z) 1 Z X) is smaller, in the matrix sense, than X X then its inverse is larger. If the OLS is consistent is the preferred estimator (asymptotically more efficient). Eduardo Rossi c - Econometria Finanziaria 2010 28

Hausman Specification Test We want to test the null hypothesis that the error terms are uncorrelated with all the regressors against the alternative that they are correlated with some of the regressors, although not with the instruments. The null hypothesis is plim 1 N X ǫ = 0 under the null hypothesis both estimators are consistent. Under the alternative only β IV is consistent. The suggestion is to examine d = β IV β OLS Eduardo Rossi c - Econometria Finanziaria 2010 29

Hausman Specification Test Under the null hypothesis plimd = 0 We might test this hypothesis using a Wald statistic H = d [AVar( β IV β OLS )] 1 d Eduardo Rossi c - Econometria Finanziaria 2010 30

Hausman Specification Test AVar( β IV β OLS ) = AVar( β IV )+AVar( β OLS ) 2ACov( β IV, β OLS ) We do not have an expression for the covariance term. Hausman shows that Cov[ β OLS, β OLS β IV ] = 0 Cov(X,X Y) = E[(X E(X))(X Y E(X)+E(Y))] then = E[(X E(X)) 2 ] E[(X E(X))(Y E(Y))] = Var[X] Cov(X,Y) Cov[ β OLS, β OLS β IV ] = Var[ β OLS ] Cov[ β IV, β OLS ] = 0 Var[ β OLS ] = Cov[ β IV, β OLS ] Eduardo Rossi c - Econometria Finanziaria 2010 31

Hausman Specification Test AVar( β IV β OLS ) = AVar( β IV ) AVar( β OLS ) Inserting this useful result into the Wald statistics and reverting to empirical estimates H = ( β IV β OLS ) [AVar( β IV ) AVar( β OLS )] 1 ( β IV β OLS ) Under the null hypothesis we are using two consistent estimators of σ 2. If we use s 2 as the common estimator, then the test statistic where H = ( β IV β OLS ) [( X X) 1 (X X) 1 ] 1 ( β IV β OLS ) s 2 X = (Z Z) 1 Z X Eduardo Rossi c - Econometria Finanziaria 2010 32

Hausman Specification Test Unless X and Z have no variables in common, the rank of the matrix in the statistic is less than K, and the ordinary inverse will not even exist. In most cases some of the variables in X appear in Z (at least the constant term!). Let X = [X 1,X 2 ] where X 2 are the suspected endogenous variables. X 1 is included in Z. A subset of K 1 variables is uncorrelated with the disturbances. The quadratic form in the statistic is used to test only K 2 = K K 1 hypotheses. In this case H is quadratic form of rank K 2. Eduardo Rossi c - Econometria Finanziaria 2010 33

Hausman Specification Test Since Z(Z Z) 1 Z is an idempotent matrix X = P Z X X X = X P Z P Z X = X P Z X = X X. Using this result and expanding d d = ( X X) 1 X y (X X) 1 X y = ( X X) 1 [ X y ( X X)(X X) 1 X y] = ( X X) 1 X [y X(X X) 1 X y] = ( X X) 1 X ǫ K 1 columns of X are in X. Suppose that these are the first K 1 columns. Thus the first K 1 rows of X ǫ are the same as the first K 1 rows of X ǫ which are 0. Eduardo Rossi c - Econometria Finanziaria 2010 34

Hausman Specification Test Thus we can write d = ( X X) 1 0 X 2ǫ = ( X X) 1 0 q the test statistic becomes H = d [( X X) 1 (X X) 1 ] 1 d s 2 = [ ] 0 q ( X X) 1 W( X X) 1 0 q where W = s 2 [( X X) 1 (X X) 1 ] Eduardo Rossi c - Econometria Finanziaria 2010 35

Hausman Specification Test An (n m) matrix, denoted by A is the generalized inverse of the (m n) A matrix if it satisfies AA A = A 1. A is not unique in general. 2. AA and A A are idempotent. 3. I m AA and I n A A 4. rank(a) = n A A = I n 5. rank(a) = m AA = I m Eduardo Rossi c - Econometria Finanziaria 2010 36

Hausman Specification Test The Moore-Penrose Inverse. The (n m) matrix A + is the Moore-Penrose (generalized) inverse of the (m n) A if it satisfies the following four conditions: 1. AA + A = A, 2. A + AA + = A + 3. (AA + ) H = AA + 4. (A + A) H = A + A Eduardo Rossi c - Econometria Finanziaria 2010 37

Hausman Specification Test Properties 1. A (m n): A + exists and is unique. 2. A (m n): (a) rank(a) = m AA + = I m (b) rank(a) = m A + = A H (AA H ) 1 (c) rank(a) = n A + A = I n (d) rank(a) = n A + = (A H A) 1 A H 3. A nonsingular (m m) A + = A 1. Eduardo Rossi c - Econometria Finanziaria 2010 38

Hausman Specification Test Denoting P = ( X X) 1 W( X X) 1 H = [ 0 q ]P 0 = q P 22 q q P 22 is the lower K 2 K 2 submatrix of P. Eduardo Rossi c - Econometria Finanziaria 2010 39

Hausman Specification Test An alternative approach, provided by Durbin (1954) and Wu (1973), circumvents the computation of the generalized inverse. As before, we want to test that plimd = 0. The idea of the DWH test is to check whether the difference β IV β OLS is significantly different from zero in the available sample. By construction the OLS residuals are orthogonal to the columns of X, in particular those in X 1. Eduardo Rossi c - Econometria Finanziaria 2010 40

Hausman Specification Test We know that d = ( X X) 1 X ǫ testing whether β IV β OLS is significantly different from zero is equivalent to testing whether the vector X ǫ is significantly different from zero. X ǫ = X P Z ǫ since X 1 Z X 1 ǫ = X 1M X y = 0 The test is thus concerned only with the K 2 elements of X 2P Z ǫ (or X 2P Z M X y) which will not in general be identically zero, but should not differ from it significantly under H 0. Eduardo Rossi c - Econometria Finanziaria 2010 41

Hausman Specification Test An F test statistic can be computed to test the joint significance of the elements of γ in the augmented regression y = Xβ + X 2 γ +ǫ This is equivalent to the Hausman test for this model. The OLS estimates of δ are, by the FWL Theorem, the same as those from the FWL regression of M X y on M X P Z X 2 δ = (X 2P Z M X P Z X 2 ) 1 X 2P Z M X y Since the inverted matrix is positive definite, we see that testing whether δ = 0 is equivalent to testing whether X 2P Z M X y = 0 as desired. Eduardo Rossi c - Econometria Finanziaria 2010 42