Lecture 11 Weak IV. Econ 715

Xiaoxia Shi

Instrument exogeneity and instrument relevance are two crucial requirements in empirical analysis using GMM. It now appears that in many applications of GMM and IV regression, the instruments are only weakly correlated with the included endogenous variables. The literature has noted that:

1. If instruments are weak, the sampling distributions of GMM and IV statistics are in general nonnormal, and standard inference is not reliable.
2. Features that make an instrument plausibly exogenous often also make it weak.
3. Weak instruments are not a small-sample problem. (Bound, Jaeger and Baker (1995) demonstrate one example with 329,000 observations.)
4. There are methods more robust than conventional GMM.

In this lecture note, we first describe the effect of weak instruments and then introduce some alternatives to 2SLS that are more robust. To focus on the main idea, we consider a linear regression model with fixed exogenous instruments $Z$ and normal errors:
$$ y_1 = \beta y_2 + u, \qquad y_2 = Z\Pi + v_2, $$
where $y_1$ and $y_2$ are $n \times 1$ vectors, $Z$ is an $n \times k$ matrix of instruments, and $(u_i, v_{2,i})' \sim N\!\left(0, \begin{bmatrix} \sigma_u^2 & \sigma_{uv} \\ \sigma_{uv} & \sigma_v^2 \end{bmatrix}\right)$. The reduced form of the structural equation is
$$ y_1 = Z\Pi\beta + v_1, \qquad y_2 = Z\Pi + v_2, \tag{1} $$
where $v_1 = \beta v_2 + u$. Then $(v_{1,i}, v_{2,i})' \sim N\!\left(0, \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}\right)$, where $\sigma_1^2 = \beta^2\sigma_v^2 + \sigma_u^2 + 2\beta\sigma_{uv}$, $\rho\sigma_1\sigma_2 = \beta\sigma_v^2 + \sigma_{uv}$, and $\sigma_2^2 = \sigma_v^2$. It is convenient to write $v_{1,i} = \sigma_1(\rho v_{2,i}/\sigma_2 + \sqrt{1-\rho^2}\, v_{0,i})$, where $v_0 \sim N(0, I_n)$ is independent of $v_2$; equivalently, $u = \sigma_{uv} v_2/\sigma_v^2 + \sigma_1\sqrt{1-\rho^2}\, v_0$.

Let $P_Z = Z(Z'Z)^{-1}Z'$. The 2SLS estimator is
$$
\hat\beta_{2SLS} = \frac{y_2'Z(Z'Z)^{-1}Z'y_1}{y_2'Z(Z'Z)^{-1}Z'y_2}
= \frac{(\Pi'Z' + v_2')P_Z(Z\Pi\beta + v_1)}{(\Pi'Z' + v_2')P_Z(Z\Pi + v_2)}
= \frac{\Pi'Z'Z\Pi\beta + \Pi'Z'v_1 + v_2'Z\Pi\beta + v_2'P_Z v_1}{\Pi'Z'Z\Pi + 2\Pi'Z'v_2 + v_2'P_Z v_2}
$$
$$
= \beta + \frac{\Pi'Z'\big(\sigma_{uv} v_2/\sigma_v^2 + \sigma_1\sqrt{1-\rho^2}\, v_0\big) + v_2'P_Z\big(\sigma_{uv} v_2/\sigma_v^2 + \sigma_1\sqrt{1-\rho^2}\, v_0\big)}{\Pi'Z'Z\Pi + 2\Pi'Z'v_2 + v_2'P_Z v_2}. \tag{2}
$$
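As a quick illustration of the behavior described above, here is a minimal simulation sketch (not part of the original notes) of the model with a small $\Pi$, comparing the sampling distributions of OLS and 2SLS; the sample size, number of instruments, error covariance, and the value of $\Pi$ are all illustrative choices.

```python
# Minimal sketch: simulate the model above with a weak first stage and compare
# OLS and 2SLS. All numerical values are illustrative, not from the notes.
import numpy as np

rng = np.random.default_rng(0)
n, k, beta = 200, 5, 1.0
sigma_u, sigma_v, sigma_uv = 1.0, 1.0, 0.8      # strong endogeneity
Pi = np.full(k, 0.02)                            # "weak" first stage; try 0.5 for a strong one
Z = rng.normal(size=(n, k))                      # held fixed across replications

def one_draw():
    cov = [[sigma_u**2, sigma_uv], [sigma_uv, sigma_v**2]]
    u, v2 = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    y2 = Z @ Pi + v2
    y1 = beta * y2 + u
    PZy2 = Z @ np.linalg.solve(Z.T @ Z, Z.T @ y2)      # P_Z y2
    b_2sls = (PZy2 @ y1) / (PZy2 @ y2)
    b_ols = (y2 @ y1) / (y2 @ y2)
    return b_2sls, b_ols

draws = np.array([one_draw() for _ in range(5000)])
print("2SLS: median bias %+.3f, 10-90%% range %.3f"
      % (np.median(draws[:, 0]) - beta, np.diff(np.percentile(draws[:, 0], [10, 90]))[0]))
print("OLS : median bias %+.3f" % (np.median(draws[:, 1]) - beta))
```

With a weak first stage, the 2SLS draws are centered near the OLS bias and are far from normally distributed, which is exactly the point made in the derivation above.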

Clearly, the 2SLS estimator is not unbiased, due to the noise $v_2$. In the extreme case where $\Pi = 0$,
$$
\hat\beta_{2SLS} = \frac{v_2'P_Z v_1}{v_2'P_Z v_2}
= \frac{v_2'P_Z(\rho v_2/\sigma_2 + \sqrt{1-\rho^2}\, v_0)\sigma_1}{v_2'P_Z v_2}
= \frac{\rho\sigma_1}{\sigma_2} + \sigma_1\sqrt{1-\rho^2}\,\frac{v_2'P_Z v_0}{v_2'P_Z v_2}
= \beta + \frac{\sigma_{uv}}{\sigma_v^2} + \sigma_1\sqrt{1-\rho^2}\,\frac{v_2'P_Z v_0}{v_2'P_Z v_2}. \tag{3}
$$

The OLS estimator is
$$
\hat\beta_{OLS} = \frac{y_2'y_1}{y_2'y_2}
= \frac{(\Pi'Z' + v_2')(Z\Pi\beta + v_1)}{(\Pi'Z' + v_2')(Z\Pi + v_2)}
= \frac{\Pi'Z'Z\Pi\beta + \Pi'Z'v_1 + v_2'Z\Pi\beta + v_2'v_1}{\Pi'Z'Z\Pi + 2\Pi'Z'v_2 + v_2'v_2}. \tag{4}
$$
In the extreme case where $\Pi = 0$,
$$
\hat\beta_{OLS} = \frac{v_2'v_1}{v_2'v_2} = \beta + \frac{\sigma_{uv}}{\sigma_v^2} + \sigma_1\sqrt{1-\rho^2}\,\frac{v_2'v_0}{v_2'v_2}. \tag{5}
$$
Therefore, in this extreme case, the OLS and 2SLS estimators have the same bias, but they clearly do not have the same distributional features. The OLS estimator converges to $\rho\sigma_1/\sigma_2 = \beta + \sigma_{uv}/\sigma_v^2$, whereas the 2SLS estimator does not converge to a deterministic limit as $n \to \infty$: it remains biased and nonnormal. In principle, under the normality and fixed-$Z$ assumptions, one can study the exact bias and exact distribution of the 2SLS estimator, but doing so is very cumbersome. It is easier to use asymptotics.

1 Weak IV Asymptotics

Weak IV asymptotics were introduced by Staiger and Stock (1997). To reflect instrument weakness, they propose deriving the asymptotic distribution under a drifting sequence of DGPs with $\Pi = C/\sqrt{n}$, where $C$ is a fixed $k \times 1$ vector. Under such a sequence,
$$
\hat\beta_{2SLS} - \beta = \frac{(\Pi'Z' + v_2')P_Z u}{(\Pi'Z' + v_2')P_Z(Z\Pi + v_2)} \;\stackrel{d}{\longrightarrow}\; \frac{(\lambda + W_{v_2})'W_u}{(\lambda + W_{v_2})'(\lambda + W_{v_2})}, \tag{6}
$$
where $(W_{v_2}', W_u')' \sim N(0, \Sigma \otimes I_k)/\sigma_v$ with $\Sigma = \mathrm{Var}\big((v_{2,i}, u_i)'\big)$, $\lambda = Q_{ZZ}^{1/2}C/\sigma_v$, and $Q_{ZZ} = \lim_{n\to\infty} Z'Z/n$. Clearly, the relative magnitudes of $\lambda'\lambda$ and $W_{v_2}'W_{v_2}$ determine how close to normal the right-hand side is. In this sense, the strength of the instruments is measured by $\lambda'\lambda$.
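To get a feel for the limit in (6), the following hedged sketch samples directly from its right-hand side for several values of $\lambda'\lambda$ (the direction of $\lambda$ and the covariance numbers are illustrative choices, and the parameterization follows the reconstruction of (6) above).

```python
# Hedged sketch: draw from the weak-IV limit in (6) for several concentration
# parameters lambda'lambda. Covariance values are illustrative.
import numpy as np

rng = np.random.default_rng(1)
k = 5
sigma_v, sigma_u, sigma_uv = 1.0, 1.0, 0.8
Sigma = np.array([[sigma_v**2, sigma_uv], [sigma_uv, sigma_u**2]])

def limit_draws(lam_sq, reps=100_000):
    lam = np.zeros(k)
    lam[0] = np.sqrt(lam_sq)                   # any lambda with lambda'lambda = lam_sq
    W = rng.multivariate_normal([0, 0], Sigma, size=(reps, k)) / sigma_v
    Wv2, Wu = W[:, :, 0], W[:, :, 1]
    num = ((lam + Wv2) * Wu).sum(axis=1)
    den = ((lam + Wv2) ** 2).sum(axis=1)
    return num / den                           # limiting (beta_hat - beta)

for lam_sq in (0.0, 10.0, 100.0):
    d = limit_draws(lam_sq)
    print(f"lambda'lambda = {lam_sq:6.1f}: median {np.median(d):+.3f}, "
          f"IQR {np.subtract(*np.percentile(d, [75, 25])):.3f}")
```

As $\lambda'\lambda$ grows, the draws concentrate around zero and look increasingly normal; at $\lambda'\lambda = 0$ they center on the OLS-type bias.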

The distribution of the right-hand side of the above display is still difficult to study. However, it simplifies if we then let the number of instruments $k$ go to infinity while keeping $\lambda'\lambda/k \to \Lambda$: the right-hand side of (6) then converges to
$$ \frac{\sigma_{uv}}{(\Lambda + 1)\sigma_v^2}, $$
which can be considered the asymptotic bias of the 2SLS estimator. Note that when $\Lambda = 0$, we are close to the extreme finite-sample case above, and the bias of 2SLS is the same as that of the OLS estimator.

(Footnote 1: The sequential asymptotics described here are not rigorous. To be rigorous, one should take $n$ and $k$ to infinity simultaneously while specifying a relative divergence rate between the two. Our result here, however, is correct under the conditions that $k = o(n)$, $k \to \infty$, and $\Pi'Z'Z\Pi/k \to \Lambda$; see Bekker (1994) or Chao and Swanson (2005). Under different conditions on the relative divergence rates of $k$ and $n$ and on the limiting behavior of $\Pi'Z'Z\Pi$, one may obtain different limits; see Hahn and Hausman (2005). Though this limit is obtained under large-$k$ asymptotics, it is argued that it characterizes the higher-order bias of the estimator well even when $k$ is not very large; see Bekker (1994).)

If $\Lambda$ could be consistently estimated, one could evaluate the bias of the 2SLS estimator, $\sigma_{uv}/((\Lambda + 1)\sigma_v^2)$, relative to that of the OLS estimator, $\sigma_{uv}/\sigma_v^2$. However, $\Lambda$ cannot be consistently estimated. Yet one can test the hypothesis
$$ H_0: \frac{\sigma_{uv}}{(\Lambda + 1)\sigma_v^2} = c \quad \text{versus} \quad H_1: \frac{\sigma_{uv}}{(\Lambda + 1)\sigma_v^2} < c $$
using the first-stage F-statistic. This is the idea proposed in Staiger and Stock (1997) and developed in Stock and Yogo (2005). Even though an F-statistic is used, the usual F critical value for the overall significance of the first-stage regression is too small, because the null hypothesis is no longer $\Pi = 0$ but rather that $\Pi'Z'Z\Pi/k$ equals some positive constant that depends on the relative bias one is willing to tolerate. Stock and Yogo (2005) provide a critical value table for a 5% test for tolerated relative bias levels $c = 5\%$, $10\%$, $15\%$, and $20\%$ and for different $k$. The critical value for the 5% tolerance level ranges from 13 to 21 as $k$ ranges from 3 to 30.

Besides bias, one might also want to know how far the usual 2SLS-based t-statistic is from $N(0,1)$; this determines how much the 2SLS-based t-test over-rejects. Using similar logic, Stock and Yogo also propose an F-statistic-based test of the null hypothesis $H_0$: a nominal 5% t-test over-rejects by $x\%$, versus the alternative that it over-rejects by more. The critical values for this hypothesis are also tabulated in Stock and Yogo (2005). For $x = 5$, the critical value ranges from 16 to 86 (!) as $k$ ranges from 1 to 30. (This probably means the 2SLS-based t-test usually doesn't work...)
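The statistic compared against the Stock and Yogo (2005) tables is simply the first-stage F-statistic for $H_0: \Pi = 0$. Below is a minimal sketch of its computation, assuming the homoskedastic, no-intercept first stage of this note; the tabulated critical values themselves are not reproduced here, and the data-generating numbers are illustrative.

```python
# Hedged sketch: first-stage F statistic for H0: Pi = 0 (homoskedastic, no intercept),
# to be compared against the Stock and Yogo (2005) tables for the relevant k.
import numpy as np

def first_stage_F(y2, Z):
    """F statistic for the joint significance of Z in y2 = Z*Pi + v2."""
    n, k = Z.shape
    Pi_hat = np.linalg.solve(Z.T @ Z, Z.T @ y2)
    fitted = Z @ Pi_hat
    rss = ((y2 - fitted) ** 2).sum()
    ess = (fitted ** 2).sum()          # explained sum of squares (no intercept)
    return (ess / k) / (rss / (n - k))

# example with simulated data and a weak first stage (values illustrative)
rng = np.random.default_rng(2)
n, k = 500, 5
Z = rng.normal(size=(n, k))
y2 = Z @ np.full(k, 0.05) + rng.normal(size=n)
print("first-stage F:", round(first_stage_F(y2, Z), 2))   # compare to the Stock-Yogo table for k = 5
```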

More robust estimators

There are estimators more robust than 2SLS. These can be summarized as the k-class estimators: they all have the same form and differ from each other only through an adjustment factor, denoted $\kappa$ here to avoid confusion with the number of instruments $k$. The k-class estimator is
$$ \hat\beta_{\kappa} = \frac{y_2'(I_n - \kappa M_Z)y_1}{y_2'(I_n - \kappa M_Z)y_2}, \qquad \text{where } M_Z = I_n - Z(Z'Z)^{-1}Z'. $$
Two well-known k-class estimators are:

1. The LIML estimator: $\kappa_{LIML}$ is the smallest root of the equation $\det(Y'Y - \kappa\, Y'M_Z Y) = 0$, where $Y = [y_1, y_2]$.
2. The Fuller-k estimator: $\kappa_{Fuller} = \kappa_{LIML} - c/(n - k)$, where $c$ is an adjustment constant.

The LIML and Fuller-k estimators are less biased than 2SLS, especially when $k$ is large. The LIML- and Fuller-based t-tests are also closer to $N(0,1)$ under the null than the 2SLS-based t-test, again especially for large $k$. One can see their superiority in Stock and Yogo (2005), where F-statistic-based weak-instrument tests are also designed for the LIML and Fuller estimators. For some reason, LIML is only tabulated for over-rejection, while Fuller is only tabulated for bias. For LIML and the null hypothesis that a nominal 5% t-test over-rejects by 5%, the critical value falls from 16 to 4 as $k$ rises from 1 to 30. For Fuller (with $c = 1$) and a 5% bias tolerance, the critical value falls from 24 to 2 as $k$ rises from 1 to 30. This suggests that whenever $k$ is greater than 2 and the instruments appear weak, one should not use the 2SLS estimator, but should try LIML or Fuller instead.
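The sketch below implements the k-class formula above, computing $\kappa_{LIML}$ as the smallest root of $\det(Y'Y - \kappa\, Y'M_Z Y) = 0$ via an eigenvalue problem; the Fuller constant $c = 1$ and the data-generating numbers are illustrative choices.

```python
# Hedged sketch of the k-class estimators: 2SLS (kappa = 1), LIML, and Fuller (c = 1).
import numpy as np

def k_class(y1, y2, Z, kappa):
    n = len(y1)
    MZ = np.eye(n) - Z @ np.linalg.solve(Z.T @ Z, Z.T)
    A = np.eye(n) - kappa * MZ
    return (y2 @ A @ y1) / (y2 @ A @ y2)

def kappa_liml(y1, y2, Z):
    n = len(y1)
    Y = np.column_stack([y1, y2])
    MZ = np.eye(n) - Z @ np.linalg.solve(Z.T @ Z, Z.T)
    # roots of det(Y'Y - kappa * Y'M_Z Y) = 0 are the eigenvalues of (Y'M_Z Y)^{-1} Y'Y
    roots = np.linalg.eigvals(np.linalg.solve(Y.T @ MZ @ Y, Y.T @ Y))
    return roots.real.min()

# illustrative data with many, moderately weak instruments
rng = np.random.default_rng(3)
n, k, beta = 300, 10, 1.0
Z = rng.normal(size=(n, k))
u, v2 = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=n).T
y2 = Z @ np.full(k, 0.1) + v2
y1 = beta * y2 + u

kl = kappa_liml(y1, y2, Z)
print("2SLS  :", k_class(y1, y2, Z, 1.0))                 # kappa = 1 reproduces 2SLS
print("LIML  :", k_class(y1, y2, Z, kl))
print("Fuller:", k_class(y1, y2, Z, kl - 1.0 / (n - k)))  # Fuller with c = 1
```

Setting $\kappa = 0$ gives OLS and $\kappa = 1$ gives 2SLS, so the class nests both.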

2 Fully Robust Inference

Although the LIML and Fuller estimators are more robust to weak instruments than 2SLS, they are not fully robust: they are biased, and their t-statistics are nonnormal, if the strength of the instruments falls below the weak-instrument thresholds tabulated in Stock and Yogo (2005). One question is: is there a fully robust estimator? That is, is there an estimator $\hat\beta_n$ that is consistent regardless of instrument strength, and a t-test based on it whose null distribution is $N(0,1)$ regardless of instrument strength? It is quite easy to see that there cannot be such an estimator: when the instrument strength is zero ($\Pi = 0$), the parameter $\beta$ is not identified, and a non-identified parameter cannot be consistently estimated. Without consistency, there can be no valid t-test either.

However, there are fully robust procedures for testing the hypothesis $H_0: \beta = \beta_0$. One such test is the Anderson-Rubin (AR) test. The AR test considers the regression model
$$ y_1 - \beta_0 y_2 = Z\delta + u, $$
rewrites $H_0$ as $H_0: \delta = 0$, and uses the statistic
$$ AR = \frac{(y_1 - \beta_0 y_2)'P_Z(y_1 - \beta_0 y_2)}{(y_1 - \beta_0 y_2)'M_Z(y_1 - \beta_0 y_2)/(n - k - 1)}. $$
Under $H_0$,
$$ AR = \frac{u'P_Z u}{(u'u - u'P_Z u)/(n - k - 1)} \;\stackrel{d}{\longrightarrow}\; \chi^2(k). $$
Thus the AR test uses the $1-\alpha$ quantile of $\chi^2(k)$ as its critical value and rejects $H_0$ if and only if $AR > \chi^2_{1-\alpha}(k)$. The AR test is fully robust to weak instruments because the null distribution of the AR statistic does not depend on $\Pi$ at all. Assuming that $u_i$ is normal with mean zero and homoskedastic variance, and that $Z$ is fixed, the AR statistic is in fact pivotal under the null: its null distribution is fully known. Under the alternative $\beta \neq \beta_0$,
$$ AR = \frac{\big(u + (\beta - \beta_0)Z\Pi\big)'P_Z\big((\beta - \beta_0)Z\Pi + u\big)}{(u'u - u'P_Z u)/(n - k - 1)}, $$
so the power of the AR test depends on $(\beta - \beta_0)^2\,\Pi'Z'Z\Pi/\sigma_u^2$. When $\Pi = 0$, that is, when the instruments are completely irrelevant, the AR test has no power. This is evidence that the AR test is robust, because in that case no test should have power; the same feature is shared by all the fully robust tests discussed below.

A fully robust confidence set for $\beta$ can be constructed by inverting the AR test.
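Here is a minimal sketch of the AR statistic and of the confidence set obtained by inverting the test over a grid of $\beta_0$ values, under the homoskedastic fixed-$Z$ setup of this note; the grid bounds and data-generating numbers are illustrative.

```python
# Hedged sketch: AR test and its inversion into a confidence set over a grid of beta0.
import numpy as np
from scipy.stats import chi2

def ar_stat(beta0, y1, y2, Z):
    n, k = Z.shape
    e = y1 - beta0 * y2
    PZe = Z @ np.linalg.solve(Z.T @ Z, Z.T @ e)
    num = e @ PZe
    return num / ((e @ e - num) / (n - k - 1))

# illustrative data with weak instruments
rng = np.random.default_rng(4)
n, k, beta = 300, 5, 1.0
Z = rng.normal(size=(n, k))
u, v2 = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=n).T
y2 = Z @ np.full(k, 0.05) + v2
y1 = beta * y2 + u

cv = chi2.ppf(0.95, df=k)
grid = np.linspace(-3.0, 5.0, 801)
kept = np.array([b0 for b0 in grid if ar_stat(b0, y1, y2, Z) <= cv])
print("AR at true beta: %.2f (chi2 5%% critical value: %.2f)" % (ar_stat(beta, y1, y2, Z), cv))
print("95%% AR confidence set contains %d of %d grid points" % (kept.size, grid.size))
```

With very weak instruments the inverted set can be very wide, or even the whole grid, which is the honest answer when the data carry little information about $\beta$.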

3 Conditional Test

The AR test has very good power and is in fact shown to be uniformly most powerful unbiased when $k = 1$ in Moreira (2001). However, its power deteriorates quickly as $k$ increases, because its critical value increases quickly with $k$. Better tests have been developed using the conditional testing idea proposed in Moreira (2001, 2003).

To introduce the conditional test, we first review the concept of sufficient statistics. Consider a parametric family of distributions $\{f(w, \theta) : \theta \in \Theta\}$ for the observable $W$.

Definition. A statistic $T(W_1, \dots, W_n)$ is a sufficient statistic for a parameter $\theta$ if the conditional distribution of $W_1, \dots, W_n$ given $T$ does not depend on $\theta$. Example: for an i.i.d. Bernoulli sample $X_1, \dots, X_n$, the number of successes is a sufficient statistic for the success rate.

Factorization Theorem. Let $f_W(w; \theta)$ denote the pdf of a random variable (or vector) $W$. A statistic $T(W)$ is a sufficient statistic for $\theta$ if and only if there exist functions $g(t; \theta)$ and $h(w)$ such that the pdf can be written as $f_W(w; \theta) = g(T(w); \theta)\, h(w)$ for all $w$ and all values of $\theta$.

Now consider the reduced-form model
$$ y_1 = Z\Pi\beta + v_1, \qquad y_2 = Z\Pi + v_2, \tag{7} $$
where $v_1 = \beta v_2 + u$. Then $(v_{1,i}, v_{2,i})' \sim N(0, \Omega)$ with $\Omega = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}$, where $\sigma_1^2 = \beta^2\sigma_v^2 + \sigma_u^2 + 2\beta\sigma_{uv}$, $\rho\sigma_1\sigma_2 = \beta\sigma_v^2 + \sigma_{uv}$, and $\sigma_2^2 = \sigma_v^2$. Suppose that $\Omega$ is known and $Z$ is nonrandom. The likelihood of this model (which is also the joint density of $y_1$ and $y_2$) is
$$ f(y_1, y_2; \beta, \Pi) = (2\pi)^{-n}\,|\Omega|^{-n/2} \exp\left( -\tfrac{1}{2}\left[ \sum_{i=1}^n Y_i'\Omega^{-1}Y_i - 2\Pi'Z'Y\Omega^{-1}a + (a'\Omega^{-1}a)\,\Pi'Z'Z\Pi \right] \right), \tag{8} $$
where $Y_i = (y_{1,i}, y_{2,i})'$, $Y = [y_1, y_2]$, and $a = (\beta, 1)'$. By the Factorization Theorem, $Z'Y$ is a sufficient statistic for $(\beta, \Pi')$.

Let $D = [b_0, \Omega^{-1}a_0]$, where $b_0 = (1, -\beta_0)'$ and $a_0 = (\beta_0, 1)'$. Then $D$ is invertible, and thus $Z'YD = [Z'(y_1 - \beta_0 y_2),\; Z'Y\Omega^{-1}a_0]$ is also a sufficient statistic. Moreover, it can be shown that $S := Z'(y_1 - \beta_0 y_2)$ and $T := Z'Y\Omega^{-1}a_0$ are independent normal random vectors. Under the null, the first vector is simply $Z'u$, and its distribution does not depend on $\Pi$; the distribution of the second vector does depend on $\Pi$. According to the Rao-Blackwell theorem (e.g. Theorem 7.3.1 of Hogg, McKean and Craig, 2005), there is no loss of information in considering only test statistics that are functions of the sufficient statistics.

For any test statistic that is a function of $(S, T)$, say $\psi(S, T; \beta_0)$, the null distribution is not pivotal: it depends on the nuisance parameter $\Pi$, because the distribution of $T$ depends on $\Pi$. However, because $S$ and $T$ are independent, the conditional null distribution of $\psi(S, T; \beta_0)$ given $T = t$ is simply the distribution of $\psi(S, t; \beta_0)$, and it is pivotal: it does not depend on $\Pi$. The distribution of $\psi(S, t; \beta_0)$ is known because $Z$ is fixed and the distribution of $u$ is assumed known, so its $1-\alpha$ quantile is also known. Denote this quantile by $c(t, 1-\alpha; \beta_0)$. If the distribution of $\psi(S, t; \beta_0)$ is continuous and strictly increasing at its $1-\alpha$ quantile, then
$$ \Pr\big(\psi(S, t; \beta_0) > c(t, 1-\alpha; \beta_0)\big) = \alpha \tag{9} $$
for any $t$. This implies that $\Pr\big(\psi(S, T; \beta_0) > c(T, 1-\alpha; \beta_0)\big) = \alpha$, and this holds regardless of $\Pi$. This powerful technique allows us to construct a fully robust test from any test statistic that is a function of $S$ and $T$.
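The conditional recipe in (9) is easy to implement by simulation when $\Omega$ is known: compute $S$ and $T$ from the data, then redraw $S$ from its null distribution $N(0, b_0'\Omega b_0 \cdot Z'Z)$ while holding $T$ fixed at its observed value. The helper below is a hypothetical sketch of this recipe (not part of the original notes); psi is a placeholder for any statistic that is a function of $(S, T)$, such as those listed below.

```python
# Hedged sketch of the conditional critical value in (9), with Omega treated as known.
import numpy as np

def conditional_critical_value(psi, y1, y2, Z, beta0, Omega, alpha=0.05, reps=20_000, seed=0):
    rng = np.random.default_rng(seed)
    Y = np.column_stack([y1, y2])
    b0 = np.array([1.0, -beta0])
    a0 = np.array([beta0, 1.0])
    S = Z.T @ (y1 - beta0 * y2)                  # = Z'Y b0
    T = Z.T @ Y @ np.linalg.solve(Omega, a0)     # = Z'Y Omega^{-1} a0
    sigma_u2 = b0 @ Omega @ b0                   # Var(u_i) under H0
    # null draws of S given T = t: S ~ N(0, sigma_u2 * Z'Z), independent of T
    draws = rng.multivariate_normal(np.zeros(Z.shape[1]), sigma_u2 * (Z.T @ Z), size=reps)
    null_stats = np.array([psi(s, T) for s in draws])
    return psi(S, T), np.quantile(null_stats, 1 - alpha)
```

The function returns the observed statistic together with its simulated conditional critical value, so the test rejects when the first output exceeds the second.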

A few examples of test statistics that are functions of $S$ and $T$ are as follows:

1. The AR test statistic with known $\Omega$:
$$ AR = \frac{(y_1 - \beta_0 y_2)'P_Z(y_1 - \beta_0 y_2)}{\sigma_u^2} = S'(Z'Z)^{-1}S/\sigma_u^2 \sim \chi^2(k). $$
The AR statistic does not depend on $T$. Thus $c_{AR}(T, 1-\alpha; \beta_0) = \chi^2_{1-\alpha}(k)$.

2. The 2SLS-based t-statistic:
$$ t_n = \frac{y_2'Z(Z'Z)^{-1}Z'(y_1 - \beta_0 y_2)}{\sigma_u\sqrt{y_2'Z(Z'Z)^{-1}Z'y_2}} = \frac{(T - c_1 S)'(Z'Z)^{-1}S}{\sigma_u\sqrt{(T - c_1 S)'(Z'Z)^{-1}(T - c_1 S)/c_2}}, $$
for some constants $c_1, c_2$ that depend on $\Omega$ and $\beta_0$. Thus the critical value $c_{t_n}(t, 1-\alpha; \beta_0)$ can be taken to be the $1-\alpha$ quantile of
$$ \frac{(t - c_1 Z'u)'(Z'Z)^{-1}Z'u}{\sigma_u\sqrt{(t - c_1 Z'u)'(Z'Z)^{-1}(t - c_1 Z'u)/c_2}}, \qquad Z'u \sim N(0, \sigma_u^2 Z'Z). $$
This critical value can easily be simulated.

3. The LR statistic. The usual LR statistic based on the likelihood written above can be expressed as
$$ LR = \tfrac{1}{2}\left(S'S - T'T + \sqrt{(S'S - T'T)^2 + 4(S'T)^2}\right). $$
The critical value is the $1-\alpha$ quantile of $\tfrac{1}{2}\big(S'S - t't + \sqrt{(S'S - t't)^2 + 4(S't)^2}\big)$ evaluated at $t = T$. It, too, can easily be simulated, because $S \sim N(0, \sigma_u^2 Z'Z)$ with $\sigma_u^2 = b_0'\Omega b_0$.

4. The LM statistic:
$$ LM = \frac{(S'T)^2}{T'Z'ZT\,\sigma_u^2}. $$
The critical value is $\chi^2_{1-\alpha}(1)$ because $LM \mid T = t \sim \chi^2(1)$.

Now that the conditional technique gives us a number of fully robust tests, which test is the best? Andrews and Stock (2005) recommend the conditional LR (CLR) test, because numerical evidence shows that the CLR test's power function is close to the power envelope derived in Andrews, Moreira and Stock for two-sided invariant similar tests.
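Putting the pieces together, here is a self-contained hedged sketch of the conditional LR (CLR) test: it computes $S$, $T$, the LR statistic from the formula above, and a simulated conditional critical value by redrawing $S \sim N(0, b_0'\Omega b_0 \cdot Z'Z)$ with $T$ held fixed. The data-generating numbers and the known $\Omega$ are illustrative choices, not values from the notes.

```python
# Hedged sketch of the CLR test with known Omega, fixed Z, and homoskedastic errors.
import numpy as np

def clr_stat(S, T):
    ss, tt, st = S @ S, T @ T, S @ T
    return 0.5 * (ss - tt + np.sqrt((ss - tt) ** 2 + 4.0 * st ** 2))

def clr_test(y1, y2, Z, beta0, Omega, alpha=0.05, reps=20_000, seed=0):
    rng = np.random.default_rng(seed)
    b0, a0 = np.array([1.0, -beta0]), np.array([beta0, 1.0])
    S = Z.T @ (y1 - beta0 * y2)
    T = Z.T @ np.column_stack([y1, y2]) @ np.linalg.solve(Omega, a0)
    sigma_u2 = b0 @ Omega @ b0
    S_null = rng.multivariate_normal(np.zeros(Z.shape[1]), sigma_u2 * (Z.T @ Z), size=reps)
    cv = np.quantile([clr_stat(s, T) for s in S_null], 1 - alpha)
    return clr_stat(S, T), cv

# illustrative data with weak instruments, generated from the reduced form (7)
rng = np.random.default_rng(5)
n, k, beta = 300, 5, 1.0
Omega = np.array([[3.6, 1.8], [1.8, 1.0]])    # reduced-form error covariance, treated as known
Z = rng.normal(size=(n, k))
v1, v2 = rng.multivariate_normal([0, 0], Omega, size=n).T
y2 = Z @ np.full(k, 0.05) + v2
y1 = Z @ np.full(k, 0.05) * beta + v1

lr, cv = clr_test(y1, y2, Z, beta0=beta, Omega=Omega)
print("CLR = %.2f, conditional 5%% critical value = %.2f, reject H0: %s" % (lr, cv, lr > cv))
```

Because the critical value is computed conditionally on the observed $T$, the test has correct size whether the instruments are strong or arbitrarily weak, which is exactly the point of the conditional approach.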