Instrument endogeneity, weak identification, and inference in IV regressions

Instrument endogeneity, weak identification, and inference in IV regressions

Firmin Doko Tchatoka
The University of Adelaide

August 18, 2014

School of Economics, The University of Adelaide, 10 Pulteney Street, Adelaide SA 5005; e-mail: firmin.dokotchatoka@adelaide.edu.au

ABSTRACT

We study the possibility of making exact inference in structural models where: (a) instrumental variables (IVs) may be arbitrarily weak, collinear, and invalid; (b) the errors may have non-Gaussian distributions (possibly heavy-tailed and heteroskedastic); and (c) the reduced-form specification may be arbitrarily heterogeneous, nonlinear, unspecified, and incomplete (missing instruments). We provide the necessary and sufficient conditions under which such models are identifiable, despite instrument invalidity. Under these conditions, Wald-type tests and confidence sets (CSs) based on k-class type estimators may apply. However, these conditions rule out models in which IVs are weak and are, moreover, difficult to check in practice. To alleviate these drawbacks, we develop identification-robust procedures to test and build CSs for the model coefficients. CSs for individual components of the structural and instrument-endogeneity parameters are obtained by projection. Tests of exclusion restrictions and instrument selection are covered as instances of the class of proposed procedures.

Key words: Instrument endogeneity; weak instruments; identification-robust inference; finite-sample; non-Gaussian errors; projection method; exact Monte Carlo tests.

JEL classification: C12; C13; C36.

1. Introduction

This paper contributes to the literature on weak instruments by developing exact tests and confidence sets (CSs) in IV regressions where: (i) instrumental variables may be arbitrarily weak, collinear, and invalid; (ii) the errors may have non-Gaussian distributions (possibly heavy-tailed and heteroskedastic); and (iii) the reduced-form specification may be heterogeneous, nonlinear, unspecified, and incomplete (omitted instruments).

IV methods usually require at least as many exogenous instruments as there are coefficients to be estimated, yet the validity of those instruments is not testable. In the last two decades, the so-called weak instruments problem has received considerable attention in econometrics. Research on this topic is widespread, and most studies impose the exclusion restrictions.[1] Several studies of weak instruments have recently questioned the validity of the strict exogeneity assumption.[2] For example, Murray (2006) states that in most IV applications the instruments often arrive "with a dark cloud of invalidity hanging overhead," and researchers usually do not know whether their correlations with the error are exactly zero. He suggests avoiding invalid instruments in IV procedures. However, as it is difficult to test the validity of all candidate instruments, it might seem that if we want to avoid invalid instruments there is little hope in using IV methods at all. Bound et al. (1995, Section 3) provide evidence on how a slight violation of instrument exogeneity can cause severe bias in IV estimates, especially when identification is weak. Hausman and Hahn (2005) show that, even in large samples, the IV estimator can have a substantial bias when the instruments are only slightly correlated with the error. Doko Tchatoka and Dufour (2008) and Guggenberger (2011) show that

[1] For example, see Phillips (1989), Nelson and Startz (1990a, 1990b), Choi and Phillips (1992), Bekker (1994), Hall, Rudebusch and Wilcox (1996), Dufour (1997, 2003, 2009), Staiger and Stock (1997), Wang and Zivot (1998), Stock and Wright (2000), Donald and Newey (2001), Dufour and Jasiak (2001), Kleibergen (2002, 2004, 2005), Moreira (2003), Stock, Wright and Yogo (2002), Hall and Peixe (2003), Stock and Yogo (2005), Dufour and Taamouti (2005, 2007), Swanson and Chao (2005), Andrews and Stock (2007a, 2007b), Guggenberger and Smith (2005), Andrews, Moreira and Stock (2006), Dufour and Hsiao (2008), Hansen, Hausman and Newey (2008), Moreira, Porter and Suarez (2009), Chaudhuri and Zivot (2010), Dufour, Khalaf and Beaulieu (2010), Guggenberger (2011), Guggenberger, Kleibergen, Mavroeidis and Chen (2012), Dufour, Khalaf and Kichian (2013), Mikusheva (2010, 2013), Doko Tchatoka and Dufour (2014), and Doko Tchatoka (2014).
[2] See Bound, Jaeger and Baker (1995), Brock and Durlauf (2001), Imbens (2003), Hausman and Hahn (2005), Murray (2006), Kiviet and Niemczyk (2007, 2012), Doko Tchatoka and Dufour (2008), Kraay (2008), Ashley (2009), Bazzi and Clemens (2009), Hahn, Ham and Moon (2010), Guggenberger (2011), and Berkowitz, Caner and Fang (2008, 2012).

the Anderson and Rubin (1949) (AR) and Kleibergen (2002) (K) tests are highly sensitive to instrument invalidity.

In this paper, we stress the fact that valid tests and CSs can be obtained in IV regressions in which the exclusion restrictions are violated. Several studies have adopted the same position, and we wish to make progress in this direction. Imbens (2003) shows that bounds on the average treatment effect in program evaluation can be recovered via a sensitivity analysis of the correlations between treatment and the unobserved components of the outcomes. Ashley (2009) shows how the discrepancy between OLS and IV estimates can be used to estimate the degree of bias under any given assumption about the degree to which the IVs violate the exclusion restrictions. Kiviet and Niemczyk (2007, 2012) show that realizations of the IV estimator based on strong but invalid instruments seem much closer to the true parameter values than those obtained from valid but weak instruments. Doko Tchatoka (2013) shows that bootstrapping improves the size of Durbin-Wu-Hausman tests of exogeneity when IVs are invalid. Imbens et al. (2011) show that the Donald and Newey (2001) bias-corrected estimator and the Phillips and Hale (1977) jackknife IV estimator can be consistent and asymptotically normal even when the exclusion restrictions are violated. Their framework, however, rules out weak-instrument issues. Berkowitz, Caner and Fang (2012) show that re-sampling the Anderson and Rubin (1949) AR-statistic yields a test that has correct level asymptotically under local-to-zero instrument endogeneity.[3] However, their method is valid only in large samples and is overly conservative.

By contrast, we develop a finite-sample procedure for testing and building CSs in IV regressions where: IVs can be arbitrarily weak, collinear, and violate the exclusion restrictions; the errors may have non-Gaussian distributions (possibly heavy-tailed and heteroskedastic); and the reduced-form specification may be arbitrarily heterogeneous, nonlinear, unspecified, or incomplete. To be more specific, we consider a model of the form

y_1 = y_2 β + X_1 γ_1 + u,    u = X_2 γ + e,

where y_1 is an observed dependent variable, y_2 is an observed (possibly) endogenous regressor, X_1 is a matrix of exogenous variables, and X_2 is a matrix of instruments which may be

[3] The parameter that controls instrument endogeneity goes to zero [at rate n^(-1/2)] as the sample size n increases.

rank-deficient and violate the exclusion restrictions if γ ≠ 0, and e is an error term. We call γ the instrument-endogeneity parameter because it determines which variables in X_2 are valid instruments and which are not. We observe that a procedure similar to that of Anderson and Rubin (1949) can be used to develop identification-robust tests and CSs on θ = (β, γ')'. So, identification-robust CSs for each component of β and γ can be derived through the projection method.[4] When the error e follows a Gaussian distribution and is independent of X, we show that the standard Fisher-type critical values are applicable. But for a wide class of parametric non-Gaussian errors (possibly heavy-tailed and heteroskedastic), we supply exact Monte Carlo test[5] critical values. We provide the analytical forms of the proposed CSs for θ and for scalar linear transformations of θ, and characterize the necessary and sufficient conditions under which they are bounded. Tests of exclusion restrictions and instrument selection are covered as instances of the class of proposed procedures, including in exactly identified models.

The remainder of this paper is organized as follows. Section 2 formulates the model and the related assumptions. Section 3 studies the identification of the structural parameters with invalid instruments. Section 4 develops finite-sample tests and CSs with correct level, even in the presence of non-Gaussian errors. Section 5 deals with the Monte Carlo experiment, while Section 6 presents the empirical application. Conclusions are drawn in Section 7 and proofs are presented in the Appendix.

Throughout this paper, I_q stands for the identity matrix of order q. For any n × m matrix A, P_A = A(A'A)^+ A' is the projection matrix onto the space spanned by the columns of A, and M_A = I_n - P_A, where B^+ refers to the Moore-Penrose inverse of the matrix B. rank(A) denotes the rank of the matrix A, while ||A|| = [tr(A'A)]^(1/2) denotes the usual Euclidean (Frobenius) norm of A. For a square matrix B, B > 0 means that B is positive definite (p.d.). The symbol =_d signifies equivalence in distribution. The orthogonal group of p × p matrices is denoted by O(p) = {H ∈ M(p, p) : H'H = I_p}, where M(p, p) is the set of all square matrices of order p. Finally, for any n × m matrix Ω, Ker(Ω) = {ω ∈ R^m : Ωω = 0} is the null space (kernel) of Ω, and Im(Ω) = {x ∈ R^n : x = Ωω for some ω ∈ R^m} is the column space of Ω.

[4] See Dufour and Jasiak (2001), Dufour and Taamouti (2005), and Doko Tchatoka and Dufour (2014).
[5] See Dufour (2006).
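As a quick illustration of this notation (not part of the paper), the following numpy fragment computes P_A, M_A, rank(A), and ||A|| for a deliberately rank-deficient A; the Moore-Penrose inverse is what keeps P_A well defined in that case.

```python
import numpy as np

# Illustrative only: A is a rank-deficient 3 x 2 matrix, so A'A is singular
# and the Moore-Penrose inverse is needed to form the projector P_A.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 0.0]])
P_A = A @ np.linalg.pinv(A.T @ A) @ A.T      # P_A = A (A'A)^+ A'
M_A = np.eye(A.shape[0]) - P_A               # M_A = I_n - P_A
print(np.allclose(P_A @ A, A))               # P_A reproduces the columns of A
print(np.linalg.matrix_rank(A))              # rank(A) = 1 here
print(np.sqrt(np.trace(A.T @ A)))            # ||A|| = [tr(A'A)]^(1/2)
```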

2. Model and assumptions

We consider a standard linear IV regression with one endogenous right-hand-side (rhs) variable, k_1 exogenous variables, and k_2 IVs. The sample size is n. The model consists of a structural equation and a reduced-form equation:

(2.1)  y_1 = y_2 β + X_1 γ_1 + u,
(2.2)  y_2 = X_1 π_1 + X_2 π_2 + v_2,

where y_1, y_2 ∈ R^n, X_1 ∈ R^(n × k_1), and X_2 ∈ R^(n × k_2) (k_2 ≥ 1) are observed variables; u, v_2 ∈ R^n are unobserved errors; and β ∈ R, γ_1 ∈ R^(k_1), π_2 ∈ R^(k_2), and π_1 ∈ R^(k_1) are unknown fixed parameters. Let Y = [y_1 : y_2] = [Y_1, ..., Y_n]' ∈ R^(n × (G+1)) and X = [X_1 : X_2] = [X_1, ..., X_n]' ∈ R^(n × k) (k = k_1 + k_2) denote the matrices of endogenous variables and instruments, respectively. We define Y_t ∈ R^2 and X_t ∈ R^k as the t-th rows of Y and X, written as column vectors, and similarly for other random matrices. We make the following assumptions on the model variables.

Assumption A. For some fixed vector γ in R^(k_2), we have

(2.3)  u = X_2 γ + e,

where e ∈ R^n is an error term.

Assumption A implies that X_2 violates the usual exclusion restrictions if γ ≠ 0. If the X_{2t} (t = 1, 2, ..., n) have the same finite second moments and e is uncorrelated with X_2, then cov(X_{2t}, u_t) ≠ 0 whenever γ ∉ Ker(Cov(X_2)), where Cov(X_2) = E[(X_{2t} - µ_{X_2})(X_{2t} - µ_{X_2})'] is the covariance[6] matrix of X_{2t} and µ_{X_2} = E(X_{2t}). Therefore, some variables in X_2 do not constitute valid instruments. Because of this property, we call γ the instrument-endogeneity parameter. The usual tests of exclusion restrictions, such as those of Sargan (1958), Basmann (1960), and Hansen (1982), typically test the null hypothesis that γ = 0 in (2.3); see Staiger and Stock (1997) and Hahn et al. (2010). Under Assumption A, the conditional mean and variance of u_t, given X_t, depend on X_t if γ ≠ 0 (conditional structural heteroskedasticity). Staiger and Stock (1997) and Guggenberger (2011) make a similar assumption with γ = γ_0/n^(1/2) for some fixed vector γ_0 ∈ R^(k_2) (local-to-zero instrument en-

[6] See Anderson (1971, Section 2.3) and Muirhead (2005, Section 1.2) for a similar definition and notation.

7 dogeneity). 7 Doko Tchatoka and Dufour (8) show the Anderson and Rubin (1949) (AR)-test and Kleibergen () (K)-test are highly size distorted under Assumption A. Imbens et al. (11) show that Donald and Newey (1) bias-corrected estimator and Phillips and Hale (1977) jackknife-instrumental-variables estimator may still be consistent and asymptotically normal under Assumption A. But their framework assumes strong instruments (i.e., π ), thus ruling out issues associated with weak instruments. Assumption B rank(x 1 )=k 1 and rank(x )=ν k for some integer ν >. Assumption B imposes full-column rank on the matrix of exogenous variables X 1, but allows X to have any arbitrary rank ν >. For example, some linear combinations of the columns of X may be collinear or close to being so. Dufour and Taamouti (7) also consider a similar setup. Note that there no impediment to expanding the full-column rank assumption of X 1 to any arbitrary rank ν 1. Under Assumption B, we may also have < rank(x) = ν k. In the remainder of this paper, W = M X1 X, where M X1 = I n X 1 (X 1 X 1) 1 X 1, denotes the residuals of the regression of X on the columns of X 1. Assumption C (i) { (et, v t,x t) : t = 1,..., n } are i.i.d. E[(e t,v t ) X t ]= t = 1,..., n; and(iii)e(w t W t)=ω W t = 1,..., n. across t n and n; (ii) Assumption C-(i) and (ii) are widely used in the IV literature; see Staiger and Stock (1997), Stock and Wright (), Kleibergen (, 5), Andrews et al. (6), Guggenberger et al. (1). (i) states that the errors and IVs are random and i.i.d across i nand n, while (ii) is the usual conditional zero mean assumption of the errors. Assumption C-(iii) requires the existence of same second moments for each row of W. Note that Ω W may not be positive definite and can be singular. In particular, this is the case when X is rankdeficient. No assumption on the existence of second moments or more for the errors(e,v ) is needed. Now, consider the linear map defined by (.4) R n R n e ε σ (e)=σ(x)e, 7 Also, see Berkowitz, Caner and Fang (8, 1). 5

8 where σ(x) is possibly a random function of X such that the event {σ(x) X} has probability 1 a.s., e is the error term defined in(.3). Note that σ(x) need not to be constant and the distribution of ε σ (e) may arbitrarily depend on X. For the purpose of developing finite-sample theory, we make the following assumption. Assumption D There is σ (X) such that ε σ (e) satisfies (.4) and ε σ (e) d ε, where given X = x, ε has a completely specified distribution (x). P ε Assumption D states that the conditional distribution, given X, of the error in the regression of u on X only depends on X and a (typically unknown) possibly random scale factor σ (X). This assumption holds whenever e is independent of X with a distribution of the form e ε/σ d, where ε has a specified distribution and σ is an unknown positive constant. In this context, the standard Gaussian model is obtained by taking (.5) ε N(, I n ). But non-gaussian distributions which may be heteroskedastic and lack moments (such as the Cauchy or Student t distributions) are covered. Under Assumption A, we can write model (.1)-(.) as: (.6) (.7) y 1 = y β + X 1 γ 1 + X γ + e, y = X 1 π 1 + X π + v wheree[(e : v ) X]= by Assumption C. Let θ =(β,γ ) and δ =(θ,γ 1,π 1,π ) Θ R R k R k1 R k, where Θ is the parameter space. The statistical model associated with (.6)-(.7) is defined as (Y X,{P δ, δ Θ}), where Y and X are drawn from Y and X, respectively. For any random variable Z (possibly function of δ ), P Z (x;δ) denotes the distribution of Z conditional on X = x and we write Z X=x P Z (x;δ). We consider the problem of testing (.8) H θ : θ = θ vs. H θ 1 : θ θ, for some fixed θ =(β,γ ). 6

9 Our main focus is on finite-sample and we are concerned with developing similar tests for H θ and confidence sets for β and γ when some instruments in X may be arbitrary weak, invalid, or collinear. But before proceeding, it will illuminating to study the identification of β in the presence of possibly invalid instruments first. 3. Identification of β We study the identification of the structural coefficient (β) when some instruments in X may be invalid or collinear. If the exclusion restrictions (γ = ) are satisfied, X has fullcolumn rank with probability 1, and [e,v ] has mean zero, then the weak IV literature documents that the necessary and sufficient condition for the identification of β is π ; 8 see Stock et al. (), Dufour (3), Andrews and Stock (7a), Dufour and Hsiao (8), and Mikusheva (13). Here, we investigate the identification of β when γ is left unrestricted (possibly invalid instruments) and X may contains redundant columns (ν < k ) or close to being so. First, we can write the reduced-form for Y =[y 1 : y ] as: (3.1) Y = X 1 ξ 1 + X ξ +V with V =[v 1 : v ], where v 1 = v 1 (β) = v β + e, ξ 1 = (ξ 11 : π 1 ) = (γ 1 + π 1 β : π 1 ), and ξ = (ξ 1 : π ) = (γ + π β : π ). Suppose first that rank(x )=ν = k and E([e : v ]/X)=. Hence, the least squares estimators of the coefficients on X j ( j = 1,) in each regression of (3.1) are unique. On expressing the coefficients on X in (3.1) as ξ =(ξ 1 : π )=(γ : )+π a, where a = (β,1), it can be seen that ξ 1 is proportional to β (with factor π ) if γ =. Since ξ is identifiable, β is identifiable whenever π if γ =. However, if γ is left unrestricted, ξ 1 = γ + π β does not necessary have a solution for β even if π. To be more specific, ξ 1 = γ + π β has a solution with respect to β if, and only if, (ξ 1 γ ) I m(π ), where I m(π ) is the column space of π ; see Magnus and Neudecker (1999, Ch., Section 9, Theorems 11-1). Even if a solution β exists, it generally depends on the unknown value γ. The condition under which a solution β (when 8 Note that this condition is replaced by the full-column rank assumption of π if G > 1 (i.e., there are more than one endogenous regressor in y ). 7

10 it exists) does not depend on γ is that γ K er(π ), where K er(π ) is the null space of π. We can generalize the above argument to cases in which X is rank-deficient (ν < k ) or close to being so. One difficulty here, is that, ξ 1 and π are not uniquely determined from the regression (3.1); see Magnus and Neudecker (1999, Ch. 13, Section 6, Eqs.(1)- ()). However, the conditional means E(y X) and E(y 1 X) are still estimable despite the fact that X does not have full-column rank; see Magnus and Neudecker (1999, Ch. 13, Theorem 15). This implies that the errors v 1t (t = 1,..., n) of the reduced-form equation for y 1 in (3.1) are identifiable, despite the multiplicity of least squares estimators. 9 So, β may be identifiable through the orthogonality between v 1t and X t. We can prove the following proposition on the identification of β when γ is left restricted and X may be rank-deficient. Proposition 3.1 Suppose that(.1)-(.) and Assumptions A - C are satisfied. Then: (3.) β is identifiable π / K er(ω W ) and γ K er(π Ω W ), where K er(ω W ) and K er(π Ω W) denote the null sets of Ω W and π Ω W, respectively. Remark 3. (i) The identification condition in Proposition 3.1 can be stated as π Ω W and π Ω W γ =, and is easy to interpret. First, π Ω W means that the instruments in X are strong. Second, observe that W t π can be viewed as the indirect effect of X t on y 1t when the effect of the exogenous variables X 1t has been eliminated, and W t γ is its direct effect on y 1t. So, the condition π Ω W γ =E(π W tw t γ ) = means that both effects are uncorrelated [similar to Imbens et al. (11)]. (ii) If γ = (strict exogeneity) and ν = k (X has full-column rank), the identification condition of Proposition 3.1 becomes π. So, Proposition 3.1 generalizes the usual necessary and sufficient condition for identification in the previous weak IV literature. (iii) Proposition 3.1 also generalizes the condition under which the two-stage least squares estimator is consistent in Doko Tchatoka and Dufour (8, Eqs(4.8)-(4.9)), and 9 Under Assumption C-(ii), we can write v 1t = y t E(y 1t X t ) for all t = 1,..., n. From Magnus and Neudecker (1999, Ch. 13, Theorem 15), E(y 1t X t ) is identifiable even when ν < k, hence v 1t is also identifiable. 8

11 those under which the Donald and Newey (1) bias-corrected estimator and Phillips and Hale (1977) jackknife-instrumental-variables estimators are consistent and asymptotically normal in Imbens et al. (11). Both Doko Tchatoka and Dufour (8) and Imbens et al. (11) assume that X has full-column rank k. Here, we allow X to have any arbitrary rank. In addition, Imbens et al. (11) analyze the setup in which X is strong (i.e., π in our framework), meaning that weakly identified models are rule out of their scope. Here, we also allow for any arbitrary value of π. (iv) Under the conditions of Lemma 3.1, the usual F-type or Wald-type statistics based on k-class estimators 1 could be used to assess H θ and build CSs for β, despite instrument endogeneity. However, these identification conditions albeit interesting, rule out model where identification is not very strong and they are in addition difficult to implement in practice (because the condition γ K er(π Ω W) cannot be verified empirically, as γ is not consistently estimable under instrument invalidity). Clearly, Proposition 3.1, albeit interesting because it shows that the usual procedures, such as F- or Wald-type tests, may yield valid inference when IVs are invalid, it cannot be implemented in empirical applications. In the remainder of this paper, we focus on developing tests for H θ and building CSs for θ and scalar linear transformations of θ. 4. Exact inference In this section, we develop a finite-sample procedure for assessing H θ on θ. First, we propose a test for H θ and building CSs that is similar despite instrument possible endogeneity and rank deficiency. Second, we use test inversion method to obtain joint CSs with level 1 α for θ, where < α < 1. Finally, we apply the projection techniques 11 to get identification-robust CSs with level 1 α (at least) for scalar linear transformations of θ. The marginal CSsc for the structural coefficient (β ) and each component of instrument endogeneity (γ ) are deduced as special cases of the proposed projection method. 1 For example, the Donald and Newey (1) bias-corrected estimator or the Phillips and Hale (1977) jackknife-instrumental-variables estimator. 11 see Dufour and Jasiak (1), Dufour and Taamouti (5), Dufour and Taamouti (7), and Doko Tchatoka and Dufour (14). 9

4.1. Similar test for H_{θ_0}

We propose a generalization of the Anderson and Rubin (1949) approach for assessing H_{θ_0}. We note that alternative procedures, such as the Kleibergen (2002, K) and Moreira (2003, CLR) tests, could be exploited for that purpose.[12] However, no finite-sample distributional theory is available for these methods, especially with heteroskedastic non-Gaussian errors. Further, they are not robust to missing instruments.[13] The Anderson and Rubin (1949) approach to testing H_{θ_0} is to consider the transformed reduced-form equation for y_1:

(4.1)  y_1 - y_2 β_0 - X_2 γ_0 = X_1 ξ_11 + X_2 ξ_21 + v_1,

where ξ_11 = π_1(β - β_0) + γ_1, ξ_21 = π_2(β - β_0) + γ - γ_0, and v_1 ≡ v_1(β) = v_2(β - β_0) + e. Since ξ_21 = 0 when β = β_0 and γ = γ_0, we can assess H_{θ_0} by considering the F-statistic for the null hypothesis ξ_21 = 0 in (4.1). Let Ω = (1/(n - ν)) Ỹ'M_X Ỹ, where Ỹ = [Y : X_2], and define

(4.2)  S^+ = [(W'W)^+]^(1/2) W'Ỹ b_0 · (b_0'Ω b_0)^(-1/2)

with b_0 = (1, -θ_0')'. The generalization of the AR-statistic for assessing H_{θ_0} is given by:

(4.3)  Ψ_AR(S^+; θ_0) = S^+'S^+ / (ν - ν_1).

The corresponding test rejects H_{θ_0} at level α (0 < α < 1) when

(4.4)  Ψ_AR(S^+; θ_0) > κ_{Ψ,α}(S^+; θ_0),

where κ_{Ψ,α}(S^+; θ_0) is the 1 - α quantile of Ψ_AR(S^+; θ_0) and the critical value function is defined as κ_{Ψ,α}(S^+; θ_0) = inf{τ ∈ R : P_{θ_0}(Ψ_AR(S^+; θ_0) > τ) ≤ α}. If the distribution of Ψ_AR(S^+; θ_0), conditional[14] on X = x, is absolutely continuous with respect to the Lebesgue

[12] For example, Andrews et al. (2006) show that the CLR-test is nearly uniformly most powerful (UMP) among invariant similar tests that are asymptotically efficient, and they recommend the use of this test in empirical practice. Guggenberger et al. (2012) show that the plug-in Anderson and Rubin (1949) (AR) and Kleibergen (2002) (K) subset statistics yield more powerful tests than their projection-based counterparts.
[13] See Dufour and Taamouti (2007), Dufour et al. (2013), and Doko Tchatoka (2014).
[14] Observe that, for a given b_0, S^+ only depends on the data (Y, X). So, if the distribution of Y, given X, is absolutely continuous with respect to the Lebesgue measure, then the distribution of Ψ_AR(S^+; θ_0) is also absolutely continuous with respect to the Lebesgue measure.

13 measure, we obtain (4.5) P θ [Ψ AR (S + ;θ )>κ Ψ,α (S + ;θ )]=α so that the test based on the critical value κ Ψ,α (S + ;θ ) is exact. To implement this test, the critical values κ Ψ,α (S + ;θ ) need to be computed from the observed data, especially with non-gaussian errors. This will be done using numerical simulations. Let (4.6) S + ω =[(W W) + ] 1/ W ω. ( ω ) 1/ M X ω n ν for all ω {e, ε σ }, where e and ε σ are the error terms satisfying (.3) and (.4), respectively. Let P S +(x;θ ) and P (x,ω), ω {e, ε S + σ }, denote the distributions of S + X=x and S ω + X=x, respectively, ω under H θ. We note that P (x,ω) does not directly depend on a specific value θ S + tested ω because the statistic S + ω does not directly involve θ. We can now state Lemma 4.1 on the behavior of S + and S + ω, ω {e, ε σ}, under H θ. Lemma 4.1 Suppose that Assumptions A - B and H θ X = x, we have: are satisfied. Then, conditional on (a) P +(x;θ )=P (x,e); S S + e (b) P (x, e) is invariant to the transformation(.4) P (x,e)=p (x,ε S + e S + e S + σ ) ε σ εσ satisfying(.4). If further Assumption D holds, we have P (x,e) P (x, ε), S + e S + ε where ε P ε (x) and P ε (x) is completely specified. Remark 4. (i) Lemma 4.1-(a) shows that the distribution of S +, under H θ, only depends on X and the error of the regression (.3). So, the reduced-form errors v plays no role, therefore, they can heteroskedastic in any arbitrary way. From (4.3), it is also clear that the null distribution of Ψ AR (S + ;θ ) also depends only on X and the distribution of e. (ii) Lemma 4.1-(b) shows that the conditional distribution of S + under H θ, given X = x, is invariant to any linear transformation satisfying (.4). In particular, the conditional distribution of S + under H θ, given X = x, only depends on the distribution of ε under Assumption D. Therefore, the distribution Ψ AR (S + ;θ ) X=x under H θ, only depends on the distribution of ε. 11
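To make the construction concrete, the following numpy sketch computes the statistic Ψ_AR(S^+; θ_0) of (4.2)-(4.3) and the exact Monte Carlo p-value spelled out in part (iii) of this remark (next page). It is only a sketch under stated assumptions, not the author's code: the degrees-of-freedom convention (ν = rank(X), ν_1 = rank(X_1)), the symmetric square root of (W'W)^+, the function names, and the default N = 999 are my own choices based on the text.

```python
import numpy as np

def ar_statistic(y1, y2, X1, X2, beta0, gamma0):
    """Sketch of Psi_AR(S+; theta0) in (4.2)-(4.3).

    Assumes nu = rank(X) and nu1 = rank(X1) for the degrees of freedom;
    Moore-Penrose inverses keep the formula valid when X2 is rank-deficient."""
    n = len(y1)
    X = np.column_stack([X1, X2])
    Ytil = np.column_stack([y1, y2, X2])                  # Y_tilde = [Y : X2]
    b0 = np.concatenate(([1.0], -np.atleast_1d(beta0), -np.asarray(gamma0)))
    nu, nu1 = np.linalg.matrix_rank(X), np.linalg.matrix_rank(X1)
    M_X1 = np.eye(n) - X1 @ np.linalg.pinv(X1.T @ X1) @ X1.T
    M_X = np.eye(n) - X @ np.linalg.pinv(X.T @ X) @ X.T
    W = M_X1 @ X2                                         # residuals of X2 on X1
    Omega = Ytil.T @ M_X @ Ytil / (n - nu)                # Omega in (4.2)
    vals, vecs = np.linalg.eigh(np.linalg.pinv(W.T @ W))  # build [(W'W)^+]^(1/2)
    root = vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T
    S_plus = root @ W.T @ Ytil @ b0 / np.sqrt(b0 @ Omega @ b0)
    return float(S_plus @ S_plus) / (nu - nu1)

def mc_ar_pvalue(y1, y2, X1, X2, beta0, gamma0, draw_errors, N=999):
    """Exact Monte Carlo p-value for H_theta0 (steps (1)-(5) of the remark).

    `draw_errors(j)` must return one simulated error vector from the
    specified distribution P_eps.  Under H_theta0 the statistic depends on
    the data only through that error (and X), and it is invariant to the
    X1-coefficient, which is therefore set to zero in the simulated y1."""
    stat_obs = ar_statistic(y1, y2, X1, X2, beta0, gamma0)
    sims = np.empty(N)
    for j in range(N):
        y1_sim = y2 * beta0 + X2 @ np.asarray(gamma0) + draw_errors(j)
        sims[j] = ar_statistic(y1_sim, y2, X1, X2, beta0, gamma0)
    return (np.sum(sims >= stat_obs) + 1) / (N + 1)       # reject when <= alpha
```

With N chosen so that α(N + 1) is an integer (e.g. N = 999 for α = 0.05), rejecting when the returned p-value is at most α reproduces a level-α Monte Carlo test of the kind described in the text.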

(iii) If ε is normally distributed[15] and is independent of X, then it is straightforward to show that Ψ_AR(S^+; θ_0) ~ F(ν - ν_1, n - ν) for all values of π_2. So, H_{θ_0} can be assessed by using the critical values of an F-distribution with (ν - ν_1, n - ν) degrees of freedom. However, if (2.5) does not hold (non-Gaussian errors) or if ε is not independent of X, the null distribution of Ψ_AR(S^+; θ_0)|X=x is nonstandard. Nevertheless, it does not involve any nuisance parameter. So, we can proceed as follows[16] to compute the 1 - α critical value of Ψ_AR(S^+; θ_0) under H_{θ_0}: (1) choose α_1 and N so that α = ([α_1 N] + 1)/(N + 1), where [z] is the largest integer less than or equal to z; (2) for a given θ_0, compute the test statistic Ψ_AR^(0)(S^+; θ_0) based on the observed data; (3) generate N i.i.d. error vectors ε^(j) = [ε_1^(j), ..., ε_n^(j)]', j = 1, ..., N, according to the specified distribution P_ε(x), and compute the corresponding statistics Ψ_AR^(j), j = 1, ..., N, following (4.3); note that the null distribution of Ψ_AR(S^+; θ_0) does not depend on the specific value θ_0 tested, so there is no need to make it depend on θ_0; (4) compute the empirical distribution function based on Ψ_AR^(j), j = 1, ..., N,

(4.7)  P_Ψ(z; N) ≡ P_Ψ(z) = [ Σ_{j=1}^{N} 1[Ψ_AR^(j) ≤ z] ] / (N + 1),

where 1[C] = 1 if condition C holds and 1[C] = 0 otherwise; and (5) reject H_{θ_0}, at level α, when Ψ_AR^(0)(S^+; θ_0) ≥ κ_MC(ε; α) = P_Ψ^(-1)(1 - α_1), where P_Ψ^(-1)(q) = inf{z : P_Ψ(z) ≥ q} is the generalized inverse of P_Ψ(·).

We can now prove Theorem 4.3 on the validity of the AR-test, where F_α(ν - ν_1, n - ν) denotes the 1 - α quantile of F(ν - ν_1, n - ν).

Theorem 4.3. Suppose that Assumptions A-B and D are satisfied. Then, the test that rejects H_{θ_0} when Ψ_AR(S^+; θ_0) > ĉ_Ψ(ε; α) is similar with significance level α for all values of π_2 (instrument quality), where ĉ_Ψ(ε; α) = F_α(ν - ν_1, n - ν) if (2.5) holds and X is independent of ε, and ĉ_Ψ(ε; α) = κ_MC(ε; α) otherwise.

Remark 4.4. (i) Theorem 4.3 shows that the critical values computed as in Remark 4.2-(iii) yield a test with correct level in finite samples, even when the model is weakly identified (π_2 = 0 or close to it) and the errors are non-Gaussian. So, the proposed test is

[15] That is, if equation (2.5) is satisfied.
[16] We cover the case in which P_ε(x) is continuous, so that the null distribution of Ψ_AR(S^+; θ_0)|X=x is also continuous. If P_ε(x) is not continuous, the Monte Carlo test algorithm can easily be adapted by using a tie-breaking method, as in Dufour (2006).

15 robust to weak IVs and non-gaussian errors (even in small samples), despite instrument possible invalidity (γ ). (ii) Since the null distribution of Ψ AR (S + ;θ ) does not depend on any of the variables and parameters in (.), Theorem 4.3 hold even when the reduced-form for y is given by (4.8) y = m(x 1,X,X 3, v, π 1,π,π 3 ), where π 1, π, and π 3 are vectors of unknown reduced-form coefficients, m( ) is an arbitrary unspecified (possibly) nonlinear function, and X 3 R n k 3 is a matrix of instruments that may have been omitted from (.). Because of the later properties, the proposed procedure is robust to nonlinear and incomplete reduced-forms [similar to Dufour and Taamouti (7) and Dufour et al. (13)]. More interestingly, y 1,..., y n may be arbitrary heterogenous and the reduced-form disturbances v 1,..., v n may not follow a Gaussian distribution or may also be arbitrary heteroskedastic. So, the proposed procedure is also robust to heterogeneity in the reduced-forms. We now examine the finite-sample power of the proposed test. To do this, we consider the following linear transformation [similar to (.4)] on the error v 1 of the regression (4.1): R n R n (4.9) v 1 ε σβ = σ β (X)v 1, where σ β (X) is (possibly) a random function of X and β such thatp δ [σ β (X) X=x ]=1. In addition, we also make the following assumption. d Assumption E There exists σ β (X) satisfying (4.9) such that ε σβ X=x ṽ, where the distribution Pṽ(x) of ṽ, given X = x, is completely specified. Assumption E is similar to Assumption D. It states that the distribution of the reducedform disturbance v 1 only depends on X and a typically unknown (possibly) random scale factor σ β (X), which is also (possibly) a function of both X and the structural coefficient β. Again, a Gaussian distribution for v 1 is obtained by choosing P (x) = N(, I ṽ n). But non- Gaussian distributions, including heavy-tailed distributions which may lack moments, are covered. In general, Assumptions D and E do not entail each other, except when β = β or 13

16 the conditional distribution of(e,v ), given X = x, is Gaussian with finite second moments. Let (4.1) Sṽ =[(W W) + ] 1/ W ṽ, σ ṽ = (ṽ )1/ M X ṽ, and µ n ν = µ π θ π C θ where C θ = (θ θ )σ β (X), µ π = [(W W) + ] 1/ W W[π : I k ], and ṽ is the error in Assumption E. Lemma 4.5 characterizes the distribution of S + and Ψ AR (S + ;θ ) under H θ 1. Lemma 4.5 Suppose that Assumptions A - B and E are satisfied. If further θ θ, Then we have: S + d σ 1 ( Sṽ+ µ ) and ṽ π θ Ψ AR (S+ ;θ ) (ν d ν 1 ) 1 σ ( Sṽ+ µ ṽ π θ ) ( Sṽ+ µ ). π θ Remark 4.6 (i) states that the distribution of Ψ AR (S + ;θ ), under H θ, only depends on the 1 distributions of Sṽ and σ ṽ, as well as the factor µ π. Since given X = x, the distributions of θ Sṽ and σ ṽ only depend on that of ṽ, it is clear that the conditional distribution of Ψ AR (S + ;θ ) under H θ 1, given X = x, only depends on µ π θ and the distribution of ṽ. Therefore, the power function, η AR ( ), of the corresponding AR-test that rejects H θ when Ψ AR (S + ;θ )> ĉ Ψ ( ε;α), is entirely determined by the distribution of ṽ and the factor µ π θ, i.e. (4.11) ] P θ H θ [Ψ AR (S + ;θ )>ĉ Ψ ( ε;α) = η AR (ṽ, µ ;α). π θ 1 Under Assumption E, ṽ Pṽ(x) and Pṽ(x) does not depend on θ. So, µ π is the only factor θ that determines test power. (ii) If ṽ Pṽ(x) N(,I n ) and X is independent of ṽ, then Ψ AR (S + ;θ ) X=x F τ (ν x,θ ν 1,n ν) for all values of π, where τ = σ x,θ β µ π θ is the non-centrality parameter [similar to Revankar and Hartley (197)]. Therefore, the exact power of the test in (4.11) can be computed from the sample using a noncentral F-distribution with (ν ν 1,n ν) degrees of freedom and non-centrality parameter τ for θ and π x,θ fixed. If Pṽ(x) is not a normal distribution or X depends on ṽ, the distribution of Ψ AR (S + ;θ ) X=x, under H θ, is 1 nonstandard but it can be simulated for θ and π fixed. So, the exact power of the test can also be simulated for θ and π fixed, by using the Monte Carlo test method described in Remark??-(iii). We can now state the following necessary and sufficient condition under which the proposed AR-test exhibit power in finite-sample. 14

Theorem 4.7. Suppose that Assumptions A-B and E are satisfied. Then, the test that rejects H_{θ_0} when Ψ_AR(S^+; θ_0) > ĉ_Ψ(ε; α) exhibits power for all values of π_2 if, and only if, ξ_21 ∉ Ker([(W'W)^+]^(1/2) W'W), where ξ_21 = [π_2 : I_{k_2}](θ - θ_0).

Theorem 4.7 follows directly from Lemma 4.5, which shows that µ_{π,θ} is the only factor that determines the power of the proposed AR-test; i.e., power exists if, and only if, µ_{π,θ} = σ_β(X)[(W'W)^+]^(1/2) W'W ξ_21 ≠ 0, or equivalently, ξ_21 ∉ Ker([(W'W)^+]^(1/2) W'W), since P_δ[σ_β(X) ≠ 0 | X = x] = 1 from (4.9). As seen from the expression of ξ_21, power may still exist even when π_2 = 0 (irrelevant instruments), provided γ ≠ γ_0. However, the test has low power if both π_2 and γ - γ_0 are zero or close to being so. We now focus on building CSs for θ and scalar linear transformations of θ.

4.2. Exact confidence sets

In this section, we develop a methodology to build CSs for θ and linear combinations of the elements of θ. When θ_0 is unknown, Ψ_AR^(0)(S^+; θ_0) is also unknown and the test procedure described in Remark 4.2-(iii) is not directly implementable. We stress the fact that exact CSs can be obtained for the model parameters by using test-inversion techniques. In Section 4.2.1, we describe how to build joint CSs for θ, while in Section 4.2.2 we deal with scalar linear transformations w'θ, for some w ≠ 0.

4.2.1. Joint confidence sets for θ

In Theorem 4.3, we show that the test that rejects H_{θ_0} when Ψ_AR(S^+; θ_0) > ĉ_Ψ(ε; α) is similar with significance level α for any identification strength π_2. So, we can invert Ψ_AR(S^+; θ_0) to obtain a joint CS with level 1 - α for θ. More precisely, the generalized Anderson-Rubin-type CS for θ is given by:

(4.12)  C_θ(α) = {θ_0 : Ψ_AR(S^+; θ_0) ≤ ĉ_Ψ(ε; α)} = {θ_0 : Q(θ_0) ≤ 0},

where Q(θ_0) = θ_0'Aθ_0 + b'θ_0 + c is a quadratic-linear form in θ_0 such that A = [y_2 : X_2]'H[y_2 : X_2], b = -2[y_2 : X_2]'Hy_1, c = y_1'Hy_1, and H = M_{X_1} - [1 + ĉ_Ψ(ε; α)(ν - ν_1)/(n - ν)]M_X. Depending on the values of A, b, and c, the quadric surface Q(θ_0) = 0 may take different

18 forms: ellipsoid, paraboloid, hyperboloid, and cone. So, the confidence set C θ (α) may be unbounded; see Dufour and Taamouti (5, Theorem 4.1). In particular, C θ (α) is unbounded when A is not positive semi-definite. We will now focus on building CSs for w θ Projection-based confidence sets for w θ We use the projection techniques 17 to obtained CSs for scalar linear transformtion w θ. Let h(θ) be any arbitrary function of θ, and C θ (α) be the joint CS for θ in (4.1). Since the event θ C θ (α) entails h(θ) h[c θ (α)], hence h[c θ (α)]={h(θ) : θ C θ (α)} is a confidence set with level (at least) 18 1 α for h(θ). Conceptually, the confidence set with level (at least) 1 α for h(θ )=w θ, obtained by projecting C θ (α) is defined as: (4.13) C w θ(α) = h[c θ (α)]={ζ : ζ = w θ for some θ C θ (α)} = {ζ : ζ = w θ s.t. Q(θ ) }. Without any loss of generality, let partition w as w = (w 1,w ), where w 1 is a scalar and w is a k 1 vector (possible zero). Let R= w R = w 1 w I G+k 1 and define (4.14) Ā = R 1 AR 1 = ā11 Ā 1 Ā 1 Ā, b=r 1 b= b 1 b, A and b are given in (4.1). Also, consider the spectral decomposition of Ā given by: (4.15) Ā = P Λ P, Λ = diag(λ 1,..., λ k ), where P 1 : k p, P : k (k p ), and λ j are the eigenvalues of Ā with λ j if 1 j p, λ j = if j > p ; and p = rank(ā ). We can now prove Theorem 4.8 on the analytic form of C w θ(α) in (4.14). Theorem 4.8 Suppose that (.1) -(.), Assumptions A - B, and D are satisfied. Then, we have: 17 see Dufour and Jasiak (1) and Dufour and Taamouti (5, 7). 18 Observe thatp[h(θ) hc θ (α)] P[θ C θ (α)] 1 α so that h[c θ (α)] has level at least 1 α. 16

19 { } C w θ(α) = ζ : ã 1 ζ + b 1 ζ + c 1 S 1 if Ā is p.s.d., { } = ζ : ā 1 ζ + b 1 ζ + c { ζ : Ā 1 ζ + b } if Ā =, where = R otherwise; ã 1 = ā 11 Ā 1Ā+ Ā1, b 1 = b 1 Ā 1Ā+ b, c 1 = c 1 4 b Ā+ b, S 1 = / if rank(ā )=k, and S 1 = { ζ : P (Ā 1 ζ + b ) } if 1 rank(ā )<k. Remark 4.9 (i) First, we observe that Theorem 4.8 is similar to Theorem 4.1 in Dufour and Taamouti (7), so, we only give the guide lines of the proof in the appendix. (ii) The theorem provides the analytical form of the CSs for any linear combination of the elements of θ, but we find it useful to discuss the follow two interesting applications in details: (1) CS for the structural coefficient β, and () instrument selection. 1. CS for the structural coefficient β The CS for β is obtained from Theorem 4.8 by choosing w 1 = 1 and w = in (4.14). In this case, we have ā 11 = y Hy, Ā 1 = W y, Ā = W W, b 1 = y Hy 1, and b = W y 1, where H is given in (4.1) and W = M X1 X. So, the CS for β with level (at least) 1 α is explicitly given by: (4.16) { C β (α)= } β : ã 1 β + b 1 β + c 1 } { β : a 1 β + b 1β + c S 1, if W W is p.s.d., S, if W W =, R if W W is not p.s.d., where ã 1 = y (H P W)y, b 1 = y (H P W)y 1, c 1 = y 1 (H P W)y 1, P W = W(W W) + W, S = {β : W y β W y 1 }, and S 1 = / if rank(w W) = k and S 1 = {β : P (W y β W y 1 ) } if 1 rank(w W) < k. So, the analytical form of C β (α) in (4.16) can be explicitly given by looking the eigenvalues of instrument matrix W W. For example, if all eigenvalues of W W are positive, C β (α) takes the form of the { } quadratic inequality, i.e., C β (α)= β : ã 1 β + b 1 β + c 1.. Instrument selection A second interesting application of Theorem 4.8 is instrument selection. Let γ = 17

20 (γ 1,...,γ k ) (γ p ) 1 p k. Since γ p = entails that the variable X p constitute a valid instrument, the CS for γ p provides a test of the validity of X p for all p=1,...,k. Specific, we select X p as a valid IV if the CS of its coefficient (γ p ) in the structural equation contains zero, i.e., C γ p (α). If / C γ p (α), X p does not constitute a valid instrument. We stress the fact that instrument selection may still be meaningful, although (4.16) provides a valid CS for the structural coefficient of interest β. For example, in empirical applications where not all instruments are weak, providing a procedure to select those that are valid may yield consistent point estimate of β that is relevant for policy analysis. We now show how to obtain C γ p (α) from Theorem 4.8, for all p=1,...,k. (1) For each p=1,...,k, rearrange the parameters and data as follows: (4.17) (4.18) θ (p) = (γ p,θ (p) ), θ (p) =(β,γ (p) ), γ (p) = γ \{γ p }, X (p) = [y : X (p) ], X (p) = X \{X p }, W = M X1 X =[W 1,...,W p,...,w k ], where convention, we consider that γ (p) is simply not present in (4.17) when k = 1. () Compute the quantities a (p) 11 = W p W p, A (p) 1 = X(p) W p, A (p) = X(p) HX (p), b (p) 1 = W p y 1, b (p) = X (p) Hy 1, and c (p) = y 1 Hy 1, as well as ã 1p = W P HX (p) W p, b 1p = W p(i n P (p))y HX 1, and c 1p = y 1 (H P HX (p) )y 1, where P (p) = HX (p) HX (X (p) HX (p) ) + X (p) H. And (3) the CS C γ (α), p=1,...,k, is obtained by choosing w w (p) =(1, ) and replacing p a 11, A 1, A, b 1, and b by a (p) 11, A(p) 1, A(p), b(p) 1, b(p), and c(p) in (4.16), respectively, i.e.: { (4.19) C γ (α)= p R } γ p : ã 1p(γ p ) + b 1p (γ p )+ c 1p } { γ p : a 1p(γ p ) + b 1p γ p + c 1p S 1p, if A (p) S p, if A (p) =, if A (p) is p.s.d., is not p.s.d., where S = {γ p : X(p) W p γ p X(p) Hy 1 }, S 1 = / if rank(x (p) HX (p) ) = k and } S 1 = {γ p : P (X(p) W p γ p X(p) Hy 1 ) if 1 rank(x (p) HX (p) ) < k. Again, if X (p) HX (p) is positive definite, then C γ (α) takes the form of the quadratic inequality, { p } i.e., C γ (α)= γ p p : ã 1p(γ p ) + b 1p (γ p )+ c 1p. We will now illustrate our theory through a Monte Carlo experiment. 18
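Since both the β-CS in (4.16) and the instrument-selection CSs in (4.19) reduce, in the leading case, to a scalar quadratic inequality, the following small helper (my own sketch, not the paper's code) carries out the case analysis behind the bounded/unbounded discussion above; the coefficients a, b, c are assumed to have been computed beforehand from the data as in (4.16).

```python
import math

def quadratic_cs(a, b, c):
    """Solve {z : a*z**2 + b*z + c <= 0}, the form taken by the projection
    CSs in (4.16) and (4.19).  The coefficients are assumed given."""
    if a == 0:
        if b == 0:
            return "all reals" if c <= 0 else "empty"
        bound = -c / b
        return f"(-inf, {bound}]" if b > 0 else f"[{bound}, +inf)"
    disc = b * b - 4 * a * c
    if a > 0:
        if disc < 0:
            return "empty"                      # no real solution
        r1 = (-b - math.sqrt(disc)) / (2 * a)
        r2 = (-b + math.sqrt(disc)) / (2 * a)
        return f"[{r1}, {r2}]"                  # bounded confidence interval
    # a < 0: the set is unbounded (complement of an interval, or all reals)
    if disc < 0:
        return "all reals"
    r1 = (-b + math.sqrt(disc)) / (2 * a)
    r2 = (-b - math.sqrt(disc)) / (2 * a)
    return f"(-inf, {r1}] U [{r2}, +inf)"
```

When a > 0 and the discriminant is nonnegative the set is the bounded interval between the two roots; otherwise the confidence set is empty, a half-line, the complement of an interval, or the whole real line, matching the unboundedness cases discussed after (4.12).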

21 5. Simulation experiment We use simulation to examine the performance of the proposed AR-test. The DGP 19 is (5.1) (5.) y 1t = y t β + u t, u t = X tγ + e t, y t = m(x t, X 3t, v t ; π, δ), t = 1,..., n, where the reduced-form model for y t uses two alternative specifications: (1) m(x t, X 3t, v t ; π, δ)=x t π + X 3t δ + v t, and () m(x t, X 3t, v t ; π, δ)=exp(x t π + X 3t δ)+v t. The first specification is the usual linear model, while the second is nonlinear. X 3 is a n 5 matrix of instruments that belong to the true DGP, but are omitted in the inference (missing instruments). So, δ measures the degree of instrument omission in this setup. If δ =, then no instrument is omitted while δ means relevant instrument exclusion. In this experiment, we set δ = λδ, where δ is a 5 1 vector of ones and λ varies in {,.1,.1, 1}. For example, λ = is a design of no instrument exclusion, λ =.1 is a design of weak instrument exclusion, λ =.1 is a design of moderately weak instrument exclusion, and λ = 1 is a design of strong instrument exclusion. X contains k = 5 instruments that violate the exclusion restrictions if γ. Each column of X and X 3 is generated i.i.d. normal with identity matrix. The reduced-form coefficient vector π µ is chosen as π =( n X π )1/ π, where π is a 5 1 vector of ones, µ is the concentration parameter which describes the strength of X. We vary µ in {, 13, 1}, where µ = is a complete non-identification or irrelevant IVs setup, µ = 13 is a design of weak instruments, and µ = 1 is for strong identification (strong instruments). We set β β = β, γ γ = τ.γ, where β = 1, γ is a 5 1 vector of ones (so the IVs are invalid), and β and τ vary inr. In this setup, the null hypothesis H θ is equivalent to test whether β = τ =. So, β = τ = in the graphs indicates the empirical size, while the values β and τ indicate test empirical power. To shorten the exposition, we only present the empirical power in the direction of τ = β /3, but the results do not change qualitatively with alternative directions. We also consider two alternative specifications for the errors[e,v ] joint distribution. In 19 Note that there is no exogenous variable X 1 in (5.1)-(5.), but the results do not change qualitatively if such exogenous variables were included. See Hansen et al. (8) and Guggenberger (1) for a similar parametrization. 19

22 the first one,(e t,v t ) N [, σ ] (X )Σ ρ for all t = 1,..., n (conditional Gaussian errors), ( ) where σ (X ) = exp ϖ k 1/ X, Σ ρ = 1 ρ, ρ varies in {.,.5,.9}, and ρ 1 ϖ {, 1, 1}. In the second one, (e t,v t ) follow a multivariate t(3) distribution with the same covariance matrix as the first specification. In both cases, Assumptions D and E are satisfied. If ϖ =, the errors are homoskedastic, but they are heteroskedastic if ϖ {1, 1}. We use the exact Monte Carlo test critical values in all cases. Figures 1-3 present the results. Figure 1 is about Gaussian heteroskedastic errors, while Figures and 3 deal with homoskedastic and heteroskedastic t(3)-errors, respectively. In all figures, the power curves are drawn for each strength of the omitted instruments X 3 (λ {,.1,.1, 1}). In each figure, the sub-figures (a) and (c) represent the cases in which all instruments in X are irrelevant (µ = ), whereas (b) and (d) describe strong instruments (µ = 1). 1 Meanwhile, the sub-figures (a) and (b) represent a linear specification of the reduced-form for y, while those in (c) and (d) are the nonlinear specification. The sample size is set at n = 5, the nominal level at 5%, and the rejection frequencies are computed using N = 1, pseudo-samples. First, we observe that in all cases including heteroskedastic errors, nonlinear reducedform, and missing instruments the rejection frequencies under H θ is very close to the nominal 5% level (see β = in all graphs). So, the proposed tests are robust to weak identification, heteroskedastic and possibly non-gaussian errors, as well as nonlinearity and instrument exclusion in the reduced-form specification, thus conforming our theory findings in Section 4. Second, we note that all tests have good power in all cases considered despite the relatively small sample size (n = 5). In particular, the exclusion of relevant instruments in the inference does not substantially affect the power of the tests when identification is strong (µ = 1), showed the power curves for different values of λ in each sub-figure (b) and (d). However, instrument exclusion have a slight effect on test power in absence of identification (µ = ), as showed the power curves for different values of λ in each sub-figure (a) and (c). In addition, note that the proposed test have good power even with t(3)-type heteroskedastic errors and nonlinear reduced-form with missing instruments, this confirming our theoretical conclusions in Section The case in which µ = 13 (weak instruments) is omitted to shorten the exposition.
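For readers who want to replicate the design, here is a compact numpy sketch of the simulation DGP (5.1)-(5.2); the sample size, the normalization of π_2 through the concentration parameter µ², and the other numerical values are my reading of the description above and should be treated as illustrative.

```python
import numpy as np

# Sketch of the Section 5 DGP (5.1)-(5.2); all values are illustrative.
rng = np.random.default_rng(1)
n, k2, k3 = 50, 5, 5
beta = 1.0                      # structural coefficient
lam = 0.1                       # strength of the omitted instruments X3
mu2 = 13.0                      # concentration parameter (weak-IV design)
gamma = np.ones(k2)             # instrument endogeneity: the IVs are invalid
delta0 = np.ones(k3)
X2 = rng.standard_normal((n, k2))
X3 = rng.standard_normal((n, k3))
pi0 = np.ones(k2)
pi2 = np.sqrt(mu2 / (pi0 @ X2.T @ X2 @ pi0)) * pi0   # scales X2's strength to mu2

e = rng.standard_normal(n)
v2 = 0.5 * e + rng.standard_normal(n)
u = X2 @ gamma + e
y2 = X2 @ pi2 + X3 @ (lam * delta0) + v2             # linear reduced form (spec. 1)
# y2 = np.exp(X2 @ pi2 + X3 @ (lam * delta0)) + v2   # nonlinear reduced form (spec. 2)
y1 = y2 * beta + u                                   # structural equation (5.1)
```

Rejection frequencies like those in Figures 1-3 are then obtained by drawing new errors, recomputing the AR-type statistic at the tested θ_0, and averaging the rejections over many pseudo-samples.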

Figure 1. Power of the AR-test with heteroskedastic Gaussian errors. Panels: (a) irrelevant IVs (µ² = 0), linear reduced form m(X_2, X_3; π_2) = X_2π_2 + X_3λδ_0; (b) strong IVs, linear reduced form; (c) irrelevant IVs, nonlinear reduced form exp(X_2π_2 + X_3λδ_0); (d) strong IVs, nonlinear reduced form. Each panel plots rejection frequencies against β - β_0 for the instrument-exclusion strengths λ ∈ {0, .01, .1, 1}.

Figure 2. Power of the exact Monte Carlo AR-test with homoskedastic t(3) errors. Panels (a)-(d) follow the same layout as Figure 1 (rejection frequencies against β - β_0).

Figure 3. Power of the exact Monte Carlo AR-test with heteroskedastic t(3) errors. Panels (a)-(d) follow the same layout as Figure 1 (rejection frequencies against β - β_0).

6. Empirical application

We apply the proposed methods to Card's (1995) model of the returns to education. The version of this model that allows for possible instrument invalidity is:

(6.1)  log(wage) = β educ + X_1γ_1 + X_2γ + e,
(6.2)  educ = X_1π_1 + X_2π_2 + v_2,

where wage is earnings, educ is the length of education (schooling), and X_1 = [1, exper, exper², race, smsa66, south66, IQ] consists of a constant, experience variables, and indicator variables for race, residence in a metropolitan area, residence in the south of the United States, and the IQ score. The instrument matrix X_2 consists of the proximity-to-college indicators for educational attainment, namely proximity to 2- and 4-year colleges. Hence, we have γ = (γ_1, γ_2)' ∈ R². The original specification in Card (1995) imposes the exclusion restrictions, i.e., γ = 0. In recent years, several studies have raised concerns about the validity of the proximity to 2- and 4-year college indicators as instruments for educ; for example, see Slichter (2013, Section 5). Here, we allow γ ≠ 0 and we use the method proposed in this paper to build a joint confidence set for θ = (β, γ_1, γ_2)' and marginal confidence sets for β, γ_1, and γ_2. Moreover, Kleibergen (2004, Table 2) shows that the proximity-to-college instruments are not very strong. So, it is important to use statistical procedures that are robust to both weak and invalid instruments for inference in model (6.1)-(6.2).

The data analyzed are from the National Longitudinal Survey of Young Men (1966 to 1981). We use the cross-sectional 1976 subsample, which contains 3,010 observations. The variables contained in the data set are: two variables indicating proximity to college, the length of education, log wages, experience, age, and racial, metropolitan, family, and regional indicators. If we impose the exclusion restrictions (γ = 0), the identification-robust 95% confidence set for the returns to education (β) that results from inverting AR(β_0) is given by

C_β(α) = {β_0 : ã_1β_0² + b̃_1β_0 + c̃_1 ≤ 0} = [11.47%, 39.67%]

when the IQ score is

The results reported are based on the critical values of the F-distribution, but they are similar when the asymptotic χ² critical values are used.
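As a rough guide to how such an interval can be reproduced, the sketch below inverts the classical AR test for β over a grid, under the exclusion restrictions (γ = 0) that the reported interval imposes; the grid bounds, the use of scipy, and the assumption that the accepted set is an interval are all mine, not the paper's.

```python
import numpy as np
from scipy.stats import f as f_dist

def ar_confidence_interval(y1, y2, X1, X2, alpha=0.05, grid=None):
    """Sketch: invert the classical AR test (gamma = 0 imposed) for beta.

    Assumes homoskedastic errors and full-rank X = [X1 : X2], so the AR
    statistic is compared with F(k2, n - k1 - k2) critical values; the grid
    bounds are arbitrary and may need widening in applications."""
    n, k1, k2 = len(y1), X1.shape[1], X2.shape[1]
    X = np.column_stack([X1, X2])
    M_X1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
    M_X = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
    crit = f_dist.ppf(1 - alpha, k2, n - k1 - k2)
    grid = np.linspace(-1.0, 1.0, 2001) if grid is None else grid

    accepted = []
    for b0 in grid:
        u0 = y1 - y2 * b0                               # y1 - y2*beta0
        num = (u0 @ M_X1 @ u0 - u0 @ M_X @ u0) / k2     # fit gained by X2
        den = (u0 @ M_X @ u0) / (n - k1 - k2)
        if num / den <= crit:
            accepted.append(b0)
    # With weak instruments the accepted set need not be an interval; here we
    # simply report its range for the strong-instrument / bounded case.
    return (min(accepted), max(accepted)) if accepted else None
```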


More information

Testing Endogeneity with Possibly Invalid Instruments and High Dimensional Covariates

Testing Endogeneity with Possibly Invalid Instruments and High Dimensional Covariates Testing Endogeneity with Possibly Invalid Instruments and High Dimensional Covariates Zijian Guo, Hyunseung Kang, T. Tony Cai and Dylan S. Small Abstract The Durbin-Wu-Hausman (DWH) test is a commonly

More information

Instrumental Variables

Instrumental Variables Università di Pavia 2010 Instrumental Variables Eduardo Rossi Exogeneity Exogeneity Assumption: the explanatory variables which form the columns of X are exogenous. It implies that any randomness in the

More information

Asymptotic Distributions of Instrumental Variables Statistics with Many Instruments

Asymptotic Distributions of Instrumental Variables Statistics with Many Instruments CHAPTER 6 Asymptotic Distributions of Instrumental Variables Statistics with Many Instruments James H. Stock and Motohiro Yogo ABSTRACT This paper extends Staiger and Stock s (1997) weak instrument asymptotic

More information

Instrumental Variable Estimation with Heteroskedasticity and Many Instruments

Instrumental Variable Estimation with Heteroskedasticity and Many Instruments Instrumental Variable Estimation with Heteroskedasticity and Many Instruments Jerry A. Hausman Department of Economics M.I.T. Tiemen Woutersen Department of Economics University of Arizona Norman Swanson

More information

Birkbeck Working Papers in Economics & Finance

Birkbeck Working Papers in Economics & Finance ISSN 1745-8587 Birkbeck Working Papers in Economics & Finance Department of Economics, Mathematics and Statistics BWPEF 1809 A Note on Specification Testing in Some Structural Regression Models Walter

More information

ROBUST CONFIDENCE SETS IN THE PRESENCE OF WEAK INSTRUMENTS By Anna Mikusheva 1, MIT, Department of Economics. Abstract

ROBUST CONFIDENCE SETS IN THE PRESENCE OF WEAK INSTRUMENTS By Anna Mikusheva 1, MIT, Department of Economics. Abstract ROBUST CONFIDENCE SETS IN THE PRESENCE OF WEAK INSTRUMENTS By Anna Mikusheva 1, MIT, Department of Economics Abstract This paper considers instrumental variable regression with a single endogenous variable

More information

A Robust Test for Weak Instruments in Stata

A Robust Test for Weak Instruments in Stata A Robust Test for Weak Instruments in Stata José Luis Montiel Olea, Carolin Pflueger, and Su Wang 1 First draft: July 2013 This draft: November 2013 Abstract We introduce and describe a Stata routine ivrobust

More information

OPTIMAL TWO-SIDED INVARIANT SIMILAR TESTS FOR INSTRUMENTAL VARIABLES REGRESSION DONALD W. K. ANDREWS, MARCELO J. MOREIRA AND JAMES H.

OPTIMAL TWO-SIDED INVARIANT SIMILAR TESTS FOR INSTRUMENTAL VARIABLES REGRESSION DONALD W. K. ANDREWS, MARCELO J. MOREIRA AND JAMES H. OPTIMAL TWO-SIDED INVARIANT SIMILAR TESTS FOR INSTRUMENTAL VARIABLES REGRESSION BY DONALD W. K. ANDREWS, MARCELO J. MOREIRA AND JAMES H. STOCK COWLES FOUNDATION PAPER NO. 1168 COWLES FOUNDATION FOR RESEARCH

More information

Location Properties of Point Estimators in Linear Instrumental Variables and Related Models

Location Properties of Point Estimators in Linear Instrumental Variables and Related Models Location Properties of Point Estimators in Linear Instrumental Variables and Related Models Keisuke Hirano Department of Economics University of Arizona hirano@u.arizona.edu Jack R. Porter Department of

More information

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator by Emmanuel Flachaire Eurequa, University Paris I Panthéon-Sorbonne December 2001 Abstract Recent results of Cribari-Neto and Zarkos

More information

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 8 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 25 Recommended Reading For the today Instrumental Variables Estimation and Two Stage

More information

Improved Inference in Weakly Identified Instrumental Variables Regression

Improved Inference in Weakly Identified Instrumental Variables Regression Improved Inference in Weakly Identified Instrumental Variables Regression Richard Startz, Eric Zivot and Charles R. Nelson Department of Economics, University of Washington December 9, 2002 This version:

More information

Weak Instruments, Weak Identification, and Many Instruments, Part II

Weak Instruments, Weak Identification, and Many Instruments, Part II NBER Summer Institute What s New in Econometrics ime Series Lecture 4 July 14, 2008 Weak Instruments, Weak Identification, and Many Instruments, Part II Revised July 22, 2008 4-1 Outline Lecture 3 1) What

More information

A more powerful subvector Anderson Rubin test in linear instrumental variable regression

A more powerful subvector Anderson Rubin test in linear instrumental variable regression A more powerful subvector Anderson Rubin test in linear instrumental variable regression Patrik Guggenberger Department of Economics Pennsylvania State University Frank Kleibergen Department of Quantitative

More information

Central Bank of Chile October 29-31, 2013 Bruce Hansen (University of Wisconsin) Structural Breaks October 29-31, / 91. Bruce E.

Central Bank of Chile October 29-31, 2013 Bruce Hansen (University of Wisconsin) Structural Breaks October 29-31, / 91. Bruce E. Forecasting Lecture 3 Structural Breaks Central Bank of Chile October 29-31, 2013 Bruce Hansen (University of Wisconsin) Structural Breaks October 29-31, 2013 1 / 91 Bruce E. Hansen Organization Detection

More information

Department of Econometrics and Business Statistics

Department of Econometrics and Business Statistics ISSN 440-77X Australia Department of Econometrics and Business Statistics http://wwwbusecomonasheduau/depts/ebs/pubs/wpapers/ The Asymptotic Distribution of the LIML Estimator in a artially Identified

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 6 Jakub Mućk Econometrics of Panel Data Meeting # 6 1 / 36 Outline 1 The First-Difference (FD) estimator 2 Dynamic panel data models 3 The Anderson and Hsiao

More information

A more powerful subvector Anderson Rubin test in linear instrumental variable regression

A more powerful subvector Anderson Rubin test in linear instrumental variable regression A more powerful subvector Anderson Rubin test in linear instrumental variable regression Patrik Guggenberger Department of Economics Pennsylvania State University Frank Kleibergen Department of Quantitative

More information

Finite-sample inference in econometrics and statistics

Finite-sample inference in econometrics and statistics Finite-sample inference in econometrics and statistics Jean-Marie Dufour First version: December 1998 This version: November 29, 2006, 4:04pm This work was supported by the Canada Research Chair Program

More information

ROBUST CONFIDENCE SETS IN THE PRESENCE OF WEAK INSTRUMENTS By Anna Mikusheva 1, MIT, Department of Economics. Abstract

ROBUST CONFIDENCE SETS IN THE PRESENCE OF WEAK INSTRUMENTS By Anna Mikusheva 1, MIT, Department of Economics. Abstract ROBUST CONFIDENCE SETS IN THE PRESENCE OF WEAK INSTRUMENTS By Anna Mikusheva 1, MIT, Department of Economics Abstract This paper considers instrumental variable regression with a single endogenous variable

More information

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION VICTOR CHERNOZHUKOV CHRISTIAN HANSEN MICHAEL JANSSON Abstract. We consider asymptotic and finite-sample confidence bounds in instrumental

More information

Intermediate Econometrics

Intermediate Econometrics Intermediate Econometrics Heteroskedasticity Text: Wooldridge, 8 July 17, 2011 Heteroskedasticity Assumption of homoskedasticity, Var(u i x i1,..., x ik ) = E(u 2 i x i1,..., x ik ) = σ 2. That is, the

More information

GMM, Weak Instruments, and Weak Identification

GMM, Weak Instruments, and Weak Identification GMM, Weak Instruments, and Weak Identification James H. Stock Kennedy School of Government, Harvard University and the National Bureau of Economic Research Jonathan Wright Federal Reserve Board, Washington,

More information

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research Linear models Linear models are computationally convenient and remain widely used in applied econometric research Our main focus in these lectures will be on single equation linear models of the form y

More information

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors Laura Mayoral IAE, Barcelona GSE and University of Gothenburg Gothenburg, May 2015 Roadmap Deviations from the standard

More information

Flexible Estimation of Treatment Effect Parameters

Flexible Estimation of Treatment Effect Parameters Flexible Estimation of Treatment Effect Parameters Thomas MaCurdy a and Xiaohong Chen b and Han Hong c Introduction Many empirical studies of program evaluations are complicated by the presence of both

More information

ECON Introductory Econometrics. Lecture 16: Instrumental variables

ECON Introductory Econometrics. Lecture 16: Instrumental variables ECON4150 - Introductory Econometrics Lecture 16: Instrumental variables Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 12 Lecture outline 2 OLS assumptions and when they are violated Instrumental

More information

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Endogeneity b) Instrumental

More information

Instrumental Variables Estimation in Stata

Instrumental Variables Estimation in Stata Christopher F Baum 1 Faculty Micro Resource Center Boston College March 2007 1 Thanks to Austin Nichols for the use of his material on weak instruments and Mark Schaffer for helpful comments. The standard

More information

Testing the New Keynesian Phillips Curve without assuming. identification

Testing the New Keynesian Phillips Curve without assuming. identification Testing the New Keynesian Phillips Curve without assuming identification Sophocles Mavroeidis Brown University sophocles mavroeidis@brown.edu First version: 15 April 2006 This version March 12, 2007 Abstract

More information

1 Procedures robust to weak instruments

1 Procedures robust to weak instruments Comment on Weak instrument robust tests in GMM and the new Keynesian Phillips curve By Anna Mikusheva We are witnessing a growing awareness among applied researchers about the possibility of having weak

More information

Supplemental Material for KERNEL-BASED INFERENCE IN TIME-VARYING COEFFICIENT COINTEGRATING REGRESSION. September 2017

Supplemental Material for KERNEL-BASED INFERENCE IN TIME-VARYING COEFFICIENT COINTEGRATING REGRESSION. September 2017 Supplemental Material for KERNEL-BASED INFERENCE IN TIME-VARYING COEFFICIENT COINTEGRATING REGRESSION By Degui Li, Peter C. B. Phillips, and Jiti Gao September 017 COWLES FOUNDATION DISCUSSION PAPER NO.

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

A New Paradigm: A Joint Test of Structural and Correlation Parameters in Instrumental Variables Regression When Perfect Exogeneity is Violated

A New Paradigm: A Joint Test of Structural and Correlation Parameters in Instrumental Variables Regression When Perfect Exogeneity is Violated A New Paradigm: A Joint Test of Structural and Correlation Parameters in Instrumental Variables Regression When Perfect Exogeneity is Violated By Mehmet Caner and Melinda Sandler Morrill September 22,

More information

What s New in Econometrics. Lecture 1

What s New in Econometrics. Lecture 1 What s New in Econometrics Lecture 1 Estimation of Average Treatment Effects Under Unconfoundedness Guido Imbens NBER Summer Institute, 2007 Outline 1. Introduction 2. Potential Outcomes 3. Estimands and

More information

Discussion Paper On the Validity of Durbin-Wu-Hausman Tests for Assessing Partial Exogeneity Hypotheses with Possibly Weak Instruments

Discussion Paper On the Validity of Durbin-Wu-Hausman Tests for Assessing Partial Exogeneity Hypotheses with Possibly Weak Instruments SCHOOL OF ECONOMICS AND FINANCE Discussion Paper 2012-04 On the Validity of Durbin-Wu-Hausman Tests for Assessing Partial Exogeneity Hypotheses with Possibly Weak Instruments Firmin Doko Tchatoka ISSN

More information

Exponential Tilting with Weak Instruments: Estimation and Testing

Exponential Tilting with Weak Instruments: Estimation and Testing Exponential Tilting with Weak Instruments: Estimation and Testing Mehmet Caner North Carolina State University January 2008 Abstract This article analyzes exponential tilting estimator with weak instruments

More information

Econometric Analysis of Cross Section and Panel Data

Econometric Analysis of Cross Section and Panel Data Econometric Analysis of Cross Section and Panel Data Jeffrey M. Wooldridge / The MIT Press Cambridge, Massachusetts London, England Contents Preface Acknowledgments xvii xxiii I INTRODUCTION AND BACKGROUND

More information

Selective Instrumental Variable Regression

Selective Instrumental Variable Regression Selective Instrumental Variable Regression Alberto Abadie, Jiaying Gu and Shu Shen March 1, 2015 Incomplete Draft, Do Not Circulate Abstract In this paper, we propose consistent data-driven selective instrumental

More information

A Practitioner s Guide to Cluster-Robust Inference

A Practitioner s Guide to Cluster-Robust Inference A Practitioner s Guide to Cluster-Robust Inference A. C. Cameron and D. L. Miller presented by Federico Curci March 4, 2015 Cameron Miller Cluster Clinic II March 4, 2015 1 / 20 In the previous episode

More information

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data Panel data Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data - possible to control for some unobserved heterogeneity - possible

More information

An estimate of the long-run covariance matrix, Ω, is necessary to calculate asymptotic

An estimate of the long-run covariance matrix, Ω, is necessary to calculate asymptotic Chapter 6 ESTIMATION OF THE LONG-RUN COVARIANCE MATRIX An estimate of the long-run covariance matrix, Ω, is necessary to calculate asymptotic standard errors for the OLS and linear IV estimators presented

More information

Missing dependent variables in panel data models

Missing dependent variables in panel data models Missing dependent variables in panel data models Jason Abrevaya Abstract This paper considers estimation of a fixed-effects model in which the dependent variable may be missing. For cross-sectional units

More information

Warwick Economics Summer School Topics in Microeconometrics Instrumental Variables Estimation

Warwick Economics Summer School Topics in Microeconometrics Instrumental Variables Estimation Warwick Economics Summer School Topics in Microeconometrics Instrumental Variables Estimation Michele Aquaro University of Warwick This version: July 21, 2016 1 / 31 Reading material Textbook: Introductory

More information

IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade

IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade Denis Chetverikov Brad Larsen Christopher Palmer UCLA, Stanford and NBER, UC Berkeley September

More information

Bootstrap Tests for Overidentification in Linear Regression Models

Bootstrap Tests for Overidentification in Linear Regression Models Bootstrap Tests for Overidentification in Linear Regression Models Department of Economics and CIREQ McGill University Montréal, Québec, Canada H3A 2T7 Russell Davidson russelldavidson@mcgillca and James

More information

Partial Identification and Confidence Intervals

Partial Identification and Confidence Intervals Partial Identification and Confidence Intervals Jinyong Hahn Department of Economics, UCLA Geert Ridder Department of Economics, USC September 17, 009 Abstract We consider statistical inference on a single

More information

Multivariate Regression

Multivariate Regression Multivariate Regression The so-called supervised learning problem is the following: we want to approximate the random variable Y with an appropriate function of the random variables X 1,..., X p with the

More information

Exam D0M61A Advanced econometrics

Exam D0M61A Advanced econometrics Exam D0M61A Advanced econometrics 19 January 2009, 9 12am Question 1 (5 pts.) Consider the wage function w i = β 0 + β 1 S i + β 2 E i + β 0 3h i + ε i, where w i is the log-wage of individual i, S i is

More information

Instrumental Variables and GMM: Estimation and Testing. Steven Stillman, New Zealand Department of Labour

Instrumental Variables and GMM: Estimation and Testing. Steven Stillman, New Zealand Department of Labour Instrumental Variables and GMM: Estimation and Testing Christopher F Baum, Boston College Mark E. Schaffer, Heriot Watt University Steven Stillman, New Zealand Department of Labour March 2003 Stata Journal,

More information

Advanced Econometrics I

Advanced Econometrics I Lecture Notes Autumn 2010 Dr. Getinet Haile, University of Mannheim 1. Introduction Introduction & CLRM, Autumn Term 2010 1 What is econometrics? Econometrics = economic statistics economic theory mathematics

More information

This chapter reviews properties of regression estimators and test statistics based on

This chapter reviews properties of regression estimators and test statistics based on Chapter 12 COINTEGRATING AND SPURIOUS REGRESSIONS This chapter reviews properties of regression estimators and test statistics based on the estimators when the regressors and regressant are difference

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

Inference in Nonparametric Series Estimation with Data-Dependent Number of Series Terms

Inference in Nonparametric Series Estimation with Data-Dependent Number of Series Terms Inference in Nonparametric Series Estimation with Data-Dependent Number of Series Terms Byunghoon ang Department of Economics, University of Wisconsin-Madison First version December 9, 204; Revised November

More information

Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case

Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case Arthur Lewbel Boston College Original December 2016, revised July 2017 Abstract Lewbel (2012)

More information

LECTURE 2 LINEAR REGRESSION MODEL AND OLS

LECTURE 2 LINEAR REGRESSION MODEL AND OLS SEPTEMBER 29, 2014 LECTURE 2 LINEAR REGRESSION MODEL AND OLS Definitions A common question in econometrics is to study the effect of one group of variables X i, usually called the regressors, on another

More information

Likelihood Ratio based Joint Test for the Exogeneity and. the Relevance of Instrumental Variables

Likelihood Ratio based Joint Test for the Exogeneity and. the Relevance of Instrumental Variables Likelihood Ratio based Joint Test for the Exogeneity and the Relevance of Instrumental Variables Dukpa Kim Yoonseok Lee March 12, 2009 Abstract This paper develops a joint test for the exogeneity and the

More information

LECTURE ON HAC COVARIANCE MATRIX ESTIMATION AND THE KVB APPROACH

LECTURE ON HAC COVARIANCE MATRIX ESTIMATION AND THE KVB APPROACH LECURE ON HAC COVARIANCE MARIX ESIMAION AND HE KVB APPROACH CHUNG-MING KUAN Institute of Economics Academia Sinica October 20, 2006 ckuan@econ.sinica.edu.tw www.sinica.edu.tw/ ckuan Outline C.-M. Kuan,

More information

Heteroskedasticity-Robust Inference in Finite Samples

Heteroskedasticity-Robust Inference in Finite Samples Heteroskedasticity-Robust Inference in Finite Samples Jerry Hausman and Christopher Palmer Massachusetts Institute of Technology December 011 Abstract Since the advent of heteroskedasticity-robust standard

More information

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Classical regression model b)

More information

Weak Instruments in IV Regression: Theory and Practice

Weak Instruments in IV Regression: Theory and Practice Weak Instruments in IV Regression: Theory and Practice Isaiah Andrews, James Stock, and Liyang Sun November 20, 2018 Abstract When instruments are weakly correlated with endogenous regressors, conventional

More information

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Vadim Marmer University of British Columbia Artyom Shneyerov CIRANO, CIREQ, and Concordia University August 30, 2010 Abstract

More information

Appendix A: The time series behavior of employment growth

Appendix A: The time series behavior of employment growth Unpublished appendices from The Relationship between Firm Size and Firm Growth in the U.S. Manufacturing Sector Bronwyn H. Hall Journal of Industrial Economics 35 (June 987): 583-606. Appendix A: The time

More information

Conditional Linear Combination Tests for Weakly Identified Models

Conditional Linear Combination Tests for Weakly Identified Models Conditional Linear Combination Tests for Weakly Identified Models Isaiah Andrews JOB MARKET PAPER Abstract This paper constructs powerful tests applicable in a wide range of weakly identified contexts,

More information

Interpreting Regression Results

Interpreting Regression Results Interpreting Regression Results Carlo Favero Favero () Interpreting Regression Results 1 / 42 Interpreting Regression Results Interpreting regression results is not a simple exercise. We propose to split

More information

Control Function Instrumental Variable Estimation of Nonlinear Causal Effect Models

Control Function Instrumental Variable Estimation of Nonlinear Causal Effect Models Journal of Machine Learning Research 17 (2016) 1-35 Submitted 9/14; Revised 2/16; Published 2/16 Control Function Instrumental Variable Estimation of Nonlinear Causal Effect Models Zijian Guo Department

More information

Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments

Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments Tak Wai Chau February 20, 2014 Abstract This paper investigates the nite sample performance of a minimum distance estimator

More information

A Robust Approach to Estimating Production Functions: Replication of the ACF procedure

A Robust Approach to Estimating Production Functions: Replication of the ACF procedure A Robust Approach to Estimating Production Functions: Replication of the ACF procedure Kyoo il Kim Michigan State University Yao Luo University of Toronto Yingjun Su IESR, Jinan University August 2018

More information

A Test of Cointegration Rank Based Title Component Analysis.

A Test of Cointegration Rank Based Title Component Analysis. A Test of Cointegration Rank Based Title Component Analysis Author(s) Chigira, Hiroaki Citation Issue 2006-01 Date Type Technical Report Text Version publisher URL http://hdl.handle.net/10086/13683 Right

More information

Regularized Empirical Likelihood as a Solution to the No Moment Problem: The Linear Case with Many Instruments

Regularized Empirical Likelihood as a Solution to the No Moment Problem: The Linear Case with Many Instruments Regularized Empirical Likelihood as a Solution to the No Moment Problem: The Linear Case with Many Instruments Pierre Chaussé November 29, 2017 Abstract In this paper, we explore the finite sample properties

More information

Identification and Inference with Many Invalid Instruments

Identification and Inference with Many Invalid Instruments Identification and Inference with Many Invalid Instruments Michal Kolesár Raj Chetty John Friedman Edward Glaeser Guido W. Imbens September 2014 Abstract We study estimation and inference in settings where

More information

Subsampling Tests of Parameter Hypotheses and Overidentifying Restrictions with Possible Failure of Identification

Subsampling Tests of Parameter Hypotheses and Overidentifying Restrictions with Possible Failure of Identification Subsampling Tests of Parameter Hypotheses and Overidentifying Restrictions with Possible Failure of Identification Patrik Guggenberger Department of Economics U.C.L.A. Michael Wolf Department of Economics

More information

Instrumental Variables and the Problem of Endogeneity

Instrumental Variables and the Problem of Endogeneity Instrumental Variables and the Problem of Endogeneity September 15, 2015 1 / 38 Exogeneity: Important Assumption of OLS In a standard OLS framework, y = xβ + ɛ (1) and for unbiasedness we need E[x ɛ] =

More information