CIREQ, CIRANO, Département de sciences économiques, Université de Montréal
Jean-Marie Dufour, CIREQ, CIRANO, William Dow Professor of Economics, Department of Economics, McGill University
June 20, 2008
Motivation

This paper is motivated by some empirical facts:
- Durbin-Wu-Hausman (DWH) specification tests are routinely used as pretests in applied work [see Bradford (2003)];
- for example, in November 2007, www.jstor.org listed 400 citations of Hausman (1978);
- in the American Economic Review alone, more than 75 applied papers used DWH-type tests, and about 25 of them were written in the years 2000-2004 [see Guggenberger (2008)].
The problem is that DWH-type tests are built on the prerequisite of strong instruments. However, the last decade has shown growing interest in weak-instrument problems in the econometric literature, i.e. situations where the instruments are poorly correlated with the endogenous explanatory variables [see the reviews of Dufour (2003) and Stock-Wright-Yogo (2002)].
For example, when the instruments are weak, the limiting distributions of the OLS and 2SLS estimators often depend heavily on nuisance parameters [see Phillips (1989), Bekker (1994), Dufour (1997), Staiger-Stock (1997) and Wang-Zivot (1998)]. Since DWH specification tests are built from these estimators, this raises the following questions: what happens to specification tests when some instruments are weak? Are these tests robust to weak instruments?
Motivation and Main Contributions

Can we use specification tests as an instrument selection procedure under weak identification, as done in Bradford (2003)? If not, is it possible to build a procedure based on these tests that can be used as a pretest even if the instruments are weak? In this paper, we consider 3 versions of the Hausman tests and 4 versions of the DW specification tests and: (1) investigate the finite- and large-sample properties of these tests with (possibly) weak instruments, characterize the distributions of the tests, and show that:
Major Findings

(a) when the model parameters are not identified or are close to being unidentified, 2 versions of the Hausman tests and 1 version of the DW tests are not valid even asymptotically, while the other versions of these tests are valid in both finite and large samples; (b) under weak identification, the DWH tests have low power. More precisely, the DWH tests are inconsistent and their power does not exceed the nominal levels;
(2) we alleviate this drawback by proposing a three-step consistent procedure which can be used as an instrument selection procedure even if the parameters are not identified. The procedure can be summarized as follows: in the first step, we test the weakness of the instruments. If the instruments are strong, we use DWH exogeneity tests in the second step; otherwise, we use identification-robust tests directly. If the exogeneity tests in the second step do not reject exogeneity, one may use identification-robust tests or tests possibly non-robust to weak instruments in the third step.
Consider the following simultaneous equations model:

$y = Y\beta + Z_1\gamma + u, \qquad Y = Z_1\Pi_1 + Z_2\Pi_2 + V \qquad (1)$

data: $y \in \mathbb{R}^T$, $Y \in \mathbb{R}^{T \times G}$, $Z_1 \in \mathbb{R}^{T \times k_1}$, $Z_2 \in \mathbb{R}^{T \times k_2}$, $u \in \mathbb{R}^T$ and $V \in \mathbb{R}^{T \times G}$; unknown coefficients: $\beta \in \mathbb{R}^G$, $\gamma \in \mathbb{R}^{k_1}$, $\Pi_1 \in \mathbb{R}^{k_1 \times G}$, $\Pi_2 \in \mathbb{R}^{k_2 \times G}$.
Define $Z = [Z_1 : Z_2] \in \mathbb{R}^{T \times k}$ and $\bar Z = [Z_1, \bar Z_2]$, where $\bar Z_2 = M_1 Z_2$, $M_1 = I - Z_1(Z_1'Z_1)^{-1}Z_1'$ and $k = k_1 + k_2$. Then (1) can be written as

$y = Z_1(\gamma + \tilde\Pi_1\beta) + \bar Z_2 \Pi_2 \beta + u + V\beta, \qquad Y = Z_1\tilde\Pi_1 + \bar Z_2 \Pi_2 + V, \qquad (2)$

where $\tilde\Pi_1 = \Pi_1 + (Z_1'Z_1)^{-1}Z_1'Z_2\Pi_2$.
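As a quick numerical illustration that (1) and (2) define the same regression function, the following sketch (with arbitrary illustrative dimensions, not taken from the paper) checks the reparametrization exactly:

```python
import numpy as np

# Check that Z1*Pi1 + Z2*Pi2 = Z1*Pi1_tilde + Z2bar*Pi2 exactly, i.e. that
# model (2) is just a reparametrization of model (1).
rng = np.random.default_rng(1)          # arbitrary illustrative dimensions
T, k1, k2, G = 40, 2, 3, 1
Z1 = rng.standard_normal((T, k1))
Z2 = rng.standard_normal((T, k2))
Pi1 = rng.standard_normal((k1, G))
Pi2 = rng.standard_normal((k2, G))

M1 = np.eye(T) - Z1 @ np.linalg.solve(Z1.T @ Z1, Z1.T)    # M_1
Z2bar = M1 @ Z2                                            # \bar Z_2 = M_1 Z_2
Pi1t = Pi1 + np.linalg.solve(Z1.T @ Z1, Z1.T @ Z2) @ Pi2   # \tilde Pi_1

# (1) and (2) give the same conditional mean for Y
assert np.allclose(Z1 @ Pi1 + Z2 @ Pi2, Z1 @ Pi1t + Z2bar @ Pi2)
```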
Define the covariance matrix of $[u_t, V_t']'$ by

$\Sigma = \begin{bmatrix} \sigma_u^2 & \delta' \\ \delta & \Sigma_V \end{bmatrix}. \qquad (3)$

We want to test

$H_0 : \delta = 0. \qquad (4)$
Define the reduced-form errors in (2) as

$\tilde U = [u + V\beta, V] = [\tilde U_1, \tilde U_2, \ldots, \tilde U_T]' \qquad (5)$

and assume that

$\tilde U_t = J W_t, \quad t = 1, \ldots, T, \qquad (6)$

where the vector $\mathrm{vec}(W_1, \ldots, W_T)$ has a known distribution $F_W$ and $J \in \mathbb{R}^{(G+1)\times(G+1)}$ is an unknown, non-singular matrix [see Dufour and Khalaf (1997)].
In particular, these conditions are satisfied when

$W_t \sim N(0, I_{G+1}), \quad t = 1, \ldots, T, \qquad (7)$

in which case the covariance matrix of $\tilde U_t$ is

$\Omega = JJ' = \begin{bmatrix} \sigma_u^2 + \beta'\Sigma_V\beta + 2\beta'\delta & \beta'\Sigma_V + \delta' \\ \Sigma_V\beta + \delta & \Sigma_V \end{bmatrix}. \qquad (8)$

Since $\Omega > 0$, there exists a lower triangular non-singular matrix $P$ such that

$P'\Omega P = I_{G+1}, \qquad P = \begin{bmatrix} P_{11} & 0 \\ P_{21} & P_{22} \end{bmatrix}, \qquad (9)$
where $J = P^{-1}$. A little algebra yields

$P_{11} = (\sigma_u^2 - \delta'\Sigma_V^{-1}\delta)^{-1/2}, \quad P_{21} = -(\beta + \Sigma_V^{-1}\delta)(\sigma_u^2 - \delta'\Sigma_V^{-1}\delta)^{-1/2}, \quad P_{22}'\Sigma_V P_{22} = I_G. \qquad (10)$

Consider the transformation $[\bar y, \bar Y] = [y, Y]P$, i.e.
$\bar y = Z_1(\gamma P_{11} + \tilde\Pi_1\zeta) + \bar Z_2\Pi_2\zeta + w_1, \qquad \bar Y = Z_1\tilde\Pi_1 P_{22} + \bar Z_2\Pi_2 P_{22} + W_2, \qquad (11)$

where $\zeta = \beta P_{11} + P_{21} = -(\Sigma_V^{-1}\delta)/(\sigma_u^2 - \delta'\Sigma_V^{-1}\delta)^{1/2}$. From (11), we have
$M\bar y = Mw_1, \quad M\bar Y = MW_2, \quad M_1\bar y = M_1(\bar Z_2\Pi_2\zeta + w_1), \quad M_1\bar Y = M_1(\bar Z_2\Pi_2 P_{22} + W_2), \qquad (12)$

where $M = I - Z_1(Z_1'Z_1)^{-1}Z_1' - \bar Z_2(\bar Z_2'\bar Z_2)^{-1}\bar Z_2'$ and $M_1 = I - Z_1(Z_1'Z_1)^{-1}Z_1'$.
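A quick numerical sanity check of the triangularization (8)-(10) for $G = 1$; the parameter values below are arbitrary illustrative choices, not taken from the paper:

```python
import numpy as np

# Build Omega from (8) for G = 1 and verify that the closed-form
# lower-triangular P of (10) satisfies P' Omega P = I_{G+1} as in (9).
sigma_u2, sigma_V2, beta, delta = 2.0, 1.5, 0.7, 0.4   # arbitrary values

omega = np.array([
    [sigma_u2 + beta**2 * sigma_V2 + 2 * beta * delta, beta * sigma_V2 + delta],
    [beta * sigma_V2 + delta, sigma_V2],
])

P11 = (sigma_u2 - delta**2 / sigma_V2) ** -0.5   # (sigma_u^2 - d' S_V^{-1} d)^{-1/2}
P21 = -(beta + delta / sigma_V2) * P11           # -(beta + S_V^{-1} d) P_11
P22 = sigma_V2 ** -0.5                           # so that P22' S_V P22 = I_G
P = np.array([[P11, 0.0], [P21, P22]])

assert np.allclose(P.T @ omega @ P, np.eye(2))   # (9): P' Omega P = I
zeta = beta * P11 + P21                          # = -(S_V^{-1} d) P_11, cf. (11)
```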
We also assume that

$u = Va + \varepsilon, \qquad (13)$

where $\varepsilon$ has mean zero and variance $\sigma_\varepsilon^2 I_T$ and is independent of $V$, and $a$ is a $G \times 1$ vector of unknown coefficients; note that (13) implies $\delta = \Sigma_V a$. We make the following generic assumptions on the asymptotic behaviour of the model variables [where $M > 0$ for a matrix $M$ means that $M$ is positive definite (p.d.), and limits are taken as $T \to \infty$]:
$\frac{1}{T}\begin{bmatrix} V'V & V'\varepsilon \\ \varepsilon'V & \varepsilon'\varepsilon \end{bmatrix} \xrightarrow{p} \begin{bmatrix} \Sigma_V & 0 \\ 0 & \sigma_\varepsilon^2 \end{bmatrix} > 0, \qquad (14)$

$\frac{1}{T}\bar Z'[V : \varepsilon] \xrightarrow{p} 0, \qquad \frac{1}{T}\bar Z'\bar Z \xrightarrow{p} \Sigma_{\bar Z} = \begin{bmatrix} \Sigma_{Z_1} & 0 \\ 0 & \Sigma_{\bar Z_2} \end{bmatrix} > 0, \qquad (15)$

$\frac{1}{\sqrt{T}}\bar Z'V \xrightarrow{L} S_V, \quad \mathrm{vec}(S_V) \sim N[0, \Sigma_V \otimes \Sigma_{\bar Z}], \qquad (16)$

$\frac{1}{\sqrt{T}}\begin{bmatrix} \bar Z'u \\ \bar Z'\varepsilon \end{bmatrix} \xrightarrow{L} \begin{bmatrix} S_u \\ S_\varepsilon \end{bmatrix} \sim N[0, \Sigma_S]. \qquad (17)$
Following Staiger and Stock (1997), we consider two setups: (1) $\Pi_2 = \Pi_0$ with $\mathrm{rank}(\Pi_0) = G$, and (2) $\Pi_2 = \Pi_0/\sqrt{T}$ ($\Pi_0 = 0$ is allowed), where $\Pi_0$ is a $k_2 \times G$ constant matrix. For a random variable $X$ whose distribution depends on the sample size $T$, the notation $X \xrightarrow{L} +\infty$ means that $P[X > x] \to 1$ as $T \to \infty$, for any $x$.
We consider 3 versions of the Hausman (1978) tests,

$H_i = T(\hat\beta - \tilde\beta)' S_i^{-1}(\hat\beta - \tilde\beta), \quad i = 1, 2, 3, \qquad (18)$

and 4 versions of the Durbin-Wu tests [see Wu (1973)],

$T_l = \kappa_l(\hat\beta - \tilde\beta)' \tilde S_l^{-1}(\hat\beta - \tilde\beta), \quad l = 1, 2, 3, 4, \qquad (19)$
where

$S_1 = \tilde\sigma^2(Y'P_{\bar Z_2}Y)^{-1} - \hat\sigma^2(Y'M_1Y)^{-1}, \quad S_2 = \tilde\sigma^2\big[(Y'P_{\bar Z_2}Y)^{-1} - (Y'M_1Y)^{-1}\big], \quad S_3 = \hat\sigma^2\big[(Y'P_{\bar Z_2}Y)^{-1} - (Y'M_1Y)^{-1}\big], \qquad (20)$

$\tilde\sigma^2/T$ is the 2SLS-based estimator of $\sigma_u^2$, $\hat\sigma^2/T$ is the OLS-based estimator of $\sigma_u^2$, $\hat\beta$ is the OLS estimator of $\beta$, and $\tilde\beta$ is the 2SLS estimator of $\beta$, $\qquad (21)$
$\tilde S_1 = \tilde\sigma_1^2\big[(Y'P_{\bar Z_2}Y)^{-1} - (Y'M_1Y)^{-1}\big], \quad \tilde S_2 = Q_2\big[(Y'P_{\bar Z_2}Y)^{-1} - (Y'M_1Y)^{-1}\big], \qquad (22)$

$\tilde\sigma_1^2 = (y - Y\tilde\beta)'P_{\bar Z_2}(y - Y\tilde\beta), \quad Q_2 = \hat\sigma^2 - (\hat\beta - \tilde\beta)'\big[(Y'P_{\bar Z_2}Y)^{-1} - (Y'M_1Y)^{-1}\big]^{-1}(\hat\beta - \tilde\beta), \qquad (23)$

$T_3 = \frac{\kappa_3}{T} H_2, \qquad T_4 = \frac{\kappa_4}{T} H_3, \qquad (24)$

$\kappa_1 = (k_2 - G)/G, \quad \kappa_2 = (T - k_1 - 2G)/G, \quad \kappa_3 = \kappa_4 = (T - k_1 - G)/G.$
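For concreteness, a minimal sketch of these statistics for the special case $G = 1$ and $k_1 = 0$ (so $M_1 = I$ and $\bar Z_2 = Z$); the function names and test data are illustrative assumptions, not from the paper:

```python
import numpy as np

def dwh_statistics(y, Y, Z):
    """H_1, H_2, H_3 of (18)/(20) and Wu's T_2 of (19)/(22)-(23), specialized
    to G = 1 endogenous regressor and k_1 = 0 (so M_1 = I, Z2bar = Z).
    sig2_tilde and sig2_hat are residual sums of squares, so sig2_tilde/T and
    sig2_hat/T are the variance estimators appearing in (21)."""
    T = len(y)
    PZ_Y = Z @ np.linalg.solve(Z.T @ Z, Z.T @ Y)      # P_Z Y
    b_ols = (Y @ y) / (Y @ Y)                         # OLS estimate of beta
    b_2sls = (PZ_Y @ y) / (PZ_Y @ Y)                  # 2SLS estimate of beta
    sig2_tilde = np.sum((y - Y * b_2sls) ** 2)        # 2SLS residual SSR
    sig2_hat = np.sum((y - Y * b_ols) ** 2)           # OLS residual SSR
    inv_P = 1.0 / (Y @ PZ_Y)                          # (Y' P_Z Y)^{-1}
    inv_M = 1.0 / (Y @ Y)                             # (Y' M_1 Y)^{-1}
    d = b_ols - b_2sls
    S1 = sig2_tilde * inv_P - sig2_hat * inv_M        # may be negative in finite samples
    S2 = sig2_tilde * (inv_P - inv_M)
    S3 = sig2_hat * (inv_P - inv_M)
    H1, H2, H3 = (T * d * d / S for S in (S1, S2, S3))
    Q = d * d / (inv_P - inv_M)
    kappa2 = T - 2                                    # (T - k_1 - 2G)/G
    T2 = kappa2 * Q / (sig2_hat - Q)                  # (19) with (22)-(23)
    return H1, H2, H3, T2
```

Note that these definitions imply the exact algebraic identity $H_3 = T\,T_2/(\kappa_2 + T_2)$, since $H_3 = TQ/\hat\sigma^2$ and $T_2 = \kappa_2 Q/(\hat\sigma^2 - Q)$.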
Distributions with possibly weak instruments

The statistics $H_i$ and $T_l$ above are all invariant to the transformation $P$ defined by (11): we can replace $[y, Y]$ by $[\bar y, \bar Y]$ in the expressions of the tests. For the finite-sample theory, we distinguish two cases: (1) $[w_{1t}, W_{2t}']' \sim F_W$, where $F_W$ is known but not necessarily standard; (2) $[w_{1t}, W_{2t}']' \sim N(0, I_{G+1})$.
Distributions of the statistics

Theorem 1: Under the assumptions of the model, if $Z$ is fixed and $W \sim F_W$, then

$H_i \sim F_i(W, \mu_1, \mu_2), \quad i = 1, 2, 3, \qquad T_l \sim G_l(W, \mu_1, \mu_2), \quad l = 1, 2, 3, 4, \qquad (25)$

where $\mu_1 = \bar Z_2\Pi_2\zeta = \mu_2 P_{22}^{-1}\zeta$, $\mu_2 = \bar Z_2\Pi_2 P_{22}$ and $\zeta = -(\Sigma_V^{-1}\delta)/(\sigma_u^2 - \delta'\Sigma_V^{-1}\delta)^{1/2}$.
Under $H_0$, we have $\mu_1 = 0$, hence the above distributions depend only on $\mu_2$ as a nuisance parameter. If $\Pi_2 = 0$, we have $\mu_1 = 0$ and $\mu_2 = 0$ for any $\delta$, and the statistics are pivotal.
Distributions under normality

Theorem 2: Under the assumptions of the model, if $Z$ is fixed and $W \sim N(0, I_{G+1})$, then, conditional on $W_2$,

$H_3 \sim T\,\frac{F(G,\ T - k_1 - 2G;\ \nu_1, \upsilon_2)}{\kappa_2 + F(G,\ T - k_1 - 2G;\ \nu_1, \upsilon_2)}, \qquad T_1 \sim F(G,\ k_2 - G;\ \nu_1, \upsilon_1), \qquad T_2 \sim F(G,\ T - k_1 - 2G;\ \nu_1, \upsilon_2),$

$T_4 \sim \kappa_4\,\frac{F(G,\ T - k_1 - 2G;\ \nu_1, \upsilon_2)}{\kappa_2 + F(G,\ T - k_1 - 2G;\ \nu_1, \upsilon_2)},$
where

$\nu_1 = \mu_1'(C'D^{-1}C)\mu_1, \quad \upsilon_1 = \mu_1'E\mu_1, \quad \upsilon_2 = \mu_1'(C - C'D^{-1}C)\mu_1, \quad \upsilon_3 = \mu_1'D'D\mu_1.$

If $\Pi_2 = 0$ or $\delta = 0$, we have $\mu_1 = 0$, so $\nu_1 = \upsilon_1 = \upsilon_2 = \upsilon_3 = 0$. Hence we have
$T_1 \sim F(G,\ k_2 - G), \qquad T_2 \sim F(G,\ T - k_1 - 2G),$

$T_4 \sim \kappa_4\,\frac{F(G,\ T - k_1 - 2G)}{\kappa_2 + F(G,\ T - k_1 - 2G)}, \qquad (26)$

$H_3 \sim T\,\frac{F(G,\ T - k_1 - 2G)}{\kappa_2 + F(G,\ T - k_1 - 2G)}. \qquad (27)$
Null limiting distributions

Theorem 3: Suppose the assumptions of the model hold and $\delta = 0$. (A) If $\Pi_2 = \Pi_0$, where $\Pi_0$ is a $k_2 \times G$ constant matrix with rank $G$, then

$H_i \xrightarrow{L} \chi^2(G), \quad i = 1, 2, 3, \qquad T_l \xrightarrow{L} \tfrac{1}{G}\chi^2(G), \quad l = 2, 3, 4, \qquad T_1 \xrightarrow{L} F(G,\ k_2 - G).$
(B) If $\Pi_2 = \Pi_0/\sqrt{T}$, where $\Pi_0 = 0$ is allowed, then

$H_3 \xrightarrow{L} \chi^2(G), \qquad H_i \xrightarrow{L} H_{i\infty} \preceq \chi^2(G), \quad i = 1, 2,$

$T_1 \xrightarrow{L} F(G,\ k_2 - G), \qquad T_l \xrightarrow{L} \tfrac{1}{G}\chi^2(G), \quad l = 2, 4, \qquad T_3 \xrightarrow{L} \tfrac{1}{G}H_{2\infty} \preceq \tfrac{1}{G}\chi^2(G),$

where $\preceq$ denotes stochastic dominance: $H_1$, $H_2$ and $T_3$ are asymptotically bounded by their usual critical values, hence conservative.
Power functions

Theorem 4: Suppose the assumptions of the model hold and $\delta \neq 0$. (A) If $\Pi_2 = \Pi_0$, where $\Pi_0$ is a $k_2 \times G$ constant matrix with rank $G$, then

$H_i \xrightarrow{L} +\infty, \quad i = 1, 2, 3, \qquad T_l \xrightarrow{L} +\infty, \quad l = 1, 2, 3, 4.$

(B) If $\Pi_2 = \Pi_0/\sqrt{T}$, where $\Pi_0$ is a $k_2 \times G$ constant matrix, then, conditional on $S_V$,

$H_3 \xrightarrow{L} \chi^2(G; \mu_V), \qquad T_1 \xrightarrow{L} F(G,\ k_2 - G;\ \mu_V, \lambda_V), \qquad T_2,\ T_4 \xrightarrow{L} \tfrac{1}{G}\chi^2(G; \mu_V),$
where

$\lambda_V = \frac{1}{\sigma_\varepsilon^2}\, a'S_V'\big(\Sigma_{\bar Z_2}^{-1} - \Sigma_{\bar Z_2}^{-1}\Sigma_{\bar Z_2 V}\Sigma_{\bar Z_2}^{-1}\big)S_V a, \qquad \mu_V = \frac{1}{\sigma_\varepsilon^2}\, a'\Pi_{0V}'\Pi_{0V}\,a = \frac{1}{\sigma_\varepsilon^2}\,\delta'\Sigma_V^{-1}\Pi_{0V}'\Pi_{0V}\Sigma_V^{-1}\delta,$

using $a = \Sigma_V^{-1}\delta$ from (13).
(C) If $\Pi_2 = 0$, then

$H_1,\ H_2 \xrightarrow{L} H_{i\infty} \preceq \chi^2(G), \qquad H_3 \xrightarrow{L} \chi^2(G), \qquad T_1 \xrightarrow{L} F(G,\ k_2 - G),$

$T_2,\ T_4 \xrightarrow{L} \tfrac{1}{G}\chi^2(G), \qquad T_3 \xrightarrow{L} \tfrac{1}{G}H_{2\infty} \preceq \tfrac{1}{G}\chi^2(G)$

(where $\preceq$ denotes stochastic dominance by the indicated distribution).
Simulations

The DGP is

$y = Y\beta + u, \qquad Y = Z\pi_2 + V, \qquad (28)$

where $(u_t, V_t)' \overset{i.i.d.}{\sim} N\!\left(0, \begin{bmatrix} 1 & \delta \\ \delta & 1 \end{bmatrix}\right)$, $Z$ is a $T \times k$ matrix of i.i.d. variates uncorrelated with $(u, V)$ such that $Z_t \sim N(0, I)$, $Y_t$ is a scalar, $k$ varies from 5 to 20, and $\delta \in \{-.5, -.25, 0, .25, .5\}$.
$\pi_2 = \eta\pi$, with $\eta = 0$ (design of strict non-identification), $\eta = 0.001$ (design of weak identification), $\eta = 1$ (design of strong identification), and $\pi$ a vector of ones. $N = 10000$ replications and $T = 50, 100, 500$.
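A reduced-scale sketch of this experiment for the $T_2$ statistic ($G = 1$; far fewer replications than the $N = 10000$ above; the helper mirrors the definitions in (19)-(23) with $k_1 = 0$):

```python
import numpy as np
from scipy import stats

def t2_stat(y, Y, Z):
    """Wu's T_2 statistic for G = 1 and no included exogenous regressors."""
    T = len(y)
    PZ_Y = Z @ np.linalg.solve(Z.T @ Z, Z.T @ Y)
    b_ols = (Y @ y) / (Y @ Y)
    b_2sls = (PZ_Y @ y) / (PZ_Y @ Y)
    ssr_ols = np.sum((y - Y * b_ols) ** 2)
    Q = (b_ols - b_2sls) ** 2 / (1.0 / (Y @ PZ_Y) - 1.0 / (Y @ Y))
    return (T - 2) * Q / (ssr_ols - Q)       # kappa_2 = (T - k_1 - 2G)/G = T - 2

def rejection_rate(T, k, delta, eta, n_rep, seed=0):
    """Empirical rejection frequency of T_2 at the 5% level under design (28)."""
    rng = np.random.default_rng(seed)
    crit = stats.f.ppf(0.95, 1, T - 2)       # F(G, T - k_1 - 2G) critical value
    beta, pi = 5.0, eta * np.ones(k)
    cov = [[1.0, delta], [delta, 1.0]]
    hits = 0
    for _ in range(n_rep):
        uv = rng.multivariate_normal([0.0, 0.0], cov, size=T)
        Z = rng.standard_normal((T, k))
        Y = Z @ pi + uv[:, 1]
        y = beta * Y + uv[:, 0]
        hits += t2_stat(y, Y, Z) > crit
    return hits / n_rep
```

Under $\delta = 0$, the rejection rate should land near .05 whether $\eta = 1$ or $\eta = 0.001$, while $\delta = .5$ with $\eta = 1$ should give high power, in line with Table 1.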
Table 1: Percent rejected at nominal level of 5% for T = 50, β = 5

                      δ = 0                       δ = .25
Stat    k     η=0     η=.001   η=1        η=0     η=.001   η=1
T_1     5     5.33    4.93     4.85       5.27    4.82     20.2
T_2     5     5.2     5.03     5.11       5.12    4.85     31.01
T_3     5     0.38    0.32     4.87       0.42    0.35     30.13
T_4     5     5.08    4.92     5.03       4.98    4.77     30.65
H_1     5     0.27    0.21     4.3        0.24    0.22     28.06
H_2     5     0.43    0.39     5.09       0.45    0.37     30.89
H_3     5     5.35    5.12     5.22       5.3     5.06     31.42
T_1     20    4.77    5.06     5.1        4.74    4.9      22.58
T_2     20    5.01    5.15     5.35       4.69    5.03     24.91
T_3     20    3.5     3.53     5.21       3.19    3.49     24.55
T_4     20    4.88    5.08     5.21       4.56    4.93     24.63
H_1     20    3.03    2.81     4.37       2.6     2.81     22.42
H_2     20    3.65    3.82     5.49       3.39    3.78     25.13
H_3     20    5.09    5.32     5.55       4.88    5.14     25.18
Table 1 (continued): Percent rejected at nominal level of 5% for T = 50, β = 5

                      δ = .50                     δ = .25
Stat    k     η=0     η=.001   η=1        η=0     η=.001   η=1
T_1     5     4.88    4.69     53.71      5.17    4.63     20.28
T_2     5     4.98    4.64     82         5.15    4.56     32.1
T_3     5     0.25    0.34     81.4       0.37    0.32     31.25
T_4     5     4.87    4.55     81.8       5.06    4.43     31.8
H_1     5     0.19    0.25     79.54      0.23    0.2      28.65
H_2     5     0.29    0.39     81.92      0.4     0.35     31.98
H_3     5     5.11    4.78     82.3       5.31    4.72     32.47
T_1     20    4.96    4.85     61.66      4.72    4.99     21.99
T_2     20    5.03    4.89     69.21      4.85    4.81     24.44
T_3     20    3.43    3.22     68.81      3.21    3.29     24.05
T_4     20    4.92    4.77     68.91      4.77    4.68     24.12
H_1     20    2.67    2.69     65.99      2.62    2.72     22.26
H_2     20    3.59    3.54     69.57      3.46    3.49     24.71
H_3     20    5.12    5.11     69.64      5       5        24.78
Table 1: Percent rejected at nominal level of 5% for T = 500, β = 5

                      δ = 0                       δ = .25
Stat    k     η=0     η=.001   η=1        η=0     η=.001   η=1
T_1     5     4.75    5.56     5.09       5       4.96     95.01
T_2     5     5.36    5.49     5.4        5.03    5.13     99.89
T_3     5     0.31    0.24     5.36       0.26    0.23     99.89
T_4     5     5.34    5.49     5.38       5.01    5.11     99.89
H_1     5     0.31    0.21     5.24       0.26    0.22     99.88
H_2     5     0.31    0.24     5.38       0.26    0.24     99.89
H_3     5     5.36    5.53     5.41       5.04    5.17     99.89
T_1     20    4.74    4.79     5.11       4.8     5.33     99.83
T_2     20    4.55    4.78     5.13       5.02    5.39     99.95
T_3     20    2.63    2.68     5.13       2.74    3.17     99.95
T_4     20    4.55    4.78     5.13       5.02    5.38     99.95
H_1     20    2.55    2.64     5.08       2.7     3.12     99.95
H_2     20    2.65    2.69     5.15       2.78    3.19     99.95
H_3     20    4.55    4.8      5.15       5.05    5.39     99.95
Three-step procedure

Let us reconsider model (2):

$y = Y\beta + Z_1\gamma + u, \qquad Y = Z_1\tilde\Pi_1 + \bar Z_2\Pi_2 + V. \qquad (29)$

The parameter of interest is $\beta$, and we want to test $H_0 : \beta = \beta_0$ using $Z_2$ as instruments.
However, we suspect that some variables in $Z_2$ are endogenous. What people often do in empirical work [see Bradford (2003)] is to apply a two-stage test, using DWH tests in the first stage to test the validity of these instruments. As we have shown here, this may be misleading under weak identification.
We propose the following three-step procedure to alleviate this drawback of the DWH tests:

Step 1: test whether the instruments $Z_2$ are weak, i.e. test the hypothesis

$H_{\pi_2} : m_2 = \det(\Pi_2'\Pi_2) = 0; \qquad (30)$

(1a) if $H_{\pi_2}$ is rejected, go to step 2; (1b) if $H_{\pi_2}$ is not rejected, go to step (3a).
Step 2: use exogeneity tests (Durbin-Wu-Hausman pretests): (2a) if exogeneity is not rejected, use the t-test based on the OLS estimator; (2b) if exogeneity is rejected, go to step (3a) or (3b).

Step 3: (3a) use identification-robust tests; (3b) use tests possibly non-robust to weak instruments.

THANK YOU!!!
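The decision tree of the three-step procedure can be sketched as a small function; this is an illustrative skeleton in which the p-values of the step-1 weakness test and the step-2 DWH test are assumed to be computed elsewhere:

```python
def three_step(p_weak_iv, p_exogeneity, alpha=0.05):
    """Decision tree of the three-step procedure.  p_weak_iv is the p-value of
    a test of H_pi2 (step 1) and p_exogeneity the p-value of a DWH exogeneity
    test (step 2); how these p-values are computed is left unspecified here."""
    # Step 1: if H_pi2 (weakness of the instruments) is not rejected,
    # identification is in doubt -> go directly to (3a)
    if p_weak_iv >= alpha:
        return "3a: identification-robust tests"
    # Step 2: instruments deemed strong -> DWH exogeneity pretest
    if p_exogeneity >= alpha:
        return "2a: t-test based on the OLS estimator"
    # (2b): exogeneity rejected -> step 3
    return "3a or 3b: identification-robust or possibly non-robust tests"
```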