2009 Granger Lecture
Granger Center for Time Series, The University of Nottingham

Instrumental Variables Regression, GMM, and Weak Instruments in Time Series

James H. Stock
Department of Economics, Harvard University
June 11, 2009
NBER-NSF Time Series Conference, SMU, Sept. 2005
Outline
1) IV regression and GMM
2) Problems posed by weak instruments
3) Detection of weak instruments
4) Some solutions to weak instruments
5) Current research issues & literature
1) IV Regression and GMM

Philip Wright (1928) needed to estimate the supply equation for butter:

ln(Q_t^butter) = β₀ + β₁ ln(P_t^butter) + u_t

or: y_t = Y_t′β + u_t (generic notation)

β₁ = price elasticity of supply of butter

OLS is inconsistent because ln(P_t^butter) is endogenous: price and quantity are determined simultaneously by supply and demand.

Philip Wright (1861-1934), MA Harvard, Econ, 1887; Lecturer, Harvard, 1913-1917
Figure 4, p. 296, Wright (1928), Appendix B
Derivation of the IV estimator in P. Wright (1928, p. 314):

"Now multiply each term in this equation by A (the corresponding deviation in the price of a substitute) and we shall have: eAP = AO − AS₁. Suppose this multiplication to be performed for every pair of price-output deviations and the results added, then: eΣAP = ΣAO − ΣAS₁, or e = (ΣAO − ΣAS₁)/ΣAP. But A was a factor which did not affect supply conditions; hence it is uncorrelated with S₁; hence ΣAS₁ = 0; and hence e = ΣAO/ΣAP."

In modern notation: for an exogenous instrument, i.e. Z_t s.t. EZ_tu_t = 0,

Z′(y − Yβ) = Z′y − Z′Yβ = Z′u, and EZ′u = 0, so EZ′y − EZ′Yβ = 0,

which suggests: β̂_TSLS = Z′y / Z′Y
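Wright's ratio Z′y/Z′Y can be seen at work in a small simulation (not from the lecture; an illustrative sketch assuming numpy, with all data-generating numbers invented):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10_000
beta = 1.5

z = rng.normal(size=T)              # exogenous instrument: E[z u] = 0
u = rng.normal(size=T)
v = 0.8 * u + rng.normal(size=T)    # corr(u, v) != 0 makes Y endogenous
Y = 2.0 * z + v                     # first stage: a strong instrument
y = beta * Y + u

beta_iv = (z @ y) / (z @ Y)         # Wright's ratio estimator Z'y / Z'Y
beta_ols = (Y @ y) / (Y @ Y)        # inconsistent under endogeneity
```

With a strong instrument and large T, `beta_iv` lands near the true β = 1.5 while `beta_ols` is pushed up by the positive correlation between u and v.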
Classical IV regression model & notation

Equation of interest: y_t = Y_t′β + u_t, m = dim(Y_t)
k exogenous instruments Z_t: E(u_t|Z_t) = 0, k = dim(Z_t)
Auxiliary equations: Y_t = Π′Z_t + v_t, corr(u_t, v_t) = ρ (vector)
Sampling assumption: (y_t, Y_t, Z_t) are i.i.d. (for now)

Equations in matrix form:
y = Yβ + u ("second stage")
Y = ZΠ + v ("first stage")

Comments:
- We assume throughout that the instrument is exogenous (E(u_t|Z_t) = 0)
- Included exogenous regressors have been omitted without loss of generality
- The auxiliary equation is just the projection of Y on Z
Generalized Method of Moments

GMM notation and estimator:
- GMM error term (G equations): h(y_t; θ); θ₀ = true value. Note: in the linear model, h(y_t; θ) = y_t − θ′Y_t
- Errors times k instruments (Gk × 1): φ_t(θ) = h(y_t; θ) ⊗ Z_t
- Moment conditions: Eφ_t(θ₀) = E[h(y_t; θ₀) ⊗ Z_t] = 0
- GMM objective function:
  S_T(θ) = [T^(-1/2) Σ_{t=1}^T φ_t(θ)]′ W_T [T^(-1/2) Σ_{t=1}^T φ_t(θ)]
- GMM estimator: θ̂ minimizes S_T(θ)
- Efficient (infeasible) GMM: W_T = Ω⁻¹, Ω = 2π S_{φ_t(θ₀)}(0) (the spectral density of φ_t(θ₀) at frequency zero)
- CUE (Hansen, Heaton, Yaron 1996): W_T = Ω̂(θ)⁻¹ (each θ)
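As a concrete illustration of the objective function S_T(θ) in the linear model (an illustrative sketch, not from the slides; the DGP and the 2SLS weighting W_T = (Z′Z/T)⁻¹ for the homoskedastic case are assumptions):

```python
import numpy as np

def gmm_objective(theta, y, Y, Z, W):
    """S_T(theta) = [T^{-1/2} sum phi_t]' W [T^{-1/2} sum phi_t],
    with phi_t(theta) = (y_t - theta*Y_t) Z_t in the linear model."""
    T = len(y)
    g = Z.T @ (y - theta * Y) / np.sqrt(T)   # scaled sample moments
    return g @ W @ g

rng = np.random.default_rng(1)
T, k = 5000, 3
Z = rng.normal(size=(T, k))
u = rng.normal(size=T)
v = 0.5 * u + rng.normal(size=T)
Y = Z @ np.array([1.0, 0.5, 0.5]) + v        # strong first stage
y = 2.0 * Y + u

W = np.linalg.inv(Z.T @ Z / T)               # 2SLS weighting (homoskedastic case)
grid = np.linspace(1.0, 3.0, 201)
S = [gmm_objective(t, y, Y, Z, W) for t in grid]
theta_hat = float(grid[int(np.argmin(S))])   # grid minimizer, ~ TSLS here
```

With this weighting matrix the grid minimizer coincides (up to grid resolution) with TSLS.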
Weak instruments: four examples

Example #1 (time-series IV): estimating the elasticity of intertemporal substitution, linearized Euler equation; e.g. Campbell (2003), Handbook of the Economics of Finance.

Δc_{t+1} = consumption growth, t to t+1
r_{i,t+1} = return on the i-th asset, t to t+1

Log-linearized Euler equation moment condition:

E_t(Δc_{t+1} − τ_i − ψ r_{i,t+1}) = 0

where ψ = elasticity of intertemporal substitution (EIS); 1/ψ = coefficient of relative risk aversion under power utility.

Resulting IV estimating equation:

E[(Δc_{t+1} − τ_i − ψ r_{i,t+1}) Z_t] = 0 (or use Z_{t−1} because of temporal aggregation)
EIS estimating equations:

Δc_{t+1} = τ_i + ψ r_{i,t+1} + u_{i,t+1}   (a)
or
r_{i,t+1} = μ_i + (1/ψ)Δc_{t+1} + η_{i,t+1}   (b)

Under homoskedasticity, standard estimation is by the TSLS estimator in (a), or by the inverse of the TSLS estimator in (b).

Findings in the literature (e.g. Campbell (2003), US data):
regression (a): 95% TSLS CI for ψ is (−.14, .28)
regression (b): 95% TSLS CI for 1/ψ is (−.73, 2.14)

What is going on?
Example #2 (cross-section IV): Angrist-Krueger (1991), what are the returns to education?

Example #3 (linear GMM): hybrid New Keynesian Phillips curve, e.g. Gali and Gertler (1999), where x_t = labor share; see the survey by Kleibergen and Mavroeidis (2008).

Example #4 (nonlinear GMM): estimating the elasticity of intertemporal substitution, nonlinear Euler equation; Hansen, Heaton, Yaron (1996), Stock & Wright (2000), Neely, Roy, & Whiteman (2001).
Working definition of weak identification

θ is weakly identified if the distributions of GMM or IV estimators and test statistics are not well approximated by their standard asymptotic normal or chi-squared limits because of limited information in the data.

- Departures from standard asymptotics are what matter in practice
- The source of the failures is limited information, not (for example) heavy-tailed distributions, near-unit roots, or unmodeled breaks
- The focus is on large T
- Throughout, we assume instrument exogeneity
Why do weak instruments cause problems?

IV regression with one Y and a single irrelevant instrument:

β̂_TSLS = Z′y/Z′Y = Z′(Yβ + u)/Z′Y = β + Z′u/Z′Y

If Z is irrelevant (as in Bound et al. (1995)), then Y = ZΠ + v = v, so

β̂_TSLS − β = Z′u/Z′v = [T^(-1/2) Σ_{t=1}^T Z_t u_t]/[T^(-1/2) Σ_{t=1}^T Z_t v_t] →d z_u/z_v,

where (z_u, z_v)′ ~ N(0, Σ), Σ = [σ_u², σ_uv; σ_uv, σ_v²].

Comments:
- β̂_TSLS isn't consistent (nor should it be!)
- The distribution of β̂_TSLS is Cauchy-like, a ratio of correlated normals (Choi & Phillips (1992))
The distribution of β̂_TSLS is a mixture of normals with nonzero mean: write z_u = δz_v + η, η ⊥ z_v, where δ = σ_uv/σ_v². Then

z_u/z_v = (δz_v + η)/z_v = δ + η/z_v,

and z_v ~ N(0, σ_v²), so the asymptotic distribution of β̂_TSLS is the mixture of normals

β̂_TSLS →d (β₀ + δ) + ∫ N(0, σ_η²/z_v²) f_{z_v}(z_v) dz_v   (1 irrelevant instrument)

- heavy tails (the mixture is based on an inverse chi-squared)
- the center of the distribution of β̂_TSLS is β₀ + δ. But

β̂_OLS − β₀ = (Y′u/T)/(Y′Y/T) = (v′u/T)/(v′v/T) →p σ_uv/σ_v² = δ,

so plim(β̂_OLS) = β₀ + δ: the TSLS distribution is centered at the plim of OLS.
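The heavy tails and the centering at plim(β̂_OLS) = β₀ + δ are easy to see by Monte Carlo (an illustrative sketch, not from the slides; all parameter values are invented, with σ_v² normalized to 1 so δ = σ_uv):

```python
import numpy as np

rng = np.random.default_rng(2)
T, reps = 200, 5000
beta0, s_uv = 0.0, 0.8      # true beta; delta = sigma_uv / sigma_v^2 = 0.8

est = np.empty(reps)
for r in range(reps):
    z = rng.normal(size=T)                    # instrument, but...
    u = rng.normal(size=T)
    v = s_uv * u + np.sqrt(1 - s_uv**2) * rng.normal(size=T)  # var(v) = 1
    Y = v                                     # ...irrelevant: Pi = 0
    y = beta0 * Y + u
    est[r] = (z @ y) / (z @ Y)                # TSLS with one instrument

delta = s_uv                                  # center shift = plim(OLS) - beta0
med = np.median(est)                          # ~ beta0 + delta, not beta0
spread = np.percentile(est, 97.5) - np.percentile(est, 2.5)   # Cauchy-like
```

The median piles up at β₀ + δ = 0.8, and the 95% spread is an order of magnitude wider than a normal approximation would suggest.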
The unidentified and strong-instrument distributions are two ends of a spectrum.

Distribution of the TSLS t-statistic (Nelson-Startz (1990a,b)): dark line = irrelevant instruments; dashed light line = strong instruments; intermediate cases: weak instruments.

The key parameter is the concentration parameter:

μ² = Π′Z′ZΠ/σ_v² = k × (numerator noncentrality parameter of the first-stage F statistic)
Weak instrument asymptotics bridges this spectrum.

Adopt a nesting that makes the concentration parameter tend to a constant as the sample size increases, by setting

Π = C/√T   (weak instrument asymptotics)

This is the Pitman drift for obtaining the local power function of the first-stage F. This nesting holds Eμ² constant as T → ∞.

Under this nesting, F →d (noncentral χ²_k)/k with noncentrality parameter Eμ²/k (so F = O_p(1)).

Letting the parameter depend on the sample size is a common way to obtain good approximations, e.g. local-to-unit roots (Bobkoski 1983, Cavanagh 1985, Chan and Wei 1987, and Phillips 1987).
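A quick numerical check that F stays O_p(1) under the nesting Π = C/√T (an illustrative sketch, not from the slides; C, k, and sample sizes are invented):

```python
import numpy as np

rng = np.random.default_rng(3)
k, C_scalar, reps = 4, 2.0, 1000

def first_stage_F(T):
    """First-stage F for H0: Pi = 0 in Y = Z Pi + v, with Pi = C/sqrt(T)."""
    Z = rng.normal(size=(T, k))
    Pi = np.full(k, C_scalar / np.sqrt(T))    # drifting first-stage coefficients
    v = rng.normal(size=T)
    Y = Z @ Pi + v
    Pih = np.linalg.lstsq(Z, Y, rcond=None)[0]
    fit = Z @ Pih
    resid = Y - fit
    return (fit @ fit / k) / (resid @ resid / (T - k))

F_small = np.array([first_stage_F(100) for _ in range(reps)])
F_big = np.array([first_stage_F(10_000) for _ in range(reps)])
```

Here Eμ² ≈ k·C² = 16, so both sample sizes give E[F] ≈ (k + 16)/k = 5: increasing T a hundredfold does not move the F distribution, exactly as the nesting intends.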
Weak IV asymptotics for the TSLS estimator, 1 included endogenous variable:

β̂_TSLS − β₀ = (Y′P_Z u)/(Y′P_Z Y)

Now

Y′P_Z Y = (ZΠ + v)′Z(Z′Z)⁻¹Z′(ZΠ + v)
 = [Π′Z′Z/T^(1/2) + v′Z/T^(1/2)] (Z′Z/T)⁻¹ [Z′ZΠ/T^(1/2) + Z′v/T^(1/2)]
 = [C′(Z′Z/T) + v′Z/T^(1/2)] (Z′Z/T)⁻¹ [(Z′Z/T)C + Z′v/T^(1/2)]   (using Π = C/√T)
 →d (λ + z_v)′(λ + z_v),

where λ = Q_ZZ^(1/2)C, Q_ZZ = EZ_tZ_t′, and (z_u′, z_v′)′ ~ N(0, [σ_u², σ_uv; σ_uv, σ_v²] ⊗ I_k).
Similarly,

Y′P_Z u = [C′(Z′Z/T) + v′Z/T^(1/2)] (Z′Z/T)⁻¹ Z′u/T^(1/2) →d (λ + z_v)′z_u,

so

β̂_TSLS − β₀ →d (λ + z_v)′z_u / [(λ + z_v)′(λ + z_v)]

- Under weak instrument asymptotics, μ² →p C′Q_ZZC/σ_v² = λ′λ/σ_v²
- Unidentified special case (λ = 0): β̂_TSLS − β₀ →d z_v′z_u/z_v′z_v (obtained earlier)
- Strong IVs (λ′λ → ∞): (λ′λ)^(1/2)(β̂_TSLS − β₀) →d λ′z_u/(λ′λ)^(1/2) ~ N(0, σ_u²) (the standard limit)
Summary of weak IV asymptotic results:

- The resulting asymptotic distributions are the same as in the exact normal classical model with fixed Z, but with known covariance matrices.
- Weak IV asymptotics yields good approximations to sampling distributions uniformly in μ² for T moderate or large.
- Under this nesting:
  o IV estimators are not consistent and are nonnormal
  o Test statistics (including the J-test of overidentifying restrictions) do not have normal or chi-squared distributions
  o Conventional confidence intervals do not have correct coverage
- Because μ² is unknown, these distributions can't be used directly in practice to obtain a corrected distribution for purposes of inference.
3) Detection of weak instruments

How weak is weak? We need a cutoff value for μ².

β̂_TSLS − β₀ →d (λ + z_v)′z_u / [(λ + z_v)′(λ + z_v)],

where μ² = Π′Z′ZΠ/σ_v² (concentration parameter) →p λ′λ/σ_v², and μ²/k is the (numerator) noncentrality parameter of the first-stage F.

For what values of μ² is (λ + z_v)′z_u/[(λ + z_v)′(λ + z_v)] well approximated by the strong-instrument limit λ′z_u/λ′λ?

Various procedures:
- First-stage F > 10 rule of thumb (Staiger-Stock (1997))
- Stock-Yogo (2005a) relative bias method (approximately yields F > 10)
- Stock-Yogo (2005a) size method
- Hahn-Hausman (2003) test
- Other methods (R², partial R², Shea (1997), etc.)?
TSLS relative bias cutoff method (Stock-Yogo (2005a))

Some background: the relative squared normalized bias of TSLS to OLS is

B_n² = [(Eβ̂_IV − β)′Σ_YY(Eβ̂_IV − β)] / [(Eβ̂_OLS − β)′Σ_YY(Eβ̂_OLS − β)]

The square root of the maximal relative squared asymptotic bias is

B_max = max_{ρ: 0 < ρ′ρ ≤ 1} lim_{n→∞} B_n, where ρ = corr(u_t, v_t)

This maximization problem is a ratio of quadratic forms, so it turns into a (generalized) eigenvalue problem; algebra reveals that the solution to this eigenvalue problem depends only on μ²/k and k. This yields the cutoff μ²_bias.
Critical values

One included endogenous regressor: the 5% critical value of the test is the 95th percentile of the (noncentral χ²_k)/k distribution, with noncentrality parameter μ²_bias/k.

Multiple included endogenous regressors: the Cragg-Donald (1993) statistic is

g_min = mineval(G_T), where G_T = Σ̂_VV^(-1/2)′ Y′P_Z Y Σ̂_VV^(-1/2) / k

G_T is essentially a matrix first-stage F statistic. Critical values are given in Stock-Yogo (2005a).

Software: STATA (ivreg2)
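A sketch of the Cragg-Donald statistic g_min for m = 2 endogenous regressors (illustrative, not from the slides; the first-stage matrix Π is invented, and comparison to Stock-Yogo tabulated cutoffs is omitted):

```python
import numpy as np

rng = np.random.default_rng(4)
T, k, m = 2000, 5, 2
Z = rng.normal(size=(T, k))
Pi = np.array([[0.2, 0.0],
               [0.0, 0.2],
               [0.1, 0.1],
               [0.0, 0.0],
               [0.0, 0.0]])                       # illustrative first stage
V = rng.normal(size=(T, m))                       # reduced-form errors
Y = Z @ Pi + V

PzY = Z @ np.linalg.lstsq(Z, Y, rcond=None)[0]    # P_Z Y
Svv = (Y - PzY).T @ (Y - PzY) / (T - k)           # Sigma_hat_VV

# symmetric inverse square root of Sigma_hat_VV
w, U = np.linalg.eigh(Svv)
Svv_mh = U @ np.diag(w ** -0.5) @ U.T

G = Svv_mh @ (Y.T @ PzY) @ Svv_mh / k             # matrix first-stage F
g_min = float(np.linalg.eigvalsh(G).min())        # Cragg-Donald statistic
```

Because G_T is a matrix F statistic, its smallest eigenvalue plays the role of the scalar first-stage F: it measures instrument strength in the weakest identified direction.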
5% critical value of F to ensure the indicated maximal bias (Stock-Yogo, 2005a): to ensure at most 10% maximal bias, one needs F ≥ 11.5; F > 10 is a rule of thumb.
Other methods for detecting weak instruments

R², partial R², or adjusted R²: none of these is a good idea. More precisely, what needs to be large is the concentration parameter, not the R². An R² = .10 is small if T = 50 but large if T = 5000. The first-stage R² is especially uninformative if the first-stage regression has included exogenous regressors (W's), because it is the marginal explanatory content of the Z's, given the W's, that matters.

Hahn-Hausman (2003) test: the idea is to test the null of strong instruments, under which the TSLS estimator and the inverse of the TSLS estimator from the reverse regression should be the same. Unfortunately the HH test is not consistent against weak instruments (the power of a 5%-level test depends on parameters and is typically 15-20%; Hausman, Stock, Yogo (2005)).
4) Some Solutions to Weak Instruments

There are two approaches to improving inference (providing tools):

Fully robust methods: inference that is valid for any value of the concentration parameter, including zero, at least if the sample size is large, under weak instrument asymptotics:
o For tests: asymptotically correct size (and good power!)
o For confidence intervals: asymptotically correct coverage rates
o For estimators: asymptotically unbiased (or median-unbiased)

Partially robust methods: methods that are less sensitive to weak instruments than TSLS, e.g. bias that is small for a large range of μ².
Fully Robust Testing

Approach #1: use a worst-case (over all possible values of μ²) critical value for the TSLS t-statistic
o leads to low-power procedures

Approach #2: use a statistic whose distribution does not depend on μ² (two such statistics are known)

Approach #3: use statistics whose distribution depends on μ², but compute the critical values as a function of another statistic that is sufficient for μ² under the null hypothesis

Approach #4: optimal nonsimilar tests (subsumes 1-3)
Approach #2: tests that are valid unconditionally, linear IV case (that is, the distribution of the test statistic does not depend on μ²)

The Anderson-Rubin (1949) test

Consider H₀: β = β₀ in y = Yβ + u, Y = ZΠ + v.

The Anderson-Rubin (1949) statistic is the F-statistic in the regression of y − Yβ₀ on Z:

AR(β₀) = [(y − Yβ₀)′P_Z(y − Yβ₀)/k] / [(y − Yβ₀)′M_Z(y − Yβ₀)/(T − k)]
Comments

AR(β₀) = [(y − Yβ₀)′P_Z(y − Yβ₀)/k] / [(y − Yβ₀)′M_Z(y − Yβ₀)/(T − k)]

- AR(β̂_TSLS) = Hansen's (1982) J-statistic
- The null distribution doesn't depend on μ²: under the null, y − Yβ₀ = u, so
  AR = [u′P_Z u/k]/[u′M_Z u/(T − k)] ~ F_{k,T−k} if u_t is normal;
  AR →d χ²_k/k if u_t is i.i.d. and Z_tu_t has 2 moments (CLT)
- The distribution of AR under the alternative depends on μ²: more information, more power (of course)
- Difficult to interpret: rejection can arise for two reasons, β₀ is false or Z is endogenous
- Power loss relative to other tests; inefficient under strong instruments
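The AR statistic and the invariance of its size to μ² can be checked directly (an illustrative sketch, not from the slides; the DGP is invented, and 2.39 approximates the F_{4,496} 5% critical value):

```python
import numpy as np

def AR_stat(beta0, y, Y, Z):
    """Anderson-Rubin statistic: F-stat of the regression of (y - Y*beta0) on Z."""
    T, k = Z.shape
    e = y - Y * beta0
    Pe = Z @ np.linalg.lstsq(Z, e, rcond=None)[0]    # P_Z (y - Y beta0)
    return (Pe @ Pe / k) / ((e - Pe) @ (e - Pe) / (T - k))

rng = np.random.default_rng(5)
T, k, beta, reps = 500, 4, 1.0, 2000
cv = 2.39   # ~ 5% critical value of F_{4, 496}

rej = 0
for _ in range(reps):
    Z = rng.normal(size=(T, k))
    u = rng.normal(size=T)
    v = 0.9 * u + 0.5 * rng.normal(size=T)
    Y = Z @ np.full(k, 0.05) + v        # weak first stage, by design
    y = beta * Y + u
    rej += AR_stat(beta, y, Y, Z) > cv
rate = rej / reps                        # ~ 0.05 despite the weak instruments
```

Even though the instruments here are deliberately weak, the rejection rate at the true β stays at the nominal 5% level, which is the point of the AR test.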
Kleibergen's (2002) LM test

Kleibergen developed an LM test that has a χ²₁ null distribution that doesn't depend on μ².
- Fairly easy to implement
- Efficient if instruments are strong
- Has very strange power properties (as we shall see)
- Its power is dominated by the conditional likelihood ratio test
Approach #3: conditional tests

Conditional tests have rejection rate 5% for all points under the null (β₀, μ²) ("similar tests").

Moreira (2003): the LR test of β = β₀ using the LIML likelihood:

LR = max_β log-likelihood(β) − log-likelihood(β₀)

- Q_T is sufficient for μ² under the null
- Thus the distribution of LR | Q_T does not depend on μ² under the null
- Thus valid inference can be conducted using the quantiles of LR | Q_T, that is, using critical values that are a function of Q_T
Moreira's (2003) conditional likelihood ratio (CLR) test

LR = max_β log-likelihood(β) − log-likelihood(β₀)

After some algebra, this becomes

LR = ½{Q̂_S − Q̂_T + [(Q̂_S − Q̂_T)² + 4Q̂²_ST]^(1/2)}

where

[Q̂_S, Q̂_ST; Q̂_ST, Q̂_T] = Ĵ₀′ Ω̂^(-1/2)′ Y⁺′P_Z Y⁺ Ω̂^(-1/2) Ĵ₀,
Ω̂ = Y⁺′M_Z Y⁺/(T − k), Y⁺ = (y  Y),
Ĵ₀ = [Ω̂^(1/2)b₀(b₀′Ω̂b₀)^(-1/2)   Ω̂^(-1/2)a₀(a₀′Ω̂⁻¹a₀)^(-1/2)],
b₀ = (1, −β₀)′, a₀ = (β₀, 1)′.
CLR test: comments

- More powerful than AR or LM
- In fact, effectively uniformly most powerful among asymptotically efficient similar tests that are invariant to rotations of the instruments (Andrews, Moreira, Stock (2006))
- STATA (condivreg) and Gauss code for computing LR and conditional p-values exist, but...
- Only developed (so far) for a single included endogenous regressor
- As written here, the software requires homoskedastic errors; extensions to heteroskedasticity and serial correlation have been developed but are not in common statistical software
Approach #4: nonsimilar tests (Andrews, Moreira, Stock (2008))

Polar coordinate transform (Hillier (1990), Chamberlain (2005)):

Mapping: r² = λ′λ · h_β′h_β, h_β = (c_β, d_β)′, with c_β = (β − β₀)/(b₀′Ωb₀)^(1/2) and d_β = a′Ωa₀/(a₀′Ωa₀)^(1/2)

x(θ) = (sin θ, cos θ)′ = h_β/(h_β′h_β)^(1/2), so x(θ₀) = (0, 1)′; θ > 0 (< 0) corresponds to β > β₀ (< β₀)

θ̄ = lim_{β→∞} cos⁻¹[d_β/(h_β′h_β)^(1/2)]; −θ̄ is identified with θ̄ − π
Compound null hypothesis and two-sided alternative:

H₀: 0 ≤ r² < ∞, θ = 0  vs.  H₁: r = r₁, θ = ±θ₁   (*)

Strategy (follows Lehmann (1986)):

1. Null: transform the compound null into a point null via a weight function Λ:
   h_Λ(q) = ∫ f_Q(q; r, θ = 0) dΛ(r)

2. Alternative: transform into a point alternative via equal weighting of (r₁, ±θ₁) (this is a necessary but not sufficient condition for nonsimilar tests to be AE):
   g(q) = ½ f_Q(q; r₁, θ₁) + ½ f_Q(q; r₁, −θ₁)
3. Point optimal invariant test of h_Λ vs. g: from the Neyman-Pearson Lemma, reject if

NP_{Λ,r₁,θ₁}(q) = g(q)/h_Λ(q) = [½ f_Q(q; r₁, θ₁) + ½ f_Q(q; r₁, −θ₁)] / h_Λ(q) > κ_{Λ,r₁,θ₁;α}

4. Least favorable distribution Λ: NP_{Λ,r₁,θ₁}(q) is POINS for the original problem if Λ is least favorable, that is, if

sup_r Pr[NP_{Λ,r₁,θ₁}(q) > κ_{Λ,r₁,θ₁;α} | r, θ = 0] = α

5. POINS power envelope: the power envelope of POINS tests of (*) is the envelope of the power functions of NP_{Λ_LF,r₁,θ₁}(q), where Λ_LF is the least favorable distribution.
A closed-form POINS test of θ = 0 (using theoretical results on one-point least favorable distributions plus Bessel-function approximations): the statistic P*_{r₁,θ₁} combines a cosh(r₁ sin θ₁ · ) term with a correction factor D*_{r₁,θ₁} built from Bessel-function approximations with ν = (k − 2)/2.

A numerical search over (r₁, θ₁) resulted in r₁² = 20k and θ₁ = π/4.
Andrews, Moreira and Stock (2008), Figures 2/3: upper and lower bounds on the power envelope for nonsimilar invariant tests against (r², θ), and the power envelope for similar invariant tests against (r², θ); 0 ≤ θ ≤ π/2, r²/k = 0.5, 1, 2, 4, 8, 16, 32, 64; k = 5
Figure 4: Power envelope for similar invariant tests against (r², θ) and power functions of the CLR, LM, and AR tests; 0 ≤ θ ≤ π/2, r²/k = 1, 4, 8, 32; k = 5
Figure 5: Power functions of the CLR, P*_B, and P* tests (in which r₁² = 20k and θ₁ = π/4), for 0 ≤ θ ≤ π/2, r²/k = 1, 4, 8, 32, and k = 5
Confidence Intervals (linear IV case)

Dufour (1997): impossibility result for Wald intervals. Valid intervals come from inverting valid tests.

(1) Inversion of the AR test: AR confidence intervals

95% CI = {β₀ : AR(β₀) < F_{k,T−k;.05}}

For m = 1, this entails solving a quadratic equation:

AR(β₀) = [(y − Yβ₀)′P_Z(y − Yβ₀)/k] / [(y − Yβ₀)′M_Z(y − Yβ₀)/(T − k)] < F_{k,T−k;.05}

For m > 1, the solution can be found by grid search or using the methods in Dufour and Taamouti (2005). Sets for a single coefficient can be computed by projecting the larger set onto the space of the single coefficient (see Dufour and Taamouti (2005)); also see recent work by Kleibergen (2008).

Intervals can be empty, unbounded, disjoint, or convex.
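Grid inversion of the AR test, the method that generalizes beyond m = 1 (for m = 1 the quadratic could instead be solved exactly). An illustrative sketch, not from the slides; the DGP is invented and 2.61 approximates F_{3,997;.05}:

```python
import numpy as np

def AR_stat(beta0, y, Y, Z):
    """Anderson-Rubin statistic at beta0 (one endogenous regressor)."""
    T, k = Z.shape
    e = y - Y * beta0
    Pe = Z @ np.linalg.lstsq(Z, e, rcond=None)[0]
    return (Pe @ Pe / k) / ((e - Pe) @ (e - Pe) / (T - k))

rng = np.random.default_rng(6)
T, k, beta = 1000, 3, 2.0
Z = rng.normal(size=(T, k))
u = rng.normal(size=T)
v = 0.7 * u + rng.normal(size=T)
Y = Z @ np.full(k, 0.5) + v             # strong instruments: bounded interval
y = beta * Y + u

cv = 2.61                                # ~ 5% critical value of F_{3, 997}
grid = np.linspace(0.0, 4.0, 801)
ci = [float(b) for b in grid if AR_stat(b, y, Y, Z) < cv]   # invert the test
```

With strong instruments the collected grid points form a short interval around the true β; under weak instruments the same code can return an empty, disjoint, or unbounded set, which is why inversion rather than a Wald interval is needed.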
(2) Inversion of the CLR test: CLR confidence intervals

95% CI = {β₀ : LR(β₀) < cv.05(Q_T)}, where cv.05(Q_T) = 5% conditional critical value

Comments:
- Efficient GAUSS and STATA (condivreg) software
- Will contain the LIML estimator (Mikusheva (2005))
- Has certain optimality properties: nearly uniformly most accurate invariant; also minimum expected length in polar coordinates (Mikusheva (2005))
- Only available for m = 1
What about the bootstrap or subsampling?

A straightforward bootstrap algorithm for TSLS:

y_t = βY_t + u_t
Y_t = Π′Z_t + v_t

i) Estimate β, Π by β̂_TSLS, Π̂
ii) Compute the residuals û_t, v̂_t
iii) Draw T errors and exogenous variables from {û_t, v̂_t, Z_t}, and construct bootstrap data y*_t, Y*_t using β̂_TSLS, Π̂
iv) Compute the TSLS estimator (and t-statistic, etc.) using the bootstrap data
v) Repeat, and compute bias adjustments and quantiles from the bootstrap distribution, e.g. bias = (bootstrap mean of β̂*_TSLS) − (β̂_TSLS from the actual data)

Under strong instruments, this algorithm works (provides second-order improvements).
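Steps (i)-(v) can be sketched as follows (illustrative, not from the slides; a row-resampling variant of step (iii) with an invented DGP):

```python
import numpy as np

def tsls(y, Y, Z):
    """TSLS with one endogenous regressor: (Z Pi_hat)'y / (Z Pi_hat)'Y."""
    Pih = np.linalg.lstsq(Z, Y, rcond=None)[0]
    Yhat = Z @ Pih
    return (Yhat @ y) / (Yhat @ Y), Pih

rng = np.random.default_rng(7)
T, k, beta = 500, 3, 1.0
Z = rng.normal(size=(T, k))
u = rng.normal(size=T)
v = 0.6 * u + rng.normal(size=T)
Y = Z @ np.full(k, 0.4) + v
y = beta * Y + u

# steps (i)-(ii): estimate and form residuals
b_hat, Pi_hat = tsls(y, Y, Z)
u_hat = y - Y * b_hat
v_hat = Y - Z @ Pi_hat

# steps (iii)-(v): resample rows of (u_hat, v_hat, Z), rebuild data, re-estimate
B = 500
boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, T, size=T)
    Zb = Z[idx]
    Yb = Zb @ Pi_hat + v_hat[idx]
    yb = Yb * b_hat + u_hat[idx]
    boot[b], _ = tsls(yb, Yb, Zb)

bias_est = boot.mean() - b_hat          # bootstrap bias estimate for TSLS
```

With the reasonably strong first stage used here the bias estimate is small; the next slide explains why the same algorithm breaks down when the instruments are weak.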
Bootstrap, ctd.

Under weak instruments, this algorithm (or variants) does not even provide first-order valid inference.

The reason the bootstrap fails here is that Π̂ is used to compute the bootstrap distribution. The true pdf depends on μ², say f_TSLS(β̂_TSLS; μ²) (e.g. the Rothenberg (1984) exposition above, or weak instrument asymptotics). By using Π̂, μ² is estimated, say by μ̂². The bootstrap correctly estimates f_TSLS(β̂_TSLS; μ̂²), but f_TSLS(β̂_TSLS; μ̂²) ≠ f_TSLS(β̂_TSLS; μ²) because μ̂² is not consistent for μ².

This story might sound familiar: it is the same reason the bootstrap fails in the unit root model and in the local-to-unity model. Subsampling for these (non-pivotal) statistics doesn't work either; see Andrews and Guggenberger (2007a,b).
Some remarks about estimation

- It isn't possible to have an estimator that is completely robust to weak instruments
- TSLS (2-step GMM) is about the worst thing you can do from a bias perspective
- LIML (in GMM, the CUE) has much better median-bias properties but can have large (infinite) variance
- In the linear case, there are alternative estimators, e.g. Fuller's estimator; see Hahn, Hausman, and Kuersteiner (JAE, 2006) for an extensive MC comparison
Example #1: Consumption CAPM and the EIS (Yogo, REStat, 2004)

Δc_{t+1} = consumption growth, t to t+1
r_{i,t+1} = return on the i-th asset, t to t+1

Moment conditions: E_t(Δc_{t+1} − τ_i − ψ r_{i,t+1}) = 0

EIS estimating equations:

Δc_{t+1} = τ_i + ψ r_{i,t+1} + u_{i,t+1}  ("forwards")
or
r_{i,t+1} = μ_i + (1/ψ)Δc_{t+1} + η_{i,t+1}  ("backwards")

Under homoskedasticity, standard estimation is by TSLS or by the inverse of the TSLS estimator (remember the Hahn-Hausman (2003) test?); but with weak instruments, the normalization matters.
First-stage F-statistics for the EIS (Yogo (2004)):
Various estimates of the EIS, forward and backward
AR, LM, and CLR confidence intervals for ψ:
What about stock returns? Should they work?
Extensions to >1 included endogenous regressor

- The CLR exists in theory, but there are difficult computational issues because the conditioning statistic has dimension m(m+1)/2 (AMS (2006), Kleibergen (2007))
- Can test the joint hypothesis H₀: β = β₀ using the AR statistic:

  AR(β₀) = [(y − Yβ₀)′P_Z(y − Yβ₀)/k] / [(y − Yβ₀)′M_Z(y − Yβ₀)/(T − k)]

  under H₀, AR →d χ²_k/k
- Subsets by projection (Dufour-Taamouti (2005)) or by concentration + bounds (Kleibergen and Mavroeidis (2008, 2009); 2009 is the GMM treatment)
Extensions to GMM

(1) The GMM-Anderson-Rubin statistic (Kocherlakota (1990), Burnside (1994), Stock and Wright (2000))

The extension of the AR statistic to GMM is the CUE objective function evaluated at θ₀:

S_T^CUE(θ₀) = [T^(-1/2) Σ_{t=1}^T φ_t(θ₀)]′ Ω̂(θ₀)⁻¹ [T^(-1/2) Σ_{t=1}^T φ_t(θ₀)] →d Ψ(θ₀)′Ω(θ₀)⁻¹Ψ(θ₀) ~ χ²_k

Thus a valid test of H₀: θ = θ₀ can be undertaken by rejecting if S_T^CUE(θ₀) exceeds the 5% critical value of χ²_k.
GMM-Anderson-Rubin, ctd.

In the homoskedastic/uncorrelated linear IV model, the GMM-AR statistic simplifies to the AR statistic (up to a degrees-of-freedom correction):

S_T^CUE(θ₀) = [T^(-1/2) Σ φ_t(θ₀)]′ Ω̂(θ₀)⁻¹ [T^(-1/2) Σ φ_t(θ₀)]
 = [T^(-1/2) Σ (y_t − θ₀Y_t)Z_t]′ [ŝ² Z′Z/T]⁻¹ [T^(-1/2) Σ (y_t − θ₀Y_t)Z_t],  ŝ² = (y − Yθ₀)′M_Z(y − Yθ₀)/(T − k)
 = (y − Yθ₀)′P_Z(y − Yθ₀) / [(y − Yθ₀)′M_Z(y − Yθ₀)/(T − k)] = k·AR(θ₀)

- The GMM-AR statistic has the same interpretation issues as the AR: it rejects because of endogenous instruments and/or an incorrect θ
- The GMM-AR can fail to reject any value of θ (remember the Dufour (1997) critique of Wald tests)
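A sketch of the GMM-AR test with a heteroskedasticity-robust Ω̂(θ₀) (illustrative, not from the slides; the DGP is invented, and 11.07 is the χ²₅ 5% critical value):

```python
import numpy as np

def gmm_ar(theta0, y, Y, Z):
    """GMM-AR statistic: CUE objective at theta0 with a robust Omega_hat(theta0).
    Under H0 it is asymptotically chi-squared with k degrees of freedom."""
    T, k = Z.shape
    phi = Z * (y - theta0 * Y)[:, None]           # phi_t(theta0), T x k
    gbar = phi.sum(axis=0) / np.sqrt(T)           # T^{-1/2} sum of moments
    phic = phi - phi.mean(axis=0)
    Omega = phic.T @ phic / T                     # robust variance estimate
    return gbar @ np.linalg.solve(Omega, gbar)

rng = np.random.default_rng(8)
T, k, theta = 1000, 5, 1.0
reps, cv = 1000, 11.07                            # chi2_5 5% critical value

rej = 0
for _ in range(reps):
    Z = rng.normal(size=(T, k))
    u = rng.normal(size=T) * (1 + 0.5 * np.abs(Z[:, 0]))   # heteroskedastic u
    v = 0.8 * u + rng.normal(size=T)
    Y = Z @ np.full(k, 0.1) + v                   # weak-ish first stage
    y = theta * Y + u
    rej += gmm_ar(theta, y, Y, Z) > cv
rate = rej / reps                                  # ~ 0.05 under H0
```

Because the statistic only uses the moment conditions at θ₀, its null rejection rate stays near the nominal level even though the first stage is weak and the errors are heteroskedastic.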
(2) GMM-LM: Kleibergen (2005) develops a score statistic (based on the CUE objective function; details of the construction matter) that provides weak-identification-valid hypothesis testing for sets of variables.

(3) GMM-CLR: Andrews, Moreira, Stock (2006) extend the CLR to linear GMM with a single included endogenous regressor; also see Kleibergen (2007). Very limited evidence on performance exists; there is also the problem of the dimension of the conditioning vector.

(4) Other methods: Guggenberger-Smith (2005) objective-function-based tests built on the Generalized Empirical Likelihood (GEL) objective function (Newey and Smith (2004)); Guggenberger-Smith (2008) generalize these to time series data. Performance is similar to the CUE (asymptotically equivalent under weak instruments).
Confidence sets

- Fully robust 95% confidence sets are obtained by inverting (as the acceptance region of) fully robust 5% hypothesis tests
- Computation is by grid search in general: collect all the points θ which, when treated as the null, are not rejected by the GMM-AR statistic
- Subsets by projection or by concentration + bounds (see Kleibergen and Mavroeidis (2008, 2009) for an application of GMM-AR confidence sets and subsets)
- Valid confidence sets must be unbounded (contain Θ) with positive probability when instruments are weak
Many instruments: a solution to weak instruments?

The appeal of using many instruments: under standard IV asymptotics, more instruments mean greater efficiency.

This story is not very credible because (a) the instruments you are adding might well be weak (you have already used the first two lags, say), and (b) even if they are strong, this requires consistent estimation of increasingly many parameters to obtain the efficient projection, hence the slow rates of growth of the number of instruments in the efficient GMM literature.
Example of problems with many weak instruments: TSLS

Recall the TSLS weak instrument asymptotic limit:

β̂_TSLS − β₀ →d (λ + z_v)′z_u / [(λ + z_v)′(λ + z_v)]

with the decomposition z_u = δz_v + η. Suppose that k is large and that λ′λ/k → Λ (one way to implement "many weak instrument asymptotics"). Then as k → ∞ (normalizing σ_v² = 1):

λ′z_v/k →p 0 and λ′z_u/k →p 0
z_v′z_v/k →p 1 and z_v′η/k →p 0 (z_v and η are independent by construction)
hence z_v′z_u/k →p δ

Putting these limits together, we have, as k → ∞,

(λ + z_v)′z_u / [(λ + z_v)′(λ + z_v)] →p δ/(1 + Λ)

In the limit that Λ = 0, TSLS is consistent for the plim of OLS!
Comments

- Strictly speaking, this calculation isn't right: it uses sequential asymptotics (T → ∞, then k → ∞). However, the sequential asymptotics is justified under certain (restrictive) conditions on k/T (specifically, k⁴/T → 0)
- Typical conditions on k are k³/T → 0 (e.g. Newey and Windmeijer (2004))
- Many instruments can be turned into a blessing (if they are not too weak! they can't push the scaled concentration parameter to zero) by exploiting the additional convergence across instruments. This can lead to bias corrections and corrected standard errors. There is no single best method at this point, but there is promising research, e.g. Newey and Windmeijer (2004), Chao and Swanson (2005), and Hansen, Hausman, and Newey (2006)
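The δ/(1 + Λ) limit can be illustrated numerically (an illustrative sketch, not from the slides; the Λ values and DGP are invented, with σ_v² normalized to 1 so δ = σ_uv):

```python
import numpy as np

rng = np.random.default_rng(9)
T, beta0, delta = 2000, 0.0, 0.8       # delta = sigma_uv / sigma_v^2

def tsls_many(k, Lam):
    """TSLS with k instruments and first-stage strength lambda'lambda/k = Lam."""
    Z = rng.normal(size=(T, k))
    u = rng.normal(size=T)
    v = delta * u + np.sqrt(1 - delta**2) * rng.normal(size=T)   # var(v) = 1
    Pi = np.full(k, np.sqrt(Lam / T))           # gives T * Pi'Pi / k = Lam
    Y = Z @ Pi + v
    y = beta0 * Y + u
    Yhat = Z @ np.linalg.lstsq(Z, Y, rcond=None)[0]
    return (Yhat @ y) / (Yhat @ Y)

reps = 200
# Lam = 0, k large: TSLS piles up at delta/(1 + 0) = 0.8 = plim(OLS) - beta0
b_irrel = np.mean([tsls_many(k=100, Lam=0.0) for _ in range(reps)])
# Lam = 4: limit delta/(1 + 4) = 0.16, still biased away from beta0 = 0
b_weak = np.mean([tsls_many(k=100, Lam=4.0) for _ in range(reps)])
```

With 100 irrelevant instruments the TSLS estimates concentrate on the OLS probability limit rather than on the true β₀ = 0, and even with Λ = 4 a bias of δ/(1 + Λ) remains.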
5) Current Research Issues & Literature

- Detection of weak instruments in the general nonlinear GMM model
- Efficient testing in GMM with weak instruments: Andrews, Moreira, Stock (2006), Kleibergen (2008)
- Subset testing: Kleibergen and Mavroeidis (2008, 2009)
- Improved estimation in GMM: Guggenberger-Smith (2005) (GEL)
- Many instruments
- Breaks in GMM with weak instruments: Caner (2008)
- Connection between weak ID, HAC estimation, and SVAR identification using long-run restrictions: Gospodinov (2008)
- Weak set identification?