Single-Equation GMM (II)
Guochang Zhao, RIEM, SWUFE
Week 12, Fall 2016, December 1, 2016

Overidentification test
If the equation is exactly identified, it is possible to choose $\tilde\delta$ so that all the elements of the sample moments $g_n(\tilde\delta)$ are zero and the distance (objective function) is zero. If the equation is over-identified, the distance cannot be set exactly to zero, but we would expect the minimized distance to be close to zero. It turns out that, if the weighting matrix $\hat W$ is chosen optimally so that $\operatorname{plim} \hat W = S^{-1}$, then the minimized distance is asymptotically chi-squared.

Overidentification test: Hansen's J test
Hansen's test of overidentifying restrictions (Hansen, 1982): Suppose there is available a consistent estimator, $\hat S$, of $S\ (= E(g_i g_i'))$. Under Assumptions 1-5,
$$J(\hat\delta(\hat S^{-1}), \hat S^{-1}) \equiv n\, g_n(\hat\delta(\hat S^{-1}))'\, \hat S^{-1}\, g_n(\hat\delta(\hat S^{-1})) \xrightarrow{d} \chi^2(K - L),$$
where $K$ is the number of instruments and $L$ is the number of coefficients.
This is a specification test.
The test is not consistent against some failures of the orthogonality conditions.
In small samples the actual size of the J test exceeds the nominal size (i.e., the test rejects too often).
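The J statistic can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's own code: the function names `linear_gmm` and `hansen_j`, the simulated data, and the choice of $S_{xx}^{-1}$ as the first-step weighting matrix are all assumptions made for the example.

```python
import numpy as np

def linear_gmm(y, Z, X, W):
    """Linear GMM estimate (Sxz' W Sxz)^{-1} Sxz' W sxy for a weighting matrix W."""
    n = len(y)
    Sxz, sxy = X.T @ Z / n, X.T @ y / n
    return np.linalg.solve(Sxz.T @ W @ Sxz, Sxz.T @ W @ sxy)

def hansen_j(y, Z, X):
    """Two-step efficient GMM: returns (delta_hat, J, degrees of freedom K - L)."""
    n, K = X.shape
    L = Z.shape[1]
    # Step 1: a consistent first-step estimate (assumed choice: W = Sxx^{-1}).
    d1 = linear_gmm(y, Z, X, np.linalg.inv(X.T @ X / n))
    e1 = y - Z @ d1
    S_hat = (X * e1[:, None] ** 2).T @ X / n      # estimate of E(eps^2 x x')
    S_inv = np.linalg.inv(S_hat)
    # Step 2: efficient GMM with W = S_hat^{-1}.
    d2 = linear_gmm(y, Z, X, S_inv)
    g = X.T @ (y - Z @ d2) / n                    # sample moments at the estimate
    return d2, n * g @ S_inv @ g, K - L

# Simulated data (purely illustrative): one endogenous regressor, three instruments.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 3))
u, v = rng.normal(size=n), rng.normal(size=n)
z = x @ np.array([1.0, 0.5, 0.5]) + 0.8 * u + v   # z is correlated with the error u
y = 1.0 + 2.0 * z + u
Z = np.column_stack([np.ones(n), z])
X = np.column_stack([np.ones(n), x])

delta_hat, J, df = hansen_j(y, Z, X)              # over-identified: K - L = 2
_, J_exact, df_exact = hansen_j(y, Z, X[:, :2])   # exactly identified: K = L
print(delta_hat, J, df, J_exact, df_exact)
```

In the exactly identified call the sample moments can be set to zero, so `J_exact` is zero up to rounding, which matches the statement on the previous slide.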

Testing subsets of orthogonality conditions
Divide the $K$ instruments into two groups:
$$x_i = \begin{bmatrix} x_{i1} \\ x_{i2} \end{bmatrix}, \qquad x_{i1}: K_1 \text{ rows}, \quad x_{i2}: K - K_1 \text{ rows},$$
where the instruments in $x_{i2}$ are suspect. The part of the model we wish to test is
$$E(x_{i2}\,\varepsilon_i) = 0.$$
This restriction is testable if there are at least as many nonsuspect instruments as there are coefficients, so that $K_1 \ge L$. The basic idea is to compare two J statistics from two separate GMM estimators of the same coefficient vector $\delta$, one using only the instruments included in $x_{i1}$, and the other using also the suspect instruments $x_{i2}$. If the restriction is true, then the inclusion of the suspect instruments cannot significantly increase the J statistic. If the inclusion of the suspect instruments significantly increases the J statistic, that is a good reason for doubting the predeterminedness of $x_{i2}$.

In accordance with the partition of $x_i$, the sample orthogonality conditions $g_n(\tilde\delta)$ and $S$ can be written as
$$g_n(\tilde\delta) = \begin{bmatrix} g_{1n}(\tilde\delta) \\ g_{2n}(\tilde\delta) \end{bmatrix}, \qquad S = \begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix},$$
where $g_{jn}(\tilde\delta) = \frac{1}{n}\sum_{i} x_{ij}(y_i - z_i'\tilde\delta)$ and $S_{jk} = E(\varepsilon_i^2\, x_{ij} x_{ik}')$ for $j, k = 1, 2$, so that $S_{11}$ is $K_1 \times K_1$.
In particular, $g_{1n}(\tilde\delta)$ can be written as
$$g_{1n}(\tilde\delta) = s_{x_1 y} - S_{x_1 z}\tilde\delta, \qquad s_{x_1 y} \equiv \frac{1}{n}\sum_{i} x_{i1} y_i, \quad S_{x_1 z} \equiv \frac{1}{n}\sum_{i} x_{i1} z_i'.$$

For a consistent estimate $\hat S$ of $S$, the efficient GMM estimator using all the $K$ instruments and its associated J statistic are
$$\hat\delta = (S_{xz}'\hat S^{-1}S_{xz})^{-1}S_{xz}'\hat S^{-1}s_{xy}, \qquad J = n\, g_n(\hat\delta)'\,\hat S^{-1}\, g_n(\hat\delta),$$
where $S_{xz} \equiv \frac{1}{n}\sum_i x_i z_i'$ and $s_{xy} \equiv \frac{1}{n}\sum_i x_i y_i$.
For a consistent estimate $\hat S_{11}$ of $S_{11}$, the efficient GMM estimator of the same coefficient vector $\delta$ using only the first $K_1$ instruments and its associated J statistic are obtained by replacing $x_i$ by $x_{i1}$ in these expressions:
$$\bar\delta = (S_{x_1z}'\hat S_{11}^{-1}S_{x_1z})^{-1}S_{x_1z}'\hat S_{11}^{-1}s_{x_1y}, \qquad J_1 = n\, g_{1n}(\bar\delta)'\,\hat S_{11}^{-1}\, g_{1n}(\bar\delta).$$
Proposition (testing a subset of orthogonality conditions): Suppose Assumptions 1-5 hold. Let $x_{i1}$ be a subvector of $x_i$, and strengthen Assumption 4 by requiring that the rank condition for identification is satisfied for $x_{i1}$ (so $E(x_{i1} z_i')$ is of full column rank). Then, for any consistent estimators $\hat S$ of $S$ and $\hat S_{11}$ of $S_{11}$,
$$C \equiv J - J_1 \xrightarrow{d} \chi^2(K - K_1),$$
where $K = \#x_i$ and $K_1 = \#x_{i1}$. (The proof is left as an optional exercise.)

Testing a subset of orthogonality conditions: Remarks
The choice of $\hat S$ and $\hat S_{11}$ does not matter asymptotically as long as they are consistent.
In finite samples, the test statistic $C$ can be negative.
A negative $C$ can be avoided if the same $\hat S$ is used throughout:
(1) Do the efficient two-step GMM with the full instrument set $x_i$ to obtain $\hat S$ from the first step, and $\hat\delta$ and $J$ from the second step.
(2) Extract the submatrix $\hat S_{11}$ from the $\hat S$ obtained in (1), and use it to compute the efficient GMM estimator $\bar\delta$ based on $x_{i1}$ and its statistic $J_1$.
Then take the difference $C = J - J_1$.
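Steps (1) and (2) can be sketched as follows. This is an illustrative sketch under stated assumptions: the helper names `efficient_gmm` and `c_statistic`, the simulated data, and the convention that the nonsuspect instruments occupy the first `k1` columns of `X` are choices made for the example, not part of the slides.

```python
import numpy as np

def efficient_gmm(y, Z, X, S):
    """GMM estimate and distance n*g'S^{-1}g using W = S^{-1}."""
    n = len(y)
    Sxz, sxy = X.T @ Z / n, X.T @ y / n
    S_inv = np.linalg.inv(S)
    delta = np.linalg.solve(Sxz.T @ S_inv @ Sxz, Sxz.T @ S_inv @ sxy)
    g = sxy - Sxz @ delta
    return delta, n * g @ S_inv @ g

def c_statistic(y, Z, X, k1):
    """C = J - J1, treating the first k1 columns of X as nonsuspect instruments."""
    n = len(y)
    # (1) First step with W = Sxx^{-1}, then S_hat from the first-step residuals.
    d1, _ = efficient_gmm(y, Z, X, X.T @ X / n)
    e1 = y - Z @ d1
    S_hat = (X * e1[:, None] ** 2).T @ X / n
    _, J = efficient_gmm(y, Z, X, S_hat)
    # (2) Reuse the leading K1 x K1 block of the same S_hat for the subset estimator.
    _, J1 = efficient_gmm(y, Z, X[:, :k1], S_hat[:k1, :k1])
    return J, J1, J - J1

# Simulated data (illustrative only): constant plus three instruments, one endogenous regressor.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 3))
u, v = rng.normal(size=n), rng.normal(size=n)
z = x @ np.array([1.0, 0.5, 0.5]) + 0.8 * u + v
y = 1.0 + 2.0 * z + u
Z = np.column_stack([np.ones(n), z])
X = np.column_stack([np.ones(n), x])

J, J1, C = c_statistic(y, Z, X, k1=3)   # the last instrument is the suspect one
print(J, J1, C)                          # C is compared with chi2(K - K1) = chi2(1)
```

Because both distances are built from the same $\hat S$, the full-instrument distance dominates the subset distance at every $\tilde\delta$, which is why $C$ cannot be negative here.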

LR, Wald and LM tests
Consider maximum likelihood estimation of a parameter $\theta$ and a test of the hypothesis $H_0: c(\theta) = 0$.
Likelihood ratio test: If the restriction $c(\theta) = 0$ is valid, then imposing it should not lead to a large reduction in the log-likelihood function.
Wald test: If the restriction is valid, then $c(\hat\theta_{MLE})$ should be close to zero since the MLE is consistent.
Lagrange multiplier test: If the restriction is valid, then the restricted estimator should be near the point that maximizes the log-likelihood.
Test ideas: The LR test is based on the distance between the restricted and unrestricted maximized log-likelihoods; the Wald test is based on the distance between $c(\hat\theta_{MLE})$ and 0; the LM test is based on the distance between the slope of the log-likelihood function at the restricted estimator and 0.

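[Figure: the log-likelihood $\ln L(\theta)$, its derivative $d\ln L(\theta)/d\theta$, and the constraint function $c(\theta)$ plotted against $\theta$, with the restricted estimate $\hat\theta_R$ and the unrestricted estimate $\hat\theta_{MLE}$ marked. The likelihood ratio distance is $\ln L - \ln L_R$, the Lagrange multiplier distance is the slope $d\ln L(\theta)/d\theta$ at $\hat\theta_R$, and the Wald distance is $c(\hat\theta_{MLE})$.]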

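LR, Wald and LM tests (cont.)
LR test:
$$LR = -2\ln\lambda \xrightarrow{d} \chi^2(\#r), \qquad \lambda = \hat L_R / \hat L_U.$$
Wald test:
$$W = c(\hat\theta)'\left[\operatorname{Avar}(c(\hat\theta))\right]^{-1} c(\hat\theta) \xrightarrow{d} \chi^2(\#r).$$
LM test:
$$LM = \left(\frac{\partial \ln L(\hat\theta_R)}{\partial \hat\theta_R}\right)' \left[I(\hat\theta_R)\right]^{-1} \left(\frac{\partial \ln L(\hat\theta_R)}{\partial \hat\theta_R}\right) \xrightarrow{d} \chi^2(\#r).$$
Here $\hat L_R$ and $\hat L_U$ are the restricted and unrestricted maximized likelihoods, $I(\hat\theta_R)$ is the information matrix (the negative of the expected Hessian of the log-likelihood) evaluated at the restricted estimator, and $\#r$ is the number of restrictions.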
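A small worked example may help fix the three formulas. The Bernoulli model below is an assumption chosen for illustration (it does not appear in the slides): with $k$ successes in $n$ trials and $H_0: p = p_0$, all three statistics have closed forms, and the function name `trinity_bernoulli` is hypothetical.

```python
import math

def trinity_bernoulli(k, n, p0):
    """LR, Wald and LM statistics for H0: p = p0, given k successes in n trials (0 < k < n)."""
    p_hat = k / n                                   # unrestricted MLE

    def loglik(p):
        return k * math.log(p) + (n - k) * math.log(1 - p)

    # LR: twice the drop in the log-likelihood from imposing the restriction.
    lr = 2 * (loglik(p_hat) - loglik(p0))
    # Wald: squared distance of c(p_hat) = p_hat - p0 from zero, scaled by the
    # estimated variance of p_hat evaluated at the unrestricted estimate.
    wald = (p_hat - p0) ** 2 / (p_hat * (1 - p_hat) / n)
    # LM: squared score at the restricted estimate, scaled by the information there.
    score = k / p0 - (n - k) / (1 - p0)
    info = n / (p0 * (1 - p0))
    lm = score ** 2 / info
    return lr, wald, lm

lr, wald, lm = trinity_bernoulli(k=60, n=100, p0=0.5)
print(lr, wald, lm)   # each is compared with a chi2(1) critical value
```

All three statistics have one restriction, so they share the same $\chi^2(1)$ limit under the null, but they generally differ in finite samples because each evaluates the variance at a different point.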

Hypothesis testing by the likelihood-ratio principle
The three principles for constructing test statistics generalize easily to GMM. The restricted efficient GMM estimator is defined as
$$\bar\delta(\hat S^{-1}) \equiv \arg\min_{\tilde\delta}\, J(\tilde\delta, \hat S^{-1}) \quad \text{s.t. } H_0.$$
The LR principle suggests that
$$LR \equiv J(\bar\delta(\hat S^{-1}), \hat S^{-1}) - J(\hat\delta(\hat S^{-1}), \hat S^{-1}) \xrightarrow{d} \chi^2(\#r),$$
where $\#r$ is the number of restrictions.

Implications of conditional homoskedasticity
Assumption 7 (conditional homoskedasticity): $E(\varepsilon_i^2 \mid x_i) = \sigma^2$.
Under conditional homoskedasticity, the matrix of fourth moments $S$ can be written as a product of second moments:
$$S \equiv E[g_i g_i'] = E[\varepsilon_i^2\, x_i x_i'] = \sigma^2 \Sigma_{xx}, \qquad \Sigma_{xx} \equiv E(x_i x_i').$$
Since $S$ is nonsingular by Assumption 5, this decomposition implies that $\sigma^2 > 0$ and $\Sigma_{xx}$ is nonsingular.
The estimator exploiting this structure of $S$ is
$$\hat S = \hat\sigma^2 S_{xx}, \qquad S_{xx} \equiv \frac{1}{n}\sum_i x_i x_i',$$
where $\hat\sigma^2$ is some consistent estimator of $\sigma^2$ to be specified. By ergodic stationarity, $S_{xx} \xrightarrow{p} \Sigma_{xx}$. Thus, provided that $\hat\sigma^2$ is consistent, we do not need the fourth-moment assumption (Assumption 6) for $\hat S$ to be consistent.

Implications of conditional homoskedasticity (cont.)
Efficient GMM becomes 2SLS
For the efficient GMM estimator, the weighting matrix is $\hat S^{-1}$. If we set $\hat S = \hat\sigma^2 S_{xx}$, the GMM estimator becomes
$$\hat\delta\big((\hat\sigma^2 S_{xx})^{-1}\big) = (S_{xz}' S_{xx}^{-1} S_{xz})^{-1} S_{xz}' S_{xx}^{-1} s_{xy} = \hat\delta(S_{xx}^{-1}),$$
which does not depend on $\hat\sigma^2$. Here $S_{xz} \equiv \frac{1}{n}\sum_i x_i z_i'$ and $s_{xy} \equiv \frac{1}{n}\sum_i x_i y_i$.
In the general case, the whole point of the first step in the efficient two-step GMM was to obtain a consistent estimator of $S$. Under conditional homoskedasticity there is no need to do the first step, because the second-step estimator collapses to the GMM estimator with $S_{xx}$ used for $\hat S$, namely $\hat\delta(S_{xx}^{-1})$. This estimator is called Two-Stage Least Squares (2SLS).
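The claim that the estimator does not depend on $\hat\sigma^2$ is easy to check numerically. The sketch below is illustrative only: the function names `linear_gmm` and `tsls`, the simulated data, and the arbitrary values tried for $\hat\sigma^2$ are assumptions for the example.

```python
import numpy as np

def linear_gmm(y, Z, X, W):
    """Linear GMM estimate (Sxz' W Sxz)^{-1} Sxz' W sxy."""
    n = len(y)
    Sxz, sxy = X.T @ Z / n, X.T @ y / n
    return np.linalg.solve(Sxz.T @ W @ Sxz, Sxz.T @ W @ sxy)

def tsls(y, Z, X):
    """2SLS written with the projection of Z on X: (Z'PZ)^{-1} Z'Py."""
    PZ = X @ np.linalg.solve(X.T @ X, X.T @ Z)    # fitted values of Z on X
    return np.linalg.solve(PZ.T @ Z, PZ.T @ y)

# Simulated data (illustrative only).
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 3))
u, v = rng.normal(size=n), rng.normal(size=n)
z = x @ np.array([1.0, 0.5, 0.5]) + 0.8 * u + v
y = 1.0 + 2.0 * z + u
Z = np.column_stack([np.ones(n), z])
X = np.column_stack([np.ones(n), x])

Sxx = X.T @ X / n
# Any positive scalar in place of sigma2_hat gives the same estimate.
estimates = [linear_gmm(y, Z, X, np.linalg.inv(s2 * Sxx)) for s2 in (0.1, 1.0, 25.0)]
delta_2sls = tsls(y, Z, X)
print(estimates, delta_2sls)
```

The scalar cancels between $(S_{xz}' W S_{xz})^{-1}$ and $S_{xz}' W s_{xy}$, so every entry of `estimates` agrees with `delta_2sls` up to rounding.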

Implications of conditional homoskedasticity (cont.)
Efficient GMM becomes 2SLS (cont.)
(The ML counterpart of 2SLS is called the "limited-information maximum likelihood estimator.")
The asymptotic variance of the 2SLS estimator is
$$\operatorname{Avar}(\hat\delta_{2SLS}) = \sigma^2 (\Sigma_{xz}' \Sigma_{xx}^{-1} \Sigma_{xz})^{-1}, \qquad \Sigma_{xz} \equiv E(x_i z_i').$$
A natural estimator of this is
$$\widehat{\operatorname{Avar}}(\hat\delta_{2SLS}) = \hat\sigma^2 (S_{xz}' S_{xx}^{-1} S_{xz})^{-1}.$$
For $\hat\sigma^2$, consider the sample variance of the 2SLS residuals:
$$\hat\sigma^2 \equiv \frac{1}{n}\sum_{i=1}^{n} (y_i - z_i'\hat\delta_{2SLS})^2.$$
(Some authors divide the sum of squares by $n - L$, not by $n$, to calculate $\hat\sigma^2$.) This $\hat\sigma^2$ converges in probability to $\sigma^2$ if $E(z_i z_i')$ exists and is finite. Thus $\hat S = \hat\sigma^2 S_{xx}$ with this $\hat\sigma^2$ is consistent for $S$.
t ratio and Wald statistic
Substituting $\hat S = \hat\sigma^2 S_{xx}$ into the general GMM formulas, the t-ratio for $H_0: \delta_\ell = \bar\delta_\ell$ and the Wald statistic for $H_0: a(\delta) = 0$, with Jacobian $A(\delta) \equiv \partial a(\delta)/\partial\delta'$, become
$$t_\ell = \frac{\hat\delta_{2SLS,\ell} - \bar\delta_\ell}{\sqrt{\hat\sigma^2 \left[(S_{xz}' S_{xx}^{-1} S_{xz})^{-1}\right]_{\ell\ell} / n}},$$
$$W = n\, a(\hat\delta_{2SLS})' \left\{ A(\hat\delta_{2SLS}) (S_{xz}' S_{xx}^{-1} S_{xz})^{-1} A(\hat\delta_{2SLS})' \right\}^{-1} a(\hat\delta_{2SLS}) \,/\, \hat\sigma^2.$$

Implications of conditional homoskedasticity (cont.)
J becomes Sargan's statistic
When $\hat W$ is set to $(\hat\sigma^2 S_{xx})^{-1}$, the distance becomes
$$J\big(\tilde\delta, (\hat\sigma^2 S_{xx})^{-1}\big) = n\, \frac{(s_{xy} - S_{xz}\tilde\delta)'\, S_{xx}^{-1}\, (s_{xy} - S_{xz}\tilde\delta)}{\hat\sigma^2}.$$
The result for Hansen's J test then implies that this distance, evaluated at the efficient GMM estimator under conditional homoskedasticity, $\hat\delta_{2SLS}$, is asymptotically chi-squared. This distance is called Sargan's statistic (Sargan, 1958):
$$\text{Sargan's statistic} = n\, \frac{(s_{xy} - S_{xz}\hat\delta_{2SLS})'\, S_{xx}^{-1}\, (s_{xy} - S_{xz}\hat\delta_{2SLS})}{\hat\sigma^2}.$$
(The 2SLS estimator was first proposed by Theil (1953).)

Implications of conditional homoskedasticity (cont.)
Asymptotic properties of 2SLS:
(a) Under Assumptions 1-4, the 2SLS estimator is consistent. If Assumption 5 is added, the estimator is asymptotically normal, with the asymptotic variance given by the general GMM formula with $W = (\sigma^2 \Sigma_{xx})^{-1}$. If Assumption 7 (conditional homoskedasticity) is added to Assumptions 1-5, then the estimator is the efficient GMM estimator. Furthermore, if $E(z_i z_i')$ exists and is finite, then
(b) the asymptotic variance is consistently estimated by $\hat\sigma^2 (S_{xz}' S_{xx}^{-1} S_{xz})^{-1}$,
(c) $t_\ell \xrightarrow{d} N(0, 1)$ and $W \xrightarrow{d} \chi^2(\#r)$, and
(d) the Sargan statistic $\xrightarrow{d} \chi^2(K - L)$.
LR test in 2SLS
The LR statistic, which is the difference in $J$ with and without the imposition of the null hypothesis, is asymptotically chi-squared:
$$LR = n\, \frac{(s_{xy} - S_{xz}\bar\delta)'\, S_{xx}^{-1}\, (s_{xy} - S_{xz}\bar\delta) - (s_{xy} - S_{xz}\hat\delta_{2SLS})'\, S_{xx}^{-1}\, (s_{xy} - S_{xz}\hat\delta_{2SLS})}{\hat\sigma^2} \xrightarrow{d} \chi^2(\#r),$$
where $\bar\delta$ is the restricted 2SLS estimator, which minimizes the distance under the null hypothesis. Using the same $\hat\sigma^2$ in both terms guarantees that the statistic is nonnegative in finite samples.
If the hypothesis is linear, the LR statistic is numerically equal to the Wald statistic $W$.

Small-sample properties of 2SLS
There is a fairly large literature on the finite-sample distribution of the 2SLS estimator. The analytical results, however, are not very useful for empirical researchers, because they are derived under the restrictive assumptions of fixed instruments and normal errors.
If the instruments are weak, a mass of the distribution of the sampling error $\hat\delta_{2SLS} - \delta$ remains away from zero until the sample size gets really large.
How weak is weak? As a rule of thumb, the first-stage F-statistic for the instruments should be larger than 10.
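The rule of thumb can be sketched as the usual F test for excluding the instruments from the first-stage regression. This is a minimal sketch assuming a single endogenous regressor; the function name `first_stage_f`, the split into included and excluded instruments, and the simulated coefficients are assumptions for the example.

```python
import numpy as np

def first_stage_f(z, X_incl, X_excl):
    """F statistic for the excluded instruments X_excl in the regression of z on [X_incl, X_excl]."""
    n = len(z)
    X_full = np.column_stack([X_incl, X_excl])

    def ssr(A):
        coef, *_ = np.linalg.lstsq(A, z, rcond=None)
        resid = z - A @ coef
        return resid @ resid

    q = X_excl.shape[1]                       # number of excluded instruments
    ssr_r, ssr_u = ssr(X_incl), ssr(X_full)   # restricted and unrestricted SSR
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - X_full.shape[1]))

# Simulated first stages (illustrative only): the same instrument, strong vs. weak.
rng = np.random.default_rng(1)
n = 500
const = np.ones((n, 1))
x = rng.normal(size=(n, 1))
v = rng.normal(size=n)
f_strong = first_stage_f(1.0 * x[:, 0] + v, const, x)    # instrument coefficient 1.0
f_weak = first_stage_f(0.02 * x[:, 0] + v, const, x)     # instrument coefficient 0.02
print(f_strong, f_weak)
```

With the strong first stage the statistic is far above 10, while a coefficient as small as 0.02 leaves the instrument with almost no explanatory power for the regressor.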

Alternative derivations of 2SLS
Define the data matrices as $X$ ($n \times K$, with $i$-th row $x_i'$), $Z$ ($n \times L$, with $i$-th row $z_i'$), and $y$ ($n \times 1$, with $i$-th element $y_i$). Then the 2SLS estimator can be written as
$$\hat\delta_{2SLS} = (Z'PZ)^{-1} Z'Py,$$
where $P \equiv X(X'X)^{-1}X'$ is the projection matrix.

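Alternative derivations of 2SLS (cont.)
The estimate of the asymptotic variance, the t-ratio, the Wald statistic and the overidentification test statistic can be written as
$$\widehat{\operatorname{Avar}}(\hat\delta_{2SLS}) = n\,\hat\sigma^2 (Z'PZ)^{-1},$$
$$t_\ell = \frac{\hat\delta_{2SLS,\ell} - \bar\delta_\ell}{\sqrt{\hat\sigma^2 \left[(Z'PZ)^{-1}\right]_{\ell\ell}}},$$
$$W = \frac{a(\hat\delta_{2SLS})' \left[ A(\hat\delta_{2SLS}) (Z'PZ)^{-1} A(\hat\delta_{2SLS})' \right]^{-1} a(\hat\delta_{2SLS})}{\hat\sigma^2},$$
$$\text{Sargan's statistic} = \frac{\hat\varepsilon' P \hat\varepsilon}{\hat\sigma^2},$$
where $P \equiv X(X'X)^{-1}X'$ is the projection matrix, $\hat\sigma^2 = \hat\varepsilon'\hat\varepsilon / n$, and $\hat\varepsilon \equiv y - Z\hat\delta_{2SLS}$.
Using these formulas, we can provide two other derivations of the 2SLS estimator.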
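The projection-matrix formulas translate almost line by line into NumPy. The sketch below is an illustration under assumptions: the function name `tsls_with_stats` and the simulated data are made up for the example, and the $n \times n$ matrix $P$ is never formed explicitly (products such as $PZ$ are computed by regression instead).

```python
import numpy as np

def tsls_with_stats(y, Z, X):
    """2SLS estimate, standard errors, sigma2_hat and Sargan's statistic."""
    n = len(y)
    PZ = X @ np.linalg.solve(X.T @ X, X.T @ Z)         # P Z without forming P
    ZPZ = PZ.T @ Z                                     # Z'PZ
    delta = np.linalg.solve(ZPZ, PZ.T @ y)             # (Z'PZ)^{-1} Z'Py
    resid = y - Z @ delta                              # residuals use Z, not PZ
    sigma2 = resid @ resid / n
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(ZPZ))) # sqrt of Avar_hat / n
    P_resid = X @ np.linalg.solve(X.T @ X, X.T @ resid)
    sargan = resid @ P_resid / sigma2                  # eps'P eps / sigma2
    return delta, se, sigma2, sargan

# Simulated data (illustrative only): K = 4 instruments, L = 2 coefficients.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 3))
u, v = rng.normal(size=n), rng.normal(size=n)
z = x @ np.array([1.0, 0.5, 0.5]) + 0.8 * u + v
y = 1.0 + 2.0 * z + u
Z = np.column_stack([np.ones(n), z])
X = np.column_stack([np.ones(n), x])

delta, se, sigma2, sargan = tsls_with_stats(y, Z, X)
t_ratios = (delta - np.array([1.0, 2.0])) / se   # t-ratios against the simulated true values
print(delta, se, sargan)                          # sargan is compared with chi2(K - L) = chi2(2)
```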

2SLS as an IV estimator
Let $\hat z_i$ ($L \times 1$) be the vector of $L$ instruments (which will be generated from $x_i$ as described below) for the $L$ regressors, and let $\hat Z$ be the $n \times L$ data matrix of those instruments, so that the $i$-th row of $\hat Z$ is $\hat z_i'$. The IV estimator of $\delta$ with $\hat z_i$ serving as instruments is
$$\hat\delta_{IV} = (\hat Z' Z)^{-1} \hat Z' y.$$
Now generate the $L$ instruments $\hat z_i$ from the $K$ raw instruments $x_i$ as follows. The $\ell$-th instrument is the fitted value from regressing $z_{i\ell}$ (the $\ell$-th regressor) on $x_i$. The $n$-vector of fitted values is
$$\hat z_\ell = X(X'X)^{-1}X' z_\ell,$$
where $z_\ell$ is the $n$-vector of the $\ell$-th regressor (i.e., the $\ell$-th column of $Z$). Therefore, the $n \times L$ data matrix of instruments is
$$\hat Z = \big(X(X'X)^{-1}X' z_1, \ldots, X(X'X)^{-1}X' z_L\big) = X(X'X)^{-1}X' Z = PZ,$$
where $P$ is the projection matrix. Substituting $\hat Z = PZ$ into the IV formula yields the 2SLS estimator, $(Z'PZ)^{-1}Z'Py$.

2SLS as two regressions
First stage: regress the $L$ regressors on $x_i$ and obtain the fitted values $\hat z_i$.
Second stage: regress $y_i$ on these fitted values.
For those regressors that are predetermined (and hence included in $x_i$), there is no need to carry out the first stage.
This derivation of 2SLS is useful and it justifies the name of the estimator, but there is a pitfall: the OLS standard errors and estimated asymptotic variance from the second stage cannot be used for statistical inference, because the second-stage residuals are $y - \hat Z\hat\delta_{2SLS}$ rather than the 2SLS residuals $y - Z\hat\delta_{2SLS}$.
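The two-regression recipe and its pitfall can be sketched with two least-squares calls. The simulated data and variable names are assumptions for the example, and both variance estimates divide by $n$ so that only the choice of residuals differs.

```python
import numpy as np

# Simulated data (illustrative only): one endogenous regressor, two instruments.
rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=(n, 2))
u, v = rng.normal(size=n), rng.normal(size=n)
z = x @ np.array([1.0, 1.0]) + 0.8 * u + v
y = 1.0 + 2.0 * z + u
Z = np.column_stack([np.ones(n), z])
X = np.column_stack([np.ones(n), x])

# First stage: regress every column of Z on X and keep the fitted values.
pi_hat, *_ = np.linalg.lstsq(X, Z, rcond=None)
Z_hat = X @ pi_hat
# Second stage: regress y on the fitted values.
delta, *_ = np.linalg.lstsq(Z_hat, y, rcond=None)

ZPZ_inv = np.linalg.inv(Z_hat.T @ Z_hat)        # equals (Z'PZ)^{-1}
e_second = y - Z_hat @ delta                    # second-stage OLS residuals (wrong for inference)
e_2sls = y - Z @ delta                          # 2SLS residuals (use the original regressors)
se_wrong = np.sqrt(e_second @ e_second / n * np.diag(ZPZ_inv))
se_right = np.sqrt(e_2sls @ e_2sls / n * np.diag(ZPZ_inv))
print(delta, se_wrong, se_right)
```

The point estimate from the second stage is the 2SLS estimate, but the residual variance has to be recomputed with $Z$ in place of $\hat Z$ before any standard error is reported.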

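When regressors are predetermined
When all the regressors are predetermined and the errors are conditionally homoskedastic, there is a close connection between the distance function $J$ for efficient GMM and the sum of squared residuals (SSR):
$$J\big(\tilde\delta, (\hat\sigma^2 S_{xx})^{-1}\big) = \frac{(y - Z\tilde\delta)'P(y - Z\tilde\delta)}{\hat\sigma^2} = \frac{y'Py - 2y'Z\tilde\delta + \tilde\delta'Z'Z\tilde\delta}{\hat\sigma^2} \quad (\text{since } PZ = Z \text{ when } z_i \subset x_i)$$
$$= \frac{(y - Z\tilde\delta)'(y - Z\tilde\delta) - (y'y - y'Py)}{\hat\sigma^2}.$$
The second term does not depend on $\tilde\delta$, so minimizing $J$ is the same as minimizing the SSR. Therefore:
The efficient GMM estimator is OLS.
The restricted efficient GMM estimator is restricted OLS.
For a linear hypothesis, the Wald statistic can be calculated as the difference in SSR with and without the restriction, normalized by $\hat\sigma^2$.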

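Testing a subset of orthogonality conditions
Under conditional homoskedasticity the efficient GMM estimator is the 2SLS estimator, so the two GMM estimators are given by
$$\hat\delta = (Z'PZ)^{-1}Z'Py \ \text{ with } P = X(X'X)^{-1}X', \qquad \bar\delta = (Z'P_1Z)^{-1}Z'P_1y \ \text{ with } P_1 = X_1(X_1'X_1)^{-1}X_1',$$
where $X_1$ is the $n \times K_1$ data matrix of $x_{i1}$. The $C$ statistic becomes the difference in two Sargan statistics:
$$C = \frac{\hat\varepsilon'P\hat\varepsilon - \bar\varepsilon'P_1\bar\varepsilon}{\hat\sigma^2}, \qquad \hat\varepsilon \equiv y - Z\hat\delta, \quad \bar\varepsilon \equiv y - Z\bar\delta, \quad \hat\sigma^2 \equiv \frac{\hat\varepsilon'\hat\varepsilon}{n}.$$
As in the case without conditional homoskedasticity, $C$ is guaranteed to be nonnegative in finite samples if the same matrix $\hat S$ is used throughout, which under conditional homoskedasticity amounts to using the same estimate of the error variance, $\hat\sigma^2$, to deflate both $\hat\varepsilon'P\hat\varepsilon$ and $\bar\varepsilon'P_1\bar\varepsilon$. By the proposition on testing a subset of orthogonality conditions, $C \xrightarrow{d} \chi^2(K - K_1)$.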

Hausman test
Under conditional homoskedasticity, $\hat\delta$ is asymptotically more efficient than $\bar\delta$ because $\hat\delta$ exploits more orthogonality conditions. Therefore
$$\operatorname{Avar}(\bar\delta) \ge \operatorname{Avar}(\hat\delta).$$
Furthermore, under the same hypothesis that guarantees $C$ to be asymptotically chi-squared, we also have
$$\operatorname{Avar}(\bar\delta - \hat\delta) = \operatorname{Avar}(\bar\delta) - \operatorname{Avar}(\hat\delta).$$
(If you have been exposed to the Hausman test in the ML context, you can recognize this as the GMM version of the Hausman principle.)
$\operatorname{Avar}(\hat\delta)$ is consistently estimated by $n\,\hat\sigma^2 (Z'PZ)^{-1}$. Similarly, a consistent estimator of $\operatorname{Avar}(\bar\delta)$ is $n\,\hat\sigma^2 (Z'P_1Z)^{-1}$. Here, as in the calculation of the $C$ statistic, the same estimate $\hat\sigma^2$ is used throughout in order to guarantee that the test statistic is nonnegative. The resulting estimator of $\operatorname{Avar}(\bar\delta - \hat\delta)$ is
$$\widehat{\operatorname{Avar}}(\bar\delta) - \widehat{\operatorname{Avar}}(\hat\delta) = n\,\hat\sigma^2 \left[ (Z'P_1Z)^{-1} - (Z'PZ)^{-1} \right].$$

Hausman test (cont.)
Hausman and Taylor (1980) have shown that (1) the matrix $\widehat{\operatorname{Avar}}(\bar\delta) - \widehat{\operatorname{Avar}}(\hat\delta)$ in finite samples is positive semidefinite (nonnegative definite) but not necessarily nonsingular, but (2) for any generalized inverse of this matrix, the Hausman statistic
$$H \equiv n\,(\bar\delta - \hat\delta)' \left[ \widehat{\operatorname{Avar}}(\bar\delta) - \widehat{\operatorname{Avar}}(\hat\delta) \right]^{-} (\bar\delta - \hat\delta)$$
is invariant to the choice of the generalized inverse and is asymptotically chi-squared with $\min(K - K_1, L - s)$ degrees of freedom, where $s = \#(z_i \cap x_{i1})$ is the number of regressors which are retained as instruments in $x_{i1}$.
A generalized inverse, $A^{-}$, of a matrix $A$ is any matrix satisfying $AA^{-}A = A$. If $A$ is square and nonsingular, then $A^{-}$ is unique and equal to $A^{-1}$.
The relationship between H and C under conditional homoskedasticity: it can be shown (see Newey, 1985) that if $K - K_1 \le L - s$, so that both $H$ and $C$ have the same degrees of freedom, then $H = C$ (numerically equal). Otherwise, the two statistics are numerically different and have different degrees of freedom.
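Both statistics can be computed side by side under conditional homoskedasticity. The sketch below is illustrative and rests on several assumptions: the function names, the simulated data, the convention that the nonsuspect instruments are the first `k1` columns of `X`, and the use of the Moore-Penrose pseudoinverse (with an explicit cutoff `rcond=1e-10`) as the generalized inverse.

```python
import numpy as np

def tsls(y, Z, X):
    """Return the 2SLS estimate and Z'PZ for the instrument matrix X."""
    PZ = X @ np.linalg.solve(X.T @ X, X.T @ Z)
    ZPZ = PZ.T @ Z
    return np.linalg.solve(ZPZ, PZ.T @ y), ZPZ

def projected_ss(e, A):
    """e'P_A e, the squared length of the projection of e on the columns of A."""
    Ae = A.T @ e
    return Ae @ np.linalg.solve(A.T @ A, Ae)

def hausman_and_c(y, Z, X, k1):
    n = len(y)
    d_hat, ZPZ = tsls(y, Z, X)               # all K instruments
    d_bar, ZP1Z = tsls(y, Z, X[:, :k1])      # first K1 instruments only
    e_hat, e_bar = y - Z @ d_hat, y - Z @ d_bar
    sigma2 = e_hat @ e_hat / n               # same sigma2_hat used throughout
    # H = n d'[n sigma2 V]^- d = d' V^- d / sigma2, with V the bracketed difference.
    V = np.linalg.inv(ZP1Z) - np.linalg.inv(ZPZ)
    V = (V + V.T) / 2                        # remove rounding asymmetry
    d = d_bar - d_hat
    H = d @ np.linalg.pinv(V, rcond=1e-10, hermitian=True) @ d / sigma2
    C = (projected_ss(e_hat, X) - projected_ss(e_bar, X[:, :k1])) / sigma2
    return H, C

# Simulated data (illustrative only): K = 4, K1 = 3, L = 2, s = 1 (the constant).
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 3))
u, v = rng.normal(size=n), rng.normal(size=n)
z = x @ np.array([1.0, 0.5, 0.5]) + 0.8 * u + v
y = 1.0 + 2.0 * z + u
Z = np.column_stack([np.ones(n), z])
X = np.column_stack([np.ones(n), x])

H, C = hausman_and_c(y, Z, X, k1=3)
print(H, C)
```

In this design $K - K_1 = L - s = 1$, which is the case where the slide states that $H$ and $C$ coincide numerically, so printing both gives a direct check of that claim.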

Testing conditional homoskedasticity
The $nR^2$ statistic for the 2SLS residuals does not have the desired asymptotic distribution. A test statistic that is asymptotically chi-squared is available but is extremely cumbersome.
Testing for serial correlation
A modified Box-Pierce Q statistic is available that is asymptotically chi-squared under the null of no serial correlation.