The Case Against JIVE: Related Literature, Two Comments and One Reply. Freddy Rojas Cama (PhD student), Econometrics Theory II, Rutgers University. November 14th, 2011
Outline
1 Literature
2 Key definitions
3 The case against JIVE
4 The bias of 2SLS
5 Estimators (UJIVE 1, UJIVE 2, JLS or JIVE, LIML)
6 Experiments
7 Results
8 Conclusions
9 Two comments
10 ... and One reply
11 Estimation by using STATA
12 References
13 Appendix
Literature
Instrumental variables performance with weak instruments.
Davidson and MacKinnon (2004). The Case Against JIVE. Journal of Applied Econometrics 21: 827-833.
Blomquist and Dahlberg (1999). Small Sample Properties of LIML and Jackknife IV Estimators: Experiments with Weak Instruments. Journal of Applied Econometrics 14: 69-88.
Angrist, J., W. Imbens and A. Krueger (1999). Jackknife Instrumental Variables Estimation. Journal of Applied Econometrics 14: 57-67.
Key definitions
Jackknife Instrumental Variables Estimation (JIVE). Unbiased Jackknife Instrumental Variables Estimation (UJIVE). Limited Information Maximum Likelihood (LIML). Two-Stage Least Squares (2SLS). Finite (small) sample properties. Monte Carlo simulations.
The case against JIVE
JIVE performs very badly when the instruments are weak. Davidson and MacKinnon (2004) [DM] perform Monte Carlo experiments to compare the performance of the "jackknife instrumental variables estimator" with the 2SLS and LIML estimators. They find NO evidence for using JIVE instead of LIML; LIML has better finite (small) sample properties, and 2SLS is less dispersed than JIVE. The results of DM's paper do not support the findings in Blomquist and Dahlberg (1999) [BD] and Angrist, Imbens and Krueger (1999) [AIK].
The case against JIVE
Summarizing finite-sample properties (winner according to each study):

Study   UJIVE   LIML   Comment
DM              X      LIML the best in reducing bias
BD      ?       ?      Hard to find a winner!
AIK     X              A reduced space of parameters
The bias of 2SLS: The system of equations
We have the following system of equations (in matrix terms):

$$Y = X\beta + \varepsilon \quad (1)$$
$$X = Z\pi + \eta \quad (2)$$

where $X$, $Z$ and $\eta$ are matrices of dimension $n \times L$, $n \times k$ and $n \times L$ respectively. The number of overidentifying restrictions is $r = k - L$. Also, there are $M$ common elements in $X$ and $Z$, so $M$ columns of the $n \times L$ matrix $\eta$ are zero. The endogeneity comes from the following expression:

$$E[\varepsilon_i \eta_i' \mid Z] = \sigma_{\varepsilon\eta} \quad (3)$$
The bias of 2SLS: The bias of OLS
We have the following assumptions: $E[\eta_i \mid Z] = 0$ and $E[\eta_i \eta_i' \mid Z] = \Sigma_\eta$, where the rank of $\Sigma_\eta$ equals $L - M$. The inconsistency of the OLS estimator $\hat\beta_{OLS} = (X'X)^{-1} X'Y$ is shown in the following steps:

$$\hat\beta_{OLS} = \left((Z\pi + \eta)'(Z\pi + \eta)\right)^{-1}(Z\pi + \eta)'Y$$
$$= \left((\eta' + \pi'Z')(Z\pi + \eta)\right)^{-1}(\eta' + \pi'Z')(X\beta + \varepsilon)$$
$$= \left((\eta' + \pi'Z')(Z\pi + \eta)\right)^{-1}(\eta' + \pi'Z')\left((Z\pi + \eta)\beta + \varepsilon\right)$$
$$= \beta + \left((\eta' + \pi'Z')(Z\pi + \eta)\right)^{-1}(\eta' + \pi'Z')\varepsilon$$

Applying the law of iterated expectations,

$$E[\hat\beta_{OLS}] = E\left[E[\hat\beta_{OLS} \mid Z]\right] = \beta + E\left[\left(\frac{(\eta' + \pi'Z')(Z\pi + \eta)}{N}\right)^{-1}\frac{\eta'\varepsilon}{N}\right]$$

In terms of consistency,

$$\operatorname{plim}_{N\to\infty}\hat\beta_{OLS} = \beta + \left(\pi'\Sigma_Z\pi + \Sigma_\eta\right)^{-1}\sigma_{\varepsilon\eta}$$

where $\Sigma_Z = \operatorname{plim}_{N\to\infty} Z'Z/N$, $\Sigma_\eta = \operatorname{plim}_{N\to\infty} \eta'\eta/N$ and $\eta'\varepsilon/N \overset{p}{\to} \sigma_{\varepsilon\eta}$.
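The probability limit above can be checked numerically. Below is a minimal sketch (assuming a scalar regressor, one instrument, and unit variances, so $\Sigma_Z = \Sigma_\eta = 1$ and $\sigma_{\varepsilon\eta} = \rho$); the function name `simulate_ols_bias` is illustrative, not from the paper:

```python
import numpy as np

def simulate_ols_bias(n=100_000, beta=1.0, pi=0.5, rho=0.5, seed=0):
    """Illustrate plim(beta_OLS) = beta + (pi' Sigma_Z pi + Sigma_eta)^{-1} sigma_eps_eta
    in the scalar case: one endogenous regressor, one instrument."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    # correlated errors: Var(eps) = Var(eta) = 1, Cov(eps, eta) = rho
    eta = rng.standard_normal(n)
    eps = rho * eta + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    x = pi * z + eta                       # reduced form
    y = beta * x + eps                     # structural equation
    b_ols = (x @ y) / (x @ x)              # OLS without intercept
    # theoretical probability limit (scalar version of the slide formula)
    plim = beta + rho / (pi**2 * 1.0 + 1.0)
    return b_ols, plim
```

With these defaults the plim is 1.4 rather than the true beta of 1.0, and the simulated OLS estimate sits close to the plim, not the truth.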
The bias of 2SLS: The bias of 2SLS
The optimal instrumental variables estimator must fulfill the condition

$$E\left[(Y - X\beta_{IV})'\Theta(Z)\right] = 0$$

An analytical solution is available:

$$\beta_{IV} = \left(\Theta(Z)'X\right)^{-1}\Theta(Z)'Y$$

Wooldridge states that $\Theta(Z) = Z\pi$ is the optimal instrument in terms of efficiency (instead of just $Z$).
The bias of 2SLS: k-class estimator
Note that $(Z\hat\pi)' = X'P_Z$, where $P_Z = Z(Z'Z)^{-1}Z'$ is the orthogonal projection onto the span of the columns of $Z$. Following Davidson and MacKinnon (2007), any matrix $A$ with the property $AZ = Z$ leads to another estimator. In particular, the choice $A = I - \kappa(I - P_Z)$ gives a $\kappa$-class estimator. Thus, quite generally, we consider the estimator of $\beta$ using $\Theta(Z, \kappa)$ in place of $\Theta(Z)$.
The bias of 2SLS: The bias of 2SLS
Because $\pi$ is unknown (the optimal IV estimator is infeasible), we use a feasible version:

$$\hat\beta_{2SLS\text{-}IV} = \left((Z\hat\pi)'X\right)^{-1}(Z\hat\pi)'Y \quad (4)$$

where $\hat\pi = (Z'Z)^{-1}Z'X$. Note that expression (4) is not exactly the 2SLS estimator; [AIK] state that $\beta_{2SLS\text{-}IV}$ has much better small-sample properties than $\beta_{2SLS}$ in the presence of many instruments (Nagar, 1959).
The bias of 2SLS: The bias of 2SLS
We show the bias of the 2SLS estimator in the following lines:

$$E[\varepsilon_i Z_i\hat\pi] = E\left[E[\varepsilon_i Z_i\hat\pi \mid Z]\right] = E\left[Z_i(Z'Z)^{-1}Z'\,E[\varepsilon_i X \mid Z]\right]$$
$$= E\left[Z_i(Z'Z)^{-1}Z'\,E[\varepsilon_i(Z\pi + \eta) \mid Z]\right] = E\left[Z_i(Z'Z)^{-1}Z'\,E[\varepsilon_i\eta \mid Z]\right] = E\left[Z_i(Z'Z)^{-1}Z'\,\Xi_{\varepsilon_i\eta}\right]$$

where $\Xi_{\varepsilon_i\eta}$ is a vector of zeros with just one element $\sigma_{\varepsilon\eta} \neq 0$ in the $i$-th position. Equivalently,

$$E[\varepsilon_i Z_i\hat\pi] = E\left[Z_i(Z'Z)^{-1}Z_i'\right]\sigma_{\varepsilon\eta} = \frac{K}{N}\sigma_{\varepsilon\eta}$$

Thus, for fixed $\sigma_{\varepsilon\eta}$, in small samples $E[\varepsilon_i Z_i\hat\pi]$ increases with the number of instruments $K$. The result is a bias in the 2SLS estimator.
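The $K/N$ result says that, for a fixed sample, adding instruments worsens the 2SLS bias. The sketch below illustrates this; it is not DM's exact design (here one weak instrument carries all the signal and the remaining $K-1$ are irrelevant, and the helper names `tsls` and `median_bias_2sls` are illustrative):

```python
import numpy as np

def tsls(y, x, Z):
    """Feasible 2SLS: beta = ((Z pihat)'x)^{-1} (Z pihat)'y, pihat = (Z'Z)^{-1} Z'x."""
    xhat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]   # first-stage fitted values
    return (xhat @ y) / (xhat @ x)

def median_bias_2sls(K, n=50, beta=1.0, pi_signal=0.3, rho=0.9, reps=2000, seed=1):
    """Median bias of 2SLS when only the first of K instruments has signal."""
    rng = np.random.default_rng(seed)
    biases = np.empty(reps)
    for r in range(reps):
        Z = rng.standard_normal((n, K))
        eta = rng.standard_normal(n)
        eps = rho * eta + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
        x = pi_signal * Z[:, 0] + eta      # weak first stage
        y = beta * x + eps
        biases[r] = tsls(y, x, Z) - beta
    return np.median(biases)
```

Comparing `median_bias_2sls(1)` with `median_bias_2sls(16)` shows the median bias growing sharply with the number of (mostly irrelevant) instruments.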
Estimators: UJIVE 1
JIVE removes the dependence of the constructed instrument $Z_i\hat\pi$ on the endogenous regressor for observation $i$ by using

$$\tilde\pi(i) = \left(Z(i)'Z(i)\right)^{-1}Z(i)'X(i) \quad (5)$$

where $Z(i)$ and $X(i)$ omit observation $i$. The estimate of the optimal instrument is $Z_i\tilde\pi(i)$; then, because $\varepsilon_i$ is independent of $X_j$ for $j \neq i$, we claim that $E[\varepsilon_i Z_i\tilde\pi(i)] = 0$. This is easily verifiable:

$$E\left[E[\varepsilon_i Z_i\tilde\pi(i) \mid Z]\right] = E\left[Z_i\left(Z(i)'Z(i)\right)^{-1}Z(i)'\,E[\varepsilon_i X(i) \mid Z]\right] = 0$$

See Phillips and Hale (1977) for details.
Estimators: UJIVE 1
Thus $\hat X_{i,UJIVE} = Z_i\tilde\pi(i)$, and the estimator of $\beta$ is

$$\hat\beta_{UJIVE1} = \left(\hat X_{UJIVE}'X\right)^{-1}\hat X_{UJIVE}'Y \quad \text{(UJIVE 1)}$$

Computing (5) for each observation $i$ is costly; [AIK] show a shortcut. With $h_i = Z_i(Z'Z)^{-1}Z_i'$,

$$Z_i\tilde\pi(i) = \frac{Z_i\hat\pi - h_i X_i}{1 - h_i}$$
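The leave-one-out shortcut can be checked directly. The sketch below (single endogenous regressor; the helper name `ujive1` is hypothetical) computes UJIVE1 from the leverages $h_i$ without running $N$ separate first-stage regressions:

```python
import numpy as np

def ujive1(y, x, Z):
    """UJIVE1 via the AIK shortcut: the leave-one-out fitted value is
    (Z_i pihat - h_i x_i) / (1 - h_i), where h_i is the leverage of Z."""
    pihat = np.linalg.lstsq(Z, x, rcond=None)[0]
    fitted = Z @ pihat
    # leverages h_i = Z_i (Z'Z)^{-1} Z_i'
    h = np.einsum('ij,ij->i', Z @ np.linalg.inv(Z.T @ Z), Z)
    xhat = (fitted - h * x) / (1.0 - h)   # leave-one-out instrument
    return (xhat @ y) / (xhat @ x)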
Estimators: UJIVE 2
An alternative estimator uses

$$Z_i\tilde\pi(i) = \frac{Z_i\hat\pi - \frac{1}{N}X_i}{1 - \frac{1}{N}}$$

and the resulting estimator for $\beta$ is

$$\hat\beta_{UJIVE2} = \left(\hat X_{UJIVE2}'X\right)^{-1}\hat X_{UJIVE2}'Y \quad \text{(UJIVE 2)}$$

where $\hat X_{i,UJIVE2} = Z_i\tilde\pi(i)$. The projections used by UJIVE1 and UJIVE2 are very similar, and both estimators are consistent. Their probability limits and first-order asymptotic distributions are the same as those of $\beta_{IV}$ and $\beta_{2SLS}$. DM state that these estimators may nevertheless differ noticeably in any particular sample.
Estimators: JLS or JIVE
DM consider an alternative estimator,

$$\hat\beta_{JIVE} = \left(\hat X_{JIVE}'\hat X_{JIVE}\right)^{-1}\hat X_{JIVE}'Y \quad \text{(JIVE)}$$

where $\hat X_{i,JIVE} = Z_i\tilde\pi(i)$. DM state that JIVE is biased toward zero, just as the OLS estimator is when the explanatory variable is measured with error. This class of estimator is nevertheless consistent.
Estimators: Limited Information Maximum Likelihood
We can estimate the parameters of linear regressions with endogenous regressors by using the limited information maximum likelihood (LIML) estimator. The likelihood is based on normality of the reduced-form errors with covariance matrix $\Omega$, although consistency and asymptotic normality of the estimator do not rely on this assumption. The log-likelihood function is

$$\mathcal{L} = \sum_{i=1}^{N}\left[-\ln(2\pi) - \frac{1}{2}\ln|\Omega| - \frac{1}{2}\begin{pmatrix} Y_i - (Z_i\pi)\beta \\ X_i - Z_i\pi \end{pmatrix}'\Omega^{-1}\begin{pmatrix} Y_i - (Z_i\pi)\beta \\ X_i - Z_i\pi \end{pmatrix}\right] \quad (6)$$
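In practice the likelihood in (6) is rarely maximized directly; LIML is commonly computed as a k-class estimator whose $\kappa$ is the smallest eigenvalue of $(W'M_Z W)^{-1}W'W$ with $W = [Y\ X]$ and $M_Z = I - P_Z$. A minimal sketch under simplifying assumptions (one endogenous regressor, no additional exogenous regressors; the function name `liml` is illustrative):

```python
import numpy as np

def liml(y, x, Z):
    """LIML as a k-class estimator: kappa is the smallest eigenvalue of
    (W' M_Z W)^{-1} (W' W), where W = [y, x] and M_Z = I - P_Z.
    This eigenvalue characterization is equivalent to maximizing the
    limited-information likelihood under normal reduced-form errors."""
    n = len(y)
    W = np.column_stack([y, x])
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)          # projection onto span(Z)
    M = np.eye(n) - P
    kappa = np.min(np.linalg.eigvals(
        np.linalg.solve(W.T @ M @ W, W.T @ W)).real)
    # k-class estimator with A = I - kappa * (I - P_Z)
    A = np.eye(n) - kappa * M
    return (x @ A @ y) / (x @ A @ x)
```

A quick sanity check: in the just-identified case (one instrument), LIML coincides exactly with the simple IV estimator.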
Experiments
DM use the following system of equations (in matrix terms) for the simulations:

$$Y = \iota\beta_1 + x\beta_2 + \varepsilon \quad \text{(structural eq.)}$$
$$x = \sigma_\eta(Z\pi + \eta) \quad \text{(reduced eq.)}$$

where $X = [\iota\ x]$, $Z$ and $\eta$ are of dimension $n \times 2$, $n \times l$ and $n \times 1$ respectively. The number of overidentifying restrictions is $r = l - 2$. The elements of $\varepsilon$ and $\eta$ have variances $\sigma_\varepsilon^2$ and 1 respectively, and correlation $\rho$. To run the simulations we need to impose parameter values; an important guide is the size of the ratio $\lVert\pi\rVert^2/\sigma_\eta^2$.
Experiments: The concentration parameter
The ratio $\lVert\pi\rVert^2/\sigma_\eta^2$ is interpreted as the signal-to-noise ratio in the reduced-form equation.
Experiments: Setting up parameters
DM fix the values of the $\pi_j$ to be equal, except $\pi_1 = 0$. The parameter that varies is $R^2 = \lVert\pi\rVert^2 / (\lVert\pi\rVert^2 + \sigma_\eta^2)$ (the asymptotic $R^2$). $R^2$ is a monotonically increasing function of the ratio $\lVert\pi\rVert^2/\sigma_\eta^2$; a small value of $R^2$ implies that the instruments are weak. In the experiments, DM vary the sample size $n$, the number of overidentifying restrictions $r$, the correlation between errors $\rho$, and $R^2$.
Experiments: Parameter setup
500,000 Monte Carlo replications. When $R^2 = 0$, $\beta_2$ is not asymptotically identified. Only the absolute value of $\rho$ matters; the sign just affects the direction of the bias. $r$ varies from 0 to 16. Sample sizes: 25, 50, 100, 200, 400 and 800.
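One draw from a design of this type can be sketched as follows. This is an illustration, not DM's exact code: $\sigma_\eta$ is normalized to 1, the constant's coefficient plays the role of $\pi_1 = 0$, and the remaining $\pi_j$ are equal and scaled so the asymptotic $R^2$ matches a target value:

```python
import numpy as np

def dm_design(n, r, R2, rho, seed=None):
    """One draw from a DM-style design with l = r + 2 instruments
    (a constant plus l-1 standard-normal columns), sigma_eta = 1."""
    rng = np.random.default_rng(seed)
    l = r + 2
    Z = np.column_stack([np.ones(n), rng.standard_normal((n, l - 1))])
    pi = np.zeros(l)                       # pi_1 = 0 (the constant)
    # choose equal pi_j so that ||pi||^2 = R2 / (1 - R2), hence
    # asymptotic R^2 = ||pi||^2 / (||pi||^2 + 1) = R2
    pi[1:] = np.sqrt(R2 / (1.0 - R2) / (l - 1))
    eta = rng.standard_normal(n)
    eps = rho * eta + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    x = Z @ pi + eta                       # reduced form
    y = 1.0 + 1.0 * x + eps                # beta_1 = beta_2 = 1 (arbitrary)
    return y, x, Z
```

In a large sample, the empirical first-stage $R^2$ of a draw should be close to the target.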
Experiments: Performance evaluation
Because the LIML and JIVE estimators have no moments (see Davidson and MacKinnon, 2007), DM report the median bias,

$$\text{median bias} = \hat\beta_{(0.5)} - \beta$$

As a measure of dispersion, DM report the nine-decile range,

$$\text{9-decile range} = \hat\beta_{(0.95)} - \hat\beta_{(0.05)}$$

Ackerberg and Devereux (2006) suggest also considering the trimmed mean bias,

$$\text{trimmed mean bias} = \frac{1}{n}\sum_j \hat\beta_j - \beta, \qquad \hat\beta_j \in [\hat\beta_{(0.01)}, \hat\beta_{(0.99)}]$$

where $\hat\beta_{(q)}$ denotes the $q$-quantile of the Monte Carlo distribution of $\hat\beta$.
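These three summaries are easy to compute from a vector of Monte Carlo estimates. The helper below is illustrative (it takes the trimmed mean over the estimates between the 1st and 99th percentiles, one reading of the AD formula):

```python
import numpy as np

def performance_stats(betahat, beta):
    """Median bias, nine-decile range, and trimmed mean bias of a vector
    of Monte Carlo estimates -- robust summaries suitable for estimators
    that have no moments (LIML, JIVE)."""
    q = np.quantile(betahat, [0.01, 0.05, 0.5, 0.95, 0.99])
    median_bias = q[2] - beta
    nine_decile_range = q[3] - q[1]
    inner = betahat[(betahat >= q[0]) & (betahat <= q[4])]
    trimmed_mean_bias = inner.mean() - beta
    return median_bias, nine_decile_range, trimmed_mean_bias
```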
Results Results (I): Median bias evaluation
Results Results (II): Median bias evaluation
Results Results (III): Median bias evaluation
Results Results (IV): Rejection frequencies
Results Results (V): Dispersion
Conclusions
1 AIK and BD give divergent views of JIVE performance; DM's paper re-examines this issue.
2 DM conclude that in most regions of the parameter space they have studied, JIVE is inferior to LIML with regard to median bias, dispersion and reliability of inference.
3 LIML should be preferred whenever you need to deal with estimators that have no moments.
4 DM point out, however, that the Monte Carlo simulations do not unambiguously support the use of LIML: when the instruments are weak, this estimator's dispersion is significant.
5 Chao and Swanson (2004) state that LIML is not a consistent estimator in a context of heteroskedasticity, but UJIVE is (under some conditions).
Two comments: Ackerberg and Devereux's comments
They disagree with DM's conclusion that "the LIML estimator should almost always be preferred to the JIVE estimators". Ackerberg and Devereux (2003) show that LIML's advantages are significantly reduced when using the "improved" JIVE (IJIVE); they also extend it to an unbiased version (UIJIVE). What matters is the results in the weak-instruments region of the parameter space: there, the advantages of LIML in terms of median bias and dispersion are small, and not much different from the IJIVE estimators. Moreover, there are important advantages of the JIVE estimators over the LIML estimator, particularly their robustness to heteroskedasticity (see Chao and Swanson, 2004).
Two comments: Ackerberg and Devereux's comments
On the performance of the IJIVE estimators: AD show that in the region where $R^2 > 0.2$ the IJIVE estimator removes almost all of the median bias of the JIVE estimator, although LIML remains better in terms of bias in the weak-instruments region. IJIVE improves on the 9-decile range of JIVE by about 20%, although it is still considerably larger than that of LIML. If one considers the trimmed mean statistics, the UIJIVE bias is smaller than that of LIML.
Two comments: Ackerberg and Devereux's comments
They conclude the following about the advantages of LIML: by using IJIVE or UIJIVE one can either closely match the median bias of LIML, or closely match its dispersion and better its trimmed mean bias. LIML does better in terms of median bias for values of $R^2$ below 0.2, but empirical work is hardly feasible in that situation, mainly because: when $R^2$ is below 0.2 the average first-stage F-statistic is below 3 and the corresponding p-value is above 0.10, so one typically would not be able to reject the hypothesis that the instruments are irrelevant; and the variance of all estimators is likely to be large.
Two comments: Ackerberg and Devereux's comments
LIML is not the panacea. IJIVE and UIJIVE appear to be very robust estimators; in particular, they are consistent under heteroskedasticity. Chao and Swanson (2004) state that LIML is not a consistent estimator in this context, and provide an alternative proof that the JIVE estimators are. Could "a sort of" LIML estimator become consistent in this context? Such an estimator would probably not have a closed-form solution and might be less robust to other perturbations (e.g. non-normality) than the standard LIML estimator.
Two comments: Blomquist and Dahlberg's comments
They find it difficult to understand DM's categorical rejection of JIVE: neither JIVE nor LIML dominates 2SLS in terms of RMSE. In terms of bias, DM's results show a good performance of LIML; in terms of variance, 2SLS outperforms both the LIML and JIVE estimators. LIML outperforms JIVE in some regions of the parameter space, while in others the opposite holds. The complexity of the DGP matters for any conclusion; AIK, BD and DM should be more nuanced.
... and One reply: Davidson and MacKinnon's reply
DM agree that there is some region of the parameter space where UJIVE outperforms LIML. They run simulations to evaluate the performance of Ackerberg and Devereux's (2006) UIJIVE; interestingly, UIJIVE tends to be substantially less dispersed than the other JIVE estimators, and only when the instruments are weak is it less dispersed than LIML. They propose looking at the expansion

$$n^{1/2}\left(\hat\beta_2 - \beta_2\right) = t_0 + n^{-1/2}t_1 + o_p\left(n^{-1/2}\right)$$

to show the approximate biases of these estimators.
... and One reply: Davidson and MacKinnon's reply
They point out that although UIJIVE is constructed to reduce mean bias, it does not do nearly so well as regards median bias, which implies that the estimator is highly skewed. DM propose looking more closely at the modified LIML estimator proposed by Fuller (1977), for which moments exist, to see how it performs relative to UIJIVE.
Estimation by using STATA: STATA command
To perform the UJIVE estimator there is an available STATA command; the syntax is as follows:

jive depvar [varlist1] (varlist2 = varlistiv) [if] [in] [, options]

The options include UJIVE1, UJIVE2 (see AIK), JIVE1 and JIVE2 (see BD), and calculation of a robust covariance matrix for dealing with heteroskedasticity. The command jive saves information in e(); the most interesting entries are e(b) (the coefficient vector), e(V) (the variance-covariance matrix), e(f1) (the first-stage F statistic) and e(r2_1) (the first-stage R²).
References
Angrist, J., W. Imbens and A. Krueger (1999). Jackknife Instrumental Variables Estimation. Journal of Applied Econometrics 14: 57-67.
Blomquist and Dahlberg (1999). Small Sample Properties of LIML and Jackknife IV Estimators: Experiments with Weak Instruments. Journal of Applied Econometrics 14: 69-88.
Davidson, R., and J. MacKinnon (2004). The Case Against JIVE. Journal of Applied Econometrics 21: 827-833.
Davidson, R., and J. MacKinnon (2006). Reply to Ackerberg and Devereux and Blomquist and Dahlberg on "The Case Against JIVE". Journal of Applied Econometrics 21: 843-844.
References
Blomquist and Dahlberg (2006). The Case Against JIVE: A Comment. Journal of Applied Econometrics 21: 839-841.
Ackerberg and Devereux (2006). Comment on "The Case Against JIVE". Journal of Applied Econometrics 21: 835-838.
Davidson, R., and J. MacKinnon (2007). Moments of IV and JIVE Estimators. The Econometrics Journal, Vol. 10, No. 3.
Stock, J., J. Wright and M. Yogo (2002). A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments. Journal of Business and Economic Statistics, Vol. 20, No. 4.
Hausman, J., W. Newey, T. Woutersen, J. Chao and N. Swanson (2006). Instrumental Variable Estimation with Heteroskedasticity and Many Instruments. Working paper.
References
Chao, J. and N. Swanson (2004). Estimation and Testing Using Jackknife IV in Heteroskedastic Regressions with Many Weak Instruments. Working paper, Rutgers University.
Nagar, A. (1959). The Bias and Moment Matrix of the General k-class Estimators of the Parameters in Simultaneous Equations. Econometrica, 27, 575-595.
Phillips, G. D. A., and C. Hale (1977). The Bias of Instrumental Variable Estimators of Simultaneous Equation Systems. International Economic Review 18: 219-228.
Fuller, W. A. (1977). Some Properties of a Modification of the Limited Information Estimator. Econometrica 45, 939-953.
Appendix: AIK results
Median Absolute Error (N=100, L=2, M=5000 replications)

DGP   2SLS   LIML   UJIVE1   UJIVE2   OLS
1     0.11   0.12   0.13     0.13     0.59
2     0.28   0.13   0.17     0.17     0.59
3     0.16   0.25   0.32     0.15     0.17
4     0.80   1.01   0.88     0.88     0.80
5     0.28   0.41   0.20     0.20     0.59
Appendix: AIK results
Coverage rate of the 95% confidence interval (N=100, L=2, M=5000 replications)

DGP   2SLS   LIML   UJIVE1   UJIVE2   OLS
1     0.91   0.96   0.96     0.96     0.00
2     0.31   0.94   0.94     0.94     0.00
3     0.57   0.97   0.97     0.95     0.03
4     0.00   0.71   0.71     0.71     0.00
5     0.38   0.93   0.93     0.94     0.00
Appendix: AIK results, notes
The DGP processes for each model are as follows:
1. A single overidentifying restriction; K=3, L=2 and ρ = 0.25.
2. A large number of instruments relative to the number of regressors; K=21, L=2 and ρ = 0.25.
3. The model is non-linear and heteroskedastic; K=21, L=2 and ρ = 1.0.
4. The true reduced-form coefficients are set to zero for all instruments, in an attempt to ascertain how misleading the estimators might be in this non-identified case; K=21, L=2 and ρ = 0.25.
5. AIK introduce a misspecification: one of the instruments has incorrectly been left out of the main regression; K=21, L=2 and ρ = 0.25.
Appendix: BD results
Results of simulations, ρ = 0.2, M=1000 replications
(first row of each statistic: n = 500; second row: n = 4000)

Stat          2SLS    LIML    UJIVE   USSIV   OLS
MEAN BIAS     54.2    -4.3    -12.8   -6.5    241
              6.8     -0.5    -1.3    -0.4    243.9
MEDIAN BIAS   0.414   0.465   0.484   0.674   1.199
              0.135   0.136   0.140   0.179   1.223
RMSE          0.604   0.673   0.703   0.987   1.23
              0.196   0.198   0.199   0.275   1.22
SIZE          0.089   0.062   0.036   0.049   0.999
              0.053   0.053   0.046   0.045   1.000
Appendix: BD results, notes
1. The percent bias is defined as (δ̂ − δ)/δ.
2. The first and second rows for each statistic refer to sample sizes 500 and 4000 respectively.
Appendix: BD results
Results of simulations, ρ = 0.6, M=1000 replications
(first row of each statistic: n = 500; second row: n = 4000)

Stat          2SLS    LIML    UJIVE   USSIV   OLS
MEAN BIAS     182.5   -12.8   -50     -25.9   587.4
              25.1    -1.2    -4.2    -1.6    590
MEDIAN BIAS   0.922   0.467   0.570   0.735   2.396
              0.164   0.137   0.137   0.184   2.951
RMSE          1.02    0.694   0.929   1.210   2.94
              0.226   0.199   0.205   0.281   2.95
SIZE          0.512   0.068   0.054   0.044   1.00
              0.108   0.052   0.043   0.037   1.00