Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments

Tak Wai Chau

February 20, 2014

Abstract

This paper investigates the finite sample performance of a minimum distance estimator for the linear model with endogenous regressors and corresponding instrumental variables. The estimator involves two stages. The first stage estimates the reduced form equations. The second stage uses the coefficient estimates and their estimated variance matrices to construct a continuous updating minimum distance estimator for the structural coefficients of the endogenous regressors. This estimator is an extension of LIML when heteroscedasticity exists, and it substantially reduces the computational burden relative to GMM-CUE. A test of a coefficient based on the difference in the objective function, analogous to the LR test, is also considered. The simulation study compares this minimum distance estimator with the LIML and GMM-CUE estimators. Under heteroscedasticity, the GMM-CUE and minimum distance estimators are more efficient than LIML. The efficiency improvement is more prominent when the degree of overidentification is large. The tests tend to over-reject when asymptotic critical values are used. However, with the restricted efficient bootstrap method proposed by Davidson and MacKinnon (2008), the size of the test is very close to its nominal value, and the power of the minimum distance test is usually the best among the three.

Keywords: instrumental variables; minimum distance estimator; limited information maximum likelihood estimator; generalized method of moments; weak instrument.

JEL codes: C13, C15, C36

School of Economics, Shanghai University of Finance and Economics, and Key Laboratory of Mathematical Economics (SUFE), Ministry of Education. 777 Guoding Road, Yangpu, Shanghai, 200433, China. Email: zhou.dewei@mail.shufe.edu.cn.

1 Introduction

This paper investigates the finite sample performance of a minimum distance estimator for the linear model with endogenous regressors and corresponding instrumental variables. The estimator involves two stages. The first stage estimates the reduced form equations. The second stage uses the coefficient estimates and the associated variance matrix estimate to construct a continuous updating minimum distance estimator for the structural coefficients of the endogenous regressors. With the usual homoscedasticity-only variance, the estimator is the same as LIML. However, if there is heteroscedasticity, using a heteroscedasticity-robust variance matrix in the second stage can potentially provide a more efficient estimator. The minimum distance estimator can also substantially reduce the computational burden relative to continuous updating GMM, because the non-linear optimization depends only on the reduced form coefficients and their variance matrix instead of on data from each individual observation. A test of a coefficient value based on the difference in the value of the objective function with and without the restriction, analogous to the LR test, is also considered.

The use of minimum distance estimators with instrumental variable estimation is more popular for limited dependent variable models, as in Amemiya (1978, 1979), Newey (1987), Lee (1992) and Magnusson (2010), and of these only Magnusson (2010) addresses its use under weak instruments. In this paper, the estimator is investigated in the context of the linear model. I concentrate on its finite sample performance under weak instruments, and on the performance of the test statistic based on the difference in the value of the objective function with and without the restriction. Hansen et al. (1996) discuss the continuous updating form of GMM, which has properties similar to those of LIML, such as low median bias under moderately weak instruments but wider dispersion (indeed a lack of finite sample moments).
They also discuss the use of the objective function as a test statistic under continuous updating and find that it is more reliable for the over-identifying restriction test. Stock and Wright (2000) derive the asymptotic properties of GMM under weak instruments, and their simulation results show that the difference in the objective function under continuous updating is more reliable, though not fully robust to weak instruments. Thus in this paper, the continuous updating forms of GMM and minimum distance are considered.

The simulation study shows that this minimum distance estimator shares some properties with LIML and the continuous updating GMM estimator; for example, it is approximately median unbiased unless the instruments are very weak. However, as expected, under heteroscedasticity the GMM-CUE and minimum distance estimators can be more efficient. The efficiency improvement is more prominent when the degree of over-identification is large. The tests tend to over-reject when asymptotic critical values are used. However, with the restricted bootstrap proposed by Davidson and MacKinnon (2008), the size of the test is close to its nominal value, and the power using the minimum distance objective function is sometimes better than that using the GMM objective function.

In what follows, the model and the minimum distance estimator are specified in Section 2. In Section 3, the simulation scheme and results are presented. Section 4 concludes.

2 Model and Estimators

Consider the standard linear model with endogenous regressors and corresponding instruments for each endogenous variable,

$$ y = X_1 \beta_1 + X_2 \beta_2 + \varepsilon = X\beta + \varepsilon \tag{1} $$

$$ X_1 = Z_1 \alpha_1 + X_2 \alpha_2 + u = Z\alpha + u \tag{2} $$

where $y$ is an $n \times 1$ vector of the dependent variable, $X_1$ and $X_2$ are $n \times l_1$ and $n \times l_2$ matrices of the endogenous and exogenous regressors of the structural equation respectively, and $Z_1$ is an $n \times K$ matrix of the instrumental variables for $X_1$. $X$ and $Z$ include all regressors in the structural and first-stage regressions respectively. For simplicity, I consider $l_1 = 1$.

The reduced form equation for the dependent variable $y$ on all exogenous variables can be written as

$$ y = Z_1 \gamma_1 + X_2 \gamma_2 + v. \tag{3} $$

Notice that putting (2) into (1), we have

$$ y = (Z_1 \alpha_1 + X_2 \alpha_2 + u)\beta_1 + X_2 \beta_2 + \varepsilon = Z_1 (\alpha_1 \beta_1) + X_2 (\alpha_2 \beta_1 + \beta_2) + u\beta_1 + \varepsilon \tag{4} $$

This provides the structural restrictions on the reduced form relations:

$$ \gamma_1 = \alpha_1 \beta_1 \tag{5} $$

$$ \gamma_2 = \alpha_2 \beta_1 + \beta_2 \tag{6} $$

If we estimate equations (2) and (3) for $\hat\alpha_1$, $\hat\alpha_2$, $\hat\gamma_1$ and $\hat\gamma_2$ by OLS,¹ we can then obtain estimators for the structural parameters $\beta_1$ and $\beta_2$ through (5) and (6). From (6), once $\beta_1$ is fixed, say at $\tilde\beta_1$, we have $\hat\beta_{2j} = \hat\gamma_{2j} - \hat\alpha_{2j} \tilde\beta_1$ for the $j$th coefficient of $\hat\beta_2$, so each coefficient in $\hat\beta_2$ can be calculated uniquely. Consequently, these restrictions do not provide information for the identification of $\beta_1$, and so $\beta_1$ is solved using only the restrictions (5).

The minimum distance (MD) estimator considered here is a two-step estimator. First, obtain $\hat\alpha_1$, $\hat\alpha_2$, $\hat\gamma_1$ and $\hat\gamma_2$ by OLS, together with the associated covariance matrix. Second, choose $\beta_1$ to solve

$$ \min_{\beta_1} \; (\hat\gamma_1 - \hat\alpha_1 \beta_1)' \, \hat V^{-1} (\hat\gamma_1 - \hat\alpha_1 \beta_1) \tag{7} $$

where $\hat V = \hat V_{\gamma_1} - \beta_1 (\hat V_{\gamma_1,\alpha_1} + \hat V_{\alpha_1,\gamma_1}) + \beta_1^2 \hat V_{\alpha_1}$, with $\hat V_h = \widehat{\mathrm{Var}}(h)$ for any coefficient estimator $h$ from OLS applied to the reduced form regressions above, and $\hat V_{h,j} = \widehat{\mathrm{cov}}(h, j)$ for any coefficient estimators $h$ and $j$. Because $\hat V$ depends on $\beta_1$, the weight matrix is continuously updated in $\beta_1$.
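The second step in (7) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `md_objective` and `md_cue`, the search interval, and the grid-refinement search are my own choices (any one-dimensional optimizer could be used instead), and the numerical inputs are made up so that the restriction (5) holds exactly at $\beta_1 = 1$.

```python
import numpy as np


def md_objective(beta1, gamma1, alpha1, V_g, V_a, V_ga):
    """Objective in (7): quadratic form in gamma1 - alpha1 * beta1.

    V_g and V_a are the K x K variance blocks of the estimated gamma1 and
    alpha1; V_ga is their covariance block. The weight matrix is rebuilt at
    every beta1, which is what makes the estimator continuous updating.
    """
    d = gamma1 - alpha1 * beta1
    V = V_g - beta1 * (V_ga + V_ga.T) + beta1**2 * V_a
    return float(d @ np.linalg.solve(V, d))


def md_cue(gamma1, alpha1, V_g, V_a, V_ga, lo=-10.0, hi=10.0,
           n_grid=2001, n_refine=4):
    """Hypothetical helper: grid search on [lo, hi], refined around the best point."""
    for _ in range(n_refine):
        grid = np.linspace(lo, hi, n_grid)
        vals = np.array([md_objective(b, gamma1, alpha1, V_g, V_a, V_ga)
                         for b in grid])
        j = int(np.argmin(vals))
        step = grid[1] - grid[0]
        lo, hi = grid[j] - step, grid[j] + step  # shrink the bracket
    return float(grid[j]), float(vals[j])


# Made-up inputs (not from the paper): gamma1 = alpha1 * 1.0 exactly.
alpha1 = np.array([0.3, 0.2, 0.4])
gamma1 = 1.0 * alpha1
V_g, V_a, V_ga = 0.02 * np.eye(3), 0.01 * np.eye(3), 0.005 * np.eye(3)

beta_hat, q_min = md_cue(gamma1, alpha1, V_g, V_a, V_ga)

# Difference statistic for H0: beta1 = 0.75
# (restricted objective minus unrestricted minimum).
jd = md_objective(0.75, gamma1, alpha1, V_g, V_a, V_ga) - q_min
```

Only $K$-dimensional vectors and $K \times K$ matrices enter the loop, so the cost of each objective evaluation does not grow with the sample size. With weak instruments the objective may have several local minima, which is why a coarse global grid is used here before refining.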
The continuous updating forms of minimum distance and GMM are considered here because they are generally closer to LIML under weak instruments, for example in having small median bias, while the usual two-step GMM and MD estimators are closer to 2SLS, with substantial weak instrument bias. The drawback is that they also share with LIML the property of having no finite sample moments, even under a larger degree of over-identification. The variance estimator considered here is the heteroscedasticity-robust form, since the advantage of going beyond LIML lies mainly in the potentially better performance under heteroscedasticity (and autocorrelation when time series information is involved). My simulation results, not reported here, also show that the continuous updating GMM and minimum distance estimators are exactly the same as LIML if the usual homoscedasticity-only variance matrix is used for the weight matrix.

The variance matrices considered are of the form

$$ \hat V_h = (Z'Z)^{-1} Z' \Omega_h Z (Z'Z)^{-1} \tag{8} $$

for $h = \gamma_1, \alpha_1$ and also for the covariance term between $\gamma_1$ and $\alpha_1$, where

$$ \Omega_{\gamma_1} = \mathrm{diag}\big(\hat v_i^2 / (1 - h_i)\big) \tag{9} $$

$$ \Omega_{\alpha_1} = \mathrm{diag}\big(\hat u_i^2 / (1 - h_i)\big) \tag{10} $$

$$ \Omega_{\alpha_1,\gamma_1} = \mathrm{diag}\big(\hat u_i \hat v_i / (1 - h_i)\big) \tag{11} $$

$$ h_i = \big(Z(Z'Z)^{-1}Z'\big)_{ii} \tag{12} $$

which corresponds to HC2 proposed by MacKinnon and White (1985).² In (9) to (12), $h_i$ denotes the $i$th diagonal element of the projection matrix, not a coefficient estimator.

One important advantage of this minimum distance method compared with continuous updating GMM (GMM-CUE) is that it saves computational cost in the non-linear optimization problem. For GMM-CUE, the optimal value has to be obtained over a number of iterations, and data from all observations are involved in calculating the objective function at each one. For the MD-CUE estimator, however, the nonlinear optimization involves only the reduced form coefficients, the first-stage coefficients and the associated covariance matrices. Furthermore, we concentrate on the restrictions (5) and do not need to consider the coefficients on exogenous regressors, which further reduces the dimension of the matrices involved. In this way, the computational burden of each iteration is substantially reduced.

¹ SUR will provide the same estimators because the right-hand-side variables are the same.
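The first step, OLS on the reduced forms with the HC2 blocks of equations (8) to (12), might look as follows in Python. This is a sketch under assumptions: the function name `hc2_reduced_form` and the simulated inputs are hypothetical, and `Z` is taken to hold all reduced form regressors (here a constant followed by the instruments), so the blocks for $\gamma_1$ and $\alpha_1$ are obtained by dropping the row and column of the constant.

```python
import numpy as np


def hc2_reduced_form(y, x1, Z):
    """OLS of y and x1 on Z with HC2 sandwich covariance matrices, eqs (8)-(12)."""
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    gamma = ZtZ_inv @ Z.T @ y            # reduced form coefficients, eq (3)
    alpha = ZtZ_inv @ Z.T @ x1           # first-stage coefficients, eq (2)
    v = y - Z @ gamma                    # reduced form residuals
    u = x1 - Z @ alpha                   # first-stage residuals
    # Leverage h_i = z_i' (Z'Z)^{-1} z_i, eq (12)
    h = np.einsum("ij,jk,ik->i", Z, ZtZ_inv, Z)
    w = 1.0 - h

    def sandwich(omega_diag):
        # (Z'Z)^{-1} Z' diag(omega) Z (Z'Z)^{-1}, eq (8)
        return ZtZ_inv @ (Z.T * omega_diag) @ Z @ ZtZ_inv

    V_g = sandwich(v**2 / w)             # eq (9)
    V_a = sandwich(u**2 / w)             # eq (10)
    V_ga = sandwich(u * v / w)           # eq (11)
    return gamma, alpha, V_g, V_a, V_ga


# Made-up data for illustration only.
rng = np.random.default_rng(0)
n, K = 100, 3
Z1 = rng.standard_normal((n, K))
Z = np.column_stack([np.ones(n), Z1])    # constant plus K instruments
x1 = Z1 @ np.full(K, 0.3) + rng.standard_normal(n)
y = 1.0 * x1 + rng.standard_normal(n)

gamma, alpha, V_g, V_a, V_ga = hc2_reduced_form(y, x1, Z)

# Keep only the blocks for the K instrument coefficients (drop the constant).
V_g1, V_a1, V_ga1 = V_g[1:, 1:], V_a[1:, 1:], V_ga[1:, 1:]
```

These quantities are computed once from the data. The nonlinear search over $\beta_1$ then works only with `gamma[1:]`, `alpha[1:]` and the three $K \times K$ blocks.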
This is a great advantage when more computationally intensive methods, such as the bootstrap, are used.

This paper compares the finite sample performance of the MD-CUE estimator with 2SLS, LIML and GMM-CUE. Since LIML and the continuous-updating forms of GMM and MD do not have finite sample moments, as can be observed in the simulation, I compare the medians of the estimators to assess the bias. Similarly, root mean squared errors do not exist, and so I use median absolute errors to compare the typical distance of each estimator from the true value.

Besides estimation, hypothesis testing is also important in statistical inference with such models. Here I consider tests using the LR principle, which are generally more robust to weak instruments than Wald tests. (See Andrews, Moreira and Stock, 2007.) Besides the LR test, I also consider the difference in the objective function between the restricted and unrestricted models for GMM and MD as test statistics for the restriction on the value of $\beta_1$. First, I apply the asymptotic distribution, under which the test statistics are distributed as Chi-square with 1 degree of freedom. (See Wooldridge, 2010, p.227 and 547.) Second, I consider the restricted bootstrap test proposed by Davidson and MacKinnon (2008) to find the critical values of the test statistics, and compare the sample statistics with the bootstrap critical values. I use the wild bootstrap proposed by Davidson and MacKinnon (2010), since it is simple but valid under heteroscedasticity of unknown form. Besides comparing the size at the true value, I also compare the rejection probabilities at points other than the true value to compare the power of the tests.

² The results from using HC1, which involves only the degrees of freedom adjustment, are similar.

3 Simulation Results

3.1 Simulation Scheme

The data generating process I consider in this simulation exercise is as follows.

$$ y = \beta_2 + \beta_1 X_1 + \varepsilon \tag{13} $$

$$ X_1 = \alpha_2 + \alpha_1 Z_1 + u \tag{14} $$

where $X_1$ includes one single endogenous regressor, and no exogenous regressors are considered besides a constant term, which is standard in practice. $Z_1$ contains $K$ instruments not included in the structural equation (13). I consider $K = 3$ and $K = 15$ to represent low and high degrees of overidentification under a sample size of $n = 100$. $\beta_1$ is set to 1, while $\beta_2 = \alpha_2 = 0$. $\alpha_1$ is set so that the concentration parameter normalized by the number of instruments equals 1, 3, 6 and 15.³ This is used as the benchmark because the first-stage F statistic, the statistic usually used for judging and testing for weak instruments, has an expectation of approximately this normalized concentration parameter plus 1. For 2SLS, the rule of thumb for avoiding weak instruments is a first-stage F above 10. But LIML and GMM-CUE are generally more robust under weak instruments, and so I have chosen a few values below 10 for comparison.
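The link between the normalized concentration parameter and the first-stage F statistic can be checked with a short Monte Carlo sketch. The assumptions here are mine and are only one way to hit the target: every element of $\alpha_1$ is set to $\sqrt{c/n}$ for a target normalized value $c$, the first-stage error has unit variance, and the instruments are standard normal. The helper name `first_stage_F` is hypothetical.

```python
import numpy as np


def first_stage_F(x1, Z1):
    """F statistic for the joint significance of the instruments in the first stage."""
    n, K = Z1.shape
    Z = np.column_stack([np.ones(n), Z1])
    coef, *_ = np.linalg.lstsq(Z, x1, rcond=None)
    rss_u = np.sum((x1 - Z @ coef) ** 2)       # unrestricted: constant + instruments
    rss_r = np.sum((x1 - x1.mean()) ** 2)      # restricted: constant only
    return ((rss_r - rss_u) / K) / (rss_u / (n - K - 1))


rng = np.random.default_rng(0)
n, K, c = 100, 3, 6.0                          # c: target normalized concentration
alpha1 = np.full(K, np.sqrt(c / n))            # assumption: equal coefficients

F_draws = []
for _ in range(2000):
    Z1 = rng.standard_normal((n, K))
    x1 = Z1 @ alpha1 + rng.standard_normal(n)
    F_draws.append(first_stage_F(x1, Z1))

mean_F = float(np.mean(F_draws))               # should be roughly c + 1
```

With $c = 6$ the average F should come out near 7, which is below the conventional threshold of 10 and therefore in the range the simulation is designed to explore.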
$Z_1$ is simulated from independent standard normal distributions, that is, $Z_1 \sim N(0, I_K)$. In the simulation we allow for heteroscedasticity in order to investigate the performance of continuous updating GMM and MD under heteroscedasticity. First, $u$ and $v$ are simulated from independent standard normal distributions. Then the structural error in setting 1 is constructed as

$$ \varepsilon_i = \exp\Big(\frac{a_1}{K} \sum_{k=1}^{K} z_{ki}\Big) \Big(\rho u_i + \sqrt{1 - \rho^2}\, v_i\Big) \tag{15} $$

where $\rho$ measures the correlation between the structural and reduced form errors, and $a_1$ is a parameter controlling the degree of heteroscedasticity. The heteroscedasticity factor has a mean near 1, while dividing the sum of the $z$ by $K$ helps to maintain the degree of heteroscedasticity when $K$ varies. In an alternative setting, only the first instrument affects the variance of the structural error, so that

$$ \varepsilon_i = \exp(a_1 z_{1i}) \Big(\rho u_i + \sqrt{1 - \rho^2}\, v_i\Big) \tag{16} $$

and we call this setting 2. In the simulation exercise, I have set $a_1 = 2$. The numbers of repetitions used in the simulation for the results with and without the bootstrap are 2500 and 25000 respectively.

³ In particular, $\alpha_1 = (\,)/n$.

3.2 Simulation Results

Figures 1-12 present the simulation results of the above model for the estimators under various strengths of instruments, degrees of over-identification, and correlations between the reduced form and structural errors. Figures 1-6 are the results for setting 1, while Figures 7-12 are the results for setting 2.

Figure 1 presents the median of the estimators for $\beta_1$, from which we can assess the degree of bias of each estimator. Generally, the 2SLS estimator has a larger median bias, while LIML, GMM-CUE and MD-CUE all have little bias. Across different strengths of instruments, the biases of the latter three are substantial only when the instruments are as weak as when the normalized concentration parameter is 1 (that is, when the first-stage F is around 2). One of the continuous-updating estimators has a slightly higher bias than the other two in some cases. In summary, MD remains robust to moderately weak instruments in terms of bias, as do LIML and GMM-CUE.

Figure 2 presents the results for median absolute errors (MAE). GMM-CUE and MD-CUE generally have lower MAE than LIML, while MD-CUE has lower MAE than GMM-CUE in the cases with a large degree of over-identification. 2SLS can have lower MAE than the other estimators, even though its bias is often substantially larger. Thus, there is a trade-off between bias and error between 2SLS and the other three estimators.
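The two summary measures behind Figures 1 and 2 are simple to compute from a vector of Monte Carlo draws. The sketch below is illustrative only: the function name `median_bias_and_mae` and the five draws are made up, with one extreme draw included to mimic the heavy tails of estimators that lack finite sample moments.

```python
import numpy as np


def median_bias_and_mae(estimates, beta_true):
    """Median bias and median absolute error of Monte Carlo estimates."""
    est = np.asarray(estimates, dtype=float)
    median_bias = float(np.median(est) - beta_true)
    mae = float(np.median(np.abs(est - beta_true)))
    return median_bias, mae


# Made-up draws with one extreme value, as can occur under weak instruments.
draws = np.array([0.8, 1.1, 0.95, 1.3, -40.0])
bias, mae = median_bias_and_mae(draws, beta_true=1.0)
mean_bias = float(draws.mean() - 1.0)   # dominated by the single outlier
```

The mean-based bias is driven almost entirely by the single extreme draw, while the median-based measures are unaffected by how extreme that draw is, which is why they are the ones reported here.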
Figure 3 shows the actual size of the three tests based on the LR principle, namely LR and the difference statistics for GMM-CUE and MD-CUE (also known as difference in J, or JD), using the asymptotic critical values. The LR test over-rejects substantially, at about 35% for a test of nominal size 5%. This is not surprising, because the statistic has not been adjusted for heteroscedasticity. The other two tests, the difference statistics, also tend to over-reject, and the problem is especially serious when the degree of overidentification is large, where the actual size is over 15%. Consequently, the use of asymptotic critical values under weak or moderately weak instruments is not recommended for any of these three statistics.

Figures 4 to 6 show the results for tests performed under the restricted efficient wild bootstrap of Davidson and MacKinnon (2008, 2010). Figure 4 shows the actual size for the nominal 5% test. The performance of this bootstrap method is very good. The actual size deviates noticeably from the nominal size only when the instruments are very weak (with the normalized concentration parameter equal to 1).⁴ In the other cases, the nominal size and actual size are very close, giving us confidence in the test results. Figures 5 and 6 show the power of the tests when testing the nulls $\beta = 0.75$ and $\beta = 1.25$ respectively. The results show that the JD-MD test generally has higher power than the JD-GMM test, while the LR test using bootstrap critical values also performs well. The advantage of MD is more substantial when the degree of over-identification is large.

⁴ Thus, it is not fully robust to weak instruments. However, this may not be a serious matter, as very weak instruments are generally uninformative about the structural parameters and are not used in practice.

Figures 7-12 are the results for the alternative setting, where the heteroscedasticity is related to only one of the instruments. The results are similar to those for setting 1, and so the findings are not specific to setting 1.

In summary, these results show that MD-CUE has low median bias, like LIML and GMM-CUE, has lower median absolute errors than LIML and sometimes even than GMM-CUE, and is more powerful than LR and JD-GMM in the restricted efficient bootstrap test using the difference statistic. However, the asymptotic Chi-square distribution should not be used to obtain the critical values, as the rate of over-rejection is substantial. This result is not surprising, as the LR test has to be modified into the CLR test of Moreira (2003) in order to be valid under weak instruments. The wild restricted efficient bootstrap of Davidson and MacKinnon (2008) provides very good size for the tests even when instruments are moderately weak.

4 Conclusion

The results of the simulation exercise generally show that MD-CUE can provide a better estimator and better test results than LIML under heteroscedasticity, while being computationally much faster than GMM-CUE. This also makes computationally intensive techniques such as the bootstrap substantially less burdensome. The results also show that for the LR and difference statistics, asymptotic distributions are not reliable, while the restricted efficient bootstrap proposed by Davidson and MacKinnon (2008) has very good size under moderately weak instruments.
Further theoretical and simulation research can be fruitful in determining why MD-CUE can improve efficiency and power relative to GMM-CUE, especially under a large degree of over-identification, and in investigating whether a conditional approach to obtaining critical values is possible for the JD statistics. This method can also be extended to cases where errors are autocorrelated, such as in dynamic panel data models.

References

[1] Amemiya, T. (1978) The estimation of a simultaneous equation generalized probit model, Econometrica, 46, 1193-1205.

[2] Amemiya, T. (1979) The estimation of a simultaneous equation tobit model, International Economic Review, 20, 169-181.

[3] Andrews, D.W.K., Moreira, M.J. and Stock, J.H. (2007) Performance of conditional Wald tests in IV regression with weak instruments, Journal of Econometrics, 139, 116-132.

[4] Davidson, R. and MacKinnon, J.G. (2008) Bootstrap inference in a linear equation estimated by instrumental variables, Econometrics Journal, 11, 443-477.

[5] Davidson, R. and MacKinnon, J.G. (2010) Wild bootstrap tests for IV regression, Journal of Business and Economic Statistics, 28(1), 128-144.

[6] Hansen, L.P., Heaton, J. and Yaron, A. (1996) Finite-sample properties of some alternative GMM estimators, Journal of Business and Economic Statistics, 14(3), 262-280.

[7] Lee, L-F. (1992) Amemiya's generalized least squares and tests of over-identification in simultaneous equation models with qualitative or limited dependent variables, Econometric Reviews, 11(3), 319-328.

[8] MacKinnon, J.G. and White, H. (1985) Some heteroscedasticity-consistent covariance matrix estimators with improved finite sample properties, Journal of Econometrics, 29, 305-325.

[9] Magnusson, L.M. (2010) Inference in limited dependent variable models robust to weak identification, Econometrics Journal, 13, S56-S79.

[10] Moreira, M.J. (2003) A conditional likelihood ratio test for structural models, Econometrica, 71(4), 1027-1048.

[11] Newey, W. (1987) Efficient estimation of limited dependent variable models with endogenous explanatory variables, Journal of Econometrics, 36, 231-250.

[12] Stock, J.H. and Wright, J.H. (2000) GMM with weak identification, Econometrica, 68(5), 1055-1096.

[13] Wooldridge, J.M. (2010) Econometric Analysis of Cross Section and Panel Data, 2nd edition, MIT Press: Cambridge.

Figure 1: Median of the estimators for setting 1 [four panels; vertical axis: median of estimators]

Note: The true value is β = 1, where the horizontal line is drawn. K refers to the number of instruments for the single endogenous variable, µ is the concentration parameter, and ρ is the correlation between the structural and first-stage error terms. Setting 1 involves heteroscedasticity that is related to all instruments.

Figure 2: Median absolute errors for setting 1 [four panels; vertical axis: median absolute errors]

Note: K refers to the number of instruments for the single endogenous variable, µ is the concentration parameter, and ρ is the correlation between the structural and first-stage error terms. Setting 1 involves heteroscedasticity that is related to all instruments.

Figure 3: Size of the tests using the usual asymptotic distribution for setting 1 [four panels]

Note: The nominal size is 5%, which is where the horizontal line is drawn. K refers to the number of instruments for the single endogenous variable, µ is the concentration parameter, and ρ is the correlation between the structural and first-stage error terms. Setting 1 involves heteroscedasticity that is related to all instruments.

Figure 4: Size of RE bootstrap tests for setting 1 [four panels]

Note: The nominal size is 5%, which is where the horizontal line is drawn. K refers to the number of instruments for the single endogenous variable, µ is the concentration parameter, and ρ is the correlation between the structural and first-stage error terms. Setting 1 involves heteroscedasticity that is related to all instruments. The RE bootstrap test is the one proposed by Davidson and MacKinnon (2008), where the wild bootstrap is used.

Figure 5: Power of RE bootstrap test at β = 0.75 for setting 1 [four panels]

Note: The nominal size is 5%, which is where the horizontal line is drawn. The true value is at β = 1. K refers to the number of instruments for the single endogenous variable, µ is the concentration parameter, and ρ is the correlation between the structural and first-stage error terms. Setting 1 involves heteroscedasticity that is related to all instruments. The RE bootstrap test is the one proposed by Davidson and MacKinnon (2008), where the wild bootstrap is used.

Figure 6: Power of RE bootstrap test at β = 1.25 for setting 1 [four panels]

Note: The nominal size is 5%, which is where the horizontal line is drawn. The true value is at β = 1. K refers to the number of instruments for the single endogenous variable, µ is the concentration parameter, and ρ is the correlation between the structural and first-stage error terms. Setting 1 involves heteroscedasticity that is related to all instruments. The RE bootstrap test is the one proposed by Davidson and MacKinnon (2008), where the wild bootstrap is used.

Figure 7: Median of the estimators for setting 2 [four panels; vertical axis: median of estimators]

Note: The true value is β = 1, where the horizontal line is drawn. K refers to the number of instruments for the single endogenous variable, µ is the concentration parameter, and ρ is the correlation between the structural and first-stage error terms. Setting 2 involves heteroscedasticity that is related only to the first instrument.

Figure 8: Median absolute errors for setting 2 [four panels; vertical axis: median absolute errors]

Note: K refers to the number of instruments for the single endogenous variable, µ is the concentration parameter, and ρ is the correlation between the structural and first-stage error terms. Setting 2 involves heteroscedasticity that is related only to the first instrument.

Figure 9: Size of tests using the usual asymptotic distribution for setting 2 [four panels]

Note: The nominal size is 5%, which is where the horizontal line is drawn. K refers to the number of instruments for the single endogenous variable, µ is the concentration parameter, and ρ is the correlation between the structural and first-stage error terms. Setting 2 involves heteroscedasticity that is related only to the first instrument.

Figure 10: Size of test using RE bootstrap for setting 2 [four panels]

Note: The nominal size is 5%, which is where the horizontal line is drawn. K refers to the number of instruments for the single endogenous variable, µ is the concentration parameter, and ρ is the correlation between the structural and first-stage error terms. Setting 2 involves heteroscedasticity that is related only to the first instrument. The RE bootstrap test is the one proposed by Davidson and MacKinnon (2008), where the wild bootstrap is used.

Figure 11: Power of test at the null β = 0.75 using RE bootstrap for setting 2 [four panels]

Note: The nominal size is 5%, which is where the horizontal line is drawn. The true value is at β = 1. K refers to the number of instruments for the single endogenous variable, µ is the concentration parameter, and ρ is the correlation between the structural and first-stage error terms. Setting 2 involves heteroscedasticity that is related only to the first instrument. The RE bootstrap test is the one proposed by Davidson and MacKinnon (2008), where the wild bootstrap is used.

Figure 12: Power of test at the null β = 1.25 using RE bootstrap for setting 2 [four panels]

Note: The nominal size is 5%, which is where the horizontal line is drawn. The true value is at β = 1. K refers to the number of instruments for the single endogenous variable, µ is the concentration parameter, and ρ is the correlation between the structural and first-stage error terms. Setting 2 involves heteroscedasticity that is related only to the first instrument. The RE bootstrap test is the one proposed by Davidson and MacKinnon (2008), where the wild bootstrap is used.