INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION VICTOR CHERNOZHUKOV CHRISTIAN HANSEN MICHAEL JANSSON Abstract. We consider asymptotic and finite-sample confidence bounds in instrumental variables quantile regressions of wages on schooling with relatively weak instruments. We find large and practically important differences between the asymptotic and finite-sample interval estimates.. Introduction The ability of quantile regression (QR) to characterize the heterogeneous impact of variables on different points of an outcome distribution makes it appealing in many economic applications; see Koenker (2005) for an excellent overview of quantile regression which includes a number of interesting applications. As with most econometric estimators, inference for objects estimated with quantile regression is typically carried out using asymptotic approximations. Asymptotic results for the case in which all covariates are exogenous were derived in Koenker and Bassett (978). These results are extended in Chernozhukov and Hansen (200) for the case with endogenous covariates when instrumental variables are available. As with conventional instrumental variables models, one might worry that the usual asymptotic approximation may perform quite poorly in cases in which the statistical relationship between the instruments and the endogenous variable is weak; Chernozhukov and Hansen (2005) offers a simple approach to inference that will be asymptotically valid in cases where there is little or no relationship between the instruments and endogenous variables. However, when one is considering models defined by quantile restrictions there is one more possibility: Chernozhukov, Hansen, and Jansson (2005) show that it is possible to construct exact finite sample confidence regions for parameters of a model defined by quantile restrictions under minimal conditions which do not require the imposition of distributional assumptions. In this note, we briefly outline each of the three approaches to obtaining inference statements for quantile regression models with endogeneity. We then compare the three Date: January 20, 2006.
approaches in a simple example due to Card (995) which uses college proximity as an instrument for years of education in a quantile regression of wages on education. We find substantial differences between the three approaches in this case that appear to be due to the weakness of the instruments. Overall, it appears that caution should be used when using asymptotic approximations in quantile regression models with instrumental variables. 2. Inference Approaches We consider a simple random coefficients model with structural equation given by where D and U may be statistically dependent, Y = D α(u) + X β(u), (2.) A. D α(u) + X β(u) is strictly increasing in U for almost every D and X, A2. U U(0, ), and A3. U is independent of X and a vector of observed exogenous variables Z that do not enter the structural equation. That is, D is a vector of potentially endogenous variables with random coefficients α(u); X is a vector of observed exogenous variables that enter the structural equation with random coefficients β(u), and Z is a vector of exogenous variables that are excluded from the structural equation where it is assumed that dim(z) dim(d). Note that this setup can incorporate the conventional linear quantile regression model of Koenker and Bassett (978) where α(u) 0 and Z = X. Also, for the finite sample approach, linearity is not important, but it is necessary that the quantile functions be indexed by a finite-dimensional parameter vector. All of the inference approaches we consider may then be motivated from the following. Under conditions A.-A3., the problem of potential dependence between U and D is overcome through the presence of instrumental variables, Z, that affect the determination of D but are independent of U. The presence of the instrumental variable leads to a set of moment equations that can be used to estimate the parameters of (2.). From (2.) and the monotonicity assumed in A., the event {Y D α(τ) + X β(τ)} is equivalent to the event {U τ}. It then follows that P[Y D α(τ) + X β(τ) Z, X] = τ. (2.2) 2
Equation (2.2) provides a moment restriction that can be used to estimate the structural parameters α(τ) and β(τ). For example, point estimation and conventional asymptotic inference may be done in the usual fashion by using this set of moment restrictions to form a GMM estimator; see Pakes and Pollard (989). A somewhat different approach based on these moment conditions may be computationally more attractive than GMM in many situations. In particular, equation (2.2) implies that 0 is the τ th conditional quantile of Y D α(τ) X β(τ) given Z and X. This suggests that a procedure for estimating α(τ) and β(τ) for a given quantile index τ may be defined as follows. For a given value of the structural parameter, say α, run the ordinary QR of Y D α on X and Z to obtain ( β(α, τ), γ(α, τ)) where γ(α, τ) are the estimated coefficients on the excluded instruments Z. To find an estimate for α(τ), one then finds a value for α that makes the coefficient on the instrumental variable γ(α, τ) as close to 0 as possible. That is, α(τ) = arg inf α A n[ γ(α, τ) ]Â(α)[ γ(α, τ)], (2.3) where A is the parameter space for α, Â(α) = A(α) + o p () and A(α) is positive definite uniformly in α A. It is convenient to set A(α) equal to the inverse of the asymptotic covariance matrix of n( γ(α, τ) γ(α, τ)) in which case W n (α) is the Wald statistic for testing γ(α, τ) = 0. The parameter estimates are then given by ( α(τ), β(τ) ) = ( α(τ), β( α(τ), τ) ). Chernozhukov and Hansen (200) and Chernozhukov and Hansen (2005) provide additional details and verify that the above procedure consistently estimates α(τ) and β(τ) and find the limiting distribution from which conventional asymptotic inference immediately follows. The above procedure may also be modified to obtain confidence intervals that will be asymptotically valid when there is only a weak or even no statistical relationship between D and Z. The idea is to base inference on the Wald statistic W n (α) for testing whether the coefficients on the instruments are zero (i.e. whether γ(α, τ) = 0). The intuition for this approach is that at the true α(τ) the structural equation (2.) implies that the coefficient on the instruments should be 0. Thus, if we fix a value for α(τ), the coefficients of the instruments should be near zero if this value is consistent with the d structural equation. More formally, when α = α(τ), W n (α) χ 2 (dim(γ)). Thus, a valid confidence region for α(τ) can also be based on the inversion of this Wald statistic: CR p [α(τ)] := {α : W n (α) < c p } contains α(τ) with probability approaching p, where c p is the p-percentile of a χ 2 (dim(γ)) distribution. Chernozhukov and Hansen (2005) show that this procedure, which is essentially a quantile version of the procedure of Anderson and Rubin (949), provides a valid confidence region for α(τ) without putting 3
restrictions on the dependence between D and Z. In particular, it will be valid when identification is weak. The final approach to inference we consider is based on the observation that P[Y D α(τ)+x β(τ) Z, X] = τ implies that the event {Y D α(τ)+x β(τ)} is distributed exactly as a Bernoulli(τ) conditional on X and Z regardless of the sample size. This result then implies that any test statistic of α(τ) and β(τ) that depends only on this event, X, and Z will have a distribution that does not depend on any unknown parameters in finite samples and so can be used to construct valid finite sample inference statements. This basic approach extends but is related to work on finite sample inference for the sample (unconditional) quantiles; see, for example, Walsh (960) and MacKinnon (964) who note the existence of finite sample pivots for unconditional quantiles. Chernozhukov, Hansen, and Jansson (2005) make use of the distributional fact mentioned above and consider inference based on the GMM objective function: ( ) L n (α, β) = ( ) n n m i (α, β) W n m i (α, β), (2.4) 2 n n i= where m i (α, β) = [τ (Y i D iα X iβ)](z i, X i). In this expression, W n is a positive definite weight matrix, which is fixed conditional on (X, Z ),..., (X n, Z n ). A convenient and natural choice of W n is given by W n = τ( τ) ( n i= ) n (Z i, X i ) (Z i, X i ), i= which equals the inverse of the variance of n /2 n i= m i(α(τ), β(τ)) conditional on (X, Z ),..., (X n, Z n ). Also, since this conditional variance does not depend on α(τ) or β(τ), the GMM function with W n defined above also corresponds to the objective function of the continuous-updating estimator of Hansen, Heaton, and Yaron (996). Since the statistic defined by (2.4) depends only on X, Z, and {Y D α + X β}, it is conditionally pivotal in finite samples at (α(τ), β(τ) ) ; that is, its distribution at the true parameter values does not depend on any unknown parameters and so exact finite-sample critical values for this statistic may be obtained. Let c n (p) denote the p th quantile of the distribution of the statistic given in (2.4), and note that c n (p) may be obtained through the following algorithm: [Computation of c n (α).] Given ((X i, Z i ), i =,..., n), for j =,..., J:. Draw (U i,j, i n) as iid Uniform, and let (B i,j = (U i,j τ), i n). 2. Compute 4
L n,j = 2 ( n n i= (τ B i,j) (Z i, X i) ) W n ( n n i= (τ B i,j) (Z i, X i) ). 3. Obtain c n (α) as the α-quantile of the sample (L n,j, j =,..., J), for a large number J. Given c n (p), a valid p-level confidence region for (α(τ), β(τ) ) may then be obtained as the set CR(p) {(α, β ) : L n (α, β) c n (p)}. CR(p) defines a joint confidence set for all of the model s parameters from which one can define a confidence bound for a real-valued functional ψ(α(τ), β(τ), τ) as CR(p, ψ) = {ψ(α, β, τ), (α, β ) CR(p)}. It is worth noting that the joint confidence region CR(p) is not conservative; however, confidence bounds for functionals, such as individual parameters, may be. Also, forming confidence regions by inverting a test-statistic will generally be computationally demanding. Chernozhukov, Hansen, and Jansson (2005) present various feasible algorithms for computing confidence regions and bounds in this context. 3. Case Study: Card s (995) Schooling Example We illustrate the different approaches to inference in an example. We estimate a simple wage model that allows for heterogeneity in the effect of schooling on wages as well as other heterogeneity in the effects of other control variables. Because we may be concerned that the level of schooling and earnings may be jointly determined, perhaps due to unobserved ability that affects both earnings and schooling, we instrument for schooling. We employ the same data and identification strategy as Card (995). Specifically, we suppose that the log of the wage is determined by the following linear quantile model: Y = α(u)s + X β(u) where Y is the log of the hourly wage; S is years of completed schooling; X is a vector of control variables consisting of a constant, potential experience (age - schooling - 6), potential experience squared, an indicator which equals if the individual is black, an indicator which equals if the individual resides in an SMSA in 976, an indicator for south, an indicator which equals if the individual lived in an SMSA in 966, and eight dummies for region of residence in 966; and U is an unobservable normalized to follow a uniform distribution over (0,). We might think of U as indexing unobserved ability, in which case α(τ) may be thought of as the return to schooling for an individual with unobserved ability τ. Since we believe that years of schooling may be jointly determined with unobserved ability, we use an indicator which equals if the individual lived near a two year college in 966 and an indicator which equals if the individual lived near a four year college in 966 as instruments for schooling. Further details about the data 5
sources, descriptive statistics, and arguments for the validity of the instruments are available in Card (995). We summarize results for the schooling coefficient obtained via conventional quantile regression and instrumental variables quantile regression in Table. For quantile regression, we report 95%-level confidence intervals computed both the asymptotic approximation and finite-sample approach. For instrumental variables quantile regression, we report 95%-level confidence intervals computed using the asymptotic approximation, the weak-instrument robust approach, and the finite-sample approach. Both the finite-sample and weak-instrument approach require that we invert a test-statistic for a range of potential values for α(τ); in all cases, we consider values in the interval [-.,.5] which we feel is sufficient to cover essentially all plausible values for the returns to an additional year of schooling. The results have a number of interesting features. Looking first at the results in Panel A of Table which treat schooling as exogenous, we see that the point estimates across the various quantiles are quite close, suggesting little heterogeneity in how the quantiles of wages vary with schooling. We also see that the finite-sample and asymptotic results produce qualitatively similar results, though the finite-sample intervals are substantially wider than the asymptotic intervals. In the estimates which treat schooling as endogenous, the differences between the inference approaches are even more striking. The point estimates suggest considerable heterogeneity in the returns to schooling, though even the usual asymptotic intervals are wide enough to make finding statistical evidence for this quite unlikely. Considering next the weak instrument robust intervals, we see that they are substantially wider than the usual asymptotic intervals in all cases. For each quantile we consider, the weak instrument robust intervals include essentially the entire parameter space including extremely large values for the returns to schooling. However, in the two cases in which the usual asymptotic intervals exclude 0, the weak instrument robust intervals do as well. While the weak instrument robust intervals rely on less stringent assumptions than do the usual asymptotic intervals, they still rely on an asymptotic argument. Looking last at the finite sample intervals which are valid under minimal assumptions and do not require any asymptotic argument, we see that they cover the entire parameter space in every case. These striking differences suggest that the asymptotic intervals may be misleading in this case. The likely cause for the large discrepancies is the weakness of the relationship between the instruments and schooling. If one regresses education on the control variables 6
and the two proximity instruments, the F-statistic on the excluded instruments is only 8.32 which is in the range that many would consider weak in the usual linear instrumental variables model. The weakness of the instruments clearly explains the difference between the usual asymptotic and weak instrument asymptotic results. The differences between the weak instrument and finite sample intervals is then likely driven by the fact that with a large number of covariates and weak instruments the effective sample size is quite small. It would be interesting to explore these differences and develop formal procedures for judging differences between asymptotic and finite sample intervals. 4. Conclusion Quantile regression is a useful statistical tool that allows one to parsimoniously capture heterogeneous covariate effects. Quantile regression is also interesting in that one can obtain exact finite-sample inference statements for models defined by quantile restrictions without imposing distributional assumptions. In this note, we have briefly outlined various approaches to inference for quantile regression and illustrated their use in an empirical study of the returns to schooling. We find substantial differences between asymptotic inference statements and finite-sample inference statements that calls into question the validity of the asymptotics in the schooling data used by Card (995). References Anderson, T. W., and H. Rubin (949): Estimation of the parameters of a single equation in a complete system of stochastic equations, Ann. Math. Statistics, 20, 46 63. Card, D. (995): Using Geographic Variation in College Proximity to Estimate the Return to Schooling, in Aspects of Labour Economics: Essays in Honour of John Vanderkamp, ed. by L. Christofides, E. K. Grant, and R. Swindinsky. University of Toronto Press. Chernozhukov, V., and C. Hansen (200): Instrumental Quantile Regression Inference for Structural and Treatment Effect Models, Journal of Econometrics (forthcoming). (2005): Instrumental Variable Quantile Regression, mimeo. Chernozhukov, V., C. Hansen, and M. Jansson (2005): Finite-Sample Inference in Econometric Models via Quantile Restrictions, mimeo. Hansen, L. P., J. Heaton, and A. Yaron (996): Finite Sample Properties of Some Alternative GMM Estimators, Journal of Business and Economic Statistics, 4(3), 262 280. Koenker, R. (2005): Quantile Regression. Cambridge University Press. Koenker, R., and G. S. Bassett (978): Regression Quantiles, Econometrica, 46, 33 50. MacKinnon, W. J. (964): Table for both the sign test and distribution-free confidence intevals of the median for sample sizes to,000, J. Amer. Statist. Assoc., 59, 935 956. Pakes, A., and D. Pollard (989): Simulation and Asymptotics of Optimization Estimators, Econometrica, 57, 027 057. Walsh, J. E. (960): Nonparametric tests for median by interpolation from sign tests, Ann. Inst. Statist. Math.,, 83 88. 7
Table. Schooling Coefficient Estimation Method τ = 0.25 τ = 0.50 τ = 0.75 A. Quantile Regression (No Instruments) α(τ) 0.074 0.074 0.079 Quantile Regression (Asymptotic) (0.064,0.083) (0.065,0.083) (0.07,0.087) Finite Sample (0.047,0.00) (0.050,0.0) (0.057,0.098) B. IV Quantile Regression (Proximity to 2 and 4 Year College as Instruments) α(τ) 0.75 0.033 0.03 Inverse Quantile Regression (Asymptotic) (0.08,0.269) (-0.077,0.42) (0.025,0.80) Weak Instrument (Asymptotic) (0.08,0.500] (-0.053,0.500] (0.068,0.483) Finite Sample [-0.00,0.500] [-0.00,0.500] [-0.00,0.500] Note: 95% level confidence interval estimates for the schooling coefficient in a simple wage model. Panel A reports results from a model which treats schooling as exogenous, and Panel B reports results from a model which treats schooling as endogenous and uses college proximity as instruments for schooling. The first row in each panel reports the point estimate, and the second row reports the interval estimate obtained using the asymptotic approximation. In Panel A, the third row reports an interval estimate obtained using the finite-sample approach. In Panel B, the third row reports a weak-instrument robust interval, and the fourth row reports the finite sample interval. 8