Instrumental Variables Estimation and Weak-Identification-Robust. Inference Based on a Conditional Quantile Restriction

Size: px
Start display at page:

Download "Instrumental Variables Estimation and Weak-Identification-Robust. Inference Based on a Conditional Quantile Restriction"

Transcription

1 Instrumental Variables Estimation and Weak-Identification-Robust Inference Based on a Conditional Quantile Restriction Vadim Marmer Department of Economics University of British Columbia vadim.marmer@gmail.com and Shinichi Sakata Department of Economics University of Southern California shinichi.sakata@gmail.com August 17, 2011 Extending the L 1-IV approach proposed by Sakata 1997, 2007), we develop a new method, named the ρ τ -IV estimation, to estimate structural equations based on the conditional quantile restriction imposed on the error terms. We study the asymptotic behavior of the proposed estimator and show how to make statistical inferences on the regression parameters. Given practical importance of weak identification, a highlight of the paper is a proposal of a test robust to the weak identification. The statistics used in our method can be viewed as a natural counterpart of the Anderson and Rubin s 1949) statistic in the ρ τ -IV estimation. 1

2 1 Introduction In this paper, we develop a new method, named the ρ τ -IV estimation, to estimate structural equations based on the conditional quantile restriction imposed on the error terms, extending the L 1 -IV approach proposed by Sakata 1997, 2007). We study the large sample behavior of the new estimator and show how to make statistical inferences on the regression parameters. In particular, we pay attention to the statistical inference under weak identification, as the weak identification is as important a possibility in the regression based on a conditional quantile restriction as in that based on the conditional mean restriction. We propose a weak-identification-robust test that can be viewed as a natural counterpart of the Anderson and Rubin s 1949) statistic in ρ τ -IV estimation. The conventional instrumental variables IV) estimator is based on the identification of the structural parameters through the conditional mean restriction that the mean of the structural error term conditional on a set of instrumental variables is zero. The conditional mean restriction may look appealing, because, unlike the independence between the error term and the instruments, it does not impose restrictions on other features of the conditional distribution of the error term such as the variance of it. Nevertheless, the conditional mean restriction is considered unsuitable in some applications. The conditional mean of a random variable critically depends on the tails of the conditional distribution of the variable. A small change in the tails can cause a large change in the conditional mean. In many applications, on the other hand, we know little about the part of the population distribution that correspond to the tails of the error distribution. This often makes it difficult to justify the conditional mean restriction. The conditional mean restriction is not the only natural way to identify the parameters of structural equations. In many applications, the conditional mean restriction comes from an informal intuition that the location of the conditional distribution of the error term given a suitably chosen set of instruments should be constant. When we are faced by the above-mentioned concern about the conditional mean restriction, one would desire to capture the location of the conditional distribution of the error term by a measure that does not depend on tails. The conditional quantiles of the error term are examples of such location measures. Sakata 1997, 2007) proposes identifying and estimating the regression parameters based on the conditional median restrictions. Chernozhukov and Hansen 2001, 2006) also consider identification of the regression parameters based on the conditional quantile restrictions and propose an estimation method, tak- 2

3 ing an approach related to but different from Sakata s. In the current paper, we extend the estimator of Sakata 1997, 2007) to propose an method called the ρ τ -IV method to estimate regression models with the conditional τ-quantile restriction. Being based on the same identification condition, our estimator is closely related to Chernozhukov and Hansen s estimator. The computation burden of the two estimators are also comparable, as should be clear from the discussion in Section 3 of the current paper. A benefit of our approach is that the objective function to be maximized in the ρ τ -IV estimation takes a form similar to the variance ratio in the normal) limited information maximum likelihood LIML) estimation. This allows us to formulate a statistic analogous to the Anderson-Rubin AR) statistic, with which we can make weak-identification-robust inference on the regression parameters of interest. In the IV regression literature, many researchers have been paying attention to possible identification issues. Sargan 1983) points out that near violation of identifiability is problematic. The analysis of Phillips 1984, 1985) on the exact finite sample distribution of LIML clearly shows that lack of identifiability in structural equation estimation keeps the LIML estimator from consistently estimating the coefficient of the structural equation. Hillier 1990) also shows analogous results in considering the directional estimation of the coefficients of structural equations. Choi and Phillips 1992) further explores the behavior of the IV estimator under lack of identifiability. When instrumental variables are poorly correlated with endogenous explanatory variables in linear regression, the asymptotic distribution of the IV estimator is quite different from what the standard large sample theory suggests, as demonstrated by Nelson and Startz 1990b, 1990a) and Bound, Jaeger, and Baker 1995). Staiger and Stock 1997) propose an alternative way to approximate the distribution of the IV estimator with weak instruments. Stock and Wright 2000) then establish a way to approximate the distribution of generalized-method-moments GMM) estimators under weak identification. The proposed approximation methods are useful in theoretically studying the nature of the IV and GMM estimators under weak identification. Nevertheless, they do not offer a way to approximate the distribution of the estimators based on data, involving some unidentifiable nuisance parameters. Given the absence of a convenient and reliable approximation to the distribution of the IV estimator with weak instruments, it is difficult to perform tests of hypotheses on regression parameters in the usual style i.e., the t-test, the Wald test, etc.). On the other hand, the AR test originally proposed in Anderson 3

4 and Rubin 1949) is not affected by weakness of instruments. For this reason, Staiger and Stock 1997) and Dufour 1997) recommend the use of the AR test. The AR test even has nice power properties if the number of instruments is equal to the number of endogenous explanatory variables Moreira 2001, Andrews, Moreira, and Stock 2004). The weak identification is also an important possibility in regression based on a conditional quantile restriction. To this end, we propose a test that has asymptotically correct size regardless of whether the identification is strong or not. The hypothesis we consider is that some regression parameters are equal to prespecified values. If we apply the ρ τ -IV method imposing the constraints of the null hypothesis, the objective function in the ρ τ -IV estimation maximized subject to the parameter constraints of the null hypothesis tends to be close to one under the null. The constraint maximum of the objective function is similar to the Anderson and Rubin 1949) statistic in the sense that it captures how much of the fitted structural error can be explained by the instruments. It, ranging between zero and one, is closed to one if the fitted structural residuals cannot be fitted by the instruments in the sample. Its value far from one is thus taken as an evidence against the null in our test. If the conditions in the null hypothesis include the coefficients of all regressors potentially weakly related to the instruments excluded from the regression function, then the proposed test involves no weak identification problem, so that our test is robust to weakidentification. Our test is closely related to Chernozhukov and Hansen 2008). They formulate a test in a way convenient in the estimation framework of Chernozhukov and Hansen 2001, 2006), while we propose a test convenient in the ρ τ -IV estimation. Another paper related to our test is Jun 2008). Jun formulates a test adapting the approach of Kleibergen 2005). The rest of the paper is organized as follows. We first describe the basic setup and define the ρ τ -IV estimator in Section 2. Then, after briefly discussing the computation of the ρ τ -IV estimator in Section 3, we establish the consistency and asymptotic normality of the ρ τ -IV estimator and explains how to consistently estimate the asymptotic covariance matrix of ρ τ -IV estimator in Section 4. In Section 5, we develop a weakidentification-robust method to test hypotheses on the regression parameters. Throughout the paper, denotes the Euclidean norm for vectors and the Frobenius norm for matrices, and limits are taken along the sequence of sample sizes growing to infinity, unless otherwise indicated. 4

5 2 ρ τ -IV Estimator Assumption 1: Let Ω, F, P ) be a probability space. The data are a realization of an independently and identically distributed stochastic process {X t y t, Y t, Z t) : Ω R R g R k } t N such that E[ X 1 ] <, and for each c R 1+g+k \{0}, P [c X t = 0] < 1. Partition Z t as Z t = Z t,1, Z t,2). where Z t,1 is k 1 1, and Z t,1 is k 2 1 so that k 1 +k 2 = k). The parameter of interest is the coefficients in regression of y t on Y t and Z t,1 described in the next assumption. Assumption 2: The subset B of R g is nonempty and compact. There exists a unique θ 0 β 0, α 0) B R k1 such that the conditional τ-quantile of U 1 y 1 Y 1β 0 Z 1,1α 0 given Z 1 is zero, where τ is a known real constant in 0, 1). If instead the conditional τ-quantile of U 1 given Y 1 and Z 1,1 is known to be zero, β 0 and α 0 could be consistently estimated by the estimator of Koenker and Bassett 1978). In our current setup, Koenker and Bassett s estimator is inconsistent in general. We here propose an estimator of the structural regression coefficients, following the approach described in Section 11 of Sakata 2007). Define ρ τ : R R by ρ τ v) τ 1v < 0)) v, v R, where 1A) is the indicator function that becomes one if and only if the condition A is true. Also define functions R : B R k1 R k R and Q : B R k1 R by Rβ, α, γ) E[ρ τ y 1 Y 1β Z 1,1α Z 1γ)], β, α, γ) B R k1 R k and Qβ, α) inf γ Rk Rβ, α, γ), β, α) B R k1, Rβ, α, 0) where Rβ, α, 0) > 0 by the linear independence of the elements of X t = y t, Y t, Z t) required by Assumption 1. Because of the conditional τ-quantile restriction imposed on U 1 in Assumption 2, we have that Qβ 0, α 0 ) = 1, so that for each β, α) B R k1 0 Qβ, α) Qβ 0, α 0 ) = 1. 1) 5

6 It follows that θ 0 β 0, α 0) is the maximizer of Q over Θ B R k1. Our estimator is the maximizer of the sample counterpart of Q, which is given by a sequence of random functions { ˆQ n : B R k1 Ω R} n N defined by inf γ R k 2 ˆRnβ,α,γ,ω), if inf ˆR ˆQ n β, α, ω) nβ,α,0,ω) b,a) B R k 1 ˆRn b, a, 0, ω) > 0, 1, otherwise, β, α) B R k1, ω Ω, n N, where ˆR n β, α, γ, ω) n 1 n ρ τ y t ω) Y t ω) β Z t,1 ω) α Z t ω) γ), β, α, γ) B R k1 R k, ω Ω, n N. We now define our estimator. Definition 1 The ρ τ -IV estimator): Given Assumption 1, a sequence of random vectors {ˆθ ˆβ n, ˆα n) : Ω B R k1 } n N is called the ρ τ -IV estimator if for each n N, ˆQ n ˆβ n, ˆα, ) = sup β,α) B R k 1 ˆQn β, α, ). For each β, α) B R k1, we have that inf ˆRn β, α, γ, ) = inf n 1 γ R k γ 1,γ 2) R k 1 R k 2 = inf n 1 γ 1,γ 2) R k 1 R k 2 n n ρ τ y t Y t β Z t,1α + γ 1 ) Z t,2γ 2 ) ρ τ y t Y t β Z t,1γ 1 Z t,2γ 2 ) = inf γ R k ˆRn β, 0, γ, ). Given this fact, it holds that whenever ˆR n β, α, 0, ) > 0 for every β, α) B R k1, sup α R k 1 inf ˆQ n β, α, ) = sup ˆR γ R k n β, 0, γ, ) = inf ˆR γ R k n β, 0, γ, ), β B. 2) α R k 1 ˆR n β, α, 0, ) inf α R k 1 ˆRn β, α, 0, ) Because the numerator and denominator of the ratio on the right-hand side of 2) are continuous in β, sup α R k 1 ˆQn β, α, ) is continuous in β in all realizations, whenever ˆR n β, α, 0, ) > 0 for every β, α) B R k1. The continuity of sup α R k 1 ˆQn β, α, ) in β is also satisfied when ˆR n β, α, 0, ) can touches zero, because ˆQ n β, α, ) = 1 in such case. Thus, given the compactness of B, ρ τ -IV estimator ˆβ n of β 0 exists by the standard result on the existence of extremum estimators such as Gallant and White 1988, Theorem 2.2). 6

7 Further, ˆα n is the solution of inf α R k 1 ˆR n ˆβ n, α, 0, ) = inf n 1 α R k 1 n ρ τ y t Y t ˆβ n ) Z t,1α). That is, it is the Koenker and Bassett s 1978) quantile regression estimator taking y t Y t ˆβ n ) for the dependent variable and Z t,1 for the regressors, which surely exists. Theorem 2.1: Given Assumption 1, the ρ τ -IV estimator exists. Remark. We could avoid the compactness requirement of B by first defining the ρ τ -IV directional estimator, as Sakata 2007) does, and then deriving the slope estimator in Definition 1 from it. We, however, directly define the slope estimator by imposing compactness on B for saving space in this paper. 3 Computation of the ρ τ -IV Estimator We could calculate the ρ τ -IV estimator, adapting the algorithm described in Sakata 2007) for the case τ = 0.5 in the straightforward manner. Sakata s algorithm is, however, slow if k 1 is large, because it uses a global search algorithm to minimizes ˆQ n over B R k1. Given a β, however, the ratio on the right-hand side of 2) can be quickly calculated, because the minimization problems appearing in both the numerator and denominator of the ratio can be rewritten as linear programming problems, as Koenker and Bassett 1978) explains. Thus, the ρ τ -IV estimator can be calculated by maximizing the ratio in terms of β over B. Because the ratio may have local maximum, it is advisable to use a global search algorithm such as the simulated annealing algorithm in calculating ˆβ n, while ˆα n is the solution of the minimization problem in the denominator calculated with ˆβ n. 4 Large Sample Properties of the ρ τ -IV Estimator In investigating the consistency of the ρ τ -IV estimator, it is convenient to consider the population counterpart of 2), i.e., inf γ R k Rβ, 0, γ) sup Qβ, α) = sup α R k 1 α R k 1 Rβ, α, 0) = inf γ Rk Rβ, 0, γ), β B. 3) inf α R k 1 Rβ, α, 0) 7

8 By Assumption 2, β sup α R k 1 Qβ, α) : B R is a continuous function uniquely maximized at β 0. We can also show that {inf α R k 1 ˆQn β, α, )} n N converges to inf α R k 1 Qβ, α) uniformly in β B a.s.-p Lemma A.3). By a standard result on consistency of extremum estimators e.g., Pötscher and Prucha 1991, Lemma 4.2), we can establish the consistency of { ˆβ n } n N for β 0. The estimator ˆα n, on the other hand, minimizes ˆR n ˆβ n, α, 0, ) with respect to α over R k1. Given the strong consistency { ˆβ n } for β 0, we can verify the a.s.-p convergence of ˆR n ˆβ n, α, 0, ) to Rβ 0, α, 0) for each α R ki and utilize the convexity of ˆR n ˆβ n, α, 0, ) in α to establish the strong consistency of ˆα n for α 0. Theorem 4.1: Under Assumptions 1 and 2, {ˆθ n = ˆβ n, ˆα n) } n N converges to θ 0 = β 0, α 0). In establishing the asymptotic normality of the ρ τ -IV estimator, we impose the additional conditions stated in the next theorem. Assumption 3: a) The minimizer of Rβ 0, α 0, ) : R k R over R is unique hence, it is uniquely minimized at the origin). b) The vector β 0 is interior to B. Also, a neighborhood B 0 B of β 0, a neighborhood A 0 R k1 of α 0, and a neighborhood Γ 2,0 R k2 of the origin satisfy the following conditions: i) The conditional distribution y 1 given Y 1 and Z 1 has a continuous probability density function pdf) f Y 1, Z 1 ) at Y 1β + Z 1,1α + Z 1,2γ 2 for each β, α, γ 2 ) B 0 A 0 Γ 2,0 a.s.-p. ii) There exists a random variable D : Ω R with a finite absolute moment such that for each β B 0, each α A 0, and each γ 2 Γ 2,0, fy 1β + Z 1,1α + Z 1,2γ 2 Y 1, Z 1 ) Y Z 1 2 ) < D. 4) c) Let J be the Hessian of R at β 0, α 0, 0 k 1 ) and partition it as J ββ J βα J βγ J J αβ J αα J αγ, J γβ J γα J γγ where J ββ is g g, J αα is k 1 k 1, and J γγ is k k. Then the matrix J θθ J ββ J αβ J βα J αα 8

9 is positive definite, and J γθ J γβ, J γα ) is of full column rank. d) E[ Y Z 1 2 ] <. Assumption 3b) ensures the twice continuous differentiability of R in a neighborhood of β 0, α 0, 0) in R g R k1 R k, which then implies the twice continuous differentiability of Q in a neighborhood of θ 0 = β 0, α 0). The first condition in Assumption 3c) ensures that the Hessian of Rβ 0, α 0, ) : R k R at its minimum is negative definite. Under these conditions, the Hessian of β, α) log Qβ, α) : B R k1 R at β 0, α 0) is guaranteed to be positive definite, being equal to K, where K Rβ 0, α 0, 0) 1 J θγ J 1 γγ J γθ. The full column rankness of J γβ means that within a neighborhood of β 0, α 0), moving β, α ) away from β 0, α 0) causes the gradient of Rβ, α, ) : R k R to be bounded away from zero uniformly in all directions, so that we can choose γ to make Rβ, α, γ) smaller than Rβ, α, 0), once β, α ) deviates from β 0, α 0). Assumption 3d) ensures that the Lindeberg-Levy Central Limit Theorem Rao 1973, p. 127) applies to the generalized score of the ρ τ -IV estimator. The moment requirements in Assumptions 3b,d) are mild. If fy 1 Y 1β Z 1,1α Z 1γ Y 1, Z 1 ) is bounded, they merely require that each element of Y 1 and Z 1 has a finite second moment, while the asymptotic normality of the conventional IV estimator is typically established under the assumption that the fourth moments of the dependent variable, the regressors, and the instruments are finite. Lemma 4.2: Suppose that Assumptions 1 3 hold. Let {b n } n N and {a n } n N be sequences of B- and R k1 - valued random vectors, respectively. Then there exists a sequence of k 1 random vectors c n such that for each n N, ˆRn b n, a n, c n, ) = inf γ R k ˆR n b n, a n, γ, ). If, in addition, Assumptions 3 hold, and b n β 0 and a n α 0 in probability-p, then n 1/2 c n = Cn 1/2 b n β 0 + J 1 a n α 0 and C J 1 γγ J γθ. γγ n 1/2 n τ 1U t < 0)) Z t + o P n 1/2 b n β 0 + n 1/2 a n α ), Using this lemma, we can now approximate log ˆQ n. Lemma 4.3: Suppose that Assumptions 1 3 hold and let {b n } n N and {a n } n N be sequences of B- and 9

10 R k1 -valued random vectors, respectively, that converge to β 0 and α 0. Write θ n b n, a n). Then n log ˆQ 1 n ) n ) n b n, a n, ) = n 1/2 τ 1U t < 0)) Z t Jγγ 1 n 1/2 τ 1U t < 0)) Z t 2Rβ 0, α 0, 0) 1 n Rβ 0, α 0, 0) n 1/2 τ 1U t < 0)) Z tcn 1/2 θ n θ 0 ) 1 2 n1/2 θ n θ 0 ) Kn 1/2 θ n θ 0 ) + o P n 1/2 b n β 0 + n b n β ). 5) Given this lemma, it is natural to expect that the minimizer of the the second and third terms on the righthand side of 5) approximates ˆθ = ˆβ n, ˆα n). The next theorem confirms that such approximation bears an o P 1) approximation error, and derives the asymptotic distribution of {ˆθ n } n N based on the approximation. and Theorem 4.4: Suppose that Assumptions 1 3 hold. Then n 1/2 1 ˆθ n θ 0 ) = Rβ 0, α 0, 0) K 1 C n 1/2 D 1/2 n 1/2 ˆθ n θ 0 ) A N0, I l ), n 1U t < 0) τ) Z t + o P 1), where D K 1 C V CK 1, K = Rβ 0, α 0, 0) 1 J θγ J 1 γγ J γθ as introduced earlier), and V τ1 τ)rβ 0, α 0, 0) 2 E[Z 1 Z 1]. To estimate the asymptotic covariance matrix D consistently, we need to estimate V, K, and C consistently. For consistent estimation of V, we can use its sample analogue, ˆV n τ1 τ) ˆR n ˆβ n, ˆα n, 0, ) 2 n 1 n Z t Z t. On the other hand, K and C are more complicated, depending on J, the Hessian of R. The Hessian of ˆR n is zero at each point in B R k1 R k, at which it is differentiable. This rules out estimation of J by using of the Hessian of ˆR n. A way to overcome the difficulty in estimation of K and C is to employ the numerical differentiation approach described in Newey and McFadden 1994, Section 7.3). Because K is the Hessian of β, α) log Qβ, α) : B R k1 R at β 0, α 0), 1 times a second-order numerical derivative of log ˆQβ, α, ) at ˆβ n, ˆα n) is our estimator of K. Let e m i denote the unit vector along the ith axis of the Cartesian coordinate system in R m. Assume: 10

11 Assumption 4: The sequence {h n } n N consists of positive possibly random) numbers such that h n 0 and n 1/2 h n in probability-p. Then our estimator of K is ˆK n, whose i, j)-element is given by ˆK nij 1 4h 2 log ˆQ n ˆθ n + h n e i + h n e j, ) log ˆQ n ˆθ n h n e i + h n e j, ) n log ˆQ n ˆθ n + h n e i h n e j, ) + log ˆQ n ˆθ n h n e i h n e j, )), i, j) {1,..., g + k 1 )} 2, n N. For C, we utilize the result of Lemma 4.2, which suggests that perturbation in ˆθ n = ˆβ n, ˆα n) would change ˆγ n approximately by C times the change in ˆθ n. Let ˆγ ni θ) denote the ith element in the usual quantile regression estimator in regression of y t Y t, Z t,1)θ on Z t i {1, 2,..., k}). Then our estimator of C is Ĉn whose i, j)-element is given by Ĉ nij 1 2h n ˆγ ni ˆθ n + h n e j ) ˆγ ni ˆθ n h n e j )), i {1,..., k}, j {1,..., g + k 1 )}. Given the estimators of K and C, we estimate D by ˆD n ˆK + n Ĉ n ˆV n Ĉ n ˆK+ n, n N, where ˆK + n is the Moore-Penrose MP) inverse of ˆK n we use the MP inverse instead of the regular inverse to ensure that this estimator is well defined for every realization). Theorem 4.5: Suppose that Assumptions 1 4 hold. Then: a) { ˆK n } n N is weakly consistent for K. b) {Ĉn} n N is weakly consistent for C. c) { ˆV n } n N is weakly consistent for V. d) { ˆD n } n N is weakly consistent for D. Remark. The same step size h n is used in each element of ˆK n and Ĉn just for simplicity. One could use a different step size for each element in ˆK n and Ĉn without affecting the consistency results in Theorem 4.5, as long as the step size satisfies the requirements in Assumption 4. 11

12 5 Testing on the Regression Coefficients under Possible Weak Identification When β, α) log Qβ, α) is flat in some directions from β 0, α 0 ), compared with the size of the error in approximating log Q by log ˆQ n, the large sample distribution of the ρ τ -IV estimator established in Section 4 can be unreliable, because the estimator can easily go astray. In other words, we may experience the so-called weak identification problem in the ρ τ -IV estimation. The flatness of β, α) log Qβ, α) described above implies near singularity of K, which is 1 times the Hessian of log Qβ, α). Because the large sample analysis in Section 4 involves the inverse of K, the near singularity of K makes the results in Section 4 unreliable unless the sample size is extremely large. To verify that the nearly singular K can arise in practice, suppose that Y t is related to Z t through Y t = Π 0 Z t + V t, t N, where Π 0 is a g k constant matrix, and V t is a g 1 zero-mean random vector independent from Z t. Let f U Z 1 ) denote the conditional pdf of U 1 given Z 1. Then, under our current assumption, we have that [ J θγ = 2E f U 0 Z) Y ] [ 1 Z 1 = 2E f U 0 Z 1 ) Π ] 0Z 1 Z 1 Z 1,1 If the last k 2 columns of Π 0 is close to zero, each of the first g rows of J θγ can be well approximated by a linear combination of the last k 1 rows of J θγ ; i.e., the columns of J θγ becomes nearly dependent. This causes K = Rβ 0, α 0, 0) 1 J θγ J 1 γγ J γθ to be nearly singular and raises concerns about inference on β 0 and α 0, relying on the asymptotics in Section 4. Z 1,1 Suppose that we are interested in the hypothesis that H 0 : β 0 = β, where β is a known g 1 constant vector. In the usual IV regression based on the zero-conditional mean restriction imposed on the error term, the AR test is known to be robust to weakness of instruments. Given the structural equation estimated under the constraint of the null hypothesis, the AR test regresses the null-restricted fitted structural error term on all instruments and checks if R 2 is close to zero. If R 2 is high enough, it rejects the null hypothesis. Because the AR test rejects the null when 1 R 2 is close to zero, we can view the AR test as rejecting the null hypothesis when the null-restricted fitted structural error term can be well explained by the instruments. Note that 1 R 2 is equal to the ratio of the two sample second moments. The denominator in the ratio 12

13 is the sample second moment of the fitted structural error term, while the numerator is the sample second moment of the residuals in regression of the structural error term on the instruments. This view gives us a way to adapt Anderson and Rubin s 1949) approach in our problem setup. Namely, we replace the sample second moment in 1 R 2 with the corresponding average check functions. The resulting statistic is ˆQ n β, ˆα 0 n, ), where ˆα 0 n is the ρ τ -IV estimator obtained imposing the constraint of H 0, which is exactly equal to the Koenker and Bassett s 1978) estimator in regression of y t Y t β on Z 1,t. For convenience, we take the logarithm of it and multiply it by 2n to define a test statistic J n. J n 2n log ˆQ n β, ˆα n, 0 ) = 2n log inf ˆR γ R k n β, 0, γ, ) inf α R k 1 ˆRn β,, n N. 6) α, 0, ) Let ᾱ be a k 1 1 vector such that Z 1,1ᾱ be the ρ τ -metric projection of y 1 Y 1 β on the linear space spanned by the elements of Z 1,1. Then the standard large sample analysis on extremum estimation shows that n 1 J n 2 sup log Q β, α) = 2 log Q β, ᾱ) in probability-p. α R k 1 Under H 0, the right-hand side of this equality is zero, because sup Q β, α) = sup Qβ 0, α) = Qβ 0, α 0 ) = 1. α R k 1 α R k 1 Under the alternative, on the other hand, the limit of {n 1 J n } n N is strictly positive, because Q β, ᾱ) < Qβ 0, α 0 ) = 1. Thus, a test based on J n should reject H 0 if J n exceeds a suitably chosen critical value. We will discuss how to find the critical value below. Define C 0 J 1 γγ J γα and L Rβ 0, α 0, 0) 1 J αα. Lemma 5.1: Suppose that Assumptions 1 3 hold. If in addition H 0 is true, then J n A η L 1 C 0 C 0 L C 0 ) 1 C 0 ) η, where η is a k 1 random vector distributed with N0, V ), Thus, {J n } n N has a non-degenerate limiting distribution, though it is not asymptotically pivotal. Among the unknown parameters in the formula for the asymptotic distribution of {J n } n N, C 0 can be consistently 13

14 estimated by applying {Ĉn} n N under the null Theorem 4.5). Write ˆθ 0 n β, ˆα 0 n ). Then our estimator of C 0 is Ĉ0 n whose i, j)-element is equal to Ĉ 0 nij 1 2h n ˆγ ni ˆθ 0 n + h n e g+j ) ˆγ ni ˆθ 0 n h n e g+j )), i {1,..., k}, j {1,..., k 1 }. Analogously, V can be estimated by ˆV 0 n ˆR n ˆβ 0 n, ˆα 0 n, 0, ) 2 n 1 n Z tz t. The matrix L is the Hessian of γ log Rβ 0, α 0, γ) : R k R at the origin. We take a second-order numerical derivative of the sample counterpart of this function to estimate L. The resulting estimator ˆL n of L is the k k matrix with i, j)-element equal to ˆL nij 1 4h 2 log ˆR n β, ˆα n, 0 ˆγ n 0 + h n e i + h n e j, ) log ˆR n β, ˆα n, 0 ˆγ n 0 h n e i + h n e j, ) n log ˆR n β, ˆα 0 n, ˆγ 0 n + h n e i h n e j, ) + log ˆR n β, ˆα 0 n, ˆγ 0 n h n e i h n e j, )), i, j = 1, 2,..., k, where ˆγ 0 n is the τ-quantile regression estimator in regressing y t Y t β Z 1,t ˆα 0 n on Z t. for L. Lemma 5.2: Suppose that Assumptions 1 3 and 4 hold. If in addition H 0 holds, {ˆL 0 n} n N is consistent The limiting distribution of {J n } n N is that of a positive random variable whose distribution function is positively sloped at each positive point. Let cp, C, L, Ṽ ) denote the 1 α)-quantile of η L + C C L C) + C ) η for each k l matrix C, each k k symmetric matrix L, and each k k symmetric matrix Ṽ, where η is a k 1 random vector distributed with N0, Ṽ ), and p 0, 1), where a, b) denotes the open interval between real numbers a and b. We here propose a test that rejects H 0 if and only if J n exceeds cp, Ĉn, ˆL n, ˆV n ), where p is the desired size of the test. This test has the correct asymptotic size and it is consistent, as stated in the next theorem. Theorem 5.3: Suppose that Assumptions 1 4 hold. Then: a) If in addition H 0 holds, for each p 0, 1), P [J n > cp, Ĉ0 n, ˆL 0 n, ˆV 0 n )] p. b) Suppose instead that H 0 is violated, that R β,, 0) : R k1 R has a unique maximizer on R k1, and that R β, 0, ) : R k R has a unique minimizer on R k. Then for each p 0, 1), P [J n > cp, Ĉ0 n, ˆL 0 n, ˆV 0 n )] 1. 14

15 Because each quadratic form of normal random variables can be easily rewritten as a linear combination of χ 2 random variables using the eigenvalue decomposition, cp, Ĉn, ˆL n, ˆV n ) is the 1 α)-quantile of a linear combination of χ 2 random variables. To compute cp, Ĉn, ˆL n, ˆV n ), we can numerically find the 1 α)-quantile of the distribution of the linear combination, evaluating the distribution function by using Farebrother s 1984) algorithm. 6 Power under weak instruments According to Theorem 5.3b), our test proposed in the previous section is consistent in the regular asymptotic framework with strong instruments. In this section, we discuss the power properties of the test when the instruments are weak. For this purpose, we need a model describing how weak instruments arise in our problem. Before formalizing the notion of weak instruments in our problem setup, we first review the concept of weak instruments in the conventional IV regression. Staiger and Stock 1997) introduces weak instruments in a thought experiment in which the correlation between the endogenous regressors and the instruments becomes weaker as the sample size grows. More concretely, they relate the k 1 instrument vector Z t to the g 1 endogenous regressor vector Y n) t through Y n) = n 1/2 ΛZ t + V t, t {1, 2,..., n}, n N where Λ is a g k constant matrix, and V t is a unobservable g 1 random vector such that Z t is exogenous to V t. The superscript n) in Y n) t thought experiment is then indicates dependence of Y n) t on n. The structural equation in the y n) t = Y n) t β 0 + Z t,1α 0 + U t, t {1, 2,..., n}, n N, where Z t,1 is a k 1 1 subvector of Z t, and the regression error U t is orthogonal to Z t. In this setup, Staiger and Stock investigates the asymptotic behavior of tests of the hypothesis that H 0 : β 0 = β, where β is a known constant in R g. Define W t U t V t β β 0 ) t N). residual, i.e., the residual evaluated with coefficients β, α 0) is equal to Then it is straightforward to verify that the null-restricted y n) t Y n) t β Z t,1 α 0 = W t Z tn 1/2 Λ β β 0 ). 7) 15

16 Because E[Z t W t ] = 0, 8) it follows that E[Z t y t Y t β Z t,1α 0 )] = E[Z t Z t]n 1/2 Λ β β 0 ). Thus, the null restricted residual violates the moment condition underlying the conventional IV estimator, but only in the order of n 1/2. This is the essential feature of the setup that Staiger and Stock used to demonstrate that the behavior of the conventional tests of H 0 may be very different from what the conventional asymptotic analysis indicates, and why the AR test can be a better choice. Note that, while the fact that the null-restricted residual violates the moment condition in the order of n 1/2 hinges on 7) and 8), it does not matter for it what W t is or where Λ comes from. Also, note that there is no natural universally agreeable reduced-form equation in our setup, unlike the conventional IV regression setup. In analyzing our test of H 0 with weak instruments, we therefore take as basis 7) and 8) suitably modified for constructing an environment with weak instruments in our setup, as found in the next assumption. Assumption 5: The triangle array {X n) t y n) t, Y n) t, Z t,1, Z t,2 ) : t {1, 2,..., n}, n N} consists of random vectors on a probability space Ω, F, P ), where y n) t, Y n) t Z t,1, and Z t,2 are 1 1, g 1, k 1 1, and k 2 1, respectively; β is a constant vector in B that is a nonempty and compact subset of R g ; and τ is a known constant in 0, 1). There exists β 0 B, a g k matrix Λ, ᾱ R k1, and a sequence of random variables {W t } t N that satisfy that y n) t Y n) t β Z t,1 ᾱ = W t Z tn 1/2 Λ β β 0 ), t {1, 2,..., n}, n N, 9) and that for each t N, τ-quantile of W t given Z t is zero. In Assumption 5, β 0 appears as some vector satisfying the required condition, rather than the true coefficient of Y n) t, because our mathematical results do not depend on what β 0 is. Of course, our results are most useful when Assumption 5 holds with β 0 set equal to the true true coefficient of Y n) t. If β 0 = β, the conditional quantile restriction imposed upon {W t } t N is essentially the same as Assumption 2. The 16

17 equivalence of the two conditions can be achieved by setting ᾱ = α 0 and W 1 = U 1, in particular when we require that {Z t, W t)} t N is i.i.d., as we will do below. When β 0 β, the assumption implies that the conditional τ-quantile of the null-restricted residual given Z 1 is local-to-zero. In general, the distribution of W t depends on β β 0. Assumption 5 is clearly satisfied, if U t, V t ) is independent from Z 1 in the setup of Staiger and Stock 1997) discussed above. The matrix Λ captures the strength of the instruments. For example, the instruments are irrelevant when Λ = 0. In addition to Assumption 5, we impose the following conditions similar to Assumption 3: Assumption 6: a) Eρ τ W 1 Z 1γ) is uniquely minimized at γ = 0 k 1. b) A neighborhood Γ 0 R k of the origin satisfies the following conditions: i) The conditional distribution W 1 given Z 1 denoted by F Z 1 ) has a pdf f Z 1 ) at Z 1γ for each γ Γ 0 a.s.-p. ii) There exists a random variable D : Ω R with a finite second moment such that for each γ Γ 0, fz 1γ Z 1 ) Z 1 2 < D a.s.-p. c) J γγ = 2 R β, ᾱ, 0 k 1 )/ γ γ is positive definite, and J γα = 2 R β, ᾱ, 0 k 1 )/ γ α is of full column rank. d) E[ Z 1 2 ] <. e) {W t, Z t) : t = 1,..., n} are independent and identically distributed. The following theorem describes the asymptotic distribution of J n in the case of fixed alternatives β β 0 is a fixed vector) and the weak IVs design assumed in Assumption 5. Theorem 6.1: Suppose that Assumptions 5 and 6 hold. Then J n A Eρτ W 1 )) 1 η + J γγ Λ β β 0 ) ) J 1 γγ C 0 C 0 J γγ C 0 ) 1 C 0 ) η + J γγ Λ β β 0 ) ), where η is a k 1 random vector distributed with N 0, τ1 τ)e[z 1 Z 1] ). In the case of weak instruments and under fixed alternatives, the asymptotic distribution of J n is a noncentral mixed-χ 2 random variable. The power of the test that rejects H 0 : β = β 0 when J n > cα, Ĉ0 n, ˆL 0 n, ˆV 0 n ) 17

18 depends on the magnitude of the the non-centrality parameter given by β β0 ) Λ Jγγ J γγ C 0 C 0 J γγ C 0 ) 1 C 0 J γγ ) Λ β β0 ), where J γγ J γγ C 0 C 0 J γγ C 0 ) 1 C 0 J γγ is a positive definite matrix by Assumption 6c). Under H 0, β β 0 = 0 and the test rejects asymptotically with probability α. Thus, the test has correct size regardless of the strength of the instruments. Under the fixed alternatives, the asymptotic rejection probability depends on the distance between β and β 0 and the strength of the instruments Λ. For example, the test has no power when the instruments are irrelevant and Λ = 0. The test also lacks power in certain directions if Λ 0 however its rank is less than g. Appendix A Mathematical Proofs Given Assumption 1, write ξ ρτ E[ρ τ ξ)] for each ξ L 1 Ω, F, P ). Then ρτ is a pseudo norm on L 1 Ω, F, P ). Using ρτ, R can be written as Rβ, α, γ) = y 1 Y 1β Z 1,1α Z 1γ ρτ, β, α, γ) B R k1 R k. It follows that the minimization in the numerator of the ratio on the right-hand side of 3) is the ρτ -metric projection of y 1 Y 1β on Z 1, while the minimization in the denominator is the ρτ -metric projection of y 1 Y 1β on Z 1,1. The norm ρτ equivalent topologies, because 1 ξ ρτ ξ 1 min{τ, 1 τ} ξ ρ τ is closely related to the L 1 norm 1. They actually generate the An important implication of the equivalence is that ξ ρτ = 0 if and only if ξ 1 = 0. Our analysis uses the equivalence of the two norms, mostly without mentioning it explicitly. We show below that {sup α R k 1 ˆQn β, α, )} n N converges to sup α R k 1 Qβ, α) uniformly in β on the compact set B. We can then conclude that { ˆβ n } n N is consistent for β 0, because β 0 is the unique maximizer of β sup α R k 1 Qβ, α) on B. Once the consistency of ˆβ n is established, we can also prove that {ˆα n } n N converges a.s.-p to α 0, at which Rβ 0,, 0) : R k1 R is minimized, by utilizing the convexity of ˆR n ˆβ n, α, 0, ) in α and the pointwise convergence of { ˆR n ˆβ n, α, 0, )} n N to Rβ 0, α, 0) for each α. We first establish a few lemmas. For later conveniences, some lemmas have more generality than we need for proving Theorem 4.1. The generality will be useful in our proof of

19 Lemma A.1: Suppose that Assumptions 1 holds. Then for each β B, and inf ˆRn β, 0, γ, ) inf Rβ, 0, γ) 0 a.s.-p, γ R k γ R k inf α R k 1 ˆR n β, α, 0, ) inf α R k 1 Rβ, α, 0) 0 a.s.-p. Proof of Lemma A.1: The two convergence results can be proved in similar manners. We only prove the first one. Let β be an arbitrary point in B. Then the ρτ -metric projection of y 1 Y 1β on the linear subspace spanned by Z 1 exists and is in general a compact set. By the linear independence of Z 1 Assumption 1), this further means that Γ 1 arg min γ R k Rβ, 0, γ) is compact. It follows that there exists a closed ball Γ 2 containing Γ 1 in its interior. Now fix a point γ 1 in Γ 1. By the Kolmogorov law of large numbers Rao 1973, p. 115), { ˆR n β, 0, γ 1, )} n N converges to Rβ, 0, γ 1 ) a.s.-p. Also, by Jennrich s uniform law of large numbers Jennrich 1969, Theorem 2), { ˆR n β, 0, γ, ) Rβ, 0, γ)} n N converges to zero uniformly in γ on the boundary Γ 2 of Γ 2 a.s.-p. Because ˆR n β, 0, γ, ) is convex as a function of γ, and Rβ, 0, γ 1 ) < inf Rβ, 0, γ), γ Γ 2 it follows from the above-mentioned facts that ˆR n β, 0, γ 1, ) < inf ˆRn β, 0, γ, ) γ R k \Γ 2 for almost all n N a.s.-p. On the other hand, by Jennrich s uniform law of large numbers, { ˆR n β, 0, γ, ) Rβ, 0, γ)} n N converges to zero uniformly in γ Γ 2 a.s.-p, so that inf ˆRn β, 0, γ, ) Rβ, 0, γ 1 ) a.s.-p. γ Γ 2 The desired result therefore follows. Lemma A.2: Suppose that Assumptions 1 holds. Then sup inf ˆRn β, 0, γ, ) inf Rβ, 0, γ) 0 a.s.-p, γ R k γ R k β B and sup inf β B α R k 1 ˆR n β, α, 0, ) inf α R k 1 Rβ, α, 0) 0 a.s.-p. 19

20 Proof of Lemma A.2: We only prove the first convergence result, as the second one can be shown in an analogous manner. Because Lemma A.1 has shown the corresponding pointwise a.s. convergence, and B is compact, it suffices to show that the series in question is strongly stochastically equicontinuous Andrews 1992, Theorem 2). Let β 1 and β 2 be arbitrary points in B. Also, let g nj be Koenker and Bassett s 1978) estimator in τ-quantile regression of y t Y t β j on Z t, i.e., ˆR n β j, 0, g nj, ) = inf γ R k ˆRn β j, 0, γ, ), for j = 1, 2. Then we have that for each n N, ˆR n β 1, 0, g n1, ) ˆR n β 2, 0, g n2, ) = ˆR n β 1, 0, g n1, ) ˆR n β 1, 0, g n2, )) + ˆR n β 1, 0, g n2, ) ˆR n β 2, 0, g n2, )) ˆR n β 1, 0, g n2, ) ˆR n β 2, 0, g n2, ), where the inequality holds, because ˆR n β 1, 0, g n1, ) ˆR n β 1, 0, g n2, ) for each n N. We further have that ˆR n β 1, 0, g n2, ) ˆR n β 2, 0, g n2, ) =n 1 It follows that for for each n N n n n 1 ˆR n β 1, 0, g n1, ) ˆR n β 2, 0, g n2, ) β 1 β 2 n 1 Analogously, we can also show that for each n N ˆR n β 2, 0, g n2, ) ˆR n β 1, 0, g n1, ) β 1 β 2 n 1 Thus, it holds that for each n N inf γ R k ρτ y t Y t β 1 Z tg n2 ) ρ τ y t Y t β 2 Z tg n2 ) ) Y t β 1 Y t β 2 β 1 β 2 n 1 n n Y t. Y t. n Y t. ˆRn β 2, 0, γ, ) inf ˆRn β 1, 0, γ, ) = ˆR n β 2, 0, g n2, ) ˆR n β 1, 0, g n1, ) β 1 β 2 n 1 γ R k n Y t. Because {n 1 n Y t } n N converges to E[ Y 1 ] a.s.-p by the Kolmogorov strong law of large numbers, the desired result follows by Andrews 1992, Lemma 2). 20

21 Lemma A.3: Suppose that Assumptions 1 and 2 hold. For each β B, {inf ˆQ α R k n β, α, )} converges to inf α R k Qβ, α) uniformly in β B a.s.-p Proof of Lemma A.3: Because the linear independence of the elements of X 1 = y 1, Y 1, Z 1) in Assumption 1 implies that for each β B, the distance between y 1 Y 1β and the ρτ -metric projection of y 1 Y 1β on the subspace spanned by Z 1 is positive, i.e., inf α R k 1 Rβ, α, 0) > 0. Because β inf α R k 1 Rβ, α, 0) : B R is continuous, it is bounded away from zero on B. The desired results from this fact and Lemma A.2, because r 1, r 2 ) r 1 /r 2 : R a, ) R is a Lipschitz function if a > 0. Lemma A.4: Suppose that Assumptions 1 and 2 hold. Let {b n } n N be a sequence of B-valued random vectors on Ω, F, P ) converging to β 0 a.s.-p in probability-p ). Let {a n } n N be sequences of k 1 1 vectors on Ω, F, P ) satisfying that for each n N, ˆR n b n, a n, 0, ) = inf α R k 1 ˆRn b n, α, 0, ). Then: a) Then a n α 0 a.s.-p in probability-p ). b) Let {c n } n N be a sequence of k 1 random vectors on Ω, F, P ) satisfying that for each n N ˆR n b n, a n, c n, ) = inf γ R k ˆR n b n, a n, γ, ). Then c n 0 a.s.-p in probability-p ), provided that the minimizer of Rβ 0, α 0, ) : R k R over R k is unique. Proof of Lemma A.4: We only prove the result for {a n } n N. The result for {c n } n N can be established in an analogous way. Suppose that b n β 0 a.s.-p. Then for each α R k1, { ˆR n b n, α, 0, )} n N converges to Rβ 0, α, 0) a.s.-p, because for each α R k1, { ˆR n β, α, 0, )} n N converges to Rβ, α, 0) uniformly in β B by Jennrich s uniform law of large numbers Jennrich 1969, Theorem 2). Further, we can apply Rockafellar 1970, Theorem 10.8) to show that the convergence is uniform in α over any compact subset of R k1, because for each n N, ˆR n b n, α, 0, ) is convex in α over R k1. Take an arbitrary compact subset A 1 of R k1 that contain α 0 in its interior. Then { ˆR n b n, α 0, 0, )} converges to Rβ 0, α 0, 0) a.s.-p ; { ˆR n b n, α, 0, )} converges to Rβ 0, α, 0) uniformly on α A 1 a.s.-p ; and Rβ 0, α 0, 0) < inf α A1 Rβ 0, α, 0), because α 0 is the unique minimizer of Rβ 0,, 0) on R k1 by Assumption 2. Because ˆR n b n, α, 0, ) is convex in α, it follows that ˆR n b n, α 0, 0, ) < inf α R k 1 \A 1 ˆRn b n, α, 0, ) 21

22 for almost all n N a.s.-p. That is, a n A 1 for almost all n N a.s.-p. Because A 1 is an arbitrary compact subset containing α 0 in its interior, this establishes the a.s.-p convergence of {a n } n N to α 0. The convergence of {a n } n N in probability in the current lemma immediately follows from the result of the a.s. convergence of {a n } n N by using the subsequence theorem. Proof of Theorem 4.1: By Assumption 2, β sup α R k Qβ, α) : B R is uniquely maximized at β 0. Because ˆβ n maximizes sup α R k ˆQ n β, α, ) with respect to β over the compact subset B, and {sup α R k ˆQ n β, α, )} n N converges to sup α R k Qβ, α) uniformly in β B a.s.-p, it follows by Pötscher and Prucha 1991, Lemma 4.2) that { ˆβ n } n N converges to β 0 a.s.-p. Further, applying Lemma A.4a) by setting b n = ˆβ n and a n = ˆα n establishes that the strong consistency of ˆα n for α 0. The result therefore follows. In proving Lemmas 4.2, 4.3 and Theorem 4.4, we use the following lemma. Lemma A.5: Suppose that Assumptions 1 3 hold, and let { d nj b nj, a nj, g nj ) : Ω B R k1 R k } n N be a sequence of random vectors that converges in probability-p to d 0 β 0, α 0, 0 1 k ), j = 1, 2. Then ˆR n b n2, a n2, g n2, ) ˆR n b n1, a n1, g n1, ) n = n 1 τ 1U t < 0)) X t d n2 d n1 ) d n2 d 0 ) J d n2 d 0 ) 1 2 d n1 d 0 ) J d n1 d 0 ) + o P n 1/2 d n2 d n1 + d n1 d d n2 d 0 2 )), 10) where X t Y t, Z t,1, Z t), t N. Proof of Lemma A.5: Define r : R R g+k1+k R g+k1+k R g+k1+k R by ry, x, d 1, d 2 ) 1 ) ρ τ y x d 2 ) ρ τ y x d 1 ) + τ 1y x d 0 < 0)) x d 2 d 1 ), d 2 d 1 y, x, d 1, d 2 ) R R g+k1+k R g+k1+k R g+k1+k, with the rule that devision by zero is zero. Also, following Pollard 1985), let ν n denote the standardized sample average operator such that for each function f : R R l+k R with E[ fy 1, X 1 ) ] < ν n f, ) = n 1/2 n fyt, X t ) E[fY 1, X 1 )] ), n N. 22

23 By the definition of r, we obtain that ˆR n d 2, ) ˆR n d 1, ) =Rd 2 ) Rd 1 ) l + n 1 n + n 1/2 d 2 d 1 ν n r,, d 1, d 2 ) τ 1U t < 0)) X t ) d 2 d 1 ) for each d 1, d 2 ) R l+k R l+k, where l is the gradient of R at β 0, α 0, 0 1 k ), which is equal to E[τ 1U 1 < 0)) X 1 ]. Taking the second-order Taylor expansion of Rd 1 ) and Rd 2 ) about d 0 on the right-hand side of this equality and replacing d 1 with d n1 and d 2 with d n2 in the resulting equality yields the desired result, if {ν n r,, d n1, d n2 )} n N converges to zero in probability-p. It thus suffices to show the convergence of {ν n r,, d n1, d n2 )} to zero in probability-p. It is straightforward to verify that ry 1, X 1, θ 1, θ 2 ) 2 X 1, from which it follows that [ ] E sup ry 1, X 1, d 1, d 2 ) 2 d 1,d 2) R g+k 1 +k R g+k 1 +k 4E[ X 1 2 ] <. Also, {r,, d 1, d 2 ) : d 1, d 2 ) R g+k1+k R g+k1+k } can be expressed as a sum of a fixed member of functions from a polynomial class. These facts imply that {ν n r,, d 1, d 2 )} n N is stochastically equicontinuous at d 0, d 0 ) Pollard 1985, pp ). Further, ry 1, X 1, d 1, d 2 ) 2 converges to zero as d 1, d 2 ) d 0, d 0 ) a.s.-p, and ry 1, X 1, d 1, d 2 ) 2 is dominated by 4 X 1 2 with a finite moment. It follows by the dominated convergence theorem that E[ry 1, X 1, d 1, d 2 ) 2 ] 0 as d 1, d 2 ) d 0, d 0 ). Now let {U n R g+k1+k R g+k1+k } n N be an arbitrary sequence of balls centered at d 0, d 0) that shrinks down to d 0, d 0). Then, as Pollard 1985, page. 309) explains, it follows from the above-mentioned facts that sup d1,d 2) U n ν n r,, d 1, d 2 ) 0 in probability-p. Thus, {ν n r,, d n1, d n2 )} converges to zero in probability-p, given that { d nj } n N converges to d 0 in probability-p, j = 1, 2. Lemma A.6: Let Ω, F, P ) be a probability space. Suppose that a sequence of random vectors {η n : Ω R m } n N and a sequence of random variables {ξ n : Ω R} n N satisfy that η naη n + ξ n 0 for each n N, where A is a positive definite m m symmetric matrix. Also, let {ζ n : Ω R} n N be a sequence of random variables. Suppose that ξ n = o P η n + η n 2 + ζ n ) as n. Then η n = o P ζ n 1/2 + 1) as n. We now prove Lemma 4.2. Proof of Lemma 4.2: The existence of {c n } follows immediately from the fact that the minimization of 23

24 ˆR n b n, a n, γ, ) in terms of γ is the ρτ -metric projection of y 1 Y 1b n Z 1,1a n, y 2 Y 2b n Z 1,2a n,..., y n Y nb n Z 1,na n ) on the space spanned by the rows of Z 1, Z 2,..., Z n ). To prove the second result, we first show that {c n } converges to 0 in probability-p, and then apply Lemmas A.5 and A.6. For each fixed γ R k, ˆR n β, α, γ, ) is convex in β and α. By the Kolmogorov strong law of large numbers and Hjort and Pollard 1993, Lemma 1), { ˆR n β, α, γ, )} n N converges to Rβ, α, γ) uniformly in β, α ) in each neighborhood of β 0, α 0) in probability-p. Because {b n, a n) } n N converges to β 0, α 0) in probability-p by the assumption, it follows that { ˆR n b n, a n, γ, )} n N converges to Rβ 0, α 0, γ) for each γ R k. Under Assumptions 1 3c), this fact implies by Hjort and Pollard 1993, Lemma 2) that {c n } converges to 0 in probability-p. We now set b n to both b n1 and b n2, c n to g n1, g n C b n β 0 n + Jγγ 1 n 1 τ 1U t < 0)) Z t. a n α 0 to g n2 in 10) and multiply the resulting equality by n to obtain that 0 n ˆR n b n, a n, g n, ) ˆR n b n, a n, c n, )) = 1 2 n1/2 c n g n) J γγ n 1/2 c n g n) + o P n 1/2 c n g n + n b n β n a n α n c n 2 + n g n 2) = 1 2 n1/2 c n g n) J γγ n 1/2 c n g n) + o P n 1/2 c n g n ) + n c n g n 2 + n b n β n a n α ), where the second equality holds because g n = O P b n β 0 + a n α 0 + 1) and c n = O P c n g n + g n ). The result follows from this inequality by Lemma A.6. Proof of Lemma 4.3: Let {c n } n N be as in Lemma 4.2. Note that the difference between { ˆR n b n, a n, c n, )} n N and { ˆR n b n, a n, 0, )} n N converges to zero in probability-p. Applying the delta method with this fact, we 24

25 obtain that n log ˆQ n b n, a n, ) = nlog ˆR n b n, a n, c n, ) log ˆR n b n, a n, 0, )) 1 = Rβ 0, α 0, 0) n ˆR n b n, a n, c n, ) ˆR n b n, a n, 0, )) 11) 1 2Rβ 0, α 0, 0) 2 n ˆR n b n, a n, c n, ) Rβ 0, α 0, 0)) Rβ 0, α 0, 0) 2 n ˆR n b n, a n, 0, ) Rβ 0, α 0, 0)) 2 + o P n ˆRn b n, a n, c n, ) Rβ 0, α 0, 0)) 2 + n ˆR n b n, a n, 0, ) Rβ 0, α 0, 0)) 2). We apply Lemma A.5 to each of the non-remainder terms on the right-hand side of this equality: n ˆR n b n, a n, c n, ) ˆR n b n, a n, 0, )) = 1 2 n1/2 c nj γγ n 1/2 c n + o P n 1/2 c n + n b n β n a n α c n 2) = 1 2 n1/2 c nj γγ n 1/2 c n + o P n 1/2 b n β 0 + n 1/2 a n α 0 + n b n β n a n α ), and n 1/2 ˆR n b n, a n, c n, ) Rβ 0, α 0, 0)) = n 1/2 ˆR n b n, a n, c n, ) ˆR n β 0, α 0, 0, )) + n 1/2 ˆR n β 0, α 0, 0, ) Rβ 0, α 0, 0)) = n n 1 τ 1U t < 0)) Y t, Z t,1) ) n 1/2 θ n θ 0 ) + n 1/2 ˆR n β 0, α 0, 0, ) Rβ 0, α 0, 0)) + o P n 1/2 b n β 0 + n 1/2 a n α 0 + n b n β n a n α ), n 1/2 ˆR n b n, a n, 0, ) Rβ 0, α 0, 0)) = n 1/2 ˆR n b n, a n, 0, ) ˆR n β 0, α 0, 0, )) + n 1/2 ˆR n β 0, α 0, 0, ) Rβ 0, α 0, 0)) n ) = n 1 τ 1U t < 0)) Y t, Z t,1) n 1/2 b n β 0 ) + n 1/2 ˆR n β 0, α 0, 0, ) Rβ 0, α 0, 0)) + o P n 1/2 b n β 0 + n 1/2 a n α 0 + n b n β n a n α ). Substituting these into 11) and applying Lemma 4.2 yields the desired result. Proof of Theorem 4.4: θ n θ 0 Let 1 Rβ 0, α 0, 0) K 1 C n 1 n τ 1U t < 0)) Z t, n N, 25

26 and let b n and a n denote the vectors containing the first g elements and the remaining elements of θ n, respectively. Then, by Lemma 4.3, we have that 0 n log ˆQ n ˆβ n, ˆα n, ) n log ˆQ n b n, a n, ) = 1 2 n1/2 ˆθ n θ n) Kn 1/2 ˆθ n θ n) + o P n 1/2 ˆβ n β 0 + n 1/2 ˆα n α 0 + n ˆβ n β n ˆα n α ). The first result follows from this equality by Lemma A.6. For the second result, apply the central limit theorem CLT) for i.i.d. random vectors Rao 1973, p. 128) to show that {n 1/2 n τ 1U t < 0)) Z t } n N is asymptotically distributed with N0, Rβ 0, α 0, 0) 2 V ), and then apply the continuous mapping theorem. Proof of Theorem 4.5: To prove a), let { θ n} n N be as in the proof of Theorem 4.4 and {δ n } n N an arbitrary sequence of g + k 1 ) 1 random vectors that converges to the origin in probability-p. Recall that the expression consisting of the second and third terms on the right-hand side of 5) is minimized when b n, a n) = θ n, and that {n 1/2 ˆθ n θ n)} n N converges to zero in probability-p by Theorem 4.4. Using these facts with Lemma 4.3, we can show that n log ˆQ n ˆθ n, ) n log ˆQ n θ n, ) = o P 1) and n log ˆQ n ˆθ n + δ n, ) n log ˆQ n θ n, ) = 1 2 n1/2 ˆθ n θ n + δ n ) Kn 1/2 ˆθ n θ n + δ n ) + o P n 1/2 δ n + δ n ) = 1 2 n1/2 δ nkn 1/2 δ n + o P n 1/2 δ n + δ n ) 12) By taking each of τ n e i + τ n e j, τ n e i + τ n e j, τ n e i τ n e j, and τ n e i τ n e j for δ n in this equality and using the resulting equalities in the definition of ˆK nij i, j = 1, 2,..., l), we obtain that 4nτ 2 n ˆK nij = 4τ 2 nk ij + o P n 1/2 τ n + τ 2 n + 1 ). Dividing both sides of this equality by 4nτ 2 n and applying Assumption 4 yields the desired result. To prove b), let δ be an arbitrary g +k 1 ) 1 vector. By Lemma 4.2, we have that for each i = 1, 2,..., k and each j = 1, 2,..., l, ˆγ ni ˆθ n + τ n δ) ˆγ ni ˆθ n τ n δ) = 2τ n Cδ n + o P τ n ). It follows that 1 2τ n ˆγ n ˆβ n + τ n δ) ˆγ n ˆβ n τ n δ)) = Cδ + o P 1). Taking e j for δ for each j = 1, 2,..., k in this equality completes the proof. 26

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION VICTOR CHERNOZHUKOV CHRISTIAN HANSEN MICHAEL JANSSON Abstract. We consider asymptotic and finite-sample confidence bounds in instrumental

More information

Approximate Distributions of the Likelihood Ratio Statistic in a Structural Equation with Many Instruments

Approximate Distributions of the Likelihood Ratio Statistic in a Structural Equation with Many Instruments CIRJE-F-466 Approximate Distributions of the Likelihood Ratio Statistic in a Structural Equation with Many Instruments Yukitoshi Matsushita CIRJE, Faculty of Economics, University of Tokyo February 2007

More information

The properties of L p -GMM estimators

The properties of L p -GMM estimators The properties of L p -GMM estimators Robert de Jong and Chirok Han Michigan State University February 2000 Abstract This paper considers Generalized Method of Moment-type estimators for which a criterion

More information

Exponential Tilting with Weak Instruments: Estimation and Testing

Exponential Tilting with Weak Instruments: Estimation and Testing Exponential Tilting with Weak Instruments: Estimation and Testing Mehmet Caner North Carolina State University January 2008 Abstract This article analyzes exponential tilting estimator with weak instruments

More information

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Vadim Marmer University of British Columbia Artyom Shneyerov CIRANO, CIREQ, and Concordia University August 30, 2010 Abstract

More information

Lecture 11 Weak IV. Econ 715
