Instrumental Variables Estimation and Weak-Identification-Robust. Inference Based on a Conditional Quantile Restriction

Size: px
Start display at page:

Download "Instrumental Variables Estimation and Weak-Identification-Robust. Inference Based on a Conditional Quantile Restriction"

Transcription

1 Instrumental Variables Estimation and Weak-Identification-Robust Inference Based on a Conditional Quantile Restriction Vadim Marmer Department of Economics University of British Columbia vadim.marmer@gmail.com and Shinichi Sakata Department of Economics University of Southern California shinichi.sakata@gmail.com August 17, 2011 Extending the L 1-IV approach proposed by Sakata 1997, 2007), we develop a new method, named the ρ τ -IV estimation, to estimate structural equations based on the conditional quantile restriction imposed on the error terms. We study the asymptotic behavior of the proposed estimator and show how to make statistical inferences on the regression parameters. Given practical importance of weak identification, a highlight of the paper is a proposal of a test robust to the weak identification. The statistics used in our method can be viewed as a natural counterpart of the Anderson and Rubin s 1949) statistic in the ρ τ -IV estimation. 1

2 1 Introduction In this paper, we develop a new method, named the ρ τ -IV estimation, to estimate structural equations based on the conditional quantile restriction imposed on the error terms, extending the L 1 -IV approach proposed by Sakata 1997, 2007). We study the large sample behavior of the new estimator and show how to make statistical inferences on the regression parameters. In particular, we pay attention to the statistical inference under weak identification, as the weak identification is as important a possibility in the regression based on a conditional quantile restriction as in that based on the conditional mean restriction. We propose a weak-identification-robust test that can be viewed as a natural counterpart of the Anderson and Rubin s 1949) statistic in ρ τ -IV estimation. The conventional instrumental variables IV) estimator is based on the identification of the structural parameters through the conditional mean restriction that the mean of the structural error term conditional on a set of instrumental variables is zero. The conditional mean restriction may look appealing, because, unlike the independence between the error term and the instruments, it does not impose restrictions on other features of the conditional distribution of the error term such as the variance of it. Nevertheless, the conditional mean restriction is considered unsuitable in some applications. The conditional mean of a random variable critically depends on the tails of the conditional distribution of the variable. A small change in the tails can cause a large change in the conditional mean. In many applications, on the other hand, we know little about the part of the population distribution that correspond to the tails of the error distribution. This often makes it difficult to justify the conditional mean restriction. The conditional mean restriction is not the only natural way to identify the parameters of structural equations. In many applications, the conditional mean restriction comes from an informal intuition that the location of the conditional distribution of the error term given a suitably chosen set of instruments should be constant. When we are faced by the above-mentioned concern about the conditional mean restriction, one would desire to capture the location of the conditional distribution of the error term by a measure that does not depend on tails. The conditional quantiles of the error term are examples of such location measures. Sakata 1997, 2007) proposes identifying and estimating the regression parameters based on the conditional median restrictions. Chernozhukov and Hansen 2001, 2006) also consider identification of the regression parameters based on the conditional quantile restrictions and propose an estimation method, tak- 2

3 ing an approach related to but different from Sakata s. In the current paper, we extend the estimator of Sakata 1997, 2007) to propose an method called the ρ τ -IV method to estimate regression models with the conditional τ-quantile restriction. Being based on the same identification condition, our estimator is closely related to Chernozhukov and Hansen s estimator. The computation burden of the two estimators are also comparable, as should be clear from the discussion in Section 3 of the current paper. A benefit of our approach is that the objective function to be maximized in the ρ τ -IV estimation takes a form similar to the variance ratio in the normal) limited information maximum likelihood LIML) estimation. This allows us to formulate a statistic analogous to the Anderson-Rubin AR) statistic, with which we can make weak-identification-robust inference on the regression parameters of interest. In the IV regression literature, many researchers have been paying attention to possible identification issues. Sargan 1983) points out that near violation of identifiability is problematic. The analysis of Phillips 1984, 1985) on the exact finite sample distribution of LIML clearly shows that lack of identifiability in structural equation estimation keeps the LIML estimator from consistently estimating the coefficient of the structural equation. Hillier 1990) also shows analogous results in considering the directional estimation of the coefficients of structural equations. Choi and Phillips 1992) further explores the behavior of the IV estimator under lack of identifiability. When instrumental variables are poorly correlated with endogenous explanatory variables in linear regression, the asymptotic distribution of the IV estimator is quite different from what the standard large sample theory suggests, as demonstrated by Nelson and Startz 1990b, 1990a) and Bound, Jaeger, and Baker 1995). Staiger and Stock 1997) propose an alternative way to approximate the distribution of the IV estimator with weak instruments. Stock and Wright 2000) then establish a way to approximate the distribution of generalized-method-moments GMM) estimators under weak identification. The proposed approximation methods are useful in theoretically studying the nature of the IV and GMM estimators under weak identification. Nevertheless, they do not offer a way to approximate the distribution of the estimators based on data, involving some unidentifiable nuisance parameters. Given the absence of a convenient and reliable approximation to the distribution of the IV estimator with weak instruments, it is difficult to perform tests of hypotheses on regression parameters in the usual style i.e., the t-test, the Wald test, etc.). On the other hand, the AR test originally proposed in Anderson 3

4 and Rubin 1949) is not affected by weakness of instruments. For this reason, Staiger and Stock 1997) and Dufour 1997) recommend the use of the AR test. The AR test even has nice power properties if the number of instruments is equal to the number of endogenous explanatory variables Moreira 2001, Andrews, Moreira, and Stock 2004). The weak identification is also an important possibility in regression based on a conditional quantile restriction. To this end, we propose a test that has asymptotically correct size regardless of whether the identification is strong or not. The hypothesis we consider is that some regression parameters are equal to prespecified values. If we apply the ρ τ -IV method imposing the constraints of the null hypothesis, the objective function in the ρ τ -IV estimation maximized subject to the parameter constraints of the null hypothesis tends to be close to one under the null. The constraint maximum of the objective function is similar to the Anderson and Rubin 1949) statistic in the sense that it captures how much of the fitted structural error can be explained by the instruments. It, ranging between zero and one, is closed to one if the fitted structural residuals cannot be fitted by the instruments in the sample. Its value far from one is thus taken as an evidence against the null in our test. If the conditions in the null hypothesis include the coefficients of all regressors potentially weakly related to the instruments excluded from the regression function, then the proposed test involves no weak identification problem, so that our test is robust to weakidentification. Our test is closely related to Chernozhukov and Hansen 2008). They formulate a test in a way convenient in the estimation framework of Chernozhukov and Hansen 2001, 2006), while we propose a test convenient in the ρ τ -IV estimation. Another paper related to our test is Jun 2008). Jun formulates a test adapting the approach of Kleibergen 2005). The rest of the paper is organized as follows. We first describe the basic setup and define the ρ τ -IV estimator in Section 2. Then, after briefly discussing the computation of the ρ τ -IV estimator in Section 3, we establish the consistency and asymptotic normality of the ρ τ -IV estimator and explains how to consistently estimate the asymptotic covariance matrix of ρ τ -IV estimator in Section 4. In Section 5, we develop a weakidentification-robust method to test hypotheses on the regression parameters. Throughout the paper, denotes the Euclidean norm for vectors and the Frobenius norm for matrices, and limits are taken along the sequence of sample sizes growing to infinity, unless otherwise indicated. 4

5 2 ρ τ -IV Estimator Assumption 1: Let Ω, F, P ) be a probability space. The data are a realization of an independently and identically distributed stochastic process {X t y t, Y t, Z t) : Ω R R g R k } t N such that E[ X 1 ] <, and for each c R 1+g+k \{0}, P [c X t = 0] < 1. Partition Z t as Z t = Z t,1, Z t,2). where Z t,1 is k 1 1, and Z t,1 is k 2 1 so that k 1 +k 2 = k). The parameter of interest is the coefficients in regression of y t on Y t and Z t,1 described in the next assumption. Assumption 2: The subset B of R g is nonempty and compact. There exists a unique θ 0 β 0, α 0) B R k1 such that the conditional τ-quantile of U 1 y 1 Y 1β 0 Z 1,1α 0 given Z 1 is zero, where τ is a known real constant in 0, 1). If instead the conditional τ-quantile of U 1 given Y 1 and Z 1,1 is known to be zero, β 0 and α 0 could be consistently estimated by the estimator of Koenker and Bassett 1978). In our current setup, Koenker and Bassett s estimator is inconsistent in general. We here propose an estimator of the structural regression coefficients, following the approach described in Section 11 of Sakata 2007). Define ρ τ : R R by ρ τ v) τ 1v < 0)) v, v R, where 1A) is the indicator function that becomes one if and only if the condition A is true. Also define functions R : B R k1 R k R and Q : B R k1 R by Rβ, α, γ) E[ρ τ y 1 Y 1β Z 1,1α Z 1γ)], β, α, γ) B R k1 R k and Qβ, α) inf γ Rk Rβ, α, γ), β, α) B R k1, Rβ, α, 0) where Rβ, α, 0) > 0 by the linear independence of the elements of X t = y t, Y t, Z t) required by Assumption 1. Because of the conditional τ-quantile restriction imposed on U 1 in Assumption 2, we have that Qβ 0, α 0 ) = 1, so that for each β, α) B R k1 0 Qβ, α) Qβ 0, α 0 ) = 1. 1) 5

6 It follows that θ 0 β 0, α 0) is the maximizer of Q over Θ B R k1. Our estimator is the maximizer of the sample counterpart of Q, which is given by a sequence of random functions { ˆQ n : B R k1 Ω R} n N defined by inf γ R k 2 ˆRnβ,α,γ,ω), if inf ˆR ˆQ n β, α, ω) nβ,α,0,ω) b,a) B R k 1 ˆRn b, a, 0, ω) > 0, 1, otherwise, β, α) B R k1, ω Ω, n N, where ˆR n β, α, γ, ω) n 1 n ρ τ y t ω) Y t ω) β Z t,1 ω) α Z t ω) γ), β, α, γ) B R k1 R k, ω Ω, n N. We now define our estimator. Definition 1 The ρ τ -IV estimator): Given Assumption 1, a sequence of random vectors {ˆθ ˆβ n, ˆα n) : Ω B R k1 } n N is called the ρ τ -IV estimator if for each n N, ˆQ n ˆβ n, ˆα, ) = sup β,α) B R k 1 ˆQn β, α, ). For each β, α) B R k1, we have that inf ˆRn β, α, γ, ) = inf n 1 γ R k γ 1,γ 2) R k 1 R k 2 = inf n 1 γ 1,γ 2) R k 1 R k 2 n n ρ τ y t Y t β Z t,1α + γ 1 ) Z t,2γ 2 ) ρ τ y t Y t β Z t,1γ 1 Z t,2γ 2 ) = inf γ R k ˆRn β, 0, γ, ). Given this fact, it holds that whenever ˆR n β, α, 0, ) > 0 for every β, α) B R k1, sup α R k 1 inf ˆQ n β, α, ) = sup ˆR γ R k n β, 0, γ, ) = inf ˆR γ R k n β, 0, γ, ), β B. 2) α R k 1 ˆR n β, α, 0, ) inf α R k 1 ˆRn β, α, 0, ) Because the numerator and denominator of the ratio on the right-hand side of 2) are continuous in β, sup α R k 1 ˆQn β, α, ) is continuous in β in all realizations, whenever ˆR n β, α, 0, ) > 0 for every β, α) B R k1. The continuity of sup α R k 1 ˆQn β, α, ) in β is also satisfied when ˆR n β, α, 0, ) can touches zero, because ˆQ n β, α, ) = 1 in such case. Thus, given the compactness of B, ρ τ -IV estimator ˆβ n of β 0 exists by the standard result on the existence of extremum estimators such as Gallant and White 1988, Theorem 2.2). 6

7 Further, ˆα n is the solution of inf α R k 1 ˆR n ˆβ n, α, 0, ) = inf n 1 α R k 1 n ρ τ y t Y t ˆβ n ) Z t,1α). That is, it is the Koenker and Bassett s 1978) quantile regression estimator taking y t Y t ˆβ n ) for the dependent variable and Z t,1 for the regressors, which surely exists. Theorem 2.1: Given Assumption 1, the ρ τ -IV estimator exists. Remark. We could avoid the compactness requirement of B by first defining the ρ τ -IV directional estimator, as Sakata 2007) does, and then deriving the slope estimator in Definition 1 from it. We, however, directly define the slope estimator by imposing compactness on B for saving space in this paper. 3 Computation of the ρ τ -IV Estimator We could calculate the ρ τ -IV estimator, adapting the algorithm described in Sakata 2007) for the case τ = 0.5 in the straightforward manner. Sakata s algorithm is, however, slow if k 1 is large, because it uses a global search algorithm to minimizes ˆQ n over B R k1. Given a β, however, the ratio on the right-hand side of 2) can be quickly calculated, because the minimization problems appearing in both the numerator and denominator of the ratio can be rewritten as linear programming problems, as Koenker and Bassett 1978) explains. Thus, the ρ τ -IV estimator can be calculated by maximizing the ratio in terms of β over B. Because the ratio may have local maximum, it is advisable to use a global search algorithm such as the simulated annealing algorithm in calculating ˆβ n, while ˆα n is the solution of the minimization problem in the denominator calculated with ˆβ n. 4 Large Sample Properties of the ρ τ -IV Estimator In investigating the consistency of the ρ τ -IV estimator, it is convenient to consider the population counterpart of 2), i.e., inf γ R k Rβ, 0, γ) sup Qβ, α) = sup α R k 1 α R k 1 Rβ, α, 0) = inf γ Rk Rβ, 0, γ), β B. 3) inf α R k 1 Rβ, α, 0) 7

8 By Assumption 2, β sup α R k 1 Qβ, α) : B R is a continuous function uniquely maximized at β 0. We can also show that {inf α R k 1 ˆQn β, α, )} n N converges to inf α R k 1 Qβ, α) uniformly in β B a.s.-p Lemma A.3). By a standard result on consistency of extremum estimators e.g., Pötscher and Prucha 1991, Lemma 4.2), we can establish the consistency of { ˆβ n } n N for β 0. The estimator ˆα n, on the other hand, minimizes ˆR n ˆβ n, α, 0, ) with respect to α over R k1. Given the strong consistency { ˆβ n } for β 0, we can verify the a.s.-p convergence of ˆR n ˆβ n, α, 0, ) to Rβ 0, α, 0) for each α R ki and utilize the convexity of ˆR n ˆβ n, α, 0, ) in α to establish the strong consistency of ˆα n for α 0. Theorem 4.1: Under Assumptions 1 and 2, {ˆθ n = ˆβ n, ˆα n) } n N converges to θ 0 = β 0, α 0). In establishing the asymptotic normality of the ρ τ -IV estimator, we impose the additional conditions stated in the next theorem. Assumption 3: a) The minimizer of Rβ 0, α 0, ) : R k R over R is unique hence, it is uniquely minimized at the origin). b) The vector β 0 is interior to B. Also, a neighborhood B 0 B of β 0, a neighborhood A 0 R k1 of α 0, and a neighborhood Γ 2,0 R k2 of the origin satisfy the following conditions: i) The conditional distribution y 1 given Y 1 and Z 1 has a continuous probability density function pdf) f Y 1, Z 1 ) at Y 1β + Z 1,1α + Z 1,2γ 2 for each β, α, γ 2 ) B 0 A 0 Γ 2,0 a.s.-p. ii) There exists a random variable D : Ω R with a finite absolute moment such that for each β B 0, each α A 0, and each γ 2 Γ 2,0, fy 1β + Z 1,1α + Z 1,2γ 2 Y 1, Z 1 ) Y Z 1 2 ) < D. 4) c) Let J be the Hessian of R at β 0, α 0, 0 k 1 ) and partition it as J ββ J βα J βγ J J αβ J αα J αγ, J γβ J γα J γγ where J ββ is g g, J αα is k 1 k 1, and J γγ is k k. Then the matrix J θθ J ββ J αβ J βα J αα 8

9 is positive definite, and J γθ J γβ, J γα ) is of full column rank. d) E[ Y Z 1 2 ] <. Assumption 3b) ensures the twice continuous differentiability of R in a neighborhood of β 0, α 0, 0) in R g R k1 R k, which then implies the twice continuous differentiability of Q in a neighborhood of θ 0 = β 0, α 0). The first condition in Assumption 3c) ensures that the Hessian of Rβ 0, α 0, ) : R k R at its minimum is negative definite. Under these conditions, the Hessian of β, α) log Qβ, α) : B R k1 R at β 0, α 0) is guaranteed to be positive definite, being equal to K, where K Rβ 0, α 0, 0) 1 J θγ J 1 γγ J γθ. The full column rankness of J γβ means that within a neighborhood of β 0, α 0), moving β, α ) away from β 0, α 0) causes the gradient of Rβ, α, ) : R k R to be bounded away from zero uniformly in all directions, so that we can choose γ to make Rβ, α, γ) smaller than Rβ, α, 0), once β, α ) deviates from β 0, α 0). Assumption 3d) ensures that the Lindeberg-Levy Central Limit Theorem Rao 1973, p. 127) applies to the generalized score of the ρ τ -IV estimator. The moment requirements in Assumptions 3b,d) are mild. If fy 1 Y 1β Z 1,1α Z 1γ Y 1, Z 1 ) is bounded, they merely require that each element of Y 1 and Z 1 has a finite second moment, while the asymptotic normality of the conventional IV estimator is typically established under the assumption that the fourth moments of the dependent variable, the regressors, and the instruments are finite. Lemma 4.2: Suppose that Assumptions 1 3 hold. Let {b n } n N and {a n } n N be sequences of B- and R k1 - valued random vectors, respectively. Then there exists a sequence of k 1 random vectors c n such that for each n N, ˆRn b n, a n, c n, ) = inf γ R k ˆR n b n, a n, γ, ). If, in addition, Assumptions 3 hold, and b n β 0 and a n α 0 in probability-p, then n 1/2 c n = Cn 1/2 b n β 0 + J 1 a n α 0 and C J 1 γγ J γθ. γγ n 1/2 n τ 1U t < 0)) Z t + o P n 1/2 b n β 0 + n 1/2 a n α ), Using this lemma, we can now approximate log ˆQ n. Lemma 4.3: Suppose that Assumptions 1 3 hold and let {b n } n N and {a n } n N be sequences of B- and 9

10 R k1 -valued random vectors, respectively, that converge to β 0 and α 0. Write θ n b n, a n). Then n log ˆQ 1 n ) n ) n b n, a n, ) = n 1/2 τ 1U t < 0)) Z t Jγγ 1 n 1/2 τ 1U t < 0)) Z t 2Rβ 0, α 0, 0) 1 n Rβ 0, α 0, 0) n 1/2 τ 1U t < 0)) Z tcn 1/2 θ n θ 0 ) 1 2 n1/2 θ n θ 0 ) Kn 1/2 θ n θ 0 ) + o P n 1/2 b n β 0 + n b n β ). 5) Given this lemma, it is natural to expect that the minimizer of the the second and third terms on the righthand side of 5) approximates ˆθ = ˆβ n, ˆα n). The next theorem confirms that such approximation bears an o P 1) approximation error, and derives the asymptotic distribution of {ˆθ n } n N based on the approximation. and Theorem 4.4: Suppose that Assumptions 1 3 hold. Then n 1/2 1 ˆθ n θ 0 ) = Rβ 0, α 0, 0) K 1 C n 1/2 D 1/2 n 1/2 ˆθ n θ 0 ) A N0, I l ), n 1U t < 0) τ) Z t + o P 1), where D K 1 C V CK 1, K = Rβ 0, α 0, 0) 1 J θγ J 1 γγ J γθ as introduced earlier), and V τ1 τ)rβ 0, α 0, 0) 2 E[Z 1 Z 1]. To estimate the asymptotic covariance matrix D consistently, we need to estimate V, K, and C consistently. For consistent estimation of V, we can use its sample analogue, ˆV n τ1 τ) ˆR n ˆβ n, ˆα n, 0, ) 2 n 1 n Z t Z t. On the other hand, K and C are more complicated, depending on J, the Hessian of R. The Hessian of ˆR n is zero at each point in B R k1 R k, at which it is differentiable. This rules out estimation of J by using of the Hessian of ˆR n. A way to overcome the difficulty in estimation of K and C is to employ the numerical differentiation approach described in Newey and McFadden 1994, Section 7.3). Because K is the Hessian of β, α) log Qβ, α) : B R k1 R at β 0, α 0), 1 times a second-order numerical derivative of log ˆQβ, α, ) at ˆβ n, ˆα n) is our estimator of K. Let e m i denote the unit vector along the ith axis of the Cartesian coordinate system in R m. Assume: 10

11 Assumption 4: The sequence {h n } n N consists of positive possibly random) numbers such that h n 0 and n 1/2 h n in probability-p. Then our estimator of K is ˆK n, whose i, j)-element is given by ˆK nij 1 4h 2 log ˆQ n ˆθ n + h n e i + h n e j, ) log ˆQ n ˆθ n h n e i + h n e j, ) n log ˆQ n ˆθ n + h n e i h n e j, ) + log ˆQ n ˆθ n h n e i h n e j, )), i, j) {1,..., g + k 1 )} 2, n N. For C, we utilize the result of Lemma 4.2, which suggests that perturbation in ˆθ n = ˆβ n, ˆα n) would change ˆγ n approximately by C times the change in ˆθ n. Let ˆγ ni θ) denote the ith element in the usual quantile regression estimator in regression of y t Y t, Z t,1)θ on Z t i {1, 2,..., k}). Then our estimator of C is Ĉn whose i, j)-element is given by Ĉ nij 1 2h n ˆγ ni ˆθ n + h n e j ) ˆγ ni ˆθ n h n e j )), i {1,..., k}, j {1,..., g + k 1 )}. Given the estimators of K and C, we estimate D by ˆD n ˆK + n Ĉ n ˆV n Ĉ n ˆK+ n, n N, where ˆK + n is the Moore-Penrose MP) inverse of ˆK n we use the MP inverse instead of the regular inverse to ensure that this estimator is well defined for every realization). Theorem 4.5: Suppose that Assumptions 1 4 hold. Then: a) { ˆK n } n N is weakly consistent for K. b) {Ĉn} n N is weakly consistent for C. c) { ˆV n } n N is weakly consistent for V. d) { ˆD n } n N is weakly consistent for D. Remark. The same step size h n is used in each element of ˆK n and Ĉn just for simplicity. One could use a different step size for each element in ˆK n and Ĉn without affecting the consistency results in Theorem 4.5, as long as the step size satisfies the requirements in Assumption 4. 11

12 5 Testing on the Regression Coefficients under Possible Weak Identification When β, α) log Qβ, α) is flat in some directions from β 0, α 0 ), compared with the size of the error in approximating log Q by log ˆQ n, the large sample distribution of the ρ τ -IV estimator established in Section 4 can be unreliable, because the estimator can easily go astray. In other words, we may experience the so-called weak identification problem in the ρ τ -IV estimation. The flatness of β, α) log Qβ, α) described above implies near singularity of K, which is 1 times the Hessian of log Qβ, α). Because the large sample analysis in Section 4 involves the inverse of K, the near singularity of K makes the results in Section 4 unreliable unless the sample size is extremely large. To verify that the nearly singular K can arise in practice, suppose that Y t is related to Z t through Y t = Π 0 Z t + V t, t N, where Π 0 is a g k constant matrix, and V t is a g 1 zero-mean random vector independent from Z t. Let f U Z 1 ) denote the conditional pdf of U 1 given Z 1. Then, under our current assumption, we have that [ J θγ = 2E f U 0 Z) Y ] [ 1 Z 1 = 2E f U 0 Z 1 ) Π ] 0Z 1 Z 1 Z 1,1 If the last k 2 columns of Π 0 is close to zero, each of the first g rows of J θγ can be well approximated by a linear combination of the last k 1 rows of J θγ ; i.e., the columns of J θγ becomes nearly dependent. This causes K = Rβ 0, α 0, 0) 1 J θγ J 1 γγ J γθ to be nearly singular and raises concerns about inference on β 0 and α 0, relying on the asymptotics in Section 4. Z 1,1 Suppose that we are interested in the hypothesis that H 0 : β 0 = β, where β is a known g 1 constant vector. In the usual IV regression based on the zero-conditional mean restriction imposed on the error term, the AR test is known to be robust to weakness of instruments. Given the structural equation estimated under the constraint of the null hypothesis, the AR test regresses the null-restricted fitted structural error term on all instruments and checks if R 2 is close to zero. If R 2 is high enough, it rejects the null hypothesis. Because the AR test rejects the null when 1 R 2 is close to zero, we can view the AR test as rejecting the null hypothesis when the null-restricted fitted structural error term can be well explained by the instruments. Note that 1 R 2 is equal to the ratio of the two sample second moments. The denominator in the ratio 12

13 is the sample second moment of the fitted structural error term, while the numerator is the sample second moment of the residuals in regression of the structural error term on the instruments. This view gives us a way to adapt Anderson and Rubin s 1949) approach in our problem setup. Namely, we replace the sample second moment in 1 R 2 with the corresponding average check functions. The resulting statistic is ˆQ n β, ˆα 0 n, ), where ˆα 0 n is the ρ τ -IV estimator obtained imposing the constraint of H 0, which is exactly equal to the Koenker and Bassett s 1978) estimator in regression of y t Y t β on Z 1,t. For convenience, we take the logarithm of it and multiply it by 2n to define a test statistic J n. J n 2n log ˆQ n β, ˆα n, 0 ) = 2n log inf ˆR γ R k n β, 0, γ, ) inf α R k 1 ˆRn β,, n N. 6) α, 0, ) Let ᾱ be a k 1 1 vector such that Z 1,1ᾱ be the ρ τ -metric projection of y 1 Y 1 β on the linear space spanned by the elements of Z 1,1. Then the standard large sample analysis on extremum estimation shows that n 1 J n 2 sup log Q β, α) = 2 log Q β, ᾱ) in probability-p. α R k 1 Under H 0, the right-hand side of this equality is zero, because sup Q β, α) = sup Qβ 0, α) = Qβ 0, α 0 ) = 1. α R k 1 α R k 1 Under the alternative, on the other hand, the limit of {n 1 J n } n N is strictly positive, because Q β, ᾱ) < Qβ 0, α 0 ) = 1. Thus, a test based on J n should reject H 0 if J n exceeds a suitably chosen critical value. We will discuss how to find the critical value below. Define C 0 J 1 γγ J γα and L Rβ 0, α 0, 0) 1 J αα. Lemma 5.1: Suppose that Assumptions 1 3 hold. If in addition H 0 is true, then J n A η L 1 C 0 C 0 L C 0 ) 1 C 0 ) η, where η is a k 1 random vector distributed with N0, V ), Thus, {J n } n N has a non-degenerate limiting distribution, though it is not asymptotically pivotal. Among the unknown parameters in the formula for the asymptotic distribution of {J n } n N, C 0 can be consistently 13

14 estimated by applying {Ĉn} n N under the null Theorem 4.5). Write ˆθ 0 n β, ˆα 0 n ). Then our estimator of C 0 is Ĉ0 n whose i, j)-element is equal to Ĉ 0 nij 1 2h n ˆγ ni ˆθ 0 n + h n e g+j ) ˆγ ni ˆθ 0 n h n e g+j )), i {1,..., k}, j {1,..., k 1 }. Analogously, V can be estimated by ˆV 0 n ˆR n ˆβ 0 n, ˆα 0 n, 0, ) 2 n 1 n Z tz t. The matrix L is the Hessian of γ log Rβ 0, α 0, γ) : R k R at the origin. We take a second-order numerical derivative of the sample counterpart of this function to estimate L. The resulting estimator ˆL n of L is the k k matrix with i, j)-element equal to ˆL nij 1 4h 2 log ˆR n β, ˆα n, 0 ˆγ n 0 + h n e i + h n e j, ) log ˆR n β, ˆα n, 0 ˆγ n 0 h n e i + h n e j, ) n log ˆR n β, ˆα 0 n, ˆγ 0 n + h n e i h n e j, ) + log ˆR n β, ˆα 0 n, ˆγ 0 n h n e i h n e j, )), i, j = 1, 2,..., k, where ˆγ 0 n is the τ-quantile regression estimator in regressing y t Y t β Z 1,t ˆα 0 n on Z t. for L. Lemma 5.2: Suppose that Assumptions 1 3 and 4 hold. If in addition H 0 holds, {ˆL 0 n} n N is consistent The limiting distribution of {J n } n N is that of a positive random variable whose distribution function is positively sloped at each positive point. Let cp, C, L, Ṽ ) denote the 1 α)-quantile of η L + C C L C) + C ) η for each k l matrix C, each k k symmetric matrix L, and each k k symmetric matrix Ṽ, where η is a k 1 random vector distributed with N0, Ṽ ), and p 0, 1), where a, b) denotes the open interval between real numbers a and b. We here propose a test that rejects H 0 if and only if J n exceeds cp, Ĉn, ˆL n, ˆV n ), where p is the desired size of the test. This test has the correct asymptotic size and it is consistent, as stated in the next theorem. Theorem 5.3: Suppose that Assumptions 1 4 hold. Then: a) If in addition H 0 holds, for each p 0, 1), P [J n > cp, Ĉ0 n, ˆL 0 n, ˆV 0 n )] p. b) Suppose instead that H 0 is violated, that R β,, 0) : R k1 R has a unique maximizer on R k1, and that R β, 0, ) : R k R has a unique minimizer on R k. Then for each p 0, 1), P [J n > cp, Ĉ0 n, ˆL 0 n, ˆV 0 n )] 1. 14

15 Because each quadratic form of normal random variables can be easily rewritten as a linear combination of χ 2 random variables using the eigenvalue decomposition, cp, Ĉn, ˆL n, ˆV n ) is the 1 α)-quantile of a linear combination of χ 2 random variables. To compute cp, Ĉn, ˆL n, ˆV n ), we can numerically find the 1 α)-quantile of the distribution of the linear combination, evaluating the distribution function by using Farebrother s 1984) algorithm. 6 Power under weak instruments According to Theorem 5.3b), our test proposed in the previous section is consistent in the regular asymptotic framework with strong instruments. In this section, we discuss the power properties of the test when the instruments are weak. For this purpose, we need a model describing how weak instruments arise in our problem. Before formalizing the notion of weak instruments in our problem setup, we first review the concept of weak instruments in the conventional IV regression. Staiger and Stock 1997) introduces weak instruments in a thought experiment in which the correlation between the endogenous regressors and the instruments becomes weaker as the sample size grows. More concretely, they relate the k 1 instrument vector Z t to the g 1 endogenous regressor vector Y n) t through Y n) = n 1/2 ΛZ t + V t, t {1, 2,..., n}, n N where Λ is a g k constant matrix, and V t is a unobservable g 1 random vector such that Z t is exogenous to V t. The superscript n) in Y n) t thought experiment is then indicates dependence of Y n) t on n. The structural equation in the y n) t = Y n) t β 0 + Z t,1α 0 + U t, t {1, 2,..., n}, n N, where Z t,1 is a k 1 1 subvector of Z t, and the regression error U t is orthogonal to Z t. In this setup, Staiger and Stock investigates the asymptotic behavior of tests of the hypothesis that H 0 : β 0 = β, where β is a known constant in R g. Define W t U t V t β β 0 ) t N). residual, i.e., the residual evaluated with coefficients β, α 0) is equal to Then it is straightforward to verify that the null-restricted y n) t Y n) t β Z t,1 α 0 = W t Z tn 1/2 Λ β β 0 ). 7) 15

16 Because E[Z t W t ] = 0, 8) it follows that E[Z t y t Y t β Z t,1α 0 )] = E[Z t Z t]n 1/2 Λ β β 0 ). Thus, the null restricted residual violates the moment condition underlying the conventional IV estimator, but only in the order of n 1/2. This is the essential feature of the setup that Staiger and Stock used to demonstrate that the behavior of the conventional tests of H 0 may be very different from what the conventional asymptotic analysis indicates, and why the AR test can be a better choice. Note that, while the fact that the null-restricted residual violates the moment condition in the order of n 1/2 hinges on 7) and 8), it does not matter for it what W t is or where Λ comes from. Also, note that there is no natural universally agreeable reduced-form equation in our setup, unlike the conventional IV regression setup. In analyzing our test of H 0 with weak instruments, we therefore take as basis 7) and 8) suitably modified for constructing an environment with weak instruments in our setup, as found in the next assumption. Assumption 5: The triangle array {X n) t y n) t, Y n) t, Z t,1, Z t,2 ) : t {1, 2,..., n}, n N} consists of random vectors on a probability space Ω, F, P ), where y n) t, Y n) t Z t,1, and Z t,2 are 1 1, g 1, k 1 1, and k 2 1, respectively; β is a constant vector in B that is a nonempty and compact subset of R g ; and τ is a known constant in 0, 1). There exists β 0 B, a g k matrix Λ, ᾱ R k1, and a sequence of random variables {W t } t N that satisfy that y n) t Y n) t β Z t,1 ᾱ = W t Z tn 1/2 Λ β β 0 ), t {1, 2,..., n}, n N, 9) and that for each t N, τ-quantile of W t given Z t is zero. In Assumption 5, β 0 appears as some vector satisfying the required condition, rather than the true coefficient of Y n) t, because our mathematical results do not depend on what β 0 is. Of course, our results are most useful when Assumption 5 holds with β 0 set equal to the true true coefficient of Y n) t. If β 0 = β, the conditional quantile restriction imposed upon {W t } t N is essentially the same as Assumption 2. The 16

17 equivalence of the two conditions can be achieved by setting ᾱ = α 0 and W 1 = U 1, in particular when we require that {Z t, W t)} t N is i.i.d., as we will do below. When β 0 β, the assumption implies that the conditional τ-quantile of the null-restricted residual given Z 1 is local-to-zero. In general, the distribution of W t depends on β β 0. Assumption 5 is clearly satisfied, if U t, V t ) is independent from Z 1 in the setup of Staiger and Stock 1997) discussed above. The matrix Λ captures the strength of the instruments. For example, the instruments are irrelevant when Λ = 0. In addition to Assumption 5, we impose the following conditions similar to Assumption 3: Assumption 6: a) Eρ τ W 1 Z 1γ) is uniquely minimized at γ = 0 k 1. b) A neighborhood Γ 0 R k of the origin satisfies the following conditions: i) The conditional distribution W 1 given Z 1 denoted by F Z 1 ) has a pdf f Z 1 ) at Z 1γ for each γ Γ 0 a.s.-p. ii) There exists a random variable D : Ω R with a finite second moment such that for each γ Γ 0, fz 1γ Z 1 ) Z 1 2 < D a.s.-p. c) J γγ = 2 R β, ᾱ, 0 k 1 )/ γ γ is positive definite, and J γα = 2 R β, ᾱ, 0 k 1 )/ γ α is of full column rank. d) E[ Z 1 2 ] <. e) {W t, Z t) : t = 1,..., n} are independent and identically distributed. The following theorem describes the asymptotic distribution of J n in the case of fixed alternatives β β 0 is a fixed vector) and the weak IVs design assumed in Assumption 5. Theorem 6.1: Suppose that Assumptions 5 and 6 hold. Then J n A Eρτ W 1 )) 1 η + J γγ Λ β β 0 ) ) J 1 γγ C 0 C 0 J γγ C 0 ) 1 C 0 ) η + J γγ Λ β β 0 ) ), where η is a k 1 random vector distributed with N 0, τ1 τ)e[z 1 Z 1] ). In the case of weak instruments and under fixed alternatives, the asymptotic distribution of J n is a noncentral mixed-χ 2 random variable. The power of the test that rejects H 0 : β = β 0 when J n > cα, Ĉ0 n, ˆL 0 n, ˆV 0 n ) 17

18 depends on the magnitude of the the non-centrality parameter given by β β0 ) Λ Jγγ J γγ C 0 C 0 J γγ C 0 ) 1 C 0 J γγ ) Λ β β0 ), where J γγ J γγ C 0 C 0 J γγ C 0 ) 1 C 0 J γγ is a positive definite matrix by Assumption 6c). Under H 0, β β 0 = 0 and the test rejects asymptotically with probability α. Thus, the test has correct size regardless of the strength of the instruments. Under the fixed alternatives, the asymptotic rejection probability depends on the distance between β and β 0 and the strength of the instruments Λ. For example, the test has no power when the instruments are irrelevant and Λ = 0. The test also lacks power in certain directions if Λ 0 however its rank is less than g. Appendix A Mathematical Proofs Given Assumption 1, write ξ ρτ E[ρ τ ξ)] for each ξ L 1 Ω, F, P ). Then ρτ is a pseudo norm on L 1 Ω, F, P ). Using ρτ, R can be written as Rβ, α, γ) = y 1 Y 1β Z 1,1α Z 1γ ρτ, β, α, γ) B R k1 R k. It follows that the minimization in the numerator of the ratio on the right-hand side of 3) is the ρτ -metric projection of y 1 Y 1β on Z 1, while the minimization in the denominator is the ρτ -metric projection of y 1 Y 1β on Z 1,1. The norm ρτ equivalent topologies, because 1 ξ ρτ ξ 1 min{τ, 1 τ} ξ ρ τ is closely related to the L 1 norm 1. They actually generate the An important implication of the equivalence is that ξ ρτ = 0 if and only if ξ 1 = 0. Our analysis uses the equivalence of the two norms, mostly without mentioning it explicitly. We show below that {sup α R k 1 ˆQn β, α, )} n N converges to sup α R k 1 Qβ, α) uniformly in β on the compact set B. We can then conclude that { ˆβ n } n N is consistent for β 0, because β 0 is the unique maximizer of β sup α R k 1 Qβ, α) on B. Once the consistency of ˆβ n is established, we can also prove that {ˆα n } n N converges a.s.-p to α 0, at which Rβ 0,, 0) : R k1 R is minimized, by utilizing the convexity of ˆR n ˆβ n, α, 0, ) in α and the pointwise convergence of { ˆR n ˆβ n, α, 0, )} n N to Rβ 0, α, 0) for each α. We first establish a few lemmas. For later conveniences, some lemmas have more generality than we need for proving Theorem 4.1. The generality will be useful in our proof of

19 Lemma A.1: Suppose that Assumptions 1 holds. Then for each β B, and inf ˆRn β, 0, γ, ) inf Rβ, 0, γ) 0 a.s.-p, γ R k γ R k inf α R k 1 ˆR n β, α, 0, ) inf α R k 1 Rβ, α, 0) 0 a.s.-p. Proof of Lemma A.1: The two convergence results can be proved in similar manners. We only prove the first one. Let β be an arbitrary point in B. Then the ρτ -metric projection of y 1 Y 1β on the linear subspace spanned by Z 1 exists and is in general a compact set. By the linear independence of Z 1 Assumption 1), this further means that Γ 1 arg min γ R k Rβ, 0, γ) is compact. It follows that there exists a closed ball Γ 2 containing Γ 1 in its interior. Now fix a point γ 1 in Γ 1. By the Kolmogorov law of large numbers Rao 1973, p. 115), { ˆR n β, 0, γ 1, )} n N converges to Rβ, 0, γ 1 ) a.s.-p. Also, by Jennrich s uniform law of large numbers Jennrich 1969, Theorem 2), { ˆR n β, 0, γ, ) Rβ, 0, γ)} n N converges to zero uniformly in γ on the boundary Γ 2 of Γ 2 a.s.-p. Because ˆR n β, 0, γ, ) is convex as a function of γ, and Rβ, 0, γ 1 ) < inf Rβ, 0, γ), γ Γ 2 it follows from the above-mentioned facts that ˆR n β, 0, γ 1, ) < inf ˆRn β, 0, γ, ) γ R k \Γ 2 for almost all n N a.s.-p. On the other hand, by Jennrich s uniform law of large numbers, { ˆR n β, 0, γ, ) Rβ, 0, γ)} n N converges to zero uniformly in γ Γ 2 a.s.-p, so that inf ˆRn β, 0, γ, ) Rβ, 0, γ 1 ) a.s.-p. γ Γ 2 The desired result therefore follows. Lemma A.2: Suppose that Assumptions 1 holds. Then sup inf ˆRn β, 0, γ, ) inf Rβ, 0, γ) 0 a.s.-p, γ R k γ R k β B and sup inf β B α R k 1 ˆR n β, α, 0, ) inf α R k 1 Rβ, α, 0) 0 a.s.-p. 19

20 Proof of Lemma A.2: We only prove the first convergence result, as the second one can be shown in an analogous manner. Because Lemma A.1 has shown the corresponding pointwise a.s. convergence, and B is compact, it suffices to show that the series in question is strongly stochastically equicontinuous Andrews 1992, Theorem 2). Let β 1 and β 2 be arbitrary points in B. Also, let g nj be Koenker and Bassett s 1978) estimator in τ-quantile regression of y t Y t β j on Z t, i.e., ˆR n β j, 0, g nj, ) = inf γ R k ˆRn β j, 0, γ, ), for j = 1, 2. Then we have that for each n N, ˆR n β 1, 0, g n1, ) ˆR n β 2, 0, g n2, ) = ˆR n β 1, 0, g n1, ) ˆR n β 1, 0, g n2, )) + ˆR n β 1, 0, g n2, ) ˆR n β 2, 0, g n2, )) ˆR n β 1, 0, g n2, ) ˆR n β 2, 0, g n2, ), where the inequality holds, because ˆR n β 1, 0, g n1, ) ˆR n β 1, 0, g n2, ) for each n N. We further have that ˆR n β 1, 0, g n2, ) ˆR n β 2, 0, g n2, ) =n 1 It follows that for for each n N n n n 1 ˆR n β 1, 0, g n1, ) ˆR n β 2, 0, g n2, ) β 1 β 2 n 1 Analogously, we can also show that for each n N ˆR n β 2, 0, g n2, ) ˆR n β 1, 0, g n1, ) β 1 β 2 n 1 Thus, it holds that for each n N inf γ R k ρτ y t Y t β 1 Z tg n2 ) ρ τ y t Y t β 2 Z tg n2 ) ) Y t β 1 Y t β 2 β 1 β 2 n 1 n n Y t. Y t. n Y t. ˆRn β 2, 0, γ, ) inf ˆRn β 1, 0, γ, ) = ˆR n β 2, 0, g n2, ) ˆR n β 1, 0, g n1, ) β 1 β 2 n 1 γ R k n Y t. Because {n 1 n Y t } n N converges to E[ Y 1 ] a.s.-p by the Kolmogorov strong law of large numbers, the desired result follows by Andrews 1992, Lemma 2). 20

21 Lemma A.3: Suppose that Assumptions 1 and 2 hold. For each β B, {inf ˆQ α R k n β, α, )} converges to inf α R k Qβ, α) uniformly in β B a.s.-p Proof of Lemma A.3: Because the linear independence of the elements of X 1 = y 1, Y 1, Z 1) in Assumption 1 implies that for each β B, the distance between y 1 Y 1β and the ρτ -metric projection of y 1 Y 1β on the subspace spanned by Z 1 is positive, i.e., inf α R k 1 Rβ, α, 0) > 0. Because β inf α R k 1 Rβ, α, 0) : B R is continuous, it is bounded away from zero on B. The desired results from this fact and Lemma A.2, because r 1, r 2 ) r 1 /r 2 : R a, ) R is a Lipschitz function if a > 0. Lemma A.4: Suppose that Assumptions 1 and 2 hold. Let {b n } n N be a sequence of B-valued random vectors on Ω, F, P ) converging to β 0 a.s.-p in probability-p ). Let {a n } n N be sequences of k 1 1 vectors on Ω, F, P ) satisfying that for each n N, ˆR n b n, a n, 0, ) = inf α R k 1 ˆRn b n, α, 0, ). Then: a) Then a n α 0 a.s.-p in probability-p ). b) Let {c n } n N be a sequence of k 1 random vectors on Ω, F, P ) satisfying that for each n N ˆR n b n, a n, c n, ) = inf γ R k ˆR n b n, a n, γ, ). Then c n 0 a.s.-p in probability-p ), provided that the minimizer of Rβ 0, α 0, ) : R k R over R k is unique. Proof of Lemma A.4: We only prove the result for {a n } n N. The result for {c n } n N can be established in an analogous way. Suppose that b n β 0 a.s.-p. Then for each α R k1, { ˆR n b n, α, 0, )} n N converges to Rβ 0, α, 0) a.s.-p, because for each α R k1, { ˆR n β, α, 0, )} n N converges to Rβ, α, 0) uniformly in β B by Jennrich s uniform law of large numbers Jennrich 1969, Theorem 2). Further, we can apply Rockafellar 1970, Theorem 10.8) to show that the convergence is uniform in α over any compact subset of R k1, because for each n N, ˆR n b n, α, 0, ) is convex in α over R k1. Take an arbitrary compact subset A 1 of R k1 that contain α 0 in its interior. Then { ˆR n b n, α 0, 0, )} converges to Rβ 0, α 0, 0) a.s.-p ; { ˆR n b n, α, 0, )} converges to Rβ 0, α, 0) uniformly on α A 1 a.s.-p ; and Rβ 0, α 0, 0) < inf α A1 Rβ 0, α, 0), because α 0 is the unique minimizer of Rβ 0,, 0) on R k1 by Assumption 2. Because ˆR n b n, α, 0, ) is convex in α, it follows that ˆR n b n, α 0, 0, ) < inf α R k 1 \A 1 ˆRn b n, α, 0, ) 21

22 for almost all n N a.s.-p. That is, a n A 1 for almost all n N a.s.-p. Because A 1 is an arbitrary compact subset containing α 0 in its interior, this establishes the a.s.-p convergence of {a n } n N to α 0. The convergence of {a n } n N in probability in the current lemma immediately follows from the result of the a.s. convergence of {a n } n N by using the subsequence theorem. Proof of Theorem 4.1: By Assumption 2, β sup α R k Qβ, α) : B R is uniquely maximized at β 0. Because ˆβ n maximizes sup α R k ˆQ n β, α, ) with respect to β over the compact subset B, and {sup α R k ˆQ n β, α, )} n N converges to sup α R k Qβ, α) uniformly in β B a.s.-p, it follows by Pötscher and Prucha 1991, Lemma 4.2) that { ˆβ n } n N converges to β 0 a.s.-p. Further, applying Lemma A.4a) by setting b n = ˆβ n and a n = ˆα n establishes that the strong consistency of ˆα n for α 0. The result therefore follows. In proving Lemmas 4.2, 4.3 and Theorem 4.4, we use the following lemma. Lemma A.5: Suppose that Assumptions 1 3 hold, and let { d nj b nj, a nj, g nj ) : Ω B R k1 R k } n N be a sequence of random vectors that converges in probability-p to d 0 β 0, α 0, 0 1 k ), j = 1, 2. Then ˆR n b n2, a n2, g n2, ) ˆR n b n1, a n1, g n1, ) n = n 1 τ 1U t < 0)) X t d n2 d n1 ) d n2 d 0 ) J d n2 d 0 ) 1 2 d n1 d 0 ) J d n1 d 0 ) + o P n 1/2 d n2 d n1 + d n1 d d n2 d 0 2 )), 10) where X t Y t, Z t,1, Z t), t N. Proof of Lemma A.5: Define r : R R g+k1+k R g+k1+k R g+k1+k R by ry, x, d 1, d 2 ) 1 ) ρ τ y x d 2 ) ρ τ y x d 1 ) + τ 1y x d 0 < 0)) x d 2 d 1 ), d 2 d 1 y, x, d 1, d 2 ) R R g+k1+k R g+k1+k R g+k1+k, with the rule that devision by zero is zero. Also, following Pollard 1985), let ν n denote the standardized sample average operator such that for each function f : R R l+k R with E[ fy 1, X 1 ) ] < ν n f, ) = n 1/2 n fyt, X t ) E[fY 1, X 1 )] ), n N. 22

23 By the definition of r, we obtain that ˆR n d 2, ) ˆR n d 1, ) =Rd 2 ) Rd 1 ) l + n 1 n + n 1/2 d 2 d 1 ν n r,, d 1, d 2 ) τ 1U t < 0)) X t ) d 2 d 1 ) for each d 1, d 2 ) R l+k R l+k, where l is the gradient of R at β 0, α 0, 0 1 k ), which is equal to E[τ 1U 1 < 0)) X 1 ]. Taking the second-order Taylor expansion of Rd 1 ) and Rd 2 ) about d 0 on the right-hand side of this equality and replacing d 1 with d n1 and d 2 with d n2 in the resulting equality yields the desired result, if {ν n r,, d n1, d n2 )} n N converges to zero in probability-p. It thus suffices to show the convergence of {ν n r,, d n1, d n2 )} to zero in probability-p. It is straightforward to verify that ry 1, X 1, θ 1, θ 2 ) 2 X 1, from which it follows that [ ] E sup ry 1, X 1, d 1, d 2 ) 2 d 1,d 2) R g+k 1 +k R g+k 1 +k 4E[ X 1 2 ] <. Also, {r,, d 1, d 2 ) : d 1, d 2 ) R g+k1+k R g+k1+k } can be expressed as a sum of a fixed member of functions from a polynomial class. These facts imply that {ν n r,, d 1, d 2 )} n N is stochastically equicontinuous at d 0, d 0 ) Pollard 1985, pp ). Further, ry 1, X 1, d 1, d 2 ) 2 converges to zero as d 1, d 2 ) d 0, d 0 ) a.s.-p, and ry 1, X 1, d 1, d 2 ) 2 is dominated by 4 X 1 2 with a finite moment. It follows by the dominated convergence theorem that E[ry 1, X 1, d 1, d 2 ) 2 ] 0 as d 1, d 2 ) d 0, d 0 ). Now let {U n R g+k1+k R g+k1+k } n N be an arbitrary sequence of balls centered at d 0, d 0) that shrinks down to d 0, d 0). Then, as Pollard 1985, page. 309) explains, it follows from the above-mentioned facts that sup d1,d 2) U n ν n r,, d 1, d 2 ) 0 in probability-p. Thus, {ν n r,, d n1, d n2 )} converges to zero in probability-p, given that { d nj } n N converges to d 0 in probability-p, j = 1, 2. Lemma A.6: Let Ω, F, P ) be a probability space. Suppose that a sequence of random vectors {η n : Ω R m } n N and a sequence of random variables {ξ n : Ω R} n N satisfy that η naη n + ξ n 0 for each n N, where A is a positive definite m m symmetric matrix. Also, let {ζ n : Ω R} n N be a sequence of random variables. Suppose that ξ n = o P η n + η n 2 + ζ n ) as n. Then η n = o P ζ n 1/2 + 1) as n. We now prove Lemma 4.2. Proof of Lemma 4.2: The existence of {c n } follows immediately from the fact that the minimization of 23

24 ˆR n b n, a n, γ, ) in terms of γ is the ρτ -metric projection of y 1 Y 1b n Z 1,1a n, y 2 Y 2b n Z 1,2a n,..., y n Y nb n Z 1,na n ) on the space spanned by the rows of Z 1, Z 2,..., Z n ). To prove the second result, we first show that {c n } converges to 0 in probability-p, and then apply Lemmas A.5 and A.6. For each fixed γ R k, ˆR n β, α, γ, ) is convex in β and α. By the Kolmogorov strong law of large numbers and Hjort and Pollard 1993, Lemma 1), { ˆR n β, α, γ, )} n N converges to Rβ, α, γ) uniformly in β, α ) in each neighborhood of β 0, α 0) in probability-p. Because {b n, a n) } n N converges to β 0, α 0) in probability-p by the assumption, it follows that { ˆR n b n, a n, γ, )} n N converges to Rβ 0, α 0, γ) for each γ R k. Under Assumptions 1 3c), this fact implies by Hjort and Pollard 1993, Lemma 2) that {c n } converges to 0 in probability-p. We now set b n to both b n1 and b n2, c n to g n1, g n C b n β 0 n + Jγγ 1 n 1 τ 1U t < 0)) Z t. a n α 0 to g n2 in 10) and multiply the resulting equality by n to obtain that 0 n ˆR n b n, a n, g n, ) ˆR n b n, a n, c n, )) = 1 2 n1/2 c n g n) J γγ n 1/2 c n g n) + o P n 1/2 c n g n + n b n β n a n α n c n 2 + n g n 2) = 1 2 n1/2 c n g n) J γγ n 1/2 c n g n) + o P n 1/2 c n g n ) + n c n g n 2 + n b n β n a n α ), where the second equality holds because g n = O P b n β 0 + a n α 0 + 1) and c n = O P c n g n + g n ). The result follows from this inequality by Lemma A.6. Proof of Lemma 4.3: Let {c n } n N be as in Lemma 4.2. Note that the difference between { ˆR n b n, a n, c n, )} n N and { ˆR n b n, a n, 0, )} n N converges to zero in probability-p. Applying the delta method with this fact, we 24

25 obtain that n log ˆQ n b n, a n, ) = nlog ˆR n b n, a n, c n, ) log ˆR n b n, a n, 0, )) 1 = Rβ 0, α 0, 0) n ˆR n b n, a n, c n, ) ˆR n b n, a n, 0, )) 11) 1 2Rβ 0, α 0, 0) 2 n ˆR n b n, a n, c n, ) Rβ 0, α 0, 0)) Rβ 0, α 0, 0) 2 n ˆR n b n, a n, 0, ) Rβ 0, α 0, 0)) 2 + o P n ˆRn b n, a n, c n, ) Rβ 0, α 0, 0)) 2 + n ˆR n b n, a n, 0, ) Rβ 0, α 0, 0)) 2). We apply Lemma A.5 to each of the non-remainder terms on the right-hand side of this equality: n ˆR n b n, a n, c n, ) ˆR n b n, a n, 0, )) = 1 2 n1/2 c nj γγ n 1/2 c n + o P n 1/2 c n + n b n β n a n α c n 2) = 1 2 n1/2 c nj γγ n 1/2 c n + o P n 1/2 b n β 0 + n 1/2 a n α 0 + n b n β n a n α ), and n 1/2 ˆR n b n, a n, c n, ) Rβ 0, α 0, 0)) = n 1/2 ˆR n b n, a n, c n, ) ˆR n β 0, α 0, 0, )) + n 1/2 ˆR n β 0, α 0, 0, ) Rβ 0, α 0, 0)) = n n 1 τ 1U t < 0)) Y t, Z t,1) ) n 1/2 θ n θ 0 ) + n 1/2 ˆR n β 0, α 0, 0, ) Rβ 0, α 0, 0)) + o P n 1/2 b n β 0 + n 1/2 a n α 0 + n b n β n a n α ), n 1/2 ˆR n b n, a n, 0, ) Rβ 0, α 0, 0)) = n 1/2 ˆR n b n, a n, 0, ) ˆR n β 0, α 0, 0, )) + n 1/2 ˆR n β 0, α 0, 0, ) Rβ 0, α 0, 0)) n ) = n 1 τ 1U t < 0)) Y t, Z t,1) n 1/2 b n β 0 ) + n 1/2 ˆR n β 0, α 0, 0, ) Rβ 0, α 0, 0)) + o P n 1/2 b n β 0 + n 1/2 a n α 0 + n b n β n a n α ). Substituting these into 11) and applying Lemma 4.2 yields the desired result. Proof of Theorem 4.4: θ n θ 0 Let 1 Rβ 0, α 0, 0) K 1 C n 1 n τ 1U t < 0)) Z t, n N, 25

26 and let b n and a n denote the vectors containing the first g elements and the remaining elements of θ n, respectively. Then, by Lemma 4.3, we have that 0 n log ˆQ n ˆβ n, ˆα n, ) n log ˆQ n b n, a n, ) = 1 2 n1/2 ˆθ n θ n) Kn 1/2 ˆθ n θ n) + o P n 1/2 ˆβ n β 0 + n 1/2 ˆα n α 0 + n ˆβ n β n ˆα n α ). The first result follows from this equality by Lemma A.6. For the second result, apply the central limit theorem CLT) for i.i.d. random vectors Rao 1973, p. 128) to show that {n 1/2 n τ 1U t < 0)) Z t } n N is asymptotically distributed with N0, Rβ 0, α 0, 0) 2 V ), and then apply the continuous mapping theorem. Proof of Theorem 4.5: To prove a), let { θ n} n N be as in the proof of Theorem 4.4 and {δ n } n N an arbitrary sequence of g + k 1 ) 1 random vectors that converges to the origin in probability-p. Recall that the expression consisting of the second and third terms on the right-hand side of 5) is minimized when b n, a n) = θ n, and that {n 1/2 ˆθ n θ n)} n N converges to zero in probability-p by Theorem 4.4. Using these facts with Lemma 4.3, we can show that n log ˆQ n ˆθ n, ) n log ˆQ n θ n, ) = o P 1) and n log ˆQ n ˆθ n + δ n, ) n log ˆQ n θ n, ) = 1 2 n1/2 ˆθ n θ n + δ n ) Kn 1/2 ˆθ n θ n + δ n ) + o P n 1/2 δ n + δ n ) = 1 2 n1/2 δ nkn 1/2 δ n + o P n 1/2 δ n + δ n ) 12) By taking each of τ n e i + τ n e j, τ n e i + τ n e j, τ n e i τ n e j, and τ n e i τ n e j for δ n in this equality and using the resulting equalities in the definition of ˆK nij i, j = 1, 2,..., l), we obtain that 4nτ 2 n ˆK nij = 4τ 2 nk ij + o P n 1/2 τ n + τ 2 n + 1 ). Dividing both sides of this equality by 4nτ 2 n and applying Assumption 4 yields the desired result. To prove b), let δ be an arbitrary g +k 1 ) 1 vector. By Lemma 4.2, we have that for each i = 1, 2,..., k and each j = 1, 2,..., l, ˆγ ni ˆθ n + τ n δ) ˆγ ni ˆθ n τ n δ) = 2τ n Cδ n + o P τ n ). It follows that 1 2τ n ˆγ n ˆβ n + τ n δ) ˆγ n ˆβ n τ n δ)) = Cδ + o P 1). Taking e j for δ for each j = 1, 2,..., k in this equality completes the proof. 26

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION VICTOR CHERNOZHUKOV CHRISTIAN HANSEN MICHAEL JANSSON Abstract. We consider asymptotic and finite-sample confidence bounds in instrumental

More information

Approximate Distributions of the Likelihood Ratio Statistic in a Structural Equation with Many Instruments

Approximate Distributions of the Likelihood Ratio Statistic in a Structural Equation with Many Instruments CIRJE-F-466 Approximate Distributions of the Likelihood Ratio Statistic in a Structural Equation with Many Instruments Yukitoshi Matsushita CIRJE, Faculty of Economics, University of Tokyo February 2007

More information

The properties of L p -GMM estimators

The properties of L p -GMM estimators The properties of L p -GMM estimators Robert de Jong and Chirok Han Michigan State University February 2000 Abstract This paper considers Generalized Method of Moment-type estimators for which a criterion

More information

Exponential Tilting with Weak Instruments: Estimation and Testing

Exponential Tilting with Weak Instruments: Estimation and Testing Exponential Tilting with Weak Instruments: Estimation and Testing Mehmet Caner North Carolina State University January 2008 Abstract This article analyzes exponential tilting estimator with weak instruments

More information

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Vadim Marmer University of British Columbia Artyom Shneyerov CIRANO, CIREQ, and Concordia University August 30, 2010 Abstract

More information

Lecture 11 Weak IV. Econ 715
