A more powerful subvector Anderson Rubin test in linear instrumental variable regression

A more powerful subvector Anderson Rubin test in linear instrumental variable regression Patrik Guggenberger Department of Economics Pennsylvania State University Frank Kleibergen Department of Quantitative Economics University of Amsterdam Sophocles Mavroeidis Department of Economics University of Oxford First Version: January, 2017 Revised: October 19, 2017 Abstract We study subvector inference in the linear instrumental variables model assuming homoskedasticity but allowing for weak instruments. The subvector Anderson and Rubin 1949) test that uses chi square critical values with degrees of freedom reduced by the number of parameters not under test, proposed by Guggenberger et al 2012), controls size but is generally conservative. We propose a conditional subvector Anderson and Rubin test that uses data-dependent critical values that adapt to the strength of identification of the parameters not under test. This test has correct size and strictly higher power than the subvector Anderson and Rubin test by Guggenberger et al 2012). We provide tables with conditional critical values so that the new test is quick and easy to use. Keywords: Asymptotic size, linear IV regression, subvector inference, weak instruments JEL codes: C12, C26 1 Introduction Inference in the homoskedastic linear instrumental variables IV) regression model with possibly weak instruments has been the subject of a growing literature. 1 Most of this literature has focused on the problem of inference on the full vector of slope coefficients of the endogenous regressors. Weak-instrument robust Guggenberger gratefully acknowledges the research support of the National Science Foundation via grant number SES- 1326827. Mavroeidis gratefully acknowledges the research support of the European Research Council via Consolidator grant number 647152. The authors thank seminar participants at various institutions as well as Julius Koll and Jin Thed for research assistance. 1 See e.g., Nelson and Startz 1990), Staiger and Stock 1997), Kleibergen 2002), Moreira 2003), Andrews et al. 2006, 2008) Chernozhukov et al. 2009), and Hillier 2009a,b). 1

inference on subvectors of slope coefficients is a harder problem, because the parameters not under test become additional nuisance parameters, and has received less attention in the literature, see e.g., Dufour and Taamouti 2005), Guggenberger et al. 2012) henceforth GKMC), and Kleibergen 2015). The present paper contributes to that part of the literature and focuses on the subvector Anderson and Rubin 1949) AR) test studied by GKMC. Chernozhukov et al 2009) showed that the full vector AR test is admissible, see also Montiel-Olea 2017). GKMC proved that the use of χ 2 k m W critical values, where k is the number of instruments and m W is the number of unrestricted slope coefficients under the null hypothesis, results in a subvector AR test with asymptotic size equal to the nominal size, thus providing a power improvement over the projection approach, see Dufour and Taamouti 2005), that uses χ 2 k critical values. This paper is motivated by the insight that the largest quantiles of the subvector AR test statistic, namely the quantiles of a χ 2 k m W distribution, occur under strong identification of the nuisance parameters. Therefore, there may be scope for improving the power of the subvector AR test by using data-dependent critical values that adapt to the strength of identification of the nuisance parameters. Indeed, we propose a new data-dependent critical value for the subvector AR test that is smaller than the χ 2 k m W critical value in GKMC. The new critical value depends monotonically on a statistic that measures the strength of identification of the nuisance parameters under the null akin to a first-stage F statistic in a model with m W = 1), and converges to the χ 2 k m W critical value when the conditioning statistic gets large. We prove that the new conditional subvector AR test has correct asymptotic size and strictly higher power than the test in GKMC, and therefore the subvector AR test in GKMC is inadmissible. At least in the case m W = 1, there is little scope for exploring alternative approaches, such as, e.g., Bonferroni, for using information about the strength of identification to improve the power of the subvector GKMC test. Specifically, in the case m W = 1, we use the approach of Elliott et al. 2015) to obtain a pointoptimal power bound for any test that only uses the subvector AR statistic and our measure of identification strength, and find that the power of the new conditional subvector AR test is very close to it. Implementation of the new subvector test is trivial. The test statistic is the same as in GKMC and the critical values, as functions of a scalar conditioning statistic, are tabulated. Our analysis relies on the insight that the subvector AR statistic is the likelihood ratio statistic for testing that the mean of a k p Gaussian matrix with Kronecker covariance is of reduced rank. When the covariance matrix is known, this statistic corresponds to the minimum eigenvalue of a noncentral Wishart matrix. This enables us to draw on a large related statistical literature, see Muirhead 2009). A useful result from Perlman and Olkin 1980) establishes the monotonicity of the distribution of the subvector AR statistic with respect to the concentration parameter which measures the strength of identification when m W = 1. The proposed conditional critical values are based on results given in Muirhead 1978) on approximations to the distribution of the eigenvalues of noncentral Wishart matrices. In the normal linear IV model, we show that the finite-sample size of the conditional subvector AR test depends only on a m W -dimensional nuisance parameter. When m W = 1, it is therefore straightforward to compute the finite-sample size by simulation or numerical integration, and we prove that finite-sample size for general m W is bounded by size in the case m W = 1. The conditional subvector AR test depends on eigenvalues of quadratic forms of random matrices. We combine the method of Andrews et al. 2011) that was used in GKMC with results in Andrews and Guggenberger 2015) to show that the asymptotic size of the new test can be computed from finite-sample size when errors are Gaussian and their covariance matrix is known. 2

Three other related papers are Rhodes Jr 1981) that studies the exact distribution of the likelihood ratio statistic for testing the validity of overidentifying restrictions in a Gaussian simultaneous equations model; and Nielsen 1999, 2001) that study conditional tests of rank in bivariate canonical correlation analysis, which is related to the present problem when k = 2 and m W = 1. These papers do not provide results on asymptotic size or power. In ongoing work, Kleibergen 2015) provides power improvements over projection for the conditional likelihood ratio test for a subvector hypothesis in the linear IV model. Building on the approach of Chaudhuri and Zivot 2011), Andrews 2017) proposes a two-step Bonferroni-like method that applies more generally to nonlinear models with non-iid heteroskedastic data, and is asymptotically efficient under strong identification. Our paper focuses instead on power improvement under weak identification. Another related recent paper on subvector inference in the linear IV model is Zhu 2015), whose setup also allows for conditional heteroskedasticity and is based on the Bonferroni method. Andrews and Mikusheva 2016) develop robust subvector inference in nonlinear models. Han and McCloskey 2017) study subvector inference in nonlinear models with near singular Jacobian. Kaido et al. 2016) and Bugni et al. 2017) consider subvector inference in models defined by moment inequalities. The analysis in this paper relies critically on the assumption of homoskedasticity. Allowing for heteroskedasticity is difficult because the number of nuisance parameters grows with k, and finite-sample distribution theory becomes intractable. When testing hypotheses on the full vector of coefficients in linear IV regression, robustness to heteroskedasticity is asymptotically costless since the heteroskedasticity-robust AR test is asymptotically equivalent to the nonrobust one under homoskedasticity, and the latter is admissible. However, in the subvector case, our paper shows that one can exploit the structure of the homoskedastic linear IV model to obtain more powerful tests, while it is not at all clear whether this is feasible under heteroskedasticity. Therefore, given the current state of the art, our results seem to indicate that there is a trade-off between efficiency and robustness to heteroskedasticity for subvector testing in the linear IV model. The structure of the paper is as follows. Section 2 provides the finite-sample results with Gaussian errors, fixed instruments, and known covariance matrix, Section 3 gives asymptotic results, and Section 4 concludes. All proofs of the main results in the paper and tables of conditional critical values and additional numerical results are provided in the Supplemental Material SM). We use the following notation. For a full column rank matrix A with n rows let P A = AA A) 1 A and M A = I n P A, where I n denotes the n n identity matrix. If A has zero columns, then we set M A = I n. The chi square distribution with k degrees of freedom and its 1 α-quantile are written as χ 2 k and χ2 k,1 α, respectively. For an n n matrix A, ρ A) denotes the rank of A and κ i A), i = 1,..., n denote the eigenvalues of A in non-increasing order. By κ min A) and κ max A) we denote the smallest and largest eigenvalue of A, respectively. We write 0 n k to denote a matrix of dimensions n by k with all entries equal to zero and typically write 0 n for 0 n 1. 2 Finite sample analysis The model is given by the equations y=y β + W γ + ε Y =ZΠ Y + V Y W =ZΠ W + V W, 2.1) 3

where y R n, Y R n m Y, W R n m W, and Z R n k. We assume that k m W 1. The reduced form can be written as y Y W ) = Z Π Y Π W ) β I my 0 m Y m W γ 0 m W m Y I mw ) ) + v y V Y V W, 2.2) } {{ } V where v y := V Y β+ V W γ + ε. By V i we denote the i-th row of V written as a column vector and similarly for other matrices. Let m := m Y + m W. Throughout this section, we make the following assumption. Assumption A: 1. V i := v yi, V Y i, V W i ) i.i.d.n 0 m+1) m+1), Ω ), i = 1,..., n, where Ω R m+1) m+1) is known and positive definite. 2. The instruments Z R n k are fixed and Z Z R k k is positive definite. The objective is to test the hypothesis H 0 : β = β 0 against H 1 : β β 0, 2.3) using tests whose size, i.e. the highest null rejection probability NRP) over the unrestricted nuisance parameters Π Y, Π W, and γ, equals the nominal size α. In particular, weak identification and non-identification of β and γ are allowed for. The subvector AR statistic for testing H 0 is defined as AR n β 0 ) := min γ R m W Y 0 W γ) P Z Y 0 W γ) 1, γ ) Ω β 0 ) 1, γ ), 2.4) where Ω β 0 ) := 1 0 1 m W β 0 0 m Y m W Ω 1 0 1 m W β 0 0 m Y m W, 2.5) 0 m W 1 I mw 0 m W 1 I mw and Y 0 := y Y β 0. 2.6) Denote by ˆκ i for i = 1,..., p := 1 + m W the roots of the following characteristic polynomial in κ κω β 0 ) Y 0, W ) PZ Y 0, W ) = 0, 2.7) ordered non-increasingly. Then, AR n β 0 ) = ˆκ p, 2.8) that is, AR n β 0 ) equals the smallest characteristic root, see, e.g. Schmidt, 1976, chapter 4.8). The subvector AR test in GKMC rejects H 0 at significance level α if AR n β 0 ) > χ 2 k m W,1 α, while the AR test based on projection rejects if AR n β 0 ) > χ 2 k,1 α. Under Assumption A, the subvector AR statistic equals the minimum eigenvalue of a noncentral Wishart matrix. More precisely, we show in the SM Subsection S.1.1) that the roots ˆκ i of 2.7) for i = 1,..., p, satisfy 0 = ˆκ i I p Ξ Ξ, 2.9) where Ξ N M, I kp ) for some nonrandom M R k p defined in S 15) in the SM). Furthermore, under the 4

null hypothesis H 0, M = ) 0 k, Θ W for some ΘW R k m W defined in S 17) in the SM) and thus ρ M) m W. Therefore, Ξ Ξ W p k, I p, M M), where the latter denotes a non-central Wishart distribution with k degrees of freedom, covariance matrix I p, and noncentrality matrix ) M 0 0 1 m W M = 0 m W 1 Θ W Θ. 2.10) W The joint distribution of the eigenvalues of a noncentral Wishart matrix only depends on the eigenvalues of the noncentrality matrix M M see e.g. Muirhead, 2009). Hence, the distribution of ˆκ 1,..., ˆκ p ) under the null only depends on the eigenvalues of Θ W Θ W, which we denote by κ i := κ i Θ W Θ W ), i = 1,..., m W. 2.11) We can think of Θ W Θ W as the concentration matrix for the endogenous regressors W, see e.g. Stock et al. 2002). In the case when m W = 1, Θ W Θ W is a scalar, and corresponds to the well-known concentration parameter see e.g. Staiger and Stock 1997)) that measures the strength of the identification of the parameter vector γ not under test. 2.1 Motivation for conditional subvector AR test: Case m W = 1 The above established that when m W = 1 the distribution of AR n β 0 ) under H 0 depends only on the single nuisance parameter κ 1. The following result gives a useful monotonicity property of this distribution. Theorem 1 Suppose that Assumption A holds and m W = 1. Then, under the null hypothesis H 0 : β = β 0, the distribution function of the subvector AR statistic in 2.4) is monotonically decreasing in the parameter κ 1, defined in 2.11), and converges to χ 2 k 1 as κ 1. This result follows from Perlman and Olkin, 1980, Theorem 3.5), who established that the eigenvalues of a k p noncentral Wishart matrix are stochastically increasing in the nonzero eigenvalue of the noncentrality matrix when the noncentrality matrix is of rank 1. Theorem 1 shows that the ) subvector AR test in GKMC is conservative for all κ 1 <, because its NRP Pr κ1 AR n β 0 ) > χ 2 k 1,1 α is monotonically increasing in κ 1 and the worst case occurs at κ 1 =. Hence, it seems possible to improve the power of the subvector AR test by reducing the χ 2 k 1 critical value based on information about the value of κ 1. If κ 1 were known, which it is not, one would set the critical value equal to the 1 α quantile of the exact distribution of AR n β 0 ) and obtain a similar test with higher power than the subvector AR test in GKMC. Alternatively, if there was a one-dimensional minimal sufficient statistic for κ 1 under H 0, one could obtain a similar test by conditioning on it. Unfortunately, we are not aware of such a statistic. However, an approximation to the density of eigenvalues of noncentral Wishart matrices by Leach 1969), specialized to this case, implies that the largest eigenvalue ˆκ 1 is approximately sufficient for κ 1 when κ 1 is large and = 0. Based on this approximation, Muirhead, 1978, Section 6) provides an approximate, nuisance parameter free, conditional density of the smallest eigenvalue ˆ given the largest one ˆκ 1. This approximate density with respect to Lebesgue measure) of ˆ given ˆκ 1 can be written as f ˆ ˆκ 1 x 2 ˆκ 1 ) = f χ 2 k 1 x 2 ) ˆκ 1 x 2 ) 1/2 g ˆκ 1 ), x 2 [0, ˆκ 1 ], 2.12) 5

4 k = 2 10 k = 5 3 2 5 1 0.1 0.2 1 2 3 10 20 100 200 0.1 0.2 1 2 3 10 20 100 200 k = 10 ^κ 1 k = 20 ^κ 1 30 15 10 20 5 10 1 2 3 4 10 20 100 200 1 2 3 4 10 20 100 200 400 ^κ 1 ^κ 1 Figure 1: Conditional quantile function. The solid line plots the 1 α quantile of the distribution with density 2.12), for α = 5%. The dotted straight blue line gives the corresponding quantile of χ 2 k 1. where f χ 2 k 1 ) is the density of a χ 2 k 1 and g ˆκ 1) is a function that does not depend on any unknown parameters, see S 26) in the SM. Because 2.12) is analytically available, the quantiles of the distribution whose density is given in 2.12) can be computed easily using numerical integration for fixed values of ˆκ 1. Figure 1 plots the 1 α quantile of that distribution as a function of ˆκ 1 for α = 5% and k = 2, 5, 10, and 20. It is evident that this conditional quantile function is strictly increasing in ˆκ 1 and asymptotes to χ 2 k 1,1 α.2 We propose to use the above conditional quantile function to obtain conditional critical values for the subvector AR statistic. In practice, to make implementation of the test straightforward for empirical researchers, the conditional critical value function will be tabulated for different k 1 and α over a grid of points ˆκ 1,j, j = 1,..., J, say, and conditional critical values for any given ˆκ 1 will be obtained by linear interpolation. 3 Specifically, let q 1 α,j k 1) denote the 1 α quantile of the distribution whose density is given by 2.12) with ˆκ 1 replaced by ˆκ 1,j. The end point of the grid ˆκ 1,J should be chosen high enough so that q 1 α,j k 1) χ 2 k 1,1 α. For any realization of ˆκ 1 ˆκ 1,J, 4 find j such that ˆκ 1 [ˆκ 1,j 1, ˆκ 1,j ] with ˆκ 1,0 = 0 and q 1 α,0 k 1) = 0, and let c 1 α ˆκ 1, k 1) := ˆκ 1,j ˆκ 1 ˆκ 1,j ˆκ 1,j 1 q 1 α,j 1 k 1) + ˆκ 1 ˆκ 1,j 1 ˆκ 1,j ˆκ 1,j 1 q 1 α,j k 1). 2.13) Table 1 gives conditional critical values at significance level 5% for a fine grid for the conditioning statistic ˆκ 1 for the case k 1 = 4. To mitigate any slight over-rejection induced by interpolation, the reported critical values have been rounded up to one decimal. We will see that by using c 1 α ˆκ 1, k 1) as a critical value for the subvector AR test, one obtains a close to similar test, except for small values of κ 1. Note that ˆκ 1, the largest root of the characteristic polynomial in 2.7) is comparable to the first-stage F statistic in the case m W = 1 for the hypothesis that 2 The monotonicity statement is made based on numerical integration without an analytical proof. An analytical proof of the limiting result is given in Section S.1.2 in the SM. 3 For general m W, discussed in the next subsection, the role of k 1 is played by k m W. 4 When ˆκ 1 > ˆκ 1,J, we can define c 1 α ˆκ 1, k 1) using nonlinear interpolation between ˆκ 1,J and, i.e., c 1 α ˆκ 1, k 1) := 1 F ˆκ1 ˆκ 1,J ) ) q 1 α,j k 1) + F ˆκ 1 ˆκ 1,J )χ 2 k 1,1 α, where F is some distribution function. 6

α = 5%, k 1 = 4 ˆκ 1 cv ˆκ 1 cv 1.2 1.1 2.1 1.9 3.2 2.9 4.5 3.9 5.9 4.9 7.4 5.9 9.4 6.9 12.5 7.9 20.9 8.9 1.3 1.2 2.3 2.1 3.5 3.1 4.7 4.1 6.2 5.1 7.8 6.1 9.9 7.1 13.4 8.1 26.5 9.1 1.4 1.3 2.5 2.3 3.7 3.3 5.0 4.3 6.5 5.3 8.2 6.3 1 7.3 14.5 8.3 39.9 9.3 1.6 1.5 2.7 2.5 4.0 3.5 5.3 4.5 6.8 5.5 8.6 6.5 11.1 7.5 15.9 8.5 57.4 9.4 1.8 1.7 3.0 2.7 4.2 3.7 5.6 4.7 7.1 5.7 9.0 6.7 11.7 7.7 17.9 8.7 1000 9.48 Table 1: 1 α quantile of the conditional distribution with density given in 2.12), cv=c 1 α ˆκ 1, k 1) at different values of the conditioning variable ˆκ 1. Computed by numerical integration. Π W = 0 k m W γ is unidentified) under the null hypothesis H 0 : β = β 0. So given α, c 1 α ˆκ 1, k 1) is a data-dependent critical value that depends only on the integer k 1 the number of IVs minus the number of untested parameters), and the nonnegative scalar ˆκ 1 which is a measure of the strength of identification of the unrestricted coefficient γ. 2.2 Definition of the conditional subvector AR test for general m W We will now define the conditional subvector AR test for the general case when m W 1. The conditional subvector AR test rejects H 0 at nominal size α if AR n β 0 ) > c 1 α ˆκ 1, k m W ), 2.14) where c 1 α, ) has been defined in 2.13) for any argument consisting of a vector with first component in R + { } and second component in N. Tables of critical values for significance levels α = 10%, 5%, and 1%, and degrees of freedom k m W = 1 to 20 are provided in Section S.3 of the SM. Since AR n β 0 ) = ˆκ p, the associated test function can be written as ˆκ) := 1 [ˆκ p > c 1 α ˆκ 1, k m W )], 2.15) where 1 [ ] is the indicator function, ˆκ := ˆκ 1, ˆκ p ) and the subscript c abbreviates conditional. The subvector AR test in GKMC that uses χ 2 k m W critical value has test function ˆκ) := 1 [ˆκ p > c 1 α, k m W )]. 2.16) Since c 1 α x, ) < c 1 α, ) for all x <, it follows that E [ ˆκ)] > E [ ˆκ)], i.e., the conditional subvector AR test has strictly higher power than the unconditional) subvector AR test in GKMC. 2.3 Finite sample size of when m W = 1 As long as the conditional critical values c 1 α ˆκ 1, k m W ) guarantee size control for the new test, the actual quality of the approximation 2.12) to the true conditional density is not of major concern to us, and the main purpose of 2.12) was to give us a simple analytical expression to generate data-dependent critical values. We next compute the size of the conditional subvector AR test, and because we don t have available an analytical expression of the NRP, we need to do that numerically. This can be done easily because the 7

0.050 k=5, m W =1, α= 0.05 0.045 0.040 0.035 0.030 0.025 0.020 c GKMC 0.015 0.010 0.005 0 10 20 30 40 50 60 70 80 90 100 κ 1 Figure 2: Null rejection probabilities of conditional 2.15) red solid) and GKMC subvector AR 2.16) blue dotted) tests at nominal size 5% as a function of the nuisance parameter κ mw. The number of instruments is k = 5 and the number of nuisance parameters is m W = 1. Computed by numerical integration of the exact density 2.12). nuisance parameter κ 1 is one-dimensional, and the density of the data is analytically available, so the NRP of the test can be estimated accurately by Monte Carlo simulation or numerical integration. Using lowdimensional) simulations to calculate the asymptotic) size of a testing procedure has been used in several recent papers, see e.g. Elliott et al. 2015). Figure 2 plots the NRPs of both and the subvector AR test of GKMC in 2.16) at α = 5% as a function of κ 1 for k = 5 and m W = 1. The conditional test is evaluated using the critical values reported in Table 1 with interpolation. 5 We notice that the size of the conditional subvector AR test is controlled, because the NRPs never exceed the nominal size no matter the value of κ 1. The NRPs of the subvector AR test are monotonically increasing in κ 1 in accordance with Theorem 1. Therefore the proposed conditional test strictly dominates the unconditional test. The results for other significance levels and other values of k are the same, and they are reported in Table S.21 of the SM. We summarize this finding in the following theorem. Theorem 2 Under Assumption A, the finite-sample size of the conditional subvector AR test defined in 2.15) is equal to its nominal size α. Comment. To reiterate, the proof of Theorem 2 for given k m W and nominal size α is a combination of an analytical step that shows that the null rejection probability of the new test depends on only a scalar parameter and of a numerical step where it is shown by numerical integration and Monte Carlo simulation that none of the NRPs exceeds the nominal size. We performed these simulations for k m W = 1,..., 20 and α = 10%, 5%, and 1% using 1 million Monte Carlo replications, and in no case did we find size distortion. 5 E.g. if ˆκ 1 = 2.4 which is an element of [2.3, 2.5], then from Table 1 the critical value employed would be 2.2. To produce Figure 2 we use a grid of 42 points for κ 1, evenly spaced in log-scale between 0 and 100. In this figure, the NRPs were computed by numerical integration using the Quadpack in Ox, see Doornik 2001). The densities were evaluated using the algorithm of Koev and Edelman 2006) for the computation of hypergeometric functions of two matrix arguments. The NRPs are essentially the same when estimated by Monte Carlo integration with 1 million replications, see Section S.2 in the SM. 8

2.4 Power analysis when m W = 1 One main advantage of the conditional subvector AR test 2.14) is its computational simplicity. For general m W there are other approaches one might consider based on the information in the eigenvalues ˆκ 1,..., ˆκ mw ) that, at the expense of potentially much higher computational cost, might yield higher power than the conditional subvector AR test. For example, one could apply the critical value function approach of Moreira et al. 2016) to derive conditional critical values. One could condition on the largest m W eigenvalues rather than just the largest one. To assess the scope for power improvements over the subvector AR test in GKMC, we consider the case m W = 1 and compute power bounds of all tests that depend on the statistic ˆκ 1, ˆ ). These are point-optimal bounds based on the least favorable distribution for the nuisance parameter κ 1 under the null that = 0, see the SM S.2.3 for details. We consider both the approximately least favorable distribution ) method of Elliott et al. 2015) and the one-point least favorable distribution of Andrews et al., 2008, section 4.2), but report here only the bound for brevity and because it is very similar to the Andrews et al. 2008) upper bound. The results based on the Andrews et al. 2008) method are discussed in Section S.4.2 of the SM. Recall from 2.11) that under H 0 : β = β 0 in 2.3), the joint distribution of ˆκ 1,..., ˆκ p ) only depends on the vector of eigenvalues κ 1,..., κ mw ) of Θ W Θ W, where Θ W R k m W appears in the noncentrality matrix M = ) 0 k, Θ W of Ξ N M, Ikp ). It follows easily from S 17) in the SM that if Π W ranges through all matrices in R k m W, then κ 1,..., κ mw ) ranges through all vectors in [0, ) m W. Define A := EZ y Y β 0, W )) R k p and consider the null hypothesis H 0 : ρ A) m W versus H 1 : ρ A) = p. 2.17) Clearly, whenever H 0 holds H 0 holds too, but the reverse is not true; by S 18) in the SM, H 0 holds iff Π W is rank deficient or Π Y β β 0 ) spanπ W ). It is shown in the SM Case 2 in Subsection S.1.1) that under H 0 the joint distribution of ˆκ 1,..., ˆκ p ) is the same as the one of the vector of eigenvalues of a Wishart matrix W p k, I p, M M) with rank deficient noncentrality matrix and therefore depends only on the vector of the largest m W eigenvalues κ 1,..., κ mw ) R m W of M M. The important implication of that result is that any test ϕˆκ 1,..., ˆκ p ) [0, 1] for some measurable function ϕ that has size bounded by α under H 0 also has size in the parameters β, γ, Π Y, Π W )) bounded by α under H 0. In particular, no test ϕˆκ 1,..., ˆκ p ) that controls size under H 0 has power exceeding size under alternatives H 0\H 0. Now assume m W = 1. We compute the power of the conditional and unconditional subvector tests and at the 5% level for k = 5 and the associated power bound for a grid of values of the parameters κ 1 > 0 under the alternative, see Section S.2.3 in the SM for details. The power curves are computed using 100,000 Monte Carlo replications without importance sampling results for other k are similar and given in the SM). The left panel of Figure 3 plots the difference between the power function of the conditional test and the power bound across all alternatives. Except at alternatives very close to the null, and when κ 1 is very close to so the nuisance parameter is weakly identified), the power of the conditional subvector test is essentially on the power bound. The fact that the power of for small κ 1 is somewhat below the power bound can be explained by the fact that the test is not exactly similar, so its rejection probability can fall below α for some alternatives. The right panel of Figure 3 plots the power curves for alternatives with κ 1 =, which seem to be the least favorable to the conditional test. The power of the conditional test is mostly on the power bound, while the subvector test is well below the bound. Two-dimensional 9

Power of minus power bound 1.0 Power curves when κ 1 = 0.02 0.01 0 Power bound 75 50 25 κ 1 10 20 30 0 5 10 15 20 25 30 Figure 3: Power of conditional 2.15) and GKMC 2.16) subvector AR tests, and, and point optimal power envelope computed using the method of Elliott et al. 2015). The number of instruments is k = 5 and the number of nuisance parameters is m W = 1. The left panel plots the power of minus the power bound across all alternatives. The right panel plots the power curves for both tests and the power bound when κ 1 =. plots for other values of κ 1 are provided in the SM. As κ 1 gets larger, the power of gets closer to the power envelope, as expected. 2.5 Size of when m W > 1 - inadmissibility of We cannot extend the monotonicity result of Theorem 1 to the general case m W > 1. This is because the distribution of the subvector AR statistic depends on all the m W eigenvalues of M M in 2.10), and the method of the proof of Theorem 1 only works for the case that ρ M M) = 1. 6 However, the following result suffices to establish correct finite-sample size of the proposed conditional subvector AR test 2.15) and the inadmissibility of the subvector test in 2.16) in the general case. Theorem 3 Suppose that Assumption A holds with m W > 1. Denote by Ξ 11 R k m W +1 2 the upper left submatrix of Ξ := ΞO R k p, where Ξ and the random orthogonal matrix O R p p are defined below 2.9) and in S 4) of the SM, respectively. Then, under the null hypothesis H 0 : β = β 0 Ξ 11 Ξ 11 O W 2 k m W + 1, I 2, M 11 M11 ), where M 11 is defined in S 7) in the SM and satisfies ρ M 11 M 11 ) 1. Note that AR n β 0 ) = κ min Ξ Ξ) = κ min Ξ Ξ) κmin Ξ 11 Ξ 11 ) κ max Ξ 11 Ξ 11 ) κ max Ξ Ξ) = κmax Ξ Ξ), 2.18) where the first and third inequalities hold by the inclusion principle, see Lütkepohl, 1996, p. 73) and the second and last equalities hold because O is orthogonal. Therefore, P AR n β 0 ) > c 1 α κ max Ξ Ξ), k m W )) P κ min Ξ Ξ 11 11 ) > c 1 α κ max Ξ Ξ 11 11 ), k m W )) α, 2.19) 6 See Perlman and Olkin, 1980, p. 1337) for some more discussion of the difficulties involved in extending the result to the general case. 10

0.04 NRP, k=5, m W =1, α= 0.05 0.2 Power curves when κ 1 =0 power bound ϕ adj 0.02 ϕ adj 0.1 0 20 40 60 80 100 κ 1 0 1 2 3 4 5 6 Figure 4: Left panel: NRP of 2.15), GKMC 2.16) and adjusted subvector AR tests,, and ϕ adj. Right panel: comparison of power curves when κ 1 = to point optimal power envelope computed using the method of Elliott et al. 2015). where the first inequality follows from 2.18). The second inequality follows from Theorem 2 for the case m W = 1 and from Theorem 3 by conditioning on O, where the role of k is now played by k m W + 1. Hence, the conditional subvector AR test has correct size for any m W. Because c 1 α κ max Ξ Ξ), k m W ) < χ 2 k m W,1 α, it follows that the subvector AR test given in 2.16) is inadmissible. 2.6 Refinement Figure 2 shows that the NRPs of test for nominal size 5% is considerably below 5% for small values of κ 1, which causes a loss of power for some alternatives that are close to H 0, see Figure 3. However, we can reduce the under-rejection by adjusting the conditional critical values to bring the test closer to similarity. 7 For the case k = 5, m W = 1, and α = 5%, let ϕ adj be the test that uses the critical values in Table 1 where the smallest 8 critical values are divided by 5 e.g., the critical value for ˆκ 1 = 2.5 becomes 0.46). Figure 4 shows that ϕ adj still has size 5%, that it is much closer to similarity than, and does not suffer from any loss of power relative to the power bound near H 0. This approach can be applied to all other values of α and k, but needs to be adjusted for each case. 3 Asymptotics In this section, Assumption A is replaced by F. Assumption B: The random vectors ε i, Z i, V Y,i V W,i ) for i = 1,..., n in 2.1) are i.i.d. with distribution Therefore, the instruments are random, the reduced form errors are not necessarily normally distributed, and the matrix Ω = E F V i V i is unknown. We define the parameter space F for γ, Π W, Π Y, F ) under the null hypothesis H 0 : β = β 0 exactly as in GKMC. 8 Namely, for U i = ε i, V W,i ) let 7 We thank Ulrich Müller for this suggestion. 8 Regarding the notation γ, Π W, Π Y, F ) and elsewhere, note that we allow as components of a vector column vectors, matrices of different dimensions), and distributions. 11

F = {γ, Π W, Π Y, F ) : γ R m W, Π W R k m W, Π Y R k m Y, E F T i 2+δ ) B, for T i {Z i ε i, vecz i V W,i), V W,i ε i, ε i, V W,i, Z i }, E F Z i V i ) = 0 k m+1), E F vecz i U i)vecz i U i)) ) = E F U i U i) E F Z i Z i)), κ min A) δ for A {E F Z i Z i), E F U i U i)}} 3.1) for some δ > 0 and B <, where denotes the Kronecker product of two matrices and vec ) the column vectorization of a matrix. Note that the factorization of the covariance matrix into a Kronecker product in line three of 3.1) is a slightly weaker assumption than conditional homoskedasticity. Rather than controlling the finite-sample size the objective is to demonstrate that the new conditional subvector AR test has asymptotic size, that is the limit of the finite-sample size with respect to F, equal to the nominal size. We next define the test statistic and the critical value for the case here where Ω is unknown. With some abuse of notation by using the same symbol for another object than above), the subvector AR statistic AR n β 0 ) is defined as the smallest root ˆκ pn of the roots ˆκ in, i = 1,..., p ordered nonincreasingly) of the characteristic polynomial ˆκIp Ûn Y 0, W ) PZ Y 0, W ) Û n = 0, 3.2) where Û n := n k) 1 Y 0, W ) MZ Y 0, W ) ) 1/2 3.3) and Û 2 n is a consistent estimator under certain drifting sequences from the parameter space F) for Ω β 0 ) in 2.5), see Lemma 1 in the SM for details. The conditional subvector AR test rejects H 0 at nominal size α if AR n β 0 ) > c 1 α ˆκ 1n, k m W ), 3.4) where c 1 α, ) has been defined in 2.13) and ˆκ 1n is the largest root of 3.2). Theorem 4 Under Assumption B, the conditional subvector AR test in 3.4) implemented at nominal size α 0, 1) has asymptotic size equal to α for the parameter space F defined in 3.1). Comments. 1. The proof of Theorem 4 is given in Section S.1.3 in the SM. It relies on showing that the limiting NRP is smaller or equal to α along all relevant drifting sequences of parameters from F. This is done by showing that the limiting NRPs equal finite-sample NRPs under Assumption A. Therefore the same comment applies to Theorem 4 as the comment below Theorem 2. The analysis is substantially more complicated here than in GKMC, in part because the critical values are also random. 2. Theorem 4 remains true if the conditional critical value c 1 α ˆκ 1n, k m W ) of the subvector AR test is replaced by any other critical value, c 1 α ˆκ 1n, k m W ) say, where c 1 α, k m W ) is a continuous non-decreasing function such that the corresponding test under Assumption A has finite-sample size equal to α. In particular, besides the critical values obtained from Table 1 by interpolation also the critical values suggested in Section 2.6 could be used. 12

4 Conclusion We show that the subvector AR test of GKMC is inadmissible by developing a new conditional subvector AR test that has correct size and uses data-dependent critical values that are always smaller than the χ 2 k m W critical values in GKMC. The critical values are increasing in a conditioning statistic that relates to the strength of identification of the parameters not under test. Our proposed test has considerably higher power under weak identification than the GKMC procedure. References Anderson, T. W. and H. Rubin 1949). Estimation of the parameters of a single equation in a complete system of stochastic equations. Ann. Math. Statistics 20, 46 63. Andrews, D. W. 2017). Identification-robust subvector inference. Cowles Foundation Discussion Papers 3005, Cowles Foundation for Research in Economics, Yale University. Andrews, D. W., X. Cheng, and P. Guggenberger 2011). Generic Results for Establishing the Asymptotic Size of Confidence Sets and Tests. Cowles Foundation Discussion Papers 1813, Cowles Foundation for Research in Economics, Yale University. Andrews, D. W. K. and P. Guggenberger 2015). Identification- and Singularity-Robust Inference for Moment Condition. Cowles Foundation Discussion Papers 1978, Cowles Foundation for Research in Economics, Yale University. Andrews, D. W. K., M. J. Moreira, and J. H. Stock 2006). Optimal two-sided invariant similar tests for instrumental variables regression. Econometrica 74 3), 715 752. Andrews, D. W. K., M. J. Moreira, and J. H. Stock 2008). Efficient two-sided nonsimilar invariant tests in IV regression with weak instruments. Journal of Econometrics 146 2), 241 254. Andrews, I. and A. Mikusheva 2016). A geometric approach to nonlinear econometric models. Econometrica 84 3), 1249 1264. Bugni, F. A., I. A. Canay, and X. Shi 2017). Inference for subvectors and other functions of partially identified parameters in moment inequality models. Quantitative Economics 8 1), 1 38. Chaudhuri, S. and E. Zivot 2011). A new method of projection-based inference in GMM with weakly identified nuisance parameters. Journal of Econometrics 164 2), 239 251. Chernozhukov, V., C. Hansen, and M. Jansson 2009). Admissible invariant similar tests for instrumental variables regression. Econometric Theory 25 03), 806 818. Doornik, J. A. 2001). Ox: An Object-Oriented Matrix Language 4th ed.). London: Timberlake Consultants Press. Dufour, J.-M. and M. Taamouti 2005). Projection-based statistical inference in linear structural models with possibly weak instruments. Econometrica 73 4), 1351 1365. Elliott, G., U. K. Müller, and M. W. Watson 2015). Nearly optimal tests when a nuisance parameter is present under the null hypothesis. Econometrica 83 2), 771 811. 13

Guggenberger, P., F. Kleibergen, S. Mavroeidis, and L. Chen 2012). On the Asymptotic Sizes of Subset Anderson-Rubin and Lagrange Multiplier Tests in Linear Instrumental Variables Regression. Econometrica 80 6), 2649 2666. Han, S. and A. McCloskey 2017). Estimation and inference with a nearly) singular Jacobian. Manuscript. Hillier, G. 2009a). Exact properties of the conditional likelihood ratio test in an iv regression model. Econometric Theory 25 4), 915 957. Hillier, G. 2009b). On the conditional likelihood ratio test for several parameters in iv regression. Econometric Theory 25 2), 305 335. James, A. T. 1964). Distributions of matrix variates and latent roots derived from normal samples. The Annals of Mathematical Statistics, 475 501. Johnson, N. L. and S. Kotz 1970). Continuous univariate distributions. Boston: Houghton Mifflin. Kaido, H., F. Molinari, and J. Stoye 2016). Confidence intervals for projections of partially identified parameters. arxiv preprint arxiv:1601.00934. Kleibergen, F. 2002). Pivotal statistics for testing structural parameters in instrumental variables regression. Econometrica 70 5), 1781 1803. Kleibergen, F. 2015). Efficient size correct subset inference in linear instrumental variables regression. mimeo, University of Amsterdam. Koev, P. and A. Edelman 2006). The efficient evaluation of the hypergeometric function of a matrix argument. Mathematics of Computation 75 254), 833 846. Leach, B. G. 1969). Bessel functions of matrix argument with statistical applications. Ph. D. thesis. Lütkepohl, H. 1996). Handbook of Matrices. England: Wiley. Montiel-Olea, J.-L. 2017). Admissible, similar tests: A characterization. Manuscript. Moreira, M. J. 2003). A conditional likelihood ratio test for structural models. Econometrica 71 4), 1027 1048. Moreira, M. J., R. Mourão, and H. Moreira 2016). A critical value function approach, with an application to persistent time-series. arxiv preprint arxiv:1606.03496. Muirhead, R. J. 1978). Latent roots and matrix variates: a review of some asymptotic results. The Annals of Statistics, 5 33. Muirhead, R. J. 2009). Aspects of multivariate statistical theory, Volume 197. John Wiley & Sons. Nelson, C. R. and R. Startz 1990). Some further results on the exact small sample properties of the instrumental variable estimator. Econometrica 58 4), 967 976. Nielsen, B. 1999). The likelihood-ratio test for rank in bivariate canonical correlation analysis. Biometrika 86 2), 279 288. 14

Nielsen, B. 2001). Conditional test for rank in bivariate canonical correlation analysis. Biometrika 88 3), 874 880. Olver, F. W. 1997). Asymptotics and special functions., AK Peters Ltd., Wellesley, MA, 1997. Reprint of the 1974 original. Perlman, M. D. and I. Olkin 1980). Unbiasedness of invariant tests for manova and other multivariate problems. The Annals of Statistics, 1326 1341. Rhodes Jr, G. F. 1981). Exact density functions and approximate critical regions for likelihood ratio identifiability test statistics. Econometrica: Journal of the Econometric Society, 1035 1055. Schmidt, P. 1976). Econometrics. New York: Marcel Dekker, Inc. Staiger, D. and J. H. Stock 1997). Instrumental variables regression with weak instruments. Econometrica 65, 557 586. Stock, J. H., J. H. Wright, and M. Yogo 2002). A survey of weak instruments and weak identification in generalized method of moments. Journal of Business and Economic Statistics 20, 518 529. Zhu, Y. 2015). A new method for uniform subset inference of linear instrumental variables models. Available at ssrn: https://ssrn.com/abstract=2620552. 15

Supplemental Material The Supplemental Material contains proofs of the results in the main text and additional graphs and figures. S.1 Proofs and derivations Proof of Theorem 1: The monotonicity follows from Perlman and Olkin, 1980, Theorem 3.5). The proof relies on the following result, available in Muirhead, 2009, Theorem 10.3.8), which states that a 2 2 non-central Wishart matrix with noncentrality matrix of rank 1 can be expressed as T T, where ) t 11 t 12 T =, 0 t 22 t 2 11 χ 2 k κ 1) non-central χ 2 with noncentrality parameter κ 1 ), t 2 22 χ 2 k 1, t 12 N 0, 1), and t 11, t 12, t 22 are mutually independent. The minimum eigenvalue of T T, ˆκ min, is given by ˆκ min = t2 11 + t 2 12 + t 2 22 t 2 11 + t2 12 + t2 22 )2 4t 2 11 t2 22. 2 It is straightforward to show that ˆκ min t 2 22, which establishes the upper bound in the distribution of ˆκ min in GKMC. It is also straightforward to establish that ˆκ min is monotonically increasing in t 2 11, and since t 2 11 is stochastically increasing in κ 1 see, e.g., Johnson and Kotz, 1970, ch. 28)), then ˆκ min is stochastically increasing in κ 1, as shown formally in Perlman and Olkin, 1980, Theorem 3.5). Finally, ˆκ min t 2 p 22 0 as κ 1 because t 2 p d 11 ), and therefore, ˆκ min χ 2 k 1, as required. Proof of Theorem 3: Consider the k m W + 1) Gaussian matrix Ξ N M, I kmw +1)), with M nonstochastic and ρ M) m W. Partition Ξ as ) Ξ 11 Ξ 12 Ξ =, S 1) Ξ 21 Ξ 22 where Ξ 11 is k m W + 1) 2, Ξ 12 is k m W + 1) m W 1), Ξ 21 is m W 1) 2, and Ξ 22 is m W 1) m W 1). Partition M conformably with Ξ. Let µ i, i = 1,..., m W, denote the possibly nonzero singular values of M the order doesn t matter for the arguments below). Wlog, we can set M = M 11 0 k m W +1 m W 1 0 m W 1 2 M 22 ), S 2) where ) 0 k m W 1 0 k m W 1 M 11 :=, and M 22 := diag µ 1,...µ mw 1). S 3) 0 µ mw Let O := I2 + Ξ 21Ξ 1 Ξ 1 22 Ξ 21 22 Ξ 1 22 Ξ 21 I2 + Ξ 21Ξ 1 22 Ξ 1 22 Ξ 21 ) 1/2 Ξ 21Ξ 1 ImW 1 + Ξ 1 22 ) 1/2 ImW 1 + Ξ 1 22 Ξ 21Ξ 21Ξ 1 ) 1/2 22 22 Ξ 21Ξ 21Ξ 1 22 ) 1/2 ) S 4) S.1

and where ) Ξ11 Ξ12 Ξ := ΞO =, S 5) 0 Ξ22 Ξ 11 := Ξ 11 Ξ 12 Ξ 1 22 Ξ ) 21 I2 + Ξ 21Ξ 1 22 Ξ 1 22 Ξ 1/2 21). S 6) Moreover, since Ξ 21 and Ξ 22 are independent of Ξ 11 and Ξ 12, and O O = I mw +1, conditional on O, Ξ 11 R k m W +1) 2 is Gaussian with covariance matrix I 2k mw +1) and mean M 11 := M 11 M 12 Ξ 1 22 Ξ ) 21 I2 + Ξ 21Ξ 1 22 Ξ 1 22 Ξ ) 1/2 21 = M 11 I2 + Ξ 21Ξ 1 22 Ξ 1 22 Ξ 21) 1/2. S 7) ) Since ρ M 11 ) 1 by S 3), the same holds for ρ M11. Hence, conditional on O, Ξ Ξ 11 11 W 2 k m W +1, ) I 2, M 11 M11 ) with ρ M 11 M11 1. S.1.1 Joint distribution of the vector of eigenvalues of eigenproblem 2.7) We study the joint distribution of the vector of eigenvalues ˆκ 1,..., ˆκ mw ) of the eigenproblem that defines the subvector statistic AR n β 0 ) when the hypothesized β 0 does not necessarily equal the true slope parameter β. Recall the model 2.1) and the eigenproblem of the subvector AR statistic 2.7). Pre/post-multiplying 2.7) by 1 0 γ I mw ) yields 0 = κσ u, W ) P Z u, W ) S 8) an equivalent eigenproblem, where u := y Y β 0 W γ = ε + Y β β 0 ), Σ := σ uu Σ uv W Σ uvw Σ VW V W ), S 9) and σ uu and Σ uv W R m W denote the variance of u and the covariance between u and V W, respectively. Note that u does not equal the structural error ε in 2.1) unless β = β 0. Note that for C := ) σuu 1/2 0 Σ 1/2 V W V W.u Σ σ 1 uv W uu Σ 1/2 V W V W.u with Σ VW V W.u := Σ VW V W Σ uv W Σ uvw σ 1 uu R m W m W, S 10) CΣC = I p holds. Therefore, pre and postmultiplying S 8) by C leads to 0 = κi p u/σuu 1/2, W u Σ ) ) uv W Σ 1/2 V σ W V W.u P Z u/σuu 1/2, W u Σ ).u) uv W Σ 1/2 V uu σ W V W uu S 11) or 0 = κi 1+m W ξ uξ u ξ W.u ξ u ) ξ uξ W.u ξ W.u ξ W.u, S 12) where ξ u := Z Z) 1/2 Z u/σ 1/2 uu R k and ξ W.u := Z Z) 1/2 Z W u Σ uv W σ uu ) Σ 1/2 V W V W.u Rk m W. S 13) S.2

Now, E ξ u )=E Z Z) 1/2 Z Y β β 0 ) /σ 1/2 uu =Z Z) 1/2 Π Y β β 0 ) /σuu 1/2 and E ξ W.u )=Z Z) 1/2 Π W Π Y β β 0 ) Σ ) uv W Σ 1/2 V σ W V W.u. uu S 14) Hence, Ξ := [ξ u, ξ W.u ] N M, I kp ) and Ξ Ξ W p k, I p, M M), where [ M := Z Z) 1/2 Π Y β β 0 ) /σuu 1/2, Π W Π Y β β 0 ) Σ uv W σ uu ) Σ 1/2 V W V W.u ]. S 15) Case 1) Assume that H 0 in 2.3) holds. In that case u = ε and we write Σ = σ εε Σ εvw Σ εv W Σ VW V W ) S 16) and Σ VW V W.ε := Σ VW V W Σ εv W Σ εvw σ 1 εε. Defining Θ W := Z Z) 1/2 Π W Σ 1/2 V W V W.ε Rk m W, S 17) it follows that M = 0 k, Θ W ). Case 2) Assume instead that H 0 in 2.17) holds. Note that and therefore for M defined in S 15) M = Z Z) 1/2 AT for T := A = Z Z [Π Y β β 0 ) + Π W γ, Π W ] 1/σ1/2 uu Σ uv W σ 1/2 uu γ/σ 1/2 uu I mw + γ Σ uv W σ 1/2 uu Σ 1/2 V W V W.u )Σ 1/2 V W V W.u. S 18) S 19) Because Z Z) 1/2 and T are both of full rank it follows that ρ M) = ρ A). 9 9 To see the former, note that T is of full rank iff T := 1 c γ Σ 1/2 V W V W.u + γc is of full rank, where c := Σ uvw Σ 1/2 V W V W.u σ 1/2 uu. But whenever T a 1, a 2 ) = 0 p it follows that a 1 c a 2 = 0 and γa 1 + Σ 1/2 V W V W.u a 2 + γc a 2 = 0 m W. Inserting the former into the latter equality yields Σ 1/2 V W V W.u a 2 = 0 m W and thus a 2 = 0 m W. The latter implies a 1 = 0. Finally, Z Z) 1/2 is of full rank by Assumption A 2. ) S.3

S.1.2 The approximate conditional distribution This section replicates the analysis in Muirhead 1978, Section 6). As a special case of James, 1964, eq. 68)), the joint density of the eigenvalues ˆκ 1 and ˆ of Ξ Ξ W 2 k, I 2, M M) can be written as π 2 fˆκ1,ˆ x 1, x 2 ; κ 1, ) = 2 k Γ 2 k/2) Γ 2 1) exp 1 ) 2 x 1 + x 2 ) exp 1 ) 2 κ 1 + ) 0F 2) 1 1 2 k; 1 ) )) κ1 0 x1 0, 4 0 0 x 2 x k 3 2 1 x k 3 2 2 x 1 x 2 ) S 20) for x 1 x 2 0, where Γ m a) := π mm 1)/4 m i=1 Γ a 1 2 i 1)) and 0 F 2) 1 is the hypergeometric function of two matrix arguments. Thus, Γ 2 a) := π 1/2 Γ a) Γa 1 2 ), Γ 2 1) := π 1/2 Γ 1) Γ 1 2 ) = π and Γ 2 k/2) = π 1/2 Γ k/2) Γ k 1 2 ). So, the joint density S 20) can also be written as π 1/2 2 k Γ k/2) Γ k 1 exp 2 1 2 κ 1 + ) )exp 1 ) 2 x 1 + x 2 ) ) x k 3 2 0F 2) 1 1 2 k; 1 κ1 4 0 1 x k 3 2 0 2 x 1 x 2 ) ) x1 0, 0 Under the assumption that κ 1 > = 0, where κ 1 is large, Leach 1969) has shown that x 2 )). S 21) 0F 2) 1 1 2 k; 1 κ1 4 0 0 ) x1, 0 0 x 2 )) 2 k 2 ) 2 π Γ k/2) exp x 1 κ 1 ) 1 2 κ 1 x 1 ) 2 k 4 κ 1 x 1 x 2 )) 1 2. S 22) Substituting equation S 22) into equation S 21) gives an asymptotic representation for the density function of ˆκ 1 and ˆ under the assumption that κ 1 is large, 2 k+2 π 1/2 2 Γ k 1 exp 2 1 2 x 2 ) exp 1 ) 2 κ 1 κ k 4 1 ˆκ k 4 4 ) x k 3 2 2 x 1 x 2 ) 1 2. 1 exp [ 1 ] 2 x 1 + x 1 κ 1 ) 1 2 S 23) This is a special case of Muirhead 1978, 6.5)) with his k, m, and n corresponding to 1, p = 2, and k, respectively, and using = 0. Integrating the second line of S 23) w.r.t. x 2 yields ˆκ1 0 = π 1 2 exp 1 ) 2 x 2 2 xk/2 1 Γ ) k 1 2 Γ k+2 2 x k 3 2 2 x 1 x 2 ) 1 2 dx 2 ) 1 F 1 k 1 2, k + 2 ; x 1 2 2 ), S 24) where 1 F 1 a, c; z) is the confluent hypergeometric function. Combined with S 23), the approximate conditional distribution of ˆ given ˆκ 1 is f ˆ ˆκ 1 x 2 ˆκ 1 ) = Γ ) k+2 2 Γ ) 2 exp 1 2 x ) k 3 2 x 2 2 ˆκ 1 x 2 1 2 k 1 2 ˆκ k ). S 25) 2 1 π k 1 1 F 1 2, k+2 2 ; k 2 2 S.4

The last equation reduces to 2.12) if we use the definition of the density of χ 2 k 1, f χ 2 x 1 2) = k 1 2 k 1 2 Γ k 1 2 ) x k 3 2 2 e x 2 2. Hence, the integrating constant g ˆκ 1 ) in the approximate conditional density 2.12) is given by g ˆκ 1 ) = Γ ) k+1 k+2 2 2 2 ˆκ k 2 1 π 1 F k 1 1 2, k+2 2 ; ˆκ1 2 ). S 26) The result that c 1 α, k 1) = χ 2 k 1,1 α follows from the fact that limˆκ 1 f ˆ ˆκ 1 ˆκ 1 ) = f χ 2 k 1 ). This can be proven using the property that 1 F 1 a, c; z) z a Γ c) /Γ c a) as z Olver, 1997, p. 2 k+1 2 x 1 x 2) 257, eq. 10.08). It follows that 1/2 Γ k+2 k+1 2 ) 2 Γ k+2 2 k 1 2 ) = 2Γ 3 2) π = 1 as x 1. x k 21 S.1.3 Proof of Theorem 4 Uniformity Reparametrization π 1F 1 k 1 2, k+2 2 ; x 1 2 ) 2 π2 k 1 2 To prove that the new subvector AR test has asymptotic size bounded by the nominal size α we use a general result in Andrews, Cheng, and Guggenberger 2011, ACG from now on). To describe it, consider a sequence of arbitrary tests {ϕ n : n 1} of a certain null hypothesis and denote by RP n λ) the null rejection probability of ϕ n when the DGP is pinned down by the parameter vector λ Λ, where Λ denotes the parameter space of λ. By definition, the asymptotic size of ϕ n is defined as AsySz = lim sup n λ Λ sup RP n λ). S 27) Let {h n λ) : n 1} be a sequence of functions on Λ, where h n λ) = h n,1 λ),..., h n,j λ)) with h n,1 λ) R j = 1,..., J. Define H = {h R {± }) J : h wn λ wn ) h for some subsequence {w n } of {n} and some sequence {λ wn Λ : n 1}} S 28) Assumption B in ACG: For any subsequence {w n } of {n} and any sequence {λ wn Λ : n 1} for which h wn λ wn ) h H, RP wn λ wn ) [RP h), RP + h)] for some RP h), RP + h) 0, 1). 10 The assumption states, in particular, that along certain drifting sequences of parameters λ wn indexed by a localization parameter h the NRP of the test cannot asymptotically exceed a certain threshold RP + h) indexed by h. Proposition 1 ACG, Theorem 2.1a) and Theorem 2.2) Suppose Assumption B in ACG holds. Then, inf h H RP h) AsySz sup h H RP + h). We next verify Assumption B in ACG for the subvector AR test and establish that sup h H RP + h) = α when the test is implemented at nominal size α. To do so, we use Andrews and Guggenberger 2015, AG from now on), namely Proposition 12.5 in AG, to derive the joint limiting distribution of the eigenvalues κ in, i = 1,..., p in 3.2). We reparameterize the null distribution F to a vector λ. The vector λ is chosen such that for a subvector of λ convergence of a drifting subsequence of the subvector after suitable renormalization) yields convergence in distribution of the test statistic and the critical value. For given F define Q F := E F Z i Z i) 1/2 and U F := Ωβ 0 ) 1/2. S 29) 10 By definition, the notation x n [x 1,, x 2, ] means that x 1, lim inf n x n lim sup n x n x 2,. S.5