Inference for Subsets of Parameters in Partially Identified Models

Kyoo il Kim
University of Minnesota
June 2009, updated November 2010
Preliminary and Incomplete

Abstract

We propose a confidence set for subsets of parameters in partially identified models characterized by moment inequalities. The subvector inference is based on the specification test of Guggenberger, Hahn, and Kim (2008), who discuss the dual characterization between specification testing of moment inequalities and multi-dimensional one-sided tests. We explore the idea that a specification test has natural implications for the construction of a confidence set (CS) for a subset of the components of a vector-valued parameter. To be precise, let $\theta$ be the full parameter vector in an econometric model. We decompose $\theta = (\theta_1, \theta_2)$ and consider a confidence set for $\theta_1$. The CI or CS for $\theta_1$ can be constructed as the set of all $\theta_1 \in \Theta_1$ that are not rejected by a specification test of whether there exists a $\theta_2$ in $\Theta_2$ that does not reject the model given the value of $\theta_1$. We modify this procedure and propose a two-step method that constructs a CI or CS for the subvector $\theta_1$ by restricting the values of $\theta_2$ to a first-step confidence set constructed at each candidate value of $\theta_1$. We then collect the values of $\theta_1$ that survive this modified specification test. We show that the proposed CI or CS has correct coverage probability and that it is asymptotically locally equivalent to the infeasible CI or CS that uses the known true parameter set of the other parameters.

Keywords: Moment inequalities, Partial identification, Specification test, Inference for subsets of parameters, Dual characterization
JEL classification numbers: C12
Correspondence: kyookim@umn.edu. Address: 4-129 Hanson Hall, 1925 Fourth Street South, Minneapolis, MN 55455

1 Introduction

There has been a recent surge of interest in statistical inference for econometric models in which the parameters of interest are partially identified. Initiated by Manski, econometric analyses of incomplete or partially identified models have grown substantially over the last decade. Incomplete models arise in many contexts, such as interval-censored observations, sample selection with missing counterfactuals, moment inequality models, and games with multiple equilibria. Several estimation, inference, and specification testing methods for these models have been proposed, including Manski (1990), Horowitz and Manski (1995), Manski and Tamer (2002), Chernozhukov, Hong, and Tamer (2007), Andrews, Berry, and Jia (2004), Rosen (2008), Romano and Shaikh (2008, 2010), Beresteanu and Molinari (2008), Andrews and Guggenberger (2009), Andrews and Soares (2010), Pakes, Porter, Ho, and Ishii (2006), Bugni (2010), Canay (2010), and Fan and Wu (2010), among many others.

Most of the current literature, however, focuses on estimation or inference for the whole parameter vector in the model. When one wants a confidence interval for a particular parameter, or a confidence set for a subset of parameters, one typically constructs it by projecting a confidence set for the whole parameter vector. It is well understood that this method results in very conservative tests and inference. A similar argument can be found in the literature on weak instruments or weak moments, where a subset of parameters is weakly identified or nonidentified. Here we want to provide a potentially less conservative test or confidence set for the subset of parameters. Let $\theta$ be the vector of parameters in a model of interest. We decompose $\theta = (\theta_1, \theta_2) \in \Theta$ and want to obtain a confidence set for $\theta_1$.
We start from the idea that a specification test has natural implications for the construction of a CS for a subset of the components of a vector-valued parameter. A CS for $\theta_1$ can in principle be obtained as the set of all $\theta_1 \in \Theta_1$ that are not rejected by the specification test that asks whether there exists a $\theta_2$ in $\Theta_2$ that does not reject the model given the value of $\theta_1$. This idea has been suggested by Guggenberger, Hahn, and Kim (2008). Because this approach considers all the values of $\theta_2$ in $\Theta_2$, it may yield a conservative CS for $\theta_1$.

This paper builds on the idea that we may improve the CI or CS by restricting the values of $\theta_2$ to a first-step confidence set, denoted by $C_2(1-\alpha_2, \theta_1)$, that covers the true value of $\theta_2$ with asymptotic coverage probability equal to $1-\alpha_2$ given the value of $\theta_1$. We then collect the values of $\theta_1$ that survive this modified specification test. We show that the CI or CS for $\theta_1$ constructed by this two-step approach covers the true value of $\theta_1$ with asymptotic probability at least $1-\alpha_1-\alpha_2$, where $\alpha_1$ is the second-step significance level; the bound follows from Bonferroni-type arguments.

Our proposal shares the idea used in Chaudhuri and Zivot (2008) for GMM models in which a subset of parameters is weakly identified. They focus on the case in which the parameters of interest are strongly identified and propose inference methods for these strongly identified parameters, whereas this paper focuses on the case in which all the parameters of the model are partially identified.

We expect that this modified CI or CS, obtained via the modified specification test, can be less conservative than the one from the projection method. Moreover, by exploiting the duality between specification testing and multi-dimensional one-sided tests (e.g. Gourieroux, Holly, and Monfort 1982 and Wolak 1987), we demonstrate that the proposed two-step procedure can be practical. We further illustrate our approach in linear moment inequality models, where a practical algorithm to obtain this duality is available. These linear moment inequality models include a useful framework often used in empirical studies following Pakes, Porter, Ho, and Ishii (2006). The two-step approach to constructing a CI or CS for a subset of parameters can also be applied to other inference methods available in the current literature on partially identified models. Although our focus is on linear models, we present the asymptotic justification of our two-step approach in a general model setup so that one can adapt our strategy to other methods of constructing a CI or CS.

In the following section, we briefly review specification testing of partially identified models characterized by moment inequalities using a dual characterization. Section 3 proposes a CI or CS for a subset of parameters using the duality and the modified specification test. In Section 4, we provide the asymptotic coverage properties of our proposed CI or CS, which nests the original specification test of Guggenberger, Hahn, and Kim (2008). In Section 5 we discuss linear moment inequality models in detail. An illustrative example is presented in Section 6. Section 7 concludes. Technical details are deferred to the Appendix.
2 Specification Testing of moment inequalities

We can write many partially identified models of interest in terms of moment inequalities (including moment equalities) of the form
$$E[\varphi(w_i;\theta)] \ge \mu \tag{1}$$
where $\varphi$ is a nonlinear or linear function of $\theta$ and of the data $w_i = (y_i, x_i)$, and the inequalities are taken componentwise. We let $y_i$ denote endogenous variables, including dependent variables and endogenous regressors, and $x_i$ denote exogenous variables. Therefore, our model can nest IV models in which $x_i$ includes excluded instrumental variables. In particular, moment inequality conditions of the form (1) arise in many cases, including models with interval-measured $y_i$ and game-theoretic models in which (1) denotes a set of necessary conditions that characterize equilibria of the game. The identified set is defined as the collection of $\theta$ in the parameter set $\Theta$ that satisfy (1): $\Theta_0 = \{\theta \in \Theta : \theta \text{ satisfies (1)}\}$.

We first illustrate specification testing in a linear model given by a restriction of the form $C\theta \ge \mu$ for $C \in \mathbb{R}^{m \times d}$, $\mu \in \mathbb{R}^m$, and $\theta \in \mathbb{R}^d$. We are interested in the specification test
$$H_0 : \exists\, \theta \text{ such that } C\theta \ge \mu. \tag{2}$$

Example: Consider the linear IV regression model
$$y_i = x_i'\theta + \varepsilon_i \quad \text{and} \quad E[z_i \varepsilon_i] = 0 \tag{3}$$
for $\theta \in \mathbb{R}^d$. Assume that $y_i$ is observed only through the lower and upper bounds, denoted by $y_{il}$ and $y_{iu}$ respectively, of the interval that contains it, $y_i \in [y_{il}, y_{iu})$. Assume $z_i$ has bounded support. Then w.l.o.g. we can assume that $z_i \ge 0$. It follows that $E[z_i y_{il}] \le E[z_i y_i] \le E[z_i y_{iu}]$. Therefore, writing $A = E[z_i x_i']$, $\mu_L = E[z_i y_{il}]$, and $\mu_U = E[z_i y_{iu}]$, we obtain the restriction $\mu_L \le A\theta \le \mu_U$, or $C\theta \ge \mu$ for $C = (-A', A')'$ and $\mu = (-\mu_U', \mu_L')'$. The specification test for (2) tests whether there exists a $\theta \in \mathbb{R}^d$ such that (3) holds.

Guggenberger, Hahn, and Kim (GHK, 2008) show that the null hypothesis of the specification test (2) can be equivalently given by a dual characterization of the form
$$H_0 : B(C)\mu \ge 0. \tag{4}$$
This dual characterization means that there is a $\theta$ satisfying $C\theta \ge \mu$ if and only if $B(C)\mu \ge 0$. Therefore, the specification test of (2) is equivalent to the multi-dimensional one-sided test discussed, e.g., by Gourieroux, Holly, and Monfort (1982) and Wolak (1991). If $C$ is known, then $B$ is known (see footnote 1), and the test can build on the Wald-type test statistic
$$W_n = \inf_{t \in \mathbb{R}^p} \left\{ n (B\hat\mu - t)' \hat J^{-1} (B\hat\mu - t) \ \text{ subject to } t \ge 0 \right\}. \tag{5}$$
Here $\hat\mu$ is a $\sqrt{n}$-consistent, asymptotically normal estimator of $\mu$, and $\hat J$ is a consistent estimator of the asymptotic variance matrix of $B\hat\mu$. The asymptotic distribution of $W_n$ is a mixture of $\chi^2$ distributions, as shown in Kudo (1963). If $B$ is unknown, we need to replace it with a consistent estimator $\hat B$ of $B$. Then $\hat J$ in $W_n$ needs to be replaced by a consistent estimator of the asymptotic covariance matrix of $\hat B\hat\mu$. For technical details, see Appendix A. A similar dual characterization approach is provided in GHK for nonlinear models.
1 This dual characterization algorithm for the linear case is available in GHK.
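To make the statistic in (5) concrete, the following is a minimal sketch under two simplifying assumptions that are not in the text: the variance estimate $\hat J$ is taken to be diagonal, and the null is written as $B\mu \ge 0$ with the constraint $t \ge 0$. Under a diagonal $\hat J$ the minimization separates by coordinate, so the optimal $t_j$ is $\max(x_j, 0)$ and only the negative entries of $x = B\hat\mu$ contribute. The function name `one_sided_wald` and the numbers are hypothetical.

```python
def one_sided_wald(n, x, j_diag):
    """One-sided Wald statistic inf_{t >= 0} n (x - t)' J^{-1} (x - t).

    Illustrative sketch only: J is ASSUMED diagonal (j_diag holds its
    diagonal), which is a simplification and not the general case.
    x stands for the estimate B * mu_hat, one entry per dual inequality.
    """
    if len(x) != len(j_diag):
        raise ValueError("x and j_diag must have the same length")
    if any(v <= 0 for v in j_diag):
        raise ValueError("variances must be positive")
    # With a diagonal J each coordinate is minimized separately:
    # t_j = x_j when x_j >= 0 (no penalty), t_j = 0 when x_j is negative.
    t_opt = [max(xj, 0.0) for xj in x]
    w = n * sum((xj - tj) ** 2 / vj for xj, tj, vj in zip(x, t_opt, j_diag))
    return w, t_opt


# Hypothetical numbers: the second dual inequality is violated in sample.
w, t = one_sided_wald(100, [0.3, -0.2], [1.0, 4.0])
w_ok, t_ok = one_sided_wald(100, [0.3, 0.1], [1.0, 4.0])
```

With a non-diagonal $\hat J$ the problem is a quadratic program and would need a numerical solver; the closed form above does not carry over.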

3 Confidence set for a subset of parameters

The specification test discussed above has natural implications for the construction of a CS for a subset of parameters. The confidence set we consider is one for the true parameter $\theta_0 \in \Theta_0$: it covers the true parameter with a specified coverage probability. This confidence set is in the spirit of Imbens and Manski (2004). Alternatively, one can construct a confidence set for the identified set, as in Chernozhukov, Hong, and Tamer (2007) and Romano and Shaikh (2010). Typically the latter type of CS provides conservative inference when the true parameter is the object of interest. Imbens and Manski (2004) show that a confidence interval (CI) for the true parameter is shorter than a CI for the identified interval. The former corresponds to a one-sided CI, while the latter corresponds to a two-sided CI. The intuition is that $\theta_0$ cannot be both the upper bound and the lower bound at the same time unless the two bounds coincide.

To fix ideas, we start with a model characterized by a set of linear restrictions $C\theta \ge \mu$. We decompose (see footnote 2) $\theta = (\theta_1, \theta_2) \in \Theta \subseteq \Theta_1 \times \Theta_2 \subseteq \mathbb{R}^{k_1} \times \mathbb{R}^{k_2}$. We also decompose the identified set $\Theta_0$ as $\Theta_{10} = \{\theta_1 \in \Theta_1 : (\theta_1', \theta_2')' \in \Theta_0 \text{ for some } \theta_2\}$ and $\Theta_{20} = \{\theta_2 \in \Theta_2 : (\theta_1', \theta_2')' \in \Theta_0 \text{ for some } \theta_1\}$. Note that $\Theta_0 \subseteq \Theta_{10} \times \Theta_{20}$, but not necessarily $\Theta_{10} \times \Theta_{20} \subseteq \Theta_0$ unless $\Theta_0$ itself is a product space. Accordingly, we also decompose $C = [C_1, C_2]$ such that $C\theta = C_1\theta_1 + C_2\theta_2$.

We are interested in obtaining a CS or CI for the first component $\theta_1$ of $\theta$. Since our primary interest is to obtain a less conservative CI, we discuss our motivation in terms of the CI, but it extends naturally to a CS for a subset of parameters. The CI can be obtained by testing whether (without loss of generality) the first component $\theta_1$ of $\theta$ is equal to a trial value $\tilde\theta_1$ and collecting all the values $\tilde\theta_1$ that survive the following null hypothesis test:
$$H_0 : \exists\, \theta_2 \in \Theta_2 \text{ such that } C_2\theta_2 \ge \mu - C_1\tilde\theta_1. \tag{6}$$
The CI for $\theta_1$ can in principle be obtained as the set of all values $\tilde\theta_1$ that are not rejected by the test of the above hypothesis:
$$CI_1^{GHK} = \{\tilde\theta_1 \in \Theta_1 : \tilde\theta_1 \text{ does not reject } H_0 \text{ with } 1-\alpha \text{ level of confidence}\}. \tag{7}$$

2 Note that $\Theta$ is not necessarily the product space $\Theta_1 \times \Theta_2$, where $\theta_1 \in \Theta_1$ and $\theta_2 \in \Theta_2$.

Also note that the CS for the entire parameter $\theta$ is straightforward to implement from this perspective by testing and collecting the values $\tilde\theta$ that do not reject $H_0 : C\tilde\theta \ge \mu$

with a given asymptotic coverage level. The above idea has been suggested in GHK. A recent working paper by Hahn and Ridder (2009), however, argues that the confidence interval constructed by (7) may be unsatisfactory and conservative because the approach does not project out the test of overidentification. To understand this point, note that when we present our null hypothesis in (6), we do not clearly define the alternative hypothesis. What we really want to test is $H_0 : \exists\, \theta_2 \in \Theta_2$ such that $C_2\theta_2 \ge \mu - C_1\tilde\theta_1$ against
$$H_1^{*} : \exists\, \theta \in \Theta \text{ such that } C_1\theta_1 + C_2\theta_2 \ge \mu \text{ but } \nexists\, \theta_2 \in \Theta_2 \text{ such that } C_2\theta_2 \ge \mu - C_1\tilde\theta_1, \tag{8}$$
but the construction of the CI in (7) proceeds under the alternative hypothesis
$$H_1 : \nexists\, \theta_2 \in \Theta_2 \text{ such that } C_2\theta_2 \ge \mu - C_1\tilde\theta_1. \tag{9}$$
This alternative hypothesis $H_1$ can be satisfied in two cases: 1) there is no $\theta$ that satisfies the model, i.e. $\nexists\, \theta \in \Theta$ such that $C\theta \ge \mu$, and 2) the model is satisfied by some $\theta$ but is not satisfied by any $(\tilde\theta_1, \theta_2)$ at the particular value $\tilde\theta_1$ of $\theta_1$. This reveals that the test of the null hypothesis $H_0$ is in fact a joint test, and to increase its power we need to project out the overidentification test. The second case corresponds to the alternative hypothesis $H_1^{*}$ in (8), while the first case is covered only by (9). Therefore we do not project out the overidentification test in (7). The test of $H_0$ against $H_1$ is less powerful than the test of $H_0$ against $H_1^{*}$. This also implies that we can improve the CI for $\theta_1$ by testing $H_0$ against $H_1^{*}$ instead of $H_1$. For a linear parametric model, Hahn and Ridder (2009) propose a potentially computation-intensive way of obtaining the CI based on the alternative hypothesis (8). The procedure, however, is impractical for nonlinear models. Below we consider an alternative approach.

3.1 Modified two-step procedure

In this paper we propose an alternative way of obtaining a potentially less conservative CI that can also be applied to nonlinear models.
We modify the above procedure to obtain a potentially less conservative CI for $\theta_1$ that has a desirable asymptotic property. Suppose that $C_2(1-\alpha_2, \tilde\theta_1)$ is a CS for $\theta_2$ that covers the true value of $\theta_2$ with uniform asymptotic coverage probability equal to $1-\alpha_2$ at the given $\tilde\theta_1$. Note that $C_2(1-\alpha_2, \tilde\theta_1)$ could be the empty set, for example, if $\tilde\theta_1 \notin \Theta_{10}$. We define a modified specification test as
$$H_0 : \exists\, \theta_2 \in C_2(1-\alpha_2, \tilde\theta_1) \text{ such that } C_2\theta_2 \ge \mu - C_1\tilde\theta_1 \tag{10}$$

against $H_1 : \nexists\, \theta_2 \in C_2(1-\alpha_2, \tilde\theta_1)$ such that $C_2\theta_2 \ge \mu - C_1\tilde\theta_1$, or $C_2(1-\alpha_2, \tilde\theta_1) = \emptyset$. We then propose the following CI for $\theta_1$:
$$CI_1 = \{\tilde\theta_1 \in \Theta_1 : \tilde\theta_1 \text{ does not reject } H_0 \text{ with } 1-\alpha_1 \text{ level of confidence}\}. \tag{11}$$
Note that the original specification testing idea of obtaining a CI for $\theta_1$ can be viewed as a special case of our proposed CI with $\alpha_2 = 0$ and $\alpha_1 = \alpha$. Restricting the set of possible values of $\theta_2$ to the first-step confidence set can potentially increase the size of the test; on the other hand, the size also decreases because the second step must use a smaller level of significance.

We expect this two-step approach to be more useful when the space of true parameter values contains isolated points, lines, or hyperplanes, so that the first-step confidence set can rule out those isolated points, lines, or hyperplanes that are far from a particular trial value $\tilde\theta_1$ of $\theta_1$. When the space of true parameter values is a convex set, the usefulness of the two-step approach will be limited. Before implementing the two-step approach, one can check whether it can reduce the CI obtained from the original specification test. If the first-step confidence set $C_2(1-\alpha_2, \tilde\theta_1)$ at $\alpha_2 = \alpha$ (the highest level of significance) is not the empty set for any value $\tilde\theta_1$ in $CI_1^{GHK}$, the confidence interval obtained from (7), then the two-step approach cannot improve on the CI from the original specification test.

3.2 First-step confidence set

To complete the above procedure, one needs to obtain $C_2(1-\alpha_2, \tilde\theta_1)$ in the first step. Interestingly, we can obtain this CS by applying the asymptotic version of Gourieroux, Holly, and Monfort's (1982) test. To understand this point, continue to assume that the model is given by $C\theta \ge \mu$.
We obtain $C_2(1-\alpha_2, \tilde\theta_1)$ as the collection of $\tilde\theta_2 \in \Theta_2$ that do not reject the hypothesis
$$H_0 : C_2\tilde\theta_2 \ge \mu - C_1\tilde\theta_1, \tag{12}$$
such that $C_2(1-\alpha_2, \tilde\theta_1) = \{\tilde\theta_2 \in \Theta_2 : \tilde\theta_2 \text{ does not reject } H_0 \text{ with } 1-\alpha_2 \text{ level of confidence}\}$. Note that here we fix $\tilde\theta_1$ and vary only $\tilde\theta_2$ to collect the values of $\tilde\theta_2$ that do not reject $H_0$. Also note that although the test of (12) concerns $\theta_2$ only, its implementation is the same as that of the test for the whole parameter vector $\tilde\theta = (\tilde\theta_1', \tilde\theta_2')'$, and it does not even involve a specification test. It is just a one-sided hypothesis test of $H_0 : C_1\tilde\theta_1 + C_2\tilde\theta_2 \ge \mu$, which is equivalent to testing $H_0 : C\tilde\theta \ge \mu$. Such a test does not require any dual characterization or specification testing. By

applying the asymptotic version of Gourieroux, Holly, and Monfort's (1982) test and comparing $C\tilde\theta$ with $\mu$, the confidence region for $\theta_2$ (while fixing $\theta_1 = \tilde\theta_1$) can be readily obtained. We need, however, to emphasize two differences here. The first is that we need to replace $C$ with its consistent estimator $\hat C$ and use an appropriate asymptotic covariance matrix. The second is that, since we fix $\theta_1 = \tilde\theta_1$, the number of hypotheses in the one-sided test can be reduced because some of the inequalities become redundant at $\theta_1 = \tilde\theta_1$.

3.3 Confidence interval for nonlinear models

The above procedure for obtaining the CS or CI for a subset of parameters extends immediately to models characterized by nonlinear moment inequalities. The CI for the first component $\theta_1$ of model (1) is similarly obtained by testing, for trial values $\tilde\theta_1$,
$$H_0 : \exists\, \theta_2 \in C_2(1-\alpha_2, \tilde\theta_1) \text{ such that } E[\varphi(w_i; \tilde\theta_1, \theta_2)] \ge \mu.$$
The CI for $\theta_1$ is obtained by collecting all $\tilde\theta_1$ that are not rejected by the above test:
$$CI_1 = \{\tilde\theta_1 \in \Theta_1 : \tilde\theta_1 \text{ does not reject } H_0 \text{ with } 1-\alpha_1 \text{ level of confidence}\}. \tag{13}$$
Similarly, we obtain $C_2(1-\alpha_2, \tilde\theta_1)$ as the collection of $\tilde\theta_2 \in \Theta_2$ that do not reject the hypothesis
$$H_0 : E[\varphi(w_i; \tilde\theta_1, \tilde\theta_2)] \ge \mu,$$
such that $C_2(1-\alpha_2, \tilde\theta_1) = \{\tilde\theta_2 \in \Theta_2 : \tilde\theta_2 \text{ does not reject } H_0 \text{ with } 1-\alpha_2 \text{ level of confidence}\}$. A confidence region for $\theta_2$ can thus be constructed for models characterized by nonlinear restrictions $E[\varphi(w_i;\theta)] \ge \mu$. It is obtained by comparing the sample analog $n^{-1}\sum_{i=1}^{n} \varphi(w_i; \tilde\theta)$ of $E[\varphi(w_i; \tilde\theta)]$ with $\mu$ while fixing $\theta_1 = \tilde\theta_1$. Again, this can be done by applying the asymptotic version of Gourieroux, Holly, and Monfort's (1982) test. When we implement the second step of this test, we may exploit the duality between the specification test and the multi-dimensional one-sided test in testing the hypothesis $H_0$ for both linear and nonlinear models.
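The two-step construction described above can be sketched as a grid search. This is only an illustration of the control flow, not the paper's implementation: the function `two_step_ci`, both callables, the grids, and the toy model are hypothetical, and the toy "tests" are deterministic stand-ins rather than statistical tests at levels $\alpha_2$ and $\alpha_1$.

```python
def two_step_ci(theta1_grid, theta2_grid, first_step_accepts, second_step_rejects):
    """Collect the trial values of theta1 that survive the modified test.

    first_step_accepts(t1, t2) -> bool: True if t2 is NOT rejected in the
        first step with theta1 fixed at t1 (hypothetical callable).
    second_step_rejects(t1, c2) -> bool: True if the second-step test
        rejects "some t2 in c2 satisfies the model at t1" (hypothetical).
    """
    kept = []
    for t1 in theta1_grid:
        # Step 1: first-step confidence set for theta2 at this trial value.
        c2 = [t2 for t2 in theta2_grid if first_step_accepts(t1, t2)]
        if not c2:
            # An empty first-step set counts as a rejection of t1.
            continue
        # Step 2: specification test restricted to the first-step set.
        if not second_step_rejects(t1, c2):
            kept.append(t1)
    return kept


# Toy stand-ins (assumptions for illustration only): the "model" is
# 1 <= theta1 + theta2 <= 2; the first step allows a slack of 1 on each
# side, and the second step checks the inequalities exactly.
def accepts(t1, t2):
    return 0 <= t1 + t2 <= 3


def rejects(t1, c2):
    return not any(1 <= t1 + t2 <= 2 for t2 in c2)


ci = two_step_ci([0, 1, 2, 3, 5], [0, 1], accepts, rejects)
```

In the toy run, the trial value 5 is dropped because its first-step set is empty, and 3 is dropped in the second step; in an application each callable would wrap an actual test with its own critical value.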
An algorithm to obtain the dual characterization has been developed in GHK for linear models. For nonlinear models, a practical algorithm may not be feasible. However, we note that one can construct our proposed CI or CS without using the duality; our procedure remains valid without it. Therefore, one can obtain the CI or CS for the subset

of parameters by implementing our two-step procedure with other inference methods proposed in the literature for each step. The idea is that once we fix the value of one subset of parameters, we can apply a set estimation or inference method to the other subset of parameters, treating it as the whole parameter vector of the model. The key contributions of this paper include showing the asymptotic validity of the proposed two-step method and indicating that, thanks to GHK, the duality is readily implementable for the two-step procedure in the linear case. We have acknowledged the difficulty of implementing the specification test using the duality for nonlinear models, but this does not mean that other methods suffer from the same difficulty. Therefore, showing the asymptotic validity of our proposed method for nonlinear models contributes to the literature, although the computational issue remains. We now study the asymptotic coverage property of our proposed CI or CS. Again we focus on the CI; the result extends naturally to the CS.

4 Large sample theory

We obtain two main theoretical results in this section. Although our focus is on linear models, we present the asymptotic justification of our two-step approach in a general model setup so that one can adapt our strategy to other methods of constructing a CI or CS available in the literature. First, we show that our proposed CI has correct asymptotic coverage probability; second, we show that our proposed CI is optimal in the following sense. Define the $\epsilon_n$-expansion of $\Theta_{10}$ as
$$\Theta_{10}^{\epsilon_n} = \{\theta_1 \in \Theta_1 : \sqrt{n}\, d(\theta_1, \Theta_{10}) \le \epsilon\},$$
i.e., the $\sqrt{n}$-(outer) neighborhood of the identified set, where the metric $d(\cdot,\cdot)$ denotes the Hausdorff metric as the distance measure between two sets. We let the sequence $\epsilon_n = \epsilon/\sqrt{n} = O(1/\sqrt{n})$. The Hausdorff metric is defined for two sets $A$ and $B$ as
$$d(A, B) = \max\{\rho(A|B), \rho(B|A)\}, \quad \text{where } \rho(A|B) = \sup_{a \in A} \inf_{b \in B} \|a - b\| \text{ and } \rho(B|A) = \sup_{b \in B} \inf_{a \in A} \|a - b\|.$$
Then we show that for any parameter value of $\theta_1$ in the $\epsilon_n$-expansion of $\Theta_{10}$, the asymptotic coverage probability of our proposed CI is the same as the coverage probability when $\Theta_{20}$ is known. In other words, for any parameter value of $\theta_1$ in the $\epsilon_n$-expansion of $\Theta_{10}$, the proposed CI is asymptotically locally equivalent to the infeasible CI
$$CI_1^{*} = \{\theta_1 \in \Theta_1 : \theta_1 \text{ does not reject } H_0^{*} \text{ with } 1-\alpha_1 \text{ coverage probability}\} \tag{14}$$
and
$$H_0^{*} : \exists\, \theta_2 \in \Theta_{20}(\theta_1) \text{ such that } E[\varphi(w_i; \theta_1, \theta_2)] \ge \mu$$

where $\Theta_{20}(\theta_{10})$ denotes the set of identified parameter values of $\theta_2$ at $\theta_1 = \theta_{10}$, $\Theta_{20}(\theta_{10}) = \{\theta_2 \in \Theta_2 : (\theta_{10}, \theta_2) \in \Theta_0\}$. To be precise, we will establish that the asymptotic coverage levels of these two CIs are the same:
$$\lim_{n\to\infty} \Pr\{\theta_1 \in CI_1\} = \lim_{n\to\infty} \Pr\{\theta_1 \in CI_1^{*}\} \quad \text{for any } \theta_1 \in \Theta_{10}^{\epsilon_n}.$$
We start with a set of conditions under which one can construct our proposed CI so that it covers the true parameter uniformly over the identified set. We show that our two-step procedure produces a CI for the subset of parameters with correct coverage probability under high-level conditions. These conditions can also be checked for alternative inference methods, so one can combine our approach with other existing inference methods in the literature. In the following section, we further discuss regularity conditions for linear models under which the proposed CI obtained using the duality has the uniform coverage probability.

We use the following alternative characterization of the CS and CI. This setup is general enough to provide a connection to other inference methods in the literature. Note that we can write our CI equivalently as
$$CI_1 = \Big\{\tilde\theta_1 \in \Theta_1 : \inf_{\theta_2 \in C_2(1-\alpha_2, \tilde\theta_1)} v_n Q_n(\tilde\theta_1, \theta_2) \le q_n(\tilde\theta_1, 1-\alpha_1)\Big\} \tag{15}$$
where $Q_n(\tilde\theta_1, \theta_2)$ denotes a test statistic or a criterion function (e.g., Chernozhukov, Hong, and Tamer, 2007) with an appropriate rate function $v_n$ such that $\inf_{\theta_2 \in C_2(1-\alpha_2, \tilde\theta_1)} v_n Q_n(\tilde\theta_1, \theta_2) = O_p(1)$, and it satisfies Condition 1 below. In (15), $q_n(\tilde\theta_1, 1-\alpha_1)$ denotes the $1-\alpha_1$ quantile of the limit distribution of $\inf_{\theta_2 \in C_2(1-\alpha_2, \tilde\theta_1)} v_n Q_n(\tilde\theta_1, \theta_2)$ or an appropriate critical value (it may not depend on $n$). Examples of $v_n Q_n(\tilde\theta_1, \theta_2)$ and the corresponding $q_n(\tilde\theta_1, 1-\alpha_1)$ will be presented later, including the linear case using the duality.
The first-step confidence set $C_2(1-\alpha_2, \theta_1)$ can be characterized similarly as
$$C_2(1-\alpha_2, \theta_1) = \{\theta_2 \in \Theta_2 : v_{2n} Q_{2n}(\theta_1, \theta_2) \le q_{2n}(\theta_1, 1-\alpha_2)\}$$
for any fixed $\theta_1$, where $Q_{2n}(\theta_1, \theta_2)$ denotes a test statistic or a criterion function with an appropriate rate function $v_{2n}$ such that $v_{2n} Q_{2n}(\theta_1, \theta_2) = O_p(1)$ for any fixed $\theta_1$.

Condition 1 (i) A test statistic or a criterion function $v_n Q_n(\theta_1, \cdot)$ and the corresponding critical

value $q_n(\theta_1, 1-\alpha_1)$ satisfy
$$\lim_{n\to\infty} \inf_{\theta_1 \in \Theta_{10}} \Pr\Big\{\inf_{\theta_2 \in \Theta_{20}(\theta_1)} v_n Q_n(\theta_1, \theta_2) \le q_n(\theta_1, 1-\alpha_1)\Big\} \ge 1-\alpha_1.$$
(ii) The first-stage confidence set for $\theta_{20}$, denoted by $C_2(1-\alpha_2, \theta_1)$, satisfies
(a) $\lim_{n\to\infty} \inf_{\theta_1 \in \Theta_{10}} \Pr\{\Theta_{20}(\theta_1) \subseteq C_2(1-\alpha_2, \theta_1)\} \ge 1-\alpha_2$;
(b) $C_2(1-\alpha_2, \theta_1)$ is a consistent estimator of $\Theta_{20}(\theta_1)$, i.e. $d(C_2(1-\alpha_2, \theta_1), \Theta_{20}(\theta_1)) \xrightarrow{p} 0$ for any $\theta_1 \in \Theta_{10}$.

One may want to weaken the above requirements. For example, we may replace the requirements in Condition 1 (i) and (ii) (a) with
$$\inf_{\theta_1 \in \Theta_{10}} \lim_{n\to\infty} \Pr\Big\{\inf_{\theta_2 \in \Theta_{20}(\theta_1)} v_n Q_n(\theta_1, \theta_2) \le q_n(\theta_1, 1-\alpha_1)\Big\} \ge 1-\alpha_1$$
and
$$\inf_{\theta_1 \in \Theta_{10}} \lim_{n\to\infty} \Pr\{\Theta_{20}(\theta_1) \subseteq C_2(1-\alpha_2, \theta_1)\} \ge 1-\alpha_2.$$
Then we cannot obtain the uniform asymptotic coverage result over $\Theta_{10}$; instead, we obtain pointwise asymptotic coverage. However, the literature has stressed that a pointwise asymptotic coverage result can be quite misleading, so imposing uniformity is important for partially identified models.

The point that the confidence set is also a consistent estimator of the true parameter set is discussed by Chernozhukov, Hong, and Tamer (2007). When their notion of degeneracy is satisfied, the consistency in Condition 1 (ii) (b) holds with a fixed critical value. When degeneracy is not satisfied, we need to add a slackness variable $c_n$ to the critical value, where $c_n$ grows with the sample size but $c_n/v_n \to 0$, so that the confidence set covers the identified set w.p.a.1. We typically let $c_n = O(\ln(n))$ or $c_n = O(\ln\ln(n))$.

Condition 2 (i) $\Theta$ is compact; (ii) $Q_n(\theta_1, \theta_2)$ is continuous in $\theta$; (iii) for any $\theta_1 \in \Theta_{10}^{\epsilon_n}$, $C_2(1-\alpha, \theta_1)$ is not empty w.p.a.1; (iv) there exist $\epsilon_n, \epsilon_n'$ such that for any $\theta_1 \in \Theta_{10}^{\epsilon_n}$, $C_2(1-\alpha, \theta_1) \subseteq \Theta_{20}^{\epsilon_n'}(\theta_1)$ w.p.a.1.

Condition 2 (i) and (ii) are standard in the literature. Condition 2 (iii) can be ensured by carefully designing the confidence set.

Remark 1 For $\theta_1 \in \Theta_{10}$, $C_2(1-\alpha, \theta_1)$ is not empty w.p.a.1 as long as $C_2(1-\alpha, \cdot)$ is a valid confidence set. For $\theta_1 \in \Theta_{10}^{\epsilon_n} \setminus \Theta_{10}$, $C_2(1-\alpha, \theta_1)$ is also not empty w.p.a.1 if the limit distribution of $v_{2n} Q_{2n}(\theta_1, \theta_2)$ is continuous in $\theta_1$, in the sense that $v_{2n} Q_{2n}(\theta_1, \theta_2) \xrightarrow{d} Q_2(\theta_1, \theta_2)$

and $Q_2(\theta_1, \theta_2)$ is continuous in $\theta_1$. This condition holds, for example, if $v_{2n} Q_{2n}(\theta_1, \theta_2)$ is a quadratic function of a process with a P-Donsker property and $Q_2(\theta_1, \theta_2)$ is a quadratic function of a Gaussian process with a.s. continuous paths w.r.t. $\theta$.

Condition 2 (iv) means that the confidence set for $\theta_2$ at any given value $\theta_1 \in \Theta_{10}^{\epsilon_n}$ belongs to the $\sqrt{n}$-neighborhood (expansion) of the true parameter set of $\theta_2$ indexed by $\theta_1 \in \Theta_{10}^{\epsilon_n}$. We now state our main theorem, which shows the asymptotic coverage property of our two-step CI and its asymptotic local equivalence to the infeasible CI.

Theorem 1 Suppose the confidence set obtained in (13) satisfies Condition 1. Then (i) $\lim_{n\to\infty} \Pr\{\theta_1 \in CI_1\} \ge 1-\alpha_1-\alpha_2$ uniformly over $\Theta_0$. Further suppose $C_2(1-\alpha_2, \cdot)$ satisfies Condition 2. Then, for the infeasible CS defined in (14), we have (ii) $\lim_{n\to\infty} \Pr\{\theta_1 \in CI_1^{*}\} = \lim_{n\to\infty} \Pr\{\theta_1 \in CI_1\}$ for any $\theta_1 \in \Theta_{10}^{\epsilon_n}$.

Theorem 1 (i) states that the uniform coverage probability of the proposed CI for the subset parameter is at least $1-\alpha_1-\alpha_2$ for any given asymptotic coverage probability of the first-stage CS. By construction, when $\alpha_2 = 0$, the first-stage confidence set becomes the whole parameter space for $\theta_2$, and so the second-stage CI becomes the CI based on the original specification test of GHK. Theorem 1 (ii) shows that the proposed CS is asymptotically locally equivalent to the infeasible CS.

5 Confidence set using dual characterization

In this section, we study the asymptotic properties of the CS and CI obtained using the dual characterization for linear moment inequalities. This case is of primary interest for empirical researchers, and an algorithm to obtain the dual characterization is already available. Under a minimal set of assumptions, we discuss how the proposed CI in (11) can satisfy the high-level Conditions 1 and 2. Moreover, the requirements for desirable asymptotic properties are less stringent for linear models.
Let $\hat B_2(\theta_1) = B_2(\hat C_2, \theta_1, C_2(1-\alpha_2, \theta_1))$ represent the dual characterization of the null hypothesis (10), which restricts the parameter set of $\theta_2$ to $C_2(1-\alpha_2, \theta_1)$, where $\hat C_2$ is a consistent estimator of $C_2$. Similarly, we let $B_2(\theta_1) = B_2(C_2, \theta_1, \Theta_{20}(\theta_1))$ represent the dual characterization of the null hypothesis (10) when the parameter set of $\theta_2$ is restricted to $\Theta_{20}(\theta_1)$. To obtain the CI for $\theta_1$, we focus on an approach based on Rosen (2008) and GHK among a few alternative methods available in the literature (see e.g. Andrews and Guggenberger 2009 and Andrews and Soares

2010). First we construct a Wald-type statistic $W_n(\cdot)$, indexed by $\theta_1$, for testing the null hypothesis (10):
$$W_n(\theta_1) = \inf_t \Big[ n \big(\hat B_2(\theta_1)\hat\mu(\theta_1) - t\big)' \big(\hat B_2(\theta_1) \hat V(\theta_1) \hat B_2(\theta_1)'\big)^{-1} \big(\hat B_2(\theta_1)\hat\mu(\theta_1) - t\big) \Big] \quad \text{subject to } t \ge 0,$$
where
$$\hat\mu(\theta_1) = \hat\mu - \hat C_1\theta_1, \quad \mu(\theta_1) = \mu - C_1\theta_1,$$
$$\sqrt{n}\big(\hat\mu(\theta_1) - \mu(\theta_1)\big) \xrightarrow{d} N(0, V(\theta_1)) \text{ uniformly over } \theta_1 \in \Theta_1,$$
$$\hat V(\theta_1) = V(\theta_1) + o_p(1) \text{ uniformly over } \theta_1 \in \Theta_1.$$
Since $B_2(\cdot,\cdot,\cdot)$ is continuous in its first and third arguments, $\hat C_2 \xrightarrow{p} C_2$, and $d(C_2(1-\alpha_2, \theta_1), \Theta_{20}(\theta_1)) \xrightarrow{p} 0$, we also have $\hat B_2(\theta_1) \xrightarrow{p} B_2(\theta_1)$. Following the notation of Section 4, in the CI (15) one can let $Q_n(\theta_1, \theta_2) = W_n(\theta_1)/n$ and $v_n = n$. Note that by construction of the duality, $W_n(\theta_1)$ does not depend on $\theta_2$, i.e., uniformity w.r.t. $\theta_2$ is trivial under the specification testing approach. The uniform weak convergence is also immediate since $\hat\mu(\theta_1)$ is linear in $\theta_1$ and $\mu$ and $C_1$ are bounded (see Newey and McFadden (1994), for example). The asymptotic critical value is obtained as the $(1-\alpha_1)$th quantile of the limit distribution of $W_n(\theta_1)$, given by
$$W(\theta_1) = \inf_t \Big[ \big(B_2(\theta_1)z(\theta_1) - t\big)' \big(B_2(\theta_1)V(\theta_1)B_2(\theta_1)'\big)^{-1} \big(B_2(\theta_1)z(\theta_1) - t\big) \Big] \quad \text{subject to } t \ge 0 \tag{16}$$
where $z(\theta_1) \sim N(0, V(\theta_1))$. Kudo (1963) and Wolak (1987) show that this limit distribution is a mixture of chi-square distributions at each $\theta_1$. It follows that
$$\lim_{n\to\infty} \Pr\{W_n(\theta_1) \le c\} = \sum_{j=0}^{b(\theta_1)} \zeta\big(b(\theta_1), b(\theta_1) - j\big) \Pr\{\chi^2_j \le c\}$$
where $\zeta(b(\theta_1), b(\theta_1) - j)$ is the mixing function that commonly appears in the literature on multivariate one-sided hypothesis tests. Let $t_0(\theta_1)$ be the solution of the minimization problem in (16). Then the mixing function has the form $\zeta(b(\theta_1), b(\theta_1) - j) = \Pr\{t_0(\theta_1) \text{ has exactly } j \text{ components equal to zero}\}$. To obtain a critical value $q_n(\theta_1, 1-\alpha_1)$ for the CI (15) from the above asymptotic distribution, we need to know or approximate the weight function $\zeta(\cdot,\cdot)$. Closed forms of the mixing function are derived in

Wolak (1987) for some cases, but general results have not been obtained. One could also approximate the mixing function using the simulation method in Sen and Silvapulle (2004, 78-80). Wolak (1987, 1991) instead proposes to obtain a critical value from the least favorable asymptotic distribution of the test statistic, in which all the moment inequalities at $\theta_1$ are assumed to be binding.

5.1 A practical critical value

One can also take a practical approach as in Rosen (2008). By Corollary 1 of Rosen (2008), we have
$$\sup_{\theta_1 \in \Theta_{10}} \lim_{n\to\infty} \Pr\{W_n(\theta_1) > c\} \le \tfrac{1}{2} \Pr\{\chi^2_b > c\} + \tfrac{1}{2} \Pr\{\chi^2_{b-1} > c\}$$
for some upper bound on the number of binding constraints, $b \ge \sup_{\theta_1 \in \Theta_{10}} b(\theta_1)$. A critical value can be obtained as the solution to
$$\tfrac{1}{2} \Pr\{\chi^2_b > c_{1-\alpha_1}\} + \tfrac{1}{2} \Pr\{\chi^2_{b-1} > c_{1-\alpha_1}\} = \alpha_1 \tag{17}$$
and this critical value does not depend on $\theta_1$, i.e. $q_n(\theta_1, 1-\alpha_1) = c_{1-\alpha_1}$. It follows that
$$\inf_{\theta_1 \in \Theta_{10}} \lim_{n\to\infty} \Pr\{\theta_1 \in CI_1 \text{ in (15) with } q_n(\theta_1, 1-\alpha_1) = c_{1-\alpha_1}\} = \inf_{\theta_1 \in \Theta_{10}} \lim_{n\to\infty} \Pr\{W_n(\theta_1) \le c_{1-\alpha_1}\}$$
$$= 1 - \sup_{\theta_1 \in \Theta_{10}} \lim_{n\to\infty} \Pr\{W_n(\theta_1) > c_{1-\alpha_1}\} \ge 1-\alpha_1-\alpha_2$$
where the last inequality holds by the construction of $c_{1-\alpha_1}$ in (17) and by a Bonferroni-type argument. Therefore we obtain coverage probability of at least $1-\alpha_1-\alpha_2$ using our approach combined with the critical value obtained following Rosen (2008). The key advantage of this approach is that the critical value is easy to obtain and does not involve any simulation or resampling method. GHK, however, point out that this approach may have slack.

5.2 Critical value based on a simulation method

We can also use a simulation-based approach. In this approach we obtain a critical value $q(\theta_1, 1-\alpha_1)$, indexed by $\theta_1$, by simulating the limit distribution (16), so that $q(\theta_1, 1-\alpha_1)$ is the $(1-\alpha_1)$th quantile of the simulated distribution:
$$q(\theta_1, 1-\alpha_1) = \inf\Big\{x : \frac{1}{S} \sum_{1 \le i \le S} 1\big[W^{(i)}(\theta_1) \le x\big] \ge 1-\alpha_1\Big\}$$

where $W^{(i)}(\theta_1)$ is the $i$-th simulated value of (16). We do not, however, have the true values of $B_2(\theta_1)$ and $V(\theta_1)$. To obtain a feasible critical value, we replace each $W^{(i)}(\theta_1)$ with

$$\hat W_n^{(i)}(\theta_1) = \inf_t \left[ \left(\hat B_2(\theta_1)\hat z^{(i)}(\theta_1) - t\right)' \left(\hat B_2(\theta_1)\hat V(\theta_1)\hat B_2(\theta_1)'\right)^{-1} \left(\hat B_2(\theta_1)\hat z^{(i)}(\theta_1) - t\right) \right] \quad \text{subject to } t \ge 0 \tag{18}$$

where $\hat z^{(i)}(\theta_1)$ is drawn from $N(0, \hat V(\theta_1))$. Denote the simulated critical value by $q_{n,S}(\theta_1, 1-\alpha_1)$. Then the CI of $\theta_1$ is obtained as

$$CI_1 = \{\theta_1 : W_n(\theta_1) \le q_{n,S}(\theta_1, 1-\alpha_1)\}.$$

Now we discuss how to verify Conditions 1 and 2 in order to apply Theorem 1 to this approach. For this purpose let $W_n^0(\theta_1)$ denote an infeasible version of $W_n(\theta_1)$ in which we replace $\hat B_2(\theta_1) = B_2(\hat C_2, \theta_1, C_2(1-\alpha_2,\theta_1))$ with $B_2^0(\theta_1) = B_2(\hat C_2, \theta_1, \Theta_{20}(\theta_1))$. Similarly define $\hat W_n^{0(i)}(\theta_1)$ by replacing $\hat B_2(\theta_1)$ with $B_2^0(\theta_1)$ in (18). First, we want to show

$$\lim_{n\to\infty} \inf_{\theta_1\in\Theta_{10}} \Pr\left\{ W_n^0(\theta_1) \le q_{n,S}(\theta_1, 1-\alpha_1) \right\} \ge 1-\alpha_1.$$

To prove this uniform coverage property over $\Theta_{10}$, we need to consider two cases separately. When $\theta_1 \in \mathrm{int}(\Theta_{10})$, so that none of the constraints is binding, we have $q_{n,S}(\theta_1, 1-\alpha_1) \to_p 0$ as $S, n \to \infty$ and also $W_n^0(\theta_1) = 0$ w.p.a.1. Therefore,

$$\Pr\left\{ W_n^0(\theta_1) \le q_{n,S}(\theta_1, 1-\alpha_1) \right\} \to 1. \tag{19}$$

Second, when $\theta_1 \in \partial\Theta_{10}$ (i.e., $\theta_1$ is on the boundary), we need to show the uniform convergence of the distributions of $W_n^0(\theta_1)$ and $\hat W_n^{0(i)}(\theta_1)$ to the distribution of $W(\theta_1)$. In other words, denote by $F_n(x,\theta_1)$ and $\hat F_n(x,\theta_1)$ the distribution functions of $W_n^0(\theta_1)$ and $\hat W_n^{0(i)}(\theta_1)$, so that $F_n(x,\theta_1) = \Pr\{W_n^0(\theta_1) \le x\}$ and $\hat F_n(x,\theta_1) = \Pr\{\hat W_n^{0(i)}(\theta_1) \le x\}$, and similarly denote by $F(x,\theta_1)$ the distribution function of $W(\theta_1)$. We need to show

$$\lim_{n\to\infty} \sup_{x\ge 0} \left| F_n(x,\theta_1) - F(x,\theta_1) \right| = 0 \tag{20}$$

and

$$\lim_{n\to\infty} \sup_{x\ge 0} \left| \hat F_n(x,\theta_1) - F(x,\theta_1) \right| = 0. \tag{21}$$

Below we show that (20) holds. We adopt a proof strategy similar to that of Romano and Shaikh (2008).
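The two critical values of Sections 5.1 and 5.2 can be sketched numerically as follows. This is only an illustration under added assumptions: the chi-square tail probabilities are coded in closed form for 0, 1 and 2 degrees of freedom only, the mixture weights are supplied by the user, and the simulation treats the special case in which $B_2 V B_2'$ is diagonal and all $b$ constraints bind, so that (16) reduces to a sum of squared negative parts of independent standard normals. The function names are hypothetical.

```python
import math
import random

def chi2_sf(c, df):
    """Pr{chi2_df > c} for c >= 0; closed forms for df = 0, 1, 2 only."""
    if df == 0:
        return 0.0  # chi2_0 is a point mass at zero
    if df == 1:
        return math.erfc(math.sqrt(c / 2.0))
    if df == 2:
        return math.exp(-c / 2.0)
    raise ValueError("this sketch only covers df in {0, 1, 2}")

def mixture_critical_value(weights, alpha, hi=100.0, tol=1e-12):
    """Solve sum_df weights[df] * Pr{chi2_df > c} = alpha for c by bisection.

    weights maps degrees of freedom to mixture weights, e.g. {1: 0.5}.
    """
    def excess(c):
        return sum(w * chi2_sf(c, df) for df, w in weights.items()) - alpha

    lo = 0.0
    if excess(lo) <= 0.0:
        return 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid  # tail probability still too large: move right
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simulated_critical_value(b, alpha, draws=100_000, seed=0):
    """Empirical (1 - alpha) quantile of W = sum_k min(e_k, 0)^2, e_k iid N(0, 1).

    Assumption: diagonal covariance and b binding constraints.
    """
    rng = random.Random(seed)
    sims = sorted(
        sum(min(rng.gauss(0.0, 1.0), 0.0) ** 2 for _ in range(b))
        for _ in range(draws)
    )
    # smallest x whose empirical distribution function is at least 1 - alpha
    index = min(draws - 1, math.ceil((1.0 - alpha) * draws) - 1)
    return sims[index]
```

With weights `{1: 0.5}` and `alpha = 0.05` the bisection gives roughly the value 2.706 used in Section 6; with weights `{2: 0.25, 1: 0.5}` it gives roughly 4.2306. The simulated quantile for `b = 2` should be close to the latter, up to Monte Carlo error.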
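To show (20), let $l_n(\theta_1) = \sqrt{n}\left(\hat\mu(\theta_1) - \mu(\theta_1)\right)$ and let $t_b(\theta_1)$ denote the components of $t$ corresponding to binding constraints. Then, following an argument similar to the one in Appendix A, we can rewrite $W_n^0(\theta_1)$ as

$$W_n^0(\theta_1) = \inf_t \left[ \left(B(\theta_1)l_n(\theta_1) - t\right)' \left(B(\theta_1)V(\theta_1)B(\theta_1)'\right)^{-1} \left(B(\theta_1)l_n(\theta_1) - t\right) \ \text{s.t. } t_b(\theta_1) \ge 0 \right] + o_p(1)$$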

since $B_2^0(\theta_1) \to_p B_2(\theta_1)$, $\hat V(\theta_1) \to_p V(\theta_1)$, and $\sqrt{n}\left(\hat\mu(\theta_1) - \mu(\theta_1)\right) \Rightarrow N(0, V(\theta_1))$ uniformly in $\Theta_{10}$. This implies that $W_n^0(\theta_1)$ takes values as a weighted quadratic sum of half-normal variables. Therefore, if we let $L_n(\theta_1)$ be the vector of the binding components of $B(\theta_1)l_n(\theta_1)$, we have $F_n(x,\theta_1) = \Pr\{L_n(\theta_1) + o_p(1) \in C(x)\}$ for the appropriate convex set $C(x) \subset R^{b(\theta_1)}$. By Theorem 2.11 of Bhattacharya and Rao (1976), we have

$$\sup_{C(x)\in\mathcal{C}} \left| \Pr\{L_n(\theta_1) + o_p(1) \in C(x)\} - \Pr\{L(\theta_1) \in C(x)\} \right| \to 0$$

where $\mathcal{C}$ denotes the set of all convex subsets of $R^{b(\theta_1)}$ and $L(\theta_1)$ denotes the limit of $L_n(\theta_1)$. This implies that condition (20) holds, and a similar argument proves (21) (see footnote 3). Therefore, we conclude

$$\lim_{n\to\infty} \inf_{\theta_1\in\Theta_{10}} \Pr\left\{ W_n^0(\theta_1) \le q_{n,S}(\theta_1, 1-\alpha_1) \right\} \ge 1-\alpha_1.$$

Remark 2 One could similarly justify our two-step approach asymptotically when the CS or CI in each step is constructed using subsampling methods as in Romano and Shaikh (2008). This subsampling approach can also achieve uniform coverage probability over a class of potential distributions $P(\theta_0)$ that generate the data (for related issues, see Andrews and Guggenberger 2009 and Andrews and Soares 2010). The simulation-based approach we propose here may not achieve this uniformity over the potential data generating processes $P(\theta_0)$. For example, depending on how the sequences $W_n(\theta_1)$ and $q_{n,S}(\theta_1, 1-\alpha_1)$ approach zero, the convergence result (19) may or may not hold.

5.3 First step confidence set

There are a few alternative methods we can use to construct the first-stage confidence set $C_2(1-\alpha_2, \theta_1)$. We illustrate the idea again using the asymptotic version of Gourieroux, Holly, and Monfort's (1982) test for multivariate one-sided hypotheses. In this approach we obtain the CS by collecting the values of $\theta_2$ that do not reject $H_0: C_2\theta_2 \ge \mu - C_1\theta_1$ for each value of $\theta_1 \in \Theta_1$, where the critical value is indexed by $\theta_1$.
The critical value is obtained as the $(1-\alpha_2)$th quantile of the asymptotic distribution of Gourieroux, Holly, and Monfort (1982), but now indexed by $\theta_1$. It follows that $\lim_{n\to\infty} \Pr\{\Theta_{20}(\theta_1) \subset C_2(1-\alpha_2,\theta_1)\} \ge 1-\alpha_2$ by construction of the CS at each $\theta_1$, and that $\lim_{n\to\infty} \inf_{\theta_1\in\Theta_{10}} \Pr\{\Theta_{20}(\theta_1) \subset C_2(1-\alpha_2,\theta_1)\} \ge 1-\alpha_2$ under the uniform weak convergence of the test statistic over $\theta_1 \in \Theta_{10}$, which is sufficient to establish the second requirement of Condition 1. Condition 2 (i) is assumed and (ii) is obvious. We discuss how to verify Condition 2 (iii) and (iv).

Footnote 3. To show that (21) holds, simply replace $l_n(\theta_1)$ with $\hat z^{(i)}(\theta_1)$ in the proof of (20).

Condition 2 (iii) holds since the test statistic and its limit distribution are written as quadratic functions of Gaussian processes that have a.s. continuous paths w.r.t. $\theta$. Specifically, to construct the CS one can use a Wald statistic w.r.t. $\theta_2$ for given $\theta_1$,

$$W_{2n}(\theta_2,\theta_1) = \inf_t \left[ n \left( (\hat\mu - \hat C_1\theta_1 - \hat C_2\theta_2) - t \right)' \hat V_2^{-1}(\theta_2,\theta_1) \left( (\hat\mu - \hat C_1\theta_1 - \hat C_2\theta_2) - t \right) \right] \quad \text{s.t. } t \le 0$$

where $\hat V_2(\theta_2,\theta_1) \to_p V_2(\theta_2,\theta_1)$ and

$$\sqrt{n}\left( (\hat\mu - \hat C_1\theta_1 - \hat C_2\theta_2) - (\mu - C_1\theta_1 - C_2\theta_2) \right) \Rightarrow N(\theta_2,\theta_1) \sim N(0, V_2(\theta_2,\theta_1)).$$

Obviously the Gaussian process $N(\theta_2,\theta_1)$ is continuous in $(\theta_2,\theta_1)$ and so Condition 2 (iv) is satisfied due to Remark 1. Regarding Condition 2 (iii), note that there exists $\epsilon_n$ such that $C_2(1-\alpha_2,\theta_1) \subset \Theta_{20}^{\epsilon_n}(\theta_1)$ for any $\theta_1$ w.p.a.1 by construction of the $\epsilon_n$-expansion and so Condition 2 (iv) is satisfied.

6 Heuristic Illustration

Here we introduce a simple example to illustrate our idea and discuss why our procedure is potentially less conservative. We let $\theta = (\theta_1,\theta_2) \in R^2$ and suppose we have the following population relationship

$$\mu_{1L} \le \theta_1 + \theta_2 \le \mu_{1U}, \qquad \mu_{2L} \le \theta_1 - \theta_2 \le \mu_{2U} \tag{22}$$

with $\sqrt{n}$-consistent estimators of $(\mu_{1L}, \mu_{1U}, \mu_{2L}, \mu_{2U})$ such that

$$\sqrt{n} \begin{pmatrix} \hat\mu_{1L} - \mu_{1L} \\ \hat\mu_{1U} - \mu_{1U} \\ \hat\mu_{2L} - \mu_{2L} \\ \hat\mu_{2U} - \mu_{2U} \end{pmatrix} \Rightarrow N(0, V) = N\left( 0, \begin{pmatrix} \sigma^2_{1L} & \sigma_{1L1U} & \sigma_{1L2L} & \sigma_{1L2U} \\ \sigma_{1U1L} & \sigma^2_{1U} & \sigma_{1U2L} & \sigma_{1U2U} \\ \sigma_{2L1L} & \sigma_{2L1U} & \sigma^2_{2L} & \sigma_{2L2U} \\ \sigma_{2U1L} & \sigma_{2U1U} & \sigma_{2U2L} & \sigma^2_{2U} \end{pmatrix} \right)$$

where consistent estimators of the variances and covariances are available, $\hat V \to_p V$. There are several alternatives for obtaining the confidence set for $\theta$ and the confidence interval for $\theta_1$, which cover the true values $\theta_0$ and $\theta_{10}$, respectively, with certain coverage probabilities. Here we compare three alternative confidence intervals for $\theta_1$. The first one is from Rosen (2008) (see footnote 4) ($CI^1$), the second is from GHK ($CI^2$), and the third one is from the two-step approach we propose ($CI^3$).

Footnote 4. To be precise, he does not consider CI specifically. Here we obtain two alternative CIs based on his CS. One is based on the projection and the second is based on our two-step procedure.

The above (22) implies the following moment inequalities

$$m(\theta_1,\theta_2,\mu_{1U},\mu_{1L},\mu_{2U},\mu_{2L}) \equiv \begin{pmatrix} -\theta_1 - \theta_2 + \mu_{1U} \\ \theta_1 + \theta_2 - \mu_{1L} \\ -\theta_1 + \theta_2 + \mu_{2U} \\ \theta_1 - \theta_2 - \mu_{2L} \end{pmatrix} \ge 0. \tag{23}$$

Therefore we obtain the Wald-type criterion function

$$Q_n(\theta_1,\theta_2) = \inf_{t=(t_1,t_2,t_3,t_4)\ge 0} \left( m(\theta_1,\theta_2,\hat\mu_{1U},\hat\mu_{1L},\hat\mu_{2U},\hat\mu_{2L}) - t \right)' \hat V^{-1} \left( m(\theta_1,\theta_2,\hat\mu_{1U},\hat\mu_{1L},\hat\mu_{2U},\hat\mu_{2L}) - t \right).$$

We first consider Rosen (2008)'s confidence set for $\theta$,

$$\hat\Theta^1 = \{\theta : n Q_n(\theta_1,\theta_2) \le c^{b^*}_{1-\alpha}\}$$

where $b^*$ is the maximum possible number of binding constraints, which is equal to 2 assuming $\mu_{1U} > \mu_{1L}$ and $\mu_{2U} > \mu_{2L}$. The critical value $c^{b^*}_{1-\alpha}$ is obtained as the solution to

$$\frac12 \Pr\{\chi^2_2 > c\} + \frac12 \Pr\{\chi^2_1 > c\} = \alpha.$$

Once one obtains the confidence set $\hat\Theta^1$, the projected confidence interval for $\theta_1$ is given by

$$CI^1_{1-\alpha} = \{\theta_1 : (\theta_1,\theta_2) \in \hat\Theta^1 \text{ for some } \theta_2\}.$$

The alternative CI using our two-step idea can be combined with this conservative approach. In the first step, we obtain $C_2(1-\alpha_2,\theta_1)$ as

$$C_2(1-\alpha_2,\theta_1) = \{\theta_2 : n Q_n(\theta_1,\theta_2) \le c^{b^*(\theta_1)}_{1-\alpha_2}\}$$

where $b^*(\theta_1)$ is the maximum number of binding constraints given $\theta_1$. Note that $b^*(\theta_1) = 1$ except at four points where $\theta_1$ and $\theta_2$ are uniquely determined by two binding constraints. The second-step CI is obtained by adding the two additional constraints $\theta_2 \ge \min C_2(1-\alpha_2,\theta_1)$ and $\theta_2 \le \max C_2(1-\alpha_2,\theta_1)$.

Now we consider the confidence interval from GHK. $CI^2_{1-\alpha}$ is obtained by collecting the values of $\theta_1$ that do not reject

$$H_0: \exists\,\theta_2, \text{ (23) holds}.$$

We let $C = (-1, 1, 1, -1)'$ and $\mu(\theta_1) = (-\mu_{1U} + \theta_1,\ \mu_{1L} - \theta_1,\ -\mu_{2U} + \theta_1,\ \mu_{2L} - \theta_1)'$ so that the above hypothesis becomes

$$H_0: \exists\,\theta_2,\ C\theta_2 \ge \mu(\theta_1).$$
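The criterion $Q_n$ and the Rosen-type confidence set can be sketched as below. This is a minimal illustration, not the paper's implementation: it assumes that $\hat V$ is diagonal, in which case the infimum over $t \ge 0$ is attained componentwise and only violated moments contribute. The function names are hypothetical, and the variances must be supplied in the same order as the moments in (23), that is (1U, 1L, 2U, 2L).

```python
def moments(theta1, theta2, mu):
    """Moment vector m of (23); mu = (mu_1U, mu_1L, mu_2U, mu_2L)."""
    mu_1u, mu_1l, mu_2u, mu_2l = mu
    return (
        -theta1 - theta2 + mu_1u,
        theta1 + theta2 - mu_1l,
        -theta1 + theta2 + mu_2u,
        theta1 - theta2 - mu_2l,
    )

def q_n_diagonal(theta1, theta2, mu_hat, variances):
    """Q_n under the ASSUMPTION of a diagonal V-hat.

    For each component the optimal t_k is max(m_k, 0), so the remaining
    distance is min(m_k, 0)^2 / variance_k.
    """
    m = moments(theta1, theta2, mu_hat)
    return sum(min(m_k, 0.0) ** 2 / v_k for m_k, v_k in zip(m, variances))

def in_confidence_set(theta1, theta2, mu_hat, variances, n, crit):
    """Membership in {theta : n * Q_n(theta) <= crit}."""
    return n * q_n_diagonal(theta1, theta2, mu_hat, variances) <= crit
```

For instance, with the estimates used in the numerical illustration of this section (`mu_hat = (1, -1, 3, 2)`, `variances = (2, 1, 2, 1)`, `n = 100`, `crit = 4.2306`), the point `(1.25, -1.25)` satisfies all four inequalities and has $Q_n = 0$, while `(2.25, -1.0)` violates two of them by enough to fall outside the set.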

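Then, applying the dual characterization algorithm in GHK, we can write the above hypothesis as

$$H_0: B\mu(\theta_1) \ge 0 \tag{24}$$

with the matrix $B$ obtained from the algorithm in GHK. The CI is

$$CI^2_{1-\alpha} = \{\theta_1 : \text{(24) is not rejected at significance level } \alpha\}.$$

Last, our proposed confidence interval is given by the collection of $\theta_1$ that do not reject

$$H_0: \exists\,\theta_2 \in C_2(1-\alpha_2,\theta_1), \text{ (23) holds}. \tag{25}$$

Suppose $C_2(1-\alpha_2,\theta_1) = \{\theta_2 : \underline\theta_{2,1-\alpha_2}(\theta_1) \le \theta_2 \le \bar\theta_{2,1-\alpha_2}(\theta_1)\}$ and write

$$\theta_2 \le \bar\theta_{2,1-\alpha_2}(\theta_1), \qquad \theta_2 \ge \underline\theta_{2,1-\alpha_2}(\theta_1). \tag{26}$$

Then the above null hypothesis is equivalent to the hypothesis

$$H_0: \exists\,\theta_2, \text{ both (23) and (26) hold}.$$

If we let $\tilde C_2 = (-1, 1)'$ and $\mu_2(\theta_1) = (-\bar\theta_{2,1-\alpha_2}(\theta_1),\ \underline\theta_{2,1-\alpha_2}(\theta_1))'$, the above hypothesis is written as

$$H_0: \exists\,\theta_2,\ \begin{pmatrix} C \\ \tilde C_2 \end{pmatrix} \theta_2 \ge \begin{pmatrix} \mu(\theta_1) \\ \mu_2(\theta_1) \end{pmatrix}$$

and the dual characterization becomes

$$H_0: B\!\left( \begin{pmatrix} C \\ \tilde C_2 \end{pmatrix} \right) \begin{pmatrix} \mu(\theta_1) \\ \mu_2(\theta_1) \end{pmatrix} \ge 0. \tag{27}$$

Then we obtain our proposed CI as

$$CI^3_{1-\alpha} = \{\theta_1 : \text{(27) is not rejected at significance level } \alpha_1\}.$$

Note that when we obtain the critical value of the one-sided hypothesis test of (27) based on the asymptotic distribution of Kudo (1963), we treat $\mu_2(\theta_1)$ as nonrandom fixed numbers.

Finally we illustrate that the CI constructed from the specification testing is shorter (less conservative) than the CI obtained using the naive approach. We further assume that $\sigma_{2U1U} = \sigma_{2U1L} = \sigma_{2L1U} = \sigma_{2L1L} = 0$. We also choose the sample size and the estimates of means and variances as $n = 100$, $\hat\mu_{1U} = 1$, $\hat\mu_{1L} = -1$, $\hat\mu_{2U} = 3$, $\hat\mu_{2L} = 2$, $\hat\sigma^2_{1U} = \hat\sigma^2_{2U} = 2$, and $\hat\sigma^2_{1L} = \hat\sigma^2_{2L} = 1$. We further let $\vartheta_1 = \theta_1 + \theta_2$ and $\vartheta_2 = \theta_1 - \theta_2$. First consider the $CI^1$ projected from the confidence set of $\theta = (\theta_1,\theta_2)$ that collects all values of $\theta$ that solve

$$\inf_{t\ge 0}\ n \begin{pmatrix} \hat\mu_{1U} - \vartheta_1 - t_1 \\ \vartheta_1 - \hat\mu_{1L} - t_2 \\ \hat\mu_{2U} - \vartheta_2 - t_3 \\ \vartheta_2 - \hat\mu_{2L} - t_4 \end{pmatrix}' \hat V^{-1} \begin{pmatrix} \hat\mu_{1U} - \vartheta_1 - t_1 \\ \vartheta_1 - \hat\mu_{1L} - t_2 \\ \hat\mu_{2U} - \vartheta_2 - t_3 \\ \vartheta_2 - \hat\mu_{2L} - t_4 \end{pmatrix} \le c_{1-\alpha}$$

where the critical value $c_{1-\alpha} = 4.2306$ with $\alpha = 0.05$ solves (see footnote 5)

$$\frac14 \Pr\{\chi^2_2 > c\} + \frac12 \Pr\{\chi^2_1 > c\} = 0.05.$$

It becomes

$$\inf_{t\ge 0}\ 0.5(1 - \vartheta_1 - t_1)^2 + (\vartheta_1 + 1 - t_2)^2 + 0.5(3 - \vartheta_2 - t_3)^2 + (\vartheta_2 - 2 - t_4)^2 \le 0.042306. \tag{28}$$

When $\vartheta_1 \le -1$, $t_1 = 1 - \vartheta_1$ and $t_2 = 0$. When $-1 < \vartheta_1 < 1$, $t_1 = 1 - \vartheta_1$ and $t_2 = \vartheta_1 + 1$. When $\vartheta_1 \ge 1$, $t_1 = 0$ and $t_2 = \vartheta_1 + 1$. Similarly we obtain the following. When $\vartheta_2 \le 2$, $t_3 = 3 - \vartheta_2$ and $t_4 = 0$. When $2 < \vartheta_2 < 3$, $t_3 = 3 - \vartheta_2$ and $t_4 = \vartheta_2 - 2$. When $\vartheta_2 \ge 3$, $t_3 = 0$ and $t_4 = \vartheta_2 - 2$. Combining these results, we obtain the confidence set (a two-dimensional ellipsoid) for $\theta$ and then obtain the CIs by projecting the ellipsoid on each axis. We find $CI^1_{0.95}(\theta_1) = \{\theta_1 : 0.3548 < \theta_1 < 2.2055\}$ and $CI^1_{0.95}(\theta_2) = \{\theta_2 : -2.1782 < \theta_2 < -0.3219\}$.

Now we construct CIs using the specification testing. Using the dual characterization algorithm in GHK (see footnote 6), we obtain

$$B = \left( \begin{bmatrix} C & -I_4 \end{bmatrix}^{/1} \right)_{-1} = \begin{pmatrix} -1 & -1 & 0 & 0 \\ -1 & 0 & -1 & 0 \\ 0 & -1 & 0 & -1 \\ 0 & 0 & -1 & -1 \end{pmatrix} \quad \text{and so} \quad B\mu(\theta_1) = \begin{pmatrix} \mu_{1U} - \mu_{1L} \\ \mu_{1U} + \mu_{2U} - 2\theta_1 \\ -\mu_{1L} - \mu_{2L} + 2\theta_1 \\ \mu_{2U} - \mu_{2L} \end{pmatrix} \ge 0.$$

Footnote 5. This becomes $\frac14\left(1 - \frac{\gamma(1,c/2)}{\Gamma(1)}\right) + \frac12\left(1 - \frac{\gamma(1/2,c/2)}{\Gamma(1/2)}\right) = 0.05$, where $\Gamma(\cdot)$ is the Gamma function and $\gamma(\cdot,\cdot)$ is the lower incomplete Gamma function. Note $\Gamma(1) = 1$, $\gamma(1,c/2) = 1 - e^{-c/2}$, and $\frac{\gamma(1/2,c/2)}{\Gamma(1/2)} = \frac{\sqrt{\pi}\,\mathrm{erf}(\sqrt{c/2})}{\Gamma(1/2,0)} = \frac{\mathrm{erf}(\sqrt{c/2})}{1 - \mathrm{erf}(0)}$.

Footnote 6. For a matrix $M$, we define the manipulations $M^{/k}$ and $M_{-k}$. Let $M \in R^{m\times d}$ and $k \le d$. Let $M^{/k} \in R^{m'\times d}$ (typically $m' > m$) be the matrix whose rows are given (in some arbitrary sequence) by (i) the rows $c_i$ of $M$, for all $i$ with $c_{ik} = 0$, and (ii) the rows $c_{ik}c_j - c_{jk}c_i$ for all $i, j$ with $c_{ik} > 0$ and $c_{jk} < 0$, where $c_{ik}$ denotes the element of $M$ in row $i$ and column $k$. Next let $M_{-k}$ be the matrix that results from $M$ by eliminating its $k$-th column.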
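The two matrix manipulations of footnote 6 amount to one step of Fourier-Motzkin elimination and can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, columns are indexed from 0 rather than 1, and the inequalities are assumed to be stacked as $[C\ \ {-I_4}]\,(\theta_2, \mu')' \ge 0$.

```python
def eliminate(matrix, k):
    """M^{/k}: rows with a zero k-th entry, plus c_ik*c_j - c_jk*c_i for
    every pair of rows with c_ik > 0 and c_jk < 0 (k is a 0-based column)."""
    keep = [row[:] for row in matrix if row[k] == 0]
    positive = [row for row in matrix if row[k] > 0]
    negative = [row for row in matrix if row[k] < 0]
    combos = [
        [ci[k] * b - cj[k] * a for a, b in zip(ci, cj)]
        for ci in positive
        for cj in negative
    ]
    return keep + combos

def drop_column(matrix, k):
    """M_{-k}: remove the k-th column (0-based)."""
    return [row[:k] + row[k + 1:] for row in matrix]

def mu_of_theta1(theta1, mu):
    """mu(theta_1) of the example; mu = (mu_1U, mu_1L, mu_2U, mu_2L)."""
    mu_1u, mu_1l, mu_2u, mu_2l = mu
    return [-mu_1u + theta1, mu_1l - theta1, -mu_2u + theta1, mu_2l - theta1]

def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

# Stack [C, -I_4] for C = (-1, 1, 1, -1)' and eliminate theta_2 (column 0).
C = [-1, 1, 1, -1]
stacked = [[C[i]] + [-1 if j == i else 0 for j in range(4)] for i in range(4)]
B = drop_column(eliminate(stacked, 0), 0)
```

The resulting `B` has the four rows of the matrix displayed in the text, possibly in a different order, since the footnote leaves the row sequence arbitrary. With the estimates of this section, `matvec(B, mu_of_theta1(1.0, (1, -1, 3, 2)))` has only nonnegative entries, whereas at `theta1 = 2.5` one entry is negative.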

The dual hypothesis $B\mu(\theta_1) \ge 0$ reduces to

$$\frac{\mu_{1U} + \mu_{2U}}{2} - \theta_1 \ge 0 \quad \text{and} \quad \theta_1 - \frac{\mu_{1L} + \mu_{2L}}{2} \ge 0.$$

The CI of $\theta_1$ is obtained by inverting

$$\inf_{t_1,t_2\ge 0} \left\{ n \begin{pmatrix} (\hat\mu_{1U} + \hat\mu_{2U})/2 - \theta_1 - t_1 \\ \theta_1 - (\hat\mu_{1L} + \hat\mu_{2L})/2 - t_2 \end{pmatrix}' \hat W^{-1} \begin{pmatrix} (\hat\mu_{1U} + \hat\mu_{2U})/2 - \theta_1 - t_1 \\ \theta_1 - (\hat\mu_{1L} + \hat\mu_{2L})/2 - t_2 \end{pmatrix} \right\} \le c_{1-\alpha}$$

where $\hat W = \mathrm{diag}\left(\frac14\hat\sigma^2_{1U} + \frac14\hat\sigma^2_{2U},\ \frac14\hat\sigma^2_{1L} + \frac14\hat\sigma^2_{2L}\right)$ and the critical value $c_{1-\alpha} = 2.706$ solves $\frac12 \Pr\{\chi^2_1 > c\} = \alpha = 0.05$. So we have

$$\inf_{t_1,t_2\ge 0} \left\{ (2 - \theta_1 - t_1)^2 + 2(\theta_1 - 0.5 - t_2)^2 \right\} \le 0.02706. \tag{29}$$

As solutions to (29), we obtain the CI of $\theta_1$, with at least 95 percent coverage probability, equal to

$$CI^2_{0.95}(\theta_1) = \{\theta_1 : 0.3837 \le \theta_1 \le 2.1645\}.$$

Similarly we can obtain the CI for $\theta_2$ by applying the specification testing to $\theta_1$ (exchanging the roles of $\theta_1$ and $\theta_2$ in the above dual characterization),

$$CI^2_{0.95}(\theta_2) = \{\theta_2 : -2.1425 \le \theta_2 \le -0.3575\}.$$

Comparing $CI^2_{0.95}(\theta_1)$ with $CI^1_{0.95}(\theta_1)$ and $CI^2_{0.95}(\theta_2)$ with $CI^1_{0.95}(\theta_2)$, we note that the specification testing approach produces less conservative CIs for subset parameters.

To see whether the two-step approach can further reduce the CI, we need to find some values of $\theta_1$ in $CI^2_{0.95}(\theta_1)$ that can be ruled out in the first step. This can happen when the first-step confidence set becomes the null set for a particular value of $\theta_1 \in CI^2_{0.95}(\theta_1)$. Consider the Wald-type test statistic at $\theta_1 = \tilde\theta_1$,

$$n Q_n(\tilde\theta_1,\theta_2) = \inf_{t=(t_1,t_2,t_3,t_4)\ge 0} n \left( m(\tilde\theta_1,\theta_2,\hat\mu_{1U},\hat\mu_{1L},\hat\mu_{2U},\hat\mu_{2L}) - t \right)' \hat V^{-1} \left( m(\tilde\theta_1,\theta_2,\hat\mu_{1U},\hat\mu_{1L},\hat\mu_{2U},\hat\mu_{2L}) - t \right).$$

Then the first-step confidence set can be obtained as

$$C_2(1-\alpha_2,\tilde\theta_1) = \{\theta_2 : n Q_n(\tilde\theta_1,\theta_2) \le c^{b^*}_{1-\alpha_2}\}$$

where we obtain the critical value $c^{b^*}_{1-\alpha_2}$ using the conservative approach. First we take $\alpha_2 = 0.05$, and $c^{b^*}_{1-\alpha_2} = 2.706$ is obtained as the solution to $\frac12 \Pr\{\chi^2_1 > c\} = 0.05$, where $b^*$ is reduced to one because at $\tilde\theta_1$ two out of the four inequalities in $m(\cdot)$ become redundant.
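The inversion of (29) can be checked with a few lines of code. The sketch below assumes, as in the example, that $\hat W$ is diagonal, so the statistic is a sum of squared violations and the endpoints of the interval have a closed form; the function and argument names are hypothetical.

```python
import math

def dual_wald(theta1, upper, lower, w_upper, w_lower, n):
    """n * inf over t >= 0 of the quadratic form for the two inequalities
    upper - theta1 >= 0 and theta1 - lower >= 0, with diagonal W-hat."""
    slack_upper = upper - theta1
    slack_lower = theta1 - lower
    return n * (
        min(slack_upper, 0.0) ** 2 / w_upper
        + min(slack_lower, 0.0) ** 2 / w_lower
    )

def ci_endpoints(upper, lower, w_upper, w_lower, n, crit):
    """Endpoints of {theta1 : dual_wald(theta1) <= crit}, assuming lower < upper."""
    return (
        lower - math.sqrt(crit * w_lower / n),
        upper + math.sqrt(crit * w_upper / n),
    )

# Numbers of the example: (mu_1U + mu_2U)/2 = 2, (mu_1L + mu_2L)/2 = 0.5,
# W-hat = diag(1, 0.5), n = 100, critical value 2.706.
low, high = ci_endpoints(2.0, 0.5, 1.0, 0.5, 100, 2.706)
```

Here `low` and `high` come out at about 0.3837 and 2.1645, in line with $CI^2_{0.95}(\theta_1)$ above, and `dual_wald` is zero for any `theta1` between 0.5 and 2.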

We obtain the first-step confidence set, $C_2(1-\alpha_2,\tilde\theta_1)$, as the collection of $\theta_2$ that satisfy

$$\inf_{t\ge 0}\ 0.5(1 - \tilde\vartheta_1 - t_1)^2 + (\tilde\vartheta_1 + 1 - t_2)^2 + 0.5(3 - \tilde\vartheta_2 - t_3)^2 + (\tilde\vartheta_2 - 2 - t_4)^2 \le 0.02706$$

where $\tilde\vartheta_1 = \tilde\theta_1 + \theta_2$ and $\tilde\vartheta_2 = \tilde\theta_1 - \theta_2$. By an exercise similar to the one for (28), we find that when $\tilde\theta_1 < 0.3837$ or when $\tilde\theta_1 > 2.1645$, $C_2(1-\alpha_2,\tilde\theta_1)$ becomes the null set. Note that the confidence interval $CI^2_{0.95}(\theta_1)$ does not contain those values excluded by the first-step confidence set. This suggests that, in this example, the two-step approach cannot improve on the CI based on the specification testing. We expect this result because the set of true parameter values defined by (22) is a convex set that does not contain isolated points, lines, or hyperplanes.

7 Conclusion

We study a potentially less conservative inference method for subsets of parameters in partially identified models characterized by moment inequalities. Inference methods for subsets of parameters have received little attention in the partial identification literature, although empirical researchers often require a less conservative confidence interval or confidence set for a smaller subset of parameters in their applications. Naive projection methods often produce conservative confidence intervals. We construct a confidence set or a confidence interval for a subset of parameters by modifying the specification testing idea of Guggenberger, Hahn, and Kim (2008), who establish the duality between the specification testing of the moment inequalities and the multi-dimensional one-sided tests of Gourieroux, Holly, and Monfort (1982) and Wolak (1987, 1991). The proposed confidence interval or set is constructed in two steps. In the first step, we obtain a confidence set for the remaining parameters under a fixed value of the parameter of interest. In the second step, the CI collects the values of the parameter of interest that are not rejected by the modified specification test.
By restricting the values that the other parameters can take to the first-step confidence set, we can obtain a potentially less conservative confidence interval than alternatives based on projection methods. Interestingly, we find that our proposed CI or CS for the subsets of parameters is optimal in the sense that it is asymptotically locally equivalent to the infeasible CI or CS that would be obtained if the true parameter set of the other parameters were known.

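Appendix A Asymptotic distribution of $W_n$

The following is reproduced from the working paper version of GHK. In order to implement the specification testing of (5), we need to assume that (i) we have a $\sqrt{n}$-consistent and asymptotically normal estimator $\hat\mu$ of $\mu$ such that $\sqrt{n}(\hat\mu - \mu) \Rightarrow N(0, V)$; and (ii) consistent estimators $\hat B$ and $\hat V$ of $B$ and $V$ are available to the econometrician. Our test statistic is a Wald statistic:

$$W_n = \inf_t \left[ n \left(\hat B\hat\mu - t\right)' \left(\hat B\hat V\hat B'\right)^{-1} \left(\hat B\hat\mu - t\right) \right] \quad \text{subject to } t \ge 0.$$

In order to analyze the asymptotic distribution, it is convenient to write $l = \sqrt{n}(\hat\mu - \mu_0) \Rightarrow N(0, V)$. We can then write

$$\begin{aligned}
W_n &= \inf_t \left[ \left(\hat B l - t\right)' \left(\hat B\hat V\hat B'\right)^{-1} \left(\hat B l - t\right) \ \text{subject to } t = x - \sqrt{n}\hat B\mu_0,\ x \ge 0 \right] \\
&= \inf_t \left[ \left(\hat B l - t\right)' \left(\hat B\hat V\hat B'\right)^{-1} \left(\hat B l - t\right) \ \text{subject to } t \ge -\sqrt{n}\hat B\mu_0 \right] \\
&= \inf_t \left[ (Bl - t)' (BVB')^{-1} (Bl - t) \ \text{subject to } t \ge -\sqrt{n}B\mu_0 \right] + o_p(1).
\end{aligned}$$

Now, let

$$C = \{\mu : B\mu \ge 0\}, \qquad C^i = \{\mu : B\mu > 0\}, \qquad C^b = C \setminus C^i.$$

When $B\mu_0 > 0$, the lower bound $-\sqrt{n}B\mu_0$ diverges to $-\infty$, so in the limit the optimization is over $t \in R^p$, where $p$ denotes the number of rows of $B$. It follows that, for all $\mu_0 \in C^i$,

$$\lim_{n\to\infty} \Pr\{W_n = 0 \mid \mu_0\} = 1.$$

Let $B_b$ denote the subset of rows of $B$ for which $B\mu_0 \ge 0$ is satisfied with equality, so that $B_b\mu_0 = 0$. Assume that there are $m$ such rows. Let $B_s$ denote the subset of rows of $B$ for which $B\mu_0 \ge 0$ is satisfied with strict inequality, so that $B_s\mu_0 > 0$. We can then write

$$W_n = \inf_t \left[ (Bl - t)' (BVB')^{-1} (Bl - t) \ \text{subject to } t_s \ge -\sqrt{n}B_s\mu_0,\ t_b \ge 0 \right] + o_p(1).$$

Taking the limit, we obtain

$$W_n = \inf_t L(t_b) + o_p(1)$$

where

$$L(t_b) \equiv \left[ (Bl - t)' (BVB')^{-1} (Bl - t) \right] \quad \text{subject to } t_b \ge 0.$$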
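As a worked special case, suppose in addition that $BVB'$ is diagonal, say $BVB' = \mathrm{diag}(\omega_1,\dots,\omega_p)$; this is an assumption made here for illustration and is not imposed in the appendix. Writing $x = Bl$ and letting $k \in b$ index the binding rows, the problem separates across components:

$$\inf_{t:\ t_b \ge 0}\ \sum_{k=1}^{p} \frac{(x_k - t_k)^2}{\omega_k} = \sum_{k \in b} \frac{\min(x_k, 0)^2}{\omega_k},$$

because an unrestricted $t_k$ can be set equal to $x_k$, while a restricted $t_k \ge 0$ is optimally set to $\max(x_k, 0)$. With a single binding row ($m = 1$) we have $x_1 \sim N(0, \omega_1)$ in the limit, so that for any $c > 0$

$$\Pr\left\{ \frac{\min(x_1,0)^2}{\omega_1} > c \right\} = \Pr\{x_1 < 0\}\,\Pr\left\{ \frac{x_1^2}{\omega_1} > c \,\Big|\, x_1 < 0 \right\} = \frac12 \Pr\{\chi^2_1 > c\}.$$

This is the equation that yields the critical value 2.706 at $\alpha = 0.05$ in Section 6.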