On the Uniform Asymptotic Validity of Subsampling and the Bootstrap
Joseph P. Romano
Departments of Economics and Statistics, Stanford University

Azeem M. Shaikh
Department of Economics, University of Chicago

April 13, 2010

Abstract

This paper provides conditions under which subsampling and the bootstrap can be used to construct estimates of the quantiles of the distribution of a root that behave well uniformly over a large class of distributions $\mathbf{P}$. These results are then applied (i) to construct confidence regions that behave well uniformly over $\mathbf{P}$ in the sense that the coverage probability tends to at least the nominal level uniformly over $\mathbf{P}$ and (ii) to construct tests that behave well uniformly over $\mathbf{P}$ in the sense that the size tends to no greater than the nominal level uniformly over $\mathbf{P}$. Without these stronger notions of convergence, the asymptotic approximations to the coverage probability or size may be poor even in very large samples. Specific applications include the multivariate mean, testing moment inequalities, multiple testing, the empirical process, and U-statistics.

KEYWORDS: Bootstrap, Empirical Process, Moment Inequalities, Multiple Testing, Subsampling, Uniformity, U-Statistic
1 Introduction

Let $X^{(n)} = (X_1, \ldots, X_n)$ be an i.i.d. sequence of random variables with distribution $P \in \mathbf{P}$ and denote by $J_n(x, P)$ the distribution of a real-valued root $R_n = R_n(X^{(n)}, P)$ under $P$. In statistics and econometrics, it is often of interest to estimate certain quantiles of $J_n(x, P)$. Two commonly used methods for this purpose are subsampling and the bootstrap. This paper provides conditions under which these estimates behave well uniformly over $\mathbf{P}$. More precisely, for $\alpha \in (0, 1)$, we provide conditions under which subsampling and the bootstrap may be used to construct estimates $\hat c_n(1-\alpha)$ of the $1-\alpha$ quantiles of $J_n(x, P)$ satisfying

$$\liminf_{n\to\infty} \inf_{P\in\mathbf{P}} P\{R_n \le \hat c_n(1-\alpha)\} \ge 1-\alpha. \quad (1)$$

Similarly, for $\alpha \in (0, 1)$, we provide conditions under which subsampling and the bootstrap may be used to construct estimates $\hat c_n(\alpha)$ of the $\alpha$ quantiles of $J_n(x, P)$ satisfying

$$\liminf_{n\to\infty} \inf_{P\in\mathbf{P}} P\{R_n \ge \hat c_n(\alpha)\} \ge 1-\alpha. \quad (2)$$

We also provide stronger conditions under which

$$\liminf_{n\to\infty} \inf_{P\in\mathbf{P}} P\{\hat c_n(\alpha) \le R_n \le \hat c_n(1-\alpha)\} \ge 1-2\alpha. \quad (3)$$

In many cases, it is possible to replace the $\liminf$ and $\ge$ in (1), (2), and (3) with $\lim$ and $=$, respectively. These results differ from those usually stated in the literature in that they require the convergence to hold uniformly over $\mathbf{P}$ instead of just pointwise over $\mathbf{P}$. The importance of this stronger notion of convergence when applying these results is discussed further below. The additional flexibility allowed for by the $\liminf$ and $\ge$ in (1) and (2) greatly extends the scope of applicability of the results, especially in the case of subsampling. As we will see, $\hat c_n(1-\alpha)$ may satisfy (1) while $\hat c_n(\alpha)$ fails to satisfy (2), or, conversely, $\hat c_n(\alpha)$ may satisfy (2) while $\hat c_n(1-\alpha)$ fails to satisfy (1). This phenomenon arises when it is not possible to estimate $J_n(x, P)$ uniformly well with respect to a suitable metric, but, in a sense to be made precise by our results, it is possible to estimate it sufficiently well to ensure that (1) or (2) still holds.
It is worth noting that metrics compatible with the weak topology are not sufficient for our purposes. In particular, closeness of distributions with respect to such a metric does not ensure closeness of quantiles. See Remark 2.7 for further discussion of this point. In fact, closeness of distributions with respect to even stronger metrics, such as the Kolmogorov metric, does not ensure closeness of quantiles either. For this reason, our results rely heavily on Lemma 4.1 in the Appendix, which relates closeness of distributions with respect to a suitable metric and coverage statements.

The usual arguments for the pointwise asymptotic validity of subsampling and the bootstrap rely on showing for each $P \in \mathbf{P}$ that $\hat c_n(1-\alpha)$ tends in probability under $P$ to the $1-\alpha$ quantile of the limiting distribution of $R_n$ under $P$. Because our results are uniform in $P \in \mathbf{P}$, we must consider the behavior of $R_n$ and $\hat c_n(1-\alpha)$ under arbitrary sequences $\{P_n \in \mathbf{P} : n \ge 1\}$, under which the quantile estimates need not even settle down. Thus, the results are not trivial extensions of the usual pointwise asymptotic arguments.

The construction of $\hat c_n(\alpha)$ and $\hat c_n(1-\alpha)$ satisfying (1) - (3) is useful for constructing confidence regions that behave well uniformly over $\mathbf{P}$. More precisely, our results provide conditions under which subsampling and the bootstrap can be used to construct confidence regions $C_n = C_n(X^{(n)})$ of level $1-\alpha$ for a parameter $\theta(P)$ that are uniformly consistent in level in the sense that

$$\liminf_{n\to\infty} \inf_{P\in\mathbf{P}} P\{\theta(P) \in C_n\} \ge 1-\alpha. \quad (4)$$

Our results are also useful for constructing tests $\phi_n = \phi_n(X^{(n)})$ of level $\alpha$ for a null hypothesis $P \in \mathbf{P}_0 \subseteq \mathbf{P}$ against the alternative $P \in \mathbf{P}_1 = \mathbf{P} \setminus \mathbf{P}_0$ that are uniformly consistent in level in the sense that

$$\limsup_{n\to\infty} \sup_{P\in\mathbf{P}_0} E_P[\phi_n] \le \alpha. \quad (5)$$

In some cases, it is possible to replace the $\liminf$ and $\ge$ in (4) or the $\limsup$ and $\le$ in (5) with $\lim$ and $=$, respectively. Confidence regions satisfying (4) are desirable because they ensure that for every $\epsilon > 0$ there is an $N$ such that for $n > N$ we have that $P\{\theta(P) \in C_n\}$ is no less than $1-\alpha-\epsilon$ for all $P \in \mathbf{P}$. In contrast, confidence regions that are only pointwise consistent in level in the sense that $\liminf_{n\to\infty} P\{\theta(P) \in C_n\} \ge 1-\alpha$ for all $P \in \mathbf{P}$ may have the feature that for every $\epsilon > 0$ and every $n$, however large, there is some $P \in \mathbf{P}$ for which $P\{\theta(P) \in C_n\}$ is less than $1-\alpha-\epsilon$.
Likewise, tests satisfying (5) are desirable because they ensure that for every $\epsilon > 0$ there is an $N$ such that for $n > N$ we have that $E_P[\phi_n]$ is no greater than $\alpha+\epsilon$ for all $P \in \mathbf{P}_0$. In contrast, tests that are only pointwise consistent in level in the sense that $\limsup_{n\to\infty} E_P[\phi_n] \le \alpha$ for all $P \in \mathbf{P}_0$ may have the feature that for every $\epsilon > 0$ and every $n$, however large, there is some $P \in \mathbf{P}_0$ for which $E_P[\phi_n]$ is greater than $\alpha+\epsilon$. For this reason, inferences based on confidence regions or tests that fail to satisfy (4) or (5) may be very misleading in finite samples. Of course, as pointed out by Bahadur and Savage (1956), there may be no nontrivial confidence region or test satisfying (4) or (5) when $\mathbf{P}$ is sufficiently rich. For this reason, we will have to restrict $\mathbf{P}$ appropriately in our examples. In the case of confidence regions for, or tests about, the mean, for instance, we will have to impose a very weak Lindeberg-type condition.

Some of our results on subsampling are closely related to results in Andrews and Guggenberger (2010), which were developed independently and at about the same time as our results. See the discussion on p. 431 of Andrews and Guggenberger (2010). Our results show that the question of whether subsampling can be used to construct estimates $\hat c_n(\alpha)$ and $\hat c_n(1-\alpha)$ satisfying (1) - (3) reduces to a single, succinct requirement on the asymptotic relationship between $J_n(x, P)$ and $J_b(x, P)$, where $b$ is the subsample size, whereas the results of Andrews and Guggenberger (2010) require the verification of a larger number of conditions. On the other hand, the results of Andrews and Guggenberger (2010) further provide a means of calculating the limiting value of $\inf_{P\in\mathbf{P}} P\{R_n \le \hat c_n(1-\alpha)\}$ or $\inf_{P\in\mathbf{P}} P\{R_n \ge \hat c_n(\alpha)\}$ in the case where it may not satisfy (1) or (2). To the best of our knowledge, our results on the bootstrap are the first to be stated at this level of generality. An important antecedent is Romano (1989), who studies the uniform asymptotic behavior of confidence regions for a univariate cumulative distribution function. See also Mikusheva (2007), who analyzes the uniform asymptotic behavior of some tests that arise in the context of an autoregressive model where the autoregressive parameter may be close to one in absolute value.

The remainder of the paper is organized as follows.
In Section 2, we present the conditions under which $\hat c_n(\alpha)$ and $\hat c_n(1-\alpha)$ satisfying (1) - (3) may be constructed using subsampling or the bootstrap. We then provide in Section 3 several applications of our general results. These applications include the multivariate mean, testing moment inequalities, multiple testing, the empirical process, and U-statistics. The discussion of U-statistics is especially noteworthy because it highlights the fact that the assumptions required for the uniform asymptotic validity of subsampling and the bootstrap may differ. In particular, subsampling may be uniformly asymptotically valid under conditions where the bootstrap fails even to be pointwise asymptotically valid. Proofs of all results can be found in the Appendix. Many of the intermediate results in the Appendix may be of independent interest, including uniform weak laws of large numbers for U-statistics and V-statistics (Lemma 4.12 and Lemma 4.13, respectively).
2 General Results

2.1 Subsampling

Theorem 2.1. Let $X^{(n)} = (X_1, \ldots, X_n)$ be an i.i.d. sequence of random variables with distribution $P \in \mathbf{P}$. Denote by $J_n(x, P)$ the distribution of a real-valued root $R_n = R_n(X^{(n)}, P)$ under $P$. Let $b = b_n < n$ be a sequence of positive integers tending to infinity, but satisfying $b/n \to 0$. Let $N_n = \binom{n}{b}$ and

$$L_n(x, P) = \frac{1}{N_n} \sum_{1 \le i \le N_n} I\{R_b(X_{n,(b),i}, P) \le x\}, \quad (6)$$

where $X_{n,(b),i}$ denotes the $i$th subset of data of size $b$. Then, the following statements are true for any $\alpha \in (0, 1)$:

(i) If $\limsup_{n\to\infty} \sup_{P\in\mathbf{P}} \sup_{x\in\mathbf{R}} \{J_b(x, P) - J_n(x, P)\} \le 0$, then

$$\liminf_{n\to\infty} \inf_{P\in\mathbf{P}} P\{R_n \le L_n^{-1}(1-\alpha, P)\} \ge 1-\alpha. \quad (7)$$

(ii) If $\limsup_{n\to\infty} \sup_{P\in\mathbf{P}} \sup_{x\in\mathbf{R}} \{J_n(x, P) - J_b(x, P)\} \le 0$, then

$$\liminf_{n\to\infty} \inf_{P\in\mathbf{P}} P\{R_n \ge L_n^{-1}(\alpha, P)\} \ge 1-\alpha. \quad (8)$$

(iii) If $\limsup_{n\to\infty} \sup_{P\in\mathbf{P}} \sup_{x\in\mathbf{R}} |J_b(x, P) - J_n(x, P)| = 0$, then

$$\liminf_{n\to\infty} \inf_{P\in\mathbf{P}} P\{L_n^{-1}(\alpha, P) \le R_n \le L_n^{-1}(1-\alpha, P)\} \ge 1-2\alpha. \quad (9)$$

Remark 2.1. It is typically easy to deduce from the conclusions of Theorem 2.1 stronger results in which the $\liminf$ and $\ge$ in (7), (8) and (9) are replaced by $\lim$ and $=$, respectively. For example, in order to assert that (7) holds with $\liminf$ and $\ge$ replaced by $\lim$ and $=$, respectively, all that is required is that $\lim_{n\to\infty} P\{R_n \le L_n^{-1}(1-\alpha, P)\} = 1-\alpha$ for some $P \in \mathbf{P}$. This can be verified using the usual arguments for the pointwise asymptotic validity of subsampling. Indeed, it suffices to show for some $P \in \mathbf{P}$ that $J_n(x, P)$ tends in distribution to a limiting distribution $J(x, P)$ that is continuous at its $1-\alpha$ quantile. See Politis et al. (1999) for details. Likewise, in order to assert that (8) holds with $\liminf$ and $\ge$ replaced by $\lim$ and $=$, respectively, it suffices to show for some $P \in \mathbf{P}$ that $J_n(x, P)$ tends in distribution to a limiting distribution $J(x, P)$ that is continuous at its $\alpha$ quantile. Finally, in order to assert that (9) holds with $\liminf$ and $\ge$ replaced by $\lim$ and $=$, respectively, it suffices to show for some $P \in \mathbf{P}$ that $J_n(x, P)$ tends in distribution to a limiting distribution $J(x, P)$ that is continuous at its $\alpha$ and $1-\alpha$ quantiles.

Remark 2.2. Note that $L_n(x, P)$ as defined in (6) is infeasible because it still depends on $P$, which is unknown, through $R_b(X_{n,(b),i}, P)$. Even so, Theorem 2.1 may be used without modification to construct feasible confidence regions for a parameter of interest $\theta(P)$ provided that $R_n(X^{(n)}, P)$, and therefore $L_n(x, P)$, depends on $P$ only through $\theta(P)$. If this is the case, then one may simply invert tests of the null hypotheses $\theta(P) = \theta$ for all $\theta \in \Theta$ to construct a confidence region for $\theta(P)$. More concretely, suppose $R_n(X^{(n)}, P) = R_n(X^{(n)}, \theta(P))$ and $L_n(x, P) = L_n(x, \theta(P))$. Whenever we may apply part (i) of Theorem 2.1, we have that

$$C_n = \{\theta \in \Theta : R_n(X^{(n)}, \theta) \le L_n^{-1}(1-\alpha, \theta)\}$$

satisfies (4). Similar conclusions follow from parts (ii) and (iii) of Theorem 2.1.

Remark 2.3. In some cases, Theorem 2.1 may be used to construct feasible confidence regions for a parameter of interest $\theta(P)$ without having to invert hypothesis tests. To illustrate this, suppose interest focuses on a real-valued parameter $\theta(P)$ and consider the root

$$R_n(X^{(n)}, P) = \tau_n(\hat\theta_n - \theta(P)). \quad (10)$$

Here, $\hat\theta_n = \hat\theta_n(X^{(n)})$ is some estimate of $\theta(P)$ and $\tau_n$ is a sequence tending to infinity with the sample size in hopes of satisfying the conditions of Theorem 2.1. Let $\hat L_n(x)$ be the feasible estimate of $J_n(x, P)$ given by

$$\hat L_n(x) = \frac{1}{N_n} \sum_{1 \le i \le N_n} I\{\tau_b(\hat\theta_b(X_{n,(b),i}) - \hat\theta_n) \le x\}.$$

In this case, it is easy to see that

$$L_n^{-1}(1-\alpha, P) = \hat L_n^{-1}(1-\alpha) + \tau_b(\hat\theta_n - \theta(P)).$$

Therefore,

$$P\{\tau_n(\hat\theta_n - \theta(P)) \le L_n^{-1}(1-\alpha, P)\} = P\{(\tau_n - \tau_b)(\hat\theta_n - \theta(P)) \le \hat L_n^{-1}(1-\alpha)\}.$$

Hence, whenever we may apply part (i) of Theorem 2.1, we have that

$$\liminf_{n\to\infty} \inf_{P\in\mathbf{P}} P\{(\tau_n - \tau_b)(\hat\theta_n - \theta(P)) \le \hat L_n^{-1}(1-\alpha)\} \ge 1-\alpha.$$

Similar conclusions follow from parts (ii) and (iii) of Theorem 2.1.
An analogous modification is applicable even in some cases where the root is not of the form given by (10), for example, when $\theta(P)$ takes values in a metric space with metric $d$ and $R_n(X^{(n)}, P) = \tau_n d(\hat\theta_n, \theta(P))$.
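To make the construction in Remark 2.3 concrete, here is a minimal Python sketch of the feasible estimate $\hat L_n(x)$ and the resulting one-sided lower confidence bound for a scalar mean, taking $\hat\theta_n$ to be the sample mean and $\tau_n = \sqrt n$. Drawing a fixed number of random subsets in place of all $N_n = \binom{n}{b}$ subsets is a standard computational shortcut; all names and Monte Carlo sizes here are illustrative, not part of the paper.

```python
import numpy as np

def feasible_subsampling_quantile(x, b, alpha, n_subsets=2000, seed=0):
    """Quantile of the feasible subsampling distribution from Remark 2.3:
    each subsample statistic is centered at the full-sample estimate,
    i.e. tau_b * (theta_hat_b - theta_hat_n) with tau_n = sqrt(n)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    theta_n = x.mean()
    draws = np.empty(n_subsets)
    for i in range(n_subsets):
        idx = rng.choice(n, size=b, replace=False)  # a subset, not a bootstrap resample
        draws[i] = np.sqrt(b) * (x[idx].mean() - theta_n)
    return np.quantile(draws, 1 - alpha)

# Usage: per the rescaling in Remark 2.3, the event
# (tau_n - tau_b)(theta_hat_n - theta) <= q has probability at least 1 - alpha,
# which gives the lower confidence bound theta >= theta_hat_n - q / (sqrt(n) - sqrt(b)).
rng = np.random.default_rng(1)
x = rng.normal(loc=1.0, scale=2.0, size=500)
b = 50
q = feasible_subsampling_quantile(x, b, alpha=0.05)
lower_bound = x.mean() - q / (np.sqrt(len(x)) - np.sqrt(b))
```
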
Remark 2.4. In some cases, it may be possible simply to replace $L_n(x, P)$ in Theorem 2.1 with a feasible estimate $\hat L_n(x)$ without affecting the conclusions of the theorem. For example, when $L_n(x, P) = L_n(x, \theta(P))$, such an estimate may be given by $L_n(x, \hat\theta_n)$. The validity of such a modification will typically require first an argument to establish that the quantiles of $\hat L_n(x)$ are close to the quantiles of $L_n(x, P)$ and then a subsequencing argument to establish the desired coverage property with $\hat L_n(x)$ in place of $L_n(x, P)$. See Example 3.1, Example 3.5 and Example 3.6 for details.

Remark 2.5. It is worth emphasizing that even though Theorem 2.1 is stated for roots, it is, of course, applicable in the special case where $R_n(X^{(n)}, P) = T_n(X^{(n)})$, i.e., in the special case where $R_n(X^{(n)}, P)$, and therefore $L_n(x, P)$, does not depend on $P$. This is especially useful in the context of hypothesis testing. See Example 3.3 for one such instance.

2.2 Bootstrap

Theorem 2.2. Let $X^{(n)} = (X_1, \ldots, X_n)$ be an i.i.d. sequence of random variables with distribution $P \in \mathbf{P}$. Denote by $J_n(x, P)$ the distribution of a real-valued root $R_n = R_n(X^{(n)}, P)$ under $P$. Let $\rho(\cdot, \cdot)$ be a function on $\mathbf{P} \times \mathbf{P}$ and let $\hat P_n = \hat P_n(X^{(n)})$ be a (random) sequence of distributions such that for all sequences $\{P_n \in \mathbf{P} : n \ge 1\}$

$$\rho(\hat P_n, P_n) \overset{P_n}{\to} 0 \quad\text{and}\quad P_n\{\hat P_n \in \mathbf{P}\} \to 1.$$

Then, the following statements are true for any $\alpha \in (0, 1)$:

(i) If for all sequences $\{Q_n \in \mathbf{P} : n \ge 1\}$ and $\{P_n \in \mathbf{P} : n \ge 1\}$ satisfying $\rho(Q_n, P_n) \to 0$

$$\limsup_{n\to\infty} \sup_{x\in\mathbf{R}} \{J_n(x, Q_n) - J_n(x, P_n)\} \le 0,$$

then

$$\liminf_{n\to\infty} \inf_{P\in\mathbf{P}} P\{R_n \le J_n^{-1}(1-\alpha, \hat P_n)\} \ge 1-\alpha. \quad (11)$$

(ii) If for all sequences $\{Q_n \in \mathbf{P} : n \ge 1\}$ and $\{P_n \in \mathbf{P} : n \ge 1\}$ satisfying $\rho(Q_n, P_n) \to 0$

$$\limsup_{n\to\infty} \sup_{x\in\mathbf{R}} \{J_n(x, P_n) - J_n(x, Q_n)\} \le 0,$$

then

$$\liminf_{n\to\infty} \inf_{P\in\mathbf{P}} P\{R_n \ge J_n^{-1}(\alpha, \hat P_n)\} \ge 1-\alpha. \quad (12)$$

(iii) If for all sequences $\{Q_n \in \mathbf{P} : n \ge 1\}$ and $\{P_n \in \mathbf{P} : n \ge 1\}$ satisfying $\rho(Q_n, P_n) \to 0$

$$\limsup_{n\to\infty} \sup_{x\in\mathbf{R}} |J_n(x, Q_n) - J_n(x, P_n)| = 0,$$

then

$$\liminf_{n\to\infty} \inf_{P\in\mathbf{P}} P\{J_n^{-1}(\alpha, \hat P_n) \le R_n \le J_n^{-1}(1-\alpha, \hat P_n)\} \ge 1-2\alpha. \quad (13)$$

Remark 2.6. It is typically easy to deduce from the conclusions of Theorem 2.2 stronger results in which the $\liminf$ and $\ge$ in (11), (12) and (13) are replaced by $\lim$ and $=$, respectively. For example, in order to assert that (11) holds with $\liminf$ and $\ge$ replaced by $\lim$ and $=$, respectively, all that is required is that

$$\lim_{n\to\infty} P\{R_n \le J_n^{-1}(1-\alpha, \hat P_n)\} = 1-\alpha$$

for some $P \in \mathbf{P}$. This can be verified using the usual arguments for the pointwise asymptotic validity of the bootstrap. Indeed, if the conditions required to apply part (iii) of Theorem 2.2 hold, then it suffices to show for some $P \in \mathbf{P}$ that $J_n(x, P)$ tends in distribution to a limiting distribution $J(x, P)$ that is continuous at its $1-\alpha$ quantile. This follows after noting that along a further subsequence $J_n(x, \hat P_n)$ converges in distribution with probability one under $P$ to $J(x, P)$. Analogous generalizations of assertions (12) and (13) hold as well.

Remark 2.7. In some cases, it is possible to construct estimates $\hat J_n(x)$ of $J_n(x, P)$ that are uniformly consistent over a large class of distributions $\mathbf{P}$ in the sense that for any $\epsilon > 0$

$$\sup_{P\in\mathbf{P}} P\{\rho(\hat J_n(\cdot), J_n(\cdot, P)) > \epsilon\} \to 0, \quad (14)$$

where $\rho$ is the Levy metric or some other metric compatible with the weak topology. Yet a result such as (14) is not strong enough to yield any of (1) - (3). In other words, uniform coverage statements such as those in Theorem 2.1 and Theorem 2.2 do not follow from uniform approximations of the distribution of interest if the quality of the approximation is measured in terms of metrics metrizing weak convergence. To see this, consider the following simple example.

Example 2.1. Let $X^{(n)} = (X_1, \ldots, X_n)$ be an i.i.d. sequence of random variables with distribution $P_\theta = \text{Bernoulli}(\theta)$. Denote by $J_n(x, P_\theta)$ the distribution of the root $\sqrt n(\hat\theta_n - \theta)$ under $P_\theta$.
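Before examining when uniformity fails, the critical value $J_n^{-1}(1-\alpha, \hat P_n)$ of Theorem 2.2 can be sketched in this Bernoulli setting, taking $\hat P_n$ to be the parametric estimate $\text{Bernoulli}(\hat\theta_n)$. This is a hedged illustration only; the function names and Monte Carlo sizes are ours.

```python
import numpy as np

def bootstrap_quantile(x, alpha, n_boot=2000, seed=0):
    """Parametric bootstrap estimate of J_n^{-1}(1 - alpha, P_hat_n) for the
    root sqrt(n) * (theta_hat_n - theta) in the Bernoulli(theta) model:
    samples are drawn from Bernoulli(theta_hat_n), and the root is recomputed
    with theta_hat_n playing the role of theta."""
    rng = np.random.default_rng(seed)
    n = len(x)
    theta_hat = x.mean()
    boot_means = rng.binomial(1, theta_hat, size=(n_boot, n)).mean(axis=1)
    roots = np.sqrt(n) * (boot_means - theta_hat)
    return np.quantile(roots, 1 - alpha)

# Usage: part (i) motivates the one-sided region
# {theta : sqrt(n)(theta_hat_n - theta) <= c}, i.e. theta >= theta_hat_n - c / sqrt(n).
rng = np.random.default_rng(2)
x = rng.binomial(1, 0.4, size=200)
c = bootstrap_quantile(x, alpha=0.05)
lower_bound = x.mean() - c / np.sqrt(len(x))
```
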
It is easy to show for any $\epsilon > 0$ that

$$\sup_{0<\theta<1} P_\theta\{\rho(J_n(\cdot, \hat P_n), J_n(\cdot, P_\theta)) > \epsilon\} \to 0 \quad (15)$$

whenever $\rho$ is a metric compatible with the weak topology. Nevertheless, it follows from p. 78 of Romano (1989) that the coverage statements in Theorem 2.2 fail to hold.
On the other hand, when $\rho$ is the Kolmogorov metric, (15) holds when $\theta$ is restricted to an interval of the form $[\delta, 1-\delta]$ for some $\delta > 0$. Moreover, when $\theta$ is restricted to such an interval, the coverage statements in Theorem 2.2 hold as well. Thus, even when using a parametric bootstrap in very smooth parametric models, uniform coverage statements may require further restrictions on $\mathbf{P}$.

3 Applications

3.1 Subsampling

Example 3.1. (Multivariate Nonparametric Mean) Let $X^{(n)} = (X_1, \ldots, X_n)$ be an i.i.d. sequence of random variables with distribution $P \in \mathbf{P}$ on $\mathbf{R}^k$. Suppose one wishes to construct a rectangular confidence region for $\mu(P)$. For this purpose, a natural choice of root is

$$R_n(X^{(n)}, P) = \max_{1\le j\le k} \frac{\sqrt n(\bar X_{j,n} - \mu_j(P))}{S_{j,n}}. \quad (16)$$

We have the following theorem:

Theorem 3.1. Let $X^{(n)} = (X_1, \ldots, X_n)$ be an i.i.d. sequence of random variables with distribution $P \in \mathbf{P}$ on $\mathbf{R}^k$. Denote by $\mathbf{P}_j$ the set of distributions formed from the $j$th marginal distributions of the distributions in $\mathbf{P}$. Suppose $\mathbf{P}$ is such that for all $1 \le j \le k$

$$\lim_{\lambda\to\infty} \sup_{P\in\bar{\mathbf{P}}} E_P\left[\left(\frac{X - \mu(P)}{\sigma(P)}\right)^2 I\left\{\frac{|X - \mu(P)|}{\sigma(P)} > \lambda\right\}\right] = 0 \quad (17)$$

for $\bar{\mathbf{P}} = \mathbf{P}_j$. Let $J_n(x, P)$ be the distribution of the root (16). Let $b = b_n < n$ be a sequence of positive integers tending to infinity, but satisfying $b/n \to 0$, and define $L_n(x, P)$, the subsampling estimate of $J_n(x, P)$, by (6). Then, for any $\alpha \in (0, 1)$, the following statements are true:

(i) $\lim_{n\to\infty} \inf_{P\in\mathbf{P}} P\left\{L_n^{-1}(\alpha, P) \le \max_{1\le j\le k} \frac{\sqrt n(\bar X_{j,n} - \mu_j(P))}{S_{j,n}}\right\} = 1-\alpha.$

(ii) $\lim_{n\to\infty} \inf_{P\in\mathbf{P}} P\left\{\max_{1\le j\le k} \frac{\sqrt n(\bar X_{j,n} - \mu_j(P))}{S_{j,n}} \le L_n^{-1}(1-\alpha, P)\right\} = 1-\alpha.$

(iii) $\lim_{n\to\infty} \inf_{P\in\mathbf{P}} P\left\{L_n^{-1}(\alpha, P) \le \max_{1\le j\le k} \frac{\sqrt n(\bar X_{j,n} - \mu_j(P))}{S_{j,n}} \le L_n^{-1}(1-\alpha, P)\right\} = 1-2\alpha.$
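Computationally, the subsampling estimate in Theorem 3.1 can be sketched as follows, in the feasible form that centers each subsample at the full-sample means. Random subsets approximate the full collection of $\binom{n}{b}$ subsets, and the names, data, and Monte Carlo sizes are ours.

```python
import numpy as np

def max_t_subsampling_quantile(x, b, alpha, n_subsets=2000, seed=0):
    """Feasible subsampling quantile for the max-t root (16): on each
    subsample of size b, recompute max_j sqrt(b)(Xbar_{j,b} - Xbar_{j,n})/S_{j,b},
    with the full-sample means standing in for mu(P)."""
    rng = np.random.default_rng(seed)
    n, k = x.shape
    xbar_n = x.mean(axis=0)
    draws = np.empty(n_subsets)
    for i in range(n_subsets):
        sub = x[rng.choice(n, size=b, replace=False)]
        s_b = sub.std(axis=0, ddof=1)
        draws[i] = np.max(np.sqrt(b) * (sub.mean(axis=0) - xbar_n) / s_b)
    return np.quantile(draws, 1 - alpha)

# Usage: by part (ii), {mu : max_j sqrt(n)(Xbar_j - mu_j)/S_j <= c} is the
# rectangular region mu_j >= Xbar_j - c * S_j / sqrt(n) for every j.
rng = np.random.default_rng(3)
x = rng.normal(size=(400, 3))
c = max_t_subsampling_quantile(x, b=60, alpha=0.05)
s_n = x.std(axis=0, ddof=1)
rect_lower = x.mean(axis=0) - c * s_n / np.sqrt(len(x))
```
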
Note in Theorem 3.1 that $L_n(x, P)$ only depends on $P$ through $\mu(P)$, i.e., $L_n(x, P) = L_n(x, \mu(P))$. Therefore, as described in Remark 2.2, it follows from part (i) of Theorem 3.1 that

$$C_n = \left\{\mu \in \mathbf{R}^k : L_n^{-1}(\alpha, \mu) \le \max_{1\le j\le k} \frac{\sqrt n(\bar X_{j,n} - \mu_j)}{S_{j,n}}\right\}$$

satisfies

$$\lim_{n\to\infty} \inf_{P\in\mathbf{P}} P\{\mu(P) \in C_n\} = 1-\alpha. \quad (18)$$

Similar conclusions follow from parts (ii) and (iii) of Theorem 3.1. Moreover, as described in Remark 2.4, with an additional argument it is possible to show that

$$\tilde C_n = \left\{\mu \in \mathbf{R}^k : L_n^{-1}(\alpha, \bar X_n) \le \max_{1\le j\le k} \frac{\sqrt n(\bar X_{j,n} - \mu_j)}{S_{j,n}}\right\}$$

also satisfies (18) with $\tilde C_n$ in place of $C_n$. To see this, first note that

$$\max_{1\le j\le k} \frac{\sqrt b(\bar X_{j,b} - \bar X_{j,n})}{S_{j,b}} \le \max_{1\le j\le k} \frac{\sqrt b(\bar X_{j,b} - \mu_j(P))}{S_{j,b}} + \max_{1\le j\le k} \frac{\sqrt b(\mu_j(P) - \bar X_{j,n})}{S_{j,b}}.$$

Similarly,

$$\max_{1\le j\le k} \frac{\sqrt b(\bar X_{j,b} - \mu_j(P))}{S_{j,b}} \le \max_{1\le j\le k} \frac{\sqrt b(\bar X_{j,b} - \bar X_{j,n})}{S_{j,b}} + \max_{1\le j\le k} \frac{\sqrt b(\bar X_{j,n} - \mu_j(P))}{S_{j,b}}.$$

Consider any sequence $\{P_n \in \mathbf{P} : n \ge 1\}$. Since

$$\max_{1\le j\le k} \frac{\sqrt b(\mu_j(P_n) - \bar X_{j,n})}{S_{j,b}} = o_{P_n}(1),$$

we have that $L_n^{-1}(\alpha, \bar X_n) = L_n^{-1}(\alpha, P_n) + o_{P_n}(1)$. It now follows from an additional subsequencing argument and part (i) of Theorem 3.1 that

$$\lim_{n\to\infty} \inf_{P\in\mathbf{P}} P\left\{L_n^{-1}(\alpha, \bar X_n) \le \max_{1\le j\le k} \frac{\sqrt n(\bar X_{j,n} - \mu_j(P))}{S_{j,n}}\right\} = 1-\alpha,$$

from which the desired conclusion follows. Similar statements follow from parts (ii) and (iii) of Theorem 3.1.

Remark 3.1. Theorem 3.1 generalizes to the case where the root is given by $R_n(X^{(n)}, P) = f(Z_n(P), \Omega(\hat P_n))$, where

$$Z_n(P) = \left(\frac{\sqrt n(\bar X_{1,n} - \mu_1(P))}{S_{1,n}}, \ldots, \frac{\sqrt n(\bar X_{k,n} - \mu_k(P))}{S_{k,n}}\right)', \quad (19)$$
$f$ is a real-valued function that is continuous in both of its arguments, and $f(Z, \Omega)$ is continuously distributed whenever $Z \sim N(0, \Omega)$ for some $\Omega$ satisfying $\Omega_{j,j} = 1$ for all $1 \le j \le k$. Under these conditions,

$$\sup_{x\in\mathbf{R}} |J_n(x, P_n) - P\{f(Z, \Omega) \le x\}| \to 0 \quad (20)$$

whenever $Z_n(P_n) \overset{d}{\to} Z$ under $P_n$ and $\Omega(\hat P_n) \overset{P_n}{\to} \Omega$, where $Z \sim N(0, \Omega)$. As a result, the proof given in Section 4.4 can be adapted appropriately to accommodate this situation. The result can be further generalized to allow the distribution of $f(Z, \Omega)$ to have discontinuities. In this case, we require that at any point of discontinuity $x$

$$P_n\{f(Z_n(P_n), \Omega(\hat P_n)) \le x\} \to P\{f(Z, \Omega) \le x\}$$
$$P_n\{f(Z_n(P_n), \Omega(\hat P_n)) < x\} \to P\{f(Z, \Omega) < x\}$$

whenever $Z_n(P_n) \overset{d}{\to} Z$ under $P_n$ and $\Omega(\hat P_n) \overset{P_n}{\to} \Omega$, where $Z \sim N(0, \Omega)$. When this is true, it follows from Lemma 3 on p. 260 of Chow and Teicher (1978), a generalization of Polya's Theorem, that (20) holds, from which the desired result can again be established by modifying the proof in Section 4.4 appropriately.

Example 3.2. (Constrained Univariate Nonparametric Mean) Andrews (2000) considers the following example. Let $X^{(n)} = (X_1, \ldots, X_n)$ be an i.i.d. sequence of random variables with distribution $P \in \mathbf{P}$ on $\mathbf{R}$. Suppose it is known that $\mu(P) \ge 0$ for all $P \in \mathbf{P}$ and one wishes to construct a confidence interval for $\mu(P)$. A natural choice of root in this case is

$$R_n = R_n(X^{(n)}, P) = \sqrt n(\max\{\bar X_n, 0\} - \mu(P)).$$

This root differs from the one considered in Theorem 3.1 and the ones discussed in Remark 3.1 in the sense that under weak assumptions on $\mathbf{P}$

$$\limsup_{n\to\infty} \sup_{P\in\mathbf{P}} \sup_{x\in\mathbf{R}} \{J_b(x, P) - J_n(x, P)\} \le 0 \quad (21)$$

holds, but

$$\limsup_{n\to\infty} \sup_{P\in\mathbf{P}} \sup_{x\in\mathbf{R}} \{J_n(x, P) - J_b(x, P)\} \le 0 \quad (22)$$

fails to hold. To see this, suppose $\mathbf{P}$ satisfies the uniform integrability requirement (17) with $\bar{\mathbf{P}} = \mathbf{P}$. Note that

$$J_b(x, P) = P\{\max\{Z_b(P), -\sqrt b\,\mu(P)\} \le x\}$$
$$J_n(x, P) = P\{\max\{Z_n(P), -\sqrt n\,\mu(P)\} \le x\},$$

where $Z_b(P) = \sqrt b(\bar X_b - \mu(P))$ and $Z_n(P) = \sqrt n(\bar X_n - \mu(P))$. Since $-\sqrt b\,\mu(P) \ge -\sqrt n\,\mu(P)$ for any $P \in \mathbf{P}$, $J_b(x, P) - J_n(x, P)$ is bounded from above by

$$P\{\max\{Z_b(P), -\sqrt n\,\mu(P)\} \le x\} - J_n(x, P).$$

It now follows from the uniform central limit theorem established in Romano and Shaikh (2008) and Theorem 2.11 of Bhattacharya and Rao (1976) that (21) holds. A similar argument shows that (22) fails to hold. Thus, under these assumptions, subsampling can be used to construct uniformly valid upper critical values for $R_n$, but not uniformly valid lower critical values for $R_n$.

Example 3.3. (Moment Inequalities) The generality of Theorem 2.1 illustrated in Example 3.2 is also useful when testing multisided hypotheses about the mean. To see this, let $X^{(n)} = (X_1, \ldots, X_n)$ be an i.i.d. sequence of random variables with distribution $P \in \mathbf{P}$ on $\mathbf{R}^k$. Define $\mathbf{P}_0 = \{P \in \mathbf{P} : \mu(P) \le 0\}$ and $\mathbf{P}_1 = \mathbf{P} \setminus \mathbf{P}_0$. Consider testing the null hypothesis that $P \in \mathbf{P}_0$ versus the alternative hypothesis that $P \in \mathbf{P}_1$ at level $\alpha \in (0, 1)$. Such hypothesis testing problems have recently received considerable attention in the moment inequality literature in econometrics. See, for example, Andrews and Soares (2010), Andrews and Guggenberger (2010), Andrews and Jia (2009), Bugni (2010), Canay (2010), Romano and Shaikh (2008), and Romano and Shaikh (2010). Theorem 2.1 may be used to construct tests $\phi_n(X^{(n)})$ that are uniformly consistent in level in the sense that (5) holds under weak assumptions on $\mathbf{P}$. Specifically, consider the test $\phi_n(X^{(n)}) = I\{T_n(X^{(n)}) > L_n^{-1}(1-\alpha)\}$, where

$$T_n(X^{(n)}) = \max_{1\le j\le k} \frac{\sqrt n \bar X_{j,n}}{S_{j,n}}$$

and $L_n(x, P)$ is given by (6) with $R_n(X^{(n)}, P) = T_n(X^{(n)})$. If $\mathbf{P}$ is defined as in Theorem 3.1 and $b = b_n < n$ is a sequence of positive integers tending to infinity, but satisfying $b/n \to 0$, then it is possible to show that (21) holds. The desired conclusion therefore follows from Theorem 2.1.
The required argument is essentially the same as the one presented in Romano and Shaikh (2008) for

$$T_n(X^{(n)}) = \max_{1\le j\le k} \max\{\sqrt n \bar X_{j,n}, 0\}^2,$$

though Lemma 4.5 in the Appendix is also needed for establishing (21) in the case considered here because of Studentization. Related results are obtained by Andrews and Guggenberger (2009). Further applications of Theorem 2.1 can be found in Romano and Shaikh (2010).
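A hedged sketch of the resulting subsampling test for the moment inequalities of Example 3.3: the statistic $T_n = \max_j \sqrt n \bar X_{j,n}/S_{j,n}$ does not depend on $P$, so per Remark 2.5 the critical value is simply the $1-\alpha$ quantile of the same (uncentered) statistic over subsamples of size $b$. Function names, data, and subset counts below are illustrative.

```python
import numpy as np

def subsampling_moment_test(x, b, alpha, n_subsets=2000, seed=0):
    """Test of H0: mu(P) <= 0 componentwise. Rejects when the max-t statistic
    T_n exceeds the 1 - alpha quantile of the subsampling distribution of T_b;
    no centering is applied, as in Remark 2.5."""
    rng = np.random.default_rng(seed)
    n, k = x.shape
    t_stat = lambda z: np.max(np.sqrt(len(z)) * z.mean(axis=0) / z.std(axis=0, ddof=1))
    t_n = t_stat(x)
    draws = np.empty(n_subsets)
    for i in range(n_subsets):
        draws[i] = t_stat(x[rng.choice(n, size=b, replace=False)])
    return bool(t_n > np.quantile(draws, 1 - alpha))

rng = np.random.default_rng(4)
x_null = rng.normal(loc=-0.5, size=(400, 3))  # every mean negative: H0 holds
x_alt = rng.normal(loc=0.5, size=(400, 3))    # every mean positive: H0 fails
reject_null = subsampling_moment_test(x_null, b=60, alpha=0.05)
reject_alt = subsampling_moment_test(x_alt, b=60, alpha=0.05)
```
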
Example 3.4. (Multiple Testing) We now illustrate the use of Theorem 2.1 to construct tests of multiple hypotheses that behave well uniformly over a large class of distributions. Let $X^{(n)} = (X_1, \ldots, X_n)$ be an i.i.d. sequence of random variables with distribution $P \in \mathbf{P}$ on $\mathbf{R}^k$ and consider testing the family of null hypotheses

$$H_j : \mu_j(P) \le 0 \text{ for } 1 \le j \le k \quad (23)$$

versus the alternative hypotheses

$$H_j : \mu_j(P) > 0 \text{ for } 1 \le j \le k \quad (24)$$

in a way that controls the familywise error rate at level $\alpha \in (0, 1)$ in the sense that

$$\limsup_{n\to\infty} \sup_{P\in\mathbf{P}} FWER_P \le \alpha, \quad (25)$$

where $FWER_P = P\{\text{reject some } H_j \text{ with } \mu_j(P) \le 0\}$. For $K \subseteq \{1, \ldots, k\}$, define $L_n(x, K)$ according to the right-hand side of (6) with

$$R_n(X^{(n)}, P) = \max_{j\in K} \frac{\sqrt n \bar X_{j,n}}{S_{j,n}}$$

and consider the following stepwise multiple testing procedure:

Algorithm 3.1.

Step 1: Set $K_1 = \{1, \ldots, k\}$. If

$$\max_{j\in K_1} \frac{\sqrt n \bar X_{j,n}}{S_{j,n}} \le L_n^{-1}(1-\alpha, K_1),$$

then stop. Otherwise, reject any $H_j$ with

$$\frac{\sqrt n \bar X_{j,n}}{S_{j,n}} > L_n^{-1}(1-\alpha, K_1)$$

and continue to Step 2 with

$$K_2 = \left\{j \in K_1 : \frac{\sqrt n \bar X_{j,n}}{S_{j,n}} \le L_n^{-1}(1-\alpha, K_1)\right\}.$$
Step s: If

$$\max_{j\in K_s} \frac{\sqrt n \bar X_{j,n}}{S_{j,n}} \le L_n^{-1}(1-\alpha, K_s),$$

then stop. Otherwise, reject any $H_j$ with

$$\frac{\sqrt n \bar X_{j,n}}{S_{j,n}} > L_n^{-1}(1-\alpha, K_s)$$

and continue to Step $s+1$ with

$$K_{s+1} = \left\{j \in K_s : \frac{\sqrt n \bar X_{j,n}}{S_{j,n}} \le L_n^{-1}(1-\alpha, K_s)\right\}.$$

It follows from the analysis in Romano and Wolf (2005) that Algorithm 3.1 satisfies

$$FWER_P \le P\left\{\max_{j\in I(P)} \frac{\sqrt n \bar X_{j,n}}{S_{j,n}} > L_n^{-1}(1-\alpha, I(P))\right\}, \quad (26)$$

where

$$I(P) = \{1 \le j \le k : \mu_j(P) \le 0\}. \quad (27)$$

If $\mathbf{P}$ is defined as in Theorem 3.1, then the supremum over $\mathbf{P}$ of the right-hand side of (26) is asymptotically bounded from above by $\alpha$ from Example 3.3. It is, of course, possible to extend the analysis in a straightforward way to two-sided testing. See also Romano and Shaikh (2010) for applications of Theorem 2.1 to testing an uncountably infinite number of null hypotheses in a way that satisfies (25).

Example 3.5. (Empirical Process on R) Let $X^{(n)} = (X_1, \ldots, X_n)$ be an i.i.d. sequence of random variables with distribution $P \in \mathbf{P}$ on $\mathbf{R}$. Suppose one wishes to construct a confidence region for the cumulative distribution function associated with $P$, i.e., $P\{(-\infty, t]\}$. For this purpose a natural choice of root is

$$\sup_{t\in\mathbf{R}} \sqrt n\,|\hat P_n\{(-\infty, t]\} - P\{(-\infty, t]\}|, \quad (28)$$

where $\hat P_n$ is the empirical distribution of $X^{(n)}$. We have the following theorem:

Theorem 3.2. Let $X^{(n)} = (X_1, \ldots, X_n)$ be an i.i.d. sequence of random variables with distribution $P \in \mathbf{P}$, where

$$\mathbf{P} = \{P \text{ on } \mathbf{R} : \epsilon < P\{(-\infty, t]\} < 1-\epsilon \text{ for some } t \in \mathbf{R}\} \quad (29)$$

for some fixed $\epsilon > 0$. Let $J_n(x, P)$ be the distribution of the root (28). Then, for any $\alpha \in (0, 1)$, the following statements are true:
(i) $\lim_{n\to\infty} \inf_{P\in\mathbf{P}} P\left\{L_n^{-1}(\alpha, P) \le \sup_{t\in\mathbf{R}} \sqrt n\,|\hat P_n\{(-\infty, t]\} - P\{(-\infty, t]\}|\right\} = 1-\alpha.$

(ii) $\lim_{n\to\infty} \inf_{P\in\mathbf{P}} P\left\{\sup_{t\in\mathbf{R}} \sqrt n\,|\hat P_n\{(-\infty, t]\} - P\{(-\infty, t]\}| \le L_n^{-1}(1-\alpha, P)\right\} = 1-\alpha.$

(iii) $\lim_{n\to\infty} \inf_{P\in\mathbf{P}} P\left\{L_n^{-1}(\alpha, P) \le \sup_{t\in\mathbf{R}} \sqrt n\,|\hat P_n\{(-\infty, t]\} - P\{(-\infty, t]\}| \le L_n^{-1}(1-\alpha, P)\right\} = 1-2\alpha.$

We have immediately from part (i) of Theorem 3.2 that

$$C_n = \left\{P : L_n^{-1}(\alpha, P) \le \sup_{t\in\mathbf{R}} \sqrt n\,|\hat P_n\{(-\infty, t]\} - P\{(-\infty, t]\}|\right\}$$

satisfies

$$\lim_{n\to\infty} \inf_{P\in\mathbf{P}} P\{P \in C_n\} = 1-\alpha. \quad (30)$$

Similar conclusions follow from parts (ii) and (iii) of Theorem 3.2. Yet this construction is not very useful in practice since it involves searching through all possible distributions on $\mathbf{R}$. On the other hand, as described in Remark 2.4, with an additional argument it is possible to show that

$$\tilde C_n = \left\{P : L_n^{-1}(\alpha, \hat P_n) \le \sup_{t\in\mathbf{R}} \sqrt n\,|\hat P_n\{(-\infty, t]\} - P\{(-\infty, t]\}|\right\}$$

also satisfies (30) with $\tilde C_n$ in place of $C_n$. To see this, first note that

$$\sup_{t\in\mathbf{R}} \sqrt b\,|\hat P_b\{(-\infty, t]\} - \hat P_n\{(-\infty, t]\}| \le \sup_{t\in\mathbf{R}} \sqrt b\,|\hat P_b\{(-\infty, t]\} - P\{(-\infty, t]\}| + \sup_{t\in\mathbf{R}} \sqrt b\,|P\{(-\infty, t]\} - \hat P_n\{(-\infty, t]\}|.$$

Similarly,

$$\sup_{t\in\mathbf{R}} \sqrt b\,|\hat P_b\{(-\infty, t]\} - P\{(-\infty, t]\}| \le \sup_{t\in\mathbf{R}} \sqrt b\,|\hat P_b\{(-\infty, t]\} - \hat P_n\{(-\infty, t]\}| + \sup_{t\in\mathbf{R}} \sqrt b\,|\hat P_n\{(-\infty, t]\} - P\{(-\infty, t]\}|.$$

Consider any sequence $\{P_n \in \mathbf{P} : n \ge 1\}$. Since

$$\sup_{t\in\mathbf{R}} \sqrt b\,|P_n\{(-\infty, t]\} - \hat P_n\{(-\infty, t]\}| = o_{P_n}(1),$$

we have that $L_n^{-1}(\alpha, \hat P_n) = L_n^{-1}(\alpha, P_n) + o_{P_n}(1)$.
It now follows from an additional subsequencing argument and part (i) of Theorem 3.2 that

$$\lim_{n\to\infty} \inf_{P\in\mathbf{P}} P\left\{L_n^{-1}(\alpha, \hat P_n) \le \sup_{t\in\mathbf{R}} \sqrt n\,|\hat P_n\{(-\infty, t]\} - P\{(-\infty, t]\}|\right\} = 1-\alpha,$$

from which the desired conclusion follows. Similar statements follow from parts (ii) and (iii) of Theorem 3.2.

Example 3.6. (One Sample U-Statistics) Let $X^{(n)} = (X_1, \ldots, X_n)$ be an i.i.d. sequence of random variables with distribution $P \in \mathbf{P}$ on $\mathbf{R}$. Suppose one wishes to construct a confidence region for

$$\theta(P) = \theta_h(P) = E_P[h(X_1, \ldots, X_m)], \quad (31)$$

where $h$ is a symmetric kernel of degree $m$. The usual estimate of $\theta(P)$ in this case is given by the U-statistic

$$\hat\theta_n = \hat\theta_n(X^{(n)}) = \frac{1}{\binom{n}{m}} \sum_c h(X_{i_1}, \ldots, X_{i_m}).$$

Here, $\sum_c$ denotes summation over all $\binom{n}{m}$ subsets $\{i_1, \ldots, i_m\}$ of $\{1, \ldots, n\}$. A natural choice of root is therefore given by

$$R_n(X^{(n)}, P) = \sqrt n(\hat\theta_n - \theta(P)). \quad (32)$$

We have the following theorem:

Theorem 3.3. Let $X^{(n)} = (X_1, \ldots, X_n)$ be an i.i.d. sequence of random variables with distribution $P \in \mathbf{P}$ on $\mathbf{R}$ and let $h$ be a symmetric kernel of degree $m$. Define

$$g(x, P) = g_h(x, P) = E_P[h(x, X_2, \ldots, X_m)] - \theta(P) \quad (33)$$

and

$$\sigma^2(P) = \sigma_h^2(P) = m^2 \mathrm{Var}_P[g(X_i, P)]. \quad (34)$$

Suppose $\mathbf{P}$ is such that

$$\lim_{\lambda\to\infty} \sup_{P\in\mathbf{P}} E_P\left[\frac{g^2(X_i, P)}{\sigma^2(P)} I\left\{\frac{|g(X_i, P)|}{\sigma(P)} > \lambda\right\}\right] = 0 \quad (35)$$

and

$$\sup_{P\in\mathbf{P}} \frac{\mathrm{Var}_P[h(X_1, \ldots, X_m)]}{\sigma^2(P)} < \infty. \quad (36)$$

Let $J_n(x, P)$ be the distribution of the root (32). Let $b = b_n < n$ be a sequence of positive integers tending to infinity, but satisfying $b/n \to 0$, and define $L_n(x, P)$, the subsampling estimate of $J_n(x, P)$, by (6). Then, for any $\alpha \in (0, 1)$, the following statements are true:
(i) $\lim_{n\to\infty} \inf_{P\in\mathbf{P}} P\{L_n^{-1}(\alpha, P) \le \sqrt n(\hat\theta_n - \theta(P))\} = 1-\alpha.$

(ii) $\lim_{n\to\infty} \inf_{P\in\mathbf{P}} P\{\sqrt n(\hat\theta_n - \theta(P)) \le L_n^{-1}(1-\alpha, P)\} = 1-\alpha.$

(iii) $\lim_{n\to\infty} \inf_{P\in\mathbf{P}} P\{L_n^{-1}(\alpha, P) \le \sqrt n(\hat\theta_n - \theta(P)) \le L_n^{-1}(1-\alpha, P)\} = 1-2\alpha.$

Note in Theorem 3.3 that $L_n(x, P)$ only depends on $P$ through $\theta(P)$, i.e., $L_n(x, P) = L_n(x, \theta(P))$. Therefore, as described in Remark 2.2, it follows from part (i) of Theorem 3.3 that

$$C_n = \{\theta \in \mathbf{R} : L_n^{-1}(\alpha, \theta) \le \sqrt n(\hat\theta_n - \theta)\}$$

satisfies

$$\lim_{n\to\infty} \inf_{P\in\mathbf{P}} P\{\theta(P) \in C_n\} = 1-\alpha. \quad (37)$$

Similar conclusions follow from parts (ii) and (iii) of Theorem 3.3. Moreover, as described in Remark 2.4, with an additional argument it is possible to show that

$$\tilde C_n = \{\theta \in \mathbf{R} : L_n^{-1}(\alpha, \hat\theta_n) \le \sqrt n(\hat\theta_n - \theta)\}$$

also satisfies (37) with $\tilde C_n$ in place of $C_n$. To see this, first note that

$$\sqrt b(\hat\theta_b - \hat\theta_n) = \sqrt b(\hat\theta_b - \theta(P)) + \sqrt b(\theta(P) - \hat\theta_n).$$

Consider any sequence $\{P_n \in \mathbf{P} : n \ge 1\}$. Since $\sqrt b(\theta(P_n) - \hat\theta_n) = o_{P_n}(1)$, we have that $L_n^{-1}(\alpha, \hat\theta_n) = L_n^{-1}(\alpha, P_n) + o_{P_n}(1)$. It now follows from an additional subsequencing argument and part (i) of Theorem 3.3 that

$$\lim_{n\to\infty} \inf_{P\in\mathbf{P}} P\{L_n^{-1}(\alpha, \hat\theta_n) \le \sqrt n(\hat\theta_n - \theta(P))\} = 1-\alpha,$$

from which the desired conclusion follows. Similar statements follow from parts (ii) and (iii) of Theorem 3.3.
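As a concrete instance of Example 3.6, the kernel $h(x_1, x_2) = (x_1 - x_2)^2/2$ (degree $m = 2$) gives a U-statistic that is unbiased for $\mathrm{Var}_P(X)$. The sketch below computes it and a feasible two-sided subsampling interval for $\theta(P)$ of the form $[\hat\theta_n - \hat L_n^{-1}(1-\alpha)/\sqrt n,\ \hat\theta_n - \hat L_n^{-1}(\alpha)/\sqrt n]$, which by part (iii) has asymptotic coverage at least $1-2\alpha$; the kernel choice, names, and Monte Carlo sizes are ours.

```python
import numpy as np

def u_stat(x):
    """U-statistic with kernel h(x1, x2) = (x1 - x2)^2 / 2 (degree m = 2),
    averaged over all C(n, 2) pairs; unbiased for Var_P(X)."""
    i, j = np.triu_indices(len(x), k=1)
    return np.mean(0.5 * (x[i] - x[j]) ** 2)

def subsampling_interval(x, b, alpha, n_subsets=500, seed=0):
    """Two-sided subsampling interval for theta(P) based on the root
    sqrt(n)(theta_hat_n - theta), with subsample roots centered at theta_hat_n."""
    rng = np.random.default_rng(seed)
    n = len(x)
    theta_n = u_stat(x)
    draws = np.empty(n_subsets)
    for i in range(n_subsets):
        sub = x[rng.choice(n, size=b, replace=False)]
        draws[i] = np.sqrt(b) * (u_stat(sub) - theta_n)
    q_lo, q_hi = np.quantile(draws, [alpha, 1 - alpha])
    return theta_n - q_hi / np.sqrt(n), theta_n - q_lo / np.sqrt(n)

rng = np.random.default_rng(5)
x = rng.normal(scale=2.0, size=300)  # here theta(P) = Var(X) = 4
lo, hi = subsampling_interval(x, b=50, alpha=0.05)
```
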
3.2 Bootstrap

Example 3.7. (Multivariate Nonparametric Mean) Let $X^{(n)} = (X_1, \ldots, X_n)$ be an i.i.d. sequence of random variables with distribution $P \in \mathbf{P}$ on $\mathbf{R}^k$. Suppose one wishes to construct a rectangular confidence region for $\mu(P)$. As described in Example 3.1, a natural choice of root in this case is given by (16). We have the following theorem:

Theorem 3.4. Let $X^{(n)} = (X_1, \ldots, X_n)$ be an i.i.d. sequence of random variables with distribution $P \in \mathbf{P}$, where $\mathbf{P}$ is defined as in Theorem 3.1. Let $J_n(x, P)$ be the distribution of the root $R_n$ defined by (16). Denote by $\hat P_n$ the empirical distribution of $X^{(n)}$. Then, for any $\alpha \in (0, 1)$, the following statements are true:

(i) $\lim_{n\to\infty} \inf_{P\in\mathbf{P}} P\left\{J_n^{-1}(\alpha, \hat P_n) \le \max_{1\le j\le k} \frac{\sqrt n(\bar X_{j,n} - \mu_j(P))}{S_{j,n}}\right\} = 1-\alpha.$

(ii) $\lim_{n\to\infty} \inf_{P\in\mathbf{P}} P\left\{\max_{1\le j\le k} \frac{\sqrt n(\bar X_{j,n} - \mu_j(P))}{S_{j,n}} \le J_n^{-1}(1-\alpha, \hat P_n)\right\} = 1-\alpha.$

(iii) $\lim_{n\to\infty} \inf_{P\in\mathbf{P}} P\left\{J_n^{-1}(\alpha, \hat P_n) \le \max_{1\le j\le k} \frac{\sqrt n(\bar X_{j,n} - \mu_j(P))}{S_{j,n}} \le J_n^{-1}(1-\alpha, \hat P_n)\right\} = 1-2\alpha.$

Theorem 3.4 generalizes in the same way that Theorem 3.1 generalizes as described in Remark 3.1. In this case, the proof given in Section 4.7 can be adapted simply by modifying Lemma 4.7 appropriately.

Example 3.8. (Moment Inequalities) Let $X^{(n)} = (X_1, \ldots, X_n)$ be an i.i.d. sequence of random variables with distribution $P \in \mathbf{P}$ on $\mathbf{R}^k$ and define $\mathbf{P}_0$ and $\mathbf{P}_1$ as in Example 3.3. Andrews and Jia (2009) propose testing the null hypothesis that $P \in \mathbf{P}_0$ versus the alternative hypothesis that $P \in \mathbf{P}_1$ at level $\alpha \in (0, 1)$ using an adjusted quasi-likelihood ratio statistic $T_n(X^{(n)})$ defined as follows:

$$T_n(X^{(n)}) = \inf_{t\in\mathbf{R}^k : t\le 0} W_n(t)' \bar\Omega_n^{-1} W_n(t).$$

Here,

$$\bar\Omega_n = \Omega(\hat P_n) + \max\{\epsilon - \det(\Omega(\hat P_n)), 0\}\,\mathrm{diag}(\Omega(\hat P_n)),$$

where $\epsilon > 0$ and

$$W_n(t) = \left(\frac{\sqrt n(\bar X_{1,n} - t_1)}{S_{1,n}}, \ldots, \frac{\sqrt n(\bar X_{k,n} - t_k)}{S_{k,n}}\right)'.$$
Andrews and Jia (2009) propose a procedure for constructing critical values for T_n(X^{(n)}) that they term refined moment selection. For illustrative purposes, we instead consider a simpler construction. To this end, define

R_n(X^{(n)}, P) = inf_{t∈R^k : t≥0} (Z_n(P) − t)' Ω̃_n^{-1} (Z_n(P) − t), (38)

where Z_n(P) is defined as in (19). Note that (38) is a function of Z_n(P) and Ω(P̂_n) only, i.e., R_n(X^{(n)}, P) = f(Z_n(P), Ω(P̂_n)). Furthermore, whenever Z ~ N(0, Ω) for some Ω satisfying Ω_{j,j} = 1 for all 1 ≤ j ≤ k, f(Z, Ω) is continuously distributed except for a point mass at zero. But for any sequence {P_n ∈ 𝐏 : n ≥ 1} such that Z_n(P_n) →_d Z under P_n and Ω(P̂_n) →_{P_n} Ω, where Z ~ N(0, Ω), we see that

P_n{f(Z_n(P_n), Ω(P̂_n)) ≤ 0} → P{f(Z, Ω) ≤ 0}
P_n{f(Z_n(P_n), Ω(P̂_n)) < 0} → P{f(Z, Ω) < 0}.

It therefore follows from the generalization of Theorem 3.4 discussed in Example 3.7 that the test

φ_n(X^{(n)}) = I{T_n(X^{(n)}) > J_n^{-1}(1 − α, P̂_n)},

where J_n(x, P) is the distribution of (38), satisfies (5) whenever 𝐏 is defined as in Theorem 3.4.

Example 3.9. (Multiple Testing) Theorem 2.2 may be used in the same way that Theorem 2.1 was used in Example 3.4 to construct tests of multiple hypotheses that behave well uniformly over a large class of distributions. To see this, let X^{(n)} = (X_1, ..., X_n) be an i.i.d. sequence of random variables with distribution P ∈ 𝐏 on R^k and again consider testing the family of null hypotheses (23) versus the alternative hypotheses (24) in a way that satisfies (25). For K ⊆ {1, ..., k}, let J_n(x, K, P) be the distribution of the root

R_n(X^{(n)}, P) = max_{j∈K} √n(X̄_{j,n} − μ_j(P))/S_{j,n}

under P and consider the following stepwise multiple testing procedure:

Algorithm 3.2.

Step 1: Set K_1 = {1, ..., k}. If

max_{j∈K_1} √n X̄_{j,n}/S_{j,n} ≤ J_n^{-1}(1 − α, K_1, P̂_n),

then stop. Otherwise, reject any H_j with

√n X̄_{j,n}/S_{j,n} > J_n^{-1}(1 − α, K_1, P̂_n)
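The quadratic program f(z, Ω) = inf_{t≥0} (z − t)' Ω^{-1} (z − t) behind (38) is convex and can be solved by any QP routine. The sketch below uses projected gradient descent; that algorithmic choice, and the function name, are ours, not part of the paper's construction. When Ω is the identity, the minimizer is t_j = max(z_j, 0), so the value reduces to the sum of squared negative parts of z, which gives an exact check.

```python
import numpy as np

def aqlr_root(z, omega, iters=5000):
    """Compute f(z, Omega) = inf_{t >= 0} (z - t)' Omega^{-1} (z - t) by
    projected gradient descent on the convex quadratic objective."""
    a = np.linalg.inv(omega)                         # A = Omega^{-1}, positive definite
    step = 1.0 / (2.0 * np.linalg.eigvalsh(a).max()) # 1/L for the 2*lambda_max(A)-smooth objective
    t = np.maximum(z, 0.0)                           # feasible starting point
    for _ in range(iters):
        grad = -2.0 * a @ (z - t)                    # gradient of (z-t)' A (z-t) in t
        t = np.maximum(t - step * grad, 0.0)         # gradient step, then project onto t >= 0
    r = z - t
    return float(r @ a @ r)
```

Since t = 0 is always feasible, the value is bounded above by z' Ω^{-1} z, and it is zero whenever z ≥ 0 componentwise, matching the point mass at zero noted above.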
and continue to Step 2 with

K_2 = {j ∈ K_1 : √n X̄_{j,n}/S_{j,n} ≤ J_n^{-1}(1 − α, K_1, P̂_n)}.

⋮

Step s: If

max_{j∈K_s} √n X̄_{j,n}/S_{j,n} ≤ J_n^{-1}(1 − α, K_s, P̂_n),

then stop. Otherwise, reject any H_j with

√n X̄_{j,n}/S_{j,n} > J_n^{-1}(1 − α, K_s, P̂_n)

and continue to Step s + 1 with

K_{s+1} = {j ∈ K_s : √n X̄_{j,n}/S_{j,n} ≤ J_n^{-1}(1 − α, K_s, P̂_n)}.

⋮

It follows from the analysis in Romano and Wolf (2005) that Algorithm 3.2 satisfies

FWER_P ≤ P{max_{j∈I(P)} √n X̄_{j,n}/S_{j,n} > J_n^{-1}(1 − α, I(P), P̂_n)}, (39)

where I(P) is defined as in (27). The righthand-side of (39) can be further bounded from above by

P{max_{j∈I(P)} √n(X̄_{j,n} − μ_j(P))/S_{j,n} > J_n^{-1}(1 − α, I(P), P̂_n)}.

The desired conclusion therefore follows from Theorem 3.4 if 𝐏 is defined as in Theorem 3.1. It is, of course, possible to extend the analysis in a straightforward way to two-sided testing.

Example 3.10. (Empirical Process on R) Let X^{(n)} = (X_1, ..., X_n) be an i.i.d. sequence of random variables with distribution P ∈ 𝐏 on R. Suppose one wishes to construct a confidence region for the cumulative distribution function associated with P, i.e., t ↦ P{(−∞, t]}. As described in Example 3.5, a natural choice of root in this case is given by (28). We have the following theorem:

Theorem 3.5. Let X^{(n)} = (X_1, ..., X_n) be an i.i.d. sequence of random variables with distribution P ∈ 𝐏, where 𝐏 is defined as in (29). Let J_n(x, P) be the distribution of the root (28). Denote by P̂_n the empirical distribution of X^{(n)}. Then, for any α ∈ (0, 1), the following statements are true:

(i) lim inf_{n→∞} inf_{P∈𝐏} P{J_n^{-1}(α, P̂_n) ≤ sup_{t∈R} √n |P̂_n{(−∞, t]} − P{(−∞, t]}|} = 1 − α.
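The stepwise procedure above translates directly into code. The following is a sketch, not the paper's implementation: the bootstrap approximation of J_n^{-1}(1 − α, K_s, P̂_n), the one-sided t-statistics, and all names are our illustrative choices. Precomputing the per-coordinate bootstrap roots once lets every step's critical value be read off as a quantile of the max over the active set.

```python
import numpy as np

def stepdown_max_t(X, alpha=0.05, n_boot=999, seed=0):
    """Stepdown multiple testing in the spirit of Algorithm 3.2: at each step,
    compare max_{j in K_s} sqrt(n) Xbar_j / S_j with the bootstrap 1-alpha
    quantile of max_{j in K_s} sqrt(n)(Xbar*_j - Xbar_j)/S*_j, reject the
    exceeding hypotheses, shrink K_s, and repeat until no rejection."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    xbar, s = X.mean(axis=0), X.std(axis=0, ddof=1)
    tstat = np.sqrt(n) * xbar / s
    boot = np.empty((n_boot, k))                     # bootstrap roots, one column per coordinate
    for b in range(n_boot):
        Xb = X[rng.integers(0, n, size=n)]
        boot[b] = np.sqrt(n) * (Xb.mean(axis=0) - xbar) / Xb.std(axis=0, ddof=1)
    active, rejected = set(range(k)), set()
    while active:
        idx = sorted(active)
        crit = np.quantile(boot[:, idx].max(axis=1), 1 - alpha)  # J_n^{-1}(1-alpha, K_s, P_hat_n)
        new = {j for j in idx if tstat[j] > crit}
        if not new:
            break
        rejected |= new
        active -= new
    return rejected
```

Because the critical value is recomputed over the shrinking active set, later steps use a smaller quantile, which is what gives the stepdown procedure its extra power over the single-step method.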
(ii) lim inf_{n→∞} inf_{P∈𝐏} P{sup_{t∈R} √n |P̂_n{(−∞, t]} − P{(−∞, t]}| ≤ J_n^{-1}(1 − α, P̂_n)} = 1 − α.

(iii) lim inf_{n→∞} inf_{P∈𝐏} P{J_n^{-1}(α, P̂_n) ≤ sup_{t∈R} √n |P̂_n{(−∞, t]} − P{(−∞, t]}| ≤ J_n^{-1}(1 − α, P̂_n)} = 1 − 2α.

Some of the conclusions of Theorem 3.5 can be found in Romano (1989), though the method of proof given in the Appendix is quite different.

Example 3.11. (One Sample U-Statistics) Let X^{(n)} = (X_1, ..., X_n) be an i.i.d. sequence of random variables with distribution P ∈ 𝐏 on R and let h be a symmetric kernel of degree m. Suppose one wishes to construct a confidence region for θ(P) = θ_h(P) given by (31). As described in Example 3.6, a natural choice of root in this case is given by (32). Before proceeding, it is useful to introduce the following notation. For an arbitrary kernel h, ε > 0 and B > 0, denote by 𝐏_{h,ε,B} the set of all distributions P on R such that

E_P[|h(X_1, ..., X_m) − θ_h(P)|^ε] ≤ B. (40)

Similarly, for an arbitrary kernel h and δ > 0, denote by 𝐒_{h,δ} the set of all distributions P on R such that

σ_h²(P) ≥ δ, (41)

where σ_h²(P) is defined as in (34). Finally, for an arbitrary kernel h, ε > 0 and B > 0, let 𝐏̄_{h,ε,B} be the set of distributions P on R such that

E_P[|h(X_{i_1}, ..., X_{i_m}) − θ_h(P)|^ε] ≤ B

whenever 1 ≤ i_j ≤ n for all 1 ≤ j ≤ m. Using this notation, we have the following theorem:

Theorem 3.6. Let X^{(n)} = (X_1, ..., X_n) be an i.i.d. sequence of random variables with distribution P ∈ 𝐏 on R and let h be a symmetric kernel of degree m. Define the kernel h̄ of degree 2m according to the rule

h̄(x_1, ..., x_{2m}) = h(x_1, ..., x_m) h(x_1, x_{m+2}, ..., x_{2m}) − h(x_1, ..., x_m) h(x_{m+1}, ..., x_{2m}). (42)

Suppose 𝐏 ⊆ 𝐏_{h,2+δ,B} ∩ 𝐒_{h,δ} ∩ 𝐏̄_{h̄,1+δ,B} ∩ 𝐏̄_{h,2+δ,B} for some δ > 0 and B > 0. Let J_n(x, P) be the distribution of the root R_n defined by (32). Denote by P̂_n the empirical distribution of X^{(n)}. Then, for any α ∈ (0, 1), the following statements are true:
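The bootstrap quantile for the Kolmogorov–Smirnov-type root of Theorem 3.5 can be approximated by resampling. The sketch below is ours: it evaluates both empirical CDFs at the pooled jump points, where the supremum of the difference of two right-continuous step functions is attained, and uses a finite number of resamples in place of the exact bootstrap distribution.

```python
import numpy as np

def bootstrap_ks_quantile(x, alpha=0.05, n_boot=500, seed=0):
    """Approximate the 1-alpha bootstrap quantile of the root
    sup_t sqrt(n) |F*_n(t) - F_hat_n(t)|, yielding the uniform band
    F_hat_n(t) +/- c / sqrt(n) for the true CDF."""
    rng = np.random.default_rng(seed)
    n = len(x)
    xs = np.sort(x)
    roots = np.empty(n_boot)
    for b in range(n_boot):
        xb = np.sort(x[rng.integers(0, n, size=n)])
        grid = np.concatenate([xs, xb])              # pooled jump points
        f_hat = np.searchsorted(xs, grid, side="right") / n
        f_star = np.searchsorted(xb, grid, side="right") / n
        roots[b] = np.sqrt(n) * np.abs(f_star - f_hat).max()
    return np.quantile(roots, 1 - alpha)
```

For moderate n the resulting critical value is close to the 1 − α quantile of the Kolmogorov distribution (about 1.36 at α = 0.05), as the theory predicts for distributions without large atoms.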
(i) lim inf_{n→∞} inf_{P∈𝐏} P{J_n^{-1}(α, P̂_n) ≤ √n(θ̂_n − θ(P))} = 1 − α.

(ii) lim inf_{n→∞} inf_{P∈𝐏} P{√n(θ̂_n − θ(P)) ≤ J_n^{-1}(1 − α, P̂_n)} = 1 − α.

(iii) lim inf_{n→∞} inf_{P∈𝐏} P{J_n^{-1}(α, P̂_n) ≤ √n(θ̂_n − θ(P)) ≤ J_n^{-1}(1 − α, P̂_n)} = 1 − 2α.

Note that the conditions on 𝐏 in Theorem 3.6 are stronger than the conditions on 𝐏 in Theorem 3.3. While it may be possible to weaken the restrictions on 𝐏 in Theorem 3.6 some, it is not possible to establish the conclusions of Theorem 3.6 under the conditions on 𝐏 in Theorem 3.3. Indeed, as shown by Bickel and Freedman (1981), the bootstrap need not be even pointwise asymptotically valid under the conditions on 𝐏 in Theorem 3.3.

4 Appendix

4.1 Auxiliary Results

Lemma 4.1. Let F and G be (nonrandom) distribution functions on R. Then the following are true:

(i) If sup_x {G(x) − F(x)} ≤ ε, then G^{-1}(1 − α) ≥ F^{-1}(1 − (α + ε)).

(ii) If sup_x {F(x) − G(x)} ≤ ε, then G^{-1}(α) ≤ F^{-1}(α + ε).

Furthermore, if X ~ F, it follows that:

(iii) If sup_x {G(x) − F(x)} ≤ ε, then P{X ≤ G^{-1}(1 − α)} ≥ 1 − (α + ε).

(iv) If sup_x {F(x) − G(x)} ≤ ε, then P{X ≥ G^{-1}(α)} ≥ 1 − (α + ε).
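For concreteness, a degree-2 U-statistic θ̂_n is just the average of the kernel over all pairs. The sketch below (function name ours) computes it by brute force; with the kernel h(x₁, x₂) = (x₁ − x₂)²/2, the resulting U-statistic is exactly the unbiased sample variance, which gives a closed-form check. The value can then be plugged into the root √n(θ̂_n − θ(P)) of (32) for subsampling or bootstrap resamples.

```python
import numpy as np
from itertools import combinations

def u_statistic(x, h):
    """Degree-2 U-statistic: the average of the symmetric kernel h over
    all n-choose-2 pairs i < j."""
    x = np.asarray(x, dtype=float)
    vals = [h(x[i], x[j]) for i, j in combinations(range(len(x)), 2)]
    return float(np.mean(vals))
```

The identity used in the test, (n choose 2)^{-1} Σ_{i<j} (x_i − x_j)²/2 = (n − 1)^{-1} Σ_i (x_i − x̄)², follows from expanding the squared differences.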
(v) If sup_x |G(x) − F(x)| ≤ ε, then P{G^{-1}(α) ≤ X ≤ G^{-1}(1 − α)} ≥ 1 − 2(α + ε).

If Ĝ is a random distribution function on R, then we have further that:

(vi) If P{sup_x {Ĝ(x) − F(x)} ≤ ε} ≥ 1 − δ, then P{X ≤ Ĝ^{-1}(1 − α)} ≥ 1 − (α + ε + δ).

(vii) If P{sup_x {F(x) − Ĝ(x)} ≤ ε} ≥ 1 − δ, then P{X ≥ Ĝ^{-1}(α)} ≥ 1 − (α + ε + δ).

(viii) If P{sup_x |Ĝ(x) − F(x)| ≤ ε} ≥ 1 − δ, then P{Ĝ^{-1}(α) ≤ X ≤ Ĝ^{-1}(1 − α)} ≥ 1 − 2(α + ε + δ).

Proof: To see (i), first note that sup_x {G(x) − F(x)} ≤ ε implies that G(x) − ε ≤ F(x) for all x ∈ R. Thus,

{x ∈ R : G(x) ≥ 1 − α} = {x ∈ R : G(x) − ε ≥ 1 − α − ε} ⊆ {x ∈ R : F(x) ≥ 1 − α − ε},

from which it follows that

F^{-1}(1 − (α + ε)) = inf{x ∈ R : F(x) ≥ 1 − α − ε} ≤ inf{x ∈ R : G(x) ≥ 1 − α} = G^{-1}(1 − α).

Similarly, to prove (ii), first note that sup_x {F(x) − G(x)} ≤ ε implies that F(x) − ε ≤ G(x) for all x ∈ R, so

{x ∈ R : F(x) ≥ α + ε} = {x ∈ R : F(x) − ε ≥ α} ⊆ {x ∈ R : G(x) ≥ α}.

Therefore, G^{-1}(α) = inf{x ∈ R : G(x) ≥ α} ≤ inf{x ∈ R : F(x) ≥ α + ε} = F^{-1}(α + ε). To prove (iii), note that because sup_x {G(x) − F(x)} ≤ ε, it follows from (i) that {X ≤ G^{-1}(1 − α)} ⊇ {X ≤ F^{-1}(1 − (α + ε))}. Hence, P{X ≤ G^{-1}(1 − α)} ≥ P{X ≤ F^{-1}(1 − (α + ε))} ≥ 1 − (α + ε). Using the same reasoning, (iv) follows from (ii) and the assumption that sup_x {F(x) − G(x)} ≤ ε. To see (v), note that

P{G^{-1}(α) ≤ X ≤ G^{-1}(1 − α)} ≥ 1 − P{X < G^{-1}(α)} − P{X > G^{-1}(1 − α)}
= P{X ≥ G^{-1}(α)} + P{X ≤ G^{-1}(1 − α)} − 1 ≥ 1 − 2(α + ε),

where the first inequality follows from the Bonferroni inequality and the second inequality follows from (iii) and (iv).
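Part (i) of Lemma 4.1 can be checked numerically for empirical distribution functions. The sketch below (names ours) implements the generalized inverse G^{-1}(q) = inf{x : G(x) ≥ q} and the one-sided supremum sup_x {G(x) − F(x)}, which for two step functions is attained at the pooled jump points (and is at least 0, since the difference vanishes at ±∞).

```python
import numpy as np

def cdf_quantile(points, q):
    """Generalized inverse G^{-1}(q) = inf{x : G(x) >= q} of an empirical cdf:
    for sorted sample xs, G(xs[i]) = (i+1)/n, so the answer is xs[ceil(q*n)-1]."""
    xs = np.sort(np.asarray(points, dtype=float))
    idx = max(int(np.ceil(q * len(xs))) - 1, 0)
    return xs[idx]

def sup_diff(points_g, points_f):
    """sup_x {G(x) - F(x)} for two empirical cdfs, via right-limits at the
    pooled jump points; floored at 0 because the difference tends to 0 at +/- infinity."""
    gs, fs = np.sort(points_g), np.sort(points_f)
    grid = np.concatenate([gs, fs])
    g = np.searchsorted(gs, grid, side="right") / len(gs)
    f = np.searchsorted(fs, grid, side="right") / len(fs)
    return float(max((g - f).max(), 0.0))
```

With ε = sup_x {G(x) − F(x)} computed exactly, the lemma guarantees G^{-1}(1 − α) ≥ F^{-1}(1 − (α + ε)) for any α with α + ε < 1, so the assertion in the test must hold for any pair of samples.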
To prove (vi), note that

P{X ≤ Ĝ^{-1}(1 − α)}
≥ P{X ≤ Ĝ^{-1}(1 − α), sup_x {Ĝ(x) − F(x)} ≤ ε}
≥ P{X ≤ F^{-1}(1 − (α + ε)), sup_x {Ĝ(x) − F(x)} ≤ ε}
= P{X ≤ F^{-1}(1 − (α + ε))} − P{X ≤ F^{-1}(1 − (α + ε)), sup_x {Ĝ(x) − F(x)} > ε}
≥ P{X ≤ F^{-1}(1 − (α + ε))} − P{sup_x {Ĝ(x) − F(x)} > ε}
≥ 1 − α − ε − δ,

where the second inequality follows from (i). A similar argument using (ii) establishes (vii). Finally, (viii) follows from (vi) and (vii) by an argument analogous to the one used to establish (v).

4.2 Proof of Theorem 2.1

Lemma 4.2. Let X^{(n)} = (X_1, ..., X_n) be an i.i.d. sequence of random variables with distribution P. Denote by J_n(x, P) the distribution of a real-valued root R_n = R_n(X^{(n)}, P) under P. Let N_n = (n choose b), k_n = ⌊n/b⌋ and define L_n(x, P) according to (6). Then, for any ε > 0, we have that

P{sup_x |L_n(x, P) − J_b(x, P)| > ε} ≤ (1/ε)√(2π/k_n). (43)

We also have that for any 0 < δ < 1

P{sup_x |L_n(x, P) − J_b(x, P)| > ε} ≤ δ/ε + (2/ε) exp{−2 k_n δ²}. (44)

Proof: Let ε > 0 be given and define S_n(x, P; X_1, ..., X_n) by

(1/k_n) Σ_{1≤i≤k_n} I{R_b((X_{b(i−1)+1}, ..., X_{bi}), P) ≤ x} − J_b(x, P).

Denote by 𝕊_n the symmetric group with n elements. Note that using this notation, we may rewrite

(1/N_n) Σ_{1≤i≤N_n} I{R_b(X_{n,(b),i}, P) ≤ x} − J_b(x, P)

as

Z_n(x, P; X_1, ..., X_n) = (1/n!) Σ_{π∈𝕊_n} S_n(x, P; X_{π(1)}, ..., X_{π(n)}).
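The symmetrization step at the end of this proof rests on an exact identity: for a symmetric statistic, the average over all (n choose b) subsamples equals the average over all n! permutations of the average over the k_n = ⌊n/b⌋ disjoint blocks. For small n this can be verified by brute force; the sketch below (names ours) does exactly that with the mean as the block statistic.

```python
import numpy as np
from itertools import combinations, permutations

def subsample_average(x, b, stat):
    """(1/N_n) * sum of stat over all size-b subsets of the sample."""
    vals = [stat(x[list(c)]) for c in combinations(range(len(x)), b)]
    return float(np.mean(vals))

def permuted_block_average(x, b, stat):
    """(1/n!) * sum over permutations pi of the average of stat over the
    k_n = n // b disjoint blocks of the permuted sample."""
    n, k = len(x), len(x) // b
    vals = []
    for pi in permutations(range(n)):
        xp = x[list(pi)]
        vals.append(np.mean([stat(xp[i * b:(i + 1) * b]) for i in range(k)]))
    return float(np.mean(vals))
```

By symmetry, every size-b subset appears in the same number of permutation blocks, which is why the two averages coincide exactly; in the proof this lets the full subsampling estimate Z_n be controlled through the disjoint-block estimate S_n and the Dvoretzky–Kiefer–Wolfowitz inequality.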
Note further that

sup_x |Z_n(x, P; X_1, ..., X_n)| ≤ (1/n!) Σ_{π∈𝕊_n} sup_x |S_n(x, P; X_{π(1)}, ..., X_{π(n)})|,

which is a sum of n! identically distributed random variables. It follows that P{sup_x |Z_n(x, P; X_1, ..., X_n)| > ε} is bounded above by

P{(1/n!) Σ_{π∈𝕊_n} sup_x |S_n(x, P; X_{π(1)}, ..., X_{π(n)})| > ε}. (45)

Using Markov's inequality, (45) can be bounded by

(1/ε) E[sup_x |S_n(x, P; X_1, ..., X_n)|] (46)

= (1/ε) ∫_0^1 P{sup_x |S_n(x, P; X_1, ..., X_n)| > u} du. (47)

We may then use the Dvoretzky–Kiefer–Wolfowitz inequality to bound (47) by

(1/ε) ∫_0^1 2 exp{−2 k_n u²} du,

which, when evaluated, yields

(2√(2π)/(ε√k_n)) [Φ(2√k_n) − 1/2] < (1/ε)√(2π/k_n).

To establish (44), note that for any 0 < δ < 1, we have that

E[sup_x |S_n(x, P; X_1, ..., X_n)|] ≤ δ + P{sup_x |S_n(x, P; X_1, ..., X_n)| > δ}. (48)

The result (44) now follows immediately by using the Dvoretzky–Kiefer–Wolfowitz inequality to bound the second term on the righthand-side of (48).

Lemma 4.3. Let X^{(n)} = (X_1, ..., X_n) be an i.i.d. sequence of random variables with distribution P. Denote by J_n(x, P) the distribution of a real-valued root R_n = R_n(X^{(n)}, P) under P. Let k_n = ⌊n/b⌋ and define L_n(x, P) according to (6). Then, for all ε > 0 and γ ∈ (0, 1), we have the following:

(i) P{sup_x {L_n(x, P) − J_n(x, P)} > ε} ≤ (1/(γε))√(2π/k_n) + I{sup_x {J_b(x, P) − J_n(x, P)} > (1 − γ)ε}.

(ii) P{sup_x {J_n(x, P) − L_n(x, P)} > ε} ≤ (1/(γε))√(2π/k_n) + I{sup_x {J_n(x, P) − J_b(x, P)} > (1 − γ)ε}.
(iii) P{sup_x |L_n(x, P) − J_n(x, P)| > ε} ≤ (1/(γε))√(2π/k_n) + I{sup_x |J_b(x, P) − J_n(x, P)| > (1 − γ)ε}.

Proof: Let ε > 0 and γ ∈ (0, 1) be given. Note that

P{sup_x {L_n(x, P) − J_n(x, P)} > ε}
≤ P{sup_x {L_n(x, P) − J_b(x, P)} + sup_x {J_b(x, P) − J_n(x, P)} > ε}
≤ P{sup_x {L_n(x, P) − J_b(x, P)} > γε} + I{sup_x {J_b(x, P) − J_n(x, P)} > (1 − γ)ε}
≤ (1/(γε))√(2π/k_n) + I{sup_x {J_b(x, P) − J_n(x, P)} > (1 − γ)ε},

where the final inequality follows from Lemma 4.2. A similar argument establishes (ii) and (iii).

Lemma 4.4. Let X^{(n)} = (X_1, ..., X_n) be an i.i.d. sequence of random variables with distribution P ∈ 𝐏. Denote by J_n(x, P) the distribution of a real-valued root R_n = R_n(X^{(n)}, P) under P. Let k_n = ⌊n/b⌋ and define L_n(x, P) according to (6). Let

δ_{1,n}(ε, γ, P) = (1/(γε))√(2π/k_n) + I{sup_x {J_b(x, P) − J_n(x, P)} > (1 − γ)ε}

δ_{2,n}(ε, γ, P) = (1/(γε))√(2π/k_n) + I{sup_x {J_n(x, P) − J_b(x, P)} > (1 − γ)ε}

δ_{3,n}(ε, γ, P) = (1/(γε))√(2π/k_n) + I{sup_x |J_b(x, P) − J_n(x, P)| > (1 − γ)ε}.

Then, for any ε > 0 and γ ∈ (0, 1), we have that

(i) P{R_n ≤ L_n^{-1}(1 − α)} ≥ 1 − (α + ε + δ_{1,n}(ε, γ, P)).

(ii) P{R_n ≥ L_n^{-1}(α)} ≥ 1 − (α + ε + δ_{2,n}(ε, γ, P)).

(iii) P{L_n^{-1}(α/2) ≤ R_n ≤ L_n^{-1}(1 − α/2)} ≥ 1 − (α + 2(ε + δ_{3,n}(ε, γ, P))).

Proof: Let ε > 0 and γ ∈ (0, 1) be given. By part (i) of Lemma 4.3, we have that

P{sup_x {L_n(x, P) − J_n(x, P)} > ε} ≤ δ_{1,n}(ε, γ, P).
Assertion (i) follows by applying part (vi) of Lemma 4.1. Assertions (ii) and (iii) are established similarly.

Proof of Theorem 2.1: To prove (i), note that by part (i) of Lemma 4.4, we have for any ε > 0 and γ ∈ (0, 1) that

inf_{P∈𝐏} P{R_n ≤ L_n^{-1}(1 − α)} ≥ 1 − (α + ε + sup_{P∈𝐏} δ_{1,n}(ε, γ, P)),

where

δ_{1,n}(ε, γ, P) = (1/(γε))√(2π/k_n) + I{sup_x {J_b(x, P) − J_n(x, P)} > (1 − γ)ε}.

By the assumption on sup_{P∈𝐏} sup_x {J_b(x, P) − J_n(x, P)}, we have that for every ε > 0,

sup_{P∈𝐏} δ_{1,n}(ε, γ, P) → 0.

Thus, there exists a sequence ε_n > 0 tending to 0 so that sup_{P∈𝐏} δ_{1,n}(ε_n, γ, P) → 0. The desired claim now follows from applying part (i) of Lemma 4.4 to this sequence. Assertions (ii) and (iii) follow in exactly the same way.

4.3 Proof of Theorem 2.2

We prove only (i). Similar arguments can be used to establish (ii) and (iii). Let η > 0 and α ∈ (0, 1) be given. Choose δ > 0 so that sup_x {J_n(x, P′) − J_n(x, P)} < η/2 whenever ρ(P′, P) < δ for P′ ∈ 𝐏 and P ∈ 𝐏. For n sufficiently large, we have that

sup_{P∈𝐏} P{ρ(P̂_n, P) > δ} < η/4 and sup_{P∈𝐏} P{P̂_n ∉ 𝐏} < η/4.

For such n, we therefore have that

1 − η/2 ≤ inf_{P∈𝐏} P{ρ(P̂_n, P) ≤ δ, P̂_n ∈ 𝐏} ≤ inf_{P∈𝐏} P{sup_x {J_n(x, P̂_n) − J_n(x, P)} ≤ η/2}.

It follows from part (vi) of Lemma 4.1 that for such n

inf_{P∈𝐏} P{R_n ≤ J_n^{-1}(1 − α, P̂_n)} ≥ 1 − (α + η).

Since the choice of η was arbitrary, the desired result follows.
4.4 Proof of Theorem 3.1

Lemma 4.5. Consider any sequence {P_n ∈ 𝐏 : n ≥ 1}, where 𝐏 satisfies (17). Let X_{n,i}, i = 1, ..., n be an i.i.d. sequence of random variables with distribution P_n. Then,

S_n² / σ²(P_n) →_{P_n} 1.

Proof: First assume without loss of generality that μ(P_n) = 0 and σ(P_n) = 1 for all n ≥ 1. Next, note that

S_n² = (1/n) Σ_{1≤i≤n} X_{n,i}² − X̄_n².

By a lemma in Lehmann and Romano (2005), we have that

(1/n) Σ_{1≤i≤n} X_{n,i}² →_{P_n} 1.

By a further lemma in Lehmann and Romano (2005), we have that X̄_n →_{P_n} 0. The desired result therefore follows from the continuous mapping theorem.

Proof of Theorem 3.1: We argue that

sup_{P∈𝐏} sup_x |J_b(x, P) − J_n(x, P)| → 0. (49)

Suppose by way of contradiction that (49) fails. It follows that there exists a subsequence n_l such that Ω(P_{n_l}) → Ω and either

sup_x |J_{n_l}(x, P_{n_l}) − Φ_Ω(x, ..., x)| does not tend to 0 (50)

or

sup_x |J_{b_{n_l}}(x, P_{n_l}) − Φ_Ω(x, ..., x)| does not tend to 0. (51)

To see that neither (50) nor (51) can hold, let

K_n(x, P) = P{√n(X̄_{1,n} − μ_1(P))/σ_1(P) ≤ x_1, ..., √n(X̄_{k,n} − μ_k(P))/σ_k(P) ≤ x_k}. (52)

By Polya's Theorem,

sup_{x∈R^k} |Φ_{Ω(P_{n_l})}(x) − Φ_Ω(x)| → 0.
It therefore follows from the uniform central limit theorem established in Romano and Shaikh (2008) that

K_{n_l}(x, P_{n_l}) →_d Φ_Ω(x).

From Lemma 4.5, we have for 1 ≤ j ≤ k that

S_{j,n_l} / σ_j(P_{n_l}) →_{P_{n_l}} 1.

Hence, by Slutsky's Theorem, the analog K̃_{n_l}(x, P_{n_l}) of (52) with S_{j,n_l} in place of σ_j(P_{n_l}) also satisfies K̃_{n_l}(x, P_{n_l}) →_d Φ_Ω(x). By Polya's Theorem, we therefore have that

sup_{x∈R^k} |K̃_{n_l}(x, P_{n_l}) − Φ_Ω(x)| → 0.

We thus see that (50) can not hold. A similar argument establishes that (51) can not hold. Hence, (49) holds. For any fixed P ∈ 𝐏, we also have that J_n(x, P) tends in distribution to a continuous limiting distribution. Assertions (i)–(iii) therefore follow from Theorem 2.1 and the accompanying remark.

4.5 Proof of Theorem 3.2

We begin with some preliminaries. First note that

J_n(x, P) = P{sup_{t∈R} |B_n(P{(−∞, t]})| ≤ x} = P{sup_{t∈R(P)} |B_n(t)| ≤ x},

where B_n is the uniform empirical process and

R(P) = cl({P{(−∞, t]} : t ∈ R}). (53)

By Theorem 3.85 of Aliprantis and Border (2006), the set of all nonempty closed subsets of [0, 1] is a compact metric space with respect to the Hausdorff metric

d_H(U, V) = inf{η > 0 : U ⊆ V^η, V ⊆ U^η}. (54)

Here,

U^η = ∪_{u∈U} A_η(u),

where A_η(u) is the open ball with center u and radius η. Thus, for any sequence {P_n ∈ 𝐏 : n ≥ 1}, there is a subsequence n_l and a closed set R ⊆ [0, 1] along which

d_H(R(P_{n_l}), R) → 0. (55)

Finally, denote by B the standard Brownian bridge process. By the almost sure representation theorem, we may choose B_n and B so that

sup_{0≤t≤1} |B_n(t) − B(t)| → 0 a.s. (56)
We now argue that

sup_{P∈𝐏} sup_x |J_b(x, P) − J_n(x, P)| → 0. (57)

Suppose by way of contradiction that (57) fails. It follows that there exists a subsequence n_l and a closed subset R ⊆ [0, 1] such that d_H(R(P_{n_l}), R) → 0 and either

sup_x |J_{n_l}(x, P_{n_l}) − J(x)| does not tend to 0 (58)

or

sup_x |J_{b_{n_l}}(x, P_{n_l}) − J(x)| does not tend to 0, (59)

where

J(x) = P{sup_{t∈R} |B(t)| ≤ x}.

Moreover, by the definition of 𝐏, it must be the case that R contains some point different from zero and one. To see that neither (58) nor (59) can hold, note that

|sup_{t∈R(P_{n_l})} |B_n(t)| − sup_{t∈R} |B(t)|| ≤ sup_{0≤t≤1} |B_n(t) − B(t)| + |sup_{t∈R(P_{n_l})} |B(t)| − sup_{t∈R} |B(t)||. (60)

By (56), we see that the first term on the righthand-side of (60) tends to zero a.s. By the a.s. uniform continuity of B(t) and (55), we see that the second term on the righthand-side of (60) tends to zero a.s. Thus,

sup_{t∈R(P_{n_l})} |B_n(t)| →_d sup_{t∈R} |B(t)|.

Since R contains some point different from zero and one, we see from Theorem 11.1 of Davydov et al. (1998) that sup_{t∈R} |B(t)| is continuously distributed. By Polya's Theorem, we therefore have that (58) can not hold. A similar argument establishes that (59) can not hold. Hence, (57) holds. For any fixed P ∈ 𝐏, we also have that J_n(x, P) tends in distribution to a continuous limiting distribution. Assertions (i)–(iii) now follow from Theorem 2.1 and the accompanying remark.

4.6 Proof of Theorem 3.3

Lemma 4.6. Let h be a symmetric kernel of degree m. Denote by J_n(x, P) the distribution of R_n(X^{(n)}, P) defined in (32). Suppose 𝐏 satisfies (35) and (36). Then,

lim_{n→∞} sup_{P∈𝐏} sup_x |J_n(x, P) − Φ(x/σ(P))| = 0.
Proof: Let {P_n ∈ 𝐏 : n ≥ 1} be given and denote by X_{n,i}, i = 1, ..., n an i.i.d. sequence of random variables with distribution P_n. Define

Δ_n(P_n) = n^{1/2}(θ̂_n − θ(P_n)) − m n^{−1/2} Σ_{1≤i≤n} g(X_{n,i}, P_n), (61)

where g(x, P) is defined as in (33). Note that Δ_n(P_n) is itself a mean zero, degenerate U-statistic. By Lemma A on p. 183 of Serfling (1980), we therefore see that

Var_{P_n}[Δ_n(P_n)] = n (n choose m)^{−1} Σ_{2≤c≤m} (m choose c)(n − m choose m − c) ζ_c(P_n), (62)

where the terms ζ_c(P_n) are nondecreasing in c and thus

ζ_c(P_n) ≤ ζ_m(P_n) = Var_{P_n}[h(X_1, ..., X_m)].

Hence,

Var_{P_n}[Δ_n(P_n)] ≤ n (n choose m)^{−1} Σ_{2≤c≤m} (m choose c)(n − m choose m − c) Var_{P_n}[h(X_1, ..., X_m)].

It follows that

Var_{P_n}[Δ_n(P_n)/σ(P_n)] ≤ n (n choose m)^{−1} Σ_{2≤c≤m} (m choose c)(n − m choose m − c) · (Var_{P_n}[h(X_1, ..., X_m)] / σ²(P_n)). (63)

Since

n (n choose m)^{−1} Σ_{2≤c≤m} (m choose c)(n − m choose m − c) → 0,

it follows from (36) that the lefthand-side of (63) tends to zero. Therefore, by Chebyshev's inequality, we see that

Δ_n(P_n)/σ(P_n) →_{P_n} 0.

Next, note that from a lemma in Lehmann and Romano (2005) we have that

m n^{−1/2} Σ_{1≤i≤n} g(X_{n,i}, P_n) / σ(P_n)

converges in distribution to the standard normal distribution under P_n. We therefore have further from Slutsky's Theorem that

n^{1/2}(θ̂_n − θ(P_n)) / σ(P_n)

converges in distribution to the standard normal distribution under P_n. An appeal to Polya's Theorem establishes the desired result.

Proof of Theorem 3.3: From the triangle inequality and Lemma 4.6, we see immediately that

sup_{P∈𝐏} sup_x |J_b(x, P) − J_n(x, P)| → 0.

For any fixed P ∈ 𝐏, we also have that J_n(x, P) tends in distribution to a continuous limiting distribution. Assertions (i)–(iii) therefore follow from Theorem 2.1 and the accompanying remark.
ON THE UNIFORM ASYMPTOTIC VALIDITY OF SUBSAMPLING AND THE BOOTSTRAP. By Joseph P. Romano and Azeem M. Shaikh. Technical Report No. 2010-03, April 2010, Department of Statistics, Stanford University, Stanford, California.
Chapter 13 Sequences and Series of Functions These notes are based on the notes A Teacher s Guide to Calculus by Dr. Louis Talman. The treatment of power series that we find in most of today s elementary
More informationSpring 2014 Advanced Probability Overview. Lecture Notes Set 1: Course Overview, σ-fields, and Measures
36-752 Spring 2014 Advanced Probability Overview Lecture Notes Set 1: Course Overview, σ-fields, and Measures Instructor: Jing Lei Associated reading: Sec 1.1-1.4 of Ash and Doléans-Dade; Sec 1.1 and A.1
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 12: Glivenko-Cantelli and Donsker Results
Introduction to Empirical Processes and Semiparametric Inference Lecture 12: Glivenko-Cantelli and Donsker Results Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics
More informationProblem List MATH 5143 Fall, 2013
Problem List MATH 5143 Fall, 2013 On any problem you may use the result of any previous problem (even if you were not able to do it) and any information given in class up to the moment the problem was
More informationR N Completeness and Compactness 1
John Nachbar Washington University in St. Louis October 3, 2017 R N Completeness and Compactness 1 1 Completeness in R. As a preliminary step, I first record the following compactness-like theorem about
More informationStatistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation
Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider
More informationON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS
Bendikov, A. and Saloff-Coste, L. Osaka J. Math. 4 (5), 677 7 ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS ALEXANDER BENDIKOV and LAURENT SALOFF-COSTE (Received March 4, 4)
More informationEmpirical Processes: General Weak Convergence Theory
Empirical Processes: General Weak Convergence Theory Moulinath Banerjee May 18, 2010 1 Extended Weak Convergence The lack of measurability of the empirical process with respect to the sigma-field generated
More informationSection 8.2. Asymptotic normality
30 Section 8.2. Asymptotic normality We assume that X n =(X 1,...,X n ), where the X i s are i.i.d. with common density p(x; θ 0 ) P= {p(x; θ) :θ Θ}. We assume that θ 0 is identified in the sense that
More informationPartial Identification and Confidence Intervals
Partial Identification and Confidence Intervals Jinyong Hahn Department of Economics, UCLA Geert Ridder Department of Economics, USC September 17, 009 Abstract We consider statistical inference on a single
More informationASYMPTOTIC SIZE AND A PROBLEM WITH SUBSAMPLING AND WITH THE M OUT OF N BOOTSTRAP. DONALD W.K. ANDREWS and PATRIK GUGGENBERGER
ASYMPTOTIC SIZE AND A PROBLEM WITH SUBSAMPLING AND WITH THE M OUT OF N BOOTSTRAP BY DONALD W.K. ANDREWS and PATRIK GUGGENBERGER COWLES FOUNDATION PAPER NO. 1295 COWLES FOUNDATION FOR RESEARCH IN ECONOMICS
More informationThe Numerical Delta Method and Bootstrap
The Numerical Delta Method and Bootstrap Han Hong and Jessie Li Stanford University and UCSC 1 / 41 Motivation Recent developments in econometrics have given empirical researchers access to estimators
More informationLecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past.
1 Markov chain: definition Lecture 5 Definition 1.1 Markov chain] A sequence of random variables (X n ) n 0 taking values in a measurable state space (S, S) is called a (discrete time) Markov chain, if
More informationMaths 212: Homework Solutions
Maths 212: Homework Solutions 1. The definition of A ensures that x π for all x A, so π is an upper bound of A. To show it is the least upper bound, suppose x < π and consider two cases. If x < 1, then
More informationChapter 7. Hypothesis Testing
Chapter 7. Hypothesis Testing Joonpyo Kim June 24, 2017 Joonpyo Kim Ch7 June 24, 2017 1 / 63 Basic Concepts of Testing Suppose that our interest centers on a random variable X which has density function
More informationStatistics 581, Problem Set 1 Solutions Wellner; 10/3/ (a) The case r = 1 of Chebychev s Inequality is known as Markov s Inequality
Statistics 581, Problem Set 1 Solutions Wellner; 1/3/218 1. (a) The case r = 1 of Chebychev s Inequality is known as Markov s Inequality and is usually written P ( X ɛ) E( X )/ɛ for an arbitrary random
More informationSIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER. Donald W. K. Andrews. August 2011 Revised March 2012
SIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER By Donald W. K. Andrews August 2011 Revised March 2012 COWLES FOUNDATION DISCUSSION PAPER NO. 1815R COWLES FOUNDATION FOR
More informationarxiv: v3 [math.st] 23 May 2016
Inference in partially identified models with many moment arxiv:1604.02309v3 [math.st] 23 May 2016 inequalities using Lasso Federico A. Bugni Mehmet Caner Department of Economics Department of Economics
More informationProofs for Large Sample Properties of Generalized Method of Moments Estimators
Proofs for Large Sample Properties of Generalized Method of Moments Estimators Lars Peter Hansen University of Chicago March 8, 2012 1 Introduction Econometrica did not publish many of the proofs in my
More informationClosest Moment Estimation under General Conditions
Closest Moment Estimation under General Conditions Chirok Han Victoria University of Wellington New Zealand Robert de Jong Ohio State University U.S.A October, 2003 Abstract This paper considers Closest
More informationSolutions Final Exam May. 14, 2014
Solutions Final Exam May. 14, 2014 1. (a) (10 points) State the formal definition of a Cauchy sequence of real numbers. A sequence, {a n } n N, of real numbers, is Cauchy if and only if for every ɛ > 0,
More informationSome Background Material
Chapter 1 Some Background Material In the first chapter, we present a quick review of elementary - but important - material as a way of dipping our toes in the water. This chapter also introduces important
More informationPriors for the frequentist, consistency beyond Schwartz
Victoria University, Wellington, New Zealand, 11 January 2016 Priors for the frequentist, consistency beyond Schwartz Bas Kleijn, KdV Institute for Mathematics Part I Introduction Bayesian and Frequentist
More informationINFERENCE FOR PARAMETERS DEFINED BY MOMENT INEQUALITIES USING GENERALIZED MOMENT SELECTION. DONALD W. K. ANDREWS and GUSTAVO SOARES
INFERENCE FOR PARAMETERS DEFINED BY MOMENT INEQUALITIES USING GENERALIZED MOMENT SELECTION BY DONALD W. K. ANDREWS and GUSTAVO SOARES COWLES FOUNDATION PAPER NO. 1291 COWLES FOUNDATION FOR RESEARCH IN
More informationCopyright 2010 Pearson Education, Inc. Publishing as Prentice Hall.
.1 Limits of Sequences. CHAPTER.1.0. a) True. If converges, then there is an M > 0 such that M. Choose by Archimedes an N N such that N > M/ε. Then n N implies /n M/n M/N < ε. b) False. = n does not converge,
More informationQED. Queen s Economics Department Working Paper No Hypothesis Testing for Arbitrary Bounds. Jeffrey Penney Queen s University
QED Queen s Economics Department Working Paper No. 1319 Hypothesis Testing for Arbitrary Bounds Jeffrey Penney Queen s University Department of Economics Queen s University 94 University Avenue Kingston,
More informationCHAPTER I THE RIESZ REPRESENTATION THEOREM
CHAPTER I THE RIESZ REPRESENTATION THEOREM We begin our study by identifying certain special kinds of linear functionals on certain special vector spaces of functions. We describe these linear functionals
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 08: Stochastic Convergence
Introduction to Empirical Processes and Semiparametric Inference Lecture 08: Stochastic Convergence Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations
More informationTesting Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata
Maura Department of Economics and Finance Università Tor Vergata Hypothesis Testing Outline It is a mistake to confound strangeness with mystery Sherlock Holmes A Study in Scarlet Outline 1 The Power Function
More informationLarge Sample Theory. Consider a sequence of random variables Z 1, Z 2,..., Z n. Convergence in probability: Z n
Large Sample Theory In statistics, we are interested in the properties of particular random variables (or estimators ), which are functions of our data. In ymptotic analysis, we focus on describing the
More informationFundamental Inequalities, Convergence and the Optional Stopping Theorem for Continuous-Time Martingales
Fundamental Inequalities, Convergence and the Optional Stopping Theorem for Continuous-Time Martingales Prakash Balachandran Department of Mathematics Duke University April 2, 2008 1 Review of Discrete-Time
More informationPROCEDURES CONTROLLING THE k-fdr USING. BIVARIATE DISTRIBUTIONS OF THE NULL p-values. Sanat K. Sarkar and Wenge Guo
PROCEDURES CONTROLLING THE k-fdr USING BIVARIATE DISTRIBUTIONS OF THE NULL p-values Sanat K. Sarkar and Wenge Guo Temple University and National Institute of Environmental Health Sciences Abstract: Procedures
More informationSDS : Theoretical Statistics
SDS 384 11: Theoretical Statistics Lecture 1: Introduction Purnamrita Sarkar Department of Statistics and Data Science The University of Texas at Austin https://psarkar.github.io/teaching Manegerial Stuff
More information2 Sequences, Continuity, and Limits
2 Sequences, Continuity, and Limits In this chapter, we introduce the fundamental notions of continuity and limit of a real-valued function of two variables. As in ACICARA, the definitions as well as proofs
More informationMath 118B Solutions. Charles Martin. March 6, d i (x i, y i ) + d i (y i, z i ) = d(x, y) + d(y, z). i=1
Math 8B Solutions Charles Martin March 6, Homework Problems. Let (X i, d i ), i n, be finitely many metric spaces. Construct a metric on the product space X = X X n. Proof. Denote points in X as x = (x,
More informationEconomics 204 Fall 2011 Problem Set 2 Suggested Solutions
Economics 24 Fall 211 Problem Set 2 Suggested Solutions 1. Determine whether the following sets are open, closed, both or neither under the topology induced by the usual metric. (Hint: think about limit
More informationLecture 8 Inequality Testing and Moment Inequality Models
Lecture 8 Inequality Testing and Moment Inequality Models Inequality Testing In the previous lecture, we discussed how to test the nonlinear hypothesis H 0 : h(θ 0 ) 0 when the sample information comes
More informationSTAT 461/561- Assignments, Year 2015
STAT 461/561- Assignments, Year 2015 This is the second set of assignment problems. When you hand in any problem, include the problem itself and its number. pdf are welcome. If so, use large fonts and
More informationDoes k-th Moment Exist?
Does k-th Moment Exist? Hitomi, K. 1 and Y. Nishiyama 2 1 Kyoto Institute of Technology, Japan 2 Institute of Economic Research, Kyoto University, Japan Email: hitomi@kit.ac.jp Keywords: Existence of moments,
More informationFunctional Analysis HW #3
Functional Analysis HW #3 Sangchul Lee October 26, 2015 1 Solutions Exercise 2.1. Let D = { f C([0, 1]) : f C([0, 1])} and define f d = f + f. Show that D is a Banach algebra and that the Gelfand transform
More informationStat 710: Mathematical Statistics Lecture 31
Stat 710: Mathematical Statistics Lecture 31 Jun Shao Department of Statistics University of Wisconsin Madison, WI 53706, USA Jun Shao (UW-Madison) Stat 710, Lecture 31 April 13, 2009 1 / 13 Lecture 31:
More information3 Integration and Expectation
3 Integration and Expectation 3.1 Construction of the Lebesgue Integral Let (, F, µ) be a measure space (not necessarily a probability space). Our objective will be to define the Lebesgue integral R fdµ
More informationEstimation of the functional Weibull-tail coefficient
1/ 29 Estimation of the functional Weibull-tail coefficient Stéphane Girard Inria Grenoble Rhône-Alpes & LJK, France http://mistis.inrialpes.fr/people/girard/ June 2016 joint work with Laurent Gardes,
More informationThe Lindeberg central limit theorem
The Lindeberg central limit theorem Jordan Bell jordan.bell@gmail.com Department of Mathematics, University of Toronto May 29, 205 Convergence in distribution We denote by P d the collection of Borel probability
More informationBOOTSTRAPPING SAMPLE QUANTILES BASED ON COMPLEX SURVEY DATA UNDER HOT DECK IMPUTATION
Statistica Sinica 8(998), 07-085 BOOTSTRAPPING SAMPLE QUANTILES BASED ON COMPLEX SURVEY DATA UNDER HOT DECK IMPUTATION Jun Shao and Yinzhong Chen University of Wisconsin-Madison Abstract: The bootstrap
More informationProblem set 1, Real Analysis I, Spring, 2015.
Problem set 1, Real Analysis I, Spring, 015. (1) Let f n : D R be a sequence of functions with domain D R n. Recall that f n f uniformly if and only if for all ɛ > 0, there is an N = N(ɛ) so that if n
More informationMaximum Likelihood Estimation
Chapter 8 Maximum Likelihood Estimation 8. Consistency If X is a random variable (or vector) with density or mass function f θ (x) that depends on a parameter θ, then the function f θ (X) viewed as a function
More information