
First International Workshop on Functional and Operatorial Statistics, Toulouse, June 19-21, 2008

K-sample Subsampling

Dimitris N. Politis and Joseph P. Romano
University of California San Diego and Stanford University
dpolitis@ucsd.edu and romano@stanford.edu

Abstract

The problem of subsampling in two-sample and K-sample settings is addressed, where both the data and the statistics of interest take values in general spaces. We show the asymptotic validity of subsampling confidence intervals and hypothesis tests in the case of independent samples, and give a comparison to the bootstrap in the K-sample setting.

1 Introduction

Subsampling is a statistical method that is most generally valid for nonparametric inference in a large-sample setting. The applications of subsampling are numerous, starting from i.i.d. data and regression, and continuing to time series, random fields, marked point processes, etc.; see Politis, Romano and Wolf (1999) for a review and list of references. Interestingly, the two-sample and K-sample settings have not yet been explored in the subsampling literature; we attempt to fill this gap here.

So, consider $K$ independent datasets $X^{(1)}, \ldots, X^{(K)}$, where $X^{(k)} = (X^{(k)}_1, \ldots, X^{(k)}_{n_k})$ for $k = 1, \ldots, K$. The random variables $X^{(k)}_j$ take values in an arbitrary space $S$; typically $S$ would be $\mathbb{R}^d$ for some $d$, but $S$ can very well be a function space. (Actually, one can let $S$ vary with $k$ as well, but we do not pursue this here for lack of space.) Although the dataset $X^{(k)}$ is independent of $X^{(k')}$ for $k \neq k'$, there may exist some dependence within a dataset. For conciseness, we will focus here on the case of independence within samples; the general case will be treated in a fuller follow-up exposition. Thus, in the sequel we will assume that, for any $k$, the variables $X^{(k)}_1, \ldots, X^{(k)}_{n_k}$ are i.i.d. An example in the i.i.d. case is the usual two-sample set-up in biostatistics where $d$ features (body characteristics, gene expressions, etc.) are measured on a group of patients, and then again measured on a control group.

The probability law associated with such a K-sample experiment is specified by $P = (P_1, \ldots, P_K)$, where $P_k$ is the underlying probability of the $k$th sample; more formally, the joint distribution of all the observations is the product measure $\bigotimes_{k=1}^K P_k^{n_k}$. The goal is inference (confidence regions, hypothesis tests, etc.) regarding some parameter $\theta = \theta(P)$ that takes values in a general normed linear space $B$ with norm denoted by $\|\cdot\|$.
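To fix ideas, here is a minimal Python sketch of this set-up for $K = 2$; the choice of laws $P_1, P_2$, the sample sizes, and the difference-of-means estimator are illustrative assumptions, not prescribed by the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# K = 2 independent samples; within each sample the observations are i.i.d.
# Illustrative laws: P_1 = N(0, 1) and P_2 = N(0.5, 4); sizes n = (n_1, n_2).
n = (100, 150)
X1 = rng.normal(loc=0.0, scale=1.0, size=n[0])
X2 = rng.normal(loc=0.5, scale=2.0, size=n[1])

def theta_hat(x1, x2):
    """Example estimator of theta(P) = mean(P_2) - mean(P_1)."""
    return np.mean(x2) - np.mean(x1)

print(theta_hat(X1, X2))  # consistent as min(n_1, n_2) -> infinity
```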

Denote $n = (n_1, \ldots, n_K)$, and let $\hat\theta_n = \hat\theta_n(X^{(1)}, \ldots, X^{(K)})$ be an estimator of $\theta$. It will be assumed that $\hat\theta_n$ is consistent as $\min_k n_k \to \infty$. In general, one could also consider the case where the number of samples $K$ tends to infinity as well.

Let $g: B \to \mathbb{R}$ be a continuous function, and let $J_n(P)$ denote the sampling distribution of the root $g[\tau_n(\hat\theta_n - \theta(P))]$ under $P$, with corresponding cumulative distribution function

$$J_n(x, P) = \mathrm{Prob}_P\{ g[\tau_n(\hat\theta_n - \theta(P))] \le x \} \qquad (1)$$

where $\tau_n$ is a normalizing sequence; in particular, $\tau_n$ is to be thought of as a fixed function of $n$ such that $\tau_n \to \infty$ when $\min_k n_k \to \infty$. As an example, $g(\cdot)$ might be a continuous function of the norm $\|\cdot\|$ or a projection operator.

As in the one-sample case, the basic assumption that is required for subsampling to work is the existence of a bona fide large-sample distribution, i.e.:

Assumption 1.1 There exists a nondegenerate limiting law $J(P)$ such that $J_n(P)$ converges weakly to $J(P)$ as $\min_k n_k \to \infty$.

The $\alpha$ quantile of $J(P)$ will be denoted by $J^{-1}(\alpha, P) = \inf\{x : J(x, P) \ge \alpha\}$. In addition to Assumption 1.1, we will use the following mild assumption.

Assumption 1.2 As $\min_k n_k \to \infty$, $\tau_b \|\hat\theta_n - \theta(P)\| = o_P(1)$.

Assumptions 1.1 and 1.2 are implied by the following assumption, as long as $\tau_b/\tau_n \to 0$.

Assumption 1.3 As $\min_k n_k \to \infty$, the distribution of $\tau_n(\hat\theta_n - \theta(P))$ under $P$ converges weakly to some distribution (on the Borel $\sigma$-field of the normed linear space $B$).

Here, weak convergence is understood in the modern sense of Hoffmann-Jørgensen; see Section 1.3 of van der Vaart and Wellner (1996). That Assumption 1.3 implies both Assumptions 1.1 and 1.2 follows by the Continuous Mapping Theorem; see Theorem 1.3.6 of van der Vaart and Wellner (1996).

2 Subsampling hypothesis tests in K samples

For $k = 1, \ldots, K$, let $S_k$ denote the set of all size-$b_k$ (unordered) subsets of the dataset $\{X^{(k)}_1, \ldots, X^{(k)}_{n_k}\}$, where $b_k$ is an integer in $[1, n_k]$. Note that the set $S_k$ contains $Q_k = \binom{n_k}{b_k}$ elements, which are enumerated as $S^{(k)}_1, S^{(k)}_2, \ldots, S^{(k)}_{Q_k}$. A K-fold subsample is then constructed by choosing one element from each super-set $S_k$ for $k = 1, \ldots, K$. Thus, a typical K-fold subsample has the form $S^{(1)}_{i_1}, S^{(2)}_{i_2}, \ldots, S^{(K)}_{i_K}$, where $i_k$ is an integer in $[1, Q_k]$ for $k = 1, \ldots, K$. It is apparent that the number of possible K-fold subsamples is $Q = \prod_{k=1}^K Q_k$.

So a subsample value of the general statistic $\hat\theta_n$ is

$$\hat\theta_{i,b} = \hat\theta_b(S^{(1)}_{i_1}, \ldots, S^{(K)}_{i_K}) \qquad (2)$$

where $b = (b_1, \ldots, b_K)$ and $i = (i_1, \ldots, i_K)$. The subsampling distribution approximation to $J_n(P)$ is defined by

$$L_{n,b}(x) = \frac{1}{Q} \sum_{i_1=1}^{Q_1} \sum_{i_2=1}^{Q_2} \cdots \sum_{i_K=1}^{Q_K} 1\{ g[\tau_b(\hat\theta_{i,b} - \hat\theta_n)] \le x \}. \qquad (3)$$
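For small $n_k$ and $b_k$, the subsampling distribution (3) can be computed by exhaustive enumeration. The sketch below does this for the two-sample difference-of-means example above, with $g = |\cdot|$ and the concrete normalization $\tau_n = \sqrt{n_1 n_2/(n_1+n_2)}$; both choices are illustrative assumptions, since the paper leaves $g$ and $\tau_n$ general:

```python
import itertools
import numpy as np

def tau(n1, n2):
    # One concrete normalizing sequence for a two-sample mean difference;
    # the paper only requires tau_n -> infinity as min_k n_k -> infinity.
    return np.sqrt(n1 * n2 / (n1 + n2))

def L_nb(x1, x2, b1, b2, x_grid):
    """Subsampling distribution (3), K = 2, by full enumeration: every
    size-b1 subset of sample 1 is paired with every size-b2 subset of
    sample 2, giving Q = C(n1, b1) * C(n2, b2) K-fold subsamples."""
    theta_n = np.mean(x2) - np.mean(x1)
    tau_b = tau(b1, b2)
    roots = np.array([abs(tau_b * ((np.mean(s2) - np.mean(s1)) - theta_n))
                      for s1 in itertools.combinations(x1, b1)
                      for s2 in itertools.combinations(x2, b2)])  # g = |.|
    return np.array([np.mean(roots <= x) for x in x_grid])

# Tiny example: Q = C(8, 4) * C(9, 4) = 70 * 126 = 8820 subsamples.
rng = np.random.default_rng(1)
x1, x2 = rng.normal(0, 1, 8), rng.normal(0.5, 2, 9)
print(L_nb(x1, x2, b1=4, b2=4, x_grid=[0.5, 1.0, 2.0]))
```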

The distribution $L_{n,b}(x)$ is useful for the construction of subsampling confidence sets, as discussed in Section 3. For hypothesis testing, however, we instead let

$$G_{n,b}(x) = \frac{1}{Q} \sum_{i_1=1}^{Q_1} \sum_{i_2=1}^{Q_2} \cdots \sum_{i_K=1}^{Q_K} 1\{ g[\tau_b \hat\theta_{i,b}] \le x \}, \qquad (4)$$

and consider the general problem of testing a null hypothesis $H_0$ that $P = (P_1, \ldots, P_K) \in \mathbf{P}_0$ against $H_1$ that $P \in \mathbf{P}_1$. The goal is to construct an asymptotically valid null distribution based on some test statistic of the form $g(\tau_n \hat\theta_n)$, whose distribution under $P$ is defined to be $G_n(P)$ (with c.d.f. $G_n(\cdot, P)$). The subsampling critical value is obtained as the $1-\alpha$ quantile of $G_{n,b}(\cdot)$, denoted $g_{n,b}(1-\alpha)$. We will make use of the following assumption.

Assumption 2.1 If $P \in \mathbf{P}_0$, there exists a nondegenerate limiting law $G(P)$ such that $G_n(P)$ converges weakly to $G(P)$ as $\min_k n_k \to \infty$.

Let $G(\cdot, P)$ denote the c.d.f. corresponding to $G(P)$, and let $G^{-1}(1-\alpha, P)$ denote a $1-\alpha$ quantile of $G(P)$. The following result gives the consistency of the procedure under $H_0$, and under a sequence of contiguous alternatives; for the definition of contiguity, see Section 12.3 of Lehmann and Romano (2005). One could also obtain a simple consistency result under fixed alternatives.

Theorem 2.1 Suppose Assumption 2.1 holds. Also assume that, for each $k = 1, \ldots, K$, we have $b_k/n_k \to 0$ and $b_k \to \infty$ as $\min_k n_k \to \infty$.

(i) Assume $P \in \mathbf{P}_0$. If $G(\cdot, P)$ is continuous at its $1-\alpha$ quantile $G^{-1}(1-\alpha, P)$, then

$$g_{n,b}(1-\alpha) \to G^{-1}(1-\alpha, P) \text{ in probability} \qquad (5)$$

and

$$\mathrm{Prob}_P\{ g(\tau_n \hat\theta_n) > g_{n,b}(1-\alpha) \} \to \alpha \quad \text{as } \min_k n_k \to \infty. \qquad (6)$$

(ii) Suppose that, for some $P = (P_1, \ldots, P_K) \in \mathbf{P}_0$, the product measure $P^{n_k}_{k,n_k}$ is contiguous to $P^{n_k}_k$ for $k = 1, \ldots, K$. Then, under such a contiguous sequence, $g(\tau_n \hat\theta_n)$ is tight. Moreover, if it converges in distribution to some random variable $T$, and $G(\cdot, P)$ is continuous at $G^{-1}(1-\alpha, P)$, then the limiting power of the test against such a sequence is $\mathrm{Prob}\{T > G^{-1}(1-\alpha, P)\}$.
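Before turning to the proof, the following sketch illustrates the test for the running two-sample example, with $H_0$: $\theta(P) = 0$ (equal means), $g = |\cdot|$, and the same illustrative $\tau$ as before. Note that, per (4), the subsample statistics entering $G_{n,b}$ are not recentered at $\hat\theta_n$; all names and tuning choices here are assumptions for illustration:

```python
import itertools
import numpy as np

def subsampling_test(x1, x2, b1, b2, alpha=0.05):
    """Subsampling test of H0: theta(P) = 0 at nominal level alpha.
    The critical value g_{n,b}(1 - alpha) is the empirical (1 - alpha)
    quantile of the uncentered subsample roots g[tau_b * theta_{i,b}]."""
    n1, n2 = len(x1), len(x2)
    tau_n = np.sqrt(n1 * n2 / (n1 + n2))
    tau_b = np.sqrt(b1 * b2 / (b1 + b2))
    test_stat = abs(tau_n * (np.mean(x2) - np.mean(x1)))  # g(tau_n theta_hat_n)
    null_roots = [abs(tau_b * (np.mean(s2) - np.mean(s1)))
                  for s1 in itertools.combinations(x1, b1)
                  for s2 in itertools.combinations(x2, b2)]
    crit = np.quantile(null_roots, 1 - alpha)  # approximates g_{n,b}(1 - alpha)
    return test_stat > crit  # True = reject H0

rng = np.random.default_rng(2)
x1, x2 = rng.normal(0, 1, 8), rng.normal(1.5, 1, 9)
print(subsampling_test(x1, x2, b1=4, b2=4))  # reject H0 at level alpha?
```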

Proof: To prove (i), let $x$ be a continuity point of $G(\cdot, P)$. We claim that

$$G_{n,b}(x) \to G(x, P) \text{ in probability}. \qquad (7)$$

To see why, note that $E[G_{n,b}(x)] = G_b(x, P) \to G(x, P)$. So, by Chebyshev's inequality, to show (7) it suffices to show $\mathrm{Var}[G_{n,b}(x)] \to 0$. To do this, let $d = d_n$ be the greatest integer less than or equal to $\min_k (n_k/b_k)$. Then, for $j = 1, \ldots, d$, let $\tilde\theta_{j,b}$ be equal to the statistic $\hat\theta_b$ evaluated at the data set in which the observations from the $k$th sample are $(X^{(k)}_{b_k(j-1)+1}, X^{(k)}_{b_k(j-1)+2}, \ldots, X^{(k)}_{b_k(j-1)+b_k})$. Then set

$$\bar G_{n,b}(x) = d^{-1} \sum_{j=1}^{d} 1\{ g(\tau_b \tilde\theta_{j,b}) \le x \}.$$

By construction, $\bar G_{n,b}(x)$ is an average of i.i.d. 0-1 random variables with expectation $G_b(x, P)$ and variance bounded above by $1/(4 d_n) \to 0$. But $G_{n,b}(x)$ has smaller variance than $\bar G_{n,b}(x)$. This last statement follows by a sufficiency argument from the Rao-Blackwell Theorem; indeed,

$$G_{n,b}(x) = E[\bar G_{n,b}(x) \mid \hat P^{(k)}_{n_k},\ k = 1, \ldots, K],$$

where $\hat P^{(k)}_{n_k}$ is the empirical measure of the $k$th sample. Since these empirical measures are sufficient, it follows that $\mathrm{Var}(G_{n,b}(x)) \le \mathrm{Var}(\bar G_{n,b}(x)) \to 0$. Thus, (7) holds. Then, (5) follows by Lemma 11.2.1(ii) of Lehmann and Romano (2005), and an application of Slutsky's Theorem yields (6).

To prove (ii), we know that $g_{n,b}(1-\alpha) \to G^{-1}(1-\alpha, P)$ in probability under $P$. Contiguity forces the same convergence under the sequence of contiguous alternatives. The result follows by Slutsky's Theorem.

3 Subsampling confidence sets in K samples

Let $c_{n,b}(1-\alpha) = \inf\{x : L_{n,b}(x) \ge 1-\alpha\}$, where $L_{n,b}(x)$ was defined in (3).

Theorem 3.1 Assume Assumptions 1.1 and 1.2, where $g$ is assumed uniformly continuous. Also assume that, for each $k = 1, \ldots, K$, we have $b_k/n_k \to 0$, $\tau_b/\tau_n \to 0$, and $b_k \to \infty$ as $\min_k n_k \to \infty$.

(i) Then $L_{n,b}(x) \to J(x, P)$ in probability for all continuity points $x$ of $J(\cdot, P)$.

(ii) If $J(\cdot, P)$ is continuous at $J^{-1}(1-\alpha, P)$, then the event

$$\{ g[\tau_n(\hat\theta_n - \theta(P))] \le c_{n,b}(1-\alpha) \} \qquad (8)$$

has asymptotic probability equal to $1-\alpha$; therefore, the confidence set $\{\theta : g[\tau_n(\hat\theta_n - \theta)] \le c_{n,b}(1-\alpha)\}$ has asymptotic coverage probability $1-\alpha$.

Proof: Assume without loss of generality that $\theta(P) = 0$, in which case $J_n(P) = G_n(P)$. Let $x$ be a continuity point of $J(\cdot, P)$. First, we claim that

$$L_{n,b}(x) - G_{n,b}(x) \to 0 \text{ in probability}. \qquad (9)$$

Given $\epsilon > 0$, there exists $\delta > 0$ so that $|g(y) - g(y')| < \epsilon$ whenever $\|y - y'\| < \delta$. But then $|g[\tau_b(\hat\theta_{i,b} - \hat\theta_n)] - g(\tau_b \hat\theta_{i,b})| < \epsilon$ if $\tau_b \|\hat\theta_n\| < \delta$, and this latter event has probability tending to one. It follows that, for any fixed $\epsilon > 0$,

$$G_{n,b}(x - \epsilon) \le L_{n,b}(x) \le G_{n,b}(x + \epsilon)$$

with probability tending to one. But the behavior of $G_{n,b}(x)$ was given in Theorem 2.1. Letting $\epsilon \to 0$ through continuity points of $J(\cdot, P)$ yields (9) and hence (i). Part (ii) follows from Slutsky's Theorem.
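For a real-valued $\theta$ and $g = |\cdot|$, Theorem 3.1(ii) yields the two-sided interval $\hat\theta_n \pm c_{n,b}(1-\alpha)/\tau_n$. A sketch for the running example follows, again with the illustrative $\tau$, and with enumeration feasible only for tiny samples:

```python
import itertools
import numpy as np

def subsampling_ci(x1, x2, b1, b2, alpha=0.05):
    """Confidence interval for theta = mean(P_2) - mean(P_1) via Theorem
    3.1: with g = |.|, the set {theta : |tau_n (theta_hat_n - theta)| <=
    c_{n,b}(1-alpha)} is the interval theta_hat_n -/+ c_{n,b}(1-alpha)/tau_n."""
    n1, n2 = len(x1), len(x2)
    tau_n = np.sqrt(n1 * n2 / (n1 + n2))
    tau_b = np.sqrt(b1 * b2 / (b1 + b2))
    theta_n = np.mean(x2) - np.mean(x1)
    roots = [abs(tau_b * ((np.mean(s2) - np.mean(s1)) - theta_n))
             for s1 in itertools.combinations(x1, b1)
             for s2 in itertools.combinations(x2, b2)]
    c = np.quantile(roots, 1 - alpha)  # c_{n,b}(1 - alpha) from L_{n,b} in (3)
    return theta_n - c / tau_n, theta_n + c / tau_n

rng = np.random.default_rng(3)
x1, x2 = rng.normal(0, 1, 8), rng.normal(0.5, 2, 9)
print(subsampling_ci(x1, x2, b1=4, b2=4))  # asymptotic 95% interval
```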

Remark. The uniform continuity assumption on $g$ can be weakened to continuity if Assumptions 1.1 and 1.2 are replaced by Assumption 1.3. However, the proof is much more complicated and relies on a K-sample version of Theorem 7.2.1 of Politis, Romano and Wolf (1999).

In general, we may also try to approximate the distribution of a studentized root of the form $g(\tau_n [\hat\theta_n - \theta(P)]/\hat\sigma_n)$, where $\hat\sigma_n$ is some estimator which tends in probability to some finite nonzero constant $\sigma(P)$. The subsampling approximation to this distribution is

$$L^+_{n,b}(x) = \frac{1}{Q} \sum_{i_1=1}^{Q_1} \sum_{i_2=1}^{Q_2} \cdots \sum_{i_K=1}^{Q_K} 1\{ g[\tau_b(\hat\theta_{i,b} - \hat\theta_n)]/\hat\sigma_{i,b} \le x \}, \qquad (10)$$

where $\hat\sigma_{i,b}$ is the estimator $\hat\sigma_b$ computed from the $i$th subsampled data set. Also let $c^+_{n,b}(1-\alpha) = \inf\{x : L^+_{n,b}(x) \ge 1-\alpha\}$.

Theorem 3.2 Assume Assumptions 1.1 and 1.2, where $g$ is assumed uniformly continuous. Let $\hat\sigma_n$ satisfy $\hat\sigma_n \to \sigma(P) > 0$ in probability. Also assume that, for each $k = 1, \ldots, K$, we have $b_k/n_k \to 0$, $\tau_b/\tau_n \to 0$, and $b_k \to \infty$ as $\min_k n_k \to \infty$.

(i) Then $L^+_{n,b}(x) \to J(x\,\sigma(P), P)$ in probability if $J(\cdot, P)$ is continuous at $x\,\sigma(P)$.

(ii) If $J(\cdot, P)$ is continuous at $J^{-1}(1-\alpha, P)/\sigma(P)$, then the event

$$\{ g[\tau_n(\hat\theta_n - \theta(P))]/\hat\sigma_n \le c^+_{n,b}(1-\alpha) \} \qquad (11)$$

has asymptotic probability equal to $1-\alpha$.

4 Random subsamples and the K-sample bootstrap

For large values of $n_k$ and $b_k$, $Q = \prod_k \binom{n_k}{b_k}$ can be a prohibitively large number; considering all possible subsamples may then be impractical and, thus, we may resort to Monte Carlo. To define the algorithm for generating random subsamples of sizes $b_1, \ldots, b_K$ respectively, recall that subsampling in the i.i.d. single-sample case is tantamount to sampling without replacement from the original dataset; see, e.g., Politis et al. (1999, Ch. 2.3). Thus, for $m = 1, \ldots, M$, we can generate the $m$th joint subsample as $X^{(1)}_m, X^{(2)}_m, \ldots, X^{(K)}_m$, where $X^{(k)}_m = \{X^{(k)}_{I_1}, \ldots, X^{(k)}_{I_{b_k}}\}$ and $I_1, \ldots, I_{b_k}$ are $b_k$ numbers drawn randomly without replacement from the index set $\{1, 2, \ldots, n_k\}$. Note that the random indices drawn to generate $X^{(k)}_m$ are independent of those drawn to generate $X^{(k')}_m$ for $k \neq k'$. Thus, a randomly chosen subsample value of the statistic $\hat\theta_n$ is given by $\hat\theta_{m,b} = \hat\theta_b(X^{(1)}_m, \ldots, X^{(K)}_m)$, with corresponding subsampling distribution defined as

$$\hat L_{n,b}(x) = \frac{1}{M} \sum_{m=1}^{M} 1\{ g[\tau_b(\hat\theta_{m,b} - \hat\theta_n)] \le x \}. \qquad (12)$$
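A sketch of this Monte Carlo scheme for the two-sample example follows; the number of random subsamples $M$, the subsample sizes, and the usual illustrative $g$ and $\tau$ are all assumptions:

```python
import numpy as np

def mc_subsampling_roots(x1, x2, b1, b2, M=2000, rng=None):
    """Random K-fold subsamples (K = 2), cf. (12): for each m, draw b_k
    indices WITHOUT replacement from sample k, independently across k,
    and return the M centered, normalized subsample roots (g = |.|)."""
    rng = np.random.default_rng() if rng is None else rng
    tau_b = np.sqrt(b1 * b2 / (b1 + b2))
    theta_n = np.mean(x2) - np.mean(x1)
    roots = np.empty(M)
    for m in range(M):
        s1 = rng.choice(x1, size=b1, replace=False)  # without replacement
        s2 = rng.choice(x2, size=b2, replace=False)  # independent of s1
        roots[m] = abs(tau_b * ((np.mean(s2) - np.mean(s1)) - theta_n))
    return roots

# The 1-alpha quantile of these roots estimates c_{n,b}(1-alpha):
rng = np.random.default_rng(4)
x1, x2 = rng.normal(0, 1, 200), rng.normal(0.5, 2, 300)
print(np.quantile(mc_subsampling_roots(x1, x2, b1=30, b2=45, rng=rng), 0.95))
```

Switching `replace=False` to `replace=True`, with resample sizes satisfying $b_k^2/n_k \to 0$, gives the reduced-resample-size K-sample bootstrap discussed at the end of this section.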

The following corollary shows that $\hat L_{n,b}(x)$ and its $1-\alpha$ quantile $\hat c_{n,b}(1-\alpha)$ can be used for the construction of large-sample confidence regions for $\theta$; its proof is analogous to the proof of Corollary 2.1 of Politis and Romano (1994).

Corollary 4.1 Assume the conditions of Theorem 3.1. As $M \to \infty$, parts (i) and (ii) of Theorem 3.1 remain valid with $\hat L_{n,b}(x)$ and $\hat c_{n,b}(1-\alpha)$ in place of $L_{n,b}(x)$ and $c_{n,b}(1-\alpha)$.

Similarly, hypothesis testing can be conducted using the notion of random subsamples. To describe it, let $\hat g_{n,b}(1-\alpha) = \inf\{x : \hat G_{n,b}(x) \ge 1-\alpha\}$, where

$$\hat G_{n,b}(x) = \frac{1}{M} \sum_{m=1}^{M} 1\{ g[\tau_b \hat\theta_{m,b}] \le x \}. \qquad (13)$$

Corollary 4.2 Assume the conditions of Theorem 2.1. As $M \to \infty$, parts (i) and (ii) of Theorem 2.1 remain valid with $\hat G_{n,b}(x)$ and $\hat g_{n,b}(1-\alpha)$ in place of $G_{n,b}(x)$ and $g_{n,b}(1-\alpha)$.

The bootstrap in two-sample settings is often used in practical work; see Hall and Martin (1988) or van der Vaart and Wellner (1996). In the i.i.d. set-up, resampling and (random) subsampling are very closely related since, as mentioned, they are tantamount to sampling with vs. without replacement from the given i.i.d. sample. By contrast to subsampling, however, no general validity theorem is available for the bootstrap unless a smaller resample size is used; see Politis and Romano (1993). Actually, the general validity of the K-sample bootstrap that uses a resample size of $b_k$ for sample $k$ follows from the general validity of subsampling as long as $b_k^2 \ll n_k$. To state it, let $J^*_{n,b}(x)$ denote the bootstrap (pseudo-empirical) distribution of $g[\tau_b(\hat\theta^*_{n,b} - \hat\theta_n)]$, where $\hat\theta^*_{n,b}$ is the statistic $\hat\theta_b$ computed from the bootstrap data. Similarly, let $c^*_{n,b}(1-\alpha) = \inf\{x : J^*_{n,b}(x) \ge 1-\alpha\}$. The proof of the following corollary parallels the discussion in Section 2.3 of Politis et al. (1999).

Corollary 4.3 Under the additional condition $b_k^2/n_k \to 0$ for all $k$, Theorem 3.1 is valid as stated with $J^*_{n,b}(x)$ and $c^*_{n,b}(1-\alpha)$ in place of $L_{n,b}(x)$ and $c_{n,b}(1-\alpha)$, respectively.

References

[1] Hall, P. and Martin, M. (1988). On the bootstrap and two-sample problems, Australian Journal of Statistics, vol. 30A, 179-192.
[2] Lehmann, E.L. and Romano, J.P. (2005). Testing Statistical Hypotheses, 3rd edition, Springer, New York.
[3] Politis, D.N. and Romano, J.P. (1993). Estimating the distribution of a studentized statistic by subsampling, Bulletin of the International Statistical Institute, 49th Session, Firenze, August 25-September 2, 1993, Book 2, pp. 315-316.
[4] Politis, D.N. and Romano, J.P. (1994). Large sample confidence regions based on subsamples under minimal assumptions, Annals of Statistics, vol. 22, 2031-2050.
[5] Politis, D.N., Romano, J.P. and Wolf, M. (1999). Subsampling, Springer, New York.
[6] van der Vaart, A. and Wellner, J. (1996). Weak Convergence and Empirical Processes, Springer, New York.