An elementary proof of the weak convergence of empirical processes

An elementary proof of the weak convergence of empirical processes

Dragan Radulović
Department of Mathematics, Florida Atlantic University

Marten Wegkamp
Department of Mathematics & Department of Statistical Science, Cornell University

February 2016

Abstract. This paper develops a simple technique for proving the weak convergence of a stochastic process $\tilde Z_n(g) := \int g \, dZ_n$, indexed by functions $g$ in some class $\mathcal{G}$. The main novelty is a decoupling argument that allows us to derive the asymptotic equicontinuity of the process $\{\tilde Z_n(g),\ g \in \mathcal{G}\}$ from that of the basic process $\{Z_n(t),\ t \in \mathbb{R}\}$, with $Z_n(t) = \tilde Z_n(f_t)$ and $f_t(x) = 1_{(-\infty,t]}(x)$. The method leads to novel results for empirical processes based on stationary sequences and their bootstrap versions.

Running title: Weak convergence of empirical processes indexed by functions of bounded variation.

MSC2000 subject classification: Primary 60F17; secondary 60G99.

Keywords and phrases: Integration by parts, bounded variation, weak convergence.

1 Introduction

This paper describes a simple, yet very general technique for proving the weak convergence of a stochastic process $\tilde Z_n(g) := \int g \, dZ_n$, indexed by functions $g$ belonging to some class $\mathcal{G}$. The canonical situation we have in mind is when $Z_n$ equals the standard empirical process

$$G_n(t) = \frac{1}{\sqrt n} \sum_{i=1}^n \big( 1_{(-\infty,t]}(X_i) - F(t) \big), \qquad t \in \mathbb{R}, \qquad (1)$$

based on a stationary sequence of random variables $X_i$ with distribution function $F$.

The traditional proof of weak convergence of the empirical process $\{\int g \, dG_n,\ g \in \mathcal{G}\}$ in $\ell^\infty(\mathcal{G})$ based on i.i.d. observations $X_1, X_2, \ldots$ uses maximal inequalities, which in turn rely on bounds involving metric entropy numbers of the class $\mathcal{G}$; see, for instance, Van der Vaart & Wellner (1996, Chapter 2). Instead, we show that stochastic equicontinuity of the process $\{\tilde Z_n(g),\ g \in \mathcal{G}\}$ follows from weak convergence of the (arbitrary) process $\{Z_n(t),\ t \in \mathbb{R}\}$. The heart of our argument is Lemma 3, which decouples the quantity $\tilde Z_n(g)$ into two parts: one that is controlled by the process $\{Z_n(t),\ t \in \mathbb{R}\}$, and another, non-probabilistic part, that is bounded by the $L_1$ and total variation norms of the function $g$. The proof uses a simple integration by parts trick and a finite dimensional approximation technique. It does not involve any metric entropy computations and relies only on elementary measure theory.

The maximal inequalities are well known in the i.i.d. setting, in particular for uniformly bounded classes of functions of bounded variation; see, for instance, Van der Vaart & Wellner (1996, p. 149, Example 2.6.21, or p. 159, Theorem 2.7.5). Thus, for standard empirical processes $G_n$ based on i.i.d. variables, the novelty of our results is purely aesthetic in nature. However, the results presented in this paper are far more general and apply to a variety of interesting and novel situations. In particular, we are able to greatly enhance the existing results on stationary dependent sequences and the bootstrap.
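To fix ideas, both the basic process $G_n(t)$ in (1) and the indexed process $\int g \, dG_n$ are easy to compute numerically. The following sketch (the function names, the uniform sample, and the grid-based quadrature are our illustrative choices, not the paper's) evaluates $G_n$ at given points and $\int g \, dG_n$ for a function $g$ of bounded variation:

```python
import numpy as np

def empirical_process(ts, sample, cdf):
    """G_n(t) = sqrt(n) * (F_n(t) - F(t)) evaluated at the points ts."""
    ts = np.asarray(ts, dtype=float)
    fn = np.mean(sample[None, :] <= ts[:, None], axis=1)  # empirical c.d.f.
    return np.sqrt(len(sample)) * (fn - cdf(ts))

def indexed_process(g, sample, cdf, grid):
    """int g dG_n = sqrt(n) * (mean g(X_i) - int g dF); the integral
    int g dF is approximated by a midpoint rule on the given grid."""
    dF = np.diff(cdf(grid))
    mid = 0.5 * (grid[:-1] + grid[1:])
    int_g_dF = np.sum(g(mid) * dF)
    return np.sqrt(len(sample)) * (np.mean(g(sample)) - int_g_dF)

rng = np.random.default_rng(0)
X = rng.uniform(size=1000)
cdf = lambda t: np.clip(t, 0.0, 1.0)            # uniform(0,1) c.d.f.
grid = np.linspace(0.0, 1.0, 10001)
z = indexed_process(lambda s: s, X, cdf, grid)  # g(x) = x has total variation 1 on [0,1]
```

The sketch exploits the identity $\int g \, dG_n = \sqrt n \, \big( n^{-1} \sum_i g(X_i) - \int g \, dF \big)$, which holds for any bounded measurable $g$ since $G_n$ places mass $n^{-1/2}$ at each $X_i$ and subtracts the centering $\sqrt n \, dF$.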
Only the rather restrictive case of beta-mixing is known to give results comparable to the i.i.d. case, partly because good maximal inequalities for $Z_n$ exist in this case; see, for instance, Arcones & Yu (1994), Radulović (1996) and Rio (1998). Moreover, the standard argument of enlarging a P-Donsker class (in our case $\{1_{(-\infty,t]}\}$) to its convex hull does not seem to apply to stationary sequences. To the best of our knowledge, the proof of such an extension (see Van der Vaart & Wellner, 1996, page 191) works for i.i.d. sequences only.

The paper is organized as follows. Section 2 contains our main results (Lemma 3, Theorem 5) with their short, elementary proofs, followed by a discussion in Section 3. The important extension to distributions on $\mathbb{R}^d$, $d \ge 2$, will be considered separately in another paper.

2 Main result

We use the notation

$$\|g\|_{TV} = \sup_\Pi \sum_{x_i \in \Pi} |g(x_i) - g(x_{i-1})|$$

for the total variation norm of a function $g : \mathbb{R} \to \mathbb{R}$. Here the supremum is taken over all countable partitions $\Pi = \{x_1 < x_2 < \ldots\}$ of $\mathbb{R}$. We set $BV_T := \{g : \mathbb{R} \to \mathbb{R} : \|g\|_{TV} \le T\}$ for $T > 0$.

Throughout this paper, we make the blanket assumption that $\{Z_n(t),\ t \in \mathbb{R}\}$ is an arbitrary stochastic process such that

(i) $\lim_{t \to \pm\infty} Z_n(t) = 0$;

(ii) the sample paths of $Z_n$ are right-continuous and of bounded variation.

Clearly, both requirements (i) and (ii) are met for the canonical empirical process $G_n(t)$ in (1) based on (not necessarily independent) observations from a distribution $F$. In this paper, we study the limit distribution of the process

$$\tilde Z_n(g) := \int g(x) \, dZ_n(x), \qquad g \in \mathcal{G},$$

for some class $\mathcal{G} \subset BV_T$, with $T$ finite.

Recall that (see, for instance, Van der Vaart & Wellner, 1996, Chapter 1.5) the process $\{\tilde Z_n(g),\ g \in \mathcal{G}\}$ converges weakly to a tight limit in $\ell^\infty(\mathcal{G})$, provided

(a) the marginals $(\tilde Z_n(g_1), \ldots, \tilde Z_n(g_k))$ converge weakly for every finite subset $g_1, \ldots, g_k \in \mathcal{G}$, and

(b) there exists a semi-metric $\rho$ on $\mathcal{G}$ such that $(\mathcal{G}, \rho)$ is totally bounded and $\tilde Z_n(g)$ is $\rho$-stochastically equicontinuous, that is,

$$\lim_{\delta \downarrow 0} \limsup_{n \to \infty} \mathbb{P}\Big\{ \sup_{\rho(f,g) \le \delta} |\tilde Z_n(f) - \tilde Z_n(g)| > \varepsilon \Big\} = 0 \quad \text{for all } \varepsilon > 0.$$

The supremum is taken over all $f, g \in \mathcal{G}$ with $\rho(f,g) \le \delta$. Point (a) will be addressed in Section 2.1, while point (b) will be addressed in Section 2.2. For $\rho$ we will take any $L_p(F_0)$ semi-metric based on an arbitrary cumulative distribution function (c.d.f.) $F_0$, and we say that $\tilde Z(f)$ is $L_1(F_0)$-continuous if its sample paths are continuous with respect to the $L_1(F_0)$ semi-norm.

First, we note that we may consider only right-continuous functions $g$. This is a consequence of the following lemma.

Lemma 1. Let $\overline{BV}_T \subset BV_T$ be the class of all right-continuous functions $g : \mathbb{R} \to \mathbb{R}$ with $\|g\|_{TV} \le T$. We have

$$\sup_{g \in BV_T} \inf_{h \in \overline{BV}_T} |\tilde Z_n(g) - \tilde Z_n(h)| \le T \sup_x |Z_n(x) - Z_n(x^-)|.$$

Proof. Let $g$ be an arbitrary function in $BV_T$. Recall that a function of bounded variation on the real line has only countably many discontinuities. We denote the points of discontinuity of $g$ by $a_i$. Let $\bar g$ be the right-continuous version of $g$, that is, $\bar g(x) = g(x)$ for all $x \ne a_i$, and $\bar g(a_i) = g(a_i^+)$ for all $i$. Then

$$\Big| \int g \, dZ_n - \int \bar g \, dZ_n \Big| \le \sum_i |g(a_i) - \bar g(a_i)| \, |Z_n(a_i) - Z_n(a_i^-)| \le \|g\|_{TV} \sup_x |Z_n(x) - Z_n(x^-)|.$$

The conclusion follows easily.
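The total variation norm above can be approximated from below on any finite partition, which is often enough to check membership in $BV_T$ numerically. A small illustration (our own helper and grid, chosen for illustration; refining the partition increases the sum toward $\|g\|_{TV}$):

```python
import numpy as np

def tv_on_partition(g, grid):
    """Sum of |g(x_i) - g(x_{i-1})| over a finite partition: a lower
    bound for ||g||_TV that is exact on each monotone piece of g."""
    vals = np.array([g(x) for x in grid])
    return float(np.sum(np.abs(np.diff(vals))))

grid = np.linspace(-5.0, 5.0, 100001)
tv_indicator = tv_on_partition(lambda x: float(x <= 0.0), grid)         # f_t has TV 1
tv_sigmoid = tv_on_partition(lambda x: 1.0 / (1.0 + np.exp(-x)), grid)  # monotone: TV = range
tv_sine = tv_on_partition(np.sin, grid)                                 # finite variation on [-5, 5]
```

Monotone functions, like the sigmoid here, have total variation equal to their range, and any $g \in BV_T$ is a difference of two monotone functions; this is why uniformly bounded variation classes are so common in applications.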

In case $Z_n$ is the ordinary empirical process $G_n$ in (1), the term on the right in the bound of Lemma 1 is of order $O(T n^{-1/2})$.

2.1 Finite dimensional convergence

First we address the finite dimensional convergence. In many cases, the limiting distribution is Gaussian, and the weak convergence of $(\tilde Z_n(g_1), \ldots, \tilde Z_n(g_k))$ for every finite $g_1, \ldots, g_k \in \mathcal{G}$ follows from that of $\tilde Z_n(g)$ for each $g \in \mathcal{G}$ by the Cramér-Wold device.

Lemma 2. Assume that the stochastic process $Z_n(t)$ converges weakly to a Gaussian process $Z(t)$. Then, for any right-continuous function $g : \mathbb{R} \to \mathbb{R}$ of bounded variation, $\tilde Z_n(g) := \int g \, dZ_n$ is well defined, and converges to a normal distribution on $\mathbb{R}$.

Proof. Let $g$ be an arbitrary right-continuous function of bounded variation. First, we notice that $\int g \, dZ_n$ and $\int Z_n \, dg$ are indeed well defined as Lebesgue-Stieltjes integrals. Recall that a function of bounded variation on the real line has only countably many discontinuities. We denote the points of discontinuity of $g$ by $a_i$. By the integration by parts formula, Lemma 12 in the appendix, we have

$$\int g(x) \, dZ_n(x) = T_1(Z_n) + T_2(Z_n)$$

with operators $T_1, T_2 : \ell^\infty(\mathbb{R}) \to \mathbb{R}$,

$$T_1(Z_n) := -\int Z_n(x) \, dg(x),$$

$$T_2(Z_n) := \int\int 1_{x=y} \, dg(x) \, dZ_n(y) = \sum_i \big( g(a_i) - g(a_i^-) \big) \big( Z_n(a_i) - Z_n(a_i^-) \big).$$

Since $g$ has finite variation, it is bounded and, for $\alpha_i := g(a_i) - g(a_i^-)$, $\sum_i |\alpha_i| < \infty$ and $\sum_i \alpha_i^2 < \infty$. Hence, the operator $T_2(Z_n) = \sum_i \alpha_i \big( Z_n(a_i) - Z_n(a_i^-) \big)$ is linear. We conclude the proof by observing that the linearity of the operators $T_1$ and $T_2$ and the weak convergence of $Z_n$ to a (Gaussian) process $Z$ ensure that the sequence

of random variables $Y_n := \int g \, dZ_n = T_1(Z_n) + T_2(Z_n)$ converges weakly (to a normal distribution) by the continuous mapping theorem.

2.2 Asymptotic tightness

We now address the asymptotic equicontinuity of the process $\{\tilde Z_n(g),\ g \in \mathcal{G}\}$. Lemma 3 below plays the key role in our analysis. It relates the process $\{\tilde Z_n(g),\ g \in \mathcal{G}\}$ directly to the process $\{Z_n(t),\ t \in \mathbb{R}\}$, using the following quantities. For any c.d.f. $F_0$ and any $\beta > 0$, we define

$$\Psi_{\beta,F_0}(Z_n) := \sup_{|F_0(s) - F_0(t)| \le \beta} |Z_n(s) - Z_n(t)|,$$

$$\eth_{\beta,F_0}(Z_n) := \sup_{F_0(s) - F_0(s^-) > \beta} |Z_n(s) - Z_n(s^-)|.$$

Clearly, $\eth_{\beta,F_0}(Z_n) = 0$ for all $\beta > 0$ if $F_0$ is continuous. In general, for arbitrary $F_0$, the quantity $\eth_{\beta,F_0}(Z_n)$ is bounded in probability, for all $\beta > 0$, by the continuous mapping theorem, as long as $Z_n$ converges weakly. The latter bound, $\eth_{\beta,F_0} = O_p(1)$, suffices in our application of Lemma 3 in Theorem 5 below.

Lemma 3 (Decoupling lemma). For any distribution function $F_0$ on $\mathbb{R}$, any right-continuous function $g : \mathbb{R} \to \mathbb{R}$ of bounded variation, and any $\beta > 0$, we have

$$|\tilde Z_n(g)| \le \big\{ 2\beta^{-1} \|g\|_{L_1(F_0)} + 6 \|g\|_{TV} \big\} \Psi_{2\beta,F_0}(Z_n) + \beta^{-1} \|g\|_{L_1(F_0)} \, \eth_{\beta,F_0}(Z_n).$$

Here $\|g\|_{L_1(F_0)} = \int |g| \, dF_0$.

Proof. Without loss of generality we can assume that $\|g\|_{TV} + \|g\|_{L_1(F_0)} < \infty$. Since $F_0$ is a distribution function, we can construct, for any $0 < \beta < 1$, a finite grid $-\infty = s_0 < s_1 < \cdots < s_{M-1} < s_M < \infty$ such that

$$F_0(s_j^-) - F_0(s_{j-1}) \ge \beta, \qquad 1 - F_0(s_M) < 2\beta \qquad \text{and} \qquad F_0(s_j^-) - F_0(s_{j-1}) \le 2\beta,$$

leaving the possible jump $F_0(s_j) - F_0(s_j^-)$ unspecified. Based on this grid, we approximate $Z_n(t)$ by

$$\bar Z_n(t) = \sum_{j=1}^M Z_n(s_{j-1}) \, 1_{[s_{j-1}, s_j)}(t)$$

and we set $\bar Z_n(\pm\infty) = 0$. We observe that by construction

$$\sup_x |Z_n(x) - \bar Z_n(x)| \le \max_{1 \le j \le M} \sup_{x \in [s_{j-1}, s_j)} |Z_n(x) - Z_n(s_{j-1})| + \sup_{x \in [s_M, \infty)} |Z_n(x)| \le \sup_{|F_0(x) - F_0(y)| \le 2\beta} |Z_n(x) - Z_n(y)| = \Psi_{2\beta,F_0}(Z_n).$$

The process $\bar Z_n$ inherits the bounded variation property from $Z_n$, and

$$\int g(s) \, dZ_n(s) = \int g(s) \, d\bar Z_n(s) + \int g(s) \, d(Z_n - \bar Z_n)(s) \qquad (2)$$

is well defined for any function $g$ of bounded variation. Using the integration by parts formula (Lemma 12 in the appendix), we obtain for the last term on the right

$$\int g(s) \, d(Z_n - \bar Z_n)(s) = -\int (Z_n - \bar Z_n)(s) \, dg(s) + \int\int 1_{x=y} \, dg(x) \, d(Z_n - \bar Z_n)(y).$$

Since $g$ is of bounded variation, it has countably many discontinuities $a_i$, and we can write

$$\Big| \int\int 1_{x=y} \, dg(x) \, d(Z_n - \bar Z_n)(y) \Big| \le \sum_i |g(a_i) - g(a_i^-)| \, \big| (Z_n - \bar Z_n)(a_i) - (Z_n - \bar Z_n)(a_i^-) \big| \le \|g\|_{TV} \sup_i |(Z_n - \bar Z_n)(a_i)| + \|g\|_{TV} \sup_i |(Z_n - \bar Z_n)(a_i^-)| \le 2 \|g\|_{TV} \, \Psi_{2\beta,F_0}(Z_n).$$

Consequently,

$$\Big| \int g(s) \, d(Z_n - \bar Z_n)(s) \Big| \le \Big| \int (Z_n - \bar Z_n)(s) \, dg(s) \Big| + \Big| \int\int 1_{x=y} \, dg(x) \, d(Z_n - \bar Z_n)(y) \Big| \le \Big| \int (Z_n - \bar Z_n)(s) \, dg(s) \Big| + 2 \|g\|_{TV} \, \Psi_{2\beta,F_0}(Z_n) \le 3 \|g\|_{TV} \, \Psi_{2\beta,F_0}(Z_n). \qquad (3)$$

Next, we deal with the finite dimensional approximation

$$\int g \, d\bar Z_n = \sum_{j=1}^M g(s_j) \big( \bar Z_n(s_j) - \bar Z_n(s_{j-1}) \big).$$

By the triangle inequality, we have

$$\Big| \int g \, d\bar Z_n \Big| \le \Big| \sum_{j=1}^M g(s_j) \big( Z_n(s_j^-) - Z_n(s_{j-1}) \big) \Big| + \Big| \sum_{j=1}^M g(s_j) \big( Z_n(s_j) - Z_n(s_j^-) \big) \Big| \qquad (4)$$

and we address each term on the right separately. For the first term on the right, we introduce the step function

$$g_*(t) = \sum_{j=1}^M 1_{(s_{j-1}, s_j]}(t) \inf_{s_{j-1} < s \le s_j} |g(s)|$$

and we observe that

$$\sum_{j=1}^M g_*(s_j) \le \beta^{-1} \sum_{j=1}^M g_*(s_j) \big( F_0(s_j) - F_0(s_{j-1}) \big) = \beta^{-1} \int g_* \, dF_0 \le \beta^{-1} \int |g| \, dF_0 = \beta^{-1} \|g\|_{L_1(F_0)}.$$

Hence, we have

$$\Big| \sum_{j=1}^M g(s_j) \big( Z_n(s_j^-) - Z_n(s_{j-1}) \big) \Big| \le \Psi_{2\beta,F_0}(Z_n) \Big\{ \sum_{j=1}^M \big( |g(s_j)| - g_*(s_j) \big) + \sum_{j=1}^M g_*(s_j) \Big\} \le \Psi_{2\beta,F_0}(Z_n) \big( \|g\|_{TV} + \beta^{-1} \|g\|_{L_1(F_0)} \big). \qquad (5)$$

For the second term in (4), we have

$$\Big| \sum_{j=1}^M g(s_j) \big( Z_n(s_j) - Z_n(s_j^-) \big) \Big| \le \sum_{j:\, F_0(s_j) - F_0(s_j^-) \le \beta} |g(s_j)| \, |Z_n(s_j) - Z_n(s_j^-)| + \sum_{j:\, F_0(s_j) - F_0(s_j^-) > \beta} |g(s_j)| \, |Z_n(s_j) - Z_n(s_j^-)| \le \Psi_{2\beta,F_0}(Z_n) \big( 2 \|g\|_{TV} + \beta^{-1} \|g\|_{L_1(F_0)} \big) + \eth_{\beta,F_0}(Z_n) \, \beta^{-1} \|g\|_{L_1(F_0)}, \qquad (6)$$

using for the last inequality

$$\sum_{j=1}^M |g(s_j)| \le 2 \|g\|_{TV} + \beta^{-1} \|g\|_{L_1(F_0)}$$

(by repeating the computation in (5) above), and

$$\sum_{j=1}^M |g(s_j)| \, 1\{F_0(s_j) - F_0(s_j^-) > \beta\} \le \beta^{-1} \sum_{j=1}^M |g(s_j)| \big( F_0(s_j) - F_0(s_j^-) \big) \le \beta^{-1} \|g\|_{L_1(F_0)}.$$

The proof now follows easily after collecting the bounds (2), (3), (4), (5) and (6).

An immediate corollary is the following result.

Corollary 4. For any c.d.f. $F_0$, and for all $T < \infty$, $\delta > 0$ and $p \ge 1$, we have

$$\sup_{\|g\|_{L_p(F_0)} \le \delta,\ \|g\|_{TV} \le T} \Big| \int g \, dZ_n \Big| \le \big( 2\sqrt\delta + 6T \big) \Psi_{2\sqrt\delta,F_0}(Z_n) + \eth_{\sqrt\delta,F_0}(Z_n) \, \sqrt\delta,$$

where $\|g\|_{L_p(F_0)} = \big( \int |g|^p \, dF_0 \big)^{1/p}$.

Proof. The proof follows trivially from Lemma 3, taking $\beta = \sqrt\delta$ and using $\|g\|_{L_1(F_0)} \le \|g\|_{L_p(F_0)} \le \delta$ for all $p \ge 1$.
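On a finite grid, the modulus $\Psi_{\beta,F_0}(Z_n)$ of Lemma 3 can be evaluated by brute force, which gives a concrete feel for the bound in Corollary 4. A sketch (our own grid-restricted approximation; with the continuous choice $F_0(t) = t$ below, the jump term $\eth_{\beta,F_0}$ vanishes):

```python
import numpy as np

def psi_modulus(zn_vals, f0_vals, beta):
    """Grid version of Psi_{beta,F0}(Z_n): the largest |Z_n(s) - Z_n(t)|
    over grid pairs with |F0(s) - F0(t)| <= beta (O(m^2) brute force)."""
    best = 0.0
    m = len(zn_vals)
    for i in range(m):
        for j in range(i + 1, m):
            if abs(f0_vals[j] - f0_vals[i]) <= beta:
                best = max(best, abs(zn_vals[j] - zn_vals[i]))
    return best

rng = np.random.default_rng(1)
X = rng.uniform(size=500)
grid = np.linspace(0.0, 1.0, 201)
F0 = grid                                     # F0(t) = t on [0, 1]
Gn = np.sqrt(len(X)) * (np.mean(X[None, :] <= grid[:, None], axis=1) - grid)
psi_small = psi_modulus(Gn, F0, beta=0.05)
psi_full = psi_modulus(Gn, F0, beta=1.0)      # unconstrained: max minus min of G_n
```

As $\beta$ shrinks the constrained modulus decreases, which is exactly the mechanism that drives the equicontinuity argument in Theorem 5.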

2.3 Main result

Throughout the paper we view $Z_n$ as a random element of $\ell^\infty(\mathbb{R})$, and its weak convergence to a tight Gaussian process is understood in the sense of Hoffmann-Jørgensen (see Van der Vaart & Wellner, 1996).

Theorem 5. Assume that the stochastic process $\{Z_n(t),\ t \in \mathbb{R}\}$ converges weakly to a Gaussian process $\{Z(t),\ t \in \mathbb{R}\}$ that is continuous with respect to the distance $d(s,t) = |F_0(s) - F_0(t)|$ for some c.d.f. $F_0$. Then, for any $T < \infty$ and $\mathcal{G} \subset BV_T$, the process $\{\tilde Z_n(g),\ g \in \mathcal{G}\}$ converges weakly to an $L_1(F_0)$-continuous Gaussian process in $\ell^\infty(\mathcal{G})$.

Proof. The finite dimensional convergence follows trivially from Lemma 2, invoking the weak convergence of $Z_n(t)$ to a Gaussian limit. As for the stochastic equicontinuity of $\tilde Z_n(g)$, it suffices to show that, for all $\varepsilon > 0$,

$$\lim_{\delta \downarrow 0} \limsup_{n \to \infty} \mathbb{P}\Big\{ \sup_{\|h\|_{L_1(F_0)} \le \delta} \Big| \int h \, dZ_n \Big| > \varepsilon \Big\} = 0.$$

Here the supremum is taken over all differences $h = g - g'$ with $g, g' \in \mathcal{G}$ and $\|h\|_{L_1(F_0)} = \int |h| \, dF_0 \le \delta$. Since $\|h\|_{TV} \le 2T$, we find by Corollary 4 that

$$\sup_{\|h\|_{L_1(F_0)} \le \delta} \Big| \int h \, dZ_n \Big| \le \Psi_{2\sqrt\delta,F_0}(Z_n) \big( 12T + 2\sqrt\delta \big) + \eth_{\sqrt\delta,F_0}(Z_n) \, \sqrt\delta.$$

Let $f_t(x) = 1\{x \le t\}$, so that $Z_n(t) = \tilde Z_n(f_t)$ and $d(s,t) = |F_0(s) - F_0(t)| = \int |f_s - f_t| \, dF_0$, and observe that

$$\sup_{d(s,t) \le \delta} |Z_n(s) - Z_n(t)| = \Psi_{\delta,F_0}(Z_n).$$

The weak convergence of $\{Z_n(t),\ t \in \mathbb{R}\}$ (or, equivalently, of $\tilde Z_n(f_t)$) to a continuous (with respect to $d$) process $Z(t)$ implies that $\Psi_{2\sqrt\delta,F_0}(Z_n) \to 0$ in probability as $\delta \downarrow 0$ and $n \to \infty$.

Moreover, the weak convergence of $\{Z_n(t),\ t \in \mathbb{R}\}$ implies that $\eth_{\delta,F_0}(Z_n)$ is bounded in probability, so that $\eth_{\sqrt\delta,F_0}(Z_n) \, \sqrt\delta \to 0$ in probability as $\delta \downarrow 0$ and $n \to \infty$.

Summarizing, the process $\tilde Z_n(g)$ converges for each $g \in \mathcal{G}$ to a Gaussian limit, $\tilde Z_n(g)$ is uniformly $L_1(F_0)$-equicontinuous in probability, and $(\mathcal{G}, L_1(F_0))$ is totally bounded (this is true for any c.d.f. $F_0$). This implies the weak convergence of $\tilde Z_n(g)$ to an $L_1(F_0)$-continuous Gaussian process; see Theorems 1.5.4 and 1.5.7 in Van der Vaart & Wellner (1996).

3 Discussion

3.1 Alternative proof if $F_0(t) = t$

The proof of Theorem 5 can be shortened considerably if the limit $Z(t)$ of $Z_n(t)$ is continuous. Let $d_{BL}$ be the bounded Lipschitz metric that metrizes weak convergence; see, e.g., Van der Vaart & Wellner (1996, page 73) for the definition. Hence, $d_{BL}(Z_n, Z) \to 0$ as $n \to \infty$. Set $\tilde Z_n(g) = \int g \, dZ_n$ for any $g \in BV_T$. By Lemma 1 and the fact that $Z$ is continuous, we only need to prove weak convergence in $\ell^\infty(\mathcal{G})$ with $\mathcal{G} \subset \overline{BV}_T$. For this class, the Lebesgue-Stieltjes integrals $\hat Z_n(g) = -\int Z_n \, dg$ are well defined. Next, by the integration by parts formula in Lemma 12, we have

$$\tilde Z_n(g) = \hat Z_n(g) + R_n(g) \qquad \text{with} \qquad |R_n(g)| \le T \sup_x |Z_n(x) - Z_n(x^-)| \to 0,$$

in probability, as $n \to \infty$, as $Z$ is continuous. For fixed $g$, $\hat Z_n(g)$ converges weakly to $\tilde Z(g) := -\int Z \, dg$ by the continuous mapping theorem and the weak convergence of $Z_n$. The continuous mapping theorem also guarantees that the limit $\tilde Z$ is tight in $\ell^\infty(\mathcal{G})$, as the map $\Gamma : \ell^\infty(\mathbb{R}) \to \ell^\infty(\mathcal{G})$ defined by $\Gamma(X) = \{-\int X \, dg,\ g \in \mathcal{G}\}$ is continuous. By

the triangle inequality,

$$d_{BL}(\tilde Z_n, \tilde Z) \le d_{BL}(\tilde Z_n, \hat Z_n) + d_{BL}(\hat Z_n, \tilde Z) = \sup_{H \in BL_1} |EH(\tilde Z_n) - EH(\hat Z_n)| + \sup_{H \in BL_1} |EH(\hat Z_n) - EH(\tilde Z)| \le T \, E\big[ \sup_x |Z_n(x) - Z_n(x^-)| \big] + T \sup_{H' \in BL_1} |EH'(Z_n) - EH'(Z)| = T \, E\big[ \sup_x |Z_n(x) - Z_n(x^-)| \big] + T \, d_{BL}(Z_n, Z).$$

The bound on the second term follows since the map $\Gamma(X) := \{-\int X \, dg,\ g \in \mathcal{G}\}$ is Lipschitz with Lipschitz constant $\int |dg| \le T$. The suprema are taken over all Lipschitz functionals $H : \ell^\infty(\mathcal{G}) \to \mathbb{R}$ with $|H| \le 1$ and $|H(X) - H(Y)| \le \|X - Y\|$, and $H' : \ell^\infty(\mathbb{R}) \to \mathbb{R}$ with $|H'| \le 1$ and $|H'(X) - H'(Y)| \le \|X - Y\|$, respectively. Together with the tightness of the limit $\tilde Z$, this implies that the empirical process $\tilde Z_n(g)$ indexed by $g \in \mathcal{G} \subset BV_T$ converges weakly.

3.2 Standard empirical processes based on i.i.d. sequences

Although there are examples of Donsker classes of infinite variation, such as $\{f : [0,1] \to [0,1] : |f(x) - f(y)| \le |x - y|^\alpha\}$ with $1/2 < \alpha < 1$, such cases are the exception rather than the norm. The majority of examples of bounded Donsker classes given in the literature are subsets of $BV_T$. Nevertheless, since any function of bounded variation can be written as a difference of two monotone functions, Theorem 5, for $Z_n(t)$ equal to the standard empirical process $G_n(t)$ in (1) based on i.i.d. $X_1, X_2, \ldots$, is certainly not novel, except for its mathematical proof. For stationary, dependent sequences $X_i$, the situation is radically different.

3.3 Stationary sequences

Theorem 5 allows us to argue weak convergence of $\tilde Z_n(g)$ via $Z_n(t)$, regardless of the structure of the latter process. For instance, taking $Z_n$ to be the standard empirical process $G_n$ based on a stationary sequence $X_i$, we obtain the following corollary as an immediate consequence of Theorem 5.

Corollary 6. Let $X_k$ be a stationary sequence of random variables with distribution function $F$ and alpha-mixing coefficients $\alpha_n$ satisfying $\alpha_n = O(n^{-r})$, $n \ge 1$, for some $r > 1$. Then, for any $\mathcal{G} \subset BV_T$, the empirical process $\{\int g \, dG_n,\ g \in \mathcal{G}\}$ converges weakly to an $L_1(F)$-continuous Gaussian process in $\ell^\infty(\mathcal{G})$.

Proof. It is well known (see Theorem 7.2, page 96, in Rio, 2000) that $\alpha_n = O(n^{-r})$ implies that the standard empirical process $G_n$ converges weakly to a Brownian bridge process with paths that are continuous with respect to the distance $d(s,t) = |F(s) - F(t)|$, for the stationary distribution $F$ of $X_k$. We apply Theorem 5 to obtain the result.

The limit $\tilde Z$ of $\tilde Z_n$ in Corollary 6 is a mean zero Gaussian process with covariance structure

$$E[\tilde Z(f) \tilde Z(g)] = \mathrm{Cov}(f(X_0), g(X_0)) + \sum_{k=1}^\infty \mathrm{Cov}(f(X_0), g(X_k)) + \sum_{k=1}^\infty \mathrm{Cov}(f(X_k), g(X_0)).$$

We would like to point out that alpha-mixing is the least restrictive form of the mixing assumptions available in the literature. To the best of our knowledge, there are very few results that treat processes $\{\int g \, dG_n,\ g \in \mathcal{G}\}$ indexed by functions, and they all require very stringent conditions on the entropy numbers of $\mathcal{G}$ and on the rate of decay of $\alpha_k$; see, for instance, Andrews & Pollard (1994). This is due to the fact that alpha-mixing does not allow for sharp exponential inequalities for partial sums. Consequently, the only known cases for which we have sharp conditions are under the more restrictive beta-mixing dependence. Beta-mixing allows for decoupling and it does yield exponential inequalities not unlike the i.i.d. case. The current state-of-the-art results (Arcones & Yu, 1994; Doukhan, Massart & Rio, 1995), applied to bounded sequences, require $\sum_n \beta_n < \infty$.

However, Theorem 5 goes beyond dependence defined via mixing conditions. For example, it allows for short memory causal linear sequences. These sequences are

defined by

$$X_i = \sum_{j=0}^\infty a_j \xi_{i-j}$$

based on i.i.d. random variables $\xi_i$ and constants $a_j$. While the $X_i$ form a stationary sequence, they do not necessarily satisfy any mixing condition. Weak convergence of the empirical process $\{G_n(t),\ t \in \mathbb{R}\}$ was established under sharp conditions (Doukhan & Surgailis, 1998). To the best of our knowledge, there are no extensions to the more general processes $\tilde Z_n(g)$. Theorem 5 and the Doukhan & Surgailis (1998) result combined imply the following.

Corollary 7. Let $X_i = \sum_{j \ge 0} a_j \xi_{i-j}$ be such that the conditions of Doukhan & Surgailis (1998, pp. 87-88) are satisfied, and let $F$ be the stationary distribution of $X_i$. Then, for any $\mathcal{G} \subset BV_T$, the empirical process $\tilde Z_n = \{\int g \, dG_n,\ g \in \mathcal{G}\}$ converges weakly to an $L_1(F)$-continuous Gaussian process in $\ell^\infty(\mathcal{G})$.

Proof. Doukhan & Surgailis (1998) prove that $G_n(t)$ converges weakly to a Gaussian process in the Skorohod space. Since $F$ is continuous under their assumptions (see Doukhan & Surgailis, 1998, p. 88), the limiting process of $G_n$ is continuous with respect to the distance $d(s,t) = |F(s) - F(t)|$. The result follows from Theorem 5.

The recent papers by Dehling et al. (2009, 2014) offer yet another clever way to prove the weak limit of the standard empirical process $G_n$ based on stationary sequences that are not necessarily mixing. Their technique uses finite dimensional convergence coupled with a bound on the higher moments of partial sums, which in turn controls the dependence structure. Dehling et al. (2009) establishes the weak convergence of $G_n$, while Dehling et al. (2014) extends this idea to more general classes of functions. However, the authors impose cumbersome entropy conditions and only manage to marginally extend the classes. For example, they prove weak convergence of the process $\int f_t \, dG_n$, indexed by functions $f_t(x)$ in the one-dimensional monotone class (with the restrictive requirement that $s \le t$ implies $f_s \le f_t$).
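A causal linear sequence as in Corollary 7 is straightforward to simulate once the filter is truncated; the sketch below (our illustrative choice of geometrically decaying coefficients $a_j = 0.7^j$ and Gaussian innovations, for which the stationary law is continuous) uses a discrete convolution:

```python
import numpy as np

rng = np.random.default_rng(3)

n = 20000
a = 0.7 ** np.arange(30)                  # summable coefficients: short memory
xi = rng.standard_normal(n + len(a) - 1)  # i.i.d. innovations
X = np.convolve(xi, a, mode="valid")      # X_i = sum_{j < 30} a_j xi_{i-j}

var_theory = float(np.sum(a ** 2))        # Var(X_i) for N(0,1) innovations
ecdf_at_0 = float(np.mean(X <= 0.0))      # F(0) = 1/2 by symmetry
```

The simulated $X_i$ are stationary and dependent but need not be mixing; the empirical c.d.f. of the path still concentrates around the continuous stationary law, which is what the corollaries above exploit.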
Theorem 5, applied in their setting, yields a more general result.

Corollary 8. Let $\mathcal{G} \subset BV_T$. Under assumptions (i) and (ii) in Section 1 of Dehling et al. (2009), the empirical process $\{\int g \, dG_n,\ g \in \mathcal{G}\}$ converges weakly to a Gaussian process in $\ell^\infty(\mathcal{G})$.

Proof. The underlying distribution function $F$ of the $X_i$ in Dehling et al. (2009) is continuous; see their display (3) at p. 3702. Consequently, the limiting process of $G_n$ is continuous with respect to the distance $d(s,t) = |F(s) - F(t)|$, and Theorem 5 yields the result.

3.4 Bootstrap

Given the sample $X_1, \ldots, X_n$, we let $G_n^*$ be the bootstrap empirical process

$$G_n^*(t) = \frac{1}{\sqrt{m_n}} \sum_{i=1}^{m_n} \Big( 1_{(-\infty,t]}(X_i^*) - \frac{1}{n} \sum_{i=1}^n 1_{(-\infty,t]}(X_i) \Big), \qquad t \in \mathbb{R},$$

based on a bootstrap sample $X_1^*, \ldots, X_{m_n}^*$. We stress that no additional assumption on the structure of the variables $X_i^*$ is required. Analogous to $\tilde Z_n(g) = \int g \, dG_n$, we define $\tilde Z_n^*(g) := \int g \, dG_n^*$ for any $g \in BV_T$ with $T < \infty$. Recall that the bounded Lipschitz distance

$$d_{BL}(\tilde Z_n^*, \tilde Z) = \sup_{h \in BL_1} \big| E^*[h(\tilde Z_n^*)] - E[h(\tilde Z)] \big|$$

between two processes $\tilde Z_n^*$ and $\tilde Z$ metrizes weak convergence. Here $E^*$ is the expectation over the randomness of the bootstrap sample $X_1^*, \ldots, X_{m_n}^*$, conditionally on the original sample $X_1, \ldots, X_n$, and $BL_1$ is the space of Lipschitz functionals $h : \ell^\infty(\mathcal{G}) \to \mathbb{R}$ with $|h(x)| \le 1$ and $|h(x) - h(y)| \le \|x - y\|$ for all $x, y$. As is customary in the literature, if the random variable $d_{BL}(\tilde Z_n^*, \tilde Z)$ converges to zero in probability, we speak of weak convergence in probability; if it converges to zero almost surely, we speak of weak convergence almost surely.

Theorem 9. Let $\mathcal{G} \subset BV_T$. Assume that, conditionally on $X_1, \ldots, X_n$, in probability, $\{G_n^*(t),\ t \in \mathbb{R}\}$ converges weakly to a Gaussian process that is continuous with respect to the distance $d(s,t) = |F(s) - F(t)|$ based on the stationary distribution $F$ of $X_i$. Then, the process $\tilde Z_n^* = \{\int g \, dG_n^*,\ g \in \mathcal{G}\}$ converges weakly to an $L_1(F)$-continuous Gaussian process in $\ell^\infty(\mathcal{G})$. If the weak convergence of $G_n^*$ holds almost surely, then the conclusion that $\tilde Z_n^*$ converges weakly in $\ell^\infty(\mathcal{G})$ holds almost surely.

Proof. We only prove the in-probability statement; the almost sure statement follows after straightforward changes in the proof. A simple modification of Lemma 2 yields

$$\int g(t) \, dG_n^*(t) = T_1(G_n^*) + T_2(G_n^*)$$

for the same operators $T_1$ and $T_2$ defined in the proof of Lemma 2. Since $G_n^*$ converges to a Gaussian process, the finite dimensional convergence of $\tilde Z_n^*$ follows. As for the stochastic equicontinuity of $\tilde Z_n^*$, we find, analogous to Corollary 4, that, for all $\delta > 0$,

$$\sup_{\|h\|_{TV} \le 2T,\ \|h\|_{L_1(F)} \le \delta} \Big| \int h \, dG_n^* \Big| \le \Psi_{2\sqrt\delta,F}(G_n^*) \big( 12T + 2\sqrt\delta \big) + \eth_{\sqrt\delta,F}(G_n^*) \, \sqrt\delta.$$

Now use the weak convergence of $G_n^*$, as in the proof of Theorem 5, to conclude the result.

If $G_n^*$ and $G_n$ converge to the same limit, then $\tilde Z_n^*$ and $\tilde Z_n$ converge to the same limit in $\ell^\infty(\mathcal{G})$. More precisely, if $d_{BL}(G_n^*, G_n) \to 0$ in probability (or almost surely), then the bootstrap works, in the sense that $d_{BL}(\tilde Z_n^*, \tilde Z_n) \to 0$ in probability (or almost surely).

The literature offers numerous bootstrapping techniques for stationary data (moving block bootstrap, stationary bootstrap, sieved bootstrap, Markov chain bootstrap, etc.), but their validity is proved for specific cases and statistics only. Due to complications with entropy calculations for dependent triangular arrays, almost all results are established for the standard empirical process $G_n$, with few notable exceptions. The moving block bootstrap was justified for VC-type classes, but only under rather restrictive beta-mixing conditions on $X_i$ (Radulović, 1996). Bracketing classes were considered by Bühlmann (1995), but his conditions are even more restrictive. In contrast, the process $\{G_n^*(t),\ t \in \mathbb{R}\}$ is rather easy to bootstrap. This, coupled with Theorem 9, yields the following result.

Corollary 10. Let $X_j$ be a stationary sequence of random variables with continuous stationary c.d.f. $F$ and alpha-mixing coefficients satisfying $\sum_{k \le n} \alpha_k = O(n^\gamma)$ for some $0 < \gamma < 1/3$. Let $G_n^*$ be the bootstrapped standard empirical process based on the moving block bootstrap, with block sizes $b_n$ as specified in Peligrad (1998, p. 882). Then, for $\mathcal{G} \subset BV_T$, the bootstrap empirical process $\tilde Z_n^* = \{\int g \, dG_n^*,\ g \in \mathcal{G}\}$ converges weakly to an $L_1(F)$-continuous Gaussian process, almost surely.

Proof. Theorem 2.3 of Peligrad (1998) establishes the convergence of $G_n^*$ to a Gaussian process that is continuous with respect to the distance $d(s,t) = |F(s) - F(t)|$, and Theorem 9 extends it to $\tilde Z_n^*$.

Just as with the weak convergence of the empirical process based on stationary sequences, there are numerous results that treat the stationary bootstrap for non-mixing sequences. For example, El Ktaibi et al. (2014) study short memory causal linear sequences, and prove weak convergence of $G_n^*$ under conditions akin to the ones required for its non-bootstrap counterpart $G_n$ (Doukhan & Surgailis, 1998). Again, Theorem 9 easily extends this result.

Corollary 11. Let $X_i = \sum_{j \ge 0} a_j \xi_{i-j}$ be a sequence of random variables with stationary distribution $F$ such that the conditions of El Ktaibi et al. (2014) are satisfied. Then, for any $\mathcal{G} \subset BV_T$, the process $\tilde Z_n^* = \{\int g \, dG_n^*,\ g \in \mathcal{G}\}$ converges weakly, almost surely, to an $L_1(F)$-continuous Gaussian limit.

Proof. El Ktaibi et al. (2014) prove the convergence of $G_n^*$ in the Skorohod space, for continuous $F$, which in turn implies that the limiting process $Z(t)$ is continuous with respect to $d(s,t) = |F(s) - F(t)|$. Theorem 9 extends it to $\tilde Z_n^*$.

A Integration by parts formula

For completeness, we state the following classical result and give a simple elementary proof which was communicated to us by David Pollard.

Lemma 12 (Integration by parts). Let $f$ and $g$ be right-continuous functions of bounded variation and define measures $\mu$ and $\nu$ by $\mu(-\infty, x] = f(x) - f(-\infty)$ and $\nu(-\infty, y] = g(y) - g(-\infty)$. Then

$$\int f(x) \, dg(x) + \int g(x) \, df(x) = (fg)(\infty) - (fg)(-\infty) + \int\int 1_{x=y} \, d\mu \, d\nu.$$

Moreover, if either $f(\pm\infty) = 0$ or $g(\pm\infty) = 0$, then

$$\int f(x) \, dg(x) + \int g(x) \, df(x) = \int\int 1_{x=y} \, d\mu \, d\nu.$$

Proof. Set $H(x,y) = 1\{x \le y\}$ and observe that, by the very definition of the Lebesgue integral,

$$f(y) = \int H(x,y) \, d\mu(x) + f(-\infty) \qquad \text{and} \qquad g(x) = \int H(y,x) \, d\nu(y) + g(-\infty).$$

Hence

$$\int f(y) \, d\nu(y) + \int g(x) \, d\mu(x) = \int \Big( \int H(x,y) \, d\mu(x) \Big) d\nu(y) + \int \Big( \int H(y,x) \, d\nu(y) \Big) d\mu(x) + f(-\infty)\{g(\infty) - g(-\infty)\} + g(-\infty)\{f(\infty) - f(-\infty)\}.$$

Next we apply Fubini:

$$\int \Big( \int H(x,y) \, d\mu(x) \Big) d\nu(y) + \int \Big( \int H(y,x) \, d\nu(y) \Big) d\mu(x) = \int\int (1_{x \le y} + 1_{y \le x}) \, d\mu(x) \, d\nu(y) = \int\int d\mu(x) \, d\nu(y) + \int\int 1_{x=y} \, d\mu(x) \, d\nu(y) = \{f(\infty) - f(-\infty)\}\{g(\infty) - g(-\infty)\} + \int\int 1_{x=y} \, d\mu(x) \, d\nu(y).$$

Combine the two previous displays to prove the lemma.

Acknowledgement. We thank David Pollard for the elegant proof of the integration by parts formula.
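The identity of Lemma 12 is easy to verify numerically when $\mu$ and $\nu$ are purely atomic. In the sketch below (jump locations and sizes chosen arbitrarily for illustration), $f$ and $g$ are right-continuous step functions with $f(-\infty) = g(-\infty) = 0$:

```python
# Right-continuous step functions encoded by their jumps: mu({t}) = f_jumps[t].
f_jumps = {0.0: 1.0, 1.0: -0.5, 2.5: 2.0}
g_jumps = {0.0: 0.3, 1.5: 1.2, 2.5: -1.0}

def step(jumps, x):
    """f(x) = sum of jump sizes at points <= x (right-continuous version)."""
    return sum(h for t, h in jumps.items() if t <= x)

int_f_dg = sum(step(f_jumps, y) * h for y, h in g_jumps.items())   # int f dg
int_g_df = sum(step(g_jumps, x) * h for x, h in f_jumps.items())   # int g df
shared = sum(f_jumps[t] * g_jumps[t] for t in f_jumps.keys() & g_jumps.keys())
lhs = int_f_dg + int_g_df
# (fg)(inf) - (fg)(-inf) + sum of products of matched jumps:
rhs = step(f_jumps, 10.0) * step(g_jumps, 10.0) + shared
lhs_equals_rhs = abs(lhs - rhs) < 1e-12
```

Only the atoms shared by $\mu$ and $\nu$ (here the points $0$ and $2.5$) contribute to the double integral $\int\int 1_{x=y} \, d\mu \, d\nu$, which is exactly the correction term that distinguishes this formula from the classical integration by parts for continuous integrators.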

References

[1] D.W.K. Andrews and D.P. Pollard (1994). An introduction to functional central limit theorems for dependent stochastic processes. International Statistical Review, 62(1), 119–132.

[2] M.A. Arcones and B. Yu (1994). Central limit theorems for empirical and U-processes of stationary mixing sequences. Journal of Theoretical Probability, 7, 47–71.

[3] P. Bühlmann (1995). The blockwise bootstrap for general empirical processes of stationary sequences. Stochastic Processes and their Applications, 58, 247–265.

[4] H. Dehling, O. Durieu and D. Volný (2009). New techniques for empirical processes of dependent data. Stochastic Processes and their Applications, 119, 3699–3718.

[5] H. Dehling, O. Durieu and M. Tusche (2014). Approximating class approach for empirical processes of dependent sequences indexed by functions. Bernoulli, 20, 1372–1403.

[6] P. Doukhan and D. Surgailis (1998). Functional central limit theorem for the empirical process of short memory linear processes. Comptes Rendus de l'Académie des Sciences, Mathématique, 326, 87–92.

[7] F. El Ktaibi, B.G. Ivanoff and N.C. Weber (2014). Bootstrapping the empirical distribution of a linear process. Statistics & Probability Letters, 93, 134–142.

[8] T.H. Hildebrandt (1963). Introduction to the Theory of Integration. Academic Press, New York, London.

[9] M. Peligrad (1998). On the blockwise bootstrap for empirical processes for stationary sequences. Annals of Probability, 26, 877–901.

[10] D. Radulović (1996). The bootstrap for empirical processes based on stationary observations. Stochastic Processes and their Applications, 65, 259–279.

[11] E. Rio (1998). Processus empiriques absolument réguliers et entropie universelle. Probability Theory and Related Fields, 111, 585–608.

[12] E. Rio (2000). Théorie asymptotique des processus aléatoires faiblement dépendants. Collection Mathématiques et Applications, 31. Springer.

[13] G.R. Shorack and J.A. Wellner (2009). Empirical Processes with Applications to Statistics, 2nd edition. SIAM.

[14] A.W. van der Vaart and J.A. Wellner (1996). Weak Convergence and Empirical Processes. Springer Series in Statistics. Springer.

[15] D. Viennet (1997). Inequalities for absolutely regular sequences: application to density estimation. Probability Theory and Related Fields, 107, 467–492.