On Asymptotic Proximity of Distributions

Similar documents
Weak convergence. Amsterdam, 13 November Leiden University. Limit theorems. Shota Gugushvili. Generalities. Criteria

Empirical Processes: General Weak Convergence Theory

Bonus Homework. Math 766 Spring ) For E 1,E 2 R n, define E 1 + E 2 = {x + y : x E 1,y E 2 }.

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University

Math 118B Solutions. Charles Martin. March 6, d i (x i, y i ) + d i (y i, z i ) = d(x, y) + d(y, z). i=1

Weak convergence and Brownian Motion. (telegram style notes) P.J.C. Spreij

7 Complete metric spaces and function spaces

SUPPLEMENT TO PAPER CONVERGENCE OF ADAPTIVE AND INTERACTING MARKOV CHAIN MONTE CARLO ALGORITHMS

Stat 8112 Lecture Notes Weak Convergence in Metric Spaces Charles J. Geyer January 23, Metric Spaces

The Heine-Borel and Arzela-Ascoli Theorems

Compact operators on Banach spaces

Math 209B Homework 2

A note on some approximation theorems in measure theory

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

REAL AND COMPLEX ANALYSIS

THEOREMS, ETC., FOR MATH 515

A VERY BRIEF REVIEW OF MEASURE THEORY

McGill University Math 354: Honors Analysis 3

Final Exam Practice Problems Math 428, Spring 2017

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

Your first day at work MATH 806 (Fall 2015)

Real Analysis Chapter 4 Solutions Jonathan Conder

Math 117: Continuity of Functions

Combinatorics in Banach space theory Lecture 12

Measure Theory on Topological Spaces. Course: Prof. Tony Dorlas 2010 Typset: Cathal Ormond

1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3

MATHS 730 FC Lecture Notes March 5, Introduction

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

Some Background Material

are Banach algebras. f(x)g(x) max Example 7.4. Similarly, A = L and A = l with the pointwise multiplication

Your first day at work MATH 806 (Fall 2015)

MATH 51H Section 4. October 16, Recall what it means for a function between metric spaces to be continuous:

Principles of Real Analysis I Fall VII. Sequences of Functions

Math 320-2: Midterm 2 Practice Solutions Northwestern University, Winter 2015

Problem 3. Give an example of a sequence of continuous functions on a compact domain converging pointwise but not uniformly to a continuous function

SUMMARY OF RESULTS ON PATH SPACES AND CONVERGENCE IN DISTRIBUTION FOR STOCHASTIC PROCESSES

ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS

2 Statement of the problem and assumptions

Notions such as convergent sequence and Cauchy sequence make sense for any metric space. Convergent Sequences are Cauchy

Topological properties

The Lebesgue Integral

Continuity of convex functions in normed spaces

Metric Spaces and Topology

Problem Set 5: Solutions Math 201A: Fall 2016

GLUING LEMMAS AND SKOROHOD REPRESENTATIONS

The Skorokhod reflection problem for functions with discontinuities (contractive case)

STAT 7032 Probability Spring Wlodek Bryc

3 Integration and Expectation

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory

Topology. Xiaolong Han. Department of Mathematics, California State University, Northridge, CA 91330, USA address:

Optimal series representations of continuous Gaussian random fields

Introduction and Preliminaries

I and I convergent function sequences

Exercises Measure Theoretic Probability

X n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2)

Bounded uniformly continuous functions

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R.

Analysis Qualifying Exam

Continuous Functions on Metric Spaces

Real Analysis Problems

(2m)-TH MEAN BEHAVIOR OF SOLUTIONS OF STOCHASTIC DIFFERENTIAL EQUATIONS UNDER PARAMETRIC PERTURBATIONS

Wiener Measure and Brownian Motion

MULTIPLES OF HYPERCYCLIC OPERATORS. Catalin Badea, Sophie Grivaux & Vladimir Müller

02. Measure and integral. 1. Borel-measurable functions and pointwise limits

The Arzelà-Ascoli Theorem

Deviation Measures and Normals of Convex Bodies

Real Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi

MTH 404: Measure and Integration

Weighted Sums of Orthogonal Polynomials Related to Birth-Death Processes with Killing

1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989),

LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS. S. G. Bobkov and F. L. Nazarov. September 25, 2011

Lecture Notes on Metric Spaces

Optimal global rates of convergence for interpolation problems with random design

HAIYUN ZHOU, RAVI P. AGARWAL, YEOL JE CHO, AND YONG SOO KIM

The Hopf argument. Yves Coudene. IRMAR, Université Rennes 1, campus beaulieu, bat Rennes cedex, France

Envelope Theorems for Arbitrary Parametrized Choice Sets

The main results about probability measures are the following two facts:

Math 328 Course Notes

On a Class of Multidimensional Optimal Transportation Problems

Mixing in Product Spaces. Elchanan Mossel

h(x) lim H(x) = lim Since h is nondecreasing then h(x) 0 for all x, and if h is discontinuous at a point x then H(x) > 0. Denote

The Kadec-Pe lczynski theorem in L p, 1 p < 2

(c) For each α R \ {0}, the mapping x αx is a homeomorphism of X.

Immerse Metric Space Homework

A NEW PROOF OF THE ATOMIC DECOMPOSITION OF HARDY SPACES

OPTIMAL TRANSPORTATION PLANS AND CONVERGENCE IN DISTRIBUTION

B553 Lecture 1: Calculus Review

THE SET OF RECURRENT POINTS OF A CONTINUOUS SELF-MAP ON AN INTERVAL AND STRONG CHAOS

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product

Math 117: Topology of the Real Numbers

MATH 202B - Problem Set 5

Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension. n=1

Topological vectorspaces

01. Review of metric spaces and point-set topology. 1. Euclidean spaces

Functional Analysis Winter 2018/2019

Introductory Analysis I Fall 2014 Homework #9 Due: Wednesday, November 19

A fixed point theorem for weakly Zamfirescu mappings

Proofs for Large Sample Properties of Generalized Method of Moments Estimators

Weak and strong convergence theorems of modified SP-iterations for generalized asymptotically quasi-nonexpansive mappings

Banach Journal of Mathematical Analysis ISSN: (electronic)

Transcription:

On Asymptotic Proximity of Distributions Youri DAVYDOV 1 and Vladimir ROTAR 2 1 Laboratoire Paul Painlevé - UMR 8524 Université de Lille I - Bat. M2 59655 Villeneuve d Ascq, France Email: youri.davydov@univ-lille1.fr 2 Department of Mathematics and Statistics of the San Diego State University, USA and the Central Economics and Mathematics Institute of the Russian Academy of Sciences, RF Email: vrotar@math.ucsd.edu Abstract. We consider some general facts concerning convergence P n Q n 0 as n, where P n and Q n are probability measures in a complete separable metric space. The main point is that the sequences {P n } and {Q n } are not assumed to be tight. We compare different possible definitions of the above convergence, and establish some general properties. AMS 1991 Subject Classification: Primary 60F17, Secondary 60G15. Keywords: Proximity of distributions, merging of distributions, weak convergence, the central asymptotic problem, asymptotic proximity of distributions. 1

1 Introduction and results 1.1 Background and Motivation Usually, a limit theorem of Probability Theory is a theorem that concerns convergence of a sequence of distributions P n to a distribution P. However, there is a number of works where the traditional setup is modified, and the object of study is two sequences of distributions, {P n }and {Q n }. The goal in this case consists in establishing conditions for convergence P n Q n 0 (1.1.1) in a proper sense. In particular problems, P n and Q n are, as a rule, the distributions of the r.v. s f n (X 1,...,X n ) and f n (Y 1,...,Y n ), where f n ( ) is a function, and X 1,X 2,... and Y 1,Y 2,... are two sequences of r.v. s. The aim here is rather to show that different random arguments X 1,...,X n may generate close distributions of f n (X 1,...,X n ), than to prove that the distribution of f n (X 1,...,X n ) is close to some fixed distribution (which, above else, may be not true). Consider, for example, a quadratic form < A n X n,x n >, where A n is a n n- matrix and X n = (X 1,...,X n ) is a vector with i.i.d. coordinates. In this case, the limiting distribution, if any, depends on the matrices A n. For instance, with a proper normalization, the form X 1 X 2 + X 2 X 3 +... + X n 1 X n is asymptotically normal (if E{Xi 2} < ), while the form (X 1 +... + X n ) 2 has the χ 2 1 limiting distribution. Nevertheless, one can state the following unified limit theorem. Denote by P nf the distribution of < A n X n,x n > in the case when each X i has a distribution F. Then, under rather mild conditions, P nf P ng 0 (1.1.2) for any two distributions F and G with the same first two moments (see, [13], [7] for detail, and references therein). A class F such that (1.1.2) is true for any F,G F is called an invariance class ([13]). Let us come back to (1.1.1). Clearly, such a framework is more general than the traditional one. First, as was mentioned, the distributions P n and Q n themselves do not have to converge. Secondly, the sequences {P n } and {Q n } are not assumed to be tight, and the convergence in (1.1.1) covers situations when a part of the probability distributions or the whole distributions move away to infinity while the distributions P n and Q n are approaching each other. To our knowledge, the scheme above was first systematically used in Loéve [10, Chapter VIII, Section 28, The Central Asymptotic Problem], who considered 2

sums of dependent r.v. s. The same approach is applied in some non-classical limit theorems for sums of r.v. s; that is, theorems not involving the condition of asymptotic negligibility of separate terms (see, e.g., monograph [19] by Zolotarev, survey [13] by Rotar, and references in [19] and [13]; Liptser and Shiryaev [9], and Davydov and Rotar [6] on a non-classical invariance principle.) Non-linear functions f n have been also considered in the above framework. In particular, it concerns polynomials, polylinear forms of r.v. s, and quasi-polynomial functions; see, e.g., Rotar [13] and [14] for limit theorems, Götze and Tikhomirov [7], and Mossel, O Donnel and Oleszkiewicz [11] for the accuracy of the corresponding invariance principle. Other interesting schemes different from those above were explored in D Aristotile, Diaconis, and Freedman [2] and in Chatterjee [1]. The present paper addresses general facts on convergence (1.1.1). A corresponding theory was built in Dudley [3], [4], [5, Chapter 11] and D Aristotile, Diaconis, and Freedman [2]. The paper [2] concerns some possible definitions of convergence (1.1.1) in terms of uniformities, and establishes connections between these definitions (see also below). The theory in [3]-[5] (which is used in part in [2] also) is mainly based on a metric approach. We complement and, to a certain extent, develop the theory from [2] and [5], paying more attention to a functional approach. Throughout this paper, we repeatedly refer to and cite results from [2]-[5]. First, consider three definitions of convergence (1.1.1) explored in [2] (or, in the terminology of [2], merging ). D1. π(p n,q n ) 0 where π is the Lévy-Prokhorov metric. D2. f (x)(p n (dx) Q n (dx)) 0 (1.1.3) for all bounded continuous functions f. D3. T (P n ) T (Q n ) 0 for all bounded and continuous (with respect to weak topology) functions T on the space of probability measures. In [2], it was shown that D3 D2 D1, and D1, D2, D3 are equivalent iff the space on which measures are defined, is compact. As one can derive from [2], and as follows from results below, once we consider particular sequences {P n } and {Q n }, the above definitions are equivalent if one of the sequences is tight. 3

In the general setup, when the above sequences {P n } and {Q n } are not assumed to be tight, Definition D2 (and hence D3 also) looks too strong. As an example, consider that from [2]. Let us deal with distributions on the real line, and let P n be concentrated at the point n, and Q n at the point n + 1 n. Clearly, (1.1.3) is not true for, say, f (x) = sin(x 2 ). On the other hand, it would have been unnatural, if a definition had not covered such a trivial case of asymptotic proximity of distributions. (Clearly, in this case, π(p n,q n ) 0. To make the example simpler, one may consider f (x) = sin ( π 4 x 2). Certainly, the example above concerns the Euclidean metric in R. For other metrics, points n and n + 1 n may be not close. Below, we cover the general case of a complete separable metric space.) To fix the situation, one can consider (1.1.3) for functions only from the class of all bounded continuous functions vanishing at infinity, but such an approach would be too restrictive. In this case, the definition would not cover situations where parts of the distributions move away to infinity, continuing to approach each other (or, in the terminology from [2], merge ). On the other hand, in accordance with the same definition, the distributions P n and Q n concentrated, for example, at points n and 2n, would be viewed as merging, which also does not look reasonable. In this paper, we suggest to define weak convergence as that with respect to all bounded uniformly continuous functions. (This type of convergence was not considered in [2] and [5].) We justify this definition proving that such a convergence is equivalent to convergence in the Lévy-Prokhorov metric that satisfies some natural, in our opinion, requirements. In the counterpart of Definition D3, we require the uniform continuity of T. We establish also some facts concerning weak convergence uniform on certain classes of functions f (or linear functionals on the space of distributions); see Section 1.2.3 for detail. Proofs turn out to be, though not very difficult, but not absolutely trivial since the absence of tightness requires additional constructions. The point is that in the generalized setup, we cannot choose just one compact, not depending on n, on which all measures will be almost concentrated. On the other hand, as will follow from results below, if one of the sequences, P n or Q n, is tight, the definition of weak convergence suggested is equivalent to the classical definition, and we deal with the classical framework. We would like to thank P.J.Fitzsimmons and F.D.Lesley for useful discussions. 4

1.2 Main Results 1.2.1 Weak convergence and the Lévy-Prokhorov metric Let (H, ρ) be a complete separable metric space, and B be the corresponding Borel σ-algebra. The symbols P and Q, with or without indices, will denote probability distributions on B. All functions f below, perhaps with indices, are continuous functions f : H R. We denote by C the class of all bounded and continuous functions on (H, ρ), and by C - the class of all bounded and uniformly continuous functions. For two sequences of probability measures (distributions), {P n } and {Q n }, we say that P n Q n 0 weakly with respect to (w.r.t.) a class of functions K if f d(pn Q n ) 0 for all f K. If we do not mention a particular class K, the term weak convergence (or more precisely, merging) will concern that w.r.t. C. When it cannot cause misunderstanding, we will use the term convergence in the situation of merging also. In the space of probability distributions on B, we define - in a usual way - the Lévy-Prokhorov metric π(p,q) = inf{ε : P(A ε ) Q(A) + ε for all closed sets A}. (1.2.1) (One can restrict himself to only one inequality, not switching P and Q; see Dudley [4, Theorem 11.3.1].) Our main result is Theorem 1 The difference P n Q n 0 weakly (w.r.t. C) (1.2.2) if and only if π(p n,q n ) 0. (1.2.3) The choice of the Lévy-Prokhorov metric as a good metric that justifies the definition of weak convergence above, is connected, first of all, with the fact that the analog of the Skorokhod representation theorem ([16], see also, e.g., [5, Sec.11.7]) holds in the case of merging measures. More precisely, the following is true. 5

Let, the symbols X and Y, perhaps with indices, denote random variables defined on a probability space (Ω,A,P), and assuming values in H (that is, these variables are A B measurable.) The symbol P X stands for the distribution of X. Below, two metrics r 1 (x,y) and r 2 (x,y) in a space are said to be uniformly equivalent, if for any two sequences {x n } and {y n }, the relations r 1 (x n,y n ) 0 and r 2 (x n,y n ) 0 are equivalent. The first two (and main) assertions of the next theorem are stated and proved in Dudley [4]; see also the second edition [5, Theorem 11.7.1]. For the completeness of the picture, we present all facts regarding the metric π in one theorem. Theorem 2 Metric π is the only metric, to within uniform equivalence, that satisfies the following conditions. A. If ρ(x n,y n ) P 0, then π(p Xn, P Yn ) 0. B. If π(p n, Q n ) 0, then there exist a probability space and random elements X n,y n on this space such that P Xn = P n, P Yn = Q n, and ρ(x n,y n ) P 0. C. If π(p n, Q n ) 0, then π(p n f 1,Q n f 1 ) 0 for any uniformly continuous function f. D. If Q n = Q, then the convergence π(p n, Q) 0 is equivalent to weak convergence with respect to all bounded continuous functions. Remarks. 1. We have already mentioned above an example showing that convergence (1.1.3) for all f C does not possess Property A. In other words, there exist r.v. s X n and Y n such that ρ(x n,y n ) P 0, while P Xn P Yn 0 weakly with respect to C. If H has bounded but non-compact sets, a similar example may be constructed even for continuous functions equal to zero out of a closed bounded set. Indeed, let O r (x) = {y : ρ(x,y) r}. Consider the case where for some x 0, the ball O = O 1 (x 0 ) is not compact. Then there exists a sequence {x n } O which does not contain a converging subsequence. Furthermore, there exists a numerical sequence δ n 0, such that the balls O δn (x n ) are disjoint, and each contains only one element from {x n }, that is, the center x n. We define { 1 1 f (x) = δ n ρ(x,x n ) if x O δn (x n ), 0 if x k O δk (x k ). 6

The function f is, first, bounded, and secondly, due to the choice of {x n }, f is continuous. On the other hand, one may set X n x n and Y n y n, where y n is a point from the boundary of O δn (x n ). Clearly, ρ(x n,y n ) = δ n 0, while f dp Xn = 1 and f dp Yn = 0. 2. As was mentioned in the introduction, if one of the sequences, say {Q n }, is tight, and (1.2.2) is true, then the other sequence, {P n }, is also tight. In this case, relation (1.1.3) is true for all f C, and we deal with the classical scheme. If (H,ρ) is a space in which any closed bounded set is compact, this fact is easy to prove directly. In the general case, it is easier to appeal to Theorem 1 and Prokhorov s theorem on relative compactness w.r.t. functions from C ([12]). More precisely, assume that (1.2.2) holds and {Q n }is tight. Then, by Prokhorov s theorem, {Q n } is relatively compact with respect to weak convergence for functions from C. Let a subsequence Q nk weakly converges to some Q w.r.t. C, and hence π(q nk,q) 0. By virtue of Theorem 1, π(p nk,q nk ) 0, and consequently, π(p nk,q) 0. So, P nk converges to the same Q w.r.t. to C, and P nk Q nk 0 w.r.t. to all functions from C. (1.2.4) Thus, any subsequence of {P n } contains a subsubsequence convergent w.r.t. C. Hence, again by Prokhorov s theorem, {P n } is tight. Moreover, in this case P n Q n 0 w.r.t. C. Indeed, otherwise we could select convergent subsequences P nk and Q nk and a bounded and continuous f such that f d(p nk Q nk ) 0, which would have contradicted (1.2.4). 3. Let us return to Definition D3 from Section 1.1. To make it suitable to the setup of this paper, one should consider functions T on the space of probability measures on B, uniformly continuous with respect to the the Lévy-Prokhorov metric (or, which is the same, w.r.t. the weak convergence regarding C). So modified Definition D3 will be equivalent to convergences (1.2.2)-(1.2.3). 1.2.2 Lipschitz functions For f C, we set f L = sup x y { f (x) f (y) ρ(x, y) }, 7

and C BL = { f : f L <, f < }. In Dudley [5, Sec.11.3], it was shown that in the traditional setup, for the weak convergence P n P w.r.t. the whole class C, it suffices to have the convergence w.r.t. C BL. A similar property is true for the generalized setup of merging probability measures. Namely, let {P n } and {Q n } be two fixed sequences of probability measures. Theorem 3 The weak convergence P n Q n 0 with respect to C BL implies the weak convergence with respect to C. Next, we consider classes of functions on which weak convergence is uniform. 1.2.3 Uniform convergence In Dudley [5], it was shown that in the traditional setup, weak convergence is uniform on any class of functions f with uniformly bounded norms f L and f. More precisely, consider the metric { } β(p,q) := sup f d(p Q) : f L + f 1. In [5, Section 1.3], it was proved that the weak convergence P n P w. r. t. C, is equivalent to the convergence β(p n,p) 0. We establish a similar property in the generalized setup and for arbitrary classes of functions with a fixed order of their moduli of continuity. For f C, we define its modulus of continuity ω f (h) = sup{ f (x) f (y) : ρ(x,y) h}. (1.2.5) Let ω(h) be a fixed non-decreasing function on R +, such that ω(h) 0 as h 0. Set C ω = { f : f <, and ω f (h) = O(ω(h) + h)}. (Usually, h = O(ω(h)), and one can write just ω f (h) = O(ω(h)). However, there are situations when ω(h) even equals zero for sufficiently small h s; for example, if H = N with the usual metric.) The next proposition, having its intrinsic value, plays an essential role in proving Theorem 1. 8

Theorem 4 Let for all f C ω, and let f d (P n Q n ) 0 (1.2.6) F ω = { f : f < 1, and ω f (h) ω(h) for all h 0}. Then sup f F ω f d (P n Q n ) 0. (1.2.7) Clearly, instead of the above class F ω, one may consider a class of uniformly bounded (not necessarily by one) functions f such that ω f (h) k 1 ω(h) + k 2 h, where k 1, k 2 are fixed constants. (Such a formal generalization follows from (1.2.7) just by replacing ω(h) by k 1 ω(h) + k 2 h.) Corollary 5 If (1.2.6) is true for all f C BL, then sup f d (P n Q n ) 0 (1.2.8) f F for any class F of uniformly bounded functions with uniformly bounded Lipschitz constants. Corollary 6 If (1.2.6) is true for all f C, then (1.2.8) is true for any class F of uniformly bounded and uniformly equicontinuous functions. To derive the above corollaries from Theorem 4, one should set ω(h) = ω F (h) := sup f F ω f (h). (In the literature, there is no unity in definitions of equicontinuity: some authors define it pointwise; in other definitions, the word uniformly is redundant. When talking about uniform equicontinuity, we mean that the function ω F (h) is bounded and vanishing at the origin.) 2 Proofs Proofs of Theorems 1 and 3 essentially use Corollary 5 from Theorem 4 and Theorem 2. We start with a proof of the latter theorem the main assertions of this theorem, A and B, are known ([5, Sec.11.7]), and the rest of the proof is short. The proof of Theorem 4 is relegated to the last Section 2.2. 9

2.1 Proofs of Theorems 1 3 2.1.1 Proof of Theorem 2 To justify Property C, consider the r.v. s X n,y n defined in Property B. We have f (X n ) f (Y n ) ω f (ρ(x n,y n )). Consequently, f (X n ) f (Y n ) P 0, and it suffices to appeal to Property A. Property D is obvious. Now, let r 1 (, ) and r 2 (, ) be two metrics with Properties A-B. If r 1 (P n,q n ) 0, then there exist X n, Y n for which ρ(x n,y n ) 0, and hence by Property A, r 2 (P n,q n ) 0. For proving Theorems 1 and 3, we need 2.1.2 Two lemmas Lemma 7 If π(p n,q n ) 0, then sup f F f d(p n Q n ) 0 (2.1.1) for any class F of uniformly bounded and uniformly equicontinuous functions. Proof. By Theorem 2, there exist X n,y n such that P Xn = P n, Q n = P Yn, and ρ(x n,y n ) P 0. By conditions of the lemma, M := sup f F f <, and ω(h) := sup f F ω f (h) 0 as h 0. For any ε > 0, f d(p n Q n ) = E { f (X n ) f (Y n )} E { f (X n ) f (Y n ) } = E{ f (X n ) f (Y n ) ; ρ(x n,y n ) > ε} + E{ f (X n ) f (Y n ) ; ρ(x n,y n ) ε} 2MP(ρ(X n,y n ) > ε) + ω(ε). Hence, lim n sup f F f d(p n Q n ) ω(ε) 0 as ε 0. Lemma 8 (Dudley [5, a part of Theorem 11.7.1]). Suppose (2.1.1) is true for F = F 1 := { f : f 1, f L 1}. Then π(p n,q n ) 0. 10

Proof. In [5], the proof of this fact is based on the relation π 2 β, which is proved separately. For the completeness of the picture, we give a direct proof (which, in essence, is very close to the reasoning in [5, p.396 ]). For a closed set K, and an ε > 0, set IK(x) ε = 1 if x K, 1 1 ε ρ(x,k) if x Kε \ K, 0 otherwise. (Here, K ε is the ε-neighborhood of K.) Since ω I ε K (h) 1 ε h, the family {Iε K (x) : K is closed} F ε := { f : f 1, f L 1/ε}. Clearly, if (2.1.1) holds for F = F 1, then it holds for F = F ε for any ε > 0. Therefore, n (ε) := sup K IK(x)d(P ε n Q n ) 0 (2.1.2) for any ε > 0 as n. On the other hand, P n (K) I ε K(x)dP n n (ε) + Q n (K ε ). (2.1.3) From (2.1.2) and (2.1.3), it follows that for sufficiently large n, P n (K) ε + Q n (K ε ), which implies that π(p n,q n ) ε. π(p n,q n ) 0. Since ε is arbitrary small, this means that 2.1.3 Proof of Theorem 1 The implication π(p n,q n ) 0 f d(p n Q n ) 0 for any f C, immediately follows from Lemma 7. Assume f d(p n Q n ) 0 for any f C. Then, the same is true for all f C BL. Hence, by Corollary 5 from Theorem 4, relation (2.1.1) holds for F = F 1 (defined in Lemma 8). This implies the convergence π(p n,q n ) 0 by virtue of Lemma 8. 11

2.1.4 Proof of Theorem 3 Let f d(p n Q n ) 0 for any f C BL. As was proved in Section 2.1.3 above, π(p n,q n ) 0. By virtue of Theorem 1, this implies that f d(p n Q n ) 0 for all f C. 2.2 Proof of Theorem 4 2.2.1 Three more lemmas Lemma 9 For any two functions f and g, provided that the l.-h.sides are finite. ω f g (h) g ω f (h) + f ω g (h), (2.2.1) ω f g (h) max{ω f (h), ω g (h)}, (2.2.2) ω f g (h) max{ω f (h), ω g (h)}. (2.2.3) Proof is straightforward and very close to that in [5, Propositions 11.2.1-2] dealing with Lipschitz functions. Next, for a function f (x), a set K, and a number t > 0, we define the function f (t) K (x) = f (x) ( ) if x K, f (x) 1 ρ(x,k) t if x K t \ K, 0 otherwise. (2.2.4) Lemma 10 Let f be a bounded uniformly continuous function. Then f (t) f, (2.2.5) and for any h 0, K h ω (t) f (h) ω f (h) + f K t. (2.2.6) Proof. Bound (2.2.5) is obvious. Next, note that for any xand y, (Indeed, for any z K, ρ(x,k) ρ(y,k) ρ(x,y). (2.2.7) ρ(x,k) ρ(x,z) ρ(x,y) + ρ(y,z), 12

which implies that ρ(x,k) ρ(x,y) + ρ(y,k). We can also switch x and y.) Thus, for q t (x) := 1 ρ(x,k)/t, we have ω qt (h) h/t. Together with (2.2.1), this implies (2.2.6). Note that, in particular, from Lemma 10 it follows that if f is uniformly continuous, so does f (t) K. Let the signed measure Ψ n = P n Q n. Lemma 11 Suppose f dψ n 0 (2.2.8) for any f C. Let F be a class of uniformly bounded and uniformly equicontinuous functions. Set ω(h) := sup ω f (h). (2.2.9) f F Then for any compact K, and t > 0, lim sup n f F f (t) K dψ n 4ω(t). (2.2.10) Proof. By the Arzelà-Ascoli theorem, for any ε > 0, there is a finite family { f 1,..., f d } F such that for any f F, there exists s = s( f ) {1,...,d}, for which sup f (x) f s (x) < ε. (2.2.11) x K On the other hand, for any z K, and x K t, f (t) (t) K (x) f s,k (x) = ( f (x) f s(x) 1 ρ(x,k) ) t f (x) f (z) + f s (x) f s (z) + f (z) f s (z) 2ω(ρ(x,z)) + ε. Hence, for x K t, f (t) K (t) (x) f (x) 2ω(t) + ε. (2.2.12) s,k 13

Therefore, f (t) K dψ n ( f (t) (t) K (x) f s,k (x))dψ n + f (t) s,k (x)dψ n 2(2ω(t) + ε) + f (t) s,k (x)dψ n 4ω(t) + 2ε + max f (t) m,k (x)dψ n. m {1,...,d} Since max m {1,...,d} m,k (x)dψ n 0 as n, (2.2.13) f (t) and ε is arbitrary small, this implies (2.2.10). We turn to 2.2.2 A direct proof of Theorem 4 Consider a fixed function ω(h), and the class F = F ω from the statement of Theorem 4. Assume that there exist a sequence { f 1, f 2,... } F, a sequence {m k }, and a δ > 0, such that f k dψ mk δ for all k = 1,2,.... Without loss of generality we can identify {m k } with N, and write just f k dψ k δ for all k = 1,2,.... We may also assume that for all k s, 0 f k (x) 1. (2.2.14) Now, let t and ε t be two fixed positive numbers to be chosen later. Let n 1 = 1, and K 1 be a compact such that Ψ n1 (K 1 ) t. (Here the measure Ψ (dx) is the variation of Ψ(dx), and K is the complement of K.) Let n 2 n 1 + 1 be a number such that for all n n 2 f (ε/3) 1,K 1 dψ n t, (2.2.15) 14

and sup j f (t) j,k 1 dψ n 5ω(t). (2.2.16) Inequality (2.2.15) is true for sufficiently large n because f (ε/3) 1,K 1 is uniformly continuous due to Lemma 10, and (2.2.16) holds for large n by virtue of (2.2.10). Note also that in (2.2.16) we deal with the supremum over a class of functions, while (2.2.15) concerns only one fixed function. Next, we consider a compact K 2 such that K 2 K 1, and Ψ n2 (K 2 ) t. Let us set f 1 = f 1, and By virtue of (2.2.16), f 2 dψ n2 = f 2 = f n2 f (t) n 2,K 1. f n2 dψ n2 f (t) n 2,K 1 dψ n2 δ 5ω(t). Also, By Lemma 10, f 2 1, and f 2 (x) = 0 for all x K 1. ω f 2 (h) ω fn2 (h) + ω f (t) n 2,K 1 (h) 2ω(h) + h t. (2.2.17) Now, we set L 1 = K 1, and L 2 = K 2 \ K ε 1. Let By construction, g 1 (x) = f (ε/3) 1,K 1 (x), and g 2 (x) = g 1 (x) + f (ε/3) 2,L 2 (x). (2.2.18) g 1 (x) = 0 for x / K ε/3 1, and g 2 (x) = 0 for x / K ε/3 2. By Lemma 10, ω g1 (h) ω(h) + h ε/3 = ω(h) + 3h ε. (2.2.19) 15

Now, since the sets L ε/3 1 and L ε/3 2 are disjoint, in (2.2.18), either g 1 (x) or f (ε/3) 2,L 2 (x) equals zero. So, we can also write that g 2 (x) = max{g 1 (x), f (ε/3) 2,L 2 (x)}. Then, from Lemma 9, Lemma 10, (2.2.19), and (2.2.17), it follows that ω g2 (h) max{ω g1 (x), ω f 2 (h) + h ε/3 } 2ω(h) + h t + 3h ε. (2.2.20) In view of (2.2.19), bound (2.2.20) is true for both functions, g 1 and g 2. Thus, both functions, g 1 and g 2, are bounded and uniformly continuous (since ε,t > 0 are fixed). Next, we choose n 3 n 2 + 1 such that for all n n 3, g 2 (x)dψ n t, and sup j Let K 3 be a compact such that K 3 K 2, and We define a function f (t) j,k 2 dψ n 5ω(t). Ψ n3 (K 3 ) t. f 3 = f n3 f (t) n 3,K 2 which has properties similar to those of f 2, and we define the set The sets L ε/3 1, Lε/3 2 L 3 = K 3 \ K ε 2., and Lε/3 3 are mutually disjoint. We set g 3 (x) = g 2 (x) + f (ε/3) 3,L 3 (x), (2.2.21) and again note that in (2.2.21), either g 2 (x) or f (ε/3) 2,L 3 (x) equals zero. Similarly to what we did above, we conclude that (2.2.20) is true for g 3 also. So, g 3 (x) is fixed, uniformly continuous, and g 3 (x) = 0 for x K ε/3 3. Continuing the recurrence procedure in the same fashion, we come to the following objects. (a) The sequence n m. 16

(b) The sequence of compacts K m such that for all m Ψ nm (K m) t. (2.2.22) (c) The sequence of compact sets L m K m such that the sets L ε/3 m (d) The sequence of functions f m such that for all m are disjoint. f m (x) = 0 for all x K m 1, (2.2.23) ω f m (h) 2ω(h) + h t, (2.2.24) and f m dψ nm δ 5ω(t). (2.2.25) (e) The non-decreasing sequence {g 1 (x) g 2 (x)...} such that g m (x) = g m 1 (x)+ f (ε/3) m,l m (x), g m 1 (x)dψ nm t, (2.2.26) g m (x) = 0 for x Km ε/3, and ω gm (h) 2ω(h) + h t + 3h ε. Let Clearly, 0 g(x) 1, and g(x) = lim m g m(x). ω g (h) 2ω(h) + h t + 3h ε. (2.2.27) Since the numbers ε and t are fixed, the function g C ω. We show that, nevertheless, one can choose ε and t such that I n := gdψ n 0. To make exposition simpler, we replace the sequence {n m } by N, write Ψ n instead of Ψ nm, and remove from f s. All of this cannot cause misunderstanding. By virtue of (2.2.25), I n = f n dψ n + (g f n )dψ n δ 5ω(t) + (g f n )dψ n. 17

For J n := (g f n )dψ n, we write J n (g f n )dψ n + Ψ n (K n ) (g f n )dψ n +t. K n K n Now, (g f n )dψ n = K n + L n K n K ε/3 n 1 + := J n1 + J n2 + J n3. K n \(L n K ε/3 n 1 ) By construction, J 1n = 0. For the second integral, we have J n2 = (g f n )dψ n f n )dψ n K ε/3 n 1 Kn Kn 1(g ε/3 (g f n )dψ n + Ψ n (dx) K ε/3 n 1 Kn gdψ n + f n dψ n +t, (2.2.28) K ε/3 n 1 K ε/3 n 1 in view of (2.2.22). By construction, g(x) = g n 1 (x) for x K ε/3 n 1. So, K ε/3 n 1 gdψ n = K ε/3 n 1 by virtue of (2.2.26). In view of (2.2.23) and (2.2.24), f n dψ n = K ε/3 n 1 g n 1 dψ n = K ε/3 n 1 \K n 1 ( 2ω(ε/3) + ε/3 t g n 1 dψ n t, (2.2.29) f n dψ n ω fn (ε/3) Ψ n (dx) ) 2 = 4ω(ε/3) + 2ε 3t. Thus, J n2 4ω(ε/3) + 2ε + 2t. (2.2.30) 3t 18

To evaluate J n3, first note that if x K n \ (L n K ε/3 n 1 ), then x Kε n 1, and hence f n (x)dψ n ω K n \(L n K ε/3 n 1 ) K n \(L n K ε/3 fn (ε) Ψ n (dx) n 1 ) ( 2ω(ε) + ε t ) 2 = 4ω(ε) + 2ε t. (2.2.31) Now, let us observe that if x K n \ (L n K ε/3 n 1 ) and x / Lε/3 n, then g(x) = 0. So, K n \(L n K ε/3 n 1 ) g(x)dψ n = (K n \L n ) L ε/3 n g(x)dψ n. On the other hand, (K n \ L n ) L ε/3 n K ε n 1, and g(x) f n(x) on (K n \ L n ) L ε/3 n. Thus, for the function g, we have the bound similar to (2.2.31), and and J n3 8ω(ε) + 4ε t. Combining the bounds above, we have J n 4ω(ε/3) + 8ω(ε) + 14ε 3t + 3t 12ω(ε) + 5 ε t + 3t, I n δ 5ω(t) 12ω(ε) 5 ε t 3t δ 17ω(t) 5ε t 3t, because we choose ε t. Without loss of generality we can take t < 1. Let ε = t 2. Then I n δ 17ω(t) 8t. Clearly, one can choose t for which I n 2 δ for all n. References [1] Chatterjee, S., (2006). A generalization of the Lindeberg principle, Ann. Probab., 34 no. 6, 2061-2076. [2] D Aristotile, A., Diaconis, P., and Freedman, D. (1988), On merging of probabilities, Sankhyă: The Indian Journal of Statistics, v.58, Series A, Pt.3, pp. 363-380. 19

[3] Dudley, R. M., (1968). Distances of probability measures and random variables, Ann. Math. Statist. 39, 1563-1572. [4] Dudley, R. M., (1989). Real Analysis and Probability, 1st edition, Wadsworth, Inc. [5] Dudley, R. M., (2002). Real Analysis and Probability, 2nd edition, Cambridge University Press. [6] Davydov, Yu.A., and Rotar, V.I., (2007). On a non-classical invariance principle, http://arxiv.org/math.pr/0702085. [7] Götze, F., and Tikhomirov, A. N. (1999). Asymptotic distribution of quadratic forms, Ann. Probab. 27, no. 2, 1072 1098. [8] Götze, F., and Tikhomirov, A. (2002). Asymptotic distribution of quadratic forms and applications, J. Theoret. Probab. 15, no. 2, 423 475. [9] Liptser, R.Sh., and Shiryaev, A.N. (1983), On the invariance principle for semi-martingales: the non-classical case, Theory Probabl. Appl., XXVIII, 1, 1-34. [10] Loève, M., (1963). Probability Theory, 3rd edition, Princeton, N.J., Van Nostrand. [11] Mossel, E., O Donnel, R., and Oleszkiewicz, K., (2005). Noise stability of functions with low influences: invariance and optimality, to appear, http://arxiv.org/math.pr/0503503. [12] Prokhorov, Yu.V., (1956). Convergence of random processes and limit theorems in Probability Theory, Theory Probabl. Appl., I, 2, 157-214. [13] Rotar, V.I., (1975). Limit theorems for multilinear forms and quasipolynomial functions, Theory Probabl. Appl., XX, 3, 512-532. [14] Rotar, V.I., (1979). Limit theorems for polylinear forms, Journal of Multivariate analysis, 9, 4, 511-530. [15] Rotar, V.I., (1982). On summation of independent variables in the nonclassical situation; Russian Mathematical Surveys, 37:6, 151-175. 20

[16] Skorohod, A.V., (1956). Limit theorems for stochastic processes, Theory Probabl. Appl., I, 3, 261-290. [17] Strassen, V., (1965). The existence of probability measures with given marginals, Ann. Math. Stat., 36, pp. 423 439. [18] Szulga, A, (1982). On minimal metrics in the space of random variables, Teor. Probab. Appl., 27, pp. 424 430. [19] Zolotarev, V.M., (1997). Modern Theory of Summation of Random Variables, V.S.P. Intern. Science Publishers. 21