Weak convergence. Leiden University, Amsterdam, 13 November 2013
Definition
Definition. Let $\mu, \mu_1, \mu_2, \ldots$ be probability measures on $(\mathbb{R}, \mathcal{B})$. We say that $\mu_n$ converges weakly to $\mu$, and write $\mu_n \xrightarrow{w} \mu$, if $\mu_n(f) \to \mu(f)$ for all $f \in C_b(\mathbb{R})$, where $\mu(f) = \int f \, \mathrm{d}\mu$. If $X, X_1, X_2, \ldots$ are random variables (possibly defined on different probability spaces) with distributions $\mu, \mu_1, \mu_2, \ldots$, then we say that $X_n$ converges weakly to $X$, and write $X_n \rightsquigarrow X$, if it holds that $\mu_n \xrightarrow{w} \mu$.
Example
Example. Let $\{x_n\}$ be a convergent sequence of real numbers with $\lim_{n\to\infty} x_n = 0$. Then for every $f \in C_b(\mathbb{R})$ one has $f(x_n) \to f(0)$. Let $\mu_n$ be the Dirac measure concentrated at $x_n$ and $\mu$ the Dirac measure concentrated at the origin. Since $\mu_n(f) = f(x_n)$ and $\mu(f) = f(0)$, we see that $\mu_n \xrightarrow{w} \mu$.
Sufficient condition
Theorem. Consider distributions $\mu, \mu_1, \mu_2, \ldots$ having densities $f, f_1, f_2, \ldots$ w.r.t. Lebesgue measure $\lambda$ on $(\mathbb{R}, \mathcal{B})$. Suppose that $f_n \to f$ $\lambda$-a.e. Then $\mu_n \xrightarrow{w} \mu$.
Proof. Apply Scheffé's theorem to conclude that $f_n \to f$ in $L^1(\mathbb{R}, \mathcal{B}, \lambda)$. Let $g \in C_b(\mathbb{R})$. Since $g$ is bounded, we also have $f_n g \to fg$ in $L^1(\mathbb{R}, \mathcal{B}, \lambda)$, so $\mu_n(g) \to \mu(g)$ and hence $\mu_n \xrightarrow{w} \mu$.
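As a numerical sanity check (my own illustration, not part of the slides), take $f_n$ to be the density of $N(1/n, 1)$ and $f$ that of $N(0, 1)$: then $f_n \to f$ pointwise, and a crude trapezoidal approximation of $\int |f_n - f| \, \mathrm{d}\lambda$ indeed shrinks to zero:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def l1_distance(n, lo=-10.0, hi=10.0, steps=20000):
    """Trapezoidal approximation of the L1 distance between the
    N(1/n, 1) and N(0, 1) densities (tails beyond [-10, 10] are negligible)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        w = 0.5 if i in (0, steps) else 1.0  # trapezoidal weights
        total += w * abs(normal_pdf(x, mu=1.0 / n) - normal_pdf(x)) * h
    return total

print(l1_distance(1), l1_distance(10), l1_distance(100))
```

The $L^1$ distance decreases as $n$ grows, exactly as Scheffé's theorem predicts for pointwise convergent densities.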
Example
Example. Requiring that $\mu_n(B) \to \mu(B)$ for every $B \in \mathcal{B}$, which is the same as letting the class of test functions $f$ consist of indicators of Borel sets, or even requiring that $\mu_n(f) \to \mu(f)$ for every bounded measurable function $f$, is too rigid to be useful. Let $x_n = 1/n$, let $\mu_n$ be the Dirac measure concentrated at $x_n$ and $\mu$ the Dirac measure concentrated at the origin. Take $B = (-\infty, x]$. Then $\mu_n(B) \to \mu(B)$ fails for $x = 0$, although it is natural in this case to say that $\mu_n \to \mu$.
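A small illustration of the failure (assumed setup, not from the slides): integrating against the Dirac measure at $x_n = 1/n$ shows the indicator test function stuck at $0$, while a bounded continuous test function converges to its value at the origin:

```python
def dirac_integral(f, point):
    """Integrating f against the Dirac measure at `point` gives f(point)."""
    return f(point)

def indicator(x):          # indicator of the Borel set B = (-inf, 0]
    return 1.0 if x <= 0 else 0.0

def smooth(x):             # a bounded continuous test function
    return 1.0 / (1.0 + x * x)

for n in (1, 10, 1000):
    x_n = 1.0 / n
    # mu_n(B) stays 0 even though mu(B) = indicator(0) = 1,
    # while mu_n(smooth) -> smooth(0) = 1 as n grows.
    print(n, dirac_integral(indicator, x_n), dirac_integral(smooth, x_n))
```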
Simple criterion
Theorem. The following are equivalent:
(i) $\mu_n \xrightarrow{w} \mu$;
(ii) every subsequence $\{\mu_{n_j}\}$ of $\{\mu_n\}$ has a further subsequence $\{\mu_{n_{j_k}}\}$, such that $\mu_{n_{j_k}} \xrightarrow{w} \mu$ as $k \to \infty$.
Proof
Proof. That (i) implies (ii) is obvious. We prove the reverse implication. Assume that $\mu_n \xrightarrow{w} \mu$ fails. This means there exist a bounded continuous function $f$, a subsequence $\{n_j\}$ of $\{n\}$ and a constant $\varepsilon > 0$, such that $|\mu_{n_j}(f) - \mu(f)| \geq \varepsilon$ for all $j$. But then $|\mu_{n_{j_k}}(f) - \mu(f)| \geq \varepsilon$ for any further subsequence $\{n_{j_k}\}$ of $\{n_j\}$ as well. Hence $\{\mu_{n_j}\}$ has no further subsequence converging weakly to $\mu$, a contradiction.
Notation
The boundary $\partial E$ of a set $E \in \mathcal{B}$ is $\partial E = \bar{E} \setminus E^{\circ}$, where $\bar{E}$ is the closure and $E^{\circ}$ is the interior of $E$. The distance from a point $x$ to a set $E$ is $d(x, E) = \inf\{|x - y| : y \in E\}$. The $\delta$-neighbourhood of $E$ (here $\delta > 0$) is the set $E^{\delta} = \{x : d(x, E) < \delta\}$.
Portmanteau theorem
Theorem (Portmanteau). Let $\mu, \mu_1, \mu_2, \ldots$ be probability measures on $(\mathbb{R}, \mathcal{B})$. The following statements are equivalent.
(i) $\mu_n \xrightarrow{w} \mu$.
(ii) $\limsup_{n\to\infty} \mu_n(F) \leq \mu(F)$ for all closed sets $F$.
(iii) $\liminf_{n\to\infty} \mu_n(G) \geq \mu(G)$ for all open sets $G$.
(iv) $\lim_{n\to\infty} \mu_n(E) = \mu(E)$ for all sets $E$ with $\mu(\partial E) = 0$ (all $\mu$-continuity sets).
Definition
Definition. We shall say that a sequence $\{F_n\}$ of distribution functions on $\mathbb{R}$ converges weakly to a limit distribution function $F$, and shall write $F_n \rightsquigarrow F$, if $F_n(x) \to F(x)$ for all $x \in C_F$, where $C_F$ is the set of all points at which $F$ is continuous.
Weak convergence and distribution functions
Theorem. Let $\mu, \mu_1, \mu_2, \ldots$ be probability measures on the real line and denote by $F, F_1, F_2, \ldots$ the corresponding distribution functions. The following statements are equivalent:
(i) $\mu_n \xrightarrow{w} \mu$;
(ii) $F_n \rightsquigarrow F$.
Uniform convergence
Theorem. Suppose $F_n \rightsquigarrow F$ and $F$ is continuous. Then $\lim_{n\to\infty} \sup_{t \in \mathbb{R}} |F_n(t) - F(t)| = 0$.
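The uniform convergence can be seen numerically. In this sketch (an assumed example, with $F_n$ the $N(1/n, 1)$ CDF and $F$ the continuous $N(0, 1)$ CDF), a grid approximation of $\sup_t |F_n(t) - F(t)|$ decreases towards zero:

```python
import math

def Phi(x, mu=0.0):
    """CDF of N(mu, 1), computed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0)))

def sup_distance(n, steps=4000):
    """Grid approximation of sup_t |F_n(t) - F(t)| with F_n the N(1/n, 1) CDF."""
    ts = (-8.0 + 16.0 * i / steps for i in range(steps + 1))
    return max(abs(Phi(t, mu=1.0 / n) - Phi(t)) for t in ts)

print(sup_distance(1), sup_distance(10), sup_distance(100))
```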
Definition Definition A sequence of probability measures {µ n } on (R, B) is called sequentially compact, if every subsequence {µ nk } of {µ n } has a further weakly convergent subsequence.
Helly's theorem
Theorem (Helly's theorem). Let $\{F_n\}$ be a sequence of distribution functions. Then there exist a possibly defective distribution function $F$ and a subsequence $\{F_{n_k}\}$, such that $F_{n_k}(x) \to F(x)$ for all $x \in C_F$.
Example
Example. Let $\mu_n$ be the Dirac measure concentrated at $\{n\}$. Then its distribution function is given by $F_n(x) = 1_{[n, \infty)}(x)$ and hence $\lim_{n\to\infty} F_n(x) = 0$ for every $x$. Hence the limit function $F$ has to be the zero function, which is clearly defective. One colloquially says that in the limit the probability mass escapes to infinity.
Tightness
Definition. A sequence of probability measures $\{\mu_n\}$ on $(\mathbb{R}, \mathcal{B})$ is called tight if $\lim_{M\to\infty} \inf_n \mu_n([-M, M]) = 1$.
Remark. In order to show that a sequence is tight, it is sufficient to show tightness from a certain suitably chosen index on.
Prokhorov's theorem
Theorem. A sequence $\{\mu_n\}$ of probability measures on $(\mathbb{R}, \mathcal{B})$ is tight if and only if it is sequentially compact.
Corollary. If $\mu_n \xrightarrow{w} \mu$ for some probability measure $\mu$, then the sequence $\{\mu_n\}$ is tight.
Continuous mapping theorem
Theorem (Continuous mapping theorem). Let $g : \mathbb{R} \to \mathbb{R}$ be continuous at every point of a set $C$, such that $P(X \in C) = 1$.
(i) If $X_n \xrightarrow{a.s.} X$, then $g(X_n) \xrightarrow{a.s.} g(X)$.
(ii) If $X_n \rightsquigarrow X$, then $g(X_n) \rightsquigarrow g(X)$.
(iii) If $X_n \xrightarrow{P} X$, then $g(X_n) \xrightarrow{P} g(X)$.
Quantile function
Definition. The quantile function of a distribution function $F$ is the generalised inverse $F^{-1} : (0, 1) \to \mathbb{R}$ given by $F^{-1}(p) = \inf\{x : F(x) \geq p\}$.
First comments
The quantile function is left-continuous. Its range is equal to the support of $F$ (or rather of the corresponding $\mu$), and therefore $F^{-1}$ is often unbounded. Since the quantile function is monotone, it has at most countably many points of discontinuity.
More properties
Lemma. For every $0 < p < 1$ and $x \in \mathbb{R}$:
1. $F^{-1}(p) \leq x$ if and only if $p \leq F(x)$;
2. $F \circ F^{-1}(p) \geq p$, with equality holding if and only if $p$ is in the range of $F$; the equality can fail only if $F$ is discontinuous at $F^{-1}(p)$;
3. $F_{-} \circ F^{-1}(p) \leq p$, where $F_{-}(x) = F(x-)$;
4. $F^{-1} \circ F(x) \leq x$; the equality fails if and only if $x$ is in the interior or at the right endpoint of a flat part of $F$;
5. $F \circ F^{-1} \circ F = F$ and $F^{-1} \circ F \circ F^{-1} = F^{-1}$;
6. $(F \circ G)^{-1} = G^{-1} \circ F^{-1}$.
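Properties 1 and 2 are easy to verify mechanically for a toy discrete distribution (the atoms and dyadic probabilities below are an arbitrary illustrative choice of mine, picked so that floating-point sums are exact):

```python
# A discrete toy distribution: atoms with dyadic probabilities.
atoms = [0.0, 1.0, 2.0]
probs = [0.25, 0.5, 0.25]

def F(x):
    """Distribution function: total mass of atoms <= x."""
    return sum(p for a, p in zip(atoms, probs) if a <= x)

def F_inv(p):
    """Generalised inverse F^{-1}(p) = inf{x : F(x) >= p}."""
    acc = 0.0
    for a, pr in zip(atoms, probs):
        acc += pr
        if acc >= p:
            return a
    return atoms[-1]

# Property 1: F^{-1}(p) <= x  iff  p <= F(x).
for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    for x in (-0.5, 0.0, 0.5, 1.0, 1.5, 2.0):
        assert (F_inv(p) <= x) == (p <= F(x))

# Property 2: F(F^{-1}(p)) >= p, strict here since 0.5 is not in the range of F.
print(F_inv(0.5), F(F_inv(0.5)))
```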
Picture
Figure: Distribution function (red line).
Two corollaries
Corollary (Quantile transformation). Let $F$ be an arbitrary distribution function and $U$ a uniform random variable on $[0, 1]$. Then $F^{-1}(U) \sim F$.
Corollary (Probability integral transformation). Let $X \sim F$ for a continuous distribution function $F$. Then $F(X)$ is uniformly distributed on $[0, 1]$.
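The quantile transformation is also the standard recipe for sampling from a given law. A minimal sketch (my own example, not from the slides), assuming the target is the $\mathrm{Exp}(1)$ law with quantile function $F^{-1}(p) = -\log(1 - p)$:

```python
import math
import random

random.seed(42)

def exp_quantile(p, lam=1.0):
    """Quantile function of the Exp(lam) law: F^{-1}(p) = -log(1 - p) / lam."""
    return -math.log(1.0 - p) / lam

# Quantile transformation: U ~ Uniform[0, 1)  =>  F^{-1}(U) ~ Exp(1).
samples = [exp_quantile(random.random()) for _ in range(100_000)]
mean = sum(samples) / len(samples)
print(round(mean, 2))  # should be close to 1, the mean of Exp(1)
```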
Weak convergence of quantile functions
Definition. We shall say that a sequence of quantile functions $F_n^{-1}$ converges weakly to a limit quantile function $F^{-1}$, and denote this by $F_n^{-1} \rightsquigarrow F^{-1}$, if $F_n^{-1}(t) \to F^{-1}(t)$ at every point $0 < t < 1$ at which $F^{-1}$ is continuous.
Equivalence
Lemma. For any sequence $\{F_n\}$ of distribution functions, $F_n \rightsquigarrow F$ if and only if $F_n^{-1} \rightsquigarrow F^{-1}$.
Almost sure representation
Theorem (Skorokhod's representation theorem). Let $X_n \rightsquigarrow X$. Then there exist a probability space $(\tilde{\Omega}, \tilde{\mathcal{F}}, \tilde{P})$ and random variables $\tilde{X}_n, \tilde{X}$ defined on it, such that for all $n \geq 1$, $\tilde{X}_n \stackrel{d}{=} X_n$, $\tilde{X} \stackrel{d}{=} X$, and $\tilde{X}_n \xrightarrow{a.s.} \tilde{X}$.
Convergence in probability and weak convergence
Theorem. Suppose that a sequence $\{X_n\}$ of random variables and a random variable $X$ are defined on the same probability space. Assume that $X_n \xrightarrow{P} X$. Then $X_n \rightsquigarrow X$.
Corollary. Suppose that a sequence $\{X_n\}$ of random variables and a random variable $X$ are defined on the same probability space. Assume that $X_n \xrightarrow{a.s.} X$. Then $X_n \rightsquigarrow X$.
Counterexample
Example. Let $X \sim N(0, 1)$ and $X_n = -X$ for all $n \in \mathbb{N}$. Then $P(|X_n - X| > \varepsilon) = P(|X| > \varepsilon/2) > 0$ for all $n \in \mathbb{N}$, and thus convergence in probability fails. Obviously, so does almost sure convergence. On the other hand, by the symmetry of the standard normal distribution, $X_n \stackrel{d}{=} X$, and hence $X_n \rightsquigarrow X$.
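The counterexample can be made quantitative (a sketch of my own, computing the exact normal tail via the error function): since $|X_n - X| = 2|X|$, the probability $P(|X_n - X| > \varepsilon)$ is a fixed positive constant, independent of $n$:

```python
import math

def two_sided_tail(c):
    """P(|X| > c) for X ~ N(0, 1), via the error function."""
    return 1.0 - math.erf(c / math.sqrt(2.0))

eps = 0.5
# With X_n = -X we have |X_n - X| = 2|X|, so for every n
# P(|X_n - X| > eps) = P(|X| > eps / 2), a fixed positive number:
p = two_sided_tail(eps / 2.0)
print(p)

# Weak convergence still holds: -X and X share the same N(0, 1) law by symmetry.
```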
Exception
Theorem. Let the random variables $X, X_1, X_2, \ldots$ be defined on the same probability space. If $X_n \rightsquigarrow X$, where $P(X = x) = 1$ for some $x \in \mathbb{R}$, then also $X_n \xrightarrow{P} X$.
Proof. The distribution $\mu$ of $X$ is the Dirac measure at $x$. For any $\varepsilon > 0$, the sets $(x + \varepsilon, \infty)$ and $(-\infty, x - \varepsilon)$ are $\mu$-continuity sets. Note that $P(|X_n - X| > \varepsilon) = P(X_n > x + \varepsilon) + P(X_n < x - \varepsilon)$. The right-hand side tends to zero as $n \to \infty$ by the portmanteau theorem.
Convergence of first moments
Theorem. Assume that $X_n \rightsquigarrow X$. If the sequence $\{X_n\}$ is uniformly integrable, then $E[X_n] \to E[X]$ as $n \to \infty$.
Proof. By the almost sure representation theorem, there exists a probability space $(\tilde{\Omega}, \tilde{\mathcal{F}}, \tilde{P})$ with random variables $\tilde{X}, \tilde{X}_1, \tilde{X}_2, \ldots$, such that $\tilde{X} \stackrel{d}{=} X$, $\tilde{X}_n \stackrel{d}{=} X_n$ for all $n \in \mathbb{N}$, and $\tilde{X}_n \xrightarrow{a.s.} \tilde{X}$. By the uniform integrability of the family $\{X_n\}$, the family $\{\tilde{X}_n\}$ is also uniformly integrable. Therefore $E[\tilde{X}_n] \to E[\tilde{X}]$, and the result follows.
Convergence of first moments
Remark. Assume additionally that $\{X_n\}$ and $X$ are defined on the same probability space. One could have thought that we also have $L^1$-convergence: $E[|X_n - X|] \to 0$. However, this is in general false, and here is a simple counterexample: take $X \sim N(0, 1)$ and $X_n = -X$ for all $n \in \mathbb{N}$. Then $E[|X_n - X|] = 2E[|X|]$, which does not tend to zero. The point is that $E[|X_n - X|]$ depends on the bivariate law of $(X_n, X)$, and this need not be the same as that of $(\tilde{X}_n, \tilde{X})$ (marginals do not determine joint distributions uniquely).
No UI?
Theorem. If $X_n \rightsquigarrow X$, then $E[|X|] \leq \liminf_{n\to\infty} E[|X_n|]$.
Slutsky's lemma
Theorem. Let $\{X_n\}$ and $\{Y_n\}$ be two sequences of random variables defined on the same probability space.
(i) If $X_n \rightsquigarrow X$ and $X_n - Y_n \xrightarrow{P} 0$, then $Y_n \rightsquigarrow X$.
(ii) If $X_n \rightsquigarrow X$ and $Y_n \xrightarrow{P} c$ for a constant $c$, then $X_n Y_n \rightsquigarrow cX$.
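A Monte Carlo sketch of part (ii). The concrete choices here ($X_n \sim N(0, 1)$, $Y_n = c + O_P(1/n)$ with $c = 2$) are assumptions for illustration only; the products $X_n Y_n$ should then be approximately $N(0, c^2)$:

```python
import random
import statistics

random.seed(1)
c, n, N = 2.0, 1000, 50_000

# X_n already has the limit law N(0, 1); Y_n = c + O_P(1/n) -> c in probability.
products = []
for _ in range(N):
    x = random.gauss(0.0, 1.0)
    y = c + random.gauss(0.0, 1.0) / n
    products.append(x * y)

# Slutsky (ii): X_n * Y_n should be approximately N(0, c^2), so stdev ~ 2.
print(round(statistics.stdev(products), 2))
```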