Lecture Notes 3: Convergence (Chapter 5)

1 Convergence of Random Variables

Let $X_1, X_2, \ldots$ be a sequence of random variables and let $X$ be another random variable. Let $F_n$ denote the cdf of $X_n$ and let $F$ denote the cdf of $X$.

Example: A good example to keep in mind is the following. Let $Y_1, Y_2, \ldots$ be a sequence of i.i.d. random variables and let
$$X_n = \frac{1}{n}\sum_{i=1}^n Y_i$$
be the average of the first $n$ of the $Y_i$'s. This defines a new sequence $X_1, X_2, \ldots$. That is, the sequence of interest $X_1, \ldots, X_n$ might be a sequence of statistics based on some other sequence of random variables.

1. $X_n$ converges to $X$ in probability, written $X_n \xrightarrow{P} X$, if, for every $\epsilon > 0$,
$$P(|X_n - X| > \epsilon) \to 0 \text{ as } n \to \infty. \quad (1)$$
In other words, $\lim_{n\to\infty} P(|X_n - X| > \epsilon) = 0$, and $X_n - X = o_P(1)$.

2. $X_n$ converges almost surely to $X$, written $X_n \xrightarrow{a.s.} X$, if, for every $\epsilon > 0$,
$$P(\lim_{n\to\infty} |X_n - X| < \epsilon) = 1. \quad (2)$$
This is also called strong convergence.

3. $X_n$ converges to $X$ in quadratic mean (also called convergence in $L_2$), written $X_n \xrightarrow{qm} X$, if
$$E(X_n - X)^2 \to 0 \text{ as } n \to \infty. \quad (3)$$

4. $X_n$ converges to $X$ in distribution, written $X_n \rightsquigarrow X$, if
$$\lim_{n\to\infty} F_n(t) = F(t) \quad (4)$$
at all $t$ at which $F$ is continuous.
Recall the following definition.

Definition 1 $Z$ has a point mass distribution at $a$, written $Z \sim \delta_a$, if $P(Z = a) = 1$, in which case
$$F_Z(z) = \delta_a(z) = \begin{cases} 0 & \text{if } z < a \\ 1 & \text{if } z \ge a \end{cases}$$
and the probability mass function is $f(z) = 1$ for $z = a$ and $0$ otherwise.

Example 2 Consider flipping a coin for which the probability of heads is $p$. Let $X_i$ denote the outcome of a single toss (0 or 1). Hence, $p = P(X_i = 1) = E(X_i)$. The fraction of heads after $n$ tosses is $\overline{X}_n$. According to the law of large numbers, $\overline{X}_n$ converges to $p$ in probability. This does not mean that $\overline{X}_n$ will numerically equal $p$. It means that, when $n$ is large, the distribution of $\overline{X}_n$ is tightly concentrated around $p$. Suppose that $p = 1/2$. How large should $n$ be so that $P(.4 \le \overline{X}_n \le .6) \ge .7$? First, $E(\overline{X}_n) = p = 1/2$ and $\mathrm{Var}(\overline{X}_n) = \sigma^2/n = p(1-p)/n = 1/(4n)$. From Chebyshev's inequality,
$$P(.4 \le \overline{X}_n \le .6) = P(|\overline{X}_n - \mu| \le .1) = 1 - P(|\overline{X}_n - \mu| > .1) \ge 1 - \frac{1}{4n(.1)^2} = 1 - \frac{25}{n}.$$
The last expression will be larger than $.7$ if $n \ge 84$.

When the limiting random variable is a point mass, we change the notation slightly.

1. If $P(X = c) = 1$ and $X_n \rightsquigarrow X$, then we write $X_n \rightsquigarrow c$.

2. $X_n$ converges to $c$ in quadratic mean, written $X_n \xrightarrow{qm} c$, if $E(X_n - c)^2 \to 0$ as $n \to \infty$.

3. $X_n$ converges to $c$ in distribution, written $X_n \rightsquigarrow c$, if
$$\lim_{n\to\infty} F_n(t) = \delta_c(t)$$
for all $t \ne c$.

Suppose we are given a probability space $(\Omega, \mathcal{B}, P)$. We say a statement about random elements holds almost surely (a.s.) if there exists an event $N \in \mathcal{B}$ with $P(N) = 0$ such that the statement holds if $\omega \in N^c$. Alternatively, we may say the statement holds for a.a. (almost all) $\omega$. The set $N$ appearing in the definition is sometimes called the exception set. Here are several examples of statements that hold a.s.:
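As a sanity check on the Chebyshev calculation above, here is a short simulation (my own sketch, not part of the notes; it uses only the standard library): it estimates $P(.4 \le \overline{X}_n \le .6)$ for $p = 1/2$ and $n = 84$ and compares it with the bound $1 - 25/n$.

```python
import random

# Monte Carlo check of the Chebyshev bound above: with p = 1/2 and n = 84
# tosses, estimate P(0.4 <= Xbar_n <= 0.6) and compare it with the
# guaranteed lower bound 1 - 25/n = 1 - 25/84, which is about 0.702.
random.seed(0)
n, trials = 84, 20000
hits = 0
for _ in range(trials):
    xbar = sum(random.random() < 0.5 for _ in range(n)) / n
    if 0.4 <= xbar <= 0.6:
        hits += 1
estimate = hits / trials
bound = 1 - 25 / n
print(estimate, bound)  # the simulated probability should exceed the bound
```

The simulated probability comes out well above the bound, which is expected: Chebyshev's inequality is conservative, so $n = 84$ is sufficient but not necessary.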
1. If $\{X_n\}$ is a sequence of random variables, then $\lim_{n\to\infty} X_n$ exists a.s. means that there exists an event $N \in \mathcal{B}$, such that $P(N) = 0$ and if $\omega \in N^c$ then $\lim_{n\to\infty} X_n(\omega)$ exists. It also means that for a.a. $\omega$,
$$\limsup_{n\to\infty} X_n(\omega) = \liminf_{n\to\infty} X_n(\omega).$$
We will write $\lim_{n\to\infty} X_n = X$ a.s., or $X_n \to X$ a.s., or $X_n \xrightarrow{a.s.} X$.

2. $X_n$ converges almost surely to a constant $c$, written $X_n \xrightarrow{a.s.} c$, if there exists an event $N \in \mathcal{B}$, such that $P(N) = 0$ and if $\omega \in N^c$ then $\lim_{n\to\infty} X_n(\omega) = c$.

Example 3 (Almost sure convergence) Let the sample space $S$ be $[0,1]$ with the uniform probability distribution. If the sample space $S$ has elements denoted by $s$, then the random variables $X_n(s)$ and $X(s)$ are all functions defined on $S$. Define $X_n(s) = s + s^n$ and $X(s) = s$. For every $s \in [0,1)$, $s^n \to 0$ as $n \to \infty$ and $X_n(s) \to s = X(s)$. However, $X_n(1) = 2$ for every $n$, so $X_n(1)$ does not converge to $1 = X(1)$. Since the convergence occurs on the set $[0,1)$ and $P([0,1)) = 1$, we have $X_n \xrightarrow{a.s.} X$: that is, the functions $X_n(s)$ converge to $X(s)$ for all $s \in S$ except for $s \in N = \{1\}$, where $N \subset S$ and $P(N) = 0$. See Example CB 5.5.7.

Example 4 (Example CB 5.5.8) Continuing Example 3, let $S = [0,1]$ and let $P$ be uniform on $[0,1]$. Let $X(s) = s$ and let
$$X_1 = s + I_{[0,1]}(s), \quad X_2 = s + I_{[0,1/2]}(s), \quad X_3 = s + I_{[1/2,1]}(s),$$
$$X_4 = s + I_{[0,1/3]}(s), \quad X_5 = s + I_{[1/3,2/3]}(s), \quad X_6 = s + I_{[2/3,1]}(s),$$
etc. It is straightforward to see that $X_n$ converges to $X$ in probability: as $n \to \infty$, $P(|X_n - X| > \epsilon)$ is equal to the probability of an interval $[a_n, b_n]$ of $s$ values whose length is going to 0. Then $X_n \xrightarrow{P} X$. However, $X_n$ does not converge to $X$ almost surely. Indeed, there is no value of $s \in S$ for which $X_n(s) \to s = X(s)$. For each $s$, the value $X_n(s)$ alternates between the values $s$ and $s + 1$ infinitely often; that is, $X_n(s)$ does not converge to $X(s)$. That is, no pointwise convergence occurs for this sequence.
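The pointwise behavior in Example 3 is easy to verify numerically. This is a small sketch of my own (not from the notes): $X_n(s) = s + s^n$ approaches $X(s) = s$ for each fixed $s \in [0,1)$, while the single exceptional point $s = 1$ never converges.

```python
# Pointwise check of Example 3: X_n(s) = s + s**n converges to X(s) = s
# for each s in [0, 1), but X_n(1) = 1 + 1**n = 2 for every n.
def X_n(s, n):
    return s + s ** n

print(X_n(0.9, 10))    # still noticeably above 0.9
print(X_n(0.9, 1000))  # close to 0.9, since 0.9**1000 is negligible
print(X_n(1.0, 5))     # exactly 2.0: no convergence at s = 1
```

The exceptional set $N = \{1\}$ has probability 0 under the uniform distribution, which is exactly why the convergence is almost sure despite failing at one point.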
You are not expected to know the following Theorem 5 for this class.

Theorem 5 $X_n \xrightarrow{a.s.} X$ if and only if, for every $\epsilon > 0$,
$$\lim_{n\to\infty} P\left(\sup_{m \ge n} |X_m - X| \le \epsilon\right) = 1.$$

Theorem 6 The following relationships hold:

(a) $X_n \xrightarrow{qm} X$ implies that $X_n \xrightarrow{P} X$.

(b) $X_n \xrightarrow{P} X$ implies that $X_n \rightsquigarrow X$.

(c) If $X_n \rightsquigarrow X$ and if $P(X = c) = 1$ for some real number $c$, then $X_n \xrightarrow{P} X$.

(d) $X_n \xrightarrow{a.s.} X$ implies $X_n \xrightarrow{P} X$.

In general, none of the reverse implications hold except the special case in (c).

Example 7 (Convergence in distribution) Let $X_n \sim N(0, 1/n)$. Intuitively, $X_n$ is concentrating at 0, so we would like to say that $X_n$ converges to 0. Let's see if this is true. Note that $\sqrt{n}\, X_n \sim N(0,1)$. Let $F$ be the distribution function for a point mass at 0: $P(X = 0) = 1$. Let $Z$ denote a standard normal random variable. For $t < 0$,
$$F_n(t) = P(X_n < t) = P(\sqrt{n}\, X_n < \sqrt{n}\, t) = P(Z < \sqrt{n}\, t) \to 0$$
since $\sqrt{n}\, t \to -\infty$. For $t > 0$,
$$F_n(t) = P(X_n < t) = P(\sqrt{n}\, X_n < \sqrt{n}\, t) = P(Z < \sqrt{n}\, t) \to 1$$
since $\sqrt{n}\, t \to \infty$. Hence, $F_n(t) \to F(t)$ for all $t \ne 0$ and so $X_n \rightsquigarrow 0$. Notice that $F_n(0) = 1/2 \ne F(0) = 1$, so convergence fails at $t = 0$. That doesn't matter, because $t = 0$ is not a continuity point of $F$ and the definition of convergence in distribution only requires convergence at continuity points. Now convergence in probability follows from Theorem 6(c): $X_n \xrightarrow{P} 0$. Here we also provide a direct proof. For any $\epsilon > 0$, using Markov's inequality,
$$P(|X_n| > \epsilon) = P(|X_n|^2 > \epsilon^2) \le \frac{E(X_n^2)}{\epsilon^2} = \frac{1/n}{\epsilon^2} \to 0$$
as $n \to \infty$.
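Example 7 can also be checked by simulation. The sketch below is my own (the choices $n = 400$, $\epsilon = 0.2$, and the trial count are arbitrary): it draws from $N(0, 1/n)$ and confirms that the empirical tail frequency $P(|X_n| > \epsilon)$ sits below the Markov bound $1/(n\epsilon^2)$ used in the direct proof.

```python
import random

# Simulation of Example 7: X_n ~ N(0, 1/n) concentrates at 0, and the
# empirical frequency of |X_n| > eps stays below the Markov bound
# E(X_n^2)/eps^2 = 1/(n * eps**2), which itself tends to 0 as n grows.
random.seed(0)
n, eps, trials = 400, 0.2, 20000
exceed = sum(abs(random.gauss(0.0, (1.0 / n) ** 0.5)) > eps
             for _ in range(trials)) / trials
bound = 1.0 / (n * eps ** 2)
print(exceed, bound)  # the empirical frequency is far below the bound
```

Here the standard deviation of $X_n$ is $1/20$, so $\epsilon = 0.2$ is four standard deviations out and exceedances are rare; the Markov bound of $1/16$ is loose but valid.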
We will show the proof of Theorem 6(a)-(c) next time.

Proof of Theorem 6. We start by proving (a). Suppose that $X_n \xrightarrow{qm} X$. Fix $\epsilon > 0$. Then, using Markov's inequality,
$$P(|X_n - X| > \epsilon) = P(|X_n - X|^2 > \epsilon^2) \le \frac{E|X_n - X|^2}{\epsilon^2} \to 0.$$

Proof of (b). Fix $\epsilon > 0$ and let $x$ be a continuity point of $F$. Then
$$F_n(x) = P(X_n \le x) = P(X_n \le x, X \le x + \epsilon) + P(X_n \le x, X > x + \epsilon)$$
$$\le P(X \le x + \epsilon) + P(|X_n - X| > \epsilon) = F(x + \epsilon) + P(|X_n - X| > \epsilon).$$
Also,
$$F(x - \epsilon) = P(X \le x - \epsilon) = P(X \le x - \epsilon, X_n \le x) + P(X \le x - \epsilon, X_n > x)$$
$$\le F_n(x) + P(|X_n - X| > \epsilon).$$
Hence,
$$F(x - \epsilon) - P(|X_n - X| > \epsilon) \le F_n(x) \le F(x + \epsilon) + P(|X_n - X| > \epsilon).$$
Take the limit as $n \to \infty$ to conclude that
$$F(x - \epsilon) \le \liminf_{n\to\infty} F_n(x) \le \limsup_{n\to\infty} F_n(x) \le F(x + \epsilon).$$
This holds for all $\epsilon > 0$. Take the limit as $\epsilon \to 0$ and use the fact that $F$ is continuous at $x$ to conclude that $\lim_{n\to\infty} F_n(x) = F(x)$.

Proof of (c). Fix $\epsilon > 0$. Then,
$$P(|X_n - c| > \epsilon) = P(X_n < c - \epsilon) + P(X_n > c + \epsilon)$$
$$\le P(X_n \le c - \epsilon) + P(X_n > c + \epsilon) = F_n(c - \epsilon) + 1 - F_n(c + \epsilon)$$
$$\to F(c - \epsilon) + 1 - F(c + \epsilon) = 0 + 1 - 1 = 0.$$

Warning! Convergence in probability does not imply convergence in quadratic mean.
Let $U \sim \mathrm{Unif}(0,1)$ and let $X_n = \sqrt{n}\, I_{(0,1/n)}(U)$. Then
$$P(|X_n| > \epsilon) = P(\sqrt{n}\, I_{(0,1/n)}(U) > \epsilon) = P(0 \le U < 1/n) = 1/n \to 0.$$
Hence, $X_n \xrightarrow{P} 0$. But $E(X_n^2) = n \int_0^{1/n} du = 1$ for all $n$, so $X_n$ does not converge in quadratic mean.

Convergence in distribution does not imply convergence in probability. Let $X \sim N(0,1)$. Let $X_n = -X$ for $n = 1, 2, 3, \ldots$; hence $X_n \sim N(0,1)$. $X_n$ has the same distribution function as $X$ for all $n$, so, trivially, $\lim_{n\to\infty} F_n(x) = F(x)$ for all $x$. Therefore, $X_n \rightsquigarrow X$. But
$$P(|X_n - X| > \epsilon) = P(|2X| > \epsilon) = P(|X| > \epsilon/2) \not\to 0.$$
So $X_n$ does not converge to $X$ in probability.

One might conjecture that if $X_n \xrightarrow{P} b$, then $E(X_n) \to b$. This is not true. Let $X_n$ be a random variable defined by $P(X_n = n^2) = 1/n$ and $P(X_n = 0) = 1 - (1/n)$. Now, $P(|X_n| < \epsilon) = P(X_n = 0) = 1 - (1/n) \to 1$. Hence, $X_n \xrightarrow{P} 0$. However, $E(X_n) = [n^2 \cdot (1/n)] + [0 \cdot (1 - (1/n))] = n$. Thus, $E(X_n) \to \infty$.

Example 8 Let $X_1, \ldots, X_n \sim \mathrm{Uniform}(0,1)$. Let $X_{(n)} = \max_i X_i$. First we claim that $X_{(n)} \xrightarrow{P} 1$. This follows since
$$P(|X_{(n)} - 1| > \epsilon) = P(X_{(n)} \le 1 - \epsilon) = \prod_i P(X_i \le 1 - \epsilon) = (1 - \epsilon)^n \to 0.$$
Also,
$$P(n(1 - X_{(n)}) \le t) = 1 - P(X_{(n)} \le 1 - (t/n)) = 1 - (1 - t/n)^n \to 1 - e^{-t}.$$
So $n(1 - X_{(n)}) \rightsquigarrow \mathrm{Exp}(1)$.
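The limit in Example 8 shows up quickly in simulation. The following is my own sketch (the choices $n = 200$ and the trial count are arbitrary): it estimates $P(n(1 - X_{(n)}) \le 1)$, which should be close to the $\mathrm{Exp}(1)$ value $1 - e^{-1} \approx 0.632$.

```python
import math
import random

# Simulation of Example 8: for X_1, ..., X_n ~ Uniform(0,1), the rescaled
# maximum n*(1 - X_(n)) is approximately Exp(1) for large n, so the event
# n*(1 - X_(n)) <= 1 should occur with probability close to 1 - exp(-1).
random.seed(0)
n, trials = 200, 20000
count = sum(n * (1 - max(random.random() for _ in range(n))) <= 1.0
            for _ in range(trials))
estimate = count / trials
print(estimate, 1 - math.exp(-1))
```

The exact finite-$n$ probability is $1 - (1 - 1/n)^n$, which for $n = 200$ already differs from $1 - e^{-1}$ only in the third decimal place, so the empirical estimate lands close to the limit.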