Chapter 5

Weak convergence

We will see later that if the $X_i$ are i.i.d. with mean zero and variance one, then $S_n/\sqrt{n}$ converges in the sense
\[ \mathbb{P}(S_n/\sqrt{n} \in [a,b]) \to \mathbb{P}(Z \in [a,b]), \]
where $Z$ is a standard normal. If $S_n/\sqrt{n}$ converged in probability or almost surely, then by the Kolmogorov zero-one law it would converge to a constant, contradicting the above.

We want to generalize this type of convergence. We say $F_n$ converges weakly to $F$ if $F_n(x) \to F(x)$ for all $x$ at which $F$ is continuous. Here $F_n$ and $F$ are distribution functions. We say $X_n$ converges weakly to $X$ if $F_{X_n}$ converges weakly to $F_X$. We sometimes say $X_n$ converges in distribution or converges in law to $X$. Probabilities $\mu_n$ converge weakly if their corresponding distribution functions converge weakly, that is, if $F_{\mu_n}(x) = \mu_n((-\infty, x])$ converges weakly.

An example that illustrates why we restrict the convergence to continuity points of $F$ is the following. Let $X_n = 1/n$ with probability one, and $X = 0$ with probability one. Then $F_{X_n}(x)$ is $0$ if $x < 1/n$ and $1$ otherwise, while $F_X(x)$ is $0$ if $x < 0$ and $1$ otherwise. Hence $F_{X_n}(x)$ converges to $F_X(x)$ for all $x$ except $x = 0$.

Proposition 5.1 $X_n$ converges weakly to $X$ if and only if $\mathbb{E}\, g(X_n) \to \mathbb{E}\, g(X)$ for all $g$ bounded and continuous.

The idea that $\mathbb{E}\, g(X_n)$ converges to $\mathbb{E}\, g(X)$ for all $g$ bounded and continuous makes sense for any metric space and is used as the definition of weak convergence for random variables taking values in general metric spaces.
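As a quick numerical check of the example above (a sketch in Python, with hypothetical helper names; not part of the text's development), one can evaluate the distribution functions at a continuity point of $F_X$ and at the discontinuity $x = 0$:

```python
# Numerical check of the example X_n = 1/n, X = 0.
# F_{X_n}(x) = 0 if x < 1/n, else 1;  F_X(x) = 0 if x < 0, else 1.

def F_Xn(n, x):
    """Distribution function of X_n = 1/n (point mass at 1/n)."""
    return 0.0 if x < 1.0 / n else 1.0

def F_X(x):
    """Distribution function of X = 0 (point mass at 0)."""
    return 0.0 if x < 0 else 1.0

# At any continuity point of F_X (any x != 0), F_Xn(n, x) -> F_X(x):
print(F_Xn(10**6, 0.5), F_X(0.5))    # both 1.0
print(F_Xn(10**6, -0.5), F_X(-0.5))  # both 0.0

# At the discontinuity x = 0: F_Xn(n, 0) = 0 for every n, yet F_X(0) = 1,
# so pointwise convergence fails there -- which is exactly why weak
# convergence only requires convergence at continuity points of F.
print(F_Xn(10**6, 0.0), F_X(0.0))    # 0.0 vs 1.0
```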
Proof. First suppose $\mathbb{E}\, g(X_n)$ converges to $\mathbb{E}\, g(X)$ for all bounded and continuous $g$. Let $x$ be a continuity point of $F$, let $\varepsilon > 0$, and choose $\delta$ such that $|F(y) - F(x)| < \varepsilon$ if $|y - x| < \delta$. Choose $g$ continuous such that $g$ is one on $(-\infty, x]$, takes values between $0$ and $1$, and is $0$ on $[x + \delta, \infty)$. Then
\[ F_{X_n}(x) \le \mathbb{E}\, g(X_n) \to \mathbb{E}\, g(X) \le F_X(x + \delta) \le F(x) + \varepsilon. \]
Similarly, if $h$ is a continuous function taking values between $0$ and $1$ that is $1$ on $(-\infty, x - \delta]$ and $0$ on $[x, \infty)$, then
\[ F_{X_n}(x) \ge \mathbb{E}\, h(X_n) \to \mathbb{E}\, h(X) \ge F_X(x - \delta) \ge F(x) - \varepsilon. \]
Since $\varepsilon$ is arbitrary, $F_{X_n}(x) \to F_X(x)$.

Now suppose $X_n$ converges weakly to $X$. We start by making some observations. First, a distribution function is increasing, and so the number of points at which it has a discontinuity is at most countable. Second, if $g$ is a continuous function on a closed bounded interval, it can be approximated uniformly on the interval by step functions. Using the uniform continuity of $g$ on the interval, we may even choose the step function so that the places where it jumps are not in some pre-specified countable set. Our third observation is that
\[ \mathbb{P}(X < x) = \lim_{k \to \infty} \mathbb{P}\bigl(X \le x - \tfrac{1}{k}\bigr) = \lim_{y \uparrow x} F_X(y); \]
thus if $F_X$ is continuous at $x$, then $\mathbb{P}(X = x) = F_X(x) - \mathbb{P}(X < x) = 0$.

Suppose $g$ is bounded and continuous, and we want to prove that $\mathbb{E}\, g(X_n) \to \mathbb{E}\, g(X)$. By multiplying by a constant, we may suppose that $g$ is bounded by $1$. Let $\varepsilon > 0$ and choose $M$ such that $F_X(M) > 1 - \varepsilon$, $F_X(-M) < \varepsilon$, and so that $M$ and $-M$ are continuity points of $F_X$. We see that
\[ \mathbb{P}(X_n \le -M) = F_{X_n}(-M) \to F_X(-M) < \varepsilon, \]
and so for large enough $n$ we have $\mathbb{P}(X_n \le -M) \le 2\varepsilon$. Similarly, for large enough $n$, $\mathbb{P}(X_n > M) \le 2\varepsilon$. Therefore for large enough $n$, $\mathbb{E}\, g(X_n)$ differs from $\mathbb{E}\,(g 1_{[-M,M]})(X_n)$ by at most $4\varepsilon$. Also, $\mathbb{E}\, g(X)$ differs from $\mathbb{E}\,(g 1_{[-M,M]})(X)$ by at most $2\varepsilon$.

Let $h$ be a step function such that $\sup_{|x| \le M} |h(x) - g(x)| < \varepsilon$ and $h$ is $0$ outside of $[-M, M]$.
We choose $h$ so that the places where $h$ jumps are continuity points of $F$ and of all the $F_n$. Then $\mathbb{E}\,(g 1_{[-M,M]})(X_n)$ differs from $\mathbb{E}\, h(X_n)$ by at most $\varepsilon$, and the same when $X_n$ is replaced by $X$.
If we show $\mathbb{E}\, h(X_n) \to \mathbb{E}\, h(X)$, then
\[ \limsup_{n \to \infty} \bigl| \mathbb{E}\, g(X_n) - \mathbb{E}\, g(X) \bigr| \le 8\varepsilon, \]
and since $\varepsilon$ is arbitrary, we will be done.

Now $h$ is of the form $\sum_{i=1}^m c_i 1_{I_i}$, where each $I_i$ is an interval, so by linearity it is enough to show $\mathbb{E}\, 1_I(X_n) \to \mathbb{E}\, 1_I(X)$ when $I$ is an interval whose endpoints are continuity points of all the $F_n$ and of $F$. If the endpoints of $I$ are $a < b$, then by our third observation above, $\mathbb{P}(X_n = a) = 0$, and the same when $X_n$ is replaced by $X$ and when $a$ is replaced by $b$. We then have
\[ \mathbb{E}\, 1_I(X_n) = \mathbb{P}(a < X_n \le b) = F_{X_n}(b) - F_{X_n}(a) \to F_X(b) - F_X(a) = \mathbb{P}(a < X \le b) = \mathbb{E}\, 1_I(X), \]
as required.

Let us examine the relationship between weak convergence and convergence in probability. The example of $S_n/\sqrt{n}$ shows that one can have weak convergence without convergence in probability.

Proposition 5.2 (a) If $X_n$ converges to $X$ in probability, then it converges weakly.
(b) If $X_n$ converges weakly to a constant, it converges in probability.
(c) (Slutsky's theorem) If $X_n$ converges weakly to $X$ and $Y_n$ converges weakly to a constant $c$, then $X_n + Y_n$ converges weakly to $X + c$ and $X_n Y_n$ converges weakly to $cX$.

Proof. To prove (a), let $g$ be a bounded and continuous function. If $\{n_j\}$ is any subsequence, then there exists a further subsequence $\{n_{j_k}\}$ such that $X_{n_{j_k}}$ converges almost surely to $X$. Then by dominated convergence, $\mathbb{E}\, g(X_{n_{j_k}}) \to \mathbb{E}\, g(X)$. Since every subsequence of $\{\mathbb{E}\, g(X_n)\}$ has a further subsequence converging to $\mathbb{E}\, g(X)$, that suffices to show $\mathbb{E}\, g(X_n)$ converges to $\mathbb{E}\, g(X)$.

For (b), if $X_n$ converges weakly to $c$, then
\[ \mathbb{P}(X_n - c > \varepsilon) = \mathbb{P}(X_n > c + \varepsilon) = 1 - \mathbb{P}(X_n \le c + \varepsilon) \to 1 - \mathbb{P}(c \le c + \varepsilon) = 0. \]
Here we use the fact that if $Y \equiv c$, then $c + \varepsilon$ is a point of continuity for $F_Y$. A similar equation shows $\mathbb{P}(X_n - c \le -\varepsilon) \to 0$, so $\mathbb{P}(|X_n - c| > \varepsilon) \to 0$.

We now prove the first part of (c), leaving the second part for the reader. Let $x$ be a point such that $x - c$ is a continuity point of $F_X$. Choose $\varepsilon$ so that $x - c + \varepsilon$ is again a continuity point. Then
\[ \mathbb{P}(X_n + Y_n \le x) \le \mathbb{P}(X_n + c \le x + \varepsilon) + \mathbb{P}(|Y_n - c| > \varepsilon) \to \mathbb{P}(X \le x - c + \varepsilon). \]
So $\limsup_n \mathbb{P}(X_n + Y_n \le x) \le \mathbb{P}(X + c \le x + \varepsilon)$. Since $\varepsilon$ can be taken as small as we like and $x - c$ is a continuity point of $F_X$, it follows that $\limsup_n \mathbb{P}(X_n + Y_n \le x) \le \mathbb{P}(X + c \le x)$. The $\liminf$ is handled similarly.

Here is an example where $X_n$ converges weakly but not in probability. Let $X_1, X_2, \ldots$ be an i.i.d. sequence with $\mathbb{P}(X_1 = 1) = \tfrac12$ and $\mathbb{P}(X_1 = 0) = \tfrac12$. Since the $F_{X_n}$ are all equal, we have weak convergence. We claim the $X_n$ do not converge in probability. If they did, we could find a subsequence $\{n_j\}$ such that $X_{n_j}$ converges a.s. Let $A_j = (X_{n_j} = 1)$. These are independent sets with $\mathbb{P}(A_j) = \tfrac12$, so $\sum_j \mathbb{P}(A_j) = \infty$. By the Borel-Cantelli lemma, $\mathbb{P}(A_j \text{ i.o.}) = 1$, which means that $X_{n_j}$ is equal to $1$ infinitely often with probability one. The same argument shows that $X_{n_j}$ is equal to $0$ infinitely often with probability one, which implies that $X_{n_j}$ does not converge almost surely, a contradiction.

We say a sequence of distribution functions $\{F_n\}$ is tight if for each $\varepsilon > 0$ there exists $M$ such that $F_n(M) \ge 1 - \varepsilon$ and $F_n(-M) \le \varepsilon$ for all $n$. A sequence of random variables is tight if the corresponding distribution functions are tight; this is equivalent to $\mathbb{P}(|X_n| \ge M) \le \varepsilon$ for all $n$.

We give an easily checked criterion for tightness.

Proposition 5.3 Suppose there exists $\varphi : [0, \infty) \to [0, \infty)$ that is increasing and $\varphi(x) \to \infty$ as $x \to \infty$. If $c = \sup_n \mathbb{E}\, \varphi(|X_n|) < \infty$, then the $X_n$ are tight.

Proof. Let $\varepsilon > 0$. Choose $M$ such that $\varphi(x) \ge c/\varepsilon$ if $x > M$. Then
\[ \mathbb{P}(|X_n| > M) \le \int \frac{\varphi(|X_n|)}{c/\varepsilon}\, 1_{(|X_n| > M)}\, d\mathbb{P} \le \frac{\varepsilon}{c}\, \mathbb{E}\, \varphi(|X_n|) \le \varepsilon. \]
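Proposition 5.3 can be illustrated numerically (a sketch with $\varphi(x) = x^2$ and a made-up family of two-point distributions; the helper names are hypothetical). Let $X_n$ take the value $n$ with probability $1/n^2$ and $0$ otherwise, so $\mathbb{E}\, \varphi(|X_n|) = 1$ for every $n$; the choice of $M$ from the proof then bounds every tail:

```python
# Illustration of Proposition 5.3 with phi(x) = x^2.
# X_n = n with probability 1/n^2, and X_n = 0 otherwise, so that
# E phi(|X_n|) = n^2 * (1/n^2) = 1 for every n; hence c = sup_n E phi(|X_n|) = 1.

def second_moment(n):
    """E |X_n|^2 for the two-point distribution above."""
    return n**2 * (1.0 / n**2)

def tail_prob(n, M):
    """P(|X_n| > M): mass 1/n^2 sits at n, the remaining mass at 0."""
    return 1.0 / n**2 if n > M else 0.0

c = max(second_moment(n) for n in range(1, 1001))  # = 1.0

# Following the proof: given eps, choose M with phi(x) >= c/eps for x > M,
# i.e. M = sqrt(c/eps); then P(|X_n| > M) <= eps uniformly in n.
eps = 0.01
M = (c / eps) ** 0.5  # = 10.0
assert all(tail_prob(n, M) <= eps for n in range(1, 100000))
```

The point of the uniform bound is that a single $M$ works for every $n$ at once, which is exactly the tightness condition.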
Theorem 5.4 (Helly's theorem) Let $F_n$ be a sequence of distribution functions that is tight. Then there exist a subsequence $\{n_j\}$ and a distribution function $F$ such that $F_{n_j}$ converges weakly to $F$.

What could happen without tightness is that $X_n \equiv n$, so that $F_{X_n} \to 0$ pointwise; the tightness precludes this.

Proof. Let $\{q_k\}$ be an enumeration of the rationals. Since $F_n(q_k) \in [0, 1]$, any subsequence has a further subsequence that converges. Use the diagonalization procedure so that $F_{n_j}(q_k)$ converges for each $q_k$, and call the limit $F(q_k)$. This $F$ is nondecreasing on the rationals; define $F(x) = \inf_{q_k > x} F(q_k)$. Then $F$ is right continuous and nondecreasing.

If $x$ is a point of continuity of $F$ and $\varepsilon > 0$, then there exist rationals $r$ and $s$ with $r < x < s$, $F(s) - F(x) < \varepsilon$, and $F(x) - F(r) < \varepsilon$. Then
\[ F_{n_j}(x) \ge F_{n_j}(r) \to F(r) > F(x) - \varepsilon \]
and
\[ F_{n_j}(x) \le F_{n_j}(s) \to F(s) < F(x) + \varepsilon. \]
Since $\varepsilon$ is arbitrary, $F_{n_j}(x) \to F(x)$.

Since the $F_n$ are tight, there exists $M$ such that $F_n(-M) < \varepsilon$ for all $n$. Then $F(-M) \le \varepsilon$, which implies $\lim_{x \to -\infty} F(x) = 0$. Showing $\lim_{x \to \infty} F(x) = 1$ is similar. Therefore $F$ is in fact a distribution function.
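The remark about $X_n \equiv n$ can be checked directly (a small sketch; the sequence of point masses at $n$ is not tight, and its distribution functions converge pointwise to the zero function, which is not a distribution function):

```python
# The non-tight sequence X_n = n from the remark above:
# F_{X_n}(x) = 1 if x >= n, else 0 (point mass at n).

def F_Xn(n, x):
    """Distribution function of the point mass at n."""
    return 1.0 if x >= n else 0.0

# For any fixed x, F_{X_n}(x) = 0 once n > x, so the pointwise limit is
# identically 0: nondecreasing and right continuous, but never reaching 1,
# hence not a distribution function. The mass escapes to infinity.
for x in (-5.0, 0.0, 100.0, 1e6):
    assert F_Xn(10**7, x) == 0.0

# Tightness fails: for every candidate M there is some n > M with
# F_{X_n}(M) = 0, so no single M gives F_n(M) >= 1 - eps for all n.
M = 1000.0
assert F_Xn(2000, M) == 0.0
```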