Convergence Concepts of Random Variables and Functions
© 2002–2007, Professor Seppo Pynnonen, Department of Mathematics and Statistics, University of Vaasa. Version: January 5, 2007
Convergence Modes

Convergence of real numbers and real-valued functions

(a) Limit of a sequence of real numbers:

Definition 1. The real number a ∈ IR is the limit of a sequence {x_n}, denoted lim_{n→∞} x_n = a, if for any ε > 0 there is an integer n(ε) such that

|x_n − a| < ε whenever n ≥ n(ε),

where the notation n(ε) indicates that the integer may depend on ε.

(b) Limit of a sequence of functions:

Definition 2. A sequence of real-valued functions {f_n} defined on a set S ⊂ IR converges pointwise to a function f : S → IR, denoted lim f_n(x) = f(x), if for any ε > 0 and x ∈ S there is an integer n(ε, x) such that

|f_n(x) − f(x)| < ε whenever n ≥ n(ε, x),

where the notation n(ε, x) indicates that the integer may depend on both ε and x. If n(ε, x) = n(ε), i.e., if for any ε > 0 there is a single n(ε) such that |f_n(x) − f(x)| < ε for all x ∈ S whenever n ≥ n(ε), then {f_n} converges uniformly to f.

Note. In the latter case, whenever n ≥ n(ε) the graphs of f_n become indistinguishable from the graph of f.

Note. Uniform convergence implies pointwise convergence, but not vice versa.

Example. Let f_n(x) = x^n for 0 ≤ x ≤ 1. Then

f(x) = lim_{n→∞} f_n(x) = { 0 if 0 ≤ x < 1,  1 if x = 1.

Thus {f_n} converges pointwise, but not uniformly, to the limit function f with f(x) = 0 for 0 ≤ x < 1 and f(1) = 1.
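As a numeric illustration (added here, not part of the original notes), the example f_n(x) = x^n can be checked in a few lines of Python: each fixed x < 1 gives x^n → 0, while evaluating at the moving point x = 1 − 1/n shows that sup over [0, 1) of |f_n(x)| cannot shrink to 0, so the convergence is not uniform.

```python
import math

def f_n(x, n):
    """The sequence f_n(x) = x^n from the example above."""
    return x ** n

# Pointwise convergence: for each fixed x in [0, 1), f_n(x) -> 0.
for x in (0.5, 0.9, 0.99):
    print(x, f_n(x, 1000))  # all essentially 0

# Non-uniformity: at the moving point x = 1 - 1/n the value
# (1 - 1/n)^n -> exp(-1) ~ 0.368, so sup over [0, 1) of |f_n(x) - 0|
# stays at least around 1/e; no single n(eps) works for all x at once.
for n in (10, 100, 1000):
    print(n, f_n(1 - 1 / n, n))
```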
Definition 3 (Limit of a function). A point c is the limit of a function f(x) as x → p, written lim_{x→p} f(x) = c, if for any ε > 0 there exists a δ > 0 such that

|f(x) − c| < ε whenever 0 < |x − p| < δ.

Note. f need not be defined at p.

Example. It is well known that f(x) = sin x / x → 1 as x → 0.

Definition 4. A function f : S → IR is continuous at a point p ∈ S if f is defined at p and for all ε > 0 there exists a δ > 0 such that

|f(x) − f(p)| < ε whenever |x − p| < δ.

Properties of lim:

Theorem 1. Let lim x_n = a and let f be continuous at a. Then lim f(x_n) = f(lim x_n) = f(a).

Proof. For an arbitrary ε > 0 select δ > 0 (which may depend on ε) such that |f(x) − f(a)| < ε whenever |x − a| < δ. Because x_n → a, there exists an n_δ such that |x_n − a| < δ when n ≥ n_δ, and hence |f(x_n) − f(a)| < ε; i.e., lim_{n→∞} f(x_n) = f(a), and the proof is complete.
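A quick numeric check (added as an illustration) of the sin x / x example and of Theorem 1, using only Python's standard math module:

```python
import math

# lim_{x -> 0} sin(x)/x = 1: the function is undefined at x = 0 itself
# (Definition 3 does not require it to be), yet the values approach 1.
for x in (0.1, 0.01, 0.001):
    print(x, math.sin(x) / x)

# Theorem 1 in action: x_n = 1/n -> 0 and cos is continuous at 0, so
# cos(x_n) -> cos(0) = 1.
print(math.cos(1 / 10_000))  # very close to 1
```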
Examples. If lim_{n→∞} x_n = a ≠ 0, then

lim 1/x_n = 1/lim x_n = 1/a,
lim |x_n| = |lim x_n| = |a|.

Given that lim x_n and lim y_n exist, then for any b, c ∈ IR

lim(b x_n + c y_n) = b lim x_n + c lim y_n,
lim x_n y_n = (lim x_n)(lim y_n),
lim x_n / y_n = lim x_n / lim y_n, provided that lim y_n ≠ 0.
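The limit arithmetic rules above can be sanity-checked numerically; a minimal sketch with the (hypothetical) sequences x_n = 2 + 1/n (limit a = 2) and y_n = 3 − 1/n (limit 3):

```python
# Evaluate both sequences at a large index n and compare against the
# limits predicted by the arithmetic rules.
n = 10**6
x_n = 2 + 1 / n   # -> a = 2
y_n = 3 - 1 / n   # -> 3

print(2 * x_n + 3 * y_n)  # b*lim x_n + c*lim y_n = 2*2 + 3*3 = 13
print(x_n * y_n)          # (lim x_n)(lim y_n) = 6
print(x_n / y_n)          # lim x_n / lim y_n = 2/3
print(1 / x_n)            # 1 / lim x_n = 1/2
print(abs(x_n))           # |lim x_n| = 2
```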
Convergence of random variables

Suppose next that {X_n} is a sequence of random variables defined on a probability space (Ω, F, P). Thus the X_n : Ω → IR are just real-valued functions with domain Ω and values X_n(ω) ∈ IR, and all the convergence concepts defined so far apply as such. These are, however, pretty uninteresting, because they are tied to a particular outcome ω. This would be equivalent to investigating sequences {f_n(x)} of real-valued functions, and we would lose the new component, randomness. Once randomness is introduced, the interest in the convergence modes lies in the probability of those events where the sequence does not converge. If this probability diminishes to zero, it is reasonable to say that the sequence of random variables converges.

Definition 5 (Convergence with probability one, w.p.1 convergence, or almost sure (a.s.) convergence). A sequence {X_n} of random variables converges almost surely to a random variable X, denoted X_n →a.s. X (or X_n → X w.p.1), if for any ε > 0

lim_{k→∞} P({ω ∈ Ω : |X_n(ω) − X(ω)| < ε whenever n ≥ k}) = 1.

That is,

P(lim_{n→∞} X_n = X) = 1.

Note. The sets S_k = {ω ∈ Ω : |X_n − X| < ε whenever n ≥ k} form an increasing sequence, i.e., S_k ⊂ S_{k+1}, and in set notation can be written as

S_k = {ω ∈ Ω : |X_k(ω) − X(ω)| < ε and |X_{k+1}(ω) − X(ω)| < ε and ...}
    = ∩_{n=k}^∞ {ω ∈ Ω : |X_n(ω) − X(ω)| < ε}.

Thus lim_{k→∞} P(S_k) = 1, and hence lim_{k→∞} P(S_k^c) = 0; i.e., for the sets

S_k^c = Ω \ S_k = ∪_{n=k}^∞ {ω ∈ Ω : |X_n − X| ≥ ε},

on which the convergence does not occur, the probability diminishes to zero.

Note. Confirming almost sure convergence of a sequence is, unfortunately, not an easy exercise. There are many sufficient conditions. One is the so-called Borel–Cantelli lemma:

Let {X_n} be a sequence of random variables. If for every ε > 0

Σ_{n=1}^∞ P(|X_n − X| ≥ ε) < ∞,

then X_n →a.s. X.

Proof. The terms P(|X_n − X| ≥ ε) are nonnegative and, by assumption, their series converges; together these imply that the tail sums vanish:

lim_{k→∞} Σ_{n=k}^∞ P(|X_n − X| ≥ ε) = 0.

But then

lim_{k→∞} P(∩_{n=k}^∞ {|X_n − X| < ε}) = 1 − lim_{k→∞} P(∪_{n=k}^∞ {|X_n − X| ≥ ε})
  ≥ 1 − lim_{k→∞} Σ_{n=k}^∞ P(|X_n − X| ≥ ε) = 1,

and the proof is complete.

Definition 6. A sequence of random variables {X_n} converges in probability to a random variable X, denoted plim X_n = X or X_n →P X, if for all ε > 0

lim_{n→∞} P(|X_n − X| ≥ ε) = 0.

Assume X_n →a.s. X. Because

{|X_n − X| ≥ ε} ⊂ ∪_{k=n}^∞ {|X_k − X| ≥ ε},

it follows that

lim_{n→∞} P(|X_n − X| ≥ ε) ≤ lim_{n→∞} P(∪_{k=n}^∞ {|X_k − X| ≥ ε}) = 0.

Thus a.s. convergence implies convergence in probability. The converse does not hold.

Example. Let {X_n} be a sequence of independent random variables such that

P(X_n = 0) = 1 − 1/n and P(X_n = 1) = 1/n.
Then for any 0 < ε < 1,

P(|X_n| < ε) = P(X_n = 0) = 1 − 1/n → 1 as n → ∞,

so X_n →P 0. Nevertheless,

lim_{k→∞} P(∪_{n=k}^∞ {|X_n| ≥ ε}) = 1 − lim_{k→∞} P(∩_{n=k}^∞ {|X_n| < ε})
  = 1 − lim_{k→∞} ∏_{n=k}^∞ P(|X_n| < ε)   (by independence)
  = 1 − lim_{k→∞} ∏_{n=k}^∞ (1 − 1/n)
  = 1,

since ∏_{n=k}^∞ (1 − 1/n) = 0 for all k.¹ Thus X_n → 0 in probability but not almost surely.

Definition 7 (Convergence in distribution). A sequence of random variables {X_n} converges in distribution to a random variable X, denoted X_n →D X, if

lim_{n→∞} F_n(x) = F(x)

at all continuity points x of F, where F_n(x) = P(X_n ≤ x) is the cumulative distribution function (cdf) of X_n and F(x) = P(X ≤ x) is the cdf of X.

Suppose g(·) is a continuous, monotonically increasing function. Then if Y_n = g(X_n), we have

F_{Y_n}(y) = P(Y_n ≤ y) = P(g(X_n) ≤ y) = P(X_n ≤ g^{-1}(y)) = F_{X_n}(g^{-1}(y)).

Similarly, if Y = g(X), then

F_Y(y) = P(Y ≤ y) = P(g(X) ≤ y) = P(X ≤ g^{-1}(y)) = F_X(g^{-1}(y)).

¹ By telescoping, ∏_{m=2}^n (1 − 1/m) = ∏_{m=2}^n (m − 1)/m = (n − 1)!/n! = 1/n → 0 as n → ∞. More generally, ∏_{m=k}^n (1 − 1/m) = (k − 1)/n → 0 as n → ∞.
If X_n →D X, then

F_{Y_n}(y) = F_{X_n}(g^{-1}(y)) → F_X(g^{-1}(y)) = F_Y(y) as n → ∞.

That is, g(X_n) →D g(X). More generally:

Theorem 2 (Continuous Mapping Theorem). If X_n →D X and g(·) is continuous, then g(X_n) →D g(X).

Definition 8 (Convergence in quadratic mean). A sequence of random variables {X_n} with E(X_n²) < ∞ converges in quadratic mean to X (with E(X²) < ∞), denoted X_n →q.m. X, if

lim_{n→∞} E(X_n − X)² = 0.

Using the Chebyshev inequality, P(|X_n − X| ≥ ε) ≤ E(X_n − X)²/ε², we obtain directly that if X_n →q.m. X, then X_n →P X. Generally,

[X_n →q.m. X] ⇒ [X_n →P X] ⇐ [X_n →a.s. X]
               [X_n →P X] ⇒ [X_n →D X]

Figure. Relations between the convergence modes.

Furthermore:

Theorem 3. Let {X_n} and {Y_n} be sequences of random variables.

(a) If c is a constant, then X_n →D c if and only if X_n →P c.
(b) If c is a constant, then

X_n →q.m. c ⟺ lim E[X_n] = c and lim Var[X_n] = lim E(X_n − E[X_n])² = 0.

(c) If plim X_n and plim Y_n exist and a and b are constants, then

(1) plim(aX_n ± bY_n) = a plim X_n ± b plim Y_n,
(2) plim(X_n Y_n) = (plim X_n)(plim Y_n),
(3) if plim Y_n ≠ 0, then plim(X_n/Y_n) = (plim X_n)/(plim Y_n),
(4) if X_n →D X and plim(X_n − Y_n) = 0, then Y_n →D X.

(d) If X_n →D X and plim Y_n = c, a constant, then

(1) X_n ± Y_n →D X ± c,
(2) X_n Y_n →D cX, given that c ≠ 0,
(3) if c = 0, then plim X_n Y_n = 0.

We observe that plim shares the same properties as the usual lim, but the other stochastic convergence concepts do not. In particular (and unfortunately), the D-limit shares very few of these properties. For example, X_n →D X and Y_n →D Y does not imply X_n + Y_n →D X + Y. (If it did, then defining Z_n = X_n − X would give Z_n →D 0, and by property (a) convergence in distribution would become equivalent to plim.)
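Two of the claims above lend themselves to a quick simulation sketch (added here as an illustration, not part of the original notes): first, the example of a sequence converging in probability but not almost surely, with P(X_n = 1) = 1/n; second, the closing remark that convergence in distribution is not additive, using X_n = Z and Y_n = −Z for a standard normal Z.

```python
import random
import statistics

random.seed(1)  # fixed seed so the runs are reproducible

# --- (i) In probability but not almost surely: P(X_n = 1) = 1/n --------
def count_ones(p_of_n, N):
    """Number of independent indicators X_1, ..., X_N equal to 1,
    where P(X_n = 1) = p_of_n(n)."""
    return sum(1 for n in range(1, N + 1) if random.random() < p_of_n(n))

# The expected number of ones up to N is the harmonic sum ~ log N, so
# ones never stop arriving along a path (consistent with the divergent
# product computation in the example), even though P(|X_n| >= eps) = 1/n
# tends to 0, i.e. X_n ->P 0.
print(count_ones(lambda n: 1 / n, 100_000))       # of order log(100000) ~ 12

# Contrast: with P(X_n = 1) = 1/n^2 the Borel-Cantelli condition
# sum_n P(|X_n| >= eps) < infinity holds, so only finitely many ones occur.
print(count_ones(lambda n: 1 / n ** 2, 100_000))  # small finite count

# --- (ii) ->D is not additive: X_n = Z, Y_n = -Z ------------------------
z = [random.gauss(0, 1) for _ in range(50_000)]   # draws of Z ~ N(0, 1)
x, y = z, [-v for v in z]                         # both ->D N(0, 1) by symmetry
print(statistics.stdev(x), statistics.stdev(y))   # both ~ 1
s = [a + b for a, b in zip(x, y)]
print(max(abs(v) for v in s))   # exactly 0.0: the sum degenerates, it is
                                # not the naive "sum of two normals"
```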