Elements of Probability Theory


CHUNG-MING KUAN
Department of Finance
National Taiwan University

December 5, 2009

Lecture Outline

1. Probability Space and Random Variables
   - Probability Space
   - Random Variables
   - Moments and Norms
2. Conditional Distributions and Moments
   - Conditional Distributions
   - Conditional Moments
3. Modes of Convergence
   - Almost Sure Convergence
   - Convergence in Probability
   - Convergence in Distribution
4. Stochastic Order Notations

Lecture Outline (cont'd)

5. Law of Large Numbers
6. Central Limit Theorem
7. Stochastic Processes
   - Brownian Motion
   - Weak Convergence
8. Functional Central Limit Theorem

Probability Space and σ-Algebra

A probability space is a triplet $(\Omega, \mathcal{F}, \mathrm{IP})$, where
1. $\Omega$ is the outcome space, whose elements $\omega$ are outcomes of the random experiment,
2. $\mathcal{F}$ is a σ-algebra, a collection of subsets of $\Omega$,
3. $\mathrm{IP}$ is a probability measure assigned to the elements in $\mathcal{F}$.

$\mathcal{F}$ is a σ-algebra if
1. $\Omega \in \mathcal{F}$,
2. if $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$,
3. if $A_1, A_2, \ldots \in \mathcal{F}$, then $\bigcup_{n=1}^{\infty} A_n \in \mathcal{F}$.

By (2), $\Omega^c = \emptyset \in \mathcal{F}$. By de Morgan's law,
$$\Big(\bigcup_{n=1}^{\infty} A_n\Big)^c = \bigcap_{n=1}^{\infty} A_n^c \in \mathcal{F}.$$

Probability Measure

$\mathrm{IP}\colon \mathcal{F} \to [0, 1]$ is a real-valued set function such that
1. $\mathrm{IP}(\Omega) = 1$,
2. $\mathrm{IP}(A) \ge 0$ for all $A \in \mathcal{F}$,
3. if $A_1, A_2, \ldots \in \mathcal{F}$ are disjoint, then $\mathrm{IP}\big(\bigcup_{n=1}^{\infty} A_n\big) = \sum_{n=1}^{\infty} \mathrm{IP}(A_n)$.

Consequently, $\mathrm{IP}(\emptyset) = 0$, $\mathrm{IP}(A^c) = 1 - \mathrm{IP}(A)$, $\mathrm{IP}(A) \le \mathrm{IP}(B)$ if $A \subseteq B$, and $\mathrm{IP}(A \cup B) = \mathrm{IP}(A) + \mathrm{IP}(B) - \mathrm{IP}(A \cap B)$.

If $\{A_n\}$ is an increasing (decreasing) sequence in $\mathcal{F}$ with the limiting set $A$, then $\lim_{n \to \infty} \mathrm{IP}(A_n) = \mathrm{IP}(A)$.

Borel Field

Let $\mathcal{C}$ be a collection of subsets of $\Omega$. The σ-algebra generated by $\mathcal{C}$, $\sigma(\mathcal{C})$, is the intersection of all σ-algebras that contain $\mathcal{C}$ and hence the smallest σ-algebra containing $\mathcal{C}$.

When $\Omega = \mathbb{R}$, the Borel field, $\mathcal{B}$, is the σ-algebra generated by all open intervals $(a, b)$ in $\mathbb{R}$. Note that $(a, b)$, $[a, b]$, $(a, b]$, and $(-\infty, b]$ can be obtained from each other by taking complements, unions and/or intersections. For example,
$$(a, b] = \bigcap_{n=1}^{\infty} \Big(a,\, b + \frac{1}{n}\Big), \qquad (a, b) = \bigcup_{n=1}^{\infty} \Big(a,\, b - \frac{1}{n}\Big].$$
Thus, the collection of all open intervals (closed intervals, half-open intervals, or half-lines) generates the same Borel field.

The Borel field on $\mathbb{R}^d$, $\mathcal{B}^d$, is generated by all open hypercubes:
$$(a_1, b_1) \times (a_2, b_2) \times \cdots \times (a_d, b_d).$$
$\mathcal{B}^d$ can also be generated by all closed hypercubes:
$$[a_1, b_1] \times [a_2, b_2] \times \cdots \times [a_d, b_d],$$
or by $(-\infty, b_1] \times (-\infty, b_2] \times \cdots \times (-\infty, b_d]$. The elements of the Borel field $\mathcal{B}^d$ are called Borel sets.

Random Variable

A random variable $z$ defined on $(\Omega, \mathcal{F}, \mathrm{IP})$ is a function $z\colon \Omega \to \mathbb{R}$ such that for every $B$ in the Borel field $\mathcal{B}$, its inverse image is in $\mathcal{F}$:
$$z^{-1}(B) = \{\omega\colon z(\omega) \in B\} \in \mathcal{F}.$$
That is, $z$ is an $\mathcal{F}/\mathcal{B}$-measurable (or simply $\mathcal{F}$-measurable) function. Given $\omega$, the resulting value $z(\omega)$ is known as a realization of $z$.

An $\mathbb{R}^d$-valued random variable (random vector) $z$ defined on $(\Omega, \mathcal{F}, \mathrm{IP})$ is a function $z\colon \Omega \to \mathbb{R}^d$ such that for every $B \in \mathcal{B}^d$, $z^{-1}(B) = \{\omega\colon z(\omega) \in B\} \in \mathcal{F}$; i.e., $z$ is an $\mathcal{F}/\mathcal{B}^d$-measurable function.

Borel Measurable

All the inverse images of a random vector $z$, $z^{-1}(B)$, form a σ-algebra, denoted as $\sigma(z)$. It is known as the σ-algebra generated by $z$, or the information set associated with $z$. It is the smallest sub-σ-algebra of $\mathcal{F}$ with respect to which $z$ is measurable.

A function $g\colon \mathbb{R} \to \mathbb{R}$ is $\mathcal{B}$-measurable, or Borel measurable, if $\{\zeta \in \mathbb{R}\colon g(\zeta) \le b\} \in \mathcal{B}$ for every $b \in \mathbb{R}$.

For a random variable $z$ defined on $(\Omega, \mathcal{F}, \mathrm{IP})$ and a Borel measurable function $g(\cdot)$, $g(z)$ is also a random variable defined on $(\Omega, \mathcal{F}, \mathrm{IP})$. The same conclusion holds for a $d$-dimensional random vector $z$ and a $\mathcal{B}^d$-measurable function $g(\cdot)$.

Distribution Function

The joint distribution function of $z$ is a non-decreasing, right-continuous function $F_z$ such that for $\zeta = (\zeta_1, \ldots, \zeta_d)' \in \mathbb{R}^d$,
$$F_z(\zeta) = \mathrm{IP}\{\omega \in \Omega\colon z_1(\omega) \le \zeta_1, \ldots, z_d(\omega) \le \zeta_d\},$$
with
$$\lim_{\zeta_1, \ldots, \zeta_d \to -\infty} F_z(\zeta) = 0, \qquad \lim_{\zeta_1, \ldots, \zeta_d \to \infty} F_z(\zeta) = 1.$$

The marginal distribution function of the $i$th component of $z$ is
$$F_{z_i}(\zeta_i) = \mathrm{IP}\{\omega \in \Omega\colon z_i(\omega) \le \zeta_i\} = F_z(\infty, \ldots, \infty, \zeta_i, \infty, \ldots, \infty).$$

Independence

$y$ and $z$ are (pairwise) independent iff for any Borel sets $B_1$ and $B_2$,
$$\mathrm{IP}(y \in B_1 \text{ and } z \in B_2) = \mathrm{IP}(y \in B_1)\, \mathrm{IP}(z \in B_2).$$

A sequence of random variables $\{z_i\}$ is totally independent if
$$\mathrm{IP}\Big(\bigcap_{\text{all } i} \{z_i \in B_i\}\Big) = \prod_{\text{all } i} \mathrm{IP}(z_i \in B_i).$$

Lemma 5.1. Let $\{z_i\}$ be a sequence of independent random variables and $h_i$, $i = 1, 2, \ldots$, be Borel measurable functions. Then $\{h_i(z_i)\}$ is also a sequence of independent random variables.

Expectation

The expectation of $z_i$ is the Lebesgue integral of $z_i$ with respect to $\mathrm{IP}$:
$$\mathrm{IE}(z_i) = \int_{\Omega} z_i(\omega) \,\mathrm{d}\mathrm{IP}(\omega).$$
In terms of its distribution function,
$$\mathrm{IE}(z_i) = \int_{\mathbb{R}^d} \zeta_i \,\mathrm{d}F_z(\zeta) = \int_{\mathbb{R}} \zeta_i \,\mathrm{d}F_{z_i}(\zeta_i).$$

For a Borel measurable function $g(\cdot)$ of $z$,
$$\mathrm{IE}[g(z)] = \int_{\Omega} g(z(\omega)) \,\mathrm{d}\mathrm{IP}(\omega) = \int_{\mathbb{R}^d} g(\zeta) \,\mathrm{d}F_z(\zeta).$$
For example, the covariance matrix of $z$ is $\mathrm{var}(z) = \mathrm{IE}\{[z - \mathrm{IE}(z)][z - \mathrm{IE}(z)]'\}$, which reduces to $\mathrm{IE}(zz')$ when $\mathrm{IE}(z) = 0$.

A function $g$ is convex on a set $S$ if for any $a \in [0, 1]$ and any $x, y$ in $S$,
$$g(ax + (1 - a)y) \le a\, g(x) + (1 - a)\, g(y);$$
$g$ is concave on $S$ if the inequality above is reversed.

Lemma 5.2 (Jensen). Let $g$ be a convex function on the support of $z$. For an integrable random variable $z$ such that $g(z)$ is integrable,
$$g(\mathrm{IE}(z)) \le \mathrm{IE}[g(z)];$$
the inequality reverses if $g$ is concave.

$L_p$-Norm

For a random variable $z$ with finite $p$th moment, its $L_p$-norm is
$$\|z\|_p = \big[\mathrm{IE}|z|^p\big]^{1/p}.$$
The inner product of square integrable random variables $z_i$ and $z_j$ is $\langle z_i, z_j \rangle = \mathrm{IE}(z_i z_j)$. The $L_2$-norm of $z_i$ can be obtained as $\|z_i\|_2 = \langle z_i, z_i \rangle^{1/2}$.

For any $c > 0$ and $p > 0$, note that
$$c^p\, \mathrm{IP}(|z| \ge c) = c^p \int \mathbf{1}_{\{\zeta\colon |\zeta| \ge c\}} \,\mathrm{d}F_z(\zeta) \le \int_{\{\zeta\colon |\zeta| \ge c\}} |\zeta|^p \,\mathrm{d}F_z(\zeta) \le \mathrm{IE}|z|^p,$$
where $\mathbf{1}_A$ is the indicator function of the event $A$.

Inequalities

Lemma 5.3 (Markov). Let $z$ be a random variable with finite $p$th moment. Then,
$$\mathrm{IP}(|z| \ge c) \le \frac{\mathrm{IE}|z|^p}{c^p},$$
where $c$ is a positive real number. For $p = 2$, Markov's inequality is also known as Chebyshev's inequality.

Markov's inequality is trivial if $c$ is so small that $\mathrm{IE}|z|^p / c^p > 1$. As $c$ becomes large, the probability that $z$ assumes very extreme values vanishes at the rate $c^{-p}$.
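As a quick sanity check, the sketch below (my own addition, not from the slides; the sample size and cutoffs are arbitrary) compares the empirical tail probability $\mathrm{IP}(|z| \ge c)$ for standard normal draws with the Chebyshev bound $\mathrm{IE}|z|^2/c^2$.

```python
# Monte Carlo check of Markov/Chebyshev (p = 2): an illustrative sketch,
# not part of the original lecture. The bound holds but can be loose.
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(1_000_000)

for c in (1.0, 2.0, 3.0):
    tail = np.mean(np.abs(z) >= c)            # empirical IP(|z| >= c)
    bound = np.mean(np.abs(z) ** 2) / c ** 2  # IE|z|^2 / c^2
    print(f"c = {c}: tail {tail:.4f} <= bound {bound:.4f}")
```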

Lemma 5.4 (Hölder). Let $y$ be a random variable with finite $p$th moment ($p > 1$) and $z$ a random variable with finite $q$th moment ($q = p/(p-1)$). Then,
$$\mathrm{IE}|yz| \le \|y\|_p\, \|z\|_q.$$

Since $|\mathrm{IE}(yz)| \le \mathrm{IE}|yz|$, we also have:

Lemma 5.5 (Cauchy-Schwarz). Let $y$ and $z$ be two square integrable random variables. Then,
$$|\mathrm{IE}(yz)| \le \|y\|_2\, \|z\|_2.$$

Let $y = 1$ and $x = |z|^p$. For $q > p$ and $r = q/p$, Hölder's inequality gives
$$\mathrm{IE}|z|^p = \mathrm{IE}|xy| \le \|x\|_r\, \|y\|_{r/(r-1)} = \big[\mathrm{IE}(|z|^{pr})\big]^{1/r} = \big[\mathrm{IE}(|z|^q)\big]^{p/q}.$$
This proves:

Lemma 5.6 (Liapunov). Let $z$ be a random variable with finite $q$th moment. Then for $p < q$, $\|z\|_p \le \|z\|_q$.

Lemma 5.7 (Minkowski). Let $z_i$, $i = 1, \ldots, n$, be random variables with finite $p$th moment ($p \ge 1$). Then,
$$\Big\|\sum_{i=1}^{n} z_i\Big\|_p \le \sum_{i=1}^{n} \|z_i\|_p.$$
When $n = 2$, this is just the triangle inequality for $L_p$-norms.
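A quick numerical illustration of Liapunov's inequality (my own sketch; the exponential distribution and the moment pairs are arbitrary choices):

```python
# Check ||z||_p <= ||z||_q for p < q on simulated draws; illustrative only.
import numpy as np

rng = np.random.default_rng(1)
z = rng.exponential(scale=1.0, size=1_000_000)

def lp_norm(x, p):
    # L_p norm: [IE|x|^p]^(1/p), estimated by the sample mean
    return np.mean(np.abs(x) ** p) ** (1.0 / p)

for p, q in [(1.0, 2.0), (2.0, 4.0), (1.5, 3.0)]:
    print(f"||z||_{p} = {lp_norm(z, p):.4f} <= ||z||_{q} = {lp_norm(z, q):.4f}")
```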

Conditional Distributions

Given $A, B \in \mathcal{F}$, suppose we know $B$ has occurred. With the outcome space restricted to $B$, the likelihood of $A$ is characterized by the conditional probability
$$\mathrm{IP}(A \mid B) = \mathrm{IP}(A \cap B)/\mathrm{IP}(B).$$

The conditional density function of $z$ given $y = \eta$ is
$$f_{z \mid y}(\zeta \mid y = \eta) = \frac{f_{z,y}(\zeta, \eta)}{f_y(\eta)}.$$
$f_{z \mid y}(\zeta \mid y = \eta)$ is clearly non-negative. Also,
$$\int_{\mathbb{R}^d} f_{z \mid y}(\zeta \mid y = \eta) \,\mathrm{d}\zeta = \frac{1}{f_y(\eta)} \int_{\mathbb{R}^d} f_{z,y}(\zeta, \eta) \,\mathrm{d}\zeta = \frac{1}{f_y(\eta)}\, f_y(\eta) = 1.$$
That is, $f_{z \mid y}(\zeta \mid y = \eta)$ is a legitimate density function.

Given the conditional density function $f_{z \mid y}$, for $A \in \mathcal{B}^d$,
$$\mathrm{IP}(z \in A \mid y = \eta) = \int_A f_{z \mid y}(\zeta \mid y = \eta) \,\mathrm{d}\zeta.$$
This probability is defined even when $\mathrm{IP}(y = \eta)$ is zero.

When $A = (-\infty, \zeta_1] \times \cdots \times (-\infty, \zeta_d]$, the conditional distribution function is
$$F_{z \mid y}(\zeta \mid y = \eta) = \mathrm{IP}(z_1 \le \zeta_1, \ldots, z_d \le \zeta_d \mid y = \eta).$$

When $z$ and $y$ are independent, the conditional density (distribution) reduces to the unconditional density (distribution).

Let $\mathcal{G}$ be a sub-σ-algebra of $\mathcal{F}$. The conditional expectation $\mathrm{IE}(z \mid \mathcal{G})$ is the integrable, $\mathcal{G}$-measurable random variable satisfying
$$\int_G \mathrm{IE}(z \mid \mathcal{G}) \,\mathrm{d}\mathrm{IP} = \int_G z \,\mathrm{d}\mathrm{IP}, \qquad \forall G \in \mathcal{G}.$$

Suppose that $\mathcal{G}$ is the trivial σ-algebra $\{\Omega, \emptyset\}$. Then $\mathrm{IE}(z \mid \mathcal{G})$ must be a constant $c$, so that
$$\mathrm{IE}(z) = \int_{\Omega} z \,\mathrm{d}\mathrm{IP} = \int_{\Omega} c \,\mathrm{d}\mathrm{IP} = c.$$

Consider $\mathcal{G} = \sigma(y)$, the σ-algebra generated by $y$. Then $\mathrm{IE}(z \mid y) = \mathrm{IE}[z \mid \sigma(y)]$, which is interpreted as the prediction of $z$ given all the information associated with $y$.

By definition,
$$\mathrm{IE}[\mathrm{IE}(z \mid \mathcal{G})] = \int_{\Omega} \mathrm{IE}(z \mid \mathcal{G}) \,\mathrm{d}\mathrm{IP} = \int_{\Omega} z \,\mathrm{d}\mathrm{IP} = \mathrm{IE}(z).$$

Lemma 5.9 (Law of Iterated Expectations). Let $\mathcal{G}$ and $\mathcal{H}$ be two sub-σ-algebras of $\mathcal{F}$ such that $\mathcal{G} \subseteq \mathcal{H}$. Then for the integrable random vector $z$,
$$\mathrm{IE}[\mathrm{IE}(z \mid \mathcal{H}) \mid \mathcal{G}] = \mathrm{IE}[\mathrm{IE}(z \mid \mathcal{G}) \mid \mathcal{H}] = \mathrm{IE}(z \mid \mathcal{G}).$$
That is, only the smaller σ-algebra matters in iterated conditional expectations.

If $z$ is $\mathcal{G}$-measurable, then $\mathrm{IE}[g(z) x \mid \mathcal{G}] = g(z)\, \mathrm{IE}(x \mid \mathcal{G})$ with probability 1.

Lemma 5.11. Let $z$ be a square integrable random variable. Then
$$\mathrm{IE}[z - \mathrm{IE}(z \mid \mathcal{G})]^2 \le \mathrm{IE}(z - \tilde{z})^2,$$
for any $\mathcal{G}$-measurable random variable $\tilde{z}$.

Proof: For any square integrable, $\mathcal{G}$-measurable random variable $\tilde{z}$,
$$\mathrm{IE}\big([z - \mathrm{IE}(z \mid \mathcal{G})]\, \tilde{z}\big) = \mathrm{IE}\big([\mathrm{IE}(z \mid \mathcal{G}) - \mathrm{IE}(z \mid \mathcal{G})]\, \tilde{z}\big) = 0.$$
It follows that
$$\mathrm{IE}(z - \tilde{z})^2 = \mathrm{IE}[z - \mathrm{IE}(z \mid \mathcal{G}) + \mathrm{IE}(z \mid \mathcal{G}) - \tilde{z}]^2 = \mathrm{IE}[z - \mathrm{IE}(z \mid \mathcal{G})]^2 + \mathrm{IE}[\mathrm{IE}(z \mid \mathcal{G}) - \tilde{z}]^2 \ge \mathrm{IE}[z - \mathrm{IE}(z \mid \mathcal{G})]^2.$$
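The following simulation (my own construction, with an arbitrary model $z = y + u$ so that $\mathrm{IE}(z \mid y) = y$) illustrates Lemma 5.11: no other function of $y$ attains a smaller mean squared prediction error.

```python
# Conditional mean as the best L2 predictor: an illustrative sketch.
import numpy as np

rng = np.random.default_rng(2)
y = rng.standard_normal(500_000)
z = y + rng.standard_normal(500_000)   # by construction IE(z | y) = y

predictors = {"IE(z|y) = y": y, "0.5*y": 0.5 * y, "y**2": y ** 2}
for label, pred in predictors.items():
    print(f"MSE of {label}: {np.mean((z - pred) ** 2):.4f}")
```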

The conditional variance-covariance matrix of $z$ given $y$ is
$$\mathrm{var}(z \mid y) = \mathrm{IE}\big([z - \mathrm{IE}(z \mid y)][z - \mathrm{IE}(z \mid y)]' \,\big|\, y\big) = \mathrm{IE}(zz' \mid y) - \mathrm{IE}(z \mid y)\, \mathrm{IE}(z \mid y)',$$
which leads to the analysis-of-variance decomposition:
$$\mathrm{var}(z) = \mathrm{IE}[\mathrm{var}(z \mid y)] + \mathrm{var}\big(\mathrm{IE}(z \mid y)\big).$$

Example 5.12: Suppose that
$$\begin{bmatrix} y \\ x \end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix} \mu_y \\ \mu_x \end{bmatrix},\ \begin{bmatrix} \Sigma_{yy} & \Sigma_{xy}' \\ \Sigma_{xy} & \Sigma_{xx} \end{bmatrix}\right).$$
Then,
$$\mathrm{IE}(y \mid x) = \mu_y + \Sigma_{xy}' \Sigma_{xx}^{-1} (x - \mu_x),$$
$$\mathrm{var}(y \mid x) = \mathrm{var}(y) - \mathrm{var}\big(\mathrm{IE}(y \mid x)\big) = \Sigma_{yy} - \Sigma_{xy}' \Sigma_{xx}^{-1} \Sigma_{xy}.$$
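Example 5.12 can be checked by simulation. The bivariate sketch below (my own illustration; the mean vector and covariance matrix are arbitrary) compares the theoretical conditional variance $\Sigma_{yy} - \Sigma_{xy}^2/\Sigma_{xx}$ with the sample variance of $y - \mathrm{IE}(y \mid x)$.

```python
# Bivariate normal: verify IE(y|x) and var(y|x) of Example 5.12 numerically.
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.8],   # [[S_yy, S_xy],
                  [0.8, 1.0]])  #  [S_xy, S_xx]]
y, x = rng.multivariate_normal(mu, Sigma, size=1_000_000).T

beta = Sigma[0, 1] / Sigma[1, 1]             # S_xy / S_xx
resid = y - (mu[0] + beta * (x - mu[1]))     # y - IE(y | x)
print("theoretical var(y|x):", Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1])
print("simulated   var(y|x):", resid.var())
```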

Almost Sure Convergence

A sequence of random variables $\{z_n(\cdot)\}_{n=1,2,\ldots}$ is such that, for a given $\omega$, $z_n(\omega)$ is a realization of the random element with index $n$, and, for a given $n$, $z_n(\cdot)$ is a random variable.

Almost Sure Convergence. Suppose $\{z_n\}$ and $z$ are all defined on $(\Omega, \mathcal{F}, \mathrm{IP})$. $\{z_n\}$ is said to converge to $z$ almost surely if, and only if,
$$\mathrm{IP}(\omega\colon z_n(\omega) \to z(\omega) \text{ as } n \to \infty) = 1,$$
denoted as $z_n \xrightarrow{\text{a.s.}} z$ or $z_n \to z$ a.s. (with probability 1).

Lemma 5.13. Let $g\colon \mathbb{R} \to \mathbb{R}$ be a function continuous on $S_g \subseteq \mathbb{R}$. If $z_n \xrightarrow{\text{a.s.}} z$, where $z$ is a random variable such that $\mathrm{IP}(z \in S_g) = 1$, then $g(z_n) \xrightarrow{\text{a.s.}} g(z)$.

Proof: Let $\Omega_0 = \{\omega\colon z_n(\omega) \to z(\omega)\}$ and $\Omega_1 = \{\omega\colon z(\omega) \in S_g\}$. For $\omega \in (\Omega_0 \cap \Omega_1)$, continuity of $g$ ensures that $g(z_n(\omega)) \to g(z(\omega))$. Note that $(\Omega_0 \cap \Omega_1)^c = \Omega_0^c \cup \Omega_1^c$, which has probability zero because $\mathrm{IP}(\Omega_0^c) = \mathrm{IP}(\Omega_1^c) = 0$. (Why?) It follows that $\Omega_0 \cap \Omega_1$ has probability one, showing that $g(z_n) \to g(z)$ with probability one.

Convergence in Probability

Convergence in Probability. $\{z_n\}$ is said to converge to $z$ in probability if for every $\epsilon > 0$,
$$\lim_{n \to \infty} \mathrm{IP}(\omega\colon |z_n(\omega) - z(\omega)| > \epsilon) = 0,$$
or equivalently, $\lim_{n \to \infty} \mathrm{IP}(\omega\colon |z_n(\omega) - z(\omega)| \le \epsilon) = 1$. This is denoted as $z_n \xrightarrow{\mathrm{IP}} z$ or $z_n \to z$ in probability.

Note: In this definition, the events $\Omega_n(\epsilon) = \{\omega\colon |z_n(\omega) - z(\omega)| \le \epsilon\}$ may vary with $n$, and convergence refers to the probabilities of such events, $p_n = \mathrm{IP}(\Omega_n(\epsilon))$, rather than to the random variables $z_n$ themselves.

Almost sure convergence implies convergence in probability. To see this, let $\Omega_0$ denote the set of $\omega$ such that $z_n(\omega) \to z(\omega)$. For $\omega \in \Omega_0$, there is some $m$ such that $\omega$ is in $\Omega_n(\epsilon)$ for all $n > m$. That is,
$$\Omega_0 \subseteq \bigcup_{m=1}^{\infty} \bigcap_{n=m}^{\infty} \Omega_n(\epsilon) \in \mathcal{F}.$$
As $\bigcap_{n=m}^{\infty} \Omega_n(\epsilon)$ is non-decreasing in $m$, it follows that
$$\mathrm{IP}(\Omega_0) \le \mathrm{IP}\Big(\bigcup_{m=1}^{\infty} \bigcap_{n=m}^{\infty} \Omega_n(\epsilon)\Big) = \lim_{m \to \infty} \mathrm{IP}\Big(\bigcap_{n=m}^{\infty} \Omega_n(\epsilon)\Big) \le \lim_{m \to \infty} \mathrm{IP}\big(\Omega_m(\epsilon)\big).$$
Since $\mathrm{IP}(\Omega_0) = 1$, we have $\lim_{m \to \infty} \mathrm{IP}(\Omega_m(\epsilon)) = 1$, i.e., $z_n \xrightarrow{\mathrm{IP}} z$.

Example 5.15. Let $\Omega = [0, 1]$ and $\mathrm{IP}$ be the Lebesgue measure. Consider the sequence of intervals $\{I_n\}$ in $[0, 1]$:
$$[0, 1/2),\ [1/2, 1],\ [0, 1/3),\ [1/3, 2/3),\ [2/3, 1],\ \ldots,$$
and let $z_n = \mathbf{1}_{I_n}$. As $n$ tends to infinity, $I_n$ shrinks toward a singleton. For $0 < \epsilon < 1$, we have
$$\mathrm{IP}(|z_n| > \epsilon) = \mathrm{IP}(I_n) \to 0,$$
which shows $z_n \xrightarrow{\mathrm{IP}} 0$. On the other hand, each $\omega \in [0, 1]$ must be covered by infinitely many intervals, so that $z_n(\omega) = 1$ for infinitely many $n$. This shows that $z_n(\omega)$ does not converge to zero.

Note: Convergence in probability permits $z_n$ to deviate from the probability limit infinitely often, but almost sure convergence does not, except for those $\omega$ in a set of probability zero.
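The "sliding intervals" of Example 5.15 are easy to reproduce. In the sketch below (my own illustration; endpoint conventions are simplified), the interval lengths, i.e. $\mathrm{IP}(|z_n| > \epsilon)$, shrink to zero, yet a fixed $\omega$ is hit by one interval in every partition, so $z_n(\omega) = 1$ infinitely often.

```python
# Example 5.15: convergence in probability without a.s. convergence.
import numpy as np

# Partitions of [0,1] into m equal pieces, m = 2, 3, ...; each piece is one
# interval I_n (all pieces taken half-open for simplicity).
intervals = [(k / m, (k + 1) / m) for m in range(2, 200) for k in range(m)]

omega = 0.3                                   # a fixed outcome in [0, 1]
hits = sum(a <= omega < b for a, b in intervals)
lengths = [b - a for a, b in intervals]

print("last few IP(|z_n| > eps):", [round(l, 4) for l in lengths[-3:]])
print(f"z_n(omega) = 1 for {hits} of {len(intervals)} indices n")
```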

Lemma 5.16. Let $\{z_n\}$ be a sequence of square integrable random variables. If $\mathrm{IE}(z_n) \to c$ and $\mathrm{var}(z_n) \to 0$, then $z_n \xrightarrow{\mathrm{IP}} c$.

Lemma 5.17. Let $g\colon \mathbb{R} \to \mathbb{R}$ be a function continuous on $S_g \subseteq \mathbb{R}$. If $z_n \xrightarrow{\mathrm{IP}} z$, where $z$ is a random variable such that $\mathrm{IP}(z \in S_g) = 1$, then $g(z_n) \xrightarrow{\mathrm{IP}} g(z)$.

Proof: By the continuity of $g$, for each $\epsilon > 0$ we can find a $\delta > 0$ such that
$$\{\omega\colon |z_n(\omega) - z(\omega)| \le \delta\} \cap \{\omega\colon z(\omega) \in S_g\} \subseteq \{\omega\colon |g(z_n(\omega)) - g(z(\omega))| \le \epsilon\}.$$
Taking complements of both sides, we have
$$\mathrm{IP}(|g(z_n) - g(z)| > \epsilon) \le \mathrm{IP}(|z_n - z| > \delta) \to 0.$$

Lemma 5.13 and Lemma 5.17 are readily generalized to $\mathbb{R}^d$-valued random variables. For instance, $z_n \xrightarrow{\text{a.s.}} z$ ($z_n \xrightarrow{\mathrm{IP}} z$) implies
$$z_{1,n} + z_{2,n} \xrightarrow{\text{a.s.}\ (\mathrm{IP})} z_1 + z_2, \qquad z_{1,n}\, z_{2,n} \xrightarrow{\text{a.s.}\ (\mathrm{IP})} z_1 z_2, \qquad z_{1,n}^2 + z_{2,n}^2 \xrightarrow{\text{a.s.}\ (\mathrm{IP})} z_1^2 + z_2^2,$$
where $z_{1,n}, z_{2,n}$ are two elements of $z_n$ and $z_1, z_2$ are the corresponding elements of $z$. Also, provided that $z_2 \ne 0$ with probability one,
$$z_{1,n}/z_{2,n} \xrightarrow{\text{a.s.}\ (\mathrm{IP})} z_1/z_2.$$

Convergence in Distribution

Convergence in Distribution. $\{z_n\}$ is said to converge to $z$ in distribution, denoted as $z_n \xrightarrow{D} z$, if
$$\lim_{n \to \infty} F_{z_n}(\zeta) = F_z(\zeta)$$
for every continuity point $\zeta$ of $F_z$. We also say that $z_n$ is asymptotically distributed as $F_z$, denoted as $z_n \stackrel{A}{\sim} F_z$; $F_z$ is thus known as the limiting distribution of $z_n$.

Cramér-Wold Device. Let $\{z_n\}$ be a sequence of random vectors in $\mathbb{R}^d$. Then $z_n \xrightarrow{D} z$ if and only if $\alpha' z_n \xrightarrow{D} \alpha' z$ for every $\alpha \in \mathbb{R}^d$ such that $\alpha' \alpha = 1$.

Lemma 5.19. If $z_n \xrightarrow{\mathrm{IP}} z$, then $z_n \xrightarrow{D} z$. For a constant $c$, $z_n \xrightarrow{\mathrm{IP}} c$ iff $z_n \xrightarrow{D} c$.

Proof: For some arbitrary $\epsilon > 0$ and a continuity point $\zeta$ of $F_z$, we have
$$\mathrm{IP}(z_n \le \zeta) = \mathrm{IP}(\{z_n \le \zeta\} \cap \{|z_n - z| \le \epsilon\}) + \mathrm{IP}(\{z_n \le \zeta\} \cap \{|z_n - z| > \epsilon\}) \le \mathrm{IP}(z \le \zeta + \epsilon) + \mathrm{IP}(|z_n - z| > \epsilon).$$
Similarly, $\mathrm{IP}(z \le \zeta - \epsilon) \le \mathrm{IP}(z_n \le \zeta) + \mathrm{IP}(|z_n - z| > \epsilon)$. If $z_n \xrightarrow{\mathrm{IP}} z$, then by passing to the limit and noting that $\epsilon$ is arbitrary,
$$\lim_{n \to \infty} \mathrm{IP}(z_n \le \zeta) = \mathrm{IP}(z \le \zeta).$$
That is, $F_{z_n}(\zeta) \to F_z(\zeta)$. The converse is not true in general, however.

Theorem 5.20 (Continuous Mapping Theorem). Let $g\colon \mathbb{R} \to \mathbb{R}$ be a function continuous on $\mathbb{R}$, except for at most countably many points. If $z_n \xrightarrow{D} z$, then $g(z_n) \xrightarrow{D} g(z)$. For example, $z_n \xrightarrow{D} \mathcal{N}(0, 1)$ implies $z_n^2 \xrightarrow{D} \chi^2(1)$.

Theorem 5.21. Let $\{y_n\}$ and $\{z_n\}$ be two sequences of random vectors such that $y_n - z_n \xrightarrow{\mathrm{IP}} 0$. If $z_n \xrightarrow{D} z$, then $y_n \xrightarrow{D} z$.

Theorem 5.22. If $y_n$ converges in probability to a constant $c$ and $z_n$ converges in distribution to $z$, then $y_n + z_n \xrightarrow{D} c + z$, $y_n z_n \xrightarrow{D} cz$, and $z_n / y_n \xrightarrow{D} z/c$ if $c \ne 0$.

Non-Stochastic Order Notations

Order notations are used to describe the behavior of real sequences.
- $b_n$ is (at most) of order $c_n$, denoted as $b_n = O(c_n)$, if there exists a $\Delta < \infty$ such that $|b_n / c_n| \le \Delta$ for all sufficiently large $n$.
- $b_n$ is of smaller order than $c_n$, denoted as $b_n = o(c_n)$, if $b_n / c_n \to 0$.

An $O(1)$ sequence is bounded; an $o(1)$ sequence converges to zero. The product of an $O(1)$ sequence and an $o(1)$ sequence is $o(1)$.

Theorem 5.23.
(a) If $a_n = O(n^r)$ and $b_n = O(n^s)$, then $a_n b_n = O(n^{r+s})$ and $a_n + b_n = O(n^{\max(r,s)})$.
(b) If $a_n = o(n^r)$ and $b_n = o(n^s)$, then $a_n b_n = o(n^{r+s})$ and $a_n + b_n = o(n^{\max(r,s)})$.
(c) If $a_n = O(n^r)$ and $b_n = o(n^s)$, then $a_n b_n = o(n^{r+s})$ and $a_n + b_n = O(n^{\max(r,s)})$.

Stochastic Order Notations

The order notations defined earlier extend easily to describe the behavior of sequences of random variables.
- $\{z_n\}$ is $O_{\text{a.s.}}(c_n)$ (or $O(c_n)$ almost surely) if $z_n / c_n$ is $O(1)$ a.s.
- $\{z_n\}$ is $O_{\mathrm{IP}}(c_n)$ (or $O(c_n)$ in probability) if for every $\epsilon > 0$, there is some $\Delta < \infty$ such that $\mathrm{IP}(|z_n / c_n| \ge \Delta) \le \epsilon$ for all $n$ sufficiently large.

Theorem 5.23 holds for stochastic order notations as well. For example, if $y_n = O_{\mathrm{IP}}(1)$ and $z_n = o_{\mathrm{IP}}(1)$, then $y_n z_n$ is $o_{\mathrm{IP}}(1)$.

It is very restrictive to require a random variable to be bounded almost surely, but a well-defined random variable is typically bounded in probability, i.e., $O_{\mathrm{IP}}(1)$.

Let $\{z_n\}$ be a sequence of random variables such that $z_n \xrightarrow{D} z$, and let $\zeta$ be a continuity point of $F_z$. For any $\epsilon > 0$, we can choose a sufficiently large $\zeta$ such that $\mathrm{IP}(|z| > \zeta) < \epsilon/2$. As $z_n \xrightarrow{D} z$, we can also choose $n$ large enough such that
$$\big|\mathrm{IP}(|z_n| > \zeta) - \mathrm{IP}(|z| > \zeta)\big| < \epsilon/2,$$
which implies $\mathrm{IP}(|z_n| > \zeta) < \epsilon$. We have proved:

Lemma 5.24. Let $\{z_n\}$ be a sequence of random variables such that $z_n \xrightarrow{D} z$. Then $z_n = O_{\mathrm{IP}}(1)$.

Law of Large Numbers

When a law of large numbers holds almost surely, it is a strong law of large numbers (SLLN); when a law of large numbers holds in probability, it is a weak law of large numbers (WLLN).

A sequence of random variables obeys a LLN when its sample average essentially follows its mean behavior; random irregularities (deviations from the mean) are wiped out in the limit by averaging.

Kolmogorov's SLLN: Let $\{z_t\}$ be a sequence of i.i.d. random variables with mean $\mu_o$. Then,
$$\frac{1}{T} \sum_{t=1}^{T} z_t \xrightarrow{\text{a.s.}} \mu_o.$$
Note that i.i.d. random variables need not obey Kolmogorov's SLLN if they do not have a finite mean, e.g., i.i.d. Cauchy random variables.
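A small simulation (my own sketch; distributions and sample sizes are arbitrary choices) contrasts Kolmogorov's SLLN with its failure when the mean does not exist: sample averages of $\mathcal{N}(1,1)$ draws settle at 1, while Cauchy averages keep wandering.

```python
# LLN for normal draws vs. no LLN for Cauchy draws: illustrative only.
import numpy as np

rng = np.random.default_rng(4)
T = 100_000
n = np.arange(1, T + 1)
normal_avg = np.cumsum(rng.normal(1.0, 1.0, T)) / n
cauchy_avg = np.cumsum(rng.standard_cauchy(T)) / n

for k in (100, 10_000, 100_000):
    print(f"T={k:>6}: normal avg {normal_avg[k-1]:+.4f}, "
          f"cauchy avg {cauchy_avg[k-1]:+.4f}")
```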

Theorem 5.26 (Markov's SLLN). Let $\{z_t\}$ be a sequence of independent random variables such that for some $\delta > 0$, $\mathrm{IE}|z_t|^{1+\delta}$ is bounded for all $t$. Then,
$$\frac{1}{T} \sum_{t=1}^{T} [z_t - \mathrm{IE}(z_t)] \xrightarrow{\text{a.s.}} 0.$$
Note that here $z_t$ need not have a common mean, and the average of their means need not converge.

Compared with Kolmogorov's SLLN, Markov's SLLN requires a stronger moment condition but not identical distributions. A LLN usually obtains by regulating the moments of, and the dependence across, random variables.

Examples

Example 5.27. Suppose that $y_t = \alpha_o y_{t-1} + u_t$ with $|\alpha_o| < 1$. Then,
$$\mathrm{var}(y_t) = \frac{\sigma_u^2}{1 - \alpha_o^2}, \qquad \mathrm{cov}(y_t, y_{t-j}) = \alpha_o^j\, \mathrm{var}(y_t) = \alpha_o^j\, \frac{\sigma_u^2}{1 - \alpha_o^2}.$$
Thus,
$$\mathrm{var}\Big(\sum_{t=1}^{T} y_t\Big) = \sum_{t=1}^{T} \mathrm{var}(y_t) + 2 \sum_{\tau=1}^{T-1} (T - \tau)\, \mathrm{cov}(y_t, y_{t-\tau}) \le T\, \mathrm{var}(y_t) + 2T \sum_{\tau=1}^{T-1} \mathrm{cov}(y_t, y_{t-\tau}) = O(T),$$
so that $\mathrm{var}\big(T^{-1} \sum_{t=1}^{T} y_t\big) = O(T^{-1})$. As $\mathrm{IE}\big(T^{-1} \sum_{t=1}^{T} y_t\big) = 0$,
$$\frac{1}{T} \sum_{t=1}^{T} y_t \xrightarrow{\mathrm{IP}} 0$$
by Lemma 5.16. That is, $\{y_t\}$ obeys a WLLN.
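A simulation of Example 5.27 (my own sketch; $\alpha_o = 0.8$ and the sample sizes are arbitrary): sample averages of the stationary AR(1) concentrate around 0, with $T \cdot \mathrm{var}(\bar{y}_T)$ stabilizing, consistent with $\mathrm{var}(\bar{y}_T) = O(T^{-1})$.

```python
# WLLN for a stationary AR(1): illustrative sketch.
import numpy as np

rng = np.random.default_rng(5)
alpha, T, reps = 0.8, 10_000, 200

avgs = np.empty(reps)
for r in range(reps):
    u = rng.standard_normal(T)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = alpha * y[t - 1] + u[t]     # y_t = alpha * y_{t-1} + u_t
    avgs[r] = y.mean()

print("mean of sample averages:", round(avgs.mean(), 4))
print("T * var of sample averages:", round(T * avgs.var(), 2))
```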

Lemma 5.28. Let $y_t = \sum_{i=0}^{\infty} \pi_i u_{t-i}$, where $u_t$ are i.i.d. random variables with mean zero and variance $\sigma_u^2$. If $\sum_{i=0}^{\infty} |\pi_i| < \infty$, then
$$\frac{1}{T} \sum_{t=1}^{T} y_t \xrightarrow{\text{a.s.}} 0.$$
In Example 5.27, $y_t = \sum_{i=0}^{\infty} \alpha_o^i u_{t-i}$ with $|\alpha_o| < 1$, so that $\sum_{i=0}^{\infty} |\alpha_o|^i < \infty$.

Lemma 5.28 is quite general and applies to processes that can be expressed as an MA process with absolutely summable weights, e.g., weakly stationary AR(p) processes.

For random variables with strong correlations over time, the variation of their partial sums may grow too rapidly and cannot be eliminated by simple averaging.

Example 5.29: For the sequences $\{t\}$ and $\{t^2\}$,
$$\sum_{t=1}^{T} t = \frac{T(T+1)}{2}, \qquad \sum_{t=1}^{T} t^2 = \frac{T(T+1)(2T+1)}{6}.$$
Hence, $T^{-1} \sum_{t=1}^{T} t$ and $T^{-1} \sum_{t=1}^{T} t^2$ both diverge.

Example 5.30: Let $u_t$ be i.i.d. with mean zero and variance $\sigma_u^2$. Consider now $\{t u_t\}$, which does not have bounded $(1+\delta)$th moments and does not obey Markov's SLLN. Moreover,
$$\mathrm{var}\Big(\sum_{t=1}^{T} t u_t\Big) = \sum_{t=1}^{T} t^2\, \mathrm{var}(u_t) = \sigma_u^2\, \frac{T(T+1)(2T+1)}{6},$$
so that $\sum_{t=1}^{T} t u_t = O_{\mathrm{IP}}(T^{3/2})$. It follows that $T^{-1} \sum_{t=1}^{T} t u_t = O_{\mathrm{IP}}(T^{1/2})$. That is, $\{t u_t\}$ does not obey a WLLN.

Example 5.31: $y_t$ is a random walk: $y_t = y_{t-1} + u_t$. For $s < t$,
$$y_t = y_s + \sum_{i=s+1}^{t} u_i = y_s + v_{t-s},$$
where $v_{t-s}$ is independent of $y_s$, and $\mathrm{cov}(y_t, y_s) = \mathrm{IE}(y_s^2) = s \sigma_u^2$. Thus,
$$\mathrm{var}\Big(\sum_{t=1}^{T} y_t\Big) = \sum_{t=1}^{T} \mathrm{var}(y_t) + 2 \sum_{\tau=1}^{T-1} \sum_{t=\tau+1}^{T} \mathrm{cov}(y_t, y_{t-\tau}),$$
for which $\sum_{t=1}^{T} \mathrm{var}(y_t) = \sum_{t=1}^{T} t \sigma_u^2 = O(T^2)$ and
$$2 \sum_{\tau=1}^{T-1} \sum_{t=\tau+1}^{T} \mathrm{cov}(y_t, y_{t-\tau}) = 2 \sum_{\tau=1}^{T-1} \sum_{t=\tau+1}^{T} (t - \tau)\, \sigma_u^2 = O(T^3).$$
Then, $\sum_{t=1}^{T} y_t = O_{\mathrm{IP}}(T^{3/2})$ and $T^{-1} \sum_{t=1}^{T} y_t$ diverges in probability.

Example 5.32: $y_t$ is the random walk in Example 5.31. Then $\mathrm{IE}(y_{t-1} u_t) = 0$,
$$\mathrm{var}(y_{t-1} u_t) = \mathrm{IE}(y_{t-1}^2)\, \mathrm{IE}(u_t^2) = (t-1)\, \sigma_u^4,$$
and for $s < t$, $\mathrm{cov}(y_{t-1} u_t, y_{s-1} u_s) = \mathrm{IE}(y_{t-1}\, y_{s-1}\, u_s)\, \mathrm{IE}(u_t) = 0$. This yields
$$\mathrm{var}\Big(\sum_{t=1}^{T} y_{t-1} u_t\Big) = \sum_{t=1}^{T} \mathrm{var}(y_{t-1} u_t) = \sum_{t=1}^{T} (t-1)\, \sigma_u^4 = O(T^2),$$
and $\sum_{t=1}^{T} y_{t-1} u_t = O_{\mathrm{IP}}(T)$. As $\mathrm{var}\big(T^{-1} \sum_{t=1}^{T} y_{t-1} u_t\big)$ converges to $\sigma_u^4/2$, rather than 0, $\{y_{t-1} u_t\}$ does not obey a WLLN, even though its partial sums are $O_{\mathrm{IP}}(T)$.

Central Limit Theorem (CLT)

Lemma 5.35 (Lindeberg-Lévy's CLT). Let $\{z_t\}$ be a sequence of i.i.d. random variables with mean $\mu_o$ and variance $\sigma_o^2 > 0$. Then,
$$\sqrt{T}\, (\bar{z}_T - \mu_o)/\sigma_o \xrightarrow{D} \mathcal{N}(0, 1).$$
i.i.d. random variables need not obey this CLT if they do not have a finite variance, e.g., $t(2)$ random variables.

Note that $\bar{z}_T$ converges to $\mu_o$ in probability, and its variance $\sigma_o^2/T$ vanishes as $T$ tends to infinity. A normalizing factor $T^{1/2}$ suffices to prevent a degenerate distribution in the limit. When $\{z_t\}$ obeys a CLT, $\bar{z}_T$ is said to converge to $\mu_o$ at the rate $T^{-1/2}$, and $\bar{z}_T$ is understood as a root-$T$ consistent estimator.
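The contrast in Lemma 5.35 is easy to see numerically. In the sketch below (my own illustration; the exponential and $t(2)$ distributions are arbitrary choices), standardized means of exponentials behave like $\mathcal{N}(0,1)$, while the same statistic for $t(2)$ draws, whose variance is infinite, does not settle.

```python
# Lindeberg-Levy CLT vs. failure without a finite variance: illustrative.
import numpy as np

rng = np.random.default_rng(6)
T, reps = 1_000, 10_000

expo = rng.exponential(1.0, (reps, T))          # mean 1, variance 1
stat = np.sqrt(T) * (expo.mean(axis=1) - 1.0)   # should be close to N(0,1)
print(f"exponential: mean {stat.mean():+.3f}, var {stat.var():.3f}")

t2 = rng.standard_t(2, (reps, T))               # finite mean, infinite variance
stat2 = np.sqrt(T) * t2.mean(axis=1)
print(f"t(2): var of the statistic {stat2.var():.1f} (unstable across runs)")
```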

Lemma 5.36 (Liapunov's CLT). Let $\{z_{Tt}\}$ be a triangular array of independent random variables with mean $\mu_{Tt}$ and variance $\sigma_{Tt}^2 > 0$ such that
$$\bar{\sigma}_T^2 = \frac{1}{T} \sum_{t=1}^{T} \sigma_{Tt}^2 \to \sigma_o^2 > 0.$$
If for some $\delta > 0$, $\mathrm{IE}|z_{Tt}|^{2+\delta}$ are bounded, then
$$\sqrt{T}\, (\bar{z}_T - \bar{\mu}_T)/\sigma_o \xrightarrow{D} \mathcal{N}(0, 1).$$

A CLT usually requires stronger conditions on the moments of, and the dependence across, random variables than those needed to ensure a LLN. Moreover, every random variable must also be asymptotically negligible, in the sense that no individual random variable is influential in affecting the partial sums.

Examples

Example 5.37: $\{u_t\}$ is a sequence of independent random variables with mean zero, variance $\sigma_u^2$, and bounded $(2+\delta)$th moments. We know $\mathrm{var}\big(\sum_{t=1}^{T} t u_t\big)$ is $O(T^3)$, which implies that the variance of $T^{-1/2} \sum_{t=1}^{T} t u_t$ diverges at the rate $O(T^2)$. On the other hand, observe that
$$\mathrm{var}\Big(\frac{1}{T^{1/2}} \sum_{t=1}^{T} \frac{t}{T}\, u_t\Big) = \sigma_u^2\, \frac{T(T+1)(2T+1)}{6 T^3} \to \frac{\sigma_u^2}{3}.$$
It follows that
$$\frac{\sqrt{3}}{\sigma_u T^{1/2}} \sum_{t=1}^{T} \frac{t}{T}\, u_t \xrightarrow{D} \mathcal{N}(0, 1).$$
These results show that $\{(t/T) u_t\}$ obeys a CLT, whereas $\{t u_t\}$ does not.

Example 5.38: $y_t$ is a random walk: $y_t = y_{t-1} + u_t$, where $u_t$ are i.i.d. with mean zero and variance $\sigma_u^2$. We know $y_t$ do not obey a LLN and hence do not obey a CLT.

CLT for Triangular Arrays. $\{z_{Tt}\}$ is a triangular array of random variables and obeys a CLT if
$$\frac{1}{\sigma_o \sqrt{T}} \sum_{t=1}^{T} [z_{Tt} - \mathrm{IE}(z_{Tt})] = \frac{\sqrt{T}\, (\bar{z}_T - \bar{\mu}_T)}{\sigma_o} \xrightarrow{D} \mathcal{N}(0, 1),$$
where $\bar{z}_T = T^{-1} \sum_{t=1}^{T} z_{Tt}$, $\bar{\mu}_T = \mathrm{IE}(\bar{z}_T)$, and
$$\sigma_T^2 = \mathrm{var}\Big(T^{-1/2} \sum_{t=1}^{T} z_{Tt}\Big) \to \sigma_o^2 > 0.$$

Consider an array of square integrable random vectors $z_{Tt}$ in $\mathbb{R}^d$. Let $\bar{z}_T$ denote the average of the $z_{Tt}$, $\bar{\mu}_T = \mathrm{IE}(\bar{z}_T)$, and
$$\Sigma_T = \mathrm{var}\Big(\frac{1}{\sqrt{T}} \sum_{t=1}^{T} z_{Tt}\Big) \to \Sigma_o,$$
a positive definite matrix. Using the Cramér-Wold device, $\{z_{Tt}\}$ is said to obey a multivariate CLT, in the sense that
$$\Sigma_o^{-1/2}\, \frac{1}{\sqrt{T}} \sum_{t=1}^{T} [z_{Tt} - \mathrm{IE}(z_{Tt})] = \Sigma_o^{-1/2}\, \sqrt{T}\, (\bar{z}_T - \bar{\mu}_T) \xrightarrow{D} \mathcal{N}(0, I_d),$$
if $\{\alpha' z_{Tt}\}$ obeys a CLT for any $\alpha \in \mathbb{R}^d$ such that $\alpha' \alpha = 1$.

Stochastic Processes

A $d$-dimensional stochastic process with index set $\mathcal{T}$ is a measurable mapping $z\colon \Omega \to (\mathbb{R}^d)^{\mathcal{T}}$ such that $z(\omega) = \{z_t(\omega),\ t \in \mathcal{T}\}$. For each $t \in \mathcal{T}$, $z_t(\cdot)$ is an $\mathbb{R}^d$-valued random variable; for each $\omega$, $z(\omega)$ is a sample path (realization) of $z$, an $\mathbb{R}^d$-valued function on $\mathcal{T}$.

The finite-dimensional distributions of $\{z(t, \cdot),\ t \in \mathcal{T}\}$ are
$$\mathrm{IP}(z_{t_1} \le a_1, \ldots, z_{t_n} \le a_n) = F_{t_1, \ldots, t_n}(a_1, \ldots, a_n).$$
$z$ is stationary if the $F_{t_1, \ldots, t_n}$ are invariant under index displacement; $z$ is Gaussian if the $F_{t_1, \ldots, t_n}$ are all (multivariate) normal.

Brownian Motion

The process $\{w(t),\ t \in [0, \infty)\}$ is the standard Wiener process (standard Brownian motion) if it has continuous sample paths almost surely and satisfies:
1. $\mathrm{IP}\big(w(0) = 0\big) = 1$.
2. For $0 \le t_0 \le t_1 \le \cdots \le t_k$,
$$\mathrm{IP}\big(w(t_i) - w(t_{i-1}) \in B_i,\ i \le k\big) = \prod_{i \le k} \mathrm{IP}\big(w(t_i) - w(t_{i-1}) \in B_i\big),$$
where the $B_i$ are Borel sets.
3. For $0 \le s < t$, $w(t) - w(s) \sim \mathcal{N}(0, t - s)$.

Note: $w$ thus has independent and Gaussian increments.

$w(t) \sim \mathcal{N}(0, t)$, and for $r \le t$,
$$\mathrm{cov}\big(w(r), w(t)\big) = \mathrm{IE}\big[w(r)\big(w(t) - w(r)\big)\big] + \mathrm{IE}\big[w(r)^2\big] = r.$$

The sample paths of $w$ are a.s. continuous but highly irregular (nowhere differentiable). To see this, note that $w_c(t) = w(c^2 t)/c$, for $c > 0$, is also a standard Wiener process. (Why?) Then $w_c(1/c) = w(c)/c$. For a large $c$ such that $|w(c)/c| > 1$,
$$\frac{|w_c(1/c)|}{1/c} = |w(c)| > c.$$
That is, the sample path of $w_c$ has a slope larger than $c$ on the very small interval $(0, 1/c)$. Likewise, the difference quotient $[w(t+h) - w(t)]/h \sim \mathcal{N}(0, 1/|h|)$ cannot converge to a finite limit (as $h \to 0$) with positive probability.

The $d$-dimensional standard Wiener process $w$ consists of $d$ mutually independent, standard Wiener processes, so that for $s < t$, $w(t) - w(s) \sim \mathcal{N}(0, (t-s)\, I_d)$.

Lemma 5.39. Let $w$ be the $d$-dimensional standard Wiener process. Then:
1. $w(t) \sim \mathcal{N}(0, t\, I_d)$.
2. $\mathrm{cov}(w(r), w(t)) = \min(r, t)\, I_d$.

The Brownian bridge $w^0$ on $[0, 1]$ is $w^0(t) = w(t) - t\, w(1)$. Clearly, $\mathrm{IE}[w^0(t)] = 0$, and for $r < t$,
$$\mathrm{cov}\big(w^0(r), w^0(t)\big) = \mathrm{cov}\big(w(r) - r\, w(1),\ w(t) - t\, w(1)\big) = r(1 - t)\, I_d.$$
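Discretized sample paths make these covariance facts concrete. The sketch below (my own illustration; grid size and replication counts are arbitrary) builds Wiener paths from Gaussian increments, forms the bridge $w^0(t) = w(t) - t\, w(1)$, and checks $\mathrm{cov}(w^0(r), w^0(t)) = r(1-t)$ by Monte Carlo.

```python
# Wiener process and Brownian bridge on a grid: illustrative sketch.
import numpy as np

rng = np.random.default_rng(7)
n, reps = 500, 10_000
dt = 1.0 / n

incr = rng.normal(0.0, np.sqrt(dt), (reps, n))  # independent N(0, dt) increments
w = np.cumsum(incr, axis=1)                     # w(k/n), k = 1, ..., n
t_grid = np.arange(1, n + 1) * dt
w0 = w - t_grid * w[:, -1:]                     # bridge: w(t) - t * w(1)

r, t = 0.3, 0.7
i, j = int(r * n) - 1, int(t * n) - 1
print("simulated cov(w0(r), w0(t)):", round(np.mean(w0[:, i] * w0[:, j]), 4))
print("theoretical r(1 - t):       ", r * (1 - t))
```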

Weak Convergence

$\mathrm{IP}_n$ converges weakly to $\mathrm{IP}$, denoted as $\mathrm{IP}_n \Rightarrow \mathrm{IP}$, if
$$\int f(s) \,\mathrm{d}\mathrm{IP}_n(s) \to \int f(s) \,\mathrm{d}\mathrm{IP}(s)$$
for every bounded, continuous real function $f$ on $S$, where $\{\mathrm{IP}_n\}$ and $\mathrm{IP}$ are probability measures on $(S, \mathcal{S})$.

When $z_n$ and $z$ are all $\mathbb{R}^d$-valued random variables, $\mathrm{IP}_n \Rightarrow \mathrm{IP}$ reduces to the usual notion of convergence in distribution: $z_n \xrightarrow{D} z$.

When $z_n$ and $z$ are $d$-dimensional stochastic processes with the distributions induced by $\mathrm{IP}_n$ and $\mathrm{IP}$, $z_n \xrightarrow{D} z$, also denoted as $z_n \Rightarrow z$, implies that all the finite-dimensional distributions of $z_n$ converge to the corresponding distributions of $z$.

Continuous Mapping Theorem

Lemma 5.40 (Continuous Mapping Theorem). Let $g\colon \mathbb{R}^d \to \mathbb{R}$ be a function continuous on $\mathbb{R}^d$, except for at most countably many points. If $z_n \Rightarrow z$, then $g(z_n) \Rightarrow g(z)$.

Proof: Let $S$ and $S'$ be two metric spaces with Borel σ-algebras $\mathcal{S}$ and $\mathcal{S}'$, and let $g\colon S \to S'$ be a measurable mapping. For $\mathrm{IP}$ on $(S, \mathcal{S})$, define $\mathrm{IP}^*$ on $(S', \mathcal{S}')$ as $\mathrm{IP}^*(A') = \mathrm{IP}(g^{-1}(A'))$, $A' \in \mathcal{S}'$. For every bounded, continuous $f$ on $S'$, $f \circ g$ is also bounded and continuous on $S$. $\mathrm{IP}_n \Rightarrow \mathrm{IP}$ now implies that
$$\int f \circ g(s) \,\mathrm{d}\mathrm{IP}_n(s) \to \int f \circ g(s) \,\mathrm{d}\mathrm{IP}(s),$$
which is equivalent to $\int f(a) \,\mathrm{d}\mathrm{IP}_n^*(a) \to \int f(a) \,\mathrm{d}\mathrm{IP}^*(a)$, proving $\mathrm{IP}_n^* \Rightarrow \mathrm{IP}^*$.

Functional Central Limit Theorem (FCLT)

Let $\zeta_i$ be i.i.d. with mean zero and variance $\sigma^2$. Let $s_n = \zeta_1 + \cdots + \zeta_n$ and $z_n(i/n) = s_i / (\sigma \sqrt{n})$. For $t \in [(i-1)/n, i/n)$, the constant interpolation of $z_n(i/n)$ is
$$z_n(t) = z_n((i-1)/n) = \frac{1}{\sigma \sqrt{n}}\, s_{[nt]},$$
where $[nt]$ is the largest integer less than or equal to $nt$. From Lindeberg-Lévy's CLT,
$$\frac{1}{\sigma \sqrt{n}}\, s_{[nt]} = \Big(\frac{[nt]}{n}\Big)^{1/2} \frac{1}{\sigma \sqrt{[nt]}}\, s_{[nt]} \xrightarrow{D} \sqrt{t}\; \mathcal{N}(0, 1),$$
which is just $\mathcal{N}(0, t)$, the distribution of $w(t)$.

For $r < t$, we have $\big(z_n(r),\ z_n(t) - z_n(r)\big) \xrightarrow{D} \big(w(r),\ w(t) - w(r)\big)$, and hence $\big(z_n(r), z_n(t)\big) \xrightarrow{D} \big(w(r), w(t)\big)$. This is easily extended to establish convergence of any finite-dimensional distributions, and it leads to the functional central limit theorem.

Lemma 5.41 (Donsker). Let $\zeta_t$ be i.i.d. with mean $\mu_o$ and variance $\sigma_o^2 > 0$, and
$$z_T(r) = \frac{1}{\sigma_o \sqrt{T}} \sum_{t=1}^{[Tr]} (\zeta_t - \mu_o), \qquad r \in [0, 1].$$
Then, $z_T \Rightarrow w$ as $T \to \infty$.
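A one-marginal check of Donsker's FCLT (my own sketch; centered exponential increments and the evaluation point $r = 1/2$ are arbitrary choices): $z_T(1/2)$ should be close to $\mathcal{N}(0, 1/2)$, the distribution of $w(1/2)$.

```python
# Donsker's FCLT, checked at a single point r = 1/2: illustrative only.
import numpy as np

rng = np.random.default_rng(8)
T, reps = 1_000, 10_000
zeta = rng.exponential(1.0, (reps, T)) - 1.0          # mean 0, variance 1

z_half = zeta[:, : T // 2].sum(axis=1) / np.sqrt(T)   # z_T(1/2)
print(f"z_T(1/2): mean {z_half.mean():+.4f}, var {z_half.var():.4f} "
      "(theory: 0 and 0.5)")
```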

Let $\zeta_t$ be random variables with mean $\mu_t$ and variance $\sigma_t^2 > 0$. Define the long-run variance of $\zeta_t$ as
$$\sigma^2 = \lim_{T \to \infty} \mathrm{var}\Big(\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \zeta_t\Big).$$
$\{\zeta_t\}$ is said to obey an FCLT if $z_T \Rightarrow w$ as $T \to \infty$, where
$$z_T(r) = \frac{1}{\sigma \sqrt{T}} \sum_{t=1}^{[Tr]} (\zeta_t - \mu_t), \qquad r \in [0, 1].$$

In the multivariate context, the FCLT is $z_T \Rightarrow w$ as $T \to \infty$, where
$$z_T(r) = \Sigma^{-1/2}\, \frac{1}{\sqrt{T}} \sum_{t=1}^{[Tr]} (\zeta_t - \mu_t), \qquad r \in [0, 1],$$
$w$ is the $d$-dimensional standard Wiener process, and
$$\Sigma = \lim_{T \to \infty} \frac{1}{T}\, \mathrm{IE}\Big[\Big(\sum_{t=1}^{T} (\zeta_t - \mu_t)\Big) \Big(\sum_{t=1}^{T} (\zeta_t - \mu_t)\Big)'\Big].$$

Example 5.43. $y_t = y_{t-1} + u_t$, $t = 1, 2, \ldots$, with $y_0 = 0$, where $u_t$ are i.i.d. with mean zero and variance $\sigma_u^2$. By Donsker's FCLT applied to the partial sums $y_{[Tr]} = \sum_{t=1}^{[Tr]} u_t$,
$$\frac{1}{T^{3/2}} \sum_{t=1}^{T} y_t = \sigma_u \sum_{t=1}^{T} \int_{(t-1)/T}^{t/T} \frac{y_{[Tr]}}{\sigma_u \sqrt{T}} \,\mathrm{d}r \Rightarrow \sigma_u \int_0^1 w(r) \,\mathrm{d}r.$$
This result also verifies that $\sum_{t=1}^{T} y_t$ is $O_{\mathrm{IP}}(T^{3/2})$. Similarly,
$$\frac{1}{T^2} \sum_{t=1}^{T} y_t^2 = \frac{1}{T} \sum_{t=1}^{T} \Big(\frac{y_t}{\sqrt{T}}\Big)^2 \Rightarrow \sigma_u^2 \int_0^1 w(r)^2 \,\mathrm{d}r,$$
so that $\sum_{t=1}^{T} y_t^2$ is $O_{\mathrm{IP}}(T^2)$.
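The limit in Example 5.43 can be verified by Monte Carlo. Since $\int_0^1 w(r)\,\mathrm{d}r \sim \mathcal{N}(0, 1/3)$, the statistic $T^{-3/2} \sum_t y_t$ should have mean 0 and variance near $\sigma_u^2/3$; the sketch below (my own illustration, with $\sigma_u = 1$) checks this.

```python
# T^{-3/2} * sum(y_t) for a driftless random walk: should be ~ N(0, 1/3).
import numpy as np

rng = np.random.default_rng(9)
T, reps = 1_000, 10_000

u = rng.standard_normal((reps, T))     # i.i.d. shocks, sigma_u = 1
y = np.cumsum(u, axis=1)               # random walk paths y_t
stat = y.sum(axis=1) / T ** 1.5

print(f"mean {stat.mean():+.4f}, var {stat.var():.4f} (theory: 0 and 1/3)")
```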


More information

Exercises in stochastic analysis

Exercises in stochastic analysis Exercises in stochastic analysis Franco Flandoli, Mario Maurelli, Dario Trevisan The exercises with a P are those which have been done totally or partially) in the previous lectures; the exercises with

More information

Lebesgue Integration: A non-rigorous introduction. What is wrong with Riemann integration?

Lebesgue Integration: A non-rigorous introduction. What is wrong with Riemann integration? Lebesgue Integration: A non-rigorous introduction What is wrong with Riemann integration? xample. Let f(x) = { 0 for x Q 1 for x / Q. The upper integral is 1, while the lower integral is 0. Yet, the function

More information

Notes 1 : Measure-theoretic foundations I

Notes 1 : Measure-theoretic foundations I Notes 1 : Measure-theoretic foundations I Math 733-734: Theory of Probability Lecturer: Sebastien Roch References: [Wil91, Section 1.0-1.8, 2.1-2.3, 3.1-3.11], [Fel68, Sections 7.2, 8.1, 9.6], [Dur10,

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

1 Independent increments

1 Independent increments Tel Aviv University, 2008 Brownian motion 1 1 Independent increments 1a Three convolution semigroups........... 1 1b Independent increments.............. 2 1c Continuous time................... 3 1d Bad

More information

Lecture 21: Convergence of transformations and generating a random variable

Lecture 21: Convergence of transformations and generating a random variable Lecture 21: Convergence of transformations and generating a random variable If Z n converges to Z in some sense, we often need to check whether h(z n ) converges to h(z ) in the same sense. Continuous

More information

Empirical Processes: General Weak Convergence Theory

Empirical Processes: General Weak Convergence Theory Empirical Processes: General Weak Convergence Theory Moulinath Banerjee May 18, 2010 1 Extended Weak Convergence The lack of measurability of the empirical process with respect to the sigma-field generated

More information

Probability and Measure

Probability and Measure Probability and Measure Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Convergence of Random Variables 1. Convergence Concepts 1.1. Convergence of Real

More information

Lecture 2: Convergence of Random Variables

Lecture 2: Convergence of Random Variables Lecture 2: Convergence of Random Variables Hyang-Won Lee Dept. of Internet & Multimedia Eng. Konkuk University Lecture 2 Introduction to Stochastic Processes, Fall 2013 1 / 9 Convergence of Random Variables

More information

Notes on Time Series Modeling

Notes on Time Series Modeling Notes on Time Series Modeling Garey Ramey University of California, San Diego January 17 1 Stationary processes De nition A stochastic process is any set of random variables y t indexed by t T : fy t g

More information

(1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define

(1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define Homework, Real Analysis I, Fall, 2010. (1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define ρ(f, g) = 1 0 f(x) g(x) dx. Show that

More information

Stochastic Processes

Stochastic Processes Elements of Lecture II Hamid R. Rabiee with thanks to Ali Jalali Overview Reading Assignment Chapter 9 of textbook Further Resources MIT Open Course Ware S. Karlin and H. M. Taylor, A First Course in Stochastic

More information

Conditional independence, conditional mixing and conditional association

Conditional independence, conditional mixing and conditional association Ann Inst Stat Math (2009) 61:441 460 DOI 10.1007/s10463-007-0152-2 Conditional independence, conditional mixing and conditional association B. L. S. Prakasa Rao Received: 25 July 2006 / Revised: 14 May

More information

Multiple Random Variables

Multiple Random Variables Multiple Random Variables Joint Probability Density Let X and Y be two random variables. Their joint distribution function is F ( XY x, y) P X x Y y. F XY ( ) 1, < x

More information

Summary of Real Analysis by Royden

Summary of Real Analysis by Royden Summary of Real Analysis by Royden Dan Hathaway May 2010 This document is a summary of the theorems and definitions and theorems from Part 1 of the book Real Analysis by Royden. In some areas, such as

More information