CHAPTER 3: LARGE SAMPLE THEORY
Introduction
Why large sample theory
Studying exact small-sample properties is usually difficult and complicated; large sample theory instead studies the limiting behavior of a sequence of random variables, say $X_n$. Example: $\bar X_n - \mu$ and $\sqrt{n}(\bar X_n - \mu)$.
Modes of Convergence
Convergence almost surely
Definition 3.1 $X_n$ is said to converge almost surely to $X$, denoted by $X_n \to_{a.s.} X$, if there exists a set $A \subset \Omega$ such that $P(A^c) = 0$ and for each $\omega \in A$, $X_n(\omega) \to X(\omega)$ in the real line.
Equivalent condition
$\{\omega : X_n(\omega) \to X(\omega)\}^c = \bigcup_{\epsilon > 0} \bigcap_{n} \bigcup_{m \ge n} \{\omega : |X_m(\omega) - X(\omega)| > \epsilon\}$.
Hence $X_n \to_{a.s.} X$ iff, for every $\epsilon > 0$, $P(\sup_{m \ge n} |X_m - X| > \epsilon) \to 0$ as $n \to \infty$.
Convergence in probability
Definition 3.2 $X_n$ is said to converge in probability to $X$, denoted by $X_n \to_p X$, if for every $\epsilon > 0$, $P(|X_n - X| > \epsilon) \to 0$.
Convergence in moments/means
Definition 3.3 $X_n$ is said to converge in $r$th mean to $X$, denoted by $X_n \to_r X$, if $E[|X_n - X|^r] \to 0$ as $n \to \infty$ for $X_n, X \in L_r(P)$, where $X \in L_r(P)$ means $\int |X|^r \, dP < \infty$.
Convergence in distribution
Definition 3.4 $X_n$ is said to converge in distribution to $X$, denoted by $X_n \to_d X$ or $F_n \to_d F$ (or $\mathcal L(X_n) \to \mathcal L(X)$, with $\mathcal L$ referring to the law or distribution), if the distribution functions $F_n$ and $F$ of $X_n$ and $X$ satisfy $F_n(x) \to F(x)$ as $n \to \infty$ for each continuity point $x$ of $F$.
Uniform integrability
Definition 3.5 A sequence of random variables $\{X_n\}$ is uniformly integrable if
$\lim_{\lambda \to \infty} \limsup_n E[|X_n| I(|X_n| \ge \lambda)] = 0$.
A note
Convergence almost surely and convergence in probability are the same as defined in measure theory. The two new definitions are convergence in $r$th mean and convergence in distribution.
Convergence in distribution is very different from the others. Example: the sequence $X, Y, X, Y, \dots$, where $X$ and $Y$ are independent $N(0, 1)$ variables, converges in distribution to $N(0, 1)$, but none of the other modes of convergence holds. Convergence in distribution is important for asymptotic statistical inference.
Relationship among different modes
Theorem 3.1
A. If $X_n \to_{a.s.} X$, then $X_n \to_p X$.
B. If $X_n \to_p X$, then $X_{n_k} \to_{a.s.} X$ for some subsequence $\{X_{n_k}\}$.
C. If $X_n \to_r X$, then $X_n \to_p X$.
D. If $X_n \to_p X$ and $\{|X_n|^r\}$ is uniformly integrable, then $X_n \to_r X$.
E. If $X_n \to_p X$ and $\limsup_n E[|X_n|^r] \le E[|X|^r]$, then $X_n \to_r X$.
F. If $X_n \to_r X$, then $X_n \to_{r'} X$ for any $0 < r' \le r$.
G. If $X_n \to_p X$, then $X_n \to_d X$.
H. $X_n \to_p X$ if and only if for every subsequence $\{X_{n_k}\}$ there exists a further subsequence $\{X_{n_{k,l}}\}$ such that $X_{n_{k,l}} \to_{a.s.} X$.
I. If $X_n \to_d c$ for a constant $c$, then $X_n \to_p c$.
Proof
A and B follow from results in measure theory.
Prove C. Markov's inequality: for any increasing function $g(\cdot)$ and random variable $Y$, $P(|Y| > \epsilon) \le E[g(|Y|)]/g(\epsilon)$. Hence
$P(|X_n - X| > \epsilon) \le E[|X_n - X|^r]/\epsilon^r \to 0$.
Prove D. It is sufficient to show that for any subsequence of $\{X_n\}$ there exists a further subsequence $\{X_{n_k}\}$ such that $E[|X_{n_k} - X|^r] \to 0$. For any subsequence of $\{X_n\}$, from B there exists a further subsequence $\{X_{n_k}\}$ such that $X_{n_k} \to_{a.s.} X$. For any $\epsilon > 0$ there exists $\lambda$ such that $\limsup_{n_k} E[|X_{n_k}|^r I(|X_{n_k}|^r \ge \lambda)] < \epsilon$. In particular, choose $\lambda$ such that $P(|X|^r = \lambda) = 0$; then $|X_{n_k}|^r I(|X_{n_k}|^r \ge \lambda) \to_{a.s.} |X|^r I(|X|^r \ge \lambda)$. By Fatou's Lemma,
$E[|X|^r I(|X|^r \ge \lambda)] \le \limsup_{n_k} E[|X_{n_k}|^r I(|X_{n_k}|^r \ge \lambda)] < \epsilon$.
Next,
$E[|X_{n_k} - X|^r] \le E[|X_{n_k} - X|^r I(|X_{n_k}|^r < 2\lambda, |X|^r < 2\lambda)] + E[|X_{n_k} - X|^r I(|X_{n_k}|^r \ge 2\lambda \text{ or } |X|^r \ge 2\lambda)]$
$\le E[|X_{n_k} - X|^r I(|X_{n_k}|^r < 2\lambda, |X|^r < 2\lambda)] + 2^r E[(|X_{n_k}|^r + |X|^r) I(|X_{n_k}|^r \ge 2\lambda \text{ or } |X|^r \ge 2\lambda)]$,
where the last inequality follows from $(x + y)^r \le 2^r (\max(x, y))^r \le 2^r (x^r + y^r)$ for $x, y \ge 0$. The first term vanishes by the dominated convergence theorem. When $n_k$ is large, the second term is bounded by
$2 \cdot 2^r \{E[|X_{n_k}|^r I(|X_{n_k}|^r \ge \lambda)] + E[|X|^r I(|X|^r \ge \lambda)]\} \le 2^{r+2} \epsilon$.
Hence $\limsup_{n_k} E[|X_{n_k} - X|^r] \le 2^{r+2} \epsilon$, and $\epsilon$ is arbitrary.
Prove E. It is sufficient to show that for any subsequence of $\{X_n\}$ there exists a further subsequence $\{X_{n_k}\}$ such that $E[|X_{n_k} - X|^r] \to 0$. For any subsequence, there exists a further subsequence $\{X_{n_k}\}$ such that $X_{n_k} \to_{a.s.} X$. Define
$Y_{n_k} = 2^r (|X_{n_k}|^r + |X|^r) - |X_{n_k} - X|^r \ge 0$.
By Fatou's Lemma,
$\int \liminf_{n_k} Y_{n_k} \, dP \le \liminf_{n_k} \int Y_{n_k} \, dP$,
which is equivalent to
$2^{r+1} E[|X|^r] \le \liminf_{n_k} \{2^r E[|X_{n_k}|^r] + 2^r E[|X|^r] - E[|X_{n_k} - X|^r]\}$.
Combined with $\limsup_n E[|X_n|^r] \le E[|X|^r]$, this forces $\limsup_{n_k} E[|X_{n_k} - X|^r] \le 0$.
Prove F. The Hölder inequality:
$\int |f(x) g(x)| \, d\mu(x) \le \left\{\int |f(x)|^p \, d\mu(x)\right\}^{1/p} \left\{\int |g(x)|^q \, d\mu(x)\right\}^{1/q}$, $\quad \frac1p + \frac1q = 1$.
Choose $\mu = P$, $f = |X_n - X|^{r'}$, $g \equiv 1$, $p = r/r'$ and $q = r/(r - r')$ in the Hölder inequality:
$E[|X_n - X|^{r'}] \le E[|X_n - X|^r]^{r'/r} \to 0$.
Prove G. Suppose $X_n \to_p X$. If $P(X = x) = 0$, then for any $\epsilon > 0$ and $\delta > 0$,
$P(|I(X_n \le x) - I(X \le x)| > \epsilon)$
$= P(|I(X_n \le x) - I(X \le x)| > \epsilon, |X - x| > \delta) + P(|I(X_n \le x) - I(X \le x)| > \epsilon, |X - x| \le \delta)$
$\le P(X_n \le x, X > x + \delta) + P(X_n > x, X < x - \delta) + P(|X - x| \le \delta)$
$\le P(|X_n - X| > \delta) + P(|X - x| \le \delta)$.
The first term converges to zero since $X_n \to_p X$. The second term can be made arbitrarily small by choosing $\delta$ small, since $\lim_{\delta \to 0} P(|X - x| \le \delta) = P(X = x) = 0$. Hence $I(X_n \le x) \to_p I(X \le x)$, and
$F_n(x) = E[I(X_n \le x)] \to E[I(X \le x)] = F(x)$.
Prove H. One direction follows from B. To prove the other direction, argue by contradiction. Suppose there exists $\epsilon > 0$ such that $P(|X_n - X| > \epsilon)$ does not converge to zero. Then we can find a subsequence $\{X_{n'}\}$ such that $P(|X_{n'} - X| > \epsilon) > \delta$ for some $\delta > 0$. However, by the condition there exists a further subsequence $\{X_{n''}\}$ such that $X_{n''} \to_{a.s.} X$, and then $X_{n''} \to_p X$ from A. Contradiction!
Prove I. Let $X \equiv c$. For any $\epsilon > 0$ (so that $c \pm \epsilon$ are continuity points of the limit distribution),
$P(|X_n - c| > \epsilon) \le 1 - F_n(c + \epsilon) + F_n(c - \epsilon) \to 1 - F(c + \epsilon) + F(c - \epsilon) = 0$.
Some counter-examples
(Example 1) Suppose $X_n$ is degenerate at the point $1/n$, i.e., $P(X_n = 1/n) = 1$. Then $X_n$ converges in distribution to zero. Indeed, $X_n$ also converges almost surely to zero.
(Example 2) $X_1, X_2, \dots$ are i.i.d. with the standard normal distribution. Then $X_n \to_d X_1$, but $X_n$ does not converge in probability to $X_1$.
(Example 3) Let $Z$ be a random variable with a uniform distribution on $[0, 1]$. Let $X_n = I(m 2^{-k} \le Z < (m+1) 2^{-k})$ when $n = 2^k + m$ with $0 \le m < 2^k$. Then $X_n$ converges in probability to zero but not almost surely. This example was already given in the second chapter.
(Example 4) Let $Z$ be Uniform(0, 1) and let $X_n = 2^n I(0 \le Z < 1/n)$. Then $E[|X_n|^r] = 2^{nr}/n \to \infty$, but $X_n$ converges to zero almost surely.
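The moments in Example 4 can be computed exactly; a minimal sketch (the function names are illustrative, not from the text):

```python
from fractions import Fraction

def x_n(z, n):
    # X_n = 2^n * I(0 <= Z < 1/n), evaluated at a realization Z = z
    return 2 ** n if 0 <= z < 1.0 / n else 0

def rth_mean(n, r=1):
    # E[|X_n|^r] = 2^(n*r) * P(0 <= Z < 1/n) = 2^(n*r) / n, exactly
    return Fraction(2 ** (n * r), n)

# For any fixed z > 0, x_n(z, n) = 0 once n > 1/z: almost sure convergence
# to zero. Meanwhile the rth means blow up:
moments = [rth_mean(n) for n in (1, 5, 10, 20)]
```

Evaluating `x_n` at a fixed realization shows pointwise (hence a.s.) convergence, while `moments` is strictly increasing without bound.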
Result for convergence in rth mean
Theorem 3.2 (Vitali's theorem) Suppose that $X_n \in L_r(P)$, i.e., $E[|X_n|^r] < \infty$, where $0 < r < \infty$, and $X_n \to_p X$. Then the following are equivalent:
A. $\{|X_n|^r\}$ is uniformly integrable.
B. $X_n \to_r X$.
C. $E[|X_n|^r] \to E[|X|^r]$.
One sufficient condition for uniform integrability
Liapunov condition: there exists a positive constant $\epsilon_0$ such that $\limsup_n E[|X_n|^{r + \epsilon_0}] < \infty$. Indeed,
$E[|X_n|^r I(|X_n| \ge \lambda)] \le E[|X_n|^{r + \epsilon_0}]/\lambda^{\epsilon_0}$.
Integral Inequalities
Young's inequality
$ab \le \frac{a^p}{p} + \frac{b^q}{q}$, $\quad a, b > 0$, $\quad \frac1p + \frac1q = 1$,
where the equality holds if and only if $a^p = b^q$. It follows since $\log x$ is concave:
$\log\left(\frac1p a^p + \frac1q b^q\right) \ge \frac1p \log a^p + \frac1q \log b^q = \log(ab)$.
There is also a geometric interpretation in terms of areas under the curve $y = x^{p-1}$ (figure omitted).
Hölder inequality
$\int |f(x) g(x)| \, d\mu(x) \le \left\{\int |f(x)|^p \, d\mu(x)\right\}^{1/p} \left\{\int |g(x)|^q \, d\mu(x)\right\}^{1/q}$.
Proof: in Young's inequality, let
$a = |f(x)| / \{\int |f|^p \, d\mu\}^{1/p}$, $\quad b = |g(x)| / \{\int |g|^q \, d\mu\}^{1/q}$,
and integrate with respect to $\mu$.
When $\mu = P$ and $f$ is a power of $|X(\omega)|$, with $\mu_r = E[|X|^r]$, one obtains for $r \ge s \ge t \ge 0$ the moment inequality
$\mu_s^{r-t} \le \mu_t^{r-s} \mu_r^{s-t}$.
When $p = q = 2$, we obtain the Cauchy-Schwarz inequality:
$\int |f(x) g(x)| \, d\mu(x) \le \left\{\int |f(x)|^2 \, d\mu(x)\right\}^{1/2} \left\{\int |g(x)|^2 \, d\mu(x)\right\}^{1/2}$.
Minkowski's inequality
For $r \ge 1$, $\|X + Y\|_r \le \|X\|_r + \|Y\|_r$. Derivation:
$E[|X + Y|^r] \le E[(|X| + |Y|) |X + Y|^{r-1}]$
$\le E[|X|^r]^{1/r} E[|X + Y|^r]^{1 - 1/r} + E[|Y|^r]^{1/r} E[|X + Y|^r]^{1 - 1/r}$.
$\|\cdot\|_r$ is in fact a norm on the linear space $\{X : \|X\|_r < \infty\}$. Such a normed space is denoted $L_r(P)$.
Markov's inequality
$P(|X| \ge \epsilon) \le \frac{E[g(|X|)]}{g(\epsilon)}$,
where $g \ge 0$ is an increasing function on $[0, \infty)$. Derivation:
$P(|X| \ge \epsilon) \le P(g(|X|) \ge g(\epsilon)) = E[I(g(|X|) \ge g(\epsilon))] \le E\left[\frac{g(|X|)}{g(\epsilon)}\right]$.
When $g(x) = x^2$ and $X$ is replaced by $X - \mu$, we obtain Chebyshev's inequality:
$P(|X - \mu| \ge \epsilon) \le \frac{Var(X)}{\epsilon^2}$.
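Chebyshev's inequality can be checked exactly on a small discrete distribution; a quick sketch (the distribution is a made-up example):

```python
# Exact check of Chebyshev's inequality P(|X - mu| >= eps) <= Var(X)/eps^2
# for a small discrete distribution {value: probability}.
dist = {-2: 0.1, 0: 0.5, 1: 0.3, 4: 0.1}  # hypothetical example distribution

mu = sum(x * p for x, p in dist.items())
var = sum((x - mu) ** 2 * p for x, p in dist.items())

def tail(eps):
    # exact tail probability P(|X - mu| >= eps)
    return sum(p for x, p in dist.items() if abs(x - mu) >= eps)

checks = [(eps, tail(eps), var / eps**2) for eps in (0.5, 1.0, 2.0, 3.0)]
```

Every exact tail probability sits below the Chebyshev bound, as the inequality guarantees.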
Application of Vitali's theorem
$Y_1, Y_2, \dots$ are i.i.d. with mean $\mu$ and variance $\sigma^2$. Let $X_n = \bar Y_n$. By Chebyshev's inequality,
$P(|X_n - \mu| > \epsilon) \le \frac{Var(X_n)}{\epsilon^2} = \frac{\sigma^2}{n \epsilon^2} \to 0$,
so $X_n \to_p \mu$. From the Liapunov condition with $r = 1$ and $\epsilon_0 = 1$, $X_n - \mu$ satisfies the uniform integrability condition; hence $E[|X_n - \mu|] \to 0$.
Convergence in Distribution
Convergence in distribution is the most important mode of convergence in statistical inference.
Equivalent conditions
Theorem 3.3 (Portmanteau Theorem) The following conditions are equivalent.
(a) $X_n$ converges in distribution to $X$.
(b) For any bounded continuous function $g(\cdot)$, $E[g(X_n)] \to E[g(X)]$.
(c) For any open set $G$ in $R$, $\liminf_n P(X_n \in G) \ge P(X \in G)$.
(d) For any closed set $F$ in $R$, $\limsup_n P(X_n \in F) \le P(X \in F)$.
(e) For any Borel set $O$ in $R$ with $P(X \in \partial O) = 0$, where $\partial O$ is the boundary of $O$, $P(X_n \in O) \to P(X \in O)$.
Proof
(a) ⇒ (b). Without loss of generality, assume $|g(x)| \le 1$. Choose $[-M, M]$ such that $P(|X| = M) = 0$. Since $g$ is continuous on $[-M, M]$, it is uniformly continuous there. Partition $[-M, M]$ into finitely many intervals $I_1, \dots, I_m$ such that within each $I_k$, $\max_{I_k} g(x) - \min_{I_k} g(x) \le \epsilon$, and $X$ has no mass at any endpoint of the $I_k$ (why is this possible?).
Then, choosing any point $x_k \in I_k$, $k = 1, \dots, m$,
$|E[g(X_n)] - E[g(X)]| \le E[|g(X_n)| I(|X_n| > M)] + E[|g(X)| I(|X| > M)]$
$\quad + \left|E[g(X_n) I(|X_n| \le M)] - \sum_{k=1}^m g(x_k) P(X_n \in I_k)\right|$
$\quad + \left|E[g(X) I(|X| \le M)] - \sum_{k=1}^m g(x_k) P(X \in I_k)\right|$
$\quad + \left|\sum_{k=1}^m g(x_k) P(X_n \in I_k) - \sum_{k=1}^m g(x_k) P(X \in I_k)\right|$
$\le P(|X_n| > M) + P(|X| > M) + 2\epsilon + \sum_{k=1}^m |P(X_n \in I_k) - P(X \in I_k)|$.
Hence $\limsup_n |E[g(X_n)] - E[g(X)]| \le 2 P(|X| > M) + 2\epsilon$. Let $M \to \infty$ and $\epsilon \to 0$.
(b) ⇒ (c). For any open set $G$, define $g(x) = 1 - \frac{\epsilon}{\epsilon + d(x, G^c)}$, where $d(x, G^c) = \inf_{y \in G^c} |x - y|$ is the distance from $x$ to $G^c$. For any $y \in G^c$,
$d(x_1, G^c) \le |x_1 - y| \le |x_2 - y| + |x_1 - x_2|$,
so $|d(x_1, G^c) - d(x_2, G^c)| \le |x_1 - x_2|$ and
$|g(x_1) - g(x_2)| \le \epsilon^{-1} |d(x_1, G^c) - d(x_2, G^c)| \le \epsilon^{-1} |x_1 - x_2|$.
Thus $g$ is continuous and bounded, so $E[g(X_n)] \to E[g(X)]$. Note $0 \le g(x) \le I_G(x)$, so
$\liminf_n P(X_n \in G) \ge \lim_n E[g(X_n)] = E[g(X)]$.
As $\epsilon \to 0$, $E[g(X)]$ increases to $E[I(X \in G)] = P(X \in G)$.
(c) ⇔ (d). This is clear by taking complements.
(d) ⇒ (e). For any $O$ with $P(X \in \partial O) = 0$, with $\bar O$ and $O^o$ the closure and interior of $O$,
$\limsup_n P(X_n \in O) \le \limsup_n P(X_n \in \bar O) \le P(X \in \bar O) = P(X \in O)$,
$\liminf_n P(X_n \in O) \ge \liminf_n P(X_n \in O^o) \ge P(X \in O^o) = P(X \in O)$.
(e) ⇒ (a). Choose $O = (-\infty, x]$ with $P(X \in \partial O) = P(X = x) = 0$.
Counter-examples
The boundedness of $g$ in (b) is necessary: let $g(x) = x$, a continuous but unbounded function, and let $X_n$ take value $n$ with probability $1/n$ and value $0$ with probability $1 - 1/n$. Then $X_n \to_d 0$, but $E[g(X_n)] = 1$ does not converge to $E[g(0)] = 0$.
The zero-probability boundary condition in (e) is also necessary: let $X_n$ be degenerate at $1/n$ and consider $O = \{x : x > 0\}$. Then $X_n \to_d 0$ and $P(X_n \in O) = 1$, but $P(0 \in O) = 0$.
Weak Convergence and Characteristic Functions
Theorem 3.4 (Continuity Theorem) Let $\phi_n$ and $\phi$ denote the characteristic functions of $X_n$ and $X$ respectively. Then $X_n \to_d X$ is equivalent to $\phi_n(t) \to \phi(t)$ for each $t$.
Proof
To prove the forward direction, note that $x \mapsto e^{itx}$ is bounded and continuous, so from (b) of Theorem 3.3,
$\phi_n(t) = E[e^{itX_n}] \to E[e^{itX}] = \phi(t)$.
The proof of the converse direction consists of a few tricky constructions (skipped).
One simple example
$X_1, \dots, X_n \sim$ Bernoulli($p$):
$\phi_{\bar X_n}(t) = E[e^{it(X_1 + \dots + X_n)/n}] = (1 - p + p e^{it/n})^n = (1 - p + p(1 + it/n + o(1/n)))^n \to e^{itp}$.
Note the limit is the characteristic function of the constant $X \equiv p$. Thus $\bar X_n \to_d p$, so $\bar X_n$ converges in probability to $p$.
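The convergence $\phi_{\bar X_n}(t) \to e^{itp}$ can be observed numerically; a small sketch (the parameter values are arbitrary):

```python
import cmath

def cf_bernoulli_mean(t, n, p):
    # Characteristic function of the sample mean of n i.i.d. Bernoulli(p):
    # E[e^{it(X_1+...+X_n)/n}] = (1 - p + p e^{it/n})^n
    return (1 - p + p * cmath.exp(1j * t / n)) ** n

p, t = 0.3, 2.0
limit = cmath.exp(1j * t * p)   # c.f. of the constant p
gaps = [abs(cf_bernoulli_mean(t, n, p) - limit) for n in (10, 100, 10000)]
```

The gap shrinks at roughly the $O(1/n)$ rate suggested by the expansion on the slide.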
Generalization to multivariate random vectors
$X_n \to_d X$ if and only if $E[\exp\{i t^T X_n\}] \to E[\exp\{i t^T X\}]$ for every $k$-dimensional constant vector $t$; equivalently, $t^T X_n \to_d t^T X$ for every $t$. Thus, to study the weak convergence of random vectors, we can reduce to the weak convergence of one-dimensional linear combinations of the random vectors. This is the well-known Cramér-Wold device.
Theorem 3.5 (The Cramér-Wold device) Random vectors $X_n$ in $R^k$ satisfy $X_n \to_d X$ if and only if $t^T X_n \to_d t^T X$ in $R$ for all $t \in R^k$.
Properties of Weak Convergence
Theorem 3.6 (Continuous mapping theorem) Suppose $X_n \to_{a.s.} X$, or $X_n \to_p X$, or $X_n \to_d X$. Then for any continuous function $g(\cdot)$, $g(X_n)$ converges to $g(X)$ almost surely, or in probability, or in distribution, respectively.
Proof
If $X_n \to_{a.s.} X$, then $g(X_n) \to_{a.s.} g(X)$ by continuity. If $X_n \to_p X$, then for any subsequence there exists a further subsequence $X_{n_k} \to_{a.s.} X$; thus $g(X_{n_k}) \to_{a.s.} g(X)$, and $g(X_n) \to_p g(X)$ follows from (H) of Theorem 3.1. To prove $g(X_n) \to_d g(X)$ when $X_n \to_d X$, use (b) of Theorem 3.3: for any bounded continuous $f$, $f \circ g$ is bounded and continuous, so $E[f(g(X_n))] \to E[f(g(X))]$.
One remark
Theorem 3.6 concludes that $g(X_n) \to_d g(X)$ if $X_n \to_d X$ and $g$ is continuous. In fact, this result still holds if $P(X \in C(g)) = 1$, where $C(g)$ is the set of all continuity points of $g$. That is, if the discontinuity points of $g$ carry zero probability under the distribution of $X$, the continuous mapping theorem still holds.
Theorem 3.7 (Slutsky's theorem) Suppose $X_n \to_d X$, $Y_n \to_p y$ and $Z_n \to_p z$ for some constants $y$ and $z$. Then
$Z_n X_n + Y_n \to_d z X + y$.
Proof
First we show that $X_n + Y_n \to_d X + y$. For any $\epsilon > 0$,
$P(X_n + Y_n \le x) \le P(X_n + Y_n \le x, |Y_n - y| \le \epsilon) + P(|Y_n - y| > \epsilon) \le P(X_n \le x - y + \epsilon) + P(|Y_n - y| > \epsilon)$,
so, for $x - y + \epsilon$ a continuity point of $F_X$,
$\limsup_n F_{X_n + Y_n}(x) \le \limsup_n F_{X_n}(x - y + \epsilon) \le F_X(x - y + \epsilon)$.
On the other hand,
$P(X_n + Y_n > x) \le P(X_n + Y_n > x, |Y_n - y| \le \epsilon) + P(|Y_n - y| > \epsilon) \le P(X_n > x - y - \epsilon) + P(|Y_n - y| > \epsilon)$,
so
$\limsup_n (1 - F_{X_n + Y_n}(x)) \le \limsup_n P(X_n > x - y - \epsilon) \le \limsup_n P(X_n > x - y - 2\epsilon) \le 1 - F_X(x - y - 2\epsilon)$.
Combining the two bounds,
$F_X(x - y - 2\epsilon) \le \liminf_n F_{X_n + Y_n}(x) \le \limsup_n F_{X_n + Y_n}(x) \le F_X(x - y + \epsilon)$,
and letting $\epsilon \to 0$,
$F_{X+y}(x-) \le \liminf_n F_{X_n + Y_n}(x) \le \limsup_n F_{X_n + Y_n}(x) \le F_{X+y}(x)$,
so $F_{X_n + Y_n}(x) \to F_{X+y}(x)$ at each continuity point $x$ of $F_{X+y}$.
To complete the proof, note
$P(|(Z_n - z) X_n| > \epsilon) \le P(|Z_n - z| > \epsilon^2) + P(|Z_n - z| \le \epsilon^2, |X_n| > 1/\epsilon)$,
so
$\limsup_n P(|(Z_n - z) X_n| > \epsilon) \le \limsup_n P(|Z_n - z| > \epsilon^2) + \limsup_n P(|X_n| \ge 1/\epsilon) \le P(|X| \ge 1/(2\epsilon))$.
Since for any $\eta > 0$ and $\epsilon \le \eta$ the same bound applies to $P(|(Z_n - z) X_n| > \eta)$, letting $\epsilon \to 0$ gives $(Z_n - z) X_n \to_p 0$. Clearly $z X_n \to_d z X$, so $Z_n X_n = z X_n + (Z_n - z) X_n \to_d z X$ from the first half of the proof. Again using the first half, $Z_n X_n + Y_n \to_d z X + y$.
Examples
Suppose $X_n \to_d N(0, 1)$. Then by the continuous mapping theorem, $X_n^2 \to_d \chi_1^2$.
The next example shows that $g$ may be discontinuous in Theorem 3.6. Let $X_n \to_d X$ with $X \sim N(0, 1)$ and $g(x) = 1/x$. Although $g$ is discontinuous at the origin, we can still conclude $1/X_n \to_d 1/X$, the reciprocal of a standard normal variable, because $P(X = 0) = 0$. However, the earlier counter-example with $g(x) = I(x > 0)$ and $X_n$ degenerate at $1/n$ shows that Theorem 3.6 may fail when $P(X \in C(g)) < 1$.
The condition that $Y_n \to_p y$ with $y$ a constant is necessary. For example, let $X_n = X \sim$ Uniform(0, 1) and $Y_n = -X$. Then $X_n \to_d X$ and $Y_n \to_d -\tilde X$, where $\tilde X$ is an independent copy of $X$. However, $X_n + Y_n = 0$ does not converge in distribution to $X - \tilde X$.
Let $X_1, X_2, \dots$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2 > 0$. Then
$\sqrt{n}(\bar X_n - \mu) \to_d N(0, \sigma^2)$, $\quad s_n^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar X_n)^2 \to_{a.s.} \sigma^2$.
By Slutsky's theorem,
$\frac{\sqrt{n}(\bar X_n - \mu)}{s_n} \to_d \frac{1}{\sigma} N(0, \sigma^2) = N(0, 1)$.
Hence, in large samples, the $t_{n-1}$ distribution can be approximated by a standard normal distribution.
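A seeded Monte Carlo sketch of this normal approximation to the t-statistic (the sample size and replication count are arbitrary choices):

```python
import math
import random
import statistics

# The t-statistic sqrt(n)(Xbar - mu)/s_n is approximately N(0,1) for
# large n, so about 95% of replicates should fall in (-1.96, 1.96).
random.seed(0)
mu, sigma, n, reps = 5.0, 2.0, 100, 4000
inside = 0
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(xs)
    s = statistics.stdev(xs)                  # uses the n-1 divisor
    t = math.sqrt(n) * (xbar - mu) / s
    inside += abs(t) < 1.96
coverage = inside / reps
```

With $n = 100$ the exact $t_{99}$ coverage of $(-1.96, 1.96)$ is already within a fraction of a percent of the normal value 0.95.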
Representation of Weak Convergence
Theorem 3.8 (Skorohod's Representation Theorem) Let $\{X_n\}$ and $X$ be random variables on a probability space $(\Omega, \mathcal A, P)$ with $X_n \to_d X$. Then there exist another probability space $(\tilde \Omega, \tilde{\mathcal A}, \tilde P)$ and random variables $\tilde X_n$ and $\tilde X$ defined on it such that $\tilde X_n$ and $X_n$ have the same distributions, $\tilde X$ and $X$ have the same distributions, and moreover $\tilde X_n \to_{a.s.} \tilde X$.
Quantile function
$F^{-1}(p) = \inf\{x : F(x) \ge p\}$.
Proposition 3.1
(a) $F^{-1}$ is left-continuous.
(b) If $X$ has continuous distribution function $F$, then $F(X) \sim$ Uniform(0, 1).
(c) Let $\xi \sim$ Uniform(0, 1) and let $X = F^{-1}(\xi)$. Then for all $x$, $\{X \le x\} = \{\xi \le F(x)\}$. Thus, $X$ has distribution function $F$.
Proof
(a) Clearly $F^{-1}$ is nondecreasing. Suppose $p_n$ increases to $p$; then $F^{-1}(p_n)$ increases to some $y \le F^{-1}(p)$. Since $F(y) \ge p_n$ for each $n$, $F(y) \ge p$, so $F^{-1}(p) \le y$ and hence $y = F^{-1}(p)$.
(b) From $\{X \le x\} \subset \{F(X) \le F(x)\}$, $F(x) \le P(F(X) \le F(x))$. From $\{F(X) \le F(x) - \epsilon\} \subset \{X \le x\}$, $P(F(X) \le F(x) - \epsilon) \le F(x)$; letting $\epsilon \downarrow 0$ gives $P(F(X) < F(x)) \le F(x)$. When $F$ is continuous, $P(F(X) = F(x)) = 0$ (the level set $\{y : F(y) = F(x)\}$ carries no probability), so $P(F(X) \le F(x)) = F(x)$. Since $F$ is continuous, $F(x)$ takes every value in $(0, 1)$, so $F(X) \sim$ Uniform(0, 1).
(c) $F^{-1}(\xi) \le x$ if and only if $\xi \le F(x)$, so $P(X \le x) = P(\xi \le F(x)) = F(x)$.
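Part (c) is the basis of inverse-transform sampling; a minimal sketch for the Exponential(1) distribution, where $F^{-1}$ has the closed form $-\log(1 - p)$:

```python
import math
import random

# Inverse-transform sampling (Proposition 3.1(c)): if xi ~ Uniform(0,1),
# then F^{-1}(xi) has distribution F. Here F(x) = 1 - e^{-x}, so
# F^{-1}(p) = -log(1 - p), and the resulting sample should have mean 1.
def exp_quantile(p):
    return -math.log(1.0 - p)

random.seed(1)
sample = [exp_quantile(random.random()) for _ in range(100_000)]
mean = sum(sample) / len(sample)
```

The sample mean lands close to the Exponential(1) mean of 1, as (c) predicts.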
Proof of Theorem 3.8
Let $(\tilde \Omega, \tilde{\mathcal A}, \tilde P)$ be $([0, 1], \mathcal B \cap [0, 1], \lambda)$, the unit interval with Lebesgue measure. Define $\tilde X_n = F_n^{-1}(\xi)$ and $\tilde X = F^{-1}(\xi)$, where $\xi \sim$ Uniform(0, 1). Then $\tilde X_n$ has distribution $F_n$, the same as $X_n$, and $\tilde X$ has distribution $F$.
Take any $t \in (0, 1)$ such that there is at most one value $x$ with $F(x) = t$ (it is easy to see such $t$ are the continuity points of $F^{-1}$). For any continuity point $z < x$ of $F$, $F(z) < t$, so $F_n(z) < t$ for large $n$ and thus $F_n^{-1}(t) \ge z$; letting $z \uparrow x$ gives $\liminf_n F_n^{-1}(t) \ge x = F^{-1}(t)$. From $F(x + \epsilon) > t$ at continuity points $x + \epsilon$ of $F$, $F_n(x + \epsilon) > t$ for large $n$, so $F_n^{-1}(t) \le x + \epsilon$ and $\limsup_n F_n^{-1}(t) \le x$. Thus $F_n^{-1}(t) \to F^{-1}(t)$ for almost every $t \in (0, 1)$, i.e., $\tilde X_n \to_{a.s.} \tilde X$.
Usefulness of the representation theorem
For example, if $X_n \to_d X$ and one wishes to show that some function of $X_n$, say $g(X_n)$, converges in distribution to $g(X)$: see the diagram in Figure 2.
Alternative Proof of Slutsky's Theorem
First, show $(X_n, Y_n) \to_d (X, y)$:
$|\phi_{(X_n, Y_n)}(t_1, t_2) - \phi_{(X, y)}(t_1, t_2)| = |E[e^{i t_1 X_n} e^{i t_2 Y_n}] - E[e^{i t_1 X}] e^{i t_2 y}|$
$\le |E[e^{i t_1 X_n}(e^{i t_2 Y_n} - e^{i t_2 y})]| + |e^{i t_2 y}| \, |E[e^{i t_1 X_n}] - E[e^{i t_1 X}]|$
$\le E[|e^{i t_2 Y_n} - e^{i t_2 y}|] + |E[e^{i t_1 X_n}] - E[e^{i t_1 X}]| \to 0$.
Similarly, $(Z_n, X_n) \to_d (z, X)$. Since $g(z, x) = zx$ is continuous, $Z_n X_n \to_d z X$. Since $(Z_n X_n, Y_n) \to_d (zX, y)$ and $g(x, y) = x + y$ is continuous, $Z_n X_n + Y_n \to_d zX + y$.
Summation of Independent R.V.s
Some preliminary lemmas
Proposition 3.2 (Borel-Cantelli Lemma) For any events $A_n$, $\sum_{n=1}^\infty P(A_n) < \infty$ implies
$P(A_n \text{ i.o.}) = P(\{A_n\} \text{ occurs infinitely often}) = 0$; equivalently, $P(\bigcap_{n=1}^\infty \bigcup_{m \ge n} A_m) = 0$.
Proof $P(A_n \text{ i.o.}) \le P(\bigcup_{m \ge n} A_m) \le \sum_{m \ge n} P(A_m) \to 0$, as $n \to \infty$.
One consequence of the first Borel-Cantelli lemma
If, for a sequence of random variables $\{Z_n\}$ and every $\epsilon > 0$, $\sum_n P(|Z_n| > \epsilon) < \infty$, then $|Z_n| > \epsilon$ occurs only finitely often; hence $Z_n \to_{a.s.} 0$.
Proposition 3.3 (Second Borel-Cantelli Lemma) For a sequence of independent events $A_1, A_2, \dots$, $\sum_{n=1}^\infty P(A_n) = \infty$ implies $P(A_n \text{ i.o.}) = 1$.
Proof Consider the complement of $\{A_n \text{ i.o.}\}$:
$P\left(\bigcup_{n=1}^\infty \bigcap_{m \ge n} A_m^c\right) = \lim_n P\left(\bigcap_{m \ge n} A_m^c\right) = \lim_n \prod_{m \ge n} (1 - P(A_m)) \le \limsup_n \exp\left\{-\sum_{m \ge n} P(A_m)\right\} = 0$.
Equivalence lemma
Proposition 3.4 $X_1, \dots, X_n$ are i.i.d. with finite mean. Define $Y_n = X_n I(|X_n| \le n)$. Then $\sum_{n=1}^\infty P(X_n \ne Y_n) < \infty$.
Proof Since $E[|X_1|] < \infty$,
$\sum_{n=1}^\infty P(X_n \ne Y_n) = \sum_{n=1}^\infty P(|X_n| > n) \le \sum_{n=1}^\infty P(|X| \ge n) = \sum_{n=1}^\infty n P(n \le |X| < n + 1) \le E[|X|] < \infty$.
From the Borel-Cantelli Lemma, $P(X_n \ne Y_n \text{ i.o.}) = 0$: for almost every $\omega \in \Omega$, $X_n(\omega) = Y_n(\omega)$ when $n$ is large enough.
Weak Law of Large Numbers
Theorem 3.9 (Weak Law of Large Numbers) If $X, X_1, \dots, X_n$ are i.i.d. with mean $\mu$ (so $E[|X|] < \infty$ and $\mu = E[X]$), then $\bar X_n \to_p \mu$.
Proof Define $Y_k = X_k I(|X_k| \le k)$ and let $\mu_n = \sum_{k=1}^n E[Y_k]/n$. By Chebyshev's inequality,
$P(|\bar Y_n - \mu_n| \ge \epsilon) \le \frac{Var(\bar Y_n)}{\epsilon^2} \le \frac{\sum_{k=1}^n Var(X_k I(|X_k| \le k))}{n^2 \epsilon^2}$.
Splitting at $\sqrt{k}\,\epsilon^2$,
$Var(X_k I(|X_k| \le k)) \le E[X_k^2 I(|X_k| \le k)] = E[X_k^2 I(|X_k| \le k, |X_k| \ge \sqrt{k}\,\epsilon^2)] + E[X_k^2 I(|X_k| \le k, |X_k| < \sqrt{k}\,\epsilon^2)] \le k E[|X| I(|X| \ge \sqrt{k}\,\epsilon^2)] + k \epsilon^4$,
so
$P(|\bar Y_n - \mu_n| \ge \epsilon) \le \frac{1}{n \epsilon^2} \sum_{k=1}^n E[|X| I(|X| \ge \sqrt{k}\,\epsilon^2)] + \epsilon^2 \frac{n(n+1)}{2 n^2}$.
The first average tends to zero by dominated convergence, so $\limsup_n P(|\bar Y_n - \mu_n| \ge \epsilon) \le \epsilon^2$; since $\epsilon$ is arbitrary, $\bar Y_n - \mu_n \to_p 0$. Moreover $E[Y_k] \to \mu$, so $\mu_n \to \mu$ and $\bar Y_n \to_p \mu$. From Proposition 3.4, $X_n = Y_n$ for all large $n$ almost surely, so $\bar X_n - \bar Y_n \to_{a.s.} 0$ and hence $\bar X_n \to_p \mu$.
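A seeded illustration of the conclusion $\bar X_n \to_p \mu$ for Uniform(0, 1) draws (the sample sizes are arbitrary):

```python
import random

# The sample mean of i.i.d. Uniform(0,1) draws (mu = 0.5) concentrates
# around 0.5 as n grows.
random.seed(2)

def sample_mean(n):
    return sum(random.random() for _ in range(n)) / n

errors = {n: abs(sample_mean(n) - 0.5) for n in (10, 1000, 100_000)}
```

The deviation at $n = 10^5$ is on the order of $\sqrt{1/(12n)} \approx 10^{-3}$.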
Strong Law of Large Numbers
Theorem 3.10 (Strong Law of Large Numbers) If $X_1, \dots, X_n$ are i.i.d. with mean $\mu$, then $\bar X_n \to_{a.s.} \mu$.
Proof Without loss of generality, assume $X_n \ge 0$; if the result holds in this case, it holds for general $X_n$ by writing $X_n = X_n^+ - X_n^-$. Similar to Theorem 3.9, it is sufficient to show $\bar Y_n \to_{a.s.} \mu$, where $Y_n = X_n I(X_n \le n)$. Note $E[Y_n] = E[X_1 I(X_1 \le n)] \to \mu$, so $\sum_{k=1}^n E[Y_k]/n \to \mu$. Thus, if we denote $S_n = \sum_{k=1}^n (Y_k - E[Y_k])$ and can show $S_n/n \to_{a.s.} 0$, then the result holds.
$Var(S_n) = \sum_{k=1}^n Var(Y_k) \le \sum_{k=1}^n E[Y_k^2] \le n E[X_1^2 I(X_1 \le n)]$.
By Chebyshev's inequality,
$P(|S_n/n| > \epsilon) \le \frac{Var(S_n)}{n^2 \epsilon^2} \le \frac{E[X_1^2 I(X_1 \le n)]}{n \epsilon^2}$.
For any $\alpha > 1$, let $u_n = [\alpha^n]$. Then
$\sum_{n=1}^\infty P(|S_{u_n}/u_n| > \epsilon) \le \sum_{n=1}^\infty \frac{1}{u_n \epsilon^2} E[X_1^2 I(X_1 \le u_n)] = \frac{1}{\epsilon^2} E\left[X_1^2 \sum_{n : u_n \ge X_1} \frac{1}{u_n}\right]$.
Since for any $x > 0$, $\sum_{n : u_n \ge x} u_n^{-1} \le K x^{-1}$ for some constant $K$ (a geometric series), the right-hand side is bounded by $\frac{K}{\epsilon^2} E[X_1] < \infty$, so by the first Borel-Cantelli lemma $S_{u_n}/u_n \to_{a.s.} 0$.
For any $k$, find $u_n < k \le u_{n+1}$. Writing $T_k = Y_1 + \dots + Y_k$ and using $Y_i \ge 0$,
$\frac{u_n}{u_{n+1}} \cdot \frac{T_{u_n}}{u_n} \le \frac{T_k}{k} \le \frac{u_{n+1}}{u_n} \cdot \frac{T_{u_{n+1}}}{u_{n+1}}$.
Since $T_{u_n}/u_n \to_{a.s.} \mu$ and $u_{n+1}/u_n \to \alpha$, almost surely
$\mu/\alpha \le \liminf_k \frac{T_k}{k} \le \limsup_k \frac{T_k}{k} \le \mu \alpha$.
Since $\alpha > 1$ is arbitrary, letting $\alpha \downarrow 1$ gives $\lim_k T_k/k = \mu$.
Central Limit Theorems
Preliminary result on characteristic functions
Proposition 3.5 Suppose $E[|X|^m] < \infty$ for some integer $m \ge 0$. Then
$\left|\phi_X(t) - \sum_{k=0}^m \frac{(it)^k E[X^k]}{k!}\right| / |t|^m \to 0$, as $t \to 0$.
Proof By Taylor expansion, for some $\theta \in [0, 1]$ (depending on $x$ and $t$),
$e^{itx} = \sum_{k=0}^m \frac{(itx)^k}{k!} + \frac{(itx)^m}{m!}\left[e^{i t \theta x} - 1\right]$.
Therefore
$\left|\phi_X(t) - \sum_{k=0}^m \frac{(it)^k E[X^k]}{k!}\right| / |t|^m \le E[|X|^m |e^{i t \theta X} - 1|]/m! \to 0$
as $t \to 0$, by the dominated convergence theorem.
Simple versions of the CLT
Theorem 3.11 (Central Limit Theorem) If $X_1, \dots, X_n$ are i.i.d. with mean $\mu$ and variance $\sigma^2$, then $\sqrt{n}(\bar X_n - \mu) \to_d N(0, \sigma^2)$.
Proof Denote $Y_n = \sqrt{n}(\bar X_n - \mu)$. Then
$\phi_{Y_n}(t) = \left\{\phi_{X_1 - \mu}(t/\sqrt{n})\right\}^n$, $\quad \phi_{X_1 - \mu}(t/\sqrt{n}) = 1 - \frac{\sigma^2 t^2}{2n} + o(1/n)$
by Proposition 3.5. Hence $\phi_{Y_n}(t) \to \exp\{-\sigma^2 t^2/2\}$, the characteristic function of $N(0, \sigma^2)$.
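A seeded sketch of Theorem 3.11 for Uniform(0, 1) summands, checking that the standardized mean lands in $(-1.96, 1.96)$ about 95% of the time (sample size and replication count are arbitrary):

```python
import math
import random

# CLT illustration: for Uniform(0,1), mu = 1/2 and sigma^2 = 1/12, so
# sqrt(n)(Xbar - 1/2)/sigma should be approximately N(0,1).
random.seed(3)
n, reps = 50, 20_000
sigma = math.sqrt(1.0 / 12.0)
inside = 0
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    z = math.sqrt(n) * (xbar - 0.5) / sigma
    inside += abs(z) < 1.96
coverage = inside / reps
```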
Theorem 3.12 (Multivariate Central Limit Theorem) If $X_1, \dots, X_n$ are i.i.d. random vectors in $R^k$ with mean $\mu$ and covariance $\Sigma = E[(X - \mu)(X - \mu)^T]$, then $\sqrt{n}(\bar X_n - \mu) \to_d N(0, \Sigma)$.
Proof Use the Cramér-Wold device.
Liapunov CLT
Theorem 3.13 (Liapunov Central Limit Theorem) Let $X_{n1}, \dots, X_{nn}$ be independent random variables with $\mu_{ni} = E[X_{ni}]$ and $\sigma_{ni}^2 = Var(X_{ni})$. Let $\mu_n = \sum_{i=1}^n \mu_{ni}$ and $\sigma_n^2 = \sum_{i=1}^n \sigma_{ni}^2$. If
$\frac{\sum_{i=1}^n E[|X_{ni} - \mu_{ni}|^3]}{\sigma_n^3} \to 0$,
then $\sum_{i=1}^n (X_{ni} - \mu_{ni})/\sigma_n \to_d N(0, 1)$.
Lindeberg-Feller CLT
Theorem 3.14 (Lindeberg-Feller Central Limit Theorem) Let $X_{n1}, \dots, X_{nn}$ be independent random variables with $\mu_{ni} = E[X_{ni}]$ and $\sigma_{ni}^2 = Var(X_{ni})$. Let $\sigma_n^2 = \sum_{i=1}^n \sigma_{ni}^2$. Then both
$\sum_{i=1}^n (X_{ni} - \mu_{ni})/\sigma_n \to_d N(0, 1)$ and $\max\{\sigma_{ni}^2/\sigma_n^2 : 1 \le i \le n\} \to 0$
hold if and only if the Lindeberg condition
$\frac{1}{\sigma_n^2} \sum_{i=1}^n E[|X_{ni} - \mu_{ni}|^2 I(|X_{ni} - \mu_{ni}| \ge \epsilon \sigma_n)] \to 0$, for all $\epsilon > 0$,
holds.
Proof of the Liapunov CLT using Theorem 3.14
$\frac{1}{\sigma_n^2} \sum_{k=1}^n E[|X_{nk} - \mu_{nk}|^2 I(|X_{nk} - \mu_{nk}| > \epsilon \sigma_n)] \le \frac{1}{\epsilon \sigma_n^3} \sum_{k=1}^n E[|X_{nk} - \mu_{nk}|^3]$.
Examples
The first example comes from simple linear regression: $X_j = \alpha + \beta z_j + \epsilon_j$ for $j = 1, 2, \dots$, where the $z_j$ are known numbers, not all equal, and the $\epsilon_j$ are i.i.d. with mean zero and variance $\sigma^2$. The least squares estimator is
$\hat\beta_n = \frac{\sum_{j=1}^n X_j (z_j - \bar z_n)}{\sum_{j=1}^n (z_j - \bar z_n)^2} = \beta + \frac{\sum_{j=1}^n \epsilon_j (z_j - \bar z_n)}{\sum_{j=1}^n (z_j - \bar z_n)^2}$.
Assume $\max_{j \le n} (z_j - \bar z_n)^2 / \sum_{j=1}^n (z_j - \bar z_n)^2 \to 0$. Then, by the Lindeberg-Feller CLT,
$\sqrt{\sum_{j=1}^n (z_j - \bar z_n)^2} \, (\hat\beta_n - \beta) \to_d N(0, \sigma^2)$.
The second example is the randomization test for paired comparisons. Let $(X_j, Y_j)$ denote the values of the $j$th pair, with $X_j$ the result of the treatment, and let $Z_j = X_j - Y_j$. When treatment and control have no difference, conditional on $|Z_j| = z_j$, $Z_j = \pm z_j$ independently with probability 1/2 each. Conditional on $z_1, z_2, \dots$, the randomization t-test uses the t-statistic $\sqrt{n} \bar Z_n / s_z$, where $s_z^2 = \frac1n \sum_{j=1}^n (Z_j - \bar Z_n)^2$. When $\max_{j \le n} z_j^2 / \sum_{j=1}^n z_j^2 \to 0$, this statistic has an asymptotic standard normal distribution $N(0, 1)$.
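The conditional sign-flip structure can be enumerated exactly for small $n$; a sketch with made-up $|z_j|$ values (the helper names are illustrative):

```python
from itertools import product

# Paired randomization distribution: conditional on the |z_j|, each
# Z_j = +/-|z_j| with probability 1/2, so the distribution of the sum
# is exactly symmetric about 0. Enumerate all 2^n sign patterns.
z_abs = [0.8, 1.5, 0.3, 2.1, 0.9, 1.2, 0.4, 1.7]  # hypothetical |z_j| values

sums = [sum(s * z for s, z in zip(signs, z_abs))
        for signs in product((-1, 1), repeat=len(z_abs))]

def randomization_pvalue(observed):
    # two-sided p-value: fraction of sign patterns at least as extreme
    return sum(abs(s) >= abs(observed) for s in sums) / len(sums)
```

Flipping every sign negates the sum, so the enumerated distribution is symmetric, and the most extreme observed sum is matched by exactly two of the $2^8 = 256$ patterns.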
Delta Method
Theorem 3.15 (Delta method) For random vectors $X$ and $X_n$ in $R^k$, suppose there exist constants $a_n$ with $a_n \to \infty$ and a vector $\mu$ such that $a_n(X_n - \mu) \to_d X$. Then for any function $g : R^k \to R^l$ that is differentiable at $\mu$, with derivative matrix denoted $\nabla g(\mu)$,
$a_n(g(X_n) - g(\mu)) \to_d \nabla g(\mu) X$.
Proof By the Skorohod representation, we can construct $\tilde X_n$ and $\tilde X$ such that $\tilde X_n =_d X_n$ and $\tilde X =_d X$ ($=_d$ means equality in distribution) and $a_n(\tilde X_n - \mu) \to_{a.s.} \tilde X$. Then
$a_n(g(\tilde X_n) - g(\mu)) \to_{a.s.} \nabla g(\mu) \tilde X$, so $a_n(g(X_n) - g(\mu)) \to_d \nabla g(\mu) X$.
Examples
Let $X_1, X_2, \dots$ be i.i.d. with finite fourth moment and let $s_n^2 = \frac1n \sum_{i=1}^n (X_i - \bar X_n)^2$. Denote by $m_k$ the $k$th moment of $X_1$, $k \le 4$. Note that $s_n^2 = \frac1n \sum_{i=1}^n X_i^2 - \left(\frac1n \sum_{i=1}^n X_i\right)^2$ and
$\sqrt{n}\left[\begin{pmatrix} \bar X_n \\ \frac1n \sum_{i=1}^n X_i^2 \end{pmatrix} - \begin{pmatrix} m_1 \\ m_2 \end{pmatrix}\right] \to_d N\left(0, \begin{pmatrix} m_2 - m_1^2 & m_3 - m_1 m_2 \\ m_3 - m_1 m_2 & m_4 - m_2^2 \end{pmatrix}\right)$.
The Delta method with $g(x, y) = y - x^2$ gives (taking $m_1 = 0$ without loss of generality, since $s_n^2$ is location invariant)
$\sqrt{n}(s_n^2 - Var(X_1)) \to_d N(0, m_4 - (m_2 - m_1^2)^2)$.
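This limit can be checked by simulation; for $N(0, 1)$ data, $m_4 = 3$ and $Var(X_1) = 1$, so the asymptotic variance is $3 - 1 = 2$. A seeded sketch (sample size and replication count are arbitrary):

```python
import random
import statistics

# Delta-method limit for s_n^2 with N(0,1) data:
# sqrt(n)(s_n^2 - 1) is approximately N(0, 2).
random.seed(4)
n, reps = 200, 5000
vals = []
for _ in range(reps):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / n   # the 1/n version from the text
    vals.append((n ** 0.5) * (s2 - 1.0))
sim_var = statistics.pvariance(vals)
```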
Let $(X_1, Y_1), (X_2, Y_2), \dots$ be i.i.d. bivariate samples with finite fourth moments. One estimate of the correlation between $X$ and $Y$ is
$\hat\rho_n = \frac{s_{xy}}{\sqrt{s_x^2 s_y^2}}$,
where $s_{xy} = \frac1n \sum_{i=1}^n (X_i - \bar X_n)(Y_i - \bar Y_n)$, $s_x^2 = \frac1n \sum_{i=1}^n (X_i - \bar X_n)^2$ and $s_y^2 = \frac1n \sum_{i=1}^n (Y_i - \bar Y_n)^2$. To derive the large sample distribution of $\hat\rho_n$, first obtain the joint large sample distribution of $(s_{xy}, s_x^2, s_y^2)$ using the Delta method, then apply the Delta method again with $g(x, y, z) = x/\sqrt{yz}$.
The last example is the Pearson chi-square statistic. Suppose each subject falls into one of $K$ categories with probabilities $p_1, \dots, p_K$, where $p_1 + \dots + p_K = 1$. Pearson's statistic is
$\chi^2 = n \sum_{k=1}^K \left(\frac{n_k}{n} - p_k\right)^2 / p_k$,
which can be read as $\sum (\text{observed count} - \text{expected count})^2 / \text{expected count}$. Note that $\sqrt{n}(n_1/n - p_1, \dots, n_K/n - p_K)$ has an asymptotic multivariate normal distribution. Then we can apply the continuous mapping theorem with $g(x_1, \dots, x_K) = \sum_{k=1}^K x_k^2 / p_k$.
U-statistics
Definition
Definition 3.6 A U-statistic associated with the kernel $h(x_1, \dots, x_r)$ is defined as
$U_n = \frac{1}{r! \binom{n}{r}} \sum_\beta h(X_{\beta_1}, \dots, X_{\beta_r})$,
where the sum is taken over all ordered $r$-tuples $\beta$ of $r$ different integers chosen from $\{1, \dots, n\}$.
Examples
One simple example is $h(x, y) = xy$; then $U_n = \frac{1}{n(n-1)} \sum_{i \ne j} X_i X_j$.
Note that $U_n = E[h(X_1, \dots, X_r) \mid X_{(1)}, \dots, X_{(n)}]$, and that $U_n$ is a sum of non-independent random variables.
If we define $\tilde h(x_1, \dots, x_r) = \frac{1}{r!} \sum_{(\tilde x_1, \dots, \tilde x_r)} h(\tilde x_1, \dots, \tilde x_r)$, where the sum is over all permutations of $(x_1, \dots, x_r)$, then $\tilde h$ is permutation-symmetric and
$U_n = \frac{1}{\binom{n}{r}} \sum_{\beta_1 < \dots < \beta_r} \tilde h(X_{\beta_1}, \dots, X_{\beta_r})$.
$h$ is called the kernel of the U-statistic $U_n$.
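For the kernel $h(x, y) = xy$, the U-statistic can be computed both from its definition and in closed form; a small sketch:

```python
from itertools import permutations

# U-statistic with kernel h(x, y) = x*y:
# U_n = (1 / (n(n-1))) * sum over ordered pairs i != j of X_i X_j,
# which equals ((sum X)^2 - sum X^2) / (n(n-1)).
def u_stat_pairs(xs):
    n = len(xs)
    return sum(x * y for x, y in permutations(xs, 2)) / (n * (n - 1))

def u_stat_closed_form(xs):
    n = len(xs)
    return (sum(xs) ** 2 - sum(x * x for x in xs)) / (n * (n - 1))

xs = [1.0, 2.0, 3.0, 4.0]
```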
CLT for U-statistics
Theorem 3.16 Let $\mu = E[h(X_1, \dots, X_r)]$. If $E[h(X_1, \dots, X_r)^2] < \infty$, then
$\sqrt{n}(U_n - \mu) - \sqrt{n} \sum_{i=1}^n E[U_n - \mu \mid X_i] \to_p 0$.
Consequently, $\sqrt{n}(U_n - \mu)$ is asymptotically normal with mean zero and variance $r^2 \sigma^2$, where, with $X_1, \dots, X_r, \tilde X_1, \dots, \tilde X_r$ i.i.d. variables,
$\sigma^2 = Cov(h(X_1, X_2, \dots, X_r), h(X_1, \tilde X_2, \dots, \tilde X_r))$.
Some preparation
Linear space of random variables: let $\mathcal S$ be a linear space of random variables with finite second moments that contains the constants; i.e., $1 \in \mathcal S$ and for any $X, Y \in \mathcal S$, $aX + bY \in \mathcal S$, where $a$ and $b$ are constants.
Projection: for a random variable $T$, a random variable $S$ is called the projection of $T$ on $\mathcal S$ if $S \in \mathcal S$ and $E[(T - S)^2]$ minimizes $E[(T - \tilde S)^2]$ over $\tilde S \in \mathcal S$.
Proposition 3.7 Let $\mathcal S$ be a linear space of random variables with finite second moments. Then $S$ is the projection of $T$ on $\mathcal S$ if and only if $S \in \mathcal S$ and for any $\tilde S \in \mathcal S$, $E[(T - S) \tilde S] = 0$. Every two projections of $T$ onto $\mathcal S$ are almost surely equal. If the linear space $\mathcal S$ contains the constant variables, then $E[T] = E[S]$ and $Cov(T - S, \tilde S) = 0$ for every $\tilde S \in \mathcal S$.
Proof For any $S$ and $\tilde S$ in $\mathcal S$,
$E[(T - \tilde S)^2] = E[(T - S)^2] + 2 E[(T - S)(S - \tilde S)] + E[(S - \tilde S)^2]$.
If $S$ satisfies $E[(T - S) \tilde S] = 0$ for all $\tilde S \in \mathcal S$, then the cross term vanishes (as $S - \tilde S \in \mathcal S$), so $E[(T - \tilde S)^2] \ge E[(T - S)^2]$ and $S$ is the projection of $T$ on $\mathcal S$.
Conversely, if $S$ is the projection, then for any $\tilde S \in \mathcal S$ and constant $\alpha$, $E[(T - S - \alpha \tilde S)^2]$ is minimized at $\alpha = 0$; calculating the derivative at $\alpha = 0$ gives $E[(T - S) \tilde S] = 0$.
If $T$ has two projections $S_1$ and $S_2$, the same orthogonality gives $E[(S_1 - S_2)^2] = 0$, so $S_1 = S_2$ a.s.
If $\mathcal S$ contains the constants, choose $\tilde S = 1$: $0 = E[(T - S) \cdot 1] = E[T] - E[S]$. Clearly, $Cov(T - S, \tilde S) = E[(T - S) \tilde S] - E[T - S] E[\tilde S] = 0$ for every $\tilde S \in \mathcal S$.
Equivalence with the projection
Proposition 3.8 Let $\mathcal S_n$ be linear spaces of random variables with finite second moments that contain the constants. Let $T_n$ be random variables with projections $S_n$ onto $\mathcal S_n$. If $Var(T_n)/Var(S_n) \to 1$, then
$Z_n \equiv \frac{T_n - E[T_n]}{\sqrt{Var(T_n)}} - \frac{S_n - E[S_n]}{\sqrt{Var(S_n)}} \to_p 0$.
Proof $E[Z_n] = 0$. Note that
$Var(Z_n) = 2 - 2 \frac{Cov(T_n, S_n)}{\sqrt{Var(T_n) Var(S_n)}}$.
Since $S_n$ is the projection of $T_n$,
$Cov(T_n, S_n) = Cov(T_n - S_n, S_n) + Var(S_n) = Var(S_n)$.
We obtain
$Var(Z_n) = 2\left(1 - \sqrt{\frac{Var(S_n)}{Var(T_n)}}\right) \to 0$.
By Chebyshev's inequality, $Z_n \to_p 0$.
Conclusion
If $S_n$ is a sum of i.i.d. random variables with $(S_n - E[S_n])/\sqrt{Var(S_n)} \to_d N(0, 1)$, then $(T_n - E[T_n])/\sqrt{Var(T_n)}$ has the same limit. The limit distribution of U-statistics is derived using this lemma.
Proof of the CLT for U-statistics
Proof Let $\tilde X_1, \dots, \tilde X_r$ be random variables with the same distribution as $X_1$ and independent of $X_1, \dots, X_n$. Denote $\tilde U_n = \sum_{i=1}^n E[U_n - \mu \mid X_i]$. We show that $\tilde U_n$ is the projection of $U_n - \mu$ on the linear space
$\mathcal S_n = \{g_1(X_1) + \dots + g_n(X_n) : E[g_k(X_k)^2] < \infty, \ k = 1, \dots, n\}$,
which contains the constants. Clearly $\tilde U_n \in \mathcal S_n$. For any $g_k(X_k) \in \mathcal S_n$,
$E[(U_n - \mu - \tilde U_n) g_k(X_k)] = E[E[U_n - \mu - \tilde U_n \mid X_k] g_k(X_k)] = 0$.
Computing $\tilde U_n$:
$\tilde U_n = \sum_{i=1}^n \frac{\binom{n-1}{r-1}}{\binom{n}{r}} E[h(\tilde X_1, \dots, \tilde X_{r-1}, X_i) - \mu \mid X_i] = \frac{r}{n} \sum_{i=1}^n E[h(\tilde X_1, \dots, \tilde X_{r-1}, X_i) - \mu \mid X_i]$.
Hence
$Var(\tilde U_n) = \frac{r^2}{n^2} \sum_{i=1}^n E[(E[h(\tilde X_1, \dots, \tilde X_{r-1}, X_i) - \mu \mid X_i])^2]$
$= \frac{r^2}{n} Cov(h(X_1, X_2, \dots, X_r), h(X_1, \tilde X_2, \dots, \tilde X_r)) = \frac{r^2 \sigma^2}{n}$.
Furthermore,
$Var(U_n) = \binom{n}{r}^{-2} \sum_\beta \sum_{\beta'} Cov(\tilde h(X_{\beta_1}, \dots, X_{\beta_r}), \tilde h(X_{\beta'_1}, \dots, X_{\beta'_r}))$,
where the covariance depends only on the number $k$ of components shared by $\beta$ and $\beta'$:
$\zeta_k = Cov(\tilde h(X_1, \dots, X_k, X_{k+1}, \dots, X_r), \tilde h(X_1, \dots, X_k, \tilde X_{k+1}, \dots, \tilde X_r))$.
Counting the pairs sharing exactly $k$ components gives
$Var(U_n) = \sum_{k=1}^r \frac{r!}{k!(r-k)!} \cdot \frac{(n-r)(n-r-1) \cdots (n-2r+k+1)}{n(n-1) \cdots (n-r+1)} \, c_k$,
with constants $c_k$ proportional to $\zeta_k$. The $k = 1$ term dominates:
$Var(U_n) = \frac{r^2}{n} Cov(h(X_1, X_2, \dots, X_r), h(X_1, \tilde X_2, \dots, \tilde X_r)) + O(1/n^2)$.
Hence $Var(U_n)/Var(\tilde U_n) \to 1$, and by Proposition 3.8,
$\frac{U_n - \mu}{\sqrt{Var(U_n)}} - \frac{\tilde U_n}{\sqrt{Var(\tilde U_n)}} \to_p 0$,
which yields the theorem.
Example
In a bivariate i.i.d. sample $(X_1, Y_1), (X_2, Y_2), \dots$, one statistic measuring agreement is Kendall's $\tau$-statistic
$\hat\tau = \frac{4}{n(n-1)} \sum_{i < j} I\{(Y_j - Y_i)(X_j - X_i) > 0\} - 1$.
$\hat\tau + 1$ is a U-statistic of order 2 with kernel $2 I\{(y_2 - y_1)(x_2 - x_1) > 0\}$. Hence
$\sqrt{n}(\hat\tau_n + 1 - 2 P((Y_2 - Y_1)(X_2 - X_1) > 0))$
has an asymptotic normal distribution with mean zero.
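A direct computation of $\hat\tau$ from its U-statistic form (assuming no ties, as in the continuous case):

```python
from itertools import combinations

# Kendall's tau-hat: 4/(n(n-1)) * (number of concordant pairs) - 1,
# i.e. the order-2 U-statistic with kernel 2*I{(y2-y1)(x2-x1) > 0},
# minus 1. Ties are ignored in this sketch.
def kendall_tau(xs, ys):
    n = len(xs)
    concordant = sum((ys[j] - ys[i]) * (xs[j] - xs[i]) > 0
                     for i, j in combinations(range(n), 2))
    return 4.0 * concordant / (n * (n - 1)) - 1.0

x = [1.0, 2.0, 3.0, 4.0, 5.0]
```

Perfect agreement gives $\hat\tau = 1$ and perfect reversal gives $\hat\tau = -1$.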
114 CHAPTER 3 LARGE SAMPLE THEORY 114 Rank Statistics
115 CHAPTER 3 LARGE SAMPLE THEORY 115 Some definitions X_(1) ≤ X_(2) ≤ ... ≤ X_(n) are called the order statistics. The rank statistics, denoted by R_1, ..., R_n, are the ranks of the X_i among X_1, ..., X_n. Thus, if all the X's are different, X_i = X_(R_i). When there are ties, R_i is defined as the average of all indices j such that X_i = X_(j) (sometimes called the midrank). We only consider the case where the X's have continuous densities, so ties occur with probability zero.
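The midrank definition can be sketched directly (pure Python, hypothetical data; an O(n²) version kept deliberately simple):

```python
def midranks(xs):
    # rank of each x_i among x_1, ..., x_n; tied values get the average
    # of the indices j with x_i = x_(j) (the midrank)
    sorted_xs = sorted(xs)
    def rank(v):
        idx = [j + 1 for j, s in enumerate(sorted_xs) if s == v]
        return sum(idx) / len(idx)
    return [rank(v) for v in xs]
```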
116 CHAPTER 3 LARGE SAMPLE THEORY 116 More definitions A rank statistic is any function of the ranks. A linear rank statistic is a rank statistic of the special form Σ_{i=1}^n a(i, R_i) for a given matrix (a(i, j))_{n×n}. If a(i, j) = c_i a_j, then the statistic takes the form Σ_{i=1}^n c_i a_{R_i} and is called a simple linear rank statistic; the c's and a's are called the coefficients and scores.
117 CHAPTER 3 LARGE SAMPLE THEORY 117 Examples For two independent samples X_1, ..., X_n and Y_1, ..., Y_m, the Wilcoxon statistic is defined as the sum of all the ranks of the second sample in the pooled data X_1, ..., X_n, Y_1, ..., Y_m, i.e., W_n = Σ_{i=n+1}^{n+m} R_i. Other choices of scores give other rank statistics: for instance, the van der Waerden statistic Σ_{i=n+1}^{n+m} Φ^{−1}(R_i/(n+m+1)).
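A minimal sketch of W_n for two small hypothetical samples (all values distinct, so ordinary ranks apply):

```python
def wilcoxon_rank_sum(xs, ys):
    # W_n: sum of the ranks of the second sample within the pooled data
    pooled = xs + ys
    order = sorted(range(len(pooled)), key=lambda i: pooled[i])
    ranks = [0] * len(pooled)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return sum(ranks[len(xs):])

W = wilcoxon_rank_sum([0.3, 1.7, 2.2], [0.9, 2.8])  # hypothetical samples
```

Here the pooled ranks of the second sample are 2 and 5, so W = 7; the possible range for m = 2, n = 3 is 3 (both Y's smallest) to 9 (both largest).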
118 CHAPTER 3 LARGE SAMPLE THEORY 118 Properties of rank statistics Proposition 3.9 Let X_1, ..., X_n be a random sample from a continuous distribution function F with density f. Then
1. the vectors (X_(1), ..., X_(n)) and (R_1, ..., R_n) are independent;
2. the vector (X_(1), ..., X_(n)) has density n! Π_{i=1}^n f(x_i) on the set x_1 < ... < x_n;
3. the variable X_(i) has density n C(n−1, i−1) F(x)^{i−1} (1 − F(x))^{n−i} f(x); for F the uniform distribution on [0, 1], it has mean i/(n+1) and variance i(n−i+1)/[(n+1)²(n+2)];
119 CHAPTER 3 LARGE SAMPLE THEORY 119
4. the vector (R_1, ..., R_n) is uniformly distributed on the set of all n! permutations of 1, 2, ..., n;
5. for any statistic T and permutation r = (r_1, ..., r_n) of 1, 2, ..., n, E[T(X_1, ..., X_n) | (R_1, ..., R_n) = r] = E[T(X_(r_1), ..., X_(r_n))];
6. for any simple linear rank statistic T = Σ_{i=1}^n c_i a_{R_i},
E[T] = n c̄_n ā_n,  Var(T) = [1/(n−1)] Σ_{i=1}^n (c_i − c̄_n)² Σ_{i=1}^n (a_i − ā_n)².
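Part 6 can be verified exactly for small n by averaging over all n! equally likely rank vectors (part 4); the coefficients and scores below are arbitrary choices:

```python
from itertools import permutations
from fractions import Fraction

c = [Fraction(v) for v in (1, 3, 0, 2)]   # arbitrary coefficients
a = [Fraction(v) for v in (2, 5, 7, 1)]   # arbitrary scores
n = len(c)

# (R_1, ..., R_n) is uniform over all n! permutations, so the exact
# moments of T = sum_i c_i a_{R_i} follow by direct enumeration.
vals = [sum(ci * a[r - 1] for ci, r in zip(c, perm))
        for perm in permutations(range(1, n + 1))]
ET = sum(vals) / len(vals)
VarT = sum(v**2 for v in vals) / len(vals) - ET**2

cbar, abar = sum(c) / n, sum(a) / n
assert ET == n * cbar * abar
assert VarT == Fraction(1, n - 1) * sum((ci - cbar)**2 for ci in c) \
                                  * sum((ai - abar)**2 for ai in a)
```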
120 CHAPTER 3 LARGE SAMPLE THEORY 120 CLT of rank statistics Theorem 3.17 Let T_n = Σ_{i=1}^n c_i a_{R_i} be such that
max_{i≤n} |a_i − ā_n| / √(Σ_{i=1}^n (a_i − ā_n)²) → 0,  max_{i≤n} |c_i − c̄_n| / √(Σ_{i=1}^n (c_i − c̄_n)²) → 0.
Then (T_n − E[T_n])/√Var(T_n) →_d N(0, 1) if and only if, for every ϵ > 0,
Σ_{(i,j)} [ |a_i − ā_n|² |c_j − c̄_n|² / ( Σ_{i=1}^n (a_i − ā_n)² Σ_{i=1}^n (c_i − c̄_n)² ) ] I{ √n |a_i − ā_n| |c_j − c̄_n| / √( Σ_{i=1}^n (a_i − ā_n)² Σ_{i=1}^n (c_i − c̄_n)² ) > ϵ } → 0.
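The negligibility conditions on the scores are easy to check numerically. For Wilcoxon-type scores a_i = i (an illustrative choice), the ratio max_i |a_i − ā_n| / √(Σ_i (a_i − ā_n)²) behaves like √(3/N) → 0:

```python
from math import sqrt

def score_ratio(a):
    # max_i |a_i - abar| / sqrt(sum_i (a_i - abar)^2)
    abar = sum(a) / len(a)
    return max(abs(v - abar) for v in a) / sqrt(sum((v - abar)**2 for v in a))

# Wilcoxon-type scores a_i = i: the ratio decays like sqrt(3/N)
ratios = [score_ratio(list(range(1, N + 1))) for N in (10, 100, 1000)]
```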
121 CHAPTER 3 LARGE SAMPLE THEORY 121 More on rank statistics A simple linear signed rank statistic has the form Σ_{i=1}^n a_{R_i^+} sign(X_i), where R_1^+, ..., R_n^+, the absolute ranks, are the ranks of |X_1|, ..., |X_n|. In a bivariate sample (X_1, Y_1), ..., (X_n, Y_n), one can use Σ_{i=1}^n a_{R_i} b_{S_i}, where (R_1, ..., R_n) and (S_1, ..., S_n) are the respective ranks of (X_1, ..., X_n) and (Y_1, ..., Y_n).
122 CHAPTER 3 LARGE SAMPLE THEORY 122 Martingales
123 CHAPTER 3 LARGE SAMPLE THEORY 123 Definition 3.7 Let {Y_n} be a sequence of random variables and {F_n} a sequence of σ-fields such that F_1 ⊂ F_2 ⊂ ... Suppose E[|Y_n|] < ∞. Then the sequence of pairs {(Y_n, F_n)} is called a martingale if E[Y_n | F_{n−1}] = Y_{n−1}, a.s.; a submartingale if E[Y_n | F_{n−1}] ≥ Y_{n−1}, a.s.; and a supermartingale if E[Y_n | F_{n−1}] ≤ Y_{n−1}, a.s.
124 CHAPTER 3 LARGE SAMPLE THEORY 124 Some notes on the definition Y_1, ..., Y_n are measurable with respect to F_n; sometimes we say Y_n is adapted to F_n. One simple example: Y_n = X_1 + ... + X_n, where X_1, X_2, ... are i.i.d. with mean zero and F_n is the σ-field generated by X_1, ..., X_n.
125 CHAPTER 3 LARGE SAMPLE THEORY 125 Convex function of martingales Proposition 3.9 Let {(Y n, F n )} be a martingale. For any measurable and convex function ϕ, {(ϕ(y n ), F n )} is a submartingale.
126 CHAPTER 3 LARGE SAMPLE THEORY 126 Proof Clearly, ϕ(Y_n) is adapted to F_n. It is sufficient to show E[ϕ(Y_n) | F_{n−1}] ≥ ϕ(Y_{n−1}). This follows from the well-known Jensen's inequality: for any convex function ϕ, E[ϕ(Y_n) | F_{n−1}] ≥ ϕ(E[Y_n | F_{n−1}]) = ϕ(Y_{n−1}).
127 CHAPTER 3 LARGE SAMPLE THEORY 127 Jensen's inequality Proposition 3.10 For any random variable X and any convex measurable function ϕ, E[ϕ(X)] ≥ ϕ(E[X]).
128 CHAPTER 3 LARGE SAMPLE THEORY 128 Proof We claim that for any x_0 there exists a constant k_0 such that ϕ(x) ≥ ϕ(x_0) + k_0(x − x_0) for all x. By convexity, for any x′ < y′ < x_0 < y < x,
[ϕ(x_0) − ϕ(x′)]/(x_0 − x′) ≤ [ϕ(y) − ϕ(x_0)]/(y − x_0) ≤ [ϕ(x) − ϕ(x_0)]/(x − x_0).
Thus [ϕ(x) − ϕ(x_0)]/(x − x_0) is bounded below and decreasing as x decreases to x_0. Let the limit be k_0^+; then [ϕ(x) − ϕ(x_0)]/(x − x_0) ≥ k_0^+, i.e.,
ϕ(x) ≥ k_0^+(x − x_0) + ϕ(x_0) for x > x_0.
129 CHAPTER 3 LARGE SAMPLE THEORY 129 Similarly,
[ϕ(x′) − ϕ(x_0)]/(x′ − x_0) ≤ [ϕ(y′) − ϕ(x_0)]/(y′ − x_0) ≤ [ϕ(x) − ϕ(x_0)]/(x − x_0).
Then [ϕ(x′) − ϕ(x_0)]/(x′ − x_0) is increasing and bounded above as x′ increases to x_0. Let the limit be k_0^−; then
ϕ(x′) ≥ k_0^−(x′ − x_0) + ϕ(x_0) for x′ < x_0.
Clearly, k_0^+ ≥ k_0^−. Combining the two inequalities, ϕ(x) ≥ ϕ(x_0) + k_0(x − x_0) for all x, with k_0 = (k_0^+ + k_0^−)/2. Choose x_0 = E[X]; then ϕ(X) ≥ ϕ(E[X]) + k_0(X − E[X]), and taking expectations gives E[ϕ(X)] ≥ ϕ(E[X]).
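A numerical sanity check of the supporting-line construction, with the convex choice ϕ(x) = x² and a hypothetical four-point uniform distribution (ϕ is differentiable here, so k_0^− = k_0^+ = ϕ′(x_0) = 2x_0):

```python
phi = lambda t: t * t          # a convex, differentiable choice
xs = [-1.0, 0.0, 2.0, 5.0]     # hypothetical equally likely support points
EX = sum(xs) / len(xs)
E_phi = sum(phi(t) for t in xs) / len(xs)
assert E_phi >= phi(EX)        # Jensen's inequality

# supporting line at x0 = E[X]: here k0 = phi'(x0) = 2*x0
k0 = 2 * EX
assert all(phi(t) >= phi(EX) + k0 * (t - EX) - 1e-12 for t in xs)
```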
130 CHAPTER 3 LARGE SAMPLE THEORY 130 Decomposition of a submartingale
Y_n = (Y_n − E[Y_n | F_{n−1}]) + E[Y_n | F_{n−1}]:
any submartingale can be written as the sum of a martingale and a random variable predictable in F_{n−1}.
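For a concrete instance (our example, not from the slides), take the fair ±1 random walk S_n: Y_n = S_n² is a submartingale with E[Y_n | F_{n−1}] = Y_{n−1} + 1, so the predictable part is A_n = n and M_n = S_n² − n is a martingale. This can be verified exactly by enumerating all paths:

```python
from itertools import product

# Fair +/-1 random walk S_n: Y_n = S_n^2 is a submartingale,
# E[Y_n | F_{n-1}] = Y_{n-1} + 1, so A_n = n is predictable and
# M_n = S_n^2 - n is the martingale part.
n = 6
for prefix in product((-1, 1), repeat=n - 1):
    s_prev = sum(prefix)
    # conditional expectation given F_{n-1}: average over the two next steps
    cond_Y = sum((s_prev + step) ** 2 for step in (-1, 1)) / 2
    assert cond_Y == s_prev ** 2 + 1                 # submartingale: >= Y_{n-1}
    cond_M = sum((s_prev + step) ** 2 - n for step in (-1, 1)) / 2
    assert cond_M == s_prev ** 2 - (n - 1)           # martingale: = M_{n-1}
verified = True
```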
131 CHAPTER 3 LARGE SAMPLE THEORY 131 Convergence of martingales Theorem 3.18 (Martingale Convergence Theorem) Let {(X_n, F_n)} be a submartingale. If K = sup_n E[|X_n|] < ∞, then X_n →_a.s. X, where X is a random variable satisfying E[|X|] ≤ K.
132 CHAPTER 3 LARGE SAMPLE THEORY 132 Corollary 3.1 If {F_n} is an increasing sequence of σ-fields and F_∞ denotes the σ-field generated by ∪_{n=1}^∞ F_n, then for any random variable Z with E[|Z|] < ∞, it holds that E[Z | F_n] →_a.s. E[Z | F_∞].
133 CHAPTER 3 LARGE SAMPLE THEORY 133 CLT for martingales Theorem 3.19 (Martingale Central Limit Theorem) For each n, let (Y_n1, F_n1), (Y_n2, F_n2), ... be a martingale. Define X_nk = Y_nk − Y_{n,k−1} with Y_n0 = 0, so that Y_nk = X_n1 + ... + X_nk. Suppose that Σ_k E[X_nk² | F_{n,k−1}] →_p σ², where σ² is a positive constant, and that Σ_k E[X_nk² I(|X_nk| ≥ ϵ) | F_{n,k−1}] →_p 0 for each ϵ > 0. Then Σ_k X_nk →_d N(0, σ²).
134 CHAPTER 3 LARGE SAMPLE THEORY 134 Some Notation
135 CHAPTER 3 LARGE SAMPLE THEORY 135 o_p(1) and O_p(1) X_n = o_p(1) denotes that X_n converges in probability to zero; X_n = O_p(1) denotes that X_n is bounded in probability, i.e.,
lim_{M→∞} lim sup_n P(|X_n| ≥ M) = 0.
For a sequence of random variables {r_n}, X_n = o_p(r_n) means that X_n/r_n →_p 0, and X_n = O_p(r_n) means that X_n/r_n is bounded in probability.
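To illustrate boundedness in probability: for i.i.d. Uniform(0, 1) draws, X̄_n − µ = O_p(n^{−1/2}), i.e. √n(X̄_n − µ) = O_p(1). A seeded simulation (our construction; the bound M and replication counts are arbitrary) shows the scaled deviations staying within a fixed bound across n:

```python
import random

random.seed(0)
mu = 0.5   # mean of Uniform(0, 1)

def scaled_dev(n):
    # one realization of sqrt(n) * |Xbar_n - mu|
    xs = [random.random() for _ in range(n)]
    return n ** 0.5 * abs(sum(xs) / n - mu)

# sqrt(n)(Xbar_n - mu) = O_p(1): its 99th percentile stays under a fixed M
devs = {n: sorted(scaled_dev(n) for _ in range(1000)) for n in (50, 2000)}
for d in devs.values():
    assert d[990] < 1.5   # sd is sqrt(1/12) ~ 0.29, so this bound is generous
```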
136 CHAPTER 3 LARGE SAMPLE THEORY 136 Algebra with o_p(1) and O_p(1)
o_p(1) + o_p(1) = o_p(1), O_p(1) + O_p(1) = O_p(1), O_p(1) o_p(1) = o_p(1), (1 + o_p(1))^{−1} = 1 + o_p(1), o_p(R_n) = R_n o_p(1), O_p(R_n) = R_n O_p(1), o_p(O_p(1)) = o_p(1).
If a real function R(·) satisfies R(h) = o(|h|^p) as h → 0 and X_n →_p 0, then R(X_n) = o_p(|X_n|^p); if R(h) = O(|h|^p) as h → 0, then R(X_n) = O_p(|X_n|^p).
137 CHAPTER 3 LARGE SAMPLE THEORY 137 READING MATERIALS: Lehmann and Casella, Section 1.8, Ferguson, Part 1, Part 2, Part
Elements of Probability Theory CHUNG-MING KUAN Department of Finance National Taiwan University December 5, 2009 C.-M. Kuan (National Taiwan Univ.) Elements of Probability Theory December 5, 2009 1 / 58
More informationMATHS 730 FC Lecture Notes March 5, Introduction
1 INTRODUCTION MATHS 730 FC Lecture Notes March 5, 2014 1 Introduction Definition. If A, B are sets and there exists a bijection A B, they have the same cardinality, which we write as A, #A. If there exists
More informationExercises in Extreme value theory
Exercises in Extreme value theory 2016 spring semester 1. Show that L(t) = logt is a slowly varying function but t ǫ is not if ǫ 0. 2. If the random variable X has distribution F with finite variance,
More informationDalal-Schmutz (2002) and Diaconis-Freedman (1999) 1. Random Compositions. Evangelos Kranakis School of Computer Science Carleton University
Dalal-Schmutz (2002) and Diaconis-Freedman (1999) 1 Random Compositions Evangelos Kranakis School of Computer Science Carleton University Dalal-Schmutz (2002) and Diaconis-Freedman (1999) 2 Outline 1.
More information