CHAPTER 3: LARGE SAMPLE THEORY

Introduction
Why large sample theory: studying small-sample properties is usually difficult and complicated, while large sample theory studies the limit behavior of a sequence of random variables, say $X_n$. Examples: $\bar{X}_n - \mu$, $\sqrt{n}(\bar{X}_n - \mu)$.
Modes of Convergence
Convergence almost surely. Definition 3.1: $X_n$ is said to converge almost surely to $X$, denoted by $X_n \rightarrow_{a.s.} X$, if there exists a set $A \subset \Omega$ such that $P(A^c) = 0$ and for each $\omega \in A$, $X_n(\omega) \rightarrow X(\omega)$ in real space.

Equivalent condition:
$$\{\omega : X_n(\omega) \rightarrow X(\omega)\}^c = \bigcup_{\epsilon > 0} \bigcap_{n} \{\omega : \sup_{m \ge n} |X_m(\omega) - X(\omega)| > \epsilon\},$$
so $X_n \rightarrow_{a.s.} X$ iff $P(\sup_{m \ge n} |X_m - X| > \epsilon) \rightarrow 0$ for each $\epsilon > 0$.
Convergence in probability. Definition 3.2: $X_n$ is said to converge in probability to $X$, denoted by $X_n \rightarrow_p X$, if for every $\epsilon > 0$, $P(|X_n - X| > \epsilon) \rightarrow 0$.

Convergence in moments/means. Definition 3.3: $X_n$ is said to converge in $r$th mean to $X$, denoted by $X_n \rightarrow_r X$, if $E[|X_n - X|^r] \rightarrow 0$ as $n \rightarrow \infty$ for $X_n, X \in L_r(P)$, where $X \in L_r(P)$ means $\int |X|^r \, dP < \infty$.

Convergence in distribution. Definition 3.4: $X_n$ is said to converge in distribution to $X$, denoted by $X_n \rightarrow_d X$ or $F_n \rightarrow_d F$ (or $\mathcal{L}(X_n) \rightarrow \mathcal{L}(X)$, with $\mathcal{L}$ referring to the law or distribution), if the distribution functions $F_n$ and $F$ of $X_n$ and $X$ satisfy $F_n(x) \rightarrow F(x)$ as $n \rightarrow \infty$ for each continuity point $x$ of $F$.

Uniform integrability. Definition 3.5: A sequence of random variables $\{X_n\}$ is uniformly integrable if
$$\lim_{\lambda \rightarrow \infty} \limsup_n E[|X_n| I(|X_n| \ge \lambda)] = 0.$$
A note: convergence almost surely and convergence in probability are the same as defined in measure theory. The two new definitions are convergence in $r$th mean and convergence in distribution.

Convergence in distribution is very different from the others. Example: the sequence $X, Y, X, Y, \ldots$, where $X$ and $Y$ are independent $N(0,1)$ random variables, converges in distribution to $N(0,1)$, but none of the other modes of convergence holds. Convergence in distribution is important for asymptotic statistical inference.
Relationship among different modes. Theorem 3.1:
A. If $X_n \rightarrow_{a.s.} X$, then $X_n \rightarrow_p X$.
B. If $X_n \rightarrow_p X$, then $X_{n_k} \rightarrow_{a.s.} X$ for some subsequence $\{X_{n_k}\}$.
C. If $X_n \rightarrow_r X$, then $X_n \rightarrow_p X$.
D. If $X_n \rightarrow_p X$ and $\{|X_n|^r\}$ is uniformly integrable, then $X_n \rightarrow_r X$.
E. If $X_n \rightarrow_p X$ and $\limsup_n E[|X_n|^r] \le E[|X|^r]$, then $X_n \rightarrow_r X$.
F. If $X_n \rightarrow_r X$, then $X_n \rightarrow_{r'} X$ for any $0 < r' \le r$.
G. If $X_n \rightarrow_p X$, then $X_n \rightarrow_d X$.
H. $X_n \rightarrow_p X$ if and only if for every subsequence $\{X_{n_k}\}$ there exists a further subsequence $\{X_{n_{k_l}}\}$ such that $X_{n_{k_l}} \rightarrow_{a.s.} X$.
I. If $X_n \rightarrow_d c$ for a constant $c$, then $X_n \rightarrow_p c$.
Proof. A and B follow from the results in measure theory.

Prove C. Markov's inequality: for any increasing function $g(\cdot)$ and random variable $Y$, $P(|Y| > \epsilon) \le E[g(|Y|)/g(\epsilon)]$. Hence
$$P(|X_n - X| > \epsilon) \le E\left[\frac{|X_n - X|^r}{\epsilon^r}\right] \rightarrow 0.$$
Prove D. It is sufficient to show that for any subsequence of $\{X_n\}$, there exists a further subsequence $\{X_{n_k}\}$ such that $E[|X_{n_k} - X|^r] \rightarrow 0$. For any subsequence of $\{X_n\}$, from B, there exists a further subsequence $\{X_{n_k}\}$ such that $X_{n_k} \rightarrow_{a.s.} X$. For any $\epsilon > 0$, there exists $\lambda$ such that $\limsup_{n_k} E[|X_{n_k}|^r I(|X_{n_k}|^r \ge \lambda)] < \epsilon$. In particular, choose $\lambda$ such that $P(|X|^r = \lambda) = 0$; then $|X_{n_k}|^r I(|X_{n_k}|^r \ge \lambda) \rightarrow_{a.s.} |X|^r I(|X|^r \ge \lambda)$. By Fatou's Lemma,
$$E[|X|^r I(|X|^r \ge \lambda)] \le \limsup_{n_k} E[|X_{n_k}|^r I(|X_{n_k}|^r \ge \lambda)] < \epsilon.$$

Then
$$E[|X_{n_k} - X|^r] \le E[|X_{n_k} - X|^r I(|X_{n_k}|^r < 2\lambda, |X|^r < 2\lambda)] + 2^r E[(|X_{n_k}|^r + |X|^r) I(|X_{n_k}|^r \ge 2\lambda \text{ or } |X|^r \ge 2\lambda)],$$
where the last inequality follows from $(x + y)^r \le 2^r (\max(x, y))^r \le 2^r (x^r + y^r)$ for $x, y \ge 0$. The first term converges to zero by the dominated convergence theorem. When $n_k$ is large, the second term is bounded by
$$2 \cdot 2^r \{E[|X_{n_k}|^r I(|X_{n_k}|^r \ge \lambda)] + E[|X|^r I(|X|^r \ge \lambda)]\},$$
so $\limsup_{n_k} E[|X_{n_k} - X|^r] \le 2^{r+2} \epsilon$; since $\epsilon$ is arbitrary, the claim follows.

Prove E. It is sufficient to show that for any subsequence of $\{X_n\}$, there exists a further subsequence $\{X_{n_k}\}$ such that $E[|X_{n_k} - X|^r] \rightarrow 0$. For any subsequence of $\{X_n\}$, there exists a further subsequence $\{X_{n_k}\}$ such that $X_{n_k} \rightarrow_{a.s.} X$. Define $Y_{n_k} = 2^r (|X_{n_k}|^r + |X|^r) - |X_{n_k} - X|^r \ge 0$. By Fatou's Lemma,
$$\int \liminf_{n_k} Y_{n_k} \, dP \le \liminf_{n_k} \int Y_{n_k} \, dP.$$
This is equivalent to
$$2^{r+1} E[|X|^r] \le \liminf_{n_k} \left\{2^r E[|X_{n_k}|^r] + 2^r E[|X|^r] - E[|X_{n_k} - X|^r]\right\},$$
which, combined with $\limsup_n E[|X_n|^r] \le E[|X|^r]$, gives $E[|X_{n_k} - X|^r] \rightarrow 0$.
Prove F. The Hölder inequality:
$$\int |f(x) g(x)| \, d\mu(x) \le \left\{\int |f(x)|^p \, d\mu(x)\right\}^{1/p} \left\{\int |g(x)|^q \, d\mu(x)\right\}^{1/q}, \quad \frac{1}{p} + \frac{1}{q} = 1.$$
Choose $\mu = P$, $f = |X_n - X|^{r'}$, $g \equiv 1$, $p = r/r'$ and $q = r/(r - r')$ in the Hölder inequality:
$$E[|X_n - X|^{r'}] \le E[|X_n - X|^r]^{r'/r} \rightarrow 0.$$
Prove G. Suppose $X_n \rightarrow_p X$. If $P(X = x) = 0$, then for any $\epsilon > 0$ and $\delta > 0$,
$$P(|I(X_n \le x) - I(X \le x)| > \epsilon) = P(|I(X_n \le x) - I(X \le x)| > \epsilon, |X - x| > \delta) + P(|I(X_n \le x) - I(X \le x)| > \epsilon, |X - x| \le \delta)$$
$$\le P(X_n \le x, X > x + \delta) + P(X_n > x, X < x - \delta) + P(|X - x| \le \delta) \le P(|X_n - X| > \delta) + P(|X - x| \le \delta).$$
The first term converges to zero since $X_n \rightarrow_p X$. The second term can be made arbitrarily small by taking $\delta$ small, since $\lim_{\delta \rightarrow 0} P(|X - x| \le \delta) = P(X = x) = 0$. Thus $I(X_n \le x) \rightarrow_p I(X \le x)$, and
$$F_n(x) = E[I(X_n \le x)] \rightarrow E[I(X \le x)] = F(x).$$
Prove H. One direction follows from B. To prove the other direction, argue by contradiction. Suppose there exists $\epsilon > 0$ such that $P(|X_n - X| > \epsilon)$ does not converge to zero. Then we can find a subsequence $\{X_{n'}\}$ such that $P(|X_{n'} - X| > \epsilon) > \delta$ for some $\delta > 0$. However, by the condition, there exists a further subsequence $\{X_{n''}\}$ such that $X_{n''} \rightarrow_{a.s.} X$; then $X_{n''} \rightarrow_p X$ from A. Contradiction!

Prove I. Let $X \equiv c$. Then, at continuity points $c \pm \epsilon$ of $F$,
$$P(|X_n - c| > \epsilon) \le 1 - F_n(c + \epsilon) + F_n(c - \epsilon) \rightarrow 1 - F(c + \epsilon) + F(c - \epsilon) = 0.$$
Some counter-examples.

(Example 1) Suppose $X_n$ is degenerate at the point $1/n$, i.e., $P(X_n = 1/n) = 1$. Then $X_n$ converges in distribution to zero. Indeed, $X_n$ also converges to zero almost surely.

(Example 2) $X_1, X_2, \ldots$ are i.i.d. with the standard normal distribution. Then $X_n \rightarrow_d X_1$, but $X_n$ does not converge in probability to $X_1$.

(Example 3) Let $Z$ be a random variable with a uniform distribution on $[0, 1]$. Let $X_n = I(m 2^{-k} \le Z < (m + 1) 2^{-k})$ when $n = 2^k + m$, where $0 \le m < 2^k$. Then $X_n$ converges in probability to zero but not almost surely. This example was already given in the second chapter.

(Example 4) Let $Z$ be Uniform(0,1) and let $X_n = 2^n I(0 \le Z < 1/n)$. Then $E[|X_n|^r] \rightarrow \infty$, but $X_n$ converges to zero almost surely.
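A minimal simulation sketch of Example 4 (the sample size, seed, and use of numpy are my own assumed choices): for each fixed $Z$, $X_n(Z) = 2^n I(0 \le Z < 1/n)$ is eventually 0, yet $E[X_n] = 2^n/n$ diverges.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.uniform(size=100_000)
for n in [5, 10, 20]:
    X_n = np.where(Z < 1.0 / n, 2.0 ** n, 0.0)
    # the empirical mean blows up (about 2^n / n) while most paths are already 0
    print(n, X_n.mean(), np.mean(X_n == 0))
```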
Result for convergence in rth mean. Theorem 3.2 (Vitali's theorem) Suppose that $X_n \in L_r(P)$, i.e., $\|X_n\|_r < \infty$, where $0 < r < \infty$, and $X_n \rightarrow_p X$. Then the following are equivalent:
A. $\{|X_n|^r\}$ is uniformly integrable.
B. $X_n \rightarrow_r X$.
C. $E[|X_n|^r] \rightarrow E[|X|^r]$.

One sufficient condition for uniform integrability is the Liapunov condition: there exists a positive constant $\epsilon_0$ such that $\limsup_n E[|X_n|^{r + \epsilon_0}] < \infty$. Indeed,
$$E[|X_n|^r I(|X_n| \ge \lambda)] \le \frac{E[|X_n|^{r + \epsilon_0}]}{\lambda^{\epsilon_0}}.$$
Integral inequalities

Young's inequality:
$$ab \le \frac{a^p}{p} + \frac{b^q}{q}, \quad a, b > 0, \ \frac{1}{p} + \frac{1}{q} = 1,$$
where the equality holds if and only if $a^p = b^q$. It follows since $\log x$ is concave:
$$\log\left(\frac{1}{p} a^p + \frac{1}{q} b^q\right) \ge \frac{1}{p} \log a^p + \frac{1}{q} \log b^q = \log(ab).$$
Geometric interpretation (figure omitted).
Hölder inequality:
$$\int |f(x) g(x)| \, d\mu(x) \le \left\{\int |f(x)|^p \, d\mu(x)\right\}^{1/p} \left\{\int |g(x)|^q \, d\mu(x)\right\}^{1/q}.$$
It follows by applying Young's inequality to
$$a = |f(x)| \Big/ \left\{\int |f(x)|^p \, d\mu(x)\right\}^{1/p}, \quad b = |g(x)| \Big/ \left\{\int |g(x)|^q \, d\mu(x)\right\}^{1/q}$$
and integrating. When $\mu = P$ and $f, g$ are powers of $|X(\omega)|$, we obtain the moment inequality $\mu_s^{r - t} \le \mu_t^{r - s} \mu_r^{s - t}$, where $\mu_r = E[|X|^r]$ and $r \ge s \ge t \ge 0$. When $p = q = 2$, we obtain the Cauchy-Schwarz inequality:
$$\int |f(x) g(x)| \, d\mu(x) \le \left\{\int |f(x)|^2 \, d\mu(x)\right\}^{1/2} \left\{\int |g(x)|^2 \, d\mu(x)\right\}^{1/2}.$$
Minkowski's inequality: for $r \ge 1$, $\|X + Y\|_r \le \|X\|_r + \|Y\|_r$. Derivation:
$$E[|X + Y|^r] \le E[(|X| + |Y|) |X + Y|^{r-1}] \le E[|X|^r]^{1/r} E[|X + Y|^r]^{1 - 1/r} + E[|Y|^r]^{1/r} E[|X + Y|^r]^{1 - 1/r}.$$
Thus $\|\cdot\|_r$ is in fact a norm on the linear space $\{X : \|X\|_r < \infty\}$. Such a normed space is denoted $L_r(P)$.
Markov's inequality:
$$P(|X| \ge \epsilon) \le \frac{E[g(|X|)]}{g(\epsilon)},$$
where $g \ge 0$ is an increasing function on $[0, \infty)$. Derivation:
$$P(|X| \ge \epsilon) \le P(g(|X|) \ge g(\epsilon)) = E[I(g(|X|) \ge g(\epsilon))] \le E\left[\frac{g(|X|)}{g(\epsilon)}\right].$$
When $g(x) = x^2$ and $X$ is replaced by $X - \mu$, we obtain Chebyshev's inequality:
$$P(|X - \mu| \ge \epsilon) \le \frac{Var(X)}{\epsilon^2}.$$

Application of Vitali's theorem: $Y_1, Y_2, \ldots$ are i.i.d. with mean $\mu$ and variance $\sigma^2$. Let $X_n = \bar{Y}_n$. By Chebyshev's inequality, $X_n \rightarrow_p \mu$:
$$P(|X_n - \mu| > \epsilon) \le \frac{Var(X_n)}{\epsilon^2} = \frac{\sigma^2}{n \epsilon^2} \rightarrow 0.$$
From the Liapunov condition with $r = 1$ and $\epsilon_0 = 1$, $X_n - \mu$ satisfies the uniform integrability condition, so $E[|X_n - \mu|] \rightarrow 0$.
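A small numerical sketch of this application (the Exponential(1) population, with $\mu = 1$, and all constants are assumed choices, not from the notes): both $P(|\bar{Y}_n - \mu| > \epsilon)$ and $E[|\bar{Y}_n - \mu|]$ shrink with $n$, matching Vitali's theorem with $r = 1$.

```python
import numpy as np

rng = np.random.default_rng(1)
for n in [10, 100, 1000]:
    # 5000 Monte Carlo replicates of the sample mean of n Exponential(1) draws
    Ybar = rng.exponential(1.0, size=(5000, n)).mean(axis=1)
    print(n, np.mean(np.abs(Ybar - 1.0) > 0.1), np.mean(np.abs(Ybar - 1.0)))
```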
Convergence in Distribution

Convergence in distribution is the most important mode of convergence in statistical inference.

Equivalent conditions. Theorem 3.3 (Portmanteau Theorem) The following conditions are equivalent:
(a) $X_n$ converges in distribution to $X$.
(b) For any bounded continuous function $g(\cdot)$, $E[g(X_n)] \rightarrow E[g(X)]$.
(c) For any open set $G$ in $R$, $\liminf_n P(X_n \in G) \ge P(X \in G)$.
(d) For any closed set $F$ in $R$, $\limsup_n P(X_n \in F) \le P(X \in F)$.
(e) For any Borel set $O$ in $R$ with $P(X \in \partial O) = 0$, where $\partial O$ is the boundary of $O$, $P(X_n \in O) \rightarrow P(X \in O)$.
Proof. (a) $\Rightarrow$ (b). Without loss of generality, assume $|g(x)| \le 1$. Choose $[-M, M]$ such that $P(|X| = M) = 0$. Since $g$ is continuous on $[-M, M]$, $g$ is uniformly continuous there. Partition $[-M, M]$ into finitely many intervals $I_1, \ldots, I_m$ such that within each interval $I_k$, $\max_{I_k} g(x) - \min_{I_k} g(x) \le \epsilon$, and $X$ has no mass at any of the endpoints of $I_k$ (why is this possible?).

Therefore, choosing any point $x_k \in I_k$, $k = 1, \ldots, m$,
$$|E[g(X_n)] - E[g(X)]| \le E[|g(X_n)| I(|X_n| > M)] + E[|g(X)| I(|X| > M)]$$
$$+ \left|E[g(X_n) I(|X_n| \le M)] - \sum_{k=1}^m g(x_k) P(X_n \in I_k)\right| + \left|\sum_{k=1}^m g(x_k) P(X_n \in I_k) - \sum_{k=1}^m g(x_k) P(X \in I_k)\right|$$
$$+ \left|\sum_{k=1}^m g(x_k) P(X \in I_k) - E[g(X) I(|X| \le M)]\right|$$
$$\le P(|X_n| > M) + P(|X| > M) + 2\epsilon + \sum_{k=1}^m |P(X_n \in I_k) - P(X \in I_k)|.$$
Hence $\limsup_n |E[g(X_n)] - E[g(X)]| \le 2 P(|X| > M) + 2\epsilon$. Let $M \rightarrow \infty$ and $\epsilon \rightarrow 0$.
(b) $\Rightarrow$ (c). For any open set $G$, define $g(x) = d(x, G^c)/(\epsilon + d(x, G^c))$, where $d(x, G^c)$ is the minimal distance between $x$ and $G^c$, $\inf_{y \in G^c} |x - y|$. For any $y \in G^c$,
$$d(x_1, G^c) \le |x_1 - y| \le |x_1 - x_2| + |x_2 - y|,$$
so $d(x_1, G^c) - d(x_2, G^c) \le |x_1 - x_2|$, and
$$|g(x_1) - g(x_2)| \le \epsilon^{-1} |d(x_1, G^c) - d(x_2, G^c)| \le \epsilon^{-1} |x_1 - x_2|.$$
Thus $g(x)$ is continuous and bounded, so $E[g(X_n)] \rightarrow E[g(X)]$. Note $0 \le g(x) \le I_G(x)$, so
$$\liminf_n P(X_n \in G) \ge \liminf_n E[g(X_n)] = E[g(X)].$$
Letting $\epsilon \rightarrow 0$, $E[g(X)]$ converges to $E[I(X \in G)] = P(X \in G)$.

(c) $\Rightarrow$ (d). This is clear by taking the complement of $F$.

(d) $\Rightarrow$ (e). For any $O$ with $P(X \in \partial O) = 0$, writing $\bar{O}$ and $O^o$ for the closure and interior of $O$,
$$\limsup_n P(X_n \in O) \le \limsup_n P(X_n \in \bar{O}) \le P(X \in \bar{O}) = P(X \in O),$$
$$\liminf_n P(X_n \in O) \ge \liminf_n P(X_n \in O^o) \ge P(X \in O^o) = P(X \in O).$$

(e) $\Rightarrow$ (a). Choose $O = (-\infty, x]$ with $P(X \in \partial O) = P(X = x) = 0$.
Counter-examples. The boundedness of $g$ in (b) is necessary: let $g(x) = x$, a continuous but unbounded function, and let $X_n$ take value $n$ with probability $1/n$ and value $0$ with probability $1 - 1/n$. Then $X_n \rightarrow_d 0$, but $E[g(X_n)] = 1$ does not converge to $0$. The continuity at the boundary in (e) is also necessary: let $X_n$ be degenerate at $1/n$ and consider $O = \{x : x > 0\}$. Then $P(X_n \in O) = 1$, but $X_n \rightarrow_d 0$ and $P(X \in O) = 0$, since $\partial O = \{0\}$ carries all the mass of the limit.
Weak Convergence and Characteristic Functions

Theorem 3.4 (Continuity Theorem) Let $\phi_n$ and $\phi$ denote the characteristic functions of $X_n$ and $X$ respectively. Then $X_n \rightarrow_d X$ is equivalent to $\phi_n(t) \rightarrow \phi(t)$ for each $t$.

Proof. To prove the "$\Rightarrow$" direction: from (b) of Theorem 3.3, $\phi_n(t) = E[e^{itX_n}] \rightarrow E[e^{itX}] = \phi(t)$. The proof of the "$\Leftarrow$" direction consists of a few tricky constructions (skipped).
One simple example: $X_1, \ldots, X_n \sim$ Bernoulli$(p)$. Then
$$\phi_{\bar{X}_n}(t) = E[e^{it(X_1 + \ldots + X_n)/n}] = (1 - p + p e^{it/n})^n = (1 - p + p + itp/n + o(1/n))^n \rightarrow e^{itp}.$$
Note the limit is the c.f. of $X \equiv p$. Thus $\bar{X}_n \rightarrow_d p$, so $\bar{X}_n$ converges in probability to $p$.
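A numerical check of this characteristic-function limit ($p$ and $t$ are arbitrary assumed values): $(1 - p + p e^{it/n})^n \rightarrow e^{itp}$.

```python
import numpy as np

p, t = 0.3, 2.0
for n in [10, 100, 10_000]:
    phi_n = (1 - p + p * np.exp(1j * t / n)) ** n   # c.f. of the sample mean
    print(n, phi_n, np.exp(1j * t * p))             # converges to e^{itp}
```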
Generalization to multivariate random vectors: $X_n \rightarrow_d X$ if and only if $E[\exp\{it'X_n\}] \rightarrow E[\exp\{it'X\}]$, where $t$ is any $k$-dimensional constant vector; equivalently, $t'X_n \rightarrow_d t'X$ for any $t$. Thus, to study the weak convergence of random vectors, we can reduce to studying the weak convergence of one-dimensional linear combinations of the random vectors. This is the well-known Cramér-Wold device.

Theorem 3.5 (The Cramér-Wold device) Random vectors $X_n$ in $R^k$ satisfy $X_n \rightarrow_d X$ if and only if $t'X_n \rightarrow_d t'X$ in $R$ for all $t \in R^k$.
Properties of Weak Convergence

Theorem 3.6 (Continuous mapping theorem) Suppose $X_n \rightarrow_{a.s.} X$, or $X_n \rightarrow_p X$, or $X_n \rightarrow_d X$. Then for any continuous function $g(\cdot)$, $g(X_n)$ converges to $g(X)$ almost surely, or in probability, or in distribution, respectively.

Proof. If $X_n \rightarrow_{a.s.} X$, then $g(X_n) \rightarrow_{a.s.} g(X)$. If $X_n \rightarrow_p X$, then for any subsequence there exists a further subsequence $X_{n_k} \rightarrow_{a.s.} X$; thus $g(X_{n_k}) \rightarrow_{a.s.} g(X)$, and $g(X_n) \rightarrow_p g(X)$ from (H) in Theorem 3.1. To prove that $g(X_n) \rightarrow_d g(X)$ when $X_n \rightarrow_d X$, use (b) of Theorem 3.3.

One remark: Theorem 3.6 concludes that $g(X_n) \rightarrow_d g(X)$ if $X_n \rightarrow_d X$ and $g$ is continuous. In fact, this result still holds if $P(X \in C(g)) = 1$, where $C(g)$ contains all the continuity points of $g$. That is, if $g$'s discontinuity points receive zero probability under the law of $X$, the continuous mapping theorem holds.
Theorem 3.7 (Slutsky's theorem) Suppose $X_n \rightarrow_d X$, $Y_n \rightarrow_p y$ and $Z_n \rightarrow_p z$ for some constants $y$ and $z$. Then $Z_n X_n + Y_n \rightarrow_d zX + y$.
Proof. First we show that $X_n + Y_n \rightarrow_d X + y$. For any $\epsilon > 0$,
$$P(X_n + Y_n \le x) \le P(X_n + Y_n \le x, |Y_n - y| \le \epsilon) + P(|Y_n - y| > \epsilon) \le P(X_n \le x - y + \epsilon) + P(|Y_n - y| > \epsilon),$$
so $\limsup_n F_{X_n + Y_n}(x) \le \limsup_n F_{X_n}(x - y + \epsilon) \le F_X(x - y + \epsilon)$.

On the other hand,
$$P(X_n + Y_n > x) \le P(X_n + Y_n > x, |Y_n - y| \le \epsilon) + P(|Y_n - y| > \epsilon) \le P(X_n > x - y - \epsilon) + P(|Y_n - y| > \epsilon),$$
so
$$\limsup_n (1 - F_{X_n + Y_n}(x)) \le \limsup_n P(X_n > x - y - \epsilon) \le \limsup_n P(X_n \ge x - y - 2\epsilon) \le 1 - F_X(x - y - 2\epsilon).$$
Hence
$$F_X(x - y - 2\epsilon) \le \liminf_n F_{X_n + Y_n}(x) \le \limsup_n F_{X_n + Y_n}(x) \le F_X(x - y + \epsilon).$$
Letting $\epsilon \rightarrow 0$,
$$F_{X + y}(x-) \le \liminf_n F_{X_n + Y_n}(x) \le \limsup_n F_{X_n + Y_n}(x) \le F_{X + y}(x),$$
so $F_{X_n + Y_n}(x) \rightarrow F_{X + y}(x)$ at every continuity point $x$ of $F_{X + y}$.

To complete the proof, note
$$P(|(Z_n - z) X_n| > \epsilon) \le P(|Z_n - z| > \epsilon^2) + P(|Z_n - z| \le \epsilon^2, |X_n| > 1/\epsilon),$$
so
$$\limsup_n P(|(Z_n - z) X_n| > \epsilon) \le \limsup_n P(|Z_n - z| > \epsilon^2) + \limsup_n P(|X_n| \ge 1/(2\epsilon)) \le P(|X| \ge 1/(2\epsilon)),$$
which can be made arbitrarily small; thus $(Z_n - z) X_n \rightarrow_p 0$. Clearly $z X_n \rightarrow_d z X$, so $Z_n X_n = (Z_n - z) X_n + z X_n \rightarrow_d z X$ from the proof in the first half. Again using the first half's proof, $Z_n X_n + Y_n \rightarrow_d z X + y$.
Examples. Suppose $X_n \rightarrow_d N(0, 1)$. Then by the continuous mapping theorem, $X_n^2 \rightarrow_d \chi_1^2$.

This example shows that $g$ can be discontinuous in Theorem 3.6: let $X_n \rightarrow_d X$ with $X \sim N(0,1)$ and $g(x) = 1/x$. Although $g(x)$ is discontinuous at the origin, we can still conclude that $1/X_n \rightarrow_d 1/X$, the reciprocal of a normal random variable, because $P(X = 0) = 0$. However, the counter-example above, where $X_n$ is degenerate at $1/n$ and $g(x) = I(x > 0)$, shows that Theorem 3.6 may fail if $P(X \in C(g)) < 1$.

The condition $Y_n \rightarrow_p y$, where $y$ is a constant, is necessary. For example, let $X_n = X \sim$ Uniform(0,1) and let $Y_n = -X$, so that $Y_n \rightarrow_d -\tilde{X}$, where $\tilde{X}$ is an independent random variable with the same distribution as $X$. However, $X_n + Y_n = 0$ does not converge in distribution to $X - \tilde{X}$.

Let $X_1, X_2, \ldots$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2 > 0$. Then
$$\sqrt{n}(\bar{X}_n - \mu) \rightarrow_d N(0, \sigma^2), \quad s_n^2 = \frac{1}{n - 1} \sum_{i=1}^n (X_i - \bar{X}_n)^2 \rightarrow_{a.s.} \sigma^2,$$
so
$$\frac{\sqrt{n}(\bar{X}_n - \mu)}{s_n} \rightarrow_d \frac{1}{\sigma} N(0, \sigma^2) = N(0, 1).$$
In large samples, the $t_{n-1}$ distribution can be approximated by a standard normal distribution.
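A Monte Carlo sketch of this Slutsky application (the normal population and all constants are assumed choices; scipy is used only for reference quantiles): the simulated t-statistic's upper quantile approaches the standard normal's as $n$ grows.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mu, sigma = 5.0, 2.0
for n in [5, 30, 200]:
    X = rng.normal(mu, sigma, size=(20_000, n))
    T = np.sqrt(n) * (X.mean(axis=1) - mu) / X.std(axis=1, ddof=1)
    # simulated 97.5% quantile vs. the t_{n-1} and N(0,1) quantiles
    print(n, np.quantile(T, 0.975), stats.t.ppf(0.975, n - 1), stats.norm.ppf(0.975))
```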
Representation of Weak Convergence

Theorem 3.8 (Skorohod's Representation Theorem) Let $\{X_n\}$ and $X$ be random variables in a probability space $(\Omega, \mathcal{A}, P)$ with $X_n \rightarrow_d X$. Then there exists another probability space $(\tilde{\Omega}, \tilde{\mathcal{A}}, \tilde{P})$ and random variables $\tilde{X}_n$ and $\tilde{X}$ defined on it such that $\tilde{X}_n$ and $X_n$ have the same distributions, $\tilde{X}$ and $X$ have the same distributions, and moreover, $\tilde{X}_n \rightarrow_{a.s.} \tilde{X}$.

Quantile function: $F^{-1}(p) = \inf\{x : F(x) \ge p\}$.

Proposition 3.1 (a) $F^{-1}$ is left-continuous. (b) If $X$ has continuous distribution function $F$, then $F(X) \sim$ Uniform(0,1). (c) Let $\xi \sim$ Uniform(0,1) and let $X = F^{-1}(\xi)$. Then for all $x$, $\{X \le x\} = \{\xi \le F(x)\}$; thus $X$ has distribution function $F$.
Proof. (a) Clearly, $F^{-1}$ is nondecreasing. Suppose $p_n$ increases to $p$; then $F^{-1}(p_n)$ increases to some $y \le F^{-1}(p)$. Since $F(y) \ge F(F^{-1}(p_n)) \ge p_n$, we get $F(y) \ge p$, so $F^{-1}(p) \le y$; hence $y = F^{-1}(p)$.

(b) $\{X \le x\} \subset \{F(X) \le F(x)\}$ gives $F(x) \le P(F(X) \le F(x))$. On the other hand, $\{F(X) \le F(x) - \epsilon\} \subset \{X \le x\}$ gives $P(F(X) \le F(x) - \epsilon) \le F(x)$, and letting $\epsilon \downarrow 0$, $P(F(X) < F(x)) \le F(x)$. Then if $F$ is continuous, $P(F(X) \le F(x)) = F(x)$; since a continuous $F$ takes every value in $(0, 1)$, $F(X) \sim$ Uniform(0,1).

(c) $P(X \le x) = P(F^{-1}(\xi) \le x) = P(\xi \le F(x)) = F(x)$.
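A sketch of part (c), inverse-c.d.f. sampling (the Exponential(1) target, with $F^{-1}(p) = -\log(1 - p)$, is an assumed concrete choice): $X = F^{-1}(\xi)$ with $\xi \sim$ Uniform(0,1) has distribution $F$.

```python
import numpy as np

rng = np.random.default_rng(3)
xi = rng.uniform(size=100_000)
X = -np.log(1.0 - xi)                      # F^{-1}(xi) for Exponential(1)
x = 1.0
print(np.mean(X <= x), 1 - np.exp(-x))     # empirical c.d.f. vs. F(x)
```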
Proof (of Theorem 3.8). Let $(\tilde{\Omega}, \tilde{\mathcal{A}}, \tilde{P})$ be $([0,1], \mathcal{B} \cap [0,1], \lambda)$, with $\lambda$ the Lebesgue measure. Define $\tilde{X}_n = F_n^{-1}(\xi)$ and $\tilde{X} = F^{-1}(\xi)$, where $\xi \sim$ Uniform(0,1). By Proposition 3.1, $\tilde{X}_n$ has distribution $F_n$, the same as $X_n$. Take any $t \in (0,1)$ such that there is at most one value $x$ with $F(x) = t$ (it is easy to see such $t$ are the continuity points of $F^{-1}$). For any $z < x$, $F(z) < t$; when $n$ is large, $F_n(z) < t$, so $F_n^{-1}(t) \ge z$. Thus $\liminf_n F_n^{-1}(t) \ge z$, and letting $z \uparrow x$, $\liminf_n F_n^{-1}(t) \ge x = F^{-1}(t)$. From $F(x + \epsilon) > t$, eventually $F_n(x + \epsilon) > t$, so $F_n^{-1}(t) \le x + \epsilon$ and $\limsup_n F_n^{-1}(t) \le x + \epsilon$; hence $\limsup_n F_n^{-1}(t) \le x$. Thus $F_n^{-1}(t) \rightarrow F^{-1}(t)$ for almost every $t \in (0, 1)$, i.e., $\tilde{X}_n \rightarrow_{a.s.} \tilde{X}$.
Usefulness of the representation theorem: for example, if $X_n \rightarrow_d X$ and one wishes to show that some function of $X_n$, say $g(X_n)$, converges in distribution to $g(X)$; see the diagram in Figure 2.
Alternative Proof of Slutsky's Theorem. First, show $(X_n, Y_n) \rightarrow_d (X, y)$:
$$|\phi_{(X_n, Y_n)}(t_1, t_2) - \phi_{(X, y)}(t_1, t_2)| = |E[e^{it_1 X_n} e^{it_2 Y_n}] - E[e^{it_1 X} e^{it_2 y}]|$$
$$\le |E[e^{it_1 X_n}(e^{it_2 Y_n} - e^{it_2 y})]| + |e^{it_2 y}| \, |E[e^{it_1 X_n}] - E[e^{it_1 X}]|$$
$$\le E[|e^{it_2 Y_n} - e^{it_2 y}|] + |E[e^{it_1 X_n}] - E[e^{it_1 X}]| \rightarrow 0.$$
Similarly, $(Z_n, X_n) \rightarrow_d (z, X)$. Since $g(z, x) = zx$ is continuous, $Z_n X_n \rightarrow_d zX$. Since $(Z_n X_n, Y_n) \rightarrow_d (zX, y)$ and $g(x, y) = x + y$ is continuous, $Z_n X_n + Y_n \rightarrow_d zX + y$.
Summation of Independent R.V.s

Some preliminary lemmas. Proposition 3.2 (Borel-Cantelli Lemma) For any events $A_n$, $\sum_{n=1}^\infty P(A_n) < \infty$ implies $P(A_n, \text{i.o.}) = P(\{A_n\} \text{ occurs infinitely often}) = 0$; or equivalently, $P(\cap_{n=1}^\infty \cup_{m \ge n} A_m) = 0$.

Proof. $P(A_n, \text{i.o.}) \le P(\cup_{m \ge n} A_m) \le \sum_{m \ge n} P(A_m) \rightarrow 0$ as $n \rightarrow \infty$.

One consequence of the first Borel-Cantelli lemma: if for a sequence of random variables $\{Z_n\}$ and every $\epsilon > 0$, $\sum_n P(|Z_n| > \epsilon) < \infty$, then $|Z_n| > \epsilon$ occurs only finitely many times, so $Z_n \rightarrow_{a.s.} 0$.

Proposition 3.3 (Second Borel-Cantelli Lemma) For a sequence of independent events $A_1, A_2, \ldots$, $\sum_{n=1}^\infty P(A_n) = \infty$ implies $P(A_n, \text{i.o.}) = 1$.

Proof. Consider the complement of $\{A_n, \text{i.o.}\}$:
$$P(\cup_{n=1}^\infty \cap_{m \ge n} A_m^c) = \lim_n P(\cap_{m \ge n} A_m^c) = \lim_n \prod_{m \ge n} (1 - P(A_m)) \le \limsup_n \exp\left\{-\sum_{m \ge n} P(A_m)\right\} = 0.$$
Equivalence lemma. Proposition 3.4 Let $X_1, \ldots, X_n$ be i.i.d. with finite mean. Define $Y_n = X_n I(|X_n| \le n)$. Then $\sum_{n=1}^\infty P(X_n \ne Y_n) < \infty$.

Proof. Since $E[|X_1|] < \infty$,
$$\sum_{n=1}^\infty P(X_n \ne Y_n) = \sum_{n=1}^\infty P(|X| > n) \le \sum_{n=1}^\infty n P(n < |X| \le n + 1) \le E[|X|] < \infty.$$
From the Borel-Cantelli Lemma, $P(X_n \ne Y_n, \text{i.o.}) = 0$: for almost every $\omega \in \Omega$, $X_n(\omega) = Y_n(\omega)$ when $n$ is large enough.
Weak Law of Large Numbers

Theorem 3.9 (Weak Law of Large Numbers) If $X, X_1, \ldots, X_n$ are i.i.d. with mean $\mu$ (so $E[|X|] < \infty$ and $\mu = E[X]$), then $\bar{X}_n \rightarrow_p \mu$.

Proof. Define $Y_n = X_n I(|X_n| \le n)$ and let $\mu_n = \sum_{k=1}^n E[Y_k]/n$. By Chebyshev's inequality,
$$P(|\bar{Y}_n - \mu_n| \ge \epsilon) \le \frac{Var(\bar{Y}_n)}{\epsilon^2} \le \frac{\sum_{k=1}^n Var(X_k I(|X_k| \le k))}{n^2 \epsilon^2}.$$
Now
$$Var(X_k I(|X_k| \le k)) \le E[X_k^2 I(|X_k| \le k)] = E[X_k^2 I(|X_k| \le k, |X_k| \ge \sqrt{k}\epsilon^2)] + E[X_k^2 I(|X_k| \le k, |X_k| < \sqrt{k}\epsilon^2)]$$
$$\le k E[|X_k| I(|X_k| \ge \sqrt{k}\epsilon^2)] + k\epsilon^4,$$
so
$$P(|\bar{Y}_n - \mu_n| \ge \epsilon) \le \frac{\sum_{k=1}^n E[|X| I(|X| \ge \sqrt{k}\epsilon^2)]}{n\epsilon^2} + \epsilon^2 \frac{n(n+1)}{2n^2}.$$
Since $E[|X| I(|X| \ge \sqrt{k}\epsilon^2)] \rightarrow 0$ as $k \rightarrow \infty$, $\limsup_n P(|\bar{Y}_n - \mu_n| \ge \epsilon) \le \epsilon^2$, and $\epsilon$ is arbitrary, so $\bar{Y}_n - \mu_n \rightarrow_p 0$. Since $\mu_n \rightarrow \mu$, $\bar{Y}_n \rightarrow_p \mu$. From Proposition 3.4, $X_n = Y_n$ for all large $n$ almost surely, so $\bar{X}_n - \bar{Y}_n \rightarrow_{a.s.} 0$; hence $\bar{X}_n \rightarrow_p \mu$.
Strong Law of Large Numbers

Theorem 3.10 (Strong Law of Large Numbers) If $X_1, \ldots, X_n$ are i.i.d. with mean $\mu$, then $\bar{X}_n \rightarrow_{a.s.} \mu$.

Proof. Without loss of generality, assume $X_n \ge 0$: if the result holds for nonnegative variables, it holds for general $X_n$ by writing $X_n = X_n^+ - X_n^-$. Similar to Theorem 3.9, it is sufficient to show $\bar{Y}_n \rightarrow_{a.s.} \mu$, where $Y_n = X_n I(X_n \le n)$. Note $E[Y_n] = E[X_1 I(X_1 \le n)] \rightarrow \mu$, so $\sum_{k=1}^n E[Y_k]/n \rightarrow \mu$. Thus, if we denote $S_n = \sum_{k=1}^n (Y_k - E[Y_k])$ and can show $S_n/n \rightarrow_{a.s.} 0$, then the result holds.
By independence,
$$Var(S_n) = \sum_{k=1}^n Var(Y_k) \le \sum_{k=1}^n E[Y_k^2] \le n E[X_1^2 I(X_1 \le n)].$$
By Chebyshev's inequality,
$$P\left(\left|\frac{S_n}{n}\right| > \epsilon\right) \le \frac{1}{n^2 \epsilon^2} Var(S_n) \le \frac{E[X_1^2 I(X_1 \le n)]}{n\epsilon^2}.$$
For any $\alpha > 1$, let $u_n = [\alpha^n]$. Then
$$\sum_{n=1}^\infty P\left(\left|\frac{S_{u_n}}{u_n}\right| > \epsilon\right) \le \frac{1}{\epsilon^2} \sum_{n=1}^\infty \frac{E[X_1^2 I(X_1 \le u_n)]}{u_n} = \frac{1}{\epsilon^2} E\left[X_1^2 \sum_{n : u_n \ge X_1} \frac{1}{u_n}\right].$$
Since for any $x > 0$, $\sum_{n : u_n \ge x} u_n^{-1} \le 2 \sum_{n \ge \log x/\log \alpha} \alpha^{-n} \le K x^{-1}$ for some constant $K$,
$$\sum_{n=1}^\infty P\left(\left|\frac{S_{u_n}}{u_n}\right| > \epsilon\right) \le \frac{K}{\epsilon^2} E[X_1] < \infty,$$
so $S_{u_n}/u_n \rightarrow_{a.s.} 0$; equivalently, with $T_k = Y_1 + \ldots + Y_k$, $T_{u_n}/u_n \rightarrow_{a.s.} \mu$.

For any $k$, we can find $u_n < k \le u_{n+1}$. Since the $Y_i$ are nonnegative, $T_k$ is nondecreasing in $k$, so
$$\frac{T_{u_n}}{u_n} \frac{u_n}{u_{n+1}} \le \frac{T_k}{k} \le \frac{T_{u_{n+1}}}{u_{n+1}} \frac{u_{n+1}}{u_n}.$$
Since $u_{n+1}/u_n \rightarrow \alpha$,
$$\mu/\alpha \le \liminf_k \frac{T_k}{k} \le \limsup_k \frac{T_k}{k} \le \mu\alpha.$$
Since $\alpha$ is an arbitrary number larger than 1, letting $\alpha \downarrow 1$ we obtain $\lim_k T_k/k = \mu$ almost surely, i.e., $\bar{Y}_n \rightarrow_{a.s.} \mu$.
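A sample-path sketch of the strong law (the Exponential(1) population, with $\mu = 1$, is an assumed choice): along each realized path the running average settles near $\mu$.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.exponential(1.0, size=(3, 100_000))            # three sample paths
running_mean = np.cumsum(X, axis=1) / np.arange(1, 100_001)
print(running_mean[:, [99, 9_999, 99_999]])            # at n = 10^2, 10^4, 10^5
```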
Central Limit Theorems

Preliminary result on characteristic functions. Proposition 3.5 Suppose $E[|X|^m] < \infty$ for some integer $m \ge 0$. Then
$$\left|\phi_X(t) - \sum_{k=0}^m \frac{(it)^k E[X^k]}{k!}\right| \Big/ |t|^m \rightarrow 0, \quad \text{as } t \rightarrow 0.$$

Proof. By a Taylor expansion,
$$e^{itx} = \sum_{k=0}^m \frac{(itx)^k}{k!} + \frac{(itx)^m}{m!}\left[e^{i\theta tx} - 1\right], \quad \text{for some } \theta \in [0, 1],$$
so
$$\left|\phi_X(t) - \sum_{k=0}^m \frac{(it)^k E[X^k]}{k!}\right| \Big/ |t|^m \le E[|X|^m |e^{i\theta t X} - 1|]/m! \rightarrow 0$$
as $t \rightarrow 0$, by the dominated convergence theorem.
Simple versions of the CLT. Theorem 3.11 (Central Limit Theorem) If $X_1, \ldots, X_n$ are i.i.d. with mean $\mu$ and variance $\sigma^2$, then $\sqrt{n}(\bar{X}_n - \mu) \rightarrow_d N(0, \sigma^2)$.

Proof. Denote $Y_n = \sqrt{n}(\bar{X}_n - \mu)$. Then
$$\phi_{Y_n}(t) = \left\{\phi_{X_1 - \mu}(t/\sqrt{n})\right\}^n, \quad \phi_{X_1 - \mu}(t/\sqrt{n}) = 1 - \frac{\sigma^2 t^2}{2n} + o(1/n),$$
so $\phi_{Y_n}(t) \rightarrow \exp\{-\sigma^2 t^2/2\}$.
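A Monte Carlo sketch of Theorem 3.11 (the Uniform(0,1) population is an assumed choice, so $\mu = 1/2$ and $\sigma^2 = 1/12$): $\sqrt{n}(\bar{X}_n - \mu)$ behaves like $N(0, 1/12)$.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000
Y = np.sqrt(n) * (rng.uniform(size=(10_000, n)).mean(axis=1) - 0.5)
print(Y.var(), 1 / 12)                      # variance close to sigma^2
print(np.quantile(Y, [0.1, 0.5, 0.9]))      # roughly -0.37, 0.00, 0.37 for N(0, 1/12)
```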
Theorem 3.12 (Multivariate Central Limit Theorem) If $X_1, \ldots, X_n$ are i.i.d. random vectors in $R^k$ with mean $\mu$ and covariance $\Sigma = E[(X - \mu)(X - \mu)']$, then $\sqrt{n}(\bar{X}_n - \mu) \rightarrow_d N(0, \Sigma)$. Proof: use the Cramér-Wold device.

Liapunov CLT. Theorem 3.13 (Liapunov Central Limit Theorem) Let $X_{n1}, \ldots, X_{nn}$ be independent random variables with $\mu_{ni} = E[X_{ni}]$ and $\sigma_{ni}^2 = Var(X_{ni})$. Let $\mu_n = \sum_{i=1}^n \mu_{ni}$ and $\sigma_n^2 = \sum_{i=1}^n \sigma_{ni}^2$. If
$$\frac{\sum_{i=1}^n E[|X_{ni} - \mu_{ni}|^3]}{\sigma_n^3} \rightarrow 0,$$
then $\sum_{i=1}^n (X_{ni} - \mu_{ni})/\sigma_n \rightarrow_d N(0, 1)$.
Lindeberg-Feller CLT. Theorem 3.14 (Lindeberg-Feller Central Limit Theorem) Let $X_{n1}, \ldots, X_{nn}$ be independent random variables with $\mu_{ni} = E[X_{ni}]$ and $\sigma_{ni}^2 = Var(X_{ni})$. Let $\sigma_n^2 = \sum_{i=1}^n \sigma_{ni}^2$. Then both
$$\sum_{i=1}^n (X_{ni} - \mu_{ni})/\sigma_n \rightarrow_d N(0, 1) \quad \text{and} \quad \max\{\sigma_{ni}^2/\sigma_n^2 : 1 \le i \le n\} \rightarrow 0$$
hold if and only if the Lindeberg condition
$$\frac{1}{\sigma_n^2} \sum_{i=1}^n E[|X_{ni} - \mu_{ni}|^2 I(|X_{ni} - \mu_{ni}| \ge \epsilon \sigma_n)] \rightarrow 0, \quad \text{for all } \epsilon > 0,$$
holds.

Proof of the Liapunov CLT using Theorem 3.14:
$$\frac{1}{\sigma_n^2} \sum_{k=1}^n E[|X_{nk} - \mu_{nk}|^2 I(|X_{nk} - \mu_{nk}| > \epsilon \sigma_n)] \le \frac{1}{\epsilon \sigma_n^3} \sum_{k=1}^n E[|X_{nk} - \mu_{nk}|^3] \rightarrow 0.$$
Examples. The first example is a simple linear regression $X_j = \alpha + \beta z_j + \epsilon_j$ for $j = 1, 2, \ldots$, where the $z_j$ are known numbers, not all equal, and the $\epsilon_j$ are i.i.d. with mean zero and variance $\sigma^2$. The least squares estimator is
$$\hat{\beta}_n = \sum_{j=1}^n X_j (z_j - \bar{z}_n) \Big/ \sum_{j=1}^n (z_j - \bar{z}_n)^2 = \beta + \sum_{j=1}^n \epsilon_j (z_j - \bar{z}_n) \Big/ \sum_{j=1}^n (z_j - \bar{z}_n)^2.$$
Assume $\max_{j \le n} (z_j - \bar{z}_n)^2 / \sum_{j=1}^n (z_j - \bar{z}_n)^2 \rightarrow 0$. Then by the Lindeberg-Feller CLT,
$$\sqrt{\sum_{j=1}^n (z_j - \bar{z}_n)^2} \, (\hat{\beta}_n - \beta) \rightarrow_d N(0, \sigma^2).$$
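A simulation sketch of the regression example (the design $z_j$, the centered-exponential error law, and all constants are my own assumed choices): $\sqrt{\sum_j (z_j - \bar{z}_n)^2}(\hat{\beta}_n - \beta)$ should have variance near $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(6)
n, alpha, beta, sigma = 500, 1.0, 2.0, 1.5
z = np.linspace(0.0, 1.0, n)
zc = z - z.mean()
vals = []
for _ in range(5_000):
    eps = rng.exponential(sigma, n) - sigma          # mean 0, variance sigma^2
    X = alpha + beta * z + eps
    betahat = np.sum(X * zc) / np.sum(zc ** 2)
    vals.append(np.sqrt(np.sum(zc ** 2)) * (betahat - beta))
print(np.var(vals), sigma ** 2)                      # should be close
```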
The second example is taken from the randomization test for paired comparisons. Let $(X_j, Y_j)$ denote the values of the $j$th pair, with $X_j$ being the result of the treatment, and let $Z_j = X_j - Y_j$. Conditional on $|Z_j| = z_j$, $Z_j = |Z_j|\,\text{sgn}(Z_j)$ takes the values $\pm z_j$ independently with probability 1/2 each when treatment and control have no difference. Conditional on $z_1, z_2, \ldots$, the randomization t-test is based on the t-statistic $\sqrt{n}\,\bar{Z}_n/s_z$, where $s_z^2 = (1/n) \sum_{j=1}^n (Z_j - \bar{Z}_n)^2$. When $\max_{j \le n} z_j^2 / \sum_{j=1}^n z_j^2 \rightarrow 0$, this statistic has an asymptotic normal distribution $N(0, 1)$.
Delta Method

Theorem 3.15 (Delta method) For random vectors $X$ and $X_n$ in $R^k$, if there exist constants $a_n$ and $\mu$ such that $a_n(X_n - \mu) \rightarrow_d X$ and $a_n \rightarrow \infty$, then for any function $g: R^k \rightarrow R^l$ such that $g$ has a derivative at $\mu$, denoted by $\nabla g(\mu)$,
$$a_n(g(X_n) - g(\mu)) \rightarrow_d \nabla g(\mu) X.$$

Proof. By the Skorohod representation, we can construct $\tilde{X}_n$ and $\tilde{X}$ such that $\tilde{X}_n =_d X_n$ and $\tilde{X} =_d X$ ($=_d$ means equality in distribution) and $a_n(\tilde{X}_n - \mu) \rightarrow_{a.s.} \tilde{X}$. Then
$$a_n(g(\tilde{X}_n) - g(\mu)) \rightarrow_{a.s.} \nabla g(\mu)\tilde{X}, \quad \text{so} \quad a_n(g(X_n) - g(\mu)) \rightarrow_d \nabla g(\mu) X.$$
Examples. Let $X_1, X_2, \ldots$ be i.i.d. with finite fourth moment and let $s_n^2 = (1/n) \sum_{i=1}^n (X_i - \bar{X}_n)^2$. Denote by $m_k$ the $k$th moment of $X_1$, $k \le 4$. Note that $s_n^2 = (1/n) \sum_{i=1}^n X_i^2 - (\sum_{i=1}^n X_i/n)^2$ and
$$\sqrt{n}\left[\begin{pmatrix} \bar{X}_n \\ (1/n)\sum_{i=1}^n X_i^2 \end{pmatrix} - \begin{pmatrix} m_1 \\ m_2 \end{pmatrix}\right] \rightarrow_d N\left(0, \begin{pmatrix} m_2 - m_1^2 & m_3 - m_1 m_2 \\ m_3 - m_1 m_2 & m_4 - m_2^2 \end{pmatrix}\right).$$
The Delta method with $g(x, y) = y - x^2$ gives
$$\sqrt{n}(s_n^2 - Var(X_1)) \rightarrow_d N(0, \mu_4 - \mu_2^2),$$
where $\mu_k = E[(X_1 - m_1)^k]$ is the $k$th central moment (so $\mu_2 = m_2 - m_1^2$); in particular, when $m_1 = 0$ the limit variance is $m_4 - m_2^2$.
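A sketch checking this Delta-method limit (centered Exponential(1) data are an assumed choice, so $m_1 = 0$, $\mu_2 = 1$, $\mu_4 = 9$, and the limit variance is $\mu_4 - \mu_2^2 = 8$):

```python
import numpy as np

rng = np.random.default_rng(7)
reps, n = 5_000, 1_000
X = rng.exponential(1.0, size=(reps, n)) - 1.0   # centered, so m1 = 0
s2 = X.var(axis=1)                               # (1/n) sum (X_i - Xbar)^2
stat = np.sqrt(n) * (s2 - 1.0)                   # Var(X_1) = 1
print(stat.var(), 9.0 - 1.0)                     # limit variance mu4 - mu2^2 = 8
```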
Let $(X_1, Y_1), (X_2, Y_2), \ldots$ be i.i.d. bivariate samples with finite fourth moments. One estimate of the correlation between $X$ and $Y$ is
$$\hat{\rho}_n = \frac{s_{xy}}{\sqrt{s_x^2 s_y^2}},$$
where $s_{xy} = (1/n) \sum_{i=1}^n (X_i - \bar{X}_n)(Y_i - \bar{Y}_n)$, $s_x^2 = (1/n) \sum_{i=1}^n (X_i - \bar{X}_n)^2$ and $s_y^2 = (1/n) \sum_{i=1}^n (Y_i - \bar{Y}_n)^2$. To derive the large sample distribution of $\hat{\rho}_n$, first obtain the large sample distribution of $(s_{xy}, s_x^2, s_y^2)$ using the Delta method, then further apply the Delta method with $g(x, y, z) = x/\sqrt{yz}$.
The next example is the Pearson chi-square statistic. Suppose each subject falls into one of $K$ categories with probabilities $p_1, \ldots, p_K$, where $p_1 + \ldots + p_K = 1$, and let $n_k$ be the number of subjects in category $k$ among $n$ independent subjects. The Pearson statistic is defined as
$$\chi^2 = n \sum_{k=1}^K \left(\frac{n_k}{n} - p_k\right)^2 \Big/ p_k,$$
which can be read as $\sum (\text{observed count} - \text{expected count})^2/\text{expected count}$. Note that $\sqrt{n}(n_1/n - p_1, \ldots, n_K/n - p_K)$ has an asymptotic multivariate normal distribution. Then we can apply the continuous mapping theorem (or the Delta method) with $g(x_1, \ldots, x_K) = \sum_{k=1}^K x_k^2/p_k$.
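A simulation sketch of the Pearson statistic (the cell probabilities and sample sizes are assumed): under the null it behaves like $\chi^2_{K-1}$, here with $K - 1 = 2$ degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(8)
p = np.array([0.2, 0.3, 0.5])
n = 1_000
counts = rng.multinomial(n, p, size=20_000)
chi2 = (n * (counts / n - p) ** 2 / p).sum(axis=1)
print(chi2.mean(), chi2.var())     # chi^2_2 has mean 2 and variance 4
```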
U-statistics

Definition 3.6 A U-statistic associated with a kernel $h(x_1, \ldots, x_r)$ is defined as
$$U_n = \frac{1}{r!\binom{n}{r}} \sum_\beta h(X_{\beta_1}, \ldots, X_{\beta_r}),$$
where the sum is taken over all ordered $r$-tuples $\beta$ of $r$ different integers chosen from $\{1, \ldots, n\}$.
Examples. One simple example is $h(x, y) = xy$; then $U_n = (n(n-1))^{-1} \sum_{i \ne j} X_i X_j$. In general, $U_n = E[h(X_1, \ldots, X_r) \mid X_{(1)}, \ldots, X_{(n)}]$, and $U_n$ is a summation of non-independent random variables. If we define $\tilde{h}(x_1, \ldots, x_r)$ as the symmetrization $(r!)^{-1} \sum h(\tilde{x}_1, \ldots, \tilde{x}_r)$, the sum over all permutations $(\tilde{x}_1, \ldots, \tilde{x}_r)$ of $(x_1, \ldots, x_r)$, then $\tilde{h}$ is permutation-symmetric and
$$U_n = \frac{1}{\binom{n}{r}} \sum_{\beta_1 < \ldots < \beta_r} \tilde{h}(X_{\beta_1}, \ldots, X_{\beta_r}).$$
$\tilde{h}$ is called the kernel of the U-statistic $U_n$.
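A sketch computing this order-2 U-statistic (the normal population is an assumed choice); it uses $\sum_{i \ne j} X_i X_j = (\sum_i X_i)^2 - \sum_i X_i^2$ and estimates $E[X_1 X_2] = \mu^2$.

```python
import numpy as np

rng = np.random.default_rng(9)
X = rng.normal(2.0, 1.0, size=1_000)
n = len(X)
U_n = (X.sum() ** 2 - (X ** 2).sum()) / (n * (n - 1))
print(U_n, 2.0 ** 2)    # close to mu^2 = 4
```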
CLT for U-statistics. Theorem 3.16 Let $\mu = E[h(X_1, \ldots, X_r)]$. If $E[h(X_1, \ldots, X_r)^2] < \infty$, then
$$\sqrt{n}(U_n - \mu) - \sqrt{n} \sum_{i=1}^n E[U_n - \mu \mid X_i] \rightarrow_p 0.$$
Consequently, $\sqrt{n}(U_n - \mu)$ is asymptotically normal with mean zero and variance $r^2 \sigma^2$, where, with $X_1, \ldots, X_r, \tilde{X}_1, \ldots, \tilde{X}_r$ i.i.d. variables,
$$\sigma^2 = Cov(h(X_1, X_2, \ldots, X_r), h(X_1, \tilde{X}_2, \ldots, \tilde{X}_r)).$$
Some preparation. Linear space of random variables: let $\mathcal{S}$ be a linear space of random variables with finite second moments that contains the constants; i.e., $1 \in \mathcal{S}$, and for any $X, Y \in \mathcal{S}$, $aX + bY \in \mathcal{S}$, where $a$ and $b$ are constants. Projection: for a random variable $T$, a random variable $S$ is called the projection of $T$ on $\mathcal{S}$ if $S \in \mathcal{S}$ and $E[(T - S)^2]$ minimizes $E[(T - \tilde{S})^2]$ over $\tilde{S} \in \mathcal{S}$.

Proposition 3.7 Let $\mathcal{S}$ be a linear space of random variables with finite second moments. Then $S$ is the projection of $T$ on $\mathcal{S}$ if and only if $S \in \mathcal{S}$ and for any $\tilde{S} \in \mathcal{S}$, $E[(T - S)\tilde{S}] = 0$. Every two projections of $T$ onto $\mathcal{S}$ are almost surely equal. If the linear space $\mathcal{S}$ contains the constant variables, then $E[T] = E[S]$ and $Cov(T - S, \tilde{S}) = 0$ for every $\tilde{S} \in \mathcal{S}$.

Proof. For any $S$ and $\tilde{S}$ in $\mathcal{S}$,
$$E[(T - \tilde{S})^2] = E[(T - S)^2] + 2E[(T - S)(S - \tilde{S})] + E[(S - \tilde{S})^2].$$
If $S$ satisfies $E[(T - S)\tilde{S}'] = 0$ for every $\tilde{S}' \in \mathcal{S}$, then the cross term vanishes (note $S - \tilde{S} \in \mathcal{S}$), so $E[(T - \tilde{S})^2] \ge E[(T - S)^2]$: $S$ is the projection of $T$ on $\mathcal{S}$. Conversely, if $S$ is the projection, then for any constant $\alpha$, $E[(T - S - \alpha\tilde{S})^2]$ is minimized at $\alpha = 0$; calculating the derivative at $\alpha = 0$ gives $E[(T - S)\tilde{S}] = 0$. If $T$ has two projections $S_1$ and $S_2$, the displayed identity with $S = S_1$ and $\tilde{S} = S_2$ gives $E[(S_1 - S_2)^2] = 0$; thus $S_1 = S_2$ a.s. If $\mathcal{S}$ contains the constant variables, choose $\tilde{S} = 1$: $0 = E[(T - S)] = E[T] - E[S]$. Clearly, $Cov(T - S, \tilde{S}) = E[(T - S)\tilde{S}] - E[T - S]E[\tilde{S}] = 0$.
Equivalence with the projection. Proposition 3.8 Let $\mathcal{S}_n$ be linear spaces of random variables with finite second moments that contain the constants. Let $T_n$ be random variables with projections $S_n$ onto $\mathcal{S}_n$. If $Var(T_n)/Var(S_n) \rightarrow 1$, then
$$Z_n \equiv \frac{T_n - E[T_n]}{\sqrt{Var(T_n)}} - \frac{S_n - E[S_n]}{\sqrt{Var(S_n)}} \rightarrow_p 0.$$

Proof. $E[Z_n] = 0$. Note that
$$Var(Z_n) = 2 - 2\frac{Cov(T_n, S_n)}{\sqrt{Var(T_n)Var(S_n)}}.$$
Since $S_n$ is the projection of $T_n$, $Cov(T_n, S_n) = Cov(T_n - S_n, S_n) + Var(S_n) = Var(S_n)$. We obtain
$$Var(Z_n) = 2\left(1 - \sqrt{\frac{Var(S_n)}{Var(T_n)}}\right) \rightarrow 0.$$
By Chebyshev's inequality, we conclude that $Z_n \rightarrow_p 0$.

Conclusion: if $S_n$ is a summation of i.i.d. random variables such that $(S_n - E[S_n])/\sqrt{Var(S_n)} \rightarrow_d N(0, 1)$, then so is $(T_n - E[T_n])/\sqrt{Var(T_n)}$. The limit distribution of U-statistics is derived using this lemma.
Proof of the CLT for U-statistics. Proof. Let $\tilde{X}_1, \ldots, \tilde{X}_r$ be random variables with the same distribution as $X_1$ and independent of $X_1, \ldots, X_n$. Denote $\tilde{U}_n = \sum_{i=1}^n E[U_n - \mu \mid X_i]$. We show that $\tilde{U}_n$ is the projection of $U_n - \mu$ on the linear space
$$\mathcal{S}_n = \left\{g_1(X_1) + \ldots + g_n(X_n) : E[g_k(X_k)^2] < \infty, \ k = 1, \ldots, n\right\},$$
which contains the constant variables. Clearly, $\tilde{U}_n \in \mathcal{S}_n$. For any $g_k(X_k) \in \mathcal{S}_n$,
$$E[(U_n - \mu - \tilde{U}_n) g_k(X_k)] = E[E[U_n - \mu - \tilde{U}_n \mid X_k] g_k(X_k)] = 0.$$

Moreover,
$$\tilde{U}_n = \sum_{i=1}^n \frac{\binom{n-1}{r-1}}{\binom{n}{r}} E[h(\tilde{X}_1, \ldots, \tilde{X}_{r-1}, X_i) - \mu \mid X_i] = \frac{r}{n} \sum_{i=1}^n E[h(\tilde{X}_1, \ldots, \tilde{X}_{r-1}, X_i) - \mu \mid X_i],$$
so
$$Var(\tilde{U}_n) = \frac{r^2}{n^2} \sum_{i=1}^n E[(E[h(\tilde{X}_1, \ldots, \tilde{X}_{r-1}, X_i) - \mu \mid X_i])^2] = \frac{r^2}{n} Cov(h(X_1, X_2, \ldots, X_r), h(X_1, \tilde{X}_2, \ldots, \tilde{X}_r)) = \frac{r^2 \sigma^2}{n}.$$

Furthermore,
$$Var(U_n) = \binom{n}{r}^{-2} \sum_\beta \sum_{\beta'} Cov(h(X_{\beta_1}, \ldots, X_{\beta_r}), h(X_{\beta'_1}, \ldots, X_{\beta'_r}))$$
$$= \binom{n}{r}^{-2} \sum_{k=1}^r \sum_{\beta, \beta' \text{ share } k \text{ components}} Cov(h(X_1, \ldots, X_k, X_{k+1}, \ldots, X_r), h(X_1, \ldots, X_k, \tilde{X}_{k+1}, \ldots, \tilde{X}_r)).$$
Counting the pairs sharing $k$ components,
$$Var(U_n) = \sum_{k=1}^r \frac{r!}{k!(r-k)!} \frac{(n-r)(n-r-1)\cdots(n-2r+k+1)}{n(n-1)\cdots(n-r+1)} c_k,$$
where $c_k$ denotes the covariance above, so
$$Var(U_n) = \frac{r^2}{n} Cov(h(X_1, X_2, \ldots, X_r), h(X_1, \tilde{X}_2, \ldots, \tilde{X}_r)) + O(1/n^2).$$
Hence $Var(U_n)/Var(\tilde{U}_n) \rightarrow 1$ and, by Proposition 3.8,
$$\frac{U_n - \mu}{\sqrt{Var(U_n)}} - \frac{\tilde{U}_n}{\sqrt{Var(\tilde{U}_n)}} \rightarrow_p 0.$$
Example. In a bivariate i.i.d. sample $(X_1, Y_1), (X_2, Y_2), \ldots$, one statistic measuring agreement is Kendall's $\tau$-statistic,
$$\hat{\tau}_n = \frac{4}{n(n-1)} \sum_{i < j} I\{(Y_j - Y_i)(X_j - X_i) > 0\} - 1.$$
$\hat{\tau}_n + 1$ is a U-statistic of order 2 with the kernel $2 I\{(y_2 - y_1)(x_2 - x_1) > 0\}$, so $\sqrt{n}(\hat{\tau}_n + 1 - 2P((Y_2 - Y_1)(X_2 - X_1) > 0))$ has an asymptotic normal distribution with mean zero.
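A Monte Carlo sketch for Kendall's tau under independence (an assumed setting in which $2P((Y_2 - Y_1)(X_2 - X_1) > 0) - 1 = 0$ and the limit variance of $\sqrt{n}\,\hat{\tau}_n$ is known to be $4/9$):

```python
import numpy as np

rng = np.random.default_rng(10)

def kendall_tau(x, y):
    n = len(x)
    dx = np.subtract.outer(x, x)       # x_i - x_j
    dy = np.subtract.outer(y, y)       # y_i - y_j
    iu = np.triu_indices(n, k=1)       # pairs with i < j
    concordant = np.sum(dx[iu] * dy[iu] > 0)
    return 4.0 * concordant / (n * (n - 1)) - 1.0

n = 50
taus = np.array([kendall_tau(rng.uniform(size=n), rng.uniform(size=n))
                 for _ in range(2_000)])
print(taus.mean(), (np.sqrt(n) * taus).var())   # mean near 0, variance near 4/9
```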
Rank Statistics

Some definitions: $X_{(1)} \le X_{(2)} \le \ldots \le X_{(n)}$ are called the order statistics. The rank statistics, denoted $R_1, \ldots, R_n$, are the ranks of the $X_i$ among $X_1, \ldots, X_n$. Thus, if all the $X$'s are different, $X_i = X_{(R_i)}$. When there are ties, $R_i$ is defined as the average of all indices $j$ such that $X_i = X_{(j)}$ (sometimes called the midrank). We only consider the case where the $X$'s have continuous densities.

More definitions: a rank statistic is any function of the ranks. A linear rank statistic is a rank statistic of the special form $\sum_{i=1}^n a(i, R_i)$ for a given matrix $(a(i, j))_{n \times n}$. If $a(i, j) = c_i a_j$, then a statistic of the form $\sum_{i=1}^n c_i a_{R_i}$ is called a simple linear rank statistic; the $c$'s and $a$'s are called the coefficients and scores.
Examples. For two independent samples $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$, the Wilcoxon statistic is defined as the summation of all the ranks of the second sample in the pooled data $X_1, \ldots, X_n, Y_1, \ldots, Y_m$, i.e., $W_n = \sum_{i=n+1}^{n+m} R_i$. There are other choices of rank statistics, for instance the van der Waerden statistic $\sum_{i=n+1}^{n+m} \Phi^{-1}(R_i/(n + m + 1))$.
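A sketch computing the Wilcoxon statistic (the sample sizes and distributions are assumed choices; scipy.stats.rankdata does the pooled ranking, and continuity means no ties):

```python
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(11)
X = rng.normal(0.0, 1.0, size=30)            # first sample
Y = rng.normal(0.5, 1.0, size=20)            # second sample (shifted)
ranks = rankdata(np.concatenate([X, Y]))     # ranks in the pooled data
W = ranks[len(X):].sum()                     # Wilcoxon statistic
print(W, 20 * (30 + 20 + 1) / 2)             # under no shift, E[W] = m(n+m+1)/2 = 510
```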
Properties of rank statistics. Proposition 3.9 Let $X_1, \ldots, X_n$ be a random sample from a continuous distribution function $F$ with density $f$. Then
1. the vectors $(X_{(1)}, \ldots, X_{(n)})$ and $(R_1, \ldots, R_n)$ are independent;
2. the vector $(X_{(1)}, \ldots, X_{(n)})$ has density $n! \prod_{i=1}^n f(x_i)$ on the set $x_1 < \ldots < x_n$;
3. the variable $X_{(i)}$ has density $n\binom{n-1}{i-1} F(x)^{i-1} (1 - F(x))^{n-i} f(x)$; for $F$ the uniform distribution on $[0, 1]$, it has mean $i/(n+1)$ and variance $i(n - i + 1)/[(n+1)^2(n+2)]$;
4. the vector $(R_1, \ldots, R_n)$ is uniformly distributed on the set of all $n!$ permutations of $1, 2, \ldots, n$;
5. for any statistic $T$ and permutation $r = (r_1, \ldots, r_n)$ of $1, 2, \ldots, n$,
$$E[T(X_1, \ldots, X_n) \mid (R_1, \ldots, R_n) = r] = E[T(X_{(r_1)}, \ldots, X_{(r_n)})];$$
6. for any simple linear rank statistic $T = \sum_{i=1}^n c_i a_{R_i}$,
$$E[T] = n\bar{c}_n\bar{a}_n, \quad Var(T) = \frac{1}{n-1} \sum_{i=1}^n (c_i - \bar{c}_n)^2 \sum_{i=1}^n (a_i - \bar{a}_n)^2.$$
CLT for rank statistics. Theorem 3.17 Let $T_n = \sum_{i=1}^n c_i a_{R_i}$ with
$$\max_{i \le n} |a_i - \bar{a}_n| \Big/ \sqrt{\sum_{i=1}^n (a_i - \bar{a}_n)^2} \rightarrow 0, \quad \max_{i \le n} |c_i - \bar{c}_n| \Big/ \sqrt{\sum_{i=1}^n (c_i - \bar{c}_n)^2} \rightarrow 0.$$
Then $(T_n - E[T_n])/\sqrt{Var(T_n)} \rightarrow_d N(0, 1)$ if and only if, for every $\epsilon > 0$,
$$\sum_{(i,j)} \frac{|a_i - \bar{a}_n|^2 |c_j - \bar{c}_n|^2}{\sum_{i=1}^n (a_i - \bar{a}_n)^2 \sum_{i=1}^n (c_i - \bar{c}_n)^2} \, I\left\{\frac{\sqrt{n}\,|a_i - \bar{a}_n| |c_j - \bar{c}_n|}{\sqrt{\sum_{i=1}^n (a_i - \bar{a}_n)^2 \sum_{i=1}^n (c_i - \bar{c}_n)^2}} > \epsilon\right\} \rightarrow 0.$$
More rank statistics: a simple linear signed rank statistic has the form $\sum_{i=1}^n a_{R_i^+} \text{sign}(X_i)$, where $R_1^+, \ldots, R_n^+$, the absolute ranks, are the ranks of $|X_1|, \ldots, |X_n|$. In a bivariate sample $(X_1, Y_1), \ldots, (X_n, Y_n)$, one may consider $\sum_{i=1}^n a_{R_i} b_{S_i}$, where $(R_1, \ldots, R_n)$ and $(S_1, \ldots, S_n)$ are the respective ranks of $(X_1, \ldots, X_n)$ and $(Y_1, \ldots, Y_n)$.
Martingales

Definition 3.7 Let $\{Y_n\}$ be a sequence of random variables and $\{\mathcal{F}_n\}$ a sequence of $\sigma$-fields such that $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \ldots$. Suppose $E[|Y_n|] < \infty$. Then the pairs $\{(Y_n, \mathcal{F}_n)\}$ form a martingale if $E[Y_n \mid \mathcal{F}_{n-1}] = Y_{n-1}$ a.s.; a submartingale if $E[Y_n \mid \mathcal{F}_{n-1}] \ge Y_{n-1}$ a.s.; and a supermartingale if $E[Y_n \mid \mathcal{F}_{n-1}] \le Y_{n-1}$ a.s.

Some notes on the definition: $Y_1, \ldots, Y_n$ are measurable with respect to $\mathcal{F}_n$; sometimes we say $Y_n$ is adapted to $\mathcal{F}_n$. One simple example: $Y_n = X_1 + \ldots + X_n$, where $X_1, X_2, \ldots$ are i.i.d. with mean zero and $\mathcal{F}_n$ is the $\sigma$-field generated by $X_1, \ldots, X_n$.
Convex functions of martingales. Proposition 3.9 Let $\{(Y_n, \mathcal{F}_n)\}$ be a martingale. For any measurable and convex function $\phi$, $\{(\phi(Y_n), \mathcal{F}_n)\}$ is a submartingale.

Proof. Clearly, $\phi(Y_n)$ is adapted to $\mathcal{F}_n$. It is sufficient to show $E[\phi(Y_n) \mid \mathcal{F}_{n-1}] \ge \phi(Y_{n-1})$. This follows from the well-known Jensen's inequality: for any convex function $\phi$,
$$E[\phi(Y_n) \mid \mathcal{F}_{n-1}] \ge \phi(E[Y_n \mid \mathcal{F}_{n-1}]) = \phi(Y_{n-1}).$$
Jensen's inequality. Proposition 3.10 For any random variable $X$ and any convex measurable function $\phi$, $E[\phi(X)] \ge \phi(E[X])$.

Proof. We claim that for any $x_0$, there exists a constant $k_0$ such that for all $x$, $\phi(x) \ge \phi(x_0) + k_0(x - x_0)$. By convexity, for any $x' < y < x_0 < y' < x$,
$$\frac{\phi(x_0) - \phi(x')}{x_0 - x'} \le \frac{\phi(x_0) - \phi(y)}{x_0 - y} \le \frac{\phi(y') - \phi(x_0)}{y' - x_0} \le \frac{\phi(x) - \phi(x_0)}{x - x_0}.$$
Thus $(\phi(x) - \phi(x_0))/(x - x_0)$ is bounded below and decreasing as $x$ decreases to $x_0$; let its limit be $k_0^+$, so that
$$\phi(x) \ge k_0^+(x - x_0) + \phi(x_0), \quad x > x_0.$$
Similarly, $(\phi(x') - \phi(x_0))/(x' - x_0)$ is increasing and bounded above as $x'$ increases to $x_0$; let the limit be $k_0^-$, so that
$$\phi(x') \ge k_0^-(x' - x_0) + \phi(x_0), \quad x' < x_0.$$
Clearly, $k_0^+ \ge k_0^-$. Combining these two inequalities, $\phi(x) \ge \phi(x_0) + k_0(x - x_0)$ for $k_0 = (k_0^+ + k_0^-)/2$. Choose $x_0 = E[X]$; then $\phi(X) \ge \phi(E[X]) + k_0(X - E[X])$, and taking expectations gives the result.
Decomposition of a submartingale: $Y_n = (Y_n - E[Y_n \mid \mathcal{F}_{n-1}]) + E[Y_n \mid \mathcal{F}_{n-1}]$; any submartingale can be written as the summation of a martingale and a random variable predictable with respect to $\mathcal{F}_{n-1}$.

Convergence of martingales. Theorem 3.18 (Martingale Convergence Theorem) Let $\{(X_n, \mathcal{F}_n)\}$ be a submartingale. If $K = \sup_n E[|X_n|] < \infty$, then $X_n \rightarrow_{a.s.} X_\infty$, where $X_\infty$ is a random variable satisfying $E[|X_\infty|] \le K$.

Corollary 3.1 If $\{\mathcal{F}_n\}$ is an increasing sequence of $\sigma$-fields and $\mathcal{F}_\infty$ denotes the $\sigma$-field generated by $\cup_{n=1}^\infty \mathcal{F}_n$, then for any random variable $Z$ with $E[|Z|] < \infty$,
$$E[Z \mid \mathcal{F}_n] \rightarrow_{a.s.} E[Z \mid \mathcal{F}_\infty].$$
CLT for martingales. Theorem 3.19 (Martingale Central Limit Theorem) For each $n$, let $(Y_{n1}, \mathcal{F}_{n1}), (Y_{n2}, \mathcal{F}_{n2}), \ldots$ be a martingale. Define $X_{nk} = Y_{nk} - Y_{n,k-1}$ with $Y_{n0} = 0$, so that $Y_{nk} = X_{n1} + \ldots + X_{nk}$. Suppose that
$$\sum_k E[X_{nk}^2 \mid \mathcal{F}_{n,k-1}] \rightarrow_p \sigma^2,$$
where $\sigma$ is a positive constant, and that
$$\sum_k E[X_{nk}^2 I(|X_{nk}| \ge \epsilon) \mid \mathcal{F}_{n,k-1}] \rightarrow_p 0$$
for each $\epsilon > 0$. Then $\sum_k X_{nk} \rightarrow_d N(0, \sigma^2)$.
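A simulation sketch of the martingale CLT for an assumed toy array (not from the notes): $X_{nk} = \epsilon_k c_{k-1}/\sqrt{n}$ with $\epsilon_k$ i.i.d. $\pm 1$ and the predictable coefficient $c_{k-1} = \sqrt{2}\,I(\epsilon_{k-1} = 1)$, so $\sum_k E[X_{nk}^2 \mid \mathcal{F}_{n,k-1}] \rightarrow_p 1$ and the conditional Lindeberg condition holds trivially since $|X_{nk}| \le \sqrt{2/n}$.

```python
import numpy as np

rng = np.random.default_rng(12)
n, reps = 2_000, 5_000
eps = rng.choice([-1.0, 1.0], size=(reps, n + 1))
c = np.sqrt(2.0) * (eps[:, :-1] == 1.0)      # predictable: depends only on the past
Y = (eps[:, 1:] * c).sum(axis=1) / np.sqrt(n)
print(Y.mean(), Y.var())                     # approximately 0 and 1
```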
Some Notation

$o_p(1)$ and $O_p(1)$: $X_n = o_p(1)$ denotes that $X_n$ converges in probability to zero; $X_n = O_p(1)$ denotes that $X_n$ is bounded in probability, i.e.,
$$\lim_{M \rightarrow \infty} \limsup_n P(|X_n| \ge M) = 0.$$
For a sequence of random variables $\{r_n\}$, $X_n = o_p(r_n)$ means that $X_n/r_n \rightarrow_p 0$, and $X_n = O_p(r_n)$ means that $X_n/r_n$ is bounded in probability.

Algebra with $o_p(1)$ and $O_p(1)$:
$$o_p(1) + o_p(1) = o_p(1), \quad O_p(1) + O_p(1) = O_p(1), \quad O_p(1)o_p(1) = o_p(1),$$
$$(1 + o_p(1))^{-1} = 1 + o_p(1), \quad o_p(R_n) = R_n o_p(1), \quad O_p(R_n) = R_n O_p(1), \quad o_p(O_p(1)) = o_p(1).$$
If a real function $R(\cdot)$ satisfies $R(h) = o(|h|^p)$ as $h \rightarrow 0$ and $X_n \rightarrow_p 0$, then $R(X_n) = o_p(|X_n|^p)$; if $R(h) = O(|h|^p)$ as $h \rightarrow 0$, then $R(X_n) = O_p(|X_n|^p)$.
READING MATERIALS: Lehmann and Casella, Section 1.8; Ferguson, Part 1, Part 2, Part 3 (Chapters 12-15).