CHAPTER 3: LARGE SAMPLE THEORY


Introduction

Why large sample theory?

- Studying small-sample properties is usually difficult and complicated.
- Large sample theory studies the limit behavior of a sequence of random variables, say $X_n$.
- Examples: $\bar{X}_n - \mu$ and $\sqrt{n}(\bar{X}_n - \mu)$.

Modes of Convergence

Convergence almost surely

Definition 3.1 $X_n$ is said to converge almost surely to $X$, denoted by $X_n \to_{a.s.} X$, if there exists a set $A \subset \Omega$ such that $P(A^c) = 0$ and, for each $\omega \in A$, $X_n(\omega) \to X(\omega)$ in $\mathbb{R}$.

Equivalent condition

$$\{\omega : X_n(\omega) \to X(\omega)\}^c = \bigcup_{\epsilon > 0} \bigcap_{n} \Bigl\{\omega : \sup_{m \ge n} |X_m(\omega) - X(\omega)| > \epsilon\Bigr\}.$$

Hence $X_n \to_{a.s.} X$ iff, for every $\epsilon > 0$, $P(\sup_{m \ge n} |X_m - X| > \epsilon) \to 0$ as $n \to \infty$.

Convergence in probability

Definition 3.2 $X_n$ is said to converge in probability to $X$, denoted by $X_n \to_p X$, if for every $\epsilon > 0$, $P(|X_n - X| > \epsilon) \to 0$.

Convergence in moments/means

Definition 3.3 $X_n$ is said to converge in the $r$th mean to $X$, denoted by $X_n \to_r X$, if $E[|X_n - X|^r] \to 0$ as $n \to \infty$, for $X_n, X \in L_r(P)$, where $X \in L_r(P)$ means $\int |X|^r \, dP < \infty$.

Convergence in distribution

Definition 3.4 $X_n$ is said to converge in distribution to $X$, denoted by $X_n \to_d X$ or $F_n \to_d F$ (or $\mathcal{L}(X_n) \to \mathcal{L}(X)$, with $\mathcal{L}$ referring to the "law" or "distribution"), if the distribution functions $F_n$ of $X_n$ and $F$ of $X$ satisfy $F_n(x) \to F(x)$ as $n \to \infty$ for each continuity point $x$ of $F$.

Uniform integrability

Definition 3.5 A sequence of random variables $\{X_n\}$ is uniformly integrable if
$$\lim_{\lambda \to \infty} \limsup_{n \to \infty} E[|X_n| I(|X_n| \ge \lambda)] = 0.$$

A note

Convergence almost surely and convergence in probability are the same as defined in measure theory. The two new definitions are:
- convergence in the $r$th mean;
- convergence in distribution.

Convergence in distribution is very different from the other modes. Example: consider the sequence $X, Y, X, Y, \ldots$, where $X$ and $Y$ are independent $N(0, 1)$ random variables; the sequence converges in distribution to $N(0, 1)$, but none of the other modes of convergence holds. Convergence in distribution is the mode that matters for asymptotic statistical inference.

Relationship among different modes

Theorem 3.1
A. If $X_n \to_{a.s.} X$, then $X_n \to_p X$.
B. If $X_n \to_p X$, then $X_{n_k} \to_{a.s.} X$ for some subsequence $\{X_{n_k}\}$.
C. If $X_n \to_r X$, then $X_n \to_p X$.
D. If $X_n \to_p X$ and $\{|X_n|^r\}$ is uniformly integrable, then $X_n \to_r X$.
E. If $X_n \to_p X$ and $\limsup_n E|X_n|^r \le E|X|^r$, then $X_n \to_r X$.
F. If $X_n \to_r X$, then $X_n \to_{r'} X$ for any $0 < r' \le r$.
G. If $X_n \to_p X$, then $X_n \to_d X$.
H. $X_n \to_p X$ if and only if for every subsequence $\{X_{n_k}\}$ there exists a further subsequence $\{X_{n_{k_l}}\}$ such that $X_{n_{k_l}} \to_{a.s.} X$.
I. If $X_n \to_d c$ for a constant $c$, then $X_n \to_p c$.


Proof

A and B follow from the results in measure theory.

Proof of C. Markov's inequality: for any increasing function $g(\cdot)$ and random variable $Y$,
$$P(|Y| > \epsilon) \le \frac{E[g(|Y|)]}{g(\epsilon)}.$$
Hence
$$P(|X_n - X| > \epsilon) \le \frac{E[|X_n - X|^r]}{\epsilon^r} \to 0.$$

Proof of D. It is sufficient to show that for any subsequence of $\{X_n\}$, there exists a further subsequence $\{X_{n_k}\}$ such that $E[|X_{n_k} - X|^r] \to 0$. For any subsequence of $\{X_n\}$, from B there exists a further subsequence $\{X_{n_k}\}$ such that $X_{n_k} \to_{a.s.} X$. For any $\epsilon > 0$, by uniform integrability there exists $\lambda$ such that $\limsup_{n_k} E[|X_{n_k}|^r I(|X_{n_k}|^r \ge \lambda)] < \epsilon$. In particular, choose $\lambda$ such that $P(|X|^r = \lambda) = 0$; then
$$|X_{n_k}|^r I(|X_{n_k}|^r \ge \lambda) \to_{a.s.} |X|^r I(|X|^r \ge \lambda).$$
By Fatou's lemma,
$$E[|X|^r I(|X|^r \ge \lambda)] \le \limsup_{n_k} E[|X_{n_k}|^r I(|X_{n_k}|^r \ge \lambda)] < \epsilon.$$

Then
$$E[|X_{n_k} - X|^r] \le E[|X_{n_k} - X|^r I(|X_{n_k}|^r < 2\lambda, |X|^r < 2\lambda)] + E[|X_{n_k} - X|^r I(|X_{n_k}|^r \ge 2\lambda \text{ or } |X|^r \ge 2\lambda)]$$
$$\le E[|X_{n_k} - X|^r I(|X_{n_k}|^r < 2\lambda, |X|^r < 2\lambda)] + 2^r E[(|X_{n_k}|^r + |X|^r) I(|X_{n_k}|^r \ge 2\lambda \text{ or } |X|^r \ge 2\lambda)],$$
where the last inequality follows from $(x + y)^r \le 2^r (\max(x, y))^r \le 2^r (x^r + y^r)$ for $x, y \ge 0$. The first term tends to zero by dominated convergence. When $n_k$ is large, the second term is bounded by
$$2^r \{E[|X_{n_k}|^r I(|X_{n_k}|^r \ge \lambda)] + E[|X|^r I(|X|^r \ge \lambda)]\} \le 2^{r+1} \epsilon.$$
Hence $\limsup_{n_k} E[|X_{n_k} - X|^r] \le 2^{r+1} \epsilon$, and since $\epsilon$ is arbitrary, $E[|X_{n_k} - X|^r] \to 0$.

Proof of E. It is sufficient to show that for any subsequence of $\{X_n\}$, there exists a further subsequence $\{X_{n_k}\}$ such that $E[|X_{n_k} - X|^r] \to 0$. For any subsequence of $\{X_n\}$, there exists a further subsequence $\{X_{n_k}\}$ such that $X_{n_k} \to_{a.s.} X$. Define
$$Y_{n_k} = 2^r (|X_{n_k}|^r + |X|^r) - |X_{n_k} - X|^r \ge 0.$$
By Fatou's lemma,
$$\int \liminf_{n_k} Y_{n_k} \, dP \le \liminf_{n_k} \int Y_{n_k} \, dP.$$
This is equivalent to
$$2^{r+1} E[|X|^r] \le \liminf_{n_k} \left\{2^r E[|X_{n_k}|^r] + 2^r E[|X|^r] - E[|X_{n_k} - X|^r]\right\}.$$
Combined with $\limsup_{n_k} E[|X_{n_k}|^r] \le E[|X|^r]$, this gives $\limsup_{n_k} E[|X_{n_k} - X|^r] \le 0$.

Proof of F. Hölder's inequality:
$$\int |f(x) g(x)| \, d\mu(x) \le \left\{\int |f(x)|^p \, d\mu(x)\right\}^{1/p} \left\{\int |g(x)|^q \, d\mu(x)\right\}^{1/q}, \quad \frac{1}{p} + \frac{1}{q} = 1.$$
Choose $\mu = P$, $f = |X_n - X|^{r'}$, $g \equiv 1$, $p = r/r'$ and $q = r/(r - r')$ in Hölder's inequality:
$$E[|X_n - X|^{r'}] \le E[|X_n - X|^r]^{r'/r} \to 0.$$

Proof of G. Suppose $X_n \to_p X$. If $P(X = x) = 0$, then for any $\epsilon > 0$ and $\delta > 0$,
$$P(|I(X_n \le x) - I(X \le x)| > \epsilon) = P(|I(X_n \le x) - I(X \le x)| > \epsilon, |X - x| > \delta) + P(|I(X_n \le x) - I(X \le x)| > \epsilon, |X - x| \le \delta)$$
$$\le P(X_n \le x, X > x + \delta) + P(X_n > x, X < x - \delta) + P(|X - x| \le \delta)$$
$$\le P(|X_n - X| > \delta) + P(|X - x| \le \delta).$$
The first term converges to zero since $X_n \to_p X$. The second term can be made arbitrarily small by taking $\delta$ small, since $\lim_{\delta \to 0} P(|X - x| \le \delta) = P(X = x) = 0$. Hence $I(X_n \le x) \to_p I(X \le x)$, and by dominated convergence,
$$F_n(x) = E[I(X_n \le x)] \to E[I(X \le x)] = F(x).$$

Proof of H. One direction follows from B. To prove the other direction, argue by contradiction. Suppose there exists $\epsilon > 0$ such that $P(|X_n - X| > \epsilon)$ does not converge to zero. Then we can find a subsequence $\{X_{n'}\}$ such that $P(|X_{n'} - X| > \epsilon) > \delta$ for some $\delta > 0$. However, by the assumption there exists a further subsequence $\{X_{n''}\}$ such that $X_{n''} \to_{a.s.} X$, hence $X_{n''} \to_p X$ from A. Contradiction!

Proof of I. Let $X \equiv c$. Then
$$P(|X_n - c| > \epsilon) \le 1 - F_n(c + \epsilon) + F_n(c - \epsilon) \to 1 - F(c + \epsilon) + F(c - \epsilon) = 0,$$
since $c + \epsilon$ and $c - \epsilon$ are continuity points of the distribution function $F$ of the constant $c$.

Some counter-examples

(Example 1) Suppose that $X_n$ is degenerate at the point $1/n$; i.e., $P(X_n = 1/n) = 1$. Then $X_n$ converges in distribution to zero. Indeed, $X_n$ converges almost surely to zero.

(Example 2) $X_1, X_2, \ldots$ are i.i.d. with the standard normal distribution. Then $X_n \to_d X_1$, but $X_n$ does not converge in probability to $X_1$.

(Example 3) Let $Z$ be a random variable with a uniform distribution on $[0, 1]$. Let $X_n = I(m 2^{-k} \le Z < (m+1) 2^{-k})$ when $n = 2^k + m$, where $0 \le m < 2^k$. Then $X_n$ converges in probability to zero but not almost surely. This example was already given in Chapter 2.
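A minimal sketch (my own illustration, not from the slides) simulates this "typewriter" sequence for a single draw of $Z$: the indicator equals 1 exactly once in each block of length $2^k$, so $P(X_n = 1) \to 0$ while $X_n = 1$ infinitely often along the sample path.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.uniform()  # a single draw of Z ~ Uniform(0, 1)

# Build X_n for n = 2^k + m, 0 <= m < 2^k, over blocks k = 0, ..., 11.
X = []
for k in range(12):
    for m in range(2 ** k):
        X.append(int(m * 2.0 ** (-k) <= Z < (m + 1) * 2.0 ** (-k)))
X = np.array(X)

# Exactly one 1 per block: X_n = 1 infinitely often (no a.s. convergence),
# yet P(X_n = 1) = 2^{-k} -> 0 (convergence in probability).
print("number of ones:", X.sum())                      # 12, one per block
print("last index with X_n = 1:", np.nonzero(X)[0].max())
```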

(Example 4) Let $Z$ be Uniform(0, 1) and let $X_n = 2^n I(0 \le Z < 1/n)$. Then $E[|X_n|^r] \to \infty$, but $X_n$ converges to zero almost surely.

Result for convergence in rth mean

Theorem 3.2 (Vitali's theorem) Suppose that $X_n \in L_r(P)$, i.e., $\|X_n\|_r < \infty$, where $0 < r < \infty$, and $X_n \to_p X$. Then the following are equivalent:
A. $\{|X_n|^r\}$ is uniformly integrable.
B. $X_n \to_r X$.
C. $E[|X_n|^r] \to E[|X|^r]$.

One sufficient condition for uniform integrability

Liapunov condition: there exists a positive constant $\epsilon_0$ such that $\limsup_n E[|X_n|^{r + \epsilon_0}] < \infty$. Then
$$E[|X_n|^r I(|X_n| \ge \lambda)] \le \frac{E[|X_n|^{r + \epsilon_0}]}{\lambda^{\epsilon_0}},$$
so $\{|X_n|^r\}$ is uniformly integrable.

Integral inequalities

Young's inequality
$$ab \le \frac{a^p}{p} + \frac{b^q}{q}, \quad a, b > 0, \ \frac{1}{p} + \frac{1}{q} = 1,$$
where equality holds if and only if $a^p = b^q$. Derivation: $\log x$ is concave, so
$$\log\left(\frac{1}{p} a^p + \frac{1}{q} b^q\right) \ge \frac{1}{p} \log a^p + \frac{1}{q} \log b^q = \log(ab).$$
(The slide's geometric interpretation, a figure, is omitted in this transcription.)

Hölder's inequality
$$\int |f(x) g(x)| \, d\mu(x) \le \left\{\int |f(x)|^p \, d\mu(x)\right\}^{1/p} \left\{\int |g(x)|^q \, d\mu(x)\right\}^{1/q}.$$
Derivation: in Young's inequality, let
$$a = |f(x)| \Big/ \left\{\int |f|^p \, d\mu\right\}^{1/p}, \quad b = |g(x)| \Big/ \left\{\int |g|^q \, d\mu\right\}^{1/q},$$
and integrate both sides.

When $\mu = P$ and Hölder's inequality is applied to suitable powers of $|X(\omega)|$ (with $g \equiv 1$), one obtains the moment inequality
$$\mu_s^{r - t} \le \mu_t^{r - s} \, \mu_r^{s - t}, \quad \text{where } \mu_r = E[|X|^r] \text{ and } r \ge s \ge t \ge 0.$$

When $p = q = 2$, one obtains the Cauchy–Schwarz inequality:
$$\int |f(x) g(x)| \, d\mu(x) \le \left\{\int f(x)^2 \, d\mu(x)\right\}^{1/2} \left\{\int g(x)^2 \, d\mu(x)\right\}^{1/2}.$$

Minkowski's inequality

For $r > 1$: $\|X + Y\|_r \le \|X\|_r + \|Y\|_r$. Derivation:
$$E[|X + Y|^r] \le E[(|X| + |Y|) |X + Y|^{r-1}] \le E[|X|^r]^{1/r} E[|X + Y|^r]^{1 - 1/r} + E[|Y|^r]^{1/r} E[|X + Y|^r]^{1 - 1/r},$$
by Hölder's inequality; dividing both sides by $E[|X + Y|^r]^{1 - 1/r}$ gives the result. $\|\cdot\|_r$ is in fact a norm on the linear space $\{X : \|X\|_r < \infty\}$. This normed space is denoted $L_r(P)$.

Markov's inequality
$$P(|X| \ge \epsilon) \le \frac{E[g(|X|)]}{g(\epsilon)},$$
where $g \ge 0$ is an increasing function on $[0, \infty)$. Derivation:
$$P(|X| \ge \epsilon) \le P(g(|X|) \ge g(\epsilon)) = E[I(g(|X|) \ge g(\epsilon))] \le E\left[\frac{g(|X|)}{g(\epsilon)}\right].$$
With $g(x) = x^2$ and $X$ replaced by $X - \mu$, one obtains Chebyshev's inequality:
$$P(|X - \mu| \ge \epsilon) \le \frac{Var(X)}{\epsilon^2}.$$

Application of Vitali's theorem

$Y_1, Y_2, \ldots$ are i.i.d. with mean $\mu$ and variance $\sigma^2$. Let $X_n = \bar{Y}_n$. By Chebyshev's inequality,
$$P(|X_n - \mu| > \epsilon) \le \frac{Var(X_n)}{\epsilon^2} = \frac{\sigma^2}{n \epsilon^2} \to 0,$$
so $X_n \to_p \mu$. From the Liapunov condition with $r = 1$ and $\epsilon_0 = 1$, $|X_n - \mu|$ satisfies the uniform integrability condition, so $E[|X_n - \mu|] \to 0$.
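As a quick numerical check (my own sketch; the Exponential(1) choice, for which $\mu = \sigma^2 = 1$, is an assumption for illustration), one can compare the empirical tail probability of the sample mean with the Chebyshev bound $\sigma^2/(n\epsilon^2)$:

```python
import numpy as np

# Empirical tail probability of the sample mean vs. the Chebyshev bound
# sigma^2 / (n eps^2), for i.i.d. Exponential(1) data (mu = sigma^2 = 1).
rng = np.random.default_rng(1)
mu, sigma2, eps, reps = 1.0, 1.0, 0.1, 5_000
for n in (10, 100, 1000):
    means = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
    emp = np.mean(np.abs(means - mu) > eps)
    print(f"n={n:5d}  empirical={emp:.4f}  bound={sigma2 / (n * eps**2):.4f}")
```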

Convergence in Distribution

Convergence in distribution is the most important mode of convergence in statistical inference.

Equivalent conditions

Theorem 3.3 (Portmanteau Theorem) The following conditions are equivalent:
(a) $X_n$ converges in distribution to $X$.
(b) For any bounded continuous function $g(\cdot)$, $E[g(X_n)] \to E[g(X)]$.
(c) For any open set $G$ in $\mathbb{R}$, $\liminf_n P(X_n \in G) \ge P(X \in G)$.
(d) For any closed set $F$ in $\mathbb{R}$, $\limsup_n P(X_n \in F) \le P(X \in F)$.
(e) For any Borel set $O$ in $\mathbb{R}$ with $P(X \in \partial O) = 0$, where $\partial O$ is the boundary of $O$, $P(X_n \in O) \to P(X \in O)$.

Proof

(a) $\Rightarrow$ (b). Without loss of generality, assume $|g(x)| \le 1$. Choose $M$ such that $P(|X| = M) = 0$. Since $g$ is continuous on $[-M, M]$, it is uniformly continuous there. Partition $[-M, M]$ into finitely many intervals $I_1, \ldots, I_m$ such that within each interval $I_k$, $\max_{I_k} g(x) - \min_{I_k} g(x) \le \epsilon$, and such that $X$ has no mass at any endpoint of the $I_k$ (possible because $X$ has at most countably many atoms).

Therefore, choosing any point $x_k \in I_k$, $k = 1, \ldots, m$,
$$|E[g(X_n)] - E[g(X)]| \le E[|g(X_n)| I(|X_n| > M)] + E[|g(X)| I(|X| > M)]$$
$$+ \left| E[g(X_n) I(|X_n| \le M)] - \sum_{k=1}^m g(x_k) P(X_n \in I_k) \right| + \left| E[g(X) I(|X| \le M)] - \sum_{k=1}^m g(x_k) P(X \in I_k) \right|$$
$$+ \left| \sum_{k=1}^m g(x_k) P(X_n \in I_k) - \sum_{k=1}^m g(x_k) P(X \in I_k) \right|$$
$$\le P(|X_n| > M) + P(|X| > M) + 2\epsilon + \sum_{k=1}^m |P(X_n \in I_k) - P(X \in I_k)|.$$
Since the endpoints of the $I_k$ are continuity points of $F$, $P(X_n \in I_k) \to P(X \in I_k)$, so
$$\limsup_n |E[g(X_n)] - E[g(X)]| \le 2 P(|X| > M) + 2\epsilon.$$
Let $M \to \infty$ and $\epsilon \to 0$.

(b) $\Rightarrow$ (c). For any open set $G$, define
$$g(x) = 1 - \frac{\epsilon}{\epsilon + d(x, G^c)},$$
where $d(x, G^c) = \inf_{y \in G^c} |x - y|$ is the distance from $x$ to $G^c$. For any $y \in G^c$,
$$d(x_1, G^c) \le |x_1 - y| \le |x_1 - x_2| + |x_2 - y|,$$
so $d(x_1, G^c) - d(x_2, G^c) \le |x_1 - x_2|$ and, by symmetry, $|d(x_1, G^c) - d(x_2, G^c)| \le |x_1 - x_2|$. Hence
$$|g(x_1) - g(x_2)| \le \epsilon^{-1} |d(x_1, G^c) - d(x_2, G^c)| \le \epsilon^{-1} |x_1 - x_2|,$$
so $g$ is continuous and bounded, and $E[g(X_n)] \to E[g(X)]$. Note $0 \le g(x) \le I_G(x)$, so
$$\liminf_n P(X_n \in G) \ge \liminf_n E[g(X_n)] = E[g(X)].$$
Let $\epsilon \to 0$: $E[g(X)]$ converges to $E[I(X \in G)] = P(X \in G)$.

(c) $\Rightarrow$ (d). This is clear by taking the complement of $F$.

(d) $\Rightarrow$ (e). For any $O$ with $P(X \in \partial O) = 0$,
$$\limsup_n P(X_n \in O) \le \limsup_n P(X_n \in \bar{O}) \le P(X \in \bar{O}) = P(X \in O),$$
$$\liminf_n P(X_n \in O) \ge \liminf_n P(X_n \in O^o) \ge P(X \in O^o) = P(X \in O).$$

(e) $\Rightarrow$ (a). Choose $O = (-\infty, x]$ with $P(X \in \partial O) = P(X = x) = 0$.

Counter-examples

Let $g(x) = x$, a continuous but unbounded function. Let $X_n$ be a random variable taking the value $n$ with probability $1/n$ and the value $0$ with probability $1 - 1/n$. Then $X_n \to_d 0$; however, $E[g(X_n)] = 1$ does not converge to $E[g(0)] = 0$.

The boundary condition in (e) is also necessary: let $X_n$ be degenerate at $1/n$ and consider $O = \{x : x > 0\}$. Then $X_n \to_d 0$ but $P(X_n \in O) = 1 \not\to P(X \in O) = 0$; note that $P(X \in \partial O) = 1$ here.

Weak Convergence and Characteristic Functions

Theorem 3.4 (Continuity Theorem) Let $\phi_n$ and $\phi$ denote the characteristic functions of $X_n$ and $X$, respectively. Then $X_n \to_d X$ is equivalent to $\phi_n(t) \to \phi(t)$ for each $t$.

Proof

To prove the $\Rightarrow$ direction: since $x \mapsto e^{itx}$ is bounded and continuous, (b) of Theorem 3.3 gives
$$\phi_n(t) = E[e^{itX_n}] \to E[e^{itX}] = \phi(t).$$
The proof of the $\Leftarrow$ direction consists of a few tricky constructions (skipped).

One simple example

$X_1, \ldots, X_n \sim$ Bernoulli($p$):
$$\phi_{\bar{X}_n}(t) = E[e^{it(X_1 + \ldots + X_n)/n}] = (1 - p + p e^{it/n})^n = (1 + itp/n + o(1/n))^n \to e^{itp}.$$
Note the limit is the c.f. of $X \equiv p$. Thus $\bar{X}_n \to_d p$, so $\bar{X}_n$ converges in probability to $p$.
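A small simulation sketch of this example (my own; the values $p = 0.3$ and $t = 1.5$ are arbitrary choices): the empirical characteristic function of $\bar{X}_n$ approaches $e^{itp}$.

```python
import numpy as np

# Empirical characteristic function of the Bernoulli sample mean vs. e^{itp}.
rng = np.random.default_rng(2)
p, t, reps = 0.3, 1.5, 50_000
for n in (5, 50, 500):
    xbar = rng.binomial(n, p, size=reps) / n      # sample means of n Bernoulli(p)
    ecf = np.mean(np.exp(1j * t * xbar))          # estimates E[exp(it * Xbar_n)]
    print(f"n={n:4d}  |ecf - exp(itp)| = {abs(ecf - np.exp(1j * t * p)):.4f}")
```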

Generalization to multivariate random vectors

$X_n \to_d X$ if and only if $E[\exp\{it'X_n\}] \to E[\exp\{it'X\}]$ for every $k$-dimensional constant vector $t$; equivalently, $t'X_n \to_d t'X$ for every $t$. Thus, to study the weak convergence of random vectors, we can reduce to studying the weak convergence of one-dimensional linear combinations of the vectors. This is the well-known Cramér–Wold device.

Theorem 3.5 (The Cramér–Wold device) Random vectors $X_n$ in $\mathbb{R}^k$ satisfy $X_n \to_d X$ if and only if $t'X_n \to_d t'X$ in $\mathbb{R}$ for all $t \in \mathbb{R}^k$.

Properties of Weak Convergence

Theorem 3.6 (Continuous mapping theorem) Suppose $X_n \to_{a.s.} X$, or $X_n \to_p X$, or $X_n \to_d X$. Then for any continuous function $g(\cdot)$, $g(X_n)$ converges to $g(X)$ almost surely, or in probability, or in distribution, respectively.

Proof

If $X_n \to_{a.s.} X$, then $g(X_n) \to_{a.s.} g(X)$ by continuity. If $X_n \to_p X$, then for any subsequence there exists a further subsequence $X_{n_k} \to_{a.s.} X$; thus $g(X_{n_k}) \to_{a.s.} g(X)$, and $g(X_n) \to_p g(X)$ follows from (H) of Theorem 3.1. To prove that $g(X_n) \to_d g(X)$ when $X_n \to_d X$, use (b) of Theorem 3.3: for any bounded continuous $f$, the composition $f \circ g$ is bounded and continuous, so $E[f(g(X_n))] \to E[f(g(X))]$.

One remark

Theorem 3.6 concludes that $g(X_n) \to_d g(X)$ if $X_n \to_d X$ and $g$ is continuous. In fact, the result still holds if $P(X \in C(g)) = 1$, where $C(g)$ is the set of all continuity points of $g$. That is, if the discontinuity points of $g$ carry zero probability under the law of $X$, the continuous mapping theorem still holds.

Theorem 3.7 (Slutsky's theorem) Suppose $X_n \to_d X$, $Y_n \to_p y$ and $Z_n \to_p z$ for some constants $y$ and $z$. Then $Z_n X_n + Y_n \to_d zX + y$.

Proof

First show that $X_n + Y_n \to_d X + y$. For any $\epsilon > 0$,
$$P(X_n + Y_n \le x) \le P(X_n + Y_n \le x, |Y_n - y| \le \epsilon) + P(|Y_n - y| > \epsilon) \le P(X_n \le x - y + \epsilon) + P(|Y_n - y| > \epsilon),$$
so, taking $x - y + \epsilon$ to be a continuity point of $F_X$,
$$\limsup_n F_{X_n + Y_n}(x) \le \limsup_n F_{X_n}(x - y + \epsilon) \le F_X(x - y + \epsilon).$$
On the other hand,
$$P(X_n + Y_n > x) \le P(X_n + Y_n > x, |Y_n - y| \le \epsilon) + P(|Y_n - y| > \epsilon) \le P(X_n > x - y - \epsilon) + P(|Y_n - y| > \epsilon),$$
so
$$\limsup_n (1 - F_{X_n + Y_n}(x)) \le \limsup_n P(X_n > x - y - \epsilon) \le \limsup_n P(X_n > x - y - 2\epsilon) \le 1 - F_X(x - y - 2\epsilon).$$
Combining the two bounds,
$$F_X(x - y - 2\epsilon) \le \liminf_n F_{X_n + Y_n}(x) \le \limsup_n F_{X_n + Y_n}(x) \le F_X(x - y + \epsilon).$$
Letting $\epsilon \to 0$,
$$F_{X+y}(x-) \le \liminf_n F_{X_n + Y_n}(x) \le \limsup_n F_{X_n + Y_n}(x) \le F_{X+y}(x),$$
so $F_{X_n + Y_n}(x) \to F_{X+y}(x)$ at every continuity point $x$ of $F_{X+y}$.

To complete the proof: for any $M > 0$ with $P(|X| = M) = 0$,
$$P(|(Z_n - z) X_n| > \epsilon) \le P(|Z_n - z| > \epsilon/M) + P(|Z_n - z| \le \epsilon/M, |X_n| > M),$$
so
$$\limsup_n P(|(Z_n - z) X_n| > \epsilon) \le \limsup_n P(|Z_n - z| > \epsilon/M) + \limsup_n P(|X_n| \ge M) \le P(|X| \ge M).$$
Letting $M \to \infty$ shows that $(Z_n - z) X_n \to_p 0$. Clearly $z X_n \to_d z X$, so $Z_n X_n = (Z_n - z) X_n + z X_n \to_d z X$ from the first half of the proof. Again using the first half, $Z_n X_n + Y_n \to_d z X + y$.

Examples

Suppose $X_n \to_d N(0, 1)$. Then by the continuous mapping theorem, $X_n^2 \to_d \chi_1^2$.

The next example shows that $g$ can be discontinuous in Theorem 3.6. Let $X_n \to_d X$ with $X \sim N(0, 1)$ and $g(x) = 1/x$. Although $g(x)$ is discontinuous at the origin, we can still show that $1/X_n \to_d 1/X$, the reciprocal of a standard normal; this is because $P(X = 0) = 0$. However, as in Example 3.6, where $g(x) = I(x > 0)$, Theorem 3.6 may fail if $P(X \in C(g)) < 1$.

The condition that $Y_n \to_p y$ for a constant $y$ is necessary. For example, let $X_n = X \sim$ Uniform(0, 1) and let $Y_n = -X$, so that $Y_n \to_d -\tilde{X}$, where $\tilde{X}$ is an independent random variable with the same distribution as $X$. However, $X_n + Y_n = 0$ does not converge in distribution to $X - \tilde{X}$.

Let $X_1, X_2, \ldots$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2 > 0$. Then
$$\sqrt{n}(\bar{X}_n - \mu) \to_d N(0, \sigma^2), \quad s_n^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X}_n)^2 \to_{a.s.} \sigma^2.$$
By Slutsky's theorem,
$$\frac{\sqrt{n}(\bar{X}_n - \mu)}{s_n} \to_d \frac{1}{\sigma} N(0, \sigma^2) = N(0, 1).$$
Hence in large samples the $t_{n-1}$ distribution can be approximated by a standard normal distribution.
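A short simulation of this Slutsky argument (my own sketch; the parameter values $\mu = 2$, $\sigma = 3$, $n = 200$ are arbitrary choices): the tail probabilities of the t-statistic approach those of the standard normal.

```python
import numpy as np
from scipy import stats

# Tail probabilities of the t-statistic vs. the standard normal.
rng = np.random.default_rng(3)
mu, sigma, n, reps = 2.0, 3.0, 200, 20_000
x = rng.normal(mu, sigma, size=(reps, n))
tstat = np.sqrt(n) * (x.mean(axis=1) - mu) / x.std(axis=1, ddof=1)
for q in (1.0, 1.96, 2.58):
    print(f"P(T > {q}): empirical={np.mean(tstat > q):.4f}  "
          f"normal={1 - stats.norm.cdf(q):.4f}")
```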

Representation of Weak Convergence

Theorem 3.8 (Skorohod's Representation Theorem) Let $\{X_n\}$ and $X$ be random variables on a probability space $(\Omega, \mathcal{A}, P)$ with $X_n \to_d X$. Then there exist another probability space $(\tilde{\Omega}, \tilde{\mathcal{A}}, \tilde{P})$ and a sequence of random variables $\tilde{X}_n$ and $\tilde{X}$ defined on this space such that $\tilde{X}_n$ and $X_n$ have the same distributions, $\tilde{X}$ and $X$ have the same distributions, and moreover $\tilde{X}_n \to_{a.s.} \tilde{X}$.

Quantile function
$$F^{-1}(p) = \inf\{x : F(x) \ge p\}.$$
Proposition 3.1
(a) $F^{-1}$ is left-continuous.
(b) If $X$ has continuous distribution function $F$, then $F(X) \sim$ Uniform(0, 1).
(c) Let $\xi \sim$ Uniform(0, 1) and let $X = F^{-1}(\xi)$. Then for all $x$, $\{X \le x\} = \{\xi \le F(x)\}$. Thus, $X$ has distribution function $F$.

Proof

(a) Clearly, $F^{-1}$ is nondecreasing. Suppose $p_n$ increases to $p$; then $F^{-1}(p_n)$ increases to some $y \le F^{-1}(p)$. Then $F(y) \ge p_n$ for all $n$, so $F(y) \ge p$. Hence $F^{-1}(p) \le y$, and $y = F^{-1}(p)$.

(b) $\{X \le x\} \subset \{F(X) \le F(x)\}$, so $F(x) \le P(F(X) \le F(x))$. Also $\{F(X) \le F(x) - \epsilon\} \subset \{X \le x\}$, so $P(F(X) \le F(x) - \epsilon) \le F(x)$ and hence $P(F(X) < F(x)) \le F(x)$. Then, if $F(X)$ is continuous, $P(F(X) \le F(x)) = F(x)$.

(c) $P(X \le x) = P(F^{-1}(\xi) \le x) = P(\xi \le F(x)) = F(x)$.
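Proposition 3.1(c) is the basis of inverse-transform sampling. A minimal sketch (my own illustration; the Exponential(1) choice, for which $F^{-1}(p) = -\log(1-p)$, is an assumption):

```python
import numpy as np

# Inverse-transform sampling: X = F^{-1}(xi), xi ~ Uniform(0,1), has df F.
rng = np.random.default_rng(4)
xi = rng.uniform(size=100_000)
x = -np.log(1 - xi)          # F^{-1}(p) = -log(1 - p) for Exponential(1)
for q in (0.5, 1.0, 2.0):
    print(f"P(X <= {q}): empirical={np.mean(x <= q):.4f}  "
          f"exact={1 - np.exp(-q):.4f}")
```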

Proof

Let $(\tilde{\Omega}, \tilde{\mathcal{A}}, \tilde{P})$ be $([0, 1], \mathcal{B} \cap [0, 1], \lambda)$. Define $\tilde{X}_n = F_n^{-1}(\xi)$ and $\tilde{X} = F^{-1}(\xi)$, where $\xi \sim$ Uniform(0, 1). Then $\tilde{X}_n$ has distribution $F_n$, the same as $X_n$. Take any $t \in (0, 1)$ such that there is at most one value $x$ with $F(x) = t$ (it is easy to see such $t$ is a continuity point of $F^{-1}$). For any continuity point $z < x$ of $F$, $F(z) < t$, so when $n$ is large, $F_n(z) < t$ and hence $F_n^{-1}(t) \ge z$. Thus $\liminf_n F_n^{-1}(t) \ge z$, and letting $z \uparrow x$, $\liminf_n F_n^{-1}(t) \ge x = F^{-1}(t)$. From $F(x + \epsilon) > t$, for $n$ large $F_n(x + \epsilon) > t$, so $F_n^{-1}(t) \le x + \epsilon$; hence $\limsup_n F_n^{-1}(t) \le x + \epsilon$, and letting $\epsilon \downarrow 0$, $\limsup_n F_n^{-1}(t) \le x$. Thus $F_n^{-1}(t) \to F^{-1}(t)$ for almost every $t \in (0, 1)$, i.e., $\tilde{X}_n \to_{a.s.} \tilde{X}$.

Usefulness of the representation theorem

For example, if $X_n \to_d X$ and one wishes to show that some function of $X_n$, say $g(X_n)$, converges in distribution to $g(X)$: pass to the almost surely convergent versions $\tilde{X}_n \to_{a.s.} \tilde{X}$, conclude $g(\tilde{X}_n) \to_{a.s.} g(\tilde{X})$, and translate back into convergence in distribution; see the diagram in Figure 2.

(Figure 2, a diagram of this argument, is not included in this transcription.)

Alternative Proof of Slutsky's Theorem

First show $(X_n, Y_n) \to_d (X, y)$:
$$|\phi_{(X_n, Y_n)}(t_1, t_2) - \phi_{(X, y)}(t_1, t_2)| = |E[e^{it_1 X_n} e^{it_2 Y_n}] - E[e^{it_1 X}] e^{it_2 y}|$$
$$\le |E[e^{it_1 X_n}(e^{it_2 Y_n} - e^{it_2 y})]| + |e^{it_2 y}| \, |E[e^{it_1 X_n}] - E[e^{it_1 X}]|$$
$$\le E[|e^{it_2 Y_n} - e^{it_2 y}|] + |E[e^{it_1 X_n}] - E[e^{it_1 X}]| \to 0.$$
Similarly, $(Z_n, X_n) \to_d (z, X)$. Since $g(z, x) = zx$ is continuous, $Z_n X_n \to_d zX$. Since $(Z_n X_n, Y_n) \to_d (zX, y)$ and $g(x, y) = x + y$ is continuous, $Z_n X_n + Y_n \to_d zX + y$.

Summation of Independent R.V.s

Some preliminary lemmas

Proposition 3.2 (Borel–Cantelli Lemma) For any events $A_n$, $\sum_{n=1}^\infty P(A_n) < \infty$ implies
$$P(A_n, \text{ i.o.}) = P(\{A_n\} \text{ occurs infinitely often}) = 0;$$
or equivalently, $P(\cap_{n=1}^\infty \cup_{m \ge n} A_m) = 0$.

Proof
$$P(A_n, \text{ i.o.}) \le P(\cup_{m \ge n} A_m) \le \sum_{m \ge n} P(A_m) \to 0, \quad \text{as } n \to \infty.$$

One consequence of the first Borel–Cantelli lemma

If, for a sequence of random variables $\{Z_n\}$ and for every $\epsilon > 0$, $\sum_n P(|Z_n| > \epsilon) < \infty$, then $|Z_n| > \epsilon$ occurs only finitely many times almost surely; hence $Z_n \to_{a.s.} 0$.

Proposition 3.3 (Second Borel–Cantelli Lemma) For a sequence of independent events $A_1, A_2, \ldots$, $\sum_{n=1}^\infty P(A_n) = \infty$ implies $P(A_n, \text{ i.o.}) = 1$.

Proof Consider the complement of $\{A_n, \text{ i.o.}\}$:
$$P(\cup_{n=1}^\infty \cap_{m \ge n} A_m^c) = \lim_n P(\cap_{m \ge n} A_m^c) = \lim_n \prod_{m \ge n} (1 - P(A_m)) \le \limsup_n \exp\Bigl\{-\sum_{m \ge n} P(A_m)\Bigr\} = 0.$$

Equivalence lemma

Proposition 3.4 Suppose $X_1, \ldots, X_n$ are i.i.d. with finite mean. Define $Y_n = X_n I(|X_n| \le n)$. Then $\sum_{n=1}^\infty P(X_n \ne Y_n) < \infty$.

Proof Since $E[|X_1|] < \infty$,
$$\sum_{n=1}^\infty P(|X_1| > n) = \sum_{n=1}^\infty n P(n < |X_1| \le n + 1) \le E[|X_1|] < \infty.$$
From the Borel–Cantelli Lemma, $P(X_n \ne Y_n, \text{ i.o.}) = 0$: for almost every $\omega \in \Omega$, $X_n(\omega) = Y_n(\omega)$ when $n$ is large enough.

Weak Law of Large Numbers

Theorem 3.9 (Weak Law of Large Numbers) If $X, X_1, \ldots, X_n$ are i.i.d. with mean $\mu$ (so $E[|X|] < \infty$ and $\mu = E[X]$), then $\bar{X}_n \to_p \mu$.

Proof Define $Y_n = X_n I(|X_n| \le n)$ and let $\mu_n = \sum_{k=1}^n E[Y_k]/n$. By Chebyshev's inequality,
$$P(|\bar{Y}_n - \mu_n| \ge \epsilon) \le \frac{Var(\bar{Y}_n)}{\epsilon^2} \le \frac{\sum_{k=1}^n Var(X_k I(|X_k| \le k))}{n^2 \epsilon^2}.$$
Splitting at the level $\sqrt{k}\,\epsilon^2$,
$$Var(X_k I(|X_k| \le k)) \le E[X_k^2 I(|X_k| \le k)] = E[X_k^2 I(|X_k| \le k, |X_k| \ge \sqrt{k}\epsilon^2)] + E[X_k^2 I(|X_k| \le k, |X_k| < \sqrt{k}\epsilon^2)]$$
$$\le k E[|X| I(|X| \ge \sqrt{k}\epsilon^2)] + k\epsilon^4,$$
so
$$P(|\bar{Y}_n - \mu_n| \ge \epsilon) \le \frac{\sum_{k=1}^n E[|X| I(|X| \ge \sqrt{k}\epsilon^2)]}{n\epsilon^2} + \epsilon^2 \frac{n(n+1)}{2n^2}.$$
Hence $\limsup_n P(|\bar{Y}_n - \mu_n| \ge \epsilon) \le \epsilon^2/2$, and it follows that $\bar{Y}_n - \mu_n \to_p 0$. Since $E[Y_k] = E[X I(|X| \le k)] \to \mu$, we have $\mu_n \to \mu$, so $\bar{Y}_n \to_p \mu$. From Proposition 3.4, $X_k = Y_k$ eventually, almost surely, so $\bar{X}_n - \bar{Y}_n \to_{a.s.} 0$; hence $\bar{X}_n \to_p \mu$.
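A quick illustration of the law of large numbers (my own sketch; the Pareto design with tail index 1.5, hence finite mean $3$ but infinite variance, is an assumption chosen to show that only a finite mean is needed):

```python
import numpy as np

# Running sample mean of i.i.d. Pareto draws (tail index 1.5: finite mean
# alpha/(alpha-1) = 3, infinite variance) settles near the true mean.
rng = np.random.default_rng(5)
x = 1 + rng.pareto(1.5, size=1_000_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
for n in (10**2, 10**4, 10**6):
    print(f"n={n:>7d}  mean={running_mean[n - 1]:.4f}  (mu = 3)")
```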

Strong Law of Large Numbers

Theorem 3.10 (Strong Law of Large Numbers) If $X_1, \ldots, X_n$ are i.i.d. with mean $\mu$, then $\bar{X}_n \to_{a.s.} \mu$.

Proof Without loss of generality, assume $X_n \ge 0$: if the result holds for nonnegative variables, it holds for general $X_n$ by writing $X_n = X_n^+ - X_n^-$. Similarly to Theorem 3.9, it is sufficient to show $\bar{Y}_n \to_{a.s.} \mu$, where $Y_n = X_n I(X_n \le n)$. Note $E[Y_n] = E[X_1 I(X_1 \le n)] \to \mu$, so $\sum_{k=1}^n E[Y_k]/n \to \mu$. Hence, if we denote $S_n = \sum_{k=1}^n (Y_k - E[Y_k])$ and show $S_n/n \to_{a.s.} 0$, then the result holds.

$$Var(S_n) = \sum_{k=1}^n Var(Y_k) \le \sum_{k=1}^n E[Y_k^2] \le n E[X_1^2 I(X_1 \le n)].$$
By Chebyshev's inequality,
$$P\left(\left|\frac{S_n}{n}\right| > \epsilon\right) \le \frac{1}{n^2 \epsilon^2} Var(S_n) \le \frac{E[X_1^2 I(X_1 \le n)]}{n \epsilon^2}.$$
For any $\alpha > 1$, let $u_n = [\alpha^n]$. Then
$$\sum_{n=1}^\infty P\left(\left|\frac{S_{u_n}}{u_n}\right| > \epsilon\right) \le \sum_{n=1}^\infty \frac{1}{u_n \epsilon^2} E[X_1^2 I(X_1 \le u_n)] = \frac{1}{\epsilon^2} E\Bigl[X_1^2 \sum_{n : u_n \ge X_1} \frac{1}{u_n}\Bigr].$$
Since $u_n = [\alpha^n] \ge \alpha^n / 2$, for any $x > 0$,
$$\sum_{n : u_n \ge x} \frac{1}{u_n} \le 2 \sum_{n \ge \log x / \log \alpha} \alpha^{-n} \le K x^{-1}$$
for some constant $K$. Hence
$$\sum_{n=1}^\infty P\left(\left|\frac{S_{u_n}}{u_n}\right| > \epsilon\right) \le \frac{K}{\epsilon^2} E[X_1] < \infty,$$
so $S_{u_n}/u_n \to_{a.s.} 0$ by the first Borel–Cantelli lemma.

From $S_{u_n}/u_n \to_{a.s.} 0$ and $\sum_{k \le u_n} E[Y_k]/u_n \to \mu$, we get $\tilde{S}_{u_n}/u_n \to_{a.s.} \mu$, where $\tilde{S}_k = \sum_{j=1}^k Y_j$. For any $k$, we can find $u_n < k \le u_{n+1}$. Since $X_1, X_2, \ldots \ge 0$, $\tilde{S}_k$ is nondecreasing in $k$, so
$$\frac{\tilde{S}_{u_n}}{u_n} \cdot \frac{u_n}{u_{n+1}} \le \frac{\tilde{S}_k}{k} \le \frac{\tilde{S}_{u_{n+1}}}{u_{n+1}} \cdot \frac{u_{n+1}}{u_n}.$$
Since $u_{n+1}/u_n \to \alpha$,
$$\mu/\alpha \le \liminf_k \frac{\tilde{S}_k}{k} \le \limsup_k \frac{\tilde{S}_k}{k} \le \mu \alpha \quad \text{a.s.}$$
Since $\alpha$ is an arbitrary number larger than 1, let $\alpha \to 1$ to obtain $\lim_k \tilde{S}_k/k = \mu$ a.s.

Central Limit Theorems

Preliminary result on characteristic functions

Proposition 3.5 Suppose $E[|X|^m] < \infty$ for some integer $m \ge 0$. Then
$$\left| \phi_X(t) - \sum_{k=0}^m \frac{(it)^k E[X^k]}{k!} \right| \Big/ |t|^m \to 0, \quad \text{as } t \to 0.$$

Proof
$$e^{itx} = \sum_{k=0}^m \frac{(itx)^k}{k!} + \frac{(itx)^m}{m!}\left[e^{i\theta t x} - 1\right], \quad \text{for some } \theta \in [0, 1].$$
Hence
$$\left| \phi_X(t) - \sum_{k=0}^m \frac{(it)^k E[X^k]}{k!} \right| \Big/ |t|^m \le E[|X|^m |e^{i\theta t X} - 1|]/m! \to 0,$$
as $t \to 0$.

Simple versions of the CLT

Theorem 3.11 (Central Limit Theorem) If $X_1, \ldots, X_n$ are i.i.d. with mean $\mu$ and variance $\sigma^2$, then $\sqrt{n}(\bar{X}_n - \mu) \to_d N(0, \sigma^2)$.

Proof Denote $Y_n = \sqrt{n}(\bar{X}_n - \mu)$. Then
$$\phi_{Y_n}(t) = \left\{\phi_{X_1 - \mu}(t/\sqrt{n})\right\}^n.$$
By Proposition 3.5,
$$\phi_{X_1 - \mu}(t/\sqrt{n}) = 1 - \frac{\sigma^2 t^2}{2n} + o(1/n),$$
so
$$\phi_{Y_n}(t) \to \exp\left\{-\frac{\sigma^2 t^2}{2}\right\}.$$
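A simulation sketch of Theorem 3.11 (my own illustration; Uniform(0, 1) data, so $\mu = 1/2$ and $\sigma^2 = 1/12$, is an assumed choice):

```python
import numpy as np
from scipy import stats

# sqrt(n) * (Xbar_n - mu) for Uniform(0,1) data vs. N(0, 1/12).
rng = np.random.default_rng(6)
n, reps = 400, 20_000
y = np.sqrt(n) * (rng.uniform(size=(reps, n)).mean(axis=1) - 0.5)
print("sample variance:", y.var(), " theory:", 1 / 12)
print("KS distance to N(0, 1/12):",
      stats.kstest(y, "norm", args=(0, np.sqrt(1 / 12))).statistic)
```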

Theorem 3.12 (Multivariate Central Limit Theorem) If $X_1, \ldots, X_n$ are i.i.d. random vectors in $\mathbb{R}^k$ with mean $\mu$ and covariance $\Sigma = E[(X_1 - \mu)(X_1 - \mu)']$, then $\sqrt{n}(\bar{X}_n - \mu) \to_d N(0, \Sigma)$.

Proof Use the Cramér–Wold device.

Liapunov CLT

Theorem 3.13 (Liapunov Central Limit Theorem) Let $X_{n1}, \ldots, X_{nn}$ be independent random variables with $\mu_{ni} = E[X_{ni}]$ and $\sigma_{ni}^2 = Var(X_{ni})$. Let $\mu_n = \sum_{i=1}^n \mu_{ni}$ and $\sigma_n^2 = \sum_{i=1}^n \sigma_{ni}^2$. If
$$\frac{\sum_{i=1}^n E[|X_{ni} - \mu_{ni}|^3]}{\sigma_n^3} \to 0,$$
then $\sum_{i=1}^n (X_{ni} - \mu_{ni})/\sigma_n \to_d N(0, 1)$.

Lindeberg–Feller CLT

Theorem 3.14 (Lindeberg–Feller Central Limit Theorem) Let $X_{n1}, \ldots, X_{nn}$ be independent random variables with $\mu_{ni} = E[X_{ni}]$ and $\sigma_{ni}^2 = Var(X_{ni})$. Let $\sigma_n^2 = \sum_{i=1}^n \sigma_{ni}^2$. Then both
$$\sum_{i=1}^n (X_{ni} - \mu_{ni})/\sigma_n \to_d N(0, 1) \quad \text{and} \quad \max_{1 \le i \le n} \sigma_{ni}^2/\sigma_n^2 \to 0$$
hold if and only if the Lindeberg condition
$$\frac{1}{\sigma_n^2} \sum_{i=1}^n E[|X_{ni} - \mu_{ni}|^2 I(|X_{ni} - \mu_{ni}| \ge \epsilon \sigma_n)] \to 0, \quad \text{for all } \epsilon > 0,$$
holds.

Proof of the Liapunov CLT using Theorem 3.14
$$\frac{1}{\sigma_n^2} \sum_{k=1}^n E[|X_{nk} - \mu_{nk}|^2 I(|X_{nk} - \mu_{nk}| > \epsilon \sigma_n)] \le \frac{1}{\epsilon \sigma_n^3} \sum_{k=1}^n E[|X_{nk} - \mu_{nk}|^3] \to 0.$$

Examples

This example concerns a simple linear regression, $X_j = \alpha + \beta z_j + \epsilon_j$ for $j = 1, 2, \ldots$, where the $z_j$ are known numbers, not all equal, and the $\epsilon_j$ are i.i.d. with mean zero and variance $\sigma^2$. The least squares estimator is
$$\hat{\beta}_n = \frac{\sum_{j=1}^n X_j (z_j - \bar{z}_n)}{\sum_{j=1}^n (z_j - \bar{z}_n)^2} = \beta + \frac{\sum_{j=1}^n \epsilon_j (z_j - \bar{z}_n)}{\sum_{j=1}^n (z_j - \bar{z}_n)^2}.$$
Assume $\max_{j \le n} (z_j - \bar{z}_n)^2 / \sum_{j=1}^n (z_j - \bar{z}_n)^2 \to 0$. Then, by the Lindeberg–Feller CLT,
$$\sqrt{\sum_{j=1}^n (z_j - \bar{z}_n)^2} \, (\hat{\beta}_n - \beta) \to_d N(0, \sigma^2).$$
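A simulation sketch of this regression example (my own; the fixed design $z_j$ equally spaced on $[0, 1]$ and normal errors are assumptions for illustration, although any mean-zero i.i.d. errors would do):

```python
import numpy as np

# Normalized least-squares slope vs. its N(0, sigma^2) limit.
rng = np.random.default_rng(7)
n, reps, beta, sigma = 300, 20_000, 2.0, 1.5
zc = np.linspace(0, 1, n) - 0.5          # centered fixed design, not all equal
eps = rng.normal(0, sigma, size=(reps, n))
betahat = beta + eps @ zc / np.sum(zc**2)
stat = np.sqrt(np.sum(zc**2)) * (betahat - beta)
print("sample variance:", stat.var(), " theory:", sigma**2)
```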

This example is taken from the randomization test for paired comparisons. Let $(X_j, Y_j)$ denote the values of the $j$th pair, with $X_j$ the result of the treatment, and let $Z_j = X_j - Y_j$. When treatment and control have no difference, conditional on $|Z_j| = z_j$, the $Z_j$ are independent, taking the values $\pm z_j$ with probability $1/2$ each. Conditional on $z_1, z_2, \ldots$, the randomization t-test uses the t-statistic $\sqrt{n} \bar{Z}_n / s_z$, where $s_z^2 = (1/n) \sum_{j=1}^n (Z_j - \bar{Z}_n)^2$. When $\max_{j \le n} z_j^2 / \sum_{j=1}^n z_j^2 \to 0$, this statistic has an asymptotic normal distribution $N(0, 1)$.

Delta Method

Theorem 3.15 (Delta method) For random vectors $X$ and $X_n$ in $\mathbb{R}^k$, suppose there exist constants $a_n$ and $\mu$ such that $a_n (X_n - \mu) \to_d X$ and $a_n \to \infty$. Then for any function $g : \mathbb{R}^k \to \mathbb{R}^l$ such that $g$ has a derivative at $\mu$, denoted by $\nabla g(\mu)$,
$$a_n (g(X_n) - g(\mu)) \to_d \nabla g(\mu) X.$$

Proof By the Skorohod representation, we can construct $\tilde{X}_n$ and $\tilde{X}$ such that $\tilde{X}_n =_d X_n$ and $\tilde{X} =_d X$ ($=_d$ means "has the same distribution") and $a_n(\tilde{X}_n - \mu) \to_{a.s.} \tilde{X}$. Since $a_n \to \infty$ forces $\tilde{X}_n \to_{a.s.} \mu$, differentiability of $g$ at $\mu$ gives
$$a_n (g(\tilde{X}_n) - g(\mu)) \to_{a.s.} \nabla g(\mu) \tilde{X},$$
and hence $a_n (g(X_n) - g(\mu)) \to_d \nabla g(\mu) X$.

Examples

Let $X_1, X_2, \ldots$ be i.i.d. with finite fourth moment and $s_n^2 = (1/n) \sum_{i=1}^n (X_i - \bar{X}_n)^2$. Denote by $m_k$ the $k$th moment of $X_1$, $k \le 4$. Note that $s_n^2 = (1/n) \sum_{i=1}^n X_i^2 - (\sum_{i=1}^n X_i / n)^2$ and
$$\sqrt{n}\left[\begin{pmatrix} \bar{X}_n \\ (1/n) \sum_{i=1}^n X_i^2 \end{pmatrix} - \begin{pmatrix} m_1 \\ m_2 \end{pmatrix}\right] \to_d N\left(0, \begin{pmatrix} m_2 - m_1^2 & m_3 - m_1 m_2 \\ m_3 - m_1 m_2 & m_4 - m_2^2 \end{pmatrix}\right).$$
By the Delta method with $g(x, y) = y - x^2$,
$$\sqrt{n}(s_n^2 - Var(X_1)) \to_d N(0, \mu_4 - \mu_2^2),$$
where $\mu_k = E[(X_1 - m_1)^k]$ denotes the $k$th central moment; for $m_1 = 0$ this variance is $m_4 - (m_2 - m_1^2)^2$.
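A numerical check of this Delta-method calculation (my own sketch; for Exponential(1) data the central moments are $\mu_2 = 1$ and $\mu_4 = 9$, so the asymptotic variance is $\mu_4 - \mu_2^2 = 8$):

```python
import numpy as np

# sqrt(n) * (s_n^2 - Var(X_1)) for Exponential(1) data: the delta-method
# variance is mu4 - mu2^2 = 9 - 1 = 8 (central moments).
rng = np.random.default_rng(8)
n, reps = 2_000, 5_000
x = rng.exponential(1.0, size=(reps, n))
stat = np.sqrt(n) * (x.var(axis=1) - 1.0)   # x.var uses (1/n) sum (X - Xbar)^2
print("sample variance:", stat.var(), " theory:", 8.0)
```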

Let $(X_1, Y_1), (X_2, Y_2), \ldots$ be i.i.d. bivariate samples with finite fourth moments. One estimator of the correlation between $X$ and $Y$ is
$$\hat{\rho}_n = \frac{s_{xy}}{\sqrt{s_x^2 s_y^2}},$$
where $s_{xy} = (1/n) \sum_{i=1}^n (X_i - \bar{X}_n)(Y_i - \bar{Y}_n)$, $s_x^2 = (1/n) \sum_{i=1}^n (X_i - \bar{X}_n)^2$ and $s_y^2 = (1/n) \sum_{i=1}^n (Y_i - \bar{Y}_n)^2$. To derive the large-sample distribution of $\hat{\rho}_n$, first obtain the joint large-sample distribution of $(s_{xy}, s_x^2, s_y^2)$ using the Delta method, then further apply the Delta method with $g(x, y, z) = x/\sqrt{yz}$.

This example is taken from Pearson's chi-square statistic. Suppose each subject falls into one of $K$ categories with probabilities $p_1, \ldots, p_K$, where $p_1 + \ldots + p_K = 1$, and let $n_k$ count the subjects in category $k$ out of $n$. Pearson's statistic is defined as
$$\chi^2 = n \sum_{k=1}^K \left(\frac{n_k}{n} - p_k\right)^2 / p_k,$$
which can be read as $\sum (\text{observed count} - \text{expected count})^2 / \text{expected count}$. Note that $\sqrt{n}(n_1/n - p_1, \ldots, n_K/n - p_K)$ has an asymptotic multivariate normal distribution. We can then apply the continuous mapping theorem with $g(x_1, \ldots, x_K) = \sum_{k=1}^K x_k^2 / p_k$ to obtain the asymptotic distribution of $\chi^2$.
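A simulation sketch (my own; the cell probabilities are arbitrary choices) checking that Pearson's statistic behaves like $\chi^2_{K-1}$:

```python
import numpy as np
from scipy import stats

# Pearson's chi-square statistic vs. the chi-square(K-1) reference.
rng = np.random.default_rng(9)
p = np.array([0.2, 0.3, 0.5])
n, reps = 500, 20_000
counts = rng.multinomial(n, p, size=reps)               # shape (reps, K)
chi2 = (n * (counts / n - p) ** 2 / p).sum(axis=1)
crit = stats.chi2.ppf(0.95, df=len(p) - 1)
print("P(chi2 > 95% critical value):", np.mean(chi2 > crit))  # ~ 0.05
```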

U-statistics

Definition

Definition 3.6 A U-statistic associated with a kernel $h(x_1, \ldots, x_r)$ is defined as
$$U_n = \frac{1}{r! \binom{n}{r}} \sum_{\beta} h(X_{\beta_1}, \ldots, X_{\beta_r}),$$
where the sum is taken over all ordered $r$-tuples $\beta = (\beta_1, \ldots, \beta_r)$ of $r$ different integers chosen from $\{1, \ldots, n\}$.

Examples

One simple example is $h(x, y) = xy$. Then $U_n = (n(n-1))^{-1} \sum_{i \ne j} X_i X_j$.

$U_n = E[h(X_1, \ldots, X_r) \mid X_{(1)}, \ldots, X_{(n)}]$, the conditional expectation given the order statistics.

$U_n$ is a sum of non-independent random variables.

If we define $\tilde{h}(x_1, \ldots, x_r)$ as $(r!)^{-1} \sum_{(\tilde{x}_1, \ldots, \tilde{x}_r)} h(\tilde{x}_1, \ldots, \tilde{x}_r)$, the average over all permutations $(\tilde{x}_1, \ldots, \tilde{x}_r)$ of $(x_1, \ldots, x_r)$, then $\tilde{h}$ is permutation-symmetric and
$$U_n = \frac{1}{\binom{n}{r}} \sum_{\beta_1 < \ldots < \beta_r} \tilde{h}(X_{\beta_1}, \ldots, X_{\beta_r}).$$
$h$ is called the kernel of the U-statistic $U_n$.

CLT for U-statistics

Theorem 3.16 Let $\mu = E[h(X_1, \ldots, X_r)]$. If $E[h(X_1, \ldots, X_r)^2] < \infty$, then
$$\sqrt{n}(U_n - \mu) - \sqrt{n} \sum_{i=1}^n E[U_n - \mu \mid X_i] \to_p 0.$$
Consequently, $\sqrt{n}(U_n - \mu)$ is asymptotically normal with mean zero and variance $r^2 \sigma^2$, where, with $X_1, \ldots, X_r, \tilde{X}_1, \ldots, \tilde{X}_r$ i.i.d. variables,
$$\sigma^2 = Cov(h(X_1, X_2, \ldots, X_r), h(X_1, \tilde{X}_2, \ldots, \tilde{X}_r)).$$

Some preparation

Linear space of random variables: let $\mathcal{S}$ be a linear space of random variables with finite second moments that contains the constants; i.e., $1 \in \mathcal{S}$, and for any $X, Y \in \mathcal{S}$ and constants $a, b$, $aX + bY \in \mathcal{S}$.

Projection: for a random variable $T$, a random variable $S$ is called the projection of $T$ on $\mathcal{S}$ if $S \in \mathcal{S}$ and $S$ minimizes $E[(T - \tilde{S})^2]$ over $\tilde{S} \in \mathcal{S}$.

Proposition 3.7 Let $\mathcal{S}$ be a linear space of random variables with finite second moments. Then $S$ is the projection of $T$ on $\mathcal{S}$ if and only if $S \in \mathcal{S}$ and, for any $\tilde{S} \in \mathcal{S}$, $E[(T - S)\tilde{S}] = 0$. Every two projections of $T$ onto $\mathcal{S}$ are almost surely equal. If the linear space $\mathcal{S}$ contains the constants, then $E[T] = E[S]$ and $Cov(T - S, \tilde{S}) = 0$ for every $\tilde{S} \in \mathcal{S}$.

Proof For any $S$ and $\tilde{S}$ in $\mathcal{S}$,
$$E[(T - \tilde{S})^2] = E[(T - S)^2] + 2 E[(T - S)(S - \tilde{S})] + E[(S - \tilde{S})^2].$$
If $S$ satisfies $E[(T - S)\tilde{S}] = 0$ for all $\tilde{S} \in \mathcal{S}$, then the cross term vanishes (as $S - \tilde{S} \in \mathcal{S}$), so $E[(T - \tilde{S})^2] \ge E[(T - S)^2]$: $S$ is the projection of $T$ on $\mathcal{S}$. Conversely, if $S$ is the projection, then for any $\tilde{S} \in \mathcal{S}$ and constant $\alpha$, $E[(T - S - \alpha \tilde{S})^2]$ is minimized at $\alpha = 0$; computing the derivative at $\alpha = 0$ gives $E[(T - S)\tilde{S}] = 0$. If $T$ has two projections $S_1$ and $S_2$, the first display gives $E[(S_1 - S_2)^2] = 0$, so $S_1 = S_2$ a.s. If $\mathcal{S}$ contains the constants, choose $\tilde{S} = 1$: $0 = E[(T - S) \cdot 1] = E[T] - E[S]$. Clearly, $Cov(T - S, \tilde{S}) = E[(T - S)\tilde{S}] = 0$.

Equivalence with the projection

Proposition 3.8 Let $\mathcal{S}_n$ be linear spaces of random variables with finite second moments that contain the constants. Let $T_n$ be random variables with projections $S_n$ onto $\mathcal{S}_n$. If $Var(T_n)/Var(S_n) \to 1$, then
$$Z_n \equiv \frac{T_n - E[T_n]}{\sqrt{Var(T_n)}} - \frac{S_n - E[S_n]}{\sqrt{Var(S_n)}} \to_p 0.$$

Proof $E[Z_n] = 0$. Note that
$$Var(Z_n) = 2 - 2 \frac{Cov(T_n, S_n)}{\sqrt{Var(T_n) Var(S_n)}}.$$
Since $S_n$ is the projection of $T_n$,
$$Cov(T_n, S_n) = Cov(T_n - S_n, S_n) + Var(S_n) = Var(S_n).$$
We have
$$Var(Z_n) = 2\left(1 - \sqrt{\frac{Var(S_n)}{Var(T_n)}}\right) \to 0.$$
By Markov's inequality, we conclude that $Z_n \to_p 0$.

Conclusion

If $S_n$ is a sum of i.i.d. random variables with $(S_n - E[S_n])/\sqrt{Var(S_n)} \to_d N(0, 1)$, then $(T_n - E[T_n])/\sqrt{Var(T_n)} \to_d N(0, 1)$ as well. The limit distribution of U-statistics is derived using this result.

Proof of the CLT for U-statistics

Proof Let $\tilde{X}_1, \ldots, \tilde{X}_r$ be random variables with the same distribution as $X_1$ and independent of $X_1, \ldots, X_n$. Denote $\tilde{U}_n = \sum_{i=1}^n E[U_n - \mu \mid X_i]$. We show that $\tilde{U}_n$ is the projection of $U_n - \mu$ on the linear space
$$\mathcal{S}_n = \left\{ g_1(X_1) + \ldots + g_n(X_n) : E[g_k(X_k)^2] < \infty, \ k = 1, \ldots, n \right\},$$
which contains the constants. Clearly, $\tilde{U}_n \in \mathcal{S}_n$. For any $g_k(X_k) \in \mathcal{S}_n$,
$$E[(U_n - \mu - \tilde{U}_n) g_k(X_k)] = E[E[U_n - \mu - \tilde{U}_n \mid X_k] g_k(X_k)] = 0.$$

Moreover,
$$\tilde{U}_n = \sum_{i=1}^n \frac{\binom{n-1}{r-1}}{\binom{n}{r}} E[h(\tilde{X}_1, \ldots, \tilde{X}_{r-1}, X_i) - \mu \mid X_i] = \frac{r}{n} \sum_{i=1}^n E[h(\tilde{X}_1, \ldots, \tilde{X}_{r-1}, X_i) - \mu \mid X_i],$$
so
$$Var(\tilde{U}_n) = \frac{r^2}{n^2} \sum_{i=1}^n E\left[\left(E[h(\tilde{X}_1, \ldots, \tilde{X}_{r-1}, X_i) - \mu \mid X_i]\right)^2\right] = \frac{r^2}{n} Cov(h(X_1, X_2, \ldots, X_r), h(X_1, \tilde{X}_2, \ldots, \tilde{X}_r)) = \frac{r^2 \sigma^2}{n}.$$

Furthermore,
$$Var(U_n) = \binom{n}{r}^{-2} \sum_{\beta} \sum_{\beta'} Cov(h(X_{\beta_1}, \ldots, X_{\beta_r}), h(X_{\beta'_1}, \ldots, X_{\beta'_r}))$$
$$= \binom{n}{r}^{-2} \sum_{k=1}^r \sum_{\beta, \beta' \text{ sharing } k \text{ components}} Cov(h(X_1, \ldots, X_k, X_{k+1}, \ldots, X_r), h(X_1, \ldots, X_k, \tilde{X}_{k+1}, \ldots, \tilde{X}_r)).$$
Counting the shared components,
$$Var(U_n) = \sum_{k=1}^r \frac{r!}{k!(r-k)!} \cdot \frac{(n-r)(n-r-1) \cdots (n-2r+k+1)}{n(n-1) \cdots (n-r+1)} \, c_k,$$
where $c_k$ denotes the covariance for $k$ shared components. Hence
$$Var(U_n) = \frac{r^2}{n} Cov(h(X_1, X_2, \ldots, X_r), h(X_1, \tilde{X}_2, \ldots, \tilde{X}_r)) + O(1/n^2),$$
so $Var(U_n)/Var(\tilde{U}_n) \to 1$ and, by Proposition 3.8,
$$\frac{U_n - \mu}{\sqrt{Var(U_n)}} - \frac{\tilde{U}_n}{\sqrt{Var(\tilde{U}_n)}} \to_p 0.$$

Example

In a bivariate i.i.d. sample $(X_1, Y_1), (X_2, Y_2), \ldots$, one statistic measuring agreement is Kendall's $\tau$-statistic:
$$\hat{\tau} = \frac{4}{n(n-1)} \sum_{i < j} I\{(Y_j - Y_i)(X_j - X_i) > 0\} - 1.$$
$\hat{\tau} + 1$ is a U-statistic of order 2 with kernel $2 I\{(y_2 - y_1)(x_2 - x_1) > 0\}$. Hence
$$\sqrt{n}\left(\hat{\tau}_n + 1 - 2 P((Y_2 - Y_1)(X_2 - X_1) > 0)\right)$$
has an asymptotic normal distribution with mean zero.
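A brute-force computation of this U-statistic (my own sketch; the model $Y = X + \text{noise}$, for which Kendall's $\tau$ of the bivariate normal is $(2/\pi) \arcsin \rho = 0.5$ here, is an assumption for illustration):

```python
import numpy as np

def kendall_tau(x, y):
    """Kendall's tau-hat by brute force over all pairs i < j."""
    i, j = np.triu_indices(len(x), k=1)
    concordant = (y[j] - y[i]) * (x[j] - x[i]) > 0
    return 2.0 * concordant.mean() - 1.0   # = 4/(n(n-1)) * sum - 1

rng = np.random.default_rng(10)
x = rng.normal(size=2_000)
y = x + rng.normal(size=2_000)             # positively dependent pairs
print("tau-hat:", kendall_tau(x, y))       # ~ (2/pi) arcsin(1/sqrt(2)) = 0.5
```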

Rank Statistics

Some definitions

$X_{(1)} \le X_{(2)} \le \ldots \le X_{(n)}$ are called the order statistics.

The rank statistics, denoted by $R_1, \ldots, R_n$, are the ranks of the $X_i$ among $X_1, \ldots, X_n$. Thus, if all the $X$'s are distinct, $X_i = X_{(R_i)}$.

When there are ties, $R_i$ is defined as the average of all indices $j$ such that $X_i = X_{(j)}$ (sometimes called the midrank).

We only consider the case where the $X$'s have continuous densities.

More definitions

- A rank statistic is any function of the ranks.
- A linear rank statistic is a rank statistic of the special form $\sum_{i=1}^n a(i, R_i)$ for a given matrix $(a(i, j))_{n \times n}$.
- If $a(i, j) = c_i a_j$, then such a statistic, of the form $\sum_{i=1}^n c_i a_{R_i}$, is called a simple linear rank statistic; the $c$'s and $a$'s are called the coefficients and scores.

Examples

For two independent samples $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$, the Wilcoxon statistic is defined as the sum of the ranks of the second sample in the pooled data $X_1, \ldots, X_n, Y_1, \ldots, Y_m$, i.e., $W = \sum_{i=n+1}^{n+m} R_i$. Other choices of rank statistics include, for instance, the van der Waerden statistic $\sum_{i=n+1}^{n+m} \Phi^{-1}(R_i/(n+m+1))$.
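A minimal sketch (my own; the two normal samples are arbitrary choices) computing the Wilcoxon statistic from pooled midranks and checking it against scipy's Mann–Whitney $U$ via $U = W - m(m+1)/2$:

```python
import numpy as np
from scipy import stats

# Wilcoxon rank-sum statistic from pooled midranks, checked against
# scipy's Mann-Whitney U via U = W - m(m+1)/2.
rng = np.random.default_rng(11)
x = rng.normal(0.0, 1.0, size=60)
y = rng.normal(0.5, 1.0, size=40)
ranks = stats.rankdata(np.concatenate([x, y]))   # midranks in the pooled sample
W = ranks[len(x):].sum()                          # sum of the second sample's ranks
U = W - len(y) * (len(y) + 1) / 2
print("W:", W, " U:", U)
print("scipy U:", stats.mannwhitneyu(y, x).statistic)
```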

Properties of rank statistics

Proposition 3.9 Let $X_1, \ldots, X_n$ be a random sample from a continuous distribution function $F$ with density $f$. Then:
1. the vectors $(X_{(1)}, \ldots, X_{(n)})$ and $(R_1, \ldots, R_n)$ are independent;
2. the vector $(X_{(1)}, \ldots, X_{(n)})$ has density $n! \prod_{i=1}^n f(x_i)$ on the set $x_1 < \ldots < x_n$;
3. the variable $X_{(i)}$ has density $\binom{n-1}{i-1} n F(x)^{i-1} (1 - F(x))^{n-i} f(x)$; for $F$ the uniform distribution on $[0, 1]$, it has mean $i/(n+1)$ and variance $i(n-i+1)/[(n+1)^2 (n+2)]$;
4. the vector $(R_1, \ldots, R_n)$ is uniformly distributed on the set of all $n!$ permutations of $(1, 2, \ldots, n)$;
5. for any statistic $T$ and permutation $r = (r_1, \ldots, r_n)$ of $(1, 2, \ldots, n)$,
$$E[T(X_1, \ldots, X_n) \mid (R_1, \ldots, R_n) = r] = E[T(X_{(r_1)}, \ldots, X_{(r_n)})];$$
6. for any simple linear rank statistic $T = \sum_{i=1}^n c_i a_{R_i}$,
$$E[T] = n \bar{c}_n \bar{a}_n, \quad Var(T) = \frac{1}{n-1} \sum_{i=1}^n (c_i - \bar{c}_n)^2 \sum_{i=1}^n (a_i - \bar{a}_n)^2.$$

CLT for rank statistics

Theorem 3.17 Let $T_n = \sum_{i=1}^n c_i a_{R_i}$ be such that
$$\frac{\max_{i \le n} |a_i - \bar{a}_n|}{\sqrt{\sum_{i=1}^n (a_i - \bar{a}_n)^2}} \to 0, \quad \frac{\max_{i \le n} |c_i - \bar{c}_n|}{\sqrt{\sum_{i=1}^n (c_i - \bar{c}_n)^2}} \to 0.$$
Then $(T_n - E[T_n])/\sqrt{Var(T_n)} \to_d N(0, 1)$ if and only if, for every $\epsilon > 0$,
$$\sum_{(i,j)} \frac{(a_i - \bar{a}_n)^2 (c_j - \bar{c}_n)^2}{\sum_{i=1}^n (a_i - \bar{a}_n)^2 \sum_{i=1}^n (c_i - \bar{c}_n)^2} \, I\left\{\frac{\sqrt{n}\, |a_i - \bar{a}_n| \, |c_j - \bar{c}_n|}{\sqrt{\sum_{i=1}^n (a_i - \bar{a}_n)^2 \sum_{i=1}^n (c_i - \bar{c}_n)^2}} > \epsilon\right\} \to 0.$$

More on rank statistics

- A simple linear signed rank statistic has the form $\sum_{i=1}^n a_{R_i^+} \, \text{sign}(X_i)$, where $R_1^+, \ldots, R_n^+$, the absolute ranks, are the ranks of $|X_1|, \ldots, |X_n|$.
- In a bivariate sample $(X_1, Y_1), \ldots, (X_n, Y_n)$, one can consider $\sum_{i=1}^n a_{R_i} b_{S_i}$, where $(R_1, \ldots, R_n)$ and $(S_1, \ldots, S_n)$ are the respective ranks of $(X_1, \ldots, X_n)$ and $(Y_1, \ldots, Y_n)$.

Martingales

Definition 3.7 Let $\{Y_n\}$ be a sequence of random variables and $\{\mathcal{F}_n\}$ a sequence of $\sigma$-fields such that $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \ldots$. Suppose $E[|Y_n|] < \infty$. Then the sequence of pairs $\{(Y_n, \mathcal{F}_n)\}$ is called
- a martingale if $E[Y_n \mid \mathcal{F}_{n-1}] = Y_{n-1}$, a.s.;
- a submartingale if $E[Y_n \mid \mathcal{F}_{n-1}] \ge Y_{n-1}$, a.s.;
- a supermartingale if $E[Y_n \mid \mathcal{F}_{n-1}] \le Y_{n-1}$, a.s.

Some notes on the definition

$Y_1, \ldots, Y_n$ are measurable with respect to $\mathcal{F}_n$; sometimes we say $Y_n$ is adapted to $\mathcal{F}_n$.

One simple example: $Y_n = X_1 + \ldots + X_n$, where $X_1, X_2, \ldots$ are i.i.d. with mean zero, and $\mathcal{F}_n$ is the $\sigma$-field generated by $X_1, \ldots, X_n$.

Convex functions of martingales

Proposition 3.9 Let $\{(Y_n, \mathcal{F}_n)\}$ be a martingale. For any measurable and convex function $\phi$ (with $E[|\phi(Y_n)|] < \infty$), $\{(\phi(Y_n), \mathcal{F}_n)\}$ is a submartingale.

Proof Clearly, $\phi(Y_n)$ is adapted to $\mathcal{F}_n$. It is sufficient to show $E[\phi(Y_n) \mid \mathcal{F}_{n-1}] \ge \phi(Y_{n-1})$. This follows from the well-known Jensen's inequality: for any convex function $\phi$,
$$E[\phi(Y_n) \mid \mathcal{F}_{n-1}] \ge \phi(E[Y_n \mid \mathcal{F}_{n-1}]) = \phi(Y_{n-1}).$$

Jensen's inequality

Proposition 3.10 For any random variable $X$ and any convex measurable function $\phi$, $E[\phi(X)] \ge \phi(E[X])$.

Proof We claim that for any $x_0$ there exists a constant $k_0$ such that, for all $x$,
$$\phi(x) \ge \phi(x_0) + k_0 (x - x_0).$$
By convexity, for any $x' < y' < x_0 < y < x$,
$$\frac{\phi(x_0) - \phi(x')}{x_0 - x'} \le \frac{\phi(y) - \phi(x_0)}{y - x_0} \le \frac{\phi(x) - \phi(x_0)}{x - x_0}.$$
Thus $\frac{\phi(x) - \phi(x_0)}{x - x_0}$ is bounded below and decreasing as $x$ decreases to $x_0$. Let the limit be $k_0^+$; then
$$\frac{\phi(x) - \phi(x_0)}{x - x_0} \ge k_0^+, \quad \text{i.e.,} \quad \phi(x) \ge k_0^+ (x - x_0) + \phi(x_0) \quad \text{for } x > x_0.$$

Similarly,
$$\frac{\phi(x') - \phi(x_0)}{x' - x_0} \le \frac{\phi(y') - \phi(x_0)}{y' - x_0} \le \frac{\phi(x) - \phi(x_0)}{x - x_0},$$
so $\frac{\phi(x') - \phi(x_0)}{x' - x_0}$ is increasing and bounded above as $x'$ increases to $x_0$. Let the limit be $k_0^-$; then $\phi(x') \ge k_0^- (x' - x_0) + \phi(x_0)$ for $x' < x_0$. Clearly, $k_0^+ \ge k_0^-$. Combining these two inequalities, $\phi(x) \ge \phi(x_0) + k_0 (x - x_0)$ for all $x$, where $k_0 = (k_0^+ + k_0^-)/2$. Choose $x_0 = E[X]$; then $\phi(X) \ge \phi(E[X]) + k_0 (X - E[X])$, and taking expectations gives $E[\phi(X)] \ge \phi(E[X])$.

Decomposition of a submartingale
$$Y_n = (Y_n - E[Y_n \mid \mathcal{F}_{n-1}]) + E[Y_n \mid \mathcal{F}_{n-1}]:$$
any submartingale can be written as the sum of a martingale and a random variable predictable with respect to $\mathcal{F}_{n-1}$.

Convergence of martingales

Theorem 3.18 (Martingale Convergence Theorem) Let $\{(X_n, \mathcal{F}_n)\}$ be a submartingale. If $K = \sup_n E[|X_n|] < \infty$, then $X_n \to_{a.s.} X$, where $X$ is a random variable satisfying $E[|X|] \le K$.

Corollary 3.1 If $\mathcal{F}_n$ is an increasing sequence of $\sigma$-fields and $\mathcal{F}_\infty$ denotes the $\sigma$-field generated by $\cup_{n=1}^\infty \mathcal{F}_n$, then for any random variable $Z$ with $E[|Z|] < \infty$,
$$E[Z \mid \mathcal{F}_n] \to_{a.s.} E[Z \mid \mathcal{F}_\infty].$$

CLT for martingales

Theorem 3.19 (Martingale Central Limit Theorem) Let $(Y_{n1}, \mathcal{F}_{n1}), (Y_{n2}, \mathcal{F}_{n2}), \ldots$ be a martingale. Define $X_{nk} = Y_{nk} - Y_{n,k-1}$, with $Y_{n0} = 0$, so that $Y_{nk} = X_{n1} + \ldots + X_{nk}$. Suppose that
$$\sum_k E[X_{nk}^2 \mid \mathcal{F}_{n,k-1}] \to_p \sigma^2,$$
where $\sigma$ is a positive constant, and that, for each $\epsilon > 0$,
$$\sum_k E[X_{nk}^2 I(|X_{nk}| \ge \epsilon) \mid \mathcal{F}_{n,k-1}] \to_p 0.$$
Then $\sum_k X_{nk} \to_d N(0, \sigma^2)$.

Some Notation

$o_p(1)$ and $O_p(1)$

$X_n = o_p(1)$ denotes that $X_n$ converges in probability to zero; $X_n = O_p(1)$ denotes that $X_n$ is bounded in probability, i.e.,
$$\lim_{M \to \infty} \limsup_n P(|X_n| \ge M) = 0.$$
For a sequence of random variables $\{r_n\}$, $X_n = o_p(r_n)$ means that $X_n / r_n \to_p 0$, and $X_n = O_p(r_n)$ means that $X_n / r_n$ is bounded in probability.

Algebra of $o_p(1)$ and $O_p(1)$
$$o_p(1) + o_p(1) = o_p(1), \quad O_p(1) + O_p(1) = O_p(1), \quad O_p(1) \, o_p(1) = o_p(1),$$
$$(1 + o_p(1))^{-1} = 1 + o_p(1), \quad o_p(R_n) = R_n \, o_p(1), \quad O_p(R_n) = R_n \, O_p(1), \quad o_p(O_p(1)) = o_p(1).$$
If a real function $R(\cdot)$ satisfies $R(h) = o(|h|^p)$ as $h \to 0$, then $R(X_n) = o_p(|X_n|^p)$ whenever $X_n \to_p 0$; if $R(h) = O(|h|^p)$ as $h \to 0$, then $R(X_n) = O_p(|X_n|^p)$.

CHAPTER 3 LARGE SAMPLE THEORY 137 READING MATERIALS: Lehmann and Casella, Section 1.8, Ferguson, Part 1, Part 2, Part 3 12-15