
Section 27: The Central Limit Theorem
Po-Ning Chen, Professor
Institute of Communications Engineering
National Chiao Tung University
Hsin Chu, Taiwan 30010, R.O.C.

Identically distributed summands 27-1

Central limit theorem: The sum of many independent random variables will be approximately normally distributed if each summand has high probability of being small.

Theorem 27.1 (Lindeberg-Lévy theorem) Suppose that $\{X_n\}_{n=1}^{\infty}$ is an independent sequence of random variables having the same distribution with mean $c$ and finite positive variance $\sigma^2$. Then
$$\frac{S_n - nc}{\sigma\sqrt{n}} \Rightarrow N,$$
where $N$ is standard normal distributed and $S_n = X_1 + \cdots + X_n$.
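
As a quick illustration of Theorem 27.1 (not part of the original notes), the following sketch simulates normalized sums of Uniform(0, 1) random variables and measures their distance to the standard normal; numpy/scipy and all parameter choices are mine.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Uniform(0,1) summands: c = 1/2 and sigma^2 = 1/12.
c, sigma = 0.5, np.sqrt(1.0 / 12.0)
for n in [2, 10, 100]:
    s = rng.random((20000, n)).sum(axis=1)
    z = (s - n * c) / (sigma * np.sqrt(n))
    # The Kolmogorov-Smirnov distance to N(0,1) shrinks as n grows.
    print(n, stats.kstest(z, 'norm').statistic)
```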

Idea behind the proof 27-2

Proof: Without loss of generality, we can assume that $X_n$ has zero mean and unit variance. The characteristic function of $S_n/\sqrt{n}$ is then given by
$$E\big[e^{itS_n/\sqrt{n}}\big] = E\big[e^{it(X_1+\cdots+X_n)/\sqrt{n}}\big] = \prod_{k=1}^{n}E\big[e^{itX_k/\sqrt{n}}\big] = \varphi_X^n\big(t/\sqrt{n}\big).$$

Theorem 26.3 (Continuity theorem) $X_n \Rightarrow X$ if, and only if, $\varphi_{X_n}(t) \to \varphi_X(t)$ for every $t \in \mathbb{R}$.

By the continuity theorem, we need to show that
$$\lim_{n\to\infty}\varphi_X^n\big(t/\sqrt{n}\big) = e^{-t^2/2},$$
or equivalently,
$$\lim_{n\to\infty} n\log\varphi_X\big(t/\sqrt{n}\big) = -\frac{t^2}{2}.$$
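
The convergence $\varphi_X^n(t/\sqrt{n}) \to e^{-t^2/2}$ is easy to check numerically. A minimal sketch (my addition): for Rademacher $X$ (i.e., $\Pr[X = \pm 1] = 1/2$, zero mean, unit variance) the characteristic function is $\varphi_X(t) = \cos t$.

```python
import numpy as np

# For Rademacher X, phi_X(t) = cos(t); check phi_X(t/sqrt(n))^n -> exp(-t^2/2).
t = 1.7
for n in [10, 100, 10000]:
    print(n, np.cos(t / np.sqrt(n))**n, np.exp(-t**2 / 2))
```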

Idea behind the proof 27-3

Lemma If $E[|X|^k] < \infty$, then $\varphi^{(k)}(0) = i^k E[X^k]$.

By noting that $\varphi_X(0) = 1$, $\varphi'_X(0) = 0$ and $\varphi''_X(0) = -1$,
$$\lim_{n\to\infty} n\log\varphi_X(t/\sqrt{n}) = \lim_{s\downarrow 0}\frac{\log\varphi_X(ts)}{s^2} \qquad (s = 1/\sqrt{n})$$
$$= \lim_{s\downarrow 0}\frac{t\,\varphi'_X(ts)}{\varphi_X(ts)}\cdot\frac{1}{2s} \qquad\text{(by L'Hôpital's rule)}$$
$$= \lim_{s\downarrow 0}\frac{\varphi''_X(ts)\varphi_X(ts) - [\varphi'_X(ts)]^2}{\varphi_X^2(ts)}\cdot\frac{t^2}{2} \qquad\text{(by L'Hôpital's rule again)}$$
$$= -\frac{t^2}{2}.$$
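
As a sanity check on this limit (my addition, using sympy and the Rademacher example $\varphi_X(t) = \cos t$ again):

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)
# Rademacher X: phi_X(t) = cos(t), with phi(0)=1, phi'(0)=0, phi''(0)=-1.
print(sp.limit(sp.log(sp.cos(t * s)) / s**2, s, 0, '+'))  # -t**2/2
```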

Generalization of Theorem 27.1 27-4

Theorem 27.1 requires the distributions of the $X_n$ to be identical. Can we relax this requirement and still have a central limit theorem?
Yes, but a different proof technique must be developed.

Illustration of idea behind alternative proof 27-5

Lemma If $X$ has a moment of order $n$, then
$$\left|\varphi(t) - \sum_{k=0}^{n}\frac{(it)^k E[X^k]}{k!}\right| \le E\left[\min\left\{\frac{|tX|^{n+1}}{(n+1)!}, \frac{2|tX|^n}{n!}\right\}\right] \le \min\left\{\frac{|t|^{n+1}E[|X|^{n+1}]}{(n+1)!}, \frac{2|t|^n E[|X|^n]}{n!}\right\}.$$
(Notably, this inequality is valid even if $E[|X|^{n+1}] = \infty$.)

Suppose that $E[|X|^3] < \infty$. (This assumption is made simply to show the idea behind an alternative proof.) With $E[X] = 0$ and $E[X^2] = 1$,
$$\left|\varphi_X\left(\frac{t}{\sqrt{n}}\right) - \left(1 - \frac{t^2}{2n}\right)\right| \le E\left[\min\left\{\frac{|tX|^3}{6n^{3/2}}, \frac{t^2X^2}{n}\right\}\right] \le \frac{|t|^3\,E[|X|^3]}{6\,n^{3/2}}.$$
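
A numerical spot-check of the last display (my addition): for the centered exponential $X = E - 1$ with $E \sim \mathrm{Exp}(1)$ we have $E[X] = 0$, $E[X^2] = 1$, $\varphi_X(s) = e^{-is}/(1-is)$, and $E[|X|^3] = (12-2e)/e \approx 2.4146$, so both sides can be evaluated exactly.

```python
import numpy as np

# Centered exponential: X = E - 1, E ~ Exp(1).
phi = lambda s: np.exp(-1j * s) / (1 - 1j * s)   # characteristic function of X
m3 = (12 - 2 * np.e) / np.e                      # E|X|^3, about 2.4146
t = 1.3
for n in [25, 100, 400]:
    lhs = abs(phi(t / np.sqrt(n)) - (1 - t**2 / (2 * n)))
    rhs = abs(t)**3 * m3 / (6 * n**1.5)
    print(n, lhs <= rhs, lhs, rhs)   # the bound holds; both sides ~ n^(-3/2)
```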

Illustration of idea behind alternative proof 27-6

Lemma Let $z_1,\dots,z_m$ and $w_1,\dots,w_m$ be complex numbers of modulus at most 1; then
$$|z_1z_2\cdots z_m - w_1w_2\cdots w_m| \le \sum_{k=1}^{m}|z_k - w_k|.$$

Proof: As
$$z_1z_2\cdots z_m - w_1w_2\cdots w_m = z_1z_2\cdots z_m - w_1z_2\cdots z_m + w_1z_2\cdots z_m - w_1w_2\cdots w_m = (z_1-w_1)(z_2\cdots z_m) + w_1(z_2\cdots z_m - w_2\cdots w_m),$$
we obtain
$$|z_1z_2\cdots z_m - w_1w_2\cdots w_m| \le |(z_1-w_1)(z_2\cdots z_m)| + |w_1(z_2\cdots z_m - w_2\cdots w_m)| \le |z_1-w_1| + |z_2\cdots z_m - w_2\cdots w_m|.$$
The lemma can therefore be proved by induction.
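
A two-line numerical check of this product lemma (my addition, with random points in the closed unit disk):

```python
import numpy as np

rng = np.random.default_rng(2)
# Random complex numbers of modulus at most 1.
z = rng.uniform(0, 1, 10) * np.exp(1j * rng.uniform(0, 2 * np.pi, 10))
w = rng.uniform(0, 1, 10) * np.exp(1j * rng.uniform(0, 2 * np.pi, 10))
print(abs(np.prod(z) - np.prod(w)) <= np.abs(z - w).sum())  # True
```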

Illustration of idea behind alternative proof 27-7

With the lemma (taking $z_k = \varphi_X(t/\sqrt{n})$ and $w_k = 1 - t^2/(2n)$), and considering those $n$ satisfying $n \ge t^2/4$ (so that $|1 - t^2/(2n)| \le 1$),
$$\left|\varphi_X^n\left(\frac{t}{\sqrt{n}}\right) - \left(1 - \frac{t^2}{2n}\right)^n\right| \le n\left|\varphi_X\left(\frac{t}{\sqrt{n}}\right) - \left(1 - \frac{t^2}{2n}\right)\right| \le n\cdot\frac{|t|^3E[|X|^3]}{6n^{3/2}} = \frac{|t|^3E[|X|^3]}{6\sqrt{n}} \to 0.$$
The central limit statement can then be validated by
$$\left(1 - \frac{t^2}{2n}\right)^n \to e^{-t^2/2}.$$

Exemplified application of central limit theorem 27-8

Example 27.2 Suppose one wants to estimate the parameter $\alpha$ of an exponential distribution on the basis of independent samples $X_1,\dots,X_n$. By the law of large numbers, $\bar{X}_n = \frac{1}{n}(X_1+\cdots+X_n)$ converges to the mean $1/\alpha$ with probability 1.

As the variance of the exponential distribution is $1/\alpha^2$, the Lindeberg-Lévy theorem gives that
$$\frac{X_1+X_2+\cdots+X_n - n/\alpha}{\sqrt{n}/\alpha} = \alpha\sqrt{n}\left(\bar{X}_n - \frac{1}{\alpha}\right) \Rightarrow N.$$
Equivalently,
$$\lim_{n\to\infty}\Pr\left[\alpha\sqrt{n}\left(\bar{X}_n - \frac{1}{\alpha}\right) \le x\right] = \Phi(x),$$
where $\Phi(\cdot)$ represents the standard normal cdf. Roughly speaking, $\bar{X}_n$ is approximately Gaussian distributed with mean $1/\alpha$ and variance $1/(\alpha^2 n)$. Notably, this statement exactly indicates that
$$\lim_{n\to\infty}\Pr\left[\frac{\bar{X}_n - (1/\alpha)}{\sqrt{1/(\alpha^2 n)}} \le x\right] = \Phi(x).$$
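
A Monte Carlo check of this statement (my addition; $\alpha$, $n$, and the evaluation points are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
alpha, n, trials = 2.0, 200, 20000
xbar = rng.exponential(scale=1/alpha, size=(trials, n)).mean(axis=1)
z = alpha * np.sqrt(n) * (xbar - 1/alpha)
for x in [-1.0, 0.0, 1.5]:
    print(x, (z <= x).mean(), stats.norm.cdf(x))  # empirical cdf vs Phi(x)
```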

Exemplified application of central limit theorem 27-9

Theorem 25.6 (Skorohod's theorem) Suppose $\mu_n$ and $\mu$ are probability measures on $(\mathbb{R},\mathcal{B})$, and $\mu_n \Rightarrow \mu$. Then there exist random variables $Y_n$ and $Y$ such that:
1. they are all defined on a common probability space $(\Omega,\mathcal{F},P)$;
2. $\Pr[Y_n \le y] = \mu_n(-\infty,y]$ for every $y$;
3. $\Pr[Y \le y] = \mu(-\infty,y]$ for every $y$;
4. $\lim_{n} Y_n(\omega) = Y(\omega)$ for every $\omega\in\Omega$.

By Skorohod's theorem, there exist $\bar{Y}_n:\Omega\to\mathbb{R}$ and $Y:\Omega\to\mathbb{R}$ such that
$$\lim_{n\to\infty}\alpha\sqrt{n}\left(\bar{Y}_n(\omega) - \frac{1}{\alpha}\right) = Y(\omega)\quad\text{for every }\omega\in\Omega,$$
and $\bar{Y}_n$ and $Y$ have the same distributions as $\bar{X}_n$ and $N$, respectively. As
$$P\left(\left\{\omega\in\Omega : \lim_{n\to\infty}\bar{Y}_n(\omega) = 1/\alpha\right\}\right) = 1,$$
$$\sqrt{n}\;\frac{\alpha - \dfrac{1}{\bar{Y}_n(\omega)}}{\alpha} = \frac{\alpha\sqrt{n}\left(\bar{Y}_n(\omega) - \frac{1}{\alpha}\right)}{\alpha\,\bar{Y}_n(\omega)} \to \frac{Y(\omega)}{\alpha\,(1/\alpha)} = Y(\omega),$$
where $Y$ is also standard normal distributed.

Exemplified application of central limit theorem 27-10

This concludes that
$$\sqrt{n}\;\frac{\alpha - \dfrac{1}{\bar{X}_n}}{\alpha} \Rightarrow N.$$
In words, $1/\bar{X}_n$ is approximately Gaussian distributed with mean $\alpha$ and variance $\alpha^2/n$.
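
A simulation consistent with this conclusion (my addition, arbitrary parameters): $1/\bar{X}_n$ should have mean close to $\alpha$ and variance close to $\alpha^2/n$.

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, n, trials = 2.0, 400, 20000
inv_mean = 1.0 / rng.exponential(scale=1/alpha, size=(trials, n)).mean(axis=1)
print(inv_mean.mean(), alpha)          # sample mean of 1/X_bar vs alpha
print(inv_mean.var(), alpha**2 / n)    # sample variance vs alpha^2/n
```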

Lindeberg Theorem 27-11

Definition A triangular array of random variables is
$$\begin{matrix}
X_{1,1} & \cdots & X_{1,r_1} & \\
X_{2,1} & X_{2,2} & \cdots & X_{2,r_2}\\
X_{3,1} & X_{3,2} & X_{3,3} & \cdots & X_{3,r_3}\\
\vdots & & & &
\end{matrix}$$
where the probability space, on which each row $X_{n,1},\dots,X_{n,r_n}$ is commonly defined, may change with $n$ (since we do not care about the dependence across rows).

A sequence of random variables is just a special case of a triangular array of random variables with $r_n = n$ and $X_{n,k} = X_k$.

Lindeberg's condition For an array of independent zero-mean random variables $X_{n,1},\dots,X_{n,r_n}$,
$$\lim_{n\to\infty}\frac{1}{s_n^2}\sum_{k=1}^{r_n}\int_{[|x|\ge\varepsilon s_n]}x^2\,dF_{X_{n,k}}(x) = \lim_{n\to\infty}\frac{1}{s_n^2}\sum_{k=1}^{r_n}E\big[X_{n,k}^2 I_{[|X_{n,k}|\ge\varepsilon s_n]}\big] = 0,$$
where $s_n^2 = \sum_{k=1}^{r_n}E[X_{n,k}^2]$.
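
Lindeberg's condition can be estimated by simulation. Below is a minimal sketch (my own helper, not from the notes): `lindeberg_sum` takes a `sampler(k, size)` that draws the row entries $X_{n,k}$ and returns a Monte Carlo estimate of the sum in the condition for one row.

```python
import numpy as np

def lindeberg_sum(sampler, r_n, eps, trials=5000, rng=None):
    """Monte Carlo estimate of (1/s_n^2) sum_k E[X_{n,k}^2 1{|X_{n,k}| >= eps*s_n}]
    for one row of a triangular array; sampler(k, size) draws X_{n,k}."""
    rng = rng or np.random.default_rng(0)
    rows = [sampler(k, trials) for k in range(1, r_n + 1)]
    s2 = sum(np.mean(x**2) for x in rows)
    thresh = eps * np.sqrt(s2)
    tail = sum(np.mean(np.where(np.abs(x) >= thresh, x**2, 0.0)) for x in rows)
    return tail / s2

# i.i.d. standard normal rows: the Lindeberg sum vanishes as r_n grows.
rng = np.random.default_rng(5)
for r in [10, 100, 500]:
    print(r, lindeberg_sum(lambda k, m: rng.standard_normal(m), r, eps=0.5))
```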

Lindeberg Theorem 27-12

Theorem 27.2 (Lindeberg theorem) For an array of independent zero-mean random variables $X_{n,1},\dots,X_{n,r_n}$, if Lindeberg's condition holds for all positive $\varepsilon$, then
$$\frac{S_n}{s_n} \Rightarrow N,$$
where $S_n = X_{n,1}+\cdots+X_{n,r_n}$.

Discussions
Theorem 27.1 (Lindeberg-Lévy theorem) is a special case of Theorem 27.2. Specifically, (i) $r_n = n$, (ii) $X_{n,k} = X_k$, (iii) the finite-variance and (iv) the identically-distributed assumptions give
$$\lim_{n\to\infty}\frac{1}{n\sigma^2}\sum_{k=1}^{n}E\big[X_k^2 I_{[|X_k|\ge\varepsilon\sigma\sqrt{n}]}\big] = \lim_{n\to\infty}\frac{1}{\sigma^2}E\big[X_1^2 I_{[|X_1|\ge\varepsilon\sigma\sqrt{n}]}\big] = 0,$$
where $\sigma^2 = E[X_1^2]$.

Lindeberg Theorem 27-13

Proof: Without loss of generality, assume $s_n = 1$ (since we can replace each $X_{n,k}$ by $X_{n,k}/s_n$). Hence, Lindeberg's condition reduces to
$$\lim_{n\to\infty}\sum_{k=1}^{r_n}\int_{[|x|\ge\varepsilon]}x^2\,dF_{X_{n,k}}(x) = \lim_{n\to\infty}\sum_{k=1}^{r_n}E\big[X_{n,k}^2 I_{[|X_{n,k}|\ge\varepsilon]}\big] = 0.$$

Since $E[X_{n,k}] = 0$,
$$\left|\varphi_{X_{n,k}}(t) - \left(1 - \frac{t^2}{2}E[X_{n,k}^2]\right)\right| \le E\big[\min\{|tX_{n,k}|^3, |tX_{n,k}|^2\}\big] < \infty.$$

Lemma If $X$ has a moment of order $n$, then
$$\left|\varphi(t) - \sum_{k=0}^{n}\frac{(it)^kE[X^k]}{k!}\right| \le E\left[\min\left\{\frac{|tX|^{n+1}}{(n+1)!}, \frac{2|tX|^n}{n!}\right\}\right] \quad\left(\le E\big[\min\{|tX|^{n+1}, |tX|^n\}\big]\ \text{for } n\ge 2\right).$$

Lindeberg Theorem 27-14

Observe that
$$E\big[\min\{|tX_{n,k}|^3, |tX_{n,k}|^2\}\big] = \int_{[|x|<\varepsilon]}\min\{|tx|^3,|tx|^2\}\,dF_{X_{n,k}}(x) + \int_{[|x|\ge\varepsilon]}\min\{|tx|^3,|tx|^2\}\,dF_{X_{n,k}}(x)$$
$$\le \int_{[|x|<\varepsilon]}|tx|^3\,dF_{X_{n,k}}(x) + \int_{[|x|\ge\varepsilon]}|tx|^2\,dF_{X_{n,k}}(x)$$
$$\le \varepsilon|t|\int_{[|x|<\varepsilon]}|tx|^2\,dF_{X_{n,k}}(x) + \int_{[|x|\ge\varepsilon]}|tx|^2\,dF_{X_{n,k}}(x)$$
$$\le \varepsilon|t|^3E[X_{n,k}^2] + t^2\int_{[|x|\ge\varepsilon]}x^2\,dF_{X_{n,k}}(x),$$
which implies that
$$\sum_{k=1}^{r_n}\left|\varphi_{X_{n,k}}(t) - \left(1 - \frac{t^2}{2}E[X_{n,k}^2]\right)\right| \le \varepsilon|t|^3\sum_{k=1}^{r_n}E[X_{n,k}^2] + t^2\sum_{k=1}^{r_n}\int_{[|x|\ge\varepsilon]}x^2\,dF_{X_{n,k}}(x) = \varepsilon|t|^3 + t^2\sum_{k=1}^{r_n}\int_{[|x|\ge\varepsilon]}x^2\,dF_{X_{n,k}}(x).$$

Lindeberg Theorem 27-15

As $\varepsilon$ can be made arbitrarily small,
$$\lim_{n\to\infty}\sum_{k=1}^{r_n}\left|\varphi_{X_{n,k}}(t) - \left(1 - \frac{t^2}{2}E[X_{n,k}^2]\right)\right| = 0.$$
By
$$\left|\prod_{k=1}^{r_n}\varphi_{X_{n,k}}(t) - \prod_{k=1}^{r_n}\left(1 - \frac{t^2}{2}E[X_{n,k}^2]\right)\right| \le \sum_{k=1}^{r_n}\left|\varphi_{X_{n,k}}(t) - \left(1 - \frac{t^2}{2}E[X_{n,k}^2]\right)\right|,$$
we get
$$\lim_{n\to\infty}\left|\prod_{k=1}^{r_n}\varphi_{X_{n,k}}(t) - \prod_{k=1}^{r_n}\left(1 - \frac{t^2}{2}E[X_{n,k}^2]\right)\right| = 0.$$
(Hint: Is this correct? See the end of the proof!)
It remains to show that
$$\lim_{n\to\infty}\prod_{k=1}^{r_n}\left(1 - \frac{t^2}{2}E[X_{n,k}^2]\right) = e^{-t^2/2}.$$

Lindeberg Theorem 27-16

$$\left|e^{-t^2/2} - \prod_{k=1}^{r_n}\left(1 - \frac{t^2}{2}E[X_{n,k}^2]\right)\right| = \left|\prod_{k=1}^{r_n}e^{-t^2E[X_{n,k}^2]/2} - \prod_{k=1}^{r_n}\left(1 - \frac{t^2}{2}E[X_{n,k}^2]\right)\right| \le \sum_{k=1}^{r_n}\left|e^{-t^2E[X_{n,k}^2]/2} - \left(1 - \frac{t^2}{2}E[X_{n,k}^2]\right)\right|.$$

For each complex $z$,
$$|e^z - 1 - z| = \left|z^2\sum_{k=2}^{\infty}\frac{z^{k-2}}{k!}\right| \le |z|^2\sum_{k=2}^{\infty}\frac{|z|^{k-2}}{(k-2)!} = |z|^2e^{|z|}.$$
(Take $z = -\frac{t^2}{2}E[X_{n,k}^2]$.) Then
$$\sum_{k=1}^{r_n}\left|e^{-t^2E[X_{n,k}^2]/2} - \left(1 - \frac{t^2}{2}E[X_{n,k}^2]\right)\right| \le \frac{1}{4}t^4e^{t^2/2}\sum_{k=1}^{r_n}E^2[X_{n,k}^2] \qquad (\text{by } E[X_{n,k}^2] \le s_n^2 = 1)$$
$$\le \frac{1}{4}t^4e^{t^2/2}\left(\max_{1\le k\le r_n}E[X_{n,k}^2]\right)\sum_{k=1}^{r_n}E[X_{n,k}^2] = \frac{1}{4}t^4e^{t^2/2}\max_{1\le k\le r_n}E[X_{n,k}^2].$$

Lindeberg Theorem 27-17

Finally,
$$E[X_{n,k}^2] \le \varepsilon^2 + \int_{[|x|\ge\varepsilon]}x^2\,dF_{X_{n,k}}(x),$$
which implies that
$$\max_{1\le k\le r_n}E[X_{n,k}^2] \le \varepsilon^2 + \max_{1\le k\le r_n}\int_{[|x|\ge\varepsilon]}x^2\,dF_{X_{n,k}}(x) \le \varepsilon^2 + \sum_{k=1}^{r_n}\int_{[|x|\ge\varepsilon]}x^2\,dF_{X_{n,k}}(x).$$
The proof is then completed by taking arbitrarily small $\varepsilon$ and applying Lindeberg's condition.

Remark: $\left(1 - \frac{t^2}{2}E[X_{n,k}^2]\right)$ is a complex number of modulus at most 1 for $t^2 \le 4/E[X_{n,k}^2]$, which resolves the earlier hint.

Converse to Lindeberg theorem Given an array of independent zero-mean random variables $X_{n,1},\dots,X_{n,r_n}$, if $S_n/s_n \Rightarrow N$, then Lindeberg's condition holds, provided that $\max_{1\le k\le r_n}E[X_{n,k}^2]/s_n^2 \to 0$.

Lindeberg Theorem 27-18

Without the extra condition
$$\max_{1\le k\le r_n}E[X_{n,k}^2]/s_n^2 \to 0,$$
the converse of the Lindeberg theorem may not be valid.

Counterexample Let $X_{n,k}$ be Gaussian distributed with mean 0 and variance 1 for $k < r_n = n$, and let $X_{n,n}$ be Gaussian distributed with mean 0 and variance $n-1$. Thus,
$$s_n^2 = \sum_{k=1}^{n}E[X_{n,k}^2] = (n-1) + (n-1) = 2(n-1).$$

Lindeberg Theorem 27-19

Then $S_n/s_n = (X_{n,1}+X_{n,2}+\cdots+X_{n,n})/s_n$ is Gaussian distributed with mean 0 and variance 1, but
$$\frac{1}{s_n^2}\sum_{k=1}^{n}\int_{[|x|\ge\varepsilon s_n]}x^2\,dF_{X_{n,k}}(x) \ge \frac{1}{s_n^2}\int_{[|x|\ge\varepsilon s_n]}x^2\,dF_{X_{n,n}}(x) = \frac{1}{s_n^2}\int_{[|x|\ge\varepsilon s_n]}x^2\,\frac{e^{-x^2/(2(n-1))}}{\sqrt{2\pi(n-1)}}\,dx$$
$$= \frac{n-1}{2(n-1)}\int_{[|y|\ge\varepsilon\sqrt{2}]}y^2\,\frac{e^{-y^2/2}}{\sqrt{2\pi}}\,dy \qquad (\text{where } x = \sqrt{n-1}\,y)$$
$$= \frac{1}{2}\int_{[|y|\ge\varepsilon\sqrt{2}]}y^2\,\frac{e^{-y^2/2}}{\sqrt{2\pi}}\,dy > 0,$$
and hence Lindeberg's condition fails. Notably,
$$\max_{1\le k\le n}\frac{E[X_{n,k}^2]}{s_n^2} = \frac{n-1}{2(n-1)} = \frac{1}{2}$$
does not converge to zero.
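
The failure can be computed in closed form using the Gaussian tail identity $\int_a^{\infty}y^2\phi(y)\,dy = a\phi(a) + 1 - \Phi(a)$. A small sketch (my addition) evaluating the Lindeberg sum of this array:

```python
import numpy as np
from scipy import stats

def tail_second_moment(sigma, a):
    """E[X^2 1{|X| >= a}] for X ~ N(0, sigma^2)."""
    u = a / sigma
    return sigma**2 * 2 * (u * stats.norm.pdf(u) + stats.norm.sf(u))

def lindeberg_sum_gaussian_array(n, eps):
    # n-1 unit-variance Gaussians plus one N(0, n-1); s_n^2 = 2(n-1).
    s = np.sqrt(2 * (n - 1))
    total = (n - 1) * tail_second_moment(1.0, eps * s) \
            + tail_second_moment(np.sqrt(n - 1), eps * s)
    return total / s**2

eps = 0.1
u = eps * np.sqrt(2)
limit = u * stats.norm.pdf(u) + stats.norm.sf(u)   # (1/2) * two-sided integral
for n in [10, 100, 10000]:
    print(n, lindeberg_sum_gaussian_array(n, eps), limit)  # stays away from zero
```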

Goncharov's theorem 27-20

Observation If the $|X_{n,k}|$ are uniformly bounded for all $n$ and $k$, and $s_n \to \infty$ as $n \to \infty$, then Lindeberg's condition holds.

Proof: Let $M$ be the bound for $|X_{n,k}|$, namely $\Pr[|X_{n,k}| \le M] = 1$ for each $k$ and $n$. Then for any $\varepsilon > 0$, $\varepsilon s_n$ will ultimately exceed $M$, and therefore $\int_{[|x|\ge\varepsilon s_n]}x^2\,dF_{X_{n,k}}(x) = 0$ for all sufficiently large $n$ and every $k$, which implies
$$\lim_{n\to\infty}\frac{1}{s_n^2}\sum_{k=1}^{r_n}\int_{[|x|\ge\varepsilon s_n]}x^2\,dF_{X_{n,k}}(x) = 0.$$

Goncharov's theorem 27-21

Example 27.3 Let $\Pr[Y_{n,k} = 1] = 1/k$ and $\Pr[Y_{n,k} = 0] = 1 - 1/k$. Let $X_{n,k} = Y_{n,k} - E[Y_{n,k}] = Y_{n,k} - 1/k$ for $1 \le k \le n$. Then $E[X_{n,k}] = 0$, $E[X_{n,k}^2] = \frac{1}{k} - \frac{1}{k^2}$, and $s_n^2 = \sum_{k=1}^{n}\left(\frac{1}{k} - \frac{1}{k^2}\right)$.

Since $|X_{n,k}|$ is bounded by 1 with probability 1, and $s_n^2 \to \infty$, Lindeberg's condition holds. The Lindeberg theorem thus concludes:
$$\frac{X_{n,1}+\cdots+X_{n,n}}{s_n} = \frac{Y_{n,1}+\cdots+Y_{n,n} - \sum_{k=1}^{n}(1/k)}{\sqrt{\sum_{k=1}^{n}(1/k) - \sum_{k=1}^{n}(1/k^2)}} \Rightarrow N.$$

Goncharov's theorem 27-22

Goncharov's theorem:
$$\frac{Y_{n,1}+\cdots+Y_{n,n} - \log n}{\sqrt{\log n}} \Rightarrow N.$$

Theorem 25.4 If $X_n \Rightarrow X$ and $X_n - Y_n \to 0$ in probability, then $Y_n \Rightarrow X$.

Proof: Goncharov's theorem can be easily proved by
$$\frac{(Y_{n,1}+\cdots+Y_{n,n}) - \sum_{k=1}^{n}(1/k)}{\sqrt{\sum_{k=1}^{n}(1/k) - \sum_{k=1}^{n}(1/k^2)}} \Rightarrow N$$
and
$$\frac{(Y_{n,1}+\cdots+Y_{n,n}) - \sum_{k=1}^{n}(1/k)}{\sqrt{\sum_{k=1}^{n}(1/k) - \sum_{k=1}^{n}(1/k^2)}} - \frac{(Y_{n,1}+\cdots+Y_{n,n}) - \log n}{\sqrt{\log n}} \to 0.$$
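
A direct simulation of Goncharov's statement (my addition): independent $Y_k \sim \mathrm{Bernoulli}(1/k)$, with the sum centered by $\log n$ and scaled by $\sqrt{\log n}$.

```python
import numpy as np

rng = np.random.default_rng(6)
n, trials = 5000, 5000
k = np.arange(1, n + 1)
y = rng.random((trials, n)) < 1.0 / k        # Y_k ~ Bernoulli(1/k), independent
z = (y.sum(axis=1) - np.log(n)) / np.sqrt(np.log(n))
print(z.mean(), z.std())   # roughly 0 and 1 (convergence in log n is slow)
```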

Lyapounov's condition 27-23

Discussions
Lindeberg's condition is in a sense definitive: it is the sufficient and necessary condition for the normalized row sums of an independent array of random variables to converge to the standard normal, provided that the variances are uniformly asymptotically negligible.
It may, however, not be easy to verify Lindeberg's condition. A useful sufficient condition for Lindeberg's condition, which is often easier to verify, is Lyapounov's condition.

Lyapounov's condition Fix an array of independent zero-mean random variables $X_{n,1},\dots,X_{n,r_n}$. For some $\delta > 0$,
$$\lim_{n\to\infty}\frac{1}{s_n^{2+\delta}}\sum_{k=1}^{r_n}E\big[|X_{n,k}|^{2+\delta}\big] = 0,$$
where $s_n^2 = \sum_{k=1}^{r_n}E[X_{n,k}^2]$.

Lyapounov's condition 27-24

Observation Lyapounov's condition implies Lindeberg's condition.

Proof:
$$\frac{1}{s_n^2}\sum_{k=1}^{r_n}\int_{[|x|\ge\varepsilon s_n]}x^2\,dF_{X_{n,k}}(x) \le \frac{1}{s_n^2}\sum_{k=1}^{r_n}\int_{[|x|\ge\varepsilon s_n]}x^2\left(\frac{|x|}{\varepsilon s_n}\right)^{\delta}dF_{X_{n,k}}(x) = \frac{1}{\varepsilon^{\delta}}\cdot\frac{1}{s_n^{2+\delta}}\sum_{k=1}^{r_n}\int_{[|x|\ge\varepsilon s_n]}|x|^{2+\delta}\,dF_{X_{n,k}}(x) \le \frac{1}{\varepsilon^{\delta}}\cdot\frac{1}{s_n^{2+\delta}}\sum_{k=1}^{r_n}\int_{\mathbb{R}}|x|^{2+\delta}\,dF_{X_{n,k}}(x).$$

Theorem 27.3 For an array of independent zero-mean random variables $X_{n,1},\dots,X_{n,r_n}$, if Lyapounov's condition holds for some positive $\delta$, then
$$\frac{S_n}{s_n} \Rightarrow N,$$
where $S_n = X_{n,1}+\cdots+X_{n,r_n}$.
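
A companion to the Lindeberg sketch above (again my own helper, not from the notes): a Monte Carlo estimate of the Lyapounov ratio for one row; for centered i.i.d. rows it should decay like $r_n^{-\delta/2}$.

```python
import numpy as np

def lyapounov_ratio(sampler, r_n, delta, trials=5000, rng=None):
    """Monte Carlo estimate of s_n^{-(2+delta)} * sum_k E|X_{n,k}|^{2+delta}
    for one row of a triangular array; sampler(k, size) draws X_{n,k}."""
    rng = rng or np.random.default_rng(0)
    rows = [sampler(k, trials) for k in range(1, r_n + 1)]
    s = np.sqrt(sum(np.mean(x**2) for x in rows))
    return sum(np.mean(np.abs(x)**(2 + delta)) for x in rows) / s**(2 + delta)

rng = np.random.default_rng(7)
for r in [10, 100, 1000]:
    # Centered exponential rows, delta = 1: ratio ~ r^(-1/2).
    print(r, lyapounov_ratio(lambda k, m: rng.exponential(size=m) - 1.0, r, delta=1))
```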

Lyapounov's condition 27-25

Example 27.4 Lyapounov's condition is always valid for an i.i.d. sequence with bounded $(2+\delta)$th absolute moment.

Proof:
$$\lim_{n\to\infty}\frac{1}{s_n^{2+\delta}}\sum_{k=1}^{r_n}E\big[|X_{n,k}|^{2+\delta}\big] = \lim_{n\to\infty}\frac{r_n\,E\big[|X_1|^{2+\delta}\big]}{r_n^{1+\delta/2}\,E^{1+\delta/2}[X_1^2]} = \frac{E\big[|X_1|^{2+\delta}\big]}{E^{1+\delta/2}[X_1^2]}\lim_{n\to\infty}\frac{1}{r_n^{\delta/2}} = 0.$$

Problem of coupon collector 27-26

Example 27.5 (Problem of coupon collector) A coupon collector has to collect $r_n$ distinct coupons to exchange for some free gift. Each purchase gives him one coupon, randomly and with replacement. The statistical behavior of this collection can be described as follows.

Coupons are drawn from a coupon population of size $n$, randomly and with replacement, until the number of distinct coupons that have been collected is $r_n$, where $r_n \le n$. Let $S_n$ be the number of purchases required for this collection. Assume that $r_n/n \to \rho$ for some $0 < \rho < 1$. What is the approximate distribution of $(S_n - E[S_n])/\sqrt{\mathrm{Var}[S_n]}$?

We can also apply this problem to, say, the situation where $r_n$ out of $n$ pieces need to be collected in order to recover the original information.

Problem of coupon collector 27-27

Solution: If $k-1$ distinct coupons have thus far been collected, the number of purchases until the next distinct coupon enters is distributed as $X_k$, where
$$\Pr[X_k = j] = (1-p_k)^{j-1}p_k \quad\text{for } j = 1, 2, 3, \dots$$
and
$$p_k = \frac{n-k+1}{n} = 1 - \frac{k-1}{n}.$$
(Notably, we wish to see any one of the remaining $n-(k-1)$ coupons appear.) Then
$$S_n = X_1 + X_2 + \cdots + X_{r_n}, \qquad E[X_k] = \frac{1}{p_k}, \qquad \mathrm{Var}[X_k] = \frac{1-p_k}{p_k^2}.$$

Problem of coupon collector 27-28

Hence,
$$E[S_n] = \sum_{k=1}^{r_n}\frac{1}{p_k} = \sum_{k=1}^{r_n}\frac{1}{1-\frac{k-1}{n}} \approx n\int_{0}^{r_n/n}\frac{1}{1-x}\,dx,$$
which implies that
$$\lim_{n\to\infty}\frac{E[S_n]}{n\log[1/(1-\rho)]} = 1.$$
Similarly,
$$s_n^2 = \sum_{k=1}^{r_n}\mathrm{Var}[X_k] = \sum_{k=1}^{r_n}\frac{1-p_k}{p_k^2} = \sum_{k=1}^{r_n}\frac{\frac{k-1}{n}}{\left(1-\frac{k-1}{n}\right)^2} \approx n\int_{0}^{r_n/n}\frac{x}{(1-x)^2}\,dx,$$
which implies that
$$\lim_{n\to\infty}\frac{s_n^2}{n[\rho/(1-\rho)+\log(1-\rho)]} = 1.$$

Problem of coupon collector 27-29

Therefore, since Lyapounov's condition holds for $\delta = 2$, i.e.,
$$\frac{1}{s_n^4}\sum_{k=1}^{r_n}E\big[|X_k - E[X_k]|^4\big] \le \frac{1}{s_n^4}\sum_{k=1}^{r_n}E\big[X_k^4\big] \le \frac{1}{s_n^4}\sum_{k=1}^{r_n}\frac{6(2-p_k)(2-2p_k+p_k^2)}{p_k^4} \le \frac{1}{s_n^4}\sum_{k=1}^{r_n}\frac{24}{p_k^4} \approx \frac{24n}{s_n^4}\int_{0}^{r_n/n}\frac{1}{(1-x)^4}\,dx \to 0,$$
we obtain
$$\frac{S_n - n\log[1/(1-\rho)]}{\sqrt{n[\rho/(1-\rho)+\log(1-\rho)]}} \Rightarrow N.$$
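
A simulation of Example 27.5 (my addition; $n$ and $\rho$ arbitrary): $S_n$ is a sum of independent geometrics, and the standardized sum should be approximately standard normal.

```python
import numpy as np

rng = np.random.default_rng(8)
n, rho = 1000, 0.5
r = int(rho * n)
p = 1.0 - np.arange(r) / n                     # p_k = 1 - (k-1)/n, k = 1..r
s = np.array([rng.geometric(p).sum() for _ in range(5000)])
mean = n * np.log(1 / (1 - rho))
var = n * (rho / (1 - rho) + np.log(1 - rho))
z = (s - mean) / np.sqrt(var)
print(z.mean(), z.std())                       # approximately 0 and 1
```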

Lindeberg condition ⇏ Lyapounov condition 27-30

Counterexample Lindeberg's condition does not necessarily imply Lyapounov's condition. Consider a sequence of independent random variables, each with density function
$$f(x) = \frac{c}{x^3(\log x)^2} \quad\text{for } x \ge e \approx 2.71828\ldots,$$
where the normalizing constant $c$ satisfies (substituting $u = \log x$)
$$\frac{1}{c} = \int_{1}^{\infty}\frac{e^{-2u}}{u^2}\,du \approx 0.0375343\ldots$$
It can be verified that the i.i.d. sequence has finite marginal second moment:
$$E[X^2] = \int_{e}^{\infty}\frac{c}{x(\log x)^2}\,dx = c\int_{1}^{\infty}\frac{du}{u^2} = c;$$
hence, Lindeberg's condition holds. However, for every $\delta > 0$,
$$E\big[|X|^{2+\delta}\big] = \int_{e}^{\infty}\frac{c\,x^{\delta-1}}{(\log x)^2}\,dx = c\int_{1}^{\infty}\frac{e^{\delta u}}{u^2}\,du = \infty,$$
where $u = \log x$. Accordingly, Lyapounov's condition does not hold!
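
These moment computations are easy to confirm numerically (my addition, using scipy.integrate.quad in the $u = \log x$ variable):

```python
import numpy as np
from scipy.integrate import quad

# Normalization: 1/c = int_1^inf exp(-2u)/u^2 du  (u = log x).
inv_c, _ = quad(lambda u: np.exp(-2 * u) / u**2, 1, np.inf)
print(inv_c)                                   # ~0.0375343
c = 1.0 / inv_c

# Second moment: E[X^2] = c * int_1^inf u^{-2} du = c (finite).
m2, _ = quad(lambda u: c / u**2, 1, np.inf)
print(m2, c)

# (2+delta)-th moment: c * int_1^inf exp(delta*u)/u^2 du diverges for delta > 0,
# so Lyapounov's condition fails for every delta.
```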

Dependent variables 27-31

Can we extend the central limit theorem to sequences of dependent variables?

Definition (α-mixing) A sequence of random variables $X_1, X_2, \dots$ is said to be α-mixing if there exists a non-negative sequence $\alpha_1, \alpha_2, \alpha_3, \dots$ such that $\lim_{n\to\infty}\alpha_n = 0$ and
$$\sup_{k\ge 1}\ \sup_{H\subseteq\mathbb{R}^k,\;G\subseteq\mathbb{R}^{\infty}}\Big|\Pr\big[(X^k\in H)\cap(X_{n+k}^{\infty}\in G)\big] - \Pr\big[X^k\in H\big]\Pr\big[X_{n+k}^{\infty}\in G\big]\Big| \le \alpha_n.$$

Operational meaning: By α-mixing, we mean that $X^k = (X_1, X_2, \dots, X_k)$ and $X_{n+k}^{\infty} = (X_{n+k}, X_{n+k+1}, \dots)$ are approximately independent when $n$ is large.

An independent sequence is α-mixing with $\alpha_k = 0$ for all $k = 1, 2, 3, \dots$

m-dependent 27-32

Definition A sequence of random variables is said to be m-dependent if $(X_i,\dots,X_{i+k})$ and $(X_{i+k+n},\dots,X_{i+k+n+l})$ are independent whenever $n > m$.

An independent sequence is 0-dependent.
An m-dependent sequence is α-mixing with $\alpha_n = 0$ for $n > m$.

Example 27.7 Let $Y_1, Y_2, \dots$ be an i.i.d. sequence. Define $X_n = f(Y_n, Y_{n+1}, \dots, Y_{n+m})$ for a real Borel-measurable function $f$ on $\mathbb{R}^{m+1}$. Then $X_1, X_2, \dots$ is stationary and m-dependent.

α-mixing and m-dependent 27-33

Example 27.6 Let $Y_1, Y_2, \dots$ be a first-order finite-state Markov chain with positive transition probabilities. Suppose the initial probability used is the one that makes $Y_1, Y_2, \dots$ stationary. Then
$$\Pr[Y_1 = i_1,\dots,Y_k = i_k,\ Y_{k+n} = j_0,\dots,Y_{k+n+l} = j_l] = \big(p_{i_1}p_{i_1i_2}\cdots p_{i_{k-1}i_k}\big)\,p_{i_kj_0}^{(n)}\,\big(p_{j_0j_1}\cdots p_{j_{l-1}j_l}\big),$$
where $p_{ij} = P_{Y_n|Y_{n-1}}(j|i)$ and $p_{ij}^{(k)} = P_{Y_n|Y_{n-k}}(j|i)$. Also,
$$\Pr[Y_1 = i_1,\dots,Y_k = i_k]\Pr[Y_{k+n} = j_0,\dots,Y_{k+n+l} = j_l] = \big(p_{i_1}p_{i_1i_2}\cdots p_{i_{k-1}i_k}\big)\,p_{j_0}\big(p_{j_0j_1}\cdots p_{j_{l-1}j_l}\big).$$

α-mixing and m-dependent 27-34

Theorem 8.9 There exists a stationary distribution $\{\pi_i\}$ for a finite-state, irreducible, aperiodic, first-order Markov chain such that
$$\big|p_{ij}^{(n)} - \pi_j\big| \le A\rho^n,$$
where $A \ge 0$ and $0 \le \rho < 1$.

A first-order Markov chain is aperiodic if the greatest common divisor of the integers in the set $\{n \in \mathbb{N} : p_{jj}^{(n)} > 0\}$ is 1 for every $j$.
A Markov chain is irreducible if for every $i$ and $j$, $p_{ij}^{(n)} > 0$ for some $n$.

α-mixing and m-dependent 27-35 = = Pr [( Y k H ) ( Y n+k+l n+k G )] Pr [ Y k H ] Pr [ Y n+k+l n+k G ] ( Pr[Y = i,...,y k = i k,y k+n = j 0,...,Y k+n+l = j l ] i k H j0 l G Pr[Y = i,...,y k = i k ]Pr[Y k+n = j 0,...,Y k+n+l = j l ]) ( ) ( ) pi p i i 2 p ik i k p (n) (pj0 ) i k j 0 p j0 j p jl j l i k H i k H j0 l G j l 0 G ( pi p i i 2 p ik i k ) p (n) i k j 0 p j0 ( pj0 j p jl j l ) ( )( ) pi p i i 2 p ik i pj0 k j p jl j l Aρ n i k H Aρ n j l 0 G ( pi p i i 2 p ik i k )( pj0 j p jl j l ) i k Yk j0 l Yl+ = Aρ n =A Y ρ n, j 0 Y for some A 0and0 ρ<.

α-mixing and m-dependent 27-36

Since the upper bound is independent of $k$, $l \ge 0$, $H \subseteq \mathbb{R}^k$ and $G \subseteq \mathbb{R}^{l+1}$, we obtain that $Y_1, Y_2, \dots$ is α-mixing with $\alpha_n = A|\mathcal{Y}|\rho^n$.
Hence, a finite-state, irreducible, aperiodic, first-order Markov chain is α-mixing with exponentially decaying $\alpha_n$.

Central limit theorem for α-mixing sequence 27-37

Theorem 27.4 Suppose that $X_1, X_2, \dots$ is zero-mean, stationary and α-mixing with $\alpha_n = O(n^{-5})$ as $n \to \infty$. Assume that $E[X_n^{12}] < \infty$. Then
1. $\frac{1}{n}\mathrm{Var}[S_n] \to \sigma^2 = E[X_1^2] + 2\sum_{k=1}^{\infty}E[X_1X_{1+k}]$.
2. $\frac{S_n}{\sigma\sqrt{n}} \Rightarrow N$ (equivalently, $\frac{S_n}{\sqrt{\mathrm{Var}[S_n]}} \Rightarrow N$).

Proof: No proof is provided.

Central limit theorem for α-mixing sequence 27-38

Example 27.8 Let $Y_1, Y_2, \dots$ be a first-order finite-state Markov chain with positive transition probabilities. Suppose the initial probability used is the one that makes $Y_1, Y_2, \dots$ stationary.
Example 27.6 already proves that $Y_1, Y_2, \dots$ is α-mixing with $\alpha_n = A|\mathcal{Y}|\rho^n$. Thus, $X_1 = Y_1 - E[Y_1], X_2 = Y_2 - E[Y_2], \dots$ is also α-mixing with $\alpha_n = A|\mathcal{Y}|\rho^n$. Furthermore, the finite-state assumption indicates that all moments of $X_n$ are bounded. Accordingly, Theorem 27.4 holds, namely,
$$\frac{X_1+\cdots+X_n}{\sqrt{\mathrm{Var}[X_1+\cdots+X_n]}} \Rightarrow N.$$

Central limit theorem for α-mixing sequence 27-39

More specifically, let $Y_n \in \{-1, +1\}$ and
$$P_{Y_n|Y_{n-1}}(+1\,|\,-1) = P_{Y_n|Y_{n-1}}(-1\,|\,+1) = \varepsilon.$$
Since the probability transition matrix can be expressed as
$$\begin{bmatrix}1-\varepsilon & \varepsilon\\ \varepsilon & 1-\varepsilon\end{bmatrix} = \begin{bmatrix}\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}\end{bmatrix}\begin{bmatrix}1 & 0\\ 0 & 1-2\varepsilon\end{bmatrix}\begin{bmatrix}\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}\end{bmatrix}^{T},$$
we have
$$\begin{bmatrix}1-\varepsilon & \varepsilon\\ \varepsilon & 1-\varepsilon\end{bmatrix}^{n} = \begin{bmatrix}\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}\end{bmatrix}\begin{bmatrix}1 & 0\\ 0 & (1-2\varepsilon)^{n}\end{bmatrix}\begin{bmatrix}\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}\end{bmatrix}^{T} = \begin{bmatrix}\frac{1}{2}+\frac{1}{2}(1-2\varepsilon)^{n} & \frac{1}{2}-\frac{1}{2}(1-2\varepsilon)^{n}\\ \frac{1}{2}-\frac{1}{2}(1-2\varepsilon)^{n} & \frac{1}{2}+\frac{1}{2}(1-2\varepsilon)^{n}\end{bmatrix},$$
and $Y_1, Y_2, \dots$ is α-mixing with $\alpha_n = 2(1/2)|1-2\varepsilon|^{n} = |1-2\varepsilon|^{n}$.

Central limit theorem for α-mixing sequence 27-40

Theorem 27.4 then gives that
$$\frac{\mathrm{Var}[Y_1+\cdots+Y_n]}{n} \to E[Y_1^2] + 2\sum_{k=1}^{\infty}E[Y_1Y_{1+k}]$$
$$= \sum_{y_1\in\{-1,1\}}y_1^2P_{Y_1}(y_1) + 2\sum_{k=1}^{\infty}\sum_{y_1\in\{-1,1\}}\sum_{y_{1+k}\in\{-1,1\}}y_1y_{1+k}\,P_{Y_{1+k}|Y_1}(y_{1+k}|y_1)\,P_{Y_1}(y_1)$$
$$= 1 + 2\sum_{k=1}^{\infty}\sum_{y_1}\sum_{y_{1+k}}\frac{1}{2}\,y_1y_{1+k}\,P_{Y_{1+k}|Y_1}(y_{1+k}|y_1)$$
$$= 1 + 2\sum_{k=1}^{\infty}(1-2\varepsilon)^{k} = \frac{1-\varepsilon}{\varepsilon},$$
and
$$\frac{Y_1+\cdots+Y_n}{\sqrt{n\left(\dfrac{1-\varepsilon}{\varepsilon}\right)}} \Rightarrow N.$$
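
To close (my own addition), a simulation of this two-state chain confirming both the variance constant $(1-\varepsilon)/\varepsilon$ and the normal limit; $\varepsilon$ and the horizon are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(9)
eps, n, trials = 0.3, 2000, 3000

def chain_sum(n):
    """S_n for the stationary two-state chain on {-1,+1} with flip prob eps."""
    flips = rng.random(n - 1) < eps
    signs = np.concatenate(([1.0], np.cumprod(np.where(flips, -1.0, 1.0))))
    y0 = rng.choice([-1.0, 1.0])     # stationary distribution is (1/2, 1/2)
    return (y0 * signs).sum()

s = np.array([chain_sum(n) for _ in range(trials)])
sigma2 = (1 - eps) / eps
print(s.var() / n, sigma2)           # Var[S_n]/n vs (1-eps)/eps
z = s / np.sqrt(n * sigma2)
print(z.mean(), z.std())             # approximately 0 and 1
```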