Lecture 19: Convergence

Asymptotic approach

In statistical analysis or inference, a key to the success of finding a good procedure is being able to find some moments and/or distributions of various statistics. In many complicated problems we are not able to find exactly the moments or distributions of given statistics. When the sample size $n$ is large, we may approximate the moments and distributions of statistics using asymptotic tools, some of which are studied in this course.

In an asymptotic analysis, we consider a sample $X = (X_1,\dots,X_n)$ not for fixed $n$, but as a member of a sequence corresponding to $n = n_0, n_0+1, \dots$, and obtain the limit of the distribution of an appropriately normalized statistic or variable $T_n(X)$ as $n \to \infty$. The limiting distribution and its moments are used as approximations to the distribution and moments of $T_n(X)$ in the situation with a large but actually finite $n$.

UW-Madison (Statistics) Stat 609 Lecture 19 2015
This leads to some asymptotic statistical procedures and asymptotic criteria for assessing their performance. The asymptotic approach is not only applied to the situation where no exact method (the approach considering a fixed $n$) is available, but is also used to provide a procedure simpler (e.g., in terms of computation) than that produced by the exact approach. In addition to providing more theoretical results and/or simpler procedures, the asymptotic approach requires less stringent mathematical assumptions than does the exact approach.

Definition 5.5.1 (convergence in probability)
A sequence of random variables $Z_n$, $n = 1,2,\dots$, converges in probability to a random variable $Z$ iff for every $\varepsilon > 0$,
\[
\lim_{n\to\infty} P(|Z_n - Z| \ge \varepsilon) = 0.
\]
A sequence of random vectors $Z_n$ converges in probability to a random vector $Z$ iff each component of $Z_n$ converges in probability to the corresponding component of $Z$.
Theorem 5.5.2 (Weak Law of Large Numbers (WLLN))
Let $X_1,\dots,X_n$ be iid random variables with $E(X_i) = \mu$ and finite $\mathrm{Var}(X_i) = \sigma^2$. Then the sample mean $\bar{X}$ converges in probability to $\mu$.

Proof. By Chebychev's inequality and Theorem 5.2.6,
\[
P(|\bar{X} - \mu| \ge \varepsilon) \le \frac{\mathrm{Var}(\bar{X})}{\varepsilon^2} = \frac{\sigma^2}{n\varepsilon^2},
\]
which converges to 0 as $n \to \infty$.

Remarks.
1. Although we write the sample mean as $\bar{X}$, it depends on $n$.
2. The WLLN states that the probability of the sample mean $\bar{X}$ being close to the population mean $\mu$ converges to 1.
3. The existence of a finite variance $\sigma^2$ is not needed; we only need the existence of $E(X_i)$. A proof will be given later.
4. The independence assumption is not necessary either: in the previous proof, we only need the $X_i$'s to be uncorrelated.
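The Chebychev argument in the proof can be illustrated by simulation. The following is a minimal sketch, not part of the lecture: it assumes Uniform(0,1) data (so $\mu = 1/2$ and $\sigma^2 = 1/12$), and the function names `tail_frequency` and `chebyshev_bound` are hypothetical names chosen for this illustration.

```python
import random


def tail_frequency(n, eps, reps=1000, seed=0):
    """Estimate P(|Xbar - mu| >= eps) for the mean of n Uniform(0,1) draws."""
    rng = random.Random(seed)
    mu = 0.5  # mean of Uniform(0,1), an assumption of this sketch
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.random() for _ in range(n)) / n
        if abs(xbar - mu) >= eps:
            hits += 1
    return hits / reps


def chebyshev_bound(n, eps, sigma2=1.0 / 12.0):
    """Bound sigma^2 / (n eps^2) from the proof, capped at 1."""
    return min(1.0, sigma2 / (n * eps ** 2))


if __name__ == "__main__":
    eps = 0.05
    for n in (10, 100, 1000):
        # simulated tail frequency next to the Chebychev upper bound
        print(n, tail_frequency(n, eps), chebyshev_bound(n, eps))
```

Both columns should shrink toward 0 as $n$ grows, with the simulated frequency staying below the bound up to simulation noise. The bound is usually far from tight, which is consistent with Remark 3: the WLLN holds under weaker conditions than the proof uses.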
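Example. Suppose that $X_1,\dots,X_n$ are identically distributed with $E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^2 < \infty$, and that
\[
\mathrm{Cov}(X_t, X_s) = \begin{cases} c & \text{if } |s-t| = 1 \\ 0 & \text{if } |s-t| > 1. \end{cases}
\]
Then $\bar{X}$ converges in probability to $\mu$, because
\[
\mathrm{Var}(\bar{X}) = \mathrm{Var}\Big(\frac{1}{n}\sum_{i=1}^n X_i\Big) = \frac{1}{n^2}\mathrm{Var}\Big(\sum_{i=1}^n X_i\Big) = \frac{1}{n^2}\Big(\sum_{i=1}^n \mathrm{Var}(X_i) + \sum_{i\neq j}\mathrm{Cov}(X_i,X_j)\Big) = \frac{\sigma^2}{n} + \frac{2(n-1)c}{n^2}
\]
(there are $2(n-1)$ ordered pairs $(i,j)$ with $|i-j| = 1$) and, by Chebychev's inequality,
\[
P(|\bar{X}-\mu| \ge \varepsilon) \le \frac{\mathrm{Var}(\bar{X})}{\varepsilon^2} = \frac{\sigma^2 + 2(1-\frac{1}{n})c}{n\varepsilon^2} \to 0.
\]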
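The variance computation in this example can be checked by summing the covariance matrix directly. The sketch below is an illustration under stated assumptions: the values $\sigma^2 = 1$ and $c = 0.4$ and the function names are hypothetical choices, and the closed form used is $\sigma^2/n + 2(n-1)c/n^2$, which counts each neighbouring pair in both orders.

```python
def var_of_mean_from_cov(n, sigma2, c):
    """Var(Xbar) obtained by summing all n x n covariance entries, divided by n^2."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                total += sigma2   # Var(X_i) on the diagonal
            elif abs(i - j) == 1:
                total += c        # Cov(X_i, X_j) for neighbours
            # entries with |i - j| > 1 are zero
    return total / n ** 2


def var_of_mean_closed_form(n, sigma2, c):
    """Closed form sigma^2 / n + 2 (n - 1) c / n^2."""
    return sigma2 / n + 2.0 * (n - 1) * c / n ** 2


if __name__ == "__main__":
    sigma2, c = 1.0, 0.4  # illustrative values, not from the lecture
    for n in (2, 10, 100, 1000):
        print(n, var_of_mean_from_cov(n, sigma2, c), var_of_mean_closed_form(n, sigma2, c))
```

The two columns agree, and both decrease at rate $1/n$, which is all that Chebychev's inequality needs.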
A proof of the WLLN using chf's

Let $X_1,\dots,X_n$ be iid random variables with $E|X_1| < \infty$ and $E(X_i) = \mu$. From the result for the chf (Theorem C1), the chf of $X_1$ is differentiable at 0 and
\[
\phi_{X_1}(t) = 1 + \imath\mu t + o(|t|) \quad \text{as } |t| \to 0.
\]
Then the chf of $\bar{X}$ is
\[
\phi_{\bar{X}}(t) = \Big[\phi_{X_1}\Big(\frac{t}{n}\Big)\Big]^n = \Big[1 + \frac{\imath\mu t}{n} + o\Big(\frac{t}{n}\Big)\Big]^n \to e^{\imath\mu t}
\]
for any $t \in \mathbb{R}$ as $n \to \infty$, because $(1 + c_n/n)^n \to e^{c}$ for any complex sequence $\{c_n\}$ satisfying $c_n \to c$.

The limiting function $e^{\imath\mu t}$ is the chf of the constant $\mu$. By Theorem C7, if $F_{\bar{X}}(x)$ is the cdf of $\bar{X}$, then
\[
\lim_{n\to\infty} F_{\bar{X}}(x) = \begin{cases} 1 & x > \mu \\ 0 & x < \mu. \end{cases}
\]
This shows that $\bar{X}$ converges in probability to $\mu$, because of Theorem 5.5.13, to be established later.
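The limit $[\phi_{X_1}(t/n)]^n \to e^{\imath\mu t}$ can be seen numerically for a concrete distribution. The sketch below assumes, purely for illustration, that $X_1$ is exponential with mean $\mu$, whose chf is $1/(1 - \imath\mu t)$; the function names are hypothetical.

```python
import cmath


def chf_exponential(t, mu):
    """chf of an exponential distribution with mean mu: 1 / (1 - i mu t)."""
    return 1.0 / (1.0 - 1j * mu * t)


def chf_sample_mean(t, mu, n):
    """chf of the mean of n iid exponential(mean mu) variables: [phi(t/n)]^n."""
    return chf_exponential(t / n, mu) ** n


def chf_constant(t, mu):
    """chf of the constant mu: exp(i mu t)."""
    return cmath.exp(1j * mu * t)


if __name__ == "__main__":
    t, mu = 1.5, 2.0  # illustrative values
    for n in (1, 10, 100, 10000):
        # modulus of the gap between the chf of Xbar and the limiting chf
        print(n, abs(chf_sample_mean(t, mu, n) - chf_constant(t, mu)))
```

The printed gap shrinks as $n$ grows. Note that only the first moment enters the limit, in line with the assumption $E|X_1| < \infty$.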
Theorem 5.5.4.
Let $Z_1, Z_2, \dots$ be random vectors that converge in probability to a random vector $Z$ and let $h$ be a continuous function. Then $h(Z_1), h(Z_2), \dots$ converges in probability to $h(Z)$.

Example 5.5.3.
Let $X_1, X_2, \dots$ be iid random variables with $E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$. Consider the sample variance
\[
S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \mu)^2 - \frac{n}{n-1}(\bar{X} - \mu)^2.
\]
Define
\[
Z_n = \frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2, \qquad U_n = \bar{X} - \mu, \qquad a_n = \frac{n}{n-1}.
\]
By the WLLN, $(Z_n, U_n)$ converges in probability to $(\sigma^2, 0)$. Note that $a_n \to 1$ and $a_n$ is not random, but we can view $a_n$ as converging in probability to 1. Then, by Theorem 5.5.4, $S^2 = h(a_n, Z_n, U_n) = a_n(Z_n - U_n^2)$ converges in probability to $h(1, \sigma^2, 0) = \sigma^2$.
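The decomposition $S^2 = a_n(Z_n - U_n^2)$ and the convergence of $S^2$ to $\sigma^2$ can both be checked numerically. This is a minimal sketch assuming normal data with $\mu = 3$ and $\sigma = 2$ (arbitrary illustrative values); `sample_variance_two_ways` is a hypothetical helper name.

```python
import random


def sample_variance_two_ways(xs, mu):
    """Return S^2 computed directly and as a_n * (Z_n - U_n^2)."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    z_n = sum((x - mu) ** 2 for x in xs) / n   # Z_n in the example
    u_n = xbar - mu                            # U_n in the example
    a_n = n / (n - 1)                          # a_n in the example
    return s2, a_n * (z_n - u_n ** 2)


if __name__ == "__main__":
    rng = random.Random(1)
    mu, sigma = 3.0, 2.0  # illustrative values, so sigma^2 = 4
    for n in (10, 1000, 100000):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        print(n, *sample_variance_two_ways(xs, mu))
```

The two values on each row agree up to rounding, and both settle near $\sigma^2 = 4$ as $n$ grows.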
Example 5.5.5.
Consider $h(x) = \sqrt{x}$. By Theorem 5.5.4, the sample standard deviation $S = h(S^2)$ converges in probability to the population standard deviation $\sigma = h(\sigma^2)$.

Note that convergence in probability is different from the convergence of a sequence of deterministic functions $g_n(x)$ to a function $g(x)$ for $x$ in a set $A \subset \mathbb{R}^k$. Similar to the convergence of deterministic functions (note that random variables are random functions), we have the following concept.

Definition 5.5.6 (convergence almost surely)
A sequence of random variables $Z_n$, $n = 1,2,\dots$, converges almost surely to a random variable $Z$ iff
\[
P\Big(\lim_{n\to\infty} Z_n = Z\Big) = 1.
\]
A sequence of random vectors $Z_n$, $n = 1,2,\dots$, converges almost surely to a random vector $Z$ iff each component of $Z_n$ converges almost surely to the corresponding component of $Z$.
The almost sure convergence of $Z_n$ to $Z$ means that there is an event $N$ such that $P(N) = 0$ and, for every element $\omega \in N^c$, $\lim_{n\to\infty} Z_n(\omega) = Z(\omega)$, which is almost the same as point-wise convergence for deterministic functions (Example 5.5.7).

If a sequence of random vectors $Z_n$ converges almost surely to a random vector $Z$, and $h$ is a continuous function, then $h(Z_n)$ converges almost surely to $h(Z)$.

If $Z_n$ converges almost surely to $Z$, then $Z_n$ converges in probability to $Z$. Convergence in probability, however, does not imply convergence almost surely (Example 5.5.8).

If $Z_n$ converges in probability fast enough, then it converges almost surely, i.e., if for every $\varepsilon > 0$,
\[
\sum_{n=1}^{\infty} P(|Z_n - Z| \ge \varepsilon) < \infty,
\]
then $Z_n$ converges almost surely to $Z$.
It is, however, not easy to construct an example of convergence in probability but not almost surely.

Similar to the WLLN in Theorem 5.5.2, we have the following result with almost sure convergence.

Theorem 5.5.9 (Strong Law of Large Numbers (SLLN))
Let $X_1,\dots,X_n$ be iid random variables with $E(X_i) = \mu$. Then the sample mean $\bar{X}$ converges almost surely to $\mu$.

Note that we still only require the existence of the mean, not the second-order moment. The proof is omitted, since it is beyond the scope of the textbook.

Approximation to an integral

Suppose that $h(x)$ is a function of $x \in \mathbb{R}^k$. In many applications we want to calculate an integral
\[
\int_{\mathbb{R}^k} h(x)\,dx.
\]
If the integral is not easy to calculate, a numerical method is needed. The following is the so-called Monte Carlo approximation method, which is based on the SLLN.

Suppose that we can generate iid random vectors $X_1, X_2, \dots$ from a pdf $p(x)$ on $\mathbb{R}^k$ satisfying $p(x) > 0$ if $h(x) \neq 0$. By the SLLN, with probability equal to 1 (almost surely),
\[
\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^n \frac{h(X_i)}{p(X_i)} = E\Big(\frac{h(X_1)}{p(X_1)}\Big) = \int_{\mathbb{R}^k} \frac{h(x)}{p(x)}\,p(x)\,dx = \int_{\mathbb{R}^k} h(x)\,dx.
\]
Thus, we can approximate the integral by the average
\[
\frac{1}{n}\sum_{i=1}^n \frac{h(X_i)}{p(X_i)}
\]
with a very large $n$. We can actually find how large $n$ must be to have a good approximation.
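The Monte Carlo recipe above can be sketched in a few lines. The choices here are assumptions made only for illustration: $k = 1$, $h(x) = e^{-x^2}$ (whose integral over $\mathbb{R}$ is $\sqrt{\pi}$), and $p$ the standard normal pdf, which is positive everywhere. The function names are hypothetical.

```python
import math
import random


def h(x):
    """Integrand for this sketch: exp(-x^2), with integral sqrt(pi) over R."""
    return math.exp(-x * x)


def p(x):
    """Sampling pdf for this sketch: standard normal density, positive on all of R."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def monte_carlo_integral(n, seed=0):
    """Average of h(X_i) / p(X_i) over n iid draws X_i from p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)   # X_i drawn from p
        total += h(x) / p(x)
    return total / n


if __name__ == "__main__":
    for n in (100, 10000, 1000000):
        print(n, monte_carlo_integral(n), math.sqrt(math.pi))
```

The ratio $h/p$ is bounded for this pair, so the average stabilizes quickly. A sampling pdf with lighter tails than $h$ would still satisfy $p(x) > 0$ where $h(x) \neq 0$, but could make the average converge much more slowly.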
We often need to consider a convergence even weaker than convergence in probability.

Definition 5.5.10 (convergence in distribution)
A sequence of random variables $Z_n$, $n = 1,2,\dots$, converges in distribution to a random variable $Z$ iff
\[
\lim_{n\to\infty} F_{Z_n}(x) = F_Z(x), \qquad x \in \{y : F_Z \text{ is continuous at } y\},
\]
where $F_{Z_n}$ and $F_Z$ are the cdf's of $Z_n$ and $Z$, respectively.

Note that we only need to consider the convergence at $x$ that is a continuity point of $F_Z$.

Note that cdf's, not pdf's or pmf's, are involved in this definition.

In convergence in distribution, it is really the cdf's that converge, not the random variables; in fact, the random variables can be defined on different spaces, which is very different from convergence in probability or almost surely.
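Example 5.5.11.
Let $X_1, X_2, \dots$ be iid from the uniform distribution on $(0,1)$ and $X_{(n)} = \max_{i=1,\dots,n} X_i$. For every $\varepsilon \in (0,1)$,
\[
P(|X_{(n)} - 1| \ge \varepsilon) = P(X_{(n)} \le 1 - \varepsilon) + P(X_{(n)} \ge 1 + \varepsilon) = P(X_{(n)} \le 1 - \varepsilon) = P(X_i \le 1 - \varepsilon,\ i = 1,\dots,n) = (1-\varepsilon)^n
\]
(for $\varepsilon \ge 1$ the probability is 0). Hence, $X_{(n)}$ converges in probability to 1. In fact, since
\[
\sum_{n=1}^{\infty} (1-\varepsilon)^n < \infty,
\]
$X_{(n)}$ converges almost surely to 1.

For any $t > 0$ and $n > t$,
\[
P(n(1 - X_{(n)}) \le t) = 1 - P(n(1 - X_{(n)}) > t) = 1 - P(X_{(n)} < 1 - t/n) = 1 - (1 - t/n)^n \to 1 - e^{-t},
\]
which is the cdf of the exponential$(0,1)$ distribution.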
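The limit $1 - (1 - t/n)^n \to 1 - e^{-t}$ for $t > 0$ can be compared with a simulation of $n(1 - X_{(n)})$. This is a rough sketch with hypothetical function names; the sample size $n = 200$ and the number of replications are arbitrary choices.

```python
import math
import random


def scaled_gap_samples(n, reps, seed=0):
    """Draw reps copies of n * (1 - max of n Uniform(0,1) variables)."""
    rng = random.Random(seed)
    return [n * (1.0 - max(rng.random() for _ in range(n))) for _ in range(reps)]


def empirical_cdf(samples, t):
    """Fraction of samples that are <= t."""
    return sum(1 for s in samples if s <= t) / len(samples)


def exact_cdf(n, t):
    """P(n(1 - X_(n)) <= t): 0 for t <= 0, 1 for t >= n, else 1 - (1 - t/n)^n."""
    if t <= 0:
        return 0.0
    if t >= n:
        return 1.0
    return 1.0 - (1.0 - t / n) ** n


if __name__ == "__main__":
    n = 200
    samples = scaled_gap_samples(n, reps=4000)
    for t in (0.5, 1.0, 2.0):
        # simulated cdf, exact finite-n cdf, limiting exponential cdf
        print(t, empirical_cdf(samples, t), exact_cdf(n, t), 1.0 - math.exp(-t))
```

The three columns should be close to one another, with the exact finite-$n$ value approaching the exponential limit as $n$ increases.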
It is clear that $P(n(1 - X_{(n)}) \le t) = 0$ if $t \le 0$. Thus, $n(1 - X_{(n)})$ converges in distribution to $Z \sim$ exponential$(0,1)$.

The next theorem shows that convergence in distribution is weaker than convergence in probability and, hence, is also weaker than almost sure convergence.

Theorem 5.5.12.
If $Z_n$ converges in probability to $Z$, then $Z_n$ converges in distribution to $Z$.

Proof. For any $x \in \mathbb{R}$ and $\varepsilon > 0$,
\[
F_Z(x - \varepsilon) = P(Z \le x - \varepsilon) \le P(Z_n \le x) + P(Z \le x - \varepsilon,\ Z_n > x) \le F_{Z_n}(x) + P(|Z_n - Z| > \varepsilon).
\]
Letting $n \to \infty$, we obtain that
\[
F_Z(x - \varepsilon) \le \liminf_{n} F_{Z_n}(x).
\]
Switching $Z_n$ and $Z$ in the previous argument, we can show that
\[
F_Z(x + \varepsilon) \ge \limsup_{n} F_{Z_n}(x),
\]
i.e.,
\[
F_Z(x - \varepsilon) \le \liminf_{n} F_{Z_n}(x) \le \limsup_{n} F_{Z_n}(x) \le F_Z(x + \varepsilon).
\]
Since $\varepsilon$ is arbitrary,
\[
\lim_{\varepsilon \to 0} F_Z(x - \varepsilon) \le \liminf_{n} F_{Z_n}(x) \le \limsup_{n} F_{Z_n}(x) \le \lim_{\varepsilon \to 0} F_Z(x + \varepsilon).
\]
Now, if $F_Z$ is continuous at $x$, then the limit on the far left-hand side equals the limit on the far right-hand side and both are equal to $F_Z(x)$, which shows that
\[
F_Z(x) = \lim_{n\to\infty} F_{Z_n}(x).
\]

Example.
The converse of Theorem 5.5.12 is not true in general. Let $\theta_n = 1 + n^{-1}$ and let $X_n$ be a random variable having the exponential$(0, \theta_n)$ distribution, $n = 1,2,\dots$ Let $X$ be a random variable $\sim$ exponential$(0,1)$.
For any $x > 0$, as $n \to \infty$,
\[
F_{X_n}(x) = 1 - e^{-x/\theta_n} \to 1 - e^{-x} = F_X(x).
\]
Since $F_{X_n}(x) \equiv 0 \equiv F_X(x)$ for $x \le 0$, we have shown that $X_n$ converges in distribution to $X$.

Does $X_n$ converge in probability to $X$? We need further information about the random variables $X$ and $X_n$. We consider two cases in which different answers can be obtained.

Case 1
Suppose that $X_n \equiv \theta_n X$ (then $X_n$ has the given distribution). Then
\[
X_n - X = (\theta_n - 1)X = n^{-1}X,
\]
which has the cdf $(1 - e^{-nx}) I_{[0,\infty)}(x)$. Then $X_n$ converges in probability to $X$, because, for any $\varepsilon > 0$,
\[
P(|X_n - X| \ge \varepsilon) = e^{-n\varepsilon} \to 0.
\]
In fact, $X_n$ converges almost surely to $X$, since
\[
\sum_{n=1}^{\infty} e^{-n\varepsilon} < \infty.
\]
Case 2
Suppose that $X_n$ and $X$ are independent random variables. Since the pdf's of $X_n$ and $-X$ are $\theta_n^{-1} e^{-x/\theta_n} I_{(0,\infty)}(x)$ and $e^{x} I_{(-\infty,0)}(x)$, respectively, we have
\[
P(|X_n - X| \le \varepsilon) = \int_{-\varepsilon}^{\varepsilon} \int \theta_n^{-1} e^{-x/\theta_n} e^{y-x} I_{(0,\infty)}(x) I_{(-\infty,x)}(y)\,dx\,dy,
\]
which converges to (by the dominated convergence theorem)
\[
\int_{-\varepsilon}^{\varepsilon} \int e^{-x} e^{y-x} I_{(0,\infty)}(x) I_{(-\infty,x)}(y)\,dx\,dy = 1 - e^{-\varepsilon}.
\]
Thus, $P(|X_n - X| \ge \varepsilon) \to e^{-\varepsilon} > 0$ for any $\varepsilon > 0$ and, therefore, $X_n$ does not converge in probability to $X$.

In one situation, however, convergence in distribution is equivalent to convergence in probability, as the following result shows.

Theorem 5.5.13.
$Z_n$ converges in probability to a constant $c$ iff $Z_n$ converges in distribution to $c$.
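Proof. The "only if" part is a special case of Theorem 5.5.12. Hence, we only need to show the "if" part.

If $Z_n$ converges in distribution to a constant $c$, then
\[
\lim_{n\to\infty} P(Z_n \le x) = \begin{cases} 0 & x < c \\ 1 & x > c, \end{cases}
\]
which is the cdf of a constant $c$. (Note that the limit does not include the case of $x = c$, which is a discontinuity point of the cdf of $c$.)

For every $\varepsilon > 0$,
\[
\begin{aligned}
P(|Z_n - c| \ge \varepsilon) &= P(Z_n - c \ge \varepsilon) + P(Z_n - c \le -\varepsilon) \\
&= P(Z_n \ge c + \varepsilon) + P(Z_n \le c - \varepsilon) \\
&= 1 - P(Z_n < c + \varepsilon) + P(Z_n \le c - \varepsilon) \\
&\le 1 - P(Z_n \le c + \varepsilon/2) + P(Z_n \le c - \varepsilon) \\
&\to 1 - 1 + 0 = 0,
\end{aligned}
\]
since $c - \varepsilon < c$ and $c + \varepsilon/2 > c$. This proves that $Z_n$ converges in probability to $c$.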
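Returning to the exponential example that preceded Theorem 5.5.13, the contrast between Case 1 and Case 2 can be checked numerically. The sketch below is only an illustration: it reads exponential$(0,\theta)$ as the exponential distribution with mean $\theta$ (consistent with the cdf $1 - e^{-x/\theta}$ used in the example), and the function names are hypothetical.

```python
import math
import random


def tail_prob_case1(n, eps):
    """Case 1 (X_n = theta_n X): P(|X_n - X| >= eps) = exp(-n eps) exactly."""
    return math.exp(-n * eps)


def tail_prob_case2(n, eps, reps=50000, seed=0):
    """Case 2 (X_n independent of X): simulate P(|X_n - X| >= eps)."""
    rng = random.Random(seed)
    theta_n = 1.0 + 1.0 / n
    hits = 0
    for _ in range(reps):
        x = rng.expovariate(1.0)               # X with mean 1
        x_n = rng.expovariate(1.0 / theta_n)   # X_n with mean theta_n (rate 1/theta_n)
        if abs(x_n - x) >= eps:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    eps = 0.5
    for n in (1, 10, 100):
        # Case 1 tail, simulated Case 2 tail, and the Case 2 limit exp(-eps)
        print(n, tail_prob_case1(n, eps), tail_prob_case2(n, eps), math.exp(-eps))
```

In Case 1 the tail probability vanishes geometrically, while in Case 2 it stays near $e^{-\varepsilon}$ however large $n$ is. Both sequences have the same marginal distributions, so convergence in distribution alone cannot distinguish them.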