Probability and Measure

Robert L. Wolpert
Institute of Statistics and Decision Sciences
Duke University, Durham, NC, USA

Convergence of Random Variables

1. Convergence Concepts

1.1. Convergence of Real Numbers

A sequence of real numbers $a_n$ converges to a limit $a$ if and only if, for each $\epsilon > 0$, the sequence eventually lies within a ball of radius $\epsilon$ centered at $a$. It's okay if the first few (or few million) terms lie outside that ball, and the number of terms that do lie outside the ball may depend on how big $\epsilon$ is (if $\epsilon$ is small enough it may take millions of terms before the remaining sequence lies inside the ball). This can be made mathematically precise by introducing a letter (say, $N_\epsilon$) for how many initial terms we have to throw away, so that $a_n \to a$ if and only if there is an $N_\epsilon < \infty$ so that, for each $n \ge N_\epsilon$, $|a_n - a| < \epsilon$: only finitely many $a_n$ can be farther than $\epsilon$ from $a$.

The same notion of convergence really works in any (complete) metric space, where we require that some measure of the distance $d(a_n, a)$ from $a_n$ to $a$ tend to zero, in the sense that it exceeds each number $\epsilon > 0$ for at most some finite number $N_\epsilon$ of terms. Points $a_n$ in $d$-dimensional Euclidean space will converge to a limit $a \in \mathbb{R}^d$ if and only if each of their coordinates converges; and, since there are only finitely many of them, if they all converge then they do so uniformly (i.e., for each $\epsilon$ we can take the same $N_\epsilon$ for all $d$ of the coordinate sequences).

2. Convergence of Random Variables

For random variables $X_n$ the idea of convergence to a limiting random variable $X$ is more delicate, since each $X_n$ is a function of $\omega \in \Omega$ and usually

there are infinitely many points $\omega \in \Omega$. What should we mean in asking about the convergence of a sequence $X_n$ of random variables to a limit $X$? Should we mean that $X_n(\omega)$ converges to $X(\omega)$ for each fixed $\omega$? Or that these sequences converge uniformly in $\omega \in \Omega$? Or that some notion of the distance $d(X_n, X)$ between $X_n$ and the limit $X$ decreases to zero? Should the probability measure $P$ be involved in some way?

Here are a few different choices of what we might mean by the statement that $X_n$ converges to $X$, for a sequence of random variables $X_n$ and a random variable $X$, all defined on the same probability space $(\Omega, \mathcal{F}, P)$:

pw: The sequence of real numbers $X_n(\omega) \to X(\omega)$ for every $\omega \in \Omega$ (pointwise convergence):
$(\forall \epsilon > 0)\,(\forall \omega \in \Omega)\,(\exists N_{\epsilon,\omega} < \infty)\,(\forall n \ge N_{\epsilon,\omega})\quad |X_n(\omega) - X(\omega)| < \epsilon.$

uni: The sequences of real numbers $X_n(\omega) \to X(\omega)$ uniformly for $\omega \in \Omega$:
$(\forall \epsilon > 0)\,(\exists N_\epsilon < \infty)\,(\forall \omega \in \Omega)\,(\forall n \ge N_\epsilon)\quad |X_n(\omega) - X(\omega)| < \epsilon.$

a.s.: Outside some null event $N \in \mathcal{F}$, each sequence of real numbers $X_n(\omega) \to X(\omega)$ (almost-sure convergence, or almost everywhere (a.e.)): for some $N \in \mathcal{F}$ with $P[N] = 0$,
$(\forall \epsilon > 0)\,(\forall \omega \notin N)\,(\exists N_{\epsilon,\omega} < \infty)\,(\forall n \ge N_{\epsilon,\omega})\quad |X_n(\omega) - X(\omega)| < \epsilon,$
i.e., $P\big\{\bigcup_{\epsilon > 0} \bigcap_{N < \infty} \bigcup_{n \ge N} |X_n(\omega) - X(\omega)| \ge \epsilon\big\} = 0.$

$L^\infty$: Outside some null event $N \in \mathcal{F}$, the sequences of real numbers $X_n(\omega) \to X(\omega)$ converge uniformly ("almost-uniform" or $L^\infty$ convergence): for some $N \in \mathcal{F}$ with $P[N] = 0$,
$(\forall \epsilon > 0)\,(\exists N_\epsilon < \infty)\,(\forall \omega \notin N)\,(\forall n \ge N_\epsilon)\quad |X_n(\omega) - X(\omega)| < \epsilon.$

i.p.: For each $\epsilon > 0$, the probabilities $P[|X_n - X| > \epsilon] \to 0$ (convergence in probability, or "in measure"):
$(\forall \epsilon > 0)\,(\forall \eta > 0)\,(\exists N_{\epsilon,\eta} < \infty)\,(\forall n \ge N_{\epsilon,\eta})\quad P[|X_n - X| > \epsilon] < \eta.$

$L^1$: The expectation $E[|X_n - X|]$ converges to zero (convergence in $L^1$):
$(\forall \epsilon > 0)\,(\exists N_\epsilon < \infty)\,(\forall n \ge N_\epsilon)\quad E[|X_n - X|] < \epsilon.$

$L^p$: For some fixed number $p > 0$, the expectation of the $p$-th power $E[|X_n - X|^p]$ converges to zero (convergence in $L^p$, sometimes called "in the $p$-th mean"):
$(\forall \epsilon > 0)\,(\exists N_\epsilon < \infty)\,(\forall n \ge N_\epsilon)\quad E[|X_n - X|^p] < \epsilon.$

i.d.: The distributions of $X_n$ converge to the distribution of $X$, i.e., the measures $P \circ X_n^{-1}$ converge in some way to $P \circ X^{-1}$ ("vague" or "weak" convergence, or convergence in distribution, sometimes written $X_n \Rightarrow X$):
$(\forall \epsilon > 0)\,(\forall \phi \in C_b(\mathbb{R}))\,(\exists N_{\epsilon,\phi} < \infty)\,(\forall n \ge N_{\epsilon,\phi})\quad |E[\phi(X_n)] - E[\phi(X)]| < \epsilon.$

Which of these eight notions of convergence is right for random variables? The answer is that all of them are useful in probability theory for one purpose or another. You will want to know which ones imply which other ones, and under what conditions. All but the first two (pointwise, uniform) notions depend upon the measure $P$; it is possible for a sequence $X_n$ to converge to $X$ in any of these senses for one probability measure $P$, but to fail to converge for another. Most of them can be phrased as metric convergence for some notion of distance between random variables:

i.p.: $X_n \to X$ in probability if and only if $d_0(X, X_n) \to 0$ as real numbers, where
$d_0(X, Y) \equiv E\Big[\frac{|X - Y|}{1 + |X - Y|}\Big].$

$L^1$: $X_n \to X$ in $L^1$ if and only if $d_1(X, X_n) = \|X - X_n\|_1 \to 0$ as real numbers, where
$\|X - Y\|_1 \equiv E|X - Y|.$

$L^p$: $X_n \to X$ in $L^p$ if and only if $d_p(X, X_n) = \|X - X_n\|_p \to 0$ as real numbers, where
$\|X - Y\|_p \equiv (E|X - Y|^p)^{1/p}.$

$L^\infty$: $X_n \to X$ almost uniformly if and only if $d_\infty(X, X_n) = \|X - X_n\|_\infty \to 0$ as real numbers, where
$\|X - Y\|_\infty \equiv \mathrm{l.u.b.}\{r < \infty : P[|X - Y| > r] > 0\}.$

As the notation suggests, convergence in probability and in $L^\infty$ are in some sense limits of convergence in $L^p$ as $p \to 0$ and $p \to \infty$, respectively. Almost-sure convergence is an exception: there is no metric notion of distance $d(X, Y)$ for which $X_n \to X$ almost surely if and only if $d(X, X_n) \to 0$.
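These metrics are easy to estimate by Monte Carlo. The sketch below (plain standard-library Python; the helper names are mine, not the text's) uses the sequence $X_n = \mathbf{1}_{(0,1/n]}$ on the unit interval with Lebesgue measure: $d_0$, $\|\cdot\|_1$ and $\|\cdot\|_2$ all tend to zero, while $\|X_n\|_\infty = 1$ for every $n$, so $X_n \to 0$ in probability and in every $L^p$ with $p < \infty$, but not in $L^\infty$.

```python
import random

random.seed(0)
N = 100_000
omega = [random.random() for _ in range(N)]   # draws from P = Lebesgue on (0, 1)

def X(n, w):
    """X_n(w) = 1 on (0, 1/n], 0 elsewhere; the candidate limit is X = 0."""
    return 1.0 if w <= 1.0 / n else 0.0

def d0(n):
    """d_0(X_n, 0) = E[ |X_n| / (1 + |X_n|) ], the metric for convergence i.p."""
    return sum(X(n, w) / (1.0 + X(n, w)) for w in omega) / N

def norm_p(n, p):
    """||X_n - 0||_p = (E |X_n|^p)^(1/p); the exact value is (1/n)^(1/p)."""
    return (sum(X(n, w) ** p for w in omega) / N) ** (1.0 / p)

def norm_inf(n):
    """||X_n - 0||_inf, estimated by the sample maximum; the exact value is 1."""
    return max(X(n, w) for w in omega)

for n in (10, 100, 1000):
    print(n, round(d0(n), 5), round(norm_p(n, 1), 5),
          round(norm_p(n, 2), 5), norm_inf(n))
```

The three metric distances shrink like $1/(2n)$, $1/n$, and $n^{-1/2}$ respectively, but the sup-norm column is stuck at $1$: exactly the gap between $L^p$ and $L^\infty$ convergence.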

2.1. Almost-Sure Convergence

Let $\{X_n\}$ and $X$ be a collection of RVs on some $(\Omega, \mathcal{F}, P)$. The set of points $\omega$ for which $X_n(\omega)$ does converge to $X(\omega)$ is just
$\bigcap_{\epsilon > 0} \bigcup_{m=1}^\infty \bigcap_{n=m}^\infty [\omega : |X_n(\omega) - X(\omega)| \le \epsilon],$
the points which, for all $\epsilon > 0$, have $|X_n(\omega) - X(\omega)|$ no more than $\epsilon$ for all but finitely many $n$. The sequence $X_n$ is said to converge almost everywhere (a.e.) to $X$, or to converge to $X$ almost surely (a.s.), if this set of $\omega$ has probability one, or (equivalently) if its complement is a null set:
$P\Big[\bigcup_{\epsilon > 0} \bigcap_{m=1}^\infty \bigcup_{n=m}^\infty [\omega : |X_n(\omega) - X(\omega)| > \epsilon]\Big] = 0.$

The union over $\epsilon > 0$ is only a countable one, since we need include only rational $\epsilon$ (or, for that matter, any sequence $\epsilon_k$ tending to zero, such as $\epsilon_k = 1/k$). Thus $X_n \to X$ a.e. if and only if, for each $\epsilon > 0$,
$P\Big[\bigcap_{m=1}^\infty \bigcup_{n=m}^\infty [\omega : |X_n(\omega) - X(\omega)| > \epsilon]\Big] = 0. \qquad \text{(a.e.)}$

This combination of intersection and union occurs frequently in probability, and has a name: for any sequence $E_n$ of events, $\bigcap_{m=1}^\infty \bigcup_{n=m}^\infty E_n$ is called the lim sup of the $\{E_n\}$, and is sometimes described more colorfully as $[E_n \text{ i.o.}]$, the set of points in $E_n$ "infinitely often." Its complement is the lim inf of the sets $F_n = E_n^c$, namely $\bigcup_{m=1}^\infty \bigcap_{n=m}^\infty F_n$: the set of points in all but finitely many of the $F_n$. Since $P$ is countably additive, and since the union appearing in the definition of lim sup decreases in $m$ while the intersection appearing in the definition of lim inf increases in $m$, we always have $P[\bigcup_{n=m}^\infty E_n] \searrow P[\bigcap_{m=1}^\infty \bigcup_{n=m}^\infty E_n]$ and $P[\bigcap_{n=m}^\infty F_n] \nearrow P[\bigcup_{m=1}^\infty \bigcap_{n=m}^\infty F_n]$ as $m \to \infty$. Thus:

Theorem 1. $X_n \to X$ $P$-a.s. if and only if for every $\epsilon > 0$,
$\lim_{m \to \infty} P[|X_n - X| > \epsilon \text{ for some } n \ge m] = 0.$

In particular, $X_n \to X$ $P$-a.s. if $\sum_n P[|X_n - X| > \epsilon] < \infty$ for each $\epsilon > 0$ (why?).
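The sufficient condition at the end of Theorem 1 (an "easy" Borel-Cantelli argument) can be watched numerically. In the sketch below (standard-library Python only; function and variable names are mine), independent events $E_n$ with $P[E_n] = p_n$ stand in for $[|X_n - X| > \epsilon]$: when $\sum p_n < \infty$ the tail probability $P[E_n \text{ for some } n \ge m]$ from Theorem 1 visibly drops toward zero as $m$ grows, while for the non-summable $p_n = 1/n$ (with independence) it does not.

```python
import random

random.seed(1)

def tail_event_prob(p, m, n_max, trials=400):
    """Monte-Carlo estimate of P[ E_n occurs for some n with m <= n < n_max ]
    for independent events E_n with P[E_n] = p(n) -- a stand-in for the tail
    probability P[ |X_n - X| > eps for some n >= m ] appearing in Theorem 1."""
    hits = 0
    for _ in range(trials):
        if any(random.random() < p(n) for n in range(m, n_max)):
            hits += 1
    return hits / trials

summable    = lambda n: 1.0 / n**2   # sum p_n < infinity: tail prob -> 0
nonsummable = lambda n: 1.0 / n      # sum p_n = infinity: tail prob stays large

for m in (10, 100, 1000):
    print(m, tail_event_prob(summable, m, 4_000),
             tail_event_prob(nonsummable, m, 4_000))
```

In the summable column the simulated tail probability falls with $m$, matching the a.s.-convergence criterion; in the non-summable column it hovers near $1 - (m-1)/(n_{\max}-1)$, a numerical shadow of the second Borel-Cantelli lemma for independent events.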

2.2. Convergence in Probability

The sequence $X_n$ is said to converge to $X$ in probability (i.p.) if, for each $\epsilon > 0$,
$P[\omega : |X_n(\omega) - X(\omega)| > \epsilon] \to 0. \qquad \text{(i.p.)}$

If we denote by $E_n$ the event $[\omega : |X_n(\omega) - X(\omega)| > \epsilon]$, we see that convergence almost surely requires that $P[\bigcup_{n \ge m} E_n] \to 0$ as $m \to \infty$, while convergence in probability requires only that $P[E_n] \to 0$. Thus:

Theorem 2. If $X_n \to X$ a.e. then $X_n \to X$ i.p.

Here is a partial converse:

Theorem 3. If $X_n \to X$ i.p., then there is a subsequence $n_k$ such that $X_{n_k} \to X$ a.e.

Proof. Set $n_0 = 0$ and, for each integer $k \ge 1$, set
$n_k = \inf\Big\{n > n_{k-1} : P\big[\omega : |X_n(\omega) - X(\omega)| > \tfrac{1}{k}\big] \le 2^{-k}\Big\}.$
For any $\epsilon > 0$ we have $\frac{1}{k} < \epsilon$ eventually (namely, for $k > k_0 = \lceil \frac{1}{\epsilon} \rceil$), and for each $m > k_0$,
$P\Big[\bigcup_{k=m}^\infty [\omega : |X_{n_k}(\omega) - X(\omega)| > \epsilon]\Big] \le P\Big[\bigcup_{k=m}^\infty [\omega : |X_{n_k}(\omega) - X(\omega)| > \tfrac{1}{k}]\Big] \le \sum_{k=m}^\infty P[\omega : |X_{n_k}(\omega) - X(\omega)| > \tfrac{1}{k}] \le \sum_{k=m}^\infty 2^{-k} = 2^{1-m} \to 0$
as $m \to \infty$; by Theorem 1, $X_{n_k} \to X$ a.e. $\square$

2.2.1. A Counter-Example

If $X_n \to X$ a.e. implies $X_n \to X$ i.p., and if the converse holds at least along subsequences, are the two notions really identical? Or is it possible for RVs $X_n$ to converge to $X$ i.p., but not a.e.? The answer is that the two notions are different, and that a.e. convergence is strictly stronger than convergence i.p. Here's an example:

Let $(\Omega, \mathcal{F}, P)$ be the unit interval with Borel sets and Lebesgue measure. Define a sequence of random variables $X_n : \Omega \to \mathbb{R}$ by
$X_n(\omega) = \begin{cases} 1 & \text{if } \frac{i}{2^j} < \omega \le \frac{i+1}{2^j} \\ 0 & \text{otherwise} \end{cases}$
where $n = i + 2^j$, $0 \le i < 2^j$. Each $X_n$ is one on an interval of length $2^{-j}$, where $j = \lfloor \log_2(n) \rfloor$; since $\frac{1}{n} \le \frac{1}{2^j} < \frac{2}{n}$,
$P[|X_n| > \epsilon] = 2^{-j} < \frac{2}{n} \to 0$
for each $0 < \epsilon < 1$, and $X_n \to 0$ i.p. On the other hand, for every $j > 0$ we have
$\Omega = \bigcup_{i=0}^{2^j - 1} \Big(\frac{i}{2^j}, \frac{i+1}{2^j}\Big] = \bigcup_{n=2^j}^{2^{j+1}-1} [\omega : X_n(\omega) = 1],$
so $[\omega : X_n(\omega) \to 0]$ is empty, not a set of probability one! Obviously $X_n$ does not converge a.e.

This example is a building block for several examples to come, so getting to know it well is worthwhile. Try to verify that $X_n \to 0$ in probability and in $L^p$ but not almost surely. What is $\|X_n\|_p$? Why doesn't $X_n \to 0$ a.s.? What would happen if we multiplied $X_n$ by $n$? By $n^2$? What about the subsequence $Y_n = X_{2^n}$? Does $X_n$ converge in $L^\infty$?

3. Cauchy Convergence

Sometimes we wish to consider a sequence $X_n$ that converges to some limit $X$, perhaps without knowing $X$ in advance; the concept of Cauchy convergence is ideal for this. For any of the distance measures $d_p$ above, with $0 \le p \le \infty$, say $X_n$ is a Cauchy sequence in $L^p$ if
$(\forall \epsilon > 0)\,(\exists N < \infty)\,(\forall n \ge m \ge N)\quad d_p(X_m, X_n) < \epsilon.$
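Before using it further, the "typewriter" counter-example of Section 2.2.1 is worth seeing numerically. In the plain-Python sketch below (the helper names are mine, not the text's), $P[X_n = 1]$ halves with each dyadic block, yet a fixed $\omega$ lands in exactly one interval of every block, so $X_n(\omega) = 1$ infinitely often:

```python
def typewriter(n, w):
    """X_n(w) from Sec. 2.2.1: write n = i + 2**j with 0 <= i < 2**j;
    X_n is the indicator of the dyadic interval (i/2^j, (i+1)/2^j]."""
    j = n.bit_length() - 1           # j = floor(log2 n), computed exactly
    i = n - 2**j
    return 1.0 if i / 2**j < w <= (i + 1) / 2**j else 0.0

def prob_one(n):
    """P[X_n = 1] = 2**(-j): tends to 0, giving convergence in probability."""
    return 2.0 ** -(n.bit_length() - 1)

w = 0.3                              # any fixed omega in (0, 1]
ones = [n for n in range(1, 1024) if typewriter(n, w) == 1.0]
print(ones)           # one hit in every dyadic block j = 0..9: 1 occurs i.o.
print(prob_one(1023)) # yet P[X_n = 1] is already small, and -> 0
```

The list `ones` has exactly one entry per block $[2^j, 2^{j+1})$: the indicator "sweeps across" the unit interval forever, which is precisely why $X_n(\omega)$ converges for no $\omega$ even though $P[X_n = 1] \to 0$.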

The spaces $L^p$ for $0 \le p \le \infty$ are all complete, in the sense that if $X_n$ is Cauchy for $d_p$ then there exists $X \in L^p$ for which $d_p(X_n, X) \to 0$. To see this, take an increasing subsequence $N_k$ along which $d_p(X_m, X_n) < 2^{-k}$ for $n \ge m \ge N_k$, set $X_0 = 0$ and $N_0 = 0$, and set $Y_k \equiv X_{N_k} - X_{N_{k-1}}$. Check to confirm that $\sum_{k=1}^\infty Y_k$ converges a.s. to some limit $X \in L^p$ with $d_p(X_n, X) \to 0$.

4. Uniform Integrability

Let $Y \ge 0$ be integrable on some probability space $(\Omega, \mathcal{F}, P)$, $E[Y] = \int_\Omega Y \, dP < \infty$; it follows (from the DCT or MCT, for example) that
$\lim_{t \to \infty} E[Y \mathbf{1}_{[Y > t]}] = \lim_{t \to \infty} \int_{[\omega : Y(\omega) > t]} Y \, dP = 0$
and, consequently, that for any sequence of random variables $X_n$ dominated by $Y$ in the sense that $|X_n| \le Y$ a.s.,
$\lim_{t \to \infty} E[|X_n| \mathbf{1}_{[|X_n| > t]}] = \lim_{t \to \infty} \int_{[\omega : |X_n(\omega)| > t]} |X_n| \, dP \le \lim_{t \to \infty} \int_{[\omega : Y(\omega) > t]} Y \, dP = 0,$
uniformly in $n$. Call the sequence $X_n$ uniformly integrable (or simply UI) if $E[|X_n| \mathbf{1}_{[|X_n| > t]}] \to 0$ uniformly in $n$, even if it is not dominated by a single integrable random variable $Y$. The big result is:

Theorem 4. If $X_n \to X$ i.p. and if $\{X_n\}$ is UI then $X_n \to X$ in $L^1$.

Proof. Without loss of generality take $X \equiv 0$. Fix any $\epsilon > 0$; find (by UI) $t_\epsilon > 0$ such that $E[|X_n| \mathbf{1}_{[|X_n| > t_\epsilon]}] \le \epsilon$ for all $n$. Now find (by $X_n \to X$ i.p.) $N_\epsilon \in \mathbb{N}$ such that, for $n \ge N_\epsilon$, $P[|X_n| > \epsilon] < \epsilon / t_\epsilon$; then:
$E[|X_n|] = \int_{[|X_n| \le \epsilon]} |X_n| \, dP + \int_{[\epsilon < |X_n| \le t_\epsilon]} |X_n| \, dP + \int_{[t_\epsilon < |X_n|]} |X_n| \, dP$
$\le \int_{[|X_n| \le \epsilon]} \epsilon \, dP + \int_{[\epsilon < |X_n| \le t_\epsilon]} t_\epsilon \, dP + \int_{[t_\epsilon < |X_n|]} |X_n| \, dP$
$\le \epsilon + t_\epsilon \, P[|X_n| > \epsilon] + \epsilon \le 3\epsilon. \qquad \square$

Similarly, for any $p > 0$, $X_n \to X$ (i.p.) and $\{|X_n|^p\}$ UI (for example, $|X_n| \le Y \in L^p$) gives $X_n \to X$ ($L^p$). In the special case of $|X_n| \le Y \in L^p$ this is just Lebesgue's Dominated Convergence Theorem (DCT). We have seen that $\{X_n\}$ is UI whenever $|X_n| \le Y \in L^1$, but UI is more general than that. Here are two more criteria:

Theorem 5. If $\{X_n\}$ is uniformly bounded in $L^p$ for some $p > 1$ then $\{X_n\}$ is UI.

Proof. Let $c \in \mathbb{R}_+$ be an upper bound for $E|X_n|^p$. First recall that, by Fubini's Theorem, any random variable $X$ satisfies, for any $q > 0$,
$E|X|^q = \int_0^\infty q x^{q-1} P[|X| > x] \, dx.$
We apply this for $q = 1$ and $q = p$ to the random variables $X_n$ and, for $t > 0$, to $|X_n| \mathbf{1}_{[|X_n| > t]}$. Fix any $t > 0$; then
$E\big[|X_n| \mathbf{1}_{[|X_n| > t]}\big] = \int_0^\infty P[|X_n| \mathbf{1}_{[|X_n| > t]} > x] \, dx$
$= \int_0^t P[|X_n| > t] \, dx + \int_t^\infty P[|X_n| > x] \, dx$
$\le t \, P[|X_n|^p > t^p] + \int_t^\infty \frac{p x^{p-1}}{p t^{p-1}} P[|X_n| > x] \, dx$
$\le t \, \frac{E|X_n|^p}{t^p} + \frac{1}{p t^{p-1}} \int_0^\infty p x^{p-1} P[|X_n| > x] \, dx$
$= t^{1-p} (1 + p^{-1}) E|X_n|^p \le c \, t^{1-p} (1 + p^{-1}) \to 0$
as $t \to \infty$, uniformly in $n$. $\square$

Theorem 6. If $\{X_n\}$ is UI, then $(\forall \epsilon > 0)\,(\exists \delta > 0)\,(\forall A \in \mathcal{F}$ with $P(A) < \delta)$: $E[|X_n| \mathbf{1}_A] < \epsilon$. Conversely, if $\{X_n\}$ is uniformly bounded in $L^1$ and if $(\forall \epsilon > 0)\,(\exists \delta > 0)$ such that $E[|X_n| \mathbf{1}_A] < \epsilon$ whenever $P[A] < \delta$, then $\{X_n\}$ is UI.

Proof. Straightforward. The condition "$\{X_n\}$ is uniformly bounded in $L^1$" is unnecessary if $(\Omega, \mathcal{F}, P)$ is non-atomic.
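A concrete feel for UI, in plain Python (the expectations below are exact closed forms for these simple indicator variables, worked out by hand; the names are mine): $X_n = n \, \mathbf{1}_{(0,1/n]}$ is bounded in $L^1$ ($E X_n = 1$) but its truncated tail expectation never falls below $1$, while $W_n = \sqrt{n} \, \mathbf{1}_{(0,1/n]}$ is bounded in $L^2$ ($E W_n^2 = 1$), so by Theorem 5 its tail expectations vanish uniformly in $n$.

```python
import math

def tail_mean_X(n, t):
    """E[ X_n 1_{X_n > t} ] for X_n = n * 1_{(0,1/n]}: equals n * (1/n) = 1
    whenever n > t, so the tail expectation cannot vanish uniformly in n."""
    return 1.0 if n > t else 0.0

def tail_mean_W(n, t):
    """E[ W_n 1_{W_n > t} ] for W_n = sqrt(n) * 1_{(0,1/n]}; since E W_n^2 = 1,
    {W_n} is bounded in L^2 and hence UI by Theorem 5."""
    return math.sqrt(n) / n if math.sqrt(n) > t else 0.0

def sup_tail(tail_mean, t, ns):
    """sup over n of E[ |X_n| 1_{|X_n| > t} ]: UI means this -> 0 as t grows."""
    return max(tail_mean(n, t) for n in ns)

ns = [2**k for k in range(1, 40)]
for t in (10.0, 100.0, 1000.0):
    print(t, sup_tail(tail_mean_X, t, ns), sup_tail(tail_mean_W, t, ns))
```

The $X_n$ column is stuck at $1$ for every truncation level $t$ (no UI, and indeed $X_n \to 0$ i.p. but $E X_n \equiv 1$), while the $W_n$ column decays roughly like $1/t$, as the proof of Theorem 5 predicts.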

5. Summary: Uniform Integrability and Convergence Concepts

I. Uniform Integrability (UI)
   A. $|X_n| \le Y \in L^r$, $r > 0$, implies $\int_{[|X_n| > t]} |X_n|^r \, dP \le \int_{[Y > t]} Y^r \, dP \to 0$ as $t \to \infty$, uniformly in $n$. This uniform convergence (without the dominating $Y$) is the definition of UI.
   B. If $(\Omega, \mathcal{F}, P)$ is nonatomic, $\{X_n\}$ is UI iff $\forall \epsilon \, \exists \delta$: $\int_\Lambda |X_n| \, dP < \epsilon$ whenever $P[\Lambda] \le \delta$ (take $\delta = \epsilon / 2t$). [If $(\Omega, \mathcal{F}, P)$ has atoms one must also require $E|X_n| \le B$.]
   C. $E|X_n|^p \le c^p < \infty$ implies $\{|X_n|^r\}$ UI for each $r < p$ ... $\delta = (\epsilon/c)^q$, $\frac{1}{p} + \frac{1}{q} = 1$, $q = \frac{p}{p-1}$.
      1. Remark: not for $r = p$ (counterexample: $X_n = n \mathbf{1}_{(0,1/n]}$)
   D. Main result (Thm 4.5.4, p. 97): If $X_n \to X$ i.p. then: $\{|X_n|^r\}$ UI iff $X_n \to X$ in $L^r$ iff $E|X_n|^r \to E|X|^r$.

II. Vague Convergence
   A. $X_n \to X$ i.p. iff every subsequence $n_k$ has a further subsequence $n_{k_i}$ along which $X_{n_{k_i}} \to X$ a.e. (by contradiction)
   B. $X_n \to X$ a.s. and $\phi(x)$ continuous implies $\phi(X_n) \to \phi(X)$ a.s.
   C. $X_n \to X$ i.p. and $\phi(x)$ continuous implies $\phi(X_n) \to \phi(X)$ i.p. (use A)
   D. Definition: $X_n \Rightarrow X$ if $E\phi(X_n) \to E\phi(X)$ for all $\phi \in C_b(\mathbb{R})$
      1. Prop: $X_n \to X$ i.p. implies $X_n \Rightarrow X$ (use II.C)
      2. Prop: $X_n \Rightarrow X$ implies $F_n(r) \to F(r)$ wherever $F(r) = F(r-)$.
         a. Remark: Even if $X_n \Rightarrow X$, $F_n(r)$ may fail to converge where $F(r)$ jumps;
         b. Remark: Even if $X_n \Rightarrow X$, $f_n(r) = F_n'(r)$ may fail to converge to $f(r) = F'(r)$; in fact, either derivative may fail to exist.

III. Implications among these notions (a.e., i.p., $L^r$, $L^p$, $L^\infty$, i.d., with $0 < r < p < \infty$):
   A. a.e. $\Rightarrow$ i.p. (by "easy" Borel-Cantelli)
      1. i.p. $\Rightarrow$ a.e. along subsequences
      2. i.p. $\not\Rightarrow$ a.e. (counterexample: $X_n(\omega) = \mathbf{1}_{(i/2^j, (i+1)/2^j]}(\omega)$, $n = i + 2^j$)
   B. $L^p$ $\Rightarrow$ i.p. (by Chebychev's inequality)
      1. i.p. $\Rightarrow$ $L^p$ under uniform integrability
      2. i.p. $\not\Rightarrow$ $L^p$ (counterexample: $X_n = n^{1/p} \mathbf{1}_{(0,1/n]}$)
   C. $L^p$ $\Rightarrow$ $L^r$ (by Jensen's inequality)

      1. $L^r$ $\not\Rightarrow$ $L^p$ (counterexample: $X_n = n^{1/p} \mathbf{1}_{(0,1/n]}$)
   D. $L^\infty$ $\Rightarrow$ $L^p$ (simple estimate)
      1. $L^p$ $\not\Rightarrow$ $L^\infty$ (counterexample: $X_n = n^{1/2p} \mathbf{1}_{(0,1/n]}$)
   E. $L^\infty$ $\Rightarrow$ a.e. (uniform convergence implies pointwise convergence)
   F. i.p. $\Rightarrow$ i.d. (II.D.1 above)
      1. i.d. $\not\Rightarrow$ i.p. (counterexample: $X_n$, $X$ defined on different probability spaces)
      2. i.d. $\Rightarrow$ a.s. for suitable versions: there exist $(\Omega, \mathcal{F}, P)$ and $X_n^* \sim X_n$, $X^* \sim X$ with $X_n^* \to X^*$ a.e. (the Skorokhod representation)
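Item III.F.1 puts $X_n$ and $X$ on different spaces, where convergence in probability is not even defined; but even on one common space, convergence in distribution is strictly weaker. A stdlib-Python sketch (my own construction, not from the text): take $X \sim N(0,1)$ and $X_n = -X$ for every $n$, so each $X_n$ has exactly the law of $X$ (hence $X_n \Rightarrow X$ trivially), yet $|X_n - X| = 2|X|$ never becomes small.

```python
import random
import statistics

random.seed(2)
xs = [random.gauss(0.0, 1.0) for _ in range(100_000)]   # sample of X ~ N(0,1)
ys = [-x for x in xs]                                   # X_n = -X: same N(0,1) law

# Same distribution: matching sample moments (both ~ N(0,1)) ...
print(statistics.mean(xs), statistics.pstdev(xs))
print(statistics.mean(ys), statistics.pstdev(ys))

# ... but |X_n - X| = 2|X| does not shrink: no convergence in probability.
eps = 0.5
frac = sum(abs(y - x) > eps for x, y in zip(xs, ys)) / len(xs)
print(frac)   # P[2|X| > 0.5] = P[|X| > 0.25], roughly 0.80 for every n
```

No continuous bounded test function $\phi$ can tell the two samples apart, which is all $X_n \Rightarrow X$ asks; the joint behavior of $(X_n, X)$, which i.p. convergence measures, is another matter entirely.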

6. Infinite Coin-Toss and the Laws of Large Numbers

The traditional interpretation of the probability of an event $E$ is its asymptotic frequency: the limit as $n \to \infty$ of the fraction of $n$ repeated, similar, and independent trials in which $E$ occurs. Similarly the expectation of a random variable $X$ is taken to be its asymptotic average, the limit as $n \to \infty$ of the average of $n$ repeated, similar, and independent replications of $X$. For statisticians trying to make inference about the underlying probability distribution $f(x \mid \theta)$ governing observed random variables $X_i$, this suggests that we should be interested in the probability distribution, for large $n$, of quantities like the average of the RVs, $\frac{1}{n} \sum_{i=1}^n X_i$.

Three of the most celebrated theorems of probability theory concern this sum. For independent random variables $X_i$, all with the same probability distribution satisfying $E|X_i|^3 < \infty$, set $\mu = E X_i$, $\sigma^2 = E|X_i - \mu|^2$, and $S_n = \sum_{i=1}^n X_i$. The three main results are:

Laws of Large Numbers: $\dfrac{S_n - n\mu}{\sigma n} \to 0$ (i.p. and a.s.)

Central Limit Theorem: $\dfrac{S_n - n\mu}{\sigma \sqrt{n}} \Rightarrow N(0, 1)$ (i.d.)

Law of the Iterated Logarithm: $\limsup_{n \to \infty} \pm \dfrac{S_n - n\mu}{\sigma \sqrt{2 n \log \log n}} = 1$ (a.s.)

Together these three give a clear picture of how quickly and in what sense $\frac{1}{n} S_n$ tends to $\mu$. We begin with the Law of Large Numbers (LLN), in its weak form (asserting convergence i.p.) and in its strong form (convergence a.s.). There are several versions of both theorems. The simplest requires the $X_i$ to be IID and $L^2$; stronger results allow us to weaken (but not eliminate) the independence requirement, permit non-identical distributions, and consider what happens if the RVs are only $L^1$ (or worse!) instead of $L^2$. The text covers these things well; to complement it I am going to: (1) prove the simplest version, and with it the Borel-Cantelli theorems; and (2) show what happens with Cauchy random variables, which don't satisfy the requirements (the LLN fails).
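Both the LLN and the CLT are easy to watch in a simulation of fair coin tosses, where $\mu = 1/2$ and $\sigma = 1/2$. The sketch below is plain standard-library Python with names of my own choosing:

```python
import math
import random

random.seed(3)

def sample_mean(n):
    """Average of n fair-coin indicators, i.e. S_n / n with mu = 0.5."""
    return sum(random.random() < 0.5 for _ in range(n)) / n

# LLN: S_n / n -> mu = 0.5 as n grows.
print([round(sample_mean(n), 4) for n in (100, 10_000, 1_000_000)])

# CLT: (S_n - n*mu) / (sigma * sqrt(n)) is approximately N(0,1), so about
# 95% of replications should land in [-1.96, 1.96].
mu, sigma, n = 0.5, 0.5, 4_000
zs = [(sample_mean(n) - mu) * math.sqrt(n) / sigma for _ in range(1_000)]
inside = sum(abs(z) <= 1.96 for z in zs) / len(zs)
print(inside)
```

The running means tighten around $0.5$ at roughly the $n^{-1/2}$ rate the CLT predicts, and the standardized fluctuations fill out a standard normal histogram; the LIL describes the extreme excursions of the same sequence, which a short simulation cannot easily resolve.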

I. Weak version, non-IID, $L^2$: $\mu_i = E X_i$, $\sigma_{ij} = E[X_i - \mu_i][X_j - \mu_j]$
   A. $Y_n = (S_n - \Sigma \mu_i)/n$ satisfies $E Y_n = 0$, $E Y_n^2 = \frac{1}{n^2} \Sigma_i \sigma_{ii} + \frac{2}{n^2} \Sigma_{i<j} \sigma_{ij}$;
      1. If $\sigma_{ii} \le M$ and $\sigma_{ij} \le 0$, Chebychev $\Rightarrow$ $Y_n \to 0$ i.p.
      2. (pairwise) IID $L^2$ is OK

II. Strong version, non-IID, $L^2$: $E X_i = 0$, $E X_i^2 \le M$, $E X_i X_j \le 0$.
   A. $P[|S_n| > n\epsilon] \le \frac{Mn}{n^2 \epsilon^2} = \frac{M}{n \epsilon^2}$
      1. $P[|S_{n^2}| > n^2 \epsilon] \le \frac{M}{n^2 \epsilon^2}$, so $\Sigma_n P[|S_{n^2}| > n^2 \epsilon] \le \frac{M \pi^2}{6 \epsilon^2} < \infty$
      2. Borel-Cantelli: $P[|S_{n^2}| > n^2 \epsilon \text{ i.o.}] = 0$, so $\frac{S_{n^2}}{n^2} \to 0$ a.s.
      3. $D_n = \max_{n^2 \le k < (n+1)^2} |S_k - S_{n^2}|$, $E D_n^2 \le 2n \, E|S_{(n+1)^2} - S_{n^2}|^2 \le 4 n^2 M$
      4. Chebychev: $P[D_n > n^2 \epsilon] \le \frac{4 n^2 M}{n^4 \epsilon^2}$, so $\frac{D_n}{n^2} \to 0$ a.s.
   B. $|S_k|/k \le \frac{|S_{n^2}| + D_n}{n^2} \to 0$ a.s. for $n^2 \le k < (n+1)^2$, QED
      1. Bernoulli RVs, normal number theorem, Monte Carlo.

III. Weak version, pairwise-IID, $L^1$
   A. Equivalent sequences: $\Sigma_n P[X_n \ne Y_n] < \infty$
      1. $\Sigma_n \mathbf{1}[X_n \ne Y_n] < \infty$ a.s.
      2. $\Sigma_{i=1}^n X_i$ and $\frac{1}{a_n} \Sigma_{i=1}^n X_i$ converge iff $\Sigma_{i=1}^n Y_i$ and $\frac{1}{a_n} \Sigma_{i=1}^n Y_i$ do
      3. $Y_n = X_n \mathbf{1}_{[|X_n| \le n]}$

IV. Counterexamples: Cauchy, etc.
   A. $X_i \sim \frac{dx}{\pi[1 + x^2]}$ $\Rightarrow$ $P[|S_n/n| \le \epsilon] \equiv \frac{2}{\pi} \tan^{-1}(\epsilon) \not\to 1$: the WLLN fails.
   B. $P[X_i = \pm n] = \frac{c}{n^2}$, $n \ge 1$: $X_i \notin L^1$, and $S_n/n \not\to 0$ i.p. or a.s.
   C. $P[X_i = \pm n] = \frac{c}{n^2 \log n}$, $n \ge 3$: $X_i \notin L^1$, but $S_n/n \to 0$ i.p. and not a.s.
   D. Medians: for ANY RVs with $X_n \to X$ i.p., the medians satisfy $m_n \to m$ if the median $m$ of $X$ is unique.

Let $X_i$ be iid standard Cauchy RVs, with
$P[X_1 \le t] = \int_{-\infty}^t \frac{dx}{\pi[1 + x^2]} = \frac{1}{2} + \frac{1}{\pi} \arctan(t)$
and characteristic function
$E\, e^{i \lambda X_1} = \int_{-\infty}^\infty e^{i \lambda x} \frac{dx}{\pi[1 + x^2]} = e^{-|\lambda|},$
so $S_n/n$ has characteristic function
$E\, e^{i \lambda S_n / n} = E\, e^{i \frac{\lambda}{n}[X_1 + \cdots + X_n]} = \big(E\, e^{i \frac{\lambda}{n} X_1}\big)^n = \big(e^{-|\lambda|/n}\big)^n = e^{-|\lambda|},$

and $S_n/n$ also has the standard Cauchy distribution, with $P[S_n/n \le t] = \frac{1}{2} + \frac{1}{\pi} \arctan(t)$; in particular, $S_n/n$ does not converge almost surely, or even in probability.
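Since $S_n/n$ is exactly standard Cauchy for every $n$, the probability $P[|S_n/n| \le 1] = \frac{2}{\pi} \arctan(1) = \frac{1}{2}$ no matter how large $n$ is, whereas for light-tailed summands the same probability tends to one. A stdlib-Python check (the helper names are mine):

```python
import math
import random

random.seed(4)

def cauchy():
    """Standard Cauchy draw via the inverse CDF: t = tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (random.random() - 0.5))

def frac_within_one(draw, n, reps):
    """Fraction of replications in which the sample mean S_n/n lands in [-1, 1]."""
    hits = 0
    for _ in range(reps):
        m = sum(draw() for _ in range(n)) / n
        hits += (abs(m) <= 1.0)
    return hits / reps

# Cauchy: S_n/n is again standard Cauchy, so this stays near 1/2 for every n.
print(frac_within_one(cauchy, 100, 2_000))
# Gaussian: S_n/n ~ N(0, 1/n) concentrates, so this is essentially 1.
print(frac_within_one(lambda: random.gauss(0.0, 1.0), 100, 2_000))
```

Averaging Cauchy draws buys nothing: the sample mean of a hundred (or a million) observations is just as spread out as a single observation, which is the LLN failure computed analytically above.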