On the convergence of sequences of random variables: A primer


BCAM May 2012

On the convergence of sequences of random variables: A primer

Armand M. Makowski
ECE & ISR/HyNet
University of Maryland at College Park
armand@isr.umd.edu

A sequence $a : \mathbb{N}_0 \to \mathbb{R}$, often described as $\{a_n,\ n = 1, 2, \ldots\}$, converges to some $a$ in $\mathbb{R}$ if for every $\varepsilon > 0$, there exists $n^\star(\varepsilon)$ such that
$$|a_n - a| \leq \varepsilon, \quad n \geq n^\star(\varepsilon)$$
We write
$$\lim_{n \to \infty} a_n = a \quad \text{or} \quad a_n \to a$$
This definition contains two basic questions:
- Existence: It converges!
- Value: Find the limiting value

What happens if $a = \pm\infty$?

Existence
- Every monotone sequence converges!
- Bolzano-Weierstrass: Every bounded sequence contains at least one convergent subsequence!

Given a sequence $a : \mathbb{N}_0 \to \mathbb{R}$, define
$$\limsup_n a_n = \inf_{n \geq 1} \overline{a}_n \quad \text{with} \quad \overline{a}_n = \sup_{m \geq n} a_m$$
and
$$\liminf_n a_n = \sup_{n \geq 1} \underline{a}_n \quad \text{with} \quad \underline{a}_n = \inf_{m \geq n} a_m$$

$\limsup_n a_n$ = largest accumulation point of the sequence
$\liminf_n a_n$ = smallest accumulation point of the sequence

$$\underline{a}_n \leq \liminf_n a_n \quad \text{and} \quad \limsup_n a_n \leq \overline{a}_n$$
$$\liminf_n a_n \leq \limsup_n a_n$$
Fact: The sequence $a : \mathbb{N}_0 \to \mathbb{R}$ converges if and only if
$$\liminf_n a_n = \limsup_n a_n\ \left(= \lim_n a_n\right)$$
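The tail-supremum/infimum definitions above can be checked numerically; here is a minimal sketch (the sequence $a_n = (-1)^n(1 + 1/n)$ and the truncation horizon are illustrative choices, not from the slides):

```python
# Approximate limsup/liminf of a_n = (-1)^n (1 + 1/n) via tail sup/inf.
# The true values are +1 and -1 (the two accumulation points).
def a(n):
    return (-1) ** n * (1 + 1 / n)

N = 10_000                      # truncation horizon for the tail
tail = [a(m) for m in range(5_000, N + 1)]
approx_limsup = max(tail)       # approximates inf_n sup_{m >= n} a_m
approx_liminf = min(tail)       # approximates sup_n inf_{m >= n} a_m

print(approx_limsup, approx_liminf)  # close to 1 and -1
```

The sequence oscillates, so it has no limit, yet both tail extrema stabilize.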

A sequence $a : \mathbb{N}_0 \to \mathbb{R}$ is said to be Cauchy if for every $\varepsilon > 0$, there exists $n^\star(\varepsilon)$ such that
$$|a_n - a_m| \leq \varepsilon, \quad m, n \geq n^\star(\varepsilon)$$
Fact: A sequence $a : \mathbb{N}_0 \to \mathbb{R}$ converges if and only if it is Cauchy:
$\mathbb{R}$ is complete under its usual topology.

Cesaro convergence

The Cesaro sequence associated with the sequence $a : \mathbb{N}_0 \to \mathbb{R}$ is the sequence $\overline{a} : \mathbb{N}_0 \to \mathbb{R}$ given by
$$\overline{a}_n = \frac{1}{n}\left(a_1 + \ldots + a_n\right), \quad n = 1, 2, \ldots$$
A sequence $a : \mathbb{N}_0 \to \mathbb{R}$ is Cesaro-convergent if the associated Cesaro sequence $\overline{a} : \mathbb{N}_0 \to \mathbb{R}$ converges.

Fact: A convergent sequence $a : \mathbb{N}_0 \to \mathbb{R}$ with limit $a$ is also Cesaro-convergent with limit $a$, namely
$$\lim_n \overline{a}_n = a$$
However, the converse is not true, e.g., $a_n = (-1)^n$, $n = 1, 2, \ldots$

Averaging is good! Law of Large Numbers!!!
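The counterexample on this slide is easy to check by computation; a quick sketch (the horizon is an illustrative choice):

```python
# a_n = (-1)^n diverges, yet its Cesaro averages converge to 0.
def cesaro(seq):
    """Running averages (a_1 + ... + a_n) / n."""
    out, total = [], 0.0
    for n, x in enumerate(seq, start=1):
        total += x
        out.append(total / n)
    return out

a = [(-1) ** n for n in range(1, 10_001)]
avg = cesaro(a)
print(avg[-1])  # close to 0
```

The raw sequence keeps jumping between $-1$ and $+1$, while the averages settle on $0$.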

Random variables

Given a probability triple $(\Omega, \mathcal{F}, \mathbb{P})$, a $d$-dimensional random variable (rv) is a measurable mapping $X : \Omega \to \mathbb{R}^d$, i.e., such that
$$X^{-1}(B) = \{\omega \in \Omega : X(\omega) \in B\} \in \mathcal{F}, \quad B \in \mathcal{B}(\mathbb{R}^d)$$
Two viewpoints:
- Rv as a mapping
- Rv as a probability distribution function (i.e., measure) $F : \mathbb{R}^d \to [0, 1] : x \to F(x) = \mathbb{P}\left[X \leq x\right]$

Multiple modes of convergence with many subtleties!

An obvious definition...

Consider a collection $\{X;\ X_n,\ n = 1, 2, \ldots\}$ of $\mathbb{R}^d$-valued rvs all defined on the same probability triple $(\Omega, \mathcal{F}, \mathbb{P})$. Then, we say convergence takes place to $X$ if
$$\lim_n X_n(\omega) = X(\omega), \quad \omega \in \Omega$$
Why not?
- Too strong
- Modeling information: Often only the corresponding probability distributions $\{F_n,\ n = 1, 2, \ldots\}$ are available

Four basic modes of convergence
- Convergence in distribution
- Convergence in the $r$-th mean ($r \geq 1$)
- Convergence in probability
- Convergence with probability one (w.p. 1)

Also:
- Easy-to-use criteria
- Relationships
- Impact of (continuous) transformations
- Cesaro convergence
- Key limit theorems of Probability Theory

Convergence with probability one

Consider a collection $\{X;\ X_n,\ n = 1, 2, \ldots\}$ of $\mathbb{R}^d$-valued rvs all defined on the same probability triple $(\Omega, \mathcal{F}, \mathbb{P})$. We say that the sequence $\{X_n,\ n = 1, 2, \ldots\}$ converges almost surely (a.s.) (or with probability one (w.p. 1)) to $X$ if
$$\mathbb{P}\left[\omega \in \Omega : \lim_n X_n(\omega) = X(\omega)\right] = 1$$
We write $\lim_n X_n = X$ a.s.

Convergence in distribution

Also known as convergence in law and weak convergence. Multiple equivalent definitions available.

A sequence of probability distribution functions $\{F_n,\ n = 1, 2, \ldots\}$ on $\mathbb{R}$ converges in distribution to the probability distribution function $F$ on $\mathbb{R}$, written $F_n \Rightarrow_n F$, if
$$\lim_n F_n(x) = F(x), \quad x \in C_F$$
where $C_F$ denotes the continuity set of $F$.

Why this definition? The limit of a distribution is not always a distribution.

Skorokhod's Theorem: Assume the sequence of probability distribution functions $\{F_n,\ n = 1, 2, \ldots\}$ on $\mathbb{R}$ to converge in distribution to the probability distribution function $F$ on $\mathbb{R}$. Then there exist a single probability triple $(\Omega, \mathcal{F}, \mathbb{P})$ and a collection of rvs $\{X,\ X_n,\ n = 1, 2, \ldots\}$ defined on it such that
$$X \sim F, \quad X_n \sim F_n, \quad n = 1, 2, \ldots$$
with the property that
$$\lim_n X_n(\omega) = X(\omega), \quad \omega \in \Omega$$

Proof: Take $\Omega = (0, 1)$, $\mathcal{F} = \mathcal{B}((0, 1))$, $\mathbb{P} = \lambda$ (Lebesgue measure) and set
$$X(\omega) = F^{-}(\omega) \quad \text{and} \quad X_n(\omega) = F_n^{-}(\omega), \quad \omega \in \Omega,\ n = 1, 2, \ldots$$
For any non-decreasing function $F : \mathbb{R} \to [0, 1]$, define its (left-continuous) generalized inverse $F^{-} : [0, 1] \to \mathbb{R} \cup \{\pm\infty\}$ by
$$F^{-}(t) = \inf\{x \in \mathbb{R} : F(x) \geq t\}, \quad 0 \leq t \leq 1$$
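The construction above can be illustrated concretely with $F_n(x) = x^n$ on $[0, 1]$ (the example used on a later slide): the generalized inverse is $F_n^{-}(t) = t^{1/n}$, so on the Skorokhod probability space $X_n(\omega) = \omega^{1/n} \to 1 = X(\omega)$ for every $\omega \in (0, 1)$, realizing the weak limit (point mass at $1$) as pointwise convergence. A minimal sketch (the sample point $\omega = 0.3$ is an illustrative choice):

```python
# Generalized inverse of F_n(x) = x^n on [0, 1] is F_n^-(t) = t^(1/n);
# X_n(omega) = F_n^-(omega) increases to 1 for every omega in (0, 1).
def F_inv(t, n):
    return t ** (1.0 / n)

omega = 0.3                         # a sample point in (0, 1)
xs = [F_inv(omega, n) for n in (1, 10, 100, 1000)]
print(xs)  # increases toward 1
```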

Auto-regressive sequences

Consider
$$X_0 = \xi, \quad X_{t+1} = \alpha X_t + W_{t+1}, \quad t = 0, 1, \ldots$$
Assume
- $\alpha \neq 0$
- $\mathbb{R}$-valued rv $\xi$
- i.i.d. $\mathbb{R}$-valued rvs $\{W,\ W_t,\ t = 1, 2, \ldots\}$ with $\mathbb{E}\left[|W|\right] < \infty$
- Mutual independence

Fact: If $|\alpha| < 1$, then there exists an $\mathbb{R}$-valued rv $X$ such that $X_t \Rightarrow_t X$ regardless of the initial condition $\xi$. The rv $X$ has finite first moment and is characterized by
$$X =_{st} \sum_{s=0}^{\infty} \alpha^s W_{s+1}$$
because
$$X_{t+1} = \alpha^{t+1} \xi + \sum_{s=0}^{t} \alpha^{t-s} W_{s+1} =_{st} \alpha^{t+1} \xi + \sum_{s=0}^{t} \alpha^{s} W_{s+1} \quad (1)$$
by the distributional identity
$$(W_1, W_2, \ldots, W_t) =_{st} (W_t, W_{t-1}, \ldots, W_1)$$
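A simulation sketch of this fact (the parameter values $\alpha = 0.5$, standard normal noise, and the deliberately far-off initial condition are illustrative assumptions): the stationary variance should be $\mathrm{Var}[W]/(1 - \alpha^2) = 4/3$, whatever $\xi$ is.

```python
import random

# AR(1) recursion X_{t+1} = alpha X_t + W_{t+1}; after a burn-in the
# samples should look stationary with mean 0 and variance 4/3.
random.seed(0)
alpha, x = 0.5, 50.0          # deliberately far-off initial condition
samples = []
for t in range(200_000):
    x = alpha * x + random.gauss(0.0, 1.0)
    if t > 1_000:             # discard burn-in
        samples.append(x)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # near 0 and 4/3
```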

Lindley's recursion

Consider
$$X_0 = \xi, \quad X_{t+1} = (X_t + \eta_{t+1})^{+}, \quad t = 0, 1, \ldots$$
Assume
- $\mathbb{R}_{+}$-valued rv $\xi$
- i.i.d. $\mathbb{R}$-valued rvs $\{\eta,\ \eta_t,\ t = 1, 2, \ldots\}$ with $\mathbb{E}\left[|\eta|\right] < \infty$
- Mutual independence

Fact: If $\mathbb{E}\left[\eta\right] < 0$, then there exists an $\mathbb{R}_{+}$-valued rv $X$ such that $X_t \Rightarrow_t X$ regardless of the initial condition $\xi$, with
$$X =_{st} \left( \sup_{t = 1, 2, \ldots} \left(\eta_1 + \ldots + \eta_t\right) \right)^{+}$$
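A simulation sketch of the negative-drift case (the increment distribution $\eta \sim \mathrm{Uniform}(-1, 0.5)$, with $\mathbb{E}[\eta] = -0.25$, and the initial condition are illustrative assumptions): the chain forgets its starting point and keeps returning to $0$, consistent with a stationary rv that puts positive mass at $0$.

```python
import random

# Lindley recursion X_{t+1} = (X_t + eta_{t+1})^+ with negative drift,
# started from a large initial condition; count the visits to 0.
random.seed(1)
x, zero_visits = 10.0, 0
for t in range(100_000):
    x = max(x + random.uniform(-1.0, 0.5), 0.0)
    if x == 0.0:
        zero_visits += 1

print(zero_visits)  # a substantial fraction of the steps
```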

Analytic view of weak convergence

With an $\mathbb{R}^d$-valued rv $X = (X_1, \ldots, X_d)$, define its characteristic function $\Phi_X : \mathbb{R}^d \to \mathbb{C}$ given by
$$\Phi_X(t) = \mathbb{E}\left[e^{i t' X}\right], \quad t \in \mathbb{R}^d$$
Also $\Phi_F = \Phi_X$ where $X \sim F$.

Uniqueness: $\Phi_F = \Phi_G$ if and only if $F = G$

Fact: With an $\mathbb{R}^d$-valued rv $X = (X_1, \ldots, X_d)$, its characteristic function $\Phi_X : \mathbb{R}^d \to \mathbb{C}$ satisfies the following properties:
- Bounded: $|\Phi_X(t)| \leq \Phi_X(0) = 1$, $t \in \mathbb{R}^d$
- Uniformly continuous on $\mathbb{R}^d$:
$$\lim_{h \to 0} \sup_{t \in \mathbb{R}^d} \left| \Phi_X(t + h) - \Phi_X(t) \right| = 0$$
- Positive definiteness: For every $n = 1, 2, \ldots$, every $t_1, \ldots, t_n$ in $\mathbb{R}^d$ and every $z_1, \ldots, z_n$ in $\mathbb{C}$,
$$\sum_{k=1}^{n} \sum_{l=1}^{n} \Phi_X(t_k - t_l)\, z_k \overline{z}_l \geq 0$$

This characterizes characteristic functions among functions $\mathbb{R}^d \to \mathbb{C}$.

Fact: The sequence of probability distribution functions $\{F_n,\ n = 1, 2, \ldots\}$ on $\mathbb{R}^d$ converges in distribution to the probability distribution function $F$ on $\mathbb{R}^d$ if and only if
$$\lim_n \Phi_{F_n}(t) = \Phi_F(t), \quad t \in \mathbb{R}^d$$
Behavior of characteristic functions:
$$\lim_n \Phi_{F_n}(t) = \lim_n \mathbb{E}\left[e^{i t' X_n}\right], \quad t \in \mathbb{R}^d$$

Fact: Consider a sequence of probability distribution functions $\{F_n,\ n = 1, 2, \ldots\}$ on $\mathbb{R}^d$ such that the limits
$$\lim_n \Phi_{F_n}(t) = \Phi(t), \quad t \in \mathbb{R}^d$$
exist. If $\Phi : \mathbb{R}^d \to \mathbb{C}$ is continuous at $t = 0$, then it is the characteristic function of a probability distribution function $F$ on $\mathbb{R}^d$ and $F_n \Rightarrow_n F$.

Consequence of the Bochner-Herglotz Theorem, which provides a characterization of characteristic functions through positive definiteness.

Beware (I)

Behavior of probability density functions:
$$F_n(x) = \int_{-\infty}^{x} f_n(t)\, dt, \quad x \in \mathbb{R}$$
Example: $F_n(x) = x^n$, $x \in [0, 1]$, $n = 1, 2, \ldots$
Here $F_n$ converges in distribution to the point mass at $1$, while the densities $f_n(x) = n x^{n-1}$ converge to $0$ pointwise on $[0, 1)$: weak convergence does not carry over to densities.

Beware (II)

Behavior of probability mass functions (pmfs):
$$F_n(x) = \sum_{x_j \leq x} \left(F_n(x_j) - F_n(x_j-)\right), \quad x \in \mathbb{R},\ n = 1, 2, \ldots$$
Example: $X_n = n + \mathrm{Poi}(\lambda)$, $n = 1, 2, \ldots$
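The example above can be made explicit (a sketch; $\lambda = 2$ and the fixed point $k = 5$ are illustrative choices): for $X_n = n + \mathrm{Poi}(\lambda)$, every pmf value at a fixed $k$ tends to $0$ (it is exactly $0$ once $n > k$), even though each $X_n$ carries total mass $1$ — the mass escapes to infinity and no weak limit exists.

```python
import math

lam = 2.0

def pmf(n, k):
    """P[n + Poi(lam) = k]."""
    j = k - n
    if j < 0:
        return 0.0
    return math.exp(-lam) * lam ** j / math.factorial(j)

print(pmf(10, 5))                          # 0.0: the mass has moved past k = 5
total = sum(pmf(10, k) for k in range(10, 60))
print(total)                               # still essentially 1
```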

Tightness

The $\mathbb{R}^d$-valued rvs $\{X_n,\ n = 1, 2, \ldots\}$ are tight if for every $\varepsilon > 0$, there exists a compact subset $K_\varepsilon \subseteq \mathbb{R}^d$ such that
$$\mathbb{P}\left[X_n \in K_\varepsilon\right] \geq 1 - \varepsilon, \quad n = 1, 2, \ldots$$
By Prohorov's Theorem,
Tightness $\Longleftrightarrow$ Sequential precompactness (with respect to weak convergence)

Remember Bolzano-Weierstrass!

Easy criterion

Tightness holds if for some $r \geq 1$, we have
$$\sup_{n = 1, 2, \ldots} \mathbb{E}\left[\|X_n\|^r\right] < \infty$$
By Markov's inequality,
$$\mathbb{P}\left[\|X_n\| > c\right] \leq \frac{\mathbb{E}\left[\|X_n\|^r\right]}{c^r}, \quad c > 0,\ n = 1, 2, \ldots$$

Fact: If the sequence of probability distribution functions $\{F_n,\ n = 1, 2, \ldots\}$ on $\mathbb{R}$ converges in distribution to the probability distribution function $F$ on $\mathbb{R}$, then the collection $\{F_n,\ n = 1, 2, \ldots\}$ is tight.

Fix $x$ in $C_F$. For each $\delta > 0$, there is a finite integer $n^\star = n^\star(x; \delta)$ such that
$$F(x) - \delta \leq F_n(x) \leq F(x) + \delta, \quad n \geq n^\star$$
Consequently,
$$\mathbb{P}\left[X_n > x\right] \leq \mathbb{P}\left[X > x\right] + \delta, \quad n \geq n^\star$$
Now take $x$ sufficiently large, say $x = x(\delta)$, such that
$$\mathbb{P}\left[X > x\right] \leq \delta$$
Finally,
$$\mathbb{P}\left[X_n > x\right] \leq \mathbb{P}\left[X > x\right] + \delta \leq \delta + \delta = 2\delta, \quad n \geq n(\delta)$$
with $n(\delta) = n^\star(x(\delta); \delta)$

Fact: The sequence of probability distribution functions $\{F_n,\ n = 1, 2, \ldots\}$ on $\mathbb{R}^d$ converges in distribution to the probability distribution function $F$ on $\mathbb{R}^d$ if and only if
$$\lim_n \mathbb{E}\left[g(X_n)\right] = \mathbb{E}\left[g(X)\right]$$
for every bounded continuous mapping $g : \mathbb{R}^d \to \mathbb{R}$.
- Alternate definition of weak convergence
- Useful consequences

Beware

Assume
$$X_n \Rightarrow_n X \quad \text{and} \quad Y_n \Rightarrow_n Y$$
where for each $n = 1, 2, \ldots$, the pair of rvs $X_n$ and $Y_n$ are defined on the same probability triple $(\Omega_n, \mathcal{F}_n, \mathbb{P}_n)$.

Convergence of sums: Is it true that
$$X_n + Y_n \Rightarrow_n X + Y?$$
In general no: Take $Z \sim N(0, 1)$, $X_n = Z$ and $Y_n = (-1)^n Z$, so that
$$X_n + Y_n = (1 + (-1)^n) Z$$
Fact: We have $X_n + Y_n \Rightarrow_n X + Y$ if for each $n = 1, 2, \ldots$, the rvs $X_n$ and $Y_n$ are independent!
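A Monte Carlo sketch of the counterexample above (sample size and seed are illustrative choices): each $Y_n = (-1)^n Z$ is $N(0,1)$ in distribution, yet $X_n + Y_n$ alternates between the constant $0$ (odd $n$) and $2Z$ (even $n$), so the sums cannot converge in distribution.

```python
import random

random.seed(2)
z = [random.gauss(0.0, 1.0) for _ in range(50_000)]

def sum_samples(n):
    """Samples of X_n + Y_n = Z + (-1)^n Z."""
    return [zi + ((-1) ** n) * zi for zi in z]

odd = sum_samples(3)                 # identically 0
even = sum_samples(4)                # distributed as 2 Z, variance 4
var_even = sum(s * s for s in even) / len(even)
print(max(abs(s) for s in odd), var_even)  # 0 and about 4
```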

Joint convergence: Is it true that
$$(X_n, Y_n) \Rightarrow_n (X, Y)?$$
In general no: Same counterexample as before.

Fact: We have $(X_n, Y_n) \Rightarrow_n (X, Y)$ if for each $n = 1, 2, \ldots$, the rvs $X_n$ and $Y_n$ are independent, in which case $X$ and $Y$ are independent.

Convergence under transformation: Is it true that
$$h(X_n) \Rightarrow_n h(X)$$
with $h : \mathbb{R}^d \to \mathbb{R}^p$?

Fact: We have $h(X_n) \Rightarrow_n h(X)$ if $h : \mathbb{R}^d \to \mathbb{R}^p$ is continuous.
Skorokhod to the rescue!

Convergence in the $r$-th mean ($r \geq 1$)

Consider a collection $\{X;\ X_n,\ n = 1, 2, \ldots\}$ of $\mathbb{R}^d$-valued rvs all defined on the same probability triple $(\Omega, \mathcal{F}, \mathbb{P})$. We say that the sequence $\{X_n,\ n = 1, 2, \ldots\}$ converges in the $r$-th mean to the rv $X$ if
$$\mathbb{E}\left[\|X_n\|^r\right] < \infty,\ n = 1, 2, \ldots \quad \text{and} \quad \mathbb{E}\left[\|X\|^r\right] < \infty$$
and
$$\lim_n \mathbb{E}\left[\|X_n - X\|^r\right] = 0$$
This is often written $X_n \to_r X$.

Cauchy criterion available: For every $\varepsilon > 0$, there exists a finite integer $n^\star(\varepsilon)$ such that
$$\mathbb{E}\left[\|X_n - X_m\|^r\right] \leq \varepsilon, \quad n, m \geq n^\star(\varepsilon)$$

Revisiting auto-regressive sequences

Consider
$$X_0 = \xi, \quad X_{t+1} = \alpha X_t + W_{t+1}, \quad t = 0, 1, \ldots$$
Assume
- $\alpha \neq 0$
- $\mathbb{R}$-valued rv $\xi$
- i.i.d. $\mathbb{R}$-valued rvs $\{W,\ W_t,\ t = 1, 2, \ldots\}$ with $\mathbb{E}\left[W^2\right] < \infty$
- Mutual independence

Recall that for each $t = 0, 1, 2, \ldots$,
$$X_{t+1} = \alpha^{t+1} \xi + \sum_{s=0}^{t} \alpha^{t-s} W_{s+1}$$
So
$$\mathbb{E}\left[\sum_{s=0}^{t} \alpha^{t-s} W_{s+1}\right] = \left(\sum_{s=0}^{t} \alpha^{s}\right) \mathbb{E}\left[W\right] = \frac{1 - \alpha^{t+1}}{1 - \alpha}\, \mathbb{E}\left[W\right]$$
and
$$\mathrm{Var}\left[\sum_{s=0}^{t} \alpha^{t-s} W_{s+1}\right] = \sum_{s=0}^{t} \alpha^{2(t-s)}\, \mathrm{Var}\left[W_{s+1}\right] = \left(\sum_{s=0}^{t} \alpha^{2s}\right) \sigma_W^2 = \frac{1 - \alpha^{2(t+1)}}{1 - \alpha^2}\, \sigma_W^2$$
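A quick numerical check of the two geometric-sum identities used above (the values $\alpha = 0.5$, $t = 10$ are illustrative):

```python
# Verify sum_{s=0}^t a^s = (1 - a^{t+1}) / (1 - a) and its squared
# counterpart, the closed forms appearing in the mean/variance formulas.
alpha, t = 0.5, 10
lhs_mean = sum(alpha ** s for s in range(t + 1))
rhs_mean = (1 - alpha ** (t + 1)) / (1 - alpha)
lhs_var = sum(alpha ** (2 * s) for s in range(t + 1))
rhs_var = (1 - alpha ** (2 * (t + 1))) / (1 - alpha ** 2)
print(lhs_mean, rhs_mean, lhs_var, rhs_var)
```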

Convergence in probability

Consider a collection $\{X;\ X_n,\ n = 1, 2, \ldots\}$ of $\mathbb{R}^d$-valued rvs all defined on the same probability triple $(\Omega, \mathcal{F}, \mathbb{P})$. We say that the sequence $\{X_n,\ n = 1, 2, \ldots\}$ converges in probability to the rv $X$ if for every $\varepsilon > 0$,
$$\lim_n \mathbb{P}\left[\|X_n - X\|_2 > \varepsilon\right] = 0$$
This is often written $X_n \to_P X$. For $d = 1$,
$$\lim_n \mathbb{P}\left[|X_n - X| > \varepsilon\right] = 0$$

Cauchy criterion available.

Fact: Convergence in the $r$-th mean implies convergence in probability: By Markov's inequality,
$$\mathbb{P}\left[\|X_n - X\| > \varepsilon\right] = \mathbb{P}\left[\|X_n - X\|^r > \varepsilon^r\right] \leq \varepsilon^{-r}\, \mathbb{E}\left[\|X_n - X\|^r\right], \quad r > 0,\ \varepsilon > 0,\ n = 1, 2, \ldots$$
The converse is not true without additional conditions, e.g., with $\alpha > 0$,
$$X_n = \begin{cases} 0 & \text{with probability } 1 - n^{-\alpha} \\ n & \text{with probability } n^{-\alpha} \end{cases}$$
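The counterexample above can be computed deterministically; a sketch with the illustrative choice $\alpha = 0.5$: then $\mathbb{P}[|X_n| > \varepsilon] = n^{-\alpha} \to 0$ (convergence in probability to $0$), while $\mathbb{E}[|X_n|] = n \cdot n^{-\alpha} = n^{1-\alpha} \to \infty$, so there is no convergence in the mean.

```python
a = 0.5  # illustrative exponent

def tail_prob(n):
    """P[|X_n| > eps] for any 0 < eps < n."""
    return n ** (-a)

def first_moment(n):
    """E[|X_n|] = n * n^{-a} = n^{1-a}."""
    return n ** (1 - a)

print(tail_prob(10_000), first_moment(10_000))  # 0.01 and 100.0
```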

Fact: Convergence in probability implies convergence in distribution: Indeed, for each $n = 1, 2, \ldots$ and $\varepsilon > 0$, we have
$$\mathbb{P}\left[X_n \leq x\right] \leq \mathbb{P}\left[X \leq x + \varepsilon\right] + \mathbb{P}\left[|X_n - X| \geq \varepsilon\right]$$
and
$$\mathbb{P}\left[X \leq x - \varepsilon\right] \leq \mathbb{P}\left[X_n \leq x\right] + \mathbb{P}\left[|X_n - X| \geq \varepsilon\right]$$
Thus,
$$\limsup_n \mathbb{P}\left[X_n \leq x\right] \leq \mathbb{P}\left[X \leq x + \varepsilon\right]$$
and
$$\mathbb{P}\left[X \leq x - \varepsilon\right] \leq \liminf_n \mathbb{P}\left[X_n \leq x\right]$$
Finally let $\varepsilon \downarrow 0$!

Converse is not true! With $Z \sim N(0, 1)$, take $X_n = (-1)^n Z$ for each $n = 1, 2, \ldots$. Obviously, $X_n \Rightarrow_n Z$ but
$$|X_n - Z| = |1 - (-1)^n|\, |Z|, \quad n = 1, 2, \ldots$$

However, not all is lost: If the sequence $\{X_n,\ n = 1, 2, \ldots\}$ converges in distribution to the a.s. constant rv $c$, then $X_n \to_P c$.

Every sequence converging in distribution to a constant converges to it in probability! Indeed, for each $n = 1, 2, \ldots$ and $\varepsilon > 0$, we have
$$\mathbb{P}\left[|X_n - c| \geq \varepsilon\right] = \mathbb{P}\left[X_n \geq c + \varepsilon\right] + \mathbb{P}\left[X_n \leq c - \varepsilon\right]$$
and both terms on the right go to zero since $c \pm \varepsilon$ are continuity points of the degenerate limiting distribution.

Problems: You know that
$$X_n \to_P X \quad \text{and} \quad Y_n \to_P Y$$

Convergence of sums: Is it true that
$$X_n + Y_n \to_P X + Y?$$
Yes, because for each $n = 1, 2, \ldots$, the event $\left[|(X_n + Y_n) - (X + Y)| > \varepsilon\right]$ is contained in
$$\left[|X_n - X| > \tfrac{\varepsilon}{2}\right] \cup \left[|Y_n - Y| > \tfrac{\varepsilon}{2}\right]$$

What if only
$$X_n \to_P X \quad \text{and} \quad Y_n \Rightarrow_n Y?$$
Counterexample: With $Z \sim N(0, 1)$, set
$$X_n = Z \quad \text{and} \quad Y_n = (-1)^n Z, \quad n = 1, 2, \ldots$$
so that
$$X_n + Y_n = (1 + (-1)^n) Z, \quad n = 1, 2, \ldots$$
It is plain that $X_n \to_P Z$ and $Y_n \Rightarrow_n Z$, but the convergence $X_n + Y_n \Rightarrow_n X + Y$ does not hold, hence $X_n + Y_n \to_P X + Y$ fails as well!

Joint convergence: Is it true that
$$(X_n, Y_n) \to_P (X, Y)?$$
Yes: argue as for sums, coordinate by coordinate.

Convergence under transformation: Is it true that
$$h(X_n) \to_P h(X)$$
with continuous $h : \mathbb{R}^d \to \mathbb{R}^p$?

Easy to see if $h : \mathbb{R}^d \to \mathbb{R}^p$ is uniformly continuous!

Fact: Convergence in the a.s. sense implies convergence in probability.

With $\varepsilon > 0$,
$$[X_n \text{ converges to } X] \subseteq \bigcup_{n=1}^{\infty} B_n(\varepsilon)$$
with the monotone increasing events
$$B_n(\varepsilon) = \bigcap_{m=n}^{\infty} \left[|X_m - X| \leq \varepsilon\right], \quad n = 1, 2, \ldots$$
Therefore, by monotonicity!
$$\mathbb{P}\left[X_n \text{ converges to } X\right] \leq \lim_n \mathbb{P}\left[B_n(\varepsilon)\right]$$

If $\mathbb{P}\left[X_n \text{ converges to } X\right] = 1$, then $\lim_n \mathbb{P}\left[B_n(\varepsilon)\right] = 1$ becomes
$$0 = \lim_n \mathbb{P}\left[B_n(\varepsilon)^c\right] = \lim_n \mathbb{P}\left[\bigcup_{m=n}^{\infty} \left[|X_m - X| > \varepsilon\right]\right]$$
by complementarity, whence
$$\lim_n \mathbb{P}\left[|X_n - X| > \varepsilon\right] = 0$$
Converse is not true!

However, not all is lost: Partial converse

If the sequence $\{X_n,\ n = 1, 2, \ldots\}$ converges in probability to the rv $X$, then there exists a sequence $\nu : \mathbb{N}_0 \to \mathbb{N}_0$ with
$$\nu_k < \nu_{k+1}, \quad k = 1, 2, \ldots \quad (\text{whence } \lim_k \nu_k = \infty)$$
such that
$$\lim_k X_{\nu_k} = X \quad \text{a.s.}$$
Thus, any sequence convergent in probability contains a deterministic subsequence which converges a.s. (to the same limit).

Borel-Cantelli Lemma

Consider a sequence of events $\{A_n,\ n = 1, 2, \ldots\}$, i.e., $A_n \in \mathcal{F}$, $n = 1, 2, \ldots$ Set
$$\limsup_n A_n = \bigcap_{n=1}^{\infty} \bigcup_{m=n}^{\infty} A_m = [A_n \text{ i.o.}]$$
and
$$\liminf_n A_n = \bigcup_{n=1}^{\infty} \bigcap_{m=n}^{\infty} A_m$$
Obviously
$$\liminf_n A_n \subseteq \limsup_n A_n$$

Fact: If
$$\sum_{n=1}^{\infty} \mathbb{P}\left[A_n\right] < \infty,$$
then
$$\mathbb{P}\left[\limsup_n A_n\right] = 0$$
Fact: Assume the events $\{A_n,\ n = 1, 2, \ldots\}$ to be mutually independent. If
$$\sum_{n=1}^{\infty} \mathbb{P}\left[A_n\right] = \infty,$$
then
$$\mathbb{P}\left[\limsup_n A_n\right] = 1$$
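A Monte Carlo sketch of the two regimes (the probabilities $n^{-2}$ and $n^{-1}$, the horizon, and the seed are illustrative choices): with independent $A_n$ and summable probabilities only finitely many events occur, while with divergent probabilities the count keeps growing.

```python
import random

random.seed(3)
N = 100_000
# Summable case: sum 1/n^2 < infinity, so only a handful of A_n occur.
count_summable = sum(random.random() < 1 / n ** 2 for n in range(1, N + 1))
# Divergent case: sum 1/n = infinity, so occurrences keep accumulating.
count_divergent = sum(random.random() < 1 / n for n in range(1, N + 1))
print(count_summable, count_divergent)
```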

Establishing a.s. convergence

How do we show that $\mathbb{P}\left[X_n \text{ converges to } X\right] = 1$?

With $\varepsilon > 0$,
$$A_n(\varepsilon) = \left[|X_n - X| \leq \varepsilon\right], \quad n = 1, 2, \ldots$$
and
$$B_n(\varepsilon) = \bigcap_{m=n}^{\infty} A_m(\varepsilon), \quad n = 1, 2, \ldots$$
so that
$$\bigcup_{n=1}^{\infty} B_n(\varepsilon) = \ldots$$

Key observation: By the definition of convergence,
$$[X_n \text{ converges to } X] = \bigcap_{\varepsilon > 0} \left(\bigcup_{n=1}^{\infty} B_n(\varepsilon)\right) = \bigcap_{k=1}^{\infty} \left(\bigcup_{n=1}^{\infty} B_n(k^{-1})\right)$$
Fact: Convergence takes place if for every $\varepsilon > 0$, we have
$$\mathbb{P}\left[\bigcup_{n=1}^{\infty} B_n(\varepsilon)\right] = 1$$
or equivalently, if
$$\mathbb{P}\left[\left(\bigcup_{n=1}^{\infty} B_n(\varepsilon)\right)^c\right] = 0$$

But
$$\mathbb{P}\left[\left(\bigcup_{n=1}^{\infty} B_n(\varepsilon)\right)^c\right] = \mathbb{P}\left[\bigcap_{n=1}^{\infty} \left(\bigcap_{m=n}^{\infty} A_m(\varepsilon)\right)^c\right] = \mathbb{P}\left[\bigcap_{n=1}^{\infty} \left(\bigcup_{m=n}^{\infty} A_m(\varepsilon)^c\right)\right] \quad (2)$$
so
$$\mathbb{P}\left[\left(\bigcup_{n=1}^{\infty} B_n(\varepsilon)\right)^c\right] = \lim_n \mathbb{P}\left[\bigcup_{m=n}^{\infty} A_m(\varepsilon)^c\right]$$
By a union bound argument, $\mathbb{P}\left[\left(\bigcup_{n=1}^{\infty} B_n(\varepsilon)\right)^c\right] = 0$ if
$$\sum_{n=1}^{\infty} \mathbb{P}\left[A_n(\varepsilon)^c\right] < \infty$$

Fact: We have
$$\mathbb{P}\left[X_n \text{ converges to } X\right] = 1$$
if for every $\varepsilon > 0$, we have
$$\sum_{n=1}^{\infty} \mathbb{P}\left[|X_n - X| > \varepsilon\right] < \infty$$
Instance of the Borel-Cantelli Lemma.

Interchanging limit and expectation

Consider the rvs $\{X_n,\ n = 1, 2, \ldots\}$ with
$$\mathbb{E}\left[|X_n|\right] < \infty, \quad n = 1, 2, \ldots$$
such that $X_n \to_P X$ for some rv $X$. When do we have that the limit of the expected values is the expected value of the limit:
$$\lim_n \mathbb{E}\left[X_n\right] = \mathbb{E}\left[X\right]?$$

What about using
- Monotone Convergence Theorem
- Bounded Convergence Theorem

Uniform integrability

The rvs $\{X_n,\ n = 1, 2, \ldots\}$ are uniformly integrable if
$$\lim_{c \to \infty} \left( \sup_{n = 1, 2, \ldots} \mathbb{E}\left[\mathbf{1}\left[|X_n| > c\right] |X_n|\right] \right) = 0$$
Easy test: The rvs $\{X_n,\ n = 1, 2, \ldots\}$ are uniformly integrable if for some $r > 1$, we have
$$\sup_{n = 1, 2, \ldots} \mathbb{E}\left[|X_n|^r\right] < \infty$$

Fact: Consider a collection of rvs $\{X,\ X_n,\ n = 1, 2, \ldots\}$ such that $X_n \Rightarrow_n X$. If the collection is uniformly integrable, then $\mathbb{E}\left[|X|\right] < \infty$ and
$$\lim_n \mathbb{E}\left[X_n\right] = \mathbb{E}\left[X\right]$$
For each $n = 1, 2, \ldots$ and $c > 0$, we have the decomposition
$$\mathbb{E}\left[X_n\right] - \mathbb{E}\left[X\right] = \left(\mathbb{E}\left[\mathbf{1}\left[|X_n| \leq c\right] X_n\right] - \mathbb{E}\left[\mathbf{1}\left[|X| \leq c\right] X\right]\right) + \mathbb{E}\left[\mathbf{1}\left[|X_n| > c\right] X_n\right] - \mathbb{E}\left[\mathbf{1}\left[|X| > c\right] X\right]$$
Converse available. No escape from uniform integrability!

Poisson's Theorem for sums of Bernoulli rvs

For each $n = 1, 2, \ldots$, the collection $\{B_{n,k}(p_n),\ k = 1, \ldots, k_n\}$ of i.i.d. Bernoulli($p_n$) rvs is defined on some probability triple $(\Omega_n, \mathcal{F}_n, \mathbb{P}_n)$. Write
$$S_n = \sum_{k=1}^{k_n} B_{n,k}(p_n), \quad n = 1, 2, \ldots$$
so that
$$\mathbb{E}\left[S_n\right] = \mathbb{E}\left[\sum_{k=1}^{k_n} B_{n,k}(p_n)\right] = \sum_{k=1}^{k_n} \mathbb{E}\left[B_{n,k}(p_n)\right] = k_n p_n$$

Theorem 1 If for some $\lambda > 0$,
$$\lim_n k_n = \infty \quad \text{and} \quad \lim_n k_n p_n = \lambda,$$
then
$$S_n \Rightarrow_n \mathrm{Poi}(\lambda)$$
with
$$\mathrm{Poi}(\lambda)(k) = \frac{\lambda^k}{k!} e^{-\lambda}, \quad k = 0, 1, \ldots$$
Historically: $k_n = n$ and $p_n = \frac{\lambda}{n}$.

Many variations on this theme!
- Chen-Stein method for Poisson approximation
- Point process version leads to ubiquity of Poisson modeling!
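A numerical sketch of Theorem 1 in its historical form $k_n = n$, $p_n = \lambda/n$ (the value $\lambda = 3$ and the range of $k$ checked are illustrative choices): the Binomial($n$, $\lambda/n$) pmf approaches the Poisson($\lambda$) pmf as $n$ grows.

```python
import math

lam = 3.0

def binom_pmf(n, k):
    """P[Bin(n, lam/n) = k]."""
    p = lam / n
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poi_pmf(k):
    """P[Poi(lam) = k]."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def max_gap(n):
    return max(abs(binom_pmf(n, k) - poi_pmf(k)) for k in range(20))

print(max_gap(10), max_gap(1000))  # the gap shrinks as n grows
```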

Strong Law of Large Numbers

Consider a collection $\{X,\ X_n,\ n = 1, 2, \ldots\}$ of i.i.d. rvs defined on the same probability triple $(\Omega, \mathcal{F}, \mathbb{P})$, and write
$$S_n = X_1 + \ldots + X_n, \quad n = 1, 2, \ldots$$
Theorem 2 If $\mathbb{E}\left[|X|\right] < \infty$, then
$$\lim_n \frac{S_n}{n} = \mathbb{E}\left[X\right] \quad \text{a.s.}$$
Frequentist definition of probability compatible with Kolmogorov's axiomatic model.
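A simulation sketch of the SLLN (the Uniform(0, 1) distribution, sample size, and seed are illustrative choices): the sample mean settles on $\mathbb{E}[X] = 1/2$.

```python
import random

# Sample mean of n i.i.d. Uniform(0, 1) rvs.
random.seed(4)
n = 100_000
s = sum(random.random() for _ in range(n))
print(s / n)  # close to 0.5
```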

Weak Law of Large Numbers

Consider a collection $\{X,\ X_n,\ n = 1, 2, \ldots\}$ of i.i.d. rvs defined on the same probability triple $(\Omega, \mathcal{F}, \mathbb{P})$, and write
$$S_n = X_1 + \ldots + X_n, \quad n = 1, 2, \ldots$$
Theorem 3 If $\mathbb{E}\left[|X|\right] < \infty$, then
$$\frac{S_n}{n} \to_P \mathbb{E}\left[X\right]$$
Many variations on this theme!

Markov's inequality at work (when second moments are available):

$$\mathrm{Var}\left[S_n\right] = \mathrm{Var}\left[X_1 + \ldots + X_n\right] = \sum_{i=1}^{n} \mathrm{Var}\left[X_i\right] + \sum_{k, l = 1,\ k \neq l}^{n} \mathrm{Cov}\left[X_k, X_l\right]$$
Here, under the i.i.d. assumptions,
$$\mathrm{Var}\left[\frac{S_n}{n}\right] = \frac{\mathrm{Var}\left[X\right]}{n}$$
so that
$$\mathbb{P}\left[\left|\frac{S_n}{n} - \mathbb{E}\left[X\right]\right| > \varepsilon\right] \leq \varepsilon^{-2}\, \mathrm{Var}\left[\frac{S_n}{n}\right]$$
This also works under weaker assumptions, e.g., uncorrelated rvs, etc.

Central Limit Theorem

Consider a collection $\{X,\ X_n,\ n = 1, 2, \ldots\}$ of i.i.d. rvs defined on the same probability triple $(\Omega, \mathcal{F}, \mathbb{P})$, and write
$$S_n = X_1 + \ldots + X_n, \quad n = 1, 2, \ldots$$
Theorem 4 If $\mathbb{E}\left[X^2\right] < \infty$, then
$$\sqrt{n}\left(\frac{S_n}{n} - \mathbb{E}\left[X\right]\right) \Rightarrow_n \sigma U$$
with $U \sim N(0, 1)$ and $\sigma^2 = \mathrm{Var}\left[X\right]$.
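A Monte Carlo sketch of the CLT (the Uniform(0, 1) distribution, sample sizes, and seed are illustrative choices): for $X \sim \mathrm{Uniform}(0, 1)$, the normalized sums $\sqrt{n}\,(S_n/n - 1/2)$ should look approximately $N(0, \sigma^2)$ with $\sigma^2 = 1/12$; we check the variance of the normalized sums.

```python
import random

random.seed(5)
n, reps = 1_000, 5_000
vals = []
for _ in range(reps):
    s = sum(random.random() for _ in range(n))
    vals.append((n ** 0.5) * (s / n - 0.5))  # sqrt(n) (S_n/n - E[X])

var = sum(v * v for v in vals) / reps
print(var)  # close to 1/12 = 0.0833...
```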