Lecture Notes 3 Convergence (Chapter 5)


1 Convergence of Random Variables

Let X_1, X_2, ... be a sequence of random variables and let X be another random variable. Let F_n denote the cdf of X_n and let F denote the cdf of X.

Example: A good example to keep in mind is the following. Let Y_1, Y_2, ... be a sequence of i.i.d. random variables and let

    X_n = (1/n) ∑_{i=1}^n Y_i

be the average of the first n of the Y_i's. This defines a new sequence X_1, X_2, ..., X_n. That is, the sequence of interest X_1, ..., X_n might be a sequence of statistics based on some other sequence of random variables.

1. X_n converges to X in probability, written X_n →^P X, if, for every ε > 0,

    P(|X_n - X| > ε) → 0 as n → ∞.    (1)

In other words, lim_{n→∞} P(|X_n - X| > ε) = 0 and X_n - X = o_P(1).

2. X_n converges almost surely to X, written X_n →^{a.s.} X, if, for every ε > 0,

    P(lim_{n→∞} |X_n - X| < ε) = 1.    (2)

This is also called strong convergence.

3. X_n converges to X in quadratic mean (also called convergence in L_2), written X_n →^{qm} X, if

    E(X_n - X)^2 → 0 as n → ∞.    (3)

4. X_n converges to X in distribution, written X_n →^d X, if

    lim_{n→∞} F_n(t) = F(t)    (4)

at all t at which F is continuous.
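As a quick numerical companion to the running example (not part of the original notes), the sketch below estimates P(|X_n - μ| > ε) by simulation, taking the Y_i to be i.i.d. Uniform(0, 1) purely for concreteness; the sample sizes, ε, and the number of replications are arbitrary choices.

```python
# Illustrative sketch (not from the notes): Monte Carlo estimate of P(|X_n - mu| > eps)
# for the running example X_n = (1/n) * sum of Y_1, ..., Y_n, taking the Y_i to be
# i.i.d. Uniform(0, 1) purely for concreteness (so mu = E(Y_i) = 1/2).
import numpy as np

rng = np.random.default_rng(0)
mu, eps, reps = 0.5, 0.05, 2000

for n in (10, 100, 1000, 5000):
    Y = rng.uniform(size=(reps, n))            # reps independent copies of (Y_1, ..., Y_n)
    Xbar = Y.mean(axis=1)                      # the corresponding values of X_n
    prob = np.mean(np.abs(Xbar - mu) > eps)    # estimate of P(|X_n - mu| > eps)
    print(f"n = {n:5d}   P(|X_n - mu| > {eps}) ~= {prob:.4f}")
```

The printed probabilities shrink toward 0 as n grows, which is exactly what convergence in probability asserts.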

Recall the following definition.

Definition 1 Z has a point mass distribution at a, written Z ~ δ_a, if P(Z = a) = 1, in which case

    F_Z(z) = δ_a(z) = { 0 if z < a
                        1 if z ≥ a }

and the probability mass function is f(z) = 1 for z = a and 0 otherwise.

Example 2 Consider flipping a coin for which the probability of heads is p. Let X_i denote the outcome of a single toss (0 or 1). Hence, p = P(X_i = 1) = E(X_i). The fraction of heads after n tosses is X̄_n. According to the law of large numbers, X̄_n converges to p in probability. This does not mean that X̄_n will numerically equal p. It means that, when n is large, the distribution of X̄_n is tightly concentrated around p. Suppose that p = 1/2. How large should n be so that P(.4 ≤ X̄_n ≤ .6) ≥ .7? First, E(X̄_n) = p = 1/2 and Var(X̄_n) = σ^2/n = p(1 - p)/n = 1/(4n). From Chebyshev's inequality,

    P(.4 ≤ X̄_n ≤ .6) = P(|X̄_n - μ| ≤ .1)
                      = 1 - P(|X̄_n - μ| > .1)
                      ≥ 1 - 1/(4n(.1)^2)
                      = 1 - 25/n.

The last expression is larger than .7 when n = 84.

When the limiting random variable is a point mass, we change the notation slightly.

1. If P(X = c) = 1 and X_n →^P X, then we write X_n →^P c.

2. X_n converges to c in quadratic mean, written X_n →^{qm} c, if E(X_n - c)^2 → 0 as n → ∞.

3. X_n converges to c in distribution, written X_n →^d c, if

    lim_{n→∞} F_n(t) = δ_c(t)

for all t ≠ c.

Suppose we are given a probability space (Ω, B, P). We say a statement about random elements holds almost surely (a.s.) if there exists an event N ∈ B with P(N) = 0 such that the statement holds if ω ∈ N^c. Alternatively, we may say the statement holds for a.a. (almost all) ω. The set N appearing in the definition is sometimes called the exception set. Here are several examples of statements that hold a.s.:
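As a sanity check on the Chebyshev calculation in Example 2 (an illustrative sketch, not part of the notes), the code below evaluates the bound 1 - 25/n at n = 84 and compares it with a Monte Carlo estimate of P(.4 ≤ X̄_n ≤ .6); the replication count is an arbitrary choice.

```python
# Sanity check (illustrative, not from the notes) of the Chebyshev bound in Example 2:
# at n = 84 and p = 1/2, the bound gives P(.4 <= Xbar_n <= .6) >= 1 - 25/n ~ 0.702.
import numpy as np

n, p, reps = 84, 0.5, 200_000
rng = np.random.default_rng(1)

chebyshev_bound = 1 - 25 / n                    # lower bound from the notes
Xbar = rng.binomial(n, p, size=reps) / n        # simulated sample means (fraction of heads)
simulated = np.mean((Xbar >= 0.4) & (Xbar <= 0.6))

print(f"Chebyshev lower bound : {chebyshev_bound:.4f}")
print(f"Simulated probability : {simulated:.4f}")   # noticeably larger; Chebyshev is a crude bound
```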

1. If {X_n} is a sequence of random variables, then "lim X_n exists a.s." means that there exists an event N ∈ B such that P(N) = 0 and, if ω ∈ N^c, then

    lim_{n→∞} X_n(ω) exists.

It also means that for a.a. ω,

    lim sup_{n→∞} X_n(ω) = lim inf_{n→∞} X_n(ω).

We will write lim X_n = X a.s., or X_n → X a.s., or X_n →^{a.s.} X.

2. X_n converges almost surely to a constant c, written X_n →^{a.s.} c, if there exists an event N ∈ B such that P(N) = 0 and, if ω ∈ N^c, then

    lim_{n→∞} X_n(ω) = c.

Example 3 (Almost sure convergence) Let the sample space S be [0, 1] with the uniform probability distribution. If the sample space S has elements denoted by s, then the random variables X_n(s) and X(s) are all functions defined on S. Define X_n(s) = s + s^n and X(s) = s. For every s ∈ [0, 1), s^n → 0 as n → ∞ and X_n(s) → s = X(s). However, X_n(1) = 2 for every n, so X_n(1) does not converge to 1 = X(1). Since the convergence occurs on the set [0, 1) and P([0, 1)) = 1, X_n →^{a.s.} X: that is, the functions X_n(s) converge to X(s) for all s ∈ S except for s ∈ N = {1}, where N ⊂ S and P(N) = 0. See Example CB 5.5.7.

Example 4 (Example CB 5.5.8) Continuing Example 3, let S = [0, 1] and let P be uniform on [0, 1]. Let X(s) = s and let

    X_1 = s + I_[0,1](s),   X_2 = s + I_[0,1/2](s),   X_3 = s + I_[1/2,1](s),
    X_4 = s + I_[0,1/3](s), X_5 = s + I_[1/3,2/3](s), X_6 = s + I_[2/3,1](s),

etc. It is straightforward to see that X_n converges to X in probability: as n → ∞, P(|X_n - X| > ε) is equal to the probability of an interval [a_n, b_n] of s values whose length goes to 0. Hence X_n →^P X. However, X_n does not converge to X almost surely. Indeed, there is no value of s ∈ S for which X_n(s) → s = X(s). For each s, the value X_n(s) alternates between s and s + 1 infinitely often; that is, X_n(s) does not converge to X(s). In other words, no pointwise convergence occurs for this sequence.
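The following sketch (my own indexing of the intervals, not from the notes) makes Example 4 concrete: block k consists of the k intervals of width 1/k. For a fixed s it shows X_n(s) jumping to s + 1 infinitely often, while the interval length, which equals P(|X_n - X| > ε) for 0 < ε < 1, shrinks to 0.

```python
# Illustrative sketch (my own indexing, not from the notes) of Example 4:
# X_n(s) = s + I_[a_n, b_n](s), where block k = 1, 2, 3, ... consists of the
# k intervals of width 1/k sweeping across [0, 1].

def interval(n):
    """Return (a_n, b_n) for X_n, n = 1, 2, 3, ..."""
    k = 1
    while n > k * (k + 1) // 2:     # find the block k containing index n
        k += 1
    j = n - k * (k - 1) // 2 - 1    # 0-based position within block k
    return j / k, (j + 1) / k

def X_n(n, s):
    a, b = interval(n)
    return s + (1.0 if a <= s <= b else 0.0)

s = 0.3                             # any fixed sample point
jumps = [n for n in range(1, 200) if X_n(n, s) == s + 1.0]
print("X_n(0.3) equals 1.3 at n =", jumps[:10], "... (infinitely often)")

# For 0 < eps < 1, P(|X_n - X| > eps) is the interval length b_n - a_n, which tends to 0.
for n in (1, 10, 100, 1000):
    a, b = interval(n)
    print(f"n = {n:5d}   P(|X_n - X| > eps) = {b - a:.4f}")
```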

You are not expected to know the following Theorem 5 for this class.

Theorem 5 X_n →^{a.s.} X if and only if, for every ε > 0,

    lim_{n→∞} P(sup_{m≥n} |X_m - X| ≤ ε) = 1.

Theorem 6 The following relationships hold:

(a) X_n →^{qm} X implies that X_n →^P X.
(b) X_n →^P X implies that X_n →^d X.
(c) If X_n →^d X and if P(X = c) = 1 for some real number c, then X_n →^P X.
(d) X_n →^{a.s.} X implies X_n →^P X.

In general, none of the reverse implications hold except the special case in (c).

Example 7 (Convergence in distribution) Let X_n ~ N(0, 1/n). Intuitively, X_n is concentrating at 0, so we would like to say that X_n converges to 0. Let's see if this is true. Note that √n X_n ~ N(0, 1). Let F be the distribution function for a point mass at 0: P(X = 0) = 1. Let Z denote a standard normal random variable. For t < 0,

    F_n(t) = P(X_n < t) = P(√n X_n < √n t) = P(Z < √n t) → 0

since √n t → -∞. For t > 0,

    F_n(t) = P(X_n < t) = P(√n X_n < √n t) = P(Z < √n t) → 1

since √n t → ∞. Hence, F_n(t) → F(t) for all t ≠ 0, and so X_n →^d 0. Notice that F_n(0) = 1/2 ≠ F(0) = 1, so convergence fails at t = 0. That doesn't matter, because t = 0 is not a continuity point of F, and the definition of convergence in distribution only requires convergence at continuity points.

Now convergence in probability follows from Theorem 6(c): X_n →^P 0. Here we also provide a direct proof. For any ε > 0, using Markov's inequality,

    P(|X_n| > ε) = P(|X_n|^2 > ε^2) ≤ E(X_n^2)/ε^2 = 1/(n ε^2) → 0

as n → ∞.
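A small numerical illustration of Example 7 (not part of the notes): since F_n(t) = Φ(√n t), the code below tabulates F_n(t) for t < 0, t = 0, and t > 0, and prints the Markov bound 1/(n ε^2) from the direct proof; the particular t values and ε are arbitrary.

```python
# Illustrative sketch (not from the notes) for Example 7, where X_n ~ N(0, 1/n)
# and F_n(t) = Phi(sqrt(n) * t): F_n(t) -> 0 for t < 0, -> 1 for t > 0, and
# F_n(0) = 1/2 at the discontinuity point of the limit cdf.
import math

def Phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

for t in (-0.1, 0.0, 0.1):
    values = [round(Phi(math.sqrt(n) * t), 4) for n in (1, 10, 100, 10000)]
    print(f"t = {t:+.1f}   F_n(t) for n = 1, 10, 100, 10000: {values}")

# The Markov bound from the direct proof: P(|X_n| > eps) <= 1/(n * eps^2).
eps = 0.1
print("Markov bound 1/(n*eps^2):", [round(1 / (n * eps**2), 4) for n in (100, 1000, 10000)])
```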

We will go through the proofs of Theorem 6(a)-(c) next time.

Proof of Theorem 6. We start by proving (a). Suppose that X_n →^{qm} X. Fix ε > 0. Then, using Markov's inequality,

    P(|X_n - X| > ε) = P(|X_n - X|^2 > ε^2) ≤ E|X_n - X|^2 / ε^2 → 0.

Proof of (b). Fix ε > 0 and let x be a continuity point of F. Then

    F_n(x) = P(X_n ≤ x)
           = P(X_n ≤ x, X ≤ x + ε) + P(X_n ≤ x, X > x + ε)
           ≤ P(X ≤ x + ε) + P(|X_n - X| > ε)
           = F(x + ε) + P(|X_n - X| > ε).

Also,

    F(x - ε) = P(X ≤ x - ε)
             = P(X ≤ x - ε, X_n ≤ x) + P(X ≤ x - ε, X_n > x)
             ≤ F_n(x) + P(|X_n - X| > ε).

Hence,

    F(x - ε) - P(|X_n - X| > ε) ≤ F_n(x) ≤ F(x + ε) + P(|X_n - X| > ε).

Take the limit as n → ∞ to conclude that

    F(x - ε) ≤ lim inf_{n→∞} F_n(x) ≤ lim sup_{n→∞} F_n(x) ≤ F(x + ε).

This holds for all ε > 0. Take the limit as ε → 0 and use the fact that F is continuous at x to conclude that lim_{n→∞} F_n(x) = F(x).

Proof of (c). Fix ε > 0. Then,

    P(|X_n - c| > ε) = P(X_n < c - ε) + P(X_n > c + ε)
                     ≤ P(X_n ≤ c - ε) + P(X_n > c + ε)
                     = F_n(c - ε) + 1 - F_n(c + ε)
                     → F(c - ε) + 1 - F(c + ε)
                     = 0 + 1 - 1 = 0.
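Purely as an illustration (not from the notes), the sketch below evaluates the bound used in the proof of (c) for the sequence X_n ~ N(0, 1/n) of Example 7 with c = 0, confirming numerically that F_n(c - ε) + 1 - F_n(c + ε) → 0.

```python
# Illustrative numerical check (not from the notes) of the final chain in the
# proof of (c), using Example 7's sequence X_n ~ N(0, 1/n) and c = 0:
# P(|X_n - c| > eps) <= F_n(c - eps) + 1 - F_n(c + eps), and the right side -> 0.
import math

def Phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def F_n(t, n):
    """cdf of N(0, 1/n) evaluated at t."""
    return Phi(math.sqrt(n) * t)

c, eps = 0.0, 0.05
for n in (10, 100, 1000, 10000):
    bound = F_n(c - eps, n) + 1.0 - F_n(c + eps, n)
    print(f"n = {n:6d}   F_n(c-eps) + 1 - F_n(c+eps) = {bound:.6f}")
```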

Warning! Convergence in probability does not imply convergence in quadratic mean. Let U ~ Unif(0, 1) and let X_n = √n I_(0,1/n)(U). Then

    P(|X_n| > ε) = P(√n I_(0,1/n)(U) > ε) = P(0 ≤ U < 1/n) = 1/n → 0.

Hence, X_n →^P 0. But E(X_n^2) = n ∫_0^{1/n} du = 1 for all n, so X_n does not converge in quadratic mean.

Convergence in distribution does not imply convergence in probability. Let X ~ N(0, 1) and let X_n = -X for n = 1, 2, 3, ...; hence X_n ~ N(0, 1). X_n has the same distribution function as X for all n, so, trivially, lim_{n→∞} F_n(x) = F(x) for all x. Therefore, X_n →^d X. But

    P(|X_n - X| > ε) = P(|2X| > ε) = P(|X| > ε/2),

which does not tend to 0. So X_n does not converge to X in probability.

One might conjecture that if X_n →^P b, then E(X_n) → b. This is not true. Let X_n be a random variable defined by P(X_n = n^2) = 1/n and P(X_n = 0) = 1 - (1/n). Now, P(|X_n| < ε) = P(X_n = 0) = 1 - (1/n) → 1. Hence, X_n →^P 0. However, E(X_n) = [n^2 × (1/n)] + [0 × (1 - (1/n))] = n. Thus, E(X_n) → ∞.

Example 8 Let X_1, ..., X_n ~ Uniform(0, 1) and let X_(n) = max_i X_i. First we claim that X_(n) →^P 1. This follows since

    P(|X_(n) - 1| > ε) = P(X_(n) ≤ 1 - ε) = ∏_i P(X_i ≤ 1 - ε) = (1 - ε)^n → 0.

Also,

    P(n(1 - X_(n)) ≤ t) = 1 - P(X_(n) ≤ 1 - (t/n)) = 1 - (1 - t/n)^n → 1 - e^{-t}.

So n(1 - X_(n)) →^d Exp(1).
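To close, an illustrative simulation of Example 8 (not part of the notes): it compares the empirical cdf of n(1 - X_(n)) with 1 - e^{-t} at a few points; the choices of n, the replication count, and the t values are arbitrary.

```python
# Illustrative simulation (not from the notes) of Example 8: for X_1, ..., X_n ~ Unif(0, 1),
# n(1 - X_(n)) is approximately Exp(1) when n is large.
import math
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 20_000

X = rng.uniform(size=(reps, n))
T = n * (1.0 - X.max(axis=1))            # replicated values of n(1 - X_(n))

for t in (0.5, 1.0, 2.0):
    empirical = np.mean(T <= t)
    limit = 1.0 - math.exp(-t)
    print(f"t = {t:.1f}   empirical cdf = {empirical:.4f}   1 - e^-t = {limit:.4f}")
```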