IEOR 6711: Stochastic Models I Fall 2013, Professor Whitt Lecture Notes, Thursday, September 5 Modes of Convergence

1 Overview

We started by stating the two principal laws of large numbers: the strong and weak forms, denoted by SLLN and WLLN. We want to be clear in our understanding of the statements; that led us to a careful definition of a random variable and now a careful examination of the basic modes of convergence for a sequence of random variables. We also want to focus on the proofs, but in this course (as in the course textbook) we consider only relatively simple proofs that apply under extra moment conditions. Even with these extra conditions, important proof techniques appear, which relate to the basic axioms of probability, in particular, to countable additivity, which plays a role in understanding and proving the Borel-Cantelli lemma (p. 4). We think that it is helpful to focus on these more elementary cases before considering the most general conditions.

Key reading for the present lecture: Sections 1.1-1.3, 1.7-1.8, and the Appendix, pp. 56-58.

2 Modes of Convergence

As general background, we need to understand the relations among the basic modes of convergence for a sequence of random variables. The main idea is to realize that there is more than one mode of convergence. Consider the following limits (as n -> infinity):

(i) X_n => X (convergence in distribution)

Definition: That means F_n(x) -> F(x) for all x that are continuity points of the cdf F, where F_n(x) = P(X_n <= x) and F(x) = P(X <= x). A point x is a continuity point of F if (and only if) F does not have a jump at x, i.e., if F(x) = F(x-), where F(x-) is the left limit of F at x. (Recall that F is right continuous by definition; since it is monotone, the left limits necessarily exist.)

(ii) E[X_n] -> E[X] (convergence of means)

This is just convergence for a sequence of numbers; the mean is only a partial summary of the full distribution. Obviously convergence of means is a relatively weak property.
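The role of continuity points in (i) can be seen in the simplest example: the deterministic sequence X_n = 1/n converges in distribution to X = 0, yet the cdfs fail to converge at the single jump point x = 0. A minimal sketch (the function names here are ours, for illustration only):

```python
# X_n = 1/n (deterministic) converges in distribution to X = 0, but at the
# discontinuity point x = 0 the cdfs do not converge:
# F_n(0) = P(1/n <= 0) = 0 for every n, while F(0) = P(0 <= 0) = 1.
def F_n(x: float, n: int) -> float:
    """cdf of the point mass at 1/n."""
    return 1.0 if x >= 1.0 / n else 0.0

def F(x: float) -> float:
    """cdf of the point mass at 0."""
    return 1.0 if x >= 0.0 else 0.0

# At continuity points x != 0 the values agree for all large n ...
assert F_n(0.5, 10) == F(0.5) == 1.0
assert F_n(-0.5, 10) == F(-0.5) == 0.0
# ... but at the jump x = 0 they never do:
print(F_n(0.0, 10**6), F(0.0))  # prints: 0.0 1.0
```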
The meaning of convergence of a sequence of numbers should be clear: lim_{n -> infinity} x_n = x means that: for all ε > 0, there exists an integer n_0 = n_0(ε) such that, for all n >= n_0, |x_n - x| < ε.

(iii) E[|X_n - X|] -> 0 (convergence in the mean, convergence in L^1)

This is convergence in the mean: for all ε > 0, there exists an integer n_0 = n_0(ε) such that, for all n >= n_0, E[|X_n - X|] < ε. We also sometimes consider convergence in the p-th mean, especially for p = 2.
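To see concretely that convergence of means (ii) is weaker than convergence in the mean (iii), take X symmetric with values ±1 and set X_n = -X for every n: then E[X_n] = E[X] = 0, and X_n even has the same distribution as X, yet E[|X_n - X|] = E[2|X|] = 2 for all n. A quick numerical sketch (this example is ours, not from the notes):

```python
import random

random.seed(0)
samples = [random.choice([-1, 1]) for _ in range(100_000)]  # draws of X

# X_n = -X for every n, so E[X_n] = -E[X] = 0 = E[X]: the means converge
# trivially.  But |X_n - X| = 2|X| = 2 always, so E|X_n - X| = 2, not 0.
mean_X = sum(samples) / len(samples)
mean_abs_diff = sum(abs(-x - x) for x in samples) / len(samples)
print(mean_X, mean_abs_diff)  # sample mean near 0; mean absolute gap exactly 2
```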

(iii') E[|X_n - X|^p] -> 0 (convergence in the p-th mean, convergence in L^p)

(iv) P(X_n -> X) = 1 (convergence w.p.1)

This is convergence with probability one (w.p.1). Elaborating, we are saying that P({s in S : lim_{n -> infinity} X_n(s) = X(s)}) = 1, where the limit inside is for a sequence of real numbers (for each s), and thus defined just as in (ii) above. (Here we use S for the underlying sample space. I alternate between S and Ω.)

(v) X_n -> X in probability

Here is a definition: For all ε > 0 and η > 0, there exists an integer n_0 such that, for all n >= n_0, P(|X_n - X| > ε) < η. In general, convergence in probability implies convergence in distribution. However, when the limiting random variable X is a constant, i.e., when P(X = c) = 1 for some constant c, the two modes of convergence are equivalent; e.g., see p. 27 of Billingsley, Convergence of Probability Measures, second ed., 1999.

References on this topic:
- limits for sequences of numbers: Chapter 3 of Rudin, Principles of Mathematical Analysis
- limits for sequences of functions: Chapter 7 of Rudin, Principles of Mathematical Analysis
- convergence concepts: Chapter 4 of Chung, A Course in Probability Theory
- expectation and convergence of random variables: some of Chapters V and VIII of Feller, vol. II
- related supplementary reading: Billingsley, Probability and Measure, Chapters 1, 6, 14, 21

3 Relations Among the Modes of Convergence

Consider the following limits (as n -> infinity):

(i) X_n => X (convergence in distribution)
(ii) E[X_n] -> E[X] (convergence of means)
(iii) E[|X_n - X|] -> 0 (convergence in the mean, also known as convergence in L^1, using standard functional analysis terminology)
(iv) P(X_n -> X) = 1 (convergence with probability 1 (w.p.1))
(v) X_n -> X in probability (convergence in probability)

What are the implications among these limits? For example, does limit (i) imply limit (ii)? See Chapter 4 of Chung, A Course in Probability Theory, for a discussion of convergence concepts. The following implications hold:

(iv) implies (v) implies (i); (iii) implies (ii); and (iii) implies (v) implies (i).
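As a concrete instance of convergence in probability (v), the WLLN for fair ±1 coin flips can be checked exactly: P(|S_n/n| > ε) is a sum of binomial probabilities and visibly shrinks as n grows. A sketch (the helper function below is ours, for illustration):

```python
import math

def prob_mean_exceeds(n: int, eps: float) -> float:
    """P(|S_n / n| > eps) for S_n a sum of n independent fair +/-1 flips.

    With H ~ Binomial(n, 1/2) heads, S_n = 2H - n, so we add the binomial
    weights of the heads counts h with |2h - n| > n * eps.
    """
    return sum(math.comb(n, h) / 2**n
               for h in range(n + 1) if abs(2 * h - n) > n * eps)

probs = [prob_mean_exceeds(n, 0.1) for n in (10, 100, 1000)]
print(probs)  # strictly decreasing toward 0
```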

Figure 1: Relations among the five modes of convergence for a sequence of random variables, X_n -> X as n -> infinity: convergence w.p.1, convergence in the mean, convergence in probability, convergence of the means, and convergence in distribution.

We remark that convergence in probability is actually equivalent to convergence in distribution, i.e., (i) also implies (v), for the special case in which the limiting random variable X is a constant. That explains why the WLLN can be expressed as convergence in distribution.

For all the relations that do not hold, it suffices to give two counterexamples. These examples involve specifying the underlying probability space and the random variables X_n and X, which are (measurable) functions mapping this space into the real line R. In class we used pictures to describe the functions, where the x-axis represented the domain and the y-axis represented the range. In both examples, the underlying probability space is taken to be the interval [0,1] with the uniform probability distribution.

Counterexample 1: To see that convergence with probability 1 (w.p.1) (iv) does not imply convergence of means (ii) (and thus not convergence in the mean (iii), since (iii) implies (ii)), let the underlying probability space be the unit interval [0,1] with the uniform distribution (which coincides with Lebesgue measure). Let X = 0 w.p.1 and let X_n = 2^n on the interval (a_n, a_n + 2^{-n}), where a_n = 2^{-1} + 2^{-2} + ... + 2^{-(n-1)} with a_1 = 0, and let X_n = 0 otherwise. Then P(X_n -> X = 0) = 1, but E[|X_n - X|] = E[X_n] = 1 for all n, while E[X] = 0. (To see that indeed P(X_n -> X = 0) = 1, note that the interval on which X_k is positive for some k > n has probability going to 0 as n -> infinity.) We could make E[X_n] explode (diverge to +infinity) if we redefined X_n to be n 2^n where it is positive.

From the example above, it follows that convergence in probability (v) does not imply convergence of means (ii) and that convergence in distribution (i) does not imply convergence of means (ii). However, convergence in distribution (i) does imply convergence of means (ii) under extra regularity conditions, namely, under uniform integrability. See p. 32 of Billingsley, Convergence of Probability Measures, 1968, for more discussion. If X_n => X and if {X_n : n >= 1} is uniformly integrable (UI), then E[X_n] -> E[X] as n -> infinity. The online sources give more. For a sample paper full of UI issues, see Wanmo Kang, Perwez Shahabuddin and WW, Exploiting Regenerative Structure to Estimate Finite Time Averages via Simulation, ACM Transactions on Modeling and Computer Simulation (TOMACS), vol. 17, No. 2, April 2007, Article 8, pp. 1-38. Available at: http://www.columbia.edu/~ww2040/recent.html

Counterexample 2: To see that convergence in the mean (iii) does not imply convergence w.p.1 (iv), again let the underlying probability space be the unit interval [0,1] with the uniform distribution (which coincides with Lebesgue measure). Let X = 0 w.p.1. Somewhat like before, let X_n = 1 on the interval (a_n, a_n + n^{-1}) (mod 1), where a_n = a_{n-1} + (n-1)^{-1} mod 1 with a_1 = 0, and let X_n = 0 otherwise. (The mod 1 means that there is wraparound from 1 back to 0.) To see that indeed P(X_n -> X) = 0, note that X_k = 1 infinitely often for each sample point, because the interval lengths n^{-1} sum to infinity, so the sliding blocks sweep across [0,1] infinitely many times. On the other hand, E[|X_n - X|] = E[X_n] = 1/n -> 0 as n -> infinity.

4 Reducing Everything to Convergence w.p.1

An important unifying concept is the Skorohod Representation Theorem; it shows how convergence in distribution can be expressed in terms of convergence w.p.1. See Homework 1. Also see Section 1.3 of the Internet Supplement to my book, available for downloading from my web page. It might also be helpful to read the introduction (3 pages) to my paper, Some Useful Functions for Functional Limit Theorems, Mathematics of Operations Research, vol. 5, No. 1, February 1980, pp. 67-85. This is also available for downloading.
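Returning to the two counterexamples, both hinge on exact quantities that are easy to verify: in Counterexample 1, E[X_n] = 2^n * 2^{-n} = 1 for every n, and in Counterexample 2 the interval lengths 1/n are not summable, so the sliding blocks wrap around [0,1] again and again. A sketch with exact rational arithmetic (illustrative only):

```python
from fractions import Fraction

# Counterexample 1: X_n = 2^n on an interval of length 2^{-n}, else 0,
# so E[X_n] = (value) * (interval length) = 2^n * 2^{-n} = 1 for every n,
# even though X_n -> 0 w.p.1.
E_Xn = [Fraction(2**n) * Fraction(1, 2**n) for n in range(1, 11)]

# Counterexample 2: X_n = 1 on a sliding block of length 1/n (mod 1).
# The total length swept by blocks 1..n is the harmonic sum H_n, which
# diverges, so the blocks complete pass after pass over [0,1]; every
# sample point is covered infinitely often.
wraps, swept = 0, Fraction(0)
for n in range(1, 200):
    swept += Fraction(1, n)
    if swept >= wraps + 1:   # completed another full pass over [0,1]
        wraps += 1
print(set(E_Xn), wraps)
```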
Both these sources point to the power of these ideas in the general context of stochastic processes instead of just real-valued random variables.

5 Proof of the SLLN

5.1 Moment Inequalities

We now turn to the proof of the SLLN. To appreciate the extra moment conditions imposed there, you want to know the following moment inequality.

Theorem 5.1 Suppose that r > p > 0. If E[|X|^r] < infinity, then E[|X|^p] < infinity.

Theorem 5.1 follows from:

Theorem 5.2 (Lyapunov's inequality) Suppose that r > p > 0. Then E[|X|^p] <= (E[|X|^r])^{p/r}.

Theorem 5.2 follows from Hölder's inequality:

Theorem 5.3 (Hölder's inequality) Suppose that p > 1, q > 1 and p^{-1} + q^{-1} = 1. Suppose that E[|X|^p] < infinity and E[|Y|^q] < infinity. Then E[|XY|] <= (E[|X|^p])^{1/p} (E[|Y|^q])^{1/q}.

Hölder's inequality in Theorem 5.3 can be proved by exploiting the concavity of the logarithm. To deduce Theorem 5.2, apply Hölder's inequality with X replaced by |X|^p and Y replaced by 1, letting the exponents be r/p > 1 and q such that (1/q) + (p/r) = 1. Google "law of large numbers" and "Hölder's inequality."

5.2 The Proof in the Appendix to Chapter 1 in Ross

The proof is given on pages 56-58 of Ross. The key idea is to apply the Borel-Cantelli lemma, Proposition 1.1.2 on p. 4, as Ross indicates on the bottom of p. 57. For further discussion, let the mean be 0. We apply the Borel-Cantelli lemma to show that, for all ε > 0,

  P(|S_n/n| > ε infinitely often) = 0.   (1)

We do so by showing that

  sum_{n=1}^{infinity} P(|S_n/n| > ε) < infinity.   (2)

There are two remaining issues: (i) How do we establish the desired inequality in (2), and (ii) why does that suffice to prove (1)?

To see why it suffices to prove (1), define the events

  A_k = {s in S : |S_n(s)/n| > 1/k infinitely often}, k >= 1.

Here "infinitely often" means for infinitely many n. We observe that

  {s in S : S_n(s)/n -> 0 as n -> infinity}^c = union_{k=1}^{infinity} A_k.

The events A_k are increasing in k, so by the continuity property of increasing sets, Proposition 1.1.1 on p. 2 of Ross, which itself is equivalent to the fundamental countable additivity axiom, we have

  P(union_{k=1}^{infinity} A_k) = lim_{K -> infinity} P(union_{k=1}^{K} A_k) = lim_{K -> infinity} P(A_K).

Hence, it suffices to show that P(A_K) = 0 for each K, which is equivalent to (1).

Now we turn to the second point: verifying (2). Here we exploit Markov's inequality, using the fourth power, getting

  P(|S_n/n| > ε) <= E[S_n^4] / (n^4 ε^4).

We then show that E[S_n^4] <= K n^2. (Details are given in Ross.) Hence, P(|S_n/n| > ε) <= K'/n^2 for some new constant K'. Thus we can deduce the required (2). (Note that this argument does not work if we use the second power instead of the fourth power: E[S_n^2] is of order n, which yields only a bound of order 1/n, and that is not summable.)
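The key estimate E[S_n^4] <= K n^2 can be checked exactly in the simplest case of fair ±1 coin flips, where E[S_n^4] = 3n^2 - 2n, so K = 3 works and Markov's inequality yields the summable bound 3/(n^2 ε^4). A sketch via exact enumeration over the binomial distribution (this computation is ours, for illustration):

```python
import math

def fourth_moment(n: int) -> float:
    """E[S_n^4] for S_n a sum of n independent fair +/-1 coin flips.

    With H ~ Binomial(n, 1/2) heads, S_n = 2H - n, so we average
    (2h - n)^4 over the binomial weights.  (The result is exact in
    floating point: the division is by a power of two.)
    """
    return sum(math.comb(n, h) * (2*h - n)**4 for h in range(n + 1)) / 2**n

for n in (1, 5, 20, 50):
    m4 = fourth_moment(n)
    assert m4 == 3*n**2 - 2*n   # closed form for +/-1 summands
    assert m4 <= 3 * n**2       # the K n^2 bound, with K = 3
```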

6 The Continuous Mapping Theorem

a. The continuous-mapping theorem [see Section 3.4 of my book online]:

Theorem 6.1 If X_n => X as n -> infinity and f is a continuous function, then f(X_n) => f(X) as n -> infinity.

This can be proved by applying the Skorohod representation theorem (Homework 1): We start by replacing the original random variables by new random variables, say X_n* and X*, on a new underlying probability space, that converge w.p.1 and have the property that the probability law of X_n* coincides with the probability law of X_n for each n and the probability law of X* coincides with the probability law of X. Then we get convergence f(X_n*) -> f(X*) w.p.1 by the assumed continuity. But this w.p.1 convergence implies convergence in distribution: f(X_n*) => f(X*), because w.p.1 convergence always implies convergence in distribution (or in law). However, since the starred random variables have the same distributions as the random variables without the stars, we have the desired f(X_n) => f(X). [See Theorem 0.1 in Problem 3 (d) in Homework 1; also see the Skorohod Representation Theorem at the end of Section 3.2 in my book; see Section 1.3 of the Internet Supplement for a proof in the context of separable metric spaces.]

b. If X_n => X and Y_n => Y, does (X_n, Y_n) => (X, Y)? [In general, the answer is no, but the answer is yes if either X or Y is deterministic. The answer is also yes if X_n is independent of Y_n for all n. See Section 11.4 of my book online.]

c. Some questions: Suppose that V_n => N(2,3), W_n => N(1,5), X_n => 7 and Y_n => 3 (convergence in distribution as n -> infinity).

(a) Does V_n^2 => N(2,3)^2? [Yes, by the continuous mapping theorem; see Section 3.4 of my book.]

(b) Is N(2,3)^2 =d= N(4,9), where =d= means equal in distribution? [No; the square of a normal random variable has a (scaled, noncentral) chi-squared distribution, not a normal distribution.]

(c) Does V_n e^{(V_n^3 - 12 V_n^2)} => N(2,3) e^{(N(2,3)^3 - 12 N(2,3)^2)}? [Yes, again by the continuous mapping theorem.]

(d) Does V_n + W_n => N(3,8)? [Not necessarily; it is true if V_n and W_n are independent for each n.]

(e) Does V_n + X_n => N(9,3)? [Yes; see Section 11.4 of my book. We first have (V_n, X_n) => (V, X), because the limit X = 7 is deterministic.]

(f) Does V_n^4 (e^{(X_n^2 + Y_n^3)} W_n V_n^{13} + 6) => N(2,3)^4 (e^{(49+27)} N(1,5) N(2,3)^{13} + 6)? [Yes, under extra assumptions to get the vector convergence (V_n, W_n, X_n, Y_n) => (V, W, X, Y), where the random variables on the right are independent (independence matters only for the random ones); independence of the sequences suffices. Then apply the continuous mapping theorem.]
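Question (b) above can be settled by a moment computation: if V ~ N(2,3), with 3 the variance, then E[V^2] = Var(V) + (E[V])^2 = 3 + 4 = 7, while N(4,9) has mean 4, so the two distributions differ even though the continuous mapping theorem does give V_n^2 => V^2. A simulation sketch (ours, for illustration):

```python
import math
import random

random.seed(1)
N = 200_000
# V ~ N(2, 3), reading N(mean, variance); random.gauss takes a std. dev.
v = [random.gauss(2.0, math.sqrt(3.0)) for _ in range(N)]

# E[V^2] = Var(V) + (E V)^2 = 3 + 4 = 7, but N(4, 9) has mean 4,
# so N(2,3)^2 cannot be N(4,9).
mean_v2 = sum(x * x for x in v) / N
print(mean_v2)  # close to 7, far from 4
```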

(g) Answer questions (a)-(e) above under the condition that convergence in distribution in all the limits is replaced by convergence with probability one (w.p.1). [Almost the same answers.]

(h) Answer questions (a)-(e) above under the condition that convergence in distribution in all the limits is replaced by convergence in probability. [Almost the same answers, because there is a w.p.1 representation of convergence in probability: X_n -> X in probability if and only if every subsequence {X_{n_k}} has a further subsubsequence {X_{n_k_j}} that converges to X w.p.1.]
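The subsequence characterization in (h) is visible in Counterexample 2: P(X_n != 0) = 1/n is not summable, so Borel-Cantelli cannot rule out infinitely many nonzero terms, but along the subsequence n_k = 2^k the probabilities 2^{-k} are summable, so Borel-Cantelli gives convergence to 0 w.p.1 along that subsequence. A sketch with exact arithmetic (illustrative only):

```python
from fractions import Fraction

# Full sequence: sum of P(X_n != 0) = sum 1/n is the harmonic series,
# which diverges, so Borel-Cantelli does not apply to the whole sequence.
harmonic = sum(Fraction(1, n) for n in range(1, 101))

# Subsequence n_k = 2^k: sum of P(X_{n_k} != 0) = sum 2^{-k} <= 1
# converges, so w.p.1 only finitely many X_{n_k} are nonzero.
geometric = sum(Fraction(1, 2**k) for k in range(1, 101))

print(float(harmonic), float(geometric))  # the first grows without bound; the second stays below 1
```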