IEOR 3106: Introduction to Operations Research: Stochastic Models. Fall 2011, Professor Whitt. Class Lecture Notes: Thursday, September 15.


Random Variables, Conditional Expectation and Transforms

1. Random Variables and Functions of Random Variables

(i) What is a random variable?

A (real-valued) random variable, often denoted by X (or some other capital letter), is a function mapping a probability space (S, P) into the real line R. This is shown in Figure 1. Associated with each point s in the domain S, the function X assigns one and only one value X(s) in the range R. (The set of possible values of X(s) is usually a proper subset of the real line; i.e., not all real numbers need occur. If S is a finite set with m elements, then X(s) can assume at most m different values as s varies in S.)

Figure 1: A (real-valued) random variable is a function X mapping a probability space (S, P) (the domain) into the real line R (the range).

As such, a random variable has a probability distribution. We usually do not care about the underlying probability space, and just talk about the random variable itself, but it is good to know the full formalism. The distribution of a random variable is defined formally in the obvious way:

  F(t) ≡ F_X(t) ≡ P(X ≤ t) ≡ P({s ∈ S : X(s) ≤ t}),

where ≡ means "equality by definition," P is the probability measure on the underlying sample space S, and {s ∈ S : X(s) ≤ t} is a subset of S, and thus an event in the underlying sample space S. See Section 2.1 of Ross; he puts this out very quickly. (Key point: recall that P attaches probabilities to events, which are subsets of S.)

If the underlying probability space is discrete, so that for any event E in the sample space S we have P(E) = Σ_{s ∈ E} p(s), where p is the probability mass function (pmf), then X also has a pmf p_X on a new sample space, say S_1, defined by

  p_X(r) ≡ P(X = r) ≡ P({s ∈ S : X(s) = r}) = Σ_{s : X(s) = r} p(s)  for r ∈ S_1.   (1)

Example 0.1 (roll of two dice) Consider a random roll of two dice. The natural sample space is S ≡ {(i, j) : 1 ≤ i ≤ 6, 1 ≤ j ≤ 6}, where each of the 36 points in S is assigned equal probability p(s) = 1/36. The random variable X might record the sum of the values on the two dice, i.e., X(s) ≡ X((i, j)) = i + j. Then the new sample space is S_1 = {2, 3, 4, ..., 12}. In this case, using formula (1), we get the pmf of X, p_X(r) ≡ P(X = r) for r ∈ S_1, where

  p_X(2) = p_X(12) = 1/36,  p_X(3) = p_X(11) = 2/36,  p_X(4) = p_X(10) = 3/36,
  p_X(5) = p_X(9) = 4/36,  p_X(6) = p_X(8) = 5/36,  p_X(7) = 6/36.

(ii) What is a function of a random variable?

Given that we understand what a random variable is, we are prepared to understand what a function of a random variable is. Suppose that we are given a random variable X mapping the probability space (S, P) into the real line R, and we are given a function h mapping R into R. Then h(X) is a function mapping the probability space (S, P) into R. As a consequence, h(X) is itself a new random variable, i.e., a new function mapping (S, P) into R, as depicted in Figure 2.
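Formula (1) is easy to check computationally. The sketch below (my own illustration, not part of the notes) builds the two-dice sample space of Example 0.1, defines X as a function on it, and forms the pmf p_X by summing p(s) over {s ∈ S : X(s) = r}:

```python
from fractions import Fraction
from collections import defaultdict

# Sample space S: all 36 ordered pairs (i, j), each with probability 1/36.
S = [(i, j) for i in range(1, 7) for j in range(1, 7)]
p = {s: Fraction(1, 36) for s in S}

def X(s):
    """The random variable X: the sum of the values on the two dice."""
    i, j = s
    return i + j

# Formula (1): p_X(r) = sum of p(s) over {s in S : X(s) = r}.
p_X = defaultdict(Fraction)
for s in S:
    p_X[X(s)] += p[s]

print(sorted(p_X.items()))  # p_X(2) = p_X(12) = 1/36, ..., p_X(7) = 6/36
```

Exact rational arithmetic (`Fraction`) is used so the output matches the fractions in the example rather than floating-point approximations.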
As a consequence, the distribution of the new random variable h(X) can be expressed in different (equivalent) ways:

  F_{h(X)}(t) ≡ P(h(X) ≤ t) ≡ P({s ∈ S : h(X(s)) ≤ t})
             = P_X({r ∈ R : h(r) ≤ t})
             = P_{h(X)}({k ∈ R : k ≤ t}),

where P is the probability measure on S in the first line, P_X is the probability measure on R (the distribution of X) in the second line, and P_{h(X)} is the probability measure on R (the distribution of the random variable h(X)) in the third line.

Figure 2: A (real-valued) function h of a random variable X is itself a random variable, i.e., a function mapping a probability space into the real line.

Example 0.2 (more on the roll of two dice) As in Example 0.1, consider a random roll of two dice. There we defined the random variable X to represent the sum of the values on the two rolls. Now let h(x) = |x − 7|, so that h(X) ≡ |X − 7| represents the absolute difference between the observed sum of the two rolls and the average value 7. Then h(X) has a pmf on a new probability space S_2 ≡ {0, 1, 2, 3, 4, 5}. In this case, using formula (1) yet again, we get the pmf of h(X), p_{h(X)}(k) ≡ P(h(X) = k) ≡ P({s ∈ S : h(X(s)) = k}) for k ∈ S_2, where

  p_{h(X)}(5) = P(|X − 7| = 5) = 2/36 = 1/18,
  p_{h(X)}(4) = P(|X − 7| = 4) = 4/36 = 2/18,
  p_{h(X)}(3) = P(|X − 7| = 3) = 6/36 = 3/18,
  p_{h(X)}(2) = P(|X − 7| = 2) = 8/36 = 4/18,
  p_{h(X)}(1) = P(|X − 7| = 1) = 10/36 = 5/18,
  p_{h(X)}(0) = P(|X − 7| = 0) = 6/36 = 3/18.
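As a quick check of Example 0.2 (again my own illustration, not part of the notes), the pmf of h(X) = |X − 7| can be computed in two equivalent ways: directly on the sample space S, or by first forming p_X and then pushing it through h. Both give the same answer, as the equivalent expressions for F_{h(X)} above suggest:

```python
from fractions import Fraction
from collections import defaultdict

S = [(i, j) for i in range(1, 7) for j in range(1, 7)]   # 36 equally likely points
p = {s: Fraction(1, 36) for s in S}

X = lambda s: s[0] + s[1]          # sum of the two dice
h = lambda x: abs(x - 7)           # h(x) = |x - 7|

# Way 1: sum p(s) over {s in S : h(X(s)) = k}.
p_hX = defaultdict(Fraction)
for s in S:
    p_hX[h(X(s))] += p[s]

# Way 2: first form p_X, then sum p_X(r) over {r : h(r) = k}.
p_X = defaultdict(Fraction)
for s in S:
    p_X[X(s)] += p[s]
p_hX2 = defaultdict(Fraction)
for r, pr in p_X.items():
    p_hX2[h(r)] += pr

assert p_hX == p_hX2
print(sorted(p_hX.items()))  # p_hX(0) = 6/36, p_hX(1) = 10/36, ..., p_hX(5) = 2/36
```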

In this setting we can compute probabilities for events associated with h(X) ≡ |X − 7| in three ways: using each of the pmf's p, p_X and p_{h(X)}.

(iii) How do we compute the expectation (or expected value) of a probability distribution or a random variable?

See Section 2.4. The expected value of a discrete probability distribution P is

  expected value = mean = Σ_k k P({k}) = Σ_k k p(k),

where P is the probability measure on S and p is the associated pmf, with p(k) ≡ P({k}). The expected value of a discrete random variable X is

  E[X] = Σ_k k P(X = k) = Σ_k k p_X(k) = Σ_{s ∈ S} X(s) P({s}) = Σ_{s ∈ S} X(s) p(s).

In the continuous case, with pdf's, we have corresponding formulas, but the story gets more complicated, involving calculus for computations. The expected value of a continuous probability distribution P with density f is

  expected value = mean = ∫ x f(x) dx.

The expected value of a continuous random variable X with pdf f_X is

  E[X] = ∫ x f_X(x) dx = ∫_S X(s) f(s) ds,

where f is the pdf on S and f_X is the pdf induced by X on R.

(iv) How do we compute the expectation of a function of a random variable?

Now we need to put everything above together. For simplicity, suppose S is a finite set, so that X and h(X) are necessarily finite-valued random variables. Then we can compute the expected value E[h(X)] in three different ways:

  E[h(X)] = Σ_{s ∈ S} h(X(s)) P({s}) = Σ_{s ∈ S} h(X(s)) p(s)
          = Σ_{r ∈ R} h(r) P(X = r) = Σ_{r ∈ R} h(r) p_X(r)
          = Σ_{t ∈ R} t P(h(X) = t) = Σ_{t ∈ R} t p_{h(X)}(t).

Similarly, we have the following expressions when all these probability distributions have probability density functions (the continuous case). First, suppose that the underlying probability distribution (measure) P on the sample space S has a probability density function (pdf) f. Then, under regularity conditions, the random variables X and h(X) have probability density functions f_X and f_{h(X)}, and we have:

  E[h(X)] = ∫_S h(X(s)) f(s) ds
          = ∫ h(r) f_X(r) dr
          = ∫ t f_{h(X)}(t) dt.

2. Random Vectors, Joint Distributions, and Conditional Distributions

We may want to talk about two or more random variables at once. For example, we may want to consider the two-dimensional random vector (X, Y).

(i) What does it mean for two random variables X and Y to be independent random variables?

See Section 2.5.2. Pay attention to "for all." We say that X and Y are independent random variables if

  P(X ≤ x, Y ≤ y) = P(X ≤ x) P(Y ≤ y)  for all x and y.

We can rewrite that in terms of cumulative distribution functions (cdf's): X and Y are independent random variables if

  F_{X,Y}(x, y) ≡ P(X ≤ x, Y ≤ y) = F_X(x) F_Y(y)  for all x and y.

When the random variables all have pdf's, that relation is equivalent to

  f_{X,Y}(x, y) = f_X(x) f_Y(y)  for all x and y.

(ii) What is the joint distribution of (X, Y) in general?

See Section 2.5. The joint distribution of X and Y is

  F_{X,Y}(x, y) ≡ P(X ≤ x, Y ≤ y).

(iii) How do we compute the conditional expectation of a random variable, given the value of another random variable, in the discrete case?

See Section 3.2. There are two steps: (1) find the conditional probability distribution, and (2) compute the expectation of the conditional distribution, just as you would compute the expected value of an unconditional distribution.

Here is an example. We first compute a conditional density; then we compute an expected value.

Example 3.6 Here we consider conditional expectation in the case of continuous random variables. We now work with joint probability density functions and conditional probability density functions. We start with the joint pdf f_{X,Y}(x, y). The definition of the conditional pdf is

  f_{X|Y}(x|y) ≡ f_{X,Y}(x, y) / f_Y(y),

where the pdf of Y, f_Y(y), can be found from the given joint pdf by

  f_Y(y) ≡ ∫ f_{X,Y}(x, y) dx.

Then we compute E[X | Y = y] by computing the ordinary expected value

  E[X | Y = y] = ∫ x f_{X|Y}(x|y) dx,

treating the conditional pdf as a function of x, just like an ordinary pdf of x.

Example 3.13 in the 10th ed. (Example 3.12 in the 9th ed.) This is the trapped-miner example, another example with three doors. It shows how we can compute expected values by setting up a simple linear equation with one unknown. This is a common trick, worth knowing. As stated, the problem does not make much sense, because the miner would not make a new decision, independent of his past decisions, when he returns to his starting point. So think of the miner as a robot, programmed to make choices at random, independently of past choices. That is not even a very good robot. But even then the expected time to get out is not so large.

3. Moment Generating Functions

Given a random variable X, the moment generating function (mgf) of X (really of its probability distribution) is

  ψ_X(t) ≡ E[e^{tX}],

which is a function of the real variable t; see Section 2.6 of Ross. (I here use ψ, whereas Ross uses φ.) An mgf is an example of a transform. The random variable could have a continuous distribution or a discrete distribution.

Discrete case: Given a random variable X with a probability mass function (pmf)

  p_n ≡ P(X = n), n ≥ 0,

the mgf of X (really of its probability distribution) is

  ψ_X(t) ≡ E[e^{tX}] ≡ Σ_{n=0}^∞ p_n e^{tn}.

The transform maps the pmf {p_n : n ≥ 0} (a function of n) into the associated function of t.

Continuous case: Given a random variable X with a probability density function (pdf) f ≡ f_X on the entire real line, the mgf of X (really of its probability distribution) is

  ψ(t) ≡ ψ_X(t) ≡ E[e^{tX}] ≡ ∫ f(x) e^{tx} dx.

In the continuous case, the transform maps the pdf {f(x) : x ∈ R} (a function of x) into the associated function of t.
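The continuous-case definition can be sanity-checked numerically. The sketch below is my own illustration, not from the notes; the rate-1 exponential, the truncation level, and the step sizes are all assumptions. It evaluates E[e^{tX}] = ∫ f(x) e^{tx} dx for the exponential pdf f(x) = e^{−x}, x > 0, compares it with the known closed form ψ(t) = 1/(1 − t) for t < 1, and then recovers the moments E[X] = ψ′(0) = 1 and E[X²] = ψ″(0) = 2 by finite differences:

```python
import math

def psi_numeric(t, upper=60.0, n=200000):
    """E[e^{tX}] for X ~ Exponential(1), by a midpoint Riemann sum
    of f(x) e^{tx} = e^{-(1-t)x} over [0, upper]."""
    dx = upper / n
    return sum(math.exp(-(1.0 - t) * (k + 0.5) * dx) for k in range(n)) * dx

# Compare with the closed form psi(t) = 1/(1 - t), valid for t < 1.
for t in (0.0, 0.2, 0.5):
    assert abs(psi_numeric(t) - 1.0 / (1.0 - t)) < 1e-3

# Moments from derivatives of the mgf at t = 0 (central differences).
h = 1e-3
d1 = (psi_numeric(h) - psi_numeric(-h)) / (2 * h)                     # ~ E[X]   = 1
d2 = (psi_numeric(h) - 2 * psi_numeric(0.0) + psi_numeric(-h)) / h**2  # ~ E[X^2] = 2
print("E[X] ~", round(d1, 3), " E[X^2] ~", round(d2, 3))
```

The same idea works for the discrete case, with the integral replaced by the sum Σ p_n e^{tn}.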
A major difficulty with the mgf is that it may be infinite or may not be defined. For example, if X has the pdf

  f(x) ≡ A/(1 + x)^p, x > 0, for p > 1,

then the mgf is infinite for all t > 0. Similarly, if X has the pmf p(n) ≡ A/n^p for n = 1, 2, ..., then the mgf is infinite for all t > 0. As a consequence, probabilists often use other transforms. In particular, the characteristic function E[e^{itX}], where i ≡ √−1, is designed to avoid this problem. We will not be using complex numbers in this class.

Two major uses of mgf's are: (i) calculating moments and (ii) characterizing the probability distributions of sums of random variables. Below are some illustrative examples. We did not do the Poisson example, but we did do the normal example, and a bit more. We showed that the sum of two independent normal random variables is again normally distributed, with mean equal to the sum of the means and variance equal to the sum of the variances. That is easily done with mgf's.

Examples 2.37, 2.40 (2.36, 2.40 in the 9th ed.): Poisson
Example 2.43 (2.42 in the 9th ed.): Normal
Heuristic proof of the CLT: pp. 83-84 (pp. 82-83 in the 9th ed.)

A full careful proof follows these same steps, but justifies everything rigorously; see below.

4. OPTIONAL EXTRA MATERIAL: Proofs

In this final section we provide proofs of the weak law of large numbers (WLLN) and the central limit theorem (CLT). Both of these fundamental results give conditions for convergence in distribution, denoted by ⇒. We use characteristic functions (cf's), which are closely related to mgf's. This material is beyond the scope of this course, but here it is anyway, in case you are interested. A good reference is Chapter 6 of Chung, A Course in Probability Theory, a great book on measure-theoretic probability.

The characteristic function (cf) is the mgf with an extra imaginary number i ≡ √−1:

  φ(t) ≡ φ_X(t) ≡ E[e^{itX}].

The random variable could have a continuous distribution or a discrete distribution. Unlike mgf's, every probability distribution has a well-defined cf. The reason is that e^{it} is very different from e^t. In particular,

  e^{itx} = cos(tx) + i sin(tx),

so the modulus (norm, absolute value) of e^{itx} is always finite:

  |e^{itx}| = √(cos(tx)² + sin(tx)²) = 1,

by virtue of the classic trig identity.

The key result behind these proofs is the continuity theorem for cf's.

Theorem 0.1 (continuity theorem) Suppose that X_n, n ≥ 1, and X are real-valued random variables. Let φ_n and φ be their characteristic functions, which necessarily are well defined. Then

  X_n ⇒ X as n → ∞

if and only if

  φ_n(t) → φ(t) as n → ∞ for all t.

Now, to prove the WLLN (convergence in probability, which is equivalent to convergence in distribution here, because the limit is deterministic) and the CLT, we exploit the continuity theorem for cf's and the following two lemmas.

Lemma 0.1 (convergence to an exponential) If {c_n : n ≥ 1} is a sequence of complex numbers such that c_n → c as n → ∞, then

  (1 + (c_n/n))^n → e^c as n → ∞.

The next lemma is the classical Taylor series approximation applied to the cf.

Lemma 0.2 (Taylor's theorem) If E[|X|^k] < ∞, then the following version of Taylor's theorem is valid for the characteristic function φ(t) ≡ E[e^{itX}]:

  φ(t) = Σ_{j=0}^{k} E[X^j](it)^j / j! + o(t^k) as t → 0,

where o(t^k) is understood to be a quantity (function of t) such that o(t^k)/t^k → 0 as t → 0.

Suppose that {X_n : n ≥ 1} is a sequence of IID random variables. Let

  S_n ≡ X_1 + ... + X_n, n ≥ 1.

Theorem 0.2 (WLLN) If E[|X|] < ∞, then

  S_n/n ⇒ E[X] as n → ∞.

Proof. Look at the cf of S_n/n:

  φ_{S_n/n}(t) ≡ E[e^{itS_n/n}] = φ_X(t/n)^n = (1 + (itE[X]/n) + o(t/n))^n

by the second lemma above. Hence, we can apply the first lemma to deduce that

  φ_{S_n/n}(t) → e^{itE[X]} as n → ∞.

By the continuity theorem for cf's (convergence in distribution is equivalent to convergence of cf's), the WLLN is proved.

Theorem 0.3 (CLT) If E[X²] < ∞, then

  (S_n − nE[X]) / √(nσ²) ⇒ N(0, 1) as n → ∞,

where σ² = Var(X).

Proof. For simplicity, consider the case E[X] = 0; we get the general case after subtracting the mean. Look at the cf of S_n/√(nσ²):

  φ_{S_n/√(nσ²)}(t) ≡ E[e^{itS_n/√(nσ²)}] = φ_X(t/√(nσ²))^n
    = (1 + (it/√(nσ²))E[X] + ((it/√(nσ²))²/2)E[X²] + o(t²/n))^n
    = (1 − (t²/2n) + o(t²/n))^n → e^{−t²/2} = φ_{N(0,1)}(t)

by the two lemmas above. Thus, by the continuity theorem, the CLT is proved.
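Both limit theorems can be illustrated by simulation. The sketch below is my own illustration, not part of the notes; the Uniform(0, 1) distribution, the seed, and the sample sizes are arbitrary choices. For IID Uniform(0, 1) variables, E[X] = 1/2 and σ² = 1/12, so the sample mean S_n/n should settle near 1/2 (WLLN), and the normalized sums (S_n − n/2)/√(n/12) should look approximately N(0, 1) (CLT):

```python
import math
import random

random.seed(2011)
n, reps = 2000, 4000
mu, sigma2 = 0.5, 1.0 / 12.0   # mean and variance of Uniform(0, 1)

# WLLN: one long run -- the sample mean approaches E[X] = 1/2.
S_n = sum(random.random() for _ in range(n))
print("sample mean:", round(S_n / n, 4))

# CLT: many replications of the normalized sum (S_n - n*mu)/sqrt(n*sigma^2).
Z = [(sum(random.random() for _ in range(n)) - n * mu) / math.sqrt(n * sigma2)
     for _ in range(reps)]
m = sum(Z) / reps
v = sum((z - m) ** 2 for z in Z) / (reps - 1)
print("normalized sums: mean ~", round(m, 2), " variance ~", round(v, 2))

# Roughly 68% of a N(0,1) sample should fall in (-1, 1).
frac = sum(abs(z) < 1 for z in Z) / reps
print("fraction in (-1, 1):", round(frac, 2))
```

The empirical mean and variance of the normalized sums should be close to 0 and 1, and about 68% of them should land in (−1, 1), matching the standard normal limit.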