MATH442601 2 Notebook 5, Fall 2018/2019

prepared by Professor Jenny Baglivo

© Copyright 2004-2019 by Jenny A. Baglivo. All Rights Reserved.

Contents

5 MATH442601 2 Notebook 5
  5.1 Sequences of IID Random Variables
    5.1.1 Running Sums and Running Averages
  5.2 Law of Large Numbers (LLN)
  5.3 Central Limit Theorem (CLT)
    5.3.1 Special Cases
  5.4 Moment Generating Functions and the Central Limit Theorem


5 MATH442601 2 Notebook 5

This notebook is concerned with patterns that emerge as the number of repetitions of an experiment grows large. We will study two important theorems: the law of large numbers and the central limit theorem. The notes correspond to material in Chapter 5 of the Rice textbook.

5.1 Sequences of IID Random Variables

Let $X_1, X_2, X_3, \ldots$ be a sequence of mutually independent, identically distributed (IID) random variables, each with the same distribution as $X$.

5.1.1 Running Sums and Running Averages

Two related sequences are of interest:

1. Sequence of Running Sums: $S_m = \sum_{i=1}^{m} X_i$, for $m = 1, 2, 3, \ldots$.

2. Sequence of Running Averages: $\bar{X}_m = \frac{1}{m} \sum_{i=1}^{m} X_i$, for $m = 1, 2, 3, \ldots$.

Example. Consider tossing a fair coin and recording $X = 1$ if a head appears and $X = 0$ otherwise. I used the computer to simulate 30 coin tosses, with the following results:

1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0

[Plots of the observed sequences of running sums (left) and running averages (right) are omitted.]

Note that $X$ is a Bernoulli random variable with $p = \frac{1}{2}$ and, for each $m$, $S_m$ is a binomial random variable based on $m$ trials of a Bernoulli experiment with success probability $\frac{1}{2}$.
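
For readers who want to reproduce this experiment, here is a minimal simulation sketch (my own addition, not part of the notes; the NumPy library and the seed are my choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 30 tosses of a fair coin: X = 1 for heads, X = 0 for tails.
x = rng.integers(0, 2, size=30)

# Running sums S_m and running averages X-bar_m, for m = 1, ..., 30.
m = np.arange(1, len(x) + 1)
running_sums = np.cumsum(x)
running_averages = running_sums / m

print(running_sums)
print(running_averages)
```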

5.2 Law of Large Numbers (LLN)

For distributions with finite mean and standard deviation, the law of large numbers says that the sample mean, $\bar{X}$, is unlikely to be far from the true mean, $\mu$, when the sample size is large enough. Formally,

Theorem (Law of Large Numbers). Let $X$ be a random variable with mean $\mu = E(X)$ and standard deviation $\sigma = SD(X)$, and let $\epsilon$ be a positive real number. Then
\[ \lim_{m \to \infty} P\left( \left| \bar{X}_m - \mu \right| \ge \epsilon \right) = 0. \]

Example. To illustrate the law of large numbers, I used the computer to simulate 10,000 repetitions of a Bernoulli experiment ($p = 1/2$) and of a Cauchy experiment ($a = 0$, $b = 1$).

[Plots omitted: the left column showed five observed sequences of running averages from the Bernoulli experiment, and the right column showed five observed sequences of running averages from the Cauchy experiment.]

Left plots (Bernoulli): The running averages are very nearly equal to $\frac{1}{2}$ for large $m$.

Right plots (Cauchy): The running averages can remain far from the center of the Cauchy distribution, since this distribution has no mean and no standard deviation, so the law of large numbers does not apply.
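
The contrast between the two experiments is easy to reproduce. The following sketch (my addition; NumPy and the seed are assumptions) simulates five sequences of running averages from each distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 10_000
steps = np.arange(1, m + 1)

# Five observed sequences of running averages from a Bernoulli(1/2)
# experiment: these settle near 0.5, as the LLN predicts.
bernoulli = rng.integers(0, 2, size=(n, m))
bernoulli_avgs = np.cumsum(bernoulli, axis=1) / steps

# Five sequences from a Cauchy(a = 0, b = 1) experiment: the LLN does
# not apply (no mean exists), and the running averages keep wandering.
cauchy = rng.standard_cauchy(size=(n, m))
cauchy_avgs = np.cumsum(cauchy, axis=1) / steps

print(bernoulli_avgs[:, -1])  # all close to 0.5
print(cauchy_avgs[:, -1])     # typically not close to 0
```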

Exercise. One form of the Chebyshev inequality says that
\[ P\left( |Y - \mu_y| \ge k\,\sigma_y \right) \le \frac{1}{k^2}, \]
where $Y$ is a random variable with mean $\mu_y = E(Y)$ and standard deviation $\sigma_y = SD(Y)$, and $k$ is a positive constant. Use this form to prove the law of large numbers.
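
For reference, one possible outline of the argument (a sketch of my own; the exercise itself is left to the reader): apply Chebyshev to $Y = \bar{X}_m$, which has mean $\mu$ and standard deviation $\sigma/\sqrt{m}$. Given $\epsilon > 0$, choose $k = \epsilon\sqrt{m}/\sigma$, so that $k\,\sigma_y = \epsilon$. Then
\[ P\left( \left| \bar{X}_m - \mu \right| \ge \epsilon \right) \le \frac{1}{k^2} = \frac{\sigma^2}{m\,\epsilon^2} \to 0 \quad \text{as } m \to \infty. \]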

Monte Carlo Evaluation of Integrals. An interesting application of the law of large numbers is the approximation of multiple integrals.

Example. Consider evaluating the double integral
\[ \int_0^1 \int_0^2 e^{xy} \, dy \, dx, \]
and let $R = [0,1] \times [0,2]$ be the region of integration. [Note that the value of the double integral is the volume of a solid over $R$; the plot of the solid is omitted.]

Assume that $(X, Y)$ has a bivariate uniform distribution on $R$, and let $W = e^{XY}$.

1. Since the area of $R$ is 2, the expected value of $W$ is
\[ \mu = E(W) = E(e^{XY}) = \int_0^1 \int_0^2 e^{xy} \cdot \frac{1}{2} \, dy \, dx = \frac{1}{2} \int_0^1 \int_0^2 e^{xy} \, dy \, dx, \]
and $2\mu$ is the double integral we want.

2. To approximate the double integral using Monte Carlo simulation, consider the sequence
\[ 2\overline{W}_m = \frac{2}{m} \sum_{i=1}^{m} e^{X_i Y_i}, \quad \text{for } m = 1, 2, 3, \ldots, \]
where the $(X_i, Y_i)$ are drawn from the bivariate uniform distribution on the rectangle. By the law of large numbers, $2\overline{W}_m$ is unlikely to be far from $2\mu$ when $m$ is large.

3. I used the computer to simulate 100,000 repetitions of the bivariate uniform experiment, and I plotted the sequence of observed estimates $2\overline{w}_m$, for $m = 1, 2, 3, \ldots, 100000$. [Plot omitted.]
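
A minimal sketch of this Monte Carlo computation (my addition; NumPy and the seed are assumptions, and the quoted exact value is from a series evaluation of the integral):

```python
import numpy as np

rng = np.random.default_rng(2)

# Draw 100,000 points (X, Y) uniformly on the rectangle R = [0, 1] x [0, 2].
m = 100_000
x = rng.uniform(0.0, 1.0, size=m)
y = rng.uniform(0.0, 2.0, size=m)

# 2 * (average of e^{X Y}) estimates the double integral over R,
# since the area of R is 2. The exact value is about 3.68.
estimate = 2.0 * np.mean(np.exp(x * y))
print(estimate)
```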

5.3 Central Limit Theorem (CLT)

The most important theorem in a probability course is the central limit theorem. Its proof is attributed to P.-S. Laplace and A. de Moivre.

Theorem (Central Limit Theorem). Let $X_1, X_2, X_3, \ldots$ be a sequence of independent, identically distributed random variables, each with the same distribution as $X$. Assume that the moment generating function of $X$ converges in an open interval containing zero, and let $\mu = E(X)$ and $\sigma = SD(X)$. Then, for every real number $x$,
\[ \lim_{m \to \infty} P\left( \frac{S_m - m\mu}{\sigma\sqrt{m}} \le x \right) = \Phi(x), \]
where $S_m = \sum_{i=1}^{m} X_i$, and $\Phi(\cdot)$ is the CDF of the standard normal random variable.

Notes:

1. Convergence in Distribution: The result of the central limit theorem is an example of convergence in distribution. Specifically, let
\[ Z_m = \frac{S_m - m\mu}{\sigma\sqrt{m}}, \quad \text{for } m = 1, 2, 3, \ldots, \]
be the sequence of standardized running sums, and let $F_m(x) = P(Z_m \le x)$ be the CDF of $Z_m$, for each $m$. The central limit theorem says that the sequence of CDFs converges to the CDF of the standard normal random variable. That is,
\[ F_m(x) \to \Phi(x), \quad \text{as } m \to \infty. \]
Thus, we say that $Z_m$ converges in distribution to the standard normal random variable.

2. Continuity Theorem: The proof of the central limit theorem uses a property of moment generating functions known as the continuity theorem.

Continuity Theorem. Let $\{F_m(x)\}$ be a sequence of cumulative distribution functions for random variables with corresponding moment generating functions $\{M_m(t)\}$, and let $F(x)$ be the CDF of a random variable whose moment generating function is $M(t)$. If $M_m(t) \to M(t)$ for all $t$ in an open interval containing 0, then $F_m(x) \to F(x)$ at all continuity points of the CDF $F(x)$.

In our application of the continuity theorem, $M_m(t)$ is the moment generating function of the standardized sum $Z_m = \frac{S_m - m\mu}{\sigma\sqrt{m}}$, and we can demonstrate that
\[ M_m(t) \to e^{t^2/2} \]
for all $t$ in an open interval containing 0. Since $M(t) = e^{t^2/2}$ is the moment generating function of the standard normal random variable, the result follows.
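
Convergence in distribution can be seen numerically. The following sketch (my addition; NumPy, SciPy, the exponential choice, and the seed are all assumptions) compares the empirical CDF of the standardized sum $Z_m$ with $\Phi$ at a few points:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

# Take X ~ Exponential(1), so mu = 1 and sigma = 1. Standardize the sum
# of m IID copies and compare its empirical CDF with Phi at a few points.
mu, sigma, m, reps = 1.0, 1.0, 100, 50_000
sums = rng.exponential(scale=1.0, size=(reps, m)).sum(axis=1)
z = (sums - m * mu) / (sigma * np.sqrt(m))

for x in (-1.0, 0.0, 1.0, 2.0):
    print(x, np.mean(z <= x), norm.cdf(x))  # the two columns nearly agree
```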

3. Normal Approximation to a Sample Sum: Since $Z_m$ converges in distribution to the standard normal, we can use probabilities based on the standard normal random variable to approximate probabilities for sums of independent, identically distributed (IID) random variables when $m$ is large enough. That is,
\[ P(S_m \le s) \approx \Phi\left( \frac{s - m\mu}{\sigma\sqrt{m}} \right), \quad \text{for each } s, \]
when $m$ is large enough.

4. Rate of Convergence: The central limit theorem tells us nothing about the rate of convergence of the sequence of CDFs. A plot (omitted here) illustrates the speed of convergence for the finite discrete random variable $X$ whose PMF is as follows:

    x      1     2     3     4     5
    p(x)   0.30  0.25  0.20  0.15  0.10

The plot showed the probability histograms of $S_1 = X, S_2, S_3, \ldots, S_{10}$; note that the distribution of $S_{10}$ is already approximately bell-shaped. The convergence for other types of distributions may take much longer. For example, if the distribution of $X$ is very skewed, then the convergence will be slow. Thus, we need to be cautious when using normal approximations.

5. Normal Approximation to a Sample Mean: If we divide the numerator and denominator of the standardized sum $Z_m$ by $m$, we see that
\[ Z_m = \frac{S_m - m\mu}{\sigma\sqrt{m}} = \frac{(S_m - m\mu)/m}{(\sigma\sqrt{m})/m} = \frac{\bar{X}_m - \mu}{\sigma/\sqrt{m}}, \quad \text{for } m = 1, 2, 3, \ldots. \]
Thus, the central limit theorem tells us that we can use probabilities based on the standard normal random variable to approximate probabilities for averages of independent, identically distributed (IID) random variables when $m$ is large enough. That is,
\[ P(\bar{X}_m \le x) \approx \Phi\left( \frac{x - \mu}{\sigma/\sqrt{m}} \right), \quad \text{for each } x. \]

6. Normal Random Variable X: If $X$ is a normal random variable, the formulas in items 3 and 5 are exact. That is, the distributions of the sample sum and of the sample mean are exactly normal for all $m$.
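
The claim in note 4 can be verified exactly. The following sketch (my addition; NumPy assumed) computes the exact PMF of $S_{10}$ by repeated convolution of the table above:

```python
import numpy as np

# PMF of X on the values 1, ..., 5, from the table in note 4.
p = np.array([0.30, 0.25, 0.20, 0.15, 0.10])

# Exact PMF of S_10 via repeated convolution. Even at m = 10 the
# probabilities are roughly bell-shaped, as the (omitted) plot showed.
pmf = p.copy()
for _ in range(9):
    pmf = np.convolve(pmf, p)

values = np.arange(10, 51)  # S_10 takes the values 10, ..., 50
for v, prob in zip(values, pmf):
    print(v, round(float(prob), 4))
```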

Exercise. Let $X$ be the discrete random variable with PMF given by the table

    x      1     2     3     4
    p(x)   0.10  0.20  0.30  0.40

where $p(x) = 0$ otherwise, and let $S$ be the sample sum of a random sample of size 50 from the $X$ distribution. [Note: A dot plot of the pairs $(s, P(S = s))$ is omitted.]

Use the central limit theorem to approximate $P(120 \le S \le 160)$.
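
One way to check a solution numerically (my addition, not part of the notes; NumPy and SciPy assumed, and the values $\mu = 3$, $\sigma = 1$ follow from direct computation with the table):

```python
import numpy as np
from scipy.stats import norm

# PMF of X on 1, ..., 4; direct computation gives mu = 3 and sigma = 1.
p = np.array([0.10, 0.20, 0.30, 0.40])
mu, sigma, n = 3.0, 1.0, 50

# CLT approximation to P(120 <= S <= 160), no continuity correction.
lo = (120 - n * mu) / (sigma * np.sqrt(n))
hi = (160 - n * mu) / (sigma * np.sqrt(n))
print(norm.cdf(hi) - norm.cdf(lo))

# Exact answer via 50-fold convolution of the PMF, for comparison.
pmf = p.copy()
for _ in range(49):
    pmf = np.convolve(pmf, p)
values = np.arange(50, 201)  # S takes the values 50, ..., 200
print(pmf[(values >= 120) & (values <= 160)].sum())
```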

Exercise. The Quick Delivery Package Service can hold up to 120 packages in its new Puddle Jumper aircraft. Let $X$ equal the weight (in pounds) of a package, and assume that $X$ has a continuous distribution with PDF
\[ f(x) = \frac{x}{2592} \quad \text{when } 0 \le x \le 72, \]
and $f(x) = 0$ otherwise. Given a random sample of size 120 from the $X$ distribution, use the appropriate large sample approximation to estimate the probability that

(a) the total weight is more than 6000 pounds;

(b) the mean weight is between 45 and 51 pounds.
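
A numeric check of both parts (my addition; NumPy and SciPy assumed, with $E(X) = 48$ and $Var(X) = 288$ obtained by integrating against the PDF above):

```python
import numpy as np
from scipy.stats import norm

# For f(x) = x/2592 on [0, 72], direct integration gives
# mu = E(X) = 48 and Var(X) = 288.
mu, sigma, n = 48.0, np.sqrt(288.0), 120

# (a) P(total weight > 6000) for the sum S of 120 package weights.
print(1 - norm.cdf((6000 - n * mu) / (sigma * np.sqrt(n))))

# (b) P(45 <= mean weight <= 51).
se = sigma / np.sqrt(n)
print(norm.cdf((51 - mu) / se) - norm.cdf((45 - mu) / se))
```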

Exercise. Suppose that calls received by a medical hotline during the night shift (midnight to 8 AM) follow a Poisson process with an average rate of 5.5 calls per shift. Let $\bar{X}$ be the sample mean number of calls received per night shift over a 28-day period. Use the appropriate large sample approximation to estimate the probability that $\bar{X}$ is between 4.5 and 6.1.
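
A simulation-based check (my addition; NumPy and the seed are assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

# Nightly call counts are Poisson(5.5). Simulate many 28-shift periods
# and estimate P(4.5 <= X-bar <= 6.1) directly, as a check on the
# normal approximation (which uses mu = 5.5 and sigma = sqrt(5.5)).
counts = rng.poisson(lam=5.5, size=(100_000, 28))
xbar = counts.mean(axis=1)
print(np.mean((xbar >= 4.5) & (xbar <= 6.1)))
```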

5.3.1 Special Cases

The central limit theorem implies that the distributions of certain binomial, negative binomial, Poisson, and gamma random variables can be approximated by normal distributions.

Binomial Distribution: Let $X$ be a binomial random variable with parameters $n$ and $p$. Since $X$ can be thought of as the sum of $n$ independent Bernoulli random variables, each with success probability $p$, the distribution of $X$ can be approximated by a normal distribution with mean $np$ and standard deviation $\sqrt{np(1-p)}$ when $n$ is large.

Negative Binomial Distribution: Let $X$ be a negative binomial random variable with parameters $r$ and $p$. Since $X$ can be thought of as the sum of $r$ independent geometric random variables, each with success probability $p$, the distribution of $X$ can be approximated by a normal distribution with mean $\frac{r}{p}$ and standard deviation $\sqrt{\frac{r(1-p)}{p^2}}$ when $r$ is large.

Poisson Distribution: Consider events occurring over time according to a Poisson process. Let $X$ be the number of events occurring in a fixed time period, and assume that we expect $\lambda$ events in this time period. Then $X$ is a Poisson random variable with parameter $\lambda$. If we subdivide the interval into, say, $m$ subintervals, then $X$ can be thought of as the sum of $m$ independent Poisson random variables, each with parameter $\frac{\lambda}{m}$. Thus, $X$ can be approximated by a normal distribution with mean $\lambda$ and standard deviation $\sqrt{\lambda}$ when $\lambda$ is large.

Gamma Distribution: Let $X$ be the time measured from time 0 to the occurrence of the $r^{\text{th}}$ event of a Poisson process whose rate is $\lambda$ events per unit time. Then $X$ can be thought of as the sum of $r$ independent exponential random variables representing (1) the time to the first event, (2) the time from the first to the second event, (3) the time from the second to the third event, and so forth. Thus, $X$ can be approximated by a normal distribution with mean $\frac{r}{\lambda}$ and standard deviation $\sqrt{\frac{r}{\lambda^2}} = \frac{\sqrt{r}}{\lambda}$ when $r$ is large.
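
To see how good these approximations are, here is a short comparison for the binomial case (my addition; SciPy assumed, and the parameter choices $n = 100$, $p = 0.3$ are illustrative only):

```python
import numpy as np
from scipy.stats import binom, norm

# Compare the exact Binomial(n = 100, p = 0.3) CDF with its normal
# approximation, which has mean np and SD sqrt(np(1 - p)).
n, p = 100, 0.3
mu, sd = n * p, np.sqrt(n * p * (1 - p))

for x in (20, 25, 30, 35, 40):
    print(x, binom.cdf(x, n, p), norm.cdf((x - mu) / sd))
```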

Exercise (The House Special). A roulette wheel has 38 pockets labelled 0, 00, 1, 2, 3, 4, ..., 36. Assume that the ball lands in each pocket with equal probability and the results are independent from game to game. The house special is a bet on the 5 pockets 0, 00, 1, 2, 3. The bet pays 6-to-1. (If you place a one dollar bet and win, then you get back your dollar plus six more dollars; if you lose, then you get back nothing.) Assume you make 175 one-dollar bets on the house special. Use the appropriate large sample approximation to estimate the probability that you come out ahead.
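
A simulation-based check (my addition; NumPy and the seed are assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)

# Net winnings per one-dollar bet: +6 with probability 5/38, else -1.
# Simulate many sessions of 175 bets and estimate the probability of
# coming out ahead, as a check on the normal approximation.
wins = rng.random(size=(100_000, 175)) < 5 / 38
totals = np.where(wins, 6.0, -1.0).sum(axis=1)
print(np.mean(totals > 0))
```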

Exercise. Assume that the occurrences of small earth tremors in southern California follow a Poisson process, with an average rate of 1 event every 2 months. Use the appropriate large sample approximation to estimate the probability that the time (starting from today) until the 95th small tremor is 180 months or less.
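
A numeric check using the gamma special case above (my addition; NumPy and SciPy assumed):

```python
import numpy as np
from scipy.stats import gamma, norm

# The time to the 95th event of a Poisson process with rate 1/2 per
# month is Gamma(r = 95, lambda = 1/2), with mean r/lambda = 190 months
# and SD sqrt(r)/lambda (about 19.5 months).
r, lam = 95, 0.5
mu, sd = r / lam, np.sqrt(r) / lam

# Normal approximation and exact gamma probability for P(X <= 180).
print(norm.cdf((180 - mu) / sd))
print(gamma.cdf(180, a=r, scale=1 / lam))
```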

5.4 Moment Generating Functions and the Central Limit Theorem

As mentioned in the previous section, the proof of the central limit theorem is an application of the continuity theorem. In this section, we go through the steps leading to the key result, namely, that $M_m(t) \to e^{t^2/2}$ for all $t$ in an open interval containing 0. Recall that $\mu = E(X)$ and $\sigma = SD(X)$. Now,

Step 1: Rewrite the Standardized Sum as a Sum. For a given $m$,
\[ Z_m = \frac{S_m - m\mu}{\sigma\sqrt{m}} = \frac{\left( \sum_{i=1}^{m} X_i \right) - m\mu}{\sigma\sqrt{m}} = \frac{\sum_{i=1}^{m} (X_i - \mu)}{\sigma\sqrt{m}} = \sum_{i=1}^{m} \frac{X_i - \mu}{\sigma\sqrt{m}} = \sum_{i=1}^{m} Y_i, \]
where each summand has the form $Y = \frac{1}{\sigma\sqrt{m}}(X - \mu)$.

Step 2: Find the Mean and Variance of Each Summand. For a given $m$,
\[ E(Y) = \underline{\qquad\qquad} \qquad\qquad Var(Y) = \underline{\qquad\qquad} \]

Step 3: Approximate the Moment Generating Function of Each Summand. For a given $m$,
\begin{align*}
M_Y(t) &= \sum_{k=0}^{\infty} \frac{M_Y^{(k)}(0)}{k!}\, t^k && \text{(Maclaurin series expansion)} \\
       &= \sum_{k=0}^{\infty} \frac{E(Y^k)}{k!}\, t^k && \text{(since } M_Y^{(k)}(0) = E(Y^k)\text{)} \\
       &= 1 + E(Y)\,t + \frac{E(Y^2)}{2}\, t^2 + \frac{E(Y^3)}{3!}\, t^3 + \frac{E(Y^4)}{4!}\, t^4 + \cdots && \\
       &\approx 1 + E(Y)\,t + \frac{E(Y^2)}{2}\, t^2 && \text{(quadratic approximation)} \\
       &= \underline{\qquad\qquad}
\end{align*}

Step 4: Approximate the Moment Generating Function of the Standardized Sum. Let $M_m(t)$ be the moment generating function of the standardized sum for a given value of $m$. Then,
\[ M_m(t) = \left( M_Y(t) \right)^m \approx \left( \underline{\qquad\qquad} \right)^m. \]

Step 5: Limit of the Approximation. (Please complete this step.)
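
For reference, here is one way the blanks in Steps 2 through 5 can be completed (a sketch of my own, consistent with the definitions above; the notebook leaves these steps to the reader). Since $E(X - \mu) = 0$ and $Var(X - \mu) = \sigma^2$,
\[ E(Y) = 0 \quad \text{and} \quad Var(Y) = E(Y^2) = \frac{\sigma^2}{\sigma^2 m} = \frac{1}{m}, \]
so the quadratic approximation in Step 3 reduces to $M_Y(t) \approx 1 + \frac{t^2}{2m}$. Because the $Y_i$ are independent, $M_m(t) = \left( M_Y(t) \right)^m$, and therefore
\[ M_m(t) \approx \left( 1 + \frac{t^2/2}{m} \right)^m \to e^{t^2/2} \quad \text{as } m \to \infty, \]
using the standard limit $\left(1 + \frac{a}{m}\right)^m \to e^a$ with $a = t^2/2$.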