Lectures 3-4 jacques@ucsd.edu

7.1 Classical discrete distributions

D. The Poisson Distribution. If a coin with heads probability p is flipped independently n times, then the number of heads is Bin(n, p) and the expected number of heads is λ = np. The probability of x heads is $\binom{n}{x} p^x (1-p)^{n-x}$. For any fixed x and λ,

$$\lim_{n \to \infty} \binom{n}{x} p^x = \lim_{n \to \infty} p^x \, \frac{n(n-1)\cdots(n-x+1)}{x!} = \frac{\lambda^x}{x!}.$$

Also

$$\lim_{n \to \infty} (1-p)^{n-x} = \lim_{n \to \infty} \Bigl(1 - \frac{\lambda}{n}\Bigr)^{n-x} = e^{-\lambda}.$$

Therefore in the limit as n → ∞ with λ = np fixed, the binomial pmf converges to

$$f(x) = \frac{\lambda^x e^{-\lambda}}{x!} \qquad x = 0, 1, 2, \ldots.$$

A random variable X with this pmf is said to have the Poisson distribution, written X ∼ Poi(λ). Note that f is a pmf since

$$\sum_{x=0}^{\infty} \frac{\lambda^x e^{-\lambda}}{x!} = e^{-\lambda} \sum_{x=0}^{\infty} \frac{\lambda^x}{x!} = e^{-\lambda} e^{\lambda} = 1,$$

using the familiar Taylor series for $e^{\lambda}$. A Poisson random variable X counts the number of hits or successes or arrivals for a Poisson arrival process, where arrivals occur randomly at expected rate λ per unit time, and the numbers of arrivals in disjoint segments of time are independent, namely if X and Y denote the numbers of arrivals in two disjoint segments of time, then P(X ≤ x and Y ≤ y) = P(X ≤ x)P(Y ≤ y). The number of arrivals in one unit of time is then Poi(λ), and the number of arrivals in t units of time is Poi(tλ).
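
The limit above can be checked numerically. The following is a minimal Python sketch (the helper names binom_pmf and poisson_pmf and the choice λ = 3 are illustrative assumptions, not part of the notes) comparing the Bin(n, λ/n) pmf with the Poi(λ) pmf as n grows.

import math

def binom_pmf(x, n, p):
    # P(X = x) for X ~ Bin(n, p)
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    # P(X = x) for X ~ Poi(lambda)
    return lam**x * math.exp(-lam) / math.factorial(x)

lam = 3.0
for n in (10, 100, 1000):
    p = lam / n
    # largest gap between the two pmfs over small values of x
    gap = max(abs(binom_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(15))
    print(f"n = {n:4d}: max |Bin(n, lam/n) - Poi(lam)| = {gap:.5f}")

The printed gap shrinks as n increases, which is exactly the convergence of the binomial pmf to the Poisson pmf with λ = np held fixed.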

We verify that the expected value of X ∼ Poi(λ) is indeed λ. We have

$$E(X) = \sum_{x=0}^{\infty} x f(x) = \sum_{x=1}^{\infty} x \, \frac{\lambda^x e^{-\lambda}}{x!}.$$

We could proceed using the derivative of the following power series (this is left as an exercise):

$$g(z) = \sum_{x=0}^{\infty} \frac{z^x e^{-\lambda}}{x!}.$$

However, it is easier to change the range of summation: since x/x! = 1/(x − 1)!,

$$\sum_{x=1}^{\infty} x \, \frac{\lambda^x e^{-\lambda}}{x!} = \lambda e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!} = \lambda e^{-\lambda} e^{\lambda} = \lambda.$$

So we have verified the expectation of a Poisson random variable. Note that in particular $P(X = 0) = e^{-\lambda}$ and also $P(X > 0) = 1 - P(X = 0) = 1 - e^{-\lambda}$ when X ∼ Poi(λ).

E. Negative Binomial Distribution. This generalizes the geometric distribution. A coin with heads probability p is flipped independently. Let X denote the number of coin flips up to and including the rth heads. For r = 1, X ∼ Geo(p). For r > 1, we say that X has the negative binomial distribution. To determine the pmf of X, if X = x then in the first x − 1 tosses there must be r − 1 heads, followed by heads on the xth toss. There are $\binom{x-1}{r-1}$ ways to choose on which tosses the first r − 1 heads occur, and the probability of this occurring is $p^r (1-p)^{x-r}$. Therefore the pmf is

$$f(x) = P(X = x) = \binom{x-1}{r-1} p^r (1-p)^{x-r} \qquad x = r, r+1, r+2, \ldots.$$

We could check that f is a pmf by summing it, but we do not do this technical calculation here. To compute E(X), we can use linearity of expectation: X = X_1 + ⋯ + X_r where X_i ∼ Geo(p), since the number of tosses between the ith heads and up to and including the (i+1)th heads has a geometric distribution (this number of tosses is independent of where the ith heads occurred and independent of the preceding tosses). Therefore since E(X_i) = 1/p for 1 ≤ i ≤ r, we have

$$E(X) = E(X_1) + E(X_2) + \cdots + E(X_r) = \frac{r}{p}.$$
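
As a quick sanity check on E(X) = r/p, here is a small Monte Carlo sketch in Python; the parameter choices r = 3, p = 0.4 and the trial count are illustrative assumptions, not from the notes.

import random

def flips_until_rth_head(r, p):
    # Flip a p-coin until the r-th heads appears; return the number of flips.
    flips, heads = 0, 0
    while heads < r:
        flips += 1
        if random.random() < p:
            heads += 1
    return flips

r, p, trials = 3, 0.4, 100_000
avg = sum(flips_until_rth_head(r, p) for _ in range(trials)) / trials
print(f"simulated mean = {avg:.3f}, theoretical r/p = {r / p:.3f}")

The simulated average number of flips should land close to r/p = 7.5, matching the linearity-of-expectation argument above.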

7.2 Classical continuous distributions

A. The Uniform Distribution. A random variable X has uniform distribution on [a, b], written X ∼ U[a, b], if X has pdf

$$f(x) = \frac{1}{b-a} \qquad a \le x \le b.$$

This is indeed a pdf, since

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_a^b \frac{1}{b-a}\,dx = 1.$$

The expectation is

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_a^b \frac{x}{b-a}\,dx = \frac{a+b}{2}.$$

B. The Exponential Distribution. A random variable X has exponential distribution with parameter θ > 0, written X ∼ Exp(θ), if it has pdf

$$f(x) = \theta e^{-\theta x} \qquad x \ge 0.$$

To see that it is a pdf,

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_0^{\infty} \theta e^{-\theta x}\,dx = \lim_{x \to \infty} \bigl(1 - e^{-\theta x}\bigr) = 1.$$

The cdf can be worked out nicely too:

$$F(x) = P(X \le x) = \int_0^x \theta e^{-\theta t}\,dt = 1 - e^{-\theta x}.$$

The expected value of X is

$$E(X) = \int_0^{\infty} x \, \theta e^{-\theta x}\,dx.$$

An integration by parts shows E(X) = 1/θ. The exponential distribution is the continuous analog of the geometric distribution. In the geometric distribution, X measures the time to the first heads given that p is the probability of heads. If we imagine instead that in a unit time, points arrive at a rate θ, then the time it takes for the first point to arrive is X, and X has the exponential distribution. One of the very useful properties of the exponential distribution is that it is memoryless. This means that given that

no point has arrived up to time t, the time from t to the time when the point arrives has the same exponential distribution. We can verify this using conditional probability:

$$P(X \le x + t \mid X \ge t) = \frac{P(t \le X \le x + t)}{P(X \ge t)} = \frac{P(X \le x + t) - P(X \le t)}{P(X \ge t)} = \frac{F(x+t) - F(t)}{1 - F(t)} = 1 - e^{-\theta x} = F(x).$$

So it is as though the process before time t was forgotten. The exponential distribution is related to the Poisson distribution as follows: in a Poisson arrival process with rate λ, if X_i is the time between the ith and the (i+1)th arrival, then X_i ∼ Exp(λ), and these waiting times are independent, namely $P(X_i \le x, X_j \le y) = P(X_i \le x) P(X_j \le y)$.

C. Normal or Gaussian Distribution. This is one of the most important distributions of all. A random variable X has standard normal distribution, written X ∼ N(0, 1), if it has pdf

$$f(x) = \frac{1}{\sqrt{2\pi}} \, e^{-x^2/2} \qquad x \in \mathbb{R}.$$

To see that this is a pdf, we have to show $\int_{-\infty}^{\infty} f(x)\,dx = 1$. Since f(x) is an even function, it is enough to show $\int_0^{\infty} f(x)\,dx = \frac{1}{2}$. We instead consider the square of the integral,

$$\Bigl(\int_0^{\infty} f(x)\,dx\Bigr)^2 = \frac{1}{2\pi} \int_0^{\infty}\!\!\int_0^{\infty} e^{-\frac{1}{2}(x^2+y^2)}\,dy\,dx.$$

Using polar co-ordinates (r, θ), $r^2 = x^2 + y^2$ and

$$\frac{1}{2\pi} \int_0^{\infty}\!\!\int_0^{\infty} e^{-\frac{1}{2}(x^2+y^2)}\,dy\,dx = \frac{1}{2\pi} \int_0^{\pi/2}\!\!\int_0^{\infty} r e^{-r^2/2}\,dr\,d\theta.$$

Making the substitution u = r²/2, this is

$$\frac{1}{2\pi} \int_0^{\pi/2}\!\!\int_0^{\infty} e^{-u}\,du\,d\theta = \frac{1}{2\pi} \cdot \frac{\pi}{2} = \frac{1}{4}.$$

This completes the verification that f is a pdf. The expectation is

$$E(X) = \int_{-\infty}^{\infty} \frac{x}{\sqrt{2\pi}} \, e^{-x^2/2}\,dx.$$

However, $x e^{-x^2/2}$ is an odd function and the integral converges, therefore it converges to zero and E(X) = 0. The cdf for a standard normal random variable cannot be written in closed form, but nevertheless it is very useful, and traditionally denoted

$$\Phi(x) = P(X \le x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt.$$

The normal distribution is fundamental in approximations of sums of many independent random variables, via the central limit theorem, which we shall see soon.

D. The Gamma Distribution. Let X be the time of the rth arrival in a Poisson arrival process with rate λ. Then X is said to have the gamma distribution, X ∼ Γ(r, λ). A pdf of X can be determined to be

$$f(x) = \frac{\lambda^r}{(r-1)!} \, x^{r-1} e^{-\lambda x} \qquad x \ge 0.$$

This is the continuous version of the negative binomial distribution. To check that f is a pdf requires r − 1 iterations of integration by parts to reduce the $x^{r-1}$ to a constant. A similar technique applies to obtain E(X) = r/λ. We do not investigate the technical details.

7.3 Standard deviation, variance, and other expectations

The variance of a random variable X with mean µ = E(X), when it exists, is E((X − µ)²). It is often denoted var(X) or σ². The standard deviation is $\sqrt{\mathrm{var}(X)}$, often denoted by σ. As an exercise, we note that

$$\mathrm{var}(X) = E((X - \mu)^2) = E(X^2) - E(X)^2$$

is sometimes a more convenient way to compute variance. These quantities measure how concentrated a random variable is near its expectation: the higher the variance, the less the random variable is concentrated at its mean. More generally, if g : R → R is any function, we can consider E(g(X)) when it exists. In particular, when g(z) = z^k, we obtain the moments E(X^k).
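
The identity var(X) = E(X²) − E(X)² is easy to confirm on a concrete pmf. The Python sketch below is illustrative only (the fair six-sided die and the helper name expect are assumptions made for this example); it computes E(g(X)) directly from a pmf and evaluates the variance both ways.

from fractions import Fraction

# pmf of a fair six-sided die: P(X = k) = 1/6 for k = 1, ..., 6
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def expect(g):
    # E(g(X)) for the discrete random variable with the pmf above
    return sum(g(x) * p for x, p in pmf.items())

mu = expect(lambda x: x)                       # E(X) = 7/2
var_def = expect(lambda x: (x - mu) ** 2)      # E((X - mu)^2)
var_alt = expect(lambda x: x ** 2) - mu ** 2   # E(X^2) - E(X)^2

print(mu, var_def, var_alt)                    # 7/2 35/12 35/12

Both computations return 35/12, as the identity predicts.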

First Example. We compute the variance, standard deviation and all moments for the uniform random variable X ∼ U[0, 1]. For this random variable, E(X) = 1/2. Now

$$E(X^k) = \int_{-\infty}^{\infty} x^k f(x)\,dx = \int_0^1 x^k\,dx$$

since the pdf is f(x) = 1 for 0 ≤ x ≤ 1. Now we integrate to get

$$E(X^k) = \frac{1}{k+1}.$$

For k = 1 this agrees with E(X) = 1/2. The variance is

$$E(X^2) - E(X)^2 = \frac{1}{3} - \Bigl(\frac{1}{2}\Bigr)^2 = \frac{1}{12}.$$

The standard deviation is $1/\sqrt{12}$.

Second Example. Let X ∼ Poi(λ) and let $X^{(k)} = X(X-1)\cdots(X-k+1)$. We compute $E(X^{(k)})$ for all k as well as the variance and standard deviation. For k = 1 it is just E(X) = λ. In general,

$$E(X^{(k)}) = \sum_{x=0}^{\infty} x(x-1)\cdots(x-k+1) \, \frac{\lambda^x e^{-\lambda}}{x!} = \sum_{x=k}^{\infty} \frac{\lambda^x e^{-\lambda}}{(x-k)!} = e^{-\lambda} \lambda^k \sum_{x=k}^{\infty} \frac{\lambda^{x-k}}{(x-k)!} = e^{-\lambda} \lambda^k e^{\lambda} = \lambda^k.$$

These are often called factorial moments. Now for the variance,

$$\mathrm{var}(X) = E(X^2) - E(X)^2 = E(X(X-1)) + E(X) - E(X)^2 = \lambda^2 + \lambda - \lambda^2 = \lambda.$$

So the variance actually equals the expectation E(X). The standard deviation is $\sqrt{\lambda}$.
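
Since a Poi(λ) count arises as the number of Exp(λ) inter-arrival times that fit into one unit of time (the arrival-process description above), the fact that the variance equals the mean can be checked by simulation. The Python sketch below is illustrative only; the choice λ = 4 and the sample size are assumptions, not part of the notes.

import random
import statistics

def poisson_via_arrivals(lam):
    # Count how many Exp(lam) inter-arrival times fit in one unit of time;
    # by the arrival-process description, the count is Poi(lam).
    count, t = 0, random.expovariate(lam)
    while t <= 1.0:
        count += 1
        t += random.expovariate(lam)
    return count

lam = 4.0
sample = [poisson_via_arrivals(lam) for _ in range(200_000)]
print("sample mean    :", statistics.mean(sample))      # close to lambda
print("sample variance:", statistics.variance(sample))  # also close to lambda

Both printed values should be close to λ = 4, consistent with var(X) = E(X) = λ.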

Third Example. We compute the variance of X ∼ Bin(n, p). First we use the formula

$$\mathrm{var}(X) = E(X(X-1)) + E(X) - E(X)^2$$

which was used in the last example. We know E(X) = np and so it remains to determine E(X(X − 1)). By definition,

$$E(X(X-1)) = \sum_{x=0}^{n} x(x-1) f(x) = \sum_{x=0}^{n} x(x-1) \binom{n}{x} p^x (1-p)^{n-x}.$$

We use the second derivative of the power series

$$g(z) = \sum_{x=0}^{n} \binom{n}{x} z^x (1-p)^{n-x}.$$

The second derivative is

$$g''(z) = \sum_{x=0}^{n} x(x-1) \binom{n}{x} z^{x-2} (1-p)^{n-x}.$$

The reason for doing this is that $p^2 g''(p) = E(X(X-1))$. Now by the Binomial Theorem, $g(z) = (1-p+z)^n$ and then $g''(p) = n(n-1)$. It follows that $E(X(X-1)) = p^2 n(n-1)$ and so

$$\mathrm{var}(X) = p^2 n(n-1) + pn - p^2 n^2 = np(1-p).$$

This is the variance of a binomial random variable, and it will come up importantly in the work to follow.

Fourth Example. In n ≥ 2 independent tosses of a fair coin, let X be the number of pairs of consecutive heads. We determine var(X). To do so, we use that X = X_1 + X_2 + ⋯ + X_{n−1} where X_i = 1 if toss i and toss i + 1 gave heads, and X_i = 0 otherwise. Then

$$\mathrm{var}(X) = E(X^2) - E(X)^2 = \sum_{i,j} E(X_i X_j) - \sum_{i,j} E(X_i)E(X_j).$$

Here the sums are over all ordered pairs (i, j) with 1 ≤ i, j ≤ n − 1. Now X_i and X_j are not independent so we have to be careful when computing E(X_i X_j). We note that E(X_i) = P(X_i = 1) = 1/4 and E(X_i²) = E(X_i) = 1/4. If i and j are consecutive integers, then E(X_i X_j) = 1/8 since this is exactly the probability of three consecutive heads. Otherwise, E(X_i X_j) = 1/16 = E(X_i)E(X_j). So all terms in the above sums cancel out except when i = j or i = j + 1 or j = i + 1. The terms coming from i = j contribute

$$\sum_{i=1}^{n-1} E(X_i^2) - \sum_{i=1}^{n-1} E(X_i)^2 = \frac{3(n-1)}{16}$$

using E(X_i²) − E(X_i)² = 1/4 − 1/16 = 3/16. The terms coming from j = i + 1 contribute

$$\sum_{i=1}^{n-2} E(X_i X_{i+1}) - \sum_{i=1}^{n-2} E(X_i)E(X_{i+1}) = \frac{n-2}{16}.$$

The same contribution is made from the terms with i = j + 1. Putting it all together,

$$\mathrm{var}(X) = \frac{3(n-1)}{16} + \frac{2(n-2)}{16} = \frac{5n-7}{16}.$$

This example is an illustration of how tricky variance can be to compute. While expectation is linear, variance is certainly not in general.

7.4 Markov's and Chebyshev's Inequalities

To make more precise how the variance measures deviation from the mean, we introduce two concentration inequalities.

Proposition 1 (Markov's Inequality) Let X be a random variable such that E(X) exists and X ≥ 0. Then for any positive real number λ,

$$P(X \ge \lambda) \le \frac{E(X)}{\lambda}.$$

Proof Let Y be the random variable such that Y = 1 if X ≥ λ and Y = 0 otherwise. Then λY ≤ X and so E(λY) ≤ E(X). Now E(λY) = λE(Y) by linearity of expectation, and E(Y) = P(X ≥ λ).

We conclude λP(X ≥ λ) ≤ E(X) and this gives the result.

Proposition 2 (Chebyshev's Inequality) Let X be a random variable such that E(X) and var(X) exist. Then for any positive real number λ,

$$P(|X - E(X)| \ge \lambda) \le \frac{\mathrm{var}(X)}{\lambda^2}.$$

Proof The random variable |X − E(X)|² is non-negative so Markov's Inequality applies to it. We get

$$P(|X - E(X)|^2 \ge \lambda^2) \le \frac{E(|X - E(X)|^2)}{\lambda^2}$$

from Markov's Inequality. Since |X − E(X)| ≥ λ exactly when |X − E(X)|² ≥ λ², the left side is P(|X − E(X)| ≥ λ). Now E(|X − E(X)|²) = var(X) and we are done.

First Example. A dart is randomly thrown at a dartboard 10 times, with probability 1/5 on each turn of hitting the bullseye. Show that the probability of getting at least 10 bullseyes is at most 1/5. By linearity of expectation, the expected number of bullseyes is 2. If X is the number of bullseyes,

$$P(X \ge 10) \le \frac{E(X)}{10} = \frac{1}{5}.$$

Second Example. In 100 independent tosses of a fair coin, show that the probability of at most 10 heads is at most 1/64. Let X be the number of heads. Then E(X) = 50 and var(X) = np(1 − p) = 25. By Chebyshev's Inequality,

$$P(|X - E(X)| \ge \lambda) \le \frac{\mathrm{var}(X)}{\lambda^2}.$$

At most ten heads is contained in the event |X − 50| ≥ 40. So

$$P(X \le 10) \le P(|X - 50| \ge 40) \le \frac{25}{1600} = \frac{1}{64}.$$

Similarly P(X ≥ 90) ≤ 1/64. Note that this is better than what Markov's Inequality would give directly,

$$P(X \ge 90) \le \frac{E(X)}{90} = \frac{5}{9}.$$

It turns out that in the instance of independent coin tosses we can do much better than Chebyshev's Inequality using the central limit theorem. The advantage of Chebyshev's Inequality is that it applies to any random variable, not just the number of heads in independent coin tosses.
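
To see how conservative these bounds are, the following Python sketch (illustrative, not part of the notes) compares the exact tail probability P(X ≥ 90) for X ∼ Bin(100, 1/2) with the Markov bound 5/9 and the Chebyshev bound 1/64 computed above.

from math import comb

n, p = 100, 0.5

def binom_tail_ge(k):
    # Exact P(X >= k) for X ~ Bin(n, p)
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

exact = binom_tail_ge(90)
markov = (n * p) / 90                  # E(X)/90 = 5/9
chebyshev = (n * p * (1 - p)) / 40**2  # var(X)/40^2 = 1/64

print(f"exact P(X >= 90) = {exact:.3e}")
print(f"Markov bound     = {markov:.4f}")
print(f"Chebyshev bound  = {chebyshev:.4f}")

The exact tail probability is astronomically smaller than either bound, which is the point of the remark above: Chebyshev improves on Markov here, and the central limit theorem improves on both, but Markov and Chebyshev hold for any random variable with the stated moments.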