Topic 3: The Expectation of a Random Variable
Course 003, 2016
Expectation of a discrete random variable

Definition: The expected value of a discrete random variable X, when it exists, is defined by E(X) = \sum_{x \in R(X)} x f(x). The expectation is simply a weighted average of the possible outcomes, with the weights assigned by f(x). In general, E(X) need not belong to R(X).

Consider the experiment of rolling a die, where the random variable is the number of dots showing. The density function is f(x) = (1/6) I_{\{1,2,\dots,6\}}(x), and the expectation is \sum_{x=1}^{6} x \cdot (1/6) = 3.5.

We can think of the expectation as a point of balance: if weights are placed on a weightless rod, where should a fulcrum be placed so that the rod balances?

If X can take only a finite number of different values, this expectation always exists. With an infinite sequence of possible values, it may or may not exist: high values would need to have low probability weights. The expectation of X is also called the expected value or the mean of X.
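The weighted-average definition above can be checked directly for the die example; this small sketch (not part of the original notes) uses exact fractions to avoid rounding:

```python
from fractions import Fraction

# E(X) = sum over the range of x * f(x).
# Here f(x) = 1/6 on {1, ..., 6} (a fair die).
def expectation(pmf):
    """pmf: dict mapping outcomes x to probabilities f(x)."""
    return sum(x * p for x, p in pmf.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}
print(expectation(die))  # 7/2, i.e. 3.5
```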
Expectations: some useful results

Result 1: If X and Y are r.v.s with the same distribution, then E(X) = E(Y) (if either side exists). The proof follows immediately from the definition of E(X). Note that the converse of this statement is NOT true; assuming it is a common mistake.

Result 2: For any r.v. X and constant c, E(cX) = cE(X).

Result 3 (linearity of expectation): For any r.v.s X and Y, E(X + Y) = E(X) + E(Y). Note: we do not require independence of these two r.v.s.
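Result 3 is worth seeing in an extreme case of dependence. As an illustrative sketch (my example, not the notes'), take Y to be X itself, the most dependent choice possible, and check that the sample means still add:

```python
import random

random.seed(0)

# E(X + Y) = E(X) + E(Y) even when X and Y are dependent.
# Here Y = X (maximal dependence): a fair die and itself.
n = 100_000
xs = [random.randint(1, 6) for _ in range(n)]
ys = xs  # Y is perfectly dependent on X

mean_x = sum(xs) / n
mean_y = sum(ys) / n
mean_sum = sum(x + y for x, y in zip(xs, ys)) / n

print(abs(mean_sum - (mean_x + mean_y)) < 1e-9)  # True: linearity holds
```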
The expectations of Bernoulli and Binomial r.v.s

Bernoulli: Using the definition of the expectation, this is 1 \cdot p + 0 \cdot q = p.

Binomial: A binomial r.v. can be expressed as the sum of n Bernoulli r.v.s: X = I_1 + I_2 + \dots + I_n. By linearity, E(X) = E(I_1) + E(I_2) + \dots + E(I_n) = np.
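The indicator decomposition can be checked by simulation; this is a rough sketch (the parameter values are arbitrary, chosen only for illustration):

```python
import random

random.seed(1)

# A Binomial(n, p) draw as a sum of n Bernoulli indicators; by linearity
# its mean is n * p.
n, p = 50, 0.3
trials = 20_000
draws = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]
sample_mean = sum(draws) / trials
print(sample_mean, n * p)  # sample mean should be close to 15.0
```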
The Geometric distribution

Definition (Geometric distribution): Consider a sequence of independent Bernoulli trials, all with success probability p \in (0, 1). Let X be the number of failures before the first successful trial. Then X \sim Geom(p). The PMF is given by P(X = k) = q^k p for k = 0, 1, 2, \dots, where q = 1 - p. This is a valid PMF since p \sum_{k=0}^{\infty} q^k = p/(1 - q) = 1.

The expectation of a geometric r.v. is E(X) = \sum_{k=0}^{\infty} k q^k p. We know that \sum_{k=0}^{\infty} q^k = 1/(1 - q). Differentiating both sides w.r.t. q, we get \sum_{k=1}^{\infty} k q^{k-1} = 1/(1 - q)^2. Multiplying both sides by pq, we get E(X) = pq/(1 - q)^2 = q/p.
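The series identity E(X) = \sum k q^k p = q/p can be verified numerically by truncating the sum; a small sketch with an arbitrary p of 0.25:

```python
# Check E(X) = q/p for X ~ Geom(p) (number of failures before first success)
# by truncating the series sum_{k>=0} k * q^k * p.
p = 0.25
q = 1 - p
series = sum(k * q**k * p for k in range(1000))
print(abs(series - q / p) < 1e-9)  # True: E(X) = 0.75 / 0.25 = 3
```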
The Negative Binomial distribution

Definition (Negative Binomial distribution): Consider a sequence of independent Bernoulli trials, all with success probability p \in (0, 1). Let X be the number of failures before the rth success. Then X \sim NBin(r, p). The PMF is given by P(X = n) = \binom{n + r - 1}{r - 1} p^r q^n for n = 0, 1, 2, \dots, where q = 1 - p.

We can write the Negative Binomial as a sum of r Geom(p) r.v.s: X = X_1 + X_2 + \dots + X_r. The expectation of a Negative Binomial r.v. is therefore E(X) = rq/p.
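The sum-of-geometrics representation can be simulated directly; this sketch (with illustrative values r = 5, p = 0.4) draws a NBin(r, p) variable as r independent geometric waiting times:

```python
import random

random.seed(2)

# X ~ NBin(r, p) as a sum of r independent Geom(p) r.v.s, so E(X) = r*q/p.
def geom(p):
    """Number of failures before the first success."""
    k = 0
    while random.random() >= p:
        k += 1
    return k

r, p = 5, 0.4
trials = 50_000
sample_mean = sum(sum(geom(p) for _ in range(r)) for _ in range(trials)) / trials
print(sample_mean)  # should be close to r*(1-p)/p = 7.5
```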
Expectation of a function of X

Result: If X is a discrete random variable and g is a function from R to R, then E(g(X)) = \sum_{x \in R(X)} g(x) f(x).

Intuition: Whenever X = x, we have g(X) = g(x), so we can assign the probability f(x) to the value g(x).

This is a very useful result, because it tells us we don't need the PMF of g(X) to find its expected value, and we are often interested in functions of random variables:
- expected revenues from the distribution of yields
- health outcomes as a function of food availability
Think of other examples.
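As a concrete instance of the result (my example, in the spirit of the die above): computing E(X^2) for a fair die needs only the PMF of X, never the PMF of X^2.

```python
from fractions import Fraction

# E(g(X)) = sum g(x) f(x) -- no need for the PMF of g(X) itself.
# Example: g(x) = x^2 for a fair die.
die = {x: Fraction(1, 6) for x in range(1, 7)}
e_x2 = sum(x**2 * p for x, p in die.items())
print(e_x2)  # 91/6
```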
Variance of a random variable

Definition (variance and standard deviation): The variance of an r.v. X is Var(X) = E[(X - EX)^2]. The square root of the variance is called the standard deviation of X: SD(X) = \sqrt{Var(X)}.

Notice that by linearity, E(X - EX) = 0 always, since positive and negative deviations cancel. By squaring them we ensure all deviations contribute to the overall variability. However, variance is in squared units, so to get back to the original units we take the square root.

We can expand the above expression to rewrite the variance in a form that is often more convenient: Var(X) = E[(X - EX)^2] = E(X^2) - 2(EX)(EX) + (EX)^2 = E(X^2) - (EX)^2.
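Both forms of the variance can be computed for the fair die and compared; a quick sketch using exact fractions:

```python
from fractions import Fraction

# Var(X) two ways for a fair die: E[(X - EX)^2] and E(X^2) - (EX)^2.
die = {x: Fraction(1, 6) for x in range(1, 7)}
ex = sum(x * p for x, p in die.items())                  # 7/2
var_def = sum((x - ex)**2 * p for x, p in die.items())   # definition
var_alt = sum(x**2 * p for x, p in die.items()) - ex**2  # shortcut formula
print(var_def, var_alt)  # both 35/12
```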
Variance properties

Result 1: Var(X) \ge 0, with equality if and only if P(X = a) = 1 for some constant a.

Result 2: For any constants a and b, Var(aX + b) = a^2 Var(X). It follows that Var(X) = Var(-X).
Proof: Var(aX + b) = E[(aX + b - (aEX + b))^2] = E[(a(X - EX))^2] = a^2 E[(X - EX)^2] = a^2 Var(X).

Result 3: If X_1, \dots, X_n are independent random variables, then Var(X_1 + \dots + X_n) = Var(X_1) + \dots + Var(X_n). Note that this is only true for independent r.v.s; variance is not in general linear.
Proof: Denote EX_i by \mu_i. For n = 2, E(X_1 + X_2) = \mu_1 + \mu_2 and therefore
Var(X_1 + X_2) = E[(X_1 + X_2 - \mu_1 - \mu_2)^2] = E[(X_1 - \mu_1)^2 + (X_2 - \mu_2)^2 + 2(X_1 - \mu_1)(X_2 - \mu_2)] = Var(X_1) + Var(X_2) + 2E[(X_1 - \mu_1)(X_2 - \mu_2)].
But since X_1 and X_2 are independent, E[(X_1 - \mu_1)(X_2 - \mu_2)] = E(X_1 - \mu_1)E(X_2 - \mu_2) = (\mu_1 - \mu_1)(\mu_2 - \mu_2) = 0. It therefore follows that Var(X_1 + X_2) = Var(X_1) + Var(X_2). Using an induction argument, this can be established for any n.
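Result 3 can be checked by simulation with two independent dice (an illustrative sketch, not part of the notes):

```python
import random

random.seed(3)

# Var(X1 + X2) = Var(X1) + Var(X2) holds for independent r.v.s.
def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m)**2 for x in xs) / len(xs)

n = 100_000
x1 = [random.randint(1, 6) for _ in range(n)]
x2 = [random.randint(1, 6) for _ in range(n)]  # independent of x1
total = [a + b for a, b in zip(x1, x2)]
print(var(total), var(x1) + var(x2))  # both close to 2 * 35/12, about 5.83
```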
The Poisson distribution

Definition (Poisson distribution): An r.v. X has the Poisson distribution with parameter \lambda > 0 if the PMF of X is P(X = k) = e^{-\lambda} \lambda^k / k! for k = 0, 1, 2, \dots. We write this as X \sim Pois(\lambda).

This is a valid PMF because the Taylor series 1 + \lambda + \lambda^2/2! + \lambda^3/3! + \dots converges to e^{\lambda}, so \sum_{k=0}^{\infty} f(k) = \sum_{k=0}^{\infty} e^{-\lambda} \lambda^k / k! = e^{-\lambda} e^{\lambda} = 1.

As \lambda gets larger, the PMF becomes more bell-shaped.
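The Taylor-series argument can be confirmed numerically by truncating the sum (a sketch with an arbitrary \lambda = 4):

```python
import math

# The Poisson PMF e^{-lam} * lam^k / k! sums to 1 (Taylor series of e^lam).
lam = 4.0
total = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(100))
print(abs(total - 1.0) < 1e-12)  # True
```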
The Poisson mean and variance

The expectation is E(X) = e^{-\lambda} \sum_{k=0}^{\infty} k \lambda^k / k! = e^{-\lambda} \sum_{k=1}^{\infty} \lambda^k / (k-1)! = \lambda e^{-\lambda} \sum_{k=1}^{\infty} \lambda^{k-1} / (k-1)! = \lambda e^{-\lambda} e^{\lambda} = \lambda.

The variance is also \lambda. To see this, let's start by finding E(X^2). Since \sum_{k=0}^{\infty} \lambda^k / k! = e^{\lambda}, differentiating both sides w.r.t. \lambda (the k = 0 term vanishes) gives \sum_{k=1}^{\infty} k \lambda^{k-1} / k! = e^{\lambda}, or \sum_{k=1}^{\infty} k \lambda^k / k! = \lambda e^{\lambda}. Differentiate again to get \sum_{k=1}^{\infty} k^2 \lambda^{k-1} / k! = e^{\lambda} + \lambda e^{\lambda}, or \sum_{k=1}^{\infty} k^2 \lambda^k / k! = e^{\lambda} \lambda (1 + \lambda).

So E(X^2) = e^{-\lambda} \sum_{k=1}^{\infty} k^2 \lambda^k / k! = \lambda(1 + \lambda), and Var(X) = \lambda(1 + \lambda) - \lambda^2 = \lambda.

We'll see later that both of these can be derived easily from the moment generating function of the Poisson.
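Both moments can be checked by truncating the defining series; a sketch with an arbitrary \lambda = 3:

```python
import math

# Truncated-series check that E(X) = lam and Var(X) = lam for X ~ Pois(lam).
lam = 3.0
pmf = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(100)]
mean = sum(k * p for k, p in enumerate(pmf))
ex2 = sum(k**2 * p for k, p in enumerate(pmf))
print(mean, ex2 - mean**2)  # both approximately 3.0
```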
Poisson additivity

Result: If X \sim Pois(\lambda_1) and Y \sim Pois(\lambda_2) are independent, then X + Y \sim Pois(\lambda_1 + \lambda_2).

Proof: To get the PMF of X + Y, we use the law of total probability:
P(X + Y = k) = \sum_{j=0}^{k} P(X + Y = k | X = j) P(X = j)
= \sum_{j=0}^{k} P(Y = k - j) P(X = j)
= \sum_{j=0}^{k} \frac{e^{-\lambda_2} \lambda_2^{k-j}}{(k-j)!} \cdot \frac{e^{-\lambda_1} \lambda_1^{j}}{j!}
= \frac{e^{-(\lambda_1 + \lambda_2)}}{k!} \sum_{j=0}^{k} \binom{k}{j} \lambda_1^{j} \lambda_2^{k-j}
= \frac{e^{-(\lambda_1 + \lambda_2)} (\lambda_1 + \lambda_2)^k}{k!} (using the binomial theorem).
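The convolution step in the proof can be verified numerically: the discrete convolution of two Poisson PMFs should match the PMF of a Poisson with the summed rate. A sketch with arbitrary rates 2.0 and 3.5:

```python
import math

def pois(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

# Convolution of Pois(l1) and Pois(l2) PMFs matches Pois(l1 + l2).
l1, l2 = 2.0, 3.5
for k in range(20):
    conv = sum(pois(j, l1) * pois(k - j, l2) for j in range(k + 1))
    assert abs(conv - pois(k, l1 + l2)) < 1e-12
print("additivity verified")
```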
A Poisson process

Suppose that the number of type A outcomes that occur over a fixed interval of time [0, t] follows a process in which:

1. The probability that precisely one type A outcome will occur in a small interval of time \Delta t is approximately proportional to the length of the interval: g(1, \Delta t) = \gamma \Delta t + o(\Delta t), where o(\Delta t) denotes a function of \Delta t having the property that \lim_{\Delta t \to 0} o(\Delta t)/\Delta t = 0.

2. The probability that two or more type A outcomes will occur in a small interval of time \Delta t is negligible: \sum_{x=2}^{\infty} g(x, \Delta t) = o(\Delta t).

3. The numbers of type A outcomes that occur in nonoverlapping time intervals are independent events.

These conditions imply a process which is stationary over the period of observation, i.e. the probability of an occurrence must be the same over the entire period, with neither busy nor quiet intervals.
Poisson densities representing Poisson processes

Result: Consider a Poisson process with rate \gamma per unit of time. The number of events in a time interval of length t is a Poisson r.v. with mean \lambda = \gamma t.

Applications:
- the number of weaving defects in a yard of handloom cloth, or stitching defects in a shirt
- the number of traffic accidents on a motorway in an hour
- the number of particles of a noxious substance that come out of a chimney in a given period of time
- the number of times a machine breaks down each week

Example: Let the probability of exactly one blemish in a foot of wire be 1/1000, and that of two or more blemishes be zero. We are interested in the number of blemishes in 3,000 feet of wire. If the numbers of blemishes in non-overlapping intervals are assumed to be independently distributed, then our random variable X follows a Poisson distribution with \lambda = \gamma t = 3, and P(X = 5) = 3^5 e^{-3} / 5!. You can plug this into a computer, or alternatively use tables, to compute f(5; 3) = .101.
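The wire-blemish calculation above is a one-liner to check:

```python
import math

# Blemishes in 3,000 feet of wire: rate gamma = 1/1000 per foot, so
# lam = gamma * t = 3, and P(X = 5) = 3^5 * e^{-3} / 5!.
lam = (1 / 1000) * 3000
p5 = lam**5 * math.exp(-lam) / math.factorial(5)
print(round(p5, 3))  # 0.101
```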
The Poisson as a limiting distribution

We can show that a binomial distribution with large n and small p can be approximated by a Poisson (which is computationally easier).

Useful result: e^v = \lim_{n \to \infty} (1 + v/n)^n.

We can rewrite the binomial density for non-zero values as f(x; n, p) = \frac{\prod_{i=1}^{x} (n - i + 1)}{x!} p^x (1 - p)^{n - x}. If np = \lambda, we can substitute \lambda/n for p to get

\lim_{n \to \infty} f(x; n, p) = \lim_{n \to \infty} \frac{\prod_{i=1}^{x} (n - i + 1)}{x!} \left(\frac{\lambda}{n}\right)^x \left(1 - \frac{\lambda}{n}\right)^{n - x}
= \lim_{n \to \infty} \frac{\prod_{i=1}^{x} (n - i + 1)}{n^x} \cdot \frac{\lambda^x}{x!} \left(1 - \frac{\lambda}{n}\right)^n \left(1 - \frac{\lambda}{n}\right)^{-x}
= \lim_{n \to \infty} \left[\frac{n}{n} \cdot \frac{n - 1}{n} \cdots \frac{n - x + 1}{n} \cdot \frac{\lambda^x}{x!} \left(1 - \frac{\lambda}{n}\right)^n \left(1 - \frac{\lambda}{n}\right)^{-x}\right]
= \frac{e^{-\lambda} \lambda^x}{x!}

(using the above result and the property that the limit of a product is the product of the limits).
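The convergence can be watched numerically: holding \lambda = np fixed and letting n grow, the binomial PMF approaches the Poisson PMF. A sketch with an arbitrary \lambda = 2:

```python
import math

def binom_pmf(x, n, p):
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def pois_pmf(x, lam):
    return math.exp(-lam) * lam**x / math.factorial(x)

# With lam = n*p fixed, the binomial PMF approaches the Poisson PMF as n grows.
lam = 2.0
errs = []
for n in (10, 100, 10_000):
    err = max(abs(binom_pmf(x, n, lam / n) - pois_pmf(x, lam)) for x in range(10))
    errs.append(err)
    print(n, err)  # the max error shrinks as n increases
```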
Poisson as a limiting distribution... an example

We have a 300-page novel with 1,500 letters on each page. Typing errors are as likely to occur for one letter as for another, and the probability of such an error is p = 10^{-5}. The total number of letters is n = 300 \times 1500 = 450,000.

Using \lambda = np = 4.5, the Poisson distribution gives the probability of the number of errors being less than or equal to 10 as P(X \le 10) \approx \sum_{x=0}^{10} e^{-4.5} (4.5)^x / x! = .9933.

Rules of thumb: the Poisson approximation is close to the binomial probabilities when n \ge 20 and p \le .05, and excellent when n \ge 100 and np \le 10.
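The typing-error probability above is easy to reproduce:

```python
import math

# Typing-error example: n = 450,000 letters, p = 1e-5, so lam = n*p = 4.5.
lam = 450_000 * 1e-5
prob = sum(math.exp(-lam) * lam**x / math.factorial(x) for x in range(11))
print(round(prob, 4))  # 0.9933
```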