Introduction to Probability
Ariel Yadin

Lecture 1

*** Jan. 3 ***

1. Expectation - Discrete Case

Proposition 1.1. Let $X$ be a discrete random variable, with range $R$ and density $f_X$. Then,
$$ E[X] = \sum_{r \in R} r f_X(r). $$

Proof. For all $N$, let
$$ X^+_N := \sum_{r \in R \cap [0,N]} \mathbf{1}_{\{X=r\}} \, r \qquad \text{and} \qquad X^-_N := -\sum_{r \in R \cap [-N,0]} \mathbf{1}_{\{X=r\}} \, r. $$
Note that $X^+_N \nearrow X^+$ and $X^-_N \nearrow X^-$. Moreover, by linearity,
$$ E[X^+_N] = \sum_{r \in R \cap [0,N]} P[X=r] \, r \qquad \text{and} \qquad E[X^-_N] = -\sum_{r \in R \cap [-N,0]} P[X=r] \, r. $$
Using monotone convergence we get that
$$ E[X] = E[X^+] - E[X^-] = \lim_{N \to \infty} \big( E[X^+_N] - E[X^-_N] \big) = \sum_{r \in R} P[X=r] \, r = \sum_{r \in R} f_X(r) \, r. \qquad \square $$

Example 1.2. Let us calculate the expectations of different discrete random variables.

If $X \sim \mathrm{Ber}(p)$ then $E[X] = 1 \cdot P[X=1] + 0 \cdot P[X=0] = p$.

If $X \sim \mathrm{Bin}(n,p)$ then, since $k \binom{n}{k} = n \binom{n-1}{k-1}$ for $1 \le k \le n$,
$$ E[X] = \sum_{k=0}^n f_X(k) \, k = \sum_{k=0}^n \binom{n}{k} p^k (1-p)^{n-k} \, k = np \sum_{k=1}^n \binom{n-1}{k-1} p^{k-1} (1-p)^{n-k} = np. $$
An easier way would be to note that $X = \sum_{k=1}^n X_k$ where $X_k \sim \mathrm{Ber}(p)$ (and in fact $X_1, \ldots, X_n$ are independent). Thus, by linearity, $E[X] = \sum_{k=1}^n E[X_k] = np$.
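The identity $E[\mathrm{Bin}(n,p)] = np$ can also be checked numerically by summing the density directly; a minimal Python sketch (the function name is our own):

```python
from math import comb

def binomial_mean(n, p):
    """E[X] for X ~ Bin(n, p), computed directly from the density
    f_X(k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

# The direct sum agrees with the closed form np given by linearity:
for n, p in [(10, 0.3), (25, 0.5), (7, 0.9)]:
    assert abs(binomial_mean(n, p) - n * p) < 1e-9
```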
For $X \sim \mathrm{Poi}(\lambda)$:
$$ E[X] = \sum_{k \ge 0} e^{-\lambda} \frac{\lambda^k}{k!} \, k = \lambda e^{-\lambda} \sum_{k \ge 1} \frac{\lambda^{k-1}}{(k-1)!} = \lambda. $$

For $X \sim \mathrm{Geo}(p)$: Note that for $g(x) = (1-x)^k$ we have $\frac{\partial}{\partial x} g(x) = -k(1-x)^{k-1}$. So
$$ E[X] = \sum_{k \ge 1} f_X(k) \, k = \sum_{k \ge 1} (1-p)^{k-1} p \, k = -p \frac{\partial}{\partial p} \sum_{k \ge 1} (1-p)^k = -p \frac{\partial}{\partial p} \Big( \frac{1}{p} - 1 \Big) = p \cdot \frac{1}{p^2} = \frac{1}{p}. $$

Another way: Let $E = E[X]$. Then,
$$ E = p + \sum_{k=2}^\infty (1-p)^{k-1} p \, k = p + \sum_{k=2}^\infty (1-p)^{k-1} p (k-1) + \sum_{k=2}^\infty (1-p)^{k-1} p = p + (1-p)E + (1-p). $$
So $E = 1 + (1-p)E$, or $E = 1/p$.

Example 1.3. A pair of independent fair dice are tossed. What is the expected number of tosses needed to see Shesh-Besh (one die showing 6 and the other 5)? Note that each toss of the dice is an independent trial, such that the probability of seeing Shesh-Besh is $2/36 = 1/18$. So if $X$ is the number of tosses until Shesh-Besh, then $X \sim \mathrm{Geo}(1/18)$. Thus, $E[X] = 18$.

1.1. Function of a random variable. Let $g : \mathbb{R}^d \to \mathbb{R}$ be a measurable function. Let $(X_1, \ldots, X_d)$ be a joint distribution of $d$ discrete random variables with range $R$ each. Then, $Y = g(X_1, \ldots, X_d)$ is a random variable. What is its expectation?

Well, first we need the density of $Y$: For any $y \in \mathbb{R}$ we have that
$$ P[Y = y] = P[(X_1, \ldots, X_d) \in g^{-1}(\{y\})] = \sum_{(r_1, \ldots, r_d) \in R^d \cap g^{-1}(\{y\})} P[(X_1, \ldots, X_d) = (r_1, \ldots, r_d)]. $$
So, if $R_Y := g(R^d)$, then since
$$ \mathbb{R}^d = \bigcup_{y \in R_Y} R^d \cap g^{-1}(\{y\}), $$
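The Shesh-Besh answer in Example 1.3 is easy to sanity-check by simulation; the following Python sketch (names and the trial count are our own choices) averages the number of tosses over many runs and should come out near 18:

```python
import random

def tosses_until_shesh_besh(rng):
    """Toss a pair of fair dice until one shows 5 and the other 6."""
    n = 0
    while True:
        n += 1
        a, b = rng.randint(1, 6), rng.randint(1, 6)
        if {a, b} == {5, 6}:   # the event has probability 2/36 = 1/18
            return n

rng = random.Random(0)
trials = 100_000
mean = sum(tosses_until_shesh_besh(rng) for _ in range(trials)) / trials
# mean should be close to 1 / (1/18) = 18
```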
and since $Y$ is discrete, we get that
$$ E[g(X_1, \ldots, X_d)] = \sum_{y \in R_Y} y \, P[Y = y] = \sum_{y \in R_Y} \sum_{\bar{r} \in R^d \cap g^{-1}(\{y\})} f_{(X_1,\ldots,X_d)}(\bar{r}) \, g(\bar{r}) = \sum_{\bar{r} \in R^d} f_{(X_1,\ldots,X_d)}(\bar{r}) \, g(\bar{r}). $$

Specifically,

Proposition 1.4. Let $g : \mathbb{R}^d \to \mathbb{R}$ be a measurable function, and $X = (X_1, \ldots, X_d)$ a discrete joint random variable with range $R^d$. Then,
$$ E[g(X)] = \sum_{\bar{r} \in R^d} f_X(\bar{r}) \, g(\bar{r}). $$

Example 1.5. Manchester United plans to earn some money selling Wayne Rooney jerseys. Each jersey costs the club $x$ pounds, and is sold for $y > x$ pounds. Suppose that the number of people who want to buy jerseys is a discrete random variable $X$ with range $\mathbb{N}$. What is the expected profit if the club orders $N$ jerseys? How many jerseys should be ordered in order to maximize this profit?

Solution. Let $p_k = f_X(k) = P[X = k]$. Let $g_N : \mathbb{N} \to \mathbb{R}$ be the function that takes the number of people that want to buy jerseys and gives the profit, if the club ordered $N$ jerseys. That is,
$$ g_N(k) = \begin{cases} ky - Nx & k \le N, \\ N(y-x) & k > N. \end{cases} $$
The expected profit is then
$$ E[g_N(X)] = \sum_k p_k \, g_N(k) = \sum_k p_k N(y-x) - \sum_{k \le N} p_k (N-k) y = N(y-x) - \sum_{k=0}^N p_k (N-k) y. $$
Call this $G(N) := N(y-x) - \sum_{k=0}^N p_k (N-k) y$. We want to maximize this as a function of $N$. Note that
$$ G(N+1) - G(N) = y - x - y \sum_{k=0}^{N} p_k \big( (N+1-k) - (N-k) \big) = y - x - y \, P[X \le N]. $$
So $G(N+1) > G(N)$ as long as $P[X \le N] < \frac{y-x}{y}$, so the club should order $N+1$ jerseys for the largest $N$ such that $P[X \le N] < \frac{y-x}{y}$.
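The threshold rule from Example 1.5 can be checked numerically for a concrete demand distribution; below is a minimal Python sketch in which the Poisson demand, the cost $x$, and the price $y$ are hypothetical choices of ours, comparing the rule against brute-force maximization of $G(N)$:

```python
from math import exp, factorial

def expected_profit(N, x, y, pmf):
    # G(N) = N*(y - x) - y * sum_{k <= N} p_k * (N - k)
    return N * (y - x) - y * sum(pmf(k) * (N - k) for k in range(N + 1))

# Hypothetical numbers: cost x, sale price y, Poisson(lam) demand.
x, y, lam = 10.0, 25.0, 40.0
pmf = lambda k: exp(-lam) * lam**k / factorial(k)

# Threshold rule: order N+1 for the largest N with P[X <= N] < (y-x)/y,
# i.e. order the smallest N with P[X <= N] >= (y-x)/y.
cdf, N_star = 0.0, 0
while cdf + pmf(N_star) < (y - x) / y:
    cdf += pmf(N_star)
    N_star += 1

# Brute force over a wide range agrees with the threshold rule.
best = max(range(150), key=lambda n: expected_profit(n, x, y, pmf))
assert best == N_star
```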
2. Expectation - Continuous Case

Our goal is now to prove the following theorem:

Theorem 1.6. Let $X = (X_1, \ldots, X_d)$ be an absolutely continuous random variable, and let $g : \mathbb{R}^d \to \mathbb{R}$ be a measurable function. Then,
$$ E[g(X)] = \int_{\mathbb{R}^d} g(\bar{x}) f_X(\bar{x}) \, d\bar{x}. $$

The main lemma here is

Lemma 1.7. Let $X = (X_1, \ldots, X_d)$ be an absolutely continuous random variable. Then, for any Borel set $B \in \mathcal{B}_d$,
$$ P[X \in B] = \int_B f_X(\bar{x}) \, d\bar{x}. $$

Proof. Let $Q(B) = \int_B f_X(\bar{x}) \, d\bar{x}$. Then $Q$ is a probability measure on $(\mathbb{R}^d, \mathcal{B}_d)$ that coincides with $P_X$ on the $\pi$-system of rectangles $(-\infty, b_1] \times \cdots \times (-\infty, b_d]$. Thus, $P[X \in B] = P_X(B) = Q(B)$ for all $B \in \mathcal{B}_d$. $\square$

Remark 1.8. We have not really defined the integral $\int_B f_X(\bar{x}) \, d\bar{x}$. However, for our purposes, we can define it as $P[X \in B]$, and note that this coincides with the Riemann integral on intervals.

Proof of Theorem 1.6. First assume that $g \ge 0$, so $\mathbb{R}^d = g^{-1}[0, \infty)$. For all $n$ define
$$ Y_n = 2^{-n} \lfloor 2^n g(X) \rfloor, $$
which are discrete non-negative random variables. First, we show that
$$ E[Y_n] = \int_{\mathbb{R}^d} 2^{-n} \lfloor 2^n g(\bar{x}) \rfloor f_X(\bar{x}) \, d\bar{x}. $$
Indeed, for $n, k \ge 0$ let $B_{n,k} = g^{-1}[2^{-n} k, 2^{-n}(k+1)) \in \mathcal{B}_d$. Note that
$$ Y_n = \sum_k 2^{-n} k \, \mathbf{1}_{\{X \in B_{n,k}\}} \qquad \text{and} \qquad E[Y_n] = \sum_k 2^{-n} k \, P[X \in B_{n,k}]. $$
Now, since
$$ \mathbb{R}^d = g^{-1}[0, \infty) = \bigcup_k g^{-1}[2^{-n} k, 2^{-n}(k+1)) = \bigcup_k B_{n,k}, $$
and since
$$ \mathbf{1}_{B_{n,k}} \, 2^{-n} \lfloor 2^n g \rfloor f_X = \mathbf{1}_{B_{n,k}} \, 2^{-n} k \, f_X, $$
we have by the lemma above that
$$ \int_{\mathbb{R}^d} 2^{-n} \lfloor 2^n g(\bar{x}) \rfloor f_X(\bar{x}) \, d\bar{x} = \sum_k \int_{\mathbb{R}^d} \mathbf{1}_{B_{n,k}} \, 2^{-n} \lfloor 2^n g(\bar{x}) \rfloor f_X(\bar{x}) \, d\bar{x} = \sum_k 2^{-n} k \int_{B_{n,k}} f_X(\bar{x}) \, d\bar{x} = \sum_k 2^{-n} k \, P[X \in B_{n,k}] = E[Y_n]. $$
Here we have used the fact that $\mathbf{1}_{B_{n,k}} \, 2^{-n} \lfloor 2^n g \rfloor = \mathbf{1}_{B_{n,k}} \, 2^{-n} k$.

Since $2^{-n} \lfloor 2^n g \rfloor \le g \le 2^{-n} \lfloor 2^n g \rfloor + 2^{-n}$, we get that $|Y_n - g(X)| \le 2^{-n}$. Thus,
$$ \Big| E[g(X)] - \int_{\mathbb{R}^d} g(\bar{x}) f_X(\bar{x}) \, d\bar{x} \Big| \le \big| E[g(X)] - E[Y_n] \big| + \Big| \int_{\mathbb{R}^d} g(\bar{x}) f_X(\bar{x}) \, d\bar{x} - \int_{\mathbb{R}^d} 2^{-n} \lfloor 2^n g(\bar{x}) \rfloor f_X(\bar{x}) \, d\bar{x} \Big| \le 2 \cdot 2^{-n}. $$
This proves the theorem for non-negative functions $g$.

Now, if $g$ is a general measurable function, consider $g = g^+ - g^-$. Since $g^+, g^-$ are non-negative, we have that
$$ E[g(X)] = E[g^+(X) - g^-(X)] = E[g^+(X)] - E[g^-(X)] = \int_{\mathbb{R}^d} (g^+(\bar{x}) - g^-(\bar{x})) f_X(\bar{x}) \, d\bar{x}. \qquad \square $$

Corollary 1.9. Let $X$ be an absolutely continuous random variable with density $f_X$. Then,
$$ E[X] = \int_{-\infty}^{\infty} t f_X(t) \, dt. $$
Compare $\int t f_X(t) \, dt$ to $\sum_r r \, P[X = r]$ in the discrete case. This is another place where $f_X$ is like $P[X = \cdot]$ (although the latter is identically $0$ in the continuous case, as we have seen).

Example 1.10. Expectations of some absolutely continuous random variables:

$X \sim U[0,1]$: $E[X] = \int_0^1 t \, dt = 1/2$. More generally, $X \sim U[a,b]$:
$$ E[X] = \int_a^b t \frac{1}{b-a} \, dt = \frac{1}{2(b-a)} (b^2 - a^2) = \frac{a+b}{2}. $$

$X \sim \mathrm{Exp}(\lambda)$: We use integration by parts, since $e^{-\lambda t} = -(\lambda^{-1} e^{-\lambda t})'$:
$$ E[X] = \int_0^\infty t \lambda e^{-\lambda t} \, dt = \big[ -t e^{-\lambda t} \big]_0^\infty + \int_0^\infty e^{-\lambda t} \, dt = \frac{1}{\lambda}. $$
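Corollary 1.9 can be checked numerically for the exponential density; a small Python sketch (the truncation point and step count are our own choices) approximates $\int_0^\infty t \lambda e^{-\lambda t} \, dt$ by a midpoint Riemann sum and recovers $1/\lambda$:

```python
from math import exp

def exp_mean_numeric(lam, T=60.0, n=200_000):
    """Approximate E[X] = integral of t * lam * exp(-lam*t) over [0, T]
    by a midpoint Riemann sum; the tail beyond T is negligible for T >> 1/lam."""
    h = T / n
    return sum((i + 0.5) * h * lam * exp(-lam * (i + 0.5) * h) * h
               for i in range(n))

for lam in (0.5, 1.0, 2.0):
    assert abs(exp_mean_numeric(lam) - 1 / lam) < 1e-3  # matches 1/lambda
```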
$X \sim N(\mu, \sigma^2)$:
$$ E[X] = \int_{-\infty}^{\infty} t \frac{1}{\sqrt{2\pi}\sigma} \exp\Big( -\frac{(t-\mu)^2}{2\sigma^2} \Big) dt. $$
Change $u = t - \mu$, so $du = dt$:
$$ E[X] = \frac{1}{\sqrt{2\pi}\sigma} \int_{-\infty}^{\infty} u \exp\Big( -\frac{u^2}{2\sigma^2} \Big) du + \mu \int_{-\infty}^{\infty} f_X(t) \, dt. $$
Since the function $u \mapsto u \exp\big( -\frac{u^2}{2\sigma^2} \big)$ is an odd function, its integral is $0$, so
$$ E[X] = \mu \int_{-\infty}^{\infty} f_X(t) \, dt = \mu. $$

*** Jan. 5 ***

Proposition 1.11. Let $X$ be an absolutely continuous random variable, such that $E[X]$ exists. Then,
$$ E[X] = \int_0^\infty \big( P[X > t] - P[X \le -t] \big) dt = \int_0^\infty \big( 1 - F_X(t) - F_X(-t) \big) dt. $$

Proof. Note that
$$ \int_0^\infty P[X > t] \, dt = \int_0^\infty \int_t^\infty f_X(s) \, ds \, dt = \int_0^\infty \int_0^s dt \, f_X(s) \, ds = \int_0^\infty s f_X(s) \, ds. $$
Similarly,
$$ \int_0^\infty P[X \le -t] \, dt = \int_0^\infty \int_{-\infty}^{-t} f_X(s) \, ds \, dt = \int_{-\infty}^0 \int_0^{-s} dt \, f_X(s) \, ds = -\int_{-\infty}^0 s f_X(s) \, ds. $$
Subtracting both we have the result. $\square$

** in exercises ** In a similar way we can prove

Exercise 1.12. Let $X$ be a discrete random variable, with range $\mathbb{Z}$, such that $E[X]$ exists. Then,
$$ E[X] = \sum_{k=0}^\infty \big( P[X > k] - P[X < -k] \big). $$

Example 1.13. Let $X \sim N(0,1)$. Compute $E[X^2]$. By the above,
$$ E[X^2] = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} x^2 e^{-x^2/2} \, dx = \frac{1}{\sqrt{2\pi}} \big[ -x e^{-x^2/2} \big]_{-\infty}^{\infty} + \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} e^{-x^2/2} \, dx = 1, $$
where we have used integration by parts, with $\big( e^{-x^2/2} \big)' = -x e^{-x^2/2}$.
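Proposition 1.11 can be tested numerically for a normal random variable, whose CDF is available via the error function; in this Python sketch (the truncation point and grid size are our own choices) the tail integral reproduces the mean $\mu$:

```python
from math import erf, sqrt

def Phi(t, mu, sigma):
    """CDF of N(mu, sigma^2), written in terms of the error function."""
    return 0.5 * (1 + erf((t - mu) / (sigma * sqrt(2))))

def mean_from_tails(mu, sigma, T=40.0, n=200_000):
    """E[X] = integral over [0, inf) of (1 - F(t) - F(-t)), truncated at T."""
    h = T / n
    return sum((1 - Phi((i + 0.5) * h, mu, sigma)
                  - Phi(-(i + 0.5) * h, mu, sigma)) * h
               for i in range(n))

assert abs(mean_from_tails(1.5, 1.0) - 1.5) < 1e-3
assert abs(mean_from_tails(-2.0, 1.0) + 2.0) < 1e-3
```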
3. Examples Using Linearity

Example 1.14. [Coupon Collector] Gilad collects super-goal cards. There are $N$ cards to collect altogether. Each time he buys a card, he gets one of the $N$ uniformly at random, independently. What is the expected number of cards Gilad needs to buy in order to collect all cards?

For $k = 0, 1, \ldots, N-1$, let $T_k$ be the number of cards bought after getting the $k$-th new card, until getting the $(k+1)$-th new card. That is, when Gilad has $k$ different cards, he buys $T_k$ more cards until he has $k+1$ different cards.

If Gilad has $k$ different cards, then with probability $\frac{N-k}{N}$ he buys a card he does not already have. So $T_k \sim \mathrm{Geo}\big( \frac{N-k}{N} \big)$.

Since the total number of cards Gilad buys until getting all $N$ cards is $T = T_0 + T_1 + \cdots + T_{N-1}$, using linearity of expectation,
$$ E[T] = E[T_0] + E[T_1] + \cdots + E[T_{N-1}] = 1 + \frac{N}{N-1} + \cdots + \frac{N}{1} = N \sum_{k=1}^N \frac{1}{k}. $$

Example 1.15. We toss a die 10 times. What is the expected sum of all tosses? Here it really begs to use linearity. If $X_k$ is the outcome of the $k$-th toss, and $X = \sum_{k=1}^{10} X_k$, then
$$ E[X] = \sum_{k=1}^{10} E[X_k] = 10 \cdot \frac{7}{2} = 35. $$

Example 1.16. 20 random numbers are output by a computer, each distributed uniformly on $[0,1]$. What is their expected sum?
$$ 20 \cdot E[U[0,1]] = 20 \cdot \frac{1}{2} = 10. $$

Example 1.17. Let $X_n \sim U[0, 2^{-n}]$ for $n \ge 0$, and let $S_N = \sum_{n=0}^N X_n$. What is the expectation of $S_N$? Linearity of expectation gives
$$ E[S_N] = \sum_{n=0}^N E[X_n] = \sum_{n=0}^N 2^{-(n+1)} = 1 - 2^{-(N+1)}. $$
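The coupon-collector answer $N \sum_{k=1}^N 1/k$ from Example 1.14 is easy to confirm by simulation; a Python sketch (the card count and trial count are arbitrary choices of ours):

```python
import random

def buys_to_collect_all(N, rng):
    """Buy uniformly random cards until all N distinct cards are seen."""
    seen, buys = set(), 0
    while len(seen) < N:
        seen.add(rng.randrange(N))
        buys += 1
    return buys

N, trials = 50, 20_000
rng = random.Random(1)
mean = sum(buys_to_collect_all(N, rng) for _ in range(trials)) / trials
harmonic = sum(1 / k for k in range(1, N + 1))
# mean should be close to N * H_N (about 225 for N = 50)
assert abs(mean - N * harmonic) < 3
```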
Note that if $S = \sum_{n=0}^\infty X_n$, then $S_N \nearrow S$, so monotone convergence gives that $E[S] = \sum_{n=0}^\infty 2^{-(n+1)} = 1$.
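Example 1.17 can likewise be checked by simulation; a short Python sketch with our own choice of $N$ and trial count, whose sample mean should be close to $1 - 2^{-(N+1)}$:

```python
import random

def sample_S(N, rng):
    """One sample of S_N = sum_{n=0}^{N} X_n with X_n ~ U[0, 2^-n]."""
    return sum(rng.random() * 2**-n for n in range(N + 1))

N, trials = 10, 100_000
rng = random.Random(2)
mean = sum(sample_S(N, rng) for _ in range(trials)) / trials
# E[S_N] = 1 - 2^-(N+1), already 0.99951... for N = 10
assert abs(mean - (1 - 2**-(N + 1))) < 0.01
```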