Lecture 3: Expected Value

1.) Definitions. If $X \ge 0$ is a random variable on $(\Omega, \mathcal{F}, P)$, then we define its expected value to be
$$EX = \int_\Omega X \, dP.$$
Notice that this quantity may be infinite. For general $X$, we say that $EX$ exists if the difference $EX = EX^+ - EX^-$ is well-defined, which it will be if either $EX^+ < \infty$ or $EX^- < \infty$. These integrals are taken over all of $\Omega$. If we wish to integrate over a measurable subset $A \subset \Omega$, we will write
$$E[X; A] \equiv \int_A X \, dP = \int_\Omega X 1_A \, dP.$$
Notice that $EX$ inherits all of the properties of the Lebesgue integral. In particular,

Thm. (1.3.1): Suppose that $X, Y \ge 0$ or $E|X|, E|Y| < \infty$. Then
1. $E[X + Y] = EX + EY$.
2. $E[aX + b] = aE[X] + b$ for any $a, b \in \mathbb{R}$.
3. If $P\{X \ge Y\} = 1$, then $EX \ge EY$.

2.) Inequalities:

Jensen's inequality: Suppose that $\varphi : \mathbb{R} \to \mathbb{R}$ is convex, i.e.,
$$\varphi(\lambda x + (1-\lambda)y) \le \lambda \varphi(x) + (1-\lambda)\varphi(y)$$
for all $\lambda \in (0,1)$ and all $x, y \in \mathbb{R}$. Then $\varphi(EX) \le E[\varphi(X)]$. Examples: $|EX| \le E|X|$ and $(EX)^2 \le E[X^2]$.

Hölder's inequality: If $p, q \in [1, \infty]$ with $1/p + 1/q = 1$, then
$$E|XY| \le \|X\|_p \|Y\|_q,$$
where $\|X\|_r \equiv (E|X|^r)^{1/r}$ for $r \in [1, \infty)$ and $\|X\|_\infty \equiv \inf\{M : P(|X| > M) = 0\}.$
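For a finite discrete distribution these inequalities can be checked by direct computation. A minimal sketch in Python; the three-point law and the choice of convex function are hypothetical examples, not anything from the notes:

```python
# Check Jensen's inequality phi(EX) <= E[phi(X)] on a hypothetical
# three-point distribution (values and probabilities are made up).
values = [-1.0, 0.0, 2.0]
probs = [0.25, 0.25, 0.5]

def E(f):
    """Expectation of f(X) under the discrete law above."""
    return sum(p * f(x) for x, p in zip(values, probs))

EX = E(lambda x: x)                      # EX = 0.75
phi = lambda x: x * x                    # a convex function

assert phi(EX) <= E(phi)                 # (EX)^2 <= E[X^2]
assert abs(EX) <= E(lambda x: abs(x))    # |EX| <= E|X|
```

Both example inequalities from Jensen hold with strict slack here, since the law is not concentrated at a point.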
$\|X\|_\infty$ is called the essential supremum of $X$ and satisfies the inequality $\|X\|_\infty \le \sup\{|X(\omega)| : \omega \in \Omega\}$. Also, notice that the case $p = q = 2$ is the Cauchy-Schwarz inequality:
$$E|XY| \le \big(E[X^2]\, E[Y^2]\big)^{1/2}.$$

Chebyshev's inequality: Suppose that $\varphi : \mathbb{R} \to \mathbb{R}$ is non-negative, let $A$ be a Borel set and set $i_A = \inf\{\varphi(y) : y \in A\}$. Then
$$i_A\, P(X \in A) \le E[\varphi(X); X \in A] \le E[\varphi(X)].$$
Proof: The result follows by taking expectations of the inequality
$$i_A 1_A(X) \le \varphi(X) 1_A(X) \le \varphi(X).$$
If $\varphi(x) = x^2$ and $A = \{x : |x| \ge a\}$, then we obtain Markov's inequality:
$$a^2 P\{|X| \ge a\} \le E[X^2].$$

3.) Integration of limits: We are interested in conditions that guarantee that if $X_n \to X$, then $EX_n \to EX$. The following example shows that this does not hold in general.

Example: Take $\Omega = (0,1)$, $\mathcal{F}$ = the Borel sets, and $P$ = Lebesgue measure on $(0,1)$. If $X_n = n 1_{(0,1/n)}$, then $X_n \to X \equiv 0$ pointwise, but $EX_n = 1 > 0 = EX$.

We begin by recalling three classical results from analysis.

Fatou's Lemma: If $X_n \ge 0$, then
$$E\big[\liminf_{n\to\infty} X_n\big] \le \liminf_{n\to\infty} E[X_n].$$

Monotone Convergence Theorem: If $0 \le X_n \uparrow X$, then $EX_n \uparrow EX$.
Proof: Since $X_n \le X$ for all $n$, we know that $\limsup EX_n \le EX$. However, since $X = \liminf X_n$, Fatou's Lemma implies that $EX \le \liminf EX_n$. Combining these two results shows that $EX = \lim EX_n$.
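The counterexample above is easy to see numerically. A small sketch, using exact rational arithmetic so that the expectations come out exactly:

```python
from fractions import Fraction

# X_n = n * 1_(0,1/n) on ((0,1), Borel, Lebesgue):
# E[X_n] = n * P((0,1/n)) = n * (1/n) = 1 for every n,
# yet X_n(w) -> 0 for each fixed w in (0,1).
def EX_n(n):
    return Fraction(n) * Fraction(1, n)   # integral of n over (0, 1/n)

def X_n(n, w):
    return n if 0 < w < Fraction(1, n) else 0

assert all(EX_n(n) == 1 for n in range(1, 50))
# pointwise convergence at w = 1/3: X_n(1/3) = 0 as soon as 1/n <= 1/3
assert [X_n(n, Fraction(1, 3)) for n in (1, 2, 3, 10)] == [1, 2, 0, 0]
```

The mass escapes to a spike of height $n$ on a shrinking interval, which is exactly the behavior the convergence theorems below are designed to rule out.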
Dominated Convergence Theorem: If $X_n \to X$ a.s. and $|X_n| \le Y$ for all $n$, where $EY < \infty$, then $EX_n \to EX$. The special case where $Y$ is constant is called the bounded convergence theorem.

The following theorem can handle some cases that are not covered by either the monotone or the dominated convergence theorems.

Theorem (1.3.8): Suppose that $X_n \to X$ a.s. Let $g, h$ be continuous functions such that
1. $g \ge 0$ and $g(x) \to \infty$ as $|x| \to \infty$;
2. $|h(x)|/g(x) \to 0$ as $|x| \to \infty$;
3. $E[g(X_n)] \le K < \infty$ for all $n$.
Then $E[h(X_n)] \to E[h(X)]$.

Proof: By subtracting a constant from $h$, we can assume wlog that $h(0) = 0$. Choose $M$ so that $P(|X| = M) = 0$ and $g(x) > 0$ whenever $|x| \ge M$. Given a random variable $Y$, let $\bar{Y} = Y 1_{(|Y| \le M)}$. Then $\bar{X}_n \to \bar{X}$ a.s. Indeed, on the set $\{|X| < M\}$, we have $|X_n| < M$ for all $n$ sufficiently large and so $\bar{X}_n = X_n \to X = \bar{X}$, while on the set $\{|X| > M\}$, we have $|X_n| > M$ for all $n$ sufficiently large and so $\bar{X}_n = 0 \to 0 = \bar{X}$. Since $h(\bar{X}_n)$ is bounded and $h$ is continuous, the bounded convergence theorem implies that
$$E[h(\bar{X}_n)] \to E[h(\bar{X})].$$

To control the truncation error, let
$$\epsilon_M \equiv \sup\{|h(x)|/g(x) : |x| \ge M\}$$
and observe that for any random variable $Y$ we have
$$(*) \qquad \big|E[h(\bar{Y})] - E[h(Y)]\big| \le E\big|h(\bar{Y}) - h(Y)\big| = E\big[|h(Y)|; |Y| > M\big] \le \epsilon_M\, E[g(Y)].$$
Taking $Y = X_n$ in $(*)$ and using condition (3) in the theorem, we have
$$\big|E[h(\bar{X}_n)] - E[h(X_n)]\big| \le K \epsilon_M.$$
To estimate the remaining truncation error, notice that because $g \ge 0$ and $g$ is continuous, Fatou's lemma implies that
$$E[g(X)] \le \liminf_{n\to\infty} E[g(X_n)] \le K.$$
Then, taking $Y = X$ in $(*)$ gives
$$\big|E[h(\bar{X})] - E[h(X)]\big| \le K \epsilon_M.$$
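Theorem (1.3.8) also explains why the spike example $X_n = n 1_{(0,1/n)}$ misbehaves: with $g(x) = x^2$ and $h(x) = x$, condition (3) fails. A quick check of that claim:

```python
from fractions import Fraction

# With g(x) = x^2, condition (3) of Theorem (1.3.8) fails for
# X_n = n * 1_(0,1/n): E[g(X_n)] = n^2 * (1/n) = n, which is unbounded.
def E_g_Xn(n):
    return Fraction(n) ** 2 * Fraction(1, n)

assert [E_g_Xn(n) for n in (1, 2, 10, 100)] == [1, 2, 10, 100]
```

No uniform bound $K$ exists, so the theorem (correctly) does not apply, consistent with $EX_n = 1 \not\to 0 = EX$.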
Finally, by the triangle inequality, we have
$$\big|E[h(X_n)] - E[h(X)]\big| \le \big|E[h(X_n)] - E[h(\bar{X}_n)]\big| + \big|E[h(\bar{X}_n)] - E[h(\bar{X})]\big| + \big|E[h(\bar{X})] - E[h(X)]\big|.$$
Letting $n \to \infty$, we obtain
$$\limsup_{n\to\infty} \big|E[h(X_n)] - E[h(X)]\big| \le 2K\epsilon_M,$$
which can be made arbitrarily close to $0$ since $\epsilon_M \to 0$ as $M \to \infty$.

Corollary: Suppose that $X_n \to X$ a.s. and that there exist a $K < \infty$ and a $p > 1$ such that $E[|X_n|^p] \le K$ for all $n \ge 1$. Then $EX_n \to EX$. (Apply the theorem with $g(x) = |x|^p$ and $h(x) = x$.)

4.) Computing Expected Values

Change of Variables Formula: Let $X$ be a random variable with distribution $\mu$ on $(S, \mathcal{S})$. If $f$ is a measurable function from $(S, \mathcal{S})$ to $(\mathbb{R}, \mathcal{R})$ such that $f \ge 0$ or $E|f(X)| < \infty$, then
$$E[f(X)] = \int_S f(y)\, \mu(dy).$$

Proof: We use the approach that Williams calls the "standard machine."

Case 1: Indicator Functions. If $B \in \mathcal{S}$ and $f = 1_B$, then
$$E[1_B(X)] = P(X \in B) = \mu(B) = \int_S 1_B(y)\, \mu(dy).$$

Case 2: Simple Functions. Let $f(x) = \sum_{i=1}^n c_i 1_{B_i}(x)$, where $c_i \in \mathbb{R}$ and $B_i \in \mathcal{S}$. Then Case 1 combined with the linearity of both expectations and integration shows that
$$E[f(X)] = \sum_{i=1}^n c_i E[1_{B_i}(X)] = \sum_{i=1}^n c_i \int_S 1_{B_i}(y)\, \mu(dy) = \int_S f(y)\, \mu(dy).$$

Case 3: Nonnegative Functions. If $f \ge 0$ and we let $f_n(x) = \big(\lfloor 2^n f(x) \rfloor / 2^n\big) \wedge n$, then each $f_n$ is a simple function and $f_n \uparrow f$ as $n \to \infty$. Combining Case 2 with the monotone convergence theorem gives
$$E[f(X)] = \lim_{n\to\infty} E[f_n(X)] = \lim_{n\to\infty} \int_S f_n(y)\, \mu(dy) = \int_S f(y)\, \mu(dy).$$
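The dyadic approximation in Case 3 is easy to watch in code. A sketch, where the choice $f(x) = x^2$ and the evaluation point are hypothetical:

```python
import math

# Case 3's simple approximation f_n = (floor(2^n f) / 2^n) ∧ n:
# each f_n takes only finitely many values, and f_n increases up to f.
def dyadic(f, n, x):
    return min(math.floor(2**n * f(x)) / 2**n, n)

f = lambda x: x * x          # a sample nonnegative function (made up)
x = 1.3
approx = [dyadic(f, n, x) for n in range(1, 12)]

# nondecreasing in n, and never exceeding f(x)
assert all(a <= b for a, b in zip(approx, approx[1:]))
assert all(a <= f(x) for a in approx)
```

Once $2^n$ exceeds $f(x)$ the cap at $n$ is inactive and the error is at most $2^{-n}$, which is what drives the monotone convergence step.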
Case 4: Integrable Functions. Write $f(x) = f^+(x) - f^-(x)$ and note that the integrability of $f$ implies that $E[f^+(X)]$ and $E[f^-(X)]$ are finite. Using Case 3 gives
$$E[f(X)] = E[f^+(X)] - E[f^-(X)] = \int_S f^+(y)\, \mu(dy) - \int_S f^-(y)\, \mu(dy) = \int_S f(y)\, \mu(dy).$$
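On a finite probability space the change-of-variables formula can be verified by direct enumeration: sum over $\Omega$ on one side, against the induced distribution $\mu$ on the other. A sketch; the space, the map $X$, and $f$ are all made-up examples:

```python
from collections import Counter

# Verify E[f(X)] = ∫ f dμ on a hypothetical finite space:
# Omega = {0,...,5} with uniform P, and X(w) = w mod 3.
omega = range(6)
P = 1.0 / 6
X = lambda w: w % 3

mu = Counter()                      # distribution mu of X
for w in omega:
    mu[X(w)] += P

f = lambda y: (y - 1) ** 2          # an integrable f (made up)

lhs = sum(P * f(X(w)) for w in omega)   # E[f(X)] computed on Omega
rhs = sum(mu[y] * f(y) for y in mu)     # ∫ f(y) μ(dy)
assert abs(lhs - rhs) < 1e-12
```

The left side iterates over sample points, the right over values of $X$; the formula says the bookkeeping can always be moved from $\Omega$ to the distribution.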