ABSTRACT EXPECTATION

Abstract. In undergraduate courses, expectation is sometimes defined twice: once for discrete random variables and again for continuous random variables. Here we give a single definition, based on measure theory, that agrees with the undergraduate definitions.

1. Undergraduate definitions

Recall that if $a_1, a_2, \ldots$ is a sequence of real numbers, then we say that $(a_n)$ has an absolutely convergent sum if
$$\sum_{n=1}^{\infty} |a_n| < \infty.$$
If $(a_n)$ has an absolutely convergent sum, then
$$\sum_{n=1}^{\infty} a_n$$
converges, and any re-ordering of the terms results in a convergent sum with the same limit. If $(a_n)$ has a convergent sum that is not absolutely convergent, then it is a theorem (Riemann's rearrangement theorem) that for all $z \in \mathbb{R}$, the sequence can be re-ordered so that the rearranged sum converges to $z$. Hence we need to be careful when sums have both positive and negative terms.

Recall that we say that $X$ is a discrete real-valued random variable if there exists a countable set $D \subset \mathbb{R}$ such that $P(X \in D) = 1$. We defined
$$EX = \sum_{x \in D} x \, P(X = x),$$
provided that $\sum_{x \in D} |x| \, P(X = x) < \infty$. We can allow the values $\pm\infty$ if we are even more careful.
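The rearrangement theorem recalled above can be seen numerically. The sketch below (not part of the notes; the function name is ours) greedily reorders the alternating harmonic series $1 - \tfrac12 + \tfrac13 - \cdots$, which converges but not absolutely, so that its partial sums approach an arbitrary target $z$.

```python
# Riemann rearrangement, numerically: greedily take unused positive
# terms 1/1, 1/3, 1/5, ... while the partial sum is below the target z,
# and unused negative terms -1/2, -1/4, ... while it is above z.
# The partial sums then converge to z, with error bounded by the
# magnitude of the last term used.

def rearranged_partial_sum(z, n_terms):
    pos = 1      # next positive term is 1/pos (odd denominators)
    neg = 2      # next negative term is -1/neg (even denominators)
    s = 0.0
    for _ in range(n_terms):
        if s <= z:
            s += 1.0 / pos
            pos += 2
        else:
            s -= 1.0 / neg
            neg += 2
    return s

print(rearranged_partial_sum(0.5, 100_000))   # close to 0.5
print(rearranged_partial_sum(2.0, 100_000))   # close to 2.0
```

The same series summed in its given order converges to $\ln 2 \approx 0.693$, so both outputs above genuinely come from reordering, not from changing the terms.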
For any random variable, set $X^+ = \max\{X, 0\}$ and $X^- = -\min\{X, 0\}$, so that $X = X^+ - X^-$. If at least one of $EX^+$ and $EX^-$ is finite, then we can define $EX = EX^+ - EX^-$. In what follows we will extend this definition of expectation to all random variables by taking limits.

2. Simple functions

Let $X$ be a random variable on the probability space $(\Omega, \mathcal{F}, P)$. We say that $X$ is a simple random variable if there exist a finite number of events $A_i \in \mathcal{F}$ and real numbers $a_i \in \mathbb{R}$ such that
$$X = \sum_{i=1}^{n} a_i \mathbf{1}_{A_i}.$$
We set
$$EX := \sum_{i=1}^{n} a_i P(A_i).$$

Exercise 2.1. Show that the above definition is well-defined; that is, if
$$X = \sum_{i=1}^{n} a_i \mathbf{1}_{A_i} = \sum_{i=1}^{m} b_i \mathbf{1}_{B_i},$$
then $\sum_{i=1}^{n} a_i P(A_i) = \sum_{i=1}^{m} b_i P(B_i)$.

Proposition 1. Let $X : \Omega \to [0, 1)$ be a random variable on the probability space $(\Omega, \mathcal{F}, P)$. There exists a sequence of simple random variables $X_n \to X$ everywhere.

Proof. Let $n \geq 1$, and for each $k \in \{0, 1, \ldots, n-1\}$ consider the events
$$A_{n,k} := \{X \in [k/n, (k+1)/n)\}.$$
Set
$$X_n = \sum_{k=0}^{n-1} \frac{k}{n} \mathbf{1}_{A_{n,k}}.$$
Note that $|X_n - X| \leq \frac{1}{n}$. $\square$

Exercise 2.2. Prove Proposition 1 for any real-valued random variable.

Exercise 2.3. Show that every non-negative random variable is the limit of a non-decreasing sequence of simple random variables.
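On the event $\{X = x\}$, the simple random variable $X_n$ of Proposition 1 takes the value $\lfloor nx \rfloor / n$. A quick numerical sketch (ours, for illustration) confirms the error bound $|X_n - X| \leq 1/n$:

```python
import random

# Proposition 1's construction: X_n rounds X down to the grid
# {0, 1/n, 2/n, ...}, i.e. X_n = floor(n X) / n on [0, 1),
# so 0 <= X - X_n <= 1/n and X_n -> X everywhere.

def simple_approximation(x, n):
    """Value of the simple random variable X_n on the event {X = x}."""
    assert 0.0 <= x < 1.0
    k = int(n * x)   # index of the cell [k/n, (k+1)/n) containing x
    return k / n

random.seed(0)
samples = [random.random() for _ in range(1000)]
for n in (1, 10, 100, 1000):
    err = max(abs(x - simple_approximation(x, n)) for x in samples)
    print(n, err)    # maximum error is at most 1/n
```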
Let $X$ and $Y$ be random variables on the same probability space $(\Omega, \mathcal{F}, P)$. Recall that $\sigma(X) = \{X^{-1}(B) : B \in \mathcal{B}\}$. We abuse notation and write $Y \in \sigma(X)$ if $Y$ is measurable with respect to the smaller sigma-field generated by $X$; that is, for all Borel sets $B \in \mathcal{B}$, we have $\{Y \in B\} \in \sigma(X)$.

Exercise 2.4. In this exercise we will argue that if $Y \in \sigma(X)$, then there exists a Borel measurable function $g : \mathbb{R} \to \mathbb{R}$ such that $Y = g(X)$ almost surely.
(a) Show that for every Borel function $g : \mathbb{R} \to \mathbb{R}$, we have $g(X) \in \sigma(X)$.
(b) Show that for every event $A \in \sigma(X)$ there exists a measurable function $g : \mathbb{R} \to \mathbb{R}$ such that $g(X) = \mathbf{1}_A$.
(c) Show that if $W \in \sigma(X)$ is a simple function, then there exists a measurable function $g : \mathbb{R} \to \mathbb{R}$ such that $g(X) = W$.
(d) Notice that $(\Omega, \sigma(X), P)$ is a probability space in its own right. Thus there exists a sequence of simple functions $Y_n \to Y$ almost surely, where $Y_n \in \sigma(X)$. Use this fact to find a measurable function $g : \mathbb{R} \to \mathbb{R}$ such that $g(X) = Y$ almost surely.

3. Defining expectation and integration

Let $X$ be a non-negative random variable on a probability space $(\Omega, \mathcal{F}, P)$. Set
$$EX := \sup\{EY : Y \leq X \text{ and } Y \text{ is a non-negative simple function}\}.$$
For a general random variable $X$, we set $EX := EX^+ - EX^-$ if at least one of the expectations is finite. We say that $X \in L^p$ if $E|X|^p < \infty$.

Theorem 2 (Monotone convergence theorem for simple functions). Let $X$ be a non-negative random variable. If $X_n$ is a non-decreasing sequence of non-negative simple functions that converge to $X$, then
$$\lim_{n \to \infty} EX_n = EX.$$

It is not too difficult to prove Theorem 2, but we will omit the proof, since you will probably see it in a proper measure theory course. Theorem 2 allows us to prove all the properties we are used to. For example:

Lemma 3 (Linearity). If $X$ and $Y$ are random variables with finite expectation, then
$$E(aX + bY) = aEX + bEY \quad \text{for all } a, b \in \mathbb{R}.$$
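Theorem 2 and Exercise 2.3 can be illustrated concretely. Taking $X$ uniform on $[0,1)$ (our choice, for illustration) and the dyadic simple functions $X_n = \lfloor 2^n X \rfloor / 2^n$, refining the grid only moves values up, so the sequence is non-decreasing, and the exact expectations increase to $EX = 1/2$:

```python
from fractions import Fraction

# For X uniform on [0, 1), the dyadic simple function
# X_n = floor(2^n X) / 2^n has
# E X_n = sum_k (k/2^n) P(X in [k/2^n, (k+1)/2^n))
#       = sum_k (k/2^n)(1/2^n) = (2^n - 1) / 2^(n+1),
# a non-decreasing sequence with limit 1/2 = E X,
# as the monotone convergence theorem predicts.

def dyadic_expectation(n):
    m = 2 ** n
    # P(X in [k/m, (k+1)/m)) = 1/m for X uniform on [0, 1)
    return sum(Fraction(k, m) * Fraction(1, m) for k in range(m))

values = [dyadic_expectation(n) for n in range(1, 8)]
print(values)   # non-decreasing, approaching 1/2
```

Exact rational arithmetic (`Fraction`) makes the monotonicity visible without floating-point noise.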
The proof follows from the fact that we have linearity for simple functions. The following are the standard continuity properties of $E$.

Theorem 4 (Convergence theorems). Let $X_n$ be a sequence of random variables that converge to $X$ almost surely.
(a) Monotone convergence theorem: If $X_n \geq 0$ is a non-decreasing sequence, then $EX_n \to EX$.
(b) Bounded convergence theorem (only for finite measure spaces): If there exists $C$ such that $|X_n| \leq C$, then $X \in L^1$ and $E|X_n - X| \to 0$.
(c) Dominated convergence theorem: If there exists a random variable $Y$ such that $E|Y| < \infty$ and $|X_n| \leq Y$, then $X \in L^1$ and $E|X_n - X| \to 0$.

Note that if $E|X_n - X| \to 0$, then $EX_n \to EX$.

Lemma 5 (Fatou). If $X_n \geq 0$, then $\liminf_{n \to \infty} EX_n \geq E(\liminf_{n \to \infty} X_n)$.

It is easy to see that in the case of a discrete random variable, our new definition agrees with the old one. Here we do a simple calculation to illustrate that it agrees with the old definition in the special case of a continuous random variable supported in $[0, 1]$.

Lemma 6. If $X$ is a continuous random variable taking values in $[0, 1]$, with a pdf $f$ that is continuous on $[0, 1]$, then
$$EX = \int_0^1 x f(x)\, dx.$$

Proof. Let $X_n$ be defined as in Proposition 1, so that $|X_n - X| \leq \frac{1}{n}$ and $|X_n| \leq 1$. We know by the bounded convergence theorem that $EX_n \to EX$. By the definition of a continuous random variable, we have that
$$P(A_{n,k}) = \int_{k/n}^{(k+1)/n} f(x)\, dx.$$
Thus
$$EX_n = \sum_{k=0}^{n-1} \frac{k}{n} P(A_{n,k}) = \sum_{k=0}^{n-1} \frac{k}{n} \int_{k/n}^{(k+1)/n} f(x)\, dx.$$
We set
$$g_n(x) = \sum_{k=0}^{n-1} \mathbf{1}\big[x \in [k/n, (k+1)/n)\big] \, \frac{k}{n} f(x),$$
so that
$$EX_n = \int_0^1 g_n(x)\, dx.$$
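The calculation in Lemma 6's proof can be run numerically for a concrete density. Here we take $f(x) = 2x$ on $[0,1]$ (our choice, for illustration), so that $EX = \int_0^1 x \cdot 2x\, dx = 2/3$, and compute $EX_n = \sum_k \frac{k}{n} P(A_{n,k})$ with the cell probabilities evaluated exactly:

```python
# Lemma 6, numerically, for the density f(x) = 2x on [0, 1].
# P(A_{n,k}) = integral of 2x over [k/n, (k+1)/n) = ((k+1)^2 - k^2)/n^2,
# and EX_n = sum_k (k/n) P(A_{n,k}) increases to EX = 2/3, with
# 0 <= EX - EX_n <= 1/n since 0 <= X - X_n <= 1/n.

def EX_n(n):
    total = 0.0
    for k in range(n):
        p = ((k + 1) ** 2 - k ** 2) / n ** 2   # exact cell probability
        total += (k / n) * p
    return total

for n in (10, 100, 1000):
    print(n, EX_n(n))   # approaches 2/3 from below
```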
Let $g(x) = x f(x)$. Since $f$ is continuous on $[0, 1]$, it is bounded: say $|f(x)| \leq M$ for all $x \in [0, 1]$. Note that $|g_n(x) - g(x)| \leq M/n$, so that $g_n$ converges to $g$ uniformly; hence
$$EX_n = \int_0^1 g_n(x)\, dx \to \int_0^1 x f(x)\, dx = EX. \qquad \square$$

Exercise 3.1. Prove that the dominated convergence theorem implies the bounded convergence theorem.

Exercise 3.2. Let $X_n$ be non-negative random variables. Show that
$$E\Big(\sum_{n=0}^{\infty} X_n\Big) = \sum_{n=0}^{\infty} EX_n.$$

Sometimes it is more convenient to think of expectation as integration with respect to the probability measure $P$: we write
$$EX = \int_{\Omega} X(\omega)\, dP(\omega).$$
In the case of a finite measure space $(\Omega, \mathcal{F}, \mu)$, we also define integration with respect to the measure $\mu$ in exactly the same way we defined expectation for random variables on a probability space. For a simple function $f : \Omega \to \mathbb{R}$ given by
$$f = \sum_{i=1}^{n} a_i \mathbf{1}_{A_i}$$
for $A_i \in \mathcal{F}$ and $a_i \in \mathbb{R}$, we set
$$\mu(f) := \int f(\omega)\, d\mu(\omega) := \sum_{i=1}^{n} a_i \mu(A_i),$$
and proceed in exactly the same manner.

It is slightly more tricky to deal with the case where $(\Omega, \mathcal{F}, \mu)$ is a measure space with $\mu(\Omega) = \infty$. In this case, we need the additional assumption that $\mu$ is $\sigma$-finite; that is, there exists a countable sequence $F_i \in \mathcal{F}$ such that their union is all of $\Omega$ and each $F_i$ has finite measure under $\mu$. Borel measure is a $\sigma$-finite measure: partition the real line into the intervals $[n, n+1)$ for $n \in \mathbb{Z}$.

Exercise 3.3. Show that the extension in Caratheodory's extension theorem is also unique for $\sigma$-finite measures.

When we integrate with respect to Borel measure (or Lebesgue measure), this is called the Lebesgue integral. The Lebesgue integral can handle a much wider class of functions, and it agrees with the Riemann integral whenever the latter exists.
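A standard example of the "wider class of functions" is the Dirichlet function, the indicator of the rationals; the following short derivation (ours, using only facts from these notes) shows the Lebesgue integral exists while the Riemann integral does not.

```latex
% The Dirichlet function: Lebesgue integrable, not Riemann integrable.
\[
  f = \mathbf{1}_{\mathbb{Q} \cap [0,1]}, \qquad
  \int_0^1 f \, d\lambda \;=\; \lambda\big(\mathbb{Q} \cap [0,1]\big) \;=\; 0,
\]
since $\mathbb{Q} \cap [0,1]$ is countable, hence a Lebesgue null set.
% Every Riemann sum, by contrast, can be made 0 (sample only irrational
% points) or 1 (sample only rational points), no matter how fine the
% partition, so the Riemann integral of f does not exist.
```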
Theorem 7. The Riemann integral, defined using Riemann sums, agrees with the Lebesgue integral, defined using measure theory, whenever the Riemann integral exists.

The next theorem allows us to do calculations and provides a key bridge to our undergraduate notions.

Theorem 8 (Law of the unconscious statistician). Let $X$ be a random variable on the probability space $(\Omega, \mathcal{F}, P)$. Let $\mu$ be the law of $X$, so that $(\mathbb{R}, \mathcal{B}, \mu)$ is also a probability space. If $g$ is a Borel function such that $E|g(X)| < \infty$, then
$$Eg(X) = \int_{\mathbb{R}} g(y)\, d\mu(y).$$
In addition, if $X$ is a continuous random variable with density $f$, then
$$Eg(X) = \int_{\mathbb{R}} g(y)\, d\mu(y) = \int_{-\infty}^{\infty} g(x) f(x)\, dx.$$

The proof of Theorem 8 proceeds by checking the result for simple functions, then extending the result to non-negative functions by using the monotone convergence theorem, and finally to general integrable functions by taking positive and negative parts. Theorem 8 also holds in the case that $g$ is non-negative.

Exercise 3.4. Let $(\Omega, \mathcal{F}, \mu)$ be a probability space, and let $T : \Omega \to \Omega$ be a measure-preserving map, so that $\mu = \mu \circ T^{-1}$. Let $f \in L^1$. Show that
$$\int (f \circ T)(x)\, d\mu(x) = \int f(x)\, d\mu(x).$$

For Exercise 3.4, note that in the case $f = \mathbf{1}_A$ for some $A \in \mathcal{F}$, we have $\mathbf{1}_A \circ T = \mathbf{1}_{T^{-1}(A)}$, and
$$\int (f \circ T)(x)\, d\mu(x) = \mu(T^{-1}(A)) = \mu(A) = \int f(x)\, d\mu(x),$$
where the middle equality comes from the assumption that $T$ is measure-preserving.
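Exercise 3.4 can also be sanity-checked numerically. The rotation $T(x) = x + a \pmod 1$ preserves Lebesgue measure on $[0,1)$; below we verify $\int f \circ T = \int f$ via midpoint Riemann sums. The choices $f(x) = x^2$ and $a = \sqrt{2} - 1$ are arbitrary illustrations, not from the notes.

```python
from math import sqrt

# T(x) = x + a (mod 1) is measure-preserving for Lebesgue measure on
# [0, 1): the preimage of any interval is a union of intervals of the
# same total length.  Hence ∫ f(T(x)) dx = ∫ f(x) dx for f in L^1.

A = sqrt(2) - 1

def T(x):
    return (x + A) % 1.0

def f(x):
    return x * x

def midpoint_integral(h, n=200_000):
    """Midpoint Riemann sum of h over [0, 1)."""
    return sum(h((k + 0.5) / n) for k in range(n)) / n

lhs = midpoint_integral(lambda x: f(T(x)))
rhs = midpoint_integral(f)
print(lhs, rhs)   # both close to 1/3
```

Note that $f \circ T$ has a jump at $x = 1 - a$, but the single cell containing the jump contributes an error of at most $1/n$, so the two sums still agree to high accuracy.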