Econ 514: Probability and Statistics
Lecture 2: Random Variables and Expectation

Definition of function: Given sets $X$ and $Y$, a function $f$ with domain $X$ and image $Y$ is a rule that assigns to every $x \in X$ one (and only one) $y \in Y$. Notation: $f : X \to Y$, $y = f(x)$.
Definition of random variable
Let $(\Omega, \mathcal{A}, P)$ be a probability space. A random variable $X$ is a function $X : \Omega \to \mathbb{R}$ such that for all $B \in \mathcal{B}$, with $\mathcal{B}$ the Borel $\sigma$-algebra,
$E = \{\omega \mid X(\omega) \in B\} \in \mathcal{A}$
The set $E$ is also denoted as $E = X^{-1}(B)$. This does not mean that $X^{-1}$ exists, i.e. that $X$ is a 1-1 function! See figure.
A random variable $X$ is thus a function that is in addition Borel measurable. Why we need this will be discussed later. Measurability is a more general concept; because random variables are always functions to $\mathbb{R}$, we need only Borel measurability, since for $\mathbb{R}$ we always take the Borel $\sigma$-algebra. Often the function $X$ can take the values $\infty$ and $-\infty$, i.e. $X$ is a function to the extended real line $\overline{\mathbb{R}}$ with the extended Borel $\sigma$-field $\overline{\mathcal{B}}$ that contains all sets in $\mathcal{B}$ and the two points $\infty, -\infty$.

Why random variables? Often the outcomes of a random experiment are complicated. A random variable summarizes (aspects of) an outcome in a single number.
Example: Three tosses of a single coin. Outcome space
$\Omega = \{HHH, HHT, HTH, THH, HTT, THT, TTH, TTT\}$
Define $X$ = number of H in 3 tosses:
$X(\omega) = 3$ if $\omega = HHH$
$X(\omega) = 2$ if $\omega \in \{HHT, HTH, THH\}$
$X(\omega) = 1$ if $\omega \in \{THT, TTH, HTT\}$
$X(\omega) = 0$ if $\omega = TTT$
For a given random experiment we can define many random variables, e.g. in this example $Y$ = number of T before the first H.
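The example above can be checked by enumeration. A minimal sketch (the dictionaries `X` and `Y` are illustrative names, not from the notes); for outcomes with no H at all, `Y` is set to 3, the total number of tosses:

```python
from itertools import product

# Enumerate the outcome space for three tosses of a coin.
omega = ["".join(t) for t in product("HT", repeat=3)]

# X(w) = number of heads in the outcome w.
X = {w: w.count("H") for w in omega}

# Y(w) = number of tails before the first head (3 if no head occurs).
Y = {w: w.index("H") if "H" in w else 3 for w in omega}

print(X["HTH"])  # 2
print(Y["TTH"])  # 2
```

This illustrates that two different random variables, defined on the same outcome space, summarize different aspects of the same outcome.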
Measurability
To establish measurability we use a generating-class argument, because it is often easier to establish measurability on such a class. If $\mathcal{E}$ is a generating class for $\mathcal{B}$, e.g. the intervals $(-\infty, x]$ or $(x, \infty)$, then we need only show that $X^{-1}(E) \in \mathcal{A}$ for all $E \in \mathcal{E}$.
Proof: Define $\mathcal{C} = \{B \in \mathcal{B} \mid X^{-1}(B) \in \mathcal{A}\}$. We show that this is a $\sigma$-field.
(i) $\emptyset \in \mathcal{C}$.
(ii) Note $X^{-1}(B^c) = X^{-1}(B)^c$, because by definition $\omega \in X^{-1}(B)^c$ iff $X(\omega) \notin B$ iff $X(\omega) \in B^c$. Hence $X^{-1}(B^c) \in \mathcal{A}$.
(iii) Note $X^{-1}(\bigcup_{i=1}^\infty B_i) = \bigcup_{i=1}^\infty X^{-1}(B_i)$, because $\omega \in X^{-1}(\bigcup_{i=1}^\infty B_i)$ iff $X(\omega) \in \bigcup_{i=1}^\infty B_i$. Hence $X^{-1}(\bigcup_{i=1}^\infty B_i) \in \mathcal{A}$.
Because $\mathcal{C}$ is a $\sigma$-field and $\mathcal{E} \subset \mathcal{C}$, we have $\mathcal{B} = \sigma(\mathcal{E}) \subset \mathcal{C}$. Hence $X^{-1}(B) \in \mathcal{A}$ for all $B \in \mathcal{B}$, so that $X$ is Borel measurable.
Applications
Let $\Omega = \mathbb{R}$, i.e. $X : \mathbb{R} \to \mathbb{R}$. If $X$ is a continuous function, then $X$ is Borel measurable, because if $E$ is an open set (the open sets in $\mathbb{R}$ are a generating class), then $X^{-1}(E)$ is also open and hence in $\mathcal{B}$.
Let $X_n$, $n = 1, 2, \ldots$ be a sequence of random variables; then $X_{\sup} = \sup_n X_n$ and $X_{\inf} = \inf_n X_n$ are also random variables, i.e. they are Borel measurable functions. Note that $X_{\sup}(\omega)$ may be equal to $\infty$ for some $\omega$ (and $X_{\inf}(\omega)$ equal to $-\infty$), i.e. they are $\overline{\mathcal{B}}$ measurable. To see this note that the intervals $(x, \infty]$ are a generating class for $\overline{\mathcal{B}}$, and that
$\{\omega \mid X_{\sup}(\omega) > x\} = \bigcup_n \{\omega \mid X_n(\omega) > x\}$
The last union is clearly in $\mathcal{A}$. For $X_{\inf}$ take the generating sets $[-\infty, x)$.
If $\lim X_n = X$ exists (it may be $\pm\infty$), then this is a random variable. To see this note that $\liminf X_n(\omega) = \sup_n \inf_{m \ge n} X_m(\omega)$ and $\limsup X_n(\omega) = \inf_n \sup_{m \ge n} X_m(\omega)$. By the previous result these are Borel measurable functions. We have
$\liminf X_n(\omega) \le X(\omega) = \lim X_n(\omega) \le \limsup X_n(\omega)$
Hence if the limit exists, it is equal to the liminf and limsup, which are Borel measurable.
Let $X$ and $Y$ be random variables; then $Z = X + Y$ is Borel measurable. Note that
$A = \{\omega \mid Z(\omega) > z\} = \bigcup_x \{\omega \mid X(\omega) = x\} \cap \{\omega \mid Y(\omega) > z - x\}$
which involves an uncountable union. Because the countable set of rational numbers is a dense subset of $\mathbb{R}$, for all $\omega$ with $X(\omega) > z - Y(\omega)$ there is a rational number $r$ such that $X(\omega) > r > z - Y(\omega)$. Hence
$A = \bigcup_r \{\omega \mid X(\omega) > r\} \cap \{\omega \mid Y(\omega) > z - r\}$
which is a countable union.
We denote the set of all Borel measurable functions $X : \Omega \to \mathbb{R}$ by $\mathcal{M}$ and the subset of Borel measurable nonnegative functions by $\mathcal{M}^+$. A special class of nonnegative Borel measurable functions are the simple functions, which can be written as
$X(\omega) = \sum_{i=1}^n \alpha_i I_{A_i}(\omega)$
with $I_A$ the indicator function of the event $A$, $A_i \in \mathcal{A}$, $i = 1, \ldots, n$ a partition of $\Omega$, and $\alpha_i \ge 0$, $i = 1, \ldots, n$ constants. Each function in $\mathcal{M}^+$ can be approximated by an increasing sequence of simple functions.

Theorem 1 For each $X$ in $\mathcal{M}^+$, the sequence of simple functions
$X_n(\omega) = \sum_{i=1}^{4^n} \frac{i-1}{2^n} I_{\{\frac{i-1}{2^n} \le X < \frac{i}{2^n}\}}(\omega) + 2^n I_{\{X \ge 2^n\}}(\omega)$
is such that $0 \le X_1(\omega) \le X_2(\omega) \le \ldots \le X_n(\omega)$ and $X_n(\omega) \uparrow X(\omega)$ for all $\omega \in \Omega$.
Proof: If $X(\omega) \ge 2^n$, then $X_n(\omega) = 2^n$. If $k 2^{-n} \le X(\omega) < (k+1) 2^{-n}$ for some $k = 0, 1, \ldots, 4^n - 1$, then $X_n(\omega) = k 2^{-n}$. See the figure for a graph; the claim follows.
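The dyadic approximation in Theorem 1 can be sketched numerically. A minimal illustration (the function name `simple_approx` is hypothetical): for a fixed value $x = X(\omega)$, the $n$-th approximation truncates at $2^n$ and then rounds down to the nearest multiple of $2^{-n}$:

```python
import math

def simple_approx(x: float, n: int) -> float:
    """n-th dyadic simple-function approximation of a nonnegative value x:
    truncate at 2**n, then round down to the nearest multiple of 2**-n."""
    if x >= 2**n:
        return float(2**n)
    return math.floor(x * 2**n) / 2**n

# The approximations increase to x as n grows, e.g. for x = pi:
print([simple_approx(math.pi, n) for n in (1, 2, 8)])
```

For each fixed $x$ the sequence is nondecreasing in $n$ and never exceeds $x$, exactly as the theorem states.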
Expectation and integration
Random experiment: toss a coin twice: $\Omega = \{HH, HT, TH, TT\}$ and these outcomes are equally likely.
Random variable: $X$ is the number of H in two tosses; $X$ takes the values 0 (TT), 1 (TH, HT), and 2 (HH).
You receive the uncertain return of $X$ dollars. How much do you want to pay for this gamble if you are risk neutral?
Most people make the following computation: Consider a large number of repetitions of the random experiment. The relative frequency of the values of $X$ is 1/4 ($X = 0$), 1/2 ($X = 1$), 1/4 ($X = 2$). On average (over the repetitions) $X$ is
$0 \cdot \frac{1}{4} + 1 \cdot \frac{1}{2} + 2 \cdot \frac{1}{4} = 1$
You are willing to pay $1 for the gamble. Call this the expected value of $X$, denoted by $E(X)$.
Direct computation: Note that $X$ is a nonnegative simple function for the partition $A_1 = \{TT\}$, $A_2 = \{HT, TH\}$, $A_3 = \{HH\}$ and
$X(\omega) = 0 \cdot I_{A_1}(\omega) + 1 \cdot I_{A_2}(\omega) + 2 \cdot I_{A_3}(\omega)$
$E(X) = 0 \cdot P(A_1) + 1 \cdot P(A_2) + 2 \cdot P(A_3) = 1$
In general, if $X(\omega) = \sum_{i=1}^n \alpha_i I_{A_i}(\omega)$ is a simple function, the expected value of $X$ is
$E(X) = \sum_{i=1}^n \alpha_i P(A_i)$
Note that this is a weighted average of the values of $X$, the weights being the probabilities of these values. This suggests the notation
$E(X) = \int_\Omega X(\omega)\,dP(\omega) = \int_\Omega X\,dP$
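The direct computation above is a one-liner. A minimal sketch using exact rational arithmetic (the variable names are illustrative):

```python
from itertools import product
from fractions import Fraction

# Two tosses of a fair coin; each outcome has probability 1/4.
omega = ["".join(t) for t in product("HT", repeat=2)]
P = {w: Fraction(1, 4) for w in omega}

# X = number of heads; E(X) = sum of value * probability over the outcomes,
# which groups into the partition {TT}, {HT, TH}, {HH}.
EX = sum(w.count("H") * P[w] for w in omega)
print(EX)  # 1
```

Grouping the outcomes by the value of $X$ reproduces the weighted average $0 \cdot \frac{1}{4} + 1 \cdot \frac{1}{2} + 2 \cdot \frac{1}{4} = 1$.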
How do you compute $E(X)$ for a general random variable $X$? We use Theorem 1. Let $X$ be a nonnegative random variable defined on the probability space $(\Omega, \mathcal{A}, P)$. By Theorem 1 there is an increasing sequence of simple functions $X_n$ that has limit $X$. This is why we need $X$ to be Borel measurable in order to define $E(X)$. Define, with $X_S$ ranging over the simple functions,
$E(X) = \int_\Omega X\,dP = \sup_{X_S} \{E(X_S) \mid X_S \le X\}$
Properties of $E(X)$:
(i) $E(I_A) = P(A)$ for $A \in \mathcal{A}$.
(ii) $E(0) = 0$, with 0 the null function that assigns 0 to all $\omega \in \Omega$.
(iii) For $\alpha, \beta \ge 0$ and nonnegative Borel measurable functions $X, Y$
$E(\alpha X + \beta Y) = \alpha E(X) + \beta E(Y)$
This is the linearity of the expectation.
Proof: Note that if $X_S, Y_S$ are simple functions, then so is $Z_S = X_S + Y_S$, and $E(Z_S) = E(X_S) + E(Y_S)$. Hence
$E(X) + E(Y) = \sup_{X_S}\{E(X_S) \mid X_S \le X\} + \sup_{Y_S}\{E(Y_S) \mid Y_S \le Y\} = \sup_{X_S, Y_S}\{E(X_S) + E(Y_S) \mid X_S \le X,\ Y_S \le Y\} = \sup_{X_S, Y_S}\{E(X_S + Y_S) \mid X_S \le X,\ Y_S \le Y\} \le \sup_{Z_S}\{E(Z_S) \mid Z_S \le X + Y\} = E(X + Y)$
Next we prove $E(X + Y) \le E(X) + E(Y)$. Let $Z_S$ be a simple function with $Z_S \le X + Y$ and let $\varepsilon > 0$. We construct simple functions $X_S \le X$ and $Y_S \le Y$ such that $(1 - \varepsilon) Z_S \le X_S + Y_S$. We do the construction for $Z_S = I_A$; the general case is analogous. Take $\varepsilon = \frac{1}{m}$ and denote $l_j = \frac{j}{m}$. Define
$X_S(\omega) = I_A(\omega) \Big( I_{\{X \ge 1\}}(\omega) + \sum_{j=1}^m l_{j-1} I_{\{l_{j-1} \le X < l_j\}}(\omega) \Big)$
$Y_S(\omega) = I_A(\omega) \sum_{j=1}^m (1 - l_j) I_{\{l_{j-1} \le X < l_j\}}(\omega)$
Obviously $X_S \le X$. Because $Z_S = I_A \le X + Y$, we have $X(\omega) + Y(\omega) \ge 1$ for all $\omega \in A$. Hence for $\omega \in A$ with $l_{j-1} \le X < l_j$, we have $Y > 1 - l_j = Y_S$. This holds for all $j$ and hence $Y_S \le Y$. Finally, because $1 - l_j + l_{j-1} = 1 - \varepsilon$, we have
$X_S(\omega) + Y_S(\omega) = I_A(\omega) I_{\{X \ge 1\}}(\omega) + (1 - \varepsilon) I_A(\omega) \sum_{j=1}^m I_{\{l_{j-1} \le X < l_j\}}(\omega) \ge (1 - \varepsilon) I_A(\omega)$
Hence for all $Z_S \le X + Y$
$E(X) + E(Y) \ge E(X_S) + E(Y_S) \ge (1 - \varepsilon) E(Z_S)$
and if we take the sup over all $Z_S \le X + Y$, we find $E(X) + E(Y) \ge (1 - \varepsilon) E(X + Y)$. Because $\varepsilon > 0$ is arbitrary, the inequality also holds as $\varepsilon \downarrow 0$.
From the definition it follows directly that for all $\alpha \ge 0$, $E(\alpha X) = \alpha E(X)$.
(iv) If $X(\omega) \le Y(\omega)$ for all $\omega \in \Omega$, then $E(X) \le E(Y)$.
Proof: $E(Y) = E(X) + E(Y - X) \ge E(X)$.
(v) If $X_n \uparrow X$ is an increasing sequence of nonnegative Borel measurable functions, then $E(X_n) \uparrow E(X)$. This is the monotone convergence property.
Proof: Let $X_S = \sum_{i=1}^m \alpha_i I_{A_i}$ be a simple function with $X_S \le X$. Define the simple functions
$X_{nS}(\omega) = \sum_{i=1}^m (1 - \varepsilon) \alpha_i I_{A_i}(\omega) I_{\{X_n \ge (1-\varepsilon)\alpha_i\}}(\omega)$
Then $X_{nS} \le X_n$ and
$E(X_n) \ge E(X_{nS}) = (1 - \varepsilon) \sum_{i=1}^m \alpha_i P(A_i \cap \{\omega \mid X_n(\omega) \ge (1-\varepsilon)\alpha_i\})$
Because $X_n \uparrow X \ge \alpha_i$ for $\omega \in A_i$, $A_i \cap \{\omega \mid X_n(\omega) \ge (1-\varepsilon)\alpha_i\} \uparrow A_i$ and hence $P(A_i \cap \{\omega \mid X_n(\omega) \ge (1-\varepsilon)\alpha_i\}) \uparrow P(A_i)$. Hence for all $X_S \le X$
$\lim E(X_n) \ge (1 - \varepsilon) E(X_S)$
Take the sup over all $X_S \le X$ and let $\varepsilon \downarrow 0$ to obtain
$\lim E(X_n) \ge E(X)$
Because $X_n \le X$, also
$\lim E(X_n) \le E(X)$
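Monotone convergence can be illustrated numerically. A sketch under assumptions not in the notes: on $([0,1], \mathcal{B}[0,1], \text{Lebesgue})$ take $X(\omega) = 1/\sqrt{\omega}$ and the increasing truncations $X_n = \min(X, n)$; the exact values are $E(X_n) = 2 - 1/n \uparrow 2 = E(X)$ (the function name `E_truncated` is hypothetical):

```python
import math

def E_truncated(n: int, grid: int = 200_000) -> float:
    """Midpoint Riemann sum of E[min(1/sqrt(w), n)] for w in (0, 1),
    i.e. the expectation of the n-th truncation under Lebesgue measure."""
    h = 1.0 / grid
    return sum(min(1.0 / math.sqrt((k + 0.5) * h), n) * h for k in range(grid))

# The approximated expectations increase toward E(X) = 2.
print([round(E_truncated(n), 3) for n in (2, 5, 10)])
```

The truncations are increasing in $n$ pointwise, so property (v) guarantees the expectations increase to the (finite) limit $E(X) = 2$ even though $X$ itself is unbounded.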
Extension to all random variables
Until now $E(X)$ is only defined for nonnegative random variables. For an arbitrary random variable $X$ we can always write $X(\omega) = X^+(\omega) - X^-(\omega)$ with $X^+(\omega) = \max\{X(\omega), 0\}$ and $X^-(\omega) = -\min\{X(\omega), 0\}$. Note that $X^+, X^-$ are nonnegative. We define
$E(X) = E(X^+) - E(X^-) = \int_\Omega X^+\,dP - \int_\Omega X^-\,dP$
This is well-defined unless $E(X^+) = E(X^-) = \infty$. To avoid this we can require $E(X^+) < \infty$, $E(X^-) < \infty$, or equivalently $E(|X|) < \infty$. A random variable $X$ with $E(|X|) < \infty$ is called integrable.
Application: Jensen's inequality. A function $f : \mathbb{R} \to \mathbb{R}$ is convex if for all $0 < \lambda < 1$, $f(\lambda x_1 + (1-\lambda) x_2) \le \lambda f(x_1) + (1-\lambda) f(x_2)$. If $f$ is convex, $E = \{x \mid f(x) \le t\}$ is a convex subset of $\mathbb{R}$ and hence an interval. Hence $f$ is Borel measurable.
Proof: If $x_1, x_2 \in E$, then $f(\lambda x_1 + (1-\lambda) x_2) \le \lambda f(x_1) + (1-\lambda) f(x_2) \le t$.
For all $x, x_0$, $f(x) \ge f(x_0) + \alpha(x - x_0)$ with $\alpha$ a constant that may depend on $x_0$.
Note
$f(x) \ge f(x_0) + \alpha(x - x_0) \ge -|f(x_0)| - |\alpha| (|x| + |x_0|)$
Hence $E(f(X)^-) < \infty$ if $X$ is integrable, and $E(f(X))$ is well-defined. Take $x_0 = E(X)$ to obtain
$E(f(X)) \ge f(E(X)) + \alpha(E(X) - E(X)) = f(E(X))$
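A quick numerical check of Jensen's inequality, using the two-toss example from earlier in the lecture (an illustration, not part of the notes): $f(x) = x^2$ is convex, so $E(X^2) \ge (E(X))^2$ should hold.

```python
# X = number of heads in two fair coin tosses.
xs = [0, 1, 2]
ps = [0.25, 0.5, 0.25]

EX = sum(x * p for x, p in zip(xs, ps))        # E(X) = 1.0
EfX = sum(x**2 * p for x, p in zip(xs, ps))    # E(X^2) = 1.5

# Jensen: E(f(X)) >= f(E(X)) for convex f(x) = x**2.
print(EfX, EX**2)  # 1.5 1.0
```

The gap $E(X^2) - (E(X))^2 = 0.5$ is of course the variance of $X$, which is nonnegative for exactly this reason.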
Lebesgue integrals
The expectation of $X$ is the integral of $X$ w.r.t. the probability measure $P$. The same definition applies if $P$ is replaced by a measure $\mu$, i.e. if the condition $\mu(\Omega) = 1$ is dropped and replaced by $\mu(\emptyset) = 0$ (the other conditions remain). A special case is Lebesgue measure, defined by $m([a, b]) = b - a$. This is the length of the interval. It implies $m((a, b)) = b - a$ (because the Lebesgue measure of a point is 0), and because the open intervals are a generating class the definition can be uniquely extended to all sets in the Borel field $\mathcal{B}$.
The integral of a Borel measurable $f : \mathbb{R} \to \mathbb{R}$ w.r.t. Lebesgue measure is denoted by $\int f(x)\,dx$. The notation is the same as for the (improper) Riemann integral of $f$. If $f$ is integrable, i.e. if the Lebesgue integral $\int |f(x)|\,dx < \infty$, then the Lebesgue integral is equal to the Riemann integral if the latter exists. If $\int |f(x)|\,dx = \infty$, the improper Riemann integral $\lim_{t \to \infty} \int_{-t}^t f(x)\,dx$ may exist, while the Lebesgue integral is not defined. Except for this special case you can compute Lebesgue integrals with all the calculus tricks. The theory of Lebesgue integration is easier than that of Riemann integration, in particular when order of integration and limit, or of integration and differentiation, has to be interchanged.
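The classic instance of that special case is $f(x) = \sin(x)/x$: its improper Riemann integral over $[0, \infty)$ exists (it equals $\pi/2$), but $\int |\sin(x)/x|\,dx = \infty$, so it is not Lebesgue integrable. A numerical sketch (the helper names `sinc` and `riemann` are illustrative):

```python
import math

def sinc(x: float) -> float:
    return 1.0 if x == 0.0 else math.sin(x) / x

def riemann(f, a: float, b: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the Riemann integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

# The signed integral over [0, t] settles near pi/2 as t grows, while the
# integral of |sin(x)/x| keeps growing (roughly like (2/pi) * log t).
signed = riemann(sinc, 0.0, 50 * math.pi)
absval = riemann(lambda x: abs(sinc(x)), 0.0, 50 * math.pi)
print(round(signed, 2), round(absval, 2))
```

The cancellation between positive and negative humps is what the improper Riemann limit exploits; Lebesgue integration, which treats $f^+$ and $f^-$ separately, has nothing to cancel.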
Integration and limits
Often we have a sequence of random variables $X_n$, $n = 1, 2, \ldots$ and we need to know $\lim E(X_n) = \lim \int X_n\,dP$. Can we interchange limit and integral? We may also want to take the derivative w.r.t. $t$ of $E(f(X, t)) = \int f(x, t)\,dP$. Can we interchange differentiation and integration?
What can go wrong: Consider the probability space $([0, 1], \mathcal{B}[0,1], P)$ with $\mathcal{B}[0,1]$ the $\sigma$-field obtained by the intersections of the sets in $\mathcal{B}$ with $[0, 1]$ and $P((a, b)) = b - a$. Define the sequence $X_n(\omega) = n^2 I_{(0, \frac{1}{n})}(\omega)$. Then $X_n(\omega) \to 0$ for all $0 \le \omega \le 1$, but $E(X_n) = n$. Hence
$\lim E(X_n) \ne 0 = E(\lim X_n)$

Theorem 2 (Fatou's Lemma) Let $X_n$ be a sequence of nonnegative random variables (which need not converge); then
$E(\liminf X_n) \le \liminf E(X_n)$
Proof: Remember $\liminf X_n = \lim_n \inf_{m \ge n} X_m$. Define $Y_n = \inf_{m \ge n} X_m$. We have $Y_n \le X_n$ for all $n$. Moreover, $Y_n$ is an increasing sequence of nonnegative random variables, so by monotone convergence $E(\liminf X_n) = \lim E(Y_n)$. Finally, because $E(X_n) \ge E(Y_n)$, we have $\lim E(Y_n) \le \liminf E(X_n)$.
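The "what can go wrong" example is easy to tabulate. A minimal sketch (function names are illustrative):

```python
# On ([0,1], Lebesgue): X_n = n^2 on the interval (0, 1/n), and 0 elsewhere.
def X(n: int, w: float) -> float:
    return float(n**2) if 0.0 < w < 1.0 / n else 0.0

def E(n: int) -> float:
    # E(X_n) = n^2 * length of (0, 1/n) = n, computed exactly.
    return n**2 * (1.0 / n)

# For each fixed w the values X_n(w) are eventually 0 (pointwise limit 0),
# yet E(X_n) = n diverges, so lim E(X_n) != E(lim X_n).
print([X(n, 0.3) for n in (1, 2, 10)])  # [1.0, 4.0, 0.0]
print([E(n) for n in (1, 2, 10)])       # [1.0, 2.0, 10.0]
```

The mass $n \cdot 1 = n^2 \cdot \frac{1}{n}$ escapes into an ever-taller spike on an ever-smaller interval, which is exactly the behavior Fatou's lemma and dominated convergence rule out.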
Theorem 3 (Dominated convergence) Let $X_n$ be a sequence of integrable random variables and let the limit $\lim X_n(\omega) = X(\omega)$ exist for all $\omega \in \Omega$. If there is a nonnegative integrable random variable $Y$ such that $|X_n(\omega)| \le Y(\omega)$ for all $\omega \in \Omega$ and all $n$, then $X$ is integrable and $\lim E(X_n) = E(X)$.
Proof: $|X| \le Y$ and hence $X$ is integrable. Consider the sequences $Y + X_n$ and $Y - X_n$, which are both nonnegative and integrable. By Fatou's lemma
$E(Y + X) = E(\liminf (Y + X_n)) \le E(Y) + \liminf E(X_n)$
$E(Y - X) = E(\liminf (Y - X_n)) \le E(Y) - \limsup E(X_n)$
because $\liminf (-X_n) = -\limsup X_n$. Cancel $E(Y)$ to obtain
$\limsup E(X_n) \le E(X) \le \liminf E(X_n)$
Application: Let $f(X, t)$ be an integrable random variable for $-\delta < t < \delta$ with $\delta > 0$, and let $f(x, t)$ be differentiable in $t$ on that interval for all $x$. Consider the partial derivative with respect to $t$ and assume for all $x$ and $-\delta < t < \delta$
$\left| \frac{\partial f(x, t)}{\partial t} \right| \le M(x)$
with $M(X)$ an integrable random variable. Hence by the mean value theorem
$\left| \frac{f(x, t) - f(x, 0)}{t} \right| = \left| \frac{\partial f(x, \bar{t}(x))}{\partial t} \right| \le M(x)$
with $\bar{t}(x) = \lambda(x) t$ for some $0 \le \lambda(x) \le 1$. Define the sequence of random variables
$\frac{f(X, t_n) - f(X, 0)}{t_n}$
with $t_n \to 0$. We have
$\lim E\left( \frac{f(X, t_n) - f(X, 0)}{t_n} \right) = \lim \frac{E(f(X, t_n)) - E(f(X, 0))}{t_n}$
By dominated convergence we can interchange the limit and the expectation (integration), so that
$E\left( \frac{\partial}{\partial t} f(X, t) \right) = \frac{\partial}{\partial t} E(f(X, t))$
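The interchange can be verified numerically in a toy case. A sketch under assumptions not in the notes: take $X$ uniform on $\{1, 2, 3\}$ and $f(x, t) = e^{tx}$, so that at $t = 0$ both sides equal $E(X) = 2$ (the dominating $M(x)$ exists on any bounded $t$-interval):

```python
import math

# X uniform on {1, 2, 3}; f(x, t) = exp(t * x).
xs = [1, 2, 3]

def Ef(t: float) -> float:
    """E(f(X, t)) = average of exp(t * x) over the three values."""
    return sum(math.exp(t * x) for x in xs) / 3

def E_dfdt(t: float) -> float:
    """E(df/dt (X, t)) = average of x * exp(t * x)."""
    return sum(x * math.exp(t * x) for x in xs) / 3

h = 1e-6
central = (Ef(h) - Ef(-h)) / (2 * h)   # d/dt E(f(X, t)) at t = 0, numerically
print(central, E_dfdt(0.0))            # both are (approximately) E(X) = 2
```

Readers may recognize $E(e^{tX})$ as the moment generating function of $X$; differentiating it at $t = 0$ under the expectation is precisely this application.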
Sets of measure 0
In integrals/expectations, sets $E \in \mathcal{A}$ with $P(E) = 0$ can be neglected.
Theorem 4 If the random variables $X$ and $Y$ are such that $E = \{\omega \mid X(\omega) \ne Y(\omega)\}$ has $P(E) = 0$, then $E(X) = E(Y)$.
Proof: If $n$ is sufficiently large, then $X(\omega) \le Y(\omega) + n \cdot I_{\{X \ne Y\}}(\omega)$. Because the sequence on the rhs is increasing, we have by monotone convergence
$E(X) \le E\left( \lim (Y + n \cdot I_{\{X \ne Y\}}) \right) = \lim E(Y + n \cdot I_{\{X \ne Y\}}) = E(Y)$
since $E(Y + n \cdot I_{\{X \ne Y\}}) = E(Y) + n P(E) = E(Y)$. Interchange $X$ and $Y$ to obtain $E(Y) \le E(X)$.