Chp 4. Expectation and Variance

1 Expectation

In this chapter we introduce two quantities that directly reflect the properties of a random variable or vector: the Expectation and the Variance.

Definition 1.1 Assume a random variable X has probability distribution p_j = P(X = x_j), j = 0, 1, .... If \sum_j |x_j| p_j < \infty, then

    E(X) = \sum_j x_j p_j    (1)

is the Expectation of the random variable X.

Definition 1.2 Assume the density function of a continuous random variable X is f(x). If \int_{-\infty}^{\infty} |x| f(x) dx < \infty (X is absolutely integrable), then

    E(X) = \int_{-\infty}^{\infty} x f(x) dx    (2)

is the Expectation of the random variable X.

Remark 1.1
1. The expectation is the weighted mean of the values that the random variable may take.
2. We often write the expectation of X as EX instead of E(X), and \mu is often used for EX.
3. The condition \sum_j |x_j| p_j < \infty ensures that \sum_j x_j p_j is well defined: absolute convergence makes the sum finite and independent of the order of the terms. However, if every x_j \ge 0, then \sum_j x_j p_j is always meaningful, possibly equal to +\infty. In general, we say that EX exists whenever EX = EX^+ - EX^- makes sense, i.e. EX^+ < \infty or EX^- < \infty, where x^+ = \max\{x, 0\} is the positive part and x^- = \max\{-x, 0\} is the negative part.

The expectations of several distributions and of the indicator function:

1. Bernoulli distribution B(1, p): EX = p.
Since P(X = 1) = p and P(X = 0) = 1 - p,

    EX = 1 \cdot P(X = 1) + 0 \cdot P(X = 0) = p.

2. Binomial distribution B(n, p): EX = np.
Since p_j = P(X = j) = C_n^j p^j (1-p)^{n-j}, 0 \le j \le n, we have

    EX = \sum_{j=0}^{n} j C_n^j p^j (1-p)^{n-j}
       = np \sum_{j=1}^{n} C_{n-1}^{j-1} p^{j-1} (1-p)^{n-j}    (j C_n^j = n C_{n-1}^{j-1})
       = np \sum_{k=0}^{n-1} C_{n-1}^k p^k (1-p)^{n-1-k}    (k = j - 1)
       = np (p + 1 - p)^{n-1} = np.

3. Poisson distribution P(\lambda): EX = \lambda.
Since P(X = k) = \frac{\lambda^k}{k!} e^{-\lambda}, k = 0, 1, ..., we have

    EX = \sum_{k=0}^{\infty} k \frac{\lambda^k}{k!} e^{-\lambda} = \lambda \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} e^{-\lambda} = \lambda.

4. Geometric distribution: EX = \frac{1}{p}.
Since

    P(X = j) = p(1-p)^{j-1}, \quad j = 1, 2, ...,

with q = 1 - p we have

    EX = \sum_{j=1}^{\infty} j p q^{j-1} = p \sum_{j=1}^{\infty} j q^{j-1} = p \cdot \frac{1}{(1-q)^2} = \frac{1}{p}.

5. Uniform distribution U(a, b): EX = \frac{a+b}{2}.

    EX = \int_{-\infty}^{\infty} x f(x) dx = \int_a^b x \frac{1}{b-a} dx = \frac{a+b}{2}.

6. Exponential distribution \varepsilon(\lambda): EX = \frac{1}{\lambda}.

    EX = \int_{-\infty}^{\infty} x f(x) dx = \int_0^{\infty} x \lambda e^{-\lambda x} dx = \frac{1}{\lambda} \int_0^{\infty} t e^{-t} dt = \frac{1}{\lambda}.    (t = \lambda x)

7. Normal distribution N(\mu, \sigma^2): EX = \mu.

    EX = \int_{-\infty}^{\infty} x f(x) dx = \int_{-\infty}^{\infty} \mu f(x) dx + \int_{-\infty}^{\infty} (x - \mu) f(x) dx
       = \mu + \frac{1}{\sqrt{2\pi}\sigma} \int_{-\infty}^{\infty} (x - \mu) e^{-\frac{(x-\mu)^2}{2\sigma^2}} dx
       = \mu + \frac{1}{\sqrt{2\pi}\sigma} \int_{-\infty}^{\infty} t e^{-\frac{t^2}{2\sigma^2}} dt
       = \mu,

where the last equality holds because t e^{-t^2/(2\sigma^2)} is an odd function of t, so the integral vanishes.

8. Gamma distribution \Gamma(\alpha, \beta): EX = \frac{\alpha}{\beta}.

    EX = \int_0^{\infty} x f(x) dx = \int_0^{\infty} x \frac{\beta^{\alpha} x^{\alpha-1}}{\Gamma(\alpha)} e^{-\beta x} dx
       = \frac{1}{\Gamma(\alpha)\beta} \int_0^{\infty} t^{\alpha} e^{-t} dt    (t = \beta x)
       = \frac{\Gamma(\alpha + 1)}{\Gamma(\alpha)\beta} = \frac{\alpha}{\beta},

where the last equality holds because \Gamma(\alpha + 1) = \alpha \Gamma(\alpha).

9. If EX exists and the density f(x) of X satisfies f(\mu + x) = f(\mu - x), then EX = \mu.
Answer:

    EX = \int_{-\infty}^{\infty} x f(x) dx = \int_{-\infty}^{\infty} \mu f(x) dx + \int_{-\infty}^{\infty} (x - \mu) f(x) dx = \mu + \int_{-\infty}^{\infty} t f(t + \mu) dt = \mu,

since t f(t + \mu) = -(-t) f(-t + \mu), i.e. the integrand is an odd function of t.

10. (Important) Assume A is an event and I_A is the indicator function of A. Then EI_A = P(I_A = 1) = P(A).
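The discrete means derived above can be sanity-checked by summing each pmf directly. The sketch below is illustrative (the parameter values n = 10, p = 0.3, \lambda = 4, p = 0.25 are my own choices, not from the notes); the infinite sums are truncated where the tail is negligible.

```python
from math import comb, exp

# Check the closed-form means by summing each pmf term by term.
def binomial_mean(n, p):
    return sum(j * comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(n + 1))

def poisson_mean(lam, terms=200):
    total, pmf = 0.0, exp(-lam)   # pmf = P(X = 0)
    for k in range(1, terms):
        pmf *= lam / k            # P(X = k) from P(X = k - 1), avoids overflow
        total += k * pmf
    return total

def geometric_mean(p, terms=10_000):
    return sum(j * p * (1 - p) ** (j - 1) for j in range(1, terms))

print(binomial_mean(10, 0.3))  # np = 3.0
print(poisson_mean(4.0))       # lambda = 4.0
print(geometric_mean(0.25))    # 1/p = 4.0
```

The Poisson pmf is built recursively from P(X = k-1) so that no huge factorials are formed.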

2 Properties of Expectation

2.1 Expectation of Functions of Random Vectors

Theorem 2.1 Assume X = (X_1, X_2, ..., X_n) is a random vector and x = (x_1, x_2, ..., x_n) \in R^n.
(1) If X has the joint density f(x) = f(x_1, x_2, ..., x_n), and the real function g(x) satisfies \int_{R^n} |g(x)| f(x) dx_1 dx_2 ... dx_n < \infty, then the expectation of Y = g(X) is

    EY = \int_{R^n} g(x) f(x) dx_1 dx_2 ... dx_n;    (3)

(2) If X is a discrete random vector with probability distribution

    p_{j_1, j_2, ..., j_n} = P(X = (x_1(j_1), x_2(j_2), ..., x_n(j_n))), \quad j_1, j_2, ..., j_n = 1, 2, ...,

and the real function h(x) satisfies \sum_{j_1, j_2, ..., j_n} |h(x_1(j_1), x_2(j_2), ..., x_n(j_n))| p_{j_1, j_2, ..., j_n} < \infty, then the expectation of Y = h(X) is

    EY = \sum_{j_1, j_2, ..., j_n} h(x_1(j_1), x_2(j_2), ..., x_n(j_n)) p_{j_1, j_2, ..., j_n}.    (4)

Remark 2.1 We can obtain the expectation of a function of a random vector directly, instead of first deriving the distribution of the function and then computing its expectation. Now we give an important example.

Example 2.1 Assume X ~ N(0, 1) and \alpha is a fixed number; calculate E|X|^\alpha.
Answer:

    E|X|^\alpha = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} |x|^\alpha e^{-x^2/2} dx
               = \frac{2}{\sqrt{2\pi}} \int_0^{\infty} x^\alpha e^{-x^2/2} dx
               = \frac{(\sqrt{2})^\alpha}{\sqrt{\pi}} \int_0^{\infty} t^{\alpha/2 - 1/2} e^{-t} dt.    (x = \sqrt{2t}, dx = 2^{-1/2} t^{-1/2} dt)

Therefore, when \alpha > -1,

    E|X|^\alpha = \frac{(\sqrt{2})^\alpha}{\sqrt{\pi}} \Gamma\left(\frac{1+\alpha}{2}\right),

since \Gamma(\alpha) = \int_0^{\infty} t^{\alpha-1} e^{-t} dt. When \alpha \le -1, E|X|^\alpha = \infty. In particular, when \alpha = 2,

    EX^2 = \frac{2}{\sqrt{\pi}} \Gamma\left(\frac{3}{2}\right) = \frac{2}{\sqrt{\pi}} \cdot \frac{1}{2}\sqrt{\pi} = 1,

and when \alpha = 1, since \Gamma(\frac{1}{2}) = \sqrt{\pi} and \Gamma(1) = 1,

    E|X| = \frac{\sqrt{2}}{\sqrt{\pi}} \Gamma(1) = \sqrt{\frac{2}{\pi}}.

We can see that the main techniques in calculating expectations are integration and change of variables.

2.2 Properties of Expectation

Theorem 2.2 Assume E|X_j| < \infty (1 \le j \le n) and c_0, c_1, ..., c_n are fixed numbers. Then:
(1) The expectation of Y = c_0 + c_1 X_1 + c_2 X_2 + ... + c_n X_n exists, and

    EY = c_0 + c_1 EX_1 + c_2 EX_2 + ... + c_n EX_n;

in particular, E(X_1 + X_2 + ... + X_n) = EX_1 + EX_2 + ... + EX_n.
Note: X_1, X_2, ..., X_n do not need to be independent.
(2) If {X_j} are independent, then the expectation of Z = X_1 X_2 ... X_n exists and

    EZ = EX_1 \cdot EX_2 \cdot ... \cdot EX_n.

Note: here X_1, X_2, ..., X_n do need to be independent.
(3) If X_1 \le X_2 a.s. (or Y \ge 0 a.s.), then EX_1 \le EX_2 (or EY \ge 0).

Proof: We only prove the theorem when X = (X_1, X_2, ..., X_n) has joint density f(x).
(1)

    E|Y| = \int_{R^n} \left| c_0 + \sum_{j=1}^n c_j x_j \right| f(x) dx_1 dx_2 ... dx_n \le |c_0| + \sum_{j=1}^n |c_j| E|X_j| < \infty.

And

    E(Y) = \int_{R^n} \left( c_0 + \sum_{j=1}^n c_j x_j \right) f(x) dx_1 dx_2 ... dx_n
         = c_0 + \sum_{j=1}^n c_j \int_{R^n} x_j f(x) dx_1 dx_2 ... dx_n
         = c_0 + \sum_{j=1}^n c_j EX_j,

where \int_{R^n} x_j f(x) dx_1 ... dx_n = EX_j because integrating out the other coordinates leaves the marginal density of X_j.

(2)

    E|Z| = \int_{R^n} |x_1 x_2 ... x_n| f(x) dx_1 dx_2 ... dx_n
         = \left\{ \int |x_1| f_1(x_1) dx_1 \right\} \left\{ \int |x_2| f_2(x_2) dx_2 \right\} ... \left\{ \int |x_n| f_n(x_n) dx_n \right\}
         = E|X_1| \cdot E|X_2| \cdot ... \cdot E|X_n| < \infty,

since independence factorizes the joint density: f(x) = f_1(x_1) f_2(x_2) ... f_n(x_n). Similarly, we get EZ = EX_1 EX_2 ... EX_n.
(3) It is obvious that if Y(\omega) \ge 0 for every \omega \in \Omega, then EY \ge 0. Now consider the case Y \ge 0 a.s. If EY < 0, then there must exist a fixed number a < 0 such that P(Y < a) \ge c > 0, which contradicts P(Y \ge 0) = 1.

It is important to connect the vanishing of E|X| with X = 0 a.s.:

Lemma 2.1 If E|X| = 0, then P(X = 0) = 1.

Proof: Let I[n|X| > 1] denote the indicator function of the event {n|X| > 1}. Then we always have I[n|X| > 1] \le n|X|. Therefore

    P(|X| > 1/n) = P(n|X| > 1) = E(I[n|X| > 1]) \le n E(|X|) = 0,

and by the continuity of probability,

    P(|X| > 0) = P\left( \bigcup_{n=1}^{\infty} \{ |X| > 1/n \} \right) = \lim_{n \to \infty} P(|X| > 1/n) = 0.

So P(X = 0) = 1 - P(|X| > 0) = 1.
Conversely, if P(X = 0) = 1, then E|X| = 0. Indeed, if E|X| = a > 0, then there must exist a fixed b > 0 such that P(|X| > b) \ge c > 0, and then P(X = 0) \le 1 - P(|X| > b) < 1, a contradiction.

The technique of using indicator functions in the above proof is very useful.

3 Variance of a Random Variable

3.1 Definition of Variance

Definition 3.1 If the expectation \mu = EX of a random variable X is finite, then we call E(X - \mu)^2 the Variance of X, denote it var(X), and call \sigma_X = \sqrt{var(X)} the Standard Deviation.
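Theorem 2.2 above can be illustrated by simulation; the sketch below (with my own choice of uniform samples) shows that linearity of expectation holds even for fully dependent variables, while the product rule E(X_1 X_2) = EX_1 EX_2 genuinely requires independence.

```python
import random

# Theorem 2.2 in simulation: linearity holds without independence;
# the product rule does not.
random.seed(1)
N = 100_000
xs = [random.uniform(0, 1) for _ in range(N)]
ys = [random.uniform(0, 1) for _ in range(N)]   # independent of xs
zs = [1 - x for x in xs]                        # fully dependent on xs

def mean(a):
    return sum(a) / len(a)

sum_dep = mean([x + z for x, z in zip(xs, zs)])     # E(X + Z) = EX + EZ = 1
prod_indep = mean([x * y for x, y in zip(xs, ys)])  # ~ EX * EY = 0.25
prod_dep = mean([x * z for x, z in zip(xs, zs)])    # ~ E[X(1 - X)] = 1/6, not 0.25
print(sum_dep, prod_indep, prod_dep)
```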

We know that

    E(X - EX)^2 = E(X^2 - 2X \cdot EX + (EX)^2) = EX^2 - 2(EX)^2 + (EX)^2 = EX^2 - (EX)^2.

This identity is used very often when calculating variances. Now we give the variances of several distributions:

1. Bernoulli distribution B(1, p): var(X) = p(1-p).
Since P(X = 1) = p and P(X = 0) = 1 - p,

    var(X) = EX^2 - (EX)^2 = p - p^2 = p(1-p).

2. Binomial distribution B(n, p): var(X) = np(1-p).
Denote q = 1 - p. Since EX = np and p_j = P(X = j) = C_n^j p^j (1-p)^{n-j}, 0 \le j \le n, we have

    EX^2 = E(X(X-1)) + EX
         = \sum_j C_n^j j(j-1) p^j q^{n-j} + np
         = p^2 \frac{d^2}{dx^2} \left[ \sum_j C_n^j x^j q^{n-j} \right]_{x=p} + np
         = p^2 \left[ \frac{d^2}{dx^2} (x + q)^n \right]_{x=p} + np
         = n(n-1)p^2 + np.

Therefore,

    var(X) = EX^2 - (EX)^2 = n^2 p^2 - np^2 + np - n^2 p^2 = np(1-p).

3. Poisson distribution P(\lambda): var(X) = \lambda.
Since EX = \lambda and P(X = k) = \frac{\lambda^k}{k!} e^{-\lambda}, k = 0, 1, ..., we have

    EX^2 = E(X(X-1)) + EX = \sum_k k(k-1) \frac{\lambda^k}{k!} e^{-\lambda} + \lambda
         = \lambda^2 \sum_{k=2}^{\infty} \frac{\lambda^{k-2}}{(k-2)!} e^{-\lambda} + \lambda
         = \lambda^2 + \lambda.

So, var(x) EX 2 (EX) 2 λ 2 + λ λ 2 λ. 4. Geometric Distribution: var(x) 1 p p 2. The calculating of EX 2 is similarly as Binomial distribution. 5. Uniform Distribution U(a, b): var(x) (b a)2 12. EX 2 x 2 f(x)dx b 6. Exponential Distribution ε(λ): var(x) 1 λ 2. a x 2 1 b a b3 a 3 3(b a). EX 2 x 2 λe λx dx 1 λ 2 t 2 e t dt (x t/λ) 1 λ 2 Γ(3) 2 λ 2 7. Normal Distribution N(µ, σ 2 ): var(x) σ 2. From above subsection, we know that EX 2 1 if X N(, 1). var(x) (x µ) 2 1 σ 2 1 2π σ 2 We have calculate it in last section. 8. Gamma Distribution Γ(α, β): var(x) α β 2. EX 2 1 Γ(α)β 2 Γ(α + 2) Γ(α)β x 2 f(x)dx exp ( 2πσ ) (x µ)2 2σ 2 dx ( ) t 2 exp t2 dt t x µ 2 σ t α+1 e t dt α(α + 1), β x 2 βα x α 1 Γ(α) e βx dx (t βx) where the last equation is because Γ(α + 2) (α + 1)αΓ(α). 8

3.2 Properties of Variance

Theorem 3.1 Assume EX = \mu, var(X) < \infty, \mu_j = EX_j, var(X_j) < \infty (1 \le j \le n). Then:
(1) var(a + bX) = b^2 var(X);
(2) var(X) = E(X - \mu)^2 < E(X - c)^2 if c \ne \mu;
(3) var(X) = 0 \iff X = \mu a.s.;
(4) var(\sum_{j=1}^n X_j) = \sum_{i=1}^n \sum_{j=1}^n [EX_i X_j - \mu_i \mu_j];
(5) If X_1, X_2, ..., X_n are independent, then var(\sum_{j=1}^n X_j) = \sum_{j=1}^n var(X_j).

Proof:
(1)

    var(a + bX) = E(a + bX - (a + b\mu))^2 = b^2 E(X - \mu)^2 = b^2 var(X).

(2) If c \ne \mu, then

    E(X - c)^2 = E(X - \mu + \mu - c)^2 = E(X - \mu)^2 + (\mu - c)^2 + 2(\mu - c)E(X - \mu) = E(X - \mu)^2 + (\mu - c)^2 > E(X - \mu)^2,

since E(X - \mu) = 0 and (\mu - c)^2 > 0.
(3) If var(X) = 0, then E(X - \mu)^2 = 0, and since (E|X - \mu|)^2 \le E(X - \mu)^2 = 0 we get E|X - \mu| = 0. Then by Lemma 2.1 we have P(X = \mu) = 1.
Conversely, we need to prove that if X = \mu a.s., then E(X - \mu)^2 = 0. In fact, if Y = (X - \mu)^2 is a simple random variable, say Y = \sum_k y_k I_{A_k}(\omega), then P(A_k) = 0 for every k with y_k \ne 0 by hypothesis, and therefore EY = 0. In general, if S is a simple random variable with 0 \le S \le Y, then S = 0 a.s., consequently ES = 0 and

    EY = \sup_{S: 0 \le S \le Y} ES = 0.

(4) Omitted.

(5)

    var\left( \sum_{j=1}^n X_j \right) = \sum_{i=1}^n \sum_{j=1}^n [EX_i X_j - \mu_i \mu_j] = \sum_{i=1}^n [EX_i X_i - \mu_i \mu_i] = \sum_{i=1}^n var(X_i),

where the second equality holds because EX_i X_j = EX_i EX_j = \mu_i \mu_j for i \ne j, by independence.

3.3 Three Important Inequalities

Theorem 3.2 (Markov Inequality) For any random variable X and any fixed number \epsilon > 0, we have

    P(|X| \ge \epsilon) \le \frac{1}{\epsilon^\alpha} E|X|^\alpha, \quad \alpha > 0.    (5)

Applying this with \alpha = 2 to X - EX gives the Chebyshev Inequality:

    P(|X - EX| \ge \epsilon) \le \frac{1}{\epsilon^2} E|X - EX|^2 = \frac{1}{\epsilon^2} var(X).    (6)

Proof: Since I[|X| \ge \epsilon] \le |X|^\alpha / \epsilon^\alpha, by Theorem 2.2,

    P(|X| \ge \epsilon) = E(I[|X| \ge \epsilon]) \le E(|X|^\alpha / \epsilon^\alpha) = \frac{1}{\epsilon^\alpha} E|X|^\alpha.

Theorem 3.3 (Cauchy–Schwarz Inequality) Assume EX^2 < \infty and EY^2 < \infty. Then

    |EXY| \le \sqrt{EX^2 \cdot EY^2},    (7)

with equality if and only if there exist two fixed numbers a and b (ab \ne 0) such that aX + bY = 0 a.s.

Proof: For any a, b satisfying ab \ne 0,

    E(aX + bY)^2 = a^2 EX^2 + b^2 EY^2 + 2ab \cdot EXY = (a, b) \Sigma (a, b)^T \ge 0,

where

    \Sigma = \begin{pmatrix} EX^2 & EXY \\ EXY & EY^2 \end{pmatrix},

therefore det(\Sigma) = EX^2 EY^2 - (EXY)^2 \ge 0. In addition, (EXY)^2 = EX^2 EY^2 if and only if E(aX + bY)^2 = 0 for some such a, b, and E(aX + bY)^2 = 0 if and only if aX + bY = 0 a.s. by Lemma 2.1.
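Chebyshev's inequality (6) can be checked empirically. The sketch below (my own example with X ~ U(0, 1), so EX = 0.5 and var(X) = 1/12) compares the observed tail probability with the bound; the bound holds but is quite loose here.

```python
import random

# Empirical check of Chebyshev: P(|X - EX| >= eps) <= var(X) / eps^2.
random.seed(2)
N = 100_000
xs = [random.random() for _ in range(N)]
var = 1 / 12

for eps in (0.2, 0.3, 0.4):
    tail = sum(abs(x - 0.5) >= eps for x in xs) / N
    bound = var / eps ** 2
    print(eps, tail, bound)
    assert tail <= bound  # the bound always holds
```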

Theorem 3.4 (Jensen Inequality) Suppose g(x) is a convex function, that is, for each x_0 there exists a number \lambda(x_0) such that

    g(x) \ge g(x_0) + (x - x_0)\lambda(x_0)    (8)

for all x \in R. Then

    E(g(\xi)) \ge g(E\xi),    (9)

provided both expectations exist, i.e. E|\xi| < \infty and E|g(\xi)| < \infty.

Proof: Putting x = \xi and x_0 = E\xi, we find from (8) that

    g(\xi) \ge g(E\xi) + (\xi - E\xi)\lambda(x_0).

Taking expectations on both sides of the above inequality, we obtain (9). Here \lambda(x_0) can be thought of as g'(x_0).

4 Covariance and Correlation

4.1 Covariance and Correlation

Definition 4.1 Assume \mu_X = EX and \mu_Y = EY exist. If E|(X - \mu_X)(Y - \mu_Y)| < \infty, we call

    cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]

the Covariance of the random variables X, Y. If cov(X, Y) = 0, we say X, Y are Uncorrelated.

Definition 4.2 If 0 < \sigma_X \sigma_Y < \infty, then we call

    \rho_{XY} = \frac{cov(X, Y)}{\sqrt{var(X) var(Y)}} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}

the Correlation of X and Y.

Usually we use the equation

    cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E(XY - \mu_X Y - \mu_Y X + \mu_X \mu_Y) = EXY - EXEY

to calculate the covariance of X, Y. We can also use the standardizations of X, Y to calculate the correlation:

    \rho_{XY} = E\left[ \left( \frac{X - \mu_X}{\sigma_X} \right) \left( \frac{Y - \mu_Y}{\sigma_Y} \right) \right],

where \frac{X - \mu_X}{\sigma_X} is the standardization of X, with mean 0 and variance 1.
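The two ways of computing the covariance above agree, and the correlation follows from either. A small simulation sketch (the linear model Y = 2X + noise is my own illustrative choice; its exact values are cov(X, Y) = 2 and \rho = 2/\sqrt{5}):

```python
import random

# cov(X, Y) via the definition and via the shortcut EXY - EX*EY.
random.seed(8)
N = 100_000
xs = [random.gauss(0, 1) for _ in range(N)]
ys = [2 * x + random.gauss(0, 1) for x in xs]

def mean(a):
    return sum(a) / len(a)

mx, my = mean(xs), mean(ys)
cov1 = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])  # definition
cov2 = mean([x * y for x, y in zip(xs, ys)]) - mx * my      # shortcut
sx = mean([(x - mx) ** 2 for x in xs]) ** 0.5
sy = mean([(y - my) ** 2 for y in ys]) ** 0.5
rho = cov1 / (sx * sy)
print(cov1, cov2, rho)  # cov ~ 2, rho ~ 2/sqrt(5) ~ 0.894
```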

Theorem 4.1 If \rho_{XY} is the correlation of X, Y, then:
(1) |\rho_{XY}| \le 1;
(2) |\rho_{XY}| = 1 iff there exist real numbers c, d such that P(Y = c + dX) = 1; we call X, Y Linearly Correlated if |\rho_{XY}| = 1.
(3) X, Y independent \Rightarrow X, Y uncorrelated. However, X, Y uncorrelated does not in general imply X, Y independent, except in special cases: when (X, Y) ~ N(\mu_1, \mu_2; \sigma_1^2, \sigma_2^2; \rho), X, Y are independent \iff X, Y are uncorrelated.

Proof:
(1)

    |\rho_{XY}| = \frac{|E(X - \mu_X)(Y - \mu_Y)|}{\sqrt{E(X - \mu_X)^2 \cdot E(Y - \mu_Y)^2}} \le 1

by Theorem 3.3.
(2) |\rho_{XY}| = 1 iff P(a(X - \mu_X) + b(Y - \mu_Y) = 0) = 1 for some a, b with ab \ne 0, by Theorem 3.3, which is equivalent to the existence of real numbers c and d such that P(Y = c + dX) = 1, with d = -a/b and c = (b\mu_Y + a\mu_X)/b.
(3) can be proven by

    EXY = \int\int xy f(x, y) dx dy = \int\int xy f_X(x) f_Y(y) dx dy = \left( \int x f_X(x) dx \right) \left( \int y f_Y(y) dy \right) = EXEY,

so that cov(X, Y) = EXY - EXEY = 0. See Example 4.2 on page 173 for the proof that X, Y independent \iff X, Y uncorrelated when (X, Y) ~ N(\mu_1, \mu_2; \sigma_1^2, \sigma_2^2; \rho).

Example 4.1 Assume (X, Y) is uniformly distributed on D = {(x, y) : x^2 + y^2 \le 1}. Then the density of X is

    f_X(x) = \int f(x, y) dy = \frac{1}{\pi} \int I_{\{x^2 + y^2 \le 1\}} dy = \frac{1}{\pi} \int I_{\{|y| \le \sqrt{1-x^2}\}} dy = \frac{2}{\pi} \sqrt{1 - x^2},

and similarly f_Y(y) = \frac{2}{\pi} \sqrt{1 - y^2}. Since f(x, y) \ne f_X(x) f_Y(y), X and Y are not independent. We know that EX = 0 and EY = 0, so

    cov(X, Y) = \int\int_{R^2} xy f(x, y) dx dy = \frac{1}{\pi} \int_{-1}^{1} x \left( \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} y \, dy \right) dx = 0.

Therefore X and Y are uncorrelated, but not independent.
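Example 4.1 can also be seen in simulation: sampled points that are uniform on the unit disk have sample covariance near 0, yet knowing that |X| is close to 1 visibly constrains Y. The sketch below uses rejection sampling from the enclosing square (my own sampling choice).

```python
import random

# (X, Y) uniform on the unit disk: uncorrelated but not independent.
random.seed(3)

def disk_point():
    while True:  # rejection sampling from the enclosing square
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1:
            return x, y

pts = [disk_point() for _ in range(100_000)]
n = len(pts)
mx = sum(x for x, _ in pts) / n
my = sum(y for _, y in pts) / n
cov = sum((x - mx) * (y - my) for x, y in pts) / n

# Dependence: given |X| > 0.9, |Y| cannot exceed sqrt(1 - 0.81) ~ 0.436.
band = [abs(y) for x, y in pts if abs(x) > 0.9]
print(cov, max(band))
```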

4.2 Covariance Matrix

Assume X = (X_1, X_2, ..., X_n) is a random vector and \mu_i = EX_i exists for every i. Then we say EX exists and

    \mu = EX = (EX_1, EX_2, ..., EX_n) = (\mu_1, \mu_2, ..., \mu_n).

If the expectation of the random variable X_{ij} exists for all 1 \le i \le m and 1 \le j \le n, then the expectation of the random matrix

    Y = \begin{pmatrix} X_{11} & X_{12} & \cdots & X_{1n} \\ X_{21} & X_{22} & \cdots & X_{2n} \\ \vdots & \vdots & & \vdots \\ X_{m1} & X_{m2} & \cdots & X_{mn} \end{pmatrix}

exists and is defined as

    EY = \begin{pmatrix} EX_{11} & EX_{12} & \cdots & EX_{1n} \\ EX_{21} & EX_{22} & \cdots & EX_{2n} \\ \vdots & \vdots & & \vdots \\ EX_{m1} & EX_{m2} & \cdots & EX_{mn} \end{pmatrix}.

As in the 1-dimensional case, we have the following identities for the random vector X = (X_1, X_2, ..., X_n) and the random matrix Y. For any real vector a = (a_1, a_2, ..., a_n), any matrix A of size k \times m, and any matrix B of size n \times h:

    E(aX^T) = a(EX)^T;  (EY)^T = E(Y^T);  E(AY) = A \cdot EY;  E(YB) = EY \cdot B;  E(AYB) = A \cdot EY \cdot B.

Proof:

    E(aX^T) = E\left( \sum_{j=1}^n a_j X_j \right) = \sum_{j=1}^n a_j EX_j = a(EX)^T;

    (EY)^T = (EX_{ij})^T = (EX_{ji}) = E(Y^T);

    E(AY) = \left( E \sum_{l=1}^m a_{il} X_{lj} \right) = \left( \sum_{l=1}^m a_{il} EX_{lj} \right) = A \cdot EY.

Similarly, E(YB) = E(Y)B, so that E(AYB) = A E(YB) = A \cdot EY \cdot B.

Definition 4.3 If the expectation \mu = EX of the random vector X = (X_1, X_2, ..., X_n) exists, and var(X_i) < \infty for every 1 \le i \le n, then we call

    \Sigma = E[(X - \mu)^T (X - \mu)] = (\sigma_{ij})_{n \times n}

the Covariance Matrix of X, where

    \sigma_{ij} = cov(X_i, X_j).

A covariance matrix has a very important property: non-negative definiteness. We have the following theorem.

Theorem 4.2 Assume \Sigma is the covariance matrix of X. Then:
A. \Sigma is non-negative definite and symmetric.
B. det(\Sigma) = 0 iff there exist a_1, a_2, ..., a_n, at least one of which is not equal to 0, such that

    \sum_{i=1}^n a_i (X_i - \mu_i) = 0 \quad a.s.,

with \mu_i = EX_i.

Proof: A. For any n-dimensional real vector a = (a_1, a_2, ..., a_n), we have

    a \Sigma a^T = \sum_{i=1}^n \sum_{j=1}^n a_i a_j \sigma_{ij}    (10)
               = \sum_{i=1}^n \sum_{j=1}^n a_i a_j E[(X_i - \mu_i)(X_j - \mu_j)]    (11)
               = E \sum_{i=1}^n \sum_{j=1}^n a_i a_j (X_i - \mu_i)(X_j - \mu_j)    (12)
               = E\left[ \sum_{i=1}^n a_i (X_i - \mu_i) \right]^2    (13)
               = var\left( \sum_{i=1}^n a_i (X_i - \mu_i) \right)    (14)
               \ge 0,    (15)

and symmetry holds since \sigma_{ij} = cov(X_i, X_j) = cov(X_j, X_i) = \sigma_{ji}. We then prove B by (3) in Theorem 3.1.
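Theorem 4.2.A can be observed on a sample covariance matrix. In the sketch below (the three-component data set is my own illustrative construction), the quadratic form a \Sigma a^T equals the sample variance of \sum_i a_i (X_i - \mu_i), hence is non-negative.

```python
import random

# Sample covariance matrix: symmetric, with non-negative quadratic form.
random.seed(4)
N = 50_000
x1 = [random.gauss(0, 1) for _ in range(N)]
x2 = [v + random.gauss(0, 0.5) for v in x1]   # correlated with x1
x3 = [random.gauss(0, 2) for _ in range(N)]   # independent of both
data = list(zip(x1, x2, x3))

means = [sum(col) / N for col in zip(*data)]
sigma = [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / N
          for j in range(3)] for i in range(3)]

def quad_form(a):
    return sum(a[i] * sigma[i][j] * a[j] for i in range(3) for j in range(3))

print(sigma[0][1], quad_form([1.0, -1.0, 0.5]))  # ~ var(X1) = 1; >= 0
```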

5 Conditional Expectation

Definition 5.1 Assume (X, Y) is a random vector. If

    E(|X| \mid Y = y) = \int |x| f_{X|Y}(x|y) dx < \infty,

then

    m(y) \overset{def}{=} E(X \mid Y = y) = \int x f_{X|Y}(x|y) dx,    (16)

where f_{X|Y}(x|y) = \frac{f(x, y)}{f_Y(y)}, and we call the random variable m(Y) the conditional expectation of X given Y, denoted E(X|Y). As when calculating g(x) = P(A \mid X = x), we can first calculate m(y) and then replace y by Y.

Example 5.1 Assume (X, Y) ~ N(\mu_1, \mu_2; \sigma_1^2, \sigma_2^2; \rho). Find E(X|Y) and E(Y|X).
Answer: We know that, conditional on X = x,

    Y ~ N(\mu_2 + (\rho\sigma_2/\sigma_1)(x - \mu_1), (1 - \rho^2)\sigma_2^2),

therefore

    E(Y \mid X = x) = \mu_2 + (\rho\sigma_2/\sigma_1)(x - \mu_1),

such that

    E(Y|X) = \mu_2 + (\rho\sigma_2/\sigma_1)(X - \mu_1).

Similarly,

    E(X|Y) = \mu_1 + (\rho\sigma_1/\sigma_2)(Y - \mu_2).

Conditional expectation has properties similar to those of expectation. We have the following theorem.

Theorem 5.1 Assume X, Y are random variables, g(x), h(y) are real functions, and E|g(X)| < \infty. In addition, assume E|X_i| < \infty for every 1 \le i \le n. Then:
(1) E(c_0 + \sum_{j=1}^n c_j X_j \mid Y) = c_0 + \sum_{j=1}^n c_j E(X_j|Y);
(2) E[h(Y)g(X) \mid Y] = h(Y) E[g(X)|Y];
(3) (Important) E(g(X)|Y) = Eg(X) if X, Y are independent;
(4) (Important) E[E(g(X)|Y)] = Eg(X).

Proof: (1) holds because E(\cdot \mid Y = y) is an expectation and has the usual properties of expectation.
(2) holds because

    E[h(Y)g(X) \mid Y = y] = E[h(y)g(X) \mid Y = y] = h(y) E[g(X) \mid Y = y].

(3) holds because

    E(g(X) \mid Y = y) = \int g(x) f_{X|Y}(x|y) dx = \int g(x) f_X(x) dx = Eg(X).

(4) We only prove the case when (X, Y) has a joint density function f(x, y). Let f_Y(y) be the marginal density of Y. We have

    E(g(X) \mid Y = y) = \int g(x) f_{X|Y}(x|y) dx = \int g(x) \frac{f(x, y)}{f_Y(y)} dx,

which is a function of y. Therefore,

    E[E(g(X)|Y)] = \int E(g(X) \mid Y = y) f_Y(y) dy = \int\int g(x) \frac{f(x, y)}{f_Y(y)} dx \, f_Y(y) dy
                 = \int\int g(x) f(x, y) dx dy = \int g(x) \left\{ \int f(x, y) dy \right\} dx = Eg(X).

Now we give several examples of calculating conditional expectations. The first one concerns the indicator function.

Example 5.2 Prove that E(P(A|X)) = P(A).
Proof: Since P(A \mid X = x) is the probability that the event A happens conditional on X = x, we have P(A \mid X = x) = E(I_A \mid X = x). Combined with P(A) = E(I_A), this gives

    E(P(A|X)) = E(E(I_A|X)) = EI_A = P(A).

Example 5.3 Assume X, Y are random variables and h(y) is a real function. If EX^2 < \infty and Eh^2(Y) < \infty, then

    E[(X - E(X|Y))h(Y)] = 0.    (17)

Proof: Since EX^2 < \infty and Eh^2(Y) < \infty, we have E|Xh(Y)| \le \sqrt{EX^2 \cdot Eh^2(Y)} < \infty. Then

    E[(X - E(X|Y))h(Y)] = E[Xh(Y)] - E[E(X|Y)h(Y)] = E[Xh(Y)] - E[E(Xh(Y)|Y)] = E[Xh(Y)] - E[Xh(Y)] = 0.
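Example 5.1, the tower property (4) of Theorem 5.1, and the orthogonality relation (17) can all be checked in one simulation. The sketch below samples a bivariate normal pair through the conditional distribution of Y given X stated in Example 5.1; the parameter values are my own illustrative choices.

```python
import random

# Bivariate normal: E(Y|X), tower property, and orthogonality (17).
random.seed(5)
mu1, mu2, s1, s2, rho = 1.0, -2.0, 1.5, 0.5, 0.8
N = 100_000

xs, ys, ms = [], [], []
for _ in range(N):
    x = random.gauss(mu1, s1)
    m = mu2 + rho * s2 / s1 * (x - mu1)               # m(x) = E(Y | X = x)
    y = random.gauss(m, (1 - rho ** 2) ** 0.5 * s2)   # draw Y | X = x
    xs.append(x); ys.append(y); ms.append(m)

tower = sum(ms) / N                                          # ~ EY = mu2
orth = sum((y - m) * x for x, y, m in zip(xs, ys, ms)) / N   # ~ 0 by (17)
print(tower, orth)
```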

Example 5.4 (Best Predictor) Assume EX^2 < \infty and m(Y) = E(X|Y). Then for any real function g(y), we have

    E[X - m(Y)]^2 \le E[X - g(Y)]^2.    (18)

Equality holds iff g(Y) = m(Y) a.s.

Proof: If Eg^2(Y) = \infty, then by the identity 2(a - b)^2 + 2a^2 - b^2 = 4a^2 - 4ab + b^2 = (2a - b)^2 \ge 0, applied with a = X and b = g(Y), we have

    g^2(Y) \le 2[X - g(Y)]^2 + 2X^2.

Taking expectations gives Eg^2(Y) \le 2E[X - g(Y)]^2 + 2EX^2, so E[X - g(Y)]^2 = \infty and (18) holds trivially.
Now we discuss the case Eg^2(Y) < \infty. We have [E(X \mid Y = y)]^2 \le E(X^2 \mid Y = y) by the Cauchy–Schwarz inequality (7), applied under the conditional distribution given Y = y with the second factor \equiv 1. Therefore,

    Em^2(Y) = E[E(X|Y)]^2 \le E[E(X^2|Y)] = EX^2 < \infty.

We have

    E[X - g(Y)]^2 = E[X - m(Y) + m(Y) - g(Y)]^2
                  = E[X - m(Y)]^2 + E[m(Y) - g(Y)]^2 + 2E[(X - m(Y))(m(Y) - g(Y))]
                  = E[X - m(Y)]^2 + E[m(Y) - g(Y)]^2 + 2E[(X - m(Y))h(Y)]    (h(Y) = m(Y) - g(Y))
                  = E[X - m(Y)]^2 + E[m(Y) - g(Y)]^2
                  \ge E[X - m(Y)]^2,

where the second-to-last equality follows from (17). Equality holds in the last inequality iff E[m(Y) - g(Y)]^2 = 0, which is equivalent to g(Y) = m(Y) a.s. We call m(Y) the Best Predictor of X.

Theorem 5.2 Assume P(A) > 0 and X is a non-negative random variable. Then

    E(X|A) = \frac{E(XI_A)}{P(A)}.    (19)

Proof: For a non-negative random variable,

    EX = \int_0^{\infty} x \, dF(x) = \int_0^{\infty} (1 - F(x)) dx = \int_0^{\infty} P(X > x) dx,

by integration by parts, using F(\infty) = 1. In addition, because E(X|A) is the expectation of X under the probability

distribution P(\cdot \mid A), we have

    E(X|A) = \int_0^{\infty} P(X > x \mid A) dx = \frac{1}{P(A)} \int_0^{\infty} P(\{X > x\} \cap A) dx
           = \frac{1}{P(A)} \int_0^{\infty} P(XI_A > x) dx = \frac{E(XI_A)}{P(A)},

using \{XI_A > x\} = \{X > x\} \cap A for x \ge 0, since XI_A = 0 if A does not happen.

Proposition 5.1 Assume P(A) > 0 and E(X|A) exists. Then E(X|A) = \frac{E(XI_A)}{P(A)}.

Proof: To apply the above theorem, we write X as X = X^+ - X^-, where

    X^+ = X if X \ge 0, and 0 if X < 0;    X^- = -X if X < 0, and 0 if X \ge 0.

Both X^+ and X^- are non-negative, so we have

    E(X|A) = E(X^+ - X^- \mid A) = E(X^+|A) - E(X^-|A) = \frac{E(X^+ I_A)}{P(A)} - \frac{E(X^- I_A)}{P(A)} = \frac{E((X^+ - X^-) I_A)}{P(A)} = \frac{E(XI_A)}{P(A)}.

Example 5.5 Assume X ~ \varepsilon(\lambda). Then for any a > 0 we have the memoryless property

    E(X - a \mid X > a) = EX.

Proof: The density of X is f(x) = \lambda e^{-\lambda x}, x > 0. By the above proposition,

    E(X \mid X > a) = \frac{E(X I[X > a])}{P(X > a)} = \frac{1}{e^{-\lambda a}} \int_a^{\infty} x \lambda e^{-\lambda x} dx
                    = \frac{1}{\lambda e^{-\lambda a}} \int_{\lambda a}^{\infty} t e^{-t} dt    (t = \lambda x)
                    = \frac{1}{\lambda e^{-\lambda a}} \left[ -t e^{-t} - e^{-t} \right]_{\lambda a}^{\infty}
                    = \frac{1}{\lambda e^{-\lambda a}} (\lambda a e^{-\lambda a} + e^{-\lambda a})
                    = \frac{1}{\lambda} + a.

Therefore,

    E(X - a \mid X > a) = \frac{1}{\lambda} + a - a = \frac{1}{\lambda} = EX = E(X \mid X > 0).

Note: this result follows from the memoryless property of the exponential distribution, P(X > s + t \mid X > t) = P(X > s); for other distributions the above result generally fails.
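Example 5.5 can be confirmed by simulation: among exponential samples exceeding a, the mean excess over a matches the unconditional mean 1/\lambda. The parameter values below (\lambda = 2, a = 1) are my own illustrative choices.

```python
import random

# Memorylessness: E(X - a | X > a) = EX = 1/lambda for X ~ Exponential(lambda).
random.seed(6)
lam, a = 2.0, 1.0
N = 200_000
xs = [random.expovariate(lam) for _ in range(N)]

excess = [x - a for x in xs if x > a]
excess_mean = sum(excess) / len(excess)
overall_mean = sum(xs) / N
print(excess_mean, overall_mean)  # both ~ 1/lambda = 0.5
```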

Therefore, E(X a X > a) 1 λ + a a 1 λ EX E(X X > ). Note that: Because the exponential distribution has non-memory P (X > s + t X > t) P (X > s), therefore we have above results. However, for other distribution, we may not have above result. 19