
Expectations of Sums of Random Variables

STAT/MTHE 353: 4 - More on Expectations and Variances
T. Linder, Queen's University, Winter 2017

Recall that if X_1, ..., X_n are random variables with finite expectations, then

    E(X_1 + X_2 + ... + X_n) = E(X_1) + E(X_2) + ... + E(X_n)

The X_i can be continuous or discrete or of any other type. The expectation on the left-hand side is with respect to the joint distribution of X_1, ..., X_n. The ith expectation on the right-hand side is with respect to the marginal distribution of X_i, i = 1, ..., n.

Often we can write a r.v. X as a sum of simpler random variables. Then E(X) is the sum of the expectations of these simpler random variables.

Example: Consider (X_1, ..., X_r) having a multinomial distribution with parameters n and (p_1, ..., p_r). Compute E(X_i), i = 1, ..., r.

Example: Let (X_1, ..., X_r) have the multivariate hypergeometric distribution with parameters N and n_1, ..., n_r. Compute E(X_i), i = 1, ..., r.

Example: We have two urns. Initially Urn 1 contains n red balls and Urn 2 contains n blue balls. At each stage of the experiment we pick a ball from Urn 1 at random, also pick a ball from Urn 2 at random, and then swap the two balls. Let X = the number of red balls in Urn 1 after k stages. Compute E(X) for even k.

Example: (Matching problem) If the integers 1, 2, ..., n are randomly permuted, what is the probability that integer i is in the ith position? What is the expected number of integers in the correct position?
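A quick numerical check of the matching problem (a supplementary sketch, not part of the slides; Python with numpy is assumed): writing X as a sum of indicator variables, linearity of expectation gives E(X) = 1 for every n, and a simulation agrees.

    import numpy as np

    rng = np.random.default_rng(0)
    n, trials = 10, 100_000

    # Matching problem: X = number of integers left in their original position
    # after a uniformly random permutation of 1..n.  Writing X = I_1 + ... + I_n,
    # where I_i indicates "integer i ends up in position i", linearity gives
    # E(X) = n * (1/n) = 1, whatever the value of n.
    fixed_points = [np.sum(rng.permutation(n) == np.arange(n)) for _ in range(trials)]
    print(np.mean(fixed_points))   # should be close to 1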

Conditional Expectation

Suppose X = (X_1, ..., X_n)^T and Y = (Y_1, ..., Y_m)^T are two vector random variables defined on the same probability space. The distributions (joint marginals) of X and Y can be described by the pdfs f_X(x) and f_Y(y) (if both X and Y are continuous) or by the pmfs p_X(x) and p_Y(y) (if both are discrete). The joint distribution of the pair (X, Y) can be described by their joint pdf f_{X,Y}(x, y) or joint pmf p_{X,Y}(x, y). The conditional distribution of X given Y = y is described by either the conditional pdf or the conditional pmf

    f_{X|Y}(x|y) = f_{X,Y}(x, y) / f_Y(y),        p_{X|Y}(x|y) = p_{X,Y}(x, y) / p_Y(y)

Remarks:

(1) In general, X and Y can have different types of distribution (e.g., one is discrete, the other is continuous).

Example: Let n = m = 1 and X = Y + Z, where Y is a Bernoulli(p) r.v. and Z ~ N(0, σ²), and Y and Z are independent. Determine the conditional pdf of X given Y = 0 and Y = 1. Also, determine the pdf of X. (A simulation sketch follows this slide group.)

(2) Not all random variables are either discrete or continuous. Mixed discrete-continuous and even more general distributions are possible, but they are mostly outside the scope of this course.

Definitions

(1) The conditional expectation of X given Y = y is the mean (expectation) of the distribution of X given Y = y and is denoted by E(X | Y = y).

(2) The conditional variance of X given Y = y is the variance of the distribution of X given Y = y and is denoted by Var(X | Y = y).

If both X and Y are discrete,

    E(X | Y = y) = Σ_x x p_{X|Y}(x|y)

and

    Var(X | Y = y) = Σ_x (x − E(X | Y = y))² p_{X|Y}(x|y)

If both X and Y are continuous, we have

    E(X | Y = y) = ∫_{−∞}^{∞} x f_{X|Y}(x|y) dx

and

    Var(X | Y = y) = ∫_{−∞}^{∞} (x − E(X | Y = y))² f_{X|Y}(x|y) dx

Special case: Assume X and Y are independent. Then (considering the discrete case) p_{X|Y}(x|y) = p_X(x), so that for all y,

    E(X | Y = y) = Σ_x x p_{X|Y}(x|y) = Σ_x x p_X(x) = E(X)

A similar argument shows E(X | Y = y) = E(X) if X and Y are independent continuous random variables.
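A small simulation of the Remark (1) example (a supplementary sketch, not from the slides; the values p = 0.3 and σ = 0.5 are arbitrary and scipy is assumed available). Given Y = 0 the conditional pdf of X is N(0, σ²) and given Y = 1 it is N(1, σ²), so the pdf of X is the mixture (1 − p)·φ_σ(x) + p·φ_σ(x − 1); the sketch compares this with a crude histogram estimate at one point.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    p, sigma = 0.3, 0.5           # arbitrary illustrative values
    n = 200_000

    Y = rng.binomial(1, p, size=n)        # Bernoulli(p)
    Z = rng.normal(0.0, sigma, size=n)    # N(0, sigma^2), independent of Y
    X = Y + Z

    # f_X(x) = (1 - p) * N(0, sigma^2) pdf + p * N(1, sigma^2) pdf
    x0 = 0.7
    mixture_pdf = (1 - p) * norm.pdf(x0, 0, sigma) + p * norm.pdf(x0, 1, sigma)

    # Crude density estimate of f_X at x0 from the simulated sample
    h = 0.05
    empirical = np.mean(np.abs(X - x0) < h / 2) / h
    print(mixture_pdf, empirical)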

Notation: Let g(y) = E(X | Y = y). We define the random variable E(X | Y) by setting

    E(X | Y) = g(Y)

Similarly, letting h(y) = Var(X | Y = y), the random variable Var(X | Y) is defined by

    Var(X | Y) = h(Y)

For example, if X and Y are independent, then E(X | Y = y) = E(X) (constant function), so E(X | Y) = E(X).

The following are important properties of conditional expectation. We don't prove them formally, but they should be intuitively clear.

Properties

(i) (Linearity of conditional expectation) If X_1 and X_2 are random variables with finite expectations, then for all a, b ∈ R,

    E(aX_1 + bX_2 | Y) = a E(X_1 | Y) + b E(X_2 | Y)

(ii) If g: R → R is a function such that E[g(Y)] is finite, then

    E[g(Y) | Y] = g(Y)

and if E[g(Y) X] is finite, then

    E[g(Y) X | Y] = g(Y) E(X | Y)

Theorem 1 (Law of total expectation)

    E(X) = E[E(X | Y)]

Proof: Assume both X and Y are discrete. Then

    E[E(X | Y)] = Σ_y E(X | Y = y) p_Y(y)
                = Σ_y (Σ_x x p_{X|Y}(x|y)) p_Y(y)
                = Σ_y Σ_x x (p_{X,Y}(x, y) / p_Y(y)) p_Y(y)
                = Σ_x x Σ_y p_{X,Y}(x, y)
                = Σ_x x p_X(x) = E(X)    ∎

Example: Expected value of the geometric distribution...

Lemma 2 (Variance formula)

    Var(X) = E[Var(X | Y)] + Var(E(X | Y))

Proof: Since Var(X | Y = y) is the variance of the conditional distribution of X given Y = y,

    Var(X | Y) = E[X² | Y] − (E[X | Y])²

Taking expectation (with respect to Y),

    E[Var(X | Y)] = E[E[X² | Y]] − E[(E[X | Y])²] = E(X²) − E[(E[X | Y])²]

On the other hand,

    Var(E[X | Y]) = E[(E[X | Y])²] − (E[E[X | Y]])² = E[(E[X | Y])²] − (E(X))²

Adding the two identities,

    Var(X) = E(X²) − (E(X))² = E[Var(X | Y)] + Var(E[X | Y])    ∎
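A numerical check of Theorem 1 and Lemma 2 (a supplementary sketch, not from the slides; the hierarchical model below is made up for illustration): take Y ~ Uniform(0, 10) and, given Y = y, X ~ Poisson(y), so that E(X | Y) = Var(X | Y) = Y.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 1_000_000

    # E(X)   = E(E(X | Y))                  = E(Y)          = 5
    # Var(X) = E(Var(X | Y)) + Var(E(X|Y))  = E(Y) + Var(Y) = 5 + 100/12
    Y = rng.uniform(0, 10, size=n)
    X = rng.poisson(Y)

    print(X.mean(), 5.0)                  # law of total expectation
    print(X.var(), 5.0 + 100.0 / 12.0)    # variance formula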

Remarks:

(1) Let A be an event and X the indicator of A:

    X = 1 if A occurs, and X = 0 if A^c occurs

Then E(X) = P(A). Assuming Y is a discrete r.v., we have E(X | Y = y) = P(A | Y = y), and the law of total expectation states

    P(A) = E(X) = Σ_y E(X | Y = y) p_Y(y) = Σ_y P(A | Y = y) p_Y(y)

which is the law of total probability. For continuous Y we have

    P(A) = ∫_{−∞}^{∞} E(X | Y = y) f_Y(y) dy = ∫_{−∞}^{∞} P(A | Y = y) f_Y(y) dy

(2) The law of total expectation says that we can compute the mean of a distribution by conditioning on another random variable. This distribution can itself be a conditional distribution. For example, for r.v.'s X, Y, and Z,

    E(X | Y = y) = E[ E(X | Y = y, Z) | Y = y ]

so that

    E(X | Y) = E[ E(X | Y, Z) | Y ]

For example, if Z is discrete,

    E(X | Y = y) = Σ_z E(X | Y = y, Z = z) p_{Z|Y}(z|y) = Σ_z E(X | Y = y, Z = z) P(Z = z | Y = y)

Exercise: Prove the above statement if X, Y, and Z are discrete.

Example: Repeatedly flip a biased coin which comes up heads with probability p. Let X denote the number of flips until 2 consecutive heads occur. Find E(X). (A simulation sketch follows Theorem 3 below.)

Solution:

Example: (Simplex algorithm) There are n vertices (points) that are ranked from best to worst. Start from point j and at each step, jump to one of the better points at random (with equal probability). What is the expected number of steps to reach the best point?

Solution:

Minimum mean square error (MMSE) estimation

Suppose a r.v. Y is observed and based on its value we want to guess the value of another r.v. X. Formally, we want to use a function g(Y) of Y to estimate the unobserved X in the sense of minimizing the mean square error

    E[(X − g(Y))²]

It turns out that g*(Y) = E(X | Y) is the optimal choice.

Theorem 3

Suppose X has finite variance. Then for g*(Y) = E(X | Y) and any function g,

    E[(X − g(Y))²] ≥ E[(X − g*(Y))²]
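A simulation sketch for the 2-consecutive-heads example (supplementary, not the slides' solution; the closed form E(X) = (1 + p)/p², which equals 6 for p = 1/2, follows from the standard conditioning argument and is used here only as a reference value).

    import numpy as np

    rng = np.random.default_rng(3)
    p, trials = 0.5, 200_000

    def flips_until_two_heads(rng, p):
        """Number of flips of a p-coin until 2 consecutive heads appear."""
        run, n = 0, 0
        while run < 2:
            n += 1
            run = run + 1 if rng.random() < p else 0
        return n

    samples = [flips_until_two_heads(rng, p) for _ in range(trials)]
    print(np.mean(samples), (1 + p) / p**2)   # simulated vs. closed form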

Proof: Use the properties of conditional expectation:

    E[(X − g(Y))² | Y]
      = E[(X − g*(Y) + g*(Y) − g(Y))² | Y]
      = E[(X − g*(Y))² + (g*(Y) − g(Y))² + 2(X − g*(Y))(g*(Y) − g(Y)) | Y]
      = E[(X − g*(Y))² | Y] + (g*(Y) − g(Y))² + 2(g*(Y) − g(Y)) E[X − g*(Y) | Y]
      = E[(X − g*(Y))² | Y] + (g*(Y) − g(Y))² + 2(g*(Y) − g(Y)) (E(X | Y) − g*(Y))
      = E[(X − g*(Y))² | Y] + (g*(Y) − g(Y))²

since E(X | Y) − g*(Y) = 0.

Proof cont'd: Thus

    E[(X − g(Y))² | Y] = E[(X − g*(Y))² | Y] + (g*(Y) − g(Y))²

Take expectations on both sides and use the law of total expectation to obtain

    E[(X − g(Y))²] = E[(X − g*(Y))²] + E[(g*(Y) − g(Y))²]

Since (g*(Y) − g(Y))² ≥ 0, this implies

    E[(X − g(Y))²] ≥ E[(X − g*(Y))²]    ∎

Remark: Note that since g*(y) = E(X | Y = y), we have

    E[Var(X | Y)] = E[(X − g*(Y))²]

i.e., E[Var(X | Y)] is the mean square error of the MMSE estimate of X given Y.

Example: Suppose X ~ N(0, σ_X²) and Z ~ N(0, σ_Z²), where X and Z are independent. Here X represents a signal sent from a remote location which is corrupted by the noise Z so that the received signal is Y = X + Z. What is the MMSE estimate of X given Y = y?

Random Sums

Theorem 4 (Wald's equation)

Let X_1, X_2, ... be i.i.d. random variables with mean µ. Let N be a r.v. with values in {1, 2, ...} that is independent of the X_i's and has finite mean E(N). Define X = Σ_{i=1}^N X_i. Then

    E(X) = E(N) µ

Proof:

    E(X | N = n) = E(X_1 + ... + X_N | N = n)
                 = E(X_1 + ... + X_n | N = n)
                 = E(X_1 | N = n) + ... + E(X_n | N = n)    (linearity of expectation)
                 = E(X_1) + ... + E(X_n)                    (N and the X_i are independent)
                 = nµ
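A quick check of Wald's equation (a supplementary sketch, not from the slides; the particular choices of Exponential summands and a Geometric N are arbitrary).

    import numpy as np

    rng = np.random.default_rng(4)
    trials = 200_000
    mu = 2.0     # mean of each X_i (Exponential here, but any distribution works)
    q = 0.25     # N ~ Geometric(q) on {1, 2, ...}, independent of the X_i

    N = rng.geometric(q, size=trials)                  # E(N) = 1/q = 4
    totals = [rng.exponential(mu, size=k).sum() for k in N]

    # Wald's equation: E(X_1 + ... + X_N) = E(N) * mu
    print(np.mean(totals), (1 / q) * mu)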

Proof cont'd: We obtained E(X | N = n) = nµ for all n = 1, 2, ..., i.e., E(X | N) = Nµ. By the law of total expectation,

    E(X) = E[E(X | N)] = E(Nµ) = E(N) µ    ∎

Example: (Branching process) Suppose a population evolves in generations starting from a single individual (generation 0). Each individual of the ith generation produces a random number of offspring; the collection of all offspring of generation i individuals forms generation i + 1. The numbers of offspring born to distinct individuals are independent random variables with mean µ. Let X_n be the number of individuals in the nth generation. Find E(X_n).

Covariance and Correlation

Covariance

Definition: Let X and Y be two random variables with finite variance. Their covariance is defined by

    Cov(X, Y) = E[(X − E(X))(Y − E(Y))]

Properties:

(1) Expanding the product,

    Cov(X, Y) = E[XY − X E(Y) − Y E(X) + E(X) E(Y)]
              = E(XY) − E(X) E(Y) − E(X) E(Y) + E(X) E(Y)
              = E(XY) − E(X) E(Y)

The formula Cov(X, Y) = E(XY) − E(X) E(Y) is often useful in computations.

(2) Cov(X, Y) = Cov(Y, X).

(3) If X = Y we obtain

    Cov(X, X) = E[(X − E(X))²] = Var(X)

(4) For any constants a, b, c and d,

    Cov(aX + b, cY + d) = E[(aX + b − E(aX + b))(cY + d − E(cY + d))]
                        = E[a(X − E(X)) · c(Y − E(Y))]
                        = ac E[(X − E(X))(Y − E(Y))]
                        = ac Cov(X, Y)

(5) If X and Y are independent, then Cov(X, Y) = 0.

Proof: By independence, E(XY) = E(X) E(Y), so

    Cov(X, Y) = E(XY) − E(X) E(Y) = 0

Definition: Let X_1, ..., X_n be random variables with finite variances. The covariance matrix of the vector X = (X_1, ..., X_n)^T is the n × n matrix Cov(X) whose (i, j)th entry is Cov(X_i, X_j).

Remarks:

The ith diagonal entry of Cov(X) is Var(X_i), i = 1, ..., n.

Cov(X) is a symmetric matrix since Cov(X_i, X_j) = Cov(X_j, X_i) for all i and j.
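A short numerical illustration of the covariance matrix and of property (1) (a supplementary sketch, not from the slides; the three-coordinate construction is arbitrary, chosen only to produce a non-diagonal covariance matrix).

    import numpy as np

    rng = np.random.default_rng(5)
    n = 100_000

    # Build a 3-dimensional random vector from shared standard normal components
    Z = rng.standard_normal((3, n))
    X = np.vstack([Z[0], Z[0] + 0.5 * Z[1], Z[1] - Z[2]])

    C = np.cov(X)                  # sample covariance matrix of X
    print(np.allclose(C, C.T))     # symmetric
    print(C.diagonal())            # ith diagonal entry estimates Var(X_i)

    # Property (1): Cov(X_1, X_2) = E(X_1 X_2) - E(X_1) E(X_2)
    x1, x2 = X[0], X[1]
    print(np.mean(x1 * x2) - x1.mean() * x2.mean(), C[0, 1])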

Some properties of covariance are easier to derive using a matrix formalism. Let V = {Y_{ij}; i = 1, ..., m, j = 1, ..., n} be an m × n matrix of random variables having finite expectations. We define E(V) by taking expectations componentwise: E(V) is the m × n matrix whose (i, j)th entry is E(Y_{ij}).

Now notice that the n × n matrix (X − E(X))(X − E(X))^T has (X_i − E(X_i))(X_j − E(X_j)) as its (i, j)th entry. Thus

    Cov(X) = E[(X − E(X))(X − E(X))^T]

Lemma 5

Let A be an m × n real matrix and define Y = AX (an m-dimensional random vector). Then

    Cov(Y) = A Cov(X) A^T

Proof: First note that by the linearity of expectation, E(Y) = E(AX) = A E(X). Thus

    Cov(Y) = E[(Y − E(Y))(Y − E(Y))^T]
           = E[(AX − A E(X))(AX − A E(X))^T]
           = E[(A(X − E(X)))(A(X − E(X)))^T]
           = E[A(X − E(X))(X − E(X))^T A^T]
           = A E[(X − E(X))(X − E(X))^T] A^T
           = A Cov(X) A^T    ∎

For any m-vector c = (c_1, ..., c_m)^T we also have Cov(Y + c) = Cov(Y), since Cov(Y_i + c_i, Y_j + c_j) = Cov(Y_i, Y_j). Thus

    Cov(AX + c) = A Cov(X) A^T

Let m = 1 so that c = c is a scalar and A is a 1 × n matrix, i.e., A is a row vector A = a^T = (a_1, ..., a_n). Then

    Cov(a^T X + c) = Cov( Σ_{i=1}^n a_i X_i + c ) = Var( Σ_{i=1}^n a_i X_i + c )

On the other hand,

    Cov(a^T X + c) = a^T Cov(X) a
                   = Σ_{i=1}^n Σ_{j=1}^n a_i a_j Cov(X_i, X_j)
                   = Σ_{i=1}^n a_i² Cov(X_i, X_i) + 2 Σ_{i<j} a_i a_j Cov(X_i, X_j)
                   = Σ_{i=1}^n a_i² Var(X_i) + 2 Σ_{i<j} a_i a_j Cov(X_i, X_j)

Hence

    Var( Σ_{i=1}^n a_i X_i + c ) = Σ_{i=1}^n a_i² Var(X_i) + 2 Σ_{i<j} a_i a_j Cov(X_i, X_j)

Note that if X_1, ..., X_n are independent, then Cov(X_i, X_j) = 0 for i ≠ j, and we obtain

    Var( Σ_{i=1}^n a_i X_i + c ) = Σ_{i=1}^n a_i² Var(X_i)
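A numerical check of Lemma 5 in the form Cov(AX + c) = A Cov(X) A^T (a supplementary sketch, not from the slides; the matrices below are arbitrary illustrative choices).

    import numpy as np

    rng = np.random.default_rng(6)
    n = 200_000

    # A 3-dimensional random vector X with a non-trivial covariance structure
    Z = rng.standard_normal((3, n))
    L = np.array([[1.0, 0.0, 0.0],
                  [0.5, 1.0, 0.0],
                  [0.2, 0.3, 1.0]])
    X = L @ Z

    A = np.array([[1.0, -2.0, 0.5],
                  [0.0,  1.0, 3.0]])     # arbitrary 2 x 3 matrix
    c = np.array([[10.0], [-7.0]])       # constant shift; does not affect Cov

    Y = A @ X + c
    print(np.round(np.cov(Y), 2))             # sample Cov(AX + c)
    print(np.round(A @ np.cov(X) @ A.T, 2))   # A Cov(X) A^T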

More generally, let X = (X_1, ..., X_n)^T and Y = (Y_1, ..., Y_m)^T, and let Cov(X, Y) be the n × m matrix with (i, j)th entry Cov(X_i, Y_j). Note that

    Cov(X, Y) = E[(X − E(X))(Y − E(Y))^T]

If A is a k × n matrix, B is an l × m matrix, c is a k-vector and d is an l-vector, then

    Cov(AX + c, BY + d) = E[(AX + c − E(AX + c))(BY + d − E(BY + d))^T]
                        = A E[(X − E(X))(Y − E(Y))^T] B^T

We obtain

    Cov(AX + c, BY + d) = A Cov(X, Y) B^T

We can now prove the following important property of covariance:

Lemma 6

For any constants a_1, ..., a_n and b_1, ..., b_m,

    Cov( Σ_{i=1}^n a_i X_i, Σ_{j=1}^m b_j Y_j ) = Σ_{i=1}^n Σ_{j=1}^m a_i b_j Cov(X_i, Y_j)

i.e., Cov(X, Y) is bilinear.

Proof: Let k = l = 1 and A = a^T = (a_1, ..., a_n) and B = b^T = (b_1, ..., b_m). Then we have

    Cov( Σ_{i=1}^n a_i X_i, Σ_{j=1}^m b_j Y_j ) = Cov(a^T X, b^T Y)
                                                = Cov(AX, BY)
                                                = A Cov(X, Y) B^T
                                                = a^T Cov(X, Y) b
                                                = Σ_{i=1}^n Σ_{j=1}^m a_i b_j Cov(X_i, Y_j)    ∎

The following property of covariance is of fundamental importance:

Lemma 7

    |Cov(X, Y)| ≤ √(Var(X) Var(Y))

Proof: First we prove the Cauchy-Schwarz inequality for random variables U and V with finite variances. Let λ ∈ R; then

    0 ≤ E[(U − λV)²] = E(U²) − 2λ E(UV) + λ² E(V²)

This is a quadratic polynomial in λ which cannot have two distinct real roots.

Proof cont'd: Thus its discriminant cannot be positive:

    4 [E(UV)]² − 4 E(U²) E(V²) ≤ 0

so we obtain

    [E(UV)]² ≤ E(U²) E(V²)

Use this with U = X − E(X) and V = Y − E(Y) to get

    |Cov(X, Y)| = |E[(X − E(X))(Y − E(Y))]|
                ≤ √( E[(X − E(X))²] E[(Y − E(Y))²] )
                = √(Var(X) Var(Y))    ∎
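A numerical check of the bilinearity in Lemma 6 (a supplementary sketch, not from the slides; the vectors and coefficients are arbitrary illustrative choices).

    import numpy as np

    rng = np.random.default_rng(7)
    n = 500_000

    # Two small random vectors with some shared randomness
    Z = rng.standard_normal((4, n))
    X = np.vstack([Z[0], Z[0] + Z[1]])           # X_1, X_2
    Y = np.vstack([Z[1], Z[2], Z[1] - Z[3]])     # Y_1, Y_2, Y_3

    a = np.array([2.0, -1.0])
    b = np.array([0.5, 3.0, 1.0])

    def cov(u, v):
        return np.mean(u * v) - u.mean() * v.mean()

    # Cov(sum_i a_i X_i, sum_j b_j Y_j)  vs.  sum_i sum_j a_i b_j Cov(X_i, Y_j)
    lhs = cov(a @ X, b @ Y)
    rhs = sum(a[i] * b[j] * cov(X[i], Y[j]) for i in range(2) for j in range(3))
    print(lhs, rhs)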

Correlation

Recall that Cov(aX, bY) = ab Cov(X, Y). This is an undesirable property if we want to use Cov(X, Y) as a measure of association between X and Y. A proper normalization will solve this problem:

Definition: The correlation coefficient between X and Y having nonzero variances is defined by

    ρ(X, Y) = Cov(X, Y) / √(Var(X) Var(Y))

Remarks:

Since Var(aX + b) = a² Var(X),

    ρ(aX + b, aY + d) = ρ(X, Y)

Letting µ_X = E(X), µ_Y = E(Y), σ_X² = Var(X), σ_Y² = Var(Y), we have

    ρ(X, Y) = Cov(X, Y) / (σ_X σ_Y)
            = Cov(X − µ_X, Y − µ_Y) / (σ_X σ_Y)
            = Cov( (X − µ_X)/σ_X, (Y − µ_Y)/σ_Y )

Thus ρ(X, Y) is the covariance between the standardized versions of X and Y.

If X and Y are independent, then Cov(X, Y) = 0, so ρ(X, Y) = 0. On the other hand, ρ(X, Y) = 0 does not imply that X and Y are independent.

Remark: If ρ(X, Y) = 0 we say that X and Y are uncorrelated.

Example: Find random variables X and Y that are uncorrelated but not independent. (One possible answer is sketched after this slide group.)

Example: Covariance and correlation for multinomial random variables...

Theorem 8

The correlation always satisfies

    |ρ(X, Y)| ≤ 1

Moreover, |ρ(X, Y)| = 1 if and only if Y = aX + b for some constants a and b (a ≠ 0), i.e., Y is an affine function of X.

Proof: We know that |Cov(X, Y)| ≤ √(Var(X) Var(Y)), so |ρ(X, Y)| ≤ 1 always holds.

Let's assume now that Y = aX + b, where a ≠ 0. Then

    Cov(X, Y) = Cov(X, aX + b) = Cov(X, aX) = a Cov(X, X) = a Var(X)

so

    ρ(X, Y) = a Var(X) / √(Var(X) · a² Var(X)) = a / |a| = ±1

Proof cont'd: Conversely, suppose that ρ(X, Y) = 1. Then

    Var( X/σ_X − Y/σ_Y ) = Var(X/σ_X) + Var(Y/σ_Y) − 2 Cov(X/σ_X, Y/σ_Y)
                         = Var(X)/σ_X² + Var(Y)/σ_Y² − 2 Cov(X, Y)/(σ_X σ_Y)
                         = 1 + 1 − 2 ρ(X, Y) = 0

This means that X/σ_X − Y/σ_Y = c for some constant c, so

    Y = (σ_Y/σ_X) X − σ_Y c

If ρ(X, Y) = −1, consider Var( X/σ_X + Y/σ_Y ) and use the same proof.    ∎

Remark: The previous theorem implies that correlation can be thought of as a measure of linear association (linear dependence) between X and Y.

Recall the multinomial example...
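One standard answer to the "uncorrelated but not independent" exercise, checked numerically (a supplementary sketch, not from the slides): take X ~ N(0, 1) and Y = X².

    import numpy as np

    rng = np.random.default_rng(8)
    n = 1_000_000

    X = rng.standard_normal(n)
    Y = X**2

    # Cov(X, Y) = E(X^3) - E(X) E(X^2) = 0, so rho(X, Y) = 0 ...
    print(np.corrcoef(X, Y)[0, 1])          # close to 0

    # ... yet Y is a deterministic function of X, so X and Y are dependent:
    # for instance P(Y > 1 | |X| <= 1) = 0 while P(Y > 1) > 0.
    print(np.mean(Y > 1))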

Example: (Linear MMSE estimation) Let X and Y be random variables with zero means and finite variances σ_X² > 0 and σ_Y² > 0. Suppose we want to estimate X in the MMSE sense using a linear function of Y; i.e., we are looking for a ∈ R minimizing

    E[(X − aY)²]

Find the minimizing a and determine the resulting minimum mean square error. Relate the results to ρ(X, Y).
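A numerical sketch of the linear MMSE exercise (supplementary, not the slides' solution; the joint model below is an arbitrary choice). Setting the derivative of E[(X − aY)²] to zero gives a* = Cov(X, Y)/Var(Y) for zero-mean X and Y, with minimum mean square error σ_X²(1 − ρ(X, Y)²), which the simulation reproduces.

    import numpy as np

    rng = np.random.default_rng(9)
    n = 500_000

    # Zero-mean signal X observed in additive noise (illustrative model)
    X = 2.0 * rng.standard_normal(n)            # sigma_X = 2
    Y = X + 1.5 * rng.standard_normal(n)        # noisy observation of X

    a_star = np.cov(X, Y)[0, 1] / Y.var()       # a* = Cov(X, Y) / Var(Y)
    rho = np.corrcoef(X, Y)[0, 1]
    mse_formula = X.var() * (1 - rho**2)        # sigma_X^2 (1 - rho^2)
    mse_empirical = np.mean((X - a_star * Y)**2)
    print(a_star, mse_formula, mse_empirical)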