Lecture 4: Conditional expectation and independence

UW-Madison (Statistics), Stat 709, 2018

In elementary probability, the conditional probability $P(B\mid A)$ is defined as $P(B\mid A) = P(A \cap B)/P(A)$ for events $A$ and $B$ with $P(A) > 0$. For two random variables $X$ and $Y$, how do we define $P(X \in B \mid Y = y)$?

Definition 1.6
Let $X$ be an integrable random variable on $(\Omega, \mathcal{F}, P)$.
(i) The conditional expectation of $X$ given $\mathcal{A}$ (a sub-$\sigma$-field of $\mathcal{F}$), denoted by $E(X\mid\mathcal{A})$, is the a.s.-unique random variable satisfying the following two conditions:
(a) $E(X\mid\mathcal{A})$ is measurable from $(\Omega, \mathcal{A})$ to $(\mathbb{R}, \mathcal{B})$;
(b) $\int_A E(X\mid\mathcal{A})\,dP = \int_A X\,dP$ for any $A \in \mathcal{A}$.
(ii) The conditional probability of $B \in \mathcal{F}$ given $\mathcal{A}$ is defined to be $P(B\mid\mathcal{A}) = E(I_B\mid\mathcal{A})$.
(iii) Let $Y$ be measurable from $(\Omega, \mathcal{F}, P)$ to $(\Lambda, \mathcal{G})$. The conditional expectation of $X$ given $Y$ is defined to be $E(X\mid Y) = E[X\mid\sigma(Y)]$.

Remarks
- The existence of $E(X\mid\mathcal{A})$ follows from Theorem 1.4.
- $\sigma(Y)$ contains "the information in $Y$"; $E(X\mid Y)$ is the expectation of $X$ given the information in $Y$.
- For a random vector $X$, $E(X\mid\mathcal{A})$ is defined as the vector of conditional expectations of the components of $X$.

Lemma 1.2
Let $Y$ be measurable from $(\Omega, \mathcal{F})$ to $(\Lambda, \mathcal{G})$ and $Z$ a function from $(\Omega, \mathcal{F})$ to $\mathbb{R}^k$. Then $Z$ is measurable from $(\Omega, \sigma(Y))$ to $(\mathbb{R}^k, \mathcal{B}^k)$ iff there is a measurable function $h$ from $(\Lambda, \mathcal{G})$ to $(\mathbb{R}^k, \mathcal{B}^k)$ such that $Z = h \circ Y$.

By Lemma 1.2, there is a Borel function $h$ on $(\Lambda, \mathcal{G})$ such that $E(X\mid Y) = h \circ Y$. For $y \in \Lambda$, we define $E(X\mid Y = y) = h(y)$ to be the conditional expectation of $X$ given $Y = y$. Note that $h(y)$ is a function on $\Lambda$, whereas $h \circ Y = E(X\mid Y)$ is a function on $\Omega$.

Example 1.21
Let $X$ be an integrable random variable on $(\Omega, \mathcal{F}, P)$, let $A_1, A_2, \ldots$ be disjoint events on $(\Omega, \mathcal{F}, P)$ such that $\cup_i A_i = \Omega$ and $P(A_i) > 0$ for all $i$, and let $a_1, a_2, \ldots$ be distinct real numbers. Define $Y = a_1 I_{A_1} + a_2 I_{A_2} + \cdots$. We now show that
$$E(X\mid Y) = \sum_{i=1}^{\infty} \frac{\int_{A_i} X\,dP}{P(A_i)}\, I_{A_i}.$$
We need to verify (a) and (b) in Definition 1.6 with $\mathcal{A} = \sigma(Y)$. Since $\sigma(Y) = \sigma(\{A_1, A_2, \ldots\})$, it is clear that the function on the right-hand side is measurable on $(\Omega, \sigma(Y))$. This verifies (a). To verify (b), we need to show that for any $B \in \mathcal{B}$,
$$\int_{Y^{-1}(B)} X\,dP = \int_{Y^{-1}(B)} \left[\sum_{i=1}^{\infty} \frac{\int_{A_i} X\,dP}{P(A_i)}\, I_{A_i}\right] dP.$$

Example 1.21 (continued)
Using the fact that $Y^{-1}(B) = \cup_{i: a_i \in B} A_i$, we obtain
$$\int_{Y^{-1}(B)} X\,dP = \sum_{i: a_i \in B} \int_{A_i} X\,dP = \sum_{i: a_i \in B} \frac{\int_{A_i} X\,dP}{P(A_i)}\, P(A_i) = \int_{Y^{-1}(B)} \left[\sum_{i=1}^{\infty} \frac{\int_{A_i} X\,dP}{P(A_i)}\, I_{A_i}\right] dP,$$
where the last equality follows from Fubini's theorem. This verifies (b) and thus the result.

Let $h$ be a Borel function on $\mathbb{R}$ satisfying $h(a_i) = \int_{A_i} X\,dP / P(A_i)$. Then $E(X\mid Y) = h \circ Y$ and $E(X\mid Y = y) = h(y)$.
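In this discrete setting, $h(a_i)$ is simply the average of $X$ over the event $\{Y = a_i\}$. The following Monte Carlo sketch (my own illustration; the simulated distribution and names are not from the lecture) estimates $h$ by within-group sample means and then forms $E(X\mid Y) = h \circ Y$:

```python
import numpy as np

rng = np.random.default_rng(0)

# A discrete Y taking the distinct values a_1, a_2, a_3, and an X that depends on Y.
a = np.array([-1.0, 0.0, 2.0])
Y = rng.choice(a, size=100_000, p=[0.2, 0.5, 0.3])
X = Y**2 + rng.normal(size=Y.size)

# h(a_i) = (integral of X over A_i) / P(A_i), estimated by the mean of X on {Y = a_i}.
h = {a_i: X[Y == a_i].mean() for a_i in a}
print(h)  # roughly {-1.0: 1.0, 0.0: 0.0, 2.0: 4.0}, since E(X | Y = a_i) = a_i**2 here

# E(X | Y) as a random variable on Omega is the composition h(Y).
E_X_given_Y = np.array([h[y] for y in Y])
print(Y[:5], E_X_given_Y[:5])
```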

Proposition 1.9
Let $X$ be a random $n$-vector and $Y$ a random $m$-vector. Suppose that $(X, Y)$ has a joint p.d.f. $f(x, y)$ w.r.t. $\nu \times \lambda$, where $\nu$ and $\lambda$ are $\sigma$-finite measures on $(\mathbb{R}^n, \mathcal{B}^n)$ and $(\mathbb{R}^m, \mathcal{B}^m)$, respectively. Let $g(x, y)$ be a Borel function on $\mathbb{R}^{n+m}$ for which $E|g(X, Y)| < \infty$. Then
$$E[g(X, Y)\mid Y] = \frac{\int g(x, Y) f(x, Y)\,d\nu(x)}{\int f(x, Y)\,d\nu(x)} \quad \text{a.s.}$$

Proof
Denote the right-hand side by $h(Y)$. By Fubini's theorem, $h$ is Borel. Then, by Lemma 1.2, $h(Y)$ is Borel on $(\Omega, \sigma(Y))$. Also, by Fubini's theorem, $f_Y(y) = \int f(x, y)\,d\nu(x)$ is the p.d.f. of $Y$ w.r.t. $\lambda$.

Proof (continued)
For $B \in \mathcal{B}^m$,
$$\int_{Y^{-1}(B)} h(Y)\,dP = \int_B h(y)\,dP_Y = \int_B \frac{\int_{\mathbb{R}^n} g(x, y) f(x, y)\,d\nu(x)}{\int_{\mathbb{R}^n} f(x, y)\,d\nu(x)}\, f_Y(y)\,d\lambda(y) = \int_{\mathbb{R}^n \times B} g(x, y) f(x, y)\,d(\nu \times \lambda) = \int_{\mathbb{R}^n \times B} g(x, y)\,dP_{(X, Y)} = \int_{Y^{-1}(B)} g(X, Y)\,dP,$$
where the first and the last equalities follow from Theorem 1.2, the second and the next-to-last equalities follow from the definition of $h$ and of p.d.f.'s, and the third equality follows from Fubini's theorem.
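The identity verified here, $\int_{Y^{-1}(B)} h(Y)\,dP = \int_{Y^{-1}(B)} g(X, Y)\,dP$, is easy to check numerically. A small sanity check (my own construction, not from the notes), using a standard bivariate normal pair for which $E(X\mid Y) = \rho Y$ and an arbitrary interval $B$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho = 1_000_000, 0.6
Y = rng.normal(size=n)
X = rho * Y + np.sqrt(1 - rho**2) * rng.normal(size=n)  # (X, Y) standard bivariate normal
h_Y = rho * Y                                           # E(X | Y) from Proposition 1.9

in_B = (Y > 0.5) & (Y < 2.0)                   # indicator of Y^{-1}(B) for B = (0.5, 2.0)
print(np.mean(X * in_B), np.mean(h_Y * in_B))  # the two integrals agree up to Monte Carlo error
```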

Conditional p.d.f.
Let $(X, Y)$ be a random vector with a joint p.d.f. $f(x, y)$ w.r.t. $\nu \times \lambda$. The conditional p.d.f. of $X$ given $Y = y$ is defined to be
$$f_{X\mid Y}(x\mid y) = f(x, y)/f_Y(y),$$
where $f_Y(y) = \int f(x, y)\,d\nu(x)$ is the marginal p.d.f. of $Y$ w.r.t. $\lambda$. For each fixed $y$ with $f_Y(y) > 0$, $f_{X\mid Y}(x\mid y)$ is a p.d.f. w.r.t. $\nu$. Then Proposition 1.9 states that
$$E[g(X, Y)\mid Y] = \int g(x, Y)\, f_{X\mid Y}(x\mid Y)\,d\nu(x),$$
i.e., the conditional expectation of $g(X, Y)$ given $Y$ is equal to the expectation of $g(x, Y)$ w.r.t. the conditional p.d.f. of $X$ given $Y$.
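As a concrete instance of this formula (my own illustration; the bivariate normal example and the grid-based integration are assumptions, not part of the notes), one can compute $E(X\mid Y = y) = \int x f(x, y)\,dx / \int f(x, y)\,dx$ numerically and compare it with the closed form $\rho y$ for a standard bivariate normal with correlation $\rho$:

```python
import numpy as np

rho, y = 0.6, 1.3
x = np.linspace(-10.0, 10.0, 20001)   # integration grid for x (nu = Lebesgue measure)
dx = x[1] - x[0]

# Joint p.d.f. of a standard bivariate normal with correlation rho, evaluated at (x, y).
f = np.exp(-(x**2 - 2*rho*x*y + y**2) / (2*(1 - rho**2))) / (2*np.pi*np.sqrt(1 - rho**2))

numerator = np.sum(x * f) * dx        # integral of g(x, y) f(x, y) dnu(x) with g(x, y) = x
f_Y = np.sum(f) * dx                  # marginal p.d.f. of Y at y
print(numerator / f_Y, rho * y)       # both approximately 0.78
```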

Proposition 1.10
Let $X$, $Y$, $X_1, X_2, \ldots$ be integrable random variables on $(\Omega, \mathcal{F}, P)$ and let $\mathcal{A}$ be a sub-$\sigma$-field of $\mathcal{F}$.
(i) If $X = c$ a.s., $c \in \mathbb{R}$, then $E(X\mid\mathcal{A}) = c$ a.s.
(ii) If $X \le Y$ a.s., then $E(X\mid\mathcal{A}) \le E(Y\mid\mathcal{A})$ a.s.
(iii) If $a, b \in \mathbb{R}$, then $E(aX + bY\mid\mathcal{A}) = aE(X\mid\mathcal{A}) + bE(Y\mid\mathcal{A})$ a.s.
(iv) $E[E(X\mid\mathcal{A})] = EX$.
(v) $E[E(X\mid\mathcal{A})\mid\mathcal{A}_0] = E(X\mid\mathcal{A}_0) = E[E(X\mid\mathcal{A}_0)\mid\mathcal{A}]$ a.s., where $\mathcal{A}_0$ is a sub-$\sigma$-field of $\mathcal{A}$.
(vi) If $\sigma(Y) \subset \mathcal{A}$ and $E|XY| < \infty$, then $E(XY\mid\mathcal{A}) = Y E(X\mid\mathcal{A})$ a.s.
(vii) If $X$ and $Y$ are independent and $E|g(X, Y)| < \infty$ for a Borel function $g$, then $E[g(X, Y)\mid Y = y] = E[g(X, y)]$ a.s. $P_Y$.
(viii) If $EX^2 < \infty$, then $[E(X\mid\mathcal{A})]^2 \le E(X^2\mid\mathcal{A})$ a.s.
(ix) (Fatou's lemma) If $X_n \ge 0$ for any $n$, then $E(\liminf_n X_n\mid\mathcal{A}) \le \liminf_n E(X_n\mid\mathcal{A})$ a.s.
(x) (Dominated convergence theorem) If $|X_n| \le Y$ for any $n$ and $X_n \to X$ a.s., then $E(X_n\mid\mathcal{A}) \to E(X\mid\mathcal{A})$ a.s.
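Several of these properties can be seen numerically in a model where $E(X\mid Y)$ has a closed form. A minimal sketch (my own toy model with $\mathcal{A} = \sigma(Y)$; nothing here is from the lecture) checking (iv), (vi), and the expectation of both sides of (viii):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
Y = rng.normal(size=n)
X = Y**2 + rng.normal(size=n)      # in this model E(X | Y) = Y**2 exactly

E_X_given_Y = Y**2

# (iv) smoothing: E[E(X | Y)] = E X
print(E_X_given_Y.mean(), X.mean())               # both approximately 1.0

# (vi) taking out what is known: E(XY | Y) = Y E(X | Y), so E[XY] = E[Y * Y**2]
print((X * Y).mean(), (Y * E_X_given_Y).mean())   # both approximately 0.0

# (viii) conditional Jensen: [E(X | Y)]**2 <= E(X**2 | Y), so E{[E(X | Y)]**2} <= E[X**2]
print((E_X_given_Y**2).mean(), (X**2).mean())     # approximately 3.0 vs 4.0
```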

Example 1.22
Let $X$ be a random variable on $(\Omega, \mathcal{F}, P)$ with $EX^2 < \infty$ and let $Y$ be a measurable function from $(\Omega, \mathcal{F}, P)$ to $(\Lambda, \mathcal{G})$. One may wish to predict the value of $X$ based on an observed value of $Y$. Let $g(Y)$ be a predictor, i.e., $g \in \aleph = \{$all Borel functions $g$ with $E[g(Y)]^2 < \infty\}$. Each predictor is assessed by the "mean squared prediction error" $E[X - g(Y)]^2$. We now show that $E(X\mid Y)$ is the best predictor of $X$ in the sense that
$$E[X - E(X\mid Y)]^2 = \min_{g \in \aleph} E[X - g(Y)]^2.$$
First, Proposition 1.10(viii) implies $E(X\mid Y) \in \aleph$.

Example 1.22 (continued)
Next, for any $g \in \aleph$,
$$E[X - g(Y)]^2 = E[X - E(X\mid Y) + E(X\mid Y) - g(Y)]^2$$
$$= E[X - E(X\mid Y)]^2 + E[E(X\mid Y) - g(Y)]^2 + 2E\{[X - E(X\mid Y)][E(X\mid Y) - g(Y)]\}$$
$$= E[X - E(X\mid Y)]^2 + E[E(X\mid Y) - g(Y)]^2 + 2E\big\{E\{[X - E(X\mid Y)][E(X\mid Y) - g(Y)]\mid Y\}\big\}$$
$$= E[X - E(X\mid Y)]^2 + E[E(X\mid Y) - g(Y)]^2 + 2E\{[E(X\mid Y) - g(Y)]\,E[X - E(X\mid Y)\mid Y]\}$$
$$= E[X - E(X\mid Y)]^2 + E[E(X\mid Y) - g(Y)]^2$$
$$\ge E[X - E(X\mid Y)]^2,$$
where the third equality follows from Proposition 1.10(iv), the fourth equality follows from Proposition 1.10(vi), and the last equality follows from Proposition 1.10(i), (iii), and (vi).
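A quick Monte Carlo illustration (my own, using an assumed toy model in which $E(X\mid Y) = Y^2$): the conditional mean attains a smaller mean squared prediction error than other candidate predictors, including the best linear one.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
Y = rng.normal(size=n)
X = Y**2 + rng.normal(size=n)          # E(X | Y) = Y**2

predictors = {
    "E(X | Y) = Y**2":     Y**2,
    "best linear (= E X)": np.full(n, X.mean()),  # here Cov(X, Y) = 0, so the best linear predictor is the constant E X
    "g(Y) = Y":            Y,
    "g(Y) = |Y|":          np.abs(Y),
}
for name, pred in predictors.items():
    print(name, np.mean((X - pred)**2))
# E(X | Y) achieves the smallest error (about 1.0, the noise variance);
# every other g(Y) is strictly worse.
```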

Definition 1.7 (Independence)
Let $(\Omega, \mathcal{F}, P)$ be a probability space.
(i) Let $\mathcal{C}$ be a collection of subsets in $\mathcal{F}$. Events in $\mathcal{C}$ are said to be independent iff for any positive integer $n$ and distinct events $A_1, \ldots, A_n$ in $\mathcal{C}$,
$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)P(A_2)\cdots P(A_n).$$
(ii) Collections $\mathcal{C}_i \subset \mathcal{F}$, $i \in \mathcal{I}$ (an index set that can be uncountable), are said to be independent iff events in any collection of the form $\{A_i \in \mathcal{C}_i : i \in \mathcal{I}\}$ are independent.
(iii) Random elements $X_i$, $i \in \mathcal{I}$, are said to be independent iff $\sigma(X_i)$, $i \in \mathcal{I}$, are independent.

Lemma 1.3 (a useful result for checking the independence of $\sigma$-fields)
Let $\mathcal{C}_i$, $i \in \mathcal{I}$, be independent collections of events. If each $\mathcal{C}_i$ is a $\pi$-system ($A \in \mathcal{C}_i$ and $B \in \mathcal{C}_i$ implies $A \cap B \in \mathcal{C}_i$), then $\sigma(\mathcal{C}_i)$, $i \in \mathcal{I}$, are independent.

Facts
- Random variables $X_i$, $i = 1, \ldots, k$, are independent according to Definition 1.7 iff
$$F_{(X_1, \ldots, X_k)}(x_1, \ldots, x_k) = F_{X_1}(x_1) \cdots F_{X_k}(x_k), \quad (x_1, \ldots, x_k) \in \mathbb{R}^k.$$
(Take $\mathcal{C}_i = \{(a, b] : a \in \mathbb{R},\, b \in \mathbb{R}\}$, $i = 1, \ldots, k$.)
- If $X$ and $Y$ are independent random vectors, then so are $g(X)$ and $h(Y)$ for Borel functions $g$ and $h$.
- Two events $A$ and $B$ are independent iff $P(B\mid A) = P(B)$, which means that $A$ provides no information about the probability of the occurrence of $B$.

Proposition 1.11
Let $X$ be a random variable with $E|X| < \infty$ and let $Y_i$ be random $k_i$-vectors, $i = 1, 2$. Suppose that $(X, Y_1)$ and $Y_2$ are independent. Then
$$E[X\mid(Y_1, Y_2)] = E(X\mid Y_1) \quad \text{a.s.}$$
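Before the proof, a numerical illustration of Proposition 1.11 (my own sketch; the jointly Gaussian model is an assumption chosen so that the conditional mean is linear and can be recovered by least squares): conditioning on an independent $Y_2$ in addition to $Y_1$ does not change the conditional expectation.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
Y1 = rng.normal(size=n)
Y2 = rng.normal(size=n)                 # independent of (X, Y1)
X = 2.0 * Y1 + rng.normal(size=n)       # E(X | Y1) = 2 Y1

# In this jointly Gaussian setting E[X | Y1, Y2] is linear in (Y1, Y2),
# so least squares recovers it; the Y2 coefficient should be about 0.
design = np.column_stack([np.ones(n), Y1, Y2])
coef, *_ = np.linalg.lstsq(design, X, rcond=None)
print(coef)    # approximately [0, 2, 0]: Y2 adds nothing beyond Y1
```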

Proof
First, $E(X\mid Y_1)$ is Borel on $(\Omega, \sigma(Y_1, Y_2))$, since $\sigma(Y_1) \subset \sigma(Y_1, Y_2)$. Next, we need to show that for any Borel set $B \in \mathcal{B}^{k_1 + k_2}$,
$$\int_{(Y_1, Y_2)^{-1}(B)} X\,dP = \int_{(Y_1, Y_2)^{-1}(B)} E(X\mid Y_1)\,dP.$$
If $B = B_1 \times B_2$, where $B_i \in \mathcal{B}^{k_i}$, then $(Y_1, Y_2)^{-1}(B) = Y_1^{-1}(B_1) \cap Y_2^{-1}(B_2)$ and
$$\int_{Y_1^{-1}(B_1) \cap Y_2^{-1}(B_2)} E(X\mid Y_1)\,dP = \int I_{Y_1^{-1}(B_1)} I_{Y_2^{-1}(B_2)} E(X\mid Y_1)\,dP = \int I_{Y_1^{-1}(B_1)} E(X\mid Y_1)\,dP \int I_{Y_2^{-1}(B_2)}\,dP$$
$$= \int I_{Y_1^{-1}(B_1)} X\,dP \int I_{Y_2^{-1}(B_2)}\,dP = \int I_{Y_1^{-1}(B_1)} I_{Y_2^{-1}(B_2)} X\,dP = \int_{Y_1^{-1}(B_1) \cap Y_2^{-1}(B_2)} X\,dP,$$

where the second and the next-to-last equalities follow from the independence of $(X, Y_1)$ and $Y_2$, and the third equality follows from the fact that $E(X\mid Y_1)$ is the conditional expectation of $X$ given $Y_1$. This shows the result for $B = B_1 \times B_2$.

Note that $\mathcal{B}^{k_1} \times \mathcal{B}^{k_2}$ is a $\pi$-system. We can show that the following collection is a $\lambda$-system:
$$\mathcal{H} = \left\{ B \subset \mathbb{R}^{k_1 + k_2} : \int_{(Y_1, Y_2)^{-1}(B)} X\,dP = \int_{(Y_1, Y_2)^{-1}(B)} E(X\mid Y_1)\,dP \right\}.$$
Since we have already shown that $\mathcal{B}^{k_1} \times \mathcal{B}^{k_2} \subset \mathcal{H}$, we have $\mathcal{B}^{k_1 + k_2} = \sigma(\mathcal{B}^{k_1} \times \mathcal{B}^{k_2}) \subset \mathcal{H}$, and thus the result follows.

Remark
The result in Proposition 1.11 still holds if $X$ is replaced by $h(X)$ for any Borel $h$ and, hence, $P(A\mid Y_1, Y_2) = P(A\mid Y_1)$ a.s. for any $A \in \sigma(X)$, if $(X, Y_1)$ and $Y_2$ are independent.

Conditional independence
Let $X$, $Y$, and $Z$ be random vectors. We say that, given $Z$, $X$ and $Y$ are conditionally independent iff $P(A\mid X, Z) = P(A\mid Z)$ a.s. for any $A \in \sigma(Y)$.

Proposition 1.11 can be stated as: if $Y_2$ and $(X, Y_1)$ are independent, then, given $Y_1$, $X$ and $Y_2$ are conditionally independent.

Discussion
Conditional independence is a very important concept in statistics. For example, if $X$ and $Z$ are covariates associated with a response $Y$, and if, given $Z$, $X$ and $Y$ are conditionally independent, then once we have $Z$, we do not need $X$ to study the relationship between $Y$ and the covariates. The dimension of the covariate vector is reduced without losing information (sufficient dimension reduction). Although $X$ may be unconditionally dependent on $Y$, it is related to $Y$ through $Z$.
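A small simulation (my own toy example, not from the lecture) makes the distinction concrete: below, $X$ and $Y$ are marginally dependent because both are driven by $Z$, yet they are conditionally independent given $Z$, so the partial correlation given $Z$ is essentially zero.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
Z = rng.normal(size=n)
X = Z + rng.normal(size=n)      # X and Y are both driven by Z,
Y = Z + rng.normal(size=n)      # but have independent noise: X and Y are cond. indep. given Z

print(np.corrcoef(X, Y)[0, 1])  # about 0.5: X and Y are (unconditionally) dependent

# Remove the part explained by Z; in this Gaussian model the residuals are the
# noise terms, so their correlation (the partial correlation given Z) is about 0.
rx = X - Z * np.dot(X, Z) / np.dot(Z, Z)
ry = Y - Z * np.dot(Y, Z) / np.dot(Z, Z)
print(np.corrcoef(rx, ry)[0, 1])
```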