If Y and Y 0 satisfy (1-2), then Y = Y 0 a.s.

Similar documents
Lectures 22-23: Conditional Expectations

MATHS 730 FC Lecture Notes March 5, Introduction

Brownian Motion and Conditional Probability

Reminder Notes for the Course on Measures on Topological Spaces

(1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define

Measure Theory on Topological Spaces. Course: Prof. Tony Dorlas 2010 Typset: Cathal Ormond

Advanced Probability

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

HILBERT SPACES AND THE RADON-NIKODYM THEOREM. where the bar in the first equation denotes complex conjugation. In either case, for any x V define

MATH 418: Lectures on Conditional Expectation

5 Measure theory II. (or. lim. Prove the proposition. 5. For fixed F A and φ M define the restriction of φ on F by writing.

Real Analysis Notes. Thomas Goller

l(y j ) = 0 for all y j (1)

18.175: Lecture 3 Integration

02. Measure and integral. 1. Borel-measurable functions and pointwise limits

Lecture 3: Expected Value. These integrals are taken over all of Ω. If we wish to integrate over a measurable subset A Ω, we will write

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 9 10/2/2013. Conditional expectations, filtration and martingales

Lebesgue-Radon-Nikodym Theorem

MATH MEASURE THEORY AND FOURIER ANALYSIS. Contents

Functional Analysis. Martin Brokate. 1 Normed Spaces 2. 2 Hilbert Spaces The Principle of Uniform Boundedness 32

Lebesgue Integration: A non-rigorous introduction. What is wrong with Riemann integration?

1. Probability Measure and Integration Theory in a Nutshell

THEOREMS, ETC., FOR MATH 515

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

ABSTRACT CONDITIONAL EXPECTATION IN L 2

A D VA N C E D P R O B A B I L - I T Y

Examples of Dual Spaces from Measure Theory

An introduction to some aspects of functional analysis

THEOREMS, ETC., FOR MATH 516

Chapter 8. General Countably Additive Set Functions. 8.1 Hahn Decomposition Theorem

It follows from the above inequalities that for c C 1

1 Stochastic Dynamic Programming

2 Lebesgue integration

Lecture 4 Lebesgue spaces and inequalities

Part II Probability and Measure

Probability and Measure

Real Analysis Qualifying Exam May 14th 2016

Conditional expectation

Integration on Measure Spaces

3 Integration and Expectation

ABSTRACT EXPECTATION

Measure and integration

II - REAL ANALYSIS. This property gives us a way to extend the notion of content to finite unions of rectangles: we define

Analysis Comprehensive Exam Questions Fall 2008

Lecture 5 Theorems of Fubini-Tonelli and Radon-Nikodym

It follows from the above inequalities that for c C 1

A VERY BRIEF REVIEW OF MEASURE THEORY

Chapter 6. Integration. 1. Integrals of Nonnegative Functions. a j µ(e j ) (ca j )µ(e j ) = c X. and ψ =

+ 2x sin x. f(b i ) f(a i ) < ɛ. i=1. i=1

The Lebesgue Integral

( f ^ M _ M 0 )dµ (5.1)

Tools from Lebesgue integration

Real Analysis, 2nd Edition, G.B.Folland Signed Measures and Differentiation

Signed Measures. Chapter Basic Properties of Signed Measures. 4.2 Jordan and Hahn Decompositions

Maths 212: Homework Solutions

Overview of normed linear spaces

Measures and Measure Spaces

1 Inner Product Space

Stat 643 Review of Probability Results (Cressie)

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

INTRODUCTION TO MEASURE THEORY AND LEBESGUE INTEGRATION

Product measures, Tonelli s and Fubini s theorems For use in MAT4410, autumn 2017 Nadia S. Larsen. 17 November 2017.

Real Analysis, 2nd Edition, G.B.Folland Elements of Functional Analysis

David Hilbert was old and partly deaf in the nineteen thirties. Yet being a diligent

1 Topology Definition of a topology Basis (Base) of a topology The subspace topology & the product topology on X Y 3

Riesz Representation Theorems

Summary of Real Analysis by Royden

Stanford Mathematics Department Math 205A Lecture Supplement #4 Borel Regular & Radon Measures

Three hours THE UNIVERSITY OF MANCHESTER. 24th January

4 Expectation & the Lebesgue Theorems

CHAPTER 6. Differentiation

Random Process Lecture 1. Fundamentals of Probability

3. (a) What is a simple function? What is an integrable function? How is f dµ defined? Define it first

L p Functions. Given a measure space (X, µ) and a real number p [1, ), recall that the L p -norm of a measurable function f : X R is defined by

Functional Analysis Winter 2018/2019

Real Analysis Problems

Chapter 1. Measure Spaces. 1.1 Algebras and σ algebras of sets Notation and preliminaries

Notes on Integrable Functions and the Riesz Representation Theorem Math 8445, Winter 06, Professor J. Segert. f(x) = f + (x) + f (x).

212a1214Daniell s integration theory.

Vector measures of bounded γ-variation and stochastic integrals

2 Measure Theory. 2.1 Measures

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product

Construction of a general measure structure

Chapter 4. Measure Theory. 1. Measure Spaces

n [ F (b j ) F (a j ) ], n j=1(a j, b j ] E (4.1)

Measurable functions are approximately nice, even if look terrible.

AN INTRODUCTION TO GEOMETRIC MEASURE THEORY AND AN APPLICATION TO MINIMAL SURFACES ( DRAFT DOCUMENT) Academic Year 2016/17 Francesco Serra Cassano

Measure Theory & Integration

MEASURE AND INTEGRATION. Dietmar A. Salamon ETH Zürich

Advanced Analysis Qualifying Examination Department of Mathematics and Statistics University of Massachusetts. Monday, August 26, 2013

1 Orthonormal sets in Hilbert space

Lecture 2: Random Variables and Expectation

FUNCTIONAL ANALYSIS CHRISTIAN REMLING

Hilbert Spaces. Hilbert space is a vector space with some extra structure. We start with formal (axiomatic) definition of a vector space.

MATH 650. THE RADON-NIKODYM THEOREM

Math 5051 Measure Theory and Functional Analysis I Homework Assignment 3

Notes on Measure, Probability and Stochastic Processes. João Lopes Dias

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory

I. ANALYSIS; PROBABILITY

MATH 426, TOPOLOGY. p 1.

Transcription:

20 6. CONDITIONAL EXPECTATION Having discussed at length the limit theory for sums of independent random variables we will now move on to deal with dependent random variables. An important tool in this is that of conditioning. In this lecture we will discuss the simplest example of this technique, leading to the definition of conditional expectation. Definition 6.1 Consider a probability space (W,F,P) and a random variable X 2 L 1 (W,F,P). Let G F be a sigma-algebra. We say that the random variable Y is a conditional expectation of X given G if (1) Y is G -measurable, and (2) we have E(Y 1 A )=E(X1 A ) for every A 2 G. The last condition in particular implies (choosing A := {Y 0}) that Y is integrable. We first check that the random variable Y is determined uniquely by properties (1-2), that is, assuming it exists in the first place. Lemma 6.2 If Y and Y 0 satisfy (1-2), then Y = Y 0 a.s. Proof. Let A e := {Y Y 0 + e}. Then A e 2 G because both Y and Y 0 are G -measurable by (1), and so it thus Y Y 0. Therefore, by (2), E(Y 1 Ae )=E(X1 Ae )=E(Y 0 1 Ae ). (6.1) All these numbers are finite and so we may subtract the left and right-hand sides to get E((Y Y 0 )1 Ae )=0. But Y Y 0 e on A e and so we get P(A e )=0. Taking union over e 2{2 n : n 1} yields Y Y 0 a.s. The other inequality follows by symmetry. Since the conditional expectation of X given G is determined up to a modification on a null set, we will henceforth write E(X G ) to denote any version of this function. However, we first have to address existence: Lemma 6.3 For each X 2 L 1, the conditional expectation E(X G ) exists. The proof will be based on the well known Lebesgue decomposition of measures. Given two measures µ,n on the same measurable space, recall that µ is said to be absolutely continuous with respect to n with the notation µ n if n(a)=0 implies µ(a)=0. If µ and n are in turn supported on disjoint sets, i.e., there exists A such that µ(a)=0 and n(a c )=0, then we say that these measures are mutually singular, with notation µ? n. Let µ, n be s-finite mea- Theorem 6.4 (Lebesgue decomposition, Radon-Nikodym Theorem) sures on (W,F ). Then there are measures µ 1, µ 2 so that µ = µ 1 + µ 2, µ 1 n, µ 2? n. (6.2) Moreover, there exists an F -measurable, non-negative f : W! R such that µ 1 (A)= f (x)n(dx), A 2 F. (6.3) A

We remark that the function f from the second part of this theorem is called the Radon- Nikodym derivative of µ 1 with respect to n, sometimes also denoted dµ 1 dn. There is a classical proof of the Lebesgue decomposition which is based on the concept of a signed measure and the Hahn and Jordan decompositions. The proof we will present will in turn be based on Hilbert space techniques. Recall that a linear vector space H is a Hilbert space if it complete in the norm k kassociated with the scalar product (, ) on H. An example of a Hilbert space is L 2 (W,F, µ). A linear map j : H! R (assume H is over reals) is a continuous linear functional if j(x) appleckxk for all x 2 H. Relatively straightforward functional-theoretic arguments then yield: Theorem 6.5 (Riesz representation) Let H be a Hilbert space and j 2 H a continuous linear functional. Then there is a 2 H such that j(x)=(a,x), x 2 H. (6.4) Sketch of main ideas. First we realize that B := {x 2 H : j(x)=0} is a closed linear subspace of H. The case j = 0 is handled trivially, so we may suppose j 6= 0. Then A 6= H and so there exists some x 2 H \ B. Considering the orthogonal decomposition of x = a + b, where b 2 B and a? B, it is then checked that a has the stated property. With this fact in hand, we can now give: Proof of Theorem 6.4. The s-finiteness assumption permits us to assume that both µ and n are finite measures. (Otherwise partition the space into parts where this is true.) Consider the Hilbert space H := L 2 (W,F, µ + n) and define the functional j( f ) := f (x)µ(dx), f 2 H. (6.5) Since µ is finite, we have by Cauchy-Schwarz that j( f ) apple µ(w) 1/2 k f k L 2 (µ) apple Ck f k H. (6.6) Since j is obviously linear, the above expression defines a continuous linear function. The Riesz representation ensures the existence of an h 2 H such that f (x)µ(dx)= f (x)h(x)[µ + n](dx), f 2 L 2 (W,F, µ + n). (6.7) We now make a couple of observations: First, n(h 1)=0. (6.8) Indeed, plugging f := 1 {h 1} implies µ(h 1)=µ(h 1)+n(h 1) implying the claim. Similarly, we observe that µ(h > 1)=0, (6.9) because taking f := 1 {h 1+e} yields µ(h 1+e) (1+e)µ(h 1+e) and so µ(h 1+e)=0 for all e > 0; the claim is obtained by taking a union along a sequence of e # 0. Finally, we also claim that [µ + n](h < 0)=0. (6.10) 21

22 Indeed, taking f := 1 {h< e} yields e[µ + n](h < e) 0 implying that [µ + n](h < e)=0; taking a union along a sequence e # 0 then finishes the claim as well. With these observations in hand, we now define µ 1 (dx) := 1 {0appleh(x)<1} µ(dx) and µ 2 (dx) := 1 {h(x)=1} µ(dx). (6.11) Then µ 1 + µ 2 = µ and µ 2? n by the above claims. For the remaining properties to proof, we rewrite (6.7) as f (x) 1 h(x) µ(dx)= f (x)h(x)n(dx), f 2 L 2 (W,F, µ + n) (6.12) On the left hand side we can now replace µ by µ 1 ; setting f (x) := 1 {0appleh(x)<1 e} 1 1 h(x) 1 A (6.13) which is bounded and thus in L 2 (W,F, µ + n) for e > 0, and taking e # 0 with the help of the Monotone Convergence Theorem then yields h(x) µ 1 (A)= 1 h(x) 1 {0appleh(x)<1} n(dx), A 2 F. (6.14) A This is the desired expression. In particular, µ 1 (A)=0 if n(a)=0, so µ 1 n as claimed. We are now also ready to conclude the existence of the conditional expectation: Proof of Lemma 6.3. Suppose first that X 0. Define the measure µ on G by µ(a) := E(X1 A ). This measure is s-finite and it is absolutely continuous with respect to P the restriction of the probability measure to G. Hence, there is G -measurable Y 0 such that µ(a)= Y (w)p(dw), A 2 G. (6.15) A It follows that Y satisfies the properties (1-2) of conditional expectation. This constructs E(X G ) for non-negative X; the general integrable X are taken care of by expanding into positive and negative parts and applying the above to each of the separately. We proceed to discuss some basic properties of conditional expectation: Lemma 6.6 Assuming all random variables below are integrable, we have: (1) (triviality on constants) E(1 G )=1. (2) (positivity) X 0 ) E(X G ) 0 a.s. (3) (linearity) E(aX + by G )=ae(x G )+be(y G ) (4) (Monotone Convergence Theorem) 0 apple X n " X ) E(X n G ) " E(X G ). (5) E(E(X G )) = E(X). (6) If X is G measurable, then E(X G )=X a.s. Proof. These properties are direct applications of the definition and (for (4)) also the Monotone Convergence Theorem. We leave the details to the reader. A consequence of (3) is then:

Corollary 6.7 (Conditional Fatou lemma) Let X n 0 be integrable. Then A consequence of (5) is in turn: Lemma 6.8 and If X 2 L 1 (W,F,P), then lim inf n! E(X n G ) E liminf n! X n G. (6.16) E(X G ) apple E X G (6.17) E(X G ) L 1 (W,G,P) applekxk L 1 (W,F,P). (6.18) In particular, X 7! E(X G ) is a continuous linear map in fact, a contraction of L 1 (W,F,P) onto L 1 (W,G,P). Proof. We have X applex apple X and so E X G apple E(X G ) apple E X G a.s. (6.19) This proves (6.17); taking expectation then yields (6.18). Since X 7! E(X G ) is obviously linear, it defines a linear map of L 1 (W,F,P) onto L 1 (W,G,P). The operator norm of this map is less than one so this map is in fact a contraction. The map is onto as it an identity on L 1 (W,G,P). A standard argument also yields: Lemma 6.9 (Conditional Cauchy-Schwarz inequality) Suppose X,Y 2 L 2 (W,F,P) and suppose G F is a sigma algebra. Then 2 E(XY G ) apple E(X 2 G )E(Y 2 G ) a.s. (6.20) 23 Proof. Consider the function l 7! E (X ly ) 2 G. (6.21) Let W 0 be the event that this function is non-negative for all l 2 Q. Then W 0 2 G with P(W 0 )=1. Expanding the square and using that the discriminant of the quadratic polynomial which is nonnegative on all rationals is non-positive, we get that the inequality in (6.20) holds on W 0. Lemma 6.10 (Conditional Jensen s inequality) below. Then for any X such that f(x) 2 L 1, Let f : R! R be convex and bounded from E f(x) G f E(X G ) a.s. (6.22) Proof. Define L := {(a,b) 2 R 2 : f(x) > ax + b8x 2 R}. Supposing that f is not linear and bounded from below, the set L is non-empty and open in the topology of R 2. In particular, L \ Q 2 is dense in L. Then f(x)= sup (ax + b), x 2 R. (6.23) (a,b)2l \Q 2 Since f(x) ax + b for each (a,b) 2 L, and since f(x) and thus thanks to our conditions on f also X is integrable, we thus have E f(x) G ae(x G )+b a.s. (6.24)

24 Let W a,b be the event where the inequality holds. Then P(W a,b )=1. Define W 0 := T (a,b)2l \Q 2 W a,b and note that P(W 0 )=1. The proof is completed by taking supremum over (a,b) 2 L \ Q 2 in (6.24) which implies that the desired inequality holds on W 0. A consequence of the conditional Jensen inequality is a generalization of Lemma 6.8: Lemma 6.11 For any p 2 [1, ], the map X 7! E(X G ) is a continuous linear transform of L p (W,F,P) onto L p (W,G,P). In fact, this map is a contraction. Proof. Let p 2 [1, ). By Jensen s inequality we have p E X G apple E X p G a.s. (6.25) Taking expectations and raising both sides to 1/p yields the result. For p := we notice that X applekxk implies and so the result follows in this case as well. As a homework assignment, we also established: E(X G ) applekxk a.s. (6.26) Lemma 6.12 (Conditional Hölder inequality) Let p,q 2 [1, ] be such that 1 / p + 1 / q = 1. Then for any sigma algebra G F, all X 2 L p (W,F,P) and all Y 2 L q (W,F,P), E(XY G ) apple E X p G 1 / p E Y q G 1 / q a.s. (6.27) Finally, we will examine the dependence of the conditional expectation on the underlying sigma algebra. First we note this useful principle: Lemma 6.13 ( Smaller always wins principle) for any X 2 L 1 (W,F,P) as well as Let G 1 G 2 be sub sigma-algebras of F. Then E E(X G 1 ) G 2 = E(X G 1 ) a.s. (6.28) E E(X G 2 ) G 1 = E(X G 1 ) a.s. (6.29) In short, sequential conditioning always reduces to that with respect to the smaller sigma algebra. Proof. For the first identity, note that E(X G 1 ) is G 2 -measurable (because it is already G 1 - measurable) and the expectation of both sides agree on any event A 2 G 2. The proof of uniqueness of the conditional expectation thus applies to show equality a.s. For the second identity we note that both sides are automatically G 1 -measurable. So picking A 2 G 1 and applying condition (2) from the definition of conditional expectation, we get E(X1 A ) = A2G2 E E(X G 2 )1 A = A2G1 E E E(X G 2 ) G 1 1 A (6.30) Again, by the uniqueness argument, this implies the second identity as well. Finally, we observe the following useful fact:

25 Lemma 6.14 Suppose that s(x) and G are independent. Then E(X G )=E(X) a.s. (6.31) Proof. Constants are universally measurable so condition (1) in the definition of conditional expectation holds. For condition (2) we notice that for any A 2 G, the stated independence means Hence, E(X) serves as a version of E(X G ). E(X1 A )=E(X)E(1 A )=E E(X)1 A. (6.32)

26 7. CONDITIONAL DISTRIBUTION The conditional expectation shares so many properties with expectation that one may wonder if it can also be realized as an integral with respect to a measure. This naturally yields to the consideration of conditional distribution and conditional probability. We start with the former object first leaving the latter to later time in the course. Theorem 7.1 Let X be a random variable on a probability space (W,F,P) and let G F be a sigma algebra. Then there exists a map µ X : W B(R)! [0,1] such that (1) for each w 2 W,B7! µ X (w,b) is a probability measure on B(R), (2) for each B 2 B(R), w 7! µ X (w,b) is G -measurable, and (3) whenever f : R! R is Borel measurable with f (X) 2 L 1 (W,F,P), then E f (X) G (w)= f (x)µ X (w,dx) (7.1) for a.e. w. We will call µ X (w, ) the conditional distribution of X given G. Also note that the null set in (7.1) is permitted to depend on f. Proof. For each q 2 Q, let us pick a version of E(1 {Xappleq} G ) and use these objects to define the following sets: First, for q < q 0 rational we set Next, for q 2 Q we define W q,q 0 := w 2 W: E(1 {Xappleq} G )(w) apple E(1 {Xappleq 0 } G )(w). (7.2) W q := n w 2 W: E(1 {Xappleq} G )(w)= inf q2q q 0 >q q 0 2Q In addition, we also define n o W + := w 2 W: sup E(1 {Xappleq 0 } G )(w)=1 o E(1 {Xappleq 0 } G )(w). (7.3) n o (7.4) W := w 2 W: inf E(1 {Xappleq q2q 0 } G )(w)=0 By the properties of the conditional expectation, all of these are full-measure events in G. So, if we define \ \ W 0 := W q,q 0 \ W q \ W + \ W, (7.5) q 0 >q q 0 2Q also W 0 2 G with P(W 0 )=1. We now move to the construction of µ X. First set, for w 2 W and q 2 Q, ( E(1 F(w,q) := {Xappleq} G )(w), if w 2 W 0, 1 {q 0}, otherwise. q2q (7.6)

Then q 7! F(w, q) is non-decreasing, right-continuous on Q with limits 1, resp., 0 as q! +, resp., q!. We may thus extend it to a function on W R by F(w,x) := inf F(w, q) (7.7) q>x q2q and get a function which is, for each w, non-decreasing and right-continuous on R with the said limits at ±. In particular, x 7! F(w,x) is a distribution function and so, by the existence theorems from 275A, there is a Borel probability measure µ X (w, ) such that F(w,x)=µ X w,(,x], x 2 R. (7.8) It now remains to verify the state properties. Property (1) is ensured by the very construction. For property (2), we consider the class of sets L := A 2 B(R): w 7! µ X (w,a) is G -measurable (7.9) It is easy to check that L contains W, /0, is closed under finite unions and complements, and also increasing limits and hence it is a sigma algebra. In light of (7.8) and the fact that w 7! F(w,x) is G -measurable, L contains all sets of the form (,q] for q 2 Q. It follows that L = B(R) thus proving (2). Condition (3) is verified similarly: First we note that (7.1) holds for f s of the form f (x) := 1 {xappleq} for q 2 Q. The class of non-negative f s satisfying (7.1) is closed under additions a monotone limits and hence (by the Monotone Class Theorem) it contains all nonnegative measurable functions. The extension to integrable functions is then immediate. 27