Wahrscheinlichkeitstheorie Prof. Schweizer Additions to the Script


Thomas Rast, WS 06/07

Warning: We are sure there are lots of mistakes in these notes. Use at your own risk! Corrections and other feedback would be greatly appreciated. If you report an error, please always state which version (the first number on the Id line below) you found it in.

Introduction

This was going to be a short collection of all chapters explicitly mentioned as not in the script, but as the lecture turned out to use different notation almost everywhere, I started typing it all. So the first few sections are just additions to the script, and more importantly, many small additions have been left out deliberately; if you think they should contain the complete notes, feel free to copy someone's hand-written lecture notes and TeX them! Starting with Martingales, however, the notes should be complete, and any missing subsections would be an error.

Contents

II.2 Construction of (Discrete-Time) Stochastic Processes
II.3 Kolmogorov's Consistency Theorem
III Conditional Expectation
III.1 Definition and Construction
III.2 Properties of Conditional Expectations
III.3 Regular Conditional Distributions
IV Martingales
IV.1 Definitions and Examples
IV.2 Playing Systems, Stopping Times and Stopping Theorem

IV.3 The Convergence Theorem
IV.4 Applications
IV.5 Branching Processes
IV.6 Supermartingales and Inequalities
IV.7 Differentiation of Measures, Radon-Nikodým
V Weak Convergence of Probability Measures
V.1 Integration à la Daniell-Stone
V.2 Weak Convergence of Probability Measures on Metric Spaces
V.3 Tightness and Prohorov's Theorem
V.4 Weak Convergence on C[0, 1]
V.5 Characteristic functions

$Id$

II.2 Construction of (Discrete-Time) Stochastic Processes

Goal: construct a model for a system with state space $(S, \mathcal{S})$, possibly very many coordinates $\lambda \in \Lambda$, and describing stochastic behaviour. First we look for necessary conditions for such a model.

Start with $(S, \mathcal{S})$ measurable, $\Lambda$ an index set, $(\Omega, \mathcal{F}, P)$ a probability space and $Y = (Y_\lambda)_{\lambda \in \Lambda}$ a stochastic process (a family of random variables indexed by $\Lambda$ with state space $S$): each $Y_\lambda$ is an $S$-valued random variable, and all are defined on $\Omega$, i.e. $Y_\lambda \colon \Omega \to S$ is measurable ($\mathcal{F}$-$\mathcal{S}$) for all $\lambda \in \Lambda$.

What is the distribution of $Y$ under $P$, and what properties does it have? View $Y$ as a mapping from $\Omega$ to the product space
$$S^\Lambda := \{x = (x_\lambda)_{\lambda \in \Lambda} \mid x_\lambda \in S \ \forall \lambda\}$$
with coordinate maps $X_\lambda \colon S^\Lambda \to S$, $x \mapsto X_\lambda(x) := x_\lambda$, and $\sigma$-field
$$\mathcal{S}^\Lambda := \sigma(X_\lambda,\ \lambda \in \Lambda) = \sigma\big(\{\{X_\lambda \in A\} \mid \lambda \in \Lambda,\ A \in \mathcal{S}\}\big),$$
so each $X_\lambda$ is an $S$-valued random variable on $(S^\Lambda, \mathcal{S}^\Lambda)$. (Note that for $\Lambda$ uncountable, $\mathcal{S}^\Lambda$ is considerably smaller than the product $\sigma$-field $\prod_{\lambda \in \Lambda} \mathcal{S}$.)

A $\cap$-closed generator of $\mathcal{S}^\Lambda$ is given by the family $\mathcal{Z}$ of cylinder sets
$$Z \in \mathcal{Z}: \quad Z = \prod_{i \in I} A_i \times \prod_{\lambda \in \Lambda \setminus I} S = \{x \mid x_i \in A_i,\ i \in I\}, \qquad A_i \in \mathcal{S},\ I \subseteq \Lambda \text{ finite.}$$
Canonical identification of $Z \in \mathcal{S}^\Lambda$ with $\prod_{i \in I} A_i \in \mathcal{S}^I$.

The distribution of $Y$ under $P$ is $Q := P \circ Y^{-1}$ on $(S^\Lambda, \mathcal{S}^\Lambda)$. In the same way, we can introduce the distribution $Q^{(I)}$ of $(Y_i)_{i \in I}$ under $P$: view (for $I \subseteq \Lambda$ finite) $(Y_i)_{i \in I}$ as a mapping $\Omega \to S^I$ (with $\sigma$-field $\mathcal{S}^I = \bigotimes_{i \in I} \mathcal{S}$, since $I$ is finite) and denote by $Q^{(I)}$ the image of $P$ under that map. $Q^{(I)}$ is called the marginal distribution of $(Y_i)_{i \in I}$ or of $Y$ on $S^I$. $\{Q^{(I)} \mid I \subseteq \Lambda \text{ finite}\}$ is the system of (finite-dimensional) marginal distributions of $Y$ under $P$.

The $Q^{(I)}$ have a consistent (or projective) structure: take $I \subseteq J \subseteq \Lambda$ finite and call $\pi^J_I$ the projection from $S^J$ to $S^I$. Then
$$Q^{(I)} = Q^{(J)} \circ (\pi^J_I)^{-1}.$$

Formally, $\mathcal{S}^I$ is generated by the cylinders $\prod_{i \in I} A_i$, and
$$Q^{(I)}\Big[\prod_{i \in I} A_i\Big] \overset{\text{def}}{=} P[Y_i \in A_i,\ i \in I] = P[Y_i \in A_i,\ i \in I, \text{ and } Y_j \in S,\ j \in J \setminus I] = Q^{(J)}\Big[(\pi^J_I)^{-1}\Big(\prod_{i \in I} A_i\Big)\Big] = \big(Q^{(J)} \circ (\pi^J_I)^{-1}\big)\Big[\prod_{i \in I} A_i\Big].$$

For describing the behaviour of $Y$, it is equivalent to know $P$ or $Q$. How would one construct a model $(\Omega, \mathcal{F}, P)$ and $Y$ with given probabilistic behaviour? Canonical choice: $\Omega := S^\Lambda$, $\mathcal{F} := \mathcal{S}^\Lambda$, $Y := X$ (coordinate maps); then we only need to construct $P$ (or $Q$). There are two basic cases for the "given probabilistic behaviour":

1. Prescribe all finite-dimensional marginals: given is a collection $Q^{(I)}$, $I \subseteq \Lambda$ finite, of probability measures on $(S^I, \mathcal{S}^I)$; look for $P$ on $(\Omega, \mathcal{F})$ such that the projection of $P$ to $(S^I, \mathcal{S}^I)$ agrees with $Q^{(I)}$ for all $I \subseteq \Lambda$ finite. Interpretation: for each finite subsystem, we prescribe a local model, and we look for a global model with the given local behaviour. Problems: existence? uniqueness of $P$? A necessary condition for existence is that the $Q^{(I)}$ are consistent. If $S$ is nice, this is also sufficient, see II.3. Uniqueness is easy, see the lemma below.

2. Prescribe an initial distribution and successive transition probabilities: this is typical for temporal evolution in discrete time, with $\Lambda$ countable, indexed as $\mathbb{N}_0$. More precisely, suppose $(S_i, \mathcal{S}_i)$ are measurable spaces, $i \in \mathbb{N}_0$. (Idea: at each time $\lambda = i \in \mathbb{N}_0$, we can allow a different state space.) Define
$$(S^n, \mathcal{S}^n) := \Big(\prod_{i=0}^n S_i,\ \bigotimes_{i=0}^n \mathcal{S}_i\Big).$$
This approach is elaborated in the script, section II.2.

Lemma II.2.1. A probability measure $P$ on $(S^\Lambda, \mathcal{S}^\Lambda)$ is uniquely determined by its finite-dimensional marginals: for any consistent family $\{Q^{(I)} \mid I \subseteq \Lambda \text{ finite}\}$ of finite-dimensional marginals, there is at most one $P$ on $(S^\Lambda, \mathcal{S}^\Lambda)$ with these marginals, i.e. with $P|_{(S^I, \mathcal{S}^I)} = Q^{(I)}$ for all $I \subseteq \Lambda$ finite.
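The projectivity relation $Q^{(I)} = Q^{(J)} \circ (\pi^J_I)^{-1}$ can be seen concretely for finite state spaces, where projecting means summing out the discarded coordinates. A small sketch (the joint law `Q_J` and all helper names are our own choices, not from the script):

```python
# Check of the consistency (projectivity) relation Q^(I) = Q^(J) ∘ (π^J_I)^{-1}
# for a hand-made joint pmf on S^J with S = {0, 1} and J = {0, 1, 2}.
from itertools import product

S = [0, 1]
J = (0, 1, 2)

# an arbitrary pmf Q^(J) on S^J (weights sum to 1)
Q_J = dict(zip(product(S, repeat=3),
               [0.1, 0.05, 0.2, 0.05, 0.1, 0.2, 0.1, 0.2]))

def marginal(Q, J, I):
    """Image of the pmf Q under the projection π^J_I: sum out coordinates not in I."""
    pos = [J.index(i) for i in I]
    out = {}
    for x, w in Q.items():
        key = tuple(x[p] for p in pos)
        out[key] = out.get(key, 0.0) + w
    return out

Q_01 = marginal(Q_J, J, (0, 1))   # Q^({0,1})
Q_0 = marginal(Q_J, J, (0,))      # Q^({0})

# projectivity: projecting J -> {0,1} -> {0} agrees with projecting J -> {0} directly
Q_0_via_01 = marginal(Q_01, (0, 1), (0,))
assert all(abs(Q_0_via_01[k] - Q_0[k]) < 1e-9 for k in Q_0)
```

Any family of marginals produced this way from one joint law is automatically consistent; the content of II.3 is the converse direction.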

Proof. Almost obvious: for any $Z \in \mathcal{Z}$, $Z = \big(\prod_{i \in I} A_i\big) \times \big(\prod_{\lambda \notin I} S\big)$, we have $P[Z] = Q^{(I)}\big[\prod_{i \in I} A_i\big]$, so $P$ is determined on $\mathcal{Z}$, and $\mathcal{Z}$ is $\cap$-closed and generates $\mathcal{S}^\Lambda$.

II.3 Kolmogorov's Consistency Theorem

Goal: construct a model for stochastic processes with an arbitrary index set. In preparation, a general comment on the structure of probability measures on product spaces: let $(S_i, \mathcal{S}_i)$, $i = 1, 2$, be measurable spaces, $P_1$ a probability measure on $(S_1, \mathcal{S}_1)$, $K$ a transition kernel from $(S_1, \mathcal{S}_1)$ to $(S_2, \mathcal{S}_2)$; this yields $P = P_1 \otimes K$ on $(S_1 \times S_2, \mathcal{S}_1 \otimes \mathcal{S}_2)$. Conversely, in nice situations every probability measure $Q$ on $(S_1 \times S_2, \mathcal{S}_1 \otimes \mathcal{S}_2)$ has this structure.

Definition. $S$ is a Polish space if it is metrizable with a metric $d$ such that $(S, d)$ is a complete and separable metric space. $(S, \mathcal{S})$ is a Borel space if there exist $A \in \mathcal{B}(\mathbb{R})$ and $\varphi\colon S \to A$ such that $\varphi$ is a bijection and $\varphi, \varphi^{-1}$ are both measurable.

Proposition II.3.1. Let $(S_1, \mathcal{S}_1)$ be a measurable space and $S_2$ a Polish space with $\mathcal{S}_2 = \mathcal{B}(S_2)$. Then every probability measure $Q$ on $(S_1 \times S_2, \mathcal{S}_1 \otimes \mathcal{S}_2)$ has the form $Q = Q_1 \otimes K$ for a probability measure $Q_1$ on $(S_1, \mathcal{S}_1)$ and a kernel $K$ from $(S_1, \mathcal{S}_1)$ to $(S_2, \mathcal{S}_2)$.

Remark: This is also true if $(S_2, \mathcal{S}_2)$ is Borel.

Proof. Let $X_i$ be the coordinate maps, $i = 1, 2$. Choose $Q_1 := Q \circ X_1^{-1}$, the distribution of $X_1$ under $Q$, and $K$ a regular conditional distribution of $X_2$ given $X_1$; see later.

In special cases, we can construct $K$ explicitly:

Example: Suppose $S_1$ is countable, $\mathcal{S}_1 = 2^{S_1}$. Then define
$$K(x_1, \cdot) := \begin{cases} Q[X_2 \in \cdot \mid X_1 = x_1] & \text{if } Q[\{X_1 = x_1\}] \neq 0, \\ \text{any p.m. on } (S_2, \mathcal{S}_2) & \text{if } Q[\{X_1 = x_1\}] = 0. \end{cases}$$
$K$ is a kernel, and $Q = Q_1 \otimes K$: indeed,
$$Q[A_1 \times A_2] = \sum_{x_1 \in A_1} Q[X_1 = x_1,\ X_2 \in A_2] = \sum_{\substack{x_1 \in A_1 \\ Q[\{X_1 = x_1\}] \neq 0}} Q[X_1 = x_1]\, Q[X_2 \in A_2 \mid X_1 = x_1] = \sum_{x_1 \in A_1} Q_1[\{x_1\}]\, K(x_1, A_2) = (Q_1 \otimes K)(A_1 \times A_2).$$

Example: Suppose $Q$ is absolutely continuous with respect to a product measure on $(S_1 \times S_2, \mathcal{S}_1 \otimes \mathcal{S}_2)$, i.e. there are measures $m_i$ on $(S_i, \mathcal{S}_i)$ with $Q \ll m_1 \otimes m_2$, i.e.
$$Q[A] = \int_A f(x_1, x_2)\, m_1(dx_1)\, m_2(dx_2), \qquad A \in \mathcal{S}_1 \otimes \mathcal{S}_2,$$
for some measurable $f\colon S_1 \times S_2 \to [0, \infty)$ with $\int_{S_1 \times S_2} f(x_1, x_2)\, m_1(dx_1)\, m_2(dx_2) = 1$.
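The countable-$S_1$ construction of $K$ can be carried out verbatim on a finite example. A minimal sketch (the joint pmf `Q` and all names are our own illustrative choices):

```python
# Build K from a joint pmf Q on S1 × S2: Q1 is the first marginal and
# K(x1, ·) = Q[X2 ∈ · | X1 = x1] wherever Q1({x1}) ≠ 0; then verify
# Q[A1 × A2] = Σ_{x1 ∈ A1} Q1({x1}) K(x1, A2), i.e. Q = Q1 ⊗ K.
S1 = ["a", "b"]
S2 = [0, 1, 2]
Q = {("a", 0): 0.1, ("a", 1): 0.2, ("a", 2): 0.1,
     ("b", 0): 0.3, ("b", 1): 0.1, ("b", 2): 0.2}

Q1 = {x1: sum(Q[(x1, x2)] for x2 in S2) for x1 in S1}

def K(x1, A2):
    if Q1[x1] == 0:                   # free choice: here the uniform distribution
        return len(A2) / len(S2)
    return sum(Q[(x1, x2)] for x2 in A2) / Q1[x1]

def QK(A1, A2):                        # (Q1 ⊗ K)(A1 × A2)
    return sum(Q1[x1] * K(x1, A2) for x1 in A1)

assert abs(QK(["a"], [0, 2]) - (Q[("a", 0)] + Q[("a", 2)])) < 1e-9
assert abs(QK(S1, S2) - 1.0) < 1e-9
```

Since the product sets $A_1 \times A_2$ form a $\cap$-closed generator, agreement there already gives $Q = Q_1 \otimes K$ everywhere.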

Then $Q_1 = Q \circ X_1^{-1} \ll m_1$; in fact
$$Q_1[A_1] = Q[A_1 \times S_2] = \int_{A_1} \underbrace{\Big(\int_{S_2} f(x_1, x_2)\, m_2(dx_2)\Big)}_{=: f_1(x_1)}\, m_1(dx_1),$$
so $Q_1 \ll m_1$ with density $f_1$. How to get $K$? Intuition:
$$Q\big[X_1 \in (x_1, x_1 + dx_1],\ X_2 \in (x_2, x_2 + dx_2]\big] = f(x_1, x_2)\, m_1(dx_1)\, m_2(dx_2)$$
gives
$$Q\big[X_2 \in (x_2, x_2 + dx_2] \mid X_1 \in (x_1, x_1 + dx_1]\big] = \frac{f(x_1, x_2)\, m_1(dx_1)\, m_2(dx_2)}{f_1(x_1)\, m_1(dx_1)}.$$
So now define $K(x_1, dx_2) := f_2(x_2 \mid x_1)\, m_2(dx_2)$ with
$$f_2(x_2 \mid x_1) := \begin{cases} \dfrac{f(x_1, x_2)}{f_1(x_1)} & \text{if } f_1(x_1) \neq 0, \\ g(x_2) & \text{if } f_1(x_1) = 0, \end{cases}$$
for an arbitrary measurable $g\colon S_2 \to [0, \infty)$ with $\int_{S_2} g(x_2)\, m_2(dx_2) = 1$. $f_2(x_2 \mid x_1)$ is called the conditional density of $X_2$ with respect to $m_2$ given $X_1 = x_1$. Then $K$ is a kernel (by construction), and $Q = Q_1 \otimes K$; the argument is like for the countable case, with integrals replacing sums.

Example: Suppose $(X_1, X_2)$ has a bivariate normal distribution with parameters $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho$; this means $S_1 = S_2 = \mathbb{R}$, $\mathcal{S}_1 = \mathcal{S}_2 = \mathcal{B}(\mathbb{R})$, $m_1 = m_2 = \lambda$ (Lebesgue measure),
$$f(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\Big(-\frac{\xi_1^2 - 2\rho \xi_1 \xi_2 + \xi_2^2}{2(1 - \rho^2)}\Big)$$
with $\xi_i = (x_i - \mu_i)/\sigma_i$. This factorizes as
$$f(x_1, x_2) = \underbrace{\frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\Big(-\frac{\xi_1^2}{2}\Big)}_{=: f_1(x_1)} \cdot \underbrace{\frac{1}{\sqrt{2\pi}\,\sigma_2 \sqrt{1 - \rho^2}} \exp\Big(-\frac{(x_2 - a(x_1))^2}{2\sigma_2^2 (1 - \rho^2)}\Big)}_{=: f_2(x_2 \mid x_1)},$$
where $f_1$ is the density of $N(\mu_1, \sigma_1^2)$, $f_2$ is the density of $N(a(x_1), \sigma_2^2(1 - \rho^2))$, and $a(x_1) := \mu_2 + \rho \frac{\sigma_2}{\sigma_1}(x_1 - \mu_1)$.

Theorem II.3.2 (Kolmogorov consistency theorem). Take any consistent family $\{Q^{(I)} \mid I \subseteq \Lambda \text{ finite}\}$ of finite-dimensional distributions; each $Q^{(I)}$ is a probability measure on $(S^I, \mathcal{S}^I)$, and the $Q^{(I)}$ are consistent. If $S$ is Polish and $\mathcal{S} = \mathcal{B}(S)$, then there exists a probability measure $P$ on $(S^\Lambda, \mathcal{S}^\Lambda)$ having the given $Q^{(I)}$ as marginal distributions. (By Lemma II.2.1, $P$ is unique.)
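The factorization of the bivariate normal density can be sanity-checked numerically at a point: the joint density should equal the $N(\mu_1, \sigma_1^2)$ density times the $N(a(x_1), \sigma_2^2(1-\rho^2))$ density. A sketch with arbitrary parameter values:

```python
# Check f(x1, x2) = f1(x1) · f2(x2 | x1) for the bivariate normal,
# where f2(· | x1) is the N(a(x1), σ2²(1−ρ²)) density and
# a(x1) = µ2 + ρ (σ2/σ1)(x1 − µ1).
import math

mu1, mu2, s1, s2, rho = 1.0, -0.5, 2.0, 1.5, 0.6

def f(x1, x2):                       # joint bivariate normal density
    xi1, xi2 = (x1 - mu1) / s1, (x2 - mu2) / s2
    q = (xi1 * xi1 - 2 * rho * xi1 * xi2 + xi2 * xi2) / (2 * (1 - rho ** 2))
    return math.exp(-q) / (2 * math.pi * s1 * s2 * math.sqrt(1 - rho ** 2))

def normal_pdf(x, m, var):           # one-dimensional N(m, var) density
    return math.exp(-(x - m) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def a(x1):
    return mu2 + rho * (s2 / s1) * (x1 - mu1)

x1, x2 = 0.3, 1.1
lhs = f(x1, x2)
rhs = normal_pdf(x1, mu1, s1 ** 2) * normal_pdf(x2, a(x1), s2 ** 2 * (1 - rho ** 2))
assert abs(lhs - rhs) < 1e-9
```

So conditioning a bivariate normal on $X_1 = x_1$ shifts the mean linearly in $x_1$ and shrinks the variance by the factor $1 - \rho^2$, independently of $x_1$.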

Proof sketch. If $\Lambda$ is countable, we can reduce this to Ionescu-Tulcea by using the result on the structure of probability measures on nice product spaces: choose $P_0 := Q^{(\{0\})}$; write $Q^{(\{0,1\})} = Q^{(\{0\})} \otimes K_1 = P_0 \otimes K_1$; write $Q^{(\{0,1,2\})} = Q^{(\{0,1\})} \otimes K_2$, etc. For the general case, some more work is needed; see Kallenberg.

Why does this work, even for uncountable $\Lambda$? The key point is that we work on $\mathcal{S}^\Lambda$, and this is a lot smaller than the product $\sigma$-field $\prod_{\lambda \in \Lambda} \mathcal{S}$ if $\Lambda$ is uncountable. Indeed:

Lemma II.3.3.
$$\mathcal{S}^\Lambda := \sigma(X_\lambda,\ \lambda \in \Lambda) = \sigma\Big(\Big\{\prod_{j \in J} A_j \times \prod_{\lambda \in J^c} S \ \Big|\ J \subseteq \Lambda \text{ countable},\ A_j \in \mathcal{S}\Big\}\Big) =: \mathcal{G}_1 = \bigcup_{\substack{J \subseteq \Lambda \\ J \text{ countable}}} \sigma(X_j,\ j \in J) =: \mathcal{G}_2.$$
In particular, every event in $\mathcal{S}^\Lambda$ depends on at most countably many coordinates.

Proof. $\mathcal{S}^\Lambda \subseteq \mathcal{G}_1$, because every $X_\lambda$ is $\mathcal{G}_1$-measurable. Conversely, for any $J \subseteq \Lambda$ countable,
$$\prod_{j \in J} A_j \times \prod_{\lambda \in J^c} S = \bigcap_{j \in J} X_j^{-1}(A_j) \in \mathcal{S}^\Lambda.$$
$\mathcal{G}_2 \subseteq \mathcal{S}^\Lambda$ is clear because every $X_j$ is $\mathcal{S}^\Lambda$-measurable. Conversely, for $J := \{\lambda\}$, $X_\lambda$ is measurable with respect to $\sigma(X_j,\ j \in J)$, so $\mathcal{S}^\Lambda \subseteq \mathcal{G}_2$ follows if $\mathcal{G}_2$ is a $\sigma$-field. This is easy: $B_n \in \mathcal{G}_2$ for all $n$ means $B_n \in \sigma(X_j,\ j \in J_n)$ for some countable $J_n \subseteq \Lambda$; $J := \bigcup_{n \in \mathbb{N}} J_n$ is still countable, and $\bigcup_{n \in \mathbb{N}} B_n \in \sigma(X_j,\ j \in J)$.

III Conditional Expectation

Note: This section is incomplete in several places. I strongly advise you to use the script. If you would like to help filling the gaps, please send an e-mail.

III.1 Definition and Construction

Motivation: given a random variable $X$ on $(\Omega, \mathcal{F}, P)$, we have some information and want to use it to predict $X$. How? Idea: information is given via a $\sigma$-field $\mathcal{G}$; $A \in \mathcal{G}$ means that we can observe $A$. What is the relation between $\mathcal{G}$ and $\mathcal{F}$? $X$ is a random variable, so $X$ is $\mathcal{F}$-measurable; so $\{X \leq c\} \in \mathcal{F}$ for all $c \in \mathbb{R}$, so if we can observe $\mathcal{F}$, we can observe the value of $X$. So predicting $X$ on the basis of $\mathcal{G}$ is only interesting if $\mathcal{G} \subsetneq \mathcal{F}$.

Definition. Suppose $X$ is a random variable $\geq 0$ or in $L^1$, and $\mathcal{G} \subseteq \mathcal{F}$ is a $\sigma$-field. Any random variable $Y$ satisfying

i) $Y$ is $\mathcal{G}$-measurable,

ii) $E[Y I_A] = E[X I_A]$ for all $A \in \mathcal{G}$,

is called (a version of) the conditional expectation of $X$ given $\mathcal{G}$, written $E[X \mid \mathcal{G}] := Y$. We say "a version" because $Y$ is not unique!

Remark: (i) says that $Y$ may only use information in $\mathcal{G}$; (ii) formalizes the idea that $E[X \mid \mathcal{G}]$ is a best prediction of $X$.

Theorem III.1.1. If $X \geq 0$ or $X \in L^1$ is a random variable and $\mathcal{G}$ a $\sigma$-field, then $E[X \mid \mathcal{G}]$ exists and is unique in the following sense: if $Y_1, Y_2$ are two versions of $E[X \mid \mathcal{G}]$, then $Y_1 = Y_2$ $P$-a.s.

Proof. Uniqueness: more generally, the conditional expectation is monotone: $X \leq X'$ $P$-a.s. implies $E[X \mid \mathcal{G}] \leq E[X' \mid \mathcal{G}]$ $P$-a.s. To see this, suppose $Y$ satisfies (i), (ii) in the definition above for $X$, and $Y'$ does for $X'$. Then $A := \{Y > Y'\} \in \mathcal{G}$ by (i), and $Y I_A \geq Y' I_A$ $P$-a.s. But also $E[Y I_A] = E[X I_A] \leq E[X' I_A] = E[Y' I_A]$ by (ii) and the assumption. So $Y I_A = Y' I_A$ $P$-a.s., so $P[A] = 0$ by definition of $A$, so $Y \leq Y'$ $P$-a.s.

Existence: not via Radon-Nikodým (as in the script), but via projection in Hilbert space.

a) Conditional expectation is monotone (see above), so $E[X \mid \mathcal{G}] \geq 0$ if $X \geq 0$.

b) Call $L^2_{\mathcal{G}}$ the set of all random variables in $L^2(P)$ which agree $P$-a.s. with some $\mathcal{G}$-measurable random variable. Then $L^2_{\mathcal{F}} = L^2$ is by Fischer-Riesz (I.6.4) a Hilbert space. Moreover $L^2_{\mathcal{G}}$ is a closed subspace: it is clearly a linear subspace;

closed: $X_n \to X$ in $L^2$ gives a subsequence $(X_{n_k})$ with $X_{n_k} \to X$ $P$-a.s., so if all the $X_n$ are in $L^2_{\mathcal{G}}$, then $X$ is again equal to some $\mathcal{G}$-measurable random variable $P$-a.s., hence in $L^2_{\mathcal{G}}$.

Call $\pi$ the orthogonal projection onto $L^2_{\mathcal{G}}$. By definition,
$$E[(X - \pi(X))\, Z] = (X - \pi(X),\, Z)_{L^2} = 0 \qquad \text{for all } Z \in L^2_{\mathcal{G}}.$$
For $X \in L^2(P)$, choose some $\mathcal{G}$-measurable $Y$ with $Y = \pi(X)$ $P$-a.s. Then $Y$ is $\mathcal{G}$-measurable, and (ii) also holds, because
$$E[(X - Y) I_A] = E[(X - \pi(X))\, I_A] = 0 \qquad \text{for all } A \in \mathcal{G}.$$
So $E[X \mid \mathcal{G}] := Y$ is ok.

c) If $X \geq 0$, then $X_n := \min\{X, n\}$ is in $L^2(P)$ and $X_n \nearrow X$. By (a) and (b), $Y_n := E[X_n \mid \mathcal{G}]$ exists and $0 \leq Y_n \nearrow$ $P$-a.s., so $Y := \lim Y_n$ exists $P$-a.s. and is $\mathcal{G}$-measurable. Moreover, $E[Y_n I_A] = E[X_n I_A]$ implies (by monotone convergence) $E[Y I_A] = E[X I_A]$ for all $A \in \mathcal{G}$. And of course $Y \geq 0$ $P$-a.s., so take $E[X \mid \mathcal{G}] := Y$.

d) For $X \in L^1$ write $X = X^+ - X^-$ with $X^\pm \geq 0$ and $E[X^\pm] < \infty$. Define $Y^\pm := E[X^\pm \mid \mathcal{G}]$, so that $Y^\pm \geq 0$ and $E[Y^\pm] = E[X^\pm]$ (take $A = \Omega$). Then $Y := Y^+ - Y^- \in L^1$ is $\mathcal{G}$-measurable and satisfies (ii), so $E[X \mid \mathcal{G}] := Y$ will do.

Remark: The construction by projection shows the following optimality of conditional expectation: if $X \in L^2$, then $E[X \mid \mathcal{G}]$ minimizes the $L^2$-norm $\|X - Y\|_{L^2} = (E[(X - Y)^2])^{1/2}$ over all $Y \in L^2$ which are $\mathcal{G}$-measurable. This gives a precise sense in which (ii) formalizes "best prediction".

Explicit construction in special cases (p. 47 in the script):

Example: Suppose $\mathcal{G}$ is finitely generated, i.e. $\mathcal{G} = \sigma(A_1, \ldots, A_n)$ with $A_i \in \mathcal{F}$. Pass to the atoms of $\mathcal{G}$ and write $\mathcal{G} = \sigma(B_1, \ldots, B_n)$ where $\bigcup_{i=1}^n B_i = \Omega$ and $B_i \cap B_k = \emptyset$ for $i \neq k$. Then any (finite) $\mathcal{G}$-measurable random variable $Y$ has the form $Y = \sum_{i=1}^n c_i I_{B_i}$ for some $c_i \in \mathbb{R}$. Now take a random variable $X \geq 0$ or in $L^1$. How can we then write $E[X \mid \mathcal{G}]$ more explicitly? $E[X \mid \mathcal{G}]$ is $\mathcal{G}$-measurable, so it has the form $E[X \mid \mathcal{G}] = Y = \sum_{i=1}^n c_i I_{B_i}$. Moreover, (ii) for $A = B_k$ gives
$$E[X I_{B_k}] = E[Y I_{B_k}] = \sum_{i=1}^n c_i\, E[I_{B_i} I_{B_k}] = c_k\, P[B_k],$$

so we get
$$E[X \mid \mathcal{G}] = \sum_{i=1}^n \frac{E[X I_{B_i}]}{P[B_i]}\, I_{B_i},$$
i.e. the value of the conditional expectation on an atom $B_i$ is simply the average value of $X$ on that atom.

Link to elementary conditional probabilities: if $P[B_i] > 0$, define $P[\,\cdot \mid B_i] := \frac{P[\,\cdot\, \cap B_i]}{P[B_i]}$. Then
$$E[X \mid B_i] := \int_\Omega X \, dP[\,\cdot \mid B_i] = \frac{E[X I_{B_i}]}{P[B_i]} \quad (\in \mathbb{R}!),$$
so we also have
$$E[X \mid \mathcal{G}] = \sum_{i=1}^n E[X \mid B_i]\, I_{B_i} \quad \text{(a r.v.!)} \tag{1}$$
$E[X \mid B_i]$ is the value of $E[X \mid \mathcal{G}]$ on the atom $B_i$.

Remark:
1. The same arguments work if $\mathcal{G}$ is countably generated.
2. If $\mathcal{G}$ is countably generated, one can use (1) as the definition of $E[X \mid \mathcal{G}]$ and then verify that it satisfies (i) and (ii) (see script).
3. We have not used Radon-Nikodým, so we can later use martingales and martingale convergence to prove Radon-Nikodým. (That is the advantage of the $L^2$ approach.)

Remark: If $\mathcal{G} = \sigma(Z)$ for some random variable $Z$, then any $\mathcal{G}$-measurable random variable is of the form $h(Z)$ for some measurable $h$ (see I.4). We then write $E[X \mid \mathcal{G}] = E[X \mid \sigma(Z)] =: E[X \mid Z] = h(Z)$ and (with an abuse of notation) $h(z) =: E[X \mid Z = z]$. More carefully we should write $h(z) = E[X \mid Z]\big|_{Z = z}$. (This is similar to using kernels and conditional probabilities.)

III.2 Properties of Conditional Expectations

Proposition III.2.1. Without explicit mention, always $X \geq 0$ or $X \in L^1(P)$. Then:

1. $E[E[X \mid \mathcal{G}]] = E[X]$.

2. If $X$ is $\mathcal{G}$-measurable, then $E[X \mid \mathcal{G}] = X$.

3. Linearity: let $X_1, X_2 \in L^1$ and $a, b \in \mathbb{R}$; then $E[aX_1 + bX_2 \mid \mathcal{G}] = a\,E[X_1 \mid \mathcal{G}] + b\,E[X_2 \mid \mathcal{G}]$. In tedious detail, this means: if $Y_i$ is a version of $E[X_i \mid \mathcal{G}]$, then $aY_1 + bY_2$ is a version of $E[aX_1 + bX_2 \mid \mathcal{G}]$.

4. Monotonicity: if $X \leq X'$ $P$-almost surely, then $E[X \mid \mathcal{G}] \leq E[X' \mid \mathcal{G}]$ $P$-almost surely.

5. Projectivity: if $\mathcal{H} \subseteq \mathcal{G}$ is a $\sigma$-field, then $E[E[X \mid \mathcal{G}] \mid \mathcal{H}] = E[X \mid \mathcal{H}]$.

6. If $Z$ is $\mathcal{G}$-measurable and both $X$, $ZX$ are $\geq 0$ or in $L^1$, then $E[ZX \mid \mathcal{G}] = Z\,E[X \mid \mathcal{G}]$ $P$-almost surely.

Proof.
1. Use (ii) for $A = \Omega$.
2. Obvious.
3. Clear from linearity of expectation.
4. Already proved in III.1.
5. $Y := E[E[X \mid \mathcal{G}] \mid \mathcal{H}]$ is $\mathcal{H}$-measurable, and for any $A \in \mathcal{H} \subseteq \mathcal{G}$, $E[Y I_A] = E[E[X \mid \mathcal{G}]\, I_A] = E[X I_A]$.
6. The right-hand side is $\mathcal{G}$-measurable. To check (ii), use measure-theoretic induction: for $Z = I_B$ with $B \in \mathcal{G}$ and any $A \in \mathcal{G}$,
$$E[I_B\, E[X \mid \mathcal{G}]\, I_A] = E[E[X \mid \mathcal{G}]\, I_{B \cap A}] = E[X I_{B \cap A}] = E[I_B X I_A] = E[E[I_B X \mid \mathcal{G}]\, I_A],$$
and the rest as usual.

Proposition III.2.2. Further properties:

1. Monotone convergence: $0 \leq X_n \nearrow X$ $P$-a.s. implies $E[X_n \mid \mathcal{G}] \nearrow E[X \mid \mathcal{G}]$ $P$-a.s.

2. Fatou: if all $X_n \geq 0$ (or all $X_n \geq -Z$ for some $Z \in L^1$), then $P$-a.s.
$$E[\liminf X_n \mid \mathcal{G}] \leq \liminf E[X_n \mid \mathcal{G}].$$

3. Lebesgue: if $\lim X_n = X$ $P$-a.s. and $|X_n| \leq Z$ $P$-a.s. for all $n$, for some $Z \in L^1$, then $\lim E[X_n \mid \mathcal{G}] = E[\lim X_n \mid \mathcal{G}] = E[X \mid \mathcal{G}]$.

4. Jensen: if $u\colon \mathbb{R} \to \mathbb{R}$ is convex with $u(X) \in L^1$, then $P$-a.s.
$$E[u(X) \mid \mathcal{G}] \geq u(E[X \mid \mathcal{G}]).$$

Proof.
1. Let $Y_n$ be a version of $E[X_n \mid \mathcal{G}]$; then $0 \leq Y_n \nearrow Y := \lim Y_n$ $P$-a.s. and $Y$ is $\mathcal{G}$-measurable; moreover, $E[Y_n I_A] = E[X_n I_A]$ for all $A \in \mathcal{G}$ and $n$, and monotone convergence gives $E[Y I_A] = E[X I_A]$ for all $A \in \mathcal{G}$, so $Y$ is a version of $E[X \mid \mathcal{G}]$.

2. $U_n := \inf_{m \geq n} X_m \nearrow \liminf X_n$ $P$-a.s., and $U_n \leq X_m$ for all $m \geq n$ gives $E[U_n \mid \mathcal{G}] \leq \inf_{m \geq n} E[X_m \mid \mathcal{G}]$ $P$-a.s., so $P$-a.s.
$$E[\liminf X_n \mid \mathcal{G}] = \lim_n E[U_n \mid \mathcal{G}] = \sup_n E[U_n \mid \mathcal{G}] \leq \sup_n \inf_{m \geq n} E[X_m \mid \mathcal{G}] = \liminf E[X_n \mid \mathcal{G}].$$

(Rest in the script.)

Corollary III.2.3. The conditional expectation is a contraction on $L^p$ for any $p \geq 1$: if $X \in L^p$, then also $E[X \mid \mathcal{G}] \in L^p$, and
$$\|E[X \mid \mathcal{G}]\|_{L^p} \leq \|X\|_{L^p}.$$

Proof. For $p \geq 1$, $x \mapsto u(x) := |x|^p$ is convex, so Jensen gives $E[|X|^p \mid \mathcal{G}] \geq |E[X \mid \mathcal{G}]|^p$ $P$-a.s. To finish the proof, take expectations on both sides.

How to compute conditional expectations? Typical situation: $X_i\colon \Omega \to S_i$ is $\mathcal{F}$-$\mathcal{S}_i$-measurable, $i = 1, 2$. How do we find $E[F(X_1, X_2) \mid X_1]$? Set $S := S_1 \times S_2$, $\mathcal{S} := \mathcal{S}_1 \otimes \mathcal{S}_2$; then $(X_1, X_2)$ is an $S$-valued random variable. Suppose the distribution of $(X_1, X_2)$ has the form $P_1 \otimes K$ for a probability measure $P_1$ on $S_1$ and a kernel $K$ from $(S_1, \mathcal{S}_1)$ to $(S_2, \mathcal{S}_2)$. (Intuition: $P_1$ is the distribution of $X_1$ under $P$, $K$ is the conditional distribution of $X_2$ under $P$ given $X_1$; see III.3.)

Proposition III.2.4. In the above situation, for any $F\colon S_1 \times S_2 \to [0, \infty)$ $\mathcal{S}_1 \otimes \mathcal{S}_2$-measurable,
$$E[F(X_1, X_2) \mid X_1](\omega) = \int_{S_2} F(X_1(\omega), x_2)\, K(X_1(\omega), dx_2) =: h(X_1(\omega)) \tag{2}$$
with $h(x_1) := \int_{S_2} F(x_1, x_2)\, K(x_1, dx_2)$.

Example: If $X_1, X_2$ are independent under $P$, then the distribution of $(X_1, X_2)$ is the product measure $P_1 \otimes P_2$, where $P_i$ is the distribution of $X_i$ under $P$. In that case, one can take $K(x_1, \cdot) \equiv P_2[\,\cdot\,]$.

Proof. $x_1 \mapsto h(x_1)$ is $\mathcal{S}_1$-measurable; see the proof of the construction of $P_1 \otimes K$. So the RHS of (2) is $h(X_1)$ with $h$ measurable, hence $\sigma(X_1)$-measurable. Now take $A \in \sigma(X_1)$; then $I_A$ is $\sigma(X_1)$-measurable, so $I_A = g(X_1)$ with $g\colon S_1 \to [0, 1]$ measurable, see I.4. So:
$$E[F(X_1, X_2)\, I_A] = E[F(X_1, X_2)\, g(X_1)] = \int_{S_1 \times S_2} F g\, d(P_1 \otimes K) = \int_{S_1} g(x_1) \int_{S_2} F(x_1, x_2)\, K(x_1, dx_2)\, P_1(dx_1) = E[g(X_1)\, h(X_1)] = E[h(X_1)\, I_A],$$
so $h(X_1) = E[F(X_1, X_2) \mid X_1]$.

Consequences:

a) In the above situation, we can write the conditional expectation as an integral; this is useful for proving properties.

b) $K$ is indeed a conditional distribution of $X_2$ given $X_1$: choose $F = I_{B_2}$ with $B_2 \in \mathcal{S}_2$; then
$$P[X_2 \in B_2 \mid X_1](\omega) := E[I_{B_2}(X_2) \mid X_1](\omega) = K(X_1(\omega), B_2).$$

Useful special case: independence.

Corollary III.2.5. $\mathcal{G} \subseteq \mathcal{F}$ a $\sigma$-field, $X_1$ $\mathcal{G}$-measurable, $X_2$ independent of $\mathcal{G}$. For any $F\colon S_1 \times S_2 \to [0, \infty)$ $\mathcal{S}_1 \otimes \mathcal{S}_2$-measurable,
$$E[F(X_1, X_2) \mid \mathcal{G}](\omega) = E[F(x_1, X_2)]\big|_{x_1 = X_1(\omega)} = h(X_1(\omega))$$
with $h(x_1) := E[F(x_1, X_2)]$. In words: fix the known variable $X_1$, take the expectation over the independent variable.

Proof. $X_1, X_2$ independent means the joint distribution is $P_1 \otimes P_2$, so we can take $K(x_1, \cdot) = P_2[\,\cdot\,]$ in Proposition III.2.4; $h$ is measurable, so the RHS is measurable w.r.t. $\sigma(X_1) \subseteq \mathcal{G}$, hence $\mathcal{G}$-measurable. Moreover, $A \in \mathcal{G}$ implies $E[F(X_1, X_2)\, I_A] = E[h(X_1)\, I_A]$, because we can use measure-theoretic induction on $F$: if $F = I_{B_1 \times B_2}$ with $B_i \in \mathcal{S}_i$, then
$$\text{LHS} = E[I_{B_1}(X_1)\, I_{B_2}(X_2)\, I_A] = E[I_{B_2}(X_2)]\, E[I_{B_1}(X_1)\, I_A];$$
on the other hand $h(x_1) = I_{B_1}(x_1)\, E[I_{B_2}(X_2)]$, so
$$\text{RHS} = E[I_{B_1}(X_1)\, E[I_{B_2}(X_2)]\, I_A] = \text{LHS},$$
so the assertion follows for $F = I_{B_1 \times B_2}$, etc.
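Corollary III.2.5 can be made concrete on a finite model: take $\mathcal{G} = \sigma(X_1)$, model $\Omega$ as the product of the two marginals, and check the defining property of the conditional expectation on the atoms $\{X_1 = x_1\}$. A small sketch (the laws `p1`, `p2` and the function `F` are arbitrary illustrative choices):

```python
# Discrete check of Corollary III.2.5: for X1 G-measurable and X2 independent
# of G, E[F(X1, X2) | G] = h(X1) with h(x1) = E[F(x1, X2)].
p1 = {0: 0.3, 1: 0.7}                # law of X1
p2 = {0: 0.2, 1: 0.5, 2: 0.3}        # law of X2, independent of X1
F = lambda x1, x2: (x1 + 1) * x2 ** 2

# h(x1) = E[F(x1, X2)]: freeze the known variable, average over the other one
h = {x1: sum(F(x1, x2) * w for x2, w in p2.items()) for x1 in p1}

for x1 in p1:                         # atoms A = {X1 = x1} of σ(X1)
    lhs = h[x1] * p1[x1]              # E[h(X1) I_A]
    rhs = sum(F(x1, x2) * p1[x1] * w for x2, w in p2.items())  # E[F(X1,X2) I_A]
    assert abs(lhs - rhs) < 1e-9
```

Since the atoms generate $\sigma(X_1)$, agreement there is exactly condition (ii) of the definition in III.1.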

Example (Wald identities): Suppose $(Y_i)_{i \in \mathbb{N}}$ is a sequence of (real) random variables and $N$ is an $\mathbb{N}_0$-valued random variable. Consider the (doubly) random sum
$$S_N(\omega) := \sum_{i=1}^{N(\omega)} Y_i(\omega).$$
What are $E[S_N]$ and $\mathrm{Var}[S_N]$?

1. Suppose that the $Y_i$ are identically distributed and in $L^1$, and that $N \in L^1$ is independent of $(Y_i)$. Then $E[S_N] = E[N]\, E[Y_1]$, since $E[S_N] = E[E[S_N \mid N]]$ and, by Corollary III.2.5,
$$E[S_N \mid N](\omega) = E\Big[\sum_{i=1}^n Y_i\Big]\Big|_{n = N(\omega)} = N(\omega)\, E[Y_1].$$

2. Suppose in addition that the $Y_i$ are in $L^2$ and independent (so: i.i.d.). Then $\mathrm{Var}[S_N] = E[N]\, \mathrm{Var}[Y_1] + (E[Y_1])^2\, \mathrm{Var}[N]$, since $E[S_N^2] = E[E[S_N^2 \mid N]]$ and
$$E[S_N^2 \mid N](\omega) = E\Big[\Big(\sum_{i=1}^n Y_i\Big)^2\Big]\Big|_{n = N(\omega)} = \big(n\, \mathrm{Var}[Y_1] + (n\, E[Y_1])^2\big)\Big|_{n = N(\omega)}.$$
Using (1), the result follows.

Example: $X_1, \ldots, X_n$ i.i.d. random variables in $L^1$, $S_n := \sum_{i=1}^n X_i$. Then
$$E[X_1 \mid S_n] = \frac{S_n}{n}$$
(as should follow from symmetry).

Proof. The RHS is $\sigma(S_n)$-measurable: (i) ok. Now take any bounded measurable $f$ on $\mathbb{R}$ and compute $E[X_1 f(S_n)]$. $X_1, \ldots, X_n$ are i.i.d., so the distribution of $(X_1, \ldots, X_n)$ is the product measure $P_1 \otimes \cdots \otimes P_1 = P_1^{\otimes n}$. So
$$E[X_1 f(S_n)] = \int_{\mathbb{R}^n} x_1\, f(x_1 + \cdots + x_n)\, P_1(dx_1) \cdots P_1(dx_n),$$
and the expression on the right is invariant under a permutation of $\{1, \ldots, n\}$. So $E[X_1 f(S_n)] = E[X_i f(S_n)]$ for all $i$, and so
$$E[S_n f(S_n)] = \sum_{i=1}^n E[X_i f(S_n)] = n\, E[X_1 f(S_n)].$$
Choose $f(S_n) := I_A$, $A \in \sigma(S_n)$, to get $E[X_1 I_A] = E[\tfrac{1}{n} S_n I_A]$ for all $A \in \sigma(S_n)$.
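Both Wald identities can be verified exactly (no simulation) for a small model by enumerating all outcomes. A sketch, with $Y_i = \pm 1$, $P[Y_i = 1] = p$, and $N$ uniform on $\{1, 2, 3\}$ (all parameter choices are ours):

```python
# Exact check of E[S_N] = E[N] E[Y_1] and
# Var[S_N] = E[N] Var[Y_1] + (E[Y_1])^2 Var[N] by full enumeration.
from itertools import product

p = 0.3
Ns = [1, 2, 3]                           # N uniform on {1, 2, 3}, independent of (Y_i)
EY, VarY = 2 * p - 1, 4 * p * (1 - p)    # E[Y_1], Var[Y_1] for Y = ±1
EN = sum(Ns) / len(Ns)
VarN = sum(n * n for n in Ns) / len(Ns) - EN ** 2

ES = ES2 = 0.0
for n in Ns:
    for ys in product([1, -1], repeat=n):
        w = 1.0 / len(Ns)                # P[N = n] times the path probability
        for y in ys:
            w *= p if y == 1 else 1 - p
        s = sum(ys)
        ES += w * s
        ES2 += w * s * s
VarS = ES2 - ES ** 2

assert abs(ES - EN * EY) < 1e-9
assert abs(VarS - (EN * VarY + EY ** 2 * VarN)) < 1e-9
```

Note how both identities need the independence of $N$ from $(Y_i)$, which is exactly where Corollary III.2.5 entered above.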

III.3 Regular Conditional Distributions

$(\Omega, \mathcal{F}, P)$ probability space, $(S, \mathcal{S})$ measurable space, $X\colon \Omega \to S$ $\mathcal{F}$-$\mathcal{S}$-measurable (an $S$-valued random variable), and $\mathcal{G} \subseteq \mathcal{F}$ a $\sigma$-field. Fix $B \in \mathcal{S}$ and define $P[X \in B \mid \mathcal{G}] := E[I_{\{X \in B\}} \mid \mathcal{G}]$. This gives a $\mathcal{G}$-measurable random variable with values in $[0, 1]$. So we get a mapping
$$\Omega \times \mathcal{S} \to [0, 1], \qquad (\omega, B) \mapsto P[X \in B \mid \mathcal{G}](\omega),$$
but this is not well-defined, because conditional expectations are only defined $P$-a.s. Question: can one choose/define this mapping to obtain a stochastic kernel, i.e.

a) $\mathcal{G}$-measurable in $\omega$ for fixed $B \in \mathcal{S}$,

b) a probability measure in $B$ for fixed $\omega \in \Omega$?

Why should there be a problem? Try to do it: we need, for each $\omega$, $\sigma$-additivity in $B$, i.e. for disjoint sets $B_i$
$$P\Big[X \in \bigcup_{i \in \mathbb{N}} B_i \ \Big|\ \mathcal{G}\Big] = \sum_{i \in \mathbb{N}} P[X \in B_i \mid \mathcal{G}] \qquad P\text{-a.s.} \tag{3}$$
So for each $B_i$, choose a version of $P[X \in B_i \mid \mathcal{G}]$; this is only well-defined up to a nullset $N(B_i)$. Now
$$\sum_{i=1}^n I_{\{X \in B_i\}} = I_{\{X \in \bigcup_{i=1}^n B_i\}} \nearrow I_{\{X \in \bigcup_{i \in \mathbb{N}} B_i\}}.$$
By monotone convergence, this gives
$$\sum_{i \in \mathbb{N}} P[X \in B_i \mid \mathcal{G}] = P\Big[X \in \bigcup_{i \in \mathbb{N}} B_i \ \Big|\ \mathcal{G}\Big] \qquad P\text{-a.s.},$$
where the nullset where this may fail depends (via the $N(B_i)$) on the sequence $(B_i)$. So (3) may go wrong on a nullset depending on $(B_i)$. What we want is (3) simultaneously for all possible sequences $(B_i)$ of disjoint sets, outside one fixed nullset. For large enough $\mathcal{S}$, there are uncountably many sequences $(B_i)$; (3) can fail on a nullset for each such sequence, and the union of all these uncountably many nullsets is not under our control. So perhaps there is no (or only a huge) nullset outside of which (3) holds simultaneously for all $(B_i)$-sequences. To get a positive answer, one needs some condition on $(S, \mathcal{S})$.

Definition (Regular conditional distribution). A r.c.d. of $X$ given $\mathcal{G}$ is a stochastic kernel $K$ from $(\Omega, \mathcal{G})$ to $(S, \mathcal{S})$ such that for each $B \in \mathcal{S}$, $K(\cdot, B)$ is a version of $P[X \in B \mid \mathcal{G}]$, i.e., for each $B \in \mathcal{S}$,
$$E[I_{\{X \in B\}} \mid \mathcal{G}] = K(\cdot, B) \qquad P\text{-a.s.}$$

Proposition III.3.1. If $S = \mathbb{R}$ with $\mathcal{S} = \mathcal{B}(\mathbb{R})$, then a r.c.d. of $X$ given $\mathcal{G}$ exists.

Proof. Use crucially that $\mathbb{Q} \subseteq \mathbb{R}$ is countable and dense. For each $q \in \mathbb{Q}$, choose a version $V_q(\omega)$ of $E[I_{\{X \leq q\}} \mid \mathcal{G}]$. Set
$$N_1 := \{\text{those } \omega \text{ where } q \mapsto V_q(\omega) \text{ is not increasing on } \mathbb{Q}\} = \bigcup_{\substack{q, r \in \mathbb{Q} \\ q < r}} \{V_q > V_r\}.$$
Monotonicity of conditional expectation gives, for $q < r$,
$$V_q = E[I_{\{X \leq q\}} \mid \mathcal{G}] \leq E[I_{\{X \leq r\}} \mid \mathcal{G}] = V_r \qquad P\text{-a.s.},$$
so $\{V_q > V_r\}$ is a $P$-nullset, so $N_1$ is a $P$-nullset. Next set
$$N_2 := \{\text{those } \omega \text{ where } q \mapsto V_q(\omega) \text{ is not everywhere right-continuous on } \mathbb{Q}\} = \bigcup_{q \in \mathbb{Q}} \Big\{\lim_{\substack{r \searrow q \\ r \in \mathbb{Q},\ r > q}} V_r \neq V_q\Big\}.$$
Monotone convergence: $r_n \searrow\searrow q$ gives $V_{r_n} = E[I_{\{X \leq r_n\}} \mid \mathcal{G}] \searrow E[I_{\{X \leq q\}} \mid \mathcal{G}] = V_q$ $P$-a.s., so $N_2$ is also a $P$-nullset. Next set
$$N_3 := \Big\{\lim_{\substack{q \to -\infty \\ q \in \mathbb{Q}}} V_q \neq 0 \ \text{ or } \lim_{\substack{q \to +\infty \\ q \in \mathbb{Q}}} V_q \neq 1\Big\};$$
then $N_3$ is also a $P$-nullset. So $N := N_1 \cup N_2 \cup N_3$ has $P[N] = 0$ and $N \in \mathcal{G}$, and for $\omega \notin N$, $q \mapsto V_q(\omega)$ is monotone on $\mathbb{Q}$, right-continuous on $\mathbb{Q}$, and has limits $0$ and $1$ at $-\infty$ and $+\infty$. Now define
$$F(\omega, x) := \begin{cases} \lim_{\substack{q \searrow x \\ q \in \mathbb{Q},\ q > x}} V_q(\omega) & \text{for } \omega \notin N, \\ G(x) & \text{for } \omega \in N, \end{cases}$$
for any fixed distribution function $G$ on $\mathbb{R}$. Then for each $\omega$, $x \mapsto F(\omega, x)$ is a distribution function on $\mathbb{R}$ by construction. Moreover,
$$P[X \leq x \mid \mathcal{G}](\omega) = E[I_{\{X \leq x\}} \mid \mathcal{G}](\omega) = F(\omega, x) \qquad P\text{-a.s. for each } x,$$
i.e. $F(\cdot, x)$ is a version of $E[I_{\{X \leq x\}} \mid \mathcal{G}]$ for each $x$. (Note that since $N \in \mathcal{G}$, $\omega \mapsto F(\omega, x)$ is $\mathcal{G}$-measurable.) For each $\omega \in \Omega$, choose as $K(\omega, \cdot)$ the probability measure on $\mathbb{R}$ with distribution function $F(\omega, \cdot)$. Then this $K$ is a kernel from $(\Omega, \mathcal{G})$ to $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$:

a) $K(\omega, \cdot)$ is a probability measure: by definition.

b) $K(\cdot, B)$ is $\mathcal{G}$-measurable for all $B \in \mathcal{B}(\mathbb{R})$:
$$\mathcal{D} := \{A \in \mathcal{B}(\mathbb{R}) \mid K(\cdot, A) \text{ is } \mathcal{G}\text{-measurable}\}$$
is a Dynkin system, and $\mathcal{D} \supseteq \{(-\infty, x] \mid x \in \mathbb{R}\}$, because $K(\cdot, (-\infty, x]) = F(\cdot, x)$, and $F$ is $\mathcal{G}$-measurable in $\omega$. So $\mathcal{D} = \mathcal{B}(\mathbb{R})$.

Finally, $K(\cdot, B)$ is a version of $E[I_{\{X \in B\}} \mid \mathcal{G}]$ for each $B \in \mathcal{B}(\mathbb{R})$ (and then $K$ is the desired r.c.d.):
$$\mathcal{D}' := \{A \in \mathcal{B}(\mathbb{R}) \mid K(\cdot, A) = E[I_{\{X \in A\}} \mid \mathcal{G}] \ P\text{-a.s.}\}$$
is a Dynkin system (use monotone convergence), and $\mathcal{D}' \supseteq \{(-\infty, x] \mid x \in \mathbb{R}\}$, because $K(\cdot, (-\infty, x]) = F(\cdot, x)$ is a version of $E[I_{\{X \in (-\infty, x]\}} \mid \mathcal{G}]$ by construction of $F$; so $\mathcal{D}' = \mathcal{B}(\mathbb{R})$.

Theorem III.3.2. If $(S, \mathcal{S})$ is a Borel space, a r.c.d. of $X$ given $\mathcal{G}$ exists.

Proof. Take $\varphi\colon S \to A \in \mathcal{B}(\mathbb{R})$ bijective and with $\varphi, \varphi^{-1}$ both measurable. So $\varphi(X)$ is $\mathbb{R}$-valued and there exists a r.c.d. $K_0$ of $\varphi(X)$ given $\mathcal{G}$. Then $K(\omega, B) := K_0(\omega, \varphi(B))$ does the job: $K(\cdot, B)$ is $\mathcal{G}$-measurable for each $B$, because $K_0$ is; moreover,
$$\varphi(B) = \{x \in \mathbb{R} \mid \exists s \in B \text{ with } \varphi(s) = x, \text{ i.e. } s = \varphi^{-1}(x)\} = (\varphi^{-1})^{-1}(B),$$
so $K(\omega, B) = K_0(\omega, (\varphi^{-1})^{-1}(B))$, i.e. $K(\omega, \cdot)$ is the image measure on $S$ of $K_0(\omega, \cdot)$ under $\varphi^{-1}$; so $K(\omega, \cdot)$ is a probability measure on $(S, \mathcal{S})$. Finally,
$$K(\cdot, B) = K_0(\cdot, \varphi(B)) = E[I_{\{\varphi(X) \in \varphi(B)\}} \mid \mathcal{G}] = E[I_{\{X \in B\}} \mid \mathcal{G}] \qquad P\text{-a.s.}$$

IV Martingales

IV.1 Definitions and Examples

$(\Omega, \mathcal{F}, P)$ probability space, $I \subseteq [0, \infty]$ an index set. Two typical cases:

a) $I = \mathbb{N}_0$ ("discrete time"),

b) $I = [0, \infty)$ or $I = [0, T]$ ("continuous time").

Definition. A filtration $\mathbb{F} = (\mathcal{F}_t)_{t \in I}$ is an increasing family of $\sigma$-fields $\mathcal{F}_t \subseteq \mathcal{F}$, $t \in I$, i.e. $\mathcal{F}_s \subseteq \mathcal{F}_t$ for $s, t \in I$, $s \leq t$.

Interpretation: $\mathcal{F}_t$ is the family of events observable up to time $t$, i.e. the information available at $t$.

Definition. $X = (X_t)_{t \in I}$ stochastic process with values in $S$ (i.e. $X_t\colon \Omega \to S$ is $\mathcal{F}$-$\mathcal{S}$-measurable for all $t \in I$): a collection of $S$-valued random variables, indexed by $t \in I$, all defined on $(\Omega, \mathcal{F})$. $X$ is adapted to $\mathbb{F}$ if each $X_t$ is $\mathcal{F}_t$-$\mathcal{S}$-measurable (observable at time $t$).

Definition (Martingale). A (real-valued) martingale (w.r.t. $\mathbb{F}$, $P$) is a (real-valued) stochastic process $M = (M_t)_{t \in I}$ with

i) $M$ is $\mathbb{F}$-adapted;

ii) $M$ is $P$-integrable, i.e., $M_t \in L^1(P)$ for all $t \in I$;

iii) $E[M_t \mid \mathcal{F}_s] = M_s$ $P$-a.s. for all $s, t \in I$, $s \leq t$.

Submartingale if in (iii) only $\geq$ holds, supermartingale if only $\leq$ holds.

Remark:
1. Changes of martingales cannot be predicted: $E[M_t - M_s \mid \mathcal{F}_s] = 0$, and $t \mapsto E[M_t]$ is constant, so martingales are on average constant. But they can fluctuate a lot, pathwise.
2. For $I = \mathbb{N}_0$, (iii) is equivalent to $E[M_{n+1} \mid \mathcal{F}_n] = M_n$ $P$-a.s. for all $n$. Indeed:
$$E[M_{n+k} \mid \mathcal{F}_n] = \sum_{l=1}^k E\big[\underbrace{E[M_{n+l} - M_{n+l-1} \mid \mathcal{F}_{n+l-1}]}_{=\,0}\ \big|\ \mathcal{F}_n\big] + M_n = M_n.$$

Example (Class 1: Sums of independent centred random variables): $I = \mathbb{N}_0$, $(Y_i)_{i \in \mathbb{N}}$ independent random variables in $L^1$, $\mathcal{F}_n := \sigma(Y_1, \ldots, Y_n)$, $\mathcal{F}_0 := \{\emptyset, \Omega\}$. Then
$$M_n := \sum_{i=1}^n (Y_i - E[Y_i]), \qquad n \in \mathbb{N}_0,$$
is a martingale.

Proof. (i), (ii): ok; (iii):
$$E[M_{n+1} - M_n \mid \mathcal{F}_n] = E[Y_{n+1} - E[Y_{n+1}] \mid \mathcal{F}_n] = E[Y_{n+1} \mid \mathcal{F}_n] - E[Y_{n+1}] = 0,$$
because $Y_{n+1}$ is independent of $\mathcal{F}_n$.

Example: Simple random walk with parameter $p$: $(Y_i)$ i.i.d. with values $\pm 1$ and $p = P[Y_i = +1]$. Then $E[Y_i] = 2p - 1$ and so
$$M_n := x + \sum_{i=1}^n Y_i - n(2p - 1) =: x + S_n - n(2p - 1)$$
is a martingale, for any $x \in \mathbb{R}$. For $p = \frac{1}{2}$ this is called the symmetric random walk; in this case $(S_n)$ itself is a martingale.

Example (Class 2: Successive predictions): $I \subseteq [0, \infty]$, $\mathbb{F} = (\mathcal{F}_t)_{t \in I}$ any filtration, $Z \in L^1(P)$. Then $M_t := E[Z \mid \mathcal{F}_t]$, $t \in I$, is a $(P, \mathbb{F})$-martingale.

Proof. (i), (ii) ok; for $s \leq t$,
$$E[M_t \mid \mathcal{F}_s] = E\big[E[Z \mid \mathcal{F}_t] \mid \mathcal{F}_s\big] = E[Z \mid \mathcal{F}_s] = M_s \qquad P\text{-a.s.}$$

Example: $(S_n)$ simple random walk with parameter $p$, fix $N \in \mathbb{N}$, $I = \{0, \ldots, N\}$ (or also $I = \mathbb{N}_0$); $\mathcal{F}_n = \sigma(Y_1, \ldots, Y_n) = \sigma(S_1, \ldots, S_n)$. Choose $Z = f(S_N)$. For $n \geq N$, $E[Z \mid \mathcal{F}_n] = f(S_N)$; for $n < N$,
$$S_N = S_n + \sum_{i=n+1}^N Y_i =: S_n + U_{n,N},$$
where $S_n$ is $\mathcal{F}_n$-measurable and $U_{n,N}$ is independent of $\mathcal{F}_n$, so
$$E[f(S_N) \mid \mathcal{F}_n] = E[f(s + U_{n,N})]\big|_{s = S_n} =: h(S_n)$$
with $h(s) = E[f(s + U_{n,N})] = \sum_u f(s + u)\, P[U_{n,N} = u]$. Having $k$ times a $+1$ among the $N - n$ remaining steps means $N - n - k$ times $-1$, so $u(k) = 2k - (N - n)$ and
$$h(s) = \sum_{k=0}^{N-n} \binom{N-n}{k} p^k (1 - p)^{N-n-k}\, f\big(s + 2k - (N - n)\big).$$
See also the Cox/Ross/Rubinstein model in mathematical finance.

Example (Class 3: Harmonic functions of Markov chains): $X = (X_n)_{n \in \mathbb{N}_0}$ Markov chain with state space $(S, \mathcal{S})$ and transition kernel $K$ (from $(S, \mathcal{S})$ to $(S, \mathcal{S})$). Canonical construction: $\Omega = S^{\mathbb{N}_0}$, $\mathcal{F} = \mathcal{S}^{\mathbb{N}_0}$, initial distribution $\mu$ (probability measure on $(S, \mathcal{S})$), kernels $K_n((x_0, \ldots, x_{n-1}), \cdot) := K(x_{n-1}, \cdot)$ for all $n$; the corresponding $P_\mu$ via $P^{(n+1)} = P^{(n)} \otimes K_{n+1}$, $P^{(0)} = \mu$; $X_n$ the coordinate maps. Write $P_x := P_{\delta_{\{x\}}}$.

A measurable function $h \geq 0$ on $S$ is called harmonic (for $K$) if $h = Kh$, where
$$(Kh)(x) := \int_S h(y)\, K(x, dy), \qquad x \in S$$
("mean value property"). With $\mathcal{F}_n := \sigma(X_0, \ldots, X_n)$, the process $M_n := h(X_n)$, $n \in \mathbb{N}_0$, is a $(P_x, \mathbb{F})$-martingale for every $x \in S$ with $h(x) < \infty$.

Proof. (i) ok; (iii): $E_x[M_n \mid \mathcal{F}_{n-1}] \overset{?}{=} M_{n-1}$: the joint distribution of $(X_0, \ldots, X_n)$ is $P^{(n)} = P^{(n-1)} \otimes K_n$. So using III.2.4,
$$E_x[h(X_n) \mid \mathcal{F}_{n-1}] = \int_S h(y)\, K(X_{n-1}, dy) = (Kh)(X_{n-1}) = h(X_{n-1}) = M_{n-1}.$$
Now (ii): iterate to get
$$E_x[M_n] = E_x[M_{n-1}] = \cdots = E_x[M_0] = E_x[h(X_0)] = h(x) < \infty.$$

Example: Simple random walk with parameter $p$: $S_n = \sum_{i=1}^n Y_i$, $x \in \mathbb{Z}$. Here, $K(x, \cdot) = p\, \delta_{\{x+1\}} + (1 - p)\, \delta_{\{x-1\}}$. The function
$$h(x) := \Big(\frac{1 - p}{p}\Big)^x, \qquad x \in \mathbb{Z},$$
is harmonic for $K$:
$$(Kh)(x) = p\, h(x + 1) + (1 - p)\, h(x - 1) = \cdots = h(x).$$
So $M_n := h(x + S_n) = \big(\frac{1 - p}{p}\big)^{x + S_n}$, $n \in \mathbb{N}_0$, is a martingale (and $\geq 0$). (For $p \neq \frac{1}{2}$; for $p = \frac{1}{2}$ it is boring.) Trajectories suggest that $(M_n)$ converges to $0$ a.s. This can be proved in two ways: by explicit computation and the strong law of large numbers (see script), or by a more general convergence theorem for martingales.

IV.2 Playing Systems, Stopping Times and Stopping Theorem

$(\Omega, \mathcal{F}, P)$ probability space; filtration $\mathbb{F} = (\mathcal{F}_t)_{t \in I}$. Start with $I = \mathbb{N}_0$ and let $X = (X_n)_{n \in \mathbb{N}_0}$ be a (real) stochastic process.

Definition. $H = (H_n)_{n \in \mathbb{N}}$ is predictable (w.r.t. $\mathbb{F}$) if $H_n$ is $\mathcal{F}_{n-1}$-measurable for all $n$.
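Two of the closed-form computations in IV.1 above can be checked numerically: the binomial formula for $E[f(S_N) \mid \mathcal{F}_n]$ on $\{S_n = s\}$, against brute-force enumeration of the remaining steps, and the mean-value property $(Kh)(x) = h(x)$ of $h(x) = \big(\frac{1-p}{p}\big)^x$. A sketch (the parameter values and the choice of $f$ are arbitrary):

```python
# 1) h(s) = Σ_k C(N−n, k) p^k (1−p)^(N−n−k) f(s + 2k − (N−n))
#    versus direct enumeration of the N − n remaining ±1 steps.
from itertools import product
from math import comb

p = 0.4
N, n, s = 6, 2, 1
f = lambda x: x * x + 1.0            # any bounded f

m = N - n
h_closed = sum(comb(m, k) * p ** k * (1 - p) ** (m - k) * f(s + 2 * k - m)
               for k in range(m + 1))

h_brute = 0.0
for ys in product([1, -1], repeat=m):
    w = 1.0
    for y in ys:
        w *= p if y == 1 else 1 - p
    h_brute += w * f(s + sum(ys))
assert abs(h_closed - h_brute) < 1e-9

# 2) mean-value property of h(x) = ((1−p)/p)^x under the biased walk kernel
harm = lambda x: ((1 - p) / p) ** x
for x in range(-3, 4):
    Kh = p * harm(x + 1) + (1 - p) * harm(x - 1)
    assert abs(Kh - harm(x)) < 1e-9
```

The second check is just the algebra $p\,r^{x+1} + (1-p)\,r^{x-1} = r^x$ for $r = \frac{1-p}{p}$, evaluated at a few points.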

Definition. A playing system (for $X$) is a real-valued stochastic process $H = (H_n)_{n \in \mathbb{N}}$ which is predictable and such that $H_n (X_n - X_{n-1}) \in L^1(P)$ for all $n$ (e.g. if $X$ is integrable, $H$ bounded is enough). Then we define
$$(H \cdot X)_n := \sum_{k=1}^n H_k (X_k - X_{k-1}), \qquad n \in \mathbb{N}_0,$$
and call $H \cdot X = ((H \cdot X)_n)_{n \in \mathbb{N}_0}$ the stochastic integral of $H$ w.r.t. $X$.

Notation: Increment $X_k - X_{k-1} =: \Delta X_k$; so $H \cdot X = \sum H_k\, \Delta X_k$ is the discrete-time version of the integral $\int H\, dX$ of $H$ (integrand) w.r.t. $X$ (integrator).

Interpretation: Think of $\Delta X_k = X_k - X_{k-1}$ as the gain/loss of a game in round $k$. Then $H$ represents a betting system: for round $k$ (from $k-1$ to $k$), place the amount $H_k$ as bet. This may use past information, but should not depend on the outcome of round $k$; so $H_k$ must be $\mathcal{F}_{k-1}$-measurable. Then $H_k\, \Delta X_k$ gives the winnings/losses from the bet in period $k$, and $H \cdot X$ is the cumulative balance evolution.

Example: $S_n = \sum_{i=1}^n Y_i$ simple random walk, $\mathbb{F}$ generated by $Y$, $M_n = x + S_n$, $x \in \mathbb{Z}$. Set $H_1 := 1$ (bet 1 initially) and
$$H_n := \begin{cases} 2^{n-1} & \text{if } Y_1 = Y_2 = \cdots = Y_{n-1} = -1, \\ 0 & \text{otherwise,} \end{cases}$$
so keep on doubling until we win, i.e. until
$$T(\omega) := \inf\{n \in \mathbb{N} \mid Y_n = +1\}.$$
Then
$$(H \cdot M)_n(\omega) = \begin{cases} \sum_{k=1}^n 2^{k-1} \cdot (-1) = -(2^n - 1) & \text{if } n < T(\omega), \\ -(2^{T(\omega)-1} - 1) + 2^{T(\omega)-1} = +1 & \text{if } n \geq T(\omega). \end{cases}$$
This system looks successful, but it cannot be, at least not on average! More precisely, we have:

Theorem IV.2.1. Let $X$ be a (sub-/super-)martingale and $H$ a playing system.

1. If $X$ is a martingale, then $H \cdot X$ is also a martingale.

2. If $X$ is a sub- or supermartingale and $H \geq 0$, then $H \cdot X$ is again a sub- or supermartingale.

Proof. Clearly, $H \cdot X$ is adapted and integrable. For (iii), write
$$E[(H \cdot X)_n - (H \cdot X)_{n-1} \mid \mathcal{F}_{n-1}] = E[H_n (X_n - X_{n-1}) \mid \mathcal{F}_{n-1}] = H_n\, E[X_n - X_{n-1} \mid \mathcal{F}_{n-1}],$$
and this implies (1) and (2).
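The discrete stochastic integral and the doubling system can be traced along one fixed trajectory of coin tosses. A minimal sketch (the path and all names are our own choices):

```python
# (H·X)_n = Σ_{k≤n} H_k (X_k − X_{k−1}) and the doubling system on one path.
def stoch_int(H, X):
    """(H·X)_n for n = 0..len(H); X has one more entry than H (X_0 included)."""
    out = [0.0]
    for k, h in enumerate(H, start=1):
        out.append(out[-1] + h * (X[k] - X[k - 1]))
    return out

ys = [-1, -1, -1, 1, -1]             # one trajectory: first win in round T = 4
X = [0]
for y in ys:
    X.append(X[-1] + y)

# doubling: bet 2^{n−1} as long as every previous toss lost, else bet 0
H = []
for k in range(1, len(ys) + 1):
    H.append(float(2 ** (k - 1)) if all(y == -1 for y in ys[:k - 1]) else 0.0)

V = stoch_int(H, X)
# balance is −(2^n − 1) before the first win and +1 from then on
assert V[:4] == [0.0, -1.0, -3.0, -7.0]
assert V[4:] == [1.0, 1.0]
```

Note that each $H_k$ only looks at the tosses strictly before round $k$, which is exactly predictability; the theorem then says that despite the final balance of $+1$, $(H \cdot M)$ is still a martingale, so no betting system can beat a fair game on average.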

23 IV.2 Playing Systems, Stopping Times and Stopping Theorem 23 Special type of betting: place unit bet until some random time T(ω), then stop: H n (ω) := I n T(ω). For H to be predictable, we want {T n} F n 1, or equivalently {T n 1} F n 1. Definition. Take I [0, ). A stopping time (wrt. F) is a mapping T : Ω I { } with {T t} F t, t I. Interpretation.: Decision on stopping depends only on available information. Remark.: For I = N 0, T stopping time {T = n} F n, n: {T n} = shows, and follows from n {T = k} }{{} F k F n k=1. F n 1 F n {T = n} = {T n} {T n 1} }{{}}{{} F n Example (Canonical example).: X = (X n ) n N0 adapted process with values in (S, S), A S and T A is a stopping time: T A (ω) := inf{n N 0 X n (ω) A} {T A n} = = first hitting time of X to set A n {X k A} F n, n k=0 Remark.: For I = [0, ), this is still (almost) true: one needs a bit of regularity on trajectories of X, and the proof is very difficult. Example (Canonical non-example).: Time of last visit to A, is not a stopping time in general. L A (ω) := sup{n N 0 X n (ω) A} Definition. X = (X t ) t I stochastic process, T : Ω I mapping ( random time ). We define X T : Ω R ω (X T )(ω) := X T(ω) (ω), the value of X at time T. For T : Ω I { } the stopped process is X T = (X T t ) t I, defined by X T t := X T t. Definition. F = (F t ) t I filtration, T a stopping time F T := {A F A {T t} F t, t I} σ-field of events observable up to time T. Now we return to I = N 0.
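To make the contrast between the canonical example and non-example concrete, here is a toy computation on a made-up trajectory: the first hitting time T_A can be decided from the path seen so far, while the last visit L_A requires knowing the entire future.

```python
# illustrative only: a made-up finite trajectory of X and a target set A
path = [3, 2, 1, 0, 1, 2, 1, 0, 2, 3]
A = {0}

def hitting_time(path, A):
    """T_A = inf{n : X_n in A}; {T_A <= n} is decided by X_0, ..., X_n alone."""
    for n, x in enumerate(path):
        if x in A:
            return n
    return float("inf")

T = hitting_time(path, A)
# last visit: needs the whole path, so L_A is not a stopping time in general
L = max(n for n, x in enumerate(path) if x in A)
assert T == 3 and L == 7
print(T, L)
```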

24 IV.2 Playing Systems, Stopping Times and Stopping Theorem 24 Theorem IV.2.2 (Stopping theorem I). 1. Suppose X = (X n ) n N0 is a (sub-/super-) martingale and T a stopping time (values in N 0 = N 0 {+ }). Then X T is again (sub-/super-) martingale. 2. Suppose X = M is a martingale and T a stopping time. Then E[M T n ] = E[M 0 ], n N 0. This implies E[M T ] = E[M 0 ] if we have a) T is bounded, i.e. T N P-a.s., for some N N; or b) T < P-a.s. and (M T n ) n N0 is uniformly integrable. Remark.: For the doubling system, we had (H S) 0 = 0 and (H S) T = 1, so something in the theorem is violated. Proof. 1. H n := I {n T } is a playing system and 0, so H X is again (sub-/super-) martingale. But (H X) n = n n T H k X k = 1 X k k=1 k=1 = X n T X 0 = X T n X 0 2. M martingale, T stopping time implies by (1): M T is a martingale, so E[M 0 ] = E[M T 0 ] = E[MT n ] = E[M T n] As n, T n T P-a.s. under both (a) and (b), so M T n M T P-a.s. So it remains to prove E[M T ] = lim E[M T n ]. In case (b), (M T n ) n N0 is UI and converges P-a.s., hence also convergent (to M T ) in L 1 : done. In case (a), M T n max k=0,...,n M k L 1 and we can use Lebesgue. Remark.: In general neither (a) nor (b) holds, so you will have to (similiarly) find a proof that the expected values converge. Example (Ruin problem).: X n = x + S n simple random walk with x Z, parameter p. Take a, b Z with a < x < b and set T a,b := inf{n N 0 X n / (a, b)}. Interpretation: A gambler plays against the bank with unit bets and starting capitals x a and b x, resp. At T a,b, one of the two is ruined the gambler at the bottom, the bank at the top. What is the probability of ruin for the player? T a,b is a stopping time (for F generated by Y or by X), and T a,b < P-a.s. (Borel-Cantelli). We want to find r(x) = P[x + S Ta,b = a]

25 IV.2 Playing Systems, Stopping Times and Stopping Theorem 25 Case p = 1 2 : S is a martingale ST a,b is also a martingale and bounded, hence UI x = E[x + 0] = E[x + S Ta,b ] = ar(x) + b(1 r(x)) r(x) = b x b a i.e. the ratio of bank s initial capital to total initial capital. Case p 1 2 : M n := h(x + S n ) with h(x) = ( 1 p p ) x is martingale M T a,b is also a martingale E[M 0 ] = E[M Ta,b n], n; M T a,b is bounded h(x) = E[M 0 ] = E[M Ta,b ] = h(a) r(x) + h(b) (1 r(x)) r(x) = h(b) h(x) h(b) h(a) = 1 ( 1 ( p p 1 p )b x 1 p )b a For p < 1 2 (game is unfair for player), the denominator is < 1, so r(x) 1 ( p 1 p )b x, and this depends only on the initial capital b x of the bank. So the bank can make the probability of ruin arbitrarily close to 1, uniformly over all players, by choosing a large initial capital. Example.: Roulette, p = 18 38, b x = 66 r(x) (!). Example (Sums of random variables).: F = (F n ) n N0 on (Ω, F, P); (Y i ) i N random variables with E[ Y i ] m <, E[Y i ] m, Var[Y i ] σ 2 <. Moreover, (Y i ) is adapted to F and Y i is independent of F i 1, i. (E.g. (Y i ) i N i.i.d. in L 2 and F n := σ(y 1,..., Y n ).) Let T be a stopping time wrt. F with E[T] <. Then we have the generalized Wald identities: For S T := T Y i, 1) S T L 1 and E[S T ] = me[t]. 2) If m = 0, then E[S 2 T ] = σ2 E[T]. 3) If T L 2 and T is independent of the Y i, then Proof. Var[S T ] = σ 2 E[T] + m 2 Var[T]. a) M n := S n nm = n (Y i m) is a martingale stopping theorem: so we need to exchange E and lim. E[S } T {{ n ] = m E[T n] }}{{} S T b) Integrable upper bound for (S T n ) n N : M n := S n n m := րe[t] n ( Y i m) is a martingale with monotone integration: E[ S T n ] }{{} րe[ S T] = m E[T n] }{{} րe[t]

26 IV.2 Playing Systems, Stopping Times and Stopping Theorem 26 E[ S T ] = me[t] <, so S T L 1. And now S T n S T n Lebesgue gives E[S T ] = lim E[S T n] (a) = me[t] which proves (1). c) U n := M 2 n nσ2 is martingale: M n+1 M n = Y n+1 m U n+1 U n = (Y n+1 m) 2 + 2M n (Y n+1 m) σ 2, and this has conditional expectation 0 given F n. Hence stopping theorem gives E[MT 2 n ] = σ2 E[T n], n }{{} րe[t]< Fatou: E[M 2 T ] liminf E[M2 T n ] = σ2 E[T] <, so M T L 2. Moreover, sup n E[M 2 T n ] σ2 E[T] <, so (M T n ) n N is bounded in L 2 and therefore UI. So M T = (M T n ) n N is a martingale and UI; so stopping theorem, part II (see next section) gives so, by Jensen E[M T F n ] = E[M T F n] = M T n = M T n So (M 2 T n ) n N is UI, since M 2 T L1. Therefore (M 2 T n M2 T For m = 0, M = S, so we get (2). 0 M 2 T n E[M2 T F n], n. P-a.s. as n, it is UI E[M 2 T ] = lim E[M2 T n ] = σ2 E[T] d) If m is arbitrary, first use (2) for Ỹi := Y i m; so σ 2 E[T] = E[ S 2 T ] = E[S2 T ] 2mE[TS T] + m 2 E[T 2 ] using T L 2. Now compute (T independent of (Y i )): E[TS T ] = E[E[TS T T]] = E [ E[nS n ] n=t ] = E[T 2 m] E[S 2 T] = σ 2 E[T] + m 2 E[T 2 ] Remark.: Suppose m = 0 so that S n = n Y i is a martingale. For c > 0, define T c := inf{n N 0 S n c}. Then E[T c ] = +, and the symmetric result holds for c < 0 (with... c). Indeed: If we had E[T c ] <, (1) would give E[S Tc ] = 0, but S Tc c. In particular: (S n ) symmetric simple random walk (p = 1 2 ) T 1 := inf{n N 0 S n = 1} T 1 := inf{n N 0 S n = 1} both have E[T 1 ] = E[T 1 ] = + (!). Later (Section 4): T ±1 < P-a.s.
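A Monte Carlo sanity check of Wald's first identity (1), with assumed parameters: for a walk with drift m = 2p - 1 > 0 and T the first hitting time of level +1, we have S_T = 1 and E[T] < ∞, so (1) forces E[T] = 1/m.

```python
import random

random.seed(2)

p = 0.6                      # upward drift (assumed value), m = E[Y_i] = 2p - 1
m, trials = 2 * p - 1, 100_000
tot_T = tot_S = 0
for _ in range(trials):
    s = n = 0
    while s < 1:             # T = first hitting time of +1, finite a.s. for p > 1/2
        s += 1 if random.random() < p else -1
        n += 1
    tot_T += n
    tot_S += s               # S_T = 1 on every run
ET, ES = tot_T / trials, tot_S / trials
assert abs(ES - m * ET) < 0.05   # Wald: E[S_T] = m E[T], i.e. E[T] ~ 1/m = 5
print(ET)
```

Note that the remark above rules out running the same check for the symmetric walk: there E[T_1] = +∞, so the identity is vacuous.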

27 IV.3 The Convergence Theorem 27 IV.3 The Convergence Theorem (Ω, F, P), I = N 0, F = (F n ) n N0, X = (X n ) n N0. Fix a < b and consider upcrossings of X across (a, b) up to N: Time intervals S i < T i where the process travels from a or below to b or above. Formally: S 0 := T 0 := 0, (so X Sk a, X Tk b). Finally, S k (ω) := inf{n T k 1 (ω) X n (ω) a} = beginning of upcrossing k T k (ω) := inf{n S k (ω) X n (ω) b} = end of upcrossing k U N a,b(ω) := sup{k T k (ω) N} If X is adapted to F, the S k, T k are stopping times. Lemma IV.3.1. If X is a supermartingale, then Proof. Define E[U N a,b] 1 b a E[(X N a) ] H n := I {Sk <n T k }. k=1 Then H is predictable, because S k, T k are stopping times, and bounded, hence H is a playing system for X. Intuition: bet +1 during each upcrossing, 0 otherwise. Moreover, H 0, so H X is again a supermartingale, so 0 E[(H X) N X 0 ] where (H X) N X 0 = X Tk N X Sk N Ua,b N (b a) + X N X (SU N +1 ) N a,b k=1 If S U N a,b +1 < N, then X (SU N a,b +1 ) N a; if S U N a,b +1 N, then X (SU N a,b +1 ) N = X N. So This gives the assertion. X N X (SU N a,b +1 ) N (X N a) As N, U N a,b ր U a,b := total number of upcrossings of X through (a, b). Monotone integration gives for supermartingales RHS is finite if E[U a,b ] 1 b a sup E[(X N a) ] N N sup E[X N ] < (4) N N or, equivalently for a supermartingale, (X n ) n N0 is bounded in L 1. Supermartingale E[X 0 ] E[X n ] = E[X + n ] E[X n ] E[X n ] E[ X n ] = E[X + n ] + E[X N ] E[X 0] + 2E[X n ]
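A numerical illustration of the upcrossing inequality of Lemma IV.3.1, with arbitrary parameters: a random walk with downward drift (p = 0.45) is a supermartingale, and the empirical mean number of upcrossings of (a, b) up to time N stays below E[(X_N - a)^-]/(b - a).

```python
import random

random.seed(5)

p = 0.45                         # downward drift: the walk is a supermartingale
a, b, N, trials = -2.0, 2.0, 50, 20_000
sum_up = sum_neg = 0.0
for _ in range(trials):
    s, path = 0, [0]
    for _ in range(N):
        s += 1 if random.random() < p else -1
        path.append(s)
    # count upcrossings of (a, b): completed trips from <= a to >= b
    up, below = 0, path[0] <= a
    for v in path[1:]:
        if below and v >= b:
            up, below = up + 1, False
        elif not below and v <= a:
            below = True
    sum_up += up
    sum_neg += max(a - path[-1], 0.0)    # (X_N - a)^- = max(a - X_N, 0)
E_up = sum_up / trials
bound = (sum_neg / trials) / (b - a)
assert E_up <= bound + 0.02              # Lemma IV.3.1, up to Monte Carlo error
print(E_up, "<=", bound)
```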

28 IV.3 The Convergence Theorem 28 Theorem IV.3.2 (Convergence theorem, Doob). Any supermartingale X with sup N N E[X N ] < converges P-a.s. to some X L 1. In particular, any (super)martingale 0 converges P-a.s. Proof. {lim inf X n < limsup X n } {U a,b = + } a<b a,b Q But E[U a,b ] < by (4), a < b, so P[U a,b = + ] = 0 for each pair a < b P[liminf X n < limsup X n ] = 0: convergence P-a.s. Moreover, so X L 1. E[ X ] liminf E[ X n ] sup E[ X N ] < N N X = (X n ) (n N 0 ) is martingale wrt. F = (F n ) (n N 0 ). Suppose X is bounded in L 1 convergence theorem: X := lim X n P-a.s., and X L 1. Define F := σ( n N 0 F n ) X is F -measurable. Set N 0 := N 0 {+ }. Then (X n ) n N0 is adapted to (F N ) n N0 and integrable. Is this still a martingale on N 0? Theorem IV.3.3. For a martingale X = (X n ) n N0, TFAE: 1. Y L 1 with X n = E[Y F n ], n N 0 (i.e. the X n are successive predictions). 2. (X n ) n N0 converges in L 1 (to some F -measurable random variable). 3. X L 1 (P, F ) such that (X n ) n N0 is a martingale on N (X n ) n N0 is uniformly integrable. Moreover, we then have X = E[Y F ]. Remark (Terminology).: X as above is called closable on the right, and X closes (X n ) n N0 on the right as a martingale. Remark.: In the same way: X = (X n ) n N0 supermartingale is closable on the right as a supermartingale iff (X n ) n N 0 is UI. Proof. (1) (4) : Y L 1 {E[Y G] G F σ-field} is UI (see previous exercise). (4) (2) : (X n ) UI (X n ) bounded in L 1 (X n ) converges P-a.s. (X n ) converges in L 1, since it is UI. (2) (3) : X n X in L 1, A F n fixed X n+m I A X I A colon(m ) in L 1 E[X I A ] = lim m E[X n+mi A ] = lim m E[X ni A ] = E[X n I A ] A F n, n X n = E[X F n ] P-a.s. (3) (1) : Take Y := X.

29 IV.3 The Convergence Theorem 29 Finally, as in (2) (3), for all n, A F n, E[X I A ] = E[X n I A ] = E[Y I A ] hence this holds for all A n N 0 F n, this is -closed and generates F, so it still holds for A F X = E[Y F ] P-a.s. Application: generalize stopping theorem to unbounded stopping times in a different form. Indeed: M martingale E[M t F s ] = M s P-a.s. for s t, s and t deterministic. Goal: still true for s t by stopping times S T. Recall that Exercise: S T F S F T. F T := {A F A {T t} F t, t I} Moreover: I = N 0, A F T, T stopping time A {T = k} F k, k, because A {T = k} = A {T k} {T > k 1} = A {T k} {T k 1} F k. Theorem IV.3.4 (Stopping theorem II). Let M = (M n ) n N0 be a martingale and UI. If S, T are stopping times ( N 0 -valued) with S T, then M S, M T L 1 and E[M T F S ] = M S P-a.s. Proof. F S F T we prove only M T = E[M F T ] (and then use projectivity). Moreover, M T is F T -measurable only need E[M T I A ] = E[M I A ], A F T Write M = lim M I {T n} =: lim U n, M T = lim M TI {T n} =: lim V n. Then U n = M I {T=k}, V n = M k I {T=k}, k=0 k=0 and M k = E[M F k ] gives for A F T E[M I {T=k} I A ] = E[M I A {T=k} }{{} F k ] = E[M k I {T=k}I A ], so E[U n I A ] = E[V n I A ]. It remains to interchange E and lim n, for both U n, V n : 1. U n I A = M I {T n} I A M I A (n ) P-a.s., U n I A M, M L 1 can use Lebesgue. 2. V n I A = M T I {T n} I A M T I A (n ) P-a.s., V n I A M T can use Lebesgue as soon as M T L 1.

30 IV.4 Applications V n ր M T E[ M T ] = lim E[ V n ] lim n E[ M k I {T=k} ] M k = E[M F k ] M k E[ M F k ] E[ M k I {T=k} ] E[ M I {T=k} ] gives E[ M T ] lim = k=0 k=0 n E[ M I {T=k} ] E[ M I {T=k} ] k=0 [ ] = E M I {T=k} k=0 = E[ M ] <. IV.4 Applications IV.4.1 Simple Random Walk S n = n Y i simple random walk with p = 1 2 (symmetric). Then S = (S n) n N is a martingale, but cannot converge, because S n+1 S n = 1. Nevertheless, the convergence theorem is useful: For c Z, let T c := inf{n N 0 S n = c}. Then S Tc is a martingale and bounded above (if c > 0) or below (if c < 0), so the convergence theorem applies to S Tc : lim S T c n Z P-a.s. But this implies P[T c < ] = 1, c Z, and then P[lim inf S n =, limsup S n = + ] = 1 i.e. S oscillates unboundedly in both directions with probability 1. IV.4.2 Dirichlet Problem and Markov Chains (S, S) measure space, K kernel from (S, S) to (S, S). For h: S R measurable and 0 or bounded, define (Kh)(x) := h(y)k(x, dy), x S (again a measurable function). Call h harmonic if Kh = h. S Now fix A S and g: S R measurable and bounded. Dirichlet problem for (A, g): Find a function h such that

31 IV.4 Applications h is harmonic on A, i.e. Kh = h on A. 2. h = g on A. Convergent hom. Markov chain X = (X n ) n N0 with transition kernel K (see IV.1), constructed as coordinate process on Ω = S N0 ; write P x for corresponding distribution on (Ω, F), F = S N0, with initial distribution µ = δ {x}, x S. Define T A := inf{n N 0 X n A} as the first hitting time of X to A, and F n := σ(x 0,..., X n ). Theorem IV.4.1. Suppose P x [T A < ] = 1, x S (or x A). Then h(x) := E x [g(x TA )] is the unique bounded solution of the Dirichlet problem for (A, g) (can then use a numerical simulation). Proof. Uniqueness: write T := T A and suppose f is a solution. Then M n := f(x T n ) is a martingale under P x, x S, and bounded: x A T = T A 0 M n f(x). x A: E x [M n+1 F n ] = E x [f(x T (n+1) )(I {T n} + I {T>n} ) F n ] = I {T n} f(x T n ) + I {T>n} E x [f(x n+1 ) F n ] = I {T n} f(x T n ) + I {T>n} (Kf)(X n ) and finally, because x A, T = T A > n and f harmonic on A: E x [M n+1 F n ] = I {T n} f(x T n ) + I {T>n} f(x n ) = f(x T n ) = M n M bounded, so by the stopping theorem, f(x) = E x [M 0 ] = E x [M T ] = E x [f(x TA )] = E x[g(x TA )]. Existence: h is bounded, x A T A = 0 P x -a.s. h(x) = E x [g(x TA )] = g(x), so h = g on A. To show that h is harmonic, we use the Markov property: define shift θ: Ω Ω by Then (x 0, x 1, x 2,...) = ω θω := (x 1, x 2, x 3,...) E x [b(x) θ F 1 ](ω) = E X1(ω)[b(X)] := E z [b(x)] z=x1(ω) for any b: Ω R measurable and bounded or 0. So: (Kh)(x) = h(y)k(x, dy) S [ ] = E x [h(x 1 )] = E x E X1 [g(x )] TA ] ] = E x [E X1 [g(x ) θ F TA 1] = E x [g(x ) θ TA

But for x ∉ A:

    T = inf{n ∈ N_0 | X_n ∈ A} = inf{n ≥ 1 | X_n ∈ A} = inf{n ≥ 0 | (X∘θ)_n ∈ A} + 1 = T∘θ + 1,

so g(X_T)∘θ = g(X_T) P_x-a.s. for x ∉ A. So (Kh)(x) = E_x[g(X_T)] = h(x).

IV.4.3 Unfair Games

The symmetric simple random walk (S_n) admits a simple winning strategy: T_1 := inf{n ∈ N_0 | S_n = +1} < ∞ P-a.s., so simply bet on +1 and wait until this happens. Two problems: you might starve, because E[T_1] = +∞; and you need a really generous sponsor, because S^{T_1} is not bounded below (indeed: if it were, then S^{T_1} would be bounded, since also S^{T_1} ≤ 1; but then S^{T_1} would be a bounded martingale, giving 1 = E[S_{T_1}] = E[S_0] = 0, a contradiction). So the above strategy is not practically feasible — but in theory it is ok.

More realistic situation: call (V_n)_{n∈N_0} the balance evolution in a game. Assume:

a) V_n ≥ 0 (no debts allowed)

b) E[V_n] < ∞ and E[V_{n+1} | F_n] ≤ V_n (unfair game; here F_n = σ(V_0, ..., V_n))

c) there is δ > 0 with either V_{n+1} = V_n or |V_{n+1} - V_n| ≥ δ P-a.s. (minimal gain or loss of δ in each round played)

Terminology: we play in round n if |V_{n+1} - V_n| ≥ δ. Now let

    T := sup{n ∈ N_0 | |V_n - V_{n-1}| ≥ δ} = number of the last round in which we play.

Theorem IV.4.2. Under the above assumptions:

1) T < ∞ P-a.s. (we have to stop playing in finite time)

2) E[V_T] ≤ E[V_0]

Proof. By b) and a), V is a supermartingale and ≥ 0, so (V_n) converges P-a.s. Hence

    P[|V_n - V_{n-1}| ≥ δ infinitely often] = 0,

and this says exactly that T < ∞ P-a.s. Moreover, since V_n = V_T on {T ≤ n},

    E[V_T] = E[lim_{n→∞} V_T I_{T≤n}] = lim_{n→∞} E[V_T I_{T≤n}] = lim_{n→∞} E[V_n I_{T≤n}]

33 IV.4 Applications 33 and because V n I {T n} V n, so E[V T ] lim E[V n] E[V 0 ]. IV.4.4 Martingales with Bounded Increments Theorem IV.4.3. Suppose M is a martingale with respect to F such that sup n N M n := sup M n M n 1 L 1 (P) n N Then with probability 1, the trajectories of M are either convergent (to a finite limit) or oscillating between and +, i.e. with we have P[C O] = 1. C := {ω lim M n(ω) =: M (ω) R} O := {ω liminf M n(ω) =, limsup M n (ω) = + } Proof. Fix a < 0, a Z. Define T a := inf{n N 0 M n a}. Then M Ta because is a martingale to which we can apply the convergence theorem, { Mn Ta = M = M 0 if M 0 a T a n a sup M n if M 0 > a and so sup n E[(Mn Ta) ] <. So M Ta converges P-a.s. to some finite limit, so M n converges to some finite limit P-a.s. on {T a = } and so: P[C {liminf M n > }] P[C {T a = }] = 0 }{{} a Z 0 Analogously so the assertion follows. P[C {limsupm n < }] = 0 Corollary IV.4.4 (generalized Borel-Cantelli). F = (F n ) n N0 filtration, A n F n n, A = n N m n A m = many of the A n occur. Then { } A = P[A n F n 1 ] = + n=1 P-a.s. (5) Classical case is special case: If A n independent of F n 1, n, then P[A n F n 1 ] = P[A n ] and so (5) becomes { Ω if n=1 A = P[A n] = + if n=1 P[A n] <

34 IV.4 Applications 34 Proof. M n = n (I Ak P[A k F k 1 ]) k=1 is martingale with sup n M n 2 L 1 (P) can apply previous theorem. Now on C, we have I A = 1 def I Ak = + k=1 on C P[A k F k 1 ] = + k=1 and on O, we must have k=1 I A k = + and k=1 P[A k F k 1 ] = +. Because P[C O] = 1, the assertion follows. Example.: Consider MC X = (X n ) n N0 with state space S = {0,..., N} and transition kernel K(x, y) := K(x, {y}). Assume: (i) For x {1,...,N 1}, K(x, x) 1 (interior points are not absorbing) (ii) y S yk(x, y) = x, x S (the function x h(x) = x is harmonic for K) Choose x = 0 or x = N to get from (ii) that K(0, 0) = K(N, N) = 1: endpoints are absorbing. Denote by T := inf{n N 0 X n = 0 or X n = N} the time to absorption. Then for all x S, a) P x [T < ] = 1. b) P x [X T = N] = x N. Proof. X n = h(x n ) is by (ii) a martingale (under P x ) and bounded, so it converges P x -a.s. to some limit. This means that X n (ω) const. for n n 0 (ω), P x -a.s. Moreover, on {X n = y} for 0 < y < N so P[X n+1 X n F n ] = 1 K(y, y) 0 P[X n+1 X n F n ] = + n=1 therefore also X n+1 X n -often (P x -a.s.). So P x -a.s. the limit X cannot be a y with 0 < y < N T < P x -a.s. And now: x = E x [X 0 ] = E x [X T ] = 0 P x [X T = 0] + N P x [X T = N], solve for P x [X T = N] and the result follows. Example (voter model).: N people, each of these can vote for one of two parties. Behaviour: if party 1 at last election had x votes, then each person votes for 1 with probability x N, independently of each other.
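The voter model can be simulated directly. The sketch below (with assumed sizes N = 20 and x_0 = 7 initial votes for party 1) checks the martingale prediction from the preceding example: the vote count X_n is a bounded martingale absorbed at 0 or N, so consensus for party 1 is reached with probability x_0/N.

```python
import random

random.seed(4)

N, x0, trials = 20, 7, 10_000     # electorate size and initial votes (assumed)
wins = 0
for _ in range(trials):
    x = x0
    while 0 < x < N:
        # each voter independently votes for party 1 with probability x/N,
        # so X_{n+1} ~ Binomial(N, X_n/N) and E[X_{n+1} | X_n] = X_n
        x = sum(random.random() < x / N for _ in range(N))
    wins += (x == N)
# martingale argument: P[consensus for party 1] = x0 / N
assert abs(wins / trials - x0 / N) < 0.02
print(wins / trials)
```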


More information

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R.

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R. Ergodic Theorems Samy Tindel Purdue University Probability Theory 2 - MA 539 Taken from Probability: Theory and examples by R. Durrett Samy T. Ergodic theorems Probability Theory 1 / 92 Outline 1 Definitions

More information

Problem Sheet 1. You may assume that both F and F are σ-fields. (a) Show that F F is not a σ-field. (b) Let X : Ω R be defined by 1 if n = 1

Problem Sheet 1. You may assume that both F and F are σ-fields. (a) Show that F F is not a σ-field. (b) Let X : Ω R be defined by 1 if n = 1 Problem Sheet 1 1. Let Ω = {1, 2, 3}. Let F = {, {1}, {2, 3}, {1, 2, 3}}, F = {, {2}, {1, 3}, {1, 2, 3}}. You may assume that both F and F are σ-fields. (a) Show that F F is not a σ-field. (b) Let X :

More information

Dynkin (λ-) and π-systems; monotone classes of sets, and of functions with some examples of application (mainly of a probabilistic flavor)

Dynkin (λ-) and π-systems; monotone classes of sets, and of functions with some examples of application (mainly of a probabilistic flavor) Dynkin (λ-) and π-systems; monotone classes of sets, and of functions with some examples of application (mainly of a probabilistic flavor) Matija Vidmar February 7, 2018 1 Dynkin and π-systems Some basic

More information

4 Sums of Independent Random Variables

4 Sums of Independent Random Variables 4 Sums of Independent Random Variables Standing Assumptions: Assume throughout this section that (,F,P) is a fixed probability space and that X 1, X 2, X 3,... are independent real-valued random variables

More information

1 Stochastic Dynamic Programming

1 Stochastic Dynamic Programming 1 Stochastic Dynamic Programming Formally, a stochastic dynamic program has the same components as a deterministic one; the only modification is to the state transition equation. When events in the future

More information

Lecture 3 - Expectation, inequalities and laws of large numbers

Lecture 3 - Expectation, inequalities and laws of large numbers Lecture 3 - Expectation, inequalities and laws of large numbers Jan Bouda FI MU April 19, 2009 Jan Bouda (FI MU) Lecture 3 - Expectation, inequalities and laws of large numbersapril 19, 2009 1 / 67 Part

More information

Lecture 5. 1 Chung-Fuchs Theorem. Tel Aviv University Spring 2011

Lecture 5. 1 Chung-Fuchs Theorem. Tel Aviv University Spring 2011 Random Walks and Brownian Motion Tel Aviv University Spring 20 Instructor: Ron Peled Lecture 5 Lecture date: Feb 28, 20 Scribe: Yishai Kohn In today's lecture we return to the Chung-Fuchs theorem regarding

More information

Lecture 2: Random Variables and Expectation

Lecture 2: Random Variables and Expectation Econ 514: Probability and Statistics Lecture 2: Random Variables and Expectation Definition of function: Given sets X and Y, a function f with domain X and image Y is a rule that assigns to every x X one

More information

Markov processes Course note 2. Martingale problems, recurrence properties of discrete time chains.

Markov processes Course note 2. Martingale problems, recurrence properties of discrete time chains. Institute for Applied Mathematics WS17/18 Massimiliano Gubinelli Markov processes Course note 2. Martingale problems, recurrence properties of discrete time chains. [version 1, 2017.11.1] We introduce

More information

Lecture 19 L 2 -Stochastic integration

Lecture 19 L 2 -Stochastic integration Lecture 19: L 2 -Stochastic integration 1 of 12 Course: Theory of Probability II Term: Spring 215 Instructor: Gordan Zitkovic Lecture 19 L 2 -Stochastic integration The stochastic integral for processes

More information

Notes 1 : Measure-theoretic foundations I

Notes 1 : Measure-theoretic foundations I Notes 1 : Measure-theoretic foundations I Math 733-734: Theory of Probability Lecturer: Sebastien Roch References: [Wil91, Section 1.0-1.8, 2.1-2.3, 3.1-3.11], [Fel68, Sections 7.2, 8.1, 9.6], [Dur10,

More information

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015 Part IA Probability Definitions Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

More information

Gaussian vectors and central limit theorem

Gaussian vectors and central limit theorem Gaussian vectors and central limit theorem Samy Tindel Purdue University Probability Theory 2 - MA 539 Samy T. Gaussian vectors & CLT Probability Theory 1 / 86 Outline 1 Real Gaussian random variables

More information

Probability and Measure

Probability and Measure Chapter 4 Probability and Measure 4.1 Introduction In this chapter we will examine probability theory from the measure theoretic perspective. The realisation that measure theory is the foundation of probability

More information

Exercises in stochastic analysis

Exercises in stochastic analysis Exercises in stochastic analysis Franco Flandoli, Mario Maurelli, Dario Trevisan The exercises with a P are those which have been done totally or partially) in the previous lectures; the exercises with

More information

MATH 418: Lectures on Conditional Expectation

MATH 418: Lectures on Conditional Expectation MATH 418: Lectures on Conditional Expectation Instructor: r. Ed Perkins, Notes taken by Adrian She Conditional expectation is one of the most useful tools of probability. The Radon-Nikodym theorem enables

More information

Building Infinite Processes from Finite-Dimensional Distributions

Building Infinite Processes from Finite-Dimensional Distributions Chapter 2 Building Infinite Processes from Finite-Dimensional Distributions Section 2.1 introduces the finite-dimensional distributions of a stochastic process, and shows how they determine its infinite-dimensional

More information

Notes 18 : Optional Sampling Theorem

Notes 18 : Optional Sampling Theorem Notes 18 : Optional Sampling Theorem Math 733-734: Theory of Probability Lecturer: Sebastien Roch References: [Wil91, Chapter 14], [Dur10, Section 5.7]. Recall: DEF 18.1 (Uniform Integrability) A collection

More information

STAT331 Lebesgue-Stieltjes Integrals, Martingales, Counting Processes

STAT331 Lebesgue-Stieltjes Integrals, Martingales, Counting Processes STAT331 Lebesgue-Stieltjes Integrals, Martingales, Counting Processes This section introduces Lebesgue-Stieltjes integrals, and defines two important stochastic processes: a martingale process and a counting

More information

STAT 7032 Probability Spring Wlodek Bryc

STAT 7032 Probability Spring Wlodek Bryc STAT 7032 Probability Spring 2018 Wlodek Bryc Created: Friday, Jan 2, 2014 Revised for Spring 2018 Printed: January 9, 2018 File: Grad-Prob-2018.TEX Department of Mathematical Sciences, University of Cincinnati,

More information

Stochastic integration. P.J.C. Spreij

Stochastic integration. P.J.C. Spreij Stochastic integration P.J.C. Spreij this version: April 22, 29 Contents 1 Stochastic processes 1 1.1 General theory............................... 1 1.2 Stopping times...............................

More information

1. Probability Measure and Integration Theory in a Nutshell

1. Probability Measure and Integration Theory in a Nutshell 1. Probability Measure and Integration Theory in a Nutshell 1.1. Measurable Space and Measurable Functions Definition 1.1. A measurable space is a tuple (Ω, F) where Ω is a set and F a σ-algebra on Ω,

More information

1 Probability space and random variables

1 Probability space and random variables 1 Probability space and random variables As graduate level, we inevitably need to study probability based on measure theory. It obscures some intuitions in probability, but it also supplements our intuition,

More information

Selected Exercises on Expectations and Some Probability Inequalities

Selected Exercises on Expectations and Some Probability Inequalities Selected Exercises on Expectations and Some Probability Inequalities # If E(X 2 ) = and E X a > 0, then P( X λa) ( λ) 2 a 2 for 0 < λ

More information

Monte-Carlo MMD-MA, Université Paris-Dauphine. Xiaolu Tan

Monte-Carlo MMD-MA, Université Paris-Dauphine. Xiaolu Tan Monte-Carlo MMD-MA, Université Paris-Dauphine Xiaolu Tan tan@ceremade.dauphine.fr Septembre 2015 Contents 1 Introduction 1 1.1 The principle.................................. 1 1.2 The error analysis

More information

Lecture 3: Expected Value. These integrals are taken over all of Ω. If we wish to integrate over a measurable subset A Ω, we will write

Lecture 3: Expected Value. These integrals are taken over all of Ω. If we wish to integrate over a measurable subset A Ω, we will write Lecture 3: Expected Value 1.) Definitions. If X 0 is a random variable on (Ω, F, P), then we define its expected value to be EX = XdP. Notice that this quantity may be. For general X, we say that EX exists

More information

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities PCMI 207 - Introduction to Random Matrix Theory Handout #2 06.27.207 REVIEW OF PROBABILITY THEORY Chapter - Events and Their Probabilities.. Events as Sets Definition (σ-field). A collection F of subsets

More information

Stochastic Processes II/ Wahrscheinlichkeitstheorie III. Lecture Notes

Stochastic Processes II/ Wahrscheinlichkeitstheorie III. Lecture Notes BMS Basic Course Stochastic Processes II/ Wahrscheinlichkeitstheorie III Michael Scheutzow Lecture Notes Technische Universität Berlin Sommersemester 218 preliminary version October 12th 218 Contents

More information

{σ x >t}p x. (σ x >t)=e at.

{σ x >t}p x. (σ x >t)=e at. 3.11. EXERCISES 121 3.11 Exercises Exercise 3.1 Consider the Ornstein Uhlenbeck process in example 3.1.7(B). Show that the defined process is a Markov process which converges in distribution to an N(0,σ

More information

Universal examples. Chapter The Bernoulli process

Universal examples. Chapter The Bernoulli process Chapter 1 Universal examples 1.1 The Bernoulli process First description: Bernoulli random variables Y i for i = 1, 2, 3,... independent with P [Y i = 1] = p and P [Y i = ] = 1 p. Second description: Binomial

More information

Probability Theory II. Spring 2016 Peter Orbanz

Probability Theory II. Spring 2016 Peter Orbanz Probability Theory II Spring 2016 Peter Orbanz Contents Chapter 1. Martingales 1 1.1. Martingales indexed by partially ordered sets 1 1.2. Martingales from adapted processes 4 1.3. Stopping times and

More information

Stochastic Processes. Winter Term Paolo Di Tella Technische Universität Dresden Institut für Stochastik

Stochastic Processes. Winter Term Paolo Di Tella Technische Universität Dresden Institut für Stochastik Stochastic Processes Winter Term 2016-2017 Paolo Di Tella Technische Universität Dresden Institut für Stochastik Contents 1 Preliminaries 5 1.1 Uniform integrability.............................. 5 1.2

More information

Useful Probability Theorems

Useful Probability Theorems Useful Probability Theorems Shiu-Tang Li Finished: March 23, 2013 Last updated: November 2, 2013 1 Convergence in distribution Theorem 1.1. TFAE: (i) µ n µ, µ n, µ are probability measures. (ii) F n (x)

More information

X n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2)

X n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2) 14:17 11/16/2 TOPIC. Convergence in distribution and related notions. This section studies the notion of the so-called convergence in distribution of real random variables. This is the kind of convergence

More information

conditional cdf, conditional pdf, total probability theorem?

conditional cdf, conditional pdf, total probability theorem? 6 Multiple Random Variables 6.0 INTRODUCTION scalar vs. random variable cdf, pdf transformation of a random variable conditional cdf, conditional pdf, total probability theorem expectation of a random

More information

Harmonic functions on groups

Harmonic functions on groups 20 10 Harmonic functions on groups 0 DRAFT - updated June 19, 2018-10 Ariel Yadin -20-30 Disclaimer: These notes are preliminary, and may contain errors. Please send me any comments or corrections. -40

More information

Lecture 21 Representations of Martingales

Lecture 21 Representations of Martingales Lecture 21: Representations of Martingales 1 of 11 Course: Theory of Probability II Term: Spring 215 Instructor: Gordan Zitkovic Lecture 21 Representations of Martingales Right-continuous inverses Let

More information

Modern Discrete Probability Branching processes

Modern Discrete Probability Branching processes Modern Discrete Probability IV - Branching processes Review Sébastien Roch UW Madison Mathematics November 15, 2014 1 Basic definitions 2 3 4 Galton-Watson branching processes I Definition A Galton-Watson

More information

ABSTRACT EXPECTATION

ABSTRACT EXPECTATION ABSTRACT EXPECTATION Abstract. In undergraduate courses, expectation is sometimes defined twice, once for discrete random variables and again for continuous random variables. Here, we will give a definition

More information

1 Presessional Probability

1 Presessional Probability 1 Presessional Probability Probability theory is essential for the development of mathematical models in finance, because of the randomness nature of price fluctuations in the markets. This presessional

More information

6. Brownian Motion. Q(A) = P [ ω : x(, ω) A )

6. Brownian Motion. Q(A) = P [ ω : x(, ω) A ) 6. Brownian Motion. stochastic process can be thought of in one of many equivalent ways. We can begin with an underlying probability space (Ω, Σ, P) and a real valued stochastic process can be defined

More information

5 Birkhoff s Ergodic Theorem

5 Birkhoff s Ergodic Theorem 5 Birkhoff s Ergodic Theorem Birkhoff s Ergodic Theorem extends the validity of Kolmogorov s strong law to the class of stationary sequences of random variables. Stationary sequences occur naturally even

More information

Stochastic Models (Lecture #4)

Stochastic Models (Lecture #4) Stochastic Models (Lecture #4) Thomas Verdebout Université libre de Bruxelles (ULB) Today Today, our goal will be to discuss limits of sequences of rv, and to study famous limiting results. Convergence

More information

Fundamental Inequalities, Convergence and the Optional Stopping Theorem for Continuous-Time Martingales

Fundamental Inequalities, Convergence and the Optional Stopping Theorem for Continuous-Time Martingales Fundamental Inequalities, Convergence and the Optional Stopping Theorem for Continuous-Time Martingales Prakash Balachandran Department of Mathematics Duke University April 2, 2008 1 Review of Discrete-Time

More information

Exercises: sheet 1. k=1 Y k is called compound Poisson process (X t := 0 if N t = 0).

Exercises: sheet 1. k=1 Y k is called compound Poisson process (X t := 0 if N t = 0). Exercises: sheet 1 1. Prove: Let X be Poisson(s) and Y be Poisson(t) distributed. If X and Y are independent, then X + Y is Poisson(t + s) distributed (t, s > 0). This means that the property of a convolution

More information

9 Radon-Nikodym theorem and conditioning

9 Radon-Nikodym theorem and conditioning Tel Aviv University, 2015 Functions of real variables 93 9 Radon-Nikodym theorem and conditioning 9a Borel-Kolmogorov paradox............. 93 9b Radon-Nikodym theorem.............. 94 9c Conditioning.....................

More information

1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989),

1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989), Real Analysis 2, Math 651, Spring 2005 April 26, 2005 1 Real Analysis 2, Math 651, Spring 2005 Krzysztof Chris Ciesielski 1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer

More information

7 Convergence in R d and in Metric Spaces

7 Convergence in R d and in Metric Spaces STA 711: Probability & Measure Theory Robert L. Wolpert 7 Convergence in R d and in Metric Spaces A sequence of elements a n of R d converges to a limit a if and only if, for each ǫ > 0, the sequence a

More information

THEOREMS, ETC., FOR MATH 515

THEOREMS, ETC., FOR MATH 515 THEOREMS, ETC., FOR MATH 515 Proposition 1 (=comment on page 17). If A is an algebra, then any finite union or finite intersection of sets in A is also in A. Proposition 2 (=Proposition 1.1). For every

More information

CHAPTER 3: LARGE SAMPLE THEORY

CHAPTER 3: LARGE SAMPLE THEORY CHAPTER 3 LARGE SAMPLE THEORY 1 CHAPTER 3: LARGE SAMPLE THEORY CHAPTER 3 LARGE SAMPLE THEORY 2 Introduction CHAPTER 3 LARGE SAMPLE THEORY 3 Why large sample theory studying small sample property is usually

More information