Wahrscheinlichkeitstheorie Prof. Schweizer Additions to the Script


Thomas Rast, WS 06/07

Warning: We are sure there are lots of mistakes in these notes. Use at your own risk! Corrections and other feedback would be greatly appreciated. If you report an error, please always state which version (the first number on the Id line below) you found it in.

Introduction

This was going to be a short collection of all chapters explicitly mentioned as not in the script, but as the lecture turned out to use different notation almost everywhere, I started typing it all. So the first few sections are just additions to the script, and more importantly, many small additions have been left out deliberately; if you think they should contain the complete notes, feel free to copy someone's hand-written lecture notes and TeX them! Starting with Martingales, however, the notes should be complete, and any missing subsections would be an error.

Contents

II.2 Construction of (Discrete-Time) Stochastic Processes
II.3 Kolmogorov's Consistency Theorem
III Conditional Expectation
III.1 Definition and Construction
III.2 Properties of Conditional Expectations
III.3 Regular Conditional Distributions
IV Martingales
IV.1 Definitions and Examples
IV.2 Playing Systems, Stopping Times and Stopping Theorem

IV.3 The Convergence Theorem
IV.4 Applications
IV.5 Branching Processes
IV.6 Supermartingales and Inequalities
IV.7 Differentiation of Measures, Radon-Nikodým
V Weak Convergence of Probability Measures
V.1 Integration à la Daniell-Stone
V.2 Weak Convergence of Probability Measures on Metric Spaces
V.3 Tightness and Prohorov's Theorem
V.4 Weak Convergence on C[0, 1]
V.5 Characteristic functions

$Id$

II.2 Construction of (Discrete-Time) Stochastic Processes

Goal: construct a model for a system with state space $(S, \mathcal{S})$, possibly very many coordinates $\lambda \in \Lambda$, and describing stochastic behaviour. First we look for necessary conditions for such a model.

Start with $(S, \mathcal{S})$ measurable, $\Lambda$ an index set, $(\Omega, \mathcal{F}, P)$ a probability space and $Y = (Y_\lambda)_{\lambda \in \Lambda}$ a stochastic process (a family of random variables indexed by $\Lambda$ with state space $S$): each $Y_\lambda$ is an $S$-valued random variable, and all are defined on $\Omega$, i.e. $Y_\lambda \colon \Omega \to S$ is measurable ($\mathcal{F}$-$\mathcal{S}$) for all $\lambda \in \Lambda$.

What is the distribution of $Y$ under $P$, and what properties does it have? View $Y$ as a mapping from $\Omega$ to the product space
$$S^\Lambda := \{x = (x_\lambda)_{\lambda \in \Lambda} \mid x_\lambda \in S \ \forall \lambda\}$$
with coordinate maps $X_\lambda \colon S^\Lambda \to S$, $x \mapsto X_\lambda(x) := x_\lambda$, and $\sigma$-field
$$\mathcal{S}^\Lambda := \sigma(X_\lambda,\ \lambda \in \Lambda) = \sigma\big(\{\{X_\lambda \in A\} \mid \lambda \in \Lambda,\ A \in \mathcal{S}\}\big),$$
so each $X_\lambda$ is an $S$-valued random variable on $(S^\Lambda, \mathcal{S}^\Lambda)$. (Note that for $\Lambda$ uncountable, $\mathcal{S}^\Lambda$ is considerably smaller than the product $\sigma$-field $\prod_{\lambda \in \Lambda} \mathcal{S}$.)

A $\cap$-closed generator of $\mathcal{S}^\Lambda$ is given by the family $\mathcal{Z}$ of cylinder sets
$$Z \in \mathcal{Z}: \quad Z = \prod_{i \in I} A_i \times \prod_{\lambda \in \Lambda \setminus I} S = \{x \mid x_i \in A_i,\ i \in I\}, \qquad A_i \in \mathcal{S},\ I \subseteq \Lambda \text{ finite.}$$
Canonical identification of $Z \in \mathcal{S}^\Lambda$ with $\prod_{i \in I} A_i \in \mathcal{S}^I$.

The distribution of $Y$ under $P$ is $Q := P \circ Y^{-1}$ on $(S^\Lambda, \mathcal{S}^\Lambda)$. In the same way, we can introduce the distribution $Q^{(I)}$ of $(Y_i)_{i \in I}$ under $P$: view (for $I \subseteq \Lambda$ finite) $(Y_i)_{i \in I}$ as a mapping $\Omega \to S^I$ (with $\sigma$-field $\mathcal{S}^I = \bigotimes_{i \in I} \mathcal{S}$, since $I$ is finite) and denote by $Q^{(I)}$ the image of $P$ under that map. $Q^{(I)}$ is called the marginal distribution of $(Y_i)_{i \in I}$ or of $Y$ on $S^I$. $\{Q^{(I)} \mid I \subseteq \Lambda \text{ finite}\}$ is the system of (finite-dimensional) marginal distributions of $Y$ under $P$.

The $Q^{(I)}$ have a consistent (or projective) structure: take $I \subseteq J \subseteq \Lambda$ finite and call $\pi^J_I$ the projection from $S^J$ to $S^I$. Then
$$Q^{(I)} = Q^{(J)} \circ (\pi^J_I)^{-1}.$$

Formally, $\mathcal{S}^I$ is generated by the cylinders $\prod_{i \in I} A_i$, and
$$Q^{(I)}\Big[\prod_{i \in I} A_i\Big] \overset{\text{def}}{=} P[Y_i \in A_i,\ i \in I] = P[Y_i \in A_i,\ i \in I, \text{ and } Y_j \in S,\ j \in J \setminus I] = Q^{(J)}\Big[(\pi^J_I)^{-1}\Big(\prod_{i \in I} A_i\Big)\Big] = \big(Q^{(J)} \circ (\pi^J_I)^{-1}\big)\Big[\prod_{i \in I} A_i\Big].$$

For describing the behaviour of $Y$, it is equivalent to know $P$ or $Q$. How would one construct a model $(\Omega, \mathcal{F}, P)$ and $Y$ with given probabilistic behaviour? Canonical choice: $\Omega := S^\Lambda$, $\mathcal{F} := \mathcal{S}^\Lambda$, $Y := X$ (coordinate maps); then we only need to construct $P$ (or $Q$). There are two basic cases for the "given probabilistic behaviour":

1. Prescribe all finite-dimensional marginals: given is a collection $Q^{(I)}$, $I \subseteq \Lambda$ finite, of probability measures on $(S^I, \mathcal{S}^I)$; look for $P$ on $(\Omega, \mathcal{F})$ such that the projection of $P$ to $(S^I, \mathcal{S}^I)$ agrees with $Q^{(I)}$ for all $I \subseteq \Lambda$ finite. Interpretation: for each finite subsystem, we prescribe a local model, and we look for a global model with the given local behaviour. Problems: existence? uniqueness of $P$? A necessary condition for existence is that the $Q^{(I)}$ are consistent. If $S$ is nice, this is also sufficient, see II.3. Uniqueness is easy, see the lemma below.

2. Prescribe an initial distribution and successive transition probabilities: this is typical for temporal evolution in discrete time, with $\Lambda$ countable, indexed as $\mathbb{N}_0$. More precisely, suppose $(S_i, \mathcal{S}_i)$ are measurable spaces, $i \in \mathbb{N}_0$. (Idea: at each time $\lambda = i \in \mathbb{N}_0$, we can allow a different state space.) Define
$$(S^n, \mathcal{S}^n) := \Big(\prod_{i=0}^n S_i,\ \bigotimes_{i=0}^n \mathcal{S}_i\Big).$$
This approach is elaborated in the script, section II.2.

Lemma II.2.1. A probability measure $P$ on $(S^\Lambda, \mathcal{S}^\Lambda)$ is uniquely determined by its finite-dimensional marginals: for any consistent family $\{Q^{(I)} \mid I \subseteq \Lambda \text{ finite}\}$ of finite-dimensional marginals, there is at most one $P$ on $(S^\Lambda, \mathcal{S}^\Lambda)$ with these marginals, i.e. with $P|_{(S^I, \mathcal{S}^I)} = Q^{(I)}$ for all $I \subseteq \Lambda$ finite.
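The projectivity relation $Q^{(I)} = Q^{(J)} \circ (\pi^J_I)^{-1}$ can be seen concretely for finite state spaces, where projecting means summing out the discarded coordinates. A small sketch (the joint law `Q_J` and all helper names are our own choices, not from the script):

```python
# Check of the consistency (projectivity) relation Q^(I) = Q^(J) ∘ (π^J_I)^{-1}
# for a hand-made joint pmf on S^J with S = {0, 1} and J = {0, 1, 2}.
from itertools import product

S = [0, 1]
J = (0, 1, 2)

# an arbitrary pmf Q^(J) on S^J (weights sum to 1)
Q_J = dict(zip(product(S, repeat=3),
               [0.1, 0.05, 0.2, 0.05, 0.1, 0.2, 0.1, 0.2]))

def marginal(Q, J, I):
    """Image of the pmf Q under the projection π^J_I: sum out coordinates not in I."""
    pos = [J.index(i) for i in I]
    out = {}
    for x, w in Q.items():
        key = tuple(x[p] for p in pos)
        out[key] = out.get(key, 0.0) + w
    return out

Q_01 = marginal(Q_J, J, (0, 1))   # Q^({0,1})
Q_0 = marginal(Q_J, J, (0,))      # Q^({0})

# projectivity: projecting J -> {0,1} -> {0} agrees with projecting J -> {0} directly
Q_0_via_01 = marginal(Q_01, (0, 1), (0,))
assert all(abs(Q_0_via_01[k] - Q_0[k]) < 1e-9 for k in Q_0)
```

Any family of marginals produced this way from one joint law is automatically consistent; the content of II.3 is the converse direction.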

Proof. Almost obvious: for any $Z \in \mathcal{Z}$, $Z = \big(\prod_{i \in I} A_i\big) \times \big(\prod_{\lambda \notin I} S\big)$, we have $P[Z] = Q^{(I)}\big[\prod_{i \in I} A_i\big]$, so $P$ is determined on $\mathcal{Z}$, and $\mathcal{Z}$ is $\cap$-closed and generates $\mathcal{S}^\Lambda$.

II.3 Kolmogorov's Consistency Theorem

Goal: construct a model for stochastic processes with an arbitrary index set. In preparation, a general comment on the structure of probability measures on product spaces: let $(S_i, \mathcal{S}_i)$, $i = 1, 2$, be measurable spaces, $P_1$ a probability measure on $(S_1, \mathcal{S}_1)$, $K$ a transition kernel from $(S_1, \mathcal{S}_1)$ to $(S_2, \mathcal{S}_2)$; this yields $P = P_1 \otimes K$ on $(S_1 \times S_2, \mathcal{S}_1 \otimes \mathcal{S}_2)$. Conversely, in nice situations every probability measure $Q$ on $(S_1 \times S_2, \mathcal{S}_1 \otimes \mathcal{S}_2)$ has this structure.

Definition. $S$ is a Polish space if it is metrizable with a metric $d$ such that $(S, d)$ is a complete and separable metric space. $(S, \mathcal{S})$ is a Borel space if there exist $A \in \mathcal{B}(\mathbb{R})$ and $\varphi\colon S \to A$ such that $\varphi$ is a bijection and $\varphi, \varphi^{-1}$ are both measurable.

Proposition II.3.1. Let $(S_1, \mathcal{S}_1)$ be a measurable space and $S_2$ a Polish space with $\mathcal{S}_2 = \mathcal{B}(S_2)$. Then every probability measure $Q$ on $(S_1 \times S_2, \mathcal{S}_1 \otimes \mathcal{S}_2)$ has the form $Q = Q_1 \otimes K$ for a probability measure $Q_1$ on $(S_1, \mathcal{S}_1)$ and a kernel $K$ from $(S_1, \mathcal{S}_1)$ to $(S_2, \mathcal{S}_2)$.

Remark: This is also true if $(S_2, \mathcal{S}_2)$ is Borel.

Proof. Let $X_i$ be the coordinate maps, $i = 1, 2$. Choose $Q_1 := Q \circ X_1^{-1}$, the distribution of $X_1$ under $Q$, and $K$ a regular conditional distribution of $X_2$ given $X_1$; see later.

In special cases, we can construct $K$ explicitly:

Example: Suppose $S_1$ is countable, $\mathcal{S}_1 = 2^{S_1}$. Then define
$$K(x_1, \cdot) := \begin{cases} Q[X_2 \in \cdot \mid X_1 = x_1] & \text{if } Q[\{X_1 = x_1\}] \neq 0, \\ \text{any p.m. on } (S_2, \mathcal{S}_2) & \text{if } Q[\{X_1 = x_1\}] = 0. \end{cases}$$
$K$ is a kernel, and $Q = Q_1 \otimes K$: indeed,
$$Q[A_1 \times A_2] = \sum_{x_1 \in A_1} Q[X_1 = x_1,\ X_2 \in A_2] = \sum_{\substack{x_1 \in A_1 \\ Q[\{X_1 = x_1\}] \neq 0}} Q[X_1 = x_1]\, Q[X_2 \in A_2 \mid X_1 = x_1] = \sum_{x_1 \in A_1} Q_1[\{x_1\}]\, K(x_1, A_2) = (Q_1 \otimes K)(A_1 \times A_2).$$

Example: Suppose $Q$ is absolutely continuous with respect to a product measure on $(S_1 \times S_2, \mathcal{S}_1 \otimes \mathcal{S}_2)$, i.e. there are measures $m_i$ on $(S_i, \mathcal{S}_i)$ with $Q \ll m_1 \otimes m_2$, i.e.
$$Q[A] = \int_A f(x_1, x_2)\, m_1(dx_1)\, m_2(dx_2), \qquad A \in \mathcal{S}_1 \otimes \mathcal{S}_2,$$
for some measurable $f\colon S_1 \times S_2 \to [0, \infty)$ with $\int_{S_1 \times S_2} f(x_1, x_2)\, m_1(dx_1)\, m_2(dx_2) = 1$.
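The countable-$S_1$ construction of $K$ can be carried out verbatim on a finite example. A minimal sketch (the joint pmf `Q` and all names are our own illustrative choices):

```python
# Build K from a joint pmf Q on S1 × S2: Q1 is the first marginal and
# K(x1, ·) = Q[X2 ∈ · | X1 = x1] wherever Q1({x1}) ≠ 0; then verify
# Q[A1 × A2] = Σ_{x1 ∈ A1} Q1({x1}) K(x1, A2), i.e. Q = Q1 ⊗ K.
S1 = ["a", "b"]
S2 = [0, 1, 2]
Q = {("a", 0): 0.1, ("a", 1): 0.2, ("a", 2): 0.1,
     ("b", 0): 0.3, ("b", 1): 0.1, ("b", 2): 0.2}

Q1 = {x1: sum(Q[(x1, x2)] for x2 in S2) for x1 in S1}

def K(x1, A2):
    if Q1[x1] == 0:                   # free choice: here the uniform distribution
        return len(A2) / len(S2)
    return sum(Q[(x1, x2)] for x2 in A2) / Q1[x1]

def QK(A1, A2):                        # (Q1 ⊗ K)(A1 × A2)
    return sum(Q1[x1] * K(x1, A2) for x1 in A1)

assert abs(QK(["a"], [0, 2]) - (Q[("a", 0)] + Q[("a", 2)])) < 1e-9
assert abs(QK(S1, S2) - 1.0) < 1e-9
```

Since the product sets $A_1 \times A_2$ form a $\cap$-closed generator, agreement there already gives $Q = Q_1 \otimes K$ everywhere.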

Then $Q_1 = Q \circ X_1^{-1} \ll m_1$; in fact
$$Q_1[A_1] = Q[A_1 \times S_2] = \int_{A_1} \underbrace{\Big(\int_{S_2} f(x_1, x_2)\, m_2(dx_2)\Big)}_{=: f_1(x_1)}\, m_1(dx_1),$$
so $Q_1 \ll m_1$ with density $f_1$. How to get $K$? Intuition:
$$Q\big[X_1 \in (x_1, x_1 + dx_1],\ X_2 \in (x_2, x_2 + dx_2]\big] = f(x_1, x_2)\, m_1(dx_1)\, m_2(dx_2)$$
gives
$$Q\big[X_2 \in (x_2, x_2 + dx_2] \mid X_1 \in (x_1, x_1 + dx_1]\big] = \frac{f(x_1, x_2)\, m_1(dx_1)\, m_2(dx_2)}{f_1(x_1)\, m_1(dx_1)}.$$
So now define $K(x_1, dx_2) := f_2(x_2 \mid x_1)\, m_2(dx_2)$ with
$$f_2(x_2 \mid x_1) := \begin{cases} \dfrac{f(x_1, x_2)}{f_1(x_1)} & \text{if } f_1(x_1) \neq 0, \\ g(x_2) & \text{if } f_1(x_1) = 0, \end{cases}$$
for an arbitrary measurable $g\colon S_2 \to [0, \infty)$ with $\int_{S_2} g(x_2)\, m_2(dx_2) = 1$. $f_2(x_2 \mid x_1)$ is called the conditional density of $X_2$ with respect to $m_2$ given $X_1 = x_1$. Then $K$ is a kernel (by construction), and $Q = Q_1 \otimes K$; the argument is like for the countable case, with integrals replacing sums.

Example: Suppose $(X_1, X_2)$ has a bivariate normal distribution with parameters $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho$; this means $S_1 = S_2 = \mathbb{R}$, $\mathcal{S}_1 = \mathcal{S}_2 = \mathcal{B}(\mathbb{R})$, $m_1 = m_2 = \lambda$ (Lebesgue measure),
$$f(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\Big(-\frac{\xi_1^2 - 2\rho \xi_1 \xi_2 + \xi_2^2}{2(1 - \rho^2)}\Big)$$
with $\xi_i = (x_i - \mu_i)/\sigma_i$. This factorizes as
$$f(x_1, x_2) = \underbrace{\frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\Big(-\frac{\xi_1^2}{2}\Big)}_{=: f_1(x_1)} \cdot \underbrace{\frac{1}{\sqrt{2\pi}\,\sigma_2 \sqrt{1 - \rho^2}} \exp\Big(-\frac{(x_2 - a(x_1))^2}{2\sigma_2^2 (1 - \rho^2)}\Big)}_{=: f_2(x_2 \mid x_1)},$$
where $f_1$ is the density of $N(\mu_1, \sigma_1^2)$, $f_2$ is the density of $N(a(x_1), \sigma_2^2(1 - \rho^2))$, and $a(x_1) := \mu_2 + \rho \frac{\sigma_2}{\sigma_1}(x_1 - \mu_1)$.

Theorem II.3.2 (Kolmogorov consistency theorem). Take any consistent family $\{Q^{(I)} \mid I \subseteq \Lambda \text{ finite}\}$ of finite-dimensional distributions; each $Q^{(I)}$ is a probability measure on $(S^I, \mathcal{S}^I)$, and the $Q^{(I)}$ are consistent. If $S$ is Polish and $\mathcal{S} = \mathcal{B}(S)$, then there exists a probability measure $P$ on $(S^\Lambda, \mathcal{S}^\Lambda)$ having the given $Q^{(I)}$ as marginal distributions. (By Lemma II.2.1, $P$ is unique.)
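The factorization of the bivariate normal density can be sanity-checked numerically at a point: the joint density should equal the $N(\mu_1, \sigma_1^2)$ density times the $N(a(x_1), \sigma_2^2(1-\rho^2))$ density. A sketch with arbitrary parameter values:

```python
# Check f(x1, x2) = f1(x1) · f2(x2 | x1) for the bivariate normal,
# where f2(· | x1) is the N(a(x1), σ2²(1−ρ²)) density and
# a(x1) = µ2 + ρ (σ2/σ1)(x1 − µ1).
import math

mu1, mu2, s1, s2, rho = 1.0, -0.5, 2.0, 1.5, 0.6

def f(x1, x2):                       # joint bivariate normal density
    xi1, xi2 = (x1 - mu1) / s1, (x2 - mu2) / s2
    q = (xi1 * xi1 - 2 * rho * xi1 * xi2 + xi2 * xi2) / (2 * (1 - rho ** 2))
    return math.exp(-q) / (2 * math.pi * s1 * s2 * math.sqrt(1 - rho ** 2))

def normal_pdf(x, m, var):           # one-dimensional N(m, var) density
    return math.exp(-(x - m) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def a(x1):
    return mu2 + rho * (s2 / s1) * (x1 - mu1)

x1, x2 = 0.3, 1.1
lhs = f(x1, x2)
rhs = normal_pdf(x1, mu1, s1 ** 2) * normal_pdf(x2, a(x1), s2 ** 2 * (1 - rho ** 2))
assert abs(lhs - rhs) < 1e-9
```

So conditioning a bivariate normal on $X_1 = x_1$ shifts the mean linearly in $x_1$ and shrinks the variance by the factor $1 - \rho^2$, independently of $x_1$.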

Proof sketch. If $\Lambda$ is countable, we can reduce this to Ionescu-Tulcea by using the result on the structure of probability measures on nice product spaces: choose $P_0 := Q^{(\{0\})}$; write $Q^{(\{0,1\})} = Q^{(\{0\})} \otimes K_1 = P_0 \otimes K_1$; write $Q^{(\{0,1,2\})} = Q^{(\{0,1\})} \otimes K_2$, etc. For the general case, some more work is needed; see Kallenberg.

Why does this work, even for uncountable $\Lambda$? The key point is that we work on $\mathcal{S}^\Lambda$, and this is a lot smaller than the product $\sigma$-field $\prod_{\lambda \in \Lambda} \mathcal{S}$ if $\Lambda$ is uncountable. Indeed:

Lemma II.3.3.
$$\mathcal{S}^\Lambda := \sigma(X_\lambda,\ \lambda \in \Lambda) = \sigma\Big(\Big\{\prod_{j \in J} A_j \times \prod_{\lambda \in J^c} S \ \Big|\ J \subseteq \Lambda \text{ countable},\ A_j \in \mathcal{S}\Big\}\Big) =: \mathcal{G}_1 = \bigcup_{\substack{J \subseteq \Lambda \\ J \text{ countable}}} \sigma(X_j,\ j \in J) =: \mathcal{G}_2.$$
In particular, every event in $\mathcal{S}^\Lambda$ depends on at most countably many coordinates.

Proof. $\mathcal{S}^\Lambda \subseteq \mathcal{G}_1$, because every $X_\lambda$ is $\mathcal{G}_1$-measurable. Conversely, for any $J \subseteq \Lambda$ countable,
$$\prod_{j \in J} A_j \times \prod_{\lambda \in J^c} S = \bigcap_{j \in J} X_j^{-1}(A_j) \in \mathcal{S}^\Lambda.$$
$\mathcal{G}_2 \subseteq \mathcal{S}^\Lambda$ is clear because every $X_j$ is $\mathcal{S}^\Lambda$-measurable. Conversely, for $J := \{\lambda\}$, $X_\lambda$ is measurable with respect to $\sigma(X_j,\ j \in J)$, so $\mathcal{S}^\Lambda \subseteq \mathcal{G}_2$ follows if $\mathcal{G}_2$ is a $\sigma$-field. This is easy: $B_n \in \mathcal{G}_2$ for all $n$ means $B_n \in \sigma(X_j,\ j \in J_n)$ for some countable $J_n \subseteq \Lambda$; $J := \bigcup_{n \in \mathbb{N}} J_n$ is still countable, and $\bigcup_{n \in \mathbb{N}} B_n \in \sigma(X_j,\ j \in J)$.

III Conditional Expectation

Note: This section is incomplete in several places. I strongly advise you to use the script. If you would like to help filling the gaps, please send an e-mail.

III.1 Definition and Construction

Motivation: given a random variable $X$ on $(\Omega, \mathcal{F}, P)$, we have some information and want to use it to predict $X$. How? Idea: information is given via a $\sigma$-field $\mathcal{G}$; $A \in \mathcal{G}$ means that we can observe $A$. What is the relation between $\mathcal{G}$ and $\mathcal{F}$? $X$ is a random variable, so $X$ is $\mathcal{F}$-measurable; so $\{X \leq c\} \in \mathcal{F}$ for all $c \in \mathbb{R}$, so if we can observe $\mathcal{F}$, we can observe the value of $X$. So predicting $X$ on the basis of $\mathcal{G}$ is only interesting if $\mathcal{G} \subsetneq \mathcal{F}$.

Definition. Suppose $X$ is a random variable $\geq 0$ or in $L^1$, and $\mathcal{G} \subseteq \mathcal{F}$ is a $\sigma$-field. Any random variable $Y$ satisfying

i) $Y$ is $\mathcal{G}$-measurable,

ii) $E[Y I_A] = E[X I_A]$ for all $A \in \mathcal{G}$,

is called (a version of) the conditional expectation of $X$ given $\mathcal{G}$, written $E[X \mid \mathcal{G}] := Y$. We say "a version" because $Y$ is not unique!

Remark: (i) says that $Y$ may only use information in $\mathcal{G}$; (ii) formalizes the idea that $E[X \mid \mathcal{G}]$ is a best prediction of $X$.

Theorem III.1.1. If $X \geq 0$ or $X \in L^1$ is a random variable and $\mathcal{G}$ a $\sigma$-field, then $E[X \mid \mathcal{G}]$ exists and is unique in the following sense: if $Y_1, Y_2$ are two versions of $E[X \mid \mathcal{G}]$, then $Y_1 = Y_2$ $P$-a.s.

Proof. Uniqueness: more generally, the conditional expectation is monotone: $X \leq X'$ $P$-a.s. implies $E[X \mid \mathcal{G}] \leq E[X' \mid \mathcal{G}]$ $P$-a.s. To see this, suppose $Y$ satisfies (i), (ii) in the definition above for $X$, and $Y'$ does for $X'$. Then $A := \{Y > Y'\} \in \mathcal{G}$ by (i), and $Y I_A \geq Y' I_A$ $P$-a.s. But also $E[Y I_A] = E[X I_A] \leq E[X' I_A] = E[Y' I_A]$ by (ii) and the assumption. So $Y I_A = Y' I_A$ $P$-a.s., so $P[A] = 0$ by definition of $A$, so $Y \leq Y'$ $P$-a.s.

Existence: not via Radon-Nikodým (as in the script), but via projection in Hilbert space.

a) Conditional expectation is monotone (see above), so $E[X \mid \mathcal{G}] \geq 0$ if $X \geq 0$.

b) Call $L^2_{\mathcal{G}}$ the set of all random variables in $L^2(P)$ which agree $P$-a.s. with some $\mathcal{G}$-measurable random variable. Then $L^2_{\mathcal{F}} = L^2$ is by Fischer-Riesz (I.6.4) a Hilbert space. Moreover $L^2_{\mathcal{G}}$ is a closed subspace: it is clearly a linear subspace;

closed: $X_n \to X$ in $L^2$ gives a subsequence $(X_{n_k})$ with $X_{n_k} \to X$ $P$-a.s., so if all the $X_n$ are in $L^2_{\mathcal{G}}$, then $X$ is again equal to some $\mathcal{G}$-measurable random variable $P$-a.s., hence in $L^2_{\mathcal{G}}$.

Call $\pi$ the orthogonal projection onto $L^2_{\mathcal{G}}$. By definition,
$$E[(X - \pi(X))\, Z] = (X - \pi(X),\, Z)_{L^2} = 0 \qquad \text{for all } Z \in L^2_{\mathcal{G}}.$$
For $X \in L^2(P)$, choose some $\mathcal{G}$-measurable $Y$ with $Y = \pi(X)$ $P$-a.s. Then $Y$ is $\mathcal{G}$-measurable, and (ii) also holds, because
$$E[(X - Y) I_A] = E[(X - \pi(X))\, I_A] = 0 \qquad \text{for all } A \in \mathcal{G}.$$
So $E[X \mid \mathcal{G}] := Y$ is ok.

c) If $X \geq 0$, then $X_n := \min\{X, n\}$ is in $L^2(P)$ and $X_n \nearrow X$. By (a) and (b), $Y_n := E[X_n \mid \mathcal{G}]$ exists and $0 \leq Y_n \nearrow$ $P$-a.s., so $Y := \lim Y_n$ exists $P$-a.s. and is $\mathcal{G}$-measurable. Moreover, $E[Y_n I_A] = E[X_n I_A]$ implies (by monotone convergence) $E[Y I_A] = E[X I_A]$ for all $A \in \mathcal{G}$. And of course $Y \geq 0$ $P$-a.s., so take $E[X \mid \mathcal{G}] := Y$.

d) For $X \in L^1$ write $X = X^+ - X^-$ with $X^\pm \geq 0$ and $E[X^\pm] < \infty$. Define $Y^\pm := E[X^\pm \mid \mathcal{G}]$, so that $Y^\pm \geq 0$ and $E[Y^\pm] = E[X^\pm]$ (take $A = \Omega$). Then $Y := Y^+ - Y^- \in L^1$ is $\mathcal{G}$-measurable and satisfies (ii), so $E[X \mid \mathcal{G}] := Y$ will do.

Remark: The construction by projection shows the following optimality of conditional expectation: if $X \in L^2$, then $E[X \mid \mathcal{G}]$ minimizes the $L^2$-norm $\|X - Y\|_{L^2} = (E[(X - Y)^2])^{1/2}$ over all $Y \in L^2$ which are $\mathcal{G}$-measurable. This gives a precise sense in which (ii) formalizes "best prediction".

Explicit construction in special cases (p. 47 in the script):

Example: Suppose $\mathcal{G}$ is finitely generated, i.e. $\mathcal{G} = \sigma(A_1, \ldots, A_n)$ with $A_i \in \mathcal{F}$. Pass to the atoms of $\mathcal{G}$ and write $\mathcal{G} = \sigma(B_1, \ldots, B_n)$ where $\bigcup_{i=1}^n B_i = \Omega$ and $B_i \cap B_k = \emptyset$ for $i \neq k$. Then any (finite) $\mathcal{G}$-measurable random variable $Y$ has the form $Y = \sum_{i=1}^n c_i I_{B_i}$ for some $c_i \in \mathbb{R}$. Now take a random variable $X \geq 0$ or in $L^1$. How can we then write $E[X \mid \mathcal{G}]$ more explicitly? $E[X \mid \mathcal{G}]$ is $\mathcal{G}$-measurable, so it has the form $E[X \mid \mathcal{G}] = Y = \sum_{i=1}^n c_i I_{B_i}$. Moreover, (ii) for $A = B_k$ gives
$$E[X I_{B_k}] = E[Y I_{B_k}] = \sum_{i=1}^n c_i\, E[I_{B_i} I_{B_k}] = c_k\, P[B_k],$$

so we get
$$E[X \mid \mathcal{G}] = \sum_{i=1}^n \frac{E[X I_{B_i}]}{P[B_i]}\, I_{B_i},$$
i.e. the value of the conditional expectation on an atom $B_i$ is simply the average value of $X$ on that atom.

Link to elementary conditional probabilities: if $P[B_i] > 0$, define $P[\,\cdot \mid B_i] := \frac{P[\,\cdot\, \cap B_i]}{P[B_i]}$. Then
$$E[X \mid B_i] := \int_\Omega X \, dP[\,\cdot \mid B_i] = \frac{E[X I_{B_i}]}{P[B_i]} \quad (\in \mathbb{R}!),$$
so we also have
$$E[X \mid \mathcal{G}] = \sum_{i=1}^n E[X \mid B_i]\, I_{B_i} \quad \text{(a r.v.!)} \tag{1}$$
$E[X \mid B_i]$ is the value of $E[X \mid \mathcal{G}]$ on the atom $B_i$.

Remark:
1. The same arguments work if $\mathcal{G}$ is countably generated.
2. If $\mathcal{G}$ is countably generated, one can use (1) as the definition of $E[X \mid \mathcal{G}]$ and then verify that it satisfies (i) and (ii) (see script).
3. We have not used Radon-Nikodým, so we can later use martingales and martingale convergence to prove Radon-Nikodým. (That is the advantage of the $L^2$ approach.)

Remark: If $\mathcal{G} = \sigma(Z)$ for some random variable $Z$, then any $\mathcal{G}$-measurable random variable is of the form $h(Z)$ for some measurable $h$ (see I.4). We then write $E[X \mid \mathcal{G}] = E[X \mid \sigma(Z)] =: E[X \mid Z] = h(Z)$ and (with an abuse of notation) $h(z) =: E[X \mid Z = z]$. More carefully we should write $h(z) = E[X \mid Z]\big|_{Z = z}$. (This is similar to using kernels and conditional probabilities.)

III.2 Properties of Conditional Expectations

Proposition III.2.1. Without explicit mention, always $X \geq 0$ or $X \in L^1(P)$. Then:

1. $E[E[X \mid \mathcal{G}]] = E[X]$.

2. If $X$ is $\mathcal{G}$-measurable, then $E[X \mid \mathcal{G}] = X$.

3. Linearity: let $X_1, X_2 \in L^1$ and $a, b \in \mathbb{R}$; then $E[aX_1 + bX_2 \mid \mathcal{G}] = a\,E[X_1 \mid \mathcal{G}] + b\,E[X_2 \mid \mathcal{G}]$. In tedious detail, this means: if $Y_i$ is a version of $E[X_i \mid \mathcal{G}]$, then $aY_1 + bY_2$ is a version of $E[aX_1 + bX_2 \mid \mathcal{G}]$.

4. Monotonicity: if $X \leq X'$ $P$-almost surely, then $E[X \mid \mathcal{G}] \leq E[X' \mid \mathcal{G}]$ $P$-almost surely.

5. Projectivity: if $\mathcal{H} \subseteq \mathcal{G}$ is a $\sigma$-field, then $E[E[X \mid \mathcal{G}] \mid \mathcal{H}] = E[X \mid \mathcal{H}]$.

6. If $Z$ is $\mathcal{G}$-measurable and both $X$, $ZX$ are $\geq 0$ or in $L^1$, then $E[ZX \mid \mathcal{G}] = Z\,E[X \mid \mathcal{G}]$ $P$-almost surely.

Proof.
1. Use (ii) for $A = \Omega$.
2. Obvious.
3. Clear from linearity of expectation.
4. Already proved in III.1.
5. $Y := E[E[X \mid \mathcal{G}] \mid \mathcal{H}]$ is $\mathcal{H}$-measurable, and for any $A \in \mathcal{H} \subseteq \mathcal{G}$, $E[Y I_A] = E[E[X \mid \mathcal{G}]\, I_A] = E[X I_A]$.
6. The right-hand side is $\mathcal{G}$-measurable. To check (ii), use measure-theoretic induction: for $Z = I_B$ with $B \in \mathcal{G}$ and any $A \in \mathcal{G}$,
$$E[I_B\, E[X \mid \mathcal{G}]\, I_A] = E[E[X \mid \mathcal{G}]\, I_{B \cap A}] = E[X I_{B \cap A}] = E[I_B X I_A] = E[E[I_B X \mid \mathcal{G}]\, I_A],$$
and the rest as usual.

Proposition III.2.2. Further properties:

1. Monotone convergence: $0 \leq X_n \nearrow X$ $P$-a.s. implies $E[X_n \mid \mathcal{G}] \nearrow E[X \mid \mathcal{G}]$ $P$-a.s.

2. Fatou: if all $X_n \geq 0$ (or all $X_n \geq -Z$ for some $Z \in L^1$), then $P$-a.s.
$$E[\liminf X_n \mid \mathcal{G}] \leq \liminf E[X_n \mid \mathcal{G}].$$

3. Lebesgue: if $\lim X_n = X$ $P$-a.s. and $|X_n| \leq Z$ $P$-a.s. for all $n$, for some $Z \in L^1$, then $\lim E[X_n \mid \mathcal{G}] = E[\lim X_n \mid \mathcal{G}] = E[X \mid \mathcal{G}]$.

4. Jensen: if $u\colon \mathbb{R} \to \mathbb{R}$ is convex with $u(X) \in L^1$, then $P$-a.s.
$$E[u(X) \mid \mathcal{G}] \geq u(E[X \mid \mathcal{G}]).$$

Proof.
1. Let $Y_n$ be a version of $E[X_n \mid \mathcal{G}]$; then $0 \leq Y_n \nearrow Y := \lim Y_n$ $P$-a.s. and $Y$ is $\mathcal{G}$-measurable; moreover, $E[Y_n I_A] = E[X_n I_A]$ for all $A \in \mathcal{G}$ and $n$, and monotone convergence gives $E[Y I_A] = E[X I_A]$ for all $A \in \mathcal{G}$, so $Y$ is a version of $E[X \mid \mathcal{G}]$.

2. $U_n := \inf_{m \geq n} X_m \nearrow \liminf X_n$ $P$-a.s., and $U_n \leq X_m$ for all $m \geq n$ gives $E[U_n \mid \mathcal{G}] \leq \inf_{m \geq n} E[X_m \mid \mathcal{G}]$ $P$-a.s., so $P$-a.s.
$$E[\liminf X_n \mid \mathcal{G}] = \lim_n E[U_n \mid \mathcal{G}] = \sup_n E[U_n \mid \mathcal{G}] \leq \sup_n \inf_{m \geq n} E[X_m \mid \mathcal{G}] = \liminf E[X_n \mid \mathcal{G}].$$

(Rest in the script.)

Corollary III.2.3. The conditional expectation is a contraction on $L^p$ for any $p \geq 1$: if $X \in L^p$, then also $E[X \mid \mathcal{G}] \in L^p$, and
$$\|E[X \mid \mathcal{G}]\|_{L^p} \leq \|X\|_{L^p}.$$

Proof. For $p \geq 1$, $x \mapsto u(x) := |x|^p$ is convex, so Jensen gives $E[|X|^p \mid \mathcal{G}] \geq |E[X \mid \mathcal{G}]|^p$ $P$-a.s. To finish the proof, take expectations on both sides.

How to compute conditional expectations? Typical situation: $X_i\colon \Omega \to S_i$ is $\mathcal{F}$-$\mathcal{S}_i$-measurable, $i = 1, 2$. How do we find $E[F(X_1, X_2) \mid X_1]$? Set $S := S_1 \times S_2$, $\mathcal{S} := \mathcal{S}_1 \otimes \mathcal{S}_2$; then $(X_1, X_2)$ is an $S$-valued random variable. Suppose the distribution of $(X_1, X_2)$ has the form $P_1 \otimes K$ for a probability measure $P_1$ on $S_1$ and a kernel $K$ from $(S_1, \mathcal{S}_1)$ to $(S_2, \mathcal{S}_2)$. (Intuition: $P_1$ is the distribution of $X_1$ under $P$, $K$ is the conditional distribution of $X_2$ under $P$ given $X_1$; see III.3.)

Proposition III.2.4. In the above situation, for any $F\colon S_1 \times S_2 \to [0, \infty)$ $\mathcal{S}_1 \otimes \mathcal{S}_2$-measurable,
$$E[F(X_1, X_2) \mid X_1](\omega) = \int_{S_2} F(X_1(\omega), x_2)\, K(X_1(\omega), dx_2) =: h(X_1(\omega)) \tag{2}$$
with $h(x_1) := \int_{S_2} F(x_1, x_2)\, K(x_1, dx_2)$.

Example: If $X_1, X_2$ are independent under $P$, then the distribution of $(X_1, X_2)$ is the product measure $P_1 \otimes P_2$, where $P_i$ is the distribution of $X_i$ under $P$. In that case, one can take $K(x_1, \cdot) \equiv P_2[\,\cdot\,]$.

Proof. $x_1 \mapsto h(x_1)$ is $\mathcal{S}_1$-measurable; see the proof of the construction of $P_1 \otimes K$. So the RHS of (2) is $h(X_1)$ with $h$ measurable, hence $\sigma(X_1)$-measurable. Now take $A \in \sigma(X_1)$; then $I_A$ is $\sigma(X_1)$-measurable, so $I_A = g(X_1)$ with $g\colon S_1 \to [0, 1]$ measurable, see I.4. So:
$$E[F(X_1, X_2)\, I_A] = E[F(X_1, X_2)\, g(X_1)] = \int_{S_1 \times S_2} F g\, d(P_1 \otimes K) = \int_{S_1} g(x_1) \int_{S_2} F(x_1, x_2)\, K(x_1, dx_2)\, P_1(dx_1) = E[g(X_1)\, h(X_1)] = E[h(X_1)\, I_A],$$
so $h(X_1) = E[F(X_1, X_2) \mid X_1]$.

Consequences:

a) In the above situation, we can write the conditional expectation as an integral; this is useful for proving properties.

b) $K$ is indeed a conditional distribution of $X_2$ given $X_1$: choose $F = I_{B_2}$ with $B_2 \in \mathcal{S}_2$; then
$$P[X_2 \in B_2 \mid X_1](\omega) := E[I_{B_2}(X_2) \mid X_1](\omega) = K(X_1(\omega), B_2).$$

Useful special case: independence.

Corollary III.2.5. $\mathcal{G} \subseteq \mathcal{F}$ a $\sigma$-field, $X_1$ $\mathcal{G}$-measurable, $X_2$ independent of $\mathcal{G}$. For any $F\colon S_1 \times S_2 \to [0, \infty)$ $\mathcal{S}_1 \otimes \mathcal{S}_2$-measurable,
$$E[F(X_1, X_2) \mid \mathcal{G}](\omega) = E[F(x_1, X_2)]\big|_{x_1 = X_1(\omega)} = h(X_1(\omega))$$
with $h(x_1) := E[F(x_1, X_2)]$. In words: fix the known variable $X_1$, take the expectation over the independent variable.

Proof. $X_1, X_2$ independent means the joint distribution is $P_1 \otimes P_2$, so we can take $K(x_1, \cdot) = P_2[\,\cdot\,]$ in Proposition III.2.4; $h$ is measurable, so the RHS is measurable w.r.t. $\sigma(X_1) \subseteq \mathcal{G}$, hence $\mathcal{G}$-measurable. Moreover, $A \in \mathcal{G}$ implies $E[F(X_1, X_2)\, I_A] = E[h(X_1)\, I_A]$, because we can use measure-theoretic induction on $F$: if $F = I_{B_1 \times B_2}$ with $B_i \in \mathcal{S}_i$, then
$$\text{LHS} = E[I_{B_1}(X_1)\, I_{B_2}(X_2)\, I_A] = E[I_{B_2}(X_2)]\, E[I_{B_1}(X_1)\, I_A];$$
on the other hand $h(x_1) = I_{B_1}(x_1)\, E[I_{B_2}(X_2)]$, so
$$\text{RHS} = E[I_{B_1}(X_1)\, E[I_{B_2}(X_2)]\, I_A] = \text{LHS},$$
so the assertion follows for $F = I_{B_1 \times B_2}$, etc.
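Corollary III.2.5 can be made concrete on a finite model: take $\mathcal{G} = \sigma(X_1)$, model $\Omega$ as the product of the two marginals, and check the defining property of the conditional expectation on the atoms $\{X_1 = x_1\}$. A small sketch (the laws `p1`, `p2` and the function `F` are arbitrary illustrative choices):

```python
# Discrete check of Corollary III.2.5: for X1 G-measurable and X2 independent
# of G, E[F(X1, X2) | G] = h(X1) with h(x1) = E[F(x1, X2)].
p1 = {0: 0.3, 1: 0.7}                # law of X1
p2 = {0: 0.2, 1: 0.5, 2: 0.3}        # law of X2, independent of X1
F = lambda x1, x2: (x1 + 1) * x2 ** 2

# h(x1) = E[F(x1, X2)]: freeze the known variable, average over the other one
h = {x1: sum(F(x1, x2) * w for x2, w in p2.items()) for x1 in p1}

for x1 in p1:                         # atoms A = {X1 = x1} of σ(X1)
    lhs = h[x1] * p1[x1]              # E[h(X1) I_A]
    rhs = sum(F(x1, x2) * p1[x1] * w for x2, w in p2.items())  # E[F(X1,X2) I_A]
    assert abs(lhs - rhs) < 1e-9
```

Since the atoms generate $\sigma(X_1)$, agreement there is exactly condition (ii) of the definition in III.1.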

Example (Wald identities): Suppose $(Y_i)_{i \in \mathbb{N}}$ is a sequence of (real) random variables and $N$ is an $\mathbb{N}_0$-valued random variable. Consider the (doubly) random sum
$$S_N(\omega) := \sum_{i=1}^{N(\omega)} Y_i(\omega).$$
What are $E[S_N]$ and $\mathrm{Var}[S_N]$?

1. Suppose that the $Y_i$ are identically distributed and in $L^1$, and that $N \in L^1$ is independent of $(Y_i)$. Then $E[S_N] = E[N]\, E[Y_1]$, since $E[S_N] = E[E[S_N \mid N]]$ and, by Corollary III.2.5,
$$E[S_N \mid N](\omega) = E\Big[\sum_{i=1}^n Y_i\Big]\Big|_{n = N(\omega)} = N(\omega)\, E[Y_1].$$

2. Suppose in addition that the $Y_i$ are in $L^2$ and independent (so: i.i.d.). Then $\mathrm{Var}[S_N] = E[N]\, \mathrm{Var}[Y_1] + (E[Y_1])^2\, \mathrm{Var}[N]$, since $E[S_N^2] = E[E[S_N^2 \mid N]]$ and
$$E[S_N^2 \mid N](\omega) = E\Big[\Big(\sum_{i=1}^n Y_i\Big)^2\Big]\Big|_{n = N(\omega)} = \big(n\, \mathrm{Var}[Y_1] + (n\, E[Y_1])^2\big)\Big|_{n = N(\omega)}.$$
Using (1), the result follows.

Example: $X_1, \ldots, X_n$ i.i.d. random variables in $L^1$, $S_n := \sum_{i=1}^n X_i$. Then
$$E[X_1 \mid S_n] = \frac{S_n}{n}$$
(as should follow from symmetry).

Proof. The RHS is $\sigma(S_n)$-measurable: (i) ok. Now take any bounded measurable $f$ on $\mathbb{R}$ and compute $E[X_1 f(S_n)]$. $X_1, \ldots, X_n$ are i.i.d., so the distribution of $(X_1, \ldots, X_n)$ is the product measure $P_1 \otimes \cdots \otimes P_1 = P_1^{\otimes n}$. So
$$E[X_1 f(S_n)] = \int_{\mathbb{R}^n} x_1\, f(x_1 + \cdots + x_n)\, P_1(dx_1) \cdots P_1(dx_n),$$
and the expression on the right is invariant under a permutation of $\{1, \ldots, n\}$. So $E[X_1 f(S_n)] = E[X_i f(S_n)]$ for all $i$, and so
$$E[S_n f(S_n)] = \sum_{i=1}^n E[X_i f(S_n)] = n\, E[X_1 f(S_n)].$$
Choose $f(S_n) := I_A$, $A \in \sigma(S_n)$, to get $E[X_1 I_A] = E[\tfrac{1}{n} S_n I_A]$ for all $A \in \sigma(S_n)$.
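Both Wald identities can be verified exactly (no simulation) for a small model by enumerating all outcomes. A sketch, with $Y_i = \pm 1$, $P[Y_i = 1] = p$, and $N$ uniform on $\{1, 2, 3\}$ (all parameter choices are ours):

```python
# Exact check of E[S_N] = E[N] E[Y_1] and
# Var[S_N] = E[N] Var[Y_1] + (E[Y_1])^2 Var[N] by full enumeration.
from itertools import product

p = 0.3
Ns = [1, 2, 3]                           # N uniform on {1, 2, 3}, independent of (Y_i)
EY, VarY = 2 * p - 1, 4 * p * (1 - p)    # E[Y_1], Var[Y_1] for Y = ±1
EN = sum(Ns) / len(Ns)
VarN = sum(n * n for n in Ns) / len(Ns) - EN ** 2

ES = ES2 = 0.0
for n in Ns:
    for ys in product([1, -1], repeat=n):
        w = 1.0 / len(Ns)                # P[N = n] times the path probability
        for y in ys:
            w *= p if y == 1 else 1 - p
        s = sum(ys)
        ES += w * s
        ES2 += w * s * s
VarS = ES2 - ES ** 2

assert abs(ES - EN * EY) < 1e-9
assert abs(VarS - (EN * VarY + EY ** 2 * VarN)) < 1e-9
```

Note how both identities need the independence of $N$ from $(Y_i)$, which is exactly where Corollary III.2.5 entered above.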

III.3 Regular Conditional Distributions

$(\Omega, \mathcal{F}, P)$ probability space, $(S, \mathcal{S})$ measurable space, $X\colon \Omega \to S$ $\mathcal{F}$-$\mathcal{S}$-measurable (an $S$-valued random variable), and $\mathcal{G} \subseteq \mathcal{F}$ a $\sigma$-field. Fix $B \in \mathcal{S}$ and define $P[X \in B \mid \mathcal{G}] := E[I_{\{X \in B\}} \mid \mathcal{G}]$. This gives a $\mathcal{G}$-measurable random variable with values in $[0, 1]$. So we get a mapping
$$\Omega \times \mathcal{S} \to [0, 1], \qquad (\omega, B) \mapsto P[X \in B \mid \mathcal{G}](\omega),$$
but this is not well-defined, because conditional expectations are only defined $P$-a.s. Question: can one choose/define this mapping to obtain a stochastic kernel, i.e.

a) $\mathcal{G}$-measurable in $\omega$ for fixed $B \in \mathcal{S}$,

b) a probability measure in $B$ for fixed $\omega \in \Omega$?

Why should there be a problem? Try to do it: we need, for each $\omega$, $\sigma$-additivity in $B$, i.e. for disjoint sets $B_i$
$$P\Big[X \in \bigcup_{i \in \mathbb{N}} B_i \ \Big|\ \mathcal{G}\Big] = \sum_{i \in \mathbb{N}} P[X \in B_i \mid \mathcal{G}] \qquad P\text{-a.s.} \tag{3}$$
So for each $B_i$, choose a version of $P[X \in B_i \mid \mathcal{G}]$; this is only well-defined up to a nullset $N(B_i)$. Now
$$\sum_{i=1}^n I_{\{X \in B_i\}} = I_{\{X \in \bigcup_{i=1}^n B_i\}} \nearrow I_{\{X \in \bigcup_{i \in \mathbb{N}} B_i\}}.$$
By monotone convergence, this gives
$$\sum_{i \in \mathbb{N}} P[X \in B_i \mid \mathcal{G}] = P\Big[X \in \bigcup_{i \in \mathbb{N}} B_i \ \Big|\ \mathcal{G}\Big] \qquad P\text{-a.s.},$$
where the nullset where this may fail depends (via the $N(B_i)$) on the sequence $(B_i)$. So (3) may go wrong on a nullset depending on $(B_i)$. What we want is (3) simultaneously for all possible sequences $(B_i)$ of disjoint sets, outside one fixed nullset. For large enough $\mathcal{S}$, there are uncountably many sequences $(B_i)$; (3) can fail on a nullset for each such sequence, and the union of all these uncountably many nullsets is not under our control. So perhaps there is no (or only a huge) nullset outside of which (3) holds simultaneously for all $(B_i)$-sequences. To get a positive answer, one needs some condition on $(S, \mathcal{S})$.

Definition (Regular conditional distribution). A r.c.d. of $X$ given $\mathcal{G}$ is a stochastic kernel $K$ from $(\Omega, \mathcal{G})$ to $(S, \mathcal{S})$ such that for each $B \in \mathcal{S}$, $K(\cdot, B)$ is a version of $P[X \in B \mid \mathcal{G}]$, i.e., for each $B \in \mathcal{S}$,
$$E[I_{\{X \in B\}} \mid \mathcal{G}] = K(\cdot, B) \qquad P\text{-a.s.}$$

Proposition III.3.1. If $S = \mathbb{R}$ with $\mathcal{S} = \mathcal{B}(\mathbb{R})$, then a r.c.d. of $X$ given $\mathcal{G}$ exists.

Proof. Use crucially that $\mathbb{Q} \subseteq \mathbb{R}$ is countable and dense. For each $q \in \mathbb{Q}$, choose a version $V_q(\omega)$ of $E[I_{\{X \leq q\}} \mid \mathcal{G}]$. Set
$$N_1 := \{\text{those } \omega \text{ where } q \mapsto V_q(\omega) \text{ is not increasing on } \mathbb{Q}\} = \bigcup_{\substack{q, r \in \mathbb{Q} \\ q < r}} \{V_q > V_r\}.$$
Monotonicity of conditional expectation gives, for $q < r$,
$$V_q = E[I_{\{X \leq q\}} \mid \mathcal{G}] \leq E[I_{\{X \leq r\}} \mid \mathcal{G}] = V_r \qquad P\text{-a.s.},$$
so $\{V_q > V_r\}$ is a $P$-nullset, so $N_1$ is a $P$-nullset. Next set
$$N_2 := \{\text{those } \omega \text{ where } q \mapsto V_q(\omega) \text{ is not everywhere right-continuous on } \mathbb{Q}\} = \bigcup_{q \in \mathbb{Q}} \Big\{\lim_{\substack{r \searrow q \\ r \in \mathbb{Q},\ r > q}} V_r \neq V_q\Big\}.$$
Monotone convergence: $r_n \searrow\searrow q$ gives $V_{r_n} = E[I_{\{X \leq r_n\}} \mid \mathcal{G}] \searrow E[I_{\{X \leq q\}} \mid \mathcal{G}] = V_q$ $P$-a.s., so $N_2$ is also a $P$-nullset. Next set
$$N_3 := \Big\{\lim_{\substack{q \to -\infty \\ q \in \mathbb{Q}}} V_q \neq 0 \ \text{ or } \lim_{\substack{q \to +\infty \\ q \in \mathbb{Q}}} V_q \neq 1\Big\};$$
then $N_3$ is also a $P$-nullset. So $N := N_1 \cup N_2 \cup N_3$ has $P[N] = 0$ and $N \in \mathcal{G}$, and for $\omega \notin N$, $q \mapsto V_q(\omega)$ is monotone on $\mathbb{Q}$, right-continuous on $\mathbb{Q}$, and has limits $0$ and $1$ at $-\infty$ and $+\infty$. Now define
$$F(\omega, x) := \begin{cases} \lim_{\substack{q \searrow x \\ q \in \mathbb{Q},\ q > x}} V_q(\omega) & \text{for } \omega \notin N, \\ G(x) & \text{for } \omega \in N, \end{cases}$$
for any fixed distribution function $G$ on $\mathbb{R}$. Then for each $\omega$, $x \mapsto F(\omega, x)$ is a distribution function on $\mathbb{R}$ by construction. Moreover,
$$P[X \leq x \mid \mathcal{G}](\omega) = E[I_{\{X \leq x\}} \mid \mathcal{G}](\omega) = F(\omega, x) \qquad P\text{-a.s. for each } x,$$
i.e. $F(\cdot, x)$ is a version of $E[I_{\{X \leq x\}} \mid \mathcal{G}]$ for each $x$. (Note that since $N \in \mathcal{G}$, $\omega \mapsto F(\omega, x)$ is $\mathcal{G}$-measurable.) For each $\omega \in \Omega$, choose as $K(\omega, \cdot)$ the probability measure on $\mathbb{R}$ with distribution function $F(\omega, \cdot)$. Then this $K$ is a kernel from $(\Omega, \mathcal{G})$ to $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$:

a) $K(\omega, \cdot)$ is a probability measure: by definition.

b) $K(\cdot, B)$ is $\mathcal{G}$-measurable for all $B \in \mathcal{B}(\mathbb{R})$:
$$\mathcal{D} := \{A \in \mathcal{B}(\mathbb{R}) \mid K(\cdot, A) \text{ is } \mathcal{G}\text{-measurable}\}$$
is a Dynkin system, and $\mathcal{D} \supseteq \{(-\infty, x] \mid x \in \mathbb{R}\}$, because $K(\cdot, (-\infty, x]) = F(\cdot, x)$, and $F$ is $\mathcal{G}$-measurable in $\omega$. So $\mathcal{D} = \mathcal{B}(\mathbb{R})$.

Finally, $K(\cdot, B)$ is a version of $E[I_{\{X \in B\}} \mid \mathcal{G}]$ for each $B \in \mathcal{B}(\mathbb{R})$ (and then $K$ is the desired r.c.d.):
$$\mathcal{D}' := \{A \in \mathcal{B}(\mathbb{R}) \mid K(\cdot, A) = E[I_{\{X \in A\}} \mid \mathcal{G}] \ P\text{-a.s.}\}$$
is a Dynkin system (use monotone convergence), and $\mathcal{D}' \supseteq \{(-\infty, x] \mid x \in \mathbb{R}\}$, because $K(\cdot, (-\infty, x]) = F(\cdot, x)$ is a version of $E[I_{\{X \in (-\infty, x]\}} \mid \mathcal{G}]$ by construction of $F$; so $\mathcal{D}' = \mathcal{B}(\mathbb{R})$.

Theorem III.3.2. If $(S, \mathcal{S})$ is a Borel space, a r.c.d. of $X$ given $\mathcal{G}$ exists.

Proof. Take $\varphi\colon S \to A \in \mathcal{B}(\mathbb{R})$ bijective and with $\varphi, \varphi^{-1}$ both measurable. So $\varphi(X)$ is $\mathbb{R}$-valued and there exists a r.c.d. $K_0$ of $\varphi(X)$ given $\mathcal{G}$. Then $K(\omega, B) := K_0(\omega, \varphi(B))$ does the job: $K(\cdot, B)$ is $\mathcal{G}$-measurable for each $B$, because $K_0$ is; moreover,
$$\varphi(B) = \{x \in \mathbb{R} \mid \exists s \in B \text{ with } \varphi(s) = x, \text{ i.e. } s = \varphi^{-1}(x)\} = (\varphi^{-1})^{-1}(B),$$
so $K(\omega, B) = K_0(\omega, (\varphi^{-1})^{-1}(B))$, i.e. $K(\omega, \cdot)$ is the image measure on $S$ of $K_0(\omega, \cdot)$ under $\varphi^{-1}$; so $K(\omega, \cdot)$ is a probability measure on $(S, \mathcal{S})$. Finally,
$$K(\cdot, B) = K_0(\cdot, \varphi(B)) = E[I_{\{\varphi(X) \in \varphi(B)\}} \mid \mathcal{G}] = E[I_{\{X \in B\}} \mid \mathcal{G}] \qquad P\text{-a.s.}$$

IV Martingales

IV.1 Definitions and Examples

$(\Omega, \mathcal{F}, P)$ probability space, $I \subseteq [0, \infty]$ an index set. Two typical cases:

a) $I = \mathbb{N}_0$ ("discrete time"),

b) $I = [0, \infty)$ or $I = [0, T]$ ("continuous time").

Definition. A filtration $\mathbb{F} = (\mathcal{F}_t)_{t \in I}$ is an increasing family of $\sigma$-fields $\mathcal{F}_t \subseteq \mathcal{F}$, $t \in I$, i.e. $\mathcal{F}_s \subseteq \mathcal{F}_t$ for $s, t \in I$, $s \leq t$.

Interpretation: $\mathcal{F}_t$ is the family of events observable up to time $t$, i.e. the information available at $t$.

Definition. $X = (X_t)_{t \in I}$ stochastic process with values in $S$ (i.e. $X_t\colon \Omega \to S$ is $\mathcal{F}$-$\mathcal{S}$-measurable for all $t \in I$): a collection of $S$-valued random variables, indexed by $t \in I$, all defined on $(\Omega, \mathcal{F})$. $X$ is adapted to $\mathbb{F}$ if each $X_t$ is $\mathcal{F}_t$-$\mathcal{S}$-measurable (observable at time $t$).

Definition (Martingale). A (real-valued) martingale (w.r.t. $\mathbb{F}$, $P$) is a (real-valued) stochastic process $M = (M_t)_{t \in I}$ with

i) $M$ is $\mathbb{F}$-adapted;

ii) $M$ is $P$-integrable, i.e., $M_t \in L^1(P)$ for all $t \in I$;

iii) $E[M_t \mid \mathcal{F}_s] = M_s$ $P$-a.s. for all $s, t \in I$, $s \leq t$.

Submartingale if in (iii) only $\geq$ holds, supermartingale if only $\leq$ holds.

Remark:
1. Changes of martingales cannot be predicted: $E[M_t - M_s \mid \mathcal{F}_s] = 0$, and $t \mapsto E[M_t]$ is constant, so martingales are on average constant. But they can fluctuate a lot, pathwise.
2. For $I = \mathbb{N}_0$, (iii) is equivalent to $E[M_{n+1} \mid \mathcal{F}_n] = M_n$ $P$-a.s. for all $n$. Indeed:
$$E[M_{n+k} \mid \mathcal{F}_n] = \sum_{l=1}^k E\big[\underbrace{E[M_{n+l} - M_{n+l-1} \mid \mathcal{F}_{n+l-1}]}_{=\,0}\ \big|\ \mathcal{F}_n\big] + M_n = M_n.$$

Example (Class 1: Sums of independent centred random variables): $I = \mathbb{N}_0$, $(Y_i)_{i \in \mathbb{N}}$ independent random variables in $L^1$, $\mathcal{F}_n := \sigma(Y_1, \ldots, Y_n)$, $\mathcal{F}_0 := \{\emptyset, \Omega\}$. Then
$$M_n := \sum_{i=1}^n (Y_i - E[Y_i]), \qquad n \in \mathbb{N}_0,$$
is a martingale.

Proof. (i), (ii): ok; (iii):
$$E[M_{n+1} - M_n \mid \mathcal{F}_n] = E[Y_{n+1} - E[Y_{n+1}] \mid \mathcal{F}_n] = E[Y_{n+1} \mid \mathcal{F}_n] - E[Y_{n+1}] = 0,$$
because $Y_{n+1}$ is independent of $\mathcal{F}_n$.

Example: Simple random walk with parameter $p$: $(Y_i)$ i.i.d. with values $\pm 1$ and $p = P[Y_i = +1]$. Then $E[Y_i] = 2p - 1$ and so
$$M_n := x + \sum_{i=1}^n Y_i - n(2p - 1) =: x + S_n - n(2p - 1)$$
is a martingale, for any $x \in \mathbb{R}$. For $p = \frac{1}{2}$ this is called the symmetric random walk; in this case $(S_n)$ itself is a martingale.

Example (Class 2: Successive predictions): $I \subseteq [0, \infty]$, $\mathbb{F} = (\mathcal{F}_t)_{t \in I}$ any filtration, $Z \in L^1(P)$. Then $M_t := E[Z \mid \mathcal{F}_t]$, $t \in I$, is a $(P, \mathbb{F})$-martingale.

Proof. (i), (ii) ok; for $s \leq t$,
$$E[M_t \mid \mathcal{F}_s] = E\big[E[Z \mid \mathcal{F}_t] \mid \mathcal{F}_s\big] = E[Z \mid \mathcal{F}_s] = M_s \qquad P\text{-a.s.}$$

Example: $(S_n)$ simple random walk with parameter $p$, fix $N \in \mathbb{N}$, $I = \{0, \ldots, N\}$ (or also $I = \mathbb{N}_0$); $\mathcal{F}_n = \sigma(Y_1, \ldots, Y_n) = \sigma(S_1, \ldots, S_n)$. Choose $Z = f(S_N)$. For $n \geq N$, $E[Z \mid \mathcal{F}_n] = f(S_N)$; for $n < N$,
$$S_N = S_n + \sum_{i=n+1}^N Y_i =: S_n + U_{n,N},$$
where $S_n$ is $\mathcal{F}_n$-measurable and $U_{n,N}$ is independent of $\mathcal{F}_n$, so
$$E[f(S_N) \mid \mathcal{F}_n] = E[f(s + U_{n,N})]\big|_{s = S_n} =: h(S_n)$$
with $h(s) = E[f(s + U_{n,N})] = \sum_u f(s + u)\, P[U_{n,N} = u]$. Having $k$ times a $+1$ among the $N - n$ remaining steps means $N - n - k$ times $-1$, so $u(k) = 2k - (N - n)$ and
$$h(s) = \sum_{k=0}^{N-n} \binom{N-n}{k} p^k (1 - p)^{N-n-k}\, f\big(s + 2k - (N - n)\big).$$
See also the Cox/Ross/Rubinstein model in mathematical finance.

Example (Class 3: Harmonic functions of Markov chains): $X = (X_n)_{n \in \mathbb{N}_0}$ Markov chain with state space $(S, \mathcal{S})$ and transition kernel $K$ (from $(S, \mathcal{S})$ to $(S, \mathcal{S})$). Canonical construction: $\Omega = S^{\mathbb{N}_0}$, $\mathcal{F} = \mathcal{S}^{\mathbb{N}_0}$, initial distribution $\mu$ (probability measure on $(S, \mathcal{S})$), kernels $K_n((x_0, \ldots, x_{n-1}), \cdot) := K(x_{n-1}, \cdot)$ for all $n$; the corresponding $P_\mu$ via $P^{(n+1)} = P^{(n)} \otimes K_{n+1}$, $P^{(0)} = \mu$; $X_n$ the coordinate maps. Write $P_x := P_{\delta_{\{x\}}}$.

A measurable function $h \geq 0$ on $S$ is called harmonic (for $K$) if $h = Kh$, where
$$(Kh)(x) := \int_S h(y)\, K(x, dy), \qquad x \in S$$
("mean value property"). With $\mathcal{F}_n := \sigma(X_0, \ldots, X_n)$, the process $M_n := h(X_n)$, $n \in \mathbb{N}_0$, is a $(P_x, \mathbb{F})$-martingale for every $x \in S$ with $h(x) < \infty$.

Proof. (i) ok; (iii): $E_x[M_n \mid \mathcal{F}_{n-1}] \overset{?}{=} M_{n-1}$: the joint distribution of $(X_0, \ldots, X_n)$ is $P^{(n)} = P^{(n-1)} \otimes K_n$. So using III.2.4,
$$E_x[h(X_n) \mid \mathcal{F}_{n-1}] = \int_S h(y)\, K(X_{n-1}, dy) = (Kh)(X_{n-1}) = h(X_{n-1}) = M_{n-1}.$$
Now (ii): iterate to get
$$E_x[M_n] = E_x[M_{n-1}] = \cdots = E_x[M_0] = E_x[h(X_0)] = h(x) < \infty.$$

Example: Simple random walk with parameter $p$: $S_n = \sum_{i=1}^n Y_i$, $x \in \mathbb{Z}$. Here, $K(x, \cdot) = p\, \delta_{\{x+1\}} + (1 - p)\, \delta_{\{x-1\}}$. The function
$$h(x) := \Big(\frac{1 - p}{p}\Big)^x, \qquad x \in \mathbb{Z},$$
is harmonic for $K$:
$$(Kh)(x) = p\, h(x + 1) + (1 - p)\, h(x - 1) = \cdots = h(x).$$
So $M_n := h(x + S_n) = \big(\frac{1 - p}{p}\big)^{x + S_n}$, $n \in \mathbb{N}_0$, is a martingale (and $\geq 0$). (For $p \neq \frac{1}{2}$; for $p = \frac{1}{2}$ it is boring.) Trajectories suggest that $(M_n)$ converges to $0$ a.s. This can be proved in two ways: by explicit computation and the strong law of large numbers (see script), or by a more general convergence theorem for martingales.

IV.2 Playing Systems, Stopping Times and Stopping Theorem

$(\Omega, \mathcal{F}, P)$ probability space; filtration $\mathbb{F} = (\mathcal{F}_t)_{t \in I}$. Start with $I = \mathbb{N}_0$ and let $X = (X_n)_{n \in \mathbb{N}_0}$ be a (real) stochastic process.

Definition. $H = (H_n)_{n \in \mathbb{N}}$ is predictable (w.r.t. $\mathbb{F}$) if $H_n$ is $\mathcal{F}_{n-1}$-measurable for all $n$.
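Two of the closed-form computations in IV.1 above can be checked numerically: the binomial formula for $E[f(S_N) \mid \mathcal{F}_n]$ on $\{S_n = s\}$, against brute-force enumeration of the remaining steps, and the mean-value property $(Kh)(x) = h(x)$ of $h(x) = \big(\frac{1-p}{p}\big)^x$. A sketch (the parameter values and the choice of $f$ are arbitrary):

```python
# 1) h(s) = Σ_k C(N−n, k) p^k (1−p)^(N−n−k) f(s + 2k − (N−n))
#    versus direct enumeration of the N − n remaining ±1 steps.
from itertools import product
from math import comb

p = 0.4
N, n, s = 6, 2, 1
f = lambda x: x * x + 1.0            # any bounded f

m = N - n
h_closed = sum(comb(m, k) * p ** k * (1 - p) ** (m - k) * f(s + 2 * k - m)
               for k in range(m + 1))

h_brute = 0.0
for ys in product([1, -1], repeat=m):
    w = 1.0
    for y in ys:
        w *= p if y == 1 else 1 - p
    h_brute += w * f(s + sum(ys))
assert abs(h_closed - h_brute) < 1e-9

# 2) mean-value property of h(x) = ((1−p)/p)^x under the biased walk kernel
harm = lambda x: ((1 - p) / p) ** x
for x in range(-3, 4):
    Kh = p * harm(x + 1) + (1 - p) * harm(x - 1)
    assert abs(Kh - harm(x)) < 1e-9
```

The second check is just the algebra $p\,r^{x+1} + (1-p)\,r^{x-1} = r^x$ for $r = \frac{1-p}{p}$, evaluated at a few points.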

Definition. A playing system (for $X$) is a real-valued stochastic process $H = (H_n)_{n \in \mathbb{N}}$ which is predictable and such that $H_n (X_n - X_{n-1}) \in L^1(P)$ for all $n$ (e.g. if $X$ is integrable, $H$ bounded is enough). Then we define
$$(H \cdot X)_n := \sum_{k=1}^n H_k (X_k - X_{k-1}), \qquad n \in \mathbb{N}_0,$$
and call $H \cdot X = ((H \cdot X)_n)_{n \in \mathbb{N}_0}$ the stochastic integral of $H$ w.r.t. $X$.

Notation: Increment $X_k - X_{k-1} =: \Delta X_k$; so $H \cdot X = \sum H_k\, \Delta X_k$ is the discrete-time version of the integral $\int H\, dX$ of $H$ (integrand) w.r.t. $X$ (integrator).

Interpretation: Think of $\Delta X_k = X_k - X_{k-1}$ as the gain/loss of a game in round $k$. Then $H$ represents a betting system: for round $k$ (from $k-1$ to $k$), place the amount $H_k$ as bet. This may use past information, but should not depend on the outcome of round $k$; so $H_k$ must be $\mathcal{F}_{k-1}$-measurable. Then $H_k\, \Delta X_k$ gives the winnings/losses from the bet in period $k$, and $H \cdot X$ is the cumulative balance evolution.

Example: $S_n = \sum_{i=1}^n Y_i$ simple random walk, $\mathbb{F}$ generated by $Y$, $M_n = x + S_n$, $x \in \mathbb{Z}$. Set $H_1 := 1$ (bet 1 initially) and
$$H_n := \begin{cases} 2^{n-1} & \text{if } Y_1 = Y_2 = \cdots = Y_{n-1} = -1, \\ 0 & \text{otherwise,} \end{cases}$$
so keep on doubling until we win, i.e. until
$$T(\omega) := \inf\{n \in \mathbb{N} \mid Y_n = +1\}.$$
Then
$$(H \cdot M)_n(\omega) = \begin{cases} \sum_{k=1}^n 2^{k-1} \cdot (-1) = -(2^n - 1) & \text{if } n < T(\omega), \\ -(2^{T(\omega)-1} - 1) + 2^{T(\omega)-1} = +1 & \text{if } n \geq T(\omega). \end{cases}$$
This system looks successful, but it cannot be, at least not on average! More precisely, we have:

Theorem IV.2.1. Let $X$ be a (sub-/super-)martingale and $H$ a playing system.

1. If $X$ is a martingale, then $H \cdot X$ is also a martingale.

2. If $X$ is a sub- or supermartingale and $H \geq 0$, then $H \cdot X$ is again a sub- or supermartingale.

Proof. Clearly, $H \cdot X$ is adapted and integrable. For (iii), write
$$E[(H \cdot X)_n - (H \cdot X)_{n-1} \mid \mathcal{F}_{n-1}] = E[H_n (X_n - X_{n-1}) \mid \mathcal{F}_{n-1}] = H_n\, E[X_n - X_{n-1} \mid \mathcal{F}_{n-1}],$$
and this implies (1) and (2).
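The discrete stochastic integral and the doubling system can be traced along one fixed trajectory of coin tosses. A minimal sketch (the path and all names are our own choices):

```python
# (H·X)_n = Σ_{k≤n} H_k (X_k − X_{k−1}) and the doubling system on one path.
def stoch_int(H, X):
    """(H·X)_n for n = 0..len(H); X has one more entry than H (X_0 included)."""
    out = [0.0]
    for k, h in enumerate(H, start=1):
        out.append(out[-1] + h * (X[k] - X[k - 1]))
    return out

ys = [-1, -1, -1, 1, -1]             # one trajectory: first win in round T = 4
X = [0]
for y in ys:
    X.append(X[-1] + y)

# doubling: bet 2^{n−1} as long as every previous toss lost, else bet 0
H = []
for k in range(1, len(ys) + 1):
    H.append(float(2 ** (k - 1)) if all(y == -1 for y in ys[:k - 1]) else 0.0)

V = stoch_int(H, X)
# balance is −(2^n − 1) before the first win and +1 from then on
assert V[:4] == [0.0, -1.0, -3.0, -7.0]
assert V[4:] == [1.0, 1.0]
```

Note that each $H_k$ only looks at the tosses strictly before round $k$, which is exactly predictability; the theorem then says that despite the final balance of $+1$, $(H \cdot M)$ is still a martingale, so no betting system can beat a fair game on average.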

23 IV.2 Playing Systems, Stopping Times and Stopping Theorem 23 Special type of betting: place unit bet until some random time T(ω), then stop: H n (ω) := I n T(ω). For H to be predictable, we want {T n} F n 1, or equivalently {T n 1} F n 1. Definition. Take I [0, ). A stopping time (wrt. F) is a mapping T : Ω I { } with {T t} F t, t I. Interpretation.: Decision on stopping depends only on available information. Remark.: For I = N 0, T stopping time {T = n} F n, n: {T n} = shows, and follows from n {T = k} }{{} F k F n k=1. F n 1 F n {T = n} = {T n} {T n 1} }{{}}{{} F n Example (Canonical example).: X = (X n ) n N0 adapted process with values in (S, S), A S and T A is a stopping time: T A (ω) := inf{n N 0 X n (ω) A} {T A n} = = first hitting time of X to set A n {X k A} F n, n k=0 Remark.: For I = [0, ), this is still (almost) true: one needs a bit of regularity on trajectories of X, and the proof is very difficult. Example (Canonical non-example).: Time of last visit to A, is not a stopping time in general. L A (ω) := sup{n N 0 X n (ω) A} Definition. X = (X t ) t I stochastic process, T : Ω I mapping ( random time ). We define X T : Ω R ω (X T )(ω) := X T(ω) (ω), the value of X at time T. For T : Ω I { } the stopped process is X T = (X T t ) t I, defined by X T t := X T t. Definition. F = (F t ) t I filtration, T a stopping time F T := {A F A {T t} F t, t I} σ-field of events observable up to time T. Now we return to I = N 0.
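To make the contrast between the canonical example and non-example concrete, here is a toy computation on a made-up trajectory: the first hitting time T_A can be decided from the path seen so far, while the last visit L_A requires knowing the entire future.

```python
# illustrative only: a made-up finite trajectory of X and a target set A
path = [3, 2, 1, 0, 1, 2, 1, 0, 2, 3]
A = {0}

def hitting_time(path, A):
    """T_A = inf{n : X_n in A}; {T_A <= n} is decided by X_0, ..., X_n alone."""
    for n, x in enumerate(path):
        if x in A:
            return n
    return float("inf")

T = hitting_time(path, A)
# last visit: needs the whole path, so L_A is not a stopping time in general
L = max(n for n, x in enumerate(path) if x in A)
assert T == 3 and L == 7
print(T, L)
```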

24 IV.2 Playing Systems, Stopping Times and Stopping Theorem 24 Theorem IV.2.2 (Stopping theorem I). 1. Suppose X = (X n ) n N0 is a (sub-/super-) martingale and T a stopping time (values in N 0 = N 0 {+ }). Then X T is again (sub-/super-) martingale. 2. Suppose X = M is a martingale and T a stopping time. Then E[M T n ] = E[M 0 ], n N 0. This implies E[M T ] = E[M 0 ] if we have a) T is bounded, i.e. T N P-a.s., for some N N; or b) T < P-a.s. and (M T n ) n N0 is uniformly integrable. Remark.: For the doubling system, we had (H S) 0 = 0 and (H S) T = 1, so something in the theorem is violated. Proof. 1. H n := I {n T } is a playing system and 0, so H X is again (sub-/super-) martingale. But (H X) n = n n T H k X k = 1 X k k=1 k=1 = X n T X 0 = X T n X 0 2. M martingale, T stopping time implies by (1): M T is a martingale, so E[M 0 ] = E[M T 0 ] = E[MT n ] = E[M T n] As n, T n T P-a.s. under both (a) and (b), so M T n M T P-a.s. So it remains to prove E[M T ] = lim E[M T n ]. In case (b), (M T n ) n N0 is UI and converges P-a.s., hence also convergent (to M T ) in L 1 : done. In case (a), M T n max k=0,...,n M k L 1 and we can use Lebesgue. Remark.: In general neither (a) nor (b) holds, so you will have to (similiarly) find a proof that the expected values converge. Example (Ruin problem).: X n = x + S n simple random walk with x Z, parameter p. Take a, b Z with a < x < b and set T a,b := inf{n N 0 X n / (a, b)}. Interpretation: A gambler plays against the bank with unit bets and starting capitals x a and b x, resp. At T a,b, one of the two is ruined the gambler at the bottom, the bank at the top. What is the probability of ruin for the player? T a,b is a stopping time (for F generated by Y or by X), and T a,b < P-a.s. (Borel-Cantelli). We want to find r(x) = P[x + S Ta,b = a]

25 IV.2 Playing Systems, Stopping Times and Stopping Theorem 25 Case p = 1 2 : S is a martingale ST a,b is also a martingale and bounded, hence UI x = E[x + 0] = E[x + S Ta,b ] = ar(x) + b(1 r(x)) r(x) = b x b a i.e. the ratio of bank s initial capital to total initial capital. Case p 1 2 : M n := h(x + S n ) with h(x) = ( 1 p p ) x is martingale M T a,b is also a martingale E[M 0 ] = E[M Ta,b n], n; M T a,b is bounded h(x) = E[M 0 ] = E[M Ta,b ] = h(a) r(x) + h(b) (1 r(x)) r(x) = h(b) h(x) h(b) h(a) = 1 ( 1 ( p p 1 p )b x 1 p )b a For p < 1 2 (game is unfair for player), the denominator is < 1, so r(x) 1 ( p 1 p )b x, and this depends only on the initial capital b x of the bank. So the bank can make the probability of ruin arbitrarily close to 1, uniformly over all players, by choosing a large initial capital. Example.: Roulette, p = 18 38, b x = 66 r(x) (!). Example (Sums of random variables).: F = (F n ) n N0 on (Ω, F, P); (Y i ) i N random variables with E[ Y i ] m <, E[Y i ] m, Var[Y i ] σ 2 <. Moreover, (Y i ) is adapted to F and Y i is independent of F i 1, i. (E.g. (Y i ) i N i.i.d. in L 2 and F n := σ(y 1,..., Y n ).) Let T be a stopping time wrt. F with E[T] <. Then we have the generalized Wald identities: For S T := T Y i, 1) S T L 1 and E[S T ] = me[t]. 2) If m = 0, then E[S 2 T ] = σ2 E[T]. 3) If T L 2 and T is independent of the Y i, then Proof. Var[S T ] = σ 2 E[T] + m 2 Var[T]. a) M n := S n nm = n (Y i m) is a martingale stopping theorem: so we need to exchange E and lim. E[S } T {{ n ] = m E[T n] }}{{} S T b) Integrable upper bound for (S T n ) n N : M n := S n n m := րe[t] n ( Y i m) is a martingale with monotone integration: E[ S T n ] }{{} րe[ S T] = m E[T n] }{{} րe[t]

26 IV.2 Playing Systems, Stopping Times and Stopping Theorem 26 E[ S T ] = me[t] <, so S T L 1. And now S T n S T n Lebesgue gives E[S T ] = lim E[S T n] (a) = me[t] which proves (1). c) U n := M 2 n nσ2 is martingale: M n+1 M n = Y n+1 m U n+1 U n = (Y n+1 m) 2 + 2M n (Y n+1 m) σ 2, and this has conditional expectation 0 given F n. Hence stopping theorem gives E[MT 2 n ] = σ2 E[T n], n }{{} րe[t]< Fatou: E[M 2 T ] liminf E[M2 T n ] = σ2 E[T] <, so M T L 2. Moreover, sup n E[M 2 T n ] σ2 E[T] <, so (M T n ) n N is bounded in L 2 and therefore UI. So M T = (M T n ) n N is a martingale and UI; so stopping theorem, part II (see next section) gives so, by Jensen E[M T F n ] = E[M T F n] = M T n = M T n So (M 2 T n ) n N is UI, since M 2 T L1. Therefore (M 2 T n M2 T For m = 0, M = S, so we get (2). 0 M 2 T n E[M2 T F n], n. P-a.s. as n, it is UI E[M 2 T ] = lim E[M2 T n ] = σ2 E[T] d) If m is arbitrary, first use (2) for Ỹi := Y i m; so σ 2 E[T] = E[ S 2 T ] = E[S2 T ] 2mE[TS T] + m 2 E[T 2 ] using T L 2. Now compute (T independent of (Y i )): E[TS T ] = E[E[TS T T]] = E [ E[nS n ] n=t ] = E[T 2 m] E[S 2 T] = σ 2 E[T] + m 2 E[T 2 ] Remark.: Suppose m = 0 so that S n = n Y i is a martingale. For c > 0, define T c := inf{n N 0 S n c}. Then E[T c ] = +, and the symmetric result holds for c < 0 (with... c). Indeed: If we had E[T c ] <, (1) would give E[S Tc ] = 0, but S Tc c. In particular: (S n ) symmetric simple random walk (p = 1 2 ) T 1 := inf{n N 0 S n = 1} T 1 := inf{n N 0 S n = 1} both have E[T 1 ] = E[T 1 ] = + (!). Later (Section 4): T ±1 < P-a.s.
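A Monte Carlo sanity check of Wald's first identity (1), with assumed parameters: for a walk with drift m = 2p - 1 > 0 and T the first hitting time of level +1, we have S_T = 1 and E[T] < ∞, so (1) forces E[T] = 1/m.

```python
import random

random.seed(2)

p = 0.6                      # upward drift (assumed value), m = E[Y_i] = 2p - 1
m, trials = 2 * p - 1, 100_000
tot_T = tot_S = 0
for _ in range(trials):
    s = n = 0
    while s < 1:             # T = first hitting time of +1, finite a.s. for p > 1/2
        s += 1 if random.random() < p else -1
        n += 1
    tot_T += n
    tot_S += s               # S_T = 1 on every run
ET, ES = tot_T / trials, tot_S / trials
assert abs(ES - m * ET) < 0.05   # Wald: E[S_T] = m E[T], i.e. E[T] ~ 1/m = 5
print(ET)
```

Note that the remark above rules out running the same check for the symmetric walk: there E[T_1] = +∞, so the identity is vacuous.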

27 IV.3 The Convergence Theorem 27 IV.3 The Convergence Theorem (Ω, F, P), I = N 0, F = (F n ) n N0, X = (X n ) n N0. Fix a < b and consider upcrossings of X across (a, b) up to N: Time intervals S i < T i where the process travels from a or below to b or above. Formally: S 0 := T 0 := 0, (so X Sk a, X Tk b). Finally, S k (ω) := inf{n T k 1 (ω) X n (ω) a} = beginning of upcrossing k T k (ω) := inf{n S k (ω) X n (ω) b} = end of upcrossing k U N a,b(ω) := sup{k T k (ω) N} If X is adapted to F, the S k, T k are stopping times. Lemma IV.3.1. If X is a supermartingale, then Proof. Define E[U N a,b] 1 b a E[(X N a) ] H n := I {Sk <n T k }. k=1 Then H is predictable, because S k, T k are stopping times, and bounded, hence H is a playing system for X. Intuition: bet +1 during each upcrossing, 0 otherwise. Moreover, H 0, so H X is again a supermartingale, so 0 E[(H X) N X 0 ] where (H X) N X 0 = X Tk N X Sk N Ua,b N (b a) + X N X (SU N +1 ) N a,b k=1 If S U N a,b +1 < N, then X (SU N a,b +1 ) N a; if S U N a,b +1 N, then X (SU N a,b +1 ) N = X N. So This gives the assertion. X N X (SU N a,b +1 ) N (X N a) As N, U N a,b ր U a,b := total number of upcrossings of X through (a, b). Monotone integration gives for supermartingales RHS is finite if E[U a,b ] 1 b a sup E[(X N a) ] N N sup E[X N ] < (4) N N or, equivalently for a supermartingale, (X n ) n N0 is bounded in L 1. Supermartingale E[X 0 ] E[X n ] = E[X + n ] E[X n ] E[X n ] E[ X n ] = E[X + n ] + E[X N ] E[X 0] + 2E[X n ]
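A numerical illustration of the upcrossing inequality of Lemma IV.3.1, with arbitrary parameters: a random walk with downward drift (p = 0.45) is a supermartingale, and the empirical mean number of upcrossings of (a, b) up to time N stays below E[(X_N - a)^-]/(b - a).

```python
import random

random.seed(5)

p = 0.45                         # downward drift: the walk is a supermartingale
a, b, N, trials = -2.0, 2.0, 50, 20_000
sum_up = sum_neg = 0.0
for _ in range(trials):
    s, path = 0, [0]
    for _ in range(N):
        s += 1 if random.random() < p else -1
        path.append(s)
    # count upcrossings of (a, b): completed trips from <= a to >= b
    up, below = 0, path[0] <= a
    for v in path[1:]:
        if below and v >= b:
            up, below = up + 1, False
        elif not below and v <= a:
            below = True
    sum_up += up
    sum_neg += max(a - path[-1], 0.0)    # (X_N - a)^- = max(a - X_N, 0)
E_up = sum_up / trials
bound = (sum_neg / trials) / (b - a)
assert E_up <= bound + 0.02              # Lemma IV.3.1, up to Monte Carlo error
print(E_up, "<=", bound)
```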

28 IV.3 The Convergence Theorem 28 Theorem IV.3.2 (Convergence theorem, Doob). Any supermartingale X with sup N N E[X N ] < converges P-a.s. to some X L 1. In particular, any (super)martingale 0 converges P-a.s. Proof. {lim inf X n < limsup X n } {U a,b = + } a<b a,b Q But E[U a,b ] < by (4), a < b, so P[U a,b = + ] = 0 for each pair a < b P[liminf X n < limsup X n ] = 0: convergence P-a.s. Moreover, so X L 1. E[ X ] liminf E[ X n ] sup E[ X N ] < N N X = (X n ) (n N 0 ) is martingale wrt. F = (F n ) (n N 0 ). Suppose X is bounded in L 1 convergence theorem: X := lim X n P-a.s., and X L 1. Define F := σ( n N 0 F n ) X is F -measurable. Set N 0 := N 0 {+ }. Then (X n ) n N0 is adapted to (F N ) n N0 and integrable. Is this still a martingale on N 0? Theorem IV.3.3. For a martingale X = (X n ) n N0, TFAE: 1. Y L 1 with X n = E[Y F n ], n N 0 (i.e. the X n are successive predictions). 2. (X n ) n N0 converges in L 1 (to some F -measurable random variable). 3. X L 1 (P, F ) such that (X n ) n N0 is a martingale on N (X n ) n N0 is uniformly integrable. Moreover, we then have X = E[Y F ]. Remark (Terminology).: X as above is called closable on the right, and X closes (X n ) n N0 on the right as a martingale. Remark.: In the same way: X = (X n ) n N0 supermartingale is closable on the right as a supermartingale iff (X n ) n N 0 is UI. Proof. (1) (4) : Y L 1 {E[Y G] G F σ-field} is UI (see previous exercise). (4) (2) : (X n ) UI (X n ) bounded in L 1 (X n ) converges P-a.s. (X n ) converges in L 1, since it is UI. (2) (3) : X n X in L 1, A F n fixed X n+m I A X I A colon(m ) in L 1 E[X I A ] = lim m E[X n+mi A ] = lim m E[X ni A ] = E[X n I A ] A F n, n X n = E[X F n ] P-a.s. (3) (1) : Take Y := X.

29 IV.3 The Convergence Theorem 29 Finally, as in (2) (3), for all n, A F n, E[X I A ] = E[X n I A ] = E[Y I A ] hence this holds for all A n N 0 F n, this is -closed and generates F, so it still holds for A F X = E[Y F ] P-a.s. Application: generalize stopping theorem to unbounded stopping times in a different form. Indeed: M martingale E[M t F s ] = M s P-a.s. for s t, s and t deterministic. Goal: still true for s t by stopping times S T. Recall that Exercise: S T F S F T. F T := {A F A {T t} F t, t I} Moreover: I = N 0, A F T, T stopping time A {T = k} F k, k, because A {T = k} = A {T k} {T > k 1} = A {T k} {T k 1} F k. Theorem IV.3.4 (Stopping theorem II). Let M = (M n ) n N0 be a martingale and UI. If S, T are stopping times ( N 0 -valued) with S T, then M S, M T L 1 and E[M T F S ] = M S P-a.s. Proof. F S F T we prove only M T = E[M F T ] (and then use projectivity). Moreover, M T is F T -measurable only need E[M T I A ] = E[M I A ], A F T Write M = lim M I {T n} =: lim U n, M T = lim M TI {T n} =: lim V n. Then U n = M I {T=k}, V n = M k I {T=k}, k=0 k=0 and M k = E[M F k ] gives for A F T E[M I {T=k} I A ] = E[M I A {T=k} }{{} F k ] = E[M k I {T=k}I A ], so E[U n I A ] = E[V n I A ]. It remains to interchange E and lim n, for both U n, V n : 1. U n I A = M I {T n} I A M I A (n ) P-a.s., U n I A M, M L 1 can use Lebesgue. 2. V n I A = M T I {T n} I A M T I A (n ) P-a.s., V n I A M T can use Lebesgue as soon as M T L 1.

30 IV.4 Applications V n ր M T E[ M T ] = lim E[ V n ] lim n E[ M k I {T=k} ] M k = E[M F k ] M k E[ M F k ] E[ M k I {T=k} ] E[ M I {T=k} ] gives E[ M T ] lim = k=0 k=0 n E[ M I {T=k} ] E[ M I {T=k} ] k=0 [ ] = E M I {T=k} k=0 = E[ M ] <. IV.4 Applications IV.4.1 Simple Random Walk S n = n Y i simple random walk with p = 1 2 (symmetric). Then S = (S n) n N is a martingale, but cannot converge, because S n+1 S n = 1. Nevertheless, the convergence theorem is useful: For c Z, let T c := inf{n N 0 S n = c}. Then S Tc is a martingale and bounded above (if c > 0) or below (if c < 0), so the convergence theorem applies to S Tc : lim S T c n Z P-a.s. But this implies P[T c < ] = 1, c Z, and then P[lim inf S n =, limsup S n = + ] = 1 i.e. S oscillates unboundedly in both directions with probability 1. IV.4.2 Dirichlet Problem and Markov Chains (S, S) measure space, K kernel from (S, S) to (S, S). For h: S R measurable and 0 or bounded, define (Kh)(x) := h(y)k(x, dy), x S (again a measurable function). Call h harmonic if Kh = h. S Now fix A S and g: S R measurable and bounded. Dirichlet problem for (A, g): Find a function h such that

31 IV.4 Applications h is harmonic on A, i.e. Kh = h on A. 2. h = g on A. Convergent hom. Markov chain X = (X n ) n N0 with transition kernel K (see IV.1), constructed as coordinate process on Ω = S N0 ; write P x for corresponding distribution on (Ω, F), F = S N0, with initial distribution µ = δ {x}, x S. Define T A := inf{n N 0 X n A} as the first hitting time of X to A, and F n := σ(x 0,..., X n ). Theorem IV.4.1. Suppose P x [T A < ] = 1, x S (or x A). Then h(x) := E x [g(x TA )] is the unique bounded solution of the Dirichlet problem for (A, g) (can then use a numerical simulation). Proof. Uniqueness: write T := T A and suppose f is a solution. Then M n := f(x T n ) is a martingale under P x, x S, and bounded: x A T = T A 0 M n f(x). x A: E x [M n+1 F n ] = E x [f(x T (n+1) )(I {T n} + I {T>n} ) F n ] = I {T n} f(x T n ) + I {T>n} E x [f(x n+1 ) F n ] = I {T n} f(x T n ) + I {T>n} (Kf)(X n ) and finally, because x A, T = T A > n and f harmonic on A: E x [M n+1 F n ] = I {T n} f(x T n ) + I {T>n} f(x n ) = f(x T n ) = M n M bounded, so by the stopping theorem, f(x) = E x [M 0 ] = E x [M T ] = E x [f(x TA )] = E x[g(x TA )]. Existence: h is bounded, x A T A = 0 P x -a.s. h(x) = E x [g(x TA )] = g(x), so h = g on A. To show that h is harmonic, we use the Markov property: define shift θ: Ω Ω by Then (x 0, x 1, x 2,...) = ω θω := (x 1, x 2, x 3,...) E x [b(x) θ F 1 ](ω) = E X1(ω)[b(X)] := E z [b(x)] z=x1(ω) for any b: Ω R measurable and bounded or 0. So: (Kh)(x) = h(y)k(x, dy) S [ ] = E x [h(x 1 )] = E x E X1 [g(x )] TA ] ] = E x [E X1 [g(x ) θ F TA 1] = E x [g(x ) θ TA

But for x ∉ A:

    T = inf{n ∈ N_0 | X_n ∈ A} = inf{n ≥ 1 | X_n ∈ A} = inf{n ≥ 0 | (X∘θ)_n ∈ A} + 1 = T∘θ + 1,

so g(X_T)∘θ = g(X_T) P_x-a.s. for x ∉ A. So (Kh)(x) = E_x[g(X_T)] = h(x).

IV.4.3 Unfair Games

The symmetric simple random walk (S_n) admits a simple winning strategy: T_1 := inf{n ∈ N_0 | S_n = +1} < ∞ P-a.s., so simply bet on +1 and wait until this happens. Two problems: you might starve, because E[T_1] = +∞; and you need a really generous sponsor, because S^{T_1} is not bounded below (indeed: if it were, then S^{T_1} would be bounded, since also S^{T_1} ≤ 1; but then S^{T_1} would be a bounded martingale, giving 1 = E[S_{T_1}] = E[S_0] = 0, a contradiction). So the above strategy is not practically feasible — but in theory it is ok.

More realistic situation: call (V_n)_{n∈N_0} the balance evolution in a game. Assume:

a) V_n ≥ 0 (no debts allowed)

b) E[V_n] < ∞ and E[V_{n+1} | F_n] ≤ V_n (unfair game; here F_n = σ(V_0, ..., V_n))

c) there is δ > 0 with either V_{n+1} = V_n or |V_{n+1} - V_n| ≥ δ P-a.s. (minimal gain or loss of δ in each round played)

Terminology: we play in round n if |V_{n+1} - V_n| ≥ δ. Now let

    T := sup{n ∈ N_0 | |V_n - V_{n-1}| ≥ δ} = number of the last round in which we play.

Theorem IV.4.2. Under the above assumptions:

1) T < ∞ P-a.s. (we have to stop playing in finite time)

2) E[V_T] ≤ E[V_0]

Proof. By b) and a), V is a supermartingale and ≥ 0, so (V_n) converges P-a.s. Hence

    P[|V_n - V_{n-1}| ≥ δ infinitely often] = 0,

and this says exactly that T < ∞ P-a.s. Moreover, since V_n = V_T on {T ≤ n},

    E[V_T] = E[lim_{n→∞} V_T I_{T≤n}] = lim_{n→∞} E[V_T I_{T≤n}] = lim_{n→∞} E[V_n I_{T≤n}]

33 IV.4 Applications 33 and because V n I {T n} V n, so E[V T ] lim E[V n] E[V 0 ]. IV.4.4 Martingales with Bounded Increments Theorem IV.4.3. Suppose M is a martingale with respect to F such that sup n N M n := sup M n M n 1 L 1 (P) n N Then with probability 1, the trajectories of M are either convergent (to a finite limit) or oscillating between and +, i.e. with we have P[C O] = 1. C := {ω lim M n(ω) =: M (ω) R} O := {ω liminf M n(ω) =, limsup M n (ω) = + } Proof. Fix a < 0, a Z. Define T a := inf{n N 0 M n a}. Then M Ta because is a martingale to which we can apply the convergence theorem, { Mn Ta = M = M 0 if M 0 a T a n a sup M n if M 0 > a and so sup n E[(Mn Ta) ] <. So M Ta converges P-a.s. to some finite limit, so M n converges to some finite limit P-a.s. on {T a = } and so: P[C {liminf M n > }] P[C {T a = }] = 0 }{{} a Z 0 Analogously so the assertion follows. P[C {limsupm n < }] = 0 Corollary IV.4.4 (generalized Borel-Cantelli). F = (F n ) n N0 filtration, A n F n n, A = n N m n A m = many of the A n occur. Then { } A = P[A n F n 1 ] = + n=1 P-a.s. (5) Classical case is special case: If A n independent of F n 1, n, then P[A n F n 1 ] = P[A n ] and so (5) becomes { Ω if n=1 A = P[A n] = + if n=1 P[A n] <

34 IV.4 Applications 34 Proof. M n = n (I Ak P[A k F k 1 ]) k=1 is martingale with sup n M n 2 L 1 (P) can apply previous theorem. Now on C, we have I A = 1 def I Ak = + k=1 on C P[A k F k 1 ] = + k=1 and on O, we must have k=1 I A k = + and k=1 P[A k F k 1 ] = +. Because P[C O] = 1, the assertion follows. Example.: Consider MC X = (X n ) n N0 with state space S = {0,..., N} and transition kernel K(x, y) := K(x, {y}). Assume: (i) For x {1,...,N 1}, K(x, x) 1 (interior points are not absorbing) (ii) y S yk(x, y) = x, x S (the function x h(x) = x is harmonic for K) Choose x = 0 or x = N to get from (ii) that K(0, 0) = K(N, N) = 1: endpoints are absorbing. Denote by T := inf{n N 0 X n = 0 or X n = N} the time to absorption. Then for all x S, a) P x [T < ] = 1. b) P x [X T = N] = x N. Proof. X n = h(x n ) is by (ii) a martingale (under P x ) and bounded, so it converges P x -a.s. to some limit. This means that X n (ω) const. for n n 0 (ω), P x -a.s. Moreover, on {X n = y} for 0 < y < N so P[X n+1 X n F n ] = 1 K(y, y) 0 P[X n+1 X n F n ] = + n=1 therefore also X n+1 X n -often (P x -a.s.). So P x -a.s. the limit X cannot be a y with 0 < y < N T < P x -a.s. And now: x = E x [X 0 ] = E x [X T ] = 0 P x [X T = 0] + N P x [X T = N], solve for P x [X T = N] and the result follows. Example (voter model).: N people, each of these can vote for one of two parties. Behaviour: if party 1 at last election had x votes, then each person votes for 1 with probability x N, independently of each other.
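The voter model can be simulated directly. The sketch below (with assumed sizes N = 20 and x_0 = 7 initial votes for party 1) checks the martingale prediction from the preceding example: the vote count X_n is a bounded martingale absorbed at 0 or N, so consensus for party 1 is reached with probability x_0/N.

```python
import random

random.seed(4)

N, x0, trials = 20, 7, 10_000     # electorate size and initial votes (assumed)
wins = 0
for _ in range(trials):
    x = x0
    while 0 < x < N:
        # each voter independently votes for party 1 with probability x/N,
        # so X_{n+1} ~ Binomial(N, X_n/N) and E[X_{n+1} | X_n] = X_n
        x = sum(random.random() < x / N for _ in range(N))
    wins += (x == N)
# martingale argument: P[consensus for party 1] = x0 / N
assert abs(wins / trials - x0 / N) < 0.02
print(wins / trials)
```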


More information

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R.

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R. Ergodic Theorems Samy Tindel Purdue University Probability Theory 2 - MA 539 Taken from Probability: Theory and examples by R. Durrett Samy T. Ergodic theorems Probability Theory 1 / 92 Outline 1 Definitions

More information

Problem Sheet 1. You may assume that both F and F are σ-fields. (a) Show that F F is not a σ-field. (b) Let X : Ω R be defined by 1 if n = 1

Problem Sheet 1. You may assume that both F and F are σ-fields. (a) Show that F F is not a σ-field. (b) Let X : Ω R be defined by 1 if n = 1 Problem Sheet 1 1. Let Ω = {1, 2, 3}. Let F = {, {1}, {2, 3}, {1, 2, 3}}, F = {, {2}, {1, 3}, {1, 2, 3}}. You may assume that both F and F are σ-fields. (a) Show that F F is not a σ-field. (b) Let X :

More information

Dynkin (λ-) and π-systems; monotone classes of sets, and of functions with some examples of application (mainly of a probabilistic flavor)

Dynkin (λ-) and π-systems; monotone classes of sets, and of functions with some examples of application (mainly of a probabilistic flavor) Dynkin (λ-) and π-systems; monotone classes of sets, and of functions with some examples of application (mainly of a probabilistic flavor) Matija Vidmar February 7, 2018 1 Dynkin and π-systems Some basic

More information

4 Sums of Independent Random Variables

4 Sums of Independent Random Variables 4 Sums of Independent Random Variables Standing Assumptions: Assume throughout this section that (,F,P) is a fixed probability space and that X 1, X 2, X 3,... are independent real-valued random variables

More information

1 Stochastic Dynamic Programming

1 Stochastic Dynamic Programming 1 Stochastic Dynamic Programming Formally, a stochastic dynamic program has the same components as a deterministic one; the only modification is to the state transition equation. When events in the future

More information

Lecture 3 - Expectation, inequalities and laws of large numbers

Lecture 3 - Expectation, inequalities and laws of large numbers Lecture 3 - Expectation, inequalities and laws of large numbers Jan Bouda FI MU April 19, 2009 Jan Bouda (FI MU) Lecture 3 - Expectation, inequalities and laws of large numbersapril 19, 2009 1 / 67 Part

More information

Lecture 5. 1 Chung-Fuchs Theorem. Tel Aviv University Spring 2011

Lecture 5. 1 Chung-Fuchs Theorem. Tel Aviv University Spring 2011 Random Walks and Brownian Motion Tel Aviv University Spring 20 Instructor: Ron Peled Lecture 5 Lecture date: Feb 28, 20 Scribe: Yishai Kohn In today's lecture we return to the Chung-Fuchs theorem regarding

More information

Lecture 2: Random Variables and Expectation

Lecture 2: Random Variables and Expectation Econ 514: Probability and Statistics Lecture 2: Random Variables and Expectation Definition of function: Given sets X and Y, a function f with domain X and image Y is a rule that assigns to every x X one

More information

Markov processes Course note 2. Martingale problems, recurrence properties of discrete time chains.

Markov processes Course note 2. Martingale problems, recurrence properties of discrete time chains. Institute for Applied Mathematics WS17/18 Massimiliano Gubinelli Markov processes Course note 2. Martingale problems, recurrence properties of discrete time chains. [version 1, 2017.11.1] We introduce

More information

Lecture 19 L 2 -Stochastic integration

Lecture 19 L 2 -Stochastic integration Lecture 19: L 2 -Stochastic integration 1 of 12 Course: Theory of Probability II Term: Spring 215 Instructor: Gordan Zitkovic Lecture 19 L 2 -Stochastic integration The stochastic integral for processes

More information

Notes 1 : Measure-theoretic foundations I

Notes 1 : Measure-theoretic foundations I Notes 1 : Measure-theoretic foundations I Math 733-734: Theory of Probability Lecturer: Sebastien Roch References: [Wil91, Section 1.0-1.8, 2.1-2.3, 3.1-3.11], [Fel68, Sections 7.2, 8.1, 9.6], [Dur10,

More information

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015 Part IA Probability Definitions Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

More information

Gaussian vectors and central limit theorem

Gaussian vectors and central limit theorem Gaussian vectors and central limit theorem Samy Tindel Purdue University Probability Theory 2 - MA 539 Samy T. Gaussian vectors & CLT Probability Theory 1 / 86 Outline 1 Real Gaussian random variables

More information

Probability and Measure

Probability and Measure Chapter 4 Probability and Measure 4.1 Introduction In this chapter we will examine probability theory from the measure theoretic perspective. The realisation that measure theory is the foundation of probability

More information

Exercises in stochastic analysis

Exercises in stochastic analysis Exercises in stochastic analysis Franco Flandoli, Mario Maurelli, Dario Trevisan The exercises with a P are those which have been done totally or partially) in the previous lectures; the exercises with

More information

MATH 418: Lectures on Conditional Expectation

MATH 418: Lectures on Conditional Expectation MATH 418: Lectures on Conditional Expectation Instructor: r. Ed Perkins, Notes taken by Adrian She Conditional expectation is one of the most useful tools of probability. The Radon-Nikodym theorem enables

More information

Building Infinite Processes from Finite-Dimensional Distributions

Building Infinite Processes from Finite-Dimensional Distributions Chapter 2 Building Infinite Processes from Finite-Dimensional Distributions Section 2.1 introduces the finite-dimensional distributions of a stochastic process, and shows how they determine its infinite-dimensional

More information

Notes 18 : Optional Sampling Theorem

Notes 18 : Optional Sampling Theorem Notes 18 : Optional Sampling Theorem Math 733-734: Theory of Probability Lecturer: Sebastien Roch References: [Wil91, Chapter 14], [Dur10, Section 5.7]. Recall: DEF 18.1 (Uniform Integrability) A collection

More information

STAT331 Lebesgue-Stieltjes Integrals, Martingales, Counting Processes

STAT331 Lebesgue-Stieltjes Integrals, Martingales, Counting Processes STAT331 Lebesgue-Stieltjes Integrals, Martingales, Counting Processes This section introduces Lebesgue-Stieltjes integrals, and defines two important stochastic processes: a martingale process and a counting

More information

STAT 7032 Probability Spring Wlodek Bryc

STAT 7032 Probability Spring Wlodek Bryc STAT 7032 Probability Spring 2018 Wlodek Bryc Created: Friday, Jan 2, 2014 Revised for Spring 2018 Printed: January 9, 2018 File: Grad-Prob-2018.TEX Department of Mathematical Sciences, University of Cincinnati,

More information

Stochastic integration. P.J.C. Spreij

Stochastic integration. P.J.C. Spreij Stochastic integration P.J.C. Spreij this version: April 22, 29 Contents 1 Stochastic processes 1 1.1 General theory............................... 1 1.2 Stopping times...............................

More information

1. Probability Measure and Integration Theory in a Nutshell

1. Probability Measure and Integration Theory in a Nutshell 1. Probability Measure and Integration Theory in a Nutshell 1.1. Measurable Space and Measurable Functions Definition 1.1. A measurable space is a tuple (Ω, F) where Ω is a set and F a σ-algebra on Ω,

More information

1 Probability space and random variables

1 Probability space and random variables 1 Probability space and random variables As graduate level, we inevitably need to study probability based on measure theory. It obscures some intuitions in probability, but it also supplements our intuition,

More information

Selected Exercises on Expectations and Some Probability Inequalities

Selected Exercises on Expectations and Some Probability Inequalities Selected Exercises on Expectations and Some Probability Inequalities # If E(X 2 ) = and E X a > 0, then P( X λa) ( λ) 2 a 2 for 0 < λ

More information

Monte-Carlo MMD-MA, Université Paris-Dauphine. Xiaolu Tan

Monte-Carlo MMD-MA, Université Paris-Dauphine. Xiaolu Tan Monte-Carlo MMD-MA, Université Paris-Dauphine Xiaolu Tan tan@ceremade.dauphine.fr Septembre 2015 Contents 1 Introduction 1 1.1 The principle.................................. 1 1.2 The error analysis

More information

Lecture 3: Expected Value. These integrals are taken over all of Ω. If we wish to integrate over a measurable subset A Ω, we will write

Lecture 3: Expected Value. These integrals are taken over all of Ω. If we wish to integrate over a measurable subset A Ω, we will write Lecture 3: Expected Value 1.) Definitions. If X 0 is a random variable on (Ω, F, P), then we define its expected value to be EX = XdP. Notice that this quantity may be. For general X, we say that EX exists

More information

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities PCMI 207 - Introduction to Random Matrix Theory Handout #2 06.27.207 REVIEW OF PROBABILITY THEORY Chapter - Events and Their Probabilities.. Events as Sets Definition (σ-field). A collection F of subsets

More information

Stochastic Processes II/ Wahrscheinlichkeitstheorie III. Lecture Notes

Stochastic Processes II/ Wahrscheinlichkeitstheorie III. Lecture Notes BMS Basic Course Stochastic Processes II/ Wahrscheinlichkeitstheorie III Michael Scheutzow Lecture Notes Technische Universität Berlin Sommersemester 218 preliminary version October 12th 218 Contents

More information

{σ x >t}p x. (σ x >t)=e at.

{σ x >t}p x. (σ x >t)=e at. 3.11. EXERCISES 121 3.11 Exercises Exercise 3.1 Consider the Ornstein Uhlenbeck process in example 3.1.7(B). Show that the defined process is a Markov process which converges in distribution to an N(0,σ

More information

Universal examples. Chapter The Bernoulli process

Universal examples. Chapter The Bernoulli process Chapter 1 Universal examples 1.1 The Bernoulli process First description: Bernoulli random variables Y i for i = 1, 2, 3,... independent with P [Y i = 1] = p and P [Y i = ] = 1 p. Second description: Binomial

More information

Probability Theory II. Spring 2016 Peter Orbanz

Probability Theory II. Spring 2016 Peter Orbanz Probability Theory II Spring 2016 Peter Orbanz Contents Chapter 1. Martingales 1 1.1. Martingales indexed by partially ordered sets 1 1.2. Martingales from adapted processes 4 1.3. Stopping times and

More information

Stochastic Processes. Winter Term Paolo Di Tella Technische Universität Dresden Institut für Stochastik

Stochastic Processes. Winter Term Paolo Di Tella Technische Universität Dresden Institut für Stochastik Stochastic Processes Winter Term 2016-2017 Paolo Di Tella Technische Universität Dresden Institut für Stochastik Contents 1 Preliminaries 5 1.1 Uniform integrability.............................. 5 1.2

More information

Useful Probability Theorems

Useful Probability Theorems Useful Probability Theorems Shiu-Tang Li Finished: March 23, 2013 Last updated: November 2, 2013 1 Convergence in distribution Theorem 1.1. TFAE: (i) µ n µ, µ n, µ are probability measures. (ii) F n (x)

More information

X n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2)

X n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2) 14:17 11/16/2 TOPIC. Convergence in distribution and related notions. This section studies the notion of the so-called convergence in distribution of real random variables. This is the kind of convergence

More information

conditional cdf, conditional pdf, total probability theorem?

conditional cdf, conditional pdf, total probability theorem? 6 Multiple Random Variables 6.0 INTRODUCTION scalar vs. random variable cdf, pdf transformation of a random variable conditional cdf, conditional pdf, total probability theorem expectation of a random

More information

Harmonic functions on groups

Harmonic functions on groups 20 10 Harmonic functions on groups 0 DRAFT - updated June 19, 2018-10 Ariel Yadin -20-30 Disclaimer: These notes are preliminary, and may contain errors. Please send me any comments or corrections. -40

More information

Lecture 21 Representations of Martingales

Lecture 21 Representations of Martingales Lecture 21: Representations of Martingales 1 of 11 Course: Theory of Probability II Term: Spring 215 Instructor: Gordan Zitkovic Lecture 21 Representations of Martingales Right-continuous inverses Let

More information

Modern Discrete Probability Branching processes

Modern Discrete Probability Branching processes Modern Discrete Probability IV - Branching processes Review Sébastien Roch UW Madison Mathematics November 15, 2014 1 Basic definitions 2 3 4 Galton-Watson branching processes I Definition A Galton-Watson

More information

ABSTRACT EXPECTATION

ABSTRACT EXPECTATION ABSTRACT EXPECTATION Abstract. In undergraduate courses, expectation is sometimes defined twice, once for discrete random variables and again for continuous random variables. Here, we will give a definition

More information

1 Presessional Probability

1 Presessional Probability 1 Presessional Probability Probability theory is essential for the development of mathematical models in finance, because of the randomness nature of price fluctuations in the markets. This presessional

More information

6. Brownian Motion. Q(A) = P [ ω : x(, ω) A )

6. Brownian Motion. Q(A) = P [ ω : x(, ω) A ) 6. Brownian Motion. stochastic process can be thought of in one of many equivalent ways. We can begin with an underlying probability space (Ω, Σ, P) and a real valued stochastic process can be defined

More information

5 Birkhoff s Ergodic Theorem

5 Birkhoff s Ergodic Theorem 5 Birkhoff s Ergodic Theorem Birkhoff s Ergodic Theorem extends the validity of Kolmogorov s strong law to the class of stationary sequences of random variables. Stationary sequences occur naturally even

More information

Stochastic Models (Lecture #4)

Stochastic Models (Lecture #4) Stochastic Models (Lecture #4) Thomas Verdebout Université libre de Bruxelles (ULB) Today Today, our goal will be to discuss limits of sequences of rv, and to study famous limiting results. Convergence

More information

Fundamental Inequalities, Convergence and the Optional Stopping Theorem for Continuous-Time Martingales

Fundamental Inequalities, Convergence and the Optional Stopping Theorem for Continuous-Time Martingales Fundamental Inequalities, Convergence and the Optional Stopping Theorem for Continuous-Time Martingales Prakash Balachandran Department of Mathematics Duke University April 2, 2008 1 Review of Discrete-Time

More information

Exercises: sheet 1. k=1 Y k is called compound Poisson process (X t := 0 if N t = 0).

Exercises: sheet 1. k=1 Y k is called compound Poisson process (X t := 0 if N t = 0). Exercises: sheet 1 1. Prove: Let X be Poisson(s) and Y be Poisson(t) distributed. If X and Y are independent, then X + Y is Poisson(t + s) distributed (t, s > 0). This means that the property of a convolution

More information

9 Radon-Nikodym theorem and conditioning

9 Radon-Nikodym theorem and conditioning Tel Aviv University, 2015 Functions of real variables 93 9 Radon-Nikodym theorem and conditioning 9a Borel-Kolmogorov paradox............. 93 9b Radon-Nikodym theorem.............. 94 9c Conditioning.....................

More information

1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989),

1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989), Real Analysis 2, Math 651, Spring 2005 April 26, 2005 1 Real Analysis 2, Math 651, Spring 2005 Krzysztof Chris Ciesielski 1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer

More information

7 Convergence in R d and in Metric Spaces

7 Convergence in R d and in Metric Spaces STA 711: Probability & Measure Theory Robert L. Wolpert 7 Convergence in R d and in Metric Spaces A sequence of elements a n of R d converges to a limit a if and only if, for each ǫ > 0, the sequence a

More information

THEOREMS, ETC., FOR MATH 515

THEOREMS, ETC., FOR MATH 515 THEOREMS, ETC., FOR MATH 515 Proposition 1 (=comment on page 17). If A is an algebra, then any finite union or finite intersection of sets in A is also in A. Proposition 2 (=Proposition 1.1). For every

More information

CHAPTER 3: LARGE SAMPLE THEORY

CHAPTER 3: LARGE SAMPLE THEORY CHAPTER 3 LARGE SAMPLE THEORY 1 CHAPTER 3: LARGE SAMPLE THEORY CHAPTER 3 LARGE SAMPLE THEORY 2 Introduction CHAPTER 3 LARGE SAMPLE THEORY 3 Why large sample theory studying small sample property is usually

More information