Stochastic Processes and Stochastic Analysis Notes of the Lecture by Prof. M. Schweizer


Luca Gugelmann, Thomas Rast

Sommersemester 2007

Warning: We are sure there are lots of mistakes in these notes. Use at your own risk! Corrections and other feedback would be greatly appreciated. If you report an error, please always state which version (the first number on the Id line below) you found it in.

Revision 1375, March 26, 2010

Contents

Preliminaries

1 Brownian Motion: Definition and Construction
  1.1 Definition and First Remarks
  1.2 Donsker's theorem
  1.3 Some applications
2 Properties of Brownian Motion
  2.1 Transformations and First Properties
  2.2 Path properties of BM
  2.3 Martingale properties of BM
  2.4 The law of the iterated logarithm
3 Markov processes
  3.1 Basic concepts
  3.2 Markov property and strong Markov property
  3.3 Some applications
  3.4 Generators and martingale problems
4 Stochastic calculus
  4.1 Continuous semimartingales and quadratic variation
  4.2 Stochastic integrals
  4.3 Itô's formula
  4.4 The Girsanov transformation
  4.5 The Kunita-Watanabe decomposition
  4.6 Itô's representation theorem
  4.7 Stochastic differential equations: Basic notions
  4.8 Connections to PDEs
5 Lévy processes
  5.1 Basic concepts
  5.2 Some properties of Lévy processes

Introduction

Topics: Brownian motion; Markov processes; stochastic analysis; Lévy processes. Prerequisite: probability theory, to the extent of the course last semester; see script and lecture notes. Literature will follow, but the course is not based on any book in particular.

Preliminaries

Goal: collect/recall a few basic notions and facts.

Definition. Let $(\Omega, \mathcal F, P)$ be a probability space. A filtration on $(\Omega, \mathcal F)$ over $[0,\infty)$ is an increasing family $\mathbb F = (\mathcal F_t)_{t \ge 0}$ of $\sigma$-fields $\mathcal F_t \subseteq \mathcal F$, i.e. $\mathcal F_s \subseteq \mathcal F_t$ for $0 \le s \le t$. Intuition: $\mathcal F_t$ is the family of events observable up to time $t$; briefly, the information available at time $t$.

$\mathbb F$ is called right-continuous if $\mathcal F_t = \mathcal F_{t+} := \bigcap_{\varepsilon > 0} \mathcal F_{t+\varepsilon}$ for all $t \ge 0$. $\mathbb F$ is called ($P$-)complete if $\mathcal F$ is ($P$-)complete (i.e. any subset of a nullset is measurable) and $\mathcal F_0$ contains all $P$-nullsets of $\mathcal F$. $\mathbb F$ is said to satisfy the usual conditions if it is right-continuous and complete.

Definition. A (real-valued) stochastic process $X = (X_t)_{t \ge 0}$ is any collection of random variables $X_t\colon \Omega \to \mathbb R$ indexed by $t \ge 0$. More generally one can consider an $S$-valued stochastic process, where $(S, \mathcal S)$ is a measurable space, indexed by some set $I$ (instead of $[0,\infty)$); but to describe a temporal evolution, $t \ge 0$ is natural.

Three ways of thinking about a stochastic process:
1. A collection of random variables $X_t$ on $(\Omega, \mathcal F)$, indexed by $t \ge 0$.
2. A family of random functions $t \mapsto X_t(\omega)$ on $[0,\infty)$, one for each $\omega \in \Omega$; $X_\cdot(\omega)$ for fixed $\omega$ is called the path or trajectory corresponding to $\omega$.
3. A mapping $X\colon \Omega \times [0,\infty) \to \mathbb R$ on the product space $\bar\Omega := \Omega \times [0,\infty)$.

In (2), one can also view $X$ as a mapping from $\Omega$ to the function space $\mathbb R^{[0,\infty)}$, $\omega \mapsto X_\cdot(\omega)$, i.e., a function-valued map on $\Omega$.

We say that $X$ has property A (e.g. continuous, increasing, bounded, ...) if $X_\cdot(\omega)$ has property A for $P$-almost all $\omega$. Note: quantitative aspects of A may depend on $\omega$; for example, "$X$ is a bounded stochastic process" usually means: with probability 1, $X_\cdot(\omega)$ is bounded by a constant $c(\omega)$. If $c$ does not depend on $\omega$, one says "bounded uniformly in $\omega$".
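The three viewpoints can be made concrete on a finite time grid. The following minimal sketch (an illustration added here, not part of the original notes) stores a process as a 2D array and reads it in the three ways listed above.

```python
import numpy as np

# Illustration only (not from the notes): a process on a finite time grid,
# stored as a 2D array with rows indexing omega and columns indexing time.
rng = np.random.default_rng(0)
n_paths, n_times = 4, 6
t_grid = np.linspace(0.0, 1.0, n_times)
X = rng.normal(size=(n_paths, n_times))  # X[omega, k] approximates X_{t_k}(omega)

X_t2 = X[:, 2]    # view 1: the single random variable X_{t_2}, one value per omega
path0 = X[0, :]   # view 2: the trajectory t -> X_t(omega) for a fixed omega
value = X[3, 5]   # view 3: X as one function on the product space Omega x [0, oo)
```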

Now think of $X$ as the evolution of a system over time; how can this be linked to the evolution of information? Call $X$ ($\mathbb F$-)adapted if for every $t \ge 0$, $X_t\colon \Omega \to \mathbb R$ is $\mathcal F_t$-measurable (i.e., $X_t$ is observable at time $t$).

Now view $X$ as a mapping $\bar\Omega \to \mathbb R$; $X$ is called product-measurable if it is measurable w.r.t. the product $\sigma$-field $\bar{\mathcal F} := \mathcal F \otimes \mathcal B([0,\infty))$ on $\bar\Omega = \Omega \times [0,\infty)$. There are other possible $\sigma$-fields on $\bar\Omega$; one is the optional $\sigma$-field $\mathcal O$, which is generated by all processes which are adapted and have RCLL paths (Right-Continuous with Left Limits); $X$ is called optional if it is $\mathcal O$-measurable.

Remark: $g\colon [0,\infty) \to \mathbb R$ is RCLL if
(i) $\lim_{s \to t,\, s > t} g(s) = g(t)$ for all $t \ge 0$ (RC);
(ii) $\lim_{s \to t,\, s < t} g(s)$ exists for all $t > 0$ (LL).

Useful result:

Theorem 0.0.1 (Monotone class theorem). Let $M$ be a set of bounded real-valued functions on some set $S$. Suppose that $M$ is closed under multiplication, and set $\mathcal A := \sigma(M)$. Let $H$ be a real vector space of bounded real-valued functions on $S$ and assume $H$ has the following properties:
(i) $H$ contains $M$.
(ii) $H$ contains the constant function 1.
(iii) $H$ is closed under monotone bounded convergence, i.e., if $0 \le f_1 \le f_2 \le \dots$ are in $H$ and $f := \lim_n f_n$ is again bounded, then $f \in H$.
Then $H$ contains all bounded $\mathcal A$-measurable functions. The same conclusion is true if $H$ is not a vector space, but in addition closed under uniform convergence, and $M$ is in addition closed under addition (i.e., $M$ is an algebra) and contains the constant function 1.

Proof. See Dellacherie/Meyer, I.22, and Protter, Th. I.8.

Typical use: we want to prove property B for all measurable functions. Take $H := \{\text{all bounded } f \text{ satisfying B}\}$; check (i), (ii), (iii); check that the functions in $M$ do satisfy B; if $M$ is big enough, then this is enough.

Example (Exercise!): Prove that every optional process is adapted.

Now view $X$ as in (2) as a function-valued map $X\colon \Omega \to \mathbb R^{[0,\infty)}$. Define the coordinate maps
$$Y_t\colon \mathbb R^{[0,\infty)} \to \mathbb R, \qquad y \mapsto Y_t(y) := y(t),$$
and the $\sigma$-field $\mathcal B(\mathbb R)^{[0,\infty)} := \sigma(Y_t;\ t \ge 0)$. (This is much smaller than the product $\sigma$-field; see WT, Lemma II.3.6.) Then $X$ is a stochastic process iff $X$ is $\mathcal B(\mathbb R)^{[0,\infty)}$-measurable; see Kallenberg, Lemma 2.1. The law/distribution of the stochastic process $X$ under $P$ is then a probability measure on $(\mathbb R^{[0,\infty)}, \mathcal B(\mathbb R)^{[0,\infty)})$. So for many purposes, we can think of stochastic processes as measures on some function space.

How to describe the distribution $P \circ X^{-1}$ more compactly? It is enough to know all finite-dimensional marginal distributions (fdmds), i.e., the probability measures

$$\mu^{(I)} := P \circ (X_{t_1}, \dots, X_{t_n})^{-1} \quad\text{on } \mathbb R^n = \mathbb R^I, \text{ for all finite } I = \{t_1, \dots, t_n\} \subseteq [0,\infty).$$
The $\mu^{(I)}$, $I \subseteq [0,\infty)$ finite, determine $P \circ X^{-1}$ on $\mathcal B(\mathbb R)^{[0,\infty)}$ uniquely; see WT, Lemma II.2.2.

A stochastic process is Gaussian if all its fdmds are (multivariate) normal distributions. This says nothing about path properties. The distribution of a Gaussian process is uniquely determined by $t \mapsto E[X_t]$ and $(s,t) \mapsto \operatorname{Cov}(X_s, X_t)$.

Finally, two stochastic processes $X, X'$ are called versions/modifications of each other if $P[X_t = X'_t] = 1$ for all $t \ge 0$. They are called indistinguishable if $P[X_t = X'_t \text{ for all } t \ge 0] = 1$. (More precisely: if $\bigcup_t \{X_t \ne X'_t\}$ has outer $P$-measure 0.) The second notion is usually strictly stronger than the first one; the notions agree e.g. if $X, X'$ are both RC.

1 Brownian Motion: Definition and Construction

1.1 Definition and First Remarks

Let $(\Omega, \mathcal F, P)$ be a probability space.

Definition. A Brownian motion (BM) on $[0,1]$ is a stochastic process $W = (W_t)_{0 \le t \le 1}$ with:
(BM1) $P[W_0 = 0] = 1$: BM starts at 0 $P$-a.s.
(BM2') For any $n \in \mathbb N$ and any points $0 = t_0 < t_1 < \dots < t_n \le 1$, the increments $W_{t_k} - W_{t_{k-1}}$, $k = 1, \dots, n$, are independent and $\mathcal N(0, t_k - t_{k-1})$: independent, Gaussian increments.
(BM3) $P$-almost all trajectories $W_\cdot(\omega)$ are continuous on $[0,1]$.
The image measure/distribution on $(C[0,1], \mathcal B(C[0,1]))$ is then called Wiener measure.

Remarks: 0. The existence of such a process is non-trivial.
1. Wiener measure is determined by its fdmds, because $\mathcal B(C[0,1])$ is generated by the coordinate maps; see WT, V.4.1 and Lemma II.2.2.
2. BM with respect to a filtration $\mathbb F = (\mathcal F_t)_{0 \le t \le 1}$: an $\mathbb F$-adapted stochastic process $W = (W_t)_{0 \le t \le 1}$ with (BM1), (BM3) and
(BM2) for $s < t$, $W_t - W_s$ is independent of $\mathcal F_s$ and $\mathcal N(0, t-s)$.
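Before the characterization below, here is a quick numerical sanity check (an illustration added here, not from the notes) of the covariance structure $\operatorname{Cov}(W_s, W_t) = \min(s,t)$, with BM sampled from independent $\mathcal N(0, \Delta t)$ increments.

```python
import numpy as np

# Illustration only (not from the notes): build BM on [0, 1] from independent
# N(0, dt) increments and estimate Cov(W_s, W_t) for s = 0.25, t = 0.5.
rng = np.random.default_rng(1)
n_paths, n_steps = 20000, 1000
dt = 1.0 / n_steps
W = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps)), axis=1)

s_idx, t_idx = n_steps // 4 - 1, n_steps // 2 - 1   # grid indices of s and t
cov = np.mean(W[:, s_idx] * W[:, t_idx])            # E[W_s W_t], since E[W_t] = 0
print(cov)   # close to min(0.25, 0.5) = 0.25
```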

[Figure 1: Rescaled linear interpolation approach to BM]

Proposition 1.1.1. Brownian motion can be characterized as the unique Gaussian process $X = (X_t)_{0 \le t \le 1}$ with $P$-a.s. continuous trajectories, $E[X_t] = 0$ and $\operatorname{Cov}(X_s, X_t) = \min(s,t)$.

Proof. (BM1) plus (BM2') are equivalent to saying that the fdmds of $W$ are multivariate normal, because $(W_{t_0}, W_{t_1}, \dots, W_{t_n})$ is an affine transformation of $(W_{t_1}, W_{t_2} - W_{t_1}, \dots, W_{t_n} - W_{t_{n-1}})$. So a BM $W$ is a Gaussian process with $E[W_t] = 0$, and for $s < t$,
$$\operatorname{Cov}(W_s, W_t) = \underbrace{\operatorname{Cov}(W_s, W_t - W_s)}_{= 0 \text{ by (BM2')}} + \operatorname{Var}[W_s] = s = \min(s,t).$$
Conversely, if $X$ is as above, then $E[X_0] = 0$ and $\operatorname{Var}[X_0] = 0$, so $X_0 = 0$ $P$-a.s.; and any vector of increments is then multivariate normal, as an affine transformation, because $X$ is Gaussian. Moreover $E[X_t - X_s] = 0$ and
$$\operatorname{Var}[X_t - X_s] = \operatorname{Cov}(X_t - X_s, X_t - X_s) = t - s - s + s = t - s;$$
and finally, for $s \le t \le u \le v$,
$$\operatorname{Cov}(X_t - X_s, X_v - X_u) = t - t - s + s = 0,$$
so increments are uncorrelated, hence independent, because they are multivariate normal. So $X$ is a BM.

Ways of proving the existence of BM:

1. Limit of a rescaled random walk: $(Y_k)_{k \in \mathbb N}$ i.i.d., $E[Y_k] = 0$, $\operatorname{Var}[Y_k] = 1$; $S_n = \sum_{k=1}^n Y_k$; $X^n = (X^n_t)_{0 \le t \le 1}$ the piecewise linear interpolation of $X^n_{l/n} = S_l/\sqrt n$ (Figure 1). Call $\mu_n$ the distribution of $X^n$ on $(C[0,1], \mathcal B(C[0,1]))$. Then $\mu_n \to \mu$ weakly as $n \to \infty$, and $\mu$ is the Wiener measure. This is Donsker's theorem.

2. Construction via fdmds: (BM1) and (BM2') define a system of fdmds; Kolmogorov's consistency theorem (WT, Theorem II.3.5) thus gives the existence of a probability measure $\mu$ on $(\mathbb R^{[0,1]}, \mathcal B(\mathbb R)^{[0,1]})$ with these fdmds. Then use Kolmogorov's continuity criterion to show that the coordinate process $Y$ on $\mathbb R^{[0,1]}$ is uniformly continuous on the rationals, $\mu$-a.s. Then define BM by
$$W(\omega) := \begin{cases} \text{continuous continuation of } Y(\omega) & \mu\text{-a.e.} \\ 0 & \text{on a } \mu\text{-nullset.} \end{cases}$$
Then $W$ is a BM under $P := \mu$ on $\Omega := \mathbb R^{[0,1]}$. For details see Karatzas/Shreve.

3. Random superposition of deterministic functions: take $(Z_n)_{n \in \mathbb N}$ i.i.d. $\mathcal N(0,1)$ and a sequence $(\varphi_n)_{n \in \mathbb N}$ of deterministic functions in $C[0,1]$, and set
$$W_t(\omega) := \sum_{n=1}^\infty Z_n(\omega)\,\varphi_n(t), \qquad 0 \le t \le 1.$$
Under suitable assumptions on $(\varphi_n)$, this $W$ is a BM (Lévy's construction). See Steele.

Remarks: 1. BM on $[0,\infty)$: take independent copies $W^k$ of BM on $[0,1]$ and glue them together:
$$W_t := \sum_{k=1}^n W^k_1 + W^{n+1}_{t-n} \qquad \text{if } n < t \le n+1.$$
2. BM in $\mathbb R^d$: simply $(W^1, \dots, W^d)$, where the $W^i$ are independent BMs in $\mathbb R$.

1.2 Donsker's theorem

Let $(\Omega, \mathcal F, P)$ be a probability space, $(Y_k)_{k \in \mathbb N}$ i.i.d. with $E[Y_k] = 0$ and $\operatorname{Var}[Y_k] = 1$, $S_n := \sum_{k=1}^n Y_k$, and let $X^n$ be the piecewise linear interpolation (as in Figure 1 above):
$$X^n_t = \frac{1}{\sqrt n}\,S_{[nt]} + \frac{1}{\sqrt n}\,Y_{[nt]+1}\,(nt - [nt]).$$
Let $\mu_n := P \circ (X^n)^{-1} \in M_1(C[0,1])$. Then:

Theorem 1.2.1 (Donsker's theorem). Under the above assumptions, $\mu_n \to \mu$ weakly as $n \to \infty$, where $\mu$ is the Wiener measure.

Remarks: 1. This gives existence plus a construction of Wiener measure (or BM).
2. Invariance principle: always the same limit, irrespective of the exact distribution of the $Y_k$.
3. As in the CLT, one can generalize from i.i.d. to a triangular array.
4. Functional version of the CLT: for $t = 1$, $X^n_1 = \frac{1}{\sqrt n} S_n$ yields the CLT; here, we look at the whole process (all $t$ simultaneously).
5. If $F\colon C[0,1] \to S$ is continuous, then $X^n \to X$ in distribution implies that $F(X^n) \to F(X)$ in distribution. One application in finance: convergence (in distribution) of multiplicative binomial models (Cox-Ross-Rubinstein) to geometric BM (Black-Scholes model).
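The rescaled interpolation $X^n$ is easy to simulate; the following sketch (an illustration added here, not from the notes) implements the formula above for the simple symmetric walk $Y_k = \pm 1$.

```python
import numpy as np

# Illustration only (not from the notes): Donsker's rescaled linear
# interpolation X^n of a simple symmetric random walk on a time grid.
rng = np.random.default_rng(2)

def donsker_path(n, t_grid, rng):
    """X^n_t = S_[nt]/sqrt(n) + Y_{[nt]+1} * (nt - [nt]) / sqrt(n)."""
    Y = rng.choice([-1.0, 1.0], size=n + 1)
    S = np.concatenate([[0.0], np.cumsum(Y)])  # S_0, ..., S_{n+1}
    k = np.floor(n * t_grid).astype(int)       # k = [nt]; Y[k] is Y_{k+1}
    return (S[k] + Y[k] * (n * t_grid - k)) / np.sqrt(n)

t = np.linspace(0.0, 1.0, 101)
X = np.array([donsker_path(5000, t, rng) for _ in range(10000)])
print(X[:, -1].var())   # Var[X^n_1] ~ 1, matching W_1 ~ N(0, 1)
```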

To prove Theorem 1.2.1, go back to WT, Theorem V.4.3: we need to prove
a) weak convergence of all fdmds (via the CLT);
b) tightness of $(\mu_n)_{n \in \mathbb N}$ (the real work).

Preparations, first for a):

Lemma 1.2.2. Let $U_n, U$ be random variables on $(\Omega, \mathcal F, P)$ with values in a normed vector space $(S, \|\cdot\|)$.
1. Let $(V_n)_{n \in \mathbb N}$ be a sequence of $S$-valued random variables with $V_n \to 0$ in probability. If $S$ is separable and $U_n \to U$ in distribution, then also $U_n + V_n \to U$ in distribution.
2. Let $(c_n)_{n \in \mathbb N}$ be a sequence in $\mathbb R$ with $c_n \to c$. If $U_n \to U$ in distribution, then also $c_n U_n \to cU$ in distribution.

Proof. 1. $S$ separable $\Rightarrow \mathcal B(S \times S) = \mathcal B(S) \otimes \mathcal B(S) \Rightarrow$ each $U_n + V_n$ is still a random variable. Take $h \in C_b(S)$; then
$$\big|E[h(U_n + V_n)] - E[h(U)]\big| \le E\big[|h(U_n + V_n) - h(U_n)|\big] + \underbrace{\big|E[h(U_n)] - E[h(U)]\big|}_{\to 0}$$
and
$$E\big[|h(U_n + V_n) - h(U_n)|\big] \le 2\|h\|_\infty\,P[\|V_n\| \ge \delta] + \sup_{x,z:\ \|x-z\| \le \delta} |h(x) - h(z)|.$$
Now $2\|h\|_\infty P[\|V_n\| \ge \delta] \to 0$ as $n \to \infty$ for each fixed $\delta$ (convergence in probability), and $\sup_{\|x-z\| \le \delta} |h(x) - h(z)|$ becomes arbitrarily small for $\delta$ small, if $h$ is uniformly continuous. But we may take $h$ uniformly continuous w.l.o.g., cf. the Portmanteau theorem, WT Theorem V.2.2.
2. $$\big|E[h(c_n U_n)] - E[h(cU)]\big| \le E\big[|h(c_n U_n) - h(cU_n)|\big] + \underbrace{\big|E[h(cU_n)] - E[h(cU)]\big|}_{\to 0}$$
and
$$E\big[|h(c_n U_n) - h(cU_n)|\big] \le \sup_{x,z:\ \|x-z\| \le M|c - c_n|} |h(x) - h(z)| + 2\|h\|_\infty\,P[\|U_n\| \ge M].$$
Again the sup becomes arbitrarily small, even for $M$ large, if $h$ is uniformly continuous, since $c_n \to c$: ok. For the second term,
$$\limsup_n P[\|U_n\| \ge M] \le P[\|U\| \ge M]$$
because $U_n \to U$ in distribution, and $P[\|U\| \ge M]$ becomes small for $M$ large.

Proposition 1.2.3. Under the assumptions of Theorem 1.2.1, all fdmds of $(\mu_n)$ converge weakly to the corresponding fdmds of the Wiener measure $\mu$.

Proof. Note that even if we do not know yet whether $\mu$ exists, we do know how its fdmds must look: they are specified by (BM1) and (BM2') as Gaussian. So fix $J \subseteq [0,1]$ finite, $J = \{t_1, \dots, t_m\}$, and consider $\mu_n^{(J)} = P \circ (X^n_{t_1}, \dots, X^n_{t_m})^{-1}$.

First suppose $J = \{t\}$ ($m = 1$). Then $\mu_n^{(J)}$ is the distribution of $X^n_t$, and the corresponding fdmd of $\mu$ is $\mathcal N(0, t)$. So consider
$$X^n_t = \frac{1}{\sqrt n}\,S_{[nt]} + \frac{1}{\sqrt n}\,Y_{[nt]+1}(nt - [nt]).$$
On the one hand,
$$\frac{1}{\sqrt n}\,S_{[nt]} = \underbrace{\sqrt{\frac{[nt]}{n}}}_{\to \sqrt t}\ \underbrace{\frac{S_{[nt]}}{\sqrt{[nt]}}}_{\to \mathcal N(0,1)} \longrightarrow \mathcal N(0, t)$$
by the CLT and Lemma 1.2.2; on the other hand, the second term converges to 0 in probability:
$$\Big|\frac{1}{\sqrt n}\,Y_{[nt]+1}(nt - [nt])\Big| \le \frac{1}{\sqrt n}\,|Y_{[nt]+1}|,$$
and by Chebyshev, $P[|Y_k|/\sqrt n \ge \delta] \le \operatorname{Var}[Y_k]/(n\delta^2) \to 0$ as $n \to \infty$, for every $\delta > 0$. So for $m = 1$ we are done, by Lemma 1.2.2. For $m > 1$ the analogous result follows using the CLT in $\mathbb R^m$.

Lemma 1.2.4 (Ottaviani). Suppose we have independent random variables $U_1, \dots, U_n$ with $E[U_i] = 0$ and $\sum_{i=1}^n \operatorname{Var}[U_i] = 1$. For $Z_k = \sum_{i=1}^k U_i$ we have
$$P\Big[\max_{k=1,\dots,n} |Z_k| > 2\alpha\Big] \le \frac{1}{1 - 1/\alpha^2}\,P[|Z_n| > \alpha]$$
for every $\alpha > 1$.

Proposition 1.2.5. Under the assumptions of Theorem 1.2.1, the sequence $(\mu_n)_{n \in \mathbb N}$ is tight.

Proof of Proposition 1.2.5. We want to use WT, Prop. V.4.3. By construction, $X^n_0 = 0$ for all $n \in \mathbb N$, so $\mu_n(\{x : x(0) = 0\}) = 1$ for all $n$; hence $(\mu_n \circ \pi_0^{-1})$ is tight in $M_1(\mathbb R)$. It remains to show the second condition in Prop. V.4.3, namely, with the modulus of continuity
$$w_\delta(x) = \sup\{|x(t) - x(s)| : s, t \in [0,1],\ |s - t| \le \delta\}$$
and the definition of $\mu_n$, that for every $\varepsilon > 0$,
$$\lim_{\delta \downarrow 0}\,\limsup_{n \to \infty}\,P\Big[\sup_{s,t:\ |s-t| \le \delta} |X^n_t - X^n_s| > \varepsilon\Big] = 0.$$
For fixed $s$ with $k\delta \le s < (k+1)\delta$, we have
$$X^n_t - X^n_s = X^n_t - X^n_{(k+1)\delta} + X^n_{(k+1)\delta} - X^n_{k\delta} + X^n_{k\delta} - X^n_s,$$
and $t \le (k+2)\delta$ for $|t - s| \le \delta$. So we get
$$\sup_{s,t:\ |s-t| \le \delta} |X^n_t - X^n_s| \le 3 \max_{k \le 1/\delta}\ \sup_{t \in [k\delta, (k+1)\delta)} |X^n_t - X^n_{k\delta}|.$$
This gives
$$P\Big[\sup_{s,t:\ |s-t| \le \delta} |X^n_t - X^n_s| > \varepsilon\Big] \le P\Big[\max_{k \le 1/\delta}\ \sup_{t \in [k\delta,(k+1)\delta)} |X^n_t - X^n_{k\delta}| > \varepsilon/3\Big] \le \sum_{k \le 1/\delta} P\Big[\sup_{t \in [k\delta,(k+1)\delta)} |X^n_t - X^n_{k\delta}| > \varepsilon/3\Big].$$

Fix $\delta$ and find, for given $n$ and $k$, integers $j, m$ such that
$$\frac{j}{n} \le k\delta < \frac{j+1}{n} \qquad\text{and}\qquad (k+1)\delta \le \frac{j+m}{n},$$
i.e., the points $k\delta$ and $(k+1)\delta$, and any $t$ between them, each lie in turn between two points of the grid of width $1/n$. For $n$ sufficiently large we can ensure $m/n \le 2\delta$, hence $m \le [2\delta n]$. Since $X^n$ is piecewise linear with nodes at the grid points,
$$\sup_{t \in [k\delta,(k+1)\delta)} |X^n_t - X^n_{k\delta}| \le 2 \max\Big\{\max_{l=1,\dots,m} \frac{1}{\sqrt n}|S_{j+l} - S_j|,\ \max_{l=1,\dots,m} \frac{1}{\sqrt n}|S_{j+1+l} - S_{j+1}|\Big\}.$$
The $Y_k$ are i.i.d. with $\operatorname{Var}[Y_k] = 1$, so $\operatorname{Var}[S_l] \le [2\delta n]$ for $l \le [2\delta n]$. Therefore, using Lemma 1.2.4 with $\alpha$ of the order $\varepsilon/(6\sqrt{2\delta})$,
$$P\Big[\sup_{t \in [k\delta,(k+1)\delta)} |X^n_t - X^n_{k\delta}| > \varepsilon/3\Big] \le \frac{1}{1 - \mathrm{const}\cdot\delta/\varepsilon^2}\ 2P\bigg[\frac{|S_{[2\delta n]}|}{\sqrt{[2\delta n]}} \ge \frac{\varepsilon}{6\sqrt{2\delta}}\bigg] = \mathrm{const}(\varepsilon)\ 2P\bigg[\frac{|S_{[2\delta n]}|}{\sqrt{[2\delta n]}} \ge \frac{\varepsilon}{6\sqrt{2\delta}}\bigg]$$
(w.l.o.g. $\delta \le \delta_0(\varepsilon)$, so that the first factor is bounded by a constant independent of $\delta$). By the CLT, because $S_{[2\delta n]}/\sqrt{[2\delta n]} \to \mathcal N(0,1)$ as $n \to \infty$, this converges to
$$\mathrm{const}(\varepsilon)\ 2\Big(1 - \Phi\Big(\frac{\varepsilon}{6\sqrt{2\delta}}\Big)\Big).$$
Because we sum over all $k \le 1/\delta$, we obtain, with $k(\delta) := \varepsilon/(6\sqrt{2\delta})$ (so that $1/\delta = \mathrm{const}(\varepsilon)\,k^2(\delta)$),
$$\limsup_{n \to \infty} P\Big[\sup_{s,t:\ |s-t| \le \delta} |X^n_t - X^n_s| > \varepsilon\Big] \le \frac{1}{\delta}\,\mathrm{const}(\varepsilon) \int_{k(\delta)}^\infty f(y)\,dy \le \mathrm{const}(\varepsilon) \int_{k(\delta)}^\infty y^2 f(y)\,dy,$$
where $f$ is the standard normal density and we used $k^2(\delta) \le y^2$ on the domain of integration. For $\delta \downarrow 0$, we have $k(\delta) \to \infty$, and because $\int y^2 f(y)\,dy = 1 < \infty$, the above integral goes to 0; this concludes the proof.

By combining Proposition 1.2.3 and Proposition 1.2.5 with WT, Thm. V.4.3, we obtain $\mu_n \to \mu$, and so Donsker's theorem (Theorem 1.2.1) is proved.

1.3 Some applications

Consider the situation of Donsker's theorem: $X^n$, $\mu_n$, $\mu$ the Wiener measure, $X$ a BM (so $\mu_n = P \circ (X^n)^{-1} \to \mu = P \circ X^{-1}$). Consider $F\colon C[0,1] \to S$ (for a metric space $(S,d)$); then $F(X^n) \to F(X)$ in distribution if $F$ is continuous.

Lemma 1.3.1. $D_F := \{x \in C[0,1] \mid F \text{ is not continuous at } x\}$ is in $\mathcal B(C[0,1])$.

Proof. (Note: the proof works for any metric space, not just $C[0,1]$.) $F$ is discontinuous at $x$ iff there is $\varepsilon > 0$ such that there are $y, z$ arbitrarily close to $x$ with $d(F(y), F(z)) \ge \varepsilon$. So
$$D_F = \bigcup_{\varepsilon \in \mathbb Q,\ \varepsilon > 0}\ \bigcap_{\delta \in \mathbb Q,\ \delta > 0}\ \underbrace{\bigcup_{y,z:\ d(F(y),F(z)) \ge \varepsilon} \big(U_\delta(y) \cap U_\delta(z)\big)}_{\text{open}} \in \mathcal B(C[0,1]).$$

Proposition 1.3.2. Let $F\colon C[0,1] \to S$ be measurable with $\mu(D_F) = 0$, i.e., $F$ is $\mu$-a.e. continuous on $C[0,1]$. Under the assumptions of Theorem 1.2.1, we then have $F(X^n) \to F(X)$ in distribution or, equivalently, $\mu_n \circ F^{-1} \to \mu \circ F^{-1}$ weakly.

Proof. For $F$ continuous this is obvious. In general, for $A \subseteq S$ closed, $\overline{F^{-1}(A)} \subseteq F^{-1}(A) \cup D_F$, so using that $\overline{F^{-1}(A)}$ is closed in $C[0,1]$ and the Portmanteau theorem (WT V.2.2),
$$\limsup_n \mu_n(F^{-1}(A)) \le \limsup_n \mu_n\big(\overline{F^{-1}(A)}\big) \le \mu\big(\overline{F^{-1}(A)}\big) \le \mu(F^{-1}(A)) + \mu(D_F) = \mu(F^{-1}(A)),$$
i.e. $\mu_n \circ F^{-1} \to \mu \circ F^{-1}$.

How can we use Proposition 1.3.2? In two ways:
1. Suppose we know the distribution of $F(X)$. Then we get a universal limit result: for any i.i.d. sequence $(Y_i)$ as in Theorem 1.2.1, $F(X^n)$ converges to that same limit.
2. To find the distribution of $F(X)$: choose a good i.i.d. sequence $(Y_i)$, compute the distribution of $F(X^n)$ for that choice, then compute the limit; this gives the distribution of $F(X)$. Often a good choice is $Y_i = \pm 1$ with probability $\frac12$ each.

Example 1.3.3: $F(x) = x(1)$ (final value). Then $F(X) = X_1 \sim \mathcal N(0,1)$ since $X$ is BM. Moreover,
$$F(X^n) = X^n_1 = \frac{1}{\sqrt n}\,S_n = \frac{1}{\sqrt n} \sum_{i=1}^n Y_i.$$
So Proposition 1.3.2 ($F$ is continuous on $C[0,1]$) gives
$$\frac{1}{\sqrt n} \sum_{i=1}^n Y_i \to \mathcal N(0,1) \quad\text{in distribution}$$
whenever the $(Y_i)$ are i.i.d. with $E[Y_i] = 0$ and $\operatorname{Var}[Y_i] = 1$. This is the classical CLT.

Example 1.3.4: $F(x) := \sup_{0 \le t \le 1} x(t) = \max_{0 \le t \le 1} x(t)$ (maximum of $x$). This $F$ is continuous from $C[0,1]$ to $\mathbb R$; indeed even $|F(x) - F(y)| \le \|x - y\|_\infty$. Here, $F(X) = \max_{0 \le t \le 1} X_t$ is the maximum of BM. Moreover,
$$F(X^n) = \max_{0 \le t \le 1} X^n_t = \frac{1}{\sqrt n} \max_{l=0,\dots,n} S_l,$$
since the max is attained at one of the $n+1$ interpolation points. Set
$$M_n := \max_{l=0,\dots,n} S_l = \max_{l=0,\dots,n} \sum_{i=1}^l Y_i.$$
Choose $Y_i$ binary: $P[Y_i = +1] = P[Y_i = -1] = \frac12$.

Lemma 1.3.5. For the above binary i.i.d. $Y_i$,
$$P[M_n \ge b] = 2P[S_n > b] + P[S_n = b], \qquad b \in \mathbb N_0.$$

Proof. $b = 0$: LHS $= 1$ and RHS $= 1$ because the distribution of $S_n$ is symmetric around 0. $b > 0$: use $\{S_n \ge b\} \subseteq \{M_n \ge b\}$ to write
$$P[M_n \ge b] - P[S_n = b] = P[M_n \ge b,\ S_n > b] + P[M_n \ge b,\ S_n < b] = P[S_n > b] + P[M_n \ge b,\ S_n < b].$$
By the reflection principle (reflect the path after the first time it hits $b$), $P[M_n \ge b,\ S_n < b] = P[M_n \ge b,\ S_n > b]$, so
$$P[M_n \ge b] - P[S_n = b] = P[S_n > b] + P[S_n > b].$$
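A quick Monte Carlo check of this identity (an illustration added here, not from the notes):

```python
import numpy as np

# Illustration only (not from the notes): check P[M_n >= b] = 2 P[S_n > b]
# + P[S_n = b] for the simple symmetric random walk.
rng = np.random.default_rng(3)
n, b, trials = 100, 5, 200000

S = np.cumsum(rng.choice([-1, 1], size=(trials, n)), axis=1)
M = np.maximum(S.max(axis=1), 0)   # M_n includes l = 0, where S_0 = 0
S_n = S[:, -1]

print(np.mean(M >= b))                              # LHS estimate
print(2 * np.mean(S_n > b) + np.mean(S_n == b))     # RHS estimate
```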

Example 1.3.6 (continuation of Example 1.3.4): $F(X^n) = \frac{1}{\sqrt n} M_n$. So for $x \ge 0$ and $b_n := [\sqrt n\,x]$, Lemma 1.3.5 for binary $Y_i$ gives
$$P[F(X^n) \ge x] = P[M_n \ge b_n] = 2P\Big[\frac{S_n}{\sqrt n} > \frac{b_n}{\sqrt n}\Big] + P[S_n = b_n].$$
Now take the limit as $n \to \infty$ and use the CLT: $S_n/\sqrt n \to \mathcal N(0,1)$; moreover, $P[S_n = b_n] \to 0$ as $n \to \infty$, and $b_n/\sqrt n \to x$. Therefore
$$P\Big[\max_{0 \le t \le 1} X_t \ge x\Big] = P[F(X) \ge x] = \lim_n P[F(X^n) \ge x] = 2\lim_n P\Big[\frac{S_n}{\sqrt n} > \frac{b_n}{\sqrt n}\Big] = 2P[X_1 > x] = P[|X_1| \ge x],$$
since $X_1 \sim \mathcal N(0,1)$. So:

Corollary 1.3.7. For a BM $X$, $\max_{0 \le t \le 1} X_t$ and $|X_1|$ have the same distribution, namely
$$P\Big[\max_{0 \le t \le 1} X_t \le z\Big] = P[|X_1| \le z] = 2\Phi(z) - 1, \qquad z \in [0,\infty).$$

Corollary 1.3.8. For every i.i.d. sequence $(Y_i)$ with $E[Y_i] = 0$ and $\operatorname{Var}[Y_i] = 1$,
$$\lim_n P\bigg[\frac{1}{\sqrt n} \max_{l=0,\dots,n} \sum_{i=1}^l Y_i \le z\bigg] = 2\Phi(z) - 1.$$

Proof. Immediate from Corollary 1.3.7 and Proposition 1.3.2.

Remark: We could also prove Corollary 1.3.7 directly by first proving a reflection principle for BM. This needs the strong Markov property.
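The law $2\Phi(z) - 1$ is easy to confirm numerically (an illustration added here, not from the notes); note that sampling BM on a grid slightly underestimates the true maximum.

```python
import numpy as np
from math import erf, sqrt

# Illustration only (not from the notes): Monte Carlo check of
# P[max_{t<=1} X_t <= z] = 2*Phi(z) - 1 for BM sampled on a fine grid.
rng = np.random.default_rng(4)
n_paths, n_steps, z = 20000, 1000, 1.0

dW = rng.normal(0.0, np.sqrt(1.0 / n_steps), size=(n_paths, n_steps))
running_max = np.maximum(np.cumsum(dW, axis=1).max(axis=1), 0.0)  # include t = 0

print(np.mean(running_max <= z))   # grid maximum, slightly biased
print(erf(z / sqrt(2.0)))          # 2*Phi(z) - 1 ~ 0.6827 for z = 1
```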

[Figure 2: An example $x$ at which $F$ is discontinuous: $x$ keeps the same sign after its last zero $t_0$, while a nearby $y$ has a zero close to 1]
[Figure 3: $F$ must be continuous at a sign change]

Example: $F(x) := \sup\{t \in [0,1] : x(t) = 0\} \wedge 1$, i.e. $F(x)$ is the time of the last zero of $x(\cdot)$ in $[0,1]$. This $F$ is not continuous on $C[0,1]$: in Figure 2, $\|y - x\| < \delta$, but $F(y) = 1$ is not close to $t_0 < 1$, i.e. to $F(x)$.

Lemma. The above $F$ is $\mu$-a.e. continuous on $C[0,1]$.

In other words, $F$ is continuous on typical trajectories of BM; with probability 1, BM cannot behave like the $x$ in Figure 2.

Proof. Claim: if $F$ is discontinuous at $x$, then $x$ looks as in Figure 2; more precisely:
$F$ discontinuous at $x$ $\Rightarrow$ there is $\varepsilon > 0$ such that $x$ has the same sign on $(F(x), 1]$ and on $(F(x) - \varepsilon, F(x))$.

Proof of claim: assume $x \ge 0$ after $F(x)$, but that for every $\varepsilon > 0$ there is a $u_\varepsilon$ in $(F(x) - \varepsilon, F(x))$ with $x(u_\varepsilon) < 0$ (so the stated conclusion fails). Then $F$ is continuous at $x$: choose $\varepsilon > 0$. Choose $\delta > 0$ with $x \ge \delta$ on $[F(x) + \varepsilon, 1]$. If then $\|y - x\| < \delta/2$, necessarily $F(y) \le F(x) + \varepsilon$, since $y \ge \delta/2$ after $F(x) + \varepsilon$ (Figure 3). Take $u_\varepsilon$, so $x(u_\varepsilon) < 0$ and $u_\varepsilon$ lies in $(F(x) - \varepsilon, F(x))$. Assume in addition that $\|y - x\| < \frac12 |x(u_\varepsilon)|$; then $F(y) \ge F(x) - \varepsilon$, since $x$ passes 0 after $u_\varepsilon$ and $y$ is close to $x$, so $y$ must have a zero there too. So $|F(y) - F(x)| \le \varepsilon$.

So define
$$A^\pm := \Big\{x \in C[0,1] \ \Big|\ x \gtrless 0 \text{ on both } (F(x), 1] \text{ and } (F(x) - \varepsilon, F(x)), \text{ for some } \varepsilon > 0\Big\}.$$
Then $D_F \subseteq A^+ \cup A^-$, so it is enough to prove $\mu(A^\pm) = 0$. Now
$$A^- \subseteq \bigcup_{r \in \mathbb Q \cap [0,1]} \Big\{\max_{r \le s \le 1} \big(x(s) - x(r)\big) = -x(r)\Big\}.$$
For fixed $r \in [0,1]$, $(X_s - X_r)_{r \le s \le 1}$ is a BM on $[r,1]$ and independent of $X_r$. So
$$\mu\Big(\Big\{\max_{r \le s \le 1}\big(x(s) - x(r)\big) = -x(r)\Big\}\Big) = P\Big[\max_{r \le s \le 1}(X_s - X_r) + X_r = 0\Big],$$
and since $\max_{r \le s \le 1}(X_s - X_r) =: U \sim |\mathcal N(0, 1-r)|$ by Corollary 1.3.7, and $X_r =: V \sim \mathcal N(0, r)$, with $U, V$ independent, we get
$$\mu\Big(\Big\{\max_{r \le s \le 1}\big(x(s) - x(r)\big) = -x(r)\Big\}\Big) = P[U + V = 0] = 0.$$

Example: $F(x) := \sup\{t \in [0,1] : x(t) = 0\}$ is continuous on $C[0,1]$ (Wiener-measure-)a.e., so for the sequence $(X^n)$ from Donsker's theorem we get $F(X^n) \to F(X)$ in distribution. To find the distribution of $F(X)$, again use for $X^n$ the simple binary random walk; then $F(X^n)$ is the last zero of the simple symmetric random walk, and it is known that this has the discrete arcsine distribution. Passing to the limit (and doing some work) we can get:

Corollary. 1. If $X$ is BM and $L = F(X) = \sup\{t \in [0,1] : X_t = 0\}$ is the last zero of $X$ before 1, then $L$ has the arcsine distribution
$$P[L \le z] = \frac{2}{\pi} \arcsin(\sqrt z) \qquad\text{for } z \in [0,1].$$
2. For every sequence of i.i.d. random variables $(Y_i)$ with $E[Y_i] = 0$ and $\operatorname{Var}[Y_i] = 1$, we have for all $z \in [0,1]$:
$$\lim_n P\big[\max\{k \in \{0,\dots,n\} : S_{k-1} S_k \le 0\} \le nz\big] = \frac{2}{\pi} \arcsin(\sqrt z).$$
(Details: Billingsley, Sec. 9.)
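A simulation of the last zero of the simple random walk (an illustration added here, not from the notes) reproduces the arcsine law:

```python
import numpy as np

# Illustration only (not from the notes): last zero of a simple symmetric
# random walk, rescaled by n, versus the arcsine law (2/pi) arcsin(sqrt(z)).
rng = np.random.default_rng(5)
n, trials, z = 2000, 20000, 0.5

Y = rng.choice([-1, 1], size=(trials, n))
S = np.concatenate([np.zeros((trials, 1)), np.cumsum(Y, axis=1)], axis=1)
last_zero = np.array([np.nonzero(p == 0)[0].max() for p in S])  # k = 0 always qualifies

print(np.mean(last_zero <= n * z))          # ~ P[L <= z]
print(2 / np.pi * np.arcsin(np.sqrt(z)))    # = 0.5 for z = 0.5
```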

2 Properties of Brownian Motion

2.1 Transformations and First Properties

Throughout this section, $W = (W_t)_{t \ge 0}$ is a BM on $(\Omega, \mathcal F, P)$. How do we get new BMs from $W$?

Proposition 2.1.1. Each of the following processes is again a BM:
1. $X^1 := -W$ (reflected at the $x$-axis);
2. $X^2_t := W_{t+s} - W_s$, $t \ge 0$, for any fixed $s \ge 0$ (restarted at time $s$);
3. $X^3_t := cW_{t/c^2}$, $t \ge 0$, where $c > 0$ is fixed (rescaled in time and space);
4. inversion of small and large times:
$$X^4_t := \begin{cases} 0 & \text{for } t = 0 \\ tW_{1/t} & \text{for } t > 0; \end{cases}$$
5. $X^5_t := W_t I_{\{t < t_0\}} + \big(W_{t_0} + Z(W_t - W_{t_0})\big) I_{\{t \ge t_0\}}$, where $t_0 > 0$ is fixed and $Z$ is independent of $W$ with $P[Z = +1] = P[Z = -1] = \frac12$.

Proof. For $X^5$ see the exercises. $X^1, X^2, X^3$ are all $P$-a.s. continuous. Next, $X^1, \dots, X^4$ are all Gaussian processes with $E[X^i_t] = 0$. One easily calculates $\operatorname{Cov}(X^i_s, X^i_t) = s \wedge t$; so $X^1, X^2, X^3$ are BM by Proposition 1.1.1. It only remains to prove that $X^4$ is continuous at 0, i.e. $P[\lim_{t \downarrow 0} X^4_t = 0] = 1$. Denote by $Q$ the distribution of $X^4$ on $C(0,1]$. Then $Q = \mu|_{C(0,1]}$, because both $X^4$ and BM are continuous on $(0,1]$ and, being Gaussian, have the same fdmds. So
$$P[\lim_{t \downarrow 0} X^4_t = 0] = Q\big[\{x \in C(0,1] : \lim_{t \downarrow 0} x(t) = 0\}\big] = \mu\big(\{x \in C(0,1] : \lim_{t \downarrow 0} x(t) = 0\}\big) = P[\lim_{t \downarrow 0} W_t = 0] = 1.$$

Corollary 2.1.2 (Strong law of large numbers for BM). For BM $W$,
$$\lim_{t \to \infty} \frac{W_t}{t} = 0 \qquad P\text{-a.s.}$$

Proof. $\widetilde W_t := tW_{1/t}$ is a BM by Proposition 2.1.1. So
$$P\Big[\lim_{t \to \infty} \frac{W_t}{t} = 0\Big] = P\Big[\lim_{h \downarrow 0} hW_{1/h} = 0\Big] = P\Big[\lim_{h \downarrow 0} \widetilde W_h = 0\Big] = 1$$
(last step by Proposition 2.1.1).

In particular, Corollary 2.1.2 says that BM grows more slowly than linearly as $t \to \infty$. So how fast exactly does it grow?

Theorem 2.1.3 (Law of the iterated logarithm (LIL) for BM). For BM $W$, with probability 1,
$$\limsup_{t \to \infty} \frac{W_t}{\sqrt{2t \log(\log t)}} = +1, \qquad \liminf_{t \to \infty} \frac{W_t}{\sqrt{2t \log(\log t)}} = -1.$$

Theorem 2.1.3 is proved like Corollary 2.1.2, by using $X^4$ from Proposition 2.1.1 and the following result:

Theorem 2.1.4 (local LIL for BM). If we set $\psi(h) := \sqrt{2h \log(\log\frac1h)}$, then we have, for BM $W$ and any $t \ge 0$,
$$P\Big[\limsup_{h \downarrow 0} \frac{W_{t+h} - W_t}{\psi(h)} = +1\Big] = 1, \qquad P\Big[\liminf_{h \downarrow 0} \frac{W_{t+h} - W_t}{\psi(h)} = -1\Big] = 1.$$

[Figure 4: LIL property of BM — just to the right of $t$, the path oscillates between the envelopes $\pm\psi(h)$]

So close to the right of $t$, trajectories of BM behave as sketched in Figure 4. The proof of Theorem 2.1.4 is postponed for the moment.

Let $W$ be Brownian motion and define
$$N(\omega) := \{t \in [0,1] : W_t(\omega) = 0\},$$
the set of zeroes of $W_\cdot(\omega)$ in $[0,1]$.

Theorem 2.1.5. For $P$-almost all $\omega$, the set $N(\omega)$
1. is closed;
2. has Lebesgue measure 0;
3. is infinite;
4. is perfect, i.e. it contains no isolated points: each point in $N(\omega)$ is a cluster point of points in $N(\omega)$. In particular, $N(\omega)$ is uncountable.

Proof. 1. $N(\omega)$ is the set of zeroes of the continuous function $W_\cdot(\omega)$, hence closed.
2. $W$ is product-measurable on $\Omega \times [0,1]$ (exercise), so by Fubini
$$E[\lambda(N)] = E\Big[\int_0^1 I_{\{W_t = 0\}}\,dt\Big] = \int_0^1 \underbrace{P[W_t = 0]}_{= 0}\,dt = 0,$$
hence $\lambda(N) = 0$ $P$-a.s.
3. By the LIL (Theorem 2.1.4), BM oscillates infinitely often close to the right of $t = 0$.

17 2.2 Path properties of BM We cannot prove this completely (yet), but the argument goes like this: for fixed s, by Proposition 2.1.1, W +s W s is again BM. It seems plausible that this generalizes to random times. So fix q and define the first zero of W after q. τ(ω) := inf{t > q W t (ω) = }, Then we claim (at present without proof) that is again BM. W t := W t+τ W τ = W t+τ, t (2.1.6) By the LIL, t = is a cluster point of the zeros of W from the right, P -a.s. In the same way, by (2.1.6), τ is a cluster point from the right of the zeroes of W, or of W, P -a.s. So the probability that the first zero after some rational q is not a cluster point of zeroes from the right is. But the rational points lie dense in [, 1, and so N(ω) P -a.s. contains no isolated points. That N(ω) is then uncountable is a general fact, see Hewitt/Stromberg Thm. II We still need to prove (2.1.6), and this will be a consequence of the so-called strong Markov property of BM; see later. 2.2 Path properties of BM We know that the trajectories of BM are P -a.s. continuous. But still, they are quite wild: Theorem P -almost all paths W (ω) of BM are nowhere differentiable. Proof. We show that the set of trajectories which are differentiable somewhere has outer measure. (It seems unknown whether this set is measurable.) If W (ω) is differentiable at s, then it has bounded difference quotients close to s, so that W t W s C t s for t near s. Therefore W k n W k 1 n C 1 n, for all n large, k n near s, and for three successive k. So: Now W k+j+1 n 3 P j=1 A := {W is differentiable at some s [, 1 } 3 { W k+j+1 W k+j C 1 } n n n C N m N n m k n j=1 =: B. W k+j n { W k+j+1 n N (, 1 n ), so W k+j n n} C 1 = ( [ P Z C ) 3 const. n 3 2. n

Next, with
$$B_m := \bigcap_{n \ge m} \bigcup_{k \le n} \bigcap_{j=1}^3 \{\dots\}, \qquad m \in \mathbb N,$$
we get
$$P[B_m] \le \limsup_n \sum_{k \le n} P\bigg[\bigcap_{j=1}^3 \{\dots\}\bigg] \le \limsup_n\ \mathrm{const}\cdot n \cdot n^{-\frac32} = 0.$$
So finally $P[B] = 0$.

Immediate consequence: $P$-almost all paths of BM have infinite variation on any time interval, because Lebesgue's differentiation theorem says that any function of finite variation is $\lambda$-a.e. differentiable.

Intuitive explanation: $W_{t+h} - W_t \sim \mathcal N(0,h)$, so in particular, $W_{t+h} - W_t$ is symmetric around 0 with $E[(W_{t+h} - W_t)^2] = h$. So heuristically, $W_{t+h} - W_t$ is of the order $\pm\sqrt h$. Now consider a partition of $[0,1]$: $\Pi = \{t_0, \dots, t_k\}$ with $0 \le t_0 < \dots < t_k \le 1$ and $|\Pi| := \sup\{t_i - t_{i-1} : t_{i-1}, t_i \in \Pi\}$. Then, typically and heuristically,
$$\sum_{t_i \in \Pi} |W_{t_{i+1}} - W_{t_i}| \approx \sum_{t_i \in \Pi} \sqrt{t_{i+1} - t_i}.$$
So for a sequence $(\Pi_n)$ with $|\Pi_n| \to 0$ (e.g. equidistant points), we heuristically expect
$$\sum_{t^n_i \in \Pi_n} |W_{t^n_{i+1}} - W_{t^n_i}| \approx \sum_{t^n_i \in \Pi_n} \sqrt{t^n_{i+1} - t^n_i} = n\,\frac{1}{\sqrt n} = \sqrt n \to \infty$$
(for equidistant partitions), since $1/\sqrt n$ is not summable. This explains heuristically why we have infinite variation. In the same heuristic way, we expect
$$\sum_{t^n_i \in \Pi_n} \big(W_{t^n_{i+1}} - W_{t^n_i}\big)^2 \approx \sum_{t^n_i \in \Pi_n} (t^n_{i+1} - t^n_i) = 1.$$
The surprising thing is that this heuristic reasoning gives a correct result:

Theorem 2.2.2 (Lévy). Let $(\Pi_n)$ be a sequence of partitions of $[0,1]$ with $|\Pi_n| \to 0$. Then for any $t \in [0,1]$,
$$V^n_t := \sum_{t^n_i \in \Pi_n} \big(W_{t^n_{i+1} \wedge t} - W_{t^n_i \wedge t}\big)^2 \longrightarrow t \qquad\text{in } L^2(P).$$
If the sequence $(\Pi_n)$ is refining (i.e., $\Pi_n \subseteq \Pi_{n+1}$ for all $n$), then even $\lim_n V^n_t = t$ $P$-a.s.

In other words: $P$-a.a. trajectories of BM have on $[0,t]$ quadratic variation $t$, along any refining sequence of partitions with $|\Pi_n| \to 0$.

Remark: It is important that the sequence $(\Pi_n)$ is fixed (it must not depend on $\omega$); for each fixed $\omega$, the quadratic variation of $W_\cdot(\omega)$, computed as the sup of sums of squared increments over all partitions, possibly depending on $\omega$, is $+\infty$.
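Both phenomena are visible numerically (an illustration added here, not from the notes): along refining dyadic partitions, the squared increments sum to $\approx t$, while the absolute increments blow up.

```python
import numpy as np

# Illustration only (not from the notes): quadratic vs. total variation of one
# BM path on [0, 1] along dyadic partitions of mesh 2**-k.
rng = np.random.default_rng(6)
N = 2**18                                      # finest simulation grid
dW = rng.normal(0.0, np.sqrt(1.0 / N), size=N)
W = np.concatenate([[0.0], np.cumsum(dW)])

for k in (4, 8, 12, 16):
    step = N // 2**k
    incr = np.diff(W[::step])
    print(2**k, np.sum(incr**2), np.sum(np.abs(incr)))
# middle column ~ 1 (= t); last column grows like 2**(k/2)
```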

Proof. The $P$-a.s. convergence is a bit tricky; see Kallenberg. The $L^2$-convergence is easy: take $t = 1$ and set $\Delta^n_i W := W_{t^n_{i+1}} - W_{t^n_i}$ and $\Delta^n t_i := t^n_{i+1} - t^n_i$, so that $V^n_1 = \sum_i (\Delta^n_i W)^2$. Now the $\Delta^n_i W$ are independent and $\mathcal N(0, \Delta^n t_i)$. So $E[V^n_1] = \sum_i \Delta^n t_i = 1$ and $\operatorname{Var}[V^n_1] = \sum_i \operatorname{Var}[(\Delta^n_i W)^2]$. Since
$$\operatorname{Var}[(\Delta^n_i W)^2] = E[(\Delta^n_i W)^4] - \big(E[(\Delta^n_i W)^2]\big)^2 = 3(\Delta^n t_i)^2 - (\Delta^n t_i)^2 = 2(\Delta^n t_i)^2,$$
we obtain
$$\operatorname{Var}[V^n_1] = 2\sum_i (\Delta^n t_i)^2 \le 2|\Pi_n| \underbrace{\sum_i \Delta^n t_i}_{= 1} = 2|\Pi_n|.$$
Hence
$$\|V^n_1 - 1\|_{L^2}^2 = \operatorname{Var}[V^n_1] \longrightarrow 0 \qquad\text{as } n \to \infty.$$

So: along any fixed good sequence of partitions of $[0,t]$, BM trajectories have a quadratic variation (of $t$). This is the key property behind Itô's formula and stochastic calculus; see later. It also implies that BM has infinite variation along any such (good) sequence of partitions, because any continuous function of finite variation has quadratic variation 0. (Write $\Delta^n_i f := f(t^n_{i+1} \wedge t) - f(t^n_i \wedge t)$ to get
$$\sum_i (\Delta^n_i f)^2 \le \sup_i |\Delta^n_i f| \cdot \sum_i |\Delta^n_i f|;$$
the second factor remains bounded (in $n$) since $f$ has finite variation, and the first is
$$\sup\big\{|f(t^n_{i+1} \wedge t) - f(t^n_i \wedge t)| : t^n_i, t^n_{i+1} \in \Pi_n\big\},$$
and this goes to 0 as $|\Pi_n| \to 0$ because $f$ is uniformly continuous on $[0,t]$.)

2.3 Martingale properties of BM

Start with $(\Omega, \mathcal F, P)$ and a BM $(W_t)_{t \ge 0}$ on it; so we have (BM1), (BM2') and (BM3). There is no filtration yet.

Definition. Let $\mathcal H^0_t := \sigma(W_s;\ s \le t)$, the (raw) filtration generated by $W$, and
$$\mathcal H_t := \bigcap_{\varepsilon > 0} \mathcal H^0_{t+\varepsilon}.$$
Difference: $\mathcal H^0_t \subseteq \mathcal H_t$, and $\mathbb H = (\mathcal H_t)_{t \ge 0}$ is right-continuous; this is important for martingale theory in continuous time.

Lemma 2.3.1. For all $t \ge 0$ and $h > 0$, the increment $W_{t+h} - W_t$ is independent of $\mathcal H^0_t$ and even of $\mathcal H_t$.

Proof. 1. By (BM2'), BM has independent increments, and this already implies that $W_{t+h} - W_t$ is independent of $\mathcal H^0_t$; see Exercise 2-1. (This uses no properties of the distribution of the increments nor path properties of BM.)
2. To get from $\mathcal H^0_t$ to $\mathcal H_t$, we only use that $W$ has (right-)continuous trajectories, as follows: take $g_t$ on $\Omega$ bounded and $\mathcal H_t$-measurable, and $f$ on $\mathbb R$ bounded and continuous. Then $g_t$ is $\mathcal H^0_{t+\varepsilon}$-measurable, and $W_{t+h} - W_{t+\varepsilon}$ is independent of $\mathcal H^0_{t+\varepsilon}$ (for $\varepsilon < h$). Therefore
$$E[g_t f(W_{t+h} - W_{t+\varepsilon})] = E[g_t]\,E[f(W_{t+h} - W_{t+\varepsilon})].$$
Let $\varepsilon \downarrow 0$; then $W_{t+\varepsilon} \to W_t$ $P$-a.s. by right-continuity, so also $f(W_{t+h} - W_{t+\varepsilon}) \to f(W_{t+h} - W_t)$ $P$-a.s.; everything is bounded, so Lebesgue (dominated convergence) gives
$$E[g_t f(W_{t+h} - W_t)] = E[g_t]\,E[f(W_{t+h} - W_t)]$$
for all $f$ bounded and continuous. Use the monotone class theorem to extend this to all $f$ bounded and measurable; this shows that $W_{t+h} - W_t$ is independent of $\mathcal H_t$.

Proposition 2.3.2. Let $W$ be a BM with respect to a filtration $\mathbb F$. The following are all $(\mathbb F, P)$-martingales:
1) $(W_t)_{t \ge 0}$;
2) $(W_t^2 - t)_{t \ge 0}$;
3) $(e^{\alpha W_t - \frac12\alpha^2 t})_{t \ge 0}$, for any $\alpha \in \mathbb R$.

Proof. Adapted: clear. Integrable: $W_t$ is normal. Martingale property, for $s \le t$:
1) $E[W_t - W_s \mid \mathcal F_s] = E[W_t - W_s] = 0$ by independence and $W_t - W_s \sim \mathcal N(0, t-s)$.
2) $E[W_t^2 \mid \mathcal F_s] = E[(W_t - W_s + W_s)^2 \mid \mathcal F_s] = E[(W_t - W_s)^2 \mid \mathcal F_s] + W_s^2 = (t-s) + W_s^2$, where the cross term vanishes because $W$ is a martingale, and we used independence and $\mathcal N(0, t-s)$.
3) With $M_t := e^{\alpha W_t - \frac12\alpha^2 t}$, we get
$$E\Big[\frac{M_t}{M_s}\,\Big|\,\mathcal F_s\Big] = E\big[e^{\alpha(W_t - W_s) - \frac12\alpha^2(t-s)}\,\big|\,\mathcal F_s\big] = e^{-\frac12\alpha^2(t-s)}\,E\big[e^{\alpha(W_t - W_s)}\big] = e^{-\frac12\alpha^2(t-s)}\,e^{\frac12\alpha^2(t-s)} = 1.$$
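A quick check of the normalization in 3) (an illustration added here, not from the notes): the factor $e^{-\alpha^2 t/2}$ exactly compensates the exponential moment of $W_t$.

```python
import numpy as np

# Illustration only (not from the notes): E[exp(alpha*W_t - alpha^2*t/2)] = 1,
# the normalization behind the exponential martingale.
rng = np.random.default_rng(7)
alpha, t, n = 1.5, 2.0, 10**6

W_t = rng.normal(0.0, np.sqrt(t), size=n)
M_t = np.exp(alpha * W_t - 0.5 * alpha**2 * t)
print(M_t.mean())   # ~ 1.0 up to Monte Carlo error
```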

To exploit Proposition 2.3.2, we need the stopping theorem for martingales in continuous time.

Definition (stopping time w.r.t. $\mathbb F$). A mapping $\tau\colon \Omega \to [0,\infty]$ with $\{\tau \le t\} \in \mathcal F_t$ for all $t \ge 0$. The $\sigma$-field of events observable up to time $\tau$ is
$$\mathcal F_\tau := \{A \in \mathcal F : A \cap \{\tau \le t\} \in \mathcal F_t \text{ for all } t \ge 0\}.$$
One checks:
i) $\mathcal F_\tau$ is a $\sigma$-field;
ii) $\mathcal F_\sigma \subseteq \mathcal F_\tau$ for $\sigma \le \tau$;
iii) if $\tau \equiv t$ for some fixed $t$, then $\mathcal F_\tau = \mathcal F_t$.

[Figure 5: Entry time $\tau_a$ of $W$ into $(a, \infty)$]
[Figure 6: Is $t$ a local maximum? From $\mathcal H^0_t$ we cannot tell]

Example: For $a > 0$, consider the first entry time of $W$ into $(a,\infty)$, i.e.
$$\tau_a := \inf\{t > 0 : W_t \in (a,\infty)\} = \inf\{t > 0 : W_t > a\}$$
(Figure 5). Note: because $W_\cdot(\omega)$ is continuous, we have $W_{\tau_a} = a$. This $\tau_a$ is a stopping time w.r.t. $\mathbb H$ (or any right-continuous $\mathbb F$), but not w.r.t. $\mathbb H^0$.

Proof. This uses only that $\tau_a$ is the first time that an adapted RC process (like $W$) hits an open set (like $(a,\infty)$):
$$\{\tau_a \le t\} = \bigcap_{\varepsilon > 0} \{\tau_a < t + \varepsilon\} = \bigcap_{\varepsilon > 0}\ \bigcup_{r < t+\varepsilon,\ r \in \mathbb Q} \underbrace{\{W_r > a\}}_{\in\,\mathcal H^0_r} \in \bigcap_{\varepsilon > 0} \mathcal H^0_{t+\varepsilon} = \mathcal H_t,$$
where the second equality holds because $(a,\infty)$ is open and $W$ is RC.

Now suppose we can only observe $\mathcal H^0_t$. Is $\tau_a \le t$? We cannot tell here: $t$ might be a local maximum of $W$, but it need not be (Figure 6). This illustrates why $\tau_a$ cannot be an $\mathbb H^0$-stopping time.

Lemma 2.3.4. Let $X = (X_t)_{t \ge 0}$ be adapted and RC, and let $\tau$ be an $\mathbb F$-stopping time. Define $X_\tau\colon \Omega \to \mathbb R$ by $(X_\tau)(\omega) := X_{\tau(\omega)}(\omega)$; assume also that $X_\infty$ is well defined (e.g. $X_\infty := 0$ or $X_\infty := \lim_{t \to \infty} X_t$). Then $X_\tau$ is $\mathcal F_\tau$-measurable.

Proof. The lemma is intuitive, but needs proving; see Kallenberg, Lemma 6.5.

Theorem 2.3.5 (Stopping theorem). Let $M = (M_t)_{t \ge 0}$ be an $(\mathbb F, P)$-martingale with RC paths and let $\sigma, \tau$ be $\mathbb F$-stopping times with $\sigma \le \tau$. If $\tau$ is bounded or if $M$ is uniformly integrable, then
$$E[M_\tau \mid \mathcal F_\sigma] = M_\sigma.$$

Proof. 1) If $M$ is UI, then the martingale convergence theorem (WT, Thm. IV.3.6) implies that $M_\infty := \lim_{n \to \infty} M_n$ exists $P$-a.s., that $M_\infty \in L^1(P)$, and that $M_n = E[M_\infty \mid \mathcal F_n]$, $n \in \mathbb N$. Then also $M_t = E[M_\infty \mid \mathcal F_t]$, $t \ge 0$. If $\tau \le N$ for some $N < \infty$, then $E[M_N \mid \mathcal F_t] = M_t$, $t \ge 0$, and we only need $M$ on $[0,N]$. So in both cases we may assume that $M_t = E[M_N \mid \mathcal F_t]$, $t \ge 0$, and $\tau \le N$, for some $N \in [0,\infty]$.
2) $\mathcal F_\sigma \subseteq \mathcal F_\tau$, and conditional expectation is projective, so it is enough to prove $E[M_N \mid \mathcal F_\tau] = M_\tau$. This already follows if $\rho \mapsto E[M_\rho]$ is constant over all stopping times $\rho \le N$. Indeed, take $A \in \mathcal F_\tau$ and set $\rho := \tau I_A + N I_{A^c}$. Then $\rho$ and $\rho' \equiv N$ are stopping times, and so
$$E[M_N] = E[M_{\rho'}] = E[M_\rho] = E[M_\tau I_A] + E[M_N I_{A^c}].$$
Hence $E[M_N I_A] = E[M_\tau I_A]$ for all $A \in \mathcal F_\tau$, and $M_\tau$ is $\mathcal F_\tau$-measurable by Lemma 2.3.4; so $M_\tau = E[M_N \mid \mathcal F_\tau]$.
3) Idea: to prove that $\rho \mapsto E[M_\rho]$ is constant, reduce to discrete time and use the stopping theorem from there (WT, Thm. IV.3.8). If $\rho$ has only countably many values $t_n$, $n \in \mathbb N$, which are increasingly ordered, view $(M_{t_n})_{n \in \mathbb N}$ as a discrete-time martingale in the filtration $(\mathcal F_{t_n})_{n \in \mathbb N}$ and note that $\rho$ is also a stopping time for this filtration. The stopping theorem (WT IV.3.8) then says $E[M_\rho] = E[M_N]$ (and even $M_\rho = E[M_N \mid \mathcal F_\rho]$). Now take a general $\rho$ and define
$$\rho_m := \sum_{k=0}^\infty (k+1)2^{-m}\,I_{\{k2^{-m} \le \rho < (k+1)2^{-m}\}} + N\,I_{\{\rho = N\}}.$$
Then $\rho_m$ has only countably many increasingly ordered values, and $\rho_m$ is an $\mathbb F$-stopping time, because
$$\{\rho_m \le t\} = \bigcup_{k:\ (k+1)2^{-m} \le t} \{k2^{-m} \le \rho < (k+1)2^{-m}\} \in \mathcal F_t \qquad\text{for } t < N.$$
By construction, $\rho_m \downarrow \rho$, so $M_{\rho_m} \to M_\rho$ $P$-a.s. by right-continuity. Moreover, $M_{\rho_m} = E[M_N \mid \mathcal F_{\rho_m}]$; hence $(M_{\rho_m})_{m \in \mathbb N}$ is UI, and therefore
$$E[M_\rho] = E\big[\lim_m M_{\rho_m}\big] = \lim_m E[M_{\rho_m}] = E[M_N].$$
So we are done.

Remark: Why did we assume that $\mathbb F$ is right-continuous? It seems that we did not use that. The general result (Kallenberg, Thm. 7.27) is: in a right-continuous filtration, every martingale has a version with RCLL trajectories. We always choose such a version(!).
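The stopping theorem is easy to test empirically (an illustration added here, not from the notes): for the bounded stopping time $\tau = \tau_a \wedge T$, one should see $E[W_\tau] = 0$.

```python
import numpy as np

# Illustration only (not from the notes): E[W_tau] = 0 for the bounded
# stopping time tau = min(tau_a, T), tau_a the first passage time over a.
rng = np.random.default_rng(8)
a, T, n_steps, n_paths = 1.0, 4.0, 2000, 10000
dt = T / n_steps

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)

reached = (W >= a)
first = reached.argmax(axis=1)                        # 0 when never reached
stop = np.where(reached.any(axis=1), first, n_steps)  # tau = T if a is not hit
print(W[np.arange(n_paths), stop].mean())             # ~ 0
```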

Corollary 2.3.6 (Maximal inequality). Let $M \ge 0$ be a (RC) martingale and set $M^*_t := \sup_{s \le t} M_s$. Then
$$P[M^*_t > c] \le \frac{E[M_0]}{c}, \qquad c > 0,$$
and letting $t \to \infty$ then gives
$$P\Big[\sup_{s \ge 0} M_s > c\Big] \le \frac{E[M_0]}{c}, \qquad c > 0.$$

Proof. $\tau_c := \inf\{t \ge 0 : M_t > c\}$ is a stopping time; $\{M^*_t > c\} \subseteq \{\tau_c \le t\}$, and on the set $\{\tau_c \le t\}$, $M_{\tau_c \wedge t} = M_{\tau_c} \ge c$ since $M$ is RC. So
$$c\,P[M^*_t > c] = E\big[c\,I_{\{M^*_t > c\}}\big] \le E\big[M_{\tau_c \wedge t}\,I_{\{\tau_c \le t\}}\big] \le E[M_{\tau_c \wedge t}] = E[M_0],$$
using $M \ge 0$ and, in the last step, the stopping theorem for the bounded stopping time $\tau_c \wedge t$.

Application: an estimate for BM $W$: for any $t > 0$ and $\alpha, \beta > 0$,
$$P\Big[\sup_{s \le t}\Big(W_s - \frac{\alpha}{2}s\Big) \ge \beta\Big] \le e^{-\alpha\beta}. \tag{2.3.7}$$

Proof. $M_s := \exp(\alpha W_s - \frac12\alpha^2 s)$, $s \ge 0$, is a martingale by Proposition 2.3.2. So
$$P\Big[\sup_{s \le t}\Big(W_s - \frac{\alpha}{2}s\Big) \ge \beta\Big] = P[M^*_t \ge e^{\alpha\beta}] = \lim_{\gamma \uparrow \beta} P[M^*_t > e^{\alpha\gamma}] \le \lim_{\gamma \uparrow \beta} e^{-\alpha\gamma}\,E[M_0] = e^{-\alpha\beta},$$
since $E[M_0] = 1$.

Next we consider the ruin problem for BM. Let $W$ be BM, fix $a < 0 < b$, and set
$$\tau_{a,b} := \inf\{t \ge 0 : W_t \notin (a,b)\} = \tau_a \wedge \tau_b,$$
with obvious notation.

[Figure: a path of $W$ between the levels $a < 0 < b$, stopped at $\tau_{a,b}$]

Theorem 2.3.8 (Ruin problem for BM). We have
$$P[W_{\tau_{a,b}} = a] = \frac{b}{b - a} = \frac{b}{b + |a|}, \tag{2.3.9}$$
$$E[\tau_{a,b}] = -ab = |a|\,b. \tag{2.3.10}$$

Proof. We want to use Theorem 2.3.5; but because $\tau_{a,b}$ is not bounded and the martingales we examine are not UI, we must be careful.

24 2.4 The law of the iterated logarithm For brevity, write τ := τ a,b. (W 2 t t) t is a martingale by Proposition 2.3.2; so use Theorem for it with bounded stopping time τ n to get E[W 2 τ n = E[τ n E[τ by monotone integration. Now let n : W 2 τ n max(a 2, b 2 ), so E[τ max(a 2, b 2 ) <, so τ < P -a.s. Hence Wτ n 2 Wτ 2 gives E[Wτ 2 = E[τ. So as n P -a.s., and now Lebesgue with p := P [W τa,b = a. E[τ a,b = E[W 2 τ a,b = pa 2 + (1 p)b 2 (2.3.11) 2. Also W is a martingale; so Theorem gives E[W τ n =. As n, W τ n W τ P -a.s. since τ < P -a.s.; and W τ n max( a, b). So Lebesgue gives: E[W τ = lim E[W τ n = n hence = E[W τa,b = pa + (1 p)b Solve this for p to get (2.3.9); plug into (2.3.11) to get (2.3.1). Remark.: By the LIL, we get τ a < P -a.s. and τ b < P -a.s. for any a < < b. Moreover, τ a,b τ b as a. Therefore E[τ b mon. int. = lim E[τ a,b = lim a b = + a a and symmetrically E[τ a =. So these are canonical examples of stopping times τ with τ < P -a.s. but E[τ =. 2.4 The law of the iterated logarithm The goal of this subsection is to prove the LIL (Theorem 2.1.4). First we need a useful technical estimate: Lemma For Z N (, 1) and any a > : 1 a + 1 a e 1 2 a2 2πP [Z > a 1 a e 1 2 a2 That is, the tail probability for N (, 1) decays exponentially with quadratic rate. Proof. RHS: (e 1 2 x2 ) = xe 1 2 x2 ; so: 2πP [Z > a = e 1 2 x2 dx a a x a e 1 2 x 2dx = 1 a e 1 2 a2 LHS: ( 1 x e 1 2 x2 ) = (1 + 1 x )e x2 ; so: πP [Z > a = e 1 2 x2 x dx e 1 2 x2 dx = 1 1 a a e 1 2 a 2. 2 a 2 a

2.4 The law of the iterated logarithm

The goal of this subsection is to prove the LIL (Theorem 2.1.4). First we need a useful technical estimate:

Lemma 2.4.1. For $Z \sim \mathcal N(0,1)$ and any $a > 0$,
$$\frac{1}{a + \frac1a}\,e^{-\frac12 a^2} \le \sqrt{2\pi}\,P[Z > a] \le \frac1a\,e^{-\frac12 a^2}.$$
That is, the tail probability of $\mathcal N(0,1)$ decays exponentially with quadratic rate.

Proof. RHS: $\big(-e^{-\frac12 x^2}\big)' = x\,e^{-\frac12 x^2}$; so
$$\sqrt{2\pi}\,P[Z > a] = \int_a^\infty e^{-\frac12 x^2}\,dx \le \int_a^\infty \frac{x}{a}\,e^{-\frac12 x^2}\,dx = \frac1a\,e^{-\frac12 a^2}.$$
LHS: $\big(-\frac1x\,e^{-\frac12 x^2}\big)' = \big(1 + \frac1{x^2}\big)\,e^{-\frac12 x^2}$; so
$$\sqrt{2\pi}\,P[Z > a] = \int_a^\infty e^{-\frac12 x^2}\,dx \ge \frac{1}{1 + \frac1{a^2}} \int_a^\infty \Big(1 + \frac1{x^2}\Big)\,e^{-\frac12 x^2}\,dx = \frac{1}{1 + \frac1{a^2}}\,\frac1a\,e^{-\frac12 a^2} = \frac{1}{a + \frac1a}\,e^{-\frac12 a^2}.$$

Theorem 2.1.4 (local LIL for BM, restated). Let $W$ be BM and define $\psi(h) := \sqrt{2h\log(\log\frac1h)}$. For any $t \ge 0$,
$$P\Big[\limsup_{h \downarrow 0} \frac{W_{t+h} - W_t}{\psi(h)} = +1\Big] = 1, \qquad P\Big[\liminf_{h \downarrow 0} \frac{W_{t+h} - W_t}{\psi(h)} = -1\Big] = 1.$$

Proof. $(W_{s+t} - W_t)_{s \ge 0}$ is again BM; so w.l.o.g. $t = 0$. $-W$ is also BM; so we only need to prove the limsup statement, that is,
$$\limsup_{h \downarrow 0} \frac{W_h}{\psi(h)} \begin{cases} \le 1 & P\text{-a.s.} \\ \ge 1 & P\text{-a.s.} \end{cases}$$

Proof of $\le$: Idea: approximate $\psi(h)$ in a piecewise affine way, and estimate $W$ very well against these affine functions. Choose a sequence of the form $h_n = \theta^n$ with $0 < \theta < 1$, and set
$$\alpha_n := (1 + \delta)\,\frac{\psi(h_{n+1})}{h_{n+1}}, \qquad \beta_n := \frac12\,\psi(h_{n+1}),$$
with $\delta > 0$. Then
$$P\Big[\sup_{s \le h_n}\Big(W_s - \frac{\alpha_n}{2}s\Big) \ge \beta_n\Big] \le e^{-\alpha_n\beta_n} = \dots = \mathrm{const}\,(n+1)^{-(1+\delta)}$$
by (2.3.7). This is summable over $n$ since $\delta > 0$; so Borel-Cantelli implies that with probability 1,
$$\sup_{s \le h_n}\Big(W_s - \frac{\alpha_n}{2}s\Big) \le \beta_n \qquad\text{finally, i.e., for } n \text{ large enough.}$$
So we have $P$-a.s. for $n \ge n_0(\omega)$: if we choose $h > 0$ with $h_{n+1} < h \le h_n$, then
$$W_h \le \sup_{s \le h_n}\Big(W_s - \frac{\alpha_n}{2}s\Big) + \frac{\alpha_n}{2}h_n \le \beta_n + \frac{\alpha_n}{2}h_n.$$
Now use $h_n = \frac1\theta h_{n+1}$ and the definitions of $\alpha_n, \beta_n$; then $P$-a.s., for $n \ge n_0(\omega)$ and $h$ with $h_{n+1} < h \le h_n$,
$$W_h \le \psi(h_{n+1})\Big(\frac12 + \frac{1+\delta}{2\theta}\Big) \le \psi(h)\Big(\frac12 + \frac{1+\delta}{2\theta}\Big),$$
since $\psi$ is increasing for small $h$. Take the limsup as $h \downarrow 0$; then let $\delta \downarrow 0$, $\theta \uparrow 1$ and get
$$\limsup_{h \downarrow 0} \frac{W_h}{\psi(h)} \le 1 \qquad P\text{-a.s.}$$

Proof of $\ge$: write $W_{h_n} = W_{h_n} - W_{h_{n+1}} + W_{h_{n+1}}$; estimate the independent random variables $W_{h_n} - W_{h_{n+1}}$ from below; use part 1 for $-W$ to also estimate $W_{h_{n+1}}$ from below; this gives an estimate from below for $W_{h_n}$. Again choose $h_n = \theta^n$ with $0 < \theta < 1$. By part 1, applied to $-W$, we have $P$-a.s. for $n \ge n_1(\omega)$
$$W_{h_{n+1}} \ge -2\psi(h_{n+1}). \tag{2.4.3}$$

On the other hand, $W_{h_n} - W_{h_{n+1}} = \sqrt{h_n - h_{n+1}}\,Y_n = \sqrt{\theta^n - \theta^{n+1}}\,Y_n$, where the $Y_n$ are i.i.d. $\mathcal N(0,1)$. To estimate $P[W_{h_n} - W_{h_{n+1}} \ge C\psi(h_n)]$, we set
$$a_n := \frac{C}{\sqrt{\theta^n(1-\theta)}}\,\psi(h_n) = \dots = C\,\sqrt{\frac{2}{1-\theta}}\,\sqrt{\log n + \mathrm{const}(\theta)}.$$
Now
$$P[W_{h_n} - W_{h_{n+1}} \ge C\psi(h_n)] = P[Y_n \ge a_n] \ge \frac{1}{a_n + \frac1{a_n}}\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac12 a_n^2}$$
by Lemma 2.4.1. Compute, for large $n$,
$$\frac{1}{a_n + \frac1{a_n}} \ge \frac{\mathrm{const}}{\sqrt{\log n}} \ge \frac{\mathrm{const}}{\log n}, \qquad e^{-\frac12 a_n^2} = \dots = \mathrm{const}\cdot n^{-\frac{C^2}{1-\theta}} = \frac{\mathrm{const}}{n} \quad\text{for } C := \sqrt{1-\theta}.$$
Therefore
$$\sum_{n=1}^\infty P\big[W_{h_n} - W_{h_{n+1}} \ge \sqrt{1-\theta}\,\psi(h_n)\big] \ge \sum_n \frac{\mathrm{const}}{n\log n} = +\infty.$$
By Borel-Cantelli (the increments are independent): $P$-a.s., infinitely often $W_{h_n} - W_{h_{n+1}} \ge \sqrt{1-\theta}\,\psi(h_n)$. Combine this with (2.4.3); then $P$-a.s., for infinitely many $n$,
$$W_{h_n} = W_{h_n} - W_{h_{n+1}} + W_{h_{n+1}} \ge \sqrt{1-\theta}\,\psi(h_n) - 2\psi(h_{n+1}) \ge \psi(h_n)\big(\sqrt{1-\theta} - 4\sqrt\theta\big),$$
using that $\psi(h_{n+1}) \le 2\sqrt\theta\,\psi(h_n)$ for large $n$. So
$$\limsup_{h \downarrow 0} \frac{W_h}{\psi(h)} \ge \sqrt{1-\theta} - 4\sqrt\theta \qquad P\text{-a.s.},$$
and now the assertion follows as $\theta \downarrow 0$.

3 Markov processes

Basic idea: a Markov process indexed by time is a model for the temporal evolution of a stochastic system, where predictions about the future do not depend on the entire past, but only on the present state of the system. More mathematically, we want
$$E[g(X_u;\ u \ge t) \mid \sigma(X_s;\ s \le t)] = E[g(X_u;\ u \ge t) \mid \sigma(X_t)].$$
In addition, we want this not only for fixed times $t$, but also for random (stopping) times $\tau$, a property called the strong Markov property.

27 3.1 Basic concepts Basic concepts Start with probability space (Ω, F, P ) and measurable space (S, S); the latter is the state space of the process we want to consider. Suppose we have an S-valued stochastic process X = (X t ) t and define Ft := σ(x s ; s t), t. If S is Polish and S = B(S), then X t has a regular conditional distribution given Fs for any s t. If this depends not on all of Fs, but only on X s, we can write P [X t A F s = P [X t A X s = K s,t (X s, A) for some stochastic kernel K s,t from (S, S) to itself. Moreover, for s t u, K s,u (X s, A) = P [X u A Fs = E [ P [X u A Ft F s }{{} = S K t,u (y, A)K s,t (X s, dy) =: (K s,t K t,u )(X s, A) K t,u(x t,a) If things are homogeneous in time, then K s,t will not depend separately on s, t, but only on time difference t s. For many issues, one needs to work with Markov processes having nice trajectories; so we need a nice function space. So assume (S, d) is a metric space and denote by D(S) the space of all functions y : [, ) S which are RCLL. Call Y the coordinate process on D(S): Y = (Y t ) t with Y t : D(S) S, y Y t (y) = y(t), and D t := σ(y s ; s t), D := D = σ(y s ; s ) (on D(S)). Further we define S [, ) := σ(y s ; s ) (on S [, ) ). Then (D(S), D) is a measurable space; it will turn out that if S is nice, then also D(S) is nice. In other words: RCLL processes are nice objects. One can view an S-valued stochastic process X = (X t ) t on (Ω, F, P ) as an S [, ) -valued random variable; if X has RCLL trajectories, it is actually D(S)-valued. Distribution of X under P is a p.m. on (S [, ), S [, ) ) or on (D(S), D), depending on trajectories of X. Theorem Suppose S is Polish and S = B(S); then there is a metric on D(S) which makes D(S) into a complete separable metric space (so D(S) is again Polish) and such that D = B(D(S)). [So: completely analogous to C(S) The resulting topology is the Skorohod topology. Proof. Long and technical; see Ethier/Kurtz, section 3.5, or Jacod/Shiryaev, Section VI.1. Definition. (S, d) metric space, S = B(S). A transition semigroup on (S, S) is a family (K t ) t of stochastic kernels on (S, S) such that: (TS1) K = Id, i.e., K (x, A) = I A (x), x S, A S. (TS2) K s K t = K t+s (= K t K s ) for s, t, where (K s K t )(x, A) := S K s(x, dy)k t (y, A). This is called the semigroup property or Chapman-Kolmogorov equation. (TS3) lim h K h f(x) = f(x), x S, f C b (S), where (K h f)(x) := S K h(x, dy)f(y) for f measurable and bounded or.

Note: View $K_t$ interchangeably as a stochastic kernel and as an operator acting on functions. Then (TS3) says that $(K_t)_{t \ge 0}$ is continuous at $t = 0$ with respect to pointwise convergence for functions in $C_b(S)$; this is also called continuity in the bp-sense.

Definition. Let $(S, d)$ be a metric space, $\mathcal S = \mathcal B(S)$. A (RCLL) Markov process on $(\Omega, \mathcal F, P)$, with respect to the filtration $\mathbb G = (\mathcal G_t)_{t \ge 0}$ on $(\Omega, \mathcal F)$, with transition semigroup $(K_t)$, is a (RCLL) $S$-valued stochastic process $X = (X_t)_{t \ge 0}$ which is adapted to $\mathbb G$ and such that for all $t, h \ge 0$, the conditional distribution of $X_{t+h}$ given $\mathcal G_t$ is given by $K_h(X_t, \cdot)$, i.e.
$$P[X_{t+h} \in A \mid \mathcal G_t] = K_h(X_t, A) \qquad P\text{-a.s.}$$
The distribution on $S^{[0,\infty)}$ (or $D(S)$) of $X$ under $P$ is then called a Markov distribution with transition semigroup $(K_t)$, and the distribution on $S$ of $X_0$ under $P$ is called the initial distribution.

Remarks: 1) This is a rigorous formulation of the intuitive idea at the beginning: $K_h$ describes the transition mechanism; if we begin at $x \in S$, then $K_h(x, A)$ is the conditional probability of landing in $A$ after $h$ units of time, given that we start in $x$. If we start at time $t$, this conditional probability does not depend on all of $\mathcal G_t$, but only on the starting point $X_t$.
2) Homogeneous in time: the transition mechanism from $t$ to $u$ depends only on $u - t$, not on $t, u$ separately; otherwise, one needs a family $(K_{s,t})_{s \le t}$, and the semigroup property is then $K_{s,t} K_{t,u} = K_{s,u}$.
3) In discrete time, with $(X_n)_{n \in \mathbb N}$, a Markov process is usually called a Markov chain. There (at least in the time-homogeneous case), it is enough to specify one single kernel, namely $K_1$ (passage from one time point to the next one).
4) If $X$ is Markov w.r.t. $\mathbb G$, it is also Markov with respect to its own filtration $\mathbb F^0$ (projectivity); in particular, we need $\mathbb F^0 \subseteq \mathbb G$ (this is implicit if $X$ is $\mathbb G$-adapted). Without mention of a filtration, the default is always $\mathbb F^0$.
5) Being Markov is a distributional property (and has nothing to do with paths). In particular, if $X$ is $P$-Markov, it need not be $Q$-Markov for some other probability measure $Q$ on $(\Omega, \mathcal F)$.

Notation: The initial distribution of $X$ under $P$, i.e. the distribution of $X_0$ on $S$, is usually called $\nu$. The distribution of $X$ with initial distribution $\nu$ is called $P_\nu$; if $\nu = \delta_x$, we briefly write $P_x$ for $P_{\delta_x}$.

Exercise: $X$ Markov w.r.t. $\mathbb F^0$, $f$ measurable and bounded or $\ge 0$ on $S$; then
$$E[f(X_{t+h}) \mid \mathcal F^0_t] = \int_S f(y)\,K_h(X_t, dy) = (K_h f)(X_t).$$

Remark: Suppose $X$ is Markov. Take any $Z \ge 0$ on $S^{[0,\infty)}$ (or $D(S)$), $\mathcal S^{[0,\infty)}$- (or $\mathcal D$-)measurable. Then $S \ni x \mapsto E_x[Z]$ is measurable; see below.

Example: If $W$ is BM w.r.t. $\mathbb G$, then $W$ is a continuous Markov process w.r.t. $\mathbb G$: $W_{t+h} - W_t$ is independent of $\mathcal G_t$ and $\mathcal N(0, h)$, so $W_{t+h} = W_t + (W_{t+h} - W_t)$, conditionally on $\mathcal G_t$, has a $\mathcal N(W_t, h)$-distribution; so
$$K_h(x, dz) = n_{x,h}(z)\,dz = \frac{1}{\sqrt{2\pi h}}\,e^{-\frac{(z-x)^2}{2h}}\,dz.$$
Check: this family $(K_t)_{t \ge 0}$ is a transition semigroup.
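The semigroup property for these Gaussian kernels can be verified numerically (an illustration added here, not from the notes): integrating $K_s(x, dy)$ against the density of $K_t(y, \cdot)$ reproduces the density of $K_{s+t}(x, \cdot)$.

```python
import numpy as np

# Illustration only (not from the notes): Chapman-Kolmogorov K_s K_t = K_{s+t}
# for the BM kernels K_h(x, dz) = N(x, h) dz, checked on a truncated grid.
s, t, x, z = 0.3, 0.7, 0.0, 1.2

def density(h, x, z):
    return np.exp(-(z - x)**2 / (2 * h)) / np.sqrt(2 * np.pi * h)

y, dy = np.linspace(-15.0, 15.0, 20001, retstep=True)
lhs = np.sum(density(s, x, y) * density(t, y, z)) * dy  # int K_s(x,dy) K_t(y,z)
print(lhs, density(s + t, x, z))   # the two values agree to high accuracy
```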

Example: A homogeneous Poisson process with intensity (rate) $\lambda > 0$ is a stochastic process $N$ with RCLL, piecewise constant trajectories, with jumps of height $+1$ at random times, in such a way that the increments of $N$ are independent, with $N_t - N_s \sim \mathcal P(\lambda(t-s))$. This is an RCLL Markov process; the state space is $\mathbb R$; since $N_{t+h} = N_t + (N_{t+h} - N_t)$, the kernels are given by
$$K_h(x, \{x + j\}) = e^{-\lambda h}\,\frac{(\lambda h)^j}{j!} \qquad\text{for } x \in \mathbb R,\ j \in \mathbb N_0.$$

What is the structure of a Markov distribution on path space?

Proposition 3.1.4. A process $X$ is Markov with transition semigroup $(K_t)$ and initial distribution $\nu$ iff for all $n \in \mathbb N$, $0 = t_0 < t_1 < \dots < t_n < \infty$ and $f_0, f_1, \dots, f_n\colon S \to \mathbb R$ measurable and $\ge 0$, we have
$$E\bigg[\prod_{k=0}^n f_k(X_{t_k})\bigg] = \int_S \nu(dx_0)\,f_0(x_0) \int_S K_{t_1}(x_0, dx_1)\,f_1(x_1) \int_S K_{t_2 - t_1}(x_1, dx_2)\,f_2(x_2) \cdots \int_S K_{t_n - t_{n-1}}(x_{n-1}, dx_n)\,f_n(x_n). \tag{3.1.5}$$
This is equivalent to saying that the marginal distribution of $P_\nu$ at the time points $t_0, t_1, \dots, t_n$ is given by
$$P \circ (X_{t_0}, X_{t_1}, \dots, X_{t_n})^{-1} = \nu \otimes K_{t_1} \otimes K_{t_2 - t_1} \otimes \cdots \otimes K_{t_n - t_{n-1}}. \tag{3.1.6}$$

Proof. First, (3.1.5) determines the marginal distribution of $(X_{t_0}, X_{t_1}, \dots, X_{t_n})$; so knowing all of (3.1.5) is equivalent to knowing $P_\nu$. Second, (3.1.5) and (3.1.6) are equivalent by the construction of $P \otimes K$; see WT, Chapter II.

"$\Rightarrow$": Suppose $X$ is Markov. Then
$$E[f_n(X_{t_n}) \mid \mathcal F^0_{t_{n-1}}] = (K_{t_n - t_{n-1}} f_n)(X_{t_{n-1}}) =: \tilde f_{n-1}(X_{t_{n-1}}).$$
So
$$E\bigg[\prod_{k=0}^n f_k(X_{t_k})\bigg] = E\bigg[\prod_{k=0}^{n-1} f_k(X_{t_k})\,E[f_n(X_{t_n}) \mid \mathcal F^0_{t_{n-1}}]\bigg] = E\bigg[\tilde f_{n-1}(X_{t_{n-1}}) \prod_{k=0}^{n-1} f_k(X_{t_k})\bigg].$$
Now iterate and use $X_{t_0} = X_0 \sim \nu$ to get (3.1.5).

"$\Leftarrow$": Conversely, suppose we have (3.1.5). Then we need to show that
$$E[f(X_{t+h})\,Z] = E\big[(K_h f)(X_t)\,Z\big] \qquad\text{for all bounded } \mathcal F^0_t\text{-measurable } Z.$$
Use the monotone class theorem (Theorem 0.0.1) with $M := \{\text{all products } \prod_{i=0}^m f_i(X_{t_i})\}$, where $t_0 \le \dots \le t_m \le t$ and the $f_i$ are bounded and measurable: it is enough to prove
$$E\bigg[f(X_{t+h}) \prod_{i=0}^m f_i(X_{t_i})\bigg] = E\bigg[(K_h f)(X_t) \prod_{i=0}^m f_i(X_{t_i})\bigg],$$
and this is immediate from the definition of $K_h f$ and from (3.1.5).
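Formula (3.1.6) is also a sampling recipe: draw $x_0$ from $\nu$, then successively draw $x_k$ from $K_{t_k - t_{k-1}}(x_{k-1}, \cdot)$; this is exactly the construction intuition spelled out next. A minimal sketch (an illustration added here, not from the notes), with the BM kernels $K_h(x, \cdot) = \mathcal N(x, h)$ and $\nu = \delta_0$:

```python
import numpy as np

# Illustration only (not from the notes): sample (X_{t_0}, ..., X_{t_n}) via
# (3.1.6): x_0 ~ nu, then x_k ~ K_{t_k - t_{k-1}}(x_{k-1}, .). Kernels here
# are the BM kernels N(x, h); nu = delta_0.
rng = np.random.default_rng(10)

def sample_marginals(times, rng):
    x, path = 0.0, [0.0]                 # x_0 ~ nu = delta_0
    for h in np.diff(times):
        x = rng.normal(x, np.sqrt(h))    # x_k ~ K_h(x_{k-1}, .)
        path.append(x)
    return np.array(path)

times = np.array([0.0, 0.5, 1.0, 2.0])
samples = np.array([sample_marginals(times, rng) for _ in range(100000)])
print(np.var(samples, axis=0))   # ~ (0, 0.5, 1.0, 2.0) = Var[X_t] = t for BM
```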

Next question: how can we actually construct a Markov distribution? Obvious idea: define the fdmds by (3.1.5) or (3.1.6) and check consistency. This works! Intuition: (3.1.5) says roughly
$$P\big[X_{t_0} \in (z_0, z_0 + dz_0], \dots, X_{t_n} \in (z_n, z_n + dz_n]\big] = \nu(dz_0)\,K_{t_1}(z_0, dz_1)\,K_{t_2 - t_1}(z_1, dz_2) \cdots K_{t_n - t_{n-1}}(z_{n-1}, dz_n),$$
i.e. choose a starting point $z_0$ according to $\nu$; then successively pass from $z_{k-1}$ at $t_{k-1}$ to some $z_k$ at $t_k$ according to $K_{t_k - t_{k-1}}(z_{k-1}, dz_k)$.

Now how about the existence of Markov processes?

Proposition 3.1.7. Suppose $S$ is Polish and $\mathcal S = \mathcal B(S)$. Take any initial distribution $\nu$ and any transition semigroup $(K_t)_{t \ge 0}$ on $(S, \mathcal S)$. Then there exists a Markov process corresponding to $\nu$ and $(K_t)$; more precisely, choose $\Omega := S^{[0,\infty)}$, $X := Y :=$ the coordinate process, $\mathcal F^0_t := \sigma(X_s;\ s \le t)$, $\mathcal F := \mathcal F^0_\infty := \sigma(X_s;\ s \ge 0)$; then there exists a unique probability measure $P$ on $(\Omega, \mathcal F)$ such that $X$ is $(P, \mathbb F^0)$-Markov with initial distribution $\nu$ and transition semigroup $(K_t)$.

Proof. For each finite $I \subseteq [0,\infty)$, define a probability measure $Q^{(I)}$ on $(S^I, \mathcal S^I)$ by the RHS of (3.1.6). Then (TS2) (the semigroup property $K_t K_s = K_{t+s}$) implies that the $Q^{(I)}$ are consistent (exercise). So the result follows from Kolmogorov's consistency theorem, WT II.3.5, and of course from Proposition 3.1.4.

Remarks: 1. For $\nu = \delta_x$, (3.1.6) implies that $x \mapsto E_x[Z]$ is measurable for any measurable $Z \ge 0$ on $S^{[0,\infty)}$; see WT, construction of $P \otimes K$.
2. Without extra conditions on $(K_t)$, the $X$ from Proposition 3.1.7 will not have nice paths; in other words, $P$ lives on all of $S^{[0,\infty)}$, it is not concentrated on $D(S)$. If $(K_t)$ is Feller (see later), things look better.

3.2 Markov property and strong Markov property

Intuition for the Markov property: predictions about the future given the past only depend on the present. This is a distributional property, and it is most conveniently formulated on path space. So: $\Omega = S^{[0,\infty)}$, $X = Y =$ coordinate process; $\mathbb F^0 =$ filtration generated by $X$; $\mathcal F = \mathcal F^0_\infty$; $P = P_\nu$ for some initial distribution $\nu$ and transition semigroup $(K_t)$. (If we need RCLL paths for $X$, then work on $D(S)$ instead; assume then that $(S, d)$ is metric and $\mathcal S = \mathcal B(S)$.)

Definition. The shift operator $\theta_t\colon \Omega \to \Omega$ (for $t \ge 0$) is defined by $(\theta_t\omega)(s) := \omega(t+s)$, i.e. $\theta_t\omega(\cdot) = \omega(t + \cdot)$: look at $\omega$ only on $[t,\infty)$. For a mapping $\tau\colon \Omega \to [0,\infty]$, define $\theta_\tau$ on $\{\tau < \infty\}$ by $\theta_\tau\omega := \theta_{\tau(\omega)}\omega$.

The past of $X$ up to time $t$ is $\mathcal F^0_t = \sigma(X_s;\ s \le t)$. The future of $X$ after time $t$ is $\hat{\mathcal F}_t := \sigma(X_u;\ u \ge t)$.

Then
$$\theta_t^{-1}(\mathcal F) = \theta_t^{-1}\big(\sigma(X_s;\ s \ge 0)\big) = \sigma\big(\underbrace{\theta_t^{-1}(X_s)}_{=\,X_{t+s}};\ s \ge 0\big) = \sigma(X_u;\ u \ge t) = \hat{\mathcal F}_t,$$
so $\theta_t$ is $\hat{\mathcal F}_t$-$\mathcal F$-measurable, and more precisely, all $\hat{\mathcal F}_t$-measurable random variables are of the form $Z \circ \theta_t$ for some $\mathcal F$-measurable $Z$, and vice versa.

Proposition 3.2.1 (Markov property). For any $t \ge 0$ and any $\mathcal F$-measurable $Z \ge 0$ on $\Omega$, we have
$$E_\nu[Z \circ \theta_t \mid \mathcal F^0_t] = E_{X_t}[Z] := E_x[Z]\big|_{x = X_t} \qquad P_\nu\text{-a.s.} \tag{3.2.2}$$

Remark: $Z \circ \theta_t$ on the LHS is a typical $\hat{\mathcal F}_t$-measurable random variable; so the LHS is the typical prediction of the future given the past. $x \mapsto E_x[Z]$ is measurable; so the RHS is a measurable function of the current state $X_t$ and so $\sigma(X_t)$-measurable. Conditioning on $\sigma(X_t)$ therefore gives
$$E_\nu[Z \circ \theta_t \mid \mathcal F^0_t] = E_\nu[Z \circ \theta_t \mid \sigma(X_t)],$$
and the RHS only depends on the present state $X_t$, or on the present $\sigma$-field $\sigma(X_t)$. In particular, $Z := I_{\{X_h \in A\}}$ gives $Z \circ \theta_t = I_{\{X_{t+h} \in A\}}$ and
$$P_\nu[X_{t+h} \in A \mid \mathcal F^0_t] = E_\nu[Z \circ \theta_t \mid \mathcal F^0_t] \overset{(3.2.2)}{=} E_{X_t}[Z] = P_{X_t}[X_h \in A] \overset{(3.1.6)}{=} K_h(X_t, A).$$
So (3.2.2) yields the interpretation of $K_h$ as the conditional distribution of $X_{t+h}$ given $\mathcal F^0_t$. The next proof shows that, conversely, (3.2.2) is just a reformulation of this.

Proof of Proposition 3.2.1. We need to show, for every bounded $\mathcal F^0_t$-measurable $U$,
$$E_\nu[(Z \circ \theta_t)\,U] = E_\nu\big[E_{X_t}[Z]\,U\big].$$
By the monotone class theorem (Theorem 0.0.1), it is enough to take $U$ of the form $U = \prod_{i=0}^n f_i(X_{t_i})$ with $f_i$ measurable and all $t_i \le t$, and $Z$ of the form $Z = \prod_{k=0}^m g_k(X_{s_k})$ with $g_k$ measurable and all $s_k \ge 0$. And then this equality follows directly from (3.1.5) in Proposition 3.1.4.

Example (Short-rate models in mathematical finance): Suppose $r = (r_t)_{t \ge 0}$ describes the instantaneous interest rate. A zero coupon bond with maturity $T$ pays out at time $T > 0$ the amount 1; the price of this at time $t \le T$ is given by
$$B_{t,T} = E_Q\Big[\exp\Big(-\int_t^T r_s\,ds\Big)\,\Big|\,\mathcal G_t\Big], \qquad 0 \le t \le T, \tag{3.2.4}$$
for some $Q \approx P$, if we want to avoid arbitrage.


More information

Verona Course April Lecture 1. Review of probability

Verona Course April Lecture 1. Review of probability Verona Course April 215. Lecture 1. Review of probability Viorel Barbu Al.I. Cuza University of Iaşi and the Romanian Academy A probability space is a triple (Ω, F, P) where Ω is an abstract set, F is

More information

An Introduction to Stochastic Processes in Continuous Time

An Introduction to Stochastic Processes in Continuous Time An Introduction to Stochastic Processes in Continuous Time Flora Spieksma adaptation of the text by Harry van Zanten to be used at your own expense May 22, 212 Contents 1 Stochastic Processes 1 1.1 Introduction......................................

More information

Exercises in stochastic analysis

Exercises in stochastic analysis Exercises in stochastic analysis Franco Flandoli, Mario Maurelli, Dario Trevisan The exercises with a P are those which have been done totally or partially) in the previous lectures; the exercises with

More information

Poisson random measure: motivation

Poisson random measure: motivation : motivation The Lévy measure provides the expected number of jumps by time unit, i.e. in a time interval of the form: [t, t + 1], and of a certain size Example: ν([1, )) is the expected number of jumps

More information

Random Process Lecture 1. Fundamentals of Probability

Random Process Lecture 1. Fundamentals of Probability Random Process Lecture 1. Fundamentals of Probability Husheng Li Min Kao Department of Electrical Engineering and Computer Science University of Tennessee, Knoxville Spring, 2016 1/43 Outline 2/43 1 Syllabus

More information

9 Brownian Motion: Construction

9 Brownian Motion: Construction 9 Brownian Motion: Construction 9.1 Definition and Heuristics The central limit theorem states that the standard Gaussian distribution arises as the weak limit of the rescaled partial sums S n / p n of

More information

Convergence of Feller Processes

Convergence of Feller Processes Chapter 15 Convergence of Feller Processes This chapter looks at the convergence of sequences of Feller processes to a iting process. Section 15.1 lays some ground work concerning weak convergence of processes

More information

X n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2)

X n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2) 14:17 11/16/2 TOPIC. Convergence in distribution and related notions. This section studies the notion of the so-called convergence in distribution of real random variables. This is the kind of convergence

More information

The strictly 1/2-stable example

The strictly 1/2-stable example The strictly 1/2-stable example 1 Direct approach: building a Lévy pure jump process on R Bert Fristedt provided key mathematical facts for this example. A pure jump Lévy process X is a Lévy process such

More information

Empirical Processes: General Weak Convergence Theory

Empirical Processes: General Weak Convergence Theory Empirical Processes: General Weak Convergence Theory Moulinath Banerjee May 18, 2010 1 Extended Weak Convergence The lack of measurability of the empirical process with respect to the sigma-field generated

More information

Exercises. T 2T. e ita φ(t)dt.

Exercises. T 2T. e ita φ(t)dt. Exercises. Set #. Construct an example of a sequence of probability measures P n on R which converge weakly to a probability measure P but so that the first moments m,n = xdp n do not converge to m = xdp.

More information

In terms of measures: Exercise 1. Existence of a Gaussian process: Theorem 2. Remark 3.

In terms of measures: Exercise 1. Existence of a Gaussian process: Theorem 2. Remark 3. 1. GAUSSIAN PROCESSES A Gaussian process on a set T is a collection of random variables X =(X t ) t T on a common probability space such that for any n 1 and any t 1,...,t n T, the vector (X(t 1 ),...,X(t

More information

6. Brownian Motion. Q(A) = P [ ω : x(, ω) A )

6. Brownian Motion. Q(A) = P [ ω : x(, ω) A ) 6. Brownian Motion. stochastic process can be thought of in one of many equivalent ways. We can begin with an underlying probability space (Ω, Σ, P) and a real valued stochastic process can be defined

More information

STAT 7032 Probability Spring Wlodek Bryc

STAT 7032 Probability Spring Wlodek Bryc STAT 7032 Probability Spring 2018 Wlodek Bryc Created: Friday, Jan 2, 2014 Revised for Spring 2018 Printed: January 9, 2018 File: Grad-Prob-2018.TEX Department of Mathematical Sciences, University of Cincinnati,

More information

Strong approximation for additive functionals of geometrically ergodic Markov chains

Strong approximation for additive functionals of geometrically ergodic Markov chains Strong approximation for additive functionals of geometrically ergodic Markov chains Florence Merlevède Joint work with E. Rio Université Paris-Est-Marne-La-Vallée (UPEM) Cincinnati Symposium on Probability

More information

Lecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past.

Lecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past. 1 Markov chain: definition Lecture 5 Definition 1.1 Markov chain] A sequence of random variables (X n ) n 0 taking values in a measurable state space (S, S) is called a (discrete time) Markov chain, if

More information

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R.

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R. Ergodic Theorems Samy Tindel Purdue University Probability Theory 2 - MA 539 Taken from Probability: Theory and examples by R. Durrett Samy T. Ergodic theorems Probability Theory 1 / 92 Outline 1 Definitions

More information

Weak convergence and Brownian Motion. (telegram style notes) P.J.C. Spreij

Weak convergence and Brownian Motion. (telegram style notes) P.J.C. Spreij Weak convergence and Brownian Motion (telegram style notes) P.J.C. Spreij this version: December 8, 2006 1 The space C[0, ) In this section we summarize some facts concerning the space C[0, ) of real

More information

LECTURE 2: LOCAL TIME FOR BROWNIAN MOTION

LECTURE 2: LOCAL TIME FOR BROWNIAN MOTION LECTURE 2: LOCAL TIME FOR BROWNIAN MOTION We will define local time for one-dimensional Brownian motion, and deduce some of its properties. We will then use the generalized Ray-Knight theorem proved in

More information

PROBABILITY THEORY II

PROBABILITY THEORY II Ruprecht-Karls-Universität Heidelberg Institut für Angewandte Mathematik Prof. Dr. Jan JOHANNES Outline of the lecture course PROBABILITY THEORY II Summer semester 2016 Preliminary version: April 21, 2016

More information

Fundamental Inequalities, Convergence and the Optional Stopping Theorem for Continuous-Time Martingales

Fundamental Inequalities, Convergence and the Optional Stopping Theorem for Continuous-Time Martingales Fundamental Inequalities, Convergence and the Optional Stopping Theorem for Continuous-Time Martingales Prakash Balachandran Department of Mathematics Duke University April 2, 2008 1 Review of Discrete-Time

More information

4 Sums of Independent Random Variables

4 Sums of Independent Random Variables 4 Sums of Independent Random Variables Standing Assumptions: Assume throughout this section that (,F,P) is a fixed probability space and that X 1, X 2, X 3,... are independent real-valued random variables

More information

Part III Advanced Probability

Part III Advanced Probability Part III Advanced Probability Based on lectures by M. Lis Notes taken by Dexter Chua Michaelmas 2017 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after

More information

GAUSSIAN PROCESSES; KOLMOGOROV-CHENTSOV THEOREM

GAUSSIAN PROCESSES; KOLMOGOROV-CHENTSOV THEOREM GAUSSIAN PROCESSES; KOLMOGOROV-CHENTSOV THEOREM STEVEN P. LALLEY 1. GAUSSIAN PROCESSES: DEFINITIONS AND EXAMPLES Definition 1.1. A standard (one-dimensional) Wiener process (also called Brownian motion)

More information

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9 MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended

More information

MARTINGALES: VARIANCE. EY = 0 Var (Y ) = σ 2

MARTINGALES: VARIANCE. EY = 0 Var (Y ) = σ 2 MARTINGALES: VARIANCE SIMPLEST CASE: X t = Y 1 +... + Y t IID SUM V t = X 2 t σ 2 t IS A MG: EY = 0 Var (Y ) = σ 2 V t+1 = (X t + Y t+1 ) 2 σ 2 (t + 1) = V t + 2X t Y t+1 + Y 2 t+1 σ 2 E(V t+1 F t ) =

More information

Lecture 2. We now introduce some fundamental tools in martingale theory, which are useful in controlling the fluctuation of martingales.

Lecture 2. We now introduce some fundamental tools in martingale theory, which are useful in controlling the fluctuation of martingales. Lecture 2 1 Martingales We now introduce some fundamental tools in martingale theory, which are useful in controlling the fluctuation of martingales. 1.1 Doob s inequality We have the following maximal

More information

Solution for Problem 7.1. We argue by contradiction. If the limit were not infinite, then since τ M (ω) is nondecreasing we would have

Solution for Problem 7.1. We argue by contradiction. If the limit were not infinite, then since τ M (ω) is nondecreasing we would have 362 Problem Hints and Solutions sup g n (ω, t) g(ω, t) sup g(ω, s) g(ω, t) µ n (ω). t T s,t: s t 1/n By the uniform continuity of t g(ω, t) on [, T], one has for each ω that µ n (ω) as n. Two applications

More information

Introduction to Random Diffusions

Introduction to Random Diffusions Introduction to Random Diffusions The main reason to study random diffusions is that this class of processes combines two key features of modern probability theory. On the one hand they are semi-martingales

More information

A D VA N C E D P R O B A B I L - I T Y

A D VA N C E D P R O B A B I L - I T Y A N D R E W T U L L O C H A D VA N C E D P R O B A B I L - I T Y T R I N I T Y C O L L E G E T H E U N I V E R S I T Y O F C A M B R I D G E Contents 1 Conditional Expectation 5 1.1 Discrete Case 6 1.2

More information

Math 6810 (Probability) Fall Lecture notes

Math 6810 (Probability) Fall Lecture notes Math 6810 (Probability) Fall 2012 Lecture notes Pieter Allaart University of North Texas September 23, 2012 2 Text: Introduction to Stochastic Calculus with Applications, by Fima C. Klebaner (3rd edition),

More information

Some Background Material

Some Background Material Chapter 1 Some Background Material In the first chapter, we present a quick review of elementary - but important - material as a way of dipping our toes in the water. This chapter also introduces important

More information

STAT 331. Martingale Central Limit Theorem and Related Results

STAT 331. Martingale Central Limit Theorem and Related Results STAT 331 Martingale Central Limit Theorem and Related Results In this unit we discuss a version of the martingale central limit theorem, which states that under certain conditions, a sum of orthogonal

More information

Reflected Brownian Motion

Reflected Brownian Motion Chapter 6 Reflected Brownian Motion Often we encounter Diffusions in regions with boundary. If the process can reach the boundary from the interior in finite time with positive probability we need to decide

More information

Probability and Measure

Probability and Measure Part II Year 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2018 84 Paper 4, Section II 26J Let (X, A) be a measurable space. Let T : X X be a measurable map, and µ a probability

More information

Dynkin (λ-) and π-systems; monotone classes of sets, and of functions with some examples of application (mainly of a probabilistic flavor)

Dynkin (λ-) and π-systems; monotone classes of sets, and of functions with some examples of application (mainly of a probabilistic flavor) Dynkin (λ-) and π-systems; monotone classes of sets, and of functions with some examples of application (mainly of a probabilistic flavor) Matija Vidmar February 7, 2018 1 Dynkin and π-systems Some basic

More information

On pathwise stochastic integration

On pathwise stochastic integration On pathwise stochastic integration Rafa l Marcin Lochowski Afican Institute for Mathematical Sciences, Warsaw School of Economics UWC seminar Rafa l Marcin Lochowski (AIMS, WSE) On pathwise stochastic

More information

JUSTIN HARTMANN. F n Σ.

JUSTIN HARTMANN. F n Σ. BROWNIAN MOTION JUSTIN HARTMANN Abstract. This paper begins to explore a rigorous introduction to probability theory using ideas from algebra, measure theory, and other areas. We start with a basic explanation

More information

Risk-Minimality and Orthogonality of Martingales

Risk-Minimality and Orthogonality of Martingales Risk-Minimality and Orthogonality of Martingales Martin Schweizer Universität Bonn Institut für Angewandte Mathematik Wegelerstraße 6 D 53 Bonn 1 (Stochastics and Stochastics Reports 3 (199, 123 131 2

More information

Stochastic integration. P.J.C. Spreij

Stochastic integration. P.J.C. Spreij Stochastic integration P.J.C. Spreij this version: April 22, 29 Contents 1 Stochastic processes 1 1.1 General theory............................... 1 1.2 Stopping times...............................

More information

SUMMARY OF RESULTS ON PATH SPACES AND CONVERGENCE IN DISTRIBUTION FOR STOCHASTIC PROCESSES

SUMMARY OF RESULTS ON PATH SPACES AND CONVERGENCE IN DISTRIBUTION FOR STOCHASTIC PROCESSES SUMMARY OF RESULTS ON PATH SPACES AND CONVERGENCE IN DISTRIBUTION FOR STOCHASTIC PROCESSES RUTH J. WILLIAMS October 2, 2017 Department of Mathematics, University of California, San Diego, 9500 Gilman Drive,

More information

Part II Probability and Measure

Part II Probability and Measure Part II Probability and Measure Theorems Based on lectures by J. Miller Notes taken by Dexter Chua Michaelmas 2016 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

Elements of Probability Theory

Elements of Probability Theory Elements of Probability Theory CHUNG-MING KUAN Department of Finance National Taiwan University December 5, 2009 C.-M. Kuan (National Taiwan Univ.) Elements of Probability Theory December 5, 2009 1 / 58

More information

Inference for Stochastic Processes

Inference for Stochastic Processes Inference for Stochastic Processes Robert L. Wolpert Revised: June 19, 005 Introduction A stochastic process is a family {X t } of real-valued random variables, all defined on the same probability space

More information

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure? MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due ). Show that the open disk x 2 + y 2 < 1 is a countable union of planar elementary sets. Show that the closed disk x 2 + y 2 1 is a countable

More information

STOCHASTIC CALCULUS JASON MILLER AND VITTORIA SILVESTRI

STOCHASTIC CALCULUS JASON MILLER AND VITTORIA SILVESTRI STOCHASTIC CALCULUS JASON MILLER AND VITTORIA SILVESTRI Contents Preface 1 1. Introduction 1 2. Preliminaries 4 3. Local martingales 1 4. The stochastic integral 16 5. Stochastic calculus 36 6. Applications

More information

Brownian Motion and Stochastic Calculus

Brownian Motion and Stochastic Calculus ETHZ, Spring 17 D-MATH Prof Dr Martin Larsson Coordinator A Sepúlveda Brownian Motion and Stochastic Calculus Exercise sheet 6 Please hand in your solutions during exercise class or in your assistant s

More information

Selected Exercises on Expectations and Some Probability Inequalities

Selected Exercises on Expectations and Some Probability Inequalities Selected Exercises on Expectations and Some Probability Inequalities # If E(X 2 ) = and E X a > 0, then P( X λa) ( λ) 2 a 2 for 0 < λ

More information

Jump Processes. Richard F. Bass

Jump Processes. Richard F. Bass Jump Processes Richard F. Bass ii c Copyright 214 Richard F. Bass Contents 1 Poisson processes 1 1.1 Definitions............................. 1 1.2 Stopping times.......................... 3 1.3 Markov

More information

Formulas for probability theory and linear models SF2941

Formulas for probability theory and linear models SF2941 Formulas for probability theory and linear models SF2941 These pages + Appendix 2 of Gut) are permitted as assistance at the exam. 11 maj 2008 Selected formulae of probability Bivariate probability Transforms

More information

A Concise Course on Stochastic Partial Differential Equations

A Concise Course on Stochastic Partial Differential Equations A Concise Course on Stochastic Partial Differential Equations Michael Röckner Reference: C. Prevot, M. Röckner: Springer LN in Math. 1905, Berlin (2007) And see the references therein for the original

More information

Probability Theory I: Syllabus and Exercise

Probability Theory I: Syllabus and Exercise Probability Theory I: Syllabus and Exercise Narn-Rueih Shieh **Copyright Reserved** This course is suitable for those who have taken Basic Probability; some knowledge of Real Analysis is recommended( will

More information

A NOTE ON STOCHASTIC INTEGRALS AS L 2 -CURVES

A NOTE ON STOCHASTIC INTEGRALS AS L 2 -CURVES A NOTE ON STOCHASTIC INTEGRALS AS L 2 -CURVES STEFAN TAPPE Abstract. In a work of van Gaans (25a) stochastic integrals are regarded as L 2 -curves. In Filipović and Tappe (28) we have shown the connection

More information

FOUNDATIONS OF MARTINGALE THEORY AND STOCHASTIC CALCULUS FROM A FINANCE PERSPECTIVE

FOUNDATIONS OF MARTINGALE THEORY AND STOCHASTIC CALCULUS FROM A FINANCE PERSPECTIVE FOUNDATIONS OF MARTINGALE THEORY AND STOCHASTIC CALCULUS FROM A FINANCE PERSPECTIVE JOSEF TEICHMANN 1. Introduction The language of mathematical Finance allows to express many results of martingale theory

More information

Stochastic Calculus and Black-Scholes Theory MTH772P Exercises Sheet 1

Stochastic Calculus and Black-Scholes Theory MTH772P Exercises Sheet 1 Stochastic Calculus and Black-Scholes Theory MTH772P Exercises Sheet. For ξ, ξ 2, i.i.d. with P(ξ i = ± = /2 define the discrete-time random walk W =, W n = ξ +... + ξ n. (i Formulate and prove the property

More information

The concentration of a drug in blood. Exponential decay. Different realizations. Exponential decay with noise. dc(t) dt.

The concentration of a drug in blood. Exponential decay. Different realizations. Exponential decay with noise. dc(t) dt. The concentration of a drug in blood Exponential decay C12 concentration 2 4 6 8 1 C12 concentration 2 4 6 8 1 dc(t) dt = µc(t) C(t) = C()e µt 2 4 6 8 1 12 time in minutes 2 4 6 8 1 12 time in minutes

More information

Stochastic Analysis. King s College London Version 1.4 October Markus Riedle

Stochastic Analysis. King s College London Version 1.4 October Markus Riedle Stochastic Analysis King s College London Version 1.4 October 215 Markus Riedle Preface These notes are for a course on Stochastic Analysis at King s College London. Given the limited time and diverse

More information

I. ANALYSIS; PROBABILITY

I. ANALYSIS; PROBABILITY ma414l1.tex Lecture 1. 12.1.2012 I. NLYSIS; PROBBILITY 1. Lebesgue Measure and Integral We recall Lebesgue measure (M411 Probability and Measure) λ: defined on intervals (a, b] by λ((a, b]) := b a (so

More information

Beyond the color of the noise: what is memory in random phenomena?

Beyond the color of the noise: what is memory in random phenomena? Beyond the color of the noise: what is memory in random phenomena? Gennady Samorodnitsky Cornell University September 19, 2014 Randomness means lack of pattern or predictability in events according to

More information

Probability Theory. Richard F. Bass

Probability Theory. Richard F. Bass Probability Theory Richard F. Bass ii c Copyright 2014 Richard F. Bass Contents 1 Basic notions 1 1.1 A few definitions from measure theory............. 1 1.2 Definitions............................. 2

More information

Measure and integration

Measure and integration Chapter 5 Measure and integration In calculus you have learned how to calculate the size of different kinds of sets: the length of a curve, the area of a region or a surface, the volume or mass of a solid.

More information

Weak convergence and large deviation theory

Weak convergence and large deviation theory First Prev Next Go To Go Back Full Screen Close Quit 1 Weak convergence and large deviation theory Large deviation principle Convergence in distribution The Bryc-Varadhan theorem Tightness and Prohorov

More information

Lecture 4: Introduction to stochastic processes and stochastic calculus

Lecture 4: Introduction to stochastic processes and stochastic calculus Lecture 4: Introduction to stochastic processes and stochastic calculus Cédric Archambeau Centre for Computational Statistics and Machine Learning Department of Computer Science University College London

More information

An Introduction to Malliavin calculus and its applications

An Introduction to Malliavin calculus and its applications An Introduction to Malliavin calculus and its applications Lecture 3: Clark-Ocone formula David Nualart Department of Mathematics Kansas University University of Wyoming Summer School 214 David Nualart

More information

P (A G) dp G P (A G)

P (A G) dp G P (A G) First homework assignment. Due at 12:15 on 22 September 2016. Homework 1. We roll two dices. X is the result of one of them and Z the sum of the results. Find E [X Z. Homework 2. Let X be a r.v.. Assume

More information

25.1 Ergodicity and Metric Transitivity

25.1 Ergodicity and Metric Transitivity Chapter 25 Ergodicity This lecture explains what it means for a process to be ergodic or metrically transitive, gives a few characterizes of these properties (especially for AMS processes), and deduces

More information

n [ F (b j ) F (a j ) ], n j=1(a j, b j ] E (4.1)

n [ F (b j ) F (a j ) ], n j=1(a j, b j ] E (4.1) 1.4. CONSTRUCTION OF LEBESGUE-STIELTJES MEASURES In this section we shall put to use the Carathéodory-Hahn theory, in order to construct measures with certain desirable properties first on the real line

More information

Homework # , Spring Due 14 May Convergence of the empirical CDF, uniform samples

Homework # , Spring Due 14 May Convergence of the empirical CDF, uniform samples Homework #3 36-754, Spring 27 Due 14 May 27 1 Convergence of the empirical CDF, uniform samples In this problem and the next, X i are IID samples on the real line, with cumulative distribution function

More information

for all f satisfying E[ f(x) ] <.

for all f satisfying E[ f(x) ] <. . Let (Ω, F, P ) be a probability space and D be a sub-σ-algebra of F. An (H, H)-valued random variable X is independent of D if and only if P ({X Γ} D) = P {X Γ}P (D) for all Γ H and D D. Prove that if

More information