STOCHASTIC CALCULUS

JASON MILLER AND VITTORIA SILVESTRI


Contents

Preface
1. Introduction
2. Preliminaries
3. Local martingales
4. The stochastic integral
5. Stochastic calculus
6. Applications
7. Stochastic differential equations
8. Diffusion processes

Preface

These lecture notes are for the University of Cambridge Part III course Stochastic Calculus, given Lent 2017. The contents are very closely based on a set of lecture notes for this course due to Christina Goldschmidt. Please notify vs358@cam.ac.uk of comments and corrections.

1. Introduction

In ordinary calculus, one learns how to integrate, differentiate, and solve ordinary differential equations. In this course, we will develop the theory for the stochastic analogues of these constructions: the Itô integral, the Itô derivative, and stochastic differential equations (SDEs).

Let us start with an example. Suppose a gambler is repeatedly tossing a fair coin, while betting 1 on heads at each coin toss. Let ξ_1, ξ_2, ξ_3, ... be independent random variables denoting the consecutive outcomes, i.e.

ξ_k = +1 if heads at the k-th coin toss, and ξ_k = -1 otherwise,   (ξ_k)_{k≥1} i.i.d.
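This coin-tossing model is easy to simulate. The following sketch (my own illustration, not part of the notes; parameters are arbitrary) checks that the game is fair: the partial sums of the ξ_k have sample mean near 0 and variance near n.

```python
# A quick simulation of the coin-tossing game (my own sketch; parameters are
# arbitrary): the partial sums X_n = xi_1 + ... + xi_n form a fair game, so
# the sample mean of X_n should be near 0 and its variance near n.
import random

random.seed(0)

def net_winnings(n):
    """X_n for one realisation of n fair coin tosses at unit stakes."""
    return sum(random.choice([1, -1]) for _ in range(n))

n, n_samples = 100, 20000
samples = [net_winnings(n) for _ in range(n_samples)]
mean_x = sum(samples) / n_samples
var_x = sum(s * s for s in samples) / n_samples
```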

Then the gambler's net winnings after n coin tosses are given by

X_n := ξ_1 + ξ_2 + ... + ξ_n.

Note that (X_n)_{n≥0} is a simple random walk on Z starting at X_0 = 0, and it is therefore a discrete time martingale with respect to the filtration (F_n)_{n≥0} generated by the outcomes, i.e. F_n = σ(ξ_1, ..., ξ_n).

Now suppose that, at the m-th coin toss, the gambler bets an amount H_m on heads (we can also allow H_m to be negative, by interpreting it as a bet on tails). Take, for now, (H_m)_{m≥1} to be deterministic (for example, the gambler could have decided the sequence of bets in advance). Then it is easy to see that

(1.1)   (H · X)_n := Σ_{k=1}^n H_k (X_k - X_{k-1})

gives the net winnings after n coin tosses. We claim that the stochastic process H · X is a martingale with respect to (F_n)_{n≥0}. Indeed, integrability and adaptedness are clear. Furthermore,

(1.2)   E((H · X)_{n+1} - (H · X)_n | F_n) = H_{n+1} E(X_{n+1} - X_n | F_n) = 0

for all n ≥ 0, which shows that H · X is a martingale. In fact, this property is much more general. Indeed, instead of deciding his bets in advance, the gambler might want to allow his (n+1)-th bet to depend on the first n outcomes. That is, we can take H_{n+1} to be random, as long as it is F_n-measurable. Such processes are called previsible, and they will play a fundamental role in this course. In this more general setting, (H · X)_n still represents the net winnings after n bets, and (1.2) still holds, so that H · X is again a discrete time martingale. For this reason, the process H · X is usually referred to as a discrete martingale transform.

Remark 1.1. This teaches us that we cannot make money by gambling on a fair game!

Our goal for the first part of the course is to generalise the above reasoning to the continuous time setting, i.e.
to define the process ∫_0^t H_s dX_s for (H_t)_{t≥0} a suitable previsible¹ process and (X_t)_{t≥0} a continuous martingale (think, for example, of X as being standard one-dimensional Brownian Motion, the analogue of a simple random walk in the continuum). To achieve this, as a first attempt one could try to generalise the Lebesgue-Stieltjes theory of integration to more general integrators: this will lead us to talk about finite variation processes. Unfortunately, as we will see, the only continuous martingales which have finite variation are the constant ones, so a new theory is needed in order to integrate, for example, with respect to Brownian Motion. This is what is commonly called Itô's theory of integration. To see how one could go about building this new theory, note that definition (1.1) is

¹ At this point we have not yet defined the notion of previsible process in the continuous time setting: this will be clarified later in the course.

perfectly well-posed in continuous time whenever the integrand H is piecewise constant, taking only finitely many values. In order to accommodate more general integrands, one could then think to take limits, setting, for example,

(1.3)   (H · X)_t := lim_{ε→0} Σ_{k=0}^{⌊t/ε⌋} H_{kε} (X_{(k+1)ε} - X_{kε}).

On the other hand, one quickly realises that the r.h.s. above in general does not converge in the usual sense, due to the roughness of X (think, for example, of Brownian Motion). What is the right notion of convergence that makes the above limit finite? As we will see in the construction of the Itô integral, to get convergence one has to exploit the cancellations in the above sum. As an example, take X to be Brownian Motion and H deterministic, and observe that

E[ ( Σ_{k=0}^{⌊t/ε⌋} H_{kε} (X_{(k+1)ε} - X_{kε}) )² ]
   = E[ Σ_{k=0}^{⌊t/ε⌋} H_{kε}² (X_{(k+1)ε} - X_{kε})² + Σ_{j≠k} H_{jε} H_{kε} (X_{(k+1)ε} - X_{kε})(X_{(j+1)ε} - X_{jε}) ]
   = E[ Σ_{k=0}^{⌊t/ε⌋} H_{kε}² (X_{(k+1)ε} - X_{kε})² ]
   = ε Σ_{k=0}^{⌊t/ε⌋} H_{kε}²  →  ∫_0^t H_s² ds   as ε → 0,

where the cancellations are due to the orthogonality of the martingale increments. These types of considerations will allow us to provide a rigorous construction of the Itô integral for X a continuous martingale (such as Brownian Motion), and in fact even for slightly more general integrators, namely local martingales and semimartingales. Once the stochastic integral has been defined, we will discuss its properties and applications. We will look, for example, at iterated integrals, the stochastic chain rule and stochastic integration by parts. Writing, to shorten the notation, dY_t = H_t dX_t to mean Y_t = Y_0 + ∫_0^t H_s dX_s, we will learn how to express df(Y_t) in terms of dY_t by means of the celebrated Itô formula. This is an extremely useful tool, of which we will present several applications. For example, we will be able to show that any continuous martingale is a time-changed Brownian Motion (this is the so-called Dubins-Schwarz theorem).
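The cancellation just computed can be seen numerically. In the following sketch (my own illustration, not from the notes) the integrand H_s = s is deterministic: the approximating sums have mean 0 and variance close to ∫_0^1 s² ds = 1/3, even though they have no pathwise Stieltjes limit.

```python
# Numerical sketch of the cancellation above (my own illustration): for
# deterministic H_s = s and X a Brownian motion, the sum
# S_eps = sum_k H_{k eps}(X_{(k+1)eps} - X_{k eps}) has mean 0 and
# variance close to int_0^1 s^2 ds = 1/3.
import random, math

random.seed(1)

def ito_sum(n_steps=256):
    eps = 1.0 / n_steps
    total, t = 0.0, 0.0
    for _ in range(n_steps):
        db = random.gauss(0.0, math.sqrt(eps))  # X_{(k+1)eps} - X_{k eps}
        total += t * db                          # H_{k eps} = k * eps
        t += eps
    return total

n_samples = 5000
vals = [ito_sum() for _ in range(n_samples)]
mean_s = sum(vals) / n_samples
var_s = sum(v * v for v in vals) / n_samples
```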
In the second part of the course we will then focus on the study of Stochastic Differential Equations (SDEs in short), that is, equations of the form

(1.4)   dX_t = b(t, X_t) dt + σ(t, X_t) dB_t,   X_0 = x_0,

for suitably nice functions b, σ. These can be thought of as arising from randomly perturbed ODEs. To clarify this point, suppose we are given the ODE

(1.5)   dx(t)/dt = b(t, x(t)),   x(0) = x_0,

which writes equivalently as

x(t) = x_0 + ∫_0^t b(s, x(s)) ds.

In many applications we may wish to perturb (1.5) by adding some noise, say Brownian noise. This gives a new (stochastic) equation

X_t = x_0 + ∫_0^t b(s, X_s) ds + σ B_t,

where the real parameter σ controls the intensity of the noise. In fact, we may also want to allow such intensity to depend on the current time and state of the system, to get

X_t = x_0 + ∫_0^t b(s, X_s) ds + ∫_0^t σ(s, X_s) dB_s,

where the last term is an Itô integral. This, written in differential form, gives (1.4). For such SDEs we will present several notions of solution, and discuss under which conditions on the functions b and σ such solutions exist and are unique.

Finally, in the last part of the course we will talk about diffusion processes, which are Markov processes characterised by martingale properties. We will explain how such processes can be constructed from SDEs, and how they can be used to build solutions to certain (deterministic) PDEs involving second order elliptic operators.

2. Preliminaries

2.1. Finite variation integrals. Recall that a function a : [0, ∞) → R is said to be càdlàg if it is right-continuous and has left limits. In other words, for all x ∈ (0, ∞) we have that both

lim_{y↑x} a(y) exists   and   lim_{y↓x} a(y) = a(x).

We write a(x-) for the left limit at x, and let Δa(x) := a(x) - a(x-). Assume that a is non-decreasing and càdlàg with a(0) = 0. Then there exists a unique Borel measure da on [0, ∞) such that for all 0 ≤ s < t we have that da((s, t]) = a(t) - a(s). For any measurable and integrable function f : [0, ∞) → R we can define the Lebesgue-Stieltjes integral f · a by setting

(f · a)(t) = ∫_{(0,t]} f(s) da(s)

for all t ≥ 0. It is easy to see that f · a is right-continuous. Moreover, if a is continuous then f · a is itself continuous. In this case, we can write

∫_{(0,t]} f(s) da(s) = ∫_0^t f(s) da(s)

unambiguously. We are now interested in enlarging the class of functions a against which we can integrate. A first, trivial observation is that we can use linearity to integrate against functions a which write as a = a^+ - a^-, for some a^+, a^- satisfying the conditions above. Then we set

(f · a)(t) := (f · a^+)(t) - (f · a^-)(t)

for all measurable functions f and all t ≥ 0, provided both terms on the r.h.s. are finite. How large is the class of such functions? It turns out to coincide exactly with the class of finite variation functions.

Definition 2.1. Let a : [0, ∞) → R be a càdlàg function. For any n ∈ N and t ≥ 0, let

(2.1)   v_n(t) = Σ_{k=0}^{⌈2^n t⌉ - 1} |a((k+1)2^{-n}) - a(k 2^{-n})|.

Then v(t) := lim_{n→∞} v_n(t) exists for all t ≥ 0, and it is called the total variation of a on (0, t]. If v(t) < ∞ then a is said to have finite variation on (0, t]. We say that a is of finite variation if it is of finite variation on (0, t] for all t ≥ 0.

To show that the above definition makes sense we have to show that lim_{n→∞} v_n(t) exists for all t. Indeed, let t_n^+ = 2^{-n} ⌈2^n t⌉ and t_n^- = 2^{-n} (⌈2^n t⌉ - 1), and note that t_n^- < t ≤ t_n^+ for all n. We have

v_n(t) = Σ_{k=0}^{2^n t_n^- - 1} |a((k+1)2^{-n}) - a(k 2^{-n})| + |a(t_n^+) - a(t_n^-)|.

Now, the first term is non-decreasing in n by the triangle inequality, and hence it has a limit as n → ∞. Moreover, by the càdlàg property the second term converges to |Δa(t)| as n → ∞. In all, v_n(t) has a limit as n → ∞.

Lemma 2.2. Let a : [0, ∞) → R be a càdlàg function of finite variation, that is, v(t) < ∞ for all t ≥ 0. Then v is also càdlàg, with Δv(t) = |Δa(t)|, and non-decreasing in t. In particular, if a is continuous then v is also continuous.

Proof. See Example Sheet 1.

We can now state the aforementioned characterisation result.

Proposition 2.3. A càdlàg function a : [0, ∞) → R can be written as a difference of two right-continuous, non-decreasing functions if and only if a is of finite variation.

Proof. Assume that a = a^+ - a^-, for a^+, a^- non-decreasing, right-continuous functions. We now show that a is of finite variation. Indeed, for any s ≤ t we have

|a(t) - a(s)| ≤ (a^+(t) - a^+(s)) + (a^-(t) - a^-(s)),

where both terms are non-negative since a^+, a^- are non-decreasing. Plugging this into (2.1), and observing that the summations bounding the total variation of a^+, a^- are telescopic, we gather that

v_n(t) ≤ a^+(t_n^+) - a^+(0) + a^-(t_n^+) - a^-(0).

Being a^+, a^- càdlàg, the r.h.s. converges to a^+(t) - a^+(0) + a^-(t) - a^-(0), which is finite. Since t is arbitrary, this shows that v(t) < ∞ for all t ≥ 0, i.e. a is of finite variation.

We now prove the reverse implication by constructing a^+ and a^- explicitly. Assume that v(t) < ∞ for all t ≥ 0. Set

a^+ = (v + a)/2,   a^- = (v - a)/2.

Then clearly a = a^+ - a^-, and a^+, a^- are càdlàg since v and a are. It remains to show that they are non-decreasing. For s < t, define t_n^± as above, and s_n^± analogously. Then we have

a^+(t) - a^+(s) = lim_n (1/2) [ (v_n(t) - v_n(s)) + (a(t) - a(s)) ]
   = lim_n (1/2) [ Σ_{k=2^n s_n^+}^{2^n t_n^- - 1} ( |a((k+1)2^{-n}) - a(k 2^{-n})| + (a((k+1)2^{-n}) - a(k 2^{-n})) ) + ( |a(t_n^+) - a(t_n^-)| + (a(t_n^+) - a(t_n^-)) ) ] ≥ 0,

since each bracket is of the form |x| + x ≥ 0. The same argument can be used to show that a^- is non-decreasing. □

2.2. Random integrators. We now discuss integration with respect to random finite variation functions. Let (Ω, F, (F_t)_{t≥0}, P) denote a filtered probability space. Recall that a process X : Ω × [0, ∞) → R is adapted to the filtration (F_t)_{t≥0} if X_t = X(·, t) is F_t-measurable for all t ≥ 0. Moreover, X is càdlàg if X(ω, ·) is càdlàg for all ω ∈ Ω (that is, if all realizations of X are càdlàg).

Definition 2.4. Given a càdlàg adapted process A : Ω × [0, ∞) → R, its total variation process V : Ω × [0, ∞) → [0, ∞) is defined pathwise by letting V(ω, ·) be the total variation of A(ω, ·), for all ω ∈ Ω. We say that A is of finite variation if A(ω, ·) is of finite variation for all ω ∈ Ω (that is, if all realizations of A are of finite variation).
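Both Definition 2.1 and the decomposition in the proof of Proposition 2.3 are easy to check numerically. The following sketch (my own illustration; the test function a is an arbitrary choice, not from the notes) computes the dyadic sums v_n and verifies that a^+ = (v + a)/2 and a^- = (v - a)/2 are non-decreasing along a grid.

```python
# Illustration of Definition 2.1 and Proposition 2.3 (my own example):
# approximate the total variation of a(t) = cos(4*pi*t) on (0, 1] by dyadic
# sums, then check that a+ = (v+a)/2 and a- = (v-a)/2 are non-decreasing.
# Since a is monotone between consecutive extrema t = k/4, the dyadic sum
# equals the total variation 8 once the mesh refines the quarters.
import math

a = lambda t: math.cos(4 * math.pi * t)

def v_dyadic(t, n):
    """v_n(t) = sum_k |a((k+1)2^-n) - a(k 2^-n)| over (0, t]."""
    h = 2.0 ** (-n)
    return sum(abs(a((k + 1) * h) - a(k * h)) for k in range(math.ceil(t / h)))

v_total = v_dyadic(1.0, 12)

# Jordan-type decomposition along a grid: running variation v, then a = a+ - a-.
grid = [k / 1024 for k in range(1025)]
a_vals = [a(t) for t in grid]
v_vals = [0.0]
for prev, cur in zip(a_vals, a_vals[1:]):
    v_vals.append(v_vals[-1] + abs(cur - prev))
a_plus = [(v + x) / 2 for v, x in zip(v_vals, a_vals)]
a_minus = [(v - x) / 2 for v, x in zip(v_vals, a_vals)]
plus_ok = all(y >= x - 1e-12 for x, y in zip(a_plus, a_plus[1:]))
minus_ok = all(y >= x - 1e-12 for x, y in zip(a_minus, a_minus[1:]))
```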

Lemma 2.5. Let A be a càdlàg, adapted process of finite variation V. Then V is càdlàg, adapted and pathwise non-decreasing.

Proof. By Lemma 2.2, for all ω ∈ Ω the path V(ω, ·) is càdlàg and non-decreasing, so the process V is càdlàg and pathwise non-decreasing. To see that it is adapted, recall that for any t we define t_n^- = 2^{-n} (⌈2^n t⌉ - 1), and introduce for all n the process

Ṽ_t^n = Σ_{k=0}^{2^n t_n^- - 1} |A_{(k+1)2^{-n}} - A_{k 2^{-n}}|.

Then Ṽ^n is adapted for all n, since t_n^- ≤ t. Moreover,

V_t = lim_{n→∞} Ṽ_t^n + |ΔA_t|,

which shows that V_t is F_t-measurable, since both terms on the r.h.s. are F_t-measurable. □

Our next goal is to be able to integrate random functions against A. Clearly, we can define the integral pathwise, but doing so does not guarantee that the resulting process is both càdlàg and adapted. To have this, we need to restrict our class of integrands to so-called previsible processes.

2.2.1. Previsible processes. Recall from the introduction that a discrete time process H = (H_n)_{n≥1} is previsible with respect to a filtration (F_n)_{n≥0} if H_{n+1} is F_n-measurable for all n ≥ 0. Here is the analogous definition in continuous time.

Definition 2.6. The previsible σ-algebra P on Ω × (0, ∞) is the σ-algebra generated by sets of the form E × (s, t], for E ∈ F_s and s < t. A process H : Ω × (0, ∞) → R is said to be previsible if it is measurable with respect to P.

It is easy to see directly from the definition that, for example, any process of the form

H(ω, t) = Z(ω) 1_{(t_1,t_2]}(t),   Z F_{t_1}-measurable,

is previsible. Thus clearly if we let

(2.2)   H(ω, t) = Σ_{k=0}^{n-1} Z_k(ω) 1_{(t_k,t_{k+1}]}(t)

for some n ∈ N, 0 = t_0 < t_1 < ... < t_n < ∞ and Z_k F_{t_k}-measurable for all k, then H is previsible. Processes of the form (2.2) are called simple, and they will play an important role in the construction of the Itô integral later in the course.

Remark 2.7. Note that all simple processes are left-continuous and adapted.
In general, it is possible to show that P is the smallest σ-algebra on Ω × (0, ∞) such that all left-continuous, adapted processes are measurable.
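As a numerical aside (my own sketch, with arbitrary example functions): when the integrator is a smooth path of finite variation, freezing the integrand at the left endpoint of each interval (t_k, t_{k+1}], exactly as a simple process does, approximates the Lebesgue-Stieltjes integral.

```python
# Sketch (my own example): approximating a Lebesgue-Stieltjes integral by
# "simple process" sums, with the integrand frozen at the left endpoint of
# each interval (t_k, t_{k+1}]. For H(t) = t and A(t) = t^2,
# the integral over (0, 1] is int_0^1 t * 2t dt = 2/3.
def simple_sum(H, A, t, n):
    h = t / n
    return sum(H(k * h) * (A((k + 1) * h) - A(k * h)) for k in range(n))

approx = simple_sum(lambda s: s, lambda s: s * s, 1.0, 100000)
```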

Although not all càdlàg adapted processes are previsible, the next result shows that their left-continuous modification is.

Proposition 2.8. Let X be a càdlàg adapted process, and define for all t ≥ 0 the process H_t := X_{t-}. Then H is previsible.

Proof. Since X is càdlàg and adapted, H is left-continuous and adapted. To see that H is previsible, we express it as a limit of previsible processes. For n ≥ 1, define the left-continuous, piecewise constant process

H_t^n := Σ_{k=0}^∞ H_{k 2^{-n}} 1_{(k 2^{-n}, (k+1) 2^{-n}]}(t).

Then the processes H^n are clearly previsible for all n ≥ 1. Moreover, by left-continuity of H we have that lim_{n→∞} H_t^n = H_t for all t. Thus H is a limit of previsible processes, and it is therefore previsible. □

Remark 2.9. By the above proposition, all continuous adapted processes (such as Brownian Motion) are previsible.

The following result provides a useful tool to determine that a process is not previsible.

Proposition 2.10. If a process H is previsible, then H_t is measurable with respect to F_{t-} := σ(F_s : s < t), for all t > 0.

Proof. See Example Sheet 1.

This shows, for example, that a Poisson process (N_t)_{t≥0} is not previsible, since N_t is not F_{t-}-measurable.

We can now show that integrating previsible processes against càdlàg, adapted processes of finite variation results in càdlàg, adapted processes of finite variation.

Theorem 2.11. Let A : Ω × [0, ∞) → R be a càdlàg, adapted process of finite variation V. Let H be a previsible process, and assume that for all ω ∈ Ω it holds that

(2.3)   ∫_{(0,t]} |H(ω, s)| dV(ω, s) < ∞,   ∀ t > 0.

Then the process H · A : Ω × [0, ∞) → R defined by

(2.4)   (H · A)(ω, t) := ∫_{(0,t]} H(ω, s) dA(ω, s),   (H · A)(ω, 0) := 0,

is càdlàg, adapted and of finite variation.

Proof. Note that the integral in (2.4) is well defined thanks to the integrability condition (2.3). Indeed, write H^± := max{±H, 0} and A^± := (V ± A)/2, so that H = H^+ - H^- and A = A^+ - A^-. Then

H · A = (H^+ - H^-) · (A^+ - A^-) = H^+ · A^+ + H^- · A^- - H^+ · A^- - H^- · A^+,

and all terms on the r.h.s. are finite by (2.3).

Step 1: Càdlàg. Note that 1_{(0,s]} ↓ 1_{(0,t]} as s ↓ t, and 1_{(0,s]} ↑ 1_{(0,t)} as s ↑ t. Recall that (H · A)_t = ∫ H_s 1(s ∈ (0, t]) dA_s. By the dominated convergence theorem (with ω fixed), we have that

(H · A)_t = ∫ H_s lim_{r↓t} 1(s ∈ (0, r]) dA_s = lim_{r↓t} ∫ H_s 1(s ∈ (0, r]) dA_s = lim_{r↓t} (H · A)_r.

Therefore H · A is right-continuous. A similar argument implies that H · A has left limits. Moreover, we have that

Δ(H · A)_t = ∫ H_s 1(s = t) dA_s = H_t ΔA_t.

Step 2: Adapted. We prove adaptedness by a monotone class argument. Suppose that H = 1_{B × (s,u]} where B ∈ F_s and u > s. Then we have that

(H · A)_t = 1_B (A_{t∧u} - A_{t∧s}),

which is clearly F_t-measurable. Let

A = {C ∈ P : 1_C · A is adapted to (F_t)_{t≥0}};

we aim to show that A = P. To this end, let

Π = {B × (s, u] : B ∈ F_s, s < u}.

Then Π is a π-system, and we have shown above that Π ⊆ A. To prove the reverse inclusion, observe that A is a d-system, so by Dynkin's lemma we have that

P = σ(Π) ⊆ A ⊆ P.

Therefore A = P, i.e. H · A is adapted for all H = 1_C, C ∈ P. Now to generalise, suppose first that H ≥ 0 is previsible. Set

H_n = (2^{-n} ⌊2^n H⌋) ∧ n = Σ_{k=1}^{n 2^n - 1} 2^{-n} k 1(H ∈ [2^{-n} k, 2^{-n}(k+1))) + n 1(H ≥ n),

where each of the events above belongs to P,

and note that H_n is a finite linear combination of functions of the form 1_C for C ∈ P. It therefore follows by linearity that (H_n · A)_t is F_t-measurable for all t ≥ 0. Moreover H_n ↑ H almost surely as n → ∞, and so by the Monotone Convergence Theorem (MCT) we have that (H_n · A)_t → (H · A)_t as n → ∞. It follows that (H · A)_t is F_t-measurable for all t ≥ 0, as a limit of F_t-measurable functions. This extends in the usual way to the case of general H, by writing H = H^+ - H^- and observing that H^+ · A and H^- · A are both adapted, so such is H · A.

Step 3: Finite variation. Simply observe that

H · A = (H^+ - H^-) · (A^+ - A^-) = (H^+ · A^+ + H^- · A^-) - (H^- · A^+ + H^+ · A^-)

is a difference of non-decreasing processes, hence of finite variation. □

Our next goal is to integrate against continuous martingales, such as Brownian Motion. By the previous theorem, were we able to identify a class of continuous martingales with finite variation, we would be done, as then we would be able to apply the definition in (2.4). Unfortunately, but also interestingly, we will see in the next section that the only continuous martingales which have finite variation are the constant ones. This shows that in order to be able to integrate with respect to general (non-constant) continuous martingales one needs to develop a new theory of integration, which we discuss in Section 4.

3. Local martingales

Let (Ω, F, (F_t)_{t≥0}, P) be a filtered probability space.

Definition 3.1. A filtration (F_t)_{t≥0} is said to satisfy the usual conditions if F_0 contains all P-null sets, and (F_t)_{t≥0} is right-continuous, that is, for all t ≥ 0 it holds that F_t = F_t^+ := ∩_{s>t} F_s.

We assume that (F_t)_{t≥0} satisfies the usual conditions throughout the course. Recall that an integrable, adapted process X = (X_t)_{t≥0} is a martingale with respect to (F_t)_{t≥0} if for all s ≤ t it holds that E(X_t | F_s) = X_s almost surely. Similarly, X is a supermartingale (resp. submartingale) if for all s ≤ t it holds that E(X_t | F_s) ≤ X_s (resp. E(X_t | F_s) ≥ X_s) almost surely.
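As a numerical aside (a toy example of my own, anticipating the optional stopping theorem proved below): stopping the fair coin-toss walk from the introduction at a bounded stopping time preserves its mean.

```python
# Toy check (my own example, anticipating the optional stopping theorem):
# stopping the fair walk at the bounded stopping time T ^ t, with T the first
# exit time from (-3, 3), preserves the mean: E(X_{T ^ t}) = E(X_0) = 0.
import random

random.seed(3)

def stopped_walk(t=200):
    x = 0
    for _ in range(t):
        if abs(x) >= 3:               # T has occurred: freeze the walk
            break
        x += random.choice([1, -1])   # fair +/-1 increment
    return x                           # this is X_{T ^ t}

n = 20000
mean_stopped = sum(stopped_walk() for _ in range(n)) / n
```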
Moreover, a random variable T : Ω → [0, ∞] is called a stopping time if {T ≤ t} ∈ F_t for all t ≥ 0. Note that if X is continuous (or càdlàg) and we let

F_T := {E ∈ F : E ∩ {T ≤ t} ∈ F_t ∀ t ≥ 0},

then X_T is F_T-measurable. Crucially, if X is a martingale then the stopped process X^T := (X_{t∧T})_{t≥0} is again a martingale. In fact, this property can be used to characterise martingales, as shown in the next theorem.

Theorem 3.2 (Optional Stopping Theorem, OST). Let X be an adapted integrable process. Then the following are equivalent:

(1) X is a martingale;
(2) X^T = (X_{t∧T})_{t≥0} is a martingale for all stopping times T;
(3) for all bounded stopping times S ≤ T it holds that E(X_T | F_S) = X_S a.s.;
(4) for all bounded stopping times T it holds that E(X_T) = E(X_0).

In other words, the class of càdlàg martingales is stable under stopping. This leads us to define a slightly more general class of such processes, called local martingales.

Definition 3.3 (Local martingale). A càdlàg adapted process X is called a local martingale if there exists a sequence (T_n)_{n≥1} of stopping times with T_n ↑ +∞ almost surely, such that the stopped process X^{T_n} is a martingale for all n ≥ 1. In this case we say that the sequence (T_n)_{n≥1} reduces X.

Note that any martingale is a local martingale, since any deterministic sequence T_n ↑ +∞ will reduce it. The following important example shows that, in fact, the class of local martingales is strictly larger.

Example 3.4. Let B be a standard Brownian Motion in R^3. Let M_t = 1/|B_t|. Recall from Example Sheet 3, Exercise 7.7 of Advanced Probability that:

(i) (M_t)_{t≥1} is bounded in L^2, that is, sup_{t≥1} E(M_t^2) < ∞;
(ii) E(M_t) → 0 as t → ∞;
(iii) M is a supermartingale.

Note that M cannot be a martingale: if it were, it would have constant expectation, which would have to vanish by (ii). We now show that it is a local martingale by exhibiting a reducing sequence of stopping times. For each n, let

T_n = inf{t ≥ 1 : |B_t| ≤ 1/n} = inf{t ≥ 1 : M_t ≥ n}.

We aim to show that (M_{t∧T_n})_{t≥1} is a martingale for all n, and that T_n → ∞ almost surely. We begin by noting that if n ≤ M_1(ω) then T_n(ω) = 1, and if n > M_1(ω) then T_n(ω) > 1. Moreover, since |B| cannot reach the value 1/(n+1) without first having hit 1/n, it follows that T_n(ω) is non-decreasing in n. Recall from Advanced Probability that if f ∈ C_b^2(R^3) then

f(B_t) - f(B_1) - (1/2) ∫_1^t Δf(B_s) ds,   t ≥ 1,

is a martingale. Note that f(x) = 1/|x| is harmonic in R^3 \ {0}.
To deal with the singularity at the origin, let (f_n)_{n≥1} be a sequence of bounded C^2(R^3) functions such that f_n ≡ f on

12 JASON MILLER AND VITTORIA SILVESTRI { x 1/n}, for all n 1. Note that if < B 1 1/n, then T n = 1 and M Tn M 1 is trivially a martingale. Moreover, since B 1 almost surely, we will have that B 1 > 1/n for n large enough, almost surely. Suppose therefore without loss of generality that B 1 > 1/n for all n. Then It follows that f(b t Tn = f n (B t Tn t 1. M t Tn = f(b t Tn f(b 1 + f(b 1 [ = f(b t Tn f(b 1 1 2 1 [ = f n (B t Tn f n (B 1 1 2 Tn Tn and so M Tn = (M t Tn t 1 is a martingale for all n 1. 1 ] f(b s ds + f(b 1 ] f n (B s ds + f n (B 1, It only remains to show that T n almost surely as n. Note that lim n T n exists since the T n s are non-decreasing, so we only need to show that such limit is infinite. For each R > we let S R = inf{t 1 : B t > R} = inf{t 1 : M t < 1/R}, and note that S R + almost surely as R. Then clearly ( P lim T n < n ( P R : T n < S R for all n To compute the latter probability note that by the OST we have E(M Tn S R = E(M 1 =: µ (,. On the other hand, we can also rewrite the left hand side as np(t n < S R + 1 R P(S R T n. = lim R lim n P(T n < S R. Using that P(S R T n = 1 P(T n < S R, we can solve for P(T n < S R in the above to get that We thus conclude that (M t t 1 is: P(T n < S R = µ 1/R as n. n 1/R non-negative, a local martingale but not a martingale, a supermartingale, and L 2 bounded. In fact, we could have concluded that M is a supermartingale from the first 2 properties, as the following result shows.

Proposition 3.5. If X is a local martingale, and X_t ≥ 0 for all t ≥ 0, then X is a supermartingale.

Proof. This follows from the conditional Fatou lemma. Indeed, if (T_n)_{n≥1} is a reducing sequence for X, then for any s ≤ t we have

E(X_t | F_s) = E(lim inf_n X_{t∧T_n} | F_s) ≤ lim inf_n E(X_{t∧T_n} | F_s) = lim inf_n X_{s∧T_n} = X_s

almost surely. □

Recall that our goal is to define integrals with respect to continuous martingales, such as Brownian Motion. In fact, we will be able to integrate with respect to continuous local martingales, with almost no additional effort. But why bother working with local martingales? One good reason is that, for example, this frees us from worrying about integrability, which can be difficult to check. Indeed, in Definition 3.3 the process X is only required to be integrable at stopping times. Moreover, we will often encounter stochastic processes which are only defined up to a random time τ. In this case it is not clear how to adapt the definition of martingale, since the process might fail to be defined on the whole interval [0, ∞). On the other hand, it is clear what the definition of a local martingale on [0, τ) should be: simply ask that T_n ↑ τ as n → ∞ in Definition 3.3.

With this in mind, we ask how big the class of continuous local martingales is with respect to that of continuous martingales. We will then prove the main result of this section, stating that the only continuous local martingales with finite variation are the constant ones.

Definition 3.6 (Uniform integrability, UI). A set X of random variables is said to be uniformly integrable if

sup_{X ∈ X} E(|X| 1(|X| > λ)) → 0   as λ → ∞.

Remark 3.7. Any family of uniformly bounded random variables is trivially UI. More generally, if |X| ≤ Y for all X ∈ X and some integrable random variable Y, then X is UI.

Lemma 3.8. If X is in L^1(Ω, F, P), then the family

X = {E(X | G) : G is a sub-σ-algebra of F}

is UI.

Proof. See Example Sheet 1.
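Example 3.4 can also be explored by Monte Carlo. The following sketch (my own illustration; sample sizes and time points are arbitrary) estimates E(M_t) at two times and observes the strict decrease that rules out the martingale property, consistently with Proposition 3.5.

```python
# Monte Carlo sketch of Example 3.4 (my own illustration): for B a 3d
# Brownian motion with B_1 = (1, 0, 0), M_t = 1/|B_t| has strictly
# decreasing expectation for t >= 1, so M is a supermartingale but not
# a martingale.
import random, math

random.seed(4)

def inv_norm(t):
    """Sample 1/|B_{1+t}| given B_1 = (1,0,0), using B_{1+t} - B_1 ~ N(0, t I)."""
    x = 1.0 + random.gauss(0.0, math.sqrt(t))
    y = random.gauss(0.0, math.sqrt(t))
    z = random.gauss(0.0, math.sqrt(t))
    return 1.0 / math.sqrt(x * x + y * y + z * z)

n = 20000
mean_early = sum(inv_norm(1.0) for _ in range(n)) / n    # estimates E(M_2)
mean_late = sum(inv_norm(25.0) for _ in range(n)) / n    # estimates E(M_26)
```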
We use the above result to prove a useful criterion to determine whether a local martingale is in fact a martingale.

Proposition 3.9. The following statements are equivalent:

(i) X is a martingale;

(ii) X is a local martingale, and for all t ≥ 0 the family

X_t = {X_T : T is a stopping time with T ≤ t}

is UI.

Proof. Suppose that X is a martingale. By the OST, if T is a stopping time with T ≤ t, then we have that X_T = E(X_t | F_T). Lemma 3.8 then implies that X_t is UI.

Suppose, on the other hand, that X is a local martingale and the family X_t is UI for all t ≥ 0. We prove that X is a true martingale by showing that for all bounded stopping times T we have E(X_T) = E(X_0). Indeed, let (T_n)_{n≥1} be a reducing sequence for X. Then, for all bounded stopping times T ≤ t,

E(X_0) = E(X_0^{T_n}) = E(X_T^{T_n}) = E(X_{T∧T_n}).

As {X_{T∧T_n} : n ≥ 1} is UI, we know from Advanced Probability that X_{T∧T_n} converges almost surely and in L^1 as n → ∞, the limit being X_T since T_n ↑ +∞. It follows that E(X_{T∧T_n}) → E(X_T) as n → ∞. This shows that E(X_0) = E(X_T) and, since T was arbitrary, that X is a martingale. □

Combining the above proposition with Remark 3.7 we obtain the following useful corollary.

Corollary 3.10. A bounded local martingale is a true martingale. More generally, if X is a local martingale such that |X_t| ≤ Y for all t ≥ 0 and some integrable random variable Y, then X is a true martingale.

We can finally turn to the main result of this section.

Theorem 3.11. Let X be a continuous local martingale with X_0 = 0, and assume that X has finite variation. Then X ≡ 0 almost surely.

Proof. Let V = (V_t)_{t≥0} denote the total variation process of X. Then V_0 = 0 and V is continuous, adapted and pathwise non-decreasing by Lemmas 2.2 and 2.5. Define the sequence of stopping times

(3.1)   T_n = inf{t ≥ 0 : V_t = n},   n ≥ 1.

Then T_n ↑ ∞, since V is non-decreasing and finite by assumption. Moreover,

|X_t^{T_n}| = |X_{t∧T_n}| ≤ V_{t∧T_n} ≤ n,

which shows that X^{T_n} is bounded, and hence a true martingale, for all n ≥ 1. Thus (T_n)_{n≥1} reduces X. To show that X ≡ 0, it clearly suffices to show that X^{T_n} ≡ 0 for all n ≥ 1. Fix any n ≥ 1 and write Y = X^{T_n} to shorten the notation. Then Y is a continuous, bounded

martingale with $Y_0 = 0$. By continuity, to see that $Y \equiv 0$ it suffices to show that $\mathbb{E}(Y_t^2) = 0$ for all $t \ge 0$. Fix $t > 0$ and for $N \ge 1$ write $t_k = kt/N$ for $0 \le k \le N$. Then we have
\[
\mathbb{E}(Y_t^2) = \mathbb{E}\Big( \sum_{k=0}^{N-1} \big(Y_{t_{k+1}}^2 - Y_{t_k}^2\big) \Big) = \mathbb{E}\Big( \sum_{k=0}^{N-1} (Y_{t_{k+1}} - Y_{t_k})^2 \Big) \le \mathbb{E}\Big( \sup_{k<N} |Y_{t_{k+1}} - Y_{t_k}| \underbrace{\sum_{k=0}^{N-1} |Y_{t_{k+1}} - Y_{t_k}|}_{\le\, V_t\, \le\, n} \Big),
\]
where the second equality follows from the orthogonality of the increments, and the last bound from the definition of the stopping time $T_n$. Now, by continuity of $Y$ we have that
\[ \lim_{N \to \infty} \sup_{k<N} |Y_{t_{k+1}} - Y_{t_k}| = 0 \quad \text{almost surely.} \]
We can therefore apply the Bounded Convergence Theorem to conclude that
\[ \mathbb{E}\Big( \sup_{k<N} |Y_{t_{k+1}} - Y_{t_k}| \sum_{k=0}^{N-1} |Y_{t_{k+1}} - Y_{t_k}| \Big) \to 0 \quad \text{as } N \to \infty, \]
and thus $\mathbb{E}(Y_t^2) = 0$ for all $t \ge 0$, which is what we wanted to show.

Remark 3.12. Note that:

(1) The continuity assumption cannot be removed, as we will see in the next section.

(2) The theorem shows that Brownian motion has infinite variation, and hence the Lebesgue–Stieltjes theory of integration cannot be used to define integrals with respect to it.

We conclude this section by observing that, although we were able to introduce an explicit reducing sequence (3.1) for $X$ in the proof of Theorem 3.11, this was based on the total variation of $X$ being finite, which fails to hold when dealing with non-constant continuous local martingales. The next result provides an explicit reducing sequence for this class of processes.

Proposition 3.13. Let $X$ be a continuous local martingale with $X_0 = 0$. For $n \ge 1$ define the stopping times
\[ T_n = \inf\{t \ge 0 : |X_t| = n\}. \]
Then the sequence $(T_n)_{n \ge 1}$ reduces $X$.

Note that this is the same sequence of stopping times we used to show that $M_t = 1/|B_t|$ is a local martingale in Example 3.4.

Proof. To see that $(T_n)_{n \ge 1}$ is a sequence of stopping times, note that for all $t \ge 0$ we can write
\[ \{T_n \le t\} = \Big\{ \sup_{s \in [0,t]} |X_s| \ge n \Big\} = \bigcap_{k \ge 1} \bigcup_{s \le t,\, s \in \mathbb{Q}} \Big\{ |X_s| > n - \frac{1}{k} \Big\} \in \mathcal{F}_t, \]

which shows that $T_n$ is a stopping time for all $n$. We now prove that $T_n(\omega) \nearrow \infty$ for all $\omega \in \Omega$. Indeed, fix any $\omega \in \Omega$. Since $\sup_{s \in [0,t]} |X_s(\omega)| < \infty$ by continuity of $X$, for any $t \ge 0$ there exists $n(\omega, t)$ finite such that
\[ n(\omega, t) > \sup_{s \in [0,t]} |X_s(\omega)|, \]
which implies that $T_n(\omega) > t$ for all $n \ge n(\omega, t)$, and hence that $T_n(\omega) \nearrow +\infty$ as $n \to \infty$.

It only remains to show that $(T_n)_{n \ge 1}$ reduces $X$. Let $(T_n')_{n \ge 1}$ be a reducing sequence for $X$ (such a sequence exists by assumption, since $X$ is a local martingale). If $T_n = T_n'$ for all $n$ there is nothing to prove. Otherwise, we know that $X^{T_n'}$ is a martingale for all $n$, and we need to deduce that the same holds for $X^{T_n}$. Indeed, pick any $n \ge 1$. Then by the OST the process $X^{T_n \wedge T_m'}$ is a martingale for all $m$, and hence $X^{T_n}$ is a local martingale with reducing sequence $(T_m')_{m \ge 1}$. But $X^{T_n}$ is also bounded by definition of $T_n$, and so it is a true martingale. Since $n$ was arbitrary, this shows that $(T_n)_{n \ge 1}$ is a reducing sequence for $X$.

4. The stochastic integral

In this section we develop a new theory of integration, which will allow us to integrate against continuous local martingales such as Brownian motion or the process defined in Example 3.4. Recall that a real-valued process $X = (X_t)_{t \ge 0}$ is $L^2$-bounded if $\sup_{t \ge 0} \|X_t\|_{L^2} < \infty$, or equivalently if $\sup_{t \ge 0} \mathbb{E}(X_t^2) < \infty$. The following two results are known to the reader from the Advanced Probability course, and they will be used repeatedly in this section.

Theorem 4.1. Let $X$ be a càdlàg and $L^2$-bounded martingale. Then there exists a random variable $X_\infty$ such that
\[ X_t \to X_\infty \quad \text{almost surely and in } L^2 \text{ as } t \to \infty. \]
Moreover, $X_t = \mathbb{E}(X_\infty \mid \mathcal{F}_t)$ a.s. for all $t \ge 0$.

Proposition 4.2 (Doob's $L^2$ inequality). Let $X$ be a càdlàg and $L^2$-bounded martingale. Then
\[ \mathbb{E}\Big( \sup_{t \ge 0} X_t^2 \Big) \le 4\, \mathbb{E}(X_\infty^2). \]

Define the following sets:
(4.1)
\[ \mathcal{M}^2 = \{ L^2\text{-bounded, càdlàg martingales} \}, \qquad \mathcal{M}^2_c = \{ L^2\text{-bounded, continuous martingales} \}, \qquad \mathcal{M}^2_{c,\mathrm{loc}} = \{ L^2\text{-bounded, continuous local martingales} \}. \]
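Remark 3.12(2) — that Brownian motion has infinite variation — is easy to observe numerically. The following quick sketch (not part of the notes; it assumes NumPy is available) simulates one Brownian path and computes its first-variation sums over finer and finer partitions of $[0,1]$: the sums keep growing, roughly like the square root of the number of intervals, instead of converging.

```python
import numpy as np

rng = np.random.default_rng(0)

# One Brownian path on [0, 1], sampled on a fine grid of 2^16 steps.
n_fine = 2**16
dB = rng.normal(0.0, np.sqrt(1.0 / n_fine), size=n_fine)
B = np.concatenate([[0.0], np.cumsum(dB)])

def variation_sum(path, n_intervals):
    """Sum of |increments| of `path` over a partition into n_intervals pieces."""
    step = (path.size - 1) // n_intervals
    return np.abs(np.diff(path[::step])).sum()

# The first-variation sums keep growing as the mesh shrinks (roughly like
# sqrt(N)), consistent with Brownian motion having infinite total variation.
for N in [2**4, 2**8, 2**12, 2**16]:
    print(N, variation_sum(B, N))
```

For a continuously differentiable path these sums would stabilise at the finite total variation; the unbounded growth here is exactly why Lebesgue–Stieltjes integration against $B$ fails.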

Definition 4.3 (Simple process). A process $H : \Omega \times (0, \infty) \to \mathbb{R}$ is said to be a simple process if it can be written as
\[ H(\omega, t) = \sum_{k=0}^{n-1} Z_k(\omega) 1_{(t_k, t_{k+1}]}(t) \]
for some $n \in \mathbb{N}$, $0 = t_0 < t_1 < \cdots < t_n < \infty$, and $Z_k$ bounded $\mathcal{F}_{t_k}$-measurable random variables for all $k$. We let $\mathcal{S}$ denote the set of simple processes.

We will start by defining integrals of simple processes against $L^2$-bounded, càdlàg martingales ($H \in \mathcal{S}$, $M \in \mathcal{M}^2$). To extend this definition to more general integrands we will restrict to continuous integrators ($M \in \mathcal{M}^2_c$), and equip $\mathcal{M}^2_c$ with a Hilbert space structure. We will then introduce the concept of quadratic variation, and use it to define a Hilbert space structure on a larger space of integrands (depending on the integrator), which we call $L^2(M)$. Finally, we will note that the stochastic integral defines a Hilbert space isometry between $\mathcal{M}^2_c$ and $L^2(M)$ when restricted to simple integrands, and show that asking for this property to hold for all integrands in $L^2(M)$ uniquely defines the stochastic integral on this larger space.

4.1. Simple integrands. In this section we define integrals of simple processes against $L^2$-bounded càdlàg martingales $M \in \mathcal{M}^2$. Throughout, our integrals will be over the interval $(0, t]$ and will take the value $0$ at $t = 0$.

Definition 4.4. For $H_t = \sum_{k=0}^{n-1} Z_k 1_{(t_k, t_{k+1}]}(t)$ a simple process, and $M \in \mathcal{M}^2$, set
\[ (H \cdot M)_t := \sum_{k=0}^{n-1} Z_k (M_{t_{k+1} \wedge t} - M_{t_k \wedge t}). \]

Note that this is a continuous time version of the discrete martingale transform defined in (1.1). The next result shows that the stochastic integral of a simple process against $M \in \mathcal{M}^2$ is in $\mathcal{M}^2$.

Proposition 4.5. If $H \in \mathcal{S}$ and $M \in \mathcal{M}^2$, then also $H \cdot M \in \mathcal{M}^2$. Moreover,
(4.2)
\[ \mathbb{E}[(H \cdot M)_\infty^2] = \sum_{k=0}^{n-1} \mathbb{E}\big[ Z_k^2 (M_{t_{k+1}} - M_{t_k})^2 \big] \le 4 \|H\|_\infty^2\, \mathbb{E}[(M_\infty - M_0)^2]. \]

Proof. We first show that $H \cdot M$ is a martingale. For $t_k \le s \le t \le t_{k+1}$, we have that $(H \cdot M)_t - (H \cdot M)_s = Z_k (M_t - M_s)$, so that
\[ \mathbb{E}[(H \cdot M)_t - (H \cdot M)_s \mid \mathcal{F}_s] = Z_k\, \mathbb{E}[M_t - M_s \mid \mathcal{F}_s] = 0, \]

as $M$ is a martingale. This extends to general $s \le t$ by the tower property. Indeed, take $t_j \le s \le t_{j+1} \le \cdots \le t_k \le t \le t_{k+1}$. Then we have
\[ (H \cdot M)_t - (H \cdot M)_s = \sum_{i=j+1}^{k-1} Z_i (M_{t_{i+1}} - M_{t_i}) + Z_j (M_{t_{j+1}} - M_s) + Z_k (M_t - M_{t_k}), \]
so that
\[ \mathbb{E}[(H \cdot M)_t - (H \cdot M)_s \mid \mathcal{F}_s] = \sum_{i=j+1}^{k-1} \mathbb{E}[Z_i (M_{t_{i+1}} - M_{t_i}) \mid \mathcal{F}_s] + \mathbb{E}[Z_j (M_{t_{j+1}} - M_s) \mid \mathcal{F}_s] + \mathbb{E}[Z_k (M_t - M_{t_k}) \mid \mathcal{F}_s] = 0, \]
since
\[ \mathbb{E}[Z_i (M_{t_{i+1}} - M_{t_i}) \mid \mathcal{F}_s] = \mathbb{E}\big[ Z_i\, \mathbb{E}(M_{t_{i+1}} - M_{t_i} \mid \mathcal{F}_{t_i}) \mid \mathcal{F}_s \big] = 0 \quad \text{for all } j+1 \le i \le k-1, \]
\[ \mathbb{E}[Z_j (M_{t_{j+1}} - M_s) \mid \mathcal{F}_s] = Z_j\, \mathbb{E}(M_{t_{j+1}} - M_s \mid \mathcal{F}_s) = 0, \quad \text{and} \]
\[ \mathbb{E}[Z_k (M_t - M_{t_k}) \mid \mathcal{F}_s] = \mathbb{E}\big[ Z_k\, \mathbb{E}(M_t - M_{t_k} \mid \mathcal{F}_{t_k}) \mid \mathcal{F}_s \big] = 0. \]
We thus conclude that $H \cdot M$ is a martingale.

To see that $H \cdot M$ is bounded in $L^2$, we observe that if $j < k$ then
\[ \mathbb{E}\big[ Z_j (M_{t_{j+1}} - M_{t_j}) Z_k (M_{t_{k+1}} - M_{t_k}) \big] = \mathbb{E}\big[ Z_j (M_{t_{j+1}} - M_{t_j}) Z_k\, \mathbb{E}(M_{t_{k+1}} - M_{t_k} \mid \mathcal{F}_{t_k}) \big] = 0. \]
Therefore
\begin{align*}
\mathbb{E}[(H \cdot M)_t^2] &= \mathbb{E}\Big[ \Big( \sum_{k=0}^{n-1} Z_k (M_{t_{k+1} \wedge t} - M_{t_k \wedge t}) \Big)^2 \Big] = \sum_{k=0}^{n-1} \mathbb{E}\big[ Z_k^2 (M_{t_{k+1} \wedge t} - M_{t_k \wedge t})^2 \big] \\
&\le \|H\|_\infty^2 \sum_{k=0}^{n-1} \mathbb{E}\big[ (M_{t_{k+1} \wedge t} - M_{t_k \wedge t})^2 \big] = \|H\|_\infty^2\, \mathbb{E}\big[ (M_{t_n \wedge t} - M_0)^2 \big] \le 4 \|H\|_\infty^2\, \mathbb{E}[(M_\infty - M_0)^2],
\end{align*}
where the last inequality follows from Doob's $L^2$ inequality. Since this bound is uniform in $t$, we conclude that $H \cdot M$ is bounded in $L^2$, and so $H \cdot M \in \mathcal{M}^2$. Finally, to see that (4.2) holds, note that
\[ \mathbb{E}[(H \cdot M)_\infty^2] = \lim_{t \to \infty} \mathbb{E}[(H \cdot M)_t^2] \le \sup_{t \ge 0} \mathbb{E}[(H \cdot M)_t^2] \le 4 \|H\|_\infty^2\, \mathbb{E}[(M_\infty - M_0)^2] \]
by the previous inequality.

In what follows it will also be useful to understand how the stochastic integral defined above behaves under stopping. The following result tells us that stopping the stochastic integral is equivalent to stopping the martingale $M$.

Proposition 4.6. Let $H \in \mathcal{S}$ and $M \in \mathcal{M}^2$. Then for any stopping time $T$ we have $H \cdot M^T = (H \cdot M)^T$.

Proof. We have that
\begin{align*}
(H \cdot M^T)_t &= \sum_{k=0}^{n-1} Z_k \big( M^T_{t_{k+1} \wedge t} - M^T_{t_k \wedge t} \big) = \sum_{k=0}^{n-1} Z_k \big( M_{T \wedge t_{k+1} \wedge t} - M_{T \wedge t_k \wedge t} \big) \\
&= (H \cdot M)_{t \wedge T} = (H \cdot M)^T_t
\end{align*}
for any $t \ge 0$.

4.2. The space of integrators. In this section we equip the space of integrators $\mathcal{M}^2_c$ with a Hilbert space structure. For $X$ a càdlàg adapted process, define
\[ X^* = \sup_{t \ge 0} |X_t|, \qquad \|X\|_{\mathcal{C}} = \|X^*\|_{L^2}, \]
and let
\[ \mathcal{C}^2 = \{ \text{càdlàg adapted processes } X \text{ with } \|X\|_{\mathcal{C}} < \infty \}. \]
Recall from (4.1) the definition of the sets $\mathcal{M}^2$, $\mathcal{M}^2_c$, $\mathcal{M}^2_{c,\mathrm{loc}}$, and define the following norm on $\mathcal{M}^2$:
\[ \|X\| := \|X_\infty\|_{L^2}. \]
This is clearly a seminorm. To see that it is a norm we have to show that $\|X\| = 0$ implies $X \equiv 0$ almost surely. Indeed, if $\|X\| = \|X_\infty\|_{L^2} = 0$ then $X_\infty = 0$ almost surely. Moreover, by Theorem 4.1 we have $X_t = \mathbb{E}(X_\infty \mid \mathcal{F}_t) = 0$ almost surely, for all $t \ge 0$. It then follows by right-continuity of $X$ that $X \equiv 0$ almost surely. Define further
(4.3)
\[ \mathcal{M} = \{ \text{càdlàg martingales} \}, \qquad \mathcal{M}_c = \{ \text{continuous martingales} \}, \qquad \mathcal{M}_{c,\mathrm{loc}} = \{ \text{continuous local martingales} \}. \]

Proposition 4.7. The following hold:

(a) $(\mathcal{C}^2, \|\cdot\|_{\mathcal{C}})$ is complete.

(b) $\mathcal{M}^2 = \mathcal{M} \cap \mathcal{C}^2$.

(c) $(\mathcal{M}^2, \|\cdot\|)$ is a Hilbert space, and $\mathcal{M}^2_c = \mathcal{M}_c \cap \mathcal{M}^2$ is a closed subspace.

(d) The map $\mathcal{M}^2 \to L^2(\mathcal{F}_\infty)$ given by $X \mapsto X_\infty$ is an isometry.

Remark 4.8. We can identify an element of $\mathcal{M}^2$ with its terminal value, and then $(\mathcal{M}^2, \|\cdot\|)$ inherits the Hilbert space structure of $(L^2(\mathcal{F}_\infty), \|\cdot\|_{L^2})$. Moreover, since $(\mathcal{M}^2_c, \|\cdot\|)$ is a closed linear subspace of $\mathcal{M}^2$ by (c), it is itself a Hilbert space: this will be our space of integrators.

Proof. To prove (a) we have to show that any Cauchy sequence in $(\mathcal{C}^2, \|\cdot\|_{\mathcal{C}})$ converges to an element of $\mathcal{C}^2$. To this end, we will first show that a càdlàg subsequential limit exists for any such sequence, and then prove that in fact the whole sequence converges to this limit in norm. Let $(X^n)_{n \ge 1}$ be a Cauchy sequence in $(\mathcal{C}^2, \|\cdot\|_{\mathcal{C}})$. Then we can find a subsequence $(X^{n_k})_{k \ge 1}$ such that $\sum_k \|X^{n_{k+1}} - X^{n_k}\|_{\mathcal{C}} < \infty$. It follows that
\[ \Big\| \sum_k \sup_{t \ge 0} |X^{n_{k+1}}_t - X^{n_k}_t| \Big\|_{L^2} \le \sum_k \|X^{n_{k+1}} - X^{n_k}\|_{\mathcal{C}} < \infty. \]
Therefore $\sum_k \sup_{t \ge 0} |X^{n_{k+1}}_t - X^{n_k}_t| < \infty$ almost surely, and hence $\sup_{t \ge 0} |X^{n_{k+1}}_t - X^{n_k}_t| \to 0$ as $k \to \infty$ almost surely. This shows that $(X^{n_k}(\omega))_{k \ge 1}$ converges uniformly on $[0, \infty)$ as $k \to \infty$ to some limit, which must then be càdlàg as a uniform limit of càdlàg functions. Let $X$ denote this limit: it remains to show that $X^n \to X$ in norm. Indeed,
\begin{align*}
\|X^n - X\|_{\mathcal{C}}^2 &= \mathbb{E}\Big( \sup_{t \ge 0} |X^n_t - X_t|^2 \Big) = \mathbb{E}\Big( \liminf_{k \to \infty} \sup_{t \ge 0} |X^n_t - X^{n_k}_t|^2 \Big) \\
&\le \liminf_{k \to \infty} \mathbb{E}\Big( \sup_{t \ge 0} |X^n_t - X^{n_k}_t|^2 \Big) \qquad \text{(Fatou's lemma)} \\
&= \liminf_{k \to \infty} \|X^n - X^{n_k}\|_{\mathcal{C}}^2 \to 0 \quad \text{as } n \to \infty,
\end{align*}
as $(X^n)$ is Cauchy. This proves (a).

To see (b) we show that both inclusions hold. Suppose that $X \in \mathcal{C}^2 \cap \mathcal{M}$. Then $\|X\|_{\mathcal{C}} < \infty$, and so
\[ \sup_{t \ge 0} \|X_t\|_{L^2} \le \Big\| \sup_{t \ge 0} |X_t| \Big\|_{L^2} = \|X\|_{\mathcal{C}} < \infty. \]
Therefore $X \in \mathcal{M}^2$. On the other hand, if $X \in \mathcal{M}^2$, by Doob's $L^2$ inequality we find $\|X\|_{\mathcal{C}} \le 2 \|X_\infty\|_{L^2} < \infty$, so $X \in \mathcal{C}^2 \cap \mathcal{M}$. Hence $\mathcal{M}^2 = \mathcal{M} \cap \mathcal{C}^2$, which proves (b). We now prove (c).

Note that $(X, Y) \mapsto \mathbb{E}(X_\infty Y_\infty)$ defines an inner product on $\mathcal{M}^2$. For $X \in \mathcal{M}^2$, we have just shown that $\|X\| \le \|X\|_{\mathcal{C}} \le 2\|X\|$, so $\|\cdot\|$ and $\|\cdot\|_{\mathcal{C}}$ are equivalent norms. Therefore, in order to show that $(\mathcal{M}^2, \|\cdot\|)$ is complete we can equivalently show that $(\mathcal{M}^2, \|\cdot\|_{\mathcal{C}})$ is complete. By (a), it suffices to show that $\mathcal{M}^2$ is closed in $(\mathcal{C}^2, \|\cdot\|_{\mathcal{C}})$. To see this, let $(X^n)_{n \ge 1}$ be a sequence in $\mathcal{M}^2$ such that $X^n \to X$ as $n \to \infty$ for some $X$. Then $X$ is càdlàg, adapted and bounded in $L^2$. To see that $X$ is also a martingale, note that for $s < t$
\begin{align*}
\|\mathbb{E}(X_t \mid \mathcal{F}_s) - X_s\|_{L^2} &\le \|\mathbb{E}(X_t - X^n_t \mid \mathcal{F}_s)\|_{L^2} + \|X^n_s - X_s\|_{L^2} \qquad \text{(Minkowski's inequality)} \\
&\le \|X_t - X^n_t\|_{L^2} + \|X^n_s - X_s\|_{L^2} \le 2 \|X^n - X\|_{\mathcal{C}} \to 0 \quad \text{as } n \to \infty.
\end{align*}
Therefore $X \in \mathcal{M}^2$, which is therefore closed. Exactly the same argument applies to show that $\mathcal{M}^2_c$ is a closed subspace of $(\mathcal{M}^2, \|\cdot\|)$. This proves (c). Part (d) follows from the definition.

In this section we have equipped our space of integrators with a Hilbert space structure. In the next section we do the same for a suitable space of integrands.

4.3. The space of integrands. The crucial tool that will allow us to equip the space of integrands with a Hilbert space structure is the so-called quadratic variation, which we now discuss in detail.

Definition 4.9 (Ucp convergence). Let $(X^n)_{n \ge 1}$ be a sequence of processes. We say that $X^n \to X$ uniformly on compacts in probability (ucp) if for every $\varepsilon, t > 0$ we have that
\[ \mathbb{P}\Big( \sup_{s \le t} |X^n_s - X_s| > \varepsilon \Big) \to 0 \quad \text{as } n \to \infty. \]

The following result defines the quadratic variation of a continuous local martingale.

Theorem 4.10 (Quadratic variation). Let $M$ be a continuous local martingale. Then there exists a unique (up to indistinguishability) continuous, adapted and non-decreasing process $[M]$ such that $[M]_0 = 0$ and $M^2 - [M]$ is a continuous local martingale. Moreover, if we define
\[ [M]^n_t = \sum_{k=0}^{\lfloor 2^n t \rfloor - 1} \big( M_{(k+1)2^{-n}} - M_{k2^{-n}} \big)^2, \]
then $[M]^n \to [M]$ ucp as $n \to \infty$. The process $[M]$ is called the quadratic variation of $M$.
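The dyadic approximations $[M]^n$ in Theorem 4.10 are easy to simulate. A small sketch (not part of the notes; assumes NumPy), taking $M = B$ a standard Brownian motion, whose quadratic variation will be identified as $[B]_t = t$ in Example 4.12 below: as $n$ grows, the sums $[M]^n_t$ stabilise near $t$.

```python
import numpy as np

rng = np.random.default_rng(0)

t = 1.0
n_fine = 2**16                    # finest dyadic grid on [0, t]
dB = rng.normal(0.0, np.sqrt(t / n_fine), size=n_fine)
B = np.concatenate([[0.0], np.cumsum(dB)])

def dyadic_qv(path, n):
    """[M]^n_t: sum of squared increments over the dyadic grid of mesh 2^-n."""
    step = (path.size - 1) // 2**n
    incs = np.diff(path[::step])
    return (incs**2).sum()

# The squared-increment sums settle near [B]_t = t = 1 as the mesh shrinks.
for n in [4, 8, 12, 16]:
    print(n, dyadic_qv(B, n))
```

Note the contrast with the first-variation sums of the previous section: squaring the increments tames the roughness of the path exactly enough to produce a finite, non-trivial limit.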

22 JASON MILLER AND VITTORIA SILVESTRI Proof of Theorem 4.1. By replacing M with M t M if necessary, we may assume without loss of generality that M =. Step 1: Uniqueness. If A and A are two increasing processes satisfying the conditions of the theorem, then we can write A t A t = (M 2 t A t (M 2 t A t. The left hand side is a continuous process of bounded variation as it is given by the difference of two non-decreasing functions while the right hand side is a continuous local martingale as it is given by the difference of two continuous local martingales. Since a continuous local martingale starting from with bounded variation is almost surely equal to, it follows that A A. That is, A A, as desired. Step 2: Existence, M bounded. We shall first consider the case that M is a bounded, continuous martingale. Then we have that M M 2 c. We first build the quadratic variation process on compact time sets [, T ], for T deterministic and finite. Fix such T > and let H n t = 2 n T 1 k= M k2 n1 (k2 n,(k+12 n ](t. Note that H n approximates M T for large n. Then, since H n is a simple process for all n, we have that X n t = (H n M t = 2 n T 1 k= M k2 n(m (k+12 n t M k2 n t is an L 2 -bounded continuous martingale, that is X n M 2 c for all n. We next show that the sequence (X n n 1 is Cauchy in norm, and therefore it has a limit in M 2 c. Pick n m 1 and write H in place of H n H m to shorten the notation. Then X n X m = (H n H m M = H M,

and so
\begin{align*}
\|X^n - X^m\|^2 &= \mathbb{E}[(H \cdot M)_\infty^2] = \mathbb{E}[(H \cdot M)_T^2] \\
&= \mathbb{E}\Big[ \Big( \sum_{k=0}^{\lceil 2^n T \rceil - 1} H_{k2^{-n}} \big( M_{(k+1)2^{-n}} - M_{k2^{-n}} \big) \Big)^2 \Big] \\
&= \mathbb{E}\Big[ \sum_{k=0}^{\lceil 2^n T \rceil - 1} H^2_{k2^{-n}} \big( M_{(k+1)2^{-n}} - M_{k2^{-n}} \big)^2 \Big] \\
&\le \mathbb{E}\Big[ \sup_{t \in [0,T]} |H_t|^2 \sum_{k=0}^{\lceil 2^n T \rceil - 1} \big( M_{(k+1)2^{-n}} - M_{k2^{-n}} \big)^2 \Big] \\
&\le \mathbb{E}\Big[ \sup_{t \in [0,T]} |H_t|^4 \Big]^{1/2}\, \mathbb{E}\Big[ \Big( \sum_{k=0}^{\lceil 2^n T \rceil - 1} \big( M_{(k+1)2^{-n}} - M_{k2^{-n}} \big)^2 \Big)^2 \Big]^{1/2} \\
&= \mathbb{E}\Big[ \sup_{t \in [0,T]} |H^n_t - H^m_t|^4 \Big]^{1/2}\, \mathbb{E}\Big[ \Big( \sum_{k=0}^{\lceil 2^n T \rceil - 1} \big( M_{(k+1)2^{-n}} - M_{k2^{-n}} \big)^2 \Big)^2 \Big]^{1/2},
\end{align*}
where we have used Cauchy–Schwarz in the last inequality. Note that the first term is bounded, since
\[ \sup_{t \in [0,T]} |H^n_t - H^m_t|^4 \le 16 \|M\|_{L^\infty}^4 \]
as $M$ is bounded, and
\[ \sup_{t \in [0,T]} |H^n_t - H^m_t|^4 \to 0 \quad \text{almost surely as } n, m \to \infty \]
by uniform continuity of $M$ on $[0, T]$. It therefore follows from the Bounded Convergence Theorem that
\[ \mathbb{E}\Big[ \sup_{t \in [0,T]} |H^n_t - H^m_t|^4 \Big]^{1/2} \to 0 \quad \text{as } n, m \to \infty. \]
We claim that the second term is bounded since $M$ is.

Lemma 4.11. Let $M \in \mathcal{M}$ be bounded. Then, for any $N \ge 1$ and $0 = t_0 \le t_1 \le \cdots \le t_N < \infty$ we have that
\[ \mathbb{E}\Big[ \Big( \sum_{k=0}^{N-1} (M_{t_{k+1}} - M_{t_k})^2 \Big)^2 \Big] \le 48 \|M\|_{L^\infty}^4. \]

Proof. Note that
(4.4)
\[ \mathbb{E}\Big[ \Big( \sum_{k=0}^{N-1} (M_{t_{k+1}} - M_{t_k})^2 \Big)^2 \Big] = \sum_{k=0}^{N-1} \mathbb{E}\big[ (M_{t_{k+1}} - M_{t_k})^4 \big] + 2 \sum_{k=0}^{N-1} \mathbb{E}\Big[ (M_{t_{k+1}} - M_{t_k})^2 \sum_{j=k+1}^{N-1} (M_{t_{j+1}} - M_{t_j})^2 \Big]. \]

For each fixed $k$, we have
\begin{align*}
\mathbb{E}\Big[ (M_{t_{k+1}} - M_{t_k})^2 \sum_{j=k+1}^{N-1} (M_{t_{j+1}} - M_{t_j})^2 \Big] &= \mathbb{E}\Big[ (M_{t_{k+1}} - M_{t_k})^2\, \mathbb{E}\Big( \sum_{j=k+1}^{N-1} (M_{t_{j+1}} - M_{t_j})^2 \,\Big|\, \mathcal{F}_{t_{k+1}} \Big) \Big] \\
&= \mathbb{E}\Big[ (M_{t_{k+1}} - M_{t_k})^2\, \mathbb{E}\big( (M_{t_N} - M_{t_{k+1}})^2 \mid \mathcal{F}_{t_{k+1}} \big) \Big] \\
&= \mathbb{E}\big[ (M_{t_{k+1}} - M_{t_k})^2 (M_{t_N} - M_{t_{k+1}})^2 \big].
\end{align*}
Inserting this into (4.4) we then get
\begin{align*}
\mathbb{E}\Big[ \Big( \sum_{k=0}^{N-1} (M_{t_{k+1}} - M_{t_k})^2 \Big)^2 \Big] &\le \mathbb{E}\Big[ \Big( \sup_{0 \le j \le N-1} |M_{t_{j+1}} - M_{t_j}|^2 + 2 \sup_{0 \le j \le N-1} |M_{t_N} - M_{t_{j+1}}|^2 \Big) \sum_{k=0}^{N-1} (M_{t_{k+1}} - M_{t_k})^2 \Big] \\
&\le 12 \|M\|_{L^\infty}^2\, \mathbb{E}\Big[ \sum_{k=0}^{N-1} (M_{t_{k+1}} - M_{t_k})^2 \Big] \\
&= 12 \|M\|_{L^\infty}^2\, \mathbb{E}\big[ (M_{t_N} - M_{t_0})^2 \big] \le 48 \|M\|_{L^\infty}^4.
\end{align*}

Going back to the proof of the theorem, this implies that $\|X^n - X^m\|^2 \to 0$ as $n, m \to \infty$, and so $(X^n)_{n \ge 1}$ is Cauchy in $(\mathcal{M}^2_c, \|\cdot\|)$. It follows that there exists $Y \in \mathcal{M}^2_c$ such that $X^n \to Y$ in norm as $n \to \infty$. Now, for any $n$ and $1 \le k \le 2^n T$, we can write
\[ M^2_{k2^{-n}} - 2X^n_{k2^{-n}} = \sum_{j=0}^{k-1} \big( M_{(j+1)2^{-n}} - M_{j2^{-n}} \big)^2 = [M]^n_{k2^{-n}}. \]
Consequently, for each $n \ge 1$ the process $M^2 - 2X^n$ is non-decreasing when restricted to the set of times $\{k2^{-n} : 1 \le k \le 2^n T\}$. To show that the same holds for the limiting process $M^2 - 2Y$, it suffices to prove that $X^n \to Y$ almost surely along a subsequence, which is true by the equivalence of the norms $\|\cdot\|$ and $\|\cdot\|_{\mathcal{C}}$.

Set $[M]_t = M^2_t - 2Y_t$. Then $[M] = ([M]_t)_{0 \le t \le T}$ is a continuous, adapted, non-decreasing process and $M^2 - [M] = 2Y$ is a martingale on $[0, T]$. We can extend this definition to all times in $[0, \infty)$ by applying the above for each $T = k \in \mathbb{N}$. By uniqueness, we note that the process obtained with $T = k$ must be the restriction to $[0, k]$ of the process obtained with $T = k+1$.

Finally we show that $[M]^n \to [M]$ ucp as $n \to \infty$. To this end, note that the convergence $X^n \to Y$ in $(\mathcal{M}^2_c, \|\cdot\|)$ implies, by the equivalence of $\|\cdot\|$ and $\|\cdot\|_{\mathcal{C}}$, that $\sup_{t \le T} |X^n_t - Y_t| \to 0$ in $L^2$ as $n \to \infty$, and so in particular
(4.5) \quad $\sup_{t \le T} |X^n_t - Y_t| \to 0$ in probability as $n \to \infty$.
Now, $[M]^n_t = M^2_{2^{-n} \lfloor 2^n t \rfloor} - 2X^n_{2^{-n} \lfloor 2^n t \rfloor}$, and so
\[ \sup_{t \le T} \big| [M]_t - [M]^n_t \big| \le \sup_{t \le T} \big| M^2_{2^{-n} \lfloor 2^n t \rfloor} - M^2_t \big| + 2 \sup_{t \le T} \big| X^n_{2^{-n} \lfloor 2^n t \rfloor} - Y_{2^{-n} \lfloor 2^n t \rfloor} \big| + 2 \sup_{t \le T} \big| Y_{2^{-n} \lfloor 2^n t \rfloor} - Y_t \big|. \]
Each term on the right hand side converges to $0$ in probability as $n \to \infty$. Indeed, this follows for the first and third terms by uniform continuity of $M^2$ and $Y$ on $[0, T]$, and for the second term by (4.5). Thus $[M]^n \to [M]$ ucp as $n \to \infty$.

Step 3: Existence, $M \in \mathcal{M}_{c,\mathrm{loc}}$. We now obtain existence for the general case $M \in \mathcal{M}_{c,\mathrm{loc}}$ by means of a localisation argument. For each $n \ge 1$, define
\[ T_n = \inf\{t \ge 0 : |M_t| \ge n\}. \]
Then $(T_n)_{n \ge 1}$ reduces $M$ and moreover $M^{T_n}$ is a bounded martingale by construction, for all $n$. Thus we can apply Step 2 to conclude that there exists a unique continuous, adapted and non-decreasing process $[M^{T_n}]$ on $[0, \infty)$ such that $[M^{T_n}]_0 = 0$ and $(M^{T_n})^2 - [M^{T_n}] \in \mathcal{M}_{c,\mathrm{loc}}$. Write $A^n = [M^{T_n}]$ for simplicity. By uniqueness, $(A^{n+1}_{t \wedge T_n})_{t \ge 0}$ and $(A^n_t)_{t \ge 0}$ must be indistinguishable. We thus let $A$ be the non-decreasing process such that $A_{t \wedge T_n} = A^n_t$ for all $n$. By construction, $M^2_{t \wedge T_n} - A_{t \wedge T_n}$ is a martingale for each $n$. Therefore $M^2 - A$ is a local martingale with reducing sequence $(T_n)_{n \ge 1}$, and we can take $[M]_t = A_t$.

Finally, to see that $[M]^n \to [M]$ ucp as $n \to \infty$, note that $[M^{T_k}]^n \to [M^{T_k}]$ ucp as $n \to \infty$ by Step 2, for all $k$; that is, for all $\varepsilon > 0$ it holds that
(4.6) \quad $\mathbb{P}\Big( \sup_{t \in [0,T]} \big| [M^{T_k}]^n_t - [M^{T_k}]_t \big| > \varepsilon \Big) \to 0$ as $n \to \infty$.
Observing that on the event $\{T_k > T\}$ we have $[M]^n_t = [M^{T_k}]^n_t$ and $[M]_t = [M^{T_k}]_t$ for all $t \le T$, we obtain
\[ \mathbb{P}\Big( \sup_{t \in [0,T]} \big| [M]^n_t - [M]_t \big| > \varepsilon \Big) \le \mathbb{P}(T_k \le T) + \mathbb{P}\Big( \sup_{t \in [0,T]} \big| [M^{T_k}]^n_t - [M^{T_k}]_t \big| > \varepsilon \Big) \to 0 \]
as $n \to \infty$, since $\mathbb{P}(T_k \le T) \to 0$ as $k \to \infty$, and the second term on the right hand side converges to $0$ as $n \to \infty$ for all $k$ by (4.6).

Example 4.12. Let $B$ be a standard Brownian motion. Then $(B^2_t - t)_{t \ge 0}$ is a martingale. Therefore $[B]_t = t$. As we will see later, the fact that $[B]_t = t$ singles out the standard Brownian motion among continuous local martingales. This is the so-called Lévy characterisation of Brownian motion.

The following result will be useful later on, and it shows that the a priori local martingale $M^2 - [M]$ is in fact a true UI martingale if $M \in \mathcal{M}^2_c$.

Theorem 4.13. If $M \in \mathcal{M}^2_c$, then $M^2 - [M]$ is a uniformly integrable martingale.

Proof. We show that the martingale is dominated by an integrable random variable. Let $S_n = \inf\{t \ge 0 : [M]_t \ge n\}$ for $n \ge 1$. Then $S_n \nearrow +\infty$ and $S_n$ is a stopping time for all $n$, with $[M]_{t \wedge S_n} \le n$. Therefore the local martingale $M^2_{t \wedge S_n} - [M]_{t \wedge S_n}$ is dominated by the random variable $n + \sup_{t \ge 0} M^2_t$, which is integrable by Doob's $L^2$ inequality. It then follows from Corollary 3.10 that $M^2_{t \wedge S_n} - [M]_{t \wedge S_n}$ is a true martingale. The OST then gives
\[ \mathbb{E}\big[ [M]_{t \wedge S_n} \big] = \mathbb{E}\big[ M^2_{t \wedge S_n} \big]. \]
Letting $t \to \infty$ and using the Monotone Convergence Theorem on the left hand side and the Dominated Convergence Theorem on the right hand side, we conclude that $\mathbb{E}[[M]_{S_n}] = \mathbb{E}[M^2_{S_n}]$. Sending $n \to \infty$ and using the same theorems to exchange limits and expectations, we find that
\[ \mathbb{E}\big[ [M]_\infty \big] = \mathbb{E}\big[ M^2_\infty \big] < \infty, \]
thus showing that $[M]_\infty$ is integrable. Moreover,
\[ \big| M^2_t - [M]_t \big| \le \sup_{t \ge 0} M^2_t + [M]_\infty, \]
and the right hand side is integrable, so we conclude that $M^2_t - [M]_t$ is a true UI martingale.

4.3.1. The space $L^2(M)$. We can finally define our space of integrands. Recall that $\mathcal{P}$ denotes the previsible $\sigma$-algebra on $\Omega \times (0, \infty)$. Given $M \in \mathcal{M}^2_c$, we define a measure $\mu$ on $\mathcal{P}$ by setting
\[ \mu(E \times (s, t]) = \mathbb{E}\big( 1_E ([M]_t - [M]_s) \big) \quad \text{for } s \le t, \; E \in \mathcal{F}_s. \]
Since $\mathcal{P}$ is generated by the $\pi$-system of events of the form $E \times (s, t]$ for $E, s, t$ as above, this uniquely specifies $\mu$. Note that this is equivalent to defining
\[ \mu(\mathrm{d}\omega\, \mathrm{d}t) := \mathrm{d}[M](\omega, \mathrm{d}t)\, \mathbb{P}(\mathrm{d}\omega), \]
where, for fixed $\omega \in \Omega$, the measure $\mathrm{d}[M](\omega, \cdot)$ is the Lebesgue–Stieltjes measure associated to the non-decreasing function $t \mapsto [M]_t(\omega)$. Then for a previsible process $H$ we have that
\[ \int_{\Omega \times (0, \infty)} H \, \mathrm{d}\mu = \mathbb{E}\Big( \int_0^\infty H_s \, \mathrm{d}[M]_s \Big). \]

Definition 4.14. Let $L^2(M) := L^2(\Omega \times (0, \infty), \mathcal{P}, \mu)$ and write
\[ \|H\|_{L^2(M)} = \|H\|_M := \Big[ \mathbb{E}\Big( \int_0^\infty H_s^2 \, \mathrm{d}[M]_s \Big) \Big]^{1/2}. \]
Then $L^2(M)$ is the set of previsible processes $H$ such that $\|H\|_M < \infty$. This is clearly a Hilbert space, and it will be our space of integrands.

Remark 4.15. Note that the space of integrands $(L^2(M), \|\cdot\|_M)$ depends on the integrator $M \in \mathcal{M}^2_c$, since the measure $\mu$ depends on it. On the other hand, recalling that $\mathcal{S}$ denotes the set of simple processes, we have that $\mathcal{S} \subseteq L^2(M)$ for all $M \in \mathcal{M}^2_c$.

4.4. Itô integrals. Up to now, we have identified two Hilbert spaces: that of integrators, $(\mathcal{M}^2_c, \|\cdot\|)$, and that of integrands, $(L^2(M), \|\cdot\|_M)$. For the subset $\mathcal{S}$ of $L^2(M)$ consisting of simple processes of the form $H_t = \sum_{k=0}^{n-1} Z_k 1_{(t_k, t_{k+1}]}(t)$, we have defined the integral $H \cdot M$ by setting
\[ (H \cdot M)_t = \sum_{k=0}^{n-1} Z_k (M_{t_{k+1} \wedge t} - M_{t_k \wedge t}) \in \mathcal{M}^2_c. \]
We now argue that for any $M \in \mathcal{M}^2_c$ the map $H \mapsto H \cdot M$ provides an isometry between $(L^2(M), \|\cdot\|_M)$ and $(\mathcal{M}^2_c, \|\cdot\|)$ when restricted to simple processes $\mathcal{S} \subseteq L^2(M)$. To see this, we have to show that $\|H \cdot M\| = \|H\|_M$ for all $H \in \mathcal{S}$, $M \in \mathcal{M}^2_c$. Indeed, recall from (4.2) that
\[ \|H \cdot M\|^2 = \|(H \cdot M)_\infty\|^2_{L^2} = \sum_{k=0}^{n-1} \mathbb{E}\big[ Z_k^2 (M_{t_{k+1}} - M_{t_k})^2 \big]. \]

Moreover, by Theorem 4.13 the process $M^2 - [M]$ is a true martingale, so that
\begin{align*}
\mathbb{E}\big[ Z_k^2 (M_{t_{k+1}} - M_{t_k})^2 \big] &= \mathbb{E}\big[ Z_k^2\, \mathbb{E}\big( (M_{t_{k+1}} - M_{t_k})^2 \mid \mathcal{F}_{t_k} \big) \big] \\
&= \mathbb{E}\big[ Z_k^2\, \mathbb{E}\big( M^2_{t_{k+1}} - M^2_{t_k} \mid \mathcal{F}_{t_k} \big) \big] \\
&= \mathbb{E}\big[ Z_k^2\, \mathbb{E}\big( [M]_{t_{k+1}} - [M]_{t_k} \mid \mathcal{F}_{t_k} \big) \big] \\
&= \mathbb{E}\big[ Z_k^2 \big( [M]_{t_{k+1}} - [M]_{t_k} \big) \big].
\end{align*}
Therefore
(4.7)
\[ \|H \cdot M\|^2 = \mathbb{E}\Big( \sum_{k=0}^{n-1} Z_k^2 \big( [M]_{t_{k+1}} - [M]_{t_k} \big) \Big) = \mathbb{E}\Big( \int_0^\infty H_s^2 \, \mathrm{d}[M]_s \Big) = \|H\|_M^2, \]
thus showing that the stochastic integral provides an isometry between $L^2(M)$ and $\mathcal{M}^2_c$ when restricted to simple processes. The following theorem is the main result of this section, and it tells us that this property uniquely defines the stochastic integral on the whole of $L^2(M)$.

Theorem 4.16 (Itô's isometry). There exists a unique isometry $I : L^2(M) \to \mathcal{M}^2_c$ such that $I(H) = H \cdot M$ for all $H \in \mathcal{S}$.

Definition 4.17. For any $M \in \mathcal{M}^2_c$ and $H \in L^2(M)$, let $H \cdot M := I(H)$, where $I$ is the unique isometry from $(L^2(M), \|\cdot\|_M)$ to $(\mathcal{M}^2_c, \|\cdot\|)$ of Theorem 4.16.

We now proceed with the proof of Theorem 4.16 above. To start with, we show that simple processes are dense in $L^2(M)$, so that there is hope to extend the definition of the stochastic integral from $\mathcal{S}$ to the whole of $L^2(M)$ uniquely.

Lemma 4.18. Let $\nu$ be a finite measure on $\mathcal{P}$, the previsible $\sigma$-algebra. Then $\mathcal{S}$ is a dense subspace of $L^2(\mathcal{P}, \nu)$. In particular, taking $\nu = \mu$ we have that $\mathcal{S}$ is a dense subspace of $L^2(M)$ for any $M \in \mathcal{M}^2_c$.

Proof. It is certainly the case that $\mathcal{S} \subseteq L^2(\mathcal{P}, \nu)$. Denote by $\bar{\mathcal{S}}$ the closure of $\mathcal{S}$ in $L^2(\mathcal{P}, \nu)$. We aim to show that $\bar{\mathcal{S}} = L^2(\mathcal{P}, \nu)$. Let $\mathcal{A} = \{A \in \mathcal{P} : 1_A \in \bar{\mathcal{S}}\}$. Then clearly $\mathcal{A} \subseteq \mathcal{P}$. To see the reverse inclusion, observe that $\mathcal{A}$ is a d-system containing the $\pi$-system $\{E \times (s, t] : E \in \mathcal{F}_s, \, s \le t\}$, which generates $\mathcal{P}$. So, by Dynkin's lemma, we have that $\mathcal{A} \supseteq \mathcal{P}$ and hence $\mathcal{A} = \mathcal{P}$. This shows that $\bar{\mathcal{S}}$ contains the indicator functions of all previsible sets. The result then follows because finite linear combinations of such indicators are dense in $L^2(\mathcal{P}, \nu)$.

We can now prove Itô's isometry.

Proof of Theorem 4.16. Take any $H \in L^2(M)$. Then by Lemma 4.18 there exists a sequence $(H^n)_{n \ge 1}$ in $\mathcal{S} \subseteq L^2(M)$ such that $\|H^n - H\|_M \to 0$ as $n \to \infty$, that is
\[ \mathbb{E}\Big( \int_0^\infty (H^n_s - H_s)^2 \, \mathrm{d}[M]_s \Big) \to 0 \quad \text{as } n \to \infty. \]
In particular, $(H^n)_{n \ge 1}$ is Cauchy in the $\|\cdot\|_M$ norm. We argue that this implies that $(I(H^n))_{n \ge 1}$ is Cauchy in the $\|\cdot\|$ norm. Indeed, by linearity of the integral for simple processes we have
\[ \|I(H^n) - I(H^m)\| = \|H^n \cdot M - H^m \cdot M\| = \|(H^n - H^m) \cdot M\| = \|H^n - H^m\|_M \to 0 \quad \text{as } n, m \to \infty, \]
where the last equality follows from (4.7). Thus $I(H^n) = H^n \cdot M$ converges in the $\|\cdot\|$ norm to an element of $\mathcal{M}^2_c$, which we define to be $I(H)$. Note that this definition does not depend on the choice of the approximating sequence. Indeed, if $(K^n)_{n \ge 1}$ is another sequence of simple processes converging to $H$ in the $\|\cdot\|_M$ norm, then
\[ \|I(H^n) - I(K^n)\| = \|H^n \cdot M - K^n \cdot M\| = \|H^n - K^n\|_M \le \|H^n - H\|_M + \|K^n - H\|_M \to 0 \quad \text{as } n \to \infty, \]
thus showing that the limits of $H^n \cdot M$ and $K^n \cdot M$ must be indistinguishable. This uniquely extends $I$ to the whole of $L^2(M)$. It only remains to show that the resulting map is an isometry from $L^2(M)$ to $\mathcal{M}^2_c$. Indeed, since convergence in norm implies convergence of the norms by the reverse triangle inequality $\big| \|x\| - \|y\| \big| \le \|x - y\|$, we have
\[ \|I(H)\| = \lim_{n \to \infty} \|H^n \cdot M\| = \lim_{n \to \infty} \|H^n\|_M = \|H\|_M \]
for all $H \in L^2(M)$, as required.

We write
\[ I(H)_t = (H \cdot M)_t = \int_0^t H_s \, \mathrm{d}M_s. \]
The process $H \cdot M$ is Itô's stochastic integral of $H$ with respect to $M$.

Remark 4.19. Recall from the introduction that we were seeking to define $\int H \, \mathrm{d}M$ in such a way that if $H^n$ is a sequence of simple processes converging to $H$, then $\int H^n \, \mathrm{d}M \to \int H \, \mathrm{d}M$. The above definition matches this requirement by construction, where the convergence $H^n \to H$ is to be understood in the $\|\cdot\|_M$ norm, and that of the integrals $\int H^n \, \mathrm{d}M \to \int H \, \mathrm{d}M$ in the $\|\cdot\|$ norm.
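As a closing sanity check (not part of the notes; assumes NumPy), the isometry $\|H \cdot M\| = \|H\|_M$ can be observed by Monte Carlo for $M = B$ a Brownian motion and the grid-simple previsible integrand taking the value $B_{t_k}$ on each $(t_k, t_{k+1}]$; by (4.7), the quantities $\mathbb{E}[(H \cdot B)_T^2]$ and $\mathbb{E}\big[\int_0^T H_s^2 \, \mathrm{d}s\big]$ (recall $[B]_s = s$) should agree.

```python
import numpy as np

rng = np.random.default_rng(0)

n_paths, n_steps, T = 50_000, 64, 1.0
dt = T / n_steps
dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B = np.hstack([np.zeros((n_paths, 1)), np.cumsum(dB, axis=1)])

# Simple previsible integrand: H = B_{t_k} on (t_k, t_{k+1}], so that
# (H . B)_T = sum_k B_{t_k} (B_{t_{k+1}} - B_{t_k}).
H = B[:, :-1]
stoch_int = (H * dB).sum(axis=1)

lhs = np.mean(stoch_int**2)             # E[(H . B)_T^2] = ||H . B||^2
rhs = np.mean((H**2).sum(axis=1) * dt)  # E[int_0^T H_s^2 d[B]_s],  d[B]_s = ds

print(lhs, rhs)  # both approximately T^2/2
```

For this integrand both sides equal $\sum_k t_k\, \Delta t \approx T^2/2$, and the two independent Monte Carlo estimates match to within sampling error, which is the content of (4.7) for grid-simple $H$.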