Continuous Time Finance, Lisbon 2013. Tomas Björk, Stockholm School of Economics.
Contents
Stochastic Calculus (Ch 4-5). Black-Scholes (Ch 6-7). Completeness and hedging (Ch 8-9). The martingale approach (Ch 10-12). Incomplete markets (Ch 15). Dividends (Ch 16). Currency derivatives (Ch 17). Stochastic Control Theory (Ch 19). Martingale Methods for Optimal Investment (Ch 20).
Textbook: Björk, T.: Arbitrage Theory in Continuous Time, Oxford University Press, 2009 (3rd ed.)
Notation
X_t = any random process, dt = small time step, dX_t = X_{t+dt} − X_t. We often write X(t) instead of X_t. dX_t is called the increment of X over the interval [t, t+dt]. For any fixed interval [t, t+dt], the increment dX_t is a stochastic variable. If the increments dX_s and dX_t, over the disjoint intervals [s, s+ds] and [t, t+dt], are independent, then we say that X has independent increments. If every increment has a normal distribution we say that X is a normal, or Gaussian, process.
The Wiener Process
A stochastic process W is called a Wiener process if it has the following properties:
- The increments are normally distributed: for s < t, W_t − W_s ∼ N[0, t − s], i.e. E[W_t − W_s] = 0, Var[W_t − W_s] = t − s.
- W has independent increments.
- W_0 = 0.
- W has continuous trajectories.
Intuitively: a continuous random walk.
Note: In Hull, a Wiener process is typically denoted by Z instead of W.
A Wiener Trajectory (figure: a simulated Wiener trajectory over t ∈ [0, 2])
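A trajectory like the one in the figure can be simulated by summing independent N[0, dt] increments. A minimal sketch, assuming NumPy is available (the grid size and seed are illustrative, not from the text):

```python
import numpy as np

# Simulate a Wiener trajectory on [0, T] by cumulating independent
# increments dW ~ N[0, dt], starting from W_0 = 0.
rng = np.random.default_rng(seed=0)
T, n = 2.0, 2000
dt = T / n
dW = rng.normal(loc=0.0, scale=np.sqrt(dt), size=n)   # increments ~ N[0, dt]
W = np.concatenate([[0.0], np.cumsum(dW)])            # W_0 = 0
t = np.linspace(0.0, T, n + 1)
```

Plotting `t` against `W` reproduces a picture of the kind shown above; the path is continuous but visibly rough at every scale, in line with the nowhere-differentiability result below.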
Important Fact
Theorem: A Wiener trajectory is, with probability one, a continuous curve which is nowhere differentiable.
Proof: Hard.
Wiener Process with Drift
A stochastic process X is called a Wiener process with drift µ and diffusion coefficient σ if it has the following dynamics:
dX_t = µ dt + σ dW_t,
where µ and σ are constants. Summing all increments over the interval [0, t] gives us
X_t − X_0 = µt + σ(W_t − W_0),  i.e.  X_t = X_0 + µt + σW_t.
Thus X_t ∼ N[X_0 + µt, σ²t].
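The distributional claim X_T ∼ N[X_0 + µT, σ²T] is easy to check by Monte Carlo. A minimal sketch, assuming NumPy; the parameter values are illustrative:

```python
import numpy as np

# Monte Carlo check that X_T = X_0 + mu*T + sigma*W_T has
# mean X_0 + mu*T and variance sigma^2 * T.
rng = np.random.default_rng(seed=1)
x0, mu, sigma, T = 1.0, 0.3, 0.5, 2.0
n_paths = 200_000
W_T = rng.normal(0.0, np.sqrt(T), size=n_paths)   # W_T ~ N[0, T]
X_T = x0 + mu * T + sigma * W_T
mean_est = X_T.mean()   # should be close to x0 + mu*T = 1.6
var_est = X_T.var()     # should be close to sigma^2 * T = 0.5
```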
Itô processes
We say, loosely speaking, that the process X is an Itô process if it has dynamics of the form
dX_t = µ_t dt + σ_t dW_t,
where µ_t and σ_t are random processes. Informally you can think of dW_t as a random variable of the form dW_t ∼ N[0, dt]. To handle expressions like the one above, we need some mathematical theory. First, however, we present an important example, which we will discuss informally.
Example: The Black-Scholes model
Price dynamics (geometric Brownian motion):
dS_t = µS_t dt + σS_t dW_t.
Simple analysis: Assume that σ = 0. Then dS_t = µS_t dt. Divide by dt:
dS_t/dt = µS_t.
This is a simple ordinary differential equation with solution S_t = S_0 e^{µt}.
Conjecture: The solution of the SDE above is a randomly disturbed exponential function.
Intuitive Economic Interpretation
dS_t / S_t = µ dt + σ dW_t
Over a small time interval [t, t+dt] this means:
Return = (mean return) + σ · (Gaussian random disturbance).
The asset return is a random walk (with drift).
µ = mean rate of return per unit time, σ = volatility.
Large σ = large random fluctuations; small σ = small random fluctuations.
The returns are normal. The stock price is lognormal.
A GBM Trajectory (figure: a simulated GBM trajectory over t ∈ [0, 2])
Stochastic Differentials and Integrals
Consider an expression of the form
dX_t = µ_t dt + σ_t dW_t,  X_0 = x_0.
Question: What exactly do we mean by this?
Answer: Write the equation in integrated form as
X_t = x_0 + ∫_0^t µ_s ds + ∫_0^t σ_s dW_s.
How is this interpreted?
Recall:
X_t = x_0 + ∫_0^t µ_s ds + ∫_0^t σ_s dW_s.
Two terms:
∫_0^t µ_s ds — a standard Riemann integral for each µ-trajectory.
∫_0^t σ_s dW_s — a stochastic integral. This cannot be interpreted as a Stieltjes integral for each trajectory. We need a new theory for this Itô integral.
Information
Consider a Wiener process W.
Def: F_t^W = the information generated by W over the interval [0, t].
Def: Let Z be a stochastic variable. If the value of Z is completely determined by F_t^W, we write Z ∈ F_t^W.
Ex: For the stochastic variable Z defined by Z = ∫_0^5 W_s ds, we have Z ∈ F_5^W. We do not have Z ∈ F_4^W.
Adapted Processes
Let W be a Wiener process.
Definition: A process X is adapted to the filtration {F_t^W : t ≥ 0} if
X_t ∈ F_t^W for all t ≥ 0.
An adapted process does not look into the future. Adapted processes are nice integrands for stochastic integrals.
Examples:
The process X_t = ∫_0^t W_s ds is adapted.
The process X_t = sup_{s ≤ t} W_s is adapted.
The process X_t = sup_{s ≤ t+1} W_s is not adapted.
The Itô Integral
We will define the Itô integral ∫_a^b g_s dW_s for processes g satisfying:
- The process g is adapted.
- The process g satisfies ∫_a^b E[g_s²] ds < ∞.
This will be done in two steps.
Simple Integrands
Definition: The process g is simple if:
- g is adapted.
- There exist deterministic points t_0, ..., t_n with a = t_0 < t_1 < ... < t_n = b such that g is piecewise constant, i.e. g(s) = g(t_k) for s ∈ [t_k, t_{k+1}).
For simple g we define
∫_a^b g_s dW_s = Σ_{k=0}^{n−1} g(t_k) [W(t_{k+1}) − W(t_k)].
Note: FORWARD INCREMENTS!
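The forward-increment convention can be illustrated numerically. For g = W, the Itô integral ∫_0^T W_s dW_s equals (W_T² − T)/2, and sums that evaluate g at the left endpoint of each subinterval converge to exactly that value. A minimal sketch, assuming NumPy; grid size and seed are illustrative:

```python
import numpy as np

# Ito sum with FORWARD increments: sum_k W(t_k)[W(t_{k+1}) - W(t_k)].
# For g = W the limit is (W_T^2 - T)/2, not W_T^2/2 as in ordinary calculus.
rng = np.random.default_rng(seed=2)
T, n = 1.0, 100_000
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), size=n)
W = np.concatenate([[0.0], np.cumsum(dW)])
ito_sum = np.sum(W[:-1] * dW)       # evaluate the integrand at the LEFT endpoint
exact = 0.5 * (W[-1] ** 2 - T)      # Ito value of the integral
```

Evaluating the integrand at the right endpoint (or midpoint) instead would converge to a different limit, which is why the forward-increment convention matters.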
Properties of the Integral
Theorem: For simple g the following relations hold:
- The expected value is given by E[∫_a^b g_s dW_s] = 0.
- The second moment is given by E[(∫_a^b g_s dW_s)²] = ∫_a^b E[g_s²] ds.
- We have ∫_a^b g_s dW_s ∈ F_b^W.
General Case
For a general g we do as follows:
1. Approximate g with a sequence of simple g_n such that ∫_a^b E[{g_n(s) − g(s)}²] ds → 0.
2. For each n the integral ∫_a^b g_n(s) dW(s) is a well defined stochastic variable Z_n.
3. One can show that the sequence Z_n converges to a limiting stochastic variable.
4. We define ∫_a^b g dW by
∫_a^b g(s) dW(s) = lim_{n→∞} ∫_a^b g_n(s) dW(s).
Properties of the Integral
Theorem: For general g the following relations hold:
- The expected value is given by E[∫_a^b g_s dW_s] = 0.
- We do in fact have E[∫_a^b g_s dW_s | F_a] = 0.
- The second moment is given by E[(∫_a^b g_s dW_s)²] = ∫_a^b E[g_s²] ds.
- We have ∫_a^b g_s dW_s ∈ F_b^W.
Martingales
Definition: An adapted process X is a martingale if
E[X_t | F_s] = X_s, for all s ≤ t.
A martingale is a process without drift.
Proposition: For any g (sufficiently integrable) the process
X_t = ∫_0^t g_s dW_s
is a martingale.
Proposition: If X has dynamics dX_t = µ_t dt + σ_t dW_t, then X is a martingale iff µ = 0.
Continuous Time Finance: Stochastic Calculus (Ch 4-5)
Stochastic Calculus
General model:
dX_t = µ_t dt + σ_t dW_t.
Let the function f(t, x) be given, and define the stochastic process Z_t by Z_t = f(t, X_t).
Problem: What does df(t, X_t) look like?
The answer is given by the Itô formula. We provide an intuitive argument. The formal proof is very hard.
A close up of the Wiener process
Consider an infinitesimal Wiener increment
dW_t = W_{t+dt} − W_t.
We know: dW_t ∼ N[0, dt], i.e. E[dW_t] = 0, Var[dW_t] = dt.
From this one can show
E[(dW_t)²] = dt,  Var[(dW_t)²] = 2(dt)².
Recall:
E[(dW_t)²] = dt,  Var[(dW_t)²] = 2(dt)².
Important observation:
1. Both E[(dW_t)²] and Var[(dW_t)²] are very small when dt is small.
2. Var[(dW_t)²] is negligible compared to E[(dW_t)²].
3. Thus (dW_t)² is deterministic.
We thus conclude, at least intuitively, that (dW_t)² = dt. This was only an intuitive argument, but it can be proved rigorously.
Multiplication table
Theorem: We have the following multiplication table:
(dt)² = 0,  dW_t · dt = 0,  (dW_t)² = dt.
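The rule (dW_t)² = dt can be seen numerically: the sum of squared Wiener increments over [0, T] (the quadratic variation) concentrates around T as the grid is refined. A minimal sketch, assuming NumPy; grid size and seed are illustrative:

```python
import numpy as np

# Sum of (dW)^2 over [0, T] on a fine grid. Each term has mean dt and
# tiny variance 2*dt^2, so the sum is nearly deterministic and equal to T.
rng = np.random.default_rng(seed=3)
T = 1.0
n = 1_000_000
dW = rng.normal(0.0, np.sqrt(T / n), size=n)
qv = float(np.sum(dW ** 2))   # quadratic variation estimate, ~ T
```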
Deriving the Itô formula
dX_t = µ_t dt + σ_t dW_t,  Z_t = f(t, X_t).
We want to compute df(t, X_t). Make a Taylor expansion of f(t, X_t) including second order terms:
df = (∂f/∂t) dt + (∂f/∂x) dX_t + ½ (∂²f/∂t²)(dt)² + ½ (∂²f/∂x²)(dX_t)² + (∂²f/∂t∂x) dt dX_t.
Plug in the expression for dX, expand, and use the multiplication table!
df = (∂f/∂t) dt + (∂f/∂x)[µ dt + σ dW] + ½ (∂²f/∂t²)(dt)² + ½ (∂²f/∂x²)[µ dt + σ dW]² + (∂²f/∂t∂x) dt [µ dt + σ dW]
= (∂f/∂t) dt + µ (∂f/∂x) dt + σ (∂f/∂x) dW + ½ (∂²f/∂t²)(dt)² + ½ (∂²f/∂x²)[µ²(dt)² + σ²(dW)² + 2µσ dt dW] + µ (∂²f/∂t∂x)(dt)² + σ (∂²f/∂t∂x) dt dW.
Using the multiplication table this reduces to:
df = { ∂f/∂t + µ ∂f/∂x + ½ σ² ∂²f/∂x² } dt + σ (∂f/∂x) dW.
The Itô Formula
Theorem: With X dynamics given by dX_t = µ_t dt + σ_t dW_t, we have
df(t, X_t) = { ∂f/∂t + µ ∂f/∂x + ½ σ² ∂²f/∂x² } dt + σ (∂f/∂x) dW_t.
Alternatively,
df(t, X_t) = (∂f/∂t) dt + (∂f/∂x) dX_t + ½ (∂²f/∂x²)(dX_t)²,
where we use the multiplication table.
Example: GBM
dS_t = µS_t dt + σS_t dW_t.
We smell something exponential! Natural Ansatz: S_t = e^{Z_t}, i.e. Z_t = ln S_t.
Itô on f(t, s) = ln(s) gives us
∂f/∂s = 1/s,  ∂f/∂t = 0,  ∂²f/∂s² = −1/s²,
dZ_t = (1/S_t) dS_t − ½ (1/S_t²)(dS_t)² = (µ − ½σ²) dt + σ dW_t.
Recall:
dZ_t = (µ − ½σ²) dt + σ dW_t.
Integrate:
Z_t − Z_0 = ∫_0^t (µ − ½σ²) ds + σ ∫_0^t dW_s = (µ − ½σ²)t + σW_t.
Using S_t = e^{Z_t} gives us
S_t = S_0 e^{(µ − ½σ²)t + σW_t}.
Since W_t is N[0, t], we see that S_t has a lognormal distribution.
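The explicit solution makes GBM easy to simulate without discretizing the SDE, and the lognormal claim can be checked directly: ln S_T should be normal with mean ln S_0 + (µ − ½σ²)T and variance σ²T. A minimal sketch, assuming NumPy; parameter values are illustrative:

```python
import numpy as np

# Exact simulation of GBM via S_T = S_0 * exp((mu - sigma^2/2) T + sigma W_T),
# followed by a check of the moments of log(S_T).
rng = np.random.default_rng(seed=4)
s0, mu, sigma, T = 100.0, 0.08, 0.2, 1.0
n_paths = 200_000
W_T = rng.normal(0.0, np.sqrt(T), size=n_paths)
S_T = s0 * np.exp((mu - 0.5 * sigma ** 2) * T + sigma * W_T)
log_mean = np.log(S_T).mean()   # ~ log(s0) + (mu - sigma^2/2) T
log_var = np.log(S_T).var()     # ~ sigma^2 * T
```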
Changing Measures
Consider a probability measure P on (Ω, F), and assume that L ∈ F is a random variable with the properties that L ≥ 0 and E^P[L] = 1. For every event A ∈ F we now define the real number Q(A) by the prescription
Q(A) = E^P[L · I_A],
where the random variable I_A is the indicator for A, i.e. I_A = 1 if A occurs and I_A = 0 if A^c occurs.
Recall that Q(A) = E^P[L · I_A]. We now see that Q(A) ≥ 0 for all A, and that
Q(Ω) = E^P[L · I_Ω] = E^P[L · 1] = 1.
We also see that if A ∩ B = ∅ then
Q(A ∪ B) = E^P[L · I_{A∪B}] = E^P[L · (I_A + I_B)] = E^P[L · I_A] + E^P[L · I_B] = Q(A) + Q(B).
Furthermore we see that
P(A) = 0 ⇒ Q(A) = 0.
We have thus more or less proved the following.
Proposition 2: If L ∈ F is a nonnegative random variable with E^P[L] = 1 and Q is defined by Q(A) = E^P[L · I_A], then Q will be a probability measure on F with the property that P(A) = 0 ⇒ Q(A) = 0.
It turns out that the property above is a very important one, so we give it a name.
Absolute Continuity
Definition: Given two probability measures P and Q on F we say that Q is absolutely continuous w.r.t. P on F if, for all A ∈ F, we have
P(A) = 0 ⇒ Q(A) = 0.
We write this as Q << P. If Q << P and P << Q then we say that P and Q are equivalent and write Q ∼ P.
Equivalent measures
It is easy to see that P and Q are equivalent if and only if
P(A) = 0 ⇔ Q(A) = 0
or, equivalently,
P(A) = 1 ⇔ Q(A) = 1.
Two equivalent measures thus agree on all certain events and on all impossible events, but can disagree on all other events.
Simple examples: All non-degenerate Gaussian distributions on R are equivalent. If P is Gaussian on R and Q is exponential then Q << P but not the other way around.
Absolute Continuity, ctd.
We have seen that if we are given P and define Q by Q(A) = E^P[L · I_A] for L ≥ 0 with E^P[L] = 1, then Q is a probability measure and Q << P. A natural question is now whether all measures Q << P are obtained in this way. The answer is yes, and the precise (quite deep) result is as follows. The proof is difficult and therefore omitted.
The Radon-Nikodym Theorem
Consider two probability measures P and Q on (Ω, F), and assume that Q << P on F. Then there exists a unique random variable L with the following properties:
1. Q(A) = E^P[L · I_A] for all A ∈ F,
2. L ≥ 0, P-a.s.,
3. E^P[L] = 1,
4. L ∈ F.
The random variable L is denoted as
L = dQ/dP, on F,
and it is called the Radon-Nikodym derivative of Q w.r.t. P on F, or the likelihood ratio between Q and P on F.
A simple example
The Radon-Nikodym derivative L is intuitively the local scale factor between P and Q. If the sample space Ω is finite, so Ω = {ω_1, ..., ω_n}, then P is determined by the probabilities p_1, ..., p_n where
p_i = P(ω_i), i = 1, ..., n.
Now consider a measure Q with probabilities
q_i = Q(ω_i), i = 1, ..., n.
If Q << P this simply says that p_i = 0 ⇒ q_i = 0, and it is easy to see that the Radon-Nikodym derivative L = dQ/dP is given by
L(ω_i) = q_i / p_i, i = 1, ..., n.
If p_i = 0 then we also have q_i = 0 and we can define the ratio q_i/p_i arbitrarily. If p_1, ..., p_n as well as q_1, ..., q_n are all positive, then we see that Q ∼ P and in fact
dP/dQ = 1/L = (dQ/dP)^{-1},
as could be expected.
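The finite sample space case can be worked through concretely: with L(ω_i) = q_i/p_i, the prescription Q(A) = E^P[L · I_A] recovers the Q-probabilities exactly. A minimal sketch, assuming NumPy; the probability vectors are illustrative, not from the text:

```python
import numpy as np

# Finite Omega = {w_1, w_2, w_3}: L = dQ/dP is the pointwise ratio q_i/p_i,
# and Q(A) = E^P[L * I_A] = sum_i p_i * L(w_i) * I_A(w_i).
p = np.array([0.5, 0.3, 0.2])     # P(w_i), all positive
q = np.array([0.25, 0.25, 0.5])   # Q(w_i)
L = q / p                         # Radon-Nikodym derivative dQ/dP
I_A = np.array([1.0, 1.0, 0.0])   # indicator of A = {w_1, w_2}
Q_A = np.sum(p * L * I_A)         # = q_1 + q_2 = Q(A)
EP_L = np.sum(p * L)              # = 1, as required
```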
The likelihood process on a filtered space
We now consider the case when we have a probability measure P on some space Ω and when, instead of just one σ-algebra F, we have a filtration, i.e. an increasing family of σ-algebras {F_t}_{t≥0}. The interpretation is as usual that F_t is the information available to us at time t, and that we have F_s ⊆ F_t for s ≤ t. Now assume that we also have another measure Q, and that for some fixed T we have Q << P on F_T. We define the random variable L_T by
L_T = dQ/dP on F_T.
Since Q << P on F_T we also have Q << P on F_t for all t ≤ T, and we define
L_t = dQ/dP on F_t, 0 ≤ t ≤ T.
For every t we have L_t ∈ F_t, so L is an adapted process, known as the likelihood process.
The L process is a P-martingale
We recall that
L_t = dQ/dP on F_t, 0 ≤ t ≤ T.
Since F_s ⊆ F_t for s ≤ t we can use Proposition 5 and deduce that
L_s = E^P[L_t | F_s], s ≤ t ≤ T,
and we have thus proved the following result.
Proposition: Given the assumptions above, the likelihood process L is a P-martingale.
Where are we heading?
We are now going to perform measure transformations on Wiener spaces, where P will correspond to the objective measure and Q will be the risk neutral measure. For this we need to define the proper likelihood process L and, since L is a P-martingale, we have the following natural questions:
- What does a martingale look like in a Wiener driven framework?
- Suppose that we have a P-Wiener process W and then change measure from P to Q. What are the properties of W under the new measure Q?
These questions are handled by the Martingale Representation Theorem and the Girsanov Theorem, respectively.
4. The Martingale Representation Theorem
Intuition
Suppose that we have a Wiener process W under the measure P. We recall that if h is adapted (and integrable enough) and if the process X is defined by
X_t = x_0 + ∫_0^t h_s dW_s,
then X is a martingale. We now have the following natural question:
Question: Assume that X is an arbitrary martingale. Does it then follow that X has the form
X_t = x_0 + ∫_0^t h_s dW_s
for some adapted process h? In other words: are all martingales stochastic integrals w.r.t. W?
Answer
It is immediately clear that not all martingales can be written as stochastic integrals w.r.t. W. Consider for example the process X defined by
X_t = 0 for 0 ≤ t < 1,  X_t = Z for t ≥ 1,
where Z is a random variable, independent of W, with E[Z] = 0. X is then a martingale (why?) but it is clear (how?) that it cannot be written as
X_t = x_0 + ∫_0^t h_s dW_s
for any process h.
Intuition
The intuitive reason why we cannot write
X_t = x_0 + ∫_0^t h_s dW_s
in the example above is of course that the random variable Z has nothing to do with the Wiener process W. In order to exclude examples like this, we thus need an assumption which guarantees that our probability space only contains the Wiener process W and nothing else. This idea is formalized by assuming that the filtration {F_t}_{t≥0} is the one generated by the Wiener process W.
The Martingale Representation Theorem
Theorem: Let W be a P-Wiener process and assume that the filtration is the internal one, i.e.
F_t = F_t^W = σ{W_s ; 0 ≤ s ≤ t}.
Then, for every (P, F_t)-martingale X, there exists a real number x and an adapted process h such that
X_t = x + ∫_0^t h_s dW_s,  i.e.  dX_t = h_t dW_t.
Proof: Hard. This is a very deep result.
Note
For a given martingale X, the Representation Theorem above guarantees the existence of a process h such that
X_t = x + ∫_0^t h_s dW_s.
The Theorem does not, however, tell us how to find or construct the process h.
5. The Girsanov Theorem
Setup
Let W be a P-Wiener process and fix a time horizon T. Suppose that we want to change measure from P to Q on F_T. For this we need a P-martingale L with L_0 = 1 to use as a likelihood process, and a natural way of constructing this is to choose a process g and then define L by
dL_t = g_t dW_t,  L_0 = 1.
This definition does not guarantee that L ≥ 0, so we make a small adjustment. We choose a process ϕ and define L by
dL_t = L_t ϕ_t dW_t,  L_0 = 1.
The process L will again be a martingale and we easily obtain
L_t = exp{ ∫_0^t ϕ_s dW_s − ½ ∫_0^t ϕ_s² ds }.
Thus we are guaranteed that L ≥ 0. We now change measure from P to Q by setting
dQ = L_t dP, on F_t, 0 ≤ t ≤ T.
The main problem is to find out what the properties of W are under the new measure Q. This problem is resolved by the Girsanov Theorem.
The Girsanov Theorem
Let W be a P-Wiener process. Fix a time horizon T.
Theorem: Choose an adapted process ϕ, and define the process L by
dL_t = L_t ϕ_t dW_t,  L_0 = 1.
Assume that E^P[L_T] = 1, and define a new measure Q on F_T by
dQ = L_t dP, on F_t, 0 ≤ t ≤ T.
Then Q << P and the process W^Q, defined by
W_t^Q = W_t − ∫_0^t ϕ_s ds,
is Q-Wiener. We can also write this as
dW_t = ϕ_t dt + dW_t^Q.
Changing the drift in an SDE
The single most common use of the Girsanov Theorem is as follows. Suppose that we have a process X with P-dynamics
dX_t = µ_t dt + σ_t dW_t,
where µ and σ are adapted and W is P-Wiener. We now do a Girsanov transformation as above, and the question is what the Q-dynamics look like. From the Girsanov Theorem we have
dW_t = ϕ_t dt + dW_t^Q,
and substituting this into the P-dynamics we obtain the Q-dynamics as
dX_t = { µ_t + σ_t ϕ_t } dt + σ_t dW_t^Q.
Moral: The drift changes but the diffusion is unaffected.
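The drift change can be checked by Monte Carlo in the simplest case of a constant Girsanov kernel ϕ: then L_T = exp(ϕW_T − ½ϕ²T), and since W has drift ϕ under Q we should find E^Q[W_T] = E^P[L_T · W_T] = ϕT. A minimal sketch, assuming NumPy; the value of ϕ is illustrative:

```python
import numpy as np

# Girsanov with constant phi: compute E^Q[W_T] as a P-expectation weighted
# by the likelihood ratio L_T = exp(phi*W_T - phi^2*T/2). Expect ~ phi*T.
rng = np.random.default_rng(seed=5)
phi, T = 0.4, 1.0
n_paths = 500_000
W_T = rng.normal(0.0, np.sqrt(T), size=n_paths)       # P-distribution of W_T
L_T = np.exp(phi * W_T - 0.5 * phi ** 2 * T)          # likelihood ratio
q_mean = np.mean(L_T * W_T)    # Monte Carlo E^Q[W_T], should be ~ phi*T = 0.4
p_mass = np.mean(L_T)          # E^P[L_T], should be ~ 1
```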
1. Dynamic Programming
- The basic idea.
- Deriving the HJB equation.
- The verification theorem.
- The linear quadratic regulator.
Problem Formulation
max_u E[ ∫_0^T F(t, X_t, u_t) dt + Φ(X_T) ]
subject to
dX_t = µ(t, X_t, u_t) dt + σ(t, X_t, u_t) dW_t,
X_0 = x_0,
u_t ∈ U(t, X_t), for all t.
We will only consider feedback control laws, i.e. controls of the form u_t = u(t, X_t).
Terminology: X = state variable, u = control variable, U = control constraint.
Note: No state space constraints.
Main idea
- Embed the problem above in a family of problems, indexed by starting point in time and space.
- Tie all these problems together by a PDE: the Hamilton-Jacobi-Bellman equation.
- The control problem is reduced to the problem of solving the deterministic HJB equation.
Some notation
For any fixed vector u ∈ R^k, the functions µ^u, σ^u and C^u are defined by
µ^u(t, x) = µ(t, x, u),  σ^u(t, x) = σ(t, x, u),  C^u(t, x) = σ(t, x, u)σ(t, x, u)′.
For any control law u, the functions µ^u, σ^u, C^u and F^u are defined by
µ^u(t, x) = µ(t, x, u(t, x)),  σ^u(t, x) = σ(t, x, u(t, x)),  C^u(t, x) = σ(t, x, u(t, x))σ(t, x, u(t, x))′,  F^u(t, x) = F(t, x, u(t, x)).
More notation
For any fixed vector u ∈ R^k, the partial differential operator A^u is defined by
A^u = Σ_{i=1}^n µ_i^u(t, x) ∂/∂x_i + ½ Σ_{i,j=1}^n C_{ij}^u(t, x) ∂²/∂x_i∂x_j.
For any control law u, the partial differential operator A^u is defined analogously, with µ^u and C^u evaluated along the control law.
For any control law u, the process X^u is the solution of the SDE
dX_t^u = µ(t, X_t^u, u_t) dt + σ(t, X_t^u, u_t) dW_t,  where u_t = u(t, X_t^u).
Embedding the problem
For every fixed (t, x) the control problem P_{t,x} is defined as the problem to maximize
E_{t,x}[ ∫_t^T F(s, X_s^u, u_s) ds + Φ(X_T^u) ],
given the dynamics
dX_s^u = µ(s, X_s^u, u_s) ds + σ(s, X_s^u, u_s) dW_s,  X_t = x,
and the constraints
u(s, y) ∈ U, for all (s, y) ∈ [t, T] × R^n.
The original problem was P_{0,x_0}.
The optimal value function
The value function J : R_+ × R^n × U → R is defined by
J(t, x, u) = E[ ∫_t^T F(s, X_s^u, u_s) ds + Φ(X_T^u) ],
given the dynamics above. The optimal value function V : R_+ × R^n → R is defined by
V(t, x) = sup_{u ∈ U} J(t, x, u).
We want to derive a PDE for V.
Assumptions
We assume:
- There exists an optimal control law û.
- The optimal value function V is regular in the sense that V ∈ C^{1,2}.
- A number of limiting procedures in the following arguments can be justified.
Bellman Optimality Principle
Theorem: If a control law û is optimal for the time interval [t, T], then it is also optimal for all smaller intervals [s, T] where s ≥ t.
Proof: Exercise.
Basic strategy
To derive the PDE do as follows:
- Fix (t, x) ∈ (0, T) × R^n.
- Choose a real number h (interpreted as a small time increment).
- Choose an arbitrary control law u on the time interval [t, t+h].
Now define the control law u* by
u*(s, y) = u(s, y) for (s, y) ∈ [t, t+h] × R^n,
u*(s, y) = û(s, y) for (s, y) ∈ (t+h, T] × R^n.
In other words, if we use u* then we use the arbitrary control u during the time interval [t, t+h], and then we switch to the optimal control law during the rest of the time period.
Basic idea
The whole idea of DynP boils down to the following procedure:
- Given the point (t, x) above, we consider the following two strategies over the time interval [t, T]:
  I: Use the optimal law û.
  II: Use the control law u* defined above.
- Compute the expected utilities obtained by the respective strategies.
- Using the obvious fact that û is at least as good as u*, and letting h tend to zero, we obtain our fundamental PDE.
Strategy values
Expected utility for û:
J(t, x, û) = V(t, x).
Expected utility for u*: The expected utility for [t, t+h) is given by
E_{t,x}[ ∫_t^{t+h} F(s, X_s^u, u_s) ds ].
The conditional expected utility over [t+h, T], given (t, x), is
E_{t,x}[ V(t+h, X_{t+h}^u) ].
The total expected utility for Strategy II is
E_{t,x}[ ∫_t^{t+h} F(s, X_s^u, u_s) ds + V(t+h, X_{t+h}^u) ].
Comparing strategies
We have trivially
V(t, x) ≥ E_{t,x}[ ∫_t^{t+h} F(s, X_s^u, u_s) ds + V(t+h, X_{t+h}^u) ].
Remark: We have equality above if and only if the control law u is the optimal law û.
Now use Itô to obtain
V(t+h, X_{t+h}^u) = V(t, x) + ∫_t^{t+h} { (∂V/∂t)(s, X_s^u) + A^u V(s, X_s^u) } ds + ∫_t^{t+h} ∇_x V(s, X_s^u) σ^u dW_s,
and plug into the formula above.
We obtain
E_{t,x}[ ∫_t^{t+h} { F(s, X_s^u, u_s) + (∂V/∂t)(s, X_s^u) + A^u V(s, X_s^u) } ds ] ≤ 0.
Going to the limit: divide by h, move h within the expectation, and let h tend to zero. We get
F(t, x, u) + (∂V/∂t)(t, x) + A^u V(t, x) ≤ 0.
Recall:
F(t, x, u) + (∂V/∂t)(t, x) + A^u V(t, x) ≤ 0.
This holds for all u = u(t, x), with equality if and only if u = û. We thus obtain the HJB equation
(∂V/∂t)(t, x) + sup_{u ∈ U} { F(t, x, u) + A^u V(t, x) } = 0.
The HJB equation
Theorem: Under suitable regularity assumptions the following hold:
I: V satisfies the Hamilton-Jacobi-Bellman equation
(∂V/∂t)(t, x) + sup_{u ∈ U} { F(t, x, u) + A^u V(t, x) } = 0,
V(T, x) = Φ(x).
II: For each (t, x) ∈ [0, T] × R^n the supremum in the HJB equation above is attained by u = û(t, x), i.e. by the optimal control.
Logic and problem
Note: We have shown that if V is the optimal value function, and if V is regular enough, then V satisfies the HJB equation. The HJB equation is thus derived as a necessary condition, and requires strong ad hoc regularity assumptions, alternatively the use of viscosity solution techniques.
Problem: Suppose we have solved the HJB equation. Have we then found the optimal value function and the optimal control law? In other words, is HJB a sufficient condition for optimality?
Answer: Yes! This follows from the Verification Theorem.
The Verification Theorem
Suppose that we have two functions H(t, x) and g(t, x) such that:
- H is sufficiently integrable, and solves the HJB equation
(∂H/∂t)(t, x) + sup_{u ∈ U} { F(t, x, u) + A^u H(t, x) } = 0,
H(T, x) = Φ(x).
- For each fixed (t, x), the supremum in the expression
sup_{u ∈ U} { F(t, x, u) + A^u H(t, x) }
is attained by the choice u = g(t, x).
Then the following hold:
1. The optimal value function V of the control problem is given by V(t, x) = H(t, x).
2. There exists an optimal control law û, and in fact û(t, x) = g(t, x).
Handling the HJB equation
1. Consider the HJB equation for V.
2. Fix (t, x) ∈ [0, T] × R^n and solve the static optimization problem
max_{u ∈ U} [ F(t, x, u) + A^u V(t, x) ].
Here u is the only variable, whereas t and x are fixed parameters. The functions F, µ, σ and V are considered as given.
3. The optimal û will depend on t and x, and on the function V and its partial derivatives. We thus write û as
û = û(t, x; V).   (4)
4. The function û(t, x; V) is our candidate for the optimal control law, but since we do not know V this description is incomplete. Therefore we substitute the expression for û into the PDE, giving us the highly nonlinear (why?) PDE
(∂V/∂t)(t, x) + F^û(t, x) + A^û V(t, x) = 0,
V(T, x) = Φ(x).
5. Now we solve the PDE above! Then we put the solution V into expression (4). Using the verification theorem we can identify V as the optimal value function, and û as the optimal control law.
Making an Ansatz
The hard work of dynamic programming consists in solving the highly nonlinear HJB equation. There are no general analytic methods available for this, so the number of known optimal control problems with an analytic solution is very small indeed. In an actual case one usually tries to guess a solution, i.e. we typically make a parameterized Ansatz for V and then use the PDE in order to identify the parameters.
Hint: V often inherits some structural properties from the boundary function Φ as well as from the instantaneous utility function F.
Most of the known solved control problems have, to some extent, been rigged in order to be analytically solvable.
The Linear Quadratic Regulator
min_{u ∈ R^k} E[ ∫_0^T { X_t′QX_t + u_t′Ru_t } dt + X_T′HX_T ],
with dynamics
dX_t = { AX_t + Bu_t } dt + C dW_t.
We want to control a vehicle in such a way that it stays close to the origin (the terms x′Qx and x′Hx) while at the same time keeping the energy u′Ru small.
Here X_t ∈ R^n and u_t ∈ R^k, and we impose no control constraints on u. The matrices Q, R, H, A, B and C are assumed to be known. We may WLOG assume that Q, R and H are symmetric, and we assume that R is positive definite (and thus invertible).
Handling the Problem
The HJB equation becomes
(∂V/∂t)(t, x) + inf_{u ∈ R^k} { x′Qx + u′Ru + [∇_x V](t, x)[Ax + Bu] } + ½ Σ_{i,j} (∂²V/∂x_i∂x_j)(t, x) [CC′]_{i,j} = 0,
V(T, x) = x′Hx.
For each fixed choice of (t, x) we now have to solve the static unconstrained optimization problem to minimize
x′Qx + u′Ru + [∇_x V](t, x)[Ax + Bu].
The problem was:
min_u x′Qx + u′Ru + [∇_x V](t, x)[Ax + Bu].
Since R > 0 we set the gradient to zero and obtain
2u′R = −[∇_x V]B,
which gives us the optimal u as
û = −½ R^{-1}B′[∇_x V]′.
Note: This is our candidate optimal control law, but it depends on the unknown function V. We now make an educated guess about the structure of V.
From the boundary function x′Hx and the term x′Qx in the cost function we make the Ansatz
V(t, x) = x′P(t)x + q(t),
where P(t) is a symmetric matrix function and q(t) is a scalar function. With this trial solution we have
(∂V/∂t)(t, x) = x′Ṗx + q̇,  ∇_x V(t, x) = 2x′P,  ∇_{xx} V(t, x) = 2P,  û = −R^{-1}B′Px.
Inserting these expressions into the HJB equation we get
x′{ Ṗ + Q − PBR^{-1}B′P + A′P + PA }x + q̇ + tr[C′PC] = 0.
We thus get the following matrix ODE for P:
Ṗ = PBR^{-1}B′P − A′P − PA − Q,  P(T) = H,
and we can integrate directly for q:
q̇ = −tr[C′PC],  q(T) = 0.
The matrix equation is a Riccati equation. The equation for q can then be integrated directly.
Final Result for LQ:
V(t, x) = x′P(t)x + ∫_t^T tr[C′P(s)C] ds,
û(t, x) = −R^{-1}B′P(t)x.
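In the scalar case (n = k = 1) the Riccati ODE can be integrated numerically with a simple Euler scheme, stepping backwards from the terminal condition P(T) = H, which also yields q and the feedback gain. A minimal sketch, assuming NumPy-free plain Python would do but NumPy is kept for consistency; all parameter values are illustrative:

```python
import numpy as np

# Scalar LQ: integrate P' = P B R^{-1} B P - 2AP - Q backwards from P(T) = H,
# accumulate q(t) = integral_t^T C' P C ds, and form u_hat = -R^{-1} B P x.
A, B, C, Q, R, H, T = 0.5, 1.0, 0.2, 1.0, 1.0, 1.0, 1.0
n = 100_000
dt = T / n
P = H            # terminal condition P(T) = H
q = 0.0          # terminal condition q(T) = 0
for _ in range(n):                       # step from t = T down to t = 0
    dP = P * B / R * B * P - 2.0 * A * P - Q
    P -= dt * dP                         # backward Euler step for P
    q += dt * C * P * C                  # accumulate the integral for q
gain = -B * P / R                        # feedback gain at t = 0: u_hat = gain * x
value_at_1 = P * 1.0 ** 2 + q            # V(0, x) at x = 1 under the Ansatz
```

With these parameters P(t) increases (backwards in time) from H = 1 towards the stationary root of P² − P − 1 = 0, i.e. stays below (1+√5)/2 ≈ 1.618, so the gain is a moderate negative feedback as expected.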