Simulation and Parametric Estimation of SDEs Jhonny Gonzalez School of Mathematics The University of Manchester Magical books project August 23, 2012
Outline
- Motivation: The problem
- Simulation of SDEs: SDEs driven by Wiener processes; Processes with jumps
- Parametric estimation: Continuous-time observations; Discrete-time observations; The Kalman Filter
Problem
An observer has collected a set of data of interest over a period of time and at a certain sampling rate (not necessarily fixed). The data might come from investment finance, population dynamics, radio astronomy, and so on. He or she would like to find a mathematical model that best describes the dynamics of the variables involved. Discrete or continuous? Deterministic or stochastic? Is there noise or randomness that makes SDEs a natural choice?
Problem
The observer thinks some particular model is suitable and proposes a set of equations, for instance
dX_t = β(α − X_t) dt + dW_t,
an OU process with unknown mean-reversion level and speed, θ = (α, β). But what is θ for the dataset the observer has collected? What can we say about the parameters in θ?
Figure: COB market prices. Taken from Geman and Roncoroni (2006)
Parametric estimation
Parametric estimation is the process of determining the parameters that best fit given data, in some sense to be made precise. Since the process cannot be observed completely, we actually obtain estimates of the parameters. For different datasets we will in general obtain different parameter values: the same SDE can be used in different markets, but with different parameters. In what sense "best"? At the very least we expect that the longer we observe the process, the more accurate the estimates become.
In general
Let {(Ω, F, {F_t}_{t≥0}, P_θ); θ ∈ Θ} be a filtered probability space and {X_t}_{t≥0} a stochastic process satisfying the stochastic differential equation
dX_t = b(X_t, θ) dt + σ(X_t, θ) dW_t,  X_0 = x,  t ≥ 0,  (1)
where {W_t}_{t≥0} is a standard Wiener process adapted to the natural filtration {F_t^W}_{t≥0}. Here θ ∈ Θ ⊆ R^p is the multidimensional parameter to be estimated from observations of {X_t}_{t≥0}. The functions b : R × Θ → R and σ : R × Θ → R_+ are assumed known and to satisfy conditions under which the SDE has a unique (strong) solution.
Consistent and asymptotically normal estimators
Let θ_0 be the true value of the parameter θ. Our main goal is to determine estimates θ̂ of θ_0 that are both consistent and asymptotically normal, as defined below.
1. A sequence of estimators θ̂_T(X^T) of the unknown parameter θ is said to be consistent if θ̂_T → θ_0 almost surely as T → ∞ under P_{θ_0}.
2. A sequence of consistent estimators θ̂_T(X^T) of the unknown parameter θ is said to be asymptotically normal (Gaussian) if √T (θ̂_T − θ_0) → N(0, Σ) in distribution as T → ∞ under P_{θ_0}. Here Σ is the asymptotic variance matrix.
Sampling schemes
We could have different sampling rates. For example:
- Large-sample scheme: the number of observations n increases but the time Δ between consecutive observations is fixed. Hence the total observation time [0, nΔ = T] also increases as more observations are taken.
- High-frequency scheme: the total observation time T is fixed whilst n increases. Consequently, Δ = T/n goes to zero with n.
- Rapidly increasing design: Δ_n goes to zero as n increases, but the total observation time also increases, i.e., nΔ_n → ∞. Moreover, Δ_n may shrink to zero at a specified rate, nΔ_n^k → 0, k ≥ 2.
In any case a continuous observation is quite unlikely, and the large-sample scheme is perhaps the most natural one.
Simulation of SDEs
Simulation of SDEs is quite important in any field where SDEs are used to model systems. Simulations are the link between our mental models and the real world: they tell us what to expect from our models and allow analysis, forecasts, etc. In particular, if we want to test whether an estimation method is good in some sense, we need to have the right data at hand to prove it. Such data are obtained by simulating the SDE of interest.
Simulation of sample paths
Consider the solution {X_t} to the SDE
dX_t = b(X_t) dt + σ(X_t) dW_t,  X_0 = x,  0 ≤ t ≤ T.
We want to approximate the process {X_t} by a discrete process {Y_t^Δ} with the property that Y^Δ converges, in a suitable sense, to X as Δ approaches zero. Here we have a partition of the interval [0, T] of the form t_n = nΔ, n = 1, 2, ..., n_T, with Δ = T/n_T for some integer n_T.
Definition
The discrete approximation Y^Δ is said to converge strongly with order γ > 0 at time t > 0 if there exist a positive constant C, not depending on Δ, and 0 < δ_0 < t such that, for all Δ ∈ (0, δ_0),
E[|X_t − Y_t^Δ|] ≤ C Δ^γ.
On average, the discretisation stays within a distance of the original process that depends on Δ. If Δ < 1, the greater the order γ, the smaller this average distance.
To derive discrete approximations of {X_t} satisfying the strong convergence criterion, stochastic Ito-Taylor expansions are used. The resulting approximations are called strong Taylor approximations (derived using Taylor-type series expansions for SDEs).
There are mainly three schemes used to simulate Ito processes:
- the Euler scheme (the simplest one),
- the Milstein scheme,
- the order 1.5 strong Taylor scheme.
Euler scheme
The simplest strong Taylor approximation is the Euler scheme, given by
Y_{n+1} = Y_n + b(Y_n) Δ + σ(Y_n) ΔW_n,
for n ≥ 0, with Y_0 = X_0 and ΔW_n = W_{t_{n+1}} − W_{t_n}. In practice, draw ΔW_n = √Δ Z with Z ~ N(0, 1). The scheme has γ = 1/2 whenever a strong solution of the SDE exists, and gives good numerical results for nearly constant drift and diffusion terms. However, its small order of convergence makes a higher-order scheme desirable.
Milstein scheme
By adding a small term to the Euler scheme we obtain the Milstein scheme,
Y_{n+1} = Y_n + b(Y_n) Δ + σ(Y_n) ΔW_n + (1/2) σ(Y_n) σ′(Y_n) ((ΔW_n)² − Δ),
where σ′(·) denotes the derivative of σ(·). This scheme has strong order γ = 1.
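As a quick illustration of both schemes, here is a minimal sketch: a GBM dX_t = μX_t dt + σX_t dW_t with hypothetical parameters, chosen because its analytic solution is available for comparison.

```python
import numpy as np

def simulate_gbm_schemes(mu=0.05, sigma=0.2, x0=100.0, T=1.0, n=1000, seed=42):
    """Simulate one path of the GBM dX = mu*X dt + sigma*X dW with the
    Euler and Milstein schemes, alongside the analytic solution."""
    rng = np.random.default_rng(seed)
    dt = T / n
    dW = rng.normal(0.0, np.sqrt(dt), size=n)
    W = np.cumsum(dW)                      # Brownian path at the grid points
    euler = np.empty(n + 1)
    milstein = np.empty(n + 1)
    exact = np.empty(n + 1)
    euler[0] = milstein[0] = exact[0] = x0
    for k in range(n):
        ye, ym = euler[k], milstein[k]
        # Euler: Y_{n+1} = Y_n + b(Y_n)*dt + sigma(Y_n)*dW_n
        euler[k + 1] = ye + mu * ye * dt + sigma * ye * dW[k]
        # Milstein adds (1/2)*sigma(Y_n)*sigma'(Y_n)*((dW_n)^2 - dt);
        # for GBM, sigma(x) = sigma*x, so sigma'(x) = sigma
        milstein[k + 1] = (ym + mu * ym * dt + sigma * ym * dW[k]
                           + 0.5 * sigma**2 * ym * (dW[k]**2 - dt))
        # analytic solution of the GBM, driven by the same Brownian path
        exact[k + 1] = x0 * np.exp((mu - 0.5 * sigma**2) * (k + 1) * dt
                                   + sigma * W[k])
    return euler, milstein, exact
```

With a moderate step size both schemes track the analytic path, with the Milstein correction paying off when the diffusion term varies with the state, as it does here.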
Figure: Comparison of the Euler and Milstein schemes against the analytic solution, shown at three successive zoom levels.
The order 1.5 strong Taylor scheme
More information about the sample paths of the process is captured by
Y_{n+1} = Y_n + b Δ + σ ΔW_n + (1/2) σσ′ ((ΔW_n)² − Δ) + b′σ ΔZ_n
  + (1/2)(b b′ + (1/2) σ² b″) Δ² + (b σ′ + (1/2) σ² σ″)(ΔW_n Δ − ΔZ_n)
  + (1/2) σ (σ σ″ + (σ′)²)((1/3)(ΔW_n)² − Δ) ΔW_n.
Here ΔZ_n is a Gaussian random variable with mean zero, variance E[(ΔZ_n)²] = (1/3) Δ³, and covariance E[ΔZ_n ΔW_n] = (1/2) Δ².
The pair of random variables (ΔZ_n, ΔW_n) can be generated from two independent standard normal variables U_{1,n} and U_{2,n} via
ΔW_n = U_{1,n} √Δ,
ΔZ_n = (1/2) Δ^{3/2} (U_{1,n} + U_{2,n}/√3).
This scheme has strong order γ = 3/2.
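The generation of these correlated pairs can be sketched as follows (a minimal helper; the function name and defaults are illustrative):

```python
import numpy as np

def taylor15_increments(delta, n, seed=None):
    """Generate the correlated pairs (dW_n, dZ_n) required by the order 1.5
    strong Taylor scheme from two independent standard normals U1, U2:
        dW_n = U1*sqrt(delta)
        dZ_n = (1/2)*delta**1.5 * (U1 + U2/sqrt(3))
    so that Var(dZ_n) = delta^3/3 and E[dZ_n*dW_n] = delta^2/2."""
    rng = np.random.default_rng(seed)
    u1 = rng.standard_normal(n)
    u2 = rng.standard_normal(n)
    dW = u1 * np.sqrt(delta)
    dZ = 0.5 * delta**1.5 * (u1 + u2 / np.sqrt(3.0))
    return dW, dZ
```

The stated moments can be verified empirically: the sample variance of dZ and the sample covariance of (dZ, dW) converge to Δ³/3 and Δ²/2 respectively.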
For the geometric and arithmetic models without jumps, more than one Wiener process drives the SDEs. In that case, equations similar to those presented above can be obtained (see Kloeden and Platen, 1992). This is the classical book on the numerical analysis of SDEs!
Preliminaries
Compound Poisson process. Let {X_i}_{i=1,2,...} be a sequence of iid random variables and define
L(t) = Σ_{i=1}^{N(t)} X_i,
where N(t) is a Poisson process with intensity λ, independent of the X_i. Then L(t) is said to be a compound Poisson (CP) process. It is a Lévy process where jumps occur at intensity λ and the jump sizes are determined by the X_i. It gives us random jump sizes, not only jumps of size one as in the Poisson process. The intensity of jumps may also be time dependent; we want this for electricity markets!
For example,
λ(t) = 2/(1 + |sin(π(t − τ)/k)|) − 1.
Figure: the intensity λ(t) plotted over three years, oscillating between 0 and 1 with periodic peaks. Here τ is the phase change and k controls the concentration of jumps occurring in multiples of k years.
Simulation of a nonhomogeneous compound Poisson process
Algorithm: thinning, or random sampling, approach. Let λ_max be a constant upper bound for the intensity λ(t) on [0, T].
1. Set t = 0, I = 0, L = 0.
2. Generate an exponential random number T_e with rate λ_max.
3. Set t = t + T_e. If t > T, stop.
4. Generate a uniform random number U on (0, 1).
5. If U ≤ λ(t)/λ_max, set I = I + 1 and S(I) = t; generate a random jump X and set L = L + X.
6. Go to step 2.
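The steps above can be sketched as follows (names and parameters are illustrative):

```python
import numpy as np

def thin_compound_poisson(lam, lam_max, T, jump_sampler, seed=None):
    """Thinning algorithm for a nonhomogeneous compound Poisson process on [0, T].

    lam          : intensity function lambda(t), bounded above by lam_max
    jump_sampler : callable rng -> one random jump size
    Returns the accepted jump times S and the terminal value L(T)."""
    rng = np.random.default_rng(seed)
    t, L, S = 0.0, 0.0, []
    while True:
        t += rng.exponential(1.0 / lam_max)    # steps 2-3: candidate arrival
        if t > T:
            return S, L
        if rng.uniform() <= lam(t) / lam_max:  # step 5: accept w.p. lambda(t)/lam_max
            S.append(t)
            L += jump_sampler(rng)             # add a random jump size
```

For the seasonal intensity of the previous slide, λ(t) is bounded by 1, so one can pass `lam_max=1.0` together with a lambda implementing the seasonal formula.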
Stationary distributions
Stationary distribution of an OU-type process: we look at the random variable X_∞ = lim_{t→∞} X(t). Consider the OU process
dX(t) = −β X(t) dt + σ dI(t),  X(0) = x,
with solution
X(t) = e^{−βt} x + σ e^{−βt} ∫_0^t e^{βu} dI(u).
If I(u) = W(u), then X_∞ ~ N(0, σ²/(2β)).
If I(t) = Σ_{i=1}^{N(t)} X_i, with N(t) of intensity λ and exponentially distributed jumps of mean μ_J, then X_∞ ~ Γ(λ/β, 1/μ_J).
OU process driven by a Lévy process
We already know how to simulate the classical OU process. Now we want to simulate, say, the process
dY_t = −β Y_t dt + dL_t,
where L is a Lévy process and β > 0. Discretise the interval [0, T] into homogeneous time steps of length Δ > 0. Then, using an integrating factor or the general Itô formula, we get the exact solution
Y(s + Δ) = e^{−βΔ} Y(s) + e^{−βΔ} Z_Δ(s),
where Z_Δ(s) = e^{−βs} ∫_s^{s+Δ} e^{βu} dL(u). By a change of variables (u := u − s), we find that Z_Δ(s) =_d ∫_0^Δ e^{βu} dL(u).
OU process driven by a Lévy process
For a general Lévy process, computing this random variable is complicated and simulation algorithms are usually slow. Following Barndorff-Nielsen and Shephard (2001), suppose the stationary distribution of Y(t) is Γ(ν, 1/μ_J). Then the background Lévy process L(t) is a compound Poisson process with exponentially distributed jumps of mean μ_J. Further, if the speed of mean reversion β is known, the intensity of jumps in L(t) is λ = νβ. In this case it can be shown that if we let c_1 < c_2 < ... be the arrival times of a Poisson process with intensity νβ and N(1) the number of jumps up to time 1, then
Z_Δ(s) =_d μ_J Σ_{i=1}^{N(1)} ln(c_i^{−1}) e^{−βΔ u_i},
where the u_i are iid uniform random variables on [0, 1].
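A direct alternative to the series representation, sketched below with hypothetical parameter names, is to simulate the compound Poisson jumps falling inside each grid step and decay them exactly:

```python
import numpy as np

def simulate_cp_ou(beta, lam, mu_j, y0, T, n, seed=None):
    """Exact grid simulation of dY = -beta*Y dt + dL, where L is a compound
    Poisson process with rate lam and Exp jumps of mean mu_j.
    A jump arriving at time s inside a step decays by exp(-beta*(dt - s))."""
    rng = np.random.default_rng(seed)
    dt = T / n
    Y = np.empty(n + 1)
    Y[0] = y0
    for k in range(n):
        y = Y[k] * np.exp(-beta * dt)           # deterministic mean reversion
        m = rng.poisson(lam * dt)               # number of jumps in this step
        if m:
            s = rng.uniform(0.0, dt, size=m)    # jump times within the step
            j = rng.exponential(mu_j, size=m)   # jump sizes, mean mu_j
            y += np.sum(j * np.exp(-beta * (dt - s)))
        Y[k + 1] = y
    return Y
```

A useful sanity check is the stationary mean λμ_J/β of this process: the time average of a long simulated path should settle near that value.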
The arithmetic spot model
Consider the arithmetic model
S(t) = Λ(t) + Y_1(t) + Y_2(t),
with Y_1(0) = S(0) − Λ(0), Y_2(0) = 0, and where
Λ(t) = a + bt + c sin(2πt/365)
is the floor seasonal function to which prices revert. It consists of a linear trend describing inflation in the price level and a seasonal term explaining variations over the year.
Also, dy 1 (t) = β 1 Y 1 (t)dt + η 1 di 1 (t), dy 2 (t) = β 2 Y 2 (t)dt + η 2 di 2 (t), with Y 1 (t) Γ(ν, 1 /µ J ) and I 2 a compound Poisson process with exponentially distributed jumps and time dependent intensity λ(t) given by λ(t) = sin 2 π (t τ) k 1. + 1 Y 1 describes the normal prices. Low speed reversion. Y 2 captures the spikes in prices. High speed reversion. τ is phase change. k controls the concentration of jumps occurring in multiple of k years.
Figure: Simulation of the Arithmetic spot price model.
Recall that we want to find consistent and asymptotically normal estimates of the parameters of an SDE, e.g., estimates for β and α in
dX_t = β(α − X_t) dt + dW_t.
There are several methods that can be used to perform parametric estimation, some easy, others difficult to implement. For example:
- Exact maximum likelihood.
- Pseudo-likelihood methods (discretise the SDE).
- Approximate likelihood methods (approximate the likelihood function, e.g., with Hermite polynomials).
- GMM, the generalised method of moments (match theoretical and sample moments).
- Martingale estimating functions.
- The (adapted) Kalman filter method.
- Etc.
Some useful references are listed at the end.
Continuous-time observations
Consider the SDE
dX_t = b(t, X_t, θ) dt + σ(t, X_t, θ) dW_t,
and write θ = (θ_b, θ_σ). Note that
lim_{n→∞} Σ_{i=1}^{2^n} (X_{i 2^{−n} T} − X_{(i−1) 2^{−n} T})² = ∫_0^T σ²(u, X_u, θ_σ) du.
Thus θ_σ, the parameters that only appear in the diffusion term, can be calculated from the quadratic variation of the process. So we can focus only on the parameters appearing in the drift term.
Volatility of GBM
For a GBM S_t = S_0 exp(σW_t + (μ − σ²/2)t) with unknown volatility σ, we may use the quadratic variation of Brownian motion to find that, for Δ small enough and a partition {t_i}_{i=1,...,N} of the observation interval [0, T],
σ² ≈ (1/T) Σ_{i=0}^{N−1} (log(S_{i+1}/S_i))²,
where S_i = S_{t_i}.
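A sketch of this estimator, together with a hypothetical GBM simulator to check it on:

```python
import numpy as np

def gbm_qv_volatility(S, T):
    """Estimate sigma^2 of a GBM from discrete observations via the realised
    quadratic variation of the log prices: (1/T) * sum_i (log S_{i+1}/S_i)^2."""
    log_returns = np.diff(np.log(S))
    return np.sum(log_returns**2) / T

def simulate_gbm(mu, sigma, s0, T, n, seed=None):
    """Exact simulation of a GBM path on a uniform grid (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    dt = T / n
    log_increments = ((mu - 0.5 * sigma**2) * dt
                      + sigma * rng.normal(0.0, np.sqrt(dt), n))
    return s0 * np.exp(np.concatenate(([0.0], np.cumsum(log_increments))))
```

With a fine partition the realised quadratic variation recovers σ² closely; for example, a path with σ = 0.3 should yield an estimate near 0.09.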
Maximum likelihood
For an SDE of the form dX_t = b(X_t, θ) dt + σ(X_t) dW_t, the continuous-time likelihood ratio is given by
L_T(θ) = exp( ∫_0^T (b(X_s, θ)/σ²(X_s)) dX_s − (1/2) ∫_0^T (b²(X_s, θ)/σ²(X_s)) ds ).
The MLE of θ_0 is the value of θ that maximises this ratio; equivalently, ℓ_T(θ) = log L_T(θ) can be maximised. Under some regularity conditions, the estimates so obtained are consistent and asymptotically normal (Prakasa Rao, 1999). If the time between observations is not small enough, the estimators are neither consistent nor unbiased.
Discrete likelihood function
Instead of using the likelihood ratio above, we could use the likelihood function for discrete observations,
L_n(θ) = Π_{i=1}^n p_θ(Δ, X_i | X_{i−1}),  (2)
where p_θ(Δ, X_i | X_{i−1}) represents the transition probability density of the process from state X_{i−1} at time t_{i−1} to state X_i at time t_i. This assumes we know p_θ, which holds for the GBM and the classic OU process. In general, however, we do not know it; in energy markets we are dealing with a mixture of processes. Trying to compute it numerically can lead to poor estimation algorithms.
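For the classic OU process the transition density is Gaussian, so the discrete likelihood can be maximised in closed form. A sketch (reparametrised as an AR(1) fit, with hypothetical parameter names):

```python
import numpy as np

def ou_exact_mle(x, dt):
    """Conditional MLE for the OU process dX = beta*(alpha - X)dt + sigma dW,
    using the exact Gaussian transition density
        X_i | X_{i-1} ~ N(alpha + (X_{i-1}-alpha)*phi, sigma^2*(1-phi^2)/(2*beta)),
    with phi = exp(-beta*dt); equivalent to fitting an AR(1) model."""
    x0, x1 = x[:-1], x[1:]
    n = len(x0)
    # least-squares AR(1) fit: x1 = c + phi*x0 + noise
    phi = ((n * np.dot(x0, x1) - x0.sum() * x1.sum())
           / (n * np.dot(x0, x0) - x0.sum()**2))
    c = (x1.sum() - phi * x0.sum()) / n
    beta = -np.log(phi) / dt
    alpha = c / (1.0 - phi)
    resid = x1 - c - phi * x0
    sigma2 = np.var(resid) * 2.0 * beta / (1.0 - phi**2)
    return alpha, beta, np.sqrt(sigma2)
```

On a long path simulated with the exact transition density, the three estimates should land near the true (α, β, σ).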
When observation of the process is discrete there are many available methods. We give the basic ideas behind two of them: martingale estimating functions and the Kalman filter method. Parametric estimation of SDEs depends on many factors, such as the number of observations n (the more the better) and the time step Δ (the smaller the better), as well as the number of simulations: sometimes the estimates need to be computed as the average over several simulations of the same process (the larger the number of simulations the better). In any case, parametric estimation is not a magic tool!
Martingale estimating functions
An estimating function G_n is a function of the parameter θ and the discretely observed data X_i, i = 0, 1, ..., n. An estimator θ̂ is calculated from the equation
G_n(θ, (X_i)_{i=0,1,...,n}) = 0.
If G_n is also a martingale, it is called a martingale estimating function (MEF). The basic idea in Bibby and Sørensen (1995):
1. Take the continuous-time log-likelihood function for an SDE.
2. Differentiate with respect to the parameter.
3. Discretise to get an estimating function. Since this estimating function is biased, subtract its compensator to get a martingale with respect to P_θ.
Martingale estimating functions
Under some conditions, the optimal MEF of Bibby and Sørensen (1995) is given by
G_n(θ) = Σ_{i=1}^n Ḟ(X_{i−1}; θ)^T φ(X_{i−1}; θ)^{−1} (X_i − F(X_{i−1}; θ)),
where the dot denotes differentiation with respect to the parameter θ, and
F(X_{i−1}; θ) = E_θ[X_i | X_{i−1}],  φ(X_{i−1}; θ) = E_θ[(X_i − F(X_{i−1}; θ))² | X_{i−1}].
This method exchanges a small Δ for ergodicity of the process. If F and φ are complicated to find, they can be approximated by their sample counterparts.
OU process
For the OU process with mean-reversion level zero,
dX_t = −θ X_t dt + σ dW_t,
it is known that
F(x; θ) = x e^{−θΔ}  and  φ(θ) = σ² (1 − e^{−2θΔ})/(2θ).
Using these last two equations in G_n above and assuming σ is given, we get
θ̂_n = −(1/Δ) log( Σ_{i=1}^n X_{i−1} X_i / Σ_{i=1}^n X_{i−1}² ).
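A sketch of this estimator:

```python
import numpy as np

def ou_mef_theta(x, dt):
    """MEF estimator for theta in dX = -theta*X dt + sigma dW:
        theta_hat = -(1/dt) * log( sum_i X_{i-1}*X_i / sum_i X_{i-1}^2 )."""
    x0, x1 = x[:-1], x[1:]
    return -np.log(np.dot(x0, x1) / np.dot(x0, x0)) / dt
```

On a long path simulated exactly from the zero-mean OU transition density, the estimate converges to the true θ.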
OU process and the AR(1) model
The AR(1) model is the discrete-time version of the OU process (Euler scheme). The AR(p) model can be written as
X_t = c + Σ_{i=1}^p φ_i X_{t−i} + ε_t,
where c is a constant, the φ_i are parameters of the model, and ε_t is white noise. Applying the Euler scheme with Δ = 1 to the OU process
dX_t = β(μ − X_t) dt + σ dW_t
gives
X_t = βμ + (1 − β) X_{t−1} + ε_t,
with E[ε_t] = 0 and E[ε_t²] = σ².
An interpretation of SDEs
In contrast with financial markets, in most electricity markets we only have hourly data series. So we may interpret our SDEs as discrete-time equations, i.e., as shorthand notation for discrete models: prices at times i < t < i + 1 are not observable.
KF in discrete time
The state equation (the real price process) is given by
x_t = F_t x_{t−1} + v_{t−1},
and the observation equation is
z_t = H_t x_t + A_t z_{t−1} + w_t.
Here v_t and w_t are two noise sources with normal distributions N(0, Q) and N(0, R), respectively. The matrices F_t, H_t, A_t, Q and R contain the parameters of the equations.
1. What is the best estimate of x_t based on the observations z_s, s ≤ t?
2. The filtering problem: we want to filter the noise away from the observations in an optimal way, i.e., extract the state process from noisy measurements.
Running the KF recursion
The variables in the KF can be classified as:
- A priori variables: x_{t|t−1}, P_{t|t−1}, the estimates of the mean and variance of x_t given the observations z_1, ..., z_{t−1}.
- A posteriori variables: x_t, P_t, the estimates of the mean and variance given z_1, ..., z_{t−1}, z_t.
1. Project the a posteriori values at t − 1 one time step ahead:
x_{t|t−1} = F_t x_{t−1},  P_{t|t−1} = F_t P_{t−1} F_t^T + Q.
2. Compute the Kalman gain K_t:
G_t = H_t P_{t|t−1} H_t^T + R,  K_t = P_{t|t−1} H_t^T G_t^{−1}.
3. Estimate z_t given z_1, ..., z_{t−1}:
ẑ_t = H_t x_{t|t−1} + A z_{t−1}.
4. The residual is
r_t = z_t − ẑ_t = z_t − H_t x_{t|t−1} − A z_{t−1}.
5. Produce the a posteriori estimates x_t, P_t:
x_t = x_{t|t−1} + K_t r_t,  P_t = (I − K_t H_t) P_{t|t−1}.
Figure: Basic concept of the Kalman Filter
The residual r_t has a normal distribution N(0, G_t). When G_t is one-dimensional, the pdf of r_t is
f_t(r_t) = (2π G_t)^{−1/2} exp(−r_t²/(2G_t)).
The sample log-likelihood is therefore
L = Σ_{t=1}^N log f_t(r_t) = −(N/2) log(2π) − (1/2) Σ_{t=1}^N (log(G_t) + r_t²/G_t).
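The recursion of the previous slide together with this log-likelihood can be sketched for the scalar case (matrices reduce to scalars; the constant coefficients and function name are illustrative):

```python
import numpy as np

def kalman_filter_loglik(z, F, H, A, Q, R, x0, P0):
    """Scalar Kalman filter for the model
        x_t = F*x_{t-1} + v_t,           v ~ N(0, Q)   (state)
        z_t = H*x_t + A*z_{t-1} + w_t,   w ~ N(0, R)   (observation)
    following the prediction/gain/residual/update steps, while accumulating
    the Gaussian log-likelihood of the residuals."""
    x, P = x0, P0
    loglik = 0.0
    filtered = []
    z_prev = z[0]
    for zt in z[1:]:
        # 1. a priori prediction
        x_pred = F * x
        P_pred = F * P * F + Q
        # 2. innovation variance and Kalman gain
        G = H * P_pred * H + R
        K = P_pred * H / G
        # 3.-4. predicted observation and residual
        r = zt - (H * x_pred + A * z_prev)
        # 5. a posteriori update
        x = x_pred + K * r
        P = (1.0 - K * H) * P_pred
        loglik += -0.5 * (np.log(2 * np.pi * G) + r * r / G)
        filtered.append(x)
        z_prev = zt
    return np.array(filtered), loglik
```

As a check on simulated data, the filtered states should be closer to the true hidden states than the raw noisy observations are.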
Parametric estimation
Parameters are estimated as follows (Barlow, Gusev and Lai, 2004):
1. Start with initial guesses of the parameters. They can be chosen to provide the fastest convergence of the KF.
2. Run the KF to estimate what the system would look like if these parameters were correct.
3. Calculate the log-likelihood function L.
4. Repeat 1-3 to find the parameter values that maximise L. An optimisation routine is used here.
Simplex algorithm
The log-likelihood function may be maximised (or its negative minimised) with the simplex optimisation method, available as the fminsearch function in MATLAB. It allows multidimensional optimisation of a function without using its derivatives; it is robust, converges quickly, and works well for irregular functions. There was no rigorous proof of convergence as of 2004, and it gives only a local minimum, but this is true of all the other methods as well. It has been used for over 30 years and is known to give false minima only for a few rare functions.
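In Python, the same simplex method is available via scipy.optimize.minimize with method="Nelder-Mead". A sketch on a toy negative log-likelihood with a known answer (a plain Gaussian, so the optimum can be checked):

```python
import numpy as np
from scipy.optimize import minimize

def fit_gaussian_nelder_mead(data):
    """Illustrative use of the derivative-free Nelder-Mead simplex method
    (the fminsearch analogue in SciPy) to minimise a negative log-likelihood.
    Here the model is a simple Gaussian, so the answer is known in advance."""
    def neg_loglik(params):
        mu, log_sigma = params
        sigma = np.exp(log_sigma)          # keep sigma positive
        return np.sum(0.5 * np.log(2 * np.pi * sigma**2)
                      + (data - mu)**2 / (2 * sigma**2))
    res = minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
    return res.x[0], np.exp(res.x[1])     # mu_hat, sigma_hat
```

In the KF setting, `neg_loglik` would instead run the filter and return minus the residual log-likelihood of the previous slides; the optimiser interface stays the same.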
Barlow, M., Gusev, Y. and Lai, M. (2004). Calibration of multifactor models in electricity markets. International Journal of Theoretical and Applied Finance, 7(2).
Benth, F. E., Benth, J. Š. and Koekebakker, S. (2008). Stochastic Modelling of Electricity and Related Markets. World Scientific, London.
Bibby, B. and Sørensen, M. (1995). Martingale estimating functions for discretely observed diffusion processes. Bernoulli, 1, 17-39.
Iacus, S. (2008). Simulation and Inference for Stochastic Differential Equations: With R Examples. Springer-Verlag, New York.
Prakasa Rao, B.L.S. (1999). Statistical Inference for Diffusion Type Processes. Oxford University Press, New York.
Pedersen, A. R. (1995). A new approach to maximum likelihood estimation for stochastic differential equations based on discrete observations. Scandinavian Journal of Statistics, 22, 55-71.