Parameter Estimation p.1/12 Parameter Estimation of Mathematical Models Described by Differential Equations Hossein Zivaripiran Department of Computer Science University of Toronto
Outline

- Modeling with Differential Equations (IVPs, DDEs)
- Modeling and Parameter Estimation
- Numerical Parameter Estimation of IVPs
- DDEs and Parameter Estimation
Modeling with DE - Formulation

An Initial Value Problem (IVP) for Ordinary Differential Equations (ODEs)

\[ y'(t) = f(t, y(t)), \qquad y(t_0) = y_0 \]

Retarded Delay Differential Equations (RDDEs)

\[ y'(t) = f\bigl(t, y(t), y(t-\sigma_1), \ldots, y(t-\sigma_\nu)\bigr) \quad \text{for } t_0 \le t \le t_F \]
\[ y(t) = \phi(t) \quad \text{for } t \le t_0 \]

$\sigma_i = \sigma_i(t, y(t)) \ge 0$: delay (constant / time dependent / state dependent)
$\phi(t)$: history function (constant / time dependent)

Neutral Delay Differential Equations (NDDEs)

\[ y'(t) = f\bigl(t, y(t), y(t-\sigma_1), \ldots, y(t-\sigma_\nu), y'(t-\sigma_{\nu+1}), \ldots, y'(t-\sigma_{\nu+\omega})\bigr) \quad \text{for } t_0 \le t \le t_F \]
\[ y(t) = \phi(t), \quad y'(t) = \phi'(t) \quad \text{for } t \le t_0 \]
Modeling with DE - Examples

The Van der Pol Oscillator,

\[ \frac{d^2 x}{dt^2} - \mu (1 - x^2) \frac{dx}{dt} + x = 0 \]

using $y_1 = x$ and $y_2 = \frac{dx}{dt}$,

\[ y_1'(t) = y_2(t) \]
\[ y_2'(t) = \mu (1 - y_1^2)\, y_2 - y_1 \]
The Van der Pol Oscillator with $\mu = 1$, $y_1(0) = 2$, $y_2(0) = 0$

[Figure: two panels, $y_1$ versus $t$ and $y_2$ versus $t$, for $0 \le t \le 20$.]
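A minimal Python sketch of how trajectories for $\mu = 1$, $y_1(0) = 2$, $y_2(0) = 0$ could be computed. The fixed-step classical Runge-Kutta integrator, the step count, and the names `van_der_pol` and `rk4` are assumptions made for illustration; the slides do not say which solver produced the plots.

```python
def van_der_pol(t, y, mu):
    """Right-hand side of y1' = y2, y2' = mu*(1 - y1^2)*y2 - y1."""
    y1, y2 = y
    return [y2, mu * (1.0 - y1 * y1) * y2 - y1]


def rk4(f, t0, y0, t_end, n_steps, *params):
    """Integrate y' = f(t, y, *params) from t0 to t_end with n_steps classical RK4 steps."""
    h = (t_end - t0) / n_steps
    t, y = t0, list(y0)
    ts, ys = [t], [list(y)]
    for _ in range(n_steps):
        k1 = f(t, y, *params)
        k2 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k1)], *params)
        k3 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k2)], *params)
        k4 = f(t + h, [yi + h * k for yi, k in zip(y, k3)], *params)
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
        ts.append(t)
        ys.append(list(y))
    return ts, ys


# Values from the slide: mu = 1, y1(0) = 2, y2(0) = 0, t in [0, 20].
# The number of steps (4000) is an arbitrary choice for this sketch.
ts, ys = rk4(van_der_pol, 0.0, [2.0, 0.0], 20.0, 4000, 1.0)
print("y(20) ~", ys[-1])
```

The same `rk4` helper accepts any right-hand side of the form `f(t, y, *params)`, so the parameter $\mu$ can be varied without touching the integrator.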
Modeling with DE - Examples

A neutral delay logistic Gause-type predator-prey system [Kuang 1991]

\[ y_1'(t) = y_1(t)\bigl(1 - y_1(t-\tau) - \rho\, y_1'(t-\tau)\bigr) - \frac{y_2(t)\, y_1(t)^2}{y_1(t)^2 + 1} \]
\[ y_2'(t) = y_2(t) \left( \frac{y_1(t)^2}{y_1(t)^2 + 1} - \alpha \right) \]

where $\alpha = 1/10$, $\rho = 29/10$ and $\tau = 21/50$, for $t$ in $[0, 30]$. The history functions are

\[ \phi_1(t) = \frac{33}{100} - \frac{1}{10} t, \qquad \phi_2(t) = \frac{111}{50} + \frac{1}{10} t \]

for $t \le 0$.
The predator-prey model

[Figure: two panels, $y_1$ versus $t$ and $y_2$ versus $t$, for $0 \le t \le 30$.]
Modeling and Parameter Estimation - Definitions

Parameterized Models

A parameterized IVP

\[ y'(t;p) = f(t, y(t;p); p), \qquad y(t_0) = y_0(p) \]

For example, $p = [\mu]$ in the Van der Pol oscillator

\[ y_1'(t) = y_2(t) \]
\[ y_2'(t) = \mu (1 - y_1^2)\, y_2 - y_1 \]

A simple parameterized DDE

\[ y'(t;p) = f\bigl(t, y(t;p), y(t - \sigma(t;p)); p\bigr) \quad \text{for } t_0(p) \le t \]
\[ y(t;p) = \phi(t;p) \quad \text{for } t \le t_0(p) \]
Modeling and Parameter Estimation - Definitions

Parameter Estimation Problem

A system of parameterized IVP

\[ y'(t;p) = f(t, y(t;p); p), \qquad y(t_0) = y_0(p) \]

or DDE

\[ y'(t;p) = f\bigl(t, y(t;p), y(t - \sigma(t;p)); p\bigr) \quad \text{for } t_0(p) \le t \]
\[ y(t;p) = \phi(t;p) \quad \text{for } t \le t_0(p) \]

A set of data (observations/measurements)

\[ \{\, Y(\gamma_i) \approx y(\gamma_i; p^*) \,\} \]

Estimate $p^*$ by minimizing an objective function, e.g.

\[ W(p) = \sum_i \bigl[ Y(\gamma_i) - y(\gamma_i;p) \bigr]^2. \]
Modeling and Parameter Estimation - Importance

[Diagram with the labels, in order of appearance: Physical/Biological Phenomenon; Mathematical Model; Physical Laws / Empirical Rules; Observed Data; Sensitivity Analysis; Refined Mathematical Model; Parameter Estimation; Practical Mathematical Model.]
Modeling and Parameter Estimation - Optimizers

Algorithms for Nonlinear Least-Squares

Unconstrained

\[ \min_p W(p) = \sum_i \bigl[ Y(\gamma_i) - y(\gamma_i;p) \bigr]^2 \]

- Levenberg-Marquardt
- Variations of Sequential Quadratic Programming (SQP)

Constrained

\[ \min_p W(p) = \sum_i \bigl[ Y(\gamma_i) - y(\gamma_i;p) \bigr]^2, \qquad c_j(p) = 0,\ j \in \mathcal{E}, \qquad c_j(p) \ge 0,\ j \in \mathcal{I} \]

- Sequential Quadratic Programming (SQP)
Numerical Parameter Estimation of IVPs

The initial value approach:

1. Choose an initial guess for the parameters.
2. Solve the model equations.
3. Check the optimality conditions (if satisfied, stop).
4. Choose a better value for the parameters and continue with (2).

The main difficulty: if the parameters are far from the correct ones, the trial trajectory soon loses contact with the measurements.

If the optimization method needs to compute the gradient or the Hessian of the objective function,

\[ \frac{\partial W(p)}{\partial p_l} = -2 \sum_i \bigl[ Y(\gamma_i) - y(\gamma_i;p) \bigr] \frac{\partial y(\gamma_i;p)}{\partial p_l} \]

\[ \frac{\partial^2 W(p)}{\partial p_l\, \partial p_m} = 2 \sum_i \left[ \frac{\partial y(\gamma_i;p)}{\partial p_l} \frac{\partial y(\gamma_i;p)}{\partial p_m} - \bigl[ Y(\gamma_i) - y(\gamma_i;p) \bigr] \frac{\partial^2 y(\gamma_i;p)}{\partial p_l\, \partial p_m} \right] \]

the sensitivity equations are usually used to provide the required values. An alternative approach is to use a divided-difference approximation.
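For reference, a sketch of the sensitivity equations in the notation of the parameterized IVP $y'(t;p) = f(t, y(t;p); p)$, $y(t_0) = y_0(p)$. The symbol $s_l$ is introduced here for illustration and does not appear in the slides; the Van der Pol lines assume that the initial values do not depend on $\mu$.

```latex
% First-order sensitivity with respect to the parameter p_l
s_l(t) = \frac{\partial y(t;p)}{\partial p_l}, \qquad
s_l'(t) = \frac{\partial f}{\partial y}\bigl(t, y(t;p); p\bigr)\, s_l(t)
        + \frac{\partial f}{\partial p_l}\bigl(t, y(t;p); p\bigr), \qquad
s_l(t_0) = \frac{\partial y_0(p)}{\partial p_l}.

% Example: Van der Pol oscillator with p = [\mu], s = \partial y / \partial \mu
s_1'(t) = s_2(t), \qquad
s_2'(t) = (1 - y_1^2)\, y_2 - (2 \mu\, y_1 y_2 + 1)\, s_1 + \mu (1 - y_1^2)\, s_2, \qquad
s_1(t_0) = s_2(t_0) = 0.
```

Solving these equations together with the model gives the values $\partial y(\gamma_i;p) / \partial p_l$ that enter the gradient of $W(p)$.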
Numerical Parameter Estimation of IVPs

A Simple Example

The IVP

\[ y_1'(t) = a\, y_2(t) \]
\[ y_2'(t) = -a\, y_1(t) \]

with initial conditions

\[ y_1(0) = 0, \qquad y_2(0) = 1 \]

has the analytical solution

\[ y_1(t) = \sin(at), \qquad y_2(t) = \cos(at) \]
[Figure: $y_1(t)$ for $0 \le t \le 7$, with $a^* = 1$ and trial values $a = 1.05, 1.1, 1.15, 1.45, 1.5, 1.55$.]
[Figure: $W(a)$ for $0 \le a \le 2$.]
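A small Python sketch comparing the two ways of obtaining the gradient of $W(a)$ for the simple example: through the sensitivity $\partial y_1 / \partial a = t \cos(at)$, and through a central divided difference. The measurement times `GAMMAS`, the noise-free data, and the step `h` are hypothetical choices; the slides do not list the data behind the plots.

```python
import math

A_TRUE = 1.0                                    # a* = 1, as in the example
GAMMAS = [0.5 * k for k in range(1, 15)]        # hypothetical measurement times in (0, 7]
DATA = [math.sin(A_TRUE * g) for g in GAMMAS]   # noise-free observations of y1


def W(a):
    """Least-squares objective W(a) = sum_i [Y(gamma_i) - y1(gamma_i; a)]^2."""
    return sum((Y - math.sin(a * g)) ** 2 for g, Y in zip(GAMMAS, DATA))


def grad_sensitivity(a):
    """dW/da = -2 * sum_i [Y - y1] * dy1/da, with dy1/da = t*cos(a*t)."""
    return -2.0 * sum((Y - math.sin(a * g)) * g * math.cos(a * g)
                      for g, Y in zip(GAMMAS, DATA))


def grad_divided_difference(a, h=1e-6):
    """Central divided-difference approximation of dW/da."""
    return (W(a + h) - W(a - h)) / (2.0 * h)


for a in (1.05, 1.5):
    print(a, grad_sensitivity(a), grad_divided_difference(a))
```

The sensitivity-based gradient costs one pass over the data, while the divided difference needs two extra evaluations of $W$ per parameter and is sensitive to the choice of `h`.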
Numerical Parameter Estimation of IVPs

Multiple shooting:

- Partitions the fitting interval into many subintervals, each having its own initial values.
- The measurements are used to get starting guesses for the initial values of each interval.
- In an iterative process, the algorithm minimizes $W(p)$ on the one hand and enforces the continuity of the full trajectory on the other hand.

Advantages: divergences are avoided and the danger of local minima is reduced.
[Figure: $y_1(t)$ for $0 \le t \le 7$.]
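A rough Python sketch of the two ingredients of multiple shooting for the simple example $y_1' = a y_2$, $y_2' = -a y_1$: data misfits computed inside each subinterval, and continuity defects at the interior nodes. It only builds the residuals and does not include the optimizer. The node placement, the measurement times, and the use of exact values of both components as starting guesses at the nodes are assumptions for illustration.

```python
import math


def flow(state, dt, a):
    """Exact solution map of y1' = a*y2, y2' = -a*y1 over a step dt (a rotation)."""
    y1, y2 = state
    c, s = math.cos(a * dt), math.sin(a * dt)
    return (c * y1 + s * y2, -s * y1 + c * y2)


def shooting_residuals(a, nodes, node_states, gammas, data):
    """Return (misfits, defects) for a trial parameter a and trial node states.

    nodes       : t_0 < t_1 < ... < t_K, the subinterval boundaries
    node_states : trial (y1, y2) at t_0, ..., t_{K-1}, one per subinterval
    misfits     : Y(gamma_i) - y1(gamma_i), each computed inside its own subinterval
    defects     : jumps of the trajectory at the interior nodes (zero when continuous)
    """
    misfits = []
    for g, Y in zip(gammas, data):
        k = max(j for j in range(len(node_states)) if nodes[j] <= g)
        y1, _ = flow(node_states[k], g - nodes[k], a)
        misfits.append(Y - y1)
    defects = []
    for k in range(len(node_states) - 1):
        end = flow(node_states[k], nodes[k + 1] - nodes[k], a)
        defects.extend(e - s for e, s in zip(end, node_states[k + 1]))
    return misfits, defects


gammas = [0.25 * k for k in range(1, 29)]   # hypothetical measurement times
data = [math.sin(g) for g in gammas]        # noise-free data for a* = 1

# Single shooting with a poor guess a = 1.5: one interval, one initial value.
single, _ = shooting_residuals(1.5, [0.0, 7.0], [(0.0, 1.0)], gammas, data)

# Multiple shooting with the same guess: four subintervals restarted near the data.
nodes = [0.0, 1.75, 3.5, 5.25, 7.0]
guesses = [(math.sin(t), math.cos(t)) for t in nodes[:-1]]
multi, defects = shooting_residuals(1.5, nodes, guesses, gammas, data)

print(max(abs(r) for r in single), max(abs(r) for r in multi))
```

With the poor guess the multiple-shooting trajectory stays closer to the data than the single-shooting one, at the price of nonzero defects; an optimizer would then drive the misfits down while forcing the defects to zero.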
DDEs and Parameter Estimation

The adapted initial value approach [Paul, 1997]:

Assume that jumps in the derivative of $y(t;p)$ with respect to $t$ occur at the points $\Lambda(p) = \{\lambda_1(p), \lambda_2(p), \ldots\}$. Such discontinuities, when arising from the initial point $t_0(p)$ (and the initial function $\phi(t;p)$), may propagate into $W(p)$ via the solution values $\{y(\gamma_i;p)\}$.

The first and second order partial derivatives of the objective function are

\[ \left( \frac{\partial W(p)}{\partial p_l} \right)_{\pm} = -2 \sum_i \bigl[ Y(\gamma_i) - y(\gamma_i;p) \bigr] \left( \frac{\partial y(\gamma_i;p)}{\partial p_l} \right)_{\pm} \]

\[ \left( \frac{\partial^2 W(p)}{\partial p_l\, \partial p_m} \right)_{\pm\pm} = 2 \sum_i \left[ \left( \frac{\partial y(\gamma_i;p)}{\partial p_l} \right)_{\pm} \left( \frac{\partial y(\gamma_i;p)}{\partial p_m} \right)_{\pm} - \bigl[ Y(\gamma_i) - y(\gamma_i;p) \bigr] \left( \frac{\partial^2 y(\gamma_i;p)}{\partial p_l\, \partial p_m} \right)_{\pm\pm} \right] \]

A general rule for the propagation of discontinuities to $W(p)$ [Baker & Paul, 1997]: if a discontinuity point $\lambda_r(p)$ coincides with one of the data points $\gamma_i$, and $\lambda_r(p)$ varies as some parameters vary, then $W(p)$ has a jump in its partial derivatives that correspond to the varying parameters.
DDEs and Parameter Estimation

Multiple shooting [Horbelt et al., 2002]:

- Use cubic splines to parameterize the initial curves and formulate the continuity of the trajectory in terms of these spline variables.
- A two-phase procedure: during the first iterations of the optimization, the spline variables are held fixed because they are expected to be estimated well from the data. After the algorithm has converged for the first time, they are released and fitted together with the other variables until final convergence is achieved.
- Can attain a higher convergence rate in the case of noisy data.
DDEs and Parameter Estimation

The full discretization approach [Murphy, 1990]:

- Use linear splines to describe the solution and delay functions. As a result, the problem becomes a very large minimization problem.
- The number of mesh points is increased gradually until the specified error tolerance can be satisfied.
- Although the method is general enough to deal with any kind of unknown parameter, it suffers from heavy computation, a slow convergence rate, and the possibility of being trapped in a local minimum.
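DDEs and Parameter Estimation

Non-smooth optimization

Consider the continuous function

\[ y(t) = \begin{cases} -5(t - \tau) + c, & \text{if } t < \tau \\ \phantom{-}5(t - \tau) + c, & \text{if } t \ge \tau \end{cases} \]

with discontinuous derivative

\[ y'(t) = \begin{cases} -5, & \text{if } t < \tau \\ \phantom{-}5, & \text{if } t \ge \tau \end{cases} \]

and an observed value of $y$ at the discontinuity point $\tau = 4$.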
[Figure: $y(t)$ for $0 \le t \le 7$, with $c$ marked on the vertical axis and $\tau$ marked on the horizontal axis.]
[Figure: $W(\tau)$ and $\partial W(\tau) / \partial \tau$ for $3 \le \tau \le 5$.]
DDEs and Parameter Estimation

- Try to find $\tau$ using MATLAB's unconstrained minimization routine fminunc: 31 iterations.
- Try to use MATLAB's constrained minimization routine fmincon with the added constraint $\tau \le 4$: 2 iterations.
- Try to use MATLAB's constrained minimization routine fmincon with the added constraint $\tau \ge 4$: 2 iterations.
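A short Python sketch of why a smooth optimizer can struggle here: for a function with a derivative jump at $t = \tau$ and an observation at $t = 4$, the one-sided derivatives of $W(\tau)$ at $\tau = 4$ can disagree. The V-shaped form of $y$, the value of `C`, and the observed value `Y_OBS` are assumptions for illustration; the slides do not give the numbers behind the plot, and this does not reproduce the MATLAB runs.

```python
C = 5.0           # hypothetical value of the constant c
GAMMA = 4.0       # observation time, placed at the discontinuity point
Y_OBS = C - 1.0   # hypothetical observed value of y at t = GAMMA


def y(t, tau):
    """Continuous model with a jump in y'(t) at t = tau (V-shaped form assumed)."""
    return C + 5.0 * abs(t - tau)


def W(tau):
    """Least-squares objective for the single observation."""
    return (Y_OBS - y(GAMMA, tau)) ** 2


def one_sided_derivatives(tau, h=1e-6):
    """Backward and forward divided differences of W at tau."""
    left = (W(tau) - W(tau - h)) / h
    right = (W(tau + h) - W(tau)) / h
    return left, right


left, right = one_sided_derivatives(4.0)
print("dW/dtau from the left:", left, " from the right:", right)
```

Under these assumptions the two values are close to -10 and +10, so $W$ has a kink exactly at its minimizer. Restricting $\tau$ to one side of 4 leaves a smooth problem on the feasible set, which is the idea behind adding the constraint.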
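DDEs and Parameter Estimation - Safe Minimization

Appearance of Non-smoothness

Partial derivatives (gradient) of the objective function

\[ \left( \frac{\partial W(p)}{\partial p_l} \right)_{\pm} = -2 \sum_i \bigl[ Y(\gamma_i) - y(\gamma_i;p) \bigr] \left( \frac{\partial y(\gamma_i;p)}{\partial p_l} \right)_{\pm} \]

\[ \left( \frac{\partial^2 W(p)}{\partial p_l\, \partial p_m} \right)_{\pm\pm} = 2 \sum_i \left[ \left( \frac{\partial y(\gamma_i;p)}{\partial p_l} \right)_{\pm} \left( \frac{\partial y(\gamma_i;p)}{\partial p_m} \right)_{\pm} - \bigl[ Y(\gamma_i) - y(\gamma_i;p) \bigr] \left( \frac{\partial^2 y(\gamma_i;p)}{\partial p_l\, \partial p_m} \right)_{\pm\pm} \right] \]

Recall the jump equation for sensitivities

\[ \frac{\partial y}{\partial p_l}(\lambda_{r+1}^{+}) = \frac{\partial y}{\partial p_l}(\lambda_{r+1}^{-}) + \bigl( y'(\lambda_{r+1}^{-}) - y'(\lambda_{r+1}^{+}) \bigr) \frac{\partial \lambda_{r+1}(p)}{\partial p_l} \]

The General Rule: a jump occurs in the partial derivatives of $W(p)$ when a discontinuity point $\lambda_{r+1}(p)$ passes a data point $\gamma_i$.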
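The jump equation can be checked by hand on a one-parameter toy. The sketch below assumes $p = \tau$, a single discontinuity point $\lambda(\tau) = \tau$, and the V-shaped function $y(t) = c + 5\,|t - \tau|$; these choices are illustrative assumptions.

```latex
% Differentiating the continuity condition y(\lambda^-;p) = y(\lambda^+;p),
% with \lambda = \lambda_{r+1}(p), with respect to p_l:
y'(\lambda^-)\,\frac{\partial \lambda}{\partial p_l} + \frac{\partial y}{\partial p_l}(\lambda^-)
  = y'(\lambda^+)\,\frac{\partial \lambda}{\partial p_l} + \frac{\partial y}{\partial p_l}(\lambda^+),
% which rearranges to the jump equation.

% Toy check with y(t) = c + 5|t - \tau|:
\frac{\partial y}{\partial \tau}(\tau^-) = 5, \quad
y'(\tau^-) = -5, \quad y'(\tau^+) = 5, \quad
\frac{\partial \lambda}{\partial \tau} = 1
\;\Longrightarrow\;
\frac{\partial y}{\partial \tau}(\tau^+) = 5 + (-5 - 5) \cdot 1 = -5,
% in agreement with differentiating y = c + 5(t - \tau) directly for t > \tau.
```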
DDEs and Parameter Estimation - Safe Minimization

Algorithms for Nonlinear Least-Squares

Unconstrained

\[ \min_p W(p) = \sum_i \bigl[ Y(\gamma_i) - y(\gamma_i;p) \bigr]^2 \]

- Levenberg-Marquardt
- Variations of Sequential Quadratic Programming (SQP)

Constrained

\[ \min_p W(p) = \sum_i \bigl[ Y(\gamma_i) - y(\gamma_i;p) \bigr]^2, \qquad c_j(p) = 0,\ j \in \mathcal{E}, \qquad c_j(p) \ge 0,\ j \in \mathcal{I} \]

- Sequential Quadratic Programming (SQP)

The smoothness of the functions involved in the problem, the objective function and the constraints, is a necessary assumption.
DDEs and Parameter Estimation - Safe Minimization

Avoiding the Non-smoothness

[Figure: two panels of $y$ versus $t$ showing the true solution and the continuously extended solution, with the discontinuity points $\lambda_r(p)$, $\lambda_{r+1}(p)$ and the data point $\gamma_i$ marked.]

Force the ordering by adding $\lambda_r(p) \le \gamma_i \le \lambda_{r+1}(p)$ to the set of constraints.

The partial derivatives (gradient) of the new constraints, $\partial \lambda_{r+1}(p) / \partial p$, can be computed recursively.
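A sketch of what such a recursion can look like, under the assumption that the delay has the time-dependent form $\sigma(t;p)$ of the simple parameterized DDE, so that each discontinuity point is mapped by the delayed argument onto the previous one. The slides do not spell out the recursion; the state-dependent case needs additional terms that are omitted here.

```latex
% Propagation of discontinuity points, starting from \lambda_0(p) = t_0(p):
\lambda_{r+1}(p) - \sigma\bigl(\lambda_{r+1}(p); p\bigr) = \lambda_r(p).

% Differentiating with respect to p_l gives the recursion for the constraint gradients:
\frac{\partial \lambda_{r+1}}{\partial p_l}
  = \frac{\dfrac{\partial \lambda_r}{\partial p_l}
          + \dfrac{\partial \sigma}{\partial p_l}\bigl(\lambda_{r+1}; p\bigr)}
         {1 - \dfrac{\partial \sigma}{\partial t}\bigl(\lambda_{r+1}; p\bigr)}.

% Special case: constant delay \sigma = \tau with fixed t_0 and p = [\tau]:
\lambda_r = t_0 + r\,\tau, \qquad \frac{\partial \lambda_r}{\partial \tau} = r.
```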
DDEs and Parameter Estimation - Safe Minimization

Safe Conduction of Optimization

Let

\[ \text{ContinuityConstraints}[p_{Base}] = \{\, \lambda_{r_{p_{Base}}}(p) \le \gamma_i \le \lambda_{r_{p_{Base}}+1}(p) \,\} \]

Then the steps for finding a local optimum can be described as follows:

1. Start with $p_c \leftarrow p_0$.
2. $p_{new} \leftarrow \text{SQP}(p_c, \text{ContinuityConstraints}[p_c])$
3. If some of $\text{ContinuityConstraints}[p_c]$ are active, then $p_c \leftarrow p_{new}$ and continue at (2); otherwise stop with $p^* = p_{new}$.
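A minimal Python sketch of this loop on a one-parameter toy, assuming the model $y(t) = c + 5\,|t - \tau|$ whose only discontinuity point is $\lambda(\tau) = \tau$. A golden-section search on the feasible interval is swapped in for SQP, which is enough in one dimension. The data points, the search range, the tie-breaking flag `upward`, and the `max_rounds` guard are assumptions and not part of the slides.

```python
import math

C, TAU_TRUE = 5.0, 4.0                          # hypothetical toy values
GAMMAS = [1.0, 2.0, 3.0, 3.5, 4.5, 5.0, 6.0]    # hypothetical data points
T_MIN, T_MAX = 0.0, 7.0                         # search range for tau


def model(t, tau):
    """Toy solution whose only discontinuity point is lambda(tau) = tau."""
    return C + 5.0 * abs(t - tau)


DATA = [model(g, TAU_TRUE) for g in GAMMAS]     # noise-free synthetic data


def W(tau):
    return sum((Y - model(g, tau)) ** 2 for g, Y in zip(GAMMAS, DATA))


def continuity_bounds(tau_base, upward):
    """Interval around tau_base on which lambda(tau) keeps its ordering relative
    to every data point; `upward` picks the side when tau_base is a data point."""
    pts = [T_MIN] + GAMMAS + [T_MAX]
    if upward:
        return max(p for p in pts if p <= tau_base), min(p for p in pts if p > tau_base)
    return max(p for p in pts if p < tau_base), min(p for p in pts if p >= tau_base)


def bounded_minimize(f, lo, hi, tol=1e-10):
    """Golden-section search on [lo, hi], used here in place of SQP."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        x1, x2 = hi - r * (hi - lo), lo + r * (hi - lo)
        if f(x1) < f(x2):
            hi = x2
        else:
            lo = x1
    return 0.5 * (lo + hi)


def safe_minimize(tau0, snap=1e-6, max_rounds=50):
    tau_c, upward = tau0, True                       # step 1
    for _ in range(max_rounds):
        lo, hi = continuity_bounds(tau_c, upward)
        tau_new = bounded_minimize(W, lo, hi)        # step 2
        if hi - tau_new < snap and hi < T_MAX:       # step 3: upper constraint active
            tau_c, upward = hi, True
        elif tau_new - lo < snap and lo > T_MIN:     # step 3: lower constraint active
            tau_c, upward = lo, False
        else:
            return tau_new                           # no active constraint: stop
    return tau_c                                     # guard against cycling at a kink


print(safe_minimize(2.6))
```

Starting from 2.6, the estimate moves through the intervals [2, 3], [3, 3.5] and [3.5, 4.5], and stops inside the last one near 4. Each subproblem is smooth because the discontinuity point never crosses a data point within a single call.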
DDEs and Parameter Estimation - A Test Case

Estimating $\tau$ for the predator-prey model

\[ y_1'(t) = y_1(t)\bigl(1 - y_1(t-\tau) - \rho\, y_1'(t-\tau)\bigr) - \frac{y_2(t)\, y_1(t)^2}{y_1(t)^2 + 1} \]
\[ y_2'(t) = y_2(t) \left( \frac{y_1(t)^2}{y_1(t)^2 + 1} - \alpha \right) \]

- Start with up to 10% random perturbation in the original $\tau$, and use $y(t;\tau)$ with up to 3% random perturbation as the data ($Y$).
- For the $\gamma$'s we choose 10 random points, one of which is a discontinuity point.
- Run the parameter estimator 3 times.

Results

Estimator Choice      FCN       OBJ
Very Simple           72,2697   0.000142917
Using Sensitivities   26,942    0.000142917
Adding Constraints    9,916     0.000142917
DDEs and Parameter Estimation - The Role of Data

Back to First Talk!