Numerical Methods for Constrained Optimal Control Problems


Hartono Hartono

A thesis submitted for the degree of Doctor of Philosophy

School of Mathematics and Statistics
University of Western Australia

July 2012


Contents

1 Introduction
   1.1 Pontryagin Minimum Principle
   1.2 Hamilton-Jacobi-Bellman Equation
   1.3 Thesis Overview
2 Exact Power Penalty Methods for State-Constrained Optimal Control Problems
   2.1 Introduction
   2.2 Power Penalty Method and Its Convergence
   2.3 Smoothing Technique
   2.4 Numerical Examples
   2.5 Conclusion
3 Sensitivity Analysis
   3.1 Introduction
   3.2 Sensitivity Analysis Using Power Penalty Method
   3.3 Numerical Examples
   3.4 Conclusion
4 Iterative Upwind Finite Difference Method
   4.1 Introduction
   4.2 Iterative Upwind Finite Difference Method
       4.2.1 Discretization of HJB
       4.2.2 Finding Optimal Trajectory and Control
       4.2.3 Region Size Reduction
   4.3 Algorithm for Iterative Upwind Finite Difference Method
   4.4 Numerical Examples
   4.5 Conclusion
5 Modified Iterative Upwind Finite Difference Method
   5.1 Introduction

   5.2 Modified Iterative Upwind Finite Difference Method
       5.2.1 First Iteration
       5.2.2 Second Iteration
   5.3 Modified Iterative Upwind Finite Difference Method Algorithm
   5.4 Numerical Example
   5.5 Conclusion
6 Iterative Upwind Finite Difference Method with Completed Richardson Extrapolation
   6.1 Introduction
   6.2 Iterative Upwind Finite Difference Method with Completed Richardson Extrapolation
       6.2.1 Completed Richardson Extrapolation
       6.2.2 The Singularly Perturbed Convection-Diffusion Equations
   6.3 Algorithm for Iterative Upwind Finite Difference Method with Completed Richardson Extrapolation
   6.4 Numerical Example
   6.5 Conclusion
7 Conclusions
   7.1 Main Contributions
   7.2 Research Outlooks
       7.2.1 Curse of Dimensionality
       7.2.2 Order of Accuracy

List of Figures

2.1 Optimal control for Example 2.4.1
2.2 Optimal state for Example 2.4.1
2.3 Optimal control for Example 2.4.2
2.4 Optimal state x_1 for Example 2.4.2
2.5 Optimal state x_2 for Example 2.4.2
3.1 Optimal control comparison for Example 3.3.1 with n = 3, p = 2 and λ = …
3.2 Optimal state comparison for Example 3.3.1 with n = 3, p = 2 and λ = …
3.3 The difference between computed and MISER optimal state for Example 3.3.1 with n = 3, p = 2 and λ = …
3.4 Optimal control comparison for Example 3.3.2 with n = 3, p = 2 and λ = …
3.5 Optimal state comparison for Example 3.3.2 with n = 3, p = 2 and λ = …
3.6 The difference between computed and MISER optimal state for Example 3.3.2 with n = 3, p = 2 and λ = …
4.1 Value function for Example …
4.2 Optimal control for Example …
4.3 Optimal state along optimal trajectory for Example …
4.4 Value function along optimal trajectory for Example …
4.5 Optimal control along optimal trajectory for Example …
4.6 Constraint along optimal trajectory for Example …
4.7 Penalty along optimal trajectory for Example …
4.8 The cross-section of the value function corresponding to x_2 = 1 for Example …
4.9 The cross-section of the value function corresponding to x_1 = 0 for Example …
4.10 The cross-section of the optimal control corresponding to x_2 = 1 for Example …

4.11 The cross-section of the optimal control corresponding to x_1 = 0 for Example …
4.12 Value function along optimal trajectory for Example …
4.13 Optimal control along optimal trajectory for Example …
4.14 Optimal state x_1 for Example …
4.15 Optimal state x_2 for Example …
4.16 Original constraint for Example …
4.17 Modified constraint for Example …
4.18 Penalty along optimal trajectory for Example …
5.1 Optimal trajectory from Iteration 1
5.2 Optimal trajectory from Iteration 2
5.3 Optimal trajectory from Iteration 3
5.4 Optimal trajectory from Iteration 4
5.5 Computed value function at t = …
5.6 Computed value function at t = …
5.7 Computed optimal control at t = …
5.8 Computed optimal control at t = …
5.9 Value function along optimal trajectory
5.10 Optimal control along optimal trajectory
5.11 x_1 vs time
5.12 x_2 vs time
5.13 Original constraints along optimal trajectory
5.14 Modified constraints along optimal trajectory
6.1 Value function for Example …
6.2 Optimal control for Example …
6.3 Optimal state along optimal trajectory for Example …
6.4 Value function along optimal trajectory for Example …
6.5 Optimal control along optimal trajectory for Example …
6.6 Constraint along optimal trajectory for Example …
6.7 Penalty along optimal trajectory for Example …

List of Tables

2.1 Results for Example 2.4.1
2.2 Optimal solution for Example 2.4.1
2.3 Computed error for u and x in the maximum and L^2 norm
2.4 Result for Example 2.4.2
2.5 Result for Example 2.4.2
4.1 Computational results for Example … (c = r = 2, ρ = 10^{-2})
4.2 Computed error for u in the maximum and L^2 norm
4.3 Computed error for x in the maximum and L^2 norm
4.4 Computational result for Example … (c = r = 2, ρ = 10^{-2})
4.5 Computational result for Example … (c = r = 2, ρ = 10^{-2})
5.1 Computational result for Example … (r = 2, ρ = 10^{-2})
5.2 Computational result for Example … (r = 2, ρ = 10^{-2})
6.1 Computational result for Example … (e = r = 2, ε = 10^{-1})
6.2 Computed error for u in the maximum and L^2 norm
6.3 Computed error for x in the maximum and L^2 norm

Acknowledgements

First of all, I would like to thank My Lord, Jesus Christ, for His blessing and mercy in enabling me to complete this thesis. Without Him I would not have been able to reach this point.

Most gratefully I would like to thank my primary supervisor, Prof. Les Jennings, for his patience, understanding, guidance and concern with the progress of my research. In particular, I am thankful for the proofreading, error catching and many valuable suggestions for improving this thesis. His expertise in Numerical Analysis and Optimal Control helped me build a good background for doing the research. I am deeply indebted to him. To my co-supervisor, Prof. Song Wang, I owe much for agreeing to co-supervise my research. He played a crucial role in supervision, particularly during 2009 when Prof. Les Jennings was on sabbatical leave. To the entire Mathematics and Statistics School staff, especially Roman Bogoyev and Con Savas, many thanks for addressing the many problems that come with doing a Ph.D.

Next, I want to express my gratitude to AusAID for giving me the opportunity to study in Australia through the ADS (Australian Development Scholarships); their financial support made my dream come true. Mrs. Chris Kerin and Mrs. Deborah Pyatt, as previous and current AusAID Liaison Officers, have also contributed in various ways and at different times during my study to the completion of this thesis. Thank you very much!

Above all, it takes more than mathematics to make life worth living. For this, I thank my beloved family: my parents, my brothers and sisters, and my friends (Albert, Noni, Widodo and others). In particular, my wife, Margareta Rita, always supported me during the times that brought me down, and sacrificed much by accompanying me here. Once again, I thank her greatly.

Last but not least, to my mother, Mrs. Maria Roostiati, who constantly loved and supported me without expecting anything in return throughout her life, I say a big thank-you. Although you are no longer with me, I believe that in Heaven you always pray for my success. To your memory this thesis is dedicated.

Abstract

In this thesis we consider numerical methods for solving state-constrained optimal control problems. There are two main foci in the research: state-constrained optimal open-loop control problems and state-constrained optimal feedback control problems. In all cases, we reformulate the constrained optimal control problem as an unconstrained problem through a penalty method. The state constraints discussed here are inequality constraints only, but of both the purely state-constraint and the mixed control-state constraint type.

For solving state-constrained optimal open-loop control problems, we establish a power penalty method and analyze its convergence. This method is then implemented in MISER 3.3 for numerical tests. The results confirm that the method works very well. Furthermore, we use the power penalty method to carry out a sensitivity analysis.

For solving state-constrained optimal feedback control problems, on the other hand, we construct a new numerical algorithm. The algorithm, based on an upwind finite-difference scheme, is iterated in order to increase the accuracy and speed of computation. In particular, to address the curse of dimensionality, a special method for generating grid points in the domain is developed. Numerical experiments show that the computational speed increases significantly with this modified method. Moreover, for further improvement in accuracy, the algorithm can be combined with the Richardson Extrapolation Method.

Chapter 1

Introduction

1.1 Pontryagin Minimum Principle

Dynamic optimization, or optimal control theory, is a branch of mathematics aimed at finding optimal ways to control a dynamical system: an objective functional is optimized subject to a system of differential equations. In general the problem takes the form

\[
\min_{u(t) \in U} J(u) = \int_0^{t_f} L(x(t), u(t), t)\,dt + \phi(x(t_f))
\]
subject to
\[
\dot{x} = f(x(t), u(t), t) \quad \text{for all } t \in (0, t_f], \qquad x(0) = x_0,
\]

where x ∈ R^n and u ∈ U ⊆ R^m are the state and control, t_f > 0 is a constant, and L : R^{n+m+1} → R, φ : R^n → R and f : R^{n+m+1} → R^n are known functions.

Theoretical foundations for solving this problem were laid by many great mathematicians and culminated in the publication of the Pontryagin Minimum Principle by Pontryagin, Boltyanskii, Gamkrelidze and Mischenko in 1962 [34]. This principle provides a necessary condition (see the theorem that follows) for a control u* to be optimal, that is, to yield a local minimum of the objective functional J(u).

Theorem 1.1.1. If u*(t) is an optimal control and x*(t) and λ*(t) are the corresponding state and adjoint vector, then it is necessary that

1. \(\dot{x}^*(t) = \partial H(t, x^*(t), u^*(t), \lambda^*(t))/\partial \lambda = f(t, x^*(t), u^*(t))\),

2. \(x^*(0) = x_0\),

3. \(\dot{\lambda}^*(t) = -\partial H(t, x^*(t), u^*(t), \lambda^*(t))/\partial x\),

4. \(\lambda^*(t_f) = \partial \phi(x^*(t_f))/\partial x\),

5. \(\partial H(t, x^*(t), u^*(t), \lambda^*(t))/\partial u = 0\) for all t ∈ [0, t_f],

where H is the Hamiltonian function, H(t, x, u, λ) = L(t, x, u) + λ^T f(t, x, u).
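Conditions 1 to 5 turn the optimal control problem into a two-point boundary-value problem: the state carries an initial condition while the adjoint carries a terminal one. The sketch below, on an assumed toy problem rather than anything from this thesis, illustrates the resulting shooting approach: guess λ(0), integrate the coupled state-adjoint system with u eliminated through condition 5, and adjust the guess until condition 4 holds.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

# Toy problem (assumed for illustration): min \int_0^1 (x^2 + u^2) dt,
# xdot = u, x(0) = 1, phi = 0.  Then H = x^2 + u^2 + lam*u, and
# H_u = 2u + lam = 0 gives u* = -lam/2; the adjoint is lamdot = -H_x = -2x.

def rhs(t, z):
    x, lam = z
    u = -lam / 2.0              # condition 5: H_u = 0
    return [u, -2.0 * x]        # state and adjoint dynamics

def shoot(lam0):
    sol = solve_ivp(rhs, (0.0, 1.0), [1.0, lam0], rtol=1e-9)
    return sol.y[1, -1]         # condition 4 residual: lam(1) should be 0

lam0 = brentq(shoot, -10.0, 10.0)   # root-find the initial adjoint
print("lambda(0) =", lam0)          # analytic value is 2*tanh(1) ~ 1.523
```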

Using this principle, some simple cases can be solved analytically (cf. [44], [45]). In practice, however, the problem may be too large and complex for hand computation, and the use of a computer becomes unavoidable. At present there are many successful computational algorithms for solving optimal control problems ([3], [19], [30], [42], [43], [46], [48], to name but a few). Among them is a method called control parameterization, popular for its simplicity. This method works on the basis that it is possible to construct a sequence of approximate problems whose solutions constitute suboptimal solutions of the given problem. This is done by discretizing the control space so as to transform the optimal control problem into an ordinary constrained mathematical programming problem. A unified computational approach using this technique is available in [17] and [53]. This approach shows that, through constraint transcription, the constraints can be written in the same form as the objective function; as a result, a unified and efficient computational scheme can be generated. In [54], a new constraint transcription was introduced which yields a more stable numerical algorithm. The resultant equality constraints can, furthermore, be smoothed using a local smoothing technique [25]. This has been implemented successfully in a commercial software package named MISER [52], which has since developed into MISER 3 Version 3 [24]. Moreover, a new computational procedure for solving functional-inequality-constrained optimization problems was proposed in [55]. This procedure replaces the constraints by a linear penalty function which is then added to the objective function. The advantage of this method is that it turns the given problem into an easier unconstrained problem. However, because the linear penalty term is not differentiable, standard optimization methods cannot cope with this type of constraint; a smoothing technique is required to smooth out the sharp corner in the function. Control parameterization is then used to solve the optimal control problem.
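To make the control parameterization idea concrete, here is a minimal sketch on an assumed toy problem (this is not MISER's implementation): the control is represented by K piecewise-constant values on [0, 1], so the optimal control problem collapses to a K-variable nonlinear program.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

K = 10  # number of piecewise-constant control segments

def u_of_t(t, params):
    k = min(int(t * K), K - 1)          # index of the active segment
    return params[k]

def cost(params):
    # toy dynamics xdot = u with running cost x^2 + u^2 (assumed)
    def rhs(t, z):
        u = u_of_t(t, params)
        return [u, z[0] ** 2 + u ** 2]  # z = (state, accumulated cost)
    sol = solve_ivp(rhs, (0.0, 1.0), [1.0, 0.0], rtol=1e-8)
    return sol.y[1, -1]

res = minimize(cost, np.zeros(K), method="Powell")
print("piecewise-constant control values:", np.round(res.x, 3))
```

Refining the partition gives a new finite-dimensional problem, and under suitable conditions the sequence of solutions approaches that of the original problem, as discussed in [17] and [53].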

In the above papers, and in most current research, a linear or quadratic term is the most frequently used penalty function for solving constrained optimal control problems. On the other hand, the power penalty method, which has been successful for solving various complementarity problems such as those in [58], [60] and [61], is very rare in the optimal control literature. This method therefore has the potential to be investigated in depth, and it is explored in this study.

1.2 Hamilton-Jacobi-Bellman Equation

Another way to deal with an optimal control problem is to use Bellman's Dynamic Programming idea. This requires the solution of the Hamilton-Jacobi-Bellman (HJB) partial differential equation in a domain of the state space containing the optimal solution. Therefore, in this section we present some theoretical background needed to construct an appropriate notion of numerical solution for the HJB equation. We first present the definition of a viscosity solution of the first order partial differential equations known as Hamilton-Jacobi (HJ) equations. Next, we state the assumptions and theorems needed to show that the viscosity solution is the unique solution of the HJ equation. Then we apply viscosity solution theory to optimal feedback control problems and show that the value function is the unique viscosity solution of the HJB equation. Finally, we discuss the constrained viscosity solution and conclude with the connection between dynamic programming and the Pontryagin minimum principle.

In the following we consider the first order evolution equation called the Hamilton-Jacobi (HJ) equation,

\[
w_t + H(t, x, \nabla w) = 0, \qquad (1.2.1)
\]
\[
w(0, x) = q(x),
\]

where the Hamiltonian H : (0, t_f] × R^n × R^n → R and the initial function q are given. The unknown is w : [0, t_f] × R^n → R, w = w(t, x), with ∇w = ∇_x w = (w_{x_1}, ..., w_{x_n}).

Unfortunately, in general the HJ equation (1.2.1) does not have a solution in the classical sense. The generalized solution, namely a Lipschitz continuous function on the closure of the domain which satisfies the equation at almost every point, also often fails to be unique. Therefore, P.L. Lions et al. in [7], [8] and [26] introduced the concept of viscosity solution to deal with this problem.

Definition 1.2.1. A function w ∈ C((0, t_f] × R^n) is a viscosity subsolution of equation (1.2.1) if, for every ϕ ∈ C^1((0, t_f] × R^n),

\[
\varphi_t(t_0, x_0) + H(t_0, x_0, \nabla\varphi(t_0, x_0)) \le 0 \qquad (1.2.2)
\]

at any local maximum point (t_0, x_0) ∈ (0, t_f] × R^n of w − ϕ.

Definition 1.2.2. A function w ∈ C((0, t_f] × R^n) is a viscosity supersolution of equation (1.2.1) if, for every ϕ ∈ C^1((0, t_f] × R^n),

\[
\varphi_t(t_0, x_0) + H(t_0, x_0, \nabla\varphi(t_0, x_0)) \ge 0 \qquad (1.2.3)
\]

at any local minimum point (t_0, x_0) ∈ (0, t_f] × R^n of w − ϕ.

Definition 1.2.3. A function w ∈ C((0, t_f] × R^n) is a viscosity solution of equation (1.2.1) if w is simultaneously a subsolution and a supersolution in the viscosity sense and w(0, x) = q(x).

It can be shown further that the classical (smooth) solution of an HJ equation is also a viscosity solution and, conversely, any sufficiently smooth viscosity solution is a classical solution. Moreover, if a viscosity solution is differentiable at some point then it solves the HJ equation at that point. The uniqueness of a viscosity solution of the HJ equation can be proven through the following Comparison Theorem.

Theorem 1.2.1. Let the Hamiltonian H : [0, t_f] × R^n × R^n → R be Lipschitz continuous, i.e.

\[
|H(t, x, p) - H(s, y, p)| \le C(|t - s| + \|x - y\|)(1 + \|p\|), \qquad (1.2.4)
\]
\[
|H(t, x, p) - H(t, x, q)| \le C\|p - q\| \qquad (1.2.5)
\]

for all s, t ∈ R, x, y, p, q ∈ R^n and some constant C. Let w_1, w_2 be, respectively, a bounded, uniformly continuous subsolution and supersolution of (1.2.1). If w_1(0, x) ≤ w_2(0, x) for all x ∈ R^n, then w_1(t, x) ≤ w_2(t, x) for all (t, x) ∈ [0, t_f] × R^n.

PROOF. See [5] or [1] for the proof of this theorem.

As a consequence of the Comparison Theorem, the uniqueness of a viscosity solution is stated in the following corollary.

Corollary 1.2.1. Under assumptions (1.2.4) and (1.2.5), there exists at most one bounded, uniformly continuous viscosity solution of (1.2.1).

Now we apply the above theory to the optimal control problem that follows:

\[
\min_{u(t) \in U} J(s, y, u) = \int_s^{t_f} L(x(t), u(t), t)\,dt + \phi(x(t_f))
\]
subject to
\[
\dot{x} = f(x(t), u(t), t) \quad \text{for all } t \in (s, t_f], \qquad x(s) = y,
\]

where x ∈ R^n and u ∈ U ⊆ R^m are the state and control, t_f > 0 is a constant, (s, y) ∈ [0, t_f) × R^n is a given point, and L : R^{n+m+1} → R, φ : R^n → R and f : R^{n+m+1} → R^n are known functions.

We make the following assumptions on the functions defining this optimal control problem:

1. The right hand side f of the dynamical system satisfies:
(a) f ∈ (C([0, t_f] × R^n × U))^n,
(b) f is bounded: ‖f(x, u, t)‖ ≤ C for some constant C and all x ∈ R^n, u ∈ U,
(c) f is locally Lipschitz continuous in x, uniformly in u and t, i.e. ‖f(x, u, t) − f(y, u, t)‖ ≤ C‖x − y‖ for all x, y ∈ R^n, u ∈ U.

2. The running cost L fulfills:
(a) L ∈ C([0, t_f] × R^n × U),
(b) L is bounded: |L(x, u, t)| ≤ C for some constant C and all x ∈ R^n, u ∈ U,
(c) L is Lipschitz continuous in x, uniformly in u and t, i.e. |L(x, u, t) − L(y, u, t)| ≤ C‖x − y‖ for all x, y ∈ R^n, u ∈ U.

3. The terminal cost function φ meets the following requirements:
(a) φ is uniformly continuous,
(b) φ is bounded: |φ(x)| ≤ C for some constant C and all x ∈ R^n.

In addition, we assume that the set U is a compact subset of R^m and denote

U = {u : R → R^m measurable, u(t) ∈ U for a.e. t}.

Using the Dynamic Programming Method, the optimal control problem can be investigated by turning attention to the value function

\[
V(s, y) = \inf_{u \in \mathcal{U}} J(s, y, u). \qquad (1.2.6)
\]

Here we consider the whole family of optimal control problems sharing the same dynamical system f and cost functional J, and we study how the minimum cost varies as a function of the initial conditions. The following lemma establishes the important properties of the value function V: boundedness and Lipschitz continuity.

Lemma 1.2.1. Let the functions f, L, φ satisfy the above assumptions. Then the value function V is bounded and Lipschitz continuous, i.e. there exists C such that

|V(s, y)| ≤ C and |V(s_1, y_1) − V(s_2, y_2)| ≤ C(|s_1 − s_2| + ‖y_1 − y_2‖).

PROOF. See [1] for the proof.

Next, before showing that this value function V uniquely solves a particular Hamilton-Jacobi (HJ) equation called the Hamilton-Jacobi-Bellman (HJB) equation, we need the following Bellman Dynamic Programming Principle.

Theorem 1.2.2. For each τ ∈ [s, t_f] and y ∈ R^n, we have

\[
V(s, y) = \inf_{u} \left\{ \int_s^{\tau} L(x(t), u(t), t)\,dt + V(\tau, x(\tau)) \right\}
\]

where x solves the dynamical system ẋ = f(x, u, t) with x(s) = y.
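Before the formal two-step reading given below, the principle is easy to see in action on a discrete grid: backward induction computes the value at each time level as the best one-step cost plus the already-computed value at the next level. The following sketch, on an assumed toy problem with dynamics ẋ = u and running cost x² + u², is purely illustrative.

```python
import numpy as np

xs = np.linspace(-2.0, 2.0, 81)   # state grid
us = np.linspace(-1.0, 1.0, 21)   # control grid
dt, N = 0.05, 20                  # horizon t_f = N * dt

V = xs ** 2                       # terminal cost phi(x) = x^2 (assumed)
for k in range(N):                # march backward from t_f
    Vnew = np.empty_like(V)
    for i, x in enumerate(xs):
        xnext = np.clip(x + dt * us, xs[0], xs[-1])   # one Euler step per u
        Vnext = np.interp(xnext, xs, V)               # V(tau, x(tau))
        Vnew[i] = np.min(dt * (x ** 2 + us ** 2) + Vnext)
    V = Vnew
# V now approximates the value function at the initial time
```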

In other words, the theorem suggests that to find the least cost defined in (1.2.6) we proceed in two steps:

1. First, we find the minimum value for the sub-interval [τ, t_f] with running cost L and terminal cost φ, namely V(τ, x(τ)).

2. Second, we solve the optimization problem on the sub-interval [s, τ] with running cost L and terminal cost V(τ, x(τ)) determined in the first step.

PROOF. The proof of this theorem can be found in [5] or [1].

Now we state the most important theorem, showing that the value function V is the unique viscosity solution of the Hamilton-Jacobi-Bellman equation.

Theorem 1.2.3. The value function V is the unique viscosity solution of the Hamilton-Jacobi-Bellman equation

\[
V_t + \inf_{u \in U} \left( \nabla V \cdot f(x, u, t) + L(x, u, t) \right) = 0 \qquad (1.2.7)
\]

with terminal condition V(t_f, x) = φ(x).

PROOF. We refer to [1] for the proof.

In addition, if the state x(t) is restricted to stay within a closed set O ⊆ R^n, that is, x(t) ∈ O for all t ∈ [s, t_f], then the problem is called an HJB equation with state constraints (see [9], [14], [49]). To address this problem, H. M. Soner in [14], [49] extended the notion of viscosity solution to the constrained viscosity solution of the HJB equation.

Definition 1.2.4. V is called a constrained viscosity solution of the HJB equation (1.2.7) if V is a viscosity solution of (1.2.7) in [s, t_f) × int O and a viscosity supersolution of (1.2.7) on [s, t_f) × O.

Although the theory of the HJB equation now rests on a firm foundation, the equation is in general not solvable analytically, and numerical approximations are required. Many numerical algorithms for solving the HJB equation without constraints exist in the literature (cf. [2], [22], [23], [36], [57] and [59], to name but a few). Nonetheless, the

HJB equation with state constraints has rarely been treated numerically. Thus, in this thesis we propose some numerical methods to solve it.

Finally, let us conclude by mentioning a basic relation between viscosity solutions and the Pontryagin Minimum Principle: the trajectories which satisfy the Pontryagin Minimum Principle provide characteristic curves for the HJB equation. In other words, the adjoint variable equals the gradient of the value function (in a generalized sense) evaluated along the optimal trajectory. We refer to [2], [64] for more details.

1.3 Thesis Overview

The main contribution of this thesis is the design of new, efficient numerical methods for solving state-constrained optimal control problems. The thesis consists of two parts. First, we focus on a power penalty method for solving the state-constrained optimal open-loop control problem and for obtaining sensitivity analysis information; this is discussed in Chapters 2 and 3. The second part concerns a linear penalty method for solving the state-constrained optimal feedback control problem; the methods described in Chapters 4 to 6 extend the Upwind Finite-Difference Method available in the open literature. More specifically, the organization of the thesis is as follows:

Chapter 1 provides an introduction containing the background and literature review of the research.

Chapter 2 deals with the power penalty method for solving state-constrained optimal control problems. We first prove that the proposed power penalty method is an exact penalty method. Examples confirming the theoretical findings are then given.

Chapter 3 discusses the use of the power penalty method to provide sensitivity analysis information. We start with the derivation of formulas which reveal the effect of variations in the constraints on the objective function. Two examples substantiate the claim.

Chapter 4 proposes numerical algorithms for solving the state-constrained optimal feedback control problem. We extend the Upwind Finite-Difference Method proposed in [59] for unconstrained optimal feedback control problems to state-constrained problems by incorporating a linear penalty term in the formulation. Furthermore, to increase the computational speed, a systematic iterative

procedure is introduced, creating a new method named the Iterative Upwind Finite-Difference Method. Two examples from a previous Chapter are revisited to validate the method.

Chapter 5 modifies the Iterative Upwind Finite-Difference Method of the previous Chapter in order to increase the computational speed further. An example demonstrates the effectiveness of the new method.

Chapter 6 presents the combination of the Iterative Upwind Finite-Difference Method with Richardson Extrapolation, with the intention of improving the accuracy of the computation. To support the theory, an example illustrates how this method works.

Chapter 7 concludes the research results and describes the outlook for future research.

Chapter 2

Exact Power Penalty Methods for State-Constrained Optimal Control Problems

2.1 Introduction

Optimal control problems arise naturally in decision making in many aspects of life, such as science, engineering, finance and management. These problems contain two sets of unknowns: state variables and control variables. Conventionally, the control variables are constrained by a set of conditions defining the set of all feasible policies in the decision making. However, constraints on the state variables may also be necessary; for example, feasible solutions of a state-space, or even state-control-space, problem may need to stay in a certain domain. A typical form of such a problem is as follows:

Problem 2.1.1.

\[
\min_{u(t) \in \Omega} J(u) = \int_0^{t_f} L(x(t), u(t), t)\,dt + \phi(x(t_f))
\]
subject to
\[
\dot{x} = f(x(t), u(t), t) \quad \text{for all } t \in (0, t_f], \qquad x(0) = x_0,
\]
where
\[
\Omega = \{u(t) \in R^q : g(x, u, t) \le 0\} \quad \text{for all } t \in (0, t_f],
\]

t_f > 0 is a constant, x ∈ R^n, x_0 ∈ R^n is a given point, and L : R^{n+q+1} → R, φ : R^n → R, f : R^{n+q+1} → R^n and g : R^{n+q+1} → R^m are known functions.

This problem is in general not solvable analytically, and thus a numerical approximation to the solution of the problem is normally sought in practice.

There are many existing numerical methods for solving optimal control problems [53]. However, most of them have been developed for problems without state constraints. Several numerical methods have been proposed for state-constrained problems in the open literature. Among them, the linear penalty approach [54] is frequently used to solve Problem 2.1.1 due to its simplicity. In this method, a constrained problem is approximated by an unconstrained one, obtained by adding to the original objective function a term that prescribes a high cost for violation of the constraints, with a penalty constant r > 0. Under certain conditions it can be shown that the solution of the penalty method converges to the solution of the original problem as r → +∞. In the case that the solution of the unconstrained problem is also a solution of Problem 2.1.1 whenever r ≥ r_0 for some positive constant r_0, the penalty method is called an exact penalty method. There are some exact penalty methods for state-constrained optimal control problems, such as those in [62], [63] and [35]. However, these methods, like most existing penalty methods, use linear or quadratic penalty functions. Recently, penalty methods based on power functions with power less than 1 (such as 1/2 and 1/3) have been proposed for constrained nonlinear optimization and complementarity problems (cf., for example, [4, 58, 60]). Mathematical analysis in these works shows that the power penalty methods have exponential convergence rates depending on the penalty parameters used. In this work, we present a power penalty method for Problem 2.1.1 and show that it is an exact penalty method. Numerical results are presented to show the usefulness of the method.

2.2 Power Penalty Method and Its Convergence

We now present the power penalty method and show that it is an exact penalty method. Before further discussion, it is necessary to make some assumptions on the given functions in Problem 2.1.1. For any non-negative integer l and positive integer d, let C^l(R^d) denote the usual function set such that if v ∈ C^l(R^d), then v and all of its partial derivatives of order up to and including l are continuous in R^d. Using this notation, we make the following assumptions.

Assumption 2.2.1.
1. L ∈ C^2(R^{n+q+1}), φ ∈ C^2(R^n), f ∈ (C^2(R^{n+q+1}))^n, g ∈ (C^2(R^{n+q+1}))^m.
2. Problem 2.1.1 has a solution.

Consider the following penalty problem:

Problem 2.2.1.

\[
\min_{u(t) \in R^q} P(u, r) = J(u) + \int_0^{t_f} r^T(t)\, g^{1/p}(x(t), u(t), t)\,dt
\]
subject to
\[
\dot{x} = f(x(t), u(t), t) \quad \text{for all } t \in (0, t_f], \qquad x(0) = x_0,
\]

where p > 1 is a constant and the vectors r = (r_1(t), r_2(t), ..., r_m(t))^T and g^{1/p} = (g_1^{1/p}, g_2^{1/p}, ..., g_m^{1/p})^T satisfy r_i(t) > 0 and

\[
g_i^{1/p} =
\begin{cases}
0 & \text{if } g_i \le 0; \\
g_i^{1/p} & \text{if } 0 < g_i < 1; \\
g_i^{p} & \text{if } g_i \ge 1,
\end{cases}
\qquad (2.2.1)
\]

for i = 1, 2, ..., m.

Remark 2.2.1. We comment that in Problem 2.2.1 the cost function P contains the power penalty term, which penalizes the parts of g(x, u, t) that violate the constraint g(x, u, t) ≤ 0. It is clear that if u(t) ∈ Ω, then u(t) and its response x(t) satisfy g(x, u, t) ≤ 0, and hence P(u(t), r(t)) = J(u(t)). Otherwise, P(u(t), r(t)) > J(u(t)) because the term involving r^T(t) g^{1/p}(x(t), u(t), t) is positive.

We now show that Problem 2.1.1 and Problem 2.2.1 are equivalent when r is sufficiently large. The following theorem extends the work of Xing in [62], [63], which originally focused on a linear penalty.

Theorem 2.2.1. Let (x*, u*) be a stationary point and H the Hamiltonian function of Problem 2.1.1, i.e.

\[
H(x(t), u(t), \lambda(t), \mu(t)) = L(x, u, t) + \lambda^T(t) f(x, u, t) + \mu^T(t) g(x, u, t).
\]

Suppose that φ_xx is positive (semi)definite and that H_uu and H_xx − H_xu H_uu^{-1} H_ux are positive definite on [0, t_f]. If r_i(t) ≥ μ_i(t) for i = 1, ..., m, then u* is a local solution of Problem 2.2.1.

PROOF. A proof for the case p = 1 can be found in [62], [63]. We now extend the proof to arbitrary p > 1. Let u* + δu be any control and x* + δx its corresponding response, with δx(0) = 0. Then

\[
\Delta P(u^*, r) = J(u^* + \delta u) - J(u^*) + \int_0^{t_f} r^T(t)\left( g^{1/p}(x^* + \delta x, u^* + \delta u, t) - g^{1/p}(x^*, u^*, t) \right) dt.
\]

Because g^{1/p}(x*, u*, t) = 0 at the stationary point,

\[
\Delta P(u^*, r) = \int_0^{t_f} \left[ H(x^* + \delta x, u^* + \delta u, t) - \lambda^T(t) f(x^* + \delta x, u^* + \delta u, t) - \mu^T(t) g(x^* + \delta x, u^* + \delta u, t) \right] dt
\]
\[
\quad + \phi(x^*(t_f) + \delta x(t_f)) - \phi(x^*(t_f)) - \int_0^{t_f} \left[ H(x^*, u^*, t) - \lambda^T(t) f(x^*, u^*, t) - \mu^T g(x^*, u^*, t) \right] dt
\]
\[
\quad + \int_0^{t_f} r^T(t)\, g^{1/p}(x^* + \delta x, u^* + \delta u, t)\, dt.
\]

Using Taylor's formula, and noting that μ^T(t) g(x*, u*, t) = 0 at the stationary point, the above equation becomes

\[
\Delta P(u^*, r) = \int_0^{t_f} \left[ H_x \delta x + H_u \delta u + \tfrac{1}{2}\left( \delta x^T H_{xx} \delta x + 2 \delta x^T H_{xu} \delta u + \delta u^T H_{uu} \delta u \right) \right] dt
\]
\[
\quad - \int_0^{t_f} \lambda^T(t)\, \delta \dot{x}\, dt - \int_0^{t_f} \mu^T g(x^* + \delta x, u^* + \delta u, t)\, dt + (\phi_x(x^*(t_f)))^T \delta x(t_f)
\]
\[
\quad + \tfrac{1}{2}\, \delta x^T(t_f)\, \phi_{xx}(x^*(t_f))\, \delta x(t_f) + \int_0^{t_f} r^T(t)\, g^{1/p}(x^* + \delta x, u^* + \delta u, t)\, dt + R(\delta x, \delta u),
\]

where the remainder R(δx, δu) satisfies

\[
|R(\delta x, \delta u)| \le \epsilon(\delta x, \delta u) \int_0^{t_f} \left( \delta x(t)^T \delta x(t) + \delta u(t)^T \delta u(t) \right) dt
\]

and, if ‖δu‖ → 0, then ‖δx(t)‖ → 0 and ε(δx, δu) → 0 (because of the continuous dependence of the solution on parameters). Since H_u = 0 at the stationary point, according to Pontryagin's Minimum Principle,

\[
\Delta P(u^*, r) = \int_0^{t_f} \left[ H_x \delta x + \tfrac{1}{2}\left( \delta x^T H_{xx} \delta x + 2 \delta x^T H_{xu} \delta u + \delta u^T H_{uu} \delta u \right) \right] dt - \int_0^{t_f} \lambda^T \delta \dot{x}\, dt
\]
\[
\quad - \int_0^{t_f} \mu^T g(x^* + \delta x, u^* + \delta u, t)\, dt + (\phi_x(x^*(t_f)))^T \delta x(t_f) + \tfrac{1}{2}\, \delta x^T(t_f)\, \phi_{xx}(x^*(t_f))\, \delta x(t_f)
\]
\[
\quad + \int_0^{t_f} r^T g^{1/p}(x^* + \delta x, u^* + \delta u, t)\, dt + R(\delta x, \delta u).
\]

Integrating the term involving λ^T δẋ by parts gives

\[
\int_0^{t_f} \lambda^T \delta \dot{x}\, dt = \lambda^T \delta x \Big|_0^{t_f} - \int_0^{t_f} \dot{\lambda}^T \delta x\, dt.
\]

Based on Pontryagin's Minimum Principle, H_x = −λ̇^T(t) and λ(t_f) = φ_x(x*(t_f)); hence

\[
\Delta P(u^*, r) = \int_0^{t_f} \tfrac{1}{2}\left( \delta x^T H_{xx} \delta x + 2 \delta x^T H_{xu} \delta u + \delta u^T H_{uu} \delta u \right) dt + \tfrac{1}{2}\, \delta x^T(t_f)\, \phi_{xx}(x^*(t_f))\, \delta x(t_f)
\]
\[
\quad + \int_0^{t_f} \left( r^T g^{1/p} - \mu^T g \right)(x^* + \delta x, u^* + \delta u, t)\, dt + R(\delta x, \delta u).
\]

Let

\[
\delta^2 P = \int_0^{t_f} \tfrac{1}{2}\left( \delta x^T H_{xx} \delta x + 2 \delta x^T H_{xu} \delta u + \delta u^T H_{uu} \delta u \right) dt + \tfrac{1}{2}\, \delta x^T(t_f)\, \phi_{xx}(x^*(t_f))\, \delta x(t_f),
\]

and, because r_i(t) ≥ μ_i(t) and g_i^{1/p} ≥ g_i for all i = 1, 2, ..., m,

\[
\int_0^{t_f} \left( r^T g^{1/p} - \mu^T g \right)(x^* + \delta x, u^* + \delta u, t)\, dt \ge 0.
\]

Consequently,

\[
\Delta P(u^*, r) \ge \delta^2 P + R(\delta x, \delta u).
\]

Since H_uu and H_xx − H_xu H_uu^{-1} H_ux are positive definite, it is possible to take s > 0 (small enough) such that H_uu − sI is positive definite and

\[
(H_{xx} - sI) - H_{xu}(H_{uu} - sI)^{-1} H_{ux} = H_{xx} - H_{xu} H_{uu}^{-1} H_{ux} - \left( sI + H_{xu}(H_{uu} - sI)^{-1} H_{ux} - H_{xu} H_{uu}^{-1} H_{ux} \right)
\]

is also positive definite. Therefore there is a matrix Q such that

\[
Q^T Q = (H_{xx} - sI) - H_{xu}(H_{uu} - sI)^{-1} H_{ux}
\]

and

\[
\delta^2 P - \frac{s}{2} \int_0^{t_f} \left( \delta x^T \delta x + \delta u^T \delta u \right) dt
= \tfrac{1}{2}\, \delta x^T(t_f)\, \phi_{xx}\, \delta x(t_f)
+ \frac{1}{2} \int_0^{t_f}
\begin{bmatrix} \delta u + (H_{uu} - sI)^{-1} H_{ux}\, \delta x \\ Q\, \delta x \end{bmatrix}^T
\begin{bmatrix} H_{uu} - sI & 0 \\ 0 & I \end{bmatrix}
\begin{bmatrix} \delta u + (H_{uu} - sI)^{-1} H_{ux}\, \delta x \\ Q\, \delta x \end{bmatrix} dt \ \ge\ 0.
\]

Hence

\[
\Delta P(u^*, r) \ge \frac{s}{2} \int_0^{t_f} \left( \delta x^T \delta x + \delta u^T \delta u \right) dt + R(\delta x, \delta u) \ge 0
\]

as ‖δu‖ → 0. This completes the proof that u*(t) is a local solution of Problem 2.2.1.

Corollary 2.2.1. Under the same assumptions as in the theorem, u*(t) is also a local solution of Problem 2.1.1.

PROOF. Because u*(t) is a local solution of Problem 2.2.1, there exists ρ > 0 such that

P(u*, r) ≤ P(u, r) for all u(t) ∈ N(ρ) = {u(t) : ‖u(t) − u*(t)‖ ≤ ρ}.

In particular this holds for u(t) ∈ N(ρ) ∩ Ω. Thus

J(u*) = P(u*, r) ≤ P(u, r) = J(u).

Hence, under the conditions stated in the theorem, it has been proven that the power penalty function is an exact penalty.

Remark 2.2.2. For a purely state constraint h(x, t) ≤ 0, the constraint does not give any direct information on how to choose a control that satisfies it. For this reason, we replace the constraint g(x, u, t) ≤ 0 in the theorem with the equivalent constraint

ψ h(x, t) + ϕ h'(x, u, t) ≤ 0,

where ψ, ϕ > 0 are two additional control variables, as mentioned in [21]. Here

\[
h'(x, u, t) = \frac{dh}{dt} = \frac{\partial h}{\partial x} f(x, u, t) + \frac{\partial h}{\partial t}.
\]

Remark 2.2.3. To reduce the number of additional controls, we modify this new constraint further as

ψ h(x, t) + (1 − ψ) h'(x, u, t) ≤ 0,

where 0 < ψ < 1 is a constant close to 1. It is obvious that if h(x, t_1) = 0 then h'(x, u, t_1) ≤ 0, so that h(x, t) ≤ 0 for t ∈ [t_1, t_2], t_2 > t_1. On the other hand, when h(x, t_1) < 0, the inequality implies h(x, t) < 0 for t ∈ [t_1, t_2], t_2 > t_1, due to the dominance of the first term over the second.

2.3 Smoothing Technique

Note that the power penalty term g^{1/p}(x, u, t) is not Lipschitz continuous at zero; therefore, standard optimization routines would have difficulty coping with it. To address this problem, a technique to smooth the sharp corner of

the function g^{1/p} is applied. With some modifications of the smoothing technique used in [54] and [55], the nonsmooth function g^{1/p} can be approximated by a smooth one, g_ρ^{1/p}. Consider the function g^{1/p} : R^{n+q+1} → R,

\[
g^{1/p} =
\begin{cases}
0 & \text{if } g \le 0; \\
g^{1/p} & \text{if } 0 < g < 1; \\
g^{p} & \text{if } g \ge 1,
\end{cases}
\]

where p > 1. To smooth the function g^{1/p} at g = 0, we define g_ρ^{1/p} : R^{n+q+1} → R as follows:

\[
g_\rho^{1/p}(x, u, t) =
\begin{cases}
0 & \text{if } g < -\rho; \\
\left(\tfrac{3}{4} - \tfrac{1}{2p}\right)\rho^{(1/p - 2)}(g + \rho)^2 + \left(\tfrac{1}{4p} - \tfrac{1}{4}\right)\rho^{(1/p - 3)}(g + \rho)^3 & \text{if } -\rho \le g \le \rho; \\
g^{1/p} & \text{if } \rho < g < 1; \\
g^{p} & \text{if } g \ge 1.
\end{cases}
\]

It is clear that g_ρ^{1/p} ≥ 0 and that, as ρ → 0, g_ρ^{1/p} → g^{1/p} as defined above. In the case p = 1, the function g_ρ^{1/p} reduces to the smoothed linear penalty used in [54] and [55].

Remark 2.3.1. There is no need for the smoothing function to have a continuous derivative at g = 1, as we are only interested in the convergence behaviour at g = 0. Using g^p for g ≥ 1, instead of g^{1/p}, gives faster movement of the iterations towards feasibility.

Remark 2.3.2. Another successful smoothing technique for g^{1/p} is given in [58]:

\[
g_\rho^{1/p}(x, u, t) =
\begin{cases}
0 & \text{if } g < 0; \\
\left(n - \tfrac{1}{p}\right)\rho^{(-n + 1/p + 1)}\, g^{\,n-1} - \left(n - \tfrac{1}{p} - 1\right)\rho^{(-n + 1/p)}\, g^{\,n} & \text{if } 0 \le g \le \rho; \\
g^{1/p} & \text{if } \rho < g < 1; \\
g^{p} & \text{if } g \ge 1,
\end{cases}
\]

where n ≥ 3 represents the degree of the polynomial used in the approximation.
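A direct implementation of the two penalty functions above makes the piecewise structure explicit; the following sketch is illustrative only, with scalar arguments assumed.

```python
def g_pow(g, p):
    """Exact power penalty (2.2.1): zero on the feasible side, g^(1/p) for
    small violations, g^p for violations of size >= 1."""
    if g <= 0.0:
        return 0.0
    return g ** (1.0 / p) if g < 1.0 else g ** p

def g_pow_smooth(g, p, rho):
    """Smoothed penalty g_rho^(1/p): a cubic in (g + rho) on [-rho, rho],
    chosen so that the value and slope vanish at g = -rho and match
    g^(1/p) at g = rho."""
    if g < -rho:
        return 0.0
    if g <= rho:
        a = (0.75 - 1.0 / (2.0 * p)) * rho ** (1.0 / p - 2.0)
        b = (1.0 / (4.0 * p) - 0.25) * rho ** (1.0 / p - 3.0)
        return a * (g + rho) ** 2 + b * (g + rho) ** 3
    return g_pow(g, p)
```

For p = 1 the cubic coefficient vanishes and the middle branch reduces to (g + ρ)²/(4ρ), the familiar smoothing of the linear penalty.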

2.4 Numerical Examples

In this section some optimal control problems are solved to verify the theoretical findings of the previous sections. They illustrate the effectiveness of the proposed power penalty methods. For simplicity, the optimal control program MISER 3.3, based on the concept of control parameterization, is used; details of the control parameterization method are given in [17], [24] and [53]. However, to suit the needs of our power penalty method, the functions OCRHOSM and OCDRHOSM in MISER 3.3 were modified using the smoothing technique described in the previous section.

Example 2.4.1. This simple example is taken from [62]. The problem is to minimize

\[
\min \left\{ \int_0^1 (x^2 + u^2 - 2u)\,dt + \tfrac{1}{2}(x(1))^2 \right\}
\]
subject to
\[
\dot{x} = u, \qquad x(0) = 0, \qquad g(x, u, t) = x^2 + u^2 - t^2 - 1 \le 0.
\]

The analytic optimal solution for this problem is x*(t) = t and u*(t) = 1, so that the constraint is active for all t ∈ [0, 1]. Additionally, we have λ(t) = 2 − e^{(t² − 1)/2} and μ(t) = ½ e^{(t² − 1)/2} − 1, so that

\[
H(x, u, t) = (x^2 + u^2 - 2u) + \left(2 - e^{(t^2 - 1)/2}\right)u + \left(\tfrac{1}{2} e^{(t^2 - 1)/2} - 1\right)(x^2 + u^2 - t^2 - 1).
\]

It is easy to show that φ_xx is positive definite and that H_uu and H_xx − H_xu H_uu^{-1} H_ux are also positive definite on [0, 1]. Therefore all the sufficient conditions of Theorem 2.2.1 are satisfied, and if r(t) > μ(t) for all t ∈ [0, 1] then the power penalty method will converge to the analytic optimal solution.

The results of some computations are summarized in Table 2.1. The first, second and third columns are the values of the smoothing factor, the power parameter and the penalty constant, respectively. The fourth and fifth columns give the value of the cost functional and the number of iterations, and the last one indicates whether the solution violates the constraints.

ρ | p | r | J | N_it | Constraint violated?
… | … | … | … | … | No
… | … | … | … | … | No
… | … | … | … | … | No

Table 2.1: Results for Example 2.4.1

To visualize the solution of the problem, we plot in Figures 2.1 and 2.2 the optimal solution for p = 2, r = 1.
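As a quick sanity check of the analytic solution, independent of MISER, one can evaluate the objective and the constraint along the candidate x*(t) = t, u*(t) = 1 numerically; the constraint should be identically zero (active) and the printed objective should match the optimal value reported in [62]. A minimal sketch, using the objective as reconstructed above:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 1001)
x, u = t, np.ones_like(t)                      # candidate x*(t) = t, u*(t) = 1
J = np.trapz(x**2 + u**2 - 2*u, t) + 0.5 * x[-1]**2
g = x**2 + u**2 - t**2 - 1                     # constraint residual
print("J =", J, " max|g| =", np.max(np.abs(g)))
```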

The tables and figures for p = 2 and r = 1 that follow show that the results almost coincide with the analytic solution given in [62], i.e. u(t) = 1 and x(t) = t for all t ∈ [0, 1], with the corresponding optimal value of J.

t | u | x
… | … | …

Table 2.2: Optimal solution for Example 2.4.1

Error | u | x
… | … | …

Table 2.3: Computed error for u and x in the maximum and L^2 norm

Example 2.4.2. The following problem is from [54]:

\[
\min_u J(u) = \int_0^1 \left( x_1^2 + x_2^2 + 0.005\, u^2 \right) dt
\]
subject to the dynamics
\[
\dot{x}_1 = x_2, \qquad x_1(0) = 0,
\]
\[
\dot{x}_2 = -x_2 + u, \qquad x_2(0) = -1,
\]
and subject to the all-time state inequality constraint
\[
h(t, x) = x_2 - 8(t - 0.5)^2 + 0.5 \le 0 \quad \text{for all } t \in [0, 1].
\]

Figure 2.1: Optimal control for Example 2.4.1 (optimal control u against time t)

Figure 2.2: Optimal state for Example 2.4.1 (state x against time t)

Because this constraint is a purely state constraint, we first need to transform it according to Remark 2.2.3. In this example we choose ψ = 0.9, so the new constraint becomes

g(t, u, x) = 0.9 h(t, x) + 0.1 h'(t, u, x) ≤ 0.

The results of some computations are displayed in Table 2.4. The best result from the numerical experiments differs only slightly from the best result in [54] for the same level of control parameterization (10 equally spaced piecewise-constant controls). In [54] the final value of the smoothing parameter ρ is 10^{-4}, hence the better result there. The optimal solution with p = 3, r = 1 is drawn in Figures 2.3, 2.4 and 2.5. Moreover, Figure 2.5 confirms that the optimal solution satisfies the constraint: x_2 stays below the parabola 8(t − 0.5)² − 0.5 for all time t.

ρ | p | r | J | N_it | Constraint violated?
… | … | … | … | … | No
… | … | … | … | … | No
… | … | … | … | … | No
… | … | … | … | … | No

Table 2.4: Result for Example 2.4.2

t | u | x_1 | x_2
… | … | … | …

Table 2.5: Result for Example 2.4.2

Figure 2.3: Optimal control for Example 2.4.2 (optimal control u against time t)

Figure 2.4: Optimal state x_1 for Example 2.4.2 (state x_1 against time t)

Figure 2.5: Optimal state x_2 for Example 2.4.2 (state x_2 against time t)

2.5 Conclusion

This chapter proposes a nonlinear penalty method for solving continuous state-constrained optimal control problems. In addition, a convergence analysis is derived and some numerical examples are used to confirm the theoretical result. The results show that this approach works very well.

Chapter 3

Sensitivity Analysis

3.1 Introduction

Another important topic in optimal control is sensitivity analysis. In many situations one is interested not only in the optimal solution but also in how the solution depends on the data. In particular, one needs to determine how specific variations in the data will influence the value of the optimal performance criterion and the optimal solution previously attained. Answering such questions constitutes what is known as sensitivity analysis. In the literature there are many methods for sensitivity analysis in optimal control, such as [11] and [33], to name a few. However, to the author's knowledge, few articles present sensitivity analysis from the penalty-method point of view, in particular using a power penalty method. Fiacco [13] used a linear penalty method to estimate sensitivity information in the nonlinear programming setting, but not for optimal control problems. For this reason, we investigate this topic here.

3.2 Sensitivity Analysis Using Power Penalty Method

This section deals with sensitivity analysis for state-constrained optimal control problems. We use the power penalty method to determine the effect of a small perturbation of the constraints on the performance criterion. First, we formulate the problem as follows:

Problem 3.2.1.

\[
\min_{u(t) \in \Omega} J(u, \theta) = \int_0^{t_f} L(x(t), u(t), t)\,dt + \phi(x(t_f))
\]

subject to
\[
\dot{x} = f(x(t), u(t), t) \quad \text{for all } t \in (0, t_f], \qquad x(0) = x_0,
\]
where
\[
\Omega = \{u(t) \in R^q : g(x, u, t, \theta) \equiv h(x, u, t) - \theta \le 0\} \quad \text{for all } t \in (0, t_f],
\]

t_f > 0 is a constant, θ ∈ R^m is a vector of small constants representing small perturbations in the constraints, x ∈ R^n, x_0 ∈ R^n is a given point, and L : R^{n+q+1} → R, φ : R^n → R, f : R^{n+q+1} → R^n and g : R^{n+q+1+m} → R^m are known functions.

Remark 3.2.1. It is clear that if θ = 0 then Problem 3.2.1 corresponds to the constrained optimal control problem without perturbation, as discussed in the previous Chapter.

As in the previous Chapter, Problem 3.2.1 can be converted to the following equivalent penalty problem:

Problem 3.2.2.

\[
\min_{u(t) \in R^q} P(u, r, \theta) = J(u) + \int_0^{t_f} r^T(t)\, g^{1/p}(x(t), u(t), t, \theta)\,dt
\]
subject to
\[
\dot{x} = f(x(t), u(t), t) \quad \text{for all } t \in (0, t_f], \qquad x(0) = x_0,
\]

where p > 1 is a constant and the vectors r = (r_1(t), r_2(t), ..., r_m(t))^T and g^{1/p} = (g_1^{1/p}, g_2^{1/p}, ..., g_m^{1/p})^T satisfy r_i(t) > 0 and

\[
g_i^{1/p} =
\begin{cases}
0 & \text{if } g_i \le 0; \\
g_i^{1/p} & \text{if } 0 < g_i < 1; \\
g_i^{p} & \text{if } g_i \ge 1,
\end{cases}
\qquad (3.2.1)
\]

for i = 1, 2, ..., m.

The next theorem states formulas which determine the size of the change in the performance criterion caused by small perturbations in the constraints. The assumptions of Assumption 2.2.1 are also necessary for this Theorem. However, the

equation (3.2.1) first needs to be replaced by the smooth version from Chapter 2, Remark 2.3.2:

\[
g_\rho^{1/p}(x, u, t) =
\begin{cases}
0 & \text{if } g < 0; \\
\left(n - \tfrac{1}{p}\right)\rho^{(-n + 1/p + 1)}\, g^{\,n-1} - \left(n - \tfrac{1}{p} - 1\right)\rho^{(-n + 1/p)}\, g^{\,n} & \text{if } 0 \le g \le \rho; \\
g^{1/p} & \text{if } \rho < g < 1; \\
g^{p} & \text{if } g \ge 1,
\end{cases}
\]

where n represents the degree of the polynomial used in the approximation.

Theorem 3.2.1. Let (x, u) and (x + δx, u + δu) be the unique optimal solutions of Problem 3.2.2 with θ = 0 and θ ≠ 0 respectively, and let ε = max{‖δx‖, ‖δu‖}. Then the change in the performance criterion caused by small perturbations in the constraints is

\[
\Delta P = \int_0^{t_f} \left( H(x + \delta x, u + \delta u, t, \lambda, r, \theta) - H(x + \delta x, u, t, \lambda, r, \theta) \right) dt + O(\epsilon^2) \qquad (3.2.2)
\]

or, alternatively,

\[
\Delta P = \int_0^{t_f} \left( H(x + \delta x, u + \delta u, t, \lambda, r, \theta) - H(x + \delta x, u, t, \lambda, r, 0) \right) dt + O(\epsilon^2), \qquad (3.2.3)
\]

where H is defined as

\[
H(x, u, t, \lambda, r, \theta) = L(x, u, t) + \lambda^T f(x, u, t) + r^T g_\rho^{1/p}(x, u, t, \theta).
\]

PROOF. We evaluate ΔP = ΔL + Δφ + Δg_ρ^{1/p}, where

\[
\Delta L = \int_0^{t_f} \left[ L(x + \delta x, u + \delta u, t) - L(x, u, t) \right] dt,
\]
\[
\Delta \phi = \phi(x + \delta x)(t_f) - \phi(x)(t_f),
\]
\[
\Delta g_\rho^{1/p} = \sum_{i=1}^{m} \int_0^{t_f} r_i \left( g_{\rho,i}^{1/p}(x + \delta x, u + \delta u, t, \theta_i) - g_{\rho,i}^{1/p}(x, u, t, 0) \right) dt.
\]

First, ΔL can be rewritten as

\[
\Delta L = \int_0^{t_f} \left[ L(x + \delta x, u + \delta u, t) - L(x + \delta x, u, t) \right] dt + \int_0^{t_f} \sum_{i=1}^{n} \frac{\partial L}{\partial x_i}(x, u, t)\, \delta x_i \, dt + O(\epsilon^2).
\]

The term involving Σᵢ (∂L/∂xᵢ) δxᵢ will now be eliminated. Using the result from Pontryagin's Minimum Principle, we have

\[
\dot{\lambda}_i = -\frac{\partial L}{\partial x_i} - \sum_{j=1}^{n} \lambda_j \frac{\partial f_j}{\partial x_i} - \sum_{l=1}^{m} r_l \frac{\partial g_{\rho,l}^{1/p}}{\partial x_i}.
\]

From d/dt(Σᵢ λᵢ δxᵢ) = Σᵢ (λ̇ᵢ δxᵢ + λᵢ δẋᵢ) we obtain

\[
\frac{d}{dt}\left( \sum_{i=1}^{n} \lambda_i \delta x_i \right) = \sum_{i=1}^{n} \left( -\frac{\partial L}{\partial x_i} \delta x_i - \sum_{j=1}^{n} \lambda_j \frac{\partial f_j}{\partial x_i} \delta x_i - \sum_{l=1}^{m} r_l \frac{\partial g_{\rho,l}^{1/p}}{\partial x_i} \delta x_i + \lambda_i \delta \dot{x}_i \right) + O(\epsilon^2).
\]

On the other hand,

\[
\delta \dot{x}_i = \delta f_i = f_i(x + \delta x, u + \delta u, t) - f_i(x, u, t)
= f_i(x + \delta x, u + \delta u, t) - f_i(x + \delta x, u, t) + \sum_{j=1}^{n} \frac{\partial f_i}{\partial x_j} \delta x_j + O(\epsilon^2).
\]

Therefore,

\[
\sum_{i=1}^{n} \frac{\partial L}{\partial x_i} \delta x_i = -\frac{d}{dt} \sum_{i=1}^{n} \lambda_i \delta x_i + \sum_{i=1}^{n} \lambda_i \left( f_i(x + \delta x, u + \delta u, t) - f_i(x + \delta x, u, t) \right) - \sum_{i=1}^{n} \sum_{l=1}^{m} r_l \frac{\partial g_{\rho,l}^{1/p}}{\partial x_i} \delta x_i + O(\epsilon^2).
\]

Consequently, we can write

\[
\Delta L = \int_0^{t_f} \left[ L(x + \delta x, u + \delta u, t) - L(x + \delta x, u, t) - \frac{d}{dt} \sum_{i=1}^{n} \lambda_i \delta x_i \right.
\]
\[
\left. \quad + \sum_{i=1}^{n} \lambda_i \left( f_i(x + \delta x, u + \delta u, t) - f_i(x + \delta x, u, t) \right) - \sum_{l=1}^{m} r_l \sum_{i=1}^{n} \frac{\partial g_{\rho,l}^{1/p}}{\partial x_i} \delta x_i \right] dt + O(\epsilon^2). \qquad (3.2.4)
\]

Now we evaluate Δφ as follows:

\[
\Delta \phi = \left( \phi(x + \delta x) - \phi(x) \right)(t_f) = \sum_{i=1}^{n} \frac{\partial \phi}{\partial x_i}(x_i(t_f))\, \delta x_i(t_f) + O(\epsilon^2). \qquad (3.2.5)
\]

Next, we also need to determine Δg_ρ^{1/p}. There are two possible expressions for it.

1. We expand Δg_ρ^{1/p} by introducing g_{ρ,l}^{1/p}(x + δx, u, t, θ_l):

\[
\Delta g_\rho^{1/p} = \sum_{l=1}^{m} \int_0^{t_f} r_l \left( g_{\rho,l}^{1/p}(x + \delta x, u + \delta u, t, \theta_l) - g_{\rho,l}^{1/p}(x + \delta x, u, t, \theta_l) \right) dt
\]
\[
\quad + \sum_{l=1}^{m} \int_0^{t_f} r_l \left( g_{\rho,l}^{1/p}(x + \delta x, u, t, \theta_l) - g_{\rho,l}^{1/p}(x, u, t, 0) \right) dt
\]
\[
= \sum_{l=1}^{m} \int_0^{t_f} r_l \left( g_{\rho,l}^{1/p}(x + \delta x, u + \delta u, t, \theta_l) - g_{\rho,l}^{1/p}(x + \delta x, u, t, \theta_l) \right) dt
\]
\[
\quad + \sum_{l=1}^{m} \int_0^{t_f} r_l \left( \sum_{i=1}^{n} \frac{\partial g_{\rho,l}^{1/p}}{\partial x_i} \delta x_i + \frac{\partial g_{\rho,l}^{1/p}}{\partial \theta_l} \delta \theta_l \right) dt + O(\epsilon^2). \qquad (3.2.6)
\]

2. If, instead of g_{ρ,l}^{1/p}(x + δx, u, t, θ_l), we insert g_{ρ,l}^{1/p}(x + δx, u, t, 0), then Δg_ρ^{1/p} has the expression

\[
\Delta g_\rho^{1/p} = \sum_{l=1}^{m} \int_0^{t_f} r_l \left( g_{\rho,l}^{1/p}(x + \delta x, u + \delta u, t, \theta_l) - g_{\rho,l}^{1/p}(x + \delta x, u, t, 0) \right) dt
\]
\[
\quad + \sum_{l=1}^{m} \int_0^{t_f} r_l \sum_{i=1}^{n} \frac{\partial g_{\rho,l}^{1/p}}{\partial x_i} \delta x_i \, dt + O(\epsilon^2). \qquad (3.2.7)
\]

Summing (3.2.4), (3.2.5) and (3.2.6) gives

\[
\Delta P = \int_0^{t_f} \left[ L(x + \delta x, u + \delta u, t) + \sum_{i=1}^{n} \lambda_i f_i(x + \delta x, u + \delta u, t) + \sum_{l=1}^{m} r_l\, g_{\rho,l}^{1/p}(x + \delta x, u + \delta u, t, \theta_l) \right] dt
\]
\[
\quad - \int_0^{t_f} \left[ L(x + \delta x, u, t) + \sum_{i=1}^{n} \lambda_i f_i(x + \delta x, u, t) + \sum_{l=1}^{m} r_l\, g_{\rho,l}^{1/p}(x + \delta x, u, t, \theta_l) \right] dt
\]
\[
\quad + \sum_{l=1}^{m} \int_0^{t_f} r_l \frac{\partial g_{\rho,l}^{1/p}}{\partial \theta_l} \delta \theta_l \, dt - \left[ \sum_{i=1}^{n} \lambda_i \delta x_i \right]_0^{t_f} + \sum_{i=1}^{n} \frac{\partial \phi}{\partial x_i}(x_i(t_f))\, \delta x_i(t_f) + O(\epsilon^2).
\]

Since λ_i(t_f) = ∂φ/∂x_i (t_f) and δx_i(0) = 0, and since, if (x, u) is the optimal solution, g(x, u, t, θ) < 0 implies

\[
\frac{\partial g_{\rho,l}^{1/p}}{\partial \theta_l} \delta \theta_l = \frac{\partial g_{l\rho}^{1/p}}{\partial g_l} \frac{\partial g_l}{\partial \theta_l} \delta \theta_l = 0,
\]

we conclude that

\[
\Delta P = \int_0^{t_f} \left( H(x + \delta x, u + \delta u, t, \theta) - H(x + \delta x, u, t, \theta) \right) dt + O(\epsilon^2),
\]

where

\[
H(x, u, t, \theta) = L(x, u, t) + \sum_{i=1}^{n} \lambda_i f_i(x, u, t) + \sum_{l=1}^{m} r_l\, g_{\rho,l}^{1/p}(x, u, t, \theta_l).
\]

Analogously, if we add (3.2.4), (3.2.5) and (3.2.7), then

\[
\Delta P = \int_0^{t_f} \left( H(x + \delta x, u + \delta u, t, \theta) - H(x + \delta x, u, t, 0) \right) dt + O(\epsilon^2).
\]

Remark 3.2.2. In the theorem, a smoothing function different from that of Chapter 2 is applied in order to ensure that if g(x, u, t, θ) < 0 then ∂g_{lρ}^{1/p}/∂g_l = 0, and hence

\[
\frac{\partial g_{\rho,l}^{1/p}}{\partial \theta_l} \delta \theta_l = \frac{\partial g_{l\rho}^{1/p}}{\partial g_l} \frac{\partial g_l}{\partial \theta_l} \delta \theta_l = 0.
\]

Now we present a method to determine δx and δu.

Theorem 3.2.2. Let (x, u) be the optimal solution of Problem 3.2.2 with θ = 0. If H_uu > 0, then (x + δx, u + δu) is the approximate optimal solution of Problem 3.2.2 with θ ≠ 0, where δx is the solution of

\[
\begin{bmatrix} \delta \dot{x} \\ \delta \dot{\lambda} \end{bmatrix}
=
\begin{bmatrix} \alpha & \beta \\ \gamma & -\alpha^T \end{bmatrix}
\begin{bmatrix} \delta x \\ \delta \lambda \end{bmatrix}
+
\begin{bmatrix} R_x \\ R_\lambda \end{bmatrix}
\]

with

\[
\alpha = f_x - [f_u \ \ 0]\, C_o \begin{bmatrix} H_{ux} \\ g_x \end{bmatrix}, \qquad
\beta = -[f_u \ \ 0]\, C_o \begin{bmatrix} f_u^T \\ 0 \end{bmatrix}, \qquad
\gamma = -H_{xx} + [H_{xu} \ \ g_x^T]\, C_o \begin{bmatrix} H_{ux} \\ g_x \end{bmatrix},
\]
\[
R_x = [f_u \ \ 0]\, C_o \begin{bmatrix} 0 \\ I \end{bmatrix} \theta, \qquad
R_\lambda = -[H_{xu} \ \ g_x^T]\, C_o \begin{bmatrix} 0 \\ I \end{bmatrix} \theta,
\]
\[
C_o =
\begin{cases}
\begin{bmatrix} H_{uu}^{-1} & 0 \\ 0 & 0 \end{bmatrix} & \text{if the constraint } g \text{ is inactive at } t; \\[2ex]
\begin{bmatrix} H_{uu} & g_u^T \\ g_u & 0 \end{bmatrix}^{-1} & \text{if the constraint } g \text{ is active at } t,
\end{cases}
\]

and δu is determined by

\[
\begin{bmatrix} \delta u \\ \delta r \end{bmatrix}
= -C_o \left( \begin{bmatrix} H_{ux} & f_u^T \\ g_x & 0 \end{bmatrix} \begin{bmatrix} \delta x \\ \delta \lambda \end{bmatrix} - \begin{bmatrix} 0 \\ I \end{bmatrix} \theta \right).
\]

PROOF. Pontryagin's necessary conditions for optimality state that if (x, u) is an optimal solution then:

1. H_u(x, u, t) = 0. By the variational process, the variation of this equation leads to H_uu δu + H_ux δx + H_uλ δλ + H_ur δr = 0. Because H_uλ = f_u^T and H_ur = g_u^T,

\[
H_{uu} \delta u + H_{ux} \delta x + f_u^T \delta \lambda + g_u^T \delta r = 0. \qquad (3.2.8)
\]

2. λ̇^T(t) = −H_x(x, u, t). Taking variations of this equation, we have δλ̇ = −H_xx δx − H_xu δu − H_xλ δλ − H_xr δr. As H_xλ = f_x^T and H_xr = g_x^T,

\[
\delta \dot{\lambda} = -H_{xx} \delta x - H_{xu} \delta u - f_x^T \delta \lambda - g_x^T \delta r. \qquad (3.2.9)
\]

3. λ^T(t_f) = φ_x(x(t_f)). This terminal condition gives a terminal condition for δλ:

\[
\delta \lambda(t_f) = \phi_{xx}(x(t_f))\, \delta x(t_f). \qquad (3.2.10)
\]

In addition, from the dynamical system ẋ = f(x, u, t) we derive the differential equation for δx,

\[
\delta \dot{x} = f_x \delta x + f_u \delta u, \qquad (3.2.11)
\]

with initial condition δx(0) = 0. Equations (3.2.11) and (3.2.9) can be written as

\[
\begin{bmatrix} \delta \dot{x} \\ \delta \dot{\lambda} \end{bmatrix}
=
\begin{bmatrix} f_x & 0 \\ -H_{xx} & -f_x^T \end{bmatrix}
\begin{bmatrix} \delta x \\ \delta \lambda \end{bmatrix}
+
\begin{bmatrix} f_u & 0 \\ -H_{xu} & -g_x^T \end{bmatrix}
\begin{bmatrix} \delta u \\ \delta r \end{bmatrix}. \qquad (3.2.12)
\]

If the constraint is inactive (g < 0) at t, then g^{1/p} = 0, so r^T g^{1/p} = 0 regardless of the value of r. Therefore we can conclude that

\[
\delta r = 0. \qquad (3.2.13)
\]

Then equation (3.2.8) becomes

\[
\delta u = -H_{uu}^{-1}\left( H_{ux} \delta x + f_u^T \delta \lambda \right). \qquad (3.2.14)
\]

Combining (3.2.13) and (3.2.14), we get

\[
\begin{bmatrix} \delta u \\ \delta r \end{bmatrix}
= -\begin{bmatrix} H_{uu}^{-1} & 0 \\ 0 & 0 \end{bmatrix}
\left( \begin{bmatrix} H_{ux} & f_u^T \\ g_x & 0 \end{bmatrix} \begin{bmatrix} \delta x \\ \delta \lambda \end{bmatrix} - \begin{bmatrix} 0 \\ I \end{bmatrix} \theta \right). \qquad (3.2.15)
\]

In the case that the constraint is active (g = 0) at t,

\[
g_u \delta u = -g_x \delta x + \theta. \qquad (3.2.16)
\]

Equations (3.2.8) and (3.2.16) can be coupled together as

\[
\begin{bmatrix} \delta u \\ \delta r \end{bmatrix}
= -\begin{bmatrix} H_{uu} & g_u^T \\ g_u & 0 \end{bmatrix}^{-1}
\left( \begin{bmatrix} H_{ux} & f_u^T \\ g_x & 0 \end{bmatrix} \begin{bmatrix} \delta x \\ \delta \lambda \end{bmatrix} - \begin{bmatrix} 0 \\ I \end{bmatrix} \theta \right). \qquad (3.2.17)
\]

In compact form, equations (3.2.15) and (3.2.17) can be rewritten as

\[
\begin{bmatrix} \delta u \\ \delta r \end{bmatrix}
= -C_o \left( \begin{bmatrix} H_{ux} & f_u^T \\ g_x & 0 \end{bmatrix} \begin{bmatrix} \delta x \\ \delta \lambda \end{bmatrix} - \begin{bmatrix} 0 \\ I \end{bmatrix} \theta \right), \qquad (3.2.18)
\]

where

\[
C_o =
\begin{cases}
\begin{bmatrix} H_{uu}^{-1} & 0 \\ 0 & 0 \end{bmatrix} & \text{if the constraint } g \text{ is inactive at } t; \\[2ex]
\begin{bmatrix} H_{uu} & g_u^T \\ g_u & 0 \end{bmatrix}^{-1} & \text{if the constraint } g \text{ is active at } t.
\end{cases}
\]

The existence of the matrix C_o is guaranteed by the positive definiteness of H_uu. Substituting (3.2.18) into (3.2.12) and rearranging, we have

\[
\begin{bmatrix} \delta \dot{x} \\ \delta \dot{\lambda} \end{bmatrix}
=
\begin{bmatrix} \alpha & \beta \\ \gamma & -\alpha^T \end{bmatrix}
\begin{bmatrix} \delta x \\ \delta \lambda \end{bmatrix}
+
\begin{bmatrix} R_x \\ R_\lambda \end{bmatrix}, \qquad (3.2.19)
\]

where α, β, γ, R_x and R_λ are as defined in the statement of the theorem.

Remark 3.2.3. The discrete version of Theorem 3.2.2 can be found in [18].

The Sweep method [6], usually used for solving unconstrained linear-quadratic control problems, needs to be adapted for solving equation (3.2.19). Assume

\[
\delta \lambda(t) = Z(t)\, \delta x(t) + Q(t); \qquad (3.2.20)
\]

then

\[
\delta \lambda(t_f) = Z(t_f)\, \delta x(t_f) + Q(t_f).
\]
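Anticipating the Riccati-type equations (3.2.25) and (3.2.26) derived below, the sweep reduces to two numerical integrations: Z and Q backward from t_f, then δx forward from 0. The following scalar sketch uses SciPy with hypothetical constant coefficient functions purely for illustration; in practice the coefficients come from the Hamiltonian blocks along the unperturbed optimal trajectory.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical scalar coefficients of (3.2.19)
alpha = lambda t: 0.0
beta  = lambda t: -1.0
gamma = lambda t: 1.0
Rx    = lambda t: 0.1
Rl    = lambda t: 0.0
tf, phi_xx = 1.0, 0.5

def backward(t, y):                     # y = (Z, Q)
    Z, Q = y
    dZ = -Z*beta(t)*Z - Z*alpha(t) - alpha(t)*Z + gamma(t)      # (3.2.25)
    dQ = -(Z*beta(t) + alpha(t))*Q + (Rl(t) - Z*Rx(t))          # (3.2.26)
    return [dZ, dQ]

bwd = solve_ivp(backward, (tf, 0.0), [phi_xx, 0.0], dense_output=True,
                rtol=1e-9)              # terminal data (3.2.21)-(3.2.22)

def forward(t, y):                      # delta-x equation (3.2.23)
    Z, Q = bwd.sol(t)
    return (alpha(t) + beta(t)*Z) * y + (beta(t)*Q + Rx(t))

dx = solve_ivp(forward, (0.0, tf), [0.0], dense_output=True, rtol=1e-9)
# delta_lambda(t) then follows from (3.2.20): Z(t)*dx(t) + Q(t)
```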

From equations (3.2.20) and (3.2.10) we may conclude that

\[
Z(t_f) = \phi_{xx}(t_f) \qquad (3.2.21)
\]
and
\[
Q(t_f) = 0. \qquad (3.2.22)
\]

To determine δx, substituting δλ from (3.2.20) into the first row of (3.2.19) yields

\[
\delta \dot{x} = (\alpha + \beta Z)\, \delta x + (\beta Q + R_x) \qquad (3.2.23)
\]

with initial condition δx(0) = 0. Integrating this initial-value problem forward, we obtain δx.

Next, we need to establish δλ. Substituting (3.2.20) and its derivative into the second row of (3.2.19), we get

\[
\dot{Z} \delta x + Z \delta \dot{x} + \dot{Q} = \gamma\, \delta x - \alpha^T (Z \delta x + Q) + R_\lambda. \qquad (3.2.24)
\]

Eliminating δẋ by means of (3.2.23), equation (3.2.24) requires that

\[
\dot{Z} = -Z \beta Z - Z \alpha - \alpha^T Z + \gamma, \qquad (3.2.25)
\]
\[
\dot{Q} = -(Z \beta + \alpha^T)\, Q + (R_\lambda - Z R_x). \qquad (3.2.26)
\]

Integrating equations (3.2.25) and (3.2.26) backward with the terminal conditions (3.2.21) and (3.2.22), we obtain Z and Q, and hence δλ.

Remark 3.2.4. Another possible method for solving the equations is to adapt the transition matrix approach explained in [5].

3.3 Numerical Examples

In this section we modify a numerical example from Chapter 2 and show how to apply Theorems 3.2.1 and 3.2.2. The computation is done using MATLAB R2010a and, to illustrate the effectiveness of the method, we compare the computational results with the results generated by MISER 3.3.

Example 3.3.1. The constraint of Example 2.4.1 is perturbed by θ = 0.1. The problem is then

\[
\min \left\{ \int_0^1 (x^2 + u^2 - 2u)\,dt + \tfrac{1}{2}(x(1))^2 \right\}
\]
subject to
\[
\dot{x} = u, \qquad x(0) = 0.
\]

For θ = 0.1 the constraint becomes

\[
g(x, u, t, \theta) \equiv (x^2 + u^2 - t^2 - 1) - 0.1 \le 0.
\]

Since any (x, u, t) satisfying g(x, u, t, 0) ≤ 0 also satisfies g(x, u, t, θ) ≤ 0, the feasible region of g(x, u, t, 0.1) is a superset of the feasible region of g(x, u, t, 0). For that reason, to evaluate ΔP we use (3.2.2), i.e.

\[
\Delta P = \int_0^{t_f} \left( H(x + \delta x, u + \delta u, t, \lambda, r, \theta) - H(x + \delta x, u, t, \lambda, r, \theta) \right) dt + O(\epsilon^2),
\]

where δx and δu are obtained from Theorem 3.2.2. The computed value of ΔP is close to the corresponding result from MISER 3.3. Figures 3.1, 3.2 and 3.3 compare the optimal solution from our computation with that from MISER 3.3.

Example 3.3.2. We solve the same problem as before but with perturbation θ = −0.1. For θ = −0.1 the constraint becomes

\[
g(x, u, t, \theta) \equiv (x^2 + u^2 - t^2 - 1) + 0.1 \le 0.
\]

Since the feasible region of g(x, u, t, 0) contains the feasible region of g(x, u, t, −0.1), to evaluate ΔP we use (3.2.3), i.e.

\[
\Delta P = \int_0^{t_f} \left( H(x + \delta x, u + \delta u, t, \lambda, r, \theta) - H(x + \delta x, u, t, \lambda, r, 0) \right) dt + O(\epsilon^2),
\]

where δx and δu are obtained from Theorem 3.2.2. The computed value of ΔP is again close to the result from MISER 3.3. Figures 3.4, 3.5 and 3.6 compare the optimal solution from our computation with that from MISER 3.3.

3.4 Conclusion

This Chapter describes a method to determine directly the effect of small changes in the constraints on the objective function value and the optimal solution. Numerical examples are tested to show the effectiveness of the proposed method.

Figure 3.1: Optimal control comparison for Example 3.3.1 with n = 3, p = 2 and λ = … (optimal control from MISER 3.3 versus computed optimal control)

Figure 3.2: Optimal state comparison for Example 3.3.1 with n = 3, p = 2 and λ = … (optimal state from MISER 3.3 versus computed optimal state)

Figure 3.3: The difference between computed and MISER optimal state for Example 3.3.1 with n = 3, p = 2 and λ = …

Figure 3.4: Optimal control comparison for Example 3.3.2 with n = 3, p = 2 and λ = … (optimal control from MISER 3.3 versus computed optimal control)

Figure 3.5: Optimal state comparison for Example 3.3.2 with n = 3, p = 2 and λ = … (optimal state from MISER 3.3 versus computed optimal state)

Figure 3.6: The difference between computed and MISER optimal state for Example 3.3.2 with n = 3, p = 2 and λ = …

Chapter 4

Iterative Upwind Finite Difference Method

4.1 Introduction

In this chapter we present a numerical method for solving a constrained optimal feedback control problem. Unlike in the previous chapters, we formulate the problem in closed-loop rather than open-loop form. Consider the following optimal control problem:

\[
\min_{u(t) \in U} J(u) = \int_s^{t_f} L(x(t), u(t), t)\,dt + \phi(x(t_f))
\]
subject to
\[
\dot{x} = f(x(t), u(t), t) \quad \text{for all } t \in (s, t_f], \qquad x(s) = y,
\]

where x ∈ R^n and u ∈ U ⊆ R^m are the state and control, t_f > 0 is a constant, (s, y) ∈ [0, t_f) × R^n is a given point, and L : R^{n+m+1} → R, φ : R^n → R and f : R^{n+m+1} → R^n are known functions.

If s = 0 and y = x_0 is fixed, the formulation reduces to an open-loop optimal control problem. The solution of the open-loop problem constitutes an optimal control u* along the single optimal trajectory x* from the initial state x_0. Unfortunately, if perturbations in the state push it off the optimal trajectory, the corresponding optimal control solution is no longer available. The solution of the closed-loop problem, on the other hand, is robust (stable), because it is defined over a time-space region containing the optimal trajectory; hence the corresponding optimal control can still be found when disturbances are present in the state x.

For that purpose, we first need to translate the problem into a first order partial differential equation called the Hamilton-Jacobi-Bellman (HJB) equation. By defining the value function V(s, y) = inf_u J(s, y, u) and using the Dynamic Programming approach, the problem can be converted to the HJB equation

\[
V_t + \inf_{u} \left( \nabla V \cdot f(x, u, t) + L(x, u, t) \right) = 0 \qquad (4.1.1)
\]

with terminal condition V(t_f, x) = φ(x(t_f)). In this equation there are two unknowns: the value function V and the optimal control u. Generally, the solution of the HJB equation is continuous but nonsmooth. To deal with this nonsmooth solution, the concept of viscosity solution was introduced by P.L. Lions et al. in [7], [8] and [26]. In the presence of constraints, H.M. Soner in [49] broadened the definition of the viscosity solution to the constrained viscosity solution. More detail about viscosity solutions of the HJB equation can be found in [1] and [14].

Although the HJB equation has theoretically been solved, it is in general difficult to find the analytic solution; hence, in practice, numerical approximation is necessary. There are many numerical methods available in the literature for solving the unconstrained HJB equation, such as [1], [15], [2], [22], [57] and [59], to name but a few. Among these methods we are interested in the Upwind Finite Difference Method introduced in [59], due to its simplicity. For one-dimensional problems, the discretization of equation (4.1.1) is as follows:

\[
\frac{V_i^{k+1} - V_i^k}{\Delta t}
- \frac{1 + \mathrm{sign}\, f_i^k}{2}\, f_i^k\, \frac{V_{i+1}^k - V_i^k}{\Delta x}
- \frac{1 - \mathrm{sign}\, f_i^k}{2}\, f_i^k\, \frac{V_i^k - V_{i-1}^k}{\Delta x}
- L_i^k = 0,
\]
\[
u_i^{k+1} = \arg\inf_{u} \left( f(x_i, u, t_{k+1})\, \frac{V_i^{k+1} - V_{i-1}^{k+1}}{\Delta x} + L(x_i, u, t_{k+1}) \right),
\]

for k = 1, ..., N and i = k, ..., M − k, where M and N are the numbers of partitions of the spatial and time intervals respectively, sign f_i^k denotes the sign of f_i^k, and

\[
f_i^k = f(x_i, t_k, u_i^k), \quad L_i^k = L(x_i, t_k, u_i^k), \quad V_i^k \approx V(x_i, t_k), \quad u_i^k \approx u(x_i, t_k).
\]

This method is called the Upwind Finite-Difference method because it takes into account the upwind directions from which characteristic information propagates: if f_i^k > 0 the scheme switches to the forward-difference approximation, and to the backward-difference approximation for the opposite sign. However, there is a drawback, namely the trapezoidal propagation of the spatial domain with each time step. This propagation requires a large initial region, which leads to expensive computation due to the greater number of grid points to be computed. To address this problem, and to improve the speed and accuracy of the method, in this chapter we introduce an Iterative Upwind Finite-Difference Method inspired by Luus's work in [27].

4.2 Iterative Upwind Finite Difference Method

In this section we present an Iterative Upwind Finite Difference Method to compute the constrained viscosity solution of Hamilton-Jacobi-Bellman equations. The iterative part of the method doubles the number of discrete x-values in order to gain better accuracy, with adjustments from iteration to iteration designed to create efficiencies. In order to iterate without a trapezoidal propagation of the spatial domain in each time step, we impose artificial boundary conditions, explained later.

First, as in the previous chapter, we convert the constrained problem,

Problem 4.2.1.

\[
\min_{u(t) \in \Omega} J(u) = \int_s^{t_f} L(x(t), u(t), t)\,dt + \phi(x(t_f))
\]
subject to
\[
\dot{x} = f(x(t), u(t), t) \quad \text{for all } t \in (s, t_f], \qquad x(s) = y,
\]
where
\[
\Omega = \{u(t) \in R^q : g(x, u, t) \le 0\} \quad \text{for all } t \in (0, t_f],
\]

t_f > 0 is a constant, (s, y) ∈ [0, t_f) × R^n, y ∈ R^n is a given point, and L : R^{n+q+1} → R, φ : R^n → R, f : R^{n+q+1} → R^n and g : R^{n+q+1} → R^m are known functions,

into an unconstrained optimal control problem by incorporating linear penalty terms in the objective function:

50 Problem min u(t) R q P (u, r) = J(u) + t f s r T (t)g ρ (x(t), u(t), t)dt subject to ẋ = f(x(t), u(t), t) for all t (s, t f ], x(s) = y, where r = (r 1 (t), r 2 (t),..., r m (t)) T is a vector satisfying r i (t) > for i=1,2,...,m and g ρ is the smoothed version of the constraints g = (g 1, g 2,..., g m ) T defined in Section 2.3 of Chapter 2. The corresponding HJB equation of this unconstrained problem is as follows: V t + inf u ( V f(x, u, t) + L(x, u, t) + rg ρ(x, u, t)) = (4.2.2) with terminal condition V (t f, x) = φ(x(t f )). Remark The use of linear penalty instead of power penalty terms in the formulation is intended to satisfy the assumption that L + rg should be Lipschitz continuous as mentioned in Chapter 1. For the convergence analysis of the linear penalty on constrained viscosity solution of HJB equations, we refer to [1], [4] and [29] Discretization of HJB To simplify notation, let us consider an optimal control problem with one control u and one state variable x [a, b]. Extension to a multivariable optimal control problem can be easily done with some adjustments to notation. In addition, without loss of generality and for the intention of numerical tests later, we will set s =, y = x and t f = 1. We start with constructions of spatial discretization and time stages. We select a positive integer M and divide the space interval [a, b] into M equal partitions so that x = b a M. (4.2.3) Therefore, the discretization for space interval becomes where i = 1,..., M + 1. x i = a + (i 1) x 41

In order that this spatial discretization always contains the initial point, an appropriate shift might be necessary. Let j* = arg min_i |x_i − x_0| and make the adjustments

x_i ← x_i + (x_0 − x_{j*}), i = 1, …, M + 1,   (4.2.4)
a ← a + (x_0 − x_{j*}),   (4.2.5)
b ← b + (x_0 − x_{j*}).   (4.2.6)

Next, we impose limits on the control u(t), t ∈ [0, 1], specifically a lower bound u_l and an upper bound u_u. This constraint is usually determined by the physical limits of the system control values. Because the Upwind Finite-Difference Method is an explicit method, to determine the number of time stages N, each of length Δt, we need to take account of the stability condition reported in [59], i.e.

Δt ≤ Δx/|f|.   (4.2.7)

As in this case Δt = 1/N, the stability condition is satisfied by

N ≥ |f|/Δx.   (4.2.8)

Hence t_k = 1 − (k − 1)Δt, with k = 1, …, N + 1, is the backward partition of the time interval [0, 1]. This means that t_1 and t_{N+1} correspond to t = 1 and t = 0 respectively. With the notation V_i^k ≈ V(t_k, x_i) and u_i^k ≈ u(t_k, x_i) for the value function and control variable at point x_i and time t_k, we split equation (4.2.2) into two equations as in [59] and discretize them using a first order method for i = 1, …, M + 1 as follows:

(V_i^{k+1} − V_i^k)/Δt − ((1 + sign f_i^k)/2) f_i^k (V_{i+1}^k − V_i^k)/Δx + ((1 − sign f_i^k)/2) f_i^k (V_{i−1}^k − V_i^k)/Δx + L_i^k + r g_{ρ,i}^k = 0,

u_i^{k+1} = arg inf_u [ f(x_i, u, t_{k+1}) (V_i^{k+1} − V_{i−1}^{k+1})/Δx + L(x_i, u, t_{k+1}) + r g_ρ(x_i, u, t_{k+1}) ],

where

f_i^k = f(x_i, t_k, u_i^k), L_i^k = L(x_i, t_k, u_i^k), g_{ρ,i}^k = g_ρ(x_i, t_k, u_i^k).
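As a sketch, the set-up phase (4.2.3)-(4.2.8) might be coded as follows, where x0 holds the initial state and fmax an assumed bound on |f| over the region (both names are illustrative):

    M  = 64;                        % number of spatial partitions
    a  = -1;  b = 2;                % initial region
    dx = (b - a) / M;               % mesh width, (4.2.3)
    x  = a + (0:M) * dx;            % grid points x_1, ..., x_{M+1}
    % shift the grid so that it contains the initial point, (4.2.4)-(4.2.6)
    [~, jstar] = min(abs(x - x0));
    shift = x0 - x(jstar);
    x = x + shift;  a = a + shift;  b = b + shift;
    % number of time stages from the stability condition (4.2.8)
    N  = ceil(fmax / dx);
    dt = 1 / N;
    t  = 1 - (0:N) * dt;            % backward time partition, t(1) = 1, t(N+1) = 0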

Next, we set the initial value function according to

V_i^1 = φ(x_i), i = 1, …, M + 1,   (4.2.9)

and the initial control value, for i = 2, …, M + 1,

u_i^1 = arg min_u [ f(t_1, x_i, u) (V_i^1 − V_{i−1}^1)/Δx + L(t_1, x_i, u) + r g_ρ(t_1, x_i, u) ].   (4.2.10)

In order to avoid the effect of a trapezoidal propagation on the spatial domain at each time stage, as in [59], we set up artificial boundary conditions for the control and the value function based on linear extrapolation of the closest known points. Linear extrapolation to the boundaries is chosen because it is simple to apply in computation and the HJB equation is a first order PDE. Moreover, it gives the edge points the freedom to flip, following the line directed by the values of the two closest points. Thereby, for i = 1,

u_1^1 = 2u_2^1 − u_3^1.   (4.2.11)

Let η = Δt/Δx. We update the value function for k = 1, …, N as follows. For i = 2, …, M,

V_i^{k+1} = (1 − η|f_i^k|)V_i^k + ((1 + sign f_i^k)/2) η f_i^k V_{i+1}^k − ((1 − sign f_i^k)/2) η f_i^k V_{i−1}^k − Δt L_i^k − Δt r g_{ρ,i}^k,   (4.2.12)

where sign f denotes the sign of f, and for both boundaries of the value function we do extrapolations:

V_1^{k+1} = 2V_2^{k+1} − V_3^{k+1},   (4.2.13)
V_{M+1}^{k+1} = 2V_M^{k+1} − V_{M−1}^{k+1}.   (4.2.14)

Moreover, to update the control we set, for k = 1, …, N and i = 2, …, M + 1,

u_i^{k+1} = arg min_u [ f(t_{k+1}, x_i, u) (V_i^{k+1} − V_{i−1}^{k+1})/Δx + L(t_{k+1}, x_i, u) + r g_ρ(t_{k+1}, x_i, u) ],   (4.2.15)

and for the left boundary of the control values

u_1^{k+1} = 2u_2^{k+1} − u_3^{k+1}.   (4.2.16)

So far we have obtained V_i^k and u_i^k for i = 1, …, M + 1 and k = 1, …, N + 1. These constitute the first iteration of the method.
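A sketch of one backward time level of (4.2.12)-(4.2.16) in MATLAB follows. It assumes function handles f(x,u,t), L(x,u,t) and grho(x,u,t) for the dynamics, running cost and smoothed constraint, a penalty weight r, and tk and tk1 standing for t_k and t_{k+1}; fminbnd (core MATLAB) performs the pointwise minimisation over u in [ul, uu]. All names are illustrative:

    eta  = dt / dx;
    Vnew = V;  unew = u;
    for i = 2:M                                % interior update, (4.2.12)
        fi = f(x(i), u(i), tk);
        s  = sign(fi);
        Vnew(i) = (1 - eta*abs(fi)) * V(i) ...
                + 0.5*(1 + s) * eta * fi * V(i+1) ...
                - 0.5*(1 - s) * eta * fi * V(i-1) ...
                - dt * L(x(i), u(i), tk) ...
                - dt * r * grho(x(i), u(i), tk);
    end
    Vnew(1)   = 2*Vnew(2) - Vnew(3);           % left extrapolation, (4.2.13)
    Vnew(M+1) = 2*Vnew(M) - Vnew(M-1);         % right extrapolation, (4.2.14)
    for i = 2:M+1                              % control update, (4.2.15)
        H = @(w) f(x(i), w, tk1) * (Vnew(i) - Vnew(i-1)) / dx ...
               + L(x(i), w, tk1) + r * grho(x(i), w, tk1);
        unew(i) = fminbnd(H, ul, uu);
    end
    unew(1) = 2*unew(2) - unew(3);             % left extrapolation, (4.2.16)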

Remark. For a multivariable optimal control problem the stability conditions (4.2.7) and (4.2.8) can be replaced respectively by

Δt ≤ 1 / ( ∑_{j=1}^n |f_j|/Δx_j )   (4.2.17)

and

N ≥ ∑_{j=1}^n |f_j|/Δx_j,   (4.2.18)

and formulas (4.2.12) and (4.2.15) by

V_i^{k+1} = (1 − ∑_{j=1}^n η_j |f_{j,i}^k|) V_i^k + ∑_{j=1}^n ((1 + sign f_{j,i}^k)/2) η_j f_{j,i}^k V_{i_j^+}^k − ∑_{j=1}^n ((1 − sign f_{j,i}^k)/2) η_j f_{j,i}^k V_{i_j^−}^k − Δt L(t_k, x_i, u_i^k) − Δt r g_ρ(t_k, x_i, u_i^k), for all internal i,

u_i^{k+1} = arg min_u [ ∑_{j=1}^n f_j(t_{k+1}, x_i, u) (V_i^{k+1} − V_{i_j^−}^{k+1})/Δx_j + L(t_{k+1}, x_i, u) + r g_ρ(t_{k+1}, x_i, u) ], for all i except those for which some i_j lies on the boundary,

where i = (i_1, …, i_n) are the indices corresponding to the discretization of x = (x_1, …, x_n), i_j^+ = (i_1, …, i_{j−1}, i_j + 1, i_{j+1}, …, i_n), i_j^− = (i_1, …, i_{j−1}, i_j − 1, i_{j+1}, …, i_n) and f_{j,i}^k = f_j(t_k, x_i, u_i^k). The extrapolation to the boundaries is also computed: to all edge points for the value function and to the left edge points for the control.

4.2.2 Finding Optimal Trajectory and Control

To iterate, we first need to determine the optimal trajectory from the first iteration. Starting from the initial value, we integrate the state equation ẋ = f(t, x, u) forward using the following predictor-corrector method. Let us denote the resultant trajectory and control by y_p and u_p for the predictor and by y_c and u_c for the corrector, with y_p(1) = y_c(1) = x_0 and u_c(1) = u(x_0, t_{N+1}). The control value used during the integration is the optimal control value corresponding to the closest grid point to the resultant state, as suggested in [56]. Thus, for l = 2, …, N + 1,

y_p(l) = y_c(l − 1) + Δt f(t_{N+3−l}, y_c(l − 1), u_c(l − 1)),
u_p(l) = u(x_{i*}, t_{N+2−l}),

where i* = arg min_i |y_p(l) − x_i|, i = 1, …, M + 1,

y_c(l) = y_c(l − 1) + (Δt/2) ( f(t_{N+3−l}, y_c(l − 1), u_c(l − 1)) + f(t_{N+2−l}, y_p(l), u_p(l)) ),   (4.2.19)
u_c(l) = u(x_{i*}, t_{N+2−l}),   (4.2.20)

where i* = arg min_i |y_c(l) − x_i|, i = 1, …, M + 1. The resultant pairs (y_c(l), u_c(l)) for l = 1, …, N + 1 constitute an optimal trajectory and control for all time steps of the first iteration of the HJB computation. In addition, the value function along the optimal trajectory can be determined from the value function at the corresponding closest grid points. The penalty value and objective function value can also be evaluated by forward integration of the corresponding terms along the optimal trajectory.

4.2.3 Region Size Reduction

Now we determine a procedure for region reduction based on the optimal trajectory and control from the previous iteration. The new region is applied in the next iteration in order to improve computational speed and accuracy. What we need first are the maximum and minimum values of the resultant control and trajectory. Thus, for l = 1, …, N + 1,

x_max = max_l y_c(l), x_min = min_l y_c(l), u_max = max_l u_c(l), u_min = min_l u_c(l).

In view of the fact that for the next iteration the number of interval partitions M will be doubled, we set the region for the next iteration as follows:

a = x_min − c Δx,   (4.2.21)
b = x_max + c Δx,   (4.2.22)
u_l = ⌊u_min⌋,   (4.2.23)
u_u = ⌈u_max⌉,   (4.2.24)
M ← 2M,   (4.2.25)

where u_l and u_u are respectively the lower and upper bounds for the control, ⌊z⌋ means rounding the elements of z to the nearest integer less than or equal to z, ⌈z⌉ means rounding the elements of z to the nearest integer greater than or equal to z, and c is some given positive integer. The constant c is used to enlarge the region slightly so as to improve stability.

4.2.4 Algorithm for Iterative Upwind Finite Difference Method

To sum up, the algorithm consists of three phases: discretization and computation of the HJB equation, forward integration of the optimal trajectory, and region reduction. A code sketch of phases 2 and 3 follows the summary below.

1. Discretization of the HJB equation.
(a) Divide the space interval [a, b] into M equal partitions, each of length Δx as in (4.2.3). If necessary, shift the spatial discretization to contain the initial value using (4.2.4), (4.2.5) and (4.2.6).
(b) Decide the lower and upper bounds for the control.
(c) Determine the number of time stages N according to the stability criterion (4.2.7) or (4.2.8).
(d) Set the initial values for the value function V and control u as shown in (4.2.9) and (4.2.10), and extrapolate the missing control value linearly (see (4.2.11)).
(e) Update the value function for the next time stage based on (4.2.12) and extrapolate the value function at the end grid points through (4.2.13) and (4.2.14).
(f) Update and extrapolate the control value for the next time stage as stated in (4.2.15) and (4.2.16).
(g) Repeat (e) and (f) until t = 0 is reached.

2. Finding the optimal trajectory and control. To determine the optimal trajectory and control for each iteration, use the predictor-corrector method in (4.2.19) and (4.2.20).

3. Region reduction. Set the new region for the next iteration using (4.2.21), (4.2.22), (4.2.23), (4.2.24) and (4.2.25).

4. Repeat the above steps until the maximum number of iterations is reached or the difference between two consecutive value functions at the initial state point is less than some prescribed tolerance.
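Under the same illustrative assumptions as in the earlier sketches, phases 2 and 3 might be coded as follows, where U(k, i) is taken to store the computed control u_i^k from phase 1 and c is the safety margin:

    yc = zeros(1, N+1);  uc = zeros(1, N+1);
    yc(1) = x0;
    [~, i0] = min(abs(x0 - x));
    uc(1) = U(N+1, i0);                         % control at the initial time t = 0
    for l = 2:N+1
        t_old = 1 - (N+2-l)*dt;                 % t_{N+3-l}
        t_new = t_old + dt;                     % t_{N+2-l}
        % predictor step and nearest-grid-point control lookup
        yp = yc(l-1) + dt * f(yc(l-1), uc(l-1), t_old);
        [~, ip] = min(abs(yp - x));
        up = U(N+2-l, ip);
        % corrector step, (4.2.19)-(4.2.20)
        yc(l) = yc(l-1) + 0.5*dt*( f(yc(l-1), uc(l-1), t_old) ...
                                 + f(yp, up, t_new) );
        [~, ic] = min(abs(yc(l) - x));
        uc(l) = U(N+2-l, ic);
    end
    % region reduction for the next iteration, (4.2.21)-(4.2.25)
    a  = min(yc) - c*dx;    b  = max(yc) + c*dx;
    ul = floor(min(uc));    uu = ceil(max(uc));
    M  = 2*M;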

4.3 Numerical Examples

In this section two examples are presented to show the effectiveness of the proposed method. The computation is done using MATLAB R2010a and the MATLAB Optimization Toolbox. The results are then compared to the analytic solution (if it exists) or to the result generated by MISER 3.3.

Example 4.3.1. This example contains 1 state, 1 control and 1 mixed (state-control) inequality constraint, taken from [62]. The problem is to minimize

min ∫_0^1 (x² + u² − 2u) dt + (1/2)(x(1))²

subject to

ẋ = u, x(0) = 0, g(x, u, t) = x² + u² − t² − 1 ≤ 0.

The analytic optimal solution for this problem is x*(t) = t and u*(t) = 1, so that the constraint is active for all t ∈ [0, 1]. The value function for this solution is −1/6 ≈ −0.1667. The corresponding HJB initial-value problem for this example is

V_t + min_u ( u V_x + (x² + u² − 2u) + r g_ρ(x, u, t) ) = 0,
V(1, x) = (1/2)(x(1))².

We start with the region −1 ≤ x ≤ 2 and −2 ≤ u ≤ 2 for the first iteration and then reduce it progressively according to our proposed method. The problem has been solved for various values of M, and a summary of our computations is given in Table 4.1.

Table 4.1: Computational results for Example 4.3.1 (c = r = 2, ρ = 10⁻²).

The first, second and third columns are respectively the number of iterations and the numbers of spatial and time partitions. The penalty value and

objective function value along the optimal trajectory for each iteration are shown in the fourth and fifth columns. These values are evaluated by forward integration of the corresponding terms along the optimal trajectory, whereas the value function in the sixth column is the value function at the initial point obtained from the Upwind Finite Difference Method. The discrepancy between the objective function value and the value function in each iteration is caused by the use of the state and control values of the corresponding closest grid points along the optimal trajectory in the evaluation of the objective function value. However, from iteration to iteration this discrepancy becomes smaller. This indicates that the use of the state and control values of the corresponding closest grid points is a reasonable choice. It can also be seen that in general the penalty and the value function decrease as the number of iterations and M increase. Additional information related to the space and control intervals used during the iterations is given in the seventh and eighth columns. The last column contains the space interval shrinkage factor, i.e. the ratio of the latter space interval length to the former. The smaller the shrinkage factor, the larger the reduction of the space interval length for the next iteration. A shrinkage factor close to 1 indicates that the length of the space interval for the next iteration does not change much. In addition, if the control interval becomes fixed, the iteration is almost convergent and further improvement might not be possible.

The following tables confirm that the computed optimal control and state converge to the analytic solution, as the error decreases significantly.

Table 4.2: Computed error for u in the maximum and L² norms.

Table 4.3: Computed error for x in the maximum and L² norms.
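The errors of Tables 4.2-4.3 can be reproduced from the corrector pair (yc, uc); a sketch, assuming the analytic solution u*(t) = 1 and x*(t) = t of Example 4.3.1 and a uniform time grid:

    tgrid = linspace(0, 1, N+1);
    err_u = uc - 1;                        % against u*(t) = 1
    err_x = yc - tgrid;                    % against x*(t) = t
    max_err_u = norm(err_u, Inf);          % maximum norm
    L2_err_u  = sqrt(dt * sum(err_u.^2));  % discrete L2 norm
    max_err_x = norm(err_x, Inf);
    L2_err_x  = sqrt(dt * sum(err_x.^2));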

The computational results for the last iteration are plotted in the following figures. It can be seen that the value function and control shown in Figures 4.1 and 4.2 are smooth in the solution domain. This shows the success of the linear extrapolation used.

Figure 4.1: Value function for Example 4.3.1.

Figures 4.3, 4.4 and 4.5 show the state, value function and optimal control along the optimal trajectory. In this example, the computed state looks good, since it coincides with the state from the analytic solution. Unfortunately, the choice of the value function and optimal control corresponding to the closest grid point to the optimal trajectory does not always guarantee that the resultant graph is continuous. This occurs for the optimal control in Figure 4.5. Even so, from Table 4.2 we can see that the error in the computed optimal control is very small. Aside from this minor disadvantage, this remains a good choice due to its simplicity and high accuracy. In Figures 4.6 and 4.7 the computed constraint and penalty value along the optimal trajectory are plotted. The smoothing technique used with ρ = 10⁻² sometimes results in a discrepancy between the constraint and penalty values over some time period. For example, for t < 0.4 the graphs show that the constraint is inactive, but the penalty value is positive. This is due to the smoothing applied to max{0, g} and will not lead to a big error, owing to the relatively small value of ρ.

Figure 4.2: Optimal control for Example 4.3.1.

Figure 4.3: Optimal state along optimal trajectory for Example 4.3.1 (computed versus analytic optimal state).

Figure 4.4: Value function along optimal trajectory for Example 4.3.1.

Figure 4.5: Optimal control along optimal trajectory for Example 4.3.1 (computed versus analytic optimal control).

Figure 4.6: Constraint along optimal trajectory for Example 4.3.1.

Figure 4.7: Penalty along optimal trajectory for Example 4.3.1.

Example 4.3.2. The following optimal control problem has 2 states and 1 control, taken from [54]:

min_u J(u) = ∫_0^1 (x_1² + x_2² + 0.005u²) dt

subject to the dynamics

ẋ_1 = x_2, x_1(0) = 0,
ẋ_2 = −x_2 + u, x_2(0) = −1,

and subject to the all-time state inequality constraint

h(t, x) = 8(t − 0.5)² − 0.5 − x_2 ≥ 0 for all t ∈ [0, 1].

Because this constraint is a purely state constraint, it does not give any direct information on how to choose a control that satisfies it. For this reason, we need to modify it according to the earlier Remark. In this example we choose ψ = 0.9, so that the new constraint becomes

g(t, u, x) = 0.9 h(t, x) + 0.1 ḣ(t, u, x).

The equivalent problem in HJB form is

V_t + min_u ( x_2 V_{x_1} + (u − x_2) V_{x_2} + (x_1² + x_2² + 0.005u²) + r g_ρ(t, x, u) ) = 0,
V(1, x) = 0.

For the first iteration we choose the region −1 ≤ x_1 ≤ 1, −3 ≤ x_2 ≤ 1, −2 ≤ u ≤ 2 and M = 8. The scheme and the stability condition for this multivariable problem are explained in the Remark in Section 4.2. The results of the computation for various M are summarized in Tables 4.4 and 4.5. The first, second and third columns are respectively the number of iterations and the numbers of spatial and time partitions. As in the previous example, the penalty value and the objective function value along the optimal trajectory for each iteration can be found in the fourth and fifth columns, whereas the value function at the initial point is in the sixth column of Table 4.4. We can see that from iteration to iteration the objective function value increases, while at the same time the value function and penalty decrease significantly, and the difference between the value function and the objective function becomes ever smaller. The lower and upper bounds of the control used in each iteration are in the last column. The fourth and fifth columns of Table 4.5 provide information related to the intervals for the state variables x_1 and x_2 in each iteration, and the last two

columns contain the space interval shrinkage factors, i.e. the ratios of the latter space interval lengths to the former.

Table 4.4: Computational results for Example 4.3.2 (c = r = 2, ρ = 10⁻²).

Table 4.5: Computational results for Example 4.3.2 (c = r = 2, ρ = 10⁻²).

The results of the computation for M = 128 are plotted in the following figures. Figures 4.8 and 4.9 are respectively the cross-sections of the value function corresponding to x_2 = −1 and to x_1 = 0. The cross-sections of the optimal control corresponding to x_2 = −1 and to x_1 = 0 are in Figures 4.10 and 4.11. The graphs of the value function and optimal control along the optimal trajectory are in Figures 4.12 and 4.13. Figures 4.14 and 4.15 show the optimal states. From the latter figure, it is clear that the optimal solution satisfies the constraint, i.e. x_2 stays below the parabola 8(t − 0.5)² − 0.5 for all time t. Unlike in Example 4.3.1, both the value function and the optimal control along the optimal trajectory have continuous graphs. Moreover, in comparison to the results produced by MISER 3.3, the computed optimal control and states have many features in common. Figures 4.16 and 4.17 show the distinction between the original constraint h(t, x) and the modified one g(t, u, x), in particular when t < 0.5. During that period the constraint h(t, x) is inactive, whereas g(t, u, x) is active. However, this does not influence the computation much, as validated by the small penalty values in Table 4.4 and Figure 4.18. Unlike in Example 4.3.1, there is no clear discrepancy between the modified constraint in Figure 4.17 and the penalty in Figure 4.18.

Figure 4.8: The cross-section of the value function corresponding to x_2 = −1 for Example 4.3.2.

Figure 4.9: The cross-section of the value function corresponding to x_1 = 0 for Example 4.3.2.

Figure 4.10: The cross-section of the optimal control corresponding to x_2 = −1 for Example 4.3.2.

Figure 4.11: The cross-section of the optimal control corresponding to x_1 = 0 for Example 4.3.2.

Figure 4.12: Value function along optimal trajectory for Example 4.3.2.

Figure 4.13: Optimal control along optimal trajectory for Example 4.3.2 (computed versus MISER).

Figure 4.14: Optimal state x_1 for Example 4.3.2 (computed versus MISER).

Figure 4.15: Optimal state x_2 for Example 4.3.2 (computed versus MISER, with constraint).

For t < 0.7 the violated modified constraint introduces small penalty values, while for t ≥ 0.7 the inactive modified constraint contributes zero penalty.

Figure 4.16: Original constraint for Example 4.3.2.

4.4 Conclusion

In this chapter we presented the Iterative Upwind Finite-Difference Method for the approximation of constrained viscosity solutions to Hamilton-Jacobi-Bellman equations. As has been seen from two successful examples, ranging from simple to more complex, this method is very effective not only in improving the accuracy but also in reducing the computational time compared to the available Upwind Finite-Difference Method introduced in [59]. In this method the trapezoidal propagation does not occur, which reduces the number of computed grid points. Hence, the computational time is reduced.

Figure 4.17: Modified constraint for Example 4.3.2.

Figure 4.18: Penalty along optimal trajectory for Example 4.3.2.

Chapter 5

Modified Iterative Upwind Finite Difference Method

5.1 Introduction

In Chapter 4 we saw that the Iterative Upwind Finite-Difference Method works very well to approximate constrained viscosity solutions of Hamilton-Jacobi-Bellman equations. However, for multidimensional problems it needs much computation time, due to the fact that at each time step the method evaluates all grid points in the prescribed state domain. In this chapter, we set out the Modified Iterative Upwind Finite-Difference Method to increase the efficiency of the computation. The main idea of this method is to get rid of unnecessary grid points in the domain and to compute only the grid points close to the optimal trajectory, starting from a given initial point x_0.

5.2 Modified Iterative Upwind Finite Difference Method

The Modified Iterative Upwind Finite-Difference Method works as follows. For the first iteration, we carry out all steps of the Iterative Upwind Finite-Difference Method as explained in Chapter 4. Then, for the second iteration, instead of evaluating the value function and control at all grid points in the reduced region, we only update the value function and control at some selected grid points. These grid points are determined by their closeness to the optimal trajectory, within some prescribed distance d. For the remaining iterations we likewise update the value function and control only on such selected grid points.

5.2.1 First Iteration

To simplify notation, let us consider an optimal control problem with one control u and two state variables x_h ∈ [a_h, b_h], h = 1, 2. Extension to a general multivariable optimal control problem can easily be done with some adjustments to notation. This subsection follows closely the algorithm in Chapter 4, except that we describe the method for two state variables, and the reduced region for the next iteration is now determined by the given distance d.

We start with the construction of the spatial discretization and time stages. We select a positive integer M and divide the space intervals [a_h, b_h], h = 1, 2, into M partitions, so that

Δx_h = (b_h − a_h)/M.   (5.2.1)

The discretization of the space intervals therefore becomes

x_{h,i} = a_h + (i − 1) Δx_h,   (5.2.2)

where h = 1, 2 and i = 1, …, M + 1. In order that this spatial discretization always contains the initial point, an appropriate shift might be necessary. Let l* = arg min_i |x_{h,i} − x_h(0)| and make the adjustments, for i = 1, …, M + 1,

x_{h,i} ← x_{h,i} + (x_h(0) − x_{h,l*}),   (5.2.3)
a_h ← a_h + (x_h(0) − x_{h,l*}),   (5.2.4)
b_h ← b_h + (x_h(0) − x_{h,l*}).   (5.2.5)

Next, we impose limits on the control u(t), t ∈ [0, 1], specifically a lower bound u_l and an upper bound u_u. Because the Upwind Finite-Difference Method is an explicit method, to determine the number of time stages N, each of length Δt, we need to take account of the stability condition reported in [59], i.e.

Δt ≤ 1 / ( ∑_{h=1}^2 |f_h|/Δx_h ).   (5.2.6)

As in this case Δt = 1/N, the stability condition is satisfied by

N ≥ ∑_{h=1}^2 |f_h|/Δx_h.   (5.2.7)

Hence t_k = 1 − (k − 1)Δt, with k = 1, …, N + 1, is the backward partition of the time interval [0, 1]. This means that t_1 and t_{N+1} correspond to t = 1 and t = 0 respectively.
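In code, the two-state stability bound (5.2.7) amounts to the following sketch, with fmax(h) an assumed bound on |f_h| over the region (illustrative names):

    N  = ceil( fmax(1)/dx(1) + fmax(2)/dx(2) );   % (5.2.7)
    dt = 1 / N;
    t  = 1 - (0:N) * dt;                          % backward partition of [0, 1]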

With the notation V_{i,j}^k ≈ V(t_k, x_{i,j}) and u_{i,j}^k ≈ u(t_k, x_{i,j}) for the value function and control variable at the point x_{i,j} = (x_{1,i}, x_{2,j}) and time t_k, we set the initial value function according to

V_{i,j}^1 = φ(x_{i,j}), i, j = 1, …, M + 1,   (5.2.8)

and the initial control value, for i, j = 2, …, M + 1,

u_{i,j}^1 = arg min_u [ f_1(t_1, x_{i,j}, u) (V_{i,j}^1 − V_{i−1,j}^1)/Δx_1 + f_2(t_1, x_{i,j}, u) (V_{i,j}^1 − V_{i,j−1}^1)/Δx_2 + L(t_1, x_{i,j}, u) + r g_ρ(t_1, x_{i,j}, u) ],   (5.2.9)

where g_ρ is the smoothed version of the constraints g as in Section 2.3. As in Chapter 4, in order to avoid the effect of a trapezoidal propagation on the spatial domain at each time stage, we set up artificial boundary conditions for the control and value function based on linear extrapolation of the closest known points. Thereby, for i, j = 1, …, M + 1,

u_{i,1}^1 = 2u_{i,2}^1 − u_{i,3}^1,   (5.2.10)
u_{1,j}^1 = 2u_{2,j}^1 − u_{3,j}^1.   (5.2.11)

Let η_h = Δt/Δx_h, h = 1, 2. We update the value function for k = 1, …, N as follows:

V_{i,j}^{k+1} = (1 − ∑_{h=1}^2 η_h |f_h(t_k, x_{i,j}, u_{i,j}^k)|) V_{i,j}^k
+ ((1 + sign f_1)/2) η_1 f_1(t_k, x_{i,j}, u_{i,j}^k) V_{i+1,j}^k − ((1 − sign f_1)/2) η_1 f_1(t_k, x_{i,j}, u_{i,j}^k) V_{i−1,j}^k
+ ((1 + sign f_2)/2) η_2 f_2(t_k, x_{i,j}, u_{i,j}^k) V_{i,j+1}^k − ((1 − sign f_2)/2) η_2 f_2(t_k, x_{i,j}, u_{i,j}^k) V_{i,j−1}^k
− Δt L(t_k, x_{i,j}, u_{i,j}^k) − Δt r g_ρ(t_k, x_{i,j}, u_{i,j}^k), for i, j = 2, …, M,   (5.2.12)

where sign f_h denotes the sign of f_h(t_k, x_{i,j}, u_{i,j}^k). For the boundaries of the value function, extrapolations similar to (5.2.10) and (5.2.11) are done, i.e. for i, j = 2, …, M + 1,

V_{i,1}^{k+1} = 2V_{i,2}^{k+1} − V_{i,3}^{k+1},   (5.2.13)
V_{1,j}^{k+1} = 2V_{2,j}^{k+1} − V_{3,j}^{k+1},   (5.2.14)
V_{i,M+1}^{k+1} = 2V_{i,M}^{k+1} − V_{i,M−1}^{k+1},   (5.2.15)
V_{M+1,j}^{k+1} = 2V_{M,j}^{k+1} − V_{M−1,j}^{k+1}.   (5.2.16)

Moreover, to update the control we set, for k = 1, …, N,

u_{i,j}^{k+1} = arg min_u [ f_1(t_{k+1}, x_{i,j}, u) (V_{i,j}^{k+1} − V_{i−1,j}^{k+1})/Δx_1 + f_2(t_{k+1}, x_{i,j}, u) (V_{i,j}^{k+1} − V_{i,j−1}^{k+1})/Δx_2 + L(t_{k+1}, x_{i,j}, u) + r g_ρ(t_{k+1}, x_{i,j}, u) ]   (5.2.17)

for i, j = 2, …, M + 1. For the left and upper boundaries of the control values, we have

u_{i,1}^{k+1} = 2u_{i,2}^{k+1} − u_{i,3}^{k+1},   (5.2.18)
u_{1,j}^{k+1} = 2u_{2,j}^{k+1} − u_{3,j}^{k+1}.   (5.2.19)

So far we have obtained V_{i,j}^k and u_{i,j}^k for i, j = 1, …, M + 1 and k = 1, …, N + 1. Next, we determine the optimal trajectory and control from this iteration. Starting from the initial value, we integrate the state equation ẋ = f(t, x, u), where x = (x_1, x_2) and f = (f_1, f_2), forward using the following predictor-corrector method. Let us denote the resultant path and control by y_p = (y_{p,1}, y_{p,2}) and u_p for the predictor and by y_c = (y_{c,1}, y_{c,2}) and u_c for the corrector, with y_p(1) = y_c(1) = x_0 and u_c(1) = u(x(0), t_{N+1}). The control value used during the integration is the optimal control value corresponding to the closest grid point to the resultant state, as suggested in [56]. Thus, for l = 2, …, N + 1,

y_p(l) = y_c(l − 1) + Δt f(t_{N+3−l}, y_c(l − 1), u_c(l − 1)),
u_p(l) = u(x_{i*,j*}, t_{N+2−l}), where (i*, j*) = arg min_{(i,j)} ‖y_p(l) − x_{i,j}‖, i, j = 1, …, M + 1,

y_c(l) = y_c(l − 1) + (Δt/2) ( f(t_{N+3−l}, y_c(l − 1), u_c(l − 1)) + f(t_{N+2−l}, y_p(l), u_p(l)) ),   (5.2.20)
u_c(l) = u(x_{i*,j*}, t_{N+2−l}),   (5.2.21)

where (i*, j*) = arg min_{(i,j)} ‖y_c(l) − x_{i,j}‖, i, j = 1, …, M + 1. The resultant pairs (y_c(l), u_c(l)) for l = 1, …, N + 1 constitute an optimal trajectory and control for all time steps of the first iteration of the HJB computation. Furthermore, the value function along the optimal trajectory can be determined from the value function at the corresponding closest grid points. The penalty value and objective function value can also be evaluated by forward integration of the corresponding terms along the optimal trajectory.

Next, we reduce the region size based on the optimal trajectory and control. The new region is applied in the next iteration in order to improve computational speed and accuracy. What we need first are the maximum and minimum values of the resultant control and path. Thus, for h = 1, 2 and l = 1, …, N + 1,

x_{h,max} = max_l y_{c,h}(l), x_{h,min} = min_l y_{c,h}(l), u_max = max_l u_c(l), u_min = min_l u_c(l).

For the purpose of evaluating the space interval shrinkage factor, i.e. the ratio of the latter space interval length to the former, we save the space intervals before updating them. For h = 1, 2,

ā_h ← a_h, b̄_h ← b_h.

Afterwards, in view of the fact that for the next iteration the number of interval partitions M will be doubled, we set the region for the next iteration as follows. For h = 1, 2,

Δx_h = (x_{h,max} − x_{h,min})/(2M − 2d),
a_h = x_{h,min} − d Δx_h,   (5.2.22)
b_h = x_{h,max} + d Δx_h,   (5.2.23)
u_l = ⌊u_min⌋,   (5.2.24)
u_u = ⌈u_max⌉,   (5.2.25)
M ← 2M,   (5.2.26)

where ⌊z⌋ means rounding the elements of z to the nearest integer less than or equal to z, ⌈z⌉ rounding the elements of z to the nearest integer greater than or equal to z, and d is some given constant. This constant d constitutes the prescribed distance from the optimal trajectory for the selected grid points, as explained in more detail in the next part. If we would like to determine the space interval shrinkage factor, the following formula can be used:

s_h = (b_h − a_h)/(b̄_h − ā_h) for h = 1, 2.   (5.2.27)

We will see later that this shrinkage factor is a contributing factor in the distance update.
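A sketch of this d-based region reduction, with yc the 2-by-(N+1) array of corrector states and uc the corrector controls from the current iteration (illustrative names):

    for h = 1:2
        xmin(h) = min(yc(h,:));  xmax(h) = max(yc(h,:));
        abar(h) = a(h);  bbar(h) = b(h);              % save old intervals
        dxh  = (xmax(h) - xmin(h)) / (2*M - 2*d);
        a(h) = xmin(h) - d*dxh;                       % (5.2.22)
        b(h) = xmax(h) + d*dxh;                       % (5.2.23)
        s(h) = (b(h) - a(h)) / (bbar(h) - abar(h));   % shrinkage, (5.2.27)
    end
    ul = floor(min(uc));  uu = ceil(max(uc));         % (5.2.24)-(5.2.25)
    M  = 2*M;                                         % (5.2.26)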

5.2.2 Second Iteration

For the second iteration, we start by determining the number of time stages according to (5.2.7) and discretizing the space as in (5.2.2). If necessary, we shift the spatial discretization to contain the initial value using (5.2.3), (5.2.4) and (5.2.5). Among the established grid points from the space discretization, we select grid points for the second iteration based on their distance from the optimal trajectory. In principle, we choose points x_d close to the optimal trajectory y_c within the given distance d, i.e. points that satisfy (in integer grid point units)

‖y_c − x_d‖_1 < d.   (5.2.28)

Then we set the initial value function and control for these selected grid points. Nevertheless, we first need to separate the selected grid points into two groups, namely interior and border grid points. By interior points we mean points that are surrounded by other selected points; border points are points that are directly adjacent to unselected points. For interior points, the initial value function is set following (5.2.8), whereas for border points (i, j), for k = 1, one of the following possible linear extrapolations can be used:

1. V_{i,j}^k = V_{i,j−1}^k + V_{i−1,j}^k − V_{i−1,j−1}^k
2. V_{i,j}^k = V_{i−1,j}^k + V_{i,j+1}^k − V_{i−1,j+1}^k
3. V_{i,j}^k = V_{i,j−1}^k + V_{i+1,j}^k − V_{i+1,j−1}^k
4. V_{i,j}^k = V_{i+1,j}^k + V_{i,j+1}^k − V_{i+1,j+1}^k
5. V_{i,j}^k = 2V_{i,j−1}^k − V_{i,j−2}^k
6. V_{i,j}^k = 2V_{i−1,j}^k − V_{i−2,j}^k
7. V_{i,j}^k = 2V_{i,j+1}^k − V_{i,j+2}^k
8. V_{i,j}^k = 2V_{i+1,j}^k − V_{i+2,j}^k

We see that the right hand sides of the above equalities involve value function entries at interior points set previously. At least one of these extrapolations exists, due to the compactness of the points close to the optimal trajectory for a (big enough) given distance d. Which extrapolation we choose really depends on the availability of interior points. However, extrapolations using more interior points are preferable, since we expect to get a better approximation to the true value. Similarly to the initial value function, the initial control is set as in (5.2.9) for interior points, while for border points (i, j), for k = 1, the initial control uses one of the following extrapolations:

1. u_{i,j}^k = u_{i,j−1}^k + u_{i−1,j}^k − u_{i−1,j−1}^k
2. u_{i,j}^k = u_{i−1,j}^k + u_{i,j+1}^k − u_{i−1,j+1}^k
3. u_{i,j}^k = u_{i,j−1}^k + u_{i+1,j}^k − u_{i+1,j−1}^k
4. u_{i,j}^k = u_{i+1,j}^k + u_{i,j+1}^k − u_{i+1,j+1}^k
5. u_{i,j}^k = 2u_{i,j−1}^k − u_{i,j−2}^k
6. u_{i,j}^k = 2u_{i−1,j}^k − u_{i−2,j}^k
7. u_{i,j}^k = 2u_{i,j+1}^k − u_{i,j+2}^k
8. u_{i,j}^k = 2u_{i+1,j}^k − u_{i+2,j}^k

To update the value function at the later time stages, i.e. k = 2, …, N + 1, the scheme follows (5.2.12) for interior points and one of the above value function linear extrapolations for border points. In a similar way, the controls for time stages k = 2, …, N + 1 can be updated using (5.2.17) for interior points and one of the above control linear extrapolations for border points. Thus far, we have V_{i,j}^k and u_{i,j}^k at the selected points (i, j) for all time stages k = 1, …, N + 1. The next step is to integrate the dynamical system forward to obtain the optimal pair (y_c, u_c) for all time stages, as explained in (5.2.20) and (5.2.21).

Remark. If during the integration the resultant trajectory goes outside the selected grid points, then we enlarge the distance d by one grid spacing. Thus, the new distance is

d ← d + 1.   (5.2.29)

We then restart the second iteration with this new distance.

Afterwards, as in the first iteration, we save the space intervals as mentioned in (5.2.27) and reduce the region size for the next iteration as in (5.2.22), (5.2.23), (5.2.24), (5.2.25) and (5.2.26). In order to include the selected points in the next iteration, we need to update the distance d as follows:

ϱ = max(1/s_1, 1/s_2),
d = ⌈2 ϱ d⌉,   (5.2.30)

where s_h in ϱ is the shrinkage factor of the space interval for x_h between two consecutive iterations. In the distance update, the variable ϱ is multiplied by 2 due to the fact that we have doubled the space interval partition M for the next iteration. In this way the distance d, in actual state units, is maintained as the iterations progress.
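The selection rule (5.2.28), the interior/border split and the distance update (5.2.30) might be sketched as follows, where X1 and X2 are ndgrid coordinate arrays of the current grid and s holds the shrinkage factors from (5.2.27) (illustrative names):

    sel = false(M+1, M+1);
    for l = 1:N+1
        d1  = abs(X1 - yc(1,l)) / dx(1);          % distances in grid units
        d2  = abs(X2 - yc(2,l)) / dx(2);
        sel = sel | (d1 + d2 < d);                % 1-norm ball of radius d, (5.2.28)
    end
    % border points: selected points with at least one unselected 4-neighbour
    pad = false(M+3, M+3);
    pad(2:end-1, 2:end-1) = sel;
    allnb = pad(1:end-2,2:end-1) & pad(3:end,2:end-1) ...
          & pad(2:end-1,1:end-2) & pad(2:end-1,3:end);
    interior = sel & allnb;
    border   = sel & ~allnb;
    % distance update between iterations, (5.2.30)
    rho = max(1 ./ s);
    d   = ceil(2 * rho * d);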

5.2.3 Modified Iterative Upwind Finite Difference Method Algorithm

In summary, the algorithm of the Modified Iterative Upwind Finite-Difference Method is as follows:

1. First Iteration
(a) Divide the space intervals [a_h, b_h] into M equal partitions, each of length Δx_h as in (5.2.1). If necessary, shift the spatial discretization to contain the initial value using (5.2.3), (5.2.4) and (5.2.5).
(b) Decide the lower and upper bounds for the control.
(c) Determine the number of time stages N according to the stability criterion (5.2.6) or (5.2.7).
(d) Set the initial values for the value function V and control u as shown in (5.2.8) and (5.2.9), and extrapolate the missing control values linearly (see (5.2.10) and (5.2.11)).
(e) Update the value function for the next time stage based on (5.2.12) and extrapolate the value function at the end grid points through (5.2.13), (5.2.14), (5.2.15) and (5.2.16).
(f) Update and extrapolate the control for the next time stage as stated in (5.2.17), (5.2.18) and (5.2.19).
(g) Repeat (e) and (f) until t = 0 is reached.
(h) Determine the optimal trajectory and control using the predictor-corrector method in (5.2.20) and (5.2.21).
(i) Set the new region for the next iteration using (5.2.22), (5.2.23), (5.2.24), (5.2.25) and (5.2.26).

2. Second Iteration
(a) Discretize the space intervals as in (5.2.2) and, if necessary, shift the spatial discretization to contain the initial value using (5.2.3), (5.2.4) and (5.2.5).
(b) Determine the number of time stages N according to the stability criterion (5.2.6) or (5.2.7).
(c) Select the points close to the optimal trajectory within the given distance d using (5.2.28).
(d) Set the initial value function for the selected points using (5.2.8) for interior points and one of the value function extrapolations for border points.

(e) Set the initial control for the selected points using (5.2.9) for interior points and one of the control extrapolations for border points.
(f) Update the value function for the selected points using (5.2.12) for interior points and one of the value function extrapolations for border points.
(g) Update the control for the selected points using (5.2.17) for interior points and one of the control extrapolations for border points.
(h) Repeat (f) and (g) until t = 0 is reached.
(i) Determine the optimal trajectory and control. If the resultant trajectory goes outside the region determined by the selected grid points, increase the distance d by one grid spacing and restart the second iteration with this new distance.
(j) Reduce the region size according to (5.2.22), (5.2.23), (5.2.24), (5.2.25) and (5.2.26).
(k) Update the distance d by multiplying it by twice the maximum of the inverse shrinkage factors (see (5.2.30)).

3. Repeat all steps of the second iteration for each subsequent iteration. Stop iterating when the maximum number of iterations is reached or the difference between two consecutive value functions at the initial state point is less than some prescribed tolerance.

5.3 Numerical Example

The point of this section is to show that the proposed method is more efficient in terms of computation time compared to the method described in the previous chapter. The computation is done using MATLAB R2010a and the MATLAB Optimization Toolbox, and the results are compared to the results generated by MISER 3.3. As our example, we revisit Example 4.3.2 in Chapter 4, which has 2 states and 1 control.

Example 5.3.1. This optimal control problem is from [54]:

min_u J(u) = ∫_0^1 (x_1² + x_2² + 0.005u²) dt

subject to the dynamics

ẋ_1 = x_2, x_1(0) = 0,
ẋ_2 = −x_2 + u, x_2(0) = −1,

and subject to the all-time state inequality constraint

h(t, x) = 8(t − 0.5)² − 0.5 − x_2 ≥ 0 for all t ∈ [0, 1].

As in Chapter 4, this constraint is a purely state constraint, so it does not give any direct information on how to choose a control that satisfies it. For this reason, we need to modify it according to the earlier Remark. In this example we choose ψ = 0.9, so that the new constraint becomes

g(t, u, x) = 0.9 h(t, x) + 0.1 ḣ(t, u, x).

The equivalent problem in HJB form is

V_t + min_u ( x_2 V_{x_1} + (u − x_2) V_{x_2} + (x_1² + x_2² + 0.005u²) + r g_ρ(t, x, u) ) = 0,
V(1, x) = 0.

For the first iteration, we choose the region −1 ≤ x_1 ≤ 1, −3 ≤ x_2 ≤ 1, −2 ≤ u ≤ 2 and M = 8. The results of the computation for various M are summarized in Tables 5.1 and 5.2. The first, second, third and fourth columns are respectively the number of iterations, the distance, and the numbers of spatial and time partitions. The penalty value and the objective function value along the optimal trajectory for each iteration are in the fifth and sixth columns. The seventh column of Table 5.1 contains the value function at the initial point. We can see that from iteration to iteration the objective function value in general increases, while at the same time the value function and penalty decrease significantly. The difference between the value function and the objective function becomes steadily smaller. The lower and upper bounds of the control used in each iteration are in the last column.

Table 5.1: Computational results for Example 5.3.1 (r = 2, ρ = 10⁻²).

The fifth and sixth columns of Table 5.2 provide information related to the intervals for the state variables x_1 and x_2 in each iteration, and the seventh and

eighth columns contain the space interval shrinkage factors, i.e. the ratios of the latter space interval lengths to the former. The smaller the shrinkage factor, the larger the reduction of the space interval length for the next iteration. A shrinkage factor close to 1 indicates that the length of the space interval for the next iteration does not change much. In addition, if the control interval becomes fixed, the iteration is almost convergent and further improvement might not be possible. On the other hand, a shrinkage factor exceeding 1 indicates that the previous interval length was too short, so that the method will automatically adapt to restore it. The last column gives the percentage of grid point evaluations compared to the method without grid selection explained in Chapter 4. As can be seen from the table, for this example the new method evaluates no more than 65 percent of the complete set of grid points. Therefore, the efficiency in computation time can be increased by about 35 percent. Moreover, in comparison to the best value function in Chapter 4, i.e. 0.272, the value function produced by this method, 0.278, is only slightly different.

Table 5.2: Computational results for Example 5.3.1 (r = 2, ρ = 10⁻²).

Figures 5.1, 5.2, 5.3 and 5.4 show comparisons between the optimal trajectories generated by two consecutive iterations and by MISER 3.3. For instance, in Figure 5.1 we can see three graphs of the optimal trajectories produced by the first iteration (-+), the second iteration (-) and MISER 3.3 (-.), whereas Figure 5.2 shows the optimal trajectories from the second iteration (-+), the third iteration (-) and MISER 3.3 (-.). We note that in the first two iterations, the optimal trajectories from the previous and present iterations differ. However, in the last two iterations, the differences between the optimal trajectories from the former and present iterations become smaller and the two graphs almost coincide (see Figures 5.3 and 5.4). The computational results for the last iteration are plotted in Figures 5.5, 5.6, 5.7 and 5.8. They show the value function and control at the initial time (t = 0) and final time (t = 1). The similar basic shapes of the figures indicate that the method is quite stable under the proposed distance d.

Figure 5.1: Optimal trajectory from Iteration 2.

Figure 5.2: Optimal trajectory from Iteration 3.

Figure 5.3: Optimal trajectory from Iteration 4.

Figure 5.4: Optimal trajectory from Iteration 5.

Figure 5.5: Computed value function at t = 0.

Figure 5.6: Computed value function at t = 1.

Figure 5.7: Computed optimal control at t = 0.

Figure 5.8: Computed optimal control at t = 1.

Graphs similar to those in Chapter 4 can be seen in the figures provided.

Figure 5.9: Value function along optimal trajectory.

5.4 Conclusion

In this chapter we modified the Iterative Upwind Finite-Difference Method from Chapter 4 in order to improve the speed of the computation. As has been seen from the example, this method is much more efficient than the former, and it is likely to be even more efficient in higher state dimensions. However, a method for choosing a minimal value of d still needs to be investigated, since choosing d too small results in instability of the scheme. The article [37] may provide a good starting point for the study of d.

Figure 5.10: Optimal control along optimal trajectory (computed versus MISER).

Figure 5.11: x_1 versus time (computed versus MISER).

Figure 5.12: x_2 versus time (computed versus MISER, with constraint).

Figure 5.13: Original constraints along optimal trajectory.

Figure 5.14: Modified constraints along optimal trajectory.

Chapter 6

Iterative Upwind Finite Difference Method with Completed Richardson Extrapolation

6.1 Introduction

The numerical examples in the previous chapters showed that, although the algorithms work very well, their accuracy might be increased further. Thus in this chapter we aim to combine the Iterative Upwind Finite Difference Method from Chapter 4 with a variant of Richardson Extrapolation, namely the Completed Richardson Extrapolation, to improve the accuracy. Although in this chapter we only discuss the combination of those two methods, the same principle can be applied to the Modified Iterative Upwind Finite Difference Method from Chapter 5 and the Completed Richardson Extrapolation with some adjustments.

As is commonly known, Richardson extrapolation is a technique for improving the order of accuracy of numerical results. The main idea behind this technique is as follows. If the rate of convergence of a discretization method under grid refinement is known, and if discrete solutions on two systematically refined grids (coarse and fine) are available, then this information can be used to produce a higher-order solution on the coarse grid. As a result, it is easily implemented as a postprocessor to solutions, regardless of the algorithms/methods or equations producing them. Successful areas of application of this technique are numerical differentiation, numerical integration and the numerical solution of ordinary differential equations.

However, in order to work well, Richardson Extrapolation requires some assumptions, such as smoothness and the asymptotic range of the solution. Smoothness of the solution is important mainly because the analysis of Richardson Extrapolation is based on Taylor series expansion. The asymptotic range means that, over the sequence of systematically refined grids, the discretization error reduces at the formal order of accuracy of the discretization scheme. For example, if the order of accuracy of the scheme is p, then the asymptotic range is reached when h is small enough that the h^p term dominates any remaining higher-order terms. Further details can be found in [32].

6.2 Iterative Upwind Finite Difference Method with Completed Richardson Extrapolation

6.2.1 Completed Richardson Extrapolation

The Completed Richardson Extrapolation proposed by Roache and Knupp in [39] is an extension of the original Richardson Extrapolation. They completed the method by providing a higher-order solution not only on the coarse grid but on the entire fine grid. In particular, they presented applications of the extrapolation to numerical solutions of time-independent partial differential equations as examples. Furthermore, Richard [38] modified it for use on time-dependent partial differential equation problems. In short, the formulas for the Completed Richardson Extrapolation are as follows. Let φ_{c,i} and φ_{f,j} denote the first order approximate solutions at node i on the coarse grid and node j on the fine grid respectively. The fine grid here is formed by bisecting the coarse grid, so that the fine grid coincides with the coarse grid only at odd indices (j = 2i − 1), where i = 1, 2, …, N + 1. Then the extrapolated second order approximate solution given by the Completed Richardson Extrapolation, φ_{RE,j}, is determined by

φ_{RE,j} = 2φ_{f,j} − φ_{c,i} for j = 2i − 1,
φ_{RE,j+1} = φ_{f,j+1} + 0.5(φ_{RE,j} − φ_{f,j} + φ_{RE,j+2} − φ_{f,j+2}) for j + 1 even.

Remark. In [39], Roache and Knupp applied the Completed Richardson Extrapolation to second order approximate solutions obtained by centred differences, so that the extrapolated solution became fourth order. In that case, the formula for j = 2i − 1 needs to be replaced by

φ_{RE,j} = (4/3)φ_{f,j} − (1/3)φ_{c,i}.
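A direct MATLAB transcription of the two first-order formulas, as a sketch for a pair of solutions phic (coarse, Mc+1 nodes) and phif (fine, 2*Mc+1 nodes):

    function phiRE = completed_richardson(phic, phif)
        % Completed Richardson Extrapolation: fine node j coincides with
        % coarse node i when j = 2i-1.
        Mc    = numel(phic) - 1;
        phiRE = phif;
        phiRE(1:2:end) = 2*phif(1:2:end) - phic;       % shared (odd) nodes
        for j = 2:2:2*Mc                               % remaining (even) nodes
            phiRE(j) = phif(j) + 0.5*( phiRE(j-1) - phif(j-1) ...
                                     + phiRE(j+1) - phif(j+1) );
        end
    end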

6.2.2 The Singularly Perturbed Convection-Diffusion Equations

In order to have a smooth solution, as required by the Richardson Extrapolation method, we first need to change

V_t + inf_u ( ∇V · f(x, u, t) + L(x, u, t) + r g_ρ(x, u, t) ) = 0   (6.2.1)

with terminal condition V(t_f, x) = φ(x(t_f)) into the singularly perturbed convection-diffusion equation

V_t + inf_u ( ∇V · f(x, u, t) + L(x, u, t) + r g_ρ(x, u, t) ) − ε ∇²V = 0   (6.2.2)

with terminal condition V(t_f, x) = φ(x(t_f)). The difference between equations (6.2.2) and (6.2.1) is only the diffusion term ε∇²V, ε > 0, where ε represents a small perturbation parameter. As ε → 0, the solution of (6.2.2) converges to the solution of (6.2.1) (see [1]).

As in Chapter 4, without loss of generality, we will use one control u and one state variable x ∈ [a, b] as a model to illustrate the scheme. Extension to a multivariable optimal control problem can easily be done with some adjustments to notation. First, we split equation (6.2.2) into two equations,

V_t + ∇V · f(x, u, t) + L(x, u, t) + r g_ρ(x, u, t) − ε ∇²V = 0,
u = arg inf_u ( ∇V · f(x, u, t) + L(x, u, t) + r g_ρ(x, u, t) ),

then discretize them using a first order method for i = 1, …, M + 1 as follows:

(V_i^{k+1} − V_i^k)/Δt − ((1 + sign f_i^k)/2) f_i^k (V_{i+1}^k − V_i^k)/Δx + ((1 − sign f_i^k)/2) f_i^k (V_{i−1}^k − V_i^k)/Δx + L_i^k + r g_{ρ,i}^k − ε (V_{i−1}^k − 2V_i^k + V_{i+1}^k)/(Δx)² = 0,   (6.2.3)

u_i^{k+1} = arg inf_u [ f(x_i, u, t_{k+1}) (V_i^{k+1} − V_{i−1}^{k+1})/Δx + L(x_i, u, t_{k+1}) + r g_ρ(x_i, u, t_{k+1}) ],   (6.2.4)

where V_i^k ≈ V(t_k, x_i), u_i^k ≈ u(t_k, x_i), L_i^k = L(x_i, u_i^k, t_k), f_i^k = f(x_i, u_i^k, t_k), g_{ρ,i}^k = g_ρ(x_i, u_i^k, t_k) and sign f_i^k denotes the sign of f at point x_i and time t_k. With η_1 = Δt/Δx and η_2 = Δt/(Δx)², equation (6.2.3) can be rewritten as follows:

V_i^{k+1} = (1 − η_1|f_i^k| − 2εη_2)V_i^k + [ ((1 + sign f_i^k)/2) η_1 f_i^k + εη_2 ] V_{i+1}^k + [ −((1 − sign f_i^k)/2) η_1 f_i^k + εη_2 ] V_{i−1}^k − Δt (L_i^k + r g_{ρ,i}^k) for i = 2, …, M.   (6.2.5)

To prove the stability of the scheme under some conditions on the step lengths Δt and Δx, we use the energy method. It is known from [51] that the scheme (6.2.5) is stable if and only if it is also stable with L + r g_ρ = 0. The scheme (6.2.5) with L + r g_ρ = 0 is equivalent to

V_i^{k+1} = (1 − α − β)V_i^k + α V_{i+1}^k + β V_{i−1}^k,

where

α = ((1 + sign f_i^k)/2) η_1 f_i^k + εη_2,
β = −((1 − sign f_i^k)/2) η_1 f_i^k + εη_2.

Denote the discrete maximum norm by ‖V^k‖ = max_{1 ≤ i ≤ M+1} |V_i^k|. Then, under the condition 0 < α + β < 1, we have

|V_i^{k+1}| = |(1 − α − β)V_i^k + α V_{i+1}^k + β V_{i−1}^k| ≤ (1 − α − β)|V_i^k| + α|V_{i+1}^k| + β|V_{i−1}^k| ≤ ‖V^k‖.

Taking the maximum over i on the left hand side and using induction, we get

‖V^{N+1}‖ ≤ ‖V^N‖ ≤ … ≤ ‖V^1‖;

in other words, the scheme is stable under the condition 0 < α + β < 1.
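An interior update of (6.2.5), written with the coefficients α and β of the stability proof, might look as follows (a sketch; eps_ denotes the perturbation parameter ε and the other names are as in the earlier illustrative sketches):

    eta1 = dt/dx;  eta2 = dt/dx^2;
    fi = f(x(i), u(i), tk);  s = sign(fi);
    alpha = 0.5*(1 + s)*eta1*fi + eps_*eta2;      % weight of V_{i+1}
    beta  = -0.5*(1 - s)*eta1*fi + eps_*eta2;     % weight of V_{i-1}
    Vnew(i) = (1 - alpha - beta)*V(i) + alpha*V(i+1) + beta*V(i-1) ...
            - dt*( L(x(i), u(i), tk) + r*grho(x(i), u(i), tk) );
    % stability requires 0 < alpha + beta = eta1*|fi| + 2*eps_*eta2 < 1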

Furthermore, in terms of N and Δx, the above stability condition becomes

N ≥ (Δx|f| + 2ε)/(Δx)².   (6.2.6)

Remark. Note that the stability condition (6.2.6) is the extension of the stability condition in [59], which it recovers as ε → 0.

6.2.3 Algorithm for Iterative Upwind Finite Difference Method with Completed Richardson Extrapolation

In general, the algorithm is similar to the algorithm for the Iterative Upwind Finite Difference Method in Chapter 4. First of all, we run the Iterative Upwind Finite Difference Method as in Chapter 4. However, a small change in the stopping criterion is introduced to incorporate the Completed Richardson Extrapolation within the algorithm. Instead of using some prescribed tolerance for the difference between two consecutive value functions at the initial point, we prescribe a lower bound for the space interval shrinkage factor %x described in Chapter 4, to indicate that the iteration has nearly reached convergence and the result might not improve any more. For instance, fixing the bound for the shrinkage factor at 95% means that we stop the iteration only after the ratio of the latter space interval length to the former is at least 95%. This condition also ensures that the asymptotic range requirement for applying Richardson Extrapolation is satisfied. Afterwards, we run an additional iteration with the Completed Richardson Extrapolation on the reduced region from the last iteration. This improves the accuracy of the result from first order to second order. From the last iteration, we have the region reduction below:

a = x_min − e Δx,   (6.2.7)
b = x_max + e Δx,   (6.2.8)
u_l = ⌊u_min⌋,   (6.2.9)
u_u = ⌈u_max⌉,   (6.2.10)
M ← 2M,   (6.2.11)

where u_l and u_u are respectively the lower and upper bounds for the control, ⌊z⌋ means rounding the elements of z to the nearest integer less than or equal to z, ⌈z⌉ rounding the elements of z to the nearest integer greater than or equal to z, and e is some given positive integer.

In our algorithm, we use the subscripts c, f and RE to identify the coarse grid, the fine grid and the grid with Richardson Extrapolation respectively. For example, we use V_{c,i}^k to mean (V_c)_i^k, i.e. the value function on the coarse grid at point x_i and time step k. Then, for the space interval [a, b], we set the following variables for the coarse grid:

M_c = M, Δx_c = (b − a)/M_c, N_c ≥ (Δx_c|f| + 2ε)/(Δx_c)², Δt_c = 1/N_c, η_{1,c} = Δt_c/Δx_c, η_{2,c} = Δt_c/(Δx_c)².

Analogously to the coarse grid, we can determine Δx_f, Δt_f, η_{1,f} and η_{2,f} for the fine grid using M_f = 2M_c and N_f = 2N_c.

Remark. At this stage, an appropriate shift of the spatial discretization, as in Chapter 4, might be necessary to include the starting point x_0.

Remark. The condition N_f = 2N_c fulfils the stability condition (6.2.6), because

N_f = 2N_c = 2⌈(Δx_c|f| + 2ε)/(Δx_c)²⌉ ≥ ⌈(Δx_c|f| + 4ε)/(Δx_c)²⌉ = ⌈(Δx_f|f| + 2ε)/(Δx_f)²⌉.
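The coarse/fine set-up might be sketched as follows, assuming fmax bounds |f| on [a, b] and eps_ holds ε (illustrative names as before):

    Mc  = M;            dxc = (b - a)/Mc;
    Nc  = ceil( (dxc*fmax + 2*eps_) / dxc^2 );    % stability (6.2.6), coarse grid
    dtc = 1/Nc;  eta1c = dtc/dxc;  eta2c = dtc/dxc^2;
    Mf  = 2*Mc;         dxf = (b - a)/Mf;         % bisected in space ...
    Nf  = 2*Nc;         dtf = 1/Nf;               % ... and in time
    eta1f = dtf/dxf;    eta2f = dtf/dxf^2;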

To be clear, we use the subscript and superscript pair j and l for the space and time indices of the fine grid, and i and k for the coarse grid. Because the fine grid is formed by dividing the coarse grid by a factor of two (both in space and time), the fine grid coincides in space and time with the coarse grid only if j = 2i − 1 and l = 2k − 1, where i = 1, …, M_c + 1 and k = 1, …, N_c + 1. Next, we set the following initial values for the value functions (both for the coarse and the fine grid),

V_{c,i}^1 = φ(x_{c,i}) for i = 1, …, M_c + 1,
V_{f,j}^1 = φ(x_{f,j}) for j = 1, …, M_f + 1,

and the initial controls

u_{c,i}^1 = arg inf_u [ f(x_{c,i}, u, t_1) (V_{c,i}^1 − V_{c,i−1}^1)/Δx_c + L(x_{c,i}, u, t_1) + r g_ρ(x_{c,i}, u, t_1) ],
u_{f,j}^1 = arg inf_u [ f(x_{f,j}, u, t_1) (V_{f,j}^1 − V_{f,j−1}^1)/Δx_f + L(x_{f,j}, u, t_1) + r g_ρ(x_{f,j}, u, t_1) ],

where i = 2, …, M_c + 1 and j = 2, …, M_f + 1. The extrapolations to the left boundary for the control values are required to update the value function at the next time steps:

u_{c,1}^1 = 2u_{c,2}^1 − u_{c,3}^1,
u_{f,1}^1 = 2u_{f,2}^1 − u_{f,3}^1.

Then we update the value function and control values as follows. For k = 1, …, N_c, we compute V_{RE,i}^k for i = 1, …, M_c + 1 as follows:

• update the coarse grid value function, i.e. for i = 2, …, M_c,

V_{c,i}^{k+1} = (1 − η_{1,c}|f_i^k| − 2εη_{2,c})V_{c,i}^k + [ ((1 + sign f_i^k)/2) η_{1,c} f_i^k + εη_{2,c} ] V_{c,i+1}^k + [ −((1 − sign f_i^k)/2) η_{1,c} f_i^k + εη_{2,c} ] V_{c,i−1}^k − Δt_c (L_i^k + r g_{ρ,i}^k),

and extrapolate the value function to the boundaries, i.e. to all edge points:

V_{c,1}^{k+1} = 2V_{c,2}^{k+1} − V_{c,3}^{k+1},
V_{c,M_c+1}^{k+1} = 2V_{c,M_c}^{k+1} − V_{c,M_c−1}^{k+1};

• update the fine grid value function at time step l = 2k, i.e. for j = 2, …, M_f,

V_{f,j}^{2k} = (1 − η_{1,f}|f_j^{2k−1}| − 2εη_{2,f})V_{f,j}^{2k−1} + [ ((1 + sign f_j^{2k−1})/2) η_{1,f} f_j^{2k−1} + εη_{2,f} ] V_{f,j+1}^{2k−1} + [ −((1 − sign f_j^{2k−1})/2) η_{1,f} f_j^{2k−1} + εη_{2,f} ] V_{f,j−1}^{2k−1} − Δt_f (L_j^{2k−1} + r g_{ρ,j}^{2k−1}),

and extrapolate the value function to the boundaries, i.e. to all edge points:

V_{f,1}^{2k} = 2V_{f,2}^{2k} − V_{f,3}^{2k},
V_{f,M_f+1}^{2k} = 2V_{f,M_f}^{2k} − V_{f,M_f−1}^{2k};

• update the control value on the fine grid at time step l = 2k, i.e. for j = 2, …, M_f + 1,

u_{f,j}^{2k} = arg inf_u [ f_j^{2k} (V_{f,j}^{2k} − V_{f,j−1}^{2k})/Δx_f + L_j^{2k} + r g_{ρ,j}^{2k} ],

and extrapolate the control value to the left edge point:

u_{f,1}^{2k} = 2u_{f,2}^{2k} − u_{f,3}^{2k};

• update the fine grid value function at time step l = 2k + 1, i.e. for j = 2, …, M_f,

V_{f,j}^{2k+1} = (1 − η_{1,f}|f_j^{2k}| − 2εη_{2,f})V_{f,j}^{2k} + [ ((1 + sign f_j^{2k})/2) η_{1,f} f_j^{2k} + εη_{2,f} ] V_{f,j+1}^{2k} + [ −((1 − sign f_j^{2k})/2) η_{1,f} f_j^{2k} + εη_{2,f} ] V_{f,j−1}^{2k} − Δt_f (L_j^{2k} + r g_{ρ,j}^{2k}),

and extrapolate the value function to the boundaries, i.e. to all edge points:

V_{f,1}^{2k+1} = 2V_{f,2}^{2k+1} − V_{f,3}^{2k+1},
V_{f,M_f+1}^{2k+1} = 2V_{f,M_f}^{2k+1} − V_{f,M_f−1}^{2k+1};

• apply Richardson Extrapolation on the coarse grid: for i = 2, …, M_c,

V_{RE,i}^{k+1} = 2V_{f,2i−1}^{2k+1} − V_{c,i}^{k+1},

and extrapolate the value function to the boundaries, i.e. to all edge points:

V_{RE,1}^{k+1} = 2V_{RE,2}^{k+1} − V_{RE,3}^{k+1},
V_{RE,M_c+1}^{k+1} = 2V_{RE,M_c}^{k+1} − V_{RE,M_c−1}^{k+1};

• apply Richardson Extrapolation on the fine grid value function:

V_{f,2j}^{2k+1} ← V_{f,2j}^{2k+1} + 0.5(V_{RE,j}^{k+1} − V_{f,2j−1}^{2k+1} + V_{RE,j+1}^{k+1} − V_{f,2j+1}^{2k+1}) for j = 1, …, M_c,
V_{f,2j−1}^{2k+1} ← V_{RE,j}^{k+1} for j = 1, …, M_c + 1;

• update the control value on the coarse grid, i.e. for i = 2, …, M_c + 1,

u_{c,i}^{k+1} = arg inf_u [ f_i^{k+1} (V_{RE,i}^{k+1} − V_{RE,i−1}^{k+1})/Δx_c + L_i^{k+1} + r g_{ρ,i}^{k+1} ],

and extrapolate the control value to the left edge point:

u_{c,1}^{k+1} = 2u_{c,2}^{k+1} − u_{c,3}^{k+1};

• update the control value on the fine grid, i.e. for j = 2, …, M_f + 1,

u_{f,j}^{2k+1} = arg inf_u [ f_j^{2k+1} (V_{f,j}^{2k+1} − V_{f,j−1}^{2k+1})/Δx_f + L_j^{2k+1} + r g_{ρ,j}^{2k+1} ],

and extrapolate the control value to the left edge point:

u_{f,1}^{2k+1} = 2u_{f,2}^{2k+1} − u_{f,3}^{2k+1}.

The resultant pair of matrices (V_RE, u_c) is the solution of the HJB equation, with second order accuracy.

Remark. The use of artificial boundary conditions, i.e. linear extrapolations to the boundaries, in the algorithm does not create boundary layers. Hence nonuniform meshes (layer-adapted meshes in our case) are unnecessary. For further information related to boundary layers, we refer to [12], [31], [41], [47] and the references cited therein.

6.3 Numerical Example

To test the effectiveness of this algorithm, we revisit the one dimensional example from Chapter 4.

Example 6.3.1. This example contains 1 state, 1 control and 1 mixed (state-control) inequality constraint, taken from [62]. The problem is to minimize

min ∫_0^1 (x² + u² − 2u) dt + (1/2)(x(1))²

subject to

ẋ = u, x(0) = 0, g(x, u, t) = x² + u² − t² − 1 ≤ 0.

The analytic optimal solution for this problem is x*(t) = t and u*(t) = 1, so that the constraint is active for all t ∈ [0, 1]. The value function for this solution is −1/6 ≈ −0.1667. The corresponding HJB initial-value problem for this example is

V_t + min_u ( u V_x + (x² + u² − 2u) + r g_ρ(x, u, t) ) = 0,
V(1, x) = (1/2)(x(1))².

We start with the region −1 ≤ x ≤ 2 and −2 ≤ u ≤ 2 for the first iteration and then reduce it progressively according to our proposed method. The problem has been solved for ε = 10⁻¹ and various values of M. The first four iterations are computed purely with the Iterative Upwind Finite-Difference Method, while the last iteration, for M = 256, is the result of the implementation of the Completed Richardson Extrapolation. A summary of our computations is given in Table 6.1.

Table 6.1: Computational results for Example 6.3.1 (e = r = 2, ε = 10⁻¹).

Tables 6.1, 6.2 and 6.3 indicate that the computed optimal control and state converge to the analytic solution, as the error decreases significantly. The computational results in the first four rows confirm the claim that the solution of the singularly perturbed convection-diffusion equation converges to that of the HJB equation as ε → 0 (see the similar tables in Chapter 4 for comparison).

Table 6.2: Computed error for u in the maximum and L² norms.

After application of the extrapolation method, this algorithm gives a better result in terms of the value function at the initial point and the computed

error in the L² norm for M = 256.

Table 6.3: Computed error for x in the maximum and L² norms.

The value function decreases slightly from −0.1613 to −0.1623, and the computed error in the L² norm goes down by about 2.7% in x and 0.5% in u. Graphs similar to those in Chapter 4 for the last iteration, M = 256, can be seen in the figures provided.

Figure 6.1: Value function for Example 6.3.1.

6.4 Conclusion

This chapter presented the implementation of the Completed Richardson Extrapolation within the Iterative Upwind Finite Difference Method from Chapter 4 in order to improve the computational accuracy. The method is stable under some conditions, and it slightly improved the accuracy in the numerical test.

Figure 6.2: Optimal control for Example 6.3.1.

Figure 6.3: Optimal state along optimal trajectory for Example 6.3.1 (computed versus analytic optimal state).


Weak Convergence of Numerical Methods for Dynamical Systems and Optimal Control, and a relation with Large Deviations for Stochastic Equations Weak Convergence of Numerical Methods for Dynamical Systems and, and a relation with Large Deviations for Stochastic Equations Mattias Sandberg KTH CSC 2010-10-21 Outline The error representation for weak

More information

Robotics. Control Theory. Marc Toussaint U Stuttgart

Robotics. Control Theory. Marc Toussaint U Stuttgart Robotics Control Theory Topics in control theory, optimal control, HJB equation, infinite horizon case, Linear-Quadratic optimal control, Riccati equations (differential, algebraic, discrete-time), controllability,

More information

Convexity of the Reachable Set of Nonlinear Systems under L 2 Bounded Controls

Convexity of the Reachable Set of Nonlinear Systems under L 2 Bounded Controls 1 1 Convexity of the Reachable Set of Nonlinear Systems under L 2 Bounded Controls B.T.Polyak Institute for Control Science, Moscow, Russia e-mail boris@ipu.rssi.ru Abstract Recently [1, 2] the new convexity

More information

Strong and Weak Augmentability in Calculus of Variations

Strong and Weak Augmentability in Calculus of Variations Strong and Weak Augmentability in Calculus of Variations JAVIER F ROSENBLUETH National Autonomous University of Mexico Applied Mathematics and Systems Research Institute Apartado Postal 20-126, Mexico

More information

An introduction to Mathematical Theory of Control

An introduction to Mathematical Theory of Control An introduction to Mathematical Theory of Control Vasile Staicu University of Aveiro UNICA, May 2018 Vasile Staicu (University of Aveiro) An introduction to Mathematical Theory of Control UNICA, May 2018

More information

Mathematical Economics. Lecture Notes (in extracts)

Mathematical Economics. Lecture Notes (in extracts) Prof. Dr. Frank Werner Faculty of Mathematics Institute of Mathematical Optimization (IMO) http://math.uni-magdeburg.de/ werner/math-ec-new.html Mathematical Economics Lecture Notes (in extracts) Winter

More information

Duality and dynamics in Hamilton-Jacobi theory for fully convex problems of control

Duality and dynamics in Hamilton-Jacobi theory for fully convex problems of control Duality and dynamics in Hamilton-Jacobi theory for fully convex problems of control RTyrrell Rockafellar and Peter R Wolenski Abstract This paper describes some recent results in Hamilton- Jacobi theory

More information

Weak convergence and large deviation theory

Weak convergence and large deviation theory First Prev Next Go To Go Back Full Screen Close Quit 1 Weak convergence and large deviation theory Large deviation principle Convergence in distribution The Bryc-Varadhan theorem Tightness and Prohorov

More information

A Remark on IVP and TVP Non-Smooth Viscosity Solutions to Hamilton-Jacobi Equations

A Remark on IVP and TVP Non-Smooth Viscosity Solutions to Hamilton-Jacobi Equations 2005 American Control Conference June 8-10, 2005. Portland, OR, USA WeB10.3 A Remark on IVP and TVP Non-Smooth Viscosity Solutions to Hamilton-Jacobi Equations Arik Melikyan, Andrei Akhmetzhanov and Naira

More information

OPTIMAL CONTROL CHAPTER INTRODUCTION

OPTIMAL CONTROL CHAPTER INTRODUCTION CHAPTER 3 OPTIMAL CONTROL What is now proved was once only imagined. William Blake. 3.1 INTRODUCTION After more than three hundred years of evolution, optimal control theory has been formulated as an extension

More information

CHAPTER 2 THE MAXIMUM PRINCIPLE: CONTINUOUS TIME. Chapter2 p. 1/67

CHAPTER 2 THE MAXIMUM PRINCIPLE: CONTINUOUS TIME. Chapter2 p. 1/67 CHAPTER 2 THE MAXIMUM PRINCIPLE: CONTINUOUS TIME Chapter2 p. 1/67 THE MAXIMUM PRINCIPLE: CONTINUOUS TIME Main Purpose: Introduce the maximum principle as a necessary condition to be satisfied by any optimal

More information

Chapter 2 Optimal Control Problem

Chapter 2 Optimal Control Problem Chapter 2 Optimal Control Problem Optimal control of any process can be achieved either in open or closed loop. In the following two chapters we concentrate mainly on the first class. The first chapter

More information

Optimization via the Hamilton-Jacobi-Bellman Method: Theory and Applications

Optimization via the Hamilton-Jacobi-Bellman Method: Theory and Applications Optimization via the Hamilton-Jacobi-Bellman Method: Theory and Applications Navin Khaneja lectre notes taken by Christiane Koch Jne 24, 29 1 Variation yields a classical Hamiltonian system Sppose that

More information

EE291E/ME 290Q Lecture Notes 8. Optimal Control and Dynamic Games

EE291E/ME 290Q Lecture Notes 8. Optimal Control and Dynamic Games EE291E/ME 290Q Lecture Notes 8. Optimal Control and Dynamic Games S. S. Sastry REVISED March 29th There exist two main approaches to optimal control and dynamic games: 1. via the Calculus of Variations

More information

Nonlinear Systems Theory

Nonlinear Systems Theory Nonlinear Systems Theory Matthew M. Peet Arizona State University Lecture 2: Nonlinear Systems Theory Overview Our next goal is to extend LMI s and optimization to nonlinear systems analysis. Today we

More information

5 Handling Constraints

5 Handling Constraints 5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest

More information

Convergence of a Gauss Pseudospectral Method for Optimal Control

Convergence of a Gauss Pseudospectral Method for Optimal Control Convergence of a Gauss Pseudospectral Method for Optimal Control Hongyan Hou William W. Hager Anil V. Rao A convergence theory is presented for approximations of continuous-time optimal control problems

More information

Thuong Nguyen. SADCO Internal Review Metting

Thuong Nguyen. SADCO Internal Review Metting Asymptotic behavior of singularly perturbed control system: non-periodic setting Thuong Nguyen (Joint work with A. Siconolfi) SADCO Internal Review Metting Rome, Nov 10-12, 2014 Thuong Nguyen (Roma Sapienza)

More information

Input to state Stability

Input to state Stability Input to state Stability Mini course, Universität Stuttgart, November 2004 Lars Grüne, Mathematisches Institut, Universität Bayreuth Part IV: Applications ISS Consider with solutions ϕ(t, x, w) ẋ(t) =

More information

ECE7850 Lecture 7. Discrete Time Optimal Control and Dynamic Programming

ECE7850 Lecture 7. Discrete Time Optimal Control and Dynamic Programming ECE7850 Lecture 7 Discrete Time Optimal Control and Dynamic Programming Discrete Time Optimal control Problems Short Introduction to Dynamic Programming Connection to Stabilization Problems 1 DT nonlinear

More information

OPTIMAL CONTROL. Sadegh Bolouki. Lecture slides for ECE 515. University of Illinois, Urbana-Champaign. Fall S. Bolouki (UIUC) 1 / 28

OPTIMAL CONTROL. Sadegh Bolouki. Lecture slides for ECE 515. University of Illinois, Urbana-Champaign. Fall S. Bolouki (UIUC) 1 / 28 OPTIMAL CONTROL Sadegh Bolouki Lecture slides for ECE 515 University of Illinois, Urbana-Champaign Fall 2016 S. Bolouki (UIUC) 1 / 28 (Example from Optimal Control Theory, Kirk) Objective: To get from

More information

Nonlinear Control Systems

Nonlinear Control Systems Nonlinear Control Systems António Pedro Aguiar pedro@isr.ist.utl.pt 3. Fundamental properties IST-DEEC PhD Course http://users.isr.ist.utl.pt/%7epedro/ncs2012/ 2012 1 Example Consider the system ẋ = f

More information

Chapter 5. Pontryagin s Minimum Principle (Constrained OCP)

Chapter 5. Pontryagin s Minimum Principle (Constrained OCP) Chapter 5 Pontryagin s Minimum Principle (Constrained OCP) 1 Pontryagin s Minimum Principle Plant: (5-1) u () t U PI: (5-2) Boundary condition: The goal is to find Optimal Control. 2 Pontryagin s Minimum

More information

INVERSION IN INDIRECT OPTIMAL CONTROL

INVERSION IN INDIRECT OPTIMAL CONTROL INVERSION IN INDIRECT OPTIMAL CONTROL François Chaplais, Nicolas Petit Centre Automatique et Systèmes, École Nationale Supérieure des Mines de Paris, 35, rue Saint-Honoré 7735 Fontainebleau Cedex, France,

More information

Direct Methods. Moritz Diehl. Optimization in Engineering Center (OPTEC) and Electrical Engineering Department (ESAT) K.U.

Direct Methods. Moritz Diehl. Optimization in Engineering Center (OPTEC) and Electrical Engineering Department (ESAT) K.U. Direct Methods Moritz Diehl Optimization in Engineering Center (OPTEC) and Electrical Engineering Department (ESAT) K.U. Leuven Belgium Overview Direct Single Shooting Direct Collocation Direct Multiple

More information

Sébastien Chaumont a a Institut Élie Cartan, Université Henri Poincaré Nancy I, B. P. 239, Vandoeuvre-lès-Nancy Cedex, France. 1.

Sébastien Chaumont a a Institut Élie Cartan, Université Henri Poincaré Nancy I, B. P. 239, Vandoeuvre-lès-Nancy Cedex, France. 1. A strong comparison result for viscosity solutions to Hamilton-Jacobi-Bellman equations with Dirichlet condition on a non-smooth boundary and application to parabolic problems Sébastien Chaumont a a Institut

More information

Computational Issues in Nonlinear Dynamics and Control

Computational Issues in Nonlinear Dynamics and Control Computational Issues in Nonlinear Dynamics and Control Arthur J. Krener ajkrener@ucdavis.edu Supported by AFOSR and NSF Typical Problems Numerical Computation of Invariant Manifolds Typical Problems Numerical

More information

Nonlinear and robust MPC with applications in robotics

Nonlinear and robust MPC with applications in robotics Nonlinear and robust MPC with applications in robotics Boris Houska, Mario Villanueva, Benoît Chachuat ShanghaiTech, Texas A&M, Imperial College London 1 Overview Introduction to Robust MPC Min-Max Differential

More information

Recent Advances in State Constrained Optimal Control

Recent Advances in State Constrained Optimal Control Recent Advances in State Constrained Optimal Control Richard B. Vinter Imperial College London Control for Energy and Sustainability Seminar Cambridge University, 6th November 2009 Outline of the talk

More information

ANALYTICAL SOLUTIONS OF HEAT CONDUCTION PROBLEMS A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF

ANALYTICAL SOLUTIONS OF HEAT CONDUCTION PROBLEMS A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF ANALYTICAL SOLUTIONS OF HEAT CONDUCTION PROBLEMS A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Bachelor of Technology In Mechanical Engineering By J. RAHUL Under the guidance

More information

Nonlinear Systems and Control Lecture # 12 Converse Lyapunov Functions & Time Varying Systems. p. 1/1

Nonlinear Systems and Control Lecture # 12 Converse Lyapunov Functions & Time Varying Systems. p. 1/1 Nonlinear Systems and Control Lecture # 12 Converse Lyapunov Functions & Time Varying Systems p. 1/1 p. 2/1 Converse Lyapunov Theorem Exponential Stability Let x = 0 be an exponentially stable equilibrium

More information

OPTIMALITY CONDITIONS AND ERROR ANALYSIS OF SEMILINEAR ELLIPTIC CONTROL PROBLEMS WITH L 1 COST FUNCTIONAL

OPTIMALITY CONDITIONS AND ERROR ANALYSIS OF SEMILINEAR ELLIPTIC CONTROL PROBLEMS WITH L 1 COST FUNCTIONAL OPTIMALITY CONDITIONS AND ERROR ANALYSIS OF SEMILINEAR ELLIPTIC CONTROL PROBLEMS WITH L 1 COST FUNCTIONAL EDUARDO CASAS, ROLAND HERZOG, AND GERD WACHSMUTH Abstract. Semilinear elliptic optimal control

More information

THE NON-PAtLMVIETEtt; PENALTY FUNCTION METHOD IN CONSTR.MNED OPTIMAL CONTROL PItOBLEMS x

THE NON-PAtLMVIETEtt; PENALTY FUNCTION METHOD IN CONSTR.MNED OPTIMAL CONTROL PItOBLEMS x Journal ofapplied Mathematics and Stochastic Analysis 4, Number 2, Summer 1991, 165-174 THE NON-PAtLMVIETEtt; PENALTY FUNCTION METHOD IN CONSTR.MNED OPTIMAL CONTROL PItOBLEMS x AN-QING XING University

More information

1 The Observability Canonical Form

1 The Observability Canonical Form NONLINEAR OBSERVERS AND SEPARATION PRINCIPLE 1 The Observability Canonical Form In this Chapter we discuss the design of observers for nonlinear systems modelled by equations of the form ẋ = f(x, u) (1)

More information

Nonlinear L 2 -gain analysis via a cascade

Nonlinear L 2 -gain analysis via a cascade 9th IEEE Conference on Decision and Control December -7, Hilton Atlanta Hotel, Atlanta, GA, USA Nonlinear L -gain analysis via a cascade Peter M Dower, Huan Zhang and Christopher M Kellett Abstract A nonlinear

More information

Introduction to Nonlinear Control Lecture # 3 Time-Varying and Perturbed Systems

Introduction to Nonlinear Control Lecture # 3 Time-Varying and Perturbed Systems p. 1/5 Introduction to Nonlinear Control Lecture # 3 Time-Varying and Perturbed Systems p. 2/5 Time-varying Systems ẋ = f(t, x) f(t, x) is piecewise continuous in t and locally Lipschitz in x for all t

More information

Nonlinear Control Systems

Nonlinear Control Systems Nonlinear Control Systems António Pedro Aguiar pedro@isr.ist.utl.pt 5. Input-Output Stability DEEC PhD Course http://users.isr.ist.utl.pt/%7epedro/ncs2012/ 2012 1 Input-Output Stability y = Hu H denotes

More information

Necessary optimality conditions for optimal control problems with nonsmooth mixed state and control constraints

Necessary optimality conditions for optimal control problems with nonsmooth mixed state and control constraints Necessary optimality conditions for optimal control problems with nonsmooth mixed state and control constraints An Li and Jane J. Ye Abstract. In this paper we study an optimal control problem with nonsmooth

More information

Dynamic Programming with Applications. Class Notes. René Caldentey Stern School of Business, New York University

Dynamic Programming with Applications. Class Notes. René Caldentey Stern School of Business, New York University Dynamic Programming with Applications Class Notes René Caldentey Stern School of Business, New York University Spring 2011 Prof. R. Caldentey Preface These lecture notes are based on the material that

More information

Principles of Optimal Control Spring 2008

Principles of Optimal Control Spring 2008 MIT OpenCourseWare http://ocw.mit.edu 16.323 Principles of Optimal Control Spring 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. 16.323 Lecture

More information

Reflected Brownian Motion

Reflected Brownian Motion Chapter 6 Reflected Brownian Motion Often we encounter Diffusions in regions with boundary. If the process can reach the boundary from the interior in finite time with positive probability we need to decide

More information

Hamilton-Jacobi-Bellman Equation Feb 25, 2008

Hamilton-Jacobi-Bellman Equation Feb 25, 2008 Hamilton-Jacobi-Bellman Equation Feb 25, 2008 What is it? The Hamilton-Jacobi-Bellman (HJB) equation is the continuous-time analog to the discrete deterministic dynamic programming algorithm Discrete VS

More information

EN Nonlinear Control and Planning in Robotics Lecture 3: Stability February 4, 2015

EN Nonlinear Control and Planning in Robotics Lecture 3: Stability February 4, 2015 EN530.678 Nonlinear Control and Planning in Robotics Lecture 3: Stability February 4, 2015 Prof: Marin Kobilarov 0.1 Model prerequisites Consider ẋ = f(t, x). We will make the following basic assumptions

More information

ESC794: Special Topics: Model Predictive Control

ESC794: Special Topics: Model Predictive Control ESC794: Special Topics: Model Predictive Control Nonlinear MPC Analysis : Part 1 Reference: Nonlinear Model Predictive Control (Ch.3), Grüne and Pannek Hanz Richter, Professor Mechanical Engineering Department

More information

Flux-limited solutions for quasi-convex Hamilton-Jacobi equations on networks

Flux-limited solutions for quasi-convex Hamilton-Jacobi equations on networks Flux-limited solutions for quasi-convex Hamilton-Jacobi equations on networks C. Imbert and R. Monneau June 24, 2014 Abstract We study Hamilton-Jacobi equations on networks in the case where Hamiltonians

More information

Control Theory and PDE s

Control Theory and PDE s Control Theory and PDE s Dustin Connery-Grigg December 14, 212 1 Introduction Differential equations are extremely useful tools in modelling all sorts of dynamical systems. As mathematicians, studying

More information

A Robust Controller for Scalar Autonomous Optimal Control Problems

A Robust Controller for Scalar Autonomous Optimal Control Problems A Robust Controller for Scalar Autonomous Optimal Control Problems S. H. Lam 1 Department of Mechanical and Aerospace Engineering Princeton University, Princeton, NJ 08544 lam@princeton.edu Abstract Is

More information

2 Sequences, Continuity, and Limits

2 Sequences, Continuity, and Limits 2 Sequences, Continuity, and Limits In this chapter, we introduce the fundamental notions of continuity and limit of a real-valued function of two variables. As in ACICARA, the definitions as well as proofs

More information

UCLA Chemical Engineering. Process & Control Systems Engineering Laboratory

UCLA Chemical Engineering. Process & Control Systems Engineering Laboratory Constrained Innite-Time Nonlinear Quadratic Optimal Control V. Manousiouthakis D. Chmielewski Chemical Engineering Department UCLA 1998 AIChE Annual Meeting Outline Unconstrained Innite-Time Nonlinear

More information

IN [1], an Approximate Dynamic Inversion (ADI) control

IN [1], an Approximate Dynamic Inversion (ADI) control 1 On Approximate Dynamic Inversion Justin Teo and Jonathan P How Technical Report ACL09 01 Aerospace Controls Laboratory Department of Aeronautics and Astronautics Massachusetts Institute of Technology

More information

On the Bellman equation for control problems with exit times and unbounded cost functionals 1

On the Bellman equation for control problems with exit times and unbounded cost functionals 1 On the Bellman equation for control problems with exit times and unbounded cost functionals 1 Michael Malisoff Department of Mathematics, Hill Center-Busch Campus Rutgers University, 11 Frelinghuysen Road

More information

The Linear Quadratic Regulator

The Linear Quadratic Regulator 10 The Linear Qadratic Reglator 10.1 Problem formlation This chapter concerns optimal control of dynamical systems. Most of this development concerns linear models with a particlarly simple notion of optimality.

More information

The first order quasi-linear PDEs

The first order quasi-linear PDEs Chapter 2 The first order quasi-linear PDEs The first order quasi-linear PDEs have the following general form: F (x, u, Du) = 0, (2.1) where x = (x 1, x 2,, x 3 ) R n, u = u(x), Du is the gradient of u.

More information

Ordinary Differential Equation Theory

Ordinary Differential Equation Theory Part I Ordinary Differential Equation Theory 1 Introductory Theory An n th order ODE for y = y(t) has the form Usually it can be written F (t, y, y,.., y (n) ) = y (n) = f(t, y, y,.., y (n 1) ) (Implicit

More information

Lecture 8. Chapter 5: Input-Output Stability Chapter 6: Passivity Chapter 14: Passivity-Based Control. Eugenio Schuster.

Lecture 8. Chapter 5: Input-Output Stability Chapter 6: Passivity Chapter 14: Passivity-Based Control. Eugenio Schuster. Lecture 8 Chapter 5: Input-Output Stability Chapter 6: Passivity Chapter 14: Passivity-Based Control Eugenio Schuster schuster@lehigh.edu Mechanical Engineering and Mechanics Lehigh University Lecture

More information

2 Statement of the problem and assumptions

2 Statement of the problem and assumptions Mathematical Notes, 25, vol. 78, no. 4, pp. 466 48. Existence Theorem for Optimal Control Problems on an Infinite Time Interval A.V. Dmitruk and N.V. Kuz kina We consider an optimal control problem on

More information

Online Learning of Feasible Strategies in Unknown Environments

Online Learning of Feasible Strategies in Unknown Environments 1 Online Learning of Feasible Strategies in Unknown Environments Santiago Paternain and Alejandro Ribeiro arxiv:164.137v1 [math.oc] 7 Apr 16 Abstract Define an environment as a set of convex constraint

More information

Suboptimal feedback control of PDEs by solving Hamilton-Jacobi Bellman equations on sparse grids

Suboptimal feedback control of PDEs by solving Hamilton-Jacobi Bellman equations on sparse grids Suboptimal feedback control of PDEs by solving Hamilton-Jacobi Bellman equations on sparse grids Jochen Garcke joint work with Axel Kröner, INRIA Saclay and CMAP, Ecole Polytechnique Ilja Kalmykov, Universität

More information

HJ equations. Reachability analysis. Optimal control problems

HJ equations. Reachability analysis. Optimal control problems HJ equations. Reachability analysis. Optimal control problems Hasnaa Zidani 1 1 ENSTA Paris-Tech & INRIA-Saclay Graz, 8-11 September 2014 H. Zidani (ENSTA & Inria) HJ equations. Reachability analysis -

More information

Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization. Nick Gould (RAL)

Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization. Nick Gould (RAL) Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization Nick Gould (RAL) x IR n f(x) subject to c(x) = Part C course on continuoue optimization CONSTRAINED MINIMIZATION x

More information

c 2018 Society for Industrial and Applied Mathematics

c 2018 Society for Industrial and Applied Mathematics SIAM J. CONTROL OPTIM. Vol. 56, No. 2, pp. 1386 1411 c 2018 Society for Industrial and Applied Mathematics CONVERGENCE RATE FOR A GAUSS COLLOCATION METHOD APPLIED TO CONSTRAINED OPTIMAL CONTROL WILLIAM

More information

NOTES ON CALCULUS OF VARIATIONS. September 13, 2012

NOTES ON CALCULUS OF VARIATIONS. September 13, 2012 NOTES ON CALCULUS OF VARIATIONS JON JOHNSEN September 13, 212 1. The basic problem In Calculus of Variations one is given a fixed C 2 -function F (t, x, u), where F is defined for t [, t 1 ] and x, u R,

More information

1 Lyapunov theory of stability

1 Lyapunov theory of stability M.Kawski, APM 581 Diff Equns Intro to Lyapunov theory. November 15, 29 1 1 Lyapunov theory of stability Introduction. Lyapunov s second (or direct) method provides tools for studying (asymptotic) stability

More information

Weak solutions to Fokker-Planck equations and Mean Field Games

Weak solutions to Fokker-Planck equations and Mean Field Games Weak solutions to Fokker-Planck equations and Mean Field Games Alessio Porretta Universita di Roma Tor Vergata LJLL, Paris VI March 6, 2015 Outlines of the talk Brief description of the Mean Field Games

More information

ECON 582: An Introduction to the Theory of Optimal Control (Chapter 7, Acemoglu) Instructor: Dmytro Hryshko

ECON 582: An Introduction to the Theory of Optimal Control (Chapter 7, Acemoglu) Instructor: Dmytro Hryshko ECON 582: An Introduction to the Theory of Optimal Control (Chapter 7, Acemoglu) Instructor: Dmytro Hryshko Continuous-time optimization involves maximization wrt to an innite dimensional object, an entire

More information

Economics 204 Summer/Fall 2011 Lecture 5 Friday July 29, 2011

Economics 204 Summer/Fall 2011 Lecture 5 Friday July 29, 2011 Economics 204 Summer/Fall 2011 Lecture 5 Friday July 29, 2011 Section 2.6 (cont.) Properties of Real Functions Here we first study properties of functions from R to R, making use of the additional structure

More information

Homogenization and error estimates of free boundary velocities in periodic media

Homogenization and error estimates of free boundary velocities in periodic media Homogenization and error estimates of free boundary velocities in periodic media Inwon C. Kim October 7, 2011 Abstract In this note I describe a recent result ([14]-[15]) on homogenization and error estimates

More information

An Integral-type Constraint Qualification for Optimal Control Problems with State Constraints

An Integral-type Constraint Qualification for Optimal Control Problems with State Constraints An Integral-type Constraint Qualification for Optimal Control Problems with State Constraints S. Lopes, F. A. C. C. Fontes and M. d. R. de Pinho Officina Mathematica report, April 4, 27 Abstract Standard

More information

Self-Concordant Barrier Functions for Convex Optimization

Self-Concordant Barrier Functions for Convex Optimization Appendix F Self-Concordant Barrier Functions for Convex Optimization F.1 Introduction In this Appendix we present a framework for developing polynomial-time algorithms for the solution of convex optimization

More information

LINEAR-CONVEX CONTROL AND DUALITY

LINEAR-CONVEX CONTROL AND DUALITY 1 LINEAR-CONVEX CONTROL AND DUALITY R.T. Rockafellar Department of Mathematics, University of Washington Seattle, WA 98195-4350, USA Email: rtr@math.washington.edu R. Goebel 3518 NE 42 St., Seattle, WA

More information

Stochastic and Adaptive Optimal Control

Stochastic and Adaptive Optimal Control Stochastic and Adaptive Optimal Control Robert Stengel Optimal Control and Estimation, MAE 546 Princeton University, 2018! Nonlinear systems with random inputs and perfect measurements! Stochastic neighboring-optimal

More information

Formula Sheet for Optimal Control

Formula Sheet for Optimal Control Formula Sheet for Optimal Control Division of Optimization and Systems Theory Royal Institute of Technology 144 Stockholm, Sweden 23 December 1, 29 1 Dynamic Programming 11 Discrete Dynamic Programming

More information

Lipschitz continuity for solutions of Hamilton-Jacobi equation with Ornstein-Uhlenbeck operator

Lipschitz continuity for solutions of Hamilton-Jacobi equation with Ornstein-Uhlenbeck operator Lipschitz continuity for solutions of Hamilton-Jacobi equation with Ornstein-Uhlenbeck operator Thi Tuyen Nguyen Ph.D student of University of Rennes 1 Joint work with: Prof. E. Chasseigne(University of

More information

Feedback Optimal Control of Low-thrust Orbit Transfer in Central Gravity Field

Feedback Optimal Control of Low-thrust Orbit Transfer in Central Gravity Field Vol. 4, No. 4, 23 Feedback Optimal Control of Low-thrust Orbit Transfer in Central Gravity Field Ashraf H. Owis Department of Astronomy, Space and Meteorology, Faculty of Science, Cairo University Department

More information

Mañé s Conjecture from the control viewpoint

Mañé s Conjecture from the control viewpoint Mañé s Conjecture from the control viewpoint Université de Nice - Sophia Antipolis Setting Let M be a smooth compact manifold of dimension n 2 be fixed. Let H : T M R be a Hamiltonian of class C k, with

More information

CHAPTER 3 THE MAXIMUM PRINCIPLE: MIXED INEQUALITY CONSTRAINTS. p. 1/73

CHAPTER 3 THE MAXIMUM PRINCIPLE: MIXED INEQUALITY CONSTRAINTS. p. 1/73 CHAPTER 3 THE MAXIMUM PRINCIPLE: MIXED INEQUALITY CONSTRAINTS p. 1/73 THE MAXIMUM PRINCIPLE: MIXED INEQUALITY CONSTRAINTS Mixed Inequality Constraints: Inequality constraints involving control and possibly

More information

Algorithms for constrained local optimization

Algorithms for constrained local optimization Algorithms for constrained local optimization Fabio Schoen 2008 http://gol.dsi.unifi.it/users/schoen Algorithms for constrained local optimization p. Feasible direction methods Algorithms for constrained

More information

Descent methods. min x. f(x)

Descent methods. min x. f(x) Gradient Descent Descent methods min x f(x) 5 / 34 Descent methods min x f(x) x k x k+1... x f(x ) = 0 5 / 34 Gradient methods Unconstrained optimization min f(x) x R n. 6 / 34 Gradient methods Unconstrained

More information

Optimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30

Optimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Optimization Escuela de Ingeniería Informática de Oviedo (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Unconstrained optimization Outline 1 Unconstrained optimization 2 Constrained

More information

Definition 5.1. A vector field v on a manifold M is map M T M such that for all x M, v(x) T x M.

Definition 5.1. A vector field v on a manifold M is map M T M such that for all x M, v(x) T x M. 5 Vector fields Last updated: March 12, 2012. 5.1 Definition and general properties We first need to define what a vector field is. Definition 5.1. A vector field v on a manifold M is map M T M such that

More information

Numerical Methods for Optimal Control Problems. Part I: Hamilton-Jacobi-Bellman Equations and Pontryagin Minimum Principle

Numerical Methods for Optimal Control Problems. Part I: Hamilton-Jacobi-Bellman Equations and Pontryagin Minimum Principle Numerical Methods for Optimal Control Problems. Part I: Hamilton-Jacobi-Bellman Equations and Pontryagin Minimum Principle Ph.D. course in OPTIMAL CONTROL Emiliano Cristiani (IAC CNR) e.cristiani@iac.cnr.it

More information

YURI LEVIN, MIKHAIL NEDIAK, AND ADI BEN-ISRAEL

YURI LEVIN, MIKHAIL NEDIAK, AND ADI BEN-ISRAEL Journal of Comput. & Applied Mathematics 139(2001), 197 213 DIRECT APPROACH TO CALCULUS OF VARIATIONS VIA NEWTON-RAPHSON METHOD YURI LEVIN, MIKHAIL NEDIAK, AND ADI BEN-ISRAEL Abstract. Consider m functions

More information

Optimal Control and Applications

Optimal Control and Applications V. M. Becerra - Session 1-2nd AstroNet-II Summer School Optimal Control and Applications Session 1 2nd AstroNet-II Summer School Victor M. Becerra School of Systems Engineering University of Reading 5-6

More information

Interior-Point Methods for Linear Optimization

Interior-Point Methods for Linear Optimization Interior-Point Methods for Linear Optimization Robert M. Freund and Jorge Vera March, 204 c 204 Robert M. Freund and Jorge Vera. All rights reserved. Linear Optimization with a Logarithmic Barrier Function

More information

Viscosity Iterative Approximating the Common Fixed Points of Non-expansive Semigroups in Banach Spaces

Viscosity Iterative Approximating the Common Fixed Points of Non-expansive Semigroups in Banach Spaces Viscosity Iterative Approximating the Common Fixed Points of Non-expansive Semigroups in Banach Spaces YUAN-HENG WANG Zhejiang Normal University Department of Mathematics Yingbing Road 688, 321004 Jinhua

More information

Characterizing Robust Solution Sets of Convex Programs under Data Uncertainty

Characterizing Robust Solution Sets of Convex Programs under Data Uncertainty Characterizing Robust Solution Sets of Convex Programs under Data Uncertainty V. Jeyakumar, G. M. Lee and G. Li Communicated by Sándor Zoltán Németh Abstract This paper deals with convex optimization problems

More information

An introduction to Birkhoff normal form

An introduction to Birkhoff normal form An introduction to Birkhoff normal form Dario Bambusi Dipartimento di Matematica, Universitá di Milano via Saldini 50, 0133 Milano (Italy) 19.11.14 1 Introduction The aim of this note is to present an

More information

Some lecture notes for Math 6050E: PDEs, Fall 2016

Some lecture notes for Math 6050E: PDEs, Fall 2016 Some lecture notes for Math 65E: PDEs, Fall 216 Tianling Jin December 1, 216 1 Variational methods We discuss an example of the use of variational methods in obtaining existence of solutions. Theorem 1.1.

More information

Optimality Conditions for Constrained Optimization

Optimality Conditions for Constrained Optimization 72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

More information