Course on Optimal Control. Part I: the Pontryagin approach. J. Frédéric Bonnans


Course on Optimal Control
OROC Ensta Paris Tech and Optimization master, U. Paris-Saclay
Part I: the Pontryagin approach
Version of Sept. 5
J. Frédéric Bonnans, Centre de Mathématiques Appliquées, Inria, Ecole Polytechnique, Palaiseau, France (Frederic.Bonnans@inria.fr).
Updates and additional material on the page bonnans/notes/oc/oc.html


Preface

These are lecture notes for the Optimal control course for students of both Ensta Paris Tech and the Optimization master, U. Paris-Saclay. We consider optimization problems for control systems modeled with ordinary differential equations. These lecture notes deal only with the Pontryagin approach, in which we mainly discuss necessary conditions for a trajectory to be optimal.

The notes are organized as follows. We give several examples of optimal control problems in chapter 1. Chapter 2 recalls general results in convex analysis and differential calculus (with a special focus on the notion of reduction) and obtains first order optimality conditions. Chapter 3 states Pontryagin's principle for a problem with control and initial-final constraints, as well as its extension to problems with decision variables (not depending on time) and variable horizon. It presents an application to minimal time problems, including geodesics in a Riemannian setting, and a discussion of qualification conditions. The proof of Pontryagin's principle is based on Ekeland's principle. Chapter 4 is devoted to the study of the shooting algorithm, which sometimes allows one to reduce an optimal control problem to the computation of a zero of a finite dimensional shooting function. This assumes that Pontryagin's principle allows one to express the control as a function of the state and costate. We analyze problems that are unconstrained or have initial-final constraints. It is shown how the local well-posedness of the shooting function allows one to obtain error estimates for discretized problems. Finally, we show that this well-posedness can be characterized by some second order optimality conditions. Chapter 5 deals with problems with running state constraints (constraints holding at each time). We state and prove a version of Pontryagin's principle for this class. Junction conditions are analyzed, based on the notion of order of the constraints.
As an illustration we discuss problems of elastic strings and beams with obstacles. Some useful references related to this course are: [14] for optimization theory (convex analysis, duality, Lagrange multipliers), [13] for an introduction to automatic control and optimal control, and [5], [6] for algorithmic aspects. Concerning applications: in spatial mechanics and quantum mechanics [15], [16], in medicine, part IV of [26], in ecology [23], in chemistry [6], and on bioprocesses, e.g. [4]. We also mention the more advanced book [34], and [2] on the shooting approach.


Contents

Chapter 1. Motivating examples
  1. Simple linear quadratic problems
    1.1. Quadratic regulator
    1.2. Fuller's problem
  2. Population dynamics
    2.1. Fish harvesting
    2.2. Cell population control
    2.3. Predator-Prey
  3. Bioreactors
    3.1. Chemostat
    3.2. Bio-gas production
  4. Flight mechanics
    4.1. The Goddard problem
    4.2. Space trajectories

Chapter 2. First steps
  1. Calculus in R^n and in Banach spaces
    1.1. Some notations
    1.2. Normal cones and separation of convex sets: preliminary; minimization over a convex set; normal cones; separation of sets
    1.3. Taylor expansions
  2. Lagrange multipliers
    2.1. Implicit function theorem
    2.2. On Newton's method
    2.3. Reduced equation
    2.4. Reduced cost: first order analysis; computation of the derivative of the reduced cost; reduction Lagrangian; second order analysis
    2.5. Equality constrained optimization: first order optimality conditions; second order optimality conditions; link with the reduction approach
  3. Calculus over L^∞ spaces
    3.1. Optimal control setting
    3.2. Weak derivatives
    3.3. Controlled dynamical system and associated cost
    3.4. Derivative of the cost function
    3.5. Optimality conditions with control constraints
  4. Examples

    4.1. Multiple integrators: example: double integrator, quadratic energy
    4.2. Control constraints

Chapter 3. Pontryagin's principle
  1. Pontryagin's minimum principle
    The easy Pontryagin's minimum principle: main result; a related nonlinear expansion of the state
    Pontryagin's principle with two point constraints: main result; specific structures; link with Lagrange multipliers; convex problems
    Variations of the pre-Hamiltonian: Pontryagin extremal over (0, T); constant pre-Hamiltonian for autonomous problems; variation of the pre-Hamiltonian for non-autonomous problems
    Extensions: decision variables; a design problem; variable horizon; varying initial and final time; interior point constraints; a general setting; variable durations; variable dynamics structure
    Minimal time problems, geodesics
  2. Qualification conditions
    Lagrange multipliers: general framework; specific structures
    Pontryagin multipliers
  3. Proof of Pontryagin's principle
    Ekeland's principle; Ekeland's metric and the penalized problem; proof of the theorem

Chapter 4. Shooting algorithm, second order conditions
  1. Shooting algorithm: unconstrained problems; problems with initial-final state constraints; time discretization
  2. Second-order optimality conditions: expansion of the reduced cost; more on the Hessian of the reduced cost; case of initial-final equality constraints; links with the shooting algorithm

Chapter 5. State constrained problems
  1. Pontryagin's principle: setting; duality in spaces of continuous functions; bounded variation functions

    costate equation; link with Lagrange multipliers; convex problems
    variation of the pre-Hamiltonian: constant pre-Hamiltonian for autonomous systems; non-autonomous systems
    decision variables, variable horizon: decision variables; variable horizon
  2. Junction conditions
    representation of control constraints
    constraint order and junction conditions: order of a state constraint; costate jumps; continuity of the multipliers; continuity of the control; jumps of the derivative of the control
    examples
  3. Proof of Pontryagin's principle
    some results in convex analysis; renormalization and distance to a convex set; diffuse perturbations; proof of the theorem

Bibliography
Index


1 Motivating examples

Contents
  1. Simple linear quadratic problems
    1.1. Quadratic regulator
    1.2. Fuller's problem
  2. Population dynamics
    2.1. Fish harvesting
    2.2. Cell population control
    2.3. Predator-Prey
  3. Bioreactors
    3.1. Chemostat
    3.2. Bio-gas production
  4. Flight mechanics
    4.1. The Goddard problem
    4.2. Space trajectories

We present in this chapter some typical examples of optimal control problems arising in the computation of regulators, in population dynamics problems from ecology and biology, in bioprocesses, and in flight mechanics. The examples were solved numerically using the software bocop.org; some of them are taken from the collection [8].

1. Simple linear quadratic problems

In this section we present two academic examples, which may seem close to each other. It appears, however, that their solutions have completely different behaviors.

1.1. Quadratic regulator. Consider the problem:

(1.1) min ∫₀ᵀ ½(x_1(t)² + x_2(t)²) dt; s.t. ẋ_1(t) = x_2(t), ẋ_2(t) = u(t), t ∈ (0, T); x_1(0) = 0, x_2(0) = 1,

with T = 5, subject to the control and state bound constraints

(1.2) −1 ≤ u(t) ≤ 1, x_2(t) ≥ −0.2, t ∈ (0, T).
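Problem (1.1)-(1.2) can be explored numerically by simple time discretization. The sketch below integrates the dynamics by forward Euler and evaluates the cost of a hand-picked feasible control (brake at the lower control bound, then coast along the state bound); the step count and the switching time are illustrative choices, and this control is certainly not the optimal one.

```python
def simulate(u_func, T=5.0, n=5000, x1_0=0.0, x2_0=1.0):
    """Forward-Euler integration of the double integrator of (1.1),
    x1' = x2, x2' = u, accumulating the running cost 0.5*(x1^2 + x2^2)."""
    dt = T / n
    x1, x2, cost = x1_0, x2_0, 0.0
    for i in range(n):
        u = max(-1.0, min(1.0, u_func(i * dt)))   # enforce |u| <= 1
        cost += 0.5 * (x1 * x1 + x2 * x2) * dt
        x1, x2 = x1 + dt * x2, x2 + dt * u
    return x1, x2, cost

# Brake at the lower control bound until the speed reaches the state bound
# -0.2, then coast along it; with x2(0) = 1 and u = -1 the speed x2 = 1 - t
# hits -0.2 at the (hand-computed) switching time t = 1.2.
x1, x2, cost = simulate(lambda t: -1.0 if t < 1.2 else 0.0)
```

One recovers qualitatively the first two arcs of the optimal solution (lower control bound, then active state constraint); a direct optimization over the discretized controls would be needed to recover the final unconstrained arc.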

We write the problem in the Mayer form (i.e., with a final cost only), by defining an extra state variable with dynamics

(1.3) ẋ_3(t) = ½(x_1(t)² + x_2(t)²), x_3(0) = 0.

In this new setting the cost function can be expressed as x_3(T). The optimal trajectory, displayed in figure 1, appears to have three arcs (connected by two junction points): lower control bound, active state constraint, and a final unconstrained arc; the control is discontinuous at the junction points.

Figure 1. x_1: position, x_2: speed, x_3: cost, u: acceleration

1.2. Fuller's problem. Here is a variant of the previous problem, due to [22]:

(1.4) Min ∫₀ᵀ x(t)² dt; ẍ(t) = u(t) ∈ [−1/2, 1/2]; x(0) = x_1, ẋ(0) = x_2 given.

For large enough T, the solution is as follows. There exists τ ∈ (0, T) such that: (i) over (0, τ), the control takes the values ±1/2, the arc lengths converging geometrically to 0, and (ii) over (τ, T), x(t) = u(t) = 0. These switches are not easy to reproduce numerically. Those in figure 2 were obtained with a fine discretization in time, with T = 3.5, x(0) = 0, ẋ(0) = 1, x(T) = ẋ(T) = 0.

2. Population dynamics

We next discuss examples of population dynamics problems, which are widely used in ecology and biology.

2.1. Fish harvesting. The state equation is

(1.5) ẋ(t) = B(x(t))x(t) − a u(t) x(t).

Here x(t) is the size of the fish population, B(·) is the birth rate, a > 0 is the harvesting efficiency, and u(t) is the harvesting effort; the cost function is

(1.6) ∫₀ᵀ (c(u(t)) − u(t)x(t)) dt,

Figure 2. Fuller problem: chattering control (with zoom); x and v.

where c(u) is the harvesting cost, with constraints 0 ≤ u(t) ≤ 1 and x(T) ≥ x_T. Classical birth rate models are the exponential, logistic and Gompertz ones, with positive parameters a, b:

(1.7) B_E(x) := a; B_L(x) := a(1 − x/b); B_G(x) := a log(b/x).

In the uncontrolled case, that is, when u(t) = 0 for all t, for the logistic and Gompertz models, x(t) → b in a monotone way, with an exponential speed of convergence: for some C > 0 and d > 0 depending on the initial state x(0):

(1.8) |x(t) − b| ≤ C e^{−dt}.

One can consider the optimization of a steady-state solution: compute u solution of

(1.9) Min_u c(u) − a u x; B(x)x = a u x; u ∈ [0, u_M].

About this and related optimal control problems, see [23, Ch. 4].

2.2. Cell population control. Similar state equations

(1.10) ẋ(t) = B(x(t))x(t) − a w(t) x(t)

are used for modelling cell populations. Here w(t) is the drug concentration, subject to the upper bound w(t) ≤ w_M, and possibly satisfying the dynamics

(1.11) ẇ(t) = −b w(t) + v(t),

with b > 0 the elimination rate and v(t) the drug injection rate. One calls (1.10) the pharmacodynamics equation (modelling the action of the drug) and (1.11) the pharmacokinetics equation (which models the dynamics of the drug itself). We want to minimize x(T), with possibly constraints on time integrals of w(t):

(1.12) ∫_{α_i}^{β_i} w(t) dt ≤ γ_i, with 0 ≤ α_i < β_i ≤ T and γ_i > 0, for i = 1 to N.

Extensions of such models are applied to the problem of optimizing cancer treatments [35].
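The steady-state problem (1.9) is one-dimensional and can be solved by brute force. The sketch below uses the logistic birth rate B_L, writes the harvesting efficiency as ah to keep it distinct from the birth parameter a, and takes the quadratic cost c(u) = u²; all of these are illustrative assumptions, not data from the notes.

```python
def steady_state_scan(a=1.0, b=1.0, ah=0.5, uM=1.0, n=1000):
    """Grid search for the steady-state problem (1.9) with the logistic
    birth rate B_L(x) = a(1 - x/b) and the illustrative cost c(u) = u^2.
    At a nonzero steady state B(x) = ah*u, i.e. x = b(1 - ah*u/a)."""
    best_u, best_val = 0.0, float("inf")
    for i in range(n + 1):
        u = uM * i / n
        x = b * (1.0 - ah * u / a)
        if x < 0.0:                  # the steady state would be extinct
            continue
        val = u * u - ah * u * x     # c(u) - ah*u*x
        if val < best_val:
            best_u, best_val = u, val
    return best_u, best_val

u_star, val_star = steady_state_scan()
```

For these parameters the objective is 1.25u² − 0.5u, minimized at u = 0.2 with a strictly negative value, which the scan recovers.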

Figure 3. Trajectory in predator-prey space, beginning at the bottom of the figure. Four arcs: lower control bound, state constraint (vertical segment on the right), lower control bound (nearly vertical segment on the right), unconstrained.

2.3. Predator-Prey. A simple extension to the controlled setting of the Lotka-Volterra system is, see [23, Ch. 5] or [24]:

(1.13) ẋ = αx − βxy − a_1 u, ẏ = δxy − γy − a_2 u,

with positive parameters α, β, δ, γ, a_1, a_2. The uncontrolled dynamics (with u(t) = 0 for all t) has a unique nonzero equilibrium point

(1.14) x̂ = γ/δ; ŷ = α/β.

The uncontrolled trajectories are closed loops around this point. The objective is to reach (x̂, ŷ) while minimizing the sum of the final time and c times the integral of the control. For the numerical illustration the constants are α = β = γ = δ = 1; a_1 = 0.9, a_2 = 1, c = 1; see figure 3.

3. Bioreactors

3.1. Chemostat. The simplest bioreactor model is the chemostat one:

(1.15) ẋ(t) = (µ(s(t)) − u(t)/v)x(t), ṡ(t) = −(1/γ)µ(s(t))x(t) + (u(t)/v)(s_in − s(t)).

The state variables are the biomass concentration x and the substrate concentration s. We can take the growth function µ of Monod type: µ(s) = µ_max s/(k + s), with µ_max > 0 and k > 0. The yield coefficient is γ > 0. The control is the flow u ∈ [0, u_M]. The volume v > 0 is either constant or, for batch processes, a function of time, subject to the dynamics

(1.16) v̇(t) = u(t).

The problem is usually to reach a given state in minimal time. See the analysis of the solution in [3].
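The chemostat dynamics (1.15) are easy to integrate numerically. The sketch below runs a forward Euler scheme with a constant flow; every parameter value (Monod constants, yield, inflow concentration, dilution rate) is an illustrative choice. With a constant dilution rate u/v below µ_max, the state converges to the nontrivial equilibrium where µ(s) = u/v and x = γ(s_in − s).

```python
def chemostat_step(x, s, u, dt, mu_max=1.0, k=0.5, gamma=0.8, v=1.0, s_in=2.0):
    """One forward-Euler step of the chemostat (1.15) with Monod growth
    mu(s) = mu_max*s/(k+s); all parameter values here are illustrative."""
    mu = mu_max * s / (k + s)
    dx = (mu - u / v) * x
    ds = -(1.0 / gamma) * mu * x + (u / v) * (s_in - s)
    return x + dt * dx, s + dt * ds

# Constant dilution rate u/v = 0.3 < mu_max: run long enough to settle.
x, s = 0.1, 2.0
for _ in range(20000):
    x, s = chemostat_step(x, s, u=0.3, dt=0.01)
```

The quantity x + γs obeys a linear equation driven toward γ s_in, which explains the relation x = γ(s_in − s) at equilibrium.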

3.2. Bio-gas production. The state equation is

(1.17) ẏ(t) = ν(t)y(t)/(1 + y(t)) − (r + u(t))y(t), ẋ(t) = (µ(s(t)) − βu(t))x(t), ṡ(t) = βu(t)(δy(t) − s(t)) − µ(s(t))x(t).

Here y is the mass of algae, x is the biomass concentration and s is the substrate; β > 0 is the volume ratio between the algae tanks and the bioreactor. The function to be maximized is

(1.18) ∫₀ᵀ µ(s(t))x(t) dt.

Given a light model ν(t) that is periodic with a period of one day, the solution tends to have a periodic behavior when optimizing over a large horizon; see [3].

4. Flight mechanics

4.1. The Goddard problem. This is the problem of the vertical ascent of a rocket, see [36]. State variables: position h, speed v, mass m. Control variable u: normalized thrust (maximal thrust denoted by T_max). Cost: opposite of the final mass (minimize fuel consumption). The final time T is free, and D(h, v) is the drag function. The problem is

(1.19) max m(T), s.t. ḣ = v, v̇ = −1/h² + (T_max u − D(h, v))/m, ṁ = −b T_max u, 0 ≤ u ≤ 1, D(h, v) ≤ D_max, h(0) = 1, v(0) = 0, m(0) = 1, h(T) = 1.01.

We take the simplest model of drag function: D(h, v) = cρ(h)v², with c > 0 the aerodynamic coefficient, and ρ the volumic mass expressed as a function of altitude; in the exponential volumic mass model, ρ(h) = αe^{−βh} for some positive α, β. The optimal solution has four arcs, see figure 4. For a three dimensional extension of this problem, see [12].

4.2. Space trajectories. There is a large literature on the subject; let us just mention the case of space launchers [29], and atmospheric reentry [1].

Figure 4. Four arcs: full thrust, maximum drag, unconstrained arc, zero thrust. Discontinuity of the control at all junction points.

2 First steps

Contents
  1. Calculus in R^n and in Banach spaces
    1.1. Some notations
    1.2. Normal cones and separation of convex sets
    1.3. Taylor expansions
  2. Lagrange multipliers
    2.1. Implicit function theorem
    2.2. On Newton's method
    2.3. Reduced equation
    2.4. Reduced cost
    2.5. Equality constrained optimization
  3. Calculus over L^∞ spaces
    3.1. Optimal control setting
    3.2. Weak derivatives
    3.3. Controlled dynamical system and associated cost
    3.4. Derivative of the cost function
    3.5. Optimality conditions with control constraints
  4. Examples
    4.1. Multiple integrators
    4.2. Control constraints

1. Calculus in R^n and in Banach spaces

1.1. Some notations. We denote by R^n the Euclidean space of dimension n, whose elements are identified with vertical (column) vectors, with norm |x| := (Σ_{i=1}^n x_i²)^{1/2} and scalar product x·y := Σ_{i=1}^n x_i y_i, for all x, y in R^n. The dual space R^{n*} is identified with the set of n dimensional horizontal (row) vectors. Other useful norms on R^n are the ℓ_s norm |x|_s := (Σ_{i=1}^n |x_i|^s)^{1/s}, for s ∈ [1, ∞), and the uniform norm |x|_∞ := max{|x_i|, 1 ≤ i ≤ n}. Note that norms in R^n are denoted with a single bar. Let A, B be n dimensional symmetric matrices. We write A ⪰ B if A − B is positive semidefinite, and A ≻ B if A − B is positive definite.

Let X be a Banach space (a normed vector space that is complete, i.e., every Cauchy sequence has a limit), with norm denoted by ‖x‖_X, or ‖x‖ if there is no ambiguity. If Y is another Banach space, we denote by L(X, Y) the set of linear continuous mappings X → Y. Endowed with the

norm

(2.1) ‖A‖ := sup{‖Ax‖_Y ; ‖x‖_X ≤ 1},

L(X, Y) is a Banach space. Note that a linear mapping A : X → Y is continuous iff the above r.h.s. is finite.

Exercise 2.1. Prove that, similarly, if Z is a third Banach space and a : X × Y → Z is bilinear, then it is continuous iff

(2.2) ‖a‖ := sup{‖a(x, y)‖_Z ; ‖x‖_X ≤ 1, ‖y‖_Y ≤ 1}

is finite, and that the set of continuous bilinear forms endowed with this norm is a Banach space.

The set of linear continuous forms over X (linear and continuous applications X → R) is denoted by X*, and the action of x* ∈ X* over x ∈ X is denoted by ⟨x*, x⟩_X. The space X* is a Banach space, endowed with the dual norm

(2.3) ‖x*‖_{X*} := sup{⟨x*, x⟩_X ; ‖x‖_X ≤ 1}.

If A ∈ L(X, Y), with X and Y Banach spaces, we denote by A* ∈ L(Y*, X*) the transpose operator defined, for all y* ∈ Y*, by

(2.4) ⟨A*y*, x⟩_X = ⟨y*, Ax⟩_Y, for all x ∈ X.

That A* is continuous follows from the relations

(2.5) ‖A*y*‖_{X*} = sup_{‖x‖ ≤ 1} ⟨A*y*, x⟩_X ≤ ‖A‖ ‖y*‖_{Y*}.

We say that the Banach space X is a Hilbert space if there exists a symmetric continuous bilinear form a(·,·) over X such that

(2.6) ‖x‖² = a(x, x), for all x ∈ X.

A Hilbert space X is endowed with the scalar product

(2.7) (x, x')_X := a(x, x'), for all x, x' in X.

By the Fréchet-Riesz representation theorem, if X is a Hilbert space, we have that:

(2.8) given x* ∈ X*, there exists x̄ ∈ X such that ⟨x*, x⟩_X = (x̄, x)_X, for all x ∈ X.

Exercise 2.2. Check that x̄ is uniquely defined, and prove the Fréchet-Riesz representation theorem by showing that we can take for x̄ the unique solution of the optimization problem below:

(2.9) Min_{x ∈ X} ½‖x‖²_X − ⟨x*, x⟩_X.

In the sequel X is a Banach space. If A and B are subsets of X, their Minkowski sum and difference are

(2.10) A + B := {a + b; a ∈ A, b ∈ B}; A − B := {a − b; a ∈ A, b ∈ B}.

If E ⊂ R we define the product

(2.11) EA := {ea; e ∈ E, a ∈ A}.

If f : X → Y, A ⊂ X and B ⊂ Y, we set

(2.12) f(A) := {f(a); a ∈ A}; f⁻¹(B) := {x ∈ X; f(x) ∈ B}.

The closure of A ⊂ X is the intersection of the closed sets containing A, and is denoted by cl(A). The interior and boundary of A are denoted by

(2.13) int(A) := {x ∈ X; y ∈ A if y is close enough to x}; ∂A := cl(A) \ int(A).
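In finite dimension the supremum in (2.1) can sometimes be computed in closed form. As a small sketch (the matrix is an arbitrary example), for X = Y = (R^n, |·|_∞) the operator norm is the maximal absolute row sum, and this can be cross-checked by brute force, since a linear functional on the unit ball of |·|_∞ attains its maximum at a sign vector:

```python
from itertools import product

def op_norm_inf(A):
    """Operator norm (2.1) for A : (R^n, |.|_inf) -> (R^m, |.|_inf):
    the supremum over the unit ball equals the maximal absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)

def op_norm_inf_search(A):
    """The same supremum by brute force over the sign vectors {-1,1}^n,
    which are the extreme points of the unit ball of |.|_inf."""
    n = len(A[0])
    best = 0.0
    for x in product((-1.0, 1.0), repeat=n):
        best = max(best, max(abs(sum(a * xi for a, xi in zip(row, x)))
                             for row in A))
    return best

A = [[1.0, -2.0, 0.5], [0.0, 3.0, -1.0]]
```

Here the row sums are 3.5 and 4.0, so both functions return 4.0.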

The segment [x, y], where x and y belong to X, is

(2.14) [x, y] := {αx + (1 − α)y; α ∈ [0, 1]}.

We say that A ⊂ X is convex if

(2.15) [x, y] ⊂ A, for all x, y in A.

By {0}_X we denote the set having for unique element the zero of X. The positive (resp. negative) cone of R^n is the set of elements of R^n whose coordinates are all nonnegative (resp. nonpositive), and is denoted by R^n_+ (resp. R^n_−). More generally, if X is a space of real valued functions (perhaps defined only a.e.), we call positive (resp. negative) cone of X, and denote by X_+ (resp. X_−), the set of nonnegative (resp. nonpositive) (perhaps only a.e.) functions of X. We then denote by X*_+ the dual positive cone defined by

(2.16) X*_+ := {x* ∈ X*; ⟨x*, x⟩ ≥ 0, for all x ∈ X_+}.

1.2. Normal cones and separation of convex sets.

1.2.1. Preliminary. We say that a subset C of a vector space Y is a cone if αy ∈ C, for all α > 0 and y ∈ C. Again, in the sequel X is a Banach space. The orthogonal space to E ⊂ X is the closed vector subspace of X* defined by

(2.17) E^⊥ := {x* ∈ X*; ⟨x*, x⟩_X = 0, for all x ∈ E}.

Remark 2.3. When X is a Hilbert space and E ⊂ X we have another notion of primal orthogonal space, which is (we recall that (·,·)_X denotes the scalar product in X):

(2.18) {x ∈ X; (x, e)_X = 0, for all e ∈ E}.

There should be no confusion in our applications.

1.2.2. Minimization over a convex set. Let K be a convex subset of a Banach space X, and let f : X → R be differentiable. We say that f attains a local minimum over K at x̄ ∈ K if f(x̄) ≤ f(x), for all x ∈ K sufficiently close to x̄.

Lemma 2.4. Let f attain a local minimum over K at x̄. Then

(2.19) Df(x̄)(x − x̄) ≥ 0, for all x ∈ K.

Conversely, if f is convex, (2.19) implies that f attains its (global) minimum over K at x̄.

Proof. Let x ∈ K. For t ∈ [0, 1], x_t := (1 − t)x̄ + tx belongs to K, and x_t → x̄ when t ↓ 0, so that f(x̄) ≤ f(x_t) for small enough t. Therefore the conclusion follows from

(2.20) 0 ≤ lim_{t↓0} (1/t)(f(x_t) − f(x̄)) = Df(x̄)(x − x̄).

Conversely, if f is convex and x ∈ K, then f(x) − f(x̄) ≥ Df(x̄)(x − x̄), so that (2.19) implies f(x) ≥ f(x̄).

1.2.3. Normal cones.
Definition 2.5. Let K be a closed convex subset of X and x̄ ∈ K. The normal cone to K at x̄ is the set

(2.21) N_K(x̄) := {x* ∈ X*; ⟨x*, x − x̄⟩ ≤ 0, for all x ∈ K}.

Its elements are called normal directions. This is a closed and convex cone. When X = R^n we identify X with its dual and see normal directions as elements of R^n.

Exercise 2.6. Check that: (i) If K is a singleton, the normal cone is X*. (ii) If x̄ ∈ int(K), the normal cone is the singleton {0}. (iii) If K is a vector subspace, the normal cone coincides with the orthogonal space. (iv) If K has a smooth boundary, the normal cone at x̄ ∈ ∂K is of the form R_+ y, where y is the outward normal to K at x̄.
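For a polyhedral set, membership in the normal cone (2.21) is a finite check. As a sketch (the box K = [0,1]^n and the test points are illustrative choices), the map x ↦ ⟨x*, x − x̄⟩ is linear, so it suffices to verify the inequality at the vertices of the box:

```python
from itertools import product

def in_normal_cone_box(xstar, xbar, tol=1e-12):
    """Test x* in N_K(xbar) for the box K = [0,1]^n via definition (2.21):
    <x*, x - xbar> <= 0 for all x in K.  By linearity it suffices to
    check the inequality at the 2^n vertices of the box."""
    for v in product((0.0, 1.0), repeat=len(xbar)):
        if sum(xs * (vi - xb) for xs, vi, xb in zip(xstar, v, xbar)) > tol:
            return False
    return True
```

At an interior point the only normal direction is 0, in agreement with exercise 2.6(ii), while at a boundary point of a face the cone is spanned by the outward normals of the active constraints.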

Exercise 2.7 (Product form). Let X = X' × X'', where X' and X'' are Banach spaces, and K = K' × K'', with K' ⊂ X' and K'' ⊂ X'' closed and convex. Let x̄ = (x̄', x̄'') ∈ K, with x̄' ∈ K' and x̄'' ∈ K''. Check that

(2.22) N_K(x̄) = N_{K'}(x̄') × N_{K''}(x̄'').

Exercise 2.8. Let K = {0}_{R^n} × R^m_−. Check that, for x̄ ∈ K, we have

(2.23) N_K(x̄) = R^n × {λ ∈ R^m_+; λ_i x̄_{n+i} = 0, i = 1, ..., m}.

Definition 2.9. Let E ⊂ X. Its negative polar cone is

(2.24) E^− := {x* ∈ X*; ⟨x*, x⟩ ≤ 0, for all x ∈ E}.

Its positive polar cone is E^+ := −E^−. By polar cone we mean the negative one. We see that the polar cone is a closed convex cone, an intersection of closed half spaces.

Lemma 2.10. Let C ⊂ X be a closed convex cone, and x ∈ C. Then

(2.25) N_C(x) = C^− ∩ x^⊥.

Proof. Let x* ∈ C^− ∩ x^⊥ and y ∈ C. Then ⟨x*, y⟩ ≤ 0 and ⟨x*, x⟩ = 0, so that ⟨x*, y − x⟩ ≤ 0, proving that x* ∈ N_C(x). Conversely, let x* ∈ N_C(x). For y ∈ C, since C is a convex cone, z := x + y = 2(½x + ½y) ∈ C, and therefore 0 ≥ ⟨x*, z − x⟩ = ⟨x*, y⟩, proving that x* ∈ C^−. Since 0 ∈ C we also have 0 ≥ ⟨x*, 0 − x⟩ = −⟨x*, x⟩, and the converse inequality holds since x* ∈ C^− and x ∈ C. The conclusion follows.

Exercise 2.11. Let K := R^n_−. Deduce from the previous lemma that if x ∈ K, then λ ∈ N_K(x) iff λ ∈ R^n_+ and λ_i x_i = 0, for i = 1 to n.

1.2.4. Separation of sets.

Definition 2.12. Let A and B be two convex subsets of a Banach space X. Let x* ∈ X*, x* ≠ 0. We say that x* separates A and B if

(2.26) ⟨x*, a⟩ ≤ ⟨x*, b⟩, for all a ∈ A and b ∈ B.

We start with the following geometric form of the Hahn-Banach theorem.

Theorem 2.13. Let A and B be two convex subsets of a Banach space X, with empty intersection. If A − B has a nonempty interior, then there exists x* separating A and B.

Proof. (a) The result is obtained in [17, Ch. 1, Thm 1.6], assuming that either A or B is open. (b) Set E := int(B − A). Since A ∩ B = ∅, E does not contain zero. By step (a), there exists x* separating {0} and E. Now let a ∈ A and b ∈ B, and set e := b − a. Then e is the limit of a sequence e_k in E (take e_k := (1 − 1/k)e + (1/k)ē, with ē ∈ E). Therefore, ⟨x*, b − a⟩ = lim_k ⟨x*, e_k⟩ ≥ 0, as was to be proved.

Remark 2.14. Let X be finite dimensional, and let A, B be convex subsets with empty intersection. Then there exists x* ≠ 0 separating A and B. See (the stronger result) [32, Thm 11.3].

1.3. Taylor expansions. Let f : X → Y, where X and Y are Banach spaces. We call directional derivative of f at x̄ ∈ X in the direction h ∈ X the following quantity, if it exists:

(2.27) f'(x̄, h) := lim_{t↓0} (f(x̄ + th) − f(x̄))/t.

We say that f is differentiable, or Fréchet differentiable, at x ∈ X, if there exists a linear continuous mapping X → Y, denoted by Df(x) or f'(x), and called the derivative of f at x, such that

(2.28) ‖f(x + h) − f(x) − f'(x)h‖_Y = o(‖h‖_X).

In this case

(2.29) f'(x̄, h) = Df(x̄)h, for all h ∈ X.

When X = R^n and Y = R^p, we identify f'(x) with the usual p × n Jacobian matrix:

(2.30) f'_{ij}(x) = ∂f_i(x)/∂x_j, 1 ≤ i ≤ p; 1 ≤ j ≤ n.

When Y = R and X is a Hilbert space, we denote by ∇f(x), and call gradient of f at x, the element of X associated with f'(x) by the Fréchet-Riesz representation theorem, characterized by

(2.31) f'(x)h = (∇f(x), h)_X, for all h ∈ X.

When X = R^n, ∇f(x) is the element of R^n whose coordinates are the partial derivatives of f at x, and f'(x) is the transpose of ∇f(x). If Z is another Banach space and f : X × Y → Z, we denote by, e.g., D_x f(x, y) or f_x(x, y) its partial derivatives. If in addition Z = R, we denote by ∇_x f(x, y) its partial gradient, and then, if X = R^n and Y = R^q, f_{xy}(x, y) is identified with the n × q matrix with general term ∂²f(x, y)/∂x_i∂y_j.

We recall the Taylor expansion with integral remainder: if f : X → Y is (n+1) times continuously differentiable, with n ≥ 0, then

(2.32) f(x + h) = f(x) + Df(x)h + ⋯ + (1/n!)D^n f(x)(h)^n + ∫₀¹ ((1 − t)^n/n!) D^{n+1} f(x + th)(h)^{n+1} dt.

For n = 0 and 1, this gives

(2.33) f(x + h) = f(x) + ∫₀¹ Df(x + th)h dt, and f(x + h) = f(x) + Df(x)h + ∫₀¹ (1 − t) D²f(x + th)(h)² dt.

The Taylor expansion (2.32) may be expressed as

(2.34) f(x + h) = f(x) + Df(x)h + ⋯ + (1/(n+1)!)D^{n+1} f(x)(h)^{n+1} + r_n(x, h),

where the remainder satisfies

(2.35) (i) r_n(x, h) = a(x, h)(h)^{n+1}; (ii) a(x, h) = ∫₀¹ ((1 − t)^n/n!)(D^{n+1} f(x + th) − D^{n+1} f(x)) dt.

The above relation uses the identity

(2.36) ∫₀¹ ((1 − t)^n/n!) dt = 1/(n + 1)!.

Since f : X → Y is (n+1) times continuously differentiable, it follows that

(2.37) ‖r_n(x, h)‖_Y = o(‖h‖^{n+1}_X) when h → 0, for any given x ∈ X.

Sometimes we need more precise estimates of the remainder, such as the one below.

Lemma 2.15. Let f : X → Y be (n+1) times continuously differentiable, D^{n+1}f being Lipschitz with constant L. Then

(2.38) ‖r_n(x, h)‖_Y ≤ (L/(n + 2)!) ‖h‖^{n+2}_X.

Proof. By (2.35), we have that

(2.39) ‖r_n(x, h)‖_Y ≤ L ‖h‖^{n+2}_X ∫₀¹ t(1 − t)^n/n! dt.

Integrating by parts, we see that the integral is equal to ((n+1)!)⁻¹ ∫₀¹ (1 − t)^{n+1} dt = 1/(n + 2)!. The conclusion follows.

Remark 2.16. We do not search for optimal constants in statements such as the one above; it is enough to get the right order of magnitude.

Some more accurate estimates of the remainder are based on the following notion.

Definition 2.17. (i) Let X and Y be Banach spaces, E ⊂ X and f : E → Y. The modulus of continuity of f over E, at x̄ ∈ E, is the nondecreasing function ω_{f,x̄,E} : R_+ → R_+ ∪ {+∞} defined by

(2.40) ω_{f,x̄,E}(ε) := sup{‖f(x) − f(x̄)‖; ‖x − x̄‖ ≤ ε, x ∈ E}.

(ii) The modulus of continuity of f over E is

(2.41) ω_{f,E}(ε) := sup{‖f(x') − f(x)‖; ‖x' − x‖ ≤ ε, x, x' ∈ E}.

The moduli of continuity are nondecreasing functions with value 0 at 0, and the restriction of f to E is continuous at x̄ iff ω_{f,x̄,E}(ε) → 0 when ε ↓ 0.

Definition 2.18. We say that f is uniformly continuous over E if ω_{f,E}(ε) → 0 when ε ↓ 0.

Lemma 2.19. Let D^{n+1}f be uniformly continuous over the convex set E, with modulus ω. Denote by Ω the primitive of ω with value 0 at 0. Then

(2.42) ‖r_n(x, h)‖ ≤ (1/n!) Ω(‖h‖) ‖h‖^n.

Proof. By (2.35)(ii), we have that

(2.43) ‖a(x, h)‖ ≤ (1/n!) ∫₀¹ ω(t‖h‖) dt = Ω(‖h‖)/(n! ‖h‖).

We conclude using (2.35)(i).

Remark 2.20. Under the hypotheses of the lemma, Ω(t) = o(t) when t ↓ 0. Therefore, (2.42) implies

(2.44) ‖r_n(x, h)‖ = o(‖h‖^{n+1}).

Example 2.21. Let D^{n+1}f be Hölder continuous, i.e., there exist c_H > 0 and α ∈ (0, 1] such that the modulus of continuity ω of D^{n+1}f satisfies ω(ε) ≤ c_H ε^α. Then¹

(2.46) ‖r_n(x, h)‖ ≤ (c_H/((1 + α)n!)) ‖h‖^{n+1+α}.

When α = 1 we recover the order of magnitude in (2.38).

We recall the link between continuity and uniform continuity.

Lemma 2.22. Let E be a compact subset (any open covering contains a finite covering) of a Banach space X. Then a continuous function over E is uniformly continuous over E.

¹We can obtain a more precise estimate, setting a_{n,α} := ∫₀¹ (1 − t)^n t^α dt/n!. Then ‖r_n(x, h)‖ ≤ c_H a_{n,α} ‖h‖^{n+1+α}. On the other hand, integrating by parts we get that

(2.45) a_{n,α} = a_{n−1,α+1}(α + 1)⁻¹ = ⋯ = a_{0,α+n}(α + 1)⁻¹ ⋯ (α + n)⁻¹ = (α + 1)⁻¹ ⋯ (α + n + 1)⁻¹.

When D^{n+1}f is Lipschitz, taking α = 1, we recover (2.38).
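The bound of lemma 2.15 is easy to test numerically in one dimension. The sketch below (an illustrative check, not taken from the notes) uses f = sin on R, for which every derivative is bounded by 1, so that D^{n+1}f is Lipschitz with constant L = 1 and the lemma gives |r_n(x, h)| ≤ |h|^{n+2}/(n+2)!:

```python
import math

def taylor_remainder(f, derivs, x, h, n):
    """r_n(x, h) from (2.34): f(x+h) minus its Taylor polynomial of degree
    n+1 at x; derivs[k] is the k-th derivative of f (derivs[0] = f)."""
    poly = sum(derivs[k](x) * h ** k / math.factorial(k) for k in range(n + 2))
    return f(x + h) - poly

# Derivatives of sin, cyclically: sin, cos, -sin, -cos, sin.
sin_derivs = [math.sin, math.cos,
              lambda t: -math.sin(t), lambda t: -math.cos(t), math.sin]
```

For instance, for n = 1 and x = 0 the remainder is sin(h) − h, whose absolute value is indeed bounded by h³/6.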

Proof. We must prove that if x_k, x'_k in E satisfy ‖x_k − x'_k‖ → 0, then f(x_k) − f(x'_k) → 0. If this does not hold, then, extracting a subsequence, we may assume that lim inf_k ‖f(x_k) − f(x'_k)‖ > 0. Since any sequence in a compact metric space has a convergent subsequence, extracting if necessary a further subsequence, we may assume that (x_k, x'_k) → (x, x'). Since ‖x_k − x'_k‖ → 0 we must have x = x'. A compact set being closed, x ∈ E. From the continuity of f at x, it follows that f(x_k) − f(x'_k) → 0. We have obtained a contradiction. The conclusion follows.

2. Lagrange multipliers

2.1. Implicit function theorem. Let U, Y and Z be Banach spaces, and let F : U × Y → Z be of class C¹. Let (ū, ȳ) ∈ U × Y satisfy

(2.47) F(ū, ȳ) = 0; D_y F(ū, ȳ) is invertible.

We recall that A ∈ L(U, Y) is said to be invertible if it is a bijection. It can then be proved that its inverse A⁻¹ (obviously linear) is continuous (as a consequence of the celebrated open mapping theorem, see [17, Corollary 2.7]).

Theorem 2.23 (IFT). If (2.47) holds, there exist an open neighbourhood V of ū, a C¹ mapping ϕ : V → Y, and γ > 0, such that the equation

(2.48) F(u, y) = 0; u ∈ V; ‖y − ȳ‖_Y ≤ γ

holds iff y = ϕ(u).

Given u ∈ V, we have that F(u, ϕ(u)) = 0. The derivative of this (constant) function is also equal to zero, i.e.,

(2.49) F_u(u, ϕ(u)) + F_y(u, ϕ(u))Dϕ(u) = 0.

Taking u = ū, we find that

(2.50) F_u(ū, ȳ) + F_y(ū, ȳ)Dϕ(ū) = 0.

Since F_y(ū, ȳ) is invertible, we deduce that

(2.51) Dϕ(ū) = −F_y(ū, ȳ)⁻¹ F_u(ū, ȳ).

From this expression, and since the inverse mapping A → A⁻¹ is of class C^∞ over the (open) set of invertible mappings Y → Z, we easily deduce that:

Corollary 2.24. Under the hypotheses of theorem 2.23, if F is of class C^p, p ≥ 1, then ϕ is also of class C^p. If D^p F is Lipschitz, then so is D^p ϕ.

2.2. On Newton's method. Let X and Y be Banach spaces and let F : X → Y be of class C¹. We say that x̄ ∈ X is a zero of F if F(x̄) = 0, and that it is a regular zero if in addition DF(x̄) is invertible.
Newton's method for finding a zero of F consists in computing the Newton sequence x_k in X such that

(2.52) F(x_k) + DF(x_k)(x_{k+1} − x_k) = 0, k ∈ N,

starting from some point x_0 ∈ X. The sequence is well-defined as long as DF(x_k) is invertible. It is easily checked that the set of invertible elements of L(X, Y) is open. Since DF(x) is a continuous function of x, it follows that DF(x) is invertible in a vicinity of x̄. It can be proved that:

Theorem 2.25. If x_0 is close enough to the regular zero x̄, the Newton sequence is well-defined and x_k → x̄ superlinearly, i.e.,

(2.53) ‖x_{k+1} − x̄‖ = o(‖x_k − x̄‖).

If in addition x → DF(x) is Lipschitz near x̄, then we have quadratic convergence, i.e.,

(2.54) ‖x_{k+1} − x̄‖ = O(‖x_k − x̄‖²).
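The quadratic convergence (2.54) is easy to observe on a scalar example; the sketch below (with an arbitrary test equation, not from the notes) iterates (2.52) for F(x) = x² − 2, whose zero √2 is regular since DF(√2) = 2√2 ≠ 0:

```python
def newton(F, DF, x0, tol=1e-12, kmax=50):
    """Newton iterates (2.52) for a scalar equation F(x) = 0."""
    xs = [x0]
    for _ in range(kmax):
        x = xs[-1]
        x_next = x - F(x) / DF(x)   # solves F(x) + DF(x)(x_next - x) = 0
        xs.append(x_next)
        if abs(x_next - x) < tol:
            break
    return xs

iters = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=2.0)
```

Printing the errors |x_k − √2| shows them roughly squaring at each step (about 8.6e-2, 2.5e-3, 2.1e-6, ...), the signature of quadratic convergence.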

Proof. See e.g. [9, Ch. 6].

2.3. Reduced equation. Consider a system with two (vector) equations and unknowns, such that the second variable can be eliminated from the second equation. We will compare the original system of two equations with the reduced one obtained after elimination of the second variable. Namely, let U, Y, W, Z be Banach spaces, and let Φ : U × Y → W and Ψ : U × Y → Z be of class C¹. Consider the equations, in W × Z:

(2.55) Φ(u, y) = 0; Ψ(u, y) = 0.

Let (ū, ȳ) ∈ U × Y satisfy

(2.56) (ū, ȳ) is a zero of (Φ, Ψ), and D_y Ψ(ū, ȳ) is invertible.

By the IFT, in the vicinity of (ū, ȳ), we have that

(2.57) the relation Ψ(u, y) = 0 is equivalent to y = χ(u), for some C¹ function χ : V → Y, where V is a neighbourhood of ū, such that, whenever y = χ(u): D_u Ψ(u, y) + D_y Ψ(u, y)Dχ(u) = 0.

We will call Ψ(u, y) = 0 the state equation. Locally, (2.55) is equivalent to the reduced equation

(2.58) Ξ(u) = 0, where Ξ(u) := Φ(u, χ(u)).

In addition,

(2.59) DΞ(ū) = D_u Φ(ū, ȳ) + D_y Φ(ū, ȳ)Dχ(ū) = D_u Φ(ū, ȳ) − D_y Φ(ū, ȳ) D_y Ψ(ū, ȳ)⁻¹ D_u Ψ(ū, ȳ).

Given (v, w) ∈ U × W, one easily checks that

(2.60) DΞ(ū)v = w iff there exists z ∈ Y such that DΦ(ū, ȳ)(v, z) = w and DΨ(ū, ȳ)(v, z) = 0.

Lemma 2.26. The point ū is a regular zero of Ξ iff (ū, ȳ) is a regular zero of (Φ, Ψ).

Proof. Consider the system

(2.61) D_u Φ(ū, ȳ)v + D_y Φ(ū, ȳ)z = a; D_u Ψ(ū, ȳ)v + D_y Ψ(ū, ȳ)z = b.

Since D_y Ψ(ū, ȳ) is invertible, we can eliminate z from the second equation. Using (2.59), we see that the above system is equivalent to

(2.62) DΞ(ū)v = a − D_y Φ(ū, ȳ) D_y Ψ(ū, ȳ)⁻¹ b.

The conclusion easily follows.

2.4. Reduced cost.

2.4.1. First order analysis. We next consider an optimization problem of the form

(2.63) Min_{u,y} f(u, y); Ψ(u, y) = 0.

Here U, Y, Z are Banach spaces, and f : U × Y → R and Ψ : U × Y → Z are of class C¹. Let (ū, ȳ) be a zero of Ψ such that D_y Ψ(ū, ȳ) is invertible. Then, for (u, y) close to (ū, ȳ), the consequence (2.57) of the IFT applies. Let F(u) := f(u, χ(u)) be the reduced cost.
Given a neighbourhood V of ū, we may define a (localized) reduced problem as

(2.64) Min_u F(u); u ∈ V.

Remark 2.27. The reduced problem is locally equivalent to the original one, in the sense that there exists γ > 0 such that, if V is small enough:

(2.65) inf{F(u); u ∈ V} = inf{f(u, y); Ψ(u, y) = 0; u ∈ V; ‖y − ȳ‖ ≤ γ}.

23 Computation of the derivative of the reduced cost. In view of (2.57), the derivative of F at u close to ū is, writing y = χ(u): (2.66) Therefore, DF (u)v = D u f(u, y)v + D y f(u, y)dχ(u)v = D u f(u, y)v D y f(u, y)d y Ψ(u, y) 1 D u Ψ(u, y)v. (2.67) DF (u) = D u f(u, y) D y f(u, y)d y Ψ(u, y) 1 D u Ψ(u, y). However, we must pay attention to the fact that we usually represent the action of the linear form DF (u) over v with the dual notation DF (u), v. Note that the transpose of an invertible linear mapping is invertible, and that transposition and inversion commute 2, which justifies the notation A. We have that (2.68) that is, (2.69) DF (u), v = D u f(u, y), v U + D y f(u, y), Dχ(u)v Y = D u f(u, y) + Dχ(u) D y f(u, y), v U, DF (u) = D u f(u, y) + Dχ(u) D y f(u, y) = D u f(u, y) D u Ψ(u, y) D y Ψ(u, y) D y f(u, y). One must be careful of avoiding confusions between (2.67) and (2.69), which, despite an apparent difference, represent the same operator Reduction Lagrangian. In practice, one obtains the expression of DF (u) using the reduction Lagrangian, where (u, y, p) U Y Z : (2.7) L R (u, y, p) := f(u, y) + p, Ψ(u, y) Z, and the costate equation, where y = χ(u): (2.71) = D y L R (u, y, p) = D y f(u, y) + D y Ψ(u, y) p =. Note the use of the dual notation. Given (u, y) U Y, y being the state associated with u, since D y Ψ(u, y) (and hence, D y Ψ(u, y) ) is invertible, the costate equation has a unique solution p Z, called the costate associated with u: (2.72) p = D y Ψ(u, y) D y f(u, y). Observe now that (2.73) D u L R (u, y, p) = D u f(u, y) + D u Ψ(u, y) p, = D u f(u, y) D u Ψ(u, y) D y Ψ(u, y) D y f(u, y), = DF (u), where the last equality uses (2.69), and is therefore equal to if u is a local solution. We say that u U is a stationary point of F (u) if DF (u) =. We have proved the following useful rule: Lemma (i) The derivative of the reduced cost F is equal to the partial derivative w.r.t. 
the control of the reduction Lagrangian, computed at the control and associated state and costate. (ii) An element $u$ of $U$ is a stationary point of $F$ iff there exists $p \in Z^*$ such that, $y$ being the associated state:
(2.74) $D_u L^R(u,y,p) = 0; \quad D_y L^R(u,y,p) = 0.$

^2 Indeed, if $A \in L(X,Y)$ is invertible, solving $A^* y^* = x^*$, with $y^* \in Y^*$ and given $x^* \in X^*$, amounts to $\langle y^*, Ax\rangle = \langle x^*, x\rangle$ for all $x \in X$, or equivalently $\langle y^*, y\rangle = \langle x^*, A^{-1}y\rangle = \langle (A^{-1})^* x^*, y\rangle$, for all $y \in Y$. Since $y \mapsto \langle (A^{-1})^* x^*, y\rangle$ is linear and continuous, the existence and uniqueness of $y^*$ follows, and we have that $(A^*)^{-1} = (A^{-1})^*$.
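The costate rule (2.71)-(2.73) can be checked numerically in a finite-dimensional model. The sketch below is our own illustration, not taken from these notes: the state equation is linear, $\Psi(u,y) = Ay + Bu - b$, the cost is quadratic, and all matrices and the target are arbitrary random choices. The costate-based gradient is compared with central finite differences.

```python
import numpy as np

# Finite-dimensional illustration of (2.71)-(2.73): state equation
# Psi(u, y) = A y + B u - b = 0 (so D_y Psi = A, D_u Psi = B), and cost
# f(u, y) = 0.5*||u||^2 + 0.5*||y - y_t||^2.  All data are arbitrary.
rng = np.random.default_rng(0)
n, m = 5, 3
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # invertible D_y Psi
B = rng.standard_normal((n, m))
b = rng.standard_normal(n)
y_t = rng.standard_normal(n)

def state(u):                       # y = chi(u): solve Psi(u, y) = 0
    return np.linalg.solve(A, b - B @ u)

def cost(u):                        # reduced cost F(u) = f(u, chi(u))
    y = state(u)
    return 0.5 * u @ u + 0.5 * (y - y_t) @ (y - y_t)

def gradient(u):
    y = state(u)
    p = -np.linalg.solve(A.T, y - y_t)   # costate (2.72): p = -D_yPsi^{-*} D_y f
    return u + B.T @ p                   # (2.73): DF(u) = D_u f + D_uPsi^* p

u = rng.standard_normal(m)
g_fd = np.array([(cost(u + 1e-6 * e) - cost(u - 1e-6 * e)) / 2e-6
                 for e in np.eye(m)])    # central finite differences
assert np.allclose(gradient(u), g_fd, atol=1e-5)
```

One linear solve with $A^\top$ replaces the $m$ state solves a finite-difference gradient would need; this is the point of the costate formulation.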

Second order analysis. The costate approach is also useful for obtaining a simple expression of the Hessian (the second derivative seen as a bilinear form) of the reduced cost, assuming now that $f$ and $\Psi$ are $C^2$. The Hessian of the reduction Lagrangian (w.r.t. the control and state variables) is the quadratic form $Q: U \times Y \to \mathbb{R}$ defined by
(2.75) $Q(v,z) := D^2 f(u,y)(v,z)^2 + \langle p, D^2\Psi(u,y)(v,z)^2\rangle.$
Here we assume that $y$ and $p$ are the state and costate associated with the control $u$. Consider the linearized state equation
(2.76) $D_u\Psi(u,y)v + D_y\Psi(u,y)z = 0.$
Given $v \in U$ it has a unique solution, denoted by $z[v]$, in $Y$, namely
(2.77) $z[v] := -D_y\Psi(u,y)^{-1} D_u\Psi(u,y)v.$
Lemma. The Hessian of the reduced cost satisfies
(2.78) $D^2 F(u)(v,v) = Q(v, z[v])$, for all $v \in U$.
Proof. Given $\sigma > 0$, set $u_\sigma := u + \sigma v$. The associated state, denoted by $y_\sigma$, satisfies
(2.79) $\|y_\sigma - y - \sigma z[v]\|_Y = O(\sigma^2).$
So we have that
(2.80) $F(u_\sigma) = f(u_\sigma, y_\sigma) = L^R(u_\sigma, y_\sigma, p) = L^R(u,y,p) + \sigma D_u L^R(u,y,p)v + \tfrac12 \sigma^2 Q(v, z[v]) + o(\sigma^2),$
where we have used the fact that $D_y L^R(u,y,p) = 0$. The result follows.
Corollary. Let $F$ have a local minimum at $u \in U$. Then $Q(v, z[v]) \ge 0$, for all $v \in U$.
Proof. From the Taylor expansion of $F$ it follows that, for $v \in U$ and $t \in \mathbb{R}$:
(2.81) $F(u + tv) = F(u) + tDF(u)v + \tfrac12 t^2 D^2 F(u)(v,v) + o(t^2),$
and since $DF(u) = 0$, the local optimality of $u$ implies
(2.82) $0 \le \lim_{t \to 0} \frac{F(u+tv) - F(u)}{\tfrac12 t^2} = D^2 F(u)(v,v).$
We conclude using the previous lemma.

Equality constrained optimization. First order optimality conditions. Consider next an abstract optimization problem with equality constraints:
(2.83) $\min_x f(x); \quad g(x) = 0.$
Here $g: X \to Y$ (Banach spaces) and $f: X \to \mathbb{R}$ are of class $C^2$. This is obviously a generalization of the previous setting, but without the possibility of reduction to an unconstrained reduced problem. Let $\bar x$ be a local solution of this problem, in the sense that for some $\varepsilon > 0$:
(2.84) $f(\bar x) \le f(x)$, whenever $g(x) = 0$ and $\|x - \bar x\|_X \le \varepsilon.$
We assume that $\bar x$ satisfies the following qualification condition:
(2.85) the operator $Dg(\bar x)$ is surjective.
The Lagrangian of the problem is the function $L^g: X \times Y^* \to \mathbb{R}$ defined by
(2.86) $L^g(x,\lambda) := f(x) + \langle \lambda, g(x)\rangle_Y.$
Theorem. There exists a unique Lagrange multiplier $\lambda \in Y^*$ such that (dual notation)
(2.87) $0 = D_x L^g(\bar x, \lambda) = Df(\bar x) + Dg(\bar x)^* \lambda.$

The proof is based on the following lemmas. The starting point is the concept of metric regularity.
Lemma. Let (2.85) hold. Then the following metric regularity property is satisfied: there exists $c_g > 0$ such that, if $x \in X$ is close enough to $\bar x$, there exists $x' \in X$ such that
(2.88) $\|x' - x\|_X \le c_g \|g(x)\|_Y$ and $g(x') = 0.$
Proof. See e.g. [9, Ch. 3].
Exercise. Take $X = Y = \mathbb{R}$ and $g(x) = x^2$. Show that the metric regularity property does not hold at $\bar x = 0$. Also, show that the problem of minimizing $f(x) = x$ under the constraint $g(x) = 0$ has solution $\bar x = 0$, with which no Lagrange multiplier is associated.
Corollary. If (2.85) holds, any $h \in \operatorname{Ker} Dg(\bar x)$ is tangent to the manifold $g^{-1}(0)$ in the sense that, setting $x_\sigma := \bar x + \sigma h$, for $\sigma \in \mathbb{R}$:
(2.89) there exists $x'_\sigma$ such that $g(x'_\sigma) = 0$ and $\|x'_\sigma - x_\sigma\| = o(|\sigma|).$
Proof. That $h \in \operatorname{Ker} Dg(\bar x)$ implies $g(x_\sigma) = o(\sigma)$. We conclude with the previous lemma.
Corollary. If (2.85) holds, then $Df(\bar x) \in (\operatorname{Ker} Dg(\bar x))^\perp$.
Proof. Otherwise there would exist $h \in \operatorname{Ker} Dg(\bar x)$ such that $Df(\bar x)h \ne 0$. Changing $h$ into $-h$ if necessary, we may assume that $Df(\bar x)h < 0$. Define $x_\sigma$ as above. By the previous corollary,
(2.90) $\lim_{\sigma \downarrow 0} \frac{f(x'_\sigma) - f(\bar x)}{\sigma} = Df(\bar x)h < 0,$
contradicting the local optimality of $\bar x$. The conclusion follows.
Proof of the theorem. By the above corollary, $Df(\bar x) \in (\operatorname{Ker} Dg(\bar x))^\perp$. Since $Dg(\bar x)$ is surjective, its image is closed. Therefore, the orthogonal of its kernel coincides with the image of its adjoint operator, see e.g. [17, Thm. 2.19]. So, there exists $\lambda \in Y^*$ such that $Df(\bar x) + Dg(\bar x)^* \lambda = 0$, as was to be proved.
Remark. In a finite dimensional setting, the orthogonal of the kernel of a linear operator always coincides with the image of its transpose. For a counterexample in a Banach space setting, let $A$ be the injection from $L^2(0,1)$ (identified with its dual) into $L^1(0,1)$. Its image is dense, but not closed. Now $A^*: L^\infty(0,1) \to L^2(0,1)$ is defined, for all $w \in L^\infty(0,1)$ and $u \in L^1(0,1)$, by $\langle A^* w, u\rangle = \langle w, Au\rangle = \int_0^1 w(t)u(t)\,dt$. Since $A$ is injective, its kernel is reduced to $\{0\}$ and the orthogonal of the kernel is $L^2(0,1)$. On the other hand, the image of its adjoint is $L^\infty(0,1)$.

Second order optimality conditions. We next obtain second order necessary conditions as an easy consequence of the previous results.
Proposition. Let $\bar x$ be a qualified local solution of (2.83), with associated Lagrange multiplier $\lambda$. Then
(2.91) $D^2_{xx} L^g(\bar x, \lambda)(h,h) \ge 0$, for all $h \in \operatorname{Ker} Dg(\bar x).$
Proof. Set as before $x_\sigma := \bar x + \sigma h$, and let $x'_\sigma$ be given by the corollary on tangent directions. Since $g(x'_\sigma) = 0$, and $D_x L^g(\bar x, \lambda) = 0$ by (2.87), we have for $\sigma$ small enough
(2.92) $0 \le f(x'_\sigma) - f(\bar x) = L^g(x'_\sigma, \lambda) - L^g(\bar x, \lambda) = \tfrac12 \sigma^2 D^2_{xx} L^g(\bar x, \lambda)(h,h) + o(\sigma^2).$
Dividing by $\sigma^2$ and letting $\sigma \downarrow 0$, we obtain the conclusion.
Exercise. Let $X$ be a Hilbert space, $\hat x \in X$, $M \in L(X)$ symmetric and invertible, and, for $x \in X$, $g(x) := \tfrac12 (x, Mx)_X - \tfrac12$. Write the first and second order optimality conditions for the problem of projecting $\hat x$ over $g^{-1}(0)$ (we do not discuss the existence of a solution). Hint: set $f(x) := \tfrac12 \|x - \hat x\|^2$. Check that (i) $g'(x) = Mx$ (deduce that any feasible point is qualified), (ii) if $x$ is a solution, then $x - \hat x + \lambda M x = 0$, (iii) the Hessian of the Lagrangian is $I + \lambda M$.
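The projection exercise admits a direct numerical illustration. The sketch below is our own, with an arbitrary positive definite $2 \times 2$ matrix $M$ and point $\hat x$ outside the ellipse: stationarity gives $x = (I + \lambda M)^{-1} \hat x$, and $\lambda \ge 0$ is tuned by bisection so that feasibility $g(x) = 0$ holds.

```python
import numpy as np

# Projection of x_hat onto g^{-1}(0), g(x) = 0.5*(x, M x) - 0.5, with
# M symmetric positive definite (here: the ellipse x1^2 + 4 x2^2 = 1).
# First-order condition: x - x_hat + lam*M x = 0, i.e.
# x(lam) = (I + lam*M)^{-1} x_hat; pick lam >= 0 with g(x(lam)) = 0.
M = np.diag([1.0, 4.0])
x_hat = np.array([2.0, 1.0])        # lies outside the ellipse

def x_of(lam):
    return np.linalg.solve(np.eye(2) + lam * M, x_hat)

def g(lam):
    x = x_of(lam)
    return 0.5 * x @ (M @ x) - 0.5

# For M positive definite, g is decreasing in lam on [0, inf); bisect.
lo, hi = 0.0, 100.0                 # g(lo) > 0, g(hi) < 0 here
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
lam = 0.5 * (lo + hi)
x = x_of(lam)

assert abs(0.5 * x @ (M @ x) - 0.5) < 1e-8        # feasibility g(x) = 0
assert np.allclose(x - x_hat + lam * (M @ x), 0, atol=1e-8)  # stationarity
```

Since $\lambda \ge 0$ and $M$ is positive definite, the Hessian of the Lagrangian $I + \lambda M$ is positive definite, so the second order necessary condition of the proposition holds as well.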

Link with the reduction approach. We next establish the relation with the non-reduced formulation, assuming that $x = (u,y)$, i.e.
(2.93) $\min_{u,y} \Phi(u,y); \quad \Psi(u,y) = 0; \quad g(u,y) = 0.$
We assume that $D_y\Psi(\bar u, \bar y)$ is invertible, so that locally $\Psi(u,y) = 0$ iff $y = \varphi(u)$, where $\varphi$ is given by the IFT. So the corresponding reduced problem is
(2.94) $\min_u F(u); \quad G(u) = 0,$
where
(2.95) $F(u) := \Phi(u, \varphi(u)); \quad G(u) := g(u, \varphi(u)).$
The Lagrangian functions associated with the original and reduced problems are resp.
(2.96) $L(u,y,p,\lambda) := \Phi(u,y) + \langle p, \Psi(u,y)\rangle + \langle \lambda, g(u,y)\rangle, \quad \ell(u,\lambda) := F(u) + \langle \lambda, G(u)\rangle.$
The costate equation at the point $\bar x = (\bar u, \bar y)$, with $\bar y$ the state associated with $\bar u$, reads
(2.97) $0 = D_y L(\bar u, \bar y, p, \lambda) = D_y\Phi(\bar u, \bar y) + D_y\Psi(\bar u, \bar y)^* p + D_y g(\bar u, \bar y)^* \lambda.$
The optimality conditions for the original and reduced problems are resp. (assuming in the case of the original problem that the costate equation holds)
(2.98) (i) $0 = D_u L(\bar u, \bar y, p, \lambda) = D_u\Phi(\bar u, \bar y) + D_u\Psi(\bar u, \bar y)^* p + D_u g(\bar u, \bar y)^* \lambda$; (ii) $0 = D_u \ell(\bar u, \lambda) = F'(\bar u) + G'(\bar u)^* \lambda.$
Lemma. Let the costate equation (2.97) hold. Then $\lambda$ satisfies (2.98)(i) iff it satisfies (2.98)(ii). In other words, the Lagrange multipliers associated with the original and reduced formulations coincide.
Proof. It suffices to eliminate $p$ thanks to the costate equation (2.97) in (2.98)(i), and to express $F'(\bar u)$ and $G'(\bar u)$ (thanks to the IFT) in (2.98)(ii).
Remark. In the same way we can check that the second order necessary optimality conditions for the original and reduced problems give the same information, since
(2.99) $D^2_{uu} \ell(\bar u, \lambda)(v,v) = D^2_{(u,y)^2} L(\bar u, \bar y, p, \lambda)(v,z)^2$, where $(v,z) \in U \times Y$ satisfies $D\Psi(\bar u, \bar y)(v,z) = 0.$

Calculus over $L^\infty$ spaces. We denote by $L^\infty(0,T)$ the set of essentially bounded measurable functions over $(0,T)$. With $f: \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^m$ we associate the Nemitskii mapping $F: L^\infty(0,T)^n \to L^\infty(0,T)^m$ defined by
(2.100) $F(y)(t) := f(t, y(t))$, for a.a. $t \in (0,T).$
Lemma. If $f$ is of class $C^p$, then $F$ is of class $C^p$ from $L^\infty(0,T)^n$ to $L^\infty(0,T)^m$, and satisfies
(2.101) $(D^j F(y)(z)^j)(t) = D^j_{y^j} f(t, y(t))(z(t))^j$ a.e., for all $j \le p$,
and so, for all $y, z$ in $L^\infty(0,T)^n$ we have the following Taylor expansion:
(2.102) $F(y+z)(t) = f(t, y(t)) + \sum_{j=1}^{p} \frac{1}{j!} D^j f(t, y(t))(z(t))^j + r(t); \quad \|r\|_\infty = o(\|z\|_\infty^p).$
Proof. Being continuous, $F$ is bounded over bounded sets. In addition, the composition of a measurable mapping with a continuous one is measurable. So, $F$ has image in $L^\infty(0,T)^m$. In view of the Taylor expansion (2.32), the equality in (2.102) holds with
(2.103) $r(t) := a(t, y(t), z(t))(z(t))^p,$

where the symmetric $p$-linear form $a$ is defined by
(2.104) $a(t,y,z) := \int_0^1 \frac{(1-s)^{p-1}}{(p-1)!} \left( D^p_{y^p} f(t, y+sz) - D^p_{y^p} f(t,y) \right) ds,$
and therefore (the norm of the multilinear form is defined analogously to (2.2))
(2.105) $\|r(t)\| \le \|a(t, y(t), z(t))\| \, \|z(t)\|^p.$
Since continuous functions are uniformly continuous over compact sets, $a(t, y(t), z(t)) \to 0$ in $L^\infty(0,T)$ when $z \to 0$ in $L^\infty(0,T)^n$. So, (2.102) holds. Writing it with $p = 1$, we obtain that (2.101) holds for $p = 1$. Let it hold for some $j < p$. Then for $y, w, z$ in $L^\infty(0,T)^n$, we have a.e.:
(2.106) $(D^j F(y+w)(z)^j)(t) = D^j_{y^j} f(t, y(t)+w(t))(z(t))^j = (D^j_{y^j} F(y)(z)^j)(t) + D^{j+1}_{y^{j+1}} f(t, y(t))(w(t), (z(t))^j) + \rho(t),$
with $\rho(t) = o(\|w(t)\| \, \|z(t)\|^j)$ uniformly in time. This proves (2.101) for $j+1$, and so, the result follows by induction.
Analyzing the above proof, we may guess that smoothness w.r.t. time can be weakened, provided that the functions are measurable and that the property of uniformly small remainders in (2.102) holds.
Definition. (i) Let $g_t$ be a family of mappings $X \to Y$ parameterized by $t \in (0,T)$. We say that $g_t$ is a Carathéodory mapping if $t \mapsto g_t(x)$ is measurable for each $x \in X$, and $x \mapsto g_t(x)$ is continuous for a.a. $t$. We denote by $\omega_{t,g,E}$ the modulus of continuity of $g_t$ over $E \subset X$.
(ii) We say that $g$ has a time uniform modulus of continuity over $E$ if there exists a nondecreasing function $\omega_{g,E}: \mathbb{R}_+ \to \mathbb{R}_+ \cup \{+\infty\}$ such that
(2.107) $\omega_{t,g,E}(\varepsilon) \le \omega_{g,E}(\varepsilon)$, for a.a. $t$, and $\lim_{\varepsilon \downarrow 0} \omega_{g,E}(\varepsilon) = 0.$
Lemma. Let $g_t$ be a Carathéodory function with $X = \mathbb{R}^n$, $Y = \mathbb{R}^m$. Let $f: [0,T] \to \mathbb{R}^n$ be measurable. Then $t \mapsto g_t(f(t))$ is measurable.
Proof. (a) We start with the case when $f$ is a simple function, say $f(t) = \sum_k a_k \chi_{A_k}(t)$, where the sum is finite, the $A_k$ are measurable subsets of $[0,T]$ with negligible intersections, and $\chi_{A_k}$ is the characteristic function of $A_k$ (with value $1$ over $A_k$ and $0$ otherwise). Then
(2.108) $g_t(f(t)) = \sum_k g_t(a_k) \chi_{A_k}(t)$
is measurable (being a sum of products of measurable mappings).
(b) Since any measurable function $f: [0,T] \to \mathbb{R}^n$ is the a.e. limit of simple functions, say $f_j$, and since $g_t(\cdot)$ is a.e. continuous, we have that $g_t(f(t)) = \lim_j g_t(f_j(t))$ a.e., and the a.e. limit of measurable functions is measurable.
Definition. We say that $f: \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^m$ is uniformly quasi-$C^p$, for $p \in \mathbb{N}$, if $f(t,x)$ is a measurable function of time for each $x$, is a $C^p$ function of $x$ for a.a. $t$, and, for any $j \le p$, the function $\hat D^j f$ (partial derivative $j$ times w.r.t. $x$ of $f$):
(1) has a uniform (over time) modulus of continuity w.r.t. $x$, on bounded sets,
(2) for fixed $x$, is an essentially bounded function of time.
Remark. A uniformly quasi-$C^p$ function is a Carathéodory function. This is also true for $\hat D^j f$ for $j \le p$ (using the fact that derivatives are limits of difference quotients). From the inequality $\|f(t,x)\| \le \|f(t,0)\| + \|f(t,x) - f(t,0)\|$ we easily deduce that a uniformly quasi-$C^p$ function is bounded over bounded sets.
Lemma. The conclusion of lemma 2.41 still holds if we assume only that $f$ is uniformly quasi-$C^p$.
Proof. Follows from the previously discussed arguments.
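The pointwise derivative formula (2.101) is easy to test numerically. The sketch below is our own illustration (our choice of $f$, grid, and directions is arbitrary): the formula $(DF(y)z)(t) = D_y f(t, y(t))z(t)$ is compared with finite differences in the sup norm.

```python
import numpy as np

# Derivative of a Nemitskii map F(y)(t) = f(t, y(t)), checked on a grid
# for the (arbitrary) choice f(t, y) = sin(y) + t*y^2, for which
# D_y f(t, y) = cos(y) + 2*t*y.
T = 1.0
t = np.linspace(0.0, T, 200)
y = np.cos(3 * t)                   # a bounded "state" in L^inf(0, T)
z = np.sin(5 * t)                   # a direction

F = lambda y: np.sin(y) + t * y**2              # (F(y))(t) = f(t, y(t))
DF_z = (np.cos(y) + 2 * t * y) * z              # formula (2.101), j = 1

eps = 1e-6
fd = (F(y + eps * z) - F(y - eps * z)) / (2 * eps)   # central differences
assert np.max(np.abs(DF_z - fd)) < 1e-8              # sup-norm comparison
```

The sup-norm comparison over the grid mimics the $L^\infty$ convergence of the remainder in (2.102); a discrete grid is of course only a proxy for the essential supremum.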

3. Optimal control setting

3.1. Weak derivatives.
Definition. Let $\mathcal{D}(0,T;\mathbb{R}^n)$ denote the set of $C^\infty$ functions with compact support in $(0,T)$ and values in $\mathbb{R}^n$, and let $y \in L^1(0,T;\mathbb{R}^n)$. We say that $g \in L^1(0,T;\mathbb{R}^n)$ is the weak derivative of $y$ if it satisfies
(2.109) $\int_0^T y(t) \cdot \dot\varphi(t)\,dt + \int_0^T g(t) \cdot \varphi(t)\,dt = 0$, for all $\varphi \in \mathcal{D}(0,T;\mathbb{R}^n).$
Lemma. The weak derivative is unique: given $y \in L^1(0,T;\mathbb{R}^n)$, there is at most one $g \in L^1(0,T;\mathbb{R}^n)$ satisfying (2.109).
Proof. (a) It is enough to consider the scalar case $n = 1$. Let $g$ and $g'$ be weak derivatives of $y$. Then $\bar g := g - g'$ satisfies
(2.110) $\int_0^T \bar g(t)\varphi(t)\,dt = 0$, for all $\varphi \in \mathcal{D}(0,T;\mathbb{R}).$
A quick conclusion is obtained in the case when $\bar g \in L^2(0,T)$: since $\mathcal{D}(0,T;\mathbb{R})$ is known to be a dense subset of $L^2(0,T)$, there exists a sequence $\varphi_k \in \mathcal{D}(0,T;\mathbb{R})$ converging to $\bar g$ in $L^2(0,T)$, so that
(2.111) $\int_0^T \bar g(t)^2\,dt = \lim_k \int_0^T \bar g(t)\varphi_k(t)\,dt = 0,$
which implies that $\bar g = 0$, so that the result holds.
(b) In the general case, observe that if $\bar g \ne 0$, there exist $\varepsilon > 0$ and a measurable subset $A$ of $(0,T)$, of positive measure, such that (changing $\bar g$ into $-\bar g$ if necessary) $\bar g(t) > \varepsilon$ a.e. on $A$. It is known that there exists a compact set $K \subset A$ with positive measure, see [7, Ch. 1]. Obviously we can take $K$ as a subset of $[\varepsilon_1, T - \varepsilon_1]$ for some $\varepsilon_1 > 0$. Set $\Psi_k(t) := (1 - k\,\mathrm{dist}_K(t))_+$, where $\mathrm{dist}_K$ denotes the distance to the set $K$. By the dominated convergence theorem, $\int_0^T \bar g(t)\Psi_k(t)\,dt \to \int_K \bar g(t)\,dt > 0$. So there exists $k$ such that $\int_0^T \bar g(t)\Psi_k(t)\,dt > 0$. Observe that $\Psi_k$ is continuous and (taking $k$ large enough) has compact support in $(0,T)$. So there exists a sequence $\varphi_\ell$ in $\mathcal{D}(0,T;\mathbb{R})$ that converges uniformly to $\Psi_k$. By the dominated convergence theorem,
(2.112) $0 < \int_0^T \bar g(t)\Psi_k(t)\,dt = \lim_\ell \int_0^T \bar g(t)\varphi_\ell(t)\,dt,$
but the r.h.s. is zero by the definition, which gives the desired contradiction.
Conversely, a given weak derivative is the weak derivative of a unique function up to a constant, as the following result shows; we give a proof in the $L^2$ setting.
Lemma 2.49 (Du Bois-Reymond). Let $y \in L^2(0,T;\mathbb{R}^n)$ be such that
(2.113) $\int_0^T y(t) \cdot \dot\varphi(t)\,dt = 0$, for all $\varphi \in \mathcal{D}(0,T;\mathbb{R}^n).$
Then $y$ is constant.
Proof. We may assume that $n = 1$. If $y$ satisfies (2.113), then so does the function $y - (1/T)\int_0^T y(t)\,dt$. So, we may assume that $\int_0^T y(t)\,dt = 0$. Since $\mathcal{D}(0,T)$ is a dense subset of $L^2(0,T)$, there exists a sequence $\psi_k$ in $\mathcal{D}(0,T)$ converging to $y$ in $L^2(0,T)$, so that $\alpha_k := \int_0^T \psi_k(t)\,dt \to 0$. Let $\eta \in \mathcal{D}(0,T)$ have a unit integral. Then $\hat\psi_k(t) := \psi_k(t) - \alpha_k \eta(t)$ is another sequence in $\mathcal{D}(0,T)$ converging to $y$ in $L^2(0,T)$, with zero integral. The primitives $\varphi_k(t) := \int_0^t \hat\psi_k(s)\,ds$ also belong to $\mathcal{D}(0,T)$, so that by hypothesis
(2.114) $0 = \lim_k \int_0^T y(t)\dot\varphi_k(t)\,dt = \lim_k \int_0^T y(t)\hat\psi_k(t)\,dt = \int_0^T y(t)^2\,dt.$
The conclusion follows.
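As a concrete instance of definition (2.109) (our own illustration, not from the notes): the kink $y(t) = |t - T/2|$ has weak derivative $g(t) = \operatorname{sign}(t - T/2)$, and the identity can be checked by quadrature against the classical bump test function $\varphi(t) = e^{-1/(t(T-t))}$, extended by $0$ outside $(0,T)$.

```python
import numpy as np

# Check (2.109) for y(t) = |t - T/2|, g(t) = sign(t - T/2), against the
# bump phi(t) = exp(-1/(t*(T-t))), which has compact support in (0, T).
T = 1.0
t = np.linspace(1e-9, T - 1e-9, 200001)     # avoid the endpoints
y = np.abs(t - T / 2)
g = np.sign(t - T / 2)
phi = np.exp(-1.0 / (t * (T - t)))
phi_dot = phi * (T - 2 * t) / (t * (T - t)) ** 2   # chain rule on the bump

# Trapezoidal quadrature of y*phi_dot + g*phi; by (2.109) it should vanish.
integrand = y * phi_dot + g * phi
lhs = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))
assert abs(lhs) < 1e-6
```

The quadrature residual is far smaller than either integral taken separately, reflecting the exact cancellation ensured by integration by parts (the boundary terms vanish since $\varphi$ has compact support).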

For $s \in [1,\infty]$ we define the Sobolev space
(2.115) $W^{1,s}(0,T;\mathbb{R}^n) := \{y \in L^s(0,T;\mathbb{R}^n);\ \text{there exists } \dot y \in L^s(0,T;\mathbb{R}^n)\},$
where here $\dot y$ denotes the weak derivative of $y$, endowed with the norm
(2.116) $\|y\|_{W^{1,s}(0,T;\mathbb{R}^n)} := \|y\|_{L^s(0,T;\mathbb{R}^n)} + \|\dot y\|_{L^s(0,T;\mathbb{R}^n)}.$
One easily checks that $W^{1,s}(0,T;\mathbb{R}^n)$ is a Banach space. The following is well known, see Royden [33, Ch. 5]:
Lemma 2.50. Let $y \in L^1(0,T;\mathbb{R}^n)$ and $s \in [1,\infty]$. Then $y$ is the primitive of a function $g$ in $L^s(0,T;\mathbb{R}^n)$ iff $y \in W^{1,s}(0,T;\mathbb{R}^n)$, and then $g$ is the weak derivative of $y$.

Controlled dynamical system and associated cost. Consider a controlled dynamical system of the type
(2.117) (i) $\dot y(t) = f(t, u(t), y(t))$, for a.a. $t \in [0,T]$; (ii) $y(0) = y^0.$
The data are the horizon (or final time) $T > 0$, the initial condition $y^0 \in \mathbb{R}^n$, and the dynamics
(2.118) $f: \mathbb{R} \times \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}^n$, uniformly quasi-$C^r$, $r \ge 1$, Lipschitz w.r.t. $y$.
We call $u(t)$ and $y(t)$ the control and state at time $t$. The control and state spaces are
(2.119) $U := L^\infty(0,T;\mathbb{R}^m); \quad Y := W^{1,\infty}(0,T;\mathbb{R}^n).$
We may see (2.117)(i) as an equality in $L^\infty(0,T)^n$. By the Cauchy-Lipschitz theorem (adapted to the uniformly quasi-$C^r$ setting, with the same proof based on a fixed-point theorem), for all $(u, y^0) \in U \times \mathbb{R}^n$, the state equation (2.117) has a unique solution in $Y$, denoted by $y[u, y^0]$ (or $y[u]$ if $y^0$ is fixed). In addition, if $f$ is uniformly Lipschitz in $u$, by Gronwall's lemma, for some $C_f$ depending only on the Lipschitz constant of $f$:
(2.120) $\|y[u', (y^0)'] - y[u, y^0]\|_\infty \le C_f \left( \|u' - u\|_1 + |(y^0)' - y^0| \right).$
We denote by $z[v, z^0]$, or $z[v]$ if $z^0 = 0$, the unique solution of the linearized state equation
(2.121) (i) $\dot z(t) = f'(t, u(t), y(t))(v(t), z(t))$, for a.a. $t \in [0,T]$; (ii) $z(0) = z^0.$
The mapping $(v, z^0) \mapsto z[v, z^0]$ is well defined and continuous from $U \times \mathbb{R}^n$ to $Y$. It has a continuous extension^3 from $L^s(0,T;\mathbb{R}^m) \times \mathbb{R}^n$ into $W^{1,s}(0,T;\mathbb{R}^n)$, for any $s$ in $[1,\infty]$.
Proposition. The mapping $U \times \mathbb{R}^n \to Y$, $(u, y^0) \mapsto y[u, y^0]$, is of class $C^r$.
Proof.
Let $\mathcal{F}$ be the mapping from $U \times Y \times \mathbb{R}^n$ to $L^\infty(0,T;\mathbb{R}^n) \times \mathbb{R}^n$ that with $(u, y, y^0)$ associates the state equation (2.117), that is,
(2.122) $\mathcal{F}(u, y, y^0) := \begin{pmatrix} \dot y(t) - f(t, u(t), y(t)),\ t \in (0,T) \\ y(0) - y^0 \end{pmatrix}.$
By lemma 2.46, $\mathcal{F}$ is of class $C^r$. So the conclusion will follow from the implicit function theorem, provided that the partial derivative of $\mathcal{F}$ w.r.t. the state is invertible. This holds iff, for any $(g, e) \in L^\infty(0,T;\mathbb{R}^n) \times \mathbb{R}^n$, the following variant of the linearized state equation (2.121) has a unique solution in $Y$:
(2.123) (i) $\dot z(t) = D_y f(t, u(t), y(t)) z(t) + g(t)$, for a.a. $t \in [0,T]$; (ii) $z(0) = e$,
which obviously is the case. The conclusion follows.

^3 Let $X$ and $Y$ be Banach spaces and $E$ be a dense vector subspace of $X$. Let $A$ be a linear mapping $E \to Y$, such that for some $c > 0$, $\|Ax\|_Y \le c\|x\|_X$, for all $x \in E$. Then there exists a unique $\bar A \in L(X,Y)$ that extends $A$, i.e., such that $\bar A x = Ax$ for all $x \in E$, and $\bar A$ satisfies $\|\bar A\| \le c$. We call $\bar A$ the continuous extension of $A$.
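A minimal numerical sketch of the state map $(u, y^0) \mapsto y[u, y^0]$ of (2.117) is given below. It is our own illustration, not part of the notes: explicit Euler on the linear dynamics $f(t, u, y) = -y + u(t)$ with $u(t) = \cos t$, for which the exact solution $y(t) = (y^0 - \tfrac12)e^{-t} + \tfrac12(\cos t + \sin t)$ is available for comparison.

```python
import numpy as np

# Explicit Euler on (2.117)(i)-(ii) for f(t, u, y) = -y + u(t).
def state(u, y0, T=1.0, N=100000):
    t = np.linspace(0.0, T, N + 1)
    h = T / N
    y = np.empty(N + 1)
    y[0] = y0                                   # (2.117)(ii)
    for k in range(N):
        y[k + 1] = y[k] + h * (-y[k] + u(t[k]))  # Euler step on (2.117)(i)
    return t, y

t, y = state(np.cos, y0=2.0)
# Exact solution of y' = -y + cos(t), y(0) = 2:
y_exact = (2.0 - 0.5) * np.exp(-t) + 0.5 * (np.cos(t) + np.sin(t))
assert np.max(np.abs(y - y_exact)) < 1e-4       # O(h) Euler error
```

The sup-norm error behaves like $O(h)$, consistent with the fact that the state map takes values in $Y = W^{1,\infty}(0,T;\mathbb{R}^n)$ and that the dynamics here are Lipschitz in $y$, as required by (2.118).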

Second-order optimality conditions for state-constrained optimal control problems

Second-order optimality conditions for state-constrained optimal control problems Second-order optimality conditions for state-constrained optimal control problems J. Frédéric Bonnans INRIA-Saclay & CMAP, Ecole Polytechnique, France Joint work with Audrey Hermant, INRIA-Saclay & CMAP

More information

2. Dual space is essential for the concept of gradient which, in turn, leads to the variational analysis of Lagrange multipliers.

2. Dual space is essential for the concept of gradient which, in turn, leads to the variational analysis of Lagrange multipliers. Chapter 3 Duality in Banach Space Modern optimization theory largely centers around the interplay of a normed vector space and its corresponding dual. The notion of duality is important for the following

More information

1 Directional Derivatives and Differentiability

1 Directional Derivatives and Differentiability Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=

More information

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability... Functional Analysis Franck Sueur 2018-2019 Contents 1 Metric spaces 1 1.1 Definitions........................................ 1 1.2 Completeness...................................... 3 1.3 Compactness......................................

More information

1. Bounded linear maps. A linear map T : E F of real Banach

1. Bounded linear maps. A linear map T : E F of real Banach DIFFERENTIABLE MAPS 1. Bounded linear maps. A linear map T : E F of real Banach spaces E, F is bounded if M > 0 so that for all v E: T v M v. If v r T v C for some positive constants r, C, then T is bounded:

More information

Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping.

Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping. Minimization Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping. 1 Minimization A Topological Result. Let S be a topological

More information

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization MATHEMATICS OF OPERATIONS RESEARCH Vol. 29, No. 3, August 2004, pp. 479 491 issn 0364-765X eissn 1526-5471 04 2903 0479 informs doi 10.1287/moor.1040.0103 2004 INFORMS Some Properties of the Augmented

More information

Nonlinear equations. Norms for R n. Convergence orders for iterative methods

Nonlinear equations. Norms for R n. Convergence orders for iterative methods Nonlinear equations Norms for R n Assume that X is a vector space. A norm is a mapping X R with x such that for all x, y X, α R x = = x = αx = α x x + y x + y We define the following norms on the vector

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

INVERSE FUNCTION THEOREM and SURFACES IN R n

INVERSE FUNCTION THEOREM and SURFACES IN R n INVERSE FUNCTION THEOREM and SURFACES IN R n Let f C k (U; R n ), with U R n open. Assume df(a) GL(R n ), where a U. The Inverse Function Theorem says there is an open neighborhood V U of a in R n so that

More information

Second-order necessary conditions in Pontryagin form for optimal control problems

Second-order necessary conditions in Pontryagin form for optimal control problems Second-order necessary conditions in Pontryagin form for optimal control problems J. Frédéric Bonnans Xavier Dupuis Laurent Pfeiffer May 2013 Abstract In this paper, we state and prove first- and second-order

More information

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product Chapter 4 Hilbert Spaces 4.1 Inner Product Spaces Inner Product Space. A complex vector space E is called an inner product space (or a pre-hilbert space, or a unitary space) if there is a mapping (, )

More information

An introduction to Mathematical Theory of Control

An introduction to Mathematical Theory of Control An introduction to Mathematical Theory of Control Vasile Staicu University of Aveiro UNICA, May 2018 Vasile Staicu (University of Aveiro) An introduction to Mathematical Theory of Control UNICA, May 2018

More information

Optimization and Optimal Control in Banach Spaces

Optimization and Optimal Control in Banach Spaces Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,

More information

Semi-infinite programming, duality, discretization and optimality conditions

Semi-infinite programming, duality, discretization and optimality conditions Semi-infinite programming, duality, discretization and optimality conditions Alexander Shapiro School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0205,

More information

Université de Metz. Master 2 Recherche de Mathématiques 2ème semestre. par Ralph Chill Laboratoire de Mathématiques et Applications de Metz

Université de Metz. Master 2 Recherche de Mathématiques 2ème semestre. par Ralph Chill Laboratoire de Mathématiques et Applications de Metz Université de Metz Master 2 Recherche de Mathématiques 2ème semestre Systèmes gradients par Ralph Chill Laboratoire de Mathématiques et Applications de Metz Année 26/7 1 Contents Chapter 1. Introduction

More information

Introduction to Optimization Techniques. Nonlinear Optimization in Function Spaces

Introduction to Optimization Techniques. Nonlinear Optimization in Function Spaces Introduction to Optimization Techniques Nonlinear Optimization in Function Spaces X : T : Gateaux and Fréchet Differentials Gateaux and Fréchet Differentials a vector space, Y : a normed space transformation

More information

CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008.

CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008. 1 ECONOMICS 594: LECTURE NOTES CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS W. Erwin Diewert January 31, 2008. 1. Introduction Many economic problems have the following structure: (i) a linear function

More information

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9 MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended

More information

Half of Final Exam Name: Practice Problems October 28, 2014

Half of Final Exam Name: Practice Problems October 28, 2014 Math 54. Treibergs Half of Final Exam Name: Practice Problems October 28, 24 Half of the final will be over material since the last midterm exam, such as the practice problems given here. The other half

More information

Functional Analysis. Martin Brokate. 1 Normed Spaces 2. 2 Hilbert Spaces The Principle of Uniform Boundedness 32

Functional Analysis. Martin Brokate. 1 Normed Spaces 2. 2 Hilbert Spaces The Principle of Uniform Boundedness 32 Functional Analysis Martin Brokate Contents 1 Normed Spaces 2 2 Hilbert Spaces 2 3 The Principle of Uniform Boundedness 32 4 Extension, Reflexivity, Separation 37 5 Compact subsets of C and L p 46 6 Weak

More information

Introduction to Functional Analysis

Introduction to Functional Analysis Introduction to Functional Analysis Carnegie Mellon University, 21-640, Spring 2014 Acknowledgements These notes are based on the lecture course given by Irene Fonseca but may differ from the exact lecture

More information

Optimal control of ordinary differential equations 1

Optimal control of ordinary differential equations 1 Optimal control of ordinary differential equations 1 J. Frédéric Bonnans 2 August 11, 26 1 Lecture notes, CIMPA School on Optimization and Control, Castro Urdiales, August 28 - September 8, 26. 2 INRIA-Futurs

More information

Analysis Comprehensive Exam Questions Fall F(x) = 1 x. f(t)dt. t 1 2. tf 2 (t)dt. and g(t, x) = 2 t. 2 t

Analysis Comprehensive Exam Questions Fall F(x) = 1 x. f(t)dt. t 1 2. tf 2 (t)dt. and g(t, x) = 2 t. 2 t Analysis Comprehensive Exam Questions Fall 2. Let f L 2 (, ) be given. (a) Prove that ( x 2 f(t) dt) 2 x x t f(t) 2 dt. (b) Given part (a), prove that F L 2 (, ) 2 f L 2 (, ), where F(x) = x (a) Using

More information

1. Introduction. Consider the following parameterized optimization problem:

1. Introduction. Consider the following parameterized optimization problem: SIAM J. OPTIM. c 1998 Society for Industrial and Applied Mathematics Vol. 8, No. 4, pp. 940 946, November 1998 004 NONDEGENERACY AND QUANTITATIVE STABILITY OF PARAMETERIZED OPTIMIZATION PROBLEMS WITH MULTIPLE

More information

Introduction. Chapter 1. Contents. EECS 600 Function Space Methods in System Theory Lecture Notes J. Fessler 1.1

Introduction. Chapter 1. Contents. EECS 600 Function Space Methods in System Theory Lecture Notes J. Fessler 1.1 Chapter 1 Introduction Contents Motivation........................................................ 1.2 Applications (of optimization).............................................. 1.2 Main principles.....................................................

More information

ON WEAKLY NONLINEAR BACKWARD PARABOLIC PROBLEM

ON WEAKLY NONLINEAR BACKWARD PARABOLIC PROBLEM ON WEAKLY NONLINEAR BACKWARD PARABOLIC PROBLEM OLEG ZUBELEVICH DEPARTMENT OF MATHEMATICS THE BUDGET AND TREASURY ACADEMY OF THE MINISTRY OF FINANCE OF THE RUSSIAN FEDERATION 7, ZLATOUSTINSKY MALIY PER.,

More information

On duality theory of conic linear problems

On duality theory of conic linear problems On duality theory of conic linear problems Alexander Shapiro School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 3332-25, USA e-mail: ashapiro@isye.gatech.edu

More information

Elements of Convex Optimization Theory

Elements of Convex Optimization Theory Elements of Convex Optimization Theory Costis Skiadas August 2015 This is a revised and extended version of Appendix A of Skiadas (2009), providing a self-contained overview of elements of convex optimization

More information

Parcours OJD, Ecole Polytechnique et Université Pierre et Marie Curie 05 Mai 2015

Parcours OJD, Ecole Polytechnique et Université Pierre et Marie Curie 05 Mai 2015 Examen du cours Optimisation Stochastique Version 06/05/2014 Mastère de Mathématiques de la Modélisation F. Bonnans Parcours OJD, Ecole Polytechnique et Université Pierre et Marie Curie 05 Mai 2015 Authorized

More information

TD 1: Hilbert Spaces and Applications

TD 1: Hilbert Spaces and Applications Université Paris-Dauphine Functional Analysis and PDEs Master MMD-MA 2017/2018 Generalities TD 1: Hilbert Spaces and Applications Exercise 1 (Generalized Parallelogram law). Let (H,, ) be a Hilbert space.

More information

A metric space X is a non-empty set endowed with a metric ρ (x, y):

A metric space X is a non-empty set endowed with a metric ρ (x, y): Chapter 1 Preliminaries References: Troianiello, G.M., 1987, Elliptic differential equations and obstacle problems, Plenum Press, New York. Friedman, A., 1982, Variational principles and free-boundary

More information

ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS

ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS MATHEMATICS OF OPERATIONS RESEARCH Vol. 28, No. 4, November 2003, pp. 677 692 Printed in U.S.A. ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS ALEXANDER SHAPIRO We discuss in this paper a class of nonsmooth

More information

DUALITY, OPTIMALITY CONDITIONS AND PERTURBATION ANALYSIS

DUALITY, OPTIMALITY CONDITIONS AND PERTURBATION ANALYSIS 1 DUALITY, OPTIMALITY CONDITIONS AND PERTURBATION ANALYSIS Alexander Shapiro 1 School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0205, USA, E-mail: ashapiro@isye.gatech.edu

More information

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure? MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due 9/5). Prove that every countable set A is measurable and µ(a) = 0. 2 (Bonus). Let A consist of points (x, y) such that either x or y is

More information

AN EFFECTIVE METRIC ON C(H, K) WITH NORMAL STRUCTURE. Mona Nabiei (Received 23 June, 2015)

AN EFFECTIVE METRIC ON C(H, K) WITH NORMAL STRUCTURE. Mona Nabiei (Received 23 June, 2015) NEW ZEALAND JOURNAL OF MATHEMATICS Volume 46 (2016), 53-64 AN EFFECTIVE METRIC ON C(H, K) WITH NORMAL STRUCTURE Mona Nabiei (Received 23 June, 2015) Abstract. This study first defines a new metric with

More information

Obstacle problems and isotonicity

Obstacle problems and isotonicity Obstacle problems and isotonicity Thomas I. Seidman Revised version for NA-TMA: NA-D-06-00007R1+ [June 6, 2006] Abstract For variational inequalities of an abstract obstacle type, a comparison principle

More information

Real Analysis, 2nd Edition, G.B.Folland Elements of Functional Analysis

Real Analysis, 2nd Edition, G.B.Folland Elements of Functional Analysis Real Analysis, 2nd Edition, G.B.Folland Chapter 5 Elements of Functional Analysis Yung-Hsiang Huang 5.1 Normed Vector Spaces 1. Note for any x, y X and a, b K, x+y x + y and by ax b y x + b a x. 2. It

More information

Constrained Optimization

Constrained Optimization 1 / 22 Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University March 30, 2015 2 / 22 1. Equality constraints only 1.1 Reduced gradient 1.2 Lagrange

More information

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space.

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space. Chapter 1 Preliminaries The purpose of this chapter is to provide some basic background information. Linear Space Hilbert Space Basic Principles 1 2 Preliminaries Linear Space The notion of linear space

More information

Nonlinear representation, backward SDEs, and application to the Principal-Agent problem

Nonlinear representation, backward SDEs, and application to the Principal-Agent problem Nonlinear representation, backward SDEs, and application to the Principal-Agent problem Ecole Polytechnique, France April 4, 218 Outline The Principal-Agent problem Formulation 1 The Principal-Agent problem

More information

Sobolev Spaces. Chapter 10

Sobolev Spaces. Chapter 10 Chapter 1 Sobolev Spaces We now define spaces H 1,p (R n ), known as Sobolev spaces. For u to belong to H 1,p (R n ), we require that u L p (R n ) and that u have weak derivatives of first order in L p

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 4. Subgradient

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 4. Subgradient Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 4 Subgradient Shiqian Ma, MAT-258A: Numerical Optimization 2 4.1. Subgradients definition subgradient calculus duality and optimality conditions Shiqian

More information

Convexity in R n. The following lemma will be needed in a while. Lemma 1 Let x E, u R n. If τ I(x, u), τ 0, define. f(x + τu) f(x). τ.

Convexity in R n. The following lemma will be needed in a while. Lemma 1 Let x E, u R n. If τ I(x, u), τ 0, define. f(x + τu) f(x). τ. Convexity in R n Let E be a convex subset of R n. A function f : E (, ] is convex iff f(tx + (1 t)y) (1 t)f(x) + tf(y) x, y E, t [0, 1]. A similar definition holds in any vector space. A topology is needed

More information

Topological vectorspaces

Topological vectorspaces (July 25, 2011) Topological vectorspaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ Natural non-fréchet spaces Topological vector spaces Quotients and linear maps More topological

More information

Calculus of Variations. Final Examination

Calculus of Variations. Final Examination Université Paris-Saclay M AMS and Optimization January 18th, 018 Calculus of Variations Final Examination Duration : 3h ; all kind of paper documents (notes, books...) are authorized. The total score of

More information

B. Appendix B. Topological vector spaces

B. Appendix B. Topological vector spaces B.1 B. Appendix B. Topological vector spaces B.1. Fréchet spaces. In this appendix we go through the definition of Fréchet spaces and their inductive limits, such as they are used for definitions of function

More information

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0 Numerical Analysis 1 1. Nonlinear Equations This lecture note excerpted parts from Michael Heath and Max Gunzburger. Given function f, we seek value x for which where f : D R n R n is nonlinear. f(x) =

More information

THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS

THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS RALPH HOWARD DEPARTMENT OF MATHEMATICS UNIVERSITY OF SOUTH CAROLINA COLUMBIA, S.C. 29208, USA HOWARD@MATH.SC.EDU Abstract. This is an edited version of a

More information

Analysis Finite and Infinite Sets The Real Numbers The Cantor Set

Analysis Finite and Infinite Sets The Real Numbers The Cantor Set Analysis Finite and Infinite Sets Definition. An initial segment is {n N n n 0 }. Definition. A finite set can be put into one-to-one correspondence with an initial segment. The empty set is also considered

More information

THE INVERSE FUNCTION THEOREM

THE INVERSE FUNCTION THEOREM THE INVERSE FUNCTION THEOREM W. PATRICK HOOPER The implicit function theorem is the following result: Theorem 1. Let f be a C 1 function from a neighborhood of a point a R n into R n. Suppose A = Df(a)

More information

David Hilbert was old and partly deaf in the nineteen thirties. Yet being a diligent

David Hilbert was old and partly deaf in the nineteen thirties. Yet being a diligent Chapter 5 ddddd dddddd dddddddd ddddddd dddddddd ddddddd Hilbert Space The Euclidean norm is special among all norms defined in R n for being induced by the Euclidean inner product (the dot product). A

More information

ASYMPTOTICALLY NONEXPANSIVE MAPPINGS IN MODULAR FUNCTION SPACES ABSTRACT

ASYMPTOTICALLY NONEXPANSIVE MAPPINGS IN MODULAR FUNCTION SPACES ABSTRACT ASYMPTOTICALLY NONEXPANSIVE MAPPINGS IN MODULAR FUNCTION SPACES T. DOMINGUEZ-BENAVIDES, M.A. KHAMSI AND S. SAMADI ABSTRACT In this paper, we prove that if ρ is a convex, σ-finite modular function satisfying

More information

Perturbation Analysis of Optimization Problems

Perturbation Analysis of Optimization Problems Perturbation Analysis of Optimization Problems J. Frédéric Bonnans 1 and Alexander Shapiro 2 1 INRIA-Rocquencourt, Domaine de Voluceau, B.P. 105, 78153 Rocquencourt, France, and Ecole Polytechnique, France

More information

Inequality Constraints

Inequality Constraints Chapter 2 Inequality Constraints 2.1 Optimality Conditions Early in multivariate calculus we learn the significance of differentiability in finding minimizers. In this section we begin our study of the

More information

Applied Analysis (APPM 5440): Final exam 1:30pm 4:00pm, Dec. 14, Closed books.

Applied Analysis (APPM 5440): Final exam 1:30pm 4:00pm, Dec. 14, Closed books. Applied Analysis APPM 44: Final exam 1:3pm 4:pm, Dec. 14, 29. Closed books. Problem 1: 2p Set I = [, 1]. Prove that there is a continuous function u on I such that 1 ux 1 x sin ut 2 dt = cosx, x I. Define

More information

Course Summary Math 211

Course Summary Math 211 Course Summary Math 211 table of contents I. Functions of several variables. II. R n. III. Derivatives. IV. Taylor s Theorem. V. Differential Geometry. VI. Applications. 1. Best affine approximations.

More information

FUNCTIONAL ANALYSIS LECTURE NOTES: COMPACT SETS AND FINITE-DIMENSIONAL SPACES. 1. Compact Sets

FUNCTIONAL ANALYSIS LECTURE NOTES: COMPACT SETS AND FINITE-DIMENSIONAL SPACES. 1. Compact Sets FUNCTIONAL ANALYSIS LECTURE NOTES: COMPACT SETS AND FINITE-DIMENSIONAL SPACES CHRISTOPHER HEIL 1. Compact Sets Definition 1.1 (Compact and Totally Bounded Sets). Let X be a metric space, and let E X be

More information

We describe the generalization of Hazan s algorithm for symmetric programming

We describe the generalization of Hazan s algorithm for symmetric programming ON HAZAN S ALGORITHM FOR SYMMETRIC PROGRAMMING PROBLEMS L. FAYBUSOVICH Abstract. problems We describe the generalization of Hazan s algorithm for symmetric programming Key words. Symmetric programming,

More information

Mathematical Economics. Lecture Notes (in extracts)

Mathematical Economics. Lecture Notes (in extracts) Prof. Dr. Frank Werner Faculty of Mathematics Institute of Mathematical Optimization (IMO) http://math.uni-magdeburg.de/ werner/math-ec-new.html Mathematical Economics Lecture Notes (in extracts) Winter

More information

Eigenvalues and Eigenfunctions of the Laplacian

Eigenvalues and Eigenfunctions of the Laplacian The Waterloo Mathematics Review 23 Eigenvalues and Eigenfunctions of the Laplacian Mihai Nica University of Waterloo mcnica@uwaterloo.ca Abstract: The problem of determining the eigenvalues and eigenvectors

More information

Implicit Functions, Curves and Surfaces

Implicit Functions, Curves and Surfaces Chapter 11 Implicit Functions, Curves and Surfaces 11.1 Implicit Function Theorem Motivation. In many problems, objects or quantities of interest can only be described indirectly or implicitly. It is then

More information

Nonlinear stabilization via a linear observability

Nonlinear stabilization via a linear observability via a linear observability Kaïs Ammari Department of Mathematics University of Monastir Joint work with Fathia Alabau-Boussouira Collocated feedback stabilization Outline 1 Introduction and main result

More information

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure? MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due ). Show that the open disk x 2 + y 2 < 1 is a countable union of planar elementary sets. Show that the closed disk x 2 + y 2 1 is a countable

More information

Economics 204 Summer/Fall 2011 Lecture 5 Friday July 29, 2011

Economics 204 Summer/Fall 2011 Lecture 5 Friday July 29, 2011 Economics 204 Summer/Fall 2011 Lecture 5 Friday July 29, 2011 Section 2.6 (cont.) Properties of Real Functions Here we first study properties of functions from R to R, making use of the additional structure

More information

Local semiconvexity of Kantorovich potentials on non-compact manifolds

Local semiconvexity of Kantorovich potentials on non-compact manifolds Local semiconvexity of Kantorovich potentials on non-compact manifolds Alessio Figalli, Nicola Gigli Abstract We prove that any Kantorovich potential for the cost function c = d / on a Riemannian manifold

More information

Topological properties

Topological properties CHAPTER 4 Topological properties 1. Connectedness Definitions and examples Basic properties Connected components Connected versus path connected, again 2. Compactness Definition and first examples Topological

More information

Multivariable Calculus

Multivariable Calculus 2 Multivariable Calculus 2.1 Limits and Continuity Problem 2.1.1 (Fa94) Let the function f : R n R n satisfy the following two conditions: (i) f (K ) is compact whenever K is a compact subset of R n. (ii)

More information

Convex Optimization M2

Convex Optimization M2 Convex Optimization M2 Lecture 3 A. d Aspremont. Convex Optimization M2. 1/49 Duality A. d Aspremont. Convex Optimization M2. 2/49 DMs DM par email: dm.daspremont@gmail.com A. d Aspremont. Convex Optimization

More information

Topology, Math 581, Fall 2017 last updated: November 24, Topology 1, Math 581, Fall 2017: Notes and homework Krzysztof Chris Ciesielski

Topology, Math 581, Fall 2017 last updated: November 24, Topology 1, Math 581, Fall 2017: Notes and homework Krzysztof Chris Ciesielski Topology, Math 581, Fall 2017 last updated: November 24, 2017 1 Topology 1, Math 581, Fall 2017: Notes and homework Krzysztof Chris Ciesielski Class of August 17: Course and syllabus overview. Topology

More information

Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games

Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games Alberto Bressan ) and Khai T. Nguyen ) *) Department of Mathematics, Penn State University **) Department of Mathematics,

More information

On a general definition of transition waves and their properties

On a general definition of transition waves and their properties On a general definition of transition waves and their properties Henri Berestycki a and François Hamel b a EHESS, CAMS, 54 Boulevard Raspail, F-75006 Paris, France b Université Aix-Marseille III, LATP,

More information

Lecture 7 Monotonicity. September 21, 2008

Lecture 7 Monotonicity. September 21, 2008 Lecture 7 Monotonicity September 21, 2008 Outline Introduce several monotonicity properties of vector functions Are satisfied immediately by gradient maps of convex functions In a sense, role of monotonicity

More information

Nonlinear Control Systems

Nonlinear Control Systems Nonlinear Control Systems António Pedro Aguiar pedro@isr.ist.utl.pt 3. Fundamental properties IST-DEEC PhD Course http://users.isr.ist.utl.pt/%7epedro/ncs2012/ 2012 1 Example Consider the system ẋ = f

More information

Optimal Control. Macroeconomics II SMU. Ömer Özak (SMU) Economic Growth Macroeconomics II 1 / 112

Optimal Control. Macroeconomics II SMU. Ömer Özak (SMU) Economic Growth Macroeconomics II 1 / 112 Optimal Control Ömer Özak SMU Macroeconomics II Ömer Özak (SMU) Economic Growth Macroeconomics II 1 / 112 Review of the Theory of Optimal Control Section 1 Review of the Theory of Optimal Control Ömer

More information

Optimality Conditions for Constrained Optimization

Optimality Conditions for Constrained Optimization 72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

More information

Cubic regularization of Newton s method for convex problems with constraints

Cubic regularization of Newton s method for convex problems with constraints CORE DISCUSSION PAPER 006/39 Cubic regularization of Newton s method for convex problems with constraints Yu. Nesterov March 31, 006 Abstract In this paper we derive efficiency estimates of the regularized

More information

Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem

Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem 56 Chapter 7 Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem Recall that C(X) is not a normed linear space when X is not compact. On the other hand we could use semi

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introduction Functional analysis can be seen as a natural extension of the real analysis to more general spaces. As an example we can think at the Heine - Borel theorem (closed and bounded is

More information

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems Robert M. Freund February 2016 c 2016 Massachusetts Institute of Technology. All rights reserved. 1 1 Introduction

More information

Exercises: Brunn, Minkowski and convex pie

Exercises: Brunn, Minkowski and convex pie Lecture 1 Exercises: Brunn, Minkowski and convex pie Consider the following problem: 1.1 Playing a convex pie Consider the following game with two players - you and me. I am cooking a pie, which should

More information

1 Math 241A-B Homework Problem List for F2015 and W2016

1 Math 241A-B Homework Problem List for F2015 and W2016 1 Math 241A-B Homework Problem List for F2015 W2016 1.1 Homework 1. Due Wednesday, October 7, 2015 Notation 1.1 Let U be any set, g be a positive function on U, Y be a normed space. For any f : U Y let

More information

Introduction to Real Analysis Alternative Chapter 1

Introduction to Real Analysis Alternative Chapter 1 Christopher Heil Introduction to Real Analysis Alternative Chapter 1 A Primer on Norms and Banach Spaces Last Updated: March 10, 2018 c 2018 by Christopher Heil Chapter 1 A Primer on Norms and Banach Spaces

More information

A convergence result for an Outer Approximation Scheme

A convergence result for an Outer Approximation Scheme A convergence result for an Outer Approximation Scheme R. S. Burachik Engenharia de Sistemas e Computação, COPPE-UFRJ, CP 68511, Rio de Janeiro, RJ, CEP 21941-972, Brazil regi@cos.ufrj.br J. O. Lopes Departamento

More information

Recall that if X is a compact metric space, C(X), the space of continuous (real-valued) functions on X, is a Banach space with the norm

Recall that if X is a compact metric space, C(X), the space of continuous (real-valued) functions on X, is a Banach space with the norm Chapter 13 Radon Measures Recall that if X is a compact metric space, C(X), the space of continuous (real-valued) functions on X, is a Banach space with the norm (13.1) f = sup x X f(x). We want to identify

More information

Uniformly Uniformly-ergodic Markov chains and BSDEs

Uniformly Uniformly-ergodic Markov chains and BSDEs Uniformly Uniformly-ergodic Markov chains and BSDEs Samuel N. Cohen Mathematical Institute, University of Oxford (Based on joint work with Ying Hu, Robert Elliott, Lukas Szpruch) Centre Henri Lebesgue,

More information

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University February 7, 2007 2 Contents 1 Metric Spaces 1 1.1 Basic definitions...........................

More information

Optimal control of ordinary differential equations 1

Optimal control of ordinary differential equations 1 Optimal control of ordinary differential equations 1 J. Frédéric Bonnans 2 June 20, 2008 1 Lecture notes, CIMPA School on Optimization and Control, Castro Urdiales, August 28 - September 8, 2006. Revised

More information

l(y j ) = 0 for all y j (1)

l(y j ) = 0 for all y j (1) Problem 1. The closed linear span of a subset {y j } of a normed vector space is defined as the intersection of all closed subspaces containing all y j and thus the smallest such subspace. 1 Show that

More information

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005 3 Numerical Solution of Nonlinear Equations and Systems 3.1 Fixed point iteration Reamrk 3.1 Problem Given a function F : lr n lr n, compute x lr n such that ( ) F(x ) = 0. In this chapter, we consider

More information

Stochastic Optimization Lecture notes for part I. J. Frédéric Bonnans 1

Stochastic Optimization Lecture notes for part I. J. Frédéric Bonnans 1 Stochastic Optimization Lecture notes for part I Optimization Master, University Paris-Saclay Version of January 25, 2018 J. Frédéric Bonnans 1 1 Centre de Mathématiques Appliquées, Inria, Ecole Polytechnique,

More information

Review of Multi-Calculus (Study Guide for Spivak s CHAPTER ONE TO THREE)

Review of Multi-Calculus (Study Guide for Spivak s CHAPTER ONE TO THREE) Review of Multi-Calculus (Study Guide for Spivak s CHPTER ONE TO THREE) This material is for June 9 to 16 (Monday to Monday) Chapter I: Functions on R n Dot product and norm for vectors in R n : Let X

More information

Subgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus

Subgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus 1/41 Subgradient Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes definition subgradient calculus duality and optimality conditions directional derivative Basic inequality

More information

Corrections and additions for the book Perturbation Analysis of Optimization Problems by J.F. Bonnans and A. Shapiro. Version of March 28, 2013

Corrections and additions for the book Perturbation Analysis of Optimization Problems by J.F. Bonnans and A. Shapiro. Version of March 28, 2013 Corrections and additions for the book Perturbation Analysis of Optimization Problems by J.F. Bonnans and A. Shapiro Version of March 28, 2013 Some typos in the book that we noticed are of trivial nature

More information

Chap. 3. Controlled Systems, Controllability

Chap. 3. Controlled Systems, Controllability Chap. 3. Controlled Systems, Controllability 1. Controllability of Linear Systems 1.1. Kalman s Criterion Consider the linear system ẋ = Ax + Bu where x R n : state vector and u R m : input vector. A :

More information

An Integral-type Constraint Qualification for Optimal Control Problems with State Constraints

An Integral-type Constraint Qualification for Optimal Control Problems with State Constraints An Integral-type Constraint Qualification for Optimal Control Problems with State Constraints S. Lopes, F. A. C. C. Fontes and M. d. R. de Pinho Officina Mathematica report, April 4, 27 Abstract Standard

More information

An introduction to some aspects of functional analysis

An introduction to some aspects of functional analysis An introduction to some aspects of functional analysis Stephen Semmes Rice University Abstract These informal notes deal with some very basic objects in functional analysis, including norms and seminorms

More information

Definition 5.1. A vector field v on a manifold M is map M T M such that for all x M, v(x) T x M.

Definition 5.1. A vector field v on a manifold M is map M T M such that for all x M, v(x) T x M. 5 Vector fields Last updated: March 12, 2012. 5.1 Definition and general properties We first need to define what a vector field is. Definition 5.1. A vector field v on a manifold M is map M T M such that

More information

ALGEBRAIC GROUPS. Disclaimer: There are millions of errors in these notes!

ALGEBRAIC GROUPS. Disclaimer: There are millions of errors in these notes! ALGEBRAIC GROUPS Disclaimer: There are millions of errors in these notes! 1. Some algebraic geometry The subject of algebraic groups depends on the interaction between algebraic geometry and group theory.

More information

Necessary optimality conditions for optimal control problems with nonsmooth mixed state and control constraints

Necessary optimality conditions for optimal control problems with nonsmooth mixed state and control constraints Necessary optimality conditions for optimal control problems with nonsmooth mixed state and control constraints An Li and Jane J. Ye Abstract. In this paper we study an optimal control problem with nonsmooth

More information