Numerical Optimal Control Part 3: Function space methods
1 Numerical Optimal Control Part 3: Function space methods. SADCO Summer School and Workshop on Optimal and Model Predictive Control OMPC 2013, Bayreuth. Matthias Gerdts, Institute of Mathematics and Applied Computing, Department of Aerospace Engineering, Universität der Bundeswehr München (UniBw M), matthias.gerdts@unibw.de. Fotos (München): Magnus Manske (Panorama), Luidger (Theatinerkirche), Kurmis (Chin. Turm), Arad Mojtahedi (Olympiapark), Max-k (Deutsches Museum), Oliver Raupach (Friedensengel), Andreas Praefcke (Nationaltheater)
2 Schedule and Contents
9:00-10:30 Introduction, overview, examples, indirect method
10:30-11:00 Coffee break
11:00-12:30 Discretization techniques, structure exploitation, calculation of gradients, extensions: sensitivity analysis, mixed-integer optimal control
12:30-14:00 Lunch break
14:00-15:30 Function space methods: gradient and Newton type methods
15:30-16:00 Coffee break
16:00-17:30 Numerical experiments
3 Contents Introduction Necessary Conditions Adjoint Formalism Gradient Method Gradient Method in Finite Dimensions Gradient Method for Optimal Control Problems Extensions Gradient Method for Discrete Problem Examples Lagrange-Newton Method Lagrange-Newton Method in Finite Dimensions Lagrange-Newton Method in Infinite Dimensions Application to Optimal Control Search Direction Examples Extensions
5 Overview of Solution Methods
DAE Optimal Control Problem:
(a) Discretization: approximation by a finite dimensional problem
- Direct approach: methods for the discretized optimization problem. Reduced approach (direct shooting), full discretization (collocation); SQP methods, interior point methods, gradient methods, penalty methods, multiplier methods, dynamic programming.
- Indirect approach based on the finite dimensional optimality system: methods for finite dimensional complementarity problems and variational inequalities (semismooth Newton, Josephy-Newton, fixed point iteration, projection methods).
(b) Function space approach
- Direct approach: methods for the infinite dimensional optimization problem. Reduced approach, full approach; SQP methods, interior point methods, gradient methods, penalty methods, multiplier methods, dynamic programming.
- Indirect approach based on the infinite dimensional optimality system: semi-analytical methods (indirect method, boundary value problems); methods for infinite dimensional complementarity problems and variational inequalities (semismooth Newton, Josephy-Newton, fixed point iteration, projection methods).
6 Function Space Methods Paradigm
Analyze and develop methods for some infinite dimensional optimization problem (e.g. an optimal control problem) of type
Minimize $J(z)$ subject to $G(z) \in K$, $H(z) = 0$
in the same Banach or Hilbert spaces where $z$, $G$ and $H$ live.
Why is this useful?
- no immediate approximation error: algorithms work in the same spaces as the problem
- massive exploitation of structure possible
- subtle requirements can get lost in discretizations (e.g. smoothing operator for semismooth methods)
- methods can be very fast
What are the difficulties?
- detailed functional analytic background necessary (cannot be expected in an industrial context)
- discretizations become necessary at a lower level anyway; so, why not discretize right away?
- theoretical difficulties with, e.g., state constraints (multipliers are measures; how to handle them numerically?)
14 Functional Analysis
Banach space: complete normed vector space $(X, \|\cdot\|_X)$.
Hilbert space: complete vector space $X$ with inner product $\langle\cdot,\cdot\rangle : X \times X \to \mathbb{R}$.
Dual space $X^*$ of a vector space $X$:
$X^* := \{f : X \to \mathbb{R} \mid f \text{ linear and continuous}\}$, $\|f\|_{X^*} = \sup_{\|x\|_X = 1} |f(x)|$.
$(X^*, \|\cdot\|_{X^*})$ is a Banach space if $(X, \|\cdot\|_X)$ is a Banach space.
For Banach spaces $X$ and $Y$, the Fréchet derivative of $f : X \to Y$ at $\hat{x}$ is a linear and continuous operator $f'(\hat{x}) : X \to Y$ with the property $\|f(\hat{x} + h) - f(\hat{x}) - f'(\hat{x})h\|_Y = o(\|h\|_X)$. The Fréchet derivative of $f$ at $\hat{x}$ in direction $d$ is denoted by $f'(\hat{x})d$.
The Fréchet derivative of a function $f : X \times U \to Y$ at $(\hat{x}, \hat{u})$ is given by $f'(\hat{x}, \hat{u})(x, u) = f'_x(\hat{x}, \hat{u})x + f'_u(\hat{x}, \hat{u})u$, where $f'_x$ and $f'_u$ are the partial derivatives of $f$ w.r.t. $x$ and $u$, respectively.
19 Functional Analysis: Lebesgue spaces
For $1 \le p \le \infty$ the Lebesgue spaces are defined by $L^p(I, \mathbb{R}^n) := \{f : I \to \mathbb{R}^n \mid \|f\|_p < \infty\}$ with
$\|f\|_p := \left( \int_I \|f(t)\|^p \, dt \right)^{1/p}$ ($1 \le p < \infty$), $\|f\|_\infty := \operatorname{ess\,sup}_{t \in I} \|f(t)\|$.
Properties:
- $L^2(I, \mathbb{R}^n)$ is a Hilbert space with inner product $\langle f, g \rangle = \int_I f(t)^\top g(t) \, dt$
- $L^\infty(I, \mathbb{R}^n)$ is the dual space of $L^1(I, \mathbb{R}^n)$, but not vice versa
- $L^p(I, \mathbb{R}^n)$ and $L^q(I, \mathbb{R}^n)$ are dual to each other if $1/p + 1/q = 1$
20 Functional Analysis: Sobolev spaces
For $1 \le p \le \infty$ and $q \in \mathbb{N}$ the Sobolev spaces are defined by $W^{q,p}(I, \mathbb{R}^n) := \{f : I \to \mathbb{R}^n \mid \|f\|_{q,p} < \infty\}$ with
$\|f\|_{q,p} := \left( \sum_{j=0}^{q} \|f^{(j)}\|_p^p \right)^{1/p}$ ($1 \le p \le \infty$).
Properties:
- $W^{1,2}([t_0, t_f], \mathbb{R}^n)$ is a Hilbert space with inner product $\langle f, g \rangle = f(t_0)^\top g(t_0) + \int_{t_0}^{t_f} f'(t)^\top g'(t) \, dt$
- $W^{1,1}(I, \mathbb{R}^n)$ is the space of absolutely continuous functions, i.e. functions satisfying $f(t) = f(t_0) + \int_{t_0}^{t} f'(\tau) \, d\tau$ in $I = [t_0, t_f]$.
22 Infinite Optimization Problem
Throughout we restrict the discussion to the following class of optimization problems.
Problem (Infinite Optimization Problem (NLP))
Given: Banach spaces $(Z, \|\cdot\|_Z)$, $(Y, \|\cdot\|_Y)$, mappings $J : Z \to \mathbb{R}$, $H : Z \to Y$, and a convex set $S \subseteq Z$.
Minimize $J(z)$ subject to $z \in S$, $H(z) = 0$
23 Infinite Optimization Problem
Theorem (KKT Conditions)
Assumptions:
- $\hat{z}$ is a local minimum of NLP
- $J$ is Fréchet differentiable at $\hat{z}$, $H$ is continuously Fréchet differentiable at $\hat{z}$
- $S$ is convex with non-empty interior
- Mangasarian-Fromowitz constraint qualification (MFCQ): $H'(\hat{z})$ is surjective and there exists $d \in \operatorname{int}(S - \{\hat{z}\})$ with $H'(\hat{z})(d) = 0$
Then there exists a multiplier $\lambda^* \in Y^*$ such that
$J'(\hat{z})(z - \hat{z}) + \lambda^*\big(H'(\hat{z})(z - \hat{z})\big) \ge 0 \quad \forall z \in S$
For a proof see [1, Theorems 3.1, 4.1].
[1] S. Kurcyusz. On the Existence and Nonexistence of Lagrange Multipliers in Banach Spaces. Journal of Optimization Theory and Applications, 20(1):81-110, 1976.
24 Infinite Optimization Problem
Lagrange function: $L(z, \lambda^*) := J(z) + \lambda^*(H(z))$
Special cases:
- If $S = Z$, then $J'(\hat{z})(z) + \lambda^*(H'(\hat{z})(z)) = 0$ for all $z \in Z$, or equivalently $L'_z(\hat{z}, \lambda^*) = 0$.
- If $H$ is not present, then $J'(\hat{z})(z - \hat{z}) \ge 0$ for all $z \in S$. If $Z$ is a Hilbert space, then this condition is equivalent to the nonsmooth equation $\hat{z} = \Pi_S(\hat{z} - \alpha \nabla J(\hat{z}))$ ($\alpha > 0$, $\Pi_S$ the projection onto $S$).
- If $S = Z$ and $H$ is not present, then $J'(\hat{z})(z) = 0$ for all $z \in Z$.
29 Adjoint Formalism
Equality constrained NLP (EQ-NLP): Minimize $J(x, u)$ subject to $Ax = Bu$, with $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, and $J : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ differentiable.
Solution operator: Let $A$ be nonsingular. Then $Ax = Bu \iff x = A^{-1}Bu =: Su$. The mapping $S : \mathbb{R}^m \to \mathbb{R}^n$, $u \mapsto x = Su$, is called the solution operator.
Reduced objective functional: $J(x, u) = J(Su, u) =: j(u)$, $j : \mathbb{R}^m \to \mathbb{R}$.
Reduced NLP (R-NLP): Minimize $j(u) = J(Su, u)$ w.r.t. $u \in \mathbb{R}^m$.
33 Adjoint Formalism
Task: Compute the gradient of $j$ at a given $\hat{u}$, for instance in a gradient based optimization method or for the evaluation of necessary optimality conditions.
Differentiation yields
$j'(\hat{u}) = J'_x(S\hat{u}, \hat{u})S + J'_u(S\hat{u}, \hat{u}) = J'_x(S\hat{u}, \hat{u})A^{-1}B + J'_u(S\hat{u}, \hat{u})$
Computation of $A^{-1}$ is expensive. Try to avoid it! To this end define the adjoint vector $\lambda$ by
$\lambda^\top := J'_x(S\hat{u}, \hat{u})A^{-1}$
37 Adjoint Formalism
Adjoint equation: $A^\top \lambda = \nabla_x J(S\hat{u}, \hat{u})$
The gradient of $j$ at $\hat{u}$ reads
$j'(\hat{u}) = \lambda^\top B + J'_u(S\hat{u}, \hat{u})$
Connection to KKT conditions: Lagrange function of EQ-NLP: $L(x, u, \lambda) := J(x, u) + \lambda^\top(Bu - Ax)$
If $\lambda$ solves the adjoint equation, then
$L'_x(\hat{x}, \hat{u}, \lambda) = J'_x(\hat{x}, \hat{u}) - \lambda^\top A = 0$
This choice of $\lambda$ automatically satisfies one part of the KKT conditions. The second part
$0 = L'_u(\hat{x}, \hat{u}, \lambda) = J'_u(\hat{x}, \hat{u}) + \lambda^\top B = j'(\hat{u})$
is only satisfied at a stationary point $\hat{u}$ of the reduced problem R-NLP!
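The finite dimensional construction lends itself to a quick numerical check. The following NumPy sketch (the quadratic objective $J(x,u) = \frac12 x^\top x + \frac12 u^\top u$ and the random matrices are illustrative, not taken from the slides) computes $j'(\hat u)$ once through the explicit solution operator $S = A^{-1}B$ and once through the adjoint solve $A^\top \lambda = \nabla_x J$; both agree, but the adjoint route only needs one linear solve with $A$ and one with $A^\top$:

```python
import numpy as np

# Illustrative data (not from the slides): J(x,u) = 0.5*||x||^2 + 0.5*||u||^2
rng = np.random.default_rng(0)
n, m = 5, 3
A = rng.standard_normal((n, n)) + n * np.eye(n)  # diagonally dominated, nonsingular
B = rng.standard_normal((n, m))

def grad_j_direct(u):
    """j'(u) = J_x(Su,u) A^{-1} B + J_u(Su,u), forming A^{-1}B explicitly."""
    x = np.linalg.solve(A, B @ u)          # state: x = Su = A^{-1} B u
    return x @ np.linalg.solve(A, B) + u   # J_x = x^T, J_u = u^T

def grad_j_adjoint(u):
    """Same gradient via the adjoint equation A^T lam = grad_x J(Su,u)."""
    x = np.linalg.solve(A, B @ u)          # one state solve
    lam = np.linalg.solve(A.T, x)          # one adjoint solve
    return B.T @ lam + u                   # j'(u)^T = B^T lam + grad_u J

u = rng.standard_normal(m)
assert np.allclose(grad_j_direct(u), grad_j_adjoint(u))
```

For a single right-hand side both routes cost about the same; the adjoint version pays off as soon as $m$ is large or $A$ is only available through a solver.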
41 Adjoint Formalism
Let $X$ and $Y$ be Banach spaces and $X^*$ and $Y^*$ their topological dual spaces.
Adjoint operator: Let $A : X \to Y$ be a bounded linear operator. The operator $A^* : Y^* \to X^*$ with the property
$(A^* y^*)(x) = y^*(Ax) \quad \forall x \in X,\ y^* \in Y^*$
is called the adjoint operator (it is a bounded linear operator). Notation: $(A^* y^*, x) = (y^*, Ax)$ (dual pairing).
Equality constrained NLP (EQ-NLP): Minimize $J(x, u)$ subject to $Ax = Bu$, with $A : X \to Y$, $B : U \to Y$ bounded linear operators, $J : X \times U \to \mathbb{R}$ Fréchet differentiable, and $X$, $U$, $Y$ Banach spaces.
42 Adjoint Formalism
Solution operator: Let $A$ be nonsingular. Then $Ax = Bu \iff x = A^{-1}Bu =: Su$ with solution operator $S : U \to X$, $u \mapsto x = Su$.
Reduced objective functional: $J(x, u) = J(Su, u) =: j(u)$, $j : U \to \mathbb{R}$.
Reduced NLP (R-NLP): Minimize $j(u) = J(Su, u)$ w.r.t. $u \in U$.
45 Adjoint Formalism
Differentiation at $\hat{u}$ in direction $u$ yields
$j'(\hat{u})u = J'_x(S\hat{u}, \hat{u})Su + J'_u(S\hat{u}, \hat{u})u = J'_x(S\hat{u}, \hat{u})A^{-1}Bu + J'_u(S\hat{u}, \hat{u})u$
Computation of $A^{-1}$ is expensive. Try to avoid it! To this end define the adjoint $y^* \in Y^*$ by $y^* := J'_x(S\hat{u}, \hat{u})A^{-1}$, so that
$\underbrace{y^*(Ax)}_{=(A^* y^*)(x)} = J'_x(S\hat{u}, \hat{u})x$
Adjoint equation: $A^* y^* = J'_x(S\hat{u}, \hat{u})$ (operator equation in $X^*$)
The gradient of $j$ at $\hat{u}$ in direction $u$ reads
$j'(\hat{u})u = y^*(Bu) + J'_u(S\hat{u}, \hat{u})u$
49 Adjoint Formalism
Connection to KKT conditions: Lagrange function of EQ-NLP: $L(x, u, y^*) := J(x, u) + y^*(Bu - Ax)$
If $y^*$ solves the adjoint equation, then
$L'_x(\hat{x}, \hat{u}, y^*)(x) = J'_x(\hat{x}, \hat{u})x - y^*(Ax) = J'_x(\hat{x}, \hat{u})x - (A^* y^*)(x) = 0$
This choice of $y^*$ automatically satisfies one part of the KKT conditions. The second part
$0 = L'_u(\hat{x}, \hat{u}, y^*) = J'_u(\hat{x}, \hat{u}) + y^* \circ B = j'(\hat{u})$
is only satisfied at a stationary point $\hat{u}$ of the reduced problem R-NLP!
52 How to compute the adjoint operator?
Theorem (Riesz)
Given: Hilbert space $X$ with inner product $\langle \cdot, \cdot \rangle$ and dual space $X^*$. For every $f^* \in X^*$ there exists a unique $f \in X$ such that $\|f\|_X = \|f^*\|_{X^*}$ and $f^*(x) = \langle f, x \rangle$ for all $x \in X$.
Example (linear ODE)
Given: $I := [0, T]$ and the linear mapping
$A : W^{1,2}(I, \mathbb{R}^n) =: X \to Y := L^2(I, \mathbb{R}^n) \times \mathbb{R}^n$, $(Ax)(\cdot) = \big( x'(\cdot) - A(\cdot)x(\cdot),\ x(0) \big)$.
$X$ and $Y$ are Hilbert spaces. According to the Riesz theorem, for $y^* \in Y^*$ and $x^* \in X^*$ there exist $(\lambda, \sigma) \in Y$ and $\mu \in X$ with
$y^*(h, k) = \int_I \lambda(t)^\top h(t) \, dt + \sigma^\top k$ for $(h, k) \in Y$,
$x^*(x) = \mu(0)^\top x(0) + \int_I \mu'(t)^\top x'(t) \, dt$ for $x \in X$.
53 How to compute the adjoint operator?
Example (linear ODE, continued)
We intend to identify $\lambda$, $\sigma$, and $\mu$ such that $y^*(Ax) = (A^* y^*)(x)$ holds for all $x \in X$ and $y^* \in Y^*$. Note that $A^* y^* \in X^*$. By partial integration we obtain
$y^*(Ax) = \int_I \lambda(t)^\top \big( x'(t) - A(t)x(t) \big) \, dt + \sigma^\top x(0)$
$= \Big[ \Big( \int_t^T \lambda(s)^\top A(s) \, ds \Big) x(t) \Big]_{t=0}^{T} + \sigma^\top x(0) + \int_I \Big( \lambda(t)^\top - \int_t^T \lambda(s)^\top A(s) \, ds \Big) x'(t) \, dt$
$= \underbrace{\Big( \sigma^\top - \int_0^T \lambda(s)^\top A(s) \, ds \Big)}_{=\mu(0)^\top} x(0) + \int_I \underbrace{\Big( \lambda(t)^\top - \int_t^T \lambda(s)^\top A(s) \, ds \Big)}_{=\mu'(t)^\top} x'(t) \, dt$
$= (A^* y^*)(x)$.
The latter equation defines the adjoint operator since $y^*(Ax) = (A^* y^*)(x)$ holds for all $x \in X$, $y^* \in Y^*$.
58 How to compute the adjoint equation?
Example (OCP and adjoint equation)
Optimal control problem (assumption: $\varphi$ Fréchet differentiable):
Minimize $J(x, u) := \varphi(x(T))$ s.t. $\underbrace{x'(t) - A(t)x(t)}_{=A(x)(t)} = \underbrace{B(t)u(t)}_{=B(u)(t)}$, $x(0) = 0$.
Adjoint equation: For every $x \in W^{1,2}(I, \mathbb{R}^n)$ we have
$0 = (A^* y^*)(x) - J'_x(\hat{x}, \hat{u})(x)$
$= \Big( \sigma^\top - \int_0^T \lambda(s)^\top A(s) \, ds \Big) x(0) + \int_I \Big( \lambda(t)^\top - \int_t^T \lambda(s)^\top A(s) \, ds \Big) x'(t) \, dt - \varphi'(\hat{x}(T)) x(T)$
Application of the variation lemma (DuBois-Reymond) yields
$\lambda(t)^\top = \lambda(T)^\top + \int_t^T \lambda(s)^\top A(s) \, ds$
and $\lambda(0) = \sigma$, $\lambda(T)^\top = \varphi'(\hat{x}(T))$.
61 How to compute the adjoint equation?
Example (OCP and adjoint equation, continued)
Adjoint equation: $\lambda'(t) = -A(t)^\top \lambda(t)$, $\lambda(T) = \nabla\varphi(\hat{x}(T))$.
Gradient of the reduced objective functional $j(u) = J(x(u), u)$:
$j'(\hat{u})(u) = y^*(Bu) + J'_u(\hat{x}, \hat{u})u = \int_I \lambda(t)^\top B(t)u(t) \, dt + \sigma^\top 0 = \int_I \lambda(t)^\top B(t)u(t) \, dt$ ($u \in L^2(I, \mathbb{R}^m)$)
Stationary point:
$0 = j'(\hat{u})(u) = \int_I \lambda(t)^\top B(t)u(t) \, dt$ for all $u \in L^2(I, \mathbb{R}^m)$
implies $0 = B(t)^\top \lambda(t)$ a.e. in $I$.
63 Gradient Method in Finite Dimensions
Unconstrained minimization problem: Minimize $J(u)$ w.r.t. $u \in \mathbb{R}^n$, with $J : \mathbb{R}^n \to \mathbb{R}$ continuously differentiable.
Gradient Method for Finite Dimensional Problems
(0) Let $u^{(0)} \in \mathbb{R}^n$, $\beta \in (0, 1)$, $\sigma \in (0, 1)$, and $k := 0$.
(1) Compute $d^{(k)} := -\nabla J(u^{(k)})$.
(2) If $\|d^{(k)}\| \approx 0$, STOP.
(3) Perform line search: find the smallest $j \in \{0, 1, 2, \ldots\}$ with
$J(u^{(k)} + \beta^j d^{(k)}) \le J(u^{(k)}) - \sigma \beta^j \|\nabla J(u^{(k)})\|_2^2$
and set $\alpha_k := \beta^j$.
(4) Set $u^{(k+1)} := u^{(k)} + \alpha_k d^{(k)}$, $k := k + 1$, and go to (1).
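Steps (0)-(4) translate almost line by line into code. A minimal NumPy sketch (the quadratic test objective at the end is illustrative, not taken from the slides):

```python
import numpy as np

def gradient_method(J, gradJ, u0, beta=0.5, sigma=0.1, tol=1e-8, max_iter=1000):
    """Steepest descent with Armijo line search, following steps (0)-(4)."""
    u = np.asarray(u0, dtype=float)
    for _ in range(max_iter):
        d = -gradJ(u)                      # step (1): steepest descent direction
        g2 = d @ d                         # ||grad J(u)||_2^2
        if np.sqrt(g2) <= tol:             # step (2): stationarity test
            break
        alpha = 1.0                        # step (3): Armijo backtracking
        while J(u + alpha * d) > J(u) - sigma * alpha * g2:
            alpha *= beta
        u = u + alpha * d                  # step (4)
    return u

# illustrative problem: J(u) = 0.5 u^T Q u with minimizer u* = 0
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
u_star = gradient_method(lambda u: 0.5 * u @ Q @ u,
                         lambda u: Q @ u, [1.0, -1.0])
assert np.linalg.norm(u_star) < 1e-6
```

The backtracking loop terminates for any descent direction, which is what gives the method its global convergence.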
69 Gradient Method
Pros:
- requires only first derivatives
- easy to implement
- global convergence achieved by Armijo line search
- extension: projected gradient method for simple constraints
Cons:
- only linear convergence rate
- only unconstrained minimization (except simple bounds)
70 Gradient Method for Optimal Control Problems
Optimal control problem (OCP)
Given: $I := [t_0, t_f]$, $\bar{x} \in \mathbb{R}^{n_x}$, continuously differentiable functions $\varphi : \mathbb{R}^{n_x} \to \mathbb{R}$, $f_0 : \mathbb{R}^{n_x} \times \mathbb{R}^{n_u} \to \mathbb{R}$, $f : \mathbb{R}^{n_x} \times \mathbb{R}^{n_u} \to \mathbb{R}^{n_x}$.
Minimize $\Gamma(x, u) := \varphi(x(t_f)) + \int_I f_0(x(t), u(t)) \, dt$
w.r.t. $x \in W^{1,\infty}(I, \mathbb{R}^{n_x})$, $u \in L^\infty(I, \mathbb{R}^{n_u})$, subject to the constraints
$x'(t) = f(x(t), u(t))$ a.e. in $I$, $x(t_0) = \bar{x}$.
Define $X := W^{1,\infty}(I, \mathbb{R}^{n_x})$ and $U := L^\infty(I, \mathbb{R}^{n_u})$.
71 Gradient Method for Optimal Control Problems
Assumptions (existence of a solution operator): For every $u \in U$ the initial value problem
$x'(t) = f(x(t), u(t))$ a.e. in $I$, $x(t_0) = \bar{x}$
has a unique solution $x = S(u) \in X$, and the solution mapping $S : U \to X$ is continuously Fréchet differentiable.
Reduced optimal control problem (R-OCP): Minimize $J(u) := \Gamma(S(u), u)$ subject to $u \in U$.
The gradient method requires the gradient of the reduced objective function $J$ at some $\hat{u} \in U$. What does the gradient look like?
73 Computation of Gradient
In $\mathbb{R}^n$:
$\nabla J(\hat{u}) := J'(\hat{u})^\top = \Big( \frac{\partial J}{\partial u_1}(\hat{u}), \ldots, \frac{\partial J}{\partial u_n}(\hat{u}) \Big)^\top$, $J'(\hat{u})(u) = \nabla J(\hat{u})^\top u = \langle \nabla J(\hat{u}), u \rangle_{\mathbb{R}^n}$
In a Hilbert space setting, i.e. $J : U \to \mathbb{R}$ with $U$ a Hilbert space: $J'(\hat{u}) \in U^*$, and by the Riesz theorem there exists $\eta(\hat{u}) \in U$ with $J'(\hat{u})(u) = \langle \eta(\hat{u}), u \rangle_U$ for all $u \in U$. Hence $\nabla J(\hat{u}) := \eta(\hat{u}) \in U$ is the gradient of $J$ at $\hat{u}$.
But in our case $U = L^\infty(I, \mathbb{R}^{n_u})$ is not a Hilbert space, and hence the functional $J'(\hat{u}) \in U^*$ does not a priori have such a nice representation as in the above cases.
How to define the gradient in this case? Use the formal Lagrange technique (compare also the metric gradient, see [1]).
[1] M. Golomb and R. A. Tapia. The metric gradient in normed linear spaces. Numerische Mathematik, 20, 1972.
77 Computation of Gradient by Formal Lagrange Technique
Hamilton function: $H(x, u, \lambda) := f_0(x, u) + \lambda^\top f(x, u)$
Auxiliary functional ($\hat{u}$ and $\hat{x} = S(\hat{u})$ given; $\lambda$ to be specified later):
$\tilde{J}(\hat{u}) := J(\hat{u}) + \langle \lambda, f(\hat{x}, \hat{u}) - \hat{x}' \rangle_{L^2(I, \mathbb{R}^{n_x})}$
$= \varphi(\hat{x}(t_f)) + \int_I H(\hat{x}(t), \hat{u}(t), \lambda(t)) - \lambda(t)^\top \hat{x}'(t) \, dt$
$\overset{\text{p.i.}}{=} \varphi(\hat{x}(t_f)) - \big[ \lambda(t)^\top \hat{x}(t) \big]_{t_0}^{t_f} + \int_I H(\hat{x}(t), \hat{u}(t), \lambda(t)) + \lambda'(t)^\top \hat{x}(t) \, dt$
Fréchet derivative (exploit $S'(\hat{u})(u)(t_0) = 0$):
$\tilde{J}'(\hat{u})(u) = \big( \varphi'(\hat{x}(t_f)) - \lambda(t_f)^\top \big) S'(\hat{u})(u)(t_f) + \int_I \big( H_x[t] + \lambda'(t)^\top \big) S'(\hat{u})(u)(t) + H_u[t] u(t) \, dt$
Computation of $S'(\hat{u})$ is expensive! Eliminate the terms involving $S'(\hat{u})$ by requiring
$\lambda'(t) = -H_x(\hat{x}(t), \hat{u}(t), \lambda(t))^\top$, $\lambda(t_f) = \varphi'(\hat{x}(t_f))^\top$ (adjoint ODE)
82 Computation of Gradient
Gradient form:
$\tilde{J}'(\hat{u})(u) = \int_I H_u[t] u(t) \, dt = \langle H_u[\cdot]^\top, u \rangle_{L^2(I, \mathbb{R}^{n_u})}$
Theorem (steepest descent)
The direction $d(t) := -\frac{1}{\|H_u\|_2} H_u[t]^\top$ solves: Minimize $\tilde{J}'(\hat{u})(u)$ w.r.t. $u$ subject to $\|u\|_2 = 1$.
Proof: For every $u$ with $\|u\|_2 = 1$ we have, by the Cauchy-Schwarz inequality, $|\tilde{J}'(\hat{u})(u)| \le \|H_u\|_2 \|u\|_2 = \|H_u\|_2$, and for $d$ we have $\tilde{J}'(\hat{u})(d) = -\|H_u\|_2$.
83 Computation of Gradient
Relation between $\tilde{J}'(\hat{u})$ and $J'(\hat{u})$ (for a proof see [5, Chapter 8]):
Theorem: Let $\hat{u} \in U$ and $\hat{x} = S(\hat{u})$ be given, and let $\lambda$ satisfy the adjoint ODE
$\lambda' = -H_x(\hat{x}, \hat{u}, \lambda)^\top$, $\lambda(t_f) = \varphi'(\hat{x}(t_f))^\top$.
Then: $J'(\hat{u})(u) = \tilde{J}'(\hat{u})(u)$ for all $u \in U$.
Owing to the theorem we may define the gradient of $J$ at $\hat{u}$ as follows.
Definition (Gradient of the reduced objective functional)
Let $\hat{u} \in U$ and $\hat{x} = S(\hat{u})$ be given, and let $\lambda$ satisfy the adjoint ODE. The gradient $\nabla J(\hat{u})$ of $J$ at $\hat{u}$ is defined by
$\nabla J(\hat{u})(\cdot) := H_u(\hat{x}(\cdot), \hat{u}(\cdot), \lambda(\cdot))^\top$.
84 Gradient Method
Gradient method for R-OCP
(0) Choose $u^{(0)} \in U$, $\beta \in (0, 1)$, $\sigma \in (0, 1)$, and set $k := 0$.
(1) Solve the ODE $x' = f(x, u^{(k)})$, $x(t_0) = \bar{x}$, and the adjoint ODE $\lambda' = -H_x(x, u^{(k)}, \lambda)^\top$, $\lambda(t_f) = \varphi'(x(t_f))^\top$. Denote the solutions by $x^{(k)} = S(u^{(k)})$ and $\lambda^{(k)}$.
(2) If $\|H_u\|_2 \approx 0$, STOP.
(3) Set $d^{(k)}(t) := -H_u(x^{(k)}(t), u^{(k)}(t), \lambda^{(k)}(t))^\top$.
(4) Perform an Armijo line search: find the smallest $j \in \{0, 1, 2, \ldots\}$ with
$J(u^{(k)} + \beta^j d^{(k)}) \le J(u^{(k)}) - \sigma \beta^j \|H_u(x^{(k)}, u^{(k)}, \lambda^{(k)})\|_2^2$
and set $\alpha_k := \beta^j$.
(5) Set $u^{(k+1)} := u^{(k)} + \alpha_k d^{(k)}$, $k := k + 1$, and go to (1).
85 Gradient Method
Theorem (Convergence)
Suppose that the gradient method does not terminate. Let u* be an accumulation point of the sequence {u⁽ᵏ⁾}_{k∈ℕ} generated by the gradient method and x* := S(u*). Then: ‖∇J(u*)‖₂ = 0.
86 Gradient Method Examples
Example
Minimize x₂(1) subject to the constraints
x₁′(t) = x₁(t) + 3u(t), x₁(0) = 2,
x₂′(t) = ½ ( x₁(t)² + u(t)² ), x₂(0) = 0.
Output of gradient method: (u⁽⁰⁾ ≡ 0, β = 0.9, σ = 0.1, symplectic Euler, N = 100)
[iteration table with columns k, α_k, J(u⁽ᵏ⁾), ‖H_u‖; omitted]
87 Gradient Method Examples
Example (continued)
Some iterates (red) and converged solution (blue):
[plots of state x₁, control u, and adjoint λ₁ over time t]
88 Gradient Method Examples
Example
Minimize x₂(1) + 2.5 ( x₁(1) − 1 )² subject to the constraints
x₁′(t) = u(t) − 15 exp(−2t), x₁(0) = 4,
x₂′(t) = ½ ( u(t)² + x₁(t)³ ), x₂(0) = 0.
Output of gradient method: (u⁽⁰⁾ ≡ 0, β = 0.9, σ = 0.1, symplectic Euler, N = 100)
[iteration table with columns k, α_k, J(u⁽ᵏ⁾), ‖H_u‖; omitted]
89 Gradient Method Examples
Example (continued)
Some iterates (red) and converged solution (blue):
[plots of state x₁, control u, and adjoint λ₁ over time t]
90-92 Projected Gradient Method
We add simple constraints u ∈ U_ad with a convex set U_ad ⊆ U to the reduced problem.
Reduced optimal control problem (R-OCP)
Minimize J(u) := Γ(S(u), u) subject to u ∈ U_ad ⊆ U.
Assumption: the projection Π_U : U → U_ad is easy to compute.
Example: For box constraints U_ad = {u ∈ ℝ | a ≤ u ≤ b} the projection computes to
Π_U(u) = max{a, min{b, u}} = a if u < a; u if a ≤ u ≤ b; b if u > b.
Optimality: J′(û)(u − û) ≥ 0 for all u ∈ U_ad ⟺ û = Π_U( û − α ∇J(û) ) (α > 0)
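For box constraints the projection is just a componentwise clip; a tiny sketch (function name of my own choosing, not from the slides):

```python
import numpy as np

# Componentwise projection onto the box {u : a <= u <= b}, i.e. the formula
# Pi_U(u) = max{a, min{b, u}} from the slide, applied entrywise.
def project_box(u, a, b):
    return np.clip(u, a, b)

# Fixed-point characterization of optimality: u_hat is optimal iff
# u_hat = Pi_U(u_hat - alpha * grad_J(u_hat)) for any alpha > 0.
# Illustration with J(u) = 0.5*(u - 2)^2 on [-1, 1]: the minimizer is 1.
u_hat, alpha = 1.0, 0.5
grad = u_hat - 2.0                                    # grad J(u_hat)
print(project_box(u_hat - alpha * grad, -1.0, 1.0))   # -> 1.0
```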
93-94 Projected Gradient Method
The projected gradient method requires a feasible initial u⁽⁰⁾ and differs from the gradient method in one of the following two components:
Version 1: In iteration k compute ũ⁽ᵏ⁾ := Π_U( u⁽ᵏ⁾ + d⁽ᵏ⁾ ) and use the direction d̃⁽ᵏ⁾ := ũ⁽ᵏ⁾ − u⁽ᵏ⁾ instead of d⁽ᵏ⁾ in steps (4) and (5), i.e.
J( u⁽ᵏ⁾ + βʲ d̃⁽ᵏ⁾ ) ≤ J(u⁽ᵏ⁾) + σ βʲ J′(u⁽ᵏ⁾)(d̃⁽ᵏ⁾), u⁽ᵏ⁺¹⁾ = u⁽ᵏ⁾ + α_k d̃⁽ᵏ⁾
Version 2: In iteration k use the projection within the Armijo line search in step (4), i.e.
J( Π_U( u⁽ᵏ⁾ + βʲ d⁽ᵏ⁾ ) ) ≤ J(u⁽ᵏ⁾) + σ βʲ J′(u⁽ᵏ⁾)(d⁽ᵏ⁾)
and set u⁽ᵏ⁺¹⁾ := Π_U( u⁽ᵏ⁾ + α_k d⁽ᵏ⁾ ) in step (5).
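Version 1 can be sketched on a toy finite-dimensional problem of my own (not from the slides): J(u) = ½‖u − c‖² on the box [−1, 3]ⁿ, whose solution is simply the projection of c onto the box.

```python
import numpy as np

# Sketch of the projected gradient method, version 1:
#   d~ = Pi_U(u + d) - u,  Armijo on J(u + b^j d~) <= J(u) + sigma*b^j*<grad, d~>.
# Assumed toy problem: J(u) = 0.5*||u - c||^2 on the box [-1, 3]^n.
def projected_gradient(c, a=-1.0, b=3.0, beta=0.9, sigma=0.1, kmax=200):
    J = lambda v: 0.5 * np.dot(v - c, v - c)
    u = np.zeros_like(c)                      # feasible start
    for _ in range(kmax):
        grad = u - c                          # gradient of J
        d = -grad                             # unconstrained descent direction
        d_proj = np.clip(u + d, a, b) - u     # d~ = Pi_U(u + d) - u
        if np.linalg.norm(d_proj) < 1e-12:
            break                             # fixed point reached
        alpha = 1.0
        while J(u + alpha * d_proj) > J(u) + sigma * alpha * np.dot(grad, d_proj):
            alpha *= beta                     # Armijo line search
        u = u + alpha * d_proj
    return u

c = np.array([5.0, 0.5, -4.0])
u_star = projected_gradient(c)                # -> projection of c onto the box
```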
95 Projected Gradient Method Examples
Example
Minimize x₂(1) + 2.5 ( x₁(1) − 1 )² subject to the constraints
x₁′(t) = u(t) − 15 exp(−2t), x₁(0) = 4,
x₂′(t) = ½ ( u(t)² + x₁(t)³ ), x₂(0) = 0,
u(t) ∈ U_ad := {u ∈ ℝ | −1 ≤ u ≤ 3}.
Output of projected gradient method (version 1): (u⁽⁰⁾ ≡ 1, β = 0.9, σ = 0.1, symplectic Euler, N = 100)
[iteration table with columns k, α_k, J(u⁽ᵏ⁾), ‖u⁽ᵏ⁾ − Π_U(u⁽ᵏ⁾ − ∇J(u⁽ᵏ⁾))‖, J′(u⁽ᵏ⁾)(d̃⁽ᵏ⁾); omitted]
96 Projected Gradient Method Examples
Example (continued)
Some iterates (red) and converged solution (blue):
[plots of state x₁, control u, and adjoint λ₁ over time t]
97 Gradient Method for the Discretized Problem
Instead of applying the function space gradient method to OCP, we could first discretize OCP and apply the standard gradient method to the discretized problem.
Problem (Discretized optimal control problem (D-OCP))
Given: x̄ ∈ ℝⁿˣ, the grid
G_N := {t_i | t_i = t₀ + ih, i = 0, 1, ..., N}, h = (t_f − t₀)/N, N ∈ ℕ,
and continuously differentiable functions φ : ℝⁿˣ → ℝ, f₀ : ℝⁿˣ × ℝⁿᵘ → ℝ, f : ℝⁿˣ × ℝⁿᵘ → ℝⁿˣ.
Minimize
Γ_N(x, u) := φ(x_N) + h Σ_{i=0}^{N−1} f₀(x_i, u_i)
w.r.t. x = (x₀, x₁, ..., x_N) ∈ ℝ^{(N+1)nₓ}, u = (u₀, ..., u_{N−1}) ∈ ℝ^{Nnᵤ}
subject to the constraints
(x_{i+1} − x_i)/h = f(x_i, u_i), i = 0, ..., N − 1, x₀ = x̄.
98 Gradient Method for the Discretized Problem
Denote by S : ℝ^{Nnᵤ} → ℝ^{(N+1)nₓ}, u ↦ x = S(u), the solution operator that maps the control input u to the solution x of the discrete dynamics.
Problem (Reduced Discretized Problem (RD-OCP))
Minimize J_N(u) := Γ_N(S(u), u) w.r.t. u ∈ ℝ^{Nnᵤ}.
99 Gradient Method for the Discretized Problem
Auxiliary functional: (û, x̂ = S(û) given; λ ∈ ℝ^{(N+1)nₓ} to be defined later)
J̃_N(û) := φ(x̂_N) + h Σ_{i=0}^{N−1} ( H(x̂_i, û_i, λ_{i+1}) − λ_{i+1}ᵀ (x̂_{i+1} − x̂_i)/h )
Derivative: (S_i′(û) denotes the sensitivity matrix ∂x̂_i/∂û; exploit S₀′(û) = 0)
J̃_N′(û) = φ′(x̂_N) S_N′(û) + h Σ_{i=0}^{N−1} ( H_x[t_i] S_i′(û) + H_u[t_i] ∂û_i/∂û − λ_{i+1}ᵀ ( S_{i+1}′(û) − S_i′(û) )/h )
= ( φ′(x̂_N) − λ_Nᵀ ) S_N′(û) + ( h H_x[t₀] + λ₁ᵀ ) S₀′(û)
+ h Σ_{i=1}^{N−1} ( H_x[t_i] + (λ_{i+1} − λ_i)ᵀ/h ) S_i′(û) + h Σ_{i=0}^{N−1} H_u[t_i] ∂û_i/∂û
Avoid the calculation of the sensitivities S_i′! Choose λ such that the red terms vanish.
100 Gradient Method for the Discretized Problem
Discrete adjoint ODE:
(λ_{i+1} − λ_i)/h = −H_x(x̂_i, û_i, λ_{i+1})ᵀ, i = 0, ..., N − 1, λ_N = φ′(x̂_N)ᵀ
Gradient of auxiliary functional:
∇J̃_N(û) = h Σ_{i=0}^{N−1} ( H_u[t_i] ∂û_i/∂û )ᵀ = h ( H_u[t₀]ᵀ, H_u[t₁]ᵀ, ..., H_u[t_{N−1}]ᵀ )ᵀ.
101 Gradient Method for the Discretized Problem
Link to the gradient of the reduced objective functional J_N:
Theorem
Let û ∈ ℝ^{Nnᵤ} be given and let λ ∈ ℝ^{(N+1)nₓ} satisfy the discrete adjoint ODE. Then
∇J_N(û) = ∇J̃_N(û).
Consequence: The gradient method for RD-OCP uses in iteration k the search direction
d⁽ᵏ⁾ = −∇J_N(u⁽ᵏ⁾) = −h ( H_u[t₀]ᵀ, ..., H_u[t_{N−1}]ᵀ )ᵀ.
This is the same search direction as in the function space gradient method, except that it is scaled by h. Slower convergence expected!
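The theorem is easy to check numerically: the discrete adjoint recursion must reproduce the gradient of J_N up to roundoff. The sketch below uses an assumed scalar test problem of my own (not the one from the slides) and compares against central finite differences; since J_N is quadratic in u there, the match is exact up to roundoff.

```python
import numpy as np

# Discrete adjoint gradient vs. finite differences for an assumed problem:
#   minimize J_N(u) = 0.5*x_N^2  s.t.  (x_{i+1}-x_i)/h = -x_i + u_i, x_0 = 1.
# Here f0 = 0, H = lam*(-x + u), so H_x = -lam and H_u = lam.
N = 50
h = 1.0 / N

def state(u):
    x = np.empty(N + 1)
    x[0] = 1.0
    for i in range(N):
        x[i + 1] = x[i] + h * (-x[i] + u[i])
    return x

def JN(u):
    return 0.5 * state(u)[-1] ** 2

def grad_adjoint(u):
    x = state(u)
    lam = np.empty(N + 1)
    lam[-1] = x[-1]                       # lam_N = phi'(x_N)
    for i in range(N - 1, -1, -1):
        # (lam_{i+1} - lam_i)/h = -H_x = lam_{i+1}  =>  lam_i = (1-h)*lam_{i+1}
        lam[i] = (1.0 - h) * lam[i + 1]
    return h * lam[1:]                    # component i: h * H_u[t_i] = h * lam_{i+1}

u0 = np.linspace(0.0, 1.0, N)
g = grad_adjoint(u0)
eps = 1e-6
g_fd = np.empty(N)
for i in range(N):                        # central finite differences
    e = np.zeros(N)
    e[i] = eps
    g_fd[i] = (JN(u0 + e) - JN(u0 - e)) / (2.0 * eps)
```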
102 Gradient Method for the Discretized Problem
Gradient method for RD-OCP
(0) Choose u⁽⁰⁾ ∈ ℝ^{Nnᵤ}, β ∈ (0, 1), σ ∈ (0, 1), and set k := 0.
(1) Solve
(x_{i+1} − x_i)/h = f(x_i, u_i⁽ᵏ⁾) (i = 0, ..., N − 1), x₀ = x̄,
(λ_{i+1} − λ_i)/h = −H_x(x_i, u_i⁽ᵏ⁾, λ_{i+1})ᵀ (i = 0, ..., N − 1), λ_N = φ′(x_N)ᵀ.
(2) Set d⁽ᵏ⁾ := −∇J_N(u⁽ᵏ⁾) = −h ( H_u[t₀]ᵀ, ..., H_u[t_{N−1}]ᵀ )ᵀ.
(3) If ‖d⁽ᵏ⁾‖₂ ≈ 0, STOP.
(4) Perform an Armijo line search: find the smallest j ∈ {0, 1, 2, ...} with
J_N(u⁽ᵏ⁾ + βʲ d⁽ᵏ⁾) ≤ J_N(u⁽ᵏ⁾) − σ βʲ ‖d⁽ᵏ⁾‖₂²
and set α_k := βʲ.
(5) Set u⁽ᵏ⁺¹⁾ := u⁽ᵏ⁾ + α_k d⁽ᵏ⁾, k := k + 1, and go to (1).
103 Gradient Method for the Discretized Problem
Example (compare the function space equivalent example)
Minimize x₂(1) subject to the constraints
x₁′(t) = x₁(t) + 3u(t), x₁(0) = 2,
x₂′(t) = ½ ( x₁(t)² + u(t)² ), x₂(0) = 0.
Output of gradient method for RD-OCP: (u⁽⁰⁾ = 0, N = 100, β = 0.9, σ = 0.1)
[iteration table with columns k, α_k, J(u⁽ᵏ⁾), ‖H_u‖; omitted]
Observation: Since the search direction in RD-OCP is scaled by h, we need many more iterations compared to the function space gradient method, which required only 26 iterations at the same accuracy.
104 Contents Introduction Necessary Conditions Adjoint Formalism Gradient Method Gradient Method in Finite Dimensions Gradient Method for Optimal Control Problems Extensions Gradient Method for Discrete Problem Examples Lagrange-Newton Method Lagrange-Newton Method in Finite Dimensions Lagrange-Newton Method in Infinite Dimensions Application to Optimal Control Search Direction Examples Extensions
105 Lagrange-Newton Method
Pros:
locally quadratic convergence rate (fast convergence)
can handle nonlinear equality constraints
global convergence achieved by Armijo line search
Cons:
requires second derivatives
higher implementation effort compared to the gradient method
106 Lagrange-Newton Method in Finite Dimensions
Consider the equality constrained nonlinear optimization problem:
Equality constrained optimization problem (E-NLP)
Minimize J(x, u) subject to H(x, u) = 0
J : ℝⁿˣ × ℝⁿᵘ → ℝ, H : ℝⁿˣ × ℝⁿᵘ → ℝ^{n_H} twice continuously differentiable
Lagrange function: L(x, u, λ) := J(x, u) + λᵀ H(x, u)
Theorem (KKT conditions)
Let (x̂, û) be a local minimum of E-NLP and let H′(x̂, û) have full rank. Then there exists a multiplier λ ∈ ℝ^{n_H} such that ∇_{(x,u)} L(x̂, û, λ) = 0.
107 Lagrange-Newton Method in Finite Dimensions
Idea of the Lagrange-Newton method: Apply Newton's method to the optimality system T(x̂, û, λ) = 0 with
T(x, u, λ) := ( ∇_{(x,u)} L(x, u, λ) ; H(x, u) ) = ( ∇_x L(x, u, λ) ; ∇_u L(x, u, λ) ; H(x, u) )
Newton system: T′(x⁽ᵏ⁾, u⁽ᵏ⁾, λ⁽ᵏ⁾) d⁽ᵏ⁾ = −T(x⁽ᵏ⁾, u⁽ᵏ⁾, λ⁽ᵏ⁾) with
T′(x, u, λ) =
[ L_xx(x, u, λ)  L_xu(x, u, λ)  H_x(x, u)ᵀ ]
[ L_ux(x, u, λ)  L_uu(x, u, λ)  H_u(x, u)ᵀ ]
[ H_x(x, u)      H_u(x, u)      0          ]
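One Newton step on this system can be tried on a small assumed example (a toy problem of my own, not from the slides); since the problem below is quadratic with a linear constraint, a single step lands exactly on the KKT point.

```python
import numpy as np

# Toy equality-constrained problem (assumed for illustration):
#   minimize J(x, u) = 0.5*(x^2 + u^2)  s.t.  H(x, u) = x + u - 2 = 0.
# L = J + lam*H, so T(z) = (x + lam, u + lam, x + u - 2) and the KKT point
# is (x, u, lam) = (1, 1, -1).
def T(z):
    x, u, lam = z
    return np.array([x + lam, u + lam, x + u - 2.0])

def Tprime(z):
    # [[L_xx, L_xu, H_x'], [L_ux, L_uu, H_u'], [H_x, H_u, 0]] -- constant here
    return np.array([[1.0, 0.0, 1.0],
                     [0.0, 1.0, 1.0],
                     [1.0, 1.0, 0.0]])

z = np.array([5.0, -3.0, 2.0])               # arbitrary starting point
z = z + np.linalg.solve(Tprime(z), -T(z))    # one Newton step
print(z)                                     # -> [ 1.  1. -1.]
```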
108 Lagrange-Newton Method in Finite Dimensions
Theorem (Nonsingularity)
T′(x, u, λ) is nonsingular if the following conditions hold:
L″_{(x,u),(x,u)}(x, u, λ) is positive definite on the nullspace of H′(x, u), that is,
vᵀ L″_{(x,u),(x,u)}(x, u, λ) v > 0 for all v ≠ 0 with H′(x, u) v = 0;
H′(x, u) has full rank.
Note: These conditions are actually sufficient optimality conditions if (x, u, λ) is a stationary point of L.
109 Lagrange-Newton Method in Infinite Dimensions
Consider the equality constrained nonlinear optimization problem:
Equality constrained optimization problem (E-NLP)
Minimize J(x, u) subject to H(x, u) = 0
J : X × U → ℝ, H : X × U → Λ twice continuously Fréchet differentiable; X, U, Λ Banach spaces
Lagrange function: L : X × U × Λ* → ℝ with L(x, u, λ*) := J(x, u) + λ*(H(x, u))
Theorem (KKT conditions)
Let (x̂, û) be a local minimum of E-NLP and let H′(x̂, û) be surjective. Then there exists a multiplier λ* ∈ Λ* such that L′_{(x,u)}(x̂, û, λ*) = 0.
110 Lagrange-Newton Method in Infinite Dimensions
Idea of the Lagrange-Newton method: Use Newton's method to find a zero ẑ = (x̂, û, λ̂*) ∈ Z := X × U × Λ* of the operator T : Z → Y defined by
T(x, u, λ*) := ( L′_x(x, u, λ*) ; L′_u(x, u, λ*) ; H(x, u) )
Observe:
L′_x(x, u, λ*)(x) = J′_x(x, u)(x) + λ*( H′_x(x, u)(x) ) = J′_x(x, u)(x) + ( H′_x(x, u)* λ* )(x)
L′_u(x, u, λ*)(u) = J′_u(x, u)(u) + λ*( H′_u(x, u)(u) ) = J′_u(x, u)(u) + ( H′_u(x, u)* λ* )(u)
where H′_x(x, u)* : Λ* → X* and H′_u(x, u)* : Λ* → U* denote the respective adjoint operators.
111 Local Lagrange-Newton Method
Local Lagrange-Newton Method
(0) Choose z⁽⁰⁾ ∈ Z and set k := 0.
(1) If ‖T(z⁽ᵏ⁾)‖_Y ≈ 0, STOP.
(2) Compute the search direction d⁽ᵏ⁾ from T′(z⁽ᵏ⁾)(d⁽ᵏ⁾) = −T(z⁽ᵏ⁾).
(3) Set z⁽ᵏ⁺¹⁾ := z⁽ᵏ⁾ + d⁽ᵏ⁾, k := k + 1, and go to (1).
112 Lagrange-Newton Method in Infinite Dimensions
Theorem (Nonsingularity)
T′(x, u, λ*) is nonsingular if the following conditions hold:
L″_{(x,u),(x,u)}(x, u, λ*) is uniformly positive definite on the nullspace of H′(x, u), i.e. there exists C > 0 such that
L″_{(x,u),(x,u)}(x, u, λ*)(v, v) ≥ C ‖v‖²_{X×U} for all v ∈ X × U with H′(x, u)(v) = 0;
H′(x, u) is surjective.
Note: These conditions are actually sufficient if (x, u, λ*) is a stationary point of L, see [1, Theorem 5.6], [2, Theorem 2.3].
[1] H. Maurer and J. Zowe. First and Second-Order Necessary and Sufficient Optimality Conditions for Infinite-Dimensional Programming Problems. Mathematical Programming, 16:98-110, 1979.
[2] H. Maurer. First and Second Order Sufficient Optimality Conditions in Mathematical Programming and Optimal Control. Mathematical Programming Study, 14, 1981.
113 Local Lagrange-Newton Method
Theorem (local convergence)
Let z* be a zero of T. Suppose there exist constants ρ > 0 and C > 0 such that for every z ∈ B_ρ(z*) the derivative T′(z) is non-singular and ‖T′(z)⁻¹‖_{L(Y,Z)} ≤ C.
(a) If φ, f₀, f, ψ are twice continuously differentiable, then there exists δ > 0 such that the local Lagrange-Newton method is well-defined for every z⁽⁰⁾ ∈ B_δ(z*) and the sequence {z⁽ᵏ⁾}_{k∈ℕ} converges superlinearly to z* for every z⁽⁰⁾ ∈ B_δ(z*).
(b) If the second derivatives of φ, f₀, f, ψ are locally Lipschitz continuous, then the convergence in (a) is quadratic.
(c) If in addition to the assumptions in (a), T(z⁽ᵏ⁾) ≠ 0 for all k, then the residual values converge superlinearly:
lim_{k→∞} ‖T(z⁽ᵏ⁺¹⁾)‖_Y / ‖T(z⁽ᵏ⁾)‖_Y = 0.
114 Global Lagrange-Newton Method
Merit function for globalization: γ(z) := ½ ‖T(z)‖₂²
Globalized Lagrange-Newton Method
(0) Choose z⁽⁰⁾ ∈ Z, β ∈ (0, 1), σ ∈ (0, 1/2), and set k := 0.
(1) If γ(z⁽ᵏ⁾) ≈ 0, STOP.
(2) Compute the search direction d⁽ᵏ⁾ from T′(z⁽ᵏ⁾)(d⁽ᵏ⁾) = −T(z⁽ᵏ⁾).
(3) Find the smallest j ∈ {0, 1, 2, ...} with
γ(z⁽ᵏ⁾ + βʲ d⁽ᵏ⁾) ≤ γ(z⁽ᵏ⁾) + σ βʲ γ′(z⁽ᵏ⁾)(d⁽ᵏ⁾)
and set α_k := βʲ.
(4) Set z⁽ᵏ⁺¹⁾ := z⁽ᵏ⁾ + α_k d⁽ᵏ⁾, k := k + 1, and go to (1).
Note: γ : Z → ℝ is Fréchet differentiable with γ′(z⁽ᵏ⁾)(d⁽ᵏ⁾) = −2γ(z⁽ᵏ⁾) = −‖T(z⁽ᵏ⁾)‖₂².
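The globalized loop can be sketched for a generic root problem T(z) = 0. The toy system below (stationarity of L = x⁴/4 + λ(x − 1)) is my own assumption, not from the slides; it just gives a nonlinear T with a known root (x, λ) = (1, −1).

```python
import numpy as np

# Globalized Newton on T(z) = 0 with merit gamma(z) = 0.5*||T(z)||^2 and the
# Armijo test gamma(z + a*d) <= gamma(z) + sigma*a*gamma'(z)(d), where for
# the Newton direction gamma'(z)(d) = -2*gamma(z).
# Assumed toy system: L = x^4/4 + lam*(x - 1), so T(z) = (x^3 + lam, x - 1).
def T(z):
    x, lam = z
    return np.array([x ** 3 + lam, x - 1.0])

def Tprime(z):
    x, _ = z
    return np.array([[3.0 * x ** 2, 1.0],
                     [1.0, 0.0]])           # det = -1, always nonsingular

def lagrange_newton(z, beta=0.9, sigma=0.1, tol=1e-14, kmax=100):
    gamma = lambda v: 0.5 * np.dot(T(v), T(v))
    for _ in range(kmax):
        if gamma(z) < tol:
            break
        d = np.linalg.solve(Tprime(z), -T(z))
        alpha = 1.0
        while gamma(z + alpha * d) > (1.0 - 2.0 * sigma * alpha) * gamma(z):
            alpha *= beta                    # Armijo line search on the merit
        z = z + alpha * d
    return z

z_star = lagrange_newton(np.array([3.0, 0.0]))   # far-away start; converges to (1, -1)
```

Far from the root the line search cuts the step; near the root α = 1 is accepted and the usual fast local Newton convergence takes over.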
115 Lagrange-Newton Method Application to Optimal Control
Problem (Optimal control problem (OCP))
Given: I := [t₀, t_f], twice continuously differentiable functions φ : ℝⁿˣ × ℝⁿˣ → ℝ, f₀ : ℝⁿˣ × ℝⁿᵘ → ℝ, f : ℝⁿˣ × ℝⁿᵘ → ℝⁿˣ, ψ : ℝⁿˣ × ℝⁿˣ → ℝ^{nψ}.
Minimize
J(x, u) := φ(x(t₀), x(t_f)) + ∫_I f₀(x(t), u(t)) dt
with respect to x ∈ X := W^{1,∞}(I, ℝⁿˣ) and u ∈ U := L^∞(I, ℝⁿᵘ) subject to the constraints
x′(t) = f(x(t), u(t)) a.e. in I,
0 = ψ(x(t₀), x(t_f)).
Remark: A partially reduced approach is possible, where x = x(·; u, x₀) is expressed as a function of the initial value x₀ and u. The constraint ψ(x₀, x(t_f; u, x₀)) = 0 remains. This is the function space equivalent of the direct shooting method.
116 Lagrange-Newton Method Application to Optimal Control
Hamilton function: H(x, u, λ) := f₀(x, u) + λᵀ f(x, u)
Define κ := φ + σᵀ ψ.
Theorem (Minimum principle, KKT conditions)
Let (x*, u*) be a local minimizer of OCP and let H′(x*, u*) be surjective. Then there exist multipliers λ* ∈ W^{1,∞}(I, ℝⁿˣ) and σ* ∈ ℝ^{nψ} with
x*′(t) − f(x*(t), u*(t)) = 0
λ*′(t) + H_x(x*(t), u*(t), λ*(t))ᵀ = 0
ψ(x*(t₀), x*(t_f)) = 0
λ*(t₀) + κ′_{x₀}(x*(t₀), x*(t_f), σ*)ᵀ = 0
λ*(t_f) − κ′_{x_f}(x*(t₀), x*(t_f), σ*)ᵀ = 0
H_u(x*(t), u*(t), λ*(t)) = 0
Root finding problem: T(z*) = 0, T : Z → Y
117 Lagrange-Newton Method Application to Optimal Control
Definition of the operator T:
T(z)(·) := (
x′(·) − f(x(·), u(·)),
λ′(·) + H_x(x(·), u(·), λ(·))ᵀ,
ψ(x(t₀), x(t_f)),
λ(t₀) + κ′_{x₀}(x(t₀), x(t_f), σ)ᵀ,
λ(t_f) − κ′_{x_f}(x(t₀), x(t_f), σ)ᵀ,
H_u(x(·), u(·), λ(·))ᵀ
)
with
z := (x, u, λ, σ) ∈ Z := X × U × X × ℝ^{nψ}
Y := L^∞(I, ℝⁿˣ) × L^∞(I, ℝⁿˣ) × ℝ^{nψ} × ℝⁿˣ × ℝⁿˣ × L^∞(I, ℝⁿᵘ)
118 Lagrange-Newton Method Computation of Search Direction
Newton direction: T′(z⁽ᵏ⁾)(d) = −T(z⁽ᵏ⁾), d = (x, u, λ, σ) ∈ Z
Fréchet derivative: (evaluated at z⁽ᵏ⁾)
T′(z⁽ᵏ⁾)(z) = (
x′ − f_x x − f_u u,
λ′ + H_xx x + H_xu u + H_xλ λ,
ψ′_{x₀} x(t₀) + ψ′_{x_f} x(t_f),
λ(t₀) + κ″_{x₀x₀} x(t₀) + κ″_{x₀x_f} x(t_f) + κ″_{x₀σ} σ,
λ(t_f) − κ″_{x_f x₀} x(t₀) − κ″_{x_f x_f} x(t_f) − κ″_{x_f σ} σ,
H_ux x + H_uλ λ + H_uu u
)
119 Lagrange-Newton Method Computation of Search Direction
The Newton direction is equivalent to the linear DAE boundary value problem (all functions evaluated along z⁽ᵏ⁾):
x′ = f_x x + f_u u − ( x⁽ᵏ⁾′ − f )
λ′ = −H_xx x − H_xλ λ − H_xu u − ( λ⁽ᵏ⁾′ + H_xᵀ )
0 = H_ux x + H_uλ λ + H_uu u + H_uᵀ
with boundary conditions
ψ′_{x₀} x(t₀) + ψ′_{x_f} x(t_f) = −ψ
λ(t₀) + κ″_{x₀x₀} x(t₀) + κ″_{x₀x_f} x(t_f) + κ″_{x₀σ} σ = −( λ⁽ᵏ⁾(t₀) + (κ′_{x₀})ᵀ )
λ(t_f) − κ″_{x_fx₀} x(t₀) − κ″_{x_fx_f} x(t_f) − κ″_{x_fσ} σ = −( λ⁽ᵏ⁾(t_f) − (κ′_{x_f})ᵀ )
Hence: in each Newton iteration we need to solve the above linear BVP.
120 Lagrange-Newton Method Computation of Search Direction
Theorem
The differential-algebraic equation (DAE) in the BVP has index one (i.e. the last equation can be solved for u) if the matrix function M(t) := H_uu[t] is non-singular for almost every t ∈ [0, 1] and ‖M(t)⁻¹‖ ≤ C for some constant C and almost every t ∈ [0, 1].
If M(·) is singular:
the BVP contains a differential-algebraic equation of higher index, which is numerically unstable;
boundary conditions may become infeasible.
121 Lagrange-Newton Method Examples
Example (Trolley)
[figure: trolley at position x₁ with suspended load; labels: deflection angle x₂, rod length l, load mass m₂, control u, gravity g]
122 Lagrange-Newton Method Examples
Example (Trolley, continued)
Dynamics (with common denominator D(x₂) := m₁m₂l² + m₁I_y + m₂²l² + m₂I_y − m₂²l² cos(x₂)²):
x₁′ = x₃
x₂′ = x₄
x₃′ = [ m₂²l³ sin(x₂) x₄² + m₂l² u + m₂I_y l x₄² sin(x₂) + I_y u − m₂²l² g cos(x₂) sin(x₂) ] / D(x₂)
x₄′ = m₂ l [ g sin(x₂)(m₁ + m₂) − m₂ l cos(x₂) sin(x₂) x₄² − cos(x₂) u ] / D(x₂)
Parameters: g = 9.81, m₁ = 0.3, m₂ = 0.5, l = 0.75, r = 0.1, I_y = ...
123 Lagrange-Newton Method Examples
Example (Trolley, continued)
Minimize
½ ∫₀^{t_f} u(t)² + 5 x₄(t)² dt
subject to the ODE, the initial conditions
x₁(0) = x₂(0) = x₃(0) = x₄(0) = 0,
and the terminal conditions
x₁(t_f) = 1, x₂(t_f) = x₃(t_f) = x₄(t_f) = 0
within the fixed time t_f.
124 Lagrange-Newton Method Examples
Example (Trolley, continued)
Output of the Lagrange-Newton method: (N = 1000, Euler)
[iteration table with columns k, α_k, ‖T(z⁽ᵏ⁾)‖₂², ‖d⁽ᵏ⁾‖; omitted]
Iterations for different stepsizes N: mesh independence; CPU time grows linearly in N.
[table with columns N, CPU time [s], iterations; omitted]
125 Lagrange-Newton Method Examples
Example (Trolley, continued)
[plots of states x₁, ..., x₄, adjoints λ₁, ..., λ₄, and control u over time t]
126 Lagrange-Newton Method Example from Chemical Engineering
DAE index-1 optimal control problem: (chemical reaction of substances A, B, C, and D)
Minimize
−M_C(t_f) + ∫₀^{t_f} F_B(t)² + Q(t)² dt
w.r.t. the controls F_B (feed rate of substance B) and cooling power Q, subject to the index-1 DAE
M_A′ = −V A₁ e^{−E₁/T_R} C_A C_B
M_B′ = F_B − V ( A₁ e^{−E₁/T_R} C_A C_B + A₂ e^{−E₂/T_R} C_B C_C )
M_C′ = V ( A₁ e^{−E₁/T_R} C_A C_B − A₂ e^{−E₂/T_R} C_B C_C )
M_D′ = V A₂ e^{−E₂/T_R} C_B C_C
H′ = 20 F_B − Q − V ( A₁ e^{−E₁/T_R} C_A C_B − 75 A₂ e^{−E₂/T_R} C_B C_C )
0 = H − Σ_{i=A,B,C,D} M_i ( α_i (T_R − T_ref) + (β_i/2)(T_R² − T_ref²) )
where V = Σ_{i=A,B,C,D} M_i/ρ_i and C_i = M_i/V, i = A, B, C, D.
Source: V. S. Vassiliadis, R. W. H. Sargent, and C. C. Pantelides. Solution of a class of multistage dynamic optimization problems. 2. Problems with path constraints. Industrial & Engineering Chemistry Research, 33, 1994.
127 Lagrange-Newton Method Example from Chemical Engineering
Lagrange-Newton method: (N = ... intervals)
[iteration table with columns k, α_k, ‖T(z⁽ᵏ⁾)‖₂², ‖d⁽ᵏ⁾‖; omitted]
[plots of controls F_B and Q over time t]
128 Lagrange-Newton Method Example from Chemical Engineering
[plots of states M_A, M_B, M_C, M_D, H, and the algebraic variable T_R over time t]
129 Lagrange-Newton Method Example from Chemical Engineering
[plots of adjoints λ₁, ..., λ₅ and λ_g over time t]
130 Lagrange-Newton Method Navier-Stokes Example
Optimal control problem: Minimize
½ ∫∫_Q ‖y(t, x) − y_d(t, x)‖² dx dt + (δ/2) ∫∫_Q ‖u(t, x)‖² dx dt
w.r.t. velocity y, pressure p, and control u, subject to the 2D Navier-Stokes equations
y_t = (1/Re) Δy − (y · ∇)y − ∇p + u in Q := (0, t_f) × Ω,
0 = div(y) in Q,
0 = y(0, x) for x ∈ Ω := (0, 1) × (0, 1),
0 = y(t, x) for (t, x) ∈ (0, t_f) × ∂Ω.
Given: desired velocity field
y_d(t, x₁, x₂) = ( q(t, x₁) q_{x₂}(t, x₂), −q(t, x₂) q_{x₁}(t, x₁) ), q(t, z) = (1 − z)² (1 − cos(2πzt))
M. Gerdts and M. Kunkel. A globally convergent semi-smooth Newton method for control-state constrained DAE optimal control problems. Computational Optimization and Applications, 48(3), 2011.
131 Lagrange-Newton Method Navier-Stokes Example
Discretization by the method of lines: (details omitted)
Minimize
½ ∫₀^{t_f} ‖y_h(t) − y_{d,h}(t)‖² dt + (δ/2) ∫₀^{t_f} ‖u_h(t)‖² dt
subject to the index-2 DAE
y_h′(t) = (1/Re) A_h y_h(t) − ( y_h(t)ᵀ H_{h,1} y_h(t), ..., y_h(t)ᵀ H_{h,2(N−1)²} y_h(t) )ᵀ − B_hᵀ p_h(t) + u_h(t),
0 = B_h y_h(t), y_h(0) = 0.
132 Lagrange-Newton Method Navier-Stokes Example
Pressure p at t = 0.6, t = 1.0, t = 1.4, and t = 1.967: [plots omitted]
(Parameters: t_f = 2, δ = 10⁻⁵, Re = 1, N = 31, N_t = 60, n_x = 2(N − 1)² = 1800, n_y = (N − 1)² = 900, n_u = 1800 controls)
133 Lagrange-Newton Method Navier-Stokes Example
Desired flow (left), controlled flow (middle), and control (right) at t = 0.6, t = 1.0, t = 1.4, and t = 1.967: [plots omitted]
134 Lagrange-Newton Method Navier-Stokes Example
Output of the Lagrange-Newton method:
Solve the Stokes problem as initial guess: [iteration table with columns k, ∫₀¹ f₀[t] dt, α_{k−1}, ‖T(z⁽ᵏ⁾)‖; omitted]
Solve the Navier-Stokes problem: [iteration table; omitted]
135 Lagrange-Newton Method Extensions
Treatment of inequality constraints:
sequential quadratic programming (SQP): solve linear-quadratic optimization problems to obtain a search direction
interior-point methods (IP): solve a sequence of barrier problems (equivalently: perturbation of the complementarity conditions in the KKT conditions)
semismooth Newton methods: transform the complementarity conditions into an equivalent (nonsmooth!) equation
136 References
[1] W. Alt. The Lagrange-Newton method for infinite dimensional optimization problems. Numerical Functional Analysis and Optimization, 11.
[2] W. Alt. Sequential Quadratic Programming in Banach Spaces. In W. Oettli and D. Pallaschke, editors, Advances in Optimization, Springer, Berlin.
[3] W. Alt and K. Malanowski. The Lagrange-Newton method for nonlinear optimal control problems. Computational Optimization and Applications, 2:77-100.
[4] W. Alt and K. Malanowski. The Lagrange-Newton method for state constrained optimal control problems. Computational Optimization and Applications, 4.
[5] M. Gerdts. Optimal Control of ODEs and DAEs. Walter de Gruyter, Berlin/Boston, 2012.
[6] M. Hinze, R. Pinnau, M. Ulbrich, and S. Ulbrich. Optimization with PDE Constraints. Mathematical Modelling: Theory and Applications 23, Springer, Dordrecht.
[7] K. Ito and K. Kunisch. Lagrange Multiplier Approach to Variational Problems and Applications. Advances in Design and Control 15, SIAM, Philadelphia, PA.
[8] K. C. P. Machielsen. Numerical Solution of Optimal Control Problems with State Constraints by Sequential Quadratic Programming in Function Space. Volume 53 of CWI Tract, Centrum voor Wiskunde en Informatica, Amsterdam.
[9] F. Tröltzsch. Optimale Steuerung partieller Differentialgleichungen. Vieweg, Wiesbaden, 2005.
137 Resources
Software: (available for registered users; free for academic use)
OCPID-DAE1: optimal control and parameter identification with differential-algebraic equations of index 1
sqpfiltertoolbox: SQP method for dense NLPs
WORHP (Büskens/Gerdts): SQP method for sparse large-scale NLPs
QPSOL: interior-point and nonsmooth Newton methods for sparse large-scale convex quadratic programs; available upon request
Robotics lab at UniBw M: research stays with use of lab equipment upon request
KUKA youbot robot (2-arm robot on a platform); 3 scale RC cars; LEGO Mindstorms robots; quarter-car testbench; quadcopter
138 More Resources
Further optimal control software:
CasADi, ACADO: M. Diehl et al.
NUDOCCCS: C. Büskens, University of Bremen
SOCS: J. Betts, The Boeing Company, Seattle
DIRCOL: O. von Stryk, TU Darmstadt
MUSCOD-II: H. G. Bock et al., IWR Heidelberg; agbock/research/muscod.php
MISER: K. L. Teo et al., Curtin University, Perth; les/miser/
PSOPT
Further optimization software:
NPSOL (dense problems), SNOPT (sparse large-scale problems): Stanford Business Software
KNITRO (sparse large-scale problems): Ziena Optimization
IPOPT (sparse large-scale problems): A. Wächter
filterSQP: R. Fletcher, S. Leyffer; leyffer/solvers.html
OOQP: M. Gertz, S. Wright; swright/ooqp/
qpOASES: H. J. Ferreau, A. Potschka, C. Kirches; optec/software/qpoases/
Software for boundary value problems:
BOUNDSCO: H. J. Oberle, University of Hamburg
COLDAE: U. Ascher; ascher/coldae.f
Links:
Decision Tree for Optimization Software
CUTEr: large collection of optimization test problems
COPS: large-scale optimization test problems; more/cops/
MINTOC: test cases for mixed-integer optimal control
139 Announcement: youbot Robotics Hackathon
Description:
student programming contest
addresses students and PhD students who would like to realize projects with the KUKA youbot robot
12 participants from 5 universities (UniBw M, Bayreuth, TUM, TU Berlin, Maastricht)
140 Thanks for your Attention! Questions?
Further information:
Fotos: München: Magnus Manske (Panorama), Luidger (Theatinerkirche), Kurmis (Chin. Turm), Arad Mojtahedi (Olympiapark), Max-k (Deutsches Museum), Oliver Raupach (Friedensengel), Andreas Praefcke (Nationaltheater)
More informationSolving generalized semi-infinite programs by reduction to simpler problems.
Solving generalized semi-infinite programs by reduction to simpler problems. G. Still, University of Twente January 20, 2004 Abstract. The paper intends to give a unifying treatment of different approaches
More informationWeak Convergence of Numerical Methods for Dynamical Systems and Optimal Control, and a relation with Large Deviations for Stochastic Equations
Weak Convergence of Numerical Methods for Dynamical Systems and, and a relation with Large Deviations for Stochastic Equations Mattias Sandberg KTH CSC 2010-10-21 Outline The error representation for weak
More informationAM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods
AM 205: lecture 19 Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods Quasi-Newton Methods General form of quasi-newton methods: x k+1 = x k α
More informationTheory and Applications of Constrained Optimal Control Proble
Theory and Applications of Constrained Optimal Control Problems with Delays PART 1 : Mixed Control State Constraints Helmut Maurer 1, Laurenz Göllmann 2 1 Institut für Numerische und Angewandte Mathematik,
More informationAM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods
AM 205: lecture 19 Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods Optimality Conditions: Equality Constrained Case As another example of equality
More informationPDE Constrained Optimization selected Proofs
PDE Constrained Optimization selected Proofs Jeff Snider jeff@snider.com George Mason University Department of Mathematical Sciences April, 214 Outline 1 Prelim 2 Thms 3.9 3.11 3 Thm 3.12 4 Thm 3.13 5
More informationDichotomy, the Closed Range Theorem and Optimal Control
Dichotomy, the Closed Range Theorem and Optimal Control Pavel Brunovský (joint work with Mária Holecyová) Comenius University Bratislava, Slovakia Praha 13. 5. 2016 Brunovsky Praha 13. 5. 2016 Closed Range
More informationConstrained Optimization and Lagrangian Duality
CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may
More informationAlgorithms for Constrained Optimization
1 / 42 Algorithms for Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University April 19, 2015 2 / 42 Outline 1. Convergence 2. Sequential quadratic
More informationIntroduction to Optimization Techniques. Nonlinear Optimization in Function Spaces
Introduction to Optimization Techniques Nonlinear Optimization in Function Spaces X : T : Gateaux and Fréchet Differentials Gateaux and Fréchet Differentials a vector space, Y : a normed space transformation
More informationLecture 3. Optimization Problems and Iterative Algorithms
Lecture 3 Optimization Problems and Iterative Algorithms January 13, 2016 This material was jointly developed with Angelia Nedić at UIUC for IE 598ns Outline Special Functions: Linear, Quadratic, Convex
More informationOn the Optimization of Riemann-Stieltjes-Control-Systems with Application in Vehicle Dynamics
On the Optimization of Riemann-Stieltjes-Control-Systems with Application in Vehicle Dynamics Johannes Michael To cite this version: Johannes Michael. On the Optimization of Riemann-Stieltjes-Control-Systems
More informationLecture 15: SQP methods for equality constrained optimization
Lecture 15: SQP methods for equality constrained optimization Coralia Cartis, Mathematical Institute, University of Oxford C6.2/B2: Continuous Optimization Lecture 15: SQP methods for equality constrained
More informationLectures 9 and 10: Constrained optimization problems and their optimality conditions
Lectures 9 and 10: Constrained optimization problems and their optimality conditions Coralia Cartis, Mathematical Institute, University of Oxford C6.2/B2: Continuous Optimization Lectures 9 and 10: Constrained
More informationMS&E 318 (CME 338) Large-Scale Numerical Optimization
Stanford University, Management Science & Engineering (and ICME) MS&E 318 (CME 338) Large-Scale Numerical Optimization 1 Origins Instructor: Michael Saunders Spring 2015 Notes 9: Augmented Lagrangian Methods
More informationLecture 13: Constrained optimization
2010-12-03 Basic ideas A nonlinearly constrained problem must somehow be converted relaxed into a problem which we can solve (a linear/quadratic or unconstrained problem) We solve a sequence of such problems
More informationConvex Optimization M2
Convex Optimization M2 Lecture 3 A. d Aspremont. Convex Optimization M2. 1/49 Duality A. d Aspremont. Convex Optimization M2. 2/49 DMs DM par email: dm.daspremont@gmail.com A. d Aspremont. Convex Optimization
More information1 Computing with constraints
Notes for 2017-04-26 1 Computing with constraints Recall that our basic problem is minimize φ(x) s.t. x Ω where the feasible set Ω is defined by equality and inequality conditions Ω = {x R n : c i (x)
More informationDouble Smoothing technique for Convex Optimization Problems with Linear Constraints
1 Double Smoothing technique for Convex Optimization Problems with Linear Constraints O. Devolder (F.R.S.-FNRS Research Fellow), F. Glineur and Y. Nesterov Center for Operations Research and Econometrics
More informationConstrained optimization: direct methods (cont.)
Constrained optimization: direct methods (cont.) Jussi Hakanen Post-doctoral researcher jussi.hakanen@jyu.fi Direct methods Also known as methods of feasible directions Idea in a point x h, generate a
More informationOptimal control problems with PDE constraints
Optimal control problems with PDE constraints Maya Neytcheva CIM, October 2017 General framework Unconstrained optimization problems min f (q) q x R n (real vector) and f : R n R is a smooth function.
More informationJoint work with Nguyen Hoang (Univ. Concepción, Chile) Padova, Italy, May 2018
EXTENDED EULER-LAGRANGE AND HAMILTONIAN CONDITIONS IN OPTIMAL CONTROL OF SWEEPING PROCESSES WITH CONTROLLED MOVING SETS BORIS MORDUKHOVICH Wayne State University Talk given at the conference Optimization,
More informationPart 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)
Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective
More information10 Numerical methods for constrained problems
10 Numerical methods for constrained problems min s.t. f(x) h(x) = 0 (l), g(x) 0 (m), x X The algorithms can be roughly divided the following way: ˆ primal methods: find descent direction keeping inside
More information12. Interior-point methods
12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity
More informationSuboptimal Open-loop Control Using POD. Stefan Volkwein
Institute for Mathematics and Scientific Computing University of Graz, Austria PhD program in Mathematics for Technology Catania, May 22, 2007 Motivation Optimal control of evolution problems: min J(y,
More informationMultidisciplinary System Design Optimization (MSDO)
Multidisciplinary System Design Optimization (MSDO) Numerical Optimization II Lecture 8 Karen Willcox 1 Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Today s Topics Sequential
More informationISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints
ISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints Instructor: Prof. Kevin Ross Scribe: Nitish John October 18, 2011 1 The Basic Goal The main idea is to transform a given constrained
More informationMathematical Economics. Lecture Notes (in extracts)
Prof. Dr. Frank Werner Faculty of Mathematics Institute of Mathematical Optimization (IMO) http://math.uni-magdeburg.de/ werner/math-ec-new.html Mathematical Economics Lecture Notes (in extracts) Winter
More informationNecessary optimality conditions for optimal control problems with nonsmooth mixed state and control constraints
Necessary optimality conditions for optimal control problems with nonsmooth mixed state and control constraints An Li and Jane J. Ye Abstract. In this paper we study an optimal control problem with nonsmooth
More informationTopic # Feedback Control Systems
Topic #17 16.31 Feedback Control Systems Deterministic LQR Optimal control and the Riccati equation Weight Selection Fall 2007 16.31 17 1 Linear Quadratic Regulator (LQR) Have seen the solutions to the
More informationMarch 8, 2010 MATH 408 FINAL EXAM SAMPLE
March 8, 200 MATH 408 FINAL EXAM SAMPLE EXAM OUTLINE The final exam for this course takes place in the regular course classroom (MEB 238) on Monday, March 2, 8:30-0:20 am. You may bring two-sided 8 page
More informationApplications of Linear Programming
Applications of Linear Programming lecturer: András London University of Szeged Institute of Informatics Department of Computational Optimization Lecture 9 Non-linear programming In case of LP, the goal
More informationFact Sheet Functional Analysis
Fact Sheet Functional Analysis Literature: Hackbusch, W.: Theorie und Numerik elliptischer Differentialgleichungen. Teubner, 986. Knabner, P., Angermann, L.: Numerik partieller Differentialgleichungen.
More informationSupport Vector Machines
Support Vector Machines Support vector machines (SVMs) are one of the central concepts in all of machine learning. They are simply a combination of two ideas: linear classification via maximum (or optimal
More informationOn the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method
Optimization Methods and Software Vol. 00, No. 00, Month 200x, 1 11 On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method ROMAN A. POLYAK Department of SEOR and Mathematical
More informationConvex Optimization Boyd & Vandenberghe. 5. Duality
5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized
More informationOptimal Control. Lecture 18. Hamilton-Jacobi-Bellman Equation, Cont. John T. Wen. March 29, Ref: Bryson & Ho Chapter 4.
Optimal Control Lecture 18 Hamilton-Jacobi-Bellman Equation, Cont. John T. Wen Ref: Bryson & Ho Chapter 4. March 29, 2004 Outline Hamilton-Jacobi-Bellman (HJB) Equation Iterative solution of HJB Equation
More informationNonlinear Programming
Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week
More information5. Duality. Lagrangian
5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized
More informationCONVERGENCE ANALYSIS OF AN INTERIOR-POINT METHOD FOR NONCONVEX NONLINEAR PROGRAMMING
CONVERGENCE ANALYSIS OF AN INTERIOR-POINT METHOD FOR NONCONVEX NONLINEAR PROGRAMMING HANDE Y. BENSON, ARUN SEN, AND DAVID F. SHANNO Abstract. In this paper, we present global and local convergence results
More informationStructural and Multidisciplinary Optimization. P. Duysinx and P. Tossings
Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be
More informationHot-Starting NLP Solvers
Hot-Starting NLP Solvers Andreas Wächter Department of Industrial Engineering and Management Sciences Northwestern University waechter@iems.northwestern.edu 204 Mixed Integer Programming Workshop Ohio
More information8 Numerical methods for unconstrained problems
8 Numerical methods for unconstrained problems Optimization is one of the important fields in numerical computation, beside solving differential equations and linear systems. We can see that these fields
More informationYou should be able to...
Lecture Outline Gradient Projection Algorithm Constant Step Length, Varying Step Length, Diminishing Step Length Complexity Issues Gradient Projection With Exploration Projection Solving QPs: active set
More informationSF2822 Applied nonlinear optimization, final exam Saturday December
SF2822 Applied nonlinear optimization, final exam Saturday December 5 27 8. 3. Examiner: Anders Forsgren, tel. 79 7 27. Allowed tools: Pen/pencil, ruler and rubber; plus a calculator provided by the department.
More informationSequential Quadratic Programming Method for Nonlinear Second-Order Cone Programming Problems. Hirokazu KATO
Sequential Quadratic Programming Method for Nonlinear Second-Order Cone Programming Problems Guidance Professor Masao FUKUSHIMA Hirokazu KATO 2004 Graduate Course in Department of Applied Mathematics and
More informationAn Inexact Newton Method for Optimization
New York University Brown Applied Mathematics Seminar, February 10, 2009 Brief biography New York State College of William and Mary (B.S.) Northwestern University (M.S. & Ph.D.) Courant Institute (Postdoc)
More informationGeneralization to inequality constrained problem. Maximize
Lecture 11. 26 September 2006 Review of Lecture #10: Second order optimality conditions necessary condition, sufficient condition. If the necessary condition is violated the point cannot be a local minimum
More informationUnconstrained optimization
Chapter 4 Unconstrained optimization An unconstrained optimization problem takes the form min x Rnf(x) (4.1) for a target functional (also called objective function) f : R n R. In this chapter and throughout
More informationA Concise Course on Stochastic Partial Differential Equations
A Concise Course on Stochastic Partial Differential Equations Michael Röckner Reference: C. Prevot, M. Röckner: Springer LN in Math. 1905, Berlin (2007) And see the references therein for the original
More informationQuiz Discussion. IE417: Nonlinear Programming: Lecture 12. Motivation. Why do we care? Jeff Linderoth. 16th March 2006
Quiz Discussion IE417: Nonlinear Programming: Lecture 12 Jeff Linderoth Department of Industrial and Systems Engineering Lehigh University 16th March 2006 Motivation Why do we care? We are interested in
More informationConstrained Optimization
1 / 22 Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University March 30, 2015 2 / 22 1. Equality constraints only 1.1 Reduced gradient 1.2 Lagrange
More informationSolving MPECs Implicit Programming and NLP Methods
Solving MPECs Implicit Programming and NLP Methods Michal Kočvara Academy of Sciences of the Czech Republic September 2005 1 Mathematical Programs with Equilibrium Constraints Mechanical motivation Mechanical
More informationExact Augmented Lagrangian Functions for Nonlinear Semidefinite Programming
Exact Augmented Lagrangian Functions for Nonlinear Semidefinite Programming Ellen H. Fukuda Bruno F. Lourenço June 0, 018 Abstract In this paper, we study augmented Lagrangian functions for nonlinear semidefinite
More informationConstrained Nonlinear Optimization Algorithms
Department of Industrial Engineering and Management Sciences Northwestern University waechter@iems.northwestern.edu Institute for Mathematics and its Applications University of Minnesota August 4, 2016
More informationA GLOBALLY CONVERGENT STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE
A GLOBALLY CONVERGENT STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE Philip E. Gill Vyacheslav Kungurtsev Daniel P. Robinson UCSD Center for Computational Mathematics Technical Report CCoM-14-1 June 30,
More informationAn Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization
An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization Frank E. Curtis, Lehigh University involving joint work with Travis Johnson, Northwestern University Daniel P. Robinson, Johns
More informationComputational Optimization. Augmented Lagrangian NW 17.3
Computational Optimization Augmented Lagrangian NW 17.3 Upcoming Schedule No class April 18 Friday, April 25, in class presentations. Projects due unless you present April 25 (free extension until Monday
More informationLecture 13 Newton-type Methods A Newton Method for VIs. October 20, 2008
Lecture 13 Newton-type Methods A Newton Method for VIs October 20, 2008 Outline Quick recap of Newton methods for composite functions Josephy-Newton methods for VIs A special case: mixed complementarity
More informationICS-E4030 Kernel Methods in Machine Learning
ICS-E4030 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 28. September, 2016 Juho Rousu 28. September, 2016 1 / 38 Convex optimization Convex optimisation This
More informationLecture 4 - The Gradient Method Objective: find an optimal solution of the problem
Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem min{f (x) : x R n }. The iterative algorithms that we will consider are of the form x k+1 = x k + t k d k, k = 0, 1,...
More informationGENERALIZED second-order cone complementarity
Stochastic Generalized Complementarity Problems in Second-Order Cone: Box-Constrained Minimization Reformulation and Solving Methods Mei-Ju Luo and Yan Zhang Abstract In this paper, we reformulate the
More informationFirst-order optimality conditions for mathematical programs with second-order cone complementarity constraints
First-order optimality conditions for mathematical programs with second-order cone complementarity constraints Jane J. Ye Jinchuan Zhou Abstract In this paper we consider a mathematical program with second-order
More informationComputational Finance
Department of Mathematics at University of California, San Diego Computational Finance Optimization Techniques [Lecture 2] Michael Holst January 9, 2017 Contents 1 Optimization Techniques 3 1.1 Examples
More informationINTERIOR-POINT METHODS FOR NONCONVEX NONLINEAR PROGRAMMING: CONVERGENCE ANALYSIS AND COMPUTATIONAL PERFORMANCE
INTERIOR-POINT METHODS FOR NONCONVEX NONLINEAR PROGRAMMING: CONVERGENCE ANALYSIS AND COMPUTATIONAL PERFORMANCE HANDE Y. BENSON, ARUN SEN, AND DAVID F. SHANNO Abstract. In this paper, we present global
More informationEN Applied Optimal Control Lecture 8: Dynamic Programming October 10, 2018
EN530.603 Applied Optimal Control Lecture 8: Dynamic Programming October 0, 08 Lecturer: Marin Kobilarov Dynamic Programming (DP) is conerned with the computation of an optimal policy, i.e. an optimal
More informationOptimal Control of Partial Differential Equations I+II
About the lecture: Optimal Control of Partial Differential Equations I+II Prof. Dr. H. J. Pesch Winter semester 2011/12, summer semester 2012 (seminar), winter semester 2012/13 University of Bayreuth Preliminary
More informationPenalty and Barrier Methods. So we again build on our unconstrained algorithms, but in a different way.
AMSC 607 / CMSC 878o Advanced Numerical Optimization Fall 2008 UNIT 3: Constrained Optimization PART 3: Penalty and Barrier Methods Dianne P. O Leary c 2008 Reference: N&S Chapter 16 Penalty and Barrier
More informationAn optimal control problem for a parabolic PDE with control constraints
An optimal control problem for a parabolic PDE with control constraints PhD Summer School on Reduced Basis Methods, Ulm Martin Gubisch University of Konstanz October 7 Martin Gubisch (University of Konstanz)
More informationA STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE
A STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE Philip E. Gill Vyacheslav Kungurtsev Daniel P. Robinson UCSD Center for Computational Mathematics Technical Report CCoM-14-1 June 30, 2014 Abstract Regularized
More informationLecture 1: Introduction. Outline. B9824 Foundations of Optimization. Fall Administrative matters. 2. Introduction. 3. Existence of optima
B9824 Foundations of Optimization Lecture 1: Introduction Fall 2009 Copyright 2009 Ciamac Moallemi Outline 1. Administrative matters 2. Introduction 3. Existence of optima 4. Local theory of unconstrained
More information