Static and Dynamic Optimization (42111)


1 Static and Dynamic Optimization (42111)

Niels Kjølstad Poulsen
Build. 303b, room 016
Section for Dynamical Systems
Dept. of Applied Mathematics and Computer Science
The Technical University of Denmark

Lecture: Dynamic Programming

2 Outline of lecture

- Recap
- Dynamic Programming (D)
  - Unconstrained
  - Constrained
- Reading guidance (DO: chapter, page -)

3 Dynamic Optimization (D)

Find a sequence u_i ∈ U_i, i = 0, ..., N-1, which takes the system

  x_{i+1} = f_i(x_i, u_i),   x_0 = x0,   ψ(x_N) = 0

from its initial state x0 along a trajectory such that the performance index

  J = φ_N[x_N] + Σ_{i=0}^{N-1} L_i(x_i, u_i)

is minimized. Define the Hamiltonian function as:

  H_i = L_i(x_i, u_i) + λ_{i+1}^T f_i(x_i, u_i)

The necessary conditions are:

  x_{i+1} = f_i(x_i, u_i)                              (state equation)
  λ_i^T = ∂H_i/∂x_i                                    (costate equation)
  u_i = arg min_{u_i ∈ U_i} H_i    (0^T = ∂H_i/∂u_i in the unconstrained case)

with boundary conditions:

  x_0 = x0,   λ_N^T = ν^T ∂ψ/∂x_N + ∂φ/∂x_N

Dynamic Optimization (C)

Find a function u_t ∈ U_t, t ∈ [0; T] ⊂ R, which takes the system

  ẋ_t = f_t(x_t, u_t),   x_0 = x0,   ψ(x_T) = 0

from its initial state x0 along a trajectory such that the performance index

  J = φ_T[x_T] + ∫_0^T L_t(x_t, u_t) dt

is minimized. Define the Hamilton function as:

  H_t = L_t(x_t, u_t) + λ_t^T f_t(x_t, u_t)

The necessary conditions are:

  ẋ_t = f_t(x_t, u_t)
  -λ̇_t^T = ∂H_t/∂x_t
  u_t = arg min_{u_t ∈ U_t} H_t    (0^T = ∂H_t/∂u_t in the unconstrained case)

with boundary conditions:

  x_0 = x0,   λ_T^T = ν^T ∂ψ/∂x_T + ∂φ/∂x_T

4 Dynamic Programming

Found in Bertsekas. Kierkegaard: "Life can only be understood going backwards, but it must be lived going forwards."

Let us (for a start) now focus on:

- Discrete independent variable (normally the time).
- Discrete and finite state space and decision space (X_i and U_i).

5 Stagecoach problem

Example (Hans Ravn: Statisk og Dynamisk Optimering). Go from A to B with minimal cost.

[Figure: stagecoach network with stage costs, from A (London) to B (Esbjerg).]

6 Dynamic Programming

[Figure: the stagecoach network from A (London) to B (Esbjerg).]

  J = φ_N(x_N) + Σ_{i=0}^{N-1} L_i(x_i, u_i)        Objective
  x_{i+1} = f_i(x_i, u_i),   x_0 = x0               Dynamics
  (x_i, u_i) ∈ V_i,   x_N ∈ V_N                     Constraints

7 Dynamic Programming

Let us start with the easy part - at the end (i.e. i = 3):

[Figure: the last stage of the network, into B (Esbjerg).]

Here we easily find the optimal decisions:

  u_3*(x_3) = -1  for x_3 = 3
               0  for x_3 = 2
               1  for x_3 = 1

and the optimal costs V_3(x_3) for x_3 = 3, 2, 1, read off the last stage of the cost figure.

8 Dynamic Programming

For i = 2 we have:

[Figure: the last two stages of the network.]

the optimal decisions:

  u_2*(x_2) = -1  for x_2 = 3
               1  for x_2 = 2
               1  for x_2 = 1

and the optimal costs:

  V_2(x_2) =  ·  for x_2 = 3
              1  for x_2 = 2
             14  for x_2 = 1

9 Dynamic Programming

For i = 1 we have:

[Figure: stages 1 through 3 of the network.]

the optimal decisions:

  u_1*(x_1) =  0  for x_1 = 3
               1  for x_1 = 2
               0  for x_1 = 1

and the optimal costs:

  V_1(x_1) = 21  for x_1 = 3
             22  for x_1 = 2
              2  for x_1 = 1

10 Dynamic Programming

Finally, for i = 0 we have:

[Figure: the full network from A (London) to B (Esbjerg).]

the optimal decision:

  u_0*(x_0) = 0   since x_0 = 2

and the optimal cost:

  V_0(x_0) = 2   since x_0 = 2

11 Dynamic Programming

Let us return to i = 1 and see what we actually did.

[Figure: the network with the costs-to-go V_2 attached to stage 2.]

We had the optimal cost to go, V_2(x_2), for the next stage stored, i.e.

  V_2(x_2) =  ·  for x_2 = 3
              1  for x_2 = 2
             14  for x_2 = 1

as well as the optimal decision sequence. For each possible value of x_1 (x_1 = 1, 2, 3) we evaluated the loss for each possible decision (e.g. u_1 = -1, 0, 1 for x_1 = 2) and minimized w.r.t. u_1:

  L_1(x_1, u_1) + V_2(f_1(x_1, u_1))     because x_2 = f_1(x_1, u_1)

That minimization results in V_1(x_1).

12 Dynamic Programming

For each possible value of x_1 (1, 2, 3):

  V_1(x_1) = min_{u_1} { L_1(x_1, u_1) + V_2(f_1(x_1, u_1)) },   (x_1, u_1) ∈ V_1

Or in general (for i = 3, 2, 1, 0):

  V_i(x_i) = min_{u_i} { L_i(x_i, u_i) + V_{i+1}(f_i(x_i, u_i)) },   (x_i, u_i) ∈ V_i
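This recursion translates directly into code. Below is a minimal sketch in Python of the backward sweep for a generic finite problem; the dynamics f, stage loss L, terminal cost phi and the sets X, U are hypothetical placeholders (the edge costs of the stagecoach figure are not reproduced here):

```python
# Backward recursion  V_i(x) = min_u [ L_i(x,u) + V_{i+1}(f_i(x,u)) ],  V_N = phi.
# Minimal sketch: f, L, phi and the finite sets X, U below are hypothetical.

N = 4
X = [1, 2, 3]                 # finite state space (same at every stage here)
U = [-1, 0, 1]                # finite decision space

def f(i, x, u):               # dynamics x_{i+1} = f_i(x_i, u_i)
    return x + u

def L(i, x, u):               # stage loss L_i(x_i, u_i)
    return (x - 2) ** 2 + u ** 2

def phi(x):                   # terminal cost phi_N(x_N)
    return 0

V = {x: phi(x) for x in X}    # V_N
policy = []                   # u_i*(x) tables for i = N-1, ..., 0

for i in reversed(range(N)):
    Vi, ui = {}, {}
    for x in X:
        # only decisions that keep the next state inside X are considered
        costs = {u: L(i, x, u) + V[f(i, x, u)] for u in U if f(i, x, u) in V}
        ui[x] = min(costs, key=costs.get)
        Vi[x] = costs[ui[x]]
    V = Vi
    policy.insert(0, ui)

print(V)        # Bellman function V_0(x) for each initial state
print(policy)   # optimal decision tables u_i*(x_i)
```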

13 Free Dynamic Programming

Let us now focus on finding a sequence of decisions u_i, i = 0, 1, ..., N-1, which takes the system

  x_{i+1} = f_i(x_i, u_i),   x_0 = x0

along a trajectory such that the cost function

  J = φ(x_N) + Σ_{i=0}^{N-1} L_i(x_i, u_i) = J_0(x_0, u_0^{N-1})

is minimized.

Def.: u_i^k is the sequence of decisions from i to k.

The truncated performance index (the cost to go) is

  J_i(x_i, u_i^{N-1}) = φ(x_N) + Σ_{k=i}^{N-1} L_k(x_k, u_k)

It is quite easy to see that

  J_i(x_i, u_i^{N-1}) = L_i(x_i, u_i) + J_{i+1}(x_{i+1}, u_{i+1}^{N-1})

and that in particular

  J = J_0(x_0, u_0^{N-1}),   J_N = φ(x_N)

14 Free Dynamic Programming

The Bellman function (the optimal cost to go) is defined as:

  V_i(x_i) = min_{u_i^{N-1}} J_i(x_i, u_i^{N-1})

and is a function of the present state, x_i, and index, i. In particular

  V_N(x_N) = φ_N(x_N)

Theorem: The Bellman function V_i is given by the backwards recursion

  V_i(x_i) = min_{u_i} [ L_i(x_i, u_i) + V_{i+1}(x_{i+1}) ]

with the boundary condition

  V_N(x_N) = φ_N(x_N)

The Bellman equation is a functional equation and gives a sufficient condition. The overall optimum is given as V_0(x_0) = J*.

15 Free Dynamic Programming

What about the decisions?

  u_i* = arg min_{u_i} [ L_i(x_i, u_i) + V_{i+1}(f_i(x_i, u_i)) ] = u_i*(x_i)

(the argument of V_{i+1} is x_{i+1}). For a maximization problem, min is replaced by max.

16 Proof

By definition we have:

  V_i(x_i) = min_{u_i^{N-1}} J_i(x_i, u_i^{N-1})
           = min_{u_i^{N-1}} [ L_i(x_i, u_i) + J_{i+1}(x_{i+1}, u_{i+1}^{N-1}) ]

Since u_{i+1}^{N-1} does not affect L_i, we can write

  V_i(x_i) = min_{u_i} [ L_i(x_i, u_i) + min_{u_{i+1}^{N-1}} J_{i+1}(x_{i+1}, u_{i+1}^{N-1}) ]

The last term is nothing but V_{i+1}, due to the definition of the Bellman function. The boundary condition is also given by the definition of the Bellman function.

17 Pause

18 LQ problem

Simple LQ problem: the problem is to bring the system

  x_{i+1} = a x_i + b u_i,   x_0 = x0

from the initial state along a trajectory such that the performance index

  J = p x_N^2 + Σ_{i=0}^{N-1} ( q x_i^2 + r u_i^2 )

is minimized. On the boundary we have

  V_N(x_N) = φ(x_N) = p x_N^2

Inspired by this, we will try the candidate function

  V_i = s_i x_i^2

19 The Bellman equation

  V_i(x_i) = min_{u_i} [ L_i(x_i, u_i) + V_{i+1}(x_{i+1}) ]

gives in this special (scalar LQ) case (cut and paste):

  s_i x_i^2 = min_{u_i} [ q x_i^2 + r u_i^2 + s_{i+1} x_{i+1}^2 ]

where q x_i^2 + r u_i^2 is L_i(x_i, u_i) and s_{i+1} x_{i+1}^2 is V_{i+1}(x_{i+1}). With the state equation x_{i+1} = a x_i + b u_i inserted:

  s_i x_i^2 = min_{u_i} [ q x_i^2 + r u_i^2 + s_{i+1} (a x_i + b u_i)^2 ]

The minimum is obtained for

  u_i = - (a b s_{i+1}) / (r + b^2 s_{i+1}) x_i

which, inserted in the recursion, results in:

  s_i x_i^2 = [ q + a^2 s_{i+1} - (a^2 b^2 s_{i+1}^2) / (r + b^2 s_{i+1}) ] x_i^2
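The scalar minimization above can be verified symbolically; a small sketch, assuming sympy is available:

```python
# Symbolic check of the scalar LQ minimization (sketch; assumes sympy).
import sympy as sp

a, b, q, r, s1, x, u = sp.symbols('a b q r s1 x u', real=True)

cost = q*x**2 + r*u**2 + s1*(a*x + b*u)**2          # L_i + V_{i+1}(a x + b u)
u_star = sp.solve(sp.diff(cost, u), u)[0]           # stationarity in u
print(u_star)                                       # -a*b*s1*x/(b**2*s1 + r)

s_i = sp.simplify(cost.subs(u, u_star) / x**2)      # coefficient of x_i^2
print(s_i)   # equivalent to q + a**2*s1 - a**2*b**2*s1**2/(r + b**2*s1)
```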

20 The candidate function satisfies the Bellman equation if

  s_i = q + a^2 s_{i+1} - (a^2 b^2 s_{i+1}^2) / (r + b^2 s_{i+1}),   s_N = p

which is the (scalar version of the) Riccati equation. The solution: s_i is found by backwards recursion from s_N = p, and

  u_i = - (a b s_{i+1}) / (r + b^2 s_{i+1}) x_i

[Figure: the Bellman function V_i(x) = s_i x^2 as a function of x and time i, together with the state trajectory x_t.]

Method: Guess the functional form of V_i(x), i.e. up to a number of parameters. Check whether it satisfies the Bellman equation. This results in a (number of) recursion(s) for the parameter(s).
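Numerically, the Riccati recursion runs backwards from s_N = p, after which the optimal feedback can be simulated forwards. A minimal sketch; the numbers a, b, q, r, p, N, x0 are illustrative assumptions, not course data:

```python
# Scalar Riccati recursion and closed-loop simulation (sketch; the numbers
# a, b, q, r, p, N, x0 are illustrative assumptions).

a, b, q, r, p = 0.98, 0.1, 1.0, 0.1, 10.0
N, x0 = 50, 2.0

s = [0.0] * (N + 1)
s[N] = p                                   # boundary condition s_N = p
for i in reversed(range(N)):               # backwards: i = N-1, ..., 0
    s[i] = q + a**2 * s[i+1] - (a * b * s[i+1])**2 / (r + b**2 * s[i+1])

x, J = x0, 0.0
for i in range(N):                         # forwards with the optimal feedback
    u = -a * b * s[i+1] / (r + b**2 * s[i+1]) * x
    J += q * x**2 + r * u**2
    x = a * x + b * u
J += p * x**2

print(s[0] * x0**2, J)                     # V_0(x0) = s_0 x0^2 matches J
```

The printout illustrates that the realized cost coincides with V_0(x0) = s_0 x0^2, as the Bellman construction promises.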

21 Euler-Lagrange and Bellman

  V_i(x_i) = min_{u_i} [ L_i(x_i, u_i) + V_{i+1}(x_{i+1}) ]

Necessary condition (stationarity):

  0^T = ∂L_i/∂u_i + (∂V_{i+1}/∂x_{i+1}) (∂x_{i+1}/∂u_i),   x_{i+1} = f_i(x_i, u_i)

If the costate (or adjoint state) is defined as the sensitivity, i.e. as

  λ_i^T = ∂V_i(x_i)/∂x_i

then

  0^T = ∂L_i/∂u_i + λ_{i+1}^T ∂f_i/∂u_i

or

  0^T = ∂H_i/∂u_i,   H_i = L_i(x_i, u_i) + λ_{i+1}^T f_i(x_i, u_i)

i.e. the stationarity condition in the Euler-Lagrange equations.

22 Euler-Lagrange and Bellman

On the optimal trajectory

  V_i(x_i) = L_i(x_i, u_i*) + V_{i+1}(f_i(x_i, u_i*))

or, if we apply the chain rule,

  λ_i^T = ∂V_i(x_i)/∂x_i = ∂L_i/∂x_i + (∂V_{i+1}/∂x_{i+1}) (∂f_i/∂x_i)
        = ∂L_i/∂x_i + λ_{i+1}^T ∂f_i/∂x_i

or

  λ_i^T = ∂H_i/∂x_i,   λ_N^T = ∂φ_N(x_N)/∂x_N

i.e. the costate equation.
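In the scalar LQ case this correspondence can be checked numerically: with V_i(x) = s_i x^2 the costate is λ_i = ∂V_i/∂x_i = 2 s_i x_i, and along the optimal trajectory it should satisfy the costate recursion λ_i = 2 q x_i + a λ_{i+1} with λ_N = 2 p x_N. A small sketch; the numbers are illustrative assumptions:

```python
# Numerical check (sketch; illustrative numbers): lambda_i = 2 s_i x_i satisfies
# lambda_i = 2 q x_i + a*lambda_{i+1},  lambda_N = 2 p x_N,  on the optimal path.

a, b, q, r, p, N, x0 = 0.9, 0.5, 1.0, 0.2, 4.0, 8, 1.0

s = [0.0] * (N + 1)
s[N] = p
for i in reversed(range(N)):               # scalar Riccati recursion
    s[i] = q + a**2 * s[i+1] - (a * b * s[i+1])**2 / (r + b**2 * s[i+1])

x = [x0]
for i in range(N):                         # optimal closed loop
    u = -a * b * s[i+1] / (r + b**2 * s[i+1]) * x[i]
    x.append(a * x[i] + b * u)

lam = [2 * s[i] * x[i] for i in range(N + 1)]   # costate as sensitivity dV_i/dx_i
for i in range(N):
    assert abs(lam[i] - (2 * q * x[i] + a * lam[i+1])) < 1e-9
print("costate recursion holds")
```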

23 Pause

24 Constrained Dynamic Programming

Constraints:  u_i ∈ U_i,   x_i ∈ X_i

System dynamics:  x_{i+1} = f_i(x_i, u_i),   x_0 = x0

Performance index:

  J = φ_N(x_N) + Σ_{i=0}^{N-1} L_i(x_i, u_i)

Feasible state set:

  D_i = { x_i ∈ X_i | ∃ u_i ∈ U_i : f_i(x_i, u_i) ∈ D_{i+1} },   D_N = X_N

Feasible decision set:

  U_i*(x_i) = { u_i ∈ U_i : f_i(x_i, u_i) ∈ D_{i+1} }

25 Constrained Dynamic Programming

Theorem: Bellman equation:

  V_i(x_i) = min_{u_i ∈ U_i*(x_i)} [ L_i(x_i, u_i) + V_{i+1}(x_{i+1}) ],   V_N(x_N) = φ_N(x_N)

Feasible state set:

  D_i = { x_i ∈ X_i | ∃ u_i ∈ U_i : f_i(x_i, u_i) ∈ D_{i+1} },   D_N = X_N

Feasible decision set:

  U_i*(x_i) = { u_i ∈ U_i : f_i(x_i, u_i) ∈ D_{i+1} }
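The feasible sets are computed by the same kind of backward sweep, before (or interleaved with) the Bellman recursion. A sketch with hypothetical data (a scalar stepping system with a terminal constraint):

```python
# Backward computation of the feasible sets (sketch; the data below are
# hypothetical: a scalar stepping system with a terminal constraint).
#   D_N = X_N,  D_i = { x in X_i : exists u in U_i with f_i(x,u) in D_{i+1} }

N = 4
X = set(range(-2, 3))                       # X_i = {-2, ..., 2} at every stage
U = {-1, 0, 1}
f = lambda x, u: x + u

D = [None] * (N + 1)
D[N] = {1}                                  # e.g. terminal constraint x_N = 1
for i in reversed(range(N)):
    D[i] = {x for x in X if any(f(x, u) in D[i+1] for u in U)}

def U_feas(i, x):                           # feasible decision set U_i*(x_i)
    return {u for u in U if f(x, u) in D[i+1]}

print(D)              # D[3] = {0, 1, 2}: only these can still reach x_4 = 1
print(U_feas(3, 0))   # {1}: from x_3 = 0 the only feasible decision is u = 1
```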

26 Stagecoach problem II

[Figure: network with nodes A; B, C, D; E, F, G; H, I, K; J and edge costs. There is no edge out of K.]

It is easy to realize that

  X_4 = { J },  X_3 = { K, H, I },  X_2 = { E, F, G },  X_1 = { B, C, D },  X_0 = { A }

However, since there is no path from K to J:

  D_4 = { J },  D_3 = { H, I },  D_2 = { E, F, G },  D_1 = { B, C, D },  D_0 = { A }

27 Optimal stepping (DD)

Consider the system

  x_{i+1} = x_i + u_i,   x_0 = 2

the performance index (with N = 4)

  J = x_N^2 + Σ_{i=0}^{N-1} ( x_i^2 + u_i^2 )

and the constraints

  u_i ∈ { -1, 0, 1 },   x_i ∈ { -2, -1, 0, 1, 2 }

28 Firstly, we establish V_4(x_4) = x_4^2 as in the following table:

  x_4 :  -2  -1   0   1   2
  V_4 :   4   1   0   1   4

The combination of x_3 and u_3 determines the following state x_4 and consequently the V_4(x_4) contribution (· marks combinations that leave the state space):

  x_4 = x_3 + u_3:
              x_3:  -2  -1   0   1   2
  u_3 = -1:          ·  -2  -1   0   1
  u_3 =  0:         -2  -1   0   1   2
  u_3 =  1:         -1   0   1   2   ·

  V_4(x_4):
              x_3:  -2  -1   0   1   2
  u_3 = -1:          ·   4   1   0   1
  u_3 =  0:          4   1   0   1   4
  u_3 =  1:          1   0   1   4   ·

29 The combination of x_3 and u_3 also determines the instantaneous loss L_3 = x_3^2 + u_3^2:

  L_3:
              x_3:  -2  -1   0   1   2
  u_3 = -1:          ·   2   1   2   5
  u_3 =  0:          4   1   0   1   4
  u_3 =  1:          5   2   1   2   ·

If we add up the instantaneous loss and V_4, we have a tableau in which, for each possible value of x_3, we can perform the minimization in

  V_3(x_3) = min_{u_3} [ L_3(x_3, u_3) + V_4(x_3 + u_3) ]

and determine the optimal value of the decision and the Bellman function (as functions of x_3):

  L_3 + V_4:
              x_3:  -2  -1   0   1   2
  u_3 = -1:          ·   6   2   2   6
  u_3 =  0:          8   2   0   2   8
  u_3 =  1:          6   2   2   6   ·
  V_3:               6   2   0   2   6
  u_3*:              1  0,1   0  -1,0  -1

Knowing V_3(x_3) we have one of the components for i = 2. In this manner we can iterate backwards and find:

30

  L_2 + V_3:
              x_2:  -2  -1   0   1   2
  u_2 = -1:          ·   8   3   2   7
  u_2 =  0:         10   3   0   3  10
  u_2 =  1:          7   2   3   8   ·
  V_2:               7   2   0   2   7
  u_2*:              1   1   0  -1  -1

  L_1 + V_2:
              x_1:  -2  -1   0   1   2
  u_1 = -1:          ·   9   3   2   7
  u_1 =  0:         11   3   0   3  11
  u_1 =  1:          7   2   3   9   ·
  V_1:               7   2   0   2   7
  u_1*:              1   1   0  -1  -1

  L_0 + V_1:
              x_0:  -2  -1   0   1   2
  u_0 = -1:          ·   9   3   2   7
  u_0 =  0:         11   3   0   3  11
  u_0 =  1:          7   2   3   9   ·
  V_0:               7   2   0   2   7
  u_0*:              1   1   0  -1  -1

With x_0 = 2 we can trace forward and find the input sequence -1, -1, 0, 0, which gives an (optimal) performance equal to 7.

Sensitivity. Steering vs. feedback.
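The tables above can be reproduced in a few lines. A sketch of exactly this example; it recovers the sequence -1, -1, 0, 0 and the optimal cost 7:

```python
# Tabular DP for the stepping example (sketch of exactly this problem):
#   x_{i+1} = x_i + u_i,  J = x_4^2 + sum_i (x_i^2 + u_i^2),
#   u_i in {-1,0,1},  x_i in {-2,...,2},  x_0 = 2.

N, X, U = 4, range(-2, 3), (-1, 0, 1)

V = {x: x * x for x in X}                  # V_4(x_4) = x_4^2
policy = []
for i in reversed(range(N)):               # backward sweep, builds the tables
    Vi, ui = {}, {}
    for x in X:
        costs = {u: x*x + u*u + V[x + u] for u in U if x + u in V}
        ui[x] = min(costs, key=costs.get)  # ties broken towards smaller u
        Vi[x] = costs[ui[x]]
    V = Vi
    policy.insert(0, ui)

x, u_seq, J = 2, [], 0
for i in range(N):                         # forward trace from x_0 = 2
    u = policy[i][x]
    u_seq.append(u)
    J += x*x + u*u
    x += u
J += x*x

print(u_seq, J)                            # [-1, -1, 0, 0] 7
```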

31 Optimal stepping (DD) II - EPC

Consider the system from the previous example, but with the constraint that x_4 = 1.

System:

  x_{i+1} = x_i + u_i,   x_0 = 2,   x_4 = 1

Performance index:

  J = x_4^2 + Σ_{i=0}^{3} ( x_i^2 + u_i^2 )

Constraints:

  u_i ∈ { -1, 0, 1 },   x_i ∈ { -2, -1, 0, 1, 2 }

Now V_4 is only defined on the feasible terminal set:

  x_4 :  -2  -1   0   1   2
  V_4 :   ·   ·   ·   1   ·

  L_3 + V_4:
              x_3:  -2  -1   0   1   2
  u_3 = -1:          ·   ·   ·   ·   6
  u_3 =  0:          ·   ·   ·   2   ·
  u_3 =  1:          ·   ·   2   ·   ·
  V_3:               ·   ·   2   2   6
  u_3*:              ·   ·   1   0  -1

32

  L_2 + V_3:
              x_2:  -2  -1   0   1   2
  u_2 = -1:          ·   ·   ·   4   7
  u_2 =  0:          ·   ·   2   3  10
  u_2 =  1:          ·   4   3   8   ·
  V_2:               ·   4   2   3   7
  u_2*:              ·   1   0   0  -1

  L_1 + V_2:
              x_1:  -2  -1   0   1   2
  u_1 = -1:          ·   ·   5   4   8
  u_1 =  0:          ·   5   2   4  11
  u_1 =  1:          9   4   4   9   ·
  V_1:               9   4   2   4   8
  u_1*:              1   1   0  -1,0  -1

  L_0 + V_1:
              x_0:  -2  -1   0   1   2
  u_0 = -1:          ·  11   5   4   9
  u_0 =  0:         13   5   2   5  12
  u_0 =  1:          9   4   5  10   ·
  V_0:               9   4   2   4   9
  u_0*:              1   1   0  -1  -1

With x_0 = 2 we can iterate forward and find the optimal input sequence -1, -1, 0, 1, which is connected to a performance index equal to 9.
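Only the boundary condition changes compared with the unconstrained sketch: V_4 is now defined on the feasible terminal set alone (+inf elsewhere). A sketch; it recovers the sequence -1, -1, 0, 1 and the cost 9:

```python
# Same example with the end-point constraint x_4 = 1 (sketch): only the
# boundary changes, V_4 = x_4^2 on the feasible terminal set, +inf elsewhere.

INF = float('inf')
N, X, U = 4, range(-2, 3), (-1, 0, 1)

V = {x: (x * x if x == 1 else INF) for x in X}
policy = []
for i in reversed(range(N)):
    Vi, ui = {}, {}
    for x in X:
        costs = {u: x*x + u*u + V[x + u] for u in U if x + u in V}
        ui[x] = min(costs, key=costs.get)
        Vi[x] = costs[ui[x]]
    V = Vi
    policy.insert(0, ui)

x, u_seq, J = 2, [], 0
for i in range(N):
    u = policy[i][x]
    u_seq.append(u)
    J += x*x + u*u
    x += u
J += x*x

print(u_seq, J)                            # [-1, -1, 0, 1] 9
```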

33 Reading guidance

DO: chapter, page -
