Chapter 5: Pontryagin's Minimum Principle (Constrained OCP)
Pontryagin's Minimum Principle
Plant: ẋ(t) = f(x(t), u(t), t), (5-1)
with the control restricted to an admissible set, u(t) ∈ U.
Performance index (PI): J = S(x(t_f), t_f) + ∫_{t_0}^{t_f} V(x(t), u(t), t) dt. (5-2)
Boundary condition: x(t_0) = x_0.
The goal is to find the optimal control u*(t).
Pontryagin's Minimum Principle
Solution: Form the Pontryagin H-function (Hamiltonian):
H(x(t), u(t), λ(t), t) = V(x(t), u(t), t) + λᵀ(t) f(x(t), u(t), t).
To find the optimal control, we must minimize H w.r.t. the admissible controls u(t) ∈ U:
H(x*(t), u*(t), λ*(t), t) ≤ H(x*(t), u(t), λ*(t), t) for all u(t) ∈ U, (5-3)
and solve the set of 2n state and costate equations
ẋ*(t) = +(∂H/∂λ)*, λ̇*(t) = -(∂H/∂x)*. (5-4)
Pontryagin's Minimum Principle
With the boundary conditions
x*(t_0) = x_0 and [H* + ∂S/∂t]_{t_f} δt_f + [(∂S/∂x) - λ*]ᵀ_{t_f} δx_f = 0. (5-5)
Note:
1. Eq. (5-3) is valid for both constrained and unconstrained control systems.
2. The optimality conditions (5-3) to (5-5) are necessary conditions for optimality.
3. A sufficient condition for the unconstrained case is that ∂²H/∂u² must be positive definite.
Additional Results:
1. If t_f is fixed and H does not depend on time t explicitly, then the Hamiltonian must be constant when evaluated along the optimal trajectory:
H(x*(t), u*(t), λ*(t)) = constant for all t ∈ [t_0, t_f]. (5-6)
2. If the final time t_f is free and the Hamiltonian does not depend explicitly on time t, then the Hamiltonian must be identically zero when evaluated along the optimal trajectory; that is,
H(x*(t), u*(t), λ*(t)) = 0 for all t ∈ [t_0, t_f]. (5-7)
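To make conditions (5-3) and (5-4) concrete, here is a minimal symbolic sketch in Python (sympy) for a hypothetical scalar plant ẋ = x + u with running cost V = ½u² and the bound |u| ≤ 1; the plant, cost, and bound are illustrative assumptions, not taken from these slides. Because this H is convex in u, minimizing it under the bound simply clips the unconstrained stationary point.

import sympy as sp

x, u, lam = sp.symbols('x u lam')

f = x + u                        # hypothetical plant: x_dot = x + u
V = sp.Rational(1, 2) * u**2     # hypothetical running cost: V = u^2 / 2

H = V + lam * f                  # Pontryagin H-function: H = V + lam^T f

lam_dot = -sp.diff(H, x)                          # costate equation (5-4): lam_dot = -dH/dx
u_stationary = sp.solve(sp.diff(H, u), u)[0]      # unconstrained stationary point of H in u
u_star = sp.Max(-1, sp.Min(1, u_stationary))      # minimizer of H over |u| <= 1, per (5-3)

print('costate equation: lam_dot =', lam_dot)     # prints -lam
print('optimal control:  u*      =', u_star)      # prints the clipped value of -lam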
Constrained Time-Optimal Control (TOC) of LTI Systems
Plant: ẋ(t) = A x(t) + B u(t), (5-8)
where x(t) is the n-dimensional state vector and u(t) is the r-dimensional input vector.
Assumptions:
1. The system is completely controllable, i.e., the rank of the controllability matrix
[B  AB  A²B  …  A^(n-1)B] (5-9)
is n (a quick numerical check of this rank test is sketched below, after assumption 3).
2. Constraints on the control: the control vector u(t) is bounded in magnitude, u(t) ∈ U, (5-10)
Constrained Time-Optimal Control (TOC) of LTI Systems
or, component-wise, |u_j(t)| ≤ U_j, j = 1, 2, …, r. (5-11)
By absorbing the bounds U_j into the matrix B, the constraint becomes, component-wise,
|u_j(t)| ≤ 1, j = 1, 2, …, r. (5-12)
3. Boundary conditions: x(t_0) = x_0, x(t_f) = 0, and t_f is free.
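Assumption 1 can be verified numerically in a few lines; the sketch below builds the controllability matrix of (5-9) for an illustrative pair (A, B) (a double integrator), which is an assumption for demonstration only.

import numpy as np

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])      # illustrative system matrix
B = np.array([[0.0],
              [1.0]])           # illustrative input matrix

n = A.shape[0]
# Controllability matrix [B, AB, ..., A^(n-1) B] of Eq. (5-9)
ctrb = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])
print('rank =', np.linalg.matrix_rank(ctrb), 'with n =', n)   # rank n means completely controllable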
Constrained Time-Optimal Control (TOC) of LTI Systems
Problem: Find the (optimal) control u*(t) which satisfies the constraint (5-12) and drives the system (5-8) from the initial state x(t_0) to the origin 0 in minimum time.
Solution:
Step 1: Performance index: J = ∫_{t_0}^{t_f} 1 dt = t_f - t_0. (5-13)
Step 2: Hamiltonian: H(x(t), u(t), λ(t)) = 1 + λᵀ(t) [A x(t) + B u(t)]. (5-14)
Constrained Time-Optimal Control (TOC) of LTI Systems
Step 3: State and costate equations:
ẋ*(t) = +∂H/∂λ = A x*(t) + B u*(t), λ̇*(t) = -∂H/∂x = -Aᵀ λ*(t). (5-15)
Boundary conditions: x*(t_0) = x_0, x*(t_f) = 0, and, since t_f is free, H(x*(t_f), u*(t_f), λ*(t_f)) = 0. (5-16)
Step 4: Optimality condition (by minimizing H w.r.t. u(t)):
1 + λ*ᵀ(t) [A x*(t) + B u*(t)] ≤ 1 + λ*ᵀ(t) [A x*(t) + B u(t)] for all admissible u(t),
i.e., u*(t) = arg min over |u_j(t)| ≤ 1 of q*ᵀ(t) u(t), (5-17)
Constrained Time-Optimal Control (TOC) of LTI Systems
where q*(t) = Bᵀ λ*(t). (5-18)
Step 5: Optimal control. The minimization in (5-17) is carried out component by component:
if q_j*(t) > 0, the minimum of u_j(t) q_j*(t) over |u_j(t)| ≤ 1 is attained at u_j*(t) = -1;
if q_j*(t) < 0, the minimum is attained at u_j*(t) = +1.
Constrained Time Optimal Control (TOC) of LTI systems (5-19) ** Signum function: f o sgn f i 11
Constrained Time-Optimal Control (TOC) of LTI Systems
Component-wise: u_j*(t) = -sgn[q_j*(t)] = -sgn{[Bᵀ λ*(t)]_j}, j = 1, 2, …, r.
Constrained Time-Optimal Control (TOC) of LTI Systems
Step 6: Types of TOC.
1) Normal TOC (NTOC): Suppose that during [t_0, t_f] there exists a set of isolated times t_1, t_2, … at which q_j*(t) = 0, with q_j*(t) ≠ 0 everywhere else. Then u_j*(t) = -sgn[q_j*(t)] is well defined (±1) except at those isolated switching instants.
Constrained Time-Optimal Control (TOC) of LTI Systems
In that case u_j*(t) is a piecewise-constant function with simple switchings at t_1, t_2, t_3, t_4; thus the optimal control u_j*(t) switches four times (number of switchings = 4).
Constrained Time-Optimal Control (TOC) of LTI Systems
2) Singular TOC (STOC): Suppose that during [t_0, t_f] there are one or more subintervals [T_1, T_2] on which q_j*(t) = 0 identically; then u_j*(t) = -sgn[q_j*(t)] is not defined there ([T_1, T_2] is the so-called singular interval).
Constrained Time-Optimal Control (TOC) of LTI Systems
Step 7: Bang-bang control law. For NTOC, the optimal control only takes its extreme values:
u_j*(t) = -sgn[q_j*(t)] = ±1 for almost all t ∈ [t_0, t_f].
Step 8: Conditions for NTOC. The TOC problem is normal if, for each column b_j of B, the matrix
G_j = [b_j  A b_j  A² b_j  …  A^(n-1) b_j], j = 1, 2, …, r,
is nonsingular (has rank n).
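A minimal numerical sketch of Steps 7 and 8 follows: normality is checked column by column, and the bang-bang law (5-19) is evaluated from the costate solution λ*(t) = exp(-Aᵀ t) λ*(0) for a guessed initial costate. The matrices and the guess lam0 are illustrative assumptions only.

import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
n = A.shape[0]

# Step 8: the problem is normal if [b_j, A b_j, ..., A^(n-1) b_j] is nonsingular for every column b_j
for j in range(B.shape[1]):
    bj = B[:, [j]]
    Gj = np.hstack([np.linalg.matrix_power(A, k) @ bj for k in range(n)])
    print(f'column {j}: rank {np.linalg.matrix_rank(Gj)} (normal if equal to {n})')

# Step 7: costate satisfies lam_dot = -A^T lam, so lam(t) = expm(-A^T t) lam0
lam0 = np.array([1.0, -0.5])              # illustrative guess for lam*(0)
for t in np.linspace(0.0, 2.0, 5):
    lam = expm(-A.T * t) @ lam0
    q = B.T @ lam                         # switching function q*(t) = B^T lam*(t)
    u = -np.sign(q)                       # bang-bang law u_j*(t) = -sgn(q_j*(t)), Eq. (5-19)
    print(f't = {t:.2f}   q = {q.ravel()}   u = {u.ravel()}')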
Constrained Time-Optimal Control (TOC) of LTI Systems
Step 9: Uniqueness of the optimal control: if the problem is NTOC, then the time-optimal control is unique.
Step 10: Number of switchings: if the problem is NTOC and all the eigenvalues of A are real, then each component u_j*(t) of the optimal control can switch at most (n - 1) times.
Example: Double Integrator
System: a mass m driven by a force f(t), with position x(t): m ẍ(t) = f(t).
Assume |u(t)| ≤ 1 for t ∈ [t_0, t_f], where u(t) = f(t)/m.
Defining the states x_1 = x and x_2 = ẋ gives
ẋ_1(t) = x_2(t), ẋ_2(t) = u(t).
Problem: Find the admissible optimal control that forces the system from any initial state x(0) to the origin in minimum time.
Solution: Assume we are dealing with an NTOC problem.
Step 1: Performance index: J = ∫_{t_0}^{t_f} 1 dt.
Step 2: Hamiltonian: H = 1 + λ_1(t) x_2(t) + λ_2(t) u(t).
Example: Double Integrator
Step 3: Minimization of the Hamiltonian: u*(t) = -sgn[λ_2*(t)].
Step 4: Costate solution: λ̇_1*(t) = -∂H/∂x_1 = 0 and λ̇_2*(t) = -∂H/∂x_2 = -λ_1*(t), so
λ_1*(t) = λ_1*(0), λ_2*(t) = λ_2*(0) - λ_1*(0) t,
i.e., λ_2*(t) is a straight line in t.
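The costate solution of Step 4 can be checked symbolically; this is a minimal sympy sketch (the variable names are mine), confirming that λ_2*(t) is affine in t.

import sympy as sp

t = sp.symbols('t')
lam1 = sp.Function('lam1')
lam2 = sp.Function('lam2')

# Costate equations of the double integrator: lam1' = -dH/dx1 = 0, lam2' = -dH/dx2 = -lam1
eqs = [sp.Eq(lam1(t).diff(t), 0),
       sp.Eq(lam2(t).diff(t), -lam1(t))]
print(sp.dsolve(eqs))   # expect lam1(t) constant and lam2(t) a straight line in t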
Example: Double Integrator
Step 5: Time-optimal control sequences. Since λ_2*(t) is a straight line in t, it can change sign at most once, and by the Step 10 theorem (n = 2, all eigenvalues real) the number of switchings is at most n - 1 = 1. The control sequences that satisfy this are
{+1}, {-1}, {+1, -1}, {-1, +1}.
Note that sequences such as {+1, -1, +1} or {-1, +1, -1} cannot be in the above group, since they violate the theorem. Hence there are four possible solutions.
Example: Double Integrator
Step 6: State trajectories. With u = Δ = ±1 constant, integrating the state equations gives
x_2(t) = x_2(0) + Δ t, x_1(t) = x_1(0) + x_2(0) t + ½ Δ t².
For the phase plots we need to eliminate t:
if Δ = +1: x_1 = ½ x_2² + c_1, where c_1 = x_1(0) - ½ x_2²(0);
if Δ = -1: x_1 = -½ x_2² + c_2, where c_2 = x_1(0) + ½ x_2²(0).
Each case describes a family of parabolas in the (x_1, x_2) plane.
Example: Double Integrator
Our aim is to drive the system from any initial state (x_1(0), x_2(0)) to the origin (0, 0) in minimum time. At t = t_f: x_1(t_f) = 0, x_2(t_f) = 0.
Example: Double Integrator
Rewriting this for an arbitrary initial state x_1 = x_10, x_2 = x_20.
Step 7: Switch curve. In the figure above, the two curves γ_+ and γ_- are the trajectories that transfer an initial state (x_1, x_2) directly to the origin (0, 0): γ_+ under u = +1 (the branch with x_2 < 0) and γ_- under u = -1 (the branch with x_2 > 0).
Example: Double Integrator
Switch curve: the union γ of γ_+ and γ_- is described by x_1 = -½ x_2 |x_2|.
Example: Double Integrator
Step 8: Phase-plane regions: define the regions in which we need to apply u = +1 or u = -1.
R_+ = {(x_1, x_2) : x_1 < -½ x_2 |x_2|}, the region below the switch curve, where u = +1 is applied;
R_- = {(x_1, x_2) : x_1 > -½ x_2 |x_2|}, the region above the switch curve, where u = -1 is applied.
Example: Double Integrator
Step 9: Control law. Define the switching function s(x) = x_1 + ½ x_2 |x_2|. Then the time-optimal feedback law is
u*(t) = +1 if s(x(t)) < 0 (state in R_+),
u*(t) = -1 if s(x(t)) > 0 (state in R_-),
u*(t) = -sgn(x_2(t)) if s(x(t)) = 0 (state on the switch curve γ),
u*(t) = 0 once the origin is reached.
Example: Double Integrator
Step 10: Implementation of the control law: the feedback u* = -sgn[s(x)] (with u* = -sgn(x_2) on the switch curve) can be realized by a nonlinear block that computes s(x) = x_1 + ½ x_2 |x_2| followed by a relay (signum) element.
Step 11: Minimum time. Carrying out the two phases of the bang-bang motion from (x_10, x_20) gives
t_f* = x_20 + 2 √(x_10 + ½ x_20²) if s(x_0) > 0,
t_f* = -x_20 + 2 √(-x_10 + ½ x_20²) if s(x_0) < 0,
t_f* = |x_20| if s(x_0) = 0.
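As a quick sanity check of Steps 9 to 11, here is a small Python simulation of the bang-bang feedback for the double integrator; the initial state, step size, and stopping tolerance are arbitrary choices for illustration.

import numpy as np

def u_star(x1, x2):
    # Step 9 feedback: u = -sgn(s) with s = x1 + 0.5*x2*|x2|, and u = -sgn(x2) on the switch curve
    s = x1 + 0.5 * x2 * abs(x2)
    if abs(s) > 1e-9:
        return -np.sign(s)
    return -np.sign(x2) if abs(x2) > 1e-9 else 0.0

x10, x20 = 1.0, 0.0                  # illustrative initial state
x1, x2, t, dt = x10, x20, 0.0, 1e-3
while (abs(x1) > 1e-2 or abs(x2) > 1e-2) and t < 10.0:
    u = u_star(x1, x2)
    x1, x2 = x1 + x2 * dt, x2 + u * dt    # forward Euler integration of x1' = x2, x2' = u
    t += dt

t_pred = x20 + 2.0 * np.sqrt(x10 + 0.5 * x20**2)  # Step 11 formula (case s(x_0) > 0)
print(f'simulated arrival time ~ {t:.3f}, predicted minimum time = {t_pred:.3f}')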
Chapter 6: Dynamic Programming
Different types of Dynamic Programming (DP):
Continuous DP
Discrete DP
Continuous DP
Hamilton-Jacobi-Bellman (HJB) Equation
The HJB equation is the continuous analog of the discrete DP algorithm, and it yields a closed-loop optimal control.
Plant: ẋ(t) = f(x(t), u(t), t).
PI: J = S(x(t_f), t_f) + ∫_{t_0}^{t_f} V(x(t), u(t), t) dt, with t_f fixed.
We attempt to determine the control that minimizes J over all admissible u(t).
Solution:
Step 1: Define the Hamiltonian
H(x(t), u(t), ∂J*/∂x, t) = V(x(t), u(t), t) + (∂J*/∂x)ᵀ f(x(t), u(t), t),
where J* = J*(x(t), t) is the optimal cost-to-go and J*_x = ∂J*/∂x plays the role of the costate.
Hamilton-Jacobi-Bellman (HJB) Equation
Step 2: Minimize H w.r.t. u(t): u*(t) = arg min over u of H(x(t), u(t), ∂J*/∂x, t) (from ∂H/∂u = 0 in the unconstrained case).
Step 3: Using the result of Step 2, form the optimal Hamiltonian H* = H(x(t), u*(t), ∂J*/∂x, t).
Step 4: To obtain the optimal control, the following partial differential equation (the HJB equation) must be satisfied:
∂J*/∂t + H*(x(t), ∂J*/∂x, t) = 0, with boundary condition J*(x(t_f), t_f) = S(x(t_f), t_f).
Hamilton-Jacobi-Bellman (HJB) Equation
Step 5: Use the solution J* from Step 4 to evaluate ∂J*/∂x, and substitute it into the expression for u*(t) of Step 2 to obtain the optimal control.
Example: a scalar plant with a quadratic performance index, solved by the above procedure.
Hamilton-Jacobi-Bellman (HJB) Equation
Form the optimal Hamiltonian H* and write the HJB equation ∂J*/∂t + H* = 0, with boundary condition J*(x(t_f), t_f) = S(x(t_f), t_f).
Hamilton-Jacobi-Bellman (HJB) Equation
One way to solve the HJB equation is to guess a form for the solution. Assume a quadratic form
J*(x, t) = ½ p(t) x²(t). (**)
The boundary condition fixes p(t_f). From (**), ∂J*/∂x = p(t) x(t), and the optimal control becomes
u*(t) = -∂J*/∂x = -p(t) x(t). (##)
Substituting (##) into the HJB equation yields an ordinary differential equation (a scalar Riccati equation) for p(t).
Hamilton-Jacobi-Bellman (HJB) Equation
Step 5: Optimal control: u*(t) = -∂J*/∂x = -p(t) x(t).
Also, from Step 4, as t_f → ∞ the gain becomes constant and u*(t) = -(√5 + 2) x(t).
Substituting into the system equation:
ẋ(t) = 2 x(t) - (√5 + 2) x(t) = -√5 x(t),
so the closed-loop system is stable.
LQR Systems Using the HJB Equation
System: ẋ(t) = A(t) x(t) + B(t) u(t).
PI: J = ½ xᵀ(t_f) F x(t_f) + ½ ∫_{t_0}^{t_f} [xᵀ(t) Q(t) x(t) + uᵀ(t) R(t) u(t)] dt.
Step 1: Form the Hamiltonian:
H = ½ xᵀ Q x + ½ uᵀ R u + (∂J*/∂x)ᵀ (A x + B u).
Step 2: Minimize H w.r.t. u(t): ∂H/∂u = R u + Bᵀ (∂J*/∂x) = 0, so u*(t) = -R⁻¹ Bᵀ (∂J*/∂x).
LQR Systems Using the HJB Equation
Step 3: Optimal Hamiltonian:
H* = ½ xᵀ Q x + (∂J*/∂x)ᵀ A x - ½ (∂J*/∂x)ᵀ B R⁻¹ Bᵀ (∂J*/∂x).
Step 4: HJB equation:
∂J*/∂t + ½ xᵀ Q x + (∂J*/∂x)ᵀ A x - ½ (∂J*/∂x)ᵀ B R⁻¹ Bᵀ (∂J*/∂x) = 0.
LQR Systems Using the HJB Equation
To solve the HJB equation, let us guess the solution
J*(x, t) = ½ xᵀ(t) P(t) x(t),
where P(t) is a positive-definite matrix function, so that ∂J*/∂x = P(t) x(t). After substituting into the HJB equation (and requiring it to hold for all x), we obtain the matrix Riccati equation
Ṗ(t) + Q + P(t) A + Aᵀ P(t) - P(t) B R⁻¹ Bᵀ P(t) = 0,
with boundary condition P(t_f) = F.
LQR Systems Using the HJB Equation
Step 5: Optimal control: u*(t) = -R⁻¹ Bᵀ P(t) x(t), a linear state-feedback law.
Exercise: Using the above algorithm, solve the LQR problem for the following system:
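Independently of the particular system in the exercise, the steady-state (infinite-horizon) form of the Riccati equation from Step 4 can be solved numerically. The sketch below uses illustrative matrices, not the exercise data, and scipy's algebraic Riccati solver.

import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0],
              [0.0, -1.0]])     # illustrative plant
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)                   # illustrative state weight
R = np.array([[1.0]])           # illustrative control weight

P = solve_continuous_are(A, B, Q, R)   # steady-state solution of the Riccati equation
K = np.linalg.inv(R) @ B.T @ P         # optimal gain of Step 5: u* = -K x
print('P =\n', P)
print('closed-loop eigenvalues:', np.linalg.eigvals(A - B @ K))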
Discrete DP
DP, proposed by Bellman, is based on a simple concept called the Principle of Optimality.
Principle of Optimality
Consider a multiple-decision optimization problem over a path from A to B passing through an intermediate point C:
J_AC: cost of the segment A→C;
J_CB: cost of the segment C→B;
cost of the entire segment A→B: J_AB = J_AC + J_CB.
If the path A→C→B is the optimal path from A to B, then the remaining segment C→B must itself be the optimal path from C to B. In other words, the total optimal path can be broken into smaller segments which are themselves optimal.
Principle of Optimality
Principle of Optimality: an optimal policy has the property that, whatever the previous state and decision (i.e., control), the remaining decisions must constitute an optimal policy with regard to the state resulting from the previous decision.
For a better understanding, consider a multiple-decision process, for example an aircraft routing network.
Dynamic Programming Applied to a Routing Problem
Problem: find the most economical route to fly from city A to city B. At each node (state) a decision (control) is made.
Dynamic Programming Applied to a Routing Problem
Let the decision (control) be: u = +1 means move up and to the left; u = -1 means move down and to the right.
Here we have 5 stages, from k = 0 to k = N = 4. It is natural to start working backward from the final stage (point).
Stage 5 (k = k_f = N = 4): this is just the final point; there is only one city, B, and hence no cost is involved.
Stage 4 (k = 3): there are two cities in this stage, H and I, and we need to find the most economical route from this stage to stage 5:
H → B: cost 2;
I → B: cost 3.
Dynamic Programming Applied to a Routing Problem
Stage 3 (k = 2): from E, F, G we can fly to H or I:
E → H → B (u = -1): total cost = 2 + 4 = 6;
F → H → B (u = +1): total cost = 2 + 3 = 5;
F → I → B (u = -1): total cost = 3 + 5 = 8;
so the optimal cost to fly from F is 5;
G → I → B (u = +1): total cost = 9.
Stage 2 (k = 1): similarly to stage 3, the possible paths give:
from C to B: total cost = 9;
from D to B: total cost = 7.
Dynamic programming applied to a Routing Problem Stage 1: k=0 From A to C to B From A to D to B Optimal solution: total cost=11 total cost=11 A C F H B The economical route A D F H B Minimum cost=11
Dynamic Programming on OCP
Example:
Plant: a first-order linear system ẋ(t) = a x(t) + b u(t), with state and control constrained to bounded (quantized) ranges.
PI: J = x²(T) + λ ∫_0^T u²(t) dt.
Problem: drive the system so that the final state x(T) is close to zero while consuming minimum control effort.
Solution: first, discretization: divide the interval [0, T] into N equal segments of length Δt, with grid points t = 0, Δt, 2Δt, …, NΔt = T.
Dynamic Programming on OCP
Let Δt be small enough that the control signal can be approximated by a piecewise-constant function that changes only at the instants t = kΔt, k = 0, 1, …, N - 1. For t = kΔt, x(kΔt) is referred to as the k-th value of x and is denoted by x(k).
Dynamic Programming on OCP
In a similar way, the discretized model and performance index are
x(k + 1) = (1 + a Δt) x(k) + b Δt u(k), J = x²(N) + λ Δt [u²(0) + u²(1) + … + u²(N - 1)].
Let a = 0, b = 1, λ = 2, T = 2 and N = 2 (so Δt = 1 and k = 0, 1); then
x(k + 1) = x(k) + u(k), J = x²(2) + 2 [u²(0) + u²(1)].
Dynamic Programming on OCP
The first step is to find the optimal policy for the last stage of operation by trying all of the allowable control values at each of the allowable state values. The optimal control for each state value is the one that yields the trajectory having the minimum cost. To limit the required number of calculations, x and u must be quantized.
Dynamic Programming on OCP
k = 1: J_12 is the cost of going from state x(1) to x(2),
J_12(x(1), u(1)) = x²(2) + 2 u²(1), with x(2) = x(1) + u(1);
minimizing over the allowable u(1) at each quantized x(1) gives J*_12(x(1)) and u*(x(1), 1).
Dynamic Programming on OCP
k = 0: J_01 is the cost of going from state x(0) to x(1). The quantities sought are J*_02(x(0)) and u*(x(0), 0). In the optimal case:
C_02(x(0), u(0)) = J_01(x(0), u(0)) + J*_12(x(1)),
J*_02(x(0)) = min over u(0) of [J_01(x(0), u(0)) + J*_12(x(1))].
Dynamic Programming on OCP
For example, let x(0) = 1.5. Then u*(1.5, 0) = -0.5 and J*_02(1.5) = 1.25.
Using u*(1.5, 0) = -0.5 at x(0) = 1.5 gives x(1) = 1, and from the first (last-stage) table u*(1, 1) = -0.5.
Optimal control sequence: {-0.5, -0.5}, with minimum cost 1.25.
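Assuming the discretized model behind these tables is x(k+1) = x(k) + u(k) with J = x²(2) + 2[u²(0) + u²(1)] and controls quantized in steps of 0.5 (a reconstruction consistent with the numbers quoted above), a brute-force check in Python recovers the same optimum.

import itertools

U = [-1.0, -0.5, 0.0, 0.5, 1.0]          # quantized admissible controls

def cost(x0, u0, u1):
    # assumed model x(k+1) = x(k) + u(k); terminal cost plus control cost with lambda = 2
    x2 = x0 + u0 + u1
    return x2**2 + 2.0 * (u0**2 + u1**2)

x0 = 1.5
best = min(itertools.product(U, U), key=lambda uu: cost(x0, *uu))
print('u*(0), u*(1) =', best, '  J* =', cost(x0, *best))   # expect (-0.5, -0.5) and 1.25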
Interpolation
Suppose the trial values of u(k) are -1, -0.75, -0.5, -0.25, 0, 0.25, 0.5, 0.75, 1, and that J*_12(x(1)) and u*(x(1), 1) have been tabulated on the state grid. Suppose x(0) = 1.5. Two of the resulting values of x(1) do not coincide with the grid points (-1, -0.5, 0, 0.5, 1), so J*_12 at those points is obtained by linear interpolation between the neighboring grid values.
Interpolation
The interpolated values of J*_12 at the off-grid states are collected in a table and used in the stage-0 minimization.
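Under the same assumed model, the interpolation step can be sketched as follows: the last-stage cost J*_12 is tabulated on the quantized state grid, and np.interp supplies its value at off-grid states reached by the finer control quantization. The control grid follows the trial values quoted above; the state grid extends the quoted grid points to 1.5 so that the states reachable from x(0) = 1.5 are better covered.

import numpy as np

x_grid = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, 1.5])   # quantized state values
U = np.arange(-1.0, 1.01, 0.25)                        # trial controls in steps of 0.25

# Last-stage optimal cost on the grid: J*_12(x) = min over u of (x + u)^2 + 2 u^2
J12 = np.array([min((x + u)**2 + 2.0 * u**2 for u in U) for x in x_grid])

def J12_interp(x):
    # linear interpolation of the tabulated cost-to-go at an off-grid state
    # (np.interp clamps outside the grid; fine here, since such controls are far from optimal)
    return np.interp(x, x_grid, J12)

x0 = 1.5
costs = [2.0 * u**2 + J12_interp(x0 + u) for u in U]   # stage-0 cost using interpolated J*_12
u0 = U[int(np.argmin(costs))]
print('u*(x0 = 1.5, k = 0) =', u0, '  J*_02 ~', min(costs))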
The End. Good luck!