Dynamic Optimal Control
Robert Stengel
Robotics and Intelligent Systems, MAE 345, Princeton University, 2017

Learning Objectives
- Examples of cost functions
- Necessary conditions for optimality
- Calculation of optimal trajectories
- Design of optimal feedback control laws

Copyright 2017 by Robert Stengel. All rights reserved. For educational use only.
http://www.princeton.edu/~stengel/mae345.html

Integrated Effect Can Be a Scalar Cost

Time:

    J = \int_0^{t_f} (1)\, dt

Fuel:

    J = \int_0^{r_f} (\text{fuel use per kilometer})\, dr

Financial cost of time and fuel:

    J = \int_0^{t_f} \left[ \left(\frac{\text{cost}}{\text{hour}}\right) + \left(\frac{\text{cost}}{\text{liter}}\right)\left(\frac{\text{liter}}{\text{kilometer}}\right)\frac{dr}{dt} \right] dt
Cost Accumulates from Start to Finish

    J = \int_0^{t_f} (\text{fuel flow rate})\, dt

Optimal System Regulation

Cost functions that penalize state deviations over a time interval:

Quadratic scalar variation:

    J = \lim_{T \to \infty} \frac{1}{T} \int_0^T x^2(t)\, dt < \infty

Vector variation:

    J = \lim_{T \to \infty} \frac{1}{T} \int_0^T \mathbf{x}^T \mathbf{x}\, dt < \infty

Weighted vector variation:

    J = \lim_{T \to \infty} \frac{1}{T} \int_0^T \mathbf{x}^T \mathbf{Q} \mathbf{x}\, dt < \infty

No penalty for control use. Why not use infinite control?
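A time-averaged quadratic cost like the ones above can be evaluated numerically. The sketch below uses an illustrative decaying trajectory and weighting matrix (neither is from the lecture) and approximates the integral with the trapezoidal rule.

```python
import numpy as np

# A minimal sketch: approximate the time-averaged weighted quadratic cost
#   J = (1/T) * integral_0^T x'(t) Q x(t) dt
# for an illustrative decaying trajectory (not a trajectory from the lecture).

def average_quadratic_cost(x_of_t, Q, T, n_steps=20000):
    """Trapezoidal approximation of (1/T) * int_0^T x' Q x dt."""
    t = np.linspace(0.0, T, n_steps)
    g = np.array([x_of_t(ti) @ Q @ x_of_t(ti) for ti in t])
    integral = np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(t))
    return integral / T

Q = np.diag([1.0, 0.5])                            # state weighting matrix
x = lambda t: np.exp(-t) * np.array([1.0, 2.0])    # decaying state history
J = average_quadratic_cost(x, Q, T=10.0)
print(J)   # finite because the state decays
```

For this example the integrand is 3 e^{-2t}, so the exact average over T = 10 is 1.5(1 − e^{-20})/10 ≈ 0.15, which the numerical value matches.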
Cement Kiln

Pulp and Paper Machines
- Machine length: ~2 football fields
- Paper speed ≈ 2,200 m/min ≈ 80 mph
- Maintain 3-D paper quality
- Avoid paper breaks at all cost!

Paper-Making Machine Operation: https://www.youtube.com/watch?v=6bhexbaak24
Hazardous Waste Generated by Large Industrial Plants
- Cement dust
- Coal fly ash
- Metal emissions
- Dioxin
- Electroscrap and other hazardous waste
- Waste chemicals
- Ground water contamination
- Ancillary mining and logging issues
- Greenhouse gases

Need to optimize the total cost-benefit of production processes (including health, environmental, and regulatory costs)

Tradeoffs Between Performance and Control in an Integrated Cost Function
- Trade performance against control usage
- Minimize a cost function that contains state and control (r: relative importance of the two)
Dynamic Optimization: The Optimal Control Problem

Minimize a scalar function, J, of terminal and integral costs

    \min_{\mathbf{u}(t)} J = \min_{\mathbf{u}(t)} \left\{ \phi[\mathbf{x}(t_f)] + \int_{t_o}^{t_f} L[\mathbf{x}(t), \mathbf{u}(t)]\, dt \right\}

with respect to the control, u(t), in (t_o, t_f), subject to a dynamic constraint

    \frac{d\mathbf{x}(t)}{dt} = \mathbf{f}[\mathbf{x}(t), \mathbf{u}(t)], \quad \mathbf{x}(t_o) \text{ given}

    dim(x) = n x 1; dim(f) = n x 1; dim(u) = m x 1

Example of Dynamic Optimization

[Figure: optimal control profiles of thrust (N) and angle of attack (deg), and the resulting fuel used (kg)]

Any deviation from the optimal thrust and angle-of-attack profiles would increase total fuel used
Components of the Cost Function

Integral cost is a function of the state and control from start to finish, a positive scalar function of two vectors:

    \int_{t_o}^{t_f} L[\mathbf{x}(t), \mathbf{u}(t)]\, dt

L[x(t), u(t)] is the Lagrangian of the cost function.

Terminal cost is a function of the state at the final time, a positive scalar function of a vector:

    \phi[\mathbf{x}(t_f)]

Lagrangian examples:

    L[\mathbf{x}(t), \mathbf{u}(t)] = \frac{dm}{dt} \quad \text{(fuel flow rate)}

    L[\mathbf{x}(t), \mathbf{u}(t)] = \frac{1}{2}\left[ \mathbf{x}^T(t)\mathbf{Q}\mathbf{x}(t) + \mathbf{u}^T(t)\mathbf{R}\mathbf{u}(t) \right]

Terminal cost examples:

    \phi[\mathbf{x}(t_f)] = \mathbf{x}(t_f) - \mathbf{x}_{Goal}

    \phi[\mathbf{x}(t_f)] = \frac{1}{2}\left[ \mathbf{x}(t_f) - \mathbf{x}_{Goal} \right]^2
Example: Dynamic Model of Infection and Immune Response
- x1 = Concentration of a pathogen, which displays antigen
- x2 = Concentration of plasma cells, which are carriers and producers of antibodies
- x3 = Concentration of antibodies, which recognize antigen and kill the pathogen
- x4 = Relative characteristic of a damaged organ [0 = healthy, 1 = dead]

Natural Response to Pathogen Assault (No Therapy)
Cost Function Considers Infection, Organ Health, and Drug Usage

    \min_{u(t)} J = \min_{u(t)} \left\{ \phi[\mathbf{x}(t_f)] + \int_{t_o}^{t_f} L[\mathbf{x}(t), u(t)]\, dt \right\}
                  = \min_{u} \left\{ \frac{1}{2}\left[ s_{11} x_{1_f}^2 + s_{44} x_{4_f}^2 \right] + \frac{1}{2} \int_{t_o}^{t_f} \left( q_{11} x_1^2 + q_{44} x_4^2 + r u^2 \right) dt \right\}

Tradeoffs between final values, integral values over a fixed time interval, state, and control.

Cost function includes weighted square values of:
- Final concentration of the pathogen
- Final health of the damaged organ (0 is good, 1 is bad)
- Integral of pathogen concentration
- Integral health of the damaged organ (0 is good, 1 is bad)
- Integral of drug usage

Drug cost may reflect physiological or financial cost.

Necessary Conditions for Optimal Control
Augment the Cost Function

Must express sensitivity of the cost to the dynamic response. Adjoin the dynamic constraint to the integrand using a Lagrange multiplier, λ(t):
- same dimension as the dynamic constraint, [n x 1]
- constraint = 0 when the dynamic equation is satisfied

    J = \phi[\mathbf{x}(t_f)] + \int_{t_o}^{t_f} \left\{ L[\mathbf{x}(t), \mathbf{u}(t)] + \boldsymbol{\lambda}^T(t) \left( \mathbf{f}[\mathbf{x}(t), \mathbf{u}(t)] - \frac{d\mathbf{x}(t)}{dt} \right) \right\} dt

Optimization goal is to minimize J with respect to u(t) in (t_o, t_f):

    \min_{\mathbf{u}(t)} J = J^* = \phi[\mathbf{x}^*(t_f)] + \int_{t_o}^{t_f} \left\{ L[\mathbf{x}^*(t), \mathbf{u}^*(t)] + \boldsymbol{\lambda}^{*T}(t) \left( \mathbf{f}[\mathbf{x}^*(t), \mathbf{u}^*(t)] - \frac{d\mathbf{x}^*(t)}{dt} \right) \right\} dt

Substitute the Hamiltonian in the Cost Function

Define the Hamiltonian, H[.]:

    H(\mathbf{x}, \mathbf{u}, \boldsymbol{\lambda}) \triangleq L(\mathbf{x}, \mathbf{u}) + \boldsymbol{\lambda}^T \mathbf{f}(\mathbf{x}, \mathbf{u})

Substitute the Hamiltonian in the cost function:

    J = \phi[\mathbf{x}(t_f)] + \int_{t_o}^{t_f} \left\{ H[\mathbf{x}(t), \mathbf{u}(t), \boldsymbol{\lambda}(t)] - \boldsymbol{\lambda}^T(t) \frac{d\mathbf{x}(t)}{dt} \right\} dt

The optimal cost, J*, is produced by the optimal histories of state, control, and Lagrange multiplier:

    J = J^* = \phi[\mathbf{x}^*(t_f)] + \int_{t_o}^{t_f} \left\{ H[\mathbf{x}^*(t), \mathbf{u}^*(t), \boldsymbol{\lambda}^*(t)] - \boldsymbol{\lambda}^{*T}(t) \frac{d\mathbf{x}^*(t)}{dt} \right\} dt
Integration by Parts

Scalar indefinite integral:

    \int u\, dv = uv - \int v\, du

Vector definite integral, with u = \boldsymbol{\lambda}^T(t) and dv = \dot{\mathbf{x}}(t)\, dt = d\mathbf{x}:

    \int_{t_0}^{t_f} \boldsymbol{\lambda}^T(t) \frac{d\mathbf{x}(t)}{dt}\, dt = \boldsymbol{\lambda}^T(t)\mathbf{x}(t) \Big|_{t_0}^{t_f} - \int_{t_0}^{t_f} \frac{d\boldsymbol{\lambda}^T(t)}{dt}\, \mathbf{x}(t)\, dt

The Optimal Control Solution

Along the optimal trajectory, the cost, J*, should be insensitive to small variations in control policy. To first order,

    \Delta J^* = \left[ \left( \frac{\partial \phi}{\partial \mathbf{x}} - \boldsymbol{\lambda}^T \right) \Delta \mathbf{x} \right]_{t=t_f} + \left[ \boldsymbol{\lambda}^T \Delta \mathbf{x} \right]_{t=t_o} + \int_{t_o}^{t_f} \left[ \left( \frac{\partial H}{\partial \mathbf{x}} + \frac{d\boldsymbol{\lambda}^T}{dt} \right) \Delta \mathbf{x} + \frac{\partial H}{\partial \mathbf{u}} \Delta \mathbf{u} \right] dt = 0

Δx(Δu) is the arbitrary perturbation in the state due to a perturbation in the control over the time interval (t_0, t_f). Setting ΔJ* = 0 leads to three necessary conditions for optimality.
Three Conditions for Optimality

Individual terms should remain zero for arbitrary variations in Δx(t) and Δu(t):

1) Solution for the Lagrange multiplier at the final time:

    \left[ \frac{\partial \phi}{\partial \mathbf{x}} - \boldsymbol{\lambda}^T \right]_{t=t_f} = 0 \quad \Rightarrow \quad \boldsymbol{\lambda}^*(t_f)

2) Solution for the Lagrange multiplier in (t_o, t_f):

    \frac{\partial H}{\partial \mathbf{x}} + \frac{d\boldsymbol{\lambda}^T(t)}{dt} = 0 \quad \Rightarrow \quad \boldsymbol{\lambda}^*(t) \text{ in } (t_o, t_f)

3) Insensitivity to control variation:

    \frac{\partial H}{\partial \mathbf{u}} = 0 \quad \Rightarrow \quad \mathbf{u}^*(t) \text{ in } (t_o, t_f)

Iterative Numerical Optimization Using Steepest Descent
- Forward solution to find the state, x(t)
- Backward solution to find the Lagrange multiplier, λ(t)
- Steepest-descent adjustment of the control history, u(t)
- Iterate to find the optimal solution

    \dot{\mathbf{x}}_k(t) = \mathbf{f}[\mathbf{x}_k(t), \mathbf{u}_{k-1}(t)], \quad \mathbf{x}(t_o) \text{ given}

u_{k-1}(t) prescribed in (t_o, t_f); k = iteration index. Use an educated guess for u(t) on the first iteration.
Numerical Optimization Using Steepest Descent

Backward solution to find the Lagrange multiplier, λ(t). The boundary condition at the final time is calculated from the terminal value of the state:

    \boldsymbol{\lambda}_k(t_f) = \left[ \frac{\partial \phi[\mathbf{x}_k(t_f)]}{\partial \mathbf{x}} \right]^T

    \frac{d\boldsymbol{\lambda}_k(t)}{dt} = -\left[ \frac{\partial H(\mathbf{x}_k, \mathbf{u}_k, \boldsymbol{\lambda}_k)}{\partial \mathbf{x}} \right]^T = -\left[ \frac{\partial L(\mathbf{x}, \mathbf{u})}{\partial \mathbf{x}} + \boldsymbol{\lambda}_k^T(t) \frac{\partial \mathbf{f}(\mathbf{x}, \mathbf{u})}{\partial \mathbf{x}} \right]^T_{\mathbf{x}(t)=\mathbf{x}_k(t),\, \mathbf{u}(t)=\mathbf{u}_{k-1}(t)}

Steepest-descent adjustment of the control history:

    \mathbf{u}_k(t) = \mathbf{u}_{k-1}(t) - \varepsilon \left[ \frac{\partial H}{\partial \mathbf{u}} \right]^T_k = \mathbf{u}_{k-1}(t) - \varepsilon \left[ \frac{\partial L}{\partial \mathbf{u}} + \boldsymbol{\lambda}_k^T(t) \frac{\partial \mathbf{f}}{\partial \mathbf{u}} \right]^T_{\mathbf{x}(t)=\mathbf{x}_k(t),\, \mathbf{u}(t)=\mathbf{u}_{k-1}(t)}

ε: steepest-descent gain. Iterate to the optimal solution.
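The forward-backward-update loop above can be sketched on a simple scalar problem. The plant ẋ = −x + u and all weights below are illustrative, not the lecture's model; an Euler scheme stands in for a proper ODE integrator.

```python
import numpy as np

# A minimal sketch of the steepest-descent iteration on an illustrative
# scalar problem (not the lecture's model):
#   xdot = -x + u,   J = 0.5*s*x(tf)^2 + 0.5 * int_0^tf (q*x^2 + r*u^2) dt
# Forward pass integrates the state, backward pass integrates the
# Lagrange multiplier, and the control moves along -dH/du.

def steepest_descent(x0=1.0, s=1.0, q=1.0, r=1.0, tf=2.0,
                     n=400, eps=0.2, iters=300):
    dt = tf / n
    u = np.zeros(n)                       # educated guess: zero control
    for _ in range(iters):
        # forward solution for the state (Euler integration)
        x = np.empty(n + 1)
        x[0] = x0
        for k in range(n):
            x[k + 1] = x[k] + dt * (-x[k] + u[k])
        # backward solution for the multiplier:
        #   lambda_dot = -dH/dx = -q*x + lambda,  lambda(tf) = s*x(tf)
        lam = np.empty(n + 1)
        lam[-1] = s * x[-1]
        for k in range(n - 1, -1, -1):
            lam[k] = lam[k + 1] - dt * (-q * x[k + 1] + lam[k + 1])
        # steepest-descent control adjustment: dH/du = r*u + lambda
        u = u - eps * (r * u + lam[1:])
    J = 0.5 * s * x[-1]**2 + 0.5 * dt * np.sum(q * x[:-1]**2 + r * u**2)
    return x, u, J
```

Setting eps = 0 leaves the guessed control unchanged, so comparing the returned costs for eps = 0 and eps > 0 shows the descent at work.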
Optimal Treatment of an Infection

Dynamic Model for the Infection Treatment Problem

    \frac{d\mathbf{x}(t)}{dt} = \mathbf{f}[\mathbf{x}(t), \mathbf{u}(t)], \quad \mathbf{x}(t_o) \text{ given}

Nonlinear dynamics of innate immune response and drug effect:

    \dot{x}_1 = (a_{11} - a_{12} x_3) x_1 + b_1 u_1
    \dot{x}_2 = a_{21}(x_4)\, a_{22} x_1 x_3 - a_{23}(x_2 - x_2^*) + b_2 u_2
    \dot{x}_3 = a_{31} x_2 - (a_{32} + a_{33} x_1) x_3 + b_3 u_3
    \dot{x}_4 = a_{41} x_1 - a_{42} x_4 + b_4 u_4
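The open-loop (no-therapy) response of this model can be simulated directly. In the sketch below, every numerical parameter value and the assumed form of a21(x4) are illustrative placeholders, not the lecture's values; the initial condition places the immune system at its no-infection equilibrium and introduces a moderate pathogen dose.

```python
import numpy as np
from scipy.integrate import solve_ivp

# A minimal sketch: simulate the four-state infection model with no therapy
# (u = 0). All parameter values and the form of a21(x4) are illustrative
# placeholders, NOT values from the lecture.

p = dict(a11=1.0, a12=1.0, a22=3.0, a23=1.0, x2_star=2.0,
         a31=1.0, a32=1.5, a33=0.5, a41=0.5, a42=1.0)
b = np.ones(4)                     # drug-effect coefficients (assumed)

def a21(x4):
    # assumed form: plasma-cell response degrades as organ damage x4 grows
    return max(np.cos(np.pi * x4), 0.0)

def infection_dynamics(t, x, u):
    x1, x2, x3, x4 = x
    return [
        (p['a11'] - p['a12'] * x3) * x1 + b[0] * u[0],
        a21(x4) * p['a22'] * x1 * x3 - p['a23'] * (x2 - p['x2_star']) + b[1] * u[1],
        p['a31'] * x2 - (p['a32'] + p['a33'] * x1) * x3 + b[2] * u[2],
        p['a41'] * x1 - p['a42'] * x4 + b[3] * u[3],
    ]

# initial condition: moderate pathogen dose, immune system at its
# no-infection equilibrium (x2 = x2*, x3 = a31*x2*/a32, x4 = 0)
x0 = [1.0, p['x2_star'], p['a31'] * p['x2_star'] / p['a32'], 0.0]
sol = solve_ivp(infection_dynamics, (0.0, 10.0), x0,
                args=(np.zeros(4),), rtol=1e-8, atol=1e-10)
print(sol.y[:, -1])
```

For these placeholder parameters the antibody level starts above the pathogen's growth threshold, so the pathogen concentration decays without therapy; other parameter choices can produce lethal trajectories, which is what motivates the optimal-treatment problem.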
Optimal Treatment with Four Drugs (separately), Unit Cost Weights

    J = \frac{1}{2}\left[ s_{11} x_{1_f}^2 + s_{44} x_{4_f}^2 \right] + \frac{1}{2} \int_{t_o}^{t_f} \left( q_{11} x_1^2 + q_{44} x_4^2 + r u^2 \right) dt

Increased Cost of Drug Use: Control Cost Weight = 100
Accounting for Uncertainty in Initial Condition

Account for Uncertainty in Initial Condition and Unknown Disturbances

Nominal, open-loop optimal control:

    \mathbf{u}^*(t) = \mathbf{u}_{opt}(t)

Neighboring-optimal (feedback) control:

    \mathbf{u}(t) = \mathbf{u}^*(t) + \Delta\mathbf{u}(t)
    \Delta\mathbf{u}(t) = -\mathbf{C}(t)\left[ \mathbf{x}(t) - \mathbf{x}^*(t) \right]
Neighboring-Optimal Control

Linearize the dynamic equation:

    \dot{\mathbf{x}}(t) = \dot{\mathbf{x}}^*(t) + \Delta\dot{\mathbf{x}}(t) = \mathbf{f}\{[\mathbf{x}^*(t) + \Delta\mathbf{x}(t)], [\mathbf{u}^*(t) + \Delta\mathbf{u}(t)]\}
                        \approx \mathbf{f}[\mathbf{x}^*(t), \mathbf{u}^*(t)] + \mathbf{F}(t)\Delta\mathbf{x}(t) + \mathbf{G}(t)\Delta\mathbf{u}(t)

Sum the nominal optimal control history and the optimal perturbation control for neighboring-optimal control:

    \mathbf{u}^*(t) = \mathbf{u}_{opt}(t)
    \Delta\mathbf{u}(t) = -\mathbf{C}(t)\left[ \mathbf{x}(t) - \mathbf{x}_{opt}(t) \right]
    \mathbf{u}(t) = \mathbf{u}_{opt}(t) + \Delta\mathbf{u}(t)

Optimal Feedback Gain, C(t)

Solution of the Euler-Lagrange equations for a linear dynamic system and quadratic cost function leads to a linear, time-varying (LTV) optimal feedback controller:

    \Delta\mathbf{u}^*(t) = -\mathbf{C}^*(t)\Delta\mathbf{x}(t), \quad \text{where } \mathbf{C}^*(t) = \mathbf{R}^{-1}\mathbf{G}^T(t)\mathbf{S}(t)

Matrix Riccati equation (see Supplemental Material for derivation):

    \dot{\mathbf{S}}(t) = -\mathbf{F}^T(t)\mathbf{S}(t) - \mathbf{S}(t)\mathbf{F}(t) + \mathbf{S}(t)\mathbf{G}(t)\mathbf{R}^{-1}\mathbf{G}^T(t)\mathbf{S}(t) - \mathbf{Q}, \quad \mathbf{S}(t_f) = \mathbf{S}_f
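The Riccati equation above runs backward from the terminal condition. The sketch below integrates it for illustrative constant F, G, Q, R, and S_f (not the lecture's system) and forms the time-varying gain.

```python
import numpy as np
from scipy.integrate import solve_ivp

# A minimal sketch: integrate the matrix Riccati equation backward from
# S(tf) = Sf, then form the time-varying gain C*(t) = R^-1 G' S(t).
# F, G, Q, R, and Sf are illustrative constants, not the lecture's system.

F = np.array([[0.0, 1.0], [-1.0, -0.5]])
G = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
Rinv = np.linalg.inv(R)
Sf = np.eye(2)
tf = 5.0

def riccati_rhs(t, s_flat):
    S = s_flat.reshape(2, 2)
    dS = -F.T @ S - S @ F + S @ G @ Rinv @ G.T @ S - Q
    return dS.ravel()

# solve_ivp integrates backward when t_span runs from tf down to 0
riccati = solve_ivp(riccati_rhs, (tf, 0.0), Sf.ravel(),
                    dense_output=True, rtol=1e-9, atol=1e-11)

def gain(t):
    """Time-varying LQ gain C*(t) = R^-1 G' S(t)."""
    S = riccati.sol(t).reshape(2, 2)
    return Rinv @ G.T @ S

S0 = riccati.sol(0.0).reshape(2, 2)
print(gain(0.0))   # 1 x 2 gain row for this single-input example
```

Because the Riccati equation is symmetric, S(t) stays symmetric and positive semi-definite as it propagates backward; with Q > 0 it is positive definite here.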
Increased Initial Infection and Scalar Neighboring-Optimal Control (u1)

    \mathbf{u}(t) = \mathbf{u}^*(t) - \mathbf{C}(t)\Delta\mathbf{x}(t) = \mathbf{u}^*(t) - \mathbf{C}(t)\left[ \mathbf{x}(t) - \mathbf{x}^*(t) \right]

Optimal, Constant-Gain Feedback Control for Linear, Time-Invariant Systems
Linear-Quadratic (LQ) Optimal Control Law

    \Delta\dot{\mathbf{x}}(t) = \mathbf{F}(t)\Delta\mathbf{x}(t) + \mathbf{G}(t)\Delta\mathbf{u}(t)
    \Delta\dot{\mathbf{x}}(t) = \mathbf{F}(t)\Delta\mathbf{x}(t) - \mathbf{G}(t)\mathbf{C}^*(t)\Delta\mathbf{x}(t), \quad \Delta\mathbf{u}(t) = -\mathbf{C}^*(t)\Delta\mathbf{x}(t)

Optimal Control for a Linear, Time-Invariant Dynamic Process

Original system is linear and time-invariant (LTI):

    \Delta\dot{\mathbf{x}}(t) = \mathbf{F}\Delta\mathbf{x}(t) + \mathbf{G}\Delta\mathbf{u}(t), \quad \Delta\mathbf{x}(0) \text{ given}

Minimize a quadratic cost function for t_f → ∞; the terminal cost is of no concern:

    \min_{\mathbf{u}} J = J^* = \lim_{t_f \to \infty} \frac{1}{2} \int_0^{t_f} \left[ \mathbf{x}^{*T}(t)\mathbf{Q}\mathbf{x}^*(t) + \mathbf{u}^{*T}(t)\mathbf{R}\mathbf{u}^*(t) \right] dt

The dynamic constraint is the linear, time-invariant (LTI) plant.
Linear-Quadratic (LQ) Optimal Control for an LTI System

As t_f → ∞, Ṡ* → 0, and the steady-state solution of the matrix Riccati equation is given by the algebraic Riccati equation:

    -\mathbf{F}^T\mathbf{S}^* - \mathbf{S}^*\mathbf{F} + \mathbf{S}^*\mathbf{G}\mathbf{R}^{-1}\mathbf{G}^T\mathbf{S}^* - \mathbf{Q} = 0

Optimal control gain matrix:

    \mathbf{C}^* = \mathbf{R}^{-1}\mathbf{G}^T\mathbf{S}^*
    (m \times n) = (m \times m)(m \times n)(n \times n)

Optimal control:

    \Delta\mathbf{u}(t) = -\mathbf{C}^*\Delta\mathbf{x}(t)

MATLAB function: lqr

Example: Open-Loop Stable and Unstable Second-Order System Response to Initial Condition

Stable eigenvalues = -0.5000 ± 3.9686i
Unstable eigenvalues = +0.2500 ± 3.9922i
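The same steady-state design can be carried out with SciPy's continuous-time algebraic Riccati solver, the counterpart of MATLAB's lqr. The plant below is a companion-form reconstruction whose open-loop eigenvalues match the slide's unstable pair, +0.25 ± 3.9922i; the control-effect matrix G and the weights are assumptions, so the resulting gain need not match the slide's numbers.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# A minimal sketch of the steady-state LQ design. F reproduces the
# unstable open-loop eigenvalues +0.25 +/- 3.9922i; G, Q, and R are
# assumed, so the gain need not match the slide's values.

F = np.array([[0.0, 1.0], [-16.0, 0.5]])   # companion form, unstable
G = np.array([[0.0], [1.0]])               # assumed control-effect matrix
Q = np.eye(2)
R = np.array([[1.0]])

S = solve_continuous_are(F, G, Q, R)       # algebraic Riccati solution S*
C = np.linalg.inv(R) @ G.T @ S             # optimal gain C* = R^-1 G' S*
cl_eigs = np.linalg.eigvals(F - G @ C)     # closed-loop eigenvalues
print(C, cl_eigs)
```

Whatever the weights, the closed-loop eigenvalues land in the left half plane, which is the stabilization guarantee discussed below.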
Example: LQ Regulator Stabilizes Unstable System, r = 1 and 100

    \min_{\Delta u} J = \min_{\Delta u} \frac{1}{2} \int_0^{\infty} \left( \Delta x_1^2 + \Delta x_2^2 + r \Delta u^2 \right) dt

    \Delta u(t) = -\begin{bmatrix} c_1 & c_2 \end{bmatrix} \begin{bmatrix} \Delta x_1(t) \\ \Delta x_2(t) \end{bmatrix} = -c_1 \Delta x_1(t) - c_2 \Delta x_2(t)

r = 1:
    Control gain (C*) = [0.2620  1.0857]
    Riccati matrix (S*) = [2.2001  0.0291; 0.0291  0.1206]
    Closed-loop eigenvalues = -6.4061, -2.8656

r = 100:
    Control gain (C*) = [0.0028  0.1726]
    Riccati matrix (S*) = [30.7261  0.0312; 0.0312  1.9183]
    Closed-loop eigenvalues = -0.5269 ± 3.9683j
Requirements for Guaranteeing Stability of the LQ Regulator

    \Delta\dot{\mathbf{x}}(t) = \mathbf{F}\Delta\mathbf{x}(t) + \mathbf{G}\Delta\mathbf{u}(t) = \left[ \mathbf{F} - \mathbf{G}\mathbf{C} \right]\Delta\mathbf{x}(t)

The closed-loop system is stable whether or not the open-loop system is stable if

    \mathbf{Q} > 0, \quad \mathbf{R} > 0

and (F, G) is a controllable pair:

    \text{Rank}\left[ \mathbf{G} \;\; \mathbf{F}\mathbf{G} \;\; \cdots \;\; \mathbf{F}^{n-1}\mathbf{G} \right] = n

Next Time: Formal Logic, Algorithms, and Incompleteness
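The controllability rank test is easy to implement. The sketch below checks it for an illustrative pair (F, G); a zero control-effect matrix fails the test, as expected.

```python
import numpy as np

# A minimal sketch of the controllability test: for an n-state system,
# rank [G  FG ... F^(n-1) G] must equal n. (F, G) below are illustrative.

def controllability_matrix(F, G):
    n = F.shape[0]
    blocks = [G]
    for _ in range(n - 1):
        blocks.append(F @ blocks[-1])
    return np.hstack(blocks)

F = np.array([[0.0, 1.0], [-16.0, 0.5]])
G = np.array([[0.0], [1.0]])
rank = np.linalg.matrix_rank(controllability_matrix(F, G))
print(rank == F.shape[0])   # True: (F, G) is a controllable pair
```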
Supplementary Material

Linearized Model of Infection Dynamics

Locally linearized (time-varying) dynamic equation:

    \begin{bmatrix} \Delta\dot{x}_1 \\ \Delta\dot{x}_2 \\ \Delta\dot{x}_3 \\ \Delta\dot{x}_4 \end{bmatrix} =
    \begin{bmatrix}
    (a_{11} - a_{12}x_3^*) & 0 & -a_{12}x_1^* & 0 \\
    a_{21}(x_4^*)a_{22}x_3^* & -a_{23} & a_{21}(x_4^*)a_{22}x_1^* & \left[ \dfrac{\partial a_{21}}{\partial x_4} \right]^* a_{22}x_1^* x_3^* \\
    -a_{33}x_3^* & a_{31} & -(a_{32} + a_{33}x_1^*) & 0 \\
    a_{41} & 0 & 0 & -a_{42}
    \end{bmatrix}
    \begin{bmatrix} \Delta x_1 \\ \Delta x_2 \\ \Delta x_3 \\ \Delta x_4 \end{bmatrix}
    +
    \begin{bmatrix} b_1 & 0 & 0 & 0 \\ 0 & b_2 & 0 & 0 \\ 0 & 0 & b_3 & 0 \\ 0 & 0 & 0 & b_4 \end{bmatrix}
    \begin{bmatrix} \Delta u_1 \\ \Delta u_2 \\ \Delta u_3 \\ \Delta u_4 \end{bmatrix}
    +
    \begin{bmatrix} \Delta w_1 \\ \Delta w_2 \\ \Delta w_3 \\ \Delta w_4 \end{bmatrix}
Expand the Optimized Cost Function

Expand the optimized cost function to second degree:

    J[\mathbf{x}^*(t_o) + \Delta\mathbf{x}(t_o), \mathbf{x}^*(t_f) + \Delta\mathbf{x}(t_f)]
    \approx J^*[\mathbf{x}^*(t_o), \mathbf{x}^*(t_f)] + \Delta J[\Delta\mathbf{x}(t_o), \Delta\mathbf{x}(t_f)] + \Delta^2 J[\Delta\mathbf{x}(t_o), \Delta\mathbf{x}(t_f)]
    = J^*[\mathbf{x}^*(t_o), \mathbf{x}^*(t_f)] + \Delta^2 J[\Delta\mathbf{x}(t_o), \Delta\mathbf{x}(t_f)]

as the first variation, ΔJ[Δx(t_o), Δx(t_f)] = 0.

Nominal optimized cost, plus nonlinear dynamic constraint:

    J^*[\mathbf{x}^*(t_o), \mathbf{x}^*(t_f)] = \phi[\mathbf{x}^*(t_f)] + \int_{t_o}^{t_f} L[\mathbf{x}^*(t), \mathbf{u}^*(t)]\, dt

subject to the nonlinear dynamic equation

    \dot{\mathbf{x}}^*(t) = \mathbf{f}[\mathbf{x}^*(t), \mathbf{u}^*(t)], \quad \mathbf{x}(t_o) = \mathbf{x}_o

Second Variation of the Cost Function

Objective: minimize the second-variational cost subject to a linear dynamic constraint:

    \min_{\Delta\mathbf{u}} \Delta^2 J = \frac{1}{2}\Delta\mathbf{x}^T(t_f)\phi_{xx}(t_f)\Delta\mathbf{x}(t_f) + \frac{1}{2}\int_{t_o}^{t_f} \begin{bmatrix} \Delta\mathbf{x}^T(t) & \Delta\mathbf{u}^T(t) \end{bmatrix} \begin{bmatrix} \mathbf{L}_{xx}(t) & \mathbf{L}_{xu}(t) \\ \mathbf{L}_{ux}(t) & \mathbf{L}_{uu}(t) \end{bmatrix} \begin{bmatrix} \Delta\mathbf{x}(t) \\ \Delta\mathbf{u}(t) \end{bmatrix} dt

subject to the perturbation dynamics

    \Delta\dot{\mathbf{x}}(t) = \mathbf{F}(t)\Delta\mathbf{x}(t) + \mathbf{G}(t)\Delta\mathbf{u}(t), \quad \Delta\mathbf{x}(t_o) = \Delta\mathbf{x}_o

Cost weighting matrices:

    \mathbf{S}(t_f) \triangleq \phi_{xx}(t_f) = \frac{\partial^2 \phi}{\partial \mathbf{x}^2}(t_f), \quad \begin{bmatrix} \mathbf{Q}(t) & \mathbf{M}(t) \\ \mathbf{M}^T(t) & \mathbf{R}(t) \end{bmatrix} \triangleq \begin{bmatrix} \mathbf{L}_{xx}(t) & \mathbf{L}_{xu}(t) \\ \mathbf{L}_{ux}(t) & \mathbf{L}_{uu}(t) \end{bmatrix}

    dim[S(t_f)] = dim[Q(t)] = n x n; dim[M(t)] = n x m; dim[R(t)] = m x m
Second Variational Hamiltonian

Variational cost function:

    \Delta^2 J = \frac{1}{2}\Delta\mathbf{x}^T(t_f)\mathbf{S}(t_f)\Delta\mathbf{x}(t_f) + \frac{1}{2}\int_{t_o}^{t_f} \left[ \Delta\mathbf{x}^T(t)\mathbf{Q}(t)\Delta\mathbf{x}(t) + 2\Delta\mathbf{x}^T(t)\mathbf{M}(t)\Delta\mathbf{u}(t) + \Delta\mathbf{u}^T(t)\mathbf{R}(t)\Delta\mathbf{u}(t) \right] dt

Variational Lagrangian plus adjoined dynamic constraint:

    H[\Delta\mathbf{x}(t), \Delta\mathbf{u}(t), \Delta\boldsymbol{\lambda}(t)] = L[\Delta\mathbf{x}(t), \Delta\mathbf{u}(t)] + \Delta\boldsymbol{\lambda}^T(t)\mathbf{f}[\Delta\mathbf{x}(t), \Delta\mathbf{u}(t)]
    = \frac{1}{2}\left[ \Delta\mathbf{x}^T\mathbf{Q}\Delta\mathbf{x} + 2\Delta\mathbf{x}^T\mathbf{M}\Delta\mathbf{u} + \Delta\mathbf{u}^T\mathbf{R}\Delta\mathbf{u} \right] + \Delta\boldsymbol{\lambda}^T(t)\left[ \mathbf{F}(t)\Delta\mathbf{x}(t) + \mathbf{G}(t)\Delta\mathbf{u}(t) \right]

Second Variational Euler-Lagrange Equations

Terminal condition, solution for the adjoint vector, and optimality condition:

    \Delta\boldsymbol{\lambda}(t_f) = \phi_{xx}(t_f)\Delta\mathbf{x}(t_f) = \mathbf{S}(t_f)\Delta\mathbf{x}(t_f)

    \Delta\dot{\boldsymbol{\lambda}}(t) = -\left[ \frac{\partial H}{\partial \Delta\mathbf{x}} \right]^T = -\mathbf{Q}(t)\Delta\mathbf{x}(t) - \mathbf{M}(t)\Delta\mathbf{u}(t) - \mathbf{F}^T(t)\Delta\boldsymbol{\lambda}(t)

    \left[ \frac{\partial H}{\partial \Delta\mathbf{u}} \right]^T = \mathbf{M}^T(t)\Delta\mathbf{x}(t) + \mathbf{R}(t)\Delta\mathbf{u}(t) + \mathbf{G}^T(t)\Delta\boldsymbol{\lambda}(t) = 0
Use the Control Law to Solve the Two-Point Boundary-Value Problem

From H_u = 0:

    \Delta\mathbf{u}(t) = -\mathbf{R}^{-1}(t)\left[ \mathbf{M}^T(t)\Delta\mathbf{x}(t) + \mathbf{G}^T(t)\Delta\boldsymbol{\lambda}(t) \right]

Substitute for the control in the system and adjoint equations to obtain the two-point boundary-value problem:

    \begin{bmatrix} \Delta\dot{\mathbf{x}}(t) \\ \Delta\dot{\boldsymbol{\lambda}}(t) \end{bmatrix} =
    \begin{bmatrix} \mathbf{F}(t) - \mathbf{G}(t)\mathbf{R}^{-1}(t)\mathbf{M}^T(t) & -\mathbf{G}(t)\mathbf{R}^{-1}(t)\mathbf{G}^T(t) \\ -\mathbf{Q}(t) + \mathbf{M}(t)\mathbf{R}^{-1}(t)\mathbf{M}^T(t) & -\left[ \mathbf{F}(t) - \mathbf{G}(t)\mathbf{R}^{-1}(t)\mathbf{M}^T(t) \right]^T \end{bmatrix}
    \begin{bmatrix} \Delta\mathbf{x}(t) \\ \Delta\boldsymbol{\lambda}(t) \end{bmatrix}

Boundary conditions at the initial and final times:

    \Delta\mathbf{x}(t_o) = \Delta\mathbf{x}_o \quad \text{(perturbation state vector)}
    \Delta\boldsymbol{\lambda}(t_f) = \mathbf{S}_f \Delta\mathbf{x}_f \quad \text{(perturbation adjoint vector)}

Suppose that the terminal adjoint relationship applies over the entire interval:

    \Delta\boldsymbol{\lambda}(t) = \mathbf{S}(t)\Delta\mathbf{x}(t)

The feedback control law becomes

    \Delta\mathbf{u}(t) = -\mathbf{R}^{-1}(t)\left[ \mathbf{M}^T(t)\Delta\mathbf{x}(t) + \mathbf{G}^T(t)\mathbf{S}(t)\Delta\mathbf{x}(t) \right]
                        = -\mathbf{R}^{-1}(t)\left[ \mathbf{M}^T(t) + \mathbf{G}^T(t)\mathbf{S}(t) \right]\Delta\mathbf{x}(t)
                        \triangleq -\mathbf{C}(t)\Delta\mathbf{x}(t), \quad dim(\mathbf{C}) = m \times n
Linear-Quadratic (LQ) Optimal Control Gain Matrix

    \Delta\mathbf{u}(t) = -\mathbf{C}(t)\Delta\mathbf{x}(t)

Optimal feedback gain matrix:

    \mathbf{C}(t) = \mathbf{R}^{-1}(t)\left[ \mathbf{G}^T(t)\mathbf{S}(t) + \mathbf{M}^T(t) \right]

Properties of the feedback gain matrix:
- Full state feedback (m x n)
- Time-varying matrix
- The control weighting matrix, R, the control effect matrix, G, and the state-control weighting matrix, M, are given
- S(t) remains to be determined

Solution for the Adjoining Matrix, S(t)

Time derivative of the adjoint vector:

    \Delta\dot{\boldsymbol{\lambda}}(t) = \dot{\mathbf{S}}(t)\Delta\mathbf{x}(t) + \mathbf{S}(t)\Delta\dot{\mathbf{x}}(t)

Rearrange:

    \dot{\mathbf{S}}(t)\Delta\mathbf{x}(t) = \Delta\dot{\boldsymbol{\lambda}}(t) - \mathbf{S}(t)\Delta\dot{\mathbf{x}}(t)

Recall the two-point boundary-value problem:

    \begin{bmatrix} \Delta\dot{\mathbf{x}}(t) \\ \Delta\dot{\boldsymbol{\lambda}}(t) \end{bmatrix} =
    \begin{bmatrix} \mathbf{F}(t) - \mathbf{G}(t)\mathbf{R}^{-1}(t)\mathbf{M}^T(t) & -\mathbf{G}(t)\mathbf{R}^{-1}(t)\mathbf{G}^T(t) \\ -\mathbf{Q}(t) + \mathbf{M}(t)\mathbf{R}^{-1}(t)\mathbf{M}^T(t) & -\left[ \mathbf{F}(t) - \mathbf{G}(t)\mathbf{R}^{-1}(t)\mathbf{M}^T(t) \right]^T \end{bmatrix}
    \begin{bmatrix} \Delta\mathbf{x}(t) \\ \Delta\boldsymbol{\lambda}(t) \end{bmatrix}
Solution for the Adjoining Matrix, S(t)

Substitute the adjoint and state equations:

    \dot{\mathbf{S}}(t)\Delta\mathbf{x}(t) = \left[ -\mathbf{Q}(t) + \mathbf{M}(t)\mathbf{R}^{-1}(t)\mathbf{M}^T(t) \right]\Delta\mathbf{x}(t) - \left[ \mathbf{F}(t) - \mathbf{G}(t)\mathbf{R}^{-1}(t)\mathbf{M}^T(t) \right]^T \Delta\boldsymbol{\lambda}(t)
        - \mathbf{S}(t)\left\{ \left[ \mathbf{F}(t) - \mathbf{G}(t)\mathbf{R}^{-1}(t)\mathbf{M}^T(t) \right]\Delta\mathbf{x}(t) - \mathbf{G}(t)\mathbf{R}^{-1}(t)\mathbf{G}^T(t)\Delta\boldsymbol{\lambda}(t) \right\}

Substitute Δλ(t) = S(t)Δx(t):

    \dot{\mathbf{S}}(t)\Delta\mathbf{x}(t) = \left[ -\mathbf{Q}(t) + \mathbf{M}(t)\mathbf{R}^{-1}(t)\mathbf{M}^T(t) \right]\Delta\mathbf{x}(t) - \left[ \mathbf{F}(t) - \mathbf{G}(t)\mathbf{R}^{-1}(t)\mathbf{M}^T(t) \right]^T \mathbf{S}(t)\Delta\mathbf{x}(t)
        - \mathbf{S}(t)\left\{ \left[ \mathbf{F}(t) - \mathbf{G}(t)\mathbf{R}^{-1}(t)\mathbf{M}^T(t) \right]\Delta\mathbf{x}(t) - \mathbf{G}(t)\mathbf{R}^{-1}(t)\mathbf{G}^T(t)\mathbf{S}(t)\Delta\mathbf{x}(t) \right\}

Δx(t) can be eliminated.

Matrix Riccati Equation for S(t)

The result is a nonlinear ordinary differential equation for S(t), with a terminal boundary condition:

    \dot{\mathbf{S}}(t) = -\left[ \mathbf{Q}(t) - \mathbf{M}(t)\mathbf{R}^{-1}(t)\mathbf{M}^T(t) \right] - \left[ \mathbf{F}(t) - \mathbf{G}(t)\mathbf{R}^{-1}(t)\mathbf{M}^T(t) \right]^T \mathbf{S}(t) - \mathbf{S}(t)\left[ \mathbf{F}(t) - \mathbf{G}(t)\mathbf{R}^{-1}(t)\mathbf{M}^T(t) \right] + \mathbf{S}(t)\mathbf{G}(t)\mathbf{R}^{-1}(t)\mathbf{G}^T(t)\mathbf{S}(t)

    \mathbf{S}(t_f) = \phi_{xx}(t_f)

Characteristics of the Riccati matrix, S(t):
- S(t_f) is symmetric, n x n, and typically positive semi-definite
- The matrix Riccati equation is symmetric
- Therefore, S(t) is symmetric and positive semi-definite throughout

Once S(t) has been determined, the optimal feedback control gain matrix, C(t), can be calculated.
Neighboring-Optimal (LQ) Feedback Control Law

Full state is fed back to all available controls:

    \Delta\mathbf{u}(t) = -\mathbf{R}^{-1}(t)\left[ \mathbf{M}^T(t) + \mathbf{G}^T(t)\mathbf{S}(t) \right]\Delta\mathbf{x}(t) = -\mathbf{C}(t)\Delta\mathbf{x}(t)

Optimal control history plus feedback correction:

    \mathbf{u}(t) = \mathbf{u}^*(t) - \mathbf{C}(t)\Delta\mathbf{x}(t) = \mathbf{u}^*(t) - \mathbf{C}(t)\left[ \mathbf{x}(t) - \mathbf{x}^*(t) \right]

Nonlinear System with Neighboring-Optimal Feedback Control

Nonlinear dynamic system:

    \dot{\mathbf{x}}(t) = \mathbf{f}[\mathbf{x}(t), \mathbf{u}(t)]

Neighboring-optimal control law:

    \mathbf{u}(t) = \mathbf{u}^*(t) - \mathbf{C}(t)\left[ \mathbf{x}(t) - \mathbf{x}^*(t) \right]

Nonlinear dynamic system with neighboring-optimal feedback control:

    \dot{\mathbf{x}}(t) = \mathbf{f}\left\{ \mathbf{x}(t), \left[ \mathbf{u}^*(t) - \mathbf{C}(t)\left( \mathbf{x}(t) - \mathbf{x}^*(t) \right) \right] \right\}
Example: Response of Stable Second-Order System to Random Disturbance

Eigenvalues = -1.1459, -7.8541

Example: Disturbance Response of Unstable System with LQ Regulators, r = 1 and 100
Equilibrium Response to a Command Input

Steady-State Response to Commands

    \Delta\dot{\mathbf{x}}(t) = \mathbf{F}\Delta\mathbf{x}(t) + \mathbf{G}\Delta\mathbf{u}(t) + \mathbf{L}\Delta\mathbf{w}(t), \quad \Delta\mathbf{x}(t_o) \text{ given}
    \Delta\mathbf{y}(t) = \mathbf{H}_x\Delta\mathbf{x}(t) + \mathbf{H}_u\Delta\mathbf{u}(t) + \mathbf{H}_w\Delta\mathbf{w}(t)

State equilibrium with constant inputs:

    0 = \mathbf{F}\Delta\mathbf{x}^* + \mathbf{G}\Delta\mathbf{u}^* + \mathbf{L}\Delta\mathbf{w}^* \quad \Rightarrow \quad \Delta\mathbf{x}^* = -\mathbf{F}^{-1}\left( \mathbf{G}\Delta\mathbf{u}^* + \mathbf{L}\Delta\mathbf{w}^* \right)

constrained by the requirement to satisfy the command input:

    \Delta\mathbf{y}^* = \mathbf{H}_x\Delta\mathbf{x}^* + \mathbf{H}_u\Delta\mathbf{u}^* + \mathbf{H}_w\Delta\mathbf{w}^*
Steady-State Response to Commands

Equilibrium that satisfies a commanded input, y_C:

    0 = \mathbf{F}\Delta\mathbf{x}^* + \mathbf{G}\Delta\mathbf{u}^* + \mathbf{L}\Delta\mathbf{w}^*
    \Delta\mathbf{y}^* = \mathbf{H}_x\Delta\mathbf{x}^* + \mathbf{H}_u\Delta\mathbf{u}^* + \mathbf{H}_w\Delta\mathbf{w}^*

Combine the equations:

    \begin{bmatrix} 0 \\ \Delta\mathbf{y}_C \end{bmatrix} = \begin{bmatrix} \mathbf{F} & \mathbf{G} \\ \mathbf{H}_x & \mathbf{H}_u \end{bmatrix} \begin{bmatrix} \Delta\mathbf{x}^* \\ \Delta\mathbf{u}^* \end{bmatrix} + \begin{bmatrix} \mathbf{L} \\ \mathbf{H}_w \end{bmatrix} \Delta\mathbf{w}^*

The block matrix is (n + r) x (n + m).

Equilibrium Values of State and Control to Satisfy a Commanded Input

    \begin{bmatrix} \Delta\mathbf{x}^* \\ \Delta\mathbf{u}^* \end{bmatrix} = \begin{bmatrix} \mathbf{F} & \mathbf{G} \\ \mathbf{H}_x & \mathbf{H}_u \end{bmatrix}^{-1} \begin{bmatrix} -\mathbf{L}\Delta\mathbf{w}^* \\ \Delta\mathbf{y}_C - \mathbf{H}_w\Delta\mathbf{w}^* \end{bmatrix} \triangleq \mathbf{A}^{-1} \begin{bmatrix} -\mathbf{L}\Delta\mathbf{w}^* \\ \Delta\mathbf{y}_C - \mathbf{H}_w\Delta\mathbf{w}^* \end{bmatrix}

A must be square for the inverse to exist; then the number of commands equals the number of controls.
Inverse of the Matrix

    \mathbf{A}^{-1} = \mathbf{B} = \begin{bmatrix} \mathbf{B}_{11} & \mathbf{B}_{12} \\ \mathbf{B}_{21} & \mathbf{B}_{22} \end{bmatrix}

The B_ij have the same dimensions as the equivalent blocks of A. Equilibrium that satisfies the commanded input, y_C:

    \begin{bmatrix} \Delta\mathbf{x}^* \\ \Delta\mathbf{u}^* \end{bmatrix} = \begin{bmatrix} \mathbf{B}_{11} & \mathbf{B}_{12} \\ \mathbf{B}_{21} & \mathbf{B}_{22} \end{bmatrix} \begin{bmatrix} -\mathbf{L}\Delta\mathbf{w}^* \\ \Delta\mathbf{y}_C - \mathbf{H}_w\Delta\mathbf{w}^* \end{bmatrix}

    \Delta\mathbf{x}^* = -\mathbf{B}_{11}\mathbf{L}\Delta\mathbf{w}^* + \mathbf{B}_{12}\left( \Delta\mathbf{y}_C - \mathbf{H}_w\Delta\mathbf{w}^* \right)
    \Delta\mathbf{u}^* = -\mathbf{B}_{21}\mathbf{L}\Delta\mathbf{w}^* + \mathbf{B}_{22}\left( \Delta\mathbf{y}_C - \mathbf{H}_w\Delta\mathbf{w}^* \right)

Elements of the Matrix Inverse and Solutions for Open-Loop Equilibrium

Substitution and elimination (see Supplement) give

    \begin{bmatrix} \mathbf{B}_{11} & \mathbf{B}_{12} \\ \mathbf{B}_{21} & \mathbf{B}_{22} \end{bmatrix} = \begin{bmatrix} \mathbf{F}^{-1}\left( \mathbf{I}_n - \mathbf{G}\mathbf{B}_{21} \right) & -\mathbf{F}^{-1}\mathbf{G}\mathbf{B}_{22} \\ -\mathbf{B}_{22}\mathbf{H}_x\mathbf{F}^{-1} & \left( \mathbf{H}_u - \mathbf{H}_x\mathbf{F}^{-1}\mathbf{G} \right)^{-1} \end{bmatrix}

Solve for B_22, then B_12 and B_21, then B_11.

    \Delta\mathbf{x}^* = \mathbf{B}_{12}\Delta\mathbf{y}_C - \left( \mathbf{B}_{11}\mathbf{L} + \mathbf{B}_{12}\mathbf{H}_w \right)\Delta\mathbf{w}^*
    \Delta\mathbf{u}^* = \mathbf{B}_{22}\Delta\mathbf{y}_C - \left( \mathbf{B}_{21}\mathbf{L} + \mathbf{B}_{22}\mathbf{H}_w \right)\Delta\mathbf{w}^*
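The combined equilibrium equation can be solved numerically instead of block by block. The sketch below uses illustrative matrices (not from the lecture), one command and one control, and no disturbance.

```python
import numpy as np

# A minimal sketch: solve the combined equilibrium equation
#   [0; yC] = A [x*; u*] + [L; Hw] w*,   A = [[F, G], [Hx, Hu]]
# for illustrative matrices (not from the lecture), with no disturbance.

F = np.array([[0.0, 1.0], [-4.0, -1.0]])
G = np.array([[0.0], [1.0]])
Hx = np.array([[1.0, 0.0]])        # command the first state element
Hu = np.zeros((1, 1))
L = np.zeros((2, 1))
Hw = np.zeros((1, 1))
dw = np.zeros((1, 1))              # no disturbance in this example

A = np.block([[F, G], [Hx, Hu]])   # square: one command, one control
yC = np.array([[2.0]])             # commanded output value

rhs = np.vstack([-L @ dw, yC - Hw @ dw])
sol = np.linalg.solve(A, rhs)
dx_star, du_star = sol[:2], sol[2:]
print(dx_star.ravel(), du_star.ravel())
```

Here the first row of F forces the second state element to zero at equilibrium, the second row then gives the constant control Δu* = 8, and H_x Δx* reproduces the command.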
LQ Regulator with Command Input (Proportional Control Law)

    \Delta\mathbf{u}(t) = \Delta\mathbf{u}_C(t) - \mathbf{C}\Delta\mathbf{x}(t)

How do we define Δu_C(t)?

Non-Zero Steady-State Regulation with LQ Regulator

The command input provides equivalent state and control values for the LQ regulator. Control law with command input:

    \Delta\mathbf{u}(t) = \Delta\mathbf{u}^*(t) - \mathbf{C}\left[ \Delta\mathbf{x}(t) - \Delta\mathbf{x}^*(t) \right]
                        = \mathbf{B}_{22}\Delta\mathbf{y}^* - \mathbf{C}\left[ \Delta\mathbf{x}(t) - \mathbf{B}_{12}\Delta\mathbf{y}^* \right]
                        = \left( \mathbf{B}_{22} + \mathbf{C}\mathbf{B}_{12} \right)\Delta\mathbf{y}^* - \mathbf{C}\Delta\mathbf{x}(t)
LQ Regulator with Forward Gain Matrix

    \Delta\mathbf{u}(t) = \Delta\mathbf{u}^*(t) - \mathbf{C}\left[ \Delta\mathbf{x}(t) - \Delta\mathbf{x}^*(t) \right] = \mathbf{C}_F\Delta\mathbf{y}^* - \mathbf{C}_B\Delta\mathbf{x}(t)

    \mathbf{C}_F \triangleq \mathbf{B}_{22} + \mathbf{C}\mathbf{B}_{12}, \quad \mathbf{C}_B \triangleq \mathbf{C}

- A disturbance affects the system whether or not it is measured
- If measured, the disturbance effect can be countered by C_D (analogous to C_F)
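Putting the pieces together, the forward-gain law can be checked end to end: design an LQ gain, form C_F from the block inverse, and verify that the closed loop settles on the commanded output. The plant below is an illustrative placeholder, not the lecture's example.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# A minimal sketch of the proportional LQ law with forward gain,
#   u = CF * y_cmd - CB * x,   CF = B22 + C B12,   CB = C,
# on an illustrative plant (not the lecture's example).

F = np.array([[0.0, 1.0], [-4.0, -1.0]])
G = np.array([[0.0], [1.0]])
Hx = np.array([[1.0, 0.0]])
Hu = np.zeros((1, 1))
Q = np.eye(2)
R = np.array([[1.0]])

# LQ regulator gain C = R^-1 G' S from the algebraic Riccati equation
S = solve_continuous_are(F, G, Q, R)
C = np.linalg.inv(R) @ G.T @ S

# command-equilibrium blocks from B = A^-1
B = np.linalg.inv(np.block([[F, G], [Hx, Hu]]))
B12, B22 = B[:2, 2:], B[2:, 2:]

CF = B22 + C @ B12                 # forward gain
y_cmd = np.array([[2.0]])

# closed loop: xdot = (F - G C) x + G CF y_cmd; solve for the steady state
x_ss = np.linalg.solve(-(F - G @ C), G @ CF @ y_cmd)
print(Hx @ x_ss)                   # output settles on the command
```

Because (Δx*, Δu*) satisfies the equilibrium equation exactly and F − GC is Hurwitz (Q > 0, R > 0, controllable pair), the unique closed-loop equilibrium is Δx*, so the output equals the command with no integral action needed.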