ESC794: Special Topics: Model Predictive Control

Nonlinear MPC Analysis: Part 1

Reference: Nonlinear Model Predictive Control (Ch. 3), Grüne and Pannek

Hanz Richter, Professor
Mechanical Engineering Department, Cleveland State University
Nonlinear MPC for Constant References

Here we consider equilibrium regulation of a nonlinear system under constraints. Consider the open-loop system $x^+ = f(x,u)$ and the MPC feedback law $u = \mu(x)$. The resulting closed-loop system is $x^+ = g(x) = f(x,\mu(x))$, and we assume that $x^*$ is an equilibrium of this system, that is, $g(x^*) = x^*$. Also, assume that there is no running cost associated with equilibrium holding: $\ell(x^*,\mu(x^*)) = 0$. Finally, assume $\ell(x,u) > 0$ for all $(x,u) \neq (x^*,\mu(x^*))$.
Optimal Control Problem (OCP$_N$, 3.1) and NMPC Algorithm

$$\min_{u \in \mathbb{U}^N(x_0)} J_N(x_0,u) = \sum_{k=0}^{N-1} \ell(x_u(k,x_0),u(k))$$

subject to

$$x_u(0,x_0) = x_0, \qquad x_u(k+1,x_0) = f(x_u(k,x_0),u(k))$$

Let $u^*(k)$ be the open-loop solution sequence of OCP$_N$. The MPC feedback law is defined by $\mu_N(x(n)) = u^*(0)$. Note that $n$ is the instant at which OCP$_N$ is solved, using $x_0 = x(n)$ (the feedback). The nominal closed-loop system resulting from this algorithm is

$$x^+ = f(x,\mu_N(x))$$
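The receding-horizon logic above can be sketched numerically. The scalar system, running cost, horizon, and input bound below are illustrative assumptions (not from the slides); the OCP is solved with a generic NLP routine at each step:

```python
import numpy as np
from scipy.optimize import minimize

# Receding-horizon sketch of the NMPC algorithm.  The scalar system,
# running cost, horizon and input bound are illustrative assumptions.
def f(x, u):
    return x + 0.1 * x**2 + u      # open-loop dynamics x+ = f(x,u)

def l(x, u):
    return x**2 + u**2             # running cost

N = 5                              # prediction horizon

def J(useq, x0):
    # J_N(x0,u): accumulate the running cost along the predicted trajectory
    x, cost = x0, 0.0
    for u in useq:
        cost += l(x, u)
        x = f(x, u)
    return cost

def mpc_feedback(x0):
    # solve OCP_N numerically; mu_N(x0) is the first element of u*
    res = minimize(J, np.zeros(N), args=(x0,), bounds=[(-1.0, 1.0)] * N)
    return res.x[0]

# nominal closed loop: x+ = f(x, mu_N(x))
x = 1.0
for n in range(30):
    x = f(x, mpc_feedback(x))
print(abs(x))   # the state is regulated to a small neighborhood of 0
```

Only the first element of each optimal sequence is applied; the OCP is then re-solved from the new measured state.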
Constraints and Related Definitions

Recalling the notational elements introduced in Handout 1, we consider $X$ to be the set of admissible states, decided by us as designers. $U(x)$ is the set of admissible inputs for state $x$, also specified by the designers; the simplest case is when there is no dependence on $x$. Define the set of admissible pairs by

$$Y = \{(x,u) : x \in X \text{ and } u \in U(x)\}$$

Let $N \in \mathbb{N}$ and $x_0 \in X$. A control sequence $u \in U^N$ and the resulting trajectory $x_u(k,x_0)$ are admissible for $x_0$ up to $N$ if

$$(x_u(k,x_0),u(k)) \in Y, \quad k = 0,1,\ldots,N-1 \quad \text{and} \quad x_u(N,x_0) \in X$$

We need this separate condition on the last state because control sequences are one step shorter than the resulting trajectories. The set of all admissible control sequences for $x_0$ up to $N$ is denoted by $\mathbb{U}^N(x_0)$.
Constraints and Related Definitions...

Let $x_0 \in X$ and $u \in U^\infty$. The control sequence and corresponding trajectory $x_u(k,x_0)$ are called admissible for $x_0$ if they are admissible for $x_0$ up to every $N \in \mathbb{N}$. The set of all admissible sequences for $x_0$ is denoted $\mathbb{U}^\infty(x_0)$.

A feedback law $\mu : \mathbb{N}_0 \times X \to U$ is called admissible if $\mu(n,x) \in \mathbb{U}^1(x)$ for all $x \in X$ and all $n \in \mathbb{N}_0$.

Viability: We assume that for every $x \in X$ there is always some $u \in U(x)$ such that $f(x,u) \in X$ (there is always some admissible control to apply that will not result in a state constraint violation at the next step).

Note: normally we require the entire predicted sequence out of OCP$_N$ to be admissible, even though only the first control sequence element will be applied. Not much else we can do!
Viability...

A car (point mass $m$) is situated between two walls separated by a distance $d$. The maximum acceleration and deceleration of the car are captured by the constraint $|u| \leq \bar{U}$. The maximum allowable speed is $V$. Suppose the car obeys the double-integrator law $m\ddot{x} = u$. Sketch the viable subset of $X \subset \mathbb{R}^2$.

Note: G&P and other recent approaches to NMPC analysis make state admissibility a part of input admissibility. When we require $u \in \mathbb{U}^N(x_0)$, we enforce not only input value constraints, but also that the resulting state trajectories remain in $X$ up to time $N$. This shifts the burden of preserving state constraints to the numerical solvers, avoiding theoretical complications. We just assume we have viability; it is up to the solver to find the best viable solution.
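A membership test for the viable set in the car exercise can be sketched with a braking-distance argument: a state is viable only if the car can come to rest before the wall it is moving toward. The numerical values of $m$, $d$, $V$, $\bar{U}$ below are arbitrary illustration values, and the continuous-time stopping-distance condition is an assumption of this sketch:

```python
# Viability check for the car-between-walls example (continuous-time
# idealization; the numbers below are arbitrary illustration values).
m, d, V, Ubar = 1.0, 10.0, 2.0, 1.0

def viable(p, v):
    # state constraints: between the walls (0 <= p <= d), below speed limit
    if not (0.0 <= p <= d and abs(v) <= V):
        return False
    # must be able to brake to rest before hitting the wall ahead:
    # stopping distance under maximum deceleration is m*v**2/(2*Ubar)
    if v > 0:
        return m * v**2 / (2 * Ubar) <= d - p
    if v < 0:
        return m * v**2 / (2 * Ubar) <= p
    return True  # v == 0 is always viable inside the walls

print(viable(5.0, 1.0))   # plenty of room to stop
print(viable(9.9, 2.0))   # too fast, too close to the wall
```

Sweeping this test over a grid of $(p,v)$ pairs produces the parabolic-sided sketch requested in the exercise.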
Admissibility of the NMPC Feedback Law

This theorem shows that NMPC feedback will generate admissible input sequences and corresponding trajectories, provided viability is assumed. (At every point $x$ there is always some control $u$ that may be applied without violating $X$ at the next step; NMPC simply chooses $u = \mu_N(x)$ among them.)

Theorem G&P 3.5: Consider OCP$_N$ 3.1 with constraints $u \in \mathbb{U}^N(x_0)$. Suppose viability holds. Consider the nominal closed-loop system $x^+ = f(x,\mu_N(x))$ with $\mu_N(x(n)) = u^*(0)$, and suppose $x_0 = x_{\mu_N}(0) \in X$. Then

$$(x_{\mu_N}(n),\mu_N(x_{\mu_N}(n))) \in Y \quad \text{for all } n \in \mathbb{N}$$

This key result leads to the recursive feasibility property, because the assumption $x_0 \in X$ will be automatically satisfied upon subsequent applications of the MPC feedback law, as a consequence of this very same theorem.
Time-Varying Optimal Control Problem (OCP$^n_N$)

For a time-varying reference $x^{ref}(n)$, the running cost $\ell(n,x,u)$ is assumed to satisfy

$$\ell(n,x^{ref}(n),u^{ref}(n)) = 0$$

where $x^{ref}(n)$ has been generated by a suitable $u^{ref}(n)$:

$$x^{ref}(n+1) = f(x^{ref}(n),u^{ref}(n))$$

Also, the running cost must be positive away from the reference:

$$\ell(n,x,u) > 0 \quad \forall n, \; u \in U, \; x \in X, \; x \neq x^{ref}(n)$$

A running cost that satisfies the above is

$$\ell(n,x,u) = \|x - x^{ref}(n)\|^2 + \lambda \|u - u^{ref}(n)\|^2$$

with $\lambda \geq 0$.
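A minimal sketch of this quadratic tracking cost; the reference functions below are placeholders (in practice $x^{ref}$ must be generated by $u^{ref}$ through the dynamics, which the placeholder pair does not claim to do):

```python
import numpy as np

lam = 0.1  # weight on the input deviation, lambda >= 0

def run_cost(n, x, u, x_ref, u_ref):
    # l(n,x,u) = ||x - x_ref(n)||^2 + lam*||u - u_ref(n)||^2
    return float(np.sum((x - x_ref(n)) ** 2) + lam * np.sum((u - u_ref(n)) ** 2))

# placeholder reference pair, for illustration only
x_ref = lambda n: np.array([0.1 * n, 0.1])
u_ref = lambda n: np.array([0.1])

print(run_cost(3, x_ref(3), u_ref(3), x_ref, u_ref))        # 0.0 on the reference
print(run_cost(3, x_ref(3) + 1.0, u_ref(3), x_ref, u_ref) > 0)  # positive off it
```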
Time-Varying NMPC Algorithm

Measure $x(n)$, set $x_0 = x(n)$ and solve

$$\min_{u \in \mathbb{U}^N(x_0)} J_N(n,x_0,u) = \sum_{k=0}^{N-1} \ell(n+k,x_u(k,x_0),u(k))$$

subject to

$$x_u(0,x_0) = x_0, \qquad x_u(k+1,x_0) = f(x_u(k,x_0),u(k))$$

Let $u^*(k)$ be the open-loop solution sequence of OCP$_N$. The MPC feedback law is defined by $\mu_N(n,x(n)) = u^*(0)$. The nominal closed-loop system resulting from this algorithm is

$$x^+ = f(x,\mu_N(n,x))$$
Terminal Constraint Sets

Terminal constraint sets provide a way to guarantee feasibility and closed-loop stability of NMPC. For a fixed terminal set $X_0$, a general terminal constraint is expressed as $x_u(N,x(n)) \in X_0$ for each $u \in \mathbb{U}^N(x_0)$. In words, we are asking admissible predicted sequences to yield a predicted state at the end of the horizon that lies in a desired set $X_0$.

The terminal set need not be fixed; it may move with time: $X_0(n)$. In this case, the terminal constraint has the form $x_u(N,x(n)) \in X_0(n+N)$.

Terminal sets are used to define feasible sets (of initial conditions), denoted $X_N$:

$$X_N = \{x_0 \in X : \exists u \in \mathbb{U}^N(x_0) \text{ such that } x_u(N,x_0) \in X_0\}$$

The corresponding admissible control sequences available for $x_0$ are:

$$\mathbb{U}^N_{X_0}(x_0) = \{u \in \mathbb{U}^N(x_0) : x_u(N,x_0) \in X_0\}$$

Similar definitions apply to the time-varying case; see Def. 3.9(ii) in G&P.
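For simple systems the feasible sets $X_N$ can be computed by the backward recursion $X_{k+1} = \{x \in X : \exists u \text{ with } f(x,u) \in X_k\}$. The 1-D system below is an illustrative assumption chosen so that every set in the recursion is an interval and the endpoints can be propagated exactly:

```python
# Feasible sets X_N for a toy 1-D system x+ = x + u, |u| <= 1,
# with X = [-2, 2] and terminal set X_0 = {0}.  For this system every
# set in the backward recursion is an interval, so interval endpoints
# can be propagated exactly.
X = (-2.0, 2.0)
U = 1.0  # |u| <= U

def feasible_sets(N):
    sets = [(0.0, 0.0)]          # X_0 = {0}
    for _ in range(N):
        lo, hi = sets[-1]
        # one step backward: x can reach [lo, hi] iff x + [-U, U] meets it
        sets.append((max(X[0], lo - U), min(X[1], hi + U)))
    return sets

for k, s in enumerate(feasible_sets(4)):
    print(k, s)
# X_1 = [-1, 1], X_2 = [-2, 2], and the recursion saturates at X
```

The sets grow with $N$ until they fill the admissible set $X$: longer horizons enlarge the set of initial conditions from which the terminal constraint can be met.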
Terminal Costs and Weighted Costs - the "Everything" Algorithm

The predicted state at the end of the horizon may be included as a separate term in the cost function, $F(x_u(N,x(n)))$. Again, this will be used as part of the stability analysis. Weighted costs are generated by using a sequence of non-negative weights $\omega_k$. A time-varying algorithm with weighted cost, terminal constraint and terminal cost (the "everything" algorithm) is: at time $n$, set $x_0 = x(n) \in X_N$ and solve

$$\min_{u \in \mathbb{U}^N_{X_0}(n,x_0)} J_N(n,x_0,u) = \sum_{k=0}^{N-1} \omega_{N-k}\,\ell(n+k,x_u(k,x_0),u(k)) + F(n+N,x_u(N,x_0))$$

subject to

$$x_u(0,x_0) = x_0, \qquad x_u(k+1,x_0) = f(x_u(k,x_0),u(k))$$

Note that the terminal state constraint is part of the admissibility requirement for $u$, and thus not listed as a "subject to" constraint. When coding for numerical solutions, however, terminal constraints are listed among the "subject to" constraints.
Recursive Feasibility of NMPC

Consider the "everything" algorithm. The following holds (Corollary 3.13 in G&P) for all $n \in \mathbb{N}$:

$$x \in X_N(n) \implies f(x,\mu_N(n,x)) \in X_{N-1}(n+1)$$

At any time $n$, if we solve the NMPC problem at some point of the current $N$-step feasible set, then $x^+$ under NMPC feedback will belong to the next $(N-1)$-step feasible set. When the terminal set $X_0$ is constant and consists of a single equilibrium point, it is clear that $(N-1)$-step feasibility implies $N$-step feasibility, since we may simply append the equilibrium control input.
Bellman's Optimality Principle

Suppose a driver has been following a bad route due to confusion. When he realizes the mistake, he will take the best route from where he is, regardless of what he did before. If he had started from that point, he would have followed that same best route. This empirical observation is reflected in Bellman's principle of optimality, and will be presented in precise mathematical form as the MPC dynamic programming equality and inequalities.
Optimal Value Function and Optimal Sequences

Define the optimal value function (or minimum cost) by

$$V_N(n,x_0) = \inf_{u \in \mathbb{U}^N_{X_0}(n,x_0)} J_N(n,x_0,u)$$

We use inf instead of min because there may be no (admissible) $u$ that attains $V_N$ exactly. For example, the function $e^{-t}$ has infimum zero over $t \geq 0$, but there is no $t$ that produces zero.

A control sequence $u^* \in \mathbb{U}^N_{X_0}(n,x_0)$ is optimal if it actually achieves the optimal value:

$$J_N(n,x_0,u^*) = V_N(n,x_0)$$
Dynamic Programming Principle

(Th. 3.15 in G&P): For OCP$^n_{N,e}$ with $x_0 \in X_N(n)$, all $n \in \mathbb{N}_0$ and $K = 1,\ldots,N$:

i.)

$$V_N(n,x_0) = \inf_{u \in \mathbb{U}^K_{X_{N-K}}(n,x_0)} \left\{ \sum_{k=0}^{K-1} \omega_{N-k}\,\ell(n+k,x_u(k,x_0),u(k)) + V_{N-K}(n+K,x_u(K,x_0)) \right\}$$

In words: total optimal cost (start at $x_0$ at time $n$ with horizon $N$) = inf of (cost from $x_0$ at $n$ over horizon $K$ + optimal cost from $x_u(K,x_0)$ at time $n+K$, horizon $N-K$). Note that the terminal cost does not appear in the DP equation, but the principle is valid for OCPs containing such a term.

Important: this principle applies to the predicted solutions of the OCP, not their repeated use as feedback controls.
Dynamic Programming Principle...

ii.) If an optimal sequence $u^* \in \mathbb{U}^N_{X_0}(n,x_0)$ exists for $x_0$, then

$$V_N(n,x_0) = \sum_{k=0}^{K-1} \omega_{N-k}\,\ell(n+k,x_{u^*}(k,x_0),u^*(k)) + V_{N-K}(n+K,x_{u^*}(K,x_0))$$

(Figure: timeline sketch. Solving the OCP at time $n$ gives $u^*$ at cost $V_N$, which splits into the stage cost of the first $K$ steps plus the cost-to-go $V_{N-K}(n+K,x_{u^*}(K,x_0))$ from the predicted state $x_{u^*}(K,x_0)$; solving the OCP at time $n+K$ yields the tail of $u^*$, at cost $V_{N-K}$.)
Dynamic Programming Principle...

(Corollary 3.16 in G&P): If $u^*$ is an optimal solution to OCP$^n_{N,e}$ for $x_0 \in X_N(n)$ at time $n$ with $N \geq 2$, then for each $K = 1,2,\ldots,N-1$ the shifted sequence

$$u_K^*(k) = u^*(k+K), \quad k = 0,1,\ldots,N-K-1$$

is an optimal solution to OCP$^n_{N,e}$ for $x_{u^*}(K,x_0)$ at time $n+K$, with horizon $N-K$.

(Theorem 3.17): Consider OCP$^n_{N,e}$ with $x_0 \in X_N(n)$ and assume an optimal sequence $u^*$ exists. Then the NMPC feedback $\mu_N(n,x_0) = u^*(0)$ satisfies

$$\mu_N(n,x_0) = \underset{u \in \mathbb{U}^1_{X_{N-1}}(n,x_0)}{\arg\min} \left\{ \omega_N\,\ell(n,x_0,u) + V_{N-1}(n+1,f(x_0,u)) \right\}$$

Note that the arg min is taken over all one-element sequences admissible at $x_0$ and time $n$.
Interpretation

At $x_0$, $n$, try all admissible one-element sequences $u$. Recall what admissibility entails (constrained controls, states, terminal state). Each $u$ results in a one-step stage cost, a next state $f(x_0,u)$ and an optimal cost-to-go $V_{N-1}(n+1,f(x_0,u))$. Tally all admissible $u$'s and for each, add the stage cost and the cost-to-go. Locate the minimum sum. Th. 3.17 says that the $u$ giving the minimum sum is the NMPC solution $u^*(0)$.

The next corollary says the following: suppose we apply the NMPC feedback for some time. If we form a sequence with the applied feedbacks, it will be a solution to the one-shot OCP at the initial time.

(Corollary 3.18): Consider OCP$^n_{N,e}$ with $x_0 \in X$ and consider the admissible feedback laws $\mu_{N-k}$ for $k = 0,1,\ldots,N-1$, applied successively with shrinking horizon; the control sequence they generate is an optimal solution of the one-shot OCP for $x_0$ at the initial time.
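Theorem 3.17 can be checked numerically on a small finite problem, where the cost-to-go $V_{N-1}$ is computable exactly by tabulation. The integer-grid system below is an illustrative assumption (all weights equal to one, no terminal cost or constraint):

```python
import itertools

# Toy finite problem: x+ = x + u on the integer grid X = {-5,...,5},
# u in {-1,0,1}, stage cost l(x,u) = x^2 + u^2 (all weights 1,
# no terminal cost, no terminal constraint).
Xs = list(range(-5, 6))
Us = [-1, 0, 1]

def f(x, u):
    return max(-5, min(5, x + u))   # saturate so the state stays in X

def l(x, u):
    return x**2 + u**2

N = 4

# tabulated cost-to-go: V[0] = 0, V[j][x] = min_u { l(x,u) + V[j-1][f(x,u)] }
V = [{x: 0 for x in Xs}]
for j in range(1, N + 1):
    V.append({x: min(l(x, u) + V[j - 1][f(x, u)] for u in Us) for x in Xs})

x0 = 3
# Theorem 3.17: NMPC feedback = one-step argmin of stage cost + cost-to-go
mu_N = min(Us, key=lambda u: l(x0, u) + V[N - 1][f(x0, u)])

# cross-check against exhaustive search over all |U|^N input sequences
def seq_cost(useq):
    x, c = x0, 0
    for u in useq:
        c += l(x, u)
        x = f(x, u)
    return c

best = min(itertools.product(Us, repeat=N), key=seq_cost)
print(mu_N, best[0])             # both equal -1: push the state toward 0
print(V[N][x0], seq_cost(best))  # optimal costs agree
```

The one-step minimization over stage cost plus tabulated cost-to-go returns the same first control, at the same optimal cost, as brute-force search over all input sequences.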
Interpretation...

Important: unless $N = \infty$, elements $2,3,4,\ldots,N$ of the predicted control and state sequences at time $n$ do not match the feedback and closed-loop states at $n+1, n+2, \ldots, n+N-1$.

Example (Prob. 3.2 in G&P): Consider the system

$$x_1^+ = x_1 + 2x_2, \qquad x_2^+ = x_2 + 2u$$

with running cost $\ell(x,u) = u^2$, $x_0 = [0 \; 0]^T$, $x_N = [4 \; 0]^T$. For $N = 4$, use the dynamic programming principle to obtain the first predicted trajectory and optimal cost. We then use numerical simulations to illustrate the validity of the above theorems and corollaries.
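Because the running cost in Prob. 3.2 penalizes only the input, the OCP can also be checked in closed form as a minimum-norm problem under the linear endpoint constraints (an alternative route to the dynamic programming derivation requested on the slide):

```python
import numpy as np

# Prob. 3.2: x1+ = x1 + 2*x2, x2+ = x2 + 2*u, l(x,u) = u^2,
# x(0) = [0,0], x(4) = [4,0].  With a pure input cost the OCP is a
# minimum-norm problem:  min ||u||^2  s.t.  A u = b,  whose solution
# is u* = A^T (A A^T)^{-1} b.
A_d = np.array([[1.0, 2.0], [0.0, 1.0]])
B_d = np.array([[0.0], [2.0]])
N = 4

# endpoint map: x(N) = A^N x0 + sum_k A^{N-1-k} B u(k);  x0 = 0 here
cols = [np.linalg.matrix_power(A_d, N - 1 - k) @ B_d for k in range(N)]
A = np.hstack(cols)              # 2 x 4 endpoint-constraint matrix
b = np.array([4.0, 0.0])

u = A.T @ np.linalg.solve(A @ A.T, b)
print(u)                         # optimal open-loop sequence
print(u @ u)                     # optimal cost V_4(x0)

# verify the endpoint by simulating the predicted trajectory
x = np.zeros(2)
for k in range(N):
    x = A_d @ x + B_d.flatten() * u[k]
print(x)                         # reaches [4, 0]
```

The result is $u^* = (0.3,\, 0.1,\, -0.1,\, -0.3)$ with optimal cost $0.2$, which the dynamic programming route should reproduce.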
Dynamic Programming Principle, LTI DT Systems and the Finite-Horizon Quadratic Problem

Consider the LTI DT system $x^+ = Ax + Bu$ with initial state $x(0) = x_0$ and quadratic cost

$$J_N(x_0,u) = \sum_{k=0}^{N-1} \left\{ x_u^T(k,x_0)\,Q\,x_u(k,x_0) + u^T(k)\,R\,u(k) \right\} + x_u^T(N)\,Q_f\,x_u(N)$$

We use the DP principle to find a solution for the unconstrained OCP and the corresponding optimal cost function. Standard solvability assumptions: $Q = Q^T \geq 0$, $Q_f = Q_f^T \geq 0$, $R = R^T > 0$.
Finite-Horizon DLQR...

Apply the DP principle (Th. 3.15) to this case with $K = 1$ and time $n$. The stage cost term is associated with the initial state and control: $x_0^T Q x_0 + u^T R u$. With $u$ as the initial control, the next state is $x^+ = Ax_0 + Bu$, so the cost-to-go to be minimized is $V_{N-1}(n+1, Ax_0 + Bu)$. The DP principle then reduces to

$$V_N(n,x_0) = x_0^T Q x_0 + \min_{u \in U^1} \left\{ u^T R u + V_{N-1}(n+1, Ax_0 + Bu) \right\}$$

It is guessed that $V_{N-1}(n+1,z)$ is a quadratic time-varying function:

$$V_{N-1}(n+1,z) = z^T P_{n+1} z$$

where $P_n$ is a sequence of symmetric, positive-definite matrices. Substituting the guess:

$$V_N(n,x_0) = x_0^T Q x_0 + \min_{u \in U^1} \left\{ u^T R u + (Ax_0 + Bu)^T P_{n+1} (Ax_0 + Bu) \right\}$$
Finite-Horizon DLQR...

Perform the indicated minimization by equating the gradient to zero:

$$2u^T R + 2(Ax_0 + Bu)^T P_{n+1} B = 0$$

which gives the well-known optimal solution

$$u^* = -(R + B^T P_{n+1} B)^{-1} B^T P_{n+1} A\, x_0 \triangleq -K_n x_0$$

Substituting this solution into the DP principle equation gives the Riccati backward recursion:

$$P_n = Q + A^T P_{n+1} A - A^T P_{n+1} B (R + B^T P_{n+1} B)^{-1} B^T P_{n+1} A$$

To solve the above, note that $P_N = Q_f$. This is used as the initial value to find $P_{N-1}, P_{N-2}, \ldots, P_0$ in that order, along with the optimal control sequence.
Example: Finite-Horizon DLQR

Consider a double-integrator plant discretized with ZOH. We simulate the finite-horizon optimal regulator with identity weights. See the effect of $N$.

(Figure: state trajectory of the finite-time DLQR in the $(x_1, x_2)$ plane.)
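The example can be sketched as follows. The sample time and the initial state are illustrative assumptions, since the slide does not specify them; the recursion and gain formula are the ones derived above:

```python
import numpy as np

# Finite-horizon DLQR for a ZOH-discretized double integrator.
# Sample time T and initial state are illustrative assumptions.
T = 0.1
A = np.array([[1.0, T], [0.0, 1.0]])
B = np.array([[T**2 / 2], [T]])
Q, R, Qf = np.eye(2), np.eye(1), np.eye(2)   # identity weights
N = 50                                       # horizon; vary to see its effect

# backward Riccati recursion with terminal value P[N] = Qf
P = [None] * (N + 1)
K = [None] * N
P[N] = Qf
for n in range(N - 1, -1, -1):
    S = R + B.T @ P[n + 1] @ B
    K[n] = np.linalg.solve(S, B.T @ P[n + 1] @ A)    # u*(n) = -K[n] x(n)
    P[n] = Q + A.T @ P[n + 1] @ (A - B @ K[n])

# forward simulation of the time-varying closed loop x+ = (A - B K[n]) x
x = np.array([[2.0], [-7.0]])
for n in range(N):
    x = (A - B @ K[n]) @ x
print(np.linalg.norm(x))   # the regulator drives the state toward the origin
```

The optimal cost from the initial state is $x_0^T P_0 x_0$; rerunning with smaller $N$ shows a larger residual at the end of the horizon, which is the effect of $N$ mentioned on the slide.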