ECE7850 Lecture 7: Discrete Time Optimal Control and Dynamic Programming

- Discrete Time Optimal Control Problems
- Short Introduction to Dynamic Programming
- Connection to Stabilization Problems
Discrete Time Optimal Control Problem

DT nonlinear control system:
$$x(t+1) = f(x(t), u(t)), \quad x \in X,\; u \in U,\; t \in \mathbb{Z}_+ \tag{1}$$
- For a traditional system: $X \subseteq \mathbb{R}^n$, $U \subseteq \mathbb{R}^m$ are continuous variables.
- A large class of DT hybrid systems can also be written in (or viewed as) the above form:
  - switched systems: $U \subseteq \mathbb{R}^m \times Q$, with mixed continuous/discrete control input
  - variable structure systems (e.g., piecewise affine systems): $f$ defined in terms of partitions, $f(x(t), u(t)) = f_i(x(t), u(t))$ if $x(t) \in \Omega_i$
  - more general hybrid systems: $X \subseteq \mathbb{R}^n \times Q_x$ and $U \subseteq \mathbb{R}^m \times Q_u$
- The actual control constraint is state-dependent: $U(x) = \{u \in U : f(x, u) \in X\}$
A general way to determine control: a time-varying state-feedback law $u(t) = \mu_t(x(t))$, with the requirement $\mu_t(x(t)) \in U(x(t))$.

Control policies:
- $N$-horizon policy: $\pi_N = \{\mu_0, \ldots, \mu_{N-1}\} \in \Pi_N$
- Infinite-horizon policy: $\pi_\infty = \{\mu_0, \mu_1, \ldots\} \in \Pi_\infty$
- Stationary policy: $\pi_\infty = \{\mu, \mu, \ldots\} \in \Pi_\infty$
$x(\cdot\,; z, \pi)$: closed-loop trajectory starting from $z$ under policy $\pi = \{\mu_0, \mu_1, \ldots\}$:
$$x(t+1; z, \pi) = f\big(x(t; z, \pi), \mu_t(x(t; z, \pi))\big) \tag{2}$$
Corresponding control sequence: $u(t; z, \pi) = \mu_t(x(t; z, \pi))$

Quantify performance through cost functions:
- Finite horizon: $J_N(z, \pi_N) = \sum_{t=0}^{N-1} l\big(x(t; z, \pi_N), u(t; z, \pi_N)\big) + J_f\big(x(N; z, \pi_N)\big)$
- Infinite horizon: $J_\infty(z, \pi_\infty) = \sum_{t=0}^{\infty} l\big(x(t; z, \pi_\infty), u(t; z, \pi_\infty)\big)$

Running cost $l : X \times U \to \mathbb{R}_+$; terminal cost $J_f : X \to \mathbb{R}_+$.
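The finite-horizon cost is just a rollout of the closed loop (2) with an accumulator. A minimal Python sketch; the names (`f`, `l`, `J_f`, `policy`) mirror the slide's notation and are placeholders, not a fixed API:

```python
def finite_horizon_cost(f, l, J_f, z, policy):
    """Evaluate J_N(z, pi_N) by simulating x(t+1) = f(x(t), mu_t(x(t))).

    policy is the list [mu_0, ..., mu_{N-1}] of feedback maps mu_t(x) -> u.
    """
    x, cost = z, 0.0
    for mu_t in policy:      # t = 0, ..., N-1
        u = mu_t(x)          # u(t) = mu_t(x(t))
        cost += l(x, u)      # accumulate running cost
        x = f(x, u)          # closed-loop update (2)
    return cost + J_f(x)     # terminal cost at x(N)
```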
Optimal control problem:
- Finite horizon: $V_N(z) = \inf_{\pi_N \in \Pi_N} J_N(z, \pi_N)$
- Infinite horizon: $V_\infty(z) = \inf_{\pi_\infty \in \Pi_\infty} J_\infty(z, \pi_\infty)$

$V_N$: $N$-horizon value function; $V_\infty$: infinite-horizon value function.

$\lim_{N \to \infty} V_N$ may not exist; even when it exists, $\lim_{N \to \infty} V_N \neq V_\infty$ in general.
Short Introduction to Dynamic Programming

Bellman's principle of optimality: any segment along an optimal trajectory is also optimal among all the trajectories joining the two end points of the segment.
This leads to the backward value iteration: compute $V_N$ iteratively starting from $V_0$.
- $V_0 = J_f$: the zero-horizon cost is just the terminal cost function.
- Suppose we know $V_j$ for some $j \geq 0$; then $V_{j+1}(z) = \min_{u \in U(z)} \{l(z, u) + V_j(f(z, u))\}$.
- $u^* = \arg\min_{u \in U(z)} \{l(z, u) + V_j(f(z, u))\}$: the optimal control when there are $j+1$ steps left and the system is at state $z$.
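On a finite state space the recursion can be coded directly. A sketch (a hypothetical helper, assuming $U(z)$ is nonempty and $f(z, u)$ stays in `states`):

```python
def value_iteration(states, U, f, l, J_f, N):
    """Backward value iteration: V_0 = J_f, V_{j+1}(z) = min_u l(z,u) + V_j(f(z,u)).

    Returns [V_0, V_1, ..., V_N], each value function stored as a dict over states.
    """
    V = [{z: J_f(z) for z in states}]                    # V_0 = J_f
    for _ in range(N):
        Vj = V[-1]
        V.append({z: min(l(z, u) + Vj[f(z, u)] for u in U(z))
                  for z in states})                      # V_{j+1}
    return V
```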
Example 1 (Shortest Path Problem): Find the shortest path to $a_4$.
- $X = \{a_1, \ldots, a_8\}$; $U(x)$: available links to take at $x \in X$
- dynamics: $x(k+1) = f(x(k), u(k))$; running cost: $l(x(k), u(k)) =$ edge length
- terminal cost: $J_f(x) = 0$ if $x = a_4$, $+\infty$ otherwise
- Value iteration: [figure: graph on $a_1, \ldots, a_8$ with edge lengths, omitted] (usage sketch below)
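A usage sketch with the `value_iteration` helper above. The graph here is hypothetical (the slide's edge lengths live only in the figure); the zero-cost self-loop at the target keeps trajectories well defined once they arrive:

```python
INF = float('inf')
# Hypothetical 4-node stand-in for the slide's 8-node graph.
edges = {'a1': {'a2': 4, 'a3': 6}, 'a2': {'a4': 3},
         'a3': {'a4': 2}, 'a4': {'a4': 0}}

states = list(edges)
U = lambda z: edges[z]           # available links at z
f = lambda z, u: u               # taking link u moves to node u
l = lambda z, u: edges[z][u]     # running cost = edge length
J_f = lambda z: 0 if z == 'a4' else INF

V = value_iteration(states, U, f, l, J_f, N=3)
print(V[-1])   # V_3(z): shortest path length from z to a4 in at most 3 steps
```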
Linear Quadratic Regulator:
- System dynamics: $x(t+1) = Ax(t) + Bu(t)$
- Running cost: $l(z, u) = z^T Q z + u^T R u$; terminal cost: $J_f(z) = z^T Q_f z$
- Suppose the $j$-horizon value function is $V_j(z) = z^T P_j z$; derive $V_{j+1}(z)$.
Derivation (cont.): optimal control $u^*$ at $t = N - (j+1)$, i.e., when $j+1$ steps remain (worked out below).
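The slide's board work is not in the extracted text; the following is a standard reconstruction, consistent with the Riccati recursion (3) on the next slide:
$$
\begin{aligned}
V_{j+1}(z) &= \min_{u} \;\big\{ z^T Q z + u^T R u + (Az + Bu)^T P_j (Az + Bu) \big\} \\
0 &= \nabla_u = 2(R + B^T P_j B)u + 2B^T P_j A z
\;\;\Rightarrow\;\; u^* = -(R + B^T P_j B)^{-1} B^T P_j A\, z \\
V_{j+1}(z) &= z^T \big( Q + A^T P_j A - A^T P_j B (R + B^T P_j B)^{-1} B^T P_j A \big) z \;=\; z^T P_{j+1} z
\end{aligned}
$$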
Summary of LQR results: the $j$-horizon value function is $V_j(z) = z^T P_j z$, where $P_j$ is generated by the Riccati recursion $P_{j+1} = \rho(P_j)$:
$$\rho(P) = Q + A^T P A - A^T P B (R + B^T P B)^{-1} B^T P A \tag{3}$$

To compute the LQR controller:
1. Start with $P_0 = Q_f$
2. Riccati recursion: $P_{j+1} = \rho(P_j)$
3. Compute the optimal feedback gains: $K_{j+1} = -(R + B^T P_j B)^{-1} B^T P_j A$

To use the LQR controller:
1. initial state: $x^*(0) = x_0$
2. compute control input: $u^*(t) = K_{N-t} x^*(t)$
3. state update: $x^*(t+1) = Ax^*(t) + Bu^*(t)$
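A direct NumPy transcription of the two procedures; the double-integrator matrices at the bottom are an illustrative assumption, not from the slides:

```python
import numpy as np

def riccati_step(P, A, B, Q, R):
    """One step of the Riccati recursion (3): returns rho(P)."""
    S = R + B.T @ P @ B
    return Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A)

def lqr_gains(A, B, Q, R, Qf, N):
    """Finite-horizon LQR gains [K_1, ..., K_N], starting from P_0 = Qf."""
    P, K = Qf, []
    for _ in range(N):
        K.append(-np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A))  # K_{j+1}
        P = riccati_step(P, A, B, Q, R)                           # P_{j+1}
    return K

# Usage on a hypothetical double integrator:
A = np.array([[1.0, 1.0], [0.0, 1.0]]); B = np.array([[0.0], [1.0]])
Q = np.eye(2); R = np.eye(1); Qf = np.eye(2); N = 20
K = lqr_gains(A, B, Q, R, Qf, N)
x = np.array([5.0, 0.0])
for t in range(N):
    u = K[N - 1 - t] @ x      # u*(t) = K_{N-t} x*(t)
    x = A @ x + B @ u         # x*(t+1) = A x*(t) + B u*(t)
```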
Theorem 1 (Infinite-Horizon LQR): If $(A, B)$ is stabilizable and $(A, C)$ is detectable, where $Q = C^T C$, then as $j \to \infty$, $P_j \to P_\infty$ and $K_j \to K_\infty$ with stable closed-loop system $(A + BK_\infty)$; $P_\infty$ satisfies the algebraic Riccati equation:
$$P_\infty = \rho(P_\infty), \qquad K_\infty = -(R + B^T P_\infty B)^{-1} B^T P_\infty A \triangleq K(P_\infty)$$
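In practice $P_\infty$ can be obtained by running `riccati_step` from the sketch above to numerical convergence (a fixed-point iteration for $P = \rho(P)$); SciPy's `scipy.linalg.solve_discrete_are` computes the same object directly. A sketch under Theorem 1's assumptions:

```python
def riccati_limit(A, B, Q, R, tol=1e-10, max_iter=10_000):
    """Iterate P_{j+1} = rho(P_j) from P_0 = 0 until ||P_{j+1} - P_j|| < tol."""
    P = np.zeros_like(Q)
    for _ in range(max_iter):
        P_next = riccati_step(P, A, B, Q, R)
        if np.max(np.abs(P_next - P)) < tol:
            break
        P = P_next
    K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # K(P_inf)
    return P, K
```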
Value Iteration Operator

To simplify notation, we define two function spaces: $\mathcal{G} \triangleq \{g : X \to \mathbb{R} \cup \{\pm\infty\}\}$ and $\mathcal{G}_+ \triangleq \{g : X \to \mathbb{R}_+ \cup \{+\infty\}\}$.

Value iteration can be viewed as an operator $T : \mathcal{G} \to \mathcal{G}$:
$$T[g](z) = \min_{u \in U(z)} \{l(z, u) + g(f(z, u))\}, \quad \forall z \in X,\; g \in \mathcal{G} \tag{4}$$

Iteration under a given control law $\mu : X \to U$ with $\mu(x) \in U(x)$, $\forall x \in X$, is defined as
$$T_\mu[g](z) = l(z, \mu(z)) + g(f(z, \mu(z))), \quad \forall z \in X,\; g \in \mathcal{G} \tag{5}$$
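For a finite state space, the two operators are one-liners; `value_iteration` above is exactly repeated application of `T`. Hypothetical helpers in the same style as the earlier sketches:

```python
def T(g, states, U, f, l):
    """Bellman operator (4): T[g](z) = min_{u in U(z)} l(z,u) + g(f(z,u))."""
    return {z: min(l(z, u) + g[f(z, u)] for u in U(z)) for z in states}

def T_mu(g, states, mu, f, l):
    """Policy operator (5): T_mu[g](z) = l(z, mu(z)) + g(f(z, mu(z)))."""
    return {z: l(z, mu(z)) + g[f(z, mu(z))] for z in states}
```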
$T^k$: the composition of $T$ with itself $k$ times: $T^{k+1}[g] = T[T^k[g]]$, with $T^0[g] = g$.

Theorem 2 (Key Properties of Value Iteration):
1. Value Iteration: $V_N = T^N[J_f]$
2. Bellman Equation: $V_\infty \in \mathcal{G}_+$ is a fixed point of $T$, i.e., $T[V_\infty] = V_\infty$
3. Minimum Property: For any $g \in \mathcal{G}_+$, if $g \geq T[g]$, then $g \geq V_\infty$
4. Monotonicity: If $J_f \equiv 0$, then $J_f = V_0 \leq V_1 \leq \cdots \leq V_\infty$
5. Stationary Policy: A stationary policy $\pi_\infty = \{\mu, \mu, \ldots\}$ is optimal if and only if $T[V_\infty] = T_\mu[V_\infty]$
Some unfortunate facts:
1. $\lim_{N \to \infty} V_N$ may not exist
2. Even when $\lim_{N \to \infty} V_N$ exists, we may have $\lim_{N \to \infty} V_N \neq V_\infty$
3. Bellman equation: $T[g] = g$ may have infinitely many solutions that are not $V_\infty$

Counterexamples can be found in Bertsekas' book (Volume II). These cases cannot arise if the state space is finite, the cost per stage $l(z, u)$ is bounded, and the cost is discounted. However, for control problems we often do not have the luxury of making these assumptions.
Connection to Stabilization Problems

Notation: $\|z\|$ denotes an arbitrary $p$-norm raised to an arbitrary positive power, namely $|z|_p^r$, for some $p, r \in \mathbb{R}_+$.

Definition 1 (Exponential stabilizability): $\exists\, \pi_\infty$ such that $\|x(t; z, \pi_\infty)\| \leq c\gamma^t \|z\|$.

Definition 2 (Exponentially Stabilizing Control Lyapunov Function (ECLF)): A PD function $g \in \mathcal{G}_+$ is called an ECLF if $\exists\, \mu$ with $\mu(z) \in U(z)$, $\forall z \in X$, such that
$$\kappa_1 \|z\| \leq g(z) \leq \kappa_2 \|z\|, \qquad g(z) - g(f(z, \mu(z))) \geq \kappa_3 \|z\| \tag{6}$$
If condition (6) holds, then the closed-loop system under $\pi_\infty = \{\mu, \mu, \ldots\}$ is exponentially stable, with
$$\|x(t; z, \pi_\infty)\| \leq \frac{\kappa_2}{\kappa_1} \left( \frac{1}{1 + \kappa_3/\kappa_2} \right)^t \|z\| \tag{7}$$
(a derivation sketch follows Theorem 3 below).

How does stabilization relate to optimal control? Let us start with a well-known result for LQR (see Slide 10 for notation).

Theorem 3: If $(A, B)$ is stabilizable and $(A, C)$ is detectable, where $Q = C^T C$, then
1. Convergence of value iteration: $P_j \to P_\infty$, where $P_{j+1} = \rho(P_j)$ with $P_0 = 0$
2. $V_\infty(z) = z^T P_\infty z$ is an ECLF
3. $\mu(z) = K(P_\infty) z$ is a stabilizing feedback law
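To see where the decay in (7) comes from: the decrease condition in (6) gives a geometric contraction of $g$ along the closed-loop trajectory, and the two sandwich bounds convert this into a bound on the state. A sketch:
$$
g(x(t+1)) \leq g(x(t)) - \kappa_3 \|x(t)\| \leq \Big(1 - \tfrac{\kappa_3}{\kappa_2}\Big) g(x(t))
\;\Rightarrow\;
\|x(t)\| \leq \tfrac{1}{\kappa_1}\, g(x(t)) \leq \tfrac{\kappa_2}{\kappa_1} \Big(1 - \tfrac{\kappa_3}{\kappa_2}\Big)^t \|z\|,
$$
which implies (7), since $1 - \kappa_3/\kappa_2 \leq (1 + \kappa_3/\kappa_2)^{-1}$.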
The theorem indicates: if we choose the cost function properly (i.e., to make $(A, C)$ detectable, where $Q = C^T C$), then a linear system is stabilizable if and only if it can be stabilized by the LQR controller.

Question: can we generalize this result? To what extent, and how?

Roughly speaking, if a system can be (exponentially) stabilized, then we can always stabilize it by solving a properly defined optimal control problem.
Main idea: select the running cost $l : X \times U \to \mathbb{R}_+$ such that
$$\text{exponentially stabilizable} \;\Rightarrow\; V_\infty(z) \leq \beta \|z\| \tag{8}$$
$$V_\infty(z) \leq \beta \|z\| \;\Rightarrow\; \text{the optimal trajectory is exponentially stable} \tag{9}$$

This can be achieved by various types of cost functions. To illustrate the key idea, let us choose the simplest one, say $l(z, u) = \|z\|$:
$$\text{exponentially stabilizable} \;\Rightarrow\; V_\infty(z) \leq \beta \|z\| \text{ for some } \beta < \infty$$
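Why (8) holds with this choice, in one line (a sketch: $\hat\pi_\infty$ denotes the exponentially stabilizing policy from Definition 1, with constants $c$ and $\gamma \in (0,1)$):
$$
V_\infty(z) \;\leq\; J_\infty(z, \hat\pi_\infty) \;=\; \sum_{t=0}^{\infty} \|x(t; z, \hat\pi_\infty)\| \;\leq\; \sum_{t=0}^{\infty} c\gamma^t \|z\| \;=\; \underbrace{\tfrac{c}{1-\gamma}}_{\beta} \|z\|.
$$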
$V_\infty(z) \leq \beta \|z\|$ $\Rightarrow$ $V_\infty(z)$ is an ECLF.

Note: there always exists a stationary stabilizing policy $\pi_\infty = \{\mu, \mu, \ldots\}$, where $T_\mu[V_\infty] = V_\infty$: the Bellman equation gives $V_\infty(z) - V_\infty(f(z, \mu(z))) = l(z, \mu(z)) = \|z\|$, so the decrease condition in (6) holds with $\kappa_3 = 1$, while $\|z\| \leq V_\infty(z) \leq \beta \|z\|$ gives $\kappa_1 = 1$, $\kappa_2 = \beta$.
We can also incorporate a nontrivial control cost (e.g., $l(z, u) = \|z\| + \|u\|$). This will not affect condition (9), but condition (8) requires more discussion: for example, if the stabilizing control inputs do not converge, then $V_\infty(z)$ cannot be bounded.
- All that matters is whether the system is stabilizable with finite control energy.
- For linear systems (and many other systems), stabilizability always means stabilizability with finite control energy; thus, adding a control cost will not destroy the stability of the optimal trajectory.
In summary, if the system is exponentially stabilizable, then with a properly selected running cost function $l(z, u)$:
- $V_\infty$ is an ECLF
- the optimal infinite-horizon policy $\pi_\infty = \{\mu, \mu, \ldots\}$ is exponentially stabilizing

This provides a unified way to construct an ECLF and a stabilizing controller. However, infinite-horizon optimal control is often more challenging to solve than stabilization. One way out is to use $V_N$ (and $\mu_N$) in place of $V_\infty$ (and $\mu_\infty$): Model Predictive Control.