Optimal Control, Lecture 18
Hamilton-Jacobi-Bellman Equation, Cont.
John T. Wen
Ref: Bryson & Ho, Chapter 4
March 29, 2004
Outline
- Hamilton-Jacobi-Bellman (HJB) Equation
- Iterative solution of the HJB Equation
Continuous Time Systems
Consider ẋ = f(x,u,t), x(t_0) = x_0, with cost
  J(x(t_0),t_0) = φ(x(T),T) + ∫_{t_0}^T L(x,u,t) dt.
Cost-to-go J(x(t),t):
  J(x(t),t) = φ(x(T),T) + ∫_t^T L(x,u,τ) dτ.
We mimic the discrete-time approach (the expansion is sketched below):
1. assume we know the optimal cost-to-go J*(x + Δx, t + Δt);
2. find the optimal u*(τ) for τ ∈ [t, t + Δt];
3. let Δt → 0.
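Carrying these three steps out gives the HJB equation on the next slide; a sketch of the standard expansion, assuming J* is continuously differentiable in x and t:

```latex
J^*(x,t) = \min_{u[t,\,t+\Delta t]}\Big\{ \int_t^{t+\Delta t} L(x,u,\tau)\,d\tau
           + J^*\big(x(t+\Delta t),\,t+\Delta t\big) \Big\}
         \approx \min_{u(t)}\Big\{ L(x,u,t)\,\Delta t + J^*(x,t)
           + J^*_t\,\Delta t + J^*_x\, f(x,u,t)\,\Delta t \Big\}.
```

Cancelling J*(x,t) from both sides, dividing by Δt, and letting Δt → 0 yields −J*_t = min_u { L + J*_x f }.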
HJB Equation
Hamilton-Jacobi-Bellman (HJB) Equation:
  −J*_t = min_{u(t)} { L(x,u,t) + J*_x f(x,u,t) },
with boundary condition J*(ξ,T) = φ(ξ,T) for all ξ. If x(T) is required to satisfy ψ(x(T),T) = 0, then the boundary condition becomes J*(ξ,T) = φ(ξ,T) for all ξ that satisfy ψ(ξ,T) = 0.
Alternatively,
  J*(x*(t),t) = φ(x*(T),T) + ∫_t^T L(x*,u*,τ) dτ.
Differentiating with respect to t, we obtain the HJB equation directly:
  dJ*/dt = −L(x*,u*,t) = J*_t + J*_x f(x*,u*,t).
Properties of HJB Equation
- Partial differential equation for J*(x,t) (x and t are independent variables; x does not depend on t), with a specified boundary condition at (x(T),T).
- The solution is in feedback form (u* is given in terms of x).
- Almost never solvable exactly, but sometimes approximate solutions are possible.
Time Invariant and Infinite Horizon Case
Consider a time-invariant system and an infinite-horizon optimization:
  ẋ = f(x,u), f(0,0) = 0, x(0) = x_0,
  J = ∫_0^∞ L(x(τ),u(τ)) dτ, L(0,0) = 0, L(x,u) ≥ 0.
In this case J* = J*(x) (no explicit dependence on t), and therefore J*_t = 0. The HJB equation then becomes:
  min_u { L(x,u) + J*_x f(x,u) } = 0, J*(0) = 0, J*(x) positive definite.
Scalar Examples
Plant: ẋ = x + u, x(0) = x_0. Cost: J = (1/2) x²(T) + (1/2) ∫_0^T r u² dt.
Plant (affine nonlinear system): ẋ = f(x) + g(x)u, x(0) = x_0, x ∈ R, u ∈ R. Cost: J(x_0) = (1/2) ∫_0^∞ (q(x) + u²) dt.
Example: ẋ = x³ + u.
General Continuous Time LQR
Plant: ẋ = Ax + Bu, x(0) = x_0.
Cost:
  J(x_0,0) = (1/2) xᵀ(T) Q(T) x(T) + (1/2) ∫_0^T ( xᵀQx + uᵀRu ) dt.
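For the LQR case the quadratic ansatz J*(x,t) = (1/2) xᵀP(t)x reduces the HJB equation to the Riccati differential equation. A minimal numerical sketch, assuming SciPy is available; the plant matrices A, B, the weights Q, R, the terminal weight, and the horizon T below are illustrative choices, not taken from the notes:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Finite-horizon LQR via the Riccati differential equation
#   -dP/dt = A'P + P A - P B R^{-1} B' P + Q,   P(T) = Q(T),
# integrated backward from t = T; the optimal feedback is u*(t) = -R^{-1} B' P(t) x.

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # double integrator (illustrative)
B = np.array([[0.0], [1.0]])
Q = np.eye(2); R = np.array([[1.0]])
QT = np.eye(2)                            # terminal weight Q(T) in the cost
T = 5.0

def riccati_rhs(t, p):
    P = p.reshape(2, 2)
    dP = -(A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T) @ P + Q)
    return dP.ravel()

# integrate backward in time from P(T) = Q(T)
sol = solve_ivp(riccati_rhs, (T, 0.0), QT.ravel(), dense_output=True)

def u_opt(t, x):
    P = sol.sol(t).reshape(2, 2)
    return -np.linalg.solve(R, B.T @ P @ x)   # u*(t) = -R^{-1} B' P(t) x

print(u_opt(0.0, np.array([1.0, 0.0])))
```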
Euler-Lagrange Equations
Let λᵀ = J*_x. Then the HJB equation becomes
  −J*_t = min_u ( L + λᵀ f ).
If u is constrained, then u* must minimize H(x,u,λ,t), which is the same as Pontryagin's minimum principle. If u is unconstrained, we recover the Euler-Lagrange equations.
Iterative Performance Improvement
The HJB equation is difficult to solve in general. For example, for a quadratic control penalty and affine state dynamics,
  L(x,u) = q(x) + (1/2) uᵀRu,  ẋ = f(x) + g(x)u,
the HJB-minimizing control is
  u* = argmin_u { q(x) + (1/2) uᵀRu + J*_x ( f(x) + g(x)u ) } = −R⁻¹ g(x)ᵀ (J*_x)ᵀ,
and the HJB equation is a PDE in J*(x):
  q(x) − (1/2) J*_x g(x) R⁻¹ gᵀ(x) (J*_x)ᵀ + J*_x f(x) = 0,  J*(0) = 0.
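The expression for u* follows from the stationarity condition of the bracketed term with respect to u (assuming R is positive definite, so the minimizer is unique):

```latex
\frac{\partial}{\partial u}\Big\{ q(x) + \tfrac{1}{2}u^{T}Ru + J^*_x\big(f(x)+g(x)u\big) \Big\}
  = Ru + g(x)^{T}\big(J^*_x\big)^{T} = 0
\;\;\Longrightarrow\;\; u^* = -R^{-1}g(x)^{T}\big(J^*_x\big)^{T}.
```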
Relaxation Method
It may be easier to solve the HJB equation iteratively with a relaxation method (a numerical sketch of this iteration is given after the list):
1. Start with a (locally) stabilizing control u(x) and solve V(x) from the generalized HJB (GHJB) equation (for the given u):
     L(x,u) + V_x f(x,u) = 0,  V(0) = 0.
   Note that this PDE is linear in V.
2. With V fixed from step 1, solve
     û = argmin_v { V_x f(x,v) + L(x,v) }.
3. Solve V̂ from the GHJB equation again. It has been shown (Saridis & Lee, '79) that V̂(x) ≤ V(x).
4. Repeat until V converges.
For L(x,u) = q(x) + (1/2) uᵀRu with q positive definite, the V obtained in each iteration is a Lyapunov function for the corresponding stabilizing controller u.
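A minimal numerical sketch of this iteration for the scalar example ẋ = x³ + u with L = (1/2)(x² + u²) (so q(x) = x², R = 1, g ≡ 1) from the Scalar Examples slide. The grid, the domain x ∈ (0, 2], the initial stabilizing control u(x) = −x³ − x, and the pointwise solve of the 1-D GHJB equation are all illustrative assumptions; the comparison at the end uses the closed-form optimal control worked out on the Example slide below.

```python
import numpy as np

# GHJB relaxation (policy iteration) for the scalar example
#   x' = x^3 + u,   L(x,u) = 0.5*(x^2 + u^2)   (q(x) = x^2, R = 1, g = 1).
# In 1-D the GHJB equation  V_x*(x^3 + u) + L = 0  can be solved pointwise for V_x.

x = np.linspace(1e-3, 2.0, 400)        # x > 0 is enough; V is even by symmetry

def ghjb_Vx(u):
    """Steps 1/3: solve the GHJB equation for V_x given a stabilizing u(x)."""
    f = x**3 + u                       # closed-loop drift (must stay negative here)
    L = 0.5 * (x**2 + u**2)
    return -L / f

u = -x**3 - x                          # initial stabilizing control: x' = -x
for k in range(20):
    Vx = ghjb_Vx(u)
    u_new = -Vx                        # step 2: u_hat = argmin_v {Vx*f + L} = -R^{-1} g Vx
    if np.max(np.abs(u_new - u)) < 1e-10:
        break                          # step 4: policy has converged
    u = u_new

# compare with the closed-form optimal control u* = -(x^3 + x*sqrt(x^4 + 1))
u_star = -(x**3 + x * np.sqrt(x**4 + 1.0))
print("iterations:", k, " max |u - u*| =", np.max(np.abs(u - u_star)))
```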
Approximate Solution of HJB Equation
The GHJB equation is easier to solve than the HJB equation, but it is still difficult in general:
  L(x,u) + V_x f(x,u) = 0,  V(0) = 0.
We can apply a Galerkin approximation
  V(x) = Σ_{i=1}^N c_i φ_i(x),  φ_i(0) = 0,
where the c_i's are determined from
  ⟨ L(x,u) + Σ_{i=1}^N c_i (dφ_i(x)/dx) f(x,u), φ_j ⟩ = 0,  j = 1,...,N.
The residual error can be made small for N sufficiently large, and the approximate V can be made arbitrarily close to the true V.
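A minimal sketch of this projection for the scalar example ẋ = x³ + u with L = (1/2)(x² + u²) and a fixed stabilizing control. The even polynomial basis φ_i(x) = x^{2i}, the domain [−1, 1], the simple quadrature, and the choice u(x) = −x³ − x are all illustrative assumptions:

```python
import numpy as np

# Galerkin approximation of the GHJB equation for the scalar example
#   x' = x^3 + u,  L(x,u) = 0.5*(x^2 + u^2),  fixed control u(x) = -x^3 - x,
# with basis phi_i(x) = x^(2i), i = 1..N, on [-1, 1].

N = 4
xs = np.linspace(-1.0, 1.0, 2001)
dx = xs[1] - xs[0]
u  = -xs**3 - xs
f  = xs**3 + u                                    # closed-loop drift under u
L  = 0.5 * (xs**2 + u**2)

phi  = np.stack([xs**(2 * i)             for i in range(1, N + 1)])   # phi_i(x)
dphi = np.stack([2 * i * xs**(2 * i - 1) for i in range(1, N + 1)])   # phi_i'(x)

def inner(a, b):
    """<a, b> = integral of a(x) b(x) over [-1, 1] (simple quadrature)."""
    return np.sum(a * b) * dx

# <L + sum_i c_i phi_i' f, phi_j> = 0, j = 1..N   =>   A c = b, linear in the c_i's
A = np.array([[inner(dphi[i] * f, phi[j]) for i in range(N)] for j in range(N)])
b = np.array([-inner(L, phi[j]) for j in range(N)])
c = np.linalg.solve(A, b)

V_approx = c @ phi                                # approximate V(x) on the grid
print("coefficients c_i:", c)
```

For this particular u the GHJB solution is the polynomial V(x) = x²/2 + x⁴/4 + x⁶/12 (since V_x = L(x,u)/x here), so the computed coefficients should essentially reproduce it, up to quadrature error.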
Example
Consider a simple scalar example:
  ẋ = x³ + u,  J = (1/2) ∫_0^∞ (x² + u²) dt.
HJB:
  min_u { J*_x (x³ + u) + (1/2)(x² + u²) } = 0,  so u* = −J*_x.
Substituting back:
  (1/2)(J*_x)² − J*_x x³ − (1/2) x² = 0.
We can solve for J*_x explicitly (choosing the root so that J*(x) is positive definite; the result is given below). Alternatively, we can use the GHJB equation to iterate on u and V:
  V_x (x³ + u) + (1/2)(x² + u²) = 0.
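Solving the quadratic in J*_x and taking the positive-definite branch gives, as a worked step not spelled out on the slide:

```latex
J^*_x(x) = x^3 + x\sqrt{x^4+1}, \qquad
u^*(x) = -J^*_x(x) = -\big(x^3 + x\sqrt{x^4+1}\big), \qquad
\dot{x} = x^3 + u^* = -x\sqrt{x^4+1},
```

so the optimal closed loop is globally asymptotically stable.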
Control Lyapunov Function
From the HJB equation, V_x f(x,u*) = −L(x,u*) < 0. Suppose we can find a positive definite function V such that
  min_u ( V_x f(x,u) ) < 0 for x ≠ 0.
Such a V is called a control Lyapunov function (clf). The feedback control u generated this way is called the inverse optimal control (since there is an optimal control problem that corresponds to u). It is easy to find a clf when the system is feedback linearizable, but tough in general.
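A scalar illustration, using the example ẋ = x³ + u from the Scalar Examples slide (the particular control below is just one choice that makes the derivative negative, not the optimal one):

```latex
V(x) = \tfrac{1}{2}x^2, \qquad
V_x f(x,u) = x\,(x^3 + u), \qquad
u = -x^3 - x \;\Rightarrow\; V_x f = -x^2 < 0 \;\; (x \neq 0),
```

so V is a clf for this system.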