Optimization via the Hamilton-Jacobi-Bellman Method: Theory and Applications

Navin Khaneja
lecture notes taken by Christiane Koch
June 24, 2009

1 Variation yields a classical Hamiltonian system

Suppose that we have a system that is described by the following equations of motion,

    ẋ = f(x, u, t),   x ∈ R^n, u ∈ Ω ⊂ R^m,   (1)

where x denotes the state vector, u the controls, and t time. We can include t in the state vector with ṫ = 1. Eq. (1) is then reduced to

    ẋ = f(x, u),   x ∈ R^(n+1), u ∈ Ω ⊂ R^m.   (2)

The cost function for the optimization problem is typically written as a sum of two terms,

    J = φ(x_f) + ∫_0^T L(x, u) dt,   (3)

where φ(x_f) denotes the terminal cost, with x_f = x(T) the state vector at final time T. The second term in Eq. (3) corresponds to the running cost. Adding zero, J can be rewritten as

    J = φ(x_f) + ∫_0^T [ L(x, u) + λ^T ( f(x, u) − ẋ ) ] dt
      = φ(x_f) + ∫_0^T [ L(x, u) + λ^T f(x, u) + λ̇^T x ] dt − λ^T(T) x(T) + λ^T(0) x(0).

The second line is obtained by integrating −λ^T ẋ by parts. In order to search for the optimum, we vary J,

    δJ = (∂φ/∂x) δx_f
         + ∫_0^T [ (∂L/∂x) δx(t) + (∂L/∂u) δu(t) + λ^T (∂f/∂x) δx(t) + λ^T (∂f/∂u) δu(t) + λ̇^T δx(t) ] dt
         − λ^T(T) δx(T),   (4)
where we have assumed that the initial state is fixed, δx(0) = 0, and δx(T) = δx_f. Regrouping terms, we obtain

    δJ = [ ∂φ/∂x − λ^T(T) ] δx_f
         + ∫_0^T { ( ∂L/∂x + λ^T ∂f/∂x + λ̇^T ) δx(t) + ( ∂L/∂u + λ^T ∂f/∂u ) δu(t) } dt.   (5)

We are looking for an optimum of J, i.e. the variations should vanish. The variations with respect to x_f and x(t), respectively, determine the boundary condition and the equation of motion for the adjoint state, λ^T(t): The term inside the square brackets in Eq. (5) is zero if

    λ(T) = ( ∂φ/∂x )^T,

and the terms inside the integral in Eq. (5) that are multiplied with δx(t) vanish if

    λ̇ = − ( ∂L/∂x + λ^T ∂f/∂x )^T.

We are then left with

    δJ = ∫_0^T ( ∂L/∂u + λ^T ∂f/∂u ) δu(t) dt.

We can introduce a (Hamiltonian) function

    H(x, λ, u) = L(x, u) + λ^T f(x, u).   (6)

We then see that λ̇ = −( ∂H/∂x )^T, and the variation of J becomes

    δJ = ∫_0^T ( ∂H/∂u ) δu(t) dt.

A small variation of u can be written as δu = −ε ( ∂H/∂u )^T, which gives δJ = −ε ∫_0^T ‖∂H/∂u‖² dt; this is negative unless ∂H/∂u vanishes. If u is optimal, then at least ∂H/∂u = 0 such that δJ = 0. We then obtain the following set of 1st order necessary conditions,

    ẋ = f(x, u) = ( ∂H/∂λ )^T   with x(0) given,   (7)
    λ̇ = − ( ∂H/∂x )^T   with λ(T) = ( ∂φ/∂x )^T |_{x_f},   (8)
    ∂H/∂u = 0,   (9)

evaluated along (x(t), λ(t), u(t)).
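As a concrete illustration of conditions (7)-(9), here is a minimal numerical sketch (not part of the original notes; the scalar example and all names are assumptions): take ẋ = u with running cost L = u²/2, terminal cost φ = x_f²/2, and horizon T = 1. Then H = u²/2 + λu, so (9) gives u = −λ and (8) gives λ̇ = 0, i.e. λ is constant with boundary condition λ(T) = ∂φ/∂x = x(T). A simple shooting iteration on λ(0) solves the resulting two-point boundary value problem:

```python
# Shooting method for the first-order conditions (7)-(9) on the scalar
# example xdot = u, L = u^2/2, phi = x_f^2/2, T = 1 (a sketch; the example
# and all names here are assumptions, not from the notes).

def shoot(lam0, x0=1.0, T=1.0, n=1000):
    """Integrate xdot = u = -lam0 with Euler steps (lam stays constant)."""
    dt = T / n
    x = x0
    for _ in range(n):
        x += dt * (-lam0)      # stationarity (9): u = -lam
    return x                   # x(T)

def solve(x0=1.0, T=1.0):
    """Bisection on lam0 enforcing the boundary condition lam(T) = x(T)."""
    lo, hi = -10.0, 10.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        # residual of the boundary condition: x(T) - lam
        if shoot(mid, x0, T) - mid > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam = solve()
print(lam)   # analytic solution: lam = x0/(1+T) = 0.5
```

Since λ is constant here, x(T) = x_0 − Tλ, and enforcing λ = x(T) gives λ = x_0/(1+T), which the shooting iteration reproduces.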
This set of equations looks like a classical Hamiltonian system, where the state vector x takes the role of the coordinates and the adjoint state λ that of the conjugate momenta.

2 The maximum principle

Let us first state Pontryagin's maximum principle [1] before proving it: Along the optimal trajectory, we have

    u*(t) = arg min_{u ∈ Ω} H( x*(t), λ*(t), u ).   (10)

Note that in our context it is rather a minimum than a maximum principle since we seek to minimize the cost. The claim is that if u*(t) is an optimal control, then it minimizes H along the optimal trajectory globally at each instant of time. This is a very strong statement since we have only used first order variation. To prove this¹, let us first recast J such that it represents a final-time cost only. This can be done by expressing, in Eq. (3), L(x, u) in terms of a state variable. That is, if our state vector is x = (x_1, x_2, ..., x_n) ∈ R^n, we add x_{n+1} with

    ẋ_{n+1} = L(x, u).   (11)

We can then write a cost functional that depends on the final time only,

    J = φ(x_f) + x_{n+1}(T).   (12)

Now suppose that we know the optimal control u*(t), which determines the optimal trajectory x*(t). During a very short time interval, [τ − dτ, τ], let us change the control drastically from u* to v:

[Figure: on [τ − dτ, τ] the control is switched from u*(t) to some constant v ∈ Ω; the perturbed trajectory x(t) then deviates from x*(t) for t ≥ τ and ends at a shifted final point x_f.]

This leads to the following change in x,

    δx(τ) = [ f( x*(τ), v ) − f( x*(τ), u*(τ) ) ] dτ.   (13)

We thus obtain a change of initial condition at time t = τ for the evolution of x(t) at times t > τ. We can write

    d/dt ( x + δx ) = f( x + δx, u + δu ),

¹ Note that we will show the proof only for regular extrema. Pontryagin's proof covers also abnormal extrema [1] but is much more difficult to follow. Roughly speaking, abnormal extrema are those where the final state x_f can be reached only for a single T.
which leads to

    δẋ = (∂f/∂x)|_{(x*(t), u*(t))} δx + (∂f/∂u)|_{(x*(t), u*(t))} δu
       = A(t) δx(t) + B(t) δu(t).   (14)

This can be expressed in terms of a propagator (or Green's function), Φ,

    δx_f = δx(T) = Φ(T, τ) δx(τ).

Our claim is now that, since u*(t) is optimal, the perturbation cannot decrease the total cost,

    (∂φ/∂x) δx_f ≥ 0,

or, equivalently,

    (∂φ/∂x) Φ(T, τ) δx(τ) ≥ 0.   (15)

Using the boundary condition for λ(T), cf. Eq. (8), we can write for λ at time τ,

    λ(τ) = [ (∂φ/∂x) Φ(T, τ) ]^T = ( Φ(T, τ) )^T λ(T).   (16)

Inserting Eq. (16) and Eq. (13) into Eq. (15), we obtain

    λ^T(τ) [ f( x*(τ), v ) − f( x*(τ), u*(τ) ) ] dτ ≥ 0,

or, equivalently,

    λ^T(τ) f( x*(τ), u*(τ) ) ≤ λ^T(τ) f( x*(τ), v ).

Since Eq. (6) for a final-time only cost becomes

    H(x, λ, u) = λ^T f(x, u),   (17)

we obtain, for all v ∈ Ω,

    H(x*, λ, u*) ≤ H(x*, λ, v),   (18)

which proves that indeed u*(t) minimizes H globally along the optimal trajectory, i.e. Eq. (10). This necessary condition is stronger than just the first order variation being zero. In fact, it is so restrictive that often it allows one to determine the optimal solution. In conclusion, if u*(t) is the global optimum, then there is a canonical way to define H with the necessary condition, Eq. (18), and to generate the state x(t) and co-state λ(t). Eq. (17) implies

    ∂H/∂x = λ^T ∂f/∂x,

which, together with the definition of A(t), cf. Eq. (14), leads to

    λ̇ = − ( ∂H/∂x )^T = − A^T(t) λ.   (19)
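The relation between Eqs. (14), (16) and (19) can be checked numerically: if δx evolves under the variational equation δẋ = A(t) δx (no control perturbation after τ) while λ evolves under λ̇ = −A^T(t) λ, then λ^T(t) δx(t) is constant in time, which is precisely what the propagator relation (16) encodes. A minimal sketch (not part of the notes; the matrix A and all vectors are arbitrary assumptions):

```python
# Check that the adjoint equation (19), lamdot = -A(t)^T lam, keeps
# lam^T(t) dx(t) constant when dx obeys Eq. (14) with du = 0.
# (A sketch; A and the initial vectors are arbitrary assumptions.)

import numpy as np

def invariant_products(n=20000, T=1.0):
    dt = T / n
    A = np.array([[0.0, 1.0],
                  [-2.0, -0.3]])       # a fixed A(t) = A for the check
    dx = np.array([1.0, 0.5])          # perturbation dx at time 0
    lam = np.array([0.3, -1.2])        # costate at time 0
    vals = [lam @ dx]
    for _ in range(n):
        dx = dx + dt * (A @ dx)        # d(dx)/dt = A dx, cf. Eq. (14)
        lam = lam - dt * (A.T @ lam)   # lamdot = -A^T lam, cf. Eq. (19)
        vals.append(lam @ dx)
    return vals

vals = invariant_products()
drift = max(abs(v - vals[0]) for v in vals)
print(drift)   # small: only the Euler discretisation error remains
```

Analytically d/dt (λ^T δx) = (−A^T λ)^T δx + λ^T A δx = 0, so the only drift in the simulation is the O(dt) error of the Euler scheme.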
An example: Controlling the force on a particle [1]

The equation of motion is given by

    ẍ = u   with |u| ≤ 1,

which can be rewritten

    ẋ_1 = x_2,   ẋ_2 = u.

The goal is to drive the system from any initial point in phase space, (x_1(0), x_2(0)), to the origin in minimal time. We therefore have to formulate time as a cost, ṫ = ẋ_3 = 1. Evaluating Eq. (17), the equations of motion lead to the Hamiltonian function for this optimization problem,

    H = λ_1 x_2 + λ_2 u + λ_3.

The λ_j are found from the Hamiltonian equations, cf. Eq. (8),

    λ̇_1 = −∂H/∂x_1 = 0   ⟹   λ_1 = c = const.,
    λ̇_2 = −λ_1            ⟹   λ_2 = −c t + d,
    λ̇_3 = 0.

We can now determine the control from Eq. (9), or, more specifically, from the condition to minimize H. Taking the restriction |u| ≤ 1 into account, H takes its minimum value for

    u = −sgn(λ_2).

This corresponds to using the maximum allowed force, i.e. bang-bang control. Since λ_2 is a linear function of time, it changes sign at most once. We therefore obtain four possibilities for u:

[Figure: the four possible control profiles u(t): constant +1; constant −1; a single switch from +1 to −1; a single switch from −1 to +1.]

Evaluating the Hamiltonian equations, Eq. (7), we can also obtain the evolution of the state variables. For u = 1, ẋ_2 = u yields x_2(t) = t + c and

    x_1(t) = t²/2 + c t + d = x_2²/2 + d_1,

and for u = −1, we get correspondingly

    x_1(t) = −x_2²/2 + d_1.
That is, we obtain parabolas in phase space:

[Figure: phase portrait in the (x_1, x_2) plane showing the two switching parabolas A(t) and B(t) through the origin, which divide the plane into regions I and II.]

If the controls are not switched, our initial point must be on either curve A(t) or B(t), since these are the only parabolas that take us to the origin. If the initial point is (x_1(0), x_2(0)) ∈ I, then the control is switched from −1 to +1 when the parabola that contains (x_1(0), x_2(0)) and describes the flow (x_1(t), x_2(t)) crosses B(t). The phase space flow then continues on B(t) to the origin. If (x_1(0), x_2(0)) ∈ II, then the control is switched from +1 to −1 when the parabola that contains (x_1(0), x_2(0)) and describes (x_1(t), x_2(t)) crosses A(t). The phase space flow then continues on A(t) to the origin.

We have solved this example by constructing the Hamiltonian function. Alternatively, it could also be solved by considering the final-time cost, φ(x_f), which in the example above corresponds to

    φ(x_{1,f}, x_{2,f}, x_{3,f}) = x_{3,f} = T,

i.e. the total time to reach the target.²

3 Hamilton-Jacobi-Bellman principle

When we transform all running costs into final costs φ(x_f), the optimization problem consists in designing the trajectory from initial point x_0 to final point x_f that minimizes the final cost, φ(x_f). We seek to solve this problem globally for all initial points x_0 and introduce the optimal return function,

    V(x_0) = min_u φ(x_f),   (20)

that is, V(x) corresponds to the best or minimum cost starting from initial value x. Suppose that we have solved the optimization problem and know V(x). This defines the optimal trajectory from x_0 to x_f:

² Note that φ(x_f) doesn't include the target state because we assume the target state to be fixed and the zero variations δx_{1,f}, δx_{2,f} would leave λ_1 and λ_2 undefined.
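The bang-bang strategy of the particle example can be simulated directly. A minimal sketch (not part of the notes; writing the two-parabola rule as the closed-loop switching function s = x_1 + x_2|x_2|/2, and all names, are assumptions):

```python
# Bang-bang control of the double integrator x1' = x2, x2' = u, |u| <= 1.
# The two parabolas through the origin combine into the switching curve
# x1 = -x2*|x2|/2; on one side u = -1 is applied, on the other u = +1.
# (A sketch; the feedback form and names are assumptions, not from the notes.)

def simulate(x1, x2, dt=1e-4, tmax=20.0):
    """Drive (x1, x2) toward the origin with the bang-bang feedback law."""
    t = 0.0
    while x1 * x1 + x2 * x2 > 1e-4 and t < tmax:
        s = x1 + 0.5 * x2 * abs(x2)   # switching function: s = 0 on the curve
        u = -1.0 if s > 0 else 1.0    # maximum allowed force, one switch
        x1 += dt * x2                 # Euler step of x1' = x2, x2' = u
        x2 += dt * u
        t += dt
    return t, x1, x2

# Starting at rest at x1 = 1: full force toward the origin, then one switch
# onto the final parabola, braking to arrive with zero velocity.
t, x1f, x2f = simulate(1.0, 0.0)
print(t)   # close to the analytic minimum time 2*sqrt(x1(0)) = 2
```

For an initial point at rest the analytic minimum time is 2√(x_1(0)); the simulation stops inside a small ball around the origin, so it returns a time just under 2.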
[Figure: the optimal route from Santa Barbara (x_0) to Boston (x_f) passes through Chicago; δx(0) denotes a small deviation of the starting point from the optimal trajectory.]

Consider now the optimal return for small deviations from the optimal trajectory,

    V( x_0 + δx(0) ).

The claim is now that

    min_u V( x_0 + δx(0) ) = V(x_0),   (21)

where δx(0) = f(x_0, u) δt is the step generated by applying the control u for a short time δt. This is the principle of dynamic programming or Hamilton-Jacobi-Bellman (HJB) principle. It states that if we have found the optimal way to go from Santa Barbara to Boston, and Chicago is located on this way, then also the way from Chicago to Boston is optimal. We can rewrite Eq. (21) by expressing δx in terms of the equation of motion of x, Eq. (1), which leads to the Bellman PDE,

    min_u [ V(x) + (∂V/∂x) f(x, u) δt ] = V(x)   ⟹   min_u [ (∂V/∂x) f(x, u) ] = 0.   (22)

The optimal cost must satisfy Eq. (22). If we want to achieve the optimization in minimum time, we need to consider t as a state variable such that our equations of motion become

    ẋ = f(x, u, t),   ṫ = 1.   (23)

Eq. (22) is then rewritten

    ∂V/∂t + min_u [ (∂V/∂x) f(x, u) ] = 0,   (24)

which is the Hamilton-Jacobi-Bellman equation. Note that if there is no restriction on time, then V does not depend on t, i.e. the optimum is achieved at infinite time (these problems are called infinite horizon problems). We now show the connection between the HJB equation, Eq. (24), and the Pontryagin principle, Eq. (10), by considering what happens along the optimal trajectory. Evaluating, along the optimal trajectory, the partial derivative of V with respect to x, we find

    ( ∂V/∂x )|_{x*(t)} = λ^T(t).   (25)

Inserting this into Eq. (22), we see that

    min_u [ λ^T(t) f( x*(t), u ) ] = 0.
Since the term inside the square brackets is nothing but the Hamiltonian, cf. Eq. (17), along the optimal trajectory, x*(t), we see that the optimal control minimizes H at each time. That is, we recover Pontryagin's minimum principle. We can therefore determine the optimal control as the argument that minimizes the Hamiltonian,

    u*(x) = arg min_u [ H( x*(t), λ(t), u ) ],   (26)

or,

    u*(x) = arg min_u [ (∂V/∂x) f(x, u) ].   (27)

Eq. (27) implies

    (∂V/∂x) f( x, u*(x) ) = 0.   (28)

Eq. (26) implies that the Hamiltonian is zero along the optimal trajectory,

    H( x*(t), λ(t), u*(t) ) = 0

(indeed, by Eqs. (25) and (28), H = λ^T f = (∂V/∂x) f = 0 there), and that the optimal cost V is constant along the optimal trajectory,

    dV/dt |_{x*(t)} = 0,   since dV/dt = (∂V/∂x) ẋ = H.

We now show that Eq. (25) indeed satisfies the equation of motion for λ^T(t), Eq. (8). Let us denote partial derivatives by subscripts, ∂V/∂x = V_x. Taking the time derivative of Eq. (25), we obtain

    λ̇^T = V_xx( x*(t) ) f(x, u).

On the other hand, we can differentiate Eq. (28) with respect to x and find

    0 = ∂/∂x [ V_x f( x, u*(x) ) ] = V_xx f( x, u*(x) ) + V_x f_x + V_x f_u (∂u*/∂x).

For optimal u, V_x f_u = ∂H/∂u = 0, and we can express V_xx f in terms of −V_x f_x and obtain

    λ̇^T = −V_x f_x.

With Eq. (25), we find

    λ̇(t) = −f_x^T λ.
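As a concrete illustration of the HJB equation (24) and the identity (25) (a minimal sketch, not part of the notes; the one-dimensional example and all names are assumptions): for ẋ = u with |u| ≤ 1 and minimum time to the origin, the optimal return is V(x) = |x|, since min_u [1 + V_x u] = 1 − |V_x| = 0, and the costate of Eq. (25) is λ = V_x = sgn(x), constant along each optimal trajectory. Value iteration on a grid recovers this V:

```python
# Value iteration V <- min_u [dt + V(x + dt*u)] for minimum time to the
# origin with xdot = u, |u| <= 1; the solution of the HJB equation (24)
# is V(x) = |x|. (A sketch; the example and names are assumptions.)

import numpy as np

n, dt = 201, 0.01
xs = np.linspace(-1.0, 1.0, n)   # grid step 0.01 = dt, so x +/- dt*u lands on a grid point
V = np.zeros(n)
target = n // 2                  # index of the target x = 0

for _ in range(500):
    # Bellman backup over the two extreme controls u = +1 and u = -1
    Vp = np.empty(n)
    Vm = np.empty(n)
    Vp[:-1] = V[1:]
    Vp[-1] = V[-1] + dt          # crude boundary: leaving the grid is penalised
    Vm[1:] = V[:-1]
    Vm[0] = V[0] + dt
    V = dt + np.minimum(Vp, Vm)
    V[target] = 0.0              # boundary condition V = 0 at the target

print(np.allclose(V, np.abs(xs)))   # prints True: V has converged to |x|
```

The converged grid values satisfy the discrete Bellman backup exactly, and their finite difference V_x = ±1 reproduces the costate λ = sgn(x) of Eq. (25).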
From the construction of the Hamiltonian, Eq. (17), it follows that

    ∂H/∂x = λ^T f_x,

and we indeed recover Eq. (8), λ̇ = −( ∂H/∂x )^T. The connection to Pontryagin's maximum principle is easily made by considering the optimal return at the final point,

    V(x_f) = φ(x_f).

This is inserted into Eq. (25) for t = T and we obtain

    λ^T(T) = (∂φ/∂x)( x_f ).

References

[1] L. S. Pontryagin. The Mathematical Theory of Optimal Processes. Gordon and Breach Science Publishers, 1986.