Optimization via the Hamilton-Jacobi-Bellman Method: Theory and Applications


Navin Khaneja
lecture notes taken by Christiane Koch
June 24, 2009

1 Variation yields a classical Hamiltonian system

Suppose that we have a system that is described by the following equations of motion,

    ẋ = f(x, u, t),   x ∈ Rⁿ, u ∈ Ω ⊂ Rᵐ,   (1)

where x denotes the state vector, u the controls, and t time. We can include t in the state vector with ṫ = 1. Eq. (1) is then reduced to

    ẋ = f(x, u),   x ∈ R^(n+1), u ∈ Ω ⊂ Rᵐ.   (2)

The cost function for the optimization problem is typically written as a sum of two terms,

    J = φ(x_f) + ∫₀ᵀ L(x, u) dt,   (3)

where φ(x_f) denotes the terminal cost, with x_f = x(T) the state vector at final time T. The second term in Eq. (3) corresponds to the running cost. Adding zero, J can be rewritten

    J = φ(x_f) + ∫₀ᵀ [ L(x, u) + λᵀ( f(x, u) − ẋ ) ] dt
      = φ(x_f) + ∫₀ᵀ { L(x, u) + λᵀ f(x, u) + λ̇ᵀ x } dt − λᵀ(T) x(T) + λᵀ(0) x(0).

The second line is obtained by integrating −λᵀ ẋ by parts. In order to search for the optimum, we vary J,

    δJ = (∂φ/∂x) δx_f + ∫₀ᵀ { (∂L/∂x) δx(t) + (∂L/∂u) δu(t) + λᵀ (∂f/∂x) δx(t) + λᵀ (∂f/∂u) δu(t) + λ̇ᵀ δx(t) } dt − λᵀ(T) δx(T),   (4)

where we have assumed that the initial state is fixed, δx(0) = 0, and δx(T) = δx_f. Regrouping terms, we obtain

    δJ = [ ∂φ/∂x |_(x_f) − λᵀ(T) ] δx_f + ∫₀ᵀ { ( ∂L/∂x + λᵀ ∂f/∂x ) δx(t) + ( ∂L/∂u + λᵀ ∂f/∂u ) δu(t) + λ̇ᵀ δx(t) } dt.   (5)

We are looking for an optimum of J, i.e. the variations should vanish. The variations with respect to x(t) and x_f, respectively, determine the equation of motion for the adjoint state, λᵀ(t), and its terminal condition: the term inside the square brackets in Eq. (5) is zero if

    λ(T) = ( ∂φ/∂x )ᵀ |_(x_f),

and the terms inside the integral in Eq. (5) that are multiplied with δx(t) vanish if

    λ̇ᵀ = − ( ∂L/∂x + λᵀ ∂f/∂x ).

We are then left with

    δJ = ∫₀ᵀ ( ∂L/∂u + λᵀ ∂f/∂u ) δu(t) dt.

We can introduce a (Hamiltonian) function

    H(x, λ, u) = L(x, u) + λᵀ f(x, u).   (6)

We then see that λ̇ᵀ = −∂H/∂x, and the variation of J becomes

    δJ = ∫₀ᵀ (∂H/∂u) δu(t) dt.

A small variation of u can be written as δu = ε v with ε small. If u is optimal, then at least ∂H/∂u = 0, such that δJ = 0. We then obtain the following set of 1st order necessary conditions,

    ẋ = f(x, u) = ∂H/∂λ,  with x(0) given,   (7)
    λ̇ = −∂H/∂x,  with λ(T) = ( ∂φ/∂x )ᵀ |_(x_f),   (8)
    ∂H/∂u |_(x(t), λ(t), u(t)) = 0.   (9)
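To see conditions (7)-(9) in action, here is a minimal numerical sketch (not from the original notes). It assumes a toy scalar problem, ẋ = u with running cost L = ½(x² + u²) and φ = 0, so that H = ½(x² + u²) + λu, Eq. (8) gives λ̇ = −x with λ(T) = 0, and Eq. (9) requires u + λ = 0. A forward-backward sweep integrates Eq. (7) forward and Eq. (8) backward, then takes a gradient step on u toward Eq. (9):

    import numpy as np

    # Minimal sketch of conditions (7)-(9) for an assumed toy problem:
    # xdot = u (so f = u), L = 0.5*(x**2 + u**2), phi = 0.  Then
    # H = 0.5*(x**2 + u**2) + lam*u, Eq. (8) gives lamdot = -x with
    # lam(T) = 0, and Eq. (9) requires dH/du = u + lam = 0.
    T, N = 1.0, 1000
    dt = T / N
    x0 = 1.0
    u = np.zeros(N)                          # initial guess for the control

    for it in range(500):
        # forward pass, Eq. (7): integrate xdot = u from x(0) = x0
        x = np.empty(N + 1)
        x[0] = x0
        for k in range(N):
            x[k + 1] = x[k] + dt * u[k]
        # backward pass, Eq. (8): integrate lamdot = -x from lam(T) = 0
        lam = np.empty(N + 1)
        lam[N] = 0.0
        for k in range(N - 1, -1, -1):
            lam[k] = lam[k + 1] + dt * x[k + 1]
        # gradient step toward Eq. (9): dH/du = u + lam = 0
        grad = u + lam[:N]
        u -= 0.5 * grad
        if np.max(np.abs(grad)) < 1e-10:
            break

    print("max |dH/du| after convergence:", np.max(np.abs(u + lam[:N])))

For this short horizon the map u ← u − 0.5(u + λ) is a contraction, so the sweep converges to the stationary control u = −λ; the printed residual checks Eq. (9) directly.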

This set of equations looks like a classical Hamiltonian system, where the state vector x takes the role of the coordinates and the adjoint state λ that of the conjugate momenta.

2 The maximum principle

Let's first state Pontryagin's maximum principle [1] before proving it: Along the optimal trajectory, we have

    u*(t) = arg min_(u∈Ω) H( x*(t), λ*(t), u ).   (10)

Note that in our context it is rather a minimum than a maximum principle, since we seek to minimize the cost. The claim is that if u*(t) is an optimal control, then it minimizes H along the optimal trajectory globally at each instant of time. This is a very strong statement, since we have only used first order variation. To prove this¹, let's first recast J such that it represents a final-time cost only. This can be done by defining, in Eq. (3), L(x, u) in terms of a state variable. That is, if our state vector is x = (x₁, x₂, ..., xₙ) ∈ Rⁿ, we add x_(n+1) with

    ẋ_(n+1) = L(x, u).   (11)

We can then write a cost functional that depends on the final time only,

    J = φ(x_f) + x_(n+1)(T).   (12)

Now suppose that we know the optimal control u*(t), which determines the optimal x*(t). During a very short time interval, [τ − dτ, τ], let's change the control drastically from u* to v:

[Figure: the control u*(t) is replaced by v ∈ Ω on the short interval [τ − dτ, τ]; the perturbed trajectory x(t) deviates from x*(t) by δx(τ) and then evolves on to x_f.]

This leads to the following change in x,

    δx(τ) = [ f( x*(τ), v ) − f( x*(τ), u*(τ) ) ] dτ.   (13)

We thus obtain a change of initial condition at time t = τ for the evolution of x(t) at times t > τ. We can write

    d/dt ( x + δx ) = f( x + δx, u + δu ),

¹ Note that we will show the proof only for regular extrema. Pontryagin's proof covers also abnormal extrema [1] but is much more difficult to follow. Roughly speaking, abnormal extrema are those where the final state x_f can be reached only for a single T.

which leads to

    δẋ = ∂f/∂x |_(x*(t), u*(t)) δx + ∂f/∂u |_(x*(t), u*(t)) δu = A(t) δx(t) + B(t) δu(t).   (14)

This can be expressed in terms of a propagator (or Green's function), Φ,

    δx_f = δx(T) = Φ(T, τ) δx(τ).

Our claim is now that, since x*(t) is optimal, the perturbation cannot decrease the final cost,

    (∂φ/∂x) δx_f ≥ 0,  or, equivalently,  (∂φ/∂x) Φ(T, τ) δx(τ) ≥ 0.   (15)

Using the terminal condition for λ(T), cf. Eq. (8), we can write for λ at time τ,

    λᵀ(τ) = (∂φ/∂x) Φ(T, τ),  i.e.  λ(τ) = ( Φ(T, τ) )ᵀ λ(T).   (16)

Inserting Eq. (16) and Eq. (13) into Eq. (15), we obtain

    λᵀ(τ) [ f( x*(τ), v ) − f( x*(τ), u*(τ) ) ] dτ ≥ 0,

or, equivalently,

    λᵀ(τ) f( x*(τ), u*(τ) ) ≤ λᵀ(τ) f( x*(τ), v ).

Since Eq. (6) for a final-time-only cost becomes

    H(x, λ, u) = λᵀ f(x, u),   (17)

we obtain, for all v ∈ Ω,

    H(x*, λ, u*) ≤ H(x*, λ, v),   (18)

which proves that indeed u*(t) minimizes H globally along the optimal trajectory, i.e. Eq. (10). This necessary condition is stronger than just the first order variation being zero. In fact, it is so restrictive that it often allows one to determine the optimal solution. In conclusion, if u*(t) is the global optimum, then there is a canonical way to define H with the necessary condition, Eq. (18), and generate the state x(t) and co-state λ(t). Eq. (17) implies

    ∂H/∂x = λᵀ ∂f/∂x,

which, together with the definition of A(t), cf. Eq. (14), leads to

    λ̇ = −(∂H/∂x)ᵀ = −Aᵀ(t) λ.   (19)
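Eq. (18) can be checked numerically along the converged trajectory of the sketch in Section 1 (same toy assumptions, reusing the arrays x, lam, u computed there). Here H is taken in the form of Eq. (6), with the running cost included, which agrees with Eq. (17) once the running cost has been absorbed into the state as in Eq. (11):

    # Check Eq. (18) pointwise: H(x*, lam*, u*) <= H(x*, lam*, v)
    # for sampled alternative controls v.
    def H(x_k, lam_k, v):
        return 0.5 * (x_k**2 + v**2) + lam_k * v

    rng = np.random.default_rng(0)
    holds = True
    for k in range(0, N, 50):                       # sample times along [0, T]
        h_star = H(x[k], lam[k], u[k])
        for v in rng.uniform(-5.0, 5.0, size=20):   # sample alternative controls
            holds &= h_star <= H(x[k], lam[k], v) + 1e-6
    print("H(x*, lam*, u*) <= H(x*, lam*, v) at all samples:", holds)

Here the inequality is easy to see analytically as well: for fixed x and λ, H is quadratic in v with minimizer v = −λ, which is exactly the converged control.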

An example: Controlling the force on a particle [1]

The equation of motion is given by

    ẍ = u,  with |u| ≤ 1,

which can be rewritten

    ẋ₁ = x₂,  ẋ₂ = u.

The goal is to drive the system from any initial point in phase space, (x₁(0), x₂(0)), to the origin in minimal time. We therefore have to formulate time as a cost, ṫ = ẋ₃ = 1. Evaluating Eq. (17), the equations of motion lead to the Hamiltonian function for this optimization problem,

    H = λ₁ x₂ + λ₂ u + λ₃.

The λⱼ are found from the Hamiltonian equations, cf. Eq. (8),

    λ̇₁ = −∂H/∂x₁ = 0  ⟹  λ₁ = c = const,
    λ̇₂ = −λ₁  ⟹  λ₂ = −c t + d,
    λ̇₃ = 0.

We can now determine the control u from Eq. (9), or, more specifically, from the condition to minimize H. Taking the restriction |u| ≤ 1 into account, H takes its minimum value for

    u = −sgn(λ₂).

This corresponds to using the maximum allowed force, i.e. bang-bang control. Since λ₂ is a linear function of t, it changes sign at most once. We therefore obtain four possibilities for u:

[Figure: the four possible control sequences u(t): u ≡ +1, u ≡ −1, a single switch from +1 to −1, and a single switch from −1 to +1.]

Evaluating the Hamiltonian equations, Eq. (7), we can also obtain the evolution of the state variables. For u = 1, ẋ₂ = u yields x₂(t) = t + c₁ and

    x₁(t) = t²/2 + c₁ t + c₂ = x₂²/2 + d₁,

and for u = −1 we get, correspondingly,

    x₁(t) = −x₂²/2 + d₁.

That is, we obtain parabolas in phase space:

[Figure: phase plane (x₁, x₂). Curve A(t) is the u = −1 parabola through the origin, curve B(t) the u = +1 parabola through the origin; together they form the switching curve separating regions I and II.]

If the controls are not switched, our initial point must be on either curve A(t) or B(t), since these are the only parabolas that take us to the origin. If the initial point (x₁(0), x₂(0)) lies in region I, then the control is switched from −1 to +1 when the parabola that contains (x₁(0), x₂(0)) and describes the flow (x₁(t), x₂(t)) crosses B(t). The phase space flow then continues on B(t) to the origin. If (x₁(0), x₂(0)) lies in region II, then the control is switched from +1 to −1 when the parabola that contains (x₁(0), x₂(0)) and describes (x₁(t), x₂(t)) crosses A(t). The phase space flow then continues on A(t) to the origin.

We have solved this example by constructing the Hamiltonian function. Alternatively, it could also be solved by considering the final-time cost, φ(x_f), which in the example above corresponds to

    φ( x_(1,f), x_(2,f), x_(3,f) ) = x_(3,f) = T,

i.e. the total time to reach the target.²

² Note that φ(x_f) doesn't include the target state, because we assume the target state to be fixed, and the zero variations δx_(1,f) = δx_(2,f) = 0 would leave λ₁ and λ₂ undefined.
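The switching logic above can be condensed into the feedback u = −sgn(x₁ + x₂|x₂|/2), whose zero set is exactly the pair of curves A(t) and B(t). The following crude Euler simulation (an illustrative sketch, not part of the notes; step size, tolerances and the initial point are arbitrary choices) drives an initial point to the origin and compares the elapsed time against the standard minimum-time expression t* = x₂(0) + 2√(x₁(0) + x₂(0)²/2), which holds for initial points where the first control is −1; this value is quoted here as a check, without derivation:

    import numpy as np

    # Crude Euler simulation of the bang-bang law derived above.
    # The switching curves A(t) and B(t) combine into the feedback
    # u = -sgn(s) with s = x1 + x2*|x2|/2; s = 0 is the curve through
    # the origin on which the final approach takes place.
    def u_bang(x1, x2):
        s = x1 + 0.5 * x2 * abs(x2)
        if abs(s) > 1e-9:
            return -np.sign(s)
        return -np.sign(x2)          # already on the switching curve

    dt = 1e-4
    x1, x2 = 2.0, 1.0                # assumed initial point
    t = 0.0
    while x1**2 + x2**2 > 1e-6 and t < 20.0:
        u = u_bang(x1, x2)
        x1 += dt * x2                # xdot1 = x2
        x2 += dt * u                 # xdot2 = u
        t += dt

    # known minimum time when the first control is -1:
    # t* = x2(0) + 2*sqrt(x1(0) + x2(0)**2/2)
    print("simulated t ≈", round(t, 3),
          " analytic t* ≈", round(1.0 + 2 * np.sqrt(2.0 + 0.5), 3))

For the chosen point the trajectory first decelerates with u = −1, meets the switching curve, and then rides it to the origin; the simulated time agrees with t* = 1 + 2√2.5 ≈ 4.16 up to the Euler discretization error.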

3 Hamilton-Jacobi-Bellman principle

When we transform all running costs into final costs φ(x_f), the optimization problem consists in designing the trajectory from initial point x₀ to final point x_f that minimizes the final cost, φ(x_f). We seek to solve this problem globally for all initial points x₀ and introduce the optimal return function,

    V(x₀) = min_u φ(x_f),   (20)

that is, V(x) corresponds to the best, or minimum, cost starting from initial value x. Suppose that we have solved the optimization problem and know V(x). This defines the optimal trajectory from x₀ to x_f:

[Figure: an optimal route from Santa Barbara (x₀) through Chicago to Boston (x_f), with a small deviation δx(0) at the start.]

Consider now the optimal return for small deviations from the optimal trajectory, V( x₀ + δx(0) ). The claim is now that

    min_u V( x₀ + δx(0) ) = V(x₀),   (21)

where δx(0) = f(x₀, u) δt is the deviation generated by the control u applied during the first instant δt. This is the principle of dynamic programming, or Hamilton-Jacobi-Bellman (HJB) principle. It states that if we have found the optimal way to go from Santa Barbara to Boston, and Chicago is located on this way, then also the way from Chicago to Boston is optimal. We can rewrite Eq. (21) by expressing δx in terms of the equation of motion of x, Eq. (1), which leads to the Bellman PDE,

    min_u [ V(x) + (∂V/∂x) f(x, u) δt ] = V(x)  ⟹  min_u [ (∂V/∂x) f(x, u) ] = 0.   (22)

The optimal cost must satisfy Eq. (22). If we want to achieve the optimization in minimum time, we need to consider t as a state variable, such that our equations of motion become

    ẋ = f(x, u, t),  ṫ = 1.   (23)

Eq. (22) is then rewritten

    min_u [ ∂V/∂t + (∂V/∂x) f(x, u) ] = 0,   (24)

which is the Hamilton-Jacobi-Bellman equation. Note that if there is no restriction on time, then V does not depend on t, i.e. the optimum is achieved at infinite time (these problems are called infinite horizon problems).
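Eq. (21) is also directly usable as a computational recipe: discretize the state space and iterate the minimization until V reaches a fixed point, which then satisfies the discrete analogue of Eq. (24). Below is a coarse value-iteration sketch (illustrative only; the grid size, time step, and test point are arbitrary choices) for the minimum-time double integrator of Section 2, compared against the analytic minimum time from the switching-curve construction. On such a coarse grid only rough agreement should be expected:

    import numpy as np

    # Coarse value-iteration sketch of the dynamic-programming principle,
    # Eq. (21), for the minimum-time double integrator:
    #   V(x) <- min over u in {-1, +1} of [ dt + V(x + f(x, u) dt) ],
    # with V(0) = 0.  States whose optimal path leaves the grid pile up
    # cost at the clipped boundary; values there are not meaningful.
    n, lim, dt = 81, 3.0, 0.05
    xs = np.linspace(-lim, lim, n)            # n odd, so 0 is a grid point
    X1, X2 = np.meshgrid(xs, xs, indexing="ij")
    V = np.zeros((n, n))                      # initial guess for time-to-go

    def interp(V, x1, x2):
        # bilinear interpolation of V at (x1, x2), clipped to the grid
        i = np.clip((x1 + lim) / (2 * lim) * (n - 1), 0, n - 1.001)
        j = np.clip((x2 + lim) / (2 * lim) * (n - 1), 0, n - 1.001)
        i0 = np.floor(i).astype(int)
        j0 = np.floor(j).astype(int)
        a, b = i - i0, j - j0
        return ((1 - a) * (1 - b) * V[i0, j0] + a * (1 - b) * V[i0 + 1, j0]
                + (1 - a) * b * V[i0, j0 + 1] + a * b * V[i0 + 1, j0 + 1])

    for sweep in range(600):
        V = np.minimum(dt + interp(V, X1 + dt * X2, X2 - dt),    # u = -1
                       dt + interp(V, X1 + dt * X2, X2 + dt))    # u = +1
        V[n // 2, n // 2] = 0.0                                  # target: origin

    # compare with the analytic minimum time from the switching curves
    x1, x2 = 2.0, 1.0
    print("grid V ≈", float(interp(V, x1, x2)),
          " analytic t* ≈", x2 + 2 * np.sqrt(x1 + x2**2 / 2))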

We now show the connection between the HJB equation, Eq. (24), and the Pontryagin principle, Eq. (10), by considering what happens along the optimal trajectory. Evaluating the partial derivative of V with respect to x along the optimal trajectory, we find

    ∂V/∂x |_(x*(t)) = λᵀ(t).   (25)

Inserting this into Eq. (22), we see that

    min_u [ λᵀ(t) f( x*(t), u ) ] = 0.

Since the term inside the square brackets is nothing but the Hamiltonian, cf. Eq. (17), along the optimal trajectory x*(t), we see that the optimal control minimizes H at each time. That is, we recover Pontryagin's minimum principle. We can therefore determine the optimal control as the argument that minimizes the Hamiltonian,

    u*(x) = arg min_u [ H( x*(t), λ(t), u ) ],   (26)

or,

    u*(x) = arg min_u [ (∂V/∂x) f(x, u) ].   (27)

Eq. (27) implies

    (∂V/∂x) f( x, u*(x) ) = 0.   (28)

Eq. (26) implies that the Hamiltonian is zero along the optimal trajectory (why? Because, by Eqs. (25) and (28), the minimized quantity λᵀf = (∂V/∂x) f vanishes there),

    H( x*(t), λ(t), u*(t) ) = 0,

and that the optimal cost V is constant along the optimal trajectory,

    dV/dt |_(x*(t)) = 0,  since dV/dt = (∂V/∂x) ẋ = (∂V/∂x) f = H.

We now show that Eq. (25) indeed satisfies the equation of motion for λᵀ(t), Eq. (8). Let's denote partial derivatives by subscripts, ∂V/∂x = V_x. Taking the time derivative of Eq. (25), we obtain

    λ̇ᵀ = V_xx( x*(t) ) f(x, u).

On the other hand, we can differentiate Eq. (28) with respect to x and find

    0 = ∂/∂x [ V_x f( x, u*(x) ) ] = V_xx f( x, u*(x) ) + V_x f_x + V_x f_u (∂u*/∂x).

For optimal u, V_x f_u = 0, and we can express V_xx f in terms of V_x f_x to obtain

    λ̇ᵀ = −V_x f_x.

With Eq. (25), we find

    λ̇(t) = −f_xᵀ λ.

From the construction of the Hamiltonian, Eq. (17), it follows that

    ∂H/∂x = λᵀ f_x,

and we indeed recover Eq. (8). The connection to Pontryagin's maximum principle is easily made by considering the optimal return at the final point, V(x_f) = φ(x_f). This is inserted into Eq. (25) for t = T, and we obtain

    λᵀ(T) = ∂φ/∂x ( x_f ).

References

[1] L. S. Pontryagin. The Mathematical Theory of Optimal Processes. Gordon and Breach Science Publishers, 1986.