Hamilton-Jacobi-Bellman Equation Feb 25, 2008

What is it? The Hamilton-Jacobi-Bellman (HJB) equation is the continuous-time analog of the discrete-time deterministic dynamic programming algorithm.

Discrete vs. Continuous

Discrete time:
$$x_{k+1} = f_k(x_k, u_k), \qquad k = 0, \dots, N-1$$
Cost: $g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, u_k)$
DP algorithm:
$$J_N(x_N) = g_N(x_N), \qquad J_k(x_k) = \min_{u_k \in U_k} \{\, g_k(x_k, u_k) + J_{k+1}(f_k(x_k, u_k)) \,\}$$

Continuous time:
$$\dot{x}(t) = f(x(t), u(t)), \qquad 0 \le t \le T$$
Cost: $h(x(T)) + \int_0^T g(x(t), u(t))\, dt$
HJB equation:
$$0 = \min_{u \in U} \{\, g(x, u) + \partial_t V(t, x) + \nabla_x V(t, x)'\, f(x, u) \,\}, \qquad V(T, x) = h(x)$$
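To make the discrete-time recursion concrete, here is a minimal tabular sketch in Python. It is not from the slides: the grids, the nearest-grid-point interpolation, and the example system are illustrative assumptions.

```python
# Minimal tabular backward DP sketch for J_k(x) = min_u { g(x,u) + J_{k+1}(f(x,u)) }.
# The state/control grids and nearest-grid-point interpolation are illustrative.
def dp_backward(f, g, g_N, xs, us, N):
    nearest = lambda xn: min(xs, key=lambda y: abs(y - xn))   # snap f(x,u) to the grid
    J = {x: g_N(x) for x in xs}                               # J_N(x_N) = g_N(x_N)
    for _ in range(N):
        J = {x: min(g(x, u) + J[nearest(f(x, u))] for u in us) for x in xs}
    return J                                                  # J_0

# Example: x_{k+1} = x_k + u_k, stage cost x^2 + u^2, terminal cost x^2.
xs = [i / 10 for i in range(-20, 21)]
us = [-0.2, -0.1, 0.0, 0.1, 0.2]
J0 = dp_backward(lambda x, u: x + u, lambda x, u: x**2 + u**2,
                 lambda x: x**2, xs, us, N=10)
print(J0[1.0])   # approximate optimal cost-to-go from x_0 = 1.0
```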

HJB Equation
- An extension of the Hamilton-Jacobi equation from classical mechanics
- Its solution is the optimal cost-to-go function
- Applications: path planning, medical, financial

Derivation
Start with the continuous time interval $t \in [0, T]$. Discretize it into $N$ pieces with step $\delta = T/N$, and denote
$$x_k = x(k\delta), \quad u_k = u(k\delta), \qquad k = 0, \dots, N$$

Derivation
Remember the continuous-time dynamics and cost:
$$\dot{x}(t) = f(x(t), u(t)), \qquad h(x(T)) + \int_0^T g(x(t), u(t))\, dt$$
Approximate them by
$$x_{k+1} = x_k + f(x_k, u_k)\,\delta, \qquad h(x_N) + \sum_{k=0}^{N-1} g(x_k, u_k)\,\delta$$
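As a sketch of this Euler discretization (the function names and example values below are illustrative, not from the slides):

```python
# Euler discretization of dx/dt = f(x,u) with cost h(x(T)) + integral of g:
# x_{k+1} = x_k + f(x_k, u_k)*delta, total cost ~ h(x_N) + sum_k g(x_k, u_k)*delta.
def discretized_cost(f, g, h, x0, u_seq, T):
    delta = T / len(u_seq)
    x, total = x0, 0.0
    for u in u_seq:
        total += g(x, u) * delta    # accumulate running cost
        x = x + f(x, u) * delta     # Euler step
    return total + h(x)             # add terminal cost

# Example: dx/dt = u, no running cost, terminal cost 0.5*x^2, constant u = -0.5.
print(discretized_cost(lambda x, u: u, lambda x, u: 0.0,
                       lambda x: 0.5 * x**2, x0=1.0, u_seq=[-0.5] * 100, T=2.0))
# prints ~0.0: the control drives x from 1 to 0 over the horizon
```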

Derivation
$J^*(t, x)$: optimal cost-to-go function for the continuous-time problem
$\tilde{J}^*(t, x)$: optimal cost-to-go function for the discrete-time approximation

Derivation
From discrete-time DP:
$$J_N(x_N) = g_N(x_N), \qquad J_k(x_k) = \min_{u_k \in U_k} \{\, g_k(x_k, u_k) + J_{k+1}(f_k(x_k, u_k)) \,\}$$
For the discrete-time approximation:
$$\tilde{J}^*(N\delta, x) = h(x)$$
$$\tilde{J}^*(k\delta, x) = \min_{u \in U} \{\, g(x, u)\,\delta + \tilde{J}^*\big((k+1)\delta,\; x + f(x, u)\,\delta\big) \,\}$$

Derivation
Reminder: the Taylor series expansion of $f(x, y)$ is
$$f(x + \Delta x,\, y + \Delta y) = \sum_{i=0}^{\infty} \frac{1}{i!} \left[ \Delta x \frac{\partial}{\partial x} + \Delta y \frac{\partial}{\partial y} \right]^i f(x, y)$$
Assume that the Taylor series expansion of $\tilde{J}^*$ exists. Ignoring all higher-order terms,
$$\tilde{J}^*\big((k+1)\delta,\; x + f(x, u)\,\delta\big) = \tilde{J}^*(k\delta, x) + \partial_t \tilde{J}^*(k\delta, x)\,\delta + \nabla_x \tilde{J}^*(k\delta, x)'\, f(x, u)\,\delta + o(\delta)$$

Derivation
Combine
$$\tilde{J}^*(k\delta, x) = \min_{u \in U} \{\, g(x, u)\,\delta + \tilde{J}^*\big((k+1)\delta,\; x + f(x, u)\,\delta\big) \,\}$$
with the expansion above. We get
$$\tilde{J}^*(k\delta, x) = \min_{u \in U} \{\, g(x, u)\,\delta + \tilde{J}^*(k\delta, x) + \partial_t \tilde{J}^*(k\delta, x)\,\delta + \nabla_x \tilde{J}^*(k\delta, x)'\, f(x, u)\,\delta + o(\delta) \,\}$$

Derivation
$$\tilde{J}^*(k\delta, x) = \min_{u \in U} \{\, g(x, u)\,\delta + \tilde{J}^*(k\delta, x) + \partial_t \tilde{J}^*(k\delta, x)\,\delta + \nabla_x \tilde{J}^*(k\delta, x)'\, f(x, u)\,\delta + o(\delta) \,\}$$
Subtract $\tilde{J}^*(k\delta, x)$ from both sides (it does not depend on $u$):
$$0 = \min_{u \in U} \{\, g(x, u)\,\delta + \partial_t \tilde{J}^*(k\delta, x)\,\delta + \nabla_x \tilde{J}^*(k\delta, x)'\, f(x, u)\,\delta + o(\delta) \,\}$$
Divide through by $\delta$:
$$0 = \min_{u \in U} \{\, g(x, u) + \partial_t \tilde{J}^*(k\delta, x) + \nabla_x \tilde{J}^*(k\delta, x)'\, f(x, u) + o(\delta)/\delta \,\}$$

Derivation
Take the limit $\delta \to 0$, $k \to \infty$ with $k\delta = t$, and assume that
$$\lim_{\delta \to 0,\; k\delta = t} \tilde{J}^*(k\delta, x) = J^*(t, x)$$
Then
$$0 = \min_{u \in U} \{\, g(x, u) + \partial_t J^*(t, x) + \nabla_x J^*(t, x)'\, f(x, u) \,\}, \qquad J^*(T, x) = h(x)$$

Claim
If $V(t, x)$ is a solution to the HJB equation, then $V(t, x)$ equals the optimal cost-to-go function for all $t$ and $x$.

Proof
From
$$0 = \min_{u \in U} \{\, g(x, u) + \partial_t V(t, x) + \nabla_x V(t, x)'\, f(x, u) \,\}$$
we have, for any admissible control trajectory $\{\hat{u}(t) \mid t \in [0, T]\}$ and corresponding state trajectory $\{\hat{x}(t) \mid t \in [0, T]\}$,
$$0 \le g(\hat{x}(t), \hat{u}(t)) + \partial_t V(t, \hat{x}(t)) + \nabla_x V(t, \hat{x}(t))'\, f(\hat{x}(t), \hat{u}(t))$$

Proof
Substitute $\dot{\hat{x}}(t) = f(\hat{x}(t), \hat{u}(t))$:
$$0 \le g(\hat{x}(t), \hat{u}(t)) + \partial_t V(t, \hat{x}(t)) + \nabla_x V(t, \hat{x}(t))'\, \dot{\hat{x}}(t) = g(\hat{x}(t), \hat{u}(t)) + \frac{d V(t, \hat{x}(t))}{dt}$$
Integrate over $[0, T]$:
$$0 \le \int_0^T g(\hat{x}(t), \hat{u}(t))\, dt + \int_0^T \frac{d V(t, \hat{x}(t))}{dt}\, dt$$

Proof
Evaluate the second integral:
$$0 \le \int_0^T g(\hat{x}(t), \hat{u}(t))\, dt + V(T, \hat{x}(T)) - V(0, \hat{x}(0))$$
Using the boundary condition $V(T, x) = h(x)$: for any state and control trajectory,
$$V(0, x(0)) \le h(\hat{x}(T)) + \int_0^T g(\hat{x}(t), \hat{u}(t))\, dt$$

Proof
For the optimal control and state trajectories $\{u^*(t) \mid t \in [0, T]\}$ and $\{x^*(t) \mid t \in [0, T]\}$, the inequality holds with equality:
$$V(0, x(0)) = h(x^*(T)) + \int_0^T g(x^*(t), u^*(t))\, dt = J^*(0, x(0))$$
Repeating the argument with initial time $t$ in place of $0$ gives $V(t, x) = J^*(t, x)$ for all $t$ and $x$.
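As a numerical sanity check on this claim (not in the slides), one can simulate arbitrary admissible controls and confirm that none beats $V(0, x(0))$. The sketch below assumes the scalar example introduced on the next slide: $\dot{x} = u$, $|u| \le 1$, no running cost, $h(x) = \tfrac{1}{2} x^2$.

```python
# Sanity check of V(0, x(0)) <= h(x(T)) + integral of g for random admissible
# controls, using the scalar example dx/dt = u, |u| <= 1, g = 0, h(x) = 0.5 x^2.
import random

T, N, x0 = 2.0, 200, 3.0
delta = T / N
V0 = 0.5 * max(0.0, abs(x0) - T)**2              # claimed optimal cost-to-go at t = 0

for _ in range(5):
    x = x0
    for _ in range(N):
        x += random.uniform(-1.0, 1.0) * delta   # random control in [-1, 1]
    print(0.5 * x**2 >= V0 - 1e-9)               # True on every trial
```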

HJB Example
Consider the simple scalar system
$$\dot{x}(t) = u(t), \qquad |u(t)| \le 1, \qquad t \in [0, T]$$
with no running cost and terminal cost
$$V(T, x) = \tfrac{1}{2} x^2$$
HJB equation:
$$0 = \min_{|u| \le 1} \{\, \partial_t V(t, x) + u\, \partial_x V(t, x) \,\}$$

HJB Example
Candidate control policy:
$$u^*(t, x) = -\operatorname{sgn}(x)$$
Optimal cost-to-go function:
$$J^*(t, x) = \tfrac{1}{2} \big( \max\{0,\, |x| - (T - t)\} \big)^2$$
Check whether it solves the HJB equation.

HJB Example
$$J^*(t, x) = \tfrac{1}{2} \big( \max\{0,\, |x| - (T - t)\} \big)^2$$
$$\partial_t J^*(t, x) = \max\{0,\, |x| - (T - t)\}$$
$$\partial_x J^*(t, x) = \operatorname{sgn}(x)\, \max\{0,\, |x| - (T - t)\}$$
Substituting,
$$\min_{|u| \le 1} \{\, \partial_t J^*(t, x) + u\, \partial_x J^*(t, x) \,\} = \min_{|u| \le 1} \{\, 1 + u\, \operatorname{sgn}(x) \,\}\, \max\{0,\, |x| - (T - t)\} = 0$$
since the minimum is attained at $u = -\operatorname{sgn}(x)$. Hence $J^*$ satisfies the HJB equation.
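A quick numerical spot-check of this verification (illustrative, not from the slides): compute the HJB residual $\min_{|u| \le 1} \{ \partial_t J^* + u\, \partial_x J^* \}$ by finite differences at a few points away from the kinks of the max; it should vanish.

```python
# Finite-difference check that J*(t,x) = 0.5*max(0, |x| - (T-t))^2 solves
# 0 = min_{|u|<=1} { dJ/dt + u * dJ/dx }; the minimum is attained at u = -sign(dJ/dx).
T = 2.0
J = lambda t, x: 0.5 * max(0.0, abs(x) - (T - t))**2

def hjb_residual(t, x, eps=1e-5):
    Jt = (J(t + eps, x) - J(t - eps, x)) / (2 * eps)   # central difference in t
    Jx = (J(t, x + eps) - J(t, x - eps)) / (2 * eps)   # central difference in x
    return Jt - abs(Jx)            # = min over |u| <= 1 of Jt + u*Jx

for t, x in [(0.0, 3.0), (1.0, -2.5), (1.5, 0.3)]:
    print(f"t={t}, x={x}: residual = {hjb_residual(t, x):.2e}")   # all ~0
```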

LQR and HJB
Remember the continuous-time LQR problem:
$$\dot{x}(t) = A x(t) + B u(t)$$
$$J = x(T)'\, Q_T\, x(T) + \int_0^T \big( x(t)'\, Q\, x(t) + u(t)'\, R\, u(t) \big)\, dt$$
so that $g(x, u) = x' Q x + u' R u$ and $h(x) = x' Q_T x$.

LQR and HJB
The HJB equation:
$$0 = \min_u \{\, x' Q x + u' R u + \partial_t V(t, x) + \nabla_x V(t, x)'\, (A x + B u) \,\}, \qquad V(T, x) = x' Q_T x$$
Try a solution of the form
$$V(t, x) = x'\, K(t)\, x$$
with $K(t)$ symmetric, so that
$$\nabla_x V(t, x) = 2 K(t)\, x, \qquad \partial_t V(t, x) = x'\, \dot{K}(t)\, x$$

LQR and HJB
Substitute into the HJB equation:
$$0 = \min_u \{\, x' Q x + u' R u + x'\, \dot{K}(t)\, x + 2 x' K(t) A x + 2 x' K(t) B u \,\}$$
Differentiate with respect to $u$ to find the minimum:
$$2 B' K(t)\, x + 2 R u = 0 \quad \Longrightarrow \quad u = -R^{-1} B' K(t)\, x$$

LQR and HJB
Therefore, at the minimizing $u$,
$$0 = x' \big( \dot{K}(t) + K(t) A + A' K(t) - K(t) B R^{-1} B' K(t) + Q \big)\, x$$
Since this holds for all $x$,
$$0 = \dot{K}(t) + K(t) A + A' K(t) - K(t) B R^{-1} B' K(t) + Q$$
$$\dot{K}(t) = -K(t) A - A' K(t) + K(t) B R^{-1} B' K(t) - Q$$
the continuous-time Riccati equation.

LQR and HJB
Given that $K(t)$ satisfies the Riccati equation with boundary condition $K(T) = Q_T$,
$$J^*(t, x) = V(t, x) = x'\, K(t)\, x$$
and the optimal control policy is
$$u^*(t, x) = -R^{-1} B' K(t)\, x$$
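A sketch of putting this to work numerically: integrate the Riccati equation backward from $K(T) = Q_T$, here with scipy; the double-integrator $A$, $B$ and the weights are made-up illustration values, not from the slides.

```python
# Integrate the continuous-time Riccati equation backward from K(T) = Q_T,
# then read off the time-varying optimal gain u = -R^{-1} B' K(t) x.
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # double integrator (illustrative)
B = np.array([[0.0], [1.0]])
Q, R, Q_T, T = np.eye(2), np.array([[1.0]]), np.eye(2), 5.0

def riccati_rhs(t, k_flat):
    K = k_flat.reshape(2, 2)
    dK = -(K @ A + A.T @ K - K @ B @ np.linalg.inv(R) @ B.T @ K + Q)
    return dK.ravel()

sol = solve_ivp(riccati_rhs, [T, 0.0], Q_T.ravel())   # integrate from t=T down to t=0
K0 = sol.y[:, -1].reshape(2, 2)
print("K(0) =\n", K0)
print("gain at t=0:", np.linalg.inv(R) @ B.T @ K0)    # optimal feedback u = -gain @ x
```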