ECE7850 Lecture 7. Discrete Time Optimal Control and Dynamic Programming

Outline:
Discrete Time Optimal Control Problems
Short Introduction to Dynamic Programming
Connection to Stabilization Problems

Discrete Time Optimal Control Problem

DT nonlinear control system:
x(t+1) = f(x(t), u(t)),  x ∈ X, u ∈ U, t ∈ Z_+   (1)
For a traditional system, X ⊆ R^n and U ⊆ R^m are continuous variables. A large class of DT hybrid systems can also be written in (or viewed as) the above form, e.g. switched systems: U ⊆ R^m × Q with mixed continuous/discrete control input.

Variable structure system (e.g. piecewise affine system): f defined in terms of partitions, f(x(t), u(t)) = f_i(x(t), u(t)) if x(t) ∈ Ω_i.
More general hybrid system: X ⊆ R^n × Q_x and U ⊆ R^m × Q_u.
The actual control constraint is state-dependent: U(x) = {u ∈ U : f(x, u) ∈ X}.

A general way to determine the control: a time-varying state-feedback law u(t) = μ_t(x(t)), with the requirement μ_t(x(t)) ∈ U(x(t)).
Control policies:
N-horizon policy: π_N = {μ_0, ..., μ_{N-1}} ∈ Π_N
Infinite-horizon policy: π_∞ = {μ_0, μ_1, ...} ∈ Π_∞
Stationary policy: π = {μ, μ, ...} ∈ Π_∞

x(·; z, π): closed-loop trajectory starting from z under policy π = {μ_0, μ_1, ...}:
x(t+1; z, π) = f(x(t; z, π), μ_t(x(t; z, π)))   (2)
Corresponding control sequence: u(t; z, π) = μ_t(x(t; z, π)).
Quantify performance through cost functions:
Finite horizon: J_N(z, π_N) = Σ_{t=0}^{N-1} l(x(t; z, π_N), u(t; z, π_N)) + J_f(x(N; z, π_N))
Infinite horizon: J_∞(z, π_∞) = Σ_{t=0}^{∞} l(x(t; z, π_∞), u(t; z, π_∞))
Running cost l : X × U → R_+; terminal cost J_f : X → R_+.
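To make the trajectory and cost definitions concrete, here is a minimal Python sketch (the dynamics, cost, and policy below are hypothetical placeholders, not from the lecture) that rolls out x(t; z, π_N) under a finite-horizon policy and accumulates J_N:

```python
import numpy as np

def rollout_cost(f, policies, stage_cost, terminal_cost, z):
    """Evaluate J_N(z, pi_N) for pi_N = {mu_0, ..., mu_{N-1}} by simulating (2)."""
    x, J = z, 0.0
    for mu in policies:                      # t = 0, ..., N-1
        u = mu(x)                            # u(t; z, pi) = mu_t(x(t; z, pi))
        J += stage_cost(x, u)                # accumulate running cost l(x, u)
        x = f(x, u)                          # x(t+1; z, pi) = f(x(t; z, pi), u)
    return J + terminal_cost(x)              # add terminal cost J_f(x(N; z, pi))

# Hypothetical example: scalar system x(t+1) = 0.9 x + u, l(x,u) = x^2 + u^2, J_f(x) = x^2
f = lambda x, u: 0.9 * x + u
l = lambda x, u: x**2 + u**2
Jf = lambda x: x**2
pi_N = [lambda x: -0.5 * x] * 5              # a (non-optimal) stationary 5-horizon policy
print(rollout_cost(f, pi_N, l, Jf, z=1.0))
```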

Optimal control problem:
Finite horizon: V_N(z) = inf_{π_N ∈ Π_N} J_N(z, π_N)
Infinite horizon: V_∞(z) = inf_{π_∞ ∈ Π_∞} J_∞(z, π_∞)
V_N: N-horizon value function; V_∞: infinite-horizon value function.
lim_{N→∞} V_N may not exist; even when it exists, lim_{N→∞} V_N ≠ V_∞ in general.

Short Introduction to Dynamic Programming

Bellman's principle of optimality: any segment along an optimal trajectory is itself optimal among all trajectories joining the two end points of that segment.

This leads to backward value iteration: compute V_N iteratively starting from V_0.
V_0 = J_f: the zero-horizon cost is just the terminal cost function.
Suppose we know V_j for some j ≥ 0; then
V_{j+1}(z) = min_{u ∈ U(z)} {l(z, u) + V_j(f(z, u))}
u* = argmin_{u ∈ U(z)} {l(z, u) + V_j(f(z, u))}: the optimal control when j+1 steps remain and the system is at state z.
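As a sketch of the backward recursion on a finite state and control set (a toy problem of my own, not part of the lecture), one sweep computes V_{j+1} from V_j by minimizing over the admissible controls U(z):

```python
import math

# Toy deterministic problem on a finite state space (hypothetical data).
X = [0, 1, 2, 3]
U = {x: [0, 1] for x in X}                          # admissible controls U(x)
f = lambda x, u: max(x - u, 0)                      # dynamics: u = 1 moves one step toward 0
l = lambda x, u: x + u                              # running cost
Jf = lambda x: 0.0 if x == 0 else math.inf          # terminal cost: must end at state 0

def value_iteration(N):
    """Backward value iteration: V_0 = J_f, V_{j+1}(z) = min_{u in U(z)} {l(z,u) + V_j(f(z,u))}."""
    V = {x: Jf(x) for x in X}                       # V_0 = J_f
    for _ in range(N):
        V = {z: min(l(z, u) + V[f(z, u)] for u in U[z]) for z in X}
    return V

print(value_iteration(N=4))                         # the 4-horizon value function V_4 on X
```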

Example 1 (Shortest Path Problem): Find the shortest path to a_4.
X = {a_1, ..., a_8}; U(x): available links to take at x ∈ X.
Dynamics: x(k+1) = f(x(k), u(k)); running cost: l(x(k), u(k)) = edge length.
Terminal cost: J_f(x) = 0 if x = a_4, ∞ otherwise.
Value iteration proceeds backward from J_f.
[Figure: weighted graph on the nodes a_1, ..., a_8 with labeled edge lengths; not recoverable from the transcription.]

Linear Quadratic Regulator:
System dynamics: x(t+1) = Ax(t) + Bu(t)
Running cost: l(z, u) = z^T Q z + u^T R u; terminal cost: J_f(z) = z^T Q_f z
Suppose the j-horizon value function is V_j(z) = z^T P_j z; derive V_{j+1}(z).

Derivation (cont.): optimal control u* at t = N − (j+1).

Summary of LQR results:
j-horizon value function: V_j(z) = z^T P_j z, where P_j is generated by the Riccati recursion P_{j+1} = ρ(P_j), with
ρ(P) = Q + A^T P A − A^T P B (R + B^T P B)^{-1} B^T P A   (3)

To compute the LQR controller:
1. Start with P_0 = Q_f
2. Riccati recursion: P_{j+1} = ρ(P_j)
3. Compute the optimal feedback gains: K_{j+1} = −(R + B^T P_j B)^{-1} B^T P_j A

To use the LQR controller:
1. Initial state: x*(0) = x_0
2. Compute the control input: u*(t) = K_{N−t} x*(t)
3. State update: x*(t+1) = A x*(t) + B u*(t)
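A minimal numpy sketch of the Riccati recursion (3) and the receding use of the time-varying gains K_{N−t}; the system matrices below are placeholders of my own choosing (a double integrator), not from the lecture:

```python
import numpy as np

def riccati(P, A, B, Q, R):
    """One step of the Riccati recursion rho(P) in (3)."""
    return Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def lqr_gains(A, B, Q, R, Qf, N):
    """Return the gains K_1, ..., K_N from the backward recursion starting at P_0 = Qf."""
    P, K = Qf, []
    for _ in range(N):
        K.append(-np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A))   # K_{j+1} from P_j
        P = riccati(P, A, B, Q, R)                                  # P_{j+1} = rho(P_j)
    return K

# Hypothetical double-integrator example
A = np.array([[1.0, 1.0], [0.0, 1.0]]); B = np.array([[0.0], [1.0]])
Q = np.eye(2); R = np.array([[1.0]]); Qf = np.eye(2); N = 20

K = lqr_gains(A, B, Q, R, Qf, N)
x = np.array([5.0, 0.0])
for t in range(N):                      # apply u*(t) = K_{N-t} x*(t)
    u = K[N - t - 1] @ x
    x = A @ x + B @ u
print(x)                                # state after N steps, driven toward the origin
```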

Theorem 1 (Infinite-Horizon LQR): If (A, B) is stabilizable and (A, C) is detectable, where Q = C^T C, then as j → ∞, P_j → P_∞ and K_j → K_∞ with a stable closed-loop system (A + B K_∞); P_∞ satisfies the algebraic Riccati equation:
P_∞ = ρ(P_∞)
K_∞ = −(R + B^T P_∞ B)^{-1} B^T P_∞ A ≜ K(P_∞)
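To illustrate Theorem 1 numerically, one can iterate the Riccati map to (approximate) convergence and check that A + B K_∞ is Schur stable. The sketch below reuses the placeholder matrices from the previous block; scipy's solve_discrete_are could serve as an independent check if available:

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]]); B = np.array([[0.0], [1.0]])
Q = np.eye(2); R = np.array([[1.0]])        # Q = C^T C with C = I, so (A, C) is detectable

P = np.zeros((2, 2))                        # start the iteration P_{j+1} = rho(P_j) from P_0 = 0
for _ in range(500):
    P_next = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    if np.allclose(P_next, P, atol=1e-12):  # numerical fixed point: P_infty = rho(P_infty)
        break
    P = P_next

K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)      # K_infty = K(P_infty)
print(np.abs(np.linalg.eigvals(A + B @ K)).max())       # spectral radius < 1  =>  A + B K_infty stable
```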

Value Iteration Operator

To simplify notation, define two function spaces: G ≜ {g : X → R ∪ {±∞}} and G_+ ≜ {g : X → R_+ ∪ {+∞}}.
Value iteration can be viewed as an operator T : G → G:
T[g](z) = min_{u ∈ U(z)} {l(z, u) + g(f(z, u))},  ∀z ∈ X, g ∈ G   (4)
Iteration under a given control law μ : X → U with μ(x) ∈ U(x) for all x ∈ X is defined as
T_μ[g](z) = l(z, μ(z)) + g(f(z, μ(z))),  ∀z ∈ X, g ∈ G   (5)

T^k: the composition of T with itself k times: T^{k+1}[g] = T[T^k[g]], with T^0[g] = g.

Theorem 2 (Key Properties of Value Iteration):
1. Value iteration: V_N = T^N[J_f]
2. Bellman equation: V_∞ ∈ G_+ is a fixed point of T, i.e. T[V_∞] = V_∞
3. Minimum property: for any g ∈ G_+, if g ≥ T[g], then g ≥ V_∞
4. Monotonicity: if J_f ≡ 0, then J_f = V_0 ≤ V_1 ≤ ... ≤ V_∞
5. Stationary policy: a stationary policy π = {μ, μ, ...} is optimal if and only if T[V_∞] = T_μ[V_∞]

Some unfortunate facts:
1. lim_{N→∞} V_N may not exist
2. Even when lim_{N→∞} V_N exists, we may have lim_{N→∞} V_N ≠ V_∞
3. The Bellman equation T[g] = g may have infinitely many solutions other than V_∞
Counterexamples can be found in Bertsekas' book (Volume II). These cases do not arise if the state space is finite, the cost per stage l(z, u) is bounded, and the cost is discounted. However, for control problems we often do not have the luxury of making these assumptions.

Connection to Stabilization Problems

‖·‖: an arbitrary p-norm raised to an arbitrary positive power, namely ‖·‖ = |·|_p^r for some p, r ∈ R_+.
Definition 1 (Exponential stabilizability): ∃ π_∞ such that ‖x(t; z, π_∞)‖ ≤ c γ^t ‖z‖ for some c > 0 and γ ∈ (0, 1).
Definition 2 (Exponentially Stabilizing Control Lyapunov Function (ECLF)): A positive definite function g ∈ G_+ is called an ECLF if ∃ μ with μ(z) ∈ U(z), ∀z ∈ X, such that
κ_1‖z‖ ≤ g(z) ≤ κ_2‖z‖,   g(z) − g(f(z, μ(z))) ≥ κ_3‖z‖   (6)
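A short derivation sketch (standard argument, not spelled out on the slide) of why an ECLF implies exponential stability of the closed loop; it produces exactly the bound labeled (7) below. Write x(t) = x(t; z, π) for π = {μ, μ, ...}. By (6), g(x(t)) − g(x(t+1)) ≥ κ_3‖x(t)‖ ≥ 0, so g is nonincreasing along the trajectory and g(x(t+1)) ≤ g(x(t)) ≤ κ_2‖x(t)‖. Hence
g(x(t)) − g(x(t+1)) ≥ κ_3‖x(t)‖ ≥ (κ_3/κ_2) g(x(t+1)),
i.e. g(x(t+1)) ≤ (1 + κ_3/κ_2)^{-1} g(x(t)). Iterating from g(x(0)) = g(z) ≤ κ_2‖z‖ and using κ_1‖x(t)‖ ≤ g(x(t)) gives
‖x(t)‖ ≤ g(x(t))/κ_1 ≤ (κ_2/κ_1) (1/(1 + κ_3/κ_2))^t ‖z‖.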

If condition (6) holds, then the closed-loop system under π = {μ, μ, ...} is exponentially stable with
‖x(t; z, π)‖ ≤ (κ_2/κ_1) (1/(1 + κ_3/κ_2))^t ‖z‖   (7)

How does stabilization relate to optimal control? Let us start with a well-known result for LQR (see the LQR slides above for notation).

Theorem 3: If (A, B) is stabilizable and (A, C) is detectable, where Q = C^T C, then
1. Convergence of value iteration: P_j → P_∞, where P_{j+1} = ρ(P_j) with P_0 = 0
2. V_∞(z) = z^T P_∞ z is an ECLF
3. μ(z) = K(P_∞) z is a stabilizing feedback law

The theorem indicates: if we choose the cost function properly (i.e. so that (A, C) is detectable, where Q = C^T C), then a linear system is stabilizable if and only if it can be stabilized by the LQR controller.
Question: can we generalize this result, to what extent, and how?
Roughly speaking, if a system can be (exponentially) stabilized, then we can always stabilize it by solving a properly defined optimal control problem.

Main idea: select the running cost l : X × U → R_+ such that
exponentially stabilizable ⇒ V_∞(z) ≤ β‖z‖   (8)
V_∞(z) ≤ β‖z‖ ⇒ the optimal trajectory is exponentially stable   (9)
This can be achieved by various types of cost functions. To illustrate the key idea, let us choose the simplest one, say l(z, u) = ‖z‖:
exponentially stabilizable ⇒ V_∞(z) ≤ β‖z‖ for some β < ∞
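Filling in the step behind (8) for this choice of running cost (a one-line geometric-series bound): if π_∞ is exponentially stabilizing with ‖x(t; z, π_∞)‖ ≤ c γ^t ‖z‖ and γ ∈ (0, 1), then
V_∞(z) ≤ J_∞(z, π_∞) = Σ_{t=0}^{∞} ‖x(t; z, π_∞)‖ ≤ Σ_{t=0}^{∞} c γ^t ‖z‖ = (c/(1 − γ)) ‖z‖,
so (8) holds with β = c/(1 − γ) < ∞.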

V_∞(z) ≤ β‖z‖ ⇒ V_∞(z) is an ECLF.
Note: there always exists a stationary stabilizing policy π = {μ, μ, ...}, where T_μ[V_∞] = V_∞.

We can also incorporate a nontrivial control cost (e.g. l(z, u) = ‖z‖ + ‖u‖). This will not affect condition (9), but condition (8) requires more discussion. For example, if the stabilizing control input u does not converge, then V_∞(z) cannot be bounded. All that matters is whether the system is stabilizable with finite control energy. For linear systems (and many other systems), stabilizability always means stabilizability with finite control energy; thus, adding a control cost will not destroy the stability of the optimal trajectory.

In summary, if the system is exponentially stabilizable, then with a properly selected running cost function l(z, u):
V_∞ is an ECLF
the optimal infinite-horizon policy π = {μ, μ, ...} is exponentially stabilizing
This provides a unified way to construct ECLFs and stabilizing controllers. However, infinite-horizon optimal control is often more challenging to solve than stabilization. One way is to use V_N (and μ_N) in place of V_∞ (and μ_∞): Model Predictive Control.