Lecture 5 Linear Quadratic Stochastic Control

EE363 Winter 2008-09

- linear-quadratic stochastic control problem
- solution via dynamic programming

Linear stochastic system

linear dynamical system, over a finite time horizon:

    x_{t+1} = A x_t + B u_t + w_t,   t = 0, ..., N-1

- w_t is the process noise or disturbance at time t
- the w_t are IID with E w_t = 0, E w_t w_t^T = W
- x_0 is independent of w_t, with E x_0 = 0, E x_0 x_0^T = X
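As a concrete reference, here is a minimal NumPy sketch of one rollout of these dynamics under a state-feedback policy; the helper name `simulate` and the use of Gaussian disturbances (as in the example at the end of the lecture) are illustrative assumptions, not part of the course material.

```python
import numpy as np

def simulate(A, B, W, x0, phi, N, rng):
    """One rollout of x_{t+1} = A x_t + B u_t + w_t under u_t = phi[t](x_t)."""
    n = A.shape[0]
    xs, us = [np.asarray(x0, dtype=float)], []
    for t in range(N):
        u = phi[t](xs[-1])                           # input chosen knowing x_t ...
        w = rng.multivariate_normal(np.zeros(n), W)  # ... but not w_t
        us.append(u)
        xs.append(A @ xs[-1] + B @ u + w)
    return np.array(xs), np.array(us)                # shapes (N+1, n) and (N, m)
```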

Control policies

- state-feedback control: u_t = φ_t(x_t), t = 0, ..., N-1
- φ_t : R^n → R^m is called the control policy at time t
- roughly speaking: we choose the input after knowing the current state, but before knowing the disturbance
- the closed-loop system is x_{t+1} = A x_t + B φ_t(x_t) + w_t, t = 0, ..., N-1
- x_0, ..., x_N, u_0, ..., u_{N-1} are random

Stochastic control problem

objective:

    J = E( Σ_{t=0}^{N-1} (x_t^T Q x_t + u_t^T R u_t) + x_N^T Q_f x_N )

with Q, Q_f ≥ 0, R > 0

- J depends (in a complex way) on the control policies φ_0, ..., φ_{N-1}
- linear-quadratic stochastic control problem: choose the control policies φ_0, ..., φ_{N-1} to minimize J ('linear' refers to the state dynamics; 'quadratic' to the objective)
- this is an infinite-dimensional problem: the variables are the functions φ_0, ..., φ_{N-1}

Solution via dynamic programming

- let V_t(z) be the optimal value of the objective, from t on, starting at x_t = z:

    V_t(z) = min_{φ_t, ..., φ_{N-1}} E( Σ_{τ=t}^{N-1} (x_τ^T Q x_τ + u_τ^T R u_τ) + x_N^T Q_f x_N )

  subject to x_{τ+1} = A x_τ + B u_τ + w_τ, u_τ = φ_τ(x_τ)
- we have V_N(z) = z^T Q_f z
- J* = E V_0(x_0) (expectation over x_0)

- V_t can be found by backward recursion: for t = N-1, ..., 0,

    V_t(z) = z^T Q z + min_v { v^T R v + E V_{t+1}(Az + Bv + w_t) }

  where the expectation is over w_t: we do not know where we will land when we take u_t = v
- the optimal policies have the form

    φ_t*(x_t) = argmin_v { v^T R v + E V_{t+1}(A x_t + B v + w_t) }

Explicit form

- let's show (via the recursion) that the value functions are quadratic, with the form

    V_t(x_t) = x_t^T P_t x_t + q_t,   t = 0, ..., N,

  with P_t ≥ 0
- P_N = Q_f, q_N = 0
- now assume that V_{t+1}(z) = z^T P_{t+1} z + q_{t+1}

- the Bellman recursion is

    V_t(z) = z^T Q z + min_v { v^T R v + E( (Az + Bv + w_t)^T P_{t+1} (Az + Bv + w_t) + q_{t+1} ) }
           = z^T Q z + Tr(W P_{t+1}) + q_{t+1} + min_v { v^T R v + (Az + Bv)^T P_{t+1} (Az + Bv) }

  where we use E(w_t^T P_{t+1} w_t) = Tr(W P_{t+1})
- this is the same recursion as for deterministic LQR, with an added constant
- the optimal policy is linear state feedback: φ_t*(x_t) = K_t x_t, with

    K_t = -(B^T P_{t+1} B + R)^{-1} B^T P_{t+1} A

  (the same form as in deterministic LQR)

- plugging in the optimal v gives V_t(z) = z^T P_t z + q_t, with

    P_t = A^T P_{t+1} A - A^T P_{t+1} B (B^T P_{t+1} B + R)^{-1} B^T P_{t+1} A + Q
    q_t = q_{t+1} + Tr(W P_{t+1})

- the first recursion is the same as for deterministic LQR; the second term is just a running sum
- we conclude that P_t, K_t are the same as in deterministic LQR
- strangely, the optimal policy is the same as LQR, and is independent of X, W
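In code, the recursion above is only a few lines. The following NumPy sketch (the function name and interface are my own, not from the course) computes P_t, q_t, and the gains K_t:

```python
import numpy as np

def lq_stochastic_riccati(A, B, Q, R, Qf, W, N):
    """Backward Riccati recursion for P_t, q_t, and the gains K_t."""
    P = [None] * (N + 1)
    q = np.zeros(N + 1)
    K = [None] * N
    P[N] = Qf                                        # V_N(z) = z^T Q_f z, q_N = 0
    for t in range(N - 1, -1, -1):
        S = B.T @ P[t + 1] @ B + R
        K[t] = -np.linalg.solve(S, B.T @ P[t + 1] @ A)
        P[t] = Q + A.T @ P[t + 1] @ (A + B @ K[t])   # Riccati update
        q[t] = q[t + 1] + np.trace(W @ P[t + 1])     # running constant
    return P, q, K
```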

- the optimal cost is

    J* = E V_0(x_0) = Tr(X P_0) + q_0 = Tr(X P_0) + Σ_{t=1}^{N} Tr(W P_t)

- interpretation:
  - x_0^T P_0 x_0 is the optimal cost of deterministic LQR starting from x_0, with w_0 = ... = w_{N-1} = 0
  - Tr(X P_0) is the average optimal LQR cost, with w_0 = ... = w_{N-1} = 0
  - Tr(W P_t) is the average optimal LQR cost from time t, for E x_t = 0, E x_t x_t^T = W, w_t = ... = w_{N-1} = 0
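Given the outputs of the sketch above, the optimal cost is one line (X here is the covariance of x_0):

```python
J_star = np.trace(X @ P[0]) + q[0]   # = Tr(X P_0) + sum_{t=1}^N Tr(W P_t)
```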

Infinite horizon

- choose policies to minimize the average stage cost

    J = lim_{N→∞} (1/N) E Σ_{t=0}^{N-1} (x_t^T Q x_t + u_t^T R u_t)

- the optimal average stage cost is J* = Tr(W P_ss), where P_ss satisfies the ARE

    P_ss = Q + A^T P_ss A - A^T P_ss B (R + B^T P_ss B)^{-1} B^T P_ss A

- the optimal average stage cost doesn't depend on X

- (an) optimal policy is constant linear state feedback u_t = K_ss x_t, where

    K_ss = -(R + B^T P_ss B)^{-1} B^T P_ss A

- K_ss is the steady-state LQR feedback gain
- it doesn't depend on X, W
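SciPy's solver for the discrete-time ARE gives these steady-state quantities directly; a sketch, assuming the usual stabilizability/detectability conditions on the problem data hold:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def lq_stochastic_steady_state(A, B, Q, R, W):
    """Steady-state cost-to-go P_ss, gain K_ss, and optimal average stage cost."""
    P_ss = solve_discrete_are(A, B, Q, R)   # solves the ARE above
    K_ss = -np.linalg.solve(R + B.T @ P_ss @ B, B.T @ P_ss @ A)
    J_star = np.trace(W @ P_ss)             # Tr(W P_ss); independent of X
    return P_ss, K_ss, J_star
```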

Example

- system with n = 5 states, m = 2 inputs, horizon N = 30
- A, B chosen randomly; A scaled so max_i |λ_i(A)| = 1
- Q = I, Q_f = 10I, R = I
- x_0 ~ N(0, X), X = 10I
- w_t ~ N(0, W), W = 0.5I
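One way to generate such problem data (the seed and the entrywise-Gaussian draws are my assumptions, so this won't reproduce the exact costs quoted below):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, N = 5, 2, 30
A = rng.standard_normal((n, n))
A /= np.max(np.abs(np.linalg.eigvals(A)))   # scale so max_i |lambda_i(A)| = 1
B = rng.standard_normal((n, m))
Q, Qf, R = np.eye(n), 10 * np.eye(n), np.eye(m)
X, W = 10 * np.eye(n), 0.5 * np.eye(n)
```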

Sample trajectories

[Figure: sample traces of (x_t)_1 and (u_t)_1 for t = 0, ..., 30; blue: optimal stochastic control, red: no control (u_0 = ... = u_{N-1} = 0)]

Cost histogram

[Figure: histograms of the cost over 1000 simulations, for optimal stochastic control (J), prescient control (J_pre), open-loop control (J_ol), and no control (J_nc)]

Comparisons

we compared optimal stochastic control (J* = 224.2) with:

- prescient control: the input sequence is chosen with full knowledge of the future disturbances; u_0, ..., u_{N-1} are computed assuming all w_t are known; J_pre = 137.6
- open-loop control: u_0, ..., u_{N-1} depend only on x_0, and are computed assuming w_0 = ... = w_{N-1} = 0; J_ol = 423.7
- no control: u_0 = ... = u_{N-1} = 0; J_nc = 442.0
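A Monte Carlo sketch of such a comparison, reusing `simulate` and `lq_stochastic_riccati` from the sketches above; the prescient and open-loop baselines require separate deterministic solves and are omitted here:

```python
def total_cost(xs, us, Q, R, Qf):
    stage = sum(x @ Q @ x + u @ R @ u for x, u in zip(xs[:-1], us))
    return stage + xs[-1] @ Qf @ xs[-1]

P, q, K = lq_stochastic_riccati(A, B, Q, R, Qf, W, N)
phi_opt = [lambda x, Kt=Kt: Kt @ x for Kt in K]   # u_t = K_t x_t
phi_nc = [lambda x: np.zeros(m)] * N              # u_t = 0

costs = {"opt": [], "nc": []}
for _ in range(1000):
    x0 = rng.multivariate_normal(np.zeros(n), X)
    for name, phi in (("opt", phi_opt), ("nc", phi_nc)):
        xs, us = simulate(A, B, W, x0, phi, N, rng)
        costs[name].append(total_cost(xs, us, Q, R, Qf))
print({name: np.mean(c) for name, c in costs.items()})
```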