Lecture 10: Linear Quadratic Stochastic Control with Partial State Observation


EE363 Winter 2008-09

- partially observed linear-quadratic stochastic control problem
- estimation-control separation principle
- solution via dynamic programming

Linear stochastic system

linear dynamical system, over a finite time horizon:

x_{t+1} = A x_t + B u_t + w_t,  t = 0, ..., N−1

with state x_t, input u_t, and process noise w_t

linear noise-corrupted observations:

y_t = C x_t + v_t,  t = 0, ..., N

- y_t is the output, v_t is measurement noise
- x_0 ~ N(0, X), w_t ~ N(0, W), v_t ~ N(0, V), all independent (a simulation sketch follows below)
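As a concrete illustration, here is a minimal NumPy sketch of simulating this system; the dimensions, random seed, and the placeholder zero input are assumptions for illustration, not part of the lecture.

```python
import numpy as np

# Minimal simulation sketch of x_{t+1} = A x_t + B u_t + w_t, y_t = C x_t + v_t.
# Dimensions, seed, and the zero input are illustrative assumptions.
rng = np.random.default_rng(0)
n, m, p, N = 5, 2, 3, 50
A = rng.standard_normal((n, n)) / np.sqrt(n)
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
X, W, V = np.eye(n), 0.5 * np.eye(n), 0.5 * np.eye(p)

x = rng.multivariate_normal(np.zeros(n), X)               # x_0 ~ N(0, X)
xs, ys, us = [], [], []
for t in range(N + 1):
    y = C @ x + rng.multivariate_normal(np.zeros(p), V)   # y_t = C x_t + v_t
    xs.append(x); ys.append(y)
    if t < N:
        u = np.zeros(m)                                   # placeholder input u_t
        us.append(u)
        x = A @ x + B @ u + rng.multivariate_normal(np.zeros(n), W)
```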

Causal output feedback control policies

causal feedback policies: the input must be a function of the past and present outputs (roughly speaking: the current state x_t is not known):

u_t = φ_t(Y_t),  t = 0, ..., N−1

- Y_t = (y_0, ..., y_t) is the output history at time t
- φ_t : R^{p(t+1)} → R^m is called the control policy at time t
- the closed-loop system is x_{t+1} = A x_t + B φ_t(Y_t) + w_t,  y_t = C x_t + v_t
- x_0, ..., x_N, y_0, ..., y_N, u_0, ..., u_{N−1} are all random

Stochastic control with partial observations

objective:

J = E( ∑_{t=0}^{N−1} (x_t^T Q x_t + u_t^T R u_t) + x_N^T Q x_N )

with Q ≥ 0, R > 0

partially observed linear quadratic stochastic control problem (a.k.a. LQG problem): choose output feedback policies φ_0, ..., φ_{N−1} to minimize J

Solution

optimal policies are

φ_t^*(Y_t) = K_t E(x_t | Y_t)

- K_t is the optimal feedback gain matrix for the associated LQR problem
- E(x_t | Y_t) is the MMSE estimate of x_t given the measurements Y_t (can be computed using the Kalman filter)

called the separation principle: the optimal policy consists of
- estimating the state via MMSE (ignoring the control problem)
- using the estimated state as if it were the actual state, for purposes of control

LQR control gain computation

define P_N = Q, and for t = N, ..., 1,

P_{t−1} = A^T P_t A + Q − A^T P_t B (R + B^T P_t B)^{−1} B^T P_t A

set

K_t = −(R + B^T P_{t+1} B)^{−1} B^T P_{t+1} A,  t = 0, ..., N−1

- K_t does not depend on the data C, X, W, V (see the sketch below)
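A sketch of this backward Riccati recursion in NumPy; lqr_gains is a hypothetical helper name, and the sign convention u_t = K_t x_t (with the minus sign folded into K_t) follows the slides.

```python
import numpy as np

def lqr_gains(A, B, Q, R, N):
    """Backward Riccati recursion from this slide: P_N = Q, then P_{t-1} from P_t.
    Returns P_0..P_N and K_0..K_{N-1}, with u_t = K_t x_t."""
    P = [None] * (N + 1)
    P[N] = Q
    for t in range(N, 0, -1):
        P[t - 1] = A.T @ P[t] @ A + Q - A.T @ P[t] @ B @ np.linalg.solve(
            R + B.T @ P[t] @ B, B.T @ P[t] @ A)
    K = [-np.linalg.solve(R + B.T @ P[t + 1] @ B, B.T @ P[t + 1] @ A)
         for t in range(N)]
    return P, K
```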

Kalman filter current state estimate

define
- x̂_t = E(x_t | Y_t) (current state estimate)
- Σ_t = E(x_t − x̂_t)(x_t − x̂_t)^T (current state estimate covariance)
- Σ_{t+1|t} = A Σ_t A^T + W (next state estimate covariance)

start with Σ_{0|−1} = X; for t = 0, ..., N,

Σ_t = Σ_{t|t−1} − Σ_{t|t−1} C^T (C Σ_{t|t−1} C^T + V)^{−1} C Σ_{t|t−1},
Σ_{t+1|t} = A Σ_t A^T + W

define L_t = Σ_{t|t−1} C^T (C Σ_{t|t−1} C^T + V)^{−1},  t = 0, ..., N
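The covariance recursion and gains on this slide can be sketched as follows; kf_covariances is a hypothetical helper name, and the explicit inverse is used for clarity rather than efficiency.

```python
import numpy as np

def kf_covariances(A, C, X, W, V, N):
    """Forward recursion from this slide: returns filtered covariances Sigma_t,
    predicted covariances Sigma_{t|t-1}, and gains L_t, for t = 0..N."""
    Sig_pred = X                                 # Sigma_{0|-1} = X
    Sig, Sig_p, L = [], [], []
    for t in range(N + 1):
        S = C @ Sig_pred @ C.T + V               # innovation covariance
        Lt = Sig_pred @ C.T @ np.linalg.inv(S)   # gain L_t
        Sig_t = Sig_pred - Lt @ C @ Sig_pred     # measurement update
        Sig_p.append(Sig_pred); Sig.append(Sig_t); L.append(Lt)
        Sig_pred = A @ Sig_t @ A.T + W           # time update: Sigma_{t+1|t}
    return Sig, Sig_p, L
```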

set x̂_0 = L_0 y_0; for t = 0, ..., N−1,

x̂_{t+1} = A x̂_t + B u_t + L_{t+1} e_{t+1},  e_{t+1} = y_{t+1} − C(A x̂_t + B u_t)

- e_{t+1} is the next output prediction error
- e_{t+1} ~ N(0, C Σ_{t+1|t} C^T + V), independent of Y_t
- the Kalman filter gains L_t do not depend on the data B, Q, R (see the sketch below)
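Combining the gains with this estimator recursion gives a short filtering loop; it assumes the gains L from the kf_covariances sketch above and the sequences ys, us from the earlier simulation sketch.

```python
# Estimator recursion sketch: xhat_0 = L_0 y_0, then
# xhat_{t+1} = A xhat_t + B u_t + L_{t+1} e_{t+1}.
# Assumes A, B, C, N, ys, us, L from the earlier sketches.
xhat = L[0] @ ys[0]                   # xhat_0 = L_0 y_0
xhats = [xhat]
for t in range(N):
    pred = A @ xhat + B @ us[t]       # prediction of x_{t+1} given Y_t
    e = ys[t + 1] - C @ pred          # output prediction error e_{t+1}
    xhat = pred + L[t + 1] @ e        # xhat_{t+1}
    xhats.append(xhat)
```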

Solution via dynamic programming

let V_t(Y_t) be the optimal value of the LQG problem, from t on, conditioned on the output history Y_t:

V_t(Y_t) = min_{φ_t, ..., φ_{N−1}} E( ∑_{τ=t}^{N−1} (x_τ^T Q x_τ + u_τ^T R u_τ) + x_N^T Q x_N | Y_t )

we'll show that V_t is a quadratic function of x̂_t plus a constant; in fact,

V_t(Y_t) = x̂_t^T P_t x̂_t + q_t,  t = 0, ..., N,

where P_t is the LQR cost-to-go matrix (x̂_t is a linear function of Y_t)

we have

V_N(Y_N) = E(x_N^T Q x_N | Y_N) = x̂_N^T Q x̂_N + Tr(Q Σ_N)

(using x_N | Y_N ~ N(x̂_N, Σ_N)), so P_N = Q, q_N = Tr(Q Σ_N)

the dynamic programming (DP) equation is

V_t(Y_t) = min_{u_t} E( x_t^T Q x_t + u_t^T R u_t + V_{t+1}(Y_{t+1}) | Y_t )

(and the argmin, which is a function of Y_t, is the optimal input)

with V_{t+1}(Y_{t+1}) = x̂_{t+1}^T P_{t+1} x̂_{t+1} + q_{t+1}, the DP equation becomes

V_t(Y_t) = min_{u_t} E( x_t^T Q x_t + u_t^T R u_t + x̂_{t+1}^T P_{t+1} x̂_{t+1} + q_{t+1} | Y_t )
         = E(x_t^T Q x_t | Y_t) + q_{t+1} + min_{u_t} ( u_t^T R u_t + E(x̂_{t+1}^T P_{t+1} x̂_{t+1} | Y_t) )

using x_t | Y_t ~ N(x̂_t, Σ_t), the first term is

E(x_t^T Q x_t | Y_t) = x̂_t^T Q x̂_t + Tr(Q Σ_t)

using x̂_{t+1} = A x̂_t + B u_t + L_{t+1} e_{t+1}, with e_{t+1} ~ N(0, C Σ_{t+1|t} C^T + V), independent of Y_t, we get

E(x̂_{t+1}^T P_{t+1} x̂_{t+1} | Y_t) = x̂_t^T A^T P_{t+1} A x̂_t + u_t^T B^T P_{t+1} B u_t + 2 x̂_t^T A^T P_{t+1} B u_t + Tr( (L_{t+1}^T P_{t+1} L_{t+1}) (C Σ_{t+1|t} C^T + V) )

using L_{t+1} = Σ_{t+1|t} C^T (C Σ_{t+1|t} C^T + V)^{−1}, the last term becomes

Tr( P_{t+1} Σ_{t+1|t} C^T (C Σ_{t+1|t} C^T + V)^{−1} C Σ_{t+1|t} ) = Tr P_{t+1} (Σ_{t+1|t} − Σ_{t+1})

combining all terms we get

V_t(Y_t) = x̂_t^T (Q + A^T P_{t+1} A) x̂_t + q_{t+1} + Tr(Q Σ_t) + Tr P_{t+1} (Σ_{t+1|t} − Σ_{t+1}) + min_{u_t} ( u_t^T (R + B^T P_{t+1} B) u_t + 2 x̂_t^T A^T P_{t+1} B u_t )

- the minimization is the same as in the deterministic LQR problem
- thus the optimal policy is φ_t^*(Y_t) = K_t x̂_t, with K_t = −(R + B^T P_{t+1} B)^{−1} B^T P_{t+1} A
- plugging in the optimal u_t we get V_t(Y_t) = x̂_t^T P_t x̂_t + q_t, where

P_t = A^T P_{t+1} A + Q − A^T P_{t+1} B (R + B^T P_{t+1} B)^{−1} B^T P_{t+1} A
q_t = q_{t+1} + Tr(Q Σ_t) + Tr P_{t+1} (Σ_{t+1|t} − Σ_{t+1})

- the recursion for P_t is exactly the same as for deterministic LQR

Optimal objective

the optimal LQG cost is

J^* = E V_0(y_0) = q_0 + E x̂_0^T P_0 x̂_0 = q_0 + Tr P_0 (X − Σ_0)

using x̂_0 ~ N(0, X − Σ_0)

using q_N = Tr(Q Σ_N) and q_t = q_{t+1} + Tr(Q Σ_t) + Tr P_{t+1} (Σ_{t+1|t} − Σ_{t+1}) we get

J^* = ∑_{t=0}^{N} Tr(Q Σ_t) + ∑_{t=0}^{N} Tr P_t (Σ_{t|t−1} − Σ_t)

using Σ_{0|−1} = X
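In code, this cost can be accumulated directly from the two recursions; the sketch below assumes P from the lqr_gains sketch and Sig, Sig_p from the kf_covariances sketch above.

```python
import numpy as np

# J* = sum_{t=0}^{N} Tr(Q Sigma_t) + sum_{t=0}^{N} Tr(P_t (Sigma_{t|t-1} - Sigma_t)),
# assuming Q, N and the lists P, Sig, Sig_p from the earlier sketches.
Jstar = sum(np.trace(Q @ Sig[t]) for t in range(N + 1)) + \
        sum(np.trace(P[t] @ (Sig_p[t] - Sig[t])) for t in range(N + 1))
```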

we can write this as

J^* = ∑_{t=0}^{N} Tr(Q Σ_t) + ∑_{t=1}^{N} Tr P_t (A Σ_{t−1} A^T + W − Σ_t) + Tr P_0 (X − Σ_0)

which simplifies to J^* = J_lqr + J_est, where

J_lqr = Tr(P_0 X) + ∑_{t=1}^{N} Tr(P_t W)

J_est = Tr((Q − P_0) Σ_0) + ∑_{t=1}^{N} ( Tr((Q − P_t) Σ_t) + Tr(P_t A Σ_{t−1} A^T) )

- J_lqr is the stochastic LQR cost, i.e., the optimal objective if you knew the state
- J_est is the cost of not knowing (i.e., having to estimate) the state
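The split can be checked numerically, using the same assumed names as in the sketches above.

```python
import numpy as np

# J_lqr / J_est split, with a consistency check against Jstar from above.
J_lqr = np.trace(P[0] @ X) + sum(np.trace(P[t] @ W) for t in range(1, N + 1))
J_est = np.trace((Q - P[0]) @ Sig[0]) + sum(
    np.trace((Q - P[t]) @ Sig[t]) + np.trace(P[t] @ A @ Sig[t - 1] @ A.T)
    for t in range(1, N + 1))
assert np.isclose(Jstar, J_lqr + J_est)
```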

when the state measurements are exact (C = I, V = 0), we have Σ_t = 0, so we get

J^* = J_lqr = Tr(P_0 X) + ∑_{t=1}^{N} Tr(P_t W)

Infinite horizon LQG

choose policies to minimize the infinite horizon average stage cost

J = lim_{N→∞} (1/N) E ∑_{t=0}^{N−1} ( x_t^T Q x_t + u_t^T R u_t )

the optimal average stage cost is

J^* = Tr(Q Σ) + Tr(P (Σ̄ − Σ))

where P and Σ̄ are the PSD solutions of the AREs

P = Q + A^T P A − A^T P B (R + B^T P B)^{−1} B^T P A,
Σ̄ = A Σ̄ A^T + W − A Σ̄ C^T (C Σ̄ C^T + V)^{−1} C Σ̄ A^T

and Σ = Σ̄ − Σ̄ C^T (C Σ̄ C^T + V)^{−1} C Σ̄
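Both AREs can be solved numerically; one way (an assumption, not something the lecture prescribes) is SciPy's DARE solver, using the standard duality A → A^T, B → C^T, Q → W, R → V for the estimation ARE. The problem data A, B, C, Q, R, W, V are assumed to be NumPy arrays.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Control ARE: P = Q + A'PA - A'PB (R + B'PB)^{-1} B'PA
P = solve_discrete_are(A, B, Q, R)

# Estimation ARE for Sigma_bar, via the dual problem (A -> A', B -> C'):
Sig_bar = solve_discrete_are(A.T, C.T, W, V)
Sig = Sig_bar - Sig_bar @ C.T @ np.linalg.solve(C @ Sig_bar @ C.T + V, C @ Sig_bar)

Jstar_avg = np.trace(Q @ Sig) + np.trace(P @ (Sig_bar - Sig))  # optimal average stage cost
```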

- the optimal average stage cost doesn't depend on X
- (an) optimal policy is

u_t = K x̂_t,  x̂_{t+1} = A x̂_t + B u_t + L( y_{t+1} − C(A x̂_t + B u_t) )

where

K = −(R + B^T P B)^{−1} B^T P A,  L = Σ̄ C^T (C Σ̄ C^T + V)^{−1}

- K is the steady-state LQR feedback gain
- L is the steady-state Kalman filter gain
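Continuing the ARE sketch above, the steady-state gains and one step of this time-invariant policy might look like this; lqg_step is a hypothetical helper name.

```python
import numpy as np

# Steady-state gains from the ARE solutions P and Sig_bar above.
K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)        # LQR feedback gain
L = Sig_bar @ C.T @ np.linalg.inv(C @ Sig_bar @ C.T + V)  # Kalman filter gain

def lqg_step(xhat, y_next):
    """One step of the LQG policy: apply u = K xhat, then update the
    estimate with the next measurement y_{t+1}."""
    u = K @ xhat
    pred = A @ xhat + B @ u                    # prediction of x_{t+1}
    return u, pred + L @ (y_next - C @ pred)   # (u_t, xhat_{t+1})
```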

Example

- system with n = 5 states, m = 2 inputs, p = 3 outputs; infinite horizon
- A, B, C chosen randomly; A scaled so max_i |λ_i(A)| = 1
- Q = I, R = I, X = I, W = 0.5 I, V = 0.5 I
- we compare LQG with the case where the state is known (stochastic LQR); a simulation sketch follows below
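A self-contained sketch reproducing this setup and estimating the average LQG stage cost by simulation; the random seed, the zero initial estimate, and the 5000-step averaging horizon are assumptions.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(1)
n, m, p = 5, 2, 3
A = rng.standard_normal((n, n))
A /= np.max(np.abs(np.linalg.eigvals(A)))       # scale so max_i |lambda_i(A)| = 1
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
Q, R = np.eye(n), np.eye(m)
X, W, V = np.eye(n), 0.5 * np.eye(n), 0.5 * np.eye(p)

P = solve_discrete_are(A, B, Q, R)
Sig_bar = solve_discrete_are(A.T, C.T, W, V)
K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
L = Sig_bar @ C.T @ np.linalg.inv(C @ Sig_bar @ C.T + V)

# closed-loop LQG simulation; the average stage cost should approach J*
T = 5000
x = rng.multivariate_normal(np.zeros(n), X)
xhat = np.zeros(n)                              # assumed initialization
costs = []
for t in range(T):
    u = K @ xhat
    costs.append(x @ Q @ x + u @ R @ u)         # stage cost x'Qx + u'Ru
    x = A @ x + B @ u + rng.multivariate_normal(np.zeros(n), W)
    y = C @ x + rng.multivariate_normal(np.zeros(p), V)
    pred = A @ xhat + B @ u
    xhat = pred + L @ (y - C @ pred)            # steady-state Kalman update
print(np.mean(costs))
```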

Sample trajectories

sample traces of (x_t)_1 and (u_t)_1 in steady state (blue: LQG, red: stochastic LQR)

[figure: two panels plotting (x_t)_1 and (u_t)_1 versus t, for t = 0, ..., 50]

Cost histogram

histogram of stage costs over 5000 steps in steady state

[figure: two histograms of stage cost (range 0 to 30); top: LQG (J), bottom: stochastic LQR (J_lqr)]