Stochastic and Adaptive Optimal Control

Stochastic and Adaptive Optimal Control
Robert Stengel, Optimal Control and Estimation, MAE 546, Princeton University, 2018

- Nonlinear systems with random inputs and perfect measurements
- Stochastic neighboring-optimal control
- Linear-quadratic (LQ) optimal control with perfect measurements
- Adaptive critic control
- Information sets and expected cost

Copyright 2018 by Robert Stengel. All rights reserved. For educational use only.
http://www.princeton.edu/~stengel/mae546.html
http://www.princeton.edu/~stengel/optconest.html

Nonlinear Systems with Random Inputs and Perfect Measurements

Inputs and initial conditions are uncertain, but the state can be measured without error:

    \dot{x}(t) = f[x(t), u(t), w(t), t]
    z(t) = x(t)
    E[x(0)] = \bar{x}(0), \quad E\{[x(0) - \bar{x}(0)][x(0) - \bar{x}(0)]^T\} = P(0)

Assume that random disturbance effects are small and additive:

    \dot{x}(t) = f[x(t), u(t), t] + L(t) w(t)
    E[w(t)] = 0, \quad E[w(t) w^T(\tau)] = W(t)\,\delta(t - \tau)
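A minimal simulation sketch of the system above, assuming an Euler-Maruyama discretization; the pendulum dynamics, disturbance matrix L, and intensity W below are illustrative placeholders, not an example from the lecture.

```python
import numpy as np

def simulate(f, x0, u_fn, L, W, t0, tf, dt, seed=0):
    """Propagate one sample path of dx/dt = f(x,u,t) + L w(t), E[w w^T] = W d(t-tau)."""
    rng = np.random.default_rng(seed)
    chol_W = np.linalg.cholesky(W)                 # W assumed positive definite
    x = np.array(x0, dtype=float)
    ts = np.arange(t0, tf, dt)
    xs = np.empty((ts.size + 1, x.size)); xs[0] = x
    for k, t in enumerate(ts):
        # sample w so that Cov[L w dt] = L W L^T dt (Euler-Maruyama step)
        w = chol_W @ rng.standard_normal(W.shape[0]) / np.sqrt(dt)
        x = x + (f(x, u_fn(t), t) + L @ w) * dt
        xs[k + 1] = x
    return ts, xs

# Illustrative example: damped pendulum with a torque disturbance
f = lambda x, u, t: np.array([x[1], -np.sin(x[0]) - 0.2 * x[1] + u])
L = np.array([[0.0], [1.0]])
W = np.array([[0.01]])
ts, xs = simulate(f, [0.1, 0.0], lambda t: 0.0, L, W, 0.0, 10.0, 1e-3)
```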

Cost Must Be an Expected Value

A deterministic cost function cannot be minimized because the disturbance effect on the state cannot be predicted; the state and control are random variables:

    \min_{u(t)} J = \phi[x(t_f)] + \int_{t_o}^{t_f} L[x(t), u(t)]\, dt

However, the expected value of a deterministic cost function can be minimized:

    \min_{u(t)} J = E\left\{ \phi[x(t_f)] + \int_{t_o}^{t_f} L[x(t), u(t)]\, dt \right\}

Stochastic Euler-Lagrange Equations?

There is no single optimal trajectory, and expected values of the Euler-Lagrange necessary conditions may not be well defined:

    1) \; E[\lambda(t_f)] = E\left\{ \frac{\partial \phi}{\partial x}[x(t_f)] \right\}^T
    2) \; E[\dot{\lambda}(t)] = -E\left\{ \frac{\partial H}{\partial x}[x(t), u(t), \lambda(t), t] \right\}^T
    3) \; E\left\{ \frac{\partial H}{\partial u}[x(t), u(t), \lambda(t), t] \right\} = 0

Stochastic Value Function for a Nonlinear System

Expected values of the terminal and integral costs are well defined, so apply the Hamilton-Jacobi-Bellman (HJB) approach to the expected terminal and integral costs.

Principle of Optimality: the optimal expected value function at t_1 is

    V^*(t_1) = E\left\{ \phi[x^*(t_f)] + \int_{t_1}^{t_f} L[x^*(\tau), u^*(\tau)]\, d\tau \right\}
             = \min_{u} E\left\{ \phi[x^*(t_f)] + \int_{t_1}^{t_f} L[x^*(\tau), u(\tau)]\, d\tau \right\}

Rate of Change of the Value Function

Total time derivative of V*:

    \left. \frac{dV^*}{dt} \right|_{t=t_1} = -E\{ L[x^*(t_1), u^*(t_1)] \}

x(t) and u(t) can be known precisely; therefore

    \left. \frac{dV^*}{dt} \right|_{t=t_1} = -L[x^*(t_1), u^*(t_1)]

Incremental Change in the Value Function

Apply the chain rule to the total derivative:

    \frac{dV^*}{dt} = E\left[ \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}\dot{x} \right]

Incremental change in the value function, expanded to second degree:

    \Delta V^* = \frac{dV^*}{dt}\Delta t
              = E\left[ \frac{\partial V^*}{\partial t}\Delta t + \frac{\partial V^*}{\partial x}\dot{x}\Delta t + \frac{1}{2}\dot{x}^T \frac{\partial^2 V^*}{\partial x^2}\dot{x}\,\Delta t^2 + \cdots \right]
              = E\left[ \frac{\partial V^*}{\partial t}\Delta t + \frac{\partial V^*}{\partial x}\big(f(\cdot) + Lw(\cdot)\big)\Delta t + \frac{1}{2}\big(f(\cdot) + Lw(\cdot)\big)^T \frac{\partial^2 V^*}{\partial x^2}\big(f(\cdot) + Lw(\cdot)\big)\Delta t^2 + \cdots \right]

Next: cancel Δt.

Introduction of the Trace

The trace of a matrix product is a scalar and is invariant under cyclic permutation:

    x^T Q x = \mathrm{Tr}(x^T Q x) = \mathrm{Tr}(Q\, x x^T), \quad \mathrm{Tr}(ABC) = \mathrm{Tr}(CAB) = \mathrm{Tr}(BCA)

Then

    \frac{dV^*}{dt} = E\left[ \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}\big(f(\cdot) + Lw(\cdot)\big) + \frac{1}{2}\mathrm{Tr}\left\{ \frac{\partial^2 V^*}{\partial x^2}\big(f(\cdot) + Lw(\cdot)\big)\big(f(\cdot) + Lw(\cdot)\big)^T \right\}\Delta t \right]
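A quick numerical check, illustrative only, of the trace identities used above.

```python
import numpy as np

rng = np.random.default_rng(1)
A, B, C = (rng.standard_normal((4, 4)) for _ in range(3))
x = rng.standard_normal(4)
Q = rng.standard_normal((4, 4))

# cyclic permutation leaves the trace unchanged
assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))
assert np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))
# quadratic form as a trace of an outer product
assert np.isclose(x @ Q @ x, np.trace(Q @ np.outer(x, x)))
```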

dv * dt = E V * t Toward the Stochastic HJB Equation + V * x = V * + V * t x f. Because x(t) and u(t) can be measured, f (.) + Lw (.) + 1 2 Tr 2 V * f (.) + Lw (.) x 2 f. + E V * x Lw. + 1 2 Tr 2 V * f (.) + Lw (.) x 2 f. + Lw (.) + Lw (.) T Δt 1 st two terms can be taken outside the expectation T Δt 9 Toward the Stochastic HJB Equation Disturbance is assumed to be zero-mean white noise E w( t) = 0 w T ( τ ) E f. = 0 E w( t)w T ( τ ) = W( t)δ t τ E w( t)w T ( τ ) Δt W t Δt 0 dv * dt = V * + V * t x f (.) + 1 2 lim = V * t Uncertain disturbance input can only increase the value function rate of change Tr 2 V * E( f (.)f (.) T )Δt + LE( w( t)w( τ ) T )L T Δt 0 x 2 Δt 0 ( t) + V * ( x t)f (.) + 1 2 Tr 2 V * ( t)l( t)w( t)l( t) T x 2 10

Stochastic Principle of Optimality (Perfect Measurements)

    E\left[ \frac{dV^*}{dt} \right] = \frac{\partial V^*}{\partial t}(t) + \frac{\partial V^*}{\partial x}(t)\, f(\cdot) + \frac{1}{2}\mathrm{Tr}\left\{ \frac{\partial^2 V^*}{\partial x^2}(t)\, L(t) W(t) L^T(t) \right\}

- Substitute for the total derivative, dV*/dt = -L(x*, u*)
- Solve for the partial derivative, ∂V*/∂t
- The expected value of the HJB equation becomes the stochastic HJB equation:

    -\frac{\partial V^*}{\partial t}(t) = \min_{u} E\left\{ L[x^*(t), u(t), t] + \frac{\partial V^*}{\partial x}(t)\, f[x^*(t), u(t), t] + \frac{1}{2}\mathrm{Tr}\left[ \frac{\partial^2 V^*}{\partial x^2}(t)\, L(t) W(t) L^T(t) \right] \right\}

Boundary (terminal) condition:

    V^*(t_f) = E\{\phi[x^*(t_f)]\}

Observations of the Stochastic Principle of Optimality (Perfect Measurements)

    -\frac{\partial V^*}{\partial t}(t) = \min_{u} E\left\{ L[x^*(t), u(t), t] + \frac{\partial V^*}{\partial x}(t)\, f[x^*(t), u(t), t] + \frac{1}{2}\mathrm{Tr}\left[ \frac{\partial^2 V^*}{\partial x^2}(t)\, L(t) W(t) L^T(t) \right] \right\}

- The control has no effect on the disturbance input
- The criterion for optimality is the same as in the deterministic case
- Disturbance uncertainty increases the magnitude of the total optimal value function, E[V*(0)]

Adaptive Critic Controller

The nonlinear control law, c, takes the general form

    u(t) = c[x(t), a, y^*(t)]

    x(t): state
    a: parameters defining an operating point
    y^*(t): command input

On-line adaptive critic controller:
- Nonlinear control law ("action network")
- Criticizes non-optimal performance via a "critic network"
- Adapts control gains to improve performance, respond to failures, and accommodate parameter variation
- Adapts the cost model to improve performance evaluation

Adaptive Critic Iteration Cycle

[Figure: adaptive critic iteration cycle, from Ferrari and Stengel, "Model-Based Adaptive Critic Designs," in Learning and Approximate Dynamic Programming, Si et al., eds., 2004]

Overview of Adaptive Critic Designs

[Figure: overview of adaptive critic designs]

Modular Description of Typical Iteration Cycle

[Figure: modular description of the adaptive critic iteration cycle]

Gain-Scheduled Controller Design

PI-LQ controllers (LQR with integral compensation) satisfy requirements at N operating points. The scheduling variable, a, identifies each operating point (a coded sketch of this scheduled law follows below):

    u(t_k) = C_F(a_i)\, y^*(t_k) + C_B(a_i)\, \Delta x(t_k) + C_I(a_i) \int_{t_{k-1}}^{t_k} \Delta y(\tau)\, d\tau
           \triangleq c[x(t_k), a_i, y^*(t_k)], \quad i = 1, \ldots, N
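A hedged sketch of the gain-scheduled PI-LQ law above. The gain matrices would come from LQR-with-integral designs at the N operating points; the nearest-design-point lookup and all numerical values are illustrative stand-ins, not the lecture's scheduling logic.

```python
import numpy as np

class GainScheduledPILQ:
    """Nearest-design-point schedule for u = C_F y* + C_B dx + C_I integral(dy)."""
    def __init__(self, gains):
        # gains: {a_i: (C_F, C_B, C_I)} designed at the N operating points
        self.gains = gains
        self.integral = None          # running integral of dy

    def update(self, a, y_cmd, dx, dy, dt):
        if self.integral is None:
            self.integral = np.zeros_like(dy)
        # pick the design point whose scheduling value is closest to a
        a_i = min(self.gains, key=lambda ai: abs(ai - a))
        C_F, C_B, C_I = self.gains[a_i]
        self.integral = self.integral + dy * dt
        return C_F @ y_cmd + C_B @ dx + C_I @ self.integral

# Single-state, single-output toy usage (all numbers are placeholders)
gains = {0.0: (np.array([[1.0]]), np.array([[-2.0]]), np.array([[-0.5]])),
         1.0: (np.array([[1.2]]), np.array([[-2.5]]), np.array([[-0.6]]))}
ctrl = GainScheduledPILQ(gains)
u = ctrl.update(a=0.3, y_cmd=np.array([1.0]), dx=np.array([0.1]),
                dy=np.array([-0.2]), dt=0.02)
```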

Replace Gain Matrices by Neural Networks

Replace the control gain matrices by sigmoidal neural networks:

    u(t_k) = NN_F[y^*(t_k), a(t_k)] + NN_B[x(t_k), a(t_k)] + NN_I\left[ \int \Delta y(\tau)\, d\tau,\; a(t_k) \right]
           = c[x(t_k), a, y^*(t_k)]

Sigmoidal Neuron Input-Output Characteristic

Logistic sigmoid function:

    u = s(r) = \frac{1}{1 + e^{-r}}

or

    u = s(r) = \frac{e^r - 1}{e^r + 1} = \tanh\left(\frac{r}{2}\right)

Sigmoid with two inputs and one output:

    u = s(r) = \frac{1}{1 + e^{-(w_1 r_1 + w_2 r_2 + b)}}
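A small, illustrative check of the sigmoid identities on the slide, plus a two-input sigmoidal neuron; the weights w1, w2, b are arbitrary placeholders.

```python
import numpy as np

r = np.linspace(-5.0, 5.0, 101)
logistic = 1.0 / (1.0 + np.exp(-r))
# (e^r - 1)/(e^r + 1) is the same curve as tanh(r/2)
assert np.allclose((np.exp(r) - 1) / (np.exp(r) + 1), np.tanh(r / 2))

def neuron(r1, r2, w1=0.8, w2=-0.3, b=0.1):
    """Two-input, one-output sigmoidal neuron u = s(w1 r1 + w2 r2 + b)."""
    return 1.0 / (1.0 + np.exp(-(w1 * r1 + w2 * r2 + b)))

u = neuron(0.5, -1.0)
```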

Algebraically Trained Neural Control Law

Algebraic training of the neural networks produces an exact fit of the PI-LQ control gains and trim conditions at the N operating points:
- Interpolation and gain scheduling via neural networks
- One node per operating point in each neural network
- See Ferrari, Stengel, JGCD, 2002 [on Blackboard]

On-line Optimization of the Adaptive Critic Neural Network Controller

The critic adapts the neural network weights to improve performance using approximate dynamic programming (Ferrari, Stengel, JGCD, 2004).

Dual Heuristic Dynamic Programming Adaptive Critic

Dual heuristic dynamic programming adaptive critic for the receding-horizon optimization problem:

    V[x_a(t_k)] = L[x_a(t_k), u(t_k)] + V[x_a(t_{k+1})]

    \frac{\partial V}{\partial u} = \frac{\partial L}{\partial u} + \frac{\partial V}{\partial x_a}\frac{\partial x_a}{\partial u} = 0

    \frac{\partial V}{\partial x_a}[x_a(t)] = NN_C[x_a(t), a(t)]

- Critic and action (i.e., control) networks are adapted concurrently
- The LQ-PI cost function is applied to the nonlinear problem
- Modified resilient backpropagation is used for neural network training on the second time step
(Ferrari, Stengel, JGCD, 2004; a simplified sketch of the critic/action update follows below.)

Action Network On-line Training

Train the action network at time t, holding the critic parameters fixed.

[Figure: action-network training loop: x_a(t), a(t) -> NN_A -> aircraft model -> transition matrices -> state prediction -> utility function derivatives and NN_C -> optimality condition -> NN_A target generation]
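A hedged sketch of the dual-heuristic critic/action recursion above, with a linear critic λ(x) ≈ W x and a linear action u ≈ -K x standing in for the neural networks NN_C and NN_A. The discrete-time model, LQ utility L = ½(xᵀQx + uᵀRu), sample states, and sweep count are illustrative placeholders, not the lecture's aircraft example.

```python
import numpy as np

# Placeholder discrete-time model and LQ utility (not the lecture's aircraft model)
A = np.array([[0.99, 0.10], [0.00, 0.95]])
B = np.array([[0.0], [0.1]])
Q = np.diag([1.0, 0.1])
R = np.array([[0.1]])

def dhp_sweep(W, K, xs):
    """One critic/action update from sample states xs (exact least-squares refit)."""
    lam_targets, u_targets = [], []
    for x in xs:
        u = -K @ x                           # current action output
        x_next = A @ x + B @ u
        lam_next = W @ x_next                # critic output: lambda = dV/dx ~ W x
        # critic target: lambda(x_k) = dL/dx_k + (dx_{k+1}/dx_k)^T lambda(x_{k+1})
        lam_targets.append(Q @ x + A.T @ lam_next)
        # action target from the optimality condition dL/du + (dx_{k+1}/du)^T lambda = 0
        u_targets.append(-np.linalg.solve(R, B.T @ lam_next))
    X = np.array(xs)
    W_new = np.linalg.lstsq(X, np.array(lam_targets), rcond=None)[0].T
    K_new = -np.linalg.lstsq(X, np.array(u_targets), rcond=None)[0].T
    return W_new, K_new

xs = [np.array(v) for v in ([1.0, 0.0], [0.0, 1.0], [1.0, -1.0], [0.5, 0.5])]
W, K = np.zeros((2, 2)), np.zeros((1, 2))
for _ in range(200):                         # repeated sweeps; convergence is not
    W, K = dhp_sweep(W, K, xs)               # guaranteed for arbitrary models
```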

Critic Network On-line Training

Train the critic network at time t, holding the action parameters fixed.

[Figure: critic-network training loop: x_a(t), a(t) -> NN_A -> aircraft model -> transition matrices -> state prediction -> utility function derivatives and NN_C (old) -> target cost gradient -> NN_C target generation]

Adaptive Critic Flight Simulation

- 70-deg bank angle command
- Multiple failures (thrust, stabilator, rudder)
(Ferrari, Stengel, JGCD, 2004)

Stochastic Linear-Quadratic Optimal Control

Stochastic Principle of Optimality Applied to the Linear-Quadratic (LQ) Problem

Quadratic value function:

    V(t_o) = E\left\{ \phi[x(t_f)] + \int_{t_o}^{t_f} L[x(\tau), u(\tau)]\, d\tau \right\}
           = \frac{1}{2} E\left\{ x^T(t_f) S(t_f) x(t_f) + \int_{t_o}^{t_f} \begin{bmatrix} x^T(t) & u^T(t) \end{bmatrix} \begin{bmatrix} Q(t) & M(t) \\ M^T(t) & R(t) \end{bmatrix} \begin{bmatrix} x(t) \\ u(t) \end{bmatrix} dt \right\}

Linear dynamic constraint:

    \dot{x}(t) = F(t) x(t) + G(t) u(t) + L(t) w(t)

Components of the LQ Value Function

The quadratic value function has two parts:

    V(t) = \frac{1}{2} x^T(t) S(t) x(t) + v(t)

Certainty-equivalent value function:

    V_{CE}(t) \triangleq \frac{1}{2} x^T(t) S(t) x(t)

Stochastic value function increment:

    v(t) = \frac{1}{2} \int_t^{t_f} \mathrm{Tr}\left[ S(\tau) L(\tau) W(\tau) L^T(\tau) \right] d\tau

Value Function Gradient and Hessian

For the certainty-equivalent value function, the gradient with respect to the state is

    \frac{\partial V}{\partial x}(t) = x^T(t) S(t)

and the Hessian with respect to the state is

    \frac{\partial^2 V}{\partial x^2}(t) = S(t)

Linear-Quadratic Stochastic Hamilton-Jacobi-Bellman Equation (Perfect Measurements)

Certainty-equivalent plus stochastic terms:

    -\frac{\partial V^*}{\partial t} = \min_{u} E\left\{ \frac{1}{2}\left( x^{*T} Q x^* + 2 x^{*T} M u + u^T R u \right) + x^{*T} S (F x^* + G u) + \frac{1}{2}\mathrm{Tr}\left( S L W L^T \right) \right\}
                                    = \min_{u} \left\{ \frac{1}{2}\left( x^{*T} Q x^* + 2 x^{*T} M u + u^T R u \right) + x^{*T} S (F x^* + G u) + \frac{1}{2}\mathrm{Tr}\left( S L W L^T \right) \right\}

Terminal condition:

    V(t_f) = \frac{1}{2} x^T(t_f) S(t_f) x(t_f)

Optimal Control Law (Perfect Measurements)

Differentiate the right side of the HJB equation with respect to u and set the result equal to zero:

    u^T R + x^T M + x^T S G = 0

Solve for u, obtaining the feedback control law (a numerical check follows below):

    u(t) = -R^{-1}(t)\left[ G^T(t) S(t) + M^T(t) \right] x(t) \triangleq -C(t) x(t)
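A numerical sanity check, with randomly generated placeholder matrices, that minimizing the bracketed expression in the LQ HJB over u reproduces u = -R⁻¹(GᵀS + Mᵀ)x.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n, m = 3, 2
F = rng.standard_normal((n, n)); G = rng.standard_normal((n, m))
A = rng.standard_normal((n, n)); Q = A @ A.T               # Q >= 0
B = rng.standard_normal((m, m)); R = B @ B.T + np.eye(m)   # R > 0
M = 0.1 * rng.standard_normal((n, m))
S = rng.standard_normal((n, n)); S = S + S.T               # symmetric S
x = rng.standard_normal(n)

def hjb_rhs(u):
    # 1/2 (x'Qx + 2x'Mu + u'Ru) + x'S(Fx + Gu)
    return 0.5 * (x @ Q @ x + 2 * x @ M @ u + u @ R @ u) + x @ S @ (F @ x + G @ u)

u_closed_form = -np.linalg.solve(R, G.T @ S @ x + M.T @ x)
u_numeric = minimize(hjb_rhs, np.zeros(m)).x
assert np.allclose(u_numeric, u_closed_form, atol=1e-3)
```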

LQ Optimal Control Law (Perfect Measurements)

    u(t) = -R^{-1}(t)\left[ G^T(t) S(t) + M^T(t) \right] x(t) = -C(t) x(t)

The zero-mean, white-noise disturbance has no effect on the structure or gains of the LQ feedback control law.

Matrix Riccati Equation for Control

Substituting the optimal control law into the HJB equation, the matrix Riccati equation provides S(t):

    \dot{S}(t) = -\left[ Q(t) - M(t) R^{-1}(t) M^T(t) \right] - \left[ F(t) - G(t) R^{-1}(t) M^T(t) \right]^T S(t) - S(t)\left[ F(t) - G(t) R^{-1}(t) M^T(t) \right] + S(t) G(t) R^{-1}(t) G^T(t) S(t), \quad S(t_f) = \phi_{xx}(t_f)

with the feedback control law

    u(t) = -R^{-1}(t)\left[ G^T(t) S(t) + M^T(t) \right] x(t)

- The stochastic value function increases the cost due to the disturbance
- However, its calculation is independent of the Riccati equation:

    \dot{v}(t) = -\frac{1}{2}\mathrm{Tr}\left[ S(t) L(t) W(t) L^T(t) \right]

A numerical sketch of the backward Riccati sweep and of v(t) follows below.
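A hedged sketch of the backward Riccati sweep above, the gain C(t) = R⁻¹(GᵀS + Mᵀ), and the stochastic increment v(t) = ½∫ Tr(S L W Lᵀ) dτ, using backward Euler integration. The constant matrices are placeholders; time-varying F(t), G(t), etc. would simply be evaluated inside the loop.

```python
import numpy as np

F = np.array([[0.0, 1.0], [-1.0, -0.4]])
G = np.array([[0.0], [1.0]])
L = np.array([[0.0], [1.0]])
Q = np.diag([1.0, 0.1]); R = np.array([[0.5]]); M = np.zeros((2, 1))
W = np.array([[0.05]])
S_f = np.diag([2.0, 0.5])                        # terminal weight, phi_xx(t_f)

def riccati_sweep(t0, tf, dt):
    Rinv = np.linalg.inv(R)
    F_t = F - G @ Rinv @ M.T                     # F - G R^-1 M^T
    Q_t = Q - M @ Rinv @ M.T                     # Q - M R^-1 M^T
    S, v = S_f.copy(), 0.0
    C_hist = []
    for _ in range(int(round((tf - t0) / dt))):  # integrate backward from tf to t0
        S_dot = -(Q_t + F_t.T @ S + S @ F_t - S @ G @ Rinv @ G.T @ S)
        v += 0.5 * np.trace(S @ L @ W @ L.T) * dt   # accumulate v(t)
        S = S - S_dot * dt                       # backward Euler step
        C_hist.append(Rinv @ (G.T @ S + M.T))    # gain C(t) = R^-1 (G^T S + M^T)
    return S, v, C_hist

S0, v0, C_hist = riccati_sweep(0.0, 10.0, 1e-3)
```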

Evaluation of the Total Cost (Imperfect Measurements)

Stochastic quadratic cost function, neglecting cross terms:

    J = \frac{1}{2} E\left\{ x^T(t_f) S(t_f) x(t_f) + \int_{t_o}^{t_f} \begin{bmatrix} x^T(t) & u^T(t) \end{bmatrix} \begin{bmatrix} Q(t) & 0 \\ 0 & R(t) \end{bmatrix} \begin{bmatrix} x(t) \\ u(t) \end{bmatrix} dt \right\}
      = \frac{1}{2} \mathrm{Tr}\left\{ S(t_f)\, E\left[ x(t_f) x^T(t_f) \right] + \int_{t_o}^{t_f} \left( Q(t)\, E\left[ x(t) x^T(t) \right] + R(t)\, E\left[ u(t) u^T(t) \right] \right) dt \right\}

or

    J = \frac{1}{2} \mathrm{Tr}\left\{ S(t_f) P(t_f) + \int_{t_o}^{t_f} \left[ Q(t) P(t) + R(t) U(t) \right] dt \right\}

where

    P(t) \triangleq E\left[ x(t) x^T(t) \right], \quad U(t) \triangleq E\left[ u(t) u^T(t) \right]

Optimal Control Covariance

Optimal control vector:

    u(t) = -C(t)\, \hat{x}(t)

Optimal control covariance:

    U(t) = C(t) P(t) C^T(t) = R^{-1}(t) G^T(t) S(t) P(t) S(t) G(t) R^{-1}(t)
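A tiny illustration, with assumed placeholder matrices, of the control covariance U = C P Cᵀ and of the cost-rate terms Tr(QP) and Tr(RU) used above.

```python
import numpy as np

P = np.array([[0.4, 0.1], [0.1, 0.2]])    # assumed state covariance
C = np.array([[1.2, 0.8]])                # assumed LQ gain row
Q = np.diag([1.0, 0.1]); R = np.array([[0.5]])

U = C @ P @ C.T                           # control covariance
cost_rate = 0.5 * (np.trace(Q @ P) + np.trace(R @ U))
```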

Express the Cost with State and Adjoint Covariance Dynamics

Integration by parts expresses S(t_f)P(t_f) in terms of the initial values:

    S(t_f) P(t_f) = S(t_o) P(t_o) + \int_{t_o}^{t_f} \left[ \dot{S}(t) P(t) + S(t) \dot{P}(t) \right] dt

Rewrite the cost function to incorporate the initial cost:

    J = \frac{1}{2} \mathrm{Tr}\left\{ S(t_o) P(t_o) + \int_{t_o}^{t_f} \left[ Q(t) P(t) + R(t) U(t) + \dot{S}(t) P(t) + S(t) \dot{P}(t) \right] dt \right\}

Evolution of State and Adjoint Covariance Matrices (No Control)

With u(t) = 0 and U(t) = 0, the state covariance responds to the random disturbance and initial uncertainty:

    \dot{P}(t) = F(t) P(t) + P(t) F^T(t) + L(t) W(t) L^T(t), \quad P(t_o) \text{ given}

and the adjoint covariance responds to the state weighting and terminal cost weight:

    \dot{S}(t) = -F^T(t) S(t) - S(t) F(t) - Q(t), \quad S(t_f) \text{ given}

Evolution of State and Adjoint Covariance Matrices (Optimal Control)

State covariance response to the random disturbance and initial uncertainty (dependent on S(t)):

    \dot{P}(t) = \left[ F(t) - G(t) C(t) \right] P(t) + P(t) \left[ F(t) - G(t) C(t) \right]^T + L(t) W(t) L^T(t)

Adjoint covariance response to the state weighting and terminal cost weight (independent of P(t)):

    \dot{S}(t) = -F^T(t) S(t) - S(t) F(t) - Q(t) + S(t) G(t) R^{-1}(t) G^T(t) S(t)

Total Cost With and Without Control

With no control:

    J_{no\ control} = \frac{1}{2} \mathrm{Tr}\left\{ S(t_o) P(t_o) + \int_{t_o}^{t_f} S(t) L(t) W(t) L^T(t)\, dt \right\}

With optimal control, the equation for the cost is the same:

    J_{optimal\ control} = \frac{1}{2} \mathrm{Tr}\left\{ S(t_o) P(t_o) + \int_{t_o}^{t_f} S(t) L(t) W(t) L^T(t)\, dt \right\}

... but the evolutions of S(t), and hence S(t_o), are different in the two cases. A numerical sketch of this comparison follows below.
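A hedged sketch of the cost comparison above: a backward Euler sweep of the adjoint covariance equation (with and without the +SGR⁻¹GᵀS term), then J = ½Tr[S(t₀)P(t₀) + ∫ S L W Lᵀ dt]. Only P(t₀) enters this cost form, so the forward P(t) sweep is not needed here. Matrices are illustrative placeholders, reused from the Riccati sketch.

```python
import numpy as np

F = np.array([[0.0, 1.0], [-1.0, -0.4]])
G = np.array([[0.0], [1.0]]); L = np.array([[0.0], [1.0]])
Q = np.diag([1.0, 0.1]); R = np.array([[0.5]]); W = np.array([[0.05]])
S_f = np.diag([2.0, 0.5]); P0 = 0.1 * np.eye(2)
t0, tf, dt = 0.0, 10.0, 1e-3
n_steps = int(round((tf - t0) / dt))

def total_cost(with_control):
    """J = 1/2 Tr[S(t0) P(t0) + integral of S L W L^T dt] with the matching S sweep."""
    S = S_f.copy()
    S_hist = [S.copy()]
    for _ in range(n_steps):                 # backward adjoint sweep from tf
        S_dot = -(F.T @ S + S @ F + Q)
        if with_control:                     # add +S G R^-1 G^T S for the LQ case
            S_dot = S_dot + S @ G @ np.linalg.solve(R, G.T) @ S
        S = S - S_dot * dt
        S_hist.append(S.copy())
    S_hist = S_hist[::-1]                    # index 0 now corresponds to t0
    J = 0.5 * np.trace(S_hist[0] @ P0)
    for k in range(n_steps):
        J += 0.5 * np.trace(S_hist[k] @ L @ W @ L.T) * dt
    return J

J_no_control, J_lq = total_cost(False), total_cost(True)   # expect J_lq <= J_no_control
```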

Information Sets and Expected Cost

The Information Set, I

Sigma algebra (Wikipedia definitions):
- The collection of sets over which a measure is defined
- The collection of events that can be assigned probabilities
- A measurable space

Information available at the current time, t_1:
- All measurements from the initial time, t_o
- All control commands from the initial time

    I[t_o, t_1] = \{ z[t_o, t_1],\; u[t_o, t_1] \}

- Plus the available model structure, parameters, and statistics

    I[t_o, t_1] = \{ z[t_o, t_1],\; u[t_o, t_1],\; f(\cdot),\; Q,\; R,\; \ldots \}

A Derived Information Set, I_D

Measurements may be directly useful, e.g., for
- Displays
- Simple feedback control

... or they may require processing, e.g.,
- Transformation
- Estimation

Example of a derived information set: the history of the mean and covariance from a state estimator,

    I_D[t_o, t_1] = \{ \hat{x}[t_o, t_1],\; P[t_o, t_1],\; u[t_o, t_1] \}

Additional Derived Information Sets

Markov derived information set: the most current mean and covariance from a state estimator,

    I_{MD}(t_1) = \{ \hat{x}(t_1),\; P(t_1),\; u(t_1) \}

Multiple-model derived information set: parallel estimates of the current mean, covariance, and hypothesis probability mass function,

    I_{MM}(t_1) = \{ \hat{x}_A(t_1),\; P_A(t_1),\; u(t_1),\; \Pr(H_A),\; \hat{x}_B(t_1),\; P_B(t_1),\; u(t_1),\; \Pr(H_B),\; \ldots \}

Required and Available Information Sets for Optimal Control

Optimal control requires propagation of information back from the final time; hence, it requires the entire information set, extending from t_o to t_f: I[t_o, t_f].

Separate the information set into knowable and future sets:

    I[t_o, t_f] = I[t_o, t_1] + I(t_1, t_f]

- The knowable set has been received
- The future set is based on current knowledge of the model, current estimates, and the statistics of future uncertainties

Expected Values of State and Control

Expected values of the state and control are conditioned on the information set:

    E[x(t) \mid I_D] = \hat{x}(t)
    E\{ [x(t) - \hat{x}(t)][x(t) - \hat{x}(t)]^T \mid I_D \} = P(t)

... where the conditional expected values are estimates from an optimal filter.

Dependence of the Stochastic Cost Function on the Information Set

    J = \frac{1}{2} E\left\{ E\left[ \mathrm{Tr}\left( S(t_f)\, x(t_f) x^T(t_f) \right) \mid I_D \right] + \int_{t_0}^{t_f} E\left[ \mathrm{Tr}\left( Q\, x(t) x^T(t) \right) \right] dt + \int_{t_0}^{t_f} E\left[ \mathrm{Tr}\left( R\, u(t) u^T(t) \right) \right] dt \right\}

Expand the state covariance:

    P(t) = E\left\{ [x(t) - \hat{x}(t)][x(t) - \hat{x}(t)]^T \mid I_D \right\}
         = E\left\{ x(t) x^T(t) - \hat{x}(t) x^T(t) - x(t) \hat{x}^T(t) + \hat{x}(t) \hat{x}^T(t) \mid I_D \right\}
         = E\left\{ x(t) x^T(t) \mid I_D \right\} - \hat{x}(t) \hat{x}^T(t)

or

    E\left\{ x(t) x^T(t) \mid I_D \right\} = P(t) + \hat{x}(t) \hat{x}^T(t)

... where the conditional expected values are obtained from an optimal filter. (A Monte Carlo check of this identity follows below.)

Certainty-Equivalent and Stochastic Incremental Costs

    J = \frac{1}{2} E\left\{ \mathrm{Tr}\left[ S(t_f)\left( P(t_f) + \hat{x}(t_f) \hat{x}^T(t_f) \right) \right] + \int_0^{t_f} \mathrm{Tr}\left[ Q\left( P(t) + \hat{x}(t) \hat{x}^T(t) \right) \right] dt + \int_0^{t_f} \mathrm{Tr}\left[ R\, u(t) u^T(t) \right] dt \right\}
      \triangleq J_{CE} + J_S

The cost function has two parts, a certainty-equivalent cost and a stochastic increment cost:

    J_{CE} = \frac{1}{2} E\left\{ \mathrm{Tr}\left[ S(t_f)\, \hat{x}(t_f) \hat{x}^T(t_f) \right] + \int_0^{t_f} \mathrm{Tr}\left[ Q\, \hat{x}(t) \hat{x}^T(t) \right] dt + \int_0^{t_f} \mathrm{Tr}\left[ R\, u(t) u^T(t) \right] dt \right\}

    J_S = \frac{1}{2} E\left\{ \mathrm{Tr}\left[ S(t_f) P(t_f) \right] + \int_0^{t_f} \mathrm{Tr}\left[ Q\, P(t) \right] dt \right\}
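A quick Monte Carlo check, illustrative only, of the identity E[x xᵀ | I_D] = P + x̂ x̂ᵀ; the conditional mean and covariance below are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
x_hat = np.array([1.0, -0.5])                      # assumed conditional mean
P = np.array([[0.3, 0.1], [0.1, 0.2]])             # assumed conditional covariance

samples = rng.multivariate_normal(x_hat, P, size=200_000)
second_moment = samples.T @ samples / len(samples)  # sample E[x x^T]
assert np.allclose(second_moment, P + np.outer(x_hat, x_hat), atol=2e-2)
```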

Expected Cost of the Trajectory

Optimized cost function:

    V^*(t_o) \triangleq J^* = E\left\{ \phi[x^*(t_f)] + \int_{t_o}^{t_f} L[x^*(\tau), u^*(\tau)]\, d\tau \right\}

Law of total expectation over the information partition:

    E(J^*) = E\left( J^* \mid I[t_o, t_1] \right) \Pr\left\{ I[t_o, t_1] \right\} + E\left( J^* \mid I(t_1, t_f] \right) \Pr\left\{ I(t_1, t_f] \right\}

Because the past is established at t_1, Pr{I[t_o, t_1]} = 1:

    E(J^*) = E\left( J^* \mid I[t_o, t_1] \right) + E\left( J^* \mid I(t_1, t_f] \right) \Pr\left\{ I(t_1, t_f] \right\}

Expected Cost of the Trajectory (continued)

For planning:

    E(J^*) = E\left( J^* \mid I(t_0, t_f) \right) \Pr\left\{ I(t_0, t_f) \right\}

For post-trajectory analysis (with perfect measurements), Pr{I(t_0, t_f)} = 1:

    E(J^*) = E\left( J^* \mid I(t_0, t_f) \right)

For real-time control at t_1, the first term is known and the second is estimated:

    E(J^*) = E\left( J^* \mid I(t_0, t_1) \right) + E\left( J^* \mid I(t_1, t_f) \right) \Pr\left\{ I(t_1, t_f) \right\}

Dual Control (Fel'dbaum, 1965)

- Nonlinear system
- Uncertain system parameters to be estimated
- Parameter estimation can be aided by test inputs

Approach: minimize the value function with three increments, nominal control, cautious control, and probing control:

    \min_{u} V^* = \min_{u}\left( V^*_{nominal} + V^*_{cautious} + V^*_{probing} \right)

Estimation and control calculations are coupled and necessarily recursive.

Next Time: Linear-Quadratic-Gaussian Regulators