An Overview of Risk-Sensitive Stochastic Optimal Control

An Overview of Risk-Sensitive Stochastic Optimal Control Matt James Australian National University Workshop: Stochastic control, communications and numerics perspective UniSA, Adelaide, Sept. 2004. 1

Outline A brief history Approaches to controller design Relationships among approaches Some limit results (state feedback) Robust interpretation Partially observed case Quantum systems References 2

A brief history The risk-sensitive criterion introduced by Jacobson 1973. Linear system with Gaussian noise: dx = (Ax + Bu)dt + dw [Linear-Exponential-Quadratic-Gaussian (LEQG)] Minimize average of exponential of quadratic cost: ( )] T J µ 1 = E x,t [expµ 2 [x Qx + 1 2 u 2 ] ds + 1 2 x Rx µ > 0 - risk-sensitive µ < 0 - risk-seeking t 3

solution Optimal state feedback control is u (x, t) = B P µ t x where P µ t = P µ t A A P µ t + P µ t (BB µi) P µ t Q, P µ T = R 0 t T [control type Riccati equation] 4

Limit µ 0: recover standard LQG control (risk-neutral). Linear system with Gaussian noise: dx = (Ax + Bu)dt + dw Minimize average of quadratic cost: E x,t [ T t 1 2 [x Qx + 1 2 u 2 ] ds + 1 2 x Rx Solution: Optimal state feedback control is u (x, t) = B P t x [Linear-Quadratic-Gaussian (LQG)] ] where P t = P t A A P t + P t BB P t Q, P T = R 5

partially observed case Linear system with Gaussian noise: dx dy = (Ax + Bu)dt + dw = Cxdt + dv Minimize average of exponential of quadratic cost ( )] T J µ 1 = E x,t [expµ 2 [x Qx + 1 2 u 2 ] ds + 1 2 x Rx over causal maps y( ) u( ). t 6

Solution: Whittle 1981, Bensoussan-van Schuppen 1985. Optimal partially observed control is u t = B P µ t ˆx µ t where dˆx µ = ([A + µy µ Q]ˆx µ + Bu)dt + Y µ C (dy Cˆx µ dt) and Ẏ µ = AY µ + Y µ A + µy µ QY µ + I Y µ C CY µ Significantly, this is not the Kalman filter. When µ 0, LEQG LQG, whose solution is expressed in terms of the Kalman filter. 7

Some subsequent developments: LEQG. Connection with dynamic games. (Jacobson, 1973) LEQG. Connection with robust control (H ). (Glover-Doyle, 1988) Risk-sensitive control for nonlinear systems, state feedback. Connection with dynamic games. (Fleming-McEneaney 1992, James, 1992, Nagai 1996) Solution of risk-sensitive control for nonlinear systems, output feedback. Solution and connection with dynamic games and nonlinear robust control. (James-Baras-Elliott 1994, Charalambous-Hibey 1996, Dai Pra-Meneghini-Runggaldier 1996. ) Robustness properties of risk-sensitive criterion. Also stochastic IQC. (James, Fleming, Boel, Dupuis, Petersen, 1995, 1998) Risk-sensitive control for finance. (Nagai 2000) Risk-sensitive control for quantum systems. (James 2004) 8

Approaches to controller design The basic aim of robust control is: Design a controller to achieve satisfactory performance in the presence of disturbances and uncertainty. Stochastic control has similar aims. 9

Dynamics ẋ s = f(x s ) + g(x s )d s, 0 < s < T, z s = h(x s ) H robust control Disturbance Output Map f(0) = 0, h(0) = 0, x 0 = 0 H Norm Note: T d, z γ iff T d, z : d( ) z( ) T d, z = T 0 z(t) 2 dt sup d L 2 [0,T] T 0 z 2 d 2 γ 2 d(t) 2 dt d L 2 [0, T] 10

D-H 2 deterministic optimal control disturbance: none controller: Optimal cost gives optimal performance without special consideration of disturbances. W(x, t) = inf u [ T t L(x s, u s ) ds + Φ(x T ) ] Dynamics ẋ s = b(x s, u s ), t < s < T, x t = x 11

S-H 2 (RN) stochastic optimal control disturbance: controller: Optimal cost noise gives optimal average performance W ε (x, t) = inf u E x,t [ T t L(x ε s, u s ) ds + Φ(x T ) ] Dynamics dx ε s = b(x ε s, u s ) ds + ε db s, t < s < T, x ε t = x (ε > 0 - noise intensity) 12

RS disturbance: noise stochastic risk-sensitive optimal control controller: gives optimal average performance using exponential cost (heavily penalizes large values) Optimal cost S µ,ε (x, t) = inf u E x,t [ exp µ ε ( T t )] L(x ε s, u s ) ds + Φ(x ε T) Dynamics dx ε s = b(x ε s, u s ) ds + ε db s, t < s < T, x ε t = x (µ > 0 - risk sensitivity) 13

H H robust control disturbance: L 2 signal controller: Find u( ) such that Dynamics achieves specified H norm constraint (worst-case design) T d, z γ ẋ s = b(x s, u s ) + d s t < s < T x t = x z s = h(x s ) (γ = 1/ µ - H constraint) 14

D-DG deterministic differential game disturbance: L 2 signal controller: Optimal cost Dynamics W µ (x, t) = sup d gives optimal performance subject to opposing efforts of disturbance (worst-case design) ẋ s x t inf u [ T t ] L(x s, u s ) 1 d 2µ s 2 ds + Φ(x T ) = b(x s, u s ) + d s, t < s < T = x 15

S-DG stochastic differential game disturbances: controller: Optimal cost noise W µ,ε (x, t) = sup d L 2 signal gives optimal average performance subject to opposing efforts of disturbance (worst-case design) inf u E x,t [ T t L(x ε s, u s ) 1 2µ d s 2 ds + Φ(x ε T) ] Dynamics dx ε s x ε t = (b(x ε s, u s ) + d s ) ds + ε db s, t < s < T = x 16

Relationships among approaches RS µ 0 S-H 2 ε 0 ε 0 D-DG µ 0 D-H 2 RS, S-DG, D-DG, and S-H 2 are perturbations of D-H 2. 17

(James, Fleming-McEneaney, Whittle) RS S-DG D-H 2 + (noise variance term) + (disturbance energy term) Equivalence via logarithmic transformation, dynamic programming PDEs, and representation theorem Expansion valid for small noise intensity and small risk-sensitivity 18

S-H 2 D-H 2 + (noise variance term) H D-H 2 + (disturbance energy term) RS H 2 + H??? 19

Some limit results (state feedback) Assumptions b(x, u) = f(x) + g(x)u, where f : R n R n, g : R m R n and their first order derivatives Df, Dg are bounded and uniformly Lipschitz continuous. Control values: U R m is compact and convex. Disturbance values: D = R n. Φ : R n R is non negative; Φ and DΦ are bounded and uniformly Lipschitz continuous. 20

L : R n R m R is C 2, non negative; L(, u) and D x L(, u) are bounded and uniformly Lipschitz continuous, uniformly in u U; and D 2 ul(x, ) > 0, uniformly in x R n. There exists locally Lipschitz U (x, p) achieving maximum in min u U [p b(x, u) + L(x, u)] Use Elliott Kalton formulation for two player zero sum differential games (strategies, upper and lower values, etc) For later use, assume also L(x, u) = 1 2 u 2 + V (x), and the functions f, g, V and Φ are of class C and have compact support. U (x, p) is of class C 2. 21

Viscosity Solutions Viscosity solutions introduced by Crandall-Lions 1983. The definition for a fully nonlinear parabolic PDE v t + F(x, Dv, D2 v) = 0 in R n (0, T) v(x, T) = Φ(x) in R n. is as follows. 22

Definition An upper semicontinuous (u.s.c.) function v (resp. l.s.c. function v) is called a viscosity subsolution (resp. supersolution) if and v Φ (resp. v Φ) on R n {T } t φ(x, t) + F(x, Dφ(x, t), D2 φ(x, t)) 0 (resp. 0) for every smooth function φ and any local maximum (resp. minimum) (x, t) R n [0, T) of v φ (resp. v φ). A continuous function is called a viscosity solution if it is both a subsolution and a supersolution. 23

Comparison Theorem If v is a subsolution and if v is a supersolution, then v v in R n [0, T]. Thus any continuous viscosity solution is unique. 24

Dynamic Programming PDEs RS (ε > 0, µ > 0) S µ,ε t + ε 2 Sµ,ε + min u U [DS µ,ε b(x, u) + µ ε L(x, u)sµ,ε ] = 0 in R n (0, T) S µ,ε (x, T) = exp µ ε Φ(x) in Rn Logarithmic transformation: (Fleming, 1970 s) W µ,ε (x, t) = ε µ log Sµ,ε (x, t) 25

S-DG (ε > 0, µ > 0) W µ,ε t + ε 2 W µ,ε + min u U max d D [DW µ,ε (b(x, u) + d) + L(x, u) 1 2µ d 2 ] = 0 in R n (0, T) W µ,ε (x, T) = Φ(x) in R n. Note: min u U max d D [ p (b(x, u) + d) + L(x, u) 1 2µ d 2 ] = min u U [p b(x, u) + L(x, u)] + 1 2 µ p 2 26

Representation theorem Viscosity formulation of stochastic differential games developed by Fleming Souganidis (1989). Theorem (James, 1992; Fleming-McEneany 1992) The function W µ,ε defined by the logarithmic transformation is the optimal cost function of the stochastic differential game defined above. i.e. RS S-DG Proof : Methods of Fleming (1971) imply DW µ,ε (x, t) C for all (x, t) R n [0, T]. In view of this bound, the set of disturbance values D can be taken as bounded. The result now follows from Fleming Souganidis (1989). 27

Limit PDEs Obtained as ε 0 and/or µ 0. D-DG (ε 0, µ > 0) W µ t + min u U max d D [DW µ (b(x, u) + d) + L(x, u) 1 2µ d 2 ] = 0 in R n (0, T) W µ (x, T) = Φ(x) in R n. 28

S-H 2 (ε > 0, µ 0) W ε t + ε 2 W ε + min u U [DW ε b(x, u) +L(x, u)] = 0 in R n (0, T) W ε (x, T) = Φ(x) in R n. D-H 2 (ε 0, µ 0) W t + min u U [DW b(x, u) + L(x, u)] = 0 in R n (0, T) W(x, T) = Φ(x) in R n. 29

Limit theorems depend on general viscosity limit techniques developed by Evans Ishii (1984) Barles Perthame (1987) Limit theorems Theorem (James, 1992) We have lim W µ,ε (x, t) = W µ (x, t) ε 0 uniformly on compact subsets, where W µ C(R n [0, T]) is the unique bounded continuous viscosity solution of the corresponding PDE and is the optimal cost function of the deterministic differential game defined above. i.e. S-DG D-DG as ε 0. 30

Proof: Assumptions imply The function 0 W µ,ε (x, t) C for all (x, t) R n [0, T], µ, ε > 0 v(x, t) = is u.s.c. and a viscosity subsolution. The function v(x, t) = is l.s.c and a viscosity supersolution. lim sup W µ,ε (y, s). ε 0, y x, s t lim inf W µ,ε (y, s) ε 0, y x, s t By construction v v By the comparison theorem v v 31

Therefore lim ε 0 W µ,ε (x, t) = v = v = v From Evans Souganidis (1984) the value W µ of the D-DG is the unique viscosity solution, and hence W µ = v. Expansion of optimal cost for S-DG W aε,bε (x, t) = W(x, t) + ε (W g (x, t), W n (x, t)) a b +... [methods of Fleming, 1970 s] 32

Robust interpretation (Boel, Dupuis, James, Petersen 1998, 2000) We wish to interpret the finiteness of the risk-sensitive criterion for a nominal system G 0 in terms of a performance bound for a perturbed system G. nominal system w G 0 Z 33

Nominal system dynamics: dx t = G 0 : b(x t )dt + ε 1 2 σ(x t )dw t Z t = h(x t ) Risk-sensitive criterion: S(x, T). = E x [ exp 1 2γ 2 ε T 0 Z t 2 dt ] γ 2 = 1/µ 34

Key tool is the representation [Dupuis-Ellis 1997] [ ] γ 2 1 T ( ε log S(x, T) = sup E x Zt 2 γ 2 v t 2) dt v V T 2 where X t is given by perturbed system dynamics: d G : X t = b( X t )dt + σ( X t )v t dt + ε 1 2 σ( X t )dw t Z t = h( X t ) 0 35

perturbed system w G Z v X perturbation E x [ 1 2 T 0 Z t 2 dt ] γ 2 E x [ 1 2 T 0 v t 2 dt ] robustness inequality + γ 2 ε log S(x, T) (analogous to H inequality) 36

Partially observed (output feedback) case (James-Baras-Elliott, 1994) Dynamics dx t dỹ t = b(x t, u t ) dt + ε dw t = h(x t ) dt + ε dv t Objective γ 2 = 1/µ [ ( )] J(u) = E exp µ T L(x t, u t ) dt + Φ(x T ) ε 0 37

Information state σ µ, ε solution of RS t (x) not cond prob σ µ,ε t, η = E 0 [η(x ε t)z ε t exp ( µ ε t 0 ) L(x ε s, u s ) ds Y t ] for all η Dynamics dσ µ,ε t = ( A u t + µ ) ε Lu t σ µ,ε t dt + 1 ε hσµ,ε t dỹt, ε σ µ,ε 0 = ρ in L 1 (R n ), stochastic PDE Representation J(u) = E 0 [ σ µ,ε T, exp µ ε Φ ] Linear case: Bensoussan-van Schuppen 38

Dynamic programming Value function S µ,ε (σ, t) = inf u Ũt,T E 0 σ,t [ ] σ µ,ε T, e µ ε Φ. Dyn prog equation t Sµ,ε + 1 2ε D2 S µ,ε (hσ, hσ) { + inf u U DS µ,ε (A u σ + µ ε Lu σ )} = 0 S µ,ε (σ, T) = σ, e µ ε Φ Optimal controller u t(σ) = achieves min in d.p.e. infinite dim, nonlinear, 2nd-order information state feedback 39

Small Noise limit Duality max-plus large deviations σ, ν = R n σ(x)ν(x) dx (p, q) = sup x R n { p(x) + q(x) } lim ε 0 ε µ log e µ ε p, e µ ε q = (p, q) 40

Information state Value function lim ε 0 lim ε 0 ε µ log σµ,ε t (x) = p µ t (x), ε µ log Sµ,ε (e µ ε p, t) = W µ (p, t) limit limit Some consequences Can solve output feedback deterministic dynamic games (D-DG) using analogous, though deterministic, information state methods (cf. max-plus). Output feedback nonlinear H control problem can be solved using the D-DG methods. 41

Risk-sensitive control of quantum systems (James, 2004 - current) system (atom) H X bath (EM field) Γ db,db System: eg. atom, can be controlled, Hilbert space H Bath: e.g. electromagnetic field, continuously monitored, Hilbert space Γ (Fock space) 42

Dynamics on H Γ: [Hudson-Parthasarathy quantum stochastic DE, 1980 s] du(t) = { K(u(t))dt + LdB (t) L db(t) } U(t) with initial condition U(0) = I, where K(u) = i H(u) + 1 2 L L System operators evolve according to X(t) = j t (u, X) = U (t)x IU(t) i.e. dx(t) + (X(t)K(t) + K (t)x(t) L (t)x(t)l(t))dt = [X(t), L(t)]dB (t) [X(t), L (t)]db(t) where L(t) = j t (u, L), K(t) = j t (u, K(u(t))) 43

Measurements of real quadrature of field: Q t = B t + B t dy (t) = (L(t) + L (t))dt+ dq(t) output field input field Risk-sensitive cost: [James, 2004] J µ = π 0 vv, R (T)e µc2(t) R(T) where dr(t) dt [ A, B = tr(a B)] = µ 2 C 1(t)R(t), C 1 (t) = j t (u, C 1 (u(t))), C 2 (t) = j t (u, C 2 ) π 0 = initial system state vv = field vacuum state 44

Stochastic representation: J µ = E 0 [ σ µ T, eµc 2 ] where P 0 is standard Wiener measure, and the information state (an unnormalized density operator) satisfies dσ µ t +[Kσ µ t +σ µ t K Lσ t L ]dt = µ 1 2 [C 1σ µ t +σ µ t C 1 ]dt+[lσ µ t +σ µ t L ]dy(t) [modified stochastic master equation] π µ t = σµ t σ t, 1 [unnormalized state, new to physics] When µ 0 recover the Belavkin quantum filtering equation (stochastic master equation) (1980 s) [unnormalized state σ t, normalized state π t = σ t / σ t, 1 ]. 45

The dynamic programming equation is, formally, t Sµ (σ, t) + inf u U L µ;u S µ (σ, t) Discrete time (James 2004) = 0, 0 t < T S µ (σ, T) = σ, e µc 2 Some related current work Robustness interpretation, discrete time (James-Petersen 2004) Robustness, linear/quadratic case (Doherty, d Helon, James, Petersen, Wilson) 46

References [1] T. Basar and P. Bernhard. H Optimal Control and Related Minimax Design Problems. Birkhauser, Boston, 1995. [2] A. Bensoussan and J. H. van Schuppen. Optimal control of partially observable stochastic systems with an exponential-of-integral performance index. SIAM Journal on Control and Optimization, 23:599 613, 1985. [3] P. Dupuis and R.S. Ellis. A Weak Convergence Approach to the Theory of Large Deviations. Wiley, New York, 1997. [4] W.H. Fleming and W.M. McEneaney. Risk-sensitive control on an infinite time horizon. SIAM J. Control and Optimization, 33:1881 1915, 1995. [5] K. Glover and J. Doyle. State space formulea for all stabilizing controllers that statisfy an H norm bound and relations to risk-sensitivity. Systems and Control Letters, 11:167 172, 1988. [6] J.W. Helton and M.R. James. Extending H Control to Nonlinear Systems: Control of Nonlinear Systems to Achieve Performance Objectives, volume 1 of Advances in Design and Control. SIAM, Philadelphia, 1999. [7] D.H. Jacobson. Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games. IEEE Trans. Aut. Control, 18(2):124 131, 1973. [8] M. R. James, J.S. Baras, and R.J. Elliott. Risk-sensitive control and dynamic games for partially observed discrete-time nonlinear systems. IEEE Trans. Automatic Control, 39(4):780 792, 1994. [9] M.R. James. Asymptotic analysis of nonlinear stochastic risk-sensitive control and differential games. Mathematics of Control, Signals and Systems, 5(4):401 417, 1992. [10] M.R. James. Risk-sensitive optimal control of quantum systems. Phys. Rev. A, 69:032108, 2004. 47

[11] P. Dai Pra, L. Meneghini, and W.J. Runggaldier. Connections between stochastic control and dynamic games. Math. Control, Signals and Systems, 9(4):303 326, 1996. [12] A.J. van der Schaft. L 2 -Gain and Passivity Techniques in Nonlinear Control. Springer Verlag, New York, 1996. [13] P. Whittle. Risk-sensitive linear quadratic Gaussian control. Advances in Applied Probability, 13:764 777, 1981. [14] G. Zames. On the input-output stability of time-varying nonlinear feedback systems. part i: Conditions derived using concepts of loop gain, conicity, and positivity. part ii: Conditions involving circles in the frequency plane and sector nonlinearities. IEEE Trans. Aut. Control, 11:228 238, 465 476, 1966. [15] G. Zames. Feedback and optimal sensitivity: Model reference transformation, multiplicative seminorms and approximate inverses. IEEE Trans. Aut. Control, 26:301 320, 1981. [16] C.D. Charalambous and J.L. Hibey, Minimum principle for partially observable nonlinear risk-sensitive control problems using measure-valued decompositions, Stochastics and Stochastics Reports, vol. 57, no. 3+4, pp. 247-288, August 1996. [17] W.H. Fleming and M.R. James, The Risk-Sensitive Index and the H 2 and H Norms for Nonlinear Systems, Mathematics of Control, Signals, and Systems, 8(3), 1995, 199-221. [18] H. Nagai, Bellman Equation of Risk-Sensitive Control, SIAM J. Control Optim., 34(1), 1996, 74-101. [19] M.G. Crandall, L.C. Evans and P.L. Lions, Some Properties of Viscosity Solutions of Hamilton Jacobi Equations, Trans. AMS, 282 (2) (1984) 487 502. [20] M.G. Crandall, H. Ishii and P.L. Lions, User s Guide to Viscosity Solutions of Second Order Partial Differential Equations, CEREMADE Report No. 9039, (1990). [21] P. Dupuis, M.R. James, and I.R. Petersen, Robust Properties of Risk-Sensitive Control, Math. Control, Systems and Signals, 13, 318-332, 2000. [22] M. Bardi and I. Capuzzo-Dolcetta, Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations, Birkhauser, Boston, 1997. 48

[23] W.H. Fleming and H.M. Soner, Controlled Markov Processes and Viscosity Solutions, Springer, New York, 1993. 49