Risk-Sensitive and Robust Mean Field Games
Tamer Başar
Coordinated Science Laboratory, Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign, Urbana, IL 61801
IPAM Workshop on Mean Field Games, UCLA, Los Angeles, CA, August 28-September 1, 2017
August 31, 2017
Tamer Başar (ECE/CSL, UIUC), IPAM Workshop on Mean Field Games, August 31, 2017

Outline
- Introduction to nonzero-sum stochastic differential games (NZS SDGs) and Nash equilibrium: role of information structures
- Quick overview of risk-sensitive stochastic control (RS SC): equivalence to zero-sum stochastic differential games
- Mean field game approach to RS NZS SDGs with local state information: Problem 1 (P1)
- Mean field game approach to robust NZS SDGs with local state information: Problem 2 (P2)
- Connections between P1 and P2, and ε-Nash equilibrium
- Extensions and conclusions

General NZS SDGs
N-player state dynamics described by a stochastic differential equation (SDE):
$$dx_t = f(t, x_t, u_t)\,dt + D(t)\,db_t, \qquad x_t|_{t=0} = x_0, \qquad u_t := (u_{1t}, \dots, u_{Nt})$$
Could also be in partitioned form, $x_t = (x_{1t}, \dots, x_{Nt})$:
$$dx_{it} = f_i\big(t, x_{it}, u_{it};\, c_i(t, x_{-i,t}, u_{-i,t})\big)\,dt + D_i(t)\,db_{it}, \qquad i = 1, \dots, N$$
Information structures (control policy of player $i$: $\gamma_i \in \Gamma_i$):
- Closed-loop perfect state for all players: $u_{it} = \gamma_i(t;\, x_\tau, \tau \le t)$, $i = 1, \dots, N$
- Partial (local) state: $u_{it} = \gamma_i(t;\, x_{i\tau}, \tau \le t)$, $i = 1, \dots, N$
- Measurement feedback: $u_{it} = \gamma_i(t;\, y_{i\tau}, \tau \le t)$, with $dy_{it} = h_i(t, x_{it}, x_{-i,t})\,dt + E_i(t)\,db_{it}$, $i = 1, \dots, N$
Loss function for player $i$ (over $[t, T]$):
$$L_i(x_{[t,T]}, u_{[t,T]}) := q_i(x_T) + \int_t^T g_i(s, x_s, u_s)\,ds$$
Take expectations (for horizon $[0, T]$) with $u = \gamma(\cdot)$: $J_i(\gamma_i, \gamma_{-i})$
Nash equilibrium $\gamma^*$:
$$J_i(\gamma_i^*, \gamma_{-i}^*) = \min_{\gamma_i \in \Gamma_i} J_i(\gamma_i, \gamma_{-i}^*)$$

Equilibrium solution
State dynamics and loss functions:
$$dx_t = f(t, x_t, u_t)\,dt + D(t)\,db_t, \qquad x_t|_{t=0} = x_0, \qquad u_t := (u_{1t}, \dots, u_{Nt})$$
$$L_i(x_{[t,T]}, u_{[t,T]}) := q_i(x_T) + \int_t^T g_i(s, x_s, u_s)\,ds, \qquad i = 1, \dots, N$$
Closed-loop perfect state for all players: assume $DD^\top$ is strongly positive. A Nash equilibrium exists and is unique if the coupled PDEs below admit a unique smooth solution:
$$-V_{i,t}(t; x) = \min_v \big[ V_{i,x}\, f(t, x, v, u_{-i}^*) + g_i(t, x, v, u_{-i}^*) \big] + \tfrac{1}{2}\mathrm{Tr}\big[V_{i,xx}(t; x)\, DD^\top\big]$$
$$V_i(T; x) = q_i(x), \qquad u_{it}^* = \gamma_i^*(t, x(t)), \qquad i = 1, \dots, N$$
Other dynamic information structures (such as local state, decentralized, measurement feedback): extremely challenging! Possibly infinite-dimensional (even if the NE exists and is unique), even in linear-quadratic (LQ) NZS SDGs.

LQ NZS SDGs with perfect state information
$$dx(t) = \Big[Ax + \sum_{i=1}^N B_i u_i\Big]dt + D\,db(t), \qquad i \in \mathcal{N} = \{1, 2, \dots, N\}$$
$$J_i(\gamma_i, \gamma_{-i}) = E\Big[ |x(T)|^2_{W_i} + \int_0^T \Big( |x(t)|^2_{Q_i(t)} + \sum_{j=1}^N |u_j(t)|^2_{R_{ij}} \Big) dt \Big]$$
The feedback Nash equilibrium:
$$\gamma_i^*(t, x(t)) = -R_{ii}^{-1} B_i^\top Z_i(t)\, x(t), \qquad i \in \mathcal{N}$$
The coupled RDEs, with $Z_i \ge 0$ and $Z_i(T) = W_i$:
$$-\dot Z_i = F^\top Z_i + Z_i F + Q_i + \sum_{j=1}^N Z_j B_j R_{jj}^{-1} R_{ij} R_{jj}^{-1} B_j^\top Z_j, \qquad i \in \mathcal{N}$$
$$F := A - \sum_{i=1}^N B_i R_{ii}^{-1} B_i^\top Z_i$$
When the information is local state, or imperfect measurement (even if shared by all players), existence and characterization remain an open problem.
Any hope for N sufficiently large? The MFG approach provides the answer!

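The coupled RDEs above can be integrated backward from the terminal condition. The following is a minimal sketch for a scalar, symmetric special case that is an assumption of this example and not taken from the slides: all players share $a$, $b$, $q$, $r$, and the cross weights vanish ($R_{ij} = 0$ for $j \ne i$), so only the $j = i$ term survives in the sum. The function name `coupled_rde` and the explicit Euler scheme are illustrative choices.

```python
def coupled_rde(a, b, q, r, w, n_players, horizon, steps):
    """Integrate the scalar coupled Riccati equations backward from t = T to t = 0.

    Assumed special case (hypothetical, for illustration only):
    identical scalar players, R_ii = r, R_ij = 0 for j != i, Z_i(T) = w.
    Returns the list [Z_1(0), ..., Z_N(0)].
    """
    dt = horizon / steps
    z = [w] * n_players  # terminal condition Z_i(T) = W_i
    for _ in range(steps):
        # Closed-loop drift F = A - sum_j B_j R_jj^{-1} B_j Z_j
        f = a - sum(b * b * zj / r for zj in z)
        # -dZ_i/dt = 2 F Z_i + Q_i + Z_i^2 b^2 / r, stepped backward in time
        z = [zi + dt * (2.0 * f * zi + q + zi * zi * b * b / r) for zi in z]
    return z


if __name__ == "__main__":
    print(coupled_rde(a=-1.0, b=1.0, q=1.0, r=1.0, w=0.0,
                      n_players=2, horizon=20.0, steps=2000))
```

With one player the recursion reduces to the standard LQR Riccati equation; with several players the drift $F$ couples every $Z_i$ to all the others, which is what makes the general (non-symmetric, matrix) case hard.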
Risk-sensitive (RS) formulation of the NZS SDG
Replace $J_i$ with
$$J_i(\gamma_i, \gamma_{-i}) = \frac{2}{\theta} \ln E\Big\{ \exp\Big[\frac{\theta}{2} L_i(x_{[0,T]}, u_{[0,T]})\Big] \Big\}$$
where $\theta > 0$ is the risk sensitivity parameter and, as before,
$$L_i(x_{[t,T]}, u_{[t,T]}) := q_i(x_T) + \int_t^T g_i(s, x_s, u_s)\,ds, \qquad u_{it} = \gamma_i(\cdot), \quad i = 1, \dots, N$$
Nash equilibrium is defined as before, and the same difficulties with regard to information structures arise.

Digression: Risk-sensitive (RS) stochastic control
State dynamics:
$$dx_t = f(t, x_t, u_t)\,dt + \sqrt{\epsilon}\, D\,db_t; \qquad x_t|_{t=0} = x_0$$
$b_t$, $t \ge 0$: standard Wiener process; $\epsilon > 0$; $u_t \in U$, $t \ge 0$ (state FB control law $\mu \in \mathcal{M}$)
Objective: choose $\mu$ to minimize ($\theta > 0$)
$$J(\mu; t, x_t) = \frac{2\epsilon}{\theta} \ln E\Big\{ \exp\Big[\frac{\theta}{2\epsilon} L(x_{[t,T]}, u_{[t,T]})\Big] \Big\}, \qquad L(x_{[t,T]}, u_{[t,T]}) := q(x_T) + \int_t^T g(s, x_s, u_s)\,ds$$
$\psi(t; x)$: value function associated with
$$E\Big\{ \exp \frac{\theta}{2\epsilon}\Big[ q(x_T) + \int_t^T g(s, x_s, u_s)\,ds \Big] \Big\}$$
$$V(t; x) := \inf_{\mu \in \mathcal{M}} J(\mu; t, x) =: \frac{2\epsilon}{\theta} \ln \psi(t; x)$$

DP and Itô differentiation rule
$$-V_t(t; x) = \inf_{u \in U} \big\{ V_x(t; x)\, f(t, x, u) + g(t, x, u) \big\} + \frac{1}{4\gamma^2} \big| D^\top V_x^\top(t; x) \big|^2 + \frac{\epsilon}{2} \mathrm{Tr}\big[ V_{xx} DD^\top \big], \qquad V(T; x) \equiv q(x) \qquad (\gamma^{-2} := \theta)$$
If $U = \mathbb{R}^{m_1}$, $f$ linear in $u$, and $g$ quadratic in $u$:
$$f(t, x, u) = f_0(t, x) + B(t, x)\,u; \qquad g(t, x, u) = g_0(t, x) + |u|^2$$
Optimal control law:
$$u^*(t) = \mu^*(t, x) = -\tfrac{1}{2} B^\top(t, x)\, V_x^\top(t; x), \qquad 0 \le t \le t_f$$
HJB equation:
$$-V_t = V_x f_0(t, x) + g_0(t, x) - \tfrac{1}{4} \big[ |B^\top V_x^\top|^2 - \gamma^{-2} |D^\top V_x^\top|^2 \big] + \frac{\epsilon}{2} \mathrm{Tr}\big[ V_{xx}(t; x)\, DD^\top \big]; \qquad V(T; x) \equiv q(x)$$

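The quadratic gradient term in the HJB equation comes from the logarithmic transformation $V = (2\epsilon/\theta)\ln\psi$. The following sketch of that step assumes $\psi$ is smooth and positive and satisfies the linear dynamic programming equation for the exponentiated cost; these regularity assumptions are stated here for illustration and are not spelled out on the slide.

```latex
\begin{align*}
\psi &= \exp\!\Big(\tfrac{\theta}{2\epsilon} V\Big), \qquad
\psi_t = \tfrac{\theta}{2\epsilon} V_t\, \psi, \qquad
\psi_x = \tfrac{\theta}{2\epsilon} V_x\, \psi, \qquad
\psi_{xx} = \Big( \tfrac{\theta}{2\epsilon} V_{xx}
          + \tfrac{\theta^2}{4\epsilon^2} V_x^\top V_x \Big) \psi, \\
-\psi_t &= \inf_{u \in U} \Big\{ \psi_x f + \tfrac{\theta}{2\epsilon}\, g\, \psi \Big\}
          + \tfrac{\epsilon}{2} \mathrm{Tr}\big[ \psi_{xx} DD^\top \big], \\
\text{divide by } \tfrac{\theta}{2\epsilon}\psi > 0: \quad
-V_t &= \inf_{u \in U} \big\{ V_x f + g \big\}
      + \tfrac{\epsilon}{2} \mathrm{Tr}\big[ V_{xx} DD^\top \big]
      + \tfrac{\theta}{4} \big| D^\top V_x^\top \big|^2 .
\end{align*}
```

With $\gamma^{-2} := \theta$ the last term is $\frac{1}{4\gamma^2}|D^\top V_x^\top|^2$, using $\mathrm{Tr}[V_x^\top V_x DD^\top] = |D^\top V_x^\top|^2$.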
A further special case: LEQG Problem
$$f_0(t, x) = A(t)\,x, \qquad g_0(t, x) = \tfrac{1}{2} x^\top Q x, \quad Q \ge 0, \qquad q(x) = \tfrac{1}{2} x^\top Q_f x$$
Explicit solution:
$$V(t; x) = \tfrac{1}{2} x^\top Z(t)\, x + l^\epsilon(t), \qquad t \ge 0$$
$$\dot Z + A^\top Z + ZA + Q - Z \big( BB^\top - \gamma^{-2} DD^\top \big) Z = 0$$
$$l^\epsilon(t) = \frac{\epsilon}{2} \int_t^{t_f} \mathrm{Tr}\big[ Z(s)\, D(s) D^\top(s) \big]\,ds$$
$$u^*(t) = \mu^*(t, x) = -B^\top(t)\, Z(t)\, x, \qquad 0 \le t \le t_f$$

A class of stochastic differential games
Two players: Player 1: $u_t$; Player 2: $w_t$
$$dx_t = f(x_t, u_t)\,dt + D w_t\,dt + \sqrt{\epsilon}\, D\,db_t; \qquad x_0$$
$$J(\mu, \nu; t, x_t) := E\Big\{ q(x_T) + \int_t^T g(s, x_s, u_s)\,ds - \gamma^2 \int_t^T |w_s|^2\,ds \Big\}$$
Upper-value (UV) function:
$$W(t; x) = \inf_\mu \sup_\nu J(\mu, \nu; t, x)$$
HJI UV equation:
$$\inf_{u \in U} \sup_{w \in \mathbb{R}^{m_2}} \Big\{ W_t + W_x ( f + Dw ) + g - \gamma^2 |w|^2 + \frac{\epsilon}{2} \mathrm{Tr}\big[ W_{xx} DD^\top \big] \Big\} = 0$$

HJI UV equation:
$$\inf_{u \in U} \sup_{w \in \mathbb{R}^{m_2}} \Big\{ W_t + W_x ( f + Dw ) + g - \gamma^2 |w|^2 + \frac{\epsilon}{2} \mathrm{Tr}\big[ W_{xx} DD^\top \big] \Big\} = 0$$
Isaacs condition holds, so the upper value is the value function:
$$-W_t(t; x) = \inf_{u \in U} \big\{ W_x(t; x)\, f(t, x, u) + g(t, x, u) \big\} + \frac{1}{4\gamma^2} \big| D^\top W_x^\top(t; x) \big|^2 + \frac{\epsilon}{2} \mathrm{Tr}\big[ W_{xx}(t; x)\, DD^\top \big]; \qquad W(T; x) \equiv q(x)$$
IDENTICAL with $V$ for all permissible $\epsilon$, $\gamma$.
The same holds for the time-average case (LQ).

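The $\frac{1}{4\gamma^2}$ term in the value-function equation is the result of the inner maximization over $w$, which is a concave quadratic and can be carried out in closed form. A short worked version of that step, written as a sketch:

```latex
\begin{align*}
\sup_{w \in \mathbb{R}^{m_2}} \big\{ W_x D w - \gamma^2 |w|^2 \big\}
  &\;\Longrightarrow\; D^\top W_x^\top - 2\gamma^2 w = 0
  \;\Longrightarrow\; w^* = \frac{1}{2\gamma^2} D^\top W_x^\top, \\
W_x D w^* - \gamma^2 |w^*|^2
  &= \frac{1}{2\gamma^2} \big| D^\top W_x^\top \big|^2
   - \frac{1}{4\gamma^2} \big| D^\top W_x^\top \big|^2
   = \frac{1}{4\gamma^2} \big| D^\top W_x^\top \big|^2 .
\end{align*}
```

Because the maximizer $w^*$ does not depend on $u$ and the terms in $u$ and $w$ separate, the order of $\inf_u$ and $\sup_w$ can be interchanged, which is the Isaacs condition invoked on the slide.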
LQ RS MFGs, Problem 1 (P1)
Stochastic differential equation (SDE) for agent $i$, $1 \le i \le N$:
$$dx_i(t) = \big( A(\theta_i)\, x_i(t) + B(\theta_i)\, u_i(t) \big)\,dt + \sqrt{\mu}\, D(\theta_i)\,db_i(t)$$
The risk-sensitive cost function for agent $i$, with $\delta > 0$:
$$\text{P1:} \quad J^N_{1,i}(u_i, u_{-i}) = \limsup_{T \to \infty} \frac{\delta}{T} \log E\Big\{ e^{\frac{1}{\delta} \phi^1_i(x, f_N, u)} \Big\}$$
$$\phi^1_i(x, f_N, u) := \int_0^T \Big| x_i(t) - \frac{1}{N} \sum_{j=1}^N x_j(t) \Big|^2_Q + |u_i(t)|^2_R \;dt$$
Risk-sensitive control: robust control w.r.t. the risk parameter $\delta$:
$$J^N_{1,i}(u_i, u_{-i}) = \limsup_{T \to \infty} \frac{1}{T} \Big[ E\{\phi^1_i\} + \frac{1}{2\delta} \mathrm{var}\{\phi^1_i\} + o\Big(\frac{1}{\delta}\Big) \Big]$$
- $f_N(t) := \frac{1}{N} \sum_{i=1}^N x_i(t)$: mean field term (mass behavior)
- Agents are coupled with each other through the mean field term
Tembine-Zhu-TB, TAC (59, 4, 2014); Moon-TB, CDC (2014), TAC (62, 3, 2017)

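The mean-plus-variance expansion of the exponentiated cost can be checked numerically on a toy example. The sketch below uses a discrete random variable with made-up values and probabilities (assumptions of this example, unrelated to the actual cost $\phi^1_i$) and compares $\delta \log E\{e^{\phi/\delta}\}$ with $E\{\phi\} + \frac{1}{2\delta}\mathrm{var}\{\phi\}$.

```python
import math


def risk_sensitive_value(values, probs, delta):
    """delta * log E[exp(phi / delta)] for a discrete phi, via log-sum-exp."""
    m = max(values)
    total = sum(p * math.exp((v - m) / delta) for v, p in zip(values, probs))
    return m + delta * math.log(total)


def mean_and_variance(values, probs):
    """Mean and variance of a discrete random variable."""
    mean = sum(p * v for v, p in zip(values, probs))
    var = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    return mean, var


if __name__ == "__main__":
    values = [1.0, 2.0, 4.0]   # hypothetical outcomes of phi
    probs = [0.5, 0.3, 0.2]    # hypothetical probabilities
    mean, var = mean_and_variance(values, probs)
    for delta in (1.0, 10.0, 100.0):
        exact = risk_sensitive_value(values, probs, delta)
        approx = mean + var / (2.0 * delta)
        print(delta, exact, approx)
```

As $\delta$ grows the two columns agree to higher order, and the risk-sensitive value approaches the risk-neutral mean; small $\delta$ puts more weight on the upper tail of the cost.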
LQ Robust MFGs, Problem 2 (P2)
Stochastic differential equation (SDE) for agent $i$, $1 \le i \le N$:
$$dx_i(t) = \big( A(\theta_i)\, x_i(t) + B(\theta_i)\, u_i(t) + D(\theta_i)\, v_i(t) \big)\,dt + \sqrt{\mu}\, D(\theta_i)\,db_i(t)$$
The worst-case risk-neutral cost function for agent $i$:
$$\text{P2:} \quad J^N_{2,i}(u_i, u_{-i}) = \sup_{v_i \in V_i} \limsup_{T \to \infty} \frac{1}{T} E\big\{ \phi^2_i(x, f_N, u, v) \big\}$$
$$\phi^2_i(x, f_N, u, v) := \int_0^T \Big| x_i(t) - \frac{1}{N} \sum_{j=1}^N x_j(t) \Big|^2_Q + |u_i(t)|^2_R - \gamma^2 |v_i(t)|^2 \;dt$$
- $v_i$ can be viewed as a fictitious player (or adversary) of agent $i$, which strives for a worst-case cost function for agent $i$
- Agents are coupled with each other through the mean field term

Mean Field Analysis for P1 and P2
Solve the individual local robust control problem with $g$ instead of $f_N$:
$$\text{P1:} \quad J_1(u, g) = \limsup_{T \to \infty} \frac{\delta}{T} \log E\Big\{ \exp\Big[ \frac{1}{\delta} \int_0^T |x(t) - g(t)|^2_Q + |u(t)|^2_R \;dt \Big] \Big\}$$
$$\text{P2:} \quad J_2(u, v, g) = \limsup_{T \to \infty} \frac{1}{T} E\Big\{ \int_0^T |x(t) - g(t)|^2_Q + |u(t)|^2_R - \gamma^2 |v(t)|^2 \;dt \Big\}$$
Characterize $g^*$ that is a best estimate of the mean field $f_N$:
- need to construct a mean field system $T(g)(t)$
- obtain a fixed point of $T(g)(t)$, i.e., $g^* = T(g^*)$

Robust Tracking Control for P1 and P2
Proposition: Individual robust control problems for P1 and P2
Suppose that $(A, B)$ is stabilizable and $(A, Q^{1/2})$ is detectable. Suppose that for a fixed $\gamma = \sqrt{\delta / 2\mu} > 0$, there is a matrix $P \ge 0$ that solves the following GARE:
$$A^\top P + PA + Q - P \Big( BR^{-1}B^\top - \frac{1}{\gamma^2} DD^\top \Big) P = 0$$
Then
- $H := A - BR^{-1}B^\top P + \frac{1}{\gamma^2} DD^\top P$ and $G := A - BR^{-1}B^\top P$ are Hurwitz
- The robust decentralized controller: $\bar u(t) = -R^{-1}B^\top P x(t) - R^{-1}B^\top s(t)$, where $\dfrac{ds(t)}{dt} = -H^\top s(t) + Q g(t)$
- The worst-case disturbance (P2): $\bar v(t) = \gamma^{-2} D^\top P x(t) + \gamma^{-2} D^\top s(t)$
- $s(t)$ has a unique solution in $C^b_n$: $s(t) = -\displaystyle\int_t^\infty e^{-H^\top (t - s)}\, Q\, g(s)\,ds$
Remark
- The two robust tracking problems are identical
- Related to the robust ($H^\infty$) control problem w.r.t. $\gamma$

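In the scalar case the GARE is a quadratic equation and the tracking signal $s(t)$ has a closed form for an exponential reference. The sketch below is an illustration under assumptions that are not on the slide: all quantities are scalars, $b^2/r - d^2/\gamma^2 > 0$, and the reference is $g(t) = g_0 e^{-\lambda t}$ with $\lambda > 0$. The function names are hypothetical.

```python
import math


def solve_scalar_gare(a, b, d, q, r, gamma):
    """Stabilizing root of 2 a P + q - (b^2/r - d^2/gamma^2) P^2 = 0.

    Returns (P, H, G) with H = a - (b^2/r - d^2/gamma^2) P and G = a - b^2 P / r.
    Assumes c = b^2/r - d^2/gamma^2 > 0 (gamma large enough).
    """
    c = b * b / r - d * d / gamma ** 2
    if c <= 0.0:
        raise ValueError("gamma too small: b^2/r - d^2/gamma^2 must be positive")
    p = (a + math.sqrt(a * a + c * q)) / c
    h_cl = a - c * p          # equals -sqrt(a^2 + c q) < 0
    g_cl = a - b * b * p / r  # G = H - d^2 P / gamma^2 < H
    return p, h_cl, g_cl


def tracking_signal(h_cl, q, g0, lam, t):
    """Bounded solution of ds/dt = -H s + q g(t) for g(t) = g0 * exp(-lam * t).

    Evaluates s(t) = -integral_t^inf exp(H (sigma - t)) q g(sigma) d sigma in closed form.
    """
    return -q * g0 * math.exp(-lam * t) / (lam - h_cl)


if __name__ == "__main__":
    p, h_cl, g_cl = solve_scalar_gare(a=1.0, b=1.0, d=1.0, q=1.0, r=1.0, gamma=2.0)
    print("P =", p, "H =", h_cl, "G =", g_cl)
    print("s(1) =", tracking_signal(h_cl, q=1.0, g0=1.0, lam=0.5, t=1.0))
```

In this scalar setting both closed-loop coefficients come out negative even though the open-loop $a$ is unstable, which is the Hurwitz claim of the proposition specialized to one dimension.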
Mean Field Analysis for P1 and P2
$\bar x_\theta(t) = E\{x_\theta(t)\}$, and we use $h \in C^b_n$ for P2.
Mean field system for P1 (with the robust decentralized controller):
$$T(g)(t) := \int_{\theta \in \Theta,\, x \in X} \bar x_\theta(t)\,dF(\theta, x)$$
$$\bar x_\theta(t) = e^{G(\theta) t} x + \int_0^t e^{G(\theta)(t - \tau)} B(\theta) R^{-1} B^\top(\theta) \Big( \int_\tau^\infty e^{-H(\theta)^\top (\tau - s)}\, Q\, g(s)\,ds \Big) d\tau$$
Mean field system for P2 (with the robust decentralized controller and the worst-case disturbance):
$$L(h)(t) := \int_{\theta \in \Theta,\, x \in X} \bar x_\theta(t)\,dF(\theta, x)$$
$$\bar x_\theta(t) = e^{H(\theta) t} x + \int_0^t e^{H(\theta)(t - \tau)} \big( B(\theta) R^{-1} B^\top(\theta) - \gamma^{-2} D(\theta) D(\theta)^\top \big) \Big( \int_\tau^\infty e^{-H^\top(\theta)(\tau - s)}\, Q\, h(s)\,ds \Big) d\tau$$

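Mean Field Analysis for P1 and P2
- $T(g)(t)$ and $L(h)(t)$ capture the mass behavior when $N$ is large
- Simplest case (SLLN):
$$\lim_{N \to \infty} f_N(t) = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^N x_i(t) = E\{x_i(t)\} = T(g)(t)$$
- We need to seek $g^*$ and $h^*$ such that $g^* = T(g^*)$ and $h^* = L(h^*)$
- Sufficient condition (due to the contraction mapping theorem):
$$\text{P1:} \quad \|R^{-1}\| \|Q\| \int_{\theta \in \Theta} \|B(\theta)\|^2 \Big( \int_0^\infty \|e^{G(\theta)\tau}\|\,d\tau \Big) \Big( \int_0^\infty \|e^{H(\theta)\tau}\|\,d\tau \Big) dF(\theta) < 1$$
$$\text{P2:} \quad \int_{\theta \in \Theta} \Big( \int_0^\infty \|e^{H(\theta) t}\|^2\,dt \Big)^2 \big( \|B(\theta)\|^2 \|R^{-1}\| + \gamma^{-2} \|D(\theta)\|^2 \big)\,dF(\theta) < 1$$
- $\lim_{k \to \infty} T^k(g_0) = g^*$ for any $g_0 \in C^b_n$
- $g^*(t)$ and $h^*(t)$ are best estimates of $f_N(t)$ when $N$ is large
- Generally $g^* \ne h^*$. But when $\gamma \to \infty$, $g^* \to h^*$
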
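The iteration $g_{k+1} = T(g_k)$ can be sketched numerically for P1 in the simplest setting. The code below assumes scalar, homogeneous agents (a single $\theta$, common initial mean `x0`), a truncated time horizon, and trapezoidal quadrature; all of these are assumptions of this example, as are the function and parameter names. Here `g_cl` and `h_cl` stand for the scalar closed-loop coefficients $G$ and $H$, and `kappa` for $B R^{-1} B^\top$.

```python
import math


def mean_field_fixed_point(g_cl, h_cl, kappa, q, x0,
                           horizon=10.0, steps=1000, tol=1e-10, max_iter=200):
    """Iterate g <- T(g) on a uniform grid for the scalar, homogeneous P1 system.

    T(g)(t) = exp(G t) x0
              + int_0^t exp(G (t - tau)) * kappa * tail(tau) dtau,
    tail(tau) = int_tau^inf exp(H (s - tau)) * q * g(s) ds  (truncated at `horizon`).
    Returns the list of grid values of the approximate fixed point.
    """
    dt = horizon / steps
    eg, eh = math.exp(g_cl * dt), math.exp(h_cl * dt)
    g = [0.0] * (steps + 1)  # initial guess g_0 = 0
    for _ in range(max_iter):
        # Backward recursion for the inner (anticipative) integral.
        tail = [0.0] * (steps + 1)
        for k in range(steps - 1, -1, -1):
            tail[k] = eh * tail[k + 1] + 0.5 * dt * q * (g[k] + eh * g[k + 1])
        # Forward recursion for the mean state under the decentralized controller.
        x = [x0] * (steps + 1)
        for k in range(steps):
            x[k + 1] = eg * x[k] + 0.5 * dt * kappa * (eg * tail[k] + tail[k + 1])
        gap = max(abs(xi - gi) for xi, gi in zip(x, g))
        g = x
        if gap < tol:
            break
    return g


if __name__ == "__main__":
    # Scalar example: a = b = d = q = r = 1, gamma = 2 (hypothetical values).
    c = 1.0 - 1.0 / 4.0
    p = (1.0 + math.sqrt(1.0 + c)) / c
    g_star = mean_field_fixed_point(g_cl=1.0 - p, h_cl=1.0 - c * p,
                                    kappa=1.0, q=1.0, x0=1.0)
    print(g_star[0], g_star[100], g_star[200])
```

For these values the scalar contraction constant $\kappa q / (G H)$ is about 0.36, so the iteration converges geometrically from any bounded initial guess, in line with the sufficient condition for P1.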
Main Results for P1 and P2
Existence and characterization of an ε-Nash equilibrium
- There exists an ε-Nash equilibrium with $g^*$ (P1), i.e., there exist $\{u_i^*, 1 \le i \le N\}$ and $\epsilon_N \ge 0$ such that
$$J^N_{1,i}(u_i^*, u_{-i}^*) \le \inf_{u_i \in U_i^c} J^N_{1,i}(u_i, u_{-i}^*) + \epsilon_N,$$
where $\epsilon_N \to 0$ as $N \to \infty$.
- For the uniform agent case, $\epsilon_N = O(1/\sqrt{N})$
- The ε-Nash strategy $u_i^*$ is decentralized, i.e., $u_i^*$ is a function of $x_i$ and $g^*$
- Law of Large Numbers: $g^*$ satisfies
$$\lim_{N \to \infty} \int_0^T \Big| \frac{1}{N} \sum_{i=1}^N x_i^*(t) - g^*(t) \Big|^2 dt = 0, \quad \forall\, T \ge 0, \text{ a.s.}$$
$$\lim_{N \to \infty} \limsup_{T \to \infty} \frac{1}{T} \int_0^T \Big| \frac{1}{N} \sum_{i=1}^N x_i^*(t) - g^*(t) \Big|^2 dt = 0, \quad \text{a.s.}$$
- $g^*$: deterministic function and can be computed offline
- The same results also hold for P2 with the worst-case disturbance

Main Results for P1 and P2
Proof (sketch): Law of large numbers (first part)
$$\int_0^T |f_N^*(t) - g^*(t)|^2\,dt \le 2 \int_0^T \Big| \frac{1}{N} \sum_{i=1}^N \big( x_i^*(t) - E\{x_i^*(t)\} \big) \Big|^2 dt + 2T \sup_t \big| E\{x_i^*(t)\} - g^*(t) \big|^2$$
- The second part is zero (due to the fixed-point theorem)
- $e_i(t) = x_i^*(t) - E\{x_i^*(t)\}$ are mutually orthogonal random vectors with $E\{e_i(t)\} = 0$ and $E\{|e_i(t)|^2\} < \infty$ for all $i$ and $t$
- Strong law of large numbers: $\lim_{N \to \infty} (1/N) \sum_{i=1}^N e_i(t) = 0$ for all $t \in [0, T]$
- Since $\big| (1/N) \sum_{i=1}^N e_i(t) \big|^2$, $N \ge 1$, is uniformly integrable on $[0, T]$ for all $T$, we have the desired result

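The first term in the bound can be illustrated by simulation. The sketch below assumes scalar, homogeneous agents, so that each error $e_i$ follows $de_i = G e_i\,dt + \sqrt{\mu}\,db_i$ with $e_i(0) = 0$, and uses an Euler-Maruyama discretization with a fixed seed; the parameter values and the function name are hypothetical.

```python
import math
import random


def time_avg_sq_mean_error(n_agents, g_cl, mu, horizon, steps, seed):
    """(1/T) * int_0^T |(1/N) sum_i e_i(t)|^2 dt for independent scalar errors.

    Each e_i solves de = g_cl * e dt + sqrt(mu) db with e(0) = 0
    (Euler-Maruyama; an assumed scalar, homogeneous-agent setting).
    """
    rng = random.Random(seed)
    dt = horizon / steps
    noise_scale = math.sqrt(mu * dt)
    e = [0.0] * n_agents
    acc = 0.0
    for _ in range(steps):
        e = [ei + g_cl * ei * dt + noise_scale * rng.gauss(0.0, 1.0) for ei in e]
        mean = sum(e) / n_agents
        acc += mean * mean * dt
    return acc / horizon


if __name__ == "__main__":
    for n in (4, 40, 400):
        print(n, time_avg_sq_mean_error(n, g_cl=-2.0, mu=2.0,
                                        horizon=2.0, steps=200, seed=1))
```

Since the errors are independent with zero mean, the mean square of their average scales like $1/N$, so the printed values should shrink roughly by a factor of ten per row, up to sampling noise.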
Partial Equivalence and Limiting Behaviors of P1 and P2
[Diagram relating LQ-MFG, LQ-RSMFG (P1), and LQ-RMFG (P2); the link between P1 and P2 is labeled "Partial"]
- P1 and P2 share the same robust decentralized controller
- Partial equivalence: the mean field systems (and their fixed points) are different
Limiting behaviors
- Large deviation (small noise) limit ($\mu, \delta \to 0$ with $\gamma = \sqrt{\delta / 2\mu} > 0$): the same results hold under this limit (SDE ⇒ ODE)
- Risk-neutral limit ($\gamma \to \infty$): the results are identical to those of the (risk-neutral) LQ mean field game ($g^* \equiv h^*$)

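The risk-neutral limit can be made concrete in a scalar, homogeneous-agent setting by looking for exponential fixed points $g^*(t) = x_0 e^{\lambda_g t}$ and $h^*(t) = x_0 e^{\lambda_h t}$ of the two mean field systems. This exponential ansatz and the resulting quadratic equations are a derivation made for this example, not a result quoted from the slides: substituting into $T$ gives $(\lambda_g - G)(-\lambda_g - H) = \kappa q$ with $\kappa = b^2/r$, and substituting into $L$ gives $H^2 - \lambda_h^2 = (\kappa - d^2/\gamma^2)\, q$.

```python
import math


def decay_rates(a, b, d, q, r, gamma):
    """Decay rates (lam_g, lam_h) of exponential fixed points for P1 and P2.

    Scalar, homogeneous-agent sketch under an exponential ansatz (assumption).
    Requires c = b^2/r - d^2/gamma^2 > 0 and a != 0.
    """
    kappa = b * b / r
    c = kappa - d * d / gamma ** 2
    if c <= 0.0:
        raise ValueError("gamma too small: b^2/r - d^2/gamma^2 must be positive")
    p = (a + math.sqrt(a * a + c * q)) / c   # scalar GARE solution
    h_cl = a - c * p                         # H
    g_cl = a - kappa * p                     # G
    # P1: lam^2 + (H - G) lam + kappa q - G H = 0, take the negative root.
    disc = (h_cl - g_cl) ** 2 - 4.0 * (kappa * q - g_cl * h_cl)
    lam_g = (-(h_cl - g_cl) - math.sqrt(disc)) / 2.0
    # P2: lam^2 = H^2 - c q, take the negative root.
    lam_h = -math.sqrt(h_cl * h_cl - c * q)
    return lam_g, lam_h


if __name__ == "__main__":
    for gamma in (1.5, 2.0, 15.0, 1e4):
        print(gamma, decay_rates(a=1.0, b=1.0, d=1.0, q=1.0, r=1.0, gamma=gamma))
```

In this sketch the two rates differ for moderate $\gamma$ (partial equivalence) and merge as $\gamma$ grows, which mirrors the statement that $g^*$ and $h^*$ coincide in the risk-neutral limit.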
Simulations (N = 5)
- $A_i = \theta_i$ is an i.i.d. uniform random variable on the interval [2, 5]; $B = D = Q = R = 1$, $\mu = 2$
- $\gamma_\theta = \gamma = 1$: $g^*(t) = 5.86\,e^{-8.49t}$, $h^*(t) = 5.1\,e^{-3.37t}$
- $\epsilon^2(N) := \limsup_{T \to \infty} \dfrac{1}{T} E \displaystyle\int_0^T |f_N^*(t) - g^*(t)|^2\,dt$
- $\gamma$ determines robustness of the equilibrium (due to the individual robust control problems)

[Figure, left panel: histogram of agent states (x-axis: state value, y-axis: number of agents) for γ = 1.5 (P1), γ = 1.5 (P2), and γ = 15 (P1 and P2). Right panel: ε²(N) versus N for the same three cases.]

Conclusions
- Decentralized (local state-feedback) ε-Nash equilibria for LQ risk-sensitive and LQ robust mean field games
- The equilibrium features robustness due to the local robust optimal control problem parametrized by $\gamma$
- LQ risk-sensitive and LQ robust mean field games
  - are partially equivalent ($g^* \ne h^*$)
  - hold the same limiting behaviors as the one-agent case
- Extensions to the heterogeneous case and nonlinear dynamics are possible, but results are not as explicit; see Tembine, Zhu, Başar, IEEE-TAC (59, 4, 2014) for RSMFG
- Imperfect state measurements
- RSMFGs on networks with agents interacting only with their neighbors
- Leader-Follower MFGs

THANKS!