Risk-Sensitive and Robust Mean Field Games


Tamer Başar
Coordinated Science Laboratory
Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
Urbana, IL 61801

IPAM Workshop on Mean Field Games
UCLA, Los Angeles, CA, August 28 - September 1, 2017
August 31, 2017

Outline

- Introduction to nonzero-sum stochastic differential games (NZS SDGs) and Nash equilibrium: role of information structures
- Quick overview of risk-sensitive stochastic control (RS SC): equivalence to zero-sum stochastic differential games
- Mean field game approach to RS NZS SDGs with local state information: Problem 1 (P1)
- Mean field game approach to robust NZS SDGs with local state information: Problem 2 (P2)
- Connections between P1 and P2, and ε-Nash equilibrium
- Extensions and conclusions

General NZS SDGs

N-player state dynamics described by a stochastic differential equation (SDE):
  $dx_t = f(t, x_t, u_t)\,dt + D(t)\,db_t$,  $x_t|_{t=0} = x_0$,  $u_t := (u_{1t}, \dots, u_{Nt})$

Could also be in partitioned form, with $x_t = (x_{1t}, \dots, x_{Nt})$:
  $dx_{it} = f_i(t, x_{it}, u_{it}; c_i(t, x_{-i,t}, u_{-i,t}))\,dt + D_i(t)\,db_{it}$,  $i = 1, \dots, N$

Information structures (control policy of player i: $\gamma_i \in \Gamma_i$):
- Closed-loop perfect state for all players: $u_{it} = \gamma_i(t; x_\tau, \tau \le t)$, $i = 1, \dots, N$
- Partial (local) state: $u_{it} = \gamma_i(t; x_{i\tau}, \tau \le t)$, $i = 1, \dots, N$
- Measurement feedback: $u_{it} = \gamma_i(t; y_{i\tau}, \tau \le t)$, with $dy_{it} = h_i(t, x_{it}, x_{-i,t})\,dt + E_i(t)\,db_{it}$, $i = 1, \dots, N$

Loss function for player i (over $[t, T]$):
  $L_i(x_{[t,T]}, u_{[t,T]}) := q_i(x_T) + \int_t^T g_i(s, x_s, u_s)\,ds$

Take expectations (for horizon $[0, T]$) with $u = \gamma(\cdot)$: $J_i(\gamma_i, \gamma_{-i})$

Nash equilibrium $\gamma^*$:  $J_i(\gamma_i^*, \gamma_{-i}^*) = \min_{\gamma_i \in \Gamma_i} J_i(\gamma_i, \gamma_{-i}^*)$, $i = 1, \dots, N$

Equilibrium solution

State dynamics and loss functions:
  $dx_t = f(t, x_t, u_t)\,dt + D(t)\,db_t$,  $x_t|_{t=0} = x_0$,  $u_t := (u_{1t}, \dots, u_{Nt})$
  $L_i(x_{[t,T]}, u_{[t,T]}) := q_i(x_T) + \int_t^T g_i(s, x_s, u_s)\,ds$,  $i = 1, \dots, N$

Closed-loop perfect state for all players (assume $DD^\top$ is strongly positive): a Nash equilibrium exists and is unique if the coupled PDEs below admit a unique smooth solution:
  $-V_{i,t}(t; x) = \min_v \big[ V_{i,x} f(t, x, v, u_{-i}^*) + g_i(t, x, v, u_{-i}^*) \big] + \tfrac{1}{2} \mathrm{Tr}[V_{i,xx}(t; x) DD^\top]$
  $V_i(T; x) = q_i(x)$,  $u_{it}^* = \gamma_i^*(t, x(t))$,  $i = 1, \dots, N$

Other dynamic information structures (such as local state, decentralized, measurement feedback): extremely challenging! Possibly infinite-dimensional (even if a NE exists and is unique), even in linear-quadratic (LQ) NZS SDGs.

LQ NZS SDGs with perfect state information

  $dx(t) = \big[ Ax + \sum_{i=1}^N B_i u_i \big]\,dt + D\,db(t)$,  $i \in \mathcal{N} = \{1, 2, \dots, N\}$
  $J_i(\gamma_i, \gamma_{-i}) = E\Big[ \|x(T)\|^2_{W_i} + \int_0^T \big( \|x(t)\|^2_{Q_i(t)} + \sum_{j=1}^N \|u_j(t)\|^2_{R_{ij}} \big)\,dt \Big]$

The feedback Nash equilibrium:  $\gamma_i^*(t, x(t)) = -R_{ii}^{-1} B_i^\top Z_i(t) x(t)$,  $i \in \mathcal{N}$

The coupled RDEs, with $Z_i(T) = W_i$:
  $-\dot{Z}_i = F^\top Z_i + Z_i F + Q_i + \sum_{j=1}^N Z_j B_j R_{jj}^{-1} R_{ij} R_{jj}^{-1} B_j^\top Z_j$,  $i \in \mathcal{N}$,  where  $F := A - \sum_{j=1}^N B_j R_{jj}^{-1} B_j^\top Z_j$

When information is local state, or imperfect measurement (even if shared by all players), existence and characterization remain an open problem.

Any hope for N sufficiently large? The MFG approach provides the answer!
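
To make the coupled RDEs concrete, here is a minimal numerical sketch for a two-player scalar version of the game; all parameter values are illustrative assumptions, not taken from the talk.

```python
# Backward integration of the coupled Riccati ODEs for a two-player
# scalar LQ game (a sketch with illustrative parameters).
import numpy as np
from scipy.integrate import solve_ivp

A, B1, B2 = 1.0, 1.0, 1.0                # dx = (A x + B1 u1 + B2 u2) dt + D db
Q1, Q2 = 2.0, 1.0                        # state weights of the two players
R11, R22, R12, R21 = 1.0, 1.0, 0.5, 0.5  # R_ij: player i's weight on u_j
W1, W2, T = 1.0, 1.0, 5.0                # terminal weights Z_i(T) = W_i, horizon

def rhs(t, z):
    Z1, Z2 = z
    F = A - (B1**2 / R11) * Z1 - (B2**2 / R22) * Z2
    # -Zdot_i = 2 F Z_i + Q_i + sum_j (B_j^2 R_ij / R_jj^2) Z_j^2   (scalar form)
    dZ1 = -(2 * F * Z1 + Q1 + (B1**2 * R11 / R11**2) * Z1**2
            + (B2**2 * R12 / R22**2) * Z2**2)
    dZ2 = -(2 * F * Z2 + Q2 + (B1**2 * R21 / R11**2) * Z1**2
            + (B2**2 * R22 / R22**2) * Z2**2)
    return [dZ1, dZ2]

# integrate backward in time from the terminal condition Z_i(T) = W_i
sol = solve_ivp(rhs, (T, 0.0), [W1, W2], rtol=1e-8)
Z1_0, Z2_0 = sol.y[:, -1]
print("Z_1(0) =", Z1_0, " Z_2(0) =", Z2_0)
# feedback Nash strategies along the way: u_i(t) = -(B_i / R_ii) Z_i(t) x(t)
```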

Risk-sensitive (RS) formulation of the NZS SDG

Replace $J_i$ with
  $J_i(\gamma_i, \gamma_{-i}) = \frac{2}{\theta} \ln E\big\{ \exp\big( \frac{\theta}{2} L_i(x_{[0,T]}, u_{[0,T]}) \big) \big\}$
where $\theta > 0$ is the risk-sensitivity parameter, and, as before,
  $L_i(x_{[t,T]}, u_{[t,T]}) := q_i(x_T) + \int_t^T g_i(s, x_s, u_s)\,ds$
and $u_{it} = \gamma_i(\cdot)$, $i = 1, \dots, N$.

Nash equilibrium is defined as before, and the same difficulties with regard to information structures arise as before.

Digression: Risk-sensitive (RS) stochastic control

State dynamics:  $dx_t = f(t, x_t, u_t)\,dt + \sqrt{\epsilon}\, D\, db_t$;  $x_t|_{t=0} = x_0$
$b_t$, $t \ge 0$: standard Wiener process;  $\epsilon > 0$;  $u_t \in U$, $t \ge 0$  (state-FB control law $\mu \in M$)

Objective: choose $\mu$ to minimize ($\theta > 0$):
  $J(\mu; t, x_t) = \frac{2\epsilon}{\theta} \ln E\big\{ \exp\big( \frac{\theta}{2\epsilon} L(x_{[t,T]}, u_{[t,T]}) \big) \big\}$
  $L(x_{[t,T]}, u_{[t,T]}) := q(x_T) + \int_t^T g(s, x_s, u_s)\,ds$

$\psi(t; x)$: value function associated with  $E\big\{ \exp \frac{\theta}{2\epsilon} \big[ q(x_T) + \int_t^T g(s, x_s, u_s)\,ds \big] \big\}$
  $V(t; x) := \inf_{\mu \in M} J(\mu; t, x) =: \frac{2\epsilon}{\theta} \ln \psi(t; x)$

DP and Itô differentiation rule

  $-V_t(t; x) = \inf_{u \in U} \big\{ V_x(t; x) f(t, x, u) + g(t, x, u) \big\} + \frac{1}{4\gamma^2} |D^\top V_x(t; x)|^2 + \frac{\epsilon}{2} \mathrm{Tr}[V_{xx} DD^\top]$
  $V(T; x) \equiv q(x)$    ($\gamma^{-2} := \theta$)

If $U = \mathbb{R}^{m_1}$, $f$ linear in $u$, and $g$ quadratic in $u$:
  $f(t, x, u) = f_0(t, x) + B(t, x)u$;  $g(t, x, u) = g_0(t, x) + |u|^2$

Optimal control law:  $u^*(t) = \mu^*(t, x) = -\frac{1}{2} B^\top(t, x) V_x(t; x)$,  $t \le t_f$

HJB equation:
  $-V_t = V_x f_0(t, x) + g_0(t, x) - \frac{1}{4}\big[ |B^\top V_x|^2 - \gamma^{-2} |D^\top V_x|^2 \big] + \frac{\epsilon}{2} \mathrm{Tr}[V_{xx}(t; x) DD^\top]$;  $V(T; x) \equiv q(x)$
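
The stated optimal law is a one-line completion of squares inside the infimum; spelling out the standard step (with $\nabla V$ denoting the gradient $V_x^\top$):

$$ \inf_{u}\Big\{ \langle \nabla V, f_0 + B u\rangle + g_0 + |u|^2 \Big\} = \langle \nabla V, f_0\rangle + g_0 + \inf_{u}\Big\{ \big|u + \tfrac{1}{2} B^\top \nabla V\big|^2 - \tfrac{1}{4}\big|B^\top \nabla V\big|^2 \Big\} = \langle \nabla V, f_0\rangle + g_0 - \tfrac{1}{4}\big|B^\top \nabla V\big|^2 , $$

attained at $u^* = -\tfrac{1}{2} B^\top \nabla V$, which is exactly the $-\tfrac{1}{4}|B^\top V_x|^2$ term in the HJB equation above.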

A further special case: the LEQG problem

  $f_0(t, x) = A(t)x$,  $g_0(t, x) = \frac{1}{2} x^\top Q x$,  $Q \ge 0$,  $q(x) = \frac{1}{2} x^\top Q_f x$

Explicit solution:  $V(t; x) = \frac{1}{2} x^\top Z(t) x + l_\epsilon(t)$,  $t \le t_f$
  $\dot{Z} + A^\top Z + ZA + Q - Z\big( BB^\top - \gamma^{-2} DD^\top \big) Z = 0$
  $l_\epsilon(t) = \frac{\epsilon}{2} \int_t^{t_f} \mathrm{Tr}\big[ Z(s) D(s) D^\top(s) \big]\,ds$
  $u^*(t) = \mu^*(t, x) = -B^\top(t) Z(t) x$,  $t \le t_f$
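
A minimal numerical sketch of the LEQG backward sweep, assuming illustrative 2x2 data (not the talk's): it integrates the Riccati equation from $Z(t_f) = Q_f$ and accumulates $l_\epsilon$ by quadrature.

```python
# LEQG backward sweep:  Zdot = -(A'Z + ZA + Q - Z(BB' - gamma^{-2} DD')Z),
# Z(t_f) = Q_f, plus l_eps(t) = (eps/2) int_t^{t_f} Tr[Z DD'] ds.
import numpy as np
from scipy.integrate import solve_ivp

n = 2
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
D = 0.3 * np.eye(n)
Q, Qf = np.eye(n), np.eye(n)
gamma, eps, tf = 2.0, 0.1, 4.0
S = B @ B.T - (1.0 / gamma**2) * D @ D.T   # control-minus-disturbance term

def rhs(t, z):
    Z = z.reshape(n, n)
    return -(A.T @ Z + Z @ A + Q - Z @ S @ Z).ravel()

sol = solve_ivp(rhs, (tf, 0.0), Qf.ravel(), dense_output=True, rtol=1e-9)
Z0 = sol.y[:, -1].reshape(n, n)

# trapezoidal quadrature for l_eps(0) on a uniform grid
ts = np.linspace(0.0, tf, 401)
vals = np.array([np.trace(sol.sol(s).reshape(n, n) @ D @ D.T) for s in ts])
h = ts[1] - ts[0]
l_eps0 = 0.5 * eps * h * (0.5 * (vals[0] + vals[-1]) + vals[1:-1].sum())
print("Z(0) =\n", Z0)
print("l_eps(0) =", l_eps0)
# optimal law along the way: u*(t) = -B' Z(t) x(t)
```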

A class of stochastic differential games

Two players: Player 1: $u_t$; Player 2: $w_t$
  $dx_t = f(x_t, u_t)\,dt + D w_t\,dt + \sqrt{\epsilon}\, D\, db_t$;  $x_0$
  $J(\mu, \nu; t, x_t) := E\big\{ q(x_T) + \int_t^T \big( g(s, x_s, u_s) - \gamma^2 |w_s|^2 \big)\,ds \big\}$

Upper-value (UV) function:  $W(t; x) = \inf_\mu \sup_\nu J(\mu, \nu; t, x)$

HJI UV equation:
  $\inf_{u \in U} \sup_{w \in \mathbb{R}^{m_2}} \big\{ W_t + W_x (f + Dw) + g - \gamma^2 |w|^2 + \frac{\epsilon}{2} \mathrm{Tr}[W_{xx} DD^\top] \big\} = 0$

HJI UV equation:
  $\inf_{u \in U} \sup_{w \in \mathbb{R}^{m_2}} \big\{ W_t + W_x (f + Dw) + g - \gamma^2 |w|^2 + \frac{\epsilon}{2} \mathrm{Tr}[W_{xx} DD^\top] \big\} = 0$

Isaacs condition holds. Value function:
  $-W_t(t; x) = \inf_{u \in U} \big\{ W_x(t; x) f(t, x, u) + g(t, x, u) \big\} + \frac{1}{4\gamma^2} |D^\top W_x(t; x)|^2 + \frac{\epsilon}{2} \mathrm{Tr}[W_{xx}(t; x) DD^\top]$;  $W(T; x) \equiv q(x)$

IDENTICAL with V for all permissible $\epsilon$, $\gamma$. The same holds for the time-average case (LQ).
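
The reduction from the HJI equation to the RS HJB is the inner maximization over w, an unconstrained concave quadratic; writing $\nabla W$ for $W_x^\top$:

$$ \sup_{w \in \mathbb{R}^{m_2}} \Big\{ \langle \nabla W, D w \rangle - \gamma^2 |w|^2 \Big\} = \frac{1}{4\gamma^2} \big| D^\top \nabla W \big|^2 , \qquad w^* = \frac{1}{2\gamma^2}\, D^\top \nabla W . $$

Substituting this back turns the HJI equation into exactly the RS HJB satisfied by V, which is the identity claimed on the slide.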

LQ RS MFGs, Problem 1 (P1)

Stochastic differential equation (SDE) for agent i, $1 \le i \le N$:
  $dx_i(t) = (A(\theta_i) x_i(t) + B(\theta_i) u_i(t))\,dt + \mu D(\theta_i)\,db_i(t)$

The risk-sensitive cost function for agent i, with $\delta > 0$:
  P1:  $J_{1,i}^N(u_i, u_{-i}) = \limsup_{T \to \infty} \frac{\delta}{T} \log E\big\{ e^{\frac{1}{\delta} \phi_i^1(x, f^N, u)} \big\}$
  $\phi_i^1(x, f^N, u) := \int_0^T \big\| x_i(t) - \frac{1}{N} \sum_{j=1}^N x_j(t) \big\|_Q^2 + \| u_i(t) \|_R^2 \, dt$

Risk-sensitive control: robust control w.r.t. the risk parameter $\delta$:
  $J_{1,i}^N(u_i, u_{-i}) = \limsup_{T \to \infty} \frac{1}{T} \Big[ E\{\phi_i^1\} + \frac{1}{2\delta} \mathrm{var}\{\phi_i^1\} + o\big(\tfrac{1}{\delta}\big) \Big]$

$f^N(t) := \frac{1}{N} \sum_{i=1}^N x_i(t)$: mean field term (mass behavior). Agents are coupled with each other through the mean field term.

Tembine-Zhu-TB, TAC (59, 4, 2014); Moon-TB, CDC (2014), TAC (62, 3, 2017)
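
The stated small-risk expansion is the standard cumulant (log-moment) expansion; formally, for large $\delta$,

$$ \delta \log E\, e^{\phi/\delta} = \delta \log\Big( 1 + \frac{E\{\phi\}}{\delta} + \frac{E\{\phi^2\}}{2\delta^2} + o(\delta^{-2}) \Big) = E\{\phi\} + \frac{1}{2\delta}\Big( E\{\phi^2\} - (E\{\phi\})^2 \Big) + o(\delta^{-1}) = E\{\phi\} + \frac{1}{2\delta}\,\mathrm{var}\{\phi\} + o(\delta^{-1}) , $$

so minimizing the risk-sensitive cost penalizes the variance of $\phi_i^1$ as well as its mean, which is the sense in which $\delta$ controls robustness.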

LQ Robust MFGs, Problem 2 (P2)

Stochastic differential equation (SDE) for agent i, $1 \le i \le N$:
  $dx_i(t) = (A(\theta_i) x_i(t) + B(\theta_i) u_i(t) + D(\theta_i) v_i(t))\,dt + \mu D(\theta_i)\,db_i(t)$

The worst-case risk-neutral cost function for agent i:
  P2:  $J_{2,i}^N(u_i, u_{-i}) = \sup_{v_i \in V_i} \limsup_{T \to \infty} \frac{1}{T} E\{ \phi_i^2(x, f^N, u, v) \}$
  $\phi_i^2(x, f^N, u, v) := \int_0^T \big\| x_i(t) - \frac{1}{N} \sum_{j=1}^N x_j(t) \big\|_Q^2 + \| u_i(t) \|_R^2 - \gamma^2 \| v_i(t) \|^2 \, dt$

$v_i$ can be viewed as a fictitious player (or adversary) of agent i, which strives for a worst-case cost function for agent i. Agents are coupled with each other through the mean field term.

Mean Field Analysis for P1 and P2

Solve the individual local robust control problem with g in place of $f^N$:
  P1:  $J_1(u, g) = \limsup_{T \to \infty} \frac{\delta}{T} \log E\big\{ \exp\big[ \frac{1}{\delta} \int_0^T \| x(t) - g(t) \|_Q^2 + \| u(t) \|_R^2 \, dt \big] \big\}$
  P2:  $J_2(u, v, g) = \limsup_{T \to \infty} \frac{1}{T} E\big\{ \int_0^T \| x(t) - g(t) \|_Q^2 + \| u(t) \|_R^2 - \gamma^2 \| v(t) \|^2 \, dt \big\}$

Characterize g that is a best estimate of the mean field $f^N$:
- construct a mean field system $T(g)(t)$
- obtain a fixed point of $T(g)(t)$, i.e., $g^* = T(g^*)$

Robust Tracking Control for P1 and P2

Proposition (individual robust control problems for P1 and P2). Suppose that $(A, B)$ is stabilizable and $(A, Q^{1/2})$ is detectable. Suppose that, for a fixed $\gamma = \sqrt{\delta}/(\sqrt{2}\mu) > 0$, there is a matrix $P$ that solves the following GARE:
  $A^\top P + PA + Q - P\big( B R^{-1} B^\top - \tfrac{1}{\gamma^2} D D^\top \big) P = 0$
Then:
- $H := A - BR^{-1}B^\top P + \frac{1}{\gamma^2} DD^\top P$ and $G := A - BR^{-1}B^\top P$ are Hurwitz
- The robust decentralized controller:  $\bar{u}(t) = -R^{-1} B^\top P x(t) - R^{-1} B^\top s(t)$,  where  $\frac{ds(t)}{dt} = -H^\top s(t) + Q g(t)$
- $s(t)$ has a unique solution in $C_n^b$:  $s(t) = -\int_t^\infty e^{-H^\top (t - s)} Q g(s)\,ds$
- The worst-case disturbance (P2):  $\bar{v}(t) = \gamma^{-2} D^\top P x(t) + \gamma^{-2} D^\top s(t)$

Remark: The two robust tracking problems are identical. Related to the robust ($H^\infty$) control problem w.r.t. $\gamma$.
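
A minimal sketch of solving the GARE numerically via the stable invariant subspace of the associated Hamiltonian matrix; the data are illustrative assumptions, and the printed eigenvalues check the Hurwitz claims for H and G.

```python
# GARE  A'P + PA + Q - P(B R^{-1} B' - gamma^{-2} D D')P = 0  via the
# stable invariant subspace of the Hamiltonian (illustrative data).
import numpy as np
from scipy.linalg import schur

A = np.array([[0.0, 1.0], [-2.0, -1.0]])
B = np.array([[0.0], [1.0]])
D = np.array([[0.2], [0.4]])
Q, R, gamma = np.eye(2), np.eye(1), 1.5
S = B @ np.linalg.solve(R, B.T) - (1.0 / gamma**2) * D @ D.T

Ham = np.block([[A, -S], [-Q, -A.T]])
Ts, U, sdim = schur(Ham, output='real', sort='lhp')  # stable eigenvalues first
n = A.shape[0]
P = U[n:, :n] @ np.linalg.inv(U[:n, :n])
P = 0.5 * (P + P.T)                                  # symmetrize against round-off
print("GARE residual:", np.linalg.norm(A.T @ P + P @ A + Q - P @ S @ P))

G = A - B @ np.linalg.solve(R, B.T) @ P
H = G + (1.0 / gamma**2) * D @ D.T @ P
print("eig(H):", np.linalg.eigvals(H))               # expect open left half-plane
print("eig(G):", np.linalg.eigvals(G))
```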

Mean Field Analysis for P1 and P2

$\bar{x}_\theta(t) = E\{x_\theta(t)\}$; for P2 we use $h \in C_n^b$ in place of g.

Mean field system for P1 (with the robust decentralized controller):
  $T(g)(t) := \int_{\theta \in \Theta,\, x_0 \in X_0} \bar{x}_\theta(t)\, dF(\theta, x_0)$
  $\bar{x}_\theta(t) = e^{G(\theta) t} x_0 + \int_0^t e^{G(\theta)(t - \tau)} B(\theta) R^{-1} B^\top(\theta) \Big( \int_\tau^\infty e^{H^\top(\theta)(s - \tau)} Q g(s)\,ds \Big) d\tau$

Mean field system for P2 (with the robust decentralized controller and the worst-case disturbance):
  $L(h)(t) := \int_{\theta \in \Theta,\, x_0 \in X_0} \bar{x}_\theta(t)\, dF(\theta, x_0)$
  $\bar{x}_\theta(t) = e^{H(\theta) t} x_0 + \int_0^t e^{H(\theta)(t - \tau)} \big( B(\theta) R^{-1} B^\top(\theta) - \gamma^{-2} D(\theta) D(\theta)^\top \big) \Big( \int_\tau^\infty e^{H^\top(\theta)(s - \tau)} Q h(s)\,ds \Big) d\tau$

Mean Field Analysis for P1 and P2

$T(g)(t)$ and $L(h)(t)$ capture the mass behavior when N is large. Simplest case (by the SLLN):
  $\lim_{N \to \infty} f^N(t) = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^N x_i(t) = E\{x_i(t)\} = T(g)(t)$

We seek $g^*$ and $h^*$ such that $g^* = T(g^*)$ and $h^* = L(h^*)$.

Sufficient condition (due to the contraction mapping theorem):
  P1:  $\| R^{-1} \| \| Q \| \int_\Theta \| B(\theta) \|^2 \Big( \int_0^\infty \| e^{G(\theta)\tau} \|\,d\tau \Big) \Big( \int_0^\infty \| e^{H(\theta)\tau} \|\,d\tau \Big)\,dF(\theta) < 1$
  P2:  $\| Q \| \int_\Theta \Big( \int_0^\infty \| e^{H(\theta)\tau} \|\,d\tau \Big)^2 \big( \| B(\theta) \|^2 \| R^{-1} \| + \gamma^{-2} \| D(\theta) \|^2 \big)\,dF(\theta) < 1$

Then $\lim_{k \to \infty} T^k(g_0) = g^*$ for any $g_0 \in C_n^b$.
$g^*(t)$ and $h^*(t)$ are best estimates of $f^N(t)$ when N is large.
Generally $g^* \ne h^*$; but as $\gamma \to \infty$, $g^* \to h^*$.
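
Under the P1 contraction condition, the fixed point can be computed by direct Picard iteration, $g \leftarrow T(g)$. A minimal sketch for the scalar uniform-agent case, with illustrative parameters (chosen so the contraction condition holds) and a truncated time axis:

```python
# Picard iteration g <- T(g) for the scalar uniform-agent case of P1.
# Illustrative parameters; the inner integral int_tau^infty is
# approximated by int_tau^T_max on a uniform grid.
import numpy as np

A_, B_, D_, Q_, R_, gamma = -1.0, 1.0, 1.0, 1.0, 1.0, 2.0
xbar0 = 5.0                                   # mean of the initial states
S = B_**2 / R_ - D_**2 / gamma**2
P = (2 * A_ + np.sqrt(4 * A_**2 + 4 * S * Q_)) / (2 * S)   # scalar GARE root
G = A_ - (B_**2 / R_) * P                     # both G and H are Hurwitz here
H = A_ - S * P

T_max, M = 10.0, 1001
t = np.linspace(0.0, T_max, M)
dt_ = t[1] - t[0]

def itrap(y, dx):
    # trapezoidal rule on a uniform grid; zero for fewer than two samples
    return 0.0 if y.size < 2 else dx * (0.5 * (y[0] + y[-1]) + y[1:-1].sum())

def T_op(g):
    # r(tau) = int_tau^infty e^{H (s - tau)} Q g(s) ds   (truncated at T_max)
    r = np.array([itrap(np.exp(H * (t[k:] - t[k])) * Q_ * g[k:], dt_)
                  for k in range(M)])
    # T(g)(t) = e^{G t} xbar0 + int_0^t e^{G (t - tau)} (B^2/R) r(tau) dtau
    return np.array([np.exp(G * t[k]) * xbar0
                     + itrap(np.exp(G * (t[k] - t[:k+1])) * (B_**2 / R_) * r[:k+1], dt_)
                     for k in range(M)])

g = np.zeros(M)
for it in range(100):                         # contraction => geometric convergence
    g_new = T_op(g)
    if np.max(np.abs(g_new - g)) < 1e-8:
        break
    g = g_new
print("converged after", it, "iterations; g*(0) =", g[0])
```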

Main Results for P1 and P2

Existence and characterization of an ε-Nash equilibrium:
There exists an ε-Nash equilibrium with $g^*$ (P1), i.e., there exist $\{\bar{u}_i,\ 1 \le i \le N\}$ and $\epsilon_N$ such that
  $J_{1,i}^N(\bar{u}_i, \bar{u}_{-i}) \le \inf_{u_i \in U_i^c} J_{1,i}^N(u_i, \bar{u}_{-i}) + \epsilon_N$
where $\epsilon_N \to 0$ as $N \to \infty$. For the uniform-agent case, $\epsilon_N = O(1/\sqrt{N})$.

The ε-Nash strategy $\bar{u}_i$ is decentralized, i.e., $\bar{u}_i$ is a function of $x_i$ and $g^*$.

Law of large numbers: $g^*$ satisfies
  $\lim_{N \to \infty} \int_0^T \big\| \frac{1}{N} \sum_{i=1}^N x_i(t) - g^*(t) \big\|^2 dt = 0$,  for all $T \ge 0$, a.s.
  $\lim_{N \to \infty} \limsup_{T \to \infty} \frac{1}{T} \int_0^T \big\| \frac{1}{N} \sum_{i=1}^N x_i(t) - g^*(t) \big\|^2 dt = 0$,  a.s.

$g^*$ is a deterministic function and can be computed offline. The same results also hold for P2 with the worst-case disturbance.

Main Results for P1 and P2

Proof (sketch) of the law of large numbers (first part):
  $\int_0^T \| f^N(t) - g^*(t) \|^2 dt \le 2 \int_0^T \big\| \frac{1}{N} \sum_{i=1}^N (x_i(t) - E\{x_i(t)\}) \big\|^2 dt + 2T \sup_t \| E\{x_i(t)\} - g^*(t) \|^2$

- The second part is zero (due to the fixed-point theorem).
- $e_i(t) = x_i(t) - E\{x_i(t)\}$ are mutually orthogonal random vectors with $E\{e_i(t)\} = 0$ and $E\{\| e_i(t) \|^2\} < \infty$ for all i and t.
- Strong law of large numbers: $\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^N e_i(t) = 0$ for all $t \in [0, T]$.
- Since $\| \frac{1}{N} \sum_{i=1}^N e_i(t) \|^2$, $N \ge 1$, is uniformly integrable on $[0, T]$ for all T, we have the desired result.

Partial Equivalence and Limiting Behaviors of P1 and P2

[Diagram: relations among the LQ-MFG, the LQ-RSMFG (P1), and the LQ-RMFG (P2).]

- P1 and P2 share the same robust decentralized controller.
- Partial equivalence: the mean field systems (and their fixed points) are different.
- Limiting behaviors:
  - Large deviation (small noise) limit ($\mu, \delta \to 0$ with $\gamma = \sqrt{\delta}/(\sqrt{2}\mu) > 0$ fixed): the same results hold under this limit (SDE becomes an ODE).
  - Risk-neutral limit ($\gamma \to \infty$): the results are identical to those of the (risk-neutral) LQ mean field game ($g^* = h^*$).

Simulations (N = 50)

$A_i = \theta_i$, i.i.d. uniform random variables on the interval $[2, 5]$;  $B = D = Q = R = 1$,  $\mu = 2$,  $\gamma = 1$;  $g^*(t) = 5.86\, e^{-8.49 t}$,  $h^*(t) = 5.1\, e^{-3.37 t}$

  $\epsilon^2(N) := \limsup_{T \to \infty} \frac{1}{T} E \int_0^T \| f^N(t) - g^*(t) \|^2 dt$

$\gamma$ determines the robustness of the equilibrium (due to the individual robust control problems).

[Figure: left, empirical distribution of agent states (number of agents vs. state value) for γ = 1.5 (P1), γ = 1.5 (P2), and γ = 15 (P1 and P2); right, ε²(N) vs. the number of agents N for the same cases.]
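
For completeness, a Monte Carlo sketch of how $\epsilon^2(N)$ can be estimated: N scalar agents are simulated under the decentralized robust controller by Euler-Maruyama, and the time-averaged gap between $f^N$ and a given $g^*$ is accumulated. The slide's printed $g^*$ is reused here purely as a placeholder, and all other numbers are illustrative assumptions, so the output is indicative only.

```python
# Monte Carlo sketch for eps^2(N): simulate N scalar agents under the
# decentralized robust controller and time-average ||f^N(t) - g*(t)||^2.
# g* below is the slide's printed P1 fixed point, used as a placeholder.
import numpy as np

rng = np.random.default_rng(0)
N, T, dt = 50, 20.0, 0.01
mu_n, gamma = 1.0, 2.0                       # noise intensity, robustness level
B_, D_, Q_, R_ = 1.0, 1.0, 1.0, 1.0
c, lam = 5.86, 8.49                          # g*(t) = c e^{-lam t}

theta = rng.uniform(2.0, 5.0, size=N)        # A_i = theta_i, i.i.d. uniform on [2, 5]
S = B_**2 / R_ - D_**2 / gamma**2
P = (2 * theta + np.sqrt(4 * theta**2 + 4 * S * Q_)) / (2 * S)  # per-agent GARE roots
H = theta - S * P                            # Hurwitz closed-loop rates

x = rng.normal(5.0, 1.0, size=N)             # initial states
acc = 0.0
for k in range(int(T / dt)):
    g = c * np.exp(-lam * k * dt)
    s = -Q_ * g / (lam - H)                  # s(t) = -int_t^inf e^{H(tau-t)} Q g(tau) dtau
    u = -(B_ / R_) * (P * x + s)             # decentralized robust controller
    x = x + (theta * x + B_ * u) * dt + mu_n * D_ * np.sqrt(dt) * rng.standard_normal(N)
    acc += (x.mean() - g)**2 * dt
print("time-averaged squared gap over one path:", acc / T)
```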

Conclusions

- Decentralized (local state-feedback) ε-Nash equilibria for LQ risk-sensitive and LQ robust mean field games
- The equilibrium features robustness due to the local robust optimal control problem parametrized by γ
- LQ risk-sensitive and LQ robust mean field games are partially equivalent ($g^* \ne h^*$) and exhibit the same limiting behaviors as the one-agent case
- Extensions to the heterogeneous case and to nonlinear dynamics are possible, but the results are not as explicit; see Tembine, Zhu, Başar, IEEE-TAC (59, 4, 2014) for RSMFGs
- Imperfect state measurements
- RSMFGs on networks with agents interacting only with their neighbors
- Leader-follower MFGs

THANKS!