MEAN FIELD GAMES WITH MAJOR AND MINOR PLAYERS
René Carmona
Department of Operations Research & Financial Engineering, PACM, Princeton University
CEMRACS - Luminy, July 17, 2017
MFG WITH MAJOR AND MINOR PLAYERS SET-UP (R.C. - G. Zhu, R.C. - P. Wang)
State equations
$$\begin{cases} dX^0_t = b^0(t, X^0_t, \mu_t, \alpha^0_t)\,dt + \sigma^0(t, X^0_t, \mu_t, \alpha^0_t)\,dW^0_t \\ dX_t = b(t, X_t, \mu_t, X^0_t, \alpha_t, \alpha^0_t)\,dt + \sigma(t, X_t, \mu_t, X^0_t, \alpha_t, \alpha^0_t)\,dW_t \end{cases}$$
Costs
$$\begin{cases} J^0(\alpha^0, \alpha) = \mathbb{E}\Big[\int_0^T f^0(t, X^0_t, \mu_t, \alpha^0_t)\,dt + g^0(X^0_T, \mu_T)\Big] \\ J(\alpha^0, \alpha) = \mathbb{E}\Big[\int_0^T f(t, X_t, \mu_t, X^0_t, \alpha_t, \alpha^0_t)\,dt + g(X_T, \mu_T)\Big] \end{cases}$$
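A minimal simulation sketch (not from the slides): the conditional law $\mu_t$ is approximated by the empirical measure of $N$ minor players and the coupled system is discretized by Euler-Maruyama. The drift and volatility coefficients and the two controls below are illustrative placeholders, not the model's actual data.

```python
# Illustrative Euler-Maruyama simulation of the major/minor dynamics, with the
# conditional law mu_t replaced by the empirical measure of N minor players.
# The drift/volatility coefficients and the controls are placeholders.
import numpy as np

def simulate(N=500, T=1.0, n_steps=100, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X0 = 0.0                       # major player state X^0_t
    X = rng.normal(size=N)         # minor players' states X_t
    for k in range(n_steps):
        mu_mean = X.mean()         # statistic of the empirical proxy for mu_t
        a0 = -X0                   # placeholder control alpha^0_t
        a = -(X - mu_mean)         # placeholder controls alpha_t
        # dX^0 = b^0 dt + sigma^0 dW^0  (placeholder coefficients)
        X0 += (a0 + 0.5 * mu_mean) * dt + 0.3 * np.sqrt(dt) * rng.normal()
        # dX = b dt + sigma dW, coupled to X^0_t and mu_t (placeholder coefficients)
        X += (a + 0.2 * X0) * dt + 0.3 * np.sqrt(dt) * rng.normal(size=N)
    return X0, X

if __name__ == "__main__":
    X0, X = simulate()
    print("major state:", round(X0, 3), "mean minor state:", round(X.mean(), 3))
```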
OPEN LOOP VERSION OF THE MFG PROBLEM
The controls used by the major player and the representative minor player are of the form:
$$\alpha^0_t = \phi^0(t, W^0_{[0,T]}), \qquad \alpha_t = \phi(t, W^0_{[0,T]}, W_{[0,T]}), \qquad (1)$$
for deterministic progressively measurable functions $\phi^0 : [0,T] \times C([0,T];\mathbb{R}^d) \to A^0$ and $\phi : [0,T] \times C([0,T];\mathbb{R}^d) \times C([0,T];\mathbb{R}^d) \to A$.
THE MAJOR PLAYER BEST RESPONSE
Assume the representative minor player uses the open loop control given by $\phi : (t, w^0, w) \mapsto \phi(t, w^0, w)$.
The major player minimizes
$$J^{\phi,0}(\alpha^0) = \mathbb{E}\Big[\int_0^T f^0(t, X^0_t, \mu_t, \alpha^0_t)\,dt + g^0(X^0_T, \mu_T)\Big]$$
under the dynamical constraints:
$$\begin{cases} dX^0_t = b^0(t, X^0_t, \mu_t, \alpha^0_t)\,dt + \sigma^0(t, X^0_t, \mu_t, \alpha^0_t)\,dW^0_t \\ dX_t = b(t, X_t, \mu_t, X^0_t, \phi(t, W^0_{[0,T]}, W_{[0,T]}), \alpha^0_t)\,dt + \sigma(t, X_t, \mu_t, X^0_t, \phi(t, W^0_{[0,T]}, W_{[0,T]}), \alpha^0_t)\,dW_t \end{cases}$$
with $\mu_t = \mathcal{L}(X_t \mid W^0_{[0,t]})$ the conditional distribution of $X_t$ given $W^0_{[0,t]}$.
Major player problem as the search for:
$$\phi^{0,*}(\phi) = \arg\inf_{\alpha^0_t = \phi^0(t, W^0_{[0,T]})} J^{\phi,0}(\alpha^0) \qquad (2)$$
Optimal control of the conditional McKean-Vlasov type!
THE REP. MINOR PLAYER BEST RESPONSE
The system against which the best response is sought comprises a major player and a field of minor players different from the representative minor player.
The major player uses the strategy $\alpha^0_t = \phi^0(t, W^0_{[0,T]})$; the representative of the field of minor players uses the strategy $\alpha_t = \phi(t, W^0_{[0,T]}, W_{[0,T]})$.
State dynamics
$$\begin{cases} dX^0_t = b^0(t, X^0_t, \mu_t, \phi^0(t, W^0_{[0,T]}))\,dt + \sigma^0(t, X^0_t, \mu_t, \phi^0(t, W^0_{[0,T]}))\,dW^0_t \\ dX_t = b(t, X_t, \mu_t, X^0_t, \phi(t, W^0_{[0,T]}, W_{[0,T]}), \phi^0(t, W^0_{[0,T]}))\,dt + \sigma(t, X_t, \mu_t, X^0_t, \phi(t, W^0_{[0,T]}, W_{[0,T]}), \phi^0(t, W^0_{[0,T]}))\,dW_t \end{cases}$$
where $\mu_t = \mathcal{L}(X_t \mid W^0_{[0,t]})$ is the conditional distribution of $X_t$ given $W^0_{[0,t]}$.
Given $\phi^0$ and $\phi$, this is an SDE of (conditional) McKean-Vlasov type.
THE REP. MINOR PLAYER BEST RESPONSE (CONT.)
The representative minor player chooses a strategy $\bar\alpha_t = \bar\phi(t, W^0_{[0,T]}, \bar W_{[0,T]})$ to minimize
$$J^{\phi^0,\phi}(\bar\alpha) = \mathbb{E}\Big[\int_0^T f(t, \bar X_t, \mu_t, X^0_t, \bar\alpha_t, \phi^0(t, W^0_{[0,T]}))\,dt + g(\bar X_T, \mu_T)\Big],$$
where the dynamics of the virtual state $\bar X_t$ are given by:
$$d\bar X_t = b(t, \bar X_t, \mu_t, X^0_t, \bar\phi(t, W^0_{[0,T]}, \bar W_{[0,T]}), \phi^0(t, W^0_{[0,T]}))\,dt + \sigma(t, \bar X_t, \mu_t, X^0_t, \bar\phi(t, W^0_{[0,T]}, \bar W_{[0,T]}), \phi^0(t, W^0_{[0,T]}))\,d\bar W_t,$$
for a Wiener process $\bar W = (\bar W_t)_{0 \le t \le T}$ independent of the other Wiener processes.
This optimization problem is NOT of McKean-Vlasov type: it is a classical optimal control problem with random coefficients.
$$\phi^*(\phi^0, \phi) = \arg\inf_{\bar\alpha_t = \bar\phi(t, W^0_{[0,T]}, \bar W_{[0,T]})} J^{\phi^0,\phi}(\bar\alpha)$$
NASH EQUILIBRIUM
Search for a fixed point of the best response map:
$$(\hat\phi^0, \hat\phi) = \big(\phi^{0,*}(\hat\phi),\ \phi^*(\hat\phi^0, \hat\phi)\big).$$
Fixed point in a space of controls, not measures!
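Numerically, this fixed point can be approached by iterating the two best-response maps. The sketch below is purely schematic: `best_response_major` and `best_response_minor` stand for hypothetical numerical solvers of the two optimization problems above, and the controls are assumed to be encoded as numeric arrays so that a damped update makes sense.

```python
# Schematic damped iteration of the best-response maps toward the Nash fixed
# point (phi0_hat, phi_hat).  `best_response_major` and `best_response_minor`
# are hypothetical numerical solvers; controls are encoded as numeric arrays.
def nash_fixed_point(best_response_major, best_response_minor,
                     phi0, phi, n_iter=50, damping=0.5):
    for _ in range(n_iter):
        phi0_new = best_response_major(phi)         # phi^{0,*}(phi)
        phi_new = best_response_minor(phi0, phi)    # phi^{*}(phi0, phi)
        # damped update in the space of controls (not of measures)
        phi0 = (1 - damping) * phi0 + damping * phi0_new
        phi = (1 - damping) * phi + damping * phi_new
    return phi0, phi
```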
CLOSED LOOP VERSIONS OF THE MFG PROBLEM
Closed Loop Version. Controls of the major player and the representative minor player are of the form:
$$\alpha^0_t = \phi^0(t, X^0_{[0,T]}, \mu_t), \qquad \alpha_t = \phi(t, X^0_{[0,T]}, \mu_t, X_{[0,T]}),$$
for deterministic progressively measurable functions $\phi^0 : [0,T] \times C([0,T];\mathbb{R}^d) \times \mathcal{P}_2(\mathbb{R}^d) \to A^0$ and $\phi : [0,T] \times C([0,T];\mathbb{R}^d) \times \mathcal{P}_2(\mathbb{R}^d) \times C([0,T];\mathbb{R}^d) \to A$.
Markovian Version. Controls of the major player and the representative minor player are of the form:
$$\alpha^0_t = \phi^0(t, X^0_t, \mu_t), \qquad \alpha_t = \phi(t, X^0_t, \mu_t, X_t),$$
for deterministic feedback functions $\phi^0 : [0,T] \times \mathbb{R}^d \times \mathcal{P}_2(\mathbb{R}^d) \to A^0$ and $\phi : [0,T] \times \mathbb{R}^d \times \mathcal{P}_2(\mathbb{R}^d) \times \mathbb{R}^d \to A$.
NASH EQUILIBRIUM
Search for a fixed point of the best response map:
$$(\hat\phi^0, \hat\phi) = \big(\phi^{0,*}(\hat\phi),\ \phi^*(\hat\phi^0, \hat\phi)\big).$$
CONTRACT THEORY: A STACKELBERG VERSION (R.C. - D. Possamaï - N. Touzi)
State equation
$$dX_t = \sigma(t, X_t, \nu_t, \alpha_t)\big[\lambda(t, X_t, \nu_t, \alpha_t)\,dt + dW_t\big],$$
where $X_t$ is the agent's output, $\alpha_t$ the agent's effort (control), and $\nu_t$ the distribution of output and effort (control) of the agent.
Rewards
$$\begin{cases} J^0(\xi) = \mathbb{E}\big[U_P(X_{[0,T]}, \nu_T, \xi)\big] \\ J(\xi, \alpha) = \mathbb{E}\Big[\int_0^T f(t, X_t, \nu_t, \alpha_t)\,dt + U_A(\xi)\Big] \end{cases}$$
Given the choice of a contract $\xi$ by the Principal, each agent in the field of exchangeable agents chooses an effort level $\alpha_t$ and meets his/her reservation price, putting the field of agents in a (mean field) Nash equilibrium.
The Principal chooses the contract to maximize his/her expected utility.
LINEAR QUADRATIC MODELS
State dynamics
$$\begin{cases} dX^0_t = (L^0 X^0_t + B^0 \alpha^0_t + F^0 \bar X_t)\,dt + D^0\,dW^0_t \\ dX_t = (L X_t + B \alpha_t + F \bar X_t + G X^0_t)\,dt + D\,dW_t \end{cases}$$
where $\bar X_t = \mathbb{E}[X_t \mid \mathcal{F}^0_t]$, with $(\mathcal{F}^0_t)_t$ the filtration generated by $W^0$.
Costs
$$J^0(\alpha^0, \alpha) = \mathbb{E}\int_0^T \big[(X^0_t - H^0 \bar X_t - \eta^0)^\dagger Q^0 (X^0_t - H^0 \bar X_t - \eta^0) + \alpha^{0\dagger}_t R^0 \alpha^0_t\big]\,dt$$
$$J(\alpha^0, \alpha) = \mathbb{E}\int_0^T \big[(X_t - H X^0_t - H_1 \bar X_t - \eta)^\dagger Q (X_t - H X^0_t - H_1 \bar X_t - \eta) + \alpha^\dagger_t R \alpha_t\big]\,dt$$
in which $Q^0$, $Q$, $R^0$, $R$ are symmetric matrices, and $R^0$, $R$ are assumed to be positive definite.
EQUILIBRIA
Open Loop Version
Optimization problems + fixed point lead to a large FBSDE; the affine FBSDE is solved by a large matrix Riccati equation.
Closed Loop Version
The fixed point step is more difficult. The search is limited to controls of the form
$$\alpha^0_t = \phi^0(t, X^0_t, \bar X_t) = \phi^0_0(t) + \phi^0_1(t) X^0_t + \phi^0_2(t) \bar X_t$$
$$\alpha_t = \phi(t, X_t, X^0_t, \bar X_t) = \phi_0(t) + \phi_1(t) X_t + \phi_2(t) X^0_t + \phi_3(t) \bar X_t$$
Optimization problems + fixed point again lead to a large FBSDE; the affine FBSDE is solved by a large matrix Riccati equation.
The open loop and closed loop solutions are not the same!
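As an illustration of the last step, here is a generic backward Euler integrator for a symmetric matrix Riccati ODE of the kind the open-loop equilibrium reduces to; the matrices in the example are arbitrary placeholders, not the block matrices of the major/minor system.

```python
# Generic backward Euler integration of a matrix Riccati ODE
#     dP/dt + A'P + PA - P B R^{-1} B' P + Q = 0,   P(T) = G.
# The matrices in the example below are arbitrary placeholders.
import numpy as np

def solve_riccati(A, B, Q, R, G, T=1.0, n_steps=1000):
    dt = T / n_steps
    Rinv = np.linalg.inv(R)
    P = G.copy()
    Ps = [P.copy()]
    for _ in range(n_steps):
        # step backward in time: P(t-dt) = P(t) + dt*(A'P + PA - P B R^{-1} B' P + Q)
        P = P + dt * (A.T @ P + P @ A - P @ B @ Rinv @ B.T @ P + Q)
        Ps.append(P.copy())
    return Ps[::-1]        # values of P on the grid t = 0, dt, ..., T

if __name__ == "__main__":
    A = np.array([[0.0, 1.0], [0.0, 0.0]])
    B = Q = R = G = np.eye(2)
    print(solve_riccati(A, B, Q, R, G)[0].round(3))   # P(0)
```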
APPLICATION TO BEE SWARMING
$V^{0,N}_t$: velocity of the (major player) streaker bee at time $t$; $V^{i,N}_t$: velocity of the $i$-th worker bee, $i = 1, \dots, N$, at time $t$.
Linear dynamics
$$\begin{cases} dV^{0,N}_t = \alpha^0_t\,dt + \Sigma^0\,dW^0_t \\ dV^{i,N}_t = \alpha^i_t\,dt + \Sigma\,dW^i_t \end{cases}$$
Minimization of quadratic costs
$$J^0 = \mathbb{E}\int_0^T \big(\lambda_0 |V^{0,N}_t - \nu_t|^2 + \lambda_1 |V^{0,N}_t - \bar V^N_t|^2 + (1 - \lambda_0 - \lambda_1)|\alpha^0_t|^2\big)\,dt$$
$$J^i = \mathbb{E}\int_0^T \big(l_0 |V^{i,N}_t - V^{0,N}_t|^2 + l_1 |V^{i,N}_t - \bar V^N_t|^2 + (1 - l_0 - l_1)|\alpha^i_t|^2\big)\,dt$$
where $\bar V^N_t := \frac{1}{N}\sum_{i=1}^N V^{i,N}_t$ is the average velocity of the followers, $\nu : [0,T] \ni t \mapsto \nu_t \in \mathbb{R}^d$ is a deterministic function (the leader's free will), $\lambda_0$ and $\lambda_1$ are positive real numbers satisfying $\lambda_0 + \lambda_1 \le 1$, and similarly $l_0, l_1 \ge 0$ with $l_0 + l_1 \le 1$.
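A toy simulation of the swarm (an assumption-laden sketch, not the equilibrium computation): one streaker bee and N workers, with simple linear tracking feedbacks standing in for the equilibrium controls; the gains and noise levels are placeholders rather than the Riccati-optimal ones.

```python
# Toy simulation of the swarm: one streaker bee (major player) and N workers.
# The linear tracking gains and noise levels are placeholders, not the
# Riccati-optimal feedbacks.
import numpy as np

def swarm(N=50, T=1.0, n_steps=200, seed=1):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    nu = lambda t: np.array([2*np.pi*np.sin(2*np.pi*t), 2*np.pi*np.cos(2*np.pi*t)])
    V0 = np.zeros(2)                       # streaker bee velocity V^{0,N}_t
    V = rng.normal(size=(N, 2))            # worker bee velocities V^{i,N}_t
    X0, X = np.zeros(2), np.zeros((N, 2))  # positions, integrated from velocities
    for k in range(n_steps):
        t = k * dt
        Vbar = V.mean(axis=0)              # average follower velocity
        a0 = 5.0 * (nu(t) - V0) + 1.0 * (Vbar - V0)   # leader tracks nu_t and the mean
        a = 5.0 * (V0 - V) + 1.0 * (Vbar - V)         # followers track leader and mean
        V0 = V0 + a0 * dt + 0.2 * np.sqrt(dt) * rng.normal(size=2)
        V = V + a * dt + 0.2 * np.sqrt(dt) * rng.normal(size=(N, 2))
        X0, X = X0 + V0 * dt, X + V * dt
    return X0, X

if __name__ == "__main__":
    X0, X = swarm()
    print("leader final position:", X0.round(2))
```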
SAMPLE TRAJECTORIES IN EQUILIBRIUM
$\nu(t) := [2\pi\sin(2\pi t),\ 2\pi\cos(2\pi t)]$. Left panel: $k_0 = 0.8$, $k_1 = 0.19$, $l_0 = 0.19$, $l_1 = 0.8$; right panel: $k_0 = 0.8$, $k_1 = 0.19$, $l_0 = 0.8$, $l_1 = 0.19$.
FIGURE: Optimal velocity and trajectory of followers and leader.
SAMPLE TRAJECTORIES IN EQUILIBRIUM
$\nu(t) := [2\pi\sin(2\pi t),\ 2\pi\cos(2\pi t)]$. Left panel: $k_0 = 0.19$, $k_1 = 0.8$, $l_0 = 0.19$, $l_1 = 0.8$; right panel: $k_0 = 0.19$, $k_1 = 0.8$, $l_0 = 0.8$, $l_1 = 0.19$.
FIGURE: Optimal velocity and trajectory of followers and leader.
CONDITIONAL PROPAGATION OF CHAOS
FIGURE: Conditional correlation of the followers' velocities, shown in panels for increasing numbers N of followers.
FINITE STATE SPACES: A CYBER SECURITY MODEL (Kolokoltsov - Bensoussan, R.C. - P. Wang)
N computers in a network (minor players); one hacker / attacker (major player).
The actions of the major player affect the minor players' states (even when N >> 1).
The major player feels only $\mu^N_t$, the empirical distribution of the minor players' states.
Finite state space: each computer is in one of 4 states: protected & infected, protected & susceptible to be infected, unprotected & infected, unprotected & susceptible to be infected.
Continuous-time Markov chain in $E = \{DI, DS, UI, US\}$.
Each player's action is intended to affect the rates of change from one state to another so as to minimize expected costs
$$J(\alpha^0, \alpha) = \mathbb{E}\Big[\int_0^T (k_D \mathbf{1}_D + k_I \mathbf{1}_I)(X_t)\,dt\Big], \qquad J^0(\alpha^0, \alpha) = \mathbb{E}\Big[\int_0^T \big(f^0(\mu_t) + k_H\,\phi^0(\mu_t)\big)\,dt\Big].$$
FINITE STATE MEAN FIELD GAMES
State dynamics: $(X_t)_t$ continuous-time Markov chain in $E$ with Q-matrix $(q_t(x, x'))_{t, x, x' \in E}$.
Mean field structure of the Q-matrix: $q_t(x, x') = \lambda_t(x, x', \mu, \alpha)$.
Control space: $A \subset \mathbb{R}^k$; sometimes $A$ is finite, e.g. $A = \{0, 1\}$, or a function space.
Control strategies in feedback form: $\alpha_t = \phi(t, X_t)$, for some $\phi : [0,T] \times E \to A$.
Mean field interaction through empirical measures: $\mu \in \mathcal{P}(E) \simeq (\mu(\{x\}))_{x \in E}$.
Kolmogorov-Fokker-Planck equation: if $\mu_t = \mathcal{L}(X_t)$,
$$\partial_t \mu_t(\{x\}) = [L^{\mu_t, \phi(t,\cdot),*}_t \mu_t](\{x\}) = \sum_{x' \in E} \mu_t(\{x'\})\,\hat q^{\mu_t,\phi(t,\cdot)}_t(x', x) = \sum_{x' \in E} \mu_t(\{x'\})\,\lambda_t(x', x, \mu_t, \phi(t, x')), \qquad x \in E.$$
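The forward Kolmogorov-Fokker-Planck equation above is an ODE on the simplex and can be integrated by forward Euler; a minimal sketch, with the rate function `rates` and the feedback `phi` left as user-supplied placeholders:

```python
# Forward Euler for the Kolmogorov-Fokker-Planck equation
#   d/dt mu_t({x}) = sum_{x'} mu_t({x'}) lambda_t(x', x, mu_t, phi(t, x')).
# `rates(t, x_from, x_to, mu, a)` returns an off-diagonal rate and
# `phi(t, x)` returns the action used in state x; both are placeholders.
import numpy as np

def kfp_forward(rates, phi, mu0, T=10.0, n_steps=1000):
    d = len(mu0)
    dt = T / n_steps
    mu = np.array(mu0, dtype=float)
    traj = [mu.copy()]
    for k in range(n_steps):
        t = k * dt
        # Q-matrix induced by the current measure and feedback
        Q = np.array([[rates(t, x1, x2, mu, phi(t, x1)) if x1 != x2 else 0.0
                       for x2 in range(d)] for x1 in range(d)])
        np.fill_diagonal(Q, -Q.sum(axis=1))   # rows sum to zero
        mu = mu + dt * (mu @ Q)               # d mu/dt = mu Q
        traj.append(mu.copy())
    return np.array(traj)
```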
FINITE STATE MEAN FIELD GAMES: OPTIMIZATION
Hamiltonian
$$H(t, x, \mu, h, \alpha) = \sum_{x' \in E} \lambda_t(x, x', \mu, \alpha)\,h(x') + f(t, x, \mu, \alpha).$$
Hamiltonian minimizer
$$\hat\alpha(t, x, \mu, h) = \arg\inf_{\alpha \in A} H(t, x, \mu, h, \alpha).$$
Minimized Hamiltonian
$$H^*(t, x, \mu, h) = \inf_{\alpha \in A} H(t, x, \mu, h, \alpha) = H(t, x, \mu, h, \hat\alpha(t, x, \mu, h)).$$
HJB equation
$$\partial_t u^\mu(t, x) + H^*(t, x, \mu_t, u^\mu(t, \cdot)) = 0, \qquad 0 \le t \le T,\ x \in E,$$
with terminal condition $u^\mu(T, x) = g(x, \mu_T)$.
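A minimal sketch of the corresponding backward HJB solver when the action set is finite (here A = {0,1} as in the cyber-security model); `rates`, `f`, `g` and the measure flow `mu_traj` are placeholders with the signatures indicated in the comments.

```python
# Backward Euler for the HJB system on E = {0, ..., d-1} with a finite action
# set (A = {0, 1}).  `rates(t, x, x', mu, a)`, `f(t, x, mu, a)`, `g(x, mu)` and
# the measure flow `mu_traj` (array of shape (n_steps+1, d)) are placeholders.
import numpy as np

def hjb_backward(rates, f, g, mu_traj, actions=(0, 1), T=10.0):
    n_steps, d = len(mu_traj) - 1, len(mu_traj[-1])
    dt = T / n_steps
    u = np.array([g(x, mu_traj[-1]) for x in range(d)])    # terminal condition
    phi = np.zeros((n_steps, d))                           # minimizing feedback
    for k in reversed(range(n_steps)):
        t, mu = k * dt, mu_traj[k]
        u_new = u.copy()
        for x in range(d):
            # H(t,x,mu,u,a) = sum_{x' != x} lambda(x,x',mu,a) (u(x') - u(x)) + f(t,x,mu,a)
            ham = [sum(rates(t, x, x2, mu, a) * (u[x2] - u[x])
                       for x2 in range(d) if x2 != x) + f(t, x, mu, a)
                   for a in actions]
            best = int(np.argmin(ham))
            phi[k, x] = actions[best]
            u_new[x] = u[x] + dt * ham[best]   # d/dt u + H* = 0, stepped backward
        u = u_new
    return u, phi                              # value at t = 0 and feedback table
```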
TRANSITION RATES Q-MATRICES
For $\alpha = 0$:
$$\lambda_t(\cdot, \cdot, \mu, \alpha^0, 0) = \begin{array}{c|cccc} & DI & DS & UI & US \\ \hline DI & * & q^D_{rec} & 0 & 0 \\ DS & \alpha^0 q^D_{inf} + \beta_{DD}\,\mu(\{DI\}) + \beta_{UD}\,\mu(\{UI\}) & * & 0 & 0 \\ UI & 0 & 0 & * & q^U_{rec} \\ US & 0 & 0 & \alpha^0 q^U_{inf} + \beta_{UU}\,\mu(\{UI\}) + \beta_{DU}\,\mu(\{DI\}) & * \end{array}$$
and for $\alpha = 1$:
$$\lambda_t(\cdot, \cdot, \mu, \alpha^0, 1) = \begin{array}{c|cccc} & DI & DS & UI & US \\ \hline DI & * & q^D_{rec} & \lambda & 0 \\ DS & \alpha^0 q^D_{inf} + \beta_{DD}\,\mu(\{DI\}) + \beta_{UD}\,\mu(\{UI\}) & * & 0 & \lambda \\ UI & \lambda & 0 & * & q^U_{rec} \\ US & 0 & \lambda & \alpha^0 q^U_{inf} + \beta_{UU}\,\mu(\{UI\}) + \beta_{DU}\,\mu(\{DI\}) & * \end{array}$$
where each instance of $*$ on the diagonal should be replaced by the negative of the sum of the off-diagonal entries of its row.
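For concreteness, the tables can be encoded as a single function returning the 4x4 rate matrix for a given distribution mu, hacker action a0 and owner action a; the numerical parameter values below are placeholders, not the ones used for the plots.

```python
# The two rate tables above as one function of (mu, a0, a).
# The numerical parameter values are placeholders.
import numpy as np

DI, DS, UI, US = 0, 1, 2, 3
PARAMS = dict(qD_rec=0.5, qU_rec=0.4, qD_inf=0.3, qU_inf=0.6,
              bDD=0.1, bUD=0.3, bUU=0.4, bDU=0.2, lam=0.8)

def q_matrix(mu, a0, a, p=PARAMS):
    """Rate matrix lambda(., ., mu, a0, a); rows sum to zero."""
    Q = np.zeros((4, 4))
    Q[DI, DS] = p["qD_rec"]                                              # recovery, protected
    Q[UI, US] = p["qU_rec"]                                              # recovery, unprotected
    Q[DS, DI] = a0 * p["qD_inf"] + p["bDD"] * mu[DI] + p["bUD"] * mu[UI] # infection, protected
    Q[US, UI] = a0 * p["qU_inf"] + p["bUU"] * mu[UI] + p["bDU"] * mu[DI] # infection, unprotected
    if a == 1:                                                           # owner switches protection level
        Q[DI, UI] = Q[DS, US] = Q[UI, DI] = Q[US, DS] = p["lam"]
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q

if __name__ == "__main__":
    print(q_matrix([0.25, 0.25, 0.25, 0.25], a0=1, a=1).round(3))
```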
EQUILIBRIUM DISTRIBUTION OVER TIME WITH CONSTANT ATTACKER
FIGURE: Time evolution of the state distribution µ(t) (components µ(t)[DI], µ(t)[DS], µ(t)[UI], µ(t)[US]), shown in three panels.
EQUILIBRIUM OPTIMAL FEEDBACK φ(t, ·)
FIGURE: Time evolution of the optimal feedback function. From left to right: φ(t, DI), φ(t, DS), φ(t, UI), and φ(t, US).
CONVERGENCE MAY BE ELUSIVE
FIGURE: From left to right, time evolution of the distribution µ_t for the parameters given in the text, after 1, 5, 2, and 1 iterations of the successive solutions of the HJB equation and the Kolmogorov-Fokker-Planck equation.
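For reference, the iteration in question alternates the backward HJB solve and the forward KFP solve; the sketch below reuses the hypothetical `kfp_forward` and `hjb_backward` helpers sketched earlier and adds a damped update of the measure flow, a common remedy when the plain iteration fails to converge.

```python
# Damped Picard iteration alternating the backward HJB solve and the forward
# KFP solve (reusing the hypothetical kfp_forward and hjb_backward sketches).
import numpy as np

def mfg_iteration(rates, f, g, phi_init, mu0, T=10.0, n_steps=1000,
                  n_iter=100, damping=0.5, tol=1e-6):
    phi_fn = phi_init
    mu_traj = kfp_forward(rates, phi_fn, mu0, T, n_steps)
    for _ in range(n_iter):
        _, phi_table = hjb_backward(rates, f, g, mu_traj, T=T)   # best response to mu
        phi_fn = lambda t, x, tab=phi_table: tab[min(int(t / T * n_steps), n_steps - 1), x]
        mu_new = kfp_forward(rates, phi_fn, mu0, T, n_steps)     # measure induced by phi
        gap = np.abs(mu_new - mu_traj).max()
        mu_traj = (1 - damping) * mu_traj + damping * mu_new     # damped update
        if gap < tol:
            break
    return mu_traj, phi_table
```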
THE MASTER EQUATION
Equation
$$\partial_t U(t, x, \mu) + H^*(t, x, \mu, U(t, \cdot, \mu)) + \sum_{x' \in E} h^*(t, \mu, U(t, \cdot, \mu))(x')\,\frac{\partial U(t, x, \mu)}{\partial \mu(\{x'\})} = 0,$$
where the $\mathbb{R}^E$-valued function $h^*$ is defined on $[0,T] \times \mathcal{P}(E) \times \mathbb{R}^E$ by:
$$h^*(t, \mu, u)(x') = \int_E \lambda_t\big(x, x', \mu, \hat\alpha(t, x, \mu, u)\big)\,d\mu(x) = \sum_{x \in E} \lambda_t\big(x, x', \mu, \hat\alpha(t, x, \mu, u)\big)\,\mu(\{x\}).$$
System of Ordinary Differential Equations (ODEs): if and when the Master equation is solved,
$$\partial_t \mu_t(\{x\}) = h^*(t, \mu_t, U(t, \cdot, \mu_t))(x).$$
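Once a (numerical approximation of the) master field U is available, the equilibrium measure flow follows from this ODE by forward Euler; in the sketch below, `rates`, `alpha_hat` and `U` are user-supplied placeholders.

```python
# Forward Euler for  d/dt mu_t({x}) = h*(t, mu_t, U(t, ., mu_t))(x), given a
# (numerically computed) master field U.  `rates`, `alpha_hat` and `U` are
# placeholders: alpha_hat(t, x, mu, u) minimizes the Hamiltonian and
# U(t, x, mu) evaluates the master field.
import numpy as np

def measure_flow_from_master(rates, alpha_hat, U, mu0, T=10.0, n_steps=1000):
    d = len(mu0)
    dt = T / n_steps
    mu = np.array(mu0, dtype=float)
    traj = [mu.copy()]
    for k in range(n_steps):
        t = k * dt
        u = np.array([U(t, x, mu) for x in range(d)])   # U(t, ., mu_t)
        Q = np.array([[rates(t, x1, x2, mu, alpha_hat(t, x1, mu, u)) if x1 != x2 else 0.0
                       for x2 in range(d)] for x1 in range(d)])
        np.fill_diagonal(Q, -Q.sum(axis=1))
        mu = mu + dt * (mu @ Q)                         # (mu Q)(x) = h*(t, mu, u)(x)
        traj.append(mu.copy())
    return np.array(traj)
```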
µ_t EVOLUTION FROM THE MASTER EQUATION
FIGURE: Time evolution of the state distribution µ(t). As before, we used the initial conditions µ_0 = (0.25, 0.25, 0.25, 0.25) on the left and µ_0 = (1, 0, 0, 0) on the right.
IN THE PRESENCE OF A MAJOR (HACKER) PLAYER
FIGURE: Time evolution, in equilibrium, of the distribution µ_t of the states of the computers in the network for the initial condition µ_0 = (0.25, 0.25, 0.25, 0.25): when the major player is not rewarded for its attacks, i.e. when f⁰(µ) ≡ 0 (leftmost plot); in the absence of a major player, i.e. v = 0 (middle plot); and with f⁰(µ) = k⁰(µ({UI}) + µ({DI})) with k⁰ = 0.5 (rightmost plot).
POA BOUNDS FOR CONTINUOUS TIME MFGS
Price of Anarchy (PoA) bounds compare the social welfare of a Nash equilibrium to what a central planner would achieve (Koutsoupias-Papadimitriou).
Usual game model for cyber security: a zero-sum game between attacker and network manager; compute the expected cost to the network for protection.
MFG model for cyber security: let the individual computer owners take care of their own security, hope for a Nash equilibrium, and compute the expected cost to the network for protection.
The PoA measures how much worse the Nash equilibrium does.
POA BOUNDS FOR CONTINUOUS TIME MFGS WITH FINITE STATE SPACES
$X_t = (X^1_t, \dots, X^N_t)$ state at time $t$, with $X^i_t \in \{e_1, \dots, e_d\}$.
Use distributed feedback controls, so that the state is a continuous-time Markov chain with dynamics given by Q-matrices $(q_t(x, x'))_{t, x, x' \in E}$.
Empirical measures
$$\mu^N_x = \frac{1}{N}\sum_{i=1}^N \delta_{x^i} = \sum_{l=1}^d p_l\,\delta_{e_l}$$
where $p_l = \#\{i;\ 1 \le i \le N,\ x^i = e_l\}/N$ is the proportion of elements $x^i$ of the sample which are equal to $e_l$.
Cost functionals: player $i$ minimizes
$$J^i(\alpha^1, \dots, \alpha^N) = \mathbb{E}\Big[\int_0^T f(t, X^i_t, \mu^N_{X_t}, \alpha^i_t)\,dt + g(X^i_T, \mu^N_{X_T})\Big].$$
SOCIAL COST
If the N players use distributed Markovian control strategies of the form $\alpha^i_t = \phi(t, X^i_t)$, we define the cost (per player) to the system as the quantity
$$J^{(N)}_\phi = \frac{1}{N}\sum_{i=1}^N J^i(\alpha^1, \dots, \alpha^N).$$
In the limit $N \to \infty$ the social cost should be
$$\lim_{N\to\infty} J^{(N)}_\phi = \lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^N \mathbb{E}\Big[\int_0^T f(t, X^i_t, \mu^N_{X_t}, \phi(t, X^i_t))\,dt + g(X^i_T, \mu^N_{X_T})\Big] = \lim_{N\to\infty} \mathbb{E}\Big[\int_0^T \langle f(t, \cdot, \mu^N_{X_t}, \phi(t, \cdot)), \mu^N_{X_t}\rangle\,dt + \langle g(\cdot, \mu^N_{X_T}), \mu^N_{X_T}\rangle\Big]$$
if we use the notation $\langle \varphi, \nu\rangle$ for the integral $\int \varphi(z)\,\nu(dz)$. Now if $\mu^N_{X_t}$ converges toward a deterministic $\mu_t$, the social cost becomes:
$$SC_\phi(\mu) = \int_0^T \langle f(t, \cdot, \mu_t, \phi(t, \cdot)), \mu_t\rangle\,dt + \langle g(\cdot, \mu_T), \mu_T\rangle. \qquad (3)$$
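The limiting social cost (3) is a plain quadrature once a measure flow and a feedback are given; a minimal sketch, with `f`, `g`, `phi` and `mu_traj` as placeholders:

```python
# Quadrature for the limiting social cost
#   SC_phi(mu) = int_0^T < f(t, ., mu_t, phi(t, .)), mu_t > dt + < g(., mu_T), mu_T >.
# `f`, `g`, `phi` and the measure flow `mu_traj` on a uniform grid are placeholders.
def social_cost(f, g, phi, mu_traj, T=10.0):
    n_steps, d = len(mu_traj) - 1, len(mu_traj[-1])
    dt = T / n_steps
    cost = 0.0
    for k in range(n_steps):
        t, mu = k * dt, mu_traj[k]
        cost += dt * sum(f(t, x, mu, phi(t, x)) * mu[x] for x in range(d))
    mu_T = mu_traj[-1]
    return cost + sum(g(x, mu_T) * mu_T[x] for x in range(d))
```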
ASYMPTOTIC REGIME N = ∞
Two alternatives:
1) $\phi$ is the optimal feedback function for an MFG equilibrium for which $\mu$ is the equilibrium flow of statistical distributions of the state, in which case we use the notation $SC_{MFG}$ for $SC_\phi(\mu)$; in equilibrium,
$$\mathbb{E}\Big[\int_0^T f(t, X_t, \mu_t, \phi(t, X_t))\,dt + g(X_T, \mu_T)\Big] = \int_0^T \langle f(t, \cdot, \mu_t, \phi(t, \cdot)), \mathcal{L}(X_t)\rangle\,dt + \langle g(\cdot, \mu_T), \mathcal{L}(X_T)\rangle = SC_\phi(\mu) = SC_{MFG}.$$
2) $\phi$ is the feedback (chosen by a central planner) minimizing the social cost $SC_\phi(\mu)$ without having to be an MFG Nash equilibrium, in which case we use the notation $SC_{MKV}$ for $SC_\phi(\mu)$:
$$\hat\phi = \arg\inf_\phi \int_0^T \langle f(t, \cdot, \mu_t, \phi(t, \cdot)), \mu_t\rangle\,dt + \langle g(\cdot, \mu_T), \mu_T\rangle$$
where $\mu_t$ satisfies the Kolmogorov-Fokker-Planck forward dynamics.
POA: SOCIAL COST COMPUTATION
Minimize
$$\int_0^T \langle f(t, \cdot, \mu_t, \phi(t, \cdot)), \mu_t\rangle\,dt + \langle g(\cdot, \mu_T), \mu_T\rangle$$
under the dynamical constraint
$$\partial_t \mu_t(\{x\}) = [L^{\mu_t, \phi(t,\cdot),*}_t \mu_t](\{x\}) = \sum_{x' \in E} \mu_t(\{x'\})\,\lambda(t, x', x, \mu_t, \phi(t, x')), \qquad x \in E.$$
This is an ODE in the d-dimensional probability simplex $\mathcal{S}_d \subset \mathbb{R}^d$!
Hamiltonian $H(t, \mu, \varphi, \phi)$ and minimized Hamiltonian:
$$H(t, \mu, \varphi, \phi) = \langle \varphi, [L^{\mu,\phi,*}_t \mu]\rangle + \langle f(t, \cdot, \mu, \phi(\cdot)), \mu\rangle = \langle L^{\mu,\phi}_t \varphi + f(t, \cdot, \mu, \phi(\cdot)), \mu\rangle,$$
$$H^*(t, \mu, \varphi) = \inf_{\phi \in \tilde A} H(t, \mu, \varphi, \phi).$$
Assume the infimum is attained for a unique $\hat\phi$:
$$H^*(t, \mu, \varphi) = H(t, \mu, \varphi, \hat\phi(t, \mu, \varphi)) = \langle L^{\mu,\hat\phi(t,\mu,\varphi)}_t \varphi + f(t, \cdot, \mu, \hat\phi(t, \mu, \varphi)(\cdot)), \mu\rangle.$$
HJB equation
$$\partial_t v(t, \mu) + H^*\Big(t, \mu, \frac{\delta v(t, \mu)}{\delta \mu}\Big) = 0, \qquad v(T, \mu) = \langle g(\cdot, \mu), \mu\rangle.$$
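For small finite state spaces, the central planner's value can also be approximated crudely by brute force; the sketch below searches over time-independent feedbacks only (a simplifying assumption, not the HJB approach above) and reuses the hypothetical `kfp_forward` and `social_cost` helpers from the earlier sketches, after which the price of anarchy is the ratio SC_MFG / SC_MKV.

```python
# Crude central-planner baseline: brute-force search over time-independent
# feedbacks phi: E -> {0,1} (a simplifying assumption), each scored by running
# the forward KFP dynamics and evaluating the social cost (reusing the
# hypothetical kfp_forward and social_cost sketches).
import itertools
import numpy as np

def central_planner_bruteforce(rates, f, g, mu0, d=4, T=10.0, n_steps=1000):
    best_cost, best_phi = np.inf, None
    for acts in itertools.product((0, 1), repeat=d):        # one action per state
        phi = lambda t, x, acts=acts: acts[x]
        mu_traj = kfp_forward(rates, phi, mu0, T, n_steps)
        cost = social_cost(f, g, phi, mu_traj, T)
        if cost < best_cost:
            best_cost, best_phi = cost, acts
    return best_cost, best_phi                              # approximates SC_MKV

def price_of_anarchy(sc_mfg, sc_mkv):
    return sc_mfg / sc_mkv
```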
REMARKS ON DERIVATIVES W.R.T. MEASURES
Standard identification $\mathcal{P}(E) \ni \mu \leftrightarrow p = (p_1, \dots, p_d) \in \mathcal{S}_d$ via $p_i = \mu(\{e_i\})$ for $i = 1, \dots, d$, i.e. $\mu = \sum_{i=1}^d p_i \delta_{e_i}$.
$\delta v/\delta\mu$ makes sense when $v$ is defined on an open neighborhood of the probability simplex $\mathcal{S}_d$: $\partial v(t, \mu)/\partial \mu(\{x'\})$ is the derivative of $v$ with respect to the weight $\mu(\{x'\})$.
Important remark: $L^{\mu,\phi}_t \varphi$ does not change if we add a constant to the function $\varphi$, and neither does $\hat\phi(t, \mu, \varphi)$.
Consequence (for numerical purposes): we can rewrite the HJB equation as
$$\partial_t v(t, \mu) + H^*\Big(t, \mu, \Big(\frac{\partial v(t, \mu)}{\partial \mu(\{x'\})} - \frac{\partial v(t, \mu)}{\partial \mu(\{x\})}\Big)_{x' \in E}\Big) = 0,$$
and identify $\frac{\partial v(t,\mu)}{\partial \mu(\{x'\})} - \frac{\partial v(t,\mu)}{\partial \mu(\{x\})}$, for $x' \ne x$, with the partial derivative of $v(t, \cdot)$ with respect to $\mu(\{x'\})$ whenever $v(t, \cdot)$ is regarded as a smooth function of the $(d-1)$-tuple $(\mu(\{x'\}))_{x' \in E \setminus \{x\}}$, which we can see as an element of the $(d-1)$-dimensional domain
$$\mathcal{S}_{d-1} = \Big\{(p_1, \dots, p_{d-1}) \in [0,1]^{d-1} : \sum_{i=1}^{d-1} p_i \le 1\Big\}.$$
MFGS OF TIMING WITH MAJOR AND MINOR PLAYERS
The example of corporate bonds.
Major player = bond issuer. The bond is callable: the major player (issuer) chooses a stopping time at which to pay off the investors, stop coupon payments to the investors, and refinance its debt on better terms.
Minor players = field of investors. The bond is convertible: each minor player (investor) chooses a stopping time at which to convert the bond certificate into a fixed number (the conversion ratio) of stock shares, if and when owning the stock is more profitable, thereby creating dilution of the stock.