Introduction to Mean Field Game Theory Part I


1 School of Mathematics and Statistics Carleton University Ottawa, Canada Workshop on Game Theory, Institute for Math. Sciences National University of Singapore, June 2018

2 Outline of talk: background and motivation for mean field game (MFG) theory; illustrative modeling examples; fundamental methodologies; applications and references.

3 A mean field game is a situation of stochastic (dynamic) decision making where each agent interacts with the aggregate effect of all other agents; agents are non-cooperative. Example: Hybrid electric vehicle recharging control (interacting through aggregate load/price)

4 Noncooperative games: historical background. 1. Cournot duopoly and Cournot equilibrium, 1838. 2. Two-person zero-sum games and von Neumann's minimax theorem, 1928. 3. N-person nonzero-sum games and Nash equilibrium, 1950. Antoine Cournot, John von Neumann, John Nash.

5 Two directions for generalizations: (i) include dynamics: L. Shapley (1953), MDP model; Rufus Isaacs (1965), Differential Games, Wiley (work of the 1950s); (ii) consider large populations of players. Technical challenges arise: the curse of dimensionality with a large number of players. Example: 50 players, each having 2 states and 2 actions, require $4^{50} = 2^{100} \approx 1.27\times 10^{30}$ system configurations.

6 Early efforts to deal with large populations. J. W. Milnor and L. S. Shapley (1978), R. J. Aumann (1975): cooperative games. E. J. Green (1984): non-cooperative games. M. Ali Khan and Y. Sun (2002): survey on non-cooperative games with many players; static models. B. Jovanovic and R. W. Rosenthal (1988): anonymous sequential games, considering distributional strategies for an infinite population; however, individual behaviour is not addressed.

7 The early motivation for mean field games A wireless network power control problem

8 Rate based power control (Huang, Caines, Malhamé 03). Power dynamics:
\[ dp_i = u_i\,dt, \quad |u_i| \le u_{\max}, \quad 1 \le i \le N. \]
Log-normal channel gain (in dB), following Charalambous et al. 99:
\[ dx_i = -a(x_i + b)\,dt + \sigma\,dw_i, \]
where $w_i$ is white noise (Brownian motion). Cost function of user $i$:
\[ J_i = E\int_0^T \Big\{\Big[p_i e^{x_i} - \gamma\Big(\frac{\alpha}{N}\sum_{j\ne i} p_j e^{x_j} + \eta\Big)\Big]^2 + r(u_i)^2\Big\}\,dt, \]
where $\frac{\alpha}{N}\sum_{j\ne i} p_j e^{x_j} + \eta$ is the total interference due to other users and thermal noise at the receiver. Control objective: keep the signal-to-interference ratio around $\gamma$. Cocktail party model: each speaker deals with background noise; one against the mass. [Figure: a cell of $N$ users, mobiles $M_1,\dots,M_4$ transmitting powers $p_1,\dots,p_4$ to the base station.]
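To make the interaction term concrete, here is a minimal simulation sketch (not from the slides) of the channel-gain dynamics and each user's interference term, assuming illustrative values for $a, b, \sigma, \alpha, \eta$ and constant transmitted powers.

```python
import numpy as np

# Minimal sketch (assumed parameters): simulate the log-normal channel gains
# dx_i = -a(x_i + b) dt + sigma dW_i and evaluate each user's interference
# (alpha/N) * sum_{j != i} p_j exp(x_j) + eta for fixed powers p_i.
rng = np.random.default_rng(0)
N, T, dt = 20, 1.0, 0.01
a, b, sigma = 0.1, 5.0, 0.5          # illustrative channel parameters
alpha, eta = 1.0, 0.1                # interference scaling and receiver noise
p = np.full(N, 1.0)                  # constant transmitted powers (placeholder)

x = -b * np.ones(N)                  # start at the mean-reversion level
for _ in range(int(T / dt)):         # Euler-Maruyama for the channel SDE
    dW = rng.normal(scale=np.sqrt(dt), size=N)
    x += -a * (x + b) * dt + sigma * dW

received = p * np.exp(x)             # received power of each user
total = received.sum()
interference = alpha / N * (total - received) + eta   # excludes the user's own term
sir = received / interference        # signal-to-interference ratio to keep near gamma
print(sir[:5])
```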

9 Computational load with N = 2 (Huang, Caines, Malhamé, IFAC 02, IEEE TAC 04): grid size for the 4 spatial variables in $\mathbb{R}^4$: $90^4$; discretize the time interval [0, 1] with step size 0.1. The computation requires memory allocation of 70 million sizeof(float) in C; very heavy computation via iterations! For large N, such iteration is no longer feasible. This motivated the development of a different approach to tackle the dimensionality difficulty. See Aziz and Caines (2017) for more recent numerical analysis using MFG.

10 Conceptual modeling of mean field games. [Figure: agent $i$ with state $z_i$ and control $u_i$ plays against the mass influence $m(t)$.] Model: each player interacts with the empirical state distribution of the $N$ players via dynamics and/or costs (see examples). Individual state and control $(z_i, u_i)$; empirical distribution $\delta^{(N)}_{z(t)} = \frac{1}{N}\sum_{i=1}^N \delta_{z_i(t)}$. Objective: overcome the dimensionality difficulty.

11 Illustrative examples for MFG modeling: discrete time linear quadratic (LQ) model; continuous time LQ model; MDP model; stochastic growth with relative performance.

12 Example 1: Discrete time linear quadratic (LQ) games.
\[ X_{t+1}^i = A X_t^i + B u_t^i + G X_t^{(N)} + D W_{t+1}^i, \quad 1 \le i \le N, \]
\[ J_i(u^i, u^{-i}) = E\sum_{t=0}^{\infty} \rho^t \Big\{ \big|X_t^i - (\Gamma X_t^{(N)} + \eta)\big|_Q^2 + (u_t^i)^T R u_t^i \Big\}. \]
$X_t^i$: state; $u_t^i$: control; $u^{-i}$: controls of all other agents; $W_t^i$: noise; $Q \ge 0$, $R > 0$, $\rho \in (0,1)$; $|z|_Q = (z^T Q z)^{1/2}$. The coupling term is $X_t^{(N)} = \frac{1}{N}\sum_{k=1}^N X_t^k$. It is possible to consider different parameters such as $A_i$ instead of $A$, etc.
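The sketch below simulates the coupled discrete-time dynamics under an arbitrary scalar feedback (a placeholder gain, not the MFG solution) just to show how the coupling term $X_t^{(N)}$ enters every player's state equation; all parameter values are assumptions for illustration.

```python
import numpy as np

# Sketch of the coupled discrete-time LQ dynamics (scalar case, assumed values):
# X^i_{t+1} = A X^i_t + B u^i_t + G Xbar^(N)_t + D W^i_{t+1},
# under an arbitrary linear feedback u^i_t = -k X^i_t (not the equilibrium strategy).
rng = np.random.default_rng(1)
N, T = 100, 50
A, B, G, D = 0.9, 1.0, 0.1, 0.2      # illustrative model parameters
k = 0.3                              # arbitrary feedback gain (placeholder)

X = rng.normal(size=N)               # i.i.d. initial states
mean_field = []
for t in range(T):
    Xbar = X.mean()                  # coupling term Xbar^(N)_t = (1/N) sum_k X^k_t
    mean_field.append(Xbar)
    u = -k * X
    W = rng.normal(size=N)
    X = A * X + B * u + G * Xbar + D * W
print(mean_field[-5:])
```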

13 Example 1′: Continuous time linear quadratic (LQ) games.
\[ dX_t^i = (A_{\theta_i} X_t^i + B u_t^i + G X_t^{(N)})\,dt + D\,dW_t^i, \quad 1 \le i \le N, \]
\[ J_i(u^i, u^{-i}) = E\int_0^\infty e^{-\rho t} \Big\{ \big|X_t^i - (\Gamma X_t^{(N)} + \eta)\big|_Q^2 + (u_t^i)^T R u_t^i \Big\}\,dt. \]
$X_t^i$: state; $u_t^i$: control; $W_t^i$: white noise as Brownian motion; $\rho > 0$; $|z|_Q = (z^T Q z)^{1/2}$. The coupling term is $X_t^{(N)} = \frac{1}{N}\sum_{k=1}^N X_t^k$; $\theta_i$ is a dynamic parameter indicating heterogeneity. Note: this is the continuous time counterpart of Example 1.

14 Example 2: Discrete time Markov decision processes (MDPs). $N$ MDPs $(x_t^i, a_t^i)$, $1 \le i \le N$, where the state $x_t^i$ has transitions affected by action $a_t^i$ but not by $a_t^j$, $j \ne i$. Player $i$ has cost
\[ J_i = E\sum_{t=0}^{T} \rho^t\, l(x_t^i, x_t^{(N)}, a_t^i), \quad \rho \in (0,1]. \]

15 Example 3: Growth with relative performance. Dynamics of $N$ agents:
\[ dX_t^i = \big[ A(X_t^i)^\alpha - \delta X_t^i - C_t^i \big]\,dt - \sigma X_t^i\,dW_t^i, \quad 1 \le i \le N. \]
$X_t^i$: capital stock, $X_0^i > 0$, $E X_0^i < \infty$; $C_t^i$: consumption rate; $Ax^\alpha$, $\alpha \in (0,1)$, $A > 0$: Cobb-Douglas production function; $\delta\,dt + \sigma\,dW_t^i$: stochastic depreciation (see e.g. Wälde 11, Feicht and Stummer 10 for stochastic depreciation modeling). $\{W_t^i, 1 \le i \le N\}$ are i.i.d. standard Brownian motions; the i.i.d. initial states $\{X_0^i, 1 \le i \le N\}$ are also independent of the $N$ Brownian motions. Take the standard choice $\gamma = 1 - \alpha$, i.e., equalizing the coefficient of relative risk aversion to the capital share. See Huang and Nguyen (2016).

16 Example 3 (ctd). The utility functional of agent $i$:
\[ J_i(C^1,\dots,C^N) = E\Big[\int_0^T e^{-\rho t}\, U(C_t^i, C_t^{(N,\gamma)})\,dt + e^{-\rho T} S(X_T^i)\Big], \quad \text{where } C_t^{(N,\gamma)} = \frac{1}{N}\sum_{i=1}^N (C_t^i)^\gamma,\ \gamma\in(0,1). \]
Take $S(x) = \eta x^\gamma$, $\eta > 0$, and
\[ U(C_t^i, C_t^{(N,\gamma)}) = \frac{1}{\gamma}(C_t^i)^{\gamma(1-\lambda)}\Big(\frac{(C_t^i)^\gamma}{C_t^{(N,\gamma)}}\Big)^{\lambda} = \Big[\frac{1}{\gamma}(C_t^i)^\gamma\Big]^{1-\lambda}\Big[\frac{1}{\gamma}\frac{(C_t^i)^\gamma}{C_t^{(N,\gamma)}}\Big]^{\lambda} =: U_0^{1-\lambda} U_1^{\lambda}. \]
Interpretation of $U$: a weighted geometric mean of $U_0$ (own utility) and $U_1$ (relative utility).
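A quick numerical check (a sketch with arbitrary sample values, not from the slides) that the stated utility equals the weighted geometric mean $U_0^{1-\lambda}U_1^{\lambda}$:

```python
import numpy as np

# Sketch: verify U = U0^(1-lam) * U1^lam for the relative-performance utility,
# with U0 = (1/gamma) c^gamma and U1 = (1/gamma) c^gamma / Cbar, using arbitrary values.
gamma, lam = 0.5, 0.3
rng = np.random.default_rng(2)
C = rng.uniform(0.5, 2.0, size=10)            # consumption rates of 10 agents
Cbar = np.mean(C**gamma)                      # C^(N,gamma) = (1/N) sum_i C_i^gamma

i = 0
U0 = C[i]**gamma / gamma                      # own utility
U1 = C[i]**gamma / (gamma * Cbar)             # relative utility
U_direct = (1 / gamma) * C[i]**(gamma * (1 - lam)) * (C[i]**gamma / Cbar)**lam
print(np.isclose(U_direct, U0**(1 - lam) * U1**lam))   # expect True
```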

17 Try conventional methods to solve, for instance, Example 1′:
\[ dX_t^i = (A_{\theta_i} X_t^i + B u_t^i + G X_t^{(N)})\,dt + D\,dW_t^i, \quad 1 \le i \le N, \]
\[ J_i(u^i, u^{-i}) = E\int_0^\infty e^{-\rho t}\Big\{ \big|X_t^i - (\Gamma X_t^{(N)} + \eta)\big|_Q^2 + (u_t^i)^T R u_t^i \Big\}\,dt. \]
Form the large vector $X_t := [X_t^1,\dots,X_t^N]^T$; similarly, $u_t = [u_t^1,\dots,u_t^N]^T$. This gives a standard dynamic game. Immediate difficulties (for large $N$): solve a very large scale dynamic programming problem; in the resulting solution, each $u_t^i$ will be a function of $X_t$, which is too much information and may be unnecessary. Such difficulties motivate the development of MFG theory!

18 The fundamental diagram of MFG theory. [Diagram] Problem $\mathcal P_0$: $N$-player Nash game with states $(x_1,\dots,x_N)$, strategies $(u_1,\dots,u_N)$ and costs $(J_1,\dots,J_N)$. Route 1: a large-scale coupled equation system (example: $N$ coupled dynamic programming equations), followed by a performance analysis. Route 2: Problem $\mathcal P_\infty$: optimal control of a single player, with its own state and control, where the mean field $\mu$ is fixed and not controlled by the individual; MFG equation system: 1 equation of optimal control and 1 equation of mean field dynamics (for $\mu$); example: HJB PDE and FPK PDE (or McKean-Vlasov SDE). Blue: direct approach (Lasry and Lions, 06, 07); red: fixed point approach (Huang, Caines, Malhamé, 03, 07), (Huang, Malhamé, Caines, 06); see an overview of the two approaches in (Caines, Huang, Malhamé, 17).

19 Before analyzing MFG, we mention the math toolkit for optimal control: dynamic programming (DP) (in Markov control models); stochastic maximum principle (SMP)/calculus of variations. SMP is useful when the optimal control problem does not have Markov properties (example: random model coefficients). Here we give brief preliminaries on continuous time DP.

20 Consider the optimal control problem with dynamics and cost:
\[ dX_t = (A X_t + B u_t + G\bar X_t)\,dt + D\,dW_t, \qquad J(u(\cdot)) = E\int_0^T e^{-\rho t} L(X_t, u_t)\,dt, \]
where $L = |X_t - (\Gamma\bar X_t + \eta)|_Q^2 + u_t^T R u_t$, $\rho > 0$, $Q \ge 0$, $R > 0$. Here $\bar X \in C[0,T]$; moreover $\bar X_t = O(e^{t(\rho-\epsilon)/2})$ for some small $\epsilon > 0$ (depending on $\bar X$) if $T = \infty$; denote $\bar X \in C_{\rho/2}[0,\infty)$. $u_t$ is $\mathcal F_t^W$-adapted (i.e., adapted to the filtration of the BM). Alternatively, one may take $u_t$ to be $X_t$-dependent (i.e., state feedback). We consider two cases: i) $T$ is finite, ii) $T = \infty$.

21 By the method of DP, we try to solve a family of optimization problems with initial pair $(t,x)$. Consider
\[ dX_r = (A X_r + B u_r + G\bar X_r)\,dr + D\,dW_r, \quad r \ge t, \quad X_t = x, \]
\[ J(t, x, u(\cdot)) = E_{t,x}\int_t^T e^{-\rho(r-t)} L(X_r, u_r)\,dr, \]
where $E_{t,x}$ indicates $X_t = x$, and $u_r$ is $\mathcal F_r^W$-adapted. Define the value function
\[ V(t,x) = \inf_u E_{t,x}\int_t^T e^{-\rho(r-t)} L(X_r, u_r)\,dr. \]

22 By DP, for $\delta > 0$,
\[ V(t,x) = \inf_u E\Big[\int_t^{t+\delta} e^{-\rho(r-t)} L(X_r, u_r)\,dr + e^{-\rho\delta} V(t+\delta, X_{t+\delta})\Big], \]
and we further obtain the HJB equation
\[ \rho V = V_t + V_x^T (Ax + G\bar X_t) + |x - (\Gamma\bar X_t + \eta)|_Q^2 + \tfrac12\mathrm{Tr}(D^T V_{xx} D) + \min_u\big[u^T R u + V_x^T B u\big] \]
\[ \phantom{\rho V} = V_t - \tfrac14 V_x^T B R^{-1} B^T V_x + V_x^T (Ax + G\bar X_t) + |x - (\Gamma\bar X_t + \eta)|_Q^2 + \tfrac12\mathrm{Tr}(D^T V_{xx} D), \]
and $V(T,x) = 0$ when $T$ is finite. $V_t, V_x, V_{xx}$: partial derivatives. See e.g. Fleming and Rishel (1975), Yong and Zhou (1999).

23 i) For finite $T$, denote $V(t,x) = x^T P(t)x + 2 s^T(t) x + q(t)$. We derive the differential Riccati equation (DRE)
\[ \rho P(t) = \dot P(t) - P(t) B R^{-1} B^T P(t) + P(t) A + A^T P(t) + Q, \qquad P(T) = 0, \]
and
\[ \rho s(t) = \dot s(t) + \big(A^T - P B R^{-1} B^T\big) s(t) + P G\bar X_t - Q(\Gamma\bar X_t + \eta), \qquad s(T) = 0. \]
Fact: the DRE has a unique solution (due to $Q \ge 0$ and $R > 0$). Use the minimizer in the HJB to give the optimal control
\[ u_t^{\mathrm{opt}} = -R^{-1} B^T (P(t) X_t + s(t)). \]
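As a numerical aside (a sketch, not from the slides), the DRE above can be integrated backward in time after rearranging for $\dot P$; the matrices below are illustrative placeholders.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Sketch: integrate the differential Riccati equation backward in time,
#   rho*P = Pdot - P B R^{-1} B^T P + P A + A^T P + Q,  P(T) = 0,
# using the time reversal tau = T - t. Matrices below are illustrative placeholders.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
rho, T = 0.1, 5.0
Rinv = np.linalg.inv(R)

def dP_dtau(tau, p_flat):
    P = p_flat.reshape(2, 2)
    # From the DRE: Pdot = rho*P + P B R^{-1} B^T P - P A - A^T P - Q, so dP/dtau = -Pdot.
    Pdot = rho * P + P @ B @ Rinv @ B.T @ P - P @ A - A.T @ P - Q
    return (-Pdot).flatten()

sol = solve_ivp(dP_dtau, [0.0, T], np.zeros(4))
P0 = sol.y[:, -1].reshape(2, 2)   # P(0), i.e. the Riccati solution at t = 0
print(P0)
```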

24 ii) For $T = \infty$, the DRE is replaced by an algebraic Riccati equation (ARE), which has a unique solution $\Pi \ge 0$. In this case, the ODE for $s$ has no terminal condition; a growth condition can be imposed to determine a unique solution for $s$.
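The discounted ARE $\rho\Pi = \Pi A + A^T\Pi - \Pi BR^{-1}B^T\Pi + Q$ is equivalent to a standard continuous-time ARE with $A$ shifted to $A - \frac{\rho}{2}I$; a minimal sketch (same illustrative matrices as in the previous sketch):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Sketch: rho*Pi = Pi A + A^T Pi - Pi B R^{-1} B^T Pi + Q is equivalent to the
# standard CARE 0 = Pi As + As^T Pi - Pi B R^{-1} B^T Pi + Q with As = A - (rho/2) I.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
rho = 0.1

As = A - (rho / 2) * np.eye(2)
Pi = solve_continuous_are(As, B, Q, R)
# Residual check of the discounted ARE (should be ~0):
res = Pi @ A + A.T @ Pi - Pi @ B @ np.linalg.solve(R, B.T) @ Pi + Q - rho * Pi
print(np.max(np.abs(res)))
```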

25 Solve Example 1′ by the fixed point approach. We use the LQ model to illustrate this methodology. The procedure here is very robust and can be applied to different models.

26 Fixed point approach to the $N$-player game (as in Ex. 1′), Problem $\mathcal P_0$:
\[ dX_t^i = (A_{\theta_i} X_t^i + B u_t^i + G X_t^{(N)})\,dt + D\,dW_t^i, \quad 1 \le i \le N, \]
\[ J_i(u^i, u^{-i}) = E\int_0^\infty e^{-\rho t}\Big\{ \big|X_t^i - (\Gamma X_t^{(N)} + \eta)\big|_Q^2 + (u_t^i)^T R u_t^i \Big\}\,dt. \]
Assume $x_0^{(N)} = \frac{1}{N}\sum_{i=1}^N E X_0^i \to \bar X_0$. Taking the mean field approximation $X_t^{(N)} \approx \bar X_t$ gives the optimal control problem $\mathcal P_\infty$:
\[ dX_t^i = (A_{\theta_i} X_t^i + B u_t^i + G\bar X_t)\,dt + D\,dW_t^i, \quad 1 \le i \le N, \]
\[ J_i(u^i) = E\int_0^\infty e^{-\rho t}\Big\{ \big|X_t^i - (\Gamma\bar X_t + \eta)\big|_Q^2 + (u_t^i)^T R u_t^i \Big\}\,dt. \]
Step 1. Solve $\mathcal P_\infty$ assuming the deterministic $\bar X_t$ were known. Step 2. Use the consistency condition of MFG to find an equation for $\bar X_t$.

27 Step 1. Denote (we consider $A_{\theta_i} \equiv A$ just for simplicity) the Riccati equation
\[ \rho\Pi = \Pi A + A^T\Pi - \Pi B R^{-1} B^T\Pi + Q \]
and (for $s \in C_{\rho/2}$; no given initial condition) the ODE
\[ \rho s_t = \frac{ds_t}{dt} + \big(A^T - \Pi B R^{-1} B^T\big) s_t + \Pi G\bar X_t - Q(\Gamma\bar X_t + \eta). \]
Best response (BR): $\hat u_t^i = -R^{-1} B^T(\Pi X_t^i + s_t)$.
Step 2. Closed loop for $\mathcal P_\infty$:
\[ dX_t^i = (A - B R^{-1} B^T\Pi) X_t^i\,dt - B R^{-1} B^T s_t\,dt + G\bar X_t\,dt + D\,dW_t^i \]
\[ \Longrightarrow\quad \frac{d\bar X_t}{dt} = (A - B R^{-1} B^T\Pi)\bar X_t + G\bar X_t - B R^{-1} B^T s_t, \]
which is obtained by averaging the $X_t^i$, with $\bar X_0$ determined.

28 Summarizing, the MFG equation system is
\[ \frac{d\bar X_t}{dt} = (A - B R^{-1} B^T\Pi + G)\bar X_t - B R^{-1} B^T s_t, \]
\[ \rho s_t = \frac{ds_t}{dt} + \big(A^T - \Pi B R^{-1} B^T\big) s_t + \Pi G\bar X_t - Q(\Gamma\bar X_t + \eta), \]
where $\bar X_0$ is given. Look for $s \in C_{\rho/2}[0,\infty)$. Analysis and results: this is carried out in the form of a list of Q&As.
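The following sketch (not from the slides) solves this coupled system by Picard iteration in the scalar case; the infinite horizon is truncated to $[0,T]$ with $s(T)=0$, which is an assumption made purely for illustration, and all parameter values are placeholders.

```python
import numpy as np

# Sketch (scalar, truncated horizon): Picard iteration for the MFG equation system
#   dXbar/dt = (A - B R^{-1} B^T Pi + G) Xbar - B R^{-1} B^T s,      Xbar(0) given,
#   rho s    = ds/dt + (A^T - Pi B R^{-1} B^T) s + Pi G Xbar - Q (Gamma Xbar + eta),
# with the horizon truncated to [0, T] and s(T) = 0 (illustrative assumption).
A, B, G, Q, R, rho = -0.5, 1.0, 0.2, 1.0, 1.0, 0.1
Gamma, eta, Xbar0 = 0.8, 0.5, 1.0
# Scalar ARE: rho*Pi = 2*A*Pi - (B^2/R)*Pi^2 + Q; take the nonnegative root.
a2, a1, a0 = -B**2 / R, 2 * A - rho, Q
Pi = (-a1 - np.sqrt(a1**2 - 4 * a2 * a0)) / (2 * a2)

T, M = 20.0, 2000
dt = T / M
K = B / R * B * Pi                       # B R^{-1} B^T Pi
Xbar = np.full(M + 1, Xbar0)             # initial guess for the mean field path
for it in range(50):
    # Backward Euler sweep for s with s(T) = 0:
    s = np.zeros(M + 1)
    for k in range(M - 1, -1, -1):
        ds = rho * s[k + 1] - (A - K) * s[k + 1] - Pi * G * Xbar[k + 1] + Q * (Gamma * Xbar[k + 1] + eta)
        s[k] = s[k + 1] - dt * ds
    # Forward Euler sweep for Xbar:
    Xnew = np.empty(M + 1)
    Xnew[0] = Xbar0
    for k in range(M):
        Xnew[k + 1] = Xnew[k] + dt * ((A - K + G) * Xnew[k] - B / R * B * s[k])
    if np.max(np.abs(Xnew - Xbar)) < 1e-8:
        Xbar = Xnew
        break
    Xbar = Xnew
print(it, Xbar[-1], s[0])
```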

29 Question 1: Does there exist a unique solution $(\bar X, s)$? Note: $s$ only has a growth condition. Answer: under some conditions, there exists a unique solution. Method: use a fixed point theorem (write $\bar X$ as the fixed point of a linear operator), or a subspace decomposition of the ODE system.

30 Question 2: How is the solution used in the $N$-player model? Answer: it provides decentralized strategies. The set of decentralized strategies, denoted $\hat u$, for the $N$ players:
\[ \hat u_t^i = -R^{-1} B^T(\Pi X_t^i + s_t), \quad 1 \le i \le N. \]
Key feature: player $i$ does not need the state or sample path information of other players.

31 Question 3: What is the performance of the set of decentralized strategies? To analyze performance, we need to introduce strategy spaces. $\mathcal U_i^{\mathrm{centr}}$ consists of strategies of the form $u^i(t, x_1,\dots,x_N)$ with $|u^i(t, x_1,\dots,x_N)| \le C(1 + |x|)$ and $u^i$ Lipschitz in $x = (x_1,\dots,x_N)$; this ensures well defined SDEs. Such a $u^i$ is a centralized strategy, here depending on the states of all players, and is a Markov feedback strategy. Although decentralized strategies are desired, the consideration of $\mathcal U_i^{\mathrm{centr}}$ is useful for characterizing performance.

32 To answer Q3 on performance, the main assumptions are: i) $X_0^i$, $i \ge 1$, are independent and also independent of the BMs $W^i$, $i \ge 1$; ii) $\sup_i E|X_0^i|^2 \le C$; iii) the average of the initial means $x_0^{(N)} \to \bar X_0$ as $N \to \infty$.

33 Question 3 (restated): What is the performance of the set of decentralized strategies? Theorem: if the MFG equation system has a unique solution, the set of decentralized strategies is an $\epsilon$-Nash equilibrium, i.e., for each $i$,
\[ \inf_{u^i \in\, \mathcal U_i^{\mathrm{centr}}} J_i(u^i, \hat u^{-i}) \ge J_i(\hat u^i, \hat u^{-i}) - \epsilon, \]
where $\epsilon = O(|x_0^{(N)} - \bar X_0| + 1/\sqrt N)$. Interpretation: using full information, a single deviating player gains little benefit for itself.

34 Question 4: How much efficiency loss occurs in the MFG? In order to answer this question, we need to compare with a social optimization problem. This is doable; we postpone it until introducing the social optimization problem of minimizing $J_{\mathrm{soc}}^{(N)} = \sum_i J_i$.

35 Question 5: Heterogeneous agents? Answer: the MFG equation system will be modified. Assume the parameter sequence $\{\theta_i, i \ge 1\}$ has a limiting empirical distribution. The underlying intuition is to average non-uniform particles.

36 Nonlinear case. Interacting particle modeling for MFG: $N$ agents (players), $1 \le i \le N$. Dynamics and costs:
\[ dx_i = \frac{1}{N}\sum_{j=1}^N f(x_i, u_i, x_j)\,dt + \frac{1}{N}\sum_{j=1}^N \sigma(x_i, u_i, x_j)\,dw_i, \]
\[ J_i(u_i, u_{-i}) = E\int_0^T \frac{1}{N}\sum_{j=1}^N L(x_i, u_i, x_j)\,dt, \quad 1 \le i \le N, \]
each $x_j \in \mathbb R^n$. Assume i.i.d. initial states. ($u_{-i}$: controls of all players other than player $i$.) As in the LQ case, a key step is to solve a special single agent optimal control problem. How to construct such a problem?

37 Recall
\[ dx_i = \frac{1}{N}\sum_{j=1}^N f(x_i, u_i, x_j)\,dt + \frac{1}{N}\sum_{j=1}^N \sigma(x_i, u_i, x_j)\,dw_i, \]
\[ J_i(u_i, u_{-i}) = E\int_0^T \frac{1}{N}\sum_{j=1}^N L(x_i, u_i, x_j)\,dt, \quad 1 \le i \le N, \]
each $x_j \in \mathbb R^n$. Write $\delta_x^{(N)} = \frac{1}{N}\sum_{j=1}^N \delta_{x_j}$, $\delta$: Dirac measure. For $\varphi = f, \sigma, L$, denote
\[ \frac{1}{N}\sum_{j=1}^N \varphi(x_i, u_i, x_j) = \int_{\mathbb R^n} \varphi(x_i, u_i, y)\,\delta_x^{(N)}(dy) =: \varphi[x_i, u_i, \delta_x^{(N)}]. \]
Idea: approximate $\delta_x^{(N)}$ by a deterministic measure $\mu_t$.
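A tiny sketch of the bracket notation $\varphi[x_i, u_i, \delta_x^{(N)}]$: integrating a coupling function against the empirical measure is just an average over the sampled states (the coupling function below is an arbitrary illustration, not taken from the slides).

```python
import numpy as np

# Sketch: phi[x_i, u_i, delta_x^(N)] = (1/N) sum_j phi(x_i, u_i, x_j), i.e. the
# integral of phi against the empirical measure. The coupling function f below
# is an arbitrary placeholder.
def f(x, u, y):
    return -x + u + np.tanh(y - x)    # placeholder pairwise interaction

rng = np.random.default_rng(3)
x = rng.normal(size=1000)             # states of N = 1000 agents
xi, ui = 0.5, 0.1
coupled_drift = np.mean(f(xi, ui, x)) # f[x_i, u_i, delta_x^(N)]
print(coupled_drift)
```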

38 For simplicity, consider constant $\sigma$ and scalar state. The optimal control problem with dynamics and cost:
\[ dx_i = f[x_i, u_i, \mu_t]\,dt + \sigma\,dw_i, \qquad J_i(u_i) = E\int_0^T L[x_i, u_i, \mu_t]\,dt, \quad x_i \in \mathbb R, \]
where $\mu_t$ is called the mean field. Again, following the method in the LQ case: assume $\{\mu_t, 0 \le t \le T\}$ were known and solve for the optimal control $\hat u_i$. Consistency condition: the closed-loop state process under $\hat u_i$ has its marginal law equal to $\mu_t$. The analysis can be done for finitely many classes/types of agents.

39 HJB and McKV/FPK equations for the nonlinear diffusion model. HJB equation (scalar state for simplicity):
\[ -\frac{\partial V}{\partial t} = \inf_{u \in U}\Big\{ f[x, u, \mu_t]\,\frac{\partial V}{\partial x} + L[x, u, \mu_t] \Big\} + \frac{\sigma^2}{2}\frac{\partial^2 V}{\partial x^2}, \qquad V(T,x) = 0, \quad (t,x) \in [0,T)\times\mathbb R. \]
Best response: $u_t = \varphi(t, x\,|\,\mu)$, $(t,x) \in [0,T]\times\mathbb R$. Closed-loop McKean-Vlasov equation (which can be written as a Fokker-Planck equation):
\[ dx_t = f[x_t, \varphi(t, x_t\,|\,\mu), \mu_t]\,dt + \sigma\,dw_t, \quad 0 \le t \le T. \]
Find a solution $(x_t, \mu_t)$ in the McKean-Vlasov sense, i.e., $\mathrm{Law}(x_t) = \mu_t$. Refs: Huang, Caines and Malhamé 06; Lasry and Lions (06, 07); Cardaliaguet 12.
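To illustrate the consistency condition $\mathrm{Law}(x_t) = \mu_t$, here is a minimal particle sketch in which $\mu_t$ is replaced by the empirical law of $M$ simulated particles; both the coupling and the feedback below are placeholders, not derived from an actual HJB solve.

```python
import numpy as np

# Sketch: particle approximation of the closed-loop McKean-Vlasov dynamics
#   dx_t = f[x_t, phi(t, x_t | mu), mu_t] dt + sigma dW_t,   Law(x_t) = mu_t,
# with mu_t replaced by the empirical law of M particles. The coupling f and the
# feedback phi are placeholders for illustration only.
rng = np.random.default_rng(4)
M, T, dt, sigma = 5000, 1.0, 0.01, 0.3

def phi(t, x):
    return -x                              # placeholder feedback (not the HJB minimizer)

def drift(x, u, mean_x):
    return u + 0.5 * (mean_x - x)          # placeholder mean-field coupling via E[x_t]

x = rng.normal(size=M)
for k in range(int(T / dt)):
    t = k * dt
    m = x.mean()                           # statistic of the empirical law mu_t^M
    x = x + drift(x, phi(t, x), m) * dt + sigma * rng.normal(scale=np.sqrt(dt), size=M)

print(x.mean(), x.std())                   # moments of the approximate mu_T
```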

40 Lasry and Lions (2006, 2007) considered dynamics and costs
\[ dx_t^i = u_i\,dt + \sigma\,dw_t^i, \quad x_i \in \mathbb R^d,\ \sigma > 0, \]
\[ J_i(u_1,\dots,u_N) = \liminf_{T\to\infty}\frac{1}{T}\, E\int_0^T \big(L(x_i, u_i) + V[x_i, \mu_t^{N,-i}]\big)\,dt, \]
where $\mu_t^{N,-i}$ is the empirical distribution of the states of the other agents. The second argument of $V$ means it is a functional of the distribution for given $x_i$. Assume periodicity of the costs in the state; denote $Q = [0,1]^d$. Solve the $N$-player game, and show the solution converges to a limiting HJB-FPK equation system
\[ -\frac{\sigma^2}{2}\Delta\nu(x) + H(x, \nabla\nu) + \lambda = V[x, m], \]
\[ -\frac{\sigma^2}{2}\Delta m - \mathrm{div}\Big(\frac{\partial H}{\partial p}(x, \nabla\nu)\, m\Big) = 0, \]
where $\int_Q \nu(x)\,dx = 0$, $\int_Q m(x)\,dx = 1$, $m \ge 0$. $H$: Hamiltonian; $\lambda$: long run average cost; $\nu(x)$: differential cost.

41 Finite horizon can also be considered for this model. Dynamics of agent $i$: $dx_t^i = u_t^i\,dt + \sqrt2\,dw_t^i$, $x_t^i \in \mathbb R^d$; cost functional:
\[ J_i^N(u^i, u^{-i}) = E\int_0^T \big[|u_t^i|^2 + F(x_t^i, \mu_t^{N,-i})\big]\,dt + E\,G(x_T^i, \mu_T^{N,-i}), \]
where $\mu_t^{N,-i}$ is the empirical distribution of the states of all other agents. See Lasry and Lions 07, also the notes of Cardaliaguet 12.

42 The MFG equations:
\[ -\partial_t V - \Delta V + \tfrac12|DV|^2 = F(x, m), \quad (x,t) \in \mathbb R^d\times(0,T), \]
\[ \partial_t m - \Delta m - \mathrm{div}(m\,DV) = 0, \quad m(0) = m_0, \]
\[ V(x,T) = G(x, m(T)), \quad x \in \mathbb R^d, \]
where $V(t,x)$ and $m(t,x)$ are the value function and the density of the state distribution, respectively. See an $\epsilon$-Nash equilibrium result in Cardaliaguet 12; this relates the MFG solution to finite population games.

43 [The fundamental diagram of MFG theory from slide 18 is recalled: Problem $\mathcal P_0$ ($N$-player Nash game) and, via route 2, the limiting optimal control problem with a fixed mean field $\mu$, solved by the MFG equation system (HJB PDE; FPK PDE or McKean-Vlasov SDE).] For nonlinear diffusion models, the diagram reduces to the structure on the next page.

44 The basic framework of MFGs: mathematical aspects.
$\mathcal P_0$: game with $N$ players. Example: $dx_i = f(x_i, u_i, \delta_x^{(N)})\,dt + \sigma(\cdot)\,dw_i$, $J_i(u_i, u_{-i}) = E\int_0^T l(x_i, u_i, \delta_x^{(N)})\,dt$, where $\delta_x^{(N)}$ is the empirical distribution of $(x_j)_{j=1}^N$. Solution: $N$ HJB equations coupled via the densities $p_{i,t}^N$, $1 \le i \le N$, plus $N$ Fokker-Planck-Kolmogorov equations; restrict $u_i$ to be adapted to $\sigma(w_i(s), s \le t)$ (i.e., decentralized information for the $N$ players), giving $u_i^N(t, x_i)$. Question: how to construct strategies and assess performance? (subsequence convergence as $N\to\infty$).
$\mathcal P_\infty$: limiting problem, 1 player. $dx_i = f(x_i, u_i, \mu_t)\,dt + \sigma(\cdot)\,dw_i$, $J_i(u_i) = E\int_0^T l(x_i, u_i, \mu_t)\,dt$; freeze $\mu_t$ as an approximation of $\delta_x^{(N)}$. Solution $\hat u_i(t, x_i)$: optimal response. HJB (with $v(T,\cdot)$ given): $-v_t = \inf_{u_i}\big(f^T v_{x_i} + l\big) + \tfrac12\mathrm{Tr}[\sigma\sigma^T v_{x_i x_i}]$. Fokker-Planck-Kolmogorov: $\partial_t p = -\mathrm{div}(f p) + \sum_{j,k}\partial^2_{x_i^j x_i^k}\big((\tfrac{\sigma\sigma^T}{2})_{jk}\, p\big)$, coupled via $\mu_t$ (with density $p_t$; $p_0$ given).
The consistency based approach (red) is more popular; it is related to ideas in statistical physics (the McKean-Vlasov equation); the FPK equation can be replaced by an MV-SDE.

45 Applications: economic theory; finance; communication networks; social opinion formation; power systems; electric vehicle recharging control; public health (vaccination games); ...

46 Some applications of MFGs to economic growth and finance. Guéant, Lasry and Lions (2011): human capital optimization. Lucas and Moll (2011): knowledge growth and allocation of time. Carmona and Lacker (2013): investment of $n$ brokers. Huang (2013): capital accumulation with congestion effect. Lachapelle et al. (2013): price formation. Espinosa and Touzi (2013): optimal investment with relative performance concern (depending on $\frac{1}{N-1}\sum_{j\ne i} X_j$). Jaimungal and Nourian (2014): optimal execution. Chan and Sircar (2015): Bertrand and Cournot models. More...

47 References. Only some papers are sampled and cited in the lecture; for further literature, see these papers and books, and references therein. Bensoussan, Frehse, and Yam. Springer Brief. Caines. Mean field games, in Encyclopedia of Systems and Control, Springer-Verlag. Caines, Huang, and Malhamé, in the Springer dynamic game theory handbook. Cardaliaguet. Notes on mean field games. Carmona and Delarue. Probabilistic Theory of MFGs, vols. I and II, Cham: Springer. Gomes and Saúde. Mean field games models: a brief survey. Huang, Caines, and Malhamé, CDC, IEEE TAC 07. Huang, Malhamé, and Caines, CIS 06. Lasry and Lions. Mean field games, 2006. Weintraub, Benkard, and Van Roy (2008), Econometrica; for oblivious equilibria.

48 Thank you!
