KINETIC MODELS FOR DIFFERENTIAL GAMES

1 D. Brinkman, C. Ringhofer
math.la.asu.edu/ chris
Work supported by NSF KI-NET.
Partially based on previous work with P. Degond (Imperial College), M. Herty (RWTH Aachen) and J.G. Liu (Duke U.).

2 Classical kinetic models: An ensemble of mechanical particles is driven by a potential energy and undergoes binary interactions between agents or with a background.
Drift: $\frac{d}{dt}\xi_n = -\nabla_x V(\xi_n, \vec{\xi})$, i.e. $\xi_n \to \xi_n - \Delta t\, \nabla_x V(\xi_n, \vec{\xi})$, with $\vec{\xi} = (\xi_1, \dots, \xi_N)$.
Random diffusion: $\xi_n \to q_n(\vec{\xi})$, $dP[q_n(\vec{\xi}) = r] = P_n(r, \vec{\xi})\, dr$.
Assumptions: 1. (IID) Many particles; all particles are the same. 2. Correlations between particles are approximated. 3. The ensemble tries to minimize a common energy functional V cooperatively.

3 Kinetic density: Let $f(x,t)\, dx = dP[\xi_n(t) = x]$. Then
$\partial_t f(x,t) - \nabla_x \cdot (f\, \nabla_x V_f) = Q[f]$,
where x is the state of the agent, $V_f$ is the mean field potential for a multi-agent system and $Q[f]$ is the interaction.
Boltzmann: Q is an integral operator, $Q[f](x) = \int P_f(x, x')\, \omega_f(x')\, f(x')\, dx' - \omega_f(x)\, f(x)$.
Einstein (Brownian motion): $Q[f] = \nabla_x \cdot (D_f\, \nabla_x f)$.
The system is driven to a global minimum.
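The following is a minimal numerical sketch of the Boltzmann gain-loss structure. It rests on several assumptions:
- the states x_i lie on a finite grid;
- the jump kernel P and the rates omega are fixed (in the slide both may depend on f);
- the drift term is omitted.

Every column of P is a probability distribution, so the gain term redistributes exactly the mass that the loss term removes. The total mass is therefore conserved.

```python
import numpy as np

def boltzmann_Q(f, P, omega):
    """Discrete linear gain-loss operator.

    Q[f]_i = sum_j P[i, j] * omega[j] * f[j] - omega[i] * f[i]
    P[i, j]: probability that a jump out of state x_j lands in x_i (columns sum to 1).
    omega[j]: jump rate out of state x_j.
    """
    gain = P @ (omega * f)
    loss = omega * f
    return gain - loss

def evolve(f, P, omega, dt, steps):
    """Explicit Euler for df/dt = Q[f] (drift term omitted)."""
    for _ in range(steps):
        f = f + dt * boltzmann_Q(f, P, omega)
    return f

rng = np.random.default_rng(0)
n = 50
P = rng.random((n, n))
P /= P.sum(axis=0, keepdims=True)    # each column becomes a probability distribution
omega = rng.uniform(0.5, 2.0, size=n)
f0 = rng.random(n)
f0 /= f0.sum()
f1 = evolve(f0, P, omega, dt=0.01, steps=500)
print(f0.sum(), f1.sum())            # total mass is conserved up to round-off
```

The step size satisfies dt * omega < 1, so each Euler step is a convex combination of nonnegative terms and f stays nonnegative.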

4 SOCIAL, ECONOMIC OR BIOLOGICAL APPLICATIONS: Non-cooperative agents. Each agent has an individual cost functional (energy), which it tries to optimize, given the actions of the other agents. This leads to a Pareto optimum and a Nash equilibrium instead of a global optimum.

5 THE NASH EQUILIBRIUM
1. NE for pure strategies: Take two players A and B with strategies (states) x and y and costs $C_A$ and $C_B$. Optimal response: $x_{opt} = \varphi_A(y) = \arg\min_z C_A(z, y)$, $y_{opt} = \varphi_B(x) = \arg\min_z C_B(x, z)$.
Nash equilibrium: neither player can do better, given the other's state. This is a fixed point problem: $\varphi_A(\varphi_B(x)) = x$, $\varphi_B(\varphi_A(y)) = y$.
2. NE for mixed strategies: Two ensembles of players (or two players picking strategies from distributions $f_A$ and $f_B$). $f_A^{opt} = \varphi_A[f_B] = \arg\inf_g E_{g, f_B}[C_A]$, $f_B^{opt} = \varphi_B[f_A] = \arg\inf_g E_{f_A, g}[C_B]$.
Again a fixed point problem (for functionals of distributions): $\varphi_A[\varphi_B[f_A]] = f_A$, $\varphi_B[\varphi_A[f_B]] = f_B$.
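The pure-strategy fixed point problem can be illustrated with a small sketch. The quadratic costs C_A and C_B below are hypothetical, chosen so that the exact equilibrium is (x, y) = (2/3, 4/3). The optimal responses are computed by a brute-force argmin over a grid of z values. x is then iterated through φ_A(φ_B(x)) until it stops moving.

```python
import numpy as np

Z = np.linspace(-5.0, 5.0, 100_001)   # candidate strategies z for the argmin (step 1e-4)

def C_A(x, y):
    """Hypothetical cost of A: prefers to sit at half of B's state."""
    return (x - 0.5 * y) ** 2

def C_B(x, y):
    """Hypothetical cost of B: prefers to sit one unit above half of A's state."""
    return (y - 0.5 * x - 1.0) ** 2

def phi_A(y):
    return Z[np.argmin(C_A(Z, y))]    # argmin_z C_A(z, y) on the grid

def phi_B(x):
    return Z[np.argmin(C_B(x, Z))]    # argmin_z C_B(x, z) on the grid

def pure_nash(x=0.0, max_iter=100, tol=1e-3):
    """Iterate x <- phi_A(phi_B(x)) until it stops moving, then set y = phi_B(x)."""
    for _ in range(max_iter):
        x_new = phi_A(phi_B(x))
        converged = abs(x_new - x) < tol
        x = x_new
        if converged:
            break
    return x, phi_B(x)

x_star, y_star = pure_nash()
print(x_star, y_star)   # close to the exact fixed point (2/3, 4/3)
```

Here the composed map contracts with factor 1/4, so the iteration converges quickly. For general costs a best-response iteration need not converge at all.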

6 With $x_{opt} = \varphi_A(y) = \arg\min_z C_A(z, y)$, $y_{opt} = \varphi_B(x) = \arg\min_z C_B(x, z)$ and $\varphi_A(\varphi_B(x)) = x$, $\varphi_B(\varphi_A(y)) = y$: this equilibrium is in general not as good as minimizing $C_A + C_B$ in cooperation (Prisoner's dilemma).
General framework of Non-Cooperative, Non-Atomic, Anonymous games with a Continuum of Players (NCNAACP). References: Aumann, Mas-Colell, Schmeidler, Shapiro, Shapley.
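As a sketch of the Prisoner's dilemma remark, the following enumerates the four pure strategy profiles of a two-player game. The costs are hypothetical: years in prison, with 0 = cooperate and 1 = defect. The only Nash equilibrium is mutual defection, with total cost 4. Minimizing C_A + C_B in cooperation instead gives mutual cooperation, with total cost 2.

```python
import itertools

# Hypothetical prisoner's dilemma costs (years in prison); 0 = cooperate, 1 = defect.
# C_A[x][y] is A's cost when A plays x and B plays y; the game is symmetric.
C_A = [[1, 3],
       [0, 2]]

def C_B(x, y):
    return C_A[y][x]

def is_nash(x, y):
    """Neither player can lower its own cost by changing only its own strategy."""
    return (C_A[x][y] <= min(C_A[z][y] for z in (0, 1))
            and C_B(x, y) <= min(C_B(x, z) for z in (0, 1)))

profiles = list(itertools.product((0, 1), repeat=2))
nash = [s for s in profiles if is_nash(*s)]
cooperative = min(profiles, key=lambda s: C_A[s[0]][s[1]] + C_B(*s))
print(nash, [C_A[x][y] + C_B(x, y) for x, y in nash])  # [(1, 1)] [4]
print(cooperative, C_A[0][0] + C_B(0, 0))               # (0, 0) 2
```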

7 ISSUES: The Nash equilibrium is an equilibrium concept and gives no indication of the dynamics of the system. It assumes that each player has global knowledge. Goal: Derive kinetic theories from agent (particle) based models for the dynamics of an actual game, based on cost functionals and optimal responses.

8 OUTLINE
1. Dynamics. General framework for mixed strategies and ensembles of players.
   Predict the future (optimal control; mean field games).
   Learn from the past (decisions based on past experience; behavioral game theory).
   Survival of the best strategies (Darwinism; evolutionary game theory).
2. A simple example (dynamics of mixed strategies for an RPS game).
3. Application: designing insurance policies. Game between an insurer and an ensemble of clients. Numerics. Kinetic theory.
4. Conclusions and outlook.

9 MODELING DYNAMICS I (Estimating the future)
Optimal control approach: swarms of intelligent individuals. Each agent observes the current state of the others and optimizes its cost based on this observation.
(1a.) Estimate over a finite time horizon (mean field games). Given a strategy trajectory y(t) of B, A optimizes the cost of the trajectory, $\int_t^{t+\tau} C_A(x(s), y(s))\, ds$, over a finite horizon τ. Job: find $x(t) = x_{opt}(t)$ such that $\int_t^{t+\tau} C_A(x(s), y(s))\, ds \to \min$ under the constraint $\frac{d}{dt} y = V(x, y)$. Using adjoint calculus, this yields a two-point boundary value problem in time. It consists of a forward problem for y(t) (to satisfy the constraint), coupled to a backward-in-time problem for a Lagrange multiplier λ(t). Hamilton-Jacobi-Bellman equations. Mean-field games (Lasry, Lions).
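The following is a minimal sketch of the adjoint calculus for a finite horizon problem. It uses hypothetical choices throughout:
- the constraint is dy/dt = V(x, y) = x - y with y(0) = 0;
- the running cost is (y - 1)^2 + x^2;
- time is discretized with explicit Euler.

One forward sweep computes y. One backward sweep computes the discrete Lagrange multiplier lam, with lam = 0 at the final time. Together they give the gradient of the cost with respect to the control x, which then drives plain gradient descent.

```python
import numpy as np

# Hypothetical problem: minimize J(x) = sum_k h * [(y_k - 1)^2 + x_k^2]
# subject to y_{k+1} = y_k + h * V(x_k, y_k), with V(x, y) = x - y and y_0 = 0.

def forward(x, h, y0=0.0):
    """Forward-in-time sweep: explicit Euler for the constraint dy/dt = V(x, y)."""
    y = np.empty(len(x) + 1)
    y[0] = y0
    for k in range(len(x)):
        y[k + 1] = y[k] + h * (x[k] - y[k])
    return y

def cost(x, h):
    y = forward(x, h)
    return h * np.sum((y[:-1] - 1.0) ** 2 + x ** 2)

def gradient(x, h):
    """Adjoint method: one forward sweep for y, one backward sweep for lam."""
    y = forward(x, h)
    lam = np.zeros(len(x) + 1)             # terminal condition lam_N = 0
    for k in range(len(x) - 1, -1, -1):    # backward in time
        lam[k] = 2.0 * h * (y[k] - 1.0) + (1.0 - h) * lam[k + 1]
    return 2.0 * h * x + h * lam[1:]       # dJ/dx_k

T, n = 2.0, 200
h = T / n
x = np.zeros(n)
for _ in range(300):                       # plain gradient descent on the control
    x -= (0.25 / h) * gradient(x, h)
print(cost(np.zeros(n), h), cost(x, h))    # the optimized control has lower cost
```

The backward loop is the discrete counterpart of the backward-in-time problem for λ(t). A finite-difference comparison of `cost` against `gradient` is a cheap way to check such adjoint code.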

10 (1b.) Intelligent swarms. A optimizes the current cost $C_A(x(t), y(t))$ and moves incrementally towards the Pareto optimum. This yields a local-in-time control and an equation of the type $\partial_t f_A(x,t) = \omega(\varphi_A[f_B] - f_A)$.
(1c.) Instantaneous control. A proceeds as in (1a) and (1b), but uses only a Taylor series expansion of $\int_t^{t+\tau} C_A(x(s), y(s))\, ds$ for $\tau \to 0$ and a Taylor series solution of $\frac{d}{dt} y = V(x, y)$. This gives a local-in-time control (Degond, Herty, Liu, CR).
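The relaxation equation of type (1b) can be sketched for the RPS setting under three assumptions:
- the cost of A follows the usual zero-sum convention: +1 for a loss, -1 for a win, 0 for a tie;
- the mixed strategy f_B is frozen;
- the best response φ_A[f_B] puts all mass on the strategy with the lowest expected cost.

Explicit Euler is compared with the exact solution f_A(t) = φ_A[f_B] + (f_A(0) - φ_A[f_B]) e^{-ωt}.

```python
import numpy as np

# Assumed RPS cost convention for A (rows: A plays R, P, S; columns: B plays R, P, S):
# +1 for a loss, -1 for a win, 0 for a tie.
C_A = np.array([[0, 1, -1],
                [-1, 0, 1],
                [1, -1, 0]], dtype=float)

def phi_A(f_B):
    """Best response to the mixed strategy f_B: all mass on the cheapest strategy."""
    phi = np.zeros(3)
    phi[np.argmin(C_A @ f_B)] = 1.0
    return phi

def relax(f_A, f_B, omega, dt, steps):
    """Explicit Euler for d f_A/dt = omega * (phi_A[f_B] - f_A) with f_B frozen."""
    target = phi_A(f_B)
    for _ in range(steps):
        f_A = f_A + dt * omega * (target - f_A)
    return f_A

f_B = np.array([0.5, 0.25, 0.25])
f0 = np.full(3, 1.0 / 3.0)
omega, dt, steps = 2.0, 1e-3, 1000
f_num = relax(f0, f_B, omega, dt, steps)
target = phi_A(f_B)                        # all mass on P
f_exact = target + (f0 - target) * np.exp(-omega * dt * steps)
print(target, f_num, f_exact)
```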

11 MODELING DYNAMICS II (Learning): Behavioral game theory; machine learning. General setup: Two ensembles A and B of distributed players. A uses a (distributed) strategy x for each game; B uses a strategy y. A records outcomes via an observable a, trying to learn the behavior of B. In the same way, B uses an observable b, trying to learn about A.

12 A MULTI-AGENT MODEL
2 ensembles A and B with N and M players, each with a strategy x (for A) and y (for B), and an observable a (for A) and b (for B).
Step 1: A picks the optimal strategy based on the experience a: $x(t + \Delta t) = \xi(a(t))$, with $C_A(\xi(a), a) = \min_x C_A(x, a)$.
Step 2: They actually play. A picks a random member of B, giving costs $C_A(x, y)$ and $C_B(y, x)$ for A and B.
Step 3: A and B update their observables a and b, based on the outcome.
Symmetric picture for B, y, b.
$a(t + \Delta t) = a(t) + ???$
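The three steps can be sketched as an agent-based simulation for the RPS game. Several choices here are assumptions, not taken from the slides:
- the observable a of each agent of A is a vector of estimated costs, one per strategy;
- only the estimate of the strategy actually played is updated, with an exponentially discounted average of rate gamma;
- a fraction explore of the agents plays a random strategy in each game;
- B plays the fixed mixed strategy f(R) = 0.5, f(P) = 0.25, f(S) = 0.25.

Against this B, P has the lowest expected cost: -0.25, versus 0 for R and 0.25 for S. Most agents of A should therefore end up playing P.

```python
import numpy as np

# Assumed RPS cost convention for A: rows = A's strategy, columns = B's strategy (R, P, S).
C_A = np.array([[0, 1, -1],
                [-1, 0, 1],
                [1, -1, 0]], dtype=float)

def play(n_games, N=500, f_B=(0.5, 0.25, 0.25), gamma=0.05, explore=0.1, seed=0):
    """Agents of A learn against an ensemble B that plays the fixed mixed strategy f_B."""
    rng = np.random.default_rng(seed)
    a = np.zeros((N, 3))       # observable: each agent's estimated cost of R, P, S
    idx = np.arange(N)
    for _ in range(n_games):
        # Step 1: pick the strategy with the lowest estimated cost, x = xi(a),
        # except that a fraction `explore` of the agents tries a random strategy.
        x = np.argmin(a, axis=1)
        trying = rng.random(N) < explore
        x[trying] = rng.integers(0, 3, size=trying.sum())
        # Step 2: each agent of A plays against a random member of B.
        y = rng.choice(3, size=N, p=f_B)
        cost = C_A[x, y]
        # Step 3: update the observable (exponentially discounted average
        # of the costs observed for the strategy that was played).
        a[idx, x] += gamma * (cost - a[idx, x])
    return a, x

a, x = play(2000)
print(np.bincount(x, minlength=3) / len(x))  # most agents end up on P
print(a.mean(axis=0))                        # estimated costs of R, P, S
```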

13 OBSERVABLE UPDATES (a and b): Discount past experiences over time (behavioral game theory: learning and forgetting; Camerer et al.).
1) Rolling averages: finite memory.
2) Time-weighted averages: discount past experience with a time-dependent weight,
$a(t) = \int_0^t w(t, \tau)\, c_A(x(\tau), y(\tau))\, d\tau$, with $\int_0^t w(t, \tau)\, d\tau = 1$ for all t.
Three types of weights usable in a kinetic approach: Exponential: $w(t, \tau) \propto e^{\gamma(\tau - t)}$; Geometric: $w(t, \tau) \propto (\tau / t)^\gamma$; Constant: $w(t, \tau) \propto 1$.
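The following sketches the three time-weighted averages. Each weight is normalized so that its integral over [0, t] is 1:
- exponential: γe^{γ(τ-t)}/(1 - e^{-γt});
- geometric: (γ+1)τ^γ/t^{γ+1}, reading the weight as (τ/t)^γ;
- constant: 1/t.

The integral is evaluated with the trapezoidal rule. The observed cost c(τ) = τ and the shared parameter γ are hypothetical.

```python
import math
import numpy as np

def weight(kind, t, tau, gamma=1.0):
    """Normalized weight w(t, tau) on [0, t], so that its integral over tau equals 1."""
    if kind == "exponential":   # w proportional to exp(gamma * (tau - t))
        return gamma * np.exp(gamma * (tau - t)) / (1.0 - math.exp(-gamma * t))
    if kind == "geometric":     # w proportional to (tau / t) ** gamma
        return (gamma + 1.0) * tau**gamma / t ** (gamma + 1.0)
    if kind == "constant":      # w proportional to 1: the running mean
        return np.full_like(tau, 1.0 / t)
    raise ValueError(f"unknown weight: {kind}")

def trapezoid(values, grid):
    """Composite trapezoidal rule."""
    return float(np.sum((values[1:] + values[:-1]) * np.diff(grid)) / 2.0)

def discounted_average(cost, t, kind, gamma=1.0, n=20001):
    """a(t) = integral over [0, t] of w(t, tau) * cost(tau) d tau."""
    tau = np.linspace(0.0, t, n)
    return trapezoid(weight(kind, t, tau, gamma) * cost(tau), tau)

# Observed cost growing linearly in time, c(tau) = tau.
for kind in ("exponential", "geometric", "constant"):
    print(kind, discounted_average(lambda s: s, 1.0, kind, gamma=2.0))
```

With t = 1 and γ = 2, the three averages are (1 + e^{-2})/(2(1 - e^{-2})) ≈ 0.657, 3/4 and 1/2. For this increasing cost, the weights that emphasize recent games give the larger averages.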

14 THE IID MODEL: Independent, identically distributed agents in ensembles A and B with N ≫ 1 and M ≫ 1 members:
$F(x_1, \dots, x_N, a_1, \dots, a_N, y_1, \dots, y_M, b_1, \dots, b_M, t) = \prod_{n=1}^{N} f_A(x_n, a_n) \prod_{m=1}^{M} f_B(y_m, b_m)$.
This yields a system of two coupled kinetic equations for $f_A(x, a, t)$ and $f_B(y, b, t)$:
$\partial_t f_A(x, a) = Q^A_{strat}[f_A] + Q^A_{obs}[f_A, f_B]$,
$Q^A_{strat}[f_A] = \omega\, [\delta(x - \varphi_A(a)) \int f_A(x', a)\, dx' - f_A]$,
$Q^A_{obs}[f_A, f_B] = \partial_a(w(t,t)\, a\, f_A) + \lambda \int \delta(a - C_A(x, y'))\, f_A(x, a')\, f_B(y', b')\, dy'\, da'\, db' - \lambda f_A(x, a)$.
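A discrete sketch of the relaxation operator Q^A_strat follows. It assumes finitely many strategies x_i and observable values a_j, with the integral over x' replaced by a sum. The operator moves all agents with observable a_j toward the optimal strategy φ_A(a_j) without changing how many agents have that observable, so each column of Q sums to zero.

```python
import numpy as np

def Q_strat(f, phi, omega):
    """Discrete Q_strat[f](x_i, a_j) = omega * (delta_{i, phi[j]} * rho_j - f[i, j]).

    f[i, j]: density of agents with strategy x_i and observable a_j.
    phi[j]:  index of the optimal strategy phi_A(a_j).
    rho_j = sum_i f[i, j] is the discrete version of the integral of f over x.
    """
    rho = f.sum(axis=0)
    target = np.zeros_like(f)
    target[phi, np.arange(f.shape[1])] = rho   # all mass at x = phi(a)
    return omega * (target - f)

rng = np.random.default_rng(0)
n_x, n_a = 3, 40
f = rng.random((n_x, n_a))
phi = rng.integers(0, n_x, size=n_a)           # hypothetical optimal strategy per a_j
Q = Q_strat(f, phi, omega=1.5)
print(np.abs(Q.sum(axis=0)).max())             # ~0: the a-marginal is conserved
```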

15 GENERAL KINETIC MODEL FOR TWO PLAYER ENSEMBLES
$\partial_t f_A(x, a) = Q^A_{strat}[f_A] + Q^A_{obs}[f_A, f_B]$, $\partial_t f_B(y, b) = Q^B_{strat}[f_B] + Q^B_{obs}[f_B, f_A]$.
$Q^{A,B}_{strat}$ are simple linear relaxation operators, giving the strategies based on the observables. $Q^{A,B}_{obs}$ are nonlinear operators, coupling the observables based on the actual outcome of the game. Three time scales: ω: frequency of the games; λ: frequency of observable updates (learning); w(t,t): discounting of past experiences (forgetting).

16 DYNAMIC CONVERGENCE TOWARDS A NASH EQUILIBRIUM
Theorem. Assumption: learning is successful, i.e. the estimated cost converges to the real cost:
$\lim_{t \to \infty} \int f_A(x, a, t)\, dx = \int C_A(x, a)\, dx$, $\lim_{t \to \infty} \int f_B(y, b, t)\, dy = \int C_B(y, b)\, dy$.
Then the system converges to the Nash equilibrium for mixed strategies, when averaged over time:
$\lim_{t \to \infty} \frac{1}{t} \int_0^t d\tau \int da\, f_A(x, a, \tau) = f_A^{Nash}(x)$, $\lim_{t \to \infty} \frac{1}{t} \int_0^t d\tau \int db\, f_B(y, b, \tau) = f_B^{Nash}(y)$.

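17 A SIMPLE EXAMPLE (Rock-Paper-Scissors)
Cost to A: a 3×3 matrix $C_A(x, y)$ with rows x ∈ {R, P, S} (strategy of A) and columns y ∈ {R, P, S} (strategy of B).
Remark 1: x and y are discrete, {R = 1, P = 2, S = 3}.
$\partial_t f_A(x, a) = Q^A_{strat}[f_A] + Q^A_{obs}[f_A, f_B]$,
$Q^A_{strat}[f_A] = \omega\, [\delta(x - \varphi_A(a)) \int f_A(x', a)\, dx' - f_A]$,
$Q^A_{obs}[f_A, f_B] = \partial_a(w(t,t)\, a\, f_A) + \lambda \int C_A(x', y')\, f_A(x', a')\, f_B(y', b')\, dx'\, da'\, dy'\, db' - \lambda\, c_A(x, y)\, f_A(x, a)$.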

18 Remark 2: RPS has no Nash equilibrium for pure strategies and a trivial NE for mixed strategies, $P[R, P, S] = (1/3, 1/3, 1/3)$.
Remark 3: For successful learning we have to try new strategies (randomly). Example: B plays R all the time and A tries to learn about B. We need to add $Q^A_{try}$:
$\partial_t f_A(x, a) = Q^A_{strat}[f_A] + Q^A_{try}[f_A] + Q^A_{obs}[f_A, f_B]$, with
Relaxation: $Q^A_{try}[f_A](x, a) = \alpha\, [p(x) \int f_A(x', a)\, dx' - f_A(x, a)]$, or
Brownian motion: $Q^A_{try}[f_A](x, a) = \alpha\, \Delta_x f_A$.
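Remark 2 can be checked with a short sketch. It assumes the zero-sum RPS convention: C_A[x, y] = +1 if A loses, -1 if A wins, 0 for a tie, and C_B = -C_A. No pure profile survives the unilateral deviation test. Against the uniform mixed strategy, every pure strategy has the same expected cost, so the uniform distribution is a best response for both players.

```python
import itertools
import numpy as np

# Assumed zero-sum RPS convention, indexed by (x of A, y of B) in the order R, P, S.
C_A = np.array([[0, 1, -1],
                [-1, 0, 1],
                [1, -1, 0]], dtype=float)
C_B = -C_A

def is_pure_nash(x, y):
    """Neither player can lower its own cost by deviating unilaterally."""
    return C_A[x, y] <= C_A[:, y].min() and C_B[x, y] <= C_B[x, :].min()

pure = [(x, y) for x, y in itertools.product(range(3), repeat=2) if is_pure_nash(x, y)]
uniform = np.full(3, 1.0 / 3.0)
print(pure)             # []: no pure-strategy Nash equilibrium
print(C_A @ uniform)    # A's expected cost of R, P, S against uniform B: all equal
print(uniform @ C_B)    # B's expected cost of R, P, S against uniform A: all equal
```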

19 EXAMPLE RPS 1: B plays RPS with f(R) = 0.5, f(P) = 0.25, f(S) = 0.25. A learns.
[Figure: strategies (left panel) and estimated profits (right panel) for A (R = blue, P = green, S = red), versus number of games.]

20 EXAMPLE RPS 2: A and B both learn, leading to convergence to the Nash equilibrium. Strategies and profits for A.
[Figure: strategies (left panel) and estimated profits (right panel) for A (R = blue, P = green, S = red), versus number of games.]

21 A and B both learn, leading to convergence to the Nash equilibrium. Strategies and profits for B.
[Figure: strategies (left panel) and estimated profits (right panel) for B (R = blue, P = green, S = red), versus number of games.]

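22 Corollary: The steady state is given by a modified NE:
$\lim_{t \to \infty} \frac{1}{t} \int_0^t d\tau \int da\, f_A(x, a, \tau) = f_A^{Nash}(x) + p(x)$, $\lim_{t \to \infty} \frac{1}{t} \int_0^t d\tau \int db\, f_B(y, b, \tau) = f_B^{Nash}(y) + p(y)$.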

23 OPTIMAL DESIGN OF INSURANCE POLICIES Motivation: 1995: a major US university went from a uniform health insurance plan for all employees to a staggered system with 3 plans with different rates and benefits. Outcome: All plans, except for the cheapest, died out very quickly.

24 OPTIMAL DESIGN OF INSURANCE POLICIES
A: ensemble of clients with N ≫ 1 members. B: one insurer (M = 1), offering K insurance plans with annual rates $r_k$ and cutoffs $y_k$, k = 1:K. There is a zero-benefit plan k = 0 with $r_0 = 0$ and $y_0 = \infty$ (no insurance).
Strategies: Client strategy (A, discrete): $x = k \in \{0, \dots, K\}$, the plan it chooses (including k = 0, no insurance). Insurer strategy (B): annual rates $r_1, \dots, r_K$; cutoff values $y_1, \dots, y_K$ for accepting a member of A with risk a into plan k = 1:K.
Observable (a: risk, i.e. average claims): a Poisson process. Modification: claims are independent of x and y. A and B share the same observable a.

25 COSTS AND OPTIMAL RESPONSE
Optimal response for a client in A: the cost to a member of A choosing plan k = 0:K is
$C_A = r(k)$ for k = 1:K and $a < y(k)$ (eligible); $C_A = a$ for k = 0 (no insurance).
If choosing a plan k = 1:K, the optimal choice is the cheapest plan for which A is eligible: $\nu(y, a) = \min\{k = 1:K : a < y_k\}$.
A compares this to the expected out-of-pocket cost a:
$k_{opt}(y, r, a) = \nu(y, a)$ if $r_{\nu(y,a)} < a$ (choosing a plan); $0$ if $a < r_{\nu(y,a)}$ (opting out); $0$ if $a > y_K$ (not eligible),
i.e. $k_{opt}(y, r, a) = H(a - r_{\nu(y,a)})\, H(y_K - a)\, \nu(y, a)$, with H the Heaviside function.
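The client's optimal response can be transcribed directly into a small function. It assumes the plans are indexed from cheapest (k = 1) to most expensive (k = K), so that the smallest eligible index is the cheapest eligible plan. The cutoffs and rates in the usage line are hypothetical.

```python
def nu(y, a):
    """Index of the cheapest plan k = 1..K with a < y_k, or 0 if there is none.

    Assumes plans are ordered from cheapest (k = 1) to most expensive (k = K).
    """
    for k, cutoff in enumerate(y, start=1):
        if a < cutoff:
            return k
    return 0

def k_opt(y, r, a):
    """Optimal plan for a client with expected out-of-pocket cost (risk) a."""
    k = nu(y, a)
    if k == 0:            # not eligible for any plan (a >= y_K)
        return 0
    if r[k - 1] < a:      # the cheapest eligible plan costs less than staying uninsured
        return k
    return 0              # opting out

# Hypothetical plans: cutoffs y_k and annual rates r_k for K = 3.
y = [1.0, 2.0, 4.0]
r = [0.5, 1.0, 2.0]
print([k_opt(y, r, a) for a in (0.3, 0.8, 1.5, 3.0, 5.0)])  # [0, 1, 2, 3, 0]
```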

26 COSTS AND OPTIMAL RESPONSE
Insurer B: Insurer cost: given an ensemble A with distributed risk a choosing plan k,
$C_B = \sum_{k=1}^{K} \int_0^{y_k} (\alpha a - r_k)\, f_A(k, a)\, da$,
where α < 1 is a discount factor and $y_k$ is the risk cutoff in plan k.
Optimal response of insurer B (has to include a rate strategy!):
$y_k^{opt} = \frac{r_k}{\alpha}$, $r_k = \beta\, \frac{\int a\, f_A(k, a)\, da}{\int f_A(k, a)\, da}$, β > 1.
β > 1 is the greed factor; it guarantees that $C_B \le 0$:
$C_B = (\alpha - \beta) \sum_{k=1}^{K} \int_0^{y_k} a\, f_A(k, a)\, da \le 0$.
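The following sketches the insurer's optimal response. The densities f_A(k, a) are replaced by hypothetical samples of client risks in each plan, so integrals become sums over clients. With r_k equal to β times the mean risk in plan k, summing α a - r_k over the clients of that plan gives (α - β) times their total risk. This is negative whenever β > 1 > α. The sketch assumes no plan is empty, since otherwise the mean is undefined.

```python
import numpy as np

def insurer_response(risks_by_plan, alpha, beta):
    """Rates r_k = beta * (mean risk of the clients in plan k), cutoffs y_k = r_k / alpha."""
    r = np.array([beta * np.mean(a) for a in risks_by_plan])   # assumes no plan is empty
    return r, r / alpha

def insurer_cost(risks_by_plan, r, alpha):
    """Empirical C_B: sum over plans k and clients in plan k of (alpha * a - r_k)."""
    return sum(float(np.sum(alpha * np.asarray(a) - r_k)) for a, r_k in zip(risks_by_plan, r))

rng = np.random.default_rng(0)
# Hypothetical client risks in K = 3 plans (samples stand in for f_A(k, a) da).
risks = [rng.uniform(0.0, 1.0, 100), rng.uniform(1.0, 2.0, 60), rng.uniform(2.0, 4.0, 30)]
alpha, beta = 0.9, 4.0 / 3.0
r, y = insurer_response(risks, alpha, beta)
C_B = insurer_cost(risks, r, alpha)
print(r, y, C_B)   # C_B <= 0: the insurer makes a profit
```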

27 NUMERICAL RESULTS FOR A SIMPLE MULTI-AGENT MODEL: insured (A), 1 insurer (B), K = 3 insurance plans (+ plan zero = no insurance), 2 risk groups. The results are extremely sensitive to the greed factor β > 1, which guarantees a profit for the insurer (as long as the plan is not empty).

28 Greed factor β = 5/3: desired profit = 2/3 of the expected risk. Outcome: all but the cheapest plan and plan zero die out.

29 Greed factor β = 4/3: desired profit = 1/3 of the expected risk. Outcome: all three plans survive.

30 KINETIC INSURANCE MODEL
Simplification: The observable claims a of the client ensemble A are governed by a Poisson process, independent of the strategies. This allows for a closure in terms of $\rho^A_k(t) = \int f_A(k, a, t)\, da$ and $u_A(a, t) = \sum_k f_A(k, a, t)$:
$\partial_t u_A(a, t) = \partial_a(w(t,t)\, a\, u_A) + \lambda A(a) - \lambda u_A(a, t)$,
$\partial_t f^A_k(a, t) = \omega\, [\varphi_A(k, a, f_B)\, u_A(a, t) - f^A_k(a, t)]$,
$\partial_t f^B_k(y, t) = Q^B_{strat}[f^B, u_A, \rho^A_k]$.

31 COMPUTATIONAL COMPLEXITY: K + 1 transport equations for the strategies of A; K transport equations for the strategies of B; 1 transport equation for the risk. Three time scales: ω: game frequency; λ: updates of the risk (claim frequency); w(t,t): discount factor (forgetting old claims).

32 CONCLUSIONS
Content:
1. General kinetic framework for two-player non-cooperative games with mixed strategies.
2. Dynamics included via a behavioral game theory approach.
3. This leads to a system of transport equations for the probability densities of strategies and observables, involving multiple time scales.
4. Applications to the design and evolution of insurance policies.
Outlook: Further analysis of the kinetic model: time scale separation, moment closures, Chapman-Enskog type expansions. Further investigation of rate strategies.

33 REFERENCES
Available at: math.la.asu.edu/ chris
J.-M. Lasry, P.-L. Lions, Mean field games, Japan. J. Math. 2 (2007). DOI: /s
P. Degond, J.-G. Liu, C. Ringhofer, Evolution of the distribution of wealth in a nonconservative economic environment driven by local Nash equilibria, Proceedings of the Royal Society A (to be published, 2015).
P. Degond, J.-G. Liu, C. Ringhofer, Evolution of the distribution of wealth in an economic environment driven by local Nash equilibria, Journal of Statistical Physics, published online (2013).
P. Degond, J.-G. Liu, C. Ringhofer, Large-scale dynamics of mean-field games driven by local Nash equilibria, Journal of Nonlinear Science, published online (2013).
