An Introduction to Noncooperative Games

Alberto Bressan
Department of Mathematics, Penn State University
- Introduction to noncooperative games: solution concepts
- Differential games in continuous time
- Applications to economic models
Optimal decision problem

maximize: Φ(x, y)

[Figure: level curves Φ = constant, with the maximizer (x*, y*) marked]

The choice (x*, y*) ∈ R² yields the maximum payoff.
A game for two players

Player A wishes to maximize his payoff Φ^A(a, b)
Player B wishes to maximize his payoff Φ^B(a, b)

[Figure: level curves Φ^A = constant and Φ^B = constant in the (a, b) plane]

Player A chooses the value of a ∈ A
Player B chooses the value of b ∈ B
A cooperative solution

[Figure: level curves of Φ^A and Φ^B, with the cooperative point (a^C, b^C) marked]

- maximize the sum of payoffs Φ^A(a, b) + Φ^B(a, b)
- split the total payoff fairly among the two players (how???)
The best reply map

If Player A adopts the strategy a, the set of best replies for Player B is

R^B(a) = { b ; Φ^B(a, b) = max_{s∈B} Φ^B(a, s) }

If Player B adopts the strategy b, the set of best replies for Player A is

R^A(b) = { a ; Φ^A(a, b) = max_{s∈A} Φ^A(s, b) }

[Figure: the best reply curves R^B(a) and R^A(b) in the (a, b) plane]
Nash equilibrium solutions

A couple of strategies (a*, b*) is a Nash equilibrium if

a* ∈ R^A(b*) and b* ∈ R^B(a*)

[Figure: the Nash equilibrium N at the intersection of the two best reply curves]

Antoine Augustin Cournot (1838), John Nash (1950)
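For a finite game, the Nash condition can be checked directly: a strategy pair is an equilibrium exactly when each strategy is a best reply to the other. A minimal sketch (not from the slides; the payoff matrices below are hypothetical):

```python
import numpy as np

# Hypothetical payoff matrices: Phi_A[i, j] is A's payoff when A plays row i
# and B plays column j; both players maximize.
Phi_A = np.array([[3, 1],
                  [0, 2]])
Phi_B = np.array([[2, 1],
                  [0, 3]])

def pure_nash(Phi_A, Phi_B):
    """Return all (i, j) with i a best reply to j and j a best reply to i."""
    equilibria = []
    for i in range(Phi_A.shape[0]):
        for j in range(Phi_A.shape[1]):
            if Phi_A[i, j] == Phi_A[:, j].max() and Phi_B[i, j] == Phi_B[i, :].max():
                equilibria.append((i, j))
    return equilibria

print(pure_nash(Phi_A, Phi_B))  # [(0, 0), (1, 1)]: this game has two pure equilibria
```

As the example shows, a Nash equilibrium need not be unique, which is one motivation for the refinements discussed later.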
Existence of Nash equilibria

Theorem. Assume:
- the sets of available strategies for the two players, A, B ⊂ R^n, are compact and convex
- the payoff functions Φ^A, Φ^B : A × B → R are continuous
- for each a ∈ A, the set of best replies R^B(a) ⊆ B is convex
- for each b ∈ B, the set of best replies R^A(b) ⊆ A is convex

Then the game admits at least one Nash equilibrium.

Proof. If the best reply maps are single-valued, the map (a, b) ↦ (R^A(b), R^B(a)) is a continuous map from the compact convex set A × B into itself. By Brouwer's fixed point theorem, it has a fixed point (a*, b*).

If R^A, R^B are only convex-valued, by Kakutani's fixed point theorem there exists (a*, b*) ∈ (R^A(b*), R^B(a*)).
One-dimensional version of Brouwer's and Kakutani's theorems

[Figure: left, a continuous f : [0, 1] → [0, 1] with a fixed point x* = f(x*); right, an upper semicontinuous, convex-valued multifunction F with a point x* ∈ F(x*), approximated by continuous maps f_ε]

Luitzen Egbertus Jan Brouwer (1910), Shizuo Kakutani (1941), Arrigo Cellina (1969)
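In one dimension the Brouwer fixed point can even be located constructively: g(x) = f(x) − x is ≥ 0 at the left endpoint and ≤ 0 at the right one, so bisection converges. A minimal sketch (the map f below is a hypothetical example, not from the slides):

```python
import math

def brouwer_fixed_point(f, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection on g(x) = f(x) - x, which is >= 0 at lo and <= 0 at hi
    for any continuous f mapping [lo, hi] into itself."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) - mid > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

f = lambda x: math.cos(x)     # continuous, maps [0, 1] into [cos 1, 1] ⊂ [0, 1]
x_star = brouwer_fixed_point(f)
print(x_star)                 # the unique solution of x = cos x, about 0.739
```

This constructive route is special to dimension one; in higher dimensions Brouwer's theorem is a genuinely topological statement.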
Stackelberg equilibrium

Player A (the leader) announces his strategy a ∈ A in advance.
Player B (the follower) adopts his best reply: b ∈ R^B(a) ⊆ B.

What is the best strategy for the leader?

max_{a∈A} Φ^A(a, R^B(a))

[Figure: the Stackelberg point S on the follower's best reply curve R^B(a)]

A couple of strategies (a*, b*) is a Stackelberg equilibrium if

b* ∈ R^B(a*) and Φ^A(a*, b*) ≥ Φ^A(a, b) for all a ∈ A, b ∈ R^B(a)
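In a finite game the leader's problem can be solved by enumeration: for each leader strategy, compute the follower's best replies and keep the outcome best for the leader. A sketch reusing hypothetical payoff matrices (the optimistic tie-breaking rule, favoring the leader, is one common convention, assumed here):

```python
import numpy as np

# Hypothetical payoffs: leader A commits to row i, follower B then best-replies
# with a column j; both players maximize.
Phi_A = np.array([[3, 1],
                  [0, 2]])
Phi_B = np.array([[2, 1],
                  [0, 3]])

def stackelberg(Phi_A, Phi_B):
    """Enumerate leader strategies; the follower best-replies (ties broken in
    the leader's favor); return the pair maximizing the leader's payoff."""
    best = None
    for i in range(Phi_A.shape[0]):
        replies = np.flatnonzero(Phi_B[i, :] == Phi_B[i, :].max())
        j = max(replies, key=lambda j: Phi_A[i, j])   # optimistic tie-breaking
        if best is None or Phi_A[i, j] > Phi_A[best[0], best[1]]:
            best = (i, j)
    return best

print(stackelberg(Phi_A, Phi_B))  # leader commits to row 0, earning payoff 3
```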
Game theoretical models in Economics and Finance

- Sellers (choosing prices charged) vs. buyers (choosing quantities bought)
- Companies competing for market share (choosing production levels, prices, amounts spent on research & development or advertising)
- Auctions, bidding games
- Economic growth. Leading player: the central bank (choosing the prime rate); followers: private companies (choosing investment levels)
- Debt management. Lenders (choosing the interest rate) vs. borrower (choosing a repayment strategy)
- ...
Differential games in finite time horizon

x(t) ∈ R^n = state of the system

Dynamics: ẋ(t) = f(x(t), u_1(t), u_2(t)), x(t_0) = x_0

u_1(·), u_2(·) = controls implemented by the two players

Goal of the i-th player:

maximize: J_i ≐ ψ_i(x(T)) − ∫_{t_0}^T L_i(x(t), u_1(t), u_2(t)) dt

= [terminal payoff] − [running cost]
Differential games in infinite time horizon

Dynamics: ẋ = f(x, u_1, u_2), x(0) = x_0

Goal of the i-th player:

maximize: J_i = ∫_0^{+∞} e^{−γt} Ψ_i(x(t), u_1(t), u_2(t)) dt

(running payoff, exponentially discounted in time)
Example 1: an advertising game

Two companies, competing for market share.

state variable: x(t) ∈ [0, 1] = market share of company 1, at time t

dynamics: ẋ = (1 − x) u_1 − x u_2

controls: u_1, u_2 = advertising rates

payoffs: J_i = N x_i(T) p_i − ∫_0^T c_i u_i(t) dt, i = 1, 2

N = expected number of items purchased by consumers
p_i = profit made by player i on each sale
c_i = advertising cost
x_i = market share of player i (x_1 = x, x_2 = 1 − x)
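The market-share dynamics above are easy to simulate: with constant advertising rates, ẋ = u_1 − (u_1 + u_2)x, so the share relaxes exponentially to u_1/(u_1 + u_2). A minimal sketch (the rates, horizon, and step size are hypothetical):

```python
def simulate_share(u1, u2, x0=0.5, T=50.0, dt=0.01):
    """Forward-Euler integration of x' = (1 - x)*u1 - x*u2
    with constant advertising rates u1, u2."""
    x = x0
    for _ in range(int(T / dt)):
        x += dt * ((1.0 - x) * u1 - x * u2)
    return x

# With u1 = 2, u2 = 1 the long-run share is u1 / (u1 + u2) = 2/3.
print(simulate_share(2.0, 1.0))
```

This only treats open-loop constant controls; the game-theoretic question is which rates the two companies will actually choose.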
Example 2: harvesting of marine resources

x(t) = amount of fish in a lake, at time t

dynamics: ẋ = αx(M − x) − xu_1 − xu_2

controls: u_1, u_2 = harvesting efforts of the two players

payoffs: J_i = ∫_0^{+∞} e^{−γt} (p xu_i − c_i u_i) dt

p = selling price of fish
c_i = harvesting cost
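With constant harvesting efforts, setting ẋ = 0 in the dynamics gives the steady-state stock x* = M − (u_1 + u_2)/α, provided this is positive; otherwise the fishery collapses. A sketch (parameter values hypothetical):

```python
def fish_steady_state(alpha, M, u1, u2):
    """Positive steady state of x' = alpha*x*(M - x) - x*u1 - x*u2,
    obtained from alpha*(M - x) = u1 + u2; returns 0.0 if overharvested."""
    x_star = M - (u1 + u2) / alpha
    return max(x_star, 0.0)

# Hypothetical parameters: growth rate alpha = 0.1, carrying capacity M = 10.
print(fish_steady_state(0.1, 10.0, 0.3, 0.2))   # stock settles at 5.0
print(fish_steady_state(0.1, 10.0, 0.8, 0.8))   # efforts too high: collapse
```

Comparing the Nash efforts with the cooperative ones in this model quantifies the "tragedy of the commons": competing players overharvest relative to the joint optimum.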
Example 3: a producer vs. consumer game

State variables: p = price, q = size of the inventory

Controls: a(t) = production rate, b(t) = consumption rate

The system evolves in time according to

ṗ = p ln(q_0/q), q̇ = a − b

Here q_0 is an appropriate inventory level.

Payoffs:

J_producer ≐ ∫_0^{+∞} e^{−γt} [ p(t) b(t) − c(a(t)) ] dt

J_consumer ≐ ∫_0^{+∞} e^{−γt} [ φ(b(t)) − p(t) b(t) ] dt

c(a) = production cost, φ(b) = utility to the consumer
Solution concepts

No outcome can be optimal simultaneously for all players.

Different outcomes may arise, depending on:
- the information available to the players
- their ability and willingness to cooperate
Nash equilibria (in infinite time horizon)

Seek feedback strategies u_1*(x), u_2*(x) with the following properties:

Given the strategy u_2 = u_2*(x) adopted by the second player, for every initial datum x(0) = y the assignment u_1 = u_1*(x) provides a solution to the optimal control problem for the first player:

max_{u_1(·)} ∫_0^{+∞} e^{−γt} Ψ_1(x, u_1, u_2*(x)) dt

subject to ẋ = f(x, u_1, u_2*(x)), x(0) = y.

Similarly, given the strategy u_1 = u_1*(x) adopted by the first player, the feedback control u_2 = u_2*(x) provides a solution to the optimal control problem for the second player.
Solving an optimal control problem by PDE methods

V(y) = inf_{u(·)} ∫_0^{+∞} e^{−γt} L(x(t), u(t)) dt

subject to: ẋ(t) = f(x(t), u(t)), x(0) = y, u(t) ∈ U

V(y) = minimum cost, if the system is initially at y
A PDE for the value function

[Figure: trajectory x(t) starting at y = x(0) and reaching y + ε f(y, ω) at time ε]

If we use the constant control u(t) = ω for t ∈ [0, ε], and then play optimally for t ∈ [ε, +∞[, the total cost is

J_{ε,ω} = ∫_0^ε e^{−γt} L(x(t), u(t)) dt + e^{−γε} V(y + ε f(y, ω)) + o(ε)
        = ε L(y, ω) + (1 − γε) V(y) + ∇V(y) · ε f(y, ω) + o(ε)
        ≥ V(y)

Since this is true for every ω ∈ U,

V(y) ≤ V(y) − γε V(y) + ε min_{ω∈U} { L(y, ω) + ∇V(y) · f(y, ω) } + o(ε)
The Hamilton-Jacobi PDE for the value function

Choosing ω = u*(y), the optimal control, we should have equality. Hence

V(y) = V(y) − γε V(y) + ε min_{ω∈U} { L(y, ω) + ∇V(y) · f(y, ω) } + o(ε)

Letting ε → 0 we obtain

γ V(y) = min_{ω∈U} { L(y, ω) + ∇V(y) · f(y, ω) } ≐ H(y, ∇V(y))

If V(·) is known, the optimal feedback control can be recovered by

u*(y) = argmin_{u∈U} { L(y, u) + ∇V(y) · f(y, u) }
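The dynamic programming relation above also suggests a numerical scheme: discretize the state, and iterate V(y) ← min_ω [ ε L(y, ω) + e^{−γε} V(y + ε f(y, ω)) ] to convergence (a semi-Lagrangian value iteration; the discount makes the update a contraction). A minimal one-dimensional sketch with hypothetical data f(y, ω) = ω and L(y, ω) = y² + ω²/2:

```python
import numpy as np

gamma, eps = 0.5, 0.05
xs = np.linspace(-2.0, 2.0, 81)          # state grid
controls = np.linspace(-1.0, 1.0, 21)    # discretized control set U

L = lambda y, w: y**2 + 0.5 * w**2       # running cost (hypothetical)
f = lambda y, w: w                       # dynamics x' = u (hypothetical)

V = np.zeros_like(xs)
for _ in range(3000):
    # one dynamic-programming sweep: try every control at every grid point,
    # evaluating V at the displaced state by linear interpolation
    candidates = [eps * L(xs, w) + np.exp(-gamma * eps) *
                  np.interp(xs + eps * f(xs, w), xs, V) for w in controls]
    V_new = np.min(candidates, axis=0)
    if np.max(np.abs(V_new - V)) < 1e-12:
        break
    V = V_new

print(xs[np.argmin(V)])   # the value function is smallest at the origin
```

This brute-force approach works in low dimensions; the curse of dimensionality is what makes the PDE systems for differential games hard in general.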
An example

minimize: ∫_0^{+∞} e^{−γt} ( φ(x(t)) + u²(t)/2 ) dt

subject to: ẋ = f(x) + g(x) u, x(0) = y, u(t) ∈ R

The value function V(y) = minimum cost, starting at y, satisfies the PDE

γ V(x) = min_{ω∈R} { φ(x) + ω²/2 + V′(x) (f(x) + g(x) ω) }
       = φ(x) + V′(x) f(x) − (1/2) (V′(x) g(x))²
Finding the optimal feedback control

ẋ = f(x) + g(x) u

[Figure: the gradient ∇V(x) and the vector g(x), relative to a level set V = const.]

If V(·) is known, the optimal control can be recovered by

u*(x) = argmin_{u∈R} { V′(x) g(x) u + u²/2 } = −V′(x) g(x)
Solving a differential game by PDE methods

Dynamics: ẋ = f(x) + g_1(x)u_1 + g_2(x)u_2

Player i seeks to minimize: J_i = ∫_0^{+∞} e^{−γt} ( φ_i(x(t)) + u_i²(t)/2 ) dt

Given the strategy u_2*(x) of Player 2, the optimal control problem for Player 1 is:

minimize J_1 subject to: ẋ = f(x) + g_1(x)u_1 + g_2(x)u_2*(x)

PDE for the value function V_1(y) = minimum cost starting at y:

γ V_1 = φ_1 + ∇V_1 · f + (∇V_1 · g_2) u_2* − (1/2) (∇V_1 · g_1)²

Optimal feedback control for Player 1: u_1*(x) = −∇V_1(x) · g_1(x)
A system of PDEs for the value functions

The value functions V_1, V_2 of the two players satisfy the system of H-J equations

γ V_1 = (f · ∇V_1) − (1/2)(g_1 · ∇V_1)² − (g_2 · ∇V_1)(g_2 · ∇V_2) + φ_1
γ V_2 = (f · ∇V_2) − (1/2)(g_2 · ∇V_2)² − (g_1 · ∇V_1)(g_1 · ∇V_2) + φ_2

Optimal feedback controls: u_i*(x) = −∇V_i(x) · g_i(x), i = 1, 2

highly nonlinear, implicit!
Differential games in finite time horizon

Player i seeks to maximize:

J_i ≐ ψ_i(x(T)) − ∫_{t_0}^T ( φ_i(x(t)) + u_i²(t)/2 ) dt

subject to ẋ = f(x) + g_1(x)u_1 + g_2(x)u_2, x(t_0) = x_0.

The value functions satisfy the system of H-J equations

V_{1,t} = φ_1 − (f · ∇V_1) − (1/2)(g_1 · ∇V_1)² − (g_2 · ∇V_1)(g_2 · ∇V_2)
V_{2,t} = φ_2 − (f · ∇V_2) − (1/2)(g_2 · ∇V_2)² − (g_1 · ∇V_1)(g_1 · ∇V_2)

Terminal conditions: V_1(T, x) = ψ_1(x), V_2(T, x) = ψ_2(x)

Optimal feedback controls: u_i*(t, x) = ∇V_i(t, x) · g_i(x), i = 1, 2

a hard PDE problem to solve!
Linear-quadratic games

Assume the dynamics is linear and the cost functions are quadratic:

ẋ = (Ax + b_0) + b_1 u_1 + b_2 u_2, x(0) = y

J_i = ∫_0^{+∞} e^{−γt} ( a_i · x + x^T P_i x + c_i u_i + u_i²/2 ) dt

Then the system of PDEs has a special solution given by quadratic polynomials:

V_i(x) = α_i + β_i · x + x^T Γ_i x, i = 1, 2

optimal controls:

u_i*(x) = argmin_{ω∈R} { c_i ω + ω²/2 + (β_i + 2x^T Γ_i) b_i ω } = −c_i − (β_i + 2x^T Γ_i) b_i

To find this solution, it suffices to determine the coefficients α_i, β_i, Γ_i by solving a system of algebraic equations.
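In the scalar infinite-horizon minimization setting of the earlier slides (dynamics ẋ = ax + b_1u_1 + b_2u_2, costs φ_i = p_i x², and b_0 = a_i = c_i = 0 for simplicity), the ansatz V_i(x) = g_i x² reduces the H-J system to two coupled quadratic equations for (g_1, g_2). A sketch solving them by damped fixed-point iteration (all parameter values hypothetical):

```python
# Scalar LQ game: x' = a*x + b1*u1 + b2*u2, player i minimizes the discounted
# integral of p_i*x^2 + u_i^2/2.  The ansatz V_i(x) = g_i*x^2 reduces the H-J
# system to:  disc*g_i = 2*a*g_i - 2*b_i^2*g_i^2 - 4*b_j^2*g_i*g_j + p_i.
a, b1, b2, p1, p2, disc = -0.5, 1.0, 1.0, 1.0, 1.0, 0.1

def residuals(g1, g2):
    """How far (g1, g2) is from solving the two algebraic equations."""
    r1 = 2*a*g1 - 2*b1**2*g1**2 - 4*b2**2*g1*g2 + p1 - disc*g1
    r2 = 2*a*g2 - 2*b2**2*g2**2 - 4*b1**2*g1*g2 + p2 - disc*g2
    return r1, r2

g1 = g2 = 0.5                     # initial guess
for _ in range(20000):
    r1, r2 = residuals(g1, g2)
    g1 += 0.01 * r1               # damped fixed-point step
    g2 += 0.01 * r2

print(g1, g2)                     # symmetric data give g1 = g2 > 0
# Corresponding feedbacks: u_i*(x) = -V_i'(x)*b_i = -2*g_i*b_i*x
```

In higher dimensions one solves the analogous coupled algebraic Riccati-type equations for the matrices Γ_i; existence of a solution is not guaranteed in general.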
Validity of linear-quadratic approximations?

Assume the dynamics is almost linear

ẋ = f_0(x) + g_1(x)u_1 + g_2(x)u_2 ≈ (Ax + b_0) + b_1u_1 + b_2u_2, x(0) = y

and the cost functions are almost quadratic

J_i = ∫_0^{+∞} e^{−γt} ( φ_i(x) + u_i²/2 ) dt ≈ ∫_0^{+∞} e^{−γt} ( a_i · x + x^T P_i x + u_i²/2 ) dt

Is it true that the nonlinear game has a feedback solution close to the one of the linear-quadratic game?
References

On optimal control:
- A. Bressan and B. Piccoli, Introduction to the Mathematical Theory of Control, AIMS Series in Applied Mathematics, 2007.
- W. H. Fleming and R. W. Rishel, Deterministic and Stochastic Optimal Control, Springer-Verlag, 1975.
- S. Lenhart and J. T. Workman, Optimal Control Applied to Biological Models, Chapman and Hall/CRC, 2007.

On differential games:
- A. Bressan, Noncooperative differential games. A tutorial. Milan J. Math. 79 (2011), 357-427.
- T. Basar and G. J. Olsder, Dynamic Noncooperative Game Theory, SIAM, Philadelphia, 1999.
- E. Dockner, S. Jorgensen, N. Van Long, and G. Sorger, Differential Games in Economics and Management Science, Cambridge University Press, 2000.