Game Theory and Control
Lecture 4: Potential games
Saverio Bolognani, Ashish Hota, Maryam Kamgarpour
Automatic Control Laboratory, ETH Zürich

Course Outline

1 Introduction
  22.02 Lecture 1: Introduction to games
2 Single-stage games
  01.03 Lecture 2: Zero-sum games
  08.03 Lecture 3: Non-zero-sum games
  15.03 Lecture 4: Potential games
  22.03 Lecture 5: Convex games I
  29.03 Lecture 6: Convex games II
  10.04 Homework 1 due
3 Multi-stage games
  12.04 Lecture 6: Feedback games
  19.04 Lecture 7: Randomized strategies for feedback games
  26.04 Lecture 8: Dynamic games I
  03.05 Lecture 9: Dynamic games II
  15.05 Homework 2 due
  17.05 Lecture 10: Stackelberg games
  24.05 Lecture 11: Auctions I
  31.05 Lecture 12: Auctions II
  12.06 Homework 3 due

Recall: Finite non-zero-sum games

Let there be $n < \infty$ players. Player $i$ has $m_i < \infty$ pure actions available to it.
Denote the set of pure actions of player $i$ by $S_i$, with $|S_i| = m_i$. A pure strategy of player $i$ is denoted by $\pi_{i,j}$.
The set of mixed strategies of player $i$, denoted by $X_i$, is
$$X_i = \Big\{ (x_{i,1}, \ldots, x_{i,m_i}) : \sum_{j=1}^{m_i} x_{i,j} = 1,\ x_{i,j} \geq 0,\ j = 1, \ldots, m_i \Big\},$$
where $x_{i,j}$ is the probability with which player $i$ selects action $j \in S_i$.

Recall: Utility under pure and mixed strategies

Consider a profile of pure strategies of all players $\pi = (\pi_{1,j_1}, \pi_{2,j_2}, \ldots, \pi_{n,j_n})$, where $\pi_{i,j_i} \in S_i$ for every player $i$. The utility of player $i$ is
$$v_i(\pi) = v_i(\pi_{i,j_i}, \pi_{-i}).$$
Consider a profile of mixed strategies of all players $x = (x_1, x_2, \ldots, x_n)$, where $x_i \in X_i$ for every player $i$. We also denote it by $x = (x_i, x_{-i})$, where $x_{-i}$ is the joint mixed strategy of all players other than $i$, i.e., $x_{-i} = (x_1, x_2, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$.
The expected utility of player $i$ is
$$U_i(x_i, x_{-i}) = \sum_{j_1=1}^{m_1} \sum_{j_2=1}^{m_2} \cdots \sum_{j_n=1}^{m_n} v_i(\pi_{1,j_1}, \pi_{2,j_2}, \ldots, \pi_{n,j_n})\, x_{1,j_1} x_{2,j_2} \cdots x_{n,j_n}.$$
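The nested sum in the expected utility can be sketched in a few lines of Python. This is only an illustration: the function name `expected_utility`, the dictionary layout of `v`, and the numbers in the 2x2 example are assumptions made here, not part of the lecture.

```python
from itertools import product

def expected_utility(v, x):
    """Expected utility of one player under a mixed strategy profile.

    v: dict mapping a pure profile (tuple of action indices) to the utility
       of the player under consideration (hypothetical data layout).
    x: list with one probability vector per player.
    """
    total = 0.0
    # Enumerate every pure profile (j_1, ..., j_n), as in the nested sum.
    for profile in product(*(range(len(xi)) for xi in x)):
        prob = 1.0
        for player, action in enumerate(profile):
            prob *= x[player][action]
        total += v[profile] * prob
    return total

# Illustrative utilities of player 1 in a 2x2 game, indexed by (row, column).
v1 = {(0, 0): 5, (0, 1): 10, (1, 0): 9, (1, 1): 8}
x = [[0.5, 0.5], [0.25, 0.75]]
u1 = expected_utility(v1, x)
```

A pure strategy is the special case of a probability vector with a single 1, so `expected_utility(v1, [[1, 0], [0, 1]])` simply returns the entry `v1[(0, 1)]`.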

Recall: Pure and mixed Nash equilibrium

Pure Nash equilibrium
A pure strategy profile $\pi^* = (\pi_1^*, \pi_2^*, \ldots, \pi_n^*)$ is a pure Nash equilibrium if for every player $i$,
$$v_i(\pi_i^*, \pi_{-i}^*) \geq v_i(\pi_{i,j}, \pi_{-i}^*), \quad \forall \pi_{i,j} \in S_i.$$

Mixed Nash equilibrium
A mixed strategy profile $x^* = (x_1^*, x_2^*, \ldots, x_n^*)$ is a mixed Nash equilibrium if for every player $i$,
$$U_i(x_i^*, x_{-i}^*) \geq U_i(x_i, x_{-i}^*), \quad \forall x_i \in X_i.$$

Every pure Nash equilibrium is a mixed Nash equilibrium.

Recall: Nash's theorem

Theorem
Every finite game has a mixed Nash equilibrium.

Proof.
Consider the set $X = X_1 \times X_2 \times \cdots \times X_n$. $X$ is compact and convex.
Consider the map $f : X \to X$ defined earlier. $f$ is continuous.
From Brouwer's fixed point theorem: there exists a fixed point $x^*$ such that $f(x^*) = x^*$.
From Proposition 2: every fixed point of $f$ is a mixed Nash equilibrium.

Lecture outline

Topics covered today:
- How do we compute a pure Nash equilibrium?
- Best response dynamics
- Special class of games: Potential games
- Application in traffic equilibrium

Example: Pure Nash equilibrium

Utility of Red Car
              Blue: Go   Blue: Wait
Red: Go           5          10
Red: Wait         9           8

Utility of Blue Car
              Blue: Go   Blue: Wait
Red: Go           3           9
Red: Wait        10           8

Recall: $v_r(g, g) = 5$, $v_b(g, w) = 9$, and so on.

Which of the following joint pure strategies are Nash equilibria?
- (Go, Go):
- (Go, Wait):
- (Wait, Go):
- (Wait, Wait):
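One way to check the four candidates is to test the Nash inequalities directly for every profile. The sketch below assumes the utilities are indexed as (red action, blue action), consistent with $v_b(g, w) = 9$; the names `V_RED`, `V_BLUE` and `is_pure_nash` are illustrative only.

```python
from itertools import product

ACTIONS = ["Go", "Wait"]
# Utilities indexed by (red action, blue action), read off the two tables.
V_RED = {("Go", "Go"): 5, ("Go", "Wait"): 10, ("Wait", "Go"): 9, ("Wait", "Wait"): 8}
V_BLUE = {("Go", "Go"): 3, ("Go", "Wait"): 9, ("Wait", "Go"): 10, ("Wait", "Wait"): 8}

def is_pure_nash(red, blue):
    """True if neither car gains by a unilateral deviation."""
    red_ok = all(V_RED[(red, blue)] >= V_RED[(r, blue)] for r in ACTIONS)
    blue_ok = all(V_BLUE[(red, blue)] >= V_BLUE[(red, b)] for b in ACTIONS)
    return red_ok and blue_ok

pure_nash = [p for p in product(ACTIONS, ACTIONS) if is_pure_nash(*p)]
print(pure_nash)  # [('Go', 'Wait'), ('Wait', 'Go')]
```

With these numbers, the two profiles in which exactly one car goes satisfy the inequalities, while (Go, Go) and (Wait, Wait) do not.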

Minimization vs. Maximization

Suppose each player $i$ minimizes a cost function $c_i$ instead of maximizing a utility function $v_i$.

Pure Nash equilibrium
A pure strategy profile $\pi^* = (\pi_1^*, \pi_2^*, \ldots, \pi_n^*)$ is a pure Nash equilibrium if for every player $i$,
$$c_i(\pi_i^*, \pi_{-i}^*) \leq c_i(\pi_{i,j}, \pi_{-i}^*), \quad \forall \pi_{i,j} \in S_i.$$

Example: Prisoner's dilemma
                Betray      Stay silent
Betray         (10, 10)      (0, 11)
Stay silent    (11, 0)       (1, 1)

or
$$A = \begin{bmatrix} 10 & 0 \\ 11 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 10 & 11 \\ 0 & 1 \end{bmatrix}.$$
Entries represent costs, which players minimize.

Myopic strategy

Let $\pi_{-i} = (\pi_{1,j_1}, \pi_{2,j_2}, \ldots, \pi_{i-1,j_{i-1}}, \pi_{i+1,j_{i+1}}, \ldots, \pi_{n,j_n})$ be the pure strategy profile of all players other than $i$.
How should player $i$ choose its strategy? What about the strategy that maximizes its utility?

Pure Best Response
The pure best response of player $i$ is the set $S_i^*(\pi_{-i}) \subseteq S_i$ such that $\pi_i^* \in S_i^*(\pi_{-i})$ if and only if
$$v_i(\pi_i^*, \pi_{-i}) \geq v_i(\pi_{i,j}, \pi_{-i}), \quad \forall \pi_{i,j} \in S_i.$$
In other words:
$$S_i^*(\pi_{-i}) := \operatorname{argmax}_{\pi_i \in S_i} v_i(\pi_i, \pi_{-i}).$$
- $S_i^*(\pi_{-i})$ is not necessarily single-valued; it is set-valued.
- $S_i^*(\pi_{-i})$ is a function of the joint strategies of the other players.

Example: Best response

Utility of Red Car
              Blue: Go   Blue: Wait
Red: Go           5          10
Red: Wait         9           8

Utility of Blue Car
              Blue: Go   Blue: Wait
Red: Go           3           9
Red: Wait        10           8

Recall: $v_r(g, g) = 5$, $v_b(g, w) = 9$, and so on.

What are the best responses?
- $S_r^*(g)$: best response of the red car when the blue car chooses go?
- $S_r^*(w)$: best response of the red car when the blue car chooses wait?
- $S_b^*(g)$:
- $S_b^*(w)$:
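The best response sets can be computed as an argmax over each car's own actions. The sketch below is illustrative: the helper names are assumptions, and the result is returned as a set because a best response need not be unique in general.

```python
ACTIONS = ["Go", "Wait"]
# Utilities indexed by (red action, blue action), read off the two tables.
V_RED = {("Go", "Go"): 5, ("Go", "Wait"): 10, ("Wait", "Go"): 9, ("Wait", "Wait"): 8}
V_BLUE = {("Go", "Go"): 3, ("Go", "Wait"): 9, ("Wait", "Go"): 10, ("Wait", "Wait"): 8}

def best_response_red(blue):
    """Set of red actions maximizing red's utility against a fixed blue action."""
    best = max(V_RED[(r, blue)] for r in ACTIONS)
    return {r for r in ACTIONS if V_RED[(r, blue)] == best}

def best_response_blue(red):
    """Set of blue actions maximizing blue's utility against a fixed red action."""
    best = max(V_BLUE[(red, b)] for b in ACTIONS)
    return {b for b in ACTIONS if V_BLUE[(red, b)] == best}

for other in ACTIONS:
    print(other, best_response_red(other), best_response_blue(other))
```

In this game every best response set turns out to be a singleton: each car prefers to wait when the other goes, and to go when the other waits.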

Best response and Nash equilibrium

Proposition
A pure strategy profile $\pi^* = (\pi_1^*, \pi_2^*, \ldots, \pi_n^*)$ is a pure Nash equilibrium if and only if $\pi_i^* \in S_i^*(\pi_{-i}^*)$ for every player $i$.

Fixed point interpretation:
Consider a set-valued map $S^*$ such that, for $\pi \in S$,
$$S^*(\pi) := [S_1^*(\pi_{-1}), S_2^*(\pi_{-2}), \ldots, S_n^*(\pi_{-n})].$$
Note: $S^*(\pi) \subseteq S$.
A pure strategy profile $\pi^* \in S$ is a Nash equilibrium if and only if $\pi^* \in S^*(\pi^*)$.
We require a stronger version of Brouwer's fixed point theorem to show existence of fixed points of set-valued maps.

Best response dynamics

1. Consider an initial pure strategy profile $\pi^0 = (\pi_1^0, \pi_2^0, \ldots, \pi_n^0)$.
2. If $\pi^k$ is a pure Nash equilibrium: stop.
3. Else there exists a player $i$ and $\pi_i^{k+1} \neq \pi_i^k$ such that $v_i(\pi_i^{k+1}, \pi_{-i}^k) > v_i(\pi_i^k, \pi_{-i}^k)$.
4. Update: $\pi^{k+1} := (\pi_i^{k+1}, \pi_{-i}^k)$.
5. Repeat steps 2-4.

Do these dynamics converge? If yes, then to which joint strategy?

Example: Best response dynamics in odds and evens

- If the sum of both numbers is odd: P1 wins 1 Franc, P2 loses 1 Franc.
- If the sum of both numbers is even: P2 wins 1 Franc, P1 loses 1 Franc.
- P1 maximizes, P2 maximizes.

Payoffs $(v_1, v_2)$, by the parity of the number each player picks:
              P2: odd     P2: even
P1: odd       (-1, 1)     (1, -1)
P1: even      (1, -1)     (-1, 1)

Let $\pi^0 = (1, 1)$. Does the best response dynamics converge?

Example: Best response dynamics in prisoner's dilemma

                Betray      Stay silent
Betray         (10, 10)      (0, 11)
Stay silent    (11, 0)       (1, 1)

or
$$A = \begin{bmatrix} 10 & 0 \\ 11 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 10 & 11 \\ 0 & 1 \end{bmatrix}.$$
Entries represent costs, which players minimize.

Let $\pi^0 = (\text{silent}, \text{silent})$. Does the best response dynamics converge?
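The two examples can be explored with a small simulation of the best response dynamics. This is a sketch under stated assumptions: the lecture does not fix which player moves when several can improve, so here the first such player (in index order) moves to a best response; costs are turned into utilities by negation; and the odds-and-evens actions are labelled by parity. All function and variable names are illustrative.

```python
def deviate(profile, i, action):
    """Return the profile in which player i plays `action` instead."""
    return profile[:i] + (action,) + profile[i + 1:]

def best_response_dynamics(utilities, actions, start, max_steps=20):
    """utilities[i][profile] is player i's utility. Returns (path, converged)."""
    profile = tuple(start)
    path = [profile]
    for _ in range(max_steps):
        for i, u in enumerate(utilities):
            best = max(actions[i], key=lambda a: u[deviate(profile, i, a)])
            if u[deviate(profile, i, best)] > u[profile]:
                profile = deviate(profile, i, best)  # strict improvement
                path.append(profile)
                break
        else:
            return path, True   # nobody can improve: pure Nash equilibrium
    return path, False          # step budget exhausted without stopping

# Prisoner's dilemma: costs are minimized, so utilities are negated costs.
acts = [["Betray", "Silent"], ["Betray", "Silent"]]
costs = {("Betray", "Betray"): (10, 10), ("Betray", "Silent"): (0, 11),
         ("Silent", "Betray"): (11, 0), ("Silent", "Silent"): (1, 1)}
pd_utils = [{p: -c[i] for p, c in costs.items()} for i in range(2)]
pd_path, pd_converged = best_response_dynamics(pd_utils, acts, ("Silent", "Silent"))

# Odds and evens: P1 wins when the sum is odd, i.e. when the parities differ.
parity = [["odd", "even"], ["odd", "even"]]
u1 = {(a, b): (-1 if a == b else 1) for a in parity[0] for b in parity[1]}
u2 = {p: -val for p, val in u1.items()}
oe_path, oe_converged = best_response_dynamics([u1, u2], parity, ("odd", "odd"))

print(pd_path, pd_converged)
print(oe_path[:5], oe_converged)
```

Under these assumptions the prisoner's dilemma run stops at (Betray, Betray) after two moves, while the odds-and-evens run returns to its starting profile after four moves and keeps cycling until the step budget runs out.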

Potential game: Definition

Ordinal Potential Function
A function $P : S_1 \times S_2 \times \cdots \times S_n \to \mathbb{R}$ is an ordinal potential function if for every player $i$ and every $\pi_{-i}$,
$$v_i(\pi_{i,j_1}, \pi_{-i}) - v_i(\pi_{i,j_2}, \pi_{-i}) > 0 \iff P(\pi_{i,j_1}, \pi_{-i}) - P(\pi_{i,j_2}, \pi_{-i}) > 0,$$
for every $\pi_{i,j_1}, \pi_{i,j_2} \in S_i$.

Note:
- The potential function assigns a value to each joint strategy profile.
- When player $i$ switches to a strategy that strictly improves its utility (for example, a best response), the potential increases.

Potential game: Definition

Exact Potential Function
A function $P : S_1 \times S_2 \times \cdots \times S_n \to \mathbb{R}$ is an exact potential function if for every player $i$ and every $\pi_{-i}$,
$$v_i(\pi_{i,j_1}, \pi_{-i}) - v_i(\pi_{i,j_2}, \pi_{-i}) = P(\pi_{i,j_1}, \pi_{-i}) - P(\pi_{i,j_2}, \pi_{-i}),$$
for every $\pi_{i,j_1}, \pi_{i,j_2} \in S_i$.

A game is an (ordinal/exact) potential game if it admits an (ordinal/exact) potential function.

Example: Prisoner's dilemma

Payoff matrices are given by
                Betray        Stay silent
Betray         (-10, -10)      (0, -11)
Stay silent    (-11, 0)        (-1, -1)

i.e.,
$$A = \begin{bmatrix} -10 & 0 \\ -11 & -1 \end{bmatrix}, \qquad B = \begin{bmatrix} -10 & -11 \\ 0 & -1 \end{bmatrix}.$$
Both players maximize.

Is the following a potential function?
                Betray   Stay silent
Betray            0         -1
Stay silent      -1         -2

Note: If a player deviates, then the change in potential is equal to the change in utility of the deviating player.
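The question can be checked mechanically by comparing utility differences and potential differences over all unilateral deviations. The sketch assumes the payoffs are the negated costs of the earlier prisoner's dilemma slide (the minus signs are a reconstruction) and that the candidate potential takes the values 0, -1, -1, -2; the names `A`, `B`, `P` and `is_exact_potential` are illustrative.

```python
from itertools import product

ACTIONS = ["Betray", "Silent"]
# Utilities indexed by (row action, column action); both players maximize.
A = {("Betray", "Betray"): -10, ("Betray", "Silent"): 0,
     ("Silent", "Betray"): -11, ("Silent", "Silent"): -1}
B = {("Betray", "Betray"): -10, ("Betray", "Silent"): -11,
     ("Silent", "Betray"): 0, ("Silent", "Silent"): -1}
# Candidate potential function (assumed values, see lead-in).
P = {("Betray", "Betray"): 0, ("Betray", "Silent"): -1,
     ("Silent", "Betray"): -1, ("Silent", "Silent"): -2}

def is_exact_potential(A, B, P, actions):
    """Check the exact potential condition for every unilateral deviation."""
    for r, c in product(actions, repeat=2):
        for r2 in actions:  # row player deviates from r to r2
            if A[(r2, c)] - A[(r, c)] != P[(r2, c)] - P[(r, c)]:
                return False
        for c2 in actions:  # column player deviates from c to c2
            if B[(r, c2)] - B[(r, c)] != P[(r, c2)] - P[(r, c)]:
                return False
    return True

print(is_exact_potential(A, B, P, ACTIONS))
```

The check passes for these values. Note that the maximizer of `P` is (Betray, Betray), which is also the pure Nash equilibrium of the game.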

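Existence of pure Nash equilibrium

Proposition
Finite games with an ordinal potential function possess a pure Nash equilibrium. Furthermore, best response dynamics converges.

Proof idea: The joint strategy profile that maximizes the potential function is a Nash equilibrium. Such a maximizer exists because the set of joint pure strategies is finite, and at the maximizer no unilateral deviation can increase the potential, hence none can increase the utility of the deviating player.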

Improvement paths

Let's introduce some terminology. Let $S := S_1 \times S_2 \times \cdots \times S_n$.
A path in $S$ is a sequence $z = (z^0, z^1, \ldots)$, $z^k \in S$, such that for every $k \geq 1$ there exists a unique player $i_k$ such that $z^k = (\pi_{i_k,j}, z^{k-1}_{-i_k})$ for some $\pi_{i_k,j} \in S_{i_k}$, $\pi_{i_k,j} \neq z^{k-1}_{i_k}$. In other words, exactly one player changes its strategy at each step.
A path $z$ is an improvement path if for every $k \geq 1$, $v_{i_k}(z^k) > v_{i_k}(z^{k-1})$.

Proposition
In a finite ordinal potential game, every improvement path is finite.

The above property is known as the finite improvement property (FIP).
Does the converse hold?

Potential game and FIP

Generalized Ordinal Potential Function
A function $P : S_1 \times S_2 \times \cdots \times S_n \to \mathbb{R}$ is a generalized ordinal potential function if for every player $i$ and every $\pi_{-i}$,
$$v_i(\pi_{i,j_1}, \pi_{-i}) - v_i(\pi_{i,j_2}, \pi_{-i}) > 0 \implies P(\pi_{i,j_1}, \pi_{-i}) - P(\pi_{i,j_2}, \pi_{-i}) > 0,$$
for every $\pi_{i,j_1}, \pi_{i,j_2} \in S_i$.

Proposition
A finite game has the finite improvement property (FIP) if and only if it admits a generalized ordinal potential.

Potential game characterization

Consider any finite path $z = (z^0, z^1, \ldots, z^m)$. $z$ need not be an improvement path. Define
$$I(z, v) := \sum_{k=1}^{m} \left[ v_{i_k}(z^k) - v_{i_k}(z^{k-1}) \right],$$
where $i_k$ is the player with $z^k_{i_k} \neq z^{k-1}_{i_k}$.
A path is closed if $z^0 = z^m$.
A path is simple if $z^i \neq z^j$ for every $i \neq j$ (except $z^0$ and $z^m$).

Question
Suppose the game admits an exact potential function. Let $z$ be a closed path. Then, $I(z, v) = \,?$

Potential game characterization

Proposition [4]
Consider a finite game. Then the following are equivalent:
1. The game admits an exact potential function.
2. $I(z, v) = 0$ for every finite closed path $z$.
3. $I(z, v) = 0$ for every finite simple closed path $z$.
4. $I(z, v) = 0$ for every finite simple closed path $z$ of length 4.

Example: Coordination game

Payoff matrices are given by
              Movie     Football
Movie         (2, 1)     (0, 0)
Football      (0, 0)     (1, 2)

Is it a potential game? Can you construct a potential function?

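Example: Odds and evens game

Exercise
- P1 maximizes, P2 maximizes.
- Is this game a potential game?

Payoffs $(v_1, v_2)$, by the parity of the number each player picks:
              P2: odd     P2: even
P1: odd       (-1, 1)     (1, -1)
P1: even      (1, -1)     (-1, 1)

- Evaluate $I(z, v)$ for a closed simple path of length 4.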
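For a 2x2 game there is a single simple closed path of length 4, up to starting point and direction, so condition 4 of the characterization reduces to one sum. The sketch below evaluates it for the coordination game and for odds and evens. The function name and the matrix layout (`A[r][c]` for the row player, `B[r][c]` for the column player) are assumptions made for illustration, and the odds-and-evens signs follow the stated winning rule.

```python
def four_cycle_sum(A, B):
    """I(z, v) along (0,0) -> (1,0) -> (1,1) -> (0,1) -> (0,0) in a 2x2 game.

    A[r][c] and B[r][c] are the utilities of the row and column player.
    """
    return (
        (A[1][0] - A[0][0])    # row player moves
        + (B[1][1] - B[1][0])  # column player moves
        + (A[0][1] - A[1][1])  # row player moves back
        + (B[0][0] - B[0][1])  # column player moves back
    )

# Coordination game: rows and columns ordered (Movie, Football).
coordination = four_cycle_sum([[2, 0], [0, 1]], [[1, 0], [0, 2]])

# Odds and evens: rows and columns ordered (odd, even); P1 wins on an odd sum.
odds_evens = four_cycle_sum([[-1, 1], [1, -1]], [[1, -1], [-1, 1]])

print(coordination, odds_evens)  # 0 8
```

The sum vanishes for the coordination game, consistent with it admitting an exact potential, and is nonzero for odds and evens, so the latter admits no exact potential. Traversing the cycle in the opposite direction only flips the sign of the sum.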

Congestion game

Consider a game with $n$ players. Let there be $m$ resources.
Each player chooses a resource, i.e., $S_i = \{1, 2, \ldots, m\}$ for every player $i$.
Consider a pure strategy profile $\pi$. Denote the load on resource $j$ as
$$\sigma_j(\pi) := |\{1 \leq i \leq n : \pi_i = j\}|,$$
the number of players who choose resource $j$ in strategy profile $\pi$.
The cost of a player depends on the load on the resource it chose:
$$c_i(\pi) = f_j(\sigma_j(\pi)), \quad \text{when } \pi_i = j.$$
The function $f_j$ is resource-specific. All players who choose a given resource experience the same cost.

Example: Traffic routing

[Figure: two paths from city A to city B. One path is a road (15 + 0.1n minutes) followed by a ferry (40 minutes); the other is a ferry (40 minutes) followed by a road (15 + 0.1n minutes).]

There are two ways to reach city B from city A, and both include some driving and a trip on the ferry.
The two paths are perfectly equivalent; the only difference is whether you first drive or first take the ferry.
The time needed for the trip depends on what other travellers do.
- The ferry time is constant: 40 minutes.
- The road time depends on the number of cars $n$ on the road.
We consider a population of $N = 200$ travellers.

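Example: Traffic routing

[Figure: the same two-path network from A to B, road (15 + 0.1n) and ferry (40) in either order.]

Formulation as a non-zero-sum $N$-person game.
- Each traveller is a player.
- Each path is a resource.
- Each player can decide to take the North or the South path:
$$\gamma^{(i)} = \begin{cases} 1 & \text{North} \\ 0 & \text{South} \end{cases}$$
- All players have an identical cost function:
$$c_i(\gamma^{(i)}, \gamma^{(-i)}) = \begin{cases} 40 + 15 + 0.1 \sum_j \gamma^{(j)} & \text{if } \gamma^{(i)} = 1 \\ 40 + 15 + 0.1 \sum_j (1 - \gamma^{(j)}) & \text{if } \gamma^{(i)} = 0 \end{cases}$$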

Potential function

Theorem
The following is an exact potential function for congestion games:
$$P(\pi) = \sum_{j=1}^{m} \sum_{k=1}^{\sigma_j(\pi)} f_j(k).$$

Proof.
Consider a player $i$, and two joint pure strategies $\pi^1 = (p, \pi_{-i})$ and $\pi^2 = (q, \pi_{-i})$. Note that
- $\sigma_p(\pi^2) = \sigma_p(\pi^1) - 1$,
- $\sigma_q(\pi^2) = \sigma_q(\pi^1) + 1$,
- $\sigma_j(\pi^2) = \sigma_j(\pi^1)$ for every resource $j \neq p, q$.
It suffices to show that $P(\pi^1) - P(\pi^2) = c_i(\pi^1) - c_i(\pi^2)$.

Proof cont.

$$\begin{aligned}
P(\pi^1) - P(\pi^2) &= \sum_{j=1}^{m} \sum_{k=1}^{\sigma_j(\pi^1)} f_j(k) - \sum_{j=1}^{m} \sum_{k=1}^{\sigma_j(\pi^2)} f_j(k) \\
&= \sum_{k=1}^{\sigma_p(\pi^1)} f_p(k) + \sum_{k=1}^{\sigma_q(\pi^1)} f_q(k) - \sum_{k=1}^{\sigma_p(\pi^2)} f_p(k) - \sum_{k=1}^{\sigma_q(\pi^2)} f_q(k) \\
&= f_p(\sigma_p(\pi^1)) - f_q(\sigma_q(\pi^2)) \\
&= c_i(\pi^1) - c_i(\pi^2).
\end{aligned}$$

Consequently, congestion games admit a pure Nash equilibrium.
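The identity in the proof can also be verified by brute force on a small instance. The sketch below uses a hypothetical congestion game with 3 players and 2 resources whose cost functions are chosen arbitrarily (and integer-valued, so that equality can be tested exactly); none of these numbers come from the lecture.

```python
from itertools import product

def load(profile, j):
    """Number of players choosing resource j."""
    return sum(1 for choice in profile if choice == j)

def cost(i, profile, f):
    """Cost of player i: f_j evaluated at the load of its chosen resource j."""
    j = profile[i]
    return f[j](load(profile, j))

def potential(profile, f):
    """P(pi) = sum over resources j of f_j(1) + ... + f_j(load of j)."""
    return sum(f[j](k) for j in range(len(f))
               for k in range(1, load(profile, j) + 1))

def check_exact_potential(f, n):
    """Compare cost and potential differences for all unilateral deviations."""
    m = len(f)
    for profile in product(range(m), repeat=n):
        for i in range(n):
            for q in range(m):
                deviated = profile[:i] + (q,) + profile[i + 1:]
                dc = cost(i, profile, f) - cost(i, deviated, f)
                dp = potential(profile, f) - potential(deviated, f)
                if dc != dp:
                    return False
    return True

# Hypothetical instance: resources are indexed 0 and 1.
f = [lambda k: 2 * k + 1, lambda k: k * k]
n = 3
print(check_exact_potential(f, n))
```

The check covers every profile, every player and every alternative resource, which is exactly the set of pairs $(\pi^1, \pi^2)$ considered in the proof.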

Example: Traffic routing

[Figure: the same two-path network from A to B, road (15 + 0.1n) and ferry (40) in either order.]

Are there pure NE?
Suppose 100 players choose the North path and 100 choose the South path. The travel cost of each player is
$$c_i(\gamma^{(i)}, \gamma^{(-i)}) = 40 + 15 + 0.1 \cdot \frac{200}{2} = 65 \text{ minutes}.$$
Can you improve the outcome by unilaterally deviating from the NE?

Example: Braess paradox

[Figure: the two-path network from A to B with an additional bridge (0 minutes) connecting the end of one road to the start of the other.]

Assume a bridge is built to help reduce traffic.
It takes no time to cross the bridge, which makes it possible to go from city A to city B without taking the ferry.
New Nash equilibrium: all travellers avoid the ferry.
$$c_i(\gamma^{(1)}, \ldots, \gamma^{(N)}) = 2\,(15 + 0.1 \cdot 200) = 70 \text{ minutes}.$$
Can you improve your outcome by unilaterally deviating from the NE?
No, road + ferry now takes $40 + 15 + 0.1 \cdot 200 = 75$ minutes!
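The travel times quoted in the traffic examples can be reproduced with a few lines of arithmetic. This is only a numerical check; the helper `road` and the variable names are illustrative, and the road delay is written as `n / 10` to keep the floating-point values tidy.

```python
N = 200      # travellers
FERRY = 40   # minutes, independent of the load

def road(n):
    """Road travel time in minutes with n cars on that road."""
    return 15 + n / 10

# Without the bridge: 100 travellers on each path.
cost_split = road(100) + FERRY
# One traveller switches path and becomes the 101st car on the other road.
cost_deviate_split = road(101) + FERRY

# With the bridge: everyone drives road -> bridge -> road, 200 cars per road.
cost_bridge = road(N) + 0 + road(N)
# One traveller takes road + ferry instead; the road it uses still has 200 cars.
cost_deviate_bridge = road(N) + FERRY

print(cost_split, cost_deviate_split, cost_bridge, cost_deviate_bridge)
```

In both situations the deviation is slower than staying put (65.1 against 65 minutes without the bridge, 75 against 70 minutes with it), so neither profile can be improved unilaterally.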

Example: Braess paradox

[Figure: left panel, the network without the bridge, $c_i^{NE} = 65$ minutes; right panel, the network with the bridge (0 minutes), $c_i^{NE} = 70$ minutes.]

With the new link in the transportation graph:
- the original choice (road + ferry) is still present,
- the new link is intensively used,
- all agents experience a higher cost!

"What if They Closed 42d Street and Nobody Noticed?" (25 December 1990)

On Earth Day this year, New York City's Transportation Commissioner decided to close 42d Street, which as every New Yorker knows is always congested. [...] But to everyone's surprise, Earth Day generated no historic traffic jam. Traffic flow actually improved when 42d Street was closed.

There are many other real-life cases in road traffic, data networks, etc.

Social welfare

Welfare function
In an $n$-person game, let $x_i \in X_i$ be the (possibly mixed) strategy played by agent $i$.
Let $x \in X := X_1 \times X_2 \times \cdots \times X_n$ be the system-wide strategy.
A welfare cost $W : X \to \mathbb{R}$ measures the efficiency of each joint strategy in terms of the social cost for the population of agents.
Let $c_i(x)$ be the individual cost that player $i$ wants to minimize. For example:
- $W(x) = \sum_i c_i(x)$
- $W(x) = \sum_i \log c_i(x)$
- $W(x) = \max_i c_i(x)$
Different meanings: think of income.

Price of Anarchy

The Price of Anarchy is defined as the ratio
$$\text{PoA} := \frac{\max_{x \in X_{NE}} W(x)}{\min_{x \in X} W(x)},$$
where $X$ is the set of all possible strategies for all agents, while $X_{NE}$ is the set of all strategies which are NE.

In the Braess paradox example, assume $W = \sum_{i=1}^{N} c_i(x)$. Then
$$\text{PoA} = \frac{70}{65} \approx 108\%.$$

Theorem [5]
When the delay functions are affine for every edge, $\text{PoA} \leq \frac{3 + \sqrt{5}}{2}$.

Outlook

- The smaller the PoA, the better the quality of the Nash equilibria.
- Every finite potential game is isomorphic to a congestion game [4].
- Many different types of learning dynamics can be shown to converge to a Nash equilibrium in potential games [3, 2].
- Every finite game can be decomposed into a potential game and a harmonic game [1].
- Next lecture: Games with an infinite number of pure strategies or continuous pure strategy sets.

References I

[1] Ozan Candogan, Ishai Menache, Asuman Ozdaglar, and Pablo A. Parrilo. Flows and decompositions of games: Harmonic and potential games. Mathematics of Operations Research, 36(3):474-503, 2011.
[2] Jason R. Marden, Gürdal Arslan, and Jeff S. Shamma. Joint strategy fictitious play with inertia for potential games. IEEE Transactions on Automatic Control, 54(2):208-220, 2009.
[3] Dov Monderer and Lloyd S. Shapley. Fictitious play property for games with identical interests. Journal of Economic Theory, 68(1):258-265, 1996.
[4] Dov Monderer and Lloyd S. Shapley. Potential games. Games and Economic Behavior, 14(1):124-143, 1996.

References II

[5] Tim Roughgarden. Selfish routing. PhD thesis, Cornell University, 2002.