Game Theory. School on Systems and Control, IIT Kanpur. Ankur A. Kulkarni


Game Theory
School on Systems and Control, IIT Kanpur

Ankur A. Kulkarni
Systems and Control Engineering
Indian Institute of Technology Bombay
kulkarni.ankur@iitb.ac.in

Aug 7, 2015

Ankur A. Kulkarni (IIT Bombay) Game Theory Aug 7, 2015 1 / 42

Game theory: examples
Prisoner's dilemma

Consider two prisoners A and B, each confined in a solitary room, who are given the choice either to testify or to remain silent, with the following consequences:
- If A and B both testify, each of them serves 2 years in prison.
- If one of them opts to remain silent but the other testifies, then the prisoner who testified is set free and the one who remained silent serves 3 years in prison.
- If both of them remain silent, then each of them serves 1 year in prison.

             silent    testify
  silent     (1,1)     (3,0)
  testify    (0,3)     (2,2)

Entries are (row, column) payoffs; both players are minimizing.

Question: What will each player do? What must each player do?

Main observations

- Each player is faced with an optimization problem.
- Decisions have to be made without knowledge of the other's decision.
- But the outcome depends not only on what one player does, but also on what the other does.

Game theory: examples
Stag hunt / hunter's dilemma

Two hunters can each choose to hunt either deer or rabbits, with the following rules:
- If A and B both choose to hunt deer, each of them gets 2.
- If one of them opts to hunt deer but the other chooses to hunt rabbits, then the hunter who hunts rabbits gets 1, while the one who went for deer gets nothing!
- If both of them choose rabbits, each of them receives half a rabbit.

             deer      rabbit
  deer       (2,2)     (0,1)
  rabbit     (1,0)     (0.5,0.5)

Entries are (row, column) payoffs; both players are maximizing.

Question: What will each player do? What must each player do?

What is Game Theory?

When two or more rational decision makers (agents) interact, one would like to have a mathematical framework for reasoning about such situations, under certain assumptions. Game theory is the study of such interactions involving strategic decision making.

A game comprises:
- A set of players {1, 2, ..., N}.
- For each player i, a set of strategies S_i.
- For each player i, an objective function π_i : S → R that he is trying to minimize, or a payoff/utility function that he is trying to maximize, where S = S_1 × ... × S_N.

What are the players, strategies and utility functions in the games above?

What is Game Theory and what it is not

What it is:
- Game theory develops a framework for logically reasoning about strategic interactions.
- It attempts to answer what one can say would be the logical outcomes of a game.
- It develops and studies solution concepts: concepts that can logically be regarded as outcomes of games.
- The point of view taken is that of an observer of the game.
- It can be used for predicting outcomes; in many applications it has provided surprisingly good predictions.
- It can be used for tactical/strategic advice and prescriptions, and to explain, alter or induce behaviour.

What it is not:
- A theory of human behaviour, psychology, emotion, trust, etc.
- A secret code to win games.

Classification of games

The first main axis along which games are classified is the scope for communication. Two broad categories:
- Cooperative games: any amount of communication is allowed between the players involved in the game, and players can have binding agreements between them.
- Noncooperative games: no communication between the players involved, and no binding agreements between the players.

Further classifications:
- Nature of payoff: zero-sum games vs. nonzero-sum games.
- Time and information: static games (decisions made in one shot, without knowledge of the decisions of other players) vs. dynamic games (decisions made sequentially, with some knowledge of the decisions of other players).
- Quality of information: imperfect information, incomplete information, asymmetric information.

Game theory in control theory

Game theory covers competitive scenarios as well as cooperative scenarios. Team problems are games where all players have identical objectives.

Competitive:
- Security: the design of engineering systems, e.g., the power grid, now has to consider the additional aspect of security in addition to the usual considerations such as stability and reliability. Game theory provides a natural framework.
- Engineering and economics: the design of engineering systems, e.g., the internet, is now intertwined with their economics. Again, game theory applies.

Cooperative:
- Decentralized control problems can be thought of as team problems.
- Multiagent, distributed systems are becoming common. These can be modeled as teams where the individual components are players.

How does one apply game theory

Game theory applies whenever a strategic decision is involved, i.e., when there are multiple agents and the payoffs of the agents are affected not just by their own decisions but also by those of others. Game theory, like optimization, is an omnibus theory with specialized tools that have been sharpened for various situations.

Data requirements:
- Game theory presumes players know the rules of the game, their own options, which players are in the game and their options, and the payoffs of all players.
- Game theory assumes each player is rational, i.e., consistent in seeking the highest payoff.
- Often the conclusions drawn from game theory are based only on the ordinal relationships between payoffs, not their exact values.
- Game theory requires knowledge of the time evolution of information, i.e., who would know what in various situations.

How does one apply game theory

Process of applying game theory:
- Identify the players, the decisions, the payoffs and the rules.
- Can agreements on cooperation be enforced? Yes: cooperative; no: noncooperative.
- Can the rules be altered? If so, one is really playing a pregame before the actual game, in which the rules of the actual game are the strategies.
- Are decisions sequential or simultaneous? Simultaneous (e.g., the prisoner's dilemma) ⇒ static game theory. In sequential games one has additional information before making a move and has to look ahead in order to act now ⇒ dynamic games.
- Are objectives in conflict, or is there some commonality of interests? Diametrically opposite interests ⇒ zero-sum games; some commonality ⇒ non-zero-sum games.
- Is the game played once, or repeatedly (repeated games)?
- Do players have full or partial information? ⇒ static or dynamic games.
- Add as much contextual information as possible.
- Select the appropriate kind of game and analyze it using the tools available for it.

Analyzing noncooperative games: Nash equilibrium

Noncooperative games:
- Players have no scope for communication or discussion.
- There is no mechanism available for letting the players enter into binding agreements.
- Decisions are made based on the existing incentives only.

Notation:
- x_i ∈ S_i: strategy of player i.
- x = (x_1, ..., x_N): strategy profile.
- x_{-i} = (x_1, ..., x_{i-1}, x_{i+1}, ..., x_N): strategies of all players except i.
- (x̄_i, x_{-i}) = (x_1, ..., x_{i-1}, x̄_i, x_{i+1}, ..., x_N).

Nash equilibrium:
A profile of strategies x* = (x*_1, ..., x*_N) ∈ S such that no player has an incentive to deviate unilaterally, i.e., for all i = 1, ..., N (players minimizing),

    π_i(x*) ≤ π_i(x_i, x*_{-i})   for all x_i ∈ S_i.
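For finite games, the definition above can be checked mechanically: enumerate every strategy profile and test whether some player can gain by a unilateral deviation. A minimal sketch in Python (not from the slides; function and variable names are my own), applied to the prisoner's dilemma (minimizing) and the stag hunt (maximizing):

```python
from itertools import product

def pure_nash(payoffs, strategies, minimizing=True):
    """Enumerate pure-strategy Nash equilibria of a 2-player game.

    payoffs[(r, c)] = (payoff to row player, payoff to column player).
    """
    better = (lambda a, b: a < b) if minimizing else (lambda a, b: a > b)
    equilibria = []
    for r, c in product(strategies[0], strategies[1]):
        # Can the row player improve by a unilateral deviation?
        row_dev = any(better(payoffs[(r2, c)][0], payoffs[(r, c)][0])
                      for r2 in strategies[0])
        # Can the column player improve by a unilateral deviation?
        col_dev = any(better(payoffs[(r, c2)][1], payoffs[(r, c)][1])
                      for c2 in strategies[1])
        if not (row_dev or col_dev):
            equilibria.append((r, c))
    return equilibria

# Prisoner's dilemma: years in prison, both minimizing
pd = {("silent", "silent"): (1, 1), ("silent", "testify"): (3, 0),
      ("testify", "silent"): (0, 3), ("testify", "testify"): (2, 2)}
print(pure_nash(pd, (["silent", "testify"], ["silent", "testify"])))
# -> [('testify', 'testify')]

# Stag hunt: both maximizing
sh = {("deer", "deer"): (2, 2), ("deer", "rabbit"): (0, 1),
      ("rabbit", "deer"): (1, 0), ("rabbit", "rabbit"): (0.5, 0.5)}
print(pure_nash(sh, (["deer", "rabbit"], ["deer", "rabbit"]), minimizing=False))
# -> [('deer', 'deer'), ('rabbit', 'rabbit')]
```

This recovers (testify, testify) as the unique pure equilibrium of the prisoner's dilemma and the two equilibria of the stag hunt.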

Nash equilibrium

- It is a profile of strategies x* such that if any player i unilaterally deviates from it, he is worse off.
- Note that one does not take into account the effect of this deviation on the other players' strategies: the strategies of the other players are held fixed.

Nash equilibrium for the prisoner's dilemma: (testify, testify).

Why does the Nash equilibrium make sense?
- Noncooperative means no communication is allowed.
- If the prisoners could discuss and enter into binding agreements, they would agree to both stay silent.
- In the absence of a binding agreement, some player has an incentive to deviate from (silent, silent), or from any other profile except (testify, testify).

Nash equilibria for the hunter's dilemma: (deer, deer) and (rabbit, rabbit).

Nash equilibrium

Justifications and interpretations:
- Stability: a point that is not stable against unilateral deviations cannot be regarded as an outcome.
- Self-fulfilling agreement: if the players could communicate and decided to play Nash, the decision would hold, since no player would have an incentive to deviate.
- Settling point of an adjustment process.
- Nature: Nash equilibria are seen in nature, e.g., in evolutionary biology.
- Can be deduced, in some cases.

Caveats:
- A Nash equilibrium cannot be derived. It is a concept; it can only be defined.
- The Nash equilibrium as described applies to static games too; there is really no adjustment process involved.
- A Nash equilibrium may not always exist, and if it exists, it may not be unique.

Finding a Nash equilibrium:
- Systematically check all the possible outcomes.
- Computational schemes (more later).

Example: Nash-Cournot equilibrium

- N firms decide production levels q_1, ..., q_N of the same good, q_i ∈ [0, ∞). The cost of firm i is c_i(q_i).
- The market clearing price p(q) is not exogenous but depends on q = (q_1, ..., q_N).
- Each firm wants to maximize its profit u_i(q_1, ..., q_N) = p(q) q_i − c_i(q_i), i.e., it faces the optimization problem

      max  p(q) q_i − c_i(q_i)   subject to  q_i ≥ 0.

- Suppose p(q) = p(Σ_i q_i). Then the KKT conditions give, for all i,

      p′(q*) q*_i + p(q*) − c′_i(q*_i) + µ_i = 0,   µ_i ≥ 0,   q*_i ≥ 0,   µ_i q*_i = 0.

- A NE requires a simultaneous solution of the N KKT systems.
- If p(q) = 1 − Σ_i q_i and c_i(q_i) = c q_i (0 ≤ c ≤ 1), and N = 2, then q*_1 = q*_2 = (1 − c)/3.
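For the linear-demand duopoly above, the equilibrium can also be found numerically by iterating best responses. A minimal sketch (not from the slides): the first-order condition 1 − 2q_i − q_{-i} − c = 0 gives the best response q_i = (1 − c − q_{-i})/2, and iterating it converges to the equilibrium (1 − c)/3:

```python
def cournot_duopoly(c, iterations=100):
    """Best-response iteration for the 2-firm Cournot game with
    p(q) = 1 - q1 - q2 and cost c*q_i. The best response to the rival's
    output q_other is (1 - c - q_other)/2, from the first-order condition."""
    q1 = q2 = 0.0
    for _ in range(iterations):
        q1 = max(0.0, (1.0 - c - q2) / 2.0)
        q2 = max(0.0, (1.0 - c - q1) / 2.0)
    return q1, q2

c = 0.1
q1, q2 = cournot_duopoly(c)
print(q1, q2)  # both converge to (1 - c)/3 = 0.3
```

Best-response dynamics contract here (each update halves the distance to the best response), so the iteration settles at the unique Nash-Cournot equilibrium.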

Reaction curves

Let U_i be the set of pure strategies of player P_i, and let J_i(u_i, u_{-i}) be the objective of P_i when P_i plays u_i and the other players play u_{-i} (players minimizing). The best response, or reaction curve, of player i against u_{-i} is

    R_i(u_{-i}) = argmin_{u_i ∈ U_i} J_i(u_i, u_{-i}).

For two-player games, the NE are the intersections of the reaction curves.

Refinements of the Nash equilibrium

The Nash equilibrium is often non-unique, and depending on the situation some NE can be more meaningful than others.

Security dilemma:
Consider the game in which the USA and the USSR have to decide whether or not to have nuclear weapons. The payoff matrix is (both maximizing, rows USSR, columns USA):

              USA: Yes   USA: No
  USSR: Yes   (2,2)      (3,-1)
  USSR: No    (-1,3)     (4,4)

There are two Nash equilibria: (Yes, Yes) with payoffs (2, 2) and (No, No) with payoffs (4, 4). In terms of payoff, playing (No, No) is preferable for both players; such a NE is Pareto dominant. But (No, No) is also risky, since if the other player switches to Yes, the player playing No gets -1. Thus (Yes, Yes) is a safer equilibrium; it is a risk-dominant Nash equilibrium.

Refinements of the Nash equilibrium

Refinements of the NE:
- A refinement of the NE is a particular NE with additional properties.
- Many refinements of the NE are known, e.g., the trembling-hand perfect equilibrium, the proper equilibrium, the subgame perfect equilibrium, etc.

Applying the Nash equilibrium:
- To get good predictions/prescriptions, it may not be enough to consider only the NE. To choose among the many possible NEs, one must apply the right contextual refinement.
- Some Nash equilibria may be mathematical quirks of the data and have no meaning per se.
- It may be worth perturbing the data to see which equilibria survive.

Zero-sum games

A two-player game with objectives π_1, π_2 is a zero-sum game if π_1 = −π_2. Note: only two players! The objectives are polar opposites of each other. For finitely many strategies, we represent the game by a single matrix, say A: rows are the strategies of one player, columns those of the other:

    (a, −a)  (c, −c)        a  c
    (b, −b)  (d, −d)   =    b  d

The row player tries to minimize the value in the matrix; the column player tries to maximize it.

Security strategy:
- Row i* is a security strategy for the row player if

      V̄(A) = max_j a_{i*j} ≤ max_j a_{ij}   for all i.

- Column j* is a security strategy for the column player if

      V_(A) = min_i a_{ij*} ≥ min_i a_{ij}   for all j.

Here V̄(A) = min_i max_j a_{ij} is the upper value and V_(A) = max_j min_i a_{ij} is the lower value.
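The upper and lower values, and the corresponding security strategies, are straightforward to compute. A minimal sketch (not from the slides; the 2x2 matrix is a hypothetical example of my own, chosen so the two values differ):

```python
def upper_value(A):
    """Upper value V̄(A) = min over rows of (max over columns), together with
    the minimizing row: a security strategy for the (minimizing) row player."""
    worst = [max(row) for row in A]          # worst case of each row choice
    v_up = min(worst)
    return v_up, worst.index(v_up)

def lower_value(A):
    """Lower value V_(A) = max over columns of (min over rows), together with
    the maximizing column: a security strategy for the column player."""
    ncols = len(A[0])
    worst = [min(A[i][j] for i in range(len(A))) for j in range(ncols)]
    v_lo = max(worst)
    return v_lo, worst.index(v_lo)

# Hypothetical zero-sum game (row player minimizes, column player maximizes)
A = [[3, 1],
     [0, 2]]
v_up, i_sec = upper_value(A)   # V̄(A) = 2, secured by row 1
v_lo, j_sec = lower_value(A)   # V_(A) = 1, secured by column 1
print(v_up, v_lo)              # 2 1: V_(A) < V̄(A), no saddle point in pure strategies
```

Playing a security strategy guarantees the row player pays at most V̄(A) and the column player receives at least V_(A), whatever the opponent does.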

Saddle point

For any game A, V_(A) ≤ V̄(A), but the inequality is often strict. Example (what are V̄(A) and V_(A)?):

          P2
  P1    0  1  2  1
        1  3  3  2
        2  2  0  1

- A pair of strategies (i*, j*) is a saddle point if a_{i*,j*} = V̄(A) = V_(A).
- At a saddle point: a_{i*,j} ≤ a_{i*,j*} ≤ a_{i,j*} for all i, j.
- In short, a saddle point is a Nash equilibrium of a zero-sum game.
- A zero-sum game need not have a saddle point.
- If there is a saddle point, then all saddle points have the same value, and if (i_1, j_1) and (i_2, j_2) are saddle points, then (i_1, j_2) and (i_2, j_1) are also saddle points.
- Hence, in a zero-sum game, players need not coordinate on a particular equilibrium. This interchangeability fails outside zero-sum games, e.g. (both maximizing):

  Husband's payoffs:              Wife's payoffs:
            cricket  movies                 cricket  movies
  cricket     2        0          cricket     1        0
  movies      0        1          movies      0        2
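A pure saddle point can be found mechanically: with a minimizing row player, the inequalities a_{i*,j} ≤ a_{i*,j*} ≤ a_{i,j*} say that the entry is the largest in its row and the smallest in its column. A minimal sketch (not from the slides; the example matrix is a hypothetical one of my own):

```python
def saddle_points(A):
    """All pure saddle points (i, j) of a zero-sum matrix game in which the
    row player minimizes and the column player maximizes: a_ij must be the
    largest entry in its row and the smallest entry in its column."""
    pts = []
    for i, row in enumerate(A):
        for j, a in enumerate(row):
            col = [A[k][j] for k in range(len(A))]
            if a == max(row) and a == min(col):
                pts.append((i, j))
    return pts

# Hypothetical example: V_(A) = V̄(A) = 4, attained at (0, 0)
A = [[4, 2, 3],
     [5, 1, 6]]
print(saddle_points(A))  # -> [(0, 0)]
```

On a matrix with V_(A) < V̄(A), the same function returns an empty list: no pure saddle point exists.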

Mixed strategies

Rather than letting players pick specific rows/columns, we now let them choose rows/columns randomly. Strategies are now the probabilities, in other words, the distributions.

- A mixed strategy for the row player is a probability distribution y over the set of rows of A ∈ R^{m×n}, i.e., y ∈ R^m such that y ≥ 0 and 1ᵀy = 1.
- Similarly, a mixed strategy for the column player is a probability distribution over the set of columns of A ∈ R^{m×n}, i.e., z ∈ R^n such that z ≥ 0 and 1ᵀz = 1.
- The column player chooses z to maximize Σ_{i,j} a_{ij} y_i z_j = yᵀAz; the row player chooses y to minimize yᵀAz.
- The rows/columns themselves are called pure strategies.

Interpretations: deliberate randomization; division of resources; diversification of a portfolio.

Minimax theorem

Although there may be no saddle point in pure strategies, there is always a saddle point if one allows mixed strategies: there exist y*, z* such that

    (y*)ᵀAz ≤ (y*)ᵀAz* ≤ yᵀAz*   for all y, z,

and

    (y*)ᵀAz* = min_y max_z yᵀAz = max_z min_y yᵀAz.

- All saddle points have the same value.
- Saddle points are comprised of security strategies.
- If (y¹, z¹) and (y², z²) are saddle points, then so are (y¹, z²) and (y², z¹).
- The minimax theorem is a major milestone in the theory of games.
- It was later observed that it is equivalent to linear programming duality, which came many years later.
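The theorem can be checked numerically on the classic game of matching pennies, which has no pure saddle point (V_(A) = -1 < 1 = V̄(A)) but has the mixed saddle point y* = z* = (1/2, 1/2) with value 0. A minimal sketch (not from the slides), verifying the saddle inequalities on a grid of mixed strategies:

```python
def payoff(y, A, z):
    """Expected payoff y^T A z for a 2x2 matrix game."""
    return sum(y[i] * A[i][j] * z[j] for i in range(2) for j in range(2))

# Matching pennies (row player minimizes, column player maximizes):
# no pure saddle point, but y* = z* = (1/2, 1/2) is a mixed saddle point.
A = [[1, -1],
     [-1, 1]]
ystar = zstar = (0.5, 0.5)
v = payoff(ystar, A, zstar)   # value of the game: 0

# Check the saddle inequalities  y*^T A z <= y*^T A z* <= y^T A z*
# on a grid of mixed strategies for each player.
grid = [(p / 100, 1 - p / 100) for p in range(101)]
assert all(payoff(ystar, A, z) <= v + 1e-12 for z in grid)
assert all(v <= payoff(y, A, zstar) + 1e-12 for y in grid)
print(v)  # -> 0.0
```

Here the inequalities in fact hold with equality: the uniform strategy equalizes the payoff against every opponent strategy, which is exactly what makes it a saddle point.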

Non-zero-sum games

- Let N = {1, 2, ..., N} be the set of players, and let M_i be the finite set of pure strategies of each player i ∈ N.
- a^i_{x_1, x_2, ..., x_N} = payoff to player i, denoted P_i, when the strategies chosen by P_1, P_2, ..., P_N are x_1, x_2, ..., x_N respectively.
- Y_i: the set of probability distributions on M_i, equivalently the mixed strategies of P_i.
- Player i chooses y^i to minimize

      π_i(y^1, ..., y^N) = Σ_{j_1 ∈ M_1, j_2 ∈ M_2, ..., j_N ∈ M_N} a^i_{j_1, ..., j_N} y^1_{j_1} ... y^N_{j_N}.

- Although there may not be a Nash equilibrium in pure strategies, there always is a Nash equilibrium in mixed strategies.
- This is a significant generalization of von Neumann's minimax theorem.

Continuous kernel games

Results on the existence of Nash equilibria can be extended beyond matrix games and mixed strategies. In continuous kernel games, players have a continuum of strategies, e.g., the velocity of a robot, the price of a commodity, etc. A mixed strategy in such games is a probability measure on the space of pure strategies.

Theorem (existence of equilibria in pure strategies). Let S_i ⊆ R^{m_i} be convex and compact for each i ∈ N. For each i ∈ N, let u_i : Π_{j ∈ N} S_j → R be continuous and such that u_i(x_i, x_{-i}) is convex in x_i for each fixed x_{-i}. Then there exists a Nash equilibrium (in pure strategies).

Theorem (existence of equilibria in mixed strategies). Let S_i ⊆ R^{m_i} be compact (not necessarily convex) for each i ∈ N and let each u_i be continuous. Then there exists an equilibrium in mixed strategies.

Further extensions: S_i compact Hausdorff spaces (Glicksberg, 1952).

Dynamic games

All the above games were simultaneous-move games.

Example: P_1 makes the first move, starting at node x. If P_1 plays L_1, the game ends. Else, if P_1 plays R_1, P_2 has to move next at node y.

[Figure: game tree. P_1 moves at node x: L_1 ends the game with payoff (0, 2); R_1 leads to node y, where P_2 chooses L_2 with payoff (−1, −1) or R_2 with payoff (1, 1). Payoffs are (P_1, P_2), both maximizing.]

What are the strategies?
- For P_1: L_1 or R_1.
- For P_2: nothing to do if P_1 has played L_1; L_2 or R_2 if P_1 has played R_1.
- Strategies of P_2 are a function of what has happened previously.

What are the Nash equilibria?
- This game has two Nash equilibria: (L_1, L_2) and (R_1, R_2).
- (L_1, L_2) is a threat equilibrium; (R_1, R_2) is subgame perfect (more later).
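The subgame perfect equilibrium of this two-stage example can be computed by backward induction: solve P_2's problem at node y first, then let P_1 anticipate it. A minimal sketch (not from the slides; the payoffs are my reading of the garbled figure: L_1 gives (0, 2), R_1 then L_2 gives (−1, −1), R_1 then R_2 gives (1, 1), both players maximizing):

```python
# Backward induction on the two-stage game sketched above (payoffs assumed
# as stated in the lead-in; entries are (P1's payoff, P2's payoff)).

def backward_induction():
    # Stage 2: P2's optimal choice at node y maximizes P2's payoff.
    subgame = {"L2": (-1, -1), "R2": (1, 1)}
    p2_move = max(subgame, key=lambda m: subgame[m][1])    # picks "R2"
    # Stage 1: P1 anticipates P2's choice and compares outcomes.
    outcomes = {"L1": (0, 2), "R1": subgame[p2_move]}
    p1_move = max(outcomes, key=lambda m: outcomes[m][0])  # picks "R1"
    return p1_move, p2_move, outcomes[p1_move]

print(backward_induction())  # -> ('R1', 'R2', (1, 1))
```

Backward induction recovers only the subgame perfect equilibrium (R_1, R_2); the threat equilibrium (L_1, L_2) appears only when the game is analyzed in its normal form.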

Dynamic games

Now suppose P_1 has three moves L, M, R, and P_2 knows whether P_1 has played L or not, but cannot distinguish between M and R.

- The set of possible strategies of P_1 is {L, M, R}.
- Strategies of P_2 are functions of the events "L" and "L^C". Since P_2 cannot distinguish between M and R, his strategy must be the same in both of those cases. P_2 thus has four strategies:

  γ²₁ = { L if P_1 plays L;  L if P_1 plays M or R }
  γ²₂ = { L if P_1 plays L;  R if P_1 plays M or R }
  γ²₃ = { R if P_1 plays L;  R if P_1 plays M or R }
  γ²₄ = { R if P_1 plays L;  L if P_1 plays M or R }

Extensive form dynamic games

An extensive-form dynamic game for N players is a tree with the following properties:
- A specific vertex indicating the starting point
- A payoff for each player at each terminal node: J^1(node), ..., J^N(node)
- A partition of the nodes of the tree into N player sets
- A subpartition of each player set into information sets {η^i_j}, i ∈ N, such that the same number of branches emanates from each node belonging to the same information set, and no node follows another node in the same information set

[Figure: example game tree with moves labelled a–o distributed among players P1–P4, illustrating information sets.]

Strategies in extensive form dynamic games

In a dynamic game, strategies are not merely actions; they are a complete plan of actions for each possible scenario. Thus they are functions of the information.

Denote by I^i the set of all information sets of P_i. For any η^i ∈ I^i, let U^i_{η^i} be the set of actions available to P_i at η^i. A strategy for P_i is a function γ^i : I^i → U^i, where U^i = ∪_{η^i ∈ I^i} U^i_{η^i}, such that γ^i(η^i) ∈ U^i_{η^i} for all η^i ∈ I^i.

Optimization vs optimal control: this is exactly the distinction between optimization and optimal control. In optimization the problem is to find a vector. In optimal control, the problem is to find a function, or a control law.

The sets I^i, i ∈ N, determine the information structure of the game.

Nash equilibrium and the normal form

Each choice of strategies (functions γ^1, ..., γ^N) results in a specific path on the tree and hence a specific payoff for each player (why?). Denote this as J^i(γ^1, ..., γ^N).

A dynamic game can thus be considered a static game, but in the space of γ's.

Nash equilibrium: γ^{1*}, ..., γ^{N*} such that for all i ∈ N,

J^i(γ^{1*}, ..., γ^{N*}) ≤ J^i(γ^{1*}, ..., γ^{(i−1)*}, γ^i, γ^{(i+1)*}, ..., γ^{N*})  for all γ^i

(with ≥ in place of ≤ when J^i is a payoff to be maximized rather than a cost to be minimized).

In an extensive form game one can write an equivalent finite strategy game where the strategies are the γ's (why?). This is called the normal form. The Nash equilibrium above is that of this normal form.
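For a finite game, the normal form can be searched exhaustively. A sketch for the two-node example from earlier (players maximize payoffs; P2's strategy is the action he would take at node y, which is irrelevant when P1 plays L1):

```python
from itertools import product

# Normal form of the earlier example.
payoffs = {
    ("L1", "L2"): (0, 2), ("L1", "R2"): (0, 2),    # game ends at z regardless of P2
    ("R1", "L2"): (-1, 1), ("R1", "R2"): (1, 1),
}
S1, S2 = ["L1", "R1"], ["L2", "R2"]

def is_nash(s1, s2):
    u1, u2 = payoffs[(s1, s2)]
    # No unilateral deviation raises the deviator's payoff.
    no_dev_1 = all(payoffs[(t1, s2)][0] <= u1 for t1 in S1)
    no_dev_2 = all(payoffs[(s1, t2)][1] <= u2 for t2 in S2)
    return no_dev_1 and no_dev_2

equilibria = [(s1, s2) for s1, s2 in product(S1, S2) if is_nash(s1, s2)]
print(equilibria)  # [('L1', 'L2'), ('R1', 'R2')]
```

This recovers both equilibria of the example, including the threat equilibrium (L1, L2) that backward induction misses.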

Classification of games based on information structures

A Nash equilibrium of a dynamic game can be found using the normal form, but in practice this is difficult. The tree structure suggests that one should be able to use a recursive argument, something like dynamic programming. This is not always possible, since information sets may stretch across many branches: a player cannot argue recursively because he does not know which branch he is at. The ease of recursively solving a game depends on its information structure.

- In a simultaneous move or static game, each player has only one information set.
- In a game of perfect information, each node is in a different information set; equivalently, each information set is a singleton. In other words, each player knows the sequence of actions played at any point.
- In a single act game, each player can play at most once. Therefore, each path starting from the root node intersects the player set of each player at most once.

Perfect information and backward induction

We can find an equilibrium of a game with perfect information by backward induction (a multiplayer version of dynamic programming). Assume players are seeking to minimize.

At node 4 the optimal action for P3 is to pick L3; at node 5, R3; at node 6, R3; and at node 7, L3. We are left with a smaller game. Now at node 2, P2 will pick L2; at node 3, P2 will pick R2. At the root node, P1 will choose L1. Thus we are left with the strategy profile {L1, L2, L3}, corresponding to which the payoff is (1, 2, 3) for P1, P2 and P3; this profile is a Nash equilibrium.
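The tree for this example lives in a figure not reproduced here, so the sketch below runs backward induction on a small hypothetical two-player tree instead (the node labels and payoffs are illustrative, not from the slides); each player minimizes his own coordinate of the payoff vector:

```python
# A node is either ("leaf", (J1, J2)) or ("node", player_index, {action: child}).
tree = ("node", 0, {                                                    # P1 moves
    "L": ("node", 1, {"L": ("leaf", (1, 2)), "R": ("leaf", (3, 0))}),   # P2 moves
    "R": ("node", 1, {"L": ("leaf", (2, 1)), "R": ("leaf", (0, 3))}),   # P2 moves
})

def backward_induction(node):
    """Return (payoff vector, induced path), assuming every player minimizes."""
    if node[0] == "leaf":
        return node[1], []
    _, player, children = node
    best = None
    for action, child in children.items():
        payoffs, path = backward_induction(child)
        if best is None or payoffs[player] < best[0][player]:
            best = (payoffs, [action] + path)
    return best

payoffs, path = backward_induction(tree)
print(payoffs, path)  # (2, 1) ['R', 'L']
```

Because it solves every subtree, the profile this produces is subgame perfect by construction.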

Subgame perfection

Does the backward induction process capture all equilibria? No! It captures only those that are recursively rational, i.e., equilibria that are also equilibria for every subgame. These equilibria are called subgame perfect.

Example: P1 makes the first move, starting at node x. If P1 plays L1, the game ends. Else, if P1 plays R1, P2 has to move next at node y.

[Figure: the same tree as before — L1 ends the game at node z with payoff (0, 2); after R1, P2 plays L2 with payoff (−1, 1) or R2 with payoff (1, 1).]

The only reliable way to find all equilibria is to analyze the normal form. Finding equilibria for general games involves decomposing the tree into subgames that are either games of perfect information or static games.

Existence of Nash equilibria in extensive form games

Perfect information: a game with perfect information always admits a Nash equilibrium in pure strategies. More generally, very little can be said.

Mixed strategy equilibrium: any game in extensive form always admits a Nash equilibrium in mixed strategies (why?). A mixed strategy is a randomization over pure strategies.

Ideally we would like something more intuitive: at each information set, we would like to pick an action randomly. Such a strategy is called a behavioural strategy.

Kuhn's theorem: if each player in the game has perfect recall, there exists an equilibrium in behavioural strategies.

Dynamic games in state space form

Control theory allows two kinds of models of systems: input/output models and state space models. The extensive form is an input/output representation. The following is a state space representation.

State space model of a game:
- Players N = {1, ..., N}
- Discrete time k ∈ K = {1, ..., K}
- Action u^i_k ∈ U^i_k at time k
- State x_k ∈ X_k at time k, given an initial state x_0 and state dynamics x_{k+1} = f_k(x_k, u^1_k, ..., u^N_k)
- Observations y^i_k = h^i_k(x_k) for player i ∈ N at time k ∈ K
- Information η^i_k ⊆ {y^j_t : j ∈ N, t ≤ k} ∪ {u^j_t : j ∈ N, t ≤ k − 1}. Let I^i_k be the ambient space of η^i_k
- Strategies γ^i_k : I^i_k → U^i_k for each i ∈ N, k ∈ K, with γ^i = (γ^i_k)_{k ∈ K}. The space of such mappings is Γ^i
- Cost function J^i(γ^1, ..., γ^N) for each i ∈ N

Dynamic games in state space: Nash equilibrium and information structures

Once again, the Nash equilibrium is given by γ^{1*}, ..., γ^{N*} such that

J^i(γ^{1*}, ..., γ^{N*}) ≤ J^i(γ^i; γ^{−i*})  for all γ^i ∈ Γ^i.

As before, a choice of γ^1, ..., γ^N uniquely determines a state trajectory (akin to a path through the tree). Usually one also assumes that the cost is stage additive:

J^i(γ^1, ..., γ^N) = Σ_{k ∈ K} g^i_k(x_k, u^1_k, ..., u^N_k)

Again, whether we can apply dynamic programming or have to solve the problem in function space is determined by the information structure. One can also make the model more general by including a terminal state.
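That a strategy profile fixes the trajectory and the stage-additive costs can be seen by rolling the system forward. A sketch with illustrative scalar dynamics and feedback laws (all numerical values here are assumptions, not from the slides):

```python
# Scalar two-player example: x_{k+1} = x_k + u1_k + u2_k,
# stage cost g_i = x_k^2 + (u_i_k)^2, feedback strategies u_i = gamma_i(x_k).
K = 5
gamma = [lambda x: -0.3 * x,   # player 1's feedback law (illustrative)
         lambda x: -0.2 * x]   # player 2's feedback law (illustrative)

x = 2.0                        # initial state x_0
costs = [0.0, 0.0]
trajectory = [x]
for k in range(K):
    u = [g(x) for g in gamma]
    for i in range(2):
        costs[i] += x**2 + u[i]**2   # stage-additive cost accumulation
    x = x + u[0] + u[1]              # state dynamics f_k
    trajectory.append(x)

print(trajectory, costs)  # the profile (gamma_1, gamma_2) fixes both
```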

Examples of information structures and relation to control theory

- Open loop: η^i_k = {x_0} for all i, k
- Closed loop, perfect state: η^i_k = {x_0, ..., x_k}
- Closed loop, imperfect state: η^i_k = {y^i_1, ..., y^i_k}
- Feedback, perfect state: η^i_k = {x_k}
- Feedback, imperfect state: η^i_k = {y^i_k}

Optimal control: optimal control is a particular example of the above model with N = {1} and a feedback or closed loop information structure. When one seeks optimal feedback control laws, or output feedback, one is implicitly assuming the feedback or closed loop information structure.

Information structures can be much more complex than those above; whether one can do dynamic programming depends on the information structure.

Informational nonuniqueness of Nash equilibria

An open loop equilibrium strategy is a constant strategy. Thus open loop equilibria can be found by considering a static game (in the actions) obtained by back-substituting the state equation, e.g.,

u^i_k = γ^i_k(x_k) = γ^i_k(f_{k−1}(x_{k−1}, u^1_{k−1}, ..., u^N_{k−1})) = ...

In a deterministic game, since the future state is a deterministic function of past states and actions, any closed loop strategy has an equivalent open loop strategy that generates the same state trajectory. Consequently, in optimal control, strategies are generically informationally nonunique. Thus one speaks not of an optimal strategy but of an equivalence class of optimal strategies that generate the same trajectory. This issue also percolates to Nash equilibria. Generally, we have:

Informational nonuniqueness: G1 is informationally inferior to G2 if, for every player, at any stage, whatever the player knows in G1 he also knows in G2 at the same stage. Then any equilibrium of G1 is also an equilibrium of G2.
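The closed-loop/open-loop equivalence in the deterministic case can be checked directly: simulate a feedback law once, record the actions it generates, and replay them as an open loop sequence. A sketch with illustrative scalar dynamics (the system and gain are assumptions):

```python
# Deterministic scalar system x_{k+1} = x_k + u_k (illustrative choice).
def step(x, u):
    return x + u

x0, K = 4.0, 3

# Closed loop: feedback law u_k = -0.5 * x_k.
x, traj_cl, actions = x0, [x0], []
for _ in range(K):
    u = -0.5 * x           # feedback strategy, a function of the state
    actions.append(u)
    x = step(x, u)
    traj_cl.append(x)

# Open loop: replay the recorded actions as constants (function of x_0 only).
x, traj_ol = x0, [x0]
for u in actions:
    x = step(x, u)
    traj_ol.append(x)

print(traj_cl == traj_ol)  # True: same trajectory, different information
```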

Team problems and optimal control

A game is called a team problem if J^1 = J^2 = ... = J^N ≡ J. A team is only faced with optimization, and no competition.

γ^{1*}, ..., γ^{N*} is called team optimal if
J(γ^{1*}, ..., γ^{N*}) ≤ J(γ^1, ..., γ^N) for all γ^1, ..., γ^N.
This solution concept applies if the game is cooperative, i.e., players can agree on strategies beforehand.

γ^{1*}, ..., γ^{N*} is called person by person optimal if
J(γ^{1*}, ..., γ^{N*}) ≤ J(γ^i; γ^{−i*}) for all γ^i, for each i ∈ N.
This solution concept applies if the game is noncooperative.

Optimal control problems can be considered as team problems in two ways:
- A trivial team with N = {1}
- Each stage or time instant is a separate player that acts only once, i.e., N = K and U^i_k is a singleton (a trivial action) if i ≠ k

More generally, teams are distinct from optimal control. Team optimization cannot be reduced to ordinary optimization, since the information of each player is different.
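Person-by-person optimality is a weaker requirement than team optimality. A small illustrative example (the cost matrix is an assumption, not from the slides): two players with actions {0, 1} and a common cost J(a1, a2) that rewards coordination:

```python
from itertools import product

# Common cost for a two-player team; J[a1][a2].
J = [[0, 5],
     [5, 1]]
profiles = list(product([0, 1], repeat=2))

def person_by_person_optimal(a1, a2):
    # No unilateral change of action lowers the common cost.
    return (J[a1][a2] <= min(J[b][a2] for b in (0, 1)) and
            J[a1][a2] <= min(J[a1][b] for b in (0, 1)))

team_opt = min(profiles, key=lambda p: J[p[0]][p[1]])
pbp = [p for p in profiles if person_by_person_optimal(*p)]

print(team_opt)  # (0, 0), cost 0
print(pbp)       # [(0, 0), (1, 1)]
```

Here (1, 1) is person-by-person optimal (either unilateral deviation raises the cost from 1 to 5) yet not team optimal, which is why the two concepts must be distinguished.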

Concrete results: open loop and closed loop equilibria of LQ games

Open loop Nash equilibria can in principle be found by considering a static game (obtained by back-substitution). Generally this game becomes too complex. The other option is to use Pontryagin's minimum principle, which is also complex.¹

However, if the game is linear-quadratic, i.e.,

g^i_k(u_k, x_k) = (1/2) [ x_k' Q^i_k x_k + Σ_{j ∈ N} (u^j_k)' R^{ij}_k u^j_k ],   x_{k+1} = A_k x_k + Σ_{j ∈ N} B^j_k u^j_k,

with Q^i_k ⪰ 0 and R^{ii}_k ≻ 0, one can solve the conditions given by the minimum principle to show that equilibria exist and to find them. A similar method yields equilibria for the closed loop and feedback information structures. The structure of the closed loop equilibrium is closely related to the Riccati equation. Similar results exist for LQ games with infinite horizon.

¹ My student and I have developed a new way of attacking this question [AK15].
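For the feedback information structure, the Riccati-type recursions become coupled across players but can still be iterated backwards. The sketch below does this for a scalar two-player LQ game (all parameter values are illustrative assumptions; the general matrix recursions are in Başar and Olsder):

```python
# Scalar two-player LQ game, feedback information structure; player i minimizes
# J_i = sum_k (q_i x_k^2 + r_i u_{i,k}^2) + q_i x_K^2, x_{k+1} = a x + b1 u1 + b2 u2.
a, b1, b2 = 1.0, 1.0, 1.0
q1, q2, r1, r2 = 1.0, 1.0, 1.0, 1.0
K = 20

P1, P2 = q1, q2   # terminal cost-to-go coefficients: J_i from stage K is P_i x^2
gains = []        # (K1_k, K2_k) with u_i = -K_i x, computed backwards
for _ in range(K):
    # Stagewise first-order conditions form a 2x2 linear system in (K1, K2):
    #   (r1 + b1^2 P1) K1 + b1 b2 P1 K2 = a b1 P1
    #   b1 b2 P2 K1 + (r2 + b2^2 P2) K2 = a b2 P2
    m11, m12, c1 = r1 + b1 * b1 * P1, b1 * b2 * P1, a * b1 * P1
    m21, m22, c2 = b1 * b2 * P2, r2 + b2 * b2 * P2, a * b2 * P2
    det = m11 * m22 - m12 * m21
    K1 = (c1 * m22 - m12 * c2) / det     # Cramer's rule
    K2 = (m11 * c2 - c1 * m21) / det
    Acl = a - b1 * K1 - b2 * K2          # closed-loop dynamics
    P1 = q1 + r1 * K1 * K1 + P1 * Acl * Acl   # coupled Riccati-type update
    P2 = q2 + r2 * K2 * K2 + P2 * Acl * Acl
    gains.append((K1, K2))

gains.reverse()   # gains[k] now applies at stage k
print(gains[0], P1)  # equilibrium cost of player 1 from x0 is P1 * x0^2
```

With symmetric parameters the two players' gains coincide at every stage, a quick sanity check on the recursion.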

Stochastic teams

We now focus only on team problems; things are harder for games. Things get significantly more complex if there is noise involved:

x_{k+1} = f(x_k, u^1_k, ..., u^N_k, w_k)

The recursive back-substitution no longer works. Dynamic programming needs cascading conditional expectations, which work only if the information structure is nested. The only case we know how to solve reliably is when I_k ⊆ I_{k+1}. Outside this setting, even simple team problems remain unsolved.

Witsenhausen's counterexample [Wit68]:
x_0, w independent Gaussian;  x_1 = u_0 + w;  u_0 = γ_0(x_0), u_1 = γ_1(u_0 + w);
J(γ_0, γ_1) = E[ (u_0 − x_0)² + (u_0 − u_1)² ].
The optimal γ_0, γ_1 are unknown.
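Although the optimal strategies are unknown, any candidate pair (γ_0, γ_1) is easy to evaluate by Monte Carlo. The sketch below evaluates an illustrative linear pair — u_0 = λx_0 together with the matching minimum-mean-square estimator u_1 = (λ²/(λ²+1))(u_0 + w), for standard normal x_0, w — where the choice λ = 0.5 and the unit variances are assumptions:

```python
import random

random.seed(0)
lam = 0.5                       # illustrative linear gain for gamma_0
c = lam**2 / (lam**2 + 1.0)     # MMSE coefficient for estimating u0 from u0 + w

N = 200_000
total = 0.0
for _ in range(N):
    x0 = random.gauss(0.0, 1.0)
    w = random.gauss(0.0, 1.0)
    u0 = lam * x0               # gamma_0
    u1 = c * (u0 + w)           # gamma_1
    total += (u0 - x0) ** 2 + (u0 - u1) ** 2

J_mc = total / N
print(J_mc)  # close to the closed form (lam-1)^2 + lam^2/(lam^2+1) = 0.45
```

For linear γ_0 the cost has a closed form, which makes this a useful check; the open question is whether (and by how much) nonlinear strategies can do better.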

Interpretation of the Witsenhausen problem

- As a two-stage decision problem with finite memory: zero recall of past state and actions
- As a team problem: two players, a cooperative game, sequentially played, with imperfect communication between them
- As an engineering system: γ_0 is a sensor; γ_1 is a controller that is not co-located with the sensor and to which the sensor's signals are sent
- As an organization: γ_0 is a field agent; γ_1 is a supervisor to whom the agent reports
- As a communication system: γ_0, γ_1 are an encoder and a decoder, and w is the channel noise

Static information structure: a case we can solve

Static team: environmental randomness is ξ. Observations z^i = h^i(ξ) for each i ∈ N. Find u^i = γ^i(z^i) to minimize E[L(u^1, ..., u^N, ξ)].

The information structure is called static because the actions of a player do not affect the information of the other players. Closely related to the broadcast channel in communications.

Key structural result: if L is convex in u^1, ..., u^N and continuously differentiable, then γ is team optimal if and only if it is person by person optimal.

LQG teams (where h^i is linear, L is convex quadratic in u and linear in ξ, and ξ is Gaussian) can be solved easily.² First shown by Radner [Rad62].

² I have recently extended this to show that it is easy to find near-optimal strategies if ξ has a log-concave density [Kul15].

Dynamic information structure and information theory

The information structure is dynamic if the action of a player affects the information of another player: z^i = h^i(ξ, u^j). If u^j cannot be inferred from z^i, we have a situation where P_j affects what P_i can know, but P_i does not know what P_j knows.

Closely related to communication/information theory; information theory does not get us too far, though. In general, there is little clarity about what to do in these settings. More generally, complex networked settings like cyber-physical systems, smart grids, etc., all face this issue.

Additional reading on game theory

- Game Theory by Fudenberg and Tirole
- Game Theory by Myerson
- Game Theory by Maschler, Solan and Zamir
- Dynamic Noncooperative Game Theory by Başar and Olsder
- Stochastic Networked Control by Başar and Yuksel
- Works of H. Witsenhausen and Y. C. Ho

References

Mathew P. Abraham and Ankur A. Kulkarni. New results on existence of open loop Nash equilibria in discrete time dynamic games. To be submitted to IEEE Transactions on Automatic Control, 2015.

Ankur A. Kulkarni. Approximately optimal linear strategies for static teams with big non-Gaussian noise. Under review for the IEEE Conference on Decision and Control, 2015.

Roy Radner. Team decision problems. The Annals of Mathematical Statistics, pages 857–881, 1962.

H. S. Witsenhausen. A counterexample in stochastic optimum control. SIAM Journal on Control, 6:131, 1968.