Equilibria for games with asymmetric information: from guesswork to systematic evaluation
1 Equilibria for games with asymmetric information: from guesswork to systematic evaluation
Achilleas Anastasopoulos, EECS Department, University of Michigan
February 11, 2016
2 Joint work with Deepanshu Vasal (PhD student, graduating May 2016) and Prof. Vijay Subramanian
3 Decentralized decision making in dynamic systems
- Communication networks
- Sensor networks
- Social networks
- Queuing systems
- Energy markets
- Wireless resource sharing
- Repeated online advertisement auctions
- Competing sellers/buyers
4 Salient features
- Multiple agents (cooperative or strategic)
- Objective: maximize expected (social or self) reward
- Underlying system state (not perfectly observed)
- Agents make observations (asymmetric information) and take actions partially affecting the future state
5 Classification of problems
Symmetric information:
- Teams: Markov decision processes (MDP) or partially observed MDPs (POMDP)
- Games: subgame-perfect equilibrium (SPE), Markov-perfect equilibrium (MPE)
6 Classification of problems (continued)
Asymmetric information:
- Teams: common information approach [Nayyar, Mahajan, Teneketzis, 2013] (IEEE Control Theory Axelby paper award)
7 Classification of problems (complete picture)
- Teams, symmetric information: MDP or POMDP
- Games, symmetric information: subgame-perfect equilibrium (SPE), Markov-perfect equilibrium (MPE)
- Teams, asymmetric information: common information approach [Nayyar, Mahajan, Teneketzis, 2013] (IEEE Control Theory Axelby paper award)
- Games, asymmetric information: perfect Bayesian equilibrium (PBE), sequential equilibrium (SE) and refinements. No methodology!?
8 Model
- Discrete-time dynamical system with N strategic agents over a finite horizon T
- Player i privately observes her (static²) type X^i ∈ 𝒳^i, where P(X) = ∏_{i=1}^N Q^i(X^i) and X = (X^1, X^2, ..., X^N) ∈ 𝒳
- Player i takes action A^i_t ∈ 𝒜^i, which is publicly observed
- Player i's observations: private X^i; common A_{1:t-1} = (A_1, A_2, ..., A_{t-1}) = (A^j_k)_{j ∈ N, k ≤ t-1}
- Randomized actions A^i_t ~ σ^i_t(· | X^i, A_{1:t-1})
- Instantaneous reward R^i(X, A_t)
- Player i's objective: max_{σ^i} E^σ { Σ_{t=1}^T R^i(X, A_t) }
² Generalization to dynamic types is straightforward.
9 Concrete example: a public goods game³
- Two players each take an action to either contribute (A^i_t = 1) or not contribute (A^i_t = 0) to the production of a public good
- Player i's type (private information) is her cost of contributing: X^i ∈ {L, H}, where the X^i are i.i.d. with P(X^i = H) = q (assume 0 < L < 1 < H < 2)
- If either player contributes, the public good is produced and both players enjoy utility 1 (free riding is possible)
- Per-period rewards (R^1(X^1, A_t), R^2(X^2, A_t)):

                            contribute (A^2_t = 1)   don't contribute (A^2_t = 0)
  contribute (A^1_t = 1)    (1 - X^1, 1 - X^2)       (1 - X^1, 1)
  don't (A^1_t = 0)         (1, 1 - X^2)             (0, 0)

- Each player's action A^i_t ~ σ^i_t(· | X^i, A_{1:t-1})
³ Adapted from [Fudenberg and Tirole, 1991, Example 8.3]
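The per-period rewards above can be written as a single function. A minimal sketch; the numeric values of L and H are assumptions, chosen only to satisfy 0 < L < 1 < H < 2:

```python
# Hedged sketch of the public goods stage rewards; L_COST and H_COST are
# assumed illustrative values (any 0 < L < 1 < H < 2 works).
L_COST, H_COST = 0.6, 1.4

def stage_rewards(x1, x2, a1, a2):
    """Per-period rewards (R^1, R^2): the good is produced if anyone contributes,
    both players then enjoy utility 1, and each contributor pays her own cost."""
    produced = 1 if (a1 == 1 or a2 == 1) else 0
    r1 = produced - (x1 if a1 == 1 else 0)
    r2 = produced - (x2 if a2 == 1 else 0)
    return r1, r2
```

For example, `stage_rewards(L_COST, H_COST, 0, 1)` reproduces the free-riding cell (1, 1 - X^2) of the table.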
10 Classification of problems: we first revisit teams with symmetric information (MDP/POMDP)
11 Team with perfect observation of X
- X is observed by everyone
- Single team objective R(X, A_t) = Σ_{i ∈ N} R^i(X, A_t):

                            contribute (A^2_t = 1)   don't contribute (A^2_t = 0)
  contribute (A^1_t = 1)    2 - X^1 - X^2            2 - X^1
  don't (A^1_t = 0)         2 - X^2                  0

- Optimal decisions are myopic (just look at the instantaneous reward) and are functions of the current system state X = (X^1, X^2):
  (A^1_t, A^2_t) = (1, 0) if (X^1, X^2) = (L, H)
                   (0, 1) if (X^1, X^2) = (H, L)
                   (1, 0) or (0, 1) if (X^1, X^2) = (L, L)
                   (1, 0) or (0, 1) if (X^1, X^2) = (H, H)
- What about time-varying types, e.g., Q(X_{t+1} | X_t) or Q(X_{t+1} | X_t, A_t)? This becomes an MDP
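The myopic rule can be checked mechanically by maximizing the instantaneous team reward over the four action pairs. A small sketch; the cost values are illustrative assumptions:

```python
# Sketch of the myopic team rule; cost values are illustrative assumptions.
L_COST, H_COST = 0.6, 1.4

def team_reward(x1, x2, a1, a2):
    """Instantaneous team reward R = R^1 + R^2 from the table."""
    produced = 1 if (a1 or a2) else 0
    return 2 * produced - (x1 if a1 else 0) - (x2 if a2 else 0)

def myopic_team_action(x1, x2):
    """Maximize the instantaneous team reward over the four action pairs
    (ties, e.g. at (L, L) or (H, H), are broken arbitrarily)."""
    pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    return max(pairs, key=lambda a: team_reward(x1, x2, *a))
```

`myopic_team_action(L_COST, H_COST)` returns (1, 0), matching the (L, H) case of the displayed rule.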
14 Team with no observation of X
- X is not observed at all (symmetric information)
- Single team objective R(X, A_t) = Σ_{i ∈ N} R^i(X, A_t)
- Previous actions are not informative about X
- Same as before with average rewards (w.r.t. the prior belief P(X^i = H) = q), where E[X^i] = qH + (1-q)L:

                            contribute (A^2_t = 1)   don't contribute (A^2_t = 0)
  contribute (A^1_t = 1)    2 - 2(qH + (1-q)L)       2 - (qH + (1-q)L)
  don't (A^1_t = 0)         2 - (qH + (1-q)L)        0

- Optimal decisions are constant: (A^1_t, A^2_t) = (1, 0) or (0, 1)
- What about time-varying types, e.g., Q(X_{t+1} | X_t) or Q(X_{t+1} | X_t, A_t)? This becomes a POMDP
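Under the prior alone, the optimal constant decision can be found by comparing the four expected team rewards. A hedged sketch with assumed values of q, L, H:

```python
# Hedged sketch: expected team rewards under the prior only; q, L, H are assumed.
q, L_COST, H_COST = 0.5, 0.6, 1.4
mean_cost = q * H_COST + (1 - q) * L_COST   # E[X^i] = qH + (1 - q)L

def expected_team_reward(a1, a2):
    """E[R(X, A)] for a constant action pair: utility 2 if produced,
    minus the expected cost E[X^i] per contributor."""
    produced = 2 if (a1 or a2) else 0
    return produced - mean_cost * (a1 + a2)

best = max([(0, 0), (0, 1), (1, 0), (1, 1)], key=lambda a: expected_team_reward(*a))
```

Since 0 < E[X^i] < 2, a single contributor beats both "no one contributes" and "both contribute", consistent with the constant optimum (1, 0) or (0, 1).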
17 Classification of problems: next, games with symmetric information (SPE, MPE)
18 Game with perfect observation of X
[game tree over the type profiles LL, LH, HL, HH]
- Players know exactly which branch they are on at each stage of the game
- Subgame-perfect equilibrium (SPE): given any history (path), players see a continuation game (subgame) and do not want to deviate
- Algorithm: backward induction
19 Game with perfect observation of X
[game tree over the type profiles LL, LH, HL, HH]
- Here, at each stage of the game, the continuation game is the same
- The SPE strategy profile does not depend on the entire history of actions but only on the state X
- Even with time-varying states, a similar algorithm (backward induction) can be used
20 Classification of problems: next, teams with asymmetric information (common information approach)
21 Decentralized team problem
- Player i's observations: private X^i; common A_{1:t-1}
- Randomized actions A^i_t ~ σ^i_t(· | X^i, A_{1:t-1})
- Design objective for the entire team: max_σ E^σ { Σ_{t=1}^T R(X, A_t) }, e.g., R(X, A_t) = Σ_{i ∈ N} R^i(X, A_t)
Problems to be addressed⁴:
1. Presence of common information A_{1:t-1} and private information X^i for agent i
2. Decentralized, non-classical information structure (this is not an MDP/POMDP-like problem!)
3. The domain of the policies A^i_t ~ σ^i_t(· | X^i, A_{1:t-1}) increases with time
⁴ All of these have been addressed in [Nayyar, Mahajan, Teneketzis, 2013]
23 A simple but powerful idea
A policy σ^i_t(· | X^i, A_{1:t-1}) can be interpreted in two equivalent ways:
1. A function σ^i_t mapping (A_{1:t-1}, X^i) to Δ(𝒜^i)
2. A function ψ^i_t mapping A_{1:t-1} to mappings from 𝒳^i to Δ(𝒜^i)
26 A simple but powerful idea
In the first interpretation, the policies to be designed (σ^i)_{i ∈ N} have an inherently asymmetric information structure: agent i feeds (X^i, A_{1:t-1}) into σ^i_t to produce A^i_t
27 A simple but powerful idea
In the second interpretation, each agent's action A^i_t ~ σ^i_t(· | X^i, A_{1:t-1}) can be thought of as a two-stage process:
1. Based on the common information A_{1:t-1}, select prescription functions Γ^i_t : 𝒳^i → Δ(𝒜^i) through the mapping ψ^i: Γ^i_t = ψ^i_t[A_{1:t-1}], with Γ_t = (Γ^1_t, Γ^2_t)
2. The action A^i_t is determined by evaluating Γ^i_t at the private information X^i, i.e., A^i_t ~ Γ^i_t(· | X^i)
Overall: A^i_t ~ Γ^i_t(· | X^i) = ψ^i_t[A_{1:t-1}](· | X^i) = σ^i_t(· | X^i, A_{1:t-1})
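The two-stage process can be sketched as follows. The particular rule inside `psi_t` is an invented toy; only the structure (common information → prescription → evaluation at the private type) mirrors the slides:

```python
# Toy sketch of the two-stage (prescription) interpretation; all names and the
# rule inside psi_t are assumptions, only the structure follows the slides.
import random

def psi_t(common_history):
    """Stage 1: map common info A_{1:t-1} to a prescription Γ^i_t : 𝒳^i -> Δ(𝒜^i).
    This toy rule has the low-cost type contribute less often once someone
    has already contributed; the high-cost type never contributes."""
    contributed_before = any(1 in a for a in common_history)
    if contributed_before:
        return {'L': {0: 0.5, 1: 0.5}, 'H': {0: 1.0, 1: 0.0}}
    return {'L': {0: 0.0, 1: 1.0}, 'H': {0: 1.0, 1: 0.0}}

def act(gamma_i, x_i, rng=random):
    """Stage 2: evaluate the prescription at the private type and sample A^i_t."""
    dist = gamma_i[x_i]
    return rng.choices(list(dist), weights=list(dist.values()))[0]
```

Composing the two stages, `act(psi_t(history), x_i)` realizes A^i_t ~ ψ^i_t[A_{1:t-1}](· | X^i) = σ^i_t(· | X^i, A_{1:t-1}).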
31 Transformation to a centralized problem
- Generating A^i_t is a "dumb" evaluation A^i_t ~ Γ^i_t(· | X^i) (nothing to be designed here)
- The control problem boils down to selecting prescription functions Γ^i_t = ψ^i_t[A_{1:t-1}] through a policy ψ = (ψ^i_t)_{i ∈ N, t ≤ T}
- The decentralized control problem has been transformed into a centralized control problem with a fictitious common agent who observes A_{1:t-1} and takes actions Γ_t
- Last issue to address: the increasing domain 𝒜^{t-1} of the pre-encoder mappings ψ_t
32 Introduction of an information state
- We would like to summarize A_{1:t-1} in a quantity (state) with a time-invariant domain
- Consider the dynamical system with
  state: (X, A_{t-1}); observation: A_{t-1}; action: Γ_t;
  reward: E{R(X, A_t) | X, A_{1:t-1}, Γ_{1:t}} = Σ_{a_t} Γ_t(a_t | X) R(X, a_t) := R̃(X, Γ_t)
- This is a POMDP! Define the posterior belief Π_t ∈ Δ(𝒳):
  Π_t(x) := P(X = x | A_{1:t-1}, Γ_{1:t-1}) for all x ∈ 𝒳
- One can show that Π_t can be updated using common information only:
  Π_{t+1} = F(Π_t, Γ_t, A_t) (Bayes law)
- For this problem the belief also factors into its marginals, Π_t(x) = ∏_{i ∈ N} Π^i_t(x^i), with Π^i_{t+1} = F(Π^i_t, Γ^i_t, A^i_t) for all i ∈ N
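The per-coordinate update Π^i_{t+1} = F(Π^i_t, Γ^i_t, A^i_t) is just Bayes law on one marginal. A minimal sketch; the zero-probability convention for off-equilibrium actions is one common choice, not something specified on the slide:

```python
def belief_update(pi_i, gamma_i, a_i):
    """One coordinate of Π_{t+1} = F(Π_t, Γ_t, A_t):
    Π^i_{t+1}(x) ∝ Γ^i_t(a^i | x) · Π^i_t(x).
    If the observed action has zero probability under Γ^i_t (off-equilibrium),
    Bayes law is silent; keeping the prior is one convention (an assumption here)."""
    unnorm = {x: gamma_i[x][a_i] * p for x, p in pi_i.items()}
    z = sum(unnorm.values())
    if z == 0:
        return dict(pi_i)
    return {x: p / z for x, p in unnorm.items()}
```

With a separating prescription (L contributes, H does not), observing A^i_t = 1 drives the belief to put all its mass on L.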
35 Characterization of the optimal team policy
From standard POMDP results, the optimal policy is Markovian, i.e.,
  Γ_t = (Γ^i_t)_{i ∈ N} = ψ_t[A_{1:t-1}] = θ_t[Π_t]
  A^i_t ~ Γ^i_t(· | X^i) = θ^i_t[Π_t](· | X^i) = m^i_t(· | X^i, Π_t)
and can be obtained by backward dynamic programming (DP):
  θ_t[π_t] = γ_t* = arg max_{γ_t} E { R(X, A_t) + V_{t+1}(F(π_t, γ_t, A_t)) | π_t, γ_t }
  V_t(π_t) = max_{γ_t} E { R(X, A_t) + V_{t+1}(F(π_t, γ_t, A_t)) | π_t, γ_t }
over the space of beliefs π_t ∈ Δ(𝒳) and prescriptions γ_t ∈ ∏_{i ∈ N} { 𝒳^i → Δ(𝒜^i) }.
In the public goods example: π_t ≡ (π^1_t(H), π^2_t(H)) ∈ [0,1]² and γ_t ≡ (γ^1_t(0|H), γ^1_t(0|L), γ^2_t(0|H), γ^2_t(0|L)) ∈ [0,1]⁴
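The backward DP above can be organized as a generic skeleton over a finite grid of beliefs and prescriptions. Everything here is an illustrative assumption (in the actual problem the belief and prescription spaces are continuous, and `expected_step` stands in for E{R(X, A_t) + V_{t+1}(F(π_t, γ_t, A_t)) | π_t, γ_t}):

```python
def backward_dp(T, beliefs, prescriptions, expected_step):
    """Generic backward DP skeleton over a finite grid of beliefs.
    expected_step(t, pi, gamma, V_next) should return
    E{R + V_{t+1}(F(pi, gamma, A_t)) | pi, gamma}; here it is a placeholder."""
    V = {pi: 0.0 for pi in beliefs}          # terminal condition V_{T+1} ≡ 0
    policy = {}
    for t in range(T, 0, -1):
        newV = {}
        for pi in beliefs:
            best = max(prescriptions, key=lambda g: expected_step(t, pi, g, V))
            policy[(t, pi)] = best           # θ_t[π_t] on the grid
            newV[pi] = expected_step(t, pi, best, V)
        V = newV
    return policy, V
```

In the public goods example, `beliefs` would discretize [0,1]² and `prescriptions` would discretize [0,1]⁴, per the parameterization above.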
37 Summary of the team problem
The introduction of prescription functions was crucial.
We gained:
- Decentralized non-classical information structure → POMDP
- A^i_t ~ θ^i_t[Π_t](· | X^i), and θ can be obtained using DP
We gave up:
- The fictitious common agent does not observe X^i
- It can only maximize the average reward-to-go E{ Σ_{t'=t}^T R(X, A_{t'}) | A_{1:t-1} } before seeing private information
- This is not a problem in teams, since we are interested in maximizing the average reward
39 Classification of problems: finally, games with asymmetric information (PBE, SE and refinements; no methodology!?)
40 Perfect Bayesian equilibria (PBE)
Player 1's perspective: [game tree over the type profiles LL, LH, HL, HH]
- After an action history (e.g., 01), the continuation is not a proper subgame: player 1 needs a belief on the opponent's type X² (conditioned on the observed history) to evaluate her expected future reward
- SPE is therefore not the appropriate equilibrium concept; this leads to perfect Bayesian equilibrium (PBE)
43 Perfect Bayesian equilibria (PBE)
A PBE is an assessment (σ*, μ*) of a strategy profile σ* and beliefs μ* satisfying (a) sequential rationality and (b) consistency:
(a) For every t ≤ T, agent i ∈ N, information set (A_{1:t-1}, X^i), and unilateral deviation σ^i:
  E^{μ*, σ^{i*} σ^{-i*}} { Σ_{t'=t}^T R^i(X, A_{t'}) | A_{1:t-1}, X^i } ≥ E^{μ*, σ^i σ^{-i*}} { Σ_{t'=t}^T R^i(X, A_{t'}) | A_{1:t-1}, X^i }
(b) Beliefs μ* should be updated by Bayes law (whenever possible) given σ* and satisfy further consistency conditions [Fudenberg and Tirole, 1991, ch. 8]
Due to the circular dependence of μ* and σ*, finding a PBE is a large fixed-point problem (no time decomposition).
44 Ideas from teams: structured equilibrium strategies σ ≡ θ
Useful idea from teams: instead of considering equilibria with general strategies σ = (σ^i_t)_{i ∈ N, t ≤ T} of the form A^i_t ~ σ^i_t(· | X^i, A_{1:t-1}), consider equilibria with structured strategies θ = (θ^i_t)_{i ∈ N, t ≤ T} of the form
  A^i_t ~ Γ^i_t(· | X^i) = θ^i_t[Π_t](· | X^i) = m^i_t(· | X^i, Π_t)
where Π_{t+1} = F(Π_t, Γ_t, A_t) = F(Π_t, θ_t[Π_t], A_t) = F^θ_t(A_{1:t}) (essentially Bayes law), so that σ ≡ θ.
Note: although equilibrium strategies are structured, unilateral deviations may be anything.
45 Parenthesis: are structured strategies restrictive?
Lemma. For any given strategy profile σ = (σ^i)_{i ∈ N}, there exists a structured strategy profile θ ≡ m = (m^i)_{i ∈ N} with the players receiving the same average rewards under both σ and m.
Bottom line: structured strategy profiles m are a sufficiently rich class, so we can concentrate on equilibria within this class.
Caveat: each m^i depends on the entire σ = (σ^i)_{i ∈ N}, so unilateral deviations in σ^i result in multilateral deviations in m.
47 Ideas from teams: beliefs μ
Recall that in a PBE, μ is a set of beliefs on the unobserved types X^{-i}, for each agent i and each private history (information set) (A_{1:t-1}, X^i).
Consider beliefs that (a) are functions of the common history A_{1:t-1} only, and (b) are generated from a common belief in product form μ_t[A_{1:t-1}](X) = ∏_{j ∈ N} μ^j_t[A_{1:t-1}](X^j).
So, for each agent i and each history (A_{1:t-1}, X^i), the belief on X^{-i} is ∏_{j ∈ N \ {i}} μ^j_t[A_{1:t-1}](X^j).
In addition, given strategies σ ≡ θ, these beliefs are updated as
  μ^i_{t+1}[A_{1:t}] = F(μ^i_t[A_{1:t-1}], θ^i_t[μ_t[A_{1:t-1}]], A^i_t)
with μ^i_t[A_{1:t-1}] playing the role of Π^i_t and θ^i_t[μ_t[A_{1:t-1}]] the role of Γ^i_t.
Bottom line: all consistency conditions are satisfied automatically.
49 Summary so far
- We have motivated the use of structured (equilibrium) strategies σ ≡ θ: A^i_t ~ σ^i_t(· | A_{1:t-1}, X^i) = θ^i_t[μ_t[A_{1:t-1}]](· | X^i), with Γ^i_t = θ^i_t[μ_t[A_{1:t-1}]]
- We have restricted attention to a class of beliefs μ that remain independent and are updated as μ^i_{t+1}[A_{1:t}] = F(μ^i_t[A_{1:t-1}], θ^i_t[μ_t[A_{1:t-1}]], A^i_t)
- A PBE (σ*, μ*) ≡ (θ*, μ*), even in this restricted class, is still the solution of a large fixed-point equation: the circularity between θ* and μ* is still present
- How can we find θ* with a simple algorithm? Beliefs and policies are decoupled by considering policies for all possible beliefs π, not just for μ*
51 First erroneous attempt
Recall the DP equation from the team problem: for each t = T, T-1, ..., 1 and for every π_t ∈ Δ(𝒳), solve the maximization
  θ_t[π_t] = γ_t* = arg max_{γ_t} E^{π_t, γ_t} { R(X, A_t) + V_{t+1}(F(π_t, γ_t, A_t)) }
What is the logical extension to games? Transform it into a best-response-type equation: fix γ^{-i}_t ≡ γ^{-i*}_t and maximize over γ^i_t, for all i ∈ N:
  γ^i*_t ∈ arg max_{γ^i_t} E^{π_t, γ^i_t γ^{-i*}_t} { R^i(X, A_t) + V^i_{t+1}(F(π_t, γ^i_t γ^{-i*}_t, A_t)) }
53 First erroneous attempt: what is the catch?
  γ^i*_t ∈ arg max_{γ^i_t} E^{π_t, γ^i_t γ^{-i*}_t} { R^i(X, A_t) + V^i_{t+1}(F(π_t, γ^i_t γ^{-i*}_t, A_t)) } for all i ∈ N
Why erroneous? The reward-to-go is not conditioned on the entire history (A_{1:t-1}, X^i) of player i, but only on the part A_{1:t-1} ≡ Π_t. This was fine in teams, but it is not sufficient to prove sequential rationality in games:
  E^{μ*, σ^{i*} σ^{-i*}} { Σ_{t'=t}^T R^i(X, A_{t'}) | A_{1:t-1}, X^i } ≥ E^{μ*, σ̃^i σ^{-i*}} { Σ_{t'=t}^T R^i(X, A_{t'}) | A_{1:t-1}, X^i }
55 Special case⁵
Consider dynamical systems for which the belief update is prescription-independent, i.e., Π_{t+1} = F(Π_t, A_t).
In that case the backward process decomposes and conditioning on X^i is irrelevant.
A strong statement can be made for this special case: for every PBE there exists a structured PBE that corresponds to an SPE of an equivalent symmetric-information game.
⁵ [Nayyar, Gupta, Langbort, Başar, 2014], [Gupta, Nayyar, Langbort, Başar, 2014]
56 Second erroneous attempt
Condition on X^i in the backward induction step, to be consistent with the sequential rationality condition. For each t = T, T-1, ..., 1 and for every π_t ∈ Δ(𝒳), solve the following one-step fixed-point equation for all i ∈ N and all x^i ∈ 𝒳^i:
  γ^i*_t ∈ arg max_{γ^i_t} E^{π_t, γ^i_t(· | x^i) γ^{-i*}_t} { R^i(X, A_t) + V^i_{t+1}(F(π_t, γ^i_t γ^{-i*}_t, A_t), x^i) | x^i }
Note that in this case the reward-to-go is V^i_t(π_t, x^i).
57 Second erroneous attempt: explanation
  E{·} = Σ_{a_t, x^{-i}} γ^i_t(a^i_t | x^i) γ^{-i*}_t(a^{-i}_t | x^{-i}) π^{-i}(x^{-i}) ( R^i(x^i x^{-i}, a_t) + V^i_{t+1}(F(π_t, γ^i_t γ^{-i*}_t, a_t), x^i) )
This is an unusual fixed-point equation: there is dependence on γ^i_t(· | x^i), but also on the entire γ^i_t(·) (inside the belief update).
Unfortunately, this results in a fixed-point solution θ with γ_t = θ_t[π_t, x], so the resulting policy is of the form A^i_t ~ Γ^i_t(· | X^i) = θ^i_t[Π_t, X](· | X^i), which is not implementable (the strategy of player i requires the unknown private information X^{-i}).
59 Third erroneous attempt
Condition on X^i in the backward induction step (to be consistent with sequential rationality) and optimize only over part of the prescription. For each t = T, T-1, ..., 1 and for every π_t ∈ Δ(𝒳), solve the following one-step fixed-point equation for all i ∈ N and all x^i ∈ 𝒳^i:
  γ^i*_t(· | x^i) ∈ arg max_{γ^i_t(· | x^i)} E^{π_t, γ^i_t(· | x^i) γ^{-i*}_t} { R^i(X, A_t) + V^i_{t+1}(F(π_t, γ^i_t(· | x^i) γ^{i*}_t(· | ·) γ^{-i*}_t, A_t), x^i) | x^i }
This results in a fixed-point solution θ with γ_t* = θ_t[π_t] for all π_t ∈ Δ(𝒳).
Unfortunately, it does not work in the proof: something more fundamental is going on...
An algorithm for PBE evaluation: backward recursion

For each $t = T, T-1, \ldots, 1$ and for every $\pi_t \in \Delta(\mathcal{X})$ solve the following one-step fixed-point equation: for all $i \in \mathcal{N}$ and for all $x^i \in \mathcal{X}^i$,

\[
\tilde{\gamma}_t^i(\cdot \mid x^i) \in \arg\max_{\gamma_t^i(\cdot \mid x^i)} \mathbb{E}^{\pi_t,\, \gamma_t^i(\cdot \mid x^i)\tilde{\gamma}_t^{-i}} \left\{ R^i(X, A_t) + V_{t+1}^i\big(F(\pi_t, \tilde{\gamma}_t^i \tilde{\gamma}_t^{-i}, A_t), x^i\big) \,\middle|\, x^i \right\}
\]

This results in an FP solution $\theta$ with $\gamma_t = \theta_t[\pi_t]$ for all $\pi_t \in \Delta(\mathcal{X})$.

This is not a best-response type of equation: $\tilde{\gamma}_t^i$ is present on both the left- and right-hand sides.

Intuition: find the $\tilde{\gamma}_t^i(\cdot \mid x^i)$ that is optimal under the unperturbed belief update! Remember the core concept in PBE...
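The per-stage fixed point on this slide can be approximated numerically. Below is a minimal sketch, not the authors' implementation: a toy two-player, binary-type, binary-action stage game in which the candidate prescription `gamma` appears both as the object being improved and inside the belief update `F`, mirroring the self-referential structure of the equation. The reward table `R`, the placeholder continuation value `V_next`, and the damped iteration scheme are all illustrative assumptions.

```python
import numpy as np

# Toy instantiation (all numbers hypothetical): 2 players, binary types and
# actions, independent type beliefs pi[i][x], stage reward R[i][x1,x2,a1,a2],
# and a placeholder continuation value V_next standing in for V_{t+1}.
n_players, n_types, n_actions = 2, 2, 2
rng = np.random.default_rng(0)
R = rng.uniform(0, 1, size=(n_players, n_types, n_types, n_actions, n_actions))
pi = np.array([[0.5, 0.5], [0.5, 0.5]])

def V_next(belief, i, x_i):
    # Placeholder: in the real backward recursion this is the t+1 value function.
    return 0.1 * belief[i][x_i]

def belief_update(pi, gamma, a):
    # F(pi, gamma, a): Bayes rule per player; types remain independent.
    new = np.empty_like(pi)
    for i in range(n_players):
        w = pi[i] * gamma[i][:, a[i]]
        new[i] = w / w.sum() if w.sum() > 0 else pi[i]
    return new

def stage_value(pi, gamma, i, x_i, a_i_dist):
    # E{ R^i + V^i_{t+1}(F(pi, gamma, A), x_i) | x_i } when type x_i of player i
    # randomizes with a_i_dist and the belief update uses the candidate gamma.
    j = 1 - i
    total = 0.0
    for x_j in range(n_types):
        for a_i in range(n_actions):
            for a_j in range(n_actions):
                p = pi[j][x_j] * a_i_dist[a_i] * gamma[j][x_j, a_j]
                a = (a_i, a_j) if i == 0 else (a_j, a_i)
                x = (x_i, x_j) if i == 0 else (x_j, x_i)
                nb = belief_update(pi, gamma, a)
                total += p * (R[i][x[0], x[1], a[0], a[1]] + V_next(nb, i, x_i))
    return total

def solve_fixed_point(pi, iters=200):
    # Damped iteration: gamma appears both as the unknown being improved and
    # inside the belief update -- the self-referential feature of the equation.
    gamma = np.full((n_players, n_types, n_actions), 1.0 / n_actions)
    for _ in range(iters):
        new = gamma.copy()
        for i in range(n_players):
            for x_i in range(n_types):
                vals = [stage_value(pi, gamma, i, x_i, np.eye(n_actions)[a])
                        for a in range(n_actions)]
                best = np.eye(n_actions)[int(np.argmax(vals))]
                new[i][x_i] = 0.5 * gamma[i][x_i] + 0.5 * best
        gamma = new
    return gamma

gamma_star = solve_fixed_point(pi)
```

Each row of `gamma_star` is a distribution over actions for one (player, type) pair; with a zero continuation value the iteration would reduce to an ordinary best response, which is exactly the special case discussed later in the talk.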
An algorithm for PBE evaluation: forward recursion

From the backward recursion we have obtained $\tilde{\theta} = (\tilde{\theta}_t^i)_{i \in \mathcal{N},\, t \in \mathcal{T}}$.

For each $t = 1, 2, \ldots, T$ and for every $i \in \mathcal{N}$, $a_{1:t}$, and $x^i$:

\[
\tilde{\sigma}_t^i(a_t^i \mid a_{1:t-1}, x^i) := \underbrace{\tilde{\theta}_t^i[\mu_t^*[a_{1:t-1}]]}_{\Gamma_t^i}(a_t^i \mid x^i)
\]
\[
\underbrace{\mu_{t+1}^*[a_{1:t}]}_{\Pi_{t+1}} := F\big(\underbrace{\mu_t^*[a_{1:t-1}]}_{\Pi_t},\, \underbrace{\tilde{\theta}_t[\mu_t^*[a_{1:t-1}]]}_{\Gamma_t},\, a_t\big)
\]

In fact we can obtain a family of PBEs for any type distribution $\prod_{i \in \mathcal{N}} Q^i(x^i)$ with appropriate initialization of $\mu_1^*$.
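Mechanically, the forward pass just walks the tree of public action histories, reading a prescription from the backward-recursion output at the current belief and pushing the belief through the update map. A schematic sketch (the objects `theta`, `F`, and the belief are hypothetical stand-ins, not the paper's concrete data structures):

```python
from itertools import product

T, actions = 2, (0, 1)  # toy horizon and per-player action set

def forward_recursion(theta, F, mu_1):
    # sigma[t][hist]: prescription used after public history a_{1:t-1};
    # mu[t][hist]: common belief there. theta[t] maps belief -> prescription,
    # F(belief, gamma, a) is the belief update.
    mu = {0: {(): mu_1}}
    sigma = {}
    for t in range(T):
        sigma[t] = {}
        mu[t + 1] = {}
        for hist, belief in mu[t].items():
            gamma = theta[t](belief)      # Gamma_t = theta_t[Pi_t]
            sigma[t][hist] = gamma        # sigma_t(.|a_{1:t-1}, x) = gamma(.|x)
            for a in product(actions, repeat=2):   # every joint action profile
                mu[t + 1][hist + (a,)] = F(belief, gamma, a)
    return sigma, mu

# Dummy stand-ins just to exercise the recursion:
theta = {t: (lambda belief: "gamma") for t in range(T)}
F = lambda belief, gamma, a: belief     # identity belief update for illustration
sigma, mu = forward_recursion(theta, F, "prior")
```

The point of the sketch is the data flow: strategies and beliefs are defined history by history, entirely from $\tilde{\theta}$ and $F$, with no further optimization in the forward pass.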
Main Result

Theorem. $(\tilde{\sigma}, \mu^*)$ generated by the backward/forward algorithm (whenever it exists) is a PBE, i.e., for all $i$, $t$, $a_{1:t-1}$, $x^i$, and $\sigma^i$,

\[
\mathbb{E}^{\tilde{\sigma}_{t:T}^i \tilde{\sigma}_{t:T}^{-i},\, \mu_t^*} \left\{ \sum_{n=t}^{T} R^i(X, A_n) \,\middle|\, a_{1:t-1}, x^i \right\}
\;\geq\;
\mathbb{E}^{\sigma_{t:T}^i \tilde{\sigma}_{t:T}^{-i},\, \mu_t^*} \left\{ \sum_{n=t}^{T} R^i(X, A_n) \,\middle|\, a_{1:t-1}, x^i \right\}
\]

and $\mu^*$ satisfies the consistency conditions.
Sketch of the proof

Independence of types and the specific DP equation are crucial in proving the result (modified comparison principle, backward induction):

- The specific DP guarantees that the unperturbed reward-to-go (LHS) at time $t$ is the obtained value function $V_t^i = R^i + V_{t+1}^i$.
- The specific DP guarantees that unilateral deviations with a fixed belief update reduce $V_t^i$.
- The induction step reduces $V_{t+1}^i$ to the (perturbed) reward-to-go at time $t+1$.
- Independence of types guarantees that the resulting expression is exactly the (perturbed) reward-to-go at time $t$ (RHS).
Comments on the new per-stage FP equation

- This is not a best-response type of FP equation (due to the presence of $\tilde{\gamma}^i$ on both the LHS and RHS of the equation).
- Standard tools for existence of a solution (e.g., Brouwer, Kakutani) do not apply (problem with continuity of the $V(\cdot)$ functions).
- Existence can be shown for a special case [Ouyang, Tavafoghi, Teneketzis, 2015] where $R^i(X, A_t)$ does not depend on the player's own type $X^i$. In that case prescriptions $\Gamma_t^i(\cdot \mid X^i) = \Gamma_t^i(\cdot)$ do not depend on the private type $X^i$ and the FP equation reduces to a best response. No signaling! Essentially reduces to the model $\Pi_{t+1} = F(\Pi_t, A_t)$.
Current/Future work

Model generalizations:
- Types are independent controlled Markov processes (controlled by all actions): $P(X_t \mid X_{1:t-1}, A_{1:t-1}) = \prod_{i \in \mathcal{N}} Q^i(X_t^i \mid X_{t-1}^i, A_{t-1})$ [Vasal, Subramanian, A, 2015b]
- Dependent types with strategic independence [Battigalli, 1996]
- Types are observed through a noisy channel (even by the same user), $Q(Y_t^i \mid X_t^i)$. Example: the informational-cascades literature.
- Infinite horizon and continuous action spaces.

Existence results: prove existence for the simplest non-trivial class of problems. Core issue: the per-stage FP equation is not a best response.

Dynamic mechanism design (indirect mechanisms with message space smaller than type space).
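To make the generalized type dynamics above concrete, here is a small sampling sketch; the kernel `Q` and all sizes are made up for illustration. Each player's type transitions through its own kernel $Q^i$, driven by the joint action, so types remain independent conditional on the public action history.

```python
import numpy as np

# Hypothetical factored kernel: Q[i][x_prev, a1, a2] is a distribution over
# player i's next type, implementing
# P(X_t | X_{t-1}, A_{t-1}) = prod_i Q^i(X_t^i | X_{t-1}^i, A_{t-1}).
rng = np.random.default_rng(1)
n_players, n_types, n_actions = 2, 2, 2
Q = rng.dirichlet(np.ones(n_types),
                  size=(n_players, n_types, n_actions, n_actions))

def step_types(x, a):
    # Each player's type transitions independently given the joint action a.
    return tuple(int(rng.choice(n_types, p=Q[i][x[i], a[0], a[1]]))
                 for i in range(n_players))

x = (0, 1)
for t in range(5):
    x = step_types(x, (t % 2, (t + 1) % 2))
```

The factorization is what preserves the conditional independence of types that the proof of the main result relies on.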
Thank you!
Extra: FP equations

First attempt:
\[
\left.\begin{aligned}
\gamma^1 &= f^1(\gamma^2, \pi) \\
\gamma^2 &= f^2(\gamma^1, \pi)
\end{aligned}\right\}
\;\Rightarrow\; \gamma = f(\gamma, \pi) \;\Rightarrow\; \gamma = \theta(\pi)
\]

Second attempt:
\[
\left.\begin{aligned}
\gamma^1 &= f^{1H}(\gamma^2, \pi) \qquad & \gamma^1 &= f^{1L}(\gamma^2, \pi) \\
\gamma^2 &= f^{2H}(\gamma^1, \pi) \qquad & \gamma^2 &= f^{2L}(\gamma^1, \pi)
\end{aligned}\right\}
\;\Rightarrow\;
\gamma = f^{LL}(\gamma, \pi),\;\; \gamma = f^{LH}(\gamma, \pi),\;\; \gamma = f^{HL}(\gamma, \pi),\;\; \gamma = f^{HH}(\gamma, \pi)
\;\Rightarrow\; \gamma = \theta(\pi, x)
\]
Extra: FP equations

Third attempt:
\[
\left.\begin{aligned}
\gamma_1^H &= f^{1H}(\gamma_1^L, \gamma_2, \pi) \\
\gamma_1^L &= f^{1L}(\gamma_1^H, \gamma_2, \pi) \\
\gamma_2^H &= f^{2H}(\gamma_2^L, \gamma_1, \pi) \\
\gamma_2^L &= f^{2L}(\gamma_2^H, \gamma_1, \pi)
\end{aligned}\right\}
\;\Rightarrow\;
\left\{\begin{aligned}
\gamma_1 &= f^1(\gamma_1, \gamma_2, \pi) \\
\gamma_2 &= f^2(\gamma_1, \gamma_2, \pi)
\end{aligned}\right\}
\;\Rightarrow\; \gamma = f(\gamma, \pi) \;\Rightarrow\; \gamma = \theta(\pi)
\]

Correct:
\[
\left.\begin{aligned}
\gamma_1^H &= f^{1H}(\gamma_1, \gamma_2, \pi) \\
\gamma_1^L &= f^{1L}(\gamma_1, \gamma_2, \pi) \\
\gamma_2^H &= f^{2H}(\gamma_2, \gamma_1, \pi) \\
\gamma_2^L &= f^{2L}(\gamma_2, \gamma_1, \pi)
\end{aligned}\right\}
\;\Rightarrow\;
\left\{\begin{aligned}
\gamma_1 &= f^1(\gamma_1, \gamma_2, \pi) \\
\gamma_2 &= f^2(\gamma_1, \gamma_2, \pi)
\end{aligned}\right\}
\;\Rightarrow\; \gamma = f(\gamma, \pi) \;\Rightarrow\; \gamma = \theta(\pi)
\]
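Once the per-type equations are collapsed into a single map as on these slides, evaluating $\theta(\pi)$ amounts to solving $\gamma = f(\gamma, \pi)$; when $f$ happens to be a contraction, plain iteration suffices. A toy illustration with a made-up contraction `f` (not the game's actual map, which in general need not be contractive):

```python
import numpy as np

# Illustration only: a hypothetical contraction f standing in for the collapsed
# per-stage map; theta(pi) solves gamma = f(gamma, pi) by fixed-point iteration.
def f(gamma, pi):
    return 0.5 * np.tanh(gamma) + 0.3 * pi   # |f'| <= 0.5, so iteration converges

def theta(pi, tol=1e-10, max_iter=1000):
    gamma = np.zeros_like(pi)
    for _ in range(max_iter):
        new = f(gamma, pi)
        if np.max(np.abs(new - gamma)) < tol:
            return new
        gamma = new
    return gamma

pi = np.array([0.2, 0.8])
gamma_star = theta(pi)
```

For the actual per-stage equation no such contraction property is known, which is precisely why the existence question flagged in the talk remains open.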
Battigalli, P. (1996). Strategic independence and perfect Bayesian equilibria. Journal of Economic Theory, 70(1).

Fudenberg, D. and Tirole, J. (1991). Game Theory. MIT Press, Cambridge, MA.

Gupta, A., Nayyar, A., Langbort, C., and Başar, T. (2014). Common information based Markov perfect equilibria for linear-Gaussian games with asymmetric information. SIAM Journal on Control and Optimization, 52(5).

Nayyar, A., Gupta, A., Langbort, C., and Başar, T. (2014). Common information based Markov perfect equilibria for stochastic games with asymmetric information: Finite games. IEEE Transactions on Automatic Control, 59(3).

Nayyar, A., Mahajan, A., and Teneketzis, D. (2013). Decentralized stochastic control with partial history sharing: A common information approach. IEEE Transactions on Automatic Control, 58(7).

Ouyang, Y., Tavafoghi, H., and Teneketzis, D. (2015). Dynamic oligopoly games with private Markovian dynamics. Available at www-personal.umich.edu/~tavaf/oligopolygames.pdf.

Vasal, D., Subramanian, V., and Anastasopoulos, A. (2015a). A systematic process for evaluating structured perfect Bayesian equilibria in dynamic games with asymmetric information. Technical report.

Vasal, D., Subramanian, V., and Anastasopoulos, A. (2015b). A systematic process for evaluating structured perfect Bayesian equilibria in dynamic games with asymmetric information. In American Control Conference. (Accepted for publication/presentation.)
On Equilibria of Distributed Message-Passing Games Concetta Pilotto and K. Mani Chandy California Institute of Technology, Computer Science Department 1200 E. California Blvd. MC 256-80 Pasadena, US {pilotto,mani}@cs.caltech.edu
More informationEconomics of Networks Social Learning
Economics of Networks Social Learning Evan Sadler Massachusetts Institute of Technology Evan Sadler Social Learning 1/38 Agenda Recap of rational herding Observational learning in a network DeGroot learning
More informationLearning from Sequential and Time-Series Data
Learning from Sequential and Time-Series Data Sridhar Mahadevan mahadeva@cs.umass.edu University of Massachusetts Sridhar Mahadevan: CMPSCI 689 p. 1/? Sequential and Time-Series Data Many real-world applications
More informationOptimal Control of Partiality Observable Markov. Processes over a Finite Horizon
Optimal Control of Partiality Observable Markov Processes over a Finite Horizon Report by Jalal Arabneydi 04/11/2012 Taken from Control of Partiality Observable Markov Processes over a finite Horizon by
More informationA Decentralized Approach to Multi-agent Planning in the Presence of Constraints and Uncertainty
2011 IEEE International Conference on Robotics and Automation Shanghai International Conference Center May 9-13, 2011, Shanghai, China A Decentralized Approach to Multi-agent Planning in the Presence of
More informationLearning Equilibrium as a Generalization of Learning to Optimize
Learning Equilibrium as a Generalization of Learning to Optimize Dov Monderer and Moshe Tennenholtz Faculty of Industrial Engineering and Management Technion Israel Institute of Technology Haifa 32000,
More information9 Forward-backward algorithm, sum-product on factor graphs
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 9 Forward-backward algorithm, sum-product on factor graphs The previous
More informationDeceptive Advertising with Rational Buyers
Deceptive Advertising with Rational Buyers September 6, 016 ONLINE APPENDIX In this Appendix we present in full additional results and extensions which are only mentioned in the paper. In the exposition
More information