Mean Field Competitive Binary MDPs and Structured Solutions


1 Mean Field Competitive Binary MDPs and Structured Solutions. Minyi Huang, School of Mathematics and Statistics, Carleton University, Ottawa, Canada. MFG 2017, UCLA, Aug 28 - Sept 1, 2017.

2 Outline of talk
- The competitive binary MDP model (Huang and Ma '16, '17)
- The stationary MFG equation (with discount)
- Model features: monotone state dynamics, resetting action, positive externality
- Check: ergodicity, existence, uniqueness, solution structure, pure strategies, stable/unstable equilibria
- Exploit positive externality to prove uniqueness
- Extensions

3 Motivation and background. Model: individual controlled state x_t^i ∈ [0, 1], action a_t^i ∈ {0, 1}.
Motivation for this class of competitive MDP models:
- Except for LQ cases, general nonlinear MFGs rarely have closed-form solutions.
- We introduce this class of MDP models, which have a simple solution structure: the threshold policy (see next page).
  i) There is a long tradition of looking for threshold policies in various decision problems in the operations research community.
  ii) A threshold policy is also studied in R. Johari's ride-sharing problem.
- Sometimes simple models tell more; the binary MDP modeling is of interest in its own right.

4 Figure: threshold policy with threshold 0.6 (action a_t^i vs. state x_t^i).

5 Dynamic games of MDPs were initially introduced by L. Shapley (1953) and called stochastic games.
Literature on large-population games with discrete states and/or actions (or discrete-time MDPs): Weintraub et al. (2005, 2008), Huang (2012), Gomes et al. (2013), Adlakha, Johari, and Weintraub (2015), Kolokoltsov and Bensoussan (2016), Saldi, Basar and Raginsky (2017). Also see anonymous games, e.g., Jovanovic and Rosenthal (1988).

6 The MF MDP model: N players (or agents A_i, 1 ≤ i ≤ N) with states x_t^i, t ∈ Z_+, as controlled Markov processes (no coupling).
- State space: S = [0, 1]. Interpret the state as unfitness.
- Action space: A = {a_0, a_1}; a_0: inaction, a_1: active control.
- Agents are coupled via the costs (Huang and Ma, CDC'16, CDC'17).
Examples of binary action space (action or inaction): maintenance of equipment; network security games; wireless medium access control (channel shared by users; transmission or back-off; collisions possible); (flu) vaccination games, etc.

7 Binary choice models are widely used in various decision problems:
- Schelling, T. (1973), Hockey helmets, concealed weapons, and daylight saving: a study of binary choices with externalities, J. Conflict Resol.
- Brock, W. and Durlauf, S. (2001), Discrete choice with social interactions, Rev. Econ. Studies.
- Schelling, T. (2006), Micromotives and Macrobehavior.
- Babichenko, Y. (2013), Best-reply dynamics in large binary-choice anonymous games, Games Econ. Behavior.
- Lee, L., Li, J. and Lin, X. (2014), Binary choice models with social network under heterogeneous rational expectations, Rev. Econ. Statistics.

8 Dynamics: the controlled transition kernel for x_t^i. For t ≥ 0 and x ∈ S,
P(x_{t+1}^i ∈ B | x_t^i = x, a_t^i = a_0) = Q_0(B|x),
P(x_{t+1}^i = 0 | x_t^i = x, a_t^i = a_1) = 1,
where Q_0(·|x) is a stochastic kernel defined for B ∈ B(S) (Borel sets) and Q_0([x, 1]|x) = 1: under inaction a_0, the state gets worse. The transition of x_t^i is not affected by the other actions a_t^j, j ≠ i. The stochastic deterioration is similar to hazard-rate modelling in the maintenance literature (Bensoussan and Sethi, 2007; Grall et al., 2002).
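
To make the dynamics concrete, here is a minimal simulation sketch of one agent's controlled chain. The specific kernel Q_0(·|x), taken as the distribution of x + (1 − x)ξ with ξ uniform on [0, 1], is borrowed from the example on slide 26; any monotone kernel satisfying (A3) would do.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(x, a):
    """One transition: a = 0 (inaction) lets the state deteriorate; a = 1 resets it."""
    if a == 1:
        return 0.0                            # resetting action: next state is 0
    return x + (1.0 - x) * rng.uniform()      # Q0([x,1]|x) = 1: monotone worsening

def simulate(policy, x0=0.0, T=50):
    """Run the chain under a feedback policy a_t = policy(x_t)."""
    xs, x = [x0], x0
    for _ in range(T):
        x = step(x, policy(x))
        xs.append(x)
    return np.array(xs)

path = simulate(lambda x: 1 if x >= 0.6 else 0)   # threshold policy with r = 0.6
print(path[:10])
```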

9 Figure: a sample path of x_t^i under inaction; it is getting worse and worse.

10 The cost of A_i:
J_i = E Σ_{t=0}^T ρ^t c(x_t^i, x_t^{(N)}, a_t^i), 1 ≤ i ≤ N,
where 0 < T < ∞ and ρ ∈ (0, 1) is the discount factor. Population average state: x_t^{(N)} = (1/N) Σ_{i=1}^N x_t^i.
The one-stage cost: c(x_t^i, x_t^{(N)}, a_t^i) = R(x_t^i, x_t^{(N)}) + γ·1{a_t^i = a_1},
where R: unfitness-related cost; γ > 0: effort cost.

11 The MFG equation system (MFG Eq_[0,T]):
V(t, x) = min[ ρ ∫_0^1 V(t+1, y) Q_0(dy|x) + R(x, z_t),  ρ V(t+1, 0) + R(x, z_t) + γ ],  0 ≤ t < T,
V(T, x) = R(x, z_T),
z_t = E x_t^i, 0 ≤ t ≤ T (consistency condition in MFG).
Notation: b_{s,t} = (b_s, b_{s+1}, ..., b_t). Find a solution (ẑ_{0,T}, â^i_{0,T}) such that {x_t^i, 0 ≤ t ≤ T} is generated by the best response {â_t^i(x), 0 ≤ t ≤ T}.
Fact: given z_{0,T}, the best response is a threshold policy.
Theorem: i) under technical conditions, (MFG Eq_[0,T]) has a solution; ii) the resulting set of decentralized strategies for the N players is an ε-Nash equilibrium.
Below we will focus on its stationary version.
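
A grid-based sketch of the backward recursion in (MFG Eq_[0,T]) for a given mean-field path z_{0,T}. The grid size, the cost R(x, z) = x(1 + z), and the uniform deterioration kernel are illustrative assumptions, not prescribed by the model.

```python
import numpy as np

n, T, rho, gamma = 201, 30, 0.9, 0.1
xs = np.linspace(0.0, 1.0, n)
z = np.full(T + 1, 0.4)                  # assumed (constant) mean-field path z_t
R = lambda x, zt: x * (1.0 + zt)         # assumed one-stage cost R(x, z)

# Row i of P discretizes Q0(.|x_i) as uniform on [x_i, 1] (slide 26 example).
P = np.zeros((n, n))
for i in range(n):
    m = xs >= xs[i]
    P[i, m] = 1.0 / m.sum()

V = R(xs, z[T])                          # terminal condition V(T, x) = R(x, z_T)
policy = np.zeros((T, n), dtype=int)
for t in range(T - 1, -1, -1):
    inact = rho * P @ V + R(xs, z[t])    # continue under a0: integrate V(t+1,.) against Q0
    act = rho * V[0] + R(xs, z[t]) + gamma   # a1 resets the state to 0
    policy[t] = act < inact              # 1 where acting is strictly better
    V = np.minimum(inact, act)

# Consistent with the Fact above, {x : policy = 1} is an upper interval [r, 1]:
print("threshold at t = 0:", xs[policy[0].argmax()] if policy[0].any() else None)
```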

12 Action space {0, 1}. Threshold policy:
a_t^i = 1 if x_t^i ≥ r,  0 if x_t^i < r.
Figure: threshold policy with r = 0.6 (action a_t^i vs. state x_t^i).

13 Stationary equation system for the MFG:
V(x) = min[ ρ ∫_0^1 V(y) Q_0(dy|x) + R(x, z),  ρ V(0) + R(x, z) + γ ],
z = ∫_0^1 x π(dx) for a probability measure π.
(V, ẑ, π̂, â^i) is called a stationary equilibrium with discount if:
i) the feedback policy â^i is the best response with respect to ẑ (in the associated infinite-horizon control problem);
ii) the distribution of {x_t^i, t ≥ 0} under â^i converges to the stationary distribution π̂;
iii) (ẑ, π̂) satisfies the second equation.
Note: this solution definition is different from that in anonymous games, which only deal with a fixed-point equation for a measure (describing the continuum population) without considering ergodicity.

14 What do we want to study?
Q1: Existence (based on ergodicity).
Q2: Does a threshold policy exist?
Q3: Uniqueness.
Q4: Under what conditions can a pure equilibrium strategy fail to exist?
Q5: How do the model parameters affect the solution? Related to comparative statics.
Q6: Is the equilibrium stable, or can stability fail? Can the model show oscillatory behavior (as observed in vaccination situations)?

15 Assumptions:
(A1) {x_0^i, i ≥ 1} are independent random variables taking values in S.
(A2) R(x, z) is a continuous function on S × S. For each fixed z, R(·, z) is strictly increasing.
(A3) i) Q_0(·|x) satisfies Q_0([x, 1]|x) = 1 for any x, and is strictly stochastically increasing; ii) Q_0(·|x) has a positive density for all x < 1.
(A4) R(x, ·) is increasing for fixed x (i.e., positive externality).
(A5) γ > β max_z ∫_0^1 [R(y, z) − R(0, z)] Q_0(dy|0).
Remarks: monotonicity in (A2): the cost increases when the state is poorer. (A3)-i) means dominance of distributions. (A5): the effort cost should not be too low; this prevents a zero action threshold; the condition can be refined if more information is known on R (such as submodularity).

16 Theorem (existence; Q1, Q2). Assume (A1)-(A5). Then
V(x) = min[ ρ ∫_0^1 V(y) Q_0(dy|x) + R(x, z),  ρ V(0) + R(x, z) + γ ],
z = ∫_0^1 x π(dx)
has a solution (V, z, π, a^i) for which the best response a^i is a threshold policy.
Note: a^i: [0, 1] → {a_0, a_1} is implicitly specified by the first equation: on the right-hand side, if the first term is smaller, then a^i(x) = a_0 (otherwise a_1).

17 Theorem (uniqueness; Q3). In the previous theorem, if we further assume R(x, z) = R_1(x) R_2(z), where R_2 > 0 is strictly increasing on S, then
V(x) = min[ ρ ∫_0^1 V(y) Q_0(dy|x) + R(x, z),  ρ V(0) + R(x, z) + γ ],
z = ∫_0^1 x π(dx)
has a unique solution (V, z, π, a^i).
Remark: the monotonicity of R_2 indicates positive externalities (crucial in the uniqueness analysis), i.e., a person benefits from the improving behaviour of the population.

18 How to prove the existence theorem?
- Show ergodicity of the controlled Markov chain {x_t^i} under an interior threshold policy (so that π in the equation makes sense); the case of threshold 1 is handled separately.
- Further use Brouwer's fixed-point theorem to show existence.
How to prove uniqueness?
- Use two comparison theorems.

19 Theorem (ergodicity; Q1). For threshold θ ∈ (0, 1), {x_t^{i,θ}, t ≥ 0} is uniformly ergodic with stationary probability distribution π_θ, i.e.,
sup_{x∈S} ‖P^t(x, ·) − π_θ‖ ≤ K r^t
for some constants K > 0 and r ∈ (0, 1), where ‖·‖ is the total variation norm of signed measures.
Proof: show Doeblin's condition by checking
inf_{x∈S} P(x_4 = 0 | x_0 = x) ≥ η > 0
and aperiodicity. Here 4 is the magic number.
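
A quick Monte Carlo check of the minorization quantity in Doeblin's condition under a threshold policy, again with the illustrative uniform kernel (θ = 0.85, matching the figure on the next slide); the estimate stays bounded away from zero uniformly in the starting point.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 0.85                         # threshold policy: reset when x >= theta

def p_x4_is_zero(x0, nsim=20000):
    """Estimate P(x_4 = 0 | x_0 = x0) under the threshold policy."""
    hits = 0
    for _ in range(nsim):
        x = x0
        for _ in range(4):
            x = 0.0 if x >= theta else x + (1.0 - x) * rng.uniform()
        hits += (x == 0.0)           # the reset assigns exactly 0.0
    return hits / nsim

ests = [p_x4_is_zero(x0) for x0 in np.linspace(0.0, 0.99, 11)]
print("min over starting points:", min(ests))   # about (1 - theta)^2 > 0 or larger
```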

20 Figure: the regenerative process x_t^i under threshold θ = 0.85.

21 Numerics: recall the MFG solution equation system
V(x) = min[ ρ ∫_0^1 V(y) Q_0(dy|x) + R(x, z),  ρ V(0) + R(x, z) + γ ],
z = ∫_0^1 x π(dx).
Algorithm:
Step 1. Initialize z^(0).
Step 2. Given z^(k), solve V^(k)(x) and the associated threshold policy a^(k) from the DP equation.
Step 3. Use a^(k) to find the mean field z̃^(k+1) via π^(k).
Step 4. z^(k+1) = 0.98 z^(k) + 0.02 z̃^(k+1) (cautious update!).
Step 5. Go back to Step 2 (until the desired accuracy is attained).
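
A sketch of Steps 1-5 on a state grid, under the same illustrative cost R(x, z) = x(1 + z) and uniform kernel as before: value iteration solves the DP at a fixed z, a power iteration gives the closed-loop stationary mean, and the cautious update damps the fixed-point iteration.

```python
import numpy as np

n, rho, gamma = 101, 0.9, 0.1
xs = np.linspace(0.0, 1.0, n)
R = lambda z: xs * (1.0 + z)               # assumed R(x, z), as a vector in x

P = np.zeros((n, n))                       # Q0(.|x): uniform on [x, 1]
for i in range(n):
    m = xs >= xs[i]
    P[i, m] = 1.0 / m.sum()

def best_response(z):
    """Step 2: value iteration for the discounted DP at a fixed mean field z."""
    V = np.zeros(n)
    while True:
        inact = rho * P @ V + R(z)         # continue under a0
        act = rho * V[0] + R(z) + gamma    # reset to 0 under a1
        Vn = np.minimum(inact, act)
        if np.max(np.abs(Vn - V)) < 1e-10:
            return Vn, act < inact         # policy: True where a1 is chosen
        V = Vn

def stationary_mean(policy):
    """Step 3: mean of the stationary distribution of the closed-loop chain."""
    Pc = P.copy()
    Pc[policy] = 0.0
    Pc[policy, 0] = 1.0                    # rows where the agent acts jump to state 0
    pi = np.full(n, 1.0 / n)
    for _ in range(2000):                  # power iteration; the chain is ergodic
        pi = pi @ Pc
    return pi @ xs

z = 0.5                                    # Step 1: initialize z^(0)
for k in range(500):
    V, a = best_response(z)                # Step 2
    z_new = 0.98 * z + 0.02 * stationary_mean(a)   # Steps 3-4: cautious update
    if abs(z_new - z) < 1e-9:
        break
    z = z_new

print("mean field z =", round(z, 4),
      "threshold =", round(float(xs[a.argmax()]), 4) if a.any() else 1.0)
```

The heavy damping (0.98 vs. 0.02) trades speed for stability; with an aggressive update the iteration can swing back and forth, which is exactly the oscillatory phenomenon discussed under Q6 on slide 29.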

22 Figure: left: V(x) defined on [0, 1] (threshold 0.49); right: search of the threshold over iterates.

23 The comparison theorems use very intuitive ideas; they are used to answer Q3.
First comparison theorem (with R(x, z) = R_1(x) R_2(z)): if z ↓ (i.e., decreasing population threat), then the best response's threshold θ ↑.
Second comparison theorem: if the threshold θ ↑ within (0, 1), then the long-term mean of the resulting regenerative process x_t^{i,θ} ↑ (since there is more chance to grow; think of a lawn under a biweekly cut).

24 Method to prove the second comparison theorem: take the threshold θ ∈ (0, 1). Set x_0^{i,θ} = 0, and let the Markov chain x_t^{i,θ} evolve under inaction until
τ_θ = inf{t | x_t^{i,θ} ≥ θ}.
Reset x_{τ_θ+1}^{i,θ} = 0, so the Markov chain restarts at τ_θ + 1. Denote S_k = Σ_{t=0}^k x_t^{i,θ}. Let S'_k be defined similarly with θ' ∈ (θ, 1). Then
λ_θ := lim_{k→∞} (1/k) Σ_{t=0}^{k−1} x_t^{i,θ} = E S_{τ_θ} / (1 + E τ_θ);
similarly define λ_{θ'}.
The second comparison theorem: for 0 < θ < θ' < 1,
i) E S_{τ_θ} / (1 + E τ_θ) ≤ E S'_{τ_{θ'}} / (1 + E τ_{θ'});
ii) so λ_θ ≤ λ_{θ'}.
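
A renewal-based Monte Carlo sketch of λ_θ = E S_{τ_θ} / (1 + E τ_θ) under the illustrative uniform kernel; raising the threshold raises the long-run mean, as the theorem asserts.

```python
import numpy as np

rng = np.random.default_rng(2)

def lam(theta, ncycles=50000):
    """Estimate lambda_theta = E S_tau / (1 + E tau) over regenerative cycles."""
    total_S, total_len = 0.0, 0
    for _ in range(ncycles):
        x, s, t = 0.0, 0.0, 0          # cycle starts at x_0 = 0
        while x < theta:               # inaction until x_t >= theta
            x = x + (1.0 - x) * rng.uniform()
            t += 1
            s += x                     # S_tau includes the final state x_tau
        total_S += s                   # x_0 = 0 contributes nothing
        total_len += t + 1             # cycle length is tau + 1 (then reset)
    return total_S / total_len

print(lam(0.5), lam(0.8))              # the second value is larger: theta up => mean up
```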

25 A useful lemma for proving the second comparison theorem. Let 0 < r < r' < 1. Consider a Markov process {Y_t, t ≥ 0} with state space [0, 1] and transition kernel Q_Y(·|y), which satisfies Q_Y([y, 1]|y) = 1 for any y ∈ [0, 1] and is stochastically monotone. Suppose Y_0 = y_0 < r. Define the stopping time τ = inf{t | Y_t ≥ r}.
Lemma. If Eτ < ∞, then E Σ_{t=0}^τ Y_t < ∞ and
(E Σ_{t=0}^τ Y_t) / (1 + Eτ) = (E Y_0 + E Y_1 + Σ_{k=1}^∞ E[Y_{k+1} 1{Y_k < r}]) / (2 + Σ_{k=1}^∞ P(Y_k < r)).
Facts: apply the lemma to the regenerative process x_t^{i,θ} over one cycle; show that increasing r = θ to r' = θ' increases the ratio in the lemma.

26 Example (to answer Q4): assume R(x, z) = x(c + z) with c > 0, and let Q_0(·|x) be the distribution of x + (1 − x)ξ, where ξ has the uniform distribution on [0, 1]. Consider all γ > 0. Denote c_* = Eξ/2 = 1/4. Then:
- If γ ∈ (0, ρc/2] ∪ (ρ(c + c_*)/2, ∞), there exists a unique pure strategy as a stationary equilibrium with discount.
- If γ ∈ (ρc/2, ρ(c + c_*)/2], there exists no pure strategy as a stationary equilibrium with discount.
This result can be extended to the general model for R(x, z) and to the long-run average cost case.

27 Q5: check the dependence of the equilibrium solution (z, θ) on γ.
- This is actually a question of comparative statics, formalized by J. R. Hicks (1939) and P. A. Samuelson (1947).
- It studies how a change in the model parameters affects the equilibrium.
- Important in the economics literature, game theory, and optimization; however, most of the literature has considered static models.
- Example of dynamic models: D. Acemoglu and M. K. Jensen (2015), J. Political Econ. (on large dynamic economies).

28 Theorem (monotone comparative statics; Q5). Suppose both γ_1 and γ_2 (effort costs) satisfy (A5), γ_1 < γ_2, and (γ_i, z_i, θ_i) is solved from the MFG equation system. Then θ_1 < θ_2 and z_1 < z_2.
Interpretation: effort being more expensive ⇒ less willingness to act ⇒ worse average state of the population.

29 Q6: instability of the equilibrium solution can occur.
- Take R(x, z) = x(c + R_0(z)), where R_0(z) has a quick leap from a near-zero value to nearly 1 around some value z_c.
- Such an R_0 is not purely artificial; similar phenomena appear in vaccination models, where the group risk decreases sharply toward 0 when the vaccination coverage exceeds a certain critical level.
Figure: iteration of the threshold in best response w.r.t. the mean field; it swings back and forth.

30 Game with long-run average costs: define
J_i(x_0^1, ..., x_0^N, π^1, ..., π^N) = limsup_{T→∞} (1/T) E Σ_{t=0}^{T−1} c(x_t^i, x_t^{(N)}, a_t^i),
where c(x_t^i, x_t^{(N)}, a_t^i) = R(x_t^i, x_t^{(N)}) + γ·1{a_t^i = a_1}.
The solution equation system of the MFG:
λ + h(x) = min{ ∫_0^1 h(y) Q_0(dy|x) + R(x, z),  R(x, z) + γ },
z = ∫_0^1 x ν(dx),
where ν is the limiting distribution of the closed-loop state x_t^i.
Idea: consider discount factor ρ and denote the value function V_ρ(x). Define h_ρ(x) = V_ρ(x) − V_ρ(0). Try to get a subsequence of {h_ρ} converging to h as ρ → 1, under an additional concavity condition on R(·, z) for each fixed z.
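
A numerical sketch of the vanishing-discount idea at a fixed z (same illustrative grid, cost, and kernel as earlier): compute h_ρ = V_ρ − V_ρ(0) by value iteration for ρ approaching 1 and observe that it stabilizes.

```python
import numpy as np

n, gamma, z = 101, 0.1, 0.4
xs = np.linspace(0.0, 1.0, n)
Rv = xs * (1.0 + z)                    # assumed R(x, z) at the fixed mean field z

P = np.zeros((n, n))                   # Q0(.|x): uniform on [x, 1]
for i in range(n):
    m = xs >= xs[i]
    P[i, m] = 1.0 / m.sum()

def h_rho(rho, iters=40000):
    """h_rho(x) = V_rho(x) - V_rho(0), with V_rho from value iteration."""
    V = np.zeros(n)
    for _ in range(iters):
        V = np.minimum(rho * P @ V + Rv, rho * V[0] + Rv + gamma)
    return V - V[0]

for rho in (0.9, 0.99, 0.999):
    print(rho, h_rho(rho)[-1])         # h_rho(1) settles as rho -> 1
```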

31 Further possible extensions:
- Unbounded state space (such as the half line).
- Pairwise coupling in the cost.
- Continuous-time modeling (for example, a compound Poisson process with impulse control; CDC'17, with Zhou); a Lévy process with one-sided jumps (such as a subordinator) also fits the monotone state modeling.

32 Thank you!
