LEARNING IN CONCAVE GAMES


LEARNING IN CONCAVE GAMES
P. Mertikopoulos, French National Center for Scientific Research (CNRS), Laboratoire d'Informatique de Grenoble
GSBE ETBC seminar, Maastricht, October 22, 2015

Context and motivation

Concave games:
- finitely many players
- continuous action spaces
- individually concave payoff functions

Context/applications:
- standard in economics & finance (multi-portfolio optimization, auctions, oligopolies, ...)
- networking (routing, tolling, network economics, ...)
- electrical engineering (wireless communications, electricity grids, ...)

What this talk is about: distributed learning algorithms that allow players to converge to an equilibrium state.

Basic Definitions

A concave game consists of:
- A finite set of players $\mathcal{N} = \{1, \dots, N\}$.
- A compact, convex set of actions $x_k \in \mathcal{X}_k$ per player.
- An individually concave payoff function $u_k \colon \prod_k \mathcal{X}_k \to \mathbb{R}$ per player, i.e. $u_k(x_k; x_{-k})$ is concave in $x_k$ for all $x_{-k} \in \prod_{l \neq k} \mathcal{X}_l$.

Each player seeks to maximize his individual payoff.

Fine print: each $\mathcal{X}_k$ is assumed to live in a finite-dimensional ambient space $\mathcal{V}_k \cong \mathbb{R}^{d_k}$ (no infinite dimensionalities in this talk), and each ambient space $\mathcal{V}_k$ is equipped with a norm $\|\cdot\|$.
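
To make the later convergence statements easy to experiment with, here is a minimal Python sketch of a concave game as a data structure; the class name, the projection-oracle representation of the action sets, and the finite-difference gradient are illustrative choices of mine, not constructs from the talk.

```python
import numpy as np

class ConcaveGame:
    """Minimal container for a concave game: one payoff function and one
    projection oracle (onto the compact convex action set X_k) per player."""

    def __init__(self, payoffs, projections):
        self.payoffs = payoffs          # payoffs[k]: full profile x -> float
        self.projections = projections  # projections[k]: V_k -> X_k
        self.num_players = len(payoffs)

    def payoff_gradient(self, k, x, eps=1e-6):
        """Individual payoff gradient v_k(x): differentiate u_k w.r.t. x_k only,
        by central finite differences, keeping the opponents' profile fixed."""
        g = np.zeros_like(x[k])
        for i in range(len(x[k])):
            xp = [xj.copy() for xj in x]; xm = [xj.copy() for xj in x]
            xp[k][i] += eps; xm[k][i] -= eps
            g[i] = (self.payoffs[k](xp) - self.payoffs[k](xm)) / (2 * eps)
        return g

# Tiny usage: a single player maximizing -||x||^2 over the box [0, 1]^2.
game = ConcaveGame([lambda x: -np.sum(x[0] ** 2)],
                   [lambda z: np.clip(z, 0.0, 1.0)])
print(game.payoff_gradient(0, [np.array([0.3, 0.7])]))   # approx. [-0.6, -1.4]
```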

Example 1: Finite (Affine) Games

A finite game consists of:
- A finite set of players $\mathcal{N} = \{1, \dots, N\}$.
- A finite set of actions $\mathcal{A}_k$ per player.
- Each player's payoff function $u_k \colon \prod_k \mathcal{A}_k \to \mathbb{R}$.

In the mixed extension of a finite game, players can play mixed strategies $x_k \in \Delta(\mathcal{A}_k)$. Corresponding (expected) payoff:

$u_k(x) = \sum_{\alpha_1 \in \mathcal{A}_1} \cdots \sum_{\alpha_N \in \mathcal{A}_N} x_{1,\alpha_1} \cdots x_{N,\alpha_N}\, u_k(\alpha_1, \dots, \alpha_N)$

The mixed strategy space $\mathcal{X}_k = \Delta(\mathcal{A}_k)$ is convex and $u_k(x_k; x_{-k})$ is linear in $x_k$, so mixed extensions of finite games are concave games.
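
In the two-player case the expected-payoff sum collapses to a bilinear form, $u_1(x) = x_1^\top A x_2$ and $u_2(x) = x_1^\top B x_2$. A quick numerical sketch (the matrices below encode the 2x2 coordination game that reappears in the simulations at the end of the talk):

```python
import numpy as np

# Payoff matrices of the 2x2 coordination game (rows: player 1's actions).
A = np.array([[3.0, 0.0], [0.0, 1.0]])   # u_1(alpha_1, alpha_2)
B = np.array([[1.0, 0.0], [0.0, 3.0]])   # u_2(alpha_1, alpha_2)

def expected_payoffs(x1, x2):
    """Expected payoffs in the mixed extension: bilinear in (x1, x2)."""
    return x1 @ A @ x2, x1 @ B @ x2

print(expected_payoffs(np.array([0.5, 0.5]), np.array([0.25, 0.75])))
# (0.75, 1.25)
```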

Example 2: Routing

Consider the following model of Internet congestion:
- Origin nodes $v_k$ generate traffic that must be routed to intended destination nodes $w_k$.
- Set of paths $\mathcal{A}_k$ joining $v_k$ to $w_k$.
- Actions: traffic distributions over the different paths, $\mathcal{X}_k = \{x_k : x_{k\alpha} \geq 0 \text{ and } \sum_{\alpha \in \mathcal{A}_k} x_{k\alpha} = \rho_k\}$.
- Path latency: $\ell_{k\alpha}(x) = \sum_{e \in \alpha} \ell_e(y_e)$, where $y_e = \sum_k \sum_{\alpha \ni e} x_{k\alpha}$ is the total load on link $e$ and $\ell_e(y_e)$ is the induced delay.
- Payoff: $u_k(x) = -\sum_{\alpha \in \mathcal{A}_k} x_{k\alpha}\, \ell_{k\alpha}(x)$.

[Figure: a network with origin nodes $v_1, v_2$, destinations $w_1, w_2$, links A-E, and path flows $x_{1,A}$, $x_{1,BE}$, $x_{2,ED}$, $x_{2,C}$; one link is marked as congested.]

Under standard assumptions for $\ell_e$ (convex, increasing), $u_k$ is concave in $x_k$, so $\mathcal{G} \equiv \mathcal{G}(\mathcal{N}, \mathcal{X}, u)$ is a concave game.
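
A minimal sketch of the congestion model, assuming a made-up network of two parallel links with affine delays (the topology and the delay coefficients are mine, not the network in the figure):

```python
import numpy as np

# One player routing rho units over two parallel links with affine delays
# l_e(y) = a_e * y + b_e (convex and increasing, as the slide assumes).
a = np.array([1.0, 2.0])
b = np.array([0.0, 1.0])
rho = 1.0

def payoff(x):
    """u(x) = - sum_alpha x_alpha * l_alpha(x); here each path is a single
    link, so the path latency equals the link delay at load y_e = x_e."""
    y = x                      # total link loads (single player, parallel links)
    latencies = a * y + b
    return -np.dot(x, latencies)

print(payoff(np.array([0.75, 0.25])))   # -0.9375 for this feasible split
```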

Nash Equilibrium and Payoff Gradients

A Nash equilibrium is an action profile $x^* \in \mathcal{X}$ such that

$u_k(x_k^*; x_{-k}^*) \geq u_k(x_k; x_{-k}^*)$ for every unilateral deviation $x_k \in \mathcal{X}_k$, $k \in \mathcal{N}$.

There is no direction that could unilaterally increase a player's payoff.

Alternative characterization: consider the individual payoff gradient of player $k$:

$v_k(x) = \nabla_k u_k(x) \equiv \nabla_{x_k} u_k(x_k; x_{-k})$

(differentiation taken only w.r.t. $x_k$; the opponents' profile $x_{-k}$ is kept fixed). Since $u_k$ is concave in $x_k$, $x^*$ is an equilibrium if and only if

$\langle v_k(x^*), z_k \rangle \leq 0$ for every tangent vector $z_k \in \operatorname{TC}_k(x_k^*)$, $k \in \mathcal{N}$.

Fine print: the actions $x_k \in \mathcal{X}_k \subseteq \mathcal{V}_k$ are treated as primal variables; the payoff gradients $v_k \in \mathcal{V}_k^*$ are treated as dual vectors and assumed Lipschitz continuous.
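
A small worked check of the gradient characterization, on a hypothetical quadratic game whose equilibrium sits at a corner of the action space (all payoffs below are invented for the example):

```python
import numpy as np

# Hypothetical quadratic game on [0,1]^2:
#   u_k(x) = 2*x_k - x_k**2 + x_k * x_other   (concave in x_k)
def v(x):
    """Individual payoff gradients v_k(x) = d u_k / d x_k."""
    return np.array([2 - 2*x[0] + x[1], 2 - 2*x[1] + x[0]])

x_star = np.array([1.0, 1.0])        # both best responses saturate at 1
g = v(x_star)                        # = [1, 1], points outward at the corner
# Tangent vectors at x_k = 1 are z_k <= 0, so <v_k(x*), z_k> = g[k]*z_k <= 0,
# with strict inequality for z_k != 0: x* is a (strict) Nash equilibrium.
print(g, all(g > 0))
```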

Equilibrium Existence and Uniqueness

Every concave game admits a Nash equilibrium (Debreu, 1952; Rosen, 1965). What about uniqueness?

Theorem (Rosen, 1965). Suppose that the players' payoff gradients satisfy the monotonicity property

$\sum_k \lambda_k \langle v_k(x') - v_k(x), x_k' - x_k \rangle < 0$ (R)

for some $\lambda > 0$ and for all $x \neq x' \in \mathcal{X}$. Then the game admits a unique Nash equilibrium.

Rosen (1965) calls this condition diagonal strict concavity. Define the $\lambda$-weighted Hessian $H(x; \lambda) = (H_{jk}(x; \lambda))_{j,k}$ of the game as

$H_{jk}(x; \lambda) = \lambda_j \nabla_k v_j(x) + \lambda_k (\nabla_j v_k(x))^\top$.

If $H(x; \lambda) \prec 0$ for all $x \in \mathcal{X}$, the game admits a unique equilibrium.
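
One might test Rosen's condition numerically by assembling the $\lambda$-weighted Hessian at sample points and checking negative definiteness; a sketch for the quadratic game from the previous example (whose gradient Jacobian happens to be constant, so a single sample suffices):

```python
import numpy as np

# Jacobian of the individual payoff gradients for the quadratic game
#   u_k(x) = 2*x_k - x_k**2 + x_k * x_other.
def grad_jacobian(x):
    return np.array([[-2.0,  1.0],
                     [ 1.0, -2.0]])   # row j: gradient of v_j w.r.t. (x_1, x_2)

lam = np.array([1.0, 1.0])            # Rosen weights lambda_k > 0
J = grad_jacobian(np.array([0.5, 0.5]))
H = lam[:, None] * J + (lam[:, None] * J).T   # lambda-weighted Hessian
print(np.linalg.eigvalsh(H))          # [-6, -2]: H < 0, diagonal strict concavity
```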

Learning via Payoff Gradient Ascent

Pavlov's dog reaction to improving one's payoffs: ascend the payoff gradient

$x \leftarrow x + \gamma\, v(x)$,

where $\gamma$ is a (possibly variable) step-size parameter.

Problem: we must respect the players' action constraints ($x \in \mathcal{X}$).

To do that, rewrite the gradient ascent process as an update in an auxiliary variable and project:

$y \leftarrow y + \gamma\, v(x)$, $\quad x \leftarrow \arg\min_{x' \in \mathcal{X}} \|y - x'\|_2$.

Equivalently, the projection can be written in variational form:

$x \leftarrow \arg\max_{x' \in \mathcal{X}} \{\langle y, x' \rangle - \tfrac{1}{2}\|x'\|_2^2\}$.

More generally, regularize:

$y \leftarrow y + \gamma\, v(x)$, $\quad x \leftarrow \arg\max_{x' \in \mathcal{X}} \{\langle y, x' \rangle - h(x')\}$,

where the penalty function (or regularizer) $h \colon \mathcal{X} \to \mathbb{R}$ is smooth and strongly convex:

$h(tx + (1-t)x') \leq t\, h(x) + (1-t)\, h(x') - \tfrac{1}{2} K t (1-t) \|x - x'\|^2$

for some $K > 0$ and for all $t \in [0, 1]$, $x, x' \in \mathcal{X}$.

Examples

1. The quadratic penalty $h(x) = \tfrac{1}{2} \sum_\alpha x_\alpha^2 = \tfrac{1}{2} \|x\|_2^2$ gives the Euclidean projection

$\Pi(y) = \arg\max_{x \in \mathcal{X}} \{\langle y, x \rangle - \tfrac{1}{2}\|x\|_2^2\} = \arg\min_{x \in \mathcal{X}} \|y - x\|_2^2$.

2. If $\mathcal{X} = \Delta^d$ (the unit simplex), the (negative) Gibbs entropy $h(x) = \sum_\alpha x_\alpha \log x_\alpha$ gives the logit map

$G(y) = \dfrac{(\exp(y_1), \dots, \exp(y_d))}{\sum_{\alpha=1}^d \exp(y_\alpha)}$.

3. If $\mathcal{X} = \{X : X \succeq 0,\ \operatorname{tr}(X) \leq 1\}$, the von Neumann entropy $h(X) = \operatorname{tr}[X \log X]$ gives

$Q(Y) = \dfrac{\exp(Y)}{1 + \operatorname{tr}[\exp(Y)]}$.

etc.

Important: if $\|dh(x)\| \to \infty$ as $x \to \operatorname{bd}(\mathcal{X})$, we say that $h$ is steep. Steep penalty functions induce interior-point methods: $\operatorname{im} Q = \operatorname{rel\,int} \mathcal{X}$.
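
Minimal implementations of the first two choice maps; the sorting-based simplex projection is a standard algorithm I am assuming here, since the talk does not specify one:

```python
import numpy as np

def logit_map(y):
    """Choice map induced by the negative Gibbs entropy on the simplex."""
    z = np.exp(y - y.max())          # max-shift for numerical stability
    return z / z.sum()

def euclidean_projection_simplex(y):
    """Choice map induced by the quadratic penalty h(x) = ||x||^2 / 2 when
    X is the simplex: Euclidean projection, computed via sorting."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / np.arange(1, len(y) + 1) > 0)[0][-1]
    tau = (css[rho] - 1) / (rho + 1)
    return np.maximum(y - tau, 0)

y = np.array([1.0, 0.2, -0.5])
print(logit_map(y))                     # interior point (steep penalty)
print(euclidean_projection_simplex(y))  # [0.9, 0.1, 0.0]: boundary (non-steep)
```

Note how the two maps differ exactly as the steepness remark predicts: the logit map never leaves the relative interior of the simplex, while the Euclidean projection can land on the boundary.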

Learning via Mirror Descent

Multi-agent mirror descent:

$y_k(n+1) = y_k(n) + \gamma_n v_k(x(n))$
$x_k(n+1) = Q_k(y_k(n+1))$ (MD)

where $\gamma_n$ is a variable step-size and the choice map $Q_k$ is defined as

$Q_k(y_k) = \arg\max_{x_k \in \mathcal{X}_k} \{\langle y_k, x_k \rangle - h_k(x_k)\}$.

Long history in optimization (Nemirovski, Yudin, Nesterov, Juditsky, Beck, Teboulle, ...) and, more recently, in machine learning (Shalev-Shwartz, ...). Well understood for single-agent problems.

Multi-agent problems (games): ???
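
Before the game-theoretic analysis, a sketch of the (MD) loop itself, instantiated with the entropic (logit) choice map; the step-size $\gamma_n = \gamma_0/n$ is one choice compatible with the step-size conditions used in the convergence results below:

```python
import numpy as np

def logit(y):
    """Entropic choice map Q(y) on the simplex (softmax)."""
    z = np.exp(y - y.max())
    return z / z.sum()

def mirror_descent(v, y0, num_iters=5000, gamma0=1.0):
    """Multi-agent mirror descent (MD): gradient step in the dual (score)
    variables y_k, then the choice map Q_k back to the strategy space.

    v(x): list of individual payoff gradients v_k(x); y0: initial scores."""
    y = [yk.copy() for yk in y0]
    for n in range(1, num_iters + 1):
        x = [logit(yk) for yk in y]            # x_k(n) = Q_k(y_k(n))
        gamma = gamma0 / n                     # variable step-size
        y = [yk + gamma * gk for yk, gk in zip(y, v(x))]
    return [logit(yk) for yk in y]

# The 2x2 coordination game from Example 1; payoff gradients: A x_2, B^T x_1.
A = np.array([[3.0, 0.0], [0.0, 1.0]]); B = np.array([[1.0, 0.0], [0.0, 3.0]])
v = lambda x: [A @ x[1], B.T @ x[0]]
# A slight initial bias breaks the game's symmetry; the iterates then settle
# on the strict equilibrium where both players choose their first action.
print(mirror_descent(v, [np.array([0.5, 0.0]), np.zeros(2)]))
```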

Variational Stability

No uncoupled dynamics can always lead to equilibrium (Hart and Mas-Colell, 2003), so the convergence target must be refined.

Definition. We say that $x^* \in \mathcal{X}$ is variationally stable if

$\sum_k \lambda_k \langle v_k(x), x_k - x_k^* \rangle < 0$

for some $\lambda > 0$ and for all $x \neq x^*$ in a neighborhood $U$ of $x^*$. If $U = \mathcal{X}$, $x^*$ will be called globally variationally stable.

Contrast with Nash equilibrium (which variational stability refines): Nash only requires $\sum_k \lambda_k \langle v_k(x^*), x_k - x_k^* \rangle \leq 0$, with the gradient evaluated at $x^*$ rather than at $x$.

Compare with the notion of (Taylor) evolutionary stability in multi-population games: $\sum_k \langle v_k(x), x_k - x_k^* \rangle < 0$ for all $x \neq x^*$ near $x^*$.

Global/local ESSs are globally/locally variationally stable. Rosen's condition implies global variational stability: by (R), $\sum_k \lambda_k \langle v_k(x), x_k - x_k^* \rangle < \sum_k \lambda_k \langle v_k(x^*), x_k - x_k^* \rangle \leq 0$ for all $x \neq x^*$.

Strict Nash Equilibrium

Recall: a Nash equilibrium is an action profile $x^* \in \mathcal{X}$ such that $\langle v_k(x^*), z_k \rangle \leq 0$ for every tangent vector $z_k \in \operatorname{TC}_k(x_k^*)$, $k \in \mathcal{N}$.

Definition. $x^*$ is called strict if the above inequality is strict for every nonzero $z_k \in \operatorname{TC}_k(x_k^*)$, $k \in \mathcal{N}$.

Some basics:
- Generalizes the notion of strict equilibrium in finite games (pure strategies, no payoff ties).
- If $x^*$ is a strict equilibrium, then $x^*$ is variationally stable.
- If $x^*$ is a strict equilibrium, then it is a corner of $\mathcal{X}$ (i.e. the tangent cone $\operatorname{TC}(x^*)$ of $\mathcal{X}$ at $x^*$ does not contain any lines).

Local convergence

Proposition. Suppose that (MD) is run with a small enough step-size $\gamma_n$ such that $\sum_{j=1}^{\infty} \gamma_j = \infty$ and $\sum_{j=1}^{n} \gamma_j^2 \big/ \sum_{j=1}^{n} \gamma_j \to 0$. If $x(n) \to x^*$, then $x^*$ is a Nash equilibrium of the game.

Theorem. Suppose that $x^* \in \mathcal{X}$ is variationally stable and (MD) is run under the same step-size conditions as above. Then $x^*$ is locally attracting.

Corollary. Strict equilibria are locally attracting.

Proposition. Assume that $x^* \in \mathcal{X}$ is a strict equilibrium with $x^* \in \operatorname{dom} \partial h \cap \operatorname{im} Q$ (i.e. $h$ is not steep at $x^*$). Then convergence to $x^*$ occurs after a finite number of iterations.

Global convergence

Theorem. Suppose that $x^* \in \mathcal{X}$ is globally variationally stable and the algorithm's step-size sequence $\gamma_j$ satisfies $\sum_{j=1}^{\infty} \gamma_j = \infty$ and $\sum_{j=1}^{n} \gamma_j^2 \big/ \sum_{j=1}^{n} \gamma_j \to 0$. Then $x(n) \to x^*$ for every initialization of (MD).

Corollary. If Rosen's condition holds, players converge to equilibrium from any initial condition.

Proof idea:
- $x(n)$ is an asymptotic pseudo-trajectory of the continuous-time dynamics $\dot{y} = v(Q(y))$.
- A (global) Lyapunov function is given by the ($\lambda$-weighted) Fenchel coupling $F(y) = \sum_k \lambda_k \big[ h_k(x_k^*) + h_k^*(y_k) - \langle y_k, x_k^* \rangle \big]$, where $h_k^*$ denotes the convex conjugate of $h_k$.
- Standard stochastic approximation results do not suffice for convergence of $x(n)$: show directly that $x(n)$ visits a neighborhood of $x^*$ infinitely often, then use Benaïm's theory of attractors on the flow induced by $Q$ on $\mathcal{X}$.
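
For intuition, the one-step estimate behind the Lyapunov argument can be sketched as follows; this is the standard mirror-descent bound under $K$-strong convexity of the $h_k$, my paraphrase rather than a formula from the slides:

```latex
% One-step evolution of the Fenchel coupling under (MD), assuming each h_k
% is K-strongly convex (so each conjugate h_k^* is (1/K)-smooth):
F(y(n+1)) \le F(y(n))
  + \gamma_n \sum_k \lambda_k \langle v_k(x(n)),\, x_k(n) - x_k^* \rangle
  + \frac{\gamma_n^2}{2K} \sum_k \lambda_k \| v_k(x(n)) \|_*^2 .
% Variational stability makes the first-order term negative, while the
% step-size conditions keep the accumulated quadratic error term small.
```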

Learning with Imperfect Information

The above analysis relies on perfect observations of the payoff gradients $v_k(x)$. In finite games, it is not too hard to deduce $u_k(\alpha_k; \alpha_{-k})$ for every action $\alpha_k \in \mathcal{A}_k$ given a fixed action profile $\alpha_{-k}$ of one's opponents. However, knowing $u_k(x_k; x_{-k})$ is much more demanding (because of mixing).

Imperfect feedback: players only have access to noisy estimates of their payoff gradients, i.e.

$\hat{v}_k(n) = v_k(x(n)) + z_k(n)$.

Statistical hypotheses for the noise process $z_k(n)$:
(H1) Unbiasedness: $\mathbb{E}[z(n+1) \mid \mathcal{F}_n] = 0$.
(H2) Finite mean squared error: $\mathbb{E}[\|z(n+1)\|^2 \mid \mathcal{F}_n] < \infty$.
(H2+) Bounded errors: $\sup_n \|z(n)\| < \infty$ (a.s.).
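
A sketch of (MD) driven by a noisy first-order oracle satisfying (H1) and (H2), reusing the coordination game from before; the Gaussian noise model and its scale are illustrative assumptions:

```python
import numpy as np

def logit(y):
    z = np.exp(y - y.max())
    return z / z.sum()

rng = np.random.default_rng(0)
A = np.array([[3.0, 0.0], [0.0, 1.0]]); B = np.array([[1.0, 0.0], [0.0, 3.0]])

def noisy_gradients(x, sigma=0.5):
    """v_hat_k(n) = v_k(x(n)) + z_k(n): zero-mean Gaussian noise satisfies
    (H1) unbiasedness and (H2) finite mean squared error."""
    return [A @ x[1] + sigma * rng.standard_normal(2),
            B.T @ x[0] + sigma * rng.standard_normal(2)]

y = [np.array([0.5, 0.0]), np.zeros(2)]
for n in range(1, 5001):
    x = [logit(yk) for yk in y]
    gamma = 1.0 / n                    # sum = infinity, sum of squares finite
    y = [yk + gamma * gk for yk, gk in zip(y, noisy_gradients(x))]
print([logit(yk) for yk in y])         # approaches a Nash equilibrium (a.s.)
```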

Convergence Analysis

Run (MD) with steep penalty functions and a small enough step-size $\gamma_n$ such that $\sum_{n=1}^{\infty} \gamma_n^2 < \infty$ and $\sum_{n=1}^{\infty} \gamma_n = \infty$.

Theorem. Suppose that $x^*$ is globally variationally stable and (H1), (H2) hold. Then $x(n)$ converges to $x^*$ (a.s.) for every initialization of (MD).

Theorem. Suppose that $x^*$ is variationally stable and (H1), (H2) hold. Then, for every $\varepsilon > 0$, there exists a neighborhood $U$ of $x^*$ such that

$\mathbb{P}\big(\lim_{n \to \infty} x(n) = x^* \mid x(0) \in U\big) \geq 1 - \varepsilon$,

i.e. $x^*$ attracts all nearby initializations of (MD) with high probability. Under (H2+), the above also holds for $\varepsilon = 0$.

Applications to finite games

Suppose that the players repeatedly play a finite game $\mathcal{G} \equiv \mathcal{G}(\mathcal{N}, \mathcal{A}, u)$:
1. At stage $n+1$, each player selects an action $\alpha_k(n+1) \in \mathcal{A}_k$ based on a mixed strategy $x_k(n) \in \mathcal{X}_k$.
2. Players estimate (noisily) the payoff of each of their actions: $\hat{v}_{k\alpha}(n+1) = u_k(\alpha; \alpha_{-k}(n+1)) + z_{k\alpha}(n+1)$ for all $\alpha \in \mathcal{A}_k$.
3. Players update their mixed strategies using (MD) and the process repeats (see the sketch below).

Corollary. With assumptions as above, strict equilibria are locally attracting with high probability.
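
A sketch of this repeated-play scheme with the entropic regularizer, under which (MD) becomes an exponential-weights update; here the noise $z_{k\alpha}$ comes entirely from sampling the opponents' realized actions, which already yields an unbiased, bounded estimate of the mixed payoff vector $v_k(x)$:

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[3.0, 0.0], [0.0, 1.0]]); B = np.array([[1.0, 0.0], [0.0, 3.0]])

def logit(y):
    z = np.exp(y - y.max())
    return z / z.sum()

y1, y2 = np.array([0.2, 0.0]), np.zeros(2)   # small bias to break symmetry
for n in range(1, 20001):
    x1, x2 = logit(y1), logit(y2)
    # Step 1: sample realized actions from the current mixed strategies.
    a1 = rng.choice(2, p=x1); a2 = rng.choice(2, p=x2)
    # Step 2: payoff of each own action against the opponent's realized action
    # (an unbiased estimate of the mixed payoff vector v_k(x)).
    v1_hat = A[:, a2]
    v2_hat = B[a1, :]
    # Step 3: mirror-descent (here: exponential-weights) update of the scores.
    gamma = 1.0 / n
    y1 = y1 + gamma * v1_hat
    y2 = y2 + gamma * v2_hat
print(logit(y1), logit(y2))   # concentrates on a strict equilibrium w.h.p.
```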

Convergence in a Potential Game

[Figure: snapshots at stages n = 1, 3, 5, 8, 10, 20, 50 of (MD) trajectories over the mixed-strategy square $[0,1]^2$ of the 2x2 potential game with payoff bimatrix ((3,1), (0,0); (0,0), (1,3)); the trajectories converge to equilibrium.]

Perspectives

- Rates of convergence? Doable for the empirical frequencies of play; less so for the last iterate.
- Coupled constraints on the players' action sets? Possible, but distributedness is lost (because of the constraint coupling).
- Observations of realized payoffs only (a single call to $u_k$ instead of $v_k$)? Standard estimators often fail because of infinite variance. This is not a problem in single-agent online learning, so perhaps not in the multi-agent case either? Two-time-scale stochastic approximation can help (at the cost of convergence speed).