LEARNING IN CONCAVE GAMES
P. Mertikopoulos
French National Center for Scientific Research (CNRS), Laboratoire d'Informatique de Grenoble
GSBE ETBC seminar, Maastricht, October 22, 2015
Motivation and Preliminaries: Learning Perspectives

Context and motivation. Concave games:
- finitely many players
- continuous action spaces
- individually concave payoff functions

Context/applications:
- Standard in economics & finance (multi-portfolio optimization, auctions, oligopolies, ...)
- Networking (routing, tolling, network economics, ...)
- Electrical engineering (wireless communications, electricity grids, ...)

What this talk is about: distributed learning algorithms that allow players to converge to an equilibrium state.
Basic Definitions

A concave game consists of:
- A finite set of players N = {1, ..., N}.
- A compact, convex set of actions X_k ∋ x_k per player.
- An individually concave payoff function u_k : ∏_l X_l → R per player, i.e. u_k(x_k; x_{-k}) is concave in x_k for all x_{-k} ∈ ∏_{l≠k} X_l.

Each player seeks to maximize his individual payoff.

Fine print:
- Each X_k is assumed to live in a finite-dimensional ambient space V_k ≅ R^{d_k}. (No infinite dimensionalities in this talk.)
- Each ambient space V_k is equipped with a norm ‖·‖.
Example 1: Finite (Affine) Games

A finite game consists of:
- A finite set of players N = {1, ..., N}.
- A finite set of actions A_k ∋ α_k per player.
- Each player's payoff function u_k : ∏_l A_l → R.

In the mixed extension of a finite game, players can play mixed strategies x_k ∈ Δ(A_k). Corresponding (expected) payoff:

  u_k(x) = Σ_{α_1 ∈ A_1} ... Σ_{α_N ∈ A_N} x_{1,α_1} ... x_{N,α_N} u_k(α_1, ..., α_N)

The mixed strategy space X_k = Δ(A_k) is convex and u_k(x_k; x_{-k}) is linear in x_k ⇒ mixed extensions of finite games are concave games.
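As a quick sanity check, the expected payoff of the mixed extension can be computed by contracting the payoff tensor against each player's mixed strategy in turn. A minimal sketch (the 2×2 payoff matrix `U1` is an assumed example, not from the talk):

```python
import numpy as np

def expected_payoff(u_k, strategies):
    """Expected payoff of one player in the mixed extension:
    u_k(x) = sum over action profiles of prod_j x_{j,alpha_j} * u_k(alpha).
    u_k: payoff tensor of shape (|A_1|, ..., |A_N|);
    strategies: one mixed-strategy (probability) vector per player."""
    out = np.asarray(u_k, dtype=float)
    for x in reversed(strategies):   # contract one player's axis at a time
        out = out @ np.asarray(x)    # removes the last remaining axis
    return float(out)

# Assumed 2x2 example: row player's payoffs in a coordination game
U1 = [[3.0, 0.0],
      [0.0, 1.0]]

# Pure profiles recover the underlying finite game:
assert expected_payoff(U1, [[1, 0], [1, 0]]) == 3.0
# Uniform mixing: (3 + 0 + 0 + 1) / 4 = 1.0
assert expected_payoff(U1, [[0.5, 0.5], [0.5, 0.5]]) == 1.0
```

Since u_k(x) is multilinear, it is in particular linear (hence concave) in each player's own strategy, which is exactly the concavity requirement above.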
Example 2: Routing

Consider the following model of Internet congestion:
- Origin nodes (v_k) generate traffic that must be routed to intended destination nodes (w_k).
- Set of paths A_k joining v_k to w_k.
- Actions: traffic distributions over the different paths: X_k = {x_k : x_{kα} ≥ 0 and Σ_{α ∈ A_k} x_{kα} = ρ_k}.
- Path latency: ℓ_{kα}(x) = Σ_{r ∈ α} ℓ_r(y_r), where y_e = Σ_k Σ_{α ∋ e} x_{kα} is the total load on link e and ℓ_e(y_e) is the induced delay.
- Payoff: u_k(x) = -Σ_{α ∈ A_k} x_{kα} ℓ_{kα}(x).

[Figure: network with origins v_1, v_2, destinations w_1, w_2, links A-E, per-path flows x_{1,A}, x_{1,BE}, x_{2,ED}, x_{2,C}; one link is congested.]

Under standard assumptions on ℓ_e (convex, increasing), u_k is concave in x_k ⇒ G(N, X, u) is a concave game.
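The payoff formula above can be sketched in code. This is a hypothetical two-path, two-link network with assumed affine link delays ℓ_e(y) = a_e + b_e y, not the exact topology of the figure:

```python
import numpy as np

# Assumed network: one player (k = 0) routes traffic over two parallel
# links, "top" and "bottom", each with an affine delay l_e(y) = a_e + b_e*y.
links = {"top": (1.0, 2.0), "bottom": (2.0, 1.0)}   # (a_e, b_e), assumed
paths = {0: [["top"], ["bottom"]]}                   # player 0's two paths

def payoff(player, x):
    """u_k(x) = -sum_alpha x_{k,alpha} * l_{k,alpha}(x): negative total delay.
    x maps each player k to its traffic distribution over paths in A_k."""
    # Total load y_e on each link, aggregated over all players and paths
    load = {e: 0.0 for e in links}
    for k, xs in x.items():
        for alpha, xka in enumerate(xs):
            for e in paths[k][alpha]:
                load[e] += xka
    # Weighted path latencies for the given player
    total = 0.0
    for alpha, xka in enumerate(x[player]):
        delay = sum(links[e][0] + links[e][1] * load[e]
                    for e in paths[player][alpha])
        total += xka * delay
    return -total

# All traffic on "top": delay 1 + 2*1 = 3, so payoff -3
assert payoff(0, {0: [1.0, 0.0]}) == -3.0
# Split 0.5/0.5: delays 2.0 and 2.5, so payoff -(0.5*2 + 0.5*2.5) = -2.25
assert payoff(0, {0: [0.5, 0.5]}) == -2.25
```

With affine (hence convex, increasing) delays, each u_k is concave in the player's own flow vector, as the slide asserts.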
Nash Equilibrium and Payoff Gradients

A Nash equilibrium is an action profile x* ∈ X such that

  u_k(x*_k; x*_{-k}) ≥ u_k(x_k; x*_{-k}) for every unilateral deviation x_k ∈ X_k, k ∈ N.

There is no direction that could unilaterally increase a player's payoff.

Alternative characterization: consider the individual payoff gradient of player k:

  v_k(x) = ∇_k u_k(x) ≡ ∇_{x_k} u_k(x_k; x_{-k})

(differentiation is taken only w.r.t. x_k; the opponents' profile x_{-k} is kept fixed). Since u_k is concave in x_k, x* is an equilibrium if and only if

  ⟨v_k(x*), z_k⟩ ≤ 0 for every tangent vector z_k ∈ TC_k(x*_k), k ∈ N.

Fine print: the x_k ∈ X_k ⊆ V_k are treated as primal variables; the payoff gradients v_k ∈ V*_k are treated as duals and assumed Lipschitz.
Equilibrium Existence and Uniqueness

Every concave game admits a Nash equilibrium (Debreu, 1952; Rosen, 1965). What about uniqueness?

Theorem (Rosen, 1965). Suppose that the players' payoff gradients satisfy the monotonicity property

  Σ_k λ_k ⟨v_k(x′) - v_k(x), x′_k - x_k⟩ < 0    (R)

for some λ > 0 and for all x ≠ x′ ∈ X. Then, the game admits a unique Nash equilibrium.

Rosen (1965) calls this condition diagonal strict concavity. Define the λ-weighted Hessian H(x; λ) = (H_{jk}(x; λ))_{j,k} of the game as

  H_{jk}(x; λ) = λ_j ∇_k v_j(x) + λ_k (∇_j v_k(x))^T.

If H(x; λ) ≺ 0 for all x ∈ X, the game admits a unique equilibrium.
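The Hessian criterion can be checked numerically for small games. A sketch on an assumed example (a linear Cournot duopoly with payoffs u_k(x) = x_k(a - x_1 - x_2) - c x_k, so v_k(x) = a - c - 2 x_k - x_{-k}; not a game from the talk):

```python
import numpy as np

# Assumed example: linear Cournot duopoly, v_k(x) = a - c - 2 x_k - x_{-k}.
lam = np.array([1.0, 1.0])          # weights lambda_k (all ones here)

# Jacobian of the payoff gradients: Dv[j, k] = d v_j / d x_k (constant here)
Dv = np.array([[-2.0, -1.0],
               [-1.0, -2.0]])

# Weighted Hessian of the game: H_{jk} = lam_j * Dv[j, k] + lam_k * Dv[k, j]
H = lam[:, None] * Dv + lam[None, :] * Dv.T

eigs = np.linalg.eigvalsh(H)        # symmetric matrix, real eigenvalues
assert np.all(eigs < 0)             # H(x; lam) < 0 => unique Nash equilibrium
```

Here H = [[-4, -2], [-2, -4]] is negative definite (eigenvalues -2 and -6), so the duopoly satisfies diagonal strict concavity and has a unique equilibrium.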
Learning via Payoff Gradient Ascent

Pavlov's dog reaction to improving one's payoffs: ascend the payoff gradient

  x ← x + γ v(x),

where γ is a (possibly variable) step-size parameter. Problem: this must respect the players' action constraints (x ∈ X). To do that, rewrite the gradient ascent process and regularize:

  y ← y + γ v(x),
  x ← arg max_{x′ ∈ X} {⟨y, x′⟩ - h(x′)},

where the penalty function (or regularizer) h : X → R is smooth and strongly convex:

  h(tx + (1 - t)x′) ≤ t h(x) + (1 - t) h(x′) - (1/2) K t(1 - t) ‖x - x′‖²

for some K > 0 and for all t ∈ [0, 1], x, x′ ∈ X.
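With the quadratic penalty, the arg max step reduces to a Euclidean projection; on a box it is a coordinate-wise clip. A minimal single-player sketch (the concave payoff u(x) = -(x_1 - 2)² - (x_2 + 1)² and the box X = [0, 1]² are assumed examples; the constrained maximizer is (1, 0)):

```python
import numpy as np

def v(x):
    """Payoff gradient of u(x) = -(x1 - 2)^2 - (x2 + 1)^2 (assumed example)."""
    return np.array([-2.0 * (x[0] - 2.0), -2.0 * (x[1] + 1.0)])

x = np.array([0.5, 0.5])
gamma = 0.1
for _ in range(200):
    y = x + gamma * v(x)        # gradient ascent step
    x = np.clip(y, 0.0, 1.0)    # Euclidean projection onto the box [0, 1]^2

assert np.allclose(x, [1.0, 0.0], atol=1e-6)   # constrained maximizer
```

The unconstrained maximizer (2, -1) lies outside the box, so the projection step is what keeps the iterates feasible; the process converges to the boundary point (1, 0).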
Examples

1. The quadratic penalty h(x) = (1/2) Σ_α x_α² = (1/2) ‖x‖₂² gives the Euclidean projection

  Π(y) = arg max_{x ∈ X} {⟨y, x⟩ - (1/2) ‖x‖₂²} = arg min_{x ∈ X} ‖y - x‖₂².

2. If X = Δ_d (the d-dimensional simplex), the (negative) Gibbs entropy h(x) = Σ_α x_α log x_α gives the logit map

  G(y) = (exp(y_1), ..., exp(y_d)) / Σ_{α=1}^d exp(y_α).

3. If X = {X : X ⪰ 0, tr(X) ≤ 1}, the von Neumann entropy h(X) = tr[X log X] gives

  Q(Y) = exp(Y) / (1 + tr[exp(Y)]),

etc.

Important: if ‖dh(x)‖ → ∞ as x → bd(X), we say that h is steep. Steep penalty functions induce interior-point methods: im Q = rel int X.
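The logit map of Example 2 is easy to verify numerically: its output is a point of the relative interior of the simplex (h is steep), and it maximizes ⟨y, x⟩ - h(x). A sketch that checks the closed form against brute-force grid search (the test vector and grid resolution are arbitrary choices):

```python
import numpy as np

def logit(y):
    """Entropic choice map on the simplex: maximizes <y, x> - sum x_a log x_a."""
    z = np.exp(y - np.max(y))      # subtract max for numerical stability
    return z / z.sum()

y = np.array([1.0, 2.0, 0.0])
x = logit(y)
assert np.isclose(x.sum(), 1.0) and np.all(x > 0)   # interior point: h is steep

# Brute-force check on a grid over the 2-simplex
def objective(p):
    return y @ np.array(p) - sum(q * np.log(q) for q in p if q > 0)

best = max(((a, b, 1 - a - b)
            for a in np.linspace(0, 1, 101)
            for b in np.linspace(0, 1 - a, 101)
            if 1 - a - b >= 0),
           key=objective)
assert np.allclose(x, best, atol=0.05)   # grid max agrees up to grid resolution
```

Note that even for y with a zero or negative component, the output assigns strictly positive probability everywhere, which is exactly the interior-point behavior of steep penalties.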
Learning via Mirror Descent

Multi-agent mirror descent:

  y_k(n + 1) = y_k(n) + γ_n v_k(x(n))
  x_k(n + 1) = Q_k(y_k(n + 1))        (MD)

where γ_n is a variable step-size and the choice map Q_k is defined as

  Q_k(y_k) = arg max_{x_k ∈ X_k} {⟨y_k, x_k⟩ - h_k(x_k)}.

- Long history in optimization (Nemirovski, Yudin, Nesterov, Juditski, Beck, Teboulle, ...) and, more recently, in machine learning (Shalev-Shwartz, ...).
- Well understood for single-agent problems.
- Multi-agent problems (games): ???
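A minimal sketch of (MD) with Euclidean penalties on an assumed monotone game: a linear Cournot duopoly with v_k(x) = a - c - 2 x_k - x_{-k} on X_k = [0, 5], whose unique equilibrium is x* = ((a-c)/3, (a-c)/3) = (3, 3). The game and its parameters are illustration choices, not from the talk:

```python
import numpy as np

a, c = 10.0, 1.0

def v(x):
    """Payoff gradients of the assumed Cournot duopoly."""
    return np.array([a - c - 2 * x[0] - x[1],
                     a - c - 2 * x[1] - x[0]])

def Q(y):
    """Euclidean choice map on [0, 5]: argmax <y, x> - x^2/2 is clip(y, 0, 5)."""
    return np.clip(y, 0.0, 5.0)

y = np.zeros(2)          # dual (score) variables, aggregating gradient steps
x = Q(y)
for n in range(1, 2001):
    gamma = 1.0 / n      # sum gamma_n diverges while gamma_n -> 0
    y = y + gamma * v(x)
    x = Q(y)

assert np.allclose(x, [3.0, 3.0], atol=1e-2)   # converges to the equilibrium
```

Since this game satisfies Rosen's condition (its weighted Hessian is negative definite), global convergence of (MD) is exactly what the theory below predicts.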
Variational Stability

No uncoupled dynamics can always lead to equilibrium (Hart and Mas-Colell, 2003) ⇒ must refine the convergence target.

Definition. We say that x* ∈ X is variationally stable if

  Σ_k λ_k ⟨v_k(x), x_k - x*_k⟩ < 0

for some λ > 0 and for all x ≠ x* in a neighborhood U of x*. If U = X, x* will be called globally variationally stable.

- Contrast with the Nash equilibrium condition (which variational stability refines): Σ_k λ_k ⟨v_k(x*), x_k - x*_k⟩ ≤ 0, where the gradient is evaluated at x* rather than at the deviating point x.
- Compare with the notion of (Taylor) evolutionary stability in multi-population games: Σ_k ⟨v_k(x), x_k - x*_k⟩ < 0 for all x ≠ x* near x*.
- Global/local ESSs are globally/locally variationally stable.
- Rosen's condition (R) implies global variational stability.
Strict Nash Equilibrium

Recall: a Nash equilibrium is an action profile x* ∈ X such that ⟨v_k(x*), z_k⟩ ≤ 0 for every tangent vector z_k ∈ TC_k(x*_k), k ∈ N.

Definition. x* is called strict if the above inequality is strict for every nonzero z_k ∈ TC_k(x*_k), k ∈ N.

Some basics:
- Generalizes the notion of strict equilibrium in finite games (pure, no payoff equalities).
- If x* is a strict equilibrium, then x* is variationally stable.
- If x* is a strict equilibrium, then it is a corner of X (i.e. the tangent cone TC(x*) of X at x* does not contain any lines).
Local Convergence

Proposition. Suppose that (MD) is run with a small enough step-size γ_n such that Σ_{j=1}^∞ γ_j = ∞ and Σ_{j=1}^n γ_j² / Σ_{j=1}^n γ_j → 0. If x(n) → x*, then x* is a Nash equilibrium of the game.

Theorem. Suppose that x* ∈ X is variationally stable and (MD) is run under the same conditions as above. Then, x* is locally attracting.

Corollary. Strict equilibria are locally attracting.

Proposition. Assume that x* ∈ X is a strict equilibrium with x* ∈ im Q (i.e. h is not steep at x*). Then, convergence to x* occurs after a finite number of iterations.
Global Convergence

Theorem. Suppose that x* ∈ X is globally variationally stable and the algorithm's step-size sequence γ_j satisfies Σ_{j=1}^∞ γ_j = ∞ and Σ_{j=1}^n γ_j² / Σ_{j=1}^n γ_j → 0. Then, x(n) → x* for every initialization of (MD).

Corollary. If Rosen's condition holds, players converge to equilibrium from any initial condition.

Proof idea:
- x(n) is an asymptotic pseudo-trajectory of the continuous-time dynamics ẏ = v(Q(y)).
- A (global) Lyapunov function is given by the (λ-weighted) Fenchel coupling F(y) = Σ_k λ_k [h_k(x*_k) + h*_k(y_k) - ⟨y_k, x*_k⟩], where h*_k denotes the convex conjugate of h_k.
- Standard stochastic approximation results do not suffice for convergence of x(n): show directly that x(n) visits a neighborhood of x* infinitely often.
- Use Benaïm's theory of attractors on the flow induced by Q on X.
Learning with Imperfect Information

The above analysis relies on perfect observations of the payoff gradients v_k(x). In finite games, it is not too hard to deduce u_k(α_k; α_{-k}) for every action α_k ∈ A_k given a fixed action profile α_{-k} of one's opponents; however, knowing u_k(x_k; x_{-k}) is much more demanding (because of mixing).

Imperfect feedback: players only have access to noisy estimates of their payoff gradients, i.e.

  v̂_k(n) = v_k(x(n)) + z_k(n).

Statistical hypotheses for the noise process z_k(n):
(H1) Unbiasedness: E[z(n + 1) | F_n] = 0.
(H2) Finite mean squared error: E[‖z(n + 1)‖² | F_n] < ∞.
(H2+) Bounded errors: sup_n ‖z(n)‖ < ∞ (a.s.).
Convergence Analysis

Run (MD) with steep penalty functions and a small enough step-size γ_n such that Σ_{n=1}^∞ γ_n² < ∞ and Σ_{n=1}^∞ γ_n = ∞.

Theorem. Suppose that x* is globally variationally stable and (H1), (H2) hold. Then, x(n) converges to x* (a.s.) for every initialization of (MD).

Theorem. Suppose that x* is variationally stable and (H1), (H2) hold. Then, for every ε > 0, there exists a neighborhood U of x* such that

  P(lim_{n→∞} x(n) = x* | x(0) ∈ U) ≥ 1 - ε,

i.e. x* attracts all nearby initializations of (MD) with high probability. Under (H2+), the above also holds for ε = 0.
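A sketch of the stochastic setting: the same assumed Cournot-style monotone game (v_k(x) = a - c - 2 x_k - x_{-k} on [0, 5], equilibrium (3, 3)) run with Gaussian noise satisfying (H1)-(H2) and a square-summable but not summable step-size γ_n = 1/n. The game, noise level, and horizon are all illustration choices:

```python
import numpy as np

rng = np.random.default_rng(0)
a, c = 10.0, 1.0

def v(x):
    """Payoff gradients of the assumed duopoly."""
    return np.array([a - c - 2 * x[0] - x[1],
                     a - c - 2 * x[1] - x[0]])

y = np.zeros(2)
x = np.clip(y, 0.0, 5.0)
for n in range(1, 20001):
    gamma = 1.0 / n                               # sum gamma^2 < inf, sum gamma = inf
    v_hat = v(x) + rng.normal(0.0, 1.0, size=2)   # noisy gradient: (H1), (H2)
    y = y + gamma * v_hat
    x = np.clip(y, 0.0, 5.0)

assert np.allclose(x, [3.0, 3.0], atol=0.2)       # last iterate near x* = (3, 3)
```

The square-summable step-size is what averages out the noise: the cumulative perturbation Σ γ_n z(n) converges a.s. by a martingale argument, so the noisy process shadows the noiseless one.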
Applications to Finite Games

Suppose that the players repeatedly play a finite game G ≡ G(N, A, u):
1. At stage n + 1, each player selects an action α_k(n + 1) ∈ A_k based on a mixed strategy x_k(n) ∈ X_k.
2. Players estimate (noisily) the payoff of each of their actions: v̂_{kα}(n + 1) = u_k(α; α_{-k}(n + 1)) + z_{kα}(n + 1), α ∈ A_k.
3. Players update their mixed strategies using (MD) and the process repeats.

Corollary. With assumptions as above, strict equilibria are locally attracting with high probability.
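The three-step scheme above can be sketched with the logit choice map (exponential weights). The game here is an assumed Prisoner's-Dilemma-like 2×2 game in which "Defect" is strictly dominant, so (D, D) is the unique, strict equilibrium and play must concentrate on it:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed symmetric 2x2 game; rows/columns = (Cooperate, Defect).
U1 = np.array([[3.0, 0.0],
               [5.0, 1.0]])     # row player's payoffs; Defect dominates
U2 = U1.T                       # symmetric game: U2[a1, a2] = U1[a2, a1]

def logit(y):
    z = np.exp(y - y.max())
    return z / z.sum()

y1, y2 = np.zeros(2), np.zeros(2)
for n in range(1, 5001):
    x1, x2 = logit(y1), logit(y2)
    a1 = rng.choice(2, p=x1)        # 1. sample actions from mixed strategies
    a2 = rng.choice(2, p=x2)
    v1_hat = U1[:, a2]              # 2. payoff of each own action, given the
    v2_hat = U2[a1, :]              #    opponent's realized action (unbiased)
    gamma = 1.0 / n                 # 3. mirror-descent (exponential-weights)
    y1 = y1 + gamma * v1_hat        #    update of the score variables
    y2 = y2 + gamma * v2_hat

x1, x2 = logit(y1), logit(y2)
assert x1[1] > 0.95 and x2[1] > 0.95   # play concentrates on (D, D)
```

Note that the step-2 estimator is unbiased in the sense of (H1): averaging over the opponent's action draw recovers the mixed payoff u_k(α; x_{-k}), which is exactly the gradient coordinate v_{kα}(x).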
Convergence in a Potential Game

[Figure: trajectories of (MD) over the mixed-strategy square of a 2×2 potential game with payoffs (3,1), (0,0) / (0,0), (1,3), shown at stages n = 1, 3, 5, 8, 10, 20, 50.]
Perspectives

- Rates of convergence? Doable for empirical frequencies of play; less so for the last iterate.
- Coupled constraints on the players' action sets? Possible, but we lose distributedness (because of constraint coupling).
- Observations of realized payoffs only (a single call to u_k instead of v_k)? Standard estimators often fail because of infinite variance. This is not a problem in online learning, so perhaps not in the multi-agent case either? Two-time-scale stochastic approximation can help (at the cost of convergence speed).