LEARNING IN CONCAVE GAMES
P. Mertikopoulos
French National Center for Scientific Research (CNRS), Laboratoire d'Informatique de Grenoble
GSBE ETBC seminar, Maastricht, October 22, 2015
Motivation and Preliminaries: Learning Perspectives

Context and motivation. Concave games:
- finitely many players
- continuous action spaces
- individually concave payoff functions

Context/applications:
- Standard in economics & finance (multi-portfolio optimization, auctions, oligopolies, ...)
- Networking (routing, tolling, network economics, ...)
- Electrical engineering (wireless communications, electricity grids, ...)

What this talk is about: distributed learning algorithms that allow players to converge to an equilibrium state.
Basic Definitions

A concave game consists of:
- A finite set of players N = {1, ..., N}.
- A compact, convex set of actions X_k ∋ x_k per player.
- An individually concave payoff function u_k : ∏_l X_l → R per player, i.e. u_k(x_k; x_{-k}) is concave in x_k for all x_{-k} ∈ ∏_{l≠k} X_l.

Each player seeks to maximize his individual payoff.

Fine print:
- Each X_k is assumed to live in a finite-dimensional ambient space V_k ≅ R^{d_k}. (No infinite dimensionalities in this talk.)
- Each ambient space V_k is equipped with a norm ‖·‖.
Example 1: Finite (Affine) Games

A finite game consists of:
- A finite set of players N = {1, ..., N}.
- A finite set of actions A_k ∋ α_k per player.
- Each player's payoff function u_k : ∏_l A_l → R.

In the mixed extension of a finite game, players can play mixed strategies x_k ∈ Δ(A_k). Corresponding (expected) payoff:

  u_k(x) = Σ_{α_1 ∈ A_1} ... Σ_{α_N ∈ A_N} x_{1,α_1} ... x_{N,α_N} u_k(α_1, ..., α_N)

The mixed strategy space X_k = Δ(A_k) is convex and u_k(x_k; x_{-k}) is linear in x_k ⇒ mixed extensions of finite games are concave games.
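As a quick sanity check, the expected payoff of the mixed extension can be computed by contracting the payoff tensor against each player's mixed strategy in turn. A minimal sketch (the 2×2 payoff matrix `U1` is an assumed example, not from the talk):

```python
import numpy as np

def expected_payoff(u_k, strategies):
    """Expected payoff of one player in the mixed extension:
    u_k(x) = sum over action profiles of prod_j x_{j,alpha_j} * u_k(alpha).
    u_k: payoff tensor of shape (|A_1|, ..., |A_N|);
    strategies: one mixed-strategy (probability) vector per player."""
    out = np.asarray(u_k, dtype=float)
    for x in reversed(strategies):   # contract one player's axis at a time
        out = out @ np.asarray(x)    # removes the last remaining axis
    return float(out)

# Assumed 2x2 example: row player's payoffs in a coordination game
U1 = [[3.0, 0.0],
      [0.0, 1.0]]

# Pure profiles recover the underlying finite game:
assert expected_payoff(U1, [[1, 0], [1, 0]]) == 3.0
# Uniform mixing: (3 + 0 + 0 + 1) / 4 = 1.0
assert expected_payoff(U1, [[0.5, 0.5], [0.5, 0.5]]) == 1.0
```

Since u_k(x) is multilinear, it is in particular linear (hence concave) in each player's own strategy, which is exactly the concavity requirement above.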
Example 2: Routing

Consider the following model of Internet congestion:
- Origin nodes (v_k) generate traffic that must be routed to intended destination nodes (w_k).
- Set of paths A_k joining v_k to w_k.
- Actions: traffic distributions over the different paths: X_k = {x_k : x_{kα} ≥ 0 and Σ_{α ∈ A_k} x_{kα} = ρ_k}.
- Path latency: ℓ_{kα}(x) = Σ_{r ∈ α} ℓ_r(y_r), where y_e = Σ_k Σ_{α ∋ e} x_{kα} is the total load on link e and ℓ_e(y_e) is the induced delay.
- Payoff: u_k(x) = -Σ_{α ∈ A_k} x_{kα} ℓ_{kα}(x).

[Figure: network with origins v_1, v_2, destinations w_1, w_2, links A-E, per-path flows x_{1,A}, x_{1,BE}, x_{2,ED}, x_{2,C}; one link is congested.]

Under standard assumptions on ℓ_e (convex, increasing), u_k is concave in x_k ⇒ G(N, X, u) is a concave game.
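The payoff formula above can be sketched in code. This is a hypothetical two-path, two-link network with assumed affine link delays ℓ_e(y) = a_e + b_e y, not the exact topology of the figure:

```python
import numpy as np

# Assumed network: one player (k = 0) routes traffic over two parallel
# links, "top" and "bottom", each with an affine delay l_e(y) = a_e + b_e*y.
links = {"top": (1.0, 2.0), "bottom": (2.0, 1.0)}   # (a_e, b_e), assumed
paths = {0: [["top"], ["bottom"]]}                   # player 0's two paths

def payoff(player, x):
    """u_k(x) = -sum_alpha x_{k,alpha} * l_{k,alpha}(x): negative total delay.
    x maps each player k to its traffic distribution over paths in A_k."""
    # Total load y_e on each link, aggregated over all players and paths
    load = {e: 0.0 for e in links}
    for k, xs in x.items():
        for alpha, xka in enumerate(xs):
            for e in paths[k][alpha]:
                load[e] += xka
    # Weighted path latencies for the given player
    total = 0.0
    for alpha, xka in enumerate(x[player]):
        delay = sum(links[e][0] + links[e][1] * load[e]
                    for e in paths[player][alpha])
        total += xka * delay
    return -total

# All traffic on "top": delay 1 + 2*1 = 3, so payoff -3
assert payoff(0, {0: [1.0, 0.0]}) == -3.0
# Split 0.5/0.5: delays 2.0 and 2.5, so payoff -(0.5*2 + 0.5*2.5) = -2.25
assert payoff(0, {0: [0.5, 0.5]}) == -2.25
```

With affine (hence convex, increasing) delays, each u_k is concave in the player's own flow vector, as the slide asserts.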
Nash Equilibrium and Payoff Gradients

A Nash equilibrium is an action profile x* ∈ X such that

  u_k(x*_k; x*_{-k}) ≥ u_k(x_k; x*_{-k}) for every unilateral deviation x_k ∈ X_k, k ∈ N.

There is no direction that could unilaterally increase a player's payoff.

Alternative characterization: consider the individual payoff gradient of player k:

  v_k(x) = ∇_k u_k(x) ≡ ∇_{x_k} u_k(x_k; x_{-k})

(differentiation is taken only w.r.t. x_k; the opponents' profile x_{-k} is kept fixed). Since u_k is concave in x_k, x* is an equilibrium if and only if

  ⟨v_k(x*), z_k⟩ ≤ 0 for every tangent vector z_k ∈ TC_k(x*_k), k ∈ N.

Fine print: the x_k ∈ X_k ⊆ V_k are treated as primal variables; the payoff gradients v_k ∈ V*_k are treated as duals and assumed Lipschitz.
Equilibrium Existence and Uniqueness

Every concave game admits a Nash equilibrium (Debreu, 1952; Rosen, 1965). What about uniqueness?

Theorem (Rosen, 1965). Suppose that the players' payoff gradients satisfy the monotonicity property

  Σ_k λ_k ⟨v_k(x′) - v_k(x), x′_k - x_k⟩ < 0    (R)

for some λ > 0 and for all x ≠ x′ ∈ X. Then, the game admits a unique Nash equilibrium.

Rosen (1965) calls this condition diagonal strict concavity. Define the λ-weighted Hessian H(x; λ) = (H_{jk}(x; λ))_{j,k} of the game as

  H_{jk}(x; λ) = λ_j ∇_k v_j(x) + λ_k (∇_j v_k(x))^T.

If H(x; λ) ≺ 0 for all x ∈ X, the game admits a unique equilibrium.
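The Hessian criterion can be checked numerically for small games. A sketch on an assumed example (a linear Cournot duopoly with payoffs u_k(x) = x_k(a - x_1 - x_2) - c x_k, so v_k(x) = a - c - 2 x_k - x_{-k}; not a game from the talk):

```python
import numpy as np

# Assumed example: linear Cournot duopoly, v_k(x) = a - c - 2 x_k - x_{-k}.
lam = np.array([1.0, 1.0])          # weights lambda_k (all ones here)

# Jacobian of the payoff gradients: Dv[j, k] = d v_j / d x_k (constant here)
Dv = np.array([[-2.0, -1.0],
               [-1.0, -2.0]])

# Weighted Hessian of the game: H_{jk} = lam_j * Dv[j, k] + lam_k * Dv[k, j]
H = lam[:, None] * Dv + lam[None, :] * Dv.T

eigs = np.linalg.eigvalsh(H)        # symmetric matrix, real eigenvalues
assert np.all(eigs < 0)             # H(x; lam) < 0 => unique Nash equilibrium
```

Here H = [[-4, -2], [-2, -4]] is negative definite (eigenvalues -2 and -6), so the duopoly satisfies diagonal strict concavity and has a unique equilibrium.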
Learning via Payoff Gradient Ascent

Pavlov's dog reaction to improving one's payoffs: ascend the payoff gradient

  x ← x + γ v(x),

where γ is a (possibly variable) step-size parameter. Problem: this must respect the players' action constraints (x ∈ X). To do that, rewrite the gradient ascent process and regularize:

  y ← y + γ v(x),
  x ← arg max_{x′ ∈ X} {⟨y, x′⟩ - h(x′)},

where the penalty function (or regularizer) h : X → R is smooth and strongly convex:

  h(tx + (1 - t)x′) ≤ t h(x) + (1 - t) h(x′) - (1/2) K t(1 - t) ‖x - x′‖²

for some K > 0 and for all t ∈ [0, 1], x, x′ ∈ X.
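With the quadratic penalty, the arg max step reduces to a Euclidean projection; on a box it is a coordinate-wise clip. A minimal single-player sketch (the concave payoff u(x) = -(x_1 - 2)² - (x_2 + 1)² and the box X = [0, 1]² are assumed examples; the constrained maximizer is (1, 0)):

```python
import numpy as np

def v(x):
    """Payoff gradient of u(x) = -(x1 - 2)^2 - (x2 + 1)^2 (assumed example)."""
    return np.array([-2.0 * (x[0] - 2.0), -2.0 * (x[1] + 1.0)])

x = np.array([0.5, 0.5])
gamma = 0.1
for _ in range(200):
    y = x + gamma * v(x)        # gradient ascent step
    x = np.clip(y, 0.0, 1.0)    # Euclidean projection onto the box [0, 1]^2

assert np.allclose(x, [1.0, 0.0], atol=1e-6)   # constrained maximizer
```

The unconstrained maximizer (2, -1) lies outside the box, so the projection step is what keeps the iterates feasible; the process converges to the boundary point (1, 0).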
Examples

1. The quadratic penalty h(x) = (1/2) Σ_α x_α² = (1/2) ‖x‖₂² gives the Euclidean projection

  Π(y) = arg max_{x ∈ X} {⟨y, x⟩ - (1/2) ‖x‖₂²} = arg min_{x ∈ X} ‖y - x‖₂².

2. If X = Δ_d (the d-dimensional simplex), the (negative) Gibbs entropy h(x) = Σ_α x_α log x_α gives the logit map

  G(y) = (exp(y_1), ..., exp(y_d)) / Σ_{α=1}^d exp(y_α).

3. If X = {X : X ⪰ 0, tr(X) ≤ 1}, the von Neumann entropy h(X) = tr[X log X] gives

  Q(Y) = exp(Y) / (1 + tr[exp(Y)]),

etc.

Important: if ‖dh(x)‖ → ∞ as x → bd(X), we say that h is steep. Steep penalty functions induce interior-point methods: im Q = rel int X.
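The logit map of Example 2 is easy to verify numerically: its output is a point of the relative interior of the simplex (h is steep), and it maximizes ⟨y, x⟩ - h(x). A sketch that checks the closed form against brute-force grid search (the test vector and grid resolution are arbitrary choices):

```python
import numpy as np

def logit(y):
    """Entropic choice map on the simplex: maximizes <y, x> - sum x_a log x_a."""
    z = np.exp(y - np.max(y))      # subtract max for numerical stability
    return z / z.sum()

y = np.array([1.0, 2.0, 0.0])
x = logit(y)
assert np.isclose(x.sum(), 1.0) and np.all(x > 0)   # interior point: h is steep

# Brute-force check on a grid over the 2-simplex
def objective(p):
    return y @ np.array(p) - sum(q * np.log(q) for q in p if q > 0)

best = max(((a, b, 1 - a - b)
            for a in np.linspace(0, 1, 101)
            for b in np.linspace(0, 1 - a, 101)
            if 1 - a - b >= 0),
           key=objective)
assert np.allclose(x, best, atol=0.05)   # grid max agrees up to grid resolution
```

Note that even for y with a zero or negative component, the output assigns strictly positive probability everywhere, which is exactly the interior-point behavior of steep penalties.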
Learning via Mirror Descent

Multi-agent mirror descent:

  y_k(n + 1) = y_k(n) + γ_n v_k(x(n))
  x_k(n + 1) = Q_k(y_k(n + 1))        (MD)

where γ_n is a variable step-size and the choice map Q_k is defined as

  Q_k(y_k) = arg max_{x_k ∈ X_k} {⟨y_k, x_k⟩ - h_k(x_k)}.

- Long history in optimization (Nemirovski, Yudin, Nesterov, Juditski, Beck, Teboulle, ...) and, more recently, in machine learning (Shalev-Shwartz, ...).
- Well understood for single-agent problems.
- Multi-agent problems (games): ???
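A minimal sketch of (MD) with Euclidean penalties on an assumed monotone game: a linear Cournot duopoly with v_k(x) = a - c - 2 x_k - x_{-k} on X_k = [0, 5], whose unique equilibrium is x* = ((a-c)/3, (a-c)/3) = (3, 3). The game and its parameters are illustration choices, not from the talk:

```python
import numpy as np

a, c = 10.0, 1.0

def v(x):
    """Payoff gradients of the assumed Cournot duopoly."""
    return np.array([a - c - 2 * x[0] - x[1],
                     a - c - 2 * x[1] - x[0]])

def Q(y):
    """Euclidean choice map on [0, 5]: argmax <y, x> - x^2/2 is clip(y, 0, 5)."""
    return np.clip(y, 0.0, 5.0)

y = np.zeros(2)          # dual (score) variables, aggregating gradient steps
x = Q(y)
for n in range(1, 2001):
    gamma = 1.0 / n      # sum gamma_n diverges while gamma_n -> 0
    y = y + gamma * v(x)
    x = Q(y)

assert np.allclose(x, [3.0, 3.0], atol=1e-2)   # converges to the equilibrium
```

Since this game satisfies Rosen's condition (its weighted Hessian is negative definite), global convergence of (MD) is exactly what the theory below predicts.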
Variational Stability

No uncoupled dynamics can always lead to equilibrium (Hart and Mas-Colell, 2003) ⇒ must refine the convergence target.

Definition. We say that x* ∈ X is variationally stable if

  Σ_k λ_k ⟨v_k(x), x_k - x*_k⟩ < 0

for some λ > 0 and for all x ≠ x* in a neighborhood U of x*. If U = X, x* will be called globally variationally stable.

- Contrast with the Nash equilibrium condition (which variational stability refines): Σ_k λ_k ⟨v_k(x*), x_k - x*_k⟩ ≤ 0, where the gradient is evaluated at x* rather than at the deviating point x.
- Compare with the notion of (Taylor) evolutionary stability in multi-population games: Σ_k ⟨v_k(x), x_k - x*_k⟩ < 0 for all x ≠ x* near x*.
- Global/local ESSs are globally/locally variationally stable.
- Rosen's condition (R) implies global variational stability.
Strict Nash Equilibrium

Recall: a Nash equilibrium is an action profile x* ∈ X such that ⟨v_k(x*), z_k⟩ ≤ 0 for every tangent vector z_k ∈ TC_k(x*_k), k ∈ N.

Definition. x* is called strict if the above inequality is strict for every nonzero z_k ∈ TC_k(x*_k), k ∈ N.

Some basics:
- Generalizes the notion of strict equilibrium in finite games (pure, no payoff equalities).
- If x* is a strict equilibrium, then x* is variationally stable.
- If x* is a strict equilibrium, then it is a corner of X (i.e. the tangent cone TC(x*) of X at x* does not contain any lines).
Local Convergence

Proposition. Suppose that (MD) is run with a small enough step-size γ_n such that Σ_{j=1}^∞ γ_j = ∞ and Σ_{j=1}^n γ_j² / Σ_{j=1}^n γ_j → 0. If x(n) → x*, then x* is a Nash equilibrium of the game.

Theorem. Suppose that x* ∈ X is variationally stable and (MD) is run under the same conditions as above. Then, x* is locally attracting.

Corollary. Strict equilibria are locally attracting.

Proposition. Assume that x* ∈ X is a strict equilibrium with x* ∈ im Q (i.e. h is not steep at x*). Then, convergence to x* occurs after a finite number of iterations.
Global Convergence

Theorem. Suppose that x* ∈ X is globally variationally stable and the algorithm's step-size sequence γ_j satisfies Σ_{j=1}^∞ γ_j = ∞ and Σ_{j=1}^n γ_j² / Σ_{j=1}^n γ_j → 0. Then, x(n) → x* for every initialization of (MD).

Corollary. If Rosen's condition holds, players converge to equilibrium from any initial condition.

Proof idea:
- x(n) is an asymptotic pseudo-trajectory of the continuous-time dynamics ẏ = v(Q(y)).
- A (global) Lyapunov function is given by the (λ-weighted) Fenchel coupling F(y) = Σ_k λ_k [h_k(x*_k) + h*_k(y_k) - ⟨y_k, x*_k⟩], where h*_k denotes the convex conjugate of h_k.
- Standard stochastic approximation results do not suffice for convergence of x(n): show directly that x(n) visits a neighborhood of x* infinitely often.
- Use Benaïm's theory of attractors on the flow induced by Q on X.
Learning with Imperfect Information

The above analysis relies on perfect observations of the payoff gradients v_k(x). In finite games, it is not too hard to deduce u_k(α_k; α_{-k}) for every action α_k ∈ A_k given a fixed action profile α_{-k} of one's opponents; however, knowing u_k(x_k; x_{-k}) is much more demanding (because of mixing).

Imperfect feedback: players only have access to noisy estimates of their payoff gradients, i.e.

  v̂_k(n) = v_k(x(n)) + z_k(n).

Statistical hypotheses for the noise process z_k(n):
(H1) Unbiasedness: E[z(n + 1) | F_n] = 0.
(H2) Finite mean squared error: E[‖z(n + 1)‖² | F_n] < ∞.
(H2+) Bounded errors: sup_n ‖z(n)‖ < ∞ (a.s.).
Convergence Analysis

Run (MD) with steep penalty functions and a small enough step-size γ_n such that Σ_{n=1}^∞ γ_n² < ∞ and Σ_{n=1}^∞ γ_n = ∞.

Theorem. Suppose that x* is globally variationally stable and (H1), (H2) hold. Then, x(n) converges to x* (a.s.) for every initialization of (MD).

Theorem. Suppose that x* is variationally stable and (H1), (H2) hold. Then, for every ε > 0, there exists a neighborhood U of x* such that

  P(lim_{n→∞} x(n) = x* | x(0) ∈ U) ≥ 1 - ε,

i.e. x* attracts all nearby initializations of (MD) with high probability. Under (H2+), the above also holds for ε = 0.
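A sketch of the stochastic setting: the same assumed Cournot-style monotone game (v_k(x) = a - c - 2 x_k - x_{-k} on [0, 5], equilibrium (3, 3)) run with Gaussian noise satisfying (H1)-(H2) and a square-summable but not summable step-size γ_n = 1/n. The game, noise level, and horizon are all illustration choices:

```python
import numpy as np

rng = np.random.default_rng(0)
a, c = 10.0, 1.0

def v(x):
    """Payoff gradients of the assumed duopoly."""
    return np.array([a - c - 2 * x[0] - x[1],
                     a - c - 2 * x[1] - x[0]])

y = np.zeros(2)
x = np.clip(y, 0.0, 5.0)
for n in range(1, 20001):
    gamma = 1.0 / n                               # sum gamma^2 < inf, sum gamma = inf
    v_hat = v(x) + rng.normal(0.0, 1.0, size=2)   # noisy gradient: (H1), (H2)
    y = y + gamma * v_hat
    x = np.clip(y, 0.0, 5.0)

assert np.allclose(x, [3.0, 3.0], atol=0.2)       # last iterate near x* = (3, 3)
```

The square-summable step-size is what averages out the noise: the cumulative perturbation Σ γ_n z(n) converges a.s. by a martingale argument, so the noisy process shadows the noiseless one.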
Applications to Finite Games

Suppose that the players repeatedly play a finite game G ≡ G(N, A, u):
1. At stage n + 1, each player selects an action α_k(n + 1) ∈ A_k based on a mixed strategy x_k(n) ∈ X_k.
2. Players estimate (noisily) the payoff of each of their actions: v̂_{kα}(n + 1) = u_k(α; α_{-k}(n + 1)) + z_{kα}(n + 1), α ∈ A_k.
3. Players update their mixed strategies using (MD) and the process repeats.

Corollary. With assumptions as above, strict equilibria are locally attracting with high probability.
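The three-step scheme above can be sketched with the logit choice map (exponential weights). The game here is an assumed Prisoner's-Dilemma-like 2×2 game in which "Defect" is strictly dominant, so (D, D) is the unique, strict equilibrium and play must concentrate on it:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed symmetric 2x2 game; rows/columns = (Cooperate, Defect).
U1 = np.array([[3.0, 0.0],
               [5.0, 1.0]])     # row player's payoffs; Defect dominates
U2 = U1.T                       # symmetric game: U2[a1, a2] = U1[a2, a1]

def logit(y):
    z = np.exp(y - y.max())
    return z / z.sum()

y1, y2 = np.zeros(2), np.zeros(2)
for n in range(1, 5001):
    x1, x2 = logit(y1), logit(y2)
    a1 = rng.choice(2, p=x1)        # 1. sample actions from mixed strategies
    a2 = rng.choice(2, p=x2)
    v1_hat = U1[:, a2]              # 2. payoff of each own action, given the
    v2_hat = U2[a1, :]              #    opponent's realized action (unbiased)
    gamma = 1.0 / n                 # 3. mirror-descent (exponential-weights)
    y1 = y1 + gamma * v1_hat        #    update of the score variables
    y2 = y2 + gamma * v2_hat

x1, x2 = logit(y1), logit(y2)
assert x1[1] > 0.95 and x2[1] > 0.95   # play concentrates on (D, D)
```

Note that the step-2 estimator is unbiased in the sense of (H1): averaging over the opponent's action draw recovers the mixed payoff u_k(α; x_{-k}), which is exactly the gradient coordinate v_{kα}(x).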
Convergence in a Potential Game

[Figure: trajectories of (MD) over the mixed-strategy square of a 2×2 potential game with payoffs (3,1), (0,0) / (0,0), (1,3), shown at stages n = 1, 3, 5, 8, 10, 20, 50.]
Perspectives

- Rates of convergence? Doable for empirical frequencies of play; less so for the last iterate.
- Coupled constraints on the players' action sets? Possible, but we lose distributedness (because of constraint coupling).
- Observations of realized payoffs only (a single call to u_k instead of v_k)? Standard estimators often fail because of infinite variance. This is not a problem in online learning, so perhaps not in the multi-agent case either? Two-time-scale stochastic approximation can help (at the cost of convergence speed).