Distributed Learning in Routing Games
Distributed Learning in Routing Games: Convergence, Estimation of Player Dynamics, and Control
Walid Krichene, Alexandre Bayen
Electrical Engineering and Computer Sciences, UC Berkeley
IPAM Workshop on Decision Support for Traffic, November 18
Learning dynamics in the routing game
Routing games model congestion on networks (e.g. communication networks). The Nash equilibrium quantifies the efficiency of the network in steady state. But the system does not operate at equilibrium: beyond equilibria, we need to understand decision dynamics (learning). A realistic model for decision dynamics is essential for prediction and optimal control.
Desiderata
Learning dynamics should be:
Realistic in terms of information requirements and computational complexity.
Consistent with the full-information Nash equilibrium: x^(t) → X*. Convergence rates?
Robust to stochastic perturbations, e.g. observation noise (bandit feedback).
Outline
1 Introduction
2 Convergence of agent dynamics
3 Routing Examples
4 Estimation and optimal control
Interaction of K decision makers
Decision maker k faces a sequential decision problem. At iteration t, the agent
(1) chooses a probability distribution x^(t) over its action set,
(2) discovers a loss function l^(t): A_k → [0, 1],
(3) updates the distribution: x^(t+1) = u(x^(t), l^(t)).
Figure: sequential decision problem. Agent k plays against the environment (the other agents); the outcome is l_{A_k}(x^(t)_{A_1}, ..., x^(t)_{A_K}).
The loss of agent k is affected by the strategies of the other agents. The agent does not know this function; it only observes its value. Write x^(t) = (x^(t)_{A_1}, ..., x^(t)_{A_K}).
Examples of decentralized decision makers
Routing game: each player drives from a source to a destination node and chooses a path from the available paths. The mass of players on each edge determines the cost on that edge.
Figure: routing game.
Online learning model
Online Learning Model
1: for t ∈ N do
2:   Play p ~ x^(t)
3:   Discover l^(t)
4:   Update x^(t+1) = u_k(x^(t), l^(t))
5: end for
Main problem: define a class of dynamics C such that, if u_k ∈ C for all k, then x^(t) → X*.
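The loop above can be sketched in code. This is a minimal illustration, not the talk's implementation: the congestion loss function and the Hedge update plugged in below are hypothetical choices, and any update rule u_k with the same signature can be substituted.

```python
import numpy as np

def run_learning(K, n_actions, loss_fn, updates, T, seed=0):
    """Simulate K decision makers playing the sequential decision problem.

    loss_fn : maps the joint profile x (K x n_actions array of distributions)
              to a K x n_actions array of losses in [0, 1]
    updates : list of K update rules u_k(x_k, loss_k, t) -> new distribution
    """
    rng = np.random.default_rng(seed)
    x = np.full((K, n_actions), 1.0 / n_actions)               # start uniform
    for t in range(1, T + 1):
        _ = [rng.choice(n_actions, p=x[k]) for k in range(K)]  # play p ~ x^(t)
        losses = loss_fn(x)                                    # discover l^(t)
        x = np.array([updates[k](x[k], losses[k], t)           # update x^(t+1)
                      for k in range(K)])
    return x

def hedge(x_k, loss_k, t):
    """One candidate update rule u_k (the Hedge algorithm, discussed later)."""
    w = x_k * np.exp(-loss_k / np.sqrt(t))
    return w / w.sum()

# Two players, two routes; the loss of a route is the average mass on it.
congestion = lambda x: np.tile(x.mean(axis=0), (x.shape[0], 1))
x_final = run_learning(K=2, n_actions=2, loss_fn=congestion,
                       updates=[hedge, hedge], T=500)
```

With this symmetric congestion model, the uniform profile is an equilibrium and the dynamics stay there; the point of the sketch is the interface between the loop and the update rule.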
A brief review
Discrete time: Hannan consistency [10]; Hedge algorithm for two-player games [9]; regret-based algorithms [11]; online learning in games [7]; potential games [19].
Continuous time:
Outline
1 Introduction
2 Convergence of agent dynamics
3 Routing Examples
4 Estimation and optimal control
Nash equilibria, and the Rosenthal potential
Write x = (x_{A_1}, ..., x_{A_K}) ∈ Δ_{A_1} × ... × Δ_{A_K} and l(x) = (l_{A_1}(x), ..., l_{A_K}(x)).
Nash equilibrium: x* is a Nash equilibrium if ⟨l(x*), x − x*⟩ ≥ 0 for all x, equivalently, for all k and all x_{A_k}, ⟨l_{A_k}(x*), x_{A_k} − x*_{A_k}⟩ ≥ 0.
In words, for all k, paths in the support of x*_{A_k} have minimal loss.
Figure: population mass distributions x^(t) and path losses l_{A_k}(x^(t)) for population 1, paths p0 = (v0, v5, v1), p1 = (v0, v4, v5, v1), p2 = (v0, v1).
Rosenthal potential: f convex such that ∇f(x) = l(x). Then the set of Nash equilibria is
X* = arg min_{x ∈ Δ_{A_1} × ... × Δ_{A_K}} f(x)
Nash condition ⟺ first-order optimality:
∀x, ⟨l(x*), x − x*⟩ ≥ 0 ⟺ ∀x, ⟨∇f(x*), x − x*⟩ ≥ 0, since ∇f(x*) = l(x*).
Regret analysis
Technique 1: regret analysis.
Cumulative regret: R_k^(t) = sup_{x_{A_k}} Σ_{τ ≤ t} ⟨x^(τ)_{A_k} − x_{A_k}, l_{A_k}(x^(τ))⟩. The regret is sublinear if lim sup_t R_k^(t)/t ≤ 0. This is an online optimality condition.
Convergence of averages: if R_k^(t) is sublinear for all k, then x̄^(t) → X*, where x̄^(t) = (1/t) Σ_{τ=1}^t x^(τ).
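As a sanity check of the definition, here is a small hypothetical single-agent run: Hedge against a fixed loss vector, where the sup over the simplex reduces to the best fixed action. The losses and rate schedule are made up for illustration.

```python
import numpy as np

def cumulative_regret(plays, losses):
    """R^(t) = sup_x sum_{tau<=t} <x^(tau) - x, l^(tau)>.

    Over the simplex the sup is attained at the best single fixed action,
    so the regret is (realized expected loss) - (best fixed action's loss).
    plays, losses : T x n_actions arrays of distributions and loss vectors.
    """
    realized = np.sum(plays * losses)          # sum_tau <x^(tau), l^(tau)>
    best_fixed = np.sum(losses, axis=0).min()  # min_a sum_tau l_a^(tau)
    return realized - best_fixed

# Hedge run against fixed losses: regret stays bounded, hence sublinear.
T = 200
losses = np.tile([0.2, 0.8], (T, 1))
x, plays = np.array([0.5, 0.5]), []
for t in range(1, T + 1):
    plays.append(x.copy())
    w = x * np.exp(-losses[t - 1] / np.sqrt(t))  # Hedge with eta_t = 1/sqrt(t)
    x = w / w.sum()
R = cumulative_regret(np.asarray(plays), losses)
```

Since the losses never change, Hedge quickly concentrates on the cheaper action and the average regret R/T is small.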
Convergence of x̄^(t) vs. convergence of x^(t)
Routing game example.
Figure: population distributions x^(τ)_{A_1} for populations 1 and 2, paths p0 = (v0, v5, v1), p1 = (v0, v4, v5, v1), p2 = (v0, v1).
Figure: path losses ˆl_p(x^(τ)) and averaged path losses l_{A_k}(x̄^(t)) for populations 1 and 2, same paths.
Stochastic approximation
Technique 2: stochastic approximation.
Idea: view the learning dynamics as a discretization of an ODE; study the convergence of the ODE; relate the convergence of the discrete algorithm to the convergence of the ODE.
Figure: underlying continuous time, 0, η_1, η_1 + η_2, ...
Example: the Hedge algorithm
Hedge algorithm: update the distribution according to the observed loss,
x_a^(t+1) ∝ x_a^(t) e^{−η_t^k l_a^(t)}
Also known as the exponentially weighted average forecaster [7], multiplicative weights update [1], exponentiated gradient descent [12], entropic descent [2], and log-linear learning [5], [18].
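A minimal numpy sketch of this update. The loss vector and the rate η_t = 1/√t below are illustrative assumptions, not values from the talk.

```python
import numpy as np

def hedge_update(x, loss, eta):
    """One Hedge step: multiply each weight by exp(-eta * loss_a), renormalize.

    x    : current distribution over actions (1-D array summing to 1)
    loss : observed loss vector, entries in [0, 1]
    eta  : learning rate eta_t > 0
    """
    w = x * np.exp(-eta * loss)
    return w / w.sum()

# Toy run: two actions, action 0 always cheaper; mass shifts toward it.
x = np.array([0.5, 0.5])
for t in range(1, 101):
    loss = np.array([0.2, 0.8])
    x = hedge_update(x, loss, eta=1.0 / np.sqrt(t))
```

After 100 rounds almost all mass sits on the cheaper action, since the log-ratio of the weights grows by η_t · 0.6 per round.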
The replicator ODE
In Hedge, x_p^(t+1) ∝ x_p^(t) e^{−η_t^k l_p^(t)}; take η_t → 0. Replicator equation [29]:
∀a, dx_a/dt = x_a (⟨l_{A_k}(x), x_{A_k}⟩ − l_a(x))   (1)
Theorem [8]: every solution of the ODE (1) converges to the set of its stationary points.
AREP dynamics: Approximate REPlicator
Discretization of the continuous-time replicator dynamics:
x_a^(t+1) − x_a^(t) = η_t x_a^(t) (⟨l_{A_k}(x^(t)), x^(t)_{A_k}⟩ − l_a(x^(t))) + η_t U_a^(t+1)
η_t: discretization time steps.
(U^(t))_{t ≥ 1}: perturbations that satisfy, for all T > 0,
lim_{τ1 → ∞} max_{τ2 : Σ_{t=τ1}^{τ2} η_t < T} ‖ Σ_{t=τ1}^{τ2} η_t U^(t+1) ‖ = 0
(a sufficient condition: there exists q ≥ 2 such that sup_τ E‖U^(τ)‖^q < ∞ and Σ_τ η_τ^{1+q/2} < ∞). [3]
Convergence to Nash equilibria
Theorem [15]: under AREP updates, if η_t → 0 and Σ_t η_t = ∞, then x^(t) → X*.
Proof idea: the affine interpolation of (x^(t)) is an asymptotic pseudo-trajectory of the replicator ODE (compare the iterates x^(k) with the flows Φ_{t_k}(x^(k))); use f as a Lyapunov function.
However: no convergence rates.
Stochastic convex optimization
Technique 3: (stochastic) convex optimization.
Idea: view the learning dynamics as a distributed algorithm to minimize f (more generally, a distributed algorithm to find a zero of a monotone operator). This allows us to analyze convergence rates.
Here: a class of distributed optimization methods, stochastic mirror descent.
Stochastic Mirror Descent
minimize f(x), a convex function, subject to x ∈ X ⊂ R^d, a convex compact set. [21], [20]
Algorithm: SMD method with learning rates (η_t)
1: for t ∈ N do
2:   observe ˆl^(t) with E[ˆl^(t) | F_{t−1}] = ∇f(x^(t))
3:   x^(t+1)_{A_k} = arg min_{x ∈ X_{A_k}} ⟨ˆl^(t)_{A_k}, x⟩ + (1/η_t^k) D_{ψ_k}(x, x^(t)_{A_k})
4: end for
η_t: learning rate. D_ψ: Bregman divergence.
Figure: at each step, the algorithm minimizes the linearization f(x^(t)) + ⟨l^(t), x − x^(t)⟩ plus the proximity term (1/η_t) D_ψ(x, x^(t)).
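A sketch of one SMD run, assuming ψ is the entropy on the simplex, so D_ψ is the KL divergence and the arg min step has a closed multiplicative form (this is the Hedge/entropic-descent connection noted earlier). The quadratic potential, the noise level, and the rate schedule are all hypothetical choices for the check.

```python
import numpy as np

def smd_kl_step(x, grad_est, eta):
    """One SMD step on the simplex with entropic psi: the arg min of
    <grad_est, x> + (1/eta) KL(x, x_t) is the multiplicative update."""
    w = x * np.exp(-eta * grad_est)
    return w / w.sum()

# Toy potential on the simplex: f(x) = sum_a c_a x_a^2 / 2, gradient c * x.
# This is the Rosenthal potential of parallel links with latencies l_a = c_a x_a;
# the minimizer equalizes c_a x_a, here x* = (2/3, 1/3).
rng = np.random.default_rng(0)
c = np.array([1.0, 2.0])
x = np.array([0.5, 0.5])
for t in range(1, 20001):
    grad_est = c * x + 0.05 * rng.standard_normal(2)  # noisy gradient oracle
    x = smd_kl_step(x, grad_est, eta=0.5 * t ** -0.8)
```

The rate t^{−0.8} satisfies Σ η_t = ∞ and Σ η_t² < ∞, matching the almost-sure convergence conditions on the next slide.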
Convergence
Let d_τ = D_ψ(X*, x^(τ)). Main ingredient:
E[d_{τ+1} | F_{τ−1}] ≤ d_τ − η_τ (f(x^(τ)) − f*) + (η_τ² / 2μ) E[‖ˆl^(τ)‖² | F_{τ−1}]
From here, can show a.s. convergence x^(t) → X* if Σ η_t = ∞ and Σ η_t² < ∞: (d_τ) is an almost supermartingale [22], [6].
Deterministic version: if d_{τ+1} ≤ d_τ − a_τ + b_τ and Σ b_τ < ∞, then (d_τ) converges.
To bound E[f(x^(t))] − f*, generalize the technique of Shamir and Zhang [25] (for SGD, α = 1/2).
Convergence of Distributed Stochastic Mirror Descent: for η_t^k = θ_k t^{−α_k}, α_k ∈ (0, 1),
E[f(x^(t))] − f* = O(Σ_k log t / t^{min(α_k, 1−α_k)})
Non-smooth, non-strongly convex. [14]
Summary
Regret analysis: convergence of x̄^(t).
Stochastic approximation: almost sure convergence of x^(t).
Stochastic convex optimization: almost sure convergence, E[f(x^(t))] − f* → 0, E[D_ψ(x*, x^(t))] → 0, convergence rates.
Outline
1 Introduction
2 Convergence of agent dynamics
3 Routing Examples
4 Estimation and optimal control
Application to the routing game
Figure: example network with strongly convex potential. Centered Gaussian noise on the edges. Population 1: Hedge with η_t^1 = t^{−1}. Population 2: Hedge with η_t^2 = t^{−1}.
Routing game with strongly convex potential
Figure: population mass distributions x^(t) and noisy path losses ˆl_p(x^(τ)), for population 1 (paths p0 = (v0, v4, v6, v1), p1 = (v0, v4, v5, v1), p2 = (v0, v1)) and population 2 (paths p3 = (v2, v3), p4 = (v2, v4, v5, v3), p5 = (v2, v4, v6, v3)).
Figure: distance to equilibrium E[D_KL(x*, x^(τ))], with η_t^1 = t^{−1} and η_t^2 = t^{−1}.
For η_t^k = θ_k t^{−α_k}, α_k ∈ (0, 1], E[D_ψ(x*, x^(t))] = O(Σ_k t^{−α_k})
Routing game with weakly convex potential
Figure: a weakly convex example.
Figure: potential values f(x^(τ)) − f*, for learning rates (η_t^1, η_t^2) = (t^{−.3}, t^{−.4}) and (t^{−.5}, …).
For η_t^k = θ_k t^{−α_k}, α_k ∈ (0, 1), E[f(x^(t))] − f* = O(Σ_k log t / t^{min(α_k, 1−α_k)})
Outline
1 Introduction
2 Convergence of agent dynamics
3 Routing Examples
4 Estimation and optimal control
A routing experiment
Interface for the routing game, used to collect the sequence of decisions x^(t). Each player k sends its decision x_k^(t) to the server and observes the loss history (l_k^(t), l_k^(t−1), ...).
Figure: interface for the routing game experiment.
Estimation of learning dynamics
We observe a sequence of player decisions (x^(t)) and losses (l^(t)). Can we fit a model of player dynamics? [17]
Mirror descent model: estimate the learning rate in the mirror descent model
x̂^(t+1)(η) = arg min_x ⟨l^(t), x⟩ + (1/η) D_KL(x, x^(t))
Then d(η) = D_KL(x^(t+1), x̂^(t+1)(η)) is a convex function of η; minimize it to estimate η_k^(t).
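Minimizing d(η) can be done with any scalar method; here is a self-contained sketch using ternary search on the convex objective. The distributions, losses, and true rate 0.7 are synthetic, made up to check that the rate is recovered.

```python
import numpy as np

def kl(p, q):
    """KL divergence between two distributions with full support."""
    return float(np.sum(p * np.log(p / q)))

def md_predict(x, loss, eta):
    """Mirror descent step with KL divergence: x_hat^(t+1)(eta) in closed form."""
    w = x * np.exp(-eta * loss)
    return w / w.sum()

def estimate_eta(x_t, loss_t, x_next, lo=0.0, hi=10.0, iters=200):
    """Estimate the learning rate by minimizing the convex function
    d(eta) = KL(x^(t+1), x_hat^(t+1)(eta)) with a ternary search."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if kl(x_next, md_predict(x_t, loss_t, m1)) < kl(x_next, md_predict(x_t, loss_t, m2)):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

# Synthetic check: generate one step with a known rate, then recover it.
x_t = np.array([0.3, 0.5, 0.2])
loss_t = np.array([0.9, 0.1, 0.4])
x_next = md_predict(x_t, loss_t, eta=0.7)
eta_hat = estimate_eta(x_t, loss_t, x_next)
```

Since the observed step was generated by the model itself, d(η) vanishes exactly at η = 0.7, and the search recovers it to high precision.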
Preliminary results
Figure: costs ⟨x_k^(t), l_k^(t)⟩ of each player, normalized by the equilibrium cost ⟨x_k*, l_k*⟩, showing an exploration phase followed by convergence (players 1–5).
Figure: potential function f(x^(t)) − f*, decreasing toward its minimum after the exploration phase.
Figure: average KL divergence between predicted and actual distributions, as a function of the prediction horizon h, for several learning-rate models (parameterized η_t, previous η_t, moving-average η_t, linear-regression η_t).
Optimal routing with learning dynamics
Assumptions: a central authority has control over a fraction of the traffic, u^(t) ∈ α_1 Δ_{A_1} × ... × α_K Δ_{A_K}; the remaining traffic follows the learning dynamics, x^(t) ∈ (1 − α_1) Δ_{A_1} × ... × (1 − α_K) Δ_{A_K}.
Optimal routing under selfish learning constraints:
minimize over u^(1:T), x^(1:T): Σ_{t=1}^T J(x^(t), u^(t))
subject to x^(t+1) = u(x^(t) + u^(t), l(x^(t) + u^(t)))
Solution methods
Greedy method: approximate the problem with a sequence of convex problems,
minimize over u^(t): J(u(x^(t−1), u^(t−1)), u^(t))
Mirror descent with the adjoint method: the adjoint method turns the constrained problem (minimize J(u, x) subject to H(x, u) = 0) into the equivalent problem: minimize J(u, X(u)). Then perform mirror descent on this function of u.
A simple example
Figure: simple Pigou network used for the numerical experiment. Total flow F = 1 from s to t; edge costs c_e(φ_e) = 1 (top edge) and c_e(φ_e) = 2φ_e (bottom edge).
Social optimum: (3/4, 1/4). Nash equilibrium: (1/2, 1/2). Control over a fraction α = 1/2 of the traffic.
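These two allocations can be checked numerically: a sketch minimizing the Rosenthal potential (for the Nash equilibrium) and the social cost (for the social optimum) over the bottom-edge flow φ by grid search.

```python
import numpy as np

# Pigou network: top edge cost 1, bottom edge cost 2*phi. Total flow F = 1.
phi = np.linspace(0.0, 1.0, 100001)        # flow on the bottom edge

# Rosenthal potential: sum over edges of the integral of the edge cost,
# (1 - phi)*1 + integral_0^phi 2s ds = (1 - phi) + phi**2.
potential = (1 - phi) + phi ** 2
phi_nash = phi[np.argmin(potential)]       # Nash: bottom flow 1/2

# Social cost: total travel time, (1 - phi)*1 + phi*(2*phi).
social = (1 - phi) + 2 * phi ** 2
phi_so = phi[np.argmin(social)]            # Social optimum: bottom flow 1/4
```

The minimizers match the slide: Nash routes 1/2 of the flow on each edge, while the social optimum keeps only 1/4 on the congestible bottom edge, i.e. the allocation (3/4, 1/4).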
Figure: social cost J^(t) over time induced by the adjoint solution (left) and the greedy solution (right). The dashed line shows the social optimal allocation.
Figure: adjoint-controlled flows u_t (top) and selfish flows x_t (bottom), with path losses l(x_t + u_t). The green lines correspond to the top path, and the blue lines to the bottom path. The dashed lines show the social optimal flows x_SO = (3/4, 1/4).
Application to the L.A. highway network
Simplified model of the L.A. highway network. Cost functions use the B.P.R. function, calibrated using the work of [28].
Figure: Los Angeles highway network.
Figure: average delay J(x^(i), u^(i)) without control (dashed), with full control (solid), and for different values of α ∈ {0.1, 0.3, 0.5, 0.7, 0.9}. [27]
Summary
Figure: coupled sequential decision problems. Agent k updates x^(t+1)_{A_k} = u(x^(t)_{A_k}, l^(t)_{A_k}); the environment (the other agents) produces the outcome l_{A_k}(x^(t)_{A_1}, ..., x^(t)_{A_K}).
Simple model for distributed learning. Techniques for design / analysis of learning dynamics. Estimation of learning dynamics. Optimal control.
Ongoing / future work
Larger classes of games.
Acceleration of learning dynamics (Nesterov's method): in continuous time, accelerated replicator dynamics; in discrete time, convergence in O(1/t²) instead of O(1/t); heuristic to remove oscillations.
Applications to load balancing.
Other control schemes: tolling / incentivization.
Thank you! eecs.berkeley.edu/~walid/
87 References I [1] Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta-algorithm and applications. Theory of Computing, 8(1): , [2] Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett., 31(3): , May [3] Michel Benaïm. Dynamics of stochastic approximation algorithms. In Séminaire de probabilités XXXIII, pages Springer, [4] Avrim Blum, Eyal Even-Dar, and Katrina Ligett. Routing without regret: on convergence to nash equilibria of regret-minimizing algorithms in routing games. In Proceedings of the twenty-fifth annual ACM symposium on Principles of distributed computing, PODC 06, pages 45 52, New York, NY, USA, ACM. [5] Lawrence E. Blume. The statistical mechanics of strategic interaction. Games and Economic Behavior, 5(3): , ISSN [6] Léon Bottou. Online algorithms and stochastic approximations [7] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, learning, and games. Cambridge University Press, /38
88 References II [8] Simon Fischer and Berthold Vöcking. On the evolution of selfish routing. In Algorithms ESA 2004, pages Springer, [9] Yoav Freund and Robert E Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29(1):79 103, [10] James Hannan. Approximation to Bayes risk in repeated plays. Contributions to the Theory of Games, 3:97 139, [11] Sergiu Hart and Andreu Mas-Colell. A general class of adaptive strategies. Journal of Economic Theory, 98(1):26 54, [12] Jyrki Kivinen and Manfred K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132 (1):1 63, [13] Robert Kleinberg, Georgios Piliouras, and Eva Tardos. Multiplicative updates outperform generic no-regret learning in congestion games. In Proceedings of the 41st annual ACM symposium on Theory of computing, pages ACM, /38
89 References III
[14] Syrine Krichene, Walid Krichene, Roy Dong, and Alexandre Bayen. Convergence of heterogeneous distributed learning in stochastic routing games. In 53rd Allerton Conference on Communication, Control and Computing.
[15] Walid Krichene, Benjamin Drighès, and Alexandre Bayen. Learning Nash equilibria in congestion games. SIAM Journal on Control and Optimization (SICON), to appear.
[16] Walid Krichene, Syrine Krichene, and Alexandre Bayen. Convergence of mirror descent dynamics in the routing game. In European Control Conference (ECC).
[17] Kiet Lam, Walid Krichene, and Alexandre M. Bayen. Estimation of learning dynamics in the routing game. In International Conference on Cyber-Physical Systems (ICCPS), in review.
[18] Jason R. Marden and Jeff S. Shamma. Revisiting log-linear learning: Asynchrony, completeness and payoff-based implementation. Games and Economic Behavior, 75(2).
90 References IV
[19] Jason R. Marden, Gürdal Arslan, and Jeff S. Shamma. Joint strategy fictitious play with inertia for potential games. IEEE Transactions on Automatic Control, 54(2).
[20] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19(4).
[21] A. S. Nemirovsky and D. B. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley-Interscience Series in Discrete Mathematics. Wiley.
[22] H. Robbins and D. Siegmund. A convergence theorem for non negative almost supermartingales and some applications. Optimizing Methods in Statistics.
[23] William H. Sandholm. Potential games with continuous player sets. Journal of Economic Theory, 97(1):81–108, 2001.
[24] William H. Sandholm. Population Games and Evolutionary Dynamics. Economic Learning and Social Evolution. MIT Press, Cambridge, Mass.
91 References V
[25] Ohad Shamir and Tong Zhang. Stochastic gradient descent for non-smooth optimization: Convergence results and optimal averaging schemes. In ICML, pages 71–79.
[26] Weijie Su, Stephen Boyd, and Emmanuel Candes. A differential equation for modeling Nesterov's accelerated gradient method: Theory and insights. In NIPS.
[27] Milena Suarez, Walid Krichene, and Alexandre Bayen. Optimal routing under hedge response. Transactions on Control of Networked Systems (TCNS), in preparation.
[28] J. Thai, R. Hariss, and A. Bayen. A multi-convex approach to latency inference and control in traffic equilibria from sparse data. In American Control Conference (ACC), July 2015.
[29] Jörgen W. Weibull. Evolutionary Game Theory. MIT Press, 1997.
92 Continuous time model Back
Continuous-time learning model:
ẋ_{A_k}(t) = v_k( x(t), l_{A_k}(x(t)) )
- Evolution in populations [24].
- Convergence in potential games under dynamics which satisfy a positive correlation condition [23].
- Replicator dynamics for the congestion game [8] and in evolutionary game theory [29].
- No-regret dynamics for two-player games [11].
[24] William H. Sandholm. Population Games and Evolutionary Dynamics. MIT Press. [23] William H. Sandholm. Potential games with continuous player sets. Journal of Economic Theory, 97(1):81–108, 2001. [29] Jörgen W. Weibull. Evolutionary Game Theory. MIT Press, 1997. [8] Simon Fischer and Berthold Vöcking. On the evolution of selfish routing. In Algorithms – ESA 2004. Springer, 2004. [11] Sergiu Hart and Andreu Mas-Colell. A general class of adaptive strategies. Journal of Economic Theory, 98(1):26–54. 44/38
93 Oscillating example Back
Figure: Path losses. Four panels plot the path losses l̂_p(x^(τ)) against τ for populations 1 and 2, along paths p0 = (v0, v5, v1), p1 = (v0, v4, v5, v1), and p2 = (v0, v1). 45/38
94 Oscillating example Back
Figure: Potentials. The potential f(x^(τ)) plotted against t, with the optimal value f* shown for reference. 46/38
95 Oscillating example Back
Figure: Trajectories in the simplex, with vertices p0 through p5: initial points x^(0)_{A_1}, x^(0)_{A_2} and limits lim_{τ→∞} x^(τ)_{A_1}, lim_{τ→∞} x^(τ)_{A_2}. 47/38
96 Regret [10] Back
Cumulative regret of population k:
R_k^(t) = sup_{x_{A_k}} Σ_{τ≤t} ⟨ x^(τ)_{A_k} − x_{A_k}, l_{A_k}(x^(τ)) ⟩
The dynamics have sublinear regret if, for all k, lim sup_t R_k^(t)/t ≤ 0.
Convergence of averages: let x̄^(t) = (1/t) Σ_{τ≤t} x^(τ) ∈ X. By convexity of f, for all x ∈ X,
f(x̄^(t)) − f(x) ≤ (1/t) Σ_{τ≤t} ( f(x^(τ)) − f(x) ) ≤ (1/t) Σ_{τ≤t} ⟨ l(x^(τ)), x^(τ) − x ⟩ ≤ Σ_{k=1}^K R_k^(t)/t
so sublinear regret for every population implies f(x̄^(t)) → f*.
[10] James Hannan. Approximation to Bayes risk in repeated plays. Contributions to the Theory of Games, 3:97–139. 48/38
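The regret defined above can be measured directly on a recorded play sequence. The helper below is a hypothetical illustration for a single population: since the regret objective is linear in x, the supremum over the simplex is attained at a vertex, i.e. at the best fixed path in hindsight.

```python
def cumulative_regret(plays, losses):
    # plays[t]: distribution over d paths at round t; losses[t]: loss vector.
    d = len(plays[0])
    incurred = sum(sum(p[a] * l[a] for a in range(d))
                   for p, l in zip(plays, losses))
    # sup over the simplex of a linear function is attained at a vertex,
    # i.e. at the best fixed path in hindsight.
    best_fixed = min(sum(l[a] for l in losses) for a in range(d))
    return incurred - best_fixed

plays = [[0.5, 0.5]] * 4
losses = [[1.0, 0.0]] * 4
# incurred loss 2.0 vs. best fixed path 0.0: regret 2.0
```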
97 AREP convergence proof Back
- The affine interpolation of x^(t) is an asymptotic pseudo-trajectory (APT).
(Diagram: iterates x^(0), x^(1), ..., x^(k) compared against the semiflow, Φ_{t_0}(x^(0)), ..., Φ_{t_{k−1}}(x^(k−1)).)
- The set of limit points of an APT is internally chain transitive (ICT).
- If Γ is compact invariant and admits a Lyapunov function f with int f(Γ) = ∅, then every ICT set L satisfies L ⊆ Γ, and f is constant on L.
- In particular, f is constant on the limit set L(x^(t)), so f(x^(t)) converges. 49/38
98 Bregman Divergence Back
Bregman divergence of a strongly convex function ψ:
D_ψ(x, y) = ψ(x) − ψ(y) − ⟨∇ψ(y), x − y⟩
[2] Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3). 50/38
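As a quick sanity check (an illustration, not from the slides), the definition can be coded generically; with ψ(x) = ‖x‖², the Bregman divergence reduces to the squared Euclidean distance.

```python
def bregman(psi, grad_psi, x, y):
    # D_psi(x, y) = psi(x) - psi(y) - <grad psi(y), x - y>
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_psi(y), x, y))
    return psi(x) - psi(y) - inner

def sq_norm(x):
    return sum(xi * xi for xi in x)

def grad_sq_norm(x):
    return [2 * xi for xi in x]

# With psi = ||.||^2 the divergence is ||x - y||^2:
dist = bregman(sq_norm, grad_sq_norm, [1.0, 2.0], [0.0, 0.0])  # = 5.0
```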
99 Bregman Divergence Back
Bregman divergence of a strongly convex function ψ:
D_ψ(x, y) = ψ(x) − ψ(y) − ⟨∇ψ(y), x − y⟩
Example [2]: when X = Δ and ψ(x) = −H(x) = Σ_a x_a ln x_a (the negative entropy),
D_ψ(x, y) = D_KL(x, y) = Σ_a x_a ln(x_a / y_a)
The MD update has the closed-form solution x_a^(t+1) ∝ x_a^(t) e^{−η_t g_a^(t)}.
A.k.a. the Hedge algorithm, or exponential weights.
Figure: KL divergence (level sets over the simplex with vertices δ_1, δ_2, δ_3 and a point q).
[2] Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3). 50/38
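The closed-form update above is a one-liner in code. This sketch (with made-up losses, purely illustrative) implements one Hedge / exponential-weights step.

```python
import math

def hedge_step(x, g, eta):
    # Entropic mirror-descent update: x_a proportional to x_a * exp(-eta * g_a),
    # renormalized onto the simplex.
    w = [xa * math.exp(-eta * ga) for xa, ga in zip(x, g)]
    z = sum(w)
    return [wa / z for wa in w]

x = hedge_step([1/3, 1/3, 1/3], g=[1.0, 0.5, 0.0], eta=1.0)
# Mass shifts toward the low-loss action: x[2] > x[1] > x[0].
```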
100 A bounded entropic divergence Back
On X = Δ, D_KL(x, y) = Σ_{i=1}^d x_i ln(x_i / y_i) is unbounded. 51/38
101 A bounded entropic divergence Back
On X = Δ, D_KL(x, y) = Σ_{i=1}^d x_i ln(x_i / y_i) is unbounded.
Define D_KL^ε(x, y) = Σ_{i=1}^d (x_i + ε) ln( (x_i + ε) / (y_i + ε) ).
Proposition:
- D_KL^ε is 1/(1 + dε)-strongly convex w.r.t. ‖·‖_1.
- D_KL^ε is bounded by (1 + dε) ln((1 + ε)/ε). 51/38
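A quick numerical check of the proposition (illustrative; the choice d = 3, ε = 0.1 is arbitrary): the shifted divergence stays finite and under the stated bound even on a disjoint-support pair, where the unshifted KL divergence is infinite.

```python
import math

def kl_eps(x, y, eps):
    # Shifted entropic divergence: sum_i (x_i + eps) * ln((x_i + eps) / (y_i + eps))
    return sum((xi + eps) * math.log((xi + eps) / (yi + eps))
               for xi, yi in zip(x, y))

d, eps = 3, 0.1
bound = (1 + d * eps) * math.log((1 + eps) / eps)
# Disjoint-support pair: the unshifted KL divergence would be infinite here.
div = kl_eps([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], eps)
```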
102 Convergence of DMD Back
Theorem: Convergence of DMD [16]. Suppose f has an L-Lipschitz gradient. Then under the MD class with η_t → 0 and Σ_t η_t = ∞,
f(x^(t)) − f* = O( (Σ_{τ≤t} η_τ) / (t η_t) )
and
(1/t) Σ_{τ≤t} f(x^(τ)) − f* ≤ (1/t) Σ_k [ (L_k² / (2 l_{ψ_k})) Σ_{τ≤t} η^k_τ + D_k / η^k_t ]
f(x^(t)) − f* ≤ (1/t) Σ_{τ≤t} ( f(x^(τ)) − f* ) + O(1/t) 52/38
103 Convergence in DSMD Back
Regret bound [14]. For the SMD method with rates (η_t): for all t_2 > t_1 ≥ 0 and F_{t_1}-measurable x,
E[ Σ_{τ=t_1}^{t_2} ⟨ g^(τ), x^(τ) − x ⟩ ] ≤ E[ D_ψ(x, x^(t_1)) ] / η_{t_1} + D (1/η_{t_2} − 1/η_{t_1}) + (G² / (2 l_ψ)) Σ_{τ=t_1}^{t_2} η_τ
Strongly convex case:
E[ D_ψ(x*, x^(t+1)) ] ≤ (1 − 2 l_f η_t) E[ D_ψ(x*, x^(t)) ] + (G² / (2 l_ψ)) η_t² 53/38
104 Convergence in DSMD Back
Theorem [14]. Weakly convex case: distributed SMD such that η^p_t = θ_p / t^{α_p} with α_p ∈ (0, 1). Then
E[ f(x^(t)) ] − f(x*) ≤ Σ_{k∈A} [ (D_k/θ_k) (1 + t^{α_k}) / t + (θ_k G² / (2 l_ψ (1 − α_k))) (1 + t^{1−α_k}) / t ] = O( log t / t^{min(min_k α_k, 1 − max_k α_k)} )
Proof idea: define S_i = (1/(t_{i+1} − t_i)) Σ_{τ=t_i}^{t_{i+1}} E[ f(x^(τ)) ] and show S_{i+1} ≤ S_i + [ (D/θ) t_i^{α−1} + (θ G² / (2 l_ψ (1 − α))) t_i^{−α} ] (1/i).
[14] Syrine Krichene, Walid Krichene, Roy Dong, and Alexandre Bayen. Convergence of heterogeneous distributed learning in stochastic routing games. In 53rd Allerton Conference on Communication, Control and Computing. 54/38
105 Stochastic mirror descent in machine learning Back
Large-scale learning:
minimize Σ_{i=1}^N f_i(x) subject to x ∈ X
with N very large. The gradient is prohibitively expensive to compute exactly. Instead, compute ĝ(x^(t)) = Σ_{i∈I} ∇f_i(x^(t)) with I a random subset of {1, ..., N}. 55/38
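This finite-sum setting can be sketched in a few lines (an illustration with assumed data: scalar quadratic terms f_i(x) = (x − c_i)², mini-batches of size 10, and step sizes η_t = 0.5/t, all choices made for the example).

```python
import random

# Minimal sketch: minimize the average of f_i(x) = (x - c_i)^2 over scalars,
# sampling a random subset I of terms per step. Data and step sizes are assumptions.
random.seed(0)
N = 1000
c = [random.gauss(1.5, 1.0) for _ in range(N)]  # assumed data

x = 0.0
for t in range(1, 5001):
    batch = random.sample(range(N), 10)               # random subset I
    g_hat = sum(2 * (x - c[i]) for i in batch) / 10   # sampled gradient estimate
    x -= (0.5 / t) * g_hat                            # decreasing step size eta_t
# x approaches the minimizer of the average, i.e. the empirical mean of the c_i.
```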
More informationGains in evolutionary dynamics. Dai ZUSAI
Gains in evolutionary dynamics unifying rational framework for dynamic stability Dai ZUSAI Hitotsubashi University (Econ Dept.) Economic Theory Workshop October 27, 2016 1 Introduction Motivation and literature
More informationIrrational behavior in the Brown von Neumann Nash dynamics
Irrational behavior in the Brown von Neumann Nash dynamics Ulrich Berger a and Josef Hofbauer b a Vienna University of Economics and Business Administration, Department VW 5, Augasse 2-6, A-1090 Wien,
More informationProbabilistic Bisection Search for Stochastic Root Finding
Probabilistic Bisection Search for Stochastic Root Finding Rolf Waeber Peter I. Frazier Shane G. Henderson Operations Research & Information Engineering Cornell University, Ithaca, NY Research supported
More informationAccelerated Proximal Gradient Methods for Convex Optimization
Accelerated Proximal Gradient Methods for Convex Optimization Paul Tseng Mathematics, University of Washington Seattle MOPTA, University of Guelph August 18, 2008 ACCELERATED PROXIMAL GRADIENT METHODS
More informationOptimal and Adaptive Online Learning
Optimal and Adaptive Online Learning Haipeng Luo Advisor: Robert Schapire Computer Science Department Princeton University Examples of Online Learning (a) Spam detection 2 / 34 Examples of Online Learning
More information