Distributed Learning in Routing Games


1 Distributed Learning in Routing Games: Convergence, Estimation of Player Dynamics, and Control
Walid Krichene, Alexandre Bayen
Electrical Engineering and Computer Sciences, UC Berkeley
IPAM Workshop on Decision Support for Traffic, November 18.

2 Learning dynamics in the routing game

Routing games model congestion on networks (e.g., communication networks, traffic networks). The Nash equilibrium quantifies the efficiency of the network in steady state. However, the system does not operate at equilibrium: beyond equilibria, we need to understand decision dynamics (learning). A realistic model for decision dynamics is essential for prediction and optimal control.

6 Desiderata

Learning dynamics should be:
- Realistic in terms of information requirements and computational complexity.
- Consistent with the full-information Nash equilibrium: x^(t) → X*. Convergence rates?
- Robust to stochastic perturbations, e.g., observation noise (bandit feedback).

9 Outline

1. Introduction
2. Convergence of agent dynamics
3. Routing Examples
4. Estimation and optimal control

11 Interaction of K decision makers

Decision maker k faces a sequential decision problem. At iteration t, the agent
(1) chooses a probability distribution x_{A_k}^(t) over its action set A_k,
(2) discovers a loss function l_{A_k}^(t) : A_k → [0, 1],
(3) updates the distribution with a learning algorithm: x_{A_k}^(t+1) = u(x_{A_k}^(t), l_{A_k}^(t)).

Figure: Sequential decision problem (agent k plays x^(t); the environment and the other agents return the outcome l_{A_k}^(t)).

The loss of agent k is affected by the strategies of the other agents: l_{A_k}(x_{A_1}^(t), ..., x_{A_K}^(t)). The agent does not know this function; it only observes its value. Write x^(t) = (x_{A_1}^(t), ..., x_{A_K}^(t)).

13 Examples of decentralized decision makers: the routing game

A player drives from a source node to a destination node, and chooses a path from the set of available paths. The mass of players on each edge determines the cost on that edge.

Figure: Routing game.

15 Online learning model

Online Learning Model
1: for t ∈ N do
2:   Play p ~ x^(t)
3:   Discover l^(t)
4:   Update x^(t+1) = u_k(x^(t), l^(t))
5: end for

At each iteration, agent 1 samples a path p ~ x_{A_1}^(t), discovers the losses l_{A_1}^(t), and updates its distribution to x_{A_1}^(t+1).

Main problem
Define a class of dynamics C such that if u_k ∈ C for all k, then x^(t) → X*.
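This loop can be sketched in code (a minimal illustration, not the talk's experimental setup; the constant two-action environment and the Hedge-style update are hypothetical stand-ins for an arbitrary admissible u_k):

```python
import numpy as np

def run_online_learning(update, losses, x0, T):
    """Generic sequential decision loop: play x(t), discover l(t), update."""
    x = np.array(x0, dtype=float)
    trajectory = [x.copy()]
    for t in range(T):
        l = losses(x, t)      # environment reveals the loss vector l(t)
        x = update(x, l, t)   # x(t+1) = u_k(x(t), l(t))
        trajectory.append(x.copy())
    return np.array(trajectory)

def hedge_update(x, l, t):
    """One admissible update rule: Hedge with decaying rate eta_t = 1/sqrt(t+1)."""
    eta = 1.0 / np.sqrt(t + 1)
    w = x * np.exp(-eta * l)
    return w / w.sum()

# Toy environment with fixed losses: action 0 is always better.
traj = run_online_learning(hedge_update, lambda x, t: np.array([0.2, 0.8]),
                           [0.5, 0.5], T=200)
```

The trajectory concentrates on the lower-loss action while remaining a probability distribution at every step.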

20 A brief review

Discrete time:
- Hannan consistency [10]
- Hedge algorithm for two-player games [9]
- Regret-based algorithms [11]
- Online learning in games [7]
- Potential games [19]
Continuous time:

[10] James Hannan. Approximation to Bayes risk in repeated plays. Contributions to the Theory of Games, 3:97–139, 1957.
[9] Yoav Freund and Robert E. Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29(1):79–103, 1999.
[11] Sergiu Hart and Andreu Mas-Colell. A general class of adaptive strategies. Journal of Economic Theory, 98(1):26–54, 2001.
[7] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
[19] Jason R. Marden, Gürdal Arslan, and Jeff S. Shamma. Joint strategy fictitious play with inertia for potential games. IEEE Transactions on Automatic Control, 54(2).

21 Outline

1. Introduction
2. Convergence of agent dynamics
3. Routing Examples
4. Estimation and optimal control

22 Nash equilibria and the Rosenthal potential

Write x = (x_{A_1}, ..., x_{A_K}) ∈ Δ_{A_1} × ... × Δ_{A_K} and l(x) = (l_{A_1}(x), ..., l_{A_K}(x)).

Nash equilibrium
x* is a Nash equilibrium if ⟨l(x*), x − x*⟩ ≥ 0 for all x, equivalently, for all k and all x_{A_k} ∈ Δ_{A_k}, ⟨l_{A_k}(x*), x_{A_k} − x*_{A_k}⟩ ≥ 0.

In words: for all k, the paths in the support of x*_{A_k} have minimal loss.

Figure: Population distributions x^(t) and expected path losses l_{A_k}(x^(t)) for population 1 (paths p0 = (v0, v5, v1), p1 = (v0, v4, v5, v1), p2 = (v0, v1)).

24 Nash equilibria and the Rosenthal potential

Rosenthal potential
There exists a convex f such that ∇f(x) = l(x). Then the set of Nash equilibria is
X* = arg min_{x ∈ Δ_{A_1} × ... × Δ_{A_K}} f(x).

The Nash condition is the first-order optimality condition:
∀x, ⟨l(x*), x − x*⟩ ≥ 0  ⇔  ∀x, ⟨∇f(x*), x − x*⟩ ≥ 0, since ∇f(x*) = l(x*).

26 Regret analysis

Technique 1: Regret analysis

Cumulative regret
R_k^(t) = sup_{x_{A_k} ∈ Δ_{A_k}} Σ_{τ ≤ t} ⟨x_{A_k}^(τ) − x_{A_k}, l_{A_k}(x^(τ))⟩

This is an online optimality condition; the regret is sublinear if lim sup_t R_k^(t)/t ≤ 0.

Convergence of averages
If R_k^(t) is sublinear for all k, then x̄^(t) → X*, where x̄^(t) = (1/t) Σ_{τ=1}^t x^(τ).
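For a concrete check, the cumulative regret and the averaged iterate can be computed directly from a played sequence; a sketch (the loss sequence below is synthetic, and the sup over the simplex is taken at a vertex, i.e., the best fixed action in hindsight):

```python
import numpy as np

def cumulative_regret(xs, ls):
    """R(t) = sup_x sum_tau <x(tau) - x, l(tau)>; the sup over the simplex
    is attained at a vertex (the best fixed action in hindsight)."""
    xs, ls = np.asarray(xs, float), np.asarray(ls, float)
    realized = np.einsum('ta,ta->', xs, ls)   # sum_tau <x(tau), l(tau)>
    best_fixed = ls.sum(axis=0).min()         # best single action in hindsight
    return realized - best_fixed

def average_iterate(xs):
    """x_bar(t) = (1/t) sum_{tau=1..t} x(tau)."""
    return np.asarray(xs, float).mean(axis=0)

# A player who keeps playing uniform against losses (0, 1) accumulates
# regret 0.5 per round relative to always playing action 0.
xs = [[0.5, 0.5]] * 10
ls = [[0.0, 1.0]] * 10
R = cumulative_regret(xs, ls)   # 10 * 0.5 - 0 = 5.0
xbar = average_iterate(xs)
```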

28 Convergence of x̄^(t) vs. convergence of x^(t)

Routing game example: the averaged distributions x̄^(t) may converge even when the distributions x^(t) keep oscillating.

Figure: Population distributions and path losses for populations 1 and 2, for the iterates x^(t) and for the averages x̄^(t).

30 Stochastic approximation

Technique 2: Stochastic approximation

Idea: view the learning dynamics as a discretization of an ODE; study the convergence of the ODE; relate the convergence of the discrete algorithm to that of the ODE.

Figure: Underlying continuous time, with step boundaries at 0, η_1, η_1 + η_2, ...

32 Example: the Hedge algorithm

Hedge algorithm
Update the distribution according to the observed loss:
x_a^(t+1) ∝ x_a^(t) e^(−η_t^k l_a^(t))

Also known as: exponentially weighted average forecaster [7], multiplicative weights update [1], exponentiated gradient descent [12], entropic descent [2], log-linear learning [5], [18].

[7] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
[1] Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta-algorithm and applications. Theory of Computing, 8(1), 2012.
[12] Jyrki Kivinen and Manfred K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1–63, 1997.
[2] Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3).
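In code, the Hedge update is one vectorized line plus normalization; a minimal sketch (subtracting the minimum loss before exponentiating is a standard numerical-stability trick, valid because constant shifts cancel in the normalization; the numbers are illustrative):

```python
import numpy as np

def hedge(x, loss, eta):
    """x_a(t+1) proportional to x_a(t) * exp(-eta * l_a(t))."""
    z = np.exp(-eta * (loss - loss.min()))  # shift cancels after normalization
    w = x * z
    return w / w.sum()

x = np.array([0.25, 0.25, 0.50])
loss = np.array([1.0, 0.5, 2.0])
x_new = hedge(x, loss, eta=1.0)
```

Mass shifts toward the action with the smallest observed loss, and the iterate stays on the simplex.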

38 The replicator ODE

In Hedge, x_p^(t+1) ∝ x_p^(t) e^(−η_t^k l_p^(t)); take η_t → 0. This yields the replicator equation [29]:

∀a, dx_a/dt = x_a (⟨l_{A_k}(x), x_{A_k}⟩ − l_a(x))    (1)

Theorem [8]: every solution of the ODE (1) converges to the set of its stationary points.

[29] Jörgen W. Weibull. Evolutionary Game Theory. MIT Press, 1997.
[8] Simon Fischer and Berthold Vöcking. On the evolution of selfish routing. In Algorithms – ESA 2004. Springer, 2004.
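The ODE (1) can be integrated numerically with a forward-Euler scheme; a sketch on a hypothetical two-path network with congestion losses l_a(x) = x_a (not one of the talk's examples). Note that the Euler step preserves the simplex sum exactly, since the increments cancel:

```python
import numpy as np

def replicator_step(x, loss, dt):
    """Forward-Euler step of dx_a/dt = x_a * (<l(x), x> - l_a(x))."""
    mean_loss = loss @ x
    return x + dt * x * (mean_loss - loss)

# Two parallel paths with losses l_a(x) = x_a: the stationary point
# equalizes the losses, i.e., x = (1/2, 1/2).
x = np.array([0.9, 0.1])
for _ in range(2000):
    x = replicator_step(x, loss=x.copy(), dt=0.01)
```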

40 AREP dynamics: Approximate REPlicator

Discretization of the continuous-time replicator dynamics:
x_a^(t+1) − x_a^(t) = η_t x_a^(t) (⟨l_{A_k}(x^(t)), x_{A_k}^(t)⟩ − l_a(x^(t))) + η_t U_a^(t+1)

- (η_t): discretization time steps.
- (U^(t))_{t ≥ 1}: perturbations that satisfy, for all T > 0,
  lim_{τ_1 → ∞} max_{τ_2 : Σ_{t=τ_1}^{τ_2} η_t < T} ‖Σ_{t=τ_1}^{τ_2} η_t U^(t+1)‖ = 0.
  (A sufficient condition: for some q ≥ 2, sup_τ E‖U^(τ)‖^q < ∞ and Σ_τ η_τ^(1+q/2) < ∞.)

[3] Michel Benaïm. Dynamics of stochastic approximation algorithms. In Séminaire de probabilités XXXIII. Springer.

43 Convergence to Nash equilibria

Theorem [15]
Under AREP updates, if η_t → 0 and Σ_t η_t = ∞, then x^(t) → X*.

Proof idea: the affine interpolation of (x^(t)) is an asymptotic pseudo-trajectory of the flow Φ of the ODE; use f as a Lyapunov function. However: no convergence rates.

Figure: Affine interpolation of the iterates x^(0), x^(1), ..., x^(k) compared with the flow Φ_t.

[15] Walid Krichene, Benjamin Drighès, and Alexandre Bayen. Learning Nash equilibria in congestion games. SIAM Journal on Control and Optimization (SICON), to appear.

45 Stochastic convex optimization

Technique 3: (Stochastic) convex optimization

Idea: view the learning dynamics as a distributed algorithm to minimize f (more generally, a distributed algorithm to find a zero of a monotone operator). This allows us to analyze convergence rates. Here: a class of distributed optimization methods, stochastic mirror descent.

49 Stochastic Mirror Descent

minimize f(x), a convex function, subject to x ∈ X ⊂ R^d, a convex, compact set.

Algorithm: SMD method with learning rates (η_t)
1: for t ∈ N do
2:   observe ˆl_{A_k}^(t) with E[ˆl_{A_k}^(t) | F_{t−1}] ∈ ∂_{A_k} f(x^(t))
3:   x_{A_k}^(t+1) = arg min_{x ∈ X_{A_k}} ⟨ˆl_{A_k}^(t), x⟩ + (1/η_t^k) D_{ψ_k}(x, x_{A_k}^(t))
4: end for

η_t: learning rate; D_ψ: Bregman divergence. The update minimizes a first-order approximation of f plus a proximity term:
f(x^(t)) + ⟨l^(t), x − x^(t)⟩ + (1/η_t) D_ψ(x, x^(t)).

[21] A. S. Nemirovsky and D. B. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley-Interscience Series in Discrete Mathematics. Wiley, 1983.
[20] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19(4).
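With the entropic Bregman divergence D_ψ = D_KL on the simplex, line 3 has a closed form: the multiplicative-weights (Hedge) update. A sketch with two populations on a hypothetical two-edge network (edge costs c1(φ) = 1 and c2(φ) = 2φ; this is an illustrative Pigou-style setup, not the talk's experiments), monitoring the Rosenthal potential:

```python
import numpy as np

def entropic_md_step(x, loss, eta):
    """argmin_y <loss, y> + (1/eta) * KL(y, x) over the simplex,
    in closed form: the multiplicative-weights update."""
    w = x * np.exp(-eta * loss)
    return w / w.sum()

def edge_flows(xs):
    # Two populations of mass 1/2 each, routing over two parallel edges.
    return 0.5 * xs[0] + 0.5 * xs[1]

def rosenthal(xs):
    # f(x) = sum_e integral_0^phi_e c_e(u) du, with c1 = 1 and c2(u) = 2u.
    phi = edge_flows(xs)
    return phi[0] + phi[1] ** 2

xs = [np.array([0.9, 0.1]), np.array([0.2, 0.8])]
f0 = rosenthal(xs)
for t in range(1, 500):
    phi = edge_flows(xs)
    loss = np.array([1.0, 2.0 * phi[1]])  # path losses seen by both populations
    xs = [entropic_md_step(x, loss, eta=1.0 / np.sqrt(t)) for x in xs]
```

The aggregate flow on the congested edge approaches 1/2 (equalized costs), and the potential decreases toward its minimum.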

53 Convergence

Let d_τ = D_ψ(X*, x^(τ)).

Main ingredient
E[d_{τ+1} | F_{τ−1}] ≤ d_τ − η_τ (f(x^(τ)) − f*) + (η_τ²/2μ) E[‖ˆl^(τ)‖² | F_{τ−1}]

From here:
- Can show a.s. convergence x^(t) → X* if Σ η_t = ∞ and Σ η_t² < ∞.
- (d_τ) is an almost supermartingale [22], [6].
- Deterministic version: if d_{τ+1} ≤ d_τ − a_τ + b_τ with Σ b_τ < ∞, then (d_τ) converges.

[22] H. Robbins and D. Siegmund. A convergence theorem for non-negative almost supermartingales and some applications. Optimizing Methods in Statistics, 1971.
[6] Léon Bottou. Online algorithms and stochastic approximations.

57 Convergence rate

To show convergence of E[f(x̄^(t))] − f*, generalize the technique of Shamir and Zhang [25] (for SGD, α = 1/2).

Convergence of distributed stochastic mirror descent
For η_t^k = θ_k / t^(α_k), α_k ∈ (0, 1):
E[f(x̄^(t))] − f* = O(Σ_k log t / t^(min(α_k, 1−α_k)))
This holds for non-smooth, non-strongly convex f.

[25] Ohad Shamir and Tong Zhang. Stochastic gradient descent for non-smooth optimization: convergence results and optimal averaging schemes. In ICML, pages 71–79, 2013.
[14] Syrine Krichene, Walid Krichene, Roy Dong, and Alexandre Bayen. Convergence of heterogeneous distributed learning in stochastic routing games. In 53rd Allerton Conference on Communication, Control and Computing.

58 Summary

- Regret analysis: convergence of x̄^(t).
- Stochastic approximation: almost sure convergence of x^(t).
- Stochastic convex optimization: almost sure convergence, E[f(x̄^(t))] − f*, E[D_ψ(x*, x^(t))] → 0, and convergence rates.

59 Outline

1. Introduction
2. Convergence of agent dynamics
3. Routing Examples
4. Estimation and optimal control

60 Application to the routing game

Figure: Example network with strongly convex potential; centered Gaussian noise on the edges.
- Population 1: Hedge with η_t^1 = t^(−1)
- Population 2: Hedge with η_t^2 = t^(−1)

61 Routing game with strongly convex potential

Figure: Population distributions x^(t) and noisy path losses l_{A_k}(x^(t)) for population 1 (paths p0, p1, p2) and population 2 (paths p3, p4, p5).

62 Routing game with strongly convex potential

Figure: Distance to equilibrium E[D_KL(x*, x^(τ))] for η_t^1 = η_t^2 = t^(−1).

For η_t^k = θ_k / (ℓ_f t^(α_k)), α_k ∈ (0, 1], E[D_ψ(x*, x^(t))] = O(Σ_k t^(−α_k)).

63 Routing game with weakly convex potential

Figure: A weakly convex example.

64 Routing game with weakly convex potential

Figure: Potential values E[f(x̄^(τ))] − f* for learning rates η_t^1 = t^(−.3), η_t^2 = t^(−.4), and for η_t^1 = t^(−.5), η_t^2 = t^(−.5).

For η_t^k = θ_k / t^(α_k), α_k ∈ (0, 1), E[f(x̄^(t))] − f* = O(Σ_k log t / t^(min(α_k, 1−α_k))).

67 Outline

1. Introduction
2. Convergence of agent dynamics
3. Routing Examples
4. Estimation and optimal control

68 A routing experiment

Interface for the routing game, used to collect a sequence of decisions x^(t): each player i submits a decision x_i^(t) to the server and receives the history of losses (l_i^(t), l_i^(t−1), ...).

Figure: Interface for the routing game experiment.

70 Estimation of learning dynamics

We observe a sequence of player decisions (x^(t)) and losses (l^(t)). Can we fit a model of player dynamics?

Mirror descent model
Estimate the learning rate in the mirror descent model
x̂^(t+1)(η) = arg min_x ⟨l^(t), x⟩ + (1/η) D_KL(x, x^(t))
Then d(η) = D_KL(x^(t+1), x̂^(t+1)(η)) is a convex function of η; minimize it to estimate η_k^(t).

[17] Kiet Lam, Walid Krichene, and Alexandre M. Bayen. Estimation of learning dynamics in the routing game. In International Conference on Cyber-Physical Systems (ICCPS), in review.
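A sketch of this estimation (with the entropic model, x̂(η) is a Hedge update in closed form; d(η) = KL(x_next, x̂(η)) is then convex in η, so a ternary search suffices; the synthetic data with a known rate is illustrative):

```python
import numpy as np

def predicted(x, loss, eta):
    """Closed form of argmin_y <loss, y> + (1/eta) KL(y, x): a Hedge update."""
    w = x * np.exp(-eta * loss)
    return w / w.sum()

def kl(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def estimate_eta(x, loss, x_next, lo=0.0, hi=10.0, iters=200):
    """Minimize d(eta) = KL(x_next, predicted(x, loss, eta)) by ternary
    search, using the convexity of d."""
    d = lambda eta: kl(x_next, predicted(x, loss, eta))
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if d(m1) < d(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

# Sanity check on synthetic data generated with a known learning rate:
x = np.array([0.3, 0.7])
loss = np.array([0.2, 0.9])
x_next = predicted(x, loss, eta=1.5)
eta_hat = estimate_eta(x, loss, x_next)
```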

73 Preliminary results

Figure: Costs of each player ⟨x_k^(t), l_k^(t)⟩, normalized by the equilibrium cost ⟨x_k*, l_k*⟩, for players 1–5, showing an exploration phase followed by convergence.

74 Preliminary results

Figure: Potential function f(x^(t)) − f(x*), showing the exploration and convergence phases.

75 Preliminary results

Figure: Average KL divergence between predicted and actual distributions, as a function of the prediction horizon h, for several models of η_t (parameterized, previous value, moving average, linear regression).

76 Optimal routing with learning dynamics

Assumptions
- A central authority controls a fraction of the traffic: u^(t) ∈ α_1 Δ_{A_1} × ... × α_K Δ_{A_K}.
- The remaining traffic follows the learning dynamics: x^(t) ∈ (1 − α_1) Δ_{A_1} × ... × (1 − α_K) Δ_{A_K}.

Optimal routing under selfish learning constraints
minimize over u^(1:T), x^(1:T):  Σ_{t=1}^T J(x^(t), u^(t))
subject to x^(t+1) = u(x^(t) + u^(t), l(x^(t) + u^(t)))

77 Solution methods

Greedy method: approximate the problem with a sequence of convex problems,
minimize over u^(t):  J(u(x^(t−1), u^(t−1)), u^(t))

Mirror descent with the adjoint method: the constrained problem
minimize J(u, x) subject to H(x, u) = 0
is equivalent to the unconstrained problem minimize J(u, X(u)); then perform mirror descent on this function of u.

79 A simple example

Figure: Simple Pigou network used for the numerical experiment, with total demand F = 1 and edge costs c_e(φ_e) = 1 (top) and c_e(φ_e) = 2φ_e (bottom).
- Social optimum: (3/4, 1/4)
- Nash equilibrium: (1/2, 1/2)
- Control over a fraction α = 1/2 of the traffic.
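Both allocations on this Pigou network can be verified in a few lines (a sketch reproducing the slide's numbers; the grid search is just a numerical check on the closed-form optimum):

```python
import numpy as np

# Pigou network: top edge cost c1(phi) = 1, bottom edge cost c2(phi) = 2*phi,
# unit demand F = 1. Let x be the flow on the bottom edge.
def social_cost(x):
    return (1.0 - x) * 1.0 + x * (2.0 * x)

# Nash equilibrium: both used edges have equal cost, 2x = 1 => x = 1/2.
x_nash = 0.5
# Social optimum: d/dx [(1 - x) + 2x^2] = -1 + 4x = 0 => x = 1/4.
x_so = 0.25

# Numerical check of the social optimum on a grid.
grid = np.linspace(0.0, 1.0, 100001)
x_check = grid[np.argmin(social_cost(grid))]
```

The social optimum routes (3/4, 1/4) with cost 7/8, while the Nash equilibrium (1/2, 1/2) has cost 1.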

80 A simple example

Figure: Social cost J^(t) over time induced by the adjoint solution (left) and the greedy solution (right). The dashed line shows the socially optimal allocation.

81 A simple example

Figure: Adjoint-controlled flows (top) and selfish flows (bottom). The green lines correspond to the top path and the blue lines to the bottom path; the dashed lines show the socially optimal flows x_SO = (3/4, 1/4).

82 Application to the L.A. highway network

Simplified model of the L.A. highway network. The cost functions use the B.P.R. (Bureau of Public Roads) function, calibrated using the work of [28].

Figure: Los Angeles highway network.

[28] J. Thai, R. Hariss, and A. Bayen. A multi-convex approach to latency inference and control in traffic equilibria from sparse data. In American Control Conference (ACC), July 2015.

83 Application to the L.A. highway network

Figure: Average delay without control (dashed), with full control (solid), and for α ∈ {0.1, 0.3, 0.5, 0.7, 0.9}.

[27] Milena Suarez, Walid Krichene, and Alexandre Bayen. Optimal routing under hedge response. Transactions on Control of Networked Systems (TCNS), in preparation.

84 Summary

Figure: Coupled sequential decision problems — each agent k updates x_{A_k}^(t+1) = u(x_{A_k}^(t), l_{A_k}^(t)), while the environment and the other agents determine the outcome l_{A_k}(x_{A_1}^(t), ..., x_{A_K}^(t)).

- A simple model for distributed learning.
- Techniques for the design and analysis of learning dynamics.
- Estimation of learning dynamics.
- Optimal control.

85 Ongoing / future work

- Larger classes of games.
- Acceleration of learning dynamics (Nesterov's method): in continuous time, accelerated replicator dynamics; in discrete time, convergence in O(1/t²) instead of O(1/t); a heuristic to remove oscillations.
- Applications to load balancing.
- Other control schemes: tolling / incentivization.

Thank you! eecs.berkeley.edu/ walid/

87 References I

[1] Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta-algorithm and applications. Theory of Computing, 8(1), 2012.
[2] Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3).
[3] Michel Benaïm. Dynamics of stochastic approximation algorithms. In Séminaire de probabilités XXXIII. Springer.
[4] Avrim Blum, Eyal Even-Dar, and Katrina Ligett. Routing without regret: on convergence to Nash equilibria of regret-minimizing algorithms in routing games. In Proceedings of the Twenty-Fifth Annual ACM Symposium on Principles of Distributed Computing, PODC '06, pages 45–52, New York, NY, USA. ACM.
[5] Lawrence E. Blume. The statistical mechanics of strategic interaction. Games and Economic Behavior, 5(3).
[6] Léon Bottou. Online algorithms and stochastic approximations.
[7] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.

88 References II

[8] Simon Fischer and Berthold Vöcking. On the evolution of selfish routing. In Algorithms – ESA 2004. Springer, 2004.
[9] Yoav Freund and Robert E. Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29(1):79–103, 1999.
[10] James Hannan. Approximation to Bayes risk in repeated plays. Contributions to the Theory of Games, 3:97–139, 1957.
[11] Sergiu Hart and Andreu Mas-Colell. A general class of adaptive strategies. Journal of Economic Theory, 98(1):26–54, 2001.
[12] Jyrki Kivinen and Manfred K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1–63, 1997.
[13] Robert Kleinberg, Georgios Piliouras, and Eva Tardos. Multiplicative updates outperform generic no-regret learning in congestion games. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing. ACM.

89 References III
[14] Syrine Krichene, Walid Krichene, Roy Dong, and Alexandre Bayen. Convergence of heterogeneous distributed learning in stochastic routing games. In 53rd Allerton Conference on Communication, Control and Computing.
[15] Walid Krichene, Benjamin Drighès, and Alexandre Bayen. Learning Nash equilibria in congestion games. SIAM Journal on Control and Optimization (SICON), to appear.
[16] Walid Krichene, Syrine Krichene, and Alexandre Bayen. Convergence of mirror descent dynamics in the routing game. In European Control Conference (ECC).
[17] Kiet Lam, Walid Krichene, and Alexandre M. Bayen. Estimation of learning dynamics in the routing game. In International Conference on Cyber-Physical Systems (ICCPS), in review.
[18] Jason R. Marden and Jeff S. Shamma. Revisiting log-linear learning: asynchrony, completeness and payoff-based implementation. Games and Economic Behavior, 75(2).

90 References IV
[19] Jason R. Marden, Gürdal Arslan, and Jeff S. Shamma. Joint strategy fictitious play with inertia for potential games. IEEE Transactions on Automatic Control, 54(2).
[20] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19(4).
[21] A. S. Nemirovsky and D. B. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley-Interscience Series in Discrete Mathematics. Wiley.
[22] H. Robbins and D. Siegmund. A convergence theorem for non-negative almost supermartingales and some applications. Optimizing Methods in Statistics.
[23] William H. Sandholm. Potential games with continuous player sets. Journal of Economic Theory, 97(1):81–108, 2001.
[24] William H. Sandholm. Population Games and Evolutionary Dynamics. Economic Learning and Social Evolution. MIT Press, Cambridge, Mass.

91 References V
[25] Ohad Shamir and Tong Zhang. Stochastic gradient descent for non-smooth optimization: convergence results and optimal averaging schemes. In ICML, pages 71–79.
[26] Weijie Su, Stephen Boyd, and Emmanuel Candès. A differential equation for modeling Nesterov's accelerated gradient method: theory and insights. In NIPS.
[27] Milena Suarez, Walid Krichene, and Alexandre Bayen. Optimal routing under hedge response. Transactions on Control of Networked Systems (TCNS), in preparation.
[28] J. Thai, R. Hariss, and A. Bayen. A multi-convex approach to latency inference and control in traffic equilibria from sparse data. In American Control Conference (ACC), 2015, July 2015.
[29] Jörgen W. Weibull. Evolutionary Game Theory. MIT Press, 1997.

92 Continuous time model Back
Continuous-time learning model: $\dot{x}_{A_k}(t) = v_k\big(x(t),\, \ell_{A_k}(x(t))\big)$
Evolution in populations [24].
Convergence in potential games under dynamics which satisfy a positive correlation condition [23].
Replicator dynamics for the congestion game [8] and in evolutionary game theory [29].
No-regret dynamics for two-player games [11].
[24] William H. Sandholm. Population Games and Evolutionary Dynamics. MIT Press, Cambridge, Mass. [23] William H. Sandholm. Potential games with continuous player sets. Journal of Economic Theory, 97(1):81–108, 2001. [29] Jörgen W. Weibull. Evolutionary Game Theory. MIT Press, 1997. [8] Simon Fischer and Berthold Vöcking. On the evolution of selfish routing. In Algorithms — ESA 2004. Springer, 2004. [11] Sergiu Hart and Andreu Mas-Colell. A general class of adaptive strategies. Journal of Economic Theory, 98(1):26–54. 44/38

93 Oscillating example Back
Figure: Path losses $\hat{\ell}_p(x^{(\tau)})$ over time $\tau$ for populations 1 and 2, for paths $p_0 = (v_0, v_5, v_1)$, $p_1 = (v_0, v_4, v_5, v_1)$, $p_2 = (v_0, v_1)$. 45/38

94 Oscillating example Back
Figure: Potentials $f(x^{(\tau)})$ and $f^\star$ over time $t$. 46/38

95 Oscillating example Back
Figure: Trajectories in the simplex (paths $p_0, \dots, p_5$), from the initial points $x^{(0)}_{A_1}$, $x^{(0)}_{A_2}$ to the limits $\lim_{\tau} x^{(\tau)}_{A_1}$, $\lim_{\tau} x^{(\tau)}_{A_2}$. 47/38

96 Regret [10] Back
Cumulative regret of population $k$:
$R_k^{(t)} = \sup_{x_{A_k}} \sum_{\tau \le t} \langle x^{(\tau)} - x_{A_k},\, \ell_{A_k}(x^{(\tau)}) \rangle$
Convergence of averages: let $\bar{x}^{(t)} = \frac{1}{t} \sum_{\tau \le t} x^{(\tau)}$. By convexity of $f$, for all $x \in X$,
$f(\bar{x}^{(t)}) - f(x) \le \frac{1}{t} \sum_{\tau \le t} \big( f(x^{(\tau)}) - f(x) \big) \le \frac{1}{t} \sum_{\tau \le t} \langle \ell(x^{(\tau)}),\, x^{(\tau)} - x \rangle \le \sum_{k=1}^{K} \frac{R_k^{(t)}}{t}$
so if $\limsup_t R_k^{(t)}/t \le 0$ for each $k$, the average $\bar{x}^{(t)}$ converges to the set of minimizers of $f$.
[10] James Hannan. Approximation to Bayes risk in repeated plays. Contributions to the Theory of Games, 3:97–139. 48/38
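The chain of inequalities above is easy to sanity-check numerically. The quadratic potential, the play sequence, and all constants below are made up for illustration (a single population, with the comparator fixed at the minimizer $x^\star$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical potential f(x) = 0.5 * x @ C @ x on the simplex, gradient l(x) = C x.
C = np.diag([1.0, 2.0, 3.0])
f = lambda x: 0.5 * x @ C @ x
loss = lambda x: C @ x

# Minimizer over the simplex: c_i x_i constant across i, so x*_i is proportional to 1/c_i.
x_star = 1.0 / np.diag(C)
x_star /= x_star.sum()

# Arbitrary play sequence on the simplex (need not be no-regret).
xs = rng.dirichlet(np.ones(3), size=200)
x_bar = xs.mean(axis=0)

# Per-comparator average regret (1/t) * sum <l(x_tau), x_tau - x*>.
avg_regret = np.mean([loss(x) @ (x - x_star) for x in xs])
gap = f(x_bar) - f(x_star)
# Convexity gives: 0 <= gap <= (1/t) sum (f(x_tau) - f(x*)) <= avg_regret.
```

Any play sequence satisfies the inequality chain; the regret bound only becomes informative when the dynamics make `avg_regret` vanish.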

97 AREP convergence proof Back
The affine interpolation of $(x^{(t)})$ is an asymptotic pseudo-trajectory (APT) of the flow $\Phi$: each iterate $x^{(k)}$ is compared to the flow $\Phi_{t_k}(x^{(k)})$ started from it, and the discrepancy vanishes asymptotically.
The set of limit points of an APT is internally chain transitive (ICT).
If $\Gamma$ is compact invariant and has a Lyapunov function $f$ with $\operatorname{int} f(\Gamma) = \emptyset$, then every ICT set $L$ satisfies $L \subset \Gamma$, and $f$ is constant on $L$.
In particular, $f$ is constant on the limit set $L(x^{(t)})$, so $f(x^{(t)})$ converges. 49/38

98 Bregman Divergence Back
Bregman divergence of a strongly convex function $\psi$:
$D_\psi(x, y) = \psi(x) - \psi(y) - \langle \nabla\psi(y),\, x - y \rangle$
[2] Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3). 50/38

99 Bregman Divergence Back
Bregman divergence of a strongly convex function $\psi$:
$D_\psi(x, y) = \psi(x) - \psi(y) - \langle \nabla\psi(y),\, x - y \rangle$
Example [2]: when $X = \Delta^d$ and $\psi(x) = -H(x) = \sum_a x_a \ln x_a$,
$D_\psi(x, y) = D_{KL}(x, y) = \sum_a x_a \ln \frac{x_a}{y_a}$
The MD update then has the closed-form solution $x_a^{(t+1)} \propto x_a^{(t)} e^{-\eta_t g_a^{(t)}}$, a.k.a. the Hedge algorithm, or exponential weights.
Figure: KL divergence on the simplex (vertices $\delta_1, \delta_2, \delta_3$, reference point $q$).
[2] Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3). 50/38
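A minimal sketch of the closed-form entropic MD (Hedge) update; the loss vector and step size here are made up:

```python
import numpy as np

def hedge_step(x, g, eta):
    """Entropic mirror-descent step on the simplex:
    x_a^{t+1} proportional to x_a^t * exp(-eta * g_a)."""
    w = x * np.exp(-eta * g)
    return w / w.sum()

x0 = np.full(3, 1.0 / 3)
g = np.array([1.0, 0.5, 0.0])    # hypothetical observed path losses
x1 = hedge_step(x0, g, eta=1.0)
# Mass shifts toward low-loss paths (x1[2] > x1[1] > x1[0]) and x1 stays on the simplex.
```

Because the update multiplies coordinate-wise and renormalizes, positivity and the simplex constraint are maintained without any explicit projection step.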

100 A bounded entropic divergence Back
On $X = \Delta^d$, the divergence $D_{KL}(x, y) = \sum_{i=1}^d x_i \ln \frac{x_i}{y_i}$ is unbounded. 51/38

101 A bounded entropic divergence Back
On $X = \Delta^d$, the divergence $D_{KL}(x, y) = \sum_{i=1}^d x_i \ln \frac{x_i}{y_i}$ is unbounded.
Define $D_{KL}^\epsilon(x, y) = \sum_{i=1}^d (x_i + \epsilon) \ln \frac{x_i + \epsilon}{y_i + \epsilon}$.
Proposition
$D_{KL}^\epsilon$ is $\frac{1}{1 + d\epsilon}$-strongly convex w.r.t. $\|\cdot\|_1$.
$D_{KL}^\epsilon$ is bounded by $(1 + d\epsilon) \ln \frac{1 + \epsilon}{\epsilon}$. 51/38
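A quick numerical check of the boundedness claim; the dimension, $\epsilon$, and sampling scheme are chosen arbitrarily for illustration:

```python
import numpy as np

def dkl_eps(x, y, eps):
    """Smoothed entropic divergence
    D_KL^eps(x, y) = sum_i (x_i + eps) * ln((x_i + eps) / (y_i + eps))."""
    return np.sum((x + eps) * np.log((x + eps) / (y + eps)))

d, eps = 4, 0.1
bound = (1 + d * eps) * np.log((1 + eps) / eps)

rng = np.random.default_rng(1)
vals = np.array([
    dkl_eps(rng.dirichlet(np.ones(d)), rng.dirichlet(np.ones(d)), eps)
    for _ in range(1000)
])
# Every sampled value is nonnegative and stays below (1 + d*eps) * ln((1 + eps)/eps),
# whereas the plain KL divergence blows up as some y_i -> 0.
```

The shift by $\epsilon$ keeps every ratio $(x_i + \epsilon)/(y_i + \epsilon)$ in $[\epsilon/(1+\epsilon),\, (1+\epsilon)/\epsilon]$, which is exactly what yields the stated bound.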

102 Convergence of DMD Back
Theorem: Convergence of DMD [16]
Suppose $f$ has an $L$-Lipschitz gradient. Then under the MD class with $\eta_t \downarrow 0$ and $\sum_t \eta_t = \infty$,
$f(x^{(t)}) - f^\star = O\!\left( \frac{\sum_{\tau \le t} \eta_\tau}{t\, \eta_t} \right)$
and
$\frac{1}{t} \sum_{\tau \le t} f(x^{(\tau)}) - f^\star \le \sum_k \left[ \frac{L_k^2}{2 \ell_{\psi_k}} \frac{1}{t} \sum_{\tau \le t} \eta_\tau^k + \frac{D_k}{\eta_t^k\, t} \right]$
with $f(x^{(t)}) - f^\star \le \frac{1}{t} \sum_{\tau \le t} \big( f(x^{(\tau)}) - f^\star \big) + O\!\left(\frac{1}{t}\right)$. 52/38

103 Convergence in DSMD Back
Regret bound [14]
SMD method with step sizes $(\eta_t)$: for all $t_2 > t_1 \ge 0$ and any $\mathcal{F}_{t_1}$-measurable $x$,
$E\!\left[ \sum_{\tau=t_1}^{t_2} \langle g^{(\tau)},\, x^{(\tau)} - x \rangle \right] \le \frac{E\big[ D_\psi(x, x^{(t_1)}) \big]}{\eta_{t_1}} + D \left( \frac{1}{\eta_{t_2}} - \frac{1}{\eta_{t_1}} \right) + \frac{G^2}{2 \ell_\psi} \sum_{\tau=t_1}^{t_2} \eta_\tau$
Strongly convex case:
$E[D_\psi(x, x^{(t+1)})] \le (1 - 2 \ell_f \eta_t)\, E[D_\psi(x, x^{(t)})] + \frac{G^2}{2 \ell_\psi}\, \eta_t^2$ 53/38

104 Convergence in DSMD Back
Theorem [14]
Weakly convex case: distributed SMD with $\eta_t^p = \frac{\theta_p}{t^{\alpha_p}}$, $\alpha_p \in (0, 1)$. Then
$E\big[ f(x^{(t)}) \big] - f(x^\star) \le \sum_{k \in A} \left[ \frac{D_k}{\theta_k} \frac{1}{t^{1 - \alpha_k}} + \frac{\theta_k G^2}{2 \ell_\psi (1 - \alpha_k)} \frac{1}{t^{\alpha_k}} \right] = O\!\left( \frac{\log t}{t^{\min(\min_k \alpha_k,\, 1 - \max_k \alpha_k)}} \right)$
Proof idea: define $S_i = \frac{1}{t_{i+1} - t_i} \sum_{\tau = t_i}^{t_{i+1}} E[f(x^{(\tau)})]$ and show $S_i \le S_{i-1} + \frac{D}{\theta} \frac{1}{t_i^{1-\alpha}} + \frac{\theta G^2}{2 \ell_\psi (1 - \alpha)} \frac{1}{t_i^{\alpha}}$.
[14] Syrine Krichene, Walid Krichene, Roy Dong, and Alexandre Bayen. Convergence of heterogeneous distributed learning in stochastic routing games. In 53rd Allerton Conference on Communication, Control and Computing. 54/38

105 Stochastic mirror descent in machine learning Back
Large-scale learning:
minimize $\sum_{i=1}^N f_i(x)$ subject to $x \in X$
with $N$ very large, so the gradient is prohibitively expensive to compute exactly. Instead, compute $\hat{g}(x^{(t)}) = \sum_{i \in I} \nabla f_i(x^{(t)})$ with $I$ a random subset of $\{1, \dots, N\}$. 55/38
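A sketch of stochastic mirror descent with minibatch gradients, using the entropic mirror map on the simplex and a decaying schedule $\eta_t = \theta/\sqrt{t}$; the objective, batch size, and all constants are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 1000, 3

# Illustrative objective: f(x) = (1/N) sum_i 0.5 * ||x - b_i||^2 over the simplex.
B = rng.dirichlet(np.ones(d), size=N)
b_bar = B.mean(axis=0)                  # the minimizer (it already lies on the simplex)

def minibatch_grad(x, batch=32):
    I = rng.integers(0, N, size=batch)  # random subset of terms
    return x - B[I].mean(axis=0)        # (1/|I|) * sum_{i in I} grad f_i(x)

x = np.full(d, 1.0 / d)
for t in range(1, 3001):
    eta = 0.5 / np.sqrt(t)              # decaying step size eta_t = theta / sqrt(t)
    w = x * np.exp(-eta * minibatch_grad(x))
    x = w / w.sum()                     # entropic mirror step keeps x on the simplex
```

Each iteration touches only a small batch of the $N$ terms, yet the iterates drift toward the full-sum minimizer $\bar{b}$ as the step sizes decay.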


More information

Delay-Tolerant Online Convex Optimization: Unified Analysis and Adaptive-Gradient Algorithms

Delay-Tolerant Online Convex Optimization: Unified Analysis and Adaptive-Gradient Algorithms Delay-Tolerant Online Convex Optimization: Unified Analysis and Adaptive-Gradient Algorithms Pooria Joulani 1 András György 2 Csaba Szepesvári 1 1 Department of Computing Science, University of Alberta,

More information

Bandit models: a tutorial

Bandit models: a tutorial Gdt COS, December 3rd, 2015 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions) Bandit game: a each round t, an agent chooses

More information

Lower and Upper Bounds on the Generalization of Stochastic Exponentially Concave Optimization

Lower and Upper Bounds on the Generalization of Stochastic Exponentially Concave Optimization JMLR: Workshop and Conference Proceedings vol 40:1 16, 2015 28th Annual Conference on Learning Theory Lower and Upper Bounds on the Generalization of Stochastic Exponentially Concave Optimization Mehrdad

More information

A Greedy Framework for First-Order Optimization

A Greedy Framework for First-Order Optimization A Greedy Framework for First-Order Optimization Jacob Steinhardt Department of Computer Science Stanford University Stanford, CA 94305 jsteinhardt@cs.stanford.edu Jonathan Huggins Department of EECS Massachusetts

More information

Lecture: Adaptive Filtering

Lecture: Adaptive Filtering ECE 830 Spring 2013 Statistical Signal Processing instructors: K. Jamieson and R. Nowak Lecture: Adaptive Filtering Adaptive filters are commonly used for online filtering of signals. The goal is to estimate

More information

Stochastic gradient descent and robustness to ill-conditioning

Stochastic gradient descent and robustness to ill-conditioning Stochastic gradient descent and robustness to ill-conditioning Francis Bach INRIA - Ecole Normale Supérieure, Paris, France ÉCOLE NORMALE SUPÉRIEURE Joint work with Aymeric Dieuleveut, Nicolas Flammarion,

More information

Selecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden

Selecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden 1 Selecting Efficient Correlated Equilibria Through Distributed Learning Jason R. Marden Abstract A learning rule is completely uncoupled if each player s behavior is conditioned only on his own realized

More information

Convergence to Nash equilibrium in continuous games with noisy first-order feedback

Convergence to Nash equilibrium in continuous games with noisy first-order feedback Convergence to Nash equilibrium in continuous games with noisy first-order feedback Panayotis Mertikopoulos and Mathias Staudigl Abstract This paper examines the convergence of a broad class of distributed

More information

Agnostic Online learnability

Agnostic Online learnability Technical Report TTIC-TR-2008-2 October 2008 Agnostic Online learnability Shai Shalev-Shwartz Toyota Technological Institute Chicago shai@tti-c.org ABSTRACT We study a fundamental question. What classes

More information

A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization

A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization Panos Parpas Department of Computing Imperial College London www.doc.ic.ac.uk/ pp500 p.parpas@imperial.ac.uk jointly with D.V.

More information

Stochastic Gradient Descent with Only One Projection

Stochastic Gradient Descent with Only One Projection Stochastic Gradient Descent with Only One Projection Mehrdad Mahdavi, ianbao Yang, Rong Jin, Shenghuo Zhu, and Jinfeng Yi Dept. of Computer Science and Engineering, Michigan State University, MI, USA Machine

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 29, 2016 Outline Convex vs Nonconvex Functions Coordinate Descent Gradient Descent Newton s method Stochastic Gradient Descent Numerical Optimization

More information

Stochastic gradient methods for machine learning

Stochastic gradient methods for machine learning Stochastic gradient methods for machine learning Francis Bach INRIA - Ecole Normale Supérieure, Paris, France Joint work with Eric Moulines, Nicolas Le Roux and Mark Schmidt - January 2013 Context Machine

More information

Influencing Social Evolutionary Dynamics

Influencing Social Evolutionary Dynamics Influencing Social Evolutionary Dynamics Jeff S Shamma Georgia Institute of Technology MURI Kickoff February 13, 2013 Influence in social networks Motivating scenarios: Competing for customers Influencing

More information

Trade-Offs in Distributed Learning and Optimization

Trade-Offs in Distributed Learning and Optimization Trade-Offs in Distributed Learning and Optimization Ohad Shamir Weizmann Institute of Science Includes joint works with Yossi Arjevani, Nathan Srebro and Tong Zhang IHES Workshop March 2016 Distributed

More information

Block stochastic gradient update method

Block stochastic gradient update method Block stochastic gradient update method Yangyang Xu and Wotao Yin IMA, University of Minnesota Department of Mathematics, UCLA November 1, 2015 This work was done while in Rice University 1 / 26 Stochastic

More information

Games A game is a tuple = (I; (S i ;r i ) i2i) where ffl I is a set of players (i 2 I) ffl S i is a set of (pure) strategies (s i 2 S i ) Q ffl r i :

Games A game is a tuple = (I; (S i ;r i ) i2i) where ffl I is a set of players (i 2 I) ffl S i is a set of (pure) strategies (s i 2 S i ) Q ffl r i : On the Connection between No-Regret Learning, Fictitious Play, & Nash Equilibrium Amy Greenwald Brown University Gunes Ercal, David Gondek, Amir Jafari July, Games A game is a tuple = (I; (S i ;r i ) i2i)

More information

Gains in evolutionary dynamics. Dai ZUSAI

Gains in evolutionary dynamics. Dai ZUSAI Gains in evolutionary dynamics unifying rational framework for dynamic stability Dai ZUSAI Hitotsubashi University (Econ Dept.) Economic Theory Workshop October 27, 2016 1 Introduction Motivation and literature

More information

Irrational behavior in the Brown von Neumann Nash dynamics

Irrational behavior in the Brown von Neumann Nash dynamics Irrational behavior in the Brown von Neumann Nash dynamics Ulrich Berger a and Josef Hofbauer b a Vienna University of Economics and Business Administration, Department VW 5, Augasse 2-6, A-1090 Wien,

More information

Probabilistic Bisection Search for Stochastic Root Finding

Probabilistic Bisection Search for Stochastic Root Finding Probabilistic Bisection Search for Stochastic Root Finding Rolf Waeber Peter I. Frazier Shane G. Henderson Operations Research & Information Engineering Cornell University, Ithaca, NY Research supported

More information

Accelerated Proximal Gradient Methods for Convex Optimization

Accelerated Proximal Gradient Methods for Convex Optimization Accelerated Proximal Gradient Methods for Convex Optimization Paul Tseng Mathematics, University of Washington Seattle MOPTA, University of Guelph August 18, 2008 ACCELERATED PROXIMAL GRADIENT METHODS

More information

Optimal and Adaptive Online Learning

Optimal and Adaptive Online Learning Optimal and Adaptive Online Learning Haipeng Luo Advisor: Robert Schapire Computer Science Department Princeton University Examples of Online Learning (a) Spam detection 2 / 34 Examples of Online Learning

More information