Distributed Learning in Routing Games
Distributed Learning in Routing Games: Convergence, Estimation of Player Dynamics, and Control
Walid Krichene, Alexandre Bayen
Electrical Engineering and Computer Sciences, UC Berkeley
IPAM Workshop on Decision Support for Traffic, November 18
Learning dynamics in the routing game
Routing games model congestion on networks (e.g. communication networks). The Nash equilibrium quantifies the efficiency of the network in steady state. But the system does not operate at equilibrium: beyond equilibria, we need to understand decision dynamics (learning). A realistic model for decision dynamics is essential for prediction and optimal control.
Desiderata
Learning dynamics should be:
Realistic in terms of information requirements and computational complexity.
Consistent with the full-information Nash equilibrium: x^(t) → X*. Convergence rates?
Robust to stochastic perturbations, e.g. observation noise (bandit feedback).
Outline
1 Introduction
2 Convergence of agent dynamics
3 Routing Examples
4 Estimation and optimal control
Interaction of K decision makers
Decision maker k faces a sequential decision problem. At iteration t, the agent
(1) chooses a probability distribution x^(t) over its action set,
(2) discovers a loss function l^(t): A_k → [0, 1],
(3) updates the distribution: x^(t+1) = u(x^(t), l^(t)).
Figure: sequential decision problem. Agent k plays against the environment (the other agents); the outcome is l_{A_k}(x^(t)_{A_1}, ..., x^(t)_{A_K}).
The loss of agent k is affected by the strategies of the other agents. The agent does not know this function; it only observes its value. Write x^(t) = (x^(t)_{A_1}, ..., x^(t)_{A_K}).
Examples of decentralized decision makers
Routing game: each player drives from a source to a destination node and chooses a path from the available paths. The mass of players on each edge determines the cost on that edge.
Figure: routing game.
Online learning model
Online Learning Model
1: for t ∈ N do
2:   Play p ~ x^(t)
3:   Discover l^(t)
4:   Update x^(t+1) = u_k(x^(t), l^(t))
5: end for
Main problem: define a class of dynamics C such that, if u_k ∈ C for all k, then x^(t) → X*.
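The loop above can be sketched in code. This is a minimal illustration, not the talk's implementation: the congestion loss function and the Hedge update plugged in below are hypothetical choices, and any update rule u_k with the same signature can be substituted.

```python
import numpy as np

def run_learning(K, n_actions, loss_fn, updates, T, seed=0):
    """Simulate K decision makers playing the sequential decision problem.

    loss_fn : maps the joint profile x (K x n_actions array of distributions)
              to a K x n_actions array of losses in [0, 1]
    updates : list of K update rules u_k(x_k, loss_k, t) -> new distribution
    """
    rng = np.random.default_rng(seed)
    x = np.full((K, n_actions), 1.0 / n_actions)               # start uniform
    for t in range(1, T + 1):
        _ = [rng.choice(n_actions, p=x[k]) for k in range(K)]  # play p ~ x^(t)
        losses = loss_fn(x)                                    # discover l^(t)
        x = np.array([updates[k](x[k], losses[k], t)           # update x^(t+1)
                      for k in range(K)])
    return x

def hedge(x_k, loss_k, t):
    """One candidate update rule u_k (the Hedge algorithm, discussed later)."""
    w = x_k * np.exp(-loss_k / np.sqrt(t))
    return w / w.sum()

# Two players, two routes; the loss of a route is the average mass on it.
congestion = lambda x: np.tile(x.mean(axis=0), (x.shape[0], 1))
x_final = run_learning(K=2, n_actions=2, loss_fn=congestion,
                       updates=[hedge, hedge], T=500)
```

With this symmetric congestion model, the uniform profile is an equilibrium and the dynamics stay there; the point of the sketch is the interface between the loop and the update rule.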
A brief review
Discrete time: Hannan consistency [10]; Hedge algorithm for two-player games [9]; regret-based algorithms [11]; online learning in games [7]; potential games [19].
Continuous time:
Outline
1 Introduction
2 Convergence of agent dynamics
3 Routing Examples
4 Estimation and optimal control
Nash equilibria, and the Rosenthal potential
Write x = (x_{A_1}, ..., x_{A_K}) ∈ Δ_{A_1} × ... × Δ_{A_K} and l(x) = (l_{A_1}(x), ..., l_{A_K}(x)).
Nash equilibrium: x* is a Nash equilibrium if ⟨l(x*), x − x*⟩ ≥ 0 for all x, equivalently, for all k and all x_{A_k}, ⟨l_{A_k}(x*), x_{A_k} − x*_{A_k}⟩ ≥ 0.
In words, for all k, paths in the support of x*_{A_k} have minimal loss.
Figure: population mass distributions x^(t) and path losses l_{A_k}(x^(t)) for population 1, paths p0 = (v0, v5, v1), p1 = (v0, v4, v5, v1), p2 = (v0, v1).
Rosenthal potential: f convex such that ∇f(x) = l(x). Then the set of Nash equilibria is
X* = arg min_{x ∈ Δ_{A_1} × ... × Δ_{A_K}} f(x)
Nash condition ⟺ first-order optimality:
∀x, ⟨l(x*), x − x*⟩ ≥ 0 ⟺ ∀x, ⟨∇f(x*), x − x*⟩ ≥ 0, since ∇f(x*) = l(x*).
Regret analysis
Technique 1: regret analysis.
Cumulative regret: R_k^(t) = sup_{x_{A_k}} Σ_{τ ≤ t} ⟨x^(τ)_{A_k} − x_{A_k}, l_{A_k}(x^(τ))⟩. The regret is sublinear if lim sup_t R_k^(t)/t ≤ 0. This is an online optimality condition.
Convergence of averages: if R_k^(t) is sublinear for all k, then x̄^(t) → X*, where x̄^(t) = (1/t) Σ_{τ=1}^t x^(τ).
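As a sanity check of the definition, here is a small hypothetical single-agent run: Hedge against a fixed loss vector, where the sup over the simplex reduces to the best fixed action. The losses and rate schedule are made up for illustration.

```python
import numpy as np

def cumulative_regret(plays, losses):
    """R^(t) = sup_x sum_{tau<=t} <x^(tau) - x, l^(tau)>.

    Over the simplex the sup is attained at the best single fixed action,
    so the regret is (realized expected loss) - (best fixed action's loss).
    plays, losses : T x n_actions arrays of distributions and loss vectors.
    """
    realized = np.sum(plays * losses)          # sum_tau <x^(tau), l^(tau)>
    best_fixed = np.sum(losses, axis=0).min()  # min_a sum_tau l_a^(tau)
    return realized - best_fixed

# Hedge run against fixed losses: regret stays bounded, hence sublinear.
T = 200
losses = np.tile([0.2, 0.8], (T, 1))
x, plays = np.array([0.5, 0.5]), []
for t in range(1, T + 1):
    plays.append(x.copy())
    w = x * np.exp(-losses[t - 1] / np.sqrt(t))  # Hedge with eta_t = 1/sqrt(t)
    x = w / w.sum()
R = cumulative_regret(np.asarray(plays), losses)
```

Since the losses never change, Hedge quickly concentrates on the cheaper action and the average regret R/T is small.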
Convergence of x̄^(t) vs. convergence of x^(t)
Routing game example.
Figure: population distributions x^(τ)_{A_1} for populations 1 and 2, paths p0 = (v0, v5, v1), p1 = (v0, v4, v5, v1), p2 = (v0, v1).
Figure: path losses ˆl_p(x^(τ)) and averaged path losses l_{A_k}(x̄^(t)) for populations 1 and 2, same paths.
Stochastic approximation
Technique 2: stochastic approximation.
Idea: view the learning dynamics as a discretization of an ODE; study the convergence of the ODE; relate the convergence of the discrete algorithm to the convergence of the ODE.
Figure: underlying continuous time, 0, η_1, η_1 + η_2, ...
Example: the Hedge algorithm
Hedge algorithm: update the distribution according to the observed loss,
x_a^(t+1) ∝ x_a^(t) e^{−η_t^k l_a^(t)}
Also known as the exponentially weighted average forecaster [7], multiplicative weights update [1], exponentiated gradient descent [12], entropic descent [2], and log-linear learning [5], [18].
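A minimal numpy sketch of this update. The loss vector and the rate η_t = 1/√t below are illustrative assumptions, not values from the talk.

```python
import numpy as np

def hedge_update(x, loss, eta):
    """One Hedge step: multiply each weight by exp(-eta * loss_a), renormalize.

    x    : current distribution over actions (1-D array summing to 1)
    loss : observed loss vector, entries in [0, 1]
    eta  : learning rate eta_t > 0
    """
    w = x * np.exp(-eta * loss)
    return w / w.sum()

# Toy run: two actions, action 0 always cheaper; mass shifts toward it.
x = np.array([0.5, 0.5])
for t in range(1, 101):
    loss = np.array([0.2, 0.8])
    x = hedge_update(x, loss, eta=1.0 / np.sqrt(t))
```

After 100 rounds almost all mass sits on the cheaper action, since the log-ratio of the weights grows by η_t · 0.6 per round.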
The replicator ODE
In Hedge, x_p^(t+1) ∝ x_p^(t) e^{−η_t^k l_p^(t)}; take η_t → 0. Replicator equation [29]:
∀a, dx_a/dt = x_a (⟨l_{A_k}(x), x_{A_k}⟩ − l_a(x))   (1)
Theorem [8]: every solution of the ODE (1) converges to the set of its stationary points.
AREP dynamics: Approximate REPlicator
Discretization of the continuous-time replicator dynamics:
x_a^(t+1) − x_a^(t) = η_t x_a^(t) (⟨l_{A_k}(x^(t)), x^(t)_{A_k}⟩ − l_a(x^(t))) + η_t U_a^(t+1)
η_t: discretization time steps.
(U^(t))_{t ≥ 1}: perturbations that satisfy, for all T > 0,
lim_{τ1 → ∞} max_{τ2 : Σ_{t=τ1}^{τ2} η_t < T} ‖ Σ_{t=τ1}^{τ2} η_t U^(t+1) ‖ = 0
(a sufficient condition: there exists q ≥ 2 such that sup_τ E‖U^(τ)‖^q < ∞ and Σ_τ η_τ^{1+q/2} < ∞). [3]
Convergence to Nash equilibria
Theorem [15]: under AREP updates, if η_t → 0 and Σ_t η_t = ∞, then x^(t) → X*.
Proof idea: the affine interpolation of (x^(t)) is an asymptotic pseudo-trajectory of the replicator ODE (compare the iterates x^(k) with the flows Φ_{t_k}(x^(k))); use f as a Lyapunov function.
However: no convergence rates.
Stochastic convex optimization
Technique 3: (stochastic) convex optimization.
Idea: view the learning dynamics as a distributed algorithm to minimize f (more generally, a distributed algorithm to find a zero of a monotone operator). This allows us to analyze convergence rates.
Here: a class of distributed optimization methods, stochastic mirror descent.
Stochastic Mirror Descent
minimize f(x), a convex function, subject to x ∈ X ⊂ R^d, a convex compact set. [21], [20]
Algorithm: SMD method with learning rates (η_t)
1: for t ∈ N do
2:   observe ˆl^(t) with E[ˆl^(t) | F_{t−1}] = ∇f(x^(t))
3:   x^(t+1)_{A_k} = arg min_{x ∈ X_{A_k}} ⟨ˆl^(t)_{A_k}, x⟩ + (1/η_t^k) D_{ψ_k}(x, x^(t)_{A_k})
4: end for
η_t: learning rate. D_ψ: Bregman divergence.
Figure: at each step, the algorithm minimizes the linearization f(x^(t)) + ⟨l^(t), x − x^(t)⟩ plus the proximity term (1/η_t) D_ψ(x, x^(t)).
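A sketch of one SMD run, assuming ψ is the entropy on the simplex, so D_ψ is the KL divergence and the arg min step has a closed multiplicative form (this is the Hedge/entropic-descent connection noted earlier). The quadratic potential, the noise level, and the rate schedule are all hypothetical choices for the check.

```python
import numpy as np

def smd_kl_step(x, grad_est, eta):
    """One SMD step on the simplex with entropic psi: the arg min of
    <grad_est, x> + (1/eta) KL(x, x_t) is the multiplicative update."""
    w = x * np.exp(-eta * grad_est)
    return w / w.sum()

# Toy potential on the simplex: f(x) = sum_a c_a x_a^2 / 2, gradient c * x.
# This is the Rosenthal potential of parallel links with latencies l_a = c_a x_a;
# the minimizer equalizes c_a x_a, here x* = (2/3, 1/3).
rng = np.random.default_rng(0)
c = np.array([1.0, 2.0])
x = np.array([0.5, 0.5])
for t in range(1, 20001):
    grad_est = c * x + 0.05 * rng.standard_normal(2)  # noisy gradient oracle
    x = smd_kl_step(x, grad_est, eta=0.5 * t ** -0.8)
```

The rate t^{−0.8} satisfies Σ η_t = ∞ and Σ η_t² < ∞, matching the almost-sure convergence conditions on the next slide.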
Convergence
Let d_τ = D_ψ(X*, x^(τ)). Main ingredient:
E[d_{τ+1} | F_{τ−1}] ≤ d_τ − η_τ (f(x^(τ)) − f*) + (η_τ² / 2μ) E[‖ˆl^(τ)‖² | F_{τ−1}]
From here, can show a.s. convergence x^(t) → X* if Σ η_t = ∞ and Σ η_t² < ∞: (d_τ) is an almost supermartingale [22], [6].
Deterministic version: if d_{τ+1} ≤ d_τ − a_τ + b_τ and Σ b_τ < ∞, then (d_τ) converges.
To bound E[f(x^(t))] − f*, generalize the technique of Shamir and Zhang [25] (for SGD, α = 1/2).
Convergence of Distributed Stochastic Mirror Descent: for η_t^k = θ_k t^{−α_k}, α_k ∈ (0, 1),
E[f(x^(t))] − f* = O(Σ_k log t / t^{min(α_k, 1−α_k)})
Non-smooth, non-strongly convex. [14]
Summary
Regret analysis: convergence of x̄^(t).
Stochastic approximation: almost sure convergence of x^(t).
Stochastic convex optimization: almost sure convergence, E[f(x^(t))] − f* → 0, E[D_ψ(x*, x^(t))] → 0, convergence rates.
Outline
1 Introduction
2 Convergence of agent dynamics
3 Routing Examples
4 Estimation and optimal control
Application to the routing game
Figure: example network with strongly convex potential. Centered Gaussian noise on the edges. Population 1: Hedge with η_t^1 = t^{−1}. Population 2: Hedge with η_t^2 = t^{−1}.
Routing game with strongly convex potential
Figure: population mass distributions x^(t) and noisy path losses ˆl_p(x^(τ)), for population 1 (paths p0 = (v0, v4, v6, v1), p1 = (v0, v4, v5, v1), p2 = (v0, v1)) and population 2 (paths p3 = (v2, v3), p4 = (v2, v4, v5, v3), p5 = (v2, v4, v6, v3)).
Figure: distance to equilibrium E[D_KL(x*, x^(τ))], with η_t^1 = t^{−1} and η_t^2 = t^{−1}.
For η_t^k = θ_k t^{−α_k}, α_k ∈ (0, 1], E[D_ψ(x*, x^(t))] = O(Σ_k t^{−α_k})
Routing game with weakly convex potential
Figure: a weakly convex example.
Figure: potential values f(x^(τ)) − f*, for learning rates (η_t^1, η_t^2) = (t^{−.3}, t^{−.4}) and (t^{−.5}, …).
For η_t^k = θ_k t^{−α_k}, α_k ∈ (0, 1), E[f(x^(t))] − f* = O(Σ_k log t / t^{min(α_k, 1−α_k)})
Outline
1 Introduction
2 Convergence of agent dynamics
3 Routing Examples
4 Estimation and optimal control
A routing experiment
Interface for the routing game, used to collect the sequence of decisions x^(t). Each player k sends its decision x_k^(t) to the server and observes the loss history (l_k^(t), l_k^(t−1), ...).
Figure: interface for the routing game experiment.
Estimation of learning dynamics
We observe a sequence of player decisions (x^(t)) and losses (l^(t)). Can we fit a model of player dynamics? [17]
Mirror descent model: estimate the learning rate in the mirror descent model
x̂^(t+1)(η) = arg min_x ⟨l^(t), x⟩ + (1/η) D_KL(x, x^(t))
Then d(η) = D_KL(x^(t+1), x̂^(t+1)(η)) is a convex function of η; minimize it to estimate η_k^(t).
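Minimizing d(η) can be done with any scalar method; here is a self-contained sketch using ternary search on the convex objective. The distributions, losses, and true rate 0.7 are synthetic, made up to check that the rate is recovered.

```python
import numpy as np

def kl(p, q):
    """KL divergence between two distributions with full support."""
    return float(np.sum(p * np.log(p / q)))

def md_predict(x, loss, eta):
    """Mirror descent step with KL divergence: x_hat^(t+1)(eta) in closed form."""
    w = x * np.exp(-eta * loss)
    return w / w.sum()

def estimate_eta(x_t, loss_t, x_next, lo=0.0, hi=10.0, iters=200):
    """Estimate the learning rate by minimizing the convex function
    d(eta) = KL(x^(t+1), x_hat^(t+1)(eta)) with a ternary search."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if kl(x_next, md_predict(x_t, loss_t, m1)) < kl(x_next, md_predict(x_t, loss_t, m2)):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

# Synthetic check: generate one step with a known rate, then recover it.
x_t = np.array([0.3, 0.5, 0.2])
loss_t = np.array([0.9, 0.1, 0.4])
x_next = md_predict(x_t, loss_t, eta=0.7)
eta_hat = estimate_eta(x_t, loss_t, x_next)
```

Since the observed step was generated by the model itself, d(η) vanishes exactly at η = 0.7, and the search recovers it to high precision.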
Preliminary results
Figure: costs ⟨x_k^(t), l_k^(t)⟩ of each player, normalized by the equilibrium cost ⟨x_k*, l_k*⟩, showing an exploration phase followed by convergence (players 1–5).
Figure: potential function f(x^(t)) − f*, decreasing toward its minimum after the exploration phase.
Figure: average KL divergence between predicted and actual distributions, as a function of the prediction horizon h, for several learning-rate models (parameterized η_t, previous η_t, moving-average η_t, linear-regression η_t).
Optimal routing with learning dynamics
Assumptions: a central authority has control over a fraction of the traffic, u^(t) ∈ α_1 Δ_{A_1} × ... × α_K Δ_{A_K}; the remaining traffic follows the learning dynamics, x^(t) ∈ (1 − α_1) Δ_{A_1} × ... × (1 − α_K) Δ_{A_K}.
Optimal routing under selfish learning constraints:
minimize over u^(1:T), x^(1:T): Σ_{t=1}^T J(x^(t), u^(t))
subject to x^(t+1) = u(x^(t) + u^(t), l(x^(t) + u^(t)))
Solution methods
Greedy method: approximate the problem with a sequence of convex problems,
minimize over u^(t): J(u(x^(t−1), u^(t−1)), u^(t))
Mirror descent with the adjoint method: the adjoint method turns the constrained problem (minimize J(u, x) subject to H(x, u) = 0) into the equivalent problem: minimize J(u, X(u)). Then perform mirror descent on this function of u.
A simple example
Figure: simple Pigou network used for the numerical experiment. Total flow F = 1 from s to t; edge costs c_e(φ_e) = 1 (top edge) and c_e(φ_e) = 2φ_e (bottom edge).
Social optimum: (3/4, 1/4). Nash equilibrium: (1/2, 1/2). Control over a fraction α = 1/2 of the traffic.
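These two allocations can be checked numerically: a sketch minimizing the Rosenthal potential (for the Nash equilibrium) and the social cost (for the social optimum) over the bottom-edge flow φ by grid search.

```python
import numpy as np

# Pigou network: top edge cost 1, bottom edge cost 2*phi. Total flow F = 1.
phi = np.linspace(0.0, 1.0, 100001)        # flow on the bottom edge

# Rosenthal potential: sum over edges of the integral of the edge cost,
# (1 - phi)*1 + integral_0^phi 2s ds = (1 - phi) + phi**2.
potential = (1 - phi) + phi ** 2
phi_nash = phi[np.argmin(potential)]       # Nash: bottom flow 1/2

# Social cost: total travel time, (1 - phi)*1 + phi*(2*phi).
social = (1 - phi) + 2 * phi ** 2
phi_so = phi[np.argmin(social)]            # Social optimum: bottom flow 1/4
```

The minimizers match the slide: Nash routes 1/2 of the flow on each edge, while the social optimum keeps only 1/4 on the congestible bottom edge, i.e. the allocation (3/4, 1/4).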
Figure: social cost J^(t) over time induced by the adjoint solution (left) and the greedy solution (right). The dashed line shows the social optimal allocation.
Figure: adjoint-controlled flows u_t (top) and selfish flows x_t (bottom), with path losses l(x_t + u_t). The green lines correspond to the top path, and the blue lines to the bottom path. The dashed lines show the social optimal flows x_SO = (3/4, 1/4).
Application to the L.A. highway network
Simplified model of the L.A. highway network. Cost functions use the B.P.R. function, calibrated using the work of [28].
Figure: Los Angeles highway network.
Figure: average delay J(x^(i), u^(i)) without control (dashed), with full control (solid), and for different values of α ∈ {0.1, 0.3, 0.5, 0.7, 0.9}. [27]
Summary
Figure: coupled sequential decision problems. Agent k updates x^(t+1)_{A_k} = u(x^(t)_{A_k}, l^(t)_{A_k}); the environment (the other agents) produces the outcome l_{A_k}(x^(t)_{A_1}, ..., x^(t)_{A_K}).
Simple model for distributed learning. Techniques for design / analysis of learning dynamics. Estimation of learning dynamics. Optimal control.
Ongoing / future work
Larger classes of games.
Acceleration of learning dynamics (Nesterov's method): in continuous time, accelerated replicator dynamics; in discrete time, convergence in O(1/t²) instead of O(1/t); heuristic to remove oscillations.
Applications to load balancing.
Other control schemes: tolling / incentivization.
Thank you! eecs.berkeley.edu/~walid/
87 References I [1] Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta-algorithm and applications. Theory of Computing, 8(1): , [2] Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett., 31(3): , May [3] Michel Benaïm. Dynamics of stochastic approximation algorithms. In Séminaire de probabilités XXXIII, pages Springer, [4] Avrim Blum, Eyal Even-Dar, and Katrina Ligett. Routing without regret: on convergence to nash equilibria of regret-minimizing algorithms in routing games. In Proceedings of the twenty-fifth annual ACM symposium on Principles of distributed computing, PODC 06, pages 45 52, New York, NY, USA, ACM. [5] Lawrence E. Blume. The statistical mechanics of strategic interaction. Games and Economic Behavior, 5(3): , ISSN [6] Léon Bottou. Online algorithms and stochastic approximations [7] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, learning, and games. Cambridge University Press, /38
88 References II [8] Simon Fischer and Berthold Vöcking. On the evolution of selfish routing. In Algorithms ESA 2004, pages Springer, [9] Yoav Freund and Robert E Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29(1):79 103, [10] James Hannan. Approximation to Bayes risk in repeated plays. Contributions to the Theory of Games, 3:97 139, [11] Sergiu Hart and Andreu Mas-Colell. A general class of adaptive strategies. Journal of Economic Theory, 98(1):26 54, [12] Jyrki Kivinen and Manfred K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132 (1):1 63, [13] Robert Kleinberg, Georgios Piliouras, and Eva Tardos. Multiplicative updates outperform generic no-regret learning in congestion games. In Proceedings of the 41st annual ACM symposium on Theory of computing, pages ACM, /38
89 References III
[14] Syrine Krichene, Walid Krichene, Roy Dong, and Alexandre Bayen. Convergence of heterogeneous distributed learning in stochastic routing games. In 53rd Allerton Conference on Communication, Control and Computing.
[15] Walid Krichene, Benjamin Drighès, and Alexandre Bayen. Learning Nash equilibria in congestion games. SIAM Journal on Control and Optimization (SICON), to appear.
[16] Walid Krichene, Syrine Krichene, and Alexandre Bayen. Convergence of mirror descent dynamics in the routing game. In European Control Conference (ECC).
[17] Kiet Lam, Walid Krichene, and Alexandre M. Bayen. Estimation of learning dynamics in the routing game. In International Conference on Cyber-Physical Systems (ICCPS), in review.
[18] Jason R. Marden and Jeff S. Shamma. Revisiting log-linear learning: Asynchrony, completeness and payoff-based implementation. Games and Economic Behavior, 75(2).
90 References IV
[19] Jason R. Marden, Gürdal Arslan, and Jeff S. Shamma. Joint strategy fictitious play with inertia for potential games. IEEE Transactions on Automatic Control, 54(2).
[20] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19(4).
[21] A. S. Nemirovsky and D. B. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley-Interscience Series in Discrete Mathematics. Wiley.
[22] H. Robbins and D. Siegmund. A convergence theorem for non negative almost supermartingales and some applications. Optimizing Methods in Statistics.
[23] William H. Sandholm. Potential games with continuous player sets. Journal of Economic Theory, 97(1):81–108, 2001.
[24] William H. Sandholm. Population Games and Evolutionary Dynamics. Economic Learning and Social Evolution. MIT Press, Cambridge, Mass.
91 References V
[25] Ohad Shamir and Tong Zhang. Stochastic gradient descent for non-smooth optimization: Convergence results and optimal averaging schemes. In ICML, pages 71–79.
[26] Weijie Su, Stephen Boyd, and Emmanuel Candes. A differential equation for modeling Nesterov's accelerated gradient method: Theory and insights. In NIPS.
[27] Milena Suarez, Walid Krichene, and Alexandre Bayen. Optimal routing under hedge response. Transactions on Control of Networked Systems (TCNS), in preparation.
[28] J. Thai, R. Hariss, and A. Bayen. A multi-convex approach to latency inference and control in traffic equilibria from sparse data. In American Control Conference (ACC), July 2015.
[29] Jörgen W. Weibull. Evolutionary Game Theory. MIT Press, 1997.
92 Continuous time model Back
Continuous-time learning model:
ẋ_{A_k}(t) = v_k( x(t), l_{A_k}(x(t)) )
- Evolution in populations [24].
- Convergence in potential games under dynamics which satisfy a positive correlation condition [23].
- Replicator dynamics for the congestion game [8] and in evolutionary game theory [29].
- No-regret dynamics for two-player games [11].
[24] William H. Sandholm. Population Games and Evolutionary Dynamics. MIT Press. [23] William H. Sandholm. Potential games with continuous player sets. Journal of Economic Theory, 97(1):81–108, 2001. [29] Jörgen W. Weibull. Evolutionary Game Theory. MIT Press, 1997. [8] Simon Fischer and Berthold Vöcking. On the evolution of selfish routing. In Algorithms – ESA 2004. Springer, 2004. [11] Sergiu Hart and Andreu Mas-Colell. A general class of adaptive strategies. Journal of Economic Theory, 98(1):26–54. 44/38
93 Oscillating example Back
Figure: Path losses. Four panels plot the path losses l̂_p(x^(τ)) against τ for populations 1 and 2, along paths p0 = (v0, v5, v1), p1 = (v0, v4, v5, v1), and p2 = (v0, v1). 45/38
94 Oscillating example Back
Figure: Potentials. The potential f(x^(τ)) plotted against t, with the optimal value f* shown for reference. 46/38
95 Oscillating example Back
Figure: Trajectories in the simplex, with vertices p0 through p5: initial points x^(0)_{A_1}, x^(0)_{A_2} and limits lim_{τ→∞} x^(τ)_{A_1}, lim_{τ→∞} x^(τ)_{A_2}. 47/38
96 Regret [10] Back
Cumulative regret of population k:
R_k^(t) = sup_{x_{A_k}} Σ_{τ≤t} ⟨ x^(τ)_{A_k} − x_{A_k}, l_{A_k}(x^(τ)) ⟩
The dynamics have sublinear regret if, for all k, lim sup_t R_k^(t)/t ≤ 0.
Convergence of averages: let x̄^(t) = (1/t) Σ_{τ≤t} x^(τ) ∈ X. By convexity of f, for all x ∈ X,
f(x̄^(t)) − f(x) ≤ (1/t) Σ_{τ≤t} ( f(x^(τ)) − f(x) ) ≤ (1/t) Σ_{τ≤t} ⟨ l(x^(τ)), x^(τ) − x ⟩ ≤ Σ_{k=1}^K R_k^(t)/t
so sublinear regret for every population implies f(x̄^(t)) → f*.
[10] James Hannan. Approximation to Bayes risk in repeated plays. Contributions to the Theory of Games, 3:97–139. 48/38
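The regret defined above can be measured directly on a recorded play sequence. The helper below is a hypothetical illustration for a single population: since the regret objective is linear in x, the supremum over the simplex is attained at a vertex, i.e. at the best fixed path in hindsight.

```python
def cumulative_regret(plays, losses):
    # plays[t]: distribution over d paths at round t; losses[t]: loss vector.
    d = len(plays[0])
    incurred = sum(sum(p[a] * l[a] for a in range(d))
                   for p, l in zip(plays, losses))
    # sup over the simplex of a linear function is attained at a vertex,
    # i.e. at the best fixed path in hindsight.
    best_fixed = min(sum(l[a] for l in losses) for a in range(d))
    return incurred - best_fixed

plays = [[0.5, 0.5]] * 4
losses = [[1.0, 0.0]] * 4
# incurred loss 2.0 vs. best fixed path 0.0: regret 2.0
```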
97 AREP convergence proof Back
- The affine interpolation of x^(t) is an asymptotic pseudo-trajectory (APT).
(Diagram: iterates x^(0), x^(1), ..., x^(k) compared against the semiflow, Φ_{t_0}(x^(0)), ..., Φ_{t_{k−1}}(x^(k−1)).)
- The set of limit points of an APT is internally chain transitive (ICT).
- If Γ is compact invariant and admits a Lyapunov function f with int f(Γ) = ∅, then every ICT set L satisfies L ⊆ Γ, and f is constant on L.
- In particular, f is constant on the limit set L(x^(t)), so f(x^(t)) converges. 49/38
98 Bregman Divergence Back
Bregman divergence of a strongly convex function ψ:
D_ψ(x, y) = ψ(x) − ψ(y) − ⟨∇ψ(y), x − y⟩
[2] Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3). 50/38
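As a quick sanity check (an illustration, not from the slides), the definition can be coded generically; with ψ(x) = ‖x‖², the Bregman divergence reduces to the squared Euclidean distance.

```python
def bregman(psi, grad_psi, x, y):
    # D_psi(x, y) = psi(x) - psi(y) - <grad psi(y), x - y>
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_psi(y), x, y))
    return psi(x) - psi(y) - inner

def sq_norm(x):
    return sum(xi * xi for xi in x)

def grad_sq_norm(x):
    return [2 * xi for xi in x]

# With psi = ||.||^2 the divergence is ||x - y||^2:
dist = bregman(sq_norm, grad_sq_norm, [1.0, 2.0], [0.0, 0.0])  # = 5.0
```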
99 Bregman Divergence Back
Bregman divergence of a strongly convex function ψ:
D_ψ(x, y) = ψ(x) − ψ(y) − ⟨∇ψ(y), x − y⟩
Example [2]: when X = Δ and ψ(x) = −H(x) = Σ_a x_a ln x_a (the negative entropy),
D_ψ(x, y) = D_KL(x, y) = Σ_a x_a ln(x_a / y_a)
The MD update has the closed-form solution x_a^(t+1) ∝ x_a^(t) e^{−η_t g_a^(t)}.
A.k.a. the Hedge algorithm, or exponential weights.
Figure: KL divergence (level sets over the simplex with vertices δ_1, δ_2, δ_3 and a point q).
[2] Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3). 50/38
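The closed-form update above is a one-liner in code. This sketch (with made-up losses, purely illustrative) implements one Hedge / exponential-weights step.

```python
import math

def hedge_step(x, g, eta):
    # Entropic mirror-descent update: x_a proportional to x_a * exp(-eta * g_a),
    # renormalized onto the simplex.
    w = [xa * math.exp(-eta * ga) for xa, ga in zip(x, g)]
    z = sum(w)
    return [wa / z for wa in w]

x = hedge_step([1/3, 1/3, 1/3], g=[1.0, 0.5, 0.0], eta=1.0)
# Mass shifts toward the low-loss action: x[2] > x[1] > x[0].
```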
100 A bounded entropic divergence Back
On X = Δ, D_KL(x, y) = Σ_{i=1}^d x_i ln(x_i / y_i) is unbounded. 51/38
101 A bounded entropic divergence Back
On X = Δ, D_KL(x, y) = Σ_{i=1}^d x_i ln(x_i / y_i) is unbounded.
Define D_KL^ε(x, y) = Σ_{i=1}^d (x_i + ε) ln( (x_i + ε) / (y_i + ε) ).
Proposition:
- D_KL^ε is 1/(1 + dε)-strongly convex w.r.t. ‖·‖_1.
- D_KL^ε is bounded by (1 + dε) ln((1 + ε)/ε). 51/38
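A quick numerical check of the proposition (illustrative; the choice d = 3, ε = 0.1 is arbitrary): the shifted divergence stays finite and under the stated bound even on a disjoint-support pair, where the unshifted KL divergence is infinite.

```python
import math

def kl_eps(x, y, eps):
    # Shifted entropic divergence: sum_i (x_i + eps) * ln((x_i + eps) / (y_i + eps))
    return sum((xi + eps) * math.log((xi + eps) / (yi + eps))
               for xi, yi in zip(x, y))

d, eps = 3, 0.1
bound = (1 + d * eps) * math.log((1 + eps) / eps)
# Disjoint-support pair: the unshifted KL divergence would be infinite here.
div = kl_eps([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], eps)
```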
102 Convergence of DMD Back
Theorem: Convergence of DMD [16]. Suppose f has an L-Lipschitz gradient. Then under the MD class with η_t → 0 and Σ_t η_t = ∞,
f(x^(t)) − f* = O( (Σ_{τ≤t} η_τ) / (t η_t) )
and
(1/t) Σ_{τ≤t} f(x^(τ)) − f* ≤ (1/t) Σ_k [ (L_k² / (2 l_{ψ_k})) Σ_{τ≤t} η^k_τ + D_k / η^k_t ]
f(x^(t)) − f* ≤ (1/t) Σ_{τ≤t} ( f(x^(τ)) − f* ) + O(1/t) 52/38
103 Convergence in DSMD Back
Regret bound [14]. For the SMD method with rates (η_t): for all t_2 > t_1 ≥ 0 and F_{t_1}-measurable x,
E[ Σ_{τ=t_1}^{t_2} ⟨ g^(τ), x^(τ) − x ⟩ ] ≤ E[ D_ψ(x, x^(t_1)) ] / η_{t_1} + D (1/η_{t_2} − 1/η_{t_1}) + (G² / (2 l_ψ)) Σ_{τ=t_1}^{t_2} η_τ
Strongly convex case:
E[ D_ψ(x*, x^(t+1)) ] ≤ (1 − 2 l_f η_t) E[ D_ψ(x*, x^(t)) ] + (G² / (2 l_ψ)) η_t² 53/38
104 Convergence in DSMD Back
Theorem [14]. Weakly convex case: distributed SMD such that η^p_t = θ_p / t^{α_p} with α_p ∈ (0, 1). Then
E[ f(x^(t)) ] − f(x*) ≤ Σ_{k∈A} [ (D_k/θ_k) (1 + t^{α_k}) / t + (θ_k G² / (2 l_ψ (1 − α_k))) (1 + t^{1−α_k}) / t ] = O( log t / t^{min(min_k α_k, 1 − max_k α_k)} )
Proof idea: define S_i = (1/(t_{i+1} − t_i)) Σ_{τ=t_i}^{t_{i+1}} E[ f(x^(τ)) ] and show S_{i+1} ≤ S_i + [ (D/θ) t_i^{α−1} + (θ G² / (2 l_ψ (1 − α))) t_i^{−α} ] (1/i).
[14] Syrine Krichene, Walid Krichene, Roy Dong, and Alexandre Bayen. Convergence of heterogeneous distributed learning in stochastic routing games. In 53rd Allerton Conference on Communication, Control and Computing. 54/38
105 Stochastic mirror descent in machine learning Back
Large-scale learning:
minimize Σ_{i=1}^N f_i(x) subject to x ∈ X
with N very large. The gradient is prohibitively expensive to compute exactly. Instead, compute ĝ(x^(t)) = Σ_{i∈I} ∇f_i(x^(t)) with I a random subset of {1, ..., N}. 55/38
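This finite-sum setting can be sketched in a few lines (an illustration with assumed data: scalar quadratic terms f_i(x) = (x − c_i)², mini-batches of size 10, and step sizes η_t = 0.5/t, all choices made for the example).

```python
import random

# Minimal sketch: minimize the average of f_i(x) = (x - c_i)^2 over scalars,
# sampling a random subset I of terms per step. Data and step sizes are assumptions.
random.seed(0)
N = 1000
c = [random.gauss(1.5, 1.0) for _ in range(N)]  # assumed data

x = 0.0
for t in range(1, 5001):
    batch = random.sample(range(N), 10)               # random subset I
    g_hat = sum(2 * (x - c[i]) for i in batch) / 10   # sampled gradient estimate
    x -= (0.5 / t) * g_hat                            # decreasing step size eta_t
# x approaches the minimizer of the average, i.e. the empirical mean of the c_i.
```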
More informationGains in evolutionary dynamics. Dai ZUSAI
Gains in evolutionary dynamics unifying rational framework for dynamic stability Dai ZUSAI Hitotsubashi University (Econ Dept.) Economic Theory Workshop October 27, 2016 1 Introduction Motivation and literature
More informationIrrational behavior in the Brown von Neumann Nash dynamics
Irrational behavior in the Brown von Neumann Nash dynamics Ulrich Berger a and Josef Hofbauer b a Vienna University of Economics and Business Administration, Department VW 5, Augasse 2-6, A-1090 Wien,
More informationProbabilistic Bisection Search for Stochastic Root Finding
Probabilistic Bisection Search for Stochastic Root Finding Rolf Waeber Peter I. Frazier Shane G. Henderson Operations Research & Information Engineering Cornell University, Ithaca, NY Research supported
More informationAccelerated Proximal Gradient Methods for Convex Optimization
Accelerated Proximal Gradient Methods for Convex Optimization Paul Tseng Mathematics, University of Washington Seattle MOPTA, University of Guelph August 18, 2008 ACCELERATED PROXIMAL GRADIENT METHODS
More informationOptimal and Adaptive Online Learning
Optimal and Adaptive Online Learning Haipeng Luo Advisor: Robert Schapire Computer Science Department Princeton University Examples of Online Learning (a) Spam detection 2 / 34 Examples of Online Learning
More information