Online Learning and Online Convex Optimization

Online Learning and Online Convex Optimization
Nicolò Cesa-Bianchi, Università degli Studi di Milano

Summary
1. My beautiful regret
2. A supposedly fun game I'll play again
3. The joy of convex

Machine learning
Classification/regression tasks: predictive models $h$ mapping data instances $X$ to labels $Y$ (e.g., a binary classifier).
Training data $S_T = \big((X_1,Y_1),\dots,(X_T,Y_T)\big)$ (e.g., messages annotated as spam vs. nonspam).
A learning algorithm $A$ (e.g., a Support Vector Machine) maps the training data $S_T$ to a model $h = A(S_T)$.
Evaluate the risk of the trained model $h$ with respect to a given loss function.

Two notions of risk
View data as a statistical sample: statistical risk
$$\mathbb{E}\big[\ell\big(A(S_T),\,(X,Y)\big)\big]$$
where the training set $S_T = \big((X_1,Y_1),\dots,(X_T,Y_T)\big)$ and the test example $(X,Y)$ are drawn i.i.d. from the same unknown and fixed distribution.
View data as an arbitrary sequence: sequential risk
$$\sum_{t=1}^{T} \ell\big(A(S_{t-1}),\,(X_t,Y_t)\big)$$
where the sequence of models is trained on growing prefixes $S_{t-1} = \big((X_1,Y_1),\dots,(X_{t-1},Y_{t-1})\big)$ of the data sequence.

Regrets, I had a few
A learning algorithm $A$ maps datasets to models in a given class $\mathcal{H}$.
Variance error in statistical learning:
$$\mathbb{E}\big[\ell\big(A(S_T),(X,Y)\big)\big] - \inf_{h \in \mathcal{H}} \mathbb{E}\big[\ell\big(h,(X,Y)\big)\big]$$
compare to the expected loss of the best model in the class.
Regret in online learning:
$$\sum_{t=1}^{T} \ell\big(A(S_{t-1}),(X_t,Y_t)\big) - \inf_{h \in \mathcal{H}} \sum_{t=1}^{T} \ell\big(h,(X_t,Y_t)\big)$$
compare to the cumulative loss of the best model in the class.

Incremental model update
A natural blueprint for online learning algorithms. For $t = 1, 2, \dots$
1. Apply the current model $h_{t-1}$ to the next data element $(X_t, Y_t)$
2. Update the current model: $h_{t-1} \to h_t \in \mathcal{H}$ (local optimization)
Goal: control the regret
$$\sum_{t=1}^{T} \ell\big(h_{t-1},(X_t,Y_t)\big) - \inf_{h \in \mathcal{H}} \sum_{t=1}^{T} \ell\big(h,(X_t,Y_t)\big)$$
View this as a repeated game between a player generating predictors $h_t \in \mathcal{H}$ and an opponent generating data $(X_t, Y_t)$.

Summary
1. My beautiful regret
2. A supposedly fun game I'll play again
3. The joy of convex

Theory of repeated games
James Hannan and David Blackwell, Learning to play a game (1956): play a game repeatedly against a possibly suboptimal opponent.

Zero-sum 2-person games played more than once
An $N \times M$ known loss matrix with entries $\ell(i,y)$: the row player (player) has $N$ actions, the column player (opponent) has $M$ actions.
For each game round $t = 1, 2, \dots$
- Player chooses action $i_t$ and opponent chooses action $y_t$
- The player suffers loss $\ell(i_t, y_t)$ (= gain of the opponent)
The player can learn from the opponent's history of past choices $y_1,\dots,y_{t-1}$.

Prediction with expert advice (Volodya Vovk, Manfred Warmuth)
The loss table assigns each action $i \in \{1,\dots,N\}$ a loss $\ell_t(i)$ at each round $t = 1, 2, \dots$ The opponent's moves $y_1, y_2, \dots$ define a sequential prediction problem with a time-varying loss function $\ell(i_t, y_t) = \ell_t(i_t)$.

Playing the experts game
A sequential decision problem with $N$ actions and an unknown deterministic assignment of losses to actions: $\ell_t = \big(\ell_t(1),\dots,\ell_t(N)\big) \in [0,1]^N$ for $t = 1, 2, \dots$
For $t = 1, 2, \dots$
1. Player picks an action $I_t$ (possibly using randomization) and incurs loss $\ell_t(I_t)$
2. Player gets full feedback information: $\ell_t(1),\dots,\ell_t(N)$

Regret analysis
$$R_T \stackrel{\mathrm{def}}{=} \mathbb{E}\left[\sum_{t=1}^{T} \ell_t(I_t)\right] - \min_{i=1,\dots,N} \sum_{t=1}^{T} \ell_t(i) \qquad \text{want } R_T = o(T)$$
Lower bound using random losses [Experts paper, 1997]: let $\ell_t(i) = L_t(i) \in \{0,1\}$ be independent random coin flips. For any player strategy, $\mathbb{E}\big[\sum_{t=1}^{T} L_t(I_t)\big] = \frac{T}{2}$. The expected regret is then
$$\mathbb{E}\left[\max_{i=1,\dots,N} \sum_{t=1}^{T}\left(\tfrac{1}{2} - L_t(i)\right)\right] = \big(1 - o(1)\big)\sqrt{\frac{T \ln N}{2}} \qquad \text{for } N, T \to \infty$$

Exponentially weighted forecaster (Hedge)
At time $t$ pick action $I_t = i$ with probability proportional to
$$\exp\left(-\eta \sum_{s=1}^{t-1} \ell_s(i)\right)$$
where the sum in the exponent is the total loss of action $i$ up to now.
Regret bound [Experts paper, 1997]: if $\eta = \sqrt{8(\ln N)/T}$ then $R_T \le \sqrt{(T/2)\ln N}$, matching the lower bound including constants. The dynamic choice $\eta_t = \sqrt{8(\ln N)/t}$ only loses small constants.
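
As a concrete illustration, here is a minimal NumPy sketch of the Hedge update above; the toy environment (an i.i.d. loss matrix) and the tracking of the expected loss instead of a random draw are my choices, not part of the slides.

```python
import numpy as np

def hedge(losses, eta):
    """Exponentially weighted forecaster on a T x N array of losses in [0,1].
    Returns the expected cumulative loss and the regret vs. the best action."""
    T, N = losses.shape
    cum_loss = np.zeros(N)               # total loss of each action so far
    expected_loss = 0.0
    for t in range(T):
        w = np.exp(-eta * (cum_loss - cum_loss.min()))  # shift for stability
        p = w / w.sum()                  # P(I_t = i) proportional to exp(-eta * cum loss)
        expected_loss += p @ losses[t]
        cum_loss += losses[t]
    return expected_loss, expected_loss - cum_loss.min()

rng = np.random.default_rng(0)
T, N = 10_000, 10
losses = rng.random((T, N))              # toy i.i.d. losses, illustration only
eta = np.sqrt(8 * np.log(N) / T)         # tuning from the regret bound
print(hedge(losses, eta))                # regret should be O(sqrt(T ln N))
```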

The nonstochastic bandit problem
For $t = 1, 2, \dots$
1. Player picks an action $I_t$ (possibly using randomization) and incurs loss $\ell_t(I_t)$
2. Player gets partial information: only $\ell_t(I_t)$ is revealed
The player is still competing against the best offline action:
$$R_T = \mathbb{E}\left[\sum_{t=1}^{T} \ell_t(I_t)\right] - \min_{i=1,\dots,N} \sum_{t=1}^{T} \ell_t(i)$$

The Exp3 algorithm [Auer et al., 2002]
Hedge with estimated losses:
$$P_t(I_t = i) \propto \exp\left(-\eta \sum_{s=1}^{t-1} \widehat{\ell}_s(i)\right) \qquad \widehat{\ell}_t(i) = \begin{cases} \dfrac{\ell_t(i)}{P_t\big(\ell_t(i)\ \text{observed}\big)} & \text{if } I_t = i \\ 0 & \text{otherwise} \end{cases} \qquad i = 1,\dots,N$$
In the bandit case $P_t\big(\ell_t(i)\ \text{observed}\big) = P_t(I_t = i)$, and only one component of $\widehat{\ell}_t$ is non-zero.
Properties of the importance weighting estimator:
$$\mathbb{E}_t\big[\widehat{\ell}_t(i)\big] = \ell_t(i) \ \ \text{(unbiasedness)} \qquad \mathbb{E}_t\big[\widehat{\ell}_t(i)^2\big] \le \frac{1}{P_t\big(\ell_t(i)\ \text{observed}\big)} \ \ \text{(variance control)}$$
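
A minimal sketch of Exp3 under the same toy setup as before; again, the environment and the fixed $\eta$ are illustrative assumptions rather than part of the original algorithm statement.

```python
import numpy as np

def exp3(losses, eta, rng):
    """Exp3 on a T x N array of losses in [0,1]: Hedge run on
    importance-weighted loss estimates built from bandit feedback."""
    T, N = losses.shape
    cum_est = np.zeros(N)                 # cumulative estimated losses
    total_loss = 0.0
    for t in range(T):
        w = np.exp(-eta * (cum_est - cum_est.min()))
        p = w / w.sum()
        i = rng.choice(N, p=p)            # play I_t drawn from p
        total_loss += losses[t, i]        # only ell_t(I_t) is revealed
        cum_est[i] += losses[t, i] / p[i] # unbiased importance-weighted estimate
    return total_loss, total_loss - losses.sum(axis=0).min()

rng = np.random.default_rng(0)
T, N = 100_000, 10
losses = rng.random((T, N))
eta = np.sqrt(2 * np.log(N) / (N * T))    # tuning from the sqrt(NT ln N) bound
print(exp3(losses, eta, rng))
```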

Exp3 regret bound
$$R_T \le \frac{\ln N}{\eta} + \frac{\eta}{2}\,\mathbb{E}\left[\sum_{t=1}^{T}\sum_{i=1}^{N} P_t(I_t = i)\,\mathbb{E}_t\big[\widehat{\ell}_t(i)^2\big]\right] \le \frac{\ln N}{\eta} + \frac{\eta}{2}\,\mathbb{E}\left[\sum_{t=1}^{T}\sum_{i=1}^{N} \frac{P_t(I_t = i)}{P_t\big(\ell_t(i)\ \text{observed}\big)}\right] = \frac{\ln N}{\eta} + \frac{\eta}{2}\,NT$$
which, for tuned $\eta$, is of order $\sqrt{NT \ln N}$; the lower bound is $\Omega\big(\sqrt{NT}\big)$. An improved matching upper bound was given by [Audibert and Bubeck, 2009].
In the full information (experts) setting the player observes the whole loss vector $\ell_t$ after each play, so $P_t\big(\ell_t(i)\ \text{observed}\big) = 1$ and $R_T \le \sqrt{T \ln N}$.

Nonoblivious opponents
The adaptive adversary: the loss of action $i$ at time $t$ depends on the player's past $m$ actions,
$$\ell_t(i) \equiv \ell_t(I_{t-m},\dots,I_{t-1},i)$$
Example: bandits with switching cost.
Nonoblivious regret:
$$R_T^{\mathrm{non}} = \mathbb{E}\left[\sum_{t=1}^{T}\ell_t(I_{t-m},\dots,I_{t-1},I_t) - \min_{i=1,\dots,N}\sum_{t=1}^{T}\ell_t(I_{t-m},\dots,I_{t-1},i)\right]$$
Policy regret:
$$R_T^{\mathrm{pol}} = \mathbb{E}\left[\sum_{t=1}^{T}\ell_t(I_{t-m},\dots,I_{t-1},I_t)\right] - \min_{i=1,\dots,N}\sum_{t=1}^{T}\ell_t(\underbrace{i,\dots,i}_{m\ \text{times}},i)$$

Bandits and reactive opponents
Bounds on the nonoblivious regret (even when $m$ depends on $T$):
$$R_T^{\mathrm{non}} = O\big(\sqrt{TN \ln N}\big)$$
achieved by Exp3 with biased loss estimates. Is the $\ln N$ factor necessary?
Bounds on the policy regret, for any constant $m \ge 1$:
$$R_T^{\mathrm{pol}} = O\big((N \ln N)^{1/3}\, T^{2/3}\big)$$
achieved by a very simple player strategy, and optimal up to log factors! [Dekel, Koren, and Peres, 2014]

Partial monitoring: not observing any loss
Dynamic pricing: perform as well as the best fixed price.
1. Post a T-shirt price
2. Observe whether the next customer buys or not
3. Adjust the price
The feedback does not reveal the player's loss.
(Figure: loss matrix and feedback matrix of the pricing game.)

A characterization of minimax regret
Special case, multiarmed bandits: the loss and feedback matrices are the same.
A general gap theorem [Bartók, Foster, Pál, Rakhlin and Szepesvári, 2013] gives a constructive characterization of the minimax regret for any pair of loss/feedback matrices. Only three possible rates for nontrivial games:
1. Easy games (e.g., bandits): $\Theta\big(\sqrt{T}\big)$
2. Hard games (e.g., revealing action): $\Theta\big(T^{2/3}\big)$
3. Impossible games: $\Theta(T)$

A game equivalent to prediction with expert advice
Online linear optimization in the simplex:
1. Play $p_t$ from the $N$-dimensional simplex $\Delta_N$
2. Incur the linear loss $\mathbb{E}\big[\ell_t(I_t)\big] = p_t^{\top}\ell_t$
3. Observe the loss gradient $\ell_t$
Regret: compete against the best point in the simplex,
$$\sum_{t=1}^{T} p_t^{\top}\ell_t - \min_{q \in \Delta_N} \sum_{t=1}^{T} q^{\top}\ell_t = \sum_{t=1}^{T} p_t^{\top}\ell_t - \min_{i=1,\dots,N} \sum_{t=1}^{T} \ell_t(i)$$

From game theory to machine learning
(Figure: unlabeled data enters a classification system, which outputs a guessed label; the opponent then reveals the true label.)
The opponent's moves $y_t$ are viewed as values or labels assigned to observations $x_t \in \mathbb{R}^d$ (e.g., categories of documents). This is a repeated game between the player, choosing an element $w_t$ of a linear space, and the opponent, choosing a label $y_t$ for $x_t$. Regret is measured with respect to the best element in the linear space.

Summary
1. My beautiful regret
2. A supposedly fun game I'll play again
3. The joy of convex

Online convex optimization [Zinkevich, 2003]
1. Play $w_t$ from a convex and compact subset $S$ of a linear space
2. Observe a convex loss $\ell_t : S \to \mathbb{R}$ and pay $\ell_t(w_t)$
3. Update: $w_t \to w_{t+1} \in S$
Examples: regression with the square loss, $\ell_t(w) = \big(w^{\top}x_t - y_t\big)^2$ with $y_t \in \mathbb{R}$; classification with the hinge loss, $\ell_t(w) = \big[1 - y_t\, w^{\top}x_t\big]_+$ with $y_t \in \{-1,+1\}$.
Regret:
$$R_T(u) = \sum_{t=1}^{T}\ell_t(w_t) - \sum_{t=1}^{T}\ell_t(u) \qquad u \in S$$
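
For concreteness, a small sketch of these two losses and their (sub)gradients in NumPy; the function names are mine, not from the slides.

```python
import numpy as np

def square_loss(w, x, y):
    """Square loss (w'x - y)^2 and its gradient in w."""
    r = w @ x - y
    return r * r, 2.0 * r * x

def hinge_loss(w, x, y):
    """Hinge loss [1 - y w'x]_+ and a subgradient in w (y in {-1,+1})."""
    margin = 1.0 - y * (w @ x)
    if margin <= 0.0:
        return 0.0, np.zeros_like(w)   # loss is flat here: subgradient 0
    return margin, -y * x

w = np.array([0.5, -0.2]); x = np.array([1.0, 2.0])
print(square_loss(w, x, 1.0))          # (r^2, 2 r x) with r = w'x - 1
print(hinge_loss(w, x, +1.0))
```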

Finding a good online algorithm
Follow the leader:
$$w_{t+1} = \operatorname*{arg\,inf}_{w \in S} \sum_{s=1}^{t} \ell_s(w)$$
Regret can be linear due to lack of stability. Take $S = [-1,+1]$, $\ell_1(w) = \frac{w}{2}$, and for $t \ge 2$
$$\ell_t(w) = \begin{cases} -w & \text{if } t \text{ is even} \\ +w & \text{if } t \text{ is odd} \end{cases}$$
Note that
$$\sum_{s=1}^{t} \ell_s(w) = \begin{cases} -\frac{w}{2} & \text{if } t \text{ is even} \\ +\frac{w}{2} & \text{if } t \text{ is odd} \end{cases}$$
Hence the leader alternates between the endpoints and $\ell_{t+1}(w_{t+1}) = 1$ for all $t = 1, 2, \dots$
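
A tiny simulation of this instability (the grid over $S$ and the arbitrary first play $w_1 = 0$ are my choices): follow the leader pays about 1 per round while the best fixed point pays about 0, so the regret is linear.

```python
import numpy as np

T = 1000
grid = np.linspace(-1.0, 1.0, 2001)      # fine grid over S = [-1, +1]

def loss(t, w):                          # t is 1-based, as on the slide
    if t == 1:
        return w / 2
    return -w if t % 2 == 0 else w

cum = np.zeros_like(grid)                # cumulative loss of every point of S
w, ftl_loss = 0.0, 0.0                   # w_1 = 0 (arbitrary: no data yet)
for t in range(1, T + 1):
    ftl_loss += loss(t, w)               # play the current leader
    cum += loss(t, grid)
    w = grid[np.argmin(cum)]             # leader after round t
print(ftl_loss, cum.min())               # approx T vs approx -1/2: linear regret
```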

Follow the regularized leader [Shalev-Shwartz, 2007; Abernethy, Hazan and Rakhlin, 2008]
$$w_{t+1} = \operatorname*{arg\,min}_{w \in S}\left[\eta \sum_{s=1}^{t} \ell_s(w) + \Phi(w)\right]$$
where $\Phi$ is a strongly convex regularizer and $\eta > 0$ is a scale parameter.

Convexity, smoothness, and duality
Strong convexity: $\Phi : S \to \mathbb{R}$ is $\beta$-strongly convex w.r.t. a norm $\|\cdot\|$ if for all $u, v \in S$
$$\Phi(v) \ge \Phi(u) + \nabla\Phi(u)^{\top}(v-u) + \frac{\beta}{2}\,\|u-v\|^2$$
Smoothness: $\Phi : S \to \mathbb{R}$ is $\alpha$-smooth w.r.t. a norm $\|\cdot\|$ if for all $u, v \in S$
$$\Phi(v) \le \Phi(u) + \nabla\Phi(u)^{\top}(v-u) + \frac{\alpha}{2}\,\|u-v\|^2$$
If $\Phi$ is $\beta$-strongly convex w.r.t. $\|\cdot\|_2$, then $\nabla^2\Phi \succeq \beta I$; if $\Phi$ is $\alpha$-smooth w.r.t. $\|\cdot\|_2$, then $\nabla^2\Phi \preceq \alpha I$.

Examples
- Euclidean norm: $\Phi = \frac{1}{2}\|\cdot\|_2^2$ is 1-strongly convex w.r.t. $\|\cdot\|_2$
- $p$-norm: $\Phi = \frac{1}{2}\|\cdot\|_p^2$ is $(p-1)$-strongly convex w.r.t. $\|\cdot\|_p$ (for $1 < p \le 2$)
- Entropy: $\Phi(p) = \sum_{i=1}^{d} p_i \ln p_i$ is 1-strongly convex w.r.t. $\|\cdot\|_1$ (for $p$ in the probability simplex)
- Power norm: $\Phi(w) = \frac{1}{2}\,w^{\top}Aw$ is 1-strongly convex w.r.t. $\|w\| = \sqrt{w^{\top}Aw}$ (for $A$ symmetric and positive definite)

Convex duality
Definition: the convex dual of $\Phi$ is
$$\Phi^*(\theta) = \max_{w \in S}\big(\theta^{\top}w - \Phi(w)\big)$$
One-dimensional example: let $f : \mathbb{R} \to \mathbb{R}$ be convex with $f(0) = 0$ and
$$f^*(\theta) = \max_{w \in \mathbb{R}}\big(w\,\theta - f(w)\big)$$
The maximizer is the $w_0$ such that $f'(w_0) = \theta$, which gives $f^*(\theta) = w_0\, f'(w_0) - f(w_0)$. Since $f(0) = 0$, $f^*(\theta)$ is the error made when approximating $f(0)$ by a first-order expansion of $f$ around $w_0$.
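
As a quick worked instance of this recipe (my example, not on the slides), take $f(w) = \frac{1}{2}w^2$:
$$\frac{d}{dw}\left(w\theta - \tfrac{1}{2}w^2\right) = \theta - w_0 = 0 \;\Rightarrow\; w_0 = \theta, \qquad f^*(\theta) = \theta\cdot\theta - \tfrac{1}{2}\theta^2 = \tfrac{1}{2}\theta^2$$
consistent with the Euclidean-norm entry in the table of duals below.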

Convex duality
(Figure illustrating the convex dual; thanks to Shai Shalev-Shwartz for the image.)

Convexity, smoothness, and duality: examples
- Euclidean norm: $\Phi = \frac{1}{2}\|\cdot\|_2^2$ and $\Phi^* = \Phi$
- $p$-norm: $\Phi = \frac{1}{2}\|\cdot\|_p^2$ and $\Phi^* = \frac{1}{2}\|\cdot\|_q^2$ where $\frac{1}{p} + \frac{1}{q} = 1$
- Entropy: $\Phi(p) = \sum_{i=1}^{d} p_i \ln p_i$ and $\Phi^*(\theta) = \ln\big(e^{\theta_1} + \dots + e^{\theta_d}\big)$
- Power norm: $\Phi(w) = \frac{1}{2}\,w^{\top}Aw$ and $\Phi^*(\theta) = \frac{1}{2}\,\theta^{\top}A^{-1}\theta$

Some useful properties
If $\Phi : S \to \mathbb{R}$ is $\beta$-strongly convex w.r.t. $\|\cdot\|$, then its convex dual $\Phi^*$ is everywhere differentiable and $\frac{1}{\beta}$-smooth w.r.t. $\|\cdot\|_*$ (the dual norm of $\|\cdot\|$), with
$$\nabla\Phi^*(\theta) = \operatorname*{arg\,max}_{w \in S}\big(\theta^{\top}w - \Phi(w)\big)$$
Recall follow the regularized leader (FTRL):
$$w_{t+1} = \operatorname*{arg\,min}_{w \in S}\left[\eta \sum_{s=1}^{t} \ell_s(w) + \Phi(w)\right]$$
where $\Phi$ is a strongly convex regularizer and $\eta > 0$ is a scale parameter.

Using the loss gradient
Linearization of convex losses:
$$\ell_t(w_t) - \ell_t(u) \le \nabla\ell_t(w_t)^{\top}(w_t - u)$$
FTRL with linearized losses:
$$w_{t+1} = \operatorname*{arg\,min}_{w \in S}\left(\eta \sum_{s=1}^{t} \nabla\ell_s(w_s)^{\top}w + \Phi(w)\right) = \operatorname*{arg\,max}_{w \in S}\big(\theta_{t+1}^{\top}w - \Phi(w)\big) = \nabla\Phi^*(\theta_{t+1})$$
where $\theta_{t+1} = -\eta\sum_{s=1}^{t}\nabla\ell_s(w_s)$. Note: $w_{t+1} \in S$ always holds.

The Mirror Descent algorithm [Nemirovsky and Yudin, 1983]
Recall:
$$w_{t+1} = \nabla\Phi^*(\theta_{t+1}) = \nabla\Phi^*\left(-\eta \sum_{s=1}^{t} \nabla\ell_s(w_s)\right)$$
Online Mirror Descent (FTRL with linearized losses).
Parameters: strongly convex regularizer $\Phi$ with domain $S$, $\eta > 0$. Initialize $\theta_1 = 0$.
For $t = 1, 2, \dots$
1. Use $w_t = \nabla\Phi^*(\theta_t)$ (mirror step)
2. Suffer loss $\ell_t(w_t)$
3. Observe the loss gradient $\nabla\ell_t(w_t)$
4. Update $\theta_{t+1} = \theta_t - \eta\,\nabla\ell_t(w_t)$ (gradient step)

An equivalent formulation
Under some assumptions on the regularizer $\Phi$, OMD can be equivalently written in terms of projected gradient descent.
Online Mirror Descent (alternative version).
Parameters: strongly convex regularizer $\Phi$ and learning rate $\eta > 0$. Initialize $z_1 = \nabla\Phi^*(0)$ and $w_1 = \operatorname*{arg\,min}_{w \in S} D_{\Phi}(w, z_1)$.
For $t = 1, 2, \dots$
1. Use $w_t$ and suffer loss $\ell_t(w_t)$
2. Observe the loss gradient $\nabla\ell_t(w_t)$
3. Update $z_{t+1} = \nabla\Phi^*\big(\nabla\Phi(z_t) - \eta\,\nabla\ell_t(w_t)\big)$ (gradient step)
4. $w_{t+1} = \operatorname*{arg\,min}_{w \in S} D_{\Phi}(w, z_{t+1})$ (projection step)
Here $D_{\Phi}$ is the Bregman divergence induced by $\Phi$.

Some examples
Online Gradient Descent (OGD) [Zinkevich, 2003; Gentile, 2003]: $\Phi(w) = \frac{1}{2}\|w\|^2$, with update
$$w' = w_t - \eta\,\nabla\ell_t(w_t) \qquad w_{t+1} = \operatorname*{arg\,inf}_{w \in S}\|w - w'\|_2$$
$p$-norm version: $\Phi(w) = \frac{1}{2}\|w\|_p^2$.
Exponentiated gradient (EG) [Kivinen and Warmuth, 1997]: $\Phi(p) = \sum_{i=1}^{d} p_i \ln p_i$ with $p$ in the simplex $S$, and update
$$p_{t+1,i} = \frac{p_{t,i}\, e^{-\eta\,\nabla\ell_t(p_t)_i}}{\sum_{j=1}^{d} p_{t,j}\, e^{-\eta\,\nabla\ell_t(p_t)_j}}$$
Note: when the losses are linear this is Hedge.
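
A minimal sketch of both updates, assuming $S$ is the Euclidean unit ball for OGD (the projection step depends on $S$) and the simplex for EG; the toy quadratic and linear losses are illustrative.

```python
import numpy as np

def ogd_step(w, grad, eta):
    """OGD on the Euclidean unit ball: gradient step plus L2 projection."""
    w = w - eta * grad
    norm = np.linalg.norm(w)
    return w / norm if norm > 1.0 else w

def eg_step(p, grad, eta):
    """Exponentiated gradient on the simplex: multiplicative update."""
    p = p * np.exp(-eta * grad)
    return p / p.sum()

rng = np.random.default_rng(0)
d, T, eta = 5, 1000, 0.05
w, p = np.zeros(d), np.full(d, 1.0 / d)
for t in range(T):
    x, y = rng.standard_normal(d), 1.0
    grad_sq = 2.0 * (w @ x - y) * x        # gradient of the square loss
    w = ogd_step(w, grad_sq, eta)
    ell = rng.random(d)                    # linear loss vector in [0,1]^d
    p = eg_step(p, ell, eta)               # with linear losses EG = Hedge
print(w, p)
```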

Regret analysis
Regret bound [Kakade, Shalev-Shwartz and Tewari, 2012]: for all $u \in S$, where $\ell_1, \ell_2, \dots$ are arbitrary convex losses,
$$R_T(u) \le \frac{\Phi(u) - \min_{w \in S}\Phi(w)}{\eta} + \frac{\eta}{2\beta}\sum_{t=1}^{T}\big\|\nabla\ell_t(w_t)\big\|_*^2$$
When $\eta$ is tuned with respect to $G$ and $D$, where $\sup_{w \in S}\|\nabla\ell_t(w)\|_* \le G$ and $\sup_{u,w \in S}\big(\Phi(u) - \Phi(w)\big) \le D^2$, this gives $R_T(u) \le DG\sqrt{2T/\beta}$ for all $u \in S$.
Boundedness of the gradients of $\ell_t$ w.r.t. $\|\cdot\|_*$ is equivalent to Lipschitzness of $\ell_t$ w.r.t. $\|\cdot\|$. This regret bound is optimal for general convex losses $\ell_t$.

Analysis relies on smoothness of $\Phi^*$
$$\Phi^*(\theta_{t+1}) - \Phi^*(\theta_t) \le \underbrace{\nabla\Phi^*(\theta_t)}_{w_t}{}^{\top}\underbrace{\big(\theta_{t+1} - \theta_t\big)}_{-\eta\nabla\ell_t(w_t)} + \frac{1}{2\beta}\big\|\theta_{t+1} - \theta_t\big\|_*^2$$
Hence
$$-\eta\sum_{t=1}^{T} u^{\top}\nabla\ell_t(w_t) - \Phi(u) = u^{\top}\theta_{T+1} - \Phi(u) \le \Phi^*(\theta_{T+1}) \qquad \text{(Fenchel-Young inequality)}$$
$$= \sum_{t=1}^{T}\big(\Phi^*(\theta_{t+1}) - \Phi^*(\theta_t)\big) + \Phi^*(\theta_1) \le \sum_{t=1}^{T}\left(-\eta\, w_t^{\top}\nabla\ell_t(w_t) + \frac{\eta^2}{2\beta}\big\|\nabla\ell_t(w_t)\big\|_*^2\right) + \Phi^*(0)$$
where $\Phi^*(0) = \max_{w \in S}\big(-\Phi(w)\big) = -\min_{w \in S}\Phi(w)$.

Some examples
Losses of the form $\ell_t(w) = l_t\big(w^{\top}x_t\big)$ with $\max_t |l_t'| \le L$ and $\max_t \|x_t\|_p \le X_p$.
Bounds for OGD with convex losses:
$$R_T(u) \le BLX_2\sqrt{T} = O\big(L\sqrt{dT}\big) \qquad \text{for all } u \text{ such that } \|u\|_2 \le B$$
(the $\sqrt{d}$ appears because $X_2 = O(\sqrt{d})$ when the entries of $x_t$ are bounded).
Bounds logarithmic in the dimension: for EG run in the simplex, $S = \Delta_d$,
$$R_T(q) \le LX_{\infty}\sqrt{(\ln d)\,T} = O\big(L\sqrt{(\ln d)\,T}\big) \qquad q \in \Delta_d$$
The same bound holds for the $p$-norm regularizer with $p = \frac{\ln d}{\ln d - 1}$. If the losses are linear with $[0,1]$ coefficients, we recover the bound for Hedge.

Exploiting curvature: minimization of the SVM objective
Training set $(x_1,y_1),\dots,(x_m,y_m) \in \mathbb{R}^d \times \{-1,+1\}$. The SVM objective over $\mathbb{R}^d$ is
$$F(w) = \frac{1}{m}\sum_{t=1}^{m}\underbrace{\big[1 - y_t\, w^{\top}x_t\big]_+}_{\text{hinge loss } h_t(w)} + \frac{\lambda}{2}\|w\|^2$$
Rewrite $F(w) = \frac{1}{m}\sum_{t=1}^{m}\ell_t(w)$ where $\ell_t(w) = h_t(w) + \frac{\lambda}{2}\|w\|^2$: each loss $\ell_t$ is $\lambda$-strongly convex.
The Pegasos algorithm: run OGD on a random sequence of $T$ training examples. Then
$$\mathbb{E}\left[F\left(\frac{1}{T}\sum_{t=1}^{T} w_t\right)\right] \le \min_{w \in \mathbb{R}^d} F(w) + \frac{G^2}{2\lambda}\cdot\frac{\ln T + 1}{T}$$
$O(\ln T)$ regret rates hold for any sequence of strongly convex losses.
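
A minimal Pegasos-style sketch (OGD with step size $1/(\lambda t)$ on randomly drawn examples, averaging the iterates); the synthetic data and the exact step-size schedule are my assumptions.

```python
import numpy as np

def pegasos(X, y, lam, T, rng):
    """SGD on the lambda-strongly convex per-example SVM objective
    ell_t(w) = [1 - y w'x]_+ + (lam/2)||w||^2, with step size 1/(lam*t)."""
    m, d = X.shape
    w, w_avg = np.zeros(d), np.zeros(d)
    for t in range(1, T + 1):
        i = rng.integers(m)                 # random training example
        grad = lam * w                      # gradient of the regularizer
        if y[i] * (w @ X[i]) < 1.0:         # hinge is active
            grad -= y[i] * X[i]             # subgradient of the hinge
        w -= grad / (lam * t)
        w_avg += (w - w_avg) / t            # running average of the iterates
    return w_avg

rng = np.random.default_rng(0)
m, d = 500, 10
X = rng.standard_normal((m, d))
u = rng.standard_normal(d)
y = np.sign(X @ u)                          # linearly separable toy labels
w = pegasos(X, y, lam=0.1, T=50_000, rng=rng)
print(np.mean(np.sign(X @ w) == y))         # training accuracy
```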

Exp-concave losses
Exp-concavity (strong convexity along the gradient direction): a convex $\ell : S \to \mathbb{R}$ is $\alpha$-exp-concave when $g(w) = e^{-\alpha\,\ell(w)}$ is concave. For twice-differentiable losses this is
$$\nabla^2\ell(w) \succeq \alpha\,\nabla\ell(w)\,\nabla\ell(w)^{\top} \qquad \text{for all } w \in S$$
Example: $\ell_t(w) = -\ln\big(w^{\top}x_t\big)$ is exp-concave.

Online Newton Step [Hazan, Agarwal and Kale, 2007]
Update:
$$w' = w_t - A_t^{-1}\nabla\ell_t(w_t) \qquad w_{t+1} = \operatorname*{arg\,min}_{w \in S}\|w - w'\|_{A_t} \qquad \text{where } A_t = \varepsilon I + \sum_{s=1}^{t} \nabla\ell_s(w_s)\,\nabla\ell_s(w_s)^{\top}$$
Note: this is not an instance of OMD.
Logarithmic regret bound for exp-concave losses:
$$R_T(u) \le 5d\left(\frac{1}{\alpha} + GD\right)\ln(T+1) \qquad u \in S$$
Extension of ONS to convex losses [Luo, Agarwal, C-B, Langford, 2016]: for $\ell_t(w) = l_t\big(w^{\top}x_t\big)$ with $\max_t |l_t'| \le L$,
$$R_T(u) \le \widetilde{O}\big(CL\sqrt{dT}\big) \qquad \text{for all } u \text{ s.t. } \big|u^{\top}x_t\big| \le C$$
with invariance to linear transformations of the data.
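
A minimal unconstrained ONS sketch (taking $S = \mathbb{R}^d$ so the $\|\cdot\|_{A_t}$ projection is skipped, and using a rank-one Sherman-Morrison update to keep $A_t^{-1}$ cheap); the square-loss data and the choice of $\varepsilon$ are illustrative assumptions, not part of the original statement.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, eps = 5, 2000, 1.0
u = rng.standard_normal(d)                  # target for toy square losses
w = np.zeros(d)
A_inv = np.eye(d) / eps                     # maintain A_t^{-1} directly
total = 0.0
for t in range(T):
    x = rng.standard_normal(d)
    y = u @ x                               # comparator u has zero loss
    g = 2.0 * (w @ x - y) * x               # gradient of (w'x - y)^2 at w_t
    total += (w @ x - y) ** 2
    # Sherman-Morrison rank-one update of A^{-1} for A <- A + g g'
    Ag = A_inv @ g
    A_inv -= np.outer(Ag, Ag) / (1.0 + g @ Ag)
    w = w - A_inv @ g                       # Newton-style step (no projection)
print(total)                                # cumulative loss, hence the regret
```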

Online Ridge Regression [Vovk, 2001; Azoury and Warmuth, 2001]
Logarithmic regret for the square loss $\ell_t(u) = \big(u^{\top}x_t - y_t\big)^2$, with $Y = \max_{t=1,\dots,T}|y_t|$ and $X = \max_{t=1,\dots,T}\|x_t\|$.
OMD with the adaptive regularizer $\Phi_t(w) = \frac{1}{2}\|w\|_{A_t}^2$, where
$$A_t = I + \sum_{s=1}^{t} x_s x_s^{\top} \qquad \theta_t = \sum_{s=1}^{t} y_s x_s$$
Regret bound (oracle inequality):
$$\sum_{t=1}^{T}\ell_t(w_t) \le \inf_{u \in \mathbb{R}^d}\left(\sum_{t=1}^{T}\ell_t(u) + \|u\|^2\right) + dY^2\ln\left(1 + \frac{TX^2}{d}\right)$$
Parameterless, and the comparison set is unbounded.
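
A small sketch of this forecaster in one standard form: predict with $w_t = A_t^{-1}\theta_{t-1}$, where $A_t$ already includes the current $x_t$ but $\theta_{t-1}$ only the labels seen so far. The synthetic data and the ridge comparator used for checking are my assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 5000
u = rng.standard_normal(d)                      # unknown target vector
X_data = rng.standard_normal((T, d))
y_data = X_data @ u + 0.1 * rng.standard_normal(T)

A = np.eye(d)                                   # A_t = I + sum_s x_s x_s'
theta = np.zeros(d)                             # theta_t = sum_s y_s x_s
total = 0.0
for t in range(T):
    x, y = X_data[t], y_data[t]
    A += np.outer(x, x)                         # include the current x_t
    w = np.linalg.solve(A, theta)               # predict before seeing y_t
    total += (w @ x - y) ** 2
    theta += y * x                              # now incorporate the label
ridge_u = np.linalg.solve(np.eye(d) + X_data.T @ X_data, X_data.T @ y_data)
best = ((X_data @ ridge_u - y_data) ** 2).sum()
print(total - best)                             # should grow like d ln T
```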

Scale-free algorithm for convex losses [Orabona and Pál, 2015]
OMD with the adaptive regularizer
$$\Phi_t(w) = \Phi_0(w)\sqrt{\sum_{s=1}^{t-1}\big\|\nabla\ell_s(w_s)\big\|_*^2}$$
where $\Phi_0$ is a $\beta$-strongly convex base regularizer.
Regret bound (oracle inequality) for convex loss functions $\ell_t$:
$$\sum_{t=1}^{T}\ell_t(w_t) \le \inf_{u \in \mathbb{R}^d}\left[\sum_{t=1}^{T}\ell_t(u) + \left(\Phi_0(u) + \frac{1}{\beta}\right)\sqrt{\sum_{t=1}^{T}\big\|\nabla\ell_t(w_t)\big\|_*^2}\right] + \max_t\big\|\nabla\ell_t(w_t)\big\|_*$$

Regularization via stochastic smoothing
Follow the Perturbed Leader (FPL) [Kalai and Vempala, 2005], for linear losses:
$$w_{t+1} = \mathbb{E}_Z\left[\operatorname*{arg\,min}_{w \in S}\; w^{\top}\left(\eta\sum_{s=1}^{t}\nabla\ell_s(w_s) + Z\right)\right]$$
The distribution of $Z$ must be stable (small variance and small average sensitivity). The regret bound is similar to FTRL/OMD, and for some choices of $Z$, FPL becomes equivalent to OMD [Abernethy, Lee, Sinha and Tewari, 2014].

Shifting regret [Herbster and Warmuth, 2001]
Nonstationarity: if the data source is not fitted well by any model in the class, then comparing to the best fixed model $u \in S$ is trivial. Compare instead to the best sequence $u_1, u_2, \dots \in S$ of models.
Shifting regret for OMD [Zinkevich, 2003]:
$$\underbrace{\sum_{t=1}^{T}\ell_t(w_t)}_{\text{cumulative loss}} \le \inf_{u_1,\dots,u_T \in S}\left[\underbrace{\sum_{t=1}^{T}\ell_t(u_t)}_{\text{model fit}} + \underbrace{\sum_{t=1}^{T}\|u_t - u_{t-1}\|}_{\text{shifting model cost}}\right] + \operatorname{diam}(S) + \cdots$$

Strongly adaptive regret [Daniely, Gonen, Shalev-Shwartz, 2015]
Definition: for all intervals $I = \{r,\dots,s\}$ with $1 \le r < s \le T$,
$$R_{T,I}(u) = \sum_{t \in I}\ell_t(w_t) - \sum_{t \in I}\ell_t(u)$$
Regret bound for strongly adaptive OGD:
$$R_{T,I}(u) \le \big(BLX_2 + \ln(T+1)\big)\sqrt{|I|} \qquad \text{for all } u \text{ such that } \|u\|_2 \le B$$
Remarks: this is a generic black-box reduction applicable to any online learning algorithm, and it runs a logarithmic number of instances of the base learner.

Online bandit convex optimization
1. Play $w_t$ from a convex and compact subset $S$ of a linear space
2. Observe only $\ell_t(w_t)$, where $\ell_t : S \to \mathbb{R}$ is an unobserved convex loss
3. Update: $w_t \to w_{t+1} \in S$
Regret: $R_T(u) = \sum_{t=1}^{T}\ell_t(w_t) - \sum_{t=1}^{T}\ell_t(u)$ for $u \in S$.
Results:
- Linear losses: $\Omega\big(d\sqrt{T}\big)$ [Dani, Hayes, and Kakade, 2008]
- Linear losses: $\widetilde{O}\big(d\sqrt{T}\big)$ [Bubeck, C-B, and Kakade, 2012]
- Strongly convex and smooth losses: $\widetilde{O}\big(d^{3/2}\sqrt{T}\big)$ [Hazan and Levy, 2014]
- Convex losses: $\widetilde{O}\big(d^{9.5}\sqrt{T}\big)$ [Bubeck, Eldan, and Lee, 2016]


More information

Combinatorial Bandits

Combinatorial Bandits Combinatorial Bandits Nicolò Cesa-Bianchi Università degli Studi di Milano, Italy Gábor Lugosi 1 ICREA and Pompeu Fabra University, Spain Abstract We study sequential prediction problems in which, at each

More information

The Price of Differential Privacy for Online Learning

The Price of Differential Privacy for Online Learning Naman Agarwal Karan Singh Abstract We design differentially private algorithms for the problem of online linear optimization in the full information and bandit settings with optimal ( p T ) regret bounds.

More information

Online Learning: Random Averages, Combinatorial Parameters, and Learnability

Online Learning: Random Averages, Combinatorial Parameters, and Learnability Online Learning: Random Averages, Combinatorial Parameters, and Learnability Alexander Rakhlin Department of Statistics University of Pennsylvania Karthik Sridharan Toyota Technological Institute at Chicago

More information

Algorithmic Chaining and the Role of Partial Feedback in Online Nonparametric Learning

Algorithmic Chaining and the Role of Partial Feedback in Online Nonparametric Learning Proceedings of Machine Learning Research vol 65:1 17, 2017 Algorithmic Chaining and the Role of Partial Feedback in Online Nonparametric Learning Nicolò Cesa-Bianchi Università degli Studi di Milano, Milano,

More information

Online Learning with Predictable Sequences

Online Learning with Predictable Sequences JMLR: Workshop and Conference Proceedings vol (2013) 1 27 Online Learning with Predictable Sequences Alexander Rakhlin Karthik Sridharan rakhlin@wharton.upenn.edu skarthik@wharton.upenn.edu Abstract We

More information

Partial monitoring classification, regret bounds, and algorithms

Partial monitoring classification, regret bounds, and algorithms Partial monitoring classification, regret bounds, and algorithms Gábor Bartók Department of Computing Science University of Alberta Dávid Pál Department of Computing Science University of Alberta Csaba

More information

Simultaneous Model Selection and Optimization through Parameter-free Stochastic Learning

Simultaneous Model Selection and Optimization through Parameter-free Stochastic Learning Simultaneous Model Selection and Optimization through Parameter-free Stochastic Learning Francesco Orabona Yahoo! Labs New York, USA francesco@orabona.com Abstract Stochastic gradient descent algorithms

More information

Algorithmic Chaining and the Role of Partial Feedback in Online Nonparametric Learning

Algorithmic Chaining and the Role of Partial Feedback in Online Nonparametric Learning Algorithmic Chaining and the Role of Partial Feedback in Online Nonparametric Learning Nicolò Cesa-Bianchi, Pierre Gaillard, Claudio Gentile, Sébastien Gerchinovitz To cite this version: Nicolò Cesa-Bianchi,

More information

Efficient learning by implicit exploration in bandit problems with side observations

Efficient learning by implicit exploration in bandit problems with side observations Efficient learning by implicit exploration in bandit problems with side observations omáš Kocák Gergely Neu Michal Valko Rémi Munos SequeL team, INRIA Lille Nord Europe, France {tomas.kocak,gergely.neu,michal.valko,remi.munos}@inria.fr

More information

Online Learning meets Optimization in the Dual

Online Learning meets Optimization in the Dual Online Learning meets Optimization in the Dual Shai Shalev-Shwartz 1 and Yoram Singer 1,2 1 School of Computer Sci. & Eng., The Hebrew University, Jerusalem 91904, Israel 2 Google Inc., 1600 Amphitheater

More information

Online Sparse Linear Regression

Online Sparse Linear Regression JMLR: Workshop and Conference Proceedings vol 49:1 11, 2016 Online Sparse Linear Regression Dean Foster Amazon DEAN@FOSTER.NET Satyen Kale Yahoo Research SATYEN@YAHOO-INC.COM Howard Karloff HOWARD@CC.GATECH.EDU

More information

Bandit models: a tutorial

Bandit models: a tutorial Gdt COS, December 3rd, 2015 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions) Bandit game: a each round t, an agent chooses

More information