The Online Approach to Machine Learning
Slide 1: The Online Approach to Machine Learning
Nicolò Cesa-Bianchi, Università degli Studi di Milano
N. Cesa-Bianchi (UNIMI), Online Approach to ML, 1 / 53
Slide 2: Summary
1. My beautiful regret
2. A supposedly fun game I'll play again
3. A graphic novel
4. The joy of convex
5. The joy of convex (without the gradient)
Slide 3: Summary (Part 1: My beautiful regret)
Slide 4: Machine learning
- Classification/regression tasks
- Predictive models h map data instances X to labels Y (e.g., a binary classifier)
- Training data S_T = ((X_1, Y_1), ..., (X_T, Y_T)) (e.g., messages annotated as spam vs. nonspam)
- A learning algorithm A (e.g., a Support Vector Machine) maps the training data S_T to a model h = A(S_T)
- Evaluate the risk of the trained model h with respect to a given loss function
Slide 5: Two notions of risk
View data as a statistical sample: statistical risk
  E[ loss( A(S_T), (X, Y) ) ]
where the training set S_T = ((X_1, Y_1), ..., (X_T, Y_T)) and the test example (X, Y) are drawn i.i.d. from the same unknown and fixed distribution.
View data as an arbitrary sequence: sequential risk
  Σ_{t=1}^T loss( A(S_{t-1}), (X_t, Y_t) )
where a sequence of models is trained on the growing prefixes S_t = ((X_1, Y_1), ..., (X_t, Y_t)) of the data sequence.
Slide 6: Regrets, I had a few
A learning algorithm A maps datasets to models in a given class H.
Variance error in statistical learning:
  E[ loss( A(S_T), (X, Y) ) ] - inf_{h ∈ H} E[ loss( h, (X, Y) ) ]
(compare to the expected loss of the best model in the class)
Regret in online learning:
  Σ_{t=1}^T loss( A(S_{t-1}), (X_t, Y_t) ) - inf_{h ∈ H} Σ_{t=1}^T loss( h, (X_t, Y_t) )
(compare to the cumulative loss of the best model in the class)
Slide 7: Incremental model update
A natural blueprint for online learning algorithms. For t = 1, 2, ...
1. Apply the current model h_{t-1} to the next data element (X_t, Y_t)
2. Update the current model: h_{t-1} → h_t ∈ H
Goal: control the regret
  Σ_{t=1}^T loss( h_{t-1}, (X_t, Y_t) ) - inf_{h ∈ H} Σ_{t=1}^T loss( h, (X_t, Y_t) )
View this as a repeated game between a player generating predictors h_t ∈ H and an opponent generating data (X_t, Y_t).
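The predict-then-update blueprint can be sketched in a few lines. The example below is illustrative and not from the talk: the "model" is a single number updated to the running mean of the labels, the loss is the square loss, and the best fixed model in hindsight is the overall mean, so the regret can be computed exactly.

```python
def squared_loss(h, y):
    return (h - y) ** 2

def online_mean_regret(ys):
    """Incremental blueprint: apply the current model, then update it.
    Here the model h is a single number kept equal to the running mean
    of the labels seen so far."""
    h, cumulative = 0.0, 0.0
    for t, y in enumerate(ys, start=1):
        cumulative += squared_loss(h, y)   # 1. apply the current model h_{t-1}
        h += (y - h) / t                   # 2. update: h_{t-1} -> h_t
    # for the square loss, the best fixed model in hindsight is the mean
    mean = sum(ys) / len(ys)
    best = sum(squared_loss(mean, y) for y in ys)
    return cumulative - best
```

The returned regret is always nonnegative and, for this simple model class, grows only logarithmically with the sequence length.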
Slide 8: Summary (Part 2: A supposedly fun game I'll play again)
Slide 9: Theory of repeated games
James Hannan and David Blackwell: learning to play a game (1956). Play a game repeatedly against a possibly suboptimal opponent.
Slide 10: Zero-sum 2-person games played more than once
A known N × M loss matrix with entries ℓ(i, j). The row player ("player") has N actions; the column player ("opponent") has M actions. For each game round t = 1, 2, ...
- The player chooses action i_t and the opponent chooses action y_t
- The player suffers loss ℓ(i_t, y_t) (= the gain of the opponent)
The player can learn from the opponent's history of past choices y_1, ..., y_{t-1}.
Slide 11: Prediction with expert advice [Volodya Vovk; Manfred Warmuth]
At each time t, each action i incurs a loss ℓ_t(i). The opponent's moves y_1, y_2, ... define a sequential prediction problem with a time-varying loss function ℓ(i_t, y_t) = ℓ_t(i_t).
Slide 12: Playing the experts game (N actions)
For t = 1, 2, ...
1. A loss ℓ_t(i) ∈ [0, 1] is assigned to every action i = 1, ..., N (hidden from the player)
2. The player picks an action I_t (possibly using randomization) and incurs loss ℓ_t(I_t)
3. The player gets full feedback: ℓ_t(1), ..., ℓ_t(N)
Slide 13: Oblivious opponents
The losses ℓ_t(1), ..., ℓ_t(N), for all t = 1, 2, ..., are fixed beforehand and unknown to the (randomized) player.
Oblivious regret minimization:
  R_T = E[ Σ_{t=1}^T ℓ_t(I_t) ] - min_{i=1,...,N} Σ_{t=1}^T ℓ_t(i)
We want R_T = o(T).
Slide 14: Bounds on regret [Experts paper, 1997]
Lower bound using random losses: let ℓ_t(i) = L_t(i) ∈ {0, 1} be independent random coin flips. For any player strategy,
  E[ Σ_{t=1}^T L_t(I_t) ] = T/2
Then the expected regret is
  E[ max_{i=1,...,N} Σ_{t=1}^T ( 1/2 - L_t(i) ) ] = (1 - o(1)) sqrt( (T ln N) / 2 )
Slide 15: Exponentially weighted forecaster
At time t pick action I_t = i with probability proportional to
  exp( -η Σ_{s=1}^{t-1} ℓ_s(i) )
(the sum in the exponent is the total loss of action i up to now).
Regret bound [Experts paper, 1997]: if η = sqrt( (8 ln N) / T ), then
  R_T ≤ sqrt( (T ln N) / 2 )
This matches the lower bound, including constants. The dynamic choice η_t = sqrt( (8 ln N) / t ) only loses small constants.
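As a sanity check, the forecaster fits in a dozen lines of Python (illustrative code, not from the talk). Weights decay exponentially with cumulative loss, and the returned quantity is the expected regret of the randomized player against the best single action in hindsight.

```python
import math

def exp_weights_regret(losses, eta):
    """Exponentially weighted forecaster: play action i with probability
    proportional to exp(-eta * cumulative loss of i).  `losses` is a list
    of per-round loss vectors; returns the expected regret."""
    n = len(losses[0])
    cum = [0.0] * n
    expected = 0.0
    for round_losses in losses:
        weights = [math.exp(-eta * c) for c in cum]
        z = sum(weights)
        probs = [w / z for w in weights]
        expected += sum(p * l for p, l in zip(probs, round_losses))
        cum = [c + l for c, l in zip(cum, round_losses)]
    return expected - min(cum)
```

With the tuned η = sqrt(8 ln N / T), the regret stays below sqrt((T ln N) / 2), which a quick run on a sequence where one action always dominates confirms.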
Slide 16: The bandit problem: playing an unknown game (N actions)
For t = 1, 2, ...
1. A loss ℓ_t(i) ∈ [0, 1] is assigned to every action i = 1, ..., N (hidden from the player)
2. The player picks an action I_t (possibly using randomization) and incurs loss ℓ_t(I_t)
3. The player gets partial feedback: only ℓ_t(I_t) is revealed
Many applications: ad placement, dynamic content adaptation, routing, online auctions.
Slide 17: Summary (Part 3: A graphic novel)
Slide 18: Relationships between actions [Mannor and Shamir, 2011]
[Figure: feedback graphs over the actions, one undirected and one directed]
Slide 19: A graph of relationships over actions
[Figure: actions as nodes of a graph; playing an action also reveals the losses of its neighbors]
Slide 20: Recovering the expert and bandit settings
Experts: the clique. Bandits: the empty graph.
Slide 21: Exponentially weighted forecaster, reprise [Alon, C-B, Gentile, Mannor, Mansour and Shamir, 2013]
Player's strategy:
  P_t(I_t = i) ∝ exp( -η Σ_{s=1}^{t-1} ℓ̂_s(i) ),  i = 1, ..., N
Importance sampling estimator:
  ℓ̂_t(i) = ℓ_t(i) / P_t( ℓ_t(i) observed )  if ℓ_t(i) is observed, and 0 otherwise
Then E_t[ ℓ̂_t(i) ] = ℓ_t(i) (unbiasedness), and E_t[ ℓ̂_t(i)² ] ≤ 1 / P_t( ℓ_t(i) observed ) (variance control).
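The unbiasedness claim can be checked exactly by enumerating the draw of the played action. In the sketch below (illustrative names, not from the talk), `observes[j]` is the set of actions whose losses are revealed when action j is played, i.e. j together with its neighbors in the feedback graph; the function returns the exact expectation of the importance-sampling estimate.

```python
def importance_estimates(p, losses, observes):
    """Exact expectation of the importance-sampling loss estimate over the
    draw of the played action.  p[j] is the probability of playing j;
    observes[j] is the set of actions whose losses are seen when j is played."""
    n = len(p)
    # probability that the loss of action i is observed
    q = [sum(p[j] for j in range(n) if i in observes[j]) for i in range(n)]
    # E[l_hat(i)] = sum over played j that reveal i of p[j] * losses[i] / q[i]
    return [sum(p[j] * (losses[i] / q[i]) for j in range(n) if i in observes[j])
            for i in range(n)]
```

By construction the expectation collapses to q[i] · ℓ(i) / q[i] = ℓ(i), whatever the graph, which is exactly the unbiasedness property used in the analysis.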
Slide 22: Independence number α(G)
α(G) is the size of the largest independent set of the graph G.
Slide 23: Regret bounds
Analysis (undirected graphs):
  R_T ≤ (ln N)/η + (η/2) Σ_{t=1}^T Σ_{i=1}^N P(I_t = i) / P( ℓ_t(i) observed )
The inner sum is at most α(G), so tuning η gives
  R_T ≤ sqrt( α(G) T ln N )
If the graph is directed, the bound worsens only by log factors.
Special cases: experts have α(G) = 1, hence R_T ≤ sqrt(T ln N); bandits have α(G) = N, hence R_T ≤ sqrt(T N ln N).
Slide 24: Reactive opponents [Dekel, Koren and Peres, 2014]
The loss of action i at time t depends on the player's past m actions:
  ℓ_t(i) = L_t(I_{t-m}, ..., I_{t-1}, i)
Adaptive regret:
  R_T^ada = E[ Σ_{t=1}^T L_t(I_{t-m}, ..., I_{t-1}, I_t) ] - min_{i=1,...,K} Σ_{t=1}^T L_t(i, ..., i)  (i repeated m times)
Minimax adaptive regret (for any constant m > 1): R_T^ada = Θ(T^{2/3}).
Slide 25: Partial monitoring: not observing any loss
Dynamic pricing: perform as well as the best fixed price.
1. Post a T-shirt price
2. Observe whether the next customer buys or not
3. Adjust the price
The feedback does not reveal the player's loss: the loss matrix and the feedback matrix are distinct. [Figure: the loss matrix of the pricing game and the corresponding buy/no-buy feedback matrix]
Slide 26: A characterization of minimax regret
Special case, multiarmed bandits: the loss matrix and the feedback matrix are the same.
A general gap theorem [Bartók, Foster, Pál, Rakhlin and Szepesvári, 2013]: a constructive characterization of the minimax regret for any pair of loss/feedback matrices. Only three possible rates for nontrivial games:
1. Easy games (e.g., bandits): Θ( sqrt(T) )
2. Hard games (e.g., revealing action): Θ( T^{2/3} )
3. Impossible games: Θ(T)
Slide 27: Summary (Part 4: The joy of convex)
Slide 28: A game equivalent to prediction with expert advice
Online linear optimization in the simplex:
1. Play a point p_t from the N-dimensional simplex Δ_N
2. Incur the linear loss E[ ℓ_t(I_t) ] = p_t^⊤ ℓ_t
3. Observe the loss gradient ℓ_t
Regret: compete against the best point in the simplex,
  Σ_{t=1}^T p_t^⊤ ℓ_t - min_{q ∈ Δ_N} Σ_{t=1}^T q^⊤ ℓ_t = Σ_{t=1}^T p_t^⊤ ℓ_t - min_{i=1,...,N} Σ_{t=1}^T ℓ_t(i)
Slide 29: From game theory to machine learning
[Diagram: unlabeled data → classification system → guessed label, compared with the true label supplied by the opponent]
The opponent's moves y_t are viewed as values or labels assigned to observations x_t ∈ R^d (e.g., categories of documents). This is a repeated game between the player, choosing an element w_t of a linear space, and the opponent, choosing a label y_t for x_t. Regret is measured with respect to the best element in the linear space.
Slide 30: Online convex optimization [Zinkevich, 2003]
1. Play a point w_t from a convex subset S of a linear space
2. Incur the convex loss ℓ_t(w_t)
3. Observe the loss gradient ∇ℓ_t(w_t)
4. Update the point: w_t → w_{t+1} ∈ S
Examples. Regression with the square loss: ℓ_t(w) = (w^⊤ x_t - y_t)², y_t ∈ R. Classification with the hinge loss: ℓ_t(w) = [1 - y_t w^⊤ x_t]_+, y_t ∈ {-1, +1}.
Regret:
  Σ_{t=1}^T ℓ_t(w_t) - inf_{u ∈ S} Σ_{t=1}^T ℓ_t(u)
Slide 31: Finding a good online algorithm
Follow the leader:
  w_{t+1} = arginf_{w ∈ S} Σ_{s=1}^t ℓ_s(w)
Regret can be linear due to lack of stability. Example: S = [-1, +1],
  ℓ_1(w) = w/2,  and for t ≥ 2:  ℓ_t(w) = -w if t is even, +w if t is odd
The leader keeps jumping between the endpoints -1 and +1 and pays a loss of about 1 on every round, while the fixed point w = 0 pays nothing.
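The instability can be verified numerically. The sketch below simulates follow the leader on this alternating example (with the assumption that the first-round play is w_1 = 0, since the leader is undefined before any data): the leader's cumulative loss grows linearly while the best fixed point pays a constant.

```python
def ftl_regret(T):
    """Follow the leader on S = [-1, 1] with linear losses l_t(w) = c_t * w:
    c_1 = 1/2, then c_t = -1 for even t and c_t = +1 for odd t (t >= 2).
    The leader oscillates between the endpoints and pays ~1 per round,
    while the fixed point w = 0 pays 0."""
    coeffs = [0.5] + [-1.0 if t % 2 == 0 else 1.0 for t in range(2, T + 1)]
    cum = 0.0        # cumulative loss coefficient
    w = 0.0          # first-round play (arbitrary starting point)
    total = 0.0
    for c in coeffs:
        total += c * w                    # loss of the current leader
        cum += c
        w = -1.0 if cum > 0 else 1.0      # new leader: minimize cum * w on [-1, 1]
    best = min(cum * u for u in (-1.0, 0.0, 1.0))   # best fixed point in hindsight
    return total - best
```

The regret comes out as T - 1/2: linear in T, exactly the failure mode the slide describes.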
Slide 32: Regularized online learning
Strong convexity: Φ : S → R is β-strongly convex w.r.t. a norm ‖·‖ if for all u, v ∈ S
  Φ(v) ≥ Φ(u) + ∇Φ(u)^⊤ (v - u) + (β/2) ‖u - v‖²
Example: Φ(v) = (1/2) ‖v‖².
Follow the regularized leader [Shalev-Shwartz, 2007; Abernethy, Hazan and Rakhlin, 2008]:
  w_{t+1} = argmin_{w ∈ S} [ η Σ_{s=1}^t ℓ_s(w) + Φ(w) ]
where Φ is a strongly convex regularizer defined on S.
Slide 33: Deriving an incremental update
Linearization of convex losses: by convexity,
  ℓ_t(w_t) - ℓ_t(u) ≤ ∇ℓ_t(w_t)^⊤ w_t - ∇ℓ_t(w_t)^⊤ u
Follow the regularized leader with linearized losses (write θ_t = - Σ_{s=1}^t ∇ℓ_s(w_s)):
  w_{t+1} = argmin_{w ∈ S} ( η Σ_{s=1}^t ∇ℓ_s^⊤ w + Φ(w) )
          = argmax_{w ∈ S} ( η θ_t^⊤ w - Φ(w) )
          = ∇Φ*( η θ_t )
where Φ* is the convex dual of Φ.
Slide 34: The Mirror Descent algorithm [Nemirovsky and Yudin, 1983]
Recall:
  w_{t+1} = ∇Φ*( θ_{t+1} ),  with θ_{t+1} = -η Σ_{s=1}^t ∇ℓ_s(w_s)
Online Mirror Descent
Parameters: a strongly convex regularizer Φ and a learning rate η > 0
Initialize: θ_1 = 0
For t = 1, 2, ...
1. Use w_t = ∇Φ*(θ_t)  // mirror step
2. Suffer loss ℓ_t(w_t)
3. Observe the loss gradient ∇ℓ_t(w_t)
4. Update θ_{t+1} = θ_t - η ∇ℓ_t(w_t)  // gradient step
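A minimal sketch of this loop for the quadratic regularizer Φ(w) = ‖w‖²/2 on a Euclidean ball (illustrative code: with this Φ the mirror map ∇Φ* is the identity followed by a projection onto S, so the instance is online projected gradient descent; a general implementation would swap in a different ∇Φ*).

```python
import math

def omd_quadratic(grads, eta, radius=1.0):
    """Online mirror descent with Phi(w) = ||w||^2 / 2 on the Euclidean
    ball of the given radius.  `grads` is the sequence of observed loss
    gradients; returns the sequence of iterates w_1, w_2, ..."""
    d = len(grads[0])
    theta = [0.0] * d
    iterates = []
    for g in grads:
        # mirror step: w = grad Phi*(theta); here this is theta itself,
        # projected back onto the ball S
        norm = math.sqrt(sum(x * x for x in theta))
        scale = min(1.0, radius / norm) if norm > 0 else 1.0
        iterates.append([scale * x for x in theta])
        # gradient step in the dual variable
        theta = [x - eta * gi for x, gi in zip(theta, g)]
    return iterates
```

On a constant linear loss the iterates march to the boundary point facing away from the gradient and then stay there, as the projection keeps them inside S.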
Slide 35: Some examples
- Exponentiated gradient [Kivinen and Warmuth, 1997]: S is the simplex and Φ(w) = Σ_{i=1}^d w_i ln w_i
- Online Gradient Descent [Zinkevich, 2003]: S = R^d and Φ(w) = (1/2) ‖w‖²
- p-norm Gradient Descent [Gentile, 2003]: S = R^d and Φ(w) = ‖w‖_p² / (2(p - 1))
- Matrix gradient descent [Cavallanti, C-B and Gentile, 2010; Kakade, Shalev-Shwartz and Tewari, 2012]
Slide 36: General regret bound
The analysis relies on the smoothness of Φ* in order to bound the increments Φ*(θ_{t+1}) - Φ*(θ_t) via ‖∇ℓ_t(w_t)‖².
Oracle bound [Kakade, Shalev-Shwartz and Tewari, 2012]:
  Σ_{t=1}^T ℓ_t(w_t) ≤ inf_{u ∈ S} ( Σ_{t=1}^T ℓ_t(u) + Φ(u)/η ) + (η/(2β)) Σ_{t=1}^T ‖∇ℓ_t(w_t)‖²
(cumulative loss ≤ model fit + model cost + gradient term), where ℓ_1, ℓ_2, ... are arbitrary convex losses.
If the gradients are bounded, then R_T = O( sqrt(T) ); this is optimal for general convex losses ℓ_t. If all ℓ_t are strongly convex, then R_T = O( ln T ).
Slide 37: Regularization via stochastic smoothing
Follow the perturbed leader [Kalai and Vempala, 2005]:
  w_{t+1} = E[ argmin_{w ∈ S} ( η θ_t^⊤ w + Z^⊤ w ) ]
The distribution of Z must be stable (small variance and small average sensitivity). For some choices of Z, FPL becomes equivalent to OMD [Abernethy, Lee, Sinha and Tewari, 2014].
Slide 38: Adaptive regularization
Online ridge regression [Vovk, 2001; Azoury and Warmuth, 2001]:
  Σ_{t=1}^T ( w_t^⊤ x_t - y_t )² ≤ inf_{u} ( Σ_{t=1}^T ( u^⊤ x_t - y_t )² + ‖u‖² ) + d ln( 1 + T/d )
using the time-varying regularizer Φ_t(w) = (1/2) w^⊤ A_t w with A_t = I + Σ_{s=1}^t x_s x_s^⊤.
More examples:
- Online Newton Step [Hazan, Agarwal and Kale, 2007]: logarithmic regret for exp-concave loss functions
- AdaGrad [Duchi, Hazan and Singer, 2010]: competitive with the optimal fixed regularizer
- Scale-invariant algorithms [Ross, Mineiro and Langford, 2013]: regret invariant w.r.t. rescaling of single features
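A one-dimensional illustration of the ridge-style update (a deliberate simplification of the slide's algorithm: here the matrix A reduces to the scalar 1 + Σ x_s², and the prediction uses only past examples):

```python
def online_ridge_1d(data):
    """One-dimensional online ridge regression sketch: predict with
    w_t = (sum_{s<t} y_s x_s) / (1 + sum_{s<t} x_s^2), the regularized
    least-squares solution over the past, and accumulate the square loss
    of the online predictions.  `data` is a list of (x, y) pairs."""
    sum_xx = sum_xy = 0.0
    total = 0.0
    for x, y in data:
        w = sum_xy / (1.0 + sum_xx)       # ridge solution on the prefix
        total += (w * x - y) ** 2          # online square loss
        sum_xx += x * x                    # update the scalar A_t
        sum_xy += x * y
    return total
```

On three copies of the example (x, y) = (1, 2) the predictions are 0, 1, 4/3, so the cumulative loss is 4 + 1 + 4/9, which can be checked by hand against the update above.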
Slide 39: Shifting regret [Herbster and Warmuth, 2001]
Nonstationarity: if the data source is not fitted well by any single model in the class, then comparing to the best fixed model u ∈ S is trivial. Compare instead to the best sequence u_1, u_2, ... ∈ S of models.
Shifting regret for Online Mirror Descent [Zinkevich, 2003]:
  Σ_{t=1}^T ℓ_t(w_t) ≤ inf_{u_1,...,u_T ∈ S} ( Σ_{t=1}^T ℓ_t(u_t) + C Σ_{t=1}^T ‖u_t - u_{t-1}‖ )
(cumulative loss ≤ model fit + shifting model cost), where the constant C scales with diam(S).
Slide 40: Online active learning
[Diagram: unlabeled data → classifier → guessed label shown to the user; the true label is obtained from a human expert only upon request]
Observing the data process is cheap; observing the label process is expensive, since we need to query the human expert.
Question: how much better can we do by adaptively subsampling the label process?
Slide 41: A game with the opponent [C-B, Gentile, Zaniboni, 2006]
The opponent avoids causing mistakes on documents far away from the decision surface. [Figure: vectorized documents around a decision surface]
The probability of querying a document is proportional to the inverse of its distance to the decision surface. The binary classification performance guarantee remains identical (in expectation) to the full-sampling case.
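A minimal sketch of margin-based selective sampling. The query probability b / (b + |margin|) used here is one concrete instantiation of "inversely proportional to the distance from the decision surface" (the parameter b and the perceptron update are illustrative choices, not necessarily the talk's exact algorithm).

```python
import random

def selective_perceptron(data, b=1.0, seed=0):
    """Selective sampling sketch: predict with the current linear model,
    query the true label with probability b / (b + |margin|), and run a
    perceptron update only on queried rounds where the prediction was
    wrong or the margin was zero."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    mistakes = queries = 0
    for x, y in data:
        margin = sum(wi * xi for wi, xi in zip(w, x))
        yhat = 1 if margin >= 0 else -1
        if yhat != y:
            mistakes += 1
        if rng.random() < b / (b + abs(margin)):   # query rate shrinks with |margin|
            queries += 1
            if y * margin <= 0:                    # perceptron step on queried errors
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return mistakes, queries, w
```

On easy, well-separated data the margin grows, so the query rate drops while the mistake count stays at the level of the fully supervised perceptron, which is the point of the slide.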
Slide 42: Experiments on document categorization
[Figure: experimental results]
Slide 43: Stochastic Online Mirror Descent
Parameters: a strongly convex regularizer Φ and a learning rate η > 0
Initialize: θ_1 = 0
For t = 1, 2, ...
1. Use w_t = ∇Φ*(θ_t)  // mirror step with projection on S
2. Suffer loss ℓ_t(w_t)
3. Compute an estimate ĝ_t of the loss gradient ∇ℓ_t(w_t)
4. Update θ_{t+1} = θ_t - η ĝ_t  // gradient step
Typically, Φ(w) = (1/2) ‖w‖² (stochastic OGD).
Slide 44: Attribute-efficient learning [C-B, Shamir, Shalev-Shwartz, 2011]
[Figure: an original example and its attribute-sampled version]
Obtain only a few attributes from each training example. Use stochastic OGD with the square loss:
  ℓ_t(w) = (1/2) ( w^⊤ x_t - y_t )²,  ∇ℓ_t(w) = ( w^⊤ x_t - y_t ) x_t
Unbiased estimate of the square-loss gradient using two attributes:
1. Estimate w^⊤ x: query attribute x_i with probability p(i) = |w_i| / ‖w‖_1
2. Estimate x: query attribute x_j uniformly at random
3. Gradient estimate: ĝ = ( ‖w‖_1 sgn(w_i) x_i - y ) d x_j e_j
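That this two-attribute estimate is unbiased can be verified by computing its expectation exactly, summing over the draws of i and j (which are independent). A small sketch, with illustrative names:

```python
def expected_attribute_gradient(w, x, y):
    """Exact expectation of the two-attribute gradient estimate
    g_hat = (||w||_1 * sgn(w_i) * x_i - y) * d * x_j * e_j,
    where i is drawn with probability |w_i| / ||w||_1 and j uniformly."""
    l1 = sum(abs(wi) for wi in w)
    # E over i of ||w||_1 * sgn(w_i) * x_i  =  w . x
    est_dot = sum((abs(wi) / l1) * l1 * (1 if wi >= 0 else -1) * xi
                  for wi, xi in zip(w, x))
    # E over j of d * x_j * e_j = x; i and j are drawn independently
    return [(est_dot - y) * xj for xj in x]
```

The result coincides with the true gradient (w·x - y) x, confirming the unbiasedness claimed on the slide.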
Slide 45: Summary (Part 5: The joy of convex, without the gradient)
Slide 46: Online convex optimization with bandit feedback
For t = 1, 2, ...
1. Play a point w_t from a convex set S
2. Incur and observe the convex loss ℓ_t(w_t)
3. Update the point: w_t → w_{t+1} ∈ S
Regret:
  R_T = E[ Σ_{t=1}^T ℓ_t(w_t) ] - inf_{u ∈ S} Σ_{t=1}^T ℓ_t(u)
Slide 47: Gradient descent without a gradient [Flaxman, Kalai and McMahan, 2004]
Run stochastic OGD using a perturbed version of w_t: play w_t + δU, where U is a random unit vector and δ > 0.
Gradient estimate:
  ĝ_t = (d/δ) ℓ_t( w_t + δU ) U
Fact (Stokes' theorem): if ℓ_t were differentiable, then E[ ĝ_t ] = E[ ∇ℓ_t( w_t + δB ) ], where B is a random vector in the unit ball. So ĝ_t estimates the gradient of a locally smoothed version of ℓ_t.
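The one-point estimate can be checked in two dimensions by replacing the random unit vector with an average over equally spaced directions on the circle (an illustrative substitution: for a linear loss the average of u u^T over such a grid is I/2, so the averaged estimate recovers the gradient exactly).

```python
import math

def one_point_gradient(loss, w, delta, n=360):
    """Flaxman-Kalai-McMahan style one-point gradient estimate in d = 2,
    averaged over n equally spaced unit perturbation directions:
    g_hat = (d / delta) * loss(w + delta * u) * u."""
    d = len(w)
    gx = gy = 0.0
    for k in range(n):
        a = 2.0 * math.pi * k / n
        u = (math.cos(a), math.sin(a))
        val = loss((w[0] + delta * u[0], w[1] + delta * u[1]))
        gx += (d / delta) * val * u[0]
        gy += (d / delta) * val * u[1]
    return (gx / n, gy / n)
```

For a linear loss the constant part of ℓ averages out (the directions sum to zero) and the linear part yields the true gradient; for general convex losses the same average gives the gradient of the δ-smoothed loss, with δ trading bias against variance.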
Slide 48: Guarantees
If ℓ_t is Lipschitz, then the smoothed version is a good approximation of ℓ_t. The radius δ of the perturbation controls a bias/variance trade-off. Regret of stochastic OGD for convex and Lipschitz loss sequences:
  R_T = O( T^{3/4} )
The linear case: assume the losses are linear functions on S, ℓ_t(w) = ℓ_t^⊤ w. Can we achieve a better rate?
Slide 49: Self-concordant functions [Abernethy, Hazan and Rakhlin, 2008]
Run stochastic OGD regularized with a self-concordant function for S. Variance is controlled through the associated Dikin ellipsoid: the loss estimate ℓ̂_t is obtained via the perturbed point W_t = w_t ± e_i / sqrt(λ_i), where (e_i, λ_i) is a randomly drawn eigenvector-eigenvalue pair of the Dikin ellipsoid.
Regret for linear functions:
  R_T = O( d^{3/2} sqrt( T ln T ) )
Slide 50: Algorithm GeometricHedge [Dani, Hayes and Kakade, 2008]
Build an ε-cover S_0 ⊆ S of size ε^{-d}. Use an experts algorithm (e.g., exponential weights) to draw actions W_t ∈ S_0, and use the unbiased linear estimator for the loss:
  ℓ̂_t = P_t^{-1} W_t W_t^⊤ ℓ_t,  where P_t = E[ W_t W_t^⊤ ]
Mix the exponential weights with an exploration distribution μ over the actions in the cover:
  p_t(w) = (1 - γ) q_t(w) + γ μ(w),  0 ≤ γ ≤ 1
μ controls the variance of the loss estimates by ensuring that all directions are sampled often enough.
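The unbiasedness of this linear estimator can also be checked by exact enumeration: since P = E[W W^T], we have E[P^{-1} W W^T ℓ] = P^{-1} P ℓ = ℓ. A small two-dimensional sketch (illustrative code, with an explicit 2x2 inverse):

```python
def expected_loss_estimate(actions, probs, loss):
    """Exact expectation of the GeometricHedge loss estimate
    l_hat = P^{-1} W W^T loss, with P = E[W W^T], for 2-d actions,
    computed by enumerating the draw of W."""
    # P = sum over actions w of p(w) * w w^T  (a 2x2 matrix)
    P = [[0.0, 0.0], [0.0, 0.0]]
    for w, p in zip(actions, probs):
        for a in range(2):
            for b in range(2):
                P[a][b] += p * w[a] * w[b]
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    Pinv = [[P[1][1] / det, -P[0][1] / det],
            [-P[1][0] / det, P[0][0] / det]]
    # E[l_hat] = sum over w of p(w) * Pinv (w (w^T loss))
    est = [0.0, 0.0]
    for w, p in zip(actions, probs):
        wl = w[0] * loss[0] + w[1] * loss[1]
        v = (w[0] * wl, w[1] * wl)
        est[0] += p * (Pinv[0][0] * v[0] + Pinv[0][1] * v[1])
        est[1] += p * (Pinv[1][0] * v[0] + Pinv[1][1] * v[1])
    return est
```

Whatever the sampling distribution over the cover (as long as P is invertible), the expectation recovers the true loss vector, which is what the regret analysis relies on.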
Slide 51: Analysis [C-B and Lugosi, 2010]
Regret bound:
  R_T = O( d sqrt( (T ln T) / (d λ_min) ) )
where λ_min is the smallest eigenvalue of E_μ[ W W^⊤ ]. The quantity 1/λ_min is proportional to the variance of the loss estimates. When λ_min ≥ 1/d we get the optimal bound Θ( d sqrt(T ln T) ). If μ is uniform over all actions, this happens when the action space is approximately isotropic.
Slide 52: Good exploration bases [Bubeck, C-B and Kakade, 2012]
Choose a basis under which the action set looks isotropic. There are at most O(d²) contact points between S_0 and its Löwner ellipsoid (the minimum-volume ellipsoid enclosing S_0). Put the exploration distribution μ on these contact points; this ensures that E_μ[ W W^⊤ ] is isotropic: λ_min = 1/d.
Exploration on the contact points of the Löwner ellipsoid achieves the optimal regret
  R_T = O( d sqrt( T ln T ) )
However, this construction is not efficient in general. An efficient construction uses volumetric ellipsoids [Hazan, Gerber and Meka, 2014].
Slide 53: Conclusions
More applications: portfolio management, matrix completion, competitive analysis of algorithms, recommendation systems.
Some open problems: exact rates for bandit convex optimization; trade-offs between regret bounds and running times; online tensor and spectral learning; problems with states.
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Part I Sébastien Bubeck Theory Group i.i.d. multi-armed bandit, Robbins [1952] i.i.d. multi-armed bandit, Robbins [1952] Known
More informationAccelerating Online Convex Optimization via Adaptive Prediction
Mehryar Mohri Courant Institute and Google Research New York, NY 10012 mohri@cims.nyu.edu Scott Yang Courant Institute New York, NY 10012 yangs@cims.nyu.edu Abstract We present a powerful general framework
More informationOnline Optimization : Competing with Dynamic Comparators
Ali Jadbabaie Alexander Rakhlin Shahin Shahrampour Karthik Sridharan University of Pennsylvania University of Pennsylvania University of Pennsylvania Cornell University Abstract Recent literature on online
More informationExplore no more: Improved high-probability regret bounds for non-stochastic bandits
Explore no more: Improved high-probability regret bounds for non-stochastic bandits Gergely Neu SequeL team INRIA Lille Nord Europe gergely.neu@gmail.com Abstract This work addresses the problem of regret
More informationBeyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization
JMLR: Workshop and Conference Proceedings vol (2010) 1 16 24th Annual Conference on Learning heory Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization
More informationThe FTRL Algorithm with Strongly Convex Regularizers
CSE599s, Spring 202, Online Learning Lecture 8-04/9/202 The FTRL Algorithm with Strongly Convex Regularizers Lecturer: Brandan McMahan Scribe: Tamara Bonaci Introduction In the last lecture, we talked
More informationUnconstrained Online Linear Learning in Hilbert Spaces: Minimax Algorithms and Normal Approximations
JMLR: Workshop and Conference Proceedings vol 35:1 20, 2014 Unconstrained Online Linear Learning in Hilbert Spaces: Minimax Algorithms and Normal Approximations H. Brendan McMahan MCMAHAN@GOOGLE.COM Google,
More informationEfficient learning by implicit exploration in bandit problems with side observations
Efficient learning by implicit exploration in bandit problems with side observations omáš Kocák Gergely Neu Michal Valko Rémi Munos SequeL team, INRIA Lille Nord Europe, France {tomas.kocak,gergely.neu,michal.valko,remi.munos}@inria.fr
More informationEASINESS IN BANDITS. Gergely Neu. Pompeu Fabra University
EASINESS IN BANDITS Gergely Neu Pompeu Fabra University EASINESS IN BANDITS Gergely Neu Pompeu Fabra University THE BANDIT PROBLEM Play for T rounds attempting to maximize rewards THE BANDIT PROBLEM Play
More informationComputational and Statistical Learning Theory
Computational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 17: Stochastic Optimization Part II: Realizable vs Agnostic Rates Part III: Nearest Neighbor Classification Stochastic
More informationThe convex optimization approach to regret minimization
The convex optimization approach to regret minimization Elad Hazan Technion - Israel Institute of Technology ehazan@ie.technion.ac.il Abstract A well studied and general setting for prediction and decision
More informationAn Online Convex Optimization Approach to Blackwell s Approachability
Journal of Machine Learning Research 17 (2016) 1-23 Submitted 7/15; Revised 6/16; Published 8/16 An Online Convex Optimization Approach to Blackwell s Approachability Nahum Shimkin Faculty of Electrical
More informationA Drifting-Games Analysis for Online Learning and Applications to Boosting
A Drifting-Games Analysis for Online Learning and Applications to Boosting Haipeng Luo Department of Computer Science Princeton University Princeton, NJ 08540 haipengl@cs.princeton.edu Robert E. Schapire
More informationOptimization for Machine Learning
Optimization for Machine Learning Editors: Suvrit Sra suvrit@gmail.com Max Planck Insitute for Biological Cybernetics 72076 Tübingen, Germany Sebastian Nowozin Microsoft Research Cambridge, CB3 0FB, United
More informationStochastic Optimization
Introduction Related Work SGD Epoch-GD LM A DA NANJING UNIVERSITY Lijun Zhang Nanjing University, China May 26, 2017 Introduction Related Work SGD Epoch-GD Outline 1 Introduction 2 Related Work 3 Stochastic
More informationLearnability, Stability, Regularization and Strong Convexity
Learnability, Stability, Regularization and Strong Convexity Nati Srebro Shai Shalev-Shwartz HUJI Ohad Shamir Weizmann Karthik Sridharan Cornell Ambuj Tewari Michigan Toyota Technological Institute Chicago
More informationTrade-Offs in Distributed Learning and Optimization
Trade-Offs in Distributed Learning and Optimization Ohad Shamir Weizmann Institute of Science Includes joint works with Yossi Arjevani, Nathan Srebro and Tong Zhang IHES Workshop March 2016 Distributed
More informationStochastic and Adversarial Online Learning without Hyperparameters
Stochastic and Adversarial Online Learning without Hyperparameters Ashok Cutkosky Department of Computer Science Stanford University ashokc@cs.stanford.edu Kwabena Boahen Department of Bioengineering Stanford
More informationLearning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley
Learning Methods for Online Prediction Problems Peter Bartlett Statistics and EECS UC Berkeley Course Synopsis A finite comparison class: A = {1,..., m}. 1. Prediction with expert advice. 2. With perfect
More informationOnline Learning with Gaussian Payoffs and Side Observations
Online Learning with Gaussian Payoffs and Side Observations Yifan Wu 1 András György 2 Csaba Szepesvári 1 1 Department of Computing Science University of Alberta 2 Department of Electrical and Electronic
More informationExtracting Certainty from Uncertainty: Regret Bounded by Variation in Costs
Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs Elad Hazan IBM Almaden Research Center 650 Harry Rd San Jose, CA 95120 ehazan@cs.princeton.edu Satyen Kale Yahoo! Research 4301
More informationBandit Convex Optimization: T Regret in One Dimension
Bandit Convex Optimization: T Regret in One Dimension arxiv:1502.06398v1 [cs.lg 23 Feb 2015 Sébastien Bubeck Microsoft Research sebubeck@microsoft.com Tomer Koren Technion tomerk@technion.ac.il February
More informationLogarithmic Regret Algorithms for Strongly Convex Repeated Games
Logarithmic Regret Algorithms for Strongly Convex Repeated Games Shai Shalev-Shwartz 1 and Yoram Singer 1,2 1 School of Computer Sci & Eng, The Hebrew University, Jerusalem 91904, Israel 2 Google Inc 1600
More informationMachine Learning in the Data Revolution Era
Machine Learning in the Data Revolution Era Shai Shalev-Shwartz School of Computer Science and Engineering The Hebrew University of Jerusalem Machine Learning Seminar Series, Google & University of Waterloo,
More informationOnline and Stochastic Learning with a Human Cognitive Bias
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence Online and Stochastic Learning with a Human Cognitive Bias Hidekazu Oiwa The University of Tokyo and JSPS Research Fellow 7-3-1
More informationIntroduction to Machine Learning (67577) Lecture 7
Introduction to Machine Learning (67577) Lecture 7 Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem Solving Convex Problems using SGD and RLM Shai Shalev-Shwartz (Hebrew
More informationOnline Learning with Predictable Sequences
JMLR: Workshop and Conference Proceedings vol (2013) 1 27 Online Learning with Predictable Sequences Alexander Rakhlin Karthik Sridharan rakhlin@wharton.upenn.edu skarthik@wharton.upenn.edu Abstract We
More informationPartial monitoring classification, regret bounds, and algorithms
Partial monitoring classification, regret bounds, and algorithms Gábor Bartók Department of Computing Science University of Alberta Dávid Pál Department of Computing Science University of Alberta Csaba
More informationBandit Online Convex Optimization
March 31, 2015 Outline 1 OCO vs Bandit OCO 2 Gradient Estimates 3 Oblivious Adversary 4 Reshaping for Improved Rates 5 Adaptive Adversary 6 Concluding Remarks Review of (Online) Convex Optimization Set-up
More informationNotes on AdaGrad. Joseph Perla 2014
Notes on AdaGrad Joseph Perla 2014 1 Introduction Stochastic Gradient Descent (SGD) is a common online learning algorithm for optimizing convex (and often non-convex) functions in machine learning today.
More informationMinimax Policies for Combinatorial Prediction Games
Minimax Policies for Combinatorial Prediction Games Jean-Yves Audibert Imagine, Univ. Paris Est, and Sierra, CNRS/ENS/INRIA, Paris, France audibert@imagine.enpc.fr Sébastien Bubeck Centre de Recerca Matemàtica
More informationPractical Agnostic Active Learning
Practical Agnostic Active Learning Alina Beygelzimer Yahoo Research based on joint work with Sanjoy Dasgupta, Daniel Hsu, John Langford, Francesco Orabona, Chicheng Zhang, and Tong Zhang * * introductory
More information0.1 Motivating example: weighted majority algorithm
princeton univ. F 16 cos 521: Advanced Algorithm Design Lecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm Lecturer: Sanjeev Arora Scribe: Sanjeev Arora (Today s notes
More informationWorst-Case Bounds for Gaussian Process Models
Worst-Case Bounds for Gaussian Process Models Sham M. Kakade University of Pennsylvania Matthias W. Seeger UC Berkeley Abstract Dean P. Foster University of Pennsylvania We present a competitive analysis
More informationGambling in a rigged casino: The adversarial multi-armed bandit problem
Gambling in a rigged casino: The adversarial multi-armed bandit problem Peter Auer Institute for Theoretical Computer Science University of Technology Graz A-8010 Graz (Austria) pauer@igi.tu-graz.ac.at
More informationOnline Prediction Peter Bartlett
Online Prediction Peter Bartlett Statistics and EECS UC Berkeley and Mathematical Sciences Queensland University of Technology Online Prediction Repeated game: Cumulative loss: ˆL n = Decision method plays
More informationThe Interplay Between Stability and Regret in Online Learning
The Interplay Between Stability and Regret in Online Learning Ankan Saha Department of Computer Science University of Chicago ankans@cs.uchicago.edu Prateek Jain Microsoft Research India prajain@microsoft.com
More informationDistributed online optimization over jointly connected digraphs
Distributed online optimization over jointly connected digraphs David Mateos-Núñez Jorge Cortés University of California, San Diego {dmateosn,cortes}@ucsd.edu Southern California Optimization Day UC San
More informationImportance Weighting Without Importance Weights: An Efficient Algorithm for Combinatorial Semi-Bandits
Journal of Machine Learning Research 17 (2016 1-21 Submitted 3/15; Revised 7/16; Published 8/16 Importance Weighting Without Importance Weights: An Efficient Algorithm for Combinatorial Semi-Bandits Gergely
More informationAdvanced Topics in Machine Learning and Algorithmic Game Theory Fall semester, 2011/12
Advanced Topics in Machine Learning and Algorithmic Game Theory Fall semester, 2011/12 Lecture 4: Multiarmed Bandit in the Adversarial Model Lecturer: Yishay Mansour Scribe: Shai Vardi 4.1 Lecture Overview
More informationInternal Regret in On-line Portfolio Selection
Internal Regret in On-line Portfolio Selection Gilles Stoltz (gilles.stoltz@ens.fr) Département de Mathématiques et Applications, Ecole Normale Supérieure, 75005 Paris, France Gábor Lugosi (lugosi@upf.es)
More informationPerceptron Mistake Bounds
Perceptron Mistake Bounds Mehryar Mohri, and Afshin Rostamizadeh Google Research Courant Institute of Mathematical Sciences Abstract. We present a brief survey of existing mistake bounds and introduce
More informationNew bounds on the price of bandit feedback for mistake-bounded online multiclass learning
Journal of Machine Learning Research 1 8, 2017 Algorithmic Learning Theory 2017 New bounds on the price of bandit feedback for mistake-bounded online multiclass learning Philip M. Long Google, 1600 Amphitheatre
More informationLecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm. Lecturer: Sanjeev Arora
princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm Lecturer: Sanjeev Arora Scribe: (Today s notes below are
More informationBetter Algorithms for Benign Bandits
Better Algorithms for Benign Bandits Elad Hazan IBM Almaden 650 Harry Rd, San Jose, CA 95120 ehazan@cs.princeton.edu Satyen Kale Microsoft Research One Microsoft Way, Redmond, WA 98052 satyen.kale@microsoft.com
More informationMaking Gradient Descent Optimal for Strongly Convex Stochastic Optimization
Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization Alexander Rakhlin University of Pennsylvania Ohad Shamir Microsoft Research New England Karthik Sridharan University of Pennsylvania
More informationExtracting Certainty from Uncertainty: Regret Bounded by Variation in Costs
Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs Elad Hazan IBM Almaden 650 Harry Rd, San Jose, CA 95120 hazan@us.ibm.com Satyen Kale Microsoft Research 1 Microsoft Way, Redmond,
More informationContents. 1 Introduction. 1.1 History of Optimization ALG-ML SEMINAR LISSA: LINEAR TIME SECOND-ORDER STOCHASTIC ALGORITHM FEBRUARY 23, 2016
ALG-ML SEMINAR LISSA: LINEAR TIME SECOND-ORDER STOCHASTIC ALGORITHM FEBRUARY 23, 2016 LECTURERS: NAMAN AGARWAL AND BRIAN BULLINS SCRIBE: KIRAN VODRAHALLI Contents 1 Introduction 1 1.1 History of Optimization.....................................
More informationConvex Repeated Games and Fenchel Duality
Convex Repeated Games and Fenchel Duality Shai Shalev-Shwartz 1 and Yoram Singer 1,2 1 School of Computer Sci. & Eng., he Hebrew University, Jerusalem 91904, Israel 2 Google Inc. 1600 Amphitheater Parkway,
More informationSimultaneous Model Selection and Optimization through Parameter-free Stochastic Learning
Simultaneous Model Selection and Optimization through Parameter-free Stochastic Learning Francesco Orabona Yahoo! Labs New York, USA francesco@orabona.com Abstract Stochastic gradient descent algorithms
More informationLecture 23: Online convex optimization Online convex optimization: generalization of several algorithms
EECS 598-005: heoretical Foundations of Machine Learning Fall 2015 Lecture 23: Online convex optimization Lecturer: Jacob Abernethy Scribes: Vikas Dhiman Disclaimer: hese notes have not been subjected
More informationOnline Passive-Aggressive Algorithms
Online Passive-Aggressive Algorithms Koby Crammer Ofer Dekel Shai Shalev-Shwartz Yoram Singer School of Computer Science & Engineering The Hebrew University, Jerusalem 91904, Israel {kobics,oferd,shais,singer}@cs.huji.ac.il
More informationAgnostic Online learnability
Technical Report TTIC-TR-2008-2 October 2008 Agnostic Online learnability Shai Shalev-Shwartz Toyota Technological Institute Chicago shai@tti-c.org ABSTRACT We study a fundamental question. What classes
More informationarxiv: v4 [cs.lg] 27 Jan 2016
The Computational Power of Optimization in Online Learning Elad Hazan Princeton University ehazan@cs.princeton.edu Tomer Koren Technion tomerk@technion.ac.il arxiv:1504.02089v4 [cs.lg] 27 Jan 2016 Abstract
More information