The Online Approach to Machine Learning

The Online Approach to Machine Learning
Nicolò Cesa-Bianchi, Università degli Studi di Milano

Summary
1. My beautiful regret
2. A supposedly fun game I'll play again
3. A graphic novel
4. The joy of convex
5. The joy of convex (without the gradient)

Part 1: My beautiful regret

Machine learning
Classification/regression tasks: predictive models h mapping data instances X to labels Y (e.g., a binary classifier)
Training data $S_T = ((X_1, Y_1), \dots, (X_T, Y_T))$ (e.g., email messages annotated as spam vs. nonspam)
A learning algorithm A (e.g., a Support Vector Machine) maps the training data $S_T$ to a model $h = A(S_T)$
Evaluate the risk of the trained model h with respect to a given loss function

Two notions of risk
View data as a statistical sample: statistical risk
$\mathbb{E}\big[\mathrm{loss}\big(A(S_T), (X, Y)\big)\big]$   (trained model evaluated on a test example)
Training set $S_T = ((X_1, Y_1), \dots, (X_T, Y_T))$ and test example $(X, Y)$ drawn i.i.d. from the same unknown and fixed distribution
View data as an arbitrary sequence: sequential risk
$\sum_{t=1}^{T} \mathrm{loss}\big(A(S_{t-1}), (X_t, Y_t)\big)$   (each trained model evaluated on the next example)
Sequence of models trained on growing prefixes $S_t = ((X_1, Y_1), \dots, (X_t, Y_t))$ of the data sequence

Regrets, I had a few
A learning algorithm A maps datasets to models in a given class H
Variance error in statistical learning:
$\mathbb{E}\big[\mathrm{loss}\big(A(S_T), (X, Y)\big)\big] - \inf_{h \in H} \mathbb{E}\big[\mathrm{loss}\big(h, (X, Y)\big)\big]$
compare to the expected loss of the best model in the class
Regret in online learning:
$\sum_{t=1}^{T} \mathrm{loss}\big(A(S_{t-1}), (X_t, Y_t)\big) - \inf_{h \in H} \sum_{t=1}^{T} \mathrm{loss}\big(h, (X_t, Y_t)\big)$
compare to the cumulative loss of the best model in the class

Incremental model update
A natural blueprint for online learning algorithms (a code sketch follows below)
For t = 1, 2, ...
1. Apply the current model $h_{t-1}$ to the next data element $(X_t, Y_t)$
2. Update the current model: $h_{t-1} \to h_t \in H$
Goal: control the regret
$\sum_{t=1}^{T} \mathrm{loss}\big(h_{t-1}, (X_t, Y_t)\big) - \inf_{h \in H} \sum_{t=1}^{T} \mathrm{loss}\big(h, (X_t, Y_t)\big)$
View this as a repeated game between a player generating predictors $h_t \in H$ and an opponent generating data $(X_t, Y_t)$
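
A minimal Python sketch of this predict-then-update blueprint. The names model, update, loss, and data_stream are hypothetical placeholders standing in for whatever model class and update rule one plugs in; regret against a comparator would be computed separately.

```python
def online_learning_loop(model, update, loss, data_stream):
    """Generic online protocol: predict with the current model, then update it."""
    cumulative_loss = 0.0
    for x_t, y_t in data_stream:
        cumulative_loss += loss(model, x_t, y_t)  # apply current model h_{t-1} to (X_t, Y_t)
        model = update(model, x_t, y_t)           # incremental update h_{t-1} -> h_t
    return model, cumulative_loss
```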

Part 2: A supposedly fun game I'll play again

Theory of repeated games
James Hannan (1922-2010) and David Blackwell (1919-2010)
Learning to play a game (1956): play a game repeatedly against a possibly suboptimal opponent

Zero-sum 2-person games played more than once
An $N \times M$ loss matrix $\ell$, known to both players: the row player (the player) has N actions, the column player (the opponent) has M actions
For each game round t = 1, 2, ...
The player chooses action $i_t$ and the opponent chooses action $y_t$
The player suffers loss $\ell(i_t, y_t)$ (= the gain of the opponent)
The player can learn from the opponent's history of past choices $y_1, \dots, y_{t-1}$

Prediction with expert advice
The loss table now has one column per round t = 1, 2, ...: action i incurs loss $\ell_t(i)$ in round t
Volodya Vovk, Manfred Warmuth
The opponent's moves $y_1, y_2, \dots$ define a sequential prediction problem with a time-varying loss function $\ell(i_t, y_t) = \ell_t(i_t)$

Playing the experts game
N actions
For t = 1, 2, ...
1. A loss $\ell_t(i) \in [0, 1]$ is assigned to every action $i = 1, \dots, N$ (hidden from the player)
2. The player picks an action $I_t$ (possibly using randomization) and incurs loss $\ell_t(I_t)$
3. The player gets full feedback: $\ell_t(1), \dots, \ell_t(N)$ are all revealed

Oblivious opponents
The losses $\ell_t(1), \dots, \ell_t(N)$ for all t = 1, 2, ... are fixed beforehand and are unknown to the (randomized) player
Oblivious regret minimization:
$R_T \stackrel{\mathrm{def}}{=} \mathbb{E}\Big[\sum_{t=1}^{T} \ell_t(I_t)\Big] - \min_{i=1,\dots,N} \sum_{t=1}^{T} \ell_t(i)$, and we want $R_T = o(T)$

Bounds on regret [Experts paper, 1997]
Lower bound using random losses: $\ell_t(i) \equiv L_t(i) \in \{0, 1\}$, independent random coin flips
For any player strategy, $\mathbb{E}\Big[\sum_{t=1}^{T} L_t(I_t)\Big] = \frac{T}{2}$
Then the expected regret is
$\mathbb{E}\Big[\max_{i=1,\dots,N} \sum_{t=1}^{T} \Big(\frac{1}{2} - L_t(i)\Big)\Big] = \big(1 - o(1)\big)\sqrt{\frac{T \ln N}{2}}$

Exponentially weighted forecaster
At time t pick action $I_t = i$ with probability proportional to $\exp\Big(-\eta \sum_{s=1}^{t-1} \ell_s(i)\Big)$
The sum in the exponent is the total loss of action i up to now
Regret bound [Experts paper, 1997]: if $\eta = \sqrt{(8 \ln N)/T}$ then $R_T \le \sqrt{(T \ln N)/2}$
Matching lower bound, including constants
The dynamic choice $\eta_t = \sqrt{(8 \ln N)/t}$ only loses small constants
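
A short Python sketch of the exponentially weighted forecaster on a stream of loss vectors. The random loss sequence and the seed are made-up illustration choices; the learning rate follows the tuning quoted above.

```python
import numpy as np

def exponential_weights(losses, eta, seed=0):
    """losses: array of shape (T, N) with entries in [0, 1]; returns the realized regret."""
    rng = np.random.default_rng(seed)
    T, N = losses.shape
    cum_loss = np.zeros(N)          # total loss of each action so far
    player_loss = 0.0
    for t in range(T):
        w = np.exp(-eta * cum_loss)
        p = w / w.sum()             # P(I_t = i) proportional to exp(-eta * cum_loss[i])
        i_t = rng.choice(N, p=p)
        player_loss += losses[t, i_t]
        cum_loss += losses[t]       # full information: all losses revealed
    return player_loss - cum_loss.min()

T, N = 10_000, 8
losses = np.random.default_rng(1).random((T, N))
print(exponential_weights(losses, eta=np.sqrt(8 * np.log(N) / T)))
```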

The bandit problem: playing an unknown game
N actions
For t = 1, 2, ...
1. A loss $\ell_t(i) \in [0, 1]$ is assigned to every action $i = 1, \dots, N$ (hidden from the player)
2. The player picks an action $I_t$ (possibly using randomization) and incurs loss $\ell_t(I_t)$
3. The player gets feedback information: only $\ell_t(I_t)$ is revealed
Many applications: ad placement, dynamic content adaptation, routing, online auctions

Part 3: A graphic novel

Relationships between actions [Mannor and Shamir, 2011]
[Figure: an undirected and a directed feedback graph over the actions]

A graph of relationships over actions
[Figure: playing an action reveals its own loss and the losses of its neighbors in the graph; all other losses stay hidden]

Recovering expert and bandit settings
Experts: the clique (playing any action reveals the losses of all actions)
Bandits: the empty graph (playing an action reveals only its own loss)

Exponentially weighted forecaster, reprise
Player's strategy [Alon, C-B, Gentile, Mannor, Mansour and Shamir, 2013]:
$P_t(I_t = i) \propto \exp\Big(-\eta \sum_{s=1}^{t-1} \hat\ell_s(i)\Big)$, for $i = 1, \dots, N$
Importance sampling estimator:
$\hat\ell_t(i) = \dfrac{\ell_t(i)}{P_t\big(\ell_t(i) \text{ observed}\big)}$ if $\ell_t(i)$ is observed, and $\hat\ell_t(i) = 0$ otherwise
$\mathbb{E}_t\big[\hat\ell_t(i)\big] = \ell_t(i)$ (unbiased) and $\mathbb{E}_t\big[\hat\ell_t(i)^2\big] \le \dfrac{1}{P_t\big(\ell_t(i) \text{ observed}\big)}$ (variance control)
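
A Python sketch of this forecaster with graph feedback, assuming (as a simplifying choice) an undirected graph given as a 0/1 adjacency matrix with ones on the diagonal, so that playing an action always reveals its own loss. The importance-sampling estimator divides each observed loss by the probability of observing it, as in the display above.

```python
import numpy as np

def exp_weights_graph_feedback(losses, adjacency, eta, seed=0):
    """losses: (T, N) array; adjacency: symmetric (N, N) 0/1 matrix with self-loops."""
    rng = np.random.default_rng(seed)
    T, N = losses.shape
    est_cum_loss = np.zeros(N)
    for t in range(T):
        w = np.exp(-eta * est_cum_loss)
        p = w / w.sum()
        i_t = rng.choice(N, p=p)
        observed = adjacency[i_t] == 1              # losses revealed: the played action and its neighbors
        p_obs = adjacency @ p                       # P(loss of i observed) = P(play some neighbor of i)
        est = np.zeros(N)
        est[observed] = losses[t, observed] / p_obs[observed]   # importance-sampling estimates
        est_cum_loss += est
    return est_cum_loss
```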

Independence number $\alpha(G)$
The size of the largest independent set of the graph

Regret bounds
Analysis (undirected graphs):
$R_T \le \dfrac{\ln N}{\eta} + \dfrac{\eta}{2} \sum_{t=1}^{T} \underbrace{\sum_{i=1}^{N} P\big(I_t = i \mid \ell_t(i) \text{ observed}\big)}_{\le\, \alpha(G)} \approx \sqrt{\alpha(G)\, T \ln N}$ by tuning $\eta$ (a worked tuning follows below)
If the graph is directed, the bound worsens only by logarithmic factors
Special cases:
Experts: $\alpha(G) = 1$, so $R_T$ is of order $\sqrt{T \ln N}$
Bandits: $\alpha(G) = N$, so $R_T$ is of order $\sqrt{T N \ln N}$
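
The tuning step, written out as a small LaTeX sketch; the slide absorbs constants, so the final factor $\sqrt{2}$ here reflects the constants in the displayed two-term bound and should be read as an assumption about that convention.

```latex
\[
R_T \;\le\; \frac{\ln N}{\eta} + \frac{\eta}{2}\,\alpha(G)\,T ,
\qquad
\eta^\star = \sqrt{\frac{2\ln N}{\alpha(G)\,T}}
\;\;\Longrightarrow\;\;
R_T \;\le\; \sqrt{2\,\alpha(G)\,T\,\ln N}.
\]
```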

Reactive opponents [Dekel, Koren and Peres, 2014]
The loss of action i at time t depends on the player's past m actions: $\ell_t(i) \equiv L_t(I_{t-m}, \dots, I_{t-1}, i)$
Adaptive regret:
$R_T^{\mathrm{ada}} = \mathbb{E}\Big[\sum_{t=1}^{T} L_t(I_{t-m}, \dots, I_{t-1}, I_t)\Big] - \min_{i=1,\dots,K} \sum_{t=1}^{T} L_t(\underbrace{i, \dots, i}_{m \text{ times}}, i)$
Minimax adaptive regret (for any constant m > 1): $R_T^{\mathrm{ada}} = \Theta\big(T^{2/3}\big)$

Partial monitoring: not observing any loss
Dynamic pricing: perform as well as the best fixed price
1. Post a T-shirt price
2. Observe whether the next customer buys or not
3. Adjust the price
Feedback does not reveal the player's loss

Loss matrix (rows: player's actions, columns: opponent's actions):
      1  2  3  4  5
  1   0  1  2  3  4
  2   c  0  1  2  3
  3   c  c  0  1  2
  4   c  c  c  0  1
  5   c  c  c  c  0

Feedback matrix:
      1  2  3  4  5
  1   1  1  1  1  1
  2   0  1  1  1  1
  3   0  0  1  1  1
  4   0  0  0  1  1
  5   0  0  0  0  1

A characterization of minimax regret
Special case: multiarmed bandits, where the loss and feedback matrices coincide
A general gap theorem [Bartok, Foster, Pál, Rakhlin and Szepesvári, 2013]: a constructive characterization of the minimax regret for any pair of loss/feedback matrices
Only three possible rates for nontrivial games:
1. Easy games (e.g., bandits): $\Theta\big(\sqrt{T}\big)$
2. Hard games (e.g., the revealing action): $\Theta\big(T^{2/3}\big)$
3. Impossible games: $\Theta(T)$

Part 4: The joy of convex

A game equivalent to prediction with expert advice
Online linear optimization in the simplex:
1. Play a point $p_t$ from the N-dimensional simplex $\Delta_N$
2. Incur the linear loss $\mathbb{E}\big[\ell_t(I_t)\big] = p_t^\top \ell_t$
3. Observe the loss gradient $\ell_t$
Regret: compete against the best point in the simplex
$\sum_{t=1}^{T} p_t^\top \ell_t - \min_{q \in \Delta_N} \sum_{t=1}^{T} q^\top \ell_t = \sum_{t=1}^{T} p_t^\top \ell_t - \min_{i=1,\dots,N} \sum_{t=1}^{T} \ell_t(i)$

From game theory to machine learning
[Diagram: unlabeled data → classification system → guessed label; the opponent supplies the true label]
The opponent's moves $y_t$ are viewed as values or labels assigned to observations $x_t \in \mathbb{R}^d$ (e.g., categories of documents)
A repeated game between the player, who chooses an element $w_t$ of a linear space, and the opponent, who chooses a label $y_t$ for $x_t$
Regret with respect to the best element in the linear space

Online convex optimization [Zinkevich, 2003]
1. Play a point $w_t$ from a convex subset S of a linear space
2. Incur the convex loss $\ell_t(w_t)$
3. Observe the loss gradient $\nabla \ell_t(w_t)$
4. Update the point: $w_t \to w_{t+1} \in S$
Examples
Regression with the square loss: $\ell_t(w) = \big(w^\top x_t - y_t\big)^2$, with $y_t \in \mathbb{R}$
Classification with the hinge loss: $\ell_t(w) = \big[1 - y_t\, w^\top x_t\big]_+$, with $y_t \in \{-1, +1\}$
Regret: $\sum_{t=1}^{T} \ell_t(w_t) - \inf_{u \in S} \sum_{t=1}^{T} \ell_t(u)$
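
A minimal projected online gradient descent sketch in Python for this protocol, here specialized to the square loss on a Euclidean ball. The data stream, ball radius, and step size are illustrative assumptions, not part of the slide.

```python
import numpy as np

def online_gradient_descent(stream, dim, eta, radius=1.0):
    """stream yields (x_t, y_t); play w_t, suffer (w_t.x_t - y_t)^2, then take a projected step."""
    w = np.zeros(dim)
    total = 0.0
    for x_t, y_t in stream:
        total += (w @ x_t - y_t) ** 2            # incur convex loss ell_t(w_t)
        grad = 2.0 * (w @ x_t - y_t) * x_t       # observe the gradient of ell_t at w_t
        w = w - eta * grad                       # gradient step
        norm = np.linalg.norm(w)
        if norm > radius:                        # project back onto the ball S
            w *= radius / norm
    return w, total
```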

Finding a good online algorithm
Follow the leader: $w_{t+1} = \operatorname{arg\,inf}_{w \in S} \sum_{s=1}^{t} \ell_s(w)$
Regret can be linear due to lack of stability
Example (a numerical check follows below): $S = [-1, +1]$, $\ell_1(w) = \frac{w}{2}$, and for $t \ge 2$
$\ell_t(w) = -w$ if t is even, $\ell_t(w) = +w$ if t is odd
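
A quick numerical check of this instability, as a sketch under the reconstruction of the first loss as $\ell_1(w) = w/2$: follow-the-leader oscillates between the endpoints of $S = [-1, +1]$ and its regret grows linearly with T.

```python
def coeff(t):
    # ell_t(w) = coeff(t) * w on S = [-1, +1]
    if t == 1:
        return 0.5
    return -1.0 if t % 2 == 0 else 1.0

T = 1000
cum, ftl_loss, w = 0.0, 0.0, 0.0      # the first leader is arbitrary; take w = 0
for t in range(1, T + 1):
    ftl_loss += coeff(t) * w          # loss suffered by the current leader
    cum += coeff(t)
    w = -1.0 if cum > 0 else 1.0      # new leader: minimize cum * w over [-1, +1]
best_fixed = -abs(cum)                # loss of the best fixed point in hindsight
print(ftl_loss - best_fixed)          # approximately T: linear regret
```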

Regularized online learning
Strong convexity: $\Phi : S \to \mathbb{R}$ is $\beta$-strongly convex w.r.t. a norm $\|\cdot\|$ if for all $u, v \in S$
$\Phi(v) \ge \Phi(u) + \nabla\Phi(u)^\top (v - u) + \frac{\beta}{2}\, \|u - v\|^2$
Example: $\Phi(v) = \frac{1}{2}\, \|v\|^2$
Follow the regularized leader [Shalev-Shwartz, 2007; Abernethy, Hazan and Rakhlin, 2008]:
$w_{t+1} = \operatorname{arg\,min}_{w \in S} \Big[\eta \sum_{s=1}^{t} \ell_s(w) + \Phi(w)\Big]$
where $\Phi$ is a strongly convex regularizer defined on S

Deriving an incremental update
Linearization of convex losses: $\ell_t(w_t) - \ell_t(u) \le \nabla\ell_t(w_t)^\top w_t - \nabla\ell_t(w_t)^\top u$, writing $\ell_t \equiv \nabla\ell_t(w_t)$
Follow the regularized leader with linearized losses:
$w_{t+1} = \operatorname{arg\,min}_{w \in S} \Big(\eta \sum_{s=1}^{t} \ell_s^\top w + \Phi(w)\Big)$, with $\theta_t = -\sum_{s=1}^{t} \ell_s$
$\quad = \operatorname{arg\,max}_{w \in S} \big(\eta\, \theta_t^\top w - \Phi(w)\big) = \nabla\Phi^*\big(\eta\, \theta_t\big)$
where $\Phi^*$ is the convex dual of $\Phi$

The Mirror Descent algorithm [Nemirovsky and Yudin, 1983]
Recall: $w_{t+1} = \nabla\Phi^*\big(\eta\, \theta_t\big) = \nabla\Phi^*\Big(-\eta \sum_{s=1}^{t} \nabla\ell_s(w_s)\Big)$
Online Mirror Descent (a code sketch follows below)
Parameters: a strongly convex regularizer $\Phi$ and a learning rate $\eta > 0$
Initialize: $\theta_1 = 0$   // primal parameter
For t = 1, 2, ...
1. Use $w_t = \nabla\Phi^*(\theta_t)$   // dual parameter (via the mirror step)
2. Suffer loss $\ell_t(w_t)$
3. Observe the loss gradient $\nabla\ell_t(w_t)$
4. Update $\theta_{t+1} = \theta_t - \eta\, \nabla\ell_t(w_t)$   // gradient step
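
A Python sketch of Online Mirror Descent, instantiated (as a concrete assumption) with the entropic regularizer on the simplex, so the mirror step $\nabla\Phi^*$ becomes a softmax; the loss-gradient oracle and the example data are placeholders.

```python
import numpy as np

def online_mirror_descent(grad_oracle, dim, eta, T):
    """OMD with Phi(w) = sum_i w_i log w_i on the simplex, so grad Phi*(theta) = softmax(theta)."""
    theta = np.zeros(dim)                    # primal parameter
    plays = []
    for t in range(T):
        z = np.exp(theta - theta.max())
        w = z / z.sum()                      # mirror step: w_t = grad Phi*(theta_t)
        plays.append(w)
        g = grad_oracle(t, w)                # observe the gradient of ell_t at w_t
        theta = theta - eta * g              # gradient step
    return plays

# Example: linear losses ell_t(w) = l_t . w, whose gradient is just l_t.
rng = np.random.default_rng(0)
L = rng.random((500, 5))
plays = online_mirror_descent(lambda t, w: L[t], dim=5, eta=0.05, T=500)
```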

Some examples
Exponentiated gradient [Kivinen and Warmuth, 1997]: $S = \Delta_N$ and $\Phi(w) = \sum_{i=1}^{d} w_i \ln w_i$
Online Gradient Descent [Zinkevich, 2003]: $S = \mathbb{R}^d$ and $\Phi(w) = \frac{1}{2}\, \|w\|^2$
p-norm Gradient Descent [Gentile, 2003]: $S = \mathbb{R}^d$ and $\Phi(w) = \frac{1}{2(p-1)}\, \|w\|_p^2$
Matrix gradient descent [Cavallanti, C-B and Gentile, 2010; Kakade, Shalev-Shwartz and Tewari, 2012]

General regret bound
The analysis relies on the smoothness of $\Phi^*$ in order to bound the increments $\Phi^*(\theta_{t+1}) - \Phi^*(\theta_t)$ via $\|\nabla\ell_t(w_t)\|^2$
Oracle bound [Kakade, Shalev-Shwartz and Tewari, 2012]:
$\underbrace{\sum_{t=1}^{T} \ell_t(w_t)}_{\text{cumulative loss}} \le \inf_{u \in S} \Big(\underbrace{\sum_{t=1}^{T} \ell_t(u)}_{\text{model fit}} + \underbrace{\frac{\Phi(u)}{\eta}}_{\text{model cost}}\Big) + \frac{\eta}{2\beta} \sum_{t=1}^{T} \|\nabla\ell_t(w_t)\|^2$
$\ell_1, \ell_2, \dots$ are arbitrary convex losses
If the gradients are bounded, then $R_T = O\big(\sqrt{T}\big)$; this is optimal for general convex losses $\ell_t$
If all the $\ell_t$ are strongly convex, then $R_T = O(\ln T)$

Regularization via stochastic smoothing
Follow the perturbed leader [Kalai and Vempala, 2005]:
$w_{t+1} = \mathbb{E}\Big[\operatorname{arg\,min}_{w \in S} \Big(\eta \sum_{s=1}^{t} \ell_s^\top w + Z^\top w\Big)\Big]$
The distribution of Z must be stable (small variance and small average sensitivity)
For some choices of Z, FPL becomes equivalent to OMD [Abernethy, Lee, Sinha and Tewari, 2014]
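
A sketch of follow-the-perturbed-leader for linear losses over a finite action set; the exponentially distributed perturbation and its scale are one common choice, made here as an assumption rather than the slide's specific construction.

```python
import numpy as np

def follow_the_perturbed_leader(losses, eta, seed=0):
    """losses: (T, N) array of per-action losses; play the perturbed leader each round."""
    rng = np.random.default_rng(seed)
    T, N = losses.shape
    cum = np.zeros(N)
    total = 0.0
    for t in range(T):
        z = rng.exponential(scale=1.0 / eta, size=N)   # fresh perturbation Z each round
        i_t = int(np.argmin(cum - z))                  # perturbed leader
        total += losses[t, i_t]
        cum += losses[t]
    return total - cum.min()                           # realized regret
```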

Adaptive regularization
Online Ridge Regression [Vovk, 2001; Azoury and Warmuth, 2001]:
$\sum_{t=1}^{T} \big(w_t^\top x_t - y_t\big)^2 \le \inf_{u \in \mathbb{R}^d} \Big(\sum_{t=1}^{T} \big(u^\top x_t - y_t\big)^2 + \|u\|^2\Big) + d \ln\Big(1 + \frac{T}{d}\Big)$
obtained with the time-varying regularizer $\Phi_t(w) = \frac{1}{2}\, w^\top A_t\, w$, where $A_t = I + \sum_{s=1}^{t} x_s x_s^\top$
More examples
Online Newton Step [Hazan, Agarwal and Kale, 2007]: logarithmic regret for exp-concave loss functions
AdaGrad [Duchi, Hazan and Singer, 2010]: competitive with the optimal fixed regularizer
Scale-invariant algorithms [Ross, Mineiro and Langford, 2013]: regret invariant w.r.t. rescaling of single features
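
A sketch of the diagonal AdaGrad update mentioned above: per-coordinate step sizes built from accumulated squared gradients. The gradient oracle and the numerical constant eps are placeholders added here for illustration.

```python
import numpy as np

def adagrad(grad_oracle, dim, eta, T, eps=1e-8):
    """Diagonal AdaGrad: step size eta / sqrt(sum of squared past gradients), coordinate-wise."""
    w = np.zeros(dim)
    G = np.zeros(dim)                          # running sum of squared gradients
    for t in range(T):
        g = grad_oracle(t, w)                  # gradient of ell_t at w_t
        G += g * g
        w = w - eta * g / (np.sqrt(G) + eps)   # adaptive, per-coordinate step
    return w
```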

Shifting regret [Herbster and Warmuth, 2001]
Nonstationarity: if the data source is not fitted well by any single model in the class, then comparing to the best fixed model $u \in S$ is trivial
Compare instead to the best sequence $u_1, u_2, \dots \in S$ of models
Shifting regret for Online Mirror Descent [Zinkevich, 2003]:
$\underbrace{\sum_{t=1}^{T} \ell_t(w_t)}_{\text{cumulative loss}} \le \inf_{u_1, \dots, u_T \in S} \Big(\underbrace{\sum_{t=1}^{T} \ell_t(u_t)}_{\text{model fit}} + \underbrace{\sum_{t=1}^{T} \|u_t - u_{t-1}\|}_{\text{shifting model cost}}\Big) + \mathrm{diam}(S) + \cdots$

Online active learning
[Diagram: unlabeled data → classifier → guessed label → user; the true label is available upon request from a human expert]
Observing the data process is cheap
Observing the label process is expensive: we need to query the human expert
Question: how much better can we do by adaptively subsampling the label process?

A game with the opponent [C-B, Gentile, Zaniboni, 2006]
The opponent avoids causing mistakes on documents that are far away from the decision surface
[Figure: vectorized documents and the decision surface]
The probability of querying a document is proportional to the inverse of its distance to the decision surface (a sketch follows below)
The binary classification performance guarantee remains identical (in expectation) to the full-sampling case
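
A sketch of margin-based selective sampling in this spirit: a perceptron-style linear learner that asks for the label with probability b / (b + |margin|), so documents near the decision surface are queried more often. The constant b and the plain perceptron update are illustrative choices, not the exact algorithm of the paper.

```python
import numpy as np

def selective_sampling_perceptron(stream, dim, b=1.0, seed=0):
    """Query the label with probability b / (b + |margin|); update only on queried mistakes."""
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    queries = 0
    for x_t, y_t in stream:                          # y_t in {-1, +1}, used only when queried
        margin = w @ x_t
        y_hat = 1.0 if margin >= 0 else -1.0
        if rng.random() < b / (b + abs(margin)):     # more likely near the decision surface
            queries += 1
            if y_hat != y_t:
                w = w + y_t * x_t                    # perceptron update on queried mistakes
    return w, queries
```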

Experiments on document categorization
[Figure: experimental results]

Stochastic Online Mirror Descent
Parameters: a strongly convex regularizer $\Phi$ and a learning rate $\eta > 0$
Initialize: $\theta_1 = 0$   // primal parameter
For t = 1, 2, ...
1. Use $w_t = \nabla\Phi^*(\theta_t)$   // mirror step with projection on S
2. Suffer loss $\ell_t(w_t)$
3. Compute an estimate $\hat g_t$ of the loss gradient $\nabla\ell_t(w_t)$
4. Update $\theta_{t+1} = \theta_t - \eta\, \hat g_t$   // gradient step
Typically, $\Phi(w) = \frac{1}{2}\, \|w\|^2$ (stochastic OGD)

Attribute efficient learning [C-B, Shamir, Shalev-Shwartz, 2011]
[Figure: original vs. subsampled example]
Obtain only a few attributes from each training example
Use stochastic OGD with the square loss: $\ell_t(w) = \frac{1}{2}\big(w^\top x_t - y_t\big)^2$, so $\nabla\ell_t(w) = \big(w^\top x_t - y_t\big)\, x_t$
Unbiased estimate of the square-loss gradient using two attributes (a code sketch follows below):
1. Estimate of $w^\top x$: query attribute i with probability $p(i) = \frac{|w_i|}{\|w\|_1}$
2. Estimate of $x$: query attribute j uniformly at random
3. Gradient estimate: $\hat g = \big(\|w\|_1\, \mathrm{sgn}(w_i)\, x_i - y\big)\, d\, x_j\, e_j$
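
A Python sketch of this two-attribute gradient estimate; the handling of an all-zero weight vector is an implementation choice added here.

```python
import numpy as np

def two_attribute_gradient(w, x, y, rng):
    """Unbiased estimate of (w.x - y) x using only two queried attributes of x."""
    d = len(w)
    # 1) estimate w.x: query attribute i with probability |w_i| / ||w||_1
    norm1 = np.abs(w).sum()
    if norm1 > 0:
        i = rng.choice(d, p=np.abs(w) / norm1)
        inner_est = norm1 * np.sign(w[i]) * x[i]
    else:
        inner_est = 0.0                      # w = 0: the inner product is exactly 0
    # 2) estimate x: query attribute j uniformly at random
    j = rng.integers(d)
    g_hat = np.zeros(d)
    g_hat[j] = (inner_est - y) * d * x[j]    # (||w||_1 sgn(w_i) x_i - y) * d * x_j * e_j
    return g_hat
```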

Part 5: The joy of convex (without the gradient)

Online convex optimization with bandit feedback
For t = 1, 2, ...
1. Play a point $w_t$ from a convex subset S of a linear space
2. Incur and observe the convex loss $\ell_t(w_t)$; no gradient is revealed
3. Update the point: $w_t \to w_{t+1} \in S$
Regret: $R_T = \mathbb{E}\Big[\sum_{t=1}^{T} \ell_t(w_t)\Big] - \inf_{u \in S} \sum_{t=1}^{T} \ell_t(u)$

Gradient descent without a gradient [Flaxman, Kalai and McMahan, 2004]
Run stochastic OGD using a perturbed version of $w_t$: play $w_t + \delta\, U$, where U is a random unit vector and $\delta > 0$
Gradient estimate: $\hat g_t = \frac{d}{\delta}\, \ell_t\big(w_t + \delta\, U\big)\, U$
Fact (Stokes' theorem): if $\ell_t$ were differentiable, then $\mathbb{E}\big[\hat g_t\big] = \mathbb{E}\big[\nabla\ell_t\big(w_t + \delta\, B\big)\big]$, where B is a random vector in the unit ball
So $\hat g_t$ estimates the gradient of a locally smoothed version of $\ell_t$ (a code sketch follows below)
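
A sketch of stochastic OGD with this one-point gradient estimate. The projection onto a Euclidean ball is an illustrative simplification, and keeping the perturbed point inside a slightly shrunk feasible set (as the full analysis requires) is omitted here.

```python
import numpy as np

def bandit_ogd(loss_value, dim, T, eta, delta, radius=1.0, seed=0):
    """loss_value(t, w) returns only the scalar loss ell_t(w); no gradient is observed."""
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    for t in range(T):
        u = rng.standard_normal(dim)
        u /= np.linalg.norm(u)                    # random unit vector U
        value = loss_value(t, w + delta * u)      # incur and observe the loss at the perturbed point
        g_hat = (dim / delta) * value * u         # one-point gradient estimate
        w = w - eta * g_hat
        norm = np.linalg.norm(w)
        if norm > radius:
            w *= radius / norm                    # project back onto the ball
    return w
```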

Guarantees
If $\ell_t$ is Lipschitz, then the smoothed version is a good approximation of $\ell_t$
The radius $\delta$ of the perturbation controls a bias/variance trade-off
Regret of stochastic OGD for convex and Lipschitz loss sequences: $R_T = O\big(T^{3/4}\big)$
The linear case: assume the losses are linear functions on S, $\ell_t(w) = \ell_t^\top w$. Can we achieve a better rate?

Self-concordant functions [Abernethy, Hazan and Rakhlin, 2008]
Run stochastic OGD regularized with a self-concordant function for S
Variance control through the associated Dikin ellipsoid
The loss estimate $\hat\ell_t$ is obtained via a perturbed point $W_t \pm \lambda_i^{-1/2} e_i$, where $(e_i, \lambda_i)$ is a randomly drawn eigenvector-eigenvalue pair of the Dikin ellipsoid
Regret for linear functions: $R_T = O\big(d^{3/2} \sqrt{T \ln T}\big)$

Algorithm GeometricHedge [Dani, Hayes and Kakade, 2008]
Build an $\varepsilon$-cover $S_0 \subseteq S$ of size $\varepsilon^{-d}$
Use an experts algorithm (e.g., exponential weights) to draw actions $W_t \in S_0$, and use the unbiased linear estimator of the loss
$\hat\ell_t = P_t^{-1} W_t W_t^\top \ell_t$, where $P_t = \mathbb{E}\big[W_t W_t^\top\big]$
Mix the exponential weights with an exploration distribution $\mu$ over the actions in the cover:
$p_t(w) = (1 - \gamma)\, q_t(w) + \gamma\, \mu(w)$, with $0 \le \gamma \le 1$
$\mu$ controls the variance of the loss estimates by ensuring that all directions are sampled often enough

Analysis [C-B and Lugosi, 2010]
Regret bound: $R_T = O\bigg(d\, \sqrt{\Big(1 + \frac{1}{d\, \lambda_{\min}}\Big)\, T \ln T}\bigg)$
$\lambda_{\min}$ = the smallest eigenvalue of $\mathbb{E}_\mu\big[W W^\top\big]$; $\lambda_{\min}^{-1}$ is proportional to the variance of the loss estimates
When $\lambda_{\min}$ is of order $\frac{1}{d}$ we get the optimal bound $\Theta\big(d\, \sqrt{T \ln T}\big)$
If $\mu$ is uniform over all actions, this happens when the action space is approximately isotropic

Good exploration bases [Bubeck, C-B and Kakade, 2012]
Choose a basis under which the action set looks isotropic
There are at most $O(d^2)$ contact points between $S_0$ and its Löwner ellipsoid (the minimum-volume ellipsoid enclosing $S_0$)
Put the exploration distribution $\mu$ on these contact points
This ensures that $\mathbb{E}_\mu\big[W W^\top\big]$ is isotropic: $\lambda_{\min} = \frac{1}{d}$
Exploration on the contact points of the Löwner ellipsoid achieves the optimal regret $R_T = O\big(d\, \sqrt{T \ln T}\big)$
However, this construction is not efficient in general; an efficient construction uses volumetric ellipsoids [Hazan, Gerber and Meka, 2014]

Conclusions
More applications: portfolio management, matrix completion, competitive analysis of algorithms, recommendation systems
Some open problems: exact rates for bandit convex optimization, trade-offs between regret bounds and running times, online tensor and spectral learning, problems with states