A Second-order Bound with Excess Losses


1 A Second-order Bound with Excess Losses
Pierre Gaillard (1,2), Gilles Stoltz (2), Tim van Erven (3)
(1) EDF R&D, Clamart, France
(2) GREGHEC: HEC Paris -- CNRS, Jouy-en-Josas, France
(3) Leiden University, the Netherlands
June 14, 2014

2 Setting of prediction with expert advice

In each round t:
- the learner makes a prediction by choosing a vector $p_t = (p_{1,t}, \ldots, p_{K,t})$ of non-negative weights that sum to one;
- every expert $k \in \{1, \ldots, K\}$ incurs a loss $\ell_{k,t} \in [0, 1]$;
- the learner's loss is $\hat\ell_t = p_t^\top \ell_t = \sum_{k=1}^K p_{k,t}\,\ell_{k,t}$.

The goal of the learner is to control his cumulative loss, which he can do by controlling his cumulative regret against any expert k:
$R_{k,T} = \sum_{t=1}^T \big(\hat\ell_t - \ell_{k,t}\big)$
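As an illustration of this protocol (a minimal sketch on synthetic losses, not part of the talk), the learner's loss and the regret against each expert can be computed as follows:

```python
import numpy as np

# Hypothetical data: T rounds, K experts, losses drawn uniformly in [0, 1].
rng = np.random.default_rng(0)
T, K = 100, 3
losses = rng.random((T, K))          # l_{k,t} in [0, 1]

cum_learner = 0.0                    # learner's cumulative loss
cum_experts = np.zeros(K)            # each expert's cumulative loss L_{k,T}
for t in range(T):
    p_t = np.full(K, 1.0 / K)        # here: uniform weights (any non-negative vector summing to 1)
    ell_hat = p_t @ losses[t]        # learner's loss \hat l_t = sum_k p_{k,t} l_{k,t}
    cum_learner += ell_hat
    cum_experts += losses[t]

regret = cum_learner - cum_experts   # R_{k,T} = sum_t (\hat l_t - l_{k,t}), one entry per expert
```

With uniform weights the learner's loss is the average expert loss, so the K regrets sum to zero; the algorithms of the talk instead choose $p_t$ adaptively so as to keep every $R_{k,T}$ small.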

3 Regret bounds

Worst-case: $R_{k,T} \lesssim \sqrt{T \ln K}$

Improvement for small losses [Cesa-Bianchi and Lugosi, 2006]: $R_{k,T} \lesssim \sqrt{L_{k,T} \ln K}$ where $L_{k,T} = \sum_{t=1}^T \ell_{k,t}$

Second-order [Cesa-Bianchi et al., 2007; Hazan and Kale, 2011]:
$R_{k,T} \lesssim \frac{\ln K}{\eta} + \eta \sum_{t=1}^T \ell_{k,t}^2$, but no method to optimize $\eta$ and reach $R_{k,T} \lesssim \sqrt{(\ln K) \sum_{t=1}^T \ell_{k,t}^2}$
$R_{k,T} \lesssim \sqrt{(\ln K) \sum_{t=1}^T v_t}$ where $v_t = \sum_{k=1}^K p_{k,t}\big(\ell_{k,t} - \hat\ell_t\big)^2$

Our contribution: a new second-order bound in terms of excess losses:
$R_{k,T} \lesssim \sqrt{(\ln K) \sum_{t=1}^T \big(\hat\ell_t - \ell_{k,t}\big)^2}$

9 A Second-order Bound with Excess Losses

We provide a third form of second-order bound:
$R_{k,T} \lesssim \sqrt{(\ln K) \sum_{t=1}^T \big(\hat\ell_t - \ell_{k,t}\big)^2}$.  (1)

Features of the bound:
- bounds of the form (1) entail optimal scaling in the setting of experts reporting confidences [Blum and Mansour, 2007];
- improvement for small excess losses;
- constant regret in the special case of i.i.d. losses [Van Erven et al., 2011];
- probabilistic bounds on the cumulative predictive risk [Wintenberger, 2014].

Key element in the analysis: consider multiple learning rates [Blum and Mansour, 2007] and develop the tuning techniques that go with them.

10 The Prod forecaster [Cesa-Bianchi et al., 2007]

Parameter: $\eta > 0$
Initialization: $w_0 = (1/K, \ldots, 1/K)$
For each round $t = 1, 2, \ldots$:
- assign to each expert k the weight $p_{k,t} = w_{k,t-1} \big/ \sum_j w_{j,t-1}$;
- for each expert k perform the update $w_{k,t} = w_{k,t-1}\big(1 + \eta\,(\hat\ell_t - \ell_{k,t})\big)$.

If $\eta \le 1/2$ and $\ell_t \in [0,1]^K$, the cumulative regret is bounded as
$R_{k,T} \le \frac{\ln K}{\eta} + \eta \sum_{t=1}^T \big(\hat\ell_t - \ell_{k,t}\big)^2$
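A direct transcription of this pseudocode (an illustrative sketch; the data and the choice $\eta = 1/4$ are hypothetical):

```python
import numpy as np

def prod_forecaster(losses, eta=0.25):
    """Prod [Cesa-Bianchi et al., 2007]: returns the regret R_{k,T} per expert.

    losses: (T, K) array with entries in [0, 1]; the stated bound needs eta <= 1/2.
    """
    K = losses.shape[1]
    w = np.full(K, 1.0 / K)                        # w_0 = (1/K, ..., 1/K)
    cum_learner, cum_experts = 0.0, np.zeros(K)
    for l_t in losses:
        p = w / w.sum()                            # p_{k,t} = w_{k,t-1} / sum_j w_{j,t-1}
        ell_hat = p @ l_t                          # learner's loss
        w = w * (1.0 + eta * (ell_hat - l_t))      # w_{k,t} = w_{k,t-1}(1 + eta (\hat l_t - l_{k,t}))
        cum_learner += ell_hat
        cum_experts += l_t
    return cum_learner - cum_experts

regret = prod_forecaster(np.random.default_rng(1).random((200, 3)))
```

Since $\eta \le 1/2$ and excess losses lie in $[-1, 1]$, every multiplicative factor stays positive, so the weights never vanish.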

11 Prod with multiple learning rates (ML-Prod)

Parameters: $\eta_1, \ldots, \eta_K > 0$
Initialization: $w_0 = (1/K, \ldots, 1/K)$
For each round $t = 1, 2, \ldots$:
- assign to each expert k the weight $p_{k,t} = \eta_k w_{k,t-1} \big/ \sum_j \eta_j w_{j,t-1}$;
- for each expert k perform the update $w_{k,t} = w_{k,t-1}\big(1 + \eta_k\,(\hat\ell_t - \ell_{k,t})\big)$.

If $\eta_k \le 1/2$ and $\ell_t \in [0,1]^K$, the cumulative regret is bounded as
$R_{k,T} \le \frac{\ln K}{\eta_k} + \eta_k \sum_{t=1}^T \big(\hat\ell_t - \ell_{k,t}\big)^2$
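The change relative to Prod is small: each expert keeps its own rate $\eta_k$, which enters both the weight normalization and the update. A sketch (hypothetical data and rates):

```python
import numpy as np

def ml_prod(losses, etas):
    """ML-Prod with one fixed learning rate eta_k per expert (illustrative sketch).

    losses: (T, K) array in [0, 1]; etas: (K,) array with entries <= 1/2.
    Returns the regret R_{k,T} against each expert k.
    """
    K = losses.shape[1]
    w = np.full(K, 1.0 / K)
    cum_learner, cum_experts = 0.0, np.zeros(K)
    for l_t in losses:
        p = etas * w / (etas * w).sum()            # p_{k,t} proportional to eta_k w_{k,t-1}
        ell_hat = p @ l_t
        w = w * (1.0 + etas * (ell_hat - l_t))     # per-expert Prod update
        cum_learner += ell_hat
        cum_experts += l_t
    return cum_learner - cum_experts

etas = np.array([0.5, 0.1, 0.02])                  # different rates, all <= 1/2
regret = ml_prod(np.random.default_rng(2).random((200, 3)), etas)
```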

12 Prod with multiple learning rates (ML-Prod)

(Same algorithm as on the previous slide.)

If we could optimize $\eta_k = \sqrt{\ln K \big/ \sum_{t=1}^T \big(\hat\ell_t - \ell_{k,t}\big)^2}$, we would get
$R_{k,T} \le 2\sqrt{(\ln K) \sum_{t=1}^T \big(\hat\ell_t - \ell_{k,t}\big)^2}$.

The learning rates can be calibrated online at the multiplicative cost $\ln\ln T$ in the regret bound.

13 Improvement for small excess losses

If a strategy satisfies a bound of the form
$R_{k,T} \lesssim \sqrt{(\ln K) \sum_{t=1}^T \big(\hat\ell_t - \ell_{k,t}\big)^2} + \ldots$
then, if $\ell_t \in [0,1]^K$, it also satisfies
$R_{k,T} \lesssim \sqrt{(\ln K) \sum_{t:\,\ell_{k,t} \le \hat\ell_t} \big(\hat\ell_t - \ell_{k,t}\big)} + \ldots$

This bound is invariant by translation of the losses and implies the improvement for small losses, $R_{k,T} \lesssim \sqrt{(\ln K)\,L_{k,T}}$.
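The first implication can be sketched in a couple of lines (a reconstruction with constants omitted, not taken from the slides). Since the losses lie in $[0,1]$, every squared excess is at most the absolute excess, so with $a = \sum_{t:\,\ell_{k,t}\le\hat\ell_t}(\hat\ell_t-\ell_{k,t})$ and $b = \sum_{t:\,\ell_{k,t}>\hat\ell_t}(\ell_{k,t}-\hat\ell_t)$:

```latex
\begin{aligned}
\sum_{t=1}^T \big(\hat\ell_t - \ell_{k,t}\big)^2
  &\le \sum_{t=1}^T \big|\hat\ell_t - \ell_{k,t}\big| = a + b,
  \qquad R_{k,T} = a - b, \\
R_{k,T} &\lesssim \sqrt{(\ln K)(a+b)}
  = \sqrt{(\ln K)\,(2a - R_{k,T})}
  \le \sqrt{2a \ln K}
  \quad \text{whenever } R_{k,T} \ge 0.
\end{aligned}
```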

14 Experts that report their confidence [Blum and Mansour, 2007]

In each round $t = 1, \ldots, T$:
- each expert k expresses his confidence as a number $I_{k,t} \in [0, 1]$;
- the learner makes a prediction by choosing a vector $p_t = (p_{1,t}, \ldots, p_{K,t})$ of non-negative weights that sum to one;
- every expert k incurs a loss $\ell_{k,t} \in [0, 1]$;
- the learner's loss is $\hat\ell_t = p_t^\top \ell_t = \sum_{k=1}^K p_{k,t}\,\ell_{k,t}$.

The learner aims at minimizing his confidence regret simultaneously for all experts:
$R^c_{k,T} = \sum_{t=1}^T I_{k,t}\big(\hat\ell_t - \ell_{k,t}\big)$

The special case $I_{k,t} = 0$ expresses that expert k is inactive in round t.

15 Experts that report their confidence [Blum and Mansour, 2007]

(Same setting as on the previous slide.)

The best available stated bound [Blum and Mansour, 2007] is of the form
$R^c_{k,T} = \sum_{t=1}^T I_{k,t}\big(\hat\ell_t - \ell_{k,t}\big) \lesssim \sqrt{(\ln K) \sum_{t=1}^T I_{k,t}\,\ell_{k,t}}$.

16 Application to experts that report their confidence

If a strategy satisfies a standard regret bound of the form
$R_{k,T} \lesssim \sqrt{(\ln K) \sum_{t=1}^T \big(\hat\ell_t - \ell_{k,t}\big)^2} + \ldots$
then, if $\ell_t \in [0,1]^K$, applying the strategy to the modified losses
$\tilde\ell_{k,t} = I_{k,t}\,\ell_{k,t} + (1 - I_{k,t})\,\hat\ell_t$
leads to an algorithm with a confidence regret bound of the form
$R^c_{k,T} \lesssim \sqrt{(\ln K) \sum_{t=1}^T I_{k,t}^2\big(\hat\ell_t - \ell_{k,t}\big)^2} + \ldots$ for all k.
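One round of this reduction can be sketched as below (the helper name `modified_losses` and the numbers are ours). The useful identity is $\hat\ell_t - \tilde\ell_{k,t} = I_{k,t}(\hat\ell_t - \ell_{k,t})$: the excess of the modified loss is exactly the summand of the confidence regret.

```python
import numpy as np

def modified_losses(p, l_t, I_t):
    """One round of the reduction for confidence experts (illustrative sketch).

    p: weights played, l_t: true losses, I_t: confidences, all of length K.
    """
    ell_hat = p @ l_t                              # learner's loss on the true losses
    tilde_l = I_t * l_t + (1.0 - I_t) * ell_hat    # modified losses fed to the base strategy
    return ell_hat, tilde_l

p = np.array([0.5, 0.3, 0.2])
l_t = np.array([0.2, 0.9, 0.4])
I_t = np.array([1.0, 0.5, 0.0])                    # expert 3 is inactive this round
ell_hat, tilde_l = modified_losses(p, l_t, I_t)
# An inactive expert (I = 0) gets the learner's own loss, hence zero excess this round.
```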

17 Stochastic (i.i.d.) losses

We now turn to a stochastic setting considered by [Van Erven et al., 2011], in which the loss vectors are identically distributed.

Assumption [Van Erven et al., 2011]: the loss vectors $\ell_t \in [0,1]^K$ are independent random variables such that there exist an expert $k^\star$ and some $\alpha \in (0, 1]$ with, for all $t \ge 1$, $\min_{k \ne k^\star} \mathbb{E}\big[\ell_{k,t} - \ell_{k^\star,t}\big] \ge \alpha$.

If some strategy satisfies $R_{k,T} \lesssim \sqrt{(\ln K) \sum_{t=1}^T \big(\hat\ell_t - \ell_{k,t}\big)^2} + \ldots$, then
$\mathbb{E}\big[R_{k^\star,T}\big] \lesssim \frac{\ln K}{\alpha}$
and, for any $\delta \in (0, 1)$, with probability at least $1 - \delta$,
$R_{k^\star,T} \lesssim \frac{\ln K + \ln(1/\delta)}{\alpha}$

18 Application to cumulative risk [Wintenberger, 2014]

Some additional results were obtained recently by [Wintenberger, 2014], who
- extends the analysis to exponential updates;
- proves that deterministic second-order bounds in excess losses imply bounds on the cumulative risk in a quite general stochastic setting.

19 Summary

We provide a new form of second-order bound with several desirable features:
$R_{k,T} \lesssim \sqrt{(\ln K) \sum_{t=1}^T \big(\hat\ell_t - \ell_{k,t}\big)^2} + \ldots$
- optimal scaling in the setting of experts reporting confidences;
- improvement for small excess losses;
- constant regret for i.i.d. losses;
- probabilistic bounds on the cumulative risk.

Thank you!

20 Adaptive version of ML-Prod

Parameters: a rule to pick $\eta_{k,t}$ online
Initialization: $w_0 = (1/K, \ldots, 1/K)$
For each round $t = 1, 2, \ldots$:
- assign to each expert k the weight $p_{k,t} \propto \eta_{k,t-1}\,w_{k,t-1}$;
- for each expert k perform the update $w_{k,t} = \Big(w_{k,t-1}\big(1 + \eta_{k,t-1}(\hat\ell_t - \ell_{k,t})\big)\Big)^{\eta_{k,t}/\eta_{k,t-1}}$.

If $0 \le \eta_{k,t} \le 1/2$, $(\eta_{k,t})_t$ is non-increasing in t and $\ell_t \in [0,1]^K$, then
$R_{k,T} \le \dfrac{\ln K}{\eta_{k,T}} + \sum_{t=1}^T \eta_{k,t-1}\big(\hat\ell_t - \ell_{k,t}\big)^2 + \underbrace{\dfrac{1}{e}\sum_{t=1}^T \Big(\dfrac{\eta_{k,t-1}}{\eta_{k,t}} - 1\Big)}_{\text{cost of tuning multiple learning rates}}$

21 Adaptive version of ML-Prod

(Same algorithm as on the previous slide.)

With the learning rates, for $t \ge 1$,
$\eta_{k,t-1} = \min\left\{ \dfrac{1}{2},\ \sqrt{\dfrac{\ln K}{1 + \sum_{s=1}^{t-1} \big(\hat\ell_s - \ell_{k,s}\big)^2}} \right\}$,
the cumulative regret is bounded simultaneously for all experts k as
$R_{k,T} = O\left( \sqrt{(\ln K) \sum_{t=1}^T \big(\hat\ell_t - \ell_{k,t}\big)^2} + K \ln\ln T \right)$.
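Putting the renormalizing update and this learning-rate schedule together gives the sketch below (illustrative, on synthetic data; we take $\eta_{k,0} = \min\{1/2, \sqrt{\ln K}\}$, corresponding to an empty past sum):

```python
import numpy as np

def adaptive_ml_prod(losses):
    """Adaptive ML-Prod with the learning-rate rule above (illustrative sketch).

    losses: (T, K) array in [0, 1]. Returns the regret R_{k,T} per expert.
    """
    K = losses.shape[1]
    ln_k = np.log(K)
    w = np.full(K, 1.0 / K)
    sq = np.zeros(K)                                # running sums of (ell_hat - l_k)^2
    eta = np.full(K, min(0.5, np.sqrt(ln_k)))       # eta_{k,0}: empty past sum
    cum_learner, cum_experts = 0.0, np.zeros(K)
    for l_t in losses:
        p = eta * w / (eta * w).sum()               # p_{k,t} proportional to eta_{k,t-1} w_{k,t-1}
        ell_hat = p @ l_t
        excess = ell_hat - l_t
        sq += excess ** 2
        new_eta = np.minimum(0.5, np.sqrt(ln_k / (1.0 + sq)))  # non-increasing in t
        w = (w * (1.0 + eta * excess)) ** (new_eta / eta)      # update, then switch rate
        eta = new_eta
        cum_learner += ell_hat
        cum_experts += l_t
    return cum_learner - cum_experts

regret = adaptive_ml_prod(np.random.default_rng(3).random((200, 3)))
```

Because $\eta_{k,t} \le 1/2$ and excesses lie in $[-1, 1]$, the base of the power stays positive, so the exponent $\eta_{k,t}/\eta_{k,t-1} \le 1$ is always well defined.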


More information

Online Learning with Experts & Multiplicative Weights Algorithms

Online Learning with Experts & Multiplicative Weights Algorithms Online Learning with Experts & Multiplicative Weights Algorithms CS 159 lecture #2 Stephan Zheng April 1, 2016 Caltech Table of contents 1. Online Learning with Experts With a perfect expert Without perfect

More information

Online learning with noisy side observations

Online learning with noisy side observations Online learning with noisy side observations Tomáš Kocák Gergely Neu Michal Valko Inria Lille - Nord Europe, France DTIC, Universitat Pompeu Fabra, Barcelona, Spain Inria Lille - Nord Europe, France SequeL

More information

A Low Complexity Algorithm with O( T ) Regret and Finite Constraint Violations for Online Convex Optimization with Long Term Constraints

A Low Complexity Algorithm with O( T ) Regret and Finite Constraint Violations for Online Convex Optimization with Long Term Constraints A Low Complexity Algorithm with O( T ) Regret and Finite Constraint Violations for Online Convex Optimization with Long Term Constraints Hao Yu and Michael J. Neely Department of Electrical Engineering

More information

Exponentiated Gradient Descent

Exponentiated Gradient Descent CSE599s, Spring 01, Online Learning Lecture 10-04/6/01 Lecturer: Ofer Dekel Exponentiated Gradient Descent Scribe: Albert Yu 1 Introduction In this lecture we review norms, dual norms, strong convexity,

More information

Partial monitoring classification, regret bounds, and algorithms

Partial monitoring classification, regret bounds, and algorithms Partial monitoring classification, regret bounds, and algorithms Gábor Bartók Department of Computing Science University of Alberta Dávid Pál Department of Computing Science University of Alberta Csaba

More information

Stochastic and Adversarial Online Learning without Hyperparameters

Stochastic and Adversarial Online Learning without Hyperparameters Stochastic and Adversarial Online Learning without Hyperparameters Ashok Cutkosky Department of Computer Science Stanford University ashokc@cs.stanford.edu Kwabena Boahen Department of Bioengineering Stanford

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Learning and Games MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Outline Normal form games Nash equilibrium von Neumann s minimax theorem Correlated equilibrium Internal

More information

Strategies for Prediction Under Imperfect Monitoring

Strategies for Prediction Under Imperfect Monitoring MATHEMATICS OF OPERATIONS RESEARCH Vol. 33, No. 3, August 008, pp. 53 58 issn 0364-765X eissn 56-547 08 3303 053 informs doi 0.87/moor.080.03 008 INFORMS Strategies for Prediction Under Imperfect Monitoring

More information

Robust Selective Sampling from Single and Multiple Teachers

Robust Selective Sampling from Single and Multiple Teachers Robust Selective Sampling from Single and Multiple Teachers Ofer Dekel Microsoft Research oferd@microsoftcom Claudio Gentile DICOM, Università dell Insubria claudiogentile@uninsubriait Karthik Sridharan

More information

Minimizing Adaptive Regret with One Gradient per Iteration

Minimizing Adaptive Regret with One Gradient per Iteration Minimizing Adaptive Regret with One Gradient per Iteration Guanghui Wang, Dakuan Zhao, Lijun Zhang National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China wanggh@lamda.nju.edu.cn,

More information

Worst-Case Bounds for Gaussian Process Models

Worst-Case Bounds for Gaussian Process Models Worst-Case Bounds for Gaussian Process Models Sham M. Kakade University of Pennsylvania Matthias W. Seeger UC Berkeley Abstract Dean P. Foster University of Pennsylvania We present a competitive analysis

More information

Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization

Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization JMLR: Workshop and Conference Proceedings vol (2010) 1 16 24th Annual Conference on Learning heory Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization

More information

Foundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research

Foundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research Foundations of Machine Learning On-Line Learning Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Motivation PAC learning: distribution fixed over time (training and test). IID assumption.

More information

Online Prediction: Bayes versus Experts

Online Prediction: Bayes versus Experts Marcus Hutter - 1 - Online Prediction Bayes versus Experts Online Prediction: Bayes versus Experts Marcus Hutter Istituto Dalle Molle di Studi sull Intelligenza Artificiale IDSIA, Galleria 2, CH-6928 Manno-Lugano,

More information

New Algorithms for Contextual Bandits

New Algorithms for Contextual Bandits New Algorithms for Contextual Bandits Lev Reyzin Georgia Institute of Technology Work done at Yahoo! 1 S A. Beygelzimer, J. Langford, L. Li, L. Reyzin, R.E. Schapire Contextual Bandit Algorithms with Supervised

More information

Exponential Weights on the Hypercube in Polynomial Time

Exponential Weights on the Hypercube in Polynomial Time European Workshop on Reinforcement Learning 14 (2018) October 2018, Lille, France. Exponential Weights on the Hypercube in Polynomial Time College of Information and Computer Sciences University of Massachusetts

More information

Bandit models: a tutorial

Bandit models: a tutorial Gdt COS, December 3rd, 2015 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions) Bandit game: a each round t, an agent chooses

More information

Minimax strategy for prediction with expert advice under stochastic assumptions

Minimax strategy for prediction with expert advice under stochastic assumptions Minimax strategy for prediction ith expert advice under stochastic assumptions Wojciech Kotłosi Poznań University of Technology, Poland otlosi@cs.put.poznan.pl Abstract We consider the setting of prediction

More information

arxiv: v3 [cs.lg] 30 Jun 2012

arxiv: v3 [cs.lg] 30 Jun 2012 arxiv:05874v3 [cslg] 30 Jun 0 Orly Avner Shie Mannor Department of Electrical Engineering, Technion Ohad Shamir Microsoft Research New England Abstract We consider a multi-armed bandit problem where the

More information

COS 402 Machine Learning and Artificial Intelligence Fall Lecture 22. Exploration & Exploitation in Reinforcement Learning: MAB, UCB, Exp3

COS 402 Machine Learning and Artificial Intelligence Fall Lecture 22. Exploration & Exploitation in Reinforcement Learning: MAB, UCB, Exp3 COS 402 Machine Learning and Artificial Intelligence Fall 2016 Lecture 22 Exploration & Exploitation in Reinforcement Learning: MAB, UCB, Exp3 How to balance exploration and exploitation in reinforcement

More information

Online Density Estimation of Nonstationary Sources Using Exponential Family of Distributions

Online Density Estimation of Nonstationary Sources Using Exponential Family of Distributions IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 1 Online Density Estimation of Nonstationary Sources Using Exponential Family of Distributions Kaan Gokcesu and Suleyman S. Kozat, Senior Member,

More information

Pure Exploration in Multi-armed Bandits Problems

Pure Exploration in Multi-armed Bandits Problems Pure Exploration in Multi-armed Bandits Problems Sébastien Bubeck 1,Rémi Munos 1, and Gilles Stoltz 2,3 1 INRIA Lille, SequeL Project, France 2 Ecole normale supérieure, CNRS, Paris, France 3 HEC Paris,

More information

Algorithmic Chaining and the Role of Partial Feedback in Online Nonparametric Learning

Algorithmic Chaining and the Role of Partial Feedback in Online Nonparametric Learning Proceedings of Machine Learning Research vol 65:1 17, 2017 Algorithmic Chaining and the Role of Partial Feedback in Online Nonparametric Learning Nicolò Cesa-Bianchi Università degli Studi di Milano, Milano,

More information

Pure Exploration in Finitely Armed and Continuous Armed Bandits

Pure Exploration in Finitely Armed and Continuous Armed Bandits Pure Exploration in Finitely Armed and Continuous Armed Bandits Sébastien Bubeck INRIA Lille Nord Europe, SequeL project, 40 avenue Halley, 59650 Villeneuve d Ascq, France Rémi Munos INRIA Lille Nord Europe,

More information

Multi-armed bandit models: a tutorial

Multi-armed bandit models: a tutorial Multi-armed bandit models: a tutorial CERMICS seminar, March 30th, 2016 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions)

More information