A Second-order Bound with Excess Losses
Pierre Gaillard (1, 2), Gilles Stoltz (2), Tim van Erven (3)
(1) EDF R&D, Clamart, France
(2) GREGHEC: HEC Paris, CNRS, Jouy-en-Josas, France
(3) Leiden University, the Netherlands
June 14
Setting of prediction with expert advice
In each round $t$:
- the learner makes a prediction by choosing a vector $p_t = (p_{1,t}, \dots, p_{K,t})$ of non-negative weights that sum to one;
- every expert $k \in \{1, \dots, K\}$ incurs a loss $\ell_{k,t} \in [0, 1]$;
- the learner's loss is $\hat{\ell}_t = p_t^\top \ell_t = \sum_{k=1}^K p_{k,t} \ell_{k,t}$.
The goal of the learner is to control his cumulative loss, which he can do by controlling his cumulative regret against any expert $k$:
\[ R_{k,T} = \sum_{t=1}^T \bigl( \hat{\ell}_t - \ell_{k,t} \bigr). \]
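The regret defined above can be computed directly from a table of played weights and observed losses. The following is a minimal sketch in Python; the function names `learner_loss` and `cumulative_regret` are illustrative choices and do not come from the slides.

```python
from typing import List, Sequence


def learner_loss(p_t: Sequence[float], l_t: Sequence[float]) -> float:
    """Mixture loss hat_l_t = sum_k p_{k,t} * l_{k,t} for one round."""
    return sum(p_k * l_k for p_k, l_k in zip(p_t, l_t))


def cumulative_regret(
    weights: Sequence[Sequence[float]], losses: Sequence[Sequence[float]]
) -> List[float]:
    """Return R_{k,T} = sum_t (hat_l_t - l_{k,t}) for every expert k.

    weights[t][k] is p_{k,t} and losses[t][k] is l_{k,t}.
    """
    n_experts = len(losses[0])
    regret = [0.0] * n_experts
    for p_t, l_t in zip(weights, losses):
        hat_l = learner_loss(p_t, l_t)
        for k, l_k in enumerate(l_t):
            regret[k] += hat_l - l_k
    return regret
```

With uniform weights over two experts whose losses are 0 and 1 in each of two rounds, the learner loses 0.5 per round, so the function returns `[1.0, -1.0]`: positive regret against the better expert and negative regret against the worse one.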
Regret bounds
- Worst-case: $R_{k,T} \lesssim \sqrt{T \ln K}$.
- Improvement for small losses [Cesa-Bianchi and Lugosi, 2006]: $R_{k,T} \lesssim \sqrt{L_{k,T} \ln K}$, where $L_{k,T} = \sum_{t=1}^T \ell_{k,t}$.
- Second-order [Cesa-Bianchi et al., 2007; Hazan and Kale, 2011]:
  - $R_{k,T} \lesssim \frac{\ln K}{\eta} + \eta \sum_{t=1}^T \ell_{k,t}^2$, which would give $R_{k,T} \lesssim \sqrt{\ln K \sum_{t=1}^T \ell_{k,t}^2}$ if $\eta$ could be optimized, but there is no method to optimize $\eta$;
  - $R_{k,T} \lesssim \sqrt{\ln K \sum_{t=1}^T v_t}$, where $v_t = \sum_{k=1}^K p_{k,t} \bigl( \hat{\ell}_t - \ell_{k,t} \bigr)^2$.
- Our contribution: a new second-order bound in terms of excess losses: $R_{k,T} \lesssim \sqrt{\ln K \sum_{t=1}^T \bigl( \hat{\ell}_t - \ell_{k,t} \bigr)^2}$.
A Second-order Bound with Excess Losses
We provide a third form of second-order bound:
\[ R_{k,T} \lesssim \sqrt{\ln K \sum_{t=1}^T \bigl( \hat{\ell}_t - \ell_{k,t} \bigr)^2}. \tag{1} \]
Features of the bound:
- bounds of form (1) entail optimal scaling in the setting of experts reporting confidences [Blum and Mansour, 2007];
- improvement for small excess losses;
- constant regret in the special case of i.i.d. losses [Van Erven et al., 2011];
- probabilistic bounds on the cumulative predictive risk [Wintenberger, 2014].
Key element in the analysis: consider multiple learning rates [Blum and Mansour, 2007] and develop the tuning techniques that go with them.
The Prod forecaster [Cesa-Bianchi et al., 2007]
Parameter: $\eta > 0$
Initialization: $w_0 = (1/K, \dots, 1/K)$
For each round $t = 1, 2, \dots$
- assign to each expert $k$ the weight $p_{k,t} = w_{k,t-1} / \sum_j w_{j,t-1}$;
- for each expert $k$ perform the update $w_{k,t} = w_{k,t-1} \bigl( 1 + \eta ( \hat{\ell}_t - \ell_{k,t} ) \bigr)$.
If $\eta \le 1/2$ and $\ell_t \in [0, 1]^K$, the cumulative regret is bounded as
\[ R_{k,T} \le \frac{\ln K}{\eta} + \eta \sum_{t=1}^T \bigl( \hat{\ell}_t - \ell_{k,t} \bigr)^2. \]
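The Prod steps above can be sketched in a few lines of Python. This is an illustrative implementation under the slide's assumptions (losses in $[0, 1]$, $0 < \eta \le 1/2$); the names `prod_forecaster` and `prod_bound` are hypothetical, and the loss table is assumed to be given as a list of rounds.

```python
import math
from typing import List, Sequence, Tuple


def prod_forecaster(
    losses: Sequence[Sequence[float]], eta: float
) -> Tuple[List[float], List[float]]:
    """Run Prod with excess-loss updates on a T x K loss table.

    Returns, per expert k, the cumulative regret R_{k,T} and the sum of
    squared excess losses sum_t (hat_l_t - l_{k,t})^2.
    """
    if not 0.0 < eta <= 0.5:
        raise ValueError("the regret bound needs 0 < eta <= 1/2")
    n_experts = len(losses[0])
    w = [1.0 / n_experts] * n_experts  # w_0 = (1/K, ..., 1/K)
    regret = [0.0] * n_experts
    sq_excess = [0.0] * n_experts
    for l_t in losses:
        total = sum(w)
        p = [w_k / total for w_k in w]  # p_{k,t} = w_{k,t-1} / sum_j w_{j,t-1}
        hat_l = sum(p_k * l_k for p_k, l_k in zip(p, l_t))
        for k, l_k in enumerate(l_t):
            excess = hat_l - l_k
            regret[k] += excess
            sq_excess[k] += excess * excess
            w[k] *= 1.0 + eta * excess  # multiplicative Prod update
    return regret, sq_excess


def prod_bound(n_experts: int, eta: float, sq_excess_k: float) -> float:
    """Right-hand side ln(K)/eta + eta * sum_t (hat_l_t - l_{k,t})^2."""
    return math.log(n_experts) / eta + eta * sq_excess_k
```

Comparing `regret[k]` with `prod_bound(K, eta, sq_excess[k])` on any loss table with entries in $[0, 1]$ gives a quick numerical check of the stated inequality.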
Prod with multiple learning rates (ML-Prod)
Parameters: $\eta_1, \dots, \eta_K > 0$
Initialization: $w_0 = (1/K, \dots, 1/K)$
For each round $t = 1, 2, \dots$
- assign to each expert $k$ the weight $p_{k,t} = \eta_k w_{k,t-1} / \sum_j \eta_j w_{j,t-1}$;
- for each expert $k$ perform the update $w_{k,t} = w_{k,t-1} \bigl( 1 + \eta_k ( \hat{\ell}_t - \ell_{k,t} ) \bigr)$.
If $\eta_k \le 1/2$ and $\ell_t \in [0, 1]^K$, the cumulative regret is bounded as
\[ R_{k,T} \le \frac{\ln K}{\eta_k} + \eta_k \sum_{t=1}^T \bigl( \hat{\ell}_t - \ell_{k,t} \bigr)^2. \]
If we could optimize $\eta_k = \sqrt{ \ln K / \sum_{t=1}^T ( \hat{\ell}_t - \ell_{k,t} )^2 }$, we would get
\[ R_{k,T} \le 2 \sqrt{ \ln K \sum_{t=1}^T \bigl( \hat{\ell}_t - \ell_{k,t} \bigr)^2 }. \]
The learning rates can be calibrated online at the multiplicative cost $\ln \ln T$ in the regret bound.
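A sketch of ML-Prod with fixed per-expert learning rates follows. The only changes relative to Prod are the $\eta_k$-weighted normalization and the per-expert rate in the update. The names `ml_prod` and `ml_prod_bound` are hypothetical; the final weights are returned so that one can check that their sum stays equal to one, a property that follows from the choice of $p_{k,t}$.

```python
import math
from typing import List, Sequence, Tuple


def ml_prod(
    losses: Sequence[Sequence[float]], etas: Sequence[float]
) -> Tuple[List[float], List[float], List[float]]:
    """Run ML-Prod with one fixed learning rate eta_k per expert.

    Returns (regret, squared excess losses, final weights), each of length K.
    """
    n_experts = len(etas)
    if any(not 0.0 < eta_k <= 0.5 for eta_k in etas):
        raise ValueError("the regret bound needs 0 < eta_k <= 1/2 for all k")
    w = [1.0 / n_experts] * n_experts
    regret = [0.0] * n_experts
    sq_excess = [0.0] * n_experts
    for l_t in losses:
        # p_{k,t} is proportional to eta_k * w_{k,t-1}
        norm = sum(eta_k * w_k for eta_k, w_k in zip(etas, w))
        p = [eta_k * w_k / norm for eta_k, w_k in zip(etas, w)]
        hat_l = sum(p_k * l_k for p_k, l_k in zip(p, l_t))
        for k, l_k in enumerate(l_t):
            excess = hat_l - l_k
            regret[k] += excess
            sq_excess[k] += excess * excess
            w[k] *= 1.0 + etas[k] * excess
    return regret, sq_excess, w


def ml_prod_bound(n_experts: int, eta_k: float, sq_excess_k: float) -> float:
    """Right-hand side ln(K)/eta_k + eta_k * sum_t (hat_l_t - l_{k,t})^2."""
    return math.log(n_experts) / eta_k + eta_k * sq_excess_k
```

Because $\sum_k \eta_k w_{k,t-1} (\hat{\ell}_t - \ell_{k,t}) = 0$ by construction of $p_t$, the total weight is preserved from round to round, which is the step that makes a separate rate per expert compatible with the Prod analysis.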
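Improvement for small excess losses
If a strategy satisfies a bound of the form
\[ R_{k,T} \lesssim \sqrt{ \ln K \sum_{t=1}^T \bigl( \hat{\ell}_t - \ell_{k,t} \bigr)^2 } + \dots \]
then, if $\ell_t \in [0, 1]^K$, it also satisfies
\[ R_{k,T} \lesssim \sqrt{ \ln K \sum_{t : \ell_{k,t} \le \hat{\ell}_t} \bigl( \hat{\ell}_t - \ell_{k,t} \bigr) } + \dots \]
This bound is invariant by translation of the losses and implies the improvement for small losses $R_{k,T} \le O\Bigl( \sqrt{ \ln K \sum_{t=1}^T \ell_{k,t} } \Bigr)$.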
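One way to see why squared excess losses can be traded for positive excess losses is the short computation below. It is a sketch of the key step only, not necessarily the authors' proof: the shorthand $r_t$ is introduced here for illustration, constants are not optimized, and the case $R_{k,T} < 0$ is skipped because the claimed bound then holds trivially.

```latex
% Shorthand introduced for this sketch: r_t = \hat{\ell}_t - \ell_{k,t} \in [-1, 1],
% so that R_{k,T} = \sum_t r_t and r_t^2 \le |r_t|.
\begin{align*}
\sum_{t=1}^T r_t^2
  &\le \sum_{t=1}^T |r_t|
   = \sum_{t : r_t \ge 0} r_t + \sum_{t : r_t < 0} (-r_t) \\
  &= 2 \sum_{t : r_t \ge 0} r_t - R_{k,T}
   \le 2 \sum_{t : \ell_{k,t} \le \hat{\ell}_t} \bigl( \hat{\ell}_t - \ell_{k,t} \bigr)
   \qquad \text{when } R_{k,T} \ge 0 .
\end{align*}
```

The translation invariance is visible in the same notation: adding a common constant to all losses of a round changes neither $r_t$ nor the set of rounds with $\ell_{k,t} \le \hat{\ell}_t$.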
Experts that report their confidence [Blum and Mansour, 2007]
In each round $t = 1, \dots, T$:
- each expert $k$ expresses his confidence as a number $I_{k,t} \in [0, 1]$;
- the learner makes a prediction by choosing a vector $p_t = (p_{1,t}, \dots, p_{K,t})$ of non-negative weights that sum to one;
- every expert $k$ incurs a loss $\ell_{k,t} \in [0, 1]$;
- the learner's loss is $\hat{\ell}_t = p_t^\top \ell_t = \sum_{k=1}^K p_{k,t} \ell_{k,t}$.
The learner aims at minimizing his confidence regret simultaneously for all experts $k$:
\[ R^c_{k,T} = \sum_{t=1}^T \bigl( \hat{\ell}_t - \ell_{k,t} \bigr) I_{k,t}. \]
The special case $I_{k,t} = 0$ expresses that expert $k$ is inactive in round $t$.
The best available stated bound [Blum and Mansour, 2007] is
\[ R^c_{k,T} \lesssim \sqrt{ \ln K \sum_{t=1}^T I_{k,t} \ell_{k,t} }. \]
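Application to experts that report their confidence
If a strategy satisfies a standard regret bound of the form
\[ R_{k,T} \lesssim \sqrt{ \ln K \sum_{t=1}^T \bigl( \hat{\ell}_t - \ell_{k,t} \bigr)^2 } + \dots \]
then, if $\ell_t \in [0, 1]^K$, applying the strategy to the modified losses
\[ \tilde{\ell}_{k,t} = I_{k,t} \ell_{k,t} + (1 - I_{k,t}) \hat{\ell}_t \]
leads to an algorithm with a confidence regret bound of the form
\[ R^c_{k,T} \lesssim \sqrt{ \ln K \sum_{t=1}^T I_{k,t}^2 \bigl( \hat{\ell}_t - \ell_{k,t} \bigr)^2 } + \dots \le \sqrt{ \ln K \sum_{t=1}^T I_{k,t}^2 } + \dots \]
for all $k$.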
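The reduction can be illustrated with one round of bookkeeping. The slide only states the modified losses; the rescaling of the base weights by the confidences used below, $p_{k,t} \propto I_{k,t} \tilde{p}_{k,t}$, is an assumption made for this sketch (it is the choice that makes the base strategy's mixture loss on the modified losses equal to $\hat{\ell}_t$). The function name `confidence_step` and the fallback used when every expert is inactive are likewise hypothetical.

```python
from typing import List, Sequence, Tuple


def confidence_step(
    base_weights: Sequence[float],
    confidences: Sequence[float],
    losses: Sequence[float],
) -> Tuple[List[float], float, List[float]]:
    """One round of the reduction from confidence regret to standard regret.

    base_weights: weights proposed by any standard-setting strategy.
    Returns (played weights p_t, learner loss hat_l_t, modified losses).
    """
    mass = sum(i_k * q_k for i_k, q_k in zip(confidences, base_weights))
    if mass > 0.0:
        # Assumed rescaling: p_{k,t} proportional to I_{k,t} * base weight.
        p = [i_k * q_k / mass for i_k, q_k in zip(confidences, base_weights)]
    else:
        # Assumed fallback when all experts are inactive (not on the slide).
        p = list(base_weights)
    hat_l = sum(p_k * l_k for p_k, l_k in zip(p, losses))
    # Modified loss: the true loss with weight I_{k,t}, the learner's loss otherwise.
    modified = [i_k * l_k + (1.0 - i_k) * hat_l for i_k, l_k in zip(confidences, losses)]
    return p, hat_l, modified
```

Two identities explain the shape of the confidence bound: the excess loss on the modified losses is $\hat{\ell}_t - \tilde{\ell}_{k,t} = I_{k,t} ( \hat{\ell}_t - \ell_{k,t} )$, so standard regret on the modified losses equals confidence regret, and squaring it produces the factor $I_{k,t}^2$.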
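Stochastic (i.i.d.) losses
We now turn to a stochastic setting considered by [Van Erven et al., 2011], where the loss vectors are i.i.d.
Assumption [Van Erven et al., 2011]: the loss vectors $\ell_t \in [0, 1]^K$ are independent random variables such that there exist an expert $k^\star$ and some $\alpha \in (0, 1]$ such that
\[ \forall t \ge 1, \qquad \min_{k \ne k^\star} \mathbb{E}\bigl[ \ell_{k,t} - \ell_{k^\star,t} \bigr] \ge \alpha. \]
If some strategy satisfies
\[ R_{k,T} \lesssim \sqrt{ \ln K \sum_{t=1}^T \bigl( \hat{\ell}_t - \ell_{k,t} \bigr)^2 } + \dots \]
then
\[ \mathbb{E}\bigl[ R_{k^\star,T} \bigr] \lesssim \frac{\ln K}{\alpha}. \]
For any $\delta \in (0, 1)$, with probability at least $1 - \delta$,
\[ R_{k^\star,T} \lesssim \frac{\ln K}{\alpha} + \dots \]
(the second term involves $\alpha$, $\ln \delta$ and $\ln K$).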
Application to cumulative risk [Wintenberger, 2014]
Some additional results were obtained recently by [Wintenberger, 2014], who
- extends the analysis to exponential updates;
- proves that deterministic second-order bounds in excess losses imply bounds on the cumulative risk in a quite general stochastic setting.
Summary
We provide a new form of second-order bound with several desirable features:
\[ R_{k,T} \lesssim \sqrt{ \ln K \sum_{t=1}^T \bigl( \ell_{k,t} - \hat{\ell}_t \bigr)^2 } + \dots \]
- optimal scaling in the setting of experts reporting confidences;
- improvement for small excess losses;
- constant regret for i.i.d. losses;
- probabilistic bound on the cumulative risk.
Thank you!
Adaptive version of ML-Prod
Parameters: a rule to pick the learning rates $\eta_{k,t}$ online
Initialization: $w_0 = (1/K, \dots, 1/K)$
For each round $t = 1, 2, \dots$
- assign to each expert $k$ the weight $p_{k,t} \propto \eta_{k,t-1} w_{k,t-1}$;
- for each expert $k$ perform the update
\[ w_{k,t} = \Bigl( w_{k,t-1} \bigl( 1 + \eta_{k,t-1} ( \hat{\ell}_t - \ell_{k,t} ) \bigr) \Bigr)^{\eta_{k,t} / \eta_{k,t-1}}. \]
If $0 < \eta_{k,t} \le 1/2$, $(\eta_{k,t})$ is non-increasing in $t$ and $\ell_t \in [0, 1]^K$, then
\[ R_{k,T} \le \frac{\ln K}{\eta_{k,0}} + \sum_{t=1}^T \eta_{k,t-1} \bigl( \hat{\ell}_t - \ell_{k,t} \bigr)^2 + \underbrace{ \frac{1}{\eta_{k,T}} \ln\Bigl( 1 + \frac{1}{e} \sum_{k'=1}^K \sum_{t=1}^T \Bigl( \frac{\eta_{k',t-1}}{\eta_{k',t}} - 1 \Bigr) \Bigr) }_{\text{cost of tuning multiple learning rates}}. \]
With the learning rates, for $t \ge 1$,
\[ \eta_{k,t-1} = \min\Biggl\{ \frac{1}{2}, \; \sqrt{ \frac{\ln K}{1 + \sum_{s=1}^{t-1} ( \hat{\ell}_s - \ell_{k,s} )^2 } } \Biggr\}, \]
the cumulative regret is bounded simultaneously for all experts $k$ as
\[ R_{k,T} = O\Biggl( \Bigl( \sqrt{ \ln K \sum_{t=1}^T \bigl( \hat{\ell}_t - \ell_{k,t} \bigr)^2 } + \ln K \Bigr) \ln \ln T \Biggr). \]
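The adaptive variant can be sketched as follows, using the data-dependent learning rates given above. This is an illustrative pure-Python version, not the authors' code: the names `adaptive_ml_prod` and `tuning_bound` are hypothetical, and the second function evaluates the three-term bound (initial term, second-order term, cost of tuning) so that it can be compared numerically with the observed regret.

```python
import math
from typing import List, Sequence, Tuple


def adaptive_ml_prod(
    losses: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float], List[List[float]]]:
    """Run ML-Prod with rates eta_{k,t} = min(1/2, sqrt(ln K / (1 + sum_s r_s^2))).

    Returns (regret, second-order terms sum_t eta_{k,t-1} r_{k,t}^2,
    rates) where rates[t][k] is eta_{k,t} for t = 0, ..., T.
    """
    n_experts = len(losses[0])
    if n_experts < 2:
        raise ValueError("need at least two experts so that ln K > 0")
    log_k = math.log(n_experts)
    w = [1.0 / n_experts] * n_experts
    eta = [min(0.5, math.sqrt(log_k))] * n_experts  # eta_{k,0}
    regret = [0.0] * n_experts
    sq_excess = [0.0] * n_experts
    second_order = [0.0] * n_experts
    rates = [list(eta)]
    for l_t in losses:
        norm = sum(e_k * w_k for e_k, w_k in zip(eta, w))
        p = [e_k * w_k / norm for e_k, w_k in zip(eta, w)]
        hat_l = sum(p_k * l_k for p_k, l_k in zip(p, l_t))
        for k, l_k in enumerate(l_t):
            excess = hat_l - l_k
            regret[k] += excess
            sq_excess[k] += excess * excess
            second_order[k] += eta[k] * excess * excess
            new_eta = min(0.5, math.sqrt(log_k / (1.0 + sq_excess[k])))
            # Prod update followed by the power correction eta_{k,t} / eta_{k,t-1}.
            w[k] = (w[k] * (1.0 + eta[k] * excess)) ** (new_eta / eta[k])
            eta[k] = new_eta
        rates.append(list(eta))
    return regret, second_order, rates


def tuning_bound(k: int, second_order: Sequence[float], rates: Sequence[Sequence[float]]) -> float:
    """Evaluate the three-term regret bound for expert k."""
    n_experts = len(rates[0])
    horizon = len(rates) - 1
    cost = sum(
        rates[t - 1][j] / rates[t][j] - 1.0
        for t in range(1, horizon + 1)
        for j in range(n_experts)
    )
    return (
        math.log(n_experts) / rates[0][k]
        + second_order[k]
        + math.log(1.0 + cost / math.e) / rates[horizon][k]
    )
```

The rate $\eta_{k,t}$ used in the exponent at round $t$ depends only on excess losses observed up to round $t$, so the rule can be applied online; the rates are non-increasing because the sum of squared excess losses can only grow.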
More informationBandit models: a tutorial
Gdt COS, December 3rd, 2015 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions) Bandit game: a each round t, an agent chooses
More informationMinimax strategy for prediction with expert advice under stochastic assumptions
Minimax strategy for prediction ith expert advice under stochastic assumptions Wojciech Kotłosi Poznań University of Technology, Poland otlosi@cs.put.poznan.pl Abstract We consider the setting of prediction
More informationarxiv: v3 [cs.lg] 30 Jun 2012
arxiv:05874v3 [cslg] 30 Jun 0 Orly Avner Shie Mannor Department of Electrical Engineering, Technion Ohad Shamir Microsoft Research New England Abstract We consider a multi-armed bandit problem where the
More informationCOS 402 Machine Learning and Artificial Intelligence Fall Lecture 22. Exploration & Exploitation in Reinforcement Learning: MAB, UCB, Exp3
COS 402 Machine Learning and Artificial Intelligence Fall 2016 Lecture 22 Exploration & Exploitation in Reinforcement Learning: MAB, UCB, Exp3 How to balance exploration and exploitation in reinforcement
More informationOnline Density Estimation of Nonstationary Sources Using Exponential Family of Distributions
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 1 Online Density Estimation of Nonstationary Sources Using Exponential Family of Distributions Kaan Gokcesu and Suleyman S. Kozat, Senior Member,
More informationPure Exploration in Multi-armed Bandits Problems
Pure Exploration in Multi-armed Bandits Problems Sébastien Bubeck 1,Rémi Munos 1, and Gilles Stoltz 2,3 1 INRIA Lille, SequeL Project, France 2 Ecole normale supérieure, CNRS, Paris, France 3 HEC Paris,
More informationAlgorithmic Chaining and the Role of Partial Feedback in Online Nonparametric Learning
Proceedings of Machine Learning Research vol 65:1 17, 2017 Algorithmic Chaining and the Role of Partial Feedback in Online Nonparametric Learning Nicolò Cesa-Bianchi Università degli Studi di Milano, Milano,
More informationPure Exploration in Finitely Armed and Continuous Armed Bandits
Pure Exploration in Finitely Armed and Continuous Armed Bandits Sébastien Bubeck INRIA Lille Nord Europe, SequeL project, 40 avenue Halley, 59650 Villeneuve d Ascq, France Rémi Munos INRIA Lille Nord Europe,
More informationMulti-armed bandit models: a tutorial
Multi-armed bandit models: a tutorial CERMICS seminar, March 30th, 2016 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions)
More information