Online learning with noisy side observations


Online learning with noisy side observations. Tomáš Kocák (Inria Lille - Nord Europe, France), Gergely Neu (DTIC, Universitat Pompeu Fabra, Barcelona, Spain), Michal Valko (Inria Lille - Nord Europe, France). SequeL, Inria Lille. AISTATS 2016, Cadiz, April 2016.

Motivation: fishing spots. Every day, choose a fishing spot (at the beginning of the day) so as to maximize the total number of fish caught. This example is motivated by an example of Wu et al. (2015). Online learning with noisy side observations, SequeL - 1/17

Previous problem representation, introduced by Mannor and Shamir (2011): actions are the nodes of a directed unweighted graph, and playing an action reveals the losses c_{t,j} = l_{t,j} of the neighbors j in N(I_t). Such perfect observations are in many cases not realistic.

Problem representation in our work: actions are the nodes of a directed weighted graph, and playing an action reveals noisy losses c_{t,j} of the neighbors j in N(I_t). A smaller weight s_{t,(I_t,j)} means bigger noise: c_{t,j} = s_{t,(I_t,j)} l_{t,j} + (1 - s_{t,(I_t,j)}) ξ_{t,j}.

Framework: description and goals. Learning process: N actions (nodes of a graph); in each of T rounds, the environment sets the losses of the actions and chooses a graph structure (not disclosed), the learner picks an action I_t to play and incurs the loss l_{t,I_t} of that action, then the learner observes the graph and the noisy losses c_{t,j} of the neighbors j in N(I_t). Goal of the learner: minimize the cumulative regret R_T = sum_{t=1}^T l_{t,I_t} (learner's loss) - min_{j in [N]} sum_{t=1}^T l_{t,j} (loss of the best fixed action).
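The regret definition above can be sketched numerically. In this illustrative snippet, the loss matrix and the uniform-random policy are made-up stand-ins, not the paper's algorithm:

```python
import numpy as np

# Illustrative sketch of the regret definition: cumulative loss of the
# learner minus that of the best fixed action in hindsight. The uniform
# random policy is a placeholder, not the paper's algorithm.
rng = np.random.default_rng(0)
T, N = 100, 4
losses = rng.uniform(size=(T, N))        # l_{t,j}, hidden from the learner

picks = rng.integers(N, size=T)          # I_t chosen uniformly at random
learner_loss = losses[np.arange(T), picks].sum()

best_fixed = losses.sum(axis=0).min()    # best single action in hindsight
regret = learner_loss - best_fixed
```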

Regret bounds, special cases (unweighted graphs). Edgeless graph (bandit problem): no side observations; regret bound of Õ(sqrt(NT)). Complete graph (full information): all side observations; regret bound of Õ(sqrt(T)). General unweighted graph (Mannor and Shamir 2011): some side observations; regret bound of Õ(sqrt(αT)), where α is the independence number, i.e., the size of the largest independent set (a set of pairwise non-adjacent nodes); in the example above, α = 5. Note that sqrt(NT) >= sqrt(αT) >= sqrt(T).

General template: Exp3-type algorithm. In every round: compute exponential weights from the loss estimates, w_{t,i} = exp(-η_t sum_{s=1}^{t-1} l̂_{s,i}); create a probability distribution with p_{t,i} proportional to w_{t,i}; play action I_t with P(I_t = i) = p_{t,i} = w_{t,i} / sum_{j=1}^N w_{t,j}; then create the loss estimates l̂_{t,i} using the observability graph. The loss estimates define the algorithm and compensate for the lack of side observations. Question: what are good loss estimates?
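A minimal sketch of the template's sampling step, assuming a fixed learning rate η and a made-up history of loss estimates:

```python
import numpy as np

# Sketch of the Exp3-type template: exponential weights over the sum of
# past loss estimates, normalized into a sampling distribution.
def exp3_distribution(past_estimates, eta):
    """past_estimates: array of shape (t-1, N) with the estimates l_hat."""
    cum = past_estimates.sum(axis=0)
    w = np.exp(-eta * (cum - cum.min()))   # shift for numerical stability
    return w / w.sum()

rng = np.random.default_rng(1)
history = rng.uniform(size=(10, 5))        # hypothetical estimates so far
p = exp3_distribution(history, eta=0.1)
action = rng.choice(5, p=p)                # play I_t ~ p_t
```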

Loss estimates. Desired property: good loss approximation, i.e., unbiased estimates with E[l̂_{t,i}] = l_{t,i}. We can use the noisy side observations: E[c_{t,i}] = E[s_{t,(I_t,i)} l_{t,i}] + E[(1 - s_{t,(I_t,i)}) ξ_{t,i}] = (sum_{j=1}^N s_{t,(j,i)} p_{t,j}) l_{t,i} + 0. This suggests the loss estimate l̂_{t,i} = c_{t,i} / sum_{j=1}^N s_{t,(j,i)} p_{t,j}.
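The unbiasedness claim can be checked by Monte Carlo; the weights, distribution, and loss value below are illustrative numbers:

```python
import numpy as np

# Monte Carlo sanity check (illustrative numbers) that the estimate
# l_hat = c_{t,i} / sum_j s_{t,(j,i)} p_{t,j} is unbiased when the
# noise xi is zero-mean.
rng = np.random.default_rng(2)
p = np.array([0.1, 0.2, 0.3, 0.4])        # distribution p_t over actions
s = np.array([0.25, 0.5, 0.75, 1.0])      # weights s_{t,(j,i)} into node i
l = 0.7                                    # true loss l_{t,i}
M = 200000

j = rng.choice(4, size=M, p=p)             # played actions I_t
xi = rng.standard_normal(M)                # zero-mean noise
c = s[j] * l + (1 - s[j]) * xi             # noisy observations c_{t,i}
estimates = c / (s * p).sum()
mean_est = estimates.mean()                # should be close to l
```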

Loss estimates, first attempt: l̂_{t,i} = (s_{t,(I_t,i)} l_{t,i} + (1 - s_{t,(I_t,i)}) ξ_{t,i}) / sum_{j=1}^N s_{t,(j,i)} p_{t,j}. Pros: unbiased estimates (good approximation of the real losses). Cons: unreliable observations (small weights) are included, giving a large variance of the estimates; no theoretical guarantees!

Second attempt: thresholding (Exp3-IXt algorithm). Small weights are not reliable, so threshold them. Loss estimates for thresholded weights: l̂_{t,i} = c_{t,i} I{s_{t,(I_t,i)} >= ε} / sum_{j=1}^N s_{t,(j,i)} p_{t,j} I{s_{t,(j,i)} >= ε}.

Second attempt: Exp3-IXt algorithm, with l̂_{t,i} = c_{t,i} I{s_{t,(I_t,i)} >= ε} / sum_{j=1}^N s_{t,(j,i)} p_{t,j} I{s_{t,(j,i)} >= ε}. Theorem (regret bound of Exp3-IXt): when tuned properly, Exp3-IXt has a regret bound of E[R_T] = Õ(sqrt(α(ε) T) / ε), where α(ε) is the independence number of the thresholded graph: delete all edges with weight smaller than ε, then compute the independence number (ignoring the remaining weights).
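The thresholded independence number α(ε) can be sketched by brute force on a toy graph (the graph below is hypothetical, and the exhaustive search is only viable for tiny instances, since the problem is NP-hard in general):

```python
from itertools import combinations

# Brute-force alpha(eps) on a toy weighted graph: drop edges with weight
# below eps, then search for the largest independent set among all
# vertex subsets, largest first.
def alpha_eps(n, weighted_edges, eps):
    kept = {(u, v) for u, v, w in weighted_edges if w >= eps}
    kept |= {(v, u) for (u, v) in kept}     # treat edges as symmetric
    for k in range(n, 0, -1):
        for nodes in combinations(range(n), k):
            if all((u, v) not in kept for u, v in combinations(nodes, 2)):
                return k
    return 0

# Hypothetical 4-node example (a weighted 4-cycle).
E = [(0, 1, 0.9), (1, 2, 0.3), (2, 3, 0.8), (0, 3, 0.1)]
a_all  = alpha_eps(4, E, 0.05)   # all edges kept: 4-cycle, alpha = 2
a_none = alpha_eps(4, E, 0.95)   # all edges dropped: edgeless, alpha = 4
```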

Effective independence number (setting ε to its optimal value): α* = min_{ε in [0,1]} α(ε)/ε². Regret bound of Exp3-IXt: for the optimal value of ε, Exp3-IXt has a regret bound of E[R_T] = Õ(sqrt(α* T)).
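A sketch of α* on a toy graph, minimizing α(ε)/ε² over a grid of thresholds (the graph and the grid are illustrative choices; exact computation is NP-hard in general):

```python
from itertools import combinations

# alpha* = min over eps of alpha(eps)/eps^2, evaluated on a grid.
def alpha_eps(n, weighted_edges, eps):
    kept = {(u, v) for u, v, w in weighted_edges if w >= eps}
    kept |= {(v, u) for (u, v) in kept}     # treat edges as symmetric
    for k in range(n, 0, -1):
        for nodes in combinations(range(n), k):
            if all((u, v) not in kept for u, v in combinations(nodes, 2)):
                return k
    return 0

E = [(0, 1, 0.9), (1, 2, 0.3), (2, 3, 0.8), (0, 3, 0.1)]
grid = [0.1, 0.3, 0.8, 0.9, 1.0]            # candidate thresholds
alpha_star = min(alpha_eps(4, E, eps) / eps**2 for eps in grid)
```

Here the minimum is attained at ε = 0.8, where only the two strong edges survive, giving α(ε)/ε² = 2/0.64.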

α* can be much smaller than N. Example: for a complete graph with uniformly distributed U(0,1) weights, α* is of order log(N).

For a complete graph with U(c,1) weights, α* <= 1/c²: every weight is at least c, so thresholding at ε = c leaves the complete graph, hence α(c) = 1 and α* <= α(c)/c² = 1/c².

Question: how do we set ε? Answer: finding the optimal ε is hard. It is necessary to know the whole graph, and computing α(ε) is NP-hard.

Third attempt: Exp3-WIX algorithm, with l̂_{t,i} = s_{t,(I_t,i)} (s_{t,(I_t,i)} l_{t,i} + (1 - s_{t,(I_t,i)}) ξ_{t,i}) / sum_{j=1}^N s²_{t,(j,i)} p_{t,j} = s_{t,(I_t,i)} c_{t,i} / sum_{j=1}^N s²_{t,(j,i)} p_{t,j}. Properties of the estimates: unbiased, with smaller variance. Multiplying by s_{t,(I_t,i)} pulls the estimate towards zero when the weight is small, decreasing the variance of the noise.
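The Exp3-WIX estimate can also be checked by Monte Carlo; the weights, distribution, and loss value below are illustrative numbers:

```python
import numpy as np

# Monte Carlo check (illustrative numbers) that the Exp3-WIX estimate
# l_hat = s_{t,(I_t,i)} * c_{t,i} / sum_j s_{t,(j,i)}^2 p_{t,j}
# stays unbiased; the extra factor s_{t,(I_t,i)} shrinks the noise
# contributed by weak edges.
rng = np.random.default_rng(3)
p = np.full(4, 0.25)                       # uniform playing distribution
s = np.array([0.05, 0.3, 0.6, 1.0])        # weights s_{t,(j,i)} into node i
l = 0.4                                     # true loss l_{t,i}
M = 200000

j = rng.choice(4, size=M, p=p)              # played actions I_t
xi = rng.standard_normal(M)                 # zero-mean noise
c = s[j] * l + (1 - s[j]) * xi              # noisy observations c_{t,i}
estimates = s[j] * c / (s**2 * p).sum()
mean_est = estimates.mean()                 # should be close to l
```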

Third attempt: regret bound of Exp3-WIX. Theorem (regret bound of Exp3-WIX): when tuned properly, the Exp3-WIX algorithm has a regret bound of E[R_T] = Õ(sqrt(α* T)). Advantages over Exp3-IXt: no thresholding (all side observations are used); the algorithm does not need to know the best ε (which is NP-hard to find); regret bound of order sqrt(α* T).

Experiments: empirical performance. Exp3: basic algorithm that ignores all side observations. Exp3-IXt: thresholded algorithm (needs to set ε). Exp3-WIX: the proposed algorithm, with an adaptive learning rate. [Figure: cumulative regret as a function of the value of ε.]

Conclusion. A new setting with noisy side observations; introduction of the effective independence number α*; the Exp3-WIX algorithm for this setting: it does not need to threshold, does not need to know the whole graph, and has a regret bound of order sqrt(α* T). Open questions: Is the effective independence number the right quantity? Is there a matching lower bound for Exp3-WIX? The upper bound of Exp3-WIX matches the lower bound in some cases (e.g., bandits, full information, the setting of Mannor and Shamir 2011). A related lower bound (Wu et al. 2015) holds for a stochastic setting with Gaussian noise: R_T = Ω(sqrt(α(ε) T) / ε).

Thank you! Tomorrow's session: Poster 10. SequeL, Inria Lille, AISTATS 2016. Tomáš Kocák, tomas.kocak@inria.fr, sequel.lille.inria.fr


Near-Optimal Algorithms for Online Matrix Prediction

Near-Optimal Algorithms for Online Matrix Prediction JMLR: Workshop and Conference Proceedings vol 23 (2012) 38.1 38.13 25th Annual Conference on Learning Theory Near-Optimal Algorithms for Online Matrix Prediction Elad Hazan Technion - Israel Inst. of Tech.

More information

The Online Approach to Machine Learning

The Online Approach to Machine Learning The Online Approach to Machine Learning Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Online Approach to ML 1 / 53 Summary 1 My beautiful regret 2 A supposedly fun game I

More information

Machine Learning in the Data Revolution Era

Machine Learning in the Data Revolution Era Machine Learning in the Data Revolution Era Shai Shalev-Shwartz School of Computer Science and Engineering The Hebrew University of Jerusalem Machine Learning Seminar Series, Google & University of Waterloo,

More information

The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan

The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan Background: Global Optimization and Gaussian Processes The Geometry of Gaussian Processes and the Chaining Trick Algorithm

More information

arxiv: v1 [cs.lg] 23 Jan 2019

arxiv: v1 [cs.lg] 23 Jan 2019 Cooperative Online Learning: Keeping your Neighbors Updated Nicolò Cesa-Bianchi, Tommaso R. Cesari, and Claire Monteleoni 2 Dipartimento di Informatica, Università degli Studi di Milano, Italy 2 Department

More information

Administration. CSCI567 Machine Learning (Fall 2018) Outline. Outline. HW5 is available, due on 11/18. Practice final will also be available soon.

Administration. CSCI567 Machine Learning (Fall 2018) Outline. Outline. HW5 is available, due on 11/18. Practice final will also be available soon. Administration CSCI567 Machine Learning Fall 2018 Prof. Haipeng Luo U of Southern California Nov 7, 2018 HW5 is available, due on 11/18. Practice final will also be available soon. Remaining weeks: 11/14,

More information

Laplacian Filters. Sobel Filters. Laplacian Filters. Laplacian Filters. Laplacian Filters. Laplacian Filters

Laplacian Filters. Sobel Filters. Laplacian Filters. Laplacian Filters. Laplacian Filters. Laplacian Filters Sobel Filters Note that smoothing the image before applying a Sobel filter typically gives better results. Even thresholding the Sobel filtered image cannot usually create precise, i.e., -pixel wide, edges.

More information

Ensembles. Léon Bottou COS 424 4/8/2010

Ensembles. Léon Bottou COS 424 4/8/2010 Ensembles Léon Bottou COS 424 4/8/2010 Readings T. G. Dietterich (2000) Ensemble Methods in Machine Learning. R. E. Schapire (2003): The Boosting Approach to Machine Learning. Sections 1,2,3,4,6. Léon

More information

The information complexity of best-arm identification

The information complexity of best-arm identification The information complexity of best-arm identification Emilie Kaufmann, joint work with Olivier Cappé and Aurélien Garivier MAB workshop, Lancaster, January th, 206 Context: the multi-armed bandit model

More information

Littlestone s Dimension and Online Learnability

Littlestone s Dimension and Online Learnability Littlestone s Dimension and Online Learnability Shai Shalev-Shwartz Toyota Technological Institute at Chicago The Hebrew University Talk at UCSD workshop, February, 2009 Joint work with Shai Ben-David

More information

Online Learning with Costly Features and Labels

Online Learning with Costly Features and Labels Online Learning with Costly Features and Labels Navid Zolghadr Department of Computing Science University of Alberta zolghadr@ualberta.ca Gábor Bartók Department of Computer Science TH Zürich bartok@inf.ethz.ch

More information

8 Basics of Hypothesis Testing

8 Basics of Hypothesis Testing 8 Basics of Hypothesis Testing 4 Problems Problem : The stochastic signal S is either 0 or E with equal probability, for a known value E > 0. Consider an observation X = x of the stochastic variable X

More information

Learning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley

Learning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley Learning Methods for Online Prediction Problems Peter Bartlett Statistics and EECS UC Berkeley Course Synopsis A finite comparison class: A = {1,..., m}. 1. Prediction with expert advice. 2. With perfect

More information

Online (and Distributed) Learning with Information Constraints. Ohad Shamir

Online (and Distributed) Learning with Information Constraints. Ohad Shamir Online (and Distributed) Learning with Information Constraints Ohad Shamir Weizmann Institute of Science Online Algorithms and Learning Workshop Leiden, November 2014 Ohad Shamir Learning with Information

More information

Robustesse des techniques de Monte Carlo dans l analyse d événements rares

Robustesse des techniques de Monte Carlo dans l analyse d événements rares Institut National de Recherche en Informatique et Automatique Institut de Recherche en Informatique et Systèmes Aléatoires Robustesse des techniques de Monte Carlo dans l analyse d événements rares H.

More information

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Sébastien Bubeck Theory Group

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Sébastien Bubeck Theory Group Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems Sébastien Bubeck Theory Group Part 1: i.i.d., adversarial, and Bayesian bandit models i.i.d. multi-armed bandit, Robbins [1952]

More information

Generative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis

Generative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis Generative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis Stéphanie Allassonnière CIS, JHU July, 15th 28 Context : Computational Anatomy Context and motivations :

More information

Adaptivity and Optimism: An Improved Exponentiated Gradient Algorithm

Adaptivity and Optimism: An Improved Exponentiated Gradient Algorithm Adaptivity and Optimism: An Improved Exponentiated Gradient Algorithm Jacob Steinhardt Percy Liang Stanford University {jsteinhardt,pliang}@cs.stanford.edu Jun 11, 2013 J. Steinhardt & P. Liang (Stanford)

More information

Key words. online learning, multi-armed bandits, learning from experts, learning with partial feedback, graph theory

Key words. online learning, multi-armed bandits, learning from experts, learning with partial feedback, graph theory SIAM J. COMPUT. Vol. 46, No. 6, pp. 785 86 c 07 the authors NONSTOCHASTIC MULTI-ARMED BANDITS WITH GRAPH-STRUCTURED FEEDBACK NOGA ALON, NICOLÒ CESA-BIANCHI, CLAUDIO GENTILE, SHIE MANNOR, YISHAY MANSOUR,

More information

Project in Computational Game Theory: Communities in Social Networks

Project in Computational Game Theory: Communities in Social Networks Project in Computational Game Theory: Communities in Social Networks Eldad Rubinstein November 11, 2012 1 Presentation of the Original Paper 1.1 Introduction In this section I present the article [1].

More information

Bandits on graphs and structures

Bandits on graphs and structures Bandits on graphs and structures Michal Valko To cite this version: Michal Valko. Bandits on graphs and structures. Machine Learning [stat.ml]. École normale supérieure de Cachan - ENS Cachan, 2016.

More information

Thompson Sampling for the non-stationary Corrupt Multi-Armed Bandit

Thompson Sampling for the non-stationary Corrupt Multi-Armed Bandit European Worshop on Reinforcement Learning 14 (2018 October 2018, Lille, France. Thompson Sampling for the non-stationary Corrupt Multi-Armed Bandit Réda Alami Orange Labs 2 Avenue Pierre Marzin 22300,

More information

Lecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm. Lecturer: Sanjeev Arora

Lecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm. Lecturer: Sanjeev Arora princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm Lecturer: Sanjeev Arora Scribe: (Today s notes below are

More information

Tutorial: PART 1. Online Convex Optimization, A Game- Theoretic Approach to Learning.

Tutorial: PART 1. Online Convex Optimization, A Game- Theoretic Approach to Learning. Tutorial: PART 1 Online Convex Optimization, A Game- Theoretic Approach to Learning http://www.cs.princeton.edu/~ehazan/tutorial/tutorial.htm Elad Hazan Princeton University Satyen Kale Yahoo Research

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

Tutorial: PART 2. Online Convex Optimization, A Game- Theoretic Approach to Learning

Tutorial: PART 2. Online Convex Optimization, A Game- Theoretic Approach to Learning Tutorial: PART 2 Online Convex Optimization, A Game- Theoretic Approach to Learning Elad Hazan Princeton University Satyen Kale Yahoo Research Exploiting curvature: logarithmic regret Logarithmic regret

More information

6.207/14.15: Networks Lecture 12: Generalized Random Graphs

6.207/14.15: Networks Lecture 12: Generalized Random Graphs 6.207/14.15: Networks Lecture 12: Generalized Random Graphs 1 Outline Small-world model Growing random networks Power-law degree distributions: Rich-Get-Richer effects Models: Uniform attachment model

More information

Better Algorithms for Selective Sampling

Better Algorithms for Selective Sampling Francesco Orabona Nicolò Cesa-Bianchi DSI, Università degli Studi di Milano, Italy francesco@orabonacom nicolocesa-bianchi@unimiit Abstract We study online algorithms for selective sampling that use regularized

More information

OLSO. Online Learning and Stochastic Optimization. Yoram Singer August 10, Google Research

OLSO. Online Learning and Stochastic Optimization. Yoram Singer August 10, Google Research OLSO Online Learning and Stochastic Optimization Yoram Singer August 10, 2016 Google Research References Introduction to Online Convex Optimization, Elad Hazan, Princeton University Online Learning and

More information

Chris Bishop s PRML Ch. 8: Graphical Models

Chris Bishop s PRML Ch. 8: Graphical Models Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular

More information

Computer Intensive Methods in Mathematical Statistics

Computer Intensive Methods in Mathematical Statistics Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of

More information

Adaptive Policies for Online Ad Selection Under Chunked Reward Pricing Model

Adaptive Policies for Online Ad Selection Under Chunked Reward Pricing Model Adaptive Policies for Online Ad Selection Under Chunked Reward Pricing Model Dinesh Garg IBM India Research Lab, Bangalore garg.dinesh@in.ibm.com (Joint Work with Michael Grabchak, Narayan Bhamidipati,

More information

Online Learning and Sequential Decision Making

Online Learning and Sequential Decision Making Online Learning and Sequential Decision Making Emilie Kaufmann CNRS & CRIStAL, Inria SequeL, emilie.kaufmann@univ-lille.fr Research School, ENS Lyon, Novembre 12-13th 2018 Emilie Kaufmann Sequential Decision

More information

Pure Exploration in Finitely Armed and Continuous Armed Bandits

Pure Exploration in Finitely Armed and Continuous Armed Bandits Pure Exploration in Finitely Armed and Continuous Armed Bandits Sébastien Bubeck INRIA Lille Nord Europe, SequeL project, 40 avenue Halley, 59650 Villeneuve d Ascq, France Rémi Munos INRIA Lille Nord Europe,

More information

An Experimental Evaluation of High-Dimensional Multi-Armed Bandits

An Experimental Evaluation of High-Dimensional Multi-Armed Bandits An Experimental Evaluation of High-Dimensional Multi-Armed Bandits Naoki Egami Romain Ferrali Kosuke Imai Princeton University Talk at Political Data Science Conference Washington University, St. Louis

More information

Predicting Graph Labels using Perceptron. Shuang Song

Predicting Graph Labels using Perceptron. Shuang Song Predicting Graph Labels using Perceptron Shuang Song shs037@eng.ucsd.edu Online learning over graphs M. Herbster, M. Pontil, and L. Wainer, Proc. 22nd Int. Conf. Machine Learning (ICML'05), 2005 Prediction

More information

THE first formalization of the multi-armed bandit problem

THE first formalization of the multi-armed bandit problem EDIC RESEARCH PROPOSAL 1 Multi-armed Bandits in a Network Farnood Salehi I&C, EPFL Abstract The multi-armed bandit problem is a sequential decision problem in which we have several options (arms). We can

More information

Machine Learning, Midterm Exam: Spring 2009 SOLUTION

Machine Learning, Midterm Exam: Spring 2009 SOLUTION 10-601 Machine Learning, Midterm Exam: Spring 2009 SOLUTION March 4, 2009 Please put your name at the top of the table below. If you need more room to work out your answer to a question, use the back of

More information

Combining Interval and Probabilistic Uncertainty in Engineering Applications

Combining Interval and Probabilistic Uncertainty in Engineering Applications Combining Interval and Probabilistic Uncertainty in Engineering Applications Andrew Pownuk Computational Science Program University of Texas at El Paso El Paso, Texas 79968, USA ampownuk@utep.edu Page

More information

Learning with Exploration

Learning with Exploration Learning with Exploration John Langford (Yahoo!) { With help from many } Austin, March 24, 2011 Yahoo! wants to interactively choose content and use the observed feedback to improve future content choices.

More information

Quantum query complexity of entropy estimation

Quantum query complexity of entropy estimation Quantum query complexity of entropy estimation Xiaodi Wu QuICS, University of Maryland MSR Redmond, July 19th, 2017 J o i n t C e n t e r f o r Quantum Information and Computer Science Outline Motivation

More information

Online Forest Density Estimation

Online Forest Density Estimation Online Forest Density Estimation Frédéric Koriche CRIL - CNRS UMR 8188, Univ. Artois koriche@cril.fr UAI 16 1 Outline 1 Probabilistic Graphical Models 2 Online Density Estimation 3 Online Forest Density

More information