Online learning with noisy side observations
1 Online learning with noisy side observations. Tomáš Kocák (Inria Lille - Nord Europe, France), Gergely Neu (DTIC, Universitat Pompeu Fabra, Barcelona, Spain), Michal Valko (Inria Lille - Nord Europe, France). SequeL, Inria Lille. AISTATS 2016, Cádiz, April 2016.
2 Motivation: fishing spots. Every day: choose a fishing spot (at the beginning of the day) and maximize the total number of fish caught. This example is motivated by an example of Wu et al. 2015. Online learning with noisy side observations SequeL - 1/17
4 Problem representation: previous work. The representation introduced by Mannor and Shamir (2011): actions are nodes of a directed, unweighted graph; playing an action reveals the losses $c_{t,j} = \ell_{t,j}$ of all neighbors $j \in N(I_t)$. Perfect observations are in many cases not realistic.
6 Problem representation in our work. Actions are nodes of a directed, weighted graph; playing an action reveals noisy losses $c_{t,j}$ of the neighbors $j \in N(I_t)$. A smaller weight $s_{t,(I_t,j)}$ means bigger noise: $c_{t,j} = s_{t,(I_t,j)}\ell_{t,j} + (1 - s_{t,(I_t,j)})\xi_{t,j}$. (Figure: example graph; edges labeled with observations such as $0.9\ell + 0.1\xi$ and $\ell + 0\xi$.)
8 Framework: description and goals. Learning process: $N$ actions (nodes of a graph), $T$ rounds. In each round: the environment sets losses for the actions and chooses a graph structure (not disclosed); the learner picks an action $I_t$ to play, incurs the loss $\ell_{t,I_t}$, observes the graph, and observes noisy losses $c_{t,j}$ of the neighbors $j \in N(I_t)$. Goal of the learner: minimize the cumulative regret $R_T = \sum_{t=1}^{T} \ell_{t,I_t} - \min_{j \in [N]} \sum_{t=1}^{T} \ell_{t,j}$, i.e., the learner's total loss minus that of the best fixed action.
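The protocol and regret definition above can be simulated with a minimal sketch. The loss generator and the uniform-random learner below are hypothetical placeholders for illustration, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 5, 1000

# Hypothetical environment: fixed mean losses per action, clipped to [0, 1].
mean_loss = rng.uniform(size=N)

cum_loss = 0.0                 # learner's total loss
cum_per_action = np.zeros(N)   # total loss of each fixed action
for t in range(T):
    losses = np.clip(mean_loss + 0.1 * rng.standard_normal(N), 0.0, 1.0)
    I_t = rng.integers(N)      # placeholder learner: play uniformly at random
    cum_loss += losses[I_t]
    cum_per_action += losses

# R_T = sum_t l_{t,I_t} - min_j sum_t l_{t,j}
regret = cum_loss - cum_per_action.min()
```

A uniform player accumulates the average loss while the comparator is the best single action, so the regret here grows linearly in $T$; the algorithms in the talk aim for sublinear regret instead.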
9 Regret bounds - special cases (unweighted graphs).
Edgeless graph (bandit problem): no side observations; regret bound of $\tilde O(\sqrt{NT})$.
Complete graph (full information): all side observations; regret bound of $\tilde O(\sqrt{T})$.
General unweighted graph (Mannor and Shamir 2011): some side observations; regret bound of $\tilde O(\sqrt{\alpha T})$, where $\alpha$ is the independence number: the size of the largest independent set, i.e., the largest set of mutually non-adjacent nodes. In the slide's example graph, $\alpha = 5$.
Since $1 \le \alpha \le N$, these bounds interpolate: $\sqrt{T} \le \sqrt{\alpha T} \le \sqrt{NT}$.
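To make the independence number concrete, here is a brute-force computation on a toy graph (not the graph from the slide's figure); this is exponential-time and only suitable for tiny graphs, since computing the independence number is NP-hard in general.

```python
from itertools import combinations

def independence_number(n, edges):
    """Size of the largest independent set of an undirected graph on
    nodes 0..n-1, by brute force over vertex subsets (toy sizes only)."""
    adj = {frozenset(e) for e in edges}
    for k in range(n, 0, -1):
        for nodes in combinations(range(n), k):
            if all(frozenset(pair) not in adj
                   for pair in combinations(nodes, 2)):
                return k
    return 0

# Triangle 0-1-2 plus three isolated nodes: the largest independent set
# takes one triangle vertex and all three isolated nodes, so alpha = 4.
alpha = independence_number(6, [(0, 1), (1, 2), (0, 2)])
```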
15 General template: Exp3-type algorithm. In every round: compute exponential weights using the loss estimates $\hat\ell_{s,i}$, namely $w_{t,i} = \exp\bigl(-\eta_t \sum_{s=1}^{t-1} \hat\ell_{s,i}\bigr)$; create a probability distribution such that $p_{t,i} \propto w_{t,i}$; play action $I_t$ with $\mathbb{P}(I_t = i) = p_{t,i} = w_{t,i} / \sum_{j=1}^{N} w_{t,j}$; create loss estimates $\hat\ell_{t,i}$ using the observability graph. The loss estimates define the algorithm and compensate for the lack of side observations. Question: what are good loss estimates?
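A minimal sketch of the exponential-weights step of this template (for losses, so the exponent carries a negative sign; subtracting the minimum is a numerical-stability detail, not from the slides):

```python
import numpy as np

def exp3_distribution(loss_estimate_history, eta):
    """p_{t,i} = w_{t,i} / sum_j w_{t,j}, with
    w_{t,i} = exp(-eta * sum_{s<t} lhat_{s,i})."""
    L = np.sum(loss_estimate_history, axis=0)  # cumulative estimated losses
    w = np.exp(-eta * (L - L.min()))           # shift by min for stability
    return w / w.sum()

# Action 0 has accumulated less estimated loss, so it gets more probability.
p = exp3_distribution(np.array([[0.1, 0.9], [0.2, 0.8]]), eta=1.0)
```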
17 Loss estimates. Desired property of the loss estimates: good loss approximation (unbiased estimates), $\mathbb{E}[\hat\ell_{t,i}] \approx \ell_{t,i}$. We can use the noisy side observations: $\mathbb{E}[c_{t,i}] = \mathbb{E}\bigl[s_{t,(I_t,i)}\ell_{t,i}\bigr] + \mathbb{E}\bigl[(1 - s_{t,(I_t,i)})\xi_{t,i}\bigr] = \bigl(\sum_{j=1}^{N} s_{t,(j,i)}\, p_{t,j}\bigr)\ell_{t,i} + 0$. This suggests the loss estimate $\hat\ell_{t,i} = c_{t,i} \big/ \sum_{j=1}^{N} s_{t,(j,i)}\, p_{t,j}$.
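The unbiasedness of this estimate can be checked by Monte Carlo. The particular weights, losses, and Gaussian noise below are arbitrary choices made for the check, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 4, 200_000
p = np.array([0.4, 0.3, 0.2, 0.1])     # distribution over played actions
S = rng.uniform(size=(N, N))           # S[j, i] = s_{(j,i)}, edge weights
ell = rng.uniform(size=N)              # true losses
denom = S.T @ p                        # sum_j s_{(j,i)} p_j, per action i

I = rng.choice(N, size=M, p=p)         # played actions over M rounds
xi = rng.standard_normal((M, N))       # zero-mean noise
c = S[I] * ell + (1.0 - S[I]) * xi     # noisy observations c_{t,i}
est = (c / denom).mean(axis=0)         # empirical mean of lhat_{t,i}
# est should match ell, up to Monte Carlo error
```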
20 Loss estimates: first attempt. $\hat\ell_{t,i} = \dfrac{s_{t,(I_t,i)}\ell_{t,i} + (1 - s_{t,(I_t,i)})\xi_{t,i}}{\sum_{j=1}^{N} s_{t,(j,i)}\, p_{t,j}}$. Pros: unbiased estimates (good approximation of the real losses). Cons: unreliable observations with small weights are included, leading to a large variance of the estimates and no theoretical guarantees!
22 Second attempt: thresholding (Exp3-IXt algorithm). Small weights are not reliable, so observations whose weight falls below a threshold $\varepsilon$ are discarded. Loss estimates for thresholded weights: $\hat\ell_{t,i} = \dfrac{c_{t,i}\,\mathbb{I}\{s_{t,(I_t,i)} \ge \varepsilon\}}{\sum_{j=1}^{N} s_{t,(j,i)}\, p_{t,j}\,\mathbb{I}\{s_{t,(j,i)} \ge \varepsilon\}}$.
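This thresholded estimate can be sketched as a small function; this is an illustrative reading of the formula, not the authors' implementation, and the zero-denominator guard is an added assumption for when no edge clears the threshold.

```python
import numpy as np

def ixt_loss_estimates(c, s_played, S, p, eps):
    """Thresholded estimates in the spirit of Exp3-IXt: observations whose
    edge weight is below eps are discarded, and the normalizer only counts
    actions j whose weight s_{(j,i)} clears the threshold."""
    kept = S >= eps                       # which edges survive thresholding
    denom = (S * kept).T @ p              # sum_j s_{(j,i)} p_j 1{s >= eps}
    safe = np.where(denom > 0, denom, 1.0)
    return np.where(denom > 0, c * (s_played >= eps) / safe, 0.0)
```

With `eps = 0` nothing is discarded and the plain importance-weighted estimate is recovered.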
27 Second attempt: Exp3-IXt algorithm. Theorem (regret bound of Exp3-IXt): when tuned properly, Exp3-IXt has a regret bound of $\mathbb{E}[R_T] = \tilde O\Bigl(\sqrt{\tfrac{\alpha(\varepsilon)}{\varepsilon^2}\, T}\Bigr)$, where $\alpha(\varepsilon)$ is the independence number of the thresholded graph: delete all edges with weight smaller than $\varepsilon$, then compute the independence number (ignoring weights).
29 Effective independence number. Setting $\varepsilon$ to its optimal value gives the effective independence number $\alpha^* = \min_{\varepsilon \in (0,1]} \tfrac{\alpha(\varepsilon)}{\varepsilon^2}$. Regret bound of Exp3-IXt: for the optimal value of $\varepsilon$, Exp3-IXt has a regret bound of $\mathbb{E}[R_T] = \tilde O(\sqrt{\alpha^* T})$.
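For a tiny weighted graph, $\alpha^*$ can be approximated by brute force over a grid of thresholds; the grid and the toy graph below are illustrative assumptions (the exact minimum over $\varepsilon$ is NP-hard to compute, as the next slide notes).

```python
import numpy as np
from itertools import combinations

def independence_number(adj):
    """Brute-force independence number of a boolean adjacency matrix."""
    n = len(adj)
    for k in range(n, 0, -1):
        for nodes in combinations(range(n), k):
            if not any(adj[a][b] for a, b in combinations(nodes, 2)):
                return k
    return 0

def effective_independence_number(W, eps_grid):
    """alpha* ~= min over eps of alpha(eps) / eps^2, where alpha(eps) is the
    independence number after deleting edges with weight < eps."""
    best = np.inf
    for eps in eps_grid:
        adj = W >= eps
        np.fill_diagonal(adj, False)
        best = min(best, independence_number(adj) / eps**2)
    return best

# Complete graph with all weights 1: alpha(eps) = 1 for every eps,
# so the minimum is attained at eps = 1 and alpha* = 1.
W = np.ones((4, 4))
alpha_star = effective_independence_number(W, np.linspace(0.1, 1.0, 10))
```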
31 Effective independence number: $\alpha^*$ can be much smaller than $N$. Example: a complete graph with uniformly distributed weights. For $U(0,1)$ weights, $\alpha^* = O(\log N)$; for $U(c,1)$ weights, $\alpha^* = O(1/c^2)$.
33 Question: how do we set $\varepsilon$? Answer: finding the optimal $\varepsilon$ is hard. It is necessary to know the whole graph, and computing $\alpha(\varepsilon)$ is NP-hard.
35 Third attempt: Exp3-WIX algorithm. $\hat\ell_{t,i} = \dfrac{s_{t,(I_t,i)}\bigl[s_{t,(I_t,i)}\ell_{t,i} + (1 - s_{t,(I_t,i)})\xi_{t,i}\bigr]}{\sum_{j=1}^{N} s^2_{t,(j,i)}\, p_{t,j}}$. Properties of the estimates: unbiased, with smaller variance. Multiplying by $s_{t,(I_t,i)}$ pulls the estimate towards zero when the weight is small, decreasing the variance contributed by the noise.
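The weighted estimate can be sketched as follows; again an illustrative reading of the formula rather than the authors' code. Note $c_i = s_{(I_t,i)}\ell_i + (1 - s_{(I_t,i)})\xi_i$ is the observation, so the squared weights in the normalizer are exactly what keeps the estimate unbiased after scaling by $s_{(I_t,i)}$.

```python
import numpy as np

def wix_loss_estimates(c, s_played, S, p):
    """Weighted estimates in the spirit of Exp3-WIX:
    lhat_i = s_{(I_t,i)} * c_i / sum_j s_{(j,i)}^2 p_j.
    Scaling by s_{(I_t,i)} shrinks unreliable small-weight observations
    toward zero (assumes every action i has some positive-weight in-edge,
    so the normalizer is nonzero)."""
    denom = (S ** 2).T @ p
    return s_played * c / denom
```

With all weights equal to 1 the normalizer is $\sum_j p_j = 1$ and the estimate reduces to the raw observation, as in the full-information case.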
38 Third attempt: regret bound of Exp3-WIX. Theorem (regret bound of Exp3-WIX): when tuned properly, the Exp3-WIX algorithm has a regret bound of $\mathbb{E}[R_T] = \tilde O(\sqrt{\alpha^* T})$. Advantages over the Exp3-IXt algorithm: no thresholding (all side observations are used); the algorithm does not need to know the best $\varepsilon$ (which is NP-hard to compute); regret bound of order $\sqrt{\alpha^* T}$.
40 Experiments: empirical performance. Exp3 - basic algorithm which ignores all side observations; Exp3-IXt - thresholded algorithm (needs to set $\varepsilon$); Exp3-WIX - the proposed algorithm, run with an adaptive learning rate. (Figure: cumulative regret as a function of the value of $\varepsilon$ for Exp3, Exp3-IXt, and Exp3-WIX.)
41 Conclusion. A new setting with noisy side observations; introduction of the effective independence number $\alpha^*$; the Exp3-WIX algorithm for this setting, which does not need to threshold, does not need to know the whole graph, and has a regret bound of order $\sqrt{\alpha^* T}$. Open questions: Is the effective independence number the right quantity? Is there a matching lower bound for Exp3-WIX? The upper bound of Exp3-WIX matches the lower bound in some cases (e.g., bandits, full information, the setting of Mannor and Shamir 2011). A related lower bound (Wu et al. 2015) holds for a stochastic setting with Gaussian noise: $R_T = \Omega\Bigl(\sqrt{\tfrac{\alpha(\varepsilon)}{\varepsilon^2}\, T}\Bigr)$.
42 Thank you! Tomorrow's session: Poster 10. Tomáš Kocák, tomas.kocak@inria.fr, sequel.lille.inria.fr. SequeL, Inria Lille, AISTATS 2016.
Online (and Distributed) Learning with Information Constraints Ohad Shamir Weizmann Institute of Science Online Algorithms and Learning Workshop Leiden, November 2014 Ohad Shamir Learning with Information
More informationRobustesse des techniques de Monte Carlo dans l analyse d événements rares
Institut National de Recherche en Informatique et Automatique Institut de Recherche en Informatique et Systèmes Aléatoires Robustesse des techniques de Monte Carlo dans l analyse d événements rares H.
More informationRegret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Sébastien Bubeck Theory Group
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems Sébastien Bubeck Theory Group Part 1: i.i.d., adversarial, and Bayesian bandit models i.i.d. multi-armed bandit, Robbins [1952]
More informationGenerative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis
Generative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis Stéphanie Allassonnière CIS, JHU July, 15th 28 Context : Computational Anatomy Context and motivations :
More informationAdaptivity and Optimism: An Improved Exponentiated Gradient Algorithm
Adaptivity and Optimism: An Improved Exponentiated Gradient Algorithm Jacob Steinhardt Percy Liang Stanford University {jsteinhardt,pliang}@cs.stanford.edu Jun 11, 2013 J. Steinhardt & P. Liang (Stanford)
More informationKey words. online learning, multi-armed bandits, learning from experts, learning with partial feedback, graph theory
SIAM J. COMPUT. Vol. 46, No. 6, pp. 785 86 c 07 the authors NONSTOCHASTIC MULTI-ARMED BANDITS WITH GRAPH-STRUCTURED FEEDBACK NOGA ALON, NICOLÒ CESA-BIANCHI, CLAUDIO GENTILE, SHIE MANNOR, YISHAY MANSOUR,
More informationProject in Computational Game Theory: Communities in Social Networks
Project in Computational Game Theory: Communities in Social Networks Eldad Rubinstein November 11, 2012 1 Presentation of the Original Paper 1.1 Introduction In this section I present the article [1].
More informationBandits on graphs and structures
Bandits on graphs and structures Michal Valko To cite this version: Michal Valko. Bandits on graphs and structures. Machine Learning [stat.ml]. École normale supérieure de Cachan - ENS Cachan, 2016.
More informationThompson Sampling for the non-stationary Corrupt Multi-Armed Bandit
European Worshop on Reinforcement Learning 14 (2018 October 2018, Lille, France. Thompson Sampling for the non-stationary Corrupt Multi-Armed Bandit Réda Alami Orange Labs 2 Avenue Pierre Marzin 22300,
More informationLecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm. Lecturer: Sanjeev Arora
princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm Lecturer: Sanjeev Arora Scribe: (Today s notes below are
More informationTutorial: PART 1. Online Convex Optimization, A Game- Theoretic Approach to Learning.
Tutorial: PART 1 Online Convex Optimization, A Game- Theoretic Approach to Learning http://www.cs.princeton.edu/~ehazan/tutorial/tutorial.htm Elad Hazan Princeton University Satyen Kale Yahoo Research
More informationGaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012
Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature
More informationTutorial: PART 2. Online Convex Optimization, A Game- Theoretic Approach to Learning
Tutorial: PART 2 Online Convex Optimization, A Game- Theoretic Approach to Learning Elad Hazan Princeton University Satyen Kale Yahoo Research Exploiting curvature: logarithmic regret Logarithmic regret
More information6.207/14.15: Networks Lecture 12: Generalized Random Graphs
6.207/14.15: Networks Lecture 12: Generalized Random Graphs 1 Outline Small-world model Growing random networks Power-law degree distributions: Rich-Get-Richer effects Models: Uniform attachment model
More informationBetter Algorithms for Selective Sampling
Francesco Orabona Nicolò Cesa-Bianchi DSI, Università degli Studi di Milano, Italy francesco@orabonacom nicolocesa-bianchi@unimiit Abstract We study online algorithms for selective sampling that use regularized
More informationOLSO. Online Learning and Stochastic Optimization. Yoram Singer August 10, Google Research
OLSO Online Learning and Stochastic Optimization Yoram Singer August 10, 2016 Google Research References Introduction to Online Convex Optimization, Elad Hazan, Princeton University Online Learning and
More informationChris Bishop s PRML Ch. 8: Graphical Models
Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular
More informationComputer Intensive Methods in Mathematical Statistics
Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of
More informationAdaptive Policies for Online Ad Selection Under Chunked Reward Pricing Model
Adaptive Policies for Online Ad Selection Under Chunked Reward Pricing Model Dinesh Garg IBM India Research Lab, Bangalore garg.dinesh@in.ibm.com (Joint Work with Michael Grabchak, Narayan Bhamidipati,
More informationOnline Learning and Sequential Decision Making
Online Learning and Sequential Decision Making Emilie Kaufmann CNRS & CRIStAL, Inria SequeL, emilie.kaufmann@univ-lille.fr Research School, ENS Lyon, Novembre 12-13th 2018 Emilie Kaufmann Sequential Decision
More informationPure Exploration in Finitely Armed and Continuous Armed Bandits
Pure Exploration in Finitely Armed and Continuous Armed Bandits Sébastien Bubeck INRIA Lille Nord Europe, SequeL project, 40 avenue Halley, 59650 Villeneuve d Ascq, France Rémi Munos INRIA Lille Nord Europe,
More informationAn Experimental Evaluation of High-Dimensional Multi-Armed Bandits
An Experimental Evaluation of High-Dimensional Multi-Armed Bandits Naoki Egami Romain Ferrali Kosuke Imai Princeton University Talk at Political Data Science Conference Washington University, St. Louis
More informationPredicting Graph Labels using Perceptron. Shuang Song
Predicting Graph Labels using Perceptron Shuang Song shs037@eng.ucsd.edu Online learning over graphs M. Herbster, M. Pontil, and L. Wainer, Proc. 22nd Int. Conf. Machine Learning (ICML'05), 2005 Prediction
More informationTHE first formalization of the multi-armed bandit problem
EDIC RESEARCH PROPOSAL 1 Multi-armed Bandits in a Network Farnood Salehi I&C, EPFL Abstract The multi-armed bandit problem is a sequential decision problem in which we have several options (arms). We can
More informationMachine Learning, Midterm Exam: Spring 2009 SOLUTION
10-601 Machine Learning, Midterm Exam: Spring 2009 SOLUTION March 4, 2009 Please put your name at the top of the table below. If you need more room to work out your answer to a question, use the back of
More informationCombining Interval and Probabilistic Uncertainty in Engineering Applications
Combining Interval and Probabilistic Uncertainty in Engineering Applications Andrew Pownuk Computational Science Program University of Texas at El Paso El Paso, Texas 79968, USA ampownuk@utep.edu Page
More informationLearning with Exploration
Learning with Exploration John Langford (Yahoo!) { With help from many } Austin, March 24, 2011 Yahoo! wants to interactively choose content and use the observed feedback to improve future content choices.
More informationQuantum query complexity of entropy estimation
Quantum query complexity of entropy estimation Xiaodi Wu QuICS, University of Maryland MSR Redmond, July 19th, 2017 J o i n t C e n t e r f o r Quantum Information and Computer Science Outline Motivation
More informationOnline Forest Density Estimation
Online Forest Density Estimation Frédéric Koriche CRIL - CNRS UMR 8188, Univ. Artois koriche@cril.fr UAI 16 1 Outline 1 Probabilistic Graphical Models 2 Online Density Estimation 3 Online Forest Density
More information