Multi-task Linear Bandits

1 Multi-task Linear Bandits. Marta Soare (INRIA Lille - Nord Europe), Alessandro Lazaric (INRIA Lille - Nord Europe), Ouais Alsharif (McGill University & Google), Joelle Pineau (McGill University). NIPS 2014 Workshop on Transfer and Multi-task Learning: Theory meets Practice, Montreal, December 2014.

2-3 Motivating example: movie recommendation system. A user is described by features (age, profession, ...) and provides ratings; the goal is to learn their preferences as fast as possible. When a new, similar user arrives, we would like to learn their preferences even faster, i.e., with fewer ratings.

4-6 Sequential multi-task linear bandit.
Linear bandit task: a set of arms $\mathcal{X} \subset \mathbb{R}^d$ with $|\mathcal{X}| = K$ and $\|x\|_2 \le L$ for all $x \in \mathcal{X}$.
Reward model for task $j$: $r_j(x) = x^\top \theta_j + \eta$, where $\theta_j \in \mathbb{R}^d$ is unknown and $\eta$ is $R$-sub-Gaussian noise.
Cumulative regret w.r.t. the optimal arm $x^*_j = \arg\max_{x \in \mathcal{X}} x^\top \theta_j$:
$$R_{n_j} = \sum_{s=1}^{n_j} (x^*_j - x_s)^\top \theta_j.$$
Task similarity: there exists $\varepsilon > 0$ such that for any pair of tasks $(\theta, \theta')$ we have $\|\theta - \theta'\|_2 \le \varepsilon$.
Goal: improve performance on a particular task by leveraging information from previous tasks!
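
To make the setting concrete, here is a minimal sketch (not from the slides) of a single linear bandit task and its cumulative regret, using Gaussian noise as one particular $R$-sub-Gaussian choice; the class name LinearBanditTask and its methods are illustrative only.

```python
import numpy as np

class LinearBanditTask:
    """One task j: a finite set of arms in R^d, a hidden parameter theta_j, noisy linear rewards."""
    def __init__(self, arms, theta, noise_std=0.1, rng=None):
        self.arms = np.asarray(arms, dtype=float)    # shape (K, d)
        self.theta = np.asarray(theta, dtype=float)  # hidden theta_j in R^d
        self.noise_std = noise_std                   # Gaussian noise is R-sub-Gaussian with R = noise_std
        self.rng = rng if rng is not None else np.random.default_rng()

    def pull(self, arm_index):
        """Return a noisy reward r_j(x) = x^T theta_j + eta for the chosen arm."""
        x = self.arms[arm_index]
        return float(x @ self.theta + self.rng.normal(0.0, self.noise_std))

    def regret(self, pulled_indices):
        """Cumulative regret of a sequence of pulls w.r.t. the optimal arm x*_j."""
        values = self.arms @ self.theta              # expected reward of every arm
        best = values.max()
        return float(sum(best - values[i] for i in pulled_indices))
```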

7 Single-task estimates.
OLS estimate: for any task $j$, the estimate of $\theta_j$ after $n$ rewards is
$$\hat\theta_{j,n} = A_{j,n}^{-1} b_{j,n}, \qquad A_{j,n} = \sum_{t=1}^{n} x_t x_t^\top, \qquad b_{j,n} = \sum_{t=1}^{n} x_t r_t.$$
Prediction error (Thm. 2 in [Abbasi-Yadkori et al., NIPS 2012], w.p. $1-\delta$):
$$\big| x^\top \theta_j - x^\top \hat\theta^\lambda_{j,n} \big| \;\le\; \|x\|_{(A^\lambda_{j,n})^{-1}} \Big( R \sqrt{d \log\tfrac{1 + nL^2/\lambda}{\delta}} + \lambda^{1/2} \|\theta_j\| \Big) \;=:\; B_{j,n}(x).$$
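
A minimal sketch of the per-task regularized least-squares estimate and of a confidence width of the form $B_{j,n}(x)$ above; the helper names ridge_estimate and confidence_width are mine, and the constants $R$, $L$, $\lambda$, $\delta$, $S$ stand for the quantities appearing in the bound.

```python
import numpy as np

def ridge_estimate(X_pulled, rewards, lam=1.0):
    """Regularized least squares: A = lam*I + sum_t x_t x_t^T, b = sum_t x_t r_t, theta_hat = A^{-1} b."""
    X_pulled = np.asarray(X_pulled, dtype=float)   # shape (n, d)
    rewards = np.asarray(rewards, dtype=float)     # shape (n,)
    d = X_pulled.shape[1]
    A = lam * np.eye(d) + X_pulled.T @ X_pulled
    b = X_pulled.T @ rewards
    return np.linalg.solve(A, b), A

def confidence_width(x, A, n, R=0.1, L=1.0, lam=1.0, delta=0.05, S=1.0):
    """B(x) = ||x||_{A^{-1}} * ( R * sqrt(d * log((1 + n L^2 / lam) / delta)) + sqrt(lam) * S )."""
    x = np.asarray(x, dtype=float)
    d = A.shape[0]
    norm_x = np.sqrt(x @ np.linalg.solve(A, x))    # ||x||_{A^{-1}}
    beta = R * np.sqrt(d * np.log((1.0 + n * L**2 / lam) / delta)) + np.sqrt(lam) * S
    return norm_x * beta
```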

8-10 Multi-task estimates.
Pool the statistics of the $m$ previous tasks with those of the current task $m+1$:
$$\tilde\theta_{m+1,t} = \tilde A_{m+1,t}^{-1} \tilde b_{m+1,t}, \qquad \tilde A_{m+1,t} = \sum_{j=1}^{m} A_j + A_{m+1,t}, \qquad \tilde b_{m+1,t} = \sum_{j=1}^{m} b_j + b_{m+1,t}.$$
Since the pooled data come from tasks whose parameters may differ from $\theta_{m+1}$ by up to $\varepsilon$, the multi-task estimate is biased: in general $\mathbb{E}[\tilde\theta_{m+1,t}] \neq \theta_{m+1}$.

Theorem. Let $\tilde\theta^\lambda_{m+1,t}$ be the multi-task regularized least-squares estimate. Then, for any $\delta > 0$ and any $t \ge 1$, with probability greater than $1-\delta$,
$$x^\top \big( \tilde\theta^\lambda_{m+1,t} - \theta_{m+1} \big) \;\le\; \|x\|_{(\tilde A^\lambda_{m+1,t})^{-1}} \bigg( R \sqrt{d \log\frac{\det(\tilde A^\lambda_{m+1,t})^{1/2}}{\det(\lambda I)^{1/2}\,\delta}} + \lambda^{1/2} S \bigg) + \varepsilon \|x\| \;=:\; \tilde B_{m+1,t}(x).$$
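
A sketch of how the pooled estimate and the wider multi-task confidence bound could be computed from per-task statistics; the $\varepsilon \|x\|$ bias term follows the theorem above, and the function names and default constants are illustrative assumptions, not the authors' code.

```python
import numpy as np

def multitask_estimate(prev_stats, A_cur, b_cur, lam=1.0):
    """Pool the statistics of previous tasks with the current task.

    prev_stats: list of (A_j, b_j) pairs from the m previous tasks (without their lam*I part).
    Returns (theta_tilde, A_tilde) with A_tilde = lam*I + sum_j A_j + A_cur.
    """
    d = A_cur.shape[0]
    A_tilde = lam * np.eye(d) + A_cur
    b_tilde = b_cur.copy()
    for A_j, b_j in prev_stats:
        A_tilde = A_tilde + A_j
        b_tilde = b_tilde + b_j
    return np.linalg.solve(A_tilde, b_tilde), A_tilde

def multitask_width(x, A_tilde, R=0.1, lam=1.0, delta=0.05, S=1.0, eps=0.1):
    """B_tilde(x): self-normalized confidence term plus the eps * ||x|| bias due to task differences."""
    x = np.asarray(x, dtype=float)
    d = A_tilde.shape[0]
    norm_x = np.sqrt(x @ np.linalg.solve(A_tilde, x))          # ||x||_{A_tilde^{-1}}
    log_det_ratio = 0.5 * (np.linalg.slogdet(A_tilde)[1] - d * np.log(lam))
    beta = R * np.sqrt(d * (log_det_ratio + np.log(1.0 / delta))) + np.sqrt(lam) * S
    return norm_x * beta + eps * np.linalg.norm(x)
```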

11 MT-LinUCB.
Input: budgets $\{n_j\}_j$, arms $\mathcal{X} \subset \mathbb{R}^d$, regularizer $\lambda$
Initialize: $j = 1$, $A = \lambda I_d$, $b = \tilde b = 0_d$, $\tilde A_j = \lambda I_d$, $\hat\theta_j = A^{-1} b$
for $j = 1, \dots, m+1$ do
    Compute the multi-task OLS solution: $\tilde A_j = \tilde A_j + A - \lambda I_d$, $\tilde b = \tilde b + b$, $\tilde\theta_j = \tilde A_j^{-1} \tilde b$
    Compute the task-specific OLS solution: $A = \lambda I_d$, $b = 0_d$, $\hat\theta_j = A^{-1} b$
    for $t = 1, \dots, n_j$ do
        Select the arm according to the tightest bound: $x_t = \arg\max_{x \in \mathcal{X}} \min\big[ x^\top \hat\theta_j + B_{j,t}(x);\; x^\top \tilde\theta_j + \tilde B_{j,t}(x) \big]$
        Observe the reward: $r_t = x_t^\top \theta_j + \eta_t$
        Update: $A$, $b$, $\hat\theta_j$, $B_{j,t}$, $\tilde A_j$, $\tilde b$, $\tilde\theta_j$, $\tilde B_{j,t}$
    end for
end for
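
Putting the pieces together, the selection rule of MT-LinUCB can be sketched as below, reusing the illustrative helpers and the LinearBanditTask class from the earlier snippets; this is an informal rendering of the pseudocode under those assumptions, not the authors' implementation.

```python
import numpy as np

def mt_linucb_task(task, n_rounds, A_tilde, b_tilde, lam=1.0, R=0.1, L=1.0,
                   delta=0.05, S=1.0, eps=0.1):
    """Run MT-LinUCB on one task with budget n_rounds, starting from pooled statistics (A_tilde, b_tilde).

    Returns the pulled arm indices and the updated pooled statistics.
    """
    d = task.arms.shape[1]
    A, b = lam * np.eye(d), np.zeros(d)                  # task-specific statistics
    pulled = []
    for t in range(1, n_rounds + 1):
        theta_hat = np.linalg.solve(A, b)                # single-task estimate
        theta_tilde = np.linalg.solve(A_tilde, b_tilde)  # multi-task estimate
        scores = []
        for x in task.arms:
            ucb_single = x @ theta_hat + confidence_width(x, A, t, R, L, lam, delta, S)
            ucb_multi = x @ theta_tilde + multitask_width(x, A_tilde, R, lam, delta, S, eps)
            scores.append(min(ucb_single, ucb_multi))    # keep the tightest upper bound
        i = int(np.argmax(scores))
        r = task.pull(i)
        x = task.arms[i]
        A += np.outer(x, x); b += r * x                  # update task-specific statistics
        A_tilde += np.outer(x, x); b_tilde += r * x      # update pooled statistics
        pulled.append(i)
    return pulled, A_tilde, b_tilde
```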

12 Experiments. 200 tasks with $\theta_1, \dots, \theta_{200} \in \mathbb{R}^2$ randomly generated, $\max \|\theta - \theta'\|_2 =$ [...], [...] samples for each task, $\varepsilon =$ [...]. MT-LinUCB is never worse than LinUCB. [Figure: regret for each task, LinUCB vs. MT-LinUCB, as a function of the task number.]

13 Experiments. As the difference between tasks shrinks, the advantage of MT-LinUCB becomes more evident. For $\varepsilon = 0.2$, the regret of MT-LinUCB decreases with every additional task, while the regret of LinUCB remains essentially constant. [Figure: regret for each task, LinUCB vs. MT-LinUCB, as a function of the task number.]
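
One way to reproduce this kind of experiment with the snippets above: generate task parameters inside an $\varepsilon$-ball, run MT-LinUCB sequentially while carrying the pooled statistics across tasks, and record the per-task regret. The number of tasks, dimension, budget, and $\varepsilon$ below are placeholder values, not the ones used on the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, n_tasks, budget, eps = 2, 20, 50, 200, 0.2

arms = rng.normal(size=(K, d))
arms /= np.linalg.norm(arms, axis=1, keepdims=True)       # arms on the unit sphere, so L = 1

center = rng.normal(size=d)
thetas = []
for _ in range(n_tasks):
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)
    radius = (eps / 2) * rng.uniform() ** (1.0 / d)        # uniform in a ball of radius eps/2
    thetas.append(center + radius * u)                     # pairwise distances are at most eps

A_tilde, b_tilde = np.eye(d), np.zeros(d)                  # pooled statistics (lam = 1)
regrets = []
for theta in thetas:
    task = LinearBanditTask(arms, theta, noise_std=0.1, rng=rng)
    pulled, A_tilde, b_tilde = mt_linucb_task(task, budget, A_tilde, b_tilde, eps=eps)
    regrets.append(task.regret(pulled))
print(regrets)   # per-task cumulative regret; transfer should help as more tasks are seen
```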

14 Follow-up work. Regret bound for MT-LinUCB. Additional experiments on datasets (see the Boston Housing example on the poster). Weighting scheme according to task relevance.

15 Thank you! SequeL team, INRIA Lille.
