Multi-task Linear Bandits
|
|
- Rosalind Wiggins
- 5 years ago
- Views:
Transcription
1 Multi-task Linear Bandits Marta Soare Alessandro Lazaric Ouais Alsharif Joelle Pineau INRIA Lille - Nord Europe INRIA Lille - Nord Europe McGill University & Google McGill University NIPS2014 Workshop on Transfer and Multi-task Learning : Theory meets Practice Montreal, December 2014
2 Motivating example: Movie recommendation system Features: age, profession,... Ratings Learn his preference as fast as possible! Marta Soare Multi-task Linear Bandits 2/10
3 Motivating example: Movie recommendation system Features: age, profession,... Ratings Learn his preference as fast as possible! Features: age, profession,... Learn his preference faster (with fewer ratings)! Marta Soare Multi-task Linear Bandits 2/10
4 Sequential multi-task linear bandit Linear bandit task: Set of arms X R d, X = K and x 2 L, x X. Reward model for task j r j (x) = x θ j + η where θ j R d is unknown and η is an R sub-gaussian noise. Cumulative regret w.r.t. the optimal arm = arg max x X x θ j x j n j R nj = s=1 (x j x s ) θ j Marta Soare Multi-task Linear Bandits 3/10
5 Sequential multi-task linear bandit Linear bandit task: Set of arms X R d, X = K and x 2 L, x X. Reward model for task j r j (x) = x θ j + η where θ j R d is unknown and η is an R sub-gaussian noise. Cumulative regret w.r.t. the optimal arm = arg max x X x θ j x j n j R nj = s=1 (x j x s ) θ j Task Similarity: there exists ε > 0 such that for any pair of tasks (θ, θ ) we have θ θ 2 ε. Marta Soare Multi-task Linear Bandits 3/10
6 Sequential multi-task linear bandit Linear bandit task: Set of arms X R d, X = K and x 2 L, x X. Reward model for task j r j (x) = x θ j + η where θ j R d is unknown and η is an R sub-gaussian noise. Cumulative regret w.r.t. the optimal arm = arg max x X x θ j x j n j R nj = s=1 (x j x s ) θ j Task Similarity: there exists ε > 0 such that for any pair of tasks (θ, θ ) we have θ θ 2 ε. Improve performance on a particular task by leveraging information from previous tasks! Marta Soare Multi-task Linear Bandits 3/10
7 Single task estimates: OLS estimate: For any task j, the estimate of θ j after n rewards is A j,n = ˆθ j,n = A 1 j,n b j,n n x t xt b j,n = t=1 n x t r t Prediction error: Thm.2 in [Abbasi Yadkori et al., NIPS 2012] (w.p. 1 δ) ( x θ j x ˆθλ x (A 1 + nl2 /λ ) ) j,n λ j,n ) (R d log + λ 1/2 θ 1 j δ }{{} B j,n(x) t=1 Marta Soare Multi-task Linear Bandits 4/10
8 Multi-task estimates : θ m+1,t = à 1 m+1,t b m+1,t E [ θm+1,t ] = θ m+1,t m à m+1,t = A j +A m+1,t m bm+1,t = b j +b m+1,t j=1 j=1 Theorem Let θ m+1,t λ be the multi-task regularized least-squares estimate. Then, for any δ 0, for any t 1, with probability greater than 1 δ it holds that: x ( θ m+1,t λ θ m+1 ) ) ) +λ 1/2 S +ε x. x ( à (R d log ( det(ãλ m+1,t )1/2 det(λi) 1/2 λ m+1,t ) 1 δ }{{} B m+1,t(x) Marta Soare Multi-task Linear Bandits 5/10
9 Multi-task estimates : θ m+1,t = à 1 m+1,t b m+1,t E [ θm+1,t ] = θ m+1,t m à m+1,t = A j +A m+1,t m bm+1,t = b j +b m+1,t j=1 j=1 Theorem Let θ m+1,t λ be the multi-task regularized least-squares estimate. Then, for any δ 0, for any t 1, with probability greater than 1 δ it holds that: x ( θ m+1,t λ θ m+1 ) ) ) +λ 1/2 S +ε x. x ( à (R d log ( det(ãλ m+1,t )1/2 det(λi) 1/2 λ m+1,t ) 1 δ }{{} B m+1,t(x) Marta Soare Multi-task Linear Bandits 5/10
10 Multi-task estimates : θ m+1,t = à 1 m+1,t b m+1,t E [ θm+1,t ] = θ m+1,t m à m+1,t = A j +A m+1,t m bm+1,t = b j +b m+1,t j=1 j=1 Theorem Let θ m+1,t λ be the multi-task regularized least-squares estimate. Then, for any δ 0, for any t 1, with probability greater than 1 δ it holds that: x ( θ m+1,t λ θ m+1 ) ) ) +λ 1/2 S +ε x. x ( à (R d log ( det(ãλ m+1,t )1/2 det(λi) 1/2 λ m+1,t ) 1 δ }{{} B m+1,t(x) Marta Soare Multi-task Linear Bandits 5/10
11 MT-LinUCB Input: budgets {n j } j, arms X R d, regularizer λ j = 1 A = λi d, b = b = 0 d, Ãj = λi d, ˆθ j = A 1 b for j = 1,..., m + 1 do Compute the multi-task OLS solution: Ã j = Ãj + A λi d, b = b + b, θ j = 1 b Ãj Compute the task-specific OLS solution: A = λi d, b = 0 d, ˆθ j = A 1 b for t = 1,..., n j do Select arms according to the tightest bound: x t =arg max min x X [ x ˆθj + B j,t (x); x θ j + B j,t (x) Observe reward: r t = x t θ j + η t Update: A, b, ˆθ j, B j,t, Ãj, b, θ j, B j,t end for end for ] Marta Soare Multi-task Linear Bandits 6/10
12 Experiments 200 tasks with θ 1,..., θ 200 R 2 randomly generated max θ θ 2 = samples for each task ε = MT-LinUCB is never worse than Regret for each task LinUCB MT LinUCB LinUCB Task number Marta Soare Multi-task Linear Bandits 7/10
13 Experiments As the difference between tasks reduces, the advantage of MT-LinUCB becomes more evident. For ε = 0.2 2, the regret of MT-LinUCB decreases with every additional task, while the regret for LinUCB remains constant over time. Regret for each task LinUCB MT LinUCB Task number Marta Soare Multi-task Linear Bandits 8/10
14 Follow up work Regret bound for MT-LinUCB. Additional experiments on datasets (see Boston Housing example on the poster). Weighting scheme according to task relevance. Marta Soare Multi-task Linear Bandits 9/10
15 Thank you! SequeL INRIA Lille
Graphs in Machine Learning
Graphs in Machine Learning Michal Valko Inria Lille - Nord Europe, France Partially based on material by: Toma s Koca k November 23, 2015 MVA 2015/2016 Last Lecture Examples of applications of online SSL
More informationSparse Linear Contextual Bandits via Relevance Vector Machines
Sparse Linear Contextual Bandits via Relevance Vector Machines Davis Gilton and Rebecca Willett Electrical and Computer Engineering University of Wisconsin-Madison Madison, WI 53706 Email: gilton@wisc.edu,
More informationGraphs in Machine Learning
Graphs in Machine Learning Michal Valko INRIA Lille - Nord Europe, France Partially based on material by: Rob Fergus, Tomáš Kocák March 17, 2015 MVA 2014/2015 Last Lecture Analysis of online SSL Analysis
More informationThe Multi-Arm Bandit Framework
The Multi-Arm Bandit Framework A. LAZARIC (SequeL Team @INRIA-Lille) ENS Cachan - Master 2 MVA SequeL INRIA Lille MVA-RL Course In This Lecture A. LAZARIC Reinforcement Learning Algorithms Oct 29th, 2013-2/94
More informationBandit Algorithms. Zhifeng Wang ... Department of Statistics Florida State University
Bandit Algorithms Zhifeng Wang Department of Statistics Florida State University Outline Multi-Armed Bandits (MAB) Exploration-First Epsilon-Greedy Softmax UCB Thompson Sampling Adversarial Bandits Exp3
More informationOnline learning with noisy side observations
Online learning with noisy side observations Tomáš Kocák Gergely Neu Michal Valko Inria Lille - Nord Europe, France DTIC, Universitat Pompeu Fabra, Barcelona, Spain Inria Lille - Nord Europe, France SequeL
More informationSequential Transfer in Multi-armed Bandit with Finite Set of Models
Sequential Transfer in Multi-armed Bandit with Finite Set of Models Mohammad Gheshlaghi Azar School of Computer Science CMU Alessandro Lazaric INRIA Lille - Nord Europe Team SequeL Emma Brunskill School
More informationProfile-Based Bandit with Unknown Profiles
Journal of Machine Learning Research 9 (208) -40 Submitted /7; Revised 6/8; Published 9/8 Profile-Based Bandit with Unknown Profiles Sylvain Lamprier sylvain.lamprier@lip6.fr Sorbonne Universités, UPMC
More informationTutorial: PART 1. Online Convex Optimization, A Game- Theoretic Approach to Learning.
Tutorial: PART 1 Online Convex Optimization, A Game- Theoretic Approach to Learning http://www.cs.princeton.edu/~ehazan/tutorial/tutorial.htm Elad Hazan Princeton University Satyen Kale Yahoo Research
More informationOnline Learning and Sequential Decision Making
Online Learning and Sequential Decision Making Emilie Kaufmann CNRS & CRIStAL, Inria SequeL, emilie.kaufmann@univ-lille.fr Research School, ENS Lyon, Novembre 12-13th 2018 Emilie Kaufmann Sequential Decision
More informationTwo generic principles in modern bandits: the optimistic principle and Thompson sampling
Two generic principles in modern bandits: the optimistic principle and Thompson sampling Rémi Munos INRIA Lille, France CSML Lunch Seminars, September 12, 2014 Outline Two principles: The optimistic principle
More informationBandit View on Continuous Stochastic Optimization
Bandit View on Continuous Stochastic Optimization Sébastien Bubeck 1 joint work with Rémi Munos 1 & Gilles Stoltz 2 & Csaba Szepesvari 3 1 INRIA Lille, SequeL team 2 CNRS/ENS/HEC 3 University of Alberta
More informationThèse de Doctorat. Sequential Resource Allocation in Linear Stochastic Bandits
École Doctorale Sciences pour l Ingénieur Inria Lille - Nord Europe Université Lille 1 Thèse de Doctorat présentée pour obtenir le grade de DOCTEUR EN SCIENCES DE L UNIVERSITÉ LILLE 1 Spécialité : Informatique
More informationEfficient learning by implicit exploration in bandit problems with side observations
Efficient learning by implicit exploration in bandit problems with side observations Tomáš Kocák, Gergely Neu, Michal Valko, Rémi Munos SequeL team, INRIA Lille - Nord Europe, France SequeL INRIA Lille
More informationLarge-scale Information Processing, Summer Recommender Systems (part 2)
Large-scale Information Processing, Summer 2015 5 th Exercise Recommender Systems (part 2) Emmanouil Tzouridis tzouridis@kma.informatik.tu-darmstadt.de Knowledge Mining & Assessment SVM question When a
More informationSpectral Bandits for Smooth Graph Functions with Applications in Recommender Systems
Spectral Bandits for Smooth Graph Functions with Applications in Recommender Systems Tomáš Kocák SequeL team INRIA Lille France Michal Valko SequeL team INRIA Lille France Rémi Munos SequeL team, INRIA
More informationThe information complexity of sequential resource allocation
The information complexity of sequential resource allocation Emilie Kaufmann, joint work with Olivier Cappé, Aurélien Garivier and Shivaram Kalyanakrishan SMILE Seminar, ENS, June 8th, 205 Sequential allocation
More informationThe information complexity of best-arm identification
The information complexity of best-arm identification Emilie Kaufmann, joint work with Olivier Cappé and Aurélien Garivier MAB workshop, Lancaster, January th, 206 Context: the multi-armed bandit model
More informationEstimation Considerations in Contextual Bandits
Estimation Considerations in Contextual Bandits Maria Dimakopoulou Zhengyuan Zhou Susan Athey Guido Imbens arxiv:1711.07077v4 [stat.ml] 16 Dec 2018 Abstract Contextual bandit algorithms are sensitive to
More informationDiscovering Unknown Unknowns of Predictive Models
Discovering Unknown Unknowns of Predictive Models Himabindu Lakkaraju Stanford University himalv@cs.stanford.edu Rich Caruana rcaruana@microsoft.com Ece Kamar eckamar@microsoft.com Eric Horvitz horvitz@microsoft.com
More informationContextual Combinatorial Bandit and its Application on Diversified Online Recommendation
Contextual Combinatorial Bandit and its Application on Diversified Online Recommendation Lijing Qin Shouyuan Chen Xiaoyan Zhu Abstract Recommender systems are faced with new challenges that are beyond
More informationA fully adaptive algorithm for pure exploration in linear bandits
A fully adaptive algorithm for pure exploration in linear bandits Liyuan Xu Junya Honda Masashi Sugiyama :The University of Tokyo :RIKEN Abstract We propose the first fully-adaptive algorithm for pure
More informationPack only the essentials: distributed sequential sampling for adaptive kernel DL
Pack only the essentials: distributed sequential sampling for adaptive kernel DL with Daniele Calandriello and Alessandro Lazaric SequeL team, Inria Lille - Nord Europe, France appeared in AISTATS 2017
More informationTHE first formalization of the multi-armed bandit problem
EDIC RESEARCH PROPOSAL 1 Multi-armed Bandits in a Network Farnood Salehi I&C, EPFL Abstract The multi-armed bandit problem is a sequential decision problem in which we have several options (arms). We can
More informationOn the Complexity of Best Arm Identification in Multi-Armed Bandit Models
On the Complexity of Best Arm Identification in Multi-Armed Bandit Models Aurélien Garivier Institut de Mathématiques de Toulouse Information Theory, Learning and Big Data Simons Institute, Berkeley, March
More informationTutorial: PART 2. Online Convex Optimization, A Game- Theoretic Approach to Learning
Tutorial: PART 2 Online Convex Optimization, A Game- Theoretic Approach to Learning Elad Hazan Princeton University Satyen Kale Yahoo Research Exploiting curvature: logarithmic regret Logarithmic regret
More informationStratégies bayésiennes et fréquentistes dans un modèle de bandit
Stratégies bayésiennes et fréquentistes dans un modèle de bandit thèse effectuée à Telecom ParisTech, co-dirigée par Olivier Cappé, Aurélien Garivier et Rémi Munos Journées MAS, Grenoble, 30 août 2016
More informationTransferable Contextual Bandit for Cross-Domain Recommendation
Transferable Contextual Bandit for Cross-Domain Recommendation Bo Liu, Ying Wei, Yu Zhang, Zhixian Yan, Qiang Yang Hong Kong University of Science and Technology, Hong Kong Cheetah Mobile USA bliuab@cse.ust.hk,
More informationMonte-Carlo Tree Search by. MCTS by Best Arm Identification
Monte-Carlo Tree Search by Best Arm Identification and Wouter M. Koolen Inria Lille SequeL team CWI Machine Learning Group Inria-CWI workshop Amsterdam, September 20th, 2017 Part of...... a new Associate
More informationLecture 2: Learning from Evaluative Feedback. or Bandit Problems
Lecture 2: Learning from Evaluative Feedback or Bandit Problems 1 Edward L. Thorndike (1874-1949) Puzzle Box 2 Learning by Trial-and-Error Law of Effect: Of several responses to the same situation, those
More informationEFFICIENT ALGORITHMS FOR LINEAR POLYHEDRAL BANDITS. Manjesh K. Hanawal Amir Leshem Venkatesh Saligrama
EFFICIENT ALGORITHMS FOR LINEAR POLYHEDRAL BANDITS Manjesh K. Hanawal Amir Leshem Venkatesh Saligrama IEOR Group, IIT-Bombay, Mumbai, India 400076 Dept. of EE, Bar-Ilan University, Ramat-Gan, Israel 52900
More informationMulti-armed bandit models: a tutorial
Multi-armed bandit models: a tutorial CERMICS seminar, March 30th, 2016 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions)
More informationLecture 20: Linear Dynamics and LQG
CSE599i: Online and Adaptive Machine Learning Winter 2018 Lecturer: Kevin Jamieson Lecture 20: Linear Dynamics and LQG Scribes: Atinuke Ademola-Idowu, Yuanyuan Shi Disclaimer: These notes have not been
More informationRelative Upper Confidence Bound for the K-Armed Dueling Bandit Problem
Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem Masrour Zoghi 1, Shimon Whiteson 1, Rémi Munos 2 and Maarten de Rijke 1 University of Amsterdam 1 ; INRIA Lille / MSR-NE 2 June 24,
More informationUpper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits
Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits Alexandra Carpentier 1, Alessandro Lazaric 1, Mohammad Ghavamzadeh 1, Rémi Munos 1, and Peter Auer 2 1 INRIA Lille - Nord Europe,
More informationTor Lattimore & Csaba Szepesvári
Bandit Algorithms Tor Lattimore & Csaba Szepesvári Outline 1 From Contextual to Linear Bandits 2 Stochastic Linear Bandits 3 Confidence Bounds for Least-Squares Estimators 4 Improved Regret for Fixed,
More informationYevgeny Seldin. University of Copenhagen
Yevgeny Seldin University of Copenhagen Classical (Batch) Machine Learning Collect Data Data Assumption The samples are independent identically distributed (i.i.d.) Machine Learning Prediction rule New
More informationThe optimistic principle applied to function optimization
The optimistic principle applied to function optimization Rémi Munos Google DeepMind INRIA Lille, Sequel team LION 9, 2015 The optimistic principle applied to function optimization Optimistic principle:
More informationBandit models: a tutorial
Gdt COS, December 3rd, 2015 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions) Bandit game: a each round t, an agent chooses
More informationEvaluation of multi armed bandit algorithms and empirical algorithm
Acta Technica 62, No. 2B/2017, 639 656 c 2017 Institute of Thermomechanics CAS, v.v.i. Evaluation of multi armed bandit algorithms and empirical algorithm Zhang Hong 2,3, Cao Xiushan 1, Pu Qiumei 1,4 Abstract.
More informationRevisiting the Exploration-Exploitation Tradeoff in Bandit Models
Revisiting the Exploration-Exploitation Tradeoff in Bandit Models joint work with Aurélien Garivier (IMT, Toulouse) and Tor Lattimore (University of Alberta) Workshop on Optimization and Decision-Making
More informationSelecting the State-Representation in Reinforcement Learning
Selecting the State-Representation in Reinforcement Learning Odalric-Ambrym Maillard INRIA Lille - Nord Europe odalricambrym.maillard@gmail.com Rémi Munos INRIA Lille - Nord Europe remi.munos@inria.fr
More informationTruthful Learning Mechanisms for Multi Slot Sponsored Search Auctions with Externalities
Truthful Learning Mechanisms for Multi Slot Sponsored Search Auctions with Externalities Nicola Gatti a, Alessandro Lazaric b, Marco Rocco a, Francesco Trovò a a Politecnico di Milano, piazza Leonardo
More informationGeneralization to a zero-data task: an empirical study
Generalization to a zero-data task: an empirical study Université de Montréal 20/03/2007 Introduction Introduction and motivation What is a zero-data task? task for which no training data are available
More informationStat 260/CS Learning in Sequential Decision Problems. Peter Bartlett
Stat 260/CS 294-102. Learning in Sequential Decision Problems. Peter Bartlett 1. Multi-armed bandit algorithms. Concentration inequalities. P(X ǫ) exp( ψ (ǫ))). Cumulant generating function bounds. Hoeffding
More informationUpper-Confidence-Bound Algorithms for Active Learning in Multi-armed Bandits
Upper-Confidence-Bound Algorithms for Active Learning in Multi-armed Bandits Alexandra Carpentier 1, Alessandro Lazaric 1, Mohammad Ghavamzadeh 1, Rémi Munos 1, and Peter Auer 2 1 INRIA Lille - Nord Europe,
More informationApproximate Dynamic Programming
Approximate Dynamic Programming A. LAZARIC (SequeL Team @INRIA-Lille) Ecole Centrale - Option DAD SequeL INRIA Lille EC-RL Course Value Iteration: the Idea 1. Let V 0 be any vector in R N A. LAZARIC Reinforcement
More informationMulti-armed Bandits in the Presence of Side Observations in Social Networks
52nd IEEE Conference on Decision and Control December 0-3, 203. Florence, Italy Multi-armed Bandits in the Presence of Side Observations in Social Networks Swapna Buccapatnam, Atilla Eryilmaz, and Ness
More informationOnline Learning with Feedback Graphs
Online Learning with Feedback Graphs Claudio Gentile INRIA and Google NY clagentile@gmailcom NYC March 6th, 2018 1 Content of this lecture Regret analysis of sequential prediction problems lying between
More informationPreference-Based Rank Elicitation using Statistical Models: The Case of Mallows
Preference-Based Rank Elicitation using Statistical Models: The Case of Mallows Robert Busa-Fekete 1 Eyke Huellermeier 2 Balzs Szrnyi 3 1 MTA-SZTE Research Group on Artificial Intelligence, Tisza Lajos
More informationBayesian optimization for automatic machine learning
Bayesian optimization for automatic machine learning Matthew W. Ho man based o work with J. M. Hernández-Lobato, M. Gelbart, B. Shahriari, and others! University of Cambridge July 11, 2015 Black-bo optimization
More informationContextual Bandits in A Collaborative Environment
Contextual Bandits in A Collaborative Environment Qingyun Wu 1, Huazheng Wang 1, Quanquan Gu 2, Hongning Wang 1 1 Department of Computer Science 2 Department of Systems and Information Engineering University
More informationAnnealing-Pareto Multi-Objective Multi-Armed Bandit Algorithm
Annealing-Pareto Multi-Objective Multi-Armed Bandit Algorithm Saba Q. Yahyaa, Madalina M. Drugan and Bernard Manderick Vrije Universiteit Brussel, Department of Computer Science, Pleinlaan 2, 1050 Brussels,
More informationInformational Confidence Bounds for Self-Normalized Averages and Applications
Informational Confidence Bounds for Self-Normalized Averages and Applications Aurélien Garivier Institut de Mathématiques de Toulouse - Université Paul Sabatier Thursday, September 12th 2013 Context Tree
More informationThe multi armed-bandit problem
The multi armed-bandit problem (with covariates if we have time) Vianney Perchet & Philippe Rigollet LPMA Université Paris Diderot ORFE Princeton University Algorithms and Dynamics for Games and Optimization
More informationPure Exploration in Finitely Armed and Continuous Armed Bandits
Pure Exploration in Finitely Armed and Continuous Armed Bandits Sébastien Bubeck INRIA Lille Nord Europe, SequeL project, 40 avenue Halley, 59650 Villeneuve d Ascq, France Rémi Munos INRIA Lille Nord Europe,
More informationCollaborative Filtering as a Multi-Armed Bandit
Collaborative Filtering as a Multi-Armed Bandit Frédéric Guillou, Romaric Gaudel, Philippe Preux To cite this version: Frédéric Guillou, Romaric Gaudel, Philippe Preux. Collaborative Filtering as a Multi-Armed
More informationOnline Learning and Sequential Decision Making
Online Learning and Sequential Decision Making Emilie Kaufmann CNRS & CRIStAL, Inria SequeL, emilie.kaufmann@univ-lille.fr Research School, ENS Lyon, Novembre 12-13th 2018 Emilie Kaufmann Online Learning
More informationOpen Problem: Approximate Planning of POMDPs in the class of Memoryless Policies
Open Problem: Approximate Planning of POMDPs in the class of Memoryless Policies Kamyar Azizzadenesheli U.C. Irvine Joint work with Prof. Anima Anandkumar and Dr. Alessandro Lazaric. Motivation +1 Agent-Environment
More informationNew Algorithms for Contextual Bandits
New Algorithms for Contextual Bandits Lev Reyzin Georgia Institute of Technology Work done at Yahoo! 1 S A. Beygelzimer, J. Langford, L. Li, L. Reyzin, R.E. Schapire Contextual Bandit Algorithms with Supervised
More informationThompson Sampling for the non-stationary Corrupt Multi-Armed Bandit
European Worshop on Reinforcement Learning 14 (2018 October 2018, Lille, France. Thompson Sampling for the non-stationary Corrupt Multi-Armed Bandit Réda Alami Orange Labs 2 Avenue Pierre Marzin 22300,
More informationCrowd-Learning: Improving the Quality of Crowdsourcing Using Sequential Learning
Crowd-Learning: Improving the Quality of Crowdsourcing Using Sequential Learning Mingyan Liu (Joint work with Yang Liu) Department of Electrical Engineering and Computer Science University of Michigan,
More informationBandits and Recommender Systems
Bandits and Recommender Systems Jérémie Mary, Romaric Gaudel, Philippe Preux To cite this version: Jérémie Mary, Romaric Gaudel, Philippe Preux. Bandits and Recommender Systems. First International Workshop
More informationAd Placement Strategies
Case Study : Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD AdaGrad Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox January 7 th, 04 Ad
More informationGraphs in Machine Learning
Graphs in Machine Learning Michal Valko INRIA Lille - Nord Europe, France Partially based on material by: Ulrike von Luxburg, Gary Miller, Doyle & Schnell, Daniel Spielman January 27, 2015 MVA 2014/2015
More informationStochastic Contextual Bandits with Known. Reward Functions
Stochastic Contextual Bandits with nown 1 Reward Functions Pranav Sakulkar and Bhaskar rishnamachari Ming Hsieh Department of Electrical Engineering Viterbi School of Engineering University of Southern
More informationarxiv: v1 [cs.lg] 6 Jun 2018
Mitigating Bias in Adaptive Data Gathering via Differential Privacy Seth Neel Aaron Roth June 7, 2018 arxiv:1806.02329v1 [cs.lg] 6 Jun 2018 Abstract Data that is gathered adaptively via bandit algorithms,
More informationMostly Exploration-Free Algorithms for Contextual Bandits
Mostly Exploration-Free Algorithms for Contextual Bandits Hamsa Bastani IBM Thomas J. Watson Research and Wharton School, hamsab@wharton.upenn.edu Mohsen Bayati Stanford Graduate School of Business, bayati@stanford.edu
More informationPrincipal Component Analysis
B: Chapter 1 HTF: Chapter 1.5 Principal Component Analysis Barnabás Póczos University of Alberta Nov, 009 Contents Motivation PCA algorithms Applications Face recognition Facial expression recognition
More informationStat 260/CS Learning in Sequential Decision Problems. Peter Bartlett
Stat 260/CS 294-102. Learning in Sequential Decision Problems. Peter Bartlett 1. Adversarial bandits Definition: sequential game. Lower bounds on regret from the stochastic case. Exp3: exponential weights
More informationComputation time/accuracy trade-off and linear regression
Computation time/accuracy trade-off and linear regression Maxime BRUNIN & Christophe BIERNACKI & Alain CELISSE Laboratoire Paul Painlevé, Université de Lille, Science et Technologie INRIA Lille-Nord Europe,
More informationBest Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence
Best Ar Identification: A Unified Approach to Fixed Budget and Fixed Confidence Victor Gabillon Mohaad Ghavazadeh Alessandro Lazaric INRIA Lille - Nord Europe, Tea SequeL {victor.gabillon,ohaad.ghavazadeh,alessandro.lazaric}@inria.fr
More informationStat 260/CS Learning in Sequential Decision Problems. Peter Bartlett
Stat 260/CS 294-102. Learning in Sequential Decision Problems. Peter Bartlett 1. Thompson sampling Bernoulli strategy Regret bounds Extensions the flexibility of Bayesian strategies 1 Bayesian bandit strategies
More informationProf. Dr. Ann Nowé. Artificial Intelligence Lab ai.vub.ac.be
REINFORCEMENT LEARNING AN INTRODUCTION Prof. Dr. Ann Nowé Artificial Intelligence Lab ai.vub.ac.be REINFORCEMENT LEARNING WHAT IS IT? What is it? Learning from interaction Learning about, from, and while
More information1 [15 points] Frequent Itemsets Generation With Map-Reduce
Data Mining Learning from Large Data Sets Final Exam Date: 15 August 2013 Time limit: 120 minutes Number of pages: 11 Maximum score: 100 points You can use the back of the pages if you run out of space.
More informationLearning the Number of Neurons in Deep Networks
Learning the Number of Jose M. Alvarez 1 Mathieu Salzmanno 2 1 Data61 @ CSIRO,Canberra, ACT 2601, Australia 2 CVLab, EPFL,CH-1015 Lausanne, Switzerland NIPS,2016 Presenter: Arshdeep Sekhon NIPS,2016 Presenter:
More informationComputational Oracle Inequalities for Large Scale Model Selection Problems
for Large Scale Model Selection Problems University of California at Berkeley Queensland University of Technology ETH Zürich, September 2011 Joint work with Alekh Agarwal, John Duchi and Clément Levrard.
More informationINTRODUCTION TO EVOLUTION STRATEGY ALGORITHMS. James Gleeson Eric Langlois William Saunders
INTRODUCTION TO EVOLUTION STRATEGY ALGORITHMS James Gleeson Eric Langlois William Saunders REINFORCEMENT LEARNING CHALLENGES f(θ) is a discrete function of theta How do we get a gradient θ f? Credit assignment
More informationLecture 10: Contextual Bandits
CSE599i: Online and Adaptive Machine Learning Winter 208 Lecturer: Lalit Jain Lecture 0: Contextual Bandits Scribes: Neeraja Abhyankar, Joshua Fan, Kunhui Zhang Disclaimer: These notes have not been subjected
More informationStochastic Gradient Descent. CS 584: Big Data Analytics
Stochastic Gradient Descent CS 584: Big Data Analytics Gradient Descent Recap Simplest and extremely popular Main Idea: take a step proportional to the negative of the gradient Easy to implement Each iteration
More informationOn the Complexity of Best Arm Identification with Fixed Confidence
On the Complexity of Best Arm Identification with Fixed Confidence Discrete Optimization with Noise Aurélien Garivier, Emilie Kaufmann COLT, June 23 th 2016, New York Institut de Mathématiques de Toulouse
More informationIntroduction to Reinforcement Learning Part 3: Exploration for sequential decision making
Introduction to Reinforcement Learning Part 3: Exploration for sequential decision making Rémi Munos SequeL project: Sequential Learning http://researchers.lille.inria.fr/ munos/ INRIA Lille - Nord Europe
More informationMultiple Identifications in Multi-Armed Bandits
Multiple Identifications in Multi-Armed Bandits arxiv:05.38v [cs.lg] 4 May 0 Sébastien Bubeck Department of Operations Research and Financial Engineering, Princeton University sbubeck@princeton.edu Tengyao
More informationReinforcement learning an introduction
Reinforcement learning an introduction Prof. Dr. Ann Nowé Computational Modeling Group AIlab ai.vub.ac.be November 2013 Reinforcement Learning What is it? Learning from interaction Learning about, from,
More informationCSC 411: Lecture 02: Linear Regression
CSC 411: Lecture 02: Linear Regression Richard Zemel, Raquel Urtasun and Sanja Fidler University of Toronto (Most plots in this lecture are from Bishop s book) Zemel, Urtasun, Fidler (UofT) CSC 411: 02-Regression
More informationPreference Based Adaptation for Learning Objectives
Preference Based Adaptation for Learning Objectives Yao-Xiang Ding Zhi-Hua Zhou National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 223, China {dingyx, zhouzh}@lamda.nju.edu.cn
More informationApplications of on-line prediction. in telecommunication problems
Applications of on-line prediction in telecommunication problems Gábor Lugosi Pompeu Fabra University, Barcelona based on joint work with András György and Tamás Linder 1 Outline On-line prediction; Some
More informationAn Estimation Based Allocation Rule with Super-linear Regret and Finite Lock-on Time for Time-dependent Multi-armed Bandit Processes
An Estimation Based Allocation Rule with Super-linear Regret and Finite Lock-on Time for Time-dependent Multi-armed Bandit Processes Prokopis C. Prokopiou, Peter E. Caines, and Aditya Mahajan McGill University
More informationLarge-Scale Matrix Factorization with Distributed Stochastic Gradient Descent
Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent KDD 2011 Rainer Gemulla, Peter J. Haas, Erik Nijkamp and Yannis Sismanis Presenter: Jiawen Yao Dept. CSE, UT Arlington 1 1
More informationStat 260/CS Learning in Sequential Decision Problems.
Stat 260/CS 294-102. Learning in Sequential Decision Problems. Peter Bartlett 1. Multi-armed bandit algorithms. Exponential families. Cumulant generating function. KL-divergence. KL-UCB for an exponential
More informationRobustness in GANs and in Black-box Optimization
Robustness in GANs and in Black-box Optimization Stefanie Jegelka MIT CSAIL joint work with Zhi Xu, Chengtao Li, Ilija Bogunovic, Jonathan Scarlett and Volkan Cevher Robustness in ML noise Generator Critic
More informationOnline Optimization in X -Armed Bandits
Online Optimization in X -Armed Bandits Sébastien Bubeck INRIA Lille, SequeL project, France sebastien.bubeck@inria.fr Rémi Munos INRIA Lille, SequeL project, France remi.munos@inria.fr Gilles Stoltz Ecole
More informationarxiv: v1 [cs.lg] 23 May 2018
Learning Contextual Bandits in a Non-stationary Environment Qingyun Wu, Naveen Iyer, Hongning Wang Department of Computer Science, University of Virginia Charlottesville, VA, USA {qwky,nkikd,hw5x}@virginia.edu
More informationThe Multi-Armed Bandit Problem
The Multi-Armed Bandit Problem Electrical and Computer Engineering December 7, 2013 Outline 1 2 Mathematical 3 Algorithm Upper Confidence Bound Algorithm A/B Testing Exploration vs. Exploitation Scientist
More informationAdvanced Machine Learning
Advanced Machine Learning Bandit Problems MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Multi-Armed Bandit Problem Problem: which arm of a K-slot machine should a gambler pull to maximize his
More information1 A Support Vector Machine without Support Vectors
CS/CNS/EE 53 Advanced Topics in Machine Learning Problem Set 1 Handed out: 15 Jan 010 Due: 01 Feb 010 1 A Support Vector Machine without Support Vectors In this question, you ll be implementing an online
More informationStein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm
Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm Qiang Liu and Dilin Wang NIPS 2016 Discussion by Yunchen Pu March 17, 2017 March 17, 2017 1 / 8 Introduction Let x R d
More informationA Time and Space Efficient Algorithm for Contextual Linear Bandits
A Time and Space Efficient Algorithm for Contextual Linear Bandits José Bento, Stratis Ioannidis, S. Muthukrishnan, and Jinyun Yan Stanford University, Technicolor, Rutgers University, Rutgers University
More informationCOS 402 Machine Learning and Artificial Intelligence Fall Lecture 22. Exploration & Exploitation in Reinforcement Learning: MAB, UCB, Exp3
COS 402 Machine Learning and Artificial Intelligence Fall 2016 Lecture 22 Exploration & Exploitation in Reinforcement Learning: MAB, UCB, Exp3 How to balance exploration and exploitation in reinforcement
More informationarxiv: v1 [stat.ml] 13 Mar 2019
Gussian Process Optimization with Adaptive Sketching: Scalable and No Regret arxiv:1903.05594v1 [stat.ml] 13 Mar 2019 Daniele Calandriello LCSL - Istituto Italiano di Tecnologia, Genova, Italy & MIT, Cambridge,
More information