The optimistic principle applied to function optimization
1 The optimistic principle applied to function optimization. Rémi Munos, Google DeepMind / INRIA Lille, SequeL team. LION 9, 2015.
2 The optimistic principle applied to function optimization. Optimistic principle: play optimally in the most favorable world compatible with observations. Function optimization: Lipschitz optimization. Extensions: relax the Lipschitz assumption, design tractable algorithms, deal with an unknown metric, handle noise. Numerical experiments.
3 Optimization of a deterministic Lipschitz function. Problem: find online the maximum of f : X → ℝ, assumed to be Lipschitz: |f(x) − f(y)| ≤ L‖x − y‖₂ =: l(x, y). Protocol: for each time step t = 1, 2, ..., n, select a state x_t ∈ X and observe f(x_t); after n steps, return a point x(n). Loss: r_n = f* − f(x(n)), where f* = sup_{x ∈ X} f(x).
4 Example in 1d. [Figure: f, its maximum f*, and the observed value f(x_t) at the sampled point x_t.] By the Lipschitz property, the evaluation of f at x_t provides a first upper bound on f: B(x) = f(x_t) + l(x, x_t).
5 Example in 1d (continued). Each new point refines the upper bound: B(x) = min_t [f(x_t) + l(x, x_t)].
6 Example in 1d (continued). Question: where should one sample the next point? Answer: select the point with the highest upper bound! Optimism in the face of (partial observation) uncertainty.
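This optimistic sampling rule can be sketched in a few lines of Python. This is a minimal illustration (not the talk's implementation): it sidesteps the tractability of maximizing B by restricting the argmax to a fine grid, and uses the test function and Lipschitz constant 14 from the talk's experiments.

```python
import math

def optimistic_lipschitz(f, L, n, grid=1001):
    """Sequentially evaluate f at the maximizer of the upper bound
    B(x) = min_t [f(x_t) + L*|x - x_t|], approximated on a uniform grid."""
    xs = [i / (grid - 1) for i in range(grid)]
    pts = [(0.5, f(0.5))]  # observations (x_t, f(x_t)); start at the midpoint
    for _ in range(n - 1):
        # the most optimistic point under the current upper bound
        x = max(xs, key=lambda u: min(fx + L * abs(u - xt) for xt, fx in pts))
        pts.append((x, f(x)))
    return max(pts, key=lambda p: p[1])  # best evaluated point and its value

f = lambda x: (math.sin(13 * x) * math.sin(27 * x) + 1) / 2
x_best, f_best = optimistic_lipschitz(f, L=14.0, n=50)
```

Because the bound B is valid everywhere, the region containing the maximum is never discarded; the grid only limits the precision of each argmax step.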
7 Several issues. 1. The Lipschitz assumption is too strong. 2. Finding the maximum of the upper-bounding function is computationally difficult! 3. What if we don't know the metric l? 4. How to handle noise?
8 Local smoothness property. Assumption: f is locally smooth around its maximum w.r.t. a semi-metric l: for all x ∈ X, f(x*) − f(x) ≤ l(x, x*). [Figure: f lies above the lower envelope f(x*) − l(x, x*).]
9 Local smoothness is enough! Upper-bounding function: B(x) = min_t [f(x_t) + l(x, x_t)]. Property: B(x*) ≥ f(x*). The optimistic principle only requires the bound to be true at the maximum.
10 Efficient implementation Use a hierarchical partitioning of the space. Ex: The partitioning can be represented by a tree.
11 Deterministic Optimistic Optimization (DOO). For t = 1 to n: evaluate the function at the center x_i of each cell i; split the cell with the highest bound, I_t = argmax_i [ f(x_i) + max_{x ∈ X_i} l(x, x_i) ], where max_{x ∈ X_i} l(x, x_i) = diam_l(X_i). Return x(n) := argmax_{x_t, 1 ≤ t ≤ n} f(x_t).
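A minimal Python sketch of this loop on X = [0, 1] with binary splits (my own illustration, not the talk's code): delta(h) plays the role of diam_l(X_i) for cells of depth h, and the test function and metric l_1(x, y) = 14|x − y| are those used in the talk's experiments.

```python
import math

def doo(f, n, delta):
    """Deterministic Optimistic Optimization on [0, 1] with binary splits.
    delta(h) must upper-bound l(x, x_i) over any depth-h cell with center
    x_i (the diam_l(X_i) term of the selection rule I_t)."""
    leaves = [(0.0, 1.0, 0, f(0.5))]  # (lo, hi, depth, f at center)
    t = 1
    while t + 2 <= n:  # each split costs two new evaluations
        # split the leaf with the highest upper bound f(x_i) + delta(h)
        i = max(range(len(leaves)),
                key=lambda j: leaves[j][3] + delta(leaves[j][2]))
        lo, hi, h, _ = leaves.pop(i)
        mid = (lo + hi) / 2
        for a, b in ((lo, mid), (mid, hi)):
            leaves.append((a, b, h + 1, f((a + b) / 2)))
            t += 1
    lo, hi, _, v = max(leaves, key=lambda leaf: leaf[3])
    return (lo + hi) / 2, v

# l(x, y) = 14|x - y|: a depth-h cell has width 2**-h,
# so l(x, center) <= 14 * 2**-(h+1) inside the cell.
f = lambda x: (math.sin(13 * x) * math.sin(27 * x) + 1) / 2
x_best, f_best = doo(f, n=150, delta=lambda h: 14.0 * 0.5 ** (h + 1))
```

Only one cell is evaluated per expansion step, so maximizing the upper bound reduces to a scan over the current leaves instead of a global optimization problem.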
12 Analysis of DOO. Theorem 1. The loss r_n := f(x*) − f(x(n)) of DOO satisfies r_n = O(n^{−1/d}) for d > 0, and r_n = O(e^{−cn}) for d = 0, where d is the near-optimality dimension of f w.r.t. l: there exists C such that, for every ε > 0, the set of ε-optimal states X_ε := {x ∈ X : f(x) ≥ f* − ε} can be covered by C ε^{−d} l-balls of radius ε.
13 Example 1: assume the function is piecewise linear at its maximum: f(x*) − f(x) = Θ(‖x − x*‖). Using l(x, y) = ‖x − y‖, it takes O(ε⁰) = O(1) balls of radius ε to cover X_ε. Thus d = 0.
14 Example 2: assume the function is locally quadratic around its maximum: f(x*) − f(x) = Θ(‖x − x*‖²). For l(x, y) = ‖x − y‖, it takes O(ε^{−D/2}) balls of radius ε to cover X_ε (a set of radius O(ε^{1/2})). Thus d = D/2.
15 Example 2 (continued): for the same locally quadratic function, f(x*) − f(x) = Θ(‖x − x*‖²), take instead l(x, y) = ‖x − y‖². It then takes O(ε⁰) = O(1) l-balls of radius ε to cover X_ε. Thus d = 0.
16 Example 3: assume the function has a square-root behavior around its maximum: f(x*) − f(x) = Θ(‖x − x*‖^{1/2}). For l(x, y) = ‖x − y‖^{1/2} we have d = 0.
17 About the local smoothness assumption. Assume X = [0, 1]^D and f is locally equivalent to a polynomial of degree α > 0 around its maximum (i.e., f is α-smooth): f(x*) − f(x) = Θ(‖x − x*‖^α). Consider the semi-metric l(x, y) = ‖x − y‖^β for some β > 0. If α = β, then d = 0. If α > β, then d = D(1/β − 1/α) > 0. If α < β, then the function is not locally smooth w.r.t. l. DOO thus heavily depends on our knowledge of the true local smoothness.
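These three regimes follow from a one-line covering argument; a sketch, with constants absorbed and X = [0, 1]^D under the Euclidean norm:

```latex
\[
X_\epsilon = \{x : f(x^*) - f(x) \le \epsilon\}
\;\approx\; \{x : \|x - x^*\| \le \epsilon^{1/\alpha}\},
\qquad
\{x : \|x - y\|^{\beta} \le \epsilon\}
= \{x : \|x - y\| \le \epsilon^{1/\beta}\}.
\]
Covering a Euclidean ball of radius $\epsilon^{1/\alpha}$ by Euclidean balls
of radius $\epsilon^{1/\beta}$ (the $l$-balls of radius $\epsilon$) requires
\[
N(\epsilon)
= \Theta\!\Big( \big(\epsilon^{1/\alpha} / \epsilon^{1/\beta}\big)^{D} \Big)
= \Theta\!\Big( \epsilon^{-D(1/\beta - 1/\alpha)} \Big)
\]
balls, hence $d = D(1/\beta - 1/\alpha)$, which vanishes when $\alpha = \beta$.
```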
18 Several possible metrics. A function may be locally smooth w.r.t. several metrics: l_1(x, y) = 14|x − y| (i.e., f is globally Lipschitz) and l_2(x, y) = 222|x − y|² (i.e., f is locally quadratic).
19 Experiments: using l_1(x, y) = 14|x − y| (i.e., f is globally Lipschitz), with n = 150.
20 Experiments: using l_2(x, y) = 222|x − y|² (i.e., f is locally quadratic), with n = 150.
21 Experiments: loss r_n for different values of n, for a uniform grid and for DOO with the two semi-metrics l_1 (d = 1/2) and l_2 (d = 0). [Table values not recovered in the transcription.]
22 What if the smoothness is unknown? Assume f is locally smooth w.r.t. some unknown metric l. Question: is it still possible to implement the optimistic principle? Answer: be optimistic for all possible metrics! Simultaneous Optimistic Optimization (SOO) [Munos, 2011]: expand several cells simultaneously; SOO expands at most one cell per depth (cell size); SOO expands a cell only if its value is larger than the value of all cells of the same or larger size. (Inspired by the DIRECT algorithm [Jones et al., 1993].)
23 SOO algorithm. Input: the maximum depth function t ↦ h_max(t). Initialization: T_1 = {(0, 0)} (root node); set t = 1. While true: set v_max = −∞; for h = 0 to min(depth(T_t), h_max(t)): select the leaf (h, i) ∈ L_t of depth h with maximal value f(x_{h,i}); if f(x_{h,i}) > v_max, then expand the node (h, i), set v_max = f(x_{h,i}), and set t = t + 1; if t = n, return x(n) = argmax_{(h,i) ∈ T_n} f(x_{h,i}).
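The pseudocode above can be sketched in Python on X = [0, 1] with binary splits (my simplification: cells are intervals, values are f at centers, and h_max(t) = √t, one of the choices considered in the analysis):

```python
import math

def soo(f, n, h_max=lambda t: int(math.sqrt(t)) + 1):
    """Simultaneous Optimistic Optimization on [0, 1]: in each sweep over
    depths, expand the best leaf of depth h only if its value beats the
    best value seen at shallower depths (v_max). No metric l is needed."""
    leaves = [(0.0, 1.0, 0, f(0.5))]  # (lo, hi, depth, f at center)
    t = 1
    while t < n:
        v_max = -math.inf
        max_depth = min(max(l[2] for l in leaves), h_max(t))
        for h in range(max_depth + 1):
            idx = [i for i, l in enumerate(leaves) if l[2] == h]
            if not idx:
                continue
            i = max(idx, key=lambda j: leaves[j][3])
            if leaves[i][3] > v_max:  # optimistic at every scale
                v_max = leaves[i][3]
                lo, hi, d, _ = leaves.pop(i)
                mid = (lo + hi) / 2
                leaves.append((lo, mid, d + 1, f((lo + mid) / 2)))
                leaves.append((mid, hi, d + 1, f((mid + hi) / 2)))
                t += 2
            if t >= n:
                break
    lo, hi, _, v = max(leaves, key=lambda l: l[3])
    return (lo + hi) / 2, v

f = lambda x: (math.sin(13 * x) * math.sin(27 * x) + 1) / 2
x_best, f_best = soo(f, n=150)
```

Expanding the best cell at every depth is exactly "being optimistic for all metrics at once": whichever scale the true l operates at, the corresponding cell gets expanded.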
48 Performance of SOO. Theorem 2 (Munos, 2014). Let l be any semi-metric such that f is locally smooth w.r.t. l, and let d be the near-optimality dimension of f w.r.t. l. Then r_n = Õ(n^{−1/d}) if d > 0, and r_n = O(e^{−c√n}) if d = 0. Remarks: the algorithm does not use the knowledge of l, so the analysis holds for the best possible choice of l; SOO does almost as well as DOO optimally fitted.
49 Numerical experiments. Again for the function f(x) = (sin(13x) sin(27x) + 1)/2: loss for different values of n, comparing a uniform grid, DOO with l_1, DOO with l_2, and SOO. [Table values not recovered in the transcription.]
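For reference, the uniform-grid baseline in this comparison is a one-liner (my sketch; n = 150 evenly spaced evaluations):

```python
import math

f = lambda x: (math.sin(13 * x) * math.sin(27 * x) + 1) / 2
n = 150
grid_best = max(f(i / (n - 1)) for i in range(n))
# With l_1(x, y) = 14|x - y|, adjacent grid points are 1/149 apart, so the
# grid's loss is at most 14 * (1/149) / 2 < 0.05 -- the slow O(1/n) decrease
# of the uniform-grid column, against which DOO and SOO are compared.
```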
50 The case d = 0 is non-trivial! Example: f is locally α-smooth around its maximum: f(x*) − f(x) = Θ(‖x − x*‖^α) for some α > 0. The SOO algorithm does not require the knowledge of l. Using l(x, y) = ‖x − y‖^α in the analysis, all assumptions are satisfied with d = 0, thus the loss of SOO decreases at the stretched-exponential rate r_n = O(e^{−c√n}). This is almost as good as DOO optimally fitted! (This extends to the case f(x*) − f(x) = Θ(∑_{i=1}^D c_i |x_i − x_i*|^{α_i}).)
51 The case d = 0. More generally, any function whose upper and lower envelopes around x* have the same shape, i.e. such that there exist c > 0 and η > 0 with min(η, c·l(x, x*)) ≤ f(x*) − f(x) ≤ l(x, x*) for all x ∈ X, has near-optimality dimension d = 0 (w.r.t. the metric l). [Figure: f squeezed between f(x*) − l(x, x*) and f(x*) − c·l(x, x*) near x*.]
52 Example of a function with d = 0: here l(x, y) = c‖x − y‖². Thus r_n = O(e^{−c′√n}).
53 Example of a function with d = 0: here l(x, y) = c‖x − y‖^{1/2}. Thus r_n = O(e^{−c′√n}).
54 d = 0? Here l(x, y) = c‖x − y‖^{1/2}.
55 d > 0: f(x) = 1 − √x + (−x² + √x)(sin(1/x²) + 1)/2. The lower envelope is of order 1/2 whereas the upper one is of order 2. We deduce that d = 3/2 and r_n = O(n^{−2/3}).
56 How to handle noise? The evaluation of f at x_t is perturbed by noise: y_t = f(x_t) + ε_t, with E[ε_t | x_t] = 0.
57 Stochastic SOO (StoSOO). Extends SOO to stochastic evaluations: select the cells X_i (at most one per depth) according to SOO, based on the UCBs μ̂_{i,t} + c √(log n / T_i(t)), and get one more value y_t = f(x_i) + ε_t of f at x_i; if T_i(t) ≥ k, then split the cell X_i. Remark: this really looks like UCT [Kocsis & Szepesvári, 2006], except that several cells are selected at each round; it implements the optimistic principle (here UCB) at several scales.
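A Python sketch of StoSOO along the same lines (my illustration; the paper's recommendation rule and h_max differ): cells store running sums of noisy samples, each selected cell gets one more sample, and a cell splits after k samples.

```python
import math, random

def stosoo(f_noisy, n, k, c=1.0, h_max=lambda t: int(math.sqrt(t)) + 1):
    """StoSOO sketch on [0, 1]: an SOO-style sweep over depths, where each
    selected cell receives one noisy sample of f at its center, and a cell
    splits once it has accumulated k samples.
    Cell state: [lo, hi, depth, sum of samples, number of samples]."""
    leaves = [[0.0, 1.0, 0, 0.0, 0]]
    t = 0

    def ucb(cell):
        if cell[4] == 0:
            return math.inf  # unsampled cells are maximally optimistic
        return cell[3] / cell[4] + c * math.sqrt(math.log(n) / cell[4])

    while t < n:
        v_max = -math.inf
        max_depth = min(max(l[2] for l in leaves), h_max(t + 1))
        for h in range(max_depth + 1):
            idx = [i for i, l in enumerate(leaves) if l[2] == h]
            if not idx:
                continue
            i = max(idx, key=lambda j: ucb(leaves[j]))
            if ucb(leaves[i]) >= v_max:
                v_max = ucb(leaves[i])
                lo, hi, d, s, m = leaves[i]
                if m < k:    # one more noisy evaluation at the center
                    leaves[i][3] += f_noisy((lo + hi) / 2)
                    leaves[i][4] += 1
                    t += 1
                else:        # enough samples: split into two children
                    leaves.pop(i)
                    mid = (lo + hi) / 2
                    leaves.append([lo, mid, d + 1, 0.0, 0])
                    leaves.append([mid, hi, d + 1, 0.0, 0])
            if t >= n:
                break
    # recommend the center of the sampled leaf with the best empirical mean
    best = max((l for l in leaves if l[4] > 0), key=lambda l: l[3] / l[4])
    return (best[0] + best[1]) / 2

random.seed(0)
f = lambda x: (math.sin(13 * x) * math.sin(27 * x) + 1) / 2
y = lambda x: f(x) + random.uniform(-0.05, 0.05)  # bounded zero-mean noise
x_hat = stosoo(y, n=2000, k=30)
```

The UCB replaces the exact cell value used by SOO, so noise only slows the split rate (by the factor k) instead of corrupting the depth sweep.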
58 Performance of StoSOO. Theorem 3 (Valko, Carpentier, Munos, 2013). Let l be any semi-metric such that f is locally smooth w.r.t. l, and assume the near-optimality dimension of f w.r.t. l is d = 0. By choosing k = n/(log n)³ and h_max(n) = (log n)^{3/2}, the expected loss of StoSOO is E r_n = O((log n)² / √n). This is almost as good as the best known bandit algorithms in metric spaces optimally fitted (HOO [Bubeck, Munos, Szepesvári, Stoltz, 2011], Zooming [Kleinberg, Slivkins, Upfal, 2008]). See also [Bull, 2013].
59 Numerical evaluation of SOO. We consider the CEC 2014 Competition on Single Objective Real-Parameter Numerical Optimization: several test functions defined on [−100, 100]^D in dimension D ∈ {10, ..., 100}, using a finite budget of function evaluations. Joint work with Philippe Preux and Michal Valko.
60 Shifted and Rotated Rosenbrock s Function
61 Shifted and Rotated Ackley s Function
62 Shifted and Rotated Weierstrass Function
63 Shifted and Rotated Griewank s Function
64 Shifted Rastrigin s Function
65 Shifted Schwefel s Function
66 Shifted and Rotated HappyCat Function
67 Shifted and Rotated Expanded Griewank s plus Rosenbrock s Function
68 Shifted and Rotated Expanded Scaffer s Function
69 Composition Function 1
70 Composition Function 2
71 Composition Function 3
72 Performance of SOO for the CEC 2014 competition: relative error f(x(n))/f* for n = 10^5 function evaluations, reported for each test function (Rosenbrock, Ackley, Weierstrass, Griewank, Rastrigin, Schwefel, HappyCat, Griewank + Rosenbrock, Scaffer, Composition Functions 1-3) in dimensions D ∈ {10, 30, 50, 100}. [Table values not recovered in the transcription.]
73 Post-processing with a local optimizer: 5% of the budget is allocated to post-optimization (using NLopt). The table compares the error with and without post-optimization (PO) for each test function in dimensions 10D and 30D. [Table values not recovered in the transcription.]
74 Conclusions. The optimistic principle: explore the most promising areas of the search space first. Multi-scale exploration gives a natural transition from global to local search. Dimension is not the only measure of complexity; rather, it is the local smoothness of the function around the maximum, and how much we know about this smoothness.
75 Reference. From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning. Foundations and Trends in Machine Learning, Vol. 7, No. 1. Thanks!
More informationReinforcement Learning
Reinforcement Learning Lecture 5: Bandit optimisation Alexandre Proutiere, Sadegh Talebi, Jungseul Ok KTH, The Royal Institute of Technology Objectives of this lecture Introduce bandit optimisation: the
More informationPartitioning Metric Spaces
Partitioning Metric Spaces Computational and Metric Geometry Instructor: Yury Makarychev 1 Multiway Cut Problem 1.1 Preliminaries Definition 1.1. We are given a graph G = (V, E) and a set of terminals
More informationLecture 6: Non-stochastic best arm identification
CSE599i: Online and Adaptive Machine Learning Winter 08 Lecturer: Kevin Jamieson Lecture 6: Non-stochastic best arm identification Scribes: Anran Wang, eibin Li, rian Chan, Shiqing Yu, Zhijin Zhou Disclaimer:
More informationZero-Order Methods for the Optimization of Noisy Functions. Jorge Nocedal
Zero-Order Methods for the Optimization of Noisy Functions Jorge Nocedal Northwestern University Simons Institute, October 2017 1 Collaborators Albert Berahas Northwestern University Richard Byrd University
More informationThese slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Music and Machine Learning (IFT68 Winter 8) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
More informationBandit Algorithms for Pure Exploration: Best Arm Identification and Game Tree Search. Wouter M. Koolen
Bandit Algorithms for Pure Exploration: Best Arm Identification and Game Tree Search Wouter M. Koolen Machine Learning and Statistics for Structures Friday 23 rd February, 2018 Outline 1 Intro 2 Model
More informationCognitive Cyber-Physical System
Cognitive Cyber-Physical System Physical to Cyber-Physical The emergence of non-trivial embedded sensor units, networked embedded systems and sensor/actuator networks has made possible the design and implementation
More informationmin f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;
Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many
More informationADMM and Fast Gradient Methods for Distributed Optimization
ADMM and Fast Gradient Methods for Distributed Optimization João Xavier Instituto Sistemas e Robótica (ISR), Instituto Superior Técnico (IST) European Control Conference, ECC 13 July 16, 013 Joint work
More informationOptimistic Planning for Belief-Augmented Markov Decision Processes
Optimistic Planning for Belief-Augmented Markov Decision Processes Raphael Fonteneau, Lucian Buşoniu, Rémi Munos Department of Electrical Engineering and Computer Science, University of Liège, BELGIUM
More informationThe deterministic Lasso
The deterministic Lasso Sara van de Geer Seminar für Statistik, ETH Zürich Abstract We study high-dimensional generalized linear models and empirical risk minimization using the Lasso An oracle inequality
More informationMultilevel Monte Carlo for Lévy Driven SDEs
Multilevel Monte Carlo for Lévy Driven SDEs Felix Heidenreich TU Kaiserslautern AG Computational Stochastics August 2011 joint work with Steffen Dereich Philipps-Universität Marburg supported within DFG-SPP
More informationOn the Complexity of Best Arm Identification with Fixed Confidence
On the Complexity of Best Arm Identification with Fixed Confidence Discrete Optimization with Noise Aurélien Garivier, Emilie Kaufmann COLT, June 23 th 2016, New York Institut de Mathématiques de Toulouse
More informationPure Exploration in Infinitely-Armed Bandit Models with Fixed-Confidence
Pure Exploration in Infinitely-Armed Bandit Models with Fixed-Confidence Maryam Aziz, Jesse Anderton, Emilie Kaufmann, Javed Aslam To cite this version: Maryam Aziz, Jesse Anderton, Emilie Kaufmann, Javed
More information