The optimistic principle applied to function optimization


1 The optimistic principle applied to function optimization
Rémi Munos (Google DeepMind; INRIA Lille, SequeL team)
LION 9, 2015

2 The optimistic principle applied to function optimization
Optimistic principle: play optimally in the most favorable world compatible with observations.
Function optimization: Lipschitz optimization.
Extensions:
- Relax the Lipschitz assumption
- Design tractable algorithms
- Deal with an unknown metric
- Handle noise
Numerical experiments.

3 Optimization of a deterministic Lipschitz function
Problem: find online the maximum of $f : X \to \mathbb{R}$, assumed to be Lipschitz:
$$|f(x) - f(y)| \le L \|x - y\|_2 =: \ell(x, y).$$
Protocol: for each time step $t = 1, 2, \dots, n$:
- Select a point $x_t \in X$
- Observe $f(x_t)$
After $n$ evaluations, return a recommendation $x(n)$.
Loss: $r_n = f^* - f(x(n))$, where $f^* = \sup_{x \in X} f(x)$.

4 Example in 1d
(Figure: function $f$, its maximum $f^*$, and the evaluation $f(x_t)$ at $x_t$.)
By the Lipschitz property, the evaluation of $f$ at $x_t$ provides a first upper bound on $f$:
$$B(x) = f(x_t) + \ell(x, x_t).$$

5 Example in 1d (continued)
Each new point refines the upper bound:
$$B(x) = \min_t \, [f(x_t) + \ell(x, x_t)].$$

6 Example in 1d (continued)
Question: where should one sample the next point?
Answer: select the point with the highest upper bound!
Optimism in the face of (partial-observation) uncertainty.
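This selection rule is easy to sketch in Python for the 1-d case; the grid-based maximization of $B$ and all names below are illustrative assumptions, not the talk's implementation:

    import numpy as np

    def lipschitz_optimize(f, L, n, grid_size=1000):
        """Optimistic optimization of an L-Lipschitz f on [0, 1]."""
        grid = np.linspace(0.0, 1.0, grid_size)
        xs, ys = [0.5], [f(0.5)]                 # arbitrary first query
        for _ in range(n - 1):
            # Upper bound B(x) = min_t [f(x_t) + L |x - x_t|], on a finite grid
            B = np.min([y + L * np.abs(grid - x) for x, y in zip(xs, ys)], axis=0)
            x_next = float(grid[np.argmax(B)])   # optimism: query the highest bound
            xs.append(x_next)
            ys.append(f(x_next))
        return xs[int(np.argmax(ys))]            # x(n): best evaluated point

    # Example with the test function used later in the talk and L = 14:
    # x_best = lipschitz_optimize(lambda x: (np.sin(13*x)*np.sin(27*x)+1)/2, 14, 150)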

7 Several issues
1. The Lipschitz assumption is too strong.
2. Finding the maximum of the upper-bounding function is computationally difficult!
3. What if we don't know the metric $\ell$?
4. How do we handle noise?

8 Local smoothness property
Assumption: $f$ is locally smooth around its maximum w.r.t. a semi-metric $\ell$:
$$f(x^*) - f(x) \le \ell(x, x^*) \quad \text{for all } x \in X.$$
(Figure: $f$ lies above the envelope $f(x^*) - \ell(x, x^*)$.)

9 Local smoothness is enough!
Upper-bounding function: $B(x) = \min_t \, [f(x_t) + \ell(x, x_t)]$.
Property: $B(x^*) \ge f(x^*)$.
The optimistic principle only requires a true bound at the maximum.

10 Efficient implementation
Use a hierarchical partitioning of the space; the partitioning can be represented by a tree. (Example figure not transcribed.)

11 Deterministic Optimistic Optimization (DOO)
For $t = 1$ to $n$:
- Evaluate the function at the center $x_i$ of each cell $i$.
- Split the cell with the highest bound:
$$I_t = \arg\max_i \Big[ f(x_i) + \underbrace{\max_{x \in X_i} \ell(x, x_i)}_{\operatorname{diam}_\ell(X_i)} \Big].$$
Return $x(n) \stackrel{\mathrm{def}}{=} \arg\max_{x_t,\, 1 \le t \le n} f(x_t)$.
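A compact Python rendering of DOO on $X = [0, 1]$ with binary splits; this is a sketch under these assumptions, with a user-supplied cell-diameter bound delta(h), not the authors' reference code:

    import heapq

    def doo(f, delta, n):
        """DOO on X = [0, 1]: cells are intervals; the cell with the highest
        b-value f(center) + delta(depth) is split into two children."""
        c = 0.5
        fc = f(c)                                   # evaluate the root's center
        heap = [(-(fc + delta(0)), 0, (0.0, 1.0))]  # max-heap via negated keys
        best_x, best_f = c, fc
        t = 1
        while t < n:
            _, h, (lo, hi) = heapq.heappop(heap)    # cell with the highest bound
            mid = (lo + hi) / 2
            for child in ((lo, mid), (mid, hi)):    # split it, evaluate child centers
                if t >= n:
                    break
                c = (child[0] + child[1]) / 2
                fc = f(c)
                t += 1
                if fc > best_f:
                    best_x, best_f = c, fc
                heapq.heappush(heap, (-(fc + delta(h + 1)), h + 1, child))
        return best_x                               # x(n): best evaluated point

    # E.g. DOO fitted to l_1(x, y) = 14 |x - y|: delta = lambda h: 14 * 0.5 ** h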

12 Analysis of DOO
Theorem 1. The loss $r_n \stackrel{\mathrm{def}}{=} f(x^*) - f(x(n))$ of DOO satisfies
$$r_n = O(n^{-1/d}) \text{ for } d > 0, \qquad r_n = O(e^{-cn}) \text{ for } d = 0,$$
where $d$ is the near-optimality dimension of $f$ w.r.t. $\ell$: there exists $C$ such that, for all $\epsilon$, the set of $\epsilon$-optimal states $X_\epsilon \stackrel{\mathrm{def}}{=} \{x \in X : f(x) \ge f^* - \epsilon\}$ can be covered by $C\epsilon^{-d}$ $\ell$-balls of radius $\epsilon$.

13 Example 1
Assume the function is piecewise linear at its maximum: $f(x^*) - f(x) = \Theta(\|x - x^*\|)$.
Using $\ell(x, y) = \|x - y\|$, it takes $O(\epsilon^0) = O(1)$ balls of radius $\epsilon$ to cover $X_\epsilon$. Thus $d = 0$.

14 Example 2
Assume the function is locally quadratic around its maximum: $f(x^*) - f(x) = \Theta(\|x - x^*\|^2)$.
For $\ell(x, y) = \|x - y\|$, it takes $O(\epsilon^{-D/2})$ balls of radius $\epsilon$ to cover $X_\epsilon$ (of volume $O(\epsilon^{D/2})$). Thus $d = D/2$.

15 Example 2 (continued)
Same locally quadratic function: $f(x^*) - f(x) = \Theta(\|x - x^*\|^2)$.
For $\ell(x, y) = \|x - y\|^2$, it takes $O(\epsilon^0)$ $\ell$-balls of radius $\epsilon$ to cover $X_\epsilon$. Thus $d = 0$.
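These covering numbers can be checked with a few lines of Python; an illustrative computation (the helper name is hypothetical) for $D = 1$, $f(x) = -|x|^\alpha$ with maximum at $x^* = 0$:

    import math

    def n_balls(alpha, beta, eps):
        """Number of l-balls of radius eps needed to cover
        X_eps = {x in [0, 1] : |x|**alpha <= eps}, where l(x, y) = |x - y|**beta.
        An l-ball of radius eps is a Euclidean interval of radius eps**(1/beta)."""
        radius_X = eps ** (1.0 / alpha)        # X_eps = [0, eps^(1/alpha)]
        radius_ball = eps ** (1.0 / beta)
        return math.ceil(radius_X / radius_ball)

    print(n_balls(2, 1, 1e-6))   # ~ eps^(-1/2) = 1000 balls -> d = 1/2 (= D/2, D = 1)
    print(n_balls(2, 2, 1e-6))   # 1 ball -> d = 0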

16 Example 3
Assume the function has square-root behavior around its maximum: $f(x^*) - f(x) = \Theta(\|x - x^*\|^{1/2})$.
For $\ell(x, y) = \|x - y\|^{1/2}$ we have $d = 0$.

17 About the local smoothness assumption
Assume $X = [0, 1]^D$ and $f$ is locally equivalent to a polynomial of degree $\alpha > 0$ around its maximum (i.e., $f$ is $\alpha$-smooth): $f(x^*) - f(x) = \Theta(\|x - x^*\|^\alpha)$.
Consider the semi-metric $\ell(x, y) = \|x - y\|^\beta$ for some $\beta > 0$.
- If $\alpha = \beta$, then $d = 0$.
- If $\alpha > \beta$, then $d = D(\frac{1}{\beta} - \frac{1}{\alpha}) > 0$.
- If $\alpha < \beta$, then the function is not locally smooth w.r.t. $\ell$.
DOO heavily depends on our knowledge of the true local smoothness.
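For $\alpha > \beta$ the value of $d$ follows from a covering argument; a sketch of the computation, under the $\Theta(\|x - x^*\|^\alpha)$ assumption above:

    % X_eps is (up to constants) a Euclidean ball of radius eps^{1/alpha};
    % an l-ball of radius eps is a Euclidean ball of radius eps^{1/beta}.
    N(\varepsilon) \asymp \left(\frac{\varepsilon^{1/\alpha}}{\varepsilon^{1/\beta}}\right)^{D}
      = \varepsilon^{-D\left(\frac{1}{\beta}-\frac{1}{\alpha}\right)},
    \qquad \text{hence } d = D\Big(\frac{1}{\beta}-\frac{1}{\alpha}\Big).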

18 Several possible metrics
A function may be locally smooth w.r.t. several metrics:
- $\ell_1(x, y) = 14 \|x - y\|$ (i.e., $f$ is globally Lipschitz)
- $\ell_2(x, y) = 222 \|x - y\|^2$ (i.e., $f$ is locally quadratic)

19 Experiments
Using $\ell_1(x, y) = 14 \|x - y\|$ (i.e., $f$ is globally Lipschitz), $n = 150$.

20 Experiments (continued)
Using $\ell_2(x, y) = 222 \|x - y\|^2$ (i.e., $f$ is locally quadratic), $n = 150$.

21 Experiments (continued)
Loss $r_n$ for different values of $n$, for a uniform grid and for DOO with the two semi-metrics $\ell_1$ ($d = 1/2$) and $\ell_2$ ($d = 0$). (Table values not transcribed.)

22 What if the smoothness is unknown?
Assume $f$ is locally smooth w.r.t. some unknown metric $\ell$.
Question: is it possible to implement the optimistic principle?
Answer: be optimistic for all possible metrics!
Simultaneous Optimistic Optimization (SOO) [Munos, 2011]:
- Expand several cells simultaneously.
- SOO expands at most one cell per depth (cell size).
- SOO expands a cell only if its value is larger than the value of all cells of the same or larger size.
(Inspired by the DIRECT algorithm [Jones et al., 1993].)

23 SOO algorithm
Input: the maximum depth function $t \mapsto h_{\max}(t)$.
Initialization: $\mathcal{T}_1 = \{(0, 0)\}$ (root node). Set $t = 1$.
while true do
    Set $v_{\max} = -\infty$.
    for $h = 0$ to $\min(\operatorname{depth}(\mathcal{T}_t), h_{\max}(t))$ do
        Select the leaf $(h, i) \in \mathcal{L}_t$ of depth $h$ with maximal value $f(x_{h,i})$.
        if $f(x_{h,i}) > v_{\max}$ then
            Expand the node $(h, i)$. Set $v_{\max} = f(x_{h,i})$. Set $t = t + 1$.
            if $t = n$ then return $x(n) = \arg\max_{(h,i) \in \mathcal{T}_n} f(x_{h,i})$.
        end if
    end for
end while
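An illustrative Python version for $X = [0, 1]$ with ternary splits; the split rule, re-evaluation of child centers, and the choice $h_{\max}(t) = \sqrt{t}$ are assumptions consistent with the slides, not the reference implementation:

    import math

    def soo(f, n):
        """SOO on X = [0, 1], ternary splits, h_max(t) = sqrt(t) (a common choice)."""
        leaves = {0: [(f(0.5), 0.0, 1.0)]}          # depth -> list of (value, lo, hi)
        t = 1
        best_f, best_x = leaves[0][0][0], 0.5
        while t < n:
            v_max = -math.inf
            for h in range(min(max(leaves), int(math.sqrt(t))) + 1):
                cells = leaves.get(h)
                if not cells:
                    continue
                i = max(range(len(cells)), key=lambda j: cells[j][0])
                v, lo, hi = cells[i]
                if v > v_max:                       # expand at most one cell per depth
                    cells.pop(i)
                    w = (hi - lo) / 3
                    for k in range(3):              # evaluate the three children's centers
                        if t >= n:
                            break
                        clo, chi = lo + k * w, lo + (k + 1) * w
                        fc = f((clo + chi) / 2)
                        t += 1
                        if fc > best_f:
                            best_f, best_x = fc, (clo + chi) / 2
                        leaves.setdefault(h + 1, []).append((fc, clo, chi))
                    v_max = v
        return best_x                               # x(n): best evaluated point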

(Slides 24-47: figures only; no text transcribed.)

48 Performance of SOO
Theorem 2 (Munos, 2014). For any semi-metric $\ell$ such that $f$ is locally smooth w.r.t. $\ell$, let $d$ be the near-optimality dimension of $f$ w.r.t. $\ell$. Then
$$r_n = \tilde{O}(n^{-1/d}) \text{ if } d > 0, \qquad r_n = O(e^{-c\sqrt{n}}) \text{ if } d = 0.$$
Remarks:
- The algorithm does not use the knowledge of $\ell$, so the analysis holds for the best possible choice of $\ell$.
- SOO does almost as well as DOO optimally fitted.

49 Numerical experiments
Again for the function $f(x) = (\sin(13x) \sin(27x) + 1)/2$: loss for different values of $n$, for a uniform grid, DOO with $\ell_1$, DOO with $\ell_2$, and SOO. (Table values not transcribed.)

50 The case d = 0 is non-trivial!
Example: $f$ is locally $\alpha$-smooth around its maximum, for some $\alpha > 0$: $f(x^*) - f(x) = \Theta(\|x - x^*\|^\alpha)$.
The SOO algorithm does not require the knowledge of $\ell$. Using $\ell(x, y) = \|x - y\|^\alpha$ in the analysis, all assumptions are satisfied (with $\gamma = 3^{-\alpha/D}$ and $d = 0$), thus the loss of SOO is $r_n = O(3^{-\sqrt{n}\,\alpha/(CD)})$.
This is almost as good as DOO optimally fitted!
(Extends to the case $f(x^*) - f(x) = \sum_{i=1}^{D} c_i |x_i - x_i^*|^{\alpha_i}$.)

51 The case d = 0
More generally, any function whose upper and lower envelopes around $x^*$ have the same shape, i.e., there exist $c > 0$ and $\eta > 0$ such that
$$\min(\eta, c\,\ell(x, x^*)) \le f(x^*) - f(x) \le \ell(x, x^*) \quad \text{for all } x \in X,$$
has near-optimality dimension $d = 0$ (w.r.t. the metric $\ell$).
(Figure: $f$ squeezed between the envelopes $f(x^*) - c\,\ell(x, x^*)$ and $f(x^*) - \ell(x, x^*)$.)

52 Example of functions for which d = 0
$\ell(x, y) = c \|x - y\|^2$. Thus $r_n = O(e^{-c\sqrt{n}})$.

53 Example of functions with d = 0
$\ell(x, y) = c \|x - y\|^{1/2}$. Thus $r_n = O(e^{-c\sqrt{n}})$.

54 d = 0?
$\ell(x, y) = c \|x - y\|^{1/2}$.

55 d > 0
$$f(x) = 1 - \sqrt{x} + (-x^2 + \sqrt{x})\,(\sin(1/x^2) + 1)/2$$
The lower envelope is of order 1/2 whereas the upper one is of order 2. We deduce that $d = 3/2$ and $r_n = O(n^{-2/3})$.

56 How to handle noise?
The evaluation of $f$ at $x_t$ is perturbed by noise: $y_t = f(x_t) + \epsilon_t$, with $\mathbb{E}[\epsilon_t \mid x_t] = 0$.
(Figure: noisy observations of $f$.)

57 Stochastic SOO (StoSOO)
Extends SOO to stochastic evaluations:
- Select the cells $X_i$ (at most one per depth) according to SOO, based on the UCBs
$$\hat{\mu}_{i,t} + c \sqrt{\frac{\log n}{T_i(t)}},$$
and get one more value $y_t = f(x_i) + \epsilon_t$ of $f$ at $x_i$.
- If $T_i(t) \ge k$, then split the cell $X_i$.
Remark: this really looks like UCT [Kocsis, Szepesvári, 2006], except that several cells are selected at each round. It implements the optimistic principle (here UCB) at several scales.
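The per-cell statistics behind these UCBs can be sketched as follows; a minimal illustration in which the class, the constant $c$, and the split threshold $k$ are assumptions matching the slide, not the paper's code:

    import math

    class Cell:
        """Statistics StoSOO keeps per cell X_i: running mean of noisy
        evaluations at the cell's center, and the pull count T_i(t)."""
        def __init__(self, lo, hi):
            self.lo, self.hi = lo, hi
            self.total, self.pulls = 0.0, 0

        def update(self, y):               # one more noisy value y_t = f(x_i) + eps_t
            self.total += y
            self.pulls += 1

        def b_value(self, n, c=1.0):       # UCB: mu_hat + c * sqrt(log n / T_i(t))
            if self.pulls == 0:
                return math.inf            # unvisited cells are maximally optimistic
            return self.total / self.pulls + c * math.sqrt(math.log(n) / self.pulls)

        def ready_to_split(self, k):       # split X_i once T_i(t) >= k
            return self.pulls >= k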

58 Performance of StoSOO
Theorem 3 (Valko, Carpentier, Munos, 2013). For any semi-metric $\ell$ such that $f$ is locally smooth w.r.t. $\ell$, assume the near-optimality dimension of $f$ w.r.t. $\ell$ is $d = 0$. By choosing $k = \frac{n}{(\log n)^3}$ and $h_{\max}(n) = (\log n)^{3/2}$, the expected loss of StoSOO is
$$\mathbb{E}\, r_n = O\Big(\frac{(\log n)^2}{\sqrt{n}}\Big).$$
This is almost as good as the best known bandit algorithms in metric spaces optimally fitted (HOO [Bubeck, Munos, Szepesvári, Stoltz, 2011], Zooming [Kleinberg, Slivkins, Upfal, 2008]). See also [Bull, 2013].

59 Numerical evaluation of SOO
We consider the CEC 2014 Competition on Single-Objective Real-Parameter Numerical Optimization: several test functions defined on $[-100, 100]^D$ in dimension $D \in \{10, \dots, 100\}$, using a finite budget of function evaluations.
Joint work with Philippe Preux and Michal Valko.
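To get the flavor of these benchmarks with the soo sketch shown earlier, one can plug in, e.g., a one-dimensional (unshifted, unrotated) Rastrigin function rescaled to $[0, 1]$; this is an illustration only, not the CEC 2014 setup:

    import math

    def neg_rastrigin_1d(x):
        """Rastrigin (normally minimized) negated for maximization;
        [0, 1] is mapped to the CEC domain [-100, 100]."""
        z = 200.0 * x - 100.0
        return -(10.0 + z * z - 10.0 * math.cos(2.0 * math.pi * z))

    # x_best = soo(neg_rastrigin_1d, n=10_000)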

60 Shifted and Rotated Rosenbrock's Function

61 Shifted and Rotated Ackley's Function

62 Shifted and Rotated Weierstrass Function

63 Shifted and Rotated Griewank's Function

64 Shifted Rastrigin's Function

65 Shifted Schwefel's Function

66 Shifted and Rotated HappyCat Function

67 Shifted and Rotated Expanded Griewank's plus Rosenbrock's Function

68 Shifted and Rotated Expanded Scaffer's Function

69 Composition Function 1

70 Composition Function 2

71 Composition Function 3

72 Performance of SOO for the CEC 2014 competition
Relative error $f(x(n))/f^*$ for $n = 10^5$ function evaluations, for each benchmark function (Rosenbrock, Ackley, Weierstrass, Griewank, Rastrigin, Schwefel, HappyCat, Griewank + Rosenbrock, Scaffer, Composition 1-3) in dimensions 10, 30, 50, and 100. (Table values not transcribed.)

73 Post-processing with a local optimizer
5% of the budget is allocated to post-optimization (using NLopt).
Relative error with and without post-optimization (PO) for each benchmark function, in dimensions 10 and 30. (Table values not transcribed.)

74 Conclusions
The optimistic principle:
- Explore the most promising areas of the search space first.
- Multi-scale exploration: a natural transition from global to local search.
Dimension is not the only measure of complexity: rather, it is the local smoothness of the function around the maximum, and how much we know about this smoothness.

75 Reference
From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning. Foundations and Trends in Machine Learning, Vol. 7, No. 1, pp. 1-129, 2014.
Thanks!!!
