The Geometry of Gaussian Processes and Bayesian Optimization. Emile Contal, CMLA, ENS Cachan
Outline

- Background: Global Optimization and Gaussian Processes
- The Geometry of Gaussian Processes and the Chaining Trick
- Algorithm and Theoretical Results
- Experiments on Real and Synthetic Data Sets
- Further Results: Quadratic Forms, Noise-Free Optimization and Lower Bounds
Sequential Black-Box Optimization

Problem statement. Let $f : \mathcal{X} \to \mathbb{R}$, where $\mathcal{X}$ could be a subset of $\mathbb{R}^D$, non-parametric, etc. We consider the problem of finding the maximum of $f$, denoted by $f^\star = \sup_{x \in \mathcal{X}} f(x)$, via successive (expensive) queries $f(x_1), f(x_2), \dots$

Noisy observations. At iteration $T$ we choose $x_{T+1}$ using the previous noisy observations $Y_T = \{y_1, \dots, y_T\}$, where for all $t \le T$: $y_t = f(x_t) + \epsilon_t$ with $\epsilon_t \overset{iid}{\sim} \mathcal{N}(0, \eta^2)$.
Objective

Regret (unknown in practice). The efficiency of a policy is measured via the simple or the cumulative regret:
$S_T = \min_{t \le T} \{f^\star - f(x_t)\}$, $R_T = \sum_{t \le T} \big(f^\star - f(x_t)\big)$.

Goals:
- $S_T \to 0$ as $T \to \infty$, as fast as possible (e.g. numerical optimization),
- $R_T = o(T)$, as small as possible (e.g. clinical trials, ad campaigns).

Our aim is to obtain upper bounds on $S_T$ and $R_T$ with high probability.
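The two regrets above are simple to compute once $f^\star$ and the queried values are known. A minimal sketch (the value $f^\star = 1.0$ and the query values are hypothetical, only for illustration):

```python
import numpy as np

def simple_regret(f_star, f_values):
    """S_T = min_{t<=T} (f* - f(x_t)): gap of the best query made so far."""
    return f_star - np.max(f_values)

def cumulative_regret(f_star, f_values):
    """R_T = sum_{t<=T} (f* - f(x_t)): total gap accumulated over all queries."""
    return float(np.sum(f_star - np.asarray(f_values)))

# Hypothetical values: optimum f* = 1.0, three evaluations of f at the queries
f_star, queries = 1.0, [0.25, 0.5, 0.75]
s_T = simple_regret(f_star, queries)      # best query so far is 0.75
r_T = cumulative_regret(f_star, queries)  # 0.75 + 0.5 + 0.25
```

Note that both regrets are unknown to the algorithm in practice, since $f^\star$ is unknown; they are only used for analysis and benchmarking.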
Exploration/Exploitation Tradeoff

[Figure: a 1D objective with four noisy observations $(x_1, y_1), \dots, (x_4, y_4)$; where should the next query $x_5$ go?]
Gaussian Processes

Definition. $f \sim \mathcal{GP}(m, k)$ with mean function $m : \mathcal{X} \to \mathbb{R}$ and covariance function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}^+$, when for all $x_1, \dots, x_n \in \mathcal{X}$ we have:
$\big(f(x_1), \dots, f(x_n)\big) \sim \mathcal{N}\big([m(x_i)]_i, [k(x_i, x_j)]_{i,j}\big).$

Probabilistic smoothness assumption:
- nearby locations are highly correlated,
- large local variations have low probability.

Examples of covariance functions:
- squared exponential (RBF): $k(x, y) = \exp\big(-\|x - y\|_2^2 / (2l^2)\big)$,
- rational quadratic: $k(x, y) = \big(1 + \|x - y\|_2^2 / (2\alpha l^2)\big)^{-\alpha}$.
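The two covariance functions above are direct to implement. A minimal sketch (function names and the default lengthscale $l = 1$ are my own choices, not from the talk):

```python
import numpy as np

def squared_exponential(x, y, l=1.0):
    """k(x, y) = exp(-||x - y||_2^2 / (2 l^2))"""
    sq = np.sum((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return float(np.exp(-sq / (2.0 * l ** 2)))

def rational_quadratic(x, y, l=1.0, alpha=1.0):
    """k(x, y) = (1 + ||x - y||_2^2 / (2 alpha l^2))^(-alpha)"""
    sq = np.sum((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return float((1.0 + sq / (2.0 * alpha * l ** 2)) ** (-alpha))
```

Both kernels equal 1 at $x = y$ and decay with distance, which matches the smoothness assumption: nearby locations are highly correlated.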
Gaussian Processes: Examples

[Figure: samples of 1D Gaussian processes with different covariance functions.]
Posterior Distribution

Bayesian inference (Rasmussen and Williams, 2006). Given the observations $Y_t = [y_1, \dots, y_t]$ at the query locations $X_t = (x_1, \dots, x_t)$, we compute for all $x \in \mathcal{X}$:
$\mu_t(x) := \mathbb{E}[f(x) \mid X_t, Y_t] = k_t(x)^\top C_t^{-1} Y_t$
$\sigma_t^2(x) := \mathbb{V}[f(x) \mid X_t, Y_t] = k(x, x) - k_t(x)^\top C_t^{-1} k_t(x)$
where $C_t = K_t + \eta^2 I$, $K_t = [k(x_i, x_j)]_{x_i, x_j \in X_t}$ and $k_t(x) = [k(x, x_i)]_{x_i \in X_t}$.

Interpretation: the posterior mean $\mu_t$ is the prediction; the posterior deviation $\sigma_t$ is the uncertainty.
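The posterior formulas above can be sketched in a few lines. This is a minimal illustrative implementation (a squared exponential kernel, the noise level $\eta = 0.1$ and a direct matrix inverse are assumptions for readability; a Cholesky solve would be preferable numerically):

```python
import numpy as np

def se_kernel(x, y, l=1.0):
    """Squared exponential kernel, broadcasting over arrays."""
    return np.exp(-((x - y) ** 2) / (2.0 * l ** 2))

def gp_posterior(X_t, Y_t, X_new, eta=0.1):
    """Posterior mean mu_t and variance sigma_t^2 at X_new, given (X_t, Y_t)."""
    X_t, Y_t = np.asarray(X_t, float), np.asarray(Y_t, float)
    K_t = se_kernel(X_t[:, None], X_t[None, :])
    C_inv = np.linalg.inv(K_t + eta ** 2 * np.eye(len(X_t)))  # C_t = K_t + eta^2 I
    mu, var = [], []
    for x in X_new:
        k_x = se_kernel(x, X_t)                          # vector k_t(x)
        mu.append(k_x @ C_inv @ Y_t)                     # mu_t(x) = k_t(x)^T C_t^{-1} Y_t
        var.append(se_kernel(x, x) - k_x @ C_inv @ k_x)  # sigma_t^2(x)
    return np.array(mu), np.array(var)
```

Near an observation the mean tracks the data and the variance collapses; far away, the posterior reverts to the prior $\mathcal{N}(0, k(x, x))$.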
Gaussian Process Confidence Bounds

[Figure: posterior mean of a Gaussian process with its confidence envelope.]
Setup: Summary

Assumptions:
- $f \sim \mathcal{GP}(0, k)$ with known covariance $k$,
- $y_t = f(x_t) + \epsilon_t$ where $\epsilon_t \sim \mathcal{N}(0, \eta^2)$ with known $\eta$.

Regrets:
$S_T = \min_{t \le T} \{f^\star - f(x_t)\}$, $R_T = \sum_{t \le T} \big(f^\star - f(x_t)\big)$.
Related Work

Bayesian optimization:
- Bull (2011): Expected Improvement algorithm
- Hennig et al. (2012): Entropy Search algorithm

Upper confidence bounds:
- Freitas et al. (2012): deterministic GP
- Srinivas et al. (2012): GP-UCB
- Djolonga et al. (2013): high-dimensional GP

Chaining:
- Grunewalder et al. (2010): known-horizon bandits
- Gaillard and Gerchinovitz (2015): online regression
The Geometry of Gaussian Processes and the Chaining Trick
Upper Confidence Bounds (UCB)

Strategy. If we have, with high probability, $f^\star - f(x_t) \le \mathrm{UCB}_t(x_t)$, then we can control the regret:
$R_T \le \sum_{t \le T} \mathrm{UCB}_t(x_t)$ and $S_T \le \frac{1}{T} \sum_{t \le T} \mathrm{UCB}_t(x_t)$.
Canonical Pseudo-Distance

A first UCB. Fix $x^\star \in \mathcal{X}$ and let
$d_t^2(x^\star, x_t) = \mathbb{V}[f(x^\star) - f(x_t) \mid X_t, Y_t] = \sigma_t^2(x^\star) - 2k_t(x^\star, x_t) + \sigma_t^2(x_t).$
For all $\delta > 0$, set $\beta_\delta = \sqrt{2 \log \delta^{-1}}$; with probability at least $1 - \delta$:
$f(x^\star) - f(x_t) \le \mu_t(x^\star) - \mu_t(x_t) + \beta_\delta\, d_t(x^\star, x_t).$

Union bound over $\mathcal{X}$. With $|\mathcal{X}| < \infty$, we have with probability at least $1 - |\mathcal{X}|\,\delta$:
$\sup_{x^\star \in \mathcal{X}} f(x^\star) - f(x_t) \le \sup_{x^\star \in \mathcal{X}} \mu_t(x^\star) - \mu_t(x_t) + \beta_\delta\, d_t(x^\star, x_t).$

But what if $|\mathcal{X}| = \infty$?
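The canonical pseudo-distance is just the posterior variance of the increment $f(x) - f(y)$, so it can be computed from posterior covariances. A minimal sketch under the same assumptions as before (squared exponential kernel, $\eta = 0.1$, direct inverse; the helper names are mine):

```python
import numpy as np

def se_kernel(x, y, l=1.0):
    return np.exp(-((x - y) ** 2) / (2.0 * l ** 2))

def posterior_cov(x, y, X_t, eta=0.1):
    """Posterior covariance k_t(x, y) = k(x, y) - k_t(x)^T C_t^{-1} k_t(y)."""
    X_t = np.asarray(X_t, float)
    K = se_kernel(X_t[:, None], X_t[None, :])
    C_inv = np.linalg.inv(K + eta ** 2 * np.eye(len(X_t)))
    kx, ky = se_kernel(x, X_t), se_kernel(y, X_t)
    return se_kernel(x, y) - kx @ C_inv @ ky

def d_t(x, y, X_t, eta=0.1):
    """Canonical pseudo-distance: d_t^2(x, y) = Var[f(x) - f(y) | X_t, Y_t]."""
    d2 = (posterior_cov(x, x, X_t, eta)
          - 2.0 * posterior_cov(x, y, X_t, eta)
          + posterior_cov(y, y, X_t, eta))
    return float(np.sqrt(max(d2, 0.0)))  # clip tiny negative round-off
```

As expected of a pseudo-distance, $d_t(x, x) = 0$ and $d_t$ is symmetric, but distinct points can be at distance zero if they are perfectly correlated under the posterior.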
Covering Numbers

ε-net. $T \subseteq \mathcal{X}$ is an ε-net of $\mathcal{X}$ for $d_t$ iff for all $x \in \mathcal{X}$ there exists $x' \in T$ such that $d_t(x, x') \le \varepsilon$.

Covering number. The covering number $N(\mathcal{X}, d_t, \varepsilon)$ is the size of the smallest ε-net.
An ε-net for the Euclidean Distance

[Figure: a set $\mathcal{X}$ covered by Euclidean balls of radius ε centred on the net points.]
Hierarchical Covers

Assumption (w.l.o.g.): for all $x, y \in \mathcal{X}$, $k(x, y) \le 1$. Since $d_t(x, y)$ is then controlled by $k$, any single point of $\mathcal{X}$ is a 1-net of $\mathcal{X}$ for $d_t$.

Hierarchical covers. Let $\mathcal{T} = (T_i)_{i \ge 0}$ be such that for all $i \ge 0$:
- $T_i$ is an $\varepsilon_i$-net with $\varepsilon_i = 2^{-i}$,
- $T_i \subseteq T_{i+1}$.
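On a finite candidate set, nested $\varepsilon_i$-nets can be built by extending each net into the next one. A minimal sketch, assuming a generic distance function and a simple "add any uncovered point" greedy rule (the helper names `greedy_net` and `hierarchical_covers` are mine, not from the talk):

```python
def greedy_net(points, eps, dist, seed_net=()):
    """Greedily extend seed_net into an eps-net: add uncovered points until all covered."""
    net = list(seed_net)
    uncovered = [p for p in points if all(dist(p, c) > eps for c in net)]
    while uncovered:
        net.append(uncovered[0])
        uncovered = [p for p in uncovered if dist(p, net[-1]) > eps]
    return net

def hierarchical_covers(points, depth, dist):
    """Nested nets T_0 <= T_1 <= ... <= T_depth with radii eps_i = 2^(-i)."""
    covers, net = [], []
    for i in range(depth + 1):
        net = greedy_net(points, 2.0 ** (-i), dist, seed_net=net)
        covers.append(list(net))
    return covers
```

Seeding each level with the previous net enforces the nesting $T_i \subseteq T_{i+1}$ by construction, at the cost of slightly larger nets than rebuilding each level from scratch.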
Hierarchical Covers: Illustration

[Figures: hierarchical covers of $\mathcal{X}$ at radii $\varepsilon_0 = 1$, $\varepsilon_1 = 1/2$, $\varepsilon_2 = 1/4$, $\varepsilon_3 = 1/8$.]
Localized Chaining

Projection onto $\mathcal{T}$, starting from $x_t$ toward $x^\star$. Define $\pi_i(x^\star) = \operatorname{argmin}_{x_i \in T_i \cup \{x_t\}} d_t(x^\star, x_i)$; then:
- $\pi_i(x^\star) \to_i x^\star$,
- $\pi_i(x^\star) = x_t$ if $d_t(x^\star, x_t) < \varepsilon_i$.

The chaining trick. Write the increment $f(x^\star) - f(x_t)$ as a telescoping sum along the projections:
$\sup_{x^\star \in \mathcal{X}} f(x^\star) - f(x_t) = \sup_{x^\star \in \mathcal{X}} \sum_{i : \varepsilon_i < d_t(x^\star, x_t)} \big( f(\pi_i(x^\star)) - f(\pi_{i-1}(x^\star)) \big).$
The Chaining Trick

[Figure: the chain of projections linking $x_t$ to $x^\star$ through the hierarchical nets.]
Upper Confidence Bound

Converging distances: $d_t\big(\pi_i(x^\star), \pi_{i-1}(x^\star)\big) \le \varepsilon_{i-1}$.

UCB at depth $i$ (union bound on $T_i$). For any $i \ge 1$, with probability at least $1 - |T_i|\,\delta$:
$\sup_{x^\star \in \mathcal{X}} f(\pi_i(x^\star)) - f(\pi_{i-1}(x^\star)) \le \sup_{x^\star \in \mathcal{X}} \mu_t(\pi_i(x^\star)) - \mu_t(\pi_{i-1}(x^\star)) + \beta_\delta\, \varepsilon_{i-1}.$

Final UCB. Set $\beta_{\delta,i} = \sqrt{2 \log(i^2 |T_i| \delta^{-1})}$; with probability at least $1 - \frac{\pi^2}{6}\delta$:
$\sup_{x^\star \in \mathcal{X}} f(x^\star) - f(x_t) \le \sup_{x^\star \in \mathcal{X}} \mu_t(x^\star) - \mu_t(x_t) + \sum_{i : \varepsilon_i < d_t(x^\star, x_t)} \varepsilon_i\, \beta_{\delta,i}.$
Algorithm and Theoretical Results
The Chaining-UCB Algorithm (Contal et al., 2015)

UCB policy:
$x_{t+1} \in \operatorname{argmax}_{x \in \mathcal{X}} \; \mu_t(x) + \sum_{i : \varepsilon_i < \sigma_t(x)} \varepsilon_i\, \beta_{\delta,i}.$

Practical remark. The algorithm only needs to compute the $\beta_{\delta,i}$ for the indices $i$ such that $\varepsilon_i > \min_{x \in \mathcal{X}} \sigma_t(x)$.
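The acquisition rule above can be sketched directly on a finite candidate grid. This is an illustrative sketch, not the authors' Matlab code: it assumes the posterior mean/deviation and the net sizes $|T_i|$ are precomputed, uses $\varepsilon_i = 2^{-i}$ and $\beta_{\delta,i} = \sqrt{2 \log(i^2 |T_i| / \delta)}$ as on the previous slide, and the function names are mine:

```python
import numpy as np

def chaining_ucb_index(mu, sigma, net_sizes, delta=0.05):
    """mu_t(x) + sum over {i : eps_i < sigma_t(x)} of eps_i * beta_{delta,i}."""
    bonus = 0.0
    for i, size_i in enumerate(net_sizes, start=1):
        eps_i = 2.0 ** (-i)
        if eps_i < sigma:  # only the levels finer than the local uncertainty contribute
            bonus += eps_i * np.sqrt(2.0 * np.log(i ** 2 * size_i / delta))
    return mu + bonus

def chaining_ucb_select(mus, sigmas, net_sizes, delta=0.05):
    """Next query: argmax of the chained UCB over the candidate grid."""
    scores = [chaining_ucb_index(m, s, net_sizes, delta) for m, s in zip(mus, sigmas)]
    return int(np.argmax(scores))
```

The exploration bonus adapts to the local geometry: points with small $\sigma_t(x)$ accumulate almost no bonus, so the tradeoff is calibrated automatically rather than through a hand-tuned $\beta_t$.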
Upper Confidence Bound: Illustration

[Figure: the chained upper confidence bound on top of the GP posterior.]
Greedy Cover

NP-hardness. Computing the $\beta_{\delta,i}$ requires the hierarchical $\varepsilon_i$-nets, and finding the smallest $\varepsilon$-net is NP-hard.

Greedy approximation (achieving the optimal approximation ratio):

    T ← ∅; X' ← X
    while X' ≠ ∅ do
        x ← argmax_{x ∈ X} |{x' ∈ X' : d(x, x') ≤ ε}|
        T ← T ∪ {x}
        X' ← X' \ {x' ∈ X' : d(x, x') ≤ ε}
    end
    return T
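The greedy pseudocode above translates almost line for line into Python. A minimal sketch for a finite point set with a generic distance function (quadratic in the number of points; the function name is mine):

```python
def greedy_cover(points, eps, dist):
    """Greedy set-cover approximation of the smallest eps-net:
    repeatedly pick the point covering the most still-uncovered points."""
    uncovered = set(range(len(points)))
    net = []
    while uncovered:
        best = max(range(len(points)),
                   key=lambda i: sum(1 for j in uncovered
                                     if dist(points[i], points[j]) <= eps))
        net.append(points[best])
        uncovered = {j for j in uncovered if dist(points[best], points[j]) > eps}
    return net
```

The loop always terminates: every uncovered point covers at least itself, so each iteration removes at least one point from `uncovered`.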
Theorem: Generic Bounds for the Chaining-UCB Algorithm

For $\delta > 0$, denoting $\sigma_t = \sigma_t(x_t)$, there exists $c_\delta \in \mathbb{R}$ such that for all $t \ge 1$:
$\sup_{x^\star \in \mathcal{X}} f(x^\star) - f(x_t) \le c_\delta\, \sigma_t \sqrt{6 \log \sigma_t^{-1}} + 9 \sum_{i : \varepsilon_i < \sigma_t} \varepsilon_i \sqrt{\log N(\mathcal{X}, d_t, \varepsilon_i)},$
with probability at least $1 - \delta$.
Corollary when Controlling the Metric Entropy

Assumption: there exists $D \in \mathbb{R}$ such that $N(\mathcal{X}, d_n, \varepsilon) = O(\varepsilon^{-D})$. It suffices that $d_n(x, x') = O(\|x - x'\|_2)$ and $\mathcal{X} \subseteq [0, R]^D$, e.g. for the squared exponential covariance, the Matérn covariance, etc.

Corollary:
$\sup_{x^\star \in \mathcal{X}} f(x^\star) - f(x_t) \le O\big(\sqrt{D}\, \sigma_t \sqrt{\log \sigma_t^{-1}}\big),$
thus $R_T \le O\big(\sqrt{D} \sum_{t=1}^T \sigma_t \sqrt{\log \sigma_t^{-1}}\big)$.
Information Gain

Lemma (Srinivas et al., 2012):
$\sum_{t=1}^T \sigma_t^2 \le O(\gamma_T),$
where $\gamma_T = \max_{X \subseteq \mathcal{X} : |X| = T} I(X)$ is the maximum information gain on $f$ from a set of $T$ observations. For a GP, $I(X) = \frac{1}{2} \log\det\big(I + \eta^{-2} K_X\big)$.

Upper bounds:
- linear covariance $k(x, y) = x^\top y$: $\gamma_T \le O(D \log T)$,
- squared exponential covariance $k(x, y) = e^{-\frac{1}{2}\|x - y\|_2^2}$: $\gamma_T \le O\big((\log T)^{D+1}\big)$,
- Matérn covariance with parameter $\nu > 1$: $\gamma_T \le O\big((\log T)\, T^a\big)$, with $a = \frac{D(D+1)}{2\nu + D(D+1)} < 1$.
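The information gain $I(X)$ is a single log-determinant. A minimal sketch (using `slogdet` for numerical stability rather than `det` directly):

```python
import numpy as np

def information_gain(K_X, eta):
    """I(X) = 1/2 * log det(I + eta^{-2} K_X) for observations with kernel matrix K_X."""
    n = K_X.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(n) + K_X / eta ** 2)
    return 0.5 * logdet
```

For instance, two independent unit-variance observations ($K_X = I$, $\eta = 1$) give $I(X) = \frac{1}{2}\log\det(2I) = \log 2$, while a zero kernel matrix yields no information gain.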
Corollary for the Regret

- Linear covariance: $R_T \le O\big(\sqrt{DT} \log T\big)$,
- squared exponential covariance: $R_T \le O\big(\sqrt{T (\log T)^{D+2}}\big)$,
- Matérn covariance: $R_T \le O\big(\sqrt{\log T}\; T^a\big)$, with $a = \frac{\nu + D + D^2}{2\nu + D + D^2}$.
Experiments on Real and Synthetic Data Sets
Experiments

[Figures: simple regret $S_n$ versus iteration $n$ for Chaining-UCB, GP-UCB and Random, on four tasks: (1) Himmelblau's function, (2) a GP sample with SE kernel, (3) a wave energy converter data set, (4) a kernel on graphs, with example values of $f$ on individual graphs shown in the figure.]
Further Results: Quadratic Forms, Noise-Free Optimization and Lower Bounds
Optimization of Other Stochastic Processes

Minimal assumption: $f : \mathcal{X} \to \mathbb{R}$ is a stochastic process with a pseudo-distance $d(\cdot, \cdot)$ and a function $\psi_u(\cdot)$ such that:
$\Pr\big[f(x) - f(x') > \psi_u\big(d(x, x')\big)\big] < e^{-u}.$

Example: quadratic forms of GPs, $f(x) = \sum_{i=1}^N g_i^2(x)$ where $g_i \sim \mathcal{GP}(0, k_i)$.

Applications:
- optimization of a costly mean-square error,
- optimization of a costly Gaussian likelihood (Bayesian model calibration).
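Samples from the quadratic-form example are easy to draw on a finite grid, which is useful for visualising such a process. A minimal sketch (the jitter term and the seeded generator are implementation conveniences, not part of the model):

```python
import numpy as np

def sample_quadratic_form(X, kernels, n_draws=1, seed=0):
    """Draw f(x) = sum_i g_i(x)^2 on a finite grid X, with independent g_i ~ GP(0, k_i)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, float)
    f = np.zeros((n_draws, len(X)))
    for k in kernels:
        K = np.array([[k(a, b) for b in X] for a in X])
        # Small jitter keeps the covariance numerically positive semi-definite
        g = rng.multivariate_normal(np.zeros(len(X)), K + 1e-10 * np.eye(len(X)),
                                    size=n_draws)
        f += g ** 2
    return f
```

Note that $f$ is no longer Gaussian (it is a sum of squared Gaussians), which is exactly why the chaining analysis is stated under the generic tail assumption above rather than for GPs only.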
Noise-Free Optimization

Problem setting: $f \sim \mathcal{GP}(0, k)$ and $y_t = f(x_t)$ (no observation noise).

Algorithm:
- pre-compute the hierarchical $\varepsilon_i$-nets for $d_0$ and build the corresponding tree,
- for $x$ in the tree, let $\Delta_\delta(x) = \sum_{i > \mathrm{Depth}(x)} \varepsilon_i\, \beta_{\delta,i}$,
- evaluate $f$ at the root,
- loop: $x_{t+1} = \operatorname{argmax}_{x \in \mathrm{Childs}(x_1, \dots, x_t)} f(x) + \Delta_\delta(x)$.
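The loop above is a best-first tree search: always expand the evaluated node with the highest optimistic value $f(x) + \Delta_\delta(x)$. Below is a schematic sketch only: the dyadic interval tree over $[0, 1]$ and the geometric bound $\Delta(x) = 2^{-\mathrm{depth}(x)}$ stand in for the talk's tree of $\varepsilon_i$-nets and the chained bound $\Delta_\delta(x)$, and all names are mine:

```python
import heapq

class IntervalNode:
    """Dyadic partition of [lo, hi]; f is evaluated at the interval's centre."""
    def __init__(self, lo, hi, depth=0):
        self.lo, self.hi, self.depth = lo, hi, depth
        self.point = 0.5 * (lo + hi)

    def children(self):
        mid = 0.5 * (self.lo + self.hi)
        return [IntervalNode(self.lo, mid, self.depth + 1),
                IntervalNode(mid, self.hi, self.depth + 1)]

def noise_free_optimize(f, delta_bound, budget=100):
    """Best-first search: always expand the node maximising f(x) + Delta(x)."""
    root = IntervalNode(0.0, 1.0)
    counter = 0  # tie-breaker so the heap never compares nodes directly
    heap = [(-(f(root.point) + delta_bound(root)), counter, root)]
    best = f(root.point)
    for _ in range(budget):
        _, _, node = heapq.heappop(heap)
        for child in node.children():
            val = f(child.point)
            best = max(best, val)
            counter += 1
            heapq.heappush(heap, (-(val + delta_bound(child)), counter, child))
    return best
```

When `delta_bound` upper-bounds the local variation of $f$ within a node's cell (as $\Delta_\delta$ does with high probability), the search never discards a cell that could still contain the optimum, which is what drives the $R_T = O(1)$ behaviour on the next slide.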
Noise-Free Optimization: Results

Property. With probability at least $1 - \delta$, $f^\star - f(x_t) \le \Delta_\delta(x_t)$.

Lemma.
$\Big|\big\{x_t : \Delta_\delta(x_t) \ge \epsilon\, (1 + \mathrm{Depth}(x_t))^{-1/2}\big\}\Big| \le O(1).$

Theorem (ongoing work). If $N(\mathcal{X}, d_0, \varepsilon) = O(\varepsilon^{-D})$, then for the previous algorithm, $R_T = O(1)$ and $S_T = O(e^{-T})$.
Lower Bounds on the Supremum of the GP

Reminder: UCB rewritten. With probability at least $1 - \delta$, for all $x$ at depth $h$ in the tree:
$\sup_{x' \in B(x, \epsilon_h)} f(x') - f(x) \le c \sum_{i > h} \epsilon_i\, \beta_{\delta,i}.$

Theorem: LCB (ongoing work). With probability at least $1 - \delta$, for all $x$ at depth $h$ in the tree:
$\sup_{x' \in B(x, \epsilon_h)} f(x') - f(x) \ge c' \sum_{i > h} \epsilon_i\, \beta_{\delta,i}.$
Conclusion

The Chaining-UCB algorithm:
- automatic calibration of the exploration/exploitation tradeoff,
- adapts to various settings,
- computationally tractable,
- Matlab code online.
References

Contal, E., Malherbe, C., and Vayatis, N. (2015). Optimization for Gaussian processes via chaining. NIPS Workshop on Bayesian Optimization.

Munos, R. (2011). Optimistic optimization of deterministic functions without the knowledge of its smoothness. In Advances in Neural Information Processing Systems (NIPS).

Rasmussen, C. E. and Williams, C. (2006). Gaussian Processes for Machine Learning. MIT Press.

Srinivas, N., Krause, A., Kakade, S., and Seeger, M. (2012). Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory, 58(5).
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters
More informationActive and Semi-supervised Kernel Classification
Active and Semi-supervised Kernel Classification Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London Work done in collaboration with Xiaojin Zhu (CMU), John Lafferty (CMU),
More informationBandit View on Continuous Stochastic Optimization
Bandit View on Continuous Stochastic Optimization Sébastien Bubeck 1 joint work with Rémi Munos 1 & Gilles Stoltz 2 & Csaba Szepesvari 3 1 INRIA Lille, SequeL team 2 CNRS/ENS/HEC 3 University of Alberta
More informationMidterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric
More informationWorst-Case Bounds for Gaussian Process Models
Worst-Case Bounds for Gaussian Process Models Sham M. Kakade University of Pennsylvania Matthias W. Seeger UC Berkeley Abstract Dean P. Foster University of Pennsylvania We present a competitive analysis
More informationOn the Complexity of Best Arm Identification in Multi-Armed Bandit Models
On the Complexity of Best Arm Identification in Multi-Armed Bandit Models Aurélien Garivier Institut de Mathématiques de Toulouse Information Theory, Learning and Big Data Simons Institute, Berkeley, March
More informationAnnouncements. Proposals graded
Announcements Proposals graded Kevin Jamieson 2018 1 Bayesian Methods Machine Learning CSE546 Kevin Jamieson University of Washington November 1, 2018 2018 Kevin Jamieson 2 MLE Recap - coin flips Data:
More informationKernels for Automatic Pattern Discovery and Extrapolation
Kernels for Automatic Pattern Discovery and Extrapolation Andrew Gordon Wilson agw38@cam.ac.uk mlg.eng.cam.ac.uk/andrew University of Cambridge Joint work with Ryan Adams (Harvard) 1 / 21 Pattern Recognition
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More informationThe Bayesian approach to inverse problems
The Bayesian approach to inverse problems Youssef Marzouk Department of Aeronautics and Astronautics Center for Computational Engineering Massachusetts Institute of Technology ymarz@mit.edu, http://uqgroup.mit.edu
More informationOnline Forest Density Estimation
Online Forest Density Estimation Frédéric Koriche CRIL - CNRS UMR 8188, Univ. Artois koriche@cril.fr UAI 16 1 Outline 1 Probabilistic Graphical Models 2 Online Density Estimation 3 Online Forest Density
More informationMidterm Review CS 7301: Advanced Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 7301: Advanced Machine Learning Vibhav Gogate The University of Texas at Dallas Supervised Learning Issues in supervised learning What makes learning hard Point Estimation: MLE vs Bayesian
More informationReliability Monitoring Using Log Gaussian Process Regression
COPYRIGHT 013, M. Modarres Reliability Monitoring Using Log Gaussian Process Regression Martin Wayne Mohammad Modarres PSA 013 Center for Risk and Reliability University of Maryland Department of Mechanical
More informationLecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu
Lecture: Gaussian Process Regression STAT 6474 Instructor: Hongxiao Zhu Motivation Reference: Marc Deisenroth s tutorial on Robot Learning. 2 Fast Learning for Autonomous Robots with Gaussian Processes
More informationComplexity of stochastic branch and bound methods for belief tree search in Bayesian reinforcement learning
Complexity of stochastic branch and bound methods for belief tree search in Bayesian reinforcement learning Christos Dimitrakakis Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
More informationKernel Bayes Rule: Nonparametric Bayesian inference with kernels
Kernel Bayes Rule: Nonparametric Bayesian inference with kernels Kenji Fukumizu The Institute of Statistical Mathematics NIPS 2012 Workshop Confluence between Kernel Methods and Graphical Models December
More informationTwo generic principles in modern bandits: the optimistic principle and Thompson sampling
Two generic principles in modern bandits: the optimistic principle and Thompson sampling Rémi Munos INRIA Lille, France CSML Lunch Seminars, September 12, 2014 Outline Two principles: The optimistic principle
More informationLecture 4: Lower Bounds (ending); Thompson Sampling
CMSC 858G: Bandits, Experts and Games 09/12/16 Lecture 4: Lower Bounds (ending); Thompson Sampling Instructor: Alex Slivkins Scribed by: Guowei Sun,Cheng Jie 1 Lower bounds on regret (ending) Recap from
More informationLecture 5: GPs and Streaming regression
Lecture 5: GPs and Streaming regression Gaussian Processes Information gain Confidence intervals COMP-652 and ECSE-608, Lecture 5 - September 19, 2017 1 Recall: Non-parametric regression Input space X
More informationComputer Vision Group Prof. Daniel Cremers. 9. Gaussian Processes - Regression
Group Prof. Daniel Cremers 9. Gaussian Processes - Regression Repetition: Regularized Regression Before, we solved for w using the pseudoinverse. But: we can kernelize this problem as well! First step:
More informationPattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions
Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite
More information