Distributionally robust optimization techniques in batch Bayesian optimisation

Nikitas Rontsis

June 13

1 Introduction

This report is concerned with performing batch Bayesian optimization of an unknown function $f$. Using a Gaussian process (GP) framework to model the function, we search for the best batch of $k$ points at which the function will next be evaluated. According to [3], the expected loss of a specific choice of $k$ points involves expensive integration over multidimensional regions. We reformulate the problem using worst-case expectation techniques with second-order moment information [6]. This reformulation is a conservative approximation of the original problem: it considers all distributions with a given mean and covariance, including the Gaussian distribution assumed in the original problem, and it is free of the expensive integrations of the original formulation. We show, however, that this formulation is overly conservative and returns trivial solutions. A remedy is introduced by bounding the support of the distributions considered in the worst-case expectation, but this leads to a semi-infinite optimization problem. Sum-of-squares techniques are suggested as a possible relaxation.

2 Initial Definitions

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a smooth function to be minimized. Assume that $l$ points $y_i = f(x_i)$ have been gathered so far, forming a dataset $\mathcal{D}_0 = \{(x_i, y_i)\} = (X_0, y_0)$ with $X_0 \in \mathbb{R}^{l \times n}$ and $y_0 \in \mathbb{R}^{l}$. In order to find the next $k$ points $X \in \mathbb{R}^{k \times n}$ at which the function will be evaluated, a GP is used to build a statistical picture of the function's form. For an overview of Gaussian processes see [5]. The properties of a GP are determined by a prior mean function $m(x) = \mathbb{E}(f(x))$, which without loss of generality can be assumed to be zero, and a prior positive semi-definite covariance function $k(x, x') = \mathbb{E}\big((f(x) - m(x))(f(x') - m(x'))\big)$. Given these, the GP dictates the following probability distribution for the function values $y$ at the $k$ selected points $X$:

$y \mid \mathcal{D}_0 \sim \mathcal{N}(\mu(X), \Sigma(X))$,   (1)

with mean and covariance

$\mu(X) = K(X_0, X)^\top (K(X_0, X_0) + \sigma_n^2 I)^{-1} y_0$,   (2)

$\Sigma(X) = K(X, X) - K(X_0, X)^\top (K(X_0, X_0) + \sigma_n^2 I)^{-1} K(X_0, X)$,   (3)

where $\sigma_n^2$ denotes the observation noise variance and $K(A, B)_{(i,j)} = k(A_i, B_j)$, i.e. each element of the matrix $K(A, B)$ is the covariance between the $i$-th point of $A$ and the $j$-th point of $B$. The mean $\mu$ and the covariance $\Sigma$ also depend on the prior data $\mathcal{D}_0$, but we do not denote this dependence explicitly in order to keep the notation uncluttered.

3 Expected Loss Function

The expected loss of the next batch of evaluations is

$\Lambda(X \mid \mathcal{D}_0) = \mathbb{E}\big(\min\{y_1, \dots, y_k, \eta\}\big)$,   (4)

with $\eta = \min y_0$. This can be reformulated as

$\Lambda(X \mid \mathcal{D}_0) = \eta \int_{C_0} \mathcal{N}(y; \mu, \Sigma)\,\mathrm{d}y + \sum_{i=1}^{k} \int_{C_i} y_i\, \mathcal{N}(y; \mu, \Sigma)\,\mathrm{d}y$,   (5)

where the integrals are taken over $C_0 = \{y \in \mathbb{R}^k \mid y_j \geq \eta,\ j = 1, \dots, k\}$ and $C_i = \{y \in \mathbb{R}^k \mid y_i \leq \eta,\ y_i \leq y_j,\ j = 1, \dots, k\}$. In order to calculate the above $k+1$ $k$-dimensional integrals, [3] suggests using Expectation Propagation. However, Expectation Propagation is an approximate and expensive operation that needs to be performed at every step of the optimization algorithm used to minimize (4), considerably increasing the complexity of the resulting global optimization procedure.
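As an illustration of the quantities above, the following sketch (not part of the original report) computes the posterior mean (2) and covariance (3) for a batch of candidate points and estimates the expected loss (4) by plain Monte Carlo instead of the Expectation Propagation approach of [3]. It assumes numpy and an arbitrary squared-exponential kernel; all function names and hyperparameter values are illustrative.

    import numpy as np

    def sq_exp_kernel(A, B, lengthscale=1.0, signal_var=1.0):
        # K(A, B)_{ij} = k(a_i, b_j) for a squared-exponential kernel (illustrative choice).
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return signal_var * np.exp(-0.5 * d2 / lengthscale**2)

    def gp_posterior(X, X0, y0, noise_var=1e-4, kernel=sq_exp_kernel):
        # Posterior mean (2) and covariance (3) of the k batch points X given data (X0, y0).
        K00 = kernel(X0, X0) + noise_var * np.eye(len(X0))
        K0X = kernel(X0, X)
        KXX = kernel(X, X)
        L = np.linalg.cholesky(K00)                      # factor once, reuse for both solves
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y0))
        mu = K0X.T @ alpha                               # eq. (2)
        V = np.linalg.solve(L, K0X)
        Sigma = KXX - V.T @ V                            # eq. (3)
        return mu, Sigma

    def expected_loss_mc(mu, Sigma, eta, n_samples=100000, rng=None):
        # Monte Carlo estimate of the expected loss (4), E[min(y_1, ..., y_k, eta)];
        # a stand-in for the multidimensional integrals of (5).
        rng = np.random.default_rng(rng)
        y = rng.multivariate_normal(mu, Sigma, size=n_samples)
        return np.minimum(y.min(axis=1), eta).mean()

    # Toy usage on a 1-D objective with a batch of k = 2 candidate points.
    X0 = np.array([[0.0], [0.5], [1.0]])
    y0 = np.array([0.3, -0.1, 0.4])
    X = np.array([[0.2], [0.8]])
    mu, Sigma = gp_posterior(X, X0, y0)
    print(expected_loss_mc(mu, Sigma, eta=y0.min()))

Every evaluation of the expected loss requires either such sampling or the approximate integration of (5), which is what motivates the reformulation of the next section.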

4 Worst Case Expectation Formulation

In this section we derive conservative approximations for the minimization of (4) using worst-case expectation techniques, in order to avoid the multidimensional integrations present in (5).

4.1 Generic Formulation

Define the set $\mathcal{P}(\mu, \Sigma)$ of all probability distributions on $\mathbb{R}^k$ with given mean vector $\mu$ and covariance matrix $\Sigma \succ 0$. Then, let $\sup_{P \in \mathcal{P}} \mathbb{E}_P(g(\xi))$ denote the worst-case expectation of a measurable function $g : \mathbb{R}^k \to \mathbb{R}$. The worst-case expectation can be described by the following optimization problem [6]:

$\theta_{\mathrm{wc}} = \sup_{\nu \in \mathcal{M}_+} \int_{\mathbb{R}^k} g(\xi)\,\nu(\mathrm{d}\xi)$
subject to $\int_{\mathbb{R}^k} \nu(\mathrm{d}\xi) = 1$, $\int_{\mathbb{R}^k} \xi\,\nu(\mathrm{d}\xi) = \mu$, $\int_{\mathbb{R}^k} \xi \xi^\top \nu(\mathrm{d}\xi) = \Sigma + \mu\mu^\top$,   (6)

where $\mathcal{M}_+$ represents the cone of nonnegative Borel measures on $\mathbb{R}^k$. This is a linear program with an infinite-dimensional variable and finitely many constraints. Its dual has a finite number of variables and infinitely many constraints, while exhibiting zero duality gap [6]. Hence we can equivalently focus on the dual problem of (6),

$\inf_{M} \ \langle \Omega, M \rangle$
subject to $[\xi^\top\ 1]\, M\, [\xi^\top\ 1]^\top \geq g(\xi) \quad \forall \xi \in \mathbb{R}^k$,   (7)

with $M \in \mathbb{S}^{k+1}$ as the variable and $\Omega \in \mathbb{S}^{k+1}$ denoting the second-order moment matrix of $\xi$,

$\Omega = \begin{bmatrix} \Sigma + \mu\mu^\top & \mu \\ \mu^\top & 1 \end{bmatrix}$.

We use $\langle \Omega, M \rangle$ for the trace inner product. The matrix $M$ consists of the Lagrange multipliers $Y \in \mathbb{S}^{k}$, $y \in \mathbb{R}^k$ and $y_0 \in \mathbb{R}$ that correspond to the equality constraints for the covariance matrix, the mean vector and $\nu$ integrating to one, i.e.

$M = \begin{bmatrix} Y & y \\ y^\top & y_0 \end{bmatrix}$.

4.2 Concave piecewise affine function

When $g(\xi)$ is a piecewise affine function, the infinite collection of constraints in (7), parameterized by $\xi$, can be eliminated with techniques used in [6, Theorem 2.3]. First, note that a concave piecewise affine function can be reformulated as the minimum of a linear function over the probability simplex [2, Exercise 4.8]:

$\min_{i=1,\dots,l} (a_i + b_i^\top \xi) = \min_{\lambda \in \Delta} \sum_{i=1}^{l} \lambda_i (a_i + b_i^\top \xi)$,   (8)

where $\lambda$ ranges over the probability simplex. Combining this result with the Minmax Lemma [1, Lemma D.4.1], which allows us to swap the order of minimization and maximization, we convert the infinite collection of constraints into a linear matrix inequality. The Minmax Lemma requires $[\xi^\top\ 1]\, M\, [\xi^\top\ 1]^\top$ to be a convex function of $\xi$, i.e. $Y \succeq 0$. This condition is automatically necessary when $g(\xi)$ is concave piecewise affine, as a negative eigenvalue of $Y$ would result in a violation of the inequality along the direction of the corresponding eigenvector. For example, in the particular case of $g(\xi) = \min\{\xi, \eta\}$ we can reformulate the constraint as follows:

$[\xi^\top\ 1]\, M\, [\xi^\top\ 1]^\top \geq \min\{\xi, \eta\} \quad \forall \xi \in \mathbb{R}^k$

$\Leftrightarrow\ [\xi^\top\ 1]\, M\, [\xi^\top\ 1]^\top - \min_{\lambda \in \Delta}\Big(\sum_{i=1}^{k} \lambda_i \xi_i + \lambda_{k+1}\eta\Big) \geq 0 \quad \forall \xi \in \mathbb{R}^k$

$\Leftrightarrow\ \min_{\xi \in \mathbb{R}^k} \max_{\lambda \in \Delta}\Big\{[\xi^\top\ 1]\, M\, [\xi^\top\ 1]^\top - \Big(\sum_{i=1}^{k} \lambda_i \xi_i + \lambda_{k+1}\eta\Big)\Big\} \geq 0$

$\Leftrightarrow\ \max_{\lambda \in \Delta} \min_{\xi \in \mathbb{R}^k}\Big\{[\xi^\top\ 1]\, M\, [\xi^\top\ 1]^\top - \Big(\sum_{i=1}^{k} \lambda_i \xi_i + \lambda_{k+1}\eta\Big)\Big\} \geq 0$

$\Leftrightarrow\ \min_{\xi \in \mathbb{R}^k}\Big\{[\xi^\top\ 1]\, M\, [\xi^\top\ 1]^\top - \Big(\sum_{i=1}^{k} \lambda_i \xi_i + \lambda_{k+1}\eta\Big)\Big\} \geq 0 \quad \text{for some } \lambda \in \Delta$

$\Leftrightarrow\ M - \begin{bmatrix} 0 & \lambda_{1:k}/2 \\ \lambda_{1:k}^\top/2 & \lambda_{k+1}\eta \end{bmatrix} \succeq 0 \quad \text{for some } \lambda \in \Delta$,

where $\Delta = \big\{\lambda \in \mathbb{R}^{k+1} : \sum_{i=1}^{k+1} \lambda_i = 1,\ \lambda \geq 0\big\}$ denotes the probability simplex in $\mathbb{R}^{k+1}$.
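The sufficiency direction of the last equivalence is easy to check numerically. The following sketch (not part of the original report; numpy assumed, data arbitrary) builds a matrix $M$ that dominates, in the semidefinite order, the matrix constructed from a simplex vector $\lambda$, and verifies that it satisfies the semi-infinite constraint of (7) for $g(\xi) = \min\{\xi, \eta\}$ at randomly sampled points.

    import numpy as np

    rng = np.random.default_rng(0)
    k, eta = 3, 0.5

    # lambda in the probability simplex and an M that satisfies the LMI of Section 4.2
    lam = rng.random(k + 1)
    lam /= lam.sum()
    R = np.zeros((k + 1, k + 1))
    R[:k, k] = R[k, :k] = lam[:k] / 2
    R[k, k] = lam[k] * eta
    G = rng.normal(size=(k + 1, k + 1))
    M = R + G @ G.T                                  # M - R = G G^T is PSD, so M dominates R

    # check the semi-infinite constraint of (7) for g(xi) = min{xi, eta} on random points
    xi = rng.normal(scale=5.0, size=(100000, k))
    Z = np.c_[xi, np.ones(len(xi))]                  # rows are [xi^T 1]
    quad = np.einsum('ni,ij,nj->n', Z, M, Z)
    print((quad >= np.minimum(xi.min(axis=1), eta) - 1e-9).all())   # True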

4.3 Results for the case of batch Bayesian optimization

Using the previous results we can derive the following tractable optimization problem, which is a conservative approximation of the minimization of (5):

$\inf \ \langle \Omega(X), M \rangle$
subject to $M \succeq \begin{bmatrix} 0 & \lambda_{1:k}/2 \\ \lambda_{1:k}^\top/2 & \lambda_{k+1}\eta \end{bmatrix}$, $\lambda \in \Delta$,   (9)

with variables $M \in \mathbb{S}^{k+1}$, $X \in \mathbb{R}^{k \times n}$, and $\lambda \in \mathbb{R}^{k+1}$. The dependence of $\Omega$ on $X$ is, in general, complex; only in very simple cases can it be convex. For example, in Appendix A we show that when $k = 1$ and the kernel used in the GP is linear, the objective of the optimization problem is convex separately in $X$ and in $(M, \lambda)$.

Let $h(X)$ be the optimal value of problem (9) when minimizing only over $(M, \lambda)$. This inner minimization is a semidefinite optimization problem, so it can be solved globally with standard software tools. At an upper level, we pass the function $h(X)$ to a nonlinear solver, thus optimizing the whole problem; as a result, the task of the nonlinear solver is reduced to optimizing over $X$.

Unfortunately, the worst-case expectation achieved by (9) is trivially equal to $\min\{\mu_1, \dots, \mu_k, \eta\}$, as we can deduce from the following proposition.

Proposition 4.1. For a concave piecewise affine function $g(\xi) = \min_{i=1,\dots,l}(a_i + b_i^\top \xi)$, the optimal value of (6) is $\sup_{P \in \mathcal{P}} \mathbb{E}_P(g(\xi)) = \min_{i=1,\dots,l}(a_i + b_i^\top \mu)$.

Proof. First note that the expectation of $g(\xi)$ is bounded above:

$\mathbb{E}\Big(\min_{i=1,\dots,l}(a_i + b_i^\top \xi)\Big) \leq \min_{i=1,\dots,l} \mathbb{E}(a_i + b_i^\top \xi) = \min_{i=1,\dots,l}(a_i + b_i^\top \mu).$   (10)

We will construct a distribution that achieves this upper bound. Consider the one-dimensional, uncorrelated random variables $z, w$ with

$z \sim \mathcal{U}\big(-\tfrac{1}{\epsilon}, \tfrac{1}{\epsilon}\big), \quad w \sim \mathcal{N}(0, \epsilon)$,   (11)

i.e. $z$ is uniformly distributed on $(-\epsilon^{-1}, \epsilon^{-1})$ and $w$ is a zero-mean Gaussian with variance $\epsilon$, where $\epsilon \in \mathbb{R}_{++}$.

Now, assuming $0 < \epsilon \leq \tfrac{1}{3}$, consider the random variable $x$ with the mixture distribution

$x = \begin{cases} z & \text{with probability } 3\epsilon^2 \\ w & \text{with probability } 1 - 3\epsilon^2 \end{cases}$   (12)

Since both of the mixing distributions are zero mean, the resulting distribution is zero mean, with variance

$\mathbb{E}(x^2) = 3\epsilon^2\, \mathbb{E}(z^2) + (1 - 3\epsilon^2)\, \mathbb{E}(w^2) = 1 + \epsilon(1 - 3\epsilon^2).$   (13)

In the limit $\epsilon \to 0$ the random variable $x$ has zero mean and variance one, but its probability density is infinitesimal everywhere outside the origin. Letting $\mathbf{x}$ be a vector of $k$ independent variables, each distributed identically to $x$, for $\epsilon \to 0$ the random vector $\xi = \Sigma^{1/2}\mathbf{x} + \mu$ has covariance matrix $\Sigma$ and mean value $\mu$, with its probability distribution being infinitesimal everywhere except at $\mu$. For this random vector the inequality (10) holds tightly.

It is worth noting that when $g(\xi) = \max_{i=1,\dots,l}(a_i + b_i^\top \xi)$, i.e. a convex piecewise affine function, we have the analogous lower bound

$\mathbb{E}\Big(\max_{i=1,\dots,l}(a_i + b_i^\top \xi)\Big) \geq \max_{i=1,\dots,l} \mathbb{E}(a_i + b_i^\top \xi) = \max_{i=1,\dots,l}(a_i + b_i^\top \mu).$   (14)

This bound is also tight: it is achieved by the same random vector $\xi$ constructed above. However, this is of no practical importance here, since in (6) we are performing a maximization over distributions.
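The extremal distribution constructed in the proof above can be simulated directly. The following sketch is not part of the original report; it uses numpy with arbitrary values of $\mu$, $\Sigma$ and $\eta$, and checks that as $\epsilon \to 0$ the Monte Carlo estimate of $\mathbb{E}(\min\{\xi_1, \dots, \xi_k, \eta\})$ approaches the trivial value $\min\{\mu_1, \dots, \mu_k, \eta\}$ while the analytic variance (13) approaches one.

    import numpy as np

    def sample_mixture(eps, size, rng):
        # Draw from the mixture (12): z ~ U(-1/eps, 1/eps) with prob. 3*eps^2, else w ~ N(0, eps).
        pick_z = rng.random(size) < 3 * eps**2
        z = rng.uniform(-1 / eps, 1 / eps, size)
        w = rng.normal(0.0, np.sqrt(eps), size)      # N(0, eps) has standard deviation sqrt(eps)
        return np.where(pick_z, z, w)

    rng = np.random.default_rng(0)
    mu = np.array([0.4, -0.2])
    Sigma = np.array([[0.5, 0.1], [0.1, 0.3]])
    eta = 0.1
    L = np.linalg.cholesky(Sigma)                    # any square root of Sigma works as Sigma^{1/2}

    for eps in [1e-1, 1e-2, 1e-3]:
        x = sample_mixture(eps, (500000, 2), rng)    # i.i.d. copies of the scalar mixture
        xi = x @ L.T + mu                            # xi = Sigma^{1/2} x + mu: mean mu, covariance -> Sigma
        var_13 = 1 + eps * (1 - 3 * eps**2)          # analytic variance of x from (13)
        mc_loss = np.minimum(xi.min(axis=1), eta).mean()
        print(eps, var_13, mc_loss)                  # mc_loss -> min(mu_1, mu_2, eta) = -0.2 as eps -> 0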
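For completeness, here is a minimal sketch of the inner semidefinite problem of (9), i.e. the computation of $h(X)$ for fixed posterior moments. It is not part of the original report: it assumes cvxpy with an SDP-capable solver such as SCS installed, and all data values are arbitrary. Solving it reproduces numerically the triviality predicted by Proposition 4.1, i.e. the optimal value is $\min\{\mu_1, \dots, \mu_k, \eta\}$.

    import numpy as np
    import cvxpy as cp

    def worst_case_expected_min(mu, Sigma, eta):
        # Inner problem of (9) for a fixed batch: minimize <Omega, M> over (M, lambda),
        # subject to M >= [[0, lambda_{1:k}/2], [lambda_{1:k}^T/2, lambda_{k+1}*eta]] in the
        # semidefinite order, with lambda in the probability simplex.
        k = len(mu)
        Omega = np.block([[Sigma + np.outer(mu, mu), mu.reshape(-1, 1)],
                          [mu.reshape(1, -1), np.ones((1, 1))]])
        M = cp.Variable((k + 1, k + 1), symmetric=True)
        R = cp.Variable((k + 1, k + 1), symmetric=True)   # right-hand-side matrix of the LMI
        lam = cp.Variable(k + 1, nonneg=True)
        constraints = [
            R[:k, :k] == 0,
            R[:k, k] == lam[:k] / 2,
            R[k, k] == lam[k] * eta,
            M - R >> 0,
            cp.sum(lam) == 1,
        ]
        prob = cp.Problem(cp.Minimize(cp.trace(Omega @ M)), constraints)
        prob.solve()
        return prob.value

    # Proposition 4.1 predicts the trivial value min(mu_1, ..., mu_k, eta):
    mu = np.array([0.4, -0.2])
    Sigma = np.array([[0.5, 0.1], [0.1, 0.3]])
    print(worst_case_expected_min(mu, Sigma, eta=0.1))    # approx. -0.2

In the scheme described in Section 4.3, a solve of this kind would be called repeatedly by the outer nonlinear solver for each candidate batch $X$, with $\mu$ and $\Sigma$ taken from (2) and (3).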

4.4 Bounded support set

One way to avoid these problematic distributions, which lead to trivial solutions, is to bound the support of the distribution. One possible choice would be to enforce each distribution $\nu \in \mathcal{P}$ to be supported only in the set

$S = \{\xi \in \mathbb{R}^k \mid (\xi - \mu)^\top \Sigma^{-1} (\xi - \mu) < \alpha^2\}$,   (15)

which for $\alpha = 3$ and a Gaussian probability measure $\mathcal{N}(\mu, \Sigma)$ includes nearly all of the mass. Under this constraint, the dual problem (7) is reformulated as

$\inf \ \langle \Omega, M \rangle$
subject to $[\xi^\top\ 1]\, M\, [\xi^\top\ 1]^\top \geq \min_{i=1,\dots,l}(a_i + b_i^\top \xi) \quad \forall \xi \in S.$   (16)

Unfortunately, in this problem $[\xi^\top\ 1]\, M\, [\xi^\top\ 1]^\top$ is not necessarily convex in $\xi$ ($Y$ is, in general, indefinite). Hence we cannot apply the Minmax Lemma to reduce the infinite collection of constraints to a linear matrix inequality. The infinite number of constraints can instead be eliminated conservatively by sum-of-squares techniques, using the following result.

Proposition 4.2. By the Positivstellensatz [4], the set

$\{\xi \in \mathbb{R}^k \mid h_i(\xi) \leq 0,\ i = 1, \dots, l+1\}$,   (17)

where $h_i(\xi) = [\xi^\top\ 1]\, M\, [\xi^\top\ 1]^\top - (a_i + b_i^\top \xi)$ for $i = 1, \dots, l$ and $h_{l+1}(\xi) = (\xi - \mu)^\top \Sigma^{-1} (\xi - \mu) - \alpha^2$, is empty if and only if there exist globally positive polynomials $s_i$, $p_{i,j}$ such that

$1 = -s_0(\xi) + \sum_{i=1}^{l+1} s_i(\xi)\, h_i(\xi) - \sum_{i \neq j} p_{i,j}(\xi)\, h_i(\xi)\, h_j(\xi) \quad \forall \xi \in \mathbb{R}^k.$   (18)

The proof is a straightforward specialization of the results in [4]. Under the typical sum-of-squares relaxation [4], we choose a vector $z$ of monomials and represent $s_i$ as $z^\top L_i z$ with $L_i \succeq 0$ (and similarly for $p_{i,j}$). We then equate the elements of the $L_i$ so that the polynomial identity dictated by Proposition 4.2 holds. However, in our case the coefficients of the sum-of-squares polynomials are multiplied by the elements of $M$, resulting in a bilinearity that makes the problem difficult to solve.

5 Conclusions

The main goal of this mini-project was to explore the connections between distributionally robust optimization techniques and Gaussian processes. To this end, a better understanding of both fields was obtained, encouraging further exploration in this area. Probably the main problem that prevented more successful results was the one described in Proposition 4.1. One way to avoid it might be to consider the best-case expectation instead of the worst-case expectation, which does not exhibit the problem described in Proposition 4.1.

References

[1] Aharon Ben-Tal and Arkadi Nemirovski. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2001.

[2] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, New York, NY, USA, 2004.

[3] Javier González, Michael A. Osborne, and Neil D. Lawrence. GLASSES: Relieving the myopia of Bayesian optimisation. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2016.

[4] Pablo A. Parrilo. Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization. PhD thesis, California Institute of Technology, 2000.

[5] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press, 2006.

[6] Steve Zymler, Daniel Kuhn, and Berç Rustem. Distributionally robust joint chance constraints with second-order moment information. Mathematical Programming, 137(1-2):167-198, 2013.

A Linear kernel

The linear kernel is defined as $k(x, x') = \sigma_b^2 + \sigma_\nu^2 (x - c)^\top (x' - c)$. This kernel is non-stationary. The hyperparameter $c$ determines the point through which all lines in the posterior pass, and the hyperparameter $\sigma_b^2$ specifies the magnitude of the function at zero by putting a prior on it.

When only one future point is considered ($k = 1$), the optimization problem (9) can be simplified as follows. We denote by $\mathbf{1}$ the vector with all components equal to one, and define

$A = X_0 - \mathbf{1}c^\top, \quad B = \Big(AA^\top + \big(\tfrac{\sigma_n}{\sigma_\nu}\big)^2 I + \big(\tfrac{\sigma_b}{\sigma_\nu}\big)^2 \mathbf{1}\mathbf{1}^\top\Big)^{-1}, \quad w = x - c.$

The resulting one-dimensional variance of (1) is given by

$\sigma^2(x) = \sigma_b^2 + \sigma_\nu^2 w^\top w - \tfrac{1}{\sigma_\nu^2}\big[\sigma_\nu^2 A w + \sigma_b^2 \mathbf{1}\big]^\top B \big[\sigma_\nu^2 A w + \sigma_b^2 \mathbf{1}\big]$
$\phantom{\sigma^2(x)} = \sigma_b^2 + \sigma_\nu^2 w^\top (I - A^\top B A) w - 2\sigma_b^2 \mathbf{1}^\top B A w - \tfrac{\sigma_b^4}{\sigma_\nu^2} \mathbf{1}^\top B \mathbf{1}$,   (19)

which, as we will prove below, is a positive definite quadratic in $x$. (This can also be seen very easily by noting that $\sigma^2(x)$ is a quadratic form and, as a valid variance function, is always positive; we nevertheless give the longer proof, as it provides better insight.)

First, note that for $E \in \mathbb{S}^n_{++}$, $F \in \mathbb{S}^n_{+}$ the following equivalence holds:

$E \succeq F \ \Leftrightarrow\ z^\top z \geq z^\top E^{-1/2} F E^{-1/2} z \quad \forall z \in \mathbb{R}^n$
$\phantom{E \succeq F} \ \Leftrightarrow\ z^\top E^{1/2} F^{+} E^{1/2} z \geq z^\top z \quad \forall z \in \mathcal{R}(F)$
$\phantom{E \succeq F} \ \Leftrightarrow\ z^\top F^{+} z \geq z^\top E^{-1} z \quad \forall z \in \mathcal{R}(F)$,

where $E^{1/2}$ denotes the principal square root of $E$, $F^{+}$ the pseudoinverse of $F$, and $\mathcal{R}(F)$ the row space of $F$. Using the above result, we have

$z^\top \Big(AA^\top + \big(\tfrac{\sigma_n}{\sigma_\nu}\big)^2 I + \big(\tfrac{\sigma_b}{\sigma_\nu}\big)^2 \mathbf{1}\mathbf{1}^\top\Big) z \geq z^\top AA^\top z \quad \forall z \in \mathbb{R}^n$
$\Leftrightarrow\ z^\top \Big(AA^\top + \big(\tfrac{\sigma_n}{\sigma_\nu}\big)^2 I + \big(\tfrac{\sigma_b}{\sigma_\nu}\big)^2 \mathbf{1}\mathbf{1}^\top\Big)^{-1} z \leq z^\top (AA^\top)^{+} z \quad \forall z \in \mathcal{R}(AA^\top)$
$\Rightarrow\ z^\top A^\top B A z \leq z^\top A^\top (AA^\top)^{+} A z \quad \forall z \in \mathbb{R}^n$,

since $Az \in \mathcal{R}(AA^\top)$ for all $z \in \mathbb{R}^n$. Finally, note that $A^\top (AA^\top)^{+} A \preceq I$, since

$z^\top A^\top (AA^\top)^{+} A z = z^\top z \quad \forall z \in \mathcal{R}(A^\top), \qquad Az = 0 \quad \forall z \in \mathcal{N}(A).$   (20)

Hence we conclude that $A^\top B A \preceq I$ and, as a result, (19) is a positive definite quadratic.

The one-dimensional mean function $\mu(x)$ of (1) is a linear function of $x$, given by

$\mu(x) = \Big[A w + \big(\tfrac{\sigma_b}{\sigma_\nu}\big)^2 \mathbf{1}\Big]^\top B\, y_0.$   (21)

As a result, the objective function of (9) is quadratic separately in $x$ and in $(M, \lambda)$.
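As a sanity check on the closed forms (19) and (21) as reconstructed above, the following sketch (not part of the original report; numpy assumed, hyperparameter values arbitrary) compares them against the GP posterior formulas (2)-(3) evaluated with the linear kernel, and verifies that the Hessian of $\sigma^2(x)$, namely $2\sigma_\nu^2(I - A^\top B A)$, is positive definite.

    import numpy as np

    rng = np.random.default_rng(1)
    n, l = 3, 6
    sigma_b, sigma_nu, sigma_n = 0.7, 1.3, 0.2
    c = rng.normal(size=n)
    X0 = rng.normal(size=(l, n))
    y0 = rng.normal(size=l)

    def k_lin(U, V):
        # Linear kernel k(x, x') = sigma_b^2 + sigma_nu^2 (x - c)^T (x' - c), applied row-wise.
        return sigma_b**2 + sigma_nu**2 * (U - c) @ (V - c).T

    A = X0 - c                                        # A = X0 - 1 c^T (c subtracted from every row)
    B = np.linalg.inv(A @ A.T + (sigma_n / sigma_nu)**2 * np.eye(l)
                      + (sigma_b / sigma_nu)**2 * np.ones((l, l)))

    x = rng.normal(size=n)
    w = x - c
    ones = np.ones(l)

    # Direct GP posterior, eqs (2)-(3), for the single test point x.
    Kinv = np.linalg.inv(k_lin(X0, X0) + sigma_n**2 * np.eye(l))
    k0x = k_lin(X0, x[None, :]).ravel()
    mu_gp = k0x @ Kinv @ y0
    var_gp = sigma_b**2 + sigma_nu**2 * (w @ w) - k0x @ Kinv @ k0x

    # Closed forms (19) and (21).
    var_cf = (sigma_b**2 + sigma_nu**2 * w @ (np.eye(n) - A.T @ B @ A) @ w
              - 2 * sigma_b**2 * ones @ B @ A @ w
              - sigma_b**4 / sigma_nu**2 * ones @ B @ ones)
    mu_cf = (A @ w + (sigma_b / sigma_nu)**2 * ones) @ B @ y0

    print(np.isclose(var_gp, var_cf), np.isclose(mu_gp, mu_cf))                 # True True
    print(np.linalg.eigvalsh(np.eye(n) - A.T @ B @ A).min() > 0)                # I - A^T B A is PD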
